Fork me on GitHub

If I have a websocket component that has an internal clojure.core.async/mult to distribute all incoming messages, does it make sense to have that component extend the clojure.core.async/Mult protocol? Is that the intended use of the protocol? or is it better to just make my own tap functions in my own namespace


Given N sets, how do I find out the largest possible set (or sets) that is a subset of all N sets?

(magic [#{1 1} #{1 2} #{1 3}]) ;; => #{1}
Brute-force solutions wouldn't cut it for the size of my inputs




That's not what I want, as the inputs can be disparate and most likely the intersection will be #{} 99% of the times So I should reword: of all N sets -> of the most sets as possible


I still don't understand. What would your desired function return for [#{1 2} #{3 4}], for example?


Or better yet, can you describe the algorithm? Because I cannot parse that sentence with "->". :)


something like the union of the results of intersecting of each subset with the union of all subsets?

👀 3

Uhm, that would be just the union of the input, no?


yeah, oops, I guess you could intersect each subset with union of all the other subsets?


I don't know. I think the task is not defined clearly enough for us to come up with any robust solution.


But it feels like a task of turning a 2D metric into a 1D one, and that always requires some particular approach to dimensionality reduction.


> What would your desired function return for [#{1 2} #{3 4}], for example? that's a pretty degenerate case (for my needs) since it's very small and there's nothing in common. I don't care if it returns #{} or something else I'm trying to find the largest possible subsets given N sets of size 1000 to 100000 each, which are presumed to have quite a lot in common. There will be many possible answers, and what a 'good' anwer is can be tweaked. e.g. maybe for me a set with 300 elements that is a subset of 40 other sets (but not a subset of 20 other sets), is interesting enough


Combine the sets into a bag having all elements. Then use frequencies on the bag to find the largest count?

👍 3

yes, counting seems a good first step :) I hope it doesn't lead to an exhaustive search


What would that return for [#{1 2} #{1 2} #{3} #{3}]? The frequencies are the same, yet there's no #{1 2 3} subset.


#{1 2} seems a good answer, because its size is large, even if it's not a subset of all inputs. Ultimately I'm not seeking an unequivocal mathematical function, but something that gives good answers to humans for the desired size range (1000-100000). at sizes 1-10 it's probably easy to get lost because the differences are too small to be evident. for example, for an input of 100 sets sized 10000 each: * an answer of a set sized 5000 which is a subset of 80 sets is very valuable * an answer of a set sized 9999 which is a subset of 1 set is not very valuable * an answer of a set sized 2 which is a subset of 80 sets is not very valuable


...maybe my problem is too specific, sorry for that. at first I thought it could be a commonly solved problem.


The main issue is that you have 2 numbers (subset size and the amount of its supersets) that you have to reduce to just one ("score" of a particular subset). It's a dimensionality reduction problem, and there never is a cookie-cutter solution. You will have to somehow write that function that takes two numbers and outputs just one.


thanks! dimensionality reduction seems a good read


And a deep rabbit hole. :)