clojure 2020-09-27 | Slack Archive

jjttjj15:09:36

If I have a websocket component that has an internal clojure.core.async/mult to distribute all incoming messages, does it make sense to have that component extend the clojure.core.async/Mult protocol? Is that the intended use of the protocol? or is it better to just make my own tap functions in my own namespace

vemv15:09:07

Given N sets, how do I find out the largest possible set (or sets) that is a subset of all N sets?

(magic [#{1 1} #{1 2} #{1 3}]) ;; => #{1}

Brute-force solutions wouldn't cut it for the size of my inputs

p-himik16:09:07

clojure.set/intersection.

vemv16:09:44

That's not what I want, as the inputs can be disparate and most likely the intersection will be #{} 99% of the times So I should reword: of all N sets -> of the most sets as possible

p-himik16:09:55

I still don't understand. What would your desired function return for [#{1 2} #{3 4}], for example?

p-himik16:09:58

Or better yet, can you describe the algorithm? Because I cannot parse that sentence with "->". :)

chucklehead16:09:44

something like the union of the results of intersecting of each subset with the union of all subsets?

👀 3

p-himik16:09:22

Uhm, that would be just the union of the input, no?

chucklehead16:09:20

yeah, oops, I guess you could intersect each subset with union of all the other subsets?

p-himik16:09:58

I don't know. I think the task is not defined clearly enough for us to come up with any robust solution.

p-himik16:09:40

But it feels like a task of turning a 2D metric into a 1D one, and that always requires some particular approach to dimensionality reduction.

vemv16:09:31

> What would your desired function return for [#{1 2} #{3 4}], for example? that's a pretty degenerate case (for my needs) since it's very small and there's nothing in common. I don't care if it returns #{} or something else I'm trying to find the largest possible subsets given N sets of size ~~1000 to~~ 100000 each, which are presumed to have quite a lot in common. There will be many possible answers, and what a 'good' anwer is can be tweaked. e.g. maybe for me a set with 300 elements that is a subset of 40 other sets (but not a subset of 20 other sets), is interesting enough

fadrian16:09:55

Combine the sets into a bag having all elements. Then use frequencies on the bag to find the largest count?

👍 3

vemv16:09:52

yes, counting seems a good first step :) I hope it doesn't lead to an exhaustive search

p-himik16:09:45

What would that return for [#{1 2} #{1 2} #{3} #{3}]? The frequencies are the same, yet there's no #{1 2 3} subset.

vemv16:09:21

#{1 2} seems a good answer, because its size is large, even if it's not a subset of all inputs. Ultimately I'm not seeking an unequivocal mathematical function, but something that gives good answers to humans for the desired size range (1000-100000). at sizes 1-10 it's probably easy to get lost because the differences are too small to be evident. for example, for an input of 100 sets sized 10000 each: * an answer of a set sized 5000 which is a subset of 80 sets is very valuable * an answer of a set sized 9999 which is a subset of 1 set is not very valuable * an answer of a set sized 2 which is a subset of 80 sets is not very valuable

vemv16:09:07

...maybe my problem is too specific, sorry for that. at first I thought it could be a commonly solved problem.

p-himik16:09:41

The main issue is that you have 2 numbers (subset size and the amount of its supersets) that you have to reduce to just one ("score" of a particular subset). It's a dimensionality reduction problem, and there never is a cookie-cutter solution. You will have to somehow write that function that takes two numbers and outputs just one.

vemv16:09:18

thanks! dimensionality reduction seems a good read

p-himik16:09:15

And a deep rabbit hole. :)

p-himik16:09:17

This is a quick informative read: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1006907

2020-09-27

Channels