off-topic 2018-08-31 | Slack Archive

john01:08:22

I'd pay someone to implement a BloomFilterHashMap, like in this paper http://webee.technion.ac.il/~ayellet/Ps/nelson.pdf, in cljc with something like this https://github.com/davidwclin/cljc-bloom/tree/master/src

john01:08:41

What's a good rate for something like that?

dpsutton01:08:15

what's your use case?

john01:08:02

Well I think it'd be generally good for everyone to have one around, for the same reasons described in the paper

john01:08:18

Personally, I'd like to implement a graph db over an indexical log over a BFHM and play with turning the error rate way up, but pass a ton of data through it, to see if you can get probabilistic inference in a way similar to ANNs.

john01:08:59

Sorta like a bitwise ANN I'm thinking

john01:08:02

Could be a false hunch, but it'd be cool to have BFHMs around anyway

emccue04:08:13

do they provide a reference impl?

emccue04:08:29

the paper's authors i mean

john14:08:57

Looks like there's a java implementation here https://github.com/egrim/java-bloomier-filter

john14:08:26

among others https://github.com/search?q=bloomier+filter

dpsutton14:08:28

i'm still not sure i get what the structure does. It caches a function where that function sends the vast majority of inputs to one output and a small subset to different inputs. It guarantees that the small subset is accurate and the vast majority is mostly accurate?

john17:08:48

I think what's happening with that partitioning structure is that if you happen to get a positive, since it might be a false positive, it sends it through another bloom filter. And it does so in cascading way, in order to reduce false positives in a nested domain sort of way

john17:08:07

Though I could be wrong. But I'm still unsure how a functional relation is embedded in what still seems to me to be an existence check, however nested.

dpsutton18:08:56

it's just membership checking. member in the set that maps to 0, A, B, etc

dpsutton18:08:30

pre-image is a common term

john15:08:41

One use case is distributed malware detection https://arxiv.org/pdf/1601.01405.pdf

john15:08:55

Fast scanning speed with less memory
usage: By layering a cache-efficient bloom
filter on top of the more costly bloomier
filter, BitAV manages to increase end-to-end
throughput of the average-case input by 14×,
and requires less memory to do so than traditional
algorithms

dpsutton15:08:26

Interesting. Thanks

john15:08:08

I'm still trying to grok the bloomier paper myself though

john15:08:37

another paper on bloomier filters: https://arxiv.org/pdf/0807.0928.pdf

john15:08:03

Some bloomier constraints I just read on a ppt:

- Extend [bloom filters] to handle approximate functions.
- Each element of set has associated function value.
- Non-set elements should return null.
- Want to always return correct function value for set elements.
- A false positive returns a function value for a non-null element.

schmee15:08:40

another cool thing related to this, theta sketches: https://datasketches.github.io/docs/TheChallenge.html

👍 4

john15:08:53

I guess if you can tune the accuracy of your function, you can make them go faster

jaihindhreddy-duplicate15:08:54

@dpsutton For example, you can have a large set of weak passwords in a bloom filter on the front end, and you can ask it if the password chosen by the user definitely exists in the set, then you can inform the user. The Bloom filter makes doing this both fast and light on memory.

dpsutton15:08:14

yes i get the bloom filter for sure. just not the bloomier filter

😶 4

dpsutton15:08:35

since it seems to remember a small subset and the vast majority of things map to 0

john15:08:02

Yeah, I'm trying to grok what an "approximate function" is, in the context of a hash collision domain

john15:08:17

Yeah, I guess this paper is exploring along the lines of my hunch https://www.computer.org/csdl/proceedings/iccvw/2009/4442/00/05457544-abs.html

john15:08:22

Well, that java implementation seems pretty simple. It's always awkward converting a codebase to clojure, just so you can understand the code base 😆

john15:08:53

But I'd really like to understand it and I'd never met a piece of pure clojure code I couldn't manage to understand, which is why I'm willing to commission the work 🙂 Already got one offer via DM for $500

john15:08:46

Similar to the vein of "Papers we love," I'd like to start a "papers we'd like to commission the replication of in clojure"

😁 4

❤️ 24

🤤 4

john17:08:39

Okay, so we've got one offer for 500$. The offer is still open, if others would like to do it for cheaper. Also, please chime in on a thread here or DM me directly if you'd like pitch in on the fund to get this implemented. Also, if you have any recommendations for how to organize the public funding of a data structure like this, please advise, thanks!

deliciousowl19:08:51

keep up the great work guys we need more libs 🙂

john19:08:44

I'd also pitch in on a 'clojure-Xprize' for Ewin Tang's quantum-slaying recommender algorithm mentioned above. I wonder if there's a platform for that sort of thing that would fit with the clojure community.

john19:08:13

"bounty-system"! That's the name. There was some hype about them a few years ago.

john19:08:47

hmm, this one's cool, but requires ETH or some other forms of bitcoin

john19:08:49

https://openbounty.status.im/

👍 4

john20:08:15

written in clojure, no less

vijaykiran20:08:33

This could also be an option: https://www.bountysource.com

✅ 4