Fork me on GitHub
#off-topic
<
2022-12-02
>
Ben Sless07:12:38

Best data structure to map ranges of ints to values with: Good lookup performance Way of representing the difference between two versions such that t1 could be transformed efficiently to t2

Ben Sless07:12:13

Additional info: ranges never overlap but updates might move ranges

Martynas Maciulevičius07:12:14

The second requirement is not very basic. I think you have to specify what that is further to understand what you want there. Also can ranges merge once you move them? If understand you never split the ranges to produce new ones.

Martynas Maciulevičius08:12:55

I think that to calculate a diff you would still need to do the linear lookup through all of the nodes. And to lookup a range you'll need two indexes (two maps) so you'll have to do two lookups whatever way you look at it. And then you have to do the intersection of results :thinking_face:

lassemaatta08:12:51

if you're looking for an off-the-shelf solution, I think guava has some ImmutableRangeMap stuff which sounds a bit like the stuff you're describing.

Martynas Maciulevičius08:12:35

So a tuple->data map, then x->tuple and y->tuple maps too to lookup that tuple. And then you have to maintain these two additional maps as you change the ranges. Edit: This is how I'd implement an "immutable range map". Maybe it exists already.

Ben Sless08:12:52

Y fast trie?

Ben Sless08:12:09

Looks very similar to Clojure's vectors

Rupert (All Street)09:12:14

Maybe try a https://docs.oracle.com/javase/8/docs/api/java/util/TreeMap.html it has log(N) lookup time complexity. Use the bottom of your range as the Key for each value. It supports floorEntry() that will find the highest key below the value that you are searching for. e.g. {1 :a 5 :b 10 :c }.floorEntry(7)=>[5 :b] To increase performance, you could add a clojure LRU cache on top for constant time lookup of recent values.

Ben Sless11:12:06

Aha, I need a tree bitmap (dealing with IPs)

Rupert (All Street)11:12:00

Yeah tree bitmap potentially looks good. Advantage of Treemap is it's built into the language (so it's available and implementation is solid) and doesn't have much API to learn.

Martynas Maciulevičius11:12:22

I think there should be a way to use TreeMap's .ceilingEntry and .floorEntry to encode ranges because if your node has [:end] then it could mean that the range is below the found node. i.e. you could save #{:end :end-inclusive :start-inclusive :start} set in the map and this would allow you to save ranges while retaining the numbers. Then you'd have a number, let's say 200 and you'd find two sets (floor+ceiling). Then you'd need to decide based on these sets whether it's in the range. That could probably work :thinking_face:

👍 1
Ben Sless11:12:26

I can implement the tree map if I can understand the paper

dpsutton14:12:17

zach tellman made > “An adaptation of Okasaki and Gill’s \“Fast Mergeable Integer Maps`\“, which can be found at http://ittc.ku.edu/~andygill/papers/IntMap98.pdfhttps://github.com/clojure/data.int-map/blob/master/src/main/clojure/clojure/data/int_map.clj

dpsutton14:12:04

Not sure if that hits your requirement (which i don’t understand) of t1 transformed efficiently into t2 but some prior art in the area

Ben Sless15:12:37

Thanks, I realized exactly what data structure I need and found there aren't many implementations of it

👍 1
Ben Sless15:12:43

If you happen to find a reference implementation of Tree Bitmap for IP addresses which isn't written in Rust please lmk 🙂

walterl12:12:45

> The aim of this blog is to be a bridge between [software engineering] researchers and practitioners. Each post highlights some useful results from studies past and present in the hope that this will encourage discussion of what we know, what we think we know that ain't actually so, why we believe some things but not others, and what questions should be tackled next. https://neverworkintheory.org/about/

🎓 3
😎 4