2021-06-06
Channels
- # announcements (1)
- # atom-editor (2)
- # babashka (6)
- # beginners (30)
- # calva (12)
- # chlorine-clover (2)
- # clojure (88)
- # clojure-australia (2)
- # clojure-europe (9)
- # clojure-germany (4)
- # conjure (3)
- # cursive (12)
- # datomic (4)
- # lsp (86)
- # off-topic (48)
- # play-clj (8)
- # polylith (6)
- # reagent (11)
- # reitit (8)
- # shadow-cljs (19)
- # specter (6)
- # sql (13)
- # xtdb (25)
Hey all! I'm currently working on a raw index from a snapshot and seemingly have managed to crash the JVM on my machine when using the rocksdb backend. Presumably the C library is statically linked in `org.rocksdb/rocksdbjni`, although I'd seen the query work fine on a Mac that I was working with. The following call syntax should be fine, right?
;; eagerly scan and decode every value of :person/name via the AV index
(with-open [index-snapshot (db/open-index-snapshot index-store)]
  (->> (db/av index-snapshot :person/name nil)
       (map (partial db/decode-value index-snapshot))
       (into [])))
just a guess, but in some places the crux code checks if it should `open-nested-index-snapshot` instead
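(I don't know the exact signature off-hand, but roughly something like this - assuming `db/open-nested-index-snapshot` takes the outer snapshot and returns another Closeable snapshot, mirroring the call above:)

```clojure
;; rough guess, assuming db/open-nested-index-snapshot takes the outer snapshot
;; and returns another Closeable snapshot like the other db/open-* calls
(with-open [index-snapshot  (db/open-index-snapshot index-store)
            nested-snapshot (db/open-nested-index-snapshot index-snapshot)]
  (->> (db/av nested-snapshot :person/name nil)
       (map (partial db/decode-value nested-snapshot))
       (into [])))
```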
Nice tip. I'm actually doing this scan pretty close to node start up so I think I shouldn't need to. It looks like it may have something to do with a lock on the DB not being cleared from a previous shutdown. Digging further still
Can confirm that the code works if I manually delete the `db.rocksdb/LOCK` file after the node has been shut down with `(.close node)`. I would have expected `.close` to clean that up for me, so digging further...
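For reference, the manual workaround looks roughly like this (`"data/db.rocksdb"` is a placeholder - use whatever directory your topology config points at):

```clojure
;; manual workaround: delete the stale RocksDB LOCK file after shutdown.
;; "data/db.rocksdb" is a hypothetical path - adjust to your configured :db-dir.
(require '[clojure.java.io :as io])

(.close node)

(let [lock-file (io/file "data/db.rocksdb" "LOCK")]
  (when (.exists lock-file)
    (io/delete-file lock-file)))
```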
Ah, LOCK files came up recently on Zulip: https://juxt-oss.zulipchat.com/#narrow/stream/194466-crux/topic/Lock.20file.20cleanups/near/235162945
I don't think the file ever gets removed once it's created, even after closing everything down, but the OS should release the lock after you `.close` (verifiable with `lslocks` or equivalent)
> I'm currently working on a raw index from a snapshot
Exciting - this has me very curious 🙂
Turns out that it was actually laziness that was biting me. Forgot to `doall` a `for`.
> Forgot to `doall` a `for`.
I've taken to using `x/for` from cgrand/xforms instead, and generally just always pass eductions around instead of lazy sequences. Too many footguns with lazy stuff for me, and the extra perf is nice.
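For anyone hitting the same thing, here's a minimal reconstruction of the footgun plus two eager alternatives (not the exact original code, just the shape of it):

```clojure
;; the footgun: `for` is lazy, so nothing is actually read until after
;; `with-open` has already closed the snapshot
(with-open [index-snapshot (db/open-index-snapshot index-store)]
  (for [v (db/av index-snapshot :person/name nil)]
    (db/decode-value index-snapshot v)))

;; fix 1: realise the sequence before the snapshot closes
(with-open [index-snapshot (db/open-index-snapshot index-store)]
  (doall
   (for [v (db/av index-snapshot :person/name nil)]
     (db/decode-value index-snapshot v))))

;; fix 2: skip laziness entirely with an eager transducing form
(with-open [index-snapshot (db/open-index-snapshot index-store)]
  (into []
        (map #(db/decode-value index-snapshot %))
        (db/av index-snapshot :person/name nil)))
```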
thanks for releasing crux-geo, could be really useful for me too :) I really wish we could easily take advantage of checkpoints for these extra index stores!
> I really wish we could easily take advantage of checkpoints for these extra index stores
Me too 🙂 but we'll get to it soon! As per https://github.com/juxt/crux/issues/1221
somewhat related question while I've got you here: would the proposed index store sharding strategy also extend to other index stores or is it specific to the triples?
Definitely will add checkpointing once it lands. Perhaps Crux could define an abstract protocol, `Checkpointable`, that would allow any system implementing it to hook into the checkpointer...
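Something along these lines, say (names entirely made up - this isn't an existing Crux API, just the shape I have in mind):

```clojure
;; purely hypothetical sketch - Checkpointable is not an existing Crux API
(defprotocol Checkpointable
  (checkpoint! [this dir]
    "Write a consistent snapshot of this store's on-disk state into dir.")
  (restore! [this dir]
    "Restore the store's state from a checkpoint previously written to dir."))

;; a secondary index store (e.g. a geo index) could then opt in and be picked
;; up by the same checkpointer that handles the main index store
(defrecord GeoIndexStore [rtree]
  Checkpointable
  (checkpoint! [_ dir]
    ;; serialise the spatial index into dir
    )
  (restore! [_ dir]
    ;; rebuild the spatial index from dir instead of replaying the whole tx log
    ))
```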
> would the proposed index store sharding strategy also extend to other index stores or is it specific to the triples?
(The context here is R&D for a next-gen index.) It's still very early days on that front, but it would be focused on triples at first. I know Arrow offers a lot of options for extension types though 🙂
ah, so you'd be thinking of storing extra indices in the same storage layer as the triples, unlike today where it's managed independently? interesting
Something along those lines, yes. The goal is to have multiple storage layers, all implemented with Arrow, and then ideally custom indexes can also hook into this "adaptive indexing" architecture.
Just opened up access to https://github.com/teknql/crux-geo - thanks Crux team for the awesome foundation
Awesome, nice job!
> allows querying across time
It's great to see this, and I'm really happy that `crux-lucene` was able to pave the way 🙂
It does! Thanks again for all the work on this and open-sourcing Crux. I remember seeing it being unveiled at Clojure North (if you remember someone asking about additional dimensions of temporality 🙂) and I was so happy (I'd just spent a year approximating something similar in Haskell, so it was really cool to see it arrive in fuller form). Following along from `crux-lucene` made implementing this quite painless. This bug (https://github.com/juxt/crux/issues/1523) may prove to be an issue for us
(Not so much as it relates to that exact problem, but when the joins are in the wrong order, a nested traversal is super expensive if you have hundreds of thousands of entities that could be filtered down to like 200 with one of the indices (lucene, geo-spatial).)
I think an easier workaround to #1523 is just to use subqueries for the expensive predicates, but yeah we'll look into it more this week
I had considered that as well but actually got the same dependency error (or a derivative one) when I had tried it, if I recall correctly
Although perhaps if I had moved everything into the subquery (i.e. not passing in `?a` as an arg) that would resolve it. I didn't wrestle with it too long.
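For the record, the shape I mean is roughly this - a self-contained subquery that doesn't pass `?a` in from the outer query. `expensive-pred` and the attributes are hypothetical stand-ins for a crux-lucene / crux-geo style predicate, so treat it as a sketch rather than a verified fix for #1523:

```clojure
;; sketch of the subquery workaround - expensive-pred and the attributes are
;; hypothetical placeholders for a lucene / geo module predicate
(require '[crux.api :as crux])

(crux/q (crux/db node)
        '{:find [?e ?name]
          :where [;; evaluate the selective predicate once, inside a subquery,
                  ;; instead of letting it run nested per candidate entity
                  [(q {:find [?e]
                       :where [[(expensive-pred :person/location "...") [[?e]]]]})
                   [[?e]]]
                  ;; join the (already small) result set back into the outer query
                  [?e :person/name ?name]]})
```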