#xtdb
2019-05-16
denik00:05:02

Can Crux’s underlying architecture support a default sort order?

refset10:05:28

Hi 🙂 the ave and aev indexes are sorted using the built-in binary sorting within the KV store, which is helpful for attributes and range queries over them. However, those range queries aren't necessarily related to the ultimate sort order of the result, as that depends on the join order as well. The query planner is currently optimised for joining, not sorting. One thing that could help is the external sort capability in Crux for doing more advanced sorting on top of the lazy seq response, but this isn't in the public API as it stands today.
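A plain client-side sort over the returned result set is the simplest stopgap in the meantime. A minimal sketch, assuming a started node bound to `node` and a made-up `:user/age` attribute:

```clojure
(require '[crux.api :as crux])

;; Results come back as an unordered set, so impose the order client-side.
(->> (crux/q (crux/db node)
             '{:find  [e age]
               :where [[e :user/age age]]})
     (sort-by second))
```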

denik12:05:35

Thanks!

👍 4
hoppy03:05:32

I'm getting a NullPointerException from the

hoppy03:05:29

this is arch linux

hoppy03:05:33

any ideas?

hoppy03:05:07

ps - I get the same sort of trouble using the memory kv

megakorre08:05:38

@hoppy we have not seen this error before. I'm seeing some rocksdb issues on github with this description. going to read through them.

megakorre08:05:55

What error are you getting with memory kv?

megakorre08:05:27

I'm also on arch linux 🙂

hoppy10:05:22

the memory kv produces this trace; however, it only does it when fired up in a jacked-in Calva (CIDER) REPL. Starting from a command-line REPL seems to work (at least it loads)

megakorre10:05:24

that error I have seen before. Any chance I can get you to try it out with crux "19.04-1.0.4-alpha-SNAPSHOT" 🙂 ?

hoppy10:05:50

and use the memory kv?

megakorre10:05:11

yeah, jack in and grab the last error you get again

hoppy10:05:42

I'm digging a bit into the rocksdb thing. They are playing the parlor trick of building the .so with a crossbuild, stuffing it in the jar, and extracting it on the fly. Ask me how I know this is a bitey dog. They built it on a CentOS container, so likely the .so isn't quite so tasty on Arch, but you say you are getting away with this?

hoppy10:05:02

and yes, I'll try the snapshot after coffee reload.

🙂 4
hoppy10:05:02

pretty much a clone of the prior version

hoppy10:05:07

just for fun @megakorre, what jvm / version are you running

megakorre10:05:42

openjdk 11.0.3 2019-04-16
OpenJDK Runtime Environment (build 11.0.3+4)
OpenJDK 64-Bit Server VM (build 11.0.3+4, mixed mode)

hoppy10:05:07

alrighty, not that then

megakorre10:05:14

Compiling CruxTest.core: you have some AOT namespaces?

megakorre10:05:28

ok so I just reproduced the error by having a global (def system (crux/start-... in a namespace that was getting AOT compiled. I'm assuming this is what you are doing too.
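A minimal sketch of the pattern in question and one way to defer node startup to runtime. The start function and option names here (`crux/start-standalone-system`, `:kv-backend`, `:db-dir`) are assumptions standing in for whatever the original namespace used:

```clojure
(ns cruxtest.core
  (:require [crux.api :as crux]))

;; Problematic when AOT compiled: the node starts at compile time.
;; (def system (crux/start-standalone-system {:kv-backend "crux.kv.memdb.MemKv"
;;                                            :db-dir     "data/db-dir"}))

;; Deferred alternative: nothing starts until the delay is dereferenced at runtime.
(defonce system
  (delay (crux/start-standalone-system {:kv-backend "crux.kv.memdb.MemKv"
                                        :db-dir     "data/db-dir"})))

(defn -main [& _args]
  @system)
```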

megakorre10:05:46

you are jacking in with the uberjar profile 😮

hoppy10:05:20

mkay, perhaps we fix that.

megakorre10:05:50

but then I have an understanding of what that bug is. I wonder if I can reproduce the rocksdb error under the same condition

hoppy10:05:18

worth a shot, but I expect the way they build that .so is the problem

megakorre11:05:24

can I get some info about your system @hoppy? what are you running?

hoppy11:05:26

started with rocksdb out of AUR, built it myself last night, but that didn't change anything

megakorre11:05:15

the undefined symbol in the error is from a library, jemalloc; it's available to install from pacman. Can you try installing it?

hoppy11:05:20

no effect, but I probably need to rebuild rocks after that

megakorre11:05:42

I did not have it installed myself and could not find it on my system, so I'm not sure. I got the impression that it should be statically linked into the rocksdb .so file, but maybe I'm wrong about that

hoppy11:05:39

switching back to aur, maybe yours is dragging?

megakorre11:05:23

in the snapshot version you installed we are using 6.0.1

hoppy11:05:56

are you guys bringing your own rocksdb in the jar, or using what you find?

megakorre11:05:03

it should be the one from the jar

hoppy11:05:02

is the directive to have a dependency on rocksdbjni valid then, or do I need a different version?

megakorre11:05:04

right, yes, you provide it yourself. Either version should work; I have tried both today on my machine (the other version being the one in the blog)
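A sketch of what that could look like as a minimal project file, combining the snapshot mentioned earlier with the 6.0.1 RocksDB JNI bindings from this thread; the `juxt/crux` coordinate and project name are assumptions:

```clojure
;; Hypothetical minimal project.clj for this setup.
(defproject cruxtest "0.1.0-SNAPSHOT"
  :dependencies [[org.clojure/clojure "1.10.0"]
                 [juxt/crux "19.04-1.0.4-alpha-SNAPSHOT"]
                 [org.rocksdb/rocksdbjni "6.0.1"]])
```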

hoppy11:05:29

Seems like it's on them. I pulled the .so out of their jar and this is hanging out

megakorre11:05:57

what does ldd librocksdbjni-linux64.so get you?

hoppy11:05:10

heh you wish

hoppy11:05:27

linux-vdso.so.1 (0x00007fff4b7f3000)
libpthread.so.0 => /usr/lib/libpthread.so.0 (0x00007fcaecf60000)
librt.so.1 => /usr/lib/librt.so.1 (0x00007fcaecf50000)
libstdc++.so.6 => /usr/lib/libstdc++.so.6 (0x00007fcaecdc0000)
libm.so.6 => /usr/lib/libm.so.6 (0x00007fcaecc78000)
libgcc_s.so.1 => /usr/lib/libgcc_s.so.1 (0x00007fcaecc58000)
libc.so.6 => /usr/lib/libc.so.6 (0x00007fcaeca90000)
/usr/lib64/ld-linux-x86-64.so.2 (0x00007fcaed9a0000)

hoppy11:05:53

It's not calling for jemalloc, which means it didn't dynamically link that.

hoppy11:05:26

but this is the jar they built; I presume this is produced by rocksdb

hoppy11:05:15

I probably need to dump the AUR and build it myself

hoppy11:05:37

although it raises the question of how you are managing

Jorin16:05:36

Hi there again, I know I’m not using the right tool for the job but I’m storing small binary blobs as base64 strings in crux. I read that all top-level attributes are indexed automatically? What exactly does that mean? Can I use that to not have crux index an attribute by nesting it in another map? Thank you 🙂

refset16:05:28

Hey! "Can I use that to not have crux index an attribute by nesting it in another map" -- yep that should work

Jorin16:05:17

Thank you for your quick reply! That’s perfect. So crux checks the type of the value and only indexes simple types?

refset16:05:25

Only top-level fields (attributes) in your documents are indexed, which means the ave and aev indexes are populated accordingly
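A sketch of a document shaped that way; the `:image/*` and `:meta` keys are made up for illustration:

```clojure
{:crux.db/id :image-1
 :image/name "logo"                     ;; top-level attribute, indexed
 :meta       {:image/data "aGVsbG8="}}  ;; nested map: its keys aren't indexed as separate attributes
```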

refset16:05:35

crux will index any key-value combination that conforms to the spec...let me find you the details on that

Jorin16:05:22

Awesome, thank you!

refset16:05:23

I believe that values over a certain size simply don't get indexed in ave at least

refset16:05:29

I will let you know when I have the canonical answer and update the FAQ 🙂

Jorin17:05:02

Yes, seems like it’s cut off at some point… maybe that is enough for my performance concerns and I will worry about it again if it’s actually a problem 😄 If I get this correctly, wrapping it in a map might not help but it will “freeze” the whole map instead: https://github.com/juxt/crux/blob/master/src/crux/codec.clj#L243-L245 But please don’t worry for now. The part where the buffer size is limited is totally enough safety for what I’m doing right now. Thank you again!

refset17:05:29

Hot off the press:

Strings over a certain length (128) only get indexed as hashes. This disables range queries for the attribute, but exact match (should one want it) still works.
You can see this limit in `crux.codec/max-string-index-length` (and its usage)
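A rough illustration of the consequence, assuming a started node bound to `node` and a hypothetical `:blob/data` attribute:

```clojure
(require '[crux.api :as crux])

(def long-value (apply str (repeat 200 "a"))) ;; > 128 chars, so only its hash gets indexed

;; Exact match still works...
(crux/q (crux/db node)
        {:find  '[e]
         :where [['e :blob/data long-value]]})
;; ...but a range clause over :blob/data, e.g. [(> data "m")], would not.
```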

refset17:05:24

You're very welcome, and thanks for using Crux ☺️

Jorin17:05:27

Perfect! That describes it pretty clearly 🙂

Jorin17:05:34

Also, please let me know if there is a better place than Slack for questions like this

Jorin17:05:21

I’m hoping to share the little pet project I’m working on at some point but no guarantees when that will be 😄

refset17:05:45

Slack is fine, although we also have a public juxt-oss Zulip account which has quite a bit of this sort of activity too

refset17:05:04

I look forward to seeing it!

refset17:05:29

Make sure to write a blog post 😉

Jorin17:05:39

One more thing I’m trying to figure out is how to implement transactional constraints properly. I suppose my struggles are trivial for someone with more experience implementing distributed data stores 😃 The cas transaction works great on single entities. When trying to implement, for example, a unique constraint for a certain attribute across entities, I can do that by cleaning up duplicate data later on using “fix-on-read” or “fix-on-write” (not ideal, but possible). That also works. What I’m struggling with right now is deletes with possible references: if I have an entity and I’d like to delete it only if no one references it anymore, I first do a query to check that there are no references, and then I can issue a delete transaction. But I cannot ensure that no one created a new reference between my query and the deletion. The only thing I can come up with is doing “fix-on-read” using the historical data. I guess this is not really specific to crux, and I’m also glad about general resources on this topic 🙂 Also, are there any future plans to build a transactional layer on top of crux, or is this simply the wrong use case?
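For reference, a sketch of the single-entity cas mentioned above, assuming a started node bound to `node`, a made-up `:account-1` entity, and the transaction operation format of that era:

```clojure
(require '[crux.api :as crux])

(crux/submit-tx
 node
 [[:crux.tx/cas
   {:crux.db/id :account-1 :balance 100}    ;; expected current document
   {:crux.db/id :account-1 :balance 90}]])  ;; replacement, applied only if the above matches
```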

Jorin19:05:03

Never mind. I’m sure I will learn more about this some day. But for now I found a simpler solution for my use case (thanks to the talk at Clojure/north https://youtu.be/3Stja6YUB94?t=2349). I can simply create a single transactor node/thread. That way I can keep all guarantees on write. I’m sure at some point those patterns will also be listed in a place like the FAQ section 🙂
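A minimal sketch of that single-writer idea, funnelling all check-then-write sequences through one agent. The `:refers-to` attribute and `delete-if-unreferenced!` name are made up, and a real version would also want to wait for each transaction to be indexed before relying on the next read:

```clojure
(require '[crux.api :as crux])

(def writer (agent nil))

(defn delete-if-unreferenced! [node eid]
  (send-off writer
            (fn [_]
              ;; The query and the delete run back to back on the single
              ;; writer thread, so no competing write can interleave.
              (let [refs (crux/q (crux/db node)
                                 {:find  '[r]
                                  :where [['r :refers-to eid]]})]
                (when (empty? refs)
                  (crux/submit-tx node [[:crux.tx/delete eid]]))))))
```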

refset09:05:21

Hi again - sorry to keep you waiting on a response! Your use-case of "CAS deletion when no nodes contain a given attribute" is a good example of a higher-level transactional constraint and your single-writer solution is the correct answer for now. We would definitely like to see libraries and decorators emerge to address this area. Someone on reddit asked a similar question: https://www.reddit.com/r/Clojure/comments/bohl4a/clojurenorth_the_crux_of_bitemporality/enqodda/ Also this HN thread was interesting to read a couple of days ago: https://news.ycombinator.com/item?id=19907771 ...we will definitely be investing more thought and energy in this as well!

Jorin09:05:30

Awesome! Thank you for sharing these discussions 🙂

👍 4