#xtdb
2020-07-21
lgessler 03:07:57

hi, can anyone speak to using crux as a granular graph database? i have some graph data where every "unit" (a typical screenful for a user) would consist of 1000s of nodes, each with only a few attributes, in a graph db like neo4j. i'm drawn to crux for its awesome features but i'm worried about perf if i use it this way, and i don't want to de-granulate (turn my 1000s of nodes into bigger documents) because i fear it'll make queries hard

jarohen 08:07:10

Hi Luke 🙂 Crux does have some work to do for each document added - we maintain a bitemporal index for each document so that we know which version of the document was active at any point in our 2D time (valid time/transaction time). One of the advantages of documents in this case is that the ingest-time maintenance and the query-time lookups are shared for all the attributes in a single document, so with more documents there'd naturally be more work to do. That said, if you're mainly adding documents at the current time and not often querying back through transaction time (back through valid time is fine) this overhead should be relatively small - both ingest and query are optimised for this case. Best, as always, to profile it for your individual use case 🙂 Happy to help out where we can - feel free to email [email protected] or DM if it involves details you're not comfortable sharing publicly
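
To make the "2D time" idea concrete, here is a toy model of a bitemporal lookup. It is only a sketch: the `BitemporalStore` class, its method names, and the integer timestamps are all hypothetical, and it says nothing about how Crux actually lays out its indexes. It just shows that each document version carries both a valid time and a transaction time, and that a read picks the version visible at a chosen point in both dimensions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Version:
    valid_time: int  # when the fact holds in the modelled world
    tx_time: int     # when the store recorded it
    doc: Optional[dict]


class BitemporalStore:
    """Toy in-memory model; not Crux's implementation."""

    def __init__(self):
        self._history = {}  # entity id -> list of Version

    def put(self, eid, doc, valid_time, tx_time):
        self._history.setdefault(eid, []).append(Version(valid_time, tx_time, doc))

    def entity(self, eid, valid_time, tx_time):
        # Only versions already recorded (tx_time) and already in effect
        # (valid_time) at the requested coordinates are candidates.
        visible = [
            v for v in self._history.get(eid, [])
            if v.valid_time <= valid_time and v.tx_time <= tx_time
        ]
        if not visible:
            return None
        # Latest valid time wins; a later transaction corrects an earlier one.
        return max(visible, key=lambda v: (v.valid_time, v.tx_time)).doc


store = BitemporalStore()
store.put("a1", {"name": "v1"}, valid_time=1, tx_time=1)
store.put("a1", {"name": "v2"}, valid_time=5, tx_time=2)
store.put("a1", {"name": "v1-corrected"}, valid_time=1, tx_time=3)

as_first_recorded = store.entity("a1", valid_time=3, tx_time=1)
after_correction = store.entity("a1", valid_time=3, tx_time=3)
current = store.entity("a1", valid_time=10, tx_time=3)
```

Every document added means one more history to maintain and search, which is the per-document cost described above; attributes inside one document share that history.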

lgessler 17:07:19

thank you for the reply, that's helpful! Right, so for my usecase I think querying at a time other than the present (in either dimension) will be pretty rare, so that's not a concern. Now, assuming join performance is good enough, I guess my last remaining major concern is that I often want to query over entire subgraphs, e.g. to delete some a1, some b1 and b2 which are joined on some attribute :foo to a1, and some c1, c2, c3 which are joined to b1 or b2 on :bar, etc. Neo4j's Cypher query language makes this really easy: MATCH (a1 {id: ...}) OPTIONAL MATCH (a1)<-[*]-(n) DETACH DELETE a1,n, which reads sth like "Find a1, and delete a1 along with any nodes that can reach a1 by following outgoing edges". In crux, I think the way I'd do something similar would be to use https://opencrux.com/docs#queries_rules to implement some kind of subgraph rule, though as far as I can tell this would require me to enumerate every attribute within that subgraph that is a join I want to follow. This isn't the end of the world, especially since datalog in Clojure is just data (so I could compose query fragments), but I was wondering if you know of any easier ways for querying subgraphs?
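
The traversal being described (a1, plus everything that can reach a1 through a known set of join attributes) can be sketched outside any database. The following is a minimal model over plain dictionaries; the document shape, the `"id"` key, and the function names are assumptions made for illustration, and the attribute names mirror the :foo and :bar of the message above. It is not a Crux query.

```python
def find_referrers(docs, target_id, join_attrs):
    """Ids of documents that point at target_id through any listed attribute."""
    found = set()
    for doc in docs:
        for attr in join_attrs:
            if doc.get(attr) == target_id:
                found.add(doc["id"])
    return found


def subgraph_ids(docs, root_id, join_attrs):
    """root_id plus every id that can reach it by following join_attrs."""
    seen = {root_id}
    frontier = [root_id]
    while frontier:
        current = frontier.pop()
        for referrer in find_referrers(docs, current, join_attrs):
            if referrer not in seen:  # guards against cycles
                seen.add(referrer)
                frontier.append(referrer)
    return seen


docs = [
    {"id": "a1"},
    {"id": "b1", "foo": "a1"},
    {"id": "b2", "foo": "a1"},
    {"id": "c1", "bar": "b1"},
    {"id": "c2", "bar": "b1"},
    {"id": "c3", "bar": "b2"},
    {"id": "a2"},
    {"id": "x1", "foo": "a2"},  # belongs to a different subgraph
]

to_delete = subgraph_ids(docs, "a1", ["foo", "bar"])
remaining = [d["id"] for d in docs if d["id"] not in to_delete]
```

Note that each hop rescans every document for every join attribute, which is the enumeration cost raised in the message.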

refset 18:07:35

Hi @lgessler it sounds like rules will probably help, but you may want to borrow some ideas from here: https://github.com/juxt/crux/blob/7268f97d738c8c1df0c78ca954c7ffbd4a0002d0/docs/example/imdb/src/imdb/main.clj

refset 18:07:14

Actually, I suspect your specific use-case will inevitably involve scanning through (in Clojure) all possible attributes that might be references pointing at a1. To make this more ergonomic (avoiding Clojure) we would ideally support variables in the attribute positions of a triple clause, and to make this more efficient (particularly of concern if you have many thousands of attributes to scan through each time) we would really have to introduce an additional index. Incidentally I shared some fairly relevant thoughts about such an index here: https://github.com/tonsky/datascript/issues/351
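
One way to picture the kind of additional index mentioned here is a reverse reference map from a referenced id to the (entity, attribute) pairs that point at it. The sketch below is an assumption-laden illustration, not the design proposed in the linked issue and not anything Crux provides: it treats any attribute value equal to a known document id as a reference, and it assumes values are scalars or flat collections of scalars.

```python
from collections import defaultdict


def build_reference_index(docs, id_key="id"):
    """Map each referenced id to the set of (referrer id, attribute) pairs."""
    known_ids = {doc[id_key] for doc in docs}
    index = defaultdict(set)
    for doc in docs:
        for attr, value in doc.items():
            if attr == id_key:
                continue
            # Accept single values and flat collections alike.
            values = value if isinstance(value, (list, set, tuple)) else [value]
            for v in values:
                if v in known_ids:  # assumption: a value equal to an id is a reference
                    index[v].add((doc[id_key], attr))
    return index


docs = [
    {"id": "a1", "label": "root"},
    {"id": "b1", "foo": "a1"},
    {"id": "b2", "foo": "a1"},
    {"id": "c1", "bar": ["b1", "b2"]},
]

index = build_reference_index(docs)
pointing_at_a1 = sorted(index.get("a1", ()))
pointing_at_b1 = sorted(index.get("b1", ()))
```

With such a map, "what points at a1?" becomes a single lookup that also reports the attribute, so the caller does not need to name the attributes in advance. The trade-off is extra work and storage on every write.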

🚀 3
refset 18:07:39

Are fast and transactional deletion operations critical for your use-case?

lgessler 18:07:18

re: the multiple attribute problem, the attributes I expect to be in play when I'm finding my subgraph are a closed set, so I can keep a query fragment around (something like what was used in that blog post) to list all of them. as long as I have that I think I don't need attribute-position variables, right?
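
Keeping a reusable fragment for a closed attribute set could look roughly like the following. In Clojure the rules would be plain data rather than strings; this sketch only renders text in the shape of the recursive rule example from the Crux query docs, one base clause and one recursive clause per attribute. The rule name `reaches`, the variable names, and the helper function are hypothetical, and the output has not been run against a Crux node.

```python
def reaches_rules(join_attrs, rule_name="reaches"):
    """Render Datalog rule clauses for 'n reaches root via any join attribute'."""
    rules = []
    for attr in join_attrs:
        # Base case: ?n points directly at ?root through this attribute.
        rules.append(f"[({rule_name} ?n ?root) [?n {attr} ?root]]")
        # Recursive case: ?n points at some ?m that itself reaches ?root.
        rules.append(
            f"[({rule_name} ?n ?root) [?n {attr} ?m] ({rule_name} ?m ?root)]"
        )
    return "[" + "\n ".join(rules) + "]"


# The closed set of join attributes, written once and reused by every query.
JOIN_ATTRS = [":foo", ":bar"]

rules_edn = reaches_rules(JOIN_ATTRS)
print(rules_edn)
```

Adding a new join attribute then means extending `JOIN_ATTRS` in one place, with no attribute-position variables required.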

👍 3
lgessler 18:07:59

re: deletion operations, speed doesn't matter at all (deletes are very rare), and while atomicity would be nice I suppose, I don't expect any problems would arise from non-transactional deletes of entities in the subgraph

🙂 3
lgessler 19:07:33

I guess I'm curious about why you asked though--are either of those things difficult to achieve with crux in this kind of usecase?

refset 19:07:44

I wouldn't say difficult, just not as optimal as things could be (ergonomics & performance), and certainly worth careful testing and validation against the various non-functional reqs

👍 3
lgessler 04:07:16

thanks a lot for the detailed answers! i'm really hopeful crux will work out once i get the time to take a more detailed look. bitemporality is such a killer feature 🙂

👌 3
Scot 22:07:53

Does anyone know if there is a way to convert map ids back into maps when returned from a query?

dominicm 06:07:31

Are you looking for crux/entity?

refset 08:07:46

What values do you see being returned if not the maps? Are you seeing hashes?

Scot 18:07:57

They're tagged literal #crux/id s, (e.g. #{[#crux/id "{:type :foo :foo-id \"my-foo\"}"] [#crux/id "{:type :foo :foo-id \"another-foo\"}"]})

Scot 19:07:47

Oops. looks like they don't need to be explicitly tagged on submit-tx

refset 19:07:28

Ah yes, that makes sense. I've not considered that combination before! Glad you figured it out 🙂