xtdb 2021-06-05 | Slack Archive

Santiago09:06:06

👋 I recently tried Neo4j to model some data and answer questions like “how many hops away from node k is a node with attribute = x”. Now I would like to experiment with a Clojure solution, and I was wondering whether Crux is a good database for answering these sorts of questions, or whether something else is more appropriate.

refset20:06:37

Hey @UFPEDL1LY yes Crux can certainly answer those kinds of graph questions, although the queries are invariably more complex to write using Datalog than with Cypher (which is fair, since Datalog is far more general purpose). There have been discussions over on Zulip with similar examples, for instance: https://juxt-oss.zulipchat.com/#narrow/stream/194466-crux/topic/Recursive.20query.20performance.20on.20bi-directional.20relations/near/209796850 ...I'll paste a query from that thread here to give a flavour, since not everyone here is signed up to the Crux Zulip, but I recommend taking a look at that thread:

;; Assumes (require '[crux.api :as c]) and a started Crux node bound to `node`.
(c/submit-tx node
             [[:crux.tx/put {:crux.db/id :user-1
                             :name "User 1"
                             :friends #{:user-2 :user-3}}]
              [:crux.tx/put {:crux.db/id :user-2
                             :name "User 2"
                             :friends #{:user-1 :user-4}}]
              [:crux.tx/put {:crux.db/id :user-3
                             :name "User 3"
                             :friends #{:user-1 :user-5}}]
              [:crux.tx/put {:crux.db/id :user-4
                             :name "User 4"
                             :friends #{:user-2}}]
              [:crux.tx/put {:crux.db/id :user-5
                             :name "User 5"
                             :friends #{:user-3 :user-6}}]
              [:crux.tx/put {:crux.db/id :user-6
                             :name "User 6"
                             :friends #{}}]])

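;; Find every user reachable from :user-2 via :friends, with a depth counter d
;; (0 = direct friend). In the rule heads, [a d] are bound (input) arguments.
(c/q (c/db node)
     {:find '[a b d]
      :args [{'a :user-2}]
      :rules '[;; Base case: b is a direct friend of a; pass the depth through.
               [(friend [a d] b d*)
                [(not= a b)]
                [a :friends b]
                [(identity d) d*]]
               ;; Recursive case: step to a friend t of a, incrementing the
               ;; depth, and stop recursing once d reaches 10.
               [(friend [a d] b d**)
                [(not= a b)]
                [(+ d 1) d*]
                [(< d 10)]
                [a :friends t]
                (friend t d* b d**)]]
      :where '[(friend a 0 b d)]})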
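
For comparison, the original "how many hops away is a node with attribute = x" question can also be sketched outside the database as a breadth-first search in plain Clojure. This is only an illustration, not something from the Zulip thread: the `users` map is a hypothetical in-memory copy of the documents above, and `hops-to` is an invented helper name. Note that it counts edges, so a direct friend is 1 hop away.

```clojure
;; Hypothetical in-memory copy of the documents above, keyed by :crux.db/id.
(def users
  {:user-1 {:name "User 1" :friends #{:user-2 :user-3}}
   :user-2 {:name "User 2" :friends #{:user-1 :user-4}}
   :user-3 {:name "User 3" :friends #{:user-1 :user-5}}
   :user-4 {:name "User 4" :friends #{:user-2}}
   :user-5 {:name "User 5" :friends #{:user-3 :user-6}}
   :user-6 {:name "User 6" :friends #{}}})

(defn hops-to
  "Breadth-first search from `start`, following :friends edges.
  Returns the number of hops to the nearest node (other than `start`)
  whose document satisfies `pred`, or nil if no such node is reachable."
  [graph start pred]
  (loop [frontier #{start}
         seen     #{start}
         depth    0]
    (when (seq frontier)
      (let [;; All not-yet-visited neighbours of the current frontier.
            next-frontier (into #{}
                                (comp (mapcat #(get-in graph [% :friends]))
                                      (remove seen))
                                frontier)
            depth'        (inc depth)]
        (if (some #(pred (get graph %)) next-frontier)
          depth'
          (recur next-frontier (into seen next-frontier) depth'))))))

;; Example call: hops from :user-2 to the nearest user named "User 6".
(hops-to users :user-2 #(= "User 6" (:name %)))
```

Because each node is visited at most once, the search terminates even though the friendships above contain cycles.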

refset20:06:15

If you have a specific Cypher query you're keen to translate I'd be happy to help

refset20:06:55

Rather coincidentally, there was a thread about graph databases on Reddit yesterday where I shared a link about some SPARQL->Cypher translation that we use in crux-bench https://www.reddit.com/r/Clojure/comments/ns5l9s/how_to_query_datomic_datascript_asami_or_other/h0m4fh6?utm_source=share&utm_medium=web2x&context=3

Santiago08:06:07

@U899JBRPF nice one, thanks! I’m on Zulip too, so I’ll check the conversation over there. I looked at Asami and opened a discussion thread about this. It seems to be a better match for my needs, but I’m still confused about the whole schemaless concept: for example, what format should I use to model relationships?

refset21:06:19

Cool, I know Asami definitely has some interesting graph-y features, but I've not attempted to use them. That said, I don't think there's anything in particular that Crux would struggle with (aside from wildcard attributes, perhaps). Did you see the other thing I wrote in a comment on that Reddit thread about implementing bi-directional breadth-first search?

> implementing efficient "graph algorithms" requires some effort, e.g. see https://hashrocket.com/blog/posts/using-datomic-as-a-graph-database (we also adapted this implementation for https://github.com/juxt/crux/blob/08d6f6451ee297508d43507dfe76e9993de19129/docs/example/imdb/src/imdb/main.clj#L106-L207). By contrast, something like Neo4j additionally provides a bunch of graph algorithms out-of-the-box

Modelling relationships in Crux comes down to generality vs performance: you can have a Crux entity that represents a "relationship", or you can rely on direct references, which can only store properties indirectly.
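
To make the two modelling options concrete, here is a minimal sketch of both document shapes as plain Clojure data. The `:friendship/*` attribute names and the `->direct` helper are invented for illustration and are not a Crux convention; the only Crux-specific assumption is that every document carries a `:crux.db/id`.

```clojure
;; Option A: direct references. Compact to traverse, but the edge itself
;; has nowhere to hold properties such as a start date.
(def direct-docs
  [{:crux.db/id :user-1 :friends #{:user-2}}
   {:crux.db/id :user-2 :friends #{}}])

;; Option B: the relationship is its own document, so it can carry properties.
;; The :friendship/* attribute names are hypothetical.
(def relationship-docs
  [{:crux.db/id :user-1}
   {:crux.db/id :user-2}
   {:crux.db/id :friendship-1
    :friendship/from :user-1
    :friendship/to :user-2
    :friendship/since 2019}])

(defn ->direct
  "Collapses relationship documents into direct :friends references,
  discarding any properties stored on the relationship."
  [docs]
  (let [edges   (filter :friendship/from docs)
        users   (remove :friendship/from docs)
        by-from (reduce (fn [m {:friendship/keys [from to]}]
                          (update m from (fnil conj #{}) to))
                        {}
                        edges)]
    (mapv #(assoc % :friends (get by-from (:crux.db/id %) #{})) users)))

;; Option B can always be reduced to Option A, but not the other way round.
(->direct relationship-docs)
```

The trade-off is visible in the conversion: going from relationship documents to direct references loses `:friendship/since`, while traversing Option B costs an extra join per hop.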

dominicm20:06:07

What is Crux's pagination story atm? Is there an equivalent to https://use-the-index-luke.com/no-offset for giant datasets?

dominicm20:06:13

https://jayanta-mondal.medium.com/the-curious-case-of-pagination-for-gremlin-queries-d6fd9518620 just stumbled onto this which might answer my question generally.

refset21:06:21

Hey, after a quick skim of the TinkerPop article it looks pretty much on point, and broadly applicable to Crux, thanks for mentioning it (I'll have to read it thoroughly soon!). However, a similar question came up on GitHub recently and I wrote up a bunch of stuff about what you can do using the current query engine: https://github.com/juxt/crux/discussions/1514

dominicm21:06:16

Yeah, I saw that. It's a bit tough for us to maintain our own indexes, and it's not very easy to invert strings either, although perhaps possible. 1515 looks pretty good, although only if it also allowed tackling offset. If values were provided as the offset, could that help jump in? I don't remember the KV discussions well enough to say whether that's the case or not.

refset22:06:32

I imagine 1515 could handle offsets too, so I've made a note about it on there. Solving the duplication/inversion problem would be trickier but doesn't seem impossible - IIRC the real issue is that iterating with KV prefix scans is typically only efficient going in the "default" direction
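
As a rough illustration of the "values as the offset" idea (keyset pagination, as in the no-offset article), here is a sketch that uses a plain Clojure sorted set as a stand-in for an ordered KV index. It is not Crux's API and not the design discussed in 1515; `page-after`, `page-before`, and `by-name` are hypothetical names.

```clojure
(defn page-after
  "Returns up to `limit` items from the sorted set `index` that sort strictly
  after `cursor`. Pass nil as the cursor to fetch the first page."
  [index cursor limit]
  (vec (take limit (if (nil? cursor)
                     (seq index)
                     (subseq index > cursor)))))

(defn page-before
  "Same idea in the reverse direction, walking backwards from `cursor`."
  [index cursor limit]
  (vec (take limit (if (nil? cursor)
                     (rseq index)
                     (rsubseq index < cursor)))))

;; Hypothetical index of [sort-value entity-id] pairs; the id breaks ties
;; so that every entry has a unique position in the ordering.
(def by-name
  (sorted-set ["Ann" :user-3] ["Bob" :user-1] ["Bob" :user-4] ["Cat" :user-2]))

;; First page, then the next page using the last item seen as the cursor.
(let [page-1 (page-after by-name nil 2)
      page-2 (page-after by-name (peek page-1) 2)]
  [page-1 page-2])
```

An in-memory sorted set can be walked cheaply in either direction, which is exactly the property that is harder to get from KV prefix scans that are only efficient in the default direction.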
