#xtdb
2020-10-02
nivekuil21:10:56

which is the better way of modeling many-to-many edges with data on the edges? I think I see a familiar tradeoff here

;; (1)
{:crux.db/id           1
 :user/friends         #{2 3}
 :user/friend-metadata {2 {:close? true}
                        3 {:close? false}}}

;; (2)
{:crux.db/id {:friendship #{1 2}}
 :friends    #{1 2}
 :close?     true}
{:crux.db/id {:friendship #{1 3}}
 :friends    #{1 3}
 :close?     false}

refset22:10:02

I suspect Option 2 is what you want to pursue although the performance will be somewhat worse due to the extra indirection. The problem with Option 1 is that your map value contents under friend-metadata won't get indexed and will therefore be invisible to the Datalog (which I'm guessing will be problematic for how you hope to use Datalog - though I may be wrong!).
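
As a rough illustration of what Option 2 makes possible, "all close friends of 1" could be asked directly in Datalog. This is a minimal sketch, not something posted in the thread: the attributes `:friends` and `:close?` come from the example documents, the var name `close-friends-of-1` is hypothetical, and the query is shown as plain data that would be handed to `crux.api/q`. It assumes each element of the `:friends` set is indexed as a separate value.

```clojure
;; Sketch only: a Crux Datalog query over the Option 2 documents.
;; `close-friends-of-1` is a hypothetical name; the attributes
;; (:friends, :close?) come from the example documents.
(def close-friends-of-1
  '{:find  [friend]
    :where [[f :friends 1]        ; friendship documents that include user 1
            [f :friends friend]   ; every member of those friendships
            [f :close? true]      ; keep only the close ones
            [(not= friend 1)]]})  ; drop user 1 from the results

;; Assumed usage, given a started node bound to `node`:
;; (crux.api/q (crux.api/db node) close-friends-of-1)
```

The extra join through the friendship document `f` is the indirection mentioned above.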

nivekuil22:10:06

I think you can ask the same kinds of questions with (1), but you'd have to reach into the map in application logic rather than relying on the query engine

nivekuil22:10:33

well, actually, no it wouldn't be as powerful since you can't reach into the map in a nested fashion efficiently

nivekuil22:10:17

so if you wanted to ask "give me all close friends of 1" efficiently you'd have to build your own index from that map somehow, which would indeed suck.. but you could answer "is 2 a close friend?" pretty well
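
A small sketch of the application-side logic being described for Option 1, using only core Clojure. The function names `close-friend?` and `close-friends` are hypothetical, and the document is assumed to have already been fetched from the node.

```clojure
;; Option 1 document, copied from the example at the top of the thread.
(def user-1
  {:crux.db/id           1
   :user/friends         #{2 3}
   :user/friend-metadata {2 {:close? true}
                          3 {:close? false}}})

;; "Is 2 a close friend?" is a direct nested lookup.
(defn close-friend?
  [user friend-id]
  (true? (get-in user [:user/friend-metadata friend-id :close?])))

;; "All close friends" has to walk the whole metadata map in
;; application code, since the query engine cannot see inside it.
(defn close-friends
  [user]
  (set (for [[friend-id {:keys [close?]}] (:user/friend-metadata user)
             :when close?]
         friend-id)))

(close-friend? user-1 2)
(close-friends user-1)
```

Walking one document like this is cheap, but any question that spans many users' maps would need a hand-built index, which is the cost being pointed out here.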

refset22:10:37

There is some discussion about modelling "friend" relationships over on Zulip that might be of interest, in case you hadn't seen it: https://juxt-oss.zulipchat.com/#narrow/stream/194466-crux/topic/Recursive.20query.20performance.20on.20bi-directional.20relations/near/209986334 Valery (in that thread) and I also exchanged quite a few DMs on the topic. Feel free to revive the thread over there if you're curious and maybe we can have some fresh debate and exchange findings 🙂

nivekuil22:10:01

reading through it -- one of these days I'll set up a matrix<->zulip bridge; being able to revive threads is indeed nice

😁 3
dominicm21:10:04

Just poking around crux-jdbc, seems like querying the tx-log doesn't use any kind of limits. Does that mean that a big catchup on some db drivers (the ones which don't support paging) will cause an OOM? I see the KVIterator uses batches of 100.

refset21:10:51

Hmm, seems plausible. I'll note it on the board to investigate further. Thanks for mentioning it!

nivekuil22:10:37

so I actually suspect that it may be an anti-pattern to model undirected edges at all, since you can query just as efficiently in either direction. The friendship example being undirected edges is just the canonical model I reach for since that's how fb modeled it, but I think it makes more sense with directed edges (especially the property of closeness), and I can't think of any examples that can't be modeled with only directed edges. Plus neo4j only has directed edges, so it probably makes sense to think of Crux in the same manner?

💡 3
Steven Deobald23:10:34

I've generally come to the same conclusion. Every time I think I've thought of a meaningful undirected edge, I realize I just haven't given the relationship enough time in my head.

refset23:10:34

I'm inclined to agree also, directed edges seem to be far more universal. If you really need it, you can model the notion of undirectedness as a Datalog rule. Also, friendship isn't an edge, it's a feeling 😉
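
One way the "undirectedness as a Datalog rule" idea might look, as a sketch under assumptions: the directed-edge attribute `:user/friend`, the rule name `connected`, and the var name `undirected-pairs` are all hypothetical and do not appear in the thread. The query is plain data intended for `crux.api/q`.

```clojure
;; Sketch only: edges are stored in one direction, e.g.
;;   {:crux.db/id 1 :user/friend #{2 3}}
;; and the rule matches an edge in either direction.
(def undirected-pairs
  '{:find  [a b]
    :where [(connected a b)]
    :rules [[(connected a b) [a :user/friend b]]    ; edge as stored
            [(connected a b) [b :user/friend a]]]}) ; same edge, reversed

;; Assumed usage, given a started node bound to `node`:
;; (crux.api/q (crux.api/db node) undirected-pairs)
```

With two rule bodies under the same head, either one matching is enough, so each stored edge is seen from both ends without storing it twice.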

❤️ 3