Fork me on GitHub
#xtdb
<
2021-07-12
>
Noah Bogart04:07:30

Hey all. I’m getting increasingly frustrated with mongodb as the backend for my little web app multiplayer turn based game. Are there any resources on transitioning from mongo to crux? I’m well versed in sql if it helps and very new to datalog

refset08:07:33

Hey @UEENNMX0T! We don't have resources on transitioning from Mongo specifically, but we did begin implementing a Mango->Datalog query compiler a few months ago, see https://github.com/jonpither/crux/blob/select/crux-test/test/crux/select_test.clj and https://github.com/jonpither/crux/blob/select/crux-core/src/crux/select.clj - unfortunately that work isn't available as part of Crux officially yet but perhaps you can borrow some ideas. Otherwise I'd suggest just running through our tutorials to start feeling comfortable with Datalog https://opencrux.com/tutorials/tutorials.html I'm very happy to answer any questions or help with specific problems 🙂

refset09:07:42

Do you have an example of a query from Mongo that you'd like to see ~roughly converted to Datalog?

Noah Bogart13:07:37

Less query specific but how to move data from within a mongo database into a crux database

refset14:07:15

Ah! I misread 😄 Same answer though...no resources on performing the Mongo->Crux data migration. Although you probably want to model Mongo's collections using a :type attribute, since Crux's indexes are comparatively "global" Which Mongo value types are you using?

Noah Bogart14:07:36

no worries, I wasn’t specific enough. Right now it’s text, numbers, dates, and booleans, with lots of nested data

👌 2
Steven Deobald15:07:37

Is your game open source? It might help to take a look at how you're nesting documents, then folks could chip in with pointers and gotchas.

Noah Bogart15:07:04

I should preface with “I inherited this mess!” lmao. We could normalize it, but that’s not great for mongo, so we’ve just leaned into the document storing

Steven Deobald15:07:46

It actually looks pretty amenable to normalization to me, if you want to migrate to Crux. Although Crux can ingest nested maps, it prefers flat records. The contents of a nested map are opaque to the query engine. All the top-level attributes which correspond to further-nested maps could become references. As far as I can tell, the nested maps aren't themselves too nested, which might save you some grief in terms of mapping a big denormalized tree to normalized Crux docs.

Steven Deobald15:07:42

The one thing to keep in mind is that small "mutations" (like this inc: https://github.com/mtgred/netrunner/blob/1b7a9001ed28201e5d70be42bcfb21ea51f65ce5/src/clj/web/stats.clj#L53) are probably something you want to localize to one document as much as possible, since Crux doesn't do structural sharing over a large doc receiving tiny incremental updates. That said, your data appears to be pretty ephemeral? If a game isn't revisited after it's finished, this might not be as big a concern. I suppose it depends how often you update stats mid-game.

Noah Bogart20:07:15

that’s good to know, thank you

nivekuil21:07:06

are range queries expected to work on instants? e.g.

(doseq [i (range 20)]
    (c/put node {:crux.db/id i :t (t/<< (t/now) (t/new-duration i :days))}))

  @(c/q node '{:find  [?e]
               :in    [MAX]
               :where [[?e :t ?t]
                       [(< ?t MAX)]]}
        java.time.Instant/MAX)
the range predicate doesn't seem to affect the result order #{[0] [10] [11] [12] [13] [14] [15] [16] [17] [18] [19] [1] [2] [3] [4] [5] [6] [7] [8] [9]}

refset09:07:33

Hey, yep range queries should work, since Instants are encoded, as per https://github.com/juxt/crux/blob/b990d2f09a52636d885b894c3bdf1fdcdb5c012f/crux-core/src/crux/codec.clj#L340-L346 However, what is the type of (t/<< (t/now) (t/new-duration i :days))? Is that an Instant also? (I don't have a REPL with tick handy, sorry 😅 ) That could explain things though, since range constraints don't respect type boundaries intuitively, mentioned briefly here https://github.com/juxt/crux/pull/1281

nivekuil10:07:51

yup, it's an Instant, should be no comparing across types going on

nivekuil10:07:52

(assert (= (type java.time.Instant/MAX) (type (t/<< (t/now) (t/new-duration 1 :days)))))

refset10:07:26

thanks for confirming. What is the vars-in-join-order ? Available in the crux.query DEBUG log or via crux.query/query-plan-for

nivekuil10:07:29

query-plan-for is new to me, that's nice 🙂

{:depth->constraints [nil nil nil nil],
 :in-bindings [{:bind-type :scalar, :idx-id in165610, :tuple-idxs-in-join-order [0]}],
 :var->bindings {?e #crux.query.VarBinding {:attr :crux.db/id,
                                            :e-var ?e,
                                            :result-index 1,
                                            :result-name ?e,
                                            :type :entity,
                                            :value? false,
                                            :var ?e},
                 ?t #crux.query.VarBinding {:attr :t,
                                            :e-var ?e,
                                            :result-index 0,
                                            :result-name ?e,
                                            :type :entity,
                                            :value? false,
                                            :var ?t},
                 MAX #crux.query.VarBinding {:attr nil,
                                             :e-var nil,
                                             :result-index 2,
                                             :result-name crux.query.value/MAX,
                                             :type :in-var,
                                             :value? true,
                                             :var MAX}},
 :var->cardinality {?e 100.80263456077057, ?t 11.935930428907975},
 :var->joins {?e [{:id triple165609, :idx-fn #<Fn@7b59dd27 crux.query/triple_joins[fn]>}],
              ?t [{:id triple165609, :idx-fn #<Fn@7b59dd27 crux.query/triple_joins[fn]>}],
              MAX [{:id in165610, :idx-fn #<Fn@1aa7c7b crux.query/in_joins[fn]>}]},
 :var->logic-var-range-constraint-fns
   {MAX [#<Fn@418ecdbd crux.query/build_logic_var_range_constraint_fns[fn]>]},
 :var->range-constraints {},
 :vars-in-join-order [?t ?e MAX]}

☺️ 2
refset10:07:52

Ah, sorry, I realise what's happening now! The query planner doesn't prioritise joining against the MAX binding, because it ~can't in the general case (I'm not 100% sure of the details as to why), so if you want the range constraint to be prioritised you have to pass the MAX value in using a literal (not an :in param)

refset10:07:20

I regret that this isn't better documented. There's a note about it here https://github.com/juxt/crux/blob/b990d2f09a52636d885b894c3bdf1fdcdb5c012f/crux-test/test/crux/query_test.clj#L1794-L1797 and several other query tests show the literal behaviour working as expected

nivekuil10:07:44

aha, that probably explains it. how do I bind a literal again

refset10:07:00

using awkward quoting like [(list '< '? java.time.Instant/MAX)]

nivekuil10:07:20

still seems to be the same order

(crux.query/query-plan-for
   (crux/db node)
   {:find  '[?e]
    :where ['[?e :t ?t]
            [(list '< '?t java.time.Instant/MAX)]]}) 
{:depth->constraints [nil nil nil],
 :in-bindings [],
 :var->bindings {?e #crux.query.VarBinding {:attr :crux.db/id,
                                            :e-var ?e,
                                            :result-index 1,
                                            :result-name ?e,
                                            :type :entity,
                                            :value? false,
                                            :var ?e},
                 ?t #crux.query.VarBinding {:attr :t,
                                            :e-var ?e,
                                            :result-index 0,
                                            :result-name ?e,
                                            :type :entity,
                                            :value? false,
                                            :var ?t}},
 :var->cardinality {?e 100.80263456077057, ?t 15.53476867758077},
 :var->joins {?e [{:id triple169226, :idx-fn #<Fn@5fa65ae1 crux.query/triple_joins[fn]>}],
              ?t [{:id triple169226, :idx-fn #<Fn@5fa65ae1 crux.query/triple_joins[fn]>}]},
 :var->logic-var-range-constraint-fns {},
 :var->range-constraints {?t #<Fn@76d4290f crux.query/new_range_constraint_wrapper_fn[fn]>},
 :vars-in-join-order [?t ?e]}

refset11:07:43

that's because MAX is no longer a var, but if we pretend it was, the order would now be [MAX ?t ?e]

refset11:07:52

do the results come out in the order you expect?

nivekuil11:07:24

the results are in the same order as previously, #{[0] [10] [11] [12] [13] [14] [15] [16] [17] [18]...

refset11:07:43

how about if you add :limit 100 to force it to be a vector/bag (in case the set just happens to be printing that way)?

nivekuil11:07:12

ah, yup that was it

nivekuil11:07:58

in fact that was it the whole time, it seems

nivekuil11:07:14

totally forgot that you had to use open-q to make use of the lazy sort

nivekuil11:07:29

you should be able to iterate in either direction, right?

refset11:07:25

cool 🙂 > in fact that was it the whole time, it seems that may be true, but when MAX was a var the order coming out is more "undefined" than when there's a literal (i.e. don't rely on it) There was a discussion about this a few weeks back, but tl;dr it only works in one direction https://github.com/juxt/crux/discussions/1514

nivekuil11:07:01

interestingly, this instantly segfaults

(with-open [res (crux/open-q (crux/db node)
                               {:find  '[?e]
                                :limit 100
                                :where ['[?e :t ?t]
                                        [(list '> '?t java.time.Instant/MIN)]]})]
    (iterator-seq res))

nivekuil11:07:46

it's unlike any crux segfault I've ever seen! too tired to look into this atm, but here's a dump of what appears relevant from the crash log

Stack: [0x00007fe5da1fe000,0x00007fe5da2ff000],  sp=0x00007fe5da2fb8c8,  free space=1014k
Native frames: (J=compiled Java code, A=aot compiled Java code, j=interpreted, Vv=VM code, C=native code)
C  [libc.so.6+0x1c10b0]
J 27181 c1 crux.rocksdb.RocksKvIterator.next()Ljava/lang/Object; (33 bytes) @ 0x00007fe61f9b387c [0x00007fe61f9b3180+0x00000000000006fc]
J 27180 c1 crux.kv.index_store.PrefixKvIterator.next()Ljava/lang/Object; (110 bytes) @ 0x00007fe61f991bd4 [0x00007fe61f991000+0x0000000000000bd4]
J 27283 c1 crux.kv.index_store$step_fn$step__144431$fn__144432.invoke()Ljava/lang/Object; (66 bytes) @ 0x00007fe61f9dbe8c [0x00007fe61f9db040+0x0000000000000e4c]
J 11609 jvmci clojure.lang.LazySeq.sval()Ljava/lang/Object; (42 bytes) @ 0x00007fe623a25754 [0x00007fe623a256a0+0x00000000000000b4]
J 11610 jvmci clojure.lang.LazySeq.seq()Lclojure/lang/ISeq; (53 bytes) @ 0x00007fe623a60c54 [0x00007fe623a60bc0+0x0000000000000094]
J 3280 jvmci clojure.lang.Cons.next()Lclojure/lang/ISeq; (10 bytes) @ 0x00007fe6239de5f4 [0x00007fe6239de580+0x0000000000000074]
J 2047 jvmci clojure.core$next__5404.invoke(Ljava/lang/Object;)Ljava/lang/Object; (7 bytes) @ 0x00007fe6239ae4fc [0x00007fe6239ae420+0x00000000000000dc]
J 27326 c1 crux.index.SeekFnIndex.next_values()Ljava/lang/Object; (108 bytes) @ 0x00007fe61f9fc594 [0x00007fe61f9fb6a0+0x0000000000000ef4]
J 27577 c1 crux.index.DerefIndex.next_values()Ljava/lang/Object; (61 bytes) @ 0x00007fe61fad1774 [0x00007fe61fad0ba0+0x0000000000000bd4]
J 27325 c1 crux.index.NAryJoinLayeredVirtualIndex.next_values()Ljava/lang/Object; (152 bytes) @ 0x00007fe61f9f98c4 [0x00007fe61f9f8a40+0x0000000000000e84]
j  crux.index.PredicateVirtualIndex.next_values()Ljava/lang/Object;+46
j  crux.index.GreaterThanVirtualIndex.next_values()Ljava/lang/Object;+48
j  crux.index.PredicateVirtualIndex.next_values()Ljava/lang/Object;+46
J 27325 c1 crux.index.NAryJoinLayeredVirtualIndex.next_values()Ljava/lang/Object; (152 bytes) @ 0x00007fe61f9f98c4 [0x00007fe61f9f8a40+0x0000000000000e84]
J 27550 c1 crux.index$layered_idx__GT_seq$step__95852.invokePrim(Ljava/lang/Object;JLjava/lang/Object;)Ljava/lang/Object; (446 bytes) @ 0x00007fe61fab185c [0x00007fe61faafee0+0x000000000000197c]
j  crux.index$layered_idx__GT_seq$step__95852.invoke(Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object;+10
j  crux.index$layered_idx__GT_seq$step__95852$fn__95866.invoke()Ljava/lang/Object;+78
J 11609 jvmci clojure.lang.LazySeq.sval()Ljava/lang/Object; (42 bytes) @ 0x00007fe623a25754 [0x00007fe623a256a0+0x00000000000000b4]
J 11610 jvmci clojure.lang.LazySeq.seq()Lclojure/lang/ISeq; (53 bytes) @ 0x00007fe623a60eec [0x00007fe623a60bc0+0x000000000000032c]
J 12778 jvmci clojure.core$seq__5420.invoke(Ljava/lang/Object;)Ljava/lang/Object; (7 bytes) @ 0x00007fe623ffe7fc [0x00007fe623ffe720+0x00000000000000dc]
j  crux.query$query$fn__99980$iter__99982__99986$fn__99987.invoke()Ljava/lang/Object;+22
J 11609 jvmci clojure.lang.LazySeq.sval()Ljava/lang/Object; (42 bytes) @ 0x00007fe623a25754 [0x00007fe623a256a0+0x00000000000000b4]
J 11610 jvmci clojure.lang.LazySeq.seq()Lclojure/lang/ISeq; (53 bytes) @ 0x00007fe623a60c54 [0x00007fe623a60bc0+0x0000000000000094]
J 3904 jvmci clojure.core$take$fn__5928.invoke()Ljava/lang/Object; (79 bytes) @ 0x00007fe623a63c7c [0x00007fe623a63500+0x000000000000077c]
J 11609 jvmci clojure.lang.LazySeq.sval()Ljava/lang/Object; (42 bytes) @ 0x00007fe623a25754 [0x00007fe623a256a0+0x00000000000000b4]
J 11610 jvmci clojure.lang.LazySeq.seq()Lclojure/lang/ISeq; (53 bytes) @ 0x00007fe623a60c54 [0x00007fe623a60bc0+0x0000000000000094]
J 3263 jvmci clojure.lang.SeqIterator.hasNext()Z (64 bytes) @ 0x00007fe6239a7a5c [0x00007fe6239a7760+0x00000000000002fc]
j  crux.io.Cursor.hasNext()Z+9
J 23824 jvmci clojure.lang.RT$4.invoke()Ljava/lang/Object; (69 bytes) @ 0x00007fe6245dd51c [0x00007fe6245dd380+0x000000000000019c]
J 11609 jvmci clojure.lang.LazySeq.sval()Ljava/lang/Object; (42 bytes) @ 0x00007fe623a25754 [0x00007fe623a256a0+0x00000000000000b4]
J 11610 jvmci clojure.lang.LazySeq.seq()Lclojure/lang/ISeq; (53 bytes) @ 0x00007fe623a60c54 [0x00007fe623a60bc0+0x0000000000000094]
J 11617 jvmci clojure.core$seq__5420.invokeStatic(Ljava/lang/Object;)Ljava/lang/Object; (7 bytes) @ 0x00007fe623bbbd9c [0x00007fe623bbbd00+0x000000000000009c]
j  clojure.core$print_sequential.invokeStatic(Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object;+195

nivekuil11:07:29

seems to be iterator-seq that's triggering it, returning nothing is fine.

nivekuil11:07:11

I observe this being sorted by ?id instead of ?date

(crux.query/query-plan-for
   (crux/db node)
   {:find  '[?date ?id]
    :where [['?e :entry/view 1]
            '[?e :entry/id ?id]
            '[?e :entry/updated-date ?date]
            [(list '< '?date java.time.Instant/MAX)]]
    :limit 250})
but the query plan has :vars-in-join-order [1 ?e ?date ?id]

nivekuil11:07:08

I think I would expect ?date to come before ?e there?

refset12:07:39

the segfault looks surprising to me, I'll try to repro it

refset12:07:32

it's hard to judge exactly why the planner chose that join order, but I suspect it will be due to the relatively cardinality/selectivity of the :entry/updated-date values vs the :entry/view values

refset12:07:57

e.g. are there significantly more :entry/updated-date KVs with a more diverse set of values, than for :entry/view?

nivekuil12:07:43

:var->cardinality
   {1 0.0, ?date 170.46747929617274, ?e 1.3768786594190381E-5, ?id 1.7976931348623157E308}

nivekuil12:07:54

not quite sure how to intrepret the fractional ?e

nivekuil13:07:30

here's a repro for you for the segfault

(ns test
  (:require [crux.api :as crux]
            [ :as io]
            [tick.alpha.api :as t]))

(defn start-crux! []
  (letfn [(kv-store [dir]
	        {:kv-store {:crux/module 'crux.rocksdb/->kv-store
	                    :db-dir      (io/file dir)
	                    :sync?       true}})]
	(crux/start-node
	 {:crux/tx-log         (kv-store "data/dev/tx-log")
	  :crux/document-store (kv-store "data/dev/doc-store")
	  :crux/index-store    (kv-store "data/dev/index-store")})))

(defonce node (start-crux!))
(doseq [i (range 100)]
  (crux/submit-tx node [[:crux.tx/put {:crux.db/id i :t (t/ago (t/new-duration i :days))}]]))
(with-open [res (crux/open-q (crux/db node)
                             {:find  '[?e]
                              :limit 100
                              :where ['[?e :t ?t]
                                      [(list '> '?t java.time.Instant/MIN)]]})]
  (iterator-seq res))

🙏 2
nivekuil13:07:42

deps

{:deps
 {pro.juxt.crux/crux-core    {:mvn/version "1.17.1"}
  pro.juxt.crux/crux-rocksdb {:mvn/version "1.17.1"}
  tick/tick                  {:mvn/version "0.4.32"}}}

refset13:07:05

> not quite sure how to intrepret the fractional `?e` me either 😅

refset13:07:22

thanks for the repro, I've added it to the project board as a gist https://gist.github.com/refset/8f09f7ff0bf553b08e7428daa2c820c8

refset13:07:01

> e.g. are there significantly more `:entry/updated-date` KVs with a more diverse set of values, than for `:entry/view`? do you have anecdotal/relative numbers for these in mind? (ignoring what's actually in the index)

nivekuil13:07:16

yes, updated-date would have the higher cardinality (view is a one-to-many)

refset13:07:14

ah right, that would probably explain it then

refset13:07:21

if you really want to process the query in your preferred order you could decompose it into 2 queries, or handle the range constraint in a subquery (though this means realising the full result set before the outer query can start processing)

nivekuil14:07:34

I guess ordering has to be maintained externally here for an efficient lookup

nivekuil14:07:31

really I'd like to get rid of updated-date completely (it's just an indexed vt), but I guess having a separate vt index wouldn't help here

refset19:07:01

I feel like re-ordering of the result set is almost always going to have to be the final step in any serious db query, yeah, unless you have some super exact knowledge of the data evolution ahead of time and build some kind of ideal index structure to support it

refset19:07:46

> really I'd like to get rid of updated-date completely (it's just an indexed vt) As discussed before, it's coming 😄

nivekuil21:07:24

fortunately I've got a denormalized data store I can use, so I can in fact get rid of updated-date 🙂 I think it must be categorically wrong to expect any kind of ordering from crux, as a data modeling rule

👍 2
refset21:07:24

Unless you use :order-by !

nivekuil21:07:24

well, that's really more clojure doing the sorting 😛

refset22:07:18

true! it does know how to spill to disk though

jarohen13:07:12

re the segfault - this is because iterator-seq is lazy, so it's trying to access the query results after the with-open has closed the query's resources

🙏 2
✔️ 2
jarohen13:07:24

something like (vec (iterator-seq ...)) should do it?

nivekuil20:07:10

I wouldn't expect a iterator-seq with nothing else going on to even do anything (should be a lazy view over the iterator?), much less have some action outside the with-open block, but I'll gladly take your word for it here

jarohen20:07:09

it is a lazy view over the iterator - if you're running this from the REPL, it might be that your REPL is trying to print out the seq?

jarohen20:07:43

if it is, it'll be evaluating the seq after the with-open closes

jarohen20:07:07

same as if you tried to do (with-open [rdr (io/reader ...)] (line-seq rdr)), say

jarohen20:07:56

another way to check it would be to def it, and then see whether it blows up when you eval the (def foo ...) or when you then check the value of foo

nivekuil21:07:25

ahh, right, the block would return a reference and the repl doesn't know that it's been "closed". that makes sense, thanks 🙂

Steven Deobald21:07:33

Those almost look string-sorted. 😕 If you give it a few more days, do they show up between [1] and [2]?

nivekuil21:07:19

#{[0] [10] [11] [12] [13] [14] [15] [16] [17] [18] [19] [1] [20] [21] [22] [23] [24] [25] [26] [27]
  [28] [29] [2] [30] [31] [32] [33] [34] [35] [36] [37] [38] [39] [3] [40] [41] [42] [43] [44] [45]
  [46] [47] [48] [49] [4] [50] [51] [52] [53] [54] [55] [56] [57] [58] [59] [5] [60] [61] [62] [63]
  [64] [65] [66] [67] [68] [69] [6] [70] [71] [72] [73] [74] [75] [76] [77] [78] [79] [7] [80] [81]
  [82] [83] [84] [85] [86] [87] [88] [89] [8] [90] [91] [92] [93] [94] [95] [96] [97] [98] [99] [9]}

2
seancorfield21:07:11

I don't know whether it's been mentioned here but coming up this week: https://www.meetup.com/Los-Angeles-Clojure-Users-Group/events/279378615/

🙏 4
👍 4
seancorfield21:07:45

(I attend this meetup sometimes, even tho' I'm in the San Francisco area)

Steven Deobald22:07:32

@kevin842 I'm a bit confused by this on a number of levels. The Clojure less-than doesn't like java.time.Instant at all, so I'm a little surprised Crux doesn't get angry for the same reasons. But I really wouldn't expect it to sort in that (string-ified, presumably) order, if at all.

Steven Deobald22:07:28

@seancorfield This is probably okay to tweet to a wider audience?

seancorfield22:07:31

Sure, I think they only have 100 seats on their Zoom license for this tho' 🙂

Steven Deobald22:07:17

Pretty sure we don't have quite that much reach. 😉 I just wanted to make sure a bunch of strangers wouldn't throw off the LA CUG vibe.

seancorfield22:07:08

They already get a bunch of non-local folks joining. It's been really interesting to see how user groups have expanded geographically after going virtual.

👍 4
richiardiandrea23:07:00

Hi there question about crux-console...The repo is read-only - is the plan to take it out of Crux proper?

refset23:07:24

Hey, for others' context, you're talking about https://github.com/crux-labs/crux-console - that repo is deprecated in favour of the more recently UI now built-in to crux-http-server although the two aren't equivalent in power in various dimensions

🙏 2
Steven Deobald23:07:38

I was just sanity-checking my reply... why aren't you asleep? 😉

🤪 2
richiardiandrea23:07:44

yep thanks @taylor.jeremydavid - was referring to that repo as I did not notice the other crux-http-server one 😉

richiardiandrea23:07:44

yep thanks @taylor.jeremydavid - was referring to that repo as I did not notice the other crux-http-server one 😉