#xtdb
2020-10-01
nivekuil06:10:26

it seems it isn't possible to say "give me the entity with this attribute, regardless of its value, even if it's an empty set or vector". Not sure how to get around that, actually

refset08:10:55

Hmm, it sounds like you can't avoid using entity in that case. If you need to do it in the middle of a query then you can use a custom predicate

nivekuil08:10:29

how would a custom predicate work? From the docs it looks like predicates apply only to :args, whereas what I want to do looks like

(put {:crux.db/id 1 :foo #{}})
(q {:find '[?e]
    :where [['?e :foo '_]]})

refset10:10:07

You can do this kind of thing:

(let [db (crux/db node)]
  (crux/q db {:find '[?f]
              :where '[[_ :crux.db/id ?e]
                       [(?myf ?db ?e) ?f]]
              :args [{'?db db
                      '?myf (fn myf [db e]
                              (crux/entity db e))}]}))

refset10:10:43

The docs don't really discuss custom predicates outside of a comment here that also mentions you can refer to them via fully qualified names (instead of using an arg like I've just shown) https://www.opencrux.com/reference/20.09-1.12.1/queries.html#datascript-differences

nivekuil10:10:05

wow, fancy. is that still considered a predicate even though it's not returning a boolean?

refset10:10:42

(let [db (crux/db node)]
  (crux/q db {:find '[?f]
              :where '[[_ :crux.db/id ?e]
                       [(?myf ?db ?e :foo) ?f]]
              :args [{'?db db
                      '?myf (fn myf [db e k]
                              (get (crux/entity db e) k))}]}))

refset10:10:54

^ that should do what you want

refset10:10:51

> is that still considered a predicate even though it's not returning a boolean?

That's the language we've been using internally at least, yep. There's always a chance it's wrong 🙂

nivekuil10:10:24

I think this does technically work, but it appears to not benefit from whatever black magic crux does to make the simple query fast, as the performance is reminiscent of running hive queries: about 20 seconds on 4.5ghz skylake core for a few thousand docs

refset11:10:46

(with-open [db (crux/open-db node)]
  (crux/open-q db {:find '[?f]
                   :where '[[_ :crux.db/id ?e]
                            [(?myf ?db ?e :foo) ?f]]
                   :args [{'?db db
                           '?myf (fn myf [db e k]
                                   (get (crux/entity db e) k))}]}))

refset11:10:25

I've not tested/benchmarked, but this version should be a lot quicker

nivekuil20:10:10

this one actually ooms my 2gb heap after an even longer time, which is a bit unexpected from a streamed API

refset21:10:13

ah, I realise now that I shouldn't have changed q to open-q (use of open-db should be enough) - please try again

refset21:10:19

another reason that might make it slow is if the query engine has chosen a poor join order. Are you able to share the query you're actually running? If you enable DEBUG logging for crux.query you'll be able to share the join order also

nivekuil21:10:48

oom again; I'm running your code verbatim, actually. I'll take a look at the logger, I think that needs to be configured with log4j?

nivekuil21:10:10

2020-10-01T21:30:52.229Z machina DEBUG [crux.query:326] - :query {:find [?f], :where [[_ :crux.db/id ?e] [(?myf ?db ?e :foo) ?f]], :arg-keys [#{?myf ?db}]}
2020-10-01T21:30:52.230Z machina DEBUG [crux.query:326] - :triple-joins-var->frequency {_108013 1, ?e 1}
2020-10-01T21:30:52.230Z machina DEBUG [crux.query:326] - :triple-joins-join-order (?myf ?db _108013 ?e)
2020-10-01T21:30:52.231Z machina DEBUG [crux.query:326] - :join-order :aev _108013 ?e {:e _108013, :a :crux.db/id, :v ?e}
2020-10-01T21:32:40.157Z machina DEBUG [crux.query:326] - :where [[:triple {:e _, :a :crux.db/id, :v ?e}] [:pred {:pred {:pred-fn ?myf, :args [?db ?e :foo]}, :return [:scalar ?f]}]]
2020-10-01T21:32:40.157Z machina DEBUG [crux.query:326] - :vars-in-join-order [?myf ?db _108013 ?e ?f]
2020-10-01T21:32:40.158Z machina DEBUG [crux.query:326] - :attr-stats {:crux.db/id 2209}

refset21:10:58

thanks, was about to ask for the rest 🙂

nivekuil21:10:11

well, it does finish the query this restart for whatever reason, but still takes about 20 seconds

refset21:10:47

how many docs? And is this with Rocks as the index-store?

nivekuil21:10:06

I think the 2209 in the last log line is the # of docs? and nope, it's in-memory

✔️ 3
nivekuil21:10:50

2020-10-01T21:33:35.085Z machina DEBUG [crux.memory:326] - :pool-allocation-stats {:allocated 32636928, :deallocated 31195136, :in-use 1441792}
2020-10-01T21:33:35.114Z machina DEBUG [crux.memory:326] - :pool-allocation-stats {:allocated 32768000, :deallocated 31195136, :in-use 1572864}
2020-10-01T21:33:35.139Z machina DEBUG [crux.memory:326] - :pool-allocation-stats {:allocated 32899072, :deallocated 31195136, :in-use 1703936}
2020-10-01T21:33:35.163Z machina DEBUG [crux.memory:326] - :pool-allocation-stats {:allocated 33030144, :deallocated 31195136, :in-use 1835008}
if that's interesting

refset22:10:44

okay, so I have the following query running in ~300ms against 5000 docs in RocksDB on my modest xps, but unfortunately the ability to use :in '[$] is only merged on master and currently unreleased:

(doall
 (for [i (range 5000)]
   (crux/submit-tx n [[:crux.tx/put {:crux.db/id i :a 1 :b i :foo (inc i)}]])))

(defn myf [db e k]
  (get (crux/entity db e) k))

(time
 (crux/q (crux/db n)
         {:find '[?f]
          :in '[$]
          :where '[[?e :crux.db/id]
                   [(user/myf $ ?e :foo) ?f]]}))

nivekuil22:10:47

looks interesting! I didn't realize you could omit the value binding altogether either. You don't have to await-tx those submits?

refset22:10:51

ah, there was an implicit repl-driven await in there, sorry 🙂
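
A minimal sketch of making that await explicit outside the REPL, assuming Crux 1.12 is on the classpath. `put-all-and-await` is a hypothetical helper name, and unlike the snippet above it batches all the puts into a single transaction. The only Crux calls used are `crux/submit-tx` and `crux/await-tx`.

```clojure
(require '[crux.api :as crux])

(defn put-all-and-await
  "Hypothetical helper: submits one transaction containing a put for every
  doc, then blocks until the node has indexed that transaction."
  [node docs]
  (let [tx (crux/submit-tx node (mapv (fn [doc] [:crux.tx/put doc]) docs))]
    ;; await-tx returns once the node has indexed `tx`, so a query against
    ;; (crux/db node) made afterwards will see these documents.
    (crux/await-tx node tx)))

;; Usage, with `n` being an already-started node as in the snippet above:
;; (put-all-and-await n (for [i (range 5000)]
;;                        {:crux.db/id i :a 1 :b i :foo (inc i)}))
```

Without the await, a query issued straight after `submit-tx` may run against a db snapshot that does not yet include the new documents.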

victorb19:10:24

Can I somehow do queries that ignore start/end valid time? Query disregarding time, without knowing the ID (so seems History API won't be suitable for it).

refset20:10:26

Unfortunately Crux doesn't maintain indexes to efficiently support that kind of temporal-range querying yet. Can you describe the use-case a little? There may be a few workaround options worth considering in the meantime.

Toyam Cox22:10:37

isn't the whole point of using crux to not be held down by handling time manually, thus these sorts of queries are probably not a good idea anyway?

victorb06:10:40

@refset yeah, the use case is to have something that expires but can later be recovered somehow, and I wanted to avoid adding an expires_at timestamp to the doc

victorb06:10:07

@Toyam Cox indeed! That's why I'm trying to use the built-in features around valid times rather than adding timestamps to the docs

Toyam Cox13:10:10

So perhaps what you want is a document that expires and keep another document with all the existing documents? Could use a valid-time on a :crux.tx/fn to restrict the list to only expired items, and get via history

victorb14:10:52

Hmm, yeah, might be simpler to just revert the relationship instead then. I'll have a longer think about it. Thanks!

refset21:10:25

Adding an explicit expires_at attribute may seem like unnecessary duplication, but in most cases I expect it is the right thing to do. The valid time semantics are not yet powerful enough to model everything that people hope to use it for, but we're working on it 🙂

victorb10:10:01

@refset yeah, I ended up adding a "created_at" attribute instead and keeping the expiry time in application code, so I can easily change it. So I'm back to adding timestamps 😄 Thanks for all the work you've all done so far, it's a pleasure to use

🙏 3
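
A minimal sketch of the "created_at plus expiry in application code" approach described above, in plain Clojure with no Crux calls. The `:created_at` key, the `expired?` function and the choice of storing a `java.util.Date` are all assumptions for illustration, not something the thread specifies.

```clojure
(import '(java.time Duration Instant))

(defn expired?
  "Hypothetical helper: true when `doc` was created more than `ttl` before
  `now`. Assumes the doc stores a java.util.Date under :created_at."
  [doc ^Duration ttl ^Instant now]
  (let [created (.toInstant ^java.util.Date (:created_at doc))]
    ;; The doc has expired once created + ttl lies strictly before now.
    (.isBefore (.plus created ttl) now)))

;; Usage: the time-to-live lives in application code, so it can be changed
;; without rewriting any stored documents.
;; (expired? {:crux.db/id :item-1 :created_at (java.util.Date.)}
;;           (Duration/ofDays 30)
;;           (Instant/now))
```

Because the document is never deleted or ended in valid time, "recovering" an expired item is just a matter of calling the check with a longer `ttl`.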
victorb10:10:08

I can't figure out how I could also retrieve the tx-time in a query as I want to display when something was added, so seems I'm gonna need created_at for that as well

refset18:10:25

adding a domain timestamp that always exactly matches tx-time will require using a transaction function, but hopefully your app can survive with a few ms of drift/skew there and you can avoid putting everything through a transaction function 🙂

🙏 3
Toyam Cox22:10:28

I want to help change this documentation to be better, partly due to the glee I have at finding out how well transaction functions work. https://opencrux.com/reference/20.09-1.12.0/transactions.html Where can I contribute? Source of my glee: https://github.com/juxt/crux/discussions/1146

refset19:10:46

Hey, sorry for the late response. We appreciate the enthusiasm! PRs would certainly be welcome with more descriptions & examples if you have some good ideas 🙂 I'm glad you found the specific adoc okay. Unfortunately it's not straightforward for you to build a full local preview using Antora at the moment, because you won't have access to the crux-site repo, but I can certainly build and push a branch up to http://opencrux.com if & when you open a PR