This page is not created by, affiliated with, or supported by Slack Technologies, Inc.
2020-10-01
it seems it isn't possible to say "give me the entity with this attribute, regardless of its value, even if it's an empty set or vector". Not sure how to get around that, actually
Hmm, it sounds like you can't avoid using entity in that case. If you need to do it in the middle of a query then you can use a custom predicate
how would a custom predicate work? From the docs it looks like predicates apply only to :args, whereas what I want to do looks like
(put {:crux.db/id 1 :foo #{}}) (q {:find '[?e] :where [['?e :foo '_]]})
You can do this kind of thing:
(let [db (crux/db node)]
  (crux/q db {:find '[?f]
              :where '[[_ :crux.db/id ?e]
                       [(?myf ?db ?e) ?f]]
              :args [{'?db db
                      '?myf (fn myf [db e]
                              (crux/entity db e))}]}))
The docs don't really discuss custom predicates outside of a comment here that also mentions you can refer to them via fully qualified names (instead of using an arg like I've just shown) https://www.opencrux.com/reference/20.09-1.12.1/queries.html#datascript-differences
wow, fancy. is that still considered a predicate even though it's not returning a boolean?
(let [db (crux/db node)]
  (crux/q db {:find '[?f]
              :where '[[_ :crux.db/id ?e]
                       [(?myf ?db ?e :foo) ?f]]
              :args [{'?db db
                      '?myf (fn myf [db e k]
                              (get (crux/entity db e) k))}]}))
> is that still considered a predicate even though it's not returning a boolean? That's the language we've been using internally at least, yep. There's always a chance it's wrong 🙂
I think this does technically work, but it appears to not benefit from whatever black magic crux does to make the simple query fast, as the performance is reminiscent of running hive queries: about 20 seconds on 4.5ghz skylake core for a few thousand docs
(with-open [db (crux/open-db node)]
  (crux/open-q db {:find '[?f]
                   :where '[[_ :crux.db/id ?e]
                            [(?myf ?db ?e :foo) ?f]]
                   :args [{'?db db
                           '?myf (fn myf [db e k]
                                   (get (crux/entity db e) k))}]}))
this one actually ooms my 2gb heap after an even longer time, which is a bit unexpected from a streamed API
ah, I realise now that I shouldn't have changed q to open-q (use of open-db should be enough) - please try again
another reason that might make it slow is if the query engine has chosen a poor join order. Are you able to share the query you're actually running? If you enable DEBUG logging for crux.query you'll be able to share the join order also
oom again; I'm running your code verbatim, actually. I'll take a look at the logger, I think that needs to be configured with log4j?
2020-10-01T21:30:52.229Z machina DEBUG [crux.query:326] - :query {:find [?f], :where [[_ :crux.db/id ?e] [(?myf ?db ?e :foo) ?f]], :arg-keys [#{?myf ?db}]}
2020-10-01T21:30:52.230Z machina DEBUG [crux.query:326] - :triple-joins-var->frequency {_108013 1, ?e 1}
2020-10-01T21:30:52.230Z machina DEBUG [crux.query:326] - :triple-joins-join-order (?myf ?db _108013 ?e)
2020-10-01T21:30:52.231Z machina DEBUG [crux.query:326] - :join-order :aev _108013 ?e {:e _108013, :a :crux.db/id, :v ?e}
2020-10-01T21:32:40.157Z machina DEBUG [crux.query:326] - :where [[:triple {:e _, :a :crux.db/id, :v ?e}] [:pred {:pred {:pred-fn ?myf, :args [?db ?e :foo]}, :return [:scalar ?f]}]]
2020-10-01T21:32:40.157Z machina DEBUG [crux.query:326] - :vars-in-join-order [?myf ?db _108013 ?e ?f]
2020-10-01T21:32:40.158Z machina DEBUG [crux.query:326] - :attr-stats {:crux.db/id 2209}
well, it does finish the query this restart for whatever reason, but still takes about 20 seconds
I think the 2209 in the last log line is the # of docs? and nope, it's in-memory
2020-10-01T21:33:35.085Z machina DEBUG [crux.memory:326] - :pool-allocation-stats {:allocated 32636928, :deallocated 31195136, :in-use 1441792}
2020-10-01T21:33:35.114Z machina DEBUG [crux.memory:326] - :pool-allocation-stats {:allocated 32768000, :deallocated 31195136, :in-use 1572864}
2020-10-01T21:33:35.139Z machina DEBUG [crux.memory:326] - :pool-allocation-stats {:allocated 32899072, :deallocated 31195136, :in-use 1703936}
2020-10-01T21:33:35.163Z machina DEBUG [crux.memory:326] - :pool-allocation-stats {:allocated 33030144, :deallocated 31195136, :in-use 1835008}
if that's interesting
okay, so I have the following query running in ~300ms against 5000 docs in RocksDB on my modest xps, but unfortunately the ability to use :in '[$] is only merged on master and currently unreleased:
(doall
 (for [i (range 5000)]
   (crux/submit-tx n [[:crux.tx/put {:crux.db/id i :a 1 :b i :foo (inc i)}]])))
(defn myf [db e k]
  (get (crux/entity db e) k))
(time
 (crux/q (crux/db n)
         {:find '[?f]
          :in '[$]
          :where '[[?e :crux.db/id]
                   [(user/myf $ ?e :foo) ?f]]}))
looks interesting! I didn't realize you could omit the value binding altogether either. You don't have to await-tx those submits?
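On the await-tx question: submit-tx is asynchronous, so a query run straight after a batch of submits may not see all of the documents yet. A minimal sketch of waiting for the last submitted transaction, assuming n is a started node as in the snippet above; submit-and-await is a hypothetical helper name, not part of Crux:

```clojure
(require '[crux.api :as crux])

;; Hypothetical helper: submits one put per document, then blocks until
;; the last of those transactions has been indexed by node `n`.
(defn submit-and-await [n docs]
  (let [txs (mapv (fn [doc]
                    (crux/submit-tx n [[:crux.tx/put doc]]))
                  docs)]
    ;; peek returns the last submitted tx, or nil when docs is empty
    (when-let [last-tx (peek txs)]
      (crux/await-tx n last-tx))))
```

Usage would look like (submit-and-await n (for [i (range 5000)] {:crux.db/id i :a 1 :b i :foo (inc i)})), followed by the query.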
Can I somehow do queries that ignore start/end valid time? Query disregarding time, without knowing the ID (so seems History API won't be suitable for it).
Unfortunately Crux doesn't maintain indexes to efficiently support that kind of temporal-range querying yet. Can you describe the use-case a little? There may be a few workaround options worth considering in the meantime.
isn't the whole point of using crux to not be held down by handling time manually, thus these sorts of queries are probably not a good idea anyway?
@U899JBRPF yeah, the use case is to have something that expires, but can later be recovered somehow, and wanted to avoid adding an expires_at timestamp to the doc
@U013YH4QPD0 indeed! That's why I'm trying to use the built-in features around valid times rather than adding timestamps to the docs
So perhaps what you want is a document that expires and keep another document with all the existing documents? Could use a valid-time on a :crux.tx/fn to restrict the list to only expired items, and get via history
Hmm, yeah, might be simpler to just revert the relationship instead then. I'll have a longer think about it. Thanks!
Adding an explicit expires_at attribute may seem like unnecessary duplication, but in most cases I expect it is the right thing to do. The valid time semantics are not yet powerful enough to model everything that people hope to use it for, but we're working on it 🙂
@U899JBRPF yeah, I ended up adding a "created_at" attribute instead and have expire time in application code instead, so I can easily change it, so I'm back to adding timestamps 😄 Thanks for all the work you've all done so far, pleasure to use
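The "created_at plus expiry in application code" approach could be sketched roughly like this. This is only an illustration: the 30-day ttl, the expired? name, and storing :created_at as a java.util.Date are all assumptions, not details from the conversation.

```clojure
(import '(java.time Duration Instant)
        '(java.util Date))

;; Hypothetical policy value; it lives in application code so it can be
;; changed without rewriting any stored documents.
(def ttl (Duration/ofDays 30))

(defn expired?
  "True when doc was created more than ttl before now.
  Assumes :created_at holds a java.util.Date."
  [doc ^Instant now]
  (let [created (.toInstant ^Date (:created_at doc))]
    (.isAfter now (.plus created ttl))))
```

Because the document only records when it was created, an "expired" item is still there to be recovered, and changing ttl changes what counts as expired without touching the data.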
I can't figure out how I could also retrieve the tx-time in a query as I want to display when something was added, so seems I'm gonna need created_at for that as well
adding a domain timestamp that always exactly matches tx-time will require using a transaction function, but hopefully your app can survive with a few ms of drift/skew there and you can avoid putting everything through a transaction function 🙂
I want to help change this documentation to be better, partly due to the glee I have at finding out how well transaction functions work. https://opencrux.com/reference/20.09-1.12.0/transactions.html Where can I contribute? Source of my glee: https://github.com/juxt/crux/discussions/1146
https://github.com/juxt/crux/blob/master/docs/reference/modules/ROOT/pages/transactions.adoc Ah, I found it.
Hey, sorry for the late response. We appreciate the enthusiasm! PRs would certainly be welcome with more descriptions & examples if you have some good ideas 🙂 I'm glad you found the specific adoc okay. Unfortunately it's not straightforward for you to build a full local preview using Antora at the moment, because you won't have access to the crux-site repo, but I can certainly build and push a branch up to http://opencrux.com if & when you open a PR
Would welcome feedback on https://github.com/juxt/crux/discussions/1143 though