This page is not created by, affiliated with, or supported by Slack Technologies, Inc.
2021-06-15
hey everyone, first time revisiting Crux in a while. I'm trying to run fully on SQLite for now, but not sure I got the config right. I'm getting a NullPointerException when trying to transact...
(defonce node
  (crux/start-node {:crux.jdbc/connection-pool
                    {:dialect {:crux/module crux.jdbc.sqlite/->dialect}
                     :pool-opts {}
                     :db-spec {:jdbcUrl "jdbc:sqlite:crux.db"}}
                    :crux/document-store
                    {:crux/module crux.jdbc/->document-store
                     :connection-pool :crux.jdbc/connection-pool}
                    :crux/tx-log
                    {:crux/module crux.jdbc/->tx-log
                     :connection-pool :crux.jdbc/connection-pool}}))
(crux/submit-tx node [[:crux.tx/put {:crux.db/id :ivan}]])
=> java.lang.NullPointerException
cache.clj:13 crux.cache/evict
cache.clj:12 crux.cache/evict
document_store.clj:57 crux.document-store.CachedDocumentStore/iter/fn
Hey @U07FP7QJ0 maybe you need to quote the ->tx-log, ->document-store and ->dialect like this:
(defonce node
  (crux/start-node {:crux.jdbc/connection-pool
                    {:dialect {:crux/module 'crux.jdbc.sqlite/->dialect}
                     :pool-opts {}
                     :db-spec {:jdbcUrl "jdbc:sqlite:crux.db"}}
                    :crux/document-store
                    {:crux/module 'crux.jdbc/->document-store
                     :connection-pool :crux.jdbc/connection-pool}
                    :crux/tx-log
                    {:crux/module 'crux.jdbc/->tx-log
                     :connection-pool :crux.jdbc/connection-pool}}))
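For anyone skimming, here is a small plain-Clojure sketch of what the quote changes. It uses clojure.string/upper-case as a stand-in and the var names are made up for illustration; it says nothing about how Crux resolves :crux/module internally, which this thread does not show.

```clojure
(require '[clojure.string])

;; Quoted: the value is a symbol, i.e. just a name that can be resolved later.
(def quoted-module 'clojure.string/upper-case)

;; Unquoted: the symbol is evaluated on the spot, so this holds the function object.
(def unquoted-module clojure.string/upper-case)

(symbol? quoted-module)   ;; => true
(fn? unquoted-module)     ;; => true

;; A quoted symbol can be turned back into something callable on demand
;; (requiring-resolve needs Clojure 1.10 or later).
((requiring-resolve quoted-module) "crux") ;; => "CRUX"
```

So the two configs above pass different kinds of values under :crux/module, a symbol in one case and a function in the other.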
Maybe crux should catch that? :thinking_face:
@U013MQC5YKD Curious: Is this related to the conversation you and I were having during our call? If it is, it might be worth a quick conversation again (maybe with @U899JBRPF) to chat about topologies.
> Maybe crux should catch that? :thinking_face: Yeah, I think so! I've added a note on the project board. Hopefully it's just a small tweak to system.clj
@U01AVNG2XNF Arne's working on some really cool data & schema derivations stuff and was testing it out with Crux! But I really like how Crux can be used purely with SQLite3; I think I should give it a test on Windows and see how it goes 🙂 I'll definitely reach out if we need a bit of help with Crux 🙌
Oh man... I'd actually forgotten about the Windows requirement. 😬 Please do shout if you hit snags.
Hehe! Yes of course thank you 😁
Hi, I have a performance issue with a "find by attribute in" kind of query when I use (or) in the query. I'll post my minimum repro in the thread 🧵. Hopefully I'm doing something really silly and someone will spot my stupidity 🙏 🙂
any ideas welcome
Mentioned above, but have you taken a glance at the query debug logs to see if there's anything obviously weird happening? https://opencrux.com/community/faq.html#observequeries ... at a glance, that really doesn't look like it should be taking 4 seconds. 😕 If someone is able to help you debug, it would also be useful to know what version of Crux you're on.
[juxt/crux-core "21.06-1.17.1-beta"]
[juxt/crux-rocksdb "21.06-1.17.1-beta"]
I just tested locally (on .17, in-memory node), and I can reproduce the slowdown. From under 3ms to over 1.6s (after running the query several times)
(with-out-str (time (crux/q
                     (crux/db crux-node)
                     '{:find [e something-val]
                       :in [[id ...]]
                       :where [[e :bus-stop/id id]
                               [e :something something-val]]}
                     ids)))
;; => "\"Elapsed time: 1.300942 msecs\"\n"
(with-out-str (time (crux/q
                     (crux/db crux-node)
                     '{:find [e something-val]
                       :in [[id ...]]
                       :where [(or [e :train-stop/id id]
                                   [e :bus-stop/id id])
                               [e :something something-val]]}
                     ids)))
;; => "\"Elapsed time: 1326.292413 msecs\"\n"
that's a 1000x slowdown there... seems weird indeed
@U01LFP3LA6P, might be worth running with logging set to DEBUG, and post the query debug info...
Sure, I'll try that.
wow, what a huge amount of log lines
cat log_output.txt | grep "crux.query" | wc -l
159854
my money would be on the :something coming first in the join order, because the query planner can't look down inside or clauses
what this'll mean is that Crux will scan the :something attribute first, and then check that it matched either :train-stop/id or :bus-stop/id, whereas (ideally) you'd want it to go directly to the id
might work as a workaround, but shouldn't crux be able to cope with the query like the one I'm using?
I can't immediately think of a reasonable way for us to do that, at least :thinking_face:
if you're going to be making a lot of queries that look for both train-stops and bus-stops, I'd probably look into adding another key onto both sets of docs, which you could then include in the query at the top level
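A rough sketch of that "extra key" idea, in case it helps. The :stop/id attribute and the helper name are invented for illustration and are not something Crux provides; the point is just that a shared attribute lets the lookup sit at the top level of :where instead of inside an or.

```clojure
(defn with-stop-id
  "Hypothetical helper: copies whichever of :train-stop/id or :bus-stop/id
  the doc has onto a shared :stop/id key, before the doc is put into Crux."
  [doc]
  (if-some [id (or (:train-stop/id doc) (:bus-stop/id doc))]
    (assoc doc :stop/id id)
    doc))

;; The query can then match on the shared key directly, with no or clause.
(def stops-by-shared-id-query
  '{:find [e something-val]
    :in [[id ...]]
    :where [[e :stop/id id]
            [e :something something-val]]})

;; Usage, assuming the docs were stored via with-stop-id:
;; (crux/q (crux/db crux-node) stops-by-shared-id-query ids)
```

Existing docs would need to be re-put with the new key for this to work, so it is a data-modelling change rather than a query tweak.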
Alternatively, you could work around this by running both queries separately, and using set/union on the results
but admittedly, it's hacky
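For reference, a minimal sketch of the set/union workaround, assuming crux.api is on the classpath and that each query returns a set of tuples (which is the case when there is no :order-by). The function name is made up for illustration.

```clojure
(require '[clojure.set :as set]
         '[crux.api :as crux])

(defn stops-with-something
  "Hypothetical helper: runs one query per id attribute, so each query has
  the id lookup at the top level of :where, then unions the two result sets."
  [db ids]
  (set/union
   (crux/q db
           '{:find [e something-val]
             :in [[id ...]]
             :where [[e :train-stop/id id]
                     [e :something something-val]]}
           ids)
   (crux/q db
           '{:find [e something-val]
             :in [[id ...]]
             :where [[e :bus-stop/id id]
                     [e :something something-val]]}
           ids)))

;; Usage: (stops-with-something (crux/db crux-node) ids)
```

Passing the same db value to both queries keeps them on one consistent snapshot.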
I'm not sure I follow, so I'll rather ask in a different way 🙂
(time (crux/q
       (crux/db crux-node)
       '{:find [id]
         :in [[id ...]]
         :where [(or [e :train-stop/id id]
                     [e :bus-stop/id id])]}
       ids))
Is such query actually wrong? If so, how can I change it to be more efficient?
Yeah, the queries as such seem nonsensical, but the underlying performance issue remains...
I mean.. sure, I might query id directly, restructure the data so that they have some additional field such as :type :bus etc.
but what if I bump into similar situation on "non-id", regular attribute
it's the id
Maybe I shouldn't have chosen :train-stop/id and :bus-stop/id, might be a bit distracting
if they were called :key1 and :key2, and my task was to filter out some vector of values, which appear in my database under either :key1 or :key2, that would be an identical issue, wouldn't it?
a workaround with set/union would probably work, as @U95NTJT4H mentioned
but such workaround indeed feels a bit hacky to me and I was trying to avoid it
@U050V1N74 the magnitude of the slow down, for a dataset with only 2k entries suggests that there's something else going on... even a linear scan on a collection of that size should not take over 1s...
mm - we do have a number of fast paths in that area to optimise common cases, maybe we're not hitting any of them in this case :thinking_face:
best to raise an issue for this with the repro above, if that'd be alright, and we can take more of a look 🙂
I just tested with an older release, to see if it was a regression, but the performance degradation remains...
I have to leave now unfortunately. If I have some spare time in the evening, I'll create the issue.
I think it should be acceptable in this specific instance to drop the [e :something something-val] clause and use (pull e [:something]) in the :find instead
i.e. this is fast
(time (crux/q
       (crux/db crux-node)
       '{:find [id (pull e [:something])]
         :in [[id ...]]
         :where [(or [e :train-stop/id id]
                     [e :bus-stop/id id])]}
       ids))
You could also use the built-in get-attr predicate, which will necessarily come after the or in the join order
(time (crux/q
       (crux/db crux-node)
       '{:find [id something-val]
         :in [[id ...]]
         :where [(or [e :train-stop/id id]
                     [e :bus-stop/id id])
                 [(get-attr e :something) [something-val]]]}
       ids))
Interesting. It almost feels counter-intuitive to me that using pull syntax for a single attribute may actually result in (way, way) better performance than getting the value of that attribute using :where. Somehow I'd expect similar performance or even pull variant being a tiny bit slower.
Can this be said in general that it's a good practice to prefer pull or get-attr (for fetching document's attributes' values) instead of "joining" them using :where "matching clauses"?
we do tend to make that distinction for pull, yep - using :where to find the documents, and pull to get their data out once you've found them
https://github.com/juxt/crux/issues/1533 - thanks again for raising and helping us track it down 🙏