#xtdb
2021-08-16
mac 10:08:59

Hey guys. I'm trying to write a function that lets me pass a vector of attribute-value tuples and builds a query from them, but the query spec keeps complaining about it. Any ideas how to achieve this?

(defn q
  [where-clauses]
  (crux/q
   (crux/db node) 
    '{:find [(pull ?i [*])]
      :where (mapv #(vec (cons '?i %)) where-clauses)}))

(q '[[:id ?id] [:sub-type :start]])

refset 10:08:05

Hey @U09UV3WP6 🙂 I think you just need to adjust your quoting so that the mapv is evaluated before the query map is handed over to crux/q, i.e. quote only the :find vector rather than the whole map:

(defn q
  [where-clauses]
  (crux/q
   (crux/db node) 
    {:find '[(pull ?i [*])]
     :where (mapv #(vec (cons '?i %)) where-clauses)}))
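A minimal sketch of why the quoting matters, runnable at a plain Clojure REPL with no Crux node: both functions only build the query map and never call crux/q. The names build-query-quoted and build-query are hypothetical, introduced here for illustration.

```clojure
(defn build-query-quoted
  "The original attempt: the whole map is quoted, so (mapv ...) is never run."
  [where-clauses]
  '{:find [(pull ?i [*])]
    :where (mapv #(vec (cons '?i %)) where-clauses)})

(defn build-query
  "The fix: quote only the :find vector and let mapv run at call time,
  prefixing each [attribute value] tuple with the ?i entity variable."
  [where-clauses]
  {:find '[(pull ?i [*])]
   :where (mapv #(vec (cons '?i %)) where-clauses)})

(build-query '[[:id ?id] [:sub-type :start]])
;; => {:find [(pull ?i [*])], :where [[?i :id ?id] [?i :sub-type :start]]}

(first (:where (build-query-quoted '[[:id ?id]])))
;; => mapv  (the :where value is still the unevaluated list)
```

With the fully quoted map, :where holds the list (mapv ...) itself rather than a vector of clauses, which is presumably what the query spec was rejecting.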

mac 11:08:38

Ah, yes that works. Thanks.

kevinmershon 19:08:23

Is there an explanation for why using pull [*] to fetch a large set of documents is slower than mapping over the ids and calling (crux.api/entity db id)?

refset 20:08:00

Well, pull certainly has the full machinery of the query engine around it (unlike entity), but ideally it wouldn't be that much slower. Some of the overhead could be down to all the set comparisons involved, because a query naturally returns a set (you could try open-q to see whether that's faster). Can you say roughly how large the result set is, both in count and in KB?

kevinmershon 21:08:04

~30 records actually, not huge

kevinmershon 21:08:24

It was the difference between ~200 ms (mapping over entity) and 2 seconds

kevinmershon 21:08:12

the query was doing the (cons 'or (map ...)) trick suggested above for matching on a set of ids
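The original suggestion isn't in this excerpt, so the following is only a guess at what the (cons 'or (map ...)) trick produces: one or leg per id, each leg a triple clause on :crux.db/id. ids->or-clause and ids->or-query are hypothetical helpers; the point is that an id list of length n yields an or clause with n legs.

```clojure
;; Hypothetical reconstruction of the "(cons 'or (map ...))" trick:
;; one or-leg per id, so the clause grows linearly with the id list.
(defn ids->or-clause
  [ids]
  (cons 'or (map (fn [id] ['?doc :crux.db/id id]) ids)))

(defn ids->or-query
  [ids]
  {:find '[(pull ?doc [*])]
   :where [(ids->or-clause ids)]})

(ids->or-clause [:a :b])
;; => (or [?doc :crux.db/id :a] [?doc :crux.db/id :b])
```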

refset 21:08:47

Ah, if there's an or clause with many legs involved, then the slowness is a lot more understandable, as each leg generates intermediate set operations (and subqueries, behind the scenes). Have you tried supplying the ids via an :in binding? e.g. https://opencrux.com/reference/1.18.0/queries.html#_collection_binding
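A sketch of the :in collection-binding alternative, following the :in [$ [name ...]] shape from the linked 1.18 docs. It only builds the query data; the crux/q call is left in a comment because it needs a running node, and passing the collection as an extra argument after the query is an assumption based on those docs. docs-by-ids-query is a hypothetical name.

```clojure
;; Query data only: [?doc-id ...] asks for ?doc-id to be bound once per
;; element of the collection supplied for it, instead of one or-leg per id.
(def docs-by-ids-query
  '{:find [(pull ?doc [*])]
    :in [$ [?doc-id ...]]
    :where [[?doc :crux.db/id ?doc-id]]})

;; Assumed usage (requires a Crux node, not run here):
;; (crux/q (crux/db node) docs-by-ids-query [:id-1 :id-2])
```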

kevinmershon 21:08:20

I've tried an :in binding, unsuccessfully

kevinmershon 21:08:34

I will try the (== ...) approach

refset 21:08:08

If you can share the query, I'd be happy to help get it working. It should look roughly like this: https://github.com/juxt/crux/blob/1619657cb472179bb57a6beb8d774f5673014d41/crux-test/test/crux/query_test.clj#L268-L273

kevinmershon 21:08:58

The use of :in [$ [name ...]] in the documentation is confusing

kevinmershon 21:08:14

Is that a literal ..., or is it implying there are other fields I'd care about?

refset 21:08:34

The == predicate in that other example is orthogonal; the main point is that you can embed a set directly in the e or v position of a triple clause (there wasn't a test showing this directly, sorry 🙂)
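A minimal sketch of embedding the set directly in the v position, built from a collection of ids. ids->set-query is a hypothetical helper; it produces a single triple clause whose value is a set, instead of one or leg per id.

```clojure
;; Hypothetical helper: the whole id set goes in the v position of one
;; triple clause; (set ids) also drops any duplicate ids.
(defn ids->set-query
  [ids]
  {:find '[(pull ?doc [*])]
   :where [['?doc :crux.db/id (set ids)]]})

(ids->set-query [:a :b :a])
```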

refset 21:08:47

That's a literal: ... is a valid symbol
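A quick REPL check of that point: the Clojure reader treats ... as an ordinary symbol, so [name ...] is just a two-element vector. This shows only how the reader sees the form, not how Crux interprets it; binding-form is a hypothetical name.

```clojure
;; The reader treats ... like any other symbol.
(def binding-form '[name ...])

(symbol? (second binding-form)) ;; => true
(name (second binding-form))    ;; => "..."
(count binding-form)            ;; => 2
```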

kevinmershon 21:08:53

I can get you the exact query in a couple of moments

kevinmershon 21:08:38

Here's an abbreviated version:

{:find [?doc],
 :in [$ [?doc-id ...]],
 :where
 [[?doc
   :crux.db/id
   #{:RIDzY97NM6RLLGMqiAax9HO
     :RIDzXptwzORzPnJTUHxAPbC
     :PAS0SWNaJSZLHNkMJYlfDzR
     :PAS0S9GI8PM10KMWcZdmsPF
     :PAS0SYMjiLyrt7irJNnRHBf
     :0QyNgGPNKYywttc30I3C
     :RIDzXYos2NLq8sb6z7BSQI3
     :RIDzXnztkT9FFcrFAtIe6xc
     :RIDzXn46mrvs1Z1sC0jYMDk
     :0RPXWayax0danE62OVdv}]]}

kevinmershon 21:08:06

These are valid Crux ids, and I get no results for this

kevinmershon 21:08:17

I will try the == predicate

kevinmershon 21:08:39

Ah, I see I did something different in my implementation there, which explains the no-results issue.

kevinmershon 21:08:55

What's the upper limit on these operators? Can I pass thousands of values in the set?

kevinmershon 21:08:18

This works, and takes 1615 ms to run:

{:find [?doc],
 :where
 [[?doc :crux.db/id ?doc-id]
  [(==
    ?doc-id
    #{:RIDzY97NM6RLLGMqiAax9HO
      :RIDzXptwzORzPnJTUHxAPbC
      :PAS0SWNaJSZLHNkMJYlfDzR
      :PAS0S9GI8PM10KMWcZdmsPF
      :PAS0SYMjiLyrt7irJNnRHBf
      :0QyNgGPNKYywttc30I3C
      :RIDzXYos2NLq8sb6z7BSQI3
      :RIDzXnztkT9FFcrFAtIe6xc
      :RIDzXn46mrvs1Z1sC0jYMDk
      :0RPXWayax0danE62OVdv})]]}

kevinmershon 21:08:27

(mapv #(crux/entity (crux/db crux-node) %)
      id-or-ids)

This takes 75 ms

refset 23:08:36

> What's the upper limit on these operators? Can I pass thousands of values in the set?

Interesting question... I'd guess it's just a question of heap space. To keep things like-for-like, how long does this take:

{:find [(pull ?doc [*])],
 :where
 [[?doc :crux.db/id
   #{:RIDzY97NM6RLLGMqiAax9HO
     :RIDzXptwzORzPnJTUHxAPbC
     :PAS0SWNaJSZLHNkMJYlfDzR
     :PAS0S9GI8PM10KMWcZdmsPF
     :PAS0SYMjiLyrt7irJNnRHBf
     :0QyNgGPNKYywttc30I3C
     :RIDzXYos2NLq8sb6z7BSQI3
     :RIDzXnztkT9FFcrFAtIe6xc
     :RIDzXn46mrvs1Z1sC0jYMDk
     :0RPXWayax0danE62OVdv}]]}

refset 23:08:30

I'm not sure I understand how your == example is compiling, let alone returning results, as the predicate should need to be wrapped in a vector 🤔

kevinmershon 02:08:47

Yeah, I fixed it after pasting

kevinmershon 03:08:29

> how long does this take

13 ms

refset 08:08:43

Ah, that's more like it! The query engine batches document fetches, so it is probably able to be a little faster than individual entity calls

kevinmershon 15:08:42

IMO the use of set literals should be in the documentation. Comparing a value against a known set is more common than comparing it with a single explicit value like "Ivan"

richiardiandrea 23:08:45

Is there a way to stop a Jetty server started with :crux.http-server/server? I use the reloaded pattern and I do call .close on the node on stop, but it seems the HTTP server is still running on restart, and I get back an "Address already in use" error