Fork me on GitHub
#xtdb
<
2023-05-16
>
Petrus Theron11:05:38

I’m struggling with the XTDB v2 query syntax:

(x/q node '{:find  [pid person]
            :where [($ :people {:xt/id pid, :xt/* person})]})
=> Execution error (IllegalArgumentException) at xtdb.expression/continue-read$fn (expression.clj:253).
No method in multimethod 'read-value-code' for dispatch value: :set
I’m trying to get all fields of a document because there is no x/entity in v2 and x/pull not supported yet. Docs show this example, but I don’t understand what uid means and why :xt/* would work here but not in my simpler query above:
{:find [event-at event]
 :in [uid]
 :where [(union-join {:find [uid event-at event]
                      :where [($ :posts [uid {:xt/valid-from event-at, :xt/* event}])]}
                     {:find [uid event-at event]
                      :where [($ :comments [uid {:xt/valid-from event-at, :xt/* event}])]}
                     {:find [uid event-at event]
                      :where [($ :likes [uid {:xt/valid-from event-at, :xt/* event}])]})]
 :order-by [[event-at :desc]]}

refset13:05:06

Hey 🙂 I think uid is "user-id" in the example. I'm not sure what that multimethod error indicates offhand. What did the map look like that you submitted?

Petrus Theron13:05:55

(def node (xt.node/start-node {}))

(x/submit-tx node [[:put :people {:xt/id :zahier
                                    :name  "Zahier"
                                    :roles #{:admin}}]
                     [:put :people {:xt/id :petrus
                                    :name  "Petrus"
                                    :roles #{:admin}}]
                     [:put :people {:xt/id :loekie
                                    :name  "Loekie"
                                    :roles #{:customer}}]])

(x/q node
    '{:find     [pid name]
      :where    [(match :people {:xt/id pid})
                 [pid :name name]]})
=> [{:pid :zahier, :name "Zahier"} {:pid :petrus, :name "Petrus"} {:pid :loekie, :name "Loekie"}]

(x/q node
    '{:find     [pid]
      :where    [($ :people {:xt/id pid, :xt/* person})]})
=> Syntax error (IllegalArgumentException) compiling at (src/zworx/data.clj:28:3).
No method in multimethod 'read-value-code' for dispatch value: :set

Petrus Theron13:05:27

Seems to be related to the sets. The query runs if the DB is empty or if I change the :roles set values to vectors.

jarohen13:05:57

@U051SPP9Z mind running the same query with :explain? true?

Petrus Theron13:05:08

The multimethod xtdb.expression/read-value-code doesn’t handle :set yet

jarohen13:05:13

our support for sets in v2 is a little basic atm, admittedly

jarohen13:05:42

yep. we might be able to fall back to the :list support there, given sets are represented as Arrow lists under the hood (with a little bit of metadata to remind us that they're sets, when required)

Petrus Theron11:05:50

v2 Query syntax: How do I query for all records in a table where a set value contains a needle value I’m looking for? E.g. [:put :people {:xt/id :joe :roles #{:admin}] <= how do I find all :people with :roles that contain :admin?

jarohen13:05:59

we're looking at explicit unnesting in v2, because we've had a fair few people in v1 wanting to get the set/list out as a value, who've been caught out by the implicit unnesting in v1

Petrus Theron13:05:29

How does imlipict unnesting look in v1 and how would explicit unnesting look in v2?

jarohen13:05:46

implicit unnesting in v1 is just {:find [user], :where [[user :roles :admin]]} - the roles value is unnested into its individual values

jarohen13:05:23

in v2, it currently looks like this:

(xt/q node '{:find [user-id],
             :where [($ :people {:xt/id user-id, :roles [role ...]})
                     [(= role :admin)]]})

👍 1
jarohen13:05:54

i.e. the [role ...] syntax is saying that 'the :roles value is a collection, give me each value'

Petrus Theron13:05:21

Ah, I see. Yeah implicit unnesting feels dodgy - how to ground an exact value of a seqable thing? Explicit unnesting feels right.

jarohen13:05:33

mm. it's a bit more verbose, sure, but much more intentional

jarohen13:05:22

(we'll want to make that work with sets, of course)

Petrus Theron13:05:36

Would be nice to have:

(x/q node
  {:find  [person]
   :in    [[?role ...]]
   :where [($ :people {:xt/id pid, :xt/* person, :roles [?role ...]}
  #{:admin})

Petrus Theron13:05:59

Or something like (there is probably official Datalog for binding coll values):

(x/q node
    '{:find  [person]
      :in    [?role ...]
      :where [($ :people {:xt/id pid,
                          :xt/* person
                          :roles roles})
              [roles [?role...]]]})

jarohen13:05:12

yep, that works:

(xt/q node ['{:find [user-id]
              :in [[role ...]]
              :where [($ :people {:xt/id user-id, :roles [role ...]})]}
            #{:admin}])
(location of the argument depends on which snapshot version you're using - this version literally pushed yesterday)

jarohen13:05:23

I have been quite tempted to take the unnesting out of the $, though, as you suggest :thinking_face:

Petrus Theron13:05:59

I pulled snapshot a few hours ago and am I’m getting:

(x/q node
    '{:find  [person]
      :in    [[role ...]]
      :where [($ :people {:xt/id pid,
                          :xt/*  person
                          :roles [role ...]})]}
    #{:admin})
=> Execution error (IllegalArgumentException) at xtdb.api/q& (api.clj:108).
Don't know how to create ISeq from: clojure.lang.Keyword
(All stored :roles values are vectors now.)

Petrus Theron13:05:53

Why is this considered malformed:

(x/q node
    '{:find  [person]
      :where [($ :people {:xt/id pid,
                          :xt/*  person
                          :roles roles})
              [(= roles [:admin])]]}
    {:explain? true})
=> Execution error (IllegalArgumentException) at xtdb.error/illegal-arg (error.clj:13).
Malformed query
How to find a person with an exact :roles value that is a vector?

jarohen13:05:56

>

Don't know how to create ISeq from: clojure.lang.Keyword
yep - this is the breaking change of the argument location - these are now passed as (xt/q node [q & args]), to make it consistent with how args are passed in SQL

jarohen13:05:44

> How to find a person with an exact :roles value that is a vector? I think this one is just a 'not implemented yet' I'm afraid 🙂

jarohen13:05:02

but that looks how I'd expect it to look

Petrus Theron13:05:33

Any possibility of getting the faster vectorized query engine being backported to v1? I have some simple queries that take way too long for ~13k entities. Equivalent Datomic queries ran instantly (obviously no bitemporality).

jarohen13:05:52

not likely, I'm afraid - it's quite a different codebase. any way we can help out with optimising the queries?

Petrus Theron13:05:58

Is there a way to get :explain? in v1?

jarohen13:05:45

the nearest equivalent is (xtdb.query/query-plan-for db q), in which the :vars-in-join-order key tells you the order the query engine'll resolve the different logic variables

Petrus Theron13:05:41

Hmmk so :order-by is what seems to be taking forever. It takes about as long as pulling all 12k matching records w/sort-by w/o :order-by clause. It’s a double-entry accounting system, so the data model cannot change without losing power: every Transaction has 2+ Entries. Each Entry is either a Debit or a Credit. Transaction has :tx/date (LocalDate) & :tx/description (String). Every entry belongs to an account (:entry/account). Accounts are a tree, so each account has a parent. Pretty much every query involves querying across :tx/date for tax periods and filters entries that belong to certain accounts with child rules. Query:

{:find [?entry ?date],
 :order-by [[?date :asc]],
 :in [?needle parent-account _], ;; ?needle not used in this version.
 :rules [[(child-of ?c ?p) [?c :account/parent ?p]] [(child-of ?c ?p) [?c :account/parent ?t] (child-of ?t ?p)]],
 :where [[?tx :tx/date ?date]
         [?entry :entry/tx ?tx]
         [?entry :entry/account ?account]
         (or [(= ?account parent-account)] (child-of ?account parent-account))]}
There are a total of 12,230 matching records. Fetching the first 150 (:limit 150) takes 530ms locally with in-process node. Time does not seem to scale much as :limit goes up. If I take out the :order-by, it takes 15ms.

Petrus Theron13:05:09

Taking out the parent account query clause does not make it any faster. Could this be slow because I’m storing LocalDate instead of #inst? I guess I can do tricks like storing ?year, ?month and ?day and cascade sorting for speed.

Petrus Theron14:05:58

Meh…I tried adding :tx/year, :tx/month & :tx/day to all txes and cascaded sorting (`:order-by [[?year :asc] [?month :asc] [?day :asc]]`) is only about 25% faster (400ms vs 530ms).

refset15:05:19

> Time does not seem to scale much as :limit goes up the :order-by will always force the whole result set to be computed and sorted behind the scene, so in reality :limit will only avoid some deserializing at the end of the process

Petrus Theron16:05:39

Managed to make this query muuuch faster by putting the account tree query in a subquery.

refset17:05:04

glad to hear! do you mean instead of using rules altogether? I realise now that I definitely should have suggested to try adding bound variable hints to the rule heads (sorry!), as in:

{:find [?entry ?date],
 :order-by [[?date :asc]],
 :in [?needle parent-account _], ;; ?needle not used in this version.
 :rules [[(child-of [?p] ?c) [?c :account/parent ?p]] [(child-of [?p] ?c) [?c :account/parent ?t] (child-of ?p ?t)]],
 :where [[?tx :tx/date ?date]
         [?entry :entry/tx ?tx]
         [?entry :entry/account ?account]
         (or [(= ?account parent-account)] (child-of parent-account ?account))]}

refset17:05:32

this would likely have a dramatic performance impact

Petrus Theron18:05:04

How do bound variables hints work? Is it documented somewhere? Did you mean to reverse the order of parameters in child-of? Is that required for it to work?

refset20:05:26

> reverse the order of parameters in child-of? Is that required for it to work? in the sense that the bounding [ ] braces only work in the first argument position, yes

Petrus Theron14:05:19

As I understand, XT keeps an index (I assume a sorted set) of each attribute in a record. How can I read that index directly like with datomic.api/datoms so that I can apply more performant subfilters from there, or do I need to resort to Datomic for this? At the moment my queries take 500ms against only 13,000 accounting records.

refset15:05:01

There's no public, supported API for that kind of direct low-level index access, although the internal protocols are there if you really want to explore that route. Have you tried using simpler queries (which will be direct index scans...more or less) and relying on open-q to stream the results into your own computations?

Petrus Theron15:05:42

When I run the following query, order of ?date is random:

(x/q (x/db node)
    '[:find ?date ?tx
      :where
      [?tx :tx/date ?date]])
Also if I use open-q:
(with-open [res (x/open-q (x/db node)
                    '[:find ?date ?tx
                      :where
                      [?tx :tx/date ?date]])]
    (into [] (iterator-seq res)))

Petrus Theron15:05:47

Hnng, even this takes 410ms (?date is LocalDate):

(time
    (do (with-open [res (x/open-q (x/db node)
                        '[:find ?date ?tx
                          :where
                          [?tx :tx/date ?date]])]
        (into (sorted-set-by (fn [[a t1] [b t2]]
                               (.isBefore a b))) (iterator-seq res)))
        nil))
"Elapsed time: 410.652663 msecs"
=> nil

refset15:05:19

a range predicate can be used to influence the returned order, iff it correlates with the 'natural' binary sorted order, see https://github.com/xtdb/xtdb/discussions/1514

Petrus Theron15:05:29

Is there a “lazy sorting” technique to realize sorted entity IDs sooner? I am only ever displaying at most 150 entities on one screen, so I can pull them in in parallel, but I want to be able to paginate faster.

refset15:05:41

The notes/messages in that discussion cover this in more detail, but there's an essential difficulty to making generic paginated sorting easy

refset15:05:04

A lot of people just dump a copy of everything into Elasticsearch for that sort of UI work

Petrus Theron16:05:21

Hmm, got it working by adding a range predicate on :tx/date. Thanks!

🙌 1
Petrus Theron14:05:00

Will v2 support prospective txes like with xt/with-tx?

refset15:05:20

Eventually yes, you can watch this issue to track progress: https://github.com/xtdb/xtdb/issues/2353

refset15:05:10

It's unlikely to look like with-tx - it will likely take the form of a ~stateless query parameter (because db no longer exists)

Petrus Theron15:05:19

I am going to miss db because I felt safe knowing that unless I passed a node to a fn, nothing could get transacted.

refset15:05:16

Interesting observation, I guess we could add a read-only wrapping protocol 🙂

Christian Pekeler21:05:39

I also wondered why db will be taken away. Seems like such a natural fit for an immutable DBMS.

jonpither08:05:12

Hey @UDQE4G9B2 🙂 Thanks for the Q. There's a couple of reasons we want to remove db. First is to make the API more data oriented - in v2, all of the temporal dimension is available for query as part of the query argument, and therefore can then be combined with other query clauses. The other reason is to focus on producing a narrower 'core' API, that others can build upon if they so wish. This is partly to make XT more remote-API friendly, but also lessons we've learnt from v1. The db in v1, in truth, doesn't do a lot, other than represent bitemporal coordinates. It would be fairly straightforward to recreate this in user-space for v2, or to make a thin wrapper, that wraps the core API.

jonpither08:05:55

@U051SPP9Z You could hopefully feel safe to some degree, by supplying the transaction time as part of the query, to get read consistent guarantees