#xtdb
2023-05-17
Petrus Theron 08:05:44

The following v1 query consistently causes the JVM to segfault:

(with-open [res (x/open-q (x/db node)
                    {:find  '[?date ?tx]
                     :in    '[from to ?needle parent-account],
                     :rules child-rules
                     :where '[[?tx :tx/date ?date]
                              [?entry :entry/tx ?tx]
                              [(>= ?date from)]
                              [(< ?date to)]
                              [?entry :entry/account ?account]
                              (or
                                [(= ?account parent-account)]
                                (child-of ?account parent-account))]}
                    #time/date "2000-01-01"
                    #time/date "2030-01-01"
                    nil :root)]
    (take 150 (drop 0 (iterator-seq res))))
where :tx/date is stored as a LocalDate. This does not happen with plain old x/q, and if I add an (into [] …) in front of the (take 150 …), it works just fine:
(with-open [res (x/open-q (x/db node)
                    {:find  '[?date ?tx]
                     :in    '[from to ?needle parent-account]
                     :rules child-rules
                     :where '[[?tx :tx/date ?date]
                              [?entry :entry/tx ?tx]
                              [(>= ?date from)]
                              [(< ?date to)]
                              [?entry :entry/account ?account]
                              (or
                                [(= ?account parent-account)]
                                (child-of ?account parent-account))]}
                    #time/date "2000-01-01"
                    #time/date "2030-01-01"
                    nil :root)]
    (into [] (take 150 (drop 0 (iterator-seq res)))))

tatut 08:05:06

try wrapping doall around take

tatut 08:05:17

as I think your lazy seq is escaping the with-open
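
To illustrate the "lazy seq escaping the with-open" point without needing an XTDB node, here is a minimal sketch in plain Clojure. The `open-reader` helper is a hypothetical stand-in for `x/open-q`: it only shares the property of returning a closeable resource that is consumed lazily. With a JVM reader the failure is an ordinary exception; with a native-backed cursor the same mistake may surface differently.

```clojure
(import '(java.io BufferedReader StringReader))

(defn open-reader
  "Hypothetical stand-in for x/open-q: a closeable, lazily consumed source."
  []
  (BufferedReader. (StringReader. "a\nb\nc")))

;; Lazy: `take` returns an unrealized seq, so with-open closes the reader
;; before anything reads past the first line.
(def escaped
  (with-open [r (open-reader)]
    (take 2 (line-seq r))))

;; Realizing the seq afterwards touches the closed reader and throws.
(def escaped-result
  (try
    (vec escaped)
    (catch Exception _ :failed-after-close)))

;; Eager: doall forces the seq while the reader is still open.
(def realized
  (with-open [r (open-reader)]
    (doall (take 2 (line-seq r)))))
```

Applied to the query above, this corresponds to writing `(doall (take 150 (iterator-seq res)))` (or the `(into [] …)` variant) inside the with-open body.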

thomas 09:05:05

But it still shouldn't segfault the JVM. That is also a JVM bug IMHO. (And no idea what the workaround is.)

tatut 09:05:50

problem is that rocksdb/lmdb are native components called in the JVM, and those might crash and take the JVM with them… it is usually easy to avoid by being careful with laziness, but I agree that it shouldn’t happen

thomas 09:05:00

aah ok, that could be the case of course. good point. (But still not ideal that it takes down the JVM with it, so could be a bug in the native component)

Petrus Theron 17:05:33

@taylor.jeremydavid surprisingly, the order of results does not seem to be stable when calling x/open-q with a Lucene text-search clause, even with date range predicates, e.g.:

{:find [?tx ?date],
 :in [needle start-date end-date],
 :where [[(text-search :tx/description needle) [[?tx _ _]]]
         [?tx :tx/date ?date]
         [(>= ?date start-date)]
         [(< ?date end-date)]]}
If I take out text-search, the result is stable. I’ve tried changing the position of the clauses, but that does not change the result. I had to move the text-search to a subquery to get stable results. As I understand it, I can’t use :order-by because then queries will be fully materialized and will be too slow. I tried it, and it does indeed significantly slow down my query, even with open-q. I would really like an API that gives me direct access to indices that will always yield a lazy seq of sorted indexed values. I can build my own queries, but can’t have search order changing unexpectedly.

refset 18:05:56

interesting, I don't think I've attempted to combine those two things before. text-search is inherently non-lazy though, so if it somehow works to wrap that in a subquery then it's okay, because the performance impact of doing that should be limited

refset 18:05:06

> Would really like an API that gives me direct access to indices that will always yield a lazy seq of sorted indexed values. I can build my own queries, but can’t have search order changing unexpectedly.

Depending on how advanced your requirements are the best solution currently may be to build up some external secondary-index-like structures somewhere. Like the way the Lucene module hooks in to the indexing + checkpointing process hints at what's possible. In the longer term though the aim is to make sure the query language is expressive enough and the query engine powerful enough that you don't feel you need to write your own bespoke query layer. Maybe the answer is to offer more control over the query optimisation, rather than exposing underlying index APIs.
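
As a rough idea of what an "external secondary-index-like structure" could look like, the sketch below keeps an in-memory sorted map from `:tx/date` to transaction ids and range-scans it with `subseq`, which always yields entries in key order. The names `index-tx` and `txs-between` are hypothetical, and nothing here is hooked into XTDB's indexing or checkpointing; keeping such a structure in sync with the node is left open.

```clojure
(import '(java.time LocalDate))

(defn index-tx
  "Adds a document's :xt/id under its :tx/date in the sorted map."
  [idx doc]
  (update idx (:tx/date doc) (fnil conj (sorted-set)) (:xt/id doc)))

(defn txs-between
  "Seq of [date id] pairs with from <= date < to, in ascending date order."
  [idx from to]
  (for [[date ids] (subseq idx >= from < to)
        id ids]
    [date id]))

;; Example data, inserted out of order on purpose.
(def idx
  (reduce index-tx
          (sorted-map)
          [{:xt/id :tx3 :tx/date (LocalDate/parse "2023-03-01")}
           {:xt/id :tx1 :tx/date (LocalDate/parse "2023-01-05")}
           {:xt/id :tx2 :tx/date (LocalDate/parse "2023-01-05")}]))

;; All transactions dated in January 2023, sorted by date and then id.
(def january
  (txs-between idx
               (LocalDate/parse "2023-01-01")
               (LocalDate/parse "2023-02-01")))
```

The resulting ids could then be passed into a query via `:in` to fetch the remaining attributes, so the ordering comes from the sorted map rather than from the planned join order.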

Petrus Theron 23:05:38

Is there a difference between :args and :in other than map vs positional? Not documented: https://docs.xtdb.com/language-reference/datalog-queries/#in

refset 23:05:58

:args is essentially deprecated and is much more simplistic because it can't generate relations, so definitely use :in

👍 2
refset 23:05:18

Where did you see it mentioned?

refset 11:05:52

Ah, thanks, good to know

Petrus Theron 23:05:44

In what order will open-q return results if I specify multiple range predicates?

refset 23:05:59

The lazy result ordering is undefined and shouldn't be depended on. Essentially it is very sensitive to the planned join order, which is unpredictable for anything but the most trivial queries