#xtdb
2021-09-27
mac08:09:32

I don't understand why the two queries below return different results. I would have expected both to return the result of the first, #{[1]}, but the second returns #{[nil]}.

(let [concerns-interval (t/interval "2020-02-29T23:00:00Z/2020-03-31T23:00:00Z")]
    (crux/q
     (crux/db node)
     '{:find [(count ?e)]
       :in [[?concerns-interval]]
       :where [[?e :request-type :error]
               [?e :error-category 2]
               [?e :time ?time]
               [(java-time/contains? ?concerns-interval ?time)]]}
     [concerns-interval]))

(let [concerns-interval (t/interval "2020-02-29T23:00:00Z/2020-03-31T23:00:00Z")]
    (crux/q
     (crux/db node)
     '{:find [e2]
       :where [[(q {:find [(count ?e)]
                    :in [[?concerns-interval]]
                    :where [[?e :request-type :error]
                            [?e :error-category 2]
                            [?e :time ?time]
                            [(java-time/contains? ?concerns-interval ?time)]]})
                e2]]}
     [concerns-interval]))

tatut10:09:58

it looks like you aren't passing query args to the inner q

mac10:09:25

It does not complain, and it makes no difference if I comment it out. If I pass it explicitly I get an error.

tatut10:09:53

the outer q doesn't specify any parameters

mac10:09:54

I see what you mean. If I do the below, it fails with: No implementation of method: :contains? of protocol: #'java-time.interval/AnyInterval found for class: clojure.lang.Symbol

(let [concerns-interval (t/interval "2020-02-29T23:00:00Z/2020-03-31T23:00:00Z")]
    (crux/q
     (crux/db node)
     '{:find [e2]
       :where [[(q {:find [(count ?e)]
                    :in [[?concerns-interval]]
                    :where [[?e :request-type :error]
                            [?e :error-category 2]
                            [?e :time ?time]
                            [(java-time/contains? ?concerns-interval ?time)]]}
                   [concerns-interval])
                e2]]}))

Hukka10:09:05

I'm just starting to use xtdb myself, but in this latter example the [concerns-interval] is inside the quote

Hukka10:09:13

While in the topmost one it gets evaluated
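
(Side note on Hukka's observation: inside a quoted form nothing is evaluated, so the inner q call would receive the literal symbol rather than the interval value, which is consistent with the "found for class: clojure.lang.Symbol" error above. A minimal REPL sketch, just to illustrate quoting, nothing XTDB-specific:)

;; everything inside a quoted form stays as literal data, so
;; `concerns-interval` remains a symbol instead of the interval value
(type (first '[concerns-interval]))
;; => clojure.lang.Symbol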

tatut10:09:24

you need to specify the parameter in the outer with :in as well and pass it to the inner one

mac10:09:11

The below fails with the same error. Am I missing something?

(let [concerns-interval (t/interval "2020-02-29T23:00:00Z/2020-03-31T23:00:00Z")]
    (crux/q
     (crux/db node)
     '{:find [e2]
       :in [[?concerns-interval]]
       :where [[(q {:find [(count ?e)]
                    :in [[?concerns-interval]]
                    :where [[?e :request-type :error]
                            [?e :error-category 2]
                            [?e :time ?time]
                            [(java-time/contains? ?concerns-interval ?time)]]}
                   [?concerns-interval])
                e2]]}
     [concerns-interval]))

mac10:09:48

@U8ZQ1J1RR I noticed that too, but that is how sub-queries are done in the documentation, so I followed that.

Hukka10:09:51

For what it's worth, I already marked this thread as followed to see the answers, before there were any answers. Seems like I don't have a clue either!

tatut10:09:19

(xt/submit-tx node [[::xt/put {:xt/id "test" :meaning-of-life 42}]])

tatut10:09:39

(xt/q (xt/db node)
      '{:find [subq]
        :where [[(q {:find [(count ?e)]
                     :where [[?e :meaning-of-life ?answer]]
                     :in [?answer]}
                    ?outer-answer)
                 subq]]
        :in [?outer-answer]}
      42)
;; => #{[([1])]}

tatut10:09:00

that at least works how I expect it to

👍 1
mac11:09:13

@U11SJ6Q0K Thanks, I must have made a destructuring mistake somewhere. It works now.

🙌 1
🙏 1
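
(For reference: applying tatut's pattern above to the original query, a working version would presumably use plain scalar :in bindings and pass the value straight through to the inner q. A sketch only, not run against mac's actual data:)

(let [concerns-interval (t/interval "2020-02-29T23:00:00Z/2020-03-31T23:00:00Z")]
  (crux/q
   (crux/db node)
   '{:find [e2]
     :in [?concerns-interval]               ; scalar binding, no tuple destructuring
     :where [[(q {:find [(count ?e)]
                  :in [?concerns-interval]  ; scalar binding here as well
                  :where [[?e :request-type :error]
                          [?e :error-category 2]
                          [?e :time ?time]
                          [(java-time/contains? ?concerns-interval ?time)]]}
                 ?concerns-interval)        ; pass the outer binding to the subquery
               e2]]}
   concerns-interval))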
ajones16:09:00

Quick question for those more knowledgeable than I am: is XTDB subject to performance degradation similar to Datomic's when the transaction count gets really high (over 10 billion)? Been trying to find the answer lately, but haven't had much luck yet. Thanks in advance https://ask.datomic.com/index.php/403/what-is-the-size-limit-of-a-datomic-cloud-database?show=412#a412

tatut16:09:29

I'm not that knowledgeable either, but I was just thinking about similar things (what constitutes a "big" database in xtdb? is it millions of documents, or more?)... as the system is unbundled, I think different parts will have very different operational parameters for what is big and at what point they become slow

tatut16:09:24

One aspect I think would be "what is a big tx log / document store" and another "what is a big local index store for LMDB/RocksDB to handle"

tatut16:09:55

Haven't done any benchmarks myself, but I'm not really worried as I'm working at the "everything should fit in memory on a large instance" scale of data

jonpither16:09:43

the laziness of queries helps 🙂 Also the underlying power of Rocks to hold data

refset17:09:42

> different parts will have very different operational parameters

This is definitely true.

> "what is a big local index store for LMDB/RocksDB to handle"

This is probably the most important specific question. Neither LMDB nor RocksDB has hard practical limits, and actual performance will really depend on the hardware. RocksDB is faster for ingestion though, and there is much more experience of using it at scale in the wild, e.g. here the team at CockroachDB discuss loading 5TB into RocksDB, which takes "6.5-10h for the import job to finish, after which it would be another 3-6h for compaction activity to quieten down" (the reasons for switching to their homegrown Rocks-compatible KV store are interesting, but definitely orthogonal to this discussion): https://www.cockroachlabs.com/blog/bulk-data-import/

On the query side, both KV stores handle reads at scale very impressively (though LMDB is faster), and as Jon mentioned, XT's query engine is uniquely able to lazily process data out of the KV indexes, which avoids any need for giant in-memory caches just to process basic queries as your data set expands into the TBs. That said, more memory is always useful!

refset17:09:32

I should add that XTDB itself has no hard limitations either, and we/I would be happy to help you figure out some back-of-the-envelope idea of performance for your use-case & predicted data volumes

ajones20:09:13

Thanks for all the info, I'll dig some more into what you all provided and take it back to share with my team. Yeah, the sheer amount of data and the transactions associated with it is the big thing we're trying to figure out, trying to get a better understanding of what our options are. Again, thanks for the help. @U899JBRPF thanks, I'll keep that in mind

🙏 1
Steven Deobald17:09:10

It probably goes without saying, but if you have a hygienic sample data set, it could be interesting to push xt 1.x with a spike (which we could potentially help you with) to see if it's appropriate for your problem space, if the envelope calculations look sane. XT 2.x is on a trajectory to serve really behemoth data sets, but for those of us (read: me) who are using xt 1.x for its graph capabilities on relatively small data sets, it would be informative to see, empirically, how far xt 1.x can go.
