xtdb 2021-11-15 | Slack Archive

Can someone please help me understand the intuition behind xtdb's concept of a node.

👍 1

1. A node is an xtdb.api instance, defined as a Clojure record, that allows you to submit transactions and/or run queries, see https://github.com/xtdb/xtdb/blob/master/core/src/xtdb/node.clj#L88. 2. A node may embed a transaction-log and document-store, which default to embedded in-memory KV backends, but for general production usage both the transaction-log and document-store are durable, high-available & isolated services that run outside of the nodes. 3. Nodes are fully isolated from each other, asides from shared access to the transaction-log and document-stored 4. Querying requires each node to maintain a materialized view of the transaction log in an index-store, which also defaults to an embedded in-memory KV backend 5. The index-store is always local to a node (unless you want to try something funky like https://github.com/crux-labs/crux-redis)

refset14:11:43

refset14:11:52

If you want more explanation still, I covered the architecture diagram and the topology options slide above on a meetup call earlier in the year https://youtu.be/wfaExQp9kes?t=2131 Hope that helps 🙂

refset15:11:19

sheluchin13:11:50

I'm running into an issue getting checkpoint restores to work:

I 2021-11-15T13:08:52.826Z roterski.fulcro.rad.database-adapters.xtdb.start-databases:- 74 - Starting database  :main
I 2021-11-15T13:08:52.832Z                          xtdb.checkpoint:-326 - restoring from {:xtdb.checkpoint/cp-format {:index-version 18, :xtdb.mem-kv/version "1"}, :tx #:xtdb.api{:tx-time #inst "2021-11-15T12:41:37.904-00:00", :tx-id 1}, :xtdb.checkpoint/cp-path #object[sun.nio.fs.UnixPath 0x3a8b2a9c "/home/sheluchin/repos/prj/.backups/xtdb/checkpoint-1-2021-11-15T13:04:55.572-00:00"], :xtdb.checkpoint/checkpoint-at #inst "2021-11-15T13:04:55.572-00:00"} to /tmp/memkv-cp7609197928618161464
Execution error (IllegalArgumentException) at xtdb.checkpoint.FileSystemCheckpointStore/download_checkpoint (checkpoint.clj:144).
Don't know how to create ISeq from: java.util.stream.ReferencePipeline$Head

My config:

{:main
  #:xtdb{:index-store
         {:kv-store
          {:xtdb/module "xtdb.mem-kv/->kv-store",
           :checkpointer
           {:xtdb/module "xtdb.checkpoint/->checkpointer",
            :store
            {:xtdb/module
             "xtdb.checkpoint/->filesystem-checkpoint-store",
             :path
             "/home/sheluchin/repos/prj/.backups/xtdb/"},
            :approx-frequency
            #object[java.time.Duration 0x59673c41 "PT5M"]}}}}}

And I confirmed that the checkpoint files are available:

$ tree .backups/xtdb/
.backups/xtdb/
├── checkpoint-1-2021-11-15T12:42:30.602-00:00
│   └── memkv
├── checkpoint-1-2021-11-15T12:42:30.602-00:00.edn
├── checkpoint-1-2021-11-15T12:49:12.562-00:00
│   └── memkv
├── checkpoint-1-2021-11-15T12:49:12.562-00:00.edn
├── checkpoint-1-2021-11-15T12:52:40.571-00:00
│   └── memkv
├── checkpoint-1-2021-11-15T12:52:40.571-00:00.edn
...

The config gets put through symbolize-xtdb-modules so the module strings should be okay there.

refset14:11:44

Hi @UPWHQK562 I think you have hit a bug, as we had a similar report a few weeks ago: https://juxt-oss.zulipchat.com/#narrow/stream/194466-crux-users/topic/Checkpointer.20restoring.20failed There's already a note about this on the project about it but I'll convert it to an issue now. The workaround is to use Rocks / LMDB instead, and memkv is not really intended for production usage anyway, unless you are very confident you'll never hit an OOM situation 🙂

refset14:11:43

https://github.com/xtdb/xtdb/issues/1661

sheluchin14:11:23

Hi @U899JBRPF. Sounds good, thanks for logging the issue. I wonder if this is related https://github.com/xtdb/xtdb/issues/1657

sheluchin14:11:26

I've tried to stick to just using the in-memory db for now because it'll be some time before I'm production ready, xtdb is very new to me, and I want to keep the learning surface to a minimum while I get familiar with things... but I suppose this limitation is enough reason to ramp up to Rocks / LMDB.

👍 1

refset14:11:00

> I wonder if this is related Hmm, certainly could be, I haven't dug into the possible causes of either just yet though(!)

refset14:11:51

Thanks for mentioning this problem, I hope the rest of your journey is smoother 🙂

sheluchin14:11:25

No worries, thanks for the help 🙂

🙏 1

sheluchin14:11:52

I'm surprised to see no mentions of Redis anywhere. I thought that was kinda the KV DB these days.

refset14:11:30

...did you see my reply on the other thread, or is this a coincidence? 😅 (https://clojurians.slack.com/archives/CG3AM2F7V/p1636985043183200?thread_ts=1636948349.181000&cid=CG3AM2F7V)

sheluchin14:11:19

Pure coincidence. I have a few years experience using Redis with Django in a past life.

refset15:11:11

Haha, weird. The short summary is that Redis is in a separate process, probably over the network, and that is far from optimal from a query point-of-view as XT's Datalog engine is quite "chatty"

refset15:11:10

The proof-of-concept module I linked to above did seem to work, but was on the order or 3-5x slower than Rocks. Still, for some architectures and cost-performance profiles it might be a viable idea

refset15:11:00

I was quite keen to explore the combination of Redis+Lambda at the time for awe inspiring horizontal scaling, but didn't get too far. Also since that time https://upstash.com/ has come along and made the idea even more viable (cheaper!)

sheluchin15:11:05

Hmm, haven't heard of upstash. So because Redis is a remote dictionary service it makes it less ideal for XT because of the amount of chatter the Datalog engine produces? But your second point in that thread mentions that in production environments, the transaction log and docstore are likely to be isolated, which is what I'd expect. Would Redis be as good of an option for those modules, while still being suboptimal for the index store? Does having to co-locate the index store with the node lead to optimization difficulties down the road?

sheluchin15:11:54

I'm going to check out that video you linked in the other thread during some downtime. Maybe that will answer some of these questions.

🙂 1

refset15:11:05

> So because Redis is a remote dictionary service it makes it less ideal for XT because of the amount of chatter the Datalog engine produces? ✅ > the transaction log and docstore [...] Would Redis be as good of an option for those modules Redis isn't typically deployed with ACID durability in mind, but yes, it could certainly be made to work, see http://oldblog.antirez.com/post/redis-persistence-demystified.html

refset15:11:23

> Does having to co-locate the index store with the node lead to optimization difficulties down the road? Only in the sense of horizontal scale-out being ~expensive if you have huge+fast disk requirements, and also the time to spin up a new node becomes dependent on bandwidth to the checkpoint store as the database size grows. On the plus side, the cold-query latency time should be much better than a fully remote index design (where chunks need to be downloaded on-the-fly)

👍 1

refset15:11:19

As with any database, it's usually more cost-effective to scale-up before you scale-out

sheluchin16:11:09

Is there something extra required for shutdown when using RocksDB? I'm shutting it down with the usual (for [node xtdb-nodes] (.close node)) but when trying to start it back up I get:

I 2021-11-15T16:24:42.329Z                    _fulcro.server.config:- 54 - Reading configuration file at  config/defaults.edn
I 2021-11-15T16:24:42.331Z                    _fulcro.server.config:- 54 - Reading configuration file at  config/dev.edn
I 2021-11-15T16:24:42.332Z                    prj.components.config:- 14 - Loading config config/dev.edn
I 2021-11-15T16:24:42.333Z roterski.fulcro.rad.database-adapters.xtdb.start-databases:- 74 - Starting database  :main
Execution error (RocksDBException) at org.rocksdb.RocksDB/open (RocksDB.java:-2).  lock hold by current process, acquire
time 1636993287 acquiring thread 139768984753728: /home/sheluchin/repos/prj/tmp/rocksdb/LOCK: No locks available

sheluchin16:11:42

Using this config:

{:main
  #:xtdb{:index-store
         {:kv-store
          {:xtdb/module "xtdb.rocksdb/->kv-store",
           :db-dir
           "/home/sheluchin/repos/prj/tmp/rocksdb",
           :checkpointer
           {:xtdb/module "xtdb.checkpoint/->checkpointer",
            :store
            {:xtdb/module
             "xtdb.checkpoint/->filesystem-checkpoint-store",
             :path
             "/home/sheluchin/repos/prj/.backups/xtdb/"},
            :approx-frequency "PT5M"}}}}}

refset16:11:08

Hmm, is the lazy-seq returned by the for definitely being consumed? .close should be synchronous and should do the job

sheluchin13:11:27

Thanks for the help yesterday @U899JBRPF. Got it all sorted now.

🙌 1

refset13:11:34

Awesome, glad to hear & thanks in turn for confirming / following up!

Tomas Brejla20:11:30

Hello again. I believe it might have already been discussed in the past, but I'm unable to find any material that would answer my question, so I'll ask it again. I have a 100000 tiny documents in my db (let's say in a single in-memory node or single rocksdb/rockdb node), querying for count of such items is quite slow on my machine (2.4s for in-memory node, 3.4s for rocksdb one). Is this normal behavior or it's just my machine? Is there any simple way of improving the performance of count queries? If so, which one? I believe I could transact a document with "row-count" of the "entity type" I'm trying to count - and update it whenever I'm somehow modifying (inserting/removing) my documents.. But not only would that probably be quite cumbersome (and possibly error-prone), but it wouldn't work for queries with more dynamic nature (with additional conditions in :where clause).

xlfe22:11:32

depending on exactly how you are counting (could you share your query?) you might be doing a "full table scan". but 100k documents isn't big and so that does strike me as slow. on my laptop with on disk lmdb I think counting 400k documents takes approx 1s so would expect 100k to be < 250ms

Tomas Brejla23:11:37

(ns scratch
  (:require [xtdb.api :as xt]))

(comment
  (defn insert-and-query-count [node entity-type]
    (println "transacting test data")
    (time
     (xt/await-tx
      node
      (xt/submit-tx
       node
       (->> (range 100000)
            (mapv #(-> [::xt/put {:xt/id (java.util.UUID/randomUUID)
                                  :entity-type entity-type 
                                  :item-number %}]))))))
    (println "querying count")
    (time
     (xt/q
      (xt/db node)
      {:find '[(count ?e)]
       :where [['?e :entity-type entity-type]]})))

;; against in-memory throw-away node
  (with-open [node (xt/start-node {})]
    (insert-and-query-count node :test-entity))
;; transacting test data
;; "Elapsed time: 13896.30067 msecs"
;; querying count
;; "Elapsed time: 2400.679209 msecs"
;; #{[100000]}

;; against running node in integrant system (node using rocksdb for both index and doc store, storing files to local SSD folder)
  (insert-and-query-count (:ordering/xtdb-node (user/system)) :test-entity)
;; transacting test data
;; "Elapsed time: 18236.140874 msecs"
;; querying count
;; "Elapsed time: 3430.865978 msecs"
;; #{[100000]}
  )

Tomas Brejla23:11:01

> on exactly how you are counting (could you share your query?) you might be doing a "full table scan" @U01ENMKTW0J So there's multiple approaches when doing such count query and I might somehow avoid the "full table scan"? 🤞

refset00:11:51

Hi, see attached for what your in-memory query looks like in my profiler 🙂 The very brief summary is that XT has to do a non-trivial (and perhaps surprising) amount of transformation and caching work to scan through each AVE index entry and resolve the triple clause [?e :entity-type :test-entity] - this is due to various aspects of the overall index design, which is primarily optimised for ad-hoc, bitemporal, multi-way join graph queries (as in https://docs.xtdb.com/resources/performance/#watdiv) ...as opposed to "current time" column-oriented operations (like counting). Unfortunately therefore, I don't think there is an easy speedup available for your scenario other than to compute and store the count manually in some fashion, e.g. using transaction functions, or by maintaining a secondary index of counts / current-time values (see the secondary index integration mechanism that the Lucene module uses)

xlfe01:11:25

I'm getting 145k / second (counting 1.6m docs based on attr=val in ~11 seconds) on LMDB on my laptop using a count of the AV index directly rather than through xt/q, and (for our use case) we added a memoize so the full table count is cached

Tomas Brejla09:11:36

I was kinda expecting that what @U899JBRPF wrote might be a reality and there won't be much to do with it at the moment. I wonder.. isn't this a problem for people "in the trenches" (having production systems with loads of data)? I'd expect that I'd probably have a need for fast-enough count operation relatively soon on a real-world project. For example I can imagine having a pageable listing where I want to indicate what's the total number of items matching the current user-specified filter. I guess I'd be "fine" (it would be tolerable) with showing a generic information such as "More than 1000 items" in the case of mentioned pageable listing. Which brings me to another question (sorry, I'm simply very curious 🙂)... How is count supposed to work with :limit? I tried

:find '[(count ?e)]
:limit 10

but it is still as slow as without the :limit clause and returns [[100000]] . Is this expected behavior? Is there better way of "performing count, but only to some max limit"? If i return just ?e instead of (count ?e) , then the :limit kicks in correctly of course. So in fact it's possible to perform an ID-only search with limit instead of count with limit and then count the ids manually from the result. This is fast enough (21ms for 1000 items).

refset12:11:15

the count aggregate inherently forces all intermediate result tuples to fully materialized (though the aggregate itself is lazy, so the memory overheads here should stay low) I think you want to do something like this example, where a lazy open-q is embedded within a custom function and subjected to a userspace limit: https://gist.github.com/refset/a61a89cfb6bd505ed306a18a9132c62f > a pageable listing where I want to indicate what's the total number of items matching the current user-specified filter This kind of requirement is probably much better served by a secondary search index, potentially using the Lucene full-text search module as a foundation

refset12:11:57

> I'm getting 145k / second (counting 1.6m docs based on attr=val in ~11 seconds) on LMDB on my laptop using a count of the AV index directly rather than through xt/q, and (for our use case) we added a memoize so the full table count is cached that's certainly snappier 🙂 are you also applying the entity-resolver (temporal validity) check?

xlfe22:11:07

@U899JBRPF yes (i beleive so!) the reverse lookup looks like

(defn index-ave                                                                                          
    "reverse index"                                                                                        
    [{:keys [db is]} eid attr]                                                                             
    {:pre [(some? db) (some? is)]}                                                                         
    (doall                                                                                                 
      (map                                                                                                 
        #(xtdb.db/decode-value is %)                                                                       
        (xtdb.db/ave is attr eid nil (:entity-resolver-fn db)))))

👌 1

xlfe22:11:01

I assume some of the kv backends probably have more efficient ways of counting distinct index values, but I haven't dug in to have a look

Tomas Brejla11:11:37

I tried the approach mentioned by @U01ENMKTW0J (using xtdb.db/ave directly). Counting 1 000 000 items by their attribute (`:entity-type :benchmark-1000000` ) takes around 27 seconds for my rocksdb configuration. And the performance of regular :where [[?e :entity-type :benchmark-1000000]] is pretty much the same :thinking_face: I also tried lmdb and the same took 12 seconds. Then the same query dropped to 5-7s after being executed multiple times - possibly some caches got pre-filled . And again, similar time for both db/q and xtdb/ave .

refset01:11:49

Interesting, thanks for keeping us posted with findings. It doesn't surprise me that the raw access approach isn't any faster that q in your case, since the query should compile to essentially the same code. Is it still too slow for your needs?

Tomas Brejla14:11:57

> Is it still too slow for your needs? I guess it's fine (-ish). I'm still just learning the basics of XTDB, so it's nothing that would limit me at the moment. I was just a bit surprised when I was examining data in my DB via REPL and the evaluation of count query took "forever" and appeared to be stuck. Instinctively, I wouldn't expect count operation to take that long (10s of seconds) for DB containing just 1 million docs (and nothing else, no huge version-history etc). But if this is simply inevitable due to the different structure of (bi-temporal) indexes, I'll have to accept it and find different, faster ways of achieving similar acceptable functionality. I guess counting with "max-count-limit" (and showing generic information such as "more than 10k items") might be acceptable workaround for use cases in my head. If nothing else, now I at least know that I should always be careful when using count (or any other aggregate functions, I guess) in XTDB. Thinking out loud.. should some information / warning about count's performance be described in the official docs in https://docs.xtdb.com/language-reference/datalog-queries/#find-aggregate or https://docs.xtdb.com/resources/performance?

👍 1

refset19:11:16

> should some information / warning about `count`'s performance be described in the official docs adding a note of caution to the aggregates section is a very resaonable suggestion, thanks for the prompt 👌 (I've added a note to the project board)

👍 1

Tomas Brejla23:11:17

When I have some spare time here and there, I occasionally try to explore different ways of doing more effective count-ing. Today I found that there is an

(defprotocol AttributeStats
  (all-attrs [this])
  (doc-count [this attr])
  (^double value-cardinality [this attr])
  (^double eid-cardinality [this attr]))

...and realized that instead of defining my docs as :entity-type :order I might use :entity-type/order nil and then use

(with-open [db (xt/open-db node)
                 is (:index-snapshot db)]
       (xdb/doc-count is :entity-type/order))

Which seems to yield correct numbers and is ultra-fast. More like.. WOW IT'S FAST! for 100k docs on lmdb: (count (_xdb_/ave is :entity-type :order nil (:entity-resolver-fn db))) 331ms, 100k items found (_xdb_/doc-count is _:entity-type_/order) 0.6ms, 100k items found Now I'm wondering... is this a safe pattern to be used for counting of items (of some type) in db without any other conditionsor just a totally dumb idea? :thinking_face:

refset23:11:46

Interestingly I've not seen that :entity-type/order nil pattern before but I like it(!) Unfortunately it may not be a complete solution though, since the attribute-stats index maintains a count (note: it's a materialized view, not computed) across all of time - so if you have multiple versions of a entity in the history then it gets counted multiple times, and deletes get ignored. Assuming you're fine with that (e.g. you're not using history, or if you are you can compensate), another caveat is that attributes in queries aren't dynamic (in XT anyway), so maybe you want to have both attributes (i.e. keep :entity-type :order also) for other querying reasons in future.

refset23:11:49

There's also the caveat that these aren't official APIs, but I think you know that 😅

🙂 1

Tomas Brejla23:11:17

Would :entity-type/order nil pattern open any other more performant alternative approaches? It looks silly at first glance, but perhaps it might allow some "non-mainstream" use of indexes for counting and other aggregate queries. And then again.. perhaps not :)

Tomas Brejla23:11:24

Too bad that the xdb/doc-count has that`across all of time nature. In doesn't feel very natural to me at the moment. At first glance I'd expect that if I feed an actual *index* *snapshot* to doc-count` , it might output the current stats - ie. of that very snapshot. Pitty there's no way to get current stats.

refset00:11:53

Snapshot in this context is more like low-level kv store read snapshot, rather than some funky temporal/persistent data structure

refset00:11:44

It's not so unthinkable to create a secondary index module which is just storing various counts, and even model them temporally somehow. Lucene has paved the way for many possibilities yet to be explored, by forcing us to get the APIs for integration figured out

❤️ 1

Eric Dvorsak09:06:16

I've encountered the same issue when trying to count 1 000 000 entries

xtdb> (time (xt/q (xt/db node)
      '{:find [(count ?user)]
        :keys [count]
        :where [[?user :role :USER]]
        :timeout 60000}))
"Elapsed time: 34808.286822 msecs"

It seems like writing secondary indexes is the way to go, but there's quite a skill gap between using xtdb and being able to understand lucene module source and use it as inspiration to build secondary indexes

refset09:06:12

> there's quite a skill gap between using xtdb and being able to understand lucene module source and use it as inspiration to build secondary indexes Noted @U03K8V573EC - I can try to cook up some example code ~soon. Alternatively, have you ruled out maintaining the counts at write time, via transaction functions? Or is that too awkward since you don't know what counts you need?

Eric Dvorsak09:06:16

It seems like in @U01LFP3LA6P case if display "More than 1000 items" is an acceptable trade-off instead of the full count the simplest solution is not using count in the query but on the results:

xtdb> (time (count (xt/q (xt/db node)
                            '{:find [?user]
        :where [[?user :role :USER]] :limit 1001
        :timeout 60000})))
"Elapsed time: 47.673839 msecs"
1001

✔️ 1

Tomas Brejla09:06:04

regarding secondary indexes -> that would probably work only for situations where you know in advance, what criteria (where conditions) you're going to use in the future, right?

Tomas Brejla09:06:43

> - I can try to cook up some example code ~soon. That would be super-nice. I agree with Eric that there's quite a skill gap between using xtdb and being able to understand lucene module source and use it as inspiration to build secondary indexes. Well said.

👍 1

refset10:06:39

> that would probably work only for situations where you know in advance, what criteria (where conditions) you're going to use in the future you'd have to rebuild the indexes by replaying the log, but the definitions of those indexes could change without having to migrate the main data/schema

Eric Dvorsak10:06:07

> Alternatively, have you ruled out maintaining the counts at write time, via transaction functions? there's quite a few cases where that would be a possibility indeed. here I was just trying to get the number of users in the db, it was ad-hoc. but I can think of many cases where given a sufficiently big dataset and insufficient discriminating criteria queries can't perform in a reasonable timing without secondary indexes. For example with 4million users, if I want the last 10 created users:

(time  (xt/q (xt/db node)
                            '{:find [(pull ?user [*]) created-at]
        :where [[?user :role :USER] [?user :created-at created-at]] :order-by [[created-at :desc]] :limit 10
        :timeout 60000}))
Execution error (TimeoutException) at xtdb.query.QueryDatasource/q_STAR_ (query.clj:2006).
Query timed out.

👍 1

refset11:06:35

for simple enough order-by queries, ideally the query engine would use the available index, as per https://github.com/xtdb/xtdb/issues/1515 Short of that option, you may want to read this discussion https://github.com/xtdb/xtdb/discussions/1514

Eric Dvorsak11:06:06

thanks I was actually reading through these already 🙂

😄 1

Eric Dvorsak09:09:19

@U899JBRPF https://clojurians.slack.com/archives/CG3AM2F7V/p1656494172596649?thread_ts=1637009670.198600&cid=CG3AM2F7V Did you get a chance to cook up an example of secondary index?

refset09:09:30

Hey Eric, sadly not... happy to help you navigate the APIs though 🙂

2021-11-15

Channels