Ben Sless08:10:45

How flexible can topologies configurations be? Can I run nodes optimized for read and others for persistence? Can I log to two logs at the same time? Anything I can do to guarantee consistency or is it not a design goal, having no transactor?


consistency is part of the tx-log

Ben Sless09:10:21

Transactions are written before updates to indices and storage, then?


yeah, tx-log and doc store are the golden stores; indexes on nodes can always be recreated if necessary

Ben Sless09:10:52

And it's possible to run different store backends on different nodes to suit the use case?


tx and doc stores are shared but indices are node-specific. I guess you could use e.g. RocksDB for indices in one node and LMDB for indices in another, and perhaps get better query speeds from LMDB and faster sync from RocksDB?

✔️ 1
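[Editor's note] As a sketch of that split (XTDB 1.x module symbols; the `db-dir` paths are placeholders), the index store is the per-node piece you would vary, while the tx-log and document-store entries must point at the same shared backend on every node:

```clojure
;; Per-node index store: RocksDB on one node...
{:xtdb/index-store {:kv-store {:xtdb/module 'xtdb.rocksdb/->kv-store
                               :db-dir "/var/xtdb/rocks-indexes"}}}

;; ...and LMDB on another. Both nodes keep identical :xtdb/tx-log and
;; :xtdb/document-store configuration (e.g. Kafka- or JDBC-backed) so
;; they index the same golden stores.
{:xtdb/index-store {:kv-store {:xtdb/module 'xtdb.lmdb/->kv-store
                               :db-dir "/var/xtdb/lmdb-indexes"}}}
```

These are config fragments, not complete node configs: a node-local KV store cannot itself be shared between processes, which is why the golden stores need a genuinely shared backend.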
Ben Sless10:10:40

That's along the lines of what I was wondering was possible. In theory, I might want different storage backends serving different instances in the system


you could... I'm not convinced that you should, though.


sounds like a maintenance headache to have different nodes with completely different indexing backends

Ben Sless10:10:36

I have some nodes that have to be quick and responsive, some that do not. Besides sharing a universal transaction log I don't see a reason why they should share anything else


every node will still index every transaction, so you can't specialize a node to a subset of the information... although nodes will have different hot indexes if you route different types of queries to them

Ben Sless10:10:57

The assumption is that some nodes will be query-intensive while others will be write intensive, or just slightly boring

Ben Sless10:10:37

I might want the index to be in memory and the documents to be stored in RocksDB or LMDB for the query-intensive nodes, but for the writing nodes I don't care as much; they can store to S3 and index to whatever is cheap and easy to maintain


docs and tx stores are always shared


but it can have a local doc cache afaict

Ben Sless10:10:39

ah, that's what I missed. No way to share only the tx stores?


I currently have the docs+tx in postgresql, and local indexes in lmdb (and lucene) in a system I'm working on

Ben Sless10:10:22

was it a design decision or just a current limitation?


by design: I know PostgreSQL very well and it is easy to run in a managed and backed-up way in the cloud... so that's a good place to store the most critical golden stores
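[Editor's note] A sketch of that layout (XTDB 1.x module names from the xtdb-jdbc and xtdb-lmdb modules; the host, credentials, and paths are placeholders): PostgreSQL holds the golden tx-log and document store, while the node's own index store runs on LMDB. The Lucene module mentioned above is omitted here.

```clojure
(require '[xtdb.api :as xt])

;; Golden stores (tx-log + documents) in PostgreSQL via the JDBC
;; module; node-local indexes in LMDB.
(def node
  (xt/start-node
   {:xtdb.jdbc/connection-pool
    {:dialect {:xtdb/module 'xtdb.jdbc.psql/->dialect}
     :db-spec {:dbname "xtdb" :host "db.example.com"
               :user "xtdb" :password "secret"}}
    :xtdb/tx-log         {:xtdb/module 'xtdb.jdbc/->tx-log
                          :connection-pool :xtdb.jdbc/connection-pool}
    :xtdb/document-store {:xtdb/module 'xtdb.jdbc/->document-store
                          :connection-pool :xtdb.jdbc/connection-pool}
    :xtdb/index-store    {:kv-store {:xtdb/module 'xtdb.lmdb/->kv-store
                                     :db-dir "/var/xtdb/indexes"}}}))
```

Losing the LMDB directory only costs a local reindex; losing the Postgres-backed golden stores loses data, which is why they get the managed backups.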


I'm having a hard time making a query over a date range efficient. Let's say I have documents with {:start-date <local-date1> :end-date <local-date2>} (where end-date may be nil), and I want to query whether it overlaps a given start/end range (either of which may be unspecified)


the or branching for the nil case seems to get slow, a regular range query for a date attribute itself seems efficient


it seems like it works ok even without any nil check, how do range predicates work when the attribute value is nil?


Can you set up a repro and baseline queries? I'd like to take a crack at it


@U11SJ6Q0K hmm perhaps add a start-year end-year (or another type of bucket) to reduce the docs that need to be scanned?


I worked around this by representing an indefinite end as the date 9999-12-31; the document model is complex anyway, so I have versions of some attributes stored separately, "normalized" for query purposes


so I can side step the nil issue and just query that any date is within the range with two range predicates, and it seems fast

🙏 1
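[Editor's note] A sketch of that workaround (the attribute names are taken from the discussion; the query shape is illustrative): with the sentinel end date in place, "interval overlaps [q-start, q-end]" reduces to the standard test start <= q-end and end >= q-start, i.e. two plain range predicates with no nil check or `or` branch:

```clojure
(require '[xtdb.api :as xt])

;; Intervals [s, x] and [q-start, q-end] overlap iff
;; (<= s q-end) and (>= x q-start). The 9999-12-31 sentinel
;; stands in for "no end date", so x is never nil.
(def overlap-query
  '{:find [e]
    :in [q-start q-end]
    :where [[e :start-date s]
            [e :end-date x]
            [(<= s q-end)]
            [(>= x q-start)]]})

;; Usage, against a started node:
;; (xt/q (xt/db node) overlap-query
;;       (java.time.LocalDate/parse "2021-01-01")
;;       (java.time.LocalDate/parse "2021-12-31"))
```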

Ah interesting - how many documents are you querying across?


around 25k


XT on Thoughtworks tech radar: 🎉

👍 5
🎉 2

I have a weird issue with range predicates: if I add the e to the :find it always returns all the e values, seemingly without applying the predicate; if I only return the v it returns the expected number of results


so {:find [e] :in [start] :where [[e :start-date v] [(<= v start)]]} returns every e that has a :start-date


when I do {:find [v] :in [start] :where [[e :start-date v] [(<= v start)]]} it returns the expected amount


is there some issue with using LocalDate as range?


not related to LocalDate; I tried with long numbers as well, same behaviour... how do I do a query like "find all document ids where :date is larger than <input param>"?


testing with a single value using = as the predicate returns the same number of results, regardless of what the :find includes


I get that the result is a set, so what is in :find affects it; I'm trying to find a good repro of this


Hey, sorry for the delay (still catching up after a few days away!), does it work as expected if you refer to clojure.core/<= instead of <=? This won't use the indexes as intelligently, but hopefully gives the correct answer 🙂 Are you applying a limit or order-by here afterwards?


this must be user error ™️: I had something broken in my setup, as I can't reproduce it


now trying with the range predicate <= or clojure.core/<= I get the same results; the latter is just slower


Hmm! Weird, well I've made a mental note in case I hear about this again


Thanks for checking


Does evict really remove the data from the document store? If so, then it won’t be possible for a new indexer to rebuild the indices from scratch, since intermediate docs are removed.


That's correct


In the docs, it says: > In Kafka, the `doc-topic` is compacted, which enables later eviction of documents. So does XTDB indeed delete a document or not?


In the case of Kafka, the document marked for compaction is evicted (physically deleted) eventually, whenever the default background compaction process runs


@U899JBRPF The issue is that, after the document has been compacted, I then launch a new indexer. When the indexer encounters the transaction associated with the deleted document, it has no way to access the original information, so it doesn't know how to keep building the index.


transaction processing should always be able to continue; it should be able to skip over the tombstones as no-ops, so this might be a bug. Can you repro it easily? There may be an edge case not covered by our integration tests


What’s the logic of :evict? Will it find the first tx that contains the evicted eid and start reindexing from there again?


To illustrate my question with the code:

;; create entity a
(xt/submit-tx node
              [[::xt/put {:xt/id :a
                          :first-name :john}]])

;; create entity b based on a   (STEP)
(xt/submit-tx node
              [[::xt/put {:xt/id :simple-fn
                          :xt/fn '(fn [ctx eid]
                                    (let [db (xtdb.api/db ctx)
                                          entity (xtdb.api/entity db eid)]
                                      [[::xt/put (assoc entity :xt/id :b)]]))}]
               [::xt/fn :simple-fn :a]])

  ;; evict a
  (xt/submit-tx node 
                [[::xt/evict :a]])

  ;; entity b is still there.
  (xt/entity (xt/db node) :b)

  ;; if now we have a new rocksdb instance indexing from scratch, since the document of a has been evicted, how does it set the entity b
  ;; when it encounters (STEP)?


Because the arg-docs are replaced with the final tx-op statements. There is no longer a transaction function running for the new rocksdb instance, right?


Precisely! 🙂


What query do I write to get only the record with :xt/id "first"?

(with-open [node (xt/start-node {})]
    (let [t (xt/submit-tx node [[::xt/put {:xt/id "first" :ref ["bob" "sandy"]}]
                                [::xt/put {:xt/id "second" :ref ["bob" "randy"]}]
                                [::xt/put {:xt/id "third" :ref ["bob" "sandy" "randy"]}]])]
      (xt/await-tx node t)
      (xt/q (xt/db node) qy
            ["sandy" "bob"])))
with this query
(def qy '{:find [(pull e [*])]
          :in [[fname sname]]
          :where [[e :ref ?name]
                  [(get #{fname sname} ?name) valid-name]
                  [(boolean valid-name)]]})
I get no result with this query
(def qy '{:find [(pull e [*])]
          :in [[fname sname]]
          :where [[e :ref fname]
                  [e :ref sname]]})
I get both the first and third entries


Is there a way to query against the entire vector value of the :ref key, not just its members


I'm not sure why the first query doesn't return anything, but the second one's search results make sense


You can put the :ref vector in a map.

(xt/submit-tx @node [[::xt/put {:xt/id "first" :ref {:items ["bob" "sandy"]}}]
                     [::xt/put {:xt/id "second" :ref {:items ["bob" "randy"]}}]
                     [::xt/put {:xt/id "third" :ref {:items ["bob" "sandy" "randy"]}}]])
(xt/sync @node)
(q '{:find [(pull e [*])]
     :where [[e :ref ref]
             [(get ref :items) items]
             [(= items ["bob" "sandy"])]]})


Weirdly enough I couldn't get it to work with :in parameters


I need the :ref data on the base level so that it can be queried


I found a solution

(def qy '{:find [(pull e [*])]
            :in [names]
            :where [[e :ref]
                    [(get-attr e :ref) ?names]
                    [(set ?names) nms-set]
                    [(= names nms-set)]]})

🙏 1

It works if I change the query to

(xt/q (xt/db node) qy #{"sandy" "bob"})


very nice - you could also store the :ref as a set to begin with, I believe. I don't know the context of your app/data model, but it may also make sense to move the items in the ref into their own entities and then join on them


Sorry to chime in late (I'm glad you found a solution!), but the reason this function expression doesn't work is that you can't refer to logic vars within a literal set:
> [(get #{fname sname} ?name) valid-name]
Instead you can construct the set separately:
> [(hash-set fname sname) ?name-set]
> [(get ?name-set ?name) valid-name]
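[Editor's note] Applying that advice to the earlier query gives a sketch like the following (same data and :in binding as above). Note it only makes the predicate legal: it still matches any entity whose :ref contains at least one of the two names, so the get-attr exact-set comparison found earlier remains the way to return only "first":

```clojure
(def qy '{:find [(pull e [*])]
          :in [[fname sname]]
          :where [[e :ref ?name]
                  ;; build the set from the logic vars, rather than
                  ;; referring to them inside a literal #{...}
                  [(hash-set fname sname) ?name-set]
                  [(get ?name-set ?name) valid-name]
                  [(boolean valid-name)]]})

;; (xt/q (xt/db node) qy ["sandy" "bob"])
```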

