#xtdb
2021-10-27
Ben Sless08:10:45

How flexible can topology configurations be? Can I run nodes optimized for reads and others for persistence? Can I log to two logs at the same time? Is there anything I can do to guarantee consistency, or is that not a design goal, having no transactor?

tatut08:10:00

consistency is part of the tx-log

Ben Sless09:10:21

Transactions are written before updates to indices and storage, then?

tatut09:10:08

yeah, tx-log and doc store are the golden stores, indexes on nodes can always be recreated if necessary

Ben Sless09:10:52

And it's possible to run different store backends on different nodes to suit the use case?

Tuomas10:10:27

tx and doc stores are shared but indices are node specific. I guess you could use e.g. RocksDB for indices in one node and LMDB for indices in another one, and perhaps get better query speeds for LMDB and faster sync for RocksDB?

✔️ 1
Ben Sless10:10:40

That's along the lines of what I was wondering if possible. In theory, I might want different storage backends to serve different instances in the system

tatut10:10:47

you could... I'm not convinced that you should, though.

tatut10:10:29

sounds like a maintenance headache to have different nodes with completely different indexing backends

Ben Sless10:10:36

I have some nodes that have to be quick and responsive, some that do not. Besides sharing a universal transaction log I don't see a reason why they should share anything else

tatut10:10:11

every node will still index every transaction, so you can't specialize a node to a subset of the information... although queries will have different hot indexes if you route different types of queries to them

Ben Sless10:10:57

The assumption is that some nodes will be query-intensive while others will be write intensive, or just slightly boring

Ben Sless10:10:37

I might want the index to be in memory and the documents to be stored in RocksDB or LMDB for the query intensive nodes, but for the writing nodes I don't care as much, they can store to s3 and index to whatever is cheap and easy to maintain

tatut10:10:00

docs and tx stores are always shared

tatut10:10:35

but it can have a local doc cache afaict

Ben Sless10:10:39

ah, that's what I missed. No way to share only the tx stores?

tatut10:10:21

I currently have the docs+tx in postgresql, and local indexes in lmdb (and lucene) in a system I'm working on

Ben Sless10:10:22

was it a design decision or just a current limitation?

tatut10:10:33

by design, I know postgresql very well and it is easy to run in a managed and backed up way in the cloud... so that's a good place to store the most critical golden stores
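A topology along these lines can be sketched as a node config. This is a hypothetical sketch only: the module symbols follow the XTDB 1.x reference docs, while the JDBC URL and the index directory are placeholder values, not tatut's actual setup.

```clojure
;; Sketch: shared "golden stores" (tx-log + doc store) in PostgreSQL,
;; with a node-local RocksDB index store. Connection details and paths
;; are placeholders.
(require '[xtdb.api :as xt]
         '[clojure.java.io :as io])

(def node
  (xt/start-node
   {:xtdb.jdbc/connection-pool {:dialect {:xtdb/module 'xtdb.jdbc.psql/->dialect}
                                :db-spec {:jdbcUrl "jdbc:postgresql://db-host/xtdb"}}
    :xtdb/tx-log {:xtdb/module 'xtdb.jdbc/->tx-log
                  :connection-pool :xtdb.jdbc/connection-pool}
    :xtdb/document-store {:xtdb/module 'xtdb.jdbc/->document-store
                          :connection-pool :xtdb.jdbc/connection-pool}
    ;; swap this module per node, e.g. 'xtdb.lmdb/->kv-store on query-heavy nodes
    :xtdb/index-store {:kv-store {:xtdb/module 'xtdb.rocksdb/->kv-store
                                  :db-dir (io/file "/var/lib/xtdb/indexes")}}}))
```

Since only the index store differs, every node still replays the same tx-log; the KV backend choice only affects local query and sync performance.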

tatut12:10:35

I'm having a hard time making a query over a date range efficient. Let's say I have documents with {:start-date <local-date1> :end-date <local-date2>} (where end date may be nil), and I want to query if it overlaps a given start - end range (either may be unspecified)

tatut12:10:41

the or branching for the nil case seems to get slow, a regular range query for a date attribute itself seems efficient

tatut14:10:05

it seems like it works ok even without any nil check, how do range predicates work when the attribute value is nil?

Tuomas08:10:24

Can you set up a repro and baseline queries? I'd like to take a crack at it

xlfe09:10:10

@U11SJ6Q0K hmm perhaps add a start-year end-year (or another type of bucket) to reduce the docs that need to be scanned?

tatut09:10:24

I worked around this by having an indefinite end represented as the date 9999-12-31; the document model is complex anyway so I have versions of some attributes separately that are "normalized" for query purposes

tatut09:10:03

so I can side step the nil issue and just query that any date is within the range with two range predicates, and it seems fast

🙏 1
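With the sentinel end date in place, the overlap test reduces to two range predicates, roughly as follows. This is a sketch: the attribute names and :in parameters are assumptions for illustration, not tatut's actual model.

```clojure
;; Sketch: entities whose [start, end] interval overlaps [range-start, range-end].
;; Assumes :end-date is never nil because open-ended docs store the sentinel
;; date 9999-12-31. Two intervals overlap iff each starts before the other ends.
'{:find [e]
  :in [range-start range-end]
  :where [[e :start-date start]
          [e :end-date end]
          [(<= start range-end)]
          [(<= range-start end)]]}
```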
xlfe11:10:18

Ah interesting - how many documents are you querying across?

tatut12:10:11

around 25k

lispyclouds12:10:02

XT on Thoughtworks tech radar: https://www.thoughtworks.com/radar/platforms/xtdb 🎉

👍 5
🎉 2
tatut13:10:44

I have a weird issue with range predicates: if I add e to the :find it always returns all the e values, seemingly without applying the predicate; if I only return v it returns the expected amount of results

tatut13:10:54

so [:find e :where [e :start-date v] [(<= v start)] :in start] returns every e that has a :start-date

tatut13:10:37

when I do [:find v :where [e :start-date v] [(<= v start)] :in start] it returns expected amount

tatut13:10:06

is there some issue with using LocalDate as range?

tatut13:10:45

not related to LocalDate, tried with long numbers as well, same behaviour... how do I do a query like "find all document ids where :date is larger than <input param>"

tatut14:10:14

testing with a single value using = as the predicate returns the same amount of results, regardless of what the :find includes

tatut14:10:23

I get that the result is a set, so what is in :find affects it; I'm trying to find a good repro of this

refset15:10:08

Hey, sorry for the delay (still catching up after a few days away!), does it work as expected if you refer to clojure.core/<= instead of <=? This won't use the indexes as intelligently, but hopefully gives the correct answer 🙂 Are you applying a limit or order-by here afterwards?

tatut11:11:13

this must be user error ™️ where I had something broken in my setup, as I can't reproduce it

tatut11:11:00

now trying with range predicate <= or clojure.core/<= I get the same results, the latter is only slower
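For reference, the two forms being compared differ only in whether XTDB can treat the clause as a built-in range predicate over the indexes. A sketch, with a hypothetical :start-date attribute:

```clojure
;; Built-in range predicate: can constrain the index scan directly
'{:find [e]
  :in [start]
  :where [[e :start-date v]
          [(<= v start)]]}

;; Fully-qualified clojure.core/<=: same results, but evaluated per
;; candidate tuple rather than pushed down to the indexes, hence slower
'{:find [e]
  :in [start]
  :where [[e :start-date v]
          [(clojure.core/<= v start)]]}
```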

refset13:11:54

Hmm! Weird, well I've made a mental note in case I hear about this again

refset14:11:05

Thanks for checking

pinkfrog16:10:45

Does evict really remove the data in the document store? If so, then it won't be possible for a new indexer to rebuild the indices from scratch, since intermediate docs are removed.

jonpither16:10:00

That's correct

pinkfrog14:10:28

In the doc (https://docs.xtdb.com/language-reference/datalog-transactions/#document), it says: > In Kafka, the `doc-topic` is compacted, which enables later eviction of documents. So does XTDB indeed delete a document or not?

refset15:10:06

In the case of Kafka, the document marked for compaction is evicted (physically deleted) eventually, whenever the default background compaction process runs

pinkfrog16:10:16

@U899JBRPF The issue is: after the document has been compacted, I launch a new indexer. When the indexer encounters the transaction associated with the deleted document, it has no way to access the original information, so it doesn't know how to keep building the index.

refset16:10:30

transaction processing should always be able to continue, like it should be able to skip over the tombstones as noops, so this might be a bug. Can you repro it easily? There may be an edge case not covered by our integration tests https://github.com/xtdb/xtdb/blob/e2f51ed99fc2716faa8ad254c0b18166c937b134/test/test/xtdb/kafka_test.clj#L22-L78

pinkfrog16:10:29

What’s the logic of :evict? Will it find the first tx that contains the evicted eid and start reindexing from there?

pinkfrog03:10:57

To illustrate my question with the code:

;; create entity a
  (xt/submit-tx node
                [[::xt/put
                  {:xt/id :a
                   :first-name :john}]])

  ;; create entity b based on a   (STEP)
  (xt/submit-tx node
                [[::xt/put
                  {:xt/id :simple-fn
                   :xt/fn '(fn [ctx eid]
                             (let [db (xtdb.api/db ctx)
                                   entity (xtdb.api/entity db eid)]
                               [[::xt/put (assoc entity :xt/id :b)]]))}]
                 [::xt/fn :simple-fn :a]])

  ;; evict a
  (xt/submit-tx node 
                [[::xt/evict :a]])

  ;; entity b is still there.
  (xt/entity (xt/db node) :b)

  ;; if now we have a new rocksdb instance indexing from scratch, since the document of a has been evicted, how does it set the entity b
  ;; when it encounters (STEP)?

pinkfrog03:10:41

Because the arg-docs are replaced with the final tx-op statements, the transaction function is no longer run on the new rocksdb instance, right?

refset10:10:13

Precisely! 🙂

ozimos18:10:56

What query do I write to get only the record with :xt/id "first"?

(with-open [node (xt/start-node {})]
    (let [t (xt/submit-tx node [[::xt/put {:xt/id "first" :ref ["bob" "sandy"]}]
                                [::xt/put {:xt/id "second" :ref ["bob" "randy"]}]
                                [::xt/put {:xt/id "third" :ref ["bob" "sandy" "randy"]}]])]
      (xt/await-tx node t)
      (xt/q (xt/db node) qy
            ["sandy" "bob"])))
with this query
(def qy '{:find [(pull e [*])]
          :in [[fname sname]]
          :where [[e :ref ?name]
                  [(get #{fname sname} ?name) valid-name]
                  [(boolean valid-name)]]})
I get no result with this query
(def qy '{:find [(pull e [*])]
          :in [[fname sname]]
          :where [[e :ref fname]
                  [e :ref sname]
                  ]})
I get both the first and third entries

ozimos18:10:26

Is there a way to query against the entire vector value of the :ref key, not just its members

dvingo22:10:31

I'm not sure why the first query doesn't return anything, but the second one's search results make sense

Tuomas08:10:56

You can put the :ref vector in a map.

(xt/submit-tx @node [[::xt/put {:xt/id "first" :ref {:items ["bob" "sandy"]}}]
                     [::xt/put {:xt/id "second" :ref {:items ["bob" "randy"]}}]
                     [::xt/put {:xt/id "third" :ref {:items ["bob" "sandy" "randy"]}}]])
(xt/sync @node)
(q '{:find [(pull e [*])]
     :where [[e :ref ref]
             [(get ref :items) items]
             [(= items ["bob" "sandy"])]]})

Tuomas08:10:36

Weirdly enough I couldn't get it to work with :in parameters

ozimos09:10:57

I need the :ref data at the base level so that it can be queried

ozimos09:10:28

I found a solution

(def qy '{:find [(pull e [*])]
          :in [names]
          :where [[e :ref]
                  [(get-attr e :ref) ?names]
                  [(set ?names) nms-set]
                  [(= names nms-set)]]})

🙏 1
ozimos09:10:22

It works if I change the query to

(xt/q (xt/db node) qy #{"sandy" "bob"})

dvingo16:10:48

very nice - you could also store the :ref as a set to begin with, I believe. I don't know the context of your app/data model, but it may also make sense to move the items in the ref into their own entities and then join on them

refset15:10:20

Sorry to chime in late (I'm glad you found a solution!), but the reason this function expression doesn't work is because you can't refer to logic vars within a literal set
> [(get #{fname sname} ?name) valid-name]
Instead you can construct the set separately
> [(hash-set fname sname) ?name-set]
> [(get ?name-set ?name) valid-name]
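Applying that suggestion to the first query from earlier in the thread gives something like the following. A sketch only, untested against the data above:

```clojure
(def qy '{:find [(pull e [*])]
          :in [[fname sname]]
          :where [[e :ref ?name]
                  ;; build the set from the logic vars instead of a set literal
                  [(hash-set fname sname) ?name-set]
                  [(get ?name-set ?name) valid-name]
                  [(boolean valid-name)]]})
```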
