2021-10-27
Channels
- # announcements (13)
- # asami (12)
- # babashka (65)
- # beginners (62)
- # calva (14)
- # cider (8)
- # clara (11)
- # clj-kondo (16)
- # clojure (86)
- # clojure-europe (12)
- # clojure-gamedev (4)
- # clojure-nl (2)
- # clojure-sg (4)
- # clojure-uk (5)
- # clojurescript (206)
- # clojureverse-ops (11)
- # community-development (7)
- # conjure (12)
- # core-async (2)
- # core-logic (13)
- # cursive (49)
- # datalevin (1)
- # datomic (29)
- # deps-new (3)
- # duct (8)
- # events (5)
- # fulcro (10)
- # helix (5)
- # jobs (1)
- # klipse (5)
- # lsp (178)
- # luminus (1)
- # malli (8)
- # meander (3)
- # membrane (13)
- # missionary (1)
- # nrepl (5)
- # other-languages (4)
- # pedestal (4)
- # reitit (3)
- # releases (1)
- # reveal (27)
- # shadow-cljs (89)
- # tools-build (6)
- # tools-deps (11)
- # vim (2)
- # xtdb (64)
How flexible can topology configurations be? Can I run some nodes optimized for reads and others for persistence? Can I log to two logs at the same time? Is there anything I can do to guarantee consistency, or is that not a design goal given there's no transactor?
yeah, tx-log and doc store are the golden stores, indexes on nodes can always be recreated if necessary
And it's possible to run different store backends on different nodes to suit the use case?
tx and doc stores are shared but indices are node specific. I guess you could use e.g. RocksDB for indices in one node and LMDB for indices in another, and perhaps get better query speeds from LMDB and faster sync from RocksDB?
That's along the lines of what I was wondering about. In theory, I might want different storage backends to serve different instances in the system
sounds like a maintenance headache to have different nodes with completely different indexing backends
I have some nodes that have to be quick and responsive, some that do not. Besides sharing a universal transaction log, I don't see a reason why they should share anything else
every node will still index every transaction, so you can't specialize a node to a subset of the information... although queries will have different hot indexes if you route different types of queries to them
The assumption is that some nodes will be query-intensive while others will be write-intensive, or just slightly boring
I might want the index to be in memory and the documents to be stored in RocksDB or LMDB for the query-intensive nodes, but for the writing nodes I don't care as much, they can store to s3 and index to whatever is cheap and easy to maintain
https://docs.xtdb.com/administration/configuring/#_storage has the "important" note
I currently have the docs+tx in PostgreSQL, and local indexes in LMDB (and Lucene) in a system I'm working on
by design, I know PostgreSQL very well and it is easy to run in a managed and backed-up way in the cloud... so that's a good place to store the most critical golden stores
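To make this concrete, here is a minimal sketch of two nodes sharing the same PostgreSQL golden stores while using different local index backends (the db-spec values and :db-dir paths are placeholders, not from this thread):

(require '[xtdb.api :as xt])

;; shared golden stores: tx-log + document store in PostgreSQL
(def golden-stores
  {:xtdb.jdbc/connection-pool {:dialect {:xtdb/module 'xtdb.jdbc.psql/->dialect}
                               :db-spec {:dbname "xtdb" :user "xtdb" :password "..."}}
   :xtdb/tx-log {:xtdb/module 'xtdb.jdbc/->tx-log
                 :connection-pool :xtdb.jdbc/connection-pool}
   :xtdb/document-store {:xtdb/module 'xtdb.jdbc/->document-store
                         :connection-pool :xtdb.jdbc/connection-pool}})

;; query-heavy node: local LMDB indexes
(def node-a
  (xt/start-node (merge golden-stores
                        {:xtdb/index-store
                         {:kv-store {:xtdb/module 'xtdb.lmdb/->kv-store
                                     :db-dir "data/index-a"}}})))

;; write-heavy node: local RocksDB indexes
(def node-b
  (xt/start-node (merge golden-stores
                        {:xtdb/index-store
                         {:kv-store {:xtdb/module 'xtdb.rocksdb/->kv-store
                                     :db-dir "data/index-b"}}})))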
I'm having a hard time making a query over a date range efficient. Let's say I have documents with {:start-date <local-date1> :end-date <local-date2>} (where end date may be nil), and I want to query whether it overlaps a given start - end range (either may be unspecified)
the or branching for the nil case seems to get slow; a regular range query for a date attribute itself seems efficient
it seems like it works ok even without any nil check; how do range predicates work when the attribute value is nil?
@U11SJ6Q0K hmm perhaps add a start-year end-year (or another type of bucket) to reduce the docs that need to be scanned?
I worked around this by having an indefinite end represented as the date 9999-12-31; the document model is complex anyway, so I have versions of some attributes separately that are "normalized" for query purposes
so I can sidestep the nil issue and just query that any date is within the range with two range predicates, and it seems fast
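A sketch of that shape of query; :start-date-norm / :end-date-norm are hypothetical "normalized" attributes where a nil end date has been replaced by the sentinel date, so both clauses stay plain range predicates:

(xt/q (xt/db node)
      '{:find [e]
        :in [qstart qend]
        :where [[e :start-date-norm s]
                [e :end-date-norm t]
                ;; the two intervals overlap iff each starts before the other ends
                [(<= s qend)]
                [(<= qstart t)]]}
      (java.time.LocalDate/of 2021 1 1)
      (java.time.LocalDate/of 2021 12 31))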
XT on Thoughtworks tech radar: https://www.thoughtworks.com/radar/platforms/xtdb 🎉
I have a weird issue with range predicates: if I add the e to the :find it always returns all the e values, seemingly without applying the predicate; if I only return the v it returns the expected amount of results
so [:find e :where [e :start-date v] [(<= v start)] :in start] returns every e that has a :start-date; when I do [:find v :where [e :start-date v] [(<= v start)] :in start] it returns the expected amount
not related to LocalDate, tried with long numbers as well, same behaviour... how do I do a query like "find all document ids where :date is larger than <input param>"
testing with a single value using = as the predicate, that returns the same amount of results regardless of what the :find includes
I get that results is a set, so what is in :find affects it; I'm trying to find a good repro of this
Hey, sorry for the delay (still catching up after a few days away!), does it work as expected if you refer to clojure.core/<= instead of <=? This won't use the indexes as intelligently, but hopefully gives the correct answer 🙂
Are you applying a limit or order-by here afterwards?
this must be user error ™️, I must have had something broken in my setup, as I can't reproduce it now
now trying with the range predicate <= or clojure.core/<= I get the same results, the latter is only slower
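For reference, the two variants discussed in this thread side by side (a sketch; node and start-value are assumed from context):

;; built-in range predicate: can use the sorted index to skip non-matching rows
(xt/q (xt/db node)
      '{:find [e]
        :in [start]
        :where [[e :start-date v]
                [(<= v start)]]}
      start-value)

;; fully-qualified clojure.core/<= runs as an ordinary per-row predicate,
;; so it returns the same rows, just without the index assist (and it only
;; accepts numeric values, unlike the built-in range predicate)
(xt/q (xt/db node)
      '{:find [e]
        :in [start]
        :where [[e :start-date v]
                [(clojure.core/<= v start)]]}
      start-value)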
Does evict really remove the data in the document store? If so, then it won't be possible for a new indexer to rebuild the indices from scratch, since intermediate docs are removed.
In the doc (https://docs.xtdb.com/language-reference/datalog-transactions/#document), it says:
> In Kafka, the `doc-topic` is compacted, which enables later eviction of documents.
So does XTDB indeed delete a document or not?
In the case of Kafka, the document marked for compaction is evicted (physically deleted) eventually, whenever the default background compaction process runs
@U899JBRPF The issue is that, after the document has been compacted, I then launch a new indexer. When the indexer encounters the transaction associated with the deleted document, it has no way to access the original information, so it doesn't know how to keep building the index.
transaction processing should always be able to continue, like it should be able to skip over the tombstones as noops, so this might be a bug. Can you repro it easily? There may be an edge case not covered by our integration tests https://github.com/xtdb/xtdb/blob/e2f51ed99fc2716faa8ad254c0b18166c937b134/test/test/xtdb/kafka_test.clj#L22-L78
What's the logic of :evict? Will it find the first tx that contains the evicted eid and start reindexing again from there?
To illustrate my question with code:
;; create entity a
(xt/submit-tx node
              [[::xt/put
                {:xt/id :a
                 :first-name :john}]])

;; create entity b based on a (STEP)
(xt/submit-tx node
              [[::xt/put
                {:xt/id :simple-fn
                 :xt/fn '(fn [ctx eid]
                           (let [db (xtdb.api/db ctx)
                                 entity (xtdb.api/entity db eid)]
                             [[::xt/put (assoc entity :xt/id :b)]]))}]
               [::xt/fn :simple-fn :a]])

;; evict a
(xt/submit-tx node
              [[::xt/evict :a]])

;; entity b is still there.
(xt/entity (xt/db node) :b)

;; if now we have a new rocksdb instance indexing from scratch, since the
;; document of a has been evicted, how does it set the entity b when it
;; encounters (STEP)?
Because the arg-docs are replaced with the final tx-op statements. The transaction function no longer runs for the new rocksdb instance, right?
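One way to sanity-check this (a sketch, not from the thread): start a second node against the same golden stores but with a fresh index directory, forcing a full reindex past the tombstones. shared-golden-stores and the :db-dir are hypothetical placeholders:

(with-open [node2 (xt/start-node
                   (merge shared-golden-stores ;; same tx-log/doc-store config as the first node
                          {:xtdb/index-store
                           {:kv-store {:xtdb/module 'xtdb.rocksdb/->kv-store
                                       :db-dir "data/fresh-index"}}}))]
  (xt/sync node2)
  ;; :b has its own (never-evicted) document, so it should still be present
  ;; after the rebuild; :a's tombstoned ops are skipped as no-ops
  (xt/entity (xt/db node2) :b))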
What query do I write to get only the record with :xt/id "first"?
(with-open [node (xt/start-node {})]
  (let [t (xt/submit-tx node [[::xt/put {:xt/id "first" :ref ["bob" "sandy"]}]
                              [::xt/put {:xt/id "second" :ref ["bob" "randy"]}]
                              [::xt/put {:xt/id "third" :ref ["bob" "sandy" "randy"]}]])]
    (xt/await-tx node t)
    (xt/q (xt/db node) qy
          ["sandy" "bob"])))
with this query
(def qy '{:find [(pull e [*])]
          :in [[fname sname]]
          :where [[e :ref ?name]
                  [(get #{fname sname} ?name) valid-name]
                  [(boolean valid-name)]]})
I get no result
with this query
(def qy '{:find [(pull e [*])]
          :in [[fname sname]]
          :where [[e :ref fname]
                  [e :ref sname]]})
I get both the first and third entries. Is there a way to query against the entire vector value of the :ref key, not just its members?
there are some notes on vector values here: https://docs.xtdb.com/language-reference/datalog-queries/#_maps_and_vectors_in_data
I'm not sure why the first query doesn't return anything, but the second one's search results make sense
You can put the :ref vector in a map.
(xt/submit-tx @node [[::xt/put {:xt/id "first" :ref {:items ["bob" "sandy"]}}]
                     [::xt/put {:xt/id "second" :ref {:items ["bob" "randy"]}}]
                     [::xt/put {:xt/id "third" :ref {:items ["bob" "sandy" "randy"]}}]])
(xt/sync @node)
(q '{:find [(pull e [*])]
     :where [[e :ref ref]
             [(get ref :items) items]
             [(= items ["bob" "sandy"])]]})
I found a solution
(def qy '{:find [(pull e [*])]
          :in [names]
          :where [[e :ref]
                  [(get-attr e :ref) ?names]
                  [(set ?names) nms-set]
                  [(= names nms-set)]]})
very nice - you could also store the :ref as a set to begin with, I believe. I don't know the context of your app/data model, but it may also make sense to move the items in the ref into their own entities and then join on them
Sorry to chime in late (I'm glad you found a solution!), but the reason this function expression doesn't work is that you can't refer to logic vars within a literal set
> [(get #{fname sname} ?name) valid-name]
Instead you can construct the set separately
> [(hash-set fname sname) ?name-set]
> [(get ?name-set ?name) valid-name]
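Putting that together, a sketch of the original query with that fix applied (it still matches any entity where either name appears among the :ref members, which is why the exact-match approaches above were needed):

(def qy '{:find [(pull e [*])]
          :in [[fname sname]]
          :where [[e :ref ?name]
                  ;; build the set from the logic vars first...
                  [(hash-set fname sname) ?name-set]
                  ;; ...then test membership against it
                  [(get ?name-set ?name) valid-name]
                  [(boolean valid-name)]]})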