This page is not created by, affiliated with, or supported by Slack Technologies, Inc.
2021-11-30
Hypothetical. Asking for a friend. How would you implement a two-tier doc-store and tx-log? The first tier has lots of data that doesn't change very much, and the second tier has a little bit of data that changes more frequently. The doc-store first checks t2 and, if it doesn't find the doc, checks t1 (and makes use of the document-cache too). All writes to the doc-store go to t2 by default (t1 is managed elsewhere). What rules should I follow for the tx-log for it to remain valid? Could all odd tx-ids be a reference to the tier1 tx-log and all even tx-ids a reference to the tier2 tx-log?
tier1 has lots of reference data, and each tenant needs a copy of that reference data - but rather than have lots of copies of the reference data (one for each tenant), have one copy (tier1) and store tenant-specific data in tier2
idk how you would do this in a way that actually works so that queries can use both the t1 and t2 data
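A minimal sketch of the read/write path described above, using plain maps to stand in for the two document stores (all names here are hypothetical illustrations, not XTDB API):

```clojure
;; Hypothetical two-tier doc-store: t1 holds shared reference docs,
;; t2 holds tenant-specific (and more frequently changing) docs.
(def t1 (atom {:doc-1 {:xt/id :doc-1 :tier 1}}))
(def t2 (atom {}))

(defn fetch-doc
  "Reads check t2 first, falling back to t1 (a document-cache
  would sit in front of this in a real implementation)."
  [id]
  (or (get @t2 id)
      (get @t1 id)))

(defn put-doc!
  "All writes go to t2 by default; t1 is managed elsewhere."
  [{:keys [xt/id] :as doc}]
  (swap! t2 assoc id doc))

(fetch-doc :doc-1)                    ;; served from t1
(put-doc! {:xt/id :doc-1 :tier 2})    ;; shadows the t1 copy
(fetch-doc :doc-1)                    ;; now served from t2
```

This only covers the doc-store half of the question; the tx-log ordering problem discussed below is the harder part.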
Hypothetically speaking (😉) I can't think of a way to guarantee deterministic ingestion ordering across two continuously updating tx-logs unless they are both guaranteed to run on the same broker/JDBC backend and the tx-times lined up exactly, regardless of write ordering between threads (in which case you could then choose to always order A txes before B txes in the rare cases when the tx-times are identical). I'm not sure such a setup is actually possible using Kafka at all :thinking_face:
However, if you are willing to tolerate duplicating the tx-log writes for each tenant (which shouldn't be so bad, since tx-log contents are typically small relative to the docs), then things get simpler: a two-tier doc-store seems quite plausible by proxying and partitioning across two physical backend doc-stores, based on inspecting the prefix of the doc ID. You would also need to handle puts of eviction tombstones somehow, since the ID is wiped - I guess you could naively write tombstones to both partitions.
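The prefix-based routing and the naive tombstone fan-out could be sketched like this (again with hypothetical helper names and plain maps standing in for the physical doc-stores; the "ref/" prefix convention is an assumption for illustration):

```clojure
(require '[clojure.string :as str])

;; Two physical backend doc-stores behind one proxy.
(def stores {:t1 (atom {}) :t2 (atom {})})

(defn tier-for
  "Route a doc to a physical store by inspecting its ID prefix.
  Convention here: shared reference docs use a \"ref/\" keyword namespace."
  [id]
  (if (str/starts-with? (str id) ":ref/") :t1 :t2))

(defn route-put!
  "Proxy a put to the partition chosen by the doc's ID prefix."
  [{:keys [xt/id] :as doc}]
  (swap! (get stores (tier-for id)) assoc id doc))

(defn evict-doc!
  "An eviction wipes the doc contents, so there is nothing reliable
  left to route on - naively write the tombstone to both partitions."
  [id]
  (doseq [store (vals stores)]
    (swap! store assoc id {:xt/id id :tombstone? true})))
```

Routing on the ID prefix keeps the proxy stateless; the cost is that evictions (and anything else without a routable ID) must fan out to every partition.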
thanks. The backend store is GCP Datastore, so timestamps will be consistent across tiers or partitions (i.e. the same Datastore, different namespaces). Not working on this at the moment - it was more a thought exercise about future scaling options! thanks @tatut and @U899JBRPF
I'm seeing a strange behavior in the HTTP interface where the same query completes in ~12s but times out (even with :timeout at 3 minutes) when all the logic variables are not prefixed with a ?. Can someone sanity-check me here? Query inside.
with ?:
{:find [?t],
:where [[?s :sentence/tokens ?t]
[?s :sentence/tokens ?t2]
[?t :token/form ?f]
[?f :form/value #{"eat" "Eat"}]
[?t2 :token/form ?f2]
[?f2 :form/value #{"up" "Up"}]
[?t2 :token/deprel ?dr2]
[?dr2 :deprel/value "compound:prt"]]
}
without:
{:find [t],
:where [[s :sentence/tokens t]
[s :sentence/tokens t2]
[t :token/form f]
[f :form/value #{"eat" "Eat"}]
[t2 :token/form f2]
[f2 :form/value #{"up" "Up"}]
[t2 :token/deprel dr2]
[dr2 :deprel/value "compound:prt"]]
}
> you don't need to quote the query when you're using the http interface, right?
shouldn't do, nope
just in case there are some obscure symbol conflicts with those specific examples, could you try adding a ~random prefix:
{:find [foo-t],
:where [[foo-s :sentence/tokens foo-t]
[foo-s :sentence/tokens foo-t2]
[foo-t :token/form foo-f]
[foo-f :form/value #{"eat" "Eat"}]
[foo-t2 :token/form foo-f2]
[foo-f2 :form/value #{"up" "Up"}]
[foo-t2 :token/deprel foo-dr2]
[foo-dr2 :deprel/value "compound:prt"]]
}
to be clear, this isn't a critical issue for me at all (can just do ? prefixes) but i thought i'd mention it
cool, good to rule out, anyway
next up, see if the vars-in-join-order match when calling xtdb.query/query-plan-for with both queries - https://github.com/xtdb/xtdb/blob/61e6a8eb07a87dcff8e65ad4be97c75346a9c86f/core/src/xtdb/query.clj#L1664 (or turn on debug logs for xtdb.query)
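For what it's worth, comparing the two plans might look roughly like this - this needs the xtdb-core dependency so it isn't runnable standalone, and I'm assuming a (query-plan-for db query) arity and a :vars-in-join-order key in the returned plan; check the linked source for the exact shape:

```clojure
(require '[xtdb.api :as xt]
         '[xtdb.query :as q])

(with-open [node (xt/start-node {})]          ;; in-memory node for inspection
  (let [db        (xt/db node)
        with-pfx  (q/query-plan-for db '{:find  [?t]
                                         :where [[?s :sentence/tokens ?t]]})
        no-pfx    (q/query-plan-for db '{:find  [t]
                                         :where [[s :sentence/tokens t]]})]
    ;; the two plans should agree, modulo the variable names themselves
    [(:vars-in-join-order with-pfx)
     (:vars-in-join-order no-pfx)]))
```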
Hey folks 🙂 just announcing that we released XTDB 1.20.0
earlier today - https://github.com/xtdb/xtdb/releases/tag/1.20.0 - but since the release notes are brief I'll save you a click...
> 1.20.0 is a bugfix release with one minor breaking bugfix to our pull behaviour:
>
> - https://github.com/xtdb/xtdb/issues/1549 (breaking): pull now returns nil instead of {} where joined documents do not exist.
> - https://github.com/xtdb/xtdb/issues/1627: Lucene handles 'match an absent document' ops
> - https://github.com/xtdb/xtdb/pull/1659: Able to restore Lucene from a checkpoint (thx @tatut!)
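To illustrate the #1549 change without needing a running node, here is a toy stand-in for the join step of pull - the helper names and the docs map are hypothetical, only the {} vs nil behaviour mirrors the release note:

```clojure
;; :a has a :ref pointing at :missing, which was never stored.
(def docs {:a {:xt/id :a :ref :missing}})

(defn pull-join-old
  "Pre-1.20.0 behaviour: a join to an absent document yielded {}."
  [docs id]
  (let [target (get-in docs [id :ref])]
    {:xt/id id :ref (get docs target {})}))

(defn pull-join-new
  "1.20.0 behaviour: a join to an absent document yields nil."
  [docs id]
  (let [target (get-in docs [id :ref])]
    {:xt/id id :ref (get docs target)}))
```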
We were particularly keen to clear the decks and get this released ahead of the upcoming Re:Clojure conference, for which I will be running a 2-hour pre-conf workshop this Thursday: https://www.eventbrite.com/e/xtdb-workshop-reclojure-tickets-191330985127
And also, since it's probably of interest to quite a few of you, @j.antonelli712 is presenting on JUXT's https://github.com/juxt/site project, which is a rather novel XT-powered GraphQL and OpenAPI "Resource Server". Look out for "Schema driven development with GraphQL" on Friday @ 12:30 UTC https://www.reclojure.org/#schedule