This page is not created by, affiliated with, or supported by Slack Technologies, Inc.
2021-11-30
Hypothetical. Asking for a friend. How would you implement a two-tier doc-store and tx-log? The first tier has lots of data that doesn't change very much, and the second tier has a little bit of data that changes more frequently. The doc-store first checks t2 and, if it doesn't find the doc, checks t1 (and makes use of the document-cache too). All writes to the doc-store go to t2 by default (t1 is managed elsewhere). What rules should I follow for the tx-log for it to remain valid? Could all odd tx-ids be a reference to the tier1 tx-log and all even tx-ids a reference to the tier2 tx-log?
tier1 has lots of reference data, and each tenant needs a copy of that reference data - but rather than have lots of copies of the reference data (one for each tenant), have one copy (tier1) and store tenant-specific data in tier2
idk how you would do this in a way that actually works so that queries can use both the t1 and t2 data
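A minimal sketch of the read/write path described above, using plain maps to stand in for the two document stores (all names here are hypothetical illustrations, not XTDB API):

```clojure
;; Hypothetical two-tier doc-store: t1 holds shared reference docs,
;; t2 holds tenant-specific (and more frequently changing) docs.
(def t1 (atom {:doc-1 {:xt/id :doc-1 :tier 1}}))
(def t2 (atom {}))

(defn fetch-doc
  "Reads check t2 first, falling back to t1 (a document-cache
  would sit in front of this in a real implementation)."
  [id]
  (or (get @t2 id)
      (get @t1 id)))

(defn put-doc!
  "All writes go to t2 by default; t1 is managed elsewhere."
  [{:keys [xt/id] :as doc}]
  (swap! t2 assoc id doc))

(fetch-doc :doc-1)                    ;; served from t1
(put-doc! {:xt/id :doc-1 :tier 2})    ;; shadows the t1 copy
(fetch-doc :doc-1)                    ;; now served from t2
```

This only covers the doc-store half of the question; the tx-log ordering problem discussed below is the harder part.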
Hypothetically speaking (😉) I can't think of a way to guarantee deterministic ingestion ordering across two continuously updating tx-logs unless they are both guaranteed to run on the same broker/JDBC backend and the tx-times lined up exactly, regardless of write ordering between threads (in which case you could then choose to always order A txes before B txes in the rare cases when the tx-times are identical). I'm not sure such a setup is actually possible using Kafka at all :thinking_face:
However, if you are willing to tolerate duplicating the tx-log writes for each tenant (which shouldn't be so bad, since tx-log contents are typically small relative to the docs), then things get simpler: a two-tier doc-store seems quite plausible by proxying and partitioning across two physical backend doc-stores, based on inspecting the prefix of the doc ID. You would also need to handle puts of eviction tombstones somehow, since the ID is wiped - I guess you could naively write tombstones to both partitions.
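The prefix-based routing and the naive tombstone fan-out could be sketched like this (again with hypothetical helper names and plain maps standing in for the physical doc-stores; the "ref/" prefix convention is an assumption for illustration):

```clojure
(require '[clojure.string :as str])

;; Two physical backend doc-stores behind one proxy.
(def stores {:t1 (atom {}) :t2 (atom {})})

(defn tier-for
  "Route a doc to a physical store by inspecting its ID prefix.
  Convention here: shared reference docs use a \"ref/\" keyword namespace."
  [id]
  (if (str/starts-with? (str id) ":ref/") :t1 :t2))

(defn route-put!
  "Proxy a put to the partition chosen by the doc's ID prefix."
  [{:keys [xt/id] :as doc}]
  (swap! (get stores (tier-for id)) assoc id doc))

(defn evict-doc!
  "An eviction wipes the doc contents, so there is nothing reliable
  left to route on - naively write the tombstone to both partitions."
  [id]
  (doseq [store (vals stores)]
    (swap! store assoc id {:xt/id id :tombstone? true})))
```

Routing on the ID prefix keeps the proxy stateless; the cost is that evictions (and anything else without a routable ID) must fan out to every partition.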
thanks. The backend store is GCP Datastore, so timestamps will be consistent across tiers or partitions (i.e. the same Datastore, different namespaces). Not working on this at the moment - it was more a thought exercise about future scaling options! thanks @tatut and @U899JBRPF
I'm seeing a strange behavior in the HTTP interface where the same query completes in ~12s but times out (even with :timeout at 3 minutes) when all the logic variables are not prefixed with a ?. Can someone sanity-check me here? Query inside.
with ?:
{:find [?t],
:where [[?s :sentence/tokens ?t]
[?s :sentence/tokens ?t2]
[?t :token/form ?f]
[?f :form/value #{"eat" "Eat"}]
[?t2 :token/form ?f2]
[?f2 :form/value #{"up" "Up"}]
[?t2 :token/deprel ?dr2]
[?dr2 :deprel/value "compound:prt"]]
}
without:
{:find [t],
:where [[s :sentence/tokens t]
[s :sentence/tokens t2]
[t :token/form f]
[f :form/value #{"eat" "Eat"}]
[t2 :token/form f2]
[f2 :form/value #{"up" "Up"}]
[t2 :token/deprel dr2]
[dr2 :deprel/value "compound:prt"]]
}
> you don't need to quote the query when you're using the http interface, right?
shouldn't do, nope
just in case there are some obscure symbol conflicts with those specific examples, could you try adding a ~random prefix:
{:find [foo-t],
:where [[foo-s :sentence/tokens foo-t]
[foo-s :sentence/tokens foo-t2]
[foo-t :token/form foo-f]
[foo-f :form/value #{"eat" "Eat"}]
[foo-t2 :token/form foo-f2]
[foo-f2 :form/value #{"up" "Up"}]
[foo-t2 :token/deprel foo-dr2]
[foo-dr2 :deprel/value "compound:prt"]]
}
to be clear, this isn't a critical issue for me at all (can just do ? prefixes) but i thought i'd mention it
cool, good to rule out, anyway
next up, see if the vars-in-join-order match when calling xtdb.query/query-plan-for with both queries - https://github.com/xtdb/xtdb/blob/61e6a8eb07a87dcff8e65ad4be97c75346a9c86f/core/src/xtdb/query.clj#L1664 (or turn on debug logs for xtdb.query)
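For what it's worth, comparing the two plans might look roughly like this - this needs the xtdb-core dependency so it isn't runnable standalone, and I'm assuming a (query-plan-for db query) arity and a :vars-in-join-order key in the returned plan; check the linked source for the exact shape:

```clojure
(require '[xtdb.api :as xt]
         '[xtdb.query :as q])

(with-open [node (xt/start-node {})]          ;; in-memory node for inspection
  (let [db        (xt/db node)
        with-pfx  (q/query-plan-for db '{:find  [?t]
                                         :where [[?s :sentence/tokens ?t]]})
        no-pfx    (q/query-plan-for db '{:find  [t]
                                         :where [[s :sentence/tokens t]]})]
    ;; the two plans should agree, modulo the variable names themselves
    [(:vars-in-join-order with-pfx)
     (:vars-in-join-order no-pfx)]))
```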
Hey folks 🙂 just announcing that we released XTDB 1.20.0
earlier today - https://github.com/xtdb/xtdb/releases/tag/1.20.0 - but since the release notes are brief I'll save you a click...
> 1.20.0 is a bugfix release with one minor breaking bugfix to our pull behaviour:
>
> - https://github.com/xtdb/xtdb/issues/1549 (breaking): pull now returns nil instead of {} where joined documents do not exist.
> - https://github.com/xtdb/xtdb/issues/1627: Lucene handles 'match an absent document' ops
> - https://github.com/xtdb/xtdb/pull/1659: Able to restore Lucene from a checkpoint (thx @tatut!)
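To illustrate the #1549 change without needing a running node, here is a toy stand-in for the join step of pull - the helper names and the docs map are hypothetical, only the {} vs nil behaviour mirrors the release note:

```clojure
;; :a has a :ref pointing at :missing, which was never stored.
(def docs {:a {:xt/id :a :ref :missing}})

(defn pull-join-old
  "Pre-1.20.0 behaviour: a join to an absent document yielded {}."
  [docs id]
  (let [target (get-in docs [id :ref])]
    {:xt/id id :ref (get docs target {})}))

(defn pull-join-new
  "1.20.0 behaviour: a join to an absent document yields nil."
  [docs id]
  (let [target (get-in docs [id :ref])]
    {:xt/id id :ref (get docs target)}))
```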
We were particularly keen to clear the decks and get this released ahead of the upcoming Re:Clojure conference, for which I will be running a 2-hour pre-conf workshop this Thursday: https://www.eventbrite.com/e/xtdb-workshop-reclojure-tickets-191330985127
And also, since it's probably of interest to quite a few of you, @j.antonelli712 is presenting on JUXT's https://github.com/juxt/site project, which is a rather novel XT-powered GraphQL and OpenAPI "Resource Server". Look out for "Schema driven development with GraphQL" on Friday @ 12:30 UTC https://www.reclojure.org/#schedule