
I'm looking for a way to expose the whole db to be queried by semi-technical users. Maybe as a notebook or something web-based; any recommendations?


queried as in the users will write datalog?


Yes, they already query the legacy db with SQL. I'm talking about a handful of coworkers, domain experts. Nothing exposed to external users.


fwiw xtdb-inspector has a query web UI and browsing of documents


Does it necessarily need to run against the production database?


Metabase would be great for that, but it doesn't look like there's an XTDB driver 😞


I had some success porting a Dremio-compatible driver (also uses Calcite) a while back, and I'm aware there have been newer iterations on that front since


It doesn't have to be prod, are you suggesting something?


Not using prod means you can be sure it doesn't matter if somebody accidentally ::xt/evicts everything (note there's also a :read-only? option for the HTTP API), and so it opens up a lot more options. It may be simplest to just get a notebook stack up and running. Would you consider the experience we have built with Nextjournal, for example, to be too complex? You could include snippets for export to csv/xls for any deeper analysis or graphing requirements (note that the built-in http-server console UI already has a csv export feature)
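Since the :read-only? HTTP option came up, here is a minimal sketch of what enabling it in a node config might look like. The module key and option spelling here are assumptions from memory, so check the xtdb-http-server docs for the exact shape:

```clojure
;; Hedged sketch: a node config enabling the HTTP server in read-only
;; mode, so semi-technical users can query but not transact or evict.
;; The :xtdb.http-server/server key and its option names are assumptions.
{:xtdb.http-server/server {:port 3000
                           :read-only? true}}
```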


(how) does core2 efficiently patch documents? that's the kind of usage SQL encourages, so I would think so?


Hey, no, it doesn't implement any underlying patching mechanism at the physical layer. Row modifications (~documents) are stored in full each time, on the assumption that the many qualities afforded by immutable chunked object storage are far more valuable than worrying about storage costs. There are of course still going to be trade-offs possible at indexing/query time, and core2 will generally defer as much of the decision making about optimisation strategies to higher up the stack (which is still very much WIP!). You may be interested to reflect on this minor spoiler from Håkan's upcoming 🙂


is a document a "row modification" or a row?


if it's the former then I would count that as patching an entity


If you consider an unfiltered table (i.e. across all time), a "row" is keyed by ID + the 4 bitemporal timestamp coordinates, which means it's effectively exactly the same as an XT document (note that there's no content hashing to speak of). Internally to the codebase the word "row" has this precise meaning. However, the traditional SQL conception of a "row" is keyed only by ID, so it's more like an XT entity... we need to work on making that clearer 😅
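The distinction above could be sketched like this; a purely illustrative model in Python (the key shape and timestamp names are assumptions for the sketch, not anything from XTDB itself):

```python
# Hypothetical illustration (not XTDB's API): an internal "row" is keyed
# by ID plus the four bitemporal coordinates, while the traditional SQL
# "row" / XT entity is keyed by ID alone.

# Internal "row" key: ID + 4 bitemporal timestamp coordinates.
row_key = ("order-1",
           "2020-01-01T00:00Z",  # valid-from   (names assumed)
           "9999-12-31T00:00Z",  # valid-to
           "2020-01-02T00:00Z",  # system-from
           "9999-12-31T00:00Z")  # system-to

# Traditional SQL "row" / XT entity key: ID alone.
entity_key = "order-1"

# Many internal rows (versions) collapse onto a single entity key:
versions = {row_key: {"status": "paid"}}
assert all(key[0] == entity_key for key in versions)
```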


For use-cases where the storage costs for certain categories of data might be a concern, there are already capabilities for handling external Arrow data and processing it through the query engine alongside the regular managed Arrow data, which means you can potentially build something more specialised to sit alongside core2 and interoperate.


> is a document a "row modification"?
I think this one is the answer to your exact question, IIUC, but the modification is a full copy, not a delta


when I insert a subset of columns of the entity/table, does it write the entire set of columns or just the columns I specified? is there a read-before-write?


ah well if it's the same immutable behavior then we still write the entire row/document, I'm just curious how you built the patch-like semantics into the DML


without e.g. a transactor


> is there a read-before-write?
yep! there are some pretty good-looking write-ups (which I confess I haven't studied) on how this is put together and has evolved
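A toy model of that read-before-write behaviour as described in this thread (assumed semantics, not core2's actual internals):

```python
# Conceptual sketch: an update naming only a subset of columns reads the
# current full row, merges the named columns in, and writes the entire
# new version as a full copy (no delta is stored).

store = {"order-1": {"status": "pending", "total": 100}}

def update(entity_id, changes):
    current = store[entity_id]            # read-before-write
    new_version = {**current, **changes}  # merge only the named columns
    store[entity_id] = new_version        # full row/document written

update("order-1", {"status": "paid"})
assert store["order-1"] == {"status": "paid", "total": 100}
```

So the DML looks patch-like to the caller, while the storage layer only ever sees whole rows.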


there is still very much a deterministic transactor component in play here


One fairly important TODO in this space (and mentioned in one of those write-ups) is to design and implement something approximating the internals of how select * might work in this case: conceptually, it needs to efficiently know which columns need stitching back together and figure out what the resulting row should look like. Another big related TODO is handling missing values end-to-end through the various planner and expression engine layers; otherwise manipulating data over an evolving column set is... problematic at best 😅 Lots to do!
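The stitching problem could be sketched roughly like so (a hypothetical illustration, not core2's implementation; all names here are invented):

```python
# Toy sketch: reconstructing a select * row when the column set has
# evolved over time, filling in columns a given version never had.

versions = [
    {"id": 1, "name": "widget"},             # written before :price existed
    {"id": 2, "name": "gadget", "price": 5}  # written after
]

# The union of all columns ever seen for this table.
all_columns = sorted({c for v in versions for c in v})

def stitch(version):
    # Missing columns surface as None (i.e. the "missing value" case
    # the planner/expression layers would need to handle end-to-end).
    return {c: version.get(c) for c in all_columns}

assert stitch(versions[0]) == {"id": 1, "name": "widget", "price": None}
```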


well, really the only way to get both efficient reads (of a whole entity) and writes (by row) is by denormalizing, which is likely outside your scope since XT will never be an efficient denormalized data store. I already had to split an entity apart across multiple documents in XT1, so I've needed this


the second part sounds like you want a schema?


> XT will never be an efficient denormalized data store
I wouldn't rule it out. The SQL spec is pretty huge and we are keen to implement anything that is useful and feasible
> the second part sounds like you want a schema?
perhaps!