#xtdb
2022-09-01
Panel03:09:27

I'm looking for a way to expose the whole db to be queried by semi-technical users. Maybe as a notebook or something web-based, any recommendations?

tatut03:09:09

queried as in the users will write datalog?

Panel03:09:21

Yes, they already query a legacy db with sql. I'm talking about a handful of coworkers, domain experts. Nothing exposed to external users.

tatut03:09:08

fwiw xtdb-inspector has a query web UI and browsing of documents

refset18:09:13

Is it necessarily needed against the production database?

walterl18:09:03

Metabase would be great for that, but it doesn't look like there's an XTDB driver 😞

refset19:09:14

I had some success porting a Dremio-compatible driver (also uses Calcite) a while back, and I'm aware there have been newer iterations on that front since https://github.com/xtdb-labs/crux-metabase-driver

Panel23:09:52

It doesn't have to be prod, are you suggesting something?

refset09:09:28

Not using prod means you can be sure that it doesn't matter if somebody accidentally ::xt/evicts everything (note there's also a :read-only? option for the HTTP API), and so it opens up a lot more options. Like it may be simplest to get a notebook stack up and running. Would you consider the experience we have built with Nextjournal, for example, to be too complex? https://nextjournal.com/try/learn-xtdb-datalog-today/learn-xtdb-datalog-today You could include snippets for export to csv/xls for any deeper analysis/graphing requirements (note that the built-in http-server console UI already has a csv export feature)
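e.g. a minimal sketch of that read-only setup (module/option names per the 1.x http-server config; a sketch, not a full deployment):

```clojure
;; start an XTDB 1.x node with the HTTP module in read-only mode,
;; so nobody can accidentally submit ::xt/evict through the API
(require '[xtdb.api :as xt])

(def node
  (xt/start-node
   {:xtdb.http-server/server {:port 3000
                              :read-only? true}}))
```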

nivekuil18:09:16

(how) does core2 efficiently patch documents? that's the kind of usage SQL encourages, so I would think so?

refset18:09:23

Hey, no, https://github.com/xtdb/core2 doesn't implement any underlying patching mechanism at the physical layer. Row modifications (~documents) are stored each time in full, on the assumption that the many qualities afforded by immutable chunked object storage are far more valuable than worrying about storage costs. There are of course still going to be trade-offs possible at indexing/query time, and core2 will generally defer as much of the decision-making about optimisation strategies as possible to higher up the stack (which is still very much WIP!). You may be interested to reflect on this minor spoiler from Håkan's upcoming talk https://www.thestrangeloop.com/2022/light-and-adaptive-indexing-for-immutable-databases.html 🙂 (sourced from: https://stratos.seas.harvard.edu/files/stratos/files/periodictabledatastructures.pdf)

nivekuil18:09:59

is a document a "row modification" or a row?

nivekuil18:09:33

if it's the former then I would count that as patching an entity

refset18:09:33

If you consider an unfiltered table (i.e. across all time), a "row" is keyed by ID + the 4 bitemporal timestamp coordinates, which means it's effectively exactly the same as an XT document (note that there's no content hashing to speak of). Internally to the codebase the word "row" has this precise meaning. However the traditional SQL conception of "row" is only keyed by ID, so it's more like an XT entity ...we need to work on making that clearer 😅
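e.g., illustratively (these field names are hypothetical, for explanation only ...the real internal names differ):

```clojure
;; one immutable "row" version is keyed by the entity id plus the
;; four bitemporal timestamp coordinates (hypothetical field names)
{:id             "user-1"
 :app-time-start #inst "2022-09-01T00:00:00Z"
 :app-time-end   #inst "9999-12-31T00:00:00Z"
 :sys-time-start #inst "2022-09-01T00:00:00Z"
 :sys-time-end   #inst "9999-12-31T00:00:00Z"}
```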

refset18:09:38

For use-cases where the storage costs for certain categories of data might be a concern, there are already capabilities for handling external Arrow data and processing it through the query engine alongside the regular managed Arrow data, which means you can potentially build something more specialised to sit alongside core2 and interoperate.

refset18:09:46

> is a document a "row modification"
I think this one is the answer to your exact question, IIUC, but the modification is a full copy, not a delta

nivekuil19:09:09

when I insert a subset of columns of the entity/table, does it write the entire set of columns or just the columns I specified? is there a read-before-write?

nivekuil19:09:32

ah well if it's the same immutable behavior then we still write the entire row/document, I'm just curious how you built the patch-like semantics into the DML

nivekuil19:09:06

without e.g. a transactor

refset22:09:13

> is there a read-before-write?
yep! there's some pretty good-looking write-ups (which I confess I haven't studied) on how this is put together and has evolved, see https://github.com/xtdb/core2/pull/284 and https://github.com/xtdb/core2/issues/296 and https://github.com/xtdb/core2/pull/306 and https://github.com/xtdb/core2/issues/292
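to sketch the read-before-write idea (the tx API shape here is indicative only, not a stable interface): an UPDATE that names a single column still reads the current row version and writes a complete new one, not a delta

```clojure
;; hypothetical API shape: submit a SQL DML op to core2.
;; the engine reads the current version of the matching row(s), then
;; writes full new row versions carrying the untouched columns forward.
(require '[core2.api :as c2]) ; namespace name is an assumption

(c2/submit-tx node
  [[:sql "UPDATE users SET email = 'ada@example.com' WHERE users.id = 'user-1'"]])
```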

refset22:09:16

there is still very much a deterministic transactor component in play here

refset22:09:09

One fairly important TODO in this space (and mentioned in one of those links) is to design and implement something approximating how select * might work conceptually ...in this case, to efficiently know which columns need stitching back together and to figure out what the resulting row needs to look like. Another big related TODO is handling missing values end-to-end through the various planner and expression engine layers, otherwise manipulating data over an evolving column set is ...problematic at best 😅 Lots to do!

nivekuil23:09:21

well, really the only way to get both efficient reads (of a whole entity) and writes (by row) is by denormalizing, which is likely outside your scope since XT will never be an efficient denormalized data store. I already split apart an entity across multiple documents in XT1 so I've needed this
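e.g. roughly like this (a sketch ...the id naming convention is just something made up for illustration):

```clojure
;; one logical "user" entity split across two XT1 documents, so the
;; frequently-updated counters doc can be rewritten without copying
;; the rarely-changing profile doc each time
(require '[xtdb.api :as xt])

(xt/submit-tx node
  [[::xt/put {:xt/id :user-1.profile
              :name  "Ada"
              :email "ada@example.com"}]
   [::xt/put {:xt/id :user-1.counters
              :login-count 42}]])
```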

nivekuil23:09:58

the second part sounds like you want a schema?

refset09:09:42

> XT will never be an efficient denormalized data store
I wouldn't rule it out. The SQL spec is pretty huge and we are keen to implement anything that is useful and feasible
> the second part sounds like you want a schema?
perhaps!