#xtdb
2022-08-18
Aleed14:08:34

love the veracity of explanations in the xtdb vision document! https://xtdb.com/pdfs/vision-doc.pdf

❤️ 1
Ferdinand Beyer15:08:49

Thanks for sharing that link @UPH6EL9DH! 😄

💯 1
Aleed15:08:37

q: does the core1 vs core2 distinction mean sql-first is the way forward? does this have any drawbacks to those that prefer datalog?

refset15:08:29

Hey @UPH6EL9DH thanks for taking a look and for your feedback 🙂 For better or worse, SQL remains the only ~standard way forward for the majority of the data industry for the foreseeable future. No doubt vanilla SQL makes expressing certain kinds of queries much harder than is strictly needed (e.g. compared to implicit Datalog joins and pull), but conversely it has a tonne of features (compared with Datalog) that make it fairly user-friendly for constructing high-level queries very quickly. Ultimately, our intention here is to implement a version of SQL that is simpler for application developers to work with (using Clojure or otherwise) than anything else that exists today, thanks to immutability/dynamism/etc.
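[Editor's note: to make the implicit-join point concrete, here is a hedged sketch contrasting the two styles. The schema, attribute names, and table names are purely illustrative, not from this thread or from xtdb itself.]

```clojure
;; Hypothetical example: "names of users living in Iceland".

;; XTDB 1.x-style edn Datalog: the join between user and country is
;; implicit in the shared ?country logic variable.
'{:find  [?name]
  :where [[?user :user/name ?name]
          [?user :user/country ?country]
          [?country :country/name "Iceland"]]}

;; Roughly equivalent vanilla SQL: the join must be spelled out explicitly.
;; SELECT u.name
;; FROM users u
;; JOIN countries c ON u.country_id = c.id
;; WHERE c.name = 'Iceland';
```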

refset15:08:18

For anyone curious about the context here ("core2"), please see the newsletter we circulated earlier: https://us13.campaign-archive.com/?u=b72ef384b5199134185cbeed8&id=1f5ad9139b (and sign up!)

gratitude 2
👍 1
Aleed16:08:58

@U899JBRPF thanks for the detailed response!
> our intention here is to implement a version of SQL that is simpler for application developers to work with
is it too early to ask you to share an example code snippet of what this may look like? 🙂 is there a suggestion for new users considering xtdb? should we consider waiting for early releases of core2, or is it still too experimental to tell?

refset17:08:36

> is it too early ... is this still too experimental
Yes and yes, sorry 😅 We will have some docs and end-to-end examples to point at within the next few weeks (in time for Strange Loop!), but this current phase is merely an incremental opening up of what has until recently been a private repo containing only code and tests. It definitely remains (rather promising) research at this stage.
> is there a suggestion for new users considering xtdb?
For the avoidance of doubt: new XT users should absolutely embrace 1.21.0 (the latest version) as-is and be excited for the upcoming ingest performance gains in 1.22.0 (currently hovering around ~40% faster for bulk ingestion/re-indexing). At the moment the focus is on stability and performance, but we also have some near-term functional enhancements planned, e.g. re-working the checkpointing functionality a bit.

If people are particularly excited about this concept of SQL-first and core2 more generally, though, then it would be great to chat sometime and hear about the kinds of things you might hope to achieve with XT in the future!

👍 3
🙏 1
tatut04:08:08

I definitely don’t want to go back to SQL, having embraced datalog

👍 4
tatut05:08:03

after a quick glance, it seems xtdb2 will be a completely different thing, so you wouldn’t be able to upgrade to it without a complete migration

tatut05:08:20

hopefully it won’t be confusing having 2 different products with the same name

tatut05:08:32

the timeline shows growing xtdb1 market share up to 2023. will it be abandoned or continue to be supported after that? I’m a little worried… luckily it’s open source, but still

refset07:08:07

> I definitely don’t want to go back to SQL, having embraced datalog
Hey @U11SJ6Q0K that's fair, and I can strongly identify with the sentiment, FWIW. One thing to keep in mind is that Datalog is a much simpler thing to implement than SQL, so we're essentially just tackling the (by far) biggest unknowns first right now.

I can confirm that withdrawing support for Datalog and XTDB 1.* is not at all on the cards for us at any point without a clear & workable migration plan for our customers and OSS user community (including many internal JUXT users!), and absolutely not whilst we have any existing support relationships and partnerships in place. Fortunately, having everything be ~immutable opens up a lot of technical possibilities, particularly when the essential data model (bitemporal documents) is unchanged. We may well even end up simply folding all this work back into the main repository once it's more complete, which hopefully would also minimise many such concerns and confusion.

I'll have a think about addressing this more clearly in the readme today! Thank you (as ever) for chiming in with your thoughts and feedback 🙂

tatut07:08:12

thanks, that is encouraging to hear

☺️ 1
Ben Sless12:08:40

Is SQL first even relevant? As both datalog and SQL map to relational algebra, why not just have a relational algebra engine with two language frontends?

👍 1
refset13:08:06

> why not just have a relational algebra engine with two language frontends?
Great question - in fact that's essentially how we have it all working internally already, and the edn-based relational algebra IR looks something like: https://github.com/xtdb/core2/blob/dc8edd13995717675a6a2b8f6ee3f05a8dc05f5c/modules/datasets/src/core2/tpch.clj#L95

Note that we have the seeds of an edn-Datalog frontend implementation here: https://github.com/xtdb/core2/blob/master/test/core2/datalog/datalog_test.clj / https://github.com/xtdb/core2/blob/master/core/src/core2/datalog.clj (crucially, though, there is no rules/recursion support yet, so it is not yet a true Datalog!)

A big engineering unknown has been that building a sufficiently generic and high-performance relational algebra engine really does require understanding the essential requirements of any proposed frontends ahead of time, and SQL has much more complicated semantics (in terms of bags, 3VL, scoping etc.). Whatever we might have designed with a Datalog-first approach would therefore almost certainly have been insufficient to later build SQL atop, at least not without lots more hammock time and budget to potentially rework everything, so SQL-first was decided to be the lowest-risk path.

It's probably also worth mentioning that we actually couldn't find any suitable "standard"/off-the-shelf approaches to implementing relational algebra for SQL that we might have been able to port across (which could have saved us design time); the closest things are perhaps https://calcite.apache.org/ (the same thing we used to implement https://github.com/xtdb/xtdb/tree/master/modules/sql) and https://substrait.io/ ...so we forged our own path with the edn IR 🙂
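[Editor's note: for readers who haven't followed the tpch.clj link, a hedged sketch of what an edn relational-algebra IR *could* look like follows. The operator names and shape here are illustrative guesses, not core2's actual IR — see the linked file for the real thing.]

```clojure
;; Hypothetical edn relational-algebra tree: "names of users over 21 who
;; have placed an order". Both a SQL and a Datalog frontend could, in
;; principle, compile down to a tree like this.
'[:project [name]
  [:select (> age 21)
   [:join {id user-id}
    [:scan users [id name age]]
    [:scan orders [user-id]]]]]
```

Representing the plan as plain edn data means frontends only need to emit a tree; the engine underneath can rewrite and execute it without caring which query language it came from.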

Ben Sless14:08:02

Regarding a sufficiently generic (and reactive) relational engine, I thought #relic achieved that already

refset15:08:03

I'm not knowledgeable enough (yet!) to comment on the specific IR differences with #relic, but @U0GE2S1NH (relic's author) actually joined the XT dev team back in March...so should be fairly well-qualified to expand on that at some point 😄

Ben Sless15:08:07

That's pretty cool

1
👋 1
Ben Sless09:08:51

I guess that does mean no relic2 anytime soon 🤔

wotbrew10:08:59

2.0 will need some headline feature like 'quantum joins' or 'Microsoft Excel integration'. Hope to get back to #relic soon, do not worry. Datalog will come, hash will be given a day off, and, uh, I'll make it more debuggable or something.

😄 1
zeitstein18:08:43

Has any consideration been given to 'reactive queries' / materialised views for core2?

refset18:08:54

> Has any consideration been given to 'reactive queries' / materialised views for core2? Very briefly - in that it's an interesting research space but is definitely out of scope for the time being 🙂

gratitude 1
refset18:08:19

Immutability makes integrating differential-dataflow (or equivalent) much easier though - and maybe the intent behind https://github.com/xtdb-labs/crux-dataflow will see the light of day after all

zeitstein20:08:31

I have a data modelling question I can't seem to lay to rest. Hoping somebody might have better ideas or advice (including: "nothing to be done – tradeoffs!") 🙂 I want to enable an entity being in multiple locations in the graph. Some of the entity data should be location-dependent. The way I've modelled this so far is:

;; idea 1
  {:xt/id 1 :children [2 3]}
  {:xt/id 3 :children [4]}
  {:xt/id 2 :global :a :local :b}
  {:xt/id 4 :wraps 2 :local :c}
Another idea:
;; idea 2
  {:xt/id 1 :children [2 3]}
  {:xt/id 3 :children [2]}
  {:xt/id 2 :global :a :locations {1 {:local :b} 3 {:local :c}}}
• The biggest downside to idea 2 is the inability to refer to a location by a simple identifier – it always needs [parent id]. Which is not too bad until I think about web URLs: the thought of having two UUIDs in there is not appealing.
• Otherwise, there is much to like about idea 2. Fewer joins and less indirection – much simpler business logic. :local data will not be queried for, so this even keeps the indexes lighter.
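[Editor's note: a hedged sketch of how the indirection tradeoff plays out in XTDB 1.x, using the example data above. The queries are illustrative; `xt` is assumed to alias `xtdb.api`.]

```clojure
;; Fetching the :local value for entity 2 as seen under parent 3:

;; Idea 1 — two joins through the wrapper entity:
'{:find  [?local]
  :where [[3 :children ?w]
          [?w :wraps 2]
          [?w :local ?local]]}

;; Idea 2 — a single entity lookup, then plain map navigation in app code:
;; (get-in (xt/entity (xt/db node) 2) [:locations 3 :local])
```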

tatut04:08:57

I would say it depends on how you want to use the data, what sort of queries you need to efficiently answer

1
tatut04:08:59

like, instead of having a list of children, you could have a child refer to its (possibly many) parents, but then children wouldn’t have order
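[Editor's note: a sketch of the reversed-reference idea being discussed, with illustrative attribute names. If children point at parents, ordering has to be stored explicitly somewhere, e.g. per-parent order data on the child.]

```clojure
;; Child refers to its (possibly many) parents; order must be explicit:
{:xt/id 4
 :parents {1 {:order 0}
           3 {:order 2}}}

;; vs. the original direction, where the parent's vector carries order:
{:xt/id 1 :children [2 3]}
```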

zeitstein05:08:25

Order is what is missing from my analysis above. Order is required and can change. I've also realised that using the dynamic [parent order id] (or just [parent order]) as an identifier cannot work, because it is changeable. E.g. you cannot make a reliable URL from it.

zeitstein05:08:30

And graph traversal is the most fundamental/common kind of query.