#xtdb
2022-05-12
eelke14:05:12

Hi, is it possible to use :in with subqueries? If so, how do you do it?

Martynas Maciulevičius19:05:02

What does the $ stand for? Is it a variable? Also, [x y z] are nested in the inner query. Is it possible to match on nested variables? Is this workable: :where [[a :attr [x y z]]] ?

eelke09:05:37

Thank you. However, I should have phrased the question in more detail. I would like to set the argument from the outside:

(let [uu 4]
    (xt/q (xt/db node)
          '{:find [x]
            :where [[(q '{:find [y]
                          :in [u]
                          :where [[(identity u) y]]}
                        uu)
                     [[x]]]]}))

eelke09:05:31

@U899JBRPF you have a solution for the above? ☝️

nivekuil09:05:06

you could always backtick quote the query and escape the args you're passing into :in. You don't need to nest quotes btw, I'm surprised that it even works
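A minimal sketch of the backtick approach suggested above, not a confirmed XTDB recipe. Syntax-quote namespace-qualifies bare symbols, so each Datalog symbol is written as ~'sym to keep it unqualified, while ~uu splices in the local value. The function name query-with-arg is hypothetical, and it is an assumption here that XTDB accepts a literal value in the subquery's argument position.

```clojure
;; Builds the query as plain data; nothing here talks to a database.
(defn query-with-arg
  "Returns a query map with the value of uu spliced into the subquery call."
  [uu]
  `{:find [~'x]
    :where [[(~'q {:find [~'y]
                   :in [~'u]
                   :where [[(~'identity ~'u) ~'y]]}
              ~uu)          ; the only unquoted local value
             [[~'x]]]]})

;; The result can then be passed to xt/q like any quoted query map.
(query-with-arg 4)
```

Because the result is ordinary data, it can be printed and inspected at the REPL before it is run.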

👍 1
refset11:05:06

Sorry for the late replies!

> What does the $ stand for? Is it a variable?

It's the implicit db context that the query is running against.

> Also [x y z] are nested in the inner query. Is it possible to match according nested variables? Is this a workable thing: :where [[a :attr [x y z]]

Perhaps this is what you want: :where [[a :attr #{x y z}]] ?

> I would like to set the argument from the outside

You can have :in working at multiple levels:

(let [uu 4]
    (xt/q (xt/db node)
          '{:find [x]
            :in [u]
            :where [[(q {:find [y]
                         :in [u]
                         :where [[(identity u) y]]}
                        u)
                     [[x]]]]}
          uu))

eelke11:05:51

> You can have :in working at multiple levels:

Yeah, that works indeed. Why do you need to use multiple levels? Can you not use :in only at the inner level?

eelke11:05:24

It feels a bit like it should not be necessary to do it on multiple levels

eelke11:05:53

But I am happy if this is the way to go, it works 👍

refset11:05:35

I suppose, unlike Clojure, there's no cross-query lexical scoping available in Datalog (or at least how XT implements it!)

braai engineer14:05:52

Colleague is asking, “How is [XTDB] different from traditional temporal tables?” I assume temporal table support in SQL Server.

Tomas Brejla14:05:18

Jeremy's talk at HYTRADBOI might give him some ideas about that. It's short, yet full of great information. https://www.hytradboi.com/2022/baking-in-time-at-the-bottom-of-the-database

Tomas Brejla14:05:40

I'd also recommend this generic talk about datalog databases https://www.youtube.com/watch?v=oo-7mN9WXTw, might help a lot as well

Tomas Brejla14:05:27

To name one thing: I have no idea how they handle indexes and joins when using the temporal table support in SQL Server you mentioned, as I've never had a chance to work with it. But I personally find the "you can efficiently join anything on anything without having to explicitly pre-create an index in advance" quality of datalog dbs a very nice out-of-the-box feature.

dgb2317:05:59

Background: One fairly recent application I made had almost exclusively bi-temporal tables in SQL. I have played and tinkered with datahike (also an excellent choice) and now with xtdb. I'm primarily interested in xtdb because it supports bi-temporality out of the box, but all the other stuff is very, very useful too (more on that later).

I'm convinced that temporality in some sense should be a default choice, meaning you should have a good reason not to use it. Even if just used for auditing, it has saved me hours of debugging work and gives me great confidence when I can talk to a client and explain exactly what happened when. Plus I have made tools/apps where I would have really needed it in retrospect, but didn't know better at the time. Note that I do care about performance (latency mostly) but don't work at scale.

The biggest differences I found so far:
- xtdb is a proper abstraction, as in I don't need to worry about a ton of things as I would with the SQL implementation.
- While SQL and datalog are similarly expressive on an objective/formal level, I find datalog much easier to work with, even though I have spent more time learning and working with SQL. It feels more direct.
- It comes with an in-memory db, which is fantastic for REPL-driven development and exploration.
- It comes with options: embedded/rocksdb is great for the use cases I typically have.
- Schema on read is great. SQL schemas are not expressive enough anyway, so you often end up implementing stuff on top regardless.
- In SQL it is nice to have a temporal range (from, to) for each dimension directly there on a record. This is convenient for local reasoning. In xtdb you have to do a bit more work with queries to get the same view, as far as I know.
- I have no idea about the performance differences.

👍 4
🙏 2
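One way to get a from/to view like the one described above is to derive it from an entity's history. This is a rough sketch that covers the valid-time dimension only. It assumes the entries have the shape returned by xt/entity-history in XTDB 1.x when called with :asc and {:with-docs? true}. The name history->ranges and the id :some-id are made up for illustration.

```clojure
(defn history->ranges
  "Takes entity-history entries sorted ascending by valid time and returns
  one map per version with an explicit :valid-from and :valid-to.
  The newest version gets :valid-to nil, meaning it has no known end."
  [history]
  (map (fn [entry next-entry]
         {:valid-from (:xtdb.api/valid-time entry)
          ;; a version ends where the next one begins
          :valid-to   (:xtdb.api/valid-time next-entry)
          :doc        (:xtdb.api/doc entry)})
       history
       ;; shift the history by one so each entry is paired with its successor
       (concat (rest history) [nil])))

(comment
  ;; Assumes xtdb.api is required as xt and node is a started node.
  (history->ranges
   (xt/entity-history (xt/db node) :some-id :asc {:with-docs? true})))
```

The pairing is done in plain Clojure after the history has been fetched, so the whole history of the entity is read into the sequence first.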
sheluchin17:05:11

Is there any way to delete all history while keeping the latest docs? Re-running ETL routines is causing my DB size to get quite large and I don't have a need for the history in my current dev environment.

Hukka18:05:19

I suppose migrating to a new db, as that whole thing is really counter to the idea of having an immutable db

Hukka18:05:12

Of course, dropping the old db (or whatever the equivalent term in kafka is) has to be done manually

sheluchin18:05:49

But then I'd lose the current copy, no? Or what kind of migration are you suggesting?

Hukka18:05:23

Going through all the current values of entities in the db, and putting them to a clean one
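A rough sketch of that copy, assuming XTDB 1.x. The names docs->put-ops, old-node and new-node are hypothetical, and the batch size of 1000 is arbitrary. The node calls sit in a comment form because they need running nodes. Note that every document would get a fresh valid time in the new db, which is the point here since the history is being dropped.

```clojure
(defn docs->put-ops
  "Turns a collection of documents into one put operation per document."
  [docs]
  (mapv (fn [doc] [:xtdb.api/put doc]) docs))

(comment
  ;; Assumes xtdb.api is required as xt, and that old-node and new-node
  ;; are started nodes.
  (let [docs (map first
                  (xt/q (xt/db old-node)
                        '{:find [(pull e [*])]   ; full current document
                          :where [[e :xt/id]]}))] ; every entity has :xt/id
    (doseq [batch (partition-all 1000 docs)]
      (xt/submit-tx new-node (docs->put-ops batch)))
    ;; wait until the new node has indexed everything submitted above
    (xt/sync new-node)))
```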

sheluchin18:05:36

Yeah, I suppose that is an option. I guess you're right, the idea is somewhat counter to the concept of an immutable db.. but then so is evict, no? I do wonder if there are any other options to evict/purge just history.

Hukka18:05:48

Sure, evict is. It exists only to meet the need that you sometimes need to really delete some data, so that even the DB admins cannot retrieve it. But it's not very useful when you want to delete more than specific things

Hukka18:05:27

If you don't want to keep the history, and size is a real concern, then wouldn't a different datalog db make sense?

sheluchin18:05:10

I do want to keep the history in production but in my dev environment it would be helpful to be able to purge it once in a while. But, as you say, there are ways around it. Sounds like my use case is not common and I should just figure out a thing for myself. Thanks for the input @U8ZQ1J1RR!

Hukka18:05:54

I just run rocksdb in local dev, and just import data from original sources

sheluchin18:05:22

I'm running rocksdb as well. It's just that I have an ETL pipeline for some OLAP stuff and pulling from original sources can take quite a while. I do it sometimes, but other times when I'm just iterating on some particular part of the pipeline, I prefer to re-run just that part, instead of rebuilding everything. So that's where I start to accumulate a bit of document history. I'm moving towards more efficient ways of doing it all, but not quite there yet.

Hukka18:05:29

Could you serialize some way in the middle?

sheluchin18:05:53

How do you mean?

Hukka18:05:35

Well, it sounds like you don't really need the very original format, but you want to iterate on top of something already processed. Could you serialize that and write it to disk?

Hukka18:05:51

Then whenever you want a clean start, you just read from those and put to the db
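The snapshot idea could look roughly like this, using EDN files. The function names are hypothetical. It assumes every value in the documents round-trips through EDN, which holds for keywords, strings, numbers, collections, java.util.Date and UUIDs, but not for arbitrary Java objects.

```clojure
(require '[clojure.edn :as edn])

(defn save-snapshot!
  "Writes the processed documents to path as a single EDN vector."
  [path docs]
  (spit path (pr-str (vec docs))))

(defn load-snapshot
  "Reads back the vector of documents written by save-snapshot!."
  [path]
  (edn/read-string (slurp path)))
```

After a clean start, the loaded documents can be put into the db in the usual way, which skips the slow pull from the original sources.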

sheluchin18:05:01

I'm kind of doing a bit of that already. The pipeline has a bunch of steps with dependencies among them. I think if I define the pipeline as a graph using something like https://github.com/commsor/titanoboa and then establish a good db backup discipline, it should help with making this smoother. Then I can restore from backup whenever. Just haven't done all that work yet.

👍 1
🎉 1
genekim00:05:12

Titanoboa! That's so strange. I was just going to ask in #find-my-lib something like "ETL, DAG, something something?" and gave up. Any chance you can think of any other libraries that support creating DAGs for ETL-like jobs? I've been noodling on this exact problem. I was doing some stuff in Clerk, and piled up 100Ks of db updates, which were completely unnecessary. Now that I've moved to a JDBC database, the consequences I think will be more significant... I haven't quite figured out how to do work in a dev db yet, but have been pondering what strategy to use to do ETL in dev and prod, in a way that doesn't do tons of unnecessary overwrites.

sheluchin14:05:34

@U6VPZS1EK I've also looked at https://github.com/domino-clj/domino, plain old Babashka (using depends), https://github.com/framed-data/overseer, and have thought about leveraging Pathom's planner for the DAG component (not execution). Titanoboa seems like the best fit and is actively developed, so it's at the top of my list. Good to hear I'm not the only person here dealing with this issue 🙂 ETL pipelines can get difficult to reason about because they are somewhat similar to a big chunk of state. Please let me know if you have much luck making an improvement in your workflow.

Martynas Maciulevičius19:05:51

Hey. Which consensus mechanism is used by XTDB? I couldn't find it in the docs.

Martynas Maciulevičius19:05:27

My second try was to look in the source code but I couldn't find it.

Hukka04:05:39

It doesn't have any, it's not distributed

Hukka04:05:55

Well, ok, you can have multiple nodes, but they all read and write to the same transaction log

Martynas Maciulevičius06:05:24

Alright. I just wanted to make sure. And if it had used one, I wanted to look at it.

refset13:05:29

> they all read and write to the same transaction log

That is generally the best way to think about things. Of course, the transaction log itself may be distributed (e.g. see the Kafka docs), but XT isn't aware of that at all. There are however some consensus/consistency nuances within the implementations of evictions and transaction functions that we've tried hard to mitigate/account for, and there are issues I can point to that describe these, e.g. https://github.com/xtdb/xtdb/pull/1054 https://github.com/xtdb/xtdb/issues/432