A question that came up while I was reviewing the book: what's the purpose of d/db in datalevin? In datomic, it gives you a consistent view of the data you can use in multiple places to ensure the data isn't changing out from under you across multiple queries. But datalevin doesn't have databases-as-a-value, so how is what d/db returns distinct from a connection? If it doesn't do anything, why can't you skip it and query a connection directly?
From what I could tell, d/db is identical to derefing the connection, so you could just do that if you wanted I think; I assume its inclusion also adds to API parity
I get the argument from a parity perspective, though I'd imagine that in docs this would be presented as more of a footnote than the default way to use datalevin. Why does a connection need to be deref'd?
d/db has a practical purpose in server mode: it refreshes :last-modified by default, which decides whether we want to refresh the cache.
Does it have a purpose outside of server mode?
Other than server mode needing it, the purpose was mostly compatibility with datascript/datomic. However, the parity between server/embedded is one of the design goals.
That makes sense I guess. It just seems like unnecessary api overhead in embedded mode, which I feel like will be the more common mode to use (though I may well be mistaken)
A level of indirection with conn also allows us to add things like listeners and tx-meta, etc.
In embedded mode, is there any difference in behavior between reusing a db and creating a new one with d/db?
of course, you don't want to reuse a db, it may be stale.
If it were stale would you see old data? I.e. is a db a value or a reference?
you may see wrong data
db is a mutable object that carries some state; that state could be wrong in an old db
So dbs are mutable and may change out from under you, but old ones may not be consistent with the state of all transacted data?
correct
That seems like kind of a footgun. Would it be crazy to have fns that take a db also accept a conn and auto-coerce it to a db?
no, fns accept either db or conn
Right, I'm saying that for example d/q could accept a conn or a db and if it's a conn, call d/db on it
no
there's no point, what's it for?
the point is that d/q is not only working with a datalevin db, the point is to stress the asymmetry between txn and query.
txn is a connection concern, query is not
Otherwise it seems like the api relies on users to use the api perfectly and always call d/db anew for every query. If they mess up and reuse a db, then they silently get the wrong data back, and the problem wouldn't be immediately obvious until some poor dev ends up debugging a very strange issue in production. Also in general, a db isn't meaningful or useful to an api consumer. They (hopefully) know they have to call d/db for every query and never reuse the return value, so it ends up being boilerplate for every query at best, and a time bomb at worst. Given those issues, the first thing I would do after bringing datalevin into a project would be to wrap all of its db-accepting fns to accept a conn and call d/db on it. That's the only correct way to use them anyways, so I'd want to ensure no one on my team ever messed that up. And if I'm doing that, then I feel like it's worth asking if that should just be the api.
This is different from datomic (et al) where a db value is useful on its own as a point-in-time snapshot, you can reuse it across multiple queries to provide them with a consistent view of the database between the queries
they will need to, that's what the book will teach them
the point of d/db is to get a fresh view of the data, that much is clear. what's the problem with that?
I am updating the book to avoid @conn. It will not even be shown in the book.
again, d/q does not work on conn, it works on a snapshot of the data. I think the story is coherent.
it also works on things that are not databases at all
also, it works on more than one database. so I don't think the expectation of symmetry with transaction is warranted. Transaction with conn, q works with (d/db conn), this is perfectly fine. I wouldn't change a thing.
if you introduce a conn to q, you are making things confusing to people. Keep things simple. There's only one way to do things. Much easier to understand.
A connection is different from a db, that much is what people already understand. There's no need to muddy the water.
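For readers following along, the convention described above (transact against the connection, query against a freshly obtained `(d/db conn)`) might look like the sketch below. This is only an illustration under stated assumptions: the storage path, the schema, and the `all-names` helper are hypothetical and not from the thread or the book; `get-conn`, `transact!`, `q`, `db`, and `close` are the `datalevin.core` functions.

```clojure
(require '[datalevin.core :as d])

;; Hypothetical storage path and schema, for illustration only.
(def conn
  (d/get-conn (str (System/getProperty "java.io.tmpdir") "/datalevin-d-db-sketch")
              {:name {:db/valueType :db.type/string
                      :db/unique    :db.unique/identity}}))

(defn all-names
  "Hypothetical helper: takes the conn, obtains the current db at call
  time, and queries it. The db is never stored or reused."
  [conn]
  (set (map first
            (d/q '[:find ?n
                   :where [?e :name ?n]]
                 (d/db conn)))))

;; Writes go through the connection.
(d/transact! conn [{:name "Frege"}])
(def names-before (all-names conn))

(d/transact! conn [{:name "Peirce"}])
;; A new (d/db conn) call inside all-names observes the second transaction.
(def names-after (all-names conn))

(d/close conn)
```

Keeping the `(d/db conn)` call inside the helper is one way to avoid holding a db across transactions; whether to wrap it like this is a team-level choice, not something the thread prescribes.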
Maybe there's something I'm not understanding, but I'm having trouble wrapping my head around something you just said:
> d/q does not work on conn, it works on a snapshot of the data
With Datomic, this is 100% true. d/db returns a consistent snapshot of the data that's the same 5ms and 5 hours later. If you query the same db over and over, you always get the same result.
In Datalevin's docs, it specifically says that Datalevin does not have db-as-a-value semantics, making me think that if you query the value returned by d/db at different times, then you get different results. Is that true?
If not, then I completely understand the split between conn and db and I have no confusion about their separation.
If it is true though that querying the same db at different times can give you different results, then d/q in Datalevin does not operate on a snapshot of the data, it works on something else. I don't think I understand what that thing is and what its semantics are. All my other comments/suggestions lead from that lack of understanding, so perhaps I should start there.
Db is a mutable object. If you use an old db, it is simply wrong. That's it. There is no such thing as old db.
Datalevin only works on current db.
There's no concept called an old db. There is no such thing.
It is like you are holding a reference that has expired. That's about all you can say. What exactly that thing you are holding is, is undefined.
The purpose of calling the db function is to ensure you are working with the current and the only db you should be working with.
Db is not a value, db is a state. Db function allows you to access that state. That's it.
A Db is the surrogate of the external world, which is changing. The db function is basically perceiving the world, which gives you a snapshot of the state, in time. You don't hold onto your perception; you are constantly looking, getting a new look before deciding to do anything. That's the model. The so-called db as a value is simply a wrong model. The world is constantly changing, not something you can hold onto.
The world is a river, you cannot step into the same river twice. That's the mental model. There is no such thing as a static world, therefore, the idea of a previous world doesn't exist.
Added these clarifications in the book. Thanks for pointing these out.
So then what happens if you do reuse an old db reference? Is it essentially undefined behavior?
correct. what exactly is going to happen depends on implementation details that will change and are not part of the public API.
And how long does it stay "fresh"? For example, is this code incorrect?
(defn two-queries [conn]
  (let [db  (d/db conn)
        ids (map first (d/q ...some query with a smaller return size... db))]
    (->> ids
         (filter ...)
         (mapv #(d/pull db [...larger pattern...] (find :attr %))))))
This is a pretty common datomic pattern. For filters that are complicated to apply in a query or for aggregations that would require complicated subqueries, you can just call d/q multiple times with the same db and all the calls will have the same view of the data.
if db is a value, you don't need transaction, ACID etc. That's why Datomic's transaction model is unusual, because it is not compatible with the reason to introduce transaction.
you are encouraged to use a single query to do what you want in datalevin.
the datomic patterns do not work well, because datomic doesn't have a query optimizer
you are not encouraged to do query-like things in your own code, you are supposed to work with the query.
> if db is a value, you don't need transaction, ACID etc. That's why Datomic's transaction model is unusual, because it is not compatible with the reason to introduce transaction.
Transactions aren't just about reading consistent values. ACID is also about ensuring that the db only can transition between consistent states.
Sure, but what I'm really trying to get at is, if db is a mutable reference that goes "stale" over time and should only be used while "fresh", what are the semantics of "fresh"? Is it 1ms? 10ms? 10000ms?
the original reason to introduce the idea of transaction is to simulate the world, which does not do things in inconsistent ways, but computers can. If db is a value, you don't need to introduce this idea, because it is by definition already the case.
it is not about fresh, it is about "current".
Ok, so it stays "current" until the next transaction is applied?
all your reads will be current if you use the db function, because that's what the db function does. It is a snapshot, because it may not be "really" current: it is the db state when you called it. While you are reading, the db may have advanced, but what you are reading is current to that snapshot. So it is a snapshot.
> the original reason to introduce the idea of transaction is to simulate the world
maybe? but that's not the only reason folks currently reach for a db
> which does not do thing in inconsistent ways, but computer can. If db is a value, you don't need to introduce this idea, because it is by definition already the case.
https://en.wikipedia.org/wiki/ACID#Consistency
> Consistency ensures that a transaction can only bring the database from one consistent state to another, preserving database https://en.wikipedia.org/wiki/Invariant_(computer_science): any data written to the database must be valid according to all defined rules, including https://en.wikipedia.org/wiki/Integrity_constraints, https://en.wikipedia.org/wiki/Cascading_rollback, https://en.wikipedia.org/wiki/Database_trigger, and any combination thereof. This prevents database corruption by an illegal transaction. An example of a database invariant is https://en.wikipedia.org/wiki/Referential_integrity, which guarantees the https://en.wikipedia.org/wiki/Unique_key - https://en.wikipedia.org/wiki/Foreign_key relationship. https://en.wikipedia.org/wiki/ACID#cite_note-Date2012-7
Datomic supports validation functions that will reject transactions that would introduce invalid states or violate invariants. It also stores transactions for auditing which is useful in its own right.
what I am saying is, datomic uses the word transaction not in its original sense. that's why its transaction semantics are considered "unusual", because that's not what that word means. Read the Jepsen report on datomic, "unusual" is not the label I give, it is in that report.
What Datalevin does, it is the usual thing almost all other databases do. That's basically what it is.
Sure. I wasn't objecting to "unusual". I was objecting to the idea that transactions don't make sense for datomic.
If we want to have a convo about comparing datomic and datalevin txn semantics, could we move that to a new thread? I'd prefer not to derail this one
If you change the meaning of "transaction", yes.
The difference is that with most databases, you query and transact against the same connection object, which serves as a transparent connection to the outside world. Datomic introduced a separate db object specifically as an immutable value to solve a problem they created by not having traditional "fenced", session-based transactions. I guess I'm confused what value having separate db and connection objects provides if Datalevin has fenced transactions and no db-as-a-value semantics. If the only correct thing you can do with a db object is create it from a connection, pass it to a single api fn, and then discard it, then it seems like it's already basically a connection with extra steps.
right, but datalevin query is more powerful than working with a single connection object. It can also query other things.
such as?
such as querying multiple dbs in the same query, or querying data structures directly.
q is more than working with a single connection.
I have already addressed your expectation of symmetry between txn and query above. that expectation is wrong.
Ok sure, you could do a query that involved multiple dbs. Are they all based on the same conn, or different conns? How do you create a db for a specific database?
they usually are based on different connections. creating a datalevin db is cheap. it's perfectly ok to create many in one application. in fact, a db per user is a common pattern.
a datalevin db is just a single file
Got it. So then, why not have this tiny bit of code in q?
(let [args (map #(if (conn? %) (d/db %) %) args)]
...)
That can handle querying one db, multiple dbs, data structures, whatever
args are more complicated than that: there are rules, there are collections.
of course. But for all of those things, conn? will be false, right?
conn? is expensive, it is what actually refreshes the cache and may send remote calls
conn? calls db?, which checks the timestamp
Oh interesting. Is there no cheap way to check whether an object is a connection? Like via instance? or something?
the design decisions were all trade-offs. you can always argue one way or another. it's a judgement. If you only consider one thing, you may disagree, but if you have a fuller picture, your judgment may change.
it really comes down to whether you trust the designer's judgment or not.
Sure, I totally get it. I'm more trying to understand what factors led to the decision.
And I'm pushing a little extra as a book reviewer, because I suspect you're going to get this exact same line of questioning from anyone who's used datomic before
q is already complicated enough. also, you want clear conceptual boundary. mixing conn with db is not conceptually helpful.
I think some may argue that they've been already mixed since dbs don't have value semantics
that's because you have had contact with datomic. most people haven't. my goal is wide adoption, my target is PostgreSQL, datomic is not in my consideration to be honest
Well, I'd imagine postgres users being confused why they need to call d/db at all
as I have already mentioned many times, I simply believe db as a value is an anti-pattern.
I'm not disputing that. I'm poking at this a little more than I would otherwise because I can imagine d/db being a little bit of a scary function for users. You have to use it correctly or else undefined behavior might silently occur, and the correct way to use it is a) different from how it would be used in datomic, and b) peculiar coming from sql dbs
because datalevin can do more than just a single connection. that's enough to get people to accept a new thing. It has new capability, so what's the matter with accepting a new boilerplate?
no, you use it always, so it's just a matter of course.
Anyways, I think it might be worthwhile to have something in the book that calls out calling d/db as a small piece of necessary boilerplate and explaining how not to use it
done
I think it's easier to justify its presence as "this is just how it works" than to make a design-based argument, since I think any design-based argument will end up with readers going down the same line of questioning I just went down. If we hadn't had this conversation, then I would've wrapped d/q with that same little bit of mapping code that calls conn? not knowing that there were problems with that approach
sure, in the book, it is presented as how it is.
thanks for the discussion, it does improve the book clarity on this.
besides, this required (d/db conn) call would make the future distributed db feature much cheaper and easier to implement and be more performant, as it gives a clear boundary of when to observe the database state now. We no longer have to make sure db is in consistent state all the time.
I'd certainly be interested in taking a look! I started learning/integrating Datalevin into a project at $COMPANY (and being a nuisance in this channel) a few months ago now- and I'd still definitely classify myself as someone who could benefit from a book like this :^)
wow, I can't wait to get started reading