#datalevin
2022-11-24
Kein07:11:54

Hi! Datalevin does not store transaction history[1], but in Re-frame docs, it says it can store snapshots[2]. Does [2] remain true given [1]? Reframe docs: http://day8.github.io/re-frame/application-state/#the-benefits

Eugen08:11:36

datalevin does not run in the browser. it uses LMDB, which needs the mmap feature - not available in browsers

Eugen08:11:10

with webassembly and new web APIs this might change in the medium term (1-3 years)

Kein10:11:15

Datalevin started out as a port of the https://github.com/tonsky/datascript in-memory Datalog database to https://en.wikipedia.org/wiki/Lightning_Memory-Mapped_Database. It retains the library property of Datascript, and it is meant to be embedded in applications to manage state. It seems that Datalevin is comparable to both Datomic and Datascript. But Datascript is for web applications. So what are the common/primary use cases of Datalevin? Are they the following: (1) a substitute for Datomic on the server side of a web application, (2) client-side state management in a desktop application, or are there other common cases?

Eugen10:11:58

what about this link?

Kein10:11:22

Is there a link in your message? I haven’t received it yet

Eugen10:11:21

I got a bit confused with your post above. https://github.com/juji-io/datalevin/ > Datalevin started out as a port of Datascript in-memory Datalog database to Lightning Memory-Mapped Database (LMDB). It retains the library property of Datascript, and it is meant to be embedded in applications to manage state. Because data is persistent on disk in Datalevin, application state can survive application restarts, and data size can be larger than memory.

Eugen10:11:07

KV store, high-performance persistent cache, Datalog DB

Kein10:11:49

Is it meant to be on a server side or client side ?

Eugen10:11:59

some apps require storage, and with datalevin you get a clojure API to a KV / datalog store

Eugen10:11:04

only on server

Eugen10:11:48

it uses datascript query engine and adds file (LMDB) persistence + nice indexes

Eugen10:11:11

datalevin ~ sqlite
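To make the analogy concrete, here is a minimal sketch of the embedded Datalog usage, assuming the `datalevin` dependency is on the classpath; the schema, directory, and data are illustrative:

```clojure
(require '[datalevin.core :as d])

;; Illustrative schema: a unique user-name attribute.
(def schema {:user/name {:db/valueType :db.type/string
                         :db/unique    :db.unique/identity}})

;; Like SQLite, all data lives in a directory on disk;
;; no server process is needed.
(def conn (d/get-conn "/tmp/datalevin/demo" schema))

(d/transact! conn [{:user/name "Alice"}])

;; Query with the Datascript-derived Datalog engine.
(d/q '[:find ?n :where [?e :user/name ?n]] (d/db conn))

(d/close conn)
```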

Kein10:11:15

I see. So if I have Datalevin running on a server, I still need to consider how to manage state on the client, right?

Kein10:11:49

This is one of Datascript’s features

Eugen10:11:26

yes, use datalevin on server and datascript on client if you like

Eugen10:11:36

but they are completely independent

Kein10:11:36

Oh, I see.

Eugen10:11:54

the only link is that datalevin uses the datascript query engine code.

Kein10:11:58

That answers my confusion point

Eugen10:11:01

and reimplements the storage part

Kein10:11:56

I see. Thanks for the reply : D

Eugen11:11:29

you're welcome.

Eugen11:11:40

you should give datalevin a try, regardless

Eugen11:11:54

I think it's a great piece of software

Kein11:11:20

Can you elaborate more about it?

Kein11:11:59

I just figured out that I need a data structure of Datomic style in my application

Kein11:11:02

and I’m choosing which DB to use today. I know there are Datomic and Datalevin, and I also watched the 2020 Clojurians videos

Eugen11:11:46

datalevin is simple to use and get started. you can use kv store or datalog store

Eugen11:11:00

give it a try and see if it is a good fit for your use case

Kein11:11:10

I see the point

Eugen11:11:29

no server required, a file (directory) is all your data
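A hedged sketch of the key-value side, again assuming the `datalevin` dependency; the directory and sub-database name are illustrative:

```clojure
(require '[datalevin.core :as d])

;; Open an LMDB environment backed by a directory on disk.
(def db (d/open-kv "/tmp/datalevin/kv-demo"))

;; A named sub-database ("dbi" in LMDB terms) inside the environment.
(d/open-dbi db "sessions")

;; Put and read back a value; keys and values can be arbitrary EDN.
(d/transact-kv db [[:put "sessions" :user-1 {:logged-in? true}]])
(d/get-value db "sessions" :user-1)

(d/close-kv db)
```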

Kein11:11:27

Haha last month I got started and wrote some schema and data.

Kein11:11:29

But because of the Clojure learning curve, I used Node.js + React to build my prototype version

Kein11:11:41

Now I found that I do need a Datomic style db, and I’m preparing to try Clojure and Datalevin

Kein11:11:51

I’ll try and see if there are any issues : D

pithyless11:11:49

> but in Re-frame docs, it says it can store snapshots[2]

The docs (http://day8.github.io/re-frame/application-state/#the-benefits) you mention say this:

> It is easy to snapshot and restore one central value. Immutable data structures have a feature called structural sharing which means it doesn't cost much RAM to keep the last, say, 200 snapshots. All very efficient.

This is true - in memory. But it is a very different story if you need to keep this information durable across restarts (e.g. by writing it to disk). In fact, the docs say as much:

> But, many web applications are not self-contained data-wise and, instead, are dominated by data sourced from an authoritative, remote database. For these applications, re-frame's app-db is mostly a local caching point, and being able to undo/redo its state is meaningless because the authoritative source of data is elsewhere.

pithyless11:11:17

That is not to say you couldn't design your schema to keep track of changes so that you can audit them later, but neither Datascript nor Datalevin supports this out of the box (unlike Datomic, which does it by default thanks to a different starting schema)

pithyless11:11:46

So, when re-frame mentions things like snapshots, they're talking about a debugging process where you can go back and forth between different versions of your in-memory state (which is very useful for debugging your app) - see e.g. https://github.com/day8/re-frame-10x

pithyless11:11:42

But what they're not talking about is having a complete history of user actions that is useful for long-term auditing, data time-traveling, etc.

Kein11:11:58

And “long-term auditing, data time-traveling” is what Datomic provides out of the box, right?

pithyless11:11:46

in a manner of speaking; what you get for free is that "updates" don't actually clobber values, they produce new transactions with the new values and update indices appropriately. Out of the box, the newest DB snapshot will return the newest version of the value, BUT unlike Datascript and Datalevin, Datomic will always keep the history of the previous versions - so you can also query against those.
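For illustration, a hedged sketch of what that looks like with Datomic's peer API (the connection URI and the `:user/email` attribute are made up):

```clojure
(require '[datomic.api :as d])

(def conn (d/connect "datomic:mem://example"))

;; The current database value sees only the newest assertion ...
(d/q '[:find ?v
       :where [?e :user/email ?v]]
     (d/db conn))

;; ... while the history database exposes every assertion and
;; retraction, each with its transaction and an added?/retracted? flag.
(d/q '[:find ?v ?tx ?added
       :where [?e :user/email ?v ?tx ?added]]
     (d/history (d/db conn)))
```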

Kein11:11:19

I see your point about the difference between in-memory vs database snapshots

pithyless11:11:25

If "data provenance" ("why exactly am I seeing this value now?") is something that is very important to your domain, you may also want to read more about "bitemporal databases" - e.g. https://xtdb.com/

pithyless11:11:25

Datomic guarantees that the order of writes to the DB will be visible in the history; but sometimes in the real world the "perceived order" of the history is not the same as the "actual order" in which it was stored

Kein11:11:57

I see the importance of valid-time

pithyless11:11:09

Val has some good articles on the subject; e.g. this one is good at explaining Datomic history: http://vvvvalvalval.github.io/posts/2017-07-08-Datomic-this-is-not-the-history-youre-looking-for.html

Kein11:11:40

In my case, I don’t have strong requirements for temporal data. I am building a Notion-like system, but the data structure is not a tree, but a graph. So I think there will be frequent updates to the same entity, and it doesn't really exploit the temporal information Datomic offers.

Kein11:11:36

However, I’m thinking about an undo-redo feature, as it is also important. Is it common practice to implement history snapshots on top of Datalevin?

Kein11:11:14

Haha, Val’s title nailed it.

Huahai17:11:51

As mentioned in the README and my talk, Datalevin came from my company’s experience of using Datomic. Our opinion is that the built-in temporal features of databases do not match the temporal needs of applications well. That is why Datalevin chooses to let users build their own temporal features.
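One way to "build your own" along these lines (purely a sketch, not an established pattern from the Datalevin docs; the `:audit/*` attributes and the helper function are made up) is to transact an audit entity alongside each change:

```clojure
(require '[datalevin.core :as d])

;; Hypothetical helper: record each change as its own entity, so the
;; application controls exactly which history it keeps and for how long.
(defn transact-with-audit! [conn eid attr new-value]
  (let [old-value (get (d/entity (d/db conn) eid) attr)]
    (d/transact! conn
                 [{:db/id eid attr new-value}
                  (cond-> {:audit/entity eid
                           :audit/attr   attr
                           :audit/new    new-value
                           :audit/at     (java.util.Date.)}
                    ;; only store the old value when one existed
                    (some? old-value) (assoc :audit/old old-value))])))
```

Undo then becomes a query over the `:audit/*` entities followed by a compensating transaction, with retention entirely under the application's control.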

Huahai17:11:40

As to the use cases of Datalevin, the eventual goal is to have a general-purpose database that serves most data storage and query needs. You can see that in our roadmap. That is to say, Datalevin is planned to be the only operational database of a complex application, instead of having to use multiple different databases, such as Postgres, Datomic, Redis, etc. The only storage that Datalevin does not plan to replace is log storage.

Huahai17:11:56

So in the end, Datalevin can be used as a fast KV store (Redis), a relational DB (SQLite, Postgres, etc.), a document store (MongoDB), a search engine (Lucene), a production rule engine (Jena, etc.), and a graph database (Neo4j); and it can be used as an embedded library, a server, or a distributed store. The goal is to simplify data storage and query by unifying all the read-heavy data needs into a single store.

Huahai17:11:07

For write-heavy data needs, use something else, e.g. Elasticsearch, InfluxDB, etc. For example, for history and audit purposes, one of these logging databases should be used. In my opinion, this should not be done in the operational database. Operational databases deal with the world as of today; history belongs somewhere else. Meshing the two serves only to complicate things unnecessarily.

Kein12:11:10

I see your detailed points.