datalevin

winkywooster 2024-12-31T18:55:50.541929Z

i’m new to datalog, and i’m trying to do a group by (in sql terms). this query gives me the count, but how do i return the actual elements?

(d/q '[:find ?year ?month (count ?e)
         :where
         [?e :local-date ?local-date]
         [(.getMonthValue ?local-date) ?month]
         [(.getYear ?local-date) ?year]
         :order-by [?year :desc ?month :desc]]
       @db)

euccastro 2025-01-04T15:34:16.042009Z

In datalevin you may also want a network boundary between db and querier. For example, if you have various apps using the same DB, then you want a dedicated datalevin process and have the various apps access the DB through it, right?

euccastro 2025-01-04T15:36:13.365749Z

i.e., in a client/server setup, datalevin clients are not mirroring data locally like datomic does, right?

Max 2025-01-04T15:44:37.822609Z

AFAIK datalevin doesn’t support the datomic-like architecture of distributed in-process queriers and a single transactor. It might be on the roadmap (see “distributed mode”)? They do currently support a client-server mode, but my sense is that fewer ppl are using it, and if you’re building an app that needs multiple nodes IMO you might be better off investigating other options, since not using it in-process limits some of the benefits it brings

👍 1
Huahai 2025-01-04T19:33:08.067849Z

I don't have a good sense of how people are using Datalevin. So I don't know if "fewer people are using Datalevin in client server mode" is true or not. As far as I can tell, there is some usage in all Datalevin supported modes at this point: some people are using it as a bb pod; certain companies are using embedded mode in a DB per user arrangement; and, it may come as a surprise to some of you, at Juji we use Datalevin exclusively in client/server mode (a dedicated machine runs a Datalevin uberjar), with multiple nodes hitting the same Datalevin server. As I have repeatedly stated, one of my goals is to replace PostgreSQL. I am not even competing with Datomic and friends.

Huahai 2025-01-04T19:37:36.639819Z

That's why I removed the biggest draw of Datomic: the "DB as a value" ideology. I don't think that's the right way of looking at DB. When your programming language is FP, state has to go somewhere. That place is called a DB.

Huahai 2025-01-04T19:38:19.440399Z

That's how most people are thinking of DB, and they are not wrong.

Huahai 2025-01-04T19:43:12.954809Z

The ultimate goal of my developing Datalevin is to serve as a basis of AI, i.e. the memory and part of the reasoning component of true AI. Of course, my idea of AI is not LLMs. Those are just perceptual components, important, but just a beginning.

Huahai 2025-01-04T19:49:50.655989Z

So really, Datalevin has its own ideology, which is mostly at odds with Datomic's ideology. So if you are buying into Datomic's, you will be disappointed. Outside of AI, my ideology of DB is mostly about ergonomics. I want DB to be easy to use, I want it to be integrated with "normal programming", not a unique skill that you have to dedicate yourself to learning. Basically, I don't want DB to be a big deal as it is today.

Huahai 2025-01-04T19:54:05.669079Z

"There's no specialization in computer science." That's my motto.

Huahai 2025-01-04T19:58:25.579969Z

There are too many thisDB, thatDB, etc. in this industry, it's wasting everybody's time.

Huahai 2025-01-04T20:03:12.170299Z

Too much of today's programming work is to massage data from an xDB into a yDB. These are incidental complexities, as they say. Removing these has a much higher ROI than "rewrite everything in fashionable language of the day".

Huahai 2025-01-04T20:06:24.203249Z

Also, too much of "data science and data engineering" is just wasting time. Data science should be done in DB. After all, that's where data is from and data will go.

Huahai 2025-01-04T20:08:11.665209Z

The language that works with DB should be the same as that of data science. You don't really need anything else.

Huahai 2025-01-04T20:11:51.965499Z

I.e. I want an extended Datalog language that can be used to do normal programming declaratively, including data science.

Huahai 2025-01-04T20:16:21.868649Z

That is to say, I want to follow the trend of programming language evolution: freedoms are taken away, features are removed. First, gotos are removed with structured programming. Then, pointers and manual memory management are removed with GC. Then variables are removed with FP on immutable data. Finally, control flows are removed with declarative Datalog. With each feature removal, programming is closer to human thoughts and further away from implementation details of Von Neumann machines. It's inevitable. Resistance is futile.

🧠 1
🍇 1
Huahai 2025-01-04T20:21:43.261199Z

Does that mean I advocate for Prolog? No, I advocate for layered mixed programming. Use the best tool for the job. Don't use a single language. Use C to deal with hardware, use Java to have structures, use Clojure to deal with logic, use Datalog to deal with data, in a tightly integrated fashion.

Huahai 2024-12-31T19:23:03.074979Z

Instead of (count ?e), just ?e

winkywooster 2024-12-31T19:43:57.446999Z

that works, but i’m getting back [year month entity1] [year month entity2], which is fine and i could do the group-by on those results. instead though, i was wondering if there’s an aggregate function or other way to get that result directly like [year month [entity1 entity2 …]].

Huahai 2024-12-31T19:46:52.526669Z

Datalevin returns relations, same as sql db

winkywooster 2024-12-31T19:58:18.426349Z

true, but most sql’s also provide an array-like aggregator for group by expressions (e.g. group_concat in sqlite or array_agg in postgres). i’m new to datalog stuff, so i’m just trying to wrap my head around how to translate queries. 🙂

winkywooster 2024-12-31T19:59:37.296499Z

@huahaiy much appreciation on answering my question, and thanks for datalevin, it’s been easy to jump into and to get working.

Huahai 2024-12-31T20:57:02.847109Z

We can surely add those. Please file issues on GitHub.

2024-12-31T23:38:59.554309Z

Have you tried :find ?year ?month (distinct ?e) ? Datascript supports distinct.

💥 3
winkywooster 2025-01-01T15:22:32.767509Z

that worked!
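
For reference, a sketch of the original query with the suggested (distinct ?e) swapped in for (count ?e). The name by-month-query is hypothetical, and the query is held as plain data here so the snippet stands alone; running it assumes the same d alias and db from the first message. In Datascript the distinct aggregate yields a set per group, so each row should come back roughly as [year month #{entity1 entity2 …}].

```clojure
;; The asker's query, with (count ?e) replaced by (distinct ?e).
;; `by-month-query` is a hypothetical name used only for illustration.
(def by-month-query
  '[:find ?year ?month (distinct ?e)
    :where
    [?e :local-date ?local-date]
    [(.getMonthValue ?local-date) ?month]
    [(.getYear ?local-date) ?year]
    :order-by [?year :desc ?month :desc]])

;; Usage, assuming `d` and `db` as in the original snippet (not run here):
;; (d/q by-month-query @db)
```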

Max 2025-01-01T16:23:30.676829Z

Also something to keep in mind: the technical impetus for sql to have aggregations is that there’s a network boundary between the db and the queryer, such that passing all the relations back to the queryer for them to aggregate would add a ton of network overhead. This aggregation sub-language in sql adds a ton of complexity: where vs having, all the aggregation functions, windowing, etc. With Datalevin (and many other datalog-likes), there is no network boundary between the db and app, so there’s no reason to add that complexity to the query language. You can just use normal everyday functions to process the data after you’ve retrieved it.

➕ 2
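
A minimal sketch of the "normal everyday functions" approach, assuming the query returns [year month entity] tuples as described earlier in the thread. The names group-entities and sample-tuples are hypothetical, and the sample data stands in for a real query result.

```clojure
;; Group [year month entity] tuples into {[year month] [entity ...]}.
;; `group-entities` is a hypothetical helper, not part of Datalevin.
(defn group-entities
  [tuples]
  (reduce (fn [acc [year month e]]
            ;; fnil supplies an empty vector the first time a key is seen
            (update acc [year month] (fnil conj []) e))
          {}
          tuples))

;; Stand-in for the tuples a query such as
;; [:find ?year ?month ?e ...] would return.
(def sample-tuples
  [[2024 12 1] [2024 12 2] [2025 1 3]])

(group-entities sample-tuples)
;; => {[2024 12] [1 2], [2025 1] [3]}
```

Since a real query result is a set of tuples, the order of entities within each group is not guaranteed unless you sort them afterwards.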