i’m new to datalog, and i’m trying to do a group by (in sql terms). this query gives me the count, but how do i return the actual elements?
(d/q '[:find ?year ?month (count ?e)
       :where
       [?e :local-date ?local-date]
       [(.getMonthValue ?local-date) ?month]
       [(.getYear ?local-date) ?year]
       :order-by [?year :desc ?month :desc]]
     @db)
In datalevin you may also want a network boundary between db and querier. For example, if you have various apps using the same DB, then you want a dedicated datalevin process and have the various apps access the DB through it, right?
i.e., in a client/server setup, datalevin clients are not mirroring data locally like datomic does, right?
AFAIK datalevin doesn’t support the Datomic-like architecture of distributed in-process queriers and a single transactor. It might be on the roadmap (see “distributed mode”)? They do currently support a client-server mode, but my sense is that fewer ppl are using it, and if you’re building an app that needs multiple nodes, IMO you might be better off investigating other options, since not using it in-process limits some of the benefits it brings.
I don't have a good sense of how people are using Datalevin. So I don't know if "fewer people are using Datalevin in client server mode" is true or not. As far as I can tell, all of Datalevin's supported modes see some usage at this point: some people are using it as a bb pod; certain companies are using embedded mode in a DB-per-user arrangement. It may come as a surprise to some of you, but at Juji we use Datalevin exclusively in client/server mode (a dedicated machine runs a Datalevin uberjar), with multiple nodes hitting the same Datalevin server. As I have repeatedly stated, one of my goals is to replace PostgreSQL. I am not even competing with Datomic and friends.
That's why I removed the biggest draw of Datomic: the "DB as a value" ideology. I don't think that's the right way of looking at DB. When your programming language is FP, state has to go somewhere. That place is called a DB.
That's how most people are thinking of DB, and they are not wrong.
The ultimate goal of my developing Datalevin is to serve as a basis of AI, i.e. the memory and part of the reasoning component of true AI. Of course, my idea of AI is not LLMs. Those are just perceptual components, important, but just a beginning.
So really, Datalevin has its own ideology, which is mostly at odds with Datomic's. So if you are buying into Datomic's, you will be disappointed. Outside of AI, my ideology of DB is mostly about ergonomics. I want the DB to be easy to use, and I want it to be integrated with "normal programming", not a unique skill that you have to dedicate yourself to learning. Basically, I don't want the DB to be as big a deal as it is today.
"There's no specialization in computer science." That's my motto.
There are too many thisDB, thatDB, etc. in this industry; it's wasting everybody's time.
Too much of today's programming work is massaging data from an xDB into a yDB. These are incidental complexities, as they say. Removing these has a much higher ROI than "rewrite everything in the fashionable language of the day".
Also, too much of "data science and data engineering" is just wasting time. Data science should be done in the DB. After all, that's where the data comes from and where it will go.
The language that works with the DB should be the same as the language of data science. You don't really need anything else.
I.e. I want an extended Datalog language that can be used to do normal programming declaratively, including data science.
That is to say, I want to follow the trend of programming language evolution: freedoms are taken away, features are removed. First, gotos were removed with structured programming. Then, pointers and manual memory management were removed with GC. Then, variables were removed with FP on immutable data. Finally, control flow is removed with declarative Datalog. With each feature removal, programming gets closer to human thought and further away from the implementation details of von Neumann machines. It's inevitable. Resistance is futile.
Does that mean I advocate for Prolog? No, I advocate for layered, mixed programming. Use the best tool for the job. Don't use a single language. Use C to deal with hardware, use Java to have structures, use Clojure to deal with logic, use Datalog to deal with data, all in a tightly integrated fashion.
Instead of (count ?e), just ?e
that works, but i’m getting back [year month entity1] [year month entity2], which is fine and i could do the group-by on those results. instead though, i was wondering if there’s an aggregate function or other way to get that result directly like [year month [entity1 entity2 …]].
Datalevin returns relations, same as sql db
true, but most sql dialects also provide an array-like aggregator for group by expressions (e.g. group_concat in sqlite or array_agg in postgres). i’m new to datalog stuff, so i’m just trying to wrap my head around how to translate queries. 🙂
@huahaiy much appreciation on answering my question, and thanks for datalevin, it’s been easy to jump into and to get working.
We can surely add those. Please file issues on GitHub.
Have you tried :find ?year ?month (distinct ?e) ? Datascript supports distinct.
that worked!
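The `(distinct ?e)` aggregate gives a set of entities per `[?year ?month]` group. For comparison, here is a minimal plain-Clojure sketch of doing the grouping yourself on the flat relation returned by `:find ?year ?month ?e`, producing the `[year month [entity1 entity2 …]]` shape asked about earlier in the thread. It makes no Datalevin calls; the sample rows, the entity ids, and the `group-entities` name are all hypothetical.

```clojure
;; Hypothetical relation, standing in for the result of
;;   (d/q '[:find ?year ?month ?e :where ...] @db)
;; Each tuple is [year month entity-id]; the ids are made up.
(def rows
  #{[2024 1 101] [2024 1 102] [2023 12 103] [2024 2 104]})

(defn group-entities
  "Groups [year month e] tuples into [year month [e ...]] rows,
  newest year/month first, mirroring :order-by [?year :desc ?month :desc]."
  [rows]
  (->> rows
       ;; key each tuple by its [year month] pair
       (group-by (fn [[year month _]] [year month]))
       ;; collect the entity ids of each group into a sorted vector
       (map (fn [[[year month] tuples]]
              [year month (vec (sort (map #(nth % 2) tuples)))]))
       ;; order groups descending by [year month]
       (sort-by (fn [[year month _]] [year month])
                #(compare %2 %1))))

(group-entities rows)
;; => ([2024 2 [104]] [2024 1 [101 102]] [2023 12 [103]])
```

Grouping client-side keeps each group as a vector in a chosen order, whereas `distinct` in the query yields a set; since entity ids are unique either works, so the choice is mostly about whether you want ordering inside each group.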
Also something to keep in mind: the technical impetus for sql to have aggregations is that there’s a network boundary between the db and the querier, such that passing all the relations back to the querier for them to aggregate would add a ton of network overhead. This aggregation sub-language in sql adds a ton of complexity: where vs having, all the aggregation functions, windowing, etc. With Datalevin (and many other datalog-likes), there is no network boundary between the db and app, so there’s no reason to add that complexity to the query language. You can just use normal everyday functions to process the data after you’ve retrieved it.
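The point about SQL's aggregation sub-language can be made concrete: once the relation is in memory, WHERE is a filter before grouping, GROUP BY with COUNT(*) is `frequencies`, and HAVING is a filter after grouping. A minimal sketch in plain Clojure (no Datalevin calls); the sample rows and both function names are hypothetical.

```clojure
;; Hypothetical [year month entity-id] relation returned by a query
;; such as :find ?year ?month ?e; the ids are made up.
(def rows
  #{[2024 1 101] [2024 1 102] [2023 12 103]
    [2024 2 104] [2024 2 105] [2024 2 106]})

(defn counts-by-month
  "GROUP BY year, month with COUNT(*): a map of [year month] -> count."
  [rows]
  (frequencies (map (fn [[year month _]] [year month]) rows)))

(defn months-in-year-with-at-least
  "Roughly: SELECT year, month FROM ... WHERE year = ?
  GROUP BY year, month HAVING COUNT(*) >= n."
  [year n rows]
  (->> rows
       (filter (fn [[y _ _]] (= y year)))  ; WHERE: filter rows before grouping
       counts-by-month                     ; GROUP BY + COUNT(*)
       (filter (fn [[_ c]] (>= c n)))      ; HAVING: filter groups after counting
       (map key)                           ; keep just the [year month] keys
       (into (sorted-set))))

(months-in-year-with-at-least 2024 3 rows)
;; => #{[2024 2]}
```

Each SQL clause maps onto an ordinary sequence function, so changing the order of operations or adding another step is just editing the threading pipeline rather than learning another clause.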