
I want to find every entity that has the attribute :my/keys. How do I write that in Datalog? (d/q [:find ?e :in $ [?keys ...] :where [?e :my/keys ????]] db [:foo/bar :bar/quiux])


What do you mean by "entity that keys the attribute"?


What is the valueType of :my/keys?


Not if its type is ref and the input keywords are idents


:my/keys is a ref with cardinality many. Each of its values should have a :db/ident. If I pass :a :b :c, I want just the entities that have these keys.


You have to convert ?keys from idents to entity ids, using datomic.api/entid or an extra :where clause


The v slot of query match clauses is never interpreted, because its interpretation depends on the attribute


Only the e and a slots understand lookup refs and idents, because their types are known structurally
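The extra-clause approach above can be sketched like this (assuming the input keywords are :db/ident values of the entities that :my/keys refers to):

```clojure
(require '[datomic.api :as d])

;; Resolve each input ident to the entity it names, then match on :my/keys.
(d/q '[:find ?e
       :in $ [?key-ident ...]
       :where
       [?key :db/ident ?key-ident]   ; ident -> entity id of the key entity
       [?e :my/keys ?key]]           ; entities referencing that key entity
     db
     [:foo/bar :bar/quiux])
```

Alternatively, convert up front and pass entity ids as the collection binding: `(map #(d/entid db %) [:foo/bar :bar/quiux])`.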


How does datomic handle a query that needs to use more data than fits in memory?


well, depending on what you are asking:


I am a newb so don’t take my word for it, but Datomic executes Datalog queries by steps (I think @favila is the one who explained this to me, so he might be able to correct me / explain it to you)


The data queried for each of these steps should fit in memory


but this does not mean you cannot query a dataset larger than what fits in memory of course


just that the data returned by the query (and the data used in each of the intermediate steps) should fit in memory
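When a result set genuinely cannot fit in memory, the raw index access API is the usual escape hatch: d/datoms returns a lazy iterable over an index, so you can stream through it instead of materializing a d/q result set (a sketch; `conn` and `:my/attr` are placeholders):

```clojure
(require '[datomic.api :as d])

;; Walk the :aevt index lazily for a single attribute; consuming it with
;; lazy sequence functions avoids holding the full result set in memory.
(->> (d/datoms (d/db conn) :aevt :my/attr)
     (map :v)     ; each datom has :e :a :v :tx :added
     (take 10))
```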


I suspect this is also the reason Datomic threw the following exception at me: > Exception Insufficient bindings, will cause db scan


It’s basically a degenerate case of the “query step data does not fit in memory”. If clauses are not specific enough then Datomic cannot use the indexes to narrow down the data to get from storage, and so it would have to scan the whole database, which in most applications would not fit in RAM
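For instance, a clause that binds neither the e nor the a slot leaves Datomic with no index to narrow by, so it refuses to run rather than scan every datom (a sketch; the exact exception message may differ):

```clojure
(require '[datomic.api :as d])

;; Only ?v is bound here. With no entity or attribute to narrow the search,
;; Datomic would have to scan the whole database, so it throws an
;; "insufficient bindings" exception instead of attempting the scan.
(d/q '[:find ?e
       :in $ ?v
       :where [?e _ ?v]]
     db
     "some-value")
```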


There might be other reasons, but since you asked the question and I just had this error 5min ago I thought it might be partially related


Unrelated question: are datomic backups storage-agnostic? e.g. if I use “dev” mode for a while and then decide to move to Dynamo or SQL later, will I be able to smoothly transition by populating the new storage from a backup?
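Datomic's backup format is indeed independent of the source storage; the `backup-db`/`restore-db` tools move data through a file or S3 URI, so restoring into a different storage is the documented path (a sketch; the URIs below are illustrative placeholders):

```shell
# Back up a dev-storage database to a local directory
bin/datomic backup-db datomic:dev://localhost:4334/my-db file:/backups/my-db

# Later, restore the same backup into a different storage (e.g. SQL)
bin/datomic restore-db file:/backups/my-db datomic:sql://my-db?jdbc-connection-string
```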


With dates, is it common practice to store them as Java dates and coerce them to clj-time/Joda dates each time you query? Or is there a better way, such as just keeping them as clj-time/Joda dates in Datomic, so everywhere they are always clj-time/Joda dates? The 'clj-time/Joda everywhere' approach makes sense to me, but all the examples I've seen store Java dates.
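For context: Datomic's :db.type/instant is stored and returned as java.util.Date, so the common pattern is to coerce at the boundary rather than store Joda values (a sketch using clj-time.coerce; assumes clj-time is on the classpath):

```clojure
(require '[clj-time.coerce :as c])

;; What Datomic hands you back from a :db.type/instant attribute:
(def stored (java.util.Date.))

;; Coerce to a Joda DateTime when reading query results...
(c/from-date stored)

;; ...and back to java.util.Date before transacting.
(c/to-date (c/from-date stored))
```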


There is a feature request for support of java.time Instants


@hmaurer well that explains why they tout a customer using Spark to overcome the limitation


But definitely the weak spot of the datomic architecture, even if most queries in a given system don't hit this wall.


By the way, while the docs still say that memory is cheap enough to fit all the data in memory, this is not a reality with enterprise data center memory prices (even if it is for your desktop machine).


The problem here is scalability and reliability: you can simply find out one day that your queries no longer fit in memory just because data has accumulated over time. That is quite a terrible situation unless you can add more memory on demand to each machine in your cluster in emergency mode, which is, well, a terrible scenario..


I've been through the process of moving all our aggregations from Datomic to Elasticsearch and it went quite well. I see a lot of people who use a relational store and end up in a much worse situation when they hit that wall, because mutable databases simply aren't well suited to feeding derived data stores: they can't answer 'what changed' queries out of the box


@val_waeselynck are you using the log api to keep ES in sync? Or do you follow another approach? out of curiosity


Yeah it’s much easier to do on Datomic… There is bottledwater for postgres but it’s much more complex


Is the Log API simply "the way" to sync all data changes to an external target, such as ES or even HDFS?


@matan you can do it in whatever way you want, but even when using a SQL datastore you usually want to sync data changes from a stream of events that describes all changes in your main data store


and the log API provides you with that
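A minimal sync loop over the Log API can be sketched like this (`index-into-es!` and `save-checkpoint!` are hypothetical functions you would write for your downstream store; `conn` and `last-synced-t` are assumed):

```clojure
(require '[datomic.api :as d])

;; Poll the transaction log for everything since the last synced t,
;; and push each transaction's datoms to the derived store.
(let [log (d/log conn)]
  (doseq [{:keys [t data]} (d/tx-range log last-synced-t nil)]
    (index-into-es! data)    ; `data` is the seq of datoms in that tx
    (save-checkpoint! t)))   ; remember where to resume next poll
```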


On your earlier message, that’s not quite true. I don’t know Datomic’s internals in details but I am pretty sure that standard queries on the “present” will not degrade in performance / memory consumed for a large database


I only commented on not scaling by query data size, not the size of the database... but some queries will grow with the size of the database, and then you hit that wall


@matan I was just commenting on the part ” as you can simply one day find out that your queries no longer fit in memory just because data accumulation had persisted over time”


@hmaurer I know 🙂 and it still holds. Some queries grow with the database, so my statement holds 😉


@matan possibly, but Datomic’s target market isn’t big data. Also, if you have a query which would need to go over extremely large amounts of data, you are likely better off denormalising into another datastore


which seems fairly straightforward to do with Datomic’s log API


Yeah, basically what Nubank is doing with Spark.


Kind of odd though, not aiming to be scalable in the size of the data in this way.


There are always tradeoffs. Datomic can handle pretty large amounts of data, but their goal clearly wasn’t to build a database to process huge volumes of data/writes. They favoured other properties


Is there a tool or monitor for query memory utilization then?