datomic 2017-07-16 | Slack Archive

souenzzo03:07:47

I want to find every entity that keys the attribute :my/keys. How do I write that in Datalog? (d/q '[:find ?e :in $ [?keys ...] :where [?e :my/keys ????]] db [:foo/bar :bar/quiux])

favila03:07:31

What do you mean by "entity that keys the attribute"?

favila03:07:05

What is the valueType of :my/keys?

robert-stuttaford08:07:19

@U2J4FRT2T ???? -> ?keys

favila13:07:19

Not if its type is ref and the input keywords are idents

souenzzo11:07:33

:my/keys is a ref with cardinality many. Each of its values should have a :db/ident. If I pass :a :b :c, I want just the entities that have these keys.

favila12:07:55

You have to convert ?keys from idents to entity ids, using datomic.api/entid or with another clause

favila12:07:58

The v slot of query match clauses is never interpreted because its interpretation depends on the attribute

favila12:07:58

Only the e and a slots understand lookup refs and idents because their types are known structurally
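
A minimal sketch of the two options favila mentions, assuming `d` aliases `datomic.api`, `db` is a database value, and each input keyword is the :db/ident of an existing entity. The function names are hypothetical. Both versions return entities that reference any of the given keys; requiring all of them would need additional logic.

```clojure
(require '[datomic.api :as d])

;; Option 1: resolve the idents inside the query with an extra clause.
;; ?k is bound to the entity whose :db/ident equals the input keyword,
;; so the v slot of the :my/keys clause receives an entity id.
(defn entities-with-keys
  [db key-idents]
  (d/q '[:find ?e
         :in $ [?key-ident ...]
         :where
         [?k :db/ident ?key-ident]
         [?e :my/keys ?k]]
       db key-idents))

;; Option 2: resolve the idents up front with d/entid and pass entity ids in.
;; keep drops idents that do not resolve (d/entid returns nil for those).
(defn entities-with-keys*
  [db key-idents]
  (d/q '[:find ?e
         :in $ [?k ...]
         :where [?e :my/keys ?k]]
       db (keep #(d/entid db %) key-idents)))

;; Usage, with the keywords from the original question:
;; (entities-with-keys db [:foo/bar :bar/quiux])
```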

matan18:07:44

How does datomic handle a query that needs to use more data than fits in memory?

hmaurer18:07:22

@matan it can’t

hmaurer18:07:34

well, depending on what you are asking:

hmaurer18:07:34

http://docs.datomic.com/query.html#memory-usage

hmaurer18:07:17

I am a newb so don’t take my word for it, but Datomic executes Datalog queries by steps (I think @favila is the one who explained this to me, so he might be able to correct me / explain it to you)

hmaurer18:07:55

The data queried for each of these steps should fit in memory

hmaurer18:07:10

but this does not mean you cannot query a dataset larger than what fits in memory of course

hmaurer18:07:25

just that the data returned by the query (and the data used in each of the intermediate steps) should fit in memory

hmaurer18:07:46

I suspect this is also the reason Datomic threw the following exception at me: > Exception Insufficient bindings, will cause db scan

hmaurer18:07:10

It’s basically a degenerate case of the “query step data does not fit in memory”. If clauses are not specific enough then Datomic cannot use the indexes to narrow down the data to get from storage, and so it would have to scan the whole database, which in most applications would not fit in RAM

hmaurer18:07:31

There might be other reasons, but since you asked the question and I just had this error 5min ago I thought it might be partially related

hmaurer19:07:53

Unrelated question: are datomic backups storage-agnostic? e.g. if I use “dev” mode for a while and then decide to move to Dynamo or SQL later, will I be able to smoothly transition by populating the new storage from a backup?

val_waeselynck19:07:40

Sure you can

hmaurer20:07:54

@val_waeselynck thanks!

hmaurer19:07:57

@val_waeselynck maybe ^

cjmurphy19:07:44

With dates, is it common practice to store them as java dates and coerce them to clj-time/joda dates each time you query? Or is there some better way, such as just keeping them as clj-time/joda dates in datomic, so that everywhere they are always clj-time/joda dates? The 'clj-time/joda everywhere' approach makes sense to me, but all the examples I've seen have java dates being stored.

danielcompton00:07:26

There is https://receptive.io/app/#/case/17713 to request support for java.time Instants
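
Datomic's :db.type/instant attributes hold java.util.Date values, so one possible approach to the question above is to convert at the boundary with `clj-time.coerce`. This is only a sketch: the helper names and the :order/placed-at attribute are hypothetical, and the clj-time library is assumed to be on the classpath.

```clojure
(require '[clj-time.coerce :as tc])

;; Convert a Joda DateTime to java.util.Date just before transacting.
(defn ->db-instant
  [joda-date-time]
  (tc/to-date joda-date-time))

;; Convert a java.util.Date read from Datomic back to a Joda DateTime.
(defn <-db-instant
  [java-date]
  (tc/from-date java-date))

;; Usage (hypothetical attribute):
;; {:order/placed-at (->db-instant some-date-time)}
;; (<-db-instant (:order/placed-at entity))
```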

matan19:07:08

@hmaurer well that explains why they tout a customer using Spark to overcome the limitation http://www.datomic.com/nubanks-story.html

matan19:07:31

But definitely the weak spot of the datomic architecture, even if most queries in a given system don't hit this wall.

matan19:07:20

By the way, while the docs still say that memory is cheap enough to fit all the data in memory, this is not a reality with enterprise data center memory prices (even if it is for your desktop machine).

matan19:07:50

The problem here is scalability and reliability, as you can simply one day find out that your queries no longer fit in memory just because data accumulation had persisted over time; which is quite a terrible situation unless you can plug in more memory on demand across each machine in your cluster in emergency mode, which is, well, a terrible scenario.

val_waeselynck20:07:57

I've been through the process of moving all our aggregations from Datomic to Elasticsearch and it went quite well. I see a lot of people who use a relational store and end up in a much worse situation when they hit that wall, because mutable databases simply aren't well suited to feeding derived data stores, as they can't answer 'what changed' queries out of the box

hmaurer20:07:34

@val_waeselynck are you using the log api to keep ES in sync? Or do you follow another approach? out of curiosity

val_waeselynck20:07:50

Yes

hmaurer20:07:33

Yeah it’s much easier to do on Datomic… There is bottledwater for postgres but it’s much more complex: https://github.com/confluentinc/bottledwater-pg

matan20:07:41

Is the log API simply "the way" to sync all data changes to an external target, such as ES or even HDFS?

hmaurer20:07:44

@matan you can do it in whatever way you want, but even when using a SQL datastore you usually want to sync data changes from a stream of events that describes all changes in your main data store

hmaurer20:07:49

and the log API provides you with that
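
A rough sketch of reading changes from the log, assuming `d` aliases `datomic.api` and `conn` is a peer connection. `changes-since` and `send-to-index!` are hypothetical names; how the datoms are turned into Elasticsearch documents is left out entirely.

```clojure
(require '[datomic.api :as d])

(defn changes-since
  "Returns a lazy seq of {:t ... :datoms [...]} maps, one per transaction
  in the log with t >= start-t."
  [conn start-t]
  ;; d/tx-range takes an inclusive start and an exclusive end;
  ;; a nil end means 'up to the end of the log'.
  (for [{:keys [t data]} (d/tx-range (d/log conn) start-t nil)]
    {:t t
     :datoms (map (fn [datom]
                    {:e      (:e datom)
                     :a      (:a datom)
                     :v      (:v datom)
                     :added? (:added datom)})
                  data)}))

;; Usage: push every change after the last synced t to the external store.
;; (doseq [change (changes-since conn last-synced-t)]
;;   (send-to-index! change))   ; send-to-index! is hypothetical
```

Tracking the last processed t in the external store lets the sync resume from where it stopped.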

hmaurer20:07:56

On your earlier message, that’s not quite true. I don’t know Datomic’s internals in detail but I am pretty sure that standard queries on the “present” will not degrade in performance / memory consumed for a large database

matan20:07:29

I only commented on not scaling by query data size, not the size of the database... some queries will grow with the size of the database, and then ......

hmaurer20:07:25

@matan I was just commenting on the part “as you can simply one day find out that your queries no longer fit in memory just because data accumulation had persisted over time”

matan09:07:31

@hmaurer I know 🙂 and it still holds. Some queries grow with the database, so my statement holds 😉

hmaurer20:07:37

@matan possibly, but Datomic’s target market isn’t big data. Also if you have a query which would need to go over extremely large amounts of data you are likely better off denormalising in another datastore

hmaurer20:07:58

which seems fairly straightforward to do with Datomic’s log API

hmaurer20:07:42

Yeah, basically what Nubank is doing with Spark.

matan20:07:29

@hmaurer :thumbsup:

matan20:07:10

Kind of odd though, not aiming at being scalable in the size of the data, in this way.

hmaurer20:07:08

There are always tradeoffs. Datomic can handle pretty large amounts of data, but their goal clearly wasn’t to build a database to process huge volumes of data/writes. They favoured other properties

matan20:07:41

Clear

matan20:07:22

Is there a tool or monitor for query memory utilization then?
