datomic 2018-01-09 | Slack Archive

chrisblom12:01:11

Is anyone using Datomic for timeseries? I've been using Datomic for simple timeseries data by using a compound id (event id + attribute + timestamp), but I'm running into some issues with this approach (slow queries, and expired data is not easy to remove), so I was wondering if and how other people solve this.

chrisblom12:01:26

i'm thinking now that Datomic is not a good fit for this purpose, but I hope someone can prove me wrong

val_waeselynck16:01:08

In my experience Datomic is not a very good fit when you want fast aggregations - we offloaded all of ours to ElasticSearch. Datomic still plays an important role in that: it makes data synchronization easy

chrisblom14:01:50

thanks, what do you mean by data synchronization in this case?

val_waeselynck16:01:35

@U0P1MGUSX making sure the ElasticSearch materialized view gets updated correctly and efficiently.

Vincent Cantin13:01:21

Out of curiosity, has anyone already developed a bridge between git's repository format and Datomic? Would there be any practical reason to do it?

Vincent Cantin13:01:48

I don't know if Datomic would be appropriate for this usage, given the potentially huge size of the data, but I would definitely see an advantage in the more expressive queries we could run on imported git repositories.

chrisblom13:01:24

Have you seen this project? https://github.com/Datomic/codeq

Vincent Cantin14:01:43

I will definitely look into it. thx

conan16:01:30

Hi folks, i'm getting this error: ActiveMQSecurityException AMQ119031: Unable to validate user when trying to connect to a transactor running on Heroku. My understanding is that it's a licensing issue, but i'm using a Datomic Pro Starter Edition license which allows unlimited peers. Does anybody have an idea what else could cause this?

conan18:01:51

turns out i'm not able to expose the datomic port using heroku 😞

hansw17:01:48

I have a question about capacity planning. My process compares millions of records, one-by-one (with some parallelism involved). The left-hand version of an entity is almost always already in the db, unless we encounter a 'new' one. Also, we will encounter each record only once. Is it fair to say that I should try to look for a way to disable any caching the datomic-peer (and indeed the transactor) might do for this usecase?

chrisblom08:01:04

are you only processing these records once?

hansw17:01:10

So, actual calls to transact are pretty rare, whereas reads from the db using entity happen 99% of the time...

hansw18:01:25

Or... i try to get as much of my database in memory as I can upfront, which leads to another question: how do I determine the size of my db?

souenzzo18:01:02

"Size of db" inside SQL/Dynamo? Inside memory/running code? In (d/datoms) form? In backup form?

hansw18:01:29

Inside running code... I guess what I'm asking is what the ratio sizeof(psqldump) : running memory is.

hansw18:01:48

Alas, I'm pretty sure it won't fit. A single file I process is roughly 22 GB. That doesn't fit on my laptop 🙂

hansw18:01:57

Not in mem, at least... But I would consider getting lots of mem for the production-stage. Just hard to know how much that would require.

souenzzo19:01:04

I'm also interested. In my case, it would be cool to know how much "peer memory" (50% of JVM) is enough to fit a database that has a backup of X GB

hansw20:01:55

i'll let you know if i find out more

calebp18:01:14

I’m not sure if any of this has changed, but I believe the docs mention being careful about making the heap too big, so that you don’t introduce large gc pauses. The alternative is to use the memcached integration for larger memory caches.

calebp18:01:44

I don’t actually see it in the docs, so it must have been support convos. If you're async anyway, longer pauses might not be a big deal

hansw20:01:59

@U0H4HJB08 thanks! i have indeed run into gc problems... I have decided to go the client-api route so as to be isolated from the peculiarities of the peer library for this high-traffic scenario i have

hansw20:01:39

@U0H4HJB08 in this usecase i will never hit any of the caches because i am hitting each entity in my database once

hansw20:01:48

so caching is futile

hansw20:01:54

even counterproductive

hansw20:01:15

unless my db were small and i could load all of it in mem

hansw20:01:32

my postgresql dump is 19 GB

donmullen23:01:47

What is the proper syntax for using pull within a clojure peer client query and using defaults?

luchini23:01:52

[:find
   (pull ?e [:job/doc-num (:job/filing-date :default "")])
   :in $
   :where
   [?e :job/job-num "01"]]

luchini23:01:10

☝️ pull as the first form after :find

luchini23:01:20

that might do the trick

donmullen23:01:33

@luchini Evidently one can put values before the pull that are included as variables in the query. And taking out the ?e doesn’t help. Wondering if I should just use (get-else …) within the :where.
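The get-else alternative mentioned above could look roughly like the sketch below. It reuses the attribute names from the thread and assumes that :job/filing-date is a cardinality-one attribute, since get-else only works on those. The var name jobs-with-filing-date and the conn in the trailing comment are hypothetical, and the query is only defined as data here, not run against a database.

```clojure
;; Sketch only: the query is plain data and is not executed here.
;; Assumes :job/filing-date is cardinality-one (a get-else requirement).
(def jobs-with-filing-date
  '[:find ?doc-num ?filing-date
    :in $
    :where
    [?e :job/job-num "01"]
    [?e :job/doc-num ?doc-num]
    ;; get-else binds the attribute value, or "" when the entity has none
    [(get-else $ ?e :job/filing-date "") ?filing-date]])

;; With the peer library on the classpath it would be run along these lines
;; (conn is a hypothetical, already established connection):
;; (require '[datomic.api :as d])
;; (d/q jobs-with-filing-date (d/db conn))
```

Unlike the pull form, this returns tuples of [doc-num filing-date] rather than maps, so the caller would have to adapt to that shape.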
