#datomic
2018-09-14
mping08:09:14

Hi, anybody using datomic for ETL? We have a low throughput webapp, and we’d like to start collecting events for some kind of BI. We don’t need fancy lambda architecture, just a collector (eg: kafka) and an ingestor. My only concern is that we want to do some heavy-ish aggregations and I’d like to do it straight from datomic

stuarthalloway12:09:33

@U0MQW27QB I do some ETL with Datomic. Often nothing more complex than a functional entry point that transforms and transacts. Do you have a particular concern?
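
For reference, a minimal sketch of that “transform and transact” shape, assuming the client API; event->tx-data, ingest! and the :event/* attribute names are illustrative, not from the thread:

(require '[datomic.client.api :as d])

(defn event->tx-data
  "Turn one incoming BI event (a plain map) into Datomic tx-data."
  [{:keys [user-id type occurred-at]}]
  [{:event/user        user-id
    :event/type        type
    :event/occurred-at occurred-at}])

(defn ingest!
  "Functional entry point: transform an event and transact it."
  [conn event]
  (d/transact conn {:tx-data (event->tx-data event)}))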

mping13:09:32

yes, how it will handle aggregations

mping13:09:13

we basically want to do slice and dice, rollups and such on the fly - we’re pretty low volume, so if we don’t have to design a data warehouse that would be great

mping13:09:27

low volume meaning a couple of events a sec at peak, at most

mping13:09:52

so in fact, besides the etl I’d like to query datomic directly

stuarthalloway14:09:34

@U0MQW27QB this seems totally fine. I would consider putting this stuff in its own db.
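
For the kind of on-the-fly rollups discussed above, a plain Datalog aggregate is often enough at this volume. A hedged sketch, assuming hypothetical :event/type and :event/amount attributes:

(require '[datomic.client.api :as d])

;; Rollup by event type: count and sum per type, straight from a db value.
(d/q '[:find ?type (count ?e) (sum ?amount)
       :where
       [?e :event/type ?type]
       [?e :event/amount ?amount]]
     (d/db conn))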

mping14:09:09

yeah we will have a source of truth in sql, then emit some events, and ingest it to datomic

mping14:09:19

backed by pg probably

mping14:09:26

thanks, will do a spike!

mping08:09:42

I also took a look at onyx but we’re low throughput so don’t need that right now

dominicm08:09:34

I only have 1 data point for StorageGetMsec, and I'm a little confused by that. Shouldn't it be triggered fairly regularly? Does Memcache have an effect on this metric?

dominicm14:09:06

I didn't understand the difference between storage and object, that clears it up, thanks!

stijn09:09:35

question on ions: is it a good idea to handle asynchronous workloads on the datomic cluster (with core.async). the use case is: based on some update in the database, another HTTP api needs to be called. the process initiating the db update should not wait on the HTTP response of the other API. I think I can use vars or atoms to keep references to channels so the core.async machinery doesn't get garbage-collected. Are there any issues with this approach? Or is it better to hand off this functionality to a physical queue, with e.g. lambdas processing the queue (which is quite a different level of operational complexity 🙂 )?

stuarthalloway12:09:16

@U0539NJF7 I think you could do either. If your async work becomes compute intensive, it might interfere with our standard advice on scaling, since the async work would be invisible to our metrics

stuarthalloway12:09:27

but you would have choices at that point, including scaling on your own metric, and (as always) isolating the workload on specific processes
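
One possible shape for the approach stijn describes, as a sketch only: keep the channel and worker in top-level defonce vars so they survive GC and code reloads, and hand updates off without blocking the transacting code path. notify-other-api! and log-error are hypothetical helpers.

(require '[clojure.core.async :as a])

(defonce notifications (a/chan 100))

(defonce notifier
  (a/thread
    (loop []
      (when-some [update (a/<!! notifications)]
        (try
          (notify-other-api! update)   ; your blocking HTTP call
          (catch Exception e
            (log-error e update)))     ; hypothetical logging helper
        (recur)))))

;; After the db update succeeds, hand off without waiting:
(a/offer! notifications {:entity-id 123 :op :updated})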

stijn20:09:28

Ok, makes sense, Stuart. Thanks!

damian15:09:38

Hi! Here at Magnet we’ve played a bit with the awesome Conformity migration library by @avey_q. However we’ve had some issues with transactions getting reapplied unexpectedly, so we’ve developed a simplified, atomicity-compliant version of it. We haven’t explored Ragtime by @weavejester yet, but in case it’s useful for some of you, here it is: https://github.com/magnetcoop/stork

ghaskins15:09:19

hello… I’m having a strange (but probably simple PEBKAC) problem with unique identities. I have a transaction that includes something like:

[{:db/id pid :person/email email}
 {:db/id mid :club/member pid}]

ghaskins15:09:48

where :person/email has the unique attribute in the schema

ghaskins15:09:09

insert works fine, but upsert fails, and I don’t understand why

ghaskins15:09:47

the problem seems to be in referencing the entity-id of the existing item

ghaskins15:09:06

(e.g. if I take the second add away, the upsert itself “works”)

ghaskins15:09:25

I just don’t know how to reference it in the subsequent add

ghaskins15:09:23

the error I get is:

java.util.concurrent.ExecutionException: java.lang.IllegalArgumentException: :db.error/not-a-data-function Unable to resolve data function: :db/id

ghaskins15:09:47

any guidance appreciated

marshall17:09:25

@ghaskins what’s the exact data you are transacting when you get that error? that’s usually caused by missing a seq wrapper around the tx-data

marshall17:09:06

i.e. (d/transact conn {:tx-data [your-data-here]}) instead of (d/transact conn {:tx-data your-data-here})

spieden18:09:09

is anyone aware of a library that enables something similar to pull expressions but for EDN data?

ghaskins19:09:27

@marshall in this case, it’s happening inside a txn-function, so it’s harder to get the exact set it’s using

ghaskins19:09:32

I’ll see if I can tease that out

ghaskins19:09:15

@marshall good call, that was it

👍 4
ghaskins19:09:19

the txn-fcn was returning a sequence in the initial case and a map (with one :db/id in it) for the update case
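
A rough sketch of that fix, with illustrative names only: make the transaction function return a vector of maps in both branches, so the result can always be passed straight through as tx-data.

(defn add-club-member
  "Illustrative transaction function: always returns a seq of tx maps."
  [db email club-eid]
  (if-some [pid (ffirst (d/q '[:find ?p :in $ ?email
                               :where [?p :person/email ?email]]
                             db email))]
    ;; update case: existing person; still return a sequence, not a bare map
    [{:db/id club-eid :club/member pid}]
    ;; insert case: new person, referenced by string tempid
    [{:db/id "new-person" :person/email email}
     {:db/id club-eid :club/member "new-person"}]))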

cjmurphy21:09:08

@spieden and perhaps https://github.com/wilkerlucio/pathom, although it's probably overkill for parsing just EDN.
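
If a full graph-query library is more than needed, a pull-like selector over plain maps can be sketched in a few lines; this is an illustration only, and the pattern syntax here just loosely mirrors Datomic pull:

(defn pull-edn
  "Select keys from plain EDN data according to a pull-style pattern:
   pattern elements are keywords or {key sub-pattern} maps."
  [data pattern]
  (reduce (fn [out p]
            (cond
              (keyword? p) (assoc out p (get data p))
              (map? p)     (reduce-kv (fn [o k sub]
                                        (let [v (get data k)]
                                          (assoc o k (if (sequential? v)
                                                       (mapv #(pull-edn % sub) v)
                                                       (pull-edn v sub)))))
                                      out p)
              :else out))
          {} pattern))

(pull-edn {:person/name "Ada" :person/friends [{:person/name "Alan" :person/age 41}]}
          [:person/name {:person/friends [:person/name]}])
;; => {:person/name "Ada", :person/friends [{:person/name "Alan"}]}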

grzm23:09:36

I'm looking at upgrading Datomic cloud and have a couple of questions: (a) I'm unsure of whether a system has been upgraded before, so I'm trying to determine whether I should follow the First Upgrade path. (b) If it is a First Upgrade, it tears down the stack and creates a new one with "Reuse existing storage on create" set to True. Both "Upgrade" and this setting imply that existing data will be preserved. Is this correct?

stuarthalloway02:09:31

hi @U0EHU1800 It is always safe to do a first upgrade

stuarthalloway02:09:18

Any and all upgrades are data safe.

grzm04:09:14

Thanks for confirming @U072WS7PE . Perhaps it's just me being overly cautious. Is that explicitly documented somewhere? If not, I know I would have found it helpful.

stuarthalloway11:09:29

it is documented (but not highlighted) at https://docs.datomic.com/cloud/operation/deleting.html. I will improve it.