Hi, anybody using Datomic for ETL? We have a low-throughput webapp, and we’d like to start collecting events for some kind of BI. We don’t need a fancy lambda architecture, just a collector (e.g. Kafka) and an ingestor. My only concern is that we want to do some heavy-ish aggregations and I’d like to do it straight from Datomic


@U0MQW27QB I do some ETL with Datomic. Often nothing more complex than a functional entry point that transforms and transacts. Do you have a particular concern?


yes, how it will handle aggregations


we basically want to do slice and dice, rollups and such on the fly - we’re pretty low volume, so if we don’t have to design a data warehouse that would be great


low volume meaning a couple of events a sec on peak, at most


so in fact, besides the etl I’d like to query datomic directly


@U0MQW27QB this seems totally fine. I would consider putting this stuff in its own db.
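For reference, a rollup of this kind can be written as a single Datalog query with aggregates. This is only a sketch: the attributes `:event/type` and `:event/amount` are hypothetical and not from this conversation, and the query is shown as plain data so it can be read without a running database.

```clojure
;; Assumed (hypothetical) schema: each event entity has
;; :event/type (keyword) and :event/amount (number).
;; Non-aggregated :find variables (?type) act as the grouping key;
;; (count ?e) and (sum ?amount) are computed per group.
(def rollup-by-type
  '[:find ?type (count ?e) (sum ?amount)
    :where
    [?e :event/type ?type]
    [?e :event/amount ?amount]])

;; Assuming `d` aliases the Datomic API and `db` is a database value,
;; it would be run roughly as:
;; (d/q rollup-by-type db)
```

Because `?e` appears in the `:find` clause, two events with the same type and amount still count as separate rows before aggregation.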


yeah we will have a source of truth in sql, then emit some events, and ingest it to datomic


backed by pg probably


thanks, will do a spike!


I also took a look at onyx but we’re low throughput so don’t need that right now


I only have 1 data point for StorageGetMsec, and I'm a little confused by that. Shouldn't it be triggered fairly regularly? Does Memcache have an effect on this metric?


I didn't understand the difference between storage and object, that clears it up, thanks!


question on ions: is it a good idea to handle asynchronous workloads on the Datomic cluster (with core.async)? the use case is: based on some update in the database, another HTTP API needs to be called. the process initiating the db update should not wait on the HTTP response of the other API. I think I can use vars or atoms to keep references to channels so the core.async machinery doesn't get garbage-collected. Are there any issues with this approach? Or is it better to hand off this functionality to a physical queue, with e.g. Lambdas processing the queue (which is quite a different level of operational complexity 🙂 )?


@U0539NJF7 I think you could do either. If your async work becomes compute intensive, it might interfere with our standard advice on scaling, since the async work would be invisible to our metrics


but you would have choices at that point, including scaling on your own metric, and (as always) isolating the workload on specific processes


Ok, makes sense, Stuart. Thanks!


Hi! Here at Magnet we’ve played a bit with the awesome Conformity migration library by @avey_q. However, we’ve had some issues with transactions getting reapplied unexpectedly, so we’ve developed a simplified, atomicity-compliant version of it. We haven’t explored Ragtime by @weavejester yet, but in case it’s useful for some of you, here it is.


hello… I’m having a strange (but probably simple PEBCAC) problem with unique identities. I have a transaction that includes something like:

[{:db/id pid :person/email email}
 {:db/id mid :club/member pid}]


where :person/email has the unique attribute in the schema


insert works fine, but upsert fails, and I don’t understand why


the problem seems to be in referencing the entity-id of the existing item


(e.g. if I take the second add away, the upsert itself “works”)


I just don’t know how to reference it in the subsequent add


the error I get is:

java.util.concurrent.ExecutionException: java.lang.IllegalArgumentException: :db.error/not-a-data-function Unable to resolve data function: :db/id


any guidance appreciated


@ghaskins what’s the exact data you are transacting when you get that error? that’s usually caused by missing a seq wrapper around the tx-data


i.e. (d/transact conn {:tx-data [your-data-here]}) instead of (d/transact conn {:tx-data your-data-here})


is anyone aware of a library that enables something similar to pull expressions but for EDN data?


@marshall in this case, it’s happening inside a txn-function, so it’s harder to get the exact set it’s using


I’ll see if I can tease that out


@marshall good call, that was it


the txn-fcn was returning a sequence in the initial case and a map (with one :db/id in it) for the update case
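One way to guard against this class of bug is to normalize whatever the transaction function builds before returning it. The sketch below is plain Clojure with no Datomic dependency; `ensure-tx-seq` is a hypothetical helper name, not part of any Datomic API, and the keyword check for a single list form is a heuristic assumption.

```clojure
;; Hypothetical helper: coerce a tx-fn result into a vector of tx-data forms,
;; so the insert and update branches always return the same shape.
(defn ensure-tx-seq
  [result]
  (cond
    (nil? result) []
    ;; a single entity map, e.g. {:db/id 42 :person/email "..."}
    (map? result) [result]
    ;; a single list form such as [:db/add e a v] (heuristic: starts with a keyword)
    (and (sequential? result) (keyword? (first result))) [result]
    ;; already a collection of tx-data forms
    (sequential? result) (vec result)
    :else (throw (ex-info "Unexpected tx-fn result" {:result result}))))

(ensure-tx-seq {:db/id 42 :person/email "a@example.com"})
;; => [{:db/id 42, :person/email "a@example.com"}]
```

Wrapping the last expression of the transaction function in a call like this keeps both the insert and the upsert path returning a collection.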


@spieden and perhaps, although it's probably overkill for parsing just EDN.


I'm looking at upgrading Datomic Cloud and have a couple of questions: (a) I'm unsure whether a system has been upgraded before, so I'm trying to determine whether I should follow the First Upgrade path. (b) If it is a First Upgrade, it tears down the stack and creates a new one with "Reuse existing storage on create" set to True. Both "Upgrade" and this setting imply that existing data will be preserved. Is this correct?


hi @U0EHU1800 It is always safe to do a first upgrade


Any and all upgrades are data safe.


Thanks for confirming @U072WS7PE . Perhaps it's just me being overly cautious. Is that explicitly documented somewhere? If not, I know I would have found it helpful.


it is documented (but not highlighted). I will improve it.