#datomic
2022-07-06
Prashant06:07:53

Hi, I have started experimenting with Datomic Analytics. The intention is to use the bundled presto server to run ad hoc report queries, e.g. the number of purchases where the time window is between x and y (time of purchase is stored as a fact). My transactor is running on-prem.
• I was curious where to put the https://clojurians.slack.com/archives/C03RZMDSH/p1579027580072600?thread_ts=1578947312.069100&cid=C03RZMDSH files?
  ◦ Do they have to be in the peer or the transactor?
• Does the presto server need to run on the transactor?
  ◦ Can I have a separate deployment of the presto server?
I would greatly appreciate it if anyone can nudge me to any tutorial/walkthrough for setting Datomic Analytics up, since the documentation is really thin. cc: @dazld

favila17:07:39

At a very high level, datomic analytics is a presto/trino installation with a datomic client api connector.

favila17:07:13

it is a separate process: you can run it anywhere that is network-connected to a peer server

favila17:07:14

> Metaschema files are .edn files in the datomic subdirectory of Trino’s etc-dir. Metaschema files can have any name you find convenient, and Datomic analytics will automatically associate metaschemas with any database that has matching attributes.

favila17:07:17

From the docs

favila17:07:19

So you need a “normal” datomic system (cloud or on-prem). If using on-prem you also need a peer-server running (it’s a peer process that provides the client api--cloud only provides the client api). Then you add datomic analytics (presto/trino) and point it at the thing that provides the client-api (peer-server for on-prem, the cloud service itself for cloud).

JohnJ21:07:34

probably worth mentioning that if datomic's datalog isn't a barrier then setting up presto is just pure overhead

favila21:07:42

presto can handle much larger queries than datomic’s current datalog implementation, and it has much richer aggregation options

favila21:07:29

the connector uses memory-efficient divide-and-conquer strategies (using undocumented functions that partition attribute indexes into ranges) so that the intermediate result sets in datalog don’t OOM. An equivalent naive datalog query can easily just take too much memory to complete
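The divide-and-conquer idea can be sketched in a few lines. This is only a toy model in Python, not the connector's actual code: `index` is a hypothetical in-memory stand-in for a sorted attribute index of (entity, value) pairs, and `partition_ranges` and `count_by_value_partitioned` are made-up names. The point is that each range is aggregated on its own, so only one slice plus the small running totals is held at a time.

```python
from collections import Counter


def partition_ranges(sorted_items, n_parts):
    """Split a sorted sequence into at most n_parts contiguous (start, end) ranges."""
    size = max(1, -(-len(sorted_items) // n_parts))  # ceiling division
    return [(i, min(i + size, len(sorted_items)))
            for i in range(0, len(sorted_items), size)]


def count_by_value_partitioned(index, n_parts=4):
    """Count occurrences of each value, one index range at a time."""
    total = Counter()
    for start, end in partition_ranges(index, n_parts):
        # Only this slice and the running totals are live at once.
        partial = Counter(value for _entity, value in index[start:end])
        total.update(partial)
    return total


# Hypothetical toy index: entity ids 0..9 with a value in {0, 1, 2}.
index = sorted((e, e % 3) for e in range(10))
counts = count_by_value_partitioned(index, n_parts=3)
```

The ranges could also be handed to separate workers, which is where the parallelism would come from in a real system.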

favila21:07:15

of course the connector is implemented with the client api so yes, in theory you are right. In practice however, analytics can handle queries with much bigger intermediate result sets using less memory, and often faster wallclock time because of parallelism and reduced memory pressure

JohnJ21:07:55

Interesting, I don't remember seeing anything about performance/efficiency of the connector in the docs (maybe it is included now). So yeah, since it requires the peer server I assumed that's where the bottleneck would be (and maybe storage, depending on what one uses)

favila21:07:33

The peer server can still be a bottleneck.

favila21:07:52

but the real bottleneck is that the datomic query engine is not that smart

JohnJ21:07:29

got it, do you use it in production for non-analytics?

favila21:07:24

we use it for non-analytics, but not in production

favila21:07:07

well, maybe it’s considered analytics uses. We’re not doing it for business purposes but for schema maintenance, checking cardinality, counting, etc

favila21:07:19

data integrity, histograms

favila21:07:21

that kind of stuff

favila21:07:37

anything that isn’t a selective query

favila21:07:08

the equivalent datalog usually doesn’t work at all. d/datoms or index-pull can often do it, but it’s much more thinking and typing

favila21:07:25

so I hold my nose and type the SQL
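For a sense of the kind of non-selective SQL meant here, a minimal sketch follows. It uses Python's built-in sqlite3 purely as a stand-in for Trino so that it runs anywhere; the `purchase` table and its `id` and `status` columns are hypothetical, and in Datomic analytics the tables and columns would come from the metaschema instead.

```python
import sqlite3

# In-memory stand-in database; Trino is NOT involved in this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE purchase (id INTEGER, status TEXT)")
conn.executemany(
    "INSERT INTO purchase VALUES (?, ?)",
    [(1, "paid"), (2, "paid"), (3, "refunded"), (4, None)],
)

# Histogram: how many rows per status value.
histogram = conn.execute(
    "SELECT status, COUNT(*) FROM purchase "
    "WHERE status IS NOT NULL GROUP BY status ORDER BY status"
).fetchall()

# Invariant check: every purchase should have a status.
missing_status = conn.execute(
    "SELECT COUNT(*) FROM purchase WHERE status IS NULL"
).fetchone()[0]
```

Both queries touch every row, which is the "anything that isn't a selective query" case described above.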

JohnJ21:07:47

datomic can become a heavy operational burden

JohnJ21:07:43

maybe they will ease it with cloud and provide a pre-set-up trino, but it is still another process to monitor

JohnJ21:07:23

by data integrity you mean using trino to check for corrupt data?

favila21:07:48

checking invariants

Prashant06:07:53

Thanks a ton @U09R86PA4 and @U01KZDMJ411. One question though: the metaschema edn files need to be in datomic-pro-<version>/presto-server/etc/datomic/, right?