This page is not created by, affiliated with, or supported by Slack Technologies, Inc.
2022-07-06
Hi, I have started experimenting with Datomic Analytics. My intention is to use the bundled Presto server to run ad-hoc report queries, e.g. the number of purchases where the time of purchase falls between x and y (time of purchase is stored as a fact). My transactor is running on-prem. • I was curious where to put the https://clojurians.slack.com/archives/C03RZMDSH/p1579027580072600?thread_ts=1578947312.069100&cid=C03RZMDSH files? ◦ Do they have to be on the peer or the transactor? • Does the Presto server need to run on the transactor? ◦ Can I have a separate deployment of the Presto server? I would greatly appreciate it if anyone could nudge me toward a tutorial/walkthrough for setting up Datomic Analytics, since the documentation is really thin. cc: @dazld
This is how to configure it: Cloud: https://docs.datomic.com/cloud/analytics/analytics-configuring.html On-Prem: https://docs.datomic.com/on-prem/analytics/analytics-configuring.html
At a very high level, datomic analytics is a presto/trino installation with a datomic client api connector.
it is a separate process: you can run it anywhere that is network-connected to a peer server
> Metaschema files are .edn files in the datomic subdirectory of Trino’s etc-dir. Metaschema files can have any name you find convenient, and Datomic analytics will automatically associate metaschemas with any database that has matching attributes.
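Concretely, the placement described above looks something like this (the metaschema file name is arbitrary, and the exact surrounding files depend on your Presto/Trino setup):

```text
presto-server/
└── etc/                       # Trino/Presto etc-dir
    ├── config.properties      # usual Presto server config
    └── datomic/               # metaschema subdirectory
        └── my-metaschema.edn  # any name works
```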
So you need a “normal” datomic system (cloud or on-prem). If using on-prem you also need a peer-server running (it’s a peer process that provides the client api--cloud only provides the client api). Then you add datomic analytics (presto/trino) and point it at the thing that provides the client-api (peer-server for on-prem, the cloud service itself for cloud).
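For the on-prem path, launching the peer server looks roughly like this (the access key, secret, database name, and connection URI below are all placeholders; check the Datomic docs for the exact flags for your storage):

```shell
# Run from the datomic-pro directory; all values are placeholders
bin/run -m datomic.peer-server \
  -h localhost -p 8998 \
  -a myaccesskey,mysecret \
  -d mydb,datomic:dev://localhost:4334/mydb
```

Datomic analytics would then be pointed at that peer server's host and port.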
probably worth mentioning that if datomic's datalog isn't a barrier then setting up presto is just pure overhead
presto can handle much larger queries than datomic’s current datalog implementation, and it has much richer aggregation options
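To illustrate the kind of aggregation this enables, a query for the original use case might look like this in Presto SQL (the catalog, table, and column names are hypothetical; they come from your metaschema mapping):

```sql
-- Count purchases in a time window; all names are illustrative
SELECT count(*) AS purchases
FROM datomic.mydb.purchase
WHERE time BETWEEN TIMESTAMP '2022-01-01' AND TIMESTAMP '2022-06-30';
```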
the connector uses memory-efficient divide-and-conquer strategies (using undocumented functions that partition attribute indexes into ranges) so that the intermediate result sets in datalog don’t OOM. An equivalent naive datalog query can easily just take too much memory to complete
of course the connector is implemented with the client api so yes, in theory you are right. In practice however, analytics can handle queries with much bigger intermediate result sets using less memory, and often faster wallclock time because of parallelism and reduced memory pressure
Interesting, I don't remember seeing anything about performance/efficiency of the connector in the docs (maybe it's included now). So yeah, since it requires the peer server, I assumed that's where the bottleneck would be (and maybe storage, depending on what one uses)
well, maybe that’s considered an analytics use. We’re not doing it for business purposes but for schema maintenance, checking cardinality, counting, etc
the equivalent datalog usually doesn’t work at all. d/datoms or index-pull can often do it, but it’s much more thinking and typing
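For example, the d/datoms-style approach mentioned here might look like this with the peer API (`:purchase/time` is a hypothetical attribute carried over from the original question; this is a sketch, not a tested implementation):

```clojure
;; Sketch: count purchases in a time window by walking the AVET index
;; lazily, instead of running a datalog query that would materialize
;; every matching entity in memory. :purchase/time is a hypothetical
;; attribute name from the question above.
(require '[datomic.api :as d])

(defn count-purchases-between
  [db start end]
  ;; d/index-range returns a lazy seq of datoms from the AVET index
  ;; whose values fall between start and end
  (count (d/index-range db :purchase/time start end)))
```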
maybe they will ease it with cloud and provide a pre-set-up trino, but it's still another process to monitor
Thanks a ton @U09R86PA4 and @U01KZDMJ411.
One question though: the metaschema edn files need to be in datomic-pro-<version>/presto-server/etc/datomic/, right?