#datomic
2021-10-13
helios09:10:44

We're trying out Datomic Analytics (Presto) and are puzzled by something. Our setup:
• The DB has a few tens of millions of datoms
• We're running on a machine with 64GB RAM and 16 cores
• Transactor is configured with 6GB RAM
• Peer server is configured with 30GB RAM
• Presto is running with 20GB RAM
We have 4 "tables" in our metaschema. We have attributes of type ref that in SQL-world can be used to perform joins.
• When we run a simple lookup using a Datomic query (attribute value) it's of course instantaneous (both with the Datomic API and the client API against the peer server).
• The same query on Presto (again, just a lookup by column value) takes 5 seconds.
Performing a more complex query that "joins" in Datomic on this attribute is also instantaneous, while the SQL equivalent takes around 4 minutes. With logging enabled, it looks like Presto is scanning the whole table rather than relying on indices. What are we doing wrong? Is there anything we can do to make it rely on indices? We expected it to be slower than Datomic, but this much slower makes it unusable.

stuarthalloway12:10:50

Some general points:
• Presto is ill-suited for low latency lookups of a single object, so that will never be competitive.
• There is likely an inflection point where Presto will win, at a much bigger database size than what you are describing.
• We are aware of substantial opportunities to improve performance on queries, so this will get better in a future release.
All that said, we would be happy to learn more about your use case and see if there are ways to make it faster today.

helios05:10:09

Thank you @U072WS7PE 🙂 What do you need to learn more about our use case?

helios08:10:27

We're building a POC with Datomic for a customer. Their existing BI tools rely on SQL, and they also use SQL for manual queries, so we're investigating Datomic Analytics so that the customer can evaluate it.

jaret13:10:51

@U0AD3JSHL You can share these details with me via support ticket (<mailto:[email protected]|[email protected]>), but we would be interested in where specifically you encounter performance issues and what your specific POC business requirements are. With Presto/Trino clusters we have found we can tune performance well (for example, by adding more nodes for parallel work). As Stu said, we are aware there are substantial opportunities to improve performance of queries. In terms of getting faster today, one thing we have had great success with is the Trino memory connector: https://trino.io/docs/current/connector/memory.html. It allows you to create a virtualized result set: a snapshot of the data as of when you issued the query, held in memory on the machine. You can virtualize select * queries or queries for specific columns. This process can be run in a loop to be nearly current (i.e. stale by whatever the execution time is). Then you can point your queries at this result set for best performance.
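
A minimal sketch of the snapshot loop described above, assuming the memory connector is mounted as a catalog named `memory` with its `default` schema. The source table name `datomic.mydb.orders`, the snapshot name, and the `execute` callable are all hypothetical stand-ins; in practice `execute` would be whatever runs one SQL statement against your Presto/Trino cluster. The code only builds and issues a `DROP TABLE IF EXISTS` followed by a `CREATE TABLE ... AS SELECT`; it is not taken from the Datomic or Trino documentation.

```python
from typing import Callable, Iterable, List, Optional


def snapshot_statements(
    source_table: str,
    snapshot_table: str,
    columns: Optional[Iterable[str]] = None,
) -> List[str]:
    """Build the SQL that replaces an in-memory snapshot of source_table.

    All table and column names are supplied by the caller; nothing here is
    specific to a real schema.
    """
    selected = list(columns) if columns is not None else []
    # No columns given means "virtualize a select * query".
    column_list = ", ".join(selected) if selected else "*"
    return [
        f"DROP TABLE IF EXISTS {snapshot_table}",
        f"CREATE TABLE {snapshot_table} AS SELECT {column_list} FROM {source_table}",
    ]


def refresh_snapshot(
    execute: Callable[[str], None],
    source_table: str,
    snapshot_table: str,
    columns: Optional[Iterable[str]] = None,
) -> None:
    """Run one refresh by passing each statement to the caller's execute()."""
    for statement in snapshot_statements(source_table, snapshot_table, columns):
        execute(statement)


if __name__ == "__main__":
    # Stand-in for a real SQL client: just print what would be sent.
    refresh_snapshot(
        print,
        "datomic.mydb.orders",             # hypothetical Datomic Analytics table
        "memory.default.orders_snapshot",  # hypothetical snapshot in the memory catalog
        ["id", "status"],
    )
```

Calling `refresh_snapshot` on a timer gives the "nearly current" behaviour: the snapshot is stale by roughly one refresh interval plus the execution time. With this drop-then-create approach, queries issued between the two statements would not find the table, so a real setup might prefer alternating between two snapshot names.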

octahedrion14:10:01

Can I use dev-local with analytics?