This page is not created by, affiliated with, or supported by Slack Technologies, Inc.
2020-07-02
Channels
- # beginners (130)
- # calva (3)
- # cider (13)
- # circleci (1)
- # cljsrn (19)
- # clojure (106)
- # clojure-europe (10)
- # clojure-losangeles (1)
- # clojure-nl (9)
- # clojure-uk (33)
- # clojurescript (16)
- # code-reviews (24)
- # conjure (11)
- # cursive (41)
- # data-science (9)
- # datomic (63)
- # fulcro (19)
- # graphql (12)
- # helix (4)
- # kaocha (2)
- # leiningen (3)
- # malli (8)
- # meander (1)
- # off-topic (17)
- # re-frame (16)
- # reitit (12)
- # sci (32)
- # shadow-cljs (73)
- # spacemacs (31)
- # sql (38)
- # tools-deps (26)
- # xtdb (28)
Hi there. I have a ~3 billion datom database in Datomic Cloud, and the need to add an AVET index seems more than reasonable.
Will it index the existing datoms? If so, would it be harmful from an operational standpoint?
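For reference (not from the thread, and hedged): as I understand it, Datomic Cloud maintains AVET for all attributes automatically, so no schema change should be needed there; on on-prem Datomic, AVET coverage for an existing attribute is enabled with a schema alteration, and existing datoms are indexed by a subsequent indexing job. A minimal on-prem sketch, assuming an attribute named :merchant-id and a peer connection conn:

```clojure
;; On-prem peer API sketch: turn on AVET indexing for an existing attribute.
;; Existing datoms are picked up by the next background indexing job.
@(d/transact conn [{:db/ident :merchant-id
                    :db/index true}])

;; Optionally request an indexing job right away instead of waiting:
(d/request-index conn)
```

Treat this as a sketch to verify against the schema-alteration docs for your Datomic version.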
Hi, we are using Datomic Cloud, executing queries against a database with approximately 3 billion datoms, but a trivial query is taking a long time to return, or not returning at all, raising a timeout exception. With the query below we are trying to return all transactions from a merchant in a range of time; it should be super trivial:
(d/q {:query '[:find (pull ?entity [*])
               :in $
               :where
               [?entity :merchant-id "beb9c7db-a7eb-4e56-8c4e-4db195566562"]
               [?entity :transaction-time ?transaction-time]
               [?transaction-time :utc-time ?transaction-time-utc]
               [(>= ?transaction-time-utc #inst"2020-06-01T03:00:00.000-00:00")]
               [(<= ?transaction-time-utc #inst"2020-06-30T03:00:00.000-00:00")]]
      :args [(d/db (:conn client))]
      :timeout 50000})
We are running in query groups on i3.xlarge (with 30.5 GB RAM), and wondering whether we need to increase the size of these machines.
Can someone with more experience throw some light on this?
d/query with pull inside tends to be redundant sometimes; I think that d/pull will get better performance =)
Using pull in query should have the same performance characteristics as a query followed by a pull, except that in the client/cloud case, using pull in query saves a round trip/wire cost.
it's datomic cloud w/o ions
I think d/pull will help in this case
you should test the time it takes to query just for the entity IDs and how long it takes to pull the attributes of interest
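The decomposition suggested above can be sketched as two timed steps (a minimal sketch, reusing the client, attribute names, and merchant-id from the thread):

```clojure
;; Step 1: time the query that returns only entity IDs.
(time
 (def eids
   (map first
        (d/q {:query '[:find ?entity
                       :in $ ?merchant-id
                       :where [?entity :merchant-id ?merchant-id]]
              :args [(d/db (:conn client))
                     "beb9c7db-a7eb-4e56-8c4e-4db195566562"]}))))

;; Step 2: separately time pulling the attributes of interest for those IDs.
(time
 (def entities
   (doall (map #(d/pull (d/db (:conn client)) '[*] %) eids))))
```

Comparing the two elapsed times shows whether the query or the pull dominates; note that pulling one entity per call adds a round trip each in client/cloud, so this is a diagnostic, not a production pattern.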

Can it be a problem to use this without Ions, with too many entities, on Cloud?
Hi @U05120CBV, these are the queries and execution times:
First: pull + 30-day interval = 110578.733438 msecs (14,054 results)
Second: entity IDs only + 30-day interval = 22990.008083 msecs (14,054 results)
(time
 (def pull-entities
   (d/q {:query '[:find (pull ?entity [*])
                  :in $ ?merchant-id ?transaction-time-start ?transaction-time-end
                  :where
                  [?entity :merchant-id ?merchant-id]
                  [?entity :transaction-time ?transaction-time]
                  [?transaction-time :utc-time ?transaction-time-utc]
                  [(>= ?transaction-time-utc ?transaction-time-start)]
                  [(<= ?transaction-time-utc ?transaction-time-end)]]
         :args [(d/db (:conn client))
                "beb9c7db-a7eb-4e56-8c4e-4db195566562"
                #inst"2020-06-01T03:00:00.000-00:00"
                #inst"2020-06-30T03:00:00.000-00:00"]
         :timeout 50000})))
"Elapsed time: 110578.733438 msecs"
=> #'pgo.commons.datomic-test/pull-entities
(count pull-entities)
=> 14054
(time
 (def pull-entities
   (d/q {:query '[:find ?entity
                  :in $ ?merchant-id ?transaction-time-start ?transaction-time-end
                  :where
                  [?entity :merchant-id ?merchant-id]
                  [?entity :transaction-time ?transaction-time]
                  [?transaction-time :utc-time ?transaction-time-utc]
                  [(>= ?transaction-time-utc ?transaction-time-start)]
                  [(<= ?transaction-time-utc ?transaction-time-end)]]
         :args [(d/db (:conn client))
                "beb9c7db-a7eb-4e56-8c4e-4db195566562"
                #inst"2020-06-01T03:00:00.000-00:00"
                #inst"2020-06-30T03:00:00.000-00:00"]
         :timeout 50000})))
"Elapsed time: 22990.008083 msecs"
=> #'pgo.commons.datomic-test/pull-entities
(count pull-entities)
=> 14054
And one more case, without the time filter:
Third: entity IDs only, no interval = 3768.019134 msecs (17,670 results)
(time
 (def pull-entities
   (d/q {:query '[:find ?entity
                  :in $ ?merchant-id
                  :where
                  [?entity :merchant-id ?merchant-id]]
         :args [(d/db (:conn client))
                "beb9c7db-a7eb-4e56-8c4e-4db19556656"]
         :timeout 50000})))
"Elapsed time: 3768.019134 msecs"
=> #'pgo.commons.datomic-test/pull-entities
(count pull-entities)
=> 17670
Where are you located, which AWS_REGION are your non-ion machines located, and which AWS_REGION is your datomic cloud cluster deployed in?
For these examples, I connected using datomic-cli from São Paulo to the cluster in us-east-1. In the production environment, there is a VPC endpoint connecting our applications to Datomic in us-east-1.
So if I understand, in prod your Datomic cluster is in us-east-1, and your applications connect to it from which AWS_REGION? Where are the machines themselves?
In prod, both in us-east-1
Yes, I do. But I couldn’t improve it any more than that (I got a better result just passing the merchant-id; I don’t know a better way to search on this date ref).
[?entity :merchant-id ?merchant-id]
[?entity :transaction-time ?transaction-time]
[?transaction-time :utc-time ?transaction-time-utc]
[(>= ?transaction-time-utc ?transaction-time-start)]
[(<= ?transaction-time-utc ?transaction-time-end)]
@U05120CBV we noticed that what is hurting query performance is the set of clauses related to time.
Hmm, good idea @U0CJ19XAM cc: @U016FDZFA2X
@U0CJ19XAM in this way?
(d/q {:query '[:find ?entity
               :in $ ?merchant-id ^java.util.Date ?transaction-time-start ^java.util.Date ?transaction-time-end
               :where
               [?entity :merchant-id ?merchant-id]
               [?entity :transaction-time ?transaction-time]
               [?transaction-time :utc-time ?transaction-time-utc]
               [(>= ?transaction-time-utc ?transaction-time-start)]
               [(<= ?transaction-time-utc ?transaction-time-end)]]
      :args [(d/db (:conn client))
             "beb9c7db-a7eb-4e56-8c4e-4db195566562"
             #inst"2020-06-01T03:00:00.000-00:00"
             #inst"2020-06-30T03:00:00.000-00:00"]
      :timeout 50000})
Although it may not make a difference, because you are using the custom comparators <= and >=.
@U016FDZFA2X what are the schema definitions for all the attributes in the query?
It’s a lot of time to return the result of such a trivial query; there must be something we can do to decrease it.
@marciol Have you separately tested the time it takes to query for the entity IDs and to pull the attributes from them?
@U05120CBV we are going to do all this to get a more fine grained overview of what is happening.
@marciol @schultzkaue https://github.com/cognitect-labs/day-of-datomic-cloud/blob/master/tutorial/decomposing_a_query.clj Have you worked through the decomposing-a-query tutorial?
What is the idiomatic way to answer the question "when was this attribute last changed"?
Bind ?tx (the fourth component of the datom) in a clause, then join to that ?tx's :db/txInstant.
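The suggestion above can be sketched as a query (a minimal sketch; eid and :my/attr are hypothetical placeholders for the entity and attribute of interest):

```clojure
;; "When was this attribute last changed?" — bind the tx, join :db/txInstant.
(d/q '[:find (max ?when)
       :in $ ?e ?a
       :where
       [?e ?a _ ?tx]                ; 4th component of the datom is the tx
       [?tx :db/txInstant ?when]]   ; the transaction's wall-clock time
     db eid :my/attr)
```

Against the current db this gives the tx time of the current assertion; passing (d/history db) instead would also see retractions, depending on what "last changed" should mean for your data.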

Then how do I query for something like the top 10 entities with respect to an attribute? Do I have to query for all of them and then do client-side sorting?
Take a look at d/seek-datoms and d/index-range on the peer API, and index-seek and index-pull on the client API.
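A peer-API sketch of the index-range idea (assumes a hypothetical :score attribute that is AVET-indexed, i.e. :db/index true on on-prem):

```clojure
;; Datoms for :score in ascending AVET order; nil bounds = whole range.
(def in-order (seq (d/index-range db :score nil nil)))

;; The lowest 10 values come straight off the ascending index:
(take 10 in-order)

;; "Top 10" (highest) still needs a reverse, which realizes the seq,
;; so narrowing the range with start/end bounds first helps on big attributes.
(take 10 (reverse in-order))
```

The index walk avoids a full query, but descending order is not free; for very large attributes, seeking from a known lower bound keeps the realized portion small.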
you could also try abusing nested queries a bit. You can call normal clojure code inside a query, so you could have an inner query that gets all results, and an outer query that sorts and limits them
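The inner-query-then-sort idea can be sketched like this (hypothetical :score attribute; the query gathers all pairs, and ordinary Clojure code sorts and limits them):

```clojure
;; Inner query returns every [entity score] pair; the outer Clojure code
;; does the sort + limit, since Datalog itself has no order-by/limit here.
(->> (d/q '[:find ?e ?score
            :where [?e :score ?score]]
          db)
     (sort-by second >)   ; highest score first
     (take 10))
```

This realizes the full result set in memory before sorting, which is exactly the trade-off being discussed: fine for thousands of entities, painful for millions.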
I guess by nested queries you mean something like this, right? https://docs.datomic.com/cloud/query/query-data-reference.html#q
Or an unspecified nested data structure consisting of maps, vectors, and any of the supported basic types as leaves (i.e., a JSON-like structure)
@zhuxun2 Datomic is not intended for storing LOBs. You should avoid putting large objects directly in Datomic. Either split them into individual facts (datoms), or store the LOBs somewhere else (DDB, S3, etc.) and store a reference to them in Datomic.
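The store-a-reference pattern might look like this (hypothetical attribute names; the large payload itself lives in S3, and Datomic keeps only the pointer):

```clojure
;; Schema: a string attribute that points at the externally stored object.
(def schema
  [{:db/ident       :document/s3-key
    :db/valueType   :db.type/string
    :db/cardinality :db.cardinality/one
    :db/doc         "S3 object key where the large payload is stored"}])

(d/transact conn {:tx-data schema})

;; Transact a fact referencing the LOB instead of embedding it:
(d/transact conn {:tx-data [{:document/s3-key "my-bucket/documents/abc123"}]})
```

Reads then fetch the object from S3 using the stored key, keeping datom values small.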
@U05120CBV Do you have a rule of thumb for when a string becomes too big and should be considered a LOB?