Fork me on GitHub
#datomic
<
2020-01-29
>
Ignas06:01:06

Hey, I have a datomic query performance question. We are trying to get all entities of a certain type (transactions) that have changed in the last 5 minutes. We are currently using the default "now" db to get all the transactions and then "since" db to shave off the past. The query works but gets exponentially slower as data grows. Maybe there is a better way write this query?

(def since-5min (d/since db #inst "2020-01-28T15:45"))

(d/q '[:find [e? ...]
       :in $ $since
       :where [$ ?e :transaction/status]
       (not [?e :transaction/type :transaction.type/rejected]))
       [$since ?e]]
  db since-5min)

favila07:01:15

Five minutes is not very many transactions. Maybe just look at the transaction log directly?

maxt07:01:52

Is there a difference if you put the [$since ?e] part first?

Ignas08:01:26

yes, I won't get the ones that changed in the last 5 minutes but did not change the :transaction/status property

Ignas08:01:18

for the transactions logs, can we use both :db/txInstant to filter for the last 5 minutes and filter by :transaction/status in the same query?

Ignas08:01:48

in the way that would include also the transactions that have changed but on some other property than :transaction/status

ghadi12:01:02

Why not include txInstant directly in a query clause?

favila14:01:34

What I was thinking was something like this:

favila14:01:35

(d/q '[:find [?e ...]
       :in $ ?log ?from-t ?to-t
       :where
       [(tx-ids ?log ?from-t ?to-t) [?tx ...]]
       [(tx-data ?log ?tx) [[?e ?a ?v _ ?op]]]
       [?e :transaction/status]
       (not [?e :transaction/type :transaction.type/rejected])]
     (d/as-of (d/db conn) #inst "2020-01-28T15:50")
     (d/log conn)
     #inst "2020-01-28T15:45"
     #inst "2020-01-28T15:50")

favila14:01:34

Look at everything that happened in the last five minutes; if you see any datoms in the tx log whose entity now (currently) has a transaction status and no rejected transaction type, you know that transaction entity changed

favila14:01:43

its mere presence in the log means it changed

favila14:01:37

this strategy might not make sense for longer time periods or higher transaction loads because it depends on transaction time being the most selective thing available

favila14:01:06

(but for 5 minutes, it probably is)

Ignas14:01:39

thanks you! the query seems to do exactly what I want. I'll now try to benchmark it in different load scenarios thanks2

leongrapenthin13:01:53

Is there a faster alternative to developing against Datomic Cloud over using the Bastion/Tunnel? For my development purposes, I sometimes need to run about a 100 queries in a batch which is super slow. I don't need help with best practices etc. as I use Datomic for 5 years now. Point is, when executed within Datomic Cloud this takes milliseconds. But locally it can take half a minute which breaks any kind of dynamic/interactive flow during dev. Right now I'm thinking of using VNC to develop on a remote appliance within AWS.

dharrigan13:01:16

WOuld you consider running wireguard or zerotier on the server? then you can connect directly to the server, skipping ssh 🙂 (the connection is encrypted and secure on either wireguard and/or zerotier)

ghadi13:01:15

what makes you think the bastion vs. wireguard is going to make a difference?

ghadi13:01:05

there's probably inherent latency from @leongrapenthin’s laptop to the VPC

dharrigan13:01:13

Well, with bastion you're jumping directly from a machine in the middle to the actual target server, so that's an additional network hop

dharrigan13:01:20

wireguard/zerotier is direct peering

ghadi13:01:07

the latency from bastion -> query group is far less than the latency from laptop -> bastion

dharrigan13:01:31

perhaps - I don't know - maybe - but if one can remove a network hop, then it's good

ghadi13:01:02

wireguard is great, but it can't make packets get inside the VPC faster

maxt14:01:16

Cloud running locally would be great, like datomic-memdb but supporting tx functions and tuples. I'm also annoyed by queries that run fast in the cloud but very slow when developing.

augustl14:01:18

when will we get apt-get install aws-local? 😄

simple_smile 4
dharrigan14:01:32

yay -S t'interwebs-local

leongrapenthin15:01:11

@ghadi I'm pretty sure the performance distance is mainly due to a Datomic Ion having more "Datomic Peer like" querying performance with segment caching and whatnot, whereas the client via bastion is a flat http client.

Joe Lane15:01:02

I don't think that was the argument he was trying to make.