#datomic
2023-09-13
Chip02:09:59

AWS recommends a separate account when doing Amplify work. Has anyone found this necessary for Datomic/Ion-based systems? Seen any Amplify provisioning collisions?

Joe Lane13:09:44

Pure curiosity on my part, do you have a link to the docs where AWS makes that recommendation?

Chip13:09:33

I knew someone would ask. I went looking for it and couldn’t put my hands on it. I’ll double down.

Joe Lane13:09:01

Don’t try too hard on my behalf, but if you find it, send it on over.

Chip13:09:04

I went looking for myself just to confirm my understanding. I’ll be in touch.

Chip14:09:30

I haven’t found it yet. If I didn’t see it, the hallucination was strong enough for me to create a separate account. If I trip over it again (while I’m tripping over myself), I’ll pipe up.

👍 1
Daniel Jomphe16:09:01

I, too, don’t remember seeing such documentation. We have a few Datomic Cloud deployments in different AWS accounts (managed with AWS Control Tower), each with its own Amplify deployment for our frontend, in the same account as its Cloud Ion backend. No issues.

Chip16:09:28

That’s good news. Thank you.

🙂 1
Daniel Jomphe16:09:00

With that said, we didn’t invest a lot in Amplify-typical tooling. We used it mostly for its ease of use, so we didn’t feel the need to read much of their docs. So I must say I can see the possibility that, for people who go all-in on the entire Amplify frontend & backend stuff, it might be written some place that it’s recommended to do all that in its own AWS account. Since you’re most probably going the “Amplify frontend only” + “Datomic Cloud Ion backend” route, I don’t feel the advice would be very valuable.

Chip17:09:13

Yes, Datomic/Ion back end. As I see it, AppSync is the line. Everything from there forward being more the Amplify world. That’s my current thinking anyway. Thanks.

🙂 1
dazld13:09:10

I’m trying to find an entity o that is referenced by other entities of kind a, with at least one of each kind of a supplied. This query is crushing my machine, on quite paltry amounts of data. what am I doing wrong?

(d/q '[:find [?o ...]
       :in $ ?a ?b
       :where 
       [?ia :a/kind ?a]
       [?ib :a/kind ?b]
       [?ia :a/entity ?o]
       [?ib :a/entity ?o]]
     (d/db conn!)
     :a.kind/a
     :a.kind/b)

favila13:09:43

It will tell you row counts before and after each clause (among other things)

dazld13:09:30

thank you, will try. am I doing something dumb in the query?

favila13:09:45

depends on the shape of your data

favila13:09:55

relative cardinalities

favila13:09:42

e.g. maybe ?ia -> :a/entity -> ?o -> :a/_entity -> ?ib realizes fewer intermediate rows

👍 1
dazld13:09:11

(d/q '[:find ?o
       :in $ ?a ?b
       :where 
       [?ia :a/kind ?a]
       [?ib :a/kind ?b]
       [?ia :a/ref ?o]
       [?ib :a/ref ?o]]
     '[[a :a/kind :a]
       [b :a/kind :b]
       [c :a/kind :b]
       [a :a/ref f]
       [b :a/ref f]
       [c :a/ref g]]
     :a
     :b)
..gives me the right answer, for example, although yes, it’s just dumb data

favila13:09:42

nice thing about datalog is clause order doesn’t affect its correctness, just its performance

favila13:09:03

(in an ideal world it wouldn’t affect its performance either…)

dazld13:09:06

annoyingly, I can’t get this query to complete, so the stats aren’t coming out 😄

favila13:09:05

you can try adding a clause at a time

favila13:09:51

If :a/kind is high-cardinality, I suspect this will be faster

[?ia :a/kind ?a]
[?ia :a/entity ?o]
[?ib :a/entity ?o]
[?ib :a/kind ?b]

dazld13:09:28

kind is a keyword, and low cardinality, not a ref

dazld13:09:35

perhaps it should be a ref..

favila13:09:58

is it value-indexed?

dazld13:09:14

db/index is true yep

favila13:09:17

how many ?ia and ?ib do you expect?

favila13:09:27

1 + 1 or thousands?

dazld13:09:40

a few thousand of each, and trying to find the ?o that has one of each

favila13:09:13

so thousands is high cardinality, because after the first two clauses you will have ?ia * ?ib rows

dazld13:09:41

that is starting to sound like the problem!

dazld13:09:09

sounds like my mental model of the query isn’t quite right - I was thinking of each clause producing a set of values, and then the answer is the intersection of those clauses… ?

favila13:09:47

the intersection of rows sharing a value for a binding

👍 1
favila13:09:59

the first two clauses share no bindings, so it’s just unions

favila13:09:16

you’re just adding rows

favila13:09:46

this is the “join along” rule of thumb in https://docs.datomic.com/pro/best-practices.html#join-along

👍 1
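
To make the row-count argument concrete, here is a minimal sketch in plain Clojure (no Datomic involved). It evaluates clauses one at a time with a naive nested loop and reports how many rows exist after each clause. The evaluator, the helper names (`variable?`, `unify`, `step`, `row-counts`) and the generated data are all hypothetical and only illustrate the idea; Datomic's real query engine works differently.

```clojure
;; Toy data (an assumption for illustration): 30 entities of kind :a and
;; 30 of kind :b, each with one :a/ref. Exactly one referenced value (1029)
;; is shared between the two kinds.
(def n 30)

(def datoms
  (concat
   (for [i (range n)] [i :a/kind :a])
   (for [i (range n)] [(+ 100 i) :a/kind :b])
   (for [i (range n)] [i :a/ref (+ 1000 i)])
   (for [i (range n)] [(+ 100 i) :a/ref (+ 1000 (dec n) i)])))

(defn variable?
  "True for symbols such as ?ia."
  [x]
  (and (symbol? x) (= \? (first (name x)))))

(defn unify
  "Extends the binding map `row` so that `pattern` matches `datom`,
  or returns nil when they cannot match."
  [row pattern datom]
  (reduce (fn [row [p d]]
            (cond
              (and (variable? p) (contains? row p)) (if (= (row p) d) row (reduced nil))
              (variable? p) (assoc row p d)
              (= p d) row
              :else (reduced nil)))
          row
          (map vector pattern datom)))

(defn step
  "Joins the current rows against every datom for one clause."
  [rows clause]
  (for [row rows
        datom datoms
        :let [extended (unify row clause datom)]
        :when extended]
    extended))

(defn row-counts
  "Number of rows remaining after each clause, in clause order."
  [clauses]
  (mapv count (rest (reductions step [{}] clauses))))

;; First two clauses share no variable, so every ?ia pairs with every ?ib.
(row-counts '[[?ia :a/kind :a] [?ib :a/kind :b] [?ia :a/ref ?o] [?ib :a/ref ?o]])
;; => [30 900 900 1]

;; "Join along": each clause shares a variable with the one before it.
(row-counts '[[?ia :a/kind :a] [?ia :a/ref ?o] [?ib :a/ref ?o] [?ib :a/kind :b]])
;; => [30 30 31 1]
```

Both orderings end with the same single row, but the first one carries 900 intermediate rows where the second never exceeds 31. With thousands of entities per kind the same effect scales to millions of intermediate rows.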
dazld13:09:49

yup, sadly no good! I tried different combinations here, including your suggestions, thank you, and they are all timing out on thousands of rows. Doing the set intersection by hand, however, is milliseconds.
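
For reference, a set intersection done by hand could look roughly like the sketch below. It runs on the toy tuples from the earlier example rather than on a real database, and `refs-for-kind` and `shared-refs` are hypothetical helper names, not anything from Datomic.

```clojure
(require '[clojure.set :as set])

;; The toy data from the earlier message, as [e a v] tuples.
(def datoms
  '[[a :a/kind :a]
    [b :a/kind :b]
    [c :a/kind :b]
    [a :a/ref f]
    [b :a/ref f]
    [c :a/ref g]])

(defn refs-for-kind
  "Set of values referenced via :a/ref by entities whose :a/kind is `kind`."
  [datoms kind]
  (let [es (set (for [[e a v] datoms
                      :when (and (= a :a/kind) (= v kind))]
                  e))]
    (set (for [[e a v] datoms
               :when (and (= a :a/ref) (contains? es e))]
           v))))

(defn shared-refs
  "Values referenced by at least one entity of each given kind."
  [datoms & kinds]
  (apply set/intersection (map #(refs-for-kind datoms %) kinds)))

(shared-refs datoms :a :b)
;; => #{f}
```

Each kind is resolved to its own set first, so the work grows with the sum of the per-kind sizes rather than their product.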

favila13:09:40

how many clauses can you get through before timeout?

dazld14:09:05

as soon as I introduce the second ?ib it never returns

dazld14:09:38

[?ia :a/kind ?a]
[?ia :a/entity ?o]
[?ib :a/entity ?o]
[?ib :a/kind ?b]
I tried this ordering, and same thing, sadly

favila14:09:45

What if you add a [(!= ?ia ?ib)]

favila14:09:35

So this returns?

[?ia :a/kind ?a]
[?ia :a/entity ?o]
[?ib :a/entity ?o]
?

👍 1
favila14:09:41

how many rows at this point?

dazld14:09:51

yep, and ~6k

favila14:09:25

hm, and how many is [?ib :a/kind ?b] by itself?

dazld14:09:00

(sorry for pauses, repl has to restart on every bad query)

favila14:09:17

use d/query instead, that gives you a timeout

dazld14:09:47

so, there’s 50k of type a and 5k of type b

dazld14:09:07

oh, damn, ordering the other way around works - putting a first instead of b

dazld14:09:31

so, the kind with the most matches

dazld14:09:30

actually, no - your ordering and join along is the right solution

{:query [:find
         [?o ...]
         :in
         $
         ?a
         ?b
         :where
         [?ia :incident/kind ?a]
         [?ia :incident/organizations ?o]
         [?ib :incident/organizations ?o]
         [?ib :incident/kind ?b]],
 :phases [{:sched (([(ground $__in__3) ?b]
                    [(ground $__in__2) ?a]
                    [?ia :incident/kind ?a]
                    [?ia :incident/organizations ?o]
                    [?ib :incident/organizations ?o]
                    [?ib :incident/kind ?b])),
           :clauses [{:clause [(ground $__in__3) ?b],
                      :rows-in 0,
                      :rows-out 1,
                      :binds-in (),
                      :binds-out [?b],
                      :expansion 1}
                     {:clause [(ground $__in__2) ?a], :rows-in 1, :rows-out 1, :binds-in [?b], :binds-out [?a ?b]}
                     {:clause [?ia :incident/kind ?a],
                      :rows-in 1,
                      :rows-out 49947,
                      :binds-in [?a ?b],
                      :binds-out [?b ?ia],
                      :expansion 49946}
                     {:clause [?ia :incident/organizations ?o],
                      :rows-in 49947,
                      :rows-out 3560,
                      :binds-in [?b ?ia],
                      :binds-out [?o ?b]}
                     {:clause [?ib :incident/organizations ?o],
                      :rows-in 3560,
                      :rows-out 49411,
                      :binds-in [?o ?b],
                      :binds-out [?o ?ib ?b],
                      :expansion 45851}
                     {:clause [?ib :incident/kind ?b],
                      :rows-in 49411,
                      :rows-out 16,
                      :binds-in [?o ?ib ?b],
                      :binds-out [?o]}]}]}

dazld14:09:38

when it timed out before, the repl must have been in a bad state

favila14:09:52

{:clause [?ib :incident/organizations ?o],
                                    :rows-in 3560,
                                    :rows-out 49411,
                                    :binds-in [?o ?b],
                                    :binds-out [?o ?ib ?b],
                                    :expansion 45851}

favila14:09:59

This is the expansion you want to avoid

favila14:09:24

(if you can)

dazld14:09:01

thank you favila, this was really helpful

dazld14:09:13

the join along rule is really the solution here, afaict

favila14:09:33

The [(!= ?ia ?ib)] might help you avoid that expansion in particular

favila14:09:59

or if you know relative frequency of kinds, you can start from the least common

👍 1
dazld14:09:39

adding that rule makes the query timeout!

favila14:09:44

huh, I guess it makes it hold on to ?ia for an extra clause

favila14:09:26

kinda annoying, because most of those ?ib results are ?ia

dazld14:09:53

in all cases, where it doesn’t join along, then it blows up

dazld14:09:18

and then, ordering the inputs from lowest to highest cardinality makes it more efficient

favila14:09:38

yeah you always want your first clause to have the lowest row count possible

dazld14:09:49

awesome, thanks a ton!

favila16:09:13

I wonder if this would help

favila16:09:16

(defn other-edges-sharing-attr+v [db ^long e attr]
  (eduction
   (mapcat #(d/datoms db :vaet (:v %) (:a %)))
   (remove #(== e ^long (:e %)))
   (d/datoms db :aevt attr e)))

favila16:09:07

[?ia :incident/kind ?a]
[(other-edges-sharing-attr+v $ ?ia :incident/organizations) [[?ib _ ?o]]]
[?ib :incident/kind ?b]]

favila16:09:10

I guess a subquery could do the same thing

cch113:09:33

I’m electing to create a domain attribute to track “created-at” and “updated-at”. Ideally, I would like these values to default to :db/txInstant but I would even be happy with opt-in setting them to :db/txInstant. Is there a built-in way to access :db/txInstant or is it possible to access that value in a transaction function? My fallback would be to set both :db/txInstant and my domain values from the same wall clock time (in a transaction function to avoid inconsistencies with multiple clients).

favila20:09:25

I’m not aware of any way to get the txInstant value within a transaction

favila20:09:22

If a join is acceptable to you, you could make created-at and updated-at a reference to the tx instead

Chip21:09:05

That sounds ideal…unless I’m missing something.

favila17:09:36

it’s an extra join every time you want to read the time, so more IO

cch117:09:22

It’s also incompatible with the pull API, unfortunately.

favila17:09:18

huh? no it isn’t. (d/pull db [{:something/created-at [:db/txInstant]}] id)

cch117:09:43

Sorry… I am just now appreciating that your suggestion was to create a reference to the tx -not just a time value.

cch117:09:24

In cloud, I assume I can use the “usual” string representation of the tempid for the tx like this…

[{:my/created-by "datomic.tx"
  ...}]

cch117:09:59

(substituting created-by for created-at for obvious reasons)

frankitox20:09:43

I'm trying to get the latest transaction that modified an entity, is this the best approach? 🧵

frankitox20:09:50

(d/q '[:find (max ?tx) .
       :in $ ?e
       :where [?e _ _ ?tx _]]
     (d/history db)
     17592212457845)

yes 1
favila20:09:54

It’s really the only approach. It may be a little less memory-intensive to reduce over d/datoms :eavt, but it’s doing the same thing fundamentally
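
A reduce-based version might look like the sketch below. `latest-tx` is a hypothetical helper; it only assumes that each item supports `:tx` lookup, which Datomic datoms do, so it is exercised here with plain maps standing in for datoms.

```clojure
(defn latest-tx
  "Returns the largest :tx among `datoms`, or nil when there are none.
  Holds on to one number at a time instead of building a result set."
  [datoms]
  (reduce (fn [acc d]
            (let [tx (:tx d)]
              (if acc (max acc tx) tx)))
          nil
          datoms))

;; Plain maps standing in for datoms (made-up values):
(latest-tx [{:e 1 :tx 10} {:e 1 :tx 30} {:e 1 :tx 20}])
;; => 30

;; Against a real database the call would presumably be shaped like this,
;; assuming datomic.api is aliased as d and eid is the entity id:
;; (latest-tx (d/datoms (d/history db) :eavt eid))
```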

frankitox20:09:10

I think this works too, but feels like a bad idea

(d/t->tx (dec (d/tx->t 17592212457845)))
Thank you!!

favila20:09:46

that doesn’t tell you the last transaction that modified an entity?

favila20:09:20

it tells you the transaction right before the entity id was minted

favila20:09:09

no, that’s not right

frankitox20:09:14

Both expressions give the same result to me

frankitox20:09:14

Also I’d rather not use dec there, I'm not completely sure it’s a good idea

favila20:09:19

it tells you some T > the T of the TX which minted the entity id and <= the T of the following TX

favila20:09:00

> Both expressions give the same result to me

favila20:09:14

If that is so, it’s because you haven’t asserted or retracted anything on the entity since the transaction that created it

favila20:09:18

also there’s no guarantee that d/t->tx is giving you a TX entity

frankitox20:09:22

oooh, alright

favila20:09:09

There’s a single T counter in the db, and transactions and entities are both minted from the same T, so the T of entity ids do interleave like TX1 entity-id-minted-in-TX1 TX2 entity-id-minted-in-TX2 etc

frankitox20:09:44

not every T has a corresponding TX?

frankitox20:09:09

So the counter can increase without minting anything?

favila20:09:00

that can happen via aborted txs; but even without that the T may belong to another entity that is not a TX

frankitox20:09:53

oooh ok, so for example if I have a transaction that adds 2 new entities, the counter should increase at least by 3?

frankitox20:09:30

ok, that's gold! thank you

frankitox20:09:47

the docstrings are so short I never can decipher anything

frankitox21:09:42

> d/tx->t just strips the partition bits, giving you the T

Quoting your https://observablehq.com/@favila/datomic-internals article! (thanks again)
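
As a rough illustration of "stripping the partition bits", the sketch below does the arithmetic with plain bit operations. It assumes the on-prem entity id layout described in that article (the low 42 bits hold the T, the bits above them hold the partition) and that the transaction partition is entity 3. The function names are hypothetical stand-ins, not the real `d/tx->t` and `d/t->tx`.

```clojure
;; Assumption: the low 42 bits of an entity id are its T.
(def t-mask (dec (bit-shift-left 1 42)))

(defn eid->t
  "Keeps only the T bits of an entity id."
  [eid]
  (bit-and eid t-mask))

(defn eid->partition
  "Keeps only the partition bits of an entity id."
  [eid]
  (bit-shift-right eid 42))

(defn t->tx-eid
  "Puts the assumed transaction partition (3) on top of a T."
  [t]
  (bit-or t (bit-shift-left 3 42)))

(eid->t 17592212457845)
;; => 26413429

(eid->partition 17592212457845)
;; => 4

;; The dec trick from the thread: an id in the tx partition, but nothing
;; guarantees a transaction entity actually exists at that T.
(t->tx-eid (dec (eid->t 17592212457845)))
;; => 13194165946740
```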