#datomic
2021-09-08
Ivar Refsdal07:09:50

I have a Datomic backend application that primarily writes to and reads from the database. Sometimes, however, it needs to talk to external HTTP services and put response values from those into the database. These values are currently added as part of a larger transaction. Sometimes those HTTP requests fail, and then the whole transaction fails. What is a good strategy to solve this? I'm thinking of adding a queue job in the database to make the HTTP requests, in order to accomplish something like https://microservices.io/patterns/data/transactional-outbox.html Is there a library for using Datomic as a queue consumer? Thanks.

Jakub Holý (HolyJak)12:09:59

Cloud or on-prem? And I did not really understand
> These values are currently added as a part of a larger transaction. Sometimes those HTTP requests fail, and then the whole transaction fails.
A transaction is just data, and presumably you execute it after you get the response from the remote, no? How can a failed/missing response make the tx fail? Do you have some tx function that checks for the presence of data returned from the remote? Do you need to do that as part of a single, larger transaction? Can't you handle the job queue in the app and write results into the DB in small, dedicated transactions? I guess I simply do not understand your use case well enough...

César Olea15:09:39

Not sure if you're using Cloud or on-prem, but if Cloud and using Ions, I would build an ion and wire the resulting Lambda to be a consumer of SQS, for example. When it's time to talk to the external HTTP service, publish an SQS message and let the ion handle the logic.

César Olea15:09:43

However, I'm very interested in using Datomic to implement the transactional outbox pattern. I was thinking of adding a stream to the DynamoDB tables that Datomic uses for persistence, but I'm not sure what data is contained where, as there are multiple DynamoDB tables created. Hopefully this is documented somewhere.

Joe Lane18:09:03

That Dynamo stream won’t have what you’re looking for, @U02DNF3TW3E. It’s all encrypted Fressian.

César Olea18:09:13

Thanks @U0CJ19XAM you saved me hours of looking around. In that case what would be a good way to implement the transactional outbox pattern in a Datomic Cloud instance?

Joe Lane18:09:01

There are many ways to slice it that are better or worse depending on data volume, latency requirements, idempotency capabilities of producing and consuming systems etc. I’d need far more info to give any sort of production ready recommendation.

Joe Lane18:09:22

@UGJE0MM0W Do I understand this correctly, you're issuing HTTP requests from within a transaction function?

Ivar Refsdal11:09:42

It's on-prem. I'm sorry about replying so late. I also obviously haven't explained myself well enough, as multiple people have misunderstood me. I've tried to explain the case better here: https://github.com/ivarref/yoltq#rationale That repo is also a Datomic queue implementation of the transactional outbox pattern. It should be portable to Cloud as well, @U02DNF3TW3E, but I suppose only with a polling strategy (no tx-report-queue).
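A minimal sketch of the transactional outbox idea discussed above, written as plain Clojure functions that only build transaction data, so it runs without Datomic. Everything here is hypothetical and is not yoltq's API: the attribute names (:order/*, :job/*), the assumption that :order/id and :job/id are :db.unique/identity attributes (so the result transaction upserts the job), and the shape of the HTTP call. The point it illustrates is that the business write and a pending job commit together in one transaction, while the HTTP call happens later and its outcome is recorded in a separate, small transaction.

```clojure
;; Hypothetical schema: :order/id and :job/id are assumed to be
;; :db.unique/identity attributes; all attribute names are illustrative.

(defn order-tx-with-outbox
  "Tx data that writes the business entity and a pending outbox job together,
  so both commit atomically in one transaction. No HTTP call happens here."
  [order-id amount job-id]
  [{:db/id "new-order" :order/id order-id :order/amount amount}
   {:job/id     job-id
    :job/status :job.status/pending
    :job/order  "new-order"}]) ; string tempid resolves to the order above

(defn job-result-tx
  "Runs the side effect for one job and returns tx data recording the outcome.
  `http-call!` is any function of the job that returns a value or throws."
  [job http-call!]
  (try
    [{:job/id     (:job/id job)
      :job/status :job.status/done
      :job/result (str (http-call! job))}]
    (catch Exception e
      ;; the failure is recorded on the job only; no business data is lost
      [{:job/id     (:job/id job)
        :job/status :job.status/failed
        :job/error  (str (.getMessage e))}])))
```

A consumer (polling, or on-prem's tx-report-queue) would query for pending jobs, call job-result-tx and transact the returned data, so a failed HTTP call marks one job as failed instead of aborting the original transaction. Whether retrying a failed job is safe depends on the idempotency of the external service.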

Jakub Holý (HolyJak)12:09:55

I am trying to write an interesting graph query and want to check with you whether there is a better approach. I have:
• A graph of components that might be connected by directional, labeled references (i.e. 2 components may be connected by multiple, different references)
• Components belong to a workspace, but references can go from a component in one workspace to a component in another one
• => I want to fetch all components in a workspace with their references AND also the components at the other end of their references, if they are not already being fetched (i.e. if they belong to a different workspace)
• ... but I want to exclude components from such other workspaces that the user is not authorised to access
My approach is to do this in 4 steps:
1. Make a filtered DB so that only the workspaces and components the user can access are present
2. Pull all the components from the desired workspace together with their references and the IDs of the components at the other end
3. Extract IDs of all reference source/target components that have not been pulled (b/c they are in different workspaces)
4. Pull these missing components by their IDs
Does that make sense? The query for 2.:

[:find (pull ?component [*
                         {:reference/_source [* {:reference/target [:db/id]}]}
                         {:reference/_target [* {:reference/source [:db/id]}]}])
    :in $ ?workspace-id
    :where [?component :component/workspace ?workspace-id]]
Then I would do 3., i.e. something like (def fetched-comp-ids (->> result (map (comp :db/id first)) set)) and (pseudocode) (get-in result [ALL :reference/_source ALL :reference/target :db/id]) + similarly for :reference/_target, to diff those IDs against fetched-comp-ids and get a set of extra-workspace-comp-ids. Finally, 4., I fetch those like so:
(d/q '[:find (pull ?component [* {:component/parent ...}]) ; include all ancestors too
       :in $ [?component ...]
       :where [?component :component/workspace]]
     db extra-workspace-comp-ids)
Is there a better way? Also, the 2. query will load all intra-workspace references twice, once for the target and once for the source component. Is that a performance or memory problem, or is the DB smart enough not to waste resources, e.g. by using structural sharing so that no memory is wasted? Or should I rather pull only the reference IDs and fetch the references themselves in a separate query? Thank you for any advice!!!
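Step 3 of the plan, written out as a sketch in plain Clojure. It assumes the step-2 query returns a set of 1-tuples, each holding one pulled component map whose :reference/_source and :reference/_target values are vectors of reference maps, as in the pull pattern above. The function name is made up for illustration.

```clojure
(defn extra-workspace-comp-ids
  "Step 3: IDs of reference ends that step 2 did not pull.
  `result` is the set of 1-tuples returned by the step-2 :find (pull ...) query."
  [result]
  (let [components       (map first result)
        fetched-comp-ids (set (map :db/id components))
        other-ends       (concat
                          ;; targets of references that start at a pulled component
                          (for [c components, r (:reference/_source c)]
                            (-> r :reference/target :db/id))
                          ;; sources of references that end at a pulled component
                          (for [c components, r (:reference/_target c)]
                            (-> r :reference/source :db/id)))]
    ;; drop missing ends (nil) and components we already have
    (into #{} (comp (remove nil?) (remove fetched-comp-ids)) other-ends)))
```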

Linus Ericsson13:09:49

I think you are trying to solve two different problems with your pull expressions at once: 1) the relations for a certain component and 2) the data for the components. It's hard to give a definite answer, of course, but in general pull expressions don't cache the various expressions, so the data structure for one component pulled twice will be identical but different objects. The primitive data will be structurally shared (I cannot see any reason why it wouldn't be). It's probably a good idea to use a filtered db to restrict access for a certain user! To get all the data, I think you should first deduce which components (and pull expressions) are requested for each component and then pull them in one go. I would look closely at how the Pathom library would solve this kind of problem.
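If the same entity does come back more than once (for example, a reference reached from both its source and its target), one common Clojure idiom is to index the pulled maps by :db/id so that each entity is held only once. This is a sketch with a hypothetical function name; note that it only collapses duplicates after the fact and does not save the cost of pulling them.

```clojure
(defn index-by-id
  "Collapse entity maps that were pulled more than once into a single map
  per :db/id, merging any attributes that only one of the copies has."
  [entities]
  (reduce (fn [idx e] (update idx (:db/id e) merge e)) {} entities))
```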

favila13:09:46

I don’t see that you’re using named rules recursively in your query. Are you aware of this technique?

favila13:09:13

I’m also not sure how important the shape of the map projection is to you

Jakub Holý (HolyJak)16:09:15

Thank you both!
> pull expressions don't cache the various expressions, so the data structure for one component pulled twice will be identical but different objects
Good to know! This sounds like something I would want to avoid. And no, I am not using named rules. I am vaguely aware of them but not sure how they would benefit me here. I use recursion to get the (extra-workspace) component's parent and its parent etc. The pull expression for that seems simple enough? The shape of the result is not critical; I can always reshape it in the code as I need.

Jakub Holý (HolyJak)16:09:43

@UQY3M3F6D So if I understand you right, it would be better to:
1. Pull all the components in the target ns, without references (there can be 10s of thousands of these in extreme cases)
2. Pull all the references that have these components as their source or target (I could either pass in IDs from 1. as a parameter or use Datalog to figure out the right references)
3. Proceed as in my original plan, to get the IDs of the reference-end components that I do not have yet and to fetch them
Correct?

favila16:09:35

> I use recursion to get the (extra-workspace) component’s parent and its parent etc.

favila16:09:01

You rely on db filtering to exclude “not-allowed” components?

favila16:09:03

If you do and this filtering is done correctly, it seems you can just pull recursively from your target workspace and there is no step 3

favila16:09:53

but I don’t understand what output is desired. Recursive pulling at arbitrary depth?

Jakub Holý (HolyJak)16:09:00

Yes, my plan was to leverage db filtering for this. Regarding parents, which I pull recursively: I know that if a component is "allowed" then its parents are as well. I want to fetch all the ancestors of a component, so yes, unlimited recursion, though the number of these is normally quite low. I want to get a component including its :parent, which is also a component, including its :parent, ... until the component is the root, with no parent of its own. So

{:component/$id 3, ...
 :component/parent {:component/$id 2, ...
                    :component/parent {:component/$id 1, ...}}}
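Since the shape of the result is not critical, the nested :component/parent chain can be flattened afterwards. A minimal sketch (hypothetical function name) that turns a recursive pull result like the one above into a vector ordered from the component up to the root:

```clojure
(defn ancestor-chain
  "Flatten a recursively pulled :component/parent chain into
  [self parent grandparent ... root], removing the nested
  :component/parent key from each entry."
  [component]
  (->> component
       (iterate :component/parent) ; self, parent, grandparent, ..., then nil
       (take-while some?)
       (mapv #(dissoc % :component/parent))))
```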

Jakub Holý (HolyJak)16:09:28

I need step 3 because when I fetch the references in 2., some ends of these references (i.e. some components) have not been pulled yet in 1. (because they live in an external workspace). I could fetch references together with the components, but then I would also fetch a lot of data I already have (the components in the workspace of interest + duplicates of components in external workspaces that are linked to multiple components in the ws. of interest). So the idea is to fetch just IDs (`{:reference/$id .., :reference/source <id of a component>, :reference/target <id of a component>}`) and then fetch the missing ones.

favila18:09:41

I’m still not sure exactly what you’re after here, but my surprise was this seemed like a recursive-rule problem. E.g. I was expecting to see a rule like this:

favila18:09:47

(d/q '[:find ?c1 ?ref ?c2
       :in % $ ?user ?c
       :where
       (reachable-edges ?c ?c1 ?ref ?c2)
       (user-accessible ?user ?c1)
       (user-accessible ?user ?c2)]
     '[[(user-accessible ?user ?component)
        ;; Donno the criteria here, making one up
        (not [?component :disallow ?user])
        ]
       [(refs ?comp ?ref ?comp2)
        [(ground [:reference/source :reference/target]) [?ref ...]]
        [?comp ?ref ?comp2]]

       ;; The immediate edge
     
       [(reachable-edges ?comp ?comp1 ?ref ?comp2)
        [(identity ?comp) ?comp1]
        (refs ?comp1 ?ref ?comp2)]
       [(reachable-edges ?comp ?comp1 ?ref ?comp2)
        [(identity ?comp) ?comp2]
        (refs ?comp1 ?ref ?comp2)]
       
       ;; the next edge over
     
       [(reachable-edges ?comp ?comp1 ?ref ?comp2)
        (refs ?comp _ ?comp-next)
        (reachable-edges ?comp-next ?comp1 ?ref ?comp2)]
       [(reachable-edges ?comp ?comp1 ?ref ?comp2)
        (refs ?comp-next _ ?comp)
        (reachable-edges ?comp-next ?comp1 ?ref ?comp2)]]
     [[1 :reference/source 2]
      [2 :reference/source 1]
      [1 :reference/target 2]
      [2 :reference/target 1]
      [1 :reference/target 3]

      [3 :reference/source 4]
      [3 :disallow "user"]
      [4 :reference/source 3]
      [3 :reference/target 4]
      [4 :reference/target 3]
      [5 :reference/target 6]]
     "user"
     1)

favila18:09:59

something that walked the component refs recursively and built a list of edges

favila18:09:07

filtering them by accessibility
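To make the rule's semantics concrete, here is a plain-Clojure analogue (not Datomic code) run on the same kind of sample tuples: reachable-edges walks references in both directions from the start component, and the accessibility check is then applied to both ends of each edge. It uses the made-up [?c :disallow ?user] criterion from the rule; the function names mirror the rule names but are otherwise hypothetical.

```clojure
(defn reachable-edges
  "Every [c1 attr c2] edge in the undirected connected component containing
  start (breadth-first walk over edges in both directions)."
  [edges start]
  (loop [frontier #{start}
         seen     #{start}]
    (let [nxt (set (for [[a _ b] edges
                         :when (or (frontier a) (frontier b))
                         n [a b]
                         :when (not (seen n))]
                     n))]
      (if (empty? nxt)
        (set (filter (fn [[a _ b]] (or (seen a) (seen b))) edges))
        (recur nxt (into seen nxt))))))

(defn visible-edges
  "Reachable reference edges whose two ends are both accessible to user."
  [facts user start]
  (let [ref-attrs   #{:reference/source :reference/target}
        edges       (filter (comp ref-attrs second) facts)
        disallowed  (set (for [[e a v] facts
                               :when (and (= a :disallow) (= v user))]
                           e))
        accessible? (complement disallowed)]
    (set (filter (fn [[c1 _ c2]] (and (accessible? c1) (accessible? c2)))
                 (reachable-edges edges start)))))
```

Starting from component 1, the walk reaches 1, 2, 3 and 4, but every edge touching 3 is dropped because 3 is disallowed; the edge between 5 and 6 is never reached.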

Jakub Holý (HolyJak)08:09:20

Oh, I see. It is far simpler than that. Sorry for not being able to express it properly! I will try again, better:
1. I have 3 kinds of relevant entities: workspaces, which group components, and references, which connect two components that might or might not be in the same workspace
2. What I want is 3 lists: (a) all components in a given workspace, (b) all references that start or end at one of these components, (c) all components from other workspaces that are at one end of any of these references (and are not in workspaces forbidden to the user), and here I also want the chain of their parents
So the query for (a) and (b) is very simple; only (c) is a little more complicated.

favila13:09:48

I think this is potentially all one query?

favila13:09:52

(d/q '[:find
       (pull ?c [*])
       (pull ?ref [:reference/source :reference/target])
       (pull ?c2 [:component/$id {:component/parent ...}])
       :in % $ ?user ?workspace
       :where
       [?c :component/workspace ?workspace]
       (involved-refs ?c ?ref _ _ ?c2)
       (user-accessible-component ?c2 ?user)]
     '[[(user-accessible-component [?c] ?user)
        ;; Donno the criteria here, making one up
        [?c :component/workspace ?wk]
        [?user :user/workspaces ?wk]]
       [(involved-refs ?this-comp ?ref ?this-rel ?other-rel ?other-comp)
        [(ground [[:reference/source :reference/target]
                  [:reference/target :reference/source]])
         [[?this-rel ?other-rel]]]
        [?ref ?this-rel ?this-comp]
        [?ref ?other-rel ?other-comp]]]
     unfiltered-db
     user
     workspace)

favila13:09:07

(a) “all components in a workspace” is clause 1. (b) is clause 2, with a visibility check in clause 3 instead of a filtered db. (c) is the “other” component in the ref, which may or may not be in the same workspace, but is definitely visible

favila13:09:56

I guess my puzzlement was why bother processing the output of pull expressions to find more components when datalog can do it for you

Jakub Holý (HolyJak)19:09:54

Ah, I had not realized I can have multiple pulls in a single :find. Thank you! If I understand it correctly, this will work nicely for all components in ?workspace that are the source or target of a reference whose other end is a user-accessible component (in your example, in the user's workspace). But what about 2 components that are both in ?workspace? And what if they have 2 different references between them? And even for the case you describe, with c2 in the user's workspace, we would fetch c2 N times if it is connected to components in the workspace via N different relations, no? And I suppose we want to avoid pulling the same component/reference repeatedly, to avoid wasting both processing time and memory?

Jakub Holý (HolyJak)19:09:26

In the workspace of interest, let's call it w1, I can have three components C1, C2, C3 with 9 different references between them plus 6 different references to components outside of w1, where 4 of these are in user-accessible workspace w2. I believe I want to fetch each of C1, C2, C3 and each of the 9+4 references and the 4 external components exactly once, because fetching anything repeatedly means Datomic has to construct the entity repeatedly, costing me more time and more memory. No?

favila19:09:11

I think it’s sensible to pull in separate steps. The point I was trying to make was that your “what components + refs to include” step is much more directly and clearly expressed as datalog queries rather than pull-walking

favila20:09:45

by changing the find you can return just the component ids + their refs + components then pull and reassemble
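A sketch of that reassembly step, assuming a hypothetical variant of the query whose :find returns plain ids (e.g. [:find ?c ?ref ?c2 ...]) instead of pull expressions. Collecting the ids into sets means an intra-workspace reference that matches from both ends, or an external component linked N times, ends up in the id sets once and can then be pulled once (e.g. with d/pull-many). The function name and keys are made up.

```clojure
(defn split-results
  "Turn [c ref c2] tuples into distinct id sets, so that each entity can be
  pulled exactly once. `ws-comp-ids` are all component ids in the workspace."
  [ws-comp-ids tuples]
  (let [ws (set ws-comp-ids)]
    {:components ws                                  ; (a) workspace components
     :refs       (into #{} (map second) tuples)      ; (b) each reference once
     :external   (into #{}                           ; (c) other-workspace ends
                       (comp (map #(nth % 2)) (remove ws))
                       tuples)}))
```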

favila20:09:07

(Also, pull has a default cardinality-many limit of 1000--people forget that)
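For reference, the :limit attribute option in Datomic's pull pattern grammar can lift that default for a specific cardinality-many attribute, with nil meaning no limit. The pattern below is only data, a sketch adapted from the step-2 pattern; check the pull grammar of the Datomic version in use before relying on it.

```clojure
;; Pull pattern (plain data) that removes the default cardinality-many limit
;; on the two reverse reference attributes via the :limit attribute option.
(def component-pattern
  '[*
    {[:reference/_source :limit nil] [:db/id :reference/target]}
    {[:reference/_target :limit nil] [:db/id :reference/source]}])
```

It would be passed like any other pattern, e.g. (d/pull db component-pattern eid).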

Jakub Holý (HolyJak)20:09:23

Ah, awesome, now I understand! Thanks a ton! And yes, I absolutely missed the point that pull-many has a limit 😅 That explains why it was so relatively fast 😂

favila20:09:15

since you nerd-sniped me pretty hard already, this is how I imagine doing it, altering pull-expressions to your taste:

favila20:09:18

(let [refs (d/q '[:find (pull ?ref [:db/id :reference/source :reference/target])
                  :in % $ ?user ?workspace
                  :where
                  [?c :component/workspace ?workspace]
                  (involved-refs ?c ?ref _ _ ?c2)
                  (user-accessible-component ?c2 ?user)]
                '[[(user-accessible-component [?c] ?user)
                   ;; Dunno the criteria here, making one up
                   [?c :component/workspace ?wk]
                   [?user :user/workspaces ?wk]]
                  [(involved-refs ?this-comp ?ref ?this-rel ?other-rel ?other-comp)
                   [(ground [[:reference/source :reference/target]
                             [:reference/target :reference/source]])
                    [[?this-rel ?other-rel]]]
                   [?ref ?this-rel ?this-comp]
                   [?ref ?other-rel ?other-comp]]]
                db
                user
                workspace)
      ;; group each ref under both of its end components, keyed by component id
      component-refs (reduce
                      (fn [xs [ref]]
                        (let [c-source (-> ref :reference/source :db/id)
                              c-target (-> ref :reference/target :db/id)]
                          (-> xs
                              (update-in [c-source :reference/_source] (fnil conj []) ref)
                              (update-in [c-target :reference/_target] (fnil conj []) ref))))
                      {}
                      refs)
      component-ids (vec (keys component-refs))
      ;; pull every involved component exactly once
      component-entity (zipmap
                        component-ids
                        (d/pull-many db '[*] component-ids))]
  ;; merge each component's entity map with its grouped refs
  (into []
        (map (fn [[cid refs]] (into (get component-entity cid) refs)))
        component-refs))

Jakub Holý (HolyJak)09:09:21

Neat, thanks a lot! I learned a great deal from this discussion; thank you for your generosity and time!

Jakub Holý (HolyJak)09:09:22

In (user-accessible-component [?c] ?user), why is the first argument inside a vector? Found it: it requires ?c to be bound when the rule is invoked.
