2024-02-07
I am exploring pathom3 with datomic, leveraging datomic pull syntax. It seems I’m coming across an issue similar to the fat type/type explosion issue.
One option is to define one resolver that resolves from :db/id -> any attribute in datomic, possibly tens or hundreds of attributes. So I pull any and every attribute and then pathom presumably selects the requested keys.
The other extreme is to define a resolver for every attribute in datomic. This results in seemingly excessive calls to datomic pull.
For now I’ve settled on defining a resolver for each entity “type”, like user, course, etc., which is an intermediate solution. But given that datomic pull lets us specify fine-grained attribute selections, it seems like a waste to lose this when we use it with pathom.
I briefly examined dynamic resolvers as a possible solution, but the documentation says they’re experimental and shouldn’t be relied on for critical parts of the system. Is this (one of the) problems they’re aiming to solve?
I’m hoping some of you are farther along in this journey and maybe have some suggestions? How are you using pathom3 with datomic with/without dynamic resolvers, especially in production?
there is a middle-ground solution you can use to avoid loading attributes you don't need without resorting to dynamic resolvers: the planner's ::pcp/expects data, which you can pull from your resolver like:
(-> env ::pcp/node ::pcp/expects)
the value there will be a shape descriptor (https://pathom3.wsscode.com/docs/shape-descriptor) telling you what is required from that resolver (considering the whole request); you can use that to "mask" your pull from datomic
so, in summary
• make a resolver that exposes every possible attribute of that entity
• in the resolver execution, see which of these attributes are really required via ::pcp/expects
• pull only these attributes
please let me know if this seems to work for your case
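to make this concrete, here is a minimal sketch of masking a Datomic pull with ::pcp/expects (the :db key in env and the :user/* attribute names are hypothetical, adapt them to however you pass the db around):
(ns demos.expects-masked-pull
  (:require [com.wsscode.pathom3.connect.operation :as pco]
            [com.wsscode.pathom3.connect.planner :as pcp]
            [datomic.api :as d]))

(pco/defresolver entity-attrs
  [env {:keys [db/id]}]
  ;; expose every attribute this resolver could provide
  {::pco/output [:user/name :user/email :user/created-at]}
  (let [;; attributes the planner actually expects from this node
        wanted (-> env ::pcp/node ::pcp/expects keys vec)]
    ;; pull only the expected attributes from Datomic
    (d/pull (:db env) wanted id)))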
Thank you! I will try it out and get back to you.
Having played with it, this seems like a reasonable compromise. Probably only necessary if you have entities with a large number of attributes but typically pull only small subsets of those attributes.
I found an issue with using (-> env ::pcp/node ::pcp/expects)
Here’s a minimal example to illustrate:
(defresolver example
  [env {:keys [id]}]
  {::pco/output [:a :b]}
  (-> (get {1 {:a "a" :b "b"}} id)
      (select-keys (-> env ::pcp/node ::pcp/expects keys))))

(def env-example
  (-> {:com.wsscode.pathom3.error/lenient-mode? true}
      (pci/register [example])))
Now a query:
(p.eql/process env-example {:id 1 :ref {:id 1}} [:a {:ref [:b]}])
=>
{:ref {:b "b"},
 :com.wsscode.pathom3.connect.runner/attribute-errors
 {:a #:com.wsscode.pathom3.error{:cause :com.wsscode.pathom3.error/node-errors,
                                 :node-error-details {1 #:com.wsscode.pathom3.error{:cause :com.wsscode.pathom3.error/attribute-missing}}}}}
The :b key under :ref is fetched successfully, but :a is reported missing. My hypothesis is that pathom runs each resolver with the same input only once. In this case it ran it first when processing the {:ref [:b]} join and since :a wasn’t returned it was reported missing.
Let’s confirm the hypothesis by including :a in the :ref join like this:
(p.eql/process env-example {:id 1 :ref {:id 1}} [:a {:ref [:a :b]}])
=> {:a "a", :ref {:a "a", :b "b"}}
Now it is returned successfully.
Is there a way around this other than not using (-> env ::pcp/node ::pcp/expects keys) in the resolver?
hello @U07V9T94Z, just got to read and understand this, it's a tricky one, and it's related to caching as you noticed
once a resolver result is cached, the next run (using the same input and params) will hit the cache, which means it won't run your resolver body again
tracking the expects across different entities like that is quite tricky for Pathom
one way out would be to not cache the resolver as a whole; you can do that like this:
(ns demos.unexpected-expects
  (:require [com.wsscode.pathom3.connect.indexes :as pci]
            [com.wsscode.pathom3.connect.operation :as pco]
            [com.wsscode.pathom3.connect.planner :as pcp]
            [com.wsscode.pathom3.interface.eql :as p.eql]))

(pco/defresolver example
  [env {:keys [id]}]
  {::pco/output [:a :b]
   ::pco/cache? false}
  (-> (get {1 {:a "a" :b "b"}} id)
      (select-keys (-> env ::pcp/node ::pcp/expects keys))))

(def env-example
  (-> {:com.wsscode.pathom3.error/lenient-mode? true}
      (pci/register [example])))

(comment
  (p.eql/process env-example
    {:id 1 :ref {:id 1}}
    [:a {:ref [:b]}]))
but of course, if the actual computation is expensive, you don't want it to run multiple times; to fix that you can use an internal cache, like:
(ns demos.unexpected-expects
  (:require [com.wsscode.pathom3.cache :as p.cache]
            [com.wsscode.pathom3.connect.indexes :as pci]
            [com.wsscode.pathom3.connect.operation :as pco]
            [com.wsscode.pathom3.connect.planner :as pcp]
            [com.wsscode.pathom3.interface.eql :as p.eql]))

(defn expensive-db-get [id]
  (Thread/sleep 1000)
  (get {1 {:a "a" :b "b"}} id))

(pco/defresolver example
  [env {:keys [id]}]
  {::pco/output [:a :b]
   ::pco/cache? false}
  (-> (p.cache/cached :com.wsscode.pathom3.connect.runner/resolver-cache* env [`example id] #(expensive-db-get id))
      (select-keys (-> env ::pcp/node ::pcp/expects keys))))

(def env-example
  (-> {:com.wsscode.pathom3.error/lenient-mode? true}
      (pci/register [example])))
then:
(time
  (p.eql/process env-example
    {:id 1 :ref {:id 1}}
    [:a {:ref [:b]}]))
"Elapsed time: 1.161333 msecs"
=> {:a "a", :ref {:b "b"}}
I think that's the best option here: you separate the cache of the expensive operation while still allowing the resolver to have different outputs depending on the expectation in that context
another way to "trick" pathom into running the resolver again is by changing the params, given they are also a part of the cache key:
;; you can try this with the same config you already have for the resolver:
(time
  (p.eql/process env-example
    {:id 1 :ref {:id 1}}
    ['(:a {:custom "param"}) {:ref [:b]}]))
but I think that's less effective, because if you have an expensive computation inside the resolver, it will run multiple times
please let me know how it goes 🙂
Thanks. Let me think about this and see how I want to handle this.
@U066U8JQJ I’ve had a bit of time to think about this.
If I’m reading it correctly, the “internal caching” approach still fetches all the ::pco/output attributes and caches them. Then the resolver selects a subset of keys.
I was trying to only fetch those attributes that are in (-> env ::pcp/node ::pcp/expects keys).
Basically, I have one resolver that fetches all possible attributes from a datomic db. I was supplying (-> env ::pcp/node ::pcp/expects keys) to the datomic pull query to limit how much data I fetch from datomic on each invocation of the resolver.
The ideal solution would fetch (and cache!) only those attributes that are “expected” on each invocation of the resolver. If the resolver is called again with the same input it would get the cached values for cached attributes and pull any remaining attributes from datomic. So in our example it might first pull :b from datomic and cache it, and then pull :a from datomic the second time it’s called.
It seems this doesn’t fit pathom’s caching mechanism, which is keyed by [resolver input]. And I am not sure that having a single datomic resolver is necessarily what I want. I was just exploring the options.
One solution might be for me to implement my own layer of caching for the resolver that is per [id attribute]. At this point I would almost be building my own EAV index for each datomic db basis as obtained with (d/db conn) (I pass this in env)! This seems a little crazy. :)
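Just to make the idea concrete, a rough sketch of that per-[id attribute] layer (assuming the pco/pcp/d aliases from the examples above; the attr-cache* atom and the :db key in env are hypothetical names) might look like:
(pco/defresolver entity-attrs
  [{:keys [db attr-cache*] :as env} {:keys [db/id]}]
  {::pco/output [:user/name :user/email]
   ::pco/cache? false}
  (let [wanted  (-> env ::pcp/node ::pcp/expects keys set)
        cached  (get @attr-cache* id {})
        missing (vec (remove #(contains? cached %) wanted))
        ;; only hit datomic for attributes we haven't seen for this id yet
        fresh   (if (seq missing) (d/pull db missing id) {})]
    (swap! attr-cache* update id merge fresh)
    (merge (select-keys cached wanted) fresh)))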
I could also try to use parameters to somehow convey to the resolver which attributes are already cached and which need to be pulled from datomic, but this is a kludge.
I might just go back to having a datomic resolver for each type of entity. But having to define entity “types” kind of goes against datomic’s open/flexible entities.
For now I’ve just removed (select-keys (-> env ::pcp/node ::pcp/expects keys)) from the resolver, which ends up fetching every existing attribute for each entity.
This also got me thinking: does pathom ever purge the cache? How does it limit the size of the cache? What about when working with traditional relational DBs that have no concept of a DB basis? In other words, something like Postgres is place-oriented, so we’re not caching values but references. Do you just turn off the cache for those types of DBs?
hi George, thanks for the detailed response, this gives me some thoughts too, but let's start with the simpler question about cache purging:
the resolver-cache (which is the default one you get automatically) has a lifecycle of a single request, so for each p.eql/process the cache is initiated (it's a simple atom) and discarded at the end of that execution, so it won't hold your caches for long (and it shouldn't have a big impact on memory usage). you can override that cache with some other one if you want, but I recommend keeping this one as-is
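for example, a minimal sketch of supplying your own long-lived cache atom via env (so it survives across p.eql/process calls) would be:
(def long-lived-cache* (atom {}))

(p.eql/process
  (assoc env-example
         :com.wsscode.pathom3.connect.runner/resolver-cache* long-lived-cache*)
  {:id 1 :ref {:id 1}}
  [:a {:ref [:b]}])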
now, about the expects and internal caching, you're right: if we just load everything and filter the output, we are not doing much (it would be better not to filter by the expects at all, since Pathom already does it). but we do have some interesting possible strategies here; one of them, as you said, could be caching each demanded attribute as a separate thing, so we could re-use it across different demands (let's say different entities that require different things from the same resolver)
another one that I'm not sure would work (because I haven't tested it, it's just something that came into my head now and needs more thinking) would be to convert that resolver into a batch resolver, which centralizes all the entities in a single request. having that, I think we could "merge" the needs across different entities that share attributes and make a single request for those. I'd like to try it here, I think I can do it later today and see how it goes
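a rough, untested sketch of what that batch shape might look like (reusing the toy data and aliases from the examples above; whether ::pcp/expects gets merged across the batched entities is exactly the part that needs verifying):
(pco/defresolver example-batch
  [env inputs]                        ; inputs is a sequence of {:id ...} maps
  {::pco/input  [:id]
   ::pco/output [:a :b]
   ::pco/batch? true}
  (let [wanted (-> env ::pcp/node ::pcp/expects keys vec)
        data   {1 {:a "a" :b "b"}}]   ; stand-in for a single d/pull-many call
    ;; batch resolvers must return results in the same order as the inputs
    (mapv #(select-keys (get data (:id %)) wanted) inputs)))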
Thanks, Wilker. The batch resolver idea sounds promising. If it were possible to get the whole tree from datomic in one go, that would be ideal.