2024-04-26 pathom | Clojure Slack Archive

pathom

2024-04-26T00:38:21.420179Z

I'm struggling with nesting and context. I've got a chain of resolvers that open a binary file, decode it, produce {::big-map {id-number {:foo/bar 7, ...} ...}} with a map of records. that part works fine, the map is present and correct. the part that's not working is one resolver which returns a list like {:matches [{:foo/id 12} {:foo/id 13}]}. I'm using a (pbir/attribute-table-resolver ::big-map :foo/id [:foo/bar ...]), which I was hoping would allow for a query like (p.eql/process index {:file/path "..."} [{:matches [:foo/bar ...]}]) to populate that list with all the :foo/* attributes.

wilkerlucio 2024-05-02T12:42:51.214919Z

thanks @braden.shepherdson! today I'm packed, I expect to have a proper look at it tomorrow 🙏

2024-05-02T12:55:46.991179Z

Sure thing, no rush.

2024-04-29T13:28:02.028809Z

I haven't disappeared. I tried to build a simplified repro of this on the weekend and it worked fine. then I tried more debugging and some simplification of the real thing, and it's still broken. I ran out of time but when I get back to it I want to try to find where there seems to be an explosion in the size of the graph. trying to print the (ex-data *e) after a query error is crashing my Conjure and locking up my REPL; trying to send it to Pathom Viz either locks it up or it shows empty everything. my current guess is that the huge map that results from parsing the ~4MB binary file is getting shape-descriptor'd even though I'd be better off treating it as opaque. but that's just untested speculation. it might even have cycles in it, I'm not sure.

wilkerlucio 2024-04-29T13:37:25.460069Z

thanks for the effort to repro Braden 🙏

2024-04-29T13:42:49.567879Z

I'm betting 90%+ that it'll be my bug at the bottom of this, but probably there's a documentation improvement to be wrung out of this at least.

wilkerlucio 2024-05-07T19:21:04.708409Z

hi @braden.shepherdson, sorry for the delay, but having a look now, it looks like a bug to me, especially considering it works when we provide the data with a resolver. it might be missing the check for entity data at some point, I'm debugging it now

wilkerlucio 2024-05-07T19:24:58.499799Z

ah, actually, I think I just understood why the difference in providing it vs creating a resolver

wilkerlucio 2024-05-07T19:25:36.089699Z

when you send the data as entity, that data is only gonna be available for the root entity, but when you provide it as a resolver, it will be available everywhere

wilkerlucio 2024-05-07T19:27:42.238989Z

and that's what causes the difference you see, because when you do: [{::customers [:data/name]}], the place where ::big-map is going to be evaluated is at the path [::customers 0]. so, when you send it as entity, it's like having this:

{::problem-input 1
 ::customers [{... looking for ::big-map here ...}]}

as we can see here, at [::customers 0] we don't have ::problem-input available, and that causes the planning failure

wilkerlucio 2024-05-07T19:28:00.060789Z

but when you provide ::problem-input as a resolver, that makes it available at any path, so it works

wilkerlucio 2024-05-07T19:28:03.304699Z

does it make sense?

2024-05-07T19:29:07.821069Z

that makes sense, though it seems like you might have the logic inverted, since adding ::problem-input is what makes it fail to plan.

wilkerlucio 2024-05-07T19:30:23.510439Z

I'm bringing up the comparison between your last two cases:

(testing "fails to plan with an input"
  (is (thrown-with-msg?
        Exception #"Pathom can't find a path for the following elements"
        (p.eql/process input {::problem-input 1} [{::customers [:data/name]}]))))
(testing "weirdly it works with constantly-resolver and I don't know why"
  (is (= {::customers [{:data/name "Alice"} {:data/name "Carol"}]}
         (p.eql/process magic [{::customers [:data/name]}]))))

wilkerlucio 2024-05-07T19:30:49.431059Z

the weirdly... part is what I'm trying to explain here

2024-05-07T19:31:14.864059Z

oh, I see.

wilkerlucio 2024-05-07T19:32:38.024699Z

it's because ::problem-input is required for ::big-map: when ::problem-input is provided as entity, it's only available at root, but to do {::customers [:data/name]}, you need it to be available at each customer entity, which providing it as entity doesn't satisfy

wilkerlucio 2024-05-07T19:33:09.148509Z

but when ::problem-input is provided as a resolver, it makes it available for every entity, no matter where it is (what path inside the output it is)

wilkerlucio 2024-05-07T19:33:47.614019Z

the confusion here might be thinking that when you provide entity data, it's gonna be available everywhere, but it isn't. the entity data is like merging at the root; when navigating, you're not gonna see things at parent levels

2024-05-07T19:33:59.750659Z

okay yeah, that makes sense.

2024-05-07T19:36:35.765339Z

is there a better way to accomplish that kind of thing, when there is a "problem input"? because the only path I can see is that the HTTP request comes in, and then I run two p.eql/process or similar queries. first to compute ::big-map and return it, second to compute the actual response structures, passing the ::big-map data such that it's globally available.

wilkerlucio 2024-05-07T19:36:40.318279Z

here is a way to illustrate this, how it could work with your setup:

(testing "illustrating putting the ::problem-input where it needs to be"
  (is (= {::customers [{:data/name "Alice"} {:data/name "Carol"}]}
         (p.eql/process input {::customers [{:data/id 123 ::problem-input 1}
                                            {:data/id 789 ::problem-input 1}]}
           [{::customers [:data/name]}]))))

wilkerlucio 2024-05-07T19:37:53.504279Z

is that input being provided directly by you? or is it part of some dependency chain?

wilkerlucio 2024-05-07T19:38:16.740109Z

because one way to make it available is to add a resolver for it (as discussed, that way it's available everywhere)

wilkerlucio 2024-05-07T19:38:52.037409Z

or, you can send the input in env and pull it from there. this is just a different approach, but it also makes it globally available

wilkerlucio 2024-05-07T19:39:02.236169Z

(talking about the input point you provide to run the query)

2024-05-07T19:41:32.893129Z

• the problem input is effectively a log file name
• ::big-map is the result of digesting that big file into a map from ID to maps with a bunch of attributes
• that should all be globally available for the below:
• the response is a bunch of lists of "matching" entries from ::big-map, with only some of the attributes included.

wilkerlucio 2024-05-07T19:42:19.599489Z

there is an important thing to account for here: is that input for ::big-map consistent throughout the whole query? if not, if it should just affect that sub-tree (because you might have different inputs at different places), then that won't work

wilkerlucio 2024-05-07T19:42:41.871699Z

if the case is the second, then you need to forward that dependency down when you create the collection (the ::customers in your example)

wilkerlucio 2024-05-07T19:44:01.509139Z

making it look like this:

customers (pco/resolver
            `customers
            {::pco/input  [::problem-input ::big-map]
             ::pco/output [{::customers [::problem-input :data/id]}]}
            (fn [_env {m ::big-map i ::problem-input}]
              {::customers (for [id (keys m)
                                 :when (odd? id)]
                             {::problem-input i :data/id id})}))

wilkerlucio 2024-05-07T19:44:20.827599Z

changing the customers resolver this way fixes the input case that was throwing before

wilkerlucio 2024-05-07T19:45:26.448249Z

also note I added the nested output in the ::pco/output section. this is the recommended thing to do, and without it some nested input queries that should succeed might fail (because Pathom checks nested paths, and without the nested description it's unable to know what is available at that level)
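A minimal, self-contained sketch of the "forward the input down" fix described above. Only the shape of the customers resolver comes from the thread; the namespace, the toy ::big-map data, and the customer-names helper are hypothetical stand-ins, since the original gist is not reproduced here.

```clojure
(ns example.forward-input
  (:require
    [com.wsscode.pathom3.connect.built-in.resolvers :as pbir]
    [com.wsscode.pathom3.connect.indexes :as pci]
    [com.wsscode.pathom3.connect.operation :as pco]
    [com.wsscode.pathom3.interface.eql :as p.eql]))

;; Hypothetical stand-in for digesting the big file. It requires
;; ::problem-input, so it can only run where that key is reachable.
(pco/defresolver big-map [_env _input]
  {::pco/input  [::problem-input]
   ::pco/output [::big-map]}
  {::big-map {123 {:data/name "Alice"}
              456 {:data/name "Bob"}
              789 {:data/name "Carol"}}})

;; Each nested customer carries ::problem-input with it, so ::big-map
;; can be resolved again at [::customers 0], [::customers 1], ...
(def customers
  (pco/resolver
    `customers
    {::pco/input  [::problem-input ::big-map]
     ::pco/output [{::customers [::problem-input :data/id]}]}
    (fn [_env {m ::big-map i ::problem-input}]
      {::customers (vec (for [id (sort (keys m))
                              :when (odd? id)]
                          {::problem-input i :data/id id}))})))

(def env
  (pci/register
    [big-map
     customers
     (pbir/attribute-table-resolver ::big-map :data/id [:data/name])]))

(defn customer-names
  "Runs the nested query with ::problem-input given as root entity data."
  [problem-input]
  (p.eql/process env {::problem-input problem-input}
    [{::customers [:data/name]}]))
```

In this sketch the ids are sorted only to keep the result order predictable. The trade-off of this approach is that every customer entity repeats the input, which is why it mainly suits cases where different sub-trees may need different inputs.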

2024-05-07T19:46:23.728469Z

the input and ::big-map are consistent across all the rest of the query, hence why I said I could fall back to treating them as two separate queries.

wilkerlucio 2024-05-07T19:46:59.323889Z

well, in that case you don't need to, just make an env-resolver and provide it as env, that is a good way to provide any dependency that should be globally available

2024-05-07T19:47:40.296919Z

noted, that makes sense.

wilkerlucio 2024-05-07T19:48:02.952939Z

a simple way to create a resolver to pull data from env: (pbir/constantly-fn-resolver ::problem-input ::problem-input)

wilkerlucio 2024-05-07T19:48:27.712449Z

this will read ::problem-input from env and provide it as ::problem-input (for any input requirement)
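A minimal sketch of that env approach, assuming a toy ::big-map and a ::customers resolver loosely modelled on the repro discussed in this thread. The namespace, the sample data, and the customer-names helper are hypothetical; the pbir, pci, pco and p.eql calls are the ones already used in the conversation.

```clojure
(ns example.env-input
  (:require
    [com.wsscode.pathom3.connect.built-in.resolvers :as pbir]
    [com.wsscode.pathom3.connect.indexes :as pci]
    [com.wsscode.pathom3.connect.operation :as pco]
    [com.wsscode.pathom3.interface.eql :as p.eql]))

;; Hypothetical stand-in for digesting the big file.
(pco/defresolver big-map [_env _input]
  {::pco/input  [::problem-input]
   ::pco/output [::big-map]}
  {::big-map {123 {:data/name "Alice"}
              456 {:data/name "Bob"}
              789 {:data/name "Carol"}}})

;; No need to forward ::problem-input here: it comes from env instead.
(pco/defresolver customers [_env input]
  {::pco/input  [::big-map]
   ::pco/output [{::customers [:data/id]}]}
  {::customers (vec (for [id (sort (keys (::big-map input)))
                          :when (odd? id)]
                      {:data/id id}))})

(def env
  (pci/register
    [;; The keyword is used as a function of env: (::problem-input env).
     (pbir/constantly-fn-resolver ::problem-input ::problem-input)
     big-map
     customers
     (pbir/attribute-table-resolver ::big-map :data/id [:data/name])]))

(defn customer-names
  "Puts the input into env, so it is reachable at every path of the query."
  [problem-input]
  (p.eql/process (assoc env ::problem-input problem-input)
    [{::customers [:data/name]}]))
```

Because the ::problem-input resolver declares no input of its own, it can run at any nesting level, which is the property the explanation above relies on.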

2024-05-01T19:15:41.152789Z

I'm still wrestling with this, trying to narrow down what's wrong with my data or resolvers. what's the right way to express returning a map in ::pco/output? like I have a resolver that returns a map of :job/id to maps like

{:job/id 2721863168
 :job/origin {:location/id "string_slug", :company/id "string_slug"}
 :job/destination {:location/id "b", :company/id "c"}
 ...}

I want to stick this map somewhere and then reference it in a pbir/attribute-table-resolver. so is that a resolver with {::pco/output [::big-map]} returning {::big-map {123456 {:job/origin ...}}} plus (pbir/attribute-table-resolver ::big-map :job/id [{:job/origin [:location/id :company/id]} ...])? I can't find any examples with maps like that except in the built-in resolvers docs.

2024-05-01T19:17:11.756979Z

I'm wondering if there's some EQL syntax I'm missing that expresses a map of data as opposed to fixed keywords as keys.

wilkerlucio 2024-05-01T22:26:49.268629Z

nested outputs can be expressed with the join syntax, for the example you sent above:

[{:job/origin [:location/id :company/id]}
 {:job/destination [:location/id :company/id]}]
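A small sketch of how that join syntax could be combined with the table setup asked about above, assuming the job map is small enough to provide through pbir/constantly-resolver. The namespace, the jobs var, and the job-destination helper are hypothetical; the sample values are the ones from the question.

```clojure
(ns example.job-table
  (:require
    [com.wsscode.pathom3.connect.built-in.resolvers :as pbir]
    [com.wsscode.pathom3.connect.indexes :as pci]
    [com.wsscode.pathom3.interface.eql :as p.eql]))

;; A map from :job/id to the attributes of that job.
(def jobs
  {2721863168 {:job/origin      {:location/id "string_slug"
                                 :company/id  "string_slug"}
               :job/destination {:location/id "b"
                                 :company/id  "c"}}})

(def env
  (pci/register
    [;; ::big-map is an opaque value here; its shape is not described.
     (pbir/constantly-resolver ::big-map jobs)
     ;; The table resolver's output describes one row, using joins for
     ;; the nested maps.
     (pbir/attribute-table-resolver ::big-map :job/id
       [{:job/origin [:location/id :company/id]}
        {:job/destination [:location/id :company/id]}])]))

(defn job-destination
  "Looks up the destination company of a job by id."
  [job-id]
  (p.eql/process env {:job/id job-id}
    [{:job/destination [:company/id]}]))
```

The point of the sketch is that the map keyed by ID never needs an EQL description of its own: only the per-row attributes are declared, on the table resolver.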

2024-05-02T01:09:17.075839Z

okay! I finally nailed down the issue. there was an important but subtle difference remaining between the attempted repro and my broken code. resolvers with inputs are not available for nested data, and resolvers without inputs are available everywhere(?)

2024-05-02T01:09:57.178239Z

(perhaps in all cases, or perhaps only if there's a graph relationship between the nesting and the inputs?)

2024-05-02T01:10:08.466739Z

I'm turning it into a distilled example now.

2024-05-02T02:53:58.607879Z

https://gist.github.com/bshepherdson/b5667894a37220a25627c18811cb1e63 hopefully that makes the issue clear. the test is a little convoluted, but it's trying to show the difference, in the small edit to add an input.

2024-05-02T03:04:07.058349Z

I kept accidentally making it work trying to shrink the repro. it does seem like the key is that there's a diamond dependency.

       ::problem-input
        /           \
   ::list         ::customers
        \           /
       [::customers 0]

2024-05-02T03:09:04.392229Z

though that doesn't explain why the pbir/constantly-resolver works.

2024-05-02T03:56:26.564529Z

I should say, none of that is necessarily a bug report. This may well be working as intended. I'd love to understand the nuances of diamond paths. It seems to me that there's nothing fundamentally broken about the diamond dep, it's not a tree but it's still a DAG. But perhaps I'm missing something. Perhaps there is a good way to make this work within one query. I think I can restructure my resolvers to keep this in one pass, and if I can't then I can make it two passes with the big map as an input or placeholder to the second one.

2024-04-26T13:09:47.123829Z

still stuck on this. I dug into the ex-data and see that ::big-map is in the :unreachable-paths, along with all the other global stuff. is a nested chunk like [:matches 0] not allowed to see global things? is it because there's a cycle I'm not seeing?

2024-04-26T13:21:38.133089Z

is there a barrier between separately registered indexes? I've got a few groups of resolvers in different files and they're getting combined like (pci/register [abc/index xyz/index local-resolver]) where abc/index is itself a (def index (pci/register [several local resolvers])).

wilkerlucio 2024-04-26T14:11:37.833409Z

hello Braden, can you make a small repro so I can run and check on my side?

wilkerlucio 2024-04-26T14:12:04.292829Z

about the last question, no, it doesn't matter when you registered the resolvers (or how you group their registration), it's always like adding one at a time
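A quick sketch of that point, assuming two separately built indexes plus one local resolver. All names here (abc-index, xyz-index, the :abc/a, :xyz/b and :local/c attributes, and the answer helper) are hypothetical.

```clojure
(ns example.register-groups
  (:require
    [com.wsscode.pathom3.connect.built-in.resolvers :as pbir]
    [com.wsscode.pathom3.connect.indexes :as pci]
    [com.wsscode.pathom3.interface.eql :as p.eql]))

;; Two indexes registered separately, as if they lived in different files.
(def abc-index
  (pci/register [(pbir/constantly-resolver :abc/a 1)]))

(def xyz-index
  (pci/register [(pbir/single-attr-resolver :abc/a :xyz/b inc)]))

;; Combining the indexes with a local resolver gives one flat index, so
;; :local/c can depend on :xyz/b, which depends on :abc/a.
(def env
  (pci/register
    [abc-index
     xyz-index
     (pbir/single-attr-resolver :xyz/b :local/c str)]))

(defn answer
  "Resolves :local/c through resolvers that came from all three groups."
  []
  (p.eql/process env [:local/c]))
```

The dependency chain deliberately crosses all three groups, so it only resolves if the grouping has no effect on what each resolver can reach.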

2024-04-26T14:13:38.872329Z

I've been trying to create a small repro... but they're all working. I still haven't figured out what's breaking it. I'm betting on a bad output shape somewhere, but I can't find it. I'll send a repro if I get one going.

👍 1

2024-04-26T14:23:46.597179Z

should I be concerned about potential cycles? the chain is a little involved, but it's something like this:
• ::big-map distills the binary file into the map of :foo/id to several properties.
  ◦ its output is [{::big-map [:foo/bar ...]}] giving the complete shape
• ::big-list is [{:foo/id}] for all of the vals of ::big-map
• a resolver considers a few :foo/bar props and provides :foo/match? which is a bool
• :list/match? filters ::big-list to only those with :foo/match? true

all of that seems to be working, I get a correct result for :list/match? [{:foo/id 123} {:foo/id 456}] with only a few entries (`big-list` has a few thousand). I expected that I could query for [{:list/match? [:foo/bar :foo/baz]}] and it would populate those matches with values from ::big-map - but that's the part that chokes.

2024-04-26T14:24:32.655569Z

if that plan is sound in theory then I'm not sure where it's going wrong. alternatively, I'm happy to be told that that's a bad way to model all this!

2024-04-26T14:27:42.219519Z

but I'm concerned about cycles since :list/match? and its entries already depend on ::big-map, so perhaps they can't reference it again?

wilkerlucio 2024-04-26T16:30:25.284829Z

about cycles, Pathom 3 does cycle detection and stops those pathways, laterally and nested as well. but for your case, this is more complex than I can hold in my mind at once and infer from a description. the issue in these cases is that little nuances are really important to understand what's happening, so a working repro is ideal because there I don't have any missing information (I can see exactly what you are doing). let's continue once you get some repro that we can discuss around

2024-04-26T16:30:56.710849Z

👍

2024-04-26T19:22:20.251459Z

what's the status of Pathom in the browser, for 2 or 3? is it production ready?

wilkerlucio 2024-04-26T21:57:46.349049Z

same as the clj versions

2024-04-26T22:00:34.507729Z

thanks

dvingo 2024-04-27T00:05:23.727039Z

Interesting in-browser example here https://www.reddit.com/r/Clojure/comments/y43tw6/pathom3_datascript_reagent_spa_toy_project/
