Fork me on GitHub
#pathom
<
2022-04-04
>
nivekuil12:04:33

I'm looking at a node I expect to be green in pathom-viz but it's grey, and the error is this. relevant or just cosmetic?

nivekuil12:04:10

I think that's just cosmetic since every grey node has it.. I'm puzzled over this though:

nivekuil12:04:31

this OR node says the success-path is 71, which is the AND node

nivekuil12:04:11

but entry has already provided enough info

nivekuil12:04:34

what the OR node wants is embed/html which is provided already as nil

nivekuil12:04:58

so it seems like it's still walking extraneous resolvers? hard to repro this one though

wilkerlucio14:04:08

interesting one, I think I'm not checking for results ready in the OR node, so it does pick a path anyway, but this should be an easy fix, can you open an issue for that please?

wilkerlucio19:04:40

I just checked the code, the OR node does check for completion already, so that shouldn't be an issue, the grey means it got there but there was no need to run, so its skipped

wilkerlucio19:04:16

about the pluzzing error message, that's a bug in pathom viz, that error was not related to your project, but was Pathom trying to calculate the time spent there (which doesn't exists, given the node didn't ran)

wilkerlucio19:04:25

Im missing something?

wilkerlucio19:04:16

also, the order of execution might matter here, maybe it entered this OR node before the entry (the green one)

wilkerlucio19:04:33

grey nodes also happen for paths not taken in a OR branch

nivekuil15:04:58

well since entry provides a superset of the data that the OR node needs I wouldn't have expected to it be run in that order anyway

nivekuil15:04:20

also it's higher priority. I can check, I have the graph data saved but I can't find the tool to import graph data on the pathom3 website

nivekuil15:04:17

this was run with parallel, maybe that could affect order of execution?

wilkerlucio17:04:25

yes, parallel can certainly affect the order of execution

wilkerlucio17:04:58

looking at the timeline you may be able to see what ran first

wilkerlucio17:04:24

humm, but there is a problem, I double checked the code, the check is only happening on the serial runner, I'll add the same check for async and parallel today

wilkerlucio01:04:47

missed check on parallel and async are now fixed in main

nivekuil14:04:16

seems like you got it šŸ™‚

šŸŽ‰ 1
Stefan13:04:18

Hi all, weā€™re running into a bit of a performance problem with Pathom 3 that Iā€™d like to understand better. I hope someone can help. Please consider the following resolver:

(pco/defresolver all-organisations [_]
    {::pco/output [{:organisation/all [:organisation/id
                                       :organisation/name
                                       {:organisation/departments [:department/id]}]}]}
    {:organisation/all
     (map (fn [id] {:organisation/id id
                    :organisation/name (str "org-" id)
                    :organisation/departments [{:department/id 1
                                                :department/name "department-1"}]})
          (range 1 10))})
This is a resolver that normally goes out to the database, but for the sake of this discussion Iā€™m just generating some example organisations. Now I do this query:
[{:organisation/all [:organisation/id
                     :organisation/name
                     {:organisation/departments
                      [:department/id :department/name]}]}]
The results are OK, but when I do this for a thousand organisations, it takes maybe five seconds or so. Looking in Pathom Viz I see what I captured in the attached screen shot. As you can see, a lot of planning is going on for each department in each organisation, whereas all data is immediately available. Is this because Iā€™m doing something wrong? Is this maybe a missing optimisation step in Pathom? Thanks!!

Stefan13:04:40

By the way, this is not the same issue as https://github.com/wilkerlucio/pathom3/issues/134, I checked :)

wilkerlucio13:04:29

hello Stefan, one thing that you can improve here may be to use a vector mapv instead of map, Pathom handles vectors better in most cases and that should be preferred. about the sequence processing, Pathom has to plan for every entry, but this should hit a cache hit after the first, how many items are we talking about? can you make a repro? I have some ideas to improve even further the planning on sequences, but was waiting for some real case to put it forward, maybe this is the one, but like to check the possible edges first

pithyless15:04:11

Just out of curiosity, would the planner work better if you split the joins into separate resolvers and used the batch resolver? (It may also change the I/O contention and GC churn in a positive manner, i.e. theoretically more work but overall lower latency)?

wilkerlucio16:04:07

batch doesn't participate on planning in any way, its an operation during the running process only. when the runner detects a batch item, it will take note of that demand, pause the current entity execution and jump to the next entity, once the whole query is scanned, them it will look at every batch demand annotated, invoke them, merge results and then resume the process

wilkerlucio16:04:18

its a bit different on parallel, because in parallel there is no clear "scan ending" due to to many things being happening at the same time, so for batch there it uses a time window to group items (default 10ms if I remember right), accumulate and run

Stefan21:04:04

@U066U8JQJ Thanks for your suggestion. Using mapv doesnā€™t seem to make a noticeable difference in this case unfortunately. I think this is the complete reproduction, expanding on the code above:

(require '[com.wsscode.pathom3.interface.eql :as p.eql])
  (require '[com.wsscode.pathom3.connect.operation :as pco])
  (require '[com.wsscode.pathom.viz.ws-connector.core :as pvc])
  (require '[com.wsscode.pathom.viz.ws-connector.pathom3 :as p.connector])

  (pco/defresolver all-organisations [_]
    {::pco/output [{:organisation/all [:organisation/id
                                       :organisation/name
                                       {:organisation/departments [:department/id]}]}]}
    {:organisation/all
     (mapv (fn [id] {:organisation/id id
                     :organisation/name (str "org-" id)
                     :organisation/departments [{:department/id 1
                                                 :department/name "department-1"}]})
           (range 1 1400))})


  (p.eql/process (-> (pci/register [all-organisations])
                     (p.connector/connect-env {::pvc/parser-id ::env}))
                 {}
                 [{:organisation/all [:organisation/id
                                      :organisation/name
                                      {:organisation/departments
                                       [:department/id :department/name]}]}])

Stefan21:04:03

This takes about 3 seconds on my machine. And 1400 entities is not that big a number of courseā€¦

wilkerlucio21:04:16

thanks for the repro, I'll take a closer look over this week

šŸ™ 1
Stefan10:04:37

@U066U8JQJ Did you get around to looking into this? Is there anything I can do to help you? (Besides learning the internals of Pathom for which Iā€™m afraid I donā€™t have enough time right now šŸ˜…)

wilkerlucio16:04:46

hello @UGNFXV1FA, sorry the delay, I recently got a pet and now a lot of my time have been dedicated to take care of her, rsrs. I did some checks, and the short answer is that this is expected overhead for Pathom. Im starting to write a long answer for this, because I think its good to have a well explained text on the performance characteristics of Pathom. for now, I can tell you if the focus is on raw processing performance, Pathom does add a considerable overhead when processing 1000s of records (about 0.5ms per entity). I saw raw processing because that's when we consider CPU time. But a lot of Pathom usage usually has the bottleneck on IO, and in those cases, using the parallel processor feels more like a garbage collector, something that you could in simple cases make it more performant manually, but when doing complex processes Pathom can optimize things in a way that would be unfeasible to do manually (and maintain), and this makes the Pathom overhead negligible. as I said, I'm going to prepare a long answer to explain why is that in more detail (why the overhead). at this point I think the only way to improve is on micro improvements (I even tried clj-fast, but for some reason it got slower when I did...), or changing the language or the data structures, I'm open to suggestions but at this point I don't see much to done in the short term

wilkerlucio16:04:36

I'm still going to play with the plan for sequences, but that is problematic too, because Pathom focus on dynamism and correctness, and Pathom doesn't expect a sequence of items to have regular data (which means we need a plan for each entity to ensure correctness)

Stefan19:04:56

Thanks for your clarifications @U066U8JQJ. For my case, focus is not necessarily ā€œraw performanceā€, but ā€œgood enough performanceā€ šŸ˜‰ The data that weā€™re querying (e.g. Ā± 1000 organisation names) is not that much, and when this processing time is 3ā€“5 seconds it is a bit off compared to the I/O. In other words, I/O is definitely not the bottleneck anymore. Of course I completely agree with the goal of correctness, and the dynamism, but Iā€™d expect this to be a quite common case. I donā€™t understand enough of the details to make good suggestions Iā€™m afraid, but is it worth thinking about some kind of hint (metadata) that we could pass to Pathom to allow it to make certain assumptions and be more efficient?

wilkerlucio19:04:19

yeah, that's something I'm down to try for sure (specific meta to improve)

wilkerlucio19:04:49

most of the time, since we are doing UI's render, the UI is paginated, so we can go fine loading like 50 items of each thing at time

wilkerlucio19:04:30

but for times where we need to work with larger lists, there is a scape hatch, you can mark something as final, and by doing so Pathom wont perform any sub-query parts (so its up to you to ensure all the data is there)

wilkerlucio19:04:13

so what you suggesting I see more in the middle terms, like trying to mark that a list is consistent, and have the plan doing only once for every item

wilkerlucio19:04:11

we have to test, I expect this optimization may get about 20% ~ 40% reduction in overhead, but there is still a lot Pathom has to do (even when there is nothing to do, Pathom has to figure that there is nothing to do, and that has some cost)

Stefan19:04:06

Yeah that makes sense. Let me know if/when you would like me to test something on our production code. The use case by the way is that we have a (paginated) table of data that contains an ā€œorganisationā€ column. This column has a filter (select box), and the values of that filter need to be populated by all possible organisation names.

wilkerlucio19:04:26

yeah, for that simple collection case (just name and id I guess) its easy to leverage the final, let me give you an example here (I just notice its not in the docs, just went on the queue for that ;))

wilkerlucio19:04:42

(pco/defresolver all-organisations [_]
  {::pco/output [{:organisation/all [:organisation/id
                                     :organisation/name
                                     {:organisation/departments [:department/id]}]}]}
  {:organisation/all
   (with-meta
    (mapv (fn [id]
           {:organisation/id          id
            :organisation/name        (str "org-" id)
            :organisation/departments [{:department/id   id
                                        :department/name "department-1"}]})
      (range 1 1400))
     {::pco/final true})})

wilkerlucio19:04:08

note you can't mark the root map as final, you have mark things inside of it (like in this example we maked the :organisation/all as final)

Stefan20:04:17

Iā€™m going to try this first thing tomorrow, thanks Wilker!

šŸ™ 1
Stefan08:04:30

@U066U8JQJ Using final makes a huge difference.

Stefan08:04:08

What would be the ā€œidiomaticā€ way of using it? Because it depends on the use case whether it should be treated as final or not. Should it be a query parameter that decides it?

wilkerlucio11:04:56

I suggest using a different attribute name for the final version, maybe something like :organization/all-for-menu-final, so when you refer to this attribute you know the constraints

Stefan12:04:21

Thatā€™s a good suggestion, thanks!