#pathom
2023-04-29
jherrlin09:04:33

Hey, TL;DR: performance problems with 100+ resolvers. I have been hacking around with Pathom 3 to learn more about the data in my organisation; it has been fun and very educational. The data I'm trying to connect into a graph lives in databases, and I wrote some code to generate data sources (SQL queries) and resolvers for all the fields in all of the tables. It ended up being quite a lot of resolvers, around 200 at the moment. I experience problems with the planner at around 100 resolvers. The execution times, measured with time:

| Resolvers | Elapsed time        |
|-----------+---------------------|
|       194 | 281095.430791 msecs |
|       193 | 277340.871 msecs    |
|       192 | 286598.39025 msecs  |
|       175 | 236639.142042 msecs |
|       174 | 252806.485417 msecs |
|       111 | 18939.679042 msecs  |
|       108 | 18922.586416 msecs  |
|       105 | 15921.727917 msecs  |
|       102 | 15887.033625 msecs  |
|        99 | 16025.240708 msecs  |
|        96 | 15907.285917 msecs  |
|        93 | 4334.982333 msecs   |
|        90 | 3607.140167 msecs   |
|        87 | 588.06575 msecs     |
|        84 | 554.623 msecs       |
|        72 | 570.746792 msecs    |
|        69 | 557.793917 msecs    |
|        66 | 266.969459 msecs    |
|        21 | 199.753209 msecs    |
|        18 | 194.638125 msecs    |
|        15 | 198.238666 msecs    |
|        12 | 173.358125 msecs    |
|         9 | 168.249292 msecs    |
|         6 | 212.754333 msecs    |
|         3 | 169.075834 msecs    |
I'm not using any planner cache when taking these measurements. If I use a cache, the first execution takes a long time but then the rest runs fast. Can I somehow increase the speed of the planner? Is it possible to run the planner once, save that to a file, and load it at application startup? Is there anything else I could do to speed it up? Thx!

wilkerlucio17:04:53

hello @UAEV0STAA, it's recommended that you use at least a memory cache so you get that slow first run but fast processing afterwards. if you want to persist the cache on disk to avoid that first one, you can also do it by implementing a custom cache store that saves and loads from disk
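
For reference, a minimal sketch of the in-memory setup (per the Pathom 3 cache docs, a plain atom works as a plan cache store; registry stands in for your generated resolvers):

(ns example.cache
  (:require
   [com.wsscode.pathom3.connect.indexes :as pci]
   [com.wsscode.pathom3.connect.planner :as pcp]
   [com.wsscode.pathom3.interface.eql :as p.eql]))

(def registry []) ; hypothetical: your generated resolvers go here

(def app-env
  (-> (pci/register registry)
      ;; an atom is a valid plan cache store out of the box: the first
      ;; query of a given shape pays the planning cost, later ones
      ;; reuse the cached plan
      (assoc ::pcp/plan-cache* (atom {}))))

(comment
  ;; first call plans and caches, subsequent calls skip planning
  (p.eql/process app-env [:player/id]))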

wilkerlucio17:04:11

it's probably a good idea to make that cache store also read from memory, so you don't have to keep hitting the disk (load from disk once, save in some atom, re-use when the cache key is the same)
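
A rough sketch of that memory-plus-disk idea, assuming the CacheStore protocol shape from the docs (-cache-lookup-or-miss) and that the cached plans in your setup serialize cleanly as EDN, which is worth verifying before relying on it:

(ns example.disk-cache
  (:require
   [clojure.edn :as edn]
   [clojure.java.io :as io]
   [com.wsscode.pathom3.cache :as p.cache]))

(defn load-cache-file
  "Read previously persisted plans from disk, or start empty."
  [file]
  (if (.exists (io/file file))
    (edn/read-string (slurp file))
    {}))

;; all reads go through the atom, so the disk is only touched on a miss
(defrecord DiskBackedCache [cache* file]
  p.cache/CacheStore
  (-cache-lookup-or-miss [_ _env cache-key f]
    (if-let [entry (find @cache* cache-key)]
      (val entry)
      (let [value (f)]
        (swap! cache* assoc cache-key value)
        ;; naive persistence: rewrite the whole file on every miss
        (spit file (pr-str @cache*))
        value))))

(defn disk-backed-cache [file]
  (->DiskBackedCache (atom (load-cache-file file)) file))

It would plug in as (assoc env ::pcp/plan-cache* (disk-backed-cache "plan-cache.edn")); once the common query shapes are cached, the misses (and disk writes) stop.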

wilkerlucio17:04:50

here you can find some examples of how to write a custom cache store: https://pathom3.wsscode.com/docs/cache

wilkerlucio17:04:04

let me know if you need any help to make one for the planner

jherrlin17:04:40

Thank you for the input! Hopefully I have time tomorrow to read it and try it out. I’ll get back asap 😃

nivekuil00:04:11

is the planning time supposed to grow like that? that would mean there's a pretty small practical limit on the # of resolvers regardless of caching

wilkerlucio11:04:57

hello @U797MAJ8M, this growth is related to the number of resolvers that participate in the same planning. @UAEV0STAA pointed out he has one resolver per attribute there, which is not ideal; bundling them into fewer resolvers would probably be quite a bit better (maybe one per table? but it really depends on how things fit together)
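
To illustrate the difference with made-up table and attribute names: per-attribute resolvers give the planner one node per field, while a per-table resolver covers the whole row in one node (and one query):

(ns example.resolvers
  (:require
   [com.wsscode.pathom3.connect.operation :as pco]))

;; hypothetical row fetch; imagine one SQL query per call
(defn fetch-player-row [id]
  {:name "Lea" :team "Blue"})

;; one resolver per attribute: every field is a separate node for the
;; planner to consider (and a separate query at run time)
(pco/defresolver player-name [{:player/keys [id]}]
  {:player/name (:name (fetch-player-row id))})

(pco/defresolver player-team [{:player/keys [id]}]
  {:player/team (:team (fetch-player-row id))})

;; one resolver per table: a single node covers the whole row
(pco/defresolver player-row [{:player/keys [id]}]
  {::pco/output [:player/name :player/team]}
  (let [row (fetch-player-row id)]
    {:player/name (:name row)
     :player/team (:team row)}))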

jherrlin14:04:34

In my specific case I decided to remove resolvers for some database attributes, as they are not identifiers for anything. My take now will be to try to "record" a couple of known queries to build up the cache and then persist it so it can be reused over application restarts.
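
A sketch of that recording idea, reusing a disk-backed plan cache store like the one above (known-queries is hypothetical, and note that warming this way executes the resolvers too, not only the planner):

(ns example.warmup
  (:require
   [com.wsscode.pathom3.interface.eql :as p.eql]))

;; hypothetical query shapes; in practice these would be captured
;; from real traffic
(def known-queries
  [[{[:player/id 1] [:player/name :player/team]}]
   [:server-a/players]])

(defn warm-plan-cache!
  "Run each known query once so the planner caches its plan; with a
  disk-backed plan cache store the plans then survive restarts."
  [env]
  (doseq [query known-queries]
    (p.eql/process env query)))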

nivekuil23:04:26

still, don't you think the growth from 87->96 is unusually sharp? I wonder why there's such an inflection point. I could see 90 resolvers being hit in some big queries

wilkerlucio23:04:39

I don't think you can use this number in a generalized way, it really depends on the relations, the number of repetitions for the same attribute, nested inputs, optional inputs…

nivekuil23:04:56

yeah, I guess you would have to print out the planning steps to really see where the problem might be

nivekuil23:04:19

I just think the cache warming strategy sounds like an unfortunate hack

wilkerlucio19:05:55

I really don't see that as a hack, the design expects recurrent queries in cases that require more performance, and the fact that the planner can cache the plan for those recurrent transactions is what gives me wiggle room for its processing, which can get quite complex depending on the user setup

nivekuil22:05:55

ah sorry I misread, thought @UAEV0STAA was having to modify his queries, like making smaller ones and then joining them together outside of pathom for some reason

nivekuil22:05:52

but good to know that async planning is within design 🙂

Mark Wardle11:05:03

Hi @UAEV0STAA - if you have a resolver per attribute, doesn't that mean that for any given entity you will have (n) queries being executed, where (n) is the number of attributes in that entity? I generally use a resolver per fetch, and sometimes will fetch across tables to reduce needless database hits when I know one attribute is usually needed with another, even when nested, or when I want to build my property graph in a slightly different shape to the underlying database.
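
A sketch of that per-fetch style with made-up tables: one resolver issues a joined query and returns several attributes at once, including a nested entity:

(ns example.per-fetch
  (:require
   [com.wsscode.pathom3.connect.operation :as pco]))

;; hypothetical joined query across the player and team tables
(defn fetch-player-with-team [id]
  {:name "Lea" :team-id 7 :team-name "Blue"})

;; one database hit resolves several attributes, including a nested
;; entity shaped differently from the underlying tables
(pco/defresolver player-with-team [{:player/keys [id]}]
  {::pco/output [:player/name
                 {:player/team [:team/id :team/name]}]}
  (let [row (fetch-player-with-team id)]
    {:player/name (:name row)
     :player/team {:team/id   (:team-id row)
                   :team/name (:team-name row)}}))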

jherrlin13:05:30

@U013CFKNP2R yeah, I get multiple resolvers for the same entity and not all of the resolvers are identifiers for the entity. My approach here was maybe a bit naive, as I thought I'd just generate resolvers for everything. I have this map where I translate attributes into the graph. And at first I didn't know if I would query it from :ServerA.Player/id or :player/id

{:ServerA.Player/id :player/id ; maps attribute into the graph
 :ServerA.Player/name nil}     ; doesnt map attribute into the graph
I guess I could model this much better, but it would take me time; the main takeaway was to get familiar with our domain data by connecting it into the graph. And I do this via this map.
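
For a mapping like that, Pathom 3's built-in alias resolvers might cover the translation step; a sketch that turns the non-nil entries into resolvers (pbir/alias-resolver is one-directional; there is also pbir/equivalence-resolver if the match should go both ways):

(ns example.aliases
  (:require
   [com.wsscode.pathom3.connect.built-in.resolvers :as pbir]))

(def attribute->graph
  {:ServerA.Player/id   :player/id
   :ServerA.Player/name nil})

(defn alias-resolvers
  "Build an alias resolver for every attribute that maps into the
  graph; nil entries stay local to their source."
  [mapping]
  (for [[from to] mapping
        :when to]
    (pbir/alias-resolver from to)))

;; registered alongside the generated resolvers, e.g.
;; (pci/register (into [] (alias-resolvers attribute->graph)))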

jherrlin17:05:44

To make it more clear: I generate resolvers for every attribute, but the map tells which attributes contain the same value and can match. And in most cases those mappings will be my starting point, but not always.

Mark Wardle20:05:33

I have 208 resolvers/operations currently and have no performance issues. I don't think I quite understand how you've set up your resolvers per attribute, sorry.

Mark Wardle09:05:33

Happy to look at some code if you have it…

kendall.buchanan17:05:36

I’ll add, too, that we have well over 100 resolvers and are not seeing any kind of degradation like that in performance. In production we absolutely have the planner cache turned on, but in development, the planner’s impact is still negligible.

👍 1
jherrlin19:05:42

At the moment the code is very coupled with our data and I don't think it's a good idea to share it as is. Since I turned on the cache it works just fine for me.

👍 1
Eric Dvorsak23:01:33

@U0HJA5ZQT I think it really depends on the resolvers' inputs/outputs, in my case the planning alone is 500ms 😞