#pathom
2023-04-29
jherrlin09:04:33

Hey, TL;DR: performance problems with 100+ resolvers. I have been hacking around with Pathom 3 to learn more about the data in my organisation; it has been fun and very educational. The data I'm trying to connect into a graph lives in databases, and I wrote some code to generate data sources (SQL queries) and resolvers for all the fields in all of the tables. It ended up being quite a lot of resolvers, around 200 at the moment. I experience problems with the planner at around 100 resolvers. The execution times, measured with time:

| Resolvers | Elapsed time        |
|-----------+---------------------|
|       194 | 281095.430791 msecs |
|       193 | 277340.871 msecs    |
|       192 | 286598.39025 msecs  |
|       175 | 236639.142042 msecs |
|       174 | 252806.485417 msecs |
|       111 | 18939.679042 msecs  |
|       108 | 18922.586416 msecs  |
|       105 | 15921.727917 msecs  |
|       102 | 15887.033625 msecs  |
|        99 | 16025.240708 msecs  |
|        96 | 15907.285917 msecs  |
|        93 | 4334.982333 msecs   |
|        90 | 3607.140167 msecs   |
|        87 | 588.06575 msecs     |
|        84 | 554.623 msecs       |
|        72 | 570.746792 msecs    |
|        69 | 557.793917 msecs    |
|        66 | 266.969459 msecs    |
|        21 | 199.753209 msecs    |
|        18 | 194.638125 msecs    |
|        15 | 198.238666 msecs    |
|        12 | 173.358125 msecs    |
|         9 | 168.249292 msecs    |
|         6 | 212.754333 msecs    |
|         3 | 169.075834 msecs    |
I'm not using any planner cache when taking these measurements. If I use a cache, the first execution takes a long time but then the rest runs fast. Can I somehow increase the speed of the planner? Is it possible to run the planner once, save that to a file, and load it at application startup? Is there anything else I could do to speed it up? Thx!

wilkerlucio17:04:53

hello @UAEV0STAA, it's recommended that you use at least a memory cache so you get that slow first run but fast processing afterwards. if you want to persist the cache on disk to avoid that first one, you can also do it by implementing a custom cache store that saves and loads from disk
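
For reference, a minimal sketch of the in-memory setup (per the Pathom 3 cache docs, a plain atom works as a plan cache store; registry stands in for your generated resolvers):

(ns example.cache
  (:require
   [com.wsscode.pathom3.connect.indexes :as pci]
   [com.wsscode.pathom3.connect.planner :as pcp]
   [com.wsscode.pathom3.interface.eql :as p.eql]))

(def registry []) ; hypothetical: your generated resolvers go here

(def app-env
  (-> (pci/register registry)
      ;; an atom is a valid plan cache store out of the box: the first
      ;; query of a given shape pays the planning cost, later ones
      ;; reuse the cached plan
      (assoc ::pcp/plan-cache* (atom {}))))

(comment
  ;; first call plans and caches, subsequent calls skip planning
  (p.eql/process app-env [:player/id]))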

wilkerlucio17:04:11

it's probably a good idea to make that cache store also read from memory, so you don't have to keep hitting the disk (load from disk once, save in some atom, re-use when the cache key is the same)
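
A rough sketch of that memory-plus-disk idea, assuming the CacheStore protocol shape from the docs (-cache-lookup-or-miss) and that the cached plans in your setup serialize cleanly as EDN, which is worth verifying before relying on it:

(ns example.disk-cache
  (:require
   [clojure.edn :as edn]
   [clojure.java.io :as io]
   [com.wsscode.pathom3.cache :as p.cache]))

(defn load-cache-file
  "Read previously persisted plans from disk, or start empty."
  [file]
  (if (.exists (io/file file))
    (edn/read-string (slurp file))
    {}))

;; all reads go through the atom, so the disk is only touched on a miss
(defrecord DiskBackedCache [cache* file]
  p.cache/CacheStore
  (-cache-lookup-or-miss [_ _env cache-key f]
    (if-let [entry (find @cache* cache-key)]
      (val entry)
      (let [value (f)]
        (swap! cache* assoc cache-key value)
        ;; naive persistence: rewrite the whole file on every miss
        (spit file (pr-str @cache*))
        value))))

(defn disk-backed-cache [file]
  (->DiskBackedCache (atom (load-cache-file file)) file))

It would plug in as (assoc env ::pcp/plan-cache* (disk-backed-cache "plan-cache.edn")); once the common query shapes are cached, the misses (and disk writes) stop.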

wilkerlucio17:04:50

here you can find some examples of how to write a custom cache store: https://pathom3.wsscode.com/docs/cache

wilkerlucio17:04:04

let me know if you need any help to make one for the planner

jherrlin17:04:40

Thank you for the input! Hopefully I have time tomorrow to read it and try it out. I’ll get back asap 😃

nivekuil00:04:11

is the planning time supposed to grow like that? that would mean there's a pretty small practical limit on the # of resolvers regardless of caching

wilkerlucio11:04:57

hello @U797MAJ8M, this growth is related to the number of resolvers that participate in the same planning. @UAEV0STAA pointed out he has one resolver per attribute there, which is not ideal; bundling them into fewer resolvers would probably be quite a bit better (maybe one per table? but it really depends on how things fit together)
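
To illustrate the difference with made-up table and attribute names: per-attribute resolvers give the planner one node per field, while a per-table resolver covers the whole row in one node (and one query):

(ns example.resolvers
  (:require
   [com.wsscode.pathom3.connect.operation :as pco]))

;; hypothetical row fetch; imagine one SQL query per call
(defn fetch-player-row [id]
  {:name "Lea" :team "Blue"})

;; one resolver per attribute: every field is a separate node for the
;; planner to consider (and a separate query at run time)
(pco/defresolver player-name [{:player/keys [id]}]
  {:player/name (:name (fetch-player-row id))})

(pco/defresolver player-team [{:player/keys [id]}]
  {:player/team (:team (fetch-player-row id))})

;; one resolver per table: a single node covers the whole row
(pco/defresolver player-row [{:player/keys [id]}]
  {::pco/output [:player/name :player/team]}
  (let [row (fetch-player-row id)]
    {:player/name (:name row)
     :player/team (:team row)}))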

jherrlin14:04:34

In my specific case I decided to remove resolvers for some database attributes, as they are not identifiers for anything. My take now will be to try to "record" a couple of known queries to build up the cache and then persist it so it can be reused over application restarts.
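
A sketch of that recording idea, reusing a disk-backed plan cache store like the one above (known-queries is hypothetical, and note that warming this way executes the resolvers too, not only the planner):

(ns example.warmup
  (:require
   [com.wsscode.pathom3.interface.eql :as p.eql]))

;; hypothetical query shapes; in practice these would be captured
;; from real traffic
(def known-queries
  [[{[:player/id 1] [:player/name :player/team]}]
   [:server-a/players]])

(defn warm-plan-cache!
  "Run each known query once so the planner caches its plan; with a
  disk-backed plan cache store the plans then survive restarts."
  [env]
  (doseq [query known-queries]
    (p.eql/process env query)))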

nivekuil23:04:26

still, don't you think the growth from 87->96 is unusually sharp? I wonder why there's such an inflection point. I could see 90 resolvers being hit in some big queries

wilkerlucio23:04:39

I don't think you can use this number in a generalized way, it really depends on the relations, the number of repetitions for the same attribute, nested inputs, optional inputs…

nivekuil23:04:56

yeah, I guess you would have to print out the planning steps to really see where the problem might be

nivekuil23:04:19

I just think the cache warming strategy sounds like an unfortunate hack

wilkerlucio19:05:55

I really don't see that as a hack, the design expects recurrent queries in cases that require more performance, and the fact that the planner can cache the plan for those recurrent transactions is what gives me wiggle room for its processing, which can get quite complex depending on the user setup

nivekuil22:05:55

ah sorry I misread, thought @UAEV0STAA was having to modify his queries, like making smaller ones and then joining them together outside of pathom for some reason

nivekuil22:05:52

but good to know that async planning is within design 🙂

Mark Wardle11:05:03

Hi @UAEV0STAA - if you have a resolver per attribute, doesn't that mean that for any given entity you will have (n) queries being executed, where (n) is the number of attributes in that entity? I generally use a resolver per fetch, and sometimes will fetch across tables to reduce needless database hits when I know one attribute is usually needed with another, even when nested, or when I want to build my property graph in a slightly different shape to the underlying database.
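
A sketch of that per-fetch style with made-up tables: one resolver issues a joined query and returns several attributes at once, including a nested entity:

(ns example.per-fetch
  (:require
   [com.wsscode.pathom3.connect.operation :as pco]))

;; hypothetical joined query across the player and team tables
(defn fetch-player-with-team [id]
  {:name "Lea" :team-id 7 :team-name "Blue"})

;; one database hit resolves several attributes, including a nested
;; entity shaped differently from the underlying tables
(pco/defresolver player-with-team [{:player/keys [id]}]
  {::pco/output [:player/name
                 {:player/team [:team/id :team/name]}]}
  (let [row (fetch-player-with-team id)]
    {:player/name (:name row)
     :player/team {:team/id   (:team-id row)
                   :team/name (:team-name row)}}))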

jherrlin13:05:30

@U013CFKNP2R yeah, I get multiple resolvers for the same entity and not all of the resolvers are identifiers for the entity. My approach here was maybe a bit naive, as I thought I'd just generate resolvers for everything. I have this map where I translate attributes into the graph. And at first I didn't know if I would query it from :ServerA.Player/id or :player/id

{:ServerA.Player/id :player/id ; maps attribute into the graph
 :ServerA.Player/name nil}     ; doesnt map attribute into the graph
I guess I could model this much better, but it would take me time; the main takeaway was to get familiar with our domain data by connecting it into the graph. And I do this via this map.
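
For a mapping like that, Pathom 3's built-in alias resolvers might cover the translation step; a sketch that turns the non-nil entries into resolvers (pbir/alias-resolver is one-directional; there is also pbir/equivalence-resolver if the match should go both ways):

(ns example.aliases
  (:require
   [com.wsscode.pathom3.connect.built-in.resolvers :as pbir]))

(def attribute->graph
  {:ServerA.Player/id   :player/id
   :ServerA.Player/name nil})

(defn alias-resolvers
  "Build an alias resolver for every attribute that maps into the
  graph; nil entries stay local to their source."
  [mapping]
  (for [[from to] mapping
        :when to]
    (pbir/alias-resolver from to)))

;; registered alongside the generated resolvers, e.g.
;; (pci/register (into [] (alias-resolvers attribute->graph)))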

jherrlin17:05:44

To make it more clear: I generate resolvers for every attribute, but the map tells which attributes contain the same value and can match. And in most cases those mappings will be my starting point, but not always.

Mark Wardle20:05:33

I have 208 resolvers/operations currently and have no performance issues. I don't think I quite understand how you've set up your resolvers per attribute, sorry.

Mark Wardle09:05:33

Happy to look at some code if you have it…

kendall.buchanan17:05:36

I’ll add, too, that we have well over 100 resolvers and are not seeing any kind of degradation like that in performance. In production we absolutely have the planner cache turned on, but in development, the planner’s impact is still negligible.

👍 1
jherrlin19:05:42

At the moment the code is very coupled with our data and I don't think it's a good idea to share it as is. Since I turned on the cache it works just fine for me.

👍 1
Eric Dvorsak23:01:33

@U0HJA5ZQT I think it really depends on the resolvers' inputs/outputs, in my case the planning alone is 500ms 😞