
thinking a lot about data fetching today. cache invalidation and garbage collection


in the past, i've been very interested in caching things in a graph store like apollo-graphql, fulcro, or building something out of pyramid


however, it complicates cache invalidation and eviction quite a bit


you have to carefully track references across queries, and make sure that eviction occurs in alignment with both the lifecycle of a request and the lifecycle of the components subscribing to the data
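that reference/lifecycle bookkeeping could be sketched like this in TypeScript (all names hypothetical, not any particular library's API): each cache entry counts its subscribers, and eviction only fires when the last component releases it.

```typescript
// Hypothetical refcounted cache (not any real library's internals):
// an entry is only evicted once the last subscribing component releases it.
class RefCountedCache<V> {
  private entries = new Map<string, { value: V; refs: number }>();

  set(key: string, value: V): void {
    const prev = this.entries.get(key);
    this.entries.set(key, { value, refs: prev ? prev.refs : 0 });
  }

  get(key: string): V | undefined {
    return this.entries.get(key)?.value;
  }

  // A mounting component subscribes; the returned fn is its cleanup.
  subscribe(key: string): () => void {
    const entry = this.entries.get(key);
    if (entry) entry.refs++;
    return () => {
      const e = this.entries.get(key);
      if (e && --e.refs <= 0) this.entries.delete(key); // last ref: evict
    };
  }
}
```

even this toy version shows the coupling: the cache can't evict on its own schedule, it has to hear about component unmounts.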


document stores like what react-query does seem much nicer in this regard


the potential upside of a graph store is that you can run graph queries against your cache, which means you can deduplicate a lot of fetching. but then you need to be able to relate a graph query to one or more requests for the data


I think the dedup is also important for keeping consistent state on screen. if you have multiple places holding the same data, then any update needs to know about the copies in the other stores, or the UI could show different state for the same entity in different components
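the single-source-of-truth point is easy to see in a tiny sketch (entity names made up): every result gets merged into one table keyed by id, so every view reads the same copy and one write updates them all.

```typescript
// Sketch of normalization: every fetched result is merged into one
// table keyed by entity id, so all views read the same copy.
type Entity = { id: string; [field: string]: unknown };

const table = new Map<string, Entity>();

function mergeEntity(e: Entity): void {
  table.set(e.id, { ...table.get(e.id), ...e });
}

// Two different "components" rendering the same entity:
const inboxRow = (id: string) => table.get(id)?.subject;
const detailPane = (id: string) => table.get(id)?.subject;

mergeEntity({ id: "email/7", subject: "Hello" });
mergeEntity({ id: "email/7", subject: "Hello (edited)" });
// one write, and both views agree on the new subject
```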


yeah having a way to revalidate/invalidate cached responses on mutation at least is important


react-query seems to get away with doing "clever" things with cache keys and creating groupings/associations between documents based on the keys
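the key-grouping trick could look roughly like this (an illustrative sketch, not react-query's actual internals): keys are arrays, and invalidating a prefix marks every query whose key starts with that prefix as stale.

```typescript
// Illustrative sketch of key-based grouping: invalidating a prefix
// marks every registered query under that prefix as stale.
type QueryKey = (string | number)[];

const queries: { key: QueryKey; stale: boolean }[] = [];

function registerQuery(key: QueryKey): void {
  queries.push({ key, stale: false });
}

function isPrefix(prefix: QueryKey, key: QueryKey): boolean {
  return prefix.length <= key.length && prefix.every((p, i) => key[i] === p);
}

// after a mutation, invalidate the whole group under a prefix
function invalidate(prefix: QueryKey): void {
  for (const q of queries) if (isPrefix(prefix, q.key)) q.stale = true;
}

registerQuery(["todos"]);
registerQuery(["todos", 7]);
registerQuery(["user", 1]);
invalidate(["todos"]); // both todo queries go stale; the user query doesn't
```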


So the trade-off between the two approaches would be: with om/pathom/fulcro you gain updates to entities in the global cache when doing mutations, at the expense of having to use EQL when talking to the server in order to optimistically resolve the query from the cache. react-query would be a named cache of the results of the promises we feed it. since it has no entity normalization, when doing a mutation it's your responsibility to somehow signal the library to invalidate the result of the query. And it's also the programmer's responsibility to name the queries consistently and use the same name/params when calling from different locations of the program, so the cache can be reused. Right?


@U0516053R one thing doesn't have to be tied to the other. for instance, you can have separate stores to facilitate cache expiration, but still use EQL to load the data, to avoid the issue of requiring a new endpoint for each different demand


nice article, although the title is misleading, in the end he teaches how to write a proper data fetch effect 🙂


another thought I was having about normalized data on the client: you can make dynamic decisions about what data to load. with the db mapped like fulcro does it (every entity normalized into a table), if you initially load a list of emails and later want to click to open details, by checking the local data you can skip the fields already loaded by the first query, optimizing that load based on cached data (the fulcro db). what is interesting is that if you want to load just an email detail from scratch, the client query remains the same and works the same way, just without skipping any fields (because the cache is empty).
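a minimal sketch of that "skip what's cached" idea (hypothetical names, not fulcro's actual API): trim a flat field query down to whatever the normalized table doesn't hold yet, so a cold cache sends the full query and a warm one only fetches the gaps.

```typescript
// Hypothetical sketch: compute which fields of a query are still
// missing from the normalized entity table. Cold cache: everything.
// Warm cache: only the detail-only fields.
type Entity = Record<string, unknown>;

const db = new Map<string, Entity>(); // fulcro-style entity table

function missingFields(id: string, query: string[]): string[] {
  const cached = db.get(id) ?? {};
  return query.filter((field) => !(field in cached));
}

const detailQuery = ["subject", "from", "body", "attachments"];

// warm the cache with what the list view already loaded
db.set("email/7", { subject: "Hi", from: "ana@example.com" });
```

calling `missingFields("email/7", detailQuery)` then only asks for the detail-only fields, while an id that was never loaded falls through to the full query.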


I read some of the react-query docs and was contrasting the fulcro/om db with the query cache mechanism. one focuses on normalization, while react-query worries more about refetching and the staleness/freshness (if that is a word) of the data


there are ways to think about it. in the normalized world it's like a graph of information with connected nodes: via a query you can "mask" the sections of the graph related to it, and via EQL you can decide to reload any of those parts at any time. this also means you can always render something from what you have locally, no matter why it was loaded. I think this removes a huge coordination issue. one challenge is how to clean things up, to ensure you don't remove anything that is still required by something else. reading the docs on react-query I see you have to name each query, which returns to the problem of local updates. do you know how they handle it there? from the docs I have only seen manual expiration by hitting the name of the request


It looks like the cacheTime governs the eviction of the cache. They are more concerned with the freshness of the data than with reuse of the cache. Makes sense to me, since they are not normalizing. In fulcro you can merge an entity with partially stale values, correct? From the explanation it looks like when the data is in the cache they return it and issue a refetch, and the data eviction happens after the query becomes inactive (no observers on it) and cacheTime has passed.
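that lifecycle can be compressed into a toy model (my reading of the docs, not react-query's code): fresh reads hit the cache, stale reads return the cached value and refetch behind it, and eviction waits until cacheTime passes with no observers.

```typescript
// Toy model of the lifecycle described above (not react-query's code):
// serve fresh reads from cache, serve-then-refetch on stale reads, and
// only evict once cacheTime elapses with no observers.
interface Entry<V> {
  value: V;
  fetchedAt: number;
  inactiveSince?: number;
}

class ToyQueryCache<V> {
  private entries = new Map<string, Entry<V>>();
  constructor(private staleTime: number, private cacheTime: number) {}

  read(key: string, now: number, fetcher: () => V): V {
    const e = this.entries.get(key);
    if (!e) {
      const value = fetcher(); // miss: fetch and cache
      this.entries.set(key, { value, fetchedAt: now });
      return value;
    }
    if (now - e.fetchedAt > this.staleTime) {
      const cached = e.value; // stale: serve cache, refresh behind it
      e.value = fetcher();    // stand-in for an async background refetch
      e.fetchedAt = now;
      return cached;
    }
    return e.value; // fresh: no network at all
  }

  markInactive(key: string, now: number): void {
    const e = this.entries.get(key);
    if (e) e.inactiveSince = now;
  }

  gc(now: number): void {
    for (const [key, e] of this.entries) {
      if (e.inactiveSince !== undefined && now - e.inactiveSince >= this.cacheTime) {
        this.entries.delete(key);
      }
    }
  }

  has(key: string): boolean {
    return this.entries.has(key);
  }
}
```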


this is easy when you're using graphql or pathom on your server, because 1 query = 1 or fewer requests (you can batch and deduplicate with introspection). it's less easy if you already have a REST HTTP API that you're retrofitting a graph query language on top of
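the dedup half of "1 or fewer requests" is usually an in-flight promise map, something like this sketch: concurrent callers asking for the same key share one pending request instead of fetching twice.

```typescript
// Sketch of request deduplication: concurrent callers asking for the
// same key share a single in-flight promise instead of fetching twice.
const inflight = new Map<string, Promise<unknown>>();

function dedupedFetch<T>(key: string, doFetch: () => Promise<T>): Promise<T> {
  const pending = inflight.get(key);
  if (pending) return pending as Promise<T>; // join the request in flight
  const p = doFetch().finally(() => inflight.delete(key));
  inflight.set(key, p);
  return p;
}
```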


I follow a dude who tweeted about a SQL caching engine. As a single programmer I have no need for graphql. I have the impression that it solves as many problems as it creates new ones, for which you'll need to find solutions yourself. I've talked to a developer who told me that my problem with graphql was that I wasn't using a caching layer. There's also the indirection between the SQL queries and the pathom/graphql queries: who is going to assure the query efficiency? The other thing: I think it's hard to think about trees. Our UIs are trees, so it's efficient for us to fetch the data as a whole, but we program each component individually. That fits our brain. Trees are hard.


Yesterday I was looking at tanstack table. They are building a v8 version of a table that is compatible with vue, solid, svelte, and react. I've cloned the examples, but it was not working. I'm shopping for data grids. The very minimum that I saw was . It comes with no css-in-js, which is very good, but all of them are based on Array. It would be very nice if we could fetch the data with transit and use a vector of maps, and hand the items to the crud for edit/delete when the data is already on the client.