#datascript
2024-01-27
Mark Wardle 22:01:40

Hi all. I've been looking at the new storage options - does anyone have experience of how that would work with large databases? I see data will only be loaded on demand, but does that mean that in a long-running process eventually all the data will end up in memory? I can see the garbage collection code, but that looks as if it clears deleted data, rather than data that isn't being used?

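For a concrete picture of the feature being discussed, here is a minimal sketch of a file-backed storage attached to a connection. It assumes the IStorage protocol in datascript.storage (with -store/-restore methods) and the :storage option to d/create-conn, per the project's storage docs; exact names and signatures are worth double-checking there.

```clojure
(require '[clojure.edn :as edn]
         '[clojure.java.io :as io]
         '[datascript.core :as d]
         '[datascript.storage :refer [IStorage]])

;; One EDN file per B-tree node, named by the node's address.
;; Sketch only: a real backend would also implement whatever
;; methods the storage GC needs (listing/deleting addresses).
(defn file-storage [dir]
  (.mkdirs (io/file dir))
  (reify IStorage
    (-store [_ addr+data-seq]
      (doseq [[addr data] addr+data-seq]
        (spit (io/file dir (str addr)) (pr-str data))))
    (-restore [_ addr]
      (edn/read-string (slurp (io/file dir (str addr)))))))

;; Nodes are written out on transact and read back lazily on access.
(def conn
  (d/create-conn {} {:storage (file-storage "/tmp/datascript-demo")}))

(d/transact! conn [{:name "Alice"} {:name "Bob"}])
```
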
rutledgepaulv 17:01:13

My understanding is that the persistent sorted sets load data from disk lazily as the nodes of the tree are accessed. Once loaded, the data is held in soft refs (by default) or optionally in weak refs. That means the garbage collector can reclaim that space when nothing else is holding a strong reference to it (like the literal results of a query). If there's memory pressure and the space ends up being reclaimed by the GC, then the next time that section of the persistent sorted set is accessed it'll be read from disk again.

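To make the mechanism concrete (a toy illustration of the pattern, not DataScript's actual code): a loader that caches nodes behind java.lang.ref.SoftReference, so the JVM GC can reclaim them under memory pressure and the next access falls back to disk.

```clojure
(import '[java.lang.ref SoftReference])

;; `load-node` stands in for reading a tree node from storage.
(defn soft-cached-loader [load-node]
  (let [cache (atom {})] ; addr -> SoftReference
    (fn [addr]
      (let [^SoftReference r (get @cache addr)
            cached (when r (.get r))] ; nil if cleared by the GC
        (or cached
            ;; Never loaded, or reclaimed under memory pressure:
            ;; read from disk again and re-cache.
            (let [node (load-node addr)]
              (swap! cache assoc addr (SoftReference. node))
              node))))))

;; Hypothetical usage against the file layout sketched above.
(def fetch (soft-cached-loader #(slurp (str "/tmp/datascript-demo/" %))))
```

Swapping SoftReference for java.lang.ref.WeakReference gives the weak-ref behaviour: entries become collectable as soon as nothing strong points at them, regardless of memory pressure.
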
rutledgepaulv 17:01:10

The garbage collection process you're referring to is a separate feature, not directly related to the soft/weak refs and the JVM garbage collector (which is what I'm describing). That feature is about removing excess addresses that can accumulate in storage.

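For reference, the storage-level GC Paul mentions is invoked explicitly. If memory serves it lives in datascript.storage as collect-garbage!, but treat the exact name as an assumption and check the storage docs; my-storage below is the storage value from the earlier sketch.

```clojure
(require '[datascript.storage :as storage])

;; The tree is persistent (path copying), so each transaction writes
;; fresh nodes while nodes of superseded tree versions linger in
;; storage. This walks the live roots, keeps reachable addresses and
;; deletes the rest. It is independent of the soft/weak-ref caching.
(storage/collect-garbage! my-storage)
```
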
Niki 21:01:16

^ what Paul said, 100% correct

Mark Wardle 21:01:41

Perfect. Thank you both.