2026-04-26 datahike | Clojure Slack Archive

datahike

grounded_sage 2026-04-26T00:11:43.424999Z

What are the known trade-offs with the immutable git-like semantics? I might have done something wrong, but I noticed that about 1 GB of data blew up to around 80 GB when it was transacted in thousands of transactions instead of batched. I know we get great read distribution, but is the trade-off that we pay more in storage for write-heavy applications?

whilo 2026-04-27T01:03:26.963459Z

How many commits did you make? If you transact in very small batches, you will get inflated storage usage. There are different ways to mitigate this: one is to use the online gc during imports or bulk loads, or to run gc in general if you don't need the git history anymore.
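(Editor's sketch, not part of the original thread.) A rough way to see why many small commits inflate storage: in a copy-on-write tree, every commit writes fresh copies of each touched leaf plus the path up to the root, and without gc the old copies stay on disk. The toy model below assumes sequentially appended keys and made-up `leaf_cap` and `fanout` values; it is not Datahike's actual index layout, and all function names are hypothetical.

```python
def tree_levels(n_keys, leaf_cap, fanout):
    """Node count per level (leaves first) for a tree holding n_keys keys."""
    levels = [max(1, -(-n_keys // leaf_cap))]  # ceil division
    while levels[-1] > 1:
        levels.append(-(-levels[-1] // fanout))
    return levels


def nodes_written_by_commit(lo, hi, n_after, leaf_cap, fanout):
    """Nodes copied when keys lo..hi (inclusive) are appended in one commit."""
    written = 0
    first, last = lo // leaf_cap, hi // leaf_cap
    for _ in tree_levels(n_after, leaf_cap, fanout):
        written += last - first + 1  # touched nodes on this level
        first, last = first // fanout, last // fanout
    return written


def simulate(n_keys, batch, leaf_cap=4, fanout=4):
    """Return (nodes written over all commits, nodes live in the final tree)."""
    total = 0
    for lo in range(0, n_keys, batch):
        hi = min(lo + batch, n_keys) - 1
        total += nodes_written_by_commit(lo, hi, hi + 1, leaf_cap, fanout)
    live = sum(tree_levels(n_keys, leaf_cap, fanout))
    return total, live


# Assumption: toy sizes chosen only to keep the numbers easy to check by hand.
print(simulate(16, 16))  # one big commit      -> (5, 5)
print(simulate(16, 1))   # one key per commit  -> (28, 5)
```

In this model a single bulk commit writes exactly the live tree, while one-key commits write several times as many nodes; the surplus is what gc can reclaim once the old snapshots are no longer needed.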

whilo 2026-04-27T01:06:53.327239Z

If you can give me a way to reproduce then I can also take a closer look at your example.

whilo 2026-04-27T01:07:53.520589Z

https://github.com/replikativ/datahike/blob/main/doc/gc.md?plain=1#L72

whilo 2026-04-27T01:09:38.456609Z

To free up storage as a one-off, running gc should be enough. Note that gc does remove your branch history, though (not the history-db; that is internal if you have activated it).

grounded_sage 2026-04-26T04:10:19.982659Z

My assumption is I have done something very wrong. Or it's some local hacks I have made along the way.

whilo 2026-04-27T07:42:55.467419Z

Did you have the history support on in the db config?

grounded_sage 2026-04-27T09:13:09.413559Z

Yes, I have history support on. It was 11k+ transactions. I'll run the gc since I don't need the history. I just didn't know it would grow so much.

whilo 2026-04-27T09:16:59.323859Z

The online gc will make it behave more like a mutable database, overwriting and removing history immediately. You can also optionally kick off the offline gc at regular intervals; I think @alekcz360 is doing this now. I haven't implemented that by default yet because I first wanted to provide the online gc to make bulk imports as efficient as possible, and it is easy enough to do explicitly in a loop, but there should probably be a convenience function for this.

👍 1

grounded_sage 2026-04-27T10:24:04.782099Z

Brought it from 98 GB down to 2 GB.

whilo 2026-04-27T18:46:59.129649Z

good 🙂
