Is there a way to load my datoms directly into datahike without having to write a migration script?
How do you do it at the moment? I am not sure what you mean by a migration script. There is load-entities, which allows you to do a raw import from another database.
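For reference, a raw import with load-entities might look roughly like the sketch below. It assumes the datoms are available as `[e a v tx added?]` vectors. The store config, the attribute `:name`, and the entity and transaction ids are made up for illustration, and config keys can differ between Datahike versions, so treat this as a starting point rather than the definitive call sequence.

```clojure
(require '[datahike.api :as d])

;; Hypothetical target database; an in-memory store is used only for illustration.
;; :schema-flexibility :read avoids having to define a schema for this toy import.
(def cfg {:store {:backend :mem :id "import-target"}
          :schema-flexibility :read})

(d/create-database cfg)
(def conn (d/connect cfg))

;; Hypothetical datoms in [e a v tx added?] form, e.g. as read from an export.
(def datoms
  [[1 :name "Alice" 536870913 true]
   [2 :name "Bob"   536870913 true]])

;; Raw import of the datoms into the target connection.
(d/load-entities conn datoms)
```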
I think that's what I need. I was trying sherpa and not winning.
Second question: my CBOR backup is ~100MB, but the Postgres database is 16GB. Is this normal?
@alekcz360 you can apply (gc! @conn) to collect all snapshots that are older than the current one; that should bring the 16GB down a lot: https://github.com/replikativ/datahike/blob/main/src/datahike/experimental/gc.cljc#L45
This is because all intermediate snapshots after each transaction are preserved by default and accessible to all distributed readers as long as they stick around. The hitchhiker-tree was more efficient in handling write operations, but we have not yet ported its functionality over to the persistent-sorted-set. Datomic uses a transaction log overlay before writing to the indices. This has the advantage of creating fewer index fragments on copy-on-write operations, but requires coordinating with the transactor explicitly to fetch the latest log, which induces a lot of complexity in the Datomic design, from what I understand. The hitchhiker-tree has the logs integrated fractally into each node, which is in a sense optimal, but there is a subtle trade-off between reapplying changes on every read operation and applying them once at write time to produce an optimal B-tree. Currently the latter is happening without the logs, at the cost of more redundancy in storage usage.
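To make the storage growth concrete, here is a toy model of copy-on-write snapshots. It is not how Datahike or the persistent-sorted-set is implemented; the names `empty-store`, `transact` and `gc` are invented for this sketch, and "nodes" are just integer ids. Each transaction writes fresh copies of a few nodes and shares the rest with the previous snapshot, and nothing is deleted until a collection pass keeps only what the latest snapshot references.

```clojure
(require '[clojure.set :as set])

;; Toy store: every node ever written, plus one set of node ids per snapshot.
(def empty-store {:nodes #{} :snapshots [] :next-id 0})

(defn transact
  "Simulates one copy-on-write transaction: `rewritten` nodes of the latest
  snapshot are replaced by fresh copies, all other nodes are shared."
  [{:keys [nodes snapshots next-id]} rewritten]
  (let [prev     (or (peek snapshots) #{})
        fresh    (set (range next-id (+ next-id rewritten)))
        kept     (set (drop rewritten (sort prev)))
        snapshot (set/union kept fresh)]
    {:nodes     (set/union nodes fresh)   ; old nodes are never deleted here
     :snapshots (conj snapshots snapshot)
     :next-id   (+ next-id rewritten)}))

(defn gc
  "Keeps only the nodes referenced by the latest snapshot."
  [{:keys [snapshots] :as store}]
  (let [latest (peek snapshots)]
    (assoc store :nodes latest :snapshots [latest])))

;; Initial load of 10 nodes, then 100 transactions rewriting 3 nodes each.
(def after-writes
  (reduce transact (transact empty-store 10) (repeat 100 3)))

(count (:nodes after-writes))      ; 310 nodes stored in total
(count (:nodes (gc after-writes))) ; 10 nodes left after collection
```

The current snapshot never holds more than 10 nodes, yet the store keeps 310 until collection runs, which is the same kind of gap as between a compact backup and a much larger backing store.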
So for now you can just invoke gc! at regular intervals, and it should keep the storage usage much lower.
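One possible way to run gc! at regular intervals on the JVM is a small scheduling helper like the sketch below. `run-periodically!` is a hypothetical helper written for this example, not part of Datahike. The commented usage assumes `conn` is an open connection and calls gc! exactly as shown earlier in the thread, without assuming anything about its return value.

```clojure
(import '(java.util.concurrent Executors ScheduledExecutorService TimeUnit))

(defn run-periodically!
  "Runs the zero-argument function `task` every `interval` `unit`s on a
  background thread. Returns the executor; call .shutdown on it to stop."
  [task interval ^TimeUnit unit]
  (let [^ScheduledExecutorService executor (Executors/newSingleThreadScheduledExecutor)
        ;; An uncaught exception would cancel all future runs, so log and continue.
        safe-task (fn []
                    (try
                      (task)
                      (catch Throwable t
                        (println "periodic task failed:" (.getMessage t)))))]
    (.scheduleAtFixedRate executor ^Runnable safe-task
                          (long interval) (long interval) unit)
    executor))

(comment
  ;; Assumes `conn` is an open Datahike connection.
  (require '[datahike.experimental.gc :refer [gc!]])
  (def gc-executor (run-periodically! #(gc! @conn) 1 TimeUnit/HOURS))
  ;; Stop the periodic collection again:
  (.shutdown gc-executor))
```

The one hour interval is arbitrary; how often to collect depends on the write volume and on how long readers need older snapshots to stay available.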
Thanks. I'll give that a shot