datalevin

Stephen Castro-Starkey 2025-03-18T14:01:46.373099Z

Hi! I have found some fascinating behavior in a datalevin collection -- I fetched a list of 5 small-ish objects out of a kv store and ran clojure.walk/stringify-keys on it. The call burned through 16GB of RAM and spent over 5 minutes processing. I caught this stack trace repeatedly when running kill -3 on the process:

"nREPL-session-cd302eab-88bc-41f7-9ff3-9f66423bc460" #66 [974563] daemon prio=5 os_prio=0 cpu=200025.86ms elapsed=220.03s tid=0x000076310c0063f0 nid=974563 runnable  [0x00007631179fc000]
   java.lang.Thread.State: RUNNABLE
	at clojure.core$deref.invokeStatic(core.clj:2337)
	at clojure.core$deref.invoke(core.clj:2323)
	at datalevin.spill.SpillableVector.cons(spill.clj:115)
	at datalevin.spill.SpillableVector.cons(spill.clj)
	at clojure.lang.RT.conj(RT.java:697)
	at clojure.core$conj__5474.invokeStatic(core.clj:87)
	at clojure.core$conj__5474.invoke(core.clj:84)
	at clojure.core.protocols$fn__8275.invokeStatic(protocols.clj:167)
	at clojure.core.protocols$fn__8275.invoke(protocols.clj:123)
	at clojure.core.protocols$fn__8229$G__8224__8238.invoke(protocols.clj:19)
	at clojure.core.protocols$seq_reduce.invokeStatic(protocols.clj:31)
	at clojure.core.protocols$fn__8262.invokeStatic(protocols.clj:74)
	at clojure.core.protocols$fn__8262.invoke(protocols.clj:74)
	at clojure.core.protocols$fn__8203$G__8198__8216.invoke(protocols.clj:13)
	at clojure.core$reduce.invokeStatic(core.clj:6965)
	at clojure.core$into.invokeStatic(core.clj:7038)
	at clojure.walk$walk.invokeStatic(walk.clj:50)
	at clojure.walk$postwalk.invokeStatic(walk.clj:53)
	at clojure.walk$postwalk.invoke(walk.clj:53)
	at clojure.core$partial$fn__5927.invoke(core.clj:2641)
	at clojure.walk$walk.invokeStatic(walk.clj:46)
	at clojure.walk$postwalk.invokeStatic(walk.clj:53)
	at clojure.walk$postwalk.invoke(walk.clj:53)
	at clojure.core$partial$fn__5927.invoke(core.clj:2641)
	at clojure.core$map$fn__5954.invoke(core.clj:2772)
	at clojure.lang.LazySeq.force(LazySeq.java:50)
	at clojure.lang.LazySeq.realize(LazySeq.java:89)
	at clojure.lang.LazySeq.seq(LazySeq.java:106)
	at clojure.lang.Cons.next(Cons.java:41)
	at clojure.lang.RT.next(RT.java:733)
	at clojure.core$next__5470.invokeStatic(core.clj:64)
	at clojure.core.protocols$fn__8275.invokeStatic(protocols.clj:168)
	at clojure.core.protocols$fn__8275.invoke(protocols.clj:123)
	at clojure.core.protocols$fn__8229$G__8224__8238.invoke(protocols.clj:19)
	at clojure.core.protocols$seq_reduce.invokeStatic(protocols.clj:31)
	at clojure.core.protocols$fn__8262.invokeStatic(protocols.clj:74)
	at clojure.core.protocols$fn__8262.invoke(protocols.clj:74)
	at clojure.core.protocols$fn__8203$G__8198__8216.invoke(protocols.clj:13)
	at clojure.core$reduce.invokeStatic(core.clj:6965)
	at clojure.core$into.invokeStatic(core.clj:7038)
	at clojure.walk$walk.invokeStatic(walk.clj:50)
	at clojure.walk$postwalk.invokeStatic(walk.clj:53)
	at clojure.walk$stringify_keys.invokeStatic(walk.clj:102)
	at clojure.walk$stringify_keys.invoke(walk.clj:102)

If I do an encode/decode operation using CBOR or something (presumably converting all vectors into standard clojure vectors), then the stringify operation comes back in 215 ms and doesn't use up all my RAM. Has anyone seen anything like this before?
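
For illustration, the "convert all vectors into standard clojure vectors" step can be sketched without a serialization round trip. This is a minimal sketch, not datalevin's recommended approach: `plain-vectors` is a hypothetical helper name, and it assumes the values returned by the kv store satisfy `vector?` (which the `SpillableVector.cons` frames in the trace suggest but do not prove). It only uses `clojure.core` and `clojure.walk`.

```clojure
(require '[clojure.walk :as walk])

;; Hypothetical helper: copy every vector-like value into a standard
;; clojure.lang.PersistentVector before walking the data further.
;; Assumption: the store's custom vector type satisfies `vector?`.
(defn plain-vectors
  [x]
  (walk/prewalk
   (fn [form]
     ;; Map entries also satisfy `vector?`; leave them alone so that
     ;; clojure.walk keeps handling them as key/value pairs.
     (if (and (vector? form) (not (map-entry? form)))
       (into [] form) ; conj happens on a plain vector, not the custom type
       form))
   x))

;; Stand-in data; in the scenario above this would be the fetched list.
(def sample [{:id 1 :tags [:a :b]}
             {:id 2 :nested {:xs [1 2 3]}}])

(def result (walk/stringify-keys (plain-vectors sample)))
```

Because `prewalk` applies the conversion before descending, the later `(into (empty form) ...)` inside `clojure.walk/walk` starts from an ordinary empty vector rather than from whatever `empty` returns for the original type.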

Stephen Castro-Starkey 2025-03-18T14:03:08.176969Z

If anybody wants to reproduce this I'm happy to send a nippy-thawed file and some sample code.

Huahai 2025-03-18T17:53:16.356509Z

we do not support storing arbitrary objects

Huahai 2025-03-18T17:53:25.178739Z

it's on the roadmap

Huahai 2025-03-18T17:53:38.194289Z

so you don't want to put random objects in it

Stephen Castro-Starkey 2025-03-18T22:33:18.685519Z

Ahh fair. So best if I encode them before storing them?

Huahai 2025-03-19T00:19:47.431939Z

Correct. Even after https://github.com/juji-io/datalevin/issues/234 is implemented, you will still need to do some work.