datalevin

Jon Hancock 2026-05-11T07:28:17.112399Z

Regarding the new vector embedding behavior (https://github.com/datalevin/datalevin/blob/master/doc/vector.md): this seems to work with either a local or a remote API call to an embedding model, autofilling the embedding data as needed. My concern is having this behavior inside Datalevin as opposed to a separate process (docs->chunks->embeddings->add-datalevin). A separate process can fail and be restarted without worrying about the integrity and durability of the DB. I would appreciate some understanding of where this is going.

Huahai 2026-05-12T23:16:40.986159Z

I would still use a separate process, as you described, for a production document-processing workflow. The embedding service in Datalevin is a convenience for lightweight embedded use cases that do not need sophisticated workflows, e.g. a personal agent. As to your concern about the integrity and durability of the DB, the built-in embedding does not degrade them, because the embedding happens before the write commits: if embedding fails, the transaction fails. That's exactly what a lightweight use case expects: if anything fails, fail fast and don't write any broken data.
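A minimal sketch of the fail-fast ordering described here: compute every embedding first, and only then write. This is not Datalevin's code or API; the function name transact_with_embeddings, the EmbeddingError type, and the in-memory dict standing in for the database are all hypothetical.

```python
from typing import Callable, Dict, List


class EmbeddingError(Exception):
    """Hypothetical error raised when the embedding model call fails."""


def transact_with_embeddings(store: Dict[str, dict],
                             docs: List[dict],
                             embed: Callable[[str], List[float]]) -> None:
    """Hypothetical fail-fast write: embed everything, then commit.

    `store` is a plain dict standing in for the database.
    """
    # Phase 1: compute all embeddings before touching the store.
    # Any exception raised by `embed` propagates here, so nothing is written.
    staged = []
    for doc in docs:
        vec = embed(doc["text"])
        staged.append({**doc, "embedding": vec})

    # Phase 2: "commit" only after every embedding succeeded.
    for row in staged:
        store[row["id"]] = row
```

If the model fails on any document in the batch, the exception propagates out of phase 1 and the store is left exactly as it was, which is the "don't write any broken data" behavior described above.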

Huahai 2026-05-12T23:24:45.762799Z

That said, I can add an option to build the secondary indexes (fulltext and vector) asynchronously. This would handle use cases with higher ingestion-throughput requirements and less need for read-your-writes on the secondary indexes, i.e. where users do not expect search to reflect a write immediately after it commits.

Huahai 2026-05-12T23:31:52.964809Z

This async secondary indexing option should be useful for a data processing pipeline where you want retries, back pressure, rate limiting, auditing, process isolation, and so on. Sure, I will add this.

Huahai 2026-05-14T05:12:23.544579Z

Added :indexing-mode :async option for fulltext, vector and embedding.

Huahai 2026-05-14T05:13:14.787399Z

It supports retries, backoff, and reclaiming of work.
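For readers unfamiliar with these terms, here is a generic, in-memory sketch of how retries, exponential backoff, and lease-based reclaiming of work commonly fit together in an async indexing queue. It is not Datalevin's implementation of :indexing-mode :async; the names IndexQueue, Task, claim, run_once, max_attempts, base_delay, and lease_secs are hypothetical, and the clock is passed in as `now` so the behavior is deterministic.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Task:
    doc_id: str
    attempts: int = 0
    not_before: float = 0.0              # earliest time a retry may run (backoff)
    lease_until: Optional[float] = None  # set while a worker holds the task


@dataclass
class IndexQueue:
    """Hypothetical async index queue with retries, backoff, and reclaim."""
    max_attempts: int = 5
    base_delay: float = 0.5
    lease_secs: float = 30.0
    tasks: Dict[str, Task] = field(default_factory=dict)
    dead: List[str] = field(default_factory=list)  # gave up after max_attempts

    def enqueue(self, doc_id: str) -> None:
        self.tasks[doc_id] = Task(doc_id)

    def claim(self, now: float) -> Optional[Task]:
        # A task is claimable if nobody holds it, or its holder's lease expired
        # (e.g. the worker crashed), and its backoff delay has elapsed.
        for t in self.tasks.values():
            free = t.lease_until is None or t.lease_until <= now
            if free and t.not_before <= now:
                t.lease_until = now + self.lease_secs
                return t
        return None

    def run_once(self, index_fn: Callable[[str], None], now: float) -> bool:
        """Process one claimable task; return False if none was ready."""
        t = self.claim(now)
        if t is None:
            return False
        try:
            index_fn(t.doc_id)
        except Exception:
            t.attempts += 1
            t.lease_until = None
            if t.attempts >= self.max_attempts:
                del self.tasks[t.doc_id]
                self.dead.append(t.doc_id)
            else:
                # Exponential backoff: base, 2*base, 4*base, ...
                t.not_before = now + self.base_delay * 2 ** (t.attempts - 1)
            return True
        del self.tasks[t.doc_id]  # success: indexing done
        return True
```

The lease is what makes reclaiming possible: a worker that dies mid-task never clears its lease, so once lease_secs passes, another worker's claim picks the same task up again instead of it being lost.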