announcements

whilo 2026-01-17T10:15:53.732819Z

Announcing Proximum - Persistent Vector Database with Git-like Versioning

We're excited to release Proximum, an embeddable vector database for Clojure and Java that brings persistent data structure semantics to vector search.

Key Features:
• Git-like versioning - branches, commits, time-travel queries
• Zero-cost branching - fork indices for experiments without copying data
• Clojure collection protocols - use assoc, dissoc, get on your index
• SIMD-accelerated - ~50% of native C++ hnswlib performance, pure JVM
• Spring AI & LangChain4j integrations included

```clojure
(require '[proximum.core :as prox])

(def idx (prox/create-index {:type :hnsw :dim 384 :capacity 10000
                             :store-config {:backend :memory :id (random-uuid)}}))

;; Works like a Clojure map
(def idx2 (assoc idx "doc-1" (float-array (repeatedly 384 rand))))

;; Git-like operations
(prox/sync! idx2)
(def experiment (prox/branch! idx2 :experiment))
```

Perfect for RAG applications where you need reproducible results, A/B tests of embeddings, or audit trails.

Install:
```clojure
org.replikativ/proximum {:mvn/version "0.1.2"}
```
Links:
• GitHub: https://github.com/replikativ/proximum
• Product page: https://datahike.io/proximum/

📋 Help us prioritize! Please fill out our 2-min feedback survey: https://docs.google.com/forms/d/e/1FAIpQLSeUQuw5SPyIx661e1pwZiX0100bP-DPpF2Zfpptg1h6k14OTA/viewform

Requires Java 22+. This is an early beta - feedback welcome!
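The "works like a Clojure map" and "zero-cost branching" points build on the same idea as Clojure's own persistent collections: an update returns a new version, the old version stays valid, and unchanged parts are shared rather than copied. A minimal sketch of that idea with ordinary maps follows. It is an analogy only, not Proximum's API or storage implementation, and the names `v1`, `main-v2`, and `experiment-v2` are hypothetical.

```clojure
;; Analogy with plain Clojure maps; no Proximum calls are made here.
;; The small vectors stand in for embeddings.
(def v1 {"doc-1" [0.1 0.2 0.3]})

;; Two "branches" derived from the same base version.
(def main-v2       (assoc v1 "doc-2" [0.4 0.5 0.6]))
(def experiment-v2 (dissoc v1 "doc-1"))

;; The base version is unchanged by either derivation...
(count v1)            ;; => 1
(count main-v2)       ;; => 2
(count experiment-v2) ;; => 0

;; ...and the derived version shares, rather than copies, the old entry.
(identical? (get v1 "doc-1") (get main-v2 "doc-1")) ;; => true
```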

whilo 2026-01-17T10:16:39.252119Z

Integration into #datahike as a secondary index is planned. Here are examples of how it can already be integrated manually into persistent databases like Datomic, Datahike, DataScript, or XTDB: https://github.com/replikativ/einbetten/blob/main/docs/datalog-semantic-search-patterns.md

JAtkins 2026-01-17T21:57:29.136749Z

That's dope. I have a project in the planning stages that uses Datahike as the DB, and I thought I was going to have to lean on my Python service to manage a separate vector DB 🙂

whilo 2026-01-17T22:02:45.675099Z

Nice 🙂. What do you need a vector DB for? If you have a moment, you could also fill out the form, or lmk what I should add to it to get a clearer picture of the needs.

JAtkins 2026-01-17T22:07:20.910309Z

I'm not a paying customer, just doing a random personal project. But if you are curious, it's basically a big RAG DB of podcasts. I listen to a lot, and I'm building a tool to download them all to a self-hosted S3 bucket, transcribe them with Whisper, and then index them for future searchability.

whilo 2026-01-17T22:11:23.678179Z

Right, the replikativ libraries always were and always will be open source, and I don't expect people to pay to use them. I do need to gauge commercial interest to grow them, though. Yes, I am curious. That sounds very cool and makes a lot of sense. I used Whisper, too, in my academic work. Which embedding model do you use? I found fastembed, which runs on CPU only and promises to be somewhat competitive in recall, but I am a bit skeptical. I guess the Qwen embedding models may be a good compromise right now, but then you need a GPU or an external provider (with the added latency that introduces during searches).

JAtkins 2026-01-17T22:15:31.055659Z

For sure, I just don't want to steer you, because my use case most likely doesn't end in payment 🙂. For embedding, I think I was using multi-qa-MiniLM-L6-cos-v1 via Python's SentenceTransformer, though don't take that as a recommendation; I have never loaded enough data into it to see what the recall quality is for my use case.
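Recall came up twice in this thread, so here is a minimal sketch of how it is commonly checked: run an exact brute-force search over the same vectors and measure how many of its top-k ids the approximate index (HNSW, in Proximum's case) also returns. This is plain Clojure, not Proximum's API; `dot`, `cosine`, `exact-top-k`, and `recall-at-k` are hypothetical helpers. It scores by cosine similarity, which is what the `cos` in multi-qa-MiniLM-L6-cos-v1 refers to.

```clojure
;; Plain Clojure, no Proximum calls: an exact (brute-force) baseline
;; and a recall@k measure. All function names here are hypothetical.

(defn dot
  "Dot product of two equal-length float arrays."
  [^floats a ^floats b]
  (areduce a i acc 0.0
           (+ acc (* (aget a i) (aget b i)))))

(defn cosine
  "Cosine similarity: dot product divided by the product of the norms."
  [a b]
  (/ (dot a b)
     (* (Math/sqrt (dot a a)) (Math/sqrt (dot b b)))))

(defn exact-top-k
  "Scores every vector in `docs` (a map of id -> float array) against
  `query` and returns the ids of the k most similar, best first."
  [docs query k]
  (->> docs
       (sort-by (fn [[_ v]] (- (cosine query v))))
       (take k)
       (map key)))

(defn recall-at-k
  "Fraction of the exact top-k ids that also appear in the approximate
  result. 1.0 means the approximate search found every true neighbour."
  [exact-ids approx-ids]
  (let [truth (set exact-ids)]
    (/ (count (filter truth approx-ids))
       (double (count truth)))))

(comment
  ;; Toy 2-d vectors for illustration.
  (let [docs {"a" (float-array [1 0])
              "b" (float-array [0 1])
              "c" (float-array [1 1])}]
    (exact-top-k docs (float-array [1 0.1]) 2) ;; => ("a" "c")
    (recall-at-k ["a" "c"] ["a" "b"])))        ;; => 0.5
```

In a real check, the approximate ids would come from a query against the index; the announcement doesn't show Proximum's search call, so it is left out here. The brute-force baseline scores every vector for every query, which is fine for a small ground-truth sample but is exactly the cost an HNSW index is meant to avoid.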
