Datomic separates reads (direct to storage/cache) from writes (via the transactor). Would it be possible to run Datomic in a "read-only" mode where the transactor doesn't need to be running?
Your word of the day is "Prescient" @felix 🙂
LOL. Thanks. My use case: I'm running Datomic (Pro) on GCP, with a "Datomic Storage Proxy" that exposes a JDBC driver and uses GCP Datastore on the backend (which, unlike a SQL server, scales to zero). Most of the workloads are read-only and very intermittent, so if I could get away with starting a transactor only when it's needed, that would be ace 👍 I'm really happy with Datastore as a storage backend for Datomic. Here are the last 30 days of traces for the proxy and the monitoring dashboard (note this is just a dev environment, not production).
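A minimal sketch of the "start a transactor only when it's needed" idea, under stated assumptions: `start` and `stop` are hypothetical hooks (for example, whatever launches or shuts down the transactor VM), and `LazyTransactor` is an illustrative name, not anything from Datomic. Reads never go through it, mirroring the read/write split; the first write starts the transactor and an idle timeout stops it again.

```python
import threading
import time
from typing import Callable


class LazyTransactor:
    """Hypothetical wrapper: start a transactor on first write, stop it after idle_seconds."""

    def __init__(self, start: Callable[[], None], stop: Callable[[], None],
                 idle_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._start = start          # hypothetical hook, e.g. launch the transactor VM
        self._stop = stop            # hypothetical hook, e.g. shut the VM down
        self._idle = idle_seconds
        self._clock = clock          # injectable so the idle logic can be tested
        self._running = False
        self._last_write = 0.0
        self._lock = threading.Lock()

    def ensure_running(self) -> None:
        """Call before every write; starts the transactor only if it is down."""
        with self._lock:
            if not self._running:
                self._start()
                self._running = True
            self._last_write = self._clock()

    def maybe_stop(self) -> bool:
        """Call periodically; stops the transactor once no write has happened for idle_seconds."""
        with self._lock:
            if self._running and self._clock() - self._last_write >= self._idle:
                self._stop()
                self._running = False
                return True
            return False

    @property
    def running(self) -> bool:
        return self._running
```

This only covers the lifecycle bookkeeping; in practice peers would also have to tolerate the transactor being unavailable between writes, and start-up latency would land on the first write.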
Can you send me a link to this GCP storage stuff you're using?
https://clojurians.slack.com/archives/C03RZMDSH/p1710589063787439
That's a bit old now. The version I'm running is an updated version of that codebase (I removed Datalevin support and added a bunch of quality-of-life improvements). If there's interest, I'm totally up for open-sourcing the newer version!
And is this because GCP doesn't offer SQL as a service?
Transactional SQL* like MySQL or Postgres?
Not in the slightest. GCP has lots of SQL options, all available as managed services. But Datomic doesn't need SQL; it just needs a KV backend, right? The managed SQL services don't scale to zero, so you pay for servers even if your workloads are intermittent. For enterprise "always on" workloads, that's not an issue. For my workloads, where I spin up a DB for a project and may not need it 99% of the time, that's not what I want.
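To make "it just needs a KV backend" a bit more concrete, here is a minimal sketch of the kind of narrow key-value interface a storage proxy might map onto a service like Datastore. The split into write-once keys and a few compare-and-set keys is an assumption for illustration, not Datomic's documented storage contract; `KVStore` and its methods are hypothetical names, and the in-memory dict stands in for the real service.

```python
import threading
from typing import Dict, Optional, Tuple


class KVStore:
    """Hypothetical in-memory stand-in for a KV service such as Datastore."""

    def __init__(self) -> None:
        self._data: Dict[str, Tuple[int, bytes]] = {}  # key -> (revision, value)
        self._lock = threading.Lock()

    def get(self, key: str) -> Optional[bytes]:
        with self._lock:
            entry = self._data.get(key)
            return entry[1] if entry else None

    def put_if_absent(self, key: str, value: bytes) -> bool:
        """Write-once semantics (assumed here for immutable data segments)."""
        with self._lock:
            if key in self._data:
                return False
            self._data[key] = (0, value)
            return True

    def compare_and_set(self, key: str, expected_rev: Optional[int], value: bytes) -> bool:
        """Update a mutable key only if its revision still matches (optimistic concurrency)."""
        with self._lock:
            current = self._data.get(key)
            current_rev = current[0] if current else None
            if current_rev != expected_rev:
                return False
            new_rev = 0 if current_rev is None else current_rev + 1
            self._data[key] = (new_rev, value)
            return True

    def revision(self, key: str) -> Optional[int]:
        with self._lock:
            entry = self._data.get(key)
            return entry[0] if entry else None
```

The point of the sketch is how little is required: point reads by key plus a conditional write, both of which managed KV stores that scale to zero typically offer.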
On AWS, you can use DynamoDB as a backend. GCP Datastore is Google's DynamoDB equivalent.
Ahh. If we had a Datastore backend, would you have used that instead? Is there an S3 equivalent?
Yeah, if Datomic had a pluggable KV backend, I wouldn't have built the JDBC wrapper. I asked Marshall about this in 2018 (obviously a long while ago now) and he said: "I definitely understand the desire to run on the GCE - I'll pass along the feedback. I don't expect that we are likely to open the storage API, as the majority of our support issues currently already stem from misconfiguration or issues with existing storages and supporting 3rd party storage integrations would be quite difficult." So I decided to stop waiting and just build it myself 😂
Also, how much disk space does your Peer machine have? I saw you had about 5k datoms in your system, is that right?
My peers are ephemeral at the moment, rather than long-running. If I'm reading these metrics correctly, I've got some databases with 15 million entities.
This system is an agentic AI processing system, so the DB with 15M entities is actually a cache/memoization layer for LLM calls.
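For anyone unfamiliar with the pattern, a minimal sketch of an LLM memoization layer: hash the model name, prompt and parameters into a deterministic key, and only call the model on a miss. `MemoizedLLM`, `cache_key` and the injected `call` function are hypothetical names, and a plain dict stands in for the database; how this system actually keys or stores its entries isn't stated in the thread.

```python
import hashlib
import json
from typing import Any, Callable, Dict


def cache_key(model: str, prompt: str, params: Dict[str, Any]) -> str:
    """Deterministic key: same model + prompt + params always hash to the same key."""
    payload = json.dumps({"model": model, "prompt": prompt, "params": params},
                         sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class MemoizedLLM:
    def __init__(self, call: Callable[[str, str, Dict[str, Any]], str],
                 store: Dict[str, str]) -> None:
        self._call = call      # hypothetical LLM client function
        self._store = store    # stands in for the database used as the cache
        self.misses = 0

    def complete(self, model: str, prompt: str, **params: Any) -> str:
        key = cache_key(model, prompt, params)
        if key in self._store:           # cache hit: replay the stored answer
            return self._store[key]
        self.misses += 1                 # cache miss: call the model and remember the result
        result = self._call(model, prompt, params)
        self._store[key] = result
        return result
```

Sorting keys before hashing makes parameter order irrelevant. Memoizing only makes sense for calls you're happy to treat as deterministic, since a cached answer is replayed verbatim.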
Can I ask why my question about a "read-only" mode was prescient? Is that something in the works?
You’ll just have to stay tuned
Just curious: what do you wish Datomic had to make your agentic project better?
Hmm, the only thing that comes to mind is arbitrary edn storage, as I currently store edn (as string blobs) in a few places. I'd have said vector indexing too, but I already do that externally, and TBH I don't think vector indexes belong in Datomic.