datomic

Ovi Stoica 2025-05-16T08:59:31.572309Z

Hello everyone, I’m experimenting with Datomic and I have 2 types of data that I’m not sure are OK to keep in Datomic, and I’m hoping for some clarity on this.
1. One-time passwords: the verification codes, usually 6 digits, that you send to a user to verify their email. By their nature these are transient, but they still require a persistent storage layer. Normally I’d have a verification table inside Postgres, and once the verification is complete, I just delete that particular entry. I use OTPs for forgot password, signup & password change. Is it still a good idea to store them in Datomic? It is not a very frequent entry (maybe a user would generate an OTP once a month).
2. Refresh tokens: I’m implementing JWT auth with refresh token rotation (https://auth0.com/docs/secure/tokens/refresh-tokens/refresh-token-rotation). When a user receives a refresh token / access token pair, you store the refresh token family in persistent storage, and when the user tries to refresh you check the submitted refresh token against the stored one. This is a great mechanism for protecting against attackers who get their hands on a user’s refresh token, because when a refresh token is used twice to obtain an access token, everybody is logged out. A good rule of thumb is that the lifetime of an access token is around 15 mins, so the token needs to be refreshed at most every 15 mins, depending on how much time the user spends on the website. Most likely you will store at least one refresh token per user per visit, and given the “only additions” nature of Datomic, this feels a bit excessive to me.
What are some possible solutions for this? I was thinking of using a psql server with 2 databases: 1 for Datomic storage, 1 for transient storage as described above, but I’m wondering if there is an easier solution inside Datomic. Thank you! 🙏
Also leaving the table structure from SQL land of those 2 tables described above:

Ivar Refsdal 2025-05-23T20:02:48.576429Z

@cch1 and @ovidiu.stoica1094 I wrote a Datomic queue thing: https://github.com/ivarref/yoltq/ that may interest you. Like your approach, @cch1, it uses CAS for all queue operations. It does not have the concept of "not before", though that shouldn't be too hard to add. It also has strategies to deal with stuck threads (logging their stacktraces) and stale jobs (based on configurable timeouts). It has worked well at my company for a few years as far as I know. </shameless plug>

👍 1
cch1 2025-05-23T20:04:07.677459Z

I studied yoltq before writing my Datomic queue. I would have used or adapted yoltq, but I use Datomic Cloud.

👍 1
cch1 2025-05-23T20:06:08.718139Z

Other notable differences: my queue uses core.async for async threads instead of a dedicated pool.

Ivar Refsdal 2025-05-23T20:09:39.204379Z

I feel honored that you studied it 😀 core.async: Right. I haven't done much with core.async. With the new JVM green stuff it has somewhat less of a value proposition now, I suppose? Not saying that core.async is a bad or wrong choice though. I've recently (tried, completed?) writing a multicast tx-report-queue in bare-bones Clojure+`future`. It was pretty hard 😕

Joe Lane 2025-05-23T21:33:06.972399Z

I'd say w/ the new JVM green stuff it's going to make using core.async much less painful. Lots of effort has gone into core.async recently to incorporate vthreads and it's going to be an absolute joy to use.

🚀 1
cch1 2025-05-23T21:33:34.503849Z

I am so looking forward to Datomic Cloud supporting this.

Joe Lane 2025-05-23T21:33:43.633669Z

Me too 🙂

Joe Lane 2025-05-19T13:39:49.861249Z

Sounds like a fine approach

cch1 2025-05-19T14:25:40.304759Z

@ovidiu.stoica1094, my approach uses a background job processor (managed with clojure.core.async/timeouts) that periodically queries the DB for unclaimed job entities. It then uses CAS to "claim" the oldest unclaimed job, performs the work (on an async thread), and deletes the job. If the job was not performed successfully, it is "returned to the queue" for future retries by transacting the reverse of the claim operation. This all happens every two seconds, concurrently, from two or three Datomic Ion EC2 instances. About once a month or so we get a stuck job caused by a db timeout while transacting the claim or deleting the completed job, but it is easily verified and cleaned out manually. The CAS protects against the worst-case scenario of two nodes both claiming the same job (at-most-once semantics are important here). Jobs have a "not before" time that allows claiming them in order as well as scheduling for the future. More importantly, a failed job is automatically scheduled to be executed a second (or third ... or fourth) time using exponential backoff. There are plenty of race conditions around the actual "in order" claiming (not to mention db retries, etc.) so don't read too much into that. We primarily use this "event queue" with the transactional outbox pattern. I wrote it at a time when I had serious concerns about reliably coupling SQS jobs to Datomic transactions. In practice, I've found those concerns unwarranted, but we still use our "system event processor" all the time because it just works and has such strong guarantees w.r.t. coupling to Datomic transactions.
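
Editor's sketch of the claim / release / backoff steps described above. This is not cch1's code: the attribute names (:job/claimed-by, :job/attempts, :job/not-before), the 2-second base delay and the 1-hour cap are all assumptions made for illustration. The functions only build transaction data; the built-in :db/cas, :db/retract and :db/retractEntity operations are real Datomic forms, and the data would be handed to a transact call on a real connection.

```clojure
;; Hypothetical constants, chosen only for this sketch.
(def base-ms 2000)               ; first retry delay
(def max-ms (* 60 60 1000))      ; cap retries at one hour

(defn backoff-ms
  "Exponential backoff: base-ms * 2^attempt, capped at max-ms.
  The exponent is clamped so the shift cannot overflow a long."
  [attempt]
  (min max-ms (* base-ms (bit-shift-left 1 (min attempt 20)))))

(defn claim-tx
  "Tx data that claims job-eid for worker-id. :db/cas with a nil old
  value asserts the new value only if the attribute has no value yet,
  so two nodes racing for the same job cannot both succeed."
  [job-eid worker-id]
  [[:db/cas job-eid :job/claimed-by nil worker-id]])

(defn release-tx
  "Tx data that returns a failed job to the queue: undo the claim,
  bump the attempt counter and push :job/not-before into the future."
  [job-eid worker-id attempts now-ms]
  [[:db/retract job-eid :job/claimed-by worker-id]
   [:db/add job-eid :job/attempts (inc attempts)]
   [:db/add job-eid :job/not-before
    (java.util.Date. (long (+ now-ms (backoff-ms attempts))))]])

(defn complete-tx
  "Tx data that deletes a finished job."
  [job-eid]
  [[:db/retractEntity job-eid]])
```

If the claim transaction throws because another node won the CAS, the losing node would simply skip that job and poll again on its next tick.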

Joe Lane 2025-05-16T09:28:27.435459Z

Datomic is designed (and intended) to be the storage of record for enduring facts. For OTPs, the volume is so low it doesn't really matter. It's not a great fit at scale, but when your product has millions of users and OTPs are a high-volume use case, reach out and we can reassess 🙂 JWTs are not a good fit for Datomic because they a.) are ephemeral and not enduring facts, b.) are "high-volume" relative to the cardinality of users and would turn into a fixed-overhead "tax" on the system that would eventually matter, and c.) are (I suspect) large strings, which, as of May 2025, are subject to the limits noted at https://docs.datomic.com/schema/schema-reference.html#notes-on-value-types. Again, this is all "fine" at small scale, but it can really sneak up on you once your product gets going. My recommendation is to keep a pointer into "transient storage" for data that is not an enduring fact.

Ovi Stoica 2025-05-16T09:35:46.418099Z

When you say pointer into “transient storage” you mean to keep a separate storage like SQL (postgres) for this type of data, right?

Joe Lane 2025-05-16T10:23:21.328559Z

Yep. Keep ephemeral data somewhere better suited for that (you can always keep a cached copy in your peer for performance). An alternative and more “sound” approach: you could also make a second Datomic db just for JWTs and OTPs that rotates a timeshard once per day (or week, etc.), then eventually delete the stale ones and run gc-deleted-databases (or whatever that command is called) periodically. It may make things simpler to query, since in Datomic you can query (read: join) across multiple databases at once (including N timeshards). If you’re interested in the timeshard approach we can discuss a design for it, as I’d like to make a generalization of this pattern for other use cases as well (e.g. queues in Datomic).
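
Editor's sketch of what the timeshard idea could look like, under stated assumptions: the "transient-" naming scheme, the daily rotation and every attribute name (:user/email, :user/id, :otp/user-id, :otp/code) are hypothetical and not part of any design discussed here. Only the multi-source query syntax ($main, $shard in the :in clause) is standard Datomic datalog.

```clojure
(defn shard-name
  "Database name for the timeshard covering `date` (a java.time.LocalDate).
  The prefix is an invented convention."
  [date]
  (str "transient-" date))

(defn live-shards
  "Names of the n most recent daily shards, newest first. Shards older
  than these would be candidates for deletion."
  [^java.time.LocalDate today n]
  (mapv #(shard-name (.minusDays today (long %))) (range n)))

;; A query that joins the main database with one timeshard.
;; Entity ids are not shared between databases, so the join goes
;; through a domain identifier (:user/id) stored in both.
(def otp-for-user-query
  '[:find ?code
    :in $main $shard ?email
    :where
    [$main ?u :user/email ?email]
    [$main ?u :user/id ?uid]
    [$shard ?o :otp/user-id ?uid]
    [$shard ?o :otp/code ?code]])

;; With the peer API this would be run roughly as:
;;   (d/q otp-for-user-query main-db shard-db "someone@example.com")
;; once per live shard, or with one extra source per shard.
```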

🤘 1
cch1 2025-05-16T15:54:55.906849Z

Queues in datomic... I have a rough pattern for that, but I would love to see something more reasoned.

cch1 2025-05-16T15:56:31.452819Z

Also, @ovidiu.stoica1094, I have been faced with much the same issues with OAuth refresh tokens but because some of my refresh tokens live for weeks and others only get used for ~1 hour about once a year, I elected to keep them in Datomic. Clearly my scale is not intimidating for this kind of "abuse".

Ovi Stoica 2025-05-17T05:08:44.707409Z

@cch1 Can you go into more details about the queues pattern you are using?

Ovi Stoica 2025-05-17T07:20:22.981759Z

Also to continue on this subject, @joe.lane if I add :db/noHistory true for all of the attributes of the OTP & refresh token entities, would this reduce the storage burden? From what I read, on each new indexing, the transactor will drop both retraction Datoms and assertions that have been retracted for the particular attribute leaving only the most recent value.

Ovi Stoica 2025-05-17T07:42:55.927929Z

I understand that when you add a new entity that will stay there, but most of the time, at least for refresh tokens, it’s the same entity but the token JWT value is rotated. The family is the primary id for this. The entity is cleared only on a breach or when the token expires, which is hard since I keep expiry dates at 90 days. So when a token is refreshed, there’s just an update to the existing token entity. Given this, is it OK to assume that :db/noHistory is a good idea?
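
Editor's sketch of the schema being asked about, assuming invented attribute names and assuming a hash of the token is stored rather than the full JWT (which also sidesteps the large-string concern raised earlier). :db/noHistory and :db.unique/identity are real schema attributes; whether noHistory is the right trade-off here is exactly the open question in this thread.

```clojure
(def refresh-token-schema
  [;; The family id identifies the entity and never changes,
   ;; so it keeps normal history.
   {:db/ident       :refresh-token/family
    :db/valueType   :db.type/uuid
    :db/cardinality :db.cardinality/one
    :db/unique      :db.unique/identity}
   ;; Rotated on every refresh: old values are of no interest.
   {:db/ident       :refresh-token/token-hash
    :db/valueType   :db.type/string
    :db/cardinality :db.cardinality/one
    :db/noHistory   true}
   {:db/ident       :refresh-token/expires-at
    :db/valueType   :db.type/instant
    :db/cardinality :db.cardinality/one
    :db/noHistory   true}])

(defn rotate-tx
  "Tx data for one rotation. Because :refresh-token/family is a unique
  identity, this map upserts onto the existing family entity instead of
  creating a new one."
  [family new-hash expires-at]
  [{:refresh-token/family     family
    :refresh-token/token-hash new-hash
    :refresh-token/expires-at expires-at}])
```

Each rotation still writes a transaction to the log; noHistory only affects what is retained for those attributes after indexing.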

2food 2025-05-16T09:58:36.109129Z

Hi, I was just looking at the Datomic Pro hints (https://docs.datomic.com/reference/hints.html) introduced in 1.0.7260. The page shows that you can pass hint options to `with` and `transact`/`transact-async`, but this is not shown in the API docs (https://docs.datomic.com/clojure/index.html#datomic.api/with). Does anyone know where or who I should ping to get the docs updated? 😄

2food 2025-07-02T08:11:33.715469Z

Hi @joe.lane. Just pinging you to remind you of this. Sorry if I'm nagging 😅

Joe Lane 2025-07-02T13:01:36.332309Z

@jaret can we get these api docs updated if they aren’t already?

👍 1
jaret 2025-07-02T14:01:52.627589Z

Updated: https://docs.datomic.com/clojure/index.html#datomic.api/with Looking at the rest to confirm they match what is included with the doc strings. @tormathi I definitely recommend using doc until I get this completely cleaned up.

user=> (require '[datomic.api :as d])
nil
user=> (doc d/with)
-------------------------
datomic.api/with
([db tx-data] [db tx-data & opts]) ...

👍 1
2food 2025-07-02T14:32:53.726789Z

Awesome 🙌

Joe Lane 2025-05-16T10:11:34.098519Z

I’ll set a reminder to check into what’s going on this weekend. Having written those api-docs for d/with, I am… perplexed. Thanks for pointing that out. In the meantime, if you have any questions about hints feel free to ask me here directly.

🙏 1
ghadi 2025-05-16T18:57:43.220509Z

hints are very, very cool