Hello everyone,
I’m experimenting with Datomic and I have 2 types of data that I’m not sure are OK to keep in Datomic, so I’m hoping for some clarity on this.
1. One-time passwords (OTPs): the verification codes, usually 6 digits, that you send to a user to verify their email. These are transient by nature but still require a persistent storage layer. Normally I’d have a verification table in Postgres and, once the verification is complete, just delete that entry. I use OTPs for forgot password, signup & password change. Is it still a good idea to store them in Datomic? Entries are infrequent (maybe a user would generate an OTP once a month).
2. Refresh tokens - I’m implementing JWT Auth with https://auth0.com/docs/secure/tokens/refresh-tokens/refresh-token-rotation . When a user receives a refresh token / access token pair, you store the refresh token family in persistent storage, and once the user tries to refresh, you check the submitted refresh token against your stored one. This is a great mechanism for protecting against attackers who get their hands on a user's refresh token, because when a refresh token is used twice to obtain an access token, the whole token family is invalidated and everybody holding it is logged out.
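To make the rotation check above concrete, here is a minimal sketch in plain Clojure. It uses an in-memory atom as a stand-in for the persistent store, and every name in it (`families`, `issue-family!`, `rotate!`) is hypothetical rather than taken from Auth0 or Datomic. The read-then-write in `rotate!` is not atomic, so a real store would need a compare-and-swap or a transaction around it.

```clojure
;; In-memory stand-in for persistent storage: family id -> current refresh token.
;; All names here are hypothetical; a real system would persist this map.
(def families (atom {}))

(defn new-token
  "Opaque random token; a real system would issue a signed or hashed value."
  []
  (str (java.util.UUID/randomUUID)))

(defn issue-family!
  "Start a new token family (e.g. at login) and return its first refresh token."
  [family-id]
  (let [token (new-token)]
    (swap! families assoc family-id token)
    token))

(defn rotate!
  "Exchange `presented` for a fresh token. If `presented` is not the family's
  current token, treat it as reuse and revoke the whole family."
  [family-id presented]
  (let [current (get @families family-id)]
    (cond
      ;; Family was never issued, or was already revoked.
      (nil? current)
      {:status :unknown-family}

      ;; Normal case: replace the stored token with a new one.
      (= current presented)
      (let [next-token (new-token)]
        (swap! families assoc family-id next-token)
        {:status :rotated :token next-token})

      ;; An old token was replayed: drop the family so nobody can refresh.
      :else
      (do (swap! families dissoc family-id)
          {:status :revoked}))))
```

Calling `(rotate! "fam-1" t1)` twice with the same `t1` returns `:rotated` the first time and `:revoked` the second; after that, even the legitimate holder of the newest token gets `:unknown-family` and has to log in again.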
In the case of refresh tokens, a good rule of thumb is that the lifetime of an access token is around 15 mins, so the token needs to be refreshed at most every 15 mins, depending on how much time the user spends on the website. Most likely you will store at least one refresh token per user per visit, and given the “additions only” nature of Datomic, this still feels a bit excessive to me.
What are some possible solutions for this? I was thinking of using a psql server with 2 databases: 1 for Datomic storage and 1 for transient storage as described above, but I’m wondering if there is an easier solution inside Datomic.
Thank you! 🙏
Also leaving the table structure from SQL land of those 2 tables described above:
@cch1 and @ovidiu.stoica1094 I wrote a Datomic queue thing: https://github.com/ivarref/yoltq/ that may interest you. Like your approach @cch1, it uses CAS for all queue operations. It does not have the concept of "not before", though that shouldn't be too hard to add. It also has strategies to deal with stuck threads (log their stacktraces) and stale jobs (based on configurable timeouts). It has worked well at my company for a few years as far as I know.
</shameless plug>
I studied yoltq before writing my Datomic queue. I would have used or adapted yoltq, but I use Datomic Cloud.
Other notable differences: my queue uses core.async for async threads instead of a dedicated pool.
I feel honored that you studied it 😀 core.async: Right. I haven't done much with core.async. With the new JVM green threads stuff it has somewhat less of a value proposition now, I suppose? Not saying that core.async is a bad or wrong choice though. I've recently (tried, completed?) writing a multicast tx-report-queue in bare-bones Clojure+`future`. It was pretty hard 😕
I'd say w/ the new JVM green stuff it's going to make using core.async much less painful. Lots of effort has gone into core.async recently to incorporate vthreads and it's going to be an absolute joy to use.
I am so looking forward to Datomic Cloud supporting this.
Me too 🙂
Sounds like a fine approach
@ovidiu.stoica1094, my approach uses a background job processor (managed with clojure.core.async/timeouts) that periodically queries the DB for unclaimed job entities. It then uses CAS to "claim" the oldest unclaimed job, performs the work (on an async thread) and deletes the job. If the job was not performed successfully, it is "returned to the queue" for future retries by transacting the reverse of the claim operation. This is all happening every two seconds, concurrently, from two or three Datomic Ion EC2 instances. About once a month or so we get a stuck job caused by a db timeout while transacting the claim or deleting the completed job, but it is easily verified and cleaned out manually. The CAS protects against the worst-case scenario of two nodes both claiming the same job (at-most-once semantics are important here). Jobs have a "not before" time that allows claiming them in order as well as scheduling for the future. More importantly, a failed job is automatically scheduled to be executed a second (or third ... or fourth) time using exponential backoff. There are plenty of race conditions around the actual "in order" claiming (not to mention db retries, etc.) so don't read too much into that. We primarily use this "event queue" with the transactional outbox pattern. I wrote it at a time when I had serious concerns about reliably coupling SQS jobs to Datomic transactions. In practice, I've found those concerns unwarranted, but we still use our "system event processor" all the time because it just works and has such strong guarantees w.r.t. coupling to Datomic transactions.
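A rough sketch of the claim / release / backoff pieces described above, written as pure functions that only build transaction data. This is not the actual implementation: the attribute names (`:job/claimed-by`, `:job/not-before`, `:job/attempts`), the 2-second base delay and the one-hour cap are all assumptions made for illustration. The only Datomic feature relied on is the built-in `:db/cas` transaction function, which accepts `nil` as the old value to mean "only if no value is currently asserted".

```clojure
;; Assumed tuning values, not taken from the thread.
(def base-ms 2000)     ; delay before the first retry
(def max-ms  3600000)  ; cap retries at one hour apart

(defn backoff-ms
  "Delay before retry number `attempt` (1-based): base * 2^(attempt-1), capped."
  [attempt]
  ;; Clamp the exponent so the shift cannot overflow a long.
  (let [exponent (min 20 (max 0 (dec attempt)))]
    (min max-ms (* base-ms (bit-shift-left 1 exponent)))))

(defn claim-tx
  "Tx-data that claims `job-eid` for `node-id` only if the job is unclaimed.
  If another node got there first, the CAS fails and the transaction aborts."
  [job-eid node-id]
  [[:db/cas job-eid :job/claimed-by nil node-id]])

(defn release-tx
  "Tx-data that returns a failed job to the queue: retract the claim and push
  its (hypothetical) not-before time out by the backoff for this attempt."
  [job-eid node-id attempt now-ms]
  [[:db/retract job-eid :job/claimed-by node-id]
   [:db/add job-eid :job/attempts attempt]
   [:db/add job-eid :job/not-before
    (java.util.Date. (long (+ now-ms (backoff-ms attempt))))]])
```

The resulting vectors would be handed to whichever `transact` call your Datomic API uses. Keeping them as plain data makes the claim logic easy to test without a database.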
Datomic is designed (and intended) to be the storage of record for enduring facts. For OTPs, the volume is so low it doesn't really matter. It's not a great fit at scale, but when your product has millions of users and OTPs are a high-volume use-case, reach out and we can reassess 🙂 For JWTs, that is not a good fit for Datomic because the JWTs are a.) ephemeral and not enduring facts, b.) "high-volume" relative to the cardinality of users and would turn into a fixed-overhead "tax" on the system that would eventually matter, and c.) are (I suspect) large strings, which, as of May 2025, are subject to the notes at https://docs.datomic.com/schema/schema-reference.html#notes-on-value-types. Again, this is all "fine" at small scale, but it can really sneak up on you once your product gets going. My recommendation is to keep a pointer into "transient storage" for data that is not an enduring fact.
When you say pointer into “transient storage” you mean to keep a separate storage like SQL (postgres) for this type of data, right?
Yep. Keep ephemeral data somewhere better suited for that (you can always keep a cached copy in your peer for performance). An alternative and more “sound” approach: you could also make a second Datomic db just for JWTs and OTPs that rotates a timeshard once per day (or week, etc.), then eventually delete the stale ones and run gc-deleted-databases (or whatever that command is called) periodically. That may make things simpler to query, since in Datomic you can query (read: join) across multiple databases at once (including N timeshards). If you’re interested in the timeshard approach we can discuss a design for it, as I’d like to make a generalization of this pattern for other use cases as well (e.g. queues in Datomic).
Queues in datomic... I have a rough pattern for that, but I would love to see something more reasoned.
Also, @ovidiu.stoica1094, I have been faced with much the same issues with OAuth refresh tokens but because some of my refresh tokens live for weeks and others only get used for ~1 hour about once a year, I elected to keep them in Datomic. Clearly my scale is not intimidating for this kind of "abuse".
@cch1 Can you go into more details about the queues pattern you are using?
Also to continue on this subject, @joe.lane if I add :db/noHistory true for all of the attributes of the OTP & refresh token entities, would this reduce the storage burden? From what I read, on each new indexing, the transactor will drop both retraction Datoms and assertions that have been retracted for the particular attribute leaving only the most recent value.
I understand that when you add a new entity, it will stay there, but most of the time, at least for refresh tokens, it’s the same entity and only the JWT token value is rotated. The family is the primary id for this. The entity is cleared only on a breach or when the token has expired, which is hard since I keep expiry dates at 90 days.
So when a token is refreshed, there’s just an update to the existing token entity. Given this, is it OK to assume that :db/noHistory is a good idea?
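For reference, a sketch of what such a schema could look like. The attribute names are hypothetical; only the `:db/*` keys are Datomic's own. This just shows where `:db/noHistory` would go and does not settle whether it is a good idea. One caveat, as far as I understand the docs: `:db/noHistory` is meant to conserve storage and takes effect in the background during indexing, so some history may still be visible for a while.

```clojure
;; Hypothetical refresh-token schema. The family is the stable identity;
;; the rotating attributes are marked :db/noHistory.
[{:db/ident       :refresh-token/family
  :db/valueType   :db.type/uuid
  :db/cardinality :db.cardinality/one
  :db/unique      :db.unique/identity}

 {:db/ident       :refresh-token/value
  :db/valueType   :db.type/string
  :db/cardinality :db.cardinality/one
  :db/noHistory   true}

 {:db/ident       :refresh-token/expires-at
  :db/valueType   :db.type/instant
  :db/cardinality :db.cardinality/one
  :db/noHistory   true}]
```

With `:refresh-token/family` as a unique identity, a refresh becomes an upsert on the same entity, which matches the "same entity, rotated value" pattern described above.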
Hi, I was just looking at the Datomic Pro hints (https://docs.datomic.com/reference/hints.html) introduced in 1.0.7260. The page shows that you can pass hint options to `with` and `transact`/`transact-async`, but this is not shown in the API docs at https://docs.datomic.com/clojure/index.html#datomic.api/with. Does anyone know where or who I should ping to get the docs updated? 😄
Hi @joe.lane. Just pinging you to remind you of this. Sorry if I'm nagging 😅
@jaret can we get these api docs updated if they aren’t already?
Updated: https://docs.datomic.com/clojure/index.html#datomic.api/with
Looking at the rest to confirm they match what is included with the doc strings. @tormathi I definitely recommend using doc until I get this completely cleaned up.
user=> (require '[datomic.api :as d])
nil
user=> (doc d/with)
-------------------------
datomic.api/with
([db tx-data] [db tx-data & opts]) ...
Awesome 🙌
I’ll set a reminder to check into what’s going on this weekend. Having written those api-docs for d/with, I am… perplexed. Thanks for pointing that out. In the meantime, if you have any questions about hints feel free to ask me here directly.
hints are very, very cool