2023-10-06
Channels
- # aleph (15)
- # announcements (2)
- # babashka (121)
- # beginners (62)
- # biff (6)
- # cherry (2)
- # cider (51)
- # clerk (30)
- # cljs-dev (5)
- # clojure (77)
- # clojure-austin (2)
- # clojure-europe (10)
- # clojure-germany (6)
- # clojure-nl (1)
- # clojure-norway (19)
- # clojure-romania (1)
- # clojure-uk (3)
- # clojurescript (16)
- # core-typed (7)
- # cursive (17)
- # datomic (12)
- # deps-new (11)
- # emacs (7)
- # events (2)
- # fulcro (5)
- # honeysql (2)
- # hyperfiddle (32)
- # introduce-yourself (1)
- # jobs-discuss (2)
- # membrane (18)
- # missionary (2)
- # music (5)
- # polylith (7)
- # reagent (26)
- # releases (5)
- # testing (32)
- # tools-build (14)
- # tools-deps (7)
- # xtdb (8)
Is there a way to add bitemporality to Datomic Cloud?
Unfortunately not, unless you follow a certain design pattern in your schemas.
For instance, you can (and should) treat the txInstant associated with each transaction as the date at which Datomic learned of a fact, but never as a domain time. Domain times should go into your database schemas instead.
Accounting is a classic example of this. In Datomic, you would want txInstants to represent when you learned about the existence of, e.g., some asset, while the domain time should represent when that asset actually existed. This can be kept in your schema (in XTDB, by contrast, you can explicitly set a valid-time when transacting the entity representing this asset).
That practically gives you bitemporality, but valid times will not be first-class citizens the way they are in XTDB.
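A minimal sketch of that pattern, assuming the Datomic client API and XTDB 1.x. The attribute names (:asset/id, :asset/acquired-at), the conn and node handles, and the sample values are hypothetical; the point is that domain time lives as an ordinary attribute while :db/txInstant keeps recording when Datomic learned the fact.
```
(require '[datomic.client.api :as d])

;; Domain time is just another attribute in the schema; :db/txInstant keeps
;; recording when Datomic learned the fact.
(def asset-schema
  [{:db/ident       :asset/id
    :db/valueType   :db.type/string
    :db/cardinality :db.cardinality/one
    :db/unique      :db.unique/identity}
   {:db/ident       :asset/acquired-at          ; hypothetical domain-time attribute
    :db/valueType   :db.type/instant
    :db/cardinality :db.cardinality/one}])

(comment
  (d/transact conn {:tx-data asset-schema})
  ;; Asserted today (txInstant = now), but the domain says the asset has
  ;; existed since mid-2021.
  (d/transact conn {:tx-data [{:asset/id "bond-123"
                               :asset/acquired-at #inst "2021-06-30"}]}))

;; In XTDB the same intent is expressed with a first-class valid-time:
(require '[xtdb.api :as xt])

(comment
  (xt/submit-tx node [[::xt/put
                       {:xt/id "bond-123" :asset/id "bond-123"}
                       #inst "2021-06-30"]]))    ; explicit valid-time
```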
In a running system it’s probably less important. I think it’s potentially more important when migrating already known facts.
Hello, I’m sure this has been brought up before, but I’m struggling to find a good definitive resource about it. I’ve heard about a 10-100 billion datoms limit on Datomic. I’m looking at Datomic as a way to store a lot of facts (maybe 10 million per day). It seems that I would very quickly run into this limit. I’m wondering what the symptoms of reaching this limit would look like? (e.g., slower and slower queries?). Small added question: Is an update to a datom considered a new datom? So if I am making a lot of updates to values (again millions per day), could this not be a good use case for Datomic?
^ use case & domain matter a lot. especially what features of datomic you see as most relevant. it’s useful to think of what bounds your transactional requirements. you can run w/multiple datomic dbs, so long as you don’t typically transact in ways where you have to coordinate across multiple dbs. time windows, regional splits, customer/tenancy, etc all could imply different sharding strategies for separate dbs (or partition strategies in pro to handle keeping queries on large dbs zippy). also what are your needs for accessing historical data? an on-going application request/response cycle for which as-of queries remain critical? or would way back as-of questions mostly see data warehouse or dev environment use? also do you ask substantive granular questions of everything that comes in the fire hose, or mostly just about extracted or aggregated info? it’s not uncommon for datomic systems to ingest only selections or transformations of data that’s being captured in the raw w/something like a high throughput event stream.
Without knowing more context, 10m facts/day also sounds like the type of data that a time series database is far better able to handle than Datomic
To answer your question about what constitutes a datom: you're correct, each update is a new datom.
Hitting 10 billion datoms in 1000 days is manageable with any one of the sharding strategies Ben suggested. Regarding what happens: some of your fetches need to reach a little further down the index tree the more datoms you have. My understanding is that there isn't a hard limit.
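To make the "each update is a new datom" point concrete, here is a small sketch against a hypothetical :asset/price attribute (conn and the schema are assumed): the second transact retracts the old value and asserts a new one, and both remain visible in the history database.
```
(require '[datomic.client.api :as d])

(comment
  (d/transact conn {:tx-data [{:asset/id "bond-123" :asset/price 99.5}]})
  (d/transact conn {:tx-data [{:asset/id "bond-123" :asset/price 100.25}]})

  ;; The current db sees only the latest price; the history db sees every
  ;; assertion and retraction, with ?added marking which is which.
  (d/q '[:find ?price ?added
         :where
         [?e :asset/id "bond-123"]
         [?e :asset/price ?price _ ?added]]
       (d/history (d/db conn))))
```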
Really appreciate the responses. Yes, this is perhaps data that a time series database might handle better. For context, this is finance data for fixed income securities. Each day’s return can be split by risk factor. Each security could have up to 50 risk factor returns per day. Let’s say the total security universe is around 1 million (however, this could be bigger with more history). So, 5 million datoms per day.
I am making calculations (big map / reduce) over this data. However, some of the reduce components are dynamic (decisions are made by the user at query time). What is appealing about Datomic is the flexibility in the query language. Secondarily, features like as-of queries will be useful for deterministically understanding results that were run in the past.
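For the as-of part, a sketch under assumed names (a :return/* schema for per-security, per-risk-factor daily returns, plus a conn handle): record the basis t alongside a calculation's results, and later pin the database to that t so the same query reproduces the same answer regardless of how many datoms have been transacted since.
```
(require '[datomic.client.api :as d])

;; Hypothetical shape for per-security, per-risk-factor daily returns.
(def return-schema
  [{:db/ident       :return/security
    :db/valueType   :db.type/string
    :db/cardinality :db.cardinality/one}
   {:db/ident       :return/factor
    :db/valueType   :db.type/keyword
    :db/cardinality :db.cardinality/one}
   {:db/ident       :return/date
    :db/valueType   :db.type/instant
    :db/cardinality :db.cardinality/one}
   {:db/ident       :return/value
    :db/valueType   :db.type/double
    :db/cardinality :db.cardinality/one}])

(comment
  ;; Record the basis t alongside a calculation's results ...
  (let [db      (d/db conn)
        basis-t (:t db)]
    ;; ... and later pin the database to that t to reproduce the result.
    (d/q '[:find ?security (sum ?return)
           :with ?r
           :where
           [?r :return/date     #inst "2023-10-05"]
           [?r :return/security ?security]
           [?r :return/value    ?return]]
         (d/as-of (d/db conn) basis-t))))
```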