Is there a way to see how many bytes a datom or entity consumes at rest in Datomic? Or do I have to estimate it from Fressian types?
You estimate. Fressian has a value cache too, and there’s also gzipping, so the at-rest size isn't a simple sum of encoded value sizes.
A better approach might be to look at your consumption in aggregate using metrics rather than counting calories per datom. Datomic is not a system optimized for minimizing storage use.
https://docs.datomic.com/operation/monitoring.html#transactor-metrics note the metrics ending in Bytes
Keep in mind: data is written to a log when transacted (as Fressian) and simultaneously added to a memory index (as JVM objects). That data isn’t written into an index as Fressian until later.
Disk layout matters more on a typical SQL storage system because SQL usually reads directly off the disk format (Datomic does not), it only has row-oriented storage (Datomic does not: AEVT is column-oriented), and it updates all disk structures and indexes while committing the transaction (Datomic does not: indexing is an asynchronous batch task).
Any tips for better understanding :db.unique/value? I've never been 100% clear on the exact use cases for it. To me :db.unique/identity makes more sense intuitively: you have something that represents the identity of an entity, and this is useful because it lets you use it for upserts, so I don't need to find out the :db/id of something before adding data about it. A common case for me is that I have some external service's data that I just want to put in my db, so I can conveniently use their ids. In my usage of Datomic I've used this a lot. There were very interesting threads here about whether that's good or not, and I think they sparked the realization that I don't understand :db.unique/value well enough.
:db.unique/value doesn't allow upserts. I feel tempted to erroneously think of this as "doesn't allow updates to the entity", because in the above situation where I'm just transacting some external service's data, it will succeed the first time and fail on every later attempt. Of course this is the wrong way to think about it: the entity can still have any number of other datoms "updated", you are just forced to explicitly use the :db/id. So you will be often forced to look up an entity before transacting info about it.
I think there are cases I've hit where :db.unique/value feels like the right thing to use, but it's not as clear in my head exactly why. The docs at https://docs.datomic.com/schema/schema-reference.html#db-unique-value leave me wanting a little more. Basically, if I had to write my own paragraph about :db.unique/value and when to use it in practice, I don't think it would be that coherent, and I'm hoping to fill this gap.
Looking at https://github.com/Datomic/mbrainz-importer/blob/master/subsets/entities/schema.edn it seems like unique values are used there almost as "enum value constant entities", but it's not 100% clear to me why. Why would disabling upsert behavior be preferred here?
Use it when you don't want upserts. One example from our code base is :command/uuid, which should never change once the command has been run. Here we actively want the transaction to fail if we mistakenly try to run the same command more than once.
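To make the difference concrete, here is a minimal sketch of the two uniqueness modes as schema and transaction data. Only :command/uuid comes from the message above; :external/id, the string id, and the UUID literal are made up for illustration, and the error description is from memory rather than quoted output.

```clojure
;; Schema sketch. :external/id is a hypothetical attribute.
[{:db/ident       :external/id
  :db/valueType   :db.type/string
  :db/cardinality :db.cardinality/one
  :db/unique      :db.unique/identity}   ; upsert allowed
 {:db/ident       :command/uuid
  :db/valueType   :db.type/uuid
  :db/cardinality :db.cardinality/one
  :db/unique      :db.unique/value}]     ; no upsert

;; Transacting this twice is fine: the second time, the map's
;; implicit tempid resolves to the entity that already has this id.
[{:external/id "abc-123"}]

;; Transacting this twice should fail the second time with a
;; uniqueness conflict, because the tempid would become a second
;; entity claiming the same value.
[{:command/uuid #uuid "5f2b7c1e-0000-4000-8000-000000000001"}]
```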
> So you will be often forced to look up an entity before transacting info about it.
Somebody has to look it up; upsert just makes the transactor do the lookup. Peers are scalable places, transactors are not.
ultimately this comes down to whether your entity has a lifecycle (create/update/delete). If your entity has a lifecycle, upsert is a very cool way for entities to get into incoherent states (e.g. you delete it, then some racing update adds back the identity and some random attribute)
Very simple, timeless, stateless entities are a good fit for :db.unique/identity; for example, :db/ident is itself :db.unique/identity.
I think it is a very overused footgun though, and :db.unique/value should be your default for schema and operation modeling
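The delete-then-racing-update problem described above can be sketched as two transactions. This is an illustration only: :user/email and :user/last-seen are hypothetical attributes, and :user/email is assumed to be :db.unique/identity.

```clojure
;; tx 1 (process A): the entity reaches the end of its lifecycle.
[[:db/retractEntity [:user/email "a@example.com"]]]

;; tx 2 (process B, racing, unaware of the delete):
[{:user/email     "a@example.com"
  :user/last-seen #inst "2020-01-01T00:00:00Z"}]

;; No entity holds that email any more, so tx 2 does not fail.
;; It quietly creates a brand-new entity carrying only these two
;; attributes: the "identity plus some random attribute" state.
```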
Note that :db/add and :db/retract and the map syntax all accept [:lookup "refs"] (and ident keywords) for db ids, so it's still wrong that you "have" to look it up yourself. You will just get an error if it doesn't find anything, instead of an implicit assertion.
{:db/id [:my-attr "my-value"] ...} is fine
in fact it adds extra safety in case an update comes after a delete
because it requires that some entity still asserts that attribute and value
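A short sketch of the lookup-ref forms mentioned above. :my-attr is taken from the earlier message and is assumed to be a unique attribute; :my-other-attr and its values are hypothetical.

```clojure
;; Map form: address an existing entity through a lookup ref.
[{:db/id         [:my-attr "my-value"]
  :my-other-attr "new value"}]

;; List forms: the lookup ref sits in the entity position.
[[:db/add     [:my-attr "my-value"] :my-other-attr "new value"]]
[[:db/retract [:my-attr "my-value"] :my-other-attr "old value"]]

;; If no entity currently has :my-attr "my-value" (for example,
;; because it was retracted earlier), the transaction is rejected
;; instead of creating a new entity.
```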
I think @favila and I place different value on :db.unique/identity but I agree wholeheartedly with his warning of the dangers of assigning a unique identity to a thing that can be deleted at the end of a lifecycle. Something with a unique identity should be very permanent. If you are in the habit of modeling with permanent entities then you can consider it.
For example, one of our partners tells us about "generator" entities for which they are the authority. Sometimes they tell us about the address of the generator. Sometimes they tell us about the power rating of the generator. The order in which we learn these indisputable facts varies, but the identifier provided by the partner is the same. With upsert, I don't care about the order either: I'm simply recording indisputable, indelible facts incrementally/asynchronously about the entity. When recording permanent facts about a permanent entity, reading before writing is, IMHO, incidental complexity. Datomic upsert can remove that complexity and free you from the burden (cognitive, and code) of tracking whether or not you have already encountered the entity.
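As a rough sketch of that incremental style, the two partner messages could be recorded like this. All attribute names and values here are hypothetical, and :partner/generator-id is assumed to be :db.unique/identity.

```clojure
;; Message about the address arrives:
[{:partner/generator-id "GEN-001"
  :generator/address    "1 Example Road"}]

;; Message about the power rating arrives:
[{:partner/generator-id   "GEN-001"
  :generator/power-rating 500}]

;; Whichever transaction runs first creates the entity; the other
;; upserts onto it. No read-before-write is needed in either order.
```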
Be very careful using :db.unique/identity for entities that you "birth". Use it for entities that have an existence independent of your application and which have an indisputable authority that has given them their identifier. To reinforce this point, in our applications we name schema attributes by the authority who governs them (usually their issuance, in the case of unique identifiers).
Thanks all, these are super useful and give me some stuff to think about!