I've used Datomic off and on for a while, and I ran into something interesting that I wanted to confirm. I thought I knew how redundancy elimination (https://docs.datomic.com/transactions/transaction-data-reference.html#redundancy-elimination) worked, but it turns out that if you do something like the following:
```clojure
;; TX 1
[[:db/add 1 :some/attr "A"]]
;; TX 2
[[:db/add 1 :some/attr "B"]
 [:db/retract 1 :some/attr "A"]]
```
The log looks something like:
```clojure
[[1 :some/attr "A" 1001 true]   ; assertion from tx 1
 [1 :some/attr "B" 1002 true]   ; assertion from tx 2
 [1 :some/attr "A" 1002 false]  ; implicit retraction of existing value
 [1 :some/attr "A" 1002 false]] ; explicit retraction, but redundant
```
To be clear, I know the log is not incorrect, but I re-read the redundancy elimination section and could read it as either excluding the observed behavior or allowing it. Can anyone confirm? I'm open to my reading comprehension being the problem, too.
Is :some/attr :db.cardinality/one?
“Excluding the observed behavior”: what would happen instead?
Nm, morning eyes
Is this the history index (d/datoms) or the transaction log (d/tx-range) that you are showing? (It doesn’t look like either)
Either way, this is redundancy elimination that isn’t happening. More details about how you got here are welcome: which Datomic product you're using, and a REPL script.
@solussd Yep, it's a cardinality-one string attribute.
@favila This is pseudocode for tx-range because I just wanted a gut check, but I'll get that info today.
Here's a repro (Datomic Pro 1.0.7469): https://gist.github.com/gws/56f141171277db2ce3ff3bf06d1703b7
@favila Is that what you were after? If we suspect it's a bug and not just my own misunderstanding, I'm happy to report it elsewhere.
Yes, that's what I wanted. The current behavior is not ideal, but it can be language-lawyered into what the docs describe by understanding what "redundant datoms in tx-data" means. This is odd phrasing because tx-data (https://docs.datomic.com/transactions/transaction-data-reference.html) doesn't contain datoms; it contains tx-fn invocation forms, add/retract forms, and maps. You can read it as "the redundant datoms produced by expanding the tx-data (input)", not including implicit retractions. The other meaning of "tx-data" is the datom output in transaction log entries (`:tx-data` returned from `d/with`, or `:data` in `d/tx-range`). Redundancy elimination ultimately serves to remove redundancy from the transaction log, but there's no guarantee (or requirement) that the transaction log have no duplicate datoms.
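A toy Python model may make this reading concrete. It is a sketch under stated assumptions, not Datomic's implementation: it assumes redundancy elimination checks each datom expanded from the input tx-data against the before-transaction database only, and that implicit cardinality-one retractions are appended afterwards without being compared to the explicit ones. `Datom`, `current_values`, and `transact` are hypothetical names. Under those assumptions it reproduces the duplicate retraction reported in the thread.

```python
from collections import Counter
from typing import NamedTuple


class Datom(NamedTuple):
    e: int
    a: str
    v: object
    tx: int
    added: bool


def current_values(log, e, a):
    """Values of attribute a on entity e asserted as of the end of `log`."""
    vals = set()
    for d in log:
        if d.e == e and d.a == a:
            if d.added:
                vals.add(d.v)
            else:
                vals.discard(d.v)
    return vals


def transact(log, tx_data, tx, card_one):
    """Hypothetical toy transactor; returns (new_log, datoms for this tx)."""
    out = []
    # Redundancy elimination (assumed): each explicit op is checked only
    # against the before-transaction state, never against this tx's datoms.
    for op, e, a, v in tx_data:
        present = v in current_values(log, e, a)
        if op == ":db/add" and not present:
            out.append(Datom(e, a, v, tx, True))
        elif op == ":db/retract" and present:
            out.append(Datom(e, a, v, tx, False))
    # Implicit retractions for cardinality-one attributes are appended
    # afterwards and are not deduplicated against explicit retractions.
    for d in list(out):
        if d.added and d.a in card_one:
            for old in sorted(current_values(log, d.e, d.a) - {d.v}):
                out.append(Datom(d.e, d.a, old, tx, False))
    return log + out, out


ATTR = ":some/attr"
log, _ = transact([], [(":db/add", 1, ATTR, "A")], 1001, {ATTR})
log, tx2 = transact(
    log, [(":db/add", 1, ATTR, "B"), (":db/retract", 1, ATTR, "A")], 1002, {ATTR}
)
for d in tx2:
    print(d)
```

This prints three datoms for tx 1002, two of which are identical retractions of "A". Re-asserting "B" or retracting "A" again in a later transaction yields no datoms at all in this model, which is the part that redundancy elimination does cover.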
Even though the log has a duplicate, the index does not. I've put together a gist to demonstrate: https://gist.github.com/favila/e55af7f3d19289758dae2c78040aaf59
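One way to picture why the index has no duplicate is a toy sketch that treats a history index as a sorted set of five-element datoms. This is a simplification and an assumption (real Datomic indexes are not Python sets); `Datom` and `history_index` are hypothetical names, and the log entries are the pseudocode values from the top of the thread.

```python
from typing import NamedTuple


class Datom(NamedTuple):
    e: int
    a: str
    v: object
    tx: int
    added: bool


# Transaction log entries as sketched at the top of the thread.
log = [
    Datom(1, ":some/attr", "A", 1001, True),
    Datom(1, ":some/attr", "B", 1002, True),
    Datom(1, ":some/attr", "A", 1002, False),  # implicit retraction
    Datom(1, ":some/attr", "A", 1002, False),  # explicit retraction (duplicate)
]


def history_index(log_entries):
    """Toy history index: datoms kept in a set, then sorted in EAVT order.

    Identical datoms collapse because a set holds each value only once.
    """
    return sorted(set(log_entries), key=lambda d: (d.e, d.a, d.v, d.tx, d.added))


index = history_index(log)
for d in index:
    print(d)
```

The log has four entries, but the toy index has three, because the two retractions are equal in every component.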
Another way to think about this: "redundancy elimination" is about not including datoms in the pending transaction that aren't needed given the before-transaction state of the database. That's different from removing duplicate datoms produced by the current transaction.
If you assert something already asserted, or retract something not currently asserted, those datoms are omitted via redundancy elimination; if you retract something that needs to be retracted but do it twice in the same transaction, that's something else.
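The two checks can be separated in a small sketch. `is_redundant` compares one operation against a before-transaction state (modeled here, as an assumption, as a set of currently asserted (e, a, v) facts), while `duplicates` looks only inside one transaction's output. Both names are hypothetical; the example values come from TX 2 in the thread.

```python
from collections import Counter


def is_redundant(before, op, e, a, v):
    """True if op would not change `before`, a set of asserted (e, a, v) facts."""
    present = (e, a, v) in before
    if op == ":db/add":
        return present  # asserting something already asserted
    if op == ":db/retract":
        return not present  # retracting something not currently asserted
    raise ValueError(f"unknown op {op!r}")


def duplicates(tx_datoms):
    """Datoms (e, a, v, tx, added) that occur more than once in one tx."""
    return [d for d, n in Counter(tx_datoms).items() if n > 1]


before_tx2 = {(1, ":some/attr", "A")}

# Neither the explicit retraction of "A" nor the new assertion is redundant...
print(is_redundant(before_tx2, ":db/retract", 1, ":some/attr", "A"))  # False
print(is_redundant(before_tx2, ":db/add", 1, ":some/attr", "B"))  # False

# ...yet TX 2's output, as reported in the thread, still holds a duplicate.
tx2 = [
    (1, ":some/attr", "B", 1002, True),
    (1, ":some/attr", "A", 1002, False),  # implicit retraction
    (1, ":some/attr", "A", 1002, False),  # explicit retraction
]
print(duplicates(tx2))
```

Redundancy is a property of an operation relative to the prior database; duplication is a property of the datoms one transaction emits, so one check says nothing about the other.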
OK, that makes sense. This came up in the context of violating an assumption made by a downstream tool processing the tx-log, but like I told the person who initially brought this to me, if you process these attribute values with set logic, there's no fundamental violation, just redundancy. Thanks for taking a look, and for the thorough explanation!
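A minimal sketch of the set-logic point, assuming a hypothetical downstream consumer that folds each transaction's datoms into a per-(entity, attribute) set of values. Because set insertion and removal are idempotent, the duplicated retraction leaves the result unchanged. `apply_tx` is an illustrative name, not part of any Datomic API.

```python
from typing import Dict, Iterable, Set, Tuple

# A datom here is a plain tuple: (e, a, v, tx, added)
Datom = Tuple[int, str, object, int, bool]
State = Dict[Tuple[int, str], Set[object]]


def apply_tx(state: State, tx_datoms: Iterable[Datom]) -> State:
    """Fold one transaction's datoms into `state` using set semantics."""
    for e, a, v, _tx, added in tx_datoms:
        values = state.setdefault((e, a), set())
        if added:
            values.add(v)
        else:
            values.discard(v)  # discarding an absent value is a no-op
    return state


tx1 = [(1, ":some/attr", "A", 1001, True)]
tx2 = [
    (1, ":some/attr", "B", 1002, True),
    (1, ":some/attr", "A", 1002, False),
    (1, ":some/attr", "A", 1002, False),  # duplicate retraction
]

with_dup = apply_tx(apply_tx({}, tx1), tx2)
# dict.fromkeys removes duplicates while preserving order
without_dup = apply_tx(apply_tx({}, tx1), list(dict.fromkeys(tx2)))
print(with_dup)
```

Both runs end with only "B" asserted, so a consumer written this way does not need the log to be duplicate-free.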
Thought of a more succinct explanation:
• Redundant datoms are redundant relative to a database they are not already part of, and necessarily have a different tx from any datom already in the database.
• Because they have a different tx, redundant datoms are by definition not duplicate datoms.
• Redundancy elimination is therefore not the same as duplicate elimination: a datom can be non-redundant (i.e. necessary to express the transition to a new database state) but still duplicated.
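The first two points can be checked with plain tuples in a small sketch. A datom is modeled here as an (e, a, v, tx, added) tuple, which is an assumption about representation only. A re-assertion of "A" in tx 1002 (the kind of datom redundancy elimination would drop) shares e, a, v, and added with the datom from tx 1001 but has a different tx, so the two are not equal; the two retractions from TX 2 match in all five components, which is what makes them duplicates.

```python
existing = (1, ":some/attr", "A", 1001, True)  # asserted in tx 1001
# Hypothetical re-assertion in tx 1002 that redundancy elimination would drop.
redundant = (1, ":some/attr", "A", 1002, True)

# Same e, a, v and added flag, but a different tx: redundant, not a duplicate.
same_fact = existing[:3] == redundant[:3] and existing[4] == redundant[4]
is_duplicate = existing == redundant

# The two retractions emitted by TX 2 are equal in every component.
implicit_retraction = (1, ":some/attr", "A", 1002, False)
explicit_retraction = (1, ":some/attr", "A", 1002, False)
retractions_duplicate = implicit_retraction == explicit_retraction

print(same_fact, is_duplicate, retractions_duplicate)
```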