#datomic
2022-05-09
lambdam09:05:44

Hello, I ran into a behaviour of Datomic composite tuples that might be problematic for my domain modeling. I have two ways of identifying an entity as unique: one with two attributes, one with three. The first attribute is shared between the two.

[;; Shared ident between unique composite tuples
 {:db/ident :foo/name
  :db/valueType :db.type/string
  :db/cardinality :db.cardinality/one}

 ;; First kind of composite tuple
 {:db/ident :foo/id
  :db/valueType :db.type/string
  :db/cardinality :db.cardinality/one}
 {:db/ident :foo/name+id
  :db/valueType :db.type/tuple
  :db/tupleAttrs [:foo/name :foo/id]
  :db/cardinality :db.cardinality/one
  :db/unique :db.unique/identity}

 ;; Second kind of composite tuple
 {:db/ident :foo/domain
  :db/valueType :db.type/string
  :db/cardinality :db.cardinality/one}
 {:db/ident :foo/code
  :db/valueType :db.type/string
  :db/cardinality :db.cardinality/one}
 {:db/ident :foo/name+domain+code
  :db/valueType :db.type/tuple
  :db/tupleAttrs [:foo/name :foo/domain :foo/code]
  :db/cardinality :db.cardinality/one
  :db/unique :db.unique/identity}
 ]
When I create some entities with the first kind (two attributes), I get the following error at transaction time :
...
:db.error/datoms-conflict Two datoms in the same transaction conflict ...
...
:d1
[17592186186287
 :foo/name+domain+code
 ["plop" nil nil]
 13194139674906
 true],
:d2
[17592186187212
 :foo/name+domain+code
 ["plop" nil nil]
 13194139674906
 true],
...
If I understand Datomic's behaviour correctly, as soon as one key of the composite tuple exists, the other ones are automatically considered as existing even if they are not present on the entity (and take the nil value)?! I thought that the composite tuple would exist if and only if all the keys are present on the entity, which doesn't seem to be the case. A workaround would be to never share keys between composite tuples. Would that be the best one? Thanks a lot.

onetom21:05:01

i was just getting tripped up on a similar situation and realized that the composite key is implicitly asserted, when i was not expecting it. it made sense though, after the realization 🙂 just to confirm your example, you are saying: EX1: when u transact

{:foo/name "NAME" :foo/id "ID"}
you expect it to actually transact
{:foo/name "NAME" :foo/id "ID"
 :foo/name+id ["NAME" "ID"]}
but what happens instead is that you get
{:foo/name "NAME" :foo/id "ID"
 :foo/name+id ["NAME" "ID"]
 :foo/name+domain+code ["NAME" nil nil]}
EX2: while when transacting
{:foo/name "NAME" :foo/domain "DOMAIN" :foo/code "CODE"}
it should mean
{:foo/name "NAME" :foo/domain "DOMAIN" :foo/code "CODE"
 :foo/name+domain+code ["NAME" "DOMAIN" "CODE"]}
but what happens instead is that you get
{:foo/name "NAME" :foo/domain "DOMAIN" :foo/code "CODE"
 :foo/name+id ["NAME" nil]
 :foo/name+domain+code ["NAME" "DOMAIN" "CODE"]}
did i understand it correctly?

onetom21:05:48

i have the feeling that since nil is an allowed value in tuples - and why wouldn't it be? - if any attribute that participates in a tuple attribute is asserted, then the corresponding tuple attribute values are implied.
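
A toy model of that reading, in plain Clojure rather than Datomic itself, may make the conflict easier to see. The `tuple-attrs` map and the `derive-composites` helper are made up for illustration; they only mimic the behaviour reported in the error above and are not how Datomic is implemented.

```clojure
;; Toy model in plain Clojure, NOT Datomic's implementation.
;; Maps each composite tuple attribute to its component attributes.
(def tuple-attrs
  {:foo/name+id          [:foo/name :foo/id]
   :foo/name+domain+code [:foo/name :foo/domain :foo/code]})

(defn derive-composites
  "Adds a composite value for every tuple attribute that has at least one
  component present on the entity; missing components become nil."
  [entity]
  (reduce-kv
    (fn [e tuple-ident components]
      (if (some #(contains? entity %) components)
        (assoc e tuple-ident (mapv #(get entity %) components))
        e))
    entity
    tuple-attrs))

(def a (derive-composites {:foo/name "plop" :foo/id "1"}))
(def b (derive-composites {:foo/name "plop" :foo/id "2"}))

(:foo/name+id a)          ;; => ["plop" "1"]
(:foo/name+domain+code a) ;; => ["plop" nil nil]

;; Both entities carry the same value for the unique three-part tuple,
;; which is the shape of the datoms-conflict reported earlier.
(= (:foo/name+domain+code a) (:foo/name+domain+code b)) ;; => true
```

Under this model, two entities that share only `:foo/name` collide on `:foo/name+domain+code` even though neither has a domain or a code.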

onetom21:05:42

i have a strong feeling though, that the way you modelled the data in question is somehow incorrect, incomplete, oversimplified, or something like that. maybe there is another entity lurking in the data model. if u acknowledge its existence by representing it as its own entity and use a :db.type/ref attribute to connect it to the current entity in question, then this problem would go away. maybe u haven't done so, because in the real-world domain, some of these entities don't have an established name? maybe because they don't map to some useful, real-world concept?

lambdam15:05:44

Thank you for your answers. For EX1 and EX2, yes, that is exactly what I meant. The strange thing is that nil isn't allowed as a value for regular idents (if I understood correctly). I don't get why it would be valid in a tuple. The annoying thing is that the composite tuple is also declared :db.unique/identity, so nil values and partially filled tuples will of course conflict with each other. My initial understanding of the behaviour of composite tuples was that one would exist iff all keys exist, but indeed it seems to exist as soon as one key exists. I use it to model uniqueness for a remote system whose information is injected into our system. The :foo/name and :foo/domain are here for namespacing. Some values need 2 segments to assert uniqueness, others need 3, with two segments of namespacing. For the time being, I circumvented the problem by using different idents for the two cases (2 segments and 3 segments). But semantically, the first one is the same.

onetom00:05:08

you can provide various constraints on entities, if you want to avoid the situations with nils in tuple attrs, using :db/ensure: https://docs.datomic.com/cloud/schema/schema-reference.html#attribute-predicates
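
A minimal sketch of what such a constraint could look like, assuming an entity spec with required attributes. The ident `:foo/three-part` is made up for this example; `:db.entity/attrs` and `:db/ensure` are the entity-spec attributes from the Datomic schema reference.

```clojure
;; Hypothetical entity spec; the ident :foo/three-part is made up for this sketch.
;; It lists the attributes an entity must have when the spec is ensured.
[{:db/ident        :foo/three-part
  :db.entity/attrs [:foo/name :foo/domain :foo/code]}]

;; Requesting the check for one entity in a transaction:
[{:foo/name   "NAME"
  :foo/domain "DOMAIN"
  :foo/code   "CODE"
  :db/ensure  :foo/three-part}]
```

This only checks that the listed attributes are present on entities where `:db/ensure` is asserted. As far as the behaviour described above goes, it would not by itself change how the composite values are derived for entities that carry only `:foo/name` and `:foo/id`.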

lambdam11:05:17

--- Also, are Datomic strings interned? Or are they repeated in the DB every time they occur? And is there a performance difference between a keyword index and a string index? Thanks

Linus Ericsson11:05:08

I am familiar with on-prem (not Datomic Cloud), so I will describe on-prem. Keywords in the transactor and peer use Clojure's standard mechanisms for keywords. I would recommend against generating keywords for indexing entities in the application - they are mostly handles for humans. (Enums and attribute names are suitable for keywords!) When strings are written to storage, they are compressed, and probably sometimes deduplicated (by the serialization formats transit or fressian) in the blocks of datoms written (blocks are up to about 65 kb in size). I don't know exactly where/when strings are interned in the peer or transactor. When you are using a string as an index, the index is realized in the transactor and in the peer. The strings need not necessarily be fully realized there (it could use tries or similar data structures), but the string content is somehow loaded into memory - either in the object cache or as more regular data structures on the heap (modulo Java's and the CPU's various string optimizations). Use strings as identity ids. Don't use generated keywords for user objects. I would not worry about the memory usage of the indexes for "normal loads", whatever that means (a Java char in an array or similar takes 2 bytes of RAM as UTF-16). If you have very special requirements for indexes, Datomic is very well suited to having indexes kept in memory on each peer, driven by the transaction log. The built-in fulltext index is such a process.

lambdam15:05:04

Thanks a lot for your precise answer.

enn19:05:09

When using a collection binding (`[?foo ...]`) in a query input, what happens if the value passed for that input is an empty collection? Is the query executed at all? It seems like it is not. If that’s so, what’s the preferred way to express a query that needs to take a collection, but also needs to be able to handle empty collections?

onetom21:05:15

Q1. what do you mean by the "query not being executed"? Q2. doesn't an empty input collection mean there is nothing to query? you might want to use something like (or [?e :attr ?foo] [(ground ::not-found) ?e]) don't know whether (ground nil) is allowed; might be...

enn21:05:10

might be easier with an example…

enn21:05:33

Here’s a simplified version of my query:

(def my-query
  '[:find ?got-here .
    :in [?test-input ...]
    :where
    (or-join [?test-input ?got-here]
      (and [(ground :match) ?test-input]
           [(ground 1) ?got-here])
      [(ground 2) ?got-here])])

enn21:05:34

(d/q my-query [:match]) returns 1, as expected

enn21:05:53

(d/q my-query [:doesnt-match]) returns 2, as expected

enn21:05:26

without having thought about it very much, I was expecting (d/q my-query []) to also return 2, that is, I expected the second clause of the or-join to match. Instead it returns nil. I guess because the [?test-input ...] construct acts like an implicit or across all values in the input, and in this case there are none
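
One way to handle this outside the query, sketched in plain Clojure: check for the empty collection before calling the query and return a default instead. The names `q-or-default` and `fake-query` are made up for this sketch; `fake-query` is a stub that only mimics the three results reported above, standing in for something like `#(d/q my-query %)`.

```clojure
;; Sketch only: `run-query` stands in for something like #(d/q my-query %).
(defn q-or-default
  "Runs the query only when `inputs` is non-empty; otherwise returns `default`."
  [run-query default inputs]
  (if (seq inputs)
    (run-query inputs)
    default))

;; Stub that mimics the results reported above, for demonstration only.
(defn fake-query [inputs]
  (cond
    (empty? inputs)         nil
    (some #{:match} inputs) 1
    :else                   2))

(q-or-default fake-query 2 [:match])        ;; => 1
(q-or-default fake-query 2 [:doesnt-match]) ;; => 2
(q-or-default fake-query 2 [])              ;; => 2
```

This keeps the query itself unchanged and makes the empty-input case an explicit decision in application code.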

onetom21:05:32

When I use a non-existent :db/ident in a pull, eg: (d/pull db-val ['*] :NON-existent), I get #:db{:id nil}, which is quite convenient. BUT, when I use a non-existent :db/ident in a query, I get an exception: > Execution error (IllegalArgumentException) at datomic.core.datalog/resolve-id (datalog.clj:330). > Cannot resolve key: :NON-existent example query:

(-> '{:find  [?referencing-entity]
        :in    [$ ?referenced-entity]
        :where [[?referencing-entity :ref/attr ?referenced-entity]]}
      (d/q (:val @dc) :NON-existent))
Is there some idiom for getting empty results in such a case? In the production application, the ?referenced-entity is never a :db/ident, but in tests, it's practically always a :db/ident...