#xtdb
2021-08-02
Tuomas04:08:21

I'm trying to implement a batch put transaction fn that takes an optimal batch (1-1000 documents iirc?), enforces unique attributes, and either fails or succeeds completely. After validating the input documents, my first intuition for checking the db for unique attribute value violations is to use a collection binding:

(q '{:find [e]
     :in [[unique-attribute ...]]
     :where [[e :unique-attribute unique-attribute]]
     :limit 1}
   vector-of-unique-values)
Now, if there are multiple unique attributes, I'm not sure whether I should combine them into the same query or run one query per attribute (considering not only latency but also the load these queries incur). In any case, when I try to combine them into the same query I have trouble understanding how it should be done. I get errors like "Or requires same logic variables" and "Or join must not have free-args":
(q '{:find [e]
     :in [[unique1 ...] [unique2 ...]]
     :where [(or-join [unique1 unique2]
                      [e :unique1 unique1]
                      [e :unique2 unique2])]
     :limit 1}
   uniques1
   uniques2)

Tuomas05:08:34

I'm getting good enough performance with this, but I'd appreciate comments:

(some
    not-empty
    (for [[attribute values] [[:unique1 (->> (range 1000) (map hash) (map str))]
                              [:unique2 (->> (range 1000) (map hash) (map str))]]]
      (time (q {:find '[e]
                :in '[[attribute ...]]
                :where [['e attribute 'attribute]]
                :limit 1}
               values))))

refset08:08:25

Hi, the compiler complains that "or-join requires the same variables in each leg" because that is a fundamental principle of Datalog. It doesn't really care how each leg uses the variables, though, so you can add no-op clauses like this to keep the compiler happy 🙂

(q '{:find [e]
     :in [[unique1 ...] [unique2 ...]]
     :where [(or-join [unique1 unique2]
                      (and [e :unique1 unique1]
                           [(any? unique2)])
                      (and [e :unique2 unique2]
                           [(any? unique1)]))]
     :limit 1}
   uniques1
   uniques2)

🤯 2
refset17:08:59

I've since added a note (unpublished) about this to the docs, which was long overdue: https://github.com/juxt/crux/commit/b4ec77580fd1119961fce9c19a86a8596b5175e1 Thanks for the prompt!

🙏 2