#datomic
2020-04-06
denik00:04:46

Is there a way to have optional inputs in datomic?? In my case, I'd like the where clauses that include undefined inputs to be ignored:

(d/q
 '[:find ?e
   :in $ ?tag ?more ?other
   :where
   [?e :ent/foo ?foo]
   [?e :ent/tags ?tag]
   [?e :ent/more ?more]
   [?e :ent/other ?other]]
 (d/db conn)
 :tag
 ; skip :more
 :other
 )
Here the value for ?more should not be passed but because the value for ?other follows, ?more is interpreted as ?other. Passing nil has not worked for me. It seems that it is then used as the value to match in the query engine. Do I need to write a different query for each optional input?

favila00:04:58

Generally yes: you can also use a sentinel value to indicate “don’t use” together with an or clause or a rule

denik01:04:14

thanks @favila I'm running into some issues doing so. For example, or demands a join variable. Could you point me at an example or some pseudocode?

denik01:04:50

This also breaks down when using inputs as arguments to predicate functions, e.g. >. Now my sentinel value has to be a number, which would backfire, or otherwise the query engine will throw an exception.

favila01:04:32

I maybe misunderstand your use case. Your example query doesn’t make sense to me: how do you have a query which has no ?more clauses and isn’t called with that arg as input, but still has it in the :in? How did you get in this situation? What I thought you were talking about is a scenario like this:

denik01:04:04

> I'd like the where clauses that include undefined inputs to be ignored

denik01:04:17

But I'm also fine wrapping those values in or-like clauses. However, that hasn't been working well due to numerous edge-cases

favila01:04:16

(q [:find ?e
:in $ ?id ?opt-filter
:where
[?e :id ?id]
(or-join [?e ?opt-filter]
   (And
    [(ground ::ignore) ?opt-filter]
    [?e])

    (And
        (Not [(ground ::ignore) ?opt-filter])
             [?e :attr ?opt-filter]))]
Db 123 ::ignore)

favila01:04:29

Pls excuse everything, I’m on a phone

denik01:04:13

Just retyped this for clarity

(d/q '[:find ?e
       :in $ ?id ?opt-filter
       :where
       [?e :id ?id]
       (or-join [?e ?opt-filter]
                (and
                 [(ground ::ignore) ?opt-filter]
                 [?e])

                (and
                 (not [(ground ::ignore) ?opt-filter])
                 [?e :attr ?opt-filter]))]
     (d/db conn) 123 ::ignore)

favila01:04:16

Your example query doesn’t have clauses using undefined input, it has a slot for an input that you don’t fill

favila01:04:24

That’s what confuses me

denik01:04:54

right, because hash-map-like destructuring is not supported in :in

favila01:04:24

Where did ?more come from? Was this query built with code?

denik01:04:13

No, rather the query is supposed to stay as it is but inputs should be nullable / or possible to be disabled

favila01:04:28

But no where clause uses ?more

favila01:04:45

It is already not used, so why is it input?

denik01:04:56

oh my bad, there was supposed to be a clause for ?more

denik01:04:09

I had simplified from application code

favila01:04:39

Ah ok so it is the case I’m thinking of

favila01:04:10

There’s some user-supplied optional filtering field, and you want the query to handle ignoring it

denik01:04:50

yes, you can think of it as a cond->-like where

favila01:04:52

Options are sentinel value and a pair of rules which explicitly exclude each other by sentinel match or mismatch
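The pair-of-rules option never gets typed out in the thread; a sketch, reusing the ::ignore sentinel and the placeholder :id/:attr attributes from the or-join example, might look like this (rule name and attributes are assumptions, not from the thread):

```clojure
;; Two rule bodies with the same head; the sentinel match/mismatch
;; guarantees exactly one body applies for a given ?v.
(def rules
  '[[(opt-filter ?e ?v)
     [(ground ::ignore) ?v]   ; sentinel supplied: accept every ?e
     [?e]]
    [(opt-filter ?e ?v)
     (not [(ground ::ignore) ?v])
     [?e :attr ?v]]])         ; real value supplied: filter on :attr

(d/q '[:find ?e
       :in $ % ?id ?opt-filter
       :where
       [?e :id ?id]
       (opt-filter ?e ?opt-filter)]
     (d/db conn) rules 123 ::ignore)
```

Because both bodies share one head, the query engine unions their results, so the call site stays identical whether or not the filter is in play.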

favila01:04:17

Or build the query dynamically using e.g. cond->

favila01:04:41

For the latter choice using the map form is more convenient

favila01:04:11

Huh that wasn’t meant for the channel

denik01:04:21

Yes, I think I'll have to do that. Unfortunately that means either lots of quoting or, in the macro case, hairy s-expr parsing to handle collection inputs like [?tags ...]

favila01:04:46

Don’t inline the inputs

favila01:04:02

Just optionally exclude or include static clauses

denik01:04:02

Because query cache?

favila01:04:17

Yes also less tricky

denik01:04:51

ok thanks a lot for thinking this through with me @favila!

favila01:04:58

(cond-> [] :always (conj '[?e :foo ?foo]) tags (conj '[?e :tag ?tag]))

favila01:04:07

E.g. for the :where

favila01:04:18

Then the :in is like

favila01:04:28

(cond-> '[$] tags (conj '[?tag ...]))

favila01:04:49

With proper formatting it’s easy to read

favila01:04:49

And using the map form, each major clause can be a vector instead of having to stick together a vector positionally

favila01:04:36

{:find [?e] :in (cond-> [$] ,,,) :where (cond-> [] ,,,)}

denik01:04:17

it works but it's quite ugly

(defn query-words [{:keys [tags since]}]
  (let [tags (ensure-set tags)
        [dur-num dur-unit] since]
    (db/q {:find  '[?we]
           :in    '[$ ?dur-num ?dur-unit [?tags ...]]
           :where (cond-> ['[?we :word/ticks ?ts]
                           '[?we :word ?w]]
                    since (into '[[?ts :tick/when-ms ?tw]
                                  [(tick.util.time/since? ?dur-num ?dur-unit ?tw)]])
                    tags (into '[[?ts :tick/source ?src]
                                 [?src :source/tags ?tags]]))}
          dur-num
          dur-unit
          tags)))

arohner15:04:10

I’m considering a workload that uses kafka, where a topic is already partitioned N ways. Is creating N databases on the same transactor to increase scalability a good approach?

ghadi15:04:16

can you give more details about the workload?

arohner15:04:31

we don’t really have enough details yet, but it’s a high-volume financial service. We’re using kafka and partitioning, so I’d like to have a story for how to scale up datomic if necessary

arohner15:04:24

why is multiple databases not a good idea? I get that a single transactor machine is a limitation, but my assumption is that hardware these days can scale out across cores decently well

ghadi15:04:50

i wouldn't be comfortable weighing mechanisms without a problem statement

ghadi16:04:22

the missing part is: good idea for what

ghadi16:04:39

all this talk of scaling is abstract

ghadi16:04:11

(Disclaimer: I don't work on the Datomic team, but I use it daily and am here to hold up a mirror)

arohner16:04:41

ok, what are the recommended ways of scaling up datomic deployment? We have lots of kafka topics, lots of kafka partitions. Lots of services that read from a topic, do some processing, and potentially write to the DB. Are there any other options aside from “deploy more transactors” and “scale up the transactor hardware”?

arohner16:04:49

multiple databases seems appealing as a middle ground because it seems to reduce ops load somewhat, and my data is already sharded, and it seems like it should work to remove one bottleneck in the system

johnj16:04:17

FWIW, I recall someone from the Datomic team saying here that Datomic is not designed to run multiple DBs performantly

arohner16:04:31

yes, this is abstract and fuzzy because I don’t have a production system yet. I’m the CTO, so I have to be able to tell the CEO “yes, in 3 years when we hit our growth numbers, we won’t hit a wall”

favila16:04:52

A limitation of on-prem multi-db-per-transactor loads is that transactor indexing work isn’t scheduled or scaled evenly across dbs

ghadi16:04:26

need example transactions and example queries

ghadi16:04:48

integrity/key constraints

ghadi16:04:01

Cloud vs. On-Prem?

ghadi16:04:36

transactions/sec datoms/tx

ghadi16:04:50

@arohner wdym by "data is already sharded"?

ghadi16:04:12

to be clear, I didn't mean that N databases is a bad idea

ghadi16:04:36

only err_insufficient_context

arohner16:04:33

“data is already sharded” == we’re running on kafka. Kafka topics are partitioned N ways, typically 10-100. So in a stream of messages, one partition doesn’t see all messages, it sees 1/Nth of total load

ghadi17:04:03

what about downstream of that?

arohner17:04:31

I’m not sure what you mean

favila17:04:40

kafka topic partitioning doesn’t imply anything about the locality of that data as you intend to use it later

arohner17:04:11

Each topic is partitioned such that it doesn’t need access to other partitions. If it did, that would get in the way of scaling

ghadi17:04:24

understood - but like @favila said, the kafka topic partitioning is on dead unindexed data

ghadi17:04:54

the query patterns will determine what data needs to be colocated in the same Datomic DB

ghadi17:04:30

I'm using Datomic Cloud on one project right now, with 200 DBs in the same cluster

johnj19:04:36

those are 200DBs running on a single node?

ghadi19:04:12

single Datomic Cloud system

ghadi17:04:16

we don't anticipate needing cross-database querying

ghadi17:04:36

I'm not sure how the financial data you're storing needs to be aggregated

Braden Shepherdson17:04:33

so I have a domain ID that's auto-incrementing. I want to make a concurrency-safe way to add a new entity. I suppose there's no way to make a query inside the transactor? That would be ideal, if I could find the max and increment it. failing that, it feels like the best plan is to query the max ahead of time, increment it, and then transact

[[:db/cas "new-thing" :my/domain-id nil new-domain-id]
 {:db/id "new-thing" etc etc}]
does that work? is there a better way?

fmnoise07:04:07

you can make tx-function for that

fmnoise07:04:31

{:db/ident :generate-id
 :db/doc "Generates a unique sequential id for given attribute and temp id"
 :db/fn #db/fn{:lang "clojure"
               :params [db attribute entid]
               :code
               (let [attr (d/attribute db attribute)]
                 (when (and (not (string? entid))
                            (get (d/entity db entid) attribute))
                   (throw (IllegalArgumentException.
                           (str "Entity already has id " (get (d/entity db entid) attribute)))))
                 (if (and (= (:value-type attr) :db.type/long)
                          (= (:unique attr) :db.unique/identity))
                   (let [id (->> (map :v (d/datoms (d/history db) :avet attribute))
                                 (reduce max 0)
                                 inc)]
                     [[:db/add entid attribute id]])
                   (throw (ex-info (str "Invalid attribute " attribute)
                                   {:attr attr}))))}}

fmnoise07:04:35

working example ☝️:skin-tone-2:

fmnoise07:04:45

then you call it like this

fmnoise07:04:56

(d/transact 
 conn 
 [[:generate-id :order/id tempid]
  {:db/id tempid
   :order/customer customer-eid
   :order/items ...}])

Braden Shepherdson17:04:20

(and then wrap that whole process in a retry loop)

ghadi17:04:51

probably need a retry loop with CAS, but your increment + your other transaction data need to be in the same tx

Braden Shepherdson17:04:23

oh, sorry. by "find the max and increment it" I meant in memory

ghadi17:04:48

but, yes, you need to pull it and inc it, conj that with your regular transaction data, and if that fails try it again

ghadi17:04:11

you can search for :cognitect.anomalies/conflict in the ex-data, I believe @braden.shepherdson

Braden Shepherdson18:04:18

cas doesn't seem to like tempids? :db.error/not-a-keyword Cannot interpret as a keyword: new thing, no leading :

favila18:04:44

correct. CAS cannot work on tempids. CAS is a transaction function. Tempid resolution cannot occur until all transaction functions have run. Therefore CAS cannot work on a tempid

favila18:04:21

More specifically, no transaction function that reads an entity can work on a tempid, since it is unknown until the end of the tx what entity that tempid resolves to.

favila18:04:48

(A tx fn can still emit tempids or do anything with them that doesn’t require resolving them)

Braden Shepherdson18:04:18

I understand, thanks for the insight. is there another way to solve my need for an incrementing domain ID?

favila18:04:24

you need a tx fn

favila18:04:19

or cas outside the transaction, like @U050ECB92 mentioned

favila18:04:56

The most robust pattern I have implemented is: have a counter entity with a unique id attr, a no-history nonce attr and an attr to hold the current value.

favila18:04:38

then have a tx fn with a signature like [db data-entity target-attr counter-entity]

favila18:04:39

it emits [:db/add counter-entity nonce-attr random-nonce] [:db/add data-entity target-attr counter-value+1] [:db/add counter-entity counter-attr counter-value+1]

favila18:04:16

this atomically increments and assigns the counter, and also protects against two things trying to increment the counter in the same tx
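A sketch of this counter pattern, with assumed :counter/* attribute names and an assumed :assign-next-id ident (none of these names come from the thread):

```clojure
;; Schema: a counter entity with a unique id, a :db/noHistory nonce
;; attr, and the current value (per favila's description above).
[{:db/ident       :counter/id
  :db/valueType   :db.type/keyword
  :db/cardinality :db.cardinality/one
  :db/unique      :db.unique/identity}
 {:db/ident       :counter/nonce
  :db/valueType   :db.type/uuid
  :db/cardinality :db.cardinality/one
  :db/noHistory   true}
 {:db/ident       :counter/value
  :db/valueType   :db.type/long
  :db/cardinality :db.cardinality/one}]

;; Tx fn: atomically bump the counter and assign the new value to the
;; data entity. Writing a fresh nonce makes two increments of the same
;; counter within one tx conflict instead of silently colliding.
{:db/ident :assign-next-id
 :db/fn #db/fn{:lang "clojure"
               :params [db data-entity target-attr counter-entity]
               :code (let [v  (or (:counter/value (d/entity db counter-entity)) 0)
                           v' (inc v)]
                       [[:db/add counter-entity :counter/nonce (java.util.UUID/randomUUID)]
                        [:db/add data-entity target-attr v']
                        [:db/add counter-entity :counter/value v']])}}
```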

favila18:04:43

there are other ways to do this also

favila18:04:10

you can also run this before issuing the tx, and have a cas on the counter entity to the new max counter id

favila18:04:47

but be careful about composing multiple counter issuances together

favila18:04:14

repeating a cas twice isn’t going to cause a conflict, and will end up “issuing” the same number twice
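The cas-outside-the-transaction variant might be sketched as follows (again with assumed :counter/* names, and no retry loop shown):

```clojure
;; Read the current counter value from a db snapshot, then guard the
;; increment with :db/cas. If another writer bumped the counter first,
;; the cas fails and the whole tx must be retried with a fresh read.
(let [db      (d/db conn)
      counter [:counter/id :order-id]
      v       (or (:counter/value (d/entity db counter)) 0)
      v'      (inc v)]
  @(d/transact conn
               [[:db/cas counter :counter/value v v']
                {:db/id "new-order" :order/id v'}]))
```

Per the warning above, two copies of the same cas in one tx won’t conflict with each other, so don’t compose multiple issuances this way.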

Braden Shepherdson18:04:36

I follow. the operation to add this value is rare enough, and the "table" small enough, that I'm prepared to query for the current max domain ID inside a transactor function.

Braden Shepherdson18:04:50

got that approach working nicely