This page is not created by, affiliated with, or supported by Slack Technologies, Inc.
2020-04-06
Channels
- # announcements (1)
- # babashka (7)
- # beginners (93)
- # bristol-clojurians (1)
- # cider (7)
- # clj-kondo (42)
- # cljs-dev (9)
- # clojure (67)
- # clojure-europe (4)
- # clojure-france (4)
- # clojure-germany (2)
- # clojure-italy (3)
- # clojure-nl (10)
- # clojure-uk (62)
- # clojurescript (11)
- # clojurex (3)
- # conjure (77)
- # cursive (16)
- # datomic (105)
- # docker (4)
- # editors (3)
- # events (5)
- # fulcro (34)
- # jobs (1)
- # juxt (7)
- # kaocha (7)
- # lambdaisland (3)
- # lein-figwheel (2)
- # leiningen (19)
- # malli (14)
- # meander (6)
- # mid-cities-meetup (6)
- # off-topic (20)
- # pedestal (2)
- # reagent (17)
- # reitit (7)
- # remote-jobs (1)
- # shadow-cljs (17)
- # spacemacs (23)
- # specter (2)
- # tools-deps (34)
Is there a way to have optional inputs in Datomic? In my case, I'd like the where clauses that include undefined inputs to be ignored:
(d/q
 '[:find ?e
   :in $ ?tag ?more ?other
   :where
   [?e :ent/foo ?foo]
   [?e :ent/tags ?tag]
   [?e :ent/more ?more]
   [?e :ent/other ?other]]
 (d/db conn)
 :tag
 ; skip :more
 :other)
Here the value for ?more should not be passed, but because the value for ?other follows, ?more is interpreted as ?other. Passing nil has not worked for me; it seems it is then used as the value to match by the query engine. Do I need to write a different query for each optional input?
Generally yes. You can also use a sentinel value to indicate “don’t use” together with an or clause or a rule.
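As a rough sketch of the rule approach (names are illustrative; ::skip is an assumed sentinel keyword, and the attributes are borrowed from the example query above — not a definitive implementation):

```clojure
;; Two definitions of the same rule act as an `or`: the first branch
;; fires when the input is the ::skip sentinel, the second matches the
;; attribute normally.
(def rules
  '[[(maybe-tag ?e ?tag)
     [(ground ::skip) ?tag]
     [?e]]
    [(maybe-tag ?e ?tag)
     [(not= ::skip ?tag)]
     [?e :ent/tags ?tag]]])

(d/q '[:find ?e
       :in $ % ?tag
       :where
       [?e :ent/foo]
       (maybe-tag ?e ?tag)]
     (d/db conn) rules ::skip) ; pass ::skip to disable the filter
```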
thanks @favila I'm running into some issues doing so. For example, or demands a join variable. Could you point me at an example or some pseudocode?
This also breaks down when using inputs as arguments to functions, i.e. >. Now my sentinel value has to be a number, which would backfire, or otherwise the query engine will throw an exception.
I may be misunderstanding your use case. Your example query doesn’t make sense to me: how do you have a query which has no ?more clauses and isn’t called with that arg as input, but still has it in the :in? How did you get into this situation? What I thought you were talking about is a scenario like this:
But I'm also fine wrapping those values in or-like clauses. However, that hasn't been working well due to numerous edge cases
(q [:find ?e
:in $ ?id ?opt-filter
:where
[?e :id ?id]
(or-join [?e ?opt-filter]
(And
[(ground ::ignore) ?opt-filter]
[?e])
(And
(Not [(ground ::ignore) ?opt-filter])
[?e :attr ?opt-filter]))]
Db 123 ::ignore)
Just retyped this for clarity
(d/q '[:find ?e
       :in $ ?id ?opt-filter
       :where
       [?e :id ?id]
       (or-join [?e ?opt-filter]
         (and
          [(ground ::ignore) ?opt-filter]
          [?e])
         (and
          (not [(ground ::ignore) ?opt-filter])
          [?e :attr ?opt-filter]))]
     (d/db conn) 123 ::ignore)
Your example query doesn’t have clauses using undefined input, it has a slot for an input that you don’t fill
No, rather the query is supposed to stay as it is but inputs should be nullable / or possible to be disabled
There’s some user-supplied optional filtering field, and you want the query to handle ignoring it
Options are sentinel value and a pair of rules which explicitly exclude each other by sentinel match or mismatch
Yes, I think I'll have to do that. Unfortunately that either means lots of quoting or, in the macro case, hairy s-expr parsing to handle collection inputs like [?tags ...]
And using the map form, each major clause can be a vector instead of having to stick together a vector positionally
it works but it's quite ugly
(defn query-words [{:keys [tags since]}]
  (let [tags (ensure-set tags)
        [dur-num dur-unit] since]
    (db/q {:find '[?we]
           :in '[$ ?dur-num ?dur-unit [?tags ...]]
           :where (cond-> ['[?we :word/ticks ?ts]
                           '[?we :word ?w]]
                    since (into '[[?ts :tick/when-ms ?tw]
                                  [(tick.util.time/since? ?dur-num ?dur-unit ?tw)]])
                    tags (into '[[?ts :tick/source ?src]
                                 [?src :source/tags ?tags]]))}
          dur-num
          dur-unit
          tags)))
I’m considering a workload that uses kafka, where a topic is already partitioned N ways. Is creating N databases on the same transactor to increase scalability a good approach?
we don’t really have enough details yet, but it’s a high-volume financial service. We’re using kafka and partitioning, so I’d like to have a story for how to scale up datomic if necessary
why is multiple databases not a good idea? I get that a single transactor machine is a limitation, but my assumption is that hardware these days can scale out across cores decently well
(Disclaimer: I don't work on the Datomic team, but I use it daily and am here to hold up a mirror)
ok, what are the recommended ways of scaling up datomic deployment? We have lots of kafka topics, lots of kafka partitions. Lots of services that read from a topic, do some processing, and potentially write to the DB. Are there any other options aside from “deploy more transactors” and “scale up the transactor hardware”?
multiple databases seems appealing as a middle ground because it seems to reduce ops load somewhat, and my data is already sharded, and it seems like it should work to remove one bottleneck in the system
FWIW, I recall someone from Datomic saying here that Datomic is not designed to run multiple DBs performantly
yes, this is abstract and fuzzy because I don’t have a production system yet. I’m the CTO, so I have to be able to tell the CEO “yes, in 3 years when we hit our growth numbers, we won’t hit a wall”
A limitation of on-prem multi-db-per-transactor loads is that transactor indexing work isn’t scheduled or scaled evenly across dbs
“data is already sharded” == we’re running on kafka. Kafka topics are partitioned N ways, typically 10-100. So in a stream of messages, one partition doesn’t see all messages, it sees 1/Nth of total load
kafka topic partitioning doesn’t imply anything about the locality of that data as you intend to use it later
Each topic is partitioned such that it doesn’t need access to other partitions. If it did, that would get in the way of scaling
understood - but like @favila said, the kafka topic partitioning is on dead unindexed data
the query patterns will determine what data needs to be colocated in the same Datomic DB
so I have a domain ID that's auto-incrementing. I want a concurrency-safe way to add a new entity. I suppose there's no way to make a query inside the transactor? That would be ideal, if I could find the max and increment it. Failing that, it feels like the best plan is to query the max ahead of time, increment it, and then transact
[[:db/cas "new-thing" :my/domain-id nil new-domain-id]
{:db/id "new-thing" etc etc}]
does that work? is there a better way?
{:db/ident :generate-id
 :db/doc "Generates a unique sequential id for a given attribute and temp id"
 :db/fn #db/fn {:lang "clojure"
                :params [db attribute entid]
                :code
                (let [attr (d/attribute db attribute)]
                  (when (and (not (string? entid))
                             (get (d/entity db entid) attribute))
                    (throw (IllegalArgumentException.
                            (str "Entity already has id " (get (d/entity db entid) attribute)))))
                  (if (and (= (:value-type attr) :db.type/long)
                           (= (:unique attr) :db.unique/identity))
                    (let [id (->> (map :v (d/datoms (d/history db) :avet attribute))
                                  (reduce max 0)
                                  inc)]
                      [[:db/add entid attribute id]])
                    (throw (ex-info (str "Invalid attribute " attribute)
                                    {:attr attr}))))}}
(d/transact
 conn
 [[:generate-id :order/id tempid]
  {:db/id tempid
   :order/customer customer-eid
   :order/items ...}])
(and then wrap that whole process in a retry loop)
probably need a retry loop with CAS, but your increment + your other transaction data need to be in the same tx
oh, sorry. by "find the max and increment it" I meant in memory
but, yes, you need to pull it and inc it, conj that with your regular transaction data, and if that fails try it again
you can search for :cognitect.anomalies/conflict in the ex-data, I believe @braden.shepherdson
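The pull-inc-retry loop described above might be sketched like this (an illustrative, non-definitive sketch: :my/domain-id and the retry limit are assumptions, and it relies on :my/domain-id being declared :db.unique/value so two concurrent transactions can't both claim the same id):

```clojure
;; Read the current max domain id, try to transact with the next id,
;; and retry when another transaction won the race.
;; Assumes :my/domain-id is :db.unique/value, so a duplicate id conflicts.
(defn transact-with-next-id [conn tx-data]
  (loop [attempt 0]
    (let [db     (d/db conn)
          max-id (reduce max 0 (map :v (d/datoms db :avet :my/domain-id)))
          result (try
                   @(d/transact conn
                      (conj tx-data
                            [:db/add "new-thing" :my/domain-id (inc max-id)]))
                   (catch Exception e
                     ;; per the advice above, look for the conflict anomaly;
                     ;; exactly how the exception is wrapped may vary
                     (let [data (or (ex-data e) (ex-data (.getCause e)))]
                       (if (= :cognitect.anomalies/conflict
                              (:cognitect.anomalies/category data))
                         ::retry
                         (throw e)))))]
      (if (and (= ::retry result) (< attempt 10))
        (recur (inc attempt))
        result))))
```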
cas doesn't seem to like tempids? :db.error/not-a-keyword Cannot interpret as a keyword: new thing, no leading :
correct. CAS cannot work on tempids. CAS is a transaction function. Tempid resolution cannot occur until all transaction functions have run. Therefore CAS cannot work on a tempid
More specifically, a transaction function that reads an entity, since it is unknown until the end of the tx what entity that tempid resolves to.
(A tx fn can still emit tempids or do anything with them that doesn’t require resolving them)
I understand, thanks for the insight. is there another way to solve my need for an incrementing domain ID?
or cas outside the transaction, like @U050ECB92 mentioned
The most robust pattern I have implemented is: have a counter entity with a unique id attr, a no-history nonce attr and an attr to hold the current value.
it emits [:db/add counter-entity nonce-attr random-nonce] [:db/add data-entity target-attr counter-value+1] [:db/add counter-entity counter-attr counter-value+1]
this atomically increments and assigns the counter, and also protects against two things trying to increment the counter in the same tx
you can also run this before issuing the tx, and have a cas on the counter entity to the new max counter id
repeating a cas twice isn’t going to cause a conflict, and will end up “issuing” the same number twice
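A sketch of the counter-entity pattern described above, with illustrative names (:counter/id as the unique id attr, :counter/value as the current value, and a no-history :counter/nonce), assuming the counter entity already exists — not a definitive implementation:

```clojure
;; Transaction function: atomically increment the counter and assign
;; the new value to the target attribute of the data entity.
{:db/ident :assign-next-id
 :db/fn #db/fn {:lang "clojure"
                :params [db counter-id target-attr data-entity]
                :code
                (let [counter (d/entity db [:counter/id counter-id])
                      next-id (inc (:counter/value counter 0))]
                  [;; writing a fresh nonce makes two increments in the
                   ;; same tx conflict instead of silently colliding
                   [:db/add (:db/id counter) :counter/nonce
                    (str (java.util.UUID/randomUUID))]
                   [:db/add data-entity target-attr next-id]
                   [:db/add (:db/id counter) :counter/value next-id]])}}
```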
I follow. the operation to add this value is rare enough, and the "table" small enough, that I'm prepared to query for the current max domain ID inside a transactor function.
got that approach working nicely