This page is not created by, affiliated with, or supported by Slack Technologies, Inc.
2024-08-02
Channels
- # announcements (6)
- # beginners (35)
- # calva (5)
- # cider (3)
- # clj-http (1)
- # clj-kondo (52)
- # clojure (51)
- # clojure-conj (2)
- # clojure-europe (28)
- # clojure-losangeles (1)
- # clojure-norway (8)
- # clojure-uk (2)
- # cursive (12)
- # datalevin (2)
- # datomic (38)
- # emacs (7)
- # events (5)
- # gratitude (1)
- # humbleui (7)
- # hyperfiddle (23)
- # jobs (1)
- # off-topic (6)
- # portland-or (1)
- # rdf (3)
- # releases (2)
- # xtdb (3)
Is :db/noHistory working as expected? I followed this (https://docs.datomic.com/schema/schema-change.html#changing-db-nohistory), and a very simple experiment via the REPL does not appear to work as described.
I’d certainly expect it to work. Could you provide some code to reproduce your experience?
And just to be sure, what you’re experiencing is definitely not related to this passage?
Note that :db/noHistory controls the operation of future indexing jobs, and does nothing to current historical values.
(let [pub-id #uuid "48a1df75-56cc-4e35-94ed-6005c3ad55c4"
      db     (get-db)
      ;; conn is assumed to be bound elsewhere
      eid    (ffirst (d/q '[:find ?e :in $ ?id :where [?e :test/id ?id]] db pub-id))]
  ;; assert a new value for the :db/noHistory attribute
  @(d/transact conn [[:db/add eid :test/template (str (rand-int 10000))]])
  ;; query the history db; the retracted old value still shows up
  (->> (d/q '[:find ?e ?tmpl
              :in $ ?id
              :where
              [?e :test/id ?id]
              [?e :test/template ?tmpl]]
            (d/history (get-db)) pub-id)
       #_(count)))
noHistory has async behavior. When you call d/transact with a new value for your attribute and then query the db-after history, it will always include new-value true + old-value false in history. Eventually, after an indexing job, the old value will be discarded. You can probably force this process by calling these functions in your REPL:
• sync-index
• gc-storage
• request-index
https://docs.datomic.com/clojure/index.html
I'd guess that request-index, sync-index and gc-storage, in that order, will make it work in your REPL. But please, don't do this in production!
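A minimal REPL sketch of that suggestion, using the peer API (datomic.api as d) and assuming conn, get-db, and pub-id from the snippet above; don't run gc-storage against a production system:

;; force an indexing job, wait for it, then garbage-collect old segments
(let [t (d/basis-t (get-db))]
  (d/request-index conn)                  ; ask the transactor to start indexing
  @(d/sync-index conn t)                  ; block until indexes cover basis t
  (d/gc-storage conn (java.util.Date.))   ; reclaim segments older than now
  ;; the noHistory'd old values should now be gone from the history db
  (d/q '[:find ?e ?tmpl
         :in $ ?id
         :where
         [?e :test/id ?id]
         [?e :test/template ?tmpl]]
       (d/history (get-db)) pub-id))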
I use the Datomic ion platform to host my Datomic-focused apps. And I'm making extensive use of the lambda entry points for SQS, EventBridge events, and more. I've recently run into a situation where I would like to control the concurrency of lambda invocations. But I can't find any obvious and practical levers. Details in the 🧵 .
My app runs two instances on the primary compute group. The lambda that I would like to throttle is consuming from an SQS queue in batches (controlling the batch size is one crude lever, but I'm hoping for more control). Setting the concurrency-limit in ion-config.edn seems like it would allow me to control concurrency, but it would also decrease latency (something that is not important to me) and increase cost (because of the reserved nature of the lambdas). IOW, that lever seems to help one manage the floor of concurrency, and I'm looking to manage the ceiling.
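For reference, a sketch of where that lever lives in ion-config.edn; the app name, lambda name, and fn are hypothetical:

{:app-name "my-app"
 :allow    [my.app/process-batch]
 :lambdas  {:process-batch {:fn my.app/process-batch
                            :timeout-secs 60
                            ;; the floor-vs-ceiling lever discussed above
                            :concurrency-limit 2}}}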
I'm reluctant to do any blocking of the actual lambda entry point thread to achieve throttling because that will bleed over into the ability of the instance to concurrently service any entry point (right?).
So far, the best tool I have at my disposal is the DelaySeconds property of the input SQS message, used to spread the requests across a 15-minute window. That works as long as the volume of requests divided by 15 minutes is not so high as to provoke overly concurrent lambda executions. It comes at the cost of latency, but as noted above that is not an issue in this situation.
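A sketch of that trick using Cognitect's aws-api; queue-url and the message bodies are assumptions, and 900 s is SQS's maximum DelaySeconds (the 15-minute window):

(require '[cognitect.aws.client.api :as aws])

(def sqs (aws/client {:api :sqs}))

(defn enqueue-spread!
  "Spread msgs uniformly across SQS's maximum 15-minute (900 s) delay window."
  [queue-url msgs]
  (let [n (count msgs)]
    (doseq [[i body] (map-indexed vector msgs)]
      (aws/invoke sqs {:op :SendMessage
                       :request {:QueueUrl     queue-url
                                 :MessageBody  body
                                 ;; integer seconds, 0-900 inclusive
                                 :DelaySeconds (int (* 900 (/ i n)))}}))))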
“Would like to control concurrency” and “lambda I would like to throttle” sound like different things. Can you say a little more about what you’re trying to accomplish?
Sure. Perhaps the ultimate outcome desired is a good proxy... the ion entry point code calls a partner HTTP API. We must throttle that to no more than 100 requests per minute.
The requirement can almost be considered equivalent to throttling the lambda executions (and I'll deal with the mismatch via other means). Controlling lambda concurrency seems like an almost-available lever.
Is it truly 100 per minute to the integrator, or 100 per minute per client of yours (like, per customer ID, for example)?
We get throttled. HTTP requests fail and we must re-queue them. I'm now doing that, but it's possible there are other negative consequences besides the resources wasted making hopeless HTTP requests.
And do you get 100 tokens per minute in buckets of 1 minute or is that also the refresh rate of tokens?
(I'm waiting for the email from the vendor saying that they are going to cut us off if we hit them 1000 times per minute for five minutes straight, even if 90% are throttled.)
The SQS queue is not a FIFO queue. And the DelaySeconds trick is helping immensely, but our scheduling of ~2500 requests is still too much for a 15-minute window.
I’d have a single queue consumer in a query group of its own, size of 1. Then implement a token bucket with an atom, or with DDB directly (not Datomic please, unless you want to put it in a separate database). Alternatively, you could implement a schedule where the lambda invocation knows how long to wait for the next eligible tick of the schedule and can sleep until then.
I've considered a variation of the latter of those two... essentially creating multiple EventBridge schedules to handle "chunks" of the total, but spread over an hour.
Lastly, you could divide your 100/min budget across the N nodes in your qg pool, thus avoiding coordination overhead and letting Lambda route requests across the N nodes.
So instead of queuing up 2500 requests spread over 15 minutes, I would enqueue ~700 requests spread over 15 minutes. Repeat four times spread over an hour.
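Back-of-envelope for that plan, using the thread's numbers:

;; 2500 requests split across four 15-minute windows in an hour
(/ 2500 4)   ; => 625 requests per window
;; 625 / 15 ≈ 42 requests/min, well under the 100/min cap
;; (the ~700 estimate above still leaves headroom at ≈ 47/min)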
https://clojurians.slack.com/archives/C03RZMDSH/p1722631208029029?thread_ts=1722611504.066139&cid=C03RZMDSH But how to throttle even one node to 100/N?
Every tick, you’ll know the next time you’ll be eligible to make the next request. Swap that in an atom, sleep until that time, then yield and make the request. (LockSupport/parkUntil)
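A minimal sketch of that tick scheme for a single consumer node at 100/min; all names here are illustrative:

(def min-interval-ms (long (/ 60000 100)))  ; 100 req/min => one start every 600 ms

(def next-slot-ms (atom 0))                 ; absolute time of the next free slot

(defn acquire!
  "Reserve the next eligible slot and park until it arrives."
  []
  (let [now  (System/currentTimeMillis)
        ;; atomically claim a slot: each caller pushes the next slot 600 ms out
        slot (- (swap! next-slot-ms #(+ (max % now) min-interval-ms))
                min-interval-ms)]
    ;; parkUntil can wake early (spuriously), so loop until the slot is reached
    (loop []
      (when (< (System/currentTimeMillis) slot)
        (java.util.concurrent.locks.LockSupport/parkUntil slot)
        (recur)))))

;; usage in the entry point: (acquire!) then make the partner HTTP call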