Good morning!
Morning!
good morning!
Morning!
Guessing game: why doesn't this attempt at parallelizing work? get-secret is the slow function, and takes about 700 ms.
(defn load-secrets []
  (->> (concat (map (fn [[k secret]] [k (future (get-secret secret))]) env-secrets)
               (map (fn [[k secret]] [k (future (get-secret secret {:base64? true}))]) base64-encoded-env-secrets))
       (map (fn [[k f]] [k @f]))
       (into {})))
hehe, it derefs in the loop
but that isn't a problem in itself, is it? First fire off one future per element, then require that all of them are done?
ah, darn
or, no, there is no "first" here 🙂 lazyyy
so in practice it ends up calling future and deref in sequence
Bingo! I put a doall between (concat ,,,) and (map ,,,).
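For anyone who wants to see the effect in isolation, here is a minimal sketch. All names (slow-lookup, inputs, elapsed-ms, load-lazy, load-eager) are made up for the illustration, and slow-lookup is just a sleep standing in for get-secret. It assumes the input is a small map, whose seq is not chunked, so each future is created only when the deref step asks for the next element.

```clojure
(defn slow-lookup
  "Hypothetical stand-in for get-secret: sleeps 100 ms, then returns a value."
  [x]
  (Thread/sleep 100)
  (str "secret-" x))

;; A small map: its seq is unchunked, so laziness works one element at a time.
(def inputs {:a 1 :b 2 :c 3 :d 4})

(defn elapsed-ms
  "Runs the zero-argument function f and returns the wall-clock time in ms."
  [f]
  (let [start (System/nanoTime)]
    (f)
    (/ (- (System/nanoTime) start) 1e6)))

(defn load-lazy []
  (->> inputs
       (map (fn [[k v]] [k (future (slow-lookup v))])) ; lazy: nothing started yet
       (map (fn [[k f]] [k @f]))                       ; start one, wait, start the next ...
       (into {})))

(defn load-eager []
  (->> inputs
       (map (fn [[k v]] [k (future (slow-lookup v))]))
       doall                                           ; force every future to start now
       (map (fn [[k f]] [k @f]))
       (into {})))
```

Timing (elapsed-ms load-lazy) against (elapsed-ms load-eager) should show roughly four sleeps versus one, while both return the same map.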
After a bit of refactoring, I think this code turned out rather pleasant
(defn gcloud-secrets-access-latest [secret]
  (->> ["gcloud secrets versions access latest --project REDACTED_PROJECT_ID!!!!!! --secret " secret]
       (shell-med-feilrapportering {:out :string})
       :out))
(defn load-secret [{:keys [name format]}]
  (let [raw (gcloud-secrets-access-latest name)]
    (case format
      :raw raw
      :base64 (.encodeToString (Base64/getEncoder) (.getBytes raw)))))
(defn load-secrets []
  (-> env-secrets
      (update-vals #(future (load-secret %)))
      (update-vals deref)))
I'm not entirely steady on this sort of thing in Clojure, but in Java I would have reached for an ExecutorService so that I get a predetermined thread pool. How is this typically handled in Clojure?
core.async 😍
of course 🙂
that talk from EuroClojure in Krakow, probably in 2015 or whenever it was, still sits in my bones: someone who worked at the postal service in England or something like that had a piece of sequential and relatively "imperative" code for an integration, and went through various ways of making it async. Finally he showed core.async, and the code looked practically identical, only wrapped in a few calls to core.async 😄 Macros ftw
"Logan Campbell, Clojure at a Post Office" is the name of the talk, but I can't find any recording or slides
I may be an outlier here, but I have never been in a position where I had to use core.async properly. Async is so often solved by "the layer below me", whether that is the HTTP server, the database, the number-crunching library or NATS. I have thought a lot about something Rich said in Language of the System:
> Now we get to moving things around. I think it's one of the things in Clojure maybe I didn't make clear enough because I didn't need to wrap them is that the queues in java.util.concurrent are awesome. If you're not using them as part of your system designs internally, you're missing out. And, in the large, queues also rule because they have this really great characteristic. They're completely decoupling.
(source: https://github.com/matthiasn/talk-transcripts/blob/master/Hickey_Rich/LanguageSystem.md)
core.async.flow was extracted from / inspired by how Datomic is built internally. And I want everything in loosely coupled functions! Either take data in and give data out, or do something with queues if we can't "finish completely" before returning.
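A tiny sketch of what "do something with queues" can look like with plain java.util.concurrent from Clojure. This is only an illustration under assumptions: run-pipeline and the ::done end-of-stream marker are invented names, and the "work" is multiplying by 10. The producer and the consumer only share the queue, which is the decoupling the quote talks about.

```clojure
(import '[java.util.concurrent LinkedBlockingQueue])

(defn run-pipeline
  "Pushes items through a bounded queue to a consumer thread and returns
   what the consumer produced. ::done is a hypothetical end-of-stream marker."
  [items]
  (let [queue    (LinkedBlockingQueue. 2)       ; capacity 2: .put blocks while full
        consumer (future
                   (loop [seen []]
                     (let [item (.take queue)]  ; blocks until something arrives
                       (if (= item ::done)
                         seen
                         (recur (conj seen (* 10 item)))))))]
    (doseq [item items]
      (.put queue item))
    (.put queue ::done)
    @consumer))
```

The bounded capacity gives back-pressure for free: a fast producer simply waits in .put until the consumer catches up.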
The answer is: because you're not using Claypoole 😊
because if I used Claypoole, I would run into problems that would force me to use core.async? :trollface: Joking aside, I'm fully aware that I have things to learn here. But so much of what I read is technical demos! "Look, you can do $TASK with $NEW_TOOL!!!". But I'm not in the business of collecting tools! I'm trying to figure out which problems I don't yet have under control because I haven't mastered a given technique!
> Concat, the lazily-ticking time bomb https://stuartsierra.com/2015/04/26/clojure-donts-concat/
There are quite a few surprises in Clojure's toolbox. In that sense, and with what you say in mind, @teodorlu, one should reach for the right tool for the job. When it comes to concurrency, one should obviously steer clear of the pitfalls that lazy collections are, and perhaps use something purpose-built instead (such as ExecutorService: very readable, no doubt about what is going to happen)
lazy seqs did hit us here, yes, but would you have preferred ExecutorService over these three lines?
(-> env-secrets
    (update-vals #(future (load-secret %)))
    (update-vals deref))
Asking out of curiosity!
I also don't know how it would have turned out with ExecutorService? future seems to fetch at most 32 at a time. We have 9, so everything happens at once.
That is perhaps part of the problem with future: it is undefined how many threads get used? It may matter little if you know you have few concurrent tasks, but it all depends on what resources you have available and whether you can end up blocking other parts of the application, or on other contextual effects.
No idea how it would look with direct use of ExecutorService in Clojure. It will most certainly not be pretty. Something like
(map #(.get %) (.invokeAll (Executors/newFixedThreadPool 4) (map (fn [s] #(load-secret s)) (vals env-secrets))))
(... without knowing whether this works at all: just scribbling from my head, and no error handling whatsoever)
agreed, I like executors when you want a certain amount of control over your runtime
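For reference, a slightly fuller sketch of the ExecutorService variant, shaped like update-vals so it can be compared with the three-liner. pooled-update-vals is a made-up helper name, the pool size is whatever the caller passes in, and error handling is limited to shutting the pool down; a task that throws will surface as an ExecutionException from .get.

```clojure
(import '[java.util.concurrent Executors ExecutorService Future])

(defn pooled-update-vals
  "Applies f to every value of the map m on a fixed pool of n-threads threads.
   Hypothetical helper, not part of any library."
  [n-threads f m]
  (let [^ExecutorService pool (Executors/newFixedThreadPool n-threads)]
    (try
      (let [ks      (vec (keys m))
            ;; Clojure functions implement Callable, so thunks can be handed over as-is.
            tasks   (mapv (fn [k] (fn [] (f (get m k)))) ks)
            futures (.invokeAll pool tasks)]     ; blocks until every task has finished
        (zipmap ks (map (fn [^Future fut] (.get fut)) futures)))
      (finally
        (.shutdown pool)))))                     ; don't leak the pool's threads
```

With the names from the thread, usage would be something like (pooled-update-vals 4 load-secret env-secrets).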
Explicit makes sense!
I actually made an attempt at getting my system to fall over by starting more and more futures, and wasn't able to observe any effect on the system. I created around 50,000, I think.
@sardtok said that on 1.12, future is supposed to use virtual threads. So maybe I'll never hit a ceiling then? 🤷
Correction: future does not use virtual threads on 1.12. You have to do that yourself. So all of you who were worried about thread pools (for 9 calls to gcloud secrets) can keep on worrying! 😄
ah nice, that helps a lot. Then it suddenly matters a bit less how many threads the pool uses and so on
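A rough sketch of what "doing it yourself" could look like, assuming JDK 21 or newer (where Executors/newVirtualThreadPerTaskExecutor exists). vthread-update-vals is an invented name, and nothing here is how future itself is implemented.

```clojure
(import '[java.util.concurrent Callable Executors ExecutorService Future])

(defn vthread-update-vals
  "Like update-vals, but runs f for each value on its own virtual thread.
   Hypothetical helper; assumes JDK 21+."
  [f m]
  (let [pool   (Executors/newVirtualThreadPerTaskExecutor)
        submit (fn [v]
                 ;; The Callable hint picks the right .submit overload,
                 ;; since Clojure functions are both Callable and Runnable.
                 (let [^Callable task (fn [] (f v))]
                   (.submit ^ExecutorService pool task)))]
    (try
      (let [started (update-vals m submit)]      ; eager: every task is started here
        (update-vals started (fn [^Future fut] (.get fut))))
      (finally
        (.shutdown pool)))))
```

Since update-vals is eager, there is no laziness trap here: all tasks are submitted before the first .get.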
future uses the same thread pool as send-off. It can be controlled with set-agent-send-off-executor!:
(import '[java.util.concurrent Executors])
(def custom-pool (Executors/newFixedThreadPool 32))
(set-agent-send-off-executor! custom-pool)
The default is an unbounded, cached thread pool: https://github.com/clojure/clojure/blob/master/src/jvm/clojure/lang/Agent.java#L53
This was useful information! (from everyone) and I didn't know about send-off
send-off belongs to agents, which I have never used in 8 years of Clojure. Has anyone else used agents, and for what?
they're very good if you're going to live-code ant colony simulators, at least 🙂
the Clojure feature that was only used for Rich's sales pitch that concurrency is nice in Clojure!
same with software transactional memory, cool idea, but maybe you didn't need it in practice?
I have no idea, but maybe it was Rich's C++ experience that made him want to build tools that were impossible to pull off in C++ but relatively trivial in Clojure
Right, that's the only example I've heard of where agents are used! (Outside Clojure books.)
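For those who, like several people here, have never touched agents: a minimal sketch of the API, using an invented log-lines agent and log! helper. It only shows the mechanics (send-off, await, deref), not a claim about where agents are the right tool.

```clojure
;; Hypothetical agent holding a vector of log lines.
(def log-lines (agent []))

(defn log!
  "Queues an update to the agent and returns immediately."
  [line]
  ;; send-off runs the action on the same pool that future uses.
  (send-off log-lines conj line))

(log! "first")
(log! "second")

;; Block until the actions sent from this thread so far have been applied.
(await log-lines)
```

Actions sent to one agent from the same thread are applied in order, one at a time, so @log-lines holds both lines in the order they were sent.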
(I'm still too scared to use my broken Norwegian, so please excuse the English)
My first port of call for this would have been pmap on each of the two. But because I'd expect the cores to be saturated, then I'd probably do something fancy to concat the 2 sequences and use the pmap over all of them:
(defn load-secrets []
  (let [simple-count (count env-secrets)]
    (->> (pmap (fn [[k secret] idx]
                 ;; return [key value] pairs so that (into {}) can build the map
                 [k (if (< idx simple-count)
                      (get-secret secret)
                      (get-secret secret {:base64? true}))])
               (concat env-secrets base64-encoded-env-secrets)
               (range))
         (into {}))))
core.async can definitely fine-tune thread usage much better, so it would improve things much more, especially if you're on a JVM that can use lightweight threads.
However, this continues to demonstrate the same problem: one future per operation. There are a couple of libraries that reduce this overhead, like https://github.com/clj-commons/claypoole.
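(require '[com.climate.claypoole :as cp])
(defn load-secrets []
  (let [simple-count (count env-secrets)]
    ;; cp/pmap takes the pool as its first argument; the size 8 is an arbitrary choice
    (cp/with-shutdown! [pool (cp/threadpool 8)]
      (->> (cp/pmap pool
                    (fn [[k secret] idx]
                      [k (if (< idx simple-count)
                           (get-secret secret)
                           (get-secret secret {:base64? true}))])
                    (concat env-secrets base64-encoded-env-secrets)
                    (range))
           (into {})))))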
It appears I incorrectly assumed future used virtual threads!
Thanks for the code examples 🙏
I thought about it a bit more, and virtual threads are not going to help, unless your get-secret function has I/O. Virtual threads let you do more work when any of the threads need to wait for things (which is usually defined by I/O).
get-secret is all IO! It shells out to gcloud, which sits and waits for the network.
Implementation after refactor can be seen earlier in this thread: https://clojurians.slack.com/archives/C061XGG1W/p1766053062037489?thread_ts=1766050463.954099&cid=C061XGG1W
… where shell-med-feilrapportering is a wrapper around babashka.process/shell with some decisions about error reporting.
> Virtual threads let you do more work when any of the threads need to wait for things (which is usually defined by I/O). I guess virtual threads are for Java what async/await is to js (and async/task to c#), but with the same programming model as with regular threads. So just async tasks, not necessarily concurrent tasks - or can virtual threads also be run concurrently by an executor in some cases? maybe they have to be "promoted" to full threads in those cases?
Virtual threads have low overhead, so you can create a huge amount of them without worrying about exhausting the OS thread limits. They aren't directly dependent on OS scheduling like an OS thread, but handled by relatively lightweight code in the JVM. Under the hood they are run on threads in a ForkJoinPool.
with threads, you'll get an overflow of stacks 😎
Yes, you can create an enormous number of virtual threads without much overhead. (So using a “Future” per work item is less expensive. This is why I thought of them.) Until @teodorlu said that it was mostly IO, I assumed that scheduling batches of operations would get better CPU performance. However, being IO-bound makes virtual threads perfect.
In this specific case, it is just a small Babashka task that runs when launching the dev environment, to ensure we have the right keys for various external systems. That way we can test things like various identity providers and APIs locally. They aren't super sensitive, because they are configured for test environments at the providers, but this way we could open source the code at some point without leaking any keys. They are also generated by our platform, so GCP secrets happen to be where they end up for now. The Google Cloud CLI is just super slow at getting a single key. It can take up to two seconds for one. So starting the dev environment takes a few more seconds than necessary. Sure, it's not that much, but enough to be a bit annoying if you have to restart it at some point.
This reminds me of what core.async did https://swannodette.github.io/2013/07/12/communicating-sequential-processes/ 12 years ago 🤯
Then David did 100,000 go blocks https://swannodette.github.io/2013/08/02/100000-processes/
Isn't concurrent the wrong word here? In David's example none of the code will be executed at the same time, but rather one after the other.
"concurrent" is right: it is concurrent, but not parallel! Parallel means that it actually runs at the same time.
aha!
Concurrent is like a time-sharing OS on a single CPU… everything runs concurrently, but only one process or thread is active at a time. The OS uses system calls as an opportunity to pre-empt the process and schedule the next one.
Of course, modern OSes are both concurrent AND parallel
we get the "worst of both worlds" 😜
core.async uses macros to do the same thing. Which continues to blow my mind to this day
yeah — the go macro is a beast. Insane stuff.
a thousand lines of dragons. https://github.com/clojure/core.async/blob/7cc715ac25b4a0f232a7fe1049f90f14ef10fd96/src/main/clojure/clojure/core/async/impl/go.clj#L1044 Edit: more than a thousand lines, it uses other namespaces too 😅
(both records and multimethods internally)
Java 1.0 had "green threads" as well as native threads. Green threads did the same thing in the JVM. Indeed, Virtual threads follow exactly the same principle… but done properly (Green threads were terrible)
(Incidentally, this is something that bugs me about JavaScript. JavaScript does not do this. Instead, you have to explicitly acknowledge that you're giving up control, and write code to be executed when the JS engine returns to you, via a .then block. This paradigm does offer the ability to do more operations before returning, but in practice I've never seen anyone take advantage of it. I wish JS just did pre-emption like every other OS/VM since the late '60s, but at least core.async did it for us)
I'm curious about whether Zig's "IO as an explicit parameter" approach will be fruitful. They decided to rewrite the compiler and break all userspace IO code to get what they considered to be a clean solution. Async code now needs to "take an IO". The promise is application authors can decide how to do IO (async, sync, etc), and library authors can be IO strategy agnostic, by leveraging the passed parameter.
I do need to spend time with Zig. I listen to "Software Unscripted", and even though Richard develops Roc, he refers to Zig all the time. It sounds fascinating.
I've tried it a bit. It strikes me as very high quality tech. However, every time I try, I'm reminded about how nice it is to use a high-level language like Clojure, and how nice our data structures are. I don't really care about memory when I'm doing information programming in non-performance-bottleneck situations. My colleague @mathias.iversen487 has ventured further down the rabbit hole than I have. A month or so ago, he presented https://github.com/boosja/zlides for us. It's a terminal slideshow viewer. Nice, little application, that's a good fit for Zig.
I guess I'm curious because I've been spending so much time in C++ recently
CUDA and Metal basically require it (that's not completely true, but mostly)
gpu programming pulls at me too! It's one reason I'm interested in Zig.
No reason to wait with this one: https://www.meetup.com/clojure-oslo/events/312486619/ 🍛🍛🍛 Clojure lunch Tuesday, January 13 at 12:00 🍛🍛🍛 Hope to see you there! And merry Christmas in the meantime!
merry Christmas, Martin!
merry Christmas, Teodor!
Merry Christmas!
science should be open to everyone! 🙌
Morning!
Mrn.,
Morning!
Morning!
Morning
Morning!