This page is not created by, affiliated with, or supported by Slack Technologies, Inc.
- # aws (24)
- # beginners (377)
- # calva (33)
- # cider (1)
- # circleci (22)
- # cljs-dev (7)
- # clojure (40)
- # clojure-europe (1)
- # clojure-france (9)
- # clojure-norway (3)
- # clojure-taiwan (2)
- # clojurescript (8)
- # conjure (10)
- # cryogen (2)
- # emacs (1)
- # fulcro (23)
- # helix (1)
- # hoplon (2)
- # luminus (7)
- # meander (3)
- # off-topic (2)
- # re-frame (7)
- # reagent (8)
- # reveal (38)
- # sci (13)
- # shadow-cljs (17)
- # tools-deps (17)
- # vim (1)
hey, some time ago I asked in this channel for a debugger that shows the intermediate steps of a function call in the buffer (like the clojure one) for clojurescript, and I got pointed to a tool (standalone) that works with both clj and cljs. But then I hit the slack message limit and I can't find it anymore. Do you recall the name?
these are very interesting things, but the project I remember had a GUI you connected to
Hi all, I have the following (probably common) problem. I have a sequence (count: 100-300) of maps that need to be filtered by a series of predicates and then (on the two dozen or so that remain) embellished with a couple of new keys and values. The issue is that most of the predicates (and some of the embellishments) require REST or GraphQL calls, so they block. What would an experienced Clojure developer (which I'm not) reach for first in this situation to make this run quickly and make the best use of cores/threads? The r/reduce and r/fold docs say they are for computationally intensive work, not blocking I/O. Maybe async/pipeline-blocking, I thought, but it doesn't seem to help much, unless I'm using it wrong.
You forgot to describe what the actual problem with blocking is (e.g. performance, or something else)
is it acceptable for your use case to perform 100-300 requests in parallel? (if not, what's the max)
Yes, I have the threads. This is a command-line/batch tool. In Scala, I would probably do something like that: use futures and a big (100-200) threadpool.
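For reference, that Scala-style approach translates almost directly to Clojure. A minimal sketch, where slow-call is a hypothetical stand-in for a blocking REST call:

```clojure
;; Fire one future per item, then deref them all. Parallelism equals the
;; collection size, because future runs on an unbounded cached thread pool.
;; slow-call is a hypothetical stand-in for a blocking REST call.
(defn slow-call [x]
  (Thread/sleep 50)
  (* 2 x))

(def results
  (->> (range 100)
       (mapv (fn [x] (future (slow-call x)))) ; dispatch all 100 at once
       (mapv deref)))                         ; then block for each result
```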
given those requirements I'd simply use pmap, in such a way that each item in the 100-300 sequence gets its own thread, with its own filter->embellishment steps happening in each thread
agents (send-off) are perfectly fine for IO-bound workloads even if some other options may look fancier
if you pmap a seq of 300 items you get a parallelism of 300 threads, which is desired in this case
...just make sure to (vec (pmap ...)) to ensure such parallelism, since pmap is lazy
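A minimal sketch of that point: pmap returns a lazy seq, so wrapping it in vec realizes the whole result eagerly (slow-inc is a hypothetical stand-in for a blocking call):

```clojure
(defn slow-inc [x]
  (Thread/sleep 50) ; stand-in for a blocking API call
  (inc x))

;; Without vec (or doall/dorun), no work starts until the seq is consumed;
;; vec forces every item to be dispatched up front.
(def results (vec (pmap slow-inc (range 32))))
```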
@U45T93RA6 I don't think your assessment is correct.
I just experimented, and I couldn't get more than 30-something threads. With hundreds of items and long sleep times.
IIRC it's explained by the chunking.
pmap derefs its futures by chunks which have a limited size.
@UR71VR71S FWIW I just found this in my notes: https://github.com/TheClimateCorporation/claypoole As per noisesmith: > there's a version of pmap in the claypoole library that's better [than Clojure's pmap] for compute tasks > the advantage of claypoole over just mapv future / deref is it lets you define a specific parallelism (if coll has enough elements in it, it will grind the jvm or even OS to a crawl as it loses all its resources to thread context switch overhead)
pmap is backed by future, which uses a CachedThreadPool. It only grows if needed, which explains what you are probably seeing.
If a given pmap step can be performed using a thread that was used-but-then-released from a previous pmap step, it will be.
@U45T93RA6 That's why I mentioned long sleep times. Try to use pmap with a huge collection and a blocking function, and see how many JVM threads you get. The amount will be between 30 and 40.
pmap has some logic in it to not grow more than ncpu+2 tasks; it's enforced by the way it realizes the sequence and futures, not by the thread pool that future spawns into. Look at the source
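You can observe that cap empirically. A rough sketch that tracks the peak number of simultaneously running tasks (exact numbers vary by machine and chunking, but they land well below the collection size):

```clojure
(def active (atom 0)) ; tasks currently running
(def peak   (atom 0)) ; highest concurrency observed

(defn probe [x]
  (swap! peak max (swap! active inc))
  (Thread/sleep 50) ; simulated blocking call
  (swap! active dec)
  x)

(dorun (pmap probe (range 200)))
;; @peak ends up in the low tens on a typical machine, not 200
```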
Parallelization capped near ncpu is intended primarily for CPU-bound work, not blocking IO. You can certainly use it for either, but you won't achieve the kind of throughput you might get with a larger number of threads if your work is primarily blocking
that's true @U5RCSJ6BB, I had only checked out future in the source but not that logic.
send-off seems simpler then (the same CachedThreadPool will be used, but without a cpu-related limit)
...you must be sure that a reasonable number of threads will be spawned, though. 300 is OK; 10000 starts to be dangerous
If you’re actually concerned about performance, I would recommend reducing roundtrips by batching queries and then merging results in a post-processing step.
Yeah, I don't have control over these APIs (an internal one and Github's REST and GraphQL). I'm batching as much as I can (doing Github searches when filtering on topics, for example), but I have limited leeway on this side.
I'm writing a CLI querying tool, to help my teammates find which one of our hundreds of microservices and lambdas are using a particular dependency or runtime environment (e.g., version of node). This involves getting everything from an internal registry (which has things like whether it is lambda or an EC2, or what version of CentOS) and then poking Github to answer queries about particular dependencies, language or topics.
My initial attempt (single threaded) took as long as two minutes to get an answer. My second attempt, using pmap and async/pipeline-async, takes 15-45 seconds. I think I can make it faster, given that just about everything is blocking i/o.
(currently trying the fixed executor directly, but actually hitting Github API rate limits, so there's probably a ceiling on how much more I can squeeze out of this.)
Since your original question was "what do people go to", I'd add core.async, since in my experience there's always a good solution you can come up with using those building blocks (pipeline for this one?). If you were looking for info on tuning Java threadpools, I will be very quiet 🙂
So you can kind of control the concurrency level by controlling the chunk size like so:
```clojure
(defn re-chunk [n xs]
  (lazy-seq
    (when-let [s (seq (take n xs))]
      (let [cb (chunk-buffer n)]
        (doseq [x s] (chunk-append cb x))
        (chunk-cons (chunk cb) (re-chunk n (drop n xs)))))))

(time (dorun (pmap (fn [e] (Thread/sleep 100) (inc e))
                   (re-chunk 100 (range 1000)))))
;; "Elapsed time: 1038.57 msecs"
```
So you can just call re-chunk on the collection before passing it to pmap, and give it the chunk size you want, that will also be the concurrency level.
That said, to the original question, I would use an Executor with some fixed thread count which is scaled either to the API I call, or the service I am using.
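A minimal sketch of that fixed-pool approach via Java interop; slow-call is a hypothetical blocking call, and the pool size is the knob you'd scale to the API's rate limits:

```clojure
(import '[java.util.concurrent Executors ExecutorService])

(defn slow-call [x]
  (Thread/sleep 20) ; stand-in for a blocking API call
  (inc x))

(def ^ExecutorService pool (Executors/newFixedThreadPool 16))

;; Clojure fns implement Callable, so invokeAll accepts them directly.
;; Concurrency is capped at 16 no matter how long the collection is.
(def results
  (->> (.invokeAll pool (mapv (fn [x] #(slow-call x)) (range 100)))
       (mapv #(.get %))))

(.shutdown pool)
```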
In the end, I went with async/pipeline-blocking, with 100 concurrency. This didn't give me much when I first tried it, but back then I was folding the whole sequence inside the blocking channels (and with less concurrency). Now I feed one map at a time through the pipeline, and that results in acceptable performance (10-30 seconds for queries, compared with 15-45 seconds before). That's good enough for my purposes. Thanks all! I learned quite a bit.
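For anyone landing here later, a minimal sketch of that pipeline-blocking shape, assuming core.async is on the classpath; keep-it? and embellish are hypothetical stand-ins for the blocking predicate and enrichment calls:

```clojure
(require '[clojure.core.async :as async])

(defn keep-it? [m] (Thread/sleep 20) (even? (:id m)))     ; blocking predicate (simulated)
(defn embellish [m] (Thread/sleep 20) (assoc m :ok true)) ; blocking enrichment (simulated)

(def results
  (let [in  (async/to-chan (mapv (fn [i] {:id i}) (range 40)))
        out (async/chan 40)]
    ;; 10 worker threads pull one map at a time, filter, then embellish.
    (async/pipeline-blocking 10 out (comp (filter keep-it?) (map embellish)) in)
    (async/<!! (async/into [] out))))
```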
Agents could also be easy I think for your use case, something like:
```clojure
(defn api-pred [e] (Thread/sleep 100) (even? e))

(let [coll-to-process (range 1000)
      concurrency 100
      agents (repeatedly concurrency #(agent []))]
  (doseq [[i agnt] (map vector coll-to-process (cycle agents))]
    (send-off agnt #(if (api-pred i) (conj % i) %)))
  (apply await agents)
  (->> (mapv deref agents)
       (reduce into)))
```
We create concurrency number of agents (so 100 in this example). Then we round-robin, sending each agent the task of calling the api-pred function for each collection item; if it returns true we conj the item onto the batch that agent is handling. Then we wait for all of them to be done and reduce over their results.
is there a version of in-ns that functions like let? Something that temporarily overrides the namespace for a given expr, like:
(with-ns (symbol "new-ns") (do (println "the current namespace is:" *ns*)))
```clojure
(defmacro with-ns [ns form]
  `(let [nsp# (.name *ns*)
         _#   (in-ns ~ns)
         res# ~form]
     (in-ns nsp#)
     res#))

(with-ns 'clojure.core.reducers
  (prn (.name *ns*)))
;; => clojure.core.reducers
```