#core-async
2017-03-20
joshjones00:03:11

@wei - I know you posted this in the core.async channel, but what you asked is a general problem, and while @noisesmith posted a solution using core.async, it is not the only solution. In fact, using core.async in this case creates a channel for each thread, and even if you don't consider that "wasteful," it certainly does more than is necessary (a channel per thread, even though you don't actually need any channels). I just made a gist that shows several other ways of doing this besides the solution posted here. All of them (except the last) also let you capture return values from the threads if you like. To summarize the easiest, which does very much what the core.async/thread call does, just without the channels:

(.invokeAll (Executors/newCachedThreadPool) (repeat num-of-threads func-to-run))
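A fuller sketch of that one-liner (names here are illustrative, not from the gist), showing that invokeAll blocks until every task completes, that the returned Futures carry the return values, and that the pool should be shut down when you're done with it:

```clojure
(import '(java.util.concurrent Executors Future))

;; Run f on n threads, wait for all of them, and collect return values.
;; invokeAll blocks until every submitted task has completed.
(defn run-n-threads [n f]
  (let [pool    (Executors/newCachedThreadPool)
        futures (.invokeAll pool (repeat n f))]   ; seq of Callables
    (.shutdown pool)                              ; no new tasks; lets threads exit
    (mapv #(.get ^Future %) futures)))            ; .get returns each task's value

;; Example: four threads each computing a value
(run-n-threads 4 (fn [] (* 2 21)))
```

This works because a Clojure fn implements both Callable and Runnable, and a Clojure seq implements java.util.Collection, so `(repeat n f)` can be passed straight to invokeAll.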

noisesmith00:03:00

I was under the impression channels were cheap. Thread pools are not.

noisesmith00:03:29

(I would be glad to be corrected about channels if you have citation though)

noisesmith00:03:35

also, invokeAll returns before the threads have all completed

noisesmith00:03:14

@joshjones the thread call reuses an existing executor

joshjones00:03:14

no, it does not, counter-intuitively -- invokeAll blocks: it guarantees that Future.isDone returns true for every Future in the list it returns

noisesmith00:03:23

calling async/thread uses an executor that already exists, with a pool that's shared, that's different from making a new executor for a group of calls

joshjones00:03:08

I suppose the efficiency of one method over another can be tried and tested; if one is found lacking, it can be swapped for another. My point was not to show that using core.async's thread macro inside a new go block is inefficient, just that it is only one way, and that IMO it is not as straightforward as using the built-in java.util.concurrent primitives to achieve thread joining

noisesmith00:03:05

I'm all for showing alternatives, yes

noisesmith00:03:08

that's a good point

noisesmith00:03:43

I still think channels are cheaper than threads though - I'd like to test it, my hunch is that 1k channels = 1 thread in allocation cost

joshjones00:03:42

yes, it would be worth testing, I'd be curious as well

joshjones00:03:03

cool, i'll be back after a while and will look forward to your findings. Just to be clear again, my gist has more to do with options for how to do this. For example, if someone were looking for a general solution for creating threads and joining them, then importing core.async and using a go block would be very out of place, I think. However, if this is in the middle of existing core.async code, maybe it fits in just fine. So it's just about showing different ways of doing this, some of which may be more appropriate depending on the surrounding code
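For reference, the plainest version of "create threads and join them," with no executor and no core.async at all (a minimal sketch; the names are illustrative):

```clojure
;; Start n raw Threads running f, then join each one.
;; Thread.join blocks the caller until that thread has finished.
(defn start-and-join [n f]
  (let [threads (vec (repeatedly n #(doto (Thread. ^Runnable f) (.start))))]
    (doseq [^Thread t threads]
      (.join t))))

;; Example: after joining, all side effects are visible
(def counter (atom 0))
(start-and-join 8 #(swap! counter inc))
@counter ;; 8, since every thread has been joined
```

Unlike the executor version this gives you no return values, only completion, which is often all that "joining" means.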

noisesmith00:03:20

user=> (crit/bench (>/chan))
Evaluation count : 1192380780 in 60 samples of 19873013 calls.
             Execution time mean : 46.645289 ns
    Execution time std-deviation : 0.886886 ns
   Execution time lower quantile : 45.538667 ns ( 2.5%)
   Execution time upper quantile : 49.029813 ns (97.5%)
                   Overhead used : 2.198558 ns

Found 4 outliers in 60 samples (6.6667 %)
	low-severe	 4 (6.6667 %)
 Variance from outliers : 7.8331 % Variance is slightly inflated by outliers
nil
user=> (crit/bench (Thread.))
Evaluation count : 26920380 in 60 samples of 448673 calls.
             Execution time mean : 2.145635 µs
    Execution time std-deviation : 25.603243 ns
   Execution time lower quantile : 2.118105 µs ( 2.5%)
   Execution time upper quantile : 2.208445 µs (97.5%)
                   Overhead used : 2.198558 ns

Found 3 outliers in 60 samples (5.0000 %)
	low-severe	 3 (5.0000 %)
 Variance from outliers : 1.6389 % Variance is slightly inflated by outliers
nil
user=> (/ 2145.0 46.0)
46.630434782608695
so chans are only 46 times faster than threads to allocate (via a very naive test)

noisesmith00:03:44

it's cute that criterium uses appropriate units, but it makes direct comparisons a little less straightforward

hiredman00:03:27

Channels and threads are entirely different things; comparing them in any way is nonsense

noisesmith00:03:44

@hiredman one approach acquires more channels than needed, the other acquires more threads than needed - or at least that was the comparison suggested

noisesmith01:03:05

@joshjones using a more complete benchmark, our approaches perform the same - mine was literally just 1% faster because it reuses an existing shared threadpool.

;; assumes an alias for core.async, e.g. (require '[clojure.core.async :as >])
(defn pool-approach
  [f parallel]
  (.invokeAll (java.util.concurrent.Executors/newCachedThreadPool)
              (repeat parallel f)))

(defn async-approach
  [f parallel]
  (>/<!!
   (>/go (doseq [t (doall (repeatedly parallel #(>/thread (f))))]
           (>/<! t)))))

(defn test-fn
  []
  (Thread/sleep 1000))
user=> (crit/bench (async-approach test-fn 100))
Evaluation count : 60 in 60 samples of 1 calls.
             Execution time mean : 1.002404 sec
    Execution time std-deviation : 1.338945 ms
   Execution time lower quantile : 1.000879 sec ( 2.5%)
   Execution time upper quantile : 1.005544 sec (97.5%)
                   Overhead used : 1.857061 ns

Found 3 outliers in 60 samples (5.0000 %)
	low-severe	 3 (5.0000 %)
 Variance from outliers : 1.6389 % Variance is slightly inflated by outliers
nil
user=> (crit/bench (pool-approach test-fn 100))
Evaluation count : 60 in 60 samples of 1 calls.
             Execution time mean : 1.013606 sec
    Execution time std-deviation : 16.255855 ms
   Execution time lower quantile : 1.004019 sec ( 2.5%)
   Execution time upper quantile : 1.071875 sec (97.5%)
                   Overhead used : 1.857061 ns

Found 4 outliers in 60 samples (6.6667 %)
	low-severe	 4 (6.6667 %)
 Variance from outliers : 1.6389 % Variance is slightly inflated by outliers
nil

noisesmith01:03:50

An aside: in my initial testing I was reminded of the main drawback of doing things this way - if you raise the thread count high enough, you can easily crash the entire JVM. core.async comes with tools that make it easy to use an optimal number of threads to get the task done.
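One such tool is pipeline-blocking, which caps the number of worker threads no matter how many items are queued. A minimal sketch (the function name is illustrative; assumes a reasonably current core.async, where to-chan! replaced to-chan):

```clojure
(require '[clojure.core.async :as async])

;; Process all inputs with at most n blocking workers at a time,
;; instead of spawning one thread per item.
(defn bounded-map [n f inputs]
  (let [in  (async/to-chan! inputs)
        out (async/chan (count inputs))]
    (async/pipeline-blocking n out (map f) in)  ; caps worker threads at n
    (async/<!! (async/into [] out))))           ; out closes when in is drained

;; 100 items, but never more than 4 threads working at once
(bounded-map 4 inc (range 100))
```

So with a blocking task like test-fn above, the thread count stays bounded regardless of input size.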

johanatan23:03:29

how does one go about merely mapping over the values in a single channel?

johanatan23:03:43

[i.e., applying a transform to the produced values]
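(The usual answer, for the record: attach a transducer when creating the channel, so every value put on it is transformed before it can be taken. A sketch, again assuming a current core.async where onto-chan! replaced onto-chan:)

```clojure
(require '[clojure.core.async :as async])

;; A transducer on the channel transforms each value as it passes through.
(defn squares-chan []
  (let [ch (async/chan 10 (map #(* % %)))]  ; square every produced value
    (async/onto-chan! ch [1 2 3 4])         ; put the values, then close ch
    (async/<!! (async/into [] ch))))        ; collect until ch closes
```

This keeps the transform on the channel itself, so every consumer sees the mapped values.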