This page is not created by, affiliated with, or supported by Slack Technologies, Inc.
2020-12-21
In Joy of Clojure 2nd ed. (pp. 253-255) they give the following example of making array mutations safe:
(defn make-safe-array [t sz]
  (let [a (make-array t sz)]
    (reify SafeArray
      (count [_] (clj/count a))
      (seq [_] (clj/seq a))
      ;; is locking really necessary for aget? what could happen?
      (aget [_ i] (locking a
                    (clj/aget a i)))
      (aset [this i f] (locking a
                         (clj/aset a i (f (aget this i))))))))
(full sample here: https://github.com/jumarko/clojure-experiments/blob/master/src/clojure_experiments/books/joy_of_clojure/ch10_mutation_and_concurrency.clj#L280-L282)
I'm wondering why they lock aget at all? Isn't it enough to lock aset? Why should I block readers while there's a write in progress?
Likely because java.lang.reflect.Array/get doesn't say anything about it being thread-safe.
Hmm, that might be it. But what would that mean? Like observing a half-set value? What would that even be?
Maybe because of this: https://docs.oracle.com/javase/tutorial/essential/concurrency/memconsist.html
Exactly, you need something that orders both reads and writes with respect to each other, otherwise the JVM can do things like read the array index once and cache it in a register, and just say your writes all happened after the read
@U06BE1L6T locking emits a memory fencing instruction that prevents operation reordering and makes sure your CPU caches are synced. One thread might update a value in its L1 cache (which is per-core) while another thread on another core reads a stale copy of the same value in its own L1 cache. Typically a memory fence causes the changes to get pushed to the L3 cache, which isn't per-core. Writing a volatile does the same, so generally for scalar values (int, long, writing a reference), volatile is sufficient
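To make the volatile point concrete, here's a minimal sketch (mine, not from the book) of Clojure's volatile! box, which is backed by a Java volatile field:

(;; volatile! gives visibility: a write by one thread is guaranteed to be
 ;; seen by later reads on other threads (no stale per-core cached copy).
 ;; It does NOT give atomicity - use an atom for read-modify-write races.
 def last-seen (volatile! nil))

(vreset! last-seen 42)  ; writer thread publishes a value

@last-seen
;; => 42  - reader threads observe the published value

So a single-writer "latest value" slot is a good fit for a volatile; the book's array needs locking because aset is a read-modify-write.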
Ah right, so the read lock is there only to provide a fresh value - otherwise it could get cached;
I think it's unlikely to happen here (I increment the array values in 100 concurrent threads, then read them all afterwards), maybe because the cache coherence protocol will actually fetch the proper value when it's modified by the aset operation (even when there's no lock in aget)
I definitely couldn't find any consistency issue when removing the aget lock and testing it (https://github.com/jumarko/clojure-experiments/blob/master/src/clojure_experiments/books/joy_of_clojure/ch10_mutation_and_concurrency.clj#L286-L290)
Such bugs are notoriously difficult to test for. Sometimes you may catch them with such tests, but there is no guarantee you will
Yeah, based on my understanding of the JMM and memory consistency properties (https://docs.oracle.com/en/java/javase/15/docs/api/java.base/java/util/concurrent/package-summary.html#MemoryVisibility) they do the right thing in the book; in particular:
> Actions prior to "releasing" synchronizer methods such as Lock.unlock, Semaphore.release, and CountDownLatch.countDown happen-before actions subsequent to a successful "acquiring" method such as Lock.lock, Semaphore.acquire, Condition.await, and CountDownLatch.await on the same synchronizer object in another thread.
Here are some good resources dealing with more details regarding the notion of volatile et al meaning "flush to main memory" (which was the impression I got from reading some Java book a decade ago, but found much later is likely false when reading about the MESI cache coherence protocol):
• https://stackoverflow.com/questions/1850270/memory-effects-of-synchronization-in-java
• https://mechanical-sympathy.blogspot.com/2013/02/cpu-cache-flushing-fallacy.html
• https://stackoverflow.com/questions/42746793/does-a-memory-barrier-ensure-that-the-cache-coherence-has-been-completed/42750844#42750844
@U06BE1L6T locks or not, there's a race condition here because the sequence can be constructed before the threads are done mutating. Look at the result of (-> (make-safe-array Integer/TYPE 8) (doto pummel) seq)
Oh yeah, you're right. I think they basically rely on the reader waiting until the threads are done (which is quick for a human experimenting in the REPL 🙂 ).
... in which case, I think, the read lock basically doesn't matter at all, but would be the right thing to do for an operation happening immediately after a previous aset, right?
there's many possible reasons why you could see the latest value without explicit synchronization, but in general physical time is not something you should rely on
What’s the default for the clojure.compiler.direct-linking and elide-meta JVM options when doing a lein jar or lein uberjar?
by default those aren't used at all afaik
so no direct linking, no elide-meta
@roklenarcic a build tool should not change these options unless the user asks for it
Anyone here using vim with conjure in a monorepo? My issue is that I typically open files in multiple projects and it becomes tedious to launch the repl for every file. Is there a way to configure vim to find the projects root path and launch an nrepl-server in that dir?
I do but I don't start the REPL from nvim, I start a bunch of REPLs using a kinda custom docker-compose wrapper, then I set up Conjure to connect to the right REPL depending on what dir I :cd into.
Conjure allows you to work on multiple projects at a time by setting the state key via :ConjureClientState [state-key]
At work, I set up a "cwd changed" autocmd that sets my ConjureClientState to the cwd path. So every time I :cd I get a fresh Conjure state with its own nREPL connection and config.
You could set up something similar + use something like https://github.com/clojure-vim/vim-jack-in if you really want to start your REPL from within nvim.
I still recommend setting up your REPLs outside of nvim with your own script though; ensure you write your .nrepl-port files into each sub-repo directory, then :cd into each module as you work on them and Conjure will auto connect.
Then you can set up the autocmd to set the state as you hop around to have multiple concurrent connections.
augroup conjure_set_state_key_on_dir_changed
autocmd!
autocmd DirChanged * execute "ConjureClientState " . getcwd()
augroup END
I have a script that goes through my docker processes and maps the nREPL ports into .nrepl-port files in the correct directories of the monorepo, making :cd-ing into directories synonymous with connecting to them.
You can also discuss conjure over at https://conjure.fun/discord if you so wish 🙂
I guess I can simply use a script to launch repls for all projects.. I guess it will eat some memory. Anyway, I joined #conjure so I'll ask future questions there.
spec generators rely on the Clojure property testing library test.check. However, this dependency is dynamically loaded and you can use the parts of spec other than gen, exercise, and testing without declaring test.check as a runtime dependency.
The above is from the spec guide where it speaks of loading the test.check lib. What does it mean to dynamically load a lib? How does that work?
:test-deps {:extra-paths ["test"]
            :extra-deps {org.clojure/test.check {:mvn/version "1.0.0"}
                         peridot/peridot {:mvn/version "0.5.2"}}}
:run-tests {:extra-deps {com.cognitect/test-runner
                         {:git/url ""
                          :sha "209b64504cb3bd3b99ecfec7937b358a879f55c1"}}
            :main-opts ["-m" "cognitect.test-runner"
                        "-d" "test"]}
an example of adding test.check
if you do generator stuff, it will load the test.check.generators namespace. if you don't, then it won't.
so you can safely include test.check at test/repl time but exclude it at production time
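To make "dynamically loaded" concrete, here is a small sketch using clojure.edn as a stand-in for test.check - the namespace is only loaded the first time the code path actually runs, so it doesn't have to be on the classpath unless you hit that path:

;; requiring-resolve (Clojure 1.10+) requires the namespace and resolves
;; the var in one step, at call time rather than at compile time - this is
;; essentially how spec defers its dependency on test.check's generators.
(defn parse-edn [s]
  ((requiring-resolve 'clojure.edn/read-string) s))

(parse-edn "{:a 1}")
;; => {:a 1}

If parse-edn is never called, clojure.edn is never loaded, which is why the missing dependency only errors when you actually use gen/exercise/testing.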
(map (fn [k v]
       (println " K " k)
       (println " v " v)
       (if-not (re-matches #"^[a-z]+\*$" (->str v))
         nil
         (->str v)))
     {:id "john"})
(fn [k v] …) is for 2 arguments. If you want to have key and value you need (fn [[k v]] …).
(defn foo [x1 x2 x3] ...)
is the fn with 3 arguments
(defn foo [x1 [k v] x3] ...)
is the function with 3 arguments, but the second one is destructured into [k v]
Thanks @U0WL6FA77
There was a website with challenging data-transformation tasks that you can try to solve online. Afterwards you can compare your solutions to the best solutions made by other people. This is a really good place to start.
Maybe someone else remembers the URL of the website where you can do online task challenges and compare your solution to other people's?
(map (fn [[k v]]
       (println "===1===k " k)
       (println "===1===v " v)
       (println "matches " (re-matches #"^[a-zA-Z0-9]+\.\*$" (->str v)))
       (if (re-matches #"^[a-zA-Z0-9]+\.\*$" (->str v)) true false))
     {:id "john"})
{:id "John"} is a map, but you want to use the map function on a collection like [{:id "John"} {:id "Popeye"}]
(map println {:id "john" :foo "bar"})
[:id john]
[:foo bar]
=> (nil nil)
(map println [{:id "john"} {:foo "bar"}])
{:id john}
{:foo bar}
=> (nil nil)
I don’t understand the question.
The logic of map is: take each element from the collection and run the function on that element. The results are returned as a list.
yes, I got the functionality of map. In my logic I want to take a key-value pair, which will be a single map entry, do pattern matching on it, and return true or false
if you want to operate on a single map, then you don't need to use map as a function at all
((fn [[k v]] (println "===1===k " k) (println "===1===v " v) (println "matches " (re-matches #"^[a-zA-Z0-9]+\.\*$" (->str v))) (if (re-matches #"^[a-zA-Z0-9]+\.\*$" (->str v)) true false){:id "john"}))
((fn [m]
(println m))
{:foo "bar" :x "y"})
{:foo bar, :x y}
=> nil
(map (fn [m]
(println m))
{:foo "bar" :x "y"})
[:foo bar]
[:x y]
=> (nil nil)
((fn [m]
(println (:foo m)))
{:foo "bar" :x "y"})
bar
=> nil
if you want to check :id (which is :foo here)
In the end you wouldn't write an anonymous function and call it right away like that
(let [f (fn [{:keys [foo] :as m}]
(println foo))]
(f {:foo "bar" :x "y"}))
this can be easier to understand
(fn [v] (println "===1===v " v) (println "matches " (re-matches #"^[a-zA-Z0-9]+\.\*$" (->str v))) (if (re-matches #"^[a-zA-Z0-9]+\.\*$" (->str v)) false true)(map val (:id "john")))
BTW if you want to get all the values from a map, use vals, so (vals {:foo "bar" :x 1})
really hard to talk about how things should be done while we are doing things to learn
Hello Team I am passing a map to anonymous function and wanted to validate the function and tried with below code ,but it is not working, how can I pass {:id "john"} to anonymous function ?
from https://clojure.org/reference/protocols#_extend_via_metadata: > As of Clojure 1.10, protocols can optionally elect to be extended via per-value metadata:
(defprotocol Component
:extend-via-metadata true
(start [component]))
Is there a resource that talks about how to decide if a protocol should opt in to extension via metadata?
Here's a fun little example of why Functional is better than OOP 😛
data = None
if data and "domain" in data:
domain = data.get("domain").get("name", "foo")
else:
domain = "bar"
print(domain)
Notice in this code, you need the condition to be if data and "domain" in data: - we have to check that data is not None because otherwise the type None won't support the in operator and you will see: TypeError: argument of type 'NoneType' is not iterable
If you didn't use methods, and instead used a functional approach where in was a function, this would not be a problem, because you could easily implement a None check inside that function.
This is also a good example of why nil isn't as bad in Clojure as it is in non null-safe OOP languages like Python or Java
cljs.user=> (key nil)
ERROR - No protocol method IMapEntry.-key defined for type null:
you have to check nil and types in Clojure too 🙂
Yes, sometimes, but now it's just a design choice, not a limitation of the paradigm. key is just a function implemented with:
(defn key
"Returns the key of the map entry."
[map-entry]
(-key map-entry))
If it wanted, it could handle nil in any way.
I wouldn't say that's a fair comparison. You typically wouldn't want to accept data as either None or a dict; I think it would be appropriate to only expect a dict. Additionally, idiomatic python follows "it's easier to ask for forgiveness than permission". I would expect to just see:
data.get("domain", {}).get("name", "bar")
To complete the example :-):
(data or {}).get("domain", {}).get("name", "bar")
That being said, these days I end up with a get-in function in python code.
the above is a nice addition. I still prefer clojure to python by quite a bit, but python isn't so bad
My point being, what if you wanted a .get that can handle None or any other type, maybe a vector, etc.
In OO, all types would need to agree to share a .get interface, and provide an implementation for it
But also, in this particular case, ya I do find Python's handling of None on .get less than ideal. I think Clojure's handling is much nicer, specifically because I think the above is a common source of bugs.
I think we understood and agreed with your point, but we didn't think that the comparison was fair. In practice (at least on python codebases I worked on) that python code would look like:
get-in(data, ('domain', 'name'), 'bar')
or
get-in(data, 'domain.name', 'bar')
which doesn't compare that unfavourably to
(get-in data ["domain", "name"], "bar")
as your initial example.
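For reference, Clojure's built-in get-in already covers the None/nil cases from the Python example above - a missing key at any step, or a nil map, falls through to the default:

;; get-in walks nested maps; the third argument is returned whenever any
;; step of the path is missing - including when the whole input is nil.
(get-in {"domain" {"name" "example"}} ["domain" "name"] "bar")
;; => "example"
(get-in {"domain" {}} ["domain" "name"] "bar")
;; => "bar"
(get-in nil ["domain" "name"] "bar")
;; => "bar"

This works because get (and therefore get-in) is a function that treats nil as an empty map, rather than a method dispatched on the receiver's type.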
It's possible, no one on our team is really a pro at Python, more like learned at university or picked it up here and there. This code is in a script file part of our infra, so it also doesn't get the same level of code review scrutiny and all.
I can't seem to find get-in though? Is that from a popular library?
If so, I think it demonstrates my point pretty well, and I'd be curious to look at the implementation. My guess is get-in is a function that people create for this very problem. Instead of adding a method to Dictionaries and None, people found the need to change get from a method to a function - that would be a good example of what I'm talking about.
In Python, you could argue that you want a null error to be thrown, maybe you prefer the fail fast, and if you didn't explicitly handle null, maybe you consider a null appearing a bug that you'd want to know about. So that can be a design choice, what do you do with data being None? And while I like that Clojure has get handle nil by default, I don't want to say that throwing a null error if get encounters a null is necessarily worse or bad.
But, in OOP, you actually can't do anything about it if you did want to handle this case the way Clojure does. That's because of how methods work versus functions. If the type is wrong, the methods won't exist. All you can do is add the method to more and more types, but even then, there's always a chance a type shows up that doesn't have the method, and you get an error again. That's one of the Functional advantages in my opinion. Which you could also do in Python, since it has Functions, you could make get a function and do this.
I would say the biggest difference for me is I can focus on moving from room A to B, instead of on the object door - which is not what I'm interested in, because I want to move to B - but this is a very abstract description :)
I'm working on an app where I'm making several api calls concurrently to fetch data. The number is variable but let's say it's 50 on average. I'm currently using pmap to transform the urls into the response in parallel, but I was wondering if it could be faster since pmap is limited to 2 + num_cpus and the time is mostly spent in I/O wait. Any tips?
When I had an app that heavily used APIs, the pattern that worked best was to have separate resource pooling per API service. This is because there's usually a per API limit (either imposed by the API, or their own resources being able to serve you)
that pooling could be a thread pool (eg. claypoole which lets you use futures with custom pools) or a queue per service, with a different number of workers dedicated to each queue
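A minimal sketch of that bounded-pool-per-service idea using plain java.util.concurrent (claypoole wraps this more ergonomically; fetch-fn here is a stand-in for your real API call):

(import '[java.util.concurrent Executors])

(defn with-bounded-pool
  "Run (f item) for each item on a fixed pool of n threads,
  returning the results in input order."
  [n f items]
  (let [pool (Executors/newFixedThreadPool n)]
    (try
      ;; submit everything up front, then block on each Future in order -
      ;; at most n calls are in flight against this service at once
      (->> items
           (mapv #(.submit pool ^Callable (fn [] (f %))))
           (mapv #(.get %)))
      (finally
        (.shutdown pool)))))

(with-bounded-pool 4 #(* 2 %) [1 2 3])
;; => [2 4 6]

You'd keep one such pool per API service, sized to that service's rate limit.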
if you aren't hitting the limits of the APIs, you can just use future for each call, and skip pmap - which is rarely the right answer
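The "just use future" version is about this small (fetch is a stand-in for your HTTP call); futures run on an unbounded cached pool, so 50 IO-bound calls genuinely overlap instead of being capped at 2 + cpus:

(defn fetch-all [fetch urls]
  (->> urls
       (mapv #(future (fetch %)))  ; mapv is eager: all calls start now
       (mapv deref)))              ; then block until each one finishes

(fetch-all #(str "got " %) ["a" "b"])
;; => ["got a" "got b"]

Note mapv rather than map - laziness here would serialize the calls.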
if you need to do any coordination (eg. combining results from multiple calls before calling another endpoint) look into core.async
(but make sure all the IO is inside core.async/thread calls)
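A sketch of that core.async shape - assuming org.clojure/core.async is on the classpath; a/thread runs its body on a real thread (safe for blocking IO) and returns a channel carrying the result:

(require '[clojure.core.async :as a])

;; fetch two endpoints concurrently, then combine the results -
;; the kind of coordination that gets awkward with bare futures.
(defn fetch-and-combine [fetch url-a url-b combine]
  (let [ch-a (a/thread (fetch url-a))   ; IO on a dedicated thread,
        ch-b (a/thread (fetch url-b))]  ; never inside a go block
    (combine (a/<!! ch-a) (a/<!! ch-b))))

(fetch-and-combine identity "a" "b" str)
;; => "ab"

Both fetches start before either <!! blocks, so total latency is max of the two calls rather than their sum.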
Also note that pmap will very likely run more than 2+cpus tasks at the same time due to chunking: https://github.com/jumarko/clojure-experiments/blob/master/src/clojure_experiments/experiments.clj#L556-L576
@U06BE1L6T I don't think you're correct here. The parallelization level is restricted by the thread pool it uses, chunking won't change that.
the parallelization is controlled by the lag between the launch of new futures and the deref, it uses future which is an expanding unlimited pool
chunking changes the behavior of (map #(future (f %)) coll)
which is what actually creates the threads
so the answer is weird and complicated (another reason I don't like pmap) - chunking causes futures to be launched a chunk at a time, if the input is chunked; otherwise the number of futures in flight is controlled by the lag between future generation and future realization (which is done via the blocking deref)
(defn pmap
  "Like map, except f is applied in parallel. Semi-lazy in that the
  parallel computation stays ahead of the consumption, but doesn't
  realize the entire result unless required. Only useful for
  computationally intensive functions where the time of f dominates
  the coordination overhead."
  {:added "1.0"
   :static true}
  ([f coll]
   (let [n (+ 2 (.. Runtime getRuntime availableProcessors))
         rets (map #(future (f %)) coll)
         step (fn step [[x & xs :as vs] fs]
                (lazy-seq
                 (if-let [s (seq fs)]
                   (cons (deref x) (step xs (rest s)))
                   (map deref vs))))]
     (step rets (drop n rets))))
  ([f coll & colls]
   (let [step (fn step [cs]
                (lazy-seq
                 (let [ss (map seq cs)]
                   (when (every? identity ss)
                     (cons (map first ss) (step (map rest ss)))))))]
     (pmap #(apply f %) (step (cons coll colls))))))
the (drop n rets) creates the lag between the creation of new futures and the blocking deref that waits on them
breaking a common piece of advice to not mix lazy calculation with procedural side effects
;; changes to this atom will be reported via println
(def snitch (atom 0))

(add-watch snitch :logging
           (fn [_ _ old-value new-value]
             (print (str "total goes from " old-value " to " new-value "\n"))))

(defn exercise
  [coll]
  (doall
   (pmap (fn [x]
           (swap! snitch inc)
           (print (str "processing: " x "\n"))
           (swap! snitch dec)
           @snitch)
         coll)))
user=> (exercise (range 10))
total goes from 3 to 4
total goes from 4 to 5
total goes from 2 to 3
total goes from 1 to 2
total goes from 0 to 1
processing: 0
processing: 4
processing: 2
processing: 3
processing: 1
total goes from 5 to 4
total goes from 4 to 3
total goes from 1 to 0
total goes from 2 to 1
total goes from 3 to 2
total goes from 0 to 1
total goes from 1 to 2
processing: 6
processing: 7
total goes from 2 to 3
total goes from 3 to 4
total goes from 5 to 4
total goes from 4 to 5
processing: 8
total goes from 4 to 3
processing: 9
processing: 5
total goes from 3 to 2
total goes from 2 to 1
total goes from 1 to 0
(0 0 0 0 0 0 3 2 0 0)
max parallelism here is 5 - I'm going to try a version where I capture the max and exercise it more aggressively
@U0K064KQV I am not good enough with lazy-seqs to read the pmap code and know whether it unchunks, so I'm working empirically
yeah, here's my version of exercise that captures the max parallelism:
(defn exercise
  [coll]
  (let [biggest (atom 0)]
    (dorun
     (pmap (fn [x]
             (swap! snitch inc)
             (swap! biggest max @snitch)
             (print (str "processing: " x "\n"))
             (swap! snitch dec)
             @snitch)
           coll))
    @biggest))
(exercise (range 1000))
prints a lot more than I'm going to paste here, and returns 19
lmk if that's flawed, but to my eye that will accurately tell you the max futures spawned concurrently by pmap
(nb range is chunked, which is why I'm using it here)
Hum. Ya, looking at the code, its kind of hard to get a full picture. I think the branch of if-let that uses cons will unchunk, but the other branch would not. And the drop n will also trigger the first chunk.
all the retries on that poor little atom make the output with bigger inputs absurd
or maybe that's caused by the printing contention...
Might be better to use a sempahore? I think a lock instead of atom's retry maybe would make this more clear?
(the reason all the prints call str is that otherwise the parts of the prints overlap in the output)
Oh, no I don't think that's what I meant. Whatever the thing that is a locking counter is called
Then again, hum... What if you changed the impl of pmap so that inside the future it incremented and decremented the counter before and after running f ?
that would be the same behavior, with more work to achieve it
I rewrote to an agent (doesn't retry), the prints are now in intelligible order, the answer is still high (33, 37, 38, 39, 36 ...)
max value in theory is 42 (32 chunk size + 8 processors + 2)
(when you overlap the next chunk)
Oh boy, that's one confusing little function haha. It does seem like it was written pre-chunking, so I guess chunking just wasn't taken into account. Hmm, I wonder if that explains why I see poor performance improvements from it in practice - with chunking, the thread overhead is way too high for parallelization
it launches chunk-size futures, but iterates by nproc+2 delay between reader of input and reader of future values, if your input is big enough to have multiple chunks you can have more than chunk size in flight
that could be - I consider it more like "an example of what you could do to parallelize a specific problem" that happened to make it into the codebase, and it doesn't match most people's problems
reducers are more general, but I haven't used them in anger and haven't seen much usage of them in the wild
Ya, I think having to require their namespace and the fact that only fold is still useful now that we have transducers makes them kind of DOA
Well, maybe this chunking behavior is actually a blessing in disguise? Now it means using this re-chunk function:
(defn re-chunk [n xs]
  (lazy-seq
   (when-let [s (seq (take n xs))]
     (let [cb (chunk-buffer n)]
       (doseq [x s] (chunk-append cb x))
       (chunk-cons (chunk cb) (re-chunk n (drop n xs)))))))
Taken from clojuredocs, you can actually control the concurrency level of pmap 😛
(dorun (pmap (fn [_] (Thread/sleep 100)) (re-chunk 1 (range 1000))))
Will give you ~2+cores
(dorun (pmap (fn [_] (Thread/sleep 100)) (re-chunk 100 (range 1000))))
Will give you ~100