#clojure-uk
2018-01-12
chrisjd07:01:06

Morning!

thomas08:01:55

moin moin morning

Rachel Westmacott09:01:41

Pseudo-Random core function of the day:

-------------------------
clojure.core/pmap
([f coll] [f coll & colls])
  Like map, except f is applied in parallel. Semi-lazy in that the
  parallel computation stays ahead of the consumption, but doesn't
  realize the entire result unless required. Only useful for
  computationally intensive functions where the time of f dominates
  the coordination overhead.
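
[A minimal sketch of the "computationally intensive f" point from the docstring; `slow-inc` is a hypothetical stand-in for an expensive function, and the timings are illustrative only:]

```clojure
(defn slow-inc [x]
  ;; simulate a computationally intensive function
  (Thread/sleep 100)
  (inc x))

;; map runs the 8 calls sequentially (~800ms total);
;; pmap runs them on futures in parallel (~100ms given enough cores)
(time (doall (map slow-inc (range 8))))
(time (doall (pmap slow-inc (range 8))))
;; both return (1 2 3 4 5 6 7 8)
```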

Rachel Westmacott09:01:36

I’m not sure I’ve ever used pmap. I think I’ve only ever seen it used in examples of how not to do concurrency.

reborg10:01:19

that's a bit harsh on pmap, it has its use cases

thomas10:01:28

I have read several times that pmap isn't that useful. Reducers(?) is much better I think.

danm10:01:39

We use pmap a fair bit in our flow

danm10:01:46

Lazy sequence in; we need to do some transforms on each item and maintain the sequence as a whole and its order, but there's no state, so multiple transforms can be done in parallel as long as input order == output order

danm10:01:02

Unless there's a better pattern we should be using for that?
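
[That pattern, sketched; `enrich` is a hypothetical stateless per-item transform, not code from the discussion:]

```clojure
;; a pure, stateless per-item transform (hypothetical example)
(defn enrich [item]
  (assoc item :name-length (count (:name item))))

;; pmap preserves the order of its input, so order in == order out
(pmap enrich [{:name "ada"} {:name "grace"} {:name "alan"}])
;; => ({:name "ada" :name-length 3}
;;     {:name "grace" :name-length 5}
;;     {:name "alan" :name-length 4})
```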

Rachel Westmacott10:01:57

@reborg I don’t have an opinion on whether it is good or bad, or if I should use it more or not. I just haven’t used it, and haven’t felt a need for it.

Rachel Westmacott10:01:31

I can well believe that our BDFL knew what he was doing when he thought it was worth adding. He has a track record of making good decisions after all.

reborg10:01:45

@peterwestmacott I don't use it that often either, I have a couple of examples that I'm digging. I don't think pmap is particularly evil tho (like the examples you mentioned seem to imply)

Rachel Westmacott10:01:25

yes, from memory the examples I’ve seen are more using pmap to demonstrate that you have to be careful with mutable bindings

Rachel Westmacott10:01:39

rather than that there’s anything wrong with pmap itself

reborg10:01:43

ah the with-redef trick?

reborg10:01:35

Friday Puzzle:

(sequence (partition-by keyword) ["1" "none" "2" "clojure.core/none" "3" "4"])
;; => (["1"] ["none"] ["2"] ["clojure.core/none" "3"] ["4"])

bronsa10:01:17

heh, ::c.c/none is a sentinel in many transducer impls
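
[For context: the transducer arity of partition-by keeps a "no previous value" sentinel, ::clojure.core/none, and because keywords are interned, a keyword built from input data can collide with it. A simplified illustration; see the actual clojure.core source for details:]

```clojure
;; keywords are interned: a keyword read from user data is the
;; very same object as the sentinel partition-by uses internally
(identical? (keyword "clojure.core/none") :clojure.core/none)
;; => true

;; so keywordizing "clojure.core/none" resets the "no previous value"
;; state, and the next element lands in the same partition:
(sequence (partition-by keyword)
          ["1" "none" "2" "clojure.core/none" "3" "4"])
;; => (["1"] ["none"] ["2"] ["clojure.core/none" "3"] ["4"])
```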

rickmoynihan11:01:06

Would they not be better using (Object.) as a sentinel? That’s what I try normally to use when I need one.
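
[A sketch of the (Object.) approach; `lookup` is a hypothetical helper. A fresh Object is identical? only to itself, so no value arriving in input data can ever collide with it:]

```clojure
;; a fresh Object can never be forged from user input
(def sentinel (Object.))

(defn lookup
  "Like get, but distinguishes a missing key from a nil value."
  [m k]
  (let [v (get m k sentinel)]
    (if (identical? v sentinel)
      :not-found
      v)))

(lookup {:a nil} :a) ;; => nil
(lookup {:a nil} :b) ;; => :not-found
```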

rickmoynihan11:01:43

It seems that the ::c.c/none pattern could potentially cause security problems etc.

bronsa11:01:27

agreed, maybe open a ticket

rickmoynihan12:01:12

https://dev.clojure.org/jira/browse/CLJ-2312 I screwed up the formatting of the code snippet, is it possible to edit a post?

rickmoynihan12:01:57

also didn’t mean to post it as major doh

bronsa12:01:21

i can edit it for you

rickmoynihan12:01:59

Legend. Thank you.

bronsa10:01:22

i’m guessing that’s what’s happening here

chrjs10:01:03

Morning all.

reborg11:01:10

pmap actually "solves" this dynamic binding issue:

(def ^:dynamic *ctx* {})
(binding [*ctx* {:a 1}] (map #(update *ctx* :a (partial + %)) [1 2]))
; NullPointerException
(binding [*ctx* {:a 1}] (pmap #(update *ctx* :a (partial + %)) [1 2]))
; ({:a 2} {:a 3})

bronsa11:01:04

only for the first chunk

bronsa11:01:02

user=> (doall (binding [*ctx* {:a 1}] (pmap #(update *ctx* :a (partial + %)) (vec (range 33)))))
NullPointerException   clojure.lang.Numbers.ops (Numbers.java:1018)

bronsa11:01:44

it’s a by-product of pmap being implemented with future, which does binding conveyance, and the fact that the first numproc+2 elements are eagerly produced (and with a vector, that means eagerly producing max(numproc+2, chunkSize) elements)
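
[One common workaround, not raised in the discussion above and offered here only as a sketch: bound-fn captures the bindings in effect when the function is created and re-installs them on every call, regardless of which thread or chunk realizes the element:]

```clojure
(def ^:dynamic *ctx* {})

;; without bound-fn, the lazy seq escapes the binding scope and
;; later elements see the root value of *ctx*; bound-fn fixes that
(doall
 (binding [*ctx* {:a 1}]
   (map (bound-fn [x] (update *ctx* :a (partial + x)))
        (vec (range 33)))))
;; => ({:a 1} {:a 2} ... {:a 33})
```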

reborg11:01:06

ah thanks, I was trying to remember what problem I had in the past related to this, didn't dig deep enough

reborg12:01:19

Can we put down the following rule of thumb? Prefer pmap for non-trivial jobs of predictable and uniform size, when 32 parallel threads are ok. Prefer reducers for more unpredictable jobs and proc+2 threads of parallelism.
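
[For the reducers side of that rule of thumb, the shape might look like this toy sum: r/fold splits a vector into chunks (default size 512), reduces each on the fork/join pool, and combines the partial results:]

```clojure
(require '[clojure.core.reducers :as r])

;; parallel: chunks of the vector are reduced on fork/join threads,
;; then the partial sums are combined with +
(r/fold + (vec (range 100000)))
;; => 4999950000

;; same answer as the sequential reduce
(reduce + (range 100000))
;; => 4999950000
```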

thomas13:01:16

sounds like a good rule of thumb @reborg

otfrom13:01:41

I'd RT that @reborg ;-)

rickmoynihan14:01:04

@reborg: Is prefer really the right word? It seems a bit strong, and I suspect it’s more subtle than that. Why should you prefer pmap? Presumably because it’s just a one character change from map.

rickmoynihan14:01:33

I think there’s some subtlety to “when 32 parallel threads are ok”. i.e. they’re ok on a dual core machine… but you’ll probably see a lot of context switching compared to fewer threads… dunno, would really need to benchmark.

bronsa14:01:33

pmap has also issues with backpressure

reborg14:01:41

I didn't want to imply one is better than the other, or use pmap first if you can, it just happens to be first in the sentence 🙂

rickmoynihan23:01:20

yeah. I think that’s the main problem I have with the above sentence, though you articulated it much better than me.

reborg15:01:08

What about: use r/fold when you are not concerned by laziness or you have unpredictable size tasks or want parallelism to be driven by your amount of cores. Use pmap when your input is a lazy sequence that you prefer to consume on demand, you're not concerned by 32 parallel threads and your tasks are of uniform size.

rickmoynihan00:01:23

“not concerned with laziness” is a bit ambiguous I think, as it also could mean you should use r/fold if you don’t mind that your collection is lazy. I think you’re trying to say the opposite to that though. i.e.

Use `r/fold` when you have a vector or a map (or the collection is `CollFold`able) and you have unpredictable size tasks and want parallelism to be driven by your amount of cores.

rickmoynihan00:01:53

But I think r/fold is perhaps more complicated too… as you also need an associative combinef operation. e.g. you couldn’t reliably r/fold with - but you could with +.

rickmoynihan00:01:32

So you’d probably need to add the caveat to the r/fold that the function you want to reduce with is a monoid 😕
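
[Concretely, a sketch of that caveat: r/fold combines chunk results with the combining function, so it must be associative and supply an identity via its zero-arity call; + qualifies, - does not:]

```clojure
(require '[clojure.core.reducers :as r])

;; + is associative: grouping doesn't change the answer
(= (+ (+ 1 2) 3) (+ 1 (+ 2 3)))   ;; => true
;; - is not: (- (- 1 2) 3) => -4, but (- 1 (- 2 3)) => 2
(= (- (- 1 2) 3) (- 1 (- 2 3)))   ;; => false

;; r/fold also calls the combining fn with no args for an identity:
;; (+) => 0, whereas (-) throws ArityException
(r/fold + (vec (range 1000)))
;; => 499500, same as (reduce + (range 1000))
```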

rickmoynihan00:01:32

I’d suggest also the word “try” instead of “use” or “prefer”. That way at least you pass some of the burden/context/caveats back to the reader.

rickmoynihan00:01:18

Maybe something like: Try pmap because it’s easiest to try, though be careful as it can create a lot of overhead due to it spawning 32 threads and consuming & emitting a lazy sequence. It’s also best that pmap does a large amount of work per item to reduce this overhead. Try r/fold if you have less work to do for each item you want to process, and your input collection is a map? a vector? or it satisfies? CollFoldable.

reborg14:01:07

Thanks @U06HHF230 for the suggestions. I agree for the "try" and laziness implications. It's definitely complicated to come up with a complete rule in a few sentences but this is already good

reborg14:01:34

It's my attempt to remember the gist of discussions we are having here and in other channels, and possibly put it in the book :)

reborg15:01:25

@bronsa happy to cast that in the rule of thumb if you can phrase it accordingly

chrisjd20:01:57

Out of interest, anyone here deploying apps to Nanobox?

chrisjd20:01:53

I love the platform - and think it has much more popularity to come - but Clojure support could be better.

dominicm22:01:09

nanobox is interesting. It seems like they're going for / are HIPAA & PCI compliant. Which makes it quite interesting.

yogidevbear23:01:23

I'm guessing it's related to metadata? https://clojure.org/reference/metadata

yogidevbear23:01:53

More specifically, the metadata reader macro?

dominicm23:01:16

ding ding ding, yes 🙂

yogidevbear23:01:08

Ooo I can see myself using macroexpand a lot to dig deeper into succinct code examples
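
[For example, a generic illustration (not the original puzzle): the ^ reader macro attaches metadata to the next form at read time, and macroexpand-1 expands a single macro step:]

```clojure
;; ^ attaches metadata at read time; ^:private is shorthand
;; for ^{:private true}
(meta ^:private [1 2 3])
;; => {:private true}

;; macroexpand-1 shows one step of expansion without recursing
(macroexpand-1 '(when true :yes))
;; => (if true (do :yes))
```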

dominicm23:01:30

In vim (which you're of course using 😉 ) you can hit c1mm to run macroexpand-1 on code under the cursor.

dominicm23:01:00

(-1 version is better for experimenting because it's non-recursive)
