#clojure-uk
2019-07-15
guy07:07:42

Morning!

jasonbell07:07:32

Morning

πŸ‘‹ 4
danm08:07:43

πŸ‘‹:skin-tone-2:

maleghast11:07:38

Morning All πŸ™‚

dharrigan12:07:53

Say you have a result of a partition (say 10 lists (but could vary!) of 100,000 items each). And you wanted to hand-off each part to be processed in parallel, is there a particular approach one might take?

guy12:07:53

I’ve never used it before but maybe pmap?

guy12:07:45

quick google gives me this too, https://github.com/reborg/parallel I think reborg is in this chat too iirc?

reborg13:07:19

O hai! Any questions please ask :)

πŸ‘ 4
mccraigmccraig12:07:28

@dharrigan put descriptions of the work on a manifold stream, map over the stream with the work function (doing the work either async or on a separate thread), set a buffer size to control concurrency, reduce to get results
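A rough sketch of what that might look like with manifold (assuming manifold is on the classpath; `process-part` and the partition sizes are made up for illustration):

```clojure
(require '[manifold.stream :as s]
         '[manifold.deferred :as d])

;; hypothetical work fn: sums one partition on its own thread
(defn process-part [part]
  (d/future (reduce + part)))

(def results
  @(->> (s/->source (partition-all 100000 (range 1000000)))
        (s/map process-part)   ; a stream of deferreds, one per partition
        (s/buffer 4)           ; at most ~4 partitions in flight at once
        (s/realize-each)       ; wait for each deferred's value
        (s/reduce conj [])))   ; collect the per-partition results
```

The buffer size is the concurrency knob: backpressure stops the source handing out more partitions until in-flight ones complete.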

mccraigmccraig12:07:41

i imagine something very similar with core.async would be good too
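The core.async equivalent might look like this sketch, using `pipeline-blocking` to control the number of worker threads (`a/to-chan!` is the current name; older core.async versions spell it `to-chan`):

```clojure
(require '[clojure.core.async :as a])

;; pipeline-blocking runs the transducer on 4 dedicated threads,
;; preserving the order of the input partitions
(def results
  (let [in  (a/to-chan! (partition-all 100000 (range 1000000)))
        out (a/chan 16)]
    (a/pipeline-blocking 4 out (map #(reduce + %)) in)
    (a/<!! (a/reduce conj [] out))))
```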

dharrigan12:07:31

can you clarify what "put descriptions of the work" means? (I've had a basic introduction to manifold with some kafka stuff, so sorta familiar with it, to a basic level).

mccraigmccraig12:07:25

@dharrigan just a plain old clojure datastructure describing the work to be done... {:type :http-fetch :url "" :outputf "fetch/bar.html"} etc

dharrigan12:07:44

ah, a map! okaydokey.

mccraigmccraig12:07:21

or a variant or whatever else makes sense... just pure data though

yogidevbear12:07:32

I didn't do much 🀷 but thank you for the thanks πŸ™‚

Ben Hammond13:07:25

I'm a bit late but https://juxt.pro/blog/posts/multithreading-transducers.html talks about parallelising inside a transducer chain

reborg14:07:58

Even more late - I didn’t see the question @dharrigan :( this is the most basic solution if I understand correctly: (pmap #(map inc %) (partition-all 10 (range 100))). This hands over your partitions to 12+2 parallel threads (assuming you have 12 cores) at a time. It’s also lazy (considering when to realize the next chunk)

zyxmn14:07:47

I shall never see Clapham Junction the same again.

Ben Hammond15:07:26

actually I think that might be Budapest Keleti

dharrigan15:07:28

@reborg Thanks!! πŸ™‚

dharrigan15:07:34

and thanks to @ben.hammond too! πŸ™‚

dharrigan15:07:59

I think I still have to get my head around transducers. I mean I sorta get them, but I need to feel them πŸ™‚

Ben Hammond15:07:26

they look a bit intimidating at first but you can think of it like:

Ben Hammond15:07:35

outermost level is usually (fn [rf] ... where `rf` stands for reducing function. but you can think of it as saying > "Hook me up to the outside world"

Ben Hammond15:07:36

and then inside that is this weird arity-3 function which always feels really clumsy to me - each arity means a different thing, which is usually a big no-no

Ben Hammond15:07:24

(fn [] ...) means > give me your initial value
(fn [acc] ...) means > stop writing please and finish up
(fn [acc next-val] ...) means > process this
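Putting those three arities together, a hand-rolled version of (map f) as a transducer might look like (`my-map` is a made-up name for illustration):

```clojure
;; a hand-written (map f) transducer, one case per arity
(defn my-map [f]
  (fn [rf]                        ; "hook me up to the outside world"
    (fn
      ([] (rf))                   ; init: give me your initial value
      ([acc] (rf acc))            ; completion: finish up
      ([acc x] (rf acc (f x)))))) ; step: process this item

(transduce (my-map inc) + 0 [1 2 3]) ; => 9
```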

Ben Hammond15:07:57

and then you can drip-feed values to the outside world by calling the rf that you received previously

Ben Hammond15:07:27

but you don't have to

Ben Hammond15:07:46

or you can call it a lot when you are feeling chatty
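A mapcat-style transducer shows both ends of that: it may call the rf zero times for an item, or several times (`my-mapcat` is a made-up name, and this sketch skips the `reduced?` check a production version would need):

```clojure
;; calls rf once per produced value: possibly zero, possibly many
(defn my-mapcat [f]
  (fn [rf]
    (fn
      ([] (rf))
      ([acc] (rf acc))
      ([acc x] (reduce rf acc (f x))))))

(into [] (my-mapcat #(repeat % %)) [1 2 3])
;; => [1 2 2 3 3 3]
```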

3Jane19:07:19

hey btw, I found this a lucid explanation of what transducers do πŸ˜„

πŸ‘ 4
3Jane19:07:34

please write it up somewhere because otherwise Slack will eat it in a week or two

dharrigan07:07:17

Thanks, very helpful. I'm also trying to understand when/where I should use them? I mean, they're not appropriate for everything (or are they?) I guess it comes down to understanding when it's time to transduce and when it's not...

Ben Hammond08:07:19

comes down to scale; if you are processing zillions of rows with lots of steps, then lazy sequences are slow and hard to control because each step gets its own chunk of lazy sequence.

Ben Hammond08:07:00

you can't 'just stop' processing because each step has to chew through its own chunk of data

Ben Hammond08:07:32

each one consuming memory and potentially causing head-retention

Ben Hammond08:07:37

whereas because transducers are plugged together into each other's rf they work in exact lockstep; one incoming item gets fully processed before the next one gets taken

Ben Hammond08:07:50

which is gentler on the memory usage

Ben Hammond08:07:07

and other system resources (like file handles)

Ben Hammond08:07:49

and it means that if you decide your big job is taking too long, you can just interrupt it and it will immediately stop

Ben Hammond08:07:28

(rather than interrupting it and waiting for all the intermediate lazy sequences to drain)
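That lockstep behaviour is easy to see with a terminating step like take: each item flows through the whole chain one at a time, and the pipeline stops the instant it has enough, even on an infinite input:

```clojure
;; one item at a time through the whole chain;
;; (take 3) halts the pipeline as soon as it has 3 results
(def xf (comp (map inc) (filter even?) (take 3)))

(into [] xf (range)) ; => [2 4 6], despite the infinite input
```

A chunked lazy-seq version of the same chain would realize 32-element chunks at each step instead.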

Ben Hammond08:07:18

I quite like the ephemeral nature of Slack in some ways

Ben Hammond08:07:39

I hope that each time I retell this story, I do it a little better

dharrigan15:07:22

But @ben.hammond your link is for reading tonight πŸ™‚

otfrom07:07:41

they are the dirtiest, should be horrible, but best crisps available atm

πŸ’― 4
seancorfield16:07:18

I'm good with anything that has the magic "J" ingredient in it... I love jalapenos!

otfrom17:07:12

jalapenos are great. I prefer habanero, chipotle, and scotch bonnet. nom nom nom

seancorfield17:07:45

There's a brand of jalapeno crisps here -- made from crinkle-cut fresh jalapenos and then deep-fried -- that I love, and my fridge is full of all sorts of hot sauces to put on any food I make. I also have some ghost pepper salt which is awesome on eggs πŸ™‚

otfrom17:07:10

I think you win with that

seancorfield16:07:22

@otfrom the crisps I mentioned -- also another desk snack 😁

seancorfield16:07:51

(and, yes, this thread in the background!)

yogidevbear16:07:48

omg! thank goodness I don't have access to the crispy jalapenos!

seancorfield16:07:44

I could bring you some next month πŸ™‚

πŸ™ 4
otfrom16:07:26

cooooool πŸ˜„
