#clojure-europe
2020-10-15
slipset05:10:34

Morning, fwiw, I've always thought about seagull management as drive-by management.

otfrom07:10:26

@slipset it is, but messier and noisier (unless your drive-by management includes shooting)

😂 3
genRaiy09:10:44

good october morning

otfrom10:10:01

first time I've used eduction and thought "ah, that feels like a good use"

(defn csv->nippy [in-file out-dir]
  (with-open [reader (io/reader in-file)]
    (run!
     (fn [data]
       (let [idx (-> data first :simulation)]
         (nippy/freeze-to-file (str out-dir "simulated-transitions-" idx ".npy") data)))
     (eduction
      (drop 1)
      (map #(zipmap header %))
      (map scrub-transition-count)
      (partition-by (fn [{:keys [simulation]}] (quot simulation 100)))
      (csv/read-csv reader)))))

otfrom10:10:06

so the question is... Is this a good use?

otfrom10:10:44

and I think all the data in there will get GC'd

borkdude10:10:43

@otfrom I think there's no real benefit of using eduction vs transduce here probably

borkdude10:10:15

@otfrom an eduction is basically just an xform + a source, which delays running that xform over that coll, and offers the ability to compose with more xforms.

(deftype Eduction [xform coll]
  ...)
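
A minimal sketch of that composition (illustrative names, not from the discussion): nothing runs until the eduction is reduced, and further xforms can be layered on top.

(def odds    (eduction (filter odd?) (range 10)))  ; xform + source, nothing computed yet
(def doubled (eduction (map #(* 2 %)) odds))       ; compose with a further xform
(into [] doubled)                                  ;=> [2 6 10 14 18]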

borkdude11:10:29

it's probably one of those things that you will need when you know you need it, in other cases yagni

otfrom11:10:54

eduction (and sequence) are essentially lazy tho IIUC

otfrom11:10:00

eduction calculates each time

otfrom11:10:11

sequence will cache the results of the calculation
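
A minimal sketch of that difference:

(def e (eduction (map #(do (prn :computing %) %)) (range 3)))
(def s (sequence (map #(do (prn :computing %) %)) (range 3)))
(into [] e)  ; prints :computing for each element
(into [] e)  ; prints again: the eduction re-runs the xform on every reduce
(doall s)    ; prints :computing for each element the first time
(doall s)    ; silent: the realised lazy seq is cached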

otfrom11:10:44

or are you thinking I should replace run! with transduce?

borkdude11:10:50

no, eduction with transduce

borkdude11:10:14

(transduce (comp ...) rf (csv/read-csv ...))

borkdude11:10:06

an eduction is only useful when you want to pass it around, in this function you have everything you need already, there's no need to create a wrapper around that

borkdude11:10:20

you could also make csv/read-csv an IReducible thing, so you create even less garbage
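
Something along these lines, perhaps; a rough sketch of a reducible line source (reducible-lines is a made-up name, not part of clojure.data.csv, and io is clojure.java.io as in the example above):

(defn reducible-lines
  "Reduce over the lines of a file without building a lazy seq,
  closing the reader when the reduction finishes or short-circuits."
  [file]
  (reify clojure.lang.IReduceInit
    (reduce [_ f init]
      (with-open [^java.io.BufferedReader rdr (io/reader file)]
        (loop [acc init]
          (if-let [line (.readLine rdr)]
            (let [acc' (f acc line)]
              (if (reduced? acc')
                @acc'
                (recur acc')))
            acc))))))

Reducing over that with transduce (parsing each line inside the xform) then streams the file row by row with no intermediate sequence.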

otfrom11:10:42

ok... I had thought that eduction would only realise a portion of what was coming in whereas transduce would realise the whole collection which would then go to run!

otfrom11:10:14

an IReducible read-csv would be great 🙂

borkdude11:10:25

an eduction will also run transduce when reduced

borkdude11:10:21

@otfrom that's not very different from that blog post about processing lines of text, maybe?

otfrom11:10:40

yeah, I just need to get my head around it again

otfrom11:10:57

and understand why people keep saying that it isn't the right way to do it

otfrom11:10:14

(finding good examples everyone agrees on for doing that feels hard, unless that ETL blog is the right way)

borkdude11:10:14

your example above is right, it's just not necessary to use eduction, since that boils down to just transduce. It's like writing (+ (identity 1) (identity 2)) while you could also write (+ 1 2)

otfrom11:10:28

doing what I did above at least had the advantage of working, whereas before getting all 500MB of csv into a vector of maps and then passing that to nippy ran out of memory

otfrom11:10:11

@borkdude ok, from my reading around it felt like the difference between `transduce`/`into` & `sequence`/`eduction` was similar to the difference between `[]` and a seq

otfrom11:10:38

and the difference between eduction and sequence was that sequence would hold the results in memory while eduction would recalculate each time

otfrom11:10:51

and it feels like I've got the wrong end of the stick on some of those differences

borkdude11:10:34

user=> (into [] (comp (drop 2) (take 1)) (range))
[2]
This doesn't realize the entire range, does it?

borkdude11:10:34

which is basically:

(transduce (comp (drop 2) (take 1)) conj (range))

borkdude11:10:18

anyway, if it works what you're doing, keep doing it :)

borkdude11:10:25

it's not wrong

borkdude11:10:42

user=> (transduce (comp (drop 10) (take 1)) (fn ([]) ([x]) ([x y] (prn y))) (range))
10
nil
Note that this skips over 10 numbers, then takes 1 number, prints it and then quits.

borkdude11:10:43

so you could do your side effect in the transducing function maybe, instead of first realizing it into a lazy seq

borkdude11:10:56

anyway, maybe not important

borkdude11:10:47

user=> (defn run!! [f xform coll] (transduce xform (fn ([]) ([x]) ([x y] (f y))) coll))
#'user/run!!
user=> (run!! prn (comp (drop 10) (take 1)) (range))
10
nil

pez12:10:06

I want to grok transduce. It doesn't "click" yet for me. Anyone seen a tutorial about it that can be recommended?

thomas12:10:05

the problem with some docs is that you need to understand it before you can actually understand the documentation. Transducers might well fall into that category.

โค๏ธ 3
otfrom12:10:16

re: https://clojurians.slack.com/archives/CBJ5CGE0G/p1602760474478900 I was more thinking about how much of it was realised at any one time w/o the possibility of garbage collection. My understanding was that transduce would put all of the transduced things in memory whereas eduction would only have (1?) some things in memory at any one time

borkdude12:10:32

I don't think that's true

otfrom12:10:58

ok. That is the bit that I'm struggling with. I'm not too surprised to hear that I've got it wrong. 🙂

borkdude12:10:40

Why do you assume transduce holds everything in memory at once?

borkdude12:10:59

It's more or less like reduce

borkdude12:10:19

Eduction is built on top of transduce

genRaiy12:10:47

maybe he means eager cos that's how transduce is advertised

borkdude12:10:01

yes, reduce is also eager. but that doesn't mean it will realize the entire input or hold everything in memory at once. Transducers know when to stop similar to how reduce knows to stop using a reduced value
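
For example, a reduce over an infinite range can still stop early via reduced:

(reduce (fn [acc x]
          (if (> x 5) (reduced acc) (conj acc x)))
        []
        (range))
;;=> [0 1 2 3 4 5]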

borkdude12:10:46

I think reading the source might make more sense than speculating.

otfrom12:10:25

I mean that the result of the reduce will be held in memory all at once, whereas the eduction will only realise as much as has been asked for

otfrom12:10:24

so (take 10 (eduction (map identity) (range 100000))) would only realise the first 10 things, whereas (take 10 (transduce (map identity) my-conj (range 1000000))) would realise the whole result of processing the range and then the take would take from the fully realised thing.

borkdude12:10:02

but in your example you use run! over the entire result, so the eduction is not relevant there?

otfrom13:10:28

perhaps it is run! I'm not understanding. I thought run! would only have the one element in memory at a time that it was trying to process (unless the collection or collection producing function realised more than one)

borkdude13:10:04

run! is effectively just reduce, but you're reducing your entire eduction, right? You're not lazily doing anything with your eduction, so in this case transduce or eduction boil down to the same thing

otfrom13:10:30

but reducing into a hash is going to take less memory than reducing into a seq of all the data. My understanding is that run! would not hold all of the seq in memory to do its work

otfrom13:10:01

but that if it was working on the result of transducing something into a vector then the whole vector would be in memory

borkdude13:10:38

yeah, you cannot lazily create a vector result.

borkdude13:10:47

but that's not an eduction/transducer problem?

borkdude13:10:55

not sure if I still follow :)

otfrom14:10:43

I'm not sure if I'm explaining myself badly or if my massive gap in knowledge is tripping me up

otfrom14:10:50

Or both 😉

mpenet14:10:50

about "laziness" (not the right term here imho), the eduction will be pulled in value by value, then if the input is realized? or not is another matter

mpenet14:10:06

(run! prn
      (eduction (map (fn [x]
                       (prn :x x)
                       (Thread/sleep 1000)
                       x))
                (range 10)))

otfrom14:10:08

The problem I'm trying to solve is that I need to transform data from a CSV and write it out as partitioned nippy files without blowing up memory

otfrom14:10:44

I agree that laziness isn't quite right

mpenet14:10:11

it's a pull based thing

otfrom14:10:49

But eduction is only going to realise values as they are pulled

mpenet14:10:12

an eduction is just partial application of xforms over something

otfrom14:10:25

And run! isn't going to hold them in memory

otfrom14:10:45

Unless I put it in an atom or something

mpenet14:10:55

it's like going over an iterator, item by item

borkdude14:10:02

it's the same as if you're just running over a lazy seq, no difference there

mpenet14:10:05

value by value (sounds better)

mpenet14:10:28

kinda sorta, without the cost of a lazy seq

borkdude14:10:40

I mean wrt holding things in memory

mpenet14:10:43

could be db rows, rs.next

otfrom14:10:47

But transduce would produce the whole vector which then run! would operate on

mpenet14:10:37

transduce is a bit like reduce: you could throw away the accumulation every time, but then why use transduce in the first place (if you mean using transduce instead of run!)

borkdude14:10:44

@otfrom My point with using transduce was: you're running over with a side effect. you could do the side effect in transduce instead, saving you the realisation of an eduction. But as I also pointed out, it may not be so important. mpenet has repeated this.
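
For instance, the body of csv->nippy above might look roughly like this with the side effect moved into the reducing function (an untested sketch, reusing header, scrub-transition-count, out-dir and reader from the original example):

(transduce
 (comp (drop 1)
       (map #(zipmap header %))
       (map scrub-transition-count)
       (partition-by (fn [{:keys [simulation]}] (quot simulation 100))))
 (fn ([] nil)
     ([acc] acc)
     ([_ data]
      (let [idx (-> data first :simulation)]
        (nippy/freeze-to-file (str out-dir "simulated-transitions-" idx ".npy") data))
      nil))
 (csv/read-csv reader))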

mpenet14:10:37

you never really realize an eduction, it never materializes, it's really just (sort of) an iterator

borkdude14:10:07

ok, that's a good point yes. no garbage from the eduction.

user=> (defn run!! [f xform coll] (transduce xform (fn ([]) ([x]) ([x y] (f y))) coll))
#'user/run!!
user=> (run!! prn (comp (drop 10) (take 1)) (range))
10
nil

mpenet14:10:12

the docstring probably gives a better description than me

borkdude14:10:29

I posted that above example to show the run! equivalent for transducers

borkdude14:10:11

but run! + eduction works equally well

mpenet14:10:17

eductions are awesome 🙂

mpenet14:10:26

it's the new juxt

mpenet14:10:33

but actually useful

mpenet14:10:41

I should create a company with that name maybe

😆 6
borkdude14:10:55

Are you disputing the usefulness of juxt....?

mpenet14:10:03

it's muscle flexing in most cases imho!

borkdude14:10:58

I usually use it with keywords (map (juxt :field-a :field-b) [{:field-a 1 :field-b 2}])

mpenet14:10:46

yeah I prefer select-keys, but I get your point

mpenet14:10:11

not really actually, different use

mpenet14:10:23

but yes, there are good uses for it. It's just quite rare

otfrom14:10:41

I use juxt all the time, but then I need to create a lot of vectors from maps of data to go into excel or csv files, so select-keys doesn't work for me

otfrom14:10:59

most of my work is in and out of csv or excel

otfrom14:10:34

@borkdude I see what you are getting at with using transduce there now

borkdude14:10:26

yeah, but as mpenet has pointed out, the overhead from the eduction might be small enough not to make this an issue

otfrom14:10:31

esp as run! is just a reduce with a proc according to the docs
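
From memory, its definition in clojure.core is roughly:

(defn run! [proc coll]
  (reduce #(proc %2) nil coll)  ; reduce used purely for side effects
  nil)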

mpenet14:10:06

yes it will be very efficient

borkdude14:10:12

same here: vectors from maps, for some reason I do this fairly regularly

otfrom14:10:02

I not only use juxt, I use (apply juxt vec-of-keys) b/c I'm a monster who does (into [] (map #(friendly-key-lookup %) vec-of-keys)) as well

otfrom14:10:55

thx for having the patience to go through this with me. I feel I understand a lot more of what is going on. 🙂

borkdude14:10:12

I remember a Clojure meetup in Amsterdam with the author of Midje doing a talk and somehow he needed matrix transposition. I just yelled: apply map vector. It's one of these things you just know ;)
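
For reference, the transposition one-liner:

(apply map vector [[1 2 3]
                   [4 5 6]])
;;=> ([1 4] [2 5] [3 6])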

otfrom15:10:34

it is indeed

mpenet15:10:19

and rarely need it! I got to use it once, on a job interview 😛

mpenet15:10:54

got an offer for that one

otfrom19:10:39

I've used it a few times, but then transposing a matrix isn't entirely odd for me

borkdude19:10:34

especially not in the case of CSVs where you want to have a column instead of a row

borkdude12:10:14

@pez The basic idea: What would be a more performant way of writing:

(->> [1 -10 11 -2] (filter pos?) (map inc))

borkdude12:10:57

You could squash filter and map into one function that runs over the seq:

(defn f [x] (when (pos? x) (inc x)))
and then do:
user=> (keep f [1 -10 11 -2])
(2 12)

borkdude12:10:24

Transducers basically give you the implementation of that idea for free
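
i.e. the fused pipeline can be written as a transducer stack that walks the input once:

(into [] (comp (filter pos?) (map inc)) [1 -10 11 -2])
;;=> [2 12]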

slipset13:10:47

Scary to share it here, but I've given a talk on them https://youtu.be/_4sgTq4_OjM

👍 3
slipset13:10:01

Still don't use nor understand them :)

mpenet13:10:34

it's a fine use of eduction, you don't need the return value of transduce so eduction+run! is ok

mpenet13:10:55

I mean you can juggle around not returning anything with transduce, but it's more work

mpenet13:10:13

using eduction just to get a reducible for input somewhere else is ok

👍 3
mpenet13:10:36

it's not just for "partial application" of xforms imho

pez21:10:56

Awesome talk about transducers there, @slipset.

🙂 3