This page is not created by, affiliated with, or supported by Slack Technologies, Inc.
2020-10-15
@slipset it is, but messier and noisier (unless your drive by management includes shooting)
first time I've used eduction and thought "ah, that feels like a good use"
(defn csv->nippy [in-file out-dir]
  (with-open [reader (io/reader in-file)]
    (run!
     (fn [data]
       (let [idx (-> data first :simulation)]
         (nippy/freeze-to-file (str out-dir "simulated-transitions-" idx ".npy") data)))
     (eduction
      (drop 1)
      (map #(zipmap header %))
      (map scrub-transition-count)
      (partition-by (fn [{:keys [simulation]}] (quot simulation 100)))
      (csv/read-csv reader)))))
@otfrom I think there's no real benefit of using eduction vs transduce here probably
@otfrom an eduction is basically just an xform + a source, which delays running that xform over that coll, and offers the ability to compose with more xforms.
(deftype Eduction [xform coll]
...)
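A minimal sketch of the "xform + source" description above, using only clojure.core. The names `calls`, `ed`, `total` and `firsts` are made up for illustration. It suggests that creating the eduction does no work, that the work happens when it is reduced, and that an eduction can be fed to `into` with a further xform.

```clojure
(def calls (atom 0))

;; Creating the eduction only pairs the xforms with the source collection.
(def ed (eduction (map (fn [x] (swap! calls inc) (* x x)))
                  (filter even?)
                  (range 10)))

;; Captured right after creation, before anything has reduced the eduction.
(def calls-at-creation @calls)

;; The xforms run when the eduction is reduced.
(def total (reduce + 0 ed))

;; An eduction can also serve as the source for another xform.
(def firsts (into [] (take 2) ed))
```

Here `total` sums the even squares of 0..9 and `firsts` holds the first two of them.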
it's probably one of those things that you will need when you know you need it, in other cases yagni
an eduction is only useful when you want to pass it around, in this function you have everything you need already, there's no need to create a wrapper around that
you could also make csv/read-csv an IReducible thing, so you create even less garbage
ok... I had thought that eduction would only realise a portion of what was coming in whereas transduce would realise the whole collection which would then go to run!
@otfrom that's not very much different from that blog for processing lines of text maybe?
(finding good examples everyone agrees on for doing that feels hard, unless that ETL blog is the right way)
your example above is right, it's just not necessary to use eduction, since that boils down to just transduce. It's like writing (+ (identity 1) (identity 2))
while you could also write (+ 1 2)
doing what I did above at least had the advantage of working, whereas before getting all 500MB of csv into a vector of maps and then passing that to nippy ran out of memory
@borkdude ok, from my reading around it felt like the difference between `transduce`/`into` & `sequence`/`eduction` was similar to the difference between `[]` and a seq
and the difference between eduction and sequence was that sequence would hold the results in memory while eduction would recalculate each time
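The caching difference described above can be checked with a call counter. This is a sketch with hypothetical names (`calls`, `counting-inc`, `s`, `e`): walking the result of `sequence` twice should run the mapping function once per element, while reducing the `eduction` twice should run it again each time.

```clojure
(def calls (atom 0))

(defn counting-inc [x]
  (swap! calls inc)
  (inc x))

;; sequence returns a lazy seq that caches what it has realised.
(def s (sequence (map counting-inc) [1 2 3]))
(doall s)
(doall s)
(def sequence-calls @calls)

(reset! calls 0)

;; eduction re-applies the xform every time it is reduced.
(def e (eduction (map counting-inc) [1 2 3]))
(into [] e)
(into [] e)
(def eduction-calls @calls)
```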
user=> (into [] (comp (drop 2) (take 1)) (range))
[2]
This doesn't realize the entire range, does it?
user=> (transduce (comp (drop 10) (take 1)) (fn ([]) ([x]) ([x y] (prn y))) (range))
10
Note that this skips over 10 numbers, then takes 1 number, prints it and then quits.
so you could do your side effect in the transducing function maybe, instead of first realizing it into a lazy seq
user=> (defn run!! [f xform coll] (transduce xform (fn ([]) ([x]) ([x y] (f y))) coll))
#'user/run!!
user=> (run!! prn (comp (drop 10) (take 1)) (range))
10
nil
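As a sketch of how a `run!!` helper like the one above might be applied to the partitioning problem in this thread: the sample rows and the `seen` atom below are invented for illustration, and collecting into an atom stands in for writing a file. This variant returns nil from every arity so the side-effecting function's return value is ignored.

```clojure
(defn run!!
  "Like run!, but applies xform to coll first. Returns nil."
  [f xform coll]
  (transduce xform
             (fn
               ([] nil)            ; init: no accumulator needed
               ([_] nil)           ; completion: nothing to return
               ([_ x] (f x) nil))  ; step: side effect only
             coll))

;; Stand-in for the real side effect (e.g. freezing a partition to disk).
(def seen (atom []))

(def result
  (run!! #(swap! seen conj %)
         (comp (drop 1)                                         ; skip the header row
               (map (fn [[sim v]] {:simulation sim :value v}))
               (partition-by #(quot (:simulation %) 100)))      ; batch by hundreds
         [["simulation" "value"] [1 :a] [2 :b] [150 :c] [250 :d]]))
```

Each partition reaches the side-effecting function as one vector. The last partition is flushed by `partition-by` during the completion step, so it is not lost.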
I want to grok transduce. It doesn't ”click” yet for me. Anyone seen a tutorial about it that can be recommended?
@pez Have you seen https://clojure.org/reference/transducers?
the problem with some docs is that you need to understand it before you can actually understand the documentation. The transducers might well fall in that category.
re: https://clojurians.slack.com/archives/CBJ5CGE0G/p1602760474478900 I was more thinking about how much of it was realised at any one time w/o the possibility of garbage collection. My understanding was that transduce would put all of the transduced things in memory whereas eduction would only have (1?) some things in memory at any one time
ok. That is the bit that I'm struggling with. I'm not too surprised to hear that I've got it wrong. 🙂
yes, reduce is also eager. but that doesn't mean it will realize the entire input or hold everything in memory at once. Transducers know when to stop similar to how reduce knows to stop using a reduced value
I mean that the result of the reduce will be held in memory all at once, whereas the eduction will only realise as much as has been asked for
so (take 10 (eduction (map identity) (range 100000))) would only realise the first 10 things, whereas (take 10 (transduce (map identity) my-conj (range 1000000))) would realise the whole result of processing the range and then the take would take from the fully realised thing.
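A hedged way to test this intuition is to count how often the mapping function runs. The names `calls` and `traced` are made up. Moving `take` into the xform lets `transduce` stop early, so it need not process the whole range. Taking from an eduction goes through `seq`, which as far as I know is built in chunks, so it may realise somewhat more than 10 elements but still far fewer than the whole input.

```clojure
(def calls (atom 0))

(defn traced [x]
  (swap! calls inc)
  x)

;; take over an eduction: only a small prefix of the range is processed.
(def via-eduction
  (doall (take 10 (eduction (map traced) (range 100000)))))
(def eduction-calls @calls)

(reset! calls 0)

;; take inside the xform: transduce stops as soon as 10 items are through.
(def via-transduce
  (transduce (comp (map traced) (take 10)) conj [] (range 100000)))
(def transduce-calls @calls)
```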
but in your example you use run! over the entire result, so the eduction is not relevant there?
perhaps it is run! I'm not understanding. I thought run! would only have the one element in memory at a time that it was trying to process (unless the collection or collection producing function realised more than one)
run! is effectively just reduce, but you're reducing your entire eduction right. you're not lazily doing anything with your eduction. so in this case transduce or eduction boil down to the same thing
but reducing into a hash is going to take less memory than reducing into a seq of all the data. My understanding is that run! would not hold all of the seq in memory to do its work
but that if it was working on the result of transducing something into a vector then the whole vector would be in memory
I'm not sure if I'm explaining myself badly or if my massive gap in knowledge is tripping me up
about "laziness" (not the right term here imho): the eduction will be pulled in value by value; whether the input is `realized?` or not is another matter
The problem I'm trying to solve is that I need to transform data from a CSV and write it out as partitioned nippy files without blowing up memory
transduce is a bit like reduce, you could throw-away the accumulation every-time, but then why use transduce in the first place (if you mean using transduce instead of run!)
@otfrom My point with using transduce was: you're running over with a side effect. you could do the side effect in transduce instead, saving you the realisation of an eduction. But as I also pointed out, it may not be so important. mpenet has repeated this.
you never really realize an eduction, it never materializes, it's really just (sort of) an iterator
ok, that's a good point yes. no garbage from the eduction.
I usually use it with keywords (map (juxt :field-a :field-b) [{:field-a 1 :field-b 2}])
I use juxt all the time, but then I need to create a lot of vectors from maps of data to go into excel or csv files, so select-keys doesn't work for me
yeah, but as mpenet has pointed out, the overhead from the eduction might be small enough not to make this an issue
I not only use juxt, I use (apply juxt vec-of-keys) b/c I'm a monster who does (into [] (map #(friendly-key-lookup %) vec-of-keys)) as well
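For readers who have not met the `(apply juxt vec-of-keys)` idiom, a small sketch with invented data (`rows`, `cols` and `table` are hypothetical names): it turns a seq of maps into row vectors with a fixed column order, which is the shape a CSV or spreadsheet writer usually wants.

```clojure
(def rows [{:field-a 1 :field-b 2 :field-c 3}
           {:field-a 4 :field-b 5 :field-c 6}])

;; Column order is decided by this vector, not by the maps.
(def cols [:field-a :field-c])

;; (apply juxt cols) builds one function that returns [(:field-a m) (:field-c m)].
(def table (into [] (map (apply juxt cols)) rows))
```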
thx for having the patience to go through this with me. I feel I understand a lot more of what is going on. 🙂
I remember a Clojure meetup in Amsterdam with the author of Midje doing a talk and somehow he needed matrix transposition. I just yelled: apply map vector. It's one of these things you just know ;)
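The transposition trick mentioned above, as a minimal sketch (`matrix` and `transposed` are illustrative names): `map` with several collections walks them in parallel, so applying `vector` to each "column" yields the transposed rows.

```clojure
(def matrix [[1 2 3]
             [4 5 6]])

;; Equivalent to (map vector [1 2 3] [4 5 6]).
(def transposed (apply map vector matrix))
```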
@pez The basic idea: What would be a more performant way of writing:
(->> [1 -10 11 -2] (filter pos?) (map inc))
You could squash filter and map into one function that runs over the seq:
(defn f [x] (when (pos? x) (inc x)))
and then do:
user=> (keep f [1 -10 11 -2])
(2 12)
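Continuing that idea as a sketch: composing the `filter` and `map` transducers gives a similar single pass without hand-writing the fused function, and the same xform can be reused with different outputs. The names `xf`, `as-vector`, `as-sum` and `as-seq` are made up for illustration.

```clojure
;; One composed transformation: no intermediate seq between filter and map.
(def xf (comp (filter pos?) (map inc)))

(def as-vector (into [] xf [1 -10 11 -2]))       ; build a vector
(def as-sum (transduce xf + 0 [1 -10 11 -2]))    ; reduce straight to a number
(def as-seq (sequence xf [1 -10 11 -2]))         ; lazy seq, like the ->> version
```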
Scary to share it here, but I’ve given a talk on them https://youtu.be/_4sgTq4_OjM
it's a fine use of eduction, you don't need the return value of transduce so eduction+run! is ok