#clojure-uk
2020-04-28
dharrigan05:04:27

Good Morning!

thomas07:04:34

back to work after King's Day (in isolation)

Ben Hammond07:04:57

ooh do you get to be King For A Day?

Ben Hammond07:04:33

(hoping they don't spill your blood to fertilise the fields at the end)

thomas07:04:24

No, we are supposed to celebrate his birthday.

😄 4
thomas07:04:48

But as a republican I disagree.

😄 4
Ben Hammond07:04:42

you wouldn't want to have to elect a head of state though, would you

thomas07:04:50

I'd prefer the German model. No one actually knows who the Federal President of Germany is.

Ben Hammond08:04:02

same with India I suppose now you mention it

Ben Hammond08:04:51

he's not objectionable so no one remembers him

thomas08:04:37

Or Israel as well I think.

mccraigmccraig08:04:47

@thomas a schrödinger president ?

alexlynham08:04:01

nah he was president in the 90s

thomas08:04:22

he was chancellor, not president.

thomas08:04:45

But as in a cat: yes, @mccraigmccraig

thomas08:04:04

in that case definitely a yes.

thomas08:04:19

irl he was a bit boring IMHO, but by not joining the Iraq war he did a good thing.

thomas08:04:53

But @alex.lynham proves my point completely... no one knows who the German President is. Therefore it is the solution. (On a different note, why do we need a head of state anyway? surely it is just an invented artifact)

Ben Hammond08:04:31

someone to sacrifice in time of famine

Ben Hammond08:04:48

pestilence might do

thomas08:04:58

That sounds reasonable.

folcon16:04:40

I’ve just looked at a function I wrote and have realised that I might be abusing reducers or underusing loop/recur…

(defn gen-data [{:keys [init scale count noise-factor acceleration] :or {acceleration 0}}]
  (-> (reduce
        (fn [{:keys [scale] :as state} i]
          (-> state
            (update :scale + acceleration)
            (update :result conj (+ init (* scale i) (* (next-rand) noise-factor)))))
        {:scale scale
         :result []}
        (range count))
    :result))
Am I overthinking this?

dominicm16:04:37

Why do you think that?

folcon17:04:24

I don’t know, I just thought my rationale for using it was a bit odd? I mean I’m building a complex accumulator and then reducing over it, then once I’m finished, using a keyword to pull out the result… Hence asking, is this normal? Or is there a better way of expressing this? Does someone look at this code and think ick, why didn’t you use x? Mentally I think I tend to treat loop/recur as something to pull out only if required

dominicm18:04:22

Just spotted the range. A loop/recur might be a bit tidier.

dominicm18:04:40

Generally I find that reduce is cleaner when there's a sequence involved.

folcon19:04:51

Yea, so it might not be a bad idea to rewrite it then =)… But do you find yourself doing this kind of thing when there is a sequence involved? i.e., building this -> {:keys [scale] :as state} and {:scale scale :result []} or should this sort of structure generally imply loop/recur or something else instead of reduce?

dominicm19:04:35

Reaching a keyboard in a minute, and golf sounds fun, one sec :)

dominicm19:04:56

@folcon in this case, you can easily break this into 2 phases: Calculating scales & calculating the results:

(defn gen-data
  [{:keys [scale acceleration count init]}]
  (map (fn [scale] (+ init scale))
       (take count (iterate #(+ % acceleration) scale))))
This tidies it up a little. I'd probably use an anonymous literal #() for the (+ init scale ...) part, but I wanted to name it for example purposes.

folcon19:04:46

That’s not quite doing the same thing? Original fn:

(gen-data {:init 10 :scale 0 :count 20 :noise-factor 0 :acceleration 2})
#_=> [10.0 12.0 18.0 28.0 42.0 60.0 82.0 108.0 138.0 172.0 210.0 252.0 298.0 348.0 402.0 460.0 522.0 588.0 658.0 732.0]
Yours:
(gen-data {:init 10 :scale 0 :count 20 :noise-factor 0 :acceleration 2})
#_=> (10 12 14 16 18 20 22 24 26 28 30 32 34 36 38 40 42 44 46 48)

dominicm19:04:23

@folcon I simplified the "math" for demonstration purposes. You need to do the (+ init (* scale i) (* (next-rand) noise-factor)) still :)

dominicm19:04:38

Now I have sample values I can test my complete version :)

dominicm19:04:25

You probably also need map-indexed in order to track the (* scale i)

dominicm19:04:10

@folcon this impl gives me identical results (although I made up next-rand)

(defn gen-data2
  [{:keys [scale acceleration count init noise-factor]}]
  (map-indexed
    (fn [i scale] (+ init (* scale i) (* (next-rand) noise-factor)))
    (take count (iterate #(+ % acceleration) scale))))

folcon19:04:27

Ah, thanks. I do think that’s tidier, not considered using iterate, useful =)…

folcon19:04:25

It seems to be a lot faster as well =)…

dominicm19:04:40

Really?! I expected your version to be faster.

dominicm19:04:11

Something you might note, and this will be dependent on your domain, the count is redundant now.

dominicm19:04:39

You can just return an infinite sequence of data, and then the consumer can call take themselves. This is not possible with a reduce-based solution.
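A minimal sketch of that count-free variant, assuming a stand-in `next-rand` (the original helper isn't shown in the log):

```clojure
;; next-rand is a stand-in for the helper used elsewhere in the thread
(defn next-rand [] (rand))

;; Return an infinite lazy sequence; the consumer decides how much to take.
(defn gen-data-seq
  [{:keys [scale acceleration init noise-factor]}]
  (map-indexed
    (fn [i scale] (+ init (* scale i) (* (next-rand) noise-factor)))
    (iterate #(+ % acceleration) scale)))

;; The consumer picks the length:
(take 5 (gen-data-seq {:init 10 :scale 0 :noise-factor 0 :acceleration 2}))
;; => (10.0 12.0 18.0 28.0 42.0)
```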

folcon19:04:59

criterium would be better but:

(defn gen-data+acceleration-old [{:keys [init scale count noise-factor acceleration] :or {acceleration 0}}]
  (-> (reduce
        (fn [{:keys [scale] :as state} i]
          (-> state
            (update :scale + acceleration)
            (update :result conj (+ init (* scale i) (* (next-rand) noise-factor)))))
        {:scale scale
         :result []}
        (range count))
    :result))

(defn gen-data+acceleration
  [{:keys [scale acceleration count init noise-factor]}]
  (map-indexed
    (fn [i scale] (+ init (* scale i) (* (next-rand) noise-factor)))
    (take count (iterate #(+ % acceleration) scale))))

(time
  (dotimes [_ 100]
    (gen-data+acceleration-old {:init 10 :scale 0 :count 2000 :noise-factor 0 :acceleration 2})))
"Elapsed time: 1407.115525 msecs"
(time
  (dotimes [_ 100]
    (gen-data+acceleration {:init 10 :scale 0 :count 2000 :noise-factor 0 :acceleration 2})))
"Elapsed time: 0.410662 msecs"

dominicm19:04:25

@folcon Try sticking a doall around my version.

dominicm19:04:41

(it's a lazy seq, so you're not doing anything until you print it, or otherwise do something)

folcon19:04:00

(time
  (dotimes [_ 100]
    (doall
      (gen-data+acceleration-old {:init 10 :scale 0 :count 2000 :noise-factor 0 :acceleration 2}))))
"Elapsed time: 1046.060648 msecs"
=> nil
(time
  (dotimes [_ 100]
    (doall
      (gen-data+acceleration {:init 10 :scale 0 :count 2000 :noise-factor 0 :acceleration 2}))))
"Elapsed time: 198.482497 msecs"
=> nil

dominicm19:04:12

That's impressive actually... I don't know how to explain that!

folcon19:04:15

slower than before, but still better

dominicm19:04:10

My experience has been that combining reducers gets you a performance boost. Historically I've taken several separate filter/map/etc. and combined them into a single reduce producing a single result & seen orders of magnitude improvements.

dominicm19:04:47

Hmm, I guess in the reduce case you're constantly iterating on a hashmap, and my version can avoid that. You just pay the lazy seq cost which I guess is cheap relatively. Useful to know.

folcon19:04:25

Yep, was surprising for me as well =)…

rickmoynihan07:04:50

I’m a little late to the golf party here; but if you take @U09LZR36F’s improvements and port it to use a transducer it’s almost 2x quicker in my crude experiments:

(defn gen-data+acceleration-transduce [{:keys [init scale count noise-factor acceleration] :or {acceleration 0}}]
    (transduce
     (comp (take count)
           (map-indexed (fn [i scale]
                          (+ init (* scale i)
                             (* (next-rand) noise-factor)))))
     conj!
     (transient [])
     (iterate #(+ % acceleration) scale)))
Note this version also uses a transient which leads to a further small improvement over using conj and a standard vector; though the transient doesn’t seem to make as big a difference as you might think.

dominicm08:04:37

I think you're missing a call to persistent!

rickmoynihan08:04:44

conj! should do that in the arity-1 function body

dominicm08:04:30

Ah! Clever, interesting. Didn't know that trick.

rickmoynihan08:04:50

oh actually 👀 not sure it does

rickmoynihan08:04:57

I think this is right:

(defn gen-data+acceleration-transduce [{:keys [init scale count noise-factor acceleration] :or {acceleration 0}}]
    (persistent! (transduce
                (comp (take count)
                      (map-indexed (fn [i scale]
                                     (+ init (* scale i)
                                        (* (next-rand) noise-factor)))))
                conj!
                (iterate #(+ % acceleration) scale))))

rickmoynihan08:04:28

i.e. it does call (transient []) for you for an initial value; but won’t call persistent! — which makes sense, as I guess you may still want to do more on the transient value.

dominicm08:04:15

The downside of this version is no laziness. Although sequence can do that.
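A sketch of that `sequence` approach: it applies the transducer incrementally over the infinite input, so callers can still `take` from the result (again assuming a stand-in `next-rand`):

```clojure
;; next-rand is a stand-in for the helper used elsewhere in the thread
(defn next-rand [] (rand))

;; sequence applies the xform incrementally (chunk by chunk), so an
;; infinite input is fine and the consumer chooses how much to realise.
(defn gen-data-xf
  [{:keys [scale acceleration init noise-factor]}]
  (sequence
    (map-indexed (fn [i scale]
                   (+ init (* scale i) (* (next-rand) noise-factor))))
    (iterate #(+ % acceleration) scale)))

(take 5 (gen-data-xf {:init 10 :scale 0 :noise-factor 0 :acceleration 2}))
;; => (10.0 12.0 18.0 28.0 42.0)
```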

dominicm08:04:10

Conceptually I like that the function generates infinite data and the consumer can use take instead of there being a count

rickmoynihan09:04:06

Yeah — I chose to keep the existing contract for comparison. And I agree with you about not limiting it artificially. Transducers are supposed to give you that flexibility by splitting the xform from the starting sequence; however here the xforms are pretty tightly coupled to the starting sequence. You could probably refactor to take extra xforms to comp in, e.g. you could supply (take 10) as an argument — but I think that would start getting a bit messy.

rickmoynihan09:04:44

Though you could possibly supply the collection of scaled accelerations, and take on that; prior to transducing.

dominicm09:04:22

Without knowing the domain, I'm unsure whether it makes sense to do that really

rickmoynihan10:04:18

agreed — and in this case I don’t think it makes sense, mainly suggesting it as a potential solution in other cases when this sort of thing arises

dominicm10:04:56

Yeah. Transducers are also just fun 😛

rickmoynihan10:04:03

yeah they are — though I do sometimes struggle with splitting their concerns, whilst retaining things a lazy solution might give… the above tight coupling between input seq and the xform being an example of it

folcon14:04:22

Thanks for this @U06HHF230! I use transducers a lot, but pretty much exclusively in the context of (into [] some-xf coll). I really need to get comfortable using them in other ways =)… One thing I still find difficult is working out good ways of thinking about composing them; I’m also still not super comfortable just pulling an xform from a place and intuiting, oh, it just fits here… I suppose I don’t have a good feel for treating them similarly to how I treat higher order functions…

rickmoynihan14:04:54

TBH that way is even nicer too:

(defn gen-data+acceleration-into-transduce [{:keys [init scale count noise-factor acceleration] :or {acceleration 0}}]
  (into []
        (comp (take count)
              (map-indexed (fn [i scale]
                             (+ init (* scale i)
                                (* (next-rand) noise-factor)))))
        (iterate #(+ % acceleration) scale)))

rickmoynihan14:04:49

And it will already use a transient and conj! so should be just as fast

folcon14:04:00

Comparing:

(time
    (dotimes [_ 100]
      (doall
        (gen-data+acceleration-transduce {:init 10 :scale 0 :count 2000 :noise-factor 0 :acceleration 2}))))
  #_#_=> "Elapsed time: 83.439927 msecs"

  (time
    (dotimes [_ 100]
      (doall
        (gen-data+acceleration-into-transduce {:init 10 :scale 0 :count 2000 :noise-factor 0 :acceleration 2}))))
  #_#_=> "Elapsed time: 112.475216 msecs"
Not quite as good as your non-into version…

folcon14:04:19

Really need to spend some more time with criterium and up my benchmarking / profiling >_<… But in my mind not worth doing that until I have an app that’s got some meat to it =)… Almost there!

rickmoynihan14:04:51

I suspect that difference is due to unreliable benchmarking that criterium would help iron out; if you look at the definition of into, it’s essentially exactly what I wrote above.

rickmoynihan14:04:27

i.e. into is just a thin layer on transduce in that call path.
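Roughly how that call path looks — a simplified sketch, not the real definition (the actual clojure.core/into also forwards metadata onto the result):

```clojure
;; Simplified sketch of (into to xform from): editable collections go
;; through transient/conj!/persistent!, others fall back to plain conj.
(defn into-sketch [to xform from]
  (if (instance? clojure.lang.IEditableCollection to)
    (persistent! (transduce xform conj! (transient to) from))
    (transduce xform conj to from)))

(into-sketch [] (map inc) (range 5))
;; => [1 2 3 4 5]
```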

folcon15:04:45

That’s reasonable, let’s give that a go =)…

folcon16:04:22

After spending some time restarting because of editor existential failure

(criterium/bench
    (doall
      (gen-data+acceleration-old {:init 10 :scale 0 :count 2000 :noise-factor 0 :acceleration 2})))
Evaluation count : 35280 in 60 samples of 588 calls.
             Execution time mean : 1.639113 ms
    Execution time std-deviation : 53.330771 µs
   Execution time lower quantile : 1.588486 ms ( 2.5%)
   Execution time upper quantile : 1.788930 ms (97.5%)
                   Overhead used : 4.082033 ns

Found 3 outliers in 60 samples (5.0000 %)
	low-severe	 1 (1.6667 %)
	low-mild	 2 (3.3333 %)
 Variance from outliers : 19.0160 % Variance is moderately inflated by outliers

(criterium/bench
    (doall
      (gen-data+acceleration {:init 10 :scale 0 :count 2000 :noise-factor 0 :acceleration 2})))
Evaluation count : 44100 in 60 samples of 735 calls.
             Execution time mean : 1.239084 ms
    Execution time std-deviation : 87.727749 µs
   Execution time lower quantile : 1.162137 ms ( 2.5%)
   Execution time upper quantile : 1.502526 ms (97.5%)
                   Overhead used : 4.082033 ns

Found 4 outliers in 60 samples (6.6667 %)
	low-severe	 2 (3.3333 %)
	low-mild	 2 (3.3333 %)
 Variance from outliers : 53.4285 % Variance is severely inflated by outliers

(criterium/bench
    (doall
      (gen-data+acceleration-transduce {:init 10 :scale 0 :count 2000 :noise-factor 0 :acceleration 2})))
Evaluation count : 100620 in 60 samples of 1677 calls.
             Execution time mean : 593.705999 µs
    Execution time std-deviation : 23.468176 µs
   Execution time lower quantile : 568.977017 µs ( 2.5%)
   Execution time upper quantile : 649.404382 µs (97.5%)
                   Overhead used : 4.082033 ns

Found 5 outliers in 60 samples (8.3333 %)
	low-severe	 3 (5.0000 %)
	low-mild	 2 (3.3333 %)
 Variance from outliers : 25.4815 % Variance is moderately inflated by outliers

(criterium/bench
    (doall
      (gen-data+acceleration-into-transduce {:init 10 :scale 0 :count 2000 :noise-factor 0 :acceleration 2})))
Evaluation count : 90540 in 60 samples of 1509 calls.
             Execution time mean : 652.417338 µs
    Execution time std-deviation : 20.273083 µs
   Execution time lower quantile : 630.189989 µs ( 2.5%)
   Execution time upper quantile : 697.334784 µs (97.5%)
                   Overhead used : 4.082033 ns

Found 6 outliers in 60 samples (10.0000 %)
	low-severe	 5 (8.3333 %)
	low-mild	 1 (1.6667 %)
 Variance from outliers : 17.4268 % Variance is moderately inflated by outliers

rickmoynihan08:04:54

I suspect that small difference between the into variant and the transduce one is just the instance check and the attaching of metadata that into does, and you’ll find that if you increase your count to maybe 50,000 or something larger that the difference will become much less significant. Depends on how you plan on using this though.

folcon14:04:55

Attaching metadata? I didn’t know that into adds metadata. They’re basic functions to generate sample data, so I won’t be generating large numbers =)…

rickmoynihan15:04:49

Yeah just that: (meta (into (with-meta [1 2 3] {:a :b}) [4 5 6])) ;; => {:a :b} It’s the same semantics with most of the core collection functions; that metadata should be forwarded onto the new collections as you build them.

folcon16:04:18

Oh that’s cool =)…

dominicm19:04:35

@folcon working late, or an interesting hobby?

folcon19:04:20

In this case working late, I really want to get back to my hobby stuff, but I need to finish this >_<…

dominicm19:04:21

🐋 I hope you get through it fast and can return to rest

folcon20:04:38

Me too ;)…