#adventofcode
2017-12-05
bhauman01:12:05

wow I love how everyone is getting into it this year

bhauman01:12:26

@grzm I'm digging how you generated the direction list for day3

minikomi02:12:07

sent a PR for readme!

rj.russell.1004:12:56

being brand new to Clojure and trying to figure out AoC is incredibly frustrating haha

fellshard05:12:20

Hmmmmm. Day 5 Part B is taking an awful long time to run with my soln

fellshard05:12:20

Probably should've gone mutable or something

minikomi05:12:46

not too tricky, 2nd part took a while to execute

fellshard05:12:32

Mine's not finishing, even giving it a fair chunk of time...

minikomi05:12:45

don't forget you can go out both ends of the list

minikomi05:12:48

switching to a transient for the "instruction tape" sped it up a bit

minikomi05:12:03

..actually a lot .. 2s vs 11s
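For readers following along, the transient "instruction tape" idea being described might look something like this (a hedged sketch; the function name and exact shape are assumptions, using the part 2 rules where offsets of 3 or more are decremented after a jump and the rest incremented):

```clojure
;; Hedged sketch of the transient-tape approach for day 5 part 2.
(defn solve2-transient [offsets]
  (loop [tape (transient (vec offsets))
         idx 0
         steps 0]
    (if (or (neg? idx) (>= idx (count tape)))
      steps
      (let [v (nth tape idx)]
        ;; offsets >= 3 decrement, everything else increments
        (recur (assoc! tape idx (if (>= v 3) (dec v) (inc v)))
               (+ idx v)
               (inc steps))))))
```

On the puzzle's sample input [0 3 0 1 -3] this exits after 10 steps, matching the stated part 2 answer.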

fellshard05:12:12

I've got an infinite loop or something in mine

fellshard05:12:43

Ostensibly it should always exit, all instructions gravitate towards +3

fellshard05:12:55

Checking for both bounds

fellshard05:12:19

In the end, I made a mistake by using http://repl.it

fellshard05:12:32

Lightning fast on local

dpsutton06:12:53

that was fun. did we all just use maps and update?

erwin06:12:23

I started my solution by also updating the new location, which made the problem a lot harder and incorrect 😕 Part B took some time indeed

dpsutton06:12:48

(time (solve2)) "Elapsed time: 22630.602584 msecs" for map with update
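The map-and-update style being described might look like this (a hedged sketch; dpsutton's actual code isn't shown, names assumed). Using indices as map keys means an out-of-range lookup simply returns nil:

```clojure
;; Hedged sketch of the "map and update" style, part 1 rules
;; (every visited offset is incremented).
(defn solve1-map [offsets]
  (loop [tape (into {} (map-indexed vector offsets))
         idx 0
         steps 0]
    (if-let [v (tape idx)]          ; nil lookup = we jumped outside
      (recur (update tape idx inc) (+ idx v) (inc steps))
      steps)))
```

The sample input [0 3 0 1 -3] exits after 5 steps under part 1 rules.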

bytecrafter06:12:11

Heading to bed, but I used a vector and update, no transients, and part 2 took about 10s to run for me.

cjmurphy07:12:39

I used iterate - easy to think about a state -> state function and an exit condition on that state.
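A sketch of that iterate shape (hedged; names and details assumed, part 1 rules): a pure state -> state step function, plus an exit predicate on the state.

```clojure
;; Hedged sketch of the iterate style: state -> state step plus an
;; exit test, part 1 rules.
(defn jump [{:keys [tape idx steps]}]
  (let [v (tape idx)]
    {:tape (assoc tape idx (inc v))   ; increment the offset we left
     :idx (+ idx v)                   ; jump by its old value
     :steps (inc steps)}))

(defn solve1-iterate [offsets]
  (let [out? (fn [{:keys [tape idx]}]
               (not (< -1 idx (count tape))))]
    (:steps (first (filter out?
                           (iterate jump {:tape (vec offsets)
                                          :idx 0
                                          :steps 0}))))))
```

Because iterate is unchunked and lazy, jump is never applied to the out-of-bounds state.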

borkdude07:12:22

I want to start on day 5 but suddenly I’ve got CIDER problems…

borkdude07:12:30

Still managed with inf-clojure 🙂

val_waeselynck08:12:45

@dpsutton I used an int-array, took about 120ms to solve 2 🙂

andofryn08:12:55

mine is taking forever 😄

val_waeselynck08:12:14

So yeah, Clojure is fast 😇

andofryn08:12:39

wooo, it’s finally finished! Probably took around 10 minutes, though I was running it through a REPL instance in Visual studio code.

magic_bloat08:12:32

Using update-in and a vector - Part 2 took 36 seconds on my little computer.

andofryn08:12:12

This is my slow solution: https://pastebin.com/Cj4VZVR3 I’m new to Clojure, if anyone can see something I’m doing that’s really stupid (and maybe explains the slowness) that’d be very helpful

val_waeselynck08:12:44

@ you should inline inc-accordingly, when you define it as a function it will convert primitive ints to objects

val_waeselynck08:12:03

@ and maybe more importantly, you should probably type-hint nrs with ^ints nrs

val_waeselynck08:12:46

evaluate (set! *warn-on-reflection* true) before running your code to see if that's the case

andofryn09:12:22

thanks for the suggestions, very interesting! I’m missing the understanding of why those things make a difference though, I’ll do some googling to find that out 🙂

andofryn09:12:57

I am indeed getting some warnings after setting warn-on-reflection

andofryn09:12:59

null:12 recur arg for primitive local: current_index is not matching primitive, had: Object, needed: long
Auto-boxing loop arg: current-index
Reflection warning, projects/clojure/advent-of-code/day5/solution.clj:8:37 - call to static method alength on clojure.lang.RT can't be resolved (argument types: unknown).
Reflection warning, projects/clojure/advent-of-code/day5/solution.clj:10:21 - call to static method aget on clojure.lang.RT can't be resolved (argument types: int, int).
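For reference, a hedged sketch of a loop that avoids all three warnings; the names nrs and current-index follow the pasted solution and are assumptions:

```clojure
;; Hedged sketch: ^ints resolves the aget/alength reflection warnings,
;; and keeping the recur args primitive avoids the auto-boxing warning.
;; Note: mutates the input array in place. Part 1 rules.
(set! *warn-on-reflection* true)

(defn solve1-hinted [^ints nrs]
  (let [len (alength nrs)]
    (loop [current-index 0, steps 0]
      (if (or (neg? current-index) (>= current-index len))
        steps
        (let [v (aget nrs current-index)]
          (aset nrs current-index (inc v))
          (recur (+ current-index v) (inc steps)))))))
```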

val_waeselynck09:12:19

@ you have a bug actually, you want to do (alength nrs)

val_waeselynck09:12:00

well not a bug actually, but probably not what you intended either

andofryn10:12:55

good catch!

borkdude09:12:53

How can I get rid of this warning:

Boxed math warning, ... - call: public static boolean (java.lang.Object,long).
Trying mutable arrays for speedup

orestis09:12:29

I used vectors and assoc, “Elapsed time: 23169.810878 msecs” for part 2.

borkdude09:12:32

Mine takes about 5 seconds

borkdude09:12:25

I have one variant with a vector, a transient and an array: https://github.com/borkdude/aoc2017/blob/master/src/day5.clj I’m still wondering why the array version is so slow

orestis09:12:36

@val_waeselynck Do you have your solution somewhere?

minikomi09:12:06

yeah, transient seems fastest for me too:

advent.2017.day5> (c/quick-bench (solve2 input))
Evaluation count : 6 in 6 samples of 1 calls.
             Execution time mean : 6.091951 sec
    Execution time std-deviation : 254.233063 ms
   Execution time lower quantile : 5.895512 sec ( 2.5%)
   Execution time upper quantile : 6.518767 sec (97.5%)
                   Overhead used : 1.887221 ns

Found 1 outliers in 6 samples (16.6667 %)
	low-severe	 1 (16.6667 %)
 Variance from outliers : 13.8889 % Variance is moderately inflated by outliers
advent.2017.day5> (c/quick-bench (solve2-int-array input))
Evaluation count : 6 in 6 samples of 1 calls.
             Execution time mean : 1.403268 sec
    Execution time std-deviation : 18.226029 ms
   Execution time lower quantile : 1.381179 sec ( 2.5%)
   Execution time upper quantile : 1.419574 sec (97.5%)
                   Overhead used : 1.887221 ns

advent.2017.day5> (c/quick-bench (solve2-transient input)))
Evaluation count : 6 in 6 samples of 1 calls.
             Execution time mean : 1.306124 sec
    Execution time std-deviation : 15.056565 ms
   Execution time lower quantile : 1.291127 sec ( 2.5%)
   Execution time upper quantile : 1.320360 sec (97.5%)
                   Overhead used : 1.887221 ns

minikomi09:12:46

oh maybe it was not casting to int in the comparison

minikomi09:12:02

doing so gives:

advent.2017.day5> (c/quick-bench (solve2-int-array input))
Evaluation count : 6 in 6 samples of 1 calls.
             Execution time mean : 1.393645 sec
    Execution time std-deviation : 10.345934 ms
   Execution time lower quantile : 1.381836 sec ( 2.5%)
   Execution time upper quantile : 1.404963 sec (97.5%)
                   Overhead used : 1.887221 ns

borkdude09:12:34

ah, alength instead of count

val_waeselynck09:12:04

Some performance tips:
- Var lookups are slower and harder to optimize than locals lookups
- Make sure to type hint primitives and primitive arrays when necessary
- Primitive values (ints, bytes, booleans etc.) cannot travel across functions (they get boxed into objects)
- Protocols are way faster than multimethods
- Looking up a key in a record type is 5 to 10x faster than looking it up in a plain map
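The "primitives cannot travel across functions" tip in miniature (a toy illustration, not from the thread): a plain defn boxes its arguments and return value as Objects, while long hints give the fn a primitive signature.

```clojure
;; Toy illustration of boxed vs primitive function signatures.
(defn add-boxed [a b] (+ a b))                    ; Object, Object -> Object
(defn add-prim ^long [^long a ^long b] (+ a b))   ; long, long -> long
```

With (set! *unchecked-math* :warn-on-boxed), compiling the first version warns about boxed math in its body; the hinted one compiles to primitive arithmetic.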

minikomi09:12:07

length doesn't change so i set that in a let before the loop.. for me it was the comparison not being (int 3) which was slowing things down

orestis09:12:52

@val_waeselynck I modified your version a bit, almost halved the execution time (650ms -> 380ms) - just moved the alength call outside.

val_waeselynck11:12:47

@orestis Do you mean that moving the alength call outside improved the perf of my version by 2x?

orestis14:12:50

I think that might be it, yes. I’m still a noob so I’m not really sure what I’m doing there; type hinting is a bit opaque, but I don’t get any reflection warnings so…

orestis09:12:17

Adding the (int i) in comparisons actually makes things slower.

minikomi09:12:17

wow, borkdude's version smokes mine

minikomi09:12:57

advent.2017.day5> (c/quick-bench (bork-int-arr input))
Evaluation count : 6 in 6 samples of 1 calls.
             Execution time mean : 195.021813 ms
    Execution time std-deviation : 516.193459 µs
   Execution time lower quantile : 194.408773 ms ( 2.5%)
   Execution time upper quantile : 195.624313 ms (97.5%)
                   Overhead used : 1.887221 ns

minikomi09:12:29

(defn solve2-int-array [input]
  (let [i (into-array Integer/TYPE input)
        len (count input)]
    (loop [idx (int 0) c (int 0)]
      (if (or (> 0 idx) (<= len idx)) c
          (let [v (aget ^ints i idx)]
            (if (<= (int 3) v)
              (aset-int i idx (dec v))
              (aset-int i idx (inc v)))
            (recur (+ idx v)
                   (inc c)))))))

minikomi09:12:43

Any ideas of where to type hint that?

orestis09:12:35

@borkdude’s version takes 1000ms on my machine… Is running this inside a CIDER repl a bad idea?

borkdude09:12:53

@minikomi put this in your code:

(set! *warn-on-reflection* true)
(set! *unchecked-math* :warn-on-boxed)

minikomi09:12:17

I have both, but neither is complaining

borkdude09:12:18

@orestis Did you pull the most recent?

borkdude09:12:43

@minikomi Maybe at ^ints before i in let

borkdude10:12:03

@minikomi also don’t use count, but alength

borkdude10:12:27

@minikomi also don’t use aset-int, I discovered that’s slower as well
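Collecting those three suggestions, the revised function might look like this (hedged; minikomi's updated code wasn't posted): ^ints hint on the array, alength instead of count, and plain aset instead of aset-int.

```clojure
;; Hedged sketch applying the three suggested changes, part 2 rules.
(defn solve2-int-array [input]
  (let [^ints arr (int-array input)
        len (alength arr)]
    (loop [idx 0, c 0]
      (if (or (neg? idx) (<= len idx))
        c
        (let [v (aget arr idx)]
          (aset arr idx (if (>= v 3) (dec v) (inc v)))
          (recur (+ idx v) (inc c)))))))
```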

minikomi10:12:17

OK making those changes

minikomi10:12:20

sped up some

orestis10:12:26

@borkdude Just did, got it to 600ms.

borkdude10:12:41

@orestis what kind of machine are you on?

orestis10:12:46

I am using int-array instead of into-array, would that make a difference?

minikomi10:12:47

sped up a ton

orestis10:12:15

@borkdude 2014 Macbook Pro, 2.8GHz i5

borkdude10:12:34

@orestis Are you running exactly my code or an adaptation ?

orestis10:12:28

Tiny adaptation; I create the array with int-array, and don’t have the reducible-resource stuff; but I do this for all the adaptations.

minikomi10:12:05

the ^ints in the let by itself took me from 1.3s -> 98ms

minikomi10:12:38

weird.. it's like, make an array of type integer (by the way, it's an integer array!)

borkdude10:12:15

@orestis that’s effectively the same as mine?

borkdude10:12:34

oh yeah int-array

orestis10:12:30

I wonder if try/catch OutOfBounds exception would work.

borkdude10:12:30

btw I pushed a new version, but the performance is the same

orestis10:12:01

BTW, everyone has different inputs so we need a baseline if we are going optimize this silly thing 🙂

minikomi10:12:26

good point 😆

borkdude10:12:41

haha oh yeah

borkdude10:12:48

my input file is on github as well

borkdude10:12:35

tfw when you get out of bed somewhat earlier for Advent of Code, but CIDER won’t connect to your nrepl anymore and you have to fall back on inf-clojure …

orestis10:12:34

Is there any project that could easily take those different implementations and run proper benchmarks etc? Not sure how warm/cold JIT makes any difference here.

minikomi10:12:35

after more poking around, aset vs aset-int does indeed seem to be the culprit.

minikomi10:12:07

aset:

Evaluation count : 6 in 6 samples of 1 calls.
             Execution time mean : 102.748740 ms
aset-int:
Evaluation count : 6 in 6 samples of 1 calls.
             Execution time mean : 1.391871 sec

borkdude10:12:11

At this point you might want to write it in Java and just call from Clojure 😉

minikomi10:12:48

true. still fun to overthink 🙂

orestis10:12:58

@borkdude Well, the “naive” Clojure version that uses a vector can look really similar to the one that uses int arrays. Still a 10 line function anyway 🙂

borkdude10:12:28

true, but dealing with things like “should I use aset or aset-int” isn’t fun imho

borkdude10:12:01

why so slow, aset-int

orestis10:12:35

That’s because y’all know too many core functions for your own good 🙂

chrisblom12:12:04

Is there a usecase for aset-int if it is slower than aset? Or is it just a legacy function?

borkdude12:12:54

@chrisblom very good question

mfikes12:12:42

It's looking like loop / recur is the most popular for day 5, followed by iterate

borkdude12:12:05

I usually start with reduce, but when I need multiple things as the argument of the reducing function, I give up and fall back on loop

chrisblom12:12:16

if i macro expand the definition and clean it up a bit there are only 2 differences: aset has :inline metadata and uses .set instead of .setInt

mfikes12:12:17

I actually used reduce (on (range) as the step counter)

borkdude12:12:48

@mfikes Yeah, I think loop can mostly be rewritten to a reduce with destructuring

borkdude12:12:30

@mfikes Neat that you’re doing it with cljc now

mfikes12:12:12

The nice trick I missed is relying on get to return nil, as opposed to pre-checking the index
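Combining that trick with the reduce-on-(range) step counter mentioned above might look like this (a hedged sketch; names assumed, part 2 rules):

```clojure
;; Hedged sketch: reduce over (range) as the step counter; get on a
;; vector returns nil for both negative and too-large indices, so one
;; lookup covers both exits with no explicit bounds check.
(defn solve2-reduce [offsets]
  (reduce (fn [[tape idx] step]
            (if-some [v (get tape idx)]
              [(assoc tape idx (if (>= v 3) (dec v) (inc v)))
               (+ idx v)]
              (reduced step)))         ; jumped outside: step count is done
          [(vec offsets) 0]
          (range)))
```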

borkdude12:12:35

@mfikes That trick worked for me, until I got to primitive arrays

mfikes12:12:50

(You can learn so much by studying the range of solutions. 🙂 )

mfikes12:12:48

Yeah, that's the story. You can write clean Clojure, and then when you want it to go faster, you delve into messiness. (I suppose that's true in any language.)

mfikes12:12:28

At least in Clojure we can really push hard on the "clean" end of the spectrum.

borkdude12:12:47

Well, Scala really is better (not the right word, I must say, friendlier/easier) at this kind of stuff, while still allowing you to do functional. I think the type system helps to generate more primitive stuff if needed.

borkdude12:12:19

But I only need mutability during Advent of Code really, and then only for extra performance, not for getting the answer, so 🙂

mfikes12:12:47

Ahh, right. Does Scala essentially generate code like C++ does when instantiating templates?

mfikes12:12:58

(The closest I think I've seen Clojure come to this is how ClojureScript has macro variants of functions that can try to generate optimal code based on what is available at compile time.)

borkdude12:12:18

@mfikes Don’t know, I just didn’t have as much trouble with it as in Clojure: https://gist.github.com/borkdude/b37563939639b40013f483361a7c5d8f

borkdude12:12:09

(I was just learning Scala back then for fun)

mfikes12:12:02

An example of how (I suppose essentially anything) can be done with macros is how str in ClojureScript passes the literal strings in something like (str "a" 1 "b") untouched. (I think in Clojure it may still end up invoking (.toString "a") and (.toString "b").)

mfikes13:12:23

That may not be the best example, but it illustrates that anything can be done at compile time with macros if you are trying to eke out perf for the generated code.

borkdude13:12:00

@mfikes How does this relate to primitive array performance?

mfikes13:12:46

I'm just saying that if Scala can do nice things with types, Clojure may also be capable of the same if you go down the road of using macros (and if you have access to inferred types).

borkdude13:12:14

With the macro approach you can get rid of some calls like (str "foo" "bar"), but will it get you very far, I don’t know?

mfikes13:12:21

I suppose Scala has the benefit of much richer type inference.

borkdude13:12:31

There was also a talk about this topic at EuroClojure 2017: http://2017.euroclojure.org/nada-amin/

borkdude13:12:50

where the type system was used to generate fast low level code: https://www.youtube.com/watch?v=QuJ-cEvH_oI

mfikes13:12:01

I haven't thought about this too much, being on the JavaScript side of things 🙂 In that world, the engine watches the values fly by and infers their types, dynamically generating optimal machine code, and de-opting if it sees a value that violates assumptions, etc.

mfikes13:12:52

The end effect is that where in Clojure you can manually hint, in ClojureScript, you need to let the lower layers do it for you.

borkdude13:12:38

Yeah, I kind of wondered why you don’t need to add type hints in ClojureScript, so that’s how it works

mfikes13:12:01

The primary type hints in ClojureScript are ones that help avoid checked if: If the compiler knows it is dealing with a Boolean value for example, it can just use JavaScript's notion of truthiness.

borkdude13:12:50

@mfikes Do you perhaps have a clojurescript version of day 5 which runs on Node, to compare it with the performance of PureScript day 5?

mfikes13:12:28

I'll convert it to Lumo real quick

mfikes13:12:42

Or are you interested in Closure-optimized code?

mfikes13:12:58

I could write that fairly easily.

borkdude13:12:21

does it help for performance? and mutable array version is fine

mfikes13:12:07

I can take your fastest mutable array version, and produce an optimized Node-compatible JavaScript file for you to use to compare with PureScript.

mfikes13:12:18

What is your recommended mutable array version?

borkdude13:12:44

I don’t know about the absolute fastest, but this is my fastest: https://github.com/borkdude/aoc2017/blob/master/src/day5.clj#L54

mfikes13:12:24

Cool. I'm putting together a GitHub repo so it is possible to revise the code if needed and build and run it in Node

borkdude13:12:12

@mfikes I only asked if you had this laying around, not if you wanted to put time in it, but if you like to do it, still thanks 🙂

mfikes13:12:22

Nah, it is close to trivial

mfikes14:12:44

@borkdude For me, ClojureScript / Node.js is running that array-based solution 25% faster than Clojure. Here it is, set up so it is trivial for anyone to reproduce perf results or change the code: https://github.com/mfikes/advent-day-5-cljs-node

borkdude14:12:25

@mfikes Awesome, I’ll try

mfikes14:12:24

Interestingly Planck runs it at about the same speed as the Closure optimized JavaScript under Node.js. (JavaScriptCore is wicked fast.)

borkdude14:12:23

I get 240 ms on the JVM, 400 on Node

borkdude14:12:24

hmm, weirdly enough, in lein I get 800 ms.. it’s slower than from my REPL it seems

val_waeselynck14:12:30

Maybe the JVM didn't have enough time to warm up. V8 is probably more optimized than the JVM for quick warmup

borkdude14:12:50

@mfikes boot: 240 ms, lein 800 ms?

mfikes14:12:23

I think I need to add the -server JVM option to the lein setup in that project

mfikes14:12:02

@val_waeselynck I specifically have the perf test running over and over again to avoid warm up issues. (Good point, though 🙂 )

val_waeselynck14:12:50

@mfikes yeah I noticed but I'm wondering if that's enough, Criterium typically seems to do much more sophisticated things to take both warmup and GC into account 🤔

borkdude14:12:56

Ooh even faster with your input (I forgot the inputs differ)

mfikes14:12:27

@val_waeselynck I agree. But we have a huge perf diff right now to sort out 🙂

mfikes14:12:46

Yeah, it needs -server

borkdude14:12:49

really weird, I run both tests from the repl. Exact same code now.

mfikes14:12:50

Patching the repo

borkdude14:12:59

does boot run with server by default maybe?

borkdude14:12:48

or maybe that’s the default and lein adds client

mfikes14:12:26

Yeah, lein run evidently wants to minimize startup latency

borkdude14:12:41

yeah, now it works 🙂 180ms

borkdude14:12:41

@mfikes Thanks, that was a nice adventure

mfikes14:12:41

Yeah, with any perf tests you really really need to provide the code so others can try it as well 🙂

val_waeselynck14:12:01

@borkdude so what are your numbers now?

borkdude14:12:23

@val_waeselynck same as mike’s repo, ~ 180 ms on my machine in the JVM

mfikes14:12:53

My numbers are Clojure: 167 ms, ClojureScript, Closure / Node.js: 384 ms Planck: 570 ms

val_waeselynck14:12:10

Alright, thanks!

borkdude14:12:46

Some guy on Reddit had 75 ms using C++… we’re not done yet 😉

borkdude14:12:00

Maybe we can write a Clojure program which generates C++ and run that one

borkdude14:12:18

Didn’t Rich write C++ like that as well in Common Lisp? 😉

val_waeselynck14:12:05

Yes, but was it the same input? 🙂

borkdude14:12:14

Assembly would probably be not so difficult for this problem

borkdude14:12:21

or maybe straight JVM bytecode?

borkdude15:12:41

well, twice the speed of C++ ain’t bad

val_waeselynck15:12:23

especially when you haven't compiled it with heavy optimizations

val_waeselynck15:12:03

JIT is probably the best strategy for low latency (including the whole write + debug + compile + run process)

bhauman15:12:55

so why does the array implementation take so long???

bhauman15:12:14

checking it out with ints instead of longs

bytecrafter15:12:45

@mfikes AFAIK, Scala's generics aren't like C++ templates. Scala's generics are mostly the same as Java's generics - type erasure and boxing. There are conveniences, like implicit parameters to capture class tokens. But they're fundamentally not a code generation mechanism, as C++ templates are.

mfikes15:12:25

@bytecrafter Cool. I vaguely (perhaps mis-) recall something in Scala generating code covering the possible permutations of primitives, but that was perhaps 5 years ago for me.

bhauman15:12:36

@mfikes are you guys using the same set of inputs for benchmarking?

mfikes15:12:52

No, we realized that this causes a problem @bhauman

mfikes15:12:11

At least, we caught one instance recently where we forgot the data changes. 🙂

bhauman15:12:20

the final solution should determine the complexity

bhauman15:12:42

mine is in the 28 million

mfikes15:12:58

I haven't been too involved in the perf game, until some fun stuff this morning

bhauman15:12:26

well my fastest is 5 seconds! where as yours is in the ms

mfikes15:12:30

My day 5 result was 24,315,397

bhauman15:12:44

ok so same complexity

bhauman15:12:53

darn it son

mfikes15:12:00

My reduced-based solution using persistent data structures takes about 5 seconds

mfikes15:12:19

That benchmark repo is banging on primitive arrays

mfikes15:12:31

It is Java in disguise 🙂

bhauman15:12:17

my array implementation performs much worse, so I'm going to take this opportunity to learn some things ....

borkdude15:12:06

Came up with another perf optimization, but didn’t help: https://github.com/borkdude/aoc2017/blob/master/src/day5.clj#L127 Still 232 ms

borkdude15:12:22

@bhauman Didn’t see your solution, but aset-int is not so good for performance, we found out today

borkdude15:12:50

@bhauman Also:

(set! *warn-on-reflection* true)
(set! *unchecked-math* :warn-on-boxed)
helps

bhauman15:12:37

oh dems some tricks

bhauman15:12:42

cool I just posted mine

val_waeselynck15:12:02

@bhauman According to the docs, I think you probably want (int ...) instead of ^int ... https://clojure.org/reference/java_interop#optimization

bhauman15:12:30

yeah thanks I'm still working on the scratch area

bhauman15:12:38

just added that thanks!

bhauman15:12:00

I never have to optimize stuff like this so this is kind of a blast

val_waeselynck15:12:23

@bhauman I'm currently writing a home-made GraphQL engine, so I'm totally in the performance business right now 🙂

bhauman15:12:15

oh darn, sounds fun?????

val_waeselynck15:12:29

it is 🙂 actually not exactly a GraphQL engine, since it's an alternative to GraphQL with some added features.

val_waeselynck15:12:15

However, I guess the core server-side algorithm could be reused to implement both GraphQL engines and Om Next routers

chrisblom15:12:07

@borkdude mine is similar, but i'm not passing the mutable array via recur, runs in 130ms on my machine: https://github.com/ChrisBlom/advent-of-code/blob/master/src/adventofcode/2017/day05.clj#L16-L33

borkdude15:12:43

Maybe try the same input for comparison?

borkdude15:12:16

I tried without passing the array in the loop, but it did not matter I think

borkdude15:12:23

it’s just a reference right

mfikes15:12:00

@borkdude What kind of black magic is involved in annotating a local as a primitive ^int but treating it as a Boolean?

mfikes15:12:21

Hmm. Something to learn there...

borkdude15:12:29

@mfikes The bound value is an int, because and returns the int. But if it’s not truthy, it won’t be bound at all

mfikes15:12:46

Ahh. Thank you. I need to macroexpand to see that.

borkdude15:12:58

(set! *warn-on-reflection* true)
(set! *unchecked-math* :warn-on-boxed)
(if-let [^int x (and true 1)] (inc x) :else) ;; good
(if-let [x (and true 1)] (inc x) :else) ;; bad, boxed
(macroexpand '(if-let [^int x (and true 1)] (inc x) :else))
(let* [temp__4655__auto__ (and true 1)] (if temp__4655__auto__ (clojure.core/let [x temp__4655__auto__] (inc x)) :else))

borkdude15:12:32

it seems macroexpand doesn’t show the type hints?

borkdude15:12:47

it will if you (set! *print-meta* true)

bytecrafter15:12:08

@mfikes Interesting, I didn't know about that in Scala. Thanks!

mfikes15:12:17

Is [1 -3] a legitimate input? (In other words, can it exit by going negative?)

borkdude15:12:49

@mfikes I thought about that, but since I got the answer, I didn’t worry about it anymore 😉

mfikes15:12:07

@borkdude I got yours to run in 80 ms. I'll put the intersting change in a branch

borkdude15:12:35

that is very close to the C++ version

mfikes15:12:36

I was trying to eliminate a checked if in what ClojureScript produces, and got it to run faster, but it also made the Clojure faster (if everything pans out)

borkdude15:12:37

if this works, we have world domination

mfikes16:12:50

It is actually around 87 ms

borkdude16:12:08

veeery nice, let’s see if I understand

mfikes16:12:12

Wow. Same speed in Node

borkdude16:12:21

still the correct answer?

borkdude16:12:41

@mfikes I don’t understand why you need the if-some at all there

borkdude16:12:12

(aget maze cur-pos) should always return something when (< cur-pos length)

mfikes16:12:23

OK. Let me address that 🙂

bhauman16:12:05

OK I got it in the ball park

bhauman16:12:13

by switching to aget

borkdude16:12:31

@mfikes turned it to a let, seems to help a bit

borkdude16:12:46

so, if-let made it slower.. hmm

mfikes16:12:58

Perhaps crappy expansion?

mfikes16:12:30

It is surprising that ternary < in Clojure botches the perf by an order of magnitude, but not so in ClojureScript. Specifically, trying to do the right thing and testing (< -1 cur-pos length) instead of just (< cur-pos length) makes it run in 900 ms instead of 87 ms.

borkdude16:12:35

Well, I could in fact shave of a couple of ms with this trick: https://github.com/borkdude/aoc2017/blob/master/src/day5.clj#L115

mfikes16:12:24

Oh, so check for zero and then optimize for that case?

borkdude16:12:25

@mfikes with your data input 78 ms

mfikes16:12:40

Wow. I'll apply that one! We must dominate.

mfikes16:12:04

Or, if you want I can add you as a committer and you can push your change in.

borkdude16:12:24

ok, add me 🙂

bhauman16:12:46

you could check for negative as well right?

mfikes16:12:23

I'm not seeing why it replaces 0 with 2. Should it be 1 instead?

borkdude16:12:59

but maybe this is no good, I don’t see a real change when applied to your benchmark with dotimes

borkdude16:12:18

Run 58 of 60.
"Elapsed time: 102.893639 msecs"
24315397
Run 59 of 60.
"Elapsed time: 90.53835 msecs"
24315397
Run 60 of 60.
"Elapsed time: 100.259889 msecs"
24315397
---
Run 1 of 60.
"Elapsed time: 129.65845 msecs"
24315397
Run 2 of 60.
"Elapsed time: 104.405081 msecs"
24315397
Run 3 of 60.
"Elapsed time: 96.121724 msecs"

mfikes16:12:58

Oh, I see why 2. You are trying to skip a loop cycle?

mfikes16:12:50

Hence the (+ steps 2)
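The zero shortcut, as inferred from the conversation (a hedged sketch, not borkdude's actual code): a 0 forces two fully predictable steps, staying put while the 0 becomes 1 and then jumping forward by 1 leaving 2 behind, so both can be folded into a single loop iteration.

```clojure
;; Hedged sketch of the zero-shortcut optimization, part 2 rules.
(defn solve2-zero-trick [input]
  (let [^ints maze (int-array input)
        len (alength maze)]
    (loop [cur-pos 0, steps 0]
      (if (or (neg? cur-pos) (>= cur-pos len))
        steps
        (let [v (aget maze cur-pos)]
          (if (zero? v)
            ;; fold the forced pair of steps: 0 -> 1 (stay), 1 -> 2 (jump +1)
            (do (aset maze cur-pos 2)
                (recur (inc cur-pos) (+ steps 2)))
            (do (aset maze cur-pos (if (>= v 3) (dec v) (inc v)))
                (recur (+ cur-pos v) (inc steps)))))))))
```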

borkdude16:12:11

with criterium I do see a difference of about 4-5 ms

borkdude16:12:16

which may not be significant

borkdude16:12:37

@bhauman negatives?

borkdude16:12:19

@mfikes we have world domination now… congrats.

mfikes16:12:16

In Node, my reduce-based "normal" solution takes 8 seconds, while the optimized version takes 80 ms in Node.

mfikes16:12:10

I should post this as an 80-ms JavaScript based solution: https://gist.github.com/mfikes/4ef3d2c3efc8f72a848e9149e1229e84

borkdude16:12:31

is that the output of the compiler?

mfikes16:12:33

Yeah! 🙂

borkdude16:12:25

@chrisblom Removed the maze from the loop arguments, another 4 ms 🙂

borkdude17:12:14

@mfikes Could you try this commit? It seems to be 30 ms faster on Node on my machine than the original: https://github.com/mfikes/advent-day-5/commit/3fdda4ba34d144fb795674f51ba81927c9125d1c

chrisblom17:12:31

has any tried unrolling the loop yet?

borkdude17:12:29

@mfikes also pushed a cljc fix, but it worked nonetheless

dpsutton17:12:11

to unroll a loop you need to know in advance how many times you will loop, no? and that isn't known until you get there

dpsutton17:12:36

unless you mean make the loop include several iterations with escape hatches in each?

dpsutton17:12:34

but i thought loop unrolling was a strategy to evade the loop test, which we cannot escape here, so we always pay the branch penalty

mfikes17:12:06

@borkdude Yes, Node is faster for me with your commit

borkdude17:12:28

How fast now?

mfikes17:12:15

Oh wait. It actually went for an original speed of 86 ms to slower at around 93 ms

borkdude17:12:12

On my machine, branch master:

24315397
Run 57 of 60.
"Elapsed time: 122.000000 msecs"
24315397
Run 58 of 60.
"Elapsed time: 136.000000 msecs"
24315397
Run 59 of 60.
"Elapsed time: 125.000000 msecs"
24315397
Run 60 of 60.
"Elapsed time: 128.000000 msecs"
24315397

borkdude17:12:27

Branch faster:

"Elapsed time: 114.000000 msecs"
24315397
Run 58 of 60.
"Elapsed time: 109.000000 msecs"
24315397
Run 59 of 60.
"Elapsed time: 106.000000 msecs"
24315397
Run 60 of 60.
"Elapsed time: 109.000000 msecs"
24315397

mfikes17:12:56

Right, something was messed up with the squashed commit I put on master. It was causing an inexplicable regression.

mfikes17:12:05

I've since removed it.

borkdude17:12:30

I noticed that compiling some code in the background severely impacted the performance, so these times also depend on what your machine is doing. For some real benchmarking, criterium or something like it is needed.

borkdude17:12:48

In my repo I used criterium, which gives better time for this commit

borkdude17:12:11

Maybe node also has something like this

mfikes17:12:14

FWIW, I fixed the master branch so it now has the better perf number

mfikes17:12:05

(At least better under Node)

mfikes17:12:20

Yeah, on my instance of Node that one runs faster

borkdude17:12:40

that’s weird, because the if-some is not needed… 🙂

borkdude17:12:28

on my machine it’s faster too

mfikes17:12:26

Right... the if-some should be cleaned up. It is unnecessary, but AFAICT it doesn't hurt perf.

borkdude17:12:46

when I change the if-some to let it becomes slower

mfikes17:12:19

I was referring to perf on Node. Let me make the change as well to be sure.

borkdude17:12:31

@mfikes I was also talking about that

borkdude17:12:01

With let:

"Elapsed time: 132.000000 msecs"
24315397
Run 59 of 60.
"Elapsed time: 129.000000 msecs"
24315397
Run 60 of 60.
"Elapsed time: 138.000000 msecs"
24315397
With if-some:
Run 57 of 60.
"Elapsed time: 111.000000 msecs"
24315397
Run 58 of 60.
"Elapsed time: 106.000000 msecs"
24315397
Run 59 of 60.
"Elapsed time: 111.000000 msecs"
24315397
Run 60 of 60.
"Elapsed time: 106.000000 msecs"
24315397
(Node)

mfikes17:12:36

Ahh... indeed, converting it to a let and dropping the else branch makes it slower for me on Node than having an if-some... going to look at the generated JavaScript

borkdude17:12:01

So far we have learned: if-let slows down, but if-some speeds up 😛

grzm17:12:21

@bhauman thanks 🙂 I'm amazed at all of the different ways people approach this, and I'm sure what seems obvious to one seems clever to someone else. I'm learning a lot by reading. Trying to figure out a way to make it all stick as efficiently as possible.

spfeiffer18:12:01

So AoC leads to the discovery of obscure perf regressions…great!

borkdude18:12:50

@spfeiffer We discovered that Node gets speedup from adding obsolete code…

spfeiffer18:12:10

@borkdude I think that is a very nice metaphor for the Node ecosystem :relaxed:

mfikes19:12:30

@borkdude I did a bunch of tests. It seems to boil down to the fact that an extra nil-check, while not free, pays off because it lets the Node optimizer know that it is doing arithmetic on non-`null` values. (We as humans know that the values pulled out of the array are not nil, but the optimizer doesn't.) In fact, I saw one test run with Lumo where it started off slow, and then "figured it out" on its own, and started running at the faster speed, while in a loop, without me doing anything. I also found that you can get a lesser speedup by changing (aget maze cur-pos) to (int (aget maze cur-pos)). This gets compiled down to JavaScript that involves | 0, which is evidently an ostensibly useless appendage added to JavaScript arithmetic code to "hint" to the optimizer that it is doing integer arithmetic. So, when-some or even a more straightforward (when-not (nil? ...)) ..., while not changing the functionality, ends up acting like a hint, or an assertion that the optimizer can use to its advantage.

borkdude19:12:24

Interesting and at the same time frightening

mfikes19:12:09

One interesting question in my mind: In the early days, types were used for performance. Will we reach a point where types are inferred at runtime sufficiently well that the performance of such code behaves just as well as statically typed code? Or, to put it another way, with enough money being dumped into JavaScript VMs, will ClojureScript end up running faster than Clojure one day? It can already do this at times for unhinted Clojure, but I'm wondering if it will all get to be so good that you can just write your dynamic unhinted Lispy code and it will run extremely fast.

borkdude19:12:43

That would be nice, but nothing beats full type inference for this I think?

borkdude19:12:21

I mean, if you have all type info, what else is there to guess?

borkdude19:12:06

PureScript probably won’t have much benefit from typing to performance because of its target?

mfikes19:12:49

An analogy: I used to find it odd that you didn't pass flags like -O3 to the javac compiler. It was difficult to just "let go" of the idea optimizing at compile time, and defer it all to runtime. One argument is that if you let the runtime do the optimization, it can leverage the call and memory access patterns in your program, and perhaps do a better job than a static analyzer could do. If static type info is available, then, sure, it can be useful. I'm just wondering if we as human developers will be in the low-level business of type hinting a decade or two from now.

borkdude19:12:28

Maybe there will be a more fuzzy type of static type inference based on ML?

borkdude19:12:51

That could also assist in IDEs and for performance, without getting in the way

borkdude19:12:07

I think Steve Yegge once wrote about this

borkdude19:12:47

That’s 10 years ago… he’s now fond of Kotlin 😉

tbaldridge20:12:36

@mfikes I'd argue we're already there with pypy

dpsutton20:12:21

i don't follow why types hints allow for anything more than asserted types

tbaldridge20:12:22

I've seen entire transducer pipelines compile down to half a dozen assembly instructions.

dpsutton20:12:04

if a jit finally believes you that they are ints, wouldn't it believe you if you said they were ints in the first place?

tbaldridge20:12:52

@mfikes what you describe is pretty much the premise of tracing JITs. https://en.wikipedia.org/wiki/Tracing_just-in-time_compilation

tbaldridge20:12:33

@dpsutton hints can also be used for storage optimization. In that case they're a bit stronger than assertions.

tbaldridge20:12:52

But yeah, hints are a way of papering over the inability of a JIT to deoptimize.

tbaldridge20:12:50

JS and PyPy do deoptimization. Put only ints in a list and the JIT will create a typed array of ints. Once you try to put an object in the array it will convert the list into a list of objects and then add the object to it.
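
A hedged illustration in plain JavaScript — the storage transition happens inside the engine (V8 calls these "elements kinds") and is not observable from the language itself, so the comments describe assumed internal state:

```javascript
const xs = [1, 2, 3];  // engine can store this as a packed array of small ints
xs.push(4);            // still integer storage: fast, unboxed
xs.push({});           // deoptimize: storage becomes an array of tagged objects
xs.push(5);            // from here on, even the ints are stored tagged/boxed
```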

tbaldridge20:12:01

So the question becomes, why should you ever have to hint at all? In Clojure it's because the JVM doesn't allow user-level deoptimization.

tbaldridge20:12:18

And it's a method jit, and that doesn't help either

dpsutton20:12:27

i'm not sure i follow. deoptimization sounds like recognizing a bad assumption?

dpsutton20:12:35

and that wouldn't happen with types? or does it

dpsutton20:12:52

I thought these were ints but there's a map so now they're all objects

dpsutton20:12:58

is what i'm imagining the JIT is doing

tbaldridge20:12:22

Well with types and something like C# or C++ deoptimization can never occur, since all the types are known at compile time.

tbaldridge20:12:58

On the JVM it's a bit strange, but it will deoptimize a callsite. So as you implement IFoo on more and more types the JVM may recompile every method that calls IFoo.foo.

tbaldridge20:12:29

That's why adding stuff like tuples to Clojure doesn't work since the perf improvement is a wash compared to the de-optimized callsites.

dpsutton20:12:47

interesting. thanks for explaining. my jvm knowledge is very slim

tbaldridge20:12:23

What tracing jits like PyPy do, is look at the context for a loop. Within a single loop they inline all methods and reduce all types to their lowest level parts. So as long as you never call a method with a map within that single loop, the code for handling that map will never exist in the loop.

mfikes20:12:19

Hmm. So, transducers are a little more friendly to tracing JITs, it sounds…

tbaldridge20:12:36

Right, on the JVM transducers are always boxed. On JS the JIT is a bit more tracing-like, so you get primitive math loops.

tbaldridge20:12:53

Do a transduce over an array in CLJ and CLJS and the CLJS version will be much faster if all you're doing is math.

bhauman20:12:50

so that wasn't a good idea? you starting to get hungry?

borkdude20:12:43

@bhauman did you comment on my deleted blob?

borkdude21:12:08

ah - yeah, it wasn’t a feasible idea

mfikes21:12:49

For the day 5 problem, it seems that it reads and writes rapidly to lots of parts of a big array. I wonder if memory bandwidth is affecting things more than any of the math / logic.

tbaldridge21:12:36

@mfikes in what language? and what access pattern?

tbaldridge21:12:01

I've seen 4x perf improvement from making sure I scanned an array from 0 to X instead of walking it randomly.

borkdude21:12:07

Pretty satisfied that we got it down to 75 ms, the same number the C++ and Rust guys are mentioning on Reddit 🙂

bhauman21:12:55

and thats for CLJS? right?

borkdude21:12:05

in my case Clojure JVM

borkdude21:12:35

I can’t reliably benchmark node, I use criterium in Clojure

tbaldridge21:12:14

I wonder how much using unchecked math would improve that?

bhauman21:12:28

oh thats been tried

bhauman21:12:54

or has it?

tbaldridge21:12:27

ah, cool, I missed that post

bhauman21:12:31

using :warn-on-boxed

borkdude21:12:08

@tbaldridge Mike discovered that if-let generates less performant code than just if and let…

borkdude21:12:30

this and a couple of other tweaks shaved off some ms here and there

mfikes21:12:45

The array is only about 1000 integers. We read from a spot which tells us where to jump to next, and we write to the spot we just read from with a new offset derived from the old one. Repeat until we hop out of the array.
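
That loop can be sketched directly (a hypothetical `countSteps`, following the part 2 rule that offsets of three or more are decremented and all others incremented):

```javascript
// Follow jump offsets through the array, rewriting each cell as we leave it,
// until a jump lands outside the array; return how many steps that took.
function countSteps(maze) {
  let pos = 0;
  let steps = 0;
  while (pos >= 0 && pos < maze.length) {
    const offset = maze[pos];
    maze[pos] = offset >= 3 ? offset - 1 : offset + 1;
    pos += offset;
    steps += 1;
  }
  return steps;
}

// The puzzle's example input exits after 10 steps:
// countSteps([0, 3, 0, 1, -3])  // 10
```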

mfikes21:12:47

Oh, I'd summarize it this way: By guarding a block of code with a nil check, that code runs faster under Node, even though we know (as humans) that none of the values are nil.

borkdude21:12:08

yeah, but on JVM Clojure the code ran also faster without if-let

bhauman21:12:49

yeah, if and then let is faster than if-let

borkdude21:12:54

we went from if-let to if-some and then to let… if-some turned out to be beneficial on Node… it’s hard to optimize for both worlds

bhauman21:12:19

wow it's a lot faster

tbaldridge21:12:31

yeah, I doubt memory perf is the issue then. At 1000 integers × 8 bytes, that should easily fit in the cache.

mfikes21:12:45

Yeah. I mistakenly assumed it was a much larger array.

bhauman21:12:22

using if and then let cut my time in half

borkdude21:12:24

while the if-some was an obsolete check, the Node runtime still benefited from that, crazy

mfikes21:12:35

Don't we already have a solution that is around the same as one of the C++ ones out there?

borkdude21:12:08

@mfikes Yes, I posted it 5 minutes ago: (quick-bench (part-2-array-faster?)) ;; 74 ms, or did you mean something else?

borkdude21:12:48

(on a Macbook Pro 2015)

mfikes21:12:32

We are running through 20 million steps in 70 ms. Around a few nanoseconds per loop?

mfikes21:12:40

If that's around 10 machine cycles, it might be hard to improve upon 🤔
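
The arithmetic behind that estimate (the 3 GHz clock is an assumption):

```javascript
// 20 million loop iterations in ~70 ms works out to a few ns per iteration.
const steps = 20e6;
const ms = 70;
const nsPerStep = (ms * 1e6) / steps;   // 3.5 ns per iteration
const cyclesPerStep = nsPerStep * 3;    // ~10.5 cycles at an assumed 3 GHz
```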

borkdude21:12:11

The assembly is probably straightforward, but I only did that in university…

tbaldridge21:12:40

hey @mfikes you had a twitter thing about emitting machine code form JS right?

tbaldridge21:12:44

got the machine code for this?

mfikes21:12:30

Yes, you can enable JavaScriptCore's logging of compilation. But it is very hard to actually catch the code you'd like. (It spews stuff faster than you can consume it.) I'll try and see 🙂

borkdude21:12:56

maybe writing a C program is easier

mfikes21:12:10

Yeah, this is probably a few lines of C

mfikes21:12:49

In short, if you run

JSC_dumpDisassembly=true planck src/advent_day_5/core.cljc

in https://github.com/mfikes/advent-day-5, you will see lots of disassembly.

mfikes21:12:36

Perhaps this is some of the assembly JavaScriptCore is creating for our loop, but it is hard for me to tell: https://gist.github.com/mfikes/5c68510b6ab2207b85cc92697a005a3e

borkdude21:12:04

I’m trying a C program now, about 195 ms on my machine

mfikes21:12:19

@borkdude Did you compile it with optimizations enabled?

borkdude21:12:19

No, I just did gcc -o day05 day05.c && ./day05, what optimizations do you suggest?

borkdude22:12:17

ah 57 ms with -O3

borkdude22:12:29

that’s probably all the speed you’re gonna get

mfikes22:12:09

It's remarkable we are able to produce pretty much the same result with some carefully crafted Clojure.

dpsutton22:12:47

this has got to be the best channel on slack right now

mfikes22:12:40

I can reproduce the 57 ms. I also tried inlining the input data array (instead of having the program read it from disk), but it still runs at the same speed—dominated by the loop.

borkdude22:12:55

I also applied the zero checking trick to the C program, but it didn’t help there

borkdude22:12:45

an extra branch (a simple boolean/int check in C) is more costly at that level than just doing the same loop body every time?

borkdude22:12:50

Talking about this one:

while (current <= 1036 && current >= 0) {
  move = numbers[current];
  /* if (move == 0) { */
  /*   numbers[current] = 2; */
  /*   steps += 2; */
  /*   current += 1; */
  /* } */
  /* else */
  {
    if (move >= 3)
      numbers[current] -= 1;
    else
      numbers[current] += 1;
    current += move;
    steps += 1;
  }
}
With code uncommented, it becomes slower, not faster.

borkdude22:12:55

I also tried to move the >= 3 branch to the first place, but that becomes even slower

borkdude22:12:27

Calling it a night. Tomorrow is another puzzle…

tbaldridge22:12:34

branch prediction at work most likely

tbaldridge22:12:52

easier to calculate something than do a conditional
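
For example, the inner update could be made branchless by computing the ±1 arithmetically instead of choosing it with a conditional (a sketch; whether this actually wins depends on how well the original branch predicts):

```javascript
// Branchy form, as in the C snippet above:
function updateBranchy(offset) {
  return offset >= 3 ? offset - 1 : offset + 1;
}

// Branchless form: the comparison coerces to 0 or 1, so 1 - 2 * flag
// yields +1 or -1 without a conditional jump.
function updateBranchless(offset) {
  return offset + 1 - 2 * (offset >= 3);
}
```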