#clojure
2015-07-31
meow11:07:29

For anyone that's been following my Sieve of Eratosthenes adventure you might enjoy the latest update to the wiki page where, with some great advice from @rauh, we've gotten to this point of simplicity and elegance:

(defn chan-of-primes []
  (let [ints   (chan-of-ints 2)
        sieve  (posmod-sift)
        primes (chan 1 sieve)]
    (pipe ints primes)
    primes))

meow11:07:40

Now I'm entering a new phase where channels are cool, but transducers are amazing. Clojure has the best toys ever!

ul12:07:33

@meow: you've written a great wiki page

meow12:07:52

@ul: thank you! I've learned a lot in the process.

meow12:07:10

I think I went from naive to optimized-but-doomed to complected to nirvana...

meow13:07:45

Rich Hickey used the word "macrology" in his blog post about transducers so I just had to look it up. Here it is:

/mak-rol'*-jee/ 1. Set of usually complex or crufty macros, e.g. as part of a large system written in Lisp, TECO, or (less commonly) assembler. 
What a great word! Who'da thought there was such a thing?

meow13:07:32

"transformers were never exposed a la carte, instead being encapsulated by the macrology of reducers"

trptcolin13:07:24

huh, i figured that must've been about how i get all logy ("dull and heavy in motion or thought; sluggish.") after thinking too long about macros

Alex Miller (Clojure team)13:07:32

reminder: Clojure/conj CFP is currently open till Aug 14th (reg/hotel/US travel paid for speakers) https://cognitect.wufoo.com/forms/clojureconj-2015-call-for-presentations/ http://clojure-conj.org

Alex Miller (Clojure team)13:07:50

hey, where'd that Clojure/West come from :) grr

meow14:07:14

cons is lazy, right?

Alex Miller (Clojure team)14:07:54

no - it eagerly builds a single cons cell :)

Alex Miller (Clojure team)14:07:38

most lazy constructions use it though to build step-wise output

meow14:07:05

(transduce (posmod-sift) conj (cons 2 (range 3 100 2)))

Alex Miller (Clojure team)14:07:29

transduce is eager so doesn't really matter what the input is here

meow14:07:48

I needed 2 at the head of a list of odd numbers.

meow14:07:58

right, in that context it didn't matter

meow14:07:08

what if I wanted it to be lazy?

meow14:07:35

a lazy version of (cons 2 (range 3 100 2))

Alex Miller (Clojure team)14:07:48

well parts of that are lazy :)

Alex Miller (Clojure team)14:07:01

the cons is not going to force realization of the range part

meow14:07:01

good enough!!!

meow14:07:16

oh, seriously?

meow14:07:20

that's cool

Alex Miller (Clojure team)14:07:36

it just points to the unrealized range

Alex Miller (Clojure team)14:07:00

range is a chunked seq so it will be realized 32 elements at a time as needed
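A small REPL sketch of the point above: `cons` eagerly builds one cell and leaves the tail alone. The `touched` counter and the `map` wrapper are illustrative additions, not part of the discussion, and the exact chunk size is an implementation detail (commonly 32).

```clojure
;; Count how many elements of the tail actually get computed.
(def touched (atom 0))

(def lazy-odds
  (map (fn [n] (swap! touched inc) n)
       (range 3 100 2)))

;; cons eagerly allocates a single cell; it does not walk the tail.
(def xs (cons 2 lazy-odds))

(def head (first xs))                 ; 2, read straight from the cons cell
(def touched-after-first @touched)    ; still 0: the tail is unrealized

;; Asking for the second element forces the first chunk of the tail.
(def snd (second xs))                 ; 3
(def touched-after-second @touched)   ; a whole chunk, not just one element
```

Reading `touched-after-second` at the REPL shows how many elements a single step into the tail realized.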

meow14:07:29

ok, that is very nice - I just didn't want to make that kind of assumption

meow14:07:07

@alexmiller: tyvm :thumbsup:

Alex Miller (Clojure team)14:07:24

one interesting aspect of the transduce above is that a range (in addition to being a lazy chunked seq) is also self-reducible (which transduce would leverage)

Alex Miller (Clojure team)14:07:32

except that with the cons there, it will not

meow14:07:57

say what?!?!?!

Alex Miller (Clojure team)14:07:12

sorry, probably way too much inside baseball on that :)

meow14:07:40

no, keep going

Alex Miller (Clojure team)14:07:51

there is an internal protocol that can be used to mark a collection as knowing how to reduce itself (which in some cases is more efficient than traversing as a seq). as of 1.7, range does this.

Alex Miller (Clojure team)14:07:10

cons will fall into the seq-handling reduction logic instead, although I think when it encounters the reducible, it will switch over (we worked on this near the end of 1.7 so I can't remember where it landed)
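The discussion does not name the protocol. In Clojure 1.7 and later the likely candidates are `clojure.core.protocols/CollReduce` and the Java interface `clojure.lang.IReduceInit`; treat that as an assumption rather than a confirmed answer. A quick check of which collections take the self-reducing path:

```clojure
;; Assumes Clojure 1.7 or later.
(def r  (range 3 100 2))
(def cr (cons 2 r))

;; A range knows how to reduce itself...
(def range-self-reducible? (instance? clojure.lang.IReduceInit r))
;; ...while a cons cell is a plain seq and does not.
(def cons-self-reducible?  (instance? clojure.lang.IReduceInit cr))

;; Either way, reduce and transduce give the expected results;
;; only the traversal strategy differs.
(def sum-range (reduce + r))
(def sum-cons  (transduce (map identity) + cr))
```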

meow14:07:01

wow, so even more performance gains via transducers

scriptor14:07:37

what's the name of that protocol?

scriptor14:07:51

I think I was poking around the source the other day and might've stumbled on it

meow14:07:44

@alexmiller: this is really weird, and maybe it's a REPL thing, but I'm seeing the opposite of what I'd expect based on what you just described:

(time (count (transduce (posmod-sift) conj (cons 2 (range 3 10000)))))
"Elapsed time: 194.089542 msecs"
=> 1229
(time (count (transduce (posmod-sift) conj (range 3 10000))))
"Elapsed time: 255.656795 msecs"
=> 1229

bronsa14:07:23

@meow: I see results consistent with yours too

meow14:07:16

@bronsa: thanks for the confirmation - weird, huh?

meow14:07:49

I bet my machine is way slower than yours :)

bronsa14:07:39

100ms slower :)

tcrayford14:07:46

don't use time for timing things, or a repl for timing things that are basically microbenchmarks. The JIT will likely mess your benchmarks up real bad

tcrayford14:07:56

exercise for the reader: use statistics to give you a confidence rating as to whether two benchmarks are different, rather than blindly comparing two numbers
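A rough sketch of the "use statistics" idea, using only core Clojure: warm up, collect several samples, and look at the spread as well as the mean. The helper names and the `work` function are hypothetical, and this is not a substitute for a proper benchmarking library; it only illustrates why a single `time` reading says little.

```clojure
(defn sample-ms
  "Runs f n times and returns each run's wall-clock time in milliseconds."
  [f n]
  (vec (repeatedly n
                   #(let [start (System/nanoTime)]
                      (f)
                      (/ (- (System/nanoTime) start) 1e6)))))

(defn mean [xs] (/ (reduce + xs) (count xs)))

(defn std-dev
  "Sample standard deviation of xs."
  [xs]
  (let [m (mean xs)]
    (Math/sqrt (/ (reduce + (map #(let [d (- % m)] (* d d)) xs))
                  (dec (count xs))))))

;; Hypothetical workload; substitute the expression being measured.
(defn work [] (reduce + (range 100000)))

(sample-ms work 200)                  ; warm-up runs, results discarded
(def samples (sample-ms work 30))
(def summary {:mean-ms (mean samples) :std-dev-ms (std-dev samples)})
```

Two candidates whose means differ by less than their standard deviations should not be declared different on this evidence alone.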

bronsa14:07:51

@tcrayford: I actually used criterium and a bare clojure repl (no lein)

rauh14:07:15

@meow: You're not comparing the same things here. The second version is missing the number "2", which short-circuits your (our) sifting by filtering out all even numbers

tcrayford14:07:04

@bronsa: hat tip to you then :)

bronsa14:07:44

@rauh: ah there we go :)

rauh14:07:44

You happen to get the exact same count even though the collection is different. It's because you'll get a "4" in your collection instead of the "2"

bronsa14:07:21

changing (range 3 ..) with (range 2 ..) in the second example shows indeed that reducing a Range is faster than the (cons ..) case
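The effect rauh describes can be reproduced with a small stand-in sieve. `naive-sift` below is a hypothetical transducer written for illustration; it is not the `posmod-sift` from the wiki page, only a simple version of the same idea (keep a number unless an already-kept number divides it).

```clojure
(defn naive-sift
  "Hypothetical stand-in for posmod-sift: a stateful transducer that keeps
  a number only if no previously kept number divides it."
  []
  (fn [rf]
    (let [kept (volatile! [])]
      (fn
        ([] (rf))
        ([result] (rf result))
        ([result n]
         (if (some #(zero? (mod n %)) @kept)
           result
           (do (vswap! kept conj n)
               (rf result n))))))))

(def with-2    (transduce (naive-sift) conj (cons 2 (range 3 20))))
(def without-2 (transduce (naive-sift) conj (range 3 20)))
;; with-2    => [2 3 5 7 11 13 17 19]
;; without-2 => [3 4 5 7 11 13 17 19]
```

The counts match, but without the 2 every even number has to be tested against more candidates before it is rejected, so the two runs are not doing the same amount of work.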

scriptor14:07:02

bronsa: what numbers are you getting?

tcrayford15:07:51

@bronsa: be sure to turn on the jvm's inline printing stuff and inspect the assembly to make sure it isn't the JIT being weird af 😉

meow15:07:48

ah, yes, thank you all

meow15:07:42

I knew that the functioning was going to be different because of the 4 instead of the 2 but didn't think through that the filtering would be slower/faster because of that as well

meow15:07:23

anyone want to benchmark this version? Seems hella fast to me:

(defn chan-of-primes []
  (let [primes (chan 1 (posmod-sift))]
    (onto-chan primes (drop 2 (range)))
    primes))

meow15:07:03

Actually, only slightly faster. And, yes, I do realize that the timings I'm running are not real benchmarks but at this point I'm just trying to get a feel for the rough differences in speed of these different approaches.

meow15:07:41

I don't think the onto-chan looping is optimal for this situation so I might write my own.

meow15:07:01

I think we've almost found the bottom of this rabbit hole...

tcrayford15:07:46

@meow: I've seen time be off by factors of 20k or more (like TWENTY THOUSAND TIMES FASTER/SLOWER). Doing rough timings that can be off by TWENTY THOUSAND TIMES seems pointless to me 😉

tcrayford15:07:49

depends on your code though 😉

dottedmag15:07:41

What is the reason for having both symbols and keywords?

Lambda/Sierra15:07:50

@dottedmag: Symbols are automatically evaluated by the compiler and usually resolved into something else, either a function or a value. Keywords are just constant values, they "evaluate" to themselves.

Lambda/Sierra15:07:46

Keywords in Clojure usually fill the role of static constants or enums in other languages.

Lambda/Sierra15:07:30

Ruby's "Symbol" type is actually more like Clojure's "Keyword"
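A minimal illustration of the difference described above; the var name `answer` is made up for the example.

```clojure
(def answer 42)

;; A symbol is resolved when evaluated: here to the value of the var.
(def from-symbol answer)               ; 42

;; Quoting prevents evaluation and leaves the symbol itself.
(def quoted 'answer)
(symbol? quoted)                       ; true

;; A keyword evaluates to itself, with no lookup involved.
(def kw :answer)
(keyword? kw)                          ; true
(= :answer (keyword "answer"))         ; true
```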

mukeshsoni15:07:00

how does

(:name {:name "xyz"})
;; => "xyz"
work behind the scenes?

Lambda/Sierra15:07:06

@mukeshsoni: Clojure's Keyword type implements the IFn interface for functions, where it is defined to call get on the map argument with the keyword as the argument.
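In REPL terms, calling a keyword on a map behaves like the corresponding `get` call. The `person` map is an illustrative name only.

```clojure
(def person {:name "xyz"})

(ifn? :name)                          ; true: keywords can be called like functions
(:name person)                        ; "xyz"
(get person :name)                    ; "xyz", the equivalent explicit lookup
(:age person)                         ; nil when the key is absent
(:age person 0)                       ; 0, using the optional not-found value
(map :name [{:name "a"} {:name "b"}]) ; ("a" "b")
```

Because a keyword is a function, it can be passed directly to higher-order functions such as `map`, as in the last line.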

dottedmag15:07:07

@stuartsierra: Namespaces. Got it.

meow15:07:04

@tcrayford: oh, that's good to know, thanks. Do you recommend criterium or something else?

tcrayford15:07:37

criterium is good enough for most stuff, and there aren't any alternatives that are remotely easy to use from clojure right now

tcrayford15:07:30

@meow: just make sure you restart your repl after running criterium once, and run it with production settings (aka don't use lein repl without changing :jvm-opts)

meow15:07:27

At this point I've got three variations that need genuine benchmarking if we want to know which performs better:

(defn chan-of-primes-pipe []
  (let [ints   (chan-of-ints 2)
        sieve  (posmod-sift)
        primes (chan 1 sieve)]
    (pipe ints primes)
    primes))

(defn chan-of-primes-onto []
  (let [primes (chan 1 (posmod-sift))]
    (onto-chan primes (drop 2 (range)))
    primes))

(defn chan-of-primes-loop []
  (let [ints   (drop 2 (range))
        primes (chan 1 (posmod-sift))]
    (go-loop [vs ints]
      (when (>! primes (first vs))
        (recur (rest vs))))
    primes))

meow15:07:49

I should make those more consistent...

meow16:07:56

Ok, I cleaned those up and added them to the bottom of the wiki page. If anyone wants to help drive this forward by doing some benchmarking or suggest where to go next, it would be greatly appreciated. https://github.com/clojure/core.async/wiki/Sieve-of-Eratosthenes

hlship16:07:14

By the way, the project I was working on at Aviso has shut down (for business, not technical, reasons). I'm on the hunt for a new Clojure gig. I'm located in Portland, OR and would prefer to work with a team locally, or at least, in similar time zones.

meow16:07:43

@alexmiller: now that I'm all jazzed about transducers I have this itch to be able to do this:

(into (chan) xform (range 100))
Any chance of seeing that supported in the future?

tcrayford16:07:44

@hlship: what's gonna happen to the stuff that was under the aviso namespace? pretty etc?

hlship16:07:45

So leiningen can express dependencies that include a classifier ... but I don't see anything about publishing an artifact that includes a classifier.

hlship16:07:27

All the Aviso stuff will continue to exist and I'll continue to extend and improve it, as will others.

hlship16:07:54

I doubt we'll get to a point where there will be conflicts between Aviso's internal needs and those of external users.

hlship16:07:29

Except for one thing: I need to keep them compatible with Clojure 1.6 for the meantime (this is an evolving requirement over the last couple of hours). Thus my desire (above) to publish versions with classifiers.

hlship16:07:28

I may also be able to spin out some more code from the internal project to open source over the next month.

Frank Henard18:07:57

Hello folks, I am using upstart to run my clojure (embedded) jetty app with java -jar myapp.jar. When I make a code change, rebuild the jar, and then copy it to the location that upstart refers to, it reflects the changes I made without having to restart upstart. Is this safe to do while the app is under load? Is there a better deployment strategy you would recommend? Should I not use embedded jetty?

arohner19:07:59

@ballpark: I’m not sure I believe your explanation. You’re copying in a new jar, without restarting the JVM? and seeing the changes? Are you using a repl or anything?

arohner19:07:48

But to answer your real question, embedded jetty is fine. As far as deployment, that depends largely on the maturity and load of the project. At some point you’ll probably want a load balancer

Frank Henard19:07:31

I'm not using a repl. Perhaps java did restart. I guess I was concerned about replacing the .jar file while java is running

arohner19:07:34

so you can have “old” code connected to the load balancer. start up new code, connect it, and then remove “old” from the load balancer. That guarantees no downtime

Frank Henard19:07:16

I was planning to put nginx in front of my app

arohner19:07:21

also depending on maturity, you’ll probably want something like nginx in front

Frank Henard19:07:32

:) So it sounds like you're saying that replacing the .jar file while it is running won't hurt anything, right @arohner?

arohner19:07:10

@ballpark: I’d be surprised if it harmed anything, but I wouldn’t recommend it

arohner19:07:15

it’s nice to name your jars with a version or timestamp, so you shouldn’t be generating two versions of the code w/ same filename

Frank Henard19:07:18

Do you recommend any load balancers? Maybe deploying as a war to an installed jetty server would be better?

Frank Henard19:07:45

s/better/simpler

Lambda/Sierra19:07:50

You cannot reliably replace a JAR file while a Java application is running. I've seen apps break this way when they try to load classes whose names have changed.

Lambda/Sierra19:07:02

Java application servers such as Tomcat have special features to hot-swap JARs.

Frank Henard19:07:20

Hi @stuartsierra. I'm actually hoping to stay with embedded at this point because I'm using component.

Lambda/Sierra19:07:17

Then you will need to restart the JVM every time you want to deploy a new JAR.

arohner19:07:19

@ballpark: load balancers depend on where you’re deploying, and familiarity. I use AWS heavily, so ELB is a natural fit

Frank Henard19:07:27

@stuartsierra: and change my upstart script to reference the new jar every time?

Lambda/Sierra19:07:29

@ballpark: I don't know anything about 'upstart', so I don't know what the best procedure might be.

Frank Henard19:07:04

ok. It's ubuntu's service management facility. So I can do service my-app restart or service postgresql restart

Lambda/Sierra19:07:45

@ballpark: If you want to reuse the same file name, you would need to stop the app first, copy the new JAR, then start it again.

Frank Henard19:07:43

got it. Thanks, that's probably the easiest way forward.

gtrak21:07:10

if I were to want to write a REPL for some custom DSL, where could I start?

gtrak21:07:21

it maps to EDN in the end, but I want like context-sensitive completions and such