This page is not created by, affiliated with, or supported by Slack Technologies, Inc.
2018-09-21
Channels
- # 100-days-of-code (6)
- # aleph (26)
- # beginners (129)
- # boot (5)
- # calva (3)
- # cider (5)
- # cljs-dev (16)
- # cljsrn (4)
- # clojure (204)
- # clojure-dev (36)
- # clojure-italy (23)
- # clojure-nl (4)
- # clojure-spec (221)
- # clojure-uk (60)
- # clojurescript (68)
- # datomic (47)
- # emacs (4)
- # figwheel-main (50)
- # fulcro (29)
- # graphql (10)
- # hyperfiddle (19)
- # lein-figwheel (3)
- # leiningen (20)
- # liberator (3)
- # off-topic (89)
- # onyx (15)
- # pedestal (1)
- # portkey (2)
- # re-frame (3)
- # reagent (6)
- # ring-swagger (1)
- # rum (12)
- # shadow-cljs (10)
- # uncomplicate (4)
- # vim (5)
is there any way to make the transducer supplied to `into` operate on 2 elements (the accumulated `to` and the next element of `from`) instead of just the next element of `from`?
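For what it's worth, a transducer's reducing step already receives the accumulated result as its first argument; `into` just builds the result with a transient under the hood, so inspecting the accumulator there isn't reliable. With `transduce` and a plain `conj` it can work. A sketch (the `dedupe-against-acc` name is made up):

```clojure
(defn dedupe-against-acc
  "Hypothetical transducer whose step inspects the accumulated result
   as well as the next input. Only sensible when the accumulator is a
   seqable persistent collection."
  []
  (fn [rf]
    (fn
      ([] (rf))
      ([result] (rf result))
      ([result input]
       ;; skip inputs already present in the accumulated result
       (if (some #{input} result)
         result
         (rf result input))))))

(transduce (dedupe-against-acc) conj [] [1 2 2 3 1])
;; => [1 2 3]
```

Note this would break inside `into`, whose intermediate accumulator is a transient vector and thus not seqable.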
okay, next round of "core functions jeopardy". For $500:
I am a function that takes a reference function and a default value, and returns a function that returns the default value whenever calling the reference function with the given arguments yields nil.
so, some function like `(fn [f default] (fn [& args] (or (apply f args) default)))`
it's like `fnil`, but it uses a default output if the output is nil, whereas `fnil` uses a default input if the input is nil
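Spelled out as a runnable sketch (the `fnil-out` name is made up; it is not a core function):

```clojure
(defn fnil-out
  "Like fnil, but for the return value: wraps f so that a nil
   result is replaced by default."
  [f default]
  (fn [& args]
    (or (apply f args) default)))

((fnil-out :a 0) {:b 1}) ;; :a is missing, so we get the default => 0
((fnil-out :a 0) {:a 1}) ;; :a is present => 1
```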
I am trying to use `pmap`, but I've hit a snag. If an item takes very long to process, `pmap` stops processing shorter items that follow it
it processes just a couple of items after the long item, then it stops doing work until that long item is completed
What does the 'A' stand for in the examples for protocols?
if it’s these (https://www.clojure.org/reference/protocols) I think it’s literally just “A protocol” and “A type”
@roklenarcic Yes, this is a property of how pmap is implemented.
If you want something like pmap that can achieve a desired level of parallelism, no matter how long some items take to finish relative to others, you will need something else.
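As one sketch of that "something else": a fixed-size executor pool gives bounded parallelism where a slow item only occupies one worker while the rest keep draining the collection (the `pmap-bounded` name and shape are made up for illustration):

```clojure
(import '(java.util.concurrent Executors ExecutorService))

(defn pmap-bounded
  "Maps f over coll on a fixed pool of n threads and returns the
   results in order. Unlike pmap, a long-running item doesn't stall
   the pipeline; it just ties up one worker."
  [n f coll]
  (let [^ExecutorService pool (Executors/newFixedThreadPool n)
        ;; submit everything up front; each future runs independently
        futs (doall (map (fn [x] (.submit pool ^Callable (fn [] (f x)))) coll))]
    (try
      (mapv (fn [fut] (.get fut)) futs)
      (finally (.shutdown pool)))))

(pmap-bounded 4 inc [1 2 3 4 5]) ;; => [2 3 4 5 6]
```

Libraries like claypoole (mentioned below) offer more polished versions of the same idea.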
is there a clojure test runner that: prints diffs AND prints the `ex-data` of `ex-info`s?
as of Clojure 1.10.0-alpha8, `clojure.test` does
well, prints `ex-data`
@roklenarcic it may be the interaction between pmap and chunked seqs (ah, nm, I see Andy answered)
@alexmiller thanks Alex!
@alexmiller the diffing part of my question was already solved by using pjstadig/humane-test-output
yes I figured out that as well
When writing macros, the macro must return a single form, right? So when I emit 10 `defn`s I need to wrap it in a `do`?
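Yes, a macro expands to exactly one form, so multiple top-level definitions get wrapped in a `do`. A minimal sketch (the `defn-all` macro is invented for illustration):

```clojure
(defmacro defn-all
  "Hypothetical macro emitting one defn per symbol in names,
   wrapped in a single do, since a macro must return one form."
  [names]
  `(do
     ~@(for [n names]
         `(defn ~n [] ~(str n)))))

;; defines foo and bar, each returning its own name as a string
(defn-all [foo bar])
(foo) ;; => "foo"
(bar) ;; => "bar"
```

Top-level `do` forms are treated specially by the compiler: each child is compiled as if it were itself top-level, which is why this idiom works for emitting several `defn`s.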
I'm not seeing classes compiled as I expect when running:
clojure -A:aot:prod \
-e \
'(binding [*compile-path* (System/getenv "build_data")]
(compile (symbol (System/getProperty "main"))))'
The namespace itself is compiled, but none of its dependencies are being AOT'd. Unfortunately this fails when I run my jar because:
Exception in thread "main" java.lang.NoClassDefFoundError: edge/system$system_config
So I assume that the main's dependencies must also be AOT'd, but that isn't done for me by just running compile?
@ghadi I have tried both on & off. I did on by injecting the additional path with `-Sdeps`:
# AOT compile the application
clojure -A:aot:prod:xxx \
    -Sdeps "{:aliases {:xxx {:extra-paths [\"${build_data}\"]}}}" \
-e \
'(binding [*compile-path* (System/getenv "build_data")]
(compile (symbol (System/getProperty "main"))))'
it needs to be on -- put the compile target inside the `:aot` alias under extra-paths if you can
❯ clj -A:xxx -Sdeps '{:aliases {:xxx {:extra-deps {parallel {:mvn/version "0.6"}}}}}'
Downloading: parallel/parallel/0.6/parallel-0.6.pom from
Downloading: parallel/parallel/0.6/parallel-0.6.jar from
Clojure 1.9.0
user=> (require 'parallel.core)
nil
user=>
@ghadi the classpath started `prod:/tmp/tmp.WABU1KabYg:src:sass:resources:bin:` based on `System/getProperty`. The temp directory looks right to me. I could perhaps verify further by using io/resource inside eval.
Yes, but not with the class file. Only with true dependencies.
edge.system/new-system is loaded via :require
The temp directory is on the classpath, and an earlier inserted file is available: res? #object[java.net.URL 0x14ef2482 file:/tmp/tmp.Iz8UbhcfuK/public/doc.js]
❯ jar tf main.jar | grep 'edge.*class$'
edge/system__init.class
edge/system$system_config.class
edge/system$config.class
edge/system$fn__2168.class
edge/system$fn__2166.class
edge/system$loading__6434__auto____2164.class
It's there.
then do a couple `-e` to require `edge.system` and check that the system-config is reachable
The problem is that `edge/system$system_config.class` doesn't get generated when AOT'ing, no?
:thinking_face: very strongly possible. The main was named `user` by someone, for other purposes of their own.
fwiw, I can require `edge.system` from clojure.main when my main is set to clojure.main.
so what exactly was the issue? user.clj required edge.system before compilation occurred?
Well, clojure.main requires user.clj as part of startup. So I guess that would load in edge.system.
I think compile skips ns'es that are already loaded (not 100% sure of that, also verifiable)
It doesn't entirely make sense to me as why it would do that. Seems like one for #clojure-dev
I know why, I remember reading the source. It uses require under the hood, which caches and doesn't call the dynamic variable that it otherwise would.
@roklenarcic Sorry for not adding this earlier, but there is a fairly old Clojure library that despite its age probably still works just fine at providing a "bounded supervised thread pool". It is a fairly thin wrapper around Java's Executors library: https://github.com/amitrathore/medusa The link in the README has expired, but http://archive.org still has it here: https://web.archive.org/web/20140623162704/http://s-expressions.com:80/2010/06/08/medusa-0-1-a-supervised-thread-pool-for-clojure-futures-2/ I am probably not filling you with confidence with this reference 🙂
Wow, deep cut
Doesn’t claypoole have stuff too?
I’ve been trying to find some reference on the overhead of clojure’s data structures, but I’m coming up short. I’m trying to do some back of the envelope calculations of whether an in-memory cache would be viable or not.
https://github.com/clojure-goes-fast/clj-memory-meter maybe useful?
Yes I’ve used that and got completely unexpected results so that’s why I’m trying to guess now :)
Btw during this dive I found this: https://blog.acolyer.org/2015/11/27/hamt/
Very nice performance claims but I’m unsure of the viability. I guess it’s already known since it’s quite old.
the open question on that is whether its a false comparison due to different hashcode calculations
It is indeed a false comparison. I have the original code updated to use Clojure hasheq.
https://github.com/cgrand/confluent-map/blob/master/src/main/java/net/cgrand/TrieMap_5Bits.java
Is HAMT what Bodil is talking about in https://youtu.be/cUx2b_FO8EQ ?
yeah this is an improvement to HAMTs. @cgrand has done some significant experimentation
She claims logarithmic complexity for concatenation and other stuff that's linear time in Clojure though.
as far as I know with clojure it's a log base 32 and the claim is it's "effectively linear"
linear is O(n) which is way worse than log 32. Might you be thinking of the log 32 time for lookups and the claim that this is effectively constant time?
Blah, you are right, I had that scrambled, thanks for the correction
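A quick sketch of why log base 32 lookups get called "effectively constant": the depth of a 32-ary trie grows extremely slowly with n (the `log32` helper here is just for illustration):

```clojure
;; depth of a 32-way trie holding n elements is about log base 32 of n
(defn log32 [n]
  (/ (Math/log n) (Math/log 32)))

(log32 1e6) ;; ≈ 4  -- a million elements, ~4 levels deep
(log32 1e9) ;; ≈ 6  -- a billion elements, still only ~6 levels
```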
So the claim is:
                  PersistentVector   RRB
Lookup            O(log_k n)         O(log_k n)
Push/pop (back)   O(1) amortised     O(log_k n)
Push/pop (front)  O(n)               O(log_k n)
Concatenation     O(n)               O(log_k n)
Split             O(n)               O(log_k n)
@henrik as for HAMT in clojure, iirc hashmaps definitely use them, and what she's talking about in that video with regard to PersistentVector is a variant of the same technique applies to vector indices. afaik the naming isn't standardized but the article(s) she's referencing at 17:30 called them bit-partitioned vector tries. It's a great read if you haven't seen it https://hypirion.com/musings/understanding-persistent-vector-pt-1
To me they are closer to persistent data structures by Chris Okasaki https://www.cs.cmu.edu/~rwh/theses/okasaki.pdf especially binary random access lists but extended to base 32.
So, slicing is actually O(1) for PersistentVector when using `subvec`, with the tradeoff of preventing garbage collection of the original.
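That tradeoff in a small sketch: the slice is just a window over the original, so copying it out is what actually detaches it (the var names here are made up):

```clojure
;; subvec is O(1): it wraps the original with index offsets, so the
;; whole source vector stays reachable as long as the slice does.
(def big (vec (range 1000000)))
(def small (subvec big 0 3))   ;; => [0 1 2], but still references big

;; copying into a fresh vector detaches it, letting big be collected
(def detached (into [] small)) ;; => [0 1 2], independent of big
```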
this rrb stuff is new to me.. guess I know what I'll be digging into on my commute on monday
Bagwell talked about RRB trees at the clojure conj in 2011 https://youtu.be/K2NYwP90bNs?t=1515
the fact that their `subvec`/`catvec` operations can be applied to the core PersistentVector is really cool. It mentions that regular `subvec` prevents gc by wrapping the underlying vector but `rrb/subvec` doesn't, which is odd because it also says it works on a PersistentVector by reusing its internal structure
ah, "The resulting vector shares structure with the original, but does not hold on to any elements of the original vector lying outside the given index range."
@jesse.wertheim But those operations would still be O(n) for PersistentVector, I think.
even if the perf doesn't always win, good to know about in cases where memory leaks might be an issue. more to learn about, at least. thanks for bringing it up! I love nerding out about this stuff
As RRBs have more even performance across operations, it intuitively seems like a better default, with PersistentVector more suitable for edge cases than the other way around.
another consideration is whether there's a similar improvement in transients, since those are likely to come into play during intensive data ops where performance is important
hrm though actually none of the operations they're talking about would really apply to transients I guess. nevermind!
yeah. it's probably the same temporary mutable wrapper around the respective structures
I had no idea this was shipped in core! Even if not the default, I wonder why it doesn't show up in the online api docs
the core.rrb-vector contrib library docs are at http://clojure.github.io/core.rrb-vector/
to understand if they abandon "the entire trie thing" you don't actually have to look at the implementation, just the cost of transforming to and from a transient, which is O(1)
user=> (doc persistent!)
-------------------------
clojure.core/persistent!
([coll])
Returns a new, persistent version of the transient collection, in
constant time. The transient collection cannot be used after this
call, any such use will throw an exception.
nil
user=>
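A tiny illustration of that constant-time round trip:

```clojure
;; transient and persistent! are both O(1); the structural work
;; happens in the batched conj! calls in between
(let [tv (transient [1 2 3])]
  (persistent! (conj! tv 4)))
;; => [1 2 3 4]
```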
(cc @henrik) yeah, looks like it's just copying the 32-element head and tail nodes to/from mutable versions used in the transient. makes sense. https://github.com/clojure/clojure/blob/master/src/jvm/clojure/lang/PersistentVector.java#L534 https://github.com/clojure/clojure/blob/master/src/jvm/clojure/lang/PersistentVector.java#L573-L577
I believe Tarjan's real-time catenable deques are even faster (in terms of big-Oh) since they are O(1) for pushing and poping both ends and for catenation, but they are not very cache friendly
Tarjan defines the deques as simple real-time deques and then catenable deques which are layered on top (I think it's been a while since I looked). I implemented the simple real-time deques in clojure https://github.com/pjstadig/deque-clojure/
So I’m trying to measure the overhead of some Clojure data structures, and `clj-memory-meter` gives me weird results:
user=> (first ideas)
{:title "0 This is an average title of average length etc etc", :description "0 This is a longer description<snip>", :id #uuid "25f3bb7e-e494-4a5b-9085-841cc634f079"}
user=> (mm/measure ideas :bytes true)
4575704
user=> (def ideas-map (into {} (map (juxt :id identity) ideas)))
#'user/ideas-map
user=> (mm/measure ideas-map :bytes true)
4620768
user=> (def content-length (reduce + (map #(mm/measure % :bytes true) ideas)))
#'user/content-length
user=> content-length
5071840
user=> (- (mm/measure ideas-map :bytes true) content-length)
-451072
I.e., if the jamm thing that clj-memory-meter is using knows that it’s seen a reference to `:title` before, then it only counts the object pointer, whereas the reduce will count the contents of the keyword instance 1000 times.
it is hard to measure object size on the jvm, and really hard to measure object sizes in the presence of structural sharing
newer jvms also do things like represent utf16 strings as utf8 if they can, and intern strings
My quest here is to understand the overhead of Clojure’s data structures, I don’t care about the Strings at all.
I am just saying, I dunno how sophisticated the jmm tool is, but the tricks done by both the datastructures and the runtime make getting a definitive answer analytically really hard
so it is not super surprising that some heuristics are used here and there in jmm, and so its answers are not consistent
With the assumption that jmm knows not to double count references to the same object and knows about interned strings, I can say that the overhead for a persistent map is roughly along the lines of 50-60 bytes per key/value pair, when storing 1000 of them.
keywords don’t use interned strings
(anymore)
ok, but then they are exactly the sort of string that the gc's background string interner would intern
and from my limited experiments when they released that, it works great
but I don’t know over what time span you would see that impact
yes, but I think it is “now” (Java 9+) that it’s the default, that is
http://openjdk.java.net/projects/code-tools/jol/ is a thing that might be real useful
I’m in Java 11 in the terminal I’m in now and it seems like string deduplication is off by default
java -XX:+UnlockDiagnosticVMOptions -XX:+PrintFlagsFinal -version | grep -i 'duplicat'
Regarding the core.rrb-vector library, I noted recently there are several issues/bugs filed against it that may be real problems. I haven't dug into it. Before relying on them for something real in production, I'd want to investigate them a bit and/or beef up its test cases.
Regarding memory space used by Clojure data structures, does anyone know if the lib https://github.com/clojure-goes-fast/clj-memory-meter linked earlier has a feature like "here are two data structures d1 and d2. Tell me in the tree of those objects and everything they link to, how many bytes are unique to d1, how many are unique to d2, and how many are shared?" That would be a nifty feature to have when analyzing Clojure persistent data structures, sometimes.
I had experimented quite a while back with some code leading in that direction, but nothing polished.
@orestis from what I know about the implementation, one part of the "it depends" is that it depends on key length for hashmaps and the bit-length for vector indices, since the keys and indices are split into (5-bit, I think) chunks. structural sharing complicates that further, though
@alexmiller I've seen you mention "deep cut" a couple of times recently now. Do you mean this meaning from Urban Dictionary, except applied to Clojure? https://www.urbandictionary.com/define.php?term=Deep%20Cut If so, then I like it, and yes, that was a deep cut. Sometimes that is the only answer I know to a question because I've missed something more current or widely used.
Also btw, Urban Dictionary uses Clojure :)
they have oss clojure on github
@jesse.wertheim Oh yeah, I should have tested the PersistentHashMap instead; I’m now testing PersistentArrayMap with only 3 keys in there. I’m mainly looking for a rule-of-thumb here when loading 1000s of different docs from a database — where there is no structural sharing going on.
I find when dealing with any kind of JIT engine you're usually better off testing this stuff in context so things don't get muddled by hyper-optimization of trivial test cases. I'm more familiar with seeing that happen in v8 than the JVM though
Thanks. I guess I shouldn’t be too worried about memory overhead too much. The UTF-16 default encoding will probably dominate everything that comes from a database.
has anyone ever imported a multimethod from another namespace to use `defmethod` and then at runtime gotten "no method in multimethod"...
I have `(require ... :refer [my-multimethod])`
you need to require a multimethod from the namespace that creates it, not the one extending it
i create the multimethod `foo` in namespace `x`
and I require `foo` from `x` in `y` to use `defmethod`
and then in `z` I require `foo` from `x` again to call it
@ccann if you require the ns that creates the multi, and the one extending it, you can call it based on the one that created it
in order to use the extension defined in y, yeah
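A one-file sketch of that x/y/z arrangement. In real code these would be three files using `ns` with `:require`; `in-ns`/`refer` are used here only so it runs in one go, and the namespace names are hypothetical:

```clojure
;; x: defines the multimethod
(ns x)
(defmulti foo :type)

;; y: extends it -- refers foo from x, then registers a method
(in-ns 'y)
(clojure.core/refer 'clojure.core)
(clojure.core/refer 'x :only '[foo])
(defmethod foo :a [_] "handled a")

;; z: calls it -- gets foo from x, but y must have been loaded
;; somewhere too, or the :a method would never have been registered
(in-ns 'z)
(clojure.core/refer 'clojure.core)
(clojure.core/refer 'x :only '[foo])
(foo {:type :a}) ;; => "handled a"
```

With real files, forgetting to require `y` from anywhere is exactly what produces the "no method in multimethod" error at runtime.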