This page is not created by, affiliated with, or supported by Slack Technologies, Inc.
2018-09-21
Channels
- # 100-days-of-code (6)
- # aleph (26)
- # beginners (129)
- # boot (5)
- # calva (3)
- # cider (5)
- # cljs-dev (16)
- # cljsrn (4)
- # clojure (204)
- # clojure-dev (36)
- # clojure-italy (23)
- # clojure-nl (4)
- # clojure-spec (221)
- # clojure-uk (60)
- # clojurescript (68)
- # datomic (47)
- # emacs (4)
- # figwheel-main (50)
- # fulcro (29)
- # graphql (10)
- # hyperfiddle (19)
- # lein-figwheel (3)
- # leiningen (20)
- # liberator (3)
- # off-topic (89)
- # onyx (15)
- # pedestal (1)
- # portkey (2)
- # re-frame (3)
- # reagent (6)
- # ring-swagger (1)
- # rum (12)
- # shadow-cljs (10)
- # uncomplicate (4)
- # vim (5)
is there any way to make the transducer supplied to `into` operate on 2 elements (the accumulated `to` and the next element of `from`) instead of just the next element of `from`?
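For what it's worth, a transducer's reducing step already receives the accumulated result as its first argument; `into` just builds the result with a transient under the hood, so inspecting the accumulator there isn't reliable. With `transduce` and a plain `conj` it can work. A sketch (the `dedupe-against-acc` name is made up):

```clojure
(defn dedupe-against-acc
  "Hypothetical transducer whose step inspects the accumulated result
   as well as the next input. Only sensible when the accumulator is a
   seqable persistent collection."
  []
  (fn [rf]
    (fn
      ([] (rf))
      ([result] (rf result))
      ([result input]
       ;; skip inputs already present in the accumulated result
       (if (some #{input} result)
         result
         (rf result input))))))

(transduce (dedupe-against-acc) conj [] [1 2 2 3 1])
;; => [1 2 3]
```

Note this would break inside `into`, whose intermediate accumulator is a transient vector and thus not seqable.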
okay, next round of "core functions jeopardy". For $500:
I am a function that takes a reference function and a default value, and returns a function that returns the default value whenever calling the reference function with the given arguments yields nil.
so, some function like `(fn [f default] (fn [& args] (or (apply f args) default)))`
it's like `fnil`, but it uses a default output if the output is nil, whereas `fnil` uses a default input if the input is nil
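Spelled out as a runnable sketch (the `fnil-out` name is made up; it is not a core function):

```clojure
(defn fnil-out
  "Like fnil, but for the return value: wraps f so that a nil
   result is replaced by default."
  [f default]
  (fn [& args]
    (or (apply f args) default)))

((fnil-out :a 0) {:b 1}) ;; :a is missing, so we get the default => 0
((fnil-out :a 0) {:a 1}) ;; :a is present => 1
```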
I am trying to use `pmap`, but I've hit a snag. If an item takes very long to process, `pmap` stops processing shorter items that follow it
it processes just a couple of items after the long item, then it stops doing work until that long item is completed
What does the 'A' stand for in the examples for protocols?
if it’s these (https://www.clojure.org/reference/protocols) I think it’s literally just “A protocol” and “A type”
@roklenarcic Yes, this is a property of how pmap is implemented.
If you want something like pmap that can achieve a desired level of parallelism, no matter how long some items take to finish relative to others, you will need something else.
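As one sketch of that "something else": a fixed-size executor pool gives bounded parallelism where a slow item only occupies one worker while the rest keep draining the collection (the `pmap-bounded` name and shape are made up for illustration):

```clojure
(import '(java.util.concurrent Executors ExecutorService))

(defn pmap-bounded
  "Maps f over coll on a fixed pool of n threads and returns the
   results in order. Unlike pmap, a long-running item doesn't stall
   the pipeline; it just ties up one worker."
  [n f coll]
  (let [^ExecutorService pool (Executors/newFixedThreadPool n)
        ;; submit everything up front; each future runs independently
        futs (doall (map (fn [x] (.submit pool ^Callable (fn [] (f x)))) coll))]
    (try
      (mapv (fn [fut] (.get fut)) futs)
      (finally (.shutdown pool)))))

(pmap-bounded 4 inc [1 2 3 4 5]) ;; => [2 3 4 5 6]
```

Libraries like claypoole (mentioned below) offer more polished versions of the same idea.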
is there a clojure test runner that: prints diffs AND prints the `ex-data` of `ex-info`s?
as of Clojure 1.10.0-alpha8, `clojure.test` does
well, prints `ex-data`
@roklenarcic it may be the interaction between pmap and chunked seqs (ah, nm, I see Andy answered)
@alexmiller thanks Alex!
@alexmiller the diffing part of my question was already solved by using pjstadig/humane-test-output
yes I figured out that as well
When writing macros, the macro must return a single form, right? So when I emit 10 `defn`s I need to wrap it in a `do`?
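Yes, a macro expands to exactly one form, so multiple top-level definitions get wrapped in a `do`. A minimal sketch (the `defn-all` macro is invented for illustration):

```clojure
(defmacro defn-all
  "Hypothetical macro emitting one defn per symbol in names,
   wrapped in a single do, since a macro must return one form."
  [names]
  `(do
     ~@(for [n names]
         `(defn ~n [] ~(str n)))))

;; defines foo and bar, each returning its own name as a string
(defn-all [foo bar])
(foo) ;; => "foo"
(bar) ;; => "bar"
```

Top-level `do` forms are treated specially by the compiler: each child is compiled as if it were itself top-level, which is why this idiom works for emitting several `defn`s.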
I'm not seeing classes compiled as I expect when running:
clojure -A:aot:prod \
-e \
'(binding [*compile-path* (System/getenv "build_data")]
(compile (symbol (System/getProperty "main"))))'
The namespace itself is compiled, but none of its dependencies are being AOT'd. Unfortunately this fails when I run my jar because:
Exception in thread "main" java.lang.NoClassDefFoundError: edge/system$system_config
So I assume that the main's dependencies must also be AOT'd, but that isn't done for me by just running compile?
@ghadi I have tried both on & off. I did on by injecting the additional path with `-Sdeps`:
# AOT compile the application
clojure -A:aot:prod:xxx \
    -Sdeps "{:aliases {:xxx {:extra-paths [\"${build_data}\"]}}}" \
-e \
'(binding [*compile-path* (System/getenv "build_data")]
(compile (symbol (System/getProperty "main"))))'
it needs to be on -- put the compile target inside the `:aot` alias under extra-paths if you can
❯ clj -A:xxx -Sdeps '{:aliases {:xxx {:extra-deps {parallel {:mvn/version "0.6"}}}}}'
Downloading: parallel/parallel/0.6/parallel-0.6.pom from
Downloading: parallel/parallel/0.6/parallel-0.6.jar from
Clojure 1.9.0
user=> (require 'parallel.core)
nil
user=>
@ghadi the classpath started `prod:/tmp/tmp.WABU1KabYg:src:sass:resources:bin:` based on `System/getProperty`. The temp directory looks right to me. I could perhaps verify further by using io/resource inside eval.
Yes, but not with the class file. Only with true dependencies.
edge.system/new-system is loaded via :require
The temp directory is on the classpath, and an earlier inserted file is available: res? #object[java.net.URL 0x14ef2482 file:/tmp/tmp.Iz8UbhcfuK/public/doc.js]
❯ jar tf main.jar | grep 'edge.*class$'
edge/system__init.class
edge/system$system_config.class
edge/system$config.class
edge/system$fn__2168.class
edge/system$fn__2166.class
edge/system$loading__6434__auto____2164.class
It's there.
then do a couple `-e` to require `edge.system` and check that the system-config is reachable
The problem is that `edge/system$system_config.class` doesn't get generated when AOT'ing, no?
:thinking_face: very strongly possible. The main was named `user` by someone, for other purposes of their own.
fwiw, I can require `edge.system` from clojure.main when my main is set to clojure.main.
so what exactly was the issue? user.clj required edge.system before compilation occurred?
Well, clojure.main requires user.clj as part of startup. So I guess that would load in edge.system.
I think compile skips ns'es that are already loaded (not 100% sure of that, also verifiable)
It doesn't entirely make sense to me as why it would do that. Seems like one for #clojure-dev
I know why, I remember reading the source. It uses require under the hood, which caches and doesn't call the dynamic variable that it otherwise would.
@roklenarcic Sorry for not adding this earlier, but there is a fairly old Clojure library that despite its age probably still works just fine at providing a "bounded supervised thread pool". It is a fairly thin wrapper around Java's Executors library: https://github.com/amitrathore/medusa The link in the README has expired, but http://archive.org still has it here: https://web.archive.org/web/20140623162704/http://s-expressions.com:80/2010/06/08/medusa-0-1-a-supervised-thread-pool-for-clojure-futures-2/ I am probably not filling you with confidence with this reference 🙂
Wow, deep cut
Doesn’t claypoole have stuff too?
I’ve been trying to find some reference on the overhead of clojure’s data structures, but I’m coming up short. I’m trying to do some back of the envelope calculations of whether an in-memory cache would be viable or not.
https://github.com/clojure-goes-fast/clj-memory-meter maybe useful?
Yes I’ve used that and got completely unexpected results so that’s why I’m trying to guess now :)
Btw during this dive I found this: https://blog.acolyer.org/2015/11/27/hamt/
Very nice performance claims but I’m unsure of the viability. I guess it’s already known since it’s quite old.
the open question on that is whether its a false comparison due to different hashcode calculations
It is indeed a false comparison. I have the original code updated to use Clojure hasheq.
https://github.com/cgrand/confluent-map/blob/master/src/main/java/net/cgrand/TrieMap_5Bits.java
Is HAMT what Bodil is talking about in https://youtu.be/cUx2b_FO8EQ ?
yeah this is an improvement to HAMTs. @cgrand has done some significant experimentation
She claims logarithmic complexity for concatenation and other stuff that's linear time in Clojure though.
as far as I know with clojure it's a log base 32 and the claim is it's "effectively linear"
linear is O(n) which is way worse than log 32. Might you be thinking of the log 32 time for lookups and the claim that this is effectively constant time?
Blah, you are right, I had that scrambled, thanks for the correction
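A quick sketch of why log base 32 lookups get called "effectively constant": the depth of a 32-ary trie grows extremely slowly with n (the `log32` helper here is just for illustration):

```clojure
;; depth of a 32-way trie holding n elements is about log base 32 of n
(defn log32 [n]
  (/ (Math/log n) (Math/log 32)))

(log32 1e6) ;; ≈ 4  -- a million elements, ~4 levels deep
(log32 1e9) ;; ≈ 6  -- a billion elements, still only ~6 levels
```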
So the claim is:
                  PersistentVector   RRB
Lookup            O(log_k n)         O(log_k n)
Push/pop (back)   O(1) amortised     O(log_k n)
Push/pop (front)  O(n)               O(log_k n)
Concatenation     O(n)               O(log_k n)
Split             O(n)               O(log_k n)
@henrik as for HAMT in clojure, iirc hashmaps definitely use them, and what she's talking about in that video with regard to PersistentVector is a variant of the same technique applies to vector indices. afaik the naming isn't standardized but the article(s) she's referencing at 17:30 called them bit-partitioned vector tries. It's a great read if you haven't seen it https://hypirion.com/musings/understanding-persistent-vector-pt-1
To me they are closer to persistent data structures by Chris Okasaki https://www.cs.cmu.edu/~rwh/theses/okasaki.pdf especially binary random access lists but extended to base 32.
So, slicing is actually O(1) for PersistentVector when using `subvec`, with the tradeoff of preventing garbage collection of the original.
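That tradeoff in a small sketch: the slice is just a window over the original, so copying it out is what actually detaches it (the var names here are made up):

```clojure
;; subvec is O(1): it wraps the original with index offsets, so the
;; whole source vector stays reachable as long as the slice does.
(def big (vec (range 1000000)))
(def small (subvec big 0 3))   ;; => [0 1 2], but still references big

;; copying into a fresh vector detaches it, letting big be collected
(def detached (into [] small)) ;; => [0 1 2], independent of big
```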
this rrb stuff is new to me.. guess I know what I'll be digging into on my commute on monday
Bagwell talked about RRB trees at the clojure conj in 2011 https://youtu.be/K2NYwP90bNs?t=1515
the fact that their `subvec`/`catvec` operations can be applied to the core PersistentVector is really cool. It mentions that regular `subvec` prevents gc by wrapping the underlying vector but `rrb/subvec` doesn't, which is odd because it also says it works on a PersistentVector by reusing its internal structure
ah, "The resulting vector shares structure with the original, but does not hold on to any elements of the original vector lying outside the given index range."
@jesse.wertheim But those operations would still be O(n) for PersistentVector, I think.
even if the perf doesn't always win, good to know about in cases where memory leaks might be an issue. more to learn about, at least. thanks for bringing it up! I love nerding out about this stuff
As RRBs have more even performance across operations, it intuitively seems like a better default, with PersistentVector more suitable for edge cases than the other way around.
another consideration is whether there's a similar improvement in transients, since those are likely to come into play during intensive data ops where performance is important
hrm though actually none of the operations they're talking about would really apply to transients I guess. nevermind!
yeah. it's probably the same temporary mutable wrapper around the respective structures
I had no idea this was shipped in core! Even if not the default, I wonder why it doesn't show up in the online api docs
the core.rrb-vector contrib library docs are at http://clojure.github.io/core.rrb-vector/
to understand if they abandon "the entire trie thing" you don't actually have to look at the implementation, just the cost of transforming to and from a transient, which is O(1)
user=> (doc persistent!)
-------------------------
clojure.core/persistent!
([coll])
Returns a new, persistent version of the transient collection, in
constant time. The transient collection cannot be used after this
call, any such use will throw an exception.
nil
user=>
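A tiny illustration of that constant-time round trip:

```clojure
;; transient and persistent! are both O(1); the structural work
;; happens in the batched conj! calls in between
(let [tv (transient [1 2 3])]
  (persistent! (conj! tv 4)))
;; => [1 2 3 4]
```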
(cc @henrik) yeah, looks like it's just copying the 32-element head and tail nodes to/from mutable versions used in the transient. makes sense. https://github.com/clojure/clojure/blob/master/src/jvm/clojure/lang/PersistentVector.java#L534 https://github.com/clojure/clojure/blob/master/src/jvm/clojure/lang/PersistentVector.java#L573-L577
I believe Tarjan's real-time catenable deques are even faster (in terms of big-Oh) since they are O(1) for pushing and poping both ends and for catenation, but they are not very cache friendly
Tarjan defines the deques as simple real-time deques and then catenable deques which are layered on top (I think it's been a while since I looked). I implemented the simple real-time deques in clojure https://github.com/pjstadig/deque-clojure/
So I’m trying to measure the overhead of some Clojure data structures, and `clj-memory-meter` gives me weird results:
user=> (first ideas)
{:title "0 This is an average title of average length etc etc", :description "0 This is a longer description<snip>", :id #uuid "25f3bb7e-e494-4a5b-9085-841cc634f079"}
user=> (mm/measure ideas :bytes true)
4575704
user=> (def ideas-map (into {} (map (juxt :id identity) ideas)))
#'user/ideas-map
user=> (mm/measure ideas-map :bytes true)
4620768
user=> (def content-length (reduce + (map #(mm/measure % :bytes true) ideas)))
#'user/content-length
user=> content-length
5071840
user=> (- (mm/measure ideas-map :bytes true) content-length)
-451072
I.e., if the jamm thing that clj-memory-meter is using knows that it’s seen a reference to `:title` before, then it only counts the object pointer, whereas the reduce will count the contents of the keyword instance 1000 times.
it is hard to measure object size on the jvm, and really hard to measure object sizes in the presence of structural sharing
newer jvms also do things like represent utf16 strings as utf8 if they can, and intern strings
My quest here is to understand the overhead of Clojure’s data structures, I don’t care about the Strings at all.
I am just saying, I dunno how sophisticated the jmm tool is, but the tricks done by both the datastructures and the runtime make getting a definitive answer analytically really hard
so it is not super surprising that some heuristics are used here and there in jmm, and so its answers are not consistent
With the assumption that jmm knows not to double count references to the same object and knows about interned strings, I can say that the overhead for a persistent map is roughly along the lines of 50-60 bytes per key/value pair, when storing 1000 of them.
keywords don’t use interned strings
(anymore)
ok, but then they are exactly the sort of string that the gc's background string interner would intern
and from my limited experiments when they released that, it works great
but I don’t know over what time span you would see that impact
yes, but I think it is “now” (Java 9+) that it’s the default, that is
http://openjdk.java.net/projects/code-tools/jol/ is a thing that might be real useful
I’m in Java 11 in the terminal I’m in now and it seems like string deduplication is off by default
java -XX:+UnlockDiagnosticVMOptions -XX:+PrintFlagsFinal -version | grep -i 'duplicat'
Regarding the core.rrb-vector library, I noted recently there are several issues/bugs filed against it that may be real problems. I haven't dug into it. Before relying on them for something real in production, I'd want to investigate them a bit and/or beef up its test cases.
Regarding memory space used by Clojure data structures, does anyone know if the lib https://github.com/clojure-goes-fast/clj-memory-meter linked earlier has a feature like "here are two data structures d1 and d2. Tell me in the tree of those objects and everything they link to, how many bytes are unique to d1, how many are unique to d2, and how many are shared?" That would be a nifty feature to have when analyzing Clojure persistent data structures, sometimes.
I had experimented quite a while back with some code leading in that direction, but nothing polished.
@orestis from what I know about the implementation, one part of the "it depends" is that it depends on key length for hashmaps and the bit-length for vector indices, since the keys and indices are split into (5-bit, I think) chunks. structural sharing complicates that further, though
@alexmiller I've seen you mention "deep cut" a couple of times recently now. Do you mean this meaning from Urban Dictionary, except applied to Clojure? https://www.urbandictionary.com/define.php?term=Deep%20Cut If so, then I like it, and yes, that was a deep cut. Sometimes that is the only answer I know to a question because I've missed something more current or widely used.
Also btw, Urban Dictionary uses Clojure :)
they have oss clojure on github
@jesse.wertheim Oh yeah, I should have tested the PersistentHashMap instead; I’m now testing PersistentArrayMap with only 3 keys in there. I’m mainly looking for a rule-of-thumb here when loading 1000s of different docs from a database — where there is no structural sharing going on.
I find when dealing with any kind of JIT engine you're usually better off testing this stuff in context so things don't get muddled by hyper-optimization of trivial test cases. I'm more familiar with seeing that happen in v8 than the JVM though
Thanks. I guess I shouldn’t be too worried about memory overhead too much. The UTF-16 default encoding will probably dominate everything that comes from a database.
has anyone ever imported a multimethod from another namespace to use `defmethod` and then at runtime gotten "no method in multimethod"...
I have `(require ... :refer [my-multimethod])`
you need to require a multimethod from the namespace that creates it, not the one extending it
i create the multimethod `foo` in namespace `x`
and I require `foo` from `x` in `y` to use `defmethod`
and then in `z` I require `foo` from `x` again to call it
@ccann if you require the ns that creates the multi, and the one extending it, you can call it based on the one that created it
in order to use the extension defined in y, yeah
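A one-file sketch of that x/y/z arrangement. In real code these would be three files using `ns` with `:require`; `in-ns`/`refer` are used here only so it runs in one go, and the namespace names are hypothetical:

```clojure
;; x: defines the multimethod
(ns x)
(defmulti foo :type)

;; y: extends it -- refers foo from x, then registers a method
(in-ns 'y)
(clojure.core/refer 'clojure.core)
(clojure.core/refer 'x :only '[foo])
(defmethod foo :a [_] "handled a")

;; z: calls it -- gets foo from x, but y must have been loaded
;; somewhere too, or the :a method would never have been registered
(in-ns 'z)
(clojure.core/refer 'clojure.core)
(clojure.core/refer 'x :only '[foo])
(foo {:type :a}) ;; => "handled a"
```

With real files, forgetting to require `y` from anywhere is exactly what produces the "no method in multimethod" error at runtime.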