2023-03-21
Hi! I want to create dtype-next arrays of random data, and get an index from 1 to N. The following works, but I suspect it's slower than it needs to be. Any advice on how I can speed it up? https://github.com/teodorlu/lab/tree/fc5ce9f6cbe19e33b536139f5e24a7f808a66a44/src/hypercube2.clj, https://github.clerk.garden/teodorlu/lab/commit/fc5ce9f6cbe19e33b536139f5e24a7f808a66a44/src/hypercube2.html.
(require '[tech.v3.datatype :as d])

(defn randoms2 [N]
  (d/->array (repeatedly N rand)))

(defn range2 [N]
  (d/->array (range N)))
Hi! This way, neither buffer consumes memory; both compute their values on the fly:
(require '[tech.v3.datatype :as d])

(def N 1000000000)

(defn randoms3 [N]
  (d/make-reader :float32 N (rand)))

(defn range3 [N]
  ;; alternatively: (d/make-reader :int32 N idx)
  (d/as-reader (range N)))
But note that it means you will get different random numbers each time you look into the result. If you want the random numbers to be cached in memory, you can apply d/clone to the result.
I think even with cloning, it is still more efficient than using repeatedly, because repeatedly creates a lazy sequence structure (which is probably garbage-collected after we turn it into an array).
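A minimal sketch of that caching behavior (not from the original thread), assuming the randoms3 reader defined above:
;; reading the lazy reader twice gives different values, since (rand) runs per read
(def r (randoms3 5))
(= (vec r) (vec r)) ;; => likely false

;; cloning realizes the reader into memory, so later reads are stable
(def cached (d/clone r))
(= (vec cached) (vec cached)) ;; => true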
amazing, thanks! 💯
Using your code directly gave me a 3x speedup. Further replacing (d/as-reader (range N)) with (d/make-reader :int32 N idx) gave me a 6x speedup.
https://github.com/teodorlu/lab/tree/9ae97af7b93471a898fc9d313ecdc9929964dfcc/src/hypercube3.clj / https://github.clerk.garden/teodorlu/lab/commit/9ae97af7b93471a898fc9d313ecdc9929964dfcc/src/hypercube3.html
Curiously, I'm getting very different performance results on my local machine with java 19 (left) and on clerk garden with java 17 (right). https://github.com/teodorlu/lab/tree/c3fdabd4af9d91e88af70c178dcd293f54a139c6/src/hypercube3.clj, https://github.clerk.garden/teodorlu/lab/commit/c3fdabd4af9d91e88af70c178dcd293f54a139c6/src/hypercube3.html
> Further replacing ... Interesting!
There are a few things going on here. First, your choice of (rand) - this calls https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/lang/Math.java#L892 which states it is properly synchronized, though my eyes could not find a mutex or synchronization mechanism. As long as we are talking about performance, what about calling https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/util/Random.java#L1096 which returns a stream of doubles, and then calling .toArray on that? If all you want is a double array and you have the latest dtype-next, then you also have ham-fisted. Ham-fisted's lazy-noncaching namespace contains a version of repeatedly that is about what Daniel typed. Finally, for absolute max performance creating a double array, you could pass ham-fisted's double-array pathway something that simply implements Counted and IReduceInit; that pathway expects the rfn to have the signature IFn$ODO - taking one double:
user> (require '[ham-fisted.api :as hamf])
nil
user> (import '[ham_fisted Transformables])
ham_fisted.Transformables
user> (defn rand-reducer [^long n]
        (reify
          clojure.lang.Counted
          (count [this] n)
          clojure.lang.IReduceInit
          (reduce [this rfn acc]
            (let [rfn (Transformables/toDoubleReductionFn rfn)
                  r   (java.util.Random.)]
              (loop [idx 0 acc acc]
                (if (< idx n)
                  (recur (unchecked-inc idx)
                         (.invokePrim rfn acc (.nextDouble r)))
                  acc))))))
#'user/rand-reducer
user> (hamf/double-array (rand-reducer 10))
[0.8509069788152502, 0.0507300247021899, 0.0637855191549086, 0.6670852005972568,
 0.8158715565985785, 0.43124503519348023, 0.32684548017855586, 0.2408809810402055,
 0.689446797741404, 0.6394673624012449]
If you want to generate more doubles than a certain number, you need to create a random number generator per thread using https://docs.oracle.com/javase/8/docs/api/java/util/SplittableRandom.html and if you want yet more performance for really large N, then using MKL's facilities via neanderthal should get you that. It is interesting to me to think about a computing model where all you have are count and reduce - there is no iteration mechanism exposed. Multi-arity map becomes impossible, so this model is very limited in that sense, but it is quite high performance.
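A rough sketch of that per-thread idea (my own illustration, not from the thread): split one root SplittableRandom into independent children, one per chunk of work:
(let [root (java.util.SplittableRandom.)
      ;; one independent generator per worker/chunk
      rngs (vec (repeatedly 4 #(.split root)))]
  (mapv (fn [^java.util.SplittableRandom r]
          ;; each chunk draws from its own generator, no shared state
          (vec (repeatedly 3 #(.nextDouble r))))
        rngs))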
> all you have are count and reduce
Note that I'm not really simulating anything interesting here yet!
I've previously worked a bit with stochastic cost/time estimation, where we used monte-carlo-simulation and distributions to work with uncertainty. So in a "real" use case, I'd be interested in:
1. Sampling multiple stochastic variables from different distributions
2. Combining them with different (hopefully vectorized) operations
3. Displaying stochastic variables as histograms
4. Fitting the resulting distributions to find expected value, standard deviation, skew, and other properties.
We were using an Excel-oriented workflow with terrible performance. We generally sampled N=10 000, and had to wait a bit. I suspect that an idiomatic dtype-next-based solution would be ~instant; we could even go for a larger sample.
Though I don't think there's any dependency between rows. You're a better judge of how to make stuff like this fast than me!
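A rough sketch of steps 1 and 2 above (my own illustration, not from the thread), using placeholder uniform samples where real distributions would go, and dtype-next's functional namespace for the vectorized combination:
(require '[tech.v3.datatype :as d]
         '[tech.v3.datatype.functional :as dfn])

(let [n 10000
      ;; stand-ins for samples drawn from two different distributions
      x (d/make-reader :float64 n (rand))
      y (d/make-reader :float64 n (+ 0.9 (* 0.2 (rand))))
      ;; elementwise, vectorized combination - no per-element boxing
      z (dfn/* x y)]
  {:mean (dfn/mean z)
   :sd   (dfn/standard-deviation z)})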
We have annealers which are currently a hotspot on a project, so we dove way into this stuff for a bit 🙂.
I've also been wondering whether it's possible to "compile away the allocations" altogether. Perhaps that's what you're talking about? Let's say you have:
X ~ normal-distribution(100, 5)
Y ~ lognorm-distribution(lower=90, mean=100, higher=140)
Z = X * Y
In order to have all the immutability guarantees, I'm really only interested in being able to seed the distribution sampling. Then everything else could be "derived" on demand, similar to the reader abstraction in http://tech.ml.
Right, so in the above case with the custom IReduceInit, everything is local to the stack - this aids greatly in hotspot being able to compile away allocations. In your latest example, you don't really care about X and Y after the fact. dtype-next will get you close to a zero-allocation pathway, as it won't box anything in the first place if it doesn't have to. So declaring X and Y on the stack immediately before the reduction gives hotspot the best chance to 'compile away' things like what you are describing. In essence that is a level beyond what dtype-next is able to do, since in your case all you care about is some abstract Z - which could be represented, as I did above, by something implementing merely count and reduce - until you want to combine it with one or more other variables. The latest dtype-next is really heavily optimized to make reductions as efficient as possible using that approach. I imagine it will be quite a bit faster than Excel, but the absolute fastest approach is of course an order of magnitude more work 🙂 If you are getting the equations in symbolic form, that is one thing. If you want to type them in at the repl, that is another.
@UDRJMEFSN (.toArray (.doubles (java.util.Random.) N)) gave me another 4x speedup 😁
local (left) | clerk garden (right). https://github.clerk.garden/teodorlu/lab/commit/4e9317fa714b29c34ed39d141c2a0c9c7857ff21/src/hypercube3.html.
The clerk garden timings look weird. Perhaps we're hitting some caching; I don't see any reason why teodorlu2 and teodorlu3 should be faster than chrisn1 and chrisn2. I suppose this is when I have to consider a better approach to benchmarking than my timed function!
I agree there is more to do here to figure out why things are faster or not. Or, given these lessons, you can implement custom readers with overloaded reduce methods and get both.
There is another architecture we could derive based on things that are double suppliers. Some of the perf difference is due to the use of rand, like I noted earlier, as opposed to a Random object created in a local context. But assuming we have a function that takes no arguments and returns a double, we can derive an entire architecture similar to dtype-next, but not assuming random access and not assuming any caching.
This will have very fast reduction and collection semantics, especially if we guarantee we will be using things in a single-threaded context, or at least guarantee a serial reduction operation. That is essentially Java streams, but supporting IReduceInit and with normal arithmetic operators similar to dtype-next's functional namespace.
Also, if we constrain the numeric space purely to double scalars and streams of doubles that really does simplify things quite a bit.
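A toy sketch of that double-supplier idea (my own illustration, not the streams library API): a stream is just a 0-arg function returning a double, arithmetic composes suppliers into new suppliers, and you decide how many samples to pull at the end:
(defn rand-stream []
  (let [r (java.util.Random.)]            ;; local Random, no shared synchronized state
    (fn [] (.nextDouble r))))

(defn s+ [a b] (fn [] (+ (a) (b))))        ;; combine two suppliers elementwise
(defn s* [a b] (fn [] (* (a) (b))))

(defn sample-n ^doubles [n s]              ;; realize n values into a double array
  (let [arr (double-array n)]
    (dotimes [i n] (aset arr i (double (s))))
    arr))

;; compose first, pick the sample size later
(vec (take 5 (sample-n 1000 (s+ (rand-stream) (s* (rand-stream) (rand-stream))))))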
@U3X7174KS - Are you trying to use clerk to replace your current spreadsheet system?
Another idea - fastmath's distributions are objects - you could perhaps implement numeric operators as transformations from distribution->distribution by implementing protocols.
> Another idea - fastmath's distributions are objects - you could perhaps implement numeric operators as transformations from distribution->distribution by implementing protocols. @U3X7174KS @UDRJMEFSN if any of you are trying that, I would love to think together and possibly help. This may relate to some current hopes regarding probabilistic programming.
I don't have the bandwidth to try it myself right now, nor honestly the statistical knowledge, but it's a thought. The numeric operators are easy if we are talking about a double stream-like object. They are not at all easy, nor clear to me, if they are transformations from distribution to distribution.
Thanks. Yes, in a way, the answer to the question -- how to define numeric operators from distribution to distribution? -- is probabilistic programming. :thinking_face:
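A loose sketch of the protocol idea (purely illustrative; not fastmath's API): represent a distribution as anything that can be sampled, and make arithmetic return a new, derived sampleable object:
(defprotocol ISample
  (sample1 [this rng] "Draw a single double from this distribution."))

(defrecord Uniform01 []
  ISample
  (sample1 [_ rng] (.nextDouble ^java.util.Random rng)))

(defn dist* [a b]
  ;; the product of two distributions is itself sampleable
  (reify ISample
    (sample1 [_ rng] (* (sample1 a rng) (sample1 b rng)))))

(let [rng (java.util.Random. 42)   ;; the seed is the only input we need to fix
      z   (dist* (->Uniform01) (->Uniform01))]
  (vec (repeatedly 5 #(sample1 z rng))))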
> @U3X7174KS - Are you trying to use clerk to replace your current spreadsheet system?
My motivation for wanting to understand array programming and monte-carlo simulation is composite.
1. I was working on the Excel system from 2017 to 2019, and I'm no longer working for that company. I was able to get a very nice speedup by using normal Clojure vectors. But my proposed user interface (REPL) didn't work for the other team members. All had engineering (non-software) degrees, and had learned some Python and some Elm (my "fault"/"contribution", but it worked! 😅) in order to work effectively with analysis, statistics and data visualization. So this project is kind of personal for me. I wrote a ton of code, was really proud, but we were never able to use it. With Clerk (or notespace, ...), perhaps the outcome would have been different.
2. The regulations that govern civil engineering in Europe (https://en.wikipedia.org/wiki/Eurocodes) provide a way for civil engineers to work with uncertainty through safety factors and load combinations. If we had a better way to talk about that uncertainty (for example with monte-carlo and array programming), I believe we could make substantial improvements to the status quo. Potentially billions+ of euros per year in cost of production, cost of engineering time, and CO2 emissions (see https://en.wikipedia.org/wiki/Environmental_impact_of_concrete).
3. I also believe we could make substantial improvements to current civil engineering education. When I took my degree (2010-2015), the subjects were either "math oriented" or "implementation oriented". The math was OK, but suffered a bit from being too abstract; it was hard to see how the theory was actually applicable in the real world. The implementation-oriented subjects were embarrassingly bad. We had a course with C, Fortran and OpenGL. Everyone just got stuck. I believe array programming in a language like Python or Clojure could help bridge the gap between theory and implementation.
> Thanks. Yes, in a way, the answer to the question -- how to define numeric operators from distribution to distribution? -- is probabilistic programming. :thinking_face:
Another loose idea, based on my very limited experience with this: support backtracking in an array container.
(def N 10e6)
(def X (f/source {:var :X :args [N] :init (fn [N] (.toArray (.doubles (java.util.Random.) N)))}))
(def Y (f/source {:var :Y :args [N] :init (fn [N] (.toArray (.doubles (java.util.Random.) N)))}))
(def A (f/+ X (f/* Y 0.2)))
(f/to-symbolic A)
;; =>
'(+ [:var :X] (* [:var :Y] 0.2))
Then the user could set a small N in their :args, so that everything is fast and live, and preview histograms, but be able to replace the N later without code duplication.
Note: I'm not suggesting that anyone implement this as-is. But I was asking myself if intermediate values needed to be numeric or symbolic. And I was wondering whether an answer could be "both". And just looking into how https://probprog.github.io/anglican/ works might be a better way forward.
Here is a simple route - https://github.com/cnuernber/streams#usage It is simpler than dtype-next by far, and you can define streams of doubles from distributions and do simple arithmetic on them - all of it typed to double numbers. This is just a concept; it was fun to work on, as it is quite cold here in Crestone, CO today.


To close this thread out - I tested various configurations of streams and dtype readers and fixed a fairly major performance issue with make-container. The result is very interesting comparing streams against dtype-next:
Compositions of readers are faster, but a single bespoke reader is not much if any faster than a single bespoke noncaching stream. So the speed difference is basically undetectable until you start to compose them, and then - due to the multiple checking of the end constraint through the stack - the reader system wins, and it has far cheaper per-element access.
so, thanks @U3X7174KS - as I think you suspected - you found a performance regression :-).
Wow, nice! Glad I could help! I've been swamped with work this week, so I haven't had the time to properly try your non-allocating code.
> as I think you suspected
Honestly, I found something that "looked suspicious", and thought that I had done something weird, or that Clerk's caching system was causing something I didn't expect. Either caching speeding things up (avoiding repeated evaluation) or caching slowing things down (hashing and checking hashes dominating the time taken). Working in public with shared source and reproducible computations / observations is powerful! 😁
It is 🙂 - I added a benchmark system to the streams project which you can check out (scripts/benchmark) - on my Mac, which is a more stable benchmarking system, it shows the same results; streams are slower compositionally than dtype-next, which I guess is a good result. Further research for this project will involve trying out MKL's random number generator and potentially Apple's systems https://developer.apple.com/forums/thread/721231
I figured out how to make the stream model as efficient as the dtype-next model. Infinite or unlimited streams are just as efficient as dtype-next as they don't incur any per-element checking costs. I now have a benchmarks section - https://github.com/cnuernber/streams#lies-damn-lies-and-benchmarks This does require the composition system (map, take, etc) to mark the streams as limited or not.
Nice!
> If we combine lazy-noncaching with infinite or unlimited streams we get an extremely efficient compositional model as we don't need iterators in the composition -- there is no need to check hasNext. We can simply get the next item - throughout this library the things returned implement this efficiently in the unlimited case in the 0-arg IFn implementation, so you can just call them with no arguments.
Another interesting side effect of infinite streams is that (unless I’m mistaken) we can compose the streams first, then choose how many values we want to sample later. We get one infinite stream, and when we’re working on the simulation (in the REPL / in a notebook), we can keep the sample size reasonably low. Then we can sample 10x-100x more values when we’d like some more confidence.
And using darkstar + svg for plotting was a great idea. I was using some Vega Lite histograms with Clerk, but I knew I didn’t want to send 100 000 values to the frontend for javascript to bin the values.
Yes that is right - you can create your theoretical stream and decide the number of values to pull from it some time later. I think the library is pretty finished for now actually - perhaps I could add printing of the streams so they print to the repl nicely but that is for later. I added a fast https://github.com/cnuernber/streams/blob/c7cb0e81764dcf01bff3c49b67f9045111da3262/src/streams/api.clj#L959 to the API so you can bin and send the data to the frontend for graphing.