that warning about the mismatched versions is ok.
then, you'll also have to build the snapshots of uncomplicate commons, uncomplicate cuda, and any other neanderthal-base dependencies that are snapshots in the current github repo.
build, i.e., lein install.
forget cuda, I forgot you're using apple macos.
then, you lein install neanderthal-base, neanderthal-openblas, and neanderthal-accelerate. also neanderthal-apple if you want the convenience of the bundled dependencies.
now there are a few relevant hello-world projects
https://github.com/uncomplicate/neanderthal/blob/master/examples/hello-world-aot/project.clj
Once you install all neanderthal libs (snapshots) locally, don't start with your custom code. First try the provided hello world project, just to check that the libraries themselves are ok. Only if the hello world project works, explore further; that way we know whether the errors are on my side (which I can fix quickly) 🙂
I did install everything in the neanderthal repo, minus opencl and cuda.
Better yet, run lein midje for all these subprojects (base, openblas, accelerate) while building. If I left some bug in the latest snapshots, that should uncover it. Otherwise, if they pass, you'll be fine, especially if the hello world project works ok.
@solussd I've deployed the latest snapshots. Please try examples/hello-world-aot (clone neanderthal, cd to examples/hello-world-aot/....../native and load it in the repl). It should work from Clojars directly, with no manual work on your side... Please report if it's all right now or something is off. Please note, I've built and tried this on the latest macOS Sequoia 15.5. If you're still on one of the previous versions, you might need to update (but I don't know, it should work with 15.0 up...)
@solussd I've introduced threading? and threading! functions in the native namespace. You can use them to set threading to true, false, or a specific number of threads. Please note that a specific number of threads is backend-dependent. MKL and OpenBLAS treat that number differently (MKL per-thread, OpenBLAS globally). I would stick with true or false for most cases...
Is there a blessed way to do platform detection at runtime in Neanderthal?
nREPL server started on port 64331 on host 127.0.0.1 -
(in-ns 'hello-world.native)
=> #object[clojure.lang.Namespace 0x2400510 "hello-world.native"]
Loading src/hello_world/native.clj...
Jun 09, 2025 9:28:42 AM clojure.tools.logging$eval2922$fn__2925 invoke
INFO: Accelerate backend loaded.
from the hello-world-aot native namespace:
(rand-normal! (dv 5))
Execution error (NullPointerException) at uncomplicate.neanderthal.internal.cpp.common/int-ptr (common.clj:33).
Cannot invoke "uncomplicate.neanderthal.internal.api.Block.buffer()" because "x" is null
I removed all my local deps to ensure it was fetching from the repos
Before that, does hello world work for you or not? We should see a matrix printed out.
Yes
Matrix multiplication worked when I compiled all the things, too.
Cool. Did you require the random namespace?
yes
(ns hello-world.native
  (:require
   [uncomplicate.neanderthal
    [random :refer :all]
    [core :refer :all]
    [native :refer :all]]))
;; We create two matrices...
(def a (dge 2 3 [1 2 3 4 5 6]))
(def b (dge 3 2 [1 3 5 7 9 11]))
;; ... and multiply them
(mm a b)
;; If you see something like this:
;; #RealGEMatrix[double, mxn:2x2, layout:column, offset:0]
;; ▥ ↓ ↓ ┓
;; → 35.0 89.0
;; → 44.0 116.0
;; ┗ ┛
;; It means that everything is set up and you can enjoy programming with Neanderthal :)
(rand-normal! (dv 5))
Can you go to the neanderthal-accelerate project and run these tests? https://github.com/uncomplicate/neanderthal/blob/master/neanderthal-accelerate/test/uncomplicate/neanderthal/accelerate_test.clj
with lein midje
so, not lein test, but lein midje
All checks (3887) succeeded
I'm not by my mac right now, so I can't try your code myself, but all these tests (which include random generator tests) pass on my mac...
hmmm.
Can you copy/paste/adjust some of that test code into the hello world?
Maybe there is a bug in the case when seed is not provided. I'll have to test that. In the meantime, please use it with the seed.
hm, weird, ok that worked, e.g.:
(let [v (dv 5)]
  (rand-normal! (rng-state v 42) v))
#RealBlockVector[double, n:5, stride:1]
[ -0.21 -0.98 0.47 -1.52 0.63 ]
thanks
That's a bug, but an easy one to fix. I'll fix it tonight and push a new snapshot to clojars.
running a thing that usually takes ~ 80 seconds on my 32GB RAM Intel Ultra 9 185h. It’s still running and it’s been 5 minutes on my Apple M4 Max with 128GB RAM. I know that’s not a lot of information, but surprised it’s so much slower.
I didn't catch it because this case is covered in the MKL engine, which is the default on linux and windows, which I use for development.
Depends on the code that you use. If you can share some I could try it...
I didn't benchmark arm accelerate yet, but I'm surprised that it's considerably slower than intel mkl (which is top class, I admit, but the difference shouldn't be that much)...
It may be that Intel is using multicore by default, while accelerate might be running in a single thread...
I think it’s dominated by calls to ptrf!,
.. that’s not it.
wait, it is. I’m calling ptrf! from multiple threads. Calling it from 1 and it takes ~3ms. Calling it from 10 and each of them takes more than 1500ms.
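A minimal sketch of how the per-call timing described above could be measured from the REPL, using only plain Clojure futures. The `workload` function is a hypothetical stand-in (it is not `ptrf!`); `elapsed-ms` and `time-per-call` are names made up for this illustration. Swap in the real call to see whether each call gets slower as the number of calling threads grows.

```clojure
(defn elapsed-ms
  "Runs f once and returns the elapsed wall-clock time in milliseconds."
  [f]
  (let [start (System/nanoTime)]
    (f)
    (/ (- (System/nanoTime) start) 1e6)))

(defn time-per-call
  "Calls f concurrently from n-threads futures and returns a vector
  holding the elapsed milliseconds of each individual call."
  [n-threads f]
  (let [futures (doall (repeatedly n-threads #(future (elapsed-ms f))))]
    (mapv deref futures)))

;; HYPOTHETICAL stand-in workload; replace the body with the real call
;; (for example the factorization that is being called from many threads).
(defn workload []
  (reduce + (map #(Math/sqrt (double %)) (range 100000))))

(println "1 caller:  " (time-per-call 1 workload))
(println "10 callers:" (time-per-call 10 workload))
```

If the per-call times with 10 callers are far higher than with 1, the calls are contending with each other. In a standalone script, call `(shutdown-agents)` at the very end so the future thread pool does not keep the JVM alive.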
In general, you should not create your own threads with any of BLAS/LAPACK implementations. Or, if you insist on your threading, you should configure the library to use single threads (usually you do this through relevant environment variables).
Actually, you can configure that from the repl. The relevant function for accelerate is this: https://github.com/uncomplicate/neanderthal/blob/91764385265b57305a60f23d7144f1e4bb76bee0/neanderthal-accelerate/src/uncomplicate/neanderthal/internal/cpp/accelerate/factory.clj#L60
@solussd and for OpenBLAS (which is used for linalg functions): https://github.com/uncomplicate/neanderthal/blob/91764385265b57305a60f23d7144f1e4bb76bee0/neanderthal-openblas/src/uncomplicate/neanderthal/internal/cpp/openblas/factory.clj#L61
@solussd Please try the latest snapshots. The rand-normal! and rand-uniform! functions should work now without seed.
(Clojars, of course)
weird, if I set openblas num-threads to 1 it all runs super-fast. Set to anything higher it runs progressively slower
is this context switching in vector units?
I guess that it depends on what else you do in your code, but otherwise OpenBLAS is not that great at multithreading, especially compared to MKL. Depends on the functions. The accelerate engine uses mostly Accelerate for BLAS, and OpenBLAS for LAPACK functions, but even then, this OpenBLAS should actually delegate to Accelerate (that can be configured, too). If you want to know more, you'll have to check openblas forums...
You can also try the OpenBLAS engine, and compare to Accelerate (just explicitly create both factories, and use them with core functions alongside each other)...
https://discourse.julialang.org/t/regarding-the-multithreaded-performance-of-openblas/75450/7
Interesting- this explains why setting threads to 1 speeds things up (1 to 1 with calling threads / on calling threads), but you’d think that it’d improve once set to a much higher number, too.
You can see the native namespace for what I do there. However, the whole architecture is built in a way that you don't need to do platform detection in your code. Simply prefer the core functions instead of the native namespace, and use your data as factory providers (it's already how it works). Then, you only need to choose the factory you want to use once at the start of your program, and everything else fits in. You can also use many backends (guided by respective factories) alongside each other. The only issue is that the factory that you like may not be able to physically run on your machine (MKL on macOS for example)... Anyway, I write my code in a platform-agnostic way, and defer factory selection to exactly one place...
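A minimal sketch of that "choose the factory in exactly one place" style, assuming Neanderthal is on the classpath. `native-double` is the default double-precision factory from the native namespace; the names `factory`, `gram` and `scratch` are made up for this illustration. Everything below the `def` of `factory` uses only core functions, so switching backends should mean changing that one line.

```clojure
(require '[uncomplicate.neanderthal.core :refer [ge mm trans mrows ncols]]
         '[uncomplicate.neanderthal.native :refer [native-double]])

;; The single place where a backend is chosen.
;; ASSUMPTION: the default native factory is what you want here;
;; an explicitly created backend factory could be bound instead.
(def factory native-double)

(defn gram
  "Computes (trans a) * a with core functions only, so it works with
  whichever factory created a."
  [a]
  (mm (trans a) a))

;; Data created through the chosen factory...
(def a (ge factory 2 3 [1 2 3 4 5 6]))

;; ...can itself act as a factory provider for new structures.
(def scratch (ge a 3 3))

(def g (gram a))
(println (mrows g) "x" (ncols g))
```

Since `a` is 2x3, `(trans a)` is 3x2 and `g` comes out as a 3x3 matrix, allocated by the same backend that created `a`.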
ah, right-- I guess i can call the openblas thread setter regardless of platform
You don't even need to be openblas-specific. Call the native/threading! function and it should work not only on all platforms, but on all backends!
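A hedged sketch of calling it once at program start. It assumes, based only on the description earlier in this thread, that `threading!` lives in `uncomplicate.neanderthal.native` and accepts `true`, `false`, or a thread count as its single argument; check the function's docstring in the snapshot you have. The wrapper `set-blas-threading!` is a hypothetical helper that looks the function up at runtime (it needs Clojure 1.10+ for `requiring-resolve`), so the code also loads with an older Neanderthal that lacks it.

```clojure
(defn set-blas-threading!
  "Calls uncomplicate.neanderthal.native/threading! with `setting` if that
  function can be loaded. Returns :applied on success, or :unavailable if
  the namespace or the function is missing."
  [setting]
  (let [f (try
            (requiring-resolve 'uncomplicate.neanderthal.native/threading!)
            (catch Throwable _ nil))]
    (if f
      ;; ASSUMPTION: one argument, either true, false, or a number of threads.
      (do (f setting) :applied)
      :unavailable)))

;; Typical use when you manage your own threads: turn library threading off.
(println (set-blas-threading! false))
```

Per the advice above, `true` or `false` is the safer choice, since a specific thread count is interpreted differently by each backend.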