uncomplicate

2025-06-08T11:41:49.217319Z

that warning about the mismatched versions is ok.

2025-06-08T11:43:40.763079Z

then, you'll also have to build the snapshots of uncomplicate commons, uncomplicate cuda, and any other neanderthal-base dependencies that are snapshots in the current GitHub repo.

2025-06-08T11:44:02.893609Z

build, i.e., lein install.

2025-06-08T11:44:14.386269Z

forget cuda, I forgot you're using Apple macOS.

2025-06-08T11:45:58.872229Z

then, lein install neanderthal-base, neanderthal-openblas, and neanderthal-accelerate. Also neanderthal-apple, if you want the convenience of the bundled dependencies.

2025-06-08T11:48:51.744119Z

now there are a few relevant hello-world projects

2025-06-08T11:49:04.888249Z

@solussd

2025-06-08T11:52:15.736509Z

Once you install all neanderthal libs (snapshots) locally, don't start with your custom code. First try the provided hello world project, just to be sure whether the libraries themselves are ok. Only if the hello world project works, explore further; that way we'll know whether the errors are on my side (which I can fix quickly) 🙂

Joe R. Smith 2025-06-09T14:00:07.099559Z

I did install everything in the neanderthal repo, minus opencl and cuda.

2025-06-08T11:55:54.496329Z

Better yet, run lein midje for all these subprojects (base, openblas, accelerate) while building. If I left some bug in the latest snapshots, that should catch it. Otherwise, if they pass, you'll be fine, especially if the hello world project works ok.

2025-06-08T14:16:07.635349Z

@solussd I've deployed the latest snapshots. Please try examples/hello-world-aot (clone neanderthal, cd to examples/hello-world-aot/....../native and load it in the repl). It should work from Clojars directly, with no manual work on your side... Please report whether it's all right now or something is off. Please note, I've built and tried this on the latest macOS Sequoia 15.5. If you're still on one of the previous versions, you might need to update (I'm not sure, though; it should work with 15.0 and up...)

2025-06-10T12:47:47.175969Z

@solussd I've introduced threading? and threading! functions in the native namespace. You can use them to set threading to true, false, or a specific number of threads. Please note that the meaning of a specific number of threads is backend-dependent: MKL and OpenBLAS treat that number differently (MKL per-thread, OpenBLAS globally). I would stick with true or false for most cases...
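A minimal REPL sketch of the two functions described above. It assumes that threading? takes no arguments and reports the current setting, and that threading! takes a single argument (true, false, or a thread count); those arities and return values are assumptions, so check the docstrings in the native namespace.

```clojure
(ns hello-world.threading
  (:require [uncomplicate.neanderthal.native :refer [threading? threading!]]))

;; Assumed zero-arity query: inspect the current threading setting.
(println "threading:" (threading?))

;; Let the backend decide how many threads to use.
(threading! true)

;; Run BLAS/LAPACK calls single-threaded (useful when you manage your own threads).
(threading! false)

;; A specific count is backend-dependent (MKL per-thread, OpenBLAS globally),
;; so prefer true/false unless you know which backend is active:
;; (threading! 4)
```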

Joe R. Smith 2025-06-11T01:43:27.108519Z

Is there a blessed way to do platform detection at runtime in Neanderthal?

Joe R. Smith 2025-06-09T14:29:12.495859Z

nREPL server started on port 64331 on host 127.0.0.1 - 
(in-ns 'hello-world.native)
=> #object[clojure.lang.Namespace 0x2400510 "hello-world.native"]
Loading src/hello_world/native.clj... Jun 09, 2025 9:28:42 AM clojure.tools.logging$eval2922$fn__2925 invoke
INFO: Accelerate backend loaded.

Joe R. Smith 2025-06-09T14:53:11.427919Z

from the hello-world-aot native namespace:

(rand-normal! (dv 5))
Execution error (NullPointerException) at uncomplicate.neanderthal.internal.cpp.common/int-ptr (common.clj:33).
Cannot invoke "uncomplicate.neanderthal.internal.api.Block.buffer()" because "x" is null

Joe R. Smith 2025-06-09T14:55:02.108209Z

I removed all my local deps to ensure it was fetching from the repos

2025-06-09T17:01:32.656349Z

Before that, does hello world work for you or not? We should see a matrix printed out.

Joe R. Smith 2025-06-09T17:01:44.672039Z

Yes

Joe R. Smith 2025-06-09T17:02:13.003299Z

Matrix multiplication worked when I compiled all the things, too.

2025-06-09T17:02:23.604679Z

Cool. Did you require the random namespace?

Joe R. Smith 2025-06-09T17:02:55.141069Z

yes

Joe R. Smith 2025-06-09T17:03:16.072689Z

(ns hello-world.native
  (:require
    [uncomplicate.neanderthal
     [random :refer :all]
     [core :refer :all]
     [native :refer :all]]))

;; We create two matrices...
(def a (dge 2 3 [1 2 3 4 5 6]))
(def b (dge 3 2 [1 3 5 7 9 11]))
;; ... and multiply them
(mm a b)

;; If you see something like this:
;; #RealGEMatrix[double, mxn:2x2, layout:column, offset:0]
;; ▥       ↓       ↓       ┓
;; →       35.0    89.0
;; →       44.0   116.0
;; ┗                       ┛
;; It means that everything is set up and you can enjoy programming with Neanderthal :)

(rand-normal! (dv 5))

2025-06-09T17:03:43.422469Z

Can you go to neanderthal-accelerate project and run these tests? https://github.com/uncomplicate/neanderthal/blob/master/neanderthal-accelerate/test/uncomplicate/neanderthal/accelerate_test.clj

2025-06-09T17:03:50.919919Z

with lein midje

2025-06-09T17:03:56.833239Z

so, not lein test, but lein midje

Joe R. Smith 2025-06-09T17:04:38.162569Z

All checks (3887) succeeded

2025-06-09T17:04:42.701639Z

I'm not by my mac right now, so I can't try your code myself, but all these tests (which include random generator tests) pass on my mac...

2025-06-09T17:04:47.879259Z

hmmm.

2025-06-09T17:05:24.267859Z

Can you copy/paste/adjust some of that test code into the hello world?

2025-06-09T17:06:58.392669Z

Maybe there is a bug in the case when seed is not provided. I'll have to test that. In the meantime, please use it with the seed.

Joe R. Smith 2025-06-09T17:13:34.603829Z

hm, weird, ok that worked, e.g.:

(let [v (dv 5)] 
  (rand-normal! (rng-state v 42) v))

#RealBlockVector[double, n:5, stride:1]
[  -0.21   -0.98    0.47   -1.52    0.63 ]

Joe R. Smith 2025-06-09T17:13:49.381109Z

thanks

2025-06-09T17:27:02.804919Z

That's a bug, but an easy one to fix. I'll fix it tonight and I'll push new snapshot to clojars.

Joe R. Smith 2025-06-09T17:27:11.811859Z

running a thing that usually takes ~ 80 seconds on my 32GB RAM Intel Ultra 9 185h. It’s still running and it’s been 5 minutes on my Apple M4 Max with 128GB RAM. I know that’s not a lot of information, but surprised it’s so much slower.

2025-06-09T17:27:46.812479Z

I didn't catch it because this case is covered by the MKL engine, which is the default on Linux and Windows, which I use for development.

2025-06-09T17:28:20.402579Z

Depends on the code that you use. If you can share some I could try it...

2025-06-09T17:29:50.504049Z

I didn't benchmark arm accelerate yet, but I'm surprised that it's considerably slower than intel mkl (which is top class, I admit, but the difference shouldn't be that much)...

2025-06-09T17:30:20.180149Z

It may be that Intel MKL is using multiple cores by default, while Accelerate might be running in a single thread...

Joe R. Smith 2025-06-09T17:33:39.366659Z

I think it’s dominated by calls to ptrf!,

Joe R. Smith 2025-06-09T17:36:22.691189Z

.. that’s not it.

Joe R. Smith 2025-06-09T17:41:01.000019Z

wait, it is. I’m calling ptrf! from multiple threads. Calling it from 1 and it takes ~3ms. Calling it from 10 and each of them takes more than 1500ms.

2025-06-09T19:17:50.077219Z

In general, you should not create your own threads on top of any BLAS/LAPACK implementation. Or, if you insist on your own threading, you should configure the library to run single-threaded (usually you do this through the relevant environment variables).
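The advice above can be sketched like this, using the native/threading! function mentioned in this thread instead of environment variables. The matrix, the count of 10 tasks and the use of plain futures are illustrative choices; it also assumes dsy accepts n and a flat sequence of the stored-triangle entries.

```clojure
(ns hello-world.parallel-ptrf
  (:require [uncomplicate.neanderthal
             [core :refer [copy]]
             [native :refer [dsy threading!]]
             [linalg :refer [ptrf!]]]))

;; 2x2 symmetric positive-definite matrix [[4 1] [1 4]]; with n = 2 the three
;; source values land in the same places whichever triangle is stored.
(def spd (dsy 2 [4 1 4]))

;; Turn off the backend's own threading, so our threads don't compete with
;; BLAS/LAPACK worker threads for the same cores.
(threading! false)

;; Fan out independent Cholesky factorizations over futures.
;; ptrf! overwrites its argument, so each task factorizes its own copy.
;; The first mapv is eager, so all futures start before any deref.
(def factorizations
  (->> (range 10)
       (mapv (fn [_] (future (ptrf! (copy spd)))))
       (mapv deref)))
```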

2025-06-09T19:45:12.599659Z

Actually, you can configure that from the repl. The relevant function for accelerate is this: https://github.com/uncomplicate/neanderthal/blob/91764385265b57305a60f23d7144f1e4bb76bee0/neanderthal-accelerate/src/uncomplicate/neanderthal/internal/cpp/accelerate/factory.clj#L60

2025-06-09T21:43:53.602149Z

@solussd Please try the latest snapshots. The rand-normal! and rand-uniform! functions should work now without seed.
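A quick check for the fix, assuming the updated snapshots from Clojars are on the classpath: both generators are called without a seed, next to the seeded form that already worked before the fix.

```clojure
(ns hello-world.random
  (:require [uncomplicate.neanderthal
             [native :refer [dv]]
             [random :refer [rand-normal! rand-uniform! rng-state]]]))

;; No seed: the case that used to throw the NullPointerException.
(def normal (rand-normal! (dv 5)))
(def uniform (rand-uniform! (dv 5)))

;; Explicit seed, as in the earlier workaround, for reproducible results.
(def seeded (let [v (dv 5)]
              (rand-normal! (rng-state v 42) v)))
```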

2025-06-09T21:44:05.060049Z

(Clojars, of course)

Joe R. Smith 2025-06-09T21:53:31.550779Z

weird, if I set openblas num-threads to 1 it all runs super-fast. Set to anything higher it runs progressively slower

Joe R. Smith 2025-06-09T21:53:48.385649Z

is this context switching in vector units?

2025-06-09T23:15:40.105209Z

I guess that it depends on what else you do in your code, but otherwise OpenBLAS is not that great at multithreading, especially compared to MKL. Depends on the functions. The accelerate engine uses mostly Accelerate for BLAS, and OpenBLAS for LAPACK functions, but even then, this OpenBLAS should actually delegate to Accelerate (that can be configured, too). If you want to know more, you'll have to check the OpenBLAS forums...

2025-06-09T23:17:28.558039Z

You can also try the OpenBLAS engine and compare it to Accelerate (just explicitly create both factories and use them with core functions alongside each other)...
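A rough sketch of such a side-by-side comparison. The OpenBLAS factory namespace is assumed to mirror the accelerate factory.clj path linked above, and the var names accelerate-double and openblas-double are assumptions; the matrix size is arbitrary. The first call also pays JIT warm-up, so for serious numbers use a proper benchmarking library rather than time.

```clojure
(ns hello-world.compare-backends
  (:require [uncomplicate.neanderthal.core :refer [ge mm mrows]]
            ;; Assumed namespaces and var names for the two factories.
            [uncomplicate.neanderthal.internal.cpp.accelerate.factory :refer [accelerate-double]]
            [uncomplicate.neanderthal.internal.cpp.openblas.factory :refer [openblas-double]]))

(defn bench
  "Multiplies an n x n matrix of random entries by itself using the given
  factory, printing the elapsed time and returning the product."
  [factory n]
  (let [a (ge factory n n (repeatedly (* n n) rand))]
    (time (mm a a))))

;; Same operation, same size, two backends, side by side.
(bench accelerate-double 512)
(bench openblas-double 512)
```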

Joe R. Smith 2025-06-10T02:01:14.767939Z

Interesting- this explains why setting threads to 1 speeds things up (1 to 1 with calling threads / on calling threads), but you’d think that it’d improve once set to a much higher number, too.

2025-06-11T08:39:09.200029Z

You can see the native namespace for what I do there. However, the whole architecture is built in a way that you don't need to do platform detection in your code. Simply prefer the core functions instead of native namespace, and use your data as factory providers (it's already how it works). Then, you only need to choose the factory you want to use once at the start of your program, and everything else fits in. You can also use many backends (guided by respective factories) alongside each other. The only issue is that the factory that you like may not be able to physically run on your machine (MKL on MacOS for example)... Anyway, I write my code in platform-agnostic way, and defer factory selection to exactly one place...
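The single-selection-point approach can be sketched as below. It assumes native-double is the default double factory that the native namespace picks for the current machine (the one behind dge and dv), and that core constructors such as ge accept an existing matrix as the factory provider, as described above; the names product and same-shape are hypothetical helpers for illustration.

```clojure
(ns hello-world.agnostic
  (:require [uncomplicate.neanderthal.core :refer [ge mm entry mrows ncols]]
            [uncomplicate.neanderthal.native :refer [native-double]]))

;; The one place where the backend is chosen. Swap in another factory here
;; and nothing below needs to change.
(def factory native-double)

;; Platform-agnostic code: takes a factory, uses only core functions.
(defn product [factory]
  (mm (ge factory 2 3 [1 2 3 4 5 6])
      (ge factory 3 2 [1 3 5 7 9 11])))

;; Data as factory provider (assumed): a new matrix of the same shape, created
;; with the same backend as an existing one, without naming the factory again.
(defn same-shape [a]
  (ge a (mrows a) (ncols a)))

(def c (product factory))
(def z (same-shape c))
```

The product is the same 2x2 matrix as in the hello world example (35.0, 44.0 in the first column; 89.0, 116.0 in the second), whichever backend the factory points to.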

Joe R. Smith 2025-06-11T17:03:16.908259Z

ah, right-- I guess i can call the openblas thread setter regardless of platform

2025-06-12T08:33:29.239279Z

You don't even need to be openblas-specific. Call the native/threading! function and it should work not only on all platforms, but on all backends!