I'm thinking about building a Clojure wrapper for https://github.com/arrayfire/arrayfire, which provides tensor math over real and complex numbers on the CPU and GPU. The Java wrapper seems pretty much stalled with the last commit 6 years ago and I haven't found any prior work for Clojure. The idea is to use Coffi to build a Java 22 foreign function interface with a Clojure functional interface.
I've finally managed to start with https://github.com/lsolbach/arrayfire-clj. So far it covers only ArrayFire initialization, array creation for the supported datatypes, and zero-copy dtype-next integration. The only arithmetic operation implemented so far is 'add', but it adds on the GPU, even for complex arrays.
Next step should be a sound API design. Maybe a 'low level' API resembling the ArrayFire API, so that the ArrayFire documentation still applies to some degree and a 'higher level' Clojure API making use of protocols...
Feedback welcome.
Instead of using pointer addresses as handles, another option is to use something like deftype. The benefits of a deftype approach are:
• You can implement java.lang.AutoCloseable so that the create-array* functions work with with-open.
• You can also use java.lang.ref.Cleaner to release arrays automatically once they become unreachable (if desired).
• It makes it much harder to accidentally crash the JVM by passing something that is not a handle to be released.
• Arrays can be printed in a more dev-friendly way.
It might also be possible to implement the various dtype next protocols such that fire arrays can be directly passed to/from dtype-next without an explicit conversion step.
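A minimal sketch of the deftype idea, assuming nothing about the actual arrayfire-clj code: `FireArray`, `fire-array` and `release-handle!` are hypothetical names, and `release-handle!` only records the address where a real wrapper would call the native release function.

```clojure
(import '(java.util.concurrent.atomic AtomicBoolean))

;; Hypothetical stand-in for the native release call; it only records
;; which addresses were released so the behaviour can be observed.
(def released (atom []))

(defn release-handle! [address]
  (swap! released conj address))

(deftype FireArray [address dims ^AtomicBoolean closed?]
  java.lang.AutoCloseable
  (close [_]
    ;; compareAndSet makes close idempotent: the handle is released at most once.
    (when (.compareAndSet closed? false true)
      (release-handle! address))))

(defn fire-array
  "Wraps a raw address (hypothetical) together with its dimensions."
  [address dims]
  (FireArray. address dims (AtomicBoolean. false)))

;; Print the dimensions instead of an opaque pointer value.
(defmethod print-method FireArray [^FireArray a ^java.io.Writer w]
  (.write w (str "#fire-array{:dims " (pr-str (.-dims a)) "}")))

;; The handle is released when the with-open body exits.
(with-open [a (fire-array 42 [2 2])]
  (println (pr-str a)))
```

Because only a `FireArray` can reach `release-handle!`, an arbitrary number can no longer be passed to the release call by accident.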
Sounds reasonable.
If you're working with dtype-next, you can also look at tech.v3.resource/track https://github.com/techascent/tech.resource/blob/b39d5d193f08c427cb8244e67ade7b92860e9a4a/src/tech/v3/resource.clj#L60
Looks very neat and clean! Great work!
If someone hands you a fire array, is there a way to check its dimensions? That might also be useful to store as part of your handle.
I have to look into the ArrayFire API more deeply. My first priority was a proof of concept for the integration of ArrayFire, coffi and dtype-next. That's working now (at least with the Intel GPU on my laptop). The software engineering is just beginning. And I plan to make good use of AI in the implementation.
Your input is very helpful to steer it.
Arrayfire provides type info and the dimensions for an array handle.
The close function of the AFArray type is implemented via the Cleaner, and I would use this close function as the callback for track in tech.resource. So the mechanisms would stack. https://github.com/lsolbach/arrayfire-clj/blob/613c6943bdbaa128ca361df1d279d0721af32cbd/src/org/soulspace/arrayfire/integration/base/jvm_integration.clj#L127
The registration with track would be done in the unified-api functions, which at the moment handle the error checks and handle to AFArray conversion.
Or in the creation of the AFArray handle, then it would be centralized.
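How a Cleaner-backed close can stay safe when something else (such as a tracking callback) calls it again can be sketched as follows. This is an illustration under assumptions, not the linked implementation: `AFHandle`, `af-handle` and the `freed` log are hypothetical, and the release action only records the address.

```clojure
(import '(java.lang.ref Cleaner Cleaner$Cleanable))

(def ^Cleaner cleaner (Cleaner/create))

;; Hypothetical log of released addresses, standing in for the native release.
(def freed (atom []))

(deftype AFHandle [address cleanable]
  java.lang.AutoCloseable
  (close [_]
    ;; Cleanable.clean runs the registered action at most once, so an explicit
    ;; close, a tracking callback and the GC path can all share it.
    (.clean ^Cleaner$Cleanable @cleanable)))

(defn af-handle [address]
  (let [cleanable (volatile! nil)
        handle    (AFHandle. address cleanable)]
    ;; The release action closes over `address` only. Capturing `handle`
    ;; would keep it reachable and the Cleaner would never fire.
    (vreset! cleanable
             (.register cleaner handle (fn [] (swap! freed conj address))))
    handle))

;; Closing twice releases once.
(let [h (af-handle 7)]
  (.close h)
  (.close h))
```

With this shape, using the same close function as the callback for a resource tracker should not cause a double release, since the second call is a no-op.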
I did some examples, e.g. a Mandelbrot/Julia animation, and found that it is really easy to run out of device memory on the GPU when a single stack-resource-scope wraps everything inside the with-arrayfire macro. ArrayFire is mainly functional and creates a new array for, e.g., each arithmetic operation. With resource tracking registered when the AFArray handle is created, these arrays are retained until the enclosing scope ends, which is too long. So nested scopes are needed to free the AFArray handles that are no longer needed and to keep the number of arrays retained on the device under control.
So a bit of micro-management on the side of the developer is needed.
On the plus side, ArrayFire arrays can be considered immutable by default, which reduces the number of surprises for us Clojure developers.
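The effect of nested scopes on the number of live arrays can be modelled in plain Clojure. This is a toy model, not tech.resource: `with-scope`, `alloc!` and the counters are hypothetical and only count simulated arrays.

```clojure
;; Toy model of stack-style resource scopes.
(def ^:dynamic *scope* nil)
(def live (atom 0))   ; simulated device arrays currently alive
(def peak (atom 0))   ; highest value `live` has reached

(defn alloc!
  "Simulates creating a device array and registers its release with the
  innermost scope."
  []
  (let [n (swap! live inc)]
    (swap! peak max n)
    (swap! *scope* conj #(swap! live dec))
    n))

(defmacro with-scope
  "Releases everything allocated in body when body exits, newest first."
  [& body]
  `(binding [*scope* (atom [])]
     (try
       ~@body
       (finally
         (doseq [release# (rseq @*scope*)]
           (release#))))))

(defn peak-of
  "Resets the counters, runs f and returns the peak number of live arrays."
  [f]
  (reset! live 0)
  (reset! peak 0)
  (f)
  @peak)

;; One scope around 100 frames with three temporaries each.
(def flat-peak
  (peak-of #(with-scope
              (dotimes [_ 100] (alloc!) (alloc!) (alloc!)))))

;; Same work, but each frame gets its own nested scope.
(def nested-peak
  (peak-of #(with-scope
              (dotimes [_ 100]
                (with-scope (alloc!) (alloc!) (alloc!))))))
```

In this model `flat-peak` is 300 and `nested-peak` is 3, which is the difference between exhausting device memory during an animation and keeping it bounded per frame.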
Maybe I will drop tech.resource tracking and stick to AutoCloseable and with-open for the management of the AFArray. It is more explicit, and you have more control over which resources are in the with-open binding and which resources are created and closed explicitly. The lifecycle of most of the array resources on the GPU has to be as short as possible, which does not go well with Java GC or with some global resource context.
Another functional approach is to make it convenient for the user to specify a DAG of operations and then execute the DAG for them.
ArrayFire provides a JIT compiler, so I don't necessarily want to duplicate it on the JVM side.
Implemented with-arrayfire macro to demarcate the ArrayFire execution.
Establishes:
* ArrayFire initialization (once)
* Optional backend/device switching (serialized via lock)
* FFM Arena scope (confined, deterministic cleanup)
* tech.resource scope (AFArray lifecycle management)
* Result conversion (AFArray → host data)
https://github.com/lsolbach/arrayfire-clj/blob/d6889f7a9b27d5b9a2128818bd61d09d605c9927/src/org/soulspace/arrayfire/core.clj#L404
Maybe an option to select the Arena type (:confined, :shared) would be nice.
Great. By the way, I am looking into https://ejml.org/ for JVM-based complex linear algebra on the CPU (it'll be useful in a group representation theory project with a friend). I am trying to create a dtype-next-like concept of a complex tensor that can wrap double array representations of real numbers and handle them as complex numbers in a zero-copy way. If that works well, I'll share some notes.
Perfect. dtype-next currently has no complex datatypes, but maybe we can talk with Harold and Chris about it.
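The zero-copy idea mentioned above can be illustrated without any library. This sketch assumes an interleaved layout (re0, im0, re1, im1, ...) and uses a hypothetical `complex-view` function; it is not the dtype-next API and not the ComplexTensor under discussion.

```clojure
(defn complex-view
  "Returns a read-only view that treats an interleaved double array
  [re0 im0 re1 im1 ...] as a sequence of [re im] pairs without copying."
  [^doubles data]
  (assert (even? (alength data)) "expected interleaved re/im pairs")
  (let [n    (quot (alength data) 2)
        pair (fn [i]
               (let [k (* 2 (long i))]
                 [(aget data k) (aget data (inc k))]))]
    (reify
      clojure.lang.Counted
      (count [_] (int n))
      clojure.lang.Indexed
      (nth [_ i] (pair i))
      (nth [_ i not-found]
        (if (< -1 i n) (pair i) not-found)))))

(def raw (double-array [1.0 2.0 3.0 4.0]))
(def view (complex-view raw))

(count view)      ; number of complex entries
(nth view 1)      ; second entry as [re im]

;; The view shares storage with `raw`, so writes show through.
(aset raw 2 9.0)
(nth view 1)
```

Since the view only reads from the backing array, the same array could be handed to a native library while Clojure code sees complex pairs.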
Never heard of ArrayFire, but I've done a lot of native interop from Clojure. It seems like most people who have tried coffi have liked it. I'm partial to dtype-next's ffi since it supports multiple ffi implementations (JNA, GraalVM native image). I think support for the Java 22 foreign function interface may be added in the future. It looks like ArrayFire is C++. If you wrap it, you'll need to write some glue code to create a C ABI compatible API. I know some Clojure devs use javacpp. It doesn't look like there's already a maintained wrapper for ArrayFire, but you can check https://github.com/bytedeco/javacpp-presets to see if there's another similar library that is already wrapped for you. I do think jank could be a good fit for this sort of thing too.
Making things dtype-next compatible (maybe in the beginning only for real-valued types) would be really valuable for Clojure-side tensor-computing, composability with tech.ml.dataset, etc.
My primary need is tensor/linear algebra on complex numbers, to provide a fast CPU/GPU-enabled backend for the QClojure quantum computing simulations. None of the wrappers for BLAS/LAPACK on the JVM (including Neanderthal, jBlas and ND4J) support complex numbers, which are needed for quantum computing and physics simulations.
I'm implementing a layered approach now. Currently I'm implementing the FFI layer, which will be as close as possible to the ArrayFire C API. Then I will add a Clojure/JVM integration layer with library loading, resource management, error/exception handling, dtype-next integration, etc. The next layer will provide an idiomatic and composable Clojure API.
Makes sense
Maybe 75% of the bindings are done. Calling it a day.
Used Sonnet and it generated a ton of documentation. Don't think ArrayFire has that documentation (if it is sound).
I think dtype.next has some kind of FFT related benchmark in performance tests, IIRC? which would imply some complex math handling in there.
dug up the commit (it's still in there; it's just convenient to point to for impl + use grouping): https://github.com/cnuernber/dtype-next/commit/103199931635c8c9ba372262a8c5b168f5ddc30d you might have more specific performance needs than is currently in the library, but I really highly recommend watching Chris's original London Clojurian talk (then some of his other talks & updates since then) if you haven't: https://www.youtube.com/watch?v=5mUGu4RlwKE even if you have to drop to another native lib, just for framing how to think about getting high performance computing on Clojure & the JVM while retaining (and even getting unique affordances around) their benefits.
I've watched Chris's talk more than once. It's really good and definitely an inspiration.
The code in the dtype-next commit is based on a Java implementation of a Complex class, but I don't see any involvement of native BLAS/LAPACK. I've already implemented a https://github.com/lsolbach/qclojure/blob/main/src/org/soulspace/qclojure/domain/math/fastmath/complex_linear_algebra.clj based on FastMath (backed by Apache Commons Math), which is an order of magnitude faster than my naive clojure.math based implementation. But it is still way too slow to simulate quantum circuits of reasonable size (say 15 qubits, 100 gates), especially when using variational algorithms, which call the circuit many times as part of a classical parameter optimization loop. To get speed comparable with Python frameworks (e.g. Qiskit), it is really necessary, IMHO, to use a native BLAS/LAPACK-backed library with CPU/GPU support. BLAS/LAPACK support arrays over complex numbers and the necessary linear algebra functions, but none of the wrappers on the JVM do (yet). Neanderthal could support it. It's on the TODO list, but not yet started.
@lsolbach if you were dev-ing this in the Python ecosystem, would either PyTorch or JAX (or both) be suitable? I haven't looked into the state of complex number support in a minute, but I've assumed a lot of the hybrid neural+differential approaches that have cropped up over the years would need it? (and they seemed to be mostly in JAX or Julia's flux/zygote) This is a meta-question for me as I jump back more into Clojure data science after trying to meet the rest of the data science world where they're at and getting mostly nowhere. I'm wondering where the current sore points are. And it seems like the constant bickering b/t PyTorch ad hoc crazy and Theano -> TF -> JAX hasn't really cooled much: (e.g. https://neel04.github.io/my-website/blog/pytorch_rant/ ) but re: what you describe, it seems like mature linear algebra w/complex number support that hits appropriate hardware level optimization and cpu/gpu affinity is the main gist of the thing you need? (it's not clear to me, looking through packages like PennyLane for instance, where the backend use and interop begin & end for JAX, PyTorch, etc).
actually digging into them more, looks like they're heavily in JAX but in escape hatches & various lower bits fiddling w/its behavior all over the place: https://github.com/PennyLaneAI/catalyst
tech.resource integration is done using track and releasing!. Maybe in the final API I will wrap the releasing! macro in some with-arrayfire, which could also handle device initialization and optionally backend selection.
(releasing!
  (-> (data/constant 1.0 [10 10] defs/AF_DTYPE_F32)
      (arith/sin)
      (util/print-array)))
Now you don't need a binding vector anymore, and all new AFArray handles are automatically tracked inside a releasing! block.
Is there a way to return an array created inside a releasing! block?
Should be, by converting it to a Clojure data structure (vector-based or maybe Fastmath-based).
That's the fun part, designing an ergonomic Clojure API. I will take a look at the dtype-next API first.
After we have a nice Clojure API, I will build a complex linear algebra backend for QClojure. That's why I wrapped ArrayFire in the first place. I hope to increase the performance of the quantum computing simulations by 1 or 2 orders of magnitude, compared to the current FastMath based backend.
All C API bindings are in place now.
I will look into tech.resource and the dtype-next integration now. AutoCloseable with with-open falls short on ergonomics, because all arrays have to be registered via the bindings. This effectively prevents nice pipelines with threading macros, if you want control over the resource lifecycle. This problem should be addressable with tech.resource, which could still use the AutoCloseable and Cleaner infrastructure.
> AutoCloseable with with-open falls short on ergonomics, because all arrays have to be registered via the bindings. This effectively prevents nice pipelines with threading macros, if you want control over the resource lifecycle.
AutoCloseable can still be helpful in this situation. The two techniques that make it easier to work with are:
(with-open [fire-ctx (create-fire-ctx)]
  ;; do array stuff
  ;; all created instances get registered with the enclosing `fire-ctx`
  ;; instances get released at the end
  ;; there's some way to mark instances for escape.
  )
The other technique is similar to how https://dragan.rocks/articles/19/Deep-Learning-in-Clojure-From-Scratch-to-GPU-1-Representing-Layers-and-Connections does it:
(with-release [x (dv 0.3 0.9)
               w1 (dge 4 2 [0.3 0.6
                            0.1 2.0
                            0.9 3.7
                            0.0 1.0]
                       {:layout :row})
               h1 (dv 4)
               w2 (dge 1 4 [0.75 0.15 0.22 0.33])
               y (dv 1)]
  (println (mv! w2 (mv! w1 x h1) y)))
I have had issues relying purely on the garbage collector for cleaning up resources. It's great for tinkering, but since the handle is fairly small, the GC doesn't seem to prioritize cleaning it up, even if the handle points to some huge resources.
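The first technique, a context that owns everything created inside it and lets selected instances escape, could look roughly like this. It is a sketch with hypothetical names (`create-ctx`, `register!`, `escape!`, `resource`); the resources are dummies that log their id on close.

```clojure
(defprotocol Ctx
  (register! [ctx resource] "Hands ownership of resource to ctx.")
  (escape!   [ctx resource] "Removes resource from ctx so it outlives it."))

(defn create-ctx
  "Returns a closeable context that closes whatever is still registered,
  newest first."
  []
  (let [tracked (atom [])]
    (reify
      Ctx
      (register! [_ r]
        (swap! tracked conj r)
        r)
      (escape! [_ r]
        (swap! tracked (fn [rs] (vec (remove #(identical? % r) rs))))
        r)
      java.lang.AutoCloseable
      (close [_]
        (doseq [^java.lang.AutoCloseable r (rseq @tracked)]
          (.close r))))))

;; Dummy resource that records its id when closed.
(defn resource [log id]
  (reify java.lang.AutoCloseable
    (close [_] (swap! log conj id))))

(def log (atom []))

;; :a is closed with the context, :b is marked for escape and returned.
(def kept
  (with-open [ctx (create-ctx)]
    (let [a (register! ctx (resource log :a))
          b (register! ctx (resource log :b))]
      (escape! ctx b))))
```

After the block, only `:a` has been closed; the caller now owns `kept` and has to close it explicitly.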
;; Require the arrayfire API core namespace.
(require '[org.soulspace.arrayfire.api.core :as af])
;; Within the ArrayFire context, do some array math.
(af/with-arrayfire {:backend :opencl
                    :converter-fn af/->value}
  (-> (af/array [1.0 2.0 3.0 4.0] [2 2])
      (af/* (af/array [1.0 2.0 4.0 8.0] [2 2]))
      (af/sin)))
This is a basic example of doing math with arrays on the GPU with the Clojure API on the OpenCL backend. Depending on the machine, you can also select :cuda, :oneapi or :cpu. With the :converter-fn, you can control how the values are converted when leaving the ArrayFire context, as the AFArrays will be released then.
Fantastic
what's the difference between passing :converter-fn and calling the same function on the return value of the body of with-arrayfire?
There's a default converter function in place, which you can override. The default converter returns a double array.
is :converter-fn the same as calling the same function on the return value of the body?
Yes, but it happens at the end of the with-arrayfire macro automatically. You could still leak the array by providing identity as the converter-fn.
It's a safety net.
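Why converting inside the macro acts as a safety net can be shown with a toy model. Nothing here is the arrayfire-clj implementation: `with-device`, `device-array` and `->host` are hypothetical, and "releasing" just means removing an id from a set.

```clojure
;; Ids of simulated device arrays that are currently alive.
(def alive (atom #{}))

(defn device-array [id data]
  (swap! alive conj id)
  {:id id :data data})

(defn ->host
  "Copies a simulated device array to host data; fails once it is released."
  [{:keys [id data]}]
  (if (contains? @alive id)
    (vec data)
    (throw (ex-info "array already released" {:id id}))))

(defmacro with-device
  "Runs body, applies the converter while the arrays are still alive, then
  releases everything created inside the block."
  [{:keys [converter-fn]} & body]
  `(let [before# @alive]
     (try
       (~(or converter-fn `->host) (do ~@body))
       (finally
         (reset! alive before#)))))

;; Default converter: host data is produced before the release happens.
(def host-result
  (with-device {} (device-array 1 [1.0 2.0])))

;; identity as converter leaks a handle to an array that is already gone.
(def leaked
  (with-device {:converter-fn identity} (device-array 2 [3.0])))
```

In this model, calling `(->host leaked)` after the block throws, whereas `host-result` is plain data that stays valid.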
;; No backend/device/manual-eval switching, so no lock or try/finally is needed
`(do
   (device/ensure-af-init!)
   (binding [*within-arrayfire?* true]
     (with-open [~arena-sym (bmem/open-arena ~arena-type)]
       (binding [bmem/*af-arena* ~arena-sym]
         (stack-resource-context
           (let [~result-sym (do ~@body)]
             (device/sync!)
             (result-convert ~converter ~result-sym)))))))
This is one path of the with-arrayfire macro. All C allocations happen in a given Arena, and the AFArray and ArrayFire array refcounts are managed in a tech.resource stack-resource-context. So accessing the AFArray outside of the macro will not work. But you have access to the lower layers of the wrapper and can handle the resources yourself, if necessary.
I'm curious what af/->value returns.
Personally, I would be tempted to just accept and return dtype compatible stuff. If you want a java array or nio buffer, you can call the appropriate dtype function to convert it.
A scalar or vector (of vectors for higher dimensions). Clojure data as opposed to Java Arrays.
Unless you're not using dtype-next under the hood. Then never mind.
Very neat!
I have some https://github.com/lsolbach/arrayfire-clj/blob/main/src/org/soulspace/arrayfire/integration/dtype_next/dtype_next.clj integration. One thing I have to investigate is the native storage differences and conversions. ArrayFire uses column-major storage and dtype-next, AFAIK, uses row-major.
Both can handle strides, so conversion is possible. But I'm not yet sure of the tradeoffs.
Everything is backed by interfaces, so I believe either could be supported, but it's probably more work to support if they don't match. I also don't know off the top of my head.
I hope changes can be made in the integration layer without impact on the Clojure API.
There are a few things missing. Some spec or input validation is needed, because it is still possible to crash or hang the JVM. Never crashed a JVM as often as in the last few weeks.
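A sketch of the kind of input validation meant here, run before anything reaches native code. `check-array-args` is a hypothetical helper, not part of arrayfire-clj; the limit of four dimensions reflects ArrayFire's array model.

```clojure
;; ArrayFire arrays have at most four dimensions.
(def max-dims 4)

(defn check-array-args
  "Validates host data and dims before they are passed to native code.
  Throws ex-info on invalid input instead of risking a JVM crash."
  [data dims]
  (let [fail (fn [msg]
               (throw (ex-info msg {:dims dims :count (count data)})))]
    (cond
      (not (and (sequential? dims) (<= 1 (count dims) max-dims)))
      (fail "dims must contain 1 to 4 entries")

      (not (every? pos-int? dims))
      (fail "every dimension must be a positive integer")

      (not= (count data) (reduce * dims))
      (fail "element count does not match dims")

      :else
      {:data data :dims dims})))

;; Valid input passes through unchanged.
(check-array-args [1.0 2.0 3.0 4.0] [2 2])
```

A mismatch such as four elements with dims `[3 2]` then surfaces as a Clojure exception with the offending values attached.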
It seems likely that dtype-next can be integrated without too much trouble. For passing data to ArrayFire, you can easily get pointers. For getting data from ArrayFire, there are helpers for wrapping addresses.
I'm very good at crashing the JVM as well.
It's not as hard as one might imagine.
And I have to build some examples to get a feel for the rough edges of the API.
Not if you reach out into C territory.
I haven't used C/C++ in over 25 years now. I didn't even remember how bad it was.
Not correct, I implemented stuff on Arduinos.
> ArrayFire uses column-major storage and ArrayFire, AFAIK, uses row-major. I'm confused. Which one is which?
EJML (the JVM matrix library I use) and dtype-next tensors are also row-major. So, hopefully, we can have dtype-next tensors (and the new ComplexTensor type we are discussing) wrapping both ArrayFire and EJML with a similar API. Anyway, dtype-next tensors allow us to transpose (i.e., swap rows and columns) as a cheap, lazy-and-non-caching construct wrapping our data, so I think we can also use column-major libraries and still have the same API for everything.
The normal Clojure vector-of-vectors format is row-major, so I have to convert it at the edge. With dtype-next you can specify the storage layout via strides, so it should work for both formats.
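The relationship between the two layouts and their strides can be made concrete for the 2D case. The helper names below are hypothetical and use plain Clojure vectors rather than dtype-next or ArrayFire buffers.

```clojure
;; Strides are given as [row-stride col-stride] for a [rows cols] shape.
(defn row-major-strides [[_ cols]] [cols 1])
(defn col-major-strides [[rows _]] [1 rows])

(defn offset
  "Flat index of element (r, c) for the given strides."
  [[row-stride col-stride] r c]
  (+ (* row-stride r) (* col-stride c)))

(defn rows->col-major
  "Flattens a vector of row vectors into column-major order."
  [rows]
  (let [n-rows (count rows)
        n-cols (count (first rows))]
    (vec (for [c (range n-cols)
               r (range n-rows)]
           (get-in rows [r c])))))

(def m [[1 2 3]
        [4 5 6]])

(def row-major (vec (apply concat m)))   ; [1 2 3 4 5 6]
(def col-major (rows->col-major m))      ; [1 4 2 5 3 6]

;; The same logical element (row 1, column 2) in both layouts:
(nth row-major (offset (row-major-strides [2 3]) 1 2))
(nth col-major (offset (col-major-strides [2 3]) 1 2))
```

Both lookups return 6, so a strided view can present either buffer with the same logical indexing; the open question from the discussion is only which side pays for a copy when one is unavoidable.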
Ah, dtype-next -> row-major by default. Updated the post above.
First iteration of the idiomatic Clojure API for ArrayFire is done. I will have to check it for completeness and then build some examples to see how it feels.
The JVM integration is also mostly in place now, with exception handling and resource management for the arrays via Cleaner and AutoCloseable integration. It is modeled after the ArrayFire Unified API. The next step would be the design of a nice idiomatic Clojure API and the integration with dtype-next.
Sonnet generated a nice overview of the ArrayFire Unified API (https://github.com/lsolbach/arrayfire-clj/blob/main/dev/arrayfire-unified-cpp-api-catalog.md). It's nice to see all the available functions grouped by area in one place.
Shared an update here at Zulip: https://clojurians.zulipchat.com/#narrow/channel/488851-scicloj-webpublic/topic/complex.20linear.20algebra/near/574818206
Any thoughts on that?