uncomplicate

quoll 2025-12-18T18:26:44.038569Z

I'm curious… does Neanderthal offer async execution on CUDA? Or can it return a future?

quoll 2025-12-18T18:59:21.017339Z

Returning a future or parking a virtual thread would need a stream callback. I see that there is one in clojurecuda at src/java/uncomplicate/clojurecuda/internal/javacpp/CUStreamCallback.java, but I can't find it being used anywhere.

quoll 2025-12-18T19:24:47.769749Z

Thinking out loud here (so please feel free to correct my errors). It occurs to me that maybe there is scope to have virtual thread support in Neanderthal. For instance:
• C++: Implement JNI_OnLoad and keep the VM pointer.
• C++: Register a CUDA completion function with cudaLaunchHostFunc.
• C++: Call a kernel function, or cudaMemcpyAsync. Then return to the JVM.
• JVM: Return a CompletableFuture (or wait on a CountDownLatch, or call LockSupport.park()).
• C++: After the CUDA operation is done, the completion function calls vm->AttachCurrentThread() / env->some_neanderthal_function() / vm->DetachCurrentThread().
• JVM: The neanderthal function has been called from C++ land. This function completes the CompletableFuture, or counts down the latch, or calls LockSupport.unpark().

From the JVM's perspective, it could look like a Future, or like a blocking call that can be preempted. I've worked with operations that took hundreds of milliseconds to transfer data across the bus, or that take a while to execute on the device, so I thought it might be useful to return to the JVM until the device is ready.
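To make the JVM half of those steps concrete, here is a rough Java sketch. It is not Neanderthal or clojurecuda code: the class name, `launch`, and `onDeviceComplete` are all hypothetical, and the native side (the kernel launch plus the function registered with cudaLaunchHostFunc) is only simulated by a plain Java thread that sleeps and then calls back.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: none of these names exist in Neanderthal or clojurecuda.
final class AsyncCompletionSketch {

    // Pending operations, keyed by an opaque handle that the native side
    // would carry as the user-data argument of its completion function.
    private static final ConcurrentHashMap<Long, CompletableFuture<Integer>> PENDING =
            new ConcurrentHashMap<>();
    private static final AtomicLong NEXT_HANDLE = new AtomicLong();

    // The method the native completion function would invoke through JNI
    // after attaching its thread to the VM. Status 0 is assumed to mean success.
    static void onDeviceComplete(long handle, int status) {
        CompletableFuture<Integer> future = PENDING.remove(handle);
        if (future == null) {
            return; // unknown or already completed handle
        }
        if (status == 0) {
            future.complete(status);
        } else {
            future.completeExceptionally(
                    new IllegalStateException("device operation failed with status " + status));
        }
    }

    // Stand-in for "enqueue the work and return to the JVM". A real version
    // would call a native method here; this one starts a thread that plays
    // the role of the CUDA callback thread.
    static CompletableFuture<Integer> launch(long simulatedDeviceMillis, int simulatedStatus) {
        long handle = NEXT_HANDLE.incrementAndGet();
        CompletableFuture<Integer> future = new CompletableFuture<>();
        PENDING.put(handle, future);

        Thread fakeCallback = new Thread(() -> {
            try {
                Thread.sleep(simulatedDeviceMillis); // pretend the device is busy
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            onDeviceComplete(handle, simulatedStatus);
        }, "fake-cuda-callback");
        fakeCallback.setDaemon(true);
        fakeCallback.start();

        return future; // the caller gets control back immediately
    }

    public static void main(String[] args) {
        CompletableFuture<Integer> future = launch(50, 0);
        // join() blocks the calling thread until the callback fires.
        System.out.println("status = " + future.join());
    }
}
```

If the caller is a virtual thread, I'd expect the `join()` to park it rather than tie up a platform thread, which is the effect I was after. One caveat I'm fairly sure of: a function enqueued with cudaLaunchHostFunc must not make CUDA API calls itself, so the callback should do nothing more than signal the JVM.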

2025-12-18T23:34:51.820669Z

All GPU-related calls in Neanderthal are async (when it makes sense).

👍 2
quoll 2025-12-18T23:50:38.216539Z

Thank you!