uncomplicate

2024-07-18T11:22:28.328149Z

@quoll To be honest, I am not sure ATM, as I wrote these functions many years ago. It is probable that it was part of the common pattern at one point, and then I just missed the opportunity to trim these arguments during one of the big refactorings. As-is, it doesn't introduce notable overhead, but if you are keen on testing this, you might try to remove them and we can see whether it is better that way. I suppose that something similar happens in the OpenCL engine.

2024-07-18T11:28:13.395889Z

@quoll BTW I would really appreciate any suggestions about how to improve Neanderthal, since I can see that you are looking at the details! Feel free to experiment with it, too 🙂 PS Critiques are welcome, please don't hesitate to point out the things that you think should have been done differently.

quoll 2024-07-18T11:58:22.543559Z

Unfortunately, I’m not in a position to test any changes right now, as I don’t have an Intel-based computer. Instead, I’m trying to rewrite the CUDA functions as Metal for a Mac. It’s mostly a simple translation but there are a LOT of functions! And NVIDIA have a few math functions that Metal doesn’t (mostly non-POSIX functions like https://docs.nvidia.com/cuda/cuda-math-api/group__CUDA__MATH__DOUBLE.html#group__CUDA__MATH__DOUBLE_1gef012e8d10e9ef980940f65630f77ae3). Once I’m done and tested, I’m still a long way from writing the Neanderthal engine for it. So I’ll be a while. I’m about ⅓ of the way through translating the CUDA file. I’m out of the vect_ functions and into the ge_ functions.

octahedrion 2024-07-18T12:03:45.545839Z

@quoll you are a legend! Out of interest, what are the core functions that are used 80% of the time in real-world computations?

quoll 2024-07-18T12:23:39.907659Z

For me? That would be transfer!, dot, asum… that sort of thing. But others show up
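For readers unfamiliar with the names: dot and asum are standard BLAS level-1 operations (dot product, and sum of absolute values). A minimal reference sketch of what they compute, in plain C; the names `ref_dot` and `ref_asum` are hypothetical and this is not Neanderthal's or any GPU engine's implementation:

```c
#include <math.h>
#include <stddef.h>

/* Reference dot product: sum of x[i] * y[i] over n elements.
   Hypothetical helper for illustration only. */
double ref_dot(size_t n, const double *x, const double *y) {
    double acc = 0.0;
    for (size_t i = 0; i < n; i++) {
        acc += x[i] * y[i];
    }
    return acc;
}

/* Reference asum: sum of the absolute values of n elements.
   Hypothetical helper for illustration only. */
double ref_asum(size_t n, const double *x) {
    double acc = 0.0;
    for (size_t i = 0; i < n; i++) {
        acc += fabs(x[i]);
    }
    return acc;
}
```

A GPU version would typically split these loops into a parallel reduction, but the result it has to reproduce is the same.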

2024-07-18T13:15:58.548369Z

@quoll how do you access Metal from the JVM? Do you use any Java library, or did you have to write your own through JNI/JNA/etc.?

quoll 2024-07-18T13:21:42.534849Z

I’m taking the JNI path.

quoll 2024-07-18T13:22:57.808519Z

I learned how to use Metal via Swift, but I’m moving over to doing it in C. All the references are in Objective-C though, and I’m still learning how to change that over.

quoll 2024-07-18T13:24:18.787159Z

JNA would make more sense if there was a cross-platform component, but since it’s entirely a Mac platform and it’s a performance-sensitive library, JNI/C makes the most sense.

wikipunk 2024-07-18T14:02:23.997459Z

There is Metal support in LWJGL3 via Vulkan compatibility with MoltenVK, so you might be able to avoid writing JNI yourself for the Metal compute shader part.

octahedrion 2024-07-18T14:54:27.105739Z

BTW today I learned about https://github.com/apangin/nalim