data-science

Ovi Stoica 2025-08-06T11:24:45.108909Z

Hello, is there a NumPy alternative available in Clojure? I need it for these operations:

audio_int16 = np.frombuffer(buffer, np.int16)  # ← signed int16
audio_float32 = audio_int16.astype(np.float32) / 32768.0  # ← signed range
Basically, I need a very efficient way to convert a byte array holding signed 16-bit integers into a float32 array, and then get back the bytes of that float32 representation. I could write (and did write) something like this in Java using ByteBuffer, but I’m curious whether there is an even more efficient library for it before I resort to bit operations myself 😄
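For reference, here is a minimal sketch of the round trip the NumPy snippet performs (int16 bytes in, float32 values, float32 bytes out), written with Python's standard `struct` module so the byte layout is explicit. It assumes little-endian PCM, which matches what `np.frombuffer` produces on typical x86/ARM hosts since it reads in native byte order. The function names are made up for illustration.

```python
import struct


def int16_bytes_to_float32(buf: bytes) -> list[float]:
    """Decode little-endian signed 16-bit PCM and scale to [-1.0, 1.0)."""
    if len(buf) % 2:
        raise ValueError("buffer length must be a multiple of 2")
    n = len(buf) // 2
    samples = struct.unpack(f"<{n}h", buf)  # 'h' = signed 16-bit integer
    return [s / 32768.0 for s in samples]


def float32_to_bytes(values: list[float]) -> bytes:
    """Encode values as little-endian IEEE-754 float32 bytes."""
    return struct.pack(f"<{len(values)}f", *values)


# Four samples covering zero, half scale and both extremes of int16.
pcm = struct.pack("<4h", 0, 16384, -32768, 32767)
floats = int16_bytes_to_float32(pcm)
out = float32_to_bytes(floats)  # 4 samples * 4 bytes = 16 bytes
```

With NumPy itself, the last step would be `audio_float32.tobytes()`. Note that -32768 maps to exactly -1.0 while 32767 maps to just under 1.0, which is why the range is described as signed and half-open.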

Ovi Stoica 2025-08-06T11:28:44.687929Z

The purpose is to send an audio chunk to a VAD analyzer like Silero, which expects the input to be float32.

respatialized 2025-08-06T12:12:18.103459Z

https://github.com/cnuernber/dtype-next should do what you want. Just as NumPy arrays back Pandas dataframes, dtype-next buffers back the columnar format used by https://github.com/techascent/tech.ml.dataset and the higher-level https://github.com/scicloj/tablecloth wrapper.

phronmophobic 2025-08-06T12:15:18.829179Z

Unless you’re doing any further processing, using Java NIO byte buffers should also be pretty straightforward.

Ovi Stoica 2025-08-06T12:16:53.535689Z

Yeah, but that is slow compared to native byte operations from what I saw. This needs to happen in the context of continuous realtime audio coming in where you need to decide if the user started/stopped speaking. Context: https://github.com/shipclojure/simulflow

phronmophobic 2025-08-06T12:17:24.825869Z

depending on how you are getting your audio samples, you might be able to request them as floats directly so you don’t have to convert them

Ovi Stoica 2025-08-06T12:19:22.297969Z

Do you mean, from AudioSystem, for example? The base audio format through the pipeline is 16kHz PCM mono. AFAIK, this is usually represented as ints

phronmophobic 2025-08-06T12:23:23.920869Z

I think it depends on the platform. You’re allowed to request different formats. Not all formats may be supported

phronmophobic 2025-08-06T12:25:45.464519Z

> Yeah, but that is slow compared to native byte operations from what I saw. This needs to happen in the context of continuous realtime audio coming in where you need to decide if the user started/stopped speaking

Did you profile to see what the slow part was? If anything, I would guess boxed math as the culprit and not byte buffers.

phronmophobic 2025-08-06T12:27:26.037799Z

A single audio input is usually not very high data throughput, so you can usually get away with any approach.

phronmophobic 2025-08-06T12:38:14.296499Z

One of the challenges with realtime audio isn’t data processing but making sure threads wake up promptly when new data arrives. You want to keep your buffers full enough that there are no blips and cracks, but not so full that they introduce delays.

Ovi Stoica 2025-08-06T12:43:48.579729Z

Interesting! I’d love to hear more about this! Currently, audio in the AI pipeline is split into chunks (usually 32ms, but they can be shorter), and the processors in the pipeline work with those chunks.
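As a rough illustration of the sizes involved, the sketch below works out how many samples and bytes a 32ms chunk of 16kHz mono audio holds, and splits an incoming byte stream into whole chunks. The constant and function names are hypothetical, and carrying the leftover tail into the next read is one assumed way of handling partial chunks, not a description of how simulflow does it.

```python
SAMPLE_RATE_HZ = 16_000   # 16 kHz mono PCM, as in the pipeline described above
CHUNK_MS = 32             # typical chunk length mentioned above
BYTES_PER_INT16 = 2
BYTES_PER_FLOAT32 = 4


def samples_per_chunk(rate_hz: int, chunk_ms: int) -> int:
    """Number of samples in one chunk, using integer math to avoid rounding."""
    total = rate_hz * chunk_ms
    if total % 1000:
        raise ValueError("chunk length is not a whole number of samples")
    return total // 1000


def split_chunks(pcm: bytes, chunk_bytes: int) -> tuple[list[bytes], bytes]:
    """Split a byte stream into whole chunks plus the leftover tail."""
    usable = len(pcm) - len(pcm) % chunk_bytes
    chunks = [pcm[i:i + chunk_bytes] for i in range(0, usable, chunk_bytes)]
    return chunks, pcm[usable:]


n = samples_per_chunk(SAMPLE_RATE_HZ, CHUNK_MS)   # 512 samples
int16_chunk_bytes = n * BYTES_PER_INT16           # 1024 bytes in
float32_chunk_bytes = n * BYTES_PER_FLOAT32       # 2048 bytes out
chunks, tail = split_chunks(bytes(2500), int16_chunk_bytes)
```

So each 32ms chunk is only 1 KiB of int16 input and 2 KiB of float32 output, which supports the point that a single audio stream is low throughput; here 2500 bytes yield two full chunks and a 452-byte tail.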

phronmophobic 2025-08-06T12:45:39.074789Z

Ok, I feel like you should be good. I think the minimum resolution for sleeps is typically like 2-3ms if I recall correctly.

phronmophobic 2025-08-06T12:47:45.767759Z

I’m curious why you thought the short to float conversion was slow. It can’t hurt to make it more efficient, but if there are delays I would expect them to be elsewhere.

phronmophobic 2025-08-06T12:48:34.891479Z

the sleep resolution is more of a problem for playing audio rather than consuming audio

phronmophobic 2025-08-06T12:49:41.555509Z

I’m away from my computer. Maybe it is slow. I don’t have any easy way to check at the moment.

Ovi Stoica 2025-08-06T12:52:24.617849Z

I didn’t want to go ByteBuffer basically because of this answer: https://stackoverflow.com/a/12347176

phronmophobic 2025-08-06T12:55:07.268919Z

Those benchmarks are on android so I’m not sure they would hold up on a desktop jvm.

Ovi Stoica 2025-08-06T12:55:52.303239Z

Lol. Didn’t see that

phronmophobic 2025-08-06T12:57:30.618609Z

I’m also not sure that code is doing what you want. I think you need to convert the signed integer to a float with a division. https://github.com/phronmophobic/whisper.clj/blob/bae472e3f3d4da0a723b6037bf5aefc6bf1974a3/src/com/phronemophobic/whisper.clj#L54 Unless my code is wrong, which would be good to know. It works for my use case, but might be technically wrong.
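To illustrate why the numeric conversion and division matter, the sketch below contrasts three ways of reading the same 4 bytes of little-endian PCM. Reinterpreting the bytes as a float32 bit pattern (roughly what a float view over the raw bytes would do) or reading them as unsigned values both produce numbers that are not audio samples; decoding as signed int16 and then dividing by 32768.0 gives values in [-1.0, 1.0). This is a generic illustration and does not claim to reproduce either the linked code or the Stack Overflow answer.

```python
import struct

# Two signed 16-bit samples, little-endian: 4 bytes in total.
pcm = struct.pack("<2h", 1000, -1000)

# Wrong: treat the 4 bytes as the bit pattern of one float32.
(bitcast,) = struct.unpack("<f", pcm)

# Wrong: read the samples as unsigned 16-bit values (-1000 becomes 64536).
unsigned = struct.unpack("<2H", pcm)

# Right: decode as signed int16, then convert numerically and scale.
signed = struct.unpack("<2h", pcm)
scaled = [s / 32768.0 for s in signed]
```

The bit-cast yields one enormous value nowhere near the audio range, and dividing the unsigned reading by 32768 would give roughly 1.97 for the second sample, outside [-1.0, 1.0).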

phronmophobic 2025-08-06T12:58:27.099009Z

That function could be sped up for sure. I’m not sure it’s likely to be a bottleneck.

2025-08-13T09:13:28.189849Z

@ovidiu.stoica1094 Sorry for the late answer, I was on vacation. Deep Diamond has transformers that can convert pretty much any useful tensor to any other tensor backed by Intel's native x86 ops. I doubt anything you'll find would go faster than that (if used properly). The transform function (https://github.com/uncomplicate/deep-diamond/blob/0e054b85579120d90ef861b5562c929d80051ae0/deep-diamond-base/src/uncomplicate/diamond/tensor.clj#L302) will create a custom transformer function optimized for your particular combination of shape, layouts, and data types, and you can then call it many times on ever changing data...
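The "create once, call many times" idea can be sketched without Deep Diamond. The Python below is not Deep Diamond's API; it is a hypothetical helper showing the same structure: per-shape setup (parsing the unpack format, allocating the float32 output buffer) happens once for a fixed chunk size, and the returned function is then called on each new chunk.

```python
import array
import struct


def make_int16_to_float32_converter(n_samples: int):
    """Hypothetical helper: do the per-shape setup once, reuse it per chunk."""
    unpacker = struct.Struct(f"<{n_samples}h")   # format parsed once
    out = array.array("f", [0.0]) * n_samples    # preallocated float32 buffer
    scale = 1.0 / 32768.0                        # exact power of two

    def convert(pcm: bytes) -> array.array:
        # unpack raises struct.error if pcm is not exactly 2 * n_samples bytes
        for i, sample in enumerate(unpacker.unpack(pcm)):
            out[i] = sample * scale
        return out   # same buffer every call: copy it if you need to keep it

    return convert


convert = make_int16_to_float32_converter(4)
first = convert(struct.pack("<4h", 0, 16384, -32768, 32767))
```

`out.tobytes()` then gives the float32 bytes in native byte order. A pure-Python loop will not be fast; the point is only the shape of the design, where the cost of setup is paid once and each call reuses the same output buffer instead of allocating a new one.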