Hi. It is not too late to join the Scicloj real-world-data group. https://clojureverse.org/t/real-world-data-meeting-10/
I'm prototyping a wrapper for https://github.com/ggerganov/ggml, a tensor library for machine learning that powers llama.cpp, whisper.cpp, and about a dozen other ML libraries. ggml has multiple backends, including CPU, NVIDIA GPUs, and Apple Silicon. It seems like it's possible to get the boilerplate down to a pretty reasonable API:
;; one scheduler per backend
(def cpu-sched (cpu-scheduler))
(def gpu-sched (gpu-scheduler))

;; a graph fn takes a ggml context plus input tensors
;; and returns one or more output tensors
(def my-graph
  (fn my-graph [ctx a b]
    (let [out (raw/ggml_scale ctx (raw/ggml_add ctx a b) -1)]
      ;; multiple outputs
      [out
       (raw/ggml_sum_rows ctx out)])))

(def n 10000)
(def a (float-array (repeatedly n rand)))
(def b (float-array (repeatedly n rand)))

;; run the same graph on each backend
(def results-cpu (time (compute cpu-sched my-graph a b)))
(def results-gpu (time (compute gpu-sched my-graph a b)))

;; compare the second output (the row sums) across backends
(prn (-> results-cpu second seq)
     (-> results-gpu second seq))
This is definitely not a final API, but it seems promising.
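As a rough sketch of how the same conventions might extend, here's a single-output graph. raw/ggml_mul and raw/ggml_sum mirror ggml's C functions of the same names, but the one-element output vector and the rest are my assumptions about the prototype, not its actual API:

;; hedged sketch: elementwise product reduced to a scalar (a dot product),
;; following the same graph-fn conventions as above
(def dot-graph
  (fn dot-graph [ctx a b]
    [(raw/ggml_sum ctx (raw/ggml_mul ctx a b))]))

(def dot-cpu (compute cpu-sched dot-graph a b))
(prn (-> dot-cpu first seq))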
Do you find it better to go directly to ggml vs. through llama.cpp?
Depends on the use case. For generating tokens from an LLM, the llama.cpp API is much higher level. It would probably be a lot of work to reimplement the models llama.cpp supports using just ggml.
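To make the contrast concrete, here's a hedged sketch of the high-level route. The names are loosely modeled on llama.clj and are assumptions, not a verified API:

;; hypothetical high-level usage, loosely modeled on llama.clj;
;; create-context and generate-string are assumptions here
(require '[com.phronemophobic.llama :as llama])

(def llama-ctx (llama/create-context "models/llama-2-7b-chat.Q4_0.gguf"))
(llama/generate-string llama-ctx "What is a tensor?")

Going straight to ggml would mean rebuilding each model's attention and feed-forward graph by hand out of raw/ggml_* calls.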
I see, thanks!
Theoretically, it doesn't seem like it would be that bad to implement models directly, but I'm not sure there's a better way to learn the model architectures than reading a bunch of papers, C++, or Python.