C++ hosted Clojure (https://jank-lang.org/) + (py)Torch
Hello Clojure data science folks: We have been working on bringing native Torch into the Clojure world (think PyTorch, but for Clojure) and we would love your feedback! This is made possible by jank (https://jank-lang.org/, #jank), the native C++ hosted Clojure compiler and runtime. jank allows seamless interop with C++ code, and that's exactly how we are bringing Torch into Clojure (via the C++ API: https://docs.pytorch.org/docs/stable/cpp_index.html). We are still in the early days, but we will be presenting at the Macroexpand-Deep conf (https://scicloj.github.io/macroexpand-2025/macroexpand_deep.html) later this month!
We would love the Clojure data science community's help and feedback on a few topics:
• What does your workflow look like when working with Clojure on data science and ML projects? I'm familiar with the Python side of things, like data processing with pandas, visualization with matplotlib, etc. It would be nice to know how the Clojure community is doing things.
• How do you train your models with Clojure? Do you have to do it outside of Clojure? If Torch were available to you in Clojure, would that help with your workflow?
• Imagine that Clojure Torch is at the same level of usability as PyTorch. What other libraries would you want to bring into Clojure? Keep in mind that jank is capable of seamless C++ interop, which means it's easy to bring in or build high-performance libraries in C++ and use them in Clojure.
I have a python plotly server that I have instrumented with an HTTP server for Clojure visualizations. I seem to prefer that to the other frameworks I have tried over the years.
> I have a python plotly server
This sounds interesting. What are the inputs/outputs? (json->svg/png?)
My plotly-based server has a few endpoints: a spectrogram endpoint that renders local audio files, and a variety of scatterplot and histogram endpoints that I can send data to directly via a JSON payload.
The Clojure side is very straightforward:
(defn two-d
  [coll]
  (let [cc (count coll)
        body (-> {:data (->> coll
                             (map-indexed
                              (fn [i y]
                                {:x i
                                 :y y}))
                             vec)
                  :options {:show-markers false}}
                 keys->snake_case
                 json/write-str)]
    (http/post
     (str server "plot/scatter")
     {:content-type :json
      :body body})))
for example
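As a rough sketch of what gets sent, the payload can be built on its own, leaving out the JSON and HTTP steps. `scatter-payload` is a hypothetical name used only for illustration; it mirrors the map that `two-d` builds before `keys->snake_case` and `json/write-str` are applied, and it assumes nothing about the server beyond what is shown above.

```clojure
;; Hypothetical helper (not part of the original code): builds the
;; request map only, using clojure.core and no external libraries.
(defn scatter-payload
  [coll]
  {:data    (vec (map-indexed (fn [i y] {:x i :y y}) coll))
   :options {:show-markers false}})

;; Each value is paired with its index as the x coordinate.
(scatter-payload [0.5 1.5 4.0])
;; => {:data [{:x 0, :y 0.5} {:x 1, :y 1.5} {:x 2, :y 4.0}],
;;     :options {:show-markers false}}
```

Keeping the payload construction separate from the POST makes it easy to check at the REPL before anything goes over the wire.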
^ 🆒
I am not using Clojure for ML, but if you are looking to bring Computer Vision into Jank, an OpenCV wrapper could be pretty useful in addition to libtorch. I have not seen people doing Computer Vision in Clojure, likely because the JVM is not an optimal platform for that.
In Computer Vision it is pretty normal to develop in Python and then rewrite everything in C++ for real-time use. Jank could make putting research code into production easier.
@levitskyyandriy - Interesting things: https://github.com/bytedeco/javacpp-presets/tree/master/opencv and https://github.com/bytedeco/javacv ... ten years ago we called opencv from clojure (https://github.com/techascent/tech.opencv) using other technologies, but that didn't turn out to be worth it. Python opencv from libpython-clj would be another option.
My hypothesis is that OpenCV could be more useful within Jank than within JVM Clojure, because C++ is still being used a lot in production for Computer Vision, while Java much less so
@jianlingzh you probably already know, but Karpathy released a C/CUDA version of nanoGPT: https://github.com/karpathy/llm.c
There are also a few C++ implementations referenced in the "notable forks" part of the README
yeah, i'm aware of it but have not looked into it much yet. would be nice to get a jank version up there. 🙂
The jank team members working on this effort are @jeaye (creator of jank), @shantanu.s.sardesai, @stmontydev and myself. Feel free to ask us any questions you have!
Can you compare and contrast this idea (using jank to operate libtorch) with another similar idea, which would be using libpython-clj to operate pytorch?
> data processing with pandas, visualization with matplotlib
We use TMD (https://github.com/techascent/tech.ml.dataset) and tableplot (https://scicloj.github.io/tableplot/) and clay.
I have done some work using Deep Java Library (DJL), which already exposes PyTorch (and TensorFlow and MXNet). It is not ideal, but it works, especially if you hook it up with proxy+plus. I have also used DuckDB to operate directly on data with HoneySQL, and this is mostly enough.
I have also bound jank to libtorch, but I haven't done anything meaningful other than printing ndarrays. Ideally I would like to operate on ndarrays using filter/map/reduce type operations.
I am generally interested in creating libtorch bindings for jank, so maybe I might be able to help?
Thanks for the pointer. I'm not familiar with libpython-clj and my answer is based on a fresh read on it, so correct me if anything is not accurate:
• The main difference between the two would be architecture and performance:
◦ With libpython-clj, you have two runtimes/VMs: the JVM and the Python interpreter. As I understand it, the Python shared lib is loaded into the JVM via Java Native Access (JNA). But data must still be translated between the JVM's representation and the CPython interpreter's representation, and that has a cost. For very large datasets or frequent back-and-forth calls in a tight loop, this overhead can become a factor.
◦ With jank + libtorch, everything operates in-process. Because jank is a Clojure dialect hosted directly on C++, your Clojure code and the libtorch C++ code live in the same memory space and the same process.
• Once data is in the hands of the torch code, I don't think there is any difference in terms of the actual computation, since they are essentially running the same torch code.
• REPL:
◦ for libpython-clj, my understanding is that a Python object is an opaque reference. You can't easily def a PyTorch model in Clojure and then interactively inspect its internal state as if it were a Clojure map. The boundary is always there.
◦ for jank + libtorch, the goal is to have a unified REPL. A torch::Tensor object can feel like a first-class citizen in your Clojure REPL.
• Ecosystem:
◦ libpython-clj: this is its strong suit. With it you have access to the whole Python side of libs, like numpy and others. Python is still the best out there for ML and data science.
◦ jank: on the other hand, jank enables the use of the whole world of C++ libs out there. Admittedly, it's still early days and lots of work is still needed. But imagine the possibility of having native access in Clojure to CUDA, XLA, etc.
@hee-foo great to know that you are interested. let's talk in the #jank channel!
@jianlingzh I have also used libpython-clj and actually created some neural networks with it. Again, I had some issues there, especially with bringing up the Python interpreter in other threads. It doesn't have the 'native feeling' and is more of an 'interop' language than DJL. I generally don't think the back and forth is that much of a problem. It's more that you are using two very slow ways to operate, the Python interpreter and immutable data structures, and creating custom complex logic is brittle. I would love to code a state space model directly in jank and use it.
I also used libpython-clj to create some custom document processing in langchain4j (Clojure interop), which was much better, but I had to glue the Python and Java code together in Clojure. I ended up ditching it after a couple of months in favor of langchain4j's ONNX support, as I just exported the model from Python and used it directly.
@hee-foo glad to hear your experience with both. i suppose the lack of 'native feeling' compared with DJL would be caused by the data conversions between the two languages? are you talking about from a performance point of view or from language syntax/ergonomics point of view?
both, but the native feeling has more to do with the syntax/ergonomics, while I am concerned with robustness when creating layers with custom logic. I believe jank can potentially eliminate these issues.
> How does your workflow look like working with Clojure on data science and ML projects?
If that helps, a few of the libraries for relevant data science workflows are part of the Noj toolkit: https://scicloj.github.io/noj/noj_book.underlying_libraries.html Some typical workflows can be found in the Noj docs. (At least some of the high-level libraries such as Tablecloth, Tableplot, Clay could be ported to Jank, and creating a proof-of-concept might not be hard -- let us chat about it when it seems right. The lower-level libraries such as ham-fisted, dtype-next, tech.ml.dataset do rely on JVM optimisations, and it is not so immediate to imagine their future Jank equivalents, I think.)
> But data must still be translated between the JVM's representation and the CPython interpreter's representation.
There are zero-copy pathways between dtype-next tensors and numpy.
> your Clojure code and the libtorch C++ code live in the same memory space and the same process.
I know less about this approach, but I will be interested to see how it goes. For neural networks, somehow having the capabilities of pytorch from clojure would be very good. I think a good exercise would be to try and recreate karpathy's nanogpt ideas from clojure (whether that's using jank, or libpython-clj, or basilisp).
> I think a good exercise would be to try and recreate karpathy's nanogpt ideas from clojure (whether that's using jank, or libpython-clj, or basilisp).
That's exactly what we are doing! I've put a C++ version at https://github.com/jianlingzhong/nanoGPT-cpp and we are working on the jank/Clojure version of it. We will be presenting this at the conf!
Good! 👏
> your Clojure code and the libtorch C++ code live in the same memory space and the same process.
I'm pretty sure this is also the case when using libpython-clj
> for libpython-clj, my understanding is that a Python object is an opaque reference. You can't easily def a PyTorch model in Clojure and then interactively inspect its internal state as if it were a Clojure map. The boundary is always there.
Python has a lot of metadata that you can use to inspect data at runtime. Most clojure inspection is done through protocols. I'm not aware of any technical limitations to extending clojure's protocols and interfaces to python objects.
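To make the protocol point concrete, here is a minimal sketch. It is not libpython-clj's actual API: `OpaqueModel` is a hypothetical stand-in for a foreign object handle, and its `attrs` field stands in for whatever metadata the foreign runtime exposes. The only real pieces are Clojure's own `clojure.core.protocols/Datafiable` protocol and `clojure.datafy/datafy`.

```clojure
(require '[clojure.core.protocols :as p]
         '[clojure.datafy :refer [datafy]])

;; Hypothetical stand-in for an opaque foreign object; a real binding
;; would hold a native or Python reference instead of a Clojure map.
(deftype OpaqueModel [attrs])

;; Extending Datafiable lets generic Clojure tooling turn the opaque
;; handle into plain data for inspection.
(extend-protocol p/Datafiable
  OpaqueModel
  (datafy [m] (.-attrs m)))

(datafy (OpaqueModel. {:layers 2 :hidden 64}))
;; => {:layers 2, :hidden 64}
```

How much of a real foreign object can be surfaced this way depends on what the binding exposes, so treat this as an illustration of the mechanism only.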
> jank: on the other hand, jank enables the use of the whole world of C++ libs out there. Admittedly, it's still early days and lots of work is still needed. But imagine the possibility of having native access in Clojure to CUDA, XLA, etc.
The JVM has support for accessing native libraries. Accessing native libraries through the C ABI isn't too bad. For C++ you need to write or generate wrappers to use them from the JVM. Jank's interop with native libraries (especially C++) is much more ergonomic, but there are still many options on the JVM.
^ this last point is well-made, if calling libtorch turns out to be a good idea, it could also be done with dtype-next, probably.
I don't want to dissuade people from having fun w/ jank, though. I also have potential use cases for jank (not neural nets), so I track progress there with great interest as well 🙂
One of the things I want to try with jank is making c++ wrappers for clojure on JVM.
Since jank is C++, that means you can also embed it in clojure on the JVM.
thanks for sharing your thoughts @smith.adriane. in your experience, what do you think is preventing people from adopting libpython-clj to do ML/data science work, if at all? i imagine it could open doors to the whole world of python libs for ML/data science for the clojure community.
I think some people are using libpython-clj for ML/data science work (which is why it exists).
I think the number of people is small, but that's not unexpected since you're looking at the intersection of people who use clojure, know python, and are doing data science that really calls for python libraries rather than an existing Java or clojure library.
sorry, i should put it another way: what do you think are the pain points for people using libpython-clj that may prevent its wider adoption?
You basically have to convince someone who already knows python and clojure to use python+clojure instead of just using python. Another option for clojure data scientists is to just use some python for training and/or evaluation, output the data somewhere and then use clojure. Folks are doing that. See https://youtu.be/Ia9Tixzlc_M?si=O14NpYVwTFUV4PZY I just think the number of people who are interested in this use case and really want to use clojure is small.
There definitely are pain points:
• Using python means using python dependencies. python dependency management can be kind of a pain, even if you're just using python without clojure. This can also complicate deployment.
• Even if you know python and clojure, there is a learning curve to effectively using libpython-clj.
• dtype-next and tablecloth have a lot of cool stuff for working with datasets, but there is also a learning curve there.
It would be hard to convince people who are already comfortable doing ML and data science in Python to switch to doing those in Clojure instead. I think our audience is more those who are doing ML and data science in Clojure and find switching between Clojure and Python inefficient. We are hoping to improve the workflow and efficiency (both human and machine efficiency) for them through this work.
I also had problems with dependencies when I was using libpython-clj; at the very least, the bloat was enormous. One thing to keep in mind is that using Python to operate on ndarrays and data structures suitable for neural network layers implies that you are most probably going to use the PyTorch operations and not leave the interop/Python-interpreter land. I had the same issues with DJL too, but there mixing in Clojure was more straightforward.
https://clojurians.slack.com/archives/C03RZRRMP/p1759699894592549