I have caught up a bit with the great work you are doing on data science in Clojure (my PhD work kept me busy over the last few years). Thanks a lot for doing it! I saw the discussion on bringing GPU support to Clojure. I have also been reading Zig and scientific CUDA code, very much in the philosophy of @blueberry, in the sense that you need to care about the low-level mechanics and environment if you want to build performant code. Julia, Zig, Rust etc. encourage people to write tight memory layouts (including the ability to inspect the generated assembly from the REPL), which the JVM unfortunately hides like a black box (I talked to Oracle folks lately, and it seems it will still take them quite some time to improve that...). An interesting scientific example of CUDA code I came across is https://github.com/SFGLab/cudaMMC, which maps locality in the simulated problem domain (chromosomes) to locality on the hardware. That said, there is significant overlap between functional programming and the so-called data-oriented design pattern driving Zig. As far as I understand, @chris441 is also building his libraries very much along these lines, with a columnar layout. We build on this in https://github.com/replikativ/mesalog to import CSVs into datahike quickly, but datahike is, unsurprisingly, not nearly fast enough to max out this performance. It does, however, have a columnar index format via EAVT, AEVT and AVET, and ideally these indices (with schema) should be quickly readable into tmd. I have some ideas on how to build an AGI system that is both resource-aware at that low level and as abstract as AIXI and general functional program synthesis methods. One concrete, actionable thing I would be curious to discuss with you is compiling a subset of Clojure to CUDA. Last week I added exploratory CUDA compilation support to daphne: https://github.com/plai-group/daphne?tab=readme-ov-file#cuda-export. I think a mature project in this space is https://futhark-lang.org/.
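The remark about datahike's EAVT/AEVT/AVET indices being quickly readable into tmd amounts to a row-to-column pivot, the core move of data-oriented design. Here is a toy Python sketch of that pivot. It assumes datoms are plain `(entity, attribute, value)` tuples and that every attribute is cardinality-one. `aevt_to_columns` is a hypothetical helper and is not part of datahike or tech.ml.dataset.

```python
from itertools import groupby
from operator import itemgetter


def aevt_to_columns(datoms):
    """Pivot (entity, attribute, value) datoms into a columnar table.

    Sorting by (attribute, entity) mirrors AEVT order, so each attribute's
    values form one contiguous run that becomes one column. Assumes every
    attribute is cardinality-one; missing values are filled with None.
    Hypothetical helper for illustration only.
    """
    datoms = sorted(datoms, key=itemgetter(1, 0))  # attribute, then entity
    entities = sorted({e for e, _, _ in datoms})
    row_of = {e: i for i, e in enumerate(entities)}  # entity id -> row index
    columns = {":db/id": entities}
    for attr, run in groupby(datoms, key=itemgetter(1)):
        column = [None] * len(entities)
        for e, _, v in run:
            column[row_of[e]] = v
        columns[attr] = column
    return columns
```

For example, `aevt_to_columns([(2, ":person/name", "Bob"), (1, ":person/name", "Ada"), (1, ":person/age", 36)])` yields `{":db/id": [1, 2], ":person/age": [36, None], ":person/name": ["Ada", "Bob"]}`. Because AEVT already keeps each attribute's datoms contiguous, a real importer could plausibly stream each run straight into a typed column buffer instead of building Python lists.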
Being a probabilistic programming system, daphne currently inlines everything completely, including the data (only randomness is left for runtime). This is not what you actually want for CUDA: there you would like to know the memory layout of your data (ideally through tmd) and then map functional abstractions onto reductions over CUDA memory that run locally within threads and warps (e.g. https://nvidia.github.io/cccl/cub/api/classcub_1_1WarpReduce.html#_CPPv4I0EN3cub10WarpReduce6ReduceE1T1T11ReductionOp). I am thinking about how to change daphne to help with that. Has anybody written CUDA kernels in this setting already?
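To make the thread/warp reduction idea concrete, here is a CPU-side Python simulation of the common two-level pattern. It is not CUDA and not daphne's generated code. In the first level, each of 32 simulated lanes folds a strided slice of the input sequentially. In the second level, the warp combines the 32 partials with a shuffle-down tree whose offset halves from 16 to 1, the same tree that `__shfl_down_sync`-based warp reductions use. The names `shfl_down`, `warp_reduce` and `two_level_reduce` are hypothetical.

```python
import operator
from functools import reduce

WARP_SIZE = 32  # lanes per warp on NVIDIA GPUs


def shfl_down(lanes, delta):
    """Simulate a shuffle-down: lane i reads the value of lane i + delta.
    Lanes whose source would fall outside the warp keep their own value."""
    n = len(lanes)
    return [lanes[i + delta] if i + delta < n else lanes[i] for i in range(n)]


def warp_reduce(lanes, op):
    """Tree reduction across one simulated warp; only lane 0 holds the
    full result at the end, as with a shuffle-based warp reduction."""
    assert len(lanes) == WARP_SIZE
    offset = WARP_SIZE // 2
    while offset > 0:
        shifted = shfl_down(lanes, offset)
        lanes = [op(mine, theirs) for mine, theirs in zip(lanes, shifted)]
        offset //= 2
    return lanes[0]


def two_level_reduce(data, op, identity):
    """Thread level: lane k folds data[k], data[k + 32], ... sequentially,
    so at each step consecutive lanes touch consecutive elements.
    Warp level: combine the 32 per-lane partials with warp_reduce."""
    partials = [reduce(op, data[lane::WARP_SIZE], identity)
                for lane in range(WARP_SIZE)]
    return warp_reduce(partials, op)


if __name__ == "__main__":
    data = list(range(1000))
    print(two_level_reduce(data, operator.add, 0))     # 499500
    print(two_level_reduce(data, max, float("-inf")))  # 999
```

Since lane 0 combines elements in tree order rather than index order, `op` should be associative and commutative (e.g. `+` or `max`), and `identity` must be its neutral element. These are the kinds of properties a compiler mapping a functional `reduce` onto warps would need to check or assume.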
👋 @whilo, this is a tangent, not the main point, but from reading through mesalog it might make sense to chat sometime. There's enough overlap with what unify does (importing a bunch of CSV files into a Datomic-ish system) that we could probably learn a bunch from each other (see e.g. https://unifybio.github.io/import-config/), though it's a somewhat different design case and unify targets Datomic.
Sure! @yeefay.lim did most of the work on mesalog; I advised her on the integration with datahike. It would definitely make sense to work together on data importers.
You may be interested in a few projects from the Clojure data ecosystem:
• https://neanderthal.uncomplicate.org/ - cross-platform GPU-based linear algebra for Clojure (you are probably aware of this already).
• https://github.com/techascent/tvm-clj?tab=readme-ov-file - Clojure bindings to the TVM compiler stack, which, as far as I understand it, is designed to ease the burden of targeting multiple platforms and architectures for performant tensor code.
I've also been thinking about this sort of thing in the background and wondering about some combination of Jank, Sci, and this paper: https://dl.acm.org/doi/pdf/10.1145/3158140
Thanks for the references 😃. I am aware of TVM, and I love Nada's work from a PL perspective. I think it is necessary, though, to (re)write and read the generated low-level code (something @blueberry is also very adamant about). Tensor libraries like TVM are nice (I use PyTorch in most of my machine learning work), and one should use them where vectorization or linear algebra readily applies, but in CUDA you can do things that are not tensorized, e.g. specific control flow for each thread. This requires careful thinking about and measurement of the code, though, potentially even specializing it for specific hardware (as Mojo does).
Daphne is designed to emit readable low-level code (e.g. Python or CUDA) that you can either edit manually or use to adjust the Daphne compiler for your use case (in case you need to generate similar pieces of code multiple times).
Hi, is this the best place to ask newbie questions about tmd and tc? I have seen a couple of people suggest Zulip, but I don't know which channel is most appropriate.
Hi!
Yes, Zulip is recommended.
You can use the #data-science channel there if you are not sure.
For questions about tmd, tc and related topics, you can use the dedicated channel.
Here is some information about the various Zulip channels:
https://scicloj.github.io/docs/community/chat/