


Hi all, I do a lot of 3D image processing in python and I was wondering what tools are out there in the Clojure ecosystem. One nice thing about python is that so many libraries agree on using numpy ndarrays, but idk if there's such a standard in Clojure (core.matrix?). I eventually want to shift to Clojure to make parallel image processing easier, maybe even distributed with something like Onyx.


You can also check out Neanderthal for fast matrix operations

tavistock: geom may be similar to what you are looking for, depending on what type of work you are doing


@jswaney absolutely no clue, but I do know that deeplearning4j has an NDArray class that might be useful


scratch that, it comes from ND4J


which Clojure's core.matrix has as a possible backend


Thanks for the link, I'll take a look


Seems like a good fit for handling the image data. My next problem is that my 3D images are larger than memory, so I need to chunk them for processing. I sometimes use HDF5 and other chunk compressed formats for this, but it's definitely not ideal
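For reference, the chunk-compressed HDF5 workflow described above looks roughly like this in Python with h5py (the dataset name "volume" and all shapes here are made up for the demo; it uses an in-memory file so the sketch is self-contained):

```python
import h5py

# In-memory HDF5 file so the example is self-contained; normally this lives on disk.
with h5py.File("demo.h5", "w", driver="core", backing_store=False) as f:
    # Chunked + compressed: a read only decompresses the chunks
    # that overlap the requested slice, not the whole array.
    dset = f.create_dataset(
        "volume", shape=(256, 256, 256), dtype="uint16",
        chunks=(64, 64, 64), compression="gzip",
    )
    dset[:64, :64, :64] = 1          # fill exactly one chunk's worth of voxels
    block = dset[:64, :64, :64]      # pulls in just that one chunk
    total = int(block.sum())         # 64**3 voxels of value 1
```

The "not ideal" part is that chunk shape is fixed at creation time, so access patterns that cut across chunks pay a decompression cost on every read.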


I would much rather have nodes accessing the image data from a parallel file system and use Onyx to process the image chunks in a distributed manner. I have no idea if something exists for breaking up huge arrays for some Onyx number-crunching
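One way to feed chunks of a huge array into something like Onyx is to enumerate chunk coordinates up front and treat each slice tuple as one unit of work. A minimal pure-Python sketch (the helper name `chunk_slices` is hypothetical, not from any library):

```python
from itertools import product

def chunk_slices(shape, chunk):
    """Yield tuples of slices tiling an array of `shape` in `chunk`-sized blocks."""
    starts = [range(0, s, c) for s, c in zip(shape, chunk)]
    for origin in product(*starts):
        yield tuple(slice(o, min(o + c, s))
                    for o, c, s in zip(origin, chunk, shape))

# 100^3 volume, 64^3 chunks -> 2 blocks per axis = 8 chunks,
# with the edge chunks clipped to the volume bounds.
slices = list(chunk_slices((100, 100, 100), (64, 64, 64)))
```

Each slice tuple is plain data, so it serializes cleanly into a queue or an Onyx segment, and any worker with access to the shared file system can load just its own chunk.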


like, is the set larger than memory, or might an individual one be larger than memory?


Like one contiguous array / image is larger than memory


It's a huge image of a mouse brain actually


pinkies or brains?


okay so, full disclaimer, totally new field to me. Not out of college yet and the real world looks scary. That being said, I know nothing about Onyx, so my first guess would be something something MapReduce


beyond that, I have no clue


Yeah I had looked into using HDFS since it's available on the cluster that I use. idk why but the initial barrier to entry for me was so large that I ended up just using memmapped arrays
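The memmapped-array approach above is just numpy; a minimal sketch (the file name, shape, and per-slab reduction are made up for the demo):

```python
import numpy as np
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "brain.dat")
shape = (8, 64, 64)  # stand-in for a much larger 3D volume

# Create the file-backed array; only pages that are touched occupy RAM.
vol = np.memmap(path, dtype="float32", mode="w+", shape=shape)
vol[:] = 1.0
vol.flush()

# Reopen read-only and reduce one z-slab at a time, so peak memory
# stays at one slab no matter how big the full volume is.
vol = np.memmap(path, dtype="float32", mode="r", shape=shape)
total = 0.0
for z in range(shape[0]):
    total += float(vol[z].sum())
```

The catch compared to HDF5 is no compression and no chunking in the file layout, so slicing along the "wrong" axis thrashes the page cache.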


well, deeplearning4j has some guidance on scaling


(that project maintains the specific ndarray library I mentioned)


beyond that, it's always worth a Stack Overflow question


and, for my benefit, what exactly are the benefits of HDFS (if you get over the barrier to entry)


As far as I know, it's meant to work with huge datasets in a fault-tolerant manner. Hadoop brings your program to the data in blocks distributed across machines in a cluster, and HDFS makes the fault-tolerance happen


HDF5 is not HDFS


I was just saying that I'm currently using HDF5 to store large arrays; the Hadoop part was a separate discussion