#off-topic
2018-11-18
emccue02:11:13

Actually, if it goes economy, base, luxury, you can have a dataeconomy

jswaney14:11:58

Hi all, I do a lot of 3D image processing in python and I was wondering what tools are out there in the Clojure ecosystem. One nice thing about python is that so many libraries agree on using numpy ndarrays, but idk if there's such a standard in Clojure (core.matrix?). I eventually want to shift to Clojure to make parallel image processing easier, maybe even distributed with something like Onyx.
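
For reference, a minimal core.matrix sketch of numpy-style ndarray operations, assuming only org.clojure/core.matrix is on the classpath (the default persistent-vector implementation; faster backends can be swapped in):

```clojure
(require '[clojure.core.matrix :as m])

;; Optionally pick a faster backend if it is on the classpath, e.g. vectorz-clj:
;; (m/set-current-implementation :vectorz)

;; A tiny 2x2x2 "volume" standing in for a 3D image.
(def vol (m/array [[[1 2] [3 4]]
                   [[5 6] [7 8]]]))

(m/shape vol)          ; [2 2 2]
(m/slice vol 0)        ; first 2x2 plane along the outermost axis
(m/emap #(* 2 %) vol)  ; element-wise map over every voxel
```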

didibus22:11:57

You can also check out Neanderthal for fast matrix operations
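
A small Neanderthal sketch for comparison; it assumes the uncomplicate/neanderthal dependency plus its native backend (Intel MKL) are installed, which is the main setup cost:

```clojure
(require '[uncomplicate.neanderthal.core :refer [mm dot]]
         '[uncomplicate.neanderthal.native :refer [dge dv]])

;; 2x3 and 3x2 double-precision matrices backed by native memory
;; (dge fills column by column).
(def a (dge 2 3 [1 2 3 4 5 6]))
(def b (dge 3 2 [1 2 3 4 5 6]))

(mm a b)           ; fast native matrix multiply
(dot (dv 1 2 3)
     (dv 4 5 6))   ; => 32.0
```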

tavistock17:11:38

http://thi.ng geom may be similar to what you're looking for, depending on what kind of work you're doing: https://github.com/thi-ng/geom

emccue17:11:12

@jswaney absolutely no clue, but I do know that deeplearning4j has an NDArray class that might be useful

emccue17:11:07

scratch that, it comes from ND4J

emccue17:11:31

which Clojure's core.matrix has as a possible backend
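
ND4J's NDArray can also be used directly from Clojure via Java interop; a rough sketch, assuming an ND4J backend artifact (e.g. nd4j-native) is on the classpath:

```clojure
(import '[org.nd4j.linalg.factory Nd4j])

(def arr (Nd4j/zeros 2 3))  ; 2x3 INDArray of zeros, stored off-heap
(vec (.shape arr))          ; the array's dimensions, e.g. [2 3]
(.addi arr 1.0)             ; in-place element-wise add
```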

jswaney17:11:11

Thanks for the link, I'll take a look

jswaney18:11:08

Seems like a good fit for handling the image data. My next problem is that my 3D images are larger than memory, so I need to chunk them for processing. I sometimes use HDF5 and other chunk-compressed formats for this, but it's definitely not ideal.
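
One way to think about the chunking side, independent of the storage format, is to treat the chunk grid as plain data and let each worker read only its own region. A pure-Clojure sketch (the shapes are just examples):

```clojure
(defn chunk-offsets
  "Given a 3D image shape [z y x] and a chunk shape, return the [z y x]
  origin of every chunk. Each offset is tiny data that can be handed to a
  worker, which then reads only that region from disk."
  [[dz dy dx] [cz cy cx]]
  (for [z (range 0 dz cz)
        y (range 0 dy cy)
        x (range 0 dx cx)]
    [z y x]))

;; e.g. a 2048^3 volume split into 512^3 chunks
(count (chunk-offsets [2048 2048 2048] [512 512 512]))  ; => 64
```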

jswaney18:11:12

I would much rather have nodes accessing the image data from a parallel file system and use Onyx to process the image chunks in a distributed manner. I have no idea if something exists for breaking up huge arrays for some Onyx number-crunching
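
Since Onyx jobs are described as data (a :workflow of task pairs plus a :catalog of task maps), chunk coordinates like the ones above could flow through a job as segments. A very hedged sketch: the task names and the :my.image/process-chunk function are hypothetical, and the input/output plugin entries are omitted; only the overall workflow/catalog shape follows Onyx's documented format.

```clojure
(def workflow
  [[:read-chunk-coords :process-chunk]
   [:process-chunk :write-chunk]])

(def catalog
  [{:onyx/name       :process-chunk
    :onyx/fn         :my.image/process-chunk  ; hypothetical fn that handles one chunk
    :onyx/type       :function
    :onyx/batch-size 1}])
;; catalog entries for the :read-chunk-coords input and :write-chunk output
;; plugins would go alongside this.
```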

emccue18:11:16

like, is the whole set larger than memory, or might an individual one be larger than memory?

jswaney18:11:49

Like one contiguous array / image is larger than memory

jswaney18:11:29

It's a huge image of a mouse brain actually

emccue18:11:51

pinkies or brains?

emccue18:11:28

Okay so, full disclaimer, this is a totally new field to me. I'm not out of college yet and the real world looks scary. That being said, I know nothing about Onyx, so my first guess would be something something MapReduce

emccue18:11:24

beyond that, I have no clue

jswaney18:11:58

Yeah, I had looked into using HDFS since it's available on the cluster that I use. idk why, but the initial barrier to entry for me was so large that I ended up just using memory-mapped arrays
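
Memory mapping is also easy to get from the JVM directly; a small sketch using java.nio (the file path and chunk size below are hypothetical), roughly what numpy's memmap does underneath:

```clojure
(import '[java.io RandomAccessFile]
        '[java.nio.channels FileChannel$MapMode])

(defn mmap-region
  "Memory-map `len` bytes of the file at `path`, starting at byte `offset`,
  read-only. The OS pages data in on demand, so the file never has to fit
  in heap. A single mapping is limited to 2 GB on the JVM, so a huge file
  is mapped region by region."
  [path offset len]
  (with-open [raf (RandomAccessFile. ^String path "r")]
    (.map (.getChannel raf) FileChannel$MapMode/READ_ONLY offset len)))

;; (def buf (mmap-region "/data/brain.raw" 0 (* 512 512 512 2)))  ; one 512^3 uint16 chunk
;; (.getShort buf 0)                                              ; read a voxel on demand
```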

emccue18:11:25

well, deeplearning4j has some guidance on scaling

emccue18:11:48

(that project maintains the specific ndarray library I mentioned)

emccue18:11:25

beyond that, it's always worth a Stack Overflow question

👍 4
emccue18:11:53

and, for my benefit, what exactly are the benefits of HDFS (if you get over the barrier to entry)?

jswaney19:11:40

As far as I know, it's meant to work with huge datasets in a fault-tolerant manner. The data is stored in blocks distributed across the machines in a cluster, Hadoop brings your program to those blocks, and HDFS is what makes the fault tolerance happen

hiredman19:11:55

HDF5 is not HDFS

jswaney19:11:26

I was just saying that I'm currently using HDF5 to store large arrays; the Hadoop part was a separate discussion