#data-science
2020-08-12
chris441 11:08:11

We have a new post up exploring memory mapping and the new Apache Arrow data format via Clojure and https://github.com/techascent/tech.ml.dataset. A few times in my career I have used memory mapping and found it both simpler and faster than stream-based IO. With it we can 'load' datasets far larger than physical RAM, or read just one column or row of a dataset without loading the rest. We hope you enjoy this simple demonstration! https://techascent.com/blog/memory-mapping-arrow.html
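
For readers who have not used memory mapping, here is a rough sketch of the "read one column without loading the rest" idea. It is not the tech.ml.dataset or Arrow API: it uses only Python's standard `mmap` and `struct` modules, and the file layout (four float64 columns stored back to back) and the names `write_columnar`, `read_column`, and `read_cell` are made up for illustration.

```python
import mmap
import struct

ROWS = 1000                    # hypothetical row count
COLS = 4                       # hypothetical: 4 float64 columns, stored column by column
ITEM = struct.calcsize("<d")   # size of one little-endian float64 (8 bytes)


def write_columnar(path):
    """Write COLS columns of ROWS float64 values; column c, row r holds c * 1000 + r."""
    with open(path, "wb") as f:
        for c in range(COLS):
            values = (c * 1000.0 + r for r in range(ROWS))
            f.write(struct.pack(f"<{ROWS}d", *values))


def read_column(path, col):
    """Map the whole file, but decode only the bytes that belong to one column."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            start = col * ROWS * ITEM
            return struct.unpack_from(f"<{ROWS}d", mm, start)


def read_cell(path, col, row):
    """Decode a single value at its computed offset in the mapping."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            offset = (col * ROWS + row) * ITEM
            return struct.unpack_from("<d", mm, offset)[0]
```

Mapping the file does not read it; the operating system pages data in only for the byte ranges that are actually touched, which is why a mapped file can be much larger than physical RAM. Calling `read_column(path, 2)` on a file produced by `write_columnar(path)` decodes only the third column's bytes.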