data-science

Ben Sless 2026-01-11T19:47:44.232119Z

I haven't been able to find adequate docs / examples / questions on the subject, so pardon me if this has been asked before. I have a large collection of CSVs which I want to concat into a single ds, and I'm looking for the most efficient way of going about it. Currently I'm using reduce and concat-in-place.

Ben Sless 2026-01-12T09:56:55.259799Z

It turned out the fastest solution (when I'm not OOMing) was using apply concat with pmap. I was being stingy with RAM, but I was going to end up with the same amount of data in memory in the end anyway, so 🤷‍♂️

👍 1
Harold 2026-01-11T20:32:47.120809Z

(apply tech.v3.dataset/concat ...) on a sequence of datasets can significantly outperform reduce, because apply can see all the datasets at once, whereas reduce forces them to go one at a time.
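
For anyone landing on this thread later, here is a minimal sketch of the two approaches discussed above. It is not taken from anyone's actual code: the `csv-paths` vector and both function names are hypothetical, and it assumes each file parses with the default `ds/->dataset` options and that all files share the same columns. The reduce version uses plain `ds/concat` for simplicity rather than the in-place variant mentioned in the question.

```clojure
(require '[tech.v3.dataset :as ds])

;; Hypothetical file list; substitute your own CSV paths.
(def csv-paths ["data/part-0.csv" "data/part-1.csv"])

;; One-at-a-time approach: each step only sees the accumulated
;; result plus the next dataset.
(defn load-with-reduce [paths]
  (reduce ds/concat (map ds/->dataset paths)))

;; Parse the files in parallel with pmap, then hand every dataset
;; to concat in a single call so it can see them all at once.
(defn load-with-apply [paths]
  (apply ds/concat (pmap ds/->dataset paths)))
```

Note that `load-with-apply` holds every parsed dataset in memory before concatenating, which matches the RAM trade-off mentioned above.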