#data-science
2023-04-18
Zachary 20:04:23

What’s the best way to read in parquet files (I have thousands) and start processing them into datasets? I’m currently trying to follow the example from: https://github.com/zero-one-group/geni-performance-benchmark/blob/master/dataset/src/dataset/optimised_by_chris.clj

otfrom 21:04:28

That looks like a good one to follow. Doing some of the claypoole stuff with ham-fisted would make an interesting comparison.