#data-science
2022-08-03
Kira McLean02:08:31

I made a status update on the Clojure data science community. There are lots of interesting groups and projects underway, so it may be of interest if you use Clojure, have data to work with, or both! https://www.youtube.com/watch?v=63-KGa3Flac

👍 2
Marko Vukovic19:08:01

Hi everyone, does anyone know which clj Spark library is the most supported/used in the community? There seem to be a couple, and it's hard to tell which one to use.

Daniel Slutsky19:08:23

Hi. https://github.com/zero-one-group/geni by @USH9B0ZU0 & friends is well documented, still gets its maintainer's attention, and a few people have said they are actively using it.

Marko Vukovic19:08:25

Just checked it out, looks really cool! Ty

🆒 1
Daniel Slutsky11:08:12

See you in a little less than an hour at the Clojask / Tablecloth session with @mat_1, @ezmiller, and friends.

aaelony21:08:37

Does the tc/dataset function in https://github.com/scicloj/tablecloth expose a way to read gzipped files with an arbitrary delimiter? For example, a file name ending in .psv.gz, where psv indicates a pipe-delimited (|) file?

genmeblog21:08:46

In most cases tc/dataset delegates dataset creation to tmd/->dataset, especially when reading a file.

aaelony16:08:40

Ah, I see that. I like that known file types are easy and it does the right thing, but when it does not recognize the file extension, I think it should relax a bit and ask for a field delimiter.

aaelony16:08:54

#{:csv :tsv :xlsx :xls :parquet} seems restrictive
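
For readers following this thread, here is a minimal sketch of one way the .psv.gz case might be handled. It assumes that tc/dataset forwards its options map to tmd/->dataset unchanged, and that the `:file-type`, `:gzipped?`, and `:separator` options behave as described in the tech.ml.dataset docs for the installed version. The file name and sample data are hypothetical.

```clojure
(require '[tablecloth.api :as tc]
         '[clojure.java.io :as io])
(import '(java.util.zip GZIPOutputStream))

;; Hypothetical sample file: gzipped, pipe-delimited, two data rows.
(def sample-path "sample.psv.gz")

(with-open [w (io/writer (GZIPOutputStream. (io/output-stream sample-path)))]
  (.write w "id|name\n1|alice\n2|bob\n"))

;; Assumption: these options reach tmd/->dataset as-is.
;; :file-type  forces the CSV parser, since ".psv" is not a recognized extension
;; :gzipped?   forces gzip decoding instead of relying on autodetection
;; :separator  names the delimiter character to use
(def ds
  (tc/dataset sample-path {:file-type :csv
                           :gzipped?  true
                           :separator \|}))

(tc/row-count ds)
(tc/column-names ds)
```

If the options are not honored by the installed version, reading the file through a `java.util.zip.GZIPInputStream` and passing that stream with `:file-type :csv` may be an alternative worth trying.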
