#data-science
2017-05-16
gigasquid00:05:20

Hi @elise_huard - I don't know the reason - but I suspect it is just because the maintainers have moved on to other interests

gigasquid00:05:11

The Clojure community does need a collection of good libs to support Data Science - either clojure based or good interop with java libs

elise_huard08:05:18

@gigasquid I agree. I wonder whether Incanter is close enough that it's worth pushing that, or whether other tools need to be pushed

elise_huard08:05:30

I suspect Incanter is trying to do too much

novel11:05:43

@elise_huard - there is also an interesting comment by nblumoe (one of the (former?) developers of Incanter): https://clojurians-log.clojureverse.org/rdf/2017-02-14.html starting at 09:55:30

novel11:05:24

so it seems that some developers moved on and some don’t need Incanter anymore (i.e. they are using core.matrix directly).

novel11:05:37

@gigasquid have you already had a look at https://github.com/MastodonC/kixi.stats and does this qualify as a good lib to support data science?

nblumoe11:05:01

Hi. Just read my comment about incanter again and it still seems up to date to me.

nblumoe11:05:29

Also what @elise_huard said: Incanter might better be split up into multiple, composable libs.

nblumoe11:05:47

Active contributors would be very welcome

novel11:05:10

what kind of contribution are you looking for? and it looks like pull requests haven’t been merged for a while now.

nblumoe11:05:06

Tbh #1 priority would be someone to actively drive the project

novel11:05:07

unfortunately I cannot help with that 🙂

novel11:05:13

but if the project is going to be revived, I’d certainly be happy to try to help out.

elise_huard12:05:38

kixi.stats was made by someone who collaborates with the company I work for (Mastodon C) - it's very early days

elise_huard12:05:46

and explicitly for 'mid-size' data sets

elise_huard12:05:15

nblumoe: I'm evaluating, let me get back to you on that 🙂

elise_huard12:05:48

kixi.stats does some basic stats with transducers, but doesn't go much further than that (cc @henrygarner )

miikka12:05:30

Are there good data-science-y Java libraries, by the way?

nblumoe12:05:36

yes @miikka of course. depends on what specifically you are looking for

nblumoe12:05:42

of course there is a plethora of things for basic data handling, visualisation, etc. then there are basic number crunching libs (linear algebra etc.) and there are also higher level “data science” libs/frameworks for ML etc

nblumoe12:05:27

maybe before the python era took off, Java was the prime environment for industrial data science with a general purpose language (excluding more specialized envs like Matlab etc.)?!

miikka12:05:35

I'm thinking of something like the numpy/matplotlib/pandas ecosystem for Python

nblumoe13:05:28

I see. Well I guess there are multiple but I am not sure how well they all play together, I mean stuff like jblas, nd4j, etc for the matrices/array processing. For plotting there are GRAL, JFreeChart, JavaFx, etc. etc.

nblumoe13:05:00

So things definitely exist but I would be very skeptical about getting the same convenience as you get in the Python or R ecosystem

nblumoe13:05:27

Also because of compilation vs scripts

nblumoe13:05:22

And I think there isn’t as much coherence as in the Python space

nblumoe13:05:29

core.matrix also has multiple, pluggable backends which might be worth having a look at to get a feel for the underlying (Java) numerical libs

jsa-aerial15:05:14

@miikka core.matrix covers the numpy/pandas stuff (it may or may not at the moment have everything you may need). That, of course, is Clojure (but sits on various impls, including Java libs like vectorz). For matplotlib (ggplot2 is better imo), gyptis https://github.com/dvdt/gyptis may be sufficient, and it is browser vega/d3 based, which makes it very dynamic in principle.

elise_huard15:05:04

our company really does want to do everything data in clojure, so it's a practical objective 🙂

elise_huard15:05:30

data exploration happens mostly in R atm, and we're looking to change that

nblumoe15:05:18

@elise_huard also just saw you on the speaker list too. and even mentioning incanter in your abstract 😄

nblumoe15:05:56

let’s definitely meet at the conference then. I am up for helping with DS tools for Clojure

elise_huard16:05:02

cool, sounds good!

nblumoe16:05:28

I guess you are based in London, right?

elise_huard16:05:34

Bath, actually

elise_huard16:05:09

as a first effort I think I'll try to reproduce some R work in clojure (possibly with incanter develop branch) and see how far that goes

elise_huard16:05:38

(remote company)

nblumoe16:05:42

we have quite a good share of AI/DS talks on the EuroClojure program. Will try to contact people to meet and discuss 🙂

elise_huard16:05:17

definitely up for meeting and discussing!

nblumoe16:05:26

great. also don’t hesitate to get in touch regarding your thoughts and issues with Incanter.

elise_huard16:05:59

will do. I'd like to contribute to the open source ecosystem around this.

nblumoe16:05:29

and I AM looking forward to your talk as I really went the lazy route in the last year or so and just did stuff with R and Python

elise_huard16:05:57

no pressure and all that 😉

elise_huard16:05:37

same here though, so far

nblumoe16:05:23

now that I got your attention already. any experience on your side or mastodonC with probabilistic programming? maybe even Anglican?

nblumoe16:05:40

(or anyone else on the channel of course)

elise_huard16:05:33

looks interesting, haven't used anglican

elise_huard16:05:37

we have done some markov chain work, but not me personally

elise_huard16:05:04

(probably completely unrelated)

nblumoe16:05:31

thanks. not necessarily unrelated. a piece in the puzzle rather I would say 🙂

otfrom22:05:12

I do wonder about using core.matrix only when you want to do something very matrix-oriented. I wonder if taking an "everything is a dataframe/dataset" approach like in R doesn't work for clojure where the powertools are all related to seqs and maps (partly why I find kixi.stats interesting)

otfrom22:05:04

there is powderkeg for mixing transducers with spark which I find interesting too https://github.com/HCADatalab/powderkeg

otfrom22:05:31

ah, powderkeg looks all up to date w/latest spark too