#data-science 2015-10-26
aaelony20:10:36

so, who's doing data science in Clojure?

tord20:10:49

Hardly anyone, judging from the activity in this channel. 😞

aaelony20:10:34

I don't think activity in this channel is a good measure 😉

aaelony20:10:06

perhaps we need a more refined topic...

tord20:10:06

Yes, I sure hope the activity here is not a good measure. As for myself, I am not currently doing data science in Clojure, but I hope to start doing so.

aaelony20:10:51

I use it in several ways... First to organize data, then to craft invocations to more specific tools outside the jvm. I hope to do more from within clojure though (e.g. with core.matrix and neanderthal) but also with the cljsjs libraries for visualizing via clojurescript.

tord20:10:18

Since you clearly have more experience than me in this area: Is the Clojure(Script) data science ecosystem sufficiently mature to be a good replacement for R or Python?

aaelony20:10:55

This is definitely the realm of opinion. I've never been a python fan, but have used R for many years. For the past 3 years, I have mostly used clojure unless something I needed was lacking. When that happens, I usually look to R or shell out via conch to some program adept at doing the thing that is lacking.

aaelony20:10:46

you can find functions in Incanter that do a lot of things you can do in R as well

aaelony20:10:42

I don't have much experience with clojurescript, but wish to learn cljs to be able to take advantage of the amazing visualization options that exist in the js world (e.g. d3, vega, etc...)

tord20:10:43

I would also much prefer to avoid R and Python, but a disadvantage for beginners like myself is that most of the books and learning resources are R or Python based.

tord20:10:25

But unfortunately, it assumes that the reader is mathematically illiterate.

aaelony20:10:56

What type of resources are you looking for?

aaelony20:10:37

what type of problems are you tackling?

tord20:10:59

I have two jobs right now, and data science is interesting for both of them:

aaelony21:10:23

all the tools you would normally use are still available from clojure, plus anything the jvm offers as well.

tord21:10:00

One job is in a startup writing recommendation software. Most of our existing software is in Java, and it's hard to convince the rest of the team that Clojure is a good idea. 🙂

aaelony21:10:06

Recommendation is a large topic.

aaelony21:10:20

parts of it may or may not be a good idea in clojure

aaelony21:10:31

it depends on a lot of factors

tord21:10:18

The other job is at Play Magnus, the company behind the official iOS/Android app of chess world champion Magnus Carlsen. We're planning to write a training app next, and I'd like to be able to identify the weaknesses and characteristic mistakes of human chess players based on analysing their games, and give them exercises tailor made for their particular weaknesses.

aaelony21:10:31

you can always interop with java for libraries as you find them as well

tord21:10:49

For the chess job, I can pretty much pick the technology I want myself.

aaelony21:10:21

for chess, I enjoy http://en.lichess.org/ which could conceivably (I imagine) be done with clj/cljs ...

tord21:10:49

Yes, I agree, Lichess is the best place to play online chess.

aaelony21:10:03

I think there was a chess blurb in one of the early clojure books that was quite good

tord21:10:40

I don't like playing chess, though. That's why I started writing chess programs. Now I don't have to play chess myself anymore, I have a program that can do it for me.

aaelony21:10:13

just a quick google shows some interesting ideas using core.logic (https://github.com/matlux/clj-chess-engine)

tord21:10:31

Clojure isn't a good fit for chess engines, though. Raw speed is too important. My engine (Stockfish) is in C++, as much as I hate it. But we're drifting away from the topic.

aaelony21:10:40

whatever it is, I like controlling it from Clojure. Sometimes, I find that the whole thing can be in clojure, sometimes it can't though.

aaelony21:10:47

If you really care about speed, then the jvm is not an option, but for most things it is enough and with concurrency and parallelism it is pretty darn good.

tord21:10:13

For data science applications, you mean?

tord21:10:27

Should be much faster than R, at least

aaelony21:10:30

Did you write Stockfish?

tord21:10:48

Not alone. I'm the original author, but I haven't contributed anything recently.

tord21:10:03

Thanks. 🙂

aaelony21:10:58

Others hopefully will chime in as well, but for me I like to have Clojure control the flow of what I am doing with the data, from the repl or with gorilla repl (notebook style). I can poke around at the repl, or shell out nicely if I want to include a fast/faster tool.

tord21:10:31

That's how I hope to work, too. Glad to hear it's feasible.

aaelony21:10:34

often there is one component that needs to be super fast, so just shell out to do that

aaelony21:10:39

check out conch which lets you bind the name of an existing (fast) executable on your machine to a function name. Then you can call it as if it were a clojure function. really nice... https://github.com/Raynes/conch

tord21:10:59

Yes, I know about conch. That's how I run Stockfish from Clojure. 🙂

tord21:10:10

Awesome library.

aaelony21:10:11

nice, yes that's what I would do too

aaelony21:10:19

really elegant

aaelony21:10:11

if some other dsl is needed for your purpose, then manipulate that dsl from clojure. I often do that with sql

aaelony21:10:40

like you said, why write it by hand? let composable functions do the work.

tord21:10:07

I don't remember saying that, but of course I agree.

aaelony21:10:56

you said "I don't like playing chess, though. That's why I started writing chess programs. " which I loosely took to mean an entire philosophy 😉

tord21:10:58

That's what programming is all about. We're lazy, but thinking ahead, spending a little bit of time and energy now in order to avoid repeating tedious tasks endlessly in the future.

aaelony21:10:05

I actually like playing chess though. Timed chess is like my fit-bit, if I am winning I am well-rested, if I am losing then I am overworked 😉

aaelony21:10:59

back to your application though, perhaps managing hyperparameter search, or manipulating the data around the quality of different recommendation scenarios is a good fit for clojure

tord21:10:12

Initially, because of my lack of experience in the subject, I'll probably just use it to play around with little examples and experiments in order to learn the subject. Performance shouldn't be a problem.

aaelony21:10:37

have you heard of Anglican?

tord21:10:34

At first glance, it looks similar to gorilla-REPL?

aaelony21:10:36

it does use gorilla repl, but it is more than that

aaelony21:10:02

Anglican is an open source, compiled probabilistic programming language integrated with Clojure, a general purpose functional programming language that just-in-time compiles to the Java Virtual Machine (JVM).

tord21:10:22

... but also less, it seems: "The programming language of Anglican is a subset of Clojure". I'm a little confused.

aaelony21:10:33

Check out the docs and videos. Do you need/use MCMC ?

tord21:10:18

Will check the docs and videos. No idea what MCMC is, which I suppose means I don't need it.

aaelony21:10:40

markov chain monte carlo

aaelony21:10:48

ok. got to go... nice chatting

tord21:10:53

Ah, OK. Then I know what it is, but not whether I'll need it.

tord21:10:08

I should go, too. Thanks for the chat! Hope I have something to contribute in the future.

aaelony21:10:19

cool, cheers