#data-science
2018-06-22
rustam.gilaztdinov 17:06:14

Wow, more benchmarks, that's totally what we all need :trollface:

blueberry 17:06:29

Actually, some people asked for those.

rustam.gilaztdinov 17:06:43

Sorry, Dragan, but I need Theano

joelkuiper 18:06:42

<rant> I've generally come to the conclusion that ML in Clojure is for people smarter than me. I like Python for this because I conceptually get a lot of the things and can try them easily, but in Clojure you have to write a lot from scratch. I have neither the intellect nor the patience to pull that off (especially not in a business setting). The thing is that this is generally met with somewhat bitter comments, like "just write your own" or "you're not worth it if you don't write your own", or some other measure involving self-worth. And to some extent I think that's fair. I can manage to write an SVM from scratch, but reimplementing a state-of-the-art paper from scratch while the source code is just out there in another language? I guess it's just not my thing. And while the situation has certainly changed with respect to some libraries, I personally am going to stick with other languages for data science work. I think Clojure has some nice traits, but the value proposition is lost on me if I'm constantly fighting both the ecosystem and my own upper bounds of intellect. Having libraries is nice. And doing polyglot programming seems more and more like a virtue to me. Things like containers and reliable queues make this easier than ever, and I've grown to love my eclectic mix of languages.</rant>

blueberry 18:06:51

I agree with your sentiment, but I feel that you expect a bit too much. Some large companies dumped a lot of highly valuable code into the Python ecosystem that happens to be just what you need. Clojure, on the other hand, has only a handful of people working on this stuff. Of course they can't provide exactly what you need, because they're busy writing their own code. Even NumPy represents several million dollars' worth of work (if not more). Someone paid for that: Google, universities, private contributors. You also can't expect the free ecosystem to grow while you wait for someone else to fill in the blanks...

joelkuiper 18:06:54

That's absolutely true, and I don't mean to argue the contrary. But as a value proposition, especially as a business consultant, it's a trade-off for me: leverage the millions/billions of investment that were pumped into another language and basically given away for free ... or try to replicate parts of that on my own (with nowhere near the resources or capabilities). I mean, it's not a comment on the language; if Clojure had that kind of money and the sheer volume of people working on it behind it, I'm sure it would be better. But that's not reality for me right now 🙂

blueberry 18:06:46

The thing is that Java has some more millions of dollars' worth of features that Python does not have 😉

joelkuiper 18:06:50

that's also very true. Hence my current setup: Clojure/ClojureScript for anything-but-ML, calling the ML stuff over a queue with pretrained models. It's a pick-and-choose type of deal, and I think that's a fair compromise for a lot of real-life use cases.
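A minimal sketch of the "call ML stuff over a queue" pattern, assuming an in-process Python `queue.Queue` and JSON messages as stand-ins for a real message broker between a Clojure front end and a Python model service. `predict` is a hypothetical placeholder for a pretrained model, and the message shape (`id`, `tokens`, `result`) is made up for illustration; none of it comes from the setup described above.

```python
import json
import queue
import threading

# Hypothetical stand-in for a pretrained model; a real worker would load
# the model once at startup and reuse it for every request.
def predict(tokens):
    return {"n_tokens": len(tokens), "label": "long" if len(tokens) > 5 else "short"}

def worker(req_q, resp_q):
    while True:
        msg = req_q.get()
        if msg is None:            # sentinel value: shut the worker down
            break
        req = json.loads(msg)      # every hop pays a deserialization cost...
        result = predict(req["tokens"])
        resp_q.put(json.dumps({"id": req["id"], "result": result}))  # ...and a serialization cost

req_q, resp_q = queue.Queue(), queue.Queue()
t = threading.Thread(target=worker, args=(req_q, resp_q))
t.start()

# The caller (the Clojure side in the setup above) only sees serialized messages.
req_q.put(json.dumps({"id": 1, "tokens": "the cat sat on the mat".split()}))
req_q.put(None)
t.join()

reply = json.loads(resp_q.get())
print(reply)
```

The queue keeps the model process independent of the caller's language; the price is that every request and reply is encoded and decoded, and with a real broker it also crosses the network.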

joelkuiper 19:06:40

There are many things Clojure is just better at, like concurrency, data modelling (ironically, immutability prevents /so/ many bugs), the web stack, raw performance, and a somewhat sane deploy/ecosystem story. Even somewhat obscure things like logic programming can offer tremendous value. But on the other hand, you have all these very talented researchers publishing their latest and greatest in Python, built on Keras/TensorFlow/PyTorch/etc. Some of it can be ported, but a lot of it still can't be ported easily without deep knowledge.

joelkuiper 19:06:51

For example, I was also one of the co-authors of GeMTC, a Bayesian network meta-analysis package in R. I'm fairly sure porting it to another framework, like something supporting particle MCMC, would be nice for performance, which would open up all sorts of nice interactivity possibilities. But the sheer amount of validation, edge cases, and work that goes into writing something like that means I'll probably never bother. I think that's fine, just call R over a queue 😛

joelkuiper 19:06:16

(fun fact, GeMTC still isn't allowed in a lot of publications because it's not written using WinBUGS ... a system so archaic that I can't even)

blueberry 19:06:06

well, sure. once you write something exactly for your needs, it's easier to use just that than to replicate it elsewhere. on the other hand, it is sometimes much easier to write it in a better language if you're writing it for the first time.

blueberry 19:06:14

For example, now that I have Neanderthal, ClojureCUDA, and ClojureCL, writing HPC stuff is a breeze, while equivalent libraries are a mess of unportable C++ that most people do not even know how to build...

joelkuiper 19:06:23

right, and I'm loving the things people are using it for. I'm especially excited about a PyTorch clone, since I do a lot of NLP these days and just serializing the data over a queue takes like 30% of the time; if I could do that in process, that would be a huge win (just by cutting the network IO and serialization overhead). But like I said, I personally have neither the time nor the intellect to pull that off ... and it's a hard sell to convince my clients that they should build and support a Clojure equivalent, especially since the people who could realistically pull that off are still countable on one hand.

joelkuiper 19:06:59

even just ditching the Nvidia dependency would be nice for me, just for personal preference 😛

blueberry 19:06:28

that's exactly why I am not interested in developing clones of popular libraries. however, if you had a Clojure solution to a problem that's not available on other platforms at all, that's another story...

joelkuiper 19:06:31

well, a clone is not exactly it, I guess. But a deep learning lib that can deal with dynamic graphs (especially in the context of variable-length vectors like sentences) would be great. I was just mentioning PyTorch because it has a good story supporting that use case. A proper Clojure impl would probably look very different, but would still support "doing SGD/backprop with dynamic models to support variable-length inputs/outputs".
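A minimal sketch of what "SGD/backprop with dynamic models for variable-length inputs" means, written in plain Python rather than Clojure: a toy one-unit tanh RNN whose computation is unrolled to each sequence's length at run time (the define-by-run idea), trained with per-example SGD and a hand-written backward pass through time. The model, the function names (`forward`, `loss_and_grads`, `sgd_step`), and the training data are hypothetical illustrations, not the API of PyTorch or of any Clojure library.

```python
import math

# Toy "dynamic graph": a one-unit tanh RNN whose computation is unrolled
# to the length of each input, so sequences of any length are handled.
def forward(params, xs):
    w, u = params
    h = 0.0
    cache = []  # one entry per time step, recorded during the forward pass
    for x in xs:
        h_prev = h
        h = math.tanh(w * h_prev + u * x)
        cache.append((h_prev, x, h))
    return h, cache

def loss_and_grads(params, xs, y):
    """Squared error on the final state, with gradients via backprop through time."""
    w, _ = params
    h, cache = forward(params, xs)
    loss = (h - y) ** 2
    dh = 2.0 * (h - y)                 # dL/dh_T
    dw = du = 0.0
    for h_prev, x, h_t in reversed(cache):
        da = dh * (1.0 - h_t ** 2)     # back through tanh
        dw += da * h_prev
        du += da * x
        dh = da * w                    # gradient flowing to h_{t-1}
    return loss, (dw, du)

def sgd_step(params, xs, y, lr=0.1):
    loss, (dw, du) = loss_and_grads(params, xs, y)
    w, u = params
    return (w - lr * dw, u - lr * du), loss

def total_loss(params, data):
    return sum(loss_and_grads(params, xs, y)[0] for xs, y in data)

# Made-up training data: sequences of different lengths with scalar targets.
data = [([1.0, 0.5], 0.6), ([0.2, -0.3, 0.9, 0.1], 0.4), ([0.7], 0.5)]
params = (0.5, 0.5)
initial_loss = total_loss(params, data)
for epoch in range(200):
    for xs, y in data:
        params, _ = sgd_step(params, xs, y)
final_loss = total_loss(params, data)
print(f"loss: {initial_loss:.4f} -> {final_loss:.4f}")
```

Because the loop over `xs` is ordinary control flow, nothing has to be padded to a fixed length. A real library would record these steps on a tape and differentiate them automatically instead of relying on a hand-written backward pass.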

blueberry 19:06:50

let's hope some of the people interested in DL create that

alan 20:06:46

@joelkuiper take a look at this https://github.com/aria42/flare

ā˜ļø 12
šŸ‘ 4
gigasquid20:06:44

@joelkuiper also I'm working on MXNet language binding for Clojure https://github.com/gigasquid/clojure-mxnet which can handle variable length sentences as well

🚀 16
👍 4
alan 20:06:36

@gigasquid noice! Though I feel we are still missing something a la scikit-learn -> easy training + deploying

āœ”ļø 8