2018-05-10
gorilla is nice, but it is hardly maintained, lacks many features of jupyter and is less beginner friendly (jupyter has a nice UI with menu, is known to many people in the data science environment and has a lot of documentation everywhere).
i have used gorilla and also contributed a patch, but i think the clojure strategy is in general trying to use a powerful hosted environment where it is compatible with clojure's value proposition and functional nature instead of reinventing the wheel.
the same holds for plotting. gorilla and incanter are not enough to produce scientific plots. i really tried to use clojure+gorilla in competition to R and Python and it is not worth it with the pure clojure approach
even the JVM has few good plotting options imo. compare them to plotly for example, to which i have converged for now mostly because it has good examples of how to do scientific plots with it.
@justalanm what are you working on?
Yeah, unfortunately I've also noticed that Gorilla is not very alive (Incanter is even worse...). I'm a data scientist at an insurance company, and I'm introducing Clojure at work for data engineering tasks (ETL, pipelines, batch processing, and so on). I feel that what we lack is not an environment such as Incanter: working with sequences of maps is easy, straightforward and pretty fast (easy parallelization is what really sold me on Clojure for data engineering). What we lack is something like scikit-learn for Python
i think we should outsource side-effects like plotting, worksheets and maybe even some dataformats (hdf5) and so on to standards and focus on core processing
for the core control flow and algorithms we would need to have something in clojure/java
I didn't know about Anglican, I'll take a look at it. We use more or less 50% of scikit-learn's facilities: many classifiers, metrics, decomposition and regression algos
When I have some more time I'd like to experiment with https://neanderthal.uncomplicate.org/
We're throwing a lot of XGBoost and other ensembles at problems, but I totally dislike its API (when it works, because we have many issues with it)
Anyway, I don't much like Jupyter notebooks: people tend to use them as IDEs, and converting them to standalone scripts is non-trivial, while Gorilla's worksheets are just .clj files with comments. A much better idea in my opinion
i think it would be easy to have a one way extraction of the jupyter json into a clj file
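A minimal sketch of the one-way extraction mentioned above, assuming the standard notebook JSON layout (a top-level `cells` list whose entries carry `cell_type` and `source`). The function names are hypothetical, and the output uses plain `;;` comments for non-code cells rather than Gorilla's own worksheet markers:

```python
import json


def cell_source(cell):
    """Return a cell's source as one string (the JSON stores a string or a list of lines)."""
    src = cell.get("source", "")
    return "".join(src) if isinstance(src, list) else src


def notebook_to_clj(nb):
    """Turn a parsed notebook dict into Clojure source text.

    Code cells are copied verbatim; every other cell becomes ';;' comments.
    Outputs are dropped, so the conversion is one-way.
    """
    chunks = []
    for cell in nb.get("cells", []):
        text = cell_source(cell).rstrip("\n")
        if not text:
            continue
        if cell.get("cell_type") == "code":
            chunks.append(text)
        else:
            commented = [(";; " + line).rstrip() for line in text.split("\n")]
            chunks.append("\n".join(commented))
    return "\n\n".join(chunks) + "\n"


def convert_file(ipynb_path, clj_path):
    """Read a notebook file and write the extracted .clj file (paths are caller-supplied)."""
    with open(ipynb_path, encoding="utf-8") as f:
        nb = json.load(f)
    with open(clj_path, "w", encoding="utf-8") as f:
        f.write(notebook_to_clj(nb))


# Tiny in-memory example, standing in for a notebook that uses a Clojure kernel.
example = {
    "cells": [
        {"cell_type": "markdown", "source": ["# Load data\n", "Some notes"]},
        {"cell_type": "code", "source": ["(def xs (range 10))\n", "(reduce + xs)"], "outputs": []},
    ]
}
clj_text = notebook_to_clj(example)
```

Going the other way (from .clj back to a notebook) would need a convention for marking cell boundaries, which this sketch deliberately leaves out.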
cider also seems to support images in the REPL now, which might be good enough for plots and actually a pretty powerful environment
i have worked on a core.matrix wrapper around neanderthal: https://github.com/cailuno/denisovan
neanderthal provides some nice low-level APIs and primitives, but it is not like numpy or scikit-learn
Nice! That's a very good idea. Anyway the strong part of scikit-learn is not all the algos, but a common interface and all the "tooling" like metrics, plotting and persistence facilities
If you prefer you can fire me an email at [email protected]
@justalanm are you aware of https://www.cs.waikato.ac.nz/ml/weka/ ?
alternatively this might be the way to go if we would like to stay in clojure https://github.com/cloudkj/lambda-ml
Let's continue on the main thread then
Yeah, I know about both of them, but as you just said, Weka is Java, and anyway both lack the scope of scikit-learn. I feel like this can't be outsourced (as with Weka), and though I started with R and still use it extensively, I can perfectly understand that having one coherent API to perform maybe 90% of the modeling tasks most of us need is a huge benefit
Most of my colleagues use Python because it's mainstream and because it's a one-stop shop. And when I say "use" I mean it: most of them have never bothered about Python internals, how to solve difficult and real problems and so on
So I would say:
- One consistent API (or DSL in the Clojure case, it doesn't make much difference)
- Good performance out of the box (better if GPU enabled)
- All the tooling required to perform an analysis from start to finish (metrics, plotting facilities, etc)
- Nice to have: XGBoost and maybe even https://github.com/catboost/catboost implemented within the same API and some deploying facilities

These are the factors that would make Clojure viable as a machine learning ready language for the mainstream. We can bring to the table immutability, easy parallelization and probably very good performance out of the box and the JVM (with all its pros and cons)
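To make the "one consistent API" point concrete, here is a rough sketch of the idea in Python. It is modelled loosely on the fit/predict convention that scikit-learn popularised, but every class and function name below is hypothetical and nothing here is scikit-learn's actual code. The point is only that when every model exposes the same two methods, tooling such as metrics can be written once and reused:

```python
from collections import Counter


class MajorityClassifier:
    """Baseline model: always predicts the most frequent training label."""

    def fit(self, X, y):
        self.label_ = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, X):
        return [self.label_ for _ in X]


class NearestNeighbourClassifier:
    """1-nearest-neighbour model using squared Euclidean distance."""

    def fit(self, X, y):
        self.X_, self.y_ = list(X), list(y)
        return self

    def predict(self, X):
        def dist(a, b):
            return sum((p - q) ** 2 for p, q in zip(a, b))

        return [
            self.y_[min(range(len(self.X_)), key=lambda i: dist(x, self.X_[i]))]
            for x in X
        ]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def evaluate(model, X_train, y_train, X_test, y_test):
    """Works for any object with fit/predict; it never inspects the model type."""
    return accuracy(y_test, model.fit(X_train, y_train).predict(X_test))


X_train = [[0.0], [1.0], [10.0], [11.0], [12.0]]
y_train = ["a", "a", "b", "b", "b"]
X_test = [[0.5], [10.5]]
y_test = ["a", "b"]

scores = {
    type(m).__name__: evaluate(m, X_train, y_train, X_test, y_test)
    for m in (MajorityClassifier(), NearestNeighbourClassifier())
}
```

In Clojure the same contract could plausibly be expressed as a protocol or as plain functions over maps; the sketch takes no position on which would be better.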
Oh, and of course piping stuff and layers (for neural nets) is much better in Clojure than in other languages. I'm aware of Cortex and I kinda like it, but it's easy to beat TensorFlow's clunky syntax...
i will be working on autograd next, i already have done some preliminary work last autumn: https://github.com/whilo/clj-autograd and will take a look whether this is mergeable with flare
@U1C36HC6N I am trying to adopt your autograd in my TH library binding for Common Lisp. Can you point me to references on the design of your autograd code? I’ve managed to write a port for Lisp (https://bitbucket.org/chunsj/th/src/master/ad/) and I’d like to extend it along the lines of PyTorch. Thank you.
this is a general website for the autodiff community: http://autodiff.org/
i can recommend http://www.robots.ox.ac.uk/~gunes/assets/pdf/baydin-2018-ad-machinelearning.pdf as an overview
this is reverse mode autodifferentiation, in neural networks it is often called backpropagation
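For readers new to the term, a toy illustration of reverse-mode automatic differentiation on scalars follows. It is a generic textbook-style sketch, not the design of clj-autograd or PyTorch; the `Var` class and `backward` function are names invented for this example:

```python
class Var:
    """A scalar value that remembers how it was computed."""

    def __init__(self, value, parents=()):
        self.value = value
        # Each parent is a pair (parent Var, local derivative of self w.r.t. that parent).
        self.parents = parents
        self.grad = 0.0

    def __add__(self, other):
        return Var(self.value + other.value, ((self, 1.0), (other, 1.0)))

    def __mul__(self, other):
        return Var(self.value * other.value, ((self, other.value), (other, self.value)))


def backward(out):
    """Accumulate d(out)/d(v) into v.grad for every Var that out depends on."""
    order, seen = [], set()

    def visit(v):
        # Depth-first walk producing a topological order (inputs before outputs).
        if id(v) in seen:
            return
        seen.add(id(v))
        for parent, _ in v.parents:
            visit(parent)
        order.append(v)

    visit(out)
    out.grad = 1.0
    # Walk from the output back to the inputs, applying the chain rule at each step.
    for v in reversed(order):
        for parent, local in v.parents:
            parent.grad += local * v.grad


x = Var(3.0)
y = Var(4.0)
z = x * y + x  # z = x*y + x, so dz/dx = y + 1 and dz/dy = x
backward(z)
```

After `backward(z)`, `x.grad` holds y + 1 and `y.grad` holds x. A single backward pass yields the derivative with respect to every input, which is why this mode suits functions with many parameters and one scalar loss.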
@justalanm what are you missing from lambda-ml?
As I already said the whole tooling part: http://scikit-learn.org/stable/modules/classes.html#module-sklearn.metrics for instance, and a common API