gorilla is nice, but it is hardly maintained, lacks many features of jupyter and is less beginner friendly (jupyter has a nice UI with menu, is known to many people in the data science environment and has a lot of documentation everywhere).


i have used gorilla and also contributed a patch, but i think the clojure strategy is in general trying to use a powerful hosted environment where it is compatible with clojure's value proposition and functional nature instead of reinventing the wheel.


the same holds for plotting. gorilla and incanter are not enough to produce scientific plots. i really tried to use clojure+gorilla in competition with R and Python, and it is not worth it with the pure clojure approach


even the JVM has few good plotting options imo. compare them to plotly for example, to which i have converged for now mostly because it has good examples of how to do scientific plots with it.


@justalanm what are you working on?


Yeah, unfortunately I noticed as well that Gorilla is not very active (Incanter is even worse...). I'm a data scientist at an insurance company, and I'm introducing Clojure at work for data engineering tasks (ETL, pipelines, batch processing, and so on). I feel that what we lack is not an environment such as Incanter: working with sequences of maps is easy, straightforward and pretty fast (easy parallelization is what really sold me on Clojure for data engineering). What we lack is something such as scikit-learn for Python


what do you use of scikit-learn?


i am atm. working on anglican, the probabilistic programming language


i think we should outsource side-effects like plotting, worksheets and maybe even some dataformats (hdf5) and so on to standards and focus on core processing


in that sense you are right


for the core control flow and algorithms we would need to have something in clojure/java


I didn't know about Anglican, I'll take a look at it. We use more or less 50% of scikit-learn's facilities: many classifiers, metrics, decomposition and regression algos


When I have some more time I'd like to experiment with


We're throwing a lot of XGBoost and other ensembles at problems, but I totally dislike its API (when it works, because we have many issues with it)


Anyway, I don't much like jupyter notebooks: people tend to use them as IDEs, and converting them to standalone scripts is non-trivial, while Gorilla's worksheets are just .clj files with comments. A much better idea in my opinion


true, but they also can become huge


i could not always easily load them in an editor


i think it would be easy to have a one way extraction of the jupyter json into a clj file


but i am not sure whether this is good enough
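A rough sketch of what such a one-way extraction could look like, written in Python using only the standard `json` module. It assumes the nbformat 4 layout (a top-level `cells` list whose entries carry `cell_type` and `source`), copies code cells verbatim, and turns markdown cells into `;;` comments. The function name `notebook_to_clj` is made up for illustration, outputs and metadata are dropped, and the result is a plain .clj file rather than Gorilla's own worksheet format.

```python
import json


def notebook_to_clj(nb_json: str) -> str:
    """One-way conversion of nbformat 4 notebook JSON into Clojure source text.

    Hypothetical helper: code cells are copied as-is, every other cell type
    becomes a block of ;; comments, and cell outputs are discarded.
    """
    nb = json.loads(nb_json)
    chunks = []
    for cell in nb.get("cells", []):
        source = cell.get("source", "")
        # nbformat allows "source" to be a single string or a list of lines.
        text = "".join(source) if isinstance(source, list) else source
        if not text.strip():
            continue  # skip empty cells
        if cell.get("cell_type") == "code":
            chunks.append(text.rstrip("\n"))
        else:
            commented = [(";; " + line).rstrip() for line in text.splitlines()]
            chunks.append("\n".join(commented))
    return "\n\n".join(chunks) + "\n"


# A tiny in-memory notebook, assumed to come from a Clojure kernel.
example = json.dumps({
    "nbformat": 4,
    "nbformat_minor": 2,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {},
         "source": ["# Demo\n", "A tiny worksheet."]},
        {"cell_type": "code", "metadata": {}, "outputs": [],
         "execution_count": None,
         "source": ["(def xs (range 5))\n", "(reduce + xs)"]},
    ],
})

clj_text = notebook_to_clj(example)
print(clj_text)
```

Because outputs are thrown away, going back from the .clj file to the original notebook is not possible, which is the limitation raised above.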


i agree that the worksheet approach has limits


cider also seems to support images in the REPL now, which might be good enough for plots and actually a pretty powerful environment


similar to proto REPL


i have worked on a core.matrix wrapper around neanderthal:


neanderthal provides some nice low-level APIs and primitives, but it is not like numpy or scikit-learn


btw. i hate this threading interface of slack


(and i hate slack)


Nice! That's a very good idea. Anyway the strong part of scikit-learn is not all the algos, but a common interface and all the "tooling" like metrics, plotting and persistence facilities
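To make the "common interface plus tooling" point concrete, here is a minimal Python sketch. It only mimics the `fit`/`predict` convention that scikit-learn estimators follow. The two toy models, the `accuracy` metric and the `evaluate` helper are all invented for illustration and are not scikit-learn code.

```python
from collections import Counter


class MajorityClassifier:
    """Toy model: always predicts the most frequent training label."""

    def fit(self, X, y):
        self.label_ = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, X):
        return [self.label_ for _ in X]


class NearestNeighbor:
    """Toy model: predicts the label of the closest training row."""

    def fit(self, X, y):
        self.X_, self.y_ = list(X), list(y)
        return self

    def predict(self, X):
        def closest(row):
            # Squared Euclidean distance to every stored training row.
            dists = [sum((a - b) ** 2 for a, b in zip(row, seen))
                     for seen in self.X_]
            return self.y_[dists.index(min(dists))]
        return [closest(row) for row in X]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def evaluate(model, X_train, y_train, X_test, y_test):
    """Shared tooling: works for any object exposing fit and predict."""
    return accuracy(y_test, model.fit(X_train, y_train).predict(X_test))


X_train = [[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1], [1.2, 0.8]]
y_train = ["a", "a", "b", "b", "b"]
X_test = [[0.0, 0.1], [1.1, 0.9]]
y_test = ["a", "b"]

scores = {
    type(model).__name__: evaluate(model, X_train, y_train, X_test, y_test)
    for model in (MajorityClassifier(), NearestNeighbor())
}
print(scores)
```

The point is that `evaluate` never needs to know which model it receives, so metrics, plotting and persistence can be written once against the shared interface. In Clojure the same role could presumably be played by a protocol.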


I agree :)


If you prefer you can fire me an email at [email protected]


or we keep discussing in the main slack channel


i think this is maybe interesting to others as well


i just don't like slack because of the paywall to my own content


i cannot look up stuff i discussed with others later


Let's continue on the main thread then


alternatively this might be the way to go if we would like to stay in clojure


Yeah I know about both of them, but as you just said Weka is Java and anyway both lack the scope of scikit-learn. I feel like this can't be outsourced (as Weka) and though I started with R and still use it extensively I can perfectly understand the fact that having one coherent API to perform maybe 90% of the modeling tasks most of us need is a huge benefit


Most of my colleagues use Python because it's mainstream and because it is a one-stop shop. And when I say "use" I mean it: most of them have never bothered about Python internals, how to solve difficult and real problems and so on


So I would say:
- One consistent API (or DSL in the Clojure case, it doesn't make much difference)
- Good performance out of the box (better if GPU enabled)
- All the tooling required to perform an analysis from start to finish (metrics, plotting facilities, etc)
- Nice to have: XGBoost and maybe even implemented within the same API and some deploying facilities

These are the factors that would make Clojure viable as a machine learning ready language for the mainstream. We can bring to the table immutability, easy parallelization and probably very good performance out of the box and the JVM (with all its pros and cons)


Oh and of course piping stuff and layers (for neural nets) is much better in Clojure than in other languages. I'm aware of Cortex and I kinda like it, but it's easy to beat TensorFlow's clunky syntax...


do you know pytorch? it is extremely good syntax wise


it feels like python


i will be working on autograd next, i already have done some preliminary work last autumn: and will take a look whether this is mergeable with flare


@U1C36HC6N I am trying to adopt your autograd in my TH library binding for Common Lisp. Can you provide me references on your design of autograd code? I’ve managed to write a converted code for lisp ( and I’d like to extend this as in the case of pytorch. Thank you.


this is a general website for the autodiff community:


this is reverse mode autodifferentiation, in neural networks it is often called backpropagation


effectively all deep learning libraries do this nowadays


(to my knowledge)
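For readers new to the idea, a minimal reverse-mode sketch in Python follows. It is not the design of the autograd code discussed in this thread (its source is not shown here), only an illustration of the general technique: each operation records its inputs together with the local derivative, and a backward pass walks the graph from the output and accumulates gradients via the chain rule. The names `Var`, `sin` and `backward` are made up for this example.

```python
import math


class Var:
    """Scalar value that remembers how it was computed."""

    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents  # tuple of (parent Var, local derivative)
        self.grad = 0.0

    def __add__(self, other):
        other = other if isinstance(other, Var) else Var(other)
        return Var(self.value + other.value, ((self, 1.0), (other, 1.0)))

    def __mul__(self, other):
        other = other if isinstance(other, Var) else Var(other)
        return Var(self.value * other.value,
                   ((self, other.value), (other, self.value)))


def sin(x):
    """Sine of a Var; the local derivative is cos(x)."""
    return Var(math.sin(x.value), ((x, math.cos(x.value)),))


def backward(output):
    """Accumulate d(output)/d(node) into node.grad for every node."""
    order, seen = [], set()

    def visit(node):
        # Depth-first post-order gives a topological ordering of the graph.
        if id(node) in seen:
            return
        seen.add(id(node))
        for parent, _ in node.parents:
            visit(parent)
        order.append(node)

    visit(output)
    output.grad = 1.0
    # Walk from the output back to the inputs, applying the chain rule.
    for node in reversed(order):
        for parent, local in node.parents:
            parent.grad += local * node.grad


x, y = Var(2.0), Var(3.0)
z = x * y + sin(x)
backward(z)
print(z.value, x.grad, y.grad)  # analytically: dz/dx = y + cos(x), dz/dy = x
```

One backward pass yields the gradient with respect to every input at once, which is why this mode suits neural networks with many parameters and a single scalar loss.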


@justalanm what are you missing from lambda-ml?


As I already said the whole tooling part: for instance, and a common API


And docs, a lot of docs and tutorials