#data-science
2022-03-04
leifericf15:03:22

Dropping this comment here. Perhaps someone else has something to add to this 👇

Joshua Suskalo16:03:17

@leif.eric.fredheim here's probably a good spot to make a thread to continue discussion of a batteries-included one-click-install ClojureStudio for data scientists 🧵

👍 1
Joshua Suskalo16:03:12

So as I'm looking at this and the surrounding tools, I'm starting to wonder if it would be possible to use cljfx to embed the reveal repl as the repl window, paravim inside an opengl canvas window, and potentially other tools as parts of the window as well to make a single integrated experience rather than a bunch of windows from the different tools.

pavlosmelissinos16:03:02

How about Clojupyter? https://github.com/clojupyter/clojupyter In my experience, Jupyter is used quite a lot in Python for prototype work and demos by the DS community. It's not exactly a one-click install, but something on top of deps.edn and Clojupyter might be able to bring some value? Edit: check out https://github.com/nextjournal/clerk instead; it seems like a better choice.

Joshua Suskalo16:03:35

So the goal is to make something that's like a one-click install with batteries-included libraries. If that's possible with this, I'm totally up for it if DS people think it fits better. I myself have no experience in this domain, but I know a reasonable amount about the different editor projects around and how to package things well.

mkvlr17:03:44

with #clerk we're building a notebook. It's batteries included regarding visualization libraries but it's editor & library agnostic. Which doesn't mean that you can't bundle it. https://nextjournal.github.io/clerk-demo/notebooks/data_science.html is a data science example.

3
Joshua Suskalo17:03:00

Right, part of the goal here is to bundle an editor as well. Bundling a project that works with the builtin editor and also has stuff for building the notebook frontend to actually publish what you build seems like a good idea though.

mkvlr17:03:53

for clojure editing I can also plug https://github.com/nextjournal/clojure-mode ;) we're also thinking about extracting and open sourcing the rich text editor (integrated with clojure-mode) that we built for http://nextjournal.com

Joshua Suskalo17:03:28

Ah, is that something that could act as an in-development replacement of nightlight?

Joshua Suskalo17:03:17

'cause the goal with that route would be to be able to just start up the project and it'll automatically fire up the editor with a connected repl, which is what nightlight and paravim would do.

Joshua Suskalo17:03:24

I'll have to look closer at what this one is doing!

mkvlr17:03:19

you can certainly build this with these pieces, see https://nextjournal.com/try/clojure

👀 1
Joshua Suskalo17:03:21

Yeah, this looks fantastic for journals

Carsten Behring21:03:41

I like Emacs... But if I were to build such a tool, it would be built on Calva, the Clojure plugin for Visual Studio Code. Can we maybe add Clerk "by default" to Calva?

chrisn22:03:20

I think another very close option is https://github.com/jsa-aerial/saite. It comes bundled with tmd, neanderthal, etc. and is meant to be very much an all-in-one toolkit for science in general using Clojure.

👀 1
Joshua Suskalo22:03:29

So @U7CAHM72M there are two things being mixed up there. First, Calva is a general-purpose addon to vscode for Clojure, but it doesn't really control your project's dependencies besides nrepl stuff. While it would be feasible to add extra dependencies to your project for something like clerk, it might be very surprising to developers whose own work uses clerk, because of version issues. The second thing is that the goal here is to make an easily installable single application which could compete with RStudio, not so much a collected set of nice things to add to an existing clojure setup.

jsa-aerial22:03:34

As @UDRJMEFSN pointed out, this is pretty much exactly what Saite is intended to be and in large measure is: a self-installing and running uberjar with most batteries included. At this point, it is pretty much all I use in my day-to-day data analysis work in genomics. There are two new videos: one going over the basics, [Saite : A Clojure studio for data exploration and dashboards](https://www.youtube.com/watch?v=yOHmzbL5BV8), and one on advanced features, including the construction of dashboards, [Advanced Topics in Saite for Dashboards and Data Exploration](https://www.youtube.com/watch?v=bVlwrNTDzvQ)

👀 1
aaelony22:03:39

another key part of the R ecosystem (with or without Rstudio), is the ability to dynamically install new R packages from your current environment, either via install.packages(...) or via remotes::install_github(...) (https://github.com/r-lib/remotes)

Joshua Suskalo22:03:35

The stuff about Saite is pretty awesome. And yeah, I agree installing dependencies at runtime is a pretty nice bit, although there is work being done on that with clojure tools.

aaelony23:03:47

agree. Other than that, there are a ton of data-centric libraries out there in R, that I haven't seen in Python or anywhere else. An example might be date processing via https://github.com/arg0naut91/neatRanges where you can expand, collapse, partition, fill date ranges that commonly arise in data
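To make "expand" and "collapse" concrete, here is a minimal sketch of both operations in plain Clojure with `java.time.LocalDate` interop. It is not the neatRanges API and not something taken from tick; the function names `expand-range` and `collapse-ranges` are hypothetical, and ranges are assumed to be `[start end]` pairs with inclusive ends.

```clojure
(import 'java.time.LocalDate)

;; Hypothetical helper: every day from start to end, inclusive.
(defn expand-range
  [^LocalDate start ^LocalDate end]
  (take-while #(not (.isAfter ^LocalDate % end))
              (iterate #(.plusDays ^LocalDate % 1) start)))

;; Hypothetical helper: merge [start end] ranges that overlap or
;; touch (the next range starts no later than the day after the
;; previous range ends).
(defn collapse-ranges
  [ranges]
  (reduce (fn [acc [^LocalDate s ^LocalDate e]]
            (let [[ps ^LocalDate pe] (peek acc)]
              (if (and pe (not (.isAfter s (.plusDays pe 1))))
                ;; extend the previous range instead of adding a new one
                (conj (pop acc) [ps (if (.isAfter e pe) e pe)])
                (conj acc [s e]))))
          []
          (sort-by first ranges)))

(def stays
  [[(LocalDate/of 2022 1 1) (LocalDate/of 2022 1 5)]
   [(LocalDate/of 2022 1 6) (LocalDate/of 2022 1 10)]
   [(LocalDate/of 2022 2 1) (LocalDate/of 2022 2 3)]])

(collapse-ranges stays)
;; the first two ranges touch, so they merge; the February one stays separate
```

Sorting by start date first means a single pass over the ranges is enough, since each range only ever needs to be compared with the most recently collapsed one.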

aaelony23:03:04

another nice functionality in R that is very doable in clojure, if it isn't there already, is the idea of "nesting" data tables (or data frames) within a data table. e.g. https://stackoverflow.com/questions/25430986/create-nested-data-tables-by-collapsing-rows-into-new-data-tables
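As a rough illustration of the nesting idea, the sketch below uses only `clojure.core`, with a table modelled as a vector of maps. The data, the column names, and the `nest-by` helper are all invented for this example; dataset libraries may offer their own way to do this.

```clojure
;; Invented example data: one row per patient visit.
(def visits
  [{:patient "a" :day 1 :score 10}
   {:patient "a" :day 2 :score 12}
   {:patient "b" :day 1 :score 7}])

;; Hypothetical helper: collapse rows that share the same value of k
;; into one outer row whose :data column holds an inner table made of
;; the remaining columns.
(defn nest-by
  [k rows]
  (->> rows
       (group-by k)
       (sort-by key) ; fix the order of the outer rows
       (mapv (fn [[v group]]
               {k v
                :data (mapv #(dissoc % k) group)}))))

(def nested (nest-by :patient visits))
;; nested has one row per patient; each :data value is itself a table
```

Because the inner tables are ordinary vectors of maps, the usual sequence functions work on them, for example `(map #(count (:data %)) nested)` to count visits per patient.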

chrisn23:03:23

tick has a lot of the date functionality, for instance https://host-22a9c.web.app/.

chrisn23:03:54

tmd datasets can already be nested - only example I have is the result of https://techascent.github.io/tech.ml.dataset/tech.v3.dataset.reductions.html#var-reservoir-dataset. The tablecloth system includes the idea of grouped vs ungrouped tables which is a concept coming from R or the database world.

👀 1
chrisn23:03:29

Lots of clojure systems use pomegranate to include dependencies dynamically. This works fine until you have an actual conflict at which point you will get odd errors.
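For reference, dynamic loading with pomegranate looks roughly like the sketch below, which follows the example in the pomegranate README. It assumes pomegranate is already on the classpath and that network access to the repositories is available; the `incanter` coordinate is only a placeholder for whatever library you want to pull in.

```clojure
;; Assumes pomegranate is already on the classpath.
(require '[cemerick.pomegranate :refer [add-dependencies]]
         '[cemerick.pomegranate.aether :as aether])

;; Resolve the coordinate (plus its transitive dependencies) and add
;; the resulting jars to the running JVM's classpath.
(add-dependencies
 :coordinates '[[incanter "1.9.2"]] ; placeholder coordinate
 :repositories (merge aether/maven-central
                      {"clojars" "https://clojars.org/repo"}))
```

Classes that are already loaded are not replaced, which is one way a conflicting version pulled in this way can surface as the odd errors mentioned above.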

chrisn23:03:59

In any case, you can go saite, clerk, notespace, or clojupyter and get a notebook experience. The https://scicloj.github.io/scicloj.ml-tutorials/userguide-intro.html were written using notespace which is IMO another project to consider.

chrisn23:03:26

Specifically R studio though I think is most closely approximated by saite.

👍 1
aaelony23:03:32

if I am not mistaken, tick.core/range might be more like R's lubridate, rather than neatRanges, but points well taken

leifericf11:03:38

My attempt to summarize: It seems to me that the general theme of this discussion is that most of the things are already possible with Clojure and the available tools and libraries. The difficult part is for the user to discover and compose the right tools and libraries, which is not something that, say, a statistician, biologist, geologist or sociologist knows how to do. R (more specifically https://www.tidyverse.org and RStudio) has solved this problem by providing a “complete, cohesive and pre-packaged” solution, which allows such domain experts to focus on their domain instead of what they would consider to be “accidental complexity.” And there are already some initiatives within the Clojure community, like Clerk and Saite, which have similar goals.

chrisn14:03:35

Yes, I mostly agree with that. I would add that with Clojure you have a great REPL experience and you have always had Emacs org-mode, so the distinct need for something like RStudio is significantly decreased. We have done lots of data science, including with many different visualization systems, without that. But we aren't scientists, and in general the scientists are going to Python or R, so the market for something like RStudio in Clojure is far, far smaller. Existing non-studio-like tools take a solid chunk out of it, and the intended audience is just very, very small. I think the best place to go for anything like this is the scicloj website, which you do find if you google clojure and data science. All these tools are things you can find from there. If the scientists don't happen to land on scicloj then they won't find the majority of the toolkits/libraries because google won't help them; for instance, googling "R studio Clojure" doesn't turn up any of the above systems.

👍 1
Carsten Behring17:03:32

I would agree as well that the tidyverse helped a lot with that for R, as most data analysis problems can be done on the same type of data frame. We now have TDM, which is equally capable, and I hope the set of compatible libraries will grow. is indeed one example for machine learning.

👀 1
leifericf09:03:28

@U7CAHM72M Here’s a dumb question… What’s “TDM?” 😅