#data-science
2017-08-14
aaelony02:08:34

@blueberry, that is exactly right, python became well-utilized because of numpy, scipy, scikit-learn and other libraries. But how did people learn to use these python libraries? Pretty much because of notebooks and the IPython notebook. You can see this clearly because R had much of this utility long before python had it. R has a long tradition of literate programming, but embedded in the IDE, either via the R environment or later RStudio (and many others). knitr is an example. I'd like to see data on this, but my guess today is that Python is much more popular than R because of the impact of web-based Notebooks.

aaelony02:08:16

now, popularity is not everything (at least in my book), and I actually love and prefer org-mode and emacs, just saying though that the expected result will be different because the emacs barrier is perceived to be high.

elise_huard07:08:44

@aaelony I agree, dockerized seems difficult, also I haven't tried that one yet, just thought it looked promising

elise_huard07:08:50

we're on classical gorilla-repl atm

blueberry08:08:40

@aaelony Python and R exactly show that notebooks are not that important. R works around RStudio (an IDE practically) and is hugely popular. Python dominates Deep Learning (due to available libraries and push from Google and Facebook, not due to notebooks, I'd say), but R dominates "data science" (due also to thousands of readily available packages for whatever analysis people can think of). Also, note that many people use Python from a proper development environment...

jsa-aerial13:08:07

For what it's worth, I'd say you have this exactly right. Notebooks (and friends) are more the 'frosting on the cake' - not the cake.

aaelony17:08:49

I'm not of that opinion, and I believe the popularity data bears that out. I believe that if we want clojure popularity for ds to rise, a Notebook angle is vital. In particular a library of Notebooks is even more vital.

aaelony17:08:04

You can argue that blogs can be just as good as Notebooks, but being able to tweak running examples is vital

blueberry17:08:49

@aaelony that might be, but it seems that we have a fine notebook (GorillaREPL was ok when I tried it 2.5 years ago) but somehow the pages are empty.

aaelony18:08:06

We're largely in agreement. I believe however that a widely available Gorilla repl project, possibly in a docker container so that it "just works", for Bayadera and Neanderthal might dramatically increase adoption. I do think that a priori installs might be hurting adoption. Maybe even a 'boot' based project that can hotload or guide the user through the install might be cool too. Btw, on blueberry's recommendation I bought the DBDA book and it is surprisingly well-written. The R code is a bit out of date, but still nice. Haven't seen anything about K-L Divergence yet but they can't cover everything. Great read so far.

blueberry08:08:24

not to mention that all people in this thread, including you, say they prefer something other than notebooks for themselves. so, should we basically advise people in a "do what I say, not what I do" manner?

blueberry08:08:36

I'm not saying "abandon notebooks". I'm just saying that we should not expect anything spectacular out of it. Hardly anyone will be impressed, and if we don't provide impressive content for those notebooks, nobody will bat an eye.

jsa-aerial13:08:17

I also don't think we should expect a whole lot from any notebook either - not because it couldn't be spectacular, but because it would require a massive effort to be so. Certainly it is now imaginable (if not readily plausible) to create something very impressive. The enabling technology is there: Atom, Reagent/Re-Frame, D3/Vega, cljsjs, Websockets, etc. But it wouldn't be a case of 'falling off a log'...

elise_huard14:08:06

@otfrom thank you, need to check that one out 🙂

aaelony16:08:21

fwiw, you can also use R directly from emacs, but most people don't do that (but I do)

blueberry17:08:19

well, I guess most people don't even know what Python or R are 🙂

blueberry17:08:47

Even facebook is not used by most people, just some 20% of them.

blueberry17:08:54

I appreciate the aspect of popularity, but am frankly more concerned with what I can do with some technology. As long as there is useful stuff to learn from other people and use in my work I'm happy 😉

blueberry17:08:03

so, we are discussing and comparing the attractiveness of notebooks that do not exist despite the notebook technology for clojure being there for almost 3 years, with the blogs that are also not numerous, but at least there is something tiny there of various quality.

elise_huard17:08:52

@jsa-aerial for me the most interesting talk of euroclojure 2017!

jsa-aerial17:08:01

@blueberry I don't think the technology for a very impressive notebook was there 3 years ago - in anything. Well, maybe the 'bricks and mortar' and 'nuts, bolts and screws'. The thing in Gorilla that is very nice is the renderer. The rest of it is serviceable, but is based in fairly primitive (by today's std) SPA tech. ProtoRepl is a bit of a tease at what is possible, but is also a fairly 'small

jsa-aerial18:08:28

OK, just to be clear - I was talking browser based stuff. Certainly MATLAB is there. But I do think the browser is the future for any of this sort of thing.

blueberry18:08:38

@jsa-aerial whether the particular notebook technology is impressive or just ok is not very relevant here. The point is that the pages of the notebooks are empty. Sure, a better notebook technology would make the content a bit prettier or a bit more convenient. But for the content to be more shiny, there first has to be some material for the content. Or, to put it in one more analogy: there has been a passable and working notebook for the past 3 years, but there were either no pencils, or no people who know how to write, or simply no literate people with pencils who liked to share their diaries in such a way. Whatever it was, it was obviously not because the notebook was not pretty enough.

jsa-aerial19:08:34

As I have already mentioned, I pretty much totally agree with the core of what you are saying here. However, don't mix that up with lack of functionality. It isn't so much 'prettier' or 'more convenient' as 'scope too limited to be widely useful'. Put another way - even with some really great content, or just nice internal content, if the limitations of 'the notebook' make rendering that content fully (or even mostly) not really feasible, it won't happen.

blueberry18:08:30

@aaelony I'm glad that you liked the DBDA book 🙂

aaelony18:08:23

very well written so far, very clear exposition

blueberry18:08:38

I don't know which chapter you're at, but the more practical stuff starts at the 4XX-ish page

blueberry18:08:25

Everything before that is a necessary but toy-ish introduction

blueberry18:08:40

Which is great

aaelony18:08:56

The material is not new to me, the book is. I read it when I can, after hours

blueberry18:08:08

because you can actually understand the basics before trying to understand something real

aaelony18:08:04

it is nice to rethink things through from the perspective of the author

blueberry18:08:43

regarding docker: well, the user first has to have docker configured. how is that easier than having leiningen configured? I get docker's usefulness for the python/ruby/r ecosystems, which have a messy library building and versioning story, but IMHO, java/maven solved the problem docker was supposed to solve at least 15 years ago.

blueberry18:08:46

i agree that anything that reduces friction is useful

blueberry18:08:57

but the friction here is not in the install process

blueberry18:08:08

or, there is some friction there

blueberry18:08:19

but it's like 0.1% of the friction

blueberry18:08:48

if you reduce that by half, you're still left with 99.95% friction 🙂

blueberry18:08:09

I believe there is more space for improvement in those 99.95%

blueberry18:08:53

of course, that's my perspective, and other people should do what they think is best

blueberry18:08:36

for example, neanderthal is completely free (in all senses). anybody can make that docker image if they think it'll help

blueberry23:08:08

however, python rose to fame using the even less capable and older-fashioned notebook technology of 4-5 years ago. the main thing people are interested in is something like "wow, what a useful way to analyze genetic data using technique x with library y and plot it with matplotlib", not "hey, this can be edited with something that uses web sockets and has Atom underneath". at least not when machine learning is in focus.

blueberry23:08:24

sure, having a more responsive and prettier web editor can't hurt, but only when there is something interesting as the content in the first place, and at that point, anyone really interested is interested enough and capable enough to use a more serious, "real" environment if that is required.

blueberry23:08:07

and r and python are the right example. using them with a blas backend was (and probably still is) really complicated, to the point that there is a company selling the package, and that didn't matter. people were willing to pay with time AND money, because they could use library x for analysis y and get their job done.