2016-02-22
@aaelony: The packages are what keep R on the table for us. While the real work they do could be implemented in Clojure (or other languages), the development time it would take to write those packages in Clojure would have to be balanced by some performance or other gain in the end. Right now that isn't there, so I use Incanter for small projects and R for big ones.
@nkraft, that is interesting. Having used R for many years, I would say R's historic weakness is that it doesn't scale well; if your data is small enough to fit in memory, you might as well use R for that too.
@aaelony: I suppose it depends on the packages. We use pbdR for large data sets, and there are others out there for R and big data. I'm no fan of R. I would move on and never look back if I could, but most of the R packages "just work," which makes my job easier. Clojure is great and one of my favorite languages, but it lacks maturity in the big data arena.
Clojure is my favorite language as well. I actually use it for large data instead of R. I also enjoy the solid, production-grade JVM environment, and being in the lingua franca of JVM production is a leg up over having something in R that you then have to translate...
@aaelony: I think we've drifted a bit. We were talking about Incanter, specifically, and I was saying that I use R for the kind of things that library provides when working with large data sets. That's by no means a preference for all things Clojure or all things R. Each has its place, and each has its use case. I use R to create interactive data visualizations destined for management types when working with datasets larger than Incanter can handle. I use Clojure for lots of other things.
Interesting. As for myself, I haven't used Incanter in quite a while. For visualizing things, I would favor gyptis or vizard, which take the Vega.js and D3.js approach. For amazing visualizations, cljsjs with d3 is the cutting edge.
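To make the Vega approach concrete, here is a minimal sketch: the chart is just a declarative Vega-Lite spec expressed as a Clojure map. The `plot!` call is hypothetical - the exact spec flavor and rendering function depend on the library you use (vizard, gyptis, etc.).

```clojure
;; A minimal Vega-Lite spec written as plain Clojure data.
;; Libraries in this space render maps like this in the browser;
;; `plot!` below is a placeholder for whichever rendering function
;; your library actually provides.
(def time-series-diff
  {:data     {:values [{:date "2016-02-01" :series "a" :value 12}
                       {:date "2016-02-01" :series "b" :value 15}
                       {:date "2016-02-02" :series "a" :value 14}
                       {:date "2016-02-02" :series "b" :value 11}]}
   :mark     "line"
   :encoding {:x     {:field "date"   :type "temporal"}
              :y     {:field "value"  :type "quantitative"}
              :color {:field "series" :type "nominal"}}})

;; (plot! time-series-diff) ; hypothetical render call
```

Because the spec is plain data, it can be built, diffed, and tested like any other Clojure value before it ever reaches the browser.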
But yeah, I agree with @nkraft. R is nice for a really quick (and accurate) viz of “here’s how things work” to be shown off to people.
I'm not a language purist. I've worked in dozens of them. I use the appropriate tool for the job at hand; that might be Clojure or R or Excel, for that matter.
There are a ton of gaps with R: (1) it doesn't scale in production the way the JVM does, (2) your data might be too large for the memory R has available, (3) reproducibility in R is a nightmare -- which R version? which package version? which dependency of which package version? And lots more.
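For contrast on the reproducibility point, here is a minimal project.clj sketch (the project name and library versions are illustrative, not recommendations): Clojure dependencies are pinned to exact, immutable Maven/Clojars artifacts, so rebuilding the same project later pulls exactly the same versions.

```clojure
;; project.clj - illustrative Leiningen project definition.
;; Each dependency is an exact, immutable artifact coordinate,
;; so the dependency set is reproducible on any machine.
(defproject analysis-example "0.1.0"
  :dependencies [[org.clojure/clojure "1.8.0"]
                 [incanter "1.5.7"]])
```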
@aaelony: The other way around - what gaps would you find in Clojure that would be solved in R?
For right now, ggplot2 is easier to use for visuals, but that is changing fast with the Vega spec. Also, there are a lot of R statistical modeling tools whose Clojure counterparts are nascent or not there yet from an ease-of-use perspective.
To me it seems overkill to put together cljs with D3 and Cubism just to show a graph of the differences between two datasets containing time-series information. It's quick in R; I'm done in 30 minutes and on to other things. By the way, I'm not using R in "production" but for something that looks more like business intelligence.
Re: ggplot2 is easier to use for visuals, but that is changing fast with the Vega spec: True - on both counts. Even the R folks are converting (well, porting) ggplot2 to use Vega + D3, which is a clear win overall since the output is directly usable by a browser (and can be live and dynamic).
Re: there are a lot of R statistical modeling tools: This is the key thing about R - there are just a boatload of packages out there that already just work (well, modulo scaling issues...).
On a very related note, what would you guys think of an R2CLJ compiler? A while ago, a couple of others and I worked up a proof of concept (aka a 'toy') with some analysis of what it would take to actually have a useful version. On simple benchmarks (typical micro things) the compiled version of the R (to Clojure) was 10X faster than the R on a single thread. And since the output was just a Clojure function, you could fold it over the data and get an N-fold speedup on top of that (N = # of cores).
Re: was 10X faster than the R: Ugh! It's been a while and I forgot just how good this really was. I went back and looked it up, and it was actually 440X faster than the R code withOUT parallelization.
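To make the "fold it over the data" idea concrete, here is a hedged sketch - not output from the actual proof of concept. Suppose the compiler turned a hand-written R numeric kernel into an ordinary Clojure function; being a plain function, it can then be folded over the data in parallel with clojure.core.reducers.

```clojure
(require '[clojure.core.reducers :as r])

;; Hypothetical stand-in for R2CLJ output. Imagine it was generated
;; from an R function like:  f <- function(x) sqrt(x) * log(x + 1)
(defn compiled-from-r ^double [^double x]
  (* (Math/sqrt x) (Math/log (+ x 1.0))))

;; Fold the compiled function over a vector of inputs; r/fold splits
;; the work across cores via fork/join, giving roughly an N-fold
;; speedup for N cores on top of the single-threaded gain.
(defn sum-compiled [xs]
  (r/fold + (r/map compiled-from-r (vec xs))))

;; (sum-compiled (range 1 1000001))
```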
What was faster? For example, people use packages like data.table for speed these days.
The basic idea was that you could take the R libs / packages as is and compile them into Clojure and then just use the results in Clojure
It was just comparing hand-written R for a simple floating-point benchmark against the Clojure code generated from that same hand-written R.
Especially for the many packages out there. And it would be easy to compare for correctness, because you could verify against what R would report.
Re: correctness - Yes, that is exactly what we thought as well
Yes, that was the idea - you could just use 'these packages' as you would in R, only now you would have their Clojure compiled equivalents
That is really cool. My main critique of R is that old code often needs tweaking because "things have changed" - versions of R, versions of packages, etc.
The main driver for this was what you mentioned earlier - R doesn't work in production, and the way it is typically used is that you figure out the models in R and then toss them to some Java coders to recode them to run on the JVM.
There are all sorts of other fugly edge cases in R (you could actually call them bugs, but code depends on them) that we aimed to mostly cover (basically bug-for-bug compatible), but some of the scoping was so insane that some bits were not covered - those would have required 'fixing' the R code to get it to compile.
But the biggest obstacle was really getting the R runtime (the C libs) available - either via port (huge effort) or via JNI or JNA
All in all I still think it is a great idea - and definitely doable. But a lot of work to get something really useful.
Is that a bigger/easier win than just targeting good R packages for Clojure functional analogs?
It's clearly a bigger win (and easier overall) since you amortize the effort. Once you had the compiler, you could pick whatever package you wanted and just generate the Clojure. NOTE: the generated Clojure was really fugly and totally non-idiomatic, but the idea was: who looks at generated code? And it all becomes JVM bytecode in the end anyway.
Yes, I agree about the long-tail aspect, and we were going to target a select set of packages as a useful starting base.