#data-science
2022-06-13
emccue 23:06:04

Does anyone feel like they have a good grasp of what Python has going for it in data science that Clojure/Java doesn't? I know that's a strange question, but I feel like my knowledge of the entire ecosystem is massively out of date.

respatialized 23:06:10

Lots of eyeballs on model implementations, looking for bugs and subtle errors, would probably be the primary advantage.

emccue 00:06:25

hmm, I'm also thinking from a tooling/language perspective

emccue 00:06:13

like, I know Java doesn't have an equivalent of pandas (even if Clojure kinda does with tech.ml.dataset), but I'm sure there are other huge gaps

Carsten Behring 10:06:14

"Size" of the ecosystem is my answer. "By coincidence", all models/algorithms get implemented in Python/R. It could just as well have been Java. As there are "so many" and "progress is still fast", nobody bothers to "re-implement in Java".

Carsten Behring 10:06:34

I think tech.ml.dataset/tablecloth are equivalent to pandas (if not even superior). This is not true for the number of models/algorithms implemented: Python is far, far, far ahead.

metasoarous 21:06:38

Fully agree with @U7CAHM72M's assessment. The only thing I'd add is that libpython-clj is an amazing asset for helping us bridge that gap. Whether it's worth building bridges or just sucking it up and writing Python depends on the project and its parameters, though: as much as libpython-clj lowers the barrier, for a quick one-off task/test using a Python library it will often be easier to just write the Python. This changes if you know you're going to be reusing or building on code over time and can amortize the cost of wrapping things, but it's still a cost that has to be paid, and it will remain an advantage of the Python (& R) ecosystem(s) for the foreseeable future.

aaelony 18:06:26

While this question is a bit like asking whether vi is better than emacs, Python had/has a lot of marketing behind it telling everyone how "easy" it is (whether verifiable or not). With this message in hand and a few key libraries (e.g. numpy, pandas, and sklearn), its proponents built a community and convinced almost everyone except longtime practitioners that it was/is the way to go. For more context and depth, see https://news.ycombinator.com/item?id=31688667 and other such threads.

lgessler 16:06:23

The proximate cause, as others have said, is the huge ecosystem, and the cause of that in turn is, IMHO, a language syntax that minimizes grok-time and is amenable to rapid prototyping. Academics with little formal CS education were able to pick it up and use it for their research and teaching. I say this as a Clojure fan and a somewhat reluctant Python user: there's no question that learning S-expressions requires more upfront investment and friction than something that looks as pseudocode-like and familiar as Python, and syntactic differences which might seem trivial to someone with a BS in CS are actually quite significant to a scientist who's just trying to get some statistical tests and models done. Moreover, some of Python's features which are unsavory to engineers (dynamic typing, a "pragmatic", shall we say, dependency system, practical but inelegant language constructs) are advantageous to academics, who often can and do abandon a codebase entirely after a few months of developing it for a paper. A lot of good engineering practice just doesn't matter there, and Python as a language makes it easier than most to flout it.

metasoarous 18:06:00

I remain unconvinced that Python is any easier to learn than Clojure for people just starting to code. Anecdotally, among people I know who learned Clojure as a first language, learning OOP seems to be as hard for them as learning FP is for someone coming from a more traditional background. I think it's just easier to use Python if most of the people you know who code are using languages more similar to it, if there are more relevant resources at its intersection with data science, etc. Who knows though 🤷

metasoarous 18:06:15

Your points about academic attitudes towards writing "good software" are 100% right on, though. I have seen a lot of this. To some extent I think it's very pragmatic, but having had to wrestle with buggy, poorly written and poorly documented code in an academic context, I can say that I wish they had a bit more discipline. Good, clean, readable code => better science, IMHO.