#data-science
2020-03-27
daveliepmann06:03:01

Hey folks, over at Applied Science we're playing around with accessing r-project RData files from Clojure with a minimum of interop. We built a single-purpose library: https://github.com/appliedsciencestudio/rdata/ Before we polish it up for release, we'd like to get your feedback. :male-detective::skin-tone-3: 👷 We'd appreciate it if you try it out and let us know what you think. clojure-berlin

👏 20
aaelony20:03:47

For those interested in R (especially from Clojure), there is a free remote conference tomorrow: http://dc2020.netlify.com/

Daniel Slutsky09:03:49

Our friend Danjela is interested in Clojure and data science and is looking for a teammate for the Rails Girls Summer of Code project. https://www.reddit.com/r/Clojure/comments/fpkz98/rails_girls_summer_of_code_teammate/ Do you happen to know anyone who may like to join Danjela?

hindol14:03:53

Hi, do you have any Clojure data science talk suggestions? I have seen the two listed in Oz's README (one on Oz, one on Vega Lite). I know Clojure. Want to learn data science.

🎉 4
aaelony18:03:51

I recommend:

1. The Book of Why for an introduction to causal inference

2. Richard McElreath's lectures and his Statistical Rethinking book

3. Anything Tensorflow (books, documentation, examples) 

4. Dragan Djuric's books and software 

hindol19:03:55

Great list. I have seen some of Dragan's materials. They are great.

😎 4
practicalli-johnny13:03:46

@hindol.adhya If you just want some basic intros, I have done one on Dragan's baby steps in data science and another on the basics of Oz https://www.youtube.com/playlist?list=PLpr9V-R8ZxiDUXIR2z8Y8wvhpoPyl0t_D

😎 4
metasoarous17:03:01

Do you have any math background @hindol.adhya?

hindol17:03:14

How much math? If you mean probability and statistics, I know the basics like mean, mode and median. A little shaky on k-means, SVM and totally green on neural network etc.

metasoarous17:03:35

OK; That's a good start. IMHO (not that my math background biases me or anything), math/prob/stats skills are really the bedrock of data science. So the more you can learn on that front the better.

metasoarous17:03:49

I studied probability in school, and tutored basic stats (among other things), but this was my first introduction to higher level statistics and machine learning. What I really like about this book is that it endeavors to teach them together, seeing them as opposite sides of the same coin, which is 100% in my book: https://web.stanford.edu/~hastie/ElemStatLearn/printings/ESLII_print12.pdf

👍 8
metasoarous17:03:01

I wouldn't say this book is the easiest to get, but take a look and see how you take to it. If you are having a hard time getting through, you can pull at the bits that are giving you trouble from other resources.

hindol17:03:31

One book I studied as part of coursework is https://nlp.stanford.edu/IR-book/information-retrieval-book.html But this does not go too much in depth. I have a CS major.

hindol17:03:59

This book only touched upon various clustering, supervised/unsupervised learning techniques.

hindol17:03:28

Nothing related to modern machine learning like neural networks, deep learning etc.

val_waeselynck17:03:20

The IR book does a surprisingly good job at introducing and motivating ML techniques, more so than many ML-specific books!

val_waeselynck17:03:57

(Also, despite the hype, there is more to modern ML than just deep learning/NNs, e.g. graphical models / Gaussian Processes / TDA to name just a few, and non-modern ML often works quite well 😉 )

4
metasoarous18:03:34

^ 100% this! NNs can do certain things really well, but it's often difficult to figure out what they're doing or why they're doing it. Good advice is to choose a model and approach based on the details of the situation, and not just grab the latest fad.

metasoarous18:03:41

Using a reasonable and principled probabilistic model, tailored to the situation at hand, that can be interpreted, etc., rather than a NN is always the way to go if you have a choice.

hindol18:03:54

This is great advice. Thank you so much.

hindol17:03:52

I should mention, I am not trying to change careers or anything. I am interested, and will spend my own time learning.

👍 4
hindol17:03:40

One part I love is visualization. This I enjoy much more than exploratory data analysis.

metasoarous17:03:50

Visualization is huge; a picture is worth a thousand words, right?

metasoarous17:03:05

Good luck! Interested to see what other recommendations folks have!

val_waeselynck17:03:58

@hindol.adhya I find it hard to answer because it seems to me a lot of different things are meant by "Data Science". On everything related to probabilistic modeling and inference (which arguably are a core component of Data Science), MacKay's ITILA (http://www.inference.org.uk/mackay/itila/book.html) is the most insightful introduction I've found, and probably the best-written scientific textbook I've read. Self-studying it has been a joy.

👍 8