data-science

seb231 2024-04-16T08:22:28.434039Z

New column API in tablecloth: https://humanscode.com/columns-for-tablecloth-launch

🚀 1
🎉 5
emilaasa 2024-04-16T11:42:21.795779Z

Does anyone have a good example of exploratory data analysis using the Clojure ds stack? Some particular things I'm looking for are:
• a pretty correlation matrix chart
• plotting many smaller charts to get a "feel" for the dataset
• EDA tables of "extreme values" and things similar to that, where you can figure out data quality issues quickly

Kira Howe 2024-05-15T11:46:44.325599Z

I did a talk for London Clojurians last month that covers a lot of these things. The recording is available here now: https://youtube.com/watch?v=eUFf3-og_-Y In the description there's a link to the repo used in the video.

emilaasa 2024-04-17T07:21:52.021429Z

Interesting - I'm personally not that invested in my EDA being a full interactive web app in the style of sweetviz / ydata-profiling, even though that is nice in some cases. But I do end up using many of the same elements in notebooks.

Daniel Slutsky 2024-04-17T17:09:49.511529Z

@emilaasa do you still need the correlation matrix chart? I found the old drafts and can try to write a tutorial if that helps.

emilaasa 2024-04-17T19:01:07.275599Z

Well yes I would find it useful - maybe I can contribute somehow to the tutorial?

Daniel Slutsky 2024-04-17T19:06:12.569959Z

Nice. Maybe I'll try to tidy those old drafts, and then surely you'll have ideas about how to improve them.

Daniel Slutsky 2024-04-17T21:39:50.386909Z

@emilaasa here is a work-in-progress draft with a correlation heatmap using Echarts: https://scicloj.github.io/noj/noj_book.visualizing_correlation_matrices I am still working on similar heatmaps using Vega and cljplot, and maybe also a scatterplot matrix.

emilaasa 2024-04-18T06:47:20.348779Z

Looks promising! Typically the rudimentary analysis I do day to day is along the lines of "which feature is most important for x?" It almost always ends up being matrices of plots, or correlation matrices - so anything that's in that area is of interest to me. 🙂

Daniel Slutsky 2024-04-18T06:49:12.615289Z

Thanks. Then would a scatterplot matrix be more important here?

emilaasa 2024-04-18T06:52:20.555949Z

I think they are equally important - when you have enough features the scatterplot matrices (or any plot matrix) will become unwieldy.

👍 1
emilaasa 2024-04-18T06:52:44.350869Z

With few features I think you get the point across with any matrix of plots

👍 1
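For reference, the numbers behind a correlation heatmap are just pairwise Pearson coefficients. The following is a minimal, stdlib-only Python sketch of that computation, not the Clojure stack discussed in this thread and not the approach used in the linked draft. The function names, column names, and values are made up for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")  # correlation is undefined for a constant column
    return cov / sqrt(var_x * var_y)


def correlation_matrix(columns):
    """Map every ordered pair of column names to its Pearson coefficient."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Hypothetical toy data, made up for illustration.
data = {
    "x": [1.0, 2.0, 3.0, 4.0],
    "double_x": [2.0, 4.0, 6.0, 8.0],
    "reversed_x": [4.0, 3.0, 2.0, 1.0],
}

matrix = correlation_matrix(data)
for name, row in matrix.items():
    print(name, [round(row[other], 2) for other in data])
```

With n columns there are n(n-1)/2 distinct off-diagonal pairs, so a 20-column dataset already means 190 panels in a scatterplot matrix, which is where a single heatmap of coefficients tends to scale better.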
3starblaze 2024-04-16T13:31:17.399139Z

Clerk is a pretty nice tool to weave code with results, something similar to Jupyter Notebook but Clerk just acts as a renderer. You can draw charts with plotly or vega and put them in a grid with Clerk viewer composition ( https://book.clerk.vision/#composing-viewers ).

Daniel Slutsky 2024-04-16T14:19:18.350149Z

Regarding correlation matrix plots, you may find this old discussion at the Zulip chat helpful: https://clojurians.zulipchat.com/#narrow/stream/151924-data-science/topic/correlation.20matrix.20plot.20.3F (possibly related to the code by jsa you mentioned above).

Daniel Slutsky 2024-04-16T14:20:03.339419Z

We had a draft somewhere, trying to plot correlation matrices with Vega / Echarts / cljplot. I'll try to find it.

Daniel Slutsky 2024-04-16T14:22:47.566969Z

Regarding plotting many smaller charts, EDA, extreme values, etc., I think it would be great to create tutorials of this kind. If you have a proposed public dataset or an existing tutorial in another language, this can be a starting point for a tutorial we may create, maybe in collaboration.
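As one possible starting point in another language, here is a minimal Python sketch of an "extreme values" table: per column it reports row count, missing count, min, max, and the values that fall outside the usual 1.5 × IQR fences. It uses only the standard library. The function name, the `k` parameter, and the toy data are assumptions made up for illustration, and the IQR rule is just one common convention for flagging suspicious values.

```python
from statistics import quantiles


def extreme_value_summary(columns, k=1.5):
    """Per column: row count, missing count, min, max, values outside the IQR fences."""
    summary = {}
    for name, values in columns.items():
        present = [v for v in values if v is not None]
        # Quartiles via the stdlib default ("exclusive") method; needs >= 2 values.
        q1, _, q3 = quantiles(present, n=4)
        iqr = q3 - q1
        low, high = q1 - k * iqr, q3 + k * iqr
        summary[name] = {
            "rows": len(values),
            "missing": len(values) - len(present),
            "min": min(present),
            "max": max(present),
            "outliers": sorted(v for v in present if v < low or v > high),
        }
    return summary


# Hypothetical toy data: one missing value and one implausible age.
data = {
    "age": [31, 29, 35, None, 33, 30, 240, 32],
    "income": [40, 42, 41, 39, 43, 40, 41, 42],
}

report = extreme_value_summary(data)
for name, stats in report.items():
    print(name, stats)
```

In this toy data the sketch flags the implausible age of 240 and the single missing value, which is the kind of data quality issue such a table is meant to surface quickly.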

2024-04-16T16:43:39.784919Z

There was just a thread about re-implementing a Python tool which does this: https://clojurians.zulipchat.com/#narrow/stream/151924-data-science/topic/Profiling