New column API in tablecloth: https://humanscode.com/columns-for-tablecloth-launch
Does anyone have a good example of exploratory data analysis using the Clojure ds stack? Some particular things I'm looking for are:
• a pretty correlation matrix chart
• plotting many smaller charts to get a "feel" for the dataset
• EDA tables of "extreme values" and things similar to that, where you can figure out data quality issues quickly
I did a talk for London Clojurians last month that covers a lot of these things. The recording is available here now: https://youtube.com/watch?v=eUFf3-og_-Y In the description there's a link to the repo used in the video.
Interesting - I'm personally not that invested in my EDA being a full interactive web app in the style of sweetviz / ydata-profiling, even though that is nice in some cases. But I do end up using many of the same elements in notebooks.
@emilaasa do you still need the correlation matrix chart? I found the old drafts and can try to write a tutorial if that helps.
Well yes I would find it useful - maybe I can contribute somehow to the tutorial?
Nice. Maybe I'll try to tidy those old drafts, and then surely you'll have ideas about how to improve them.
@emilaasa here is a work-in-progress draft with a correlation heatmap using Echarts: https://scicloj.github.io/noj/noj_book.visualizing_correlation_matrices I am still working on similar heatmaps using Vega and cljplot, and maybe also a scatterplot matrix.
Looks promising! Typically the rudimentary analysis I do day to day is along the lines of "which feature is most important for x?" It almost always ends up being matrices of plots, or correlation matrices - so anything that's in that area is of interest to me.
Thanks. Then would a scatterplot matrix be more important here?
I think they are equally important - when you have enough features the scatterplot matrices (or any plot matrix) will become unwieldy.
With few features I think you get the point across with any matrix of plots
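For the "which feature is most important for x?" kind of question, a minimal sketch of the underlying computation may be useful as a porting target. It is plain Python (standard library only) rather than Clojure, and it is not taken from any of the linked drafts: the helper names and the toy columns `x`, `a`, `b`, `c` are made up for illustration. It builds a Pearson correlation matrix and ranks the other columns by absolute correlation with a target.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")  # a constant column has no defined correlation
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(columns):
    """Nested dict {name: {name: r}} for a dict of column name -> values."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


def rank_features(columns, target):
    """Other columns sorted by |r| against the target, strongest first."""
    corr = correlation_matrix(columns)
    pairs = [(name, corr[name][target]) for name in columns if name != target]
    # Undefined correlations (NaN) are pushed to the end of the ranking.
    return sorted(
        pairs,
        key=lambda p: -1.0 if math.isnan(p[1]) else abs(p[1]),
        reverse=True,
    )


# Hypothetical toy data, only for illustration.
data = {
    "x": [1.0, 2.0, 3.0, 4.0, 5.0],
    "a": [2.0, 4.0, 6.0, 8.0, 10.0],   # moves exactly with x
    "b": [5.0, 3.0, 4.0, 1.0, 2.0],    # mostly moves against x
    "c": [1.0, -1.0, 1.0, -1.0, 1.0],  # no linear relation to x
}

for name, r in rank_features(data, "x"):
    print(f"{name}: {r:+.2f}")
```

With p features a full plot matrix has p × p panels, so a ranked list like this can be a cheaper first look before deciding which pairs deserve a chart. Pearson correlation only captures linear relationships, so it is a starting point rather than a measure of importance.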
I found this: https://github.com/jsa-aerial/aerosaite/blob/main/resources/Code/Templates/ml-charts.clj
Clerk is a pretty nice tool to weave code with results, something similar to Jupyter Notebook but Clerk just acts as a renderer. You can draw charts with plotly or vega and put them in a grid with Clerk viewer composition ( https://book.clerk.vision/#composing-viewers ).
Regarding correlation matrix plots, you may find this old discussion at the Zulip chat helpful: https://clojurians.zulipchat.com/#narrow/stream/151924-data-science/topic/correlation.20matrix.20plot.20.3F (possibly related to the code by jsa you mentioned above).
We had a draft somewhere, trying to plot correlation matrices with Vega / Echarts / cljplot. I'll try to find it.
Regarding plotting many smaller charts, EDA, extreme values, etc., I think it would be great to create tutorials of this kind. If you have a proposed public dataset or an existing tutorial in another language, this can be a starting point for a tutorial we may create, maybe in collaboration.
There was just a thread about re-implementing a Python tool which does this: https://clojurians.zulipchat.com/#narrow/stream/151924-data-science/topic/Profiling
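As a possible starting point for the "extreme values" tables mentioned earlier, here is a small sketch of what one such table could hold for each column: a missing-value count plus the k smallest and k largest values with their row indices. It is plain Python with no libraries, the function name `extreme_values` and the toy rows are hypothetical, and it assumes each column holds mutually comparable values (for example all numbers). It is not how any existing profiling tool is implemented.

```python
def extreme_values(rows, column, k=3):
    """Missing count plus the k smallest and k largest values of one column.

    `rows` is a list of dicts; a value of None (or an absent key) counts as
    missing. Each extreme is returned as (value, row_index) so the offending
    row can be looked up afterwards.
    """
    present = [
        (row[column], i)
        for i, row in enumerate(rows)
        if row.get(column) is not None
    ]
    ordered = sorted(present)  # by value, ties broken by row index
    return {
        "column": column,
        "missing": len(rows) - len(present),
        "smallest": ordered[:k],
        "largest": ordered[max(len(ordered) - k, 0):][::-1],  # largest first
    }


# Hypothetical toy rows with a few deliberate data quality problems.
rows = [
    {"age": 34, "income": 52000},
    {"age": -1, "income": 48000},     # impossible age
    {"age": 29, "income": None},      # missing income
    {"age": 420, "income": 51000},    # likely a typo
    {"age": 41, "income": 9999999},   # suspicious placeholder value
]

for col in ("age", "income"):
    print(extreme_values(rows, col, k=2))
```

Keeping the row index next to each value is a deliberate choice: the point of such a table is usually to jump from a suspicious value straight to the record that produced it.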