#data-science
2024-03-07
Omer ZAK 00:03:47

This is probably the oldest FAQ in scicloj. I have a dataset of (x y) values and want to do a least-squares fit of the data to a function, then make a plot showing both the data and the predictions of the fitted function. In Python, I accomplished this using:

import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
However, it was not easy for me to find the corresponding libraries in the Clojure ecosystem. For curve fitting, the following were mentioned: net.mikera/core.matrix, net.mikera/vectorz-clj, and EJML. For plotting: cljplot, clojure2d.charts, and hanami. The incanter package was mentioned for both roles. Which of these would you recommend?
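
For the straight-line special case, the least-squares fit has a closed form: the slope is the covariance of x and y divided by the variance of x, and the intercept makes the line pass through the means. A minimal sketch in plain Python (standard library only, plotting left out), just to show what is being computed. The names fit_line and predict are hypothetical helpers and the sample points are made up; for nonlinear model functions, scipy.optimize.curve_fit from the snippet above is still the appropriate tool.

```python
from statistics import fmean


def fit_line(xs, ys):
    """Least-squares fit of y = intercept + slope * x (closed form)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    x_mean, y_mean = fmean(xs), fmean(ys)
    # Sum of squared deviations of x; zero means a vertical "line"
    sxx = sum((x - x_mean) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    # Sum of cross-deviations of x and y
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return intercept, slope


def predict(intercept, slope, xs):
    """Evaluate the fitted line at each x (e.g. to plot next to the data)."""
    return [intercept + slope * x for x in xs]


if __name__ == "__main__":
    # Made-up sample data, for illustration only
    xs = [0.0, 1.0, 2.0, 3.0, 4.0]
    ys = [1.1, 2.9, 5.2, 7.1, 8.8]
    b0, b1 = fit_line(xs, ys)
    print(f"intercept={b0:.3f} slope={b1:.3f}")
    print(predict(b0, b1, xs))
```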

respatialized 02:03:19

scicloj.ml.smile.regression has a least-squares implementation. https://github.com/scicloj/scicloj.ml.smile?tab=readme-ov-file I agree that it is not super easy to find basic examples of least-squares regression in the clojure data science ecosystem. I have struggled with it myself.
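
For reference, ordinary least squares chooses the coefficients that minimize the sum of squared residuals. This is the textbook formulation; how scicloj.ml.smile.regression solves it internally is not checked here. With a design matrix X (one row per observation, plus a leading column of ones for the intercept) and a response vector y:

```latex
\hat{\beta} = \arg\min_{\beta} \lVert y - X\beta \rVert_2^2
\quad\Longrightarrow\quad
X^\top X \,\hat{\beta} = X^\top y,
\qquad
\hat{\beta} = (X^\top X)^{-1} X^\top y \ \text{ when } X^\top X \text{ is invertible.}

% Straight-line case y \approx \beta_0 + \beta_1 x:
\hat{\beta}_1 = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2},
\qquad
\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}
```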

Daniel Slutsky 07:03:13

Thank you for pointing this out. https://scicloj.github.io/noj/ is an attempt to provide answers to such questions, by:
• wrapping a few of the relevant libraries together
• adding some convenience functions
• documenting their recommended use
Some parts of the underlying libraries are stable, but some of the added functions (the scicloj.noj.* namespaces) are still experimental, and might change. One of our goals for the coming few weeks is to clarify these details.

Daniel Slutsky 07:03:36

Specifically, you may wish to look into:
• https://scicloj.github.io/noj/interactions_ols (rather stable)
• https://scicloj.github.io/noj/stats (experimental)
• https://scicloj.github.io/noj/visualization (experimental)

Daniel Slutsky 07:03:06

Also, this tutorial (+video) at the Clojure Data Scrapbook: https://scicloj.github.io/clojure-data-scrapbook/projects/visual-tools/clay-calva-demo-20231216 (edit: just updated the deps and fixed a problem)

Daniel Slutsky 13:03:34

@U06BDSLBVC6 if your data can be shared, or if there is a toy dataset you consider similar, I'd be happy to look at it together and see how we can extend Noj (and the docs) to support this case. At this stage, this kind of use case would be helpful in building Noj. We hope to have many such explorations in the upcoming https://clojureverse.org/t/real-world-data-meeting-1/ group.

Carsten Behring 21:03:42

This guide has an example of a regression line: https://scicloj.github.io/scicloj.ml-tutorials/userguide-models.html (chapter :smile.regression/ordinary-least-square).

Omer ZAK 04:03:16

@U7CAHM72M, thanks for the link to the guide. Do you also have links to guides about clerk/vl and utils/surface-plot? Thanks!

Mattias 21:03:24

I have a text file I can read with (slurp "data/stuff.csv" :encoding "UTF-16"). I'm having problems getting Tablecloth/dataset to eat it; do I have to fudge around, or is there an easy way? 😊

respatialized 21:03:46

Have you tried (tablecloth.api/dataset "data/stuff.csv")? Did that throw an error?

Mattias 21:03:14

|                                :$value |                                                 :$error |
|----------------------------------------|---------------------------------------------------------|
| data/stuff.csv | Cannot read the array length because "<local4>" is null |

Mattias 21:03:49

Sorry, bad formatting. But yeah, a Java error in the dataset.

Mattias 21:03:37

If I slurp it while specifying UTF-16 and spit the contents back out as is, Tablecloth handles it nicely.
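
This workaround is a transcoding step: slurp decodes the UTF-16 bytes into a string, and spit writes that string back out in clojure.java.io's default encoding, UTF-8, which the CSV reader then accepts. In Clojure that is roughly (spit "data/stuff-utf8.csv" (slurp "data/stuff.csv" :encoding "UTF-16")), where the output path is hypothetical. Below is the same step as a self-contained Python sketch. The function name transcode_to_utf8 and the paths are hypothetical, and it assumes the source file starts with a byte-order mark, as UTF-16 files usually do.

```python
def transcode_to_utf8(src, dst, src_encoding="utf-16"):
    """Read src as src_encoding and rewrite it as UTF-8 (no BOM) at dst."""
    # The "utf-16" codec uses a leading byte-order mark to choose the
    # byte order and drops it from the decoded text.
    # newline="" keeps the original line endings untouched.
    with open(src, encoding=src_encoding, newline="") as f_in:
        text = f_in.read()
    with open(dst, "w", encoding="utf-8", newline="") as f_out:
        f_out.write(text)
    return dst


# Usage (hypothetical paths):
# transcode_to_utf8("data/stuff.csv", "data/stuff-utf8.csv")
```

Like slurp, this reads the whole file into memory, which is fine for moderately sized CSVs.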

Mattias 21:03:35

I have now learned that UTF-16 never caught on and is rarely used, so… my bad luck, perhaps. But nice of slurp to support it, anyway 🙂