Basic stats/regressions:
In Julia, you can do:
additive_model = lm(@formula(y ~ x1 + x2), dataframe)
interaction_model = lm(@formula(y ~ x1 + x2 + x1 * x2), dataframe)
and it just works.
In R, you can do:
additive_model = lm(y ~ x1 + x2, dataframe)
interaction_model = lm(y ~ x1 + x2 + x1 * x2, dataframe)
and it just works.
If x1 is categorical, the lm call automatically one-hot encodes the variable, drops the first level (which becomes the reference group), and runs the regression. If I want interaction terms, the call automatically creates the interaction columns for each pair of levels.
Using the fastmath library in Clojure, you have to manually one-hot encode first using (categorical->one-hot), then manually choose which column to drop, and you have to do this for each categorical variable. If you want interaction terms, it seems you have to manually calculate each interaction column as well. If you have many groups (say, four treatment types for four different patient groups), this is a lot of boilerplate code (treatment_1 times group_1, treatment_1 times group_2, treatment_1 times group_3, etc.).
Is this really the best way to do linear/logistic regressions in Clojure? Am I missing a smarter way or a better-suited library?
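To make the boilerplate concrete, here is a minimal sketch in plain Clojure (no fastmath calls) of how the dummy columns and the pairwise interaction columns could be generated instead of typed out by hand. The names dummy-mapping, interaction and encode-row are hypothetical helpers made up for this illustration, not part of any library, and the level names :t1 ... :g4 are placeholders.

```clojure
(defn dummy-mapping
  "Map each level to a 0/1 vector, using the first level as the reference
  (all zeros), like the default treatment coding in R."
  [levels]
  (let [[reference & others] (vec levels)
        zeros (vec (repeat (count others) 0.0))]
    (into {reference zeros}
          (map-indexed (fn [i level] [level (assoc zeros i 1.0)]) others))))

(defn interaction
  "All pairwise products of two dummy-coded vectors."
  [a b]
  (vec (for [x a, y b] (* x y))))

;; Four treatments and four patient groups (placeholder level names).
(def treatment (dummy-mapping [:t1 :t2 :t3 :t4]))
(def group     (dummy-mapping [:g1 :g2 :g3 :g4]))

(defn encode-row
  "Main effects followed by all treatment x group interaction columns."
  [t g]
  (let [tv (treatment t)
        gv (group g)]
    (into (into tv gv) (interaction tv gv))))

(count (encode-row :t2 :g3))
;; => 15  (3 treatment + 3 group + 9 interaction columns)
```

A function like encode-row could then be applied to each data row before fitting, which avoids writing out every treatment_i times group_j column by hand.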
Regardless of further discussion in #data-science: in fastmath itself you can use the transformation function and contrasts to make the conversion automatic for regression and prediction. I'll show an example of that soon.
That would be great!
Well... looks like I underestimated the effort this takes with fastmath, and I also discovered a subtle error with prediction. So it's still manual, but I think it's not too tedious.
;; formula: Sepal.Length ~ Species + Sepal.Width + Petal.Length + Petal.Width
;; contrast: dummy (default in R)
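;; tc and rr are assumed to be tablecloth and clojisr, as used below
(require '[fastmath.ml.regression.contrast :as contrast]
         '[fastmath.ml.regression :as reg]
         '[tablecloth.api :as tc]
         '[clojisr.v1.r :as rr])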
;; iris data from R, as tech.ml.dataset / tablecloth
(def iris (rr/r->clj 'iris))
(tc/column-names iris)
;; => (:Sepal.Length :Sepal.Width :Petal.Length :Petal.Width :Species)
;; Create the contrast: extract the Species column, find its levels and build the dummy mapping
(def species-contrast (-> iris :Species distinct contrast/dummy))
;; => {:levels (:setosa :versicolor :virginica),
;; :names ("versicolor.vs.setosa" "virginica.vs.setosa"),
;; :mapping
;; {:setosa [0.0 0.0], :versicolor [1.0 0.0], :virginica [0.0 1.0]}}
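(defn transformer
  "Convert an iris row into a design row: Species contrast columns first,
  then the numeric predictors. The first value (Sepal.Length, the response) is skipped."
  [[_ sw pl pw sp]]
  (conj ((:mapping species-contrast) sp) sw pl pw))
;; Fit the model; the transformer is applied to every row before fitting
(def result (reg/lm (:Sepal.Length iris) (tc/rows iris)
                    {:transformer transformer}))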
;; this one should work, but I've found a bug: (result (first (tc/rows iris)))
;; so for now, apply the transformer manually before predicting
(result (transformer (first (tc/rows iris))))
;; => -2.4239627994558695
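If there are several categorical columns, the hand-written transformer above could be generated instead. The following is only a sketch in plain Clojure: make-transformer is a hypothetical helper (not a fastmath function), and it assumes rows are plain vectors and that each mapping has the same shape as the :mapping entry shown above. Unlike the original transformer, it keeps the columns in their original order, so the coefficient order would differ.

```clojure
(defn make-transformer
  "Hypothetical helper: returns a fn that turns a raw row into a design row.
  The value at `response-idx` is dropped, columns listed in `mappings`
  (column index -> {level -> contrast vector}) are expanded, and every
  other value is passed through unchanged."
  [response-idx mappings]
  (fn [row]
    (into []
          (comp (map-indexed vector)
                (remove (fn [[i _]] (= i response-idx)))
                (mapcat (fn [[i v]]
                          (if-let [m (mappings i)]
                            (m v)   ; categorical: expand into contrast columns
                            [v])))) ; numeric: keep as is
          row)))

;; Same shape as (:mapping species-contrast) in the example above.
(def species-mapping
  {:setosa [0.0 0.0], :versicolor [1.0 0.0], :virginica [0.0 1.0]})

;; Column 0 is the response (Sepal.Length), column 4 is Species.
(def iris-transformer (make-transformer 0 {4 species-mapping}))

(iris-transformer [5.1 3.5 1.4 0.2 :setosa])
;; => [3.5 1.4 0.2 0.0 0.0]
```

A function built this way could presumably be passed as the :transformer option in the same way as the hand-written one, with one entry in the mappings map per categorical column.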