This page is not created by, affiliated with, or supported by Slack Technologies, Inc.
2022-04-07
goal: linear interpolation of missing values while avoiding lookahead (using the scicloj/techascent stack)
inspiration for my attempt (but tell me if there is something simpler): tech.v3.dataset.rolling has mean, and I'm trying to create an extrapolate reducer in the same style to use with rolling ...
broken impl:
(defn extrapolator
  "double extrapolation of data"
  (^double [data options]
   (if (== 0 (tech.v3.datatype.base/ecount data))
     Double/NaN
     (let [diffs (map - (rest data) data)
           {:keys [n-elems sum]} (tech.v3.datatype.reductions/staged-double-consumer-reduction
                                  :tech.numerics/+ options diffs)
           mean-diff
           (com.github.ztellman.primitive-math// (double sum)
                                                 (double n-elems))]
       (->> data
            (filter some?)
            last
            (+ mean-diff)))))
  (^double [data]
   (extrapolator data nil)))
(basically the mean reducer with higher-level clojure injected to apply the average slope to the last non-nil value)
fn__90967 cannot be cast to class clojure.lang.Associative, so I probably have to provide a map somewhere instead of a function...

• https://cnuernber.github.io/dtype-next/tech.v3.datatype.gradient.html#var-diff1d
• https://cnuernber.github.io/dtype-next/tech.v3.datatype.functional.html#var-mean
If the data is a column then you can get a bitmap of the missing indexes via https://techascent.github.io/tech.ml.dataset/tech.v3.dataset.html#var-missing. It appears you want to fill in the first missing value with an extrapolated value: the average difference of the existing values added to the last known value. You may like to try the dataset https://techascent.github.io/tech.ml.dataset/tech.v3.dataset.html#var-induction pathway, which should work for this.
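For reference, the extrapolation described above (last known value plus the average difference of the existing values) can be sketched in plain Clojure, without the dataset libraries. This is only a rough sketch under some assumptions: extrapolate-next and fill-missing-no-lookahead are made-up names, missing values are nil, consecutive known values are treated as one step apart even when a gap sits between them, and already-filled values feed into later windows.

```clojure
(defn extrapolate-next
  "Hypothetical helper: last non-nil value plus the mean difference between
  consecutive non-nil values. Returns nil when fewer than two values are known."
  [window]
  (let [known (vec (remove nil? window))]
    (when (>= (count known) 2)
      (let [diffs     (map - (rest known) known)
            mean-diff (/ (double (reduce + diffs)) (count diffs))]
        (+ (double (peek known)) mean-diff)))))

(defn fill-missing-no-lookahead
  "Hypothetical helper: walks xs left to right and replaces each nil with a
  value extrapolated from at most window-size preceding (already filled)
  values, so nothing to the right of a gap is ever consulted."
  [xs window-size]
  (reduce (fn [filled x]
            (conj filled
                  (if (some? x)
                    x
                    (extrapolate-next (take-last window-size filled)))))
          []
          xs))

(fill-missing-no-lookahead [1.0 2.0 nil 4.0 nil] 3)
;; => [1.0 2.0 3.0 4.0 5.0]
```

A gap that has fewer than two known values before it stays nil here, which seemed safer than injecting NaN into later windows.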
Thanks so much for the response!! I decided to give the clojure-only route a try last night after seeing , and now it is just a matter of piecing together everything from that outer wrapper and the tech.v3.dataset stuff. Simultaneously getting into time-series means that I'm probably reproducing a lot of things or taking the wrong approach.
Here is my somewhat goofy approach to filling missing values using a rolling mean:
(require
 '[tech.v3.dataset :as tds]
 '[tech.v3.dataset.rolling :as roll]
 '[scicloj.ml.dataset :as ds])

;; fill with rolling mean
(ds/replace-missing
 unemp-rand-missing
 :unemp-rate
 :value
 (-> unemp-rand-missing
     (roll/rolling
      {:window-type :fixed
       :window-size 4
       :relative-window-position :left}
      {:mean (roll/mean :unemp-rate)})
     (ds/select-rows
      (tds/missing
       (:unemp-rate
        unemp-rand-missing)))
     :mean))
Interesting. Honestly if it all works for you I think it is great. Lots of ways to get that done 🙂.
There are already some interesting pieces of R I'd like to figure out in this domain.
all.dates <- seq(from = start.date, to = end.date, by = "months")
(how do they even standardize here?)
Also just the concept of "rolling joins" on time series indices.
I'm using clojure.java-time, and YearMonth in one learning case. This feels like the best practice given a year and month to preserve the granularity of the original data. Maybe it is best to also create a timestamp version with defaults that enables functions with built in timestamp aggregation/manipulation? Any libraries focused on time series for this stack?
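A possible equivalent of the R seq(..., by = "months") line, using java.time.YearMonth directly through interop rather than any particular wrapper library. Just a sketch: month-seq is a made-up helper name, and both endpoints are assumed to be inclusive.

```clojure
(import '(java.time YearMonth))

(defn month-seq
  "Hypothetical helper: every YearMonth from start to end, inclusive."
  [^YearMonth start ^YearMonth end]
  (->> (iterate #(.plusMonths ^YearMonth % 1) start)
       (take-while #(not (.isAfter ^YearMonth % end)))))

(map str (month-seq (YearMonth/of 2021 11) (YearMonth/of 2022 2)))
;; => ("2021-11" "2021-12" "2022-01" "2022-02")
```

If a date version is needed as well, (.atDay ym 1) turns a YearMonth into a LocalDate on the first of the month, which is one way to pick the "defaults" mentioned above.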