2022-10-04
Ran into a little perf trouble with malli sampling. Looks like performance is exponential with respect to sample size. Is there a reason for this? Sample code:
(require '[malli.core :as m]
         '[malli.generator :as mg]
         '[criterium.core :refer [quick-benchmark]]
         '[oz.core :as oz])

(def cashflow-schema
  (m/schema
   [:map
    [:type keyword?]
    [:direction keyword?]
    [:incurred-by {:optional true} [:map [:type keyword?] [:name string?]]]
    [:amount int?]
    [:categories [:set string?]]
    ;; [:date inst?]
    [:description {:optional true} string?]
    [:towards {:optional true} [:map [:type keyword?] [:name string?]]]]))

;; Returns [sample-size mean-time] pairs, one benchmark per sample size
(defn get-sampling-data [sample-sizes]
  (map (fn [ss]
         [ss ((comp first :mean)
              (quick-benchmark (doseq [x (mg/sample cashflow-schema {:size ss})]
                                 x)
                               {}))])
       sample-sizes))

;; Builds a Vega-Lite line chart spec of time against sample size
(defn get-vis-data [sample-sizes]
  (let [data   (get-sampling-data sample-sizes)
        values (map (fn [[ss time]] {:sample-size ss :time time}) data)]
    {:data     {:values values}
     :encoding {:x {:field "sample-size" :type "quantitative"}
                :y {:field "time" :type "quantitative"}}
     :mark     "line"}))

(get-vis-data (range 5 110 5))
(oz/view! *1)
Awesome! All malli operations can instantiate a single closure that already does all the heavy lifting, so afterwards they only need to validate, parse, or whatever. You should also try a larger range, and add a doall around the generated result, since it may be lazy.
Now that you mention it, it might be lazy, I agree. The time doesn't look right. But I tried generating samples with this and it's much faster than with the previous method, so for my purposes this is fine. Let me check the perf anyway.
One question: I used generator like this:
(require '[clojure.test.check.generators :as gen])

(defn generate-samples [sample-size]
  (let [gen (mg/generator cashflow-schema)]
    (doall (gen/sample gen sample-size))))
This is fine, right?
Yes, but you can even def the generator right under the schema. As you can see, it does not depend on the sample size at all.
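A minimal sketch of that suggestion, assuming malli and test.check are on the classpath. The schema here is a trimmed-down stand-in for cashflow-schema, and the name cashflow-gen is made up for illustration. The generator is built once next to the schema, and generate-samples only draws from it:

```clojure
(require '[malli.core :as m]
         '[malli.generator :as mg]
         '[clojure.test.check.generators :as gen])

;; Trimmed-down stand-in for the cashflow-schema in the thread
(def cashflow-schema
  (m/schema
   [:map
    [:type keyword?]
    [:amount int?]
    [:categories [:set string?]]]))

;; Hypothetical name: the generator is created once, right under the schema,
;; because it does not depend on the sample size
(def cashflow-gen (mg/generator cashflow-schema))

(defn generate-samples [sample-size]
  ;; gen/sample returns a lazy seq, so doall forces it before returning
  (doall (gen/sample cashflow-gen sample-size)))

(count (generate-samples 20))
```

Wrapping the call in doall matters mostly when timing it, since otherwise the work can be deferred until the seq is consumed.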
Yeah, so doing this makes it significantly faster. For example, to generate 500 samples, my previous code takes 10s while the generator takes 1.5s, so there's a marked perf improvement. But the growth is still exponential. But I think I will leave it there. I originally wanted to generate around 1000 samples and it took too long; that's why I delved into this.