#malli
2022-10-04
dumrat 03:10:34

I ran into a little perf trouble with malli sampling. It looks like the running time grows exponentially with respect to the sample size. Is there a reason for this? Sample code:

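;; Aliases assumed by the snippets below.
(require '[malli.core :as m]
         '[malli.generator :as mg]
         '[criterium.core :refer [quick-benchmark]]
         '[oz.core :as oz])

(def cashflow-schema
  (m/schema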
   [:map
    [:type keyword?]
    [:direction keyword?]
    [:incurred-by {:optional true} [:map [:type keyword?] [:name string?]]]
    [:amount int?]
    [:categories [:set string?]]
    ;; [:date inst?]
    [:description {:optional true} string?]
    [:towards {:optional true} [:map [:type keyword?] [:name string?]]]]))

(defn get-sampling-data [sample-sizes]
  (map (fn [ss]
         [ss ((comp first :mean)
              (quick-benchmark (doseq [x (mg/sample cashflow-schema {:size ss})]
                                 x)
                               {}))])
       sample-sizes))

(defn get-vis-data [sample-sizes]
  (let [data (get-sampling-data sample-sizes)
        values (map (fn [[ss time]] {:sample-size ss :time time}) data)]
    {:data {:values values}
     :encoding {:x {:field "sample-size" :type "quantitative"}
                :y {:field "time" :type "quantitative"}}
     :mark "line"}))

(get-vis-data (range 5 110 5))
(oz/view! *1)

Ben Sless 03:10:17

Can you def a generator from the schema and run again?
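
A minimal sketch of what this suggestion might look like, assuming `malli.generator` is aliased as `mg` and `clojure.test.check.generators` as `gen`. The names `cashflow-generator` and `sample-cashflows` are made up for illustration, and the schema is trimmed to a few keys:

```clojure
(require '[malli.core :as m]
         '[malli.generator :as mg]
         '[clojure.test.check.generators :as gen])

;; Trimmed-down version of the schema from the question.
(def cashflow-schema
  (m/schema
   [:map
    [:type keyword?]
    [:amount int?]
    [:categories [:set string?]]]))

;; Build the generator once, next to the schema (hypothetical name),
;; instead of rebuilding it from the schema on every sampling call.
(def cashflow-generator (mg/generator cashflow-schema))

;; Reuse the prebuilt generator; doall forces the lazy sequence.
(defn sample-cashflows [n]
  (doall (gen/sample cashflow-generator n)))

(sample-cashflows 3)
```

The benchmark loop from the question could then call `sample-cashflows` in place of `mg/sample`, so that only generation is measured and not generator construction.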

dumrat 05:10:13

Yes, that works! Looks like it's almost constant time now.

Ben Sless 06:10:17

Awesome! All malli operations can instantiate single closures which already do all the heavy lifting and only need to validate, parse, w/e. You should also try a larger range, and add a doall around the generated result; it may be lazy.
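
As an illustration of the "single closure" idea, here is a small sketch using `m/validator`, `m/explainer`, and `mg/generator`. The `point-schema` and the other names are hypothetical and not from the thread:

```clojure
(require '[malli.core :as m]
         '[malli.generator :as mg]
         '[clojure.test.check.generators :as gen])

;; Hypothetical example schema.
(def point-schema
  (m/schema [:map [:x int?] [:y int?]]))

;; Each of these is built once from the schema and then reused.
(def valid-point? (m/validator point-schema))
(def explain-point (m/explainer point-schema))
(def point-generator (mg/generator point-schema))

(valid-point? {:x 1 :y 2})     ; => true
(valid-point? {:x 1 :y "2"})   ; => false
(explain-point {:x 1 :y 2})    ; => nil, since the value is valid

;; gen/sample returns a lazy sequence, so doall is needed to make sure
;; all values are actually generated before a timing is taken.
(def points (doall (gen/sample point-generator 50)))
```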

dumrat 06:10:46

Now that you mention it, it might be lazy, I agree. The timing doesn't look right. But I tried generating samples with this and it's much faster than with the previous method, so for my purposes this is fine. Let me check the perf anyway.

dumrat 06:10:16

One question: I used the generator like this:

;; gen is assumed to alias clojure.test.check.generators
(defn generate-samples [sample-size]
  (let [gen (mg/generator cashflow-schema)]
    (doall (gen/sample gen sample-size))))

This is fine, right?

Ben Sless 07:10:31

Yes, but you can even def the generator right under the schema. As you can see, it does not depend on the sample size at all.

dumrat 07:10:38

Yeah, doing this makes it significantly faster. For example, to generate 500 samples, my previous code takes 10 s while the generator takes 1.5 s, so there's a marked perf improvement. The growth still looks exponential, but I think I will leave it there. I originally wanted to generate around 1000 samples and it took too long; that's why I delved into this.
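
To check how generation time actually grows, a rough timing harness may be enough. The sketch below uses a trimmed schema, and the helpers `time-ms` and `fixed-size-samples` are made up for illustration. It also pins the test.check size parameter via `gen/generate` (the value 30 is arbitrary), on the assumption that part of the super-linear growth comes from later samples being generated at larger sizes. That assumption is not confirmed anywhere in this thread.

```clojure
(require '[malli.core :as m]
         '[malli.generator :as mg]
         '[clojure.test.check.generators :as gen])

;; Generator built once from a trimmed version of the schema.
(def cashflow-generator
  (mg/generator
   (m/schema [:map
              [:type keyword?]
              [:amount int?]
              [:categories [:set string?]]])))

;; Hypothetical helper: wall-clock time of a thunk in milliseconds.
(defn time-ms [f]
  (let [start (System/nanoTime)]
    (f)
    (/ (- (System/nanoTime) start) 1e6)))

;; Hypothetical helper: n samples, all generated at the same size (30).
(defn fixed-size-samples [n]
  (doall (repeatedly n #(gen/generate cashflow-generator 30))))

;; Print timings for both approaches at doubling sample counts.
(doseq [n [100 200 400 800]]
  (println n
           "gen/sample ms:" (time-ms #(doall (gen/sample cashflow-generator n)))
           "fixed-size ms:" (time-ms #(fixed-size-samples n))))
```

If the fixed-size timings roughly double as `n` doubles, the per-sample size is the likely culprit. A single `System/nanoTime` measurement is noisy, so criterium's `quick-benchmark` remains the better tool for precise numbers.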

Ben Sless 10:10:06

The old nerd snipe 🙂