#malli
2021-10-20
ikitommi09:10:29

hi all. there is a bunch of quality PRs being held in the queue, sorry. I’m working on the malli internals (schema ast, lifecycle, providers, registries) and want to find a good design before merging anything big in. Will review and pull the queued PRs after that. Any small fixes are welcome anytime.

Ben Sless09:10:57

Necroing a question I asked some time ago, any idea for a transformer which removes invalid keys?

Ben Sless09:10:08

(given that they're optional)

ikitommi09:10:32

invalid… you could a) validate the keys when transforming or b) explain and remove based on that.

Ben Sless09:10:45

I think I can build (a), the transformer has access to the parent schema

Ben Sless09:10:21

Also reminds me, I think an interleaved transformer/validator will be good

ikitommi09:10:07

doing both in a single pipeline?

Ben Sless09:10:29

The decoder can end up doing lots of allocations

Ben Sless09:10:42

you can short circuit on it

ikitommi09:10:02

but, you can’t validate before it’s transformed, right?

ikitommi09:10:24

so, it would need to happen on :leave -> it’s all transformed already?

Ben Sless09:10:27

you'll have something like a coercer which transforms and validates in one pass

ikitommi09:10:07

how would it allocate less if that happens after the transformation?

ikitommi09:10:31

wouldn’t all the nested children get re-validated when you are leaving them?

ikitommi09:10:12

[:map [:a [:map [:b [:map [:c [:map [:d :boolean]]]]]]]]

ikitommi09:10:53

unless the walking knows which children are already transformed & validated.

Ben Sless09:10:21

hm, generally, you have no way of knowing if you need to re-walk the children

Ben Sless09:10:33

especially if you do interesting transformations

ikitommi09:10:16

my assumption is that having validation and transformation as separate steps is the fastest way to do it. 2 simple sweeps instead of one (more complex) sweep.

ikitommi09:10:24

but, all ears if there is a better way.
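
A minimal sketch of the two-sweep approach described above: decode first, then validate the decoded value. The schema and the `coerce` helper are made up for illustration; `m/decoder`, `m/validator` and `mt/string-transformer` are existing malli functions.

```clojure
(require '[malli.core :as m]
         '[malli.transform :as mt])

;; hypothetical example schema, not from the discussion
(def schema [:map [:x :int] [:y {:optional true} :boolean]])

;; build both workers once, outside the hot path
(def decode (m/decoder schema (mt/string-transformer)))
(def valid? (m/validator schema))

(defn coerce
  "Sweep 1 decodes, sweep 2 validates.
  Returns the decoded value, or nil if it is still invalid."
  [x]
  (let [decoded (decode x)]
    (when (valid? decoded)
      decoded)))

(coerce {:x "1", :y "true"}) ; => {:x 1, :y true}
(coerce {:x "abc"})          ; => nil
```

Each sweep stays simple, at the cost of walking the value twice.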

ikitommi09:10:12

originally, i thought of doing all the workers using just -walk. there would be a walker to create a validator, decoder etc. but as all schemas should have all of those and as performance was one of the primary goals - they got separate (protocol) methods.

ikitommi10:10:10

the one walker would have made it easier to compose a chain of validate + transform, I think.

Ben Sless10:10:20

Figured out how to strip invalid optional keys, feel free to add it to tips, after some beautifying

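;; assumes (require '[malli.core :as m] '[malli.transform :as mt])
(defn strip-invalid-optional-keys-transformer
  ([]
   (let [transform
         {:compile
          (fn [schema _]
            ;; keep only the entries marked {:optional true}
            (let [entries (filter #(:optional (m/properties (second %))) (m/entries schema))
                  ;; one stripping fn per optional key, each with a precompiled validator
                  fs (map (fn [[k v]]
                            (let [validator (m/validator v)]
                              (fn [m]
                                (if-let [e' (find m k)]
                                  (let [v' (val e')]
                                    (if (validator v')
                                      m
                                      (dissoc m k)))
                                  m))))
                          entries)]
              ;; chain the per-key fns into a single map -> map fn
              (reduce comp fs)))}]
     (mt/transformer
      {:decoders {:map transform}
       :encoders {:map transform}}))))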

Ben Sless10:10:42

(m/decode [:map [:a {:optional true} int?] [:b {:optional true} int?]] {:a 1 :b 2.2} strip-invalid-optional-keys-transformer)

ikitommi09:10:50

a) might be cleaner (and faster)?

ikitommi09:10:00

:or also does validation on transformation, as it needs to find the branch that is valid after transformation.
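
A small illustration of that :or behaviour, as a sketch with a made-up schema: when decoding, each branch is tried in order and the first one whose decoded value validates wins.

```clojure
(require '[malli.core :as m]
         '[malli.transform :as mt])

;; "true" is not a valid :int after decoding, so the :boolean branch is used
(m/decode [:or :int :boolean] "true" (mt/string-transformer)) ; => true

;; "42" decodes to a valid :int, so the first branch is used
(m/decode [:or :int :boolean] "42" (mt/string-transformer))   ; => 42
```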

ikitommi09:10:25

spike about caching computations (`-form`, `-validator` etc.) with schemas:

(def schema
  (m/schema
   [:map
    [:x boolean?]
    [:y {:optional true} int?]
    [:z [:map
         [:x boolean?]
         [:y {:optional true} int?]]]]))

;; 1.5µs -> 11ns (130x)
(p/bench
 (m/validator schema))

;; 1.7µs -> 64ns (25x)
(p/bench
 (m/validate schema {:x true, :z {:x true}})) ; => true

ikitommi09:10:14

there is the initial cost of creating the thing, but just once, as opposed to on every call. the results are cached with the actual schema instance, so when the schema instance is no longer needed, the cached results will also be GCd, so no memory is leaked.

ikitommi09:10:59

using the computation directly is still fastest, but not by much:

;; 55ns
(let [validate (m/validator schema)]
  (p/bench
   (validate {:x true, :z {:x true}})))

Ben Sless12:10:40

Please disregard last message, user error