#malli
2021-08-22
anonimitoraf 11:08:50

Hi guys, I'm looking to use malli for the following use cases.
Context:
• I'm working on a system that takes in data from various 3rd-party sources (think scraping).
• The "scraped" data gets transformed into an internal, universal data format that is then saved to the DB.
Problems to be solved:
• Validating that the 3rd-party data has the expected shape. It's obvious how malli is used here.
• Mapping the 3rd-party data to the internal, universal data format. Is this possible via custom transformers on each property of the scraped data?
I can give examples if needed.

Ben Sless 11:08:06

Definitely possible

anonimitoraf 11:08:49

Awesome! I'll write up some dummy examples

Ben Sless 11:08:54

probably more than one way you could do it, too

anonimitoraf 11:08:19

Actually, I'm finding it a bit hard to quickly write up dummy examples. Do you mind linking docs on possible ways to do it @UK0810AQ2?

anonimitoraf 12:08:45

Actually here's an example:

(def source-shape
  [:map
   [:events [:sequential
             [:map
              [:id int?]
              [:desc string?]
              [:details-id int?]]]]
   [:details [:sequential
               [:map
                [:id int?]
                [:content string?]]]]])
Sample data:
{:events {:id 1
          :desc "Blah"
          :details-id 11}
 :details [{:id 11
            :content "Blargh"}]}
mapped to
[{:id 1
  :description "Blah"
  :details "Blargh"}]

Ben Sless 12:08:36

Should events be a sequence of maps?

anonimitoraf 12:08:23

Ah whoops yes. I'll correct my example

Ben Sless 13:08:05

Very roughly:

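(require '[malli.core :as m]
         '[malli.transform :as mt])

;; Index a collection by the value of (f x) for each element x
(defn index-by
  [f xs]
  (reduce (fn [acc x] (assoc acc (f x) x)) {} xs))

;; Replace each event's :details-id with the matching detail's :content,
;; and rename :desc to :description to match the target shape
(defn join
  [{:keys [events details]}]
  (let [details (index-by :id details)]
    (reduce
     (fn [acc {:keys [desc details-id] :as event}]
       (conj acc (-> event
                     (dissoc :desc :details-id)
                     (assoc :description desc
                            :details (get-in details [details-id :content])))))
     []
     events)))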

(def source-shape
  [:map
   {:decode/fun join}
   [:events [:sequential
             [:map
              [:id int?]
              [:desc string?]
              [:details-id int?]]]]
   [:details [:sequential
              [:map
               [:id int?]
               [:content string?]]]]])

(def data
  {:events [{:id 1
             :desc "Blah"
             :details-id 11}]
   :details [{:id 11
              :content "Blargh"}]})

(m/decode
 source-shape
 data
 (mt/transformer {:name :fun}))

Ben Sless 13:08:28

but there should be a better way to do it

pithyless 14:08:17

IMO, things to be aware of:
1. decode is best-effort and will not throw errors if validation does not pass
2. the transformation functions are expected to handle bad input without throwing errors
3. I think it's best to split decode-validate-transform into several steps (each can have its own validation)
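A minimal sketch of point 1, assuming only malli's built-in `mt/string-transformer` and a bare `int?` schema: a string that parses as an integer is coerced, while one that doesn't is handed back unchanged instead of raising, so nothing fails until you validate explicitly.

```clojure
(require '[malli.core :as m]
         '[malli.transform :as mt])

;; "42" parses, so the string transformer coerces it to a number
(def good (m/decode int? "42" mt/string-transformer))

;; "abc" does not parse; decode returns it unchanged instead of throwing
(def bad (m/decode int? "abc" mt/string-transformer))

;; only an explicit validation step notices the problem
(m/validate int? good) ;; => true
(m/validate int? bad)  ;; => false
```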

pithyless 14:08:02

This is how I'd approach it:

(def source-shape
  [:map
   [:events [:sequential
             [:map
              [:id int?]
              [:desc string?]
              [:details-id int?]]]]
   [:details [:sequential
              [:map
               [:id int?]
               [:content string?]]]]])


(def destination-shape
  [:sequential
   [:map
    [:id int?]
    [:description string?]
    [:details string?]]])

pithyless 14:08:27

(def sample-data
  {:events  [{:id         "1"
              :desc       "Blah"
              :details-id "11"}
             {:id         "2"
              :desc       "Blah 2"
              :details-id "12"}]
   :details [{:id      "11"
              :content "Blargh"}
             {:id      "12"
              :content "Blargh 2"}]})

;; Example broken input
(def sample-data2
  {:events [{:id nil
             :desc "Blah"
             :details-id 11}]
   :details [{:id "11"
              :content "Blargh"}]})
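A sketch of what validation reports for `sample-data2`, assuming malli and `malli.error` are on the classpath (`decoded` and `explanation` are illustrative names): decoding still coerces the string `"11"` under `:details`, but no transformer can turn the `nil` event id into an integer, so `m/explain` reports a single error whose `:in` path points at that value.

```clojure
(require '[malli.core :as m]
         '[malli.transform :as mt]
         '[malli.error :as me])

(def source-shape
  [:map
   [:events [:sequential
             [:map
              [:id int?]
              [:desc string?]
              [:details-id int?]]]]
   [:details [:sequential
              [:map
               [:id int?]
               [:content string?]]]]])

(def sample-data2
  {:events [{:id nil
             :desc "Blah"
             :details-id 11}]
   :details [{:id "11"
              :content "Blargh"}]})

;; Decoding fixes what it can: the "11" under :details becomes 11...
(def decoded (m/decode source-shape sample-data2 mt/string-transformer))

;; ...but nil cannot become an int, so explain returns an error report
(def explanation (m/explain source-shape decoded))

;; :in is the path to the offending value inside the data
(map :in (:errors explanation))

;; a human-readable version, nested like the input data
(me/humanize explanation)
```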

pithyless 14:08:51

(let [raw-data sample-data

      ;; First, normalize input (this is best-effort and does not throw errors).
      ;; This can take care of things like converting strings to integers, because
      ;; the serialization protocol could not represent integers, etc.
      decoded (m/decode source-shape raw-data mt/string-transformer)

      ;; Second, validate that the normalized input matches our expectations.
      ;; NOTE: you can use m/validate instead of m/parse if the schema has branching logic,
      ;; but you don't want to take advantage of the extra information at this point.
      ;; NOTE 2: you could use a different, stricter schema at this point.
      parsed (m/parse source-shape decoded)
      _ (when (identical? parsed ::m/invalid)
          (throw (ex-info "Invalid Input" (m/explain source-shape decoded))))

      ;; At this point we've validated input and want to do the transformation
      transformed (custom-transform-logic parsed)

      ;; It's also a good idea to validate that the transformation worked...
      _ (when-not (m/validate destination-shape transformed)
          (throw (ex-info "Invalid Transformation" (m/explain destination-shape transformed))))]
  transformed)

pithyless 14:08:16

The custom-transform-logic could be the approach @UK0810AQ2 mentioned, but if we're doing these kinds of map/join transformations, it might be worth checking out Meander (especially when the cases become harder to grok):

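(require '[meander.epsilon :as meander])

;; Pair each event with the detail whose :id unifies with the event's
;; :details-id (both patterns share the ?details-id logic variable)
(defn custom-transform-logic
  [data]
  (-> data
      (meander/search
        {:events  (meander/scan
                   {:id         ?event-id
                    :desc       ?description
                    :details-id ?details-id})
         :details (meander/scan
                   {:id      ?details-id
                    :content ?details})}
        {:id          ?event-id
         :description ?description
         :details     ?details})
      vec))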
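For comparison, a plain-Clojure sketch of the same join without Meander (the name `join-events` is hypothetical): a `for` comprehension over both collections with a `:when` guard plays the role of the shared `?details-id` logic variable, which makes Meander match an event's `:details-id` against a detail's `:id`.

```clojure
(defn join-events
  "Pairs every event with every detail whose :id equals the event's
  :details-id, emitting maps in the destination shape."
  [{:keys [events details]}]
  (vec
   (for [{:keys [id desc details-id]} events
         {detail-id :id content :content} details
         :when (= details-id detail-id)]
     {:id          id
      :description desc
      :details     content})))

(join-events {:events  [{:id 1 :desc "Blah" :details-id 11}
                        {:id 2 :desc "Blah 2" :details-id 12}]
              :details [{:id 11 :content "Blargh"}
                        {:id 12 :content "Blargh 2"}]})
;; => [{:id 1, :description "Blah", :details "Blargh"}
;;     {:id 2, :description "Blah 2", :details "Blargh 2"}]
```

As with `meander/search`, an event with no matching detail produces no output. This version compares every event with every detail, which is fine for small inputs; indexing the details by `:id` first, as in the `index-by` approach, avoids the quadratic cost.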

pithyless 14:08:52

Also worth mentioning: I used m/validate, m/parse, etc., but in production code these should be replaced with m/validator, m/parser, etc., which compile the schema once and return a reusable function.
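A sketch of the same pipeline built from the compiled variants (`m/decoder`, `m/parser`, `m/explainer`, `m/validator`), so each schema is compiled once when its var is defined rather than on every call. The `process` name and passing the transformation in as an argument are illustrative choices, not something from the thread.

```clojure
(require '[malli.core :as m]
         '[malli.transform :as mt])

(def source-shape
  [:map
   [:events [:sequential [:map [:id int?] [:desc string?] [:details-id int?]]]]
   [:details [:sequential [:map [:id int?] [:content string?]]]]])

(def destination-shape
  [:sequential [:map [:id int?] [:description string?] [:details string?]]])

;; Each call below compiles its schema once and returns a plain function
(def decode-source (m/decoder source-shape mt/string-transformer))
(def parse-source (m/parser source-shape))
(def explain-source (m/explainer source-shape))
(def valid-destination? (m/validator destination-shape))
(def explain-destination (m/explainer destination-shape))

(defn process
  "Decodes, validates, transforms and re-validates raw-data.
  `transform` is the join step, e.g. custom-transform-logic."
  [transform raw-data]
  (let [decoded     (decode-source raw-data)
        parsed      (parse-source decoded)
        _           (when (identical? parsed ::m/invalid)
                      (throw (ex-info "Invalid Input" (explain-source decoded))))
        transformed (transform parsed)]
    (when-not (valid-destination? transformed)
      (throw (ex-info "Invalid Transformation" (explain-destination transformed))))
    transformed))
```

Only the four compiled functions touch the schemas at runtime, so `process` can be called in a hot loop without re-parsing the schema vectors.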

pithyless 14:08:18

I apologize for pasting so much code into Slack; here's a gist: https://gist.github.com/pithyless/0be222e1b1b3bca0239a9ca07d1b34c2

schmee 17:08:52

I second the recommendation for Meander; this is a perfect use case for it: https://github.com/noprompt/meander

anonimitoraf 23:08:50

Thanks for the replies guys! I'll check them out after work 🙂

ikitommi 13:08:06

Yes, meander is the king in complex transformations and works nicely with malli. Great example @U05476190!

pithyless 14:08:01

Thanks @U055NJ5CC, I was hoping someone more experienced would confirm if the approach was reasonable.

anonimitoraf 12:08:49

Hey guys, again, thanks for these! I somewhat understand how they work. Follow-up question: is there a way to do it via metadata (properties) on malli schema fields that I then somehow act on? I'm thinking something like

(def source-shape
  [:map
   [:events [:sequential
             [:map
              [:id int?]
              [:desc
               {:custom/map-to :description}  ;; <----- like this
               string?]
              [:details-id int?]]]]
   [:details [:sequential
              [:map
               [:id int?]
               [:content string?]]]]])
?
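A minimal sketch of one way to act on such a property, assuming the hypothetical `:custom/map-to` key from the question. It relies on `m/children`, which for a `:map` schema yields `[key properties schema]` triples, and on `clojure.set/rename-keys`; `rename-map` and `rename-by-schema` are illustrative names. It only renames the keys of one map, so the join of events with details still needs one of the approaches above.

```clojure
(require '[malli.core :as m]
         '[clojure.set :as set])

(def event-shape
  [:map
   [:id int?]
   [:desc {:custom/map-to :description} string?]
   [:details-id int?]])

(defn rename-map
  "Builds {source-key target-key} from the hypothetical :custom/map-to
  property of each entry in a :map schema."
  [map-schema]
  (into {}
        (keep (fn [[k props _schema]]
                (when-let [target (:custom/map-to props)]
                  [k target])))
        (m/children (m/schema map-schema))))

(defn rename-by-schema
  "Renames the keys of data according to the schema's entry properties."
  [map-schema data]
  (set/rename-keys data (rename-map map-schema)))

(rename-map event-shape)
;; => {:desc :description}

(mapv #(rename-by-schema event-shape %)
      [{:id 1 :desc "Blah" :details-id 11}])
;; => [{:id 1, :details-id 11, :description "Blah"}]
```

In real use you would compute `rename-map` once per schema instead of on every call.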

Ben Sless 18:08:30

Wrote a mutable entries parser just to see if it was possible. It eliminated almost all the overhead related to parsing entries. It's a disgustingly ugly direct translation of the code, but it works and cut about 1 µs per entry.

Ben Sless 18:08:37

Behold, my horrible creation

ikitommi 19:08:01

🥰

ikitommi 19:08:55

my current project use case is on cljs-side btw.

Ben Sless 19:08:00

I'm just surprised it works. For cljs we can do what I started with, which was propagating the bindings for every case; it should save some, too.

respatialized 19:08:08

is there a more idiomatic way of testing two schemas for equality than comparing them with malli.util/to-map-syntax?

respatialized 19:08:14

lol, just noticed malli.util/equals, please disregard

Ben Sless 19:08:00

Caveat emptor

(mu/equals (m/schema [:re #"hi"]) (m/schema [:re #"hi"]))
false
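The `false` above presumably comes from regex equality on the JVM: `java.util.regex.Pattern` does not override `equals`, so two separately read `#"hi"` literals are distinct objects that never compare equal, even with identical source text. A minimal sketch of that behaviour, plus one workaround that reuses a single pattern object (the var name `hi-re` is illustrative):

```clojure
(require '[malli.core :as m]
         '[malli.util :as mu])

;; Pattern equality is identity: two literals are two different objects
(= #"hi" #"hi")             ;; => false
;; comparing the pattern source strings does work
(= (str #"hi") (str #"hi")) ;; => true

;; Reusing one Pattern instance in both schemas makes the forms equal
(def hi-re #"hi")
(mu/equals (m/schema [:re hi-re]) (m/schema [:re hi-re])) ;; => true

;; schemas without regexes compare as expected
(mu/equals [:map [:x int?]] [:map [:x int?]]) ;; => true
```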

ikitommi 19:08:42

regexp schemas should be thrown away, could be just a property in :string schema

ikitommi 19:08:17

[:string {:pattern "hi"}]