
Hi guys, I'm looking to use malli for the following use cases...
Context:
• I'm working on a system that takes in data from various 3rd party sources (think scraping).
• The "scraped" data gets transformed into an internal, universal data format that then gets saved to the DB
Problems to be solved:
• Validating that the 3rd party data is of some expected shape
  Obvious how malli gets utilized here
• Mapping the 3rd party data to the internal, universal data format
  Is this possible to be done via custom transformers on each property of the scraped data?
I can give examples if needed

Ben Sless11:08:06

Definitely possible


Awesome! I'll write up some dummy examples

Ben Sless11:08:54

probably more than one way for how you could do it, too


Actually, I'm finding it a bit hard to quickly write up dummy examples. Do you mind linking docs on possible ways to do it @UK0810AQ2?


Actually here's an example:

(def source-shape
  [:map
   [:events [:sequential
             [:map
              [:id int?]
              [:desc string?]
              [:details-id int?]]]]
   [:details [:sequential
              [:map
               [:id int?]
               [:content string?]]]]])
Sample data:
{:events {:id 1
          :desc "Blah"
          :details-id 11}
 :details [{:id 11
            :content "Blargh"}]}
mapped to
[{:id 1
  :description "Blah"
  :details "Blargh"}]

Ben Sless12:08:36

Should events be a sequence of maps?


Ah whoops yes. I'll correct my example

Ben Sless13:08:05

Very roughly:

(defn index-by
  [f xs]
  (reduce (fn [acc x] (assoc acc (f x) x)) {} xs))

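(defn join
  [{:keys [events details]}]
  ;; index details by :id, then fold each event into the result,
  ;; swapping :details-id for the matching :content
  (let [details (index-by :id details)]
    (reduce
     (fn [acc {:keys [details-id] :as event}]
       (conj acc (-> event
                     (dissoc :details-id)
                     (assoc :details (get (get details details-id) :content)))))
     []
     events)))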

(def source-shape
  [:map
   {:decode/fun join}
   [:events [:sequential
             [:map
              [:id int?]
              [:desc string?]
              [:details-id int?]]]]
   [:details [:sequential
              [:map
               [:id int?]
               [:content string?]]]]])

(def data
  {:events [{:id 1
             :desc "Blah"
             :details-id 11}]
   :details [{:id 11
              :content "Blargh"}]})

;; assumes (require '[malli.core :as m] '[malli.transform :as mt])
(m/decode source-shape data
          (mt/transformer {:name :fun}))

Ben Sless13:08:28

but there should be a better way to do it


IMO, things to be aware of:
1. decode is best-effort and will not throw errors if the validation does not pass
2. the transformation functions are expected to handle bad input without throwing errors
3. I think it's best to split the decode-validate-transform into several steps (each can have its own validation)
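To make point 1 concrete, here's a minimal sketch, assuming malli is available and aliased as `m` / `mt` the way the rest of this thread does. It only shows that a failed coercion passes the value through unchanged, so validation has to be its own step:

```clojure
(require '[malli.core :as m]
         '[malli.transform :as mt])

;; A value the string transformer can coerce gets converted...
(m/decode int? "42" mt/string-transformer)
;; => 42

;; ...but one it cannot coerce is returned as-is, with no exception.
(def decoded (m/decode int? "abc" mt/string-transformer))
;; decoded => "abc"

;; So the check has to happen explicitly after decoding.
(m/validate int? decoded)
;; => false
```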


This is how I'd approach it:

(def source-shape
  [:map
   [:events [:sequential
             [:map
              [:id int?]
              [:desc string?]
              [:details-id int?]]]]
   [:details [:sequential
              [:map
               [:id int?]
               [:content string?]]]]])

(def destination-shape
  [:sequential
   [:map
    [:id int?]
    [:description string?]
    [:details string?]]])


(def sample-data
  {:events  [{:id         "1"
              :desc       "Blah"
              :details-id "11"}
             {:id         "2"
              :desc       "Blah 2"
              :details-id "12"}]
   :details [{:id      "11"
              :content "Blargh"}
             {:id      "12"
              :content "Blargh 2"}]})

;; Example broken input
(def sample-data2
  {:events [{:id nil
             :desc "Blah"
             :details-id 11}]
   :details [{:id "11"
              :content "Blargh"}]})


(let [raw-data sample-data

      ;; First, normalize input (this is best-effort and does not throw errors).
      ;; This can take care of things like converting strings to integers, because
      ;; the serialization protocol could not represent integers, etc.
      decoded (m/decode source-shape raw-data mt/string-transformer)

      ;; Second, validate that the normalized input matches our expectations.
      ;; NOTE: you can use m/validate instead of m/parse if you don't need the
      ;; extra information m/parse gives for schemas with branching logic.
      ;; NOTE 2: you could use a different, more strict schema at this point.
      parsed (m/parse source-shape decoded)
      ;; `=` rather than `identical?` so the check also holds in cljs
      _ (when (= parsed ::m/invalid)
          (throw (ex-info "Invalid Input" (m/explain source-shape decoded))))

      ;; At this point we've validated input and want to do transformation
      transformed (custom-transform-logic parsed)

      ;; And would be good idea to validate transformation worked...
      _ (when-not (m/validate destination-shape transformed)
          (throw (ex-info "Invalid Transformation" (m/explain destination-shape transformed))))]
  transformed)


The custom-transform-logic could be the approach @UK0810AQ2 mentioned; but if we're doing this kind of map/join transformation, it might be worth checking out meander (especially when the cases become more complicated to grok):

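(defn custom-transform-logic
  [data]
  ;; `meander` is assumed to alias meander.epsilon;
  ;; search returns every match, joining events to details on ?details-id
  (meander/search data
    {:events  (meander/scan
               {:id         ?event-id
                :desc       ?description
                :details-id ?details-id})
     :details (meander/scan
               {:id      ?details-id
                :content ?details})}
    {:id          ?event-id
     :description ?description
     :details     ?details}))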


Also important to mention: I used m/validate, m/parse etc., but these should all be replaced with m/validator, m/parser etc. in production code, so the schema is compiled once instead of on every call
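A rough sketch of what that swap could look like, assuming the same `m` / `mt` aliases. `event-schema` and `check-event` are made-up names for illustration, not something from the gist:

```clojure
(require '[malli.core :as m]
         '[malli.transform :as mt])

;; Hypothetical schema, just for this example.
(def event-schema
  [:map
   [:id int?]
   [:desc string?]])

;; Build the functions once (e.g. at namespace load)...
(def decode-event  (m/decoder event-schema mt/string-transformer))
(def valid-event?  (m/validator event-schema))
(def explain-event (m/explainer event-schema))

;; ...and reuse them for every incoming value.
(defn check-event
  [raw]
  (let [decoded (decode-event raw)]
    (if (valid-event? decoded)
      decoded
      (throw (ex-info "Invalid Input" (explain-event decoded))))))

(check-event {:id "1" :desc "Blah"})
;; => {:id 1, :desc "Blah"}
```

The behaviour should match the m/decode + m/validate version; the difference is only that the schema isn't re-compiled per call.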

☝️ 2

I apologize for pasting so much code into slack; here's a gist:


I second the recommendation for Meander, perfect use-case for it:


Thanks for the replies guys! I'll check them out after work 🙂


Yes, meander is the king in complex transformations and works nicely with malli. Great example @U05476190!


Thanks @U055NJ5CC, I was hoping someone more experienced would confirm if the approach was reasonable.


Hey guys, again, thanks for these! I somewhat understand how they work. Follow up question: Is there a way to do it via metadata on malli's schema fields, where I somehow use that metadata to drive the mapping? I'm thinking something like

(def source-shape
  [:map
   [:events [:sequential
             [:map
              [:id int?]
              [:desc {:custom/map-to :description} string?]  ;; <----- like this
              [:details-id int?]]]]
   [:details [:sequential
              [:map
               [:id int?]
               [:content string?]]]]])
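One way this might work, as a sketch only: read the entry properties back off the schema and turn them into a rename map. This assumes `m/children` on a `:map` schema returns entries shaped like `[key properties schema]`; `event-shape` and `rename-map` are hypothetical names, and `:custom/map-to` is just the made-up property from the message above:

```clojure
(require '[clojure.set :as set]
         '[malli.core :as m])

;; Hypothetical single-event schema carrying the custom property.
(def event-shape
  [:map
   [:id int?]
   [:desc {:custom/map-to :description} string?]
   [:details-id int?]])

(defn rename-map
  "Collects {source-key target-key} from :custom/map-to entry properties."
  [schema]
  (into {}
        (keep (fn [[k props _]]
                (when-let [target (:custom/map-to props)]
                  [k target])))
        (m/children (m/schema schema))))

;; Apply the collected renames to a decoded event.
(set/rename-keys {:id 1 :desc "Blah" :details-id 11}
                 (rename-map event-shape))
```

This only covers plain key renames; the events/details join would still need something like the join function or meander from earlier in the thread.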

Ben Sless18:08:30

Wrote a mutable entries parser just to see if it was possible. Eliminated almost all the overhead related to parsing entries. It's a disgustingly ugly direct translation of the code, but it works and cut down about 1us / entry

Ben Sless18:08:37

Behold, my horrible creation




my current project use case is on cljs-side btw.

Ben Sless19:08:00

I'm just surprised it works. For cljs we can do what I started with, which was propagating the bindings for every case; it should save some, too


is there a more idiomatic way of testing two schemas for equality than comparing them with malli.util/to-map-syntax?


lol, just noticed malli.util/equals, please disregard

Ben Sless19:08:00

Caveat emptor

(mu/equals (m/schema [:re #"hi"]) (m/schema [:re #"hi"]))

✔️ 5
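For anyone reading along, the caveat presumably comes down to regex patterns not having value equality. A tiny sketch in plain Clojure (no malli needed); that mu/equals ends up relying on `=` over the schema forms is my assumption here:

```clojure
;; Two regex literals with the same source are distinct objects...
(= #"hi" #"hi")
;; => false

;; ...while their string forms compare as expected.
(= (str #"hi") (str #"hi"))
;; => true
```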

regexp schemas should be thrown away, could be just a property in :string schema

👍 7

[:string {:pattern "hi"}]