This page is not created by, affiliated with, or supported by Slack Technologies, Inc.
2021-08-22
Hi guys, I'm looking to use malli for the following use cases...
Context:
• I'm working on a system that takes in data from various 3rd party sources (think scraping).
• The "scraped" data gets transformed into an internal, universal data format that then gets saved to the DB
Problems to be solved:
• Validating that the 3rd party data are of some expected shape
Obvious how malli gets utilized here
• Mapping the 3rd party data to the internal, universal data format
Is this possible to be done via custom transformers on each property of the scraped data?
I can give examples if needed
Awesome! I'll write up some dummy examples
Actually, I'm finding it a bit hard to quickly write up dummy examples. Do you mind linking docs on possible ways to do it @UK0810AQ2?
Actually here's an example:
(def source-shape
  [:map
   [:events [:sequential
             [:map
              [:id int?]
              [:desc string?]
              [:details-id int?]]]]
   [:details [:sequential
              [:map
               [:id int?]
               [:content string?]]]]])
Sample data:
{:events {:id 1
          :desc "Blah"
          :details-id 11}
 :details [{:id 11
            :content "Blargh"}]}
mapped to
[{:id 1
  :description "Blah"
  :details "Blargh"}]
Ah whoops yes. I'll correct my example
Very roughly:
(require '[malli.core :as m]
         '[malli.transform :as mt])

;; Build a lookup map from (f x) to x.
(defn index-by
  [f xs]
  (reduce (fn [acc x] (assoc acc (f x) x)) {} xs))

;; Replace each event's :details-id with the :content of the matching detail.
(defn join
  [{:keys [events details]}]
  (let [details (index-by :id details)]
    (reduce
     (fn [acc {:keys [details-id] :as event}]
       (conj acc (-> event
                     (dissoc :details-id)
                     (assoc :details (get (get details details-id) :content)))))
     []
     events)))

(def source-shape
  [:map
   {:decode/fun join}
   [:events [:sequential
             [:map
              [:id int?]
              [:desc string?]
              [:details-id int?]]]]
   [:details [:sequential
              [:map
               [:id int?]
               [:content string?]]]]])

(def data
  {:events [{:id 1
             :desc "Blah"
             :details-id 11}]
   :details [{:id 11
              :content "Blargh"}]})

;; The transformer named :fun picks up the :decode/fun property on the :map schema.
(m/decode
 source-shape
 data
 (mt/transformer {:name :fun}))
IMO, things to be aware of:
1. decode is best-effort and will not throw errors if the validation does not pass
2. the transformation functions are expected to handle bad input without throwing errors
3. I think it's best to split the decode-validate-transform into several steps (each can have its own validation)
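To illustrate the first point, here is a minimal sketch. The one-field `event-id-schema` is a made-up schema for illustration only; the calls are the same `m/decode`, `m/validate` and `mt/string-transformer` used elsewhere in this thread.

```clojure
(require '[malli.core :as m]
         '[malli.transform :as mt])

;; Hypothetical one-field schema, used only for this illustration.
(def event-id-schema [:map [:id int?]])

;; A value that can be coerced is converted...
(def good (m/decode event-id-schema {:id "1"} mt/string-transformer))

;; ...while a value that cannot be coerced is passed through unchanged.
;; No exception is thrown here.
(def bad (m/decode event-id-schema {:id "abc"} mt/string-transformer))

;; Only a separate validation step tells the two apart.
(m/validate event-id-schema good)
(m/validate event-id-schema bad)
```

So a decode on its own says nothing about whether the result is valid; that is why the validation is worth keeping as its own step.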
This is how I'd approach it:
(def source-shape
  [:map
   [:events [:sequential
             [:map
              [:id int?]
              [:desc string?]
              [:details-id int?]]]]
   [:details [:sequential
              [:map
               [:id int?]
               [:content string?]]]]])

(def destination-shape
  [:sequential
   [:map
    [:id int?]
    [:description string?]
    [:details string?]]])

(def sample-data
  {:events [{:id "1"
             :desc "Blah"
             :details-id "11"}
            {:id "2"
             :desc "Blah 2"
             :details-id "12"}]
   :details [{:id "11"
              :content "Blargh"}
             {:id "12"
              :content "Blargh 2"}]})

;; Example broken input
(def sample-data2
  {:events [{:id nil
             :desc "Blah"
             :details-id 11}]
   :details [{:id "11"
              :content "Blargh"}]})

(let [raw-data sample-data
      ;; First, normalize input (this is best-effort and does not throw errors).
      ;; This can take care of things like converting strings to integers, because
      ;; the serialization protocol could not represent integers, etc.
      decoded (m/decode source-shape raw-data mt/string-transformer)
      ;; Second, validate that the normalized input matches our expectations.
      ;; NOTE: you can use m/validate instead of m/parse if schema has branching logic,
      ;; but you don't want to take advantage of the extra information at this point.
      ;; NOTE 2: you could use a different, more strict schema at this point.
      parsed (m/parse source-shape decoded)
      _ (when (identical? parsed ::m/invalid)
          (throw (ex-info "Invalid Input" (m/explain source-shape decoded))))
      ;; At this point we've validated input and want to do transformation
      transformed (custom-transform-logic parsed)
      ;; And would be good idea to validate transformation worked...
      _ (when-not (m/validate destination-shape transformed)
          (throw (ex-info "Invalid Transformation" (m/explain destination-shape transformed))))]
  transformed)
The custom-transform-logic could be the approach @UK0810AQ2 mentioned; but if we're doing these kinds of map/join transformations, it might be worth checking out meander (especially when the cases become more complicated to grok):
(require '[meander.epsilon :as meander])

(defn custom-transform-logic
  [data]
  (-> data
      ;; ?details-id appears in both scans, which joins events to their details.
      (meander/search
       {:events (meander/scan
                 {:id ?event-id
                  :desc ?description
                  :details-id ?details-id})
        :details (meander/scan
                  {:id ?details-id
                   :content ?details})}
       {:id ?event-id
        :description ?description
        :details ?details})
      vec))
^ FYI @UR37CBF8D :)
Also important to mention: I used m/validate, m/parse etc., but these should all be replaced with m/validator, m/parser etc. in production code
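A rough sketch of what that swap might look like. The names `event-schema`, `decode-event`, `valid-event?`, `explain-event` and `coerce-event` are made up for illustration; `m/decoder`, `m/validator` and `m/explainer` return functions that are built once from the schema and can then be reused on every input.

```clojure
(require '[malli.core :as m]
         '[malli.transform :as mt])

;; Hypothetical schema, used only for this illustration.
(def event-schema
  [:map
   [:id int?]
   [:desc string?]])

;; Built once, at load time, instead of on every call.
(def decode-event (m/decoder event-schema mt/string-transformer))
(def valid-event? (m/validator event-schema))
(def explain-event (m/explainer event-schema))

(defn coerce-event
  "Decodes raw input and throws if the decoded value does not match event-schema."
  [raw]
  (let [decoded (decode-event raw)]
    (if (valid-event? decoded)
      decoded
      (throw (ex-info "Invalid Input" (explain-event decoded))))))

(coerce-event {:id "1" :desc "Blah"})
```

The behaviour is the same as calling m/decode, m/validate and m/explain directly; the difference is only that the schema is not re-processed on each call.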
I apologize for pasting so much code into slack; here's a gist: https://gist.github.com/pithyless/0be222e1b1b3bca0239a9ca07d1b34c2
I second the recommendation for Meander, perfect use-case for it: https://github.com/noprompt/meander
Thanks for the replies guys! I'll check them out after work 🙂
Yes, meander is the king in complex transformations and works nicely with malli. Great example @U05476190!
Thanks @U055NJ5CC, I was hoping someone more experienced would confirm if the approach was reasonable.
Hey guys, again, thanks for these! I somewhat understand how they work.
Follow up question:
Is there a way to do it via metadata on malli's schema fields, so that I can somehow manipulate that metadata?
I'm thinking something like
(def source-shape
  [:map
   [:events [:sequential
             [:map
              [:id int?]
              [:desc
               {:custom/map-to :description} ;; <----- like this
               string?]
              [:details-id int?]]]]
   [:details [:sequential
              [:map
               [:id int?]
               [:content string?]]]]])
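One possible way to read such a property back out, as a sketch only: `:custom/map-to` is the asker's own property name and means nothing to malli itself, and `rename-map` and `apply-renames` are hypothetical helpers. The sketch relies on `m/children` returning `[key properties schema]` entries for a `:map` schema, and it only covers the key renaming for a single event map, not the join with `:details`.

```clojure
(require '[clojure.set :as set]
         '[malli.core :as m])

;; Schema for a single event, carrying the hypothetical :custom/map-to property.
(def event-schema
  [:map
   [:id int?]
   [:desc {:custom/map-to :description} string?]
   [:details-id int?]])

(defn rename-map
  "Collects {source-key target-key} from the entries that carry :custom/map-to."
  [map-schema]
  (into {}
        (for [[k props _child] (m/children (m/schema map-schema))
              :let [target (:custom/map-to props)]
              :when target]
          [k target])))

(defn apply-renames
  "Renames the keys of value according to the :custom/map-to entry properties."
  [map-schema value]
  (set/rename-keys value (rename-map map-schema)))

(apply-renames event-schema {:id 1 :desc "Blah" :details-id 11})
;; => {:id 1, :description "Blah", :details-id 11}
```

Entries without properties simply contribute nothing to the rename map, so the schema can be annotated field by field.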
Wrote a mutable entries parser just to see if it was possible. Eliminated almost all the overhead related to parsing entries. It's a disgustingly ugly direct translation of the code, but it works and cut down about 1us / entry
I'm just surprised it works
For cljs we can do what I started with, which was propagating the bindings for every case; it should save some, too
is there a more idiomatic way of testing two schemas for equality than comparing them with malli.util/to-map-syntax?
lol, just noticed malli.util/equals, please disregard
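For reference, a minimal usage sketch of malli.util/equals with two made-up schemas. As far as I know it compares the schemas' forms, so treat the exact notion of equality as an assumption to verify against the malli source.

```clojure
(require '[malli.util :as mu])

;; Two structurally identical schemas compare as equal.
(def same? (mu/equals [:map [:id int?]] [:map [:id int?]]))

;; A differing child schema makes them unequal.
(def different? (mu/equals [:map [:id int?]] [:map [:id string?]]))
```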