I'm receiving some types from an API, and I've written some transformers for the data. However, the main event type in the API is a :multi, and the field I dispatch on is being transformed. I can't get Malli to handle this. The spec looks like this:
(def RawMessageStreamEvent
"A raw event in a message stream, which can be any of the defined event types."
[:multi {:dispatch :type}
[:message-start RawMessageStartEvent]
[:message-delta RawMessageDeltaEvent]
... etc etc ...
I have two transformers:
(def ->kebab
(mt/key-transformer
{:decode (let [do-it #(clojure.string/replace % "_" "-")]
(fn [x] (if (keyword? x)
(if-let [ns (namespace x)]
(keyword (do-it ns) (do-it (name x)))
(keyword (do-it (name x))))
x)))
:encode (let [do-it #(clojure.string/replace % "-" "_")]
(fn [x] (if (keyword? x)
(if-let [ns (namespace x)]
(keyword (do-it ns) (do-it (name x)))
(keyword (do-it (name x))))
x)))}))
(defn entry-transformer [key decode-fn encode-fn]
(mt/transformer
{:decoders
{:map {:compile (fn [_ _]
(fn [x]
(if (contains? x key)
(update x key decode-fn)
x)))}}
:encoders
{:map {:compile (fn [_ _]
(fn [x]
(if (contains? x key)
(update x key encode-fn)
x)))}}}))
Which I compose like:
(mt/transformer
->kebab
(entry-transformer :type
#(keyword (str/replace % "_" "-"))
#(str/replace (name %) "-" "_")))
The problem is that the :type field comes in like "message_start". If I dispatch on "message_start" in the spec, then the value gets transformed, and the coercion fails because the :type field is now :message-start. If I dispatch on :message-start, then the :type field is not transformed, and the coercion also fails. How should I handle this?
I guess entry-transformer only affects the map schemas inside the :multi. I'm not sure there is an easy way to create a transformer that matches the :multi schema like this.
I've usually used the string transformer for these cases:
[:multi {:dispatch :type, :decode/string (fn [x] (update x :type decode))} ...]
The decode property makes it easy to control when the decoding happens.
So something like this?
(def RawMessageStreamEvent
"A raw event in a message stream, which can be any of the defined event types."
[:multi {:dispatch :type
:decode/string (fn [x] (update x :type #(keyword (str/replace % "_" "-"))))}
[:message-start RawMessageStartEvent]
...
That still doesn't seem to decode the type field for the dispatch.
Have you enabled the string transformer? https://github.com/metosin/malli/blob/master/src/malli/transform.cljc#L421
(You could also use json-transformer / :json/decode if it makes more sense)
Thanks for the pointers, I'm fiddling with those now...
The combination of the :decode/string and the string and json transformers seems to have worked, thanks!
String transformers are a superset of json transformers.
Something is weird here.
Yeah, either :decode/string / :decode/json property and string-transformer enabled OR
:decode/json and json-transformer enabled (`json-transformer` won't inherit string transformations, so :decode/string wouldn't be used).
But enabling both string- and json-transformer shouldn't be needed
If the API is using JSON/EDN/Transit, using json-transformer probably makes more sense (even though it includes some transformations that you won't need with EDN/Transit).
You should be able to roll your own transformer which uses those/similar properties also, if needed. We might not have examples of that though.
The API is using JSON (Anthropic's LLM API). I'll try to come up with a minimal repro tomorrow, at the moment I'm getting by with this:
(defn munge-keywords-for-output [data]
; fuck it, I need to get some work done...
(walk/postwalk
#(if (keyword? %)
(transform-keyword snake-case %)
%)
data))
But it's clearly not ideal.
Do you have it all specified?
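For reference, the postwalk workaround above relies on two helpers (transform-keyword and snake-case) that are not shown in the thread. A minimal self-contained sketch of what they might look like, assuming a plain "-" to "_" replacement and nothing beyond clojure.string and clojure.walk; both helper bodies are hypothetical reconstructions, not the original author's code:

```clojure
(require '[clojure.string :as str]
         '[clojure.walk :as walk])

;; Hypothetical helper: apply f to the namespace (if any) and name of keyword k.
(defn transform-keyword [f k]
  (if-let [kw-ns (namespace k)]
    (keyword (f kw-ns) (f (name k)))
    (keyword (f (name k)))))

;; Hypothetical helper: kebab-case string -> snake_case string.
(defn snake-case [s]
  (str/replace s "-" "_"))

(defn munge-keywords-for-output
  "Walks data and rewrites every keyword (map keys and keyword values alike)
  to snake_case. Strings are left untouched."
  [data]
  (walk/postwalk
   #(if (keyword? %)
      (transform-keyword snake-case %)
      %)
   data))

(munge-keywords-for-output
 {:type  :first-message
  :delta {:stop-reason :end-turn}
  :model "claude-3-5-sonnet-20240620"})
;; => {:type :first_message, :delta {:stop_reason :end_turn}, :model "claude-3-5-sonnet-20240620"}
```

Because the walk only touches keywords, string values such as model names survive unchanged, but it also ignores the schema entirely, so it cannot tell a keyword that should stay kebab-case from one that should not.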
Here's a gist with a stripped-down example: https://gist.github.com/cursive-ide/af961a5c513adcb12bd75509813b1619
As shown, testit outputs:
{"type":"first_message","reason":"foo_bar"}
{"type":"first-message","reason":"foo-bar"}
{"type":"second_message","value":0.0}
{"type":"second-message","value":0.0}
So the ->kebab isn't being applied on the encode, for some reason.
Also, I have string-transformer on the input, if I don't I get a failure because the value that is being conformed is: {:type "first_message", :reason "foo_bar"}, i.e. the entry values are not converted to keywords despite being spec'ed as such. However, I need to use json-transformer on the output, because if I use string-transformer there, the float value is coerced to a string.
I'm very new to Malli, so my understanding of how this works is sketchy at best, but despite having tried to read the doc to sort these issues out, it's been hard to make it work.
Ok, based on @juhoteperi's comment above, I've made some progress, if I switch to decode/json instead of decode/string for the type field, then I can use json-transformer on the input too. It's definitely not clear from the doc that those annotations are related to the corresponding built-in transformers, but it makes sense now I know that.
But I would be very interested in knowing why the ->kebab transformer isn't applied on the encoding.
Maybe = doesn't match enum or any of those
transforming the inspected values under :multi - my only weakness
Try adding encoder for :=
https://github.com/metosin/malli/blob/master/src/malli/transform.cljc#L274 the built-in transformers have it separate from enum also
Ok, adding one for := gets me closer, weirdly it seems to get a string not a keyword, so := #(clojure.string/replace % "-" "_"). That works for the type fields, but the enums are still not being transformed.
Oh wait, the enums also receive a string, so :enum #(if (string? %) (clojure.string/replace % "-" "_") %) fixes that, too.
Ugh, but then other values which are actually strings also get munged (in the real data). So "claude-3-5-sonnet-20240620" gets converted to "claude_3_5_sonnet_20240620". That one is a string enum. Can I use the schema type somehow in the encoder?
I thought :encoder/json identity might help there, but it still gets translated.
I updated the gist, the config there works except for the string enums.
:encode/json
Arg. Unfortunately, it still doesn't work.
When composing transformers, is it expected to use the same transformer chain in the same order for encoding and decoding?
same transformer chain, encoders work on interceptor enter and decoders on leave
On the gist, keyword or = encoder runs before it gets to encode-type (if you add that to the :multi schema props)
I've added another case to the gist. The reason enum at the top level is correctly mapped, but the stop-reason one embedded inside delta is not.
Ok, I'll try that.
I've probably usually handled the map-key transformations on json decoding/encoding part, with Jsonista options. It is the most performant option.
But if you want to do it with malli... I think having :map handler on the ->kebab transformer would be enough
Or I guess you also want to change values to snake-case, then this is ok
The map key part actually works ok, it's the map values that are the problem.
But I already do have a :map handler on the ->kebab transformer.
encode-type is actually not used in that gist, I had that there from when I was trying to add it manually to some fields.
One of your decoders returns STRING values, e.g. for :reason, though schema is enum of keyword for them.
Then you call encoder with non-conforming value, so the encoding doesn't run correctly
or hmm. checking m/validate does say decoded value is valid.
All the decoders should return keywords.
(defn round-trip [msg-json]
  (let [decoded (m/coerce Message
                          (json/read-str msg-json :key-fn keyword)
                          message-input-transformer)
        _ (println "decoded" decoded)
        valid? (m/validate Message decoded)
        _ (println "valid" valid?)
        encoded (m/encode Message
                          decoded
                          message-output-transformer)
        ;; the println must sit inside the binding vector, like the ones above
        _ (println "encoded" encoded)]
    (json/write-str encoded)))
decoded {:type :second-message, :value 0.0, :reason foo-bar}
valid true
encoded {:type second_message, :value 0.0, :reason foo_bar}
Oh reason is enum of strings so this is correct
Yes, I have two cases of enums in this API - they have some which are identifiers (e.g. for stop reasons), which I would like to have as keywords since that makes sense. But they also have enums of strings, e.g. for model names, which need to be preserved.
I mean, I could just make the model name a string and not validate it, that wouldn't be terrible.
But the other weird case is stop-reason, which is inside delta. That looks the same as reason at the top level in the same object, but one is correctly encoded and the other is not.
your enum encoder fn has keyword call instead of keyword?
Arg, that will do it.
I added println to enum encoder and it seems like it isn't even called for the stop_reason
And it looks like that actually receives strings anyway. So in fact the erroneous keyword check was doing nothing anyway, it always returned true because it's always passed a string, just :enum #(clojure.string/replace % "-" "_") works fine. But the string enums are still incorrectly encoded, and stop_reason is still skipped, right.
If it's not called, I wonder if a higher level schema is not validating and that's causing that to be skipped.
Or is it that the json transformer is running first
Probably, and json transformer encoder fn will take care of doing keyword -> string conversion so the second transformer sees the values as strings already
but hmm, why aren't the encoder fns even called
I don't understand why the :enum and := encoders see strings, but the :keyword one sees keywords.
decode keyword sees keywords because json-transformer already is doing that transformation first
decode enum for FirstMessage :reason also sees keywords for same reason
Actually, in json-transformer they both have:
:enum {:compile (-infer-child-compiler :decode)}
:= {:compile (-infer-child-compiler :decode)}
And if I understand correctly, the child compilers will convert to strings. So right, in those two cases, json-transformer will convert them to strings.
(for encoding, that is)
Right, I understand how that works a little better now.
So by the time ->kebab is called, all enums are strings on encoding. I can't tell them apart unless I can access the schema in the encoder. It looks like I would have to use a :compile encoder, is that right?
I don't really understand that part, but yes, seems like you would be able to access the schema and possibly even control which interceptor stage those run
I still don't understand why enum encoder isn't called for FirstMessage :delta :stop-reason :enum
Is there something that stops the encoding process once the value is already encoded once
Ok! I can handle the string/keyword enums doing this:
{:enum {:compile
        (fn [schema _]
          (if-let [type (some-> schema
                                (m/children)
                                (m/-infer))]
            (if (= type :keyword)
              #(clojure.string/replace % "-" "_")
              identity)
            identity))}}
The only remaining mystery is the stop-reason one.
Ha! After the json-transformer the map key is transformed to stop_reason -> the ->kebab transformer doesn't then see the stop-reason key.
Ohhhh
I'm not sure what would be the correct way to handle the case where you combine transformers and the first one changes map keys
So the map key is a string by then?
not sure if string/keyword matters here, but the value is different
It works if you rename the key to stopreason
So because the value is different, the encoder is no longer called because the key no longer matches the schema, is that right?
so there is no - _ difference
yeah
Surely I can't be the only person in the world wanting to convert snake to kebab when calling an API
Well, I have experience from several solutions to this, also before Malli.
We have had the most success when avoiding this and just using whatever naming the external system or API definition requires. It doesn't look too bad to just have :foo_bar or :fooBar in the Clojure code.
But we have also done this at JSON level or JDBC options level for map keys.
Yeah, I can see that being easiest and most robust.
But very rarely for values I think.
We might coerce values to keyword etc, but I don't think we've done letter-case type changes
Yes, keys are easy, I can just use key-fn or whatever jsonista does for that.
One option is to drop json-transformer and extend your own transformer to handle parts of what json-transformer would do which you need
Ok. I will think about my best solution, but I suspect it will be just transforming the keys at the json level, and using snake case.
Right, I was also thinking that - I have a very limited set of types.
If you handle keys in json level, Malli might work better for values also, as the transformers won't need to change map keys then
That's also true, yeah.
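As a rough illustration of handling keys at the JSON level, the snake/kebab conversion reduces to a pair of small key functions. This is only a sketch: the function names are hypothetical, and how they get wired in depends on the JSON library. With clojure.data.json they could be passed as the :key-fn option to json/read-str and json/write-str respectively.

```clojure
(require '[clojure.string :as str])

;; Hypothetical name: JSON object key (string) -> kebab-case keyword, for parsing.
(defn json-key->kebab-keyword [s]
  (keyword (str/replace s "_" "-")))

;; Hypothetical name: kebab-case keyword -> snake_case string, for writing.
(defn kebab-keyword->json-key [k]
  (str/replace (name k) "-" "_"))

(json-key->kebab-keyword "stop_reason") ;; => :stop-reason
(kebab-keyword->json-key :stop-reason)  ;; => "stop_reason"
```

With keys already converted by the parser, the Malli transformers only have to deal with values, so no transformer renames a map key out from under a child schema.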
Ok, thank you very much for your help, that was quite enlightening. I'll buy you a beer if I see you next week!
I'm skipping Heart of Clojure but there will be a few others from Metosin
I'm sure there will be beer anyway - thanks again!
Correction: it's not json-transformer that changes stop-reason -> stop_reason and stops the enum transformer from matching; the ->kebab :map transformer seems to run before the :enum transformer
{"type":"first_message","reason":"foo_bar","delta":{"stop_reason":"end_turn"}}
decode enum :foo_bar
decode enum :end_turn
valid true
encode map {:type :first-message, :reason :foo-bar, :delta {:stop-reason :end-turn}}
encode enum foo-bar
encode map {:stop-reason :end-turn}
{"type":"first_message","reason":"foo-bar","delta":{"stop_reason":"end-turn"}}
I would have thought the enum encoder fn runs before map
https://gist.github.com/Deraen/72dd1da901671272dc698012b851c210
this seems to work
you can use :compile to make :map encoder run after :enum
I think I'm going to remove the :map encoder altogether, and just convert the keys at the JSON level. Like you say, that seems to make everything else work better.
Is there any way to debug the transformation process? This has been by far the hardest part of using Malli, for me, and when it doesn't work it's incredibly opaque trying to figure out what's not working and why. Are there any tips for figuring out why something is not working?
That sort of depends on the problem. What I did when debugging my own transformers was to test individual transformers against values before composing them. Can you provide more details?
I don't have specific cases at the moment, I've managed to get through them, but I was more interested in developing a better intuition for how they work. In particular, the fact that they're driven by the schema seems to make them quite brittle - if some values end up not being transformed for some reason, that then seems to stop other values being transformed in a cascade.
I'm making a list of things which were not obvious (and some of which are still not obvious) when getting started, in case that's helpful, it's still a WIP though.
Transforming is one of the hardest parts to figure out. What could be done:
1. change -intercepting so that it sees both the schema and the transformer
2. allow adding a listener (via alter-var-root or option)
3. ... e.g. allow capturing intermediate values and their context
poor man's solution (just values):
(defn -printing [f]
(fn [x]
(println "BEFORE:" x)
(let [res (f x)]
(println " AFTER:" res)
res)))
(defn -intercepting
([interceptor] (-intercepting interceptor nil))
([{:keys [enter leave]} f]
(some->> [leave f enter] (keep identity) (seq) (map -printing) (apply m/-comp))))
(alter-var-root
(var m/-intercepting)
(constantly -intercepting))
(m/decode
[:map
{:decode/math {:enter #(update % :x inc)
:leave #(update % :x (partial * 2))}}
[:x [int? {:decode/math {:enter (partial + 2)
:leave (partial * 3)}}]]]
{:x 1}
(mt/transformer {:name :math}))
;BEFORE: {:x 1}
; AFTER: {:x 2}
;BEFORE: {:x 2}
;BEFORE: 2
;BEFORE: 2
; AFTER: 4
;BEFORE: 4
; AFTER: 12
; AFTER: 12
; AFTER: {:x 12}
;BEFORE: {:x 12}
; AFTER: {:x 24}
;=> {:x 24}
could be tap>d etc. but without the schema and the transformer, it's not fully useful.
That looks interesting, thanks Tommi. I'll play around with that next time I'm having problems. I think one of the most mysterious parts is when particular transformers are applied (i.e. how the dispatch works), that looks like it would be helpful.
Ok, I've got a large project (at least for me. I'm guessing 3800 lines of code is more like "medium", but it's more code than I've ever written in a project before) that I wrote before I understood what spec/malli/etc were. And in that project, since it's a cljfx project, I keep getting errors to do with nils ending up places they shouldn't be. Would I be correct in thinking that Malli could help with that?
I think it could. If you pass data structures around, that end up missing keys in inappropriate places, you could add Malli (or spec)-based assertions in strategic places (where these are depends on your project) to assert that the required keys are present when required, that the values have the appropriate type/shape etc. Malli can help with verifying deep or complex data structures and give you humanized error messages, so you don't have to craft these by hand.
You should regard Malli as a handy tool (because strictly speaking you don't need it for this) to write runtime assertions, e.g. at the start of certain functions that do DB calls. Although you could add assertions everywhere, the art is to find a sweet spot of how many and where you write these assertions, cfr. diminishing returns.
If you want, you can go fancy and use a library to add input/output validation to functions, like snoop.
Ok, I do have some pretty complex data structures.
So how do I try to add Malli to an existing project? Any advice on that?
I'd start with writing some malli specs and play around with m/validate, m/explain in the repl. Once you get the hang of that, you can add these assertions in the code base.
(let [schema [:map
              [:a {:optional true} :int]
              [:b :string]]]
  (when-let [explanation (m/explain schema {:a 123})]
    (throw (ex-info (str "Invalid spec: " (me/humanize explanation)) {}))))
that should get you started
In short, I'm really confused by Malli and what it does, but it feels like it would solve a problem I keep having. Can you help me understand if it actually would help?