malli

cfleming 2024-09-09T10:58:17.546059Z

I'm receiving some types from an API, and I've written some transformers for the data. However, the main event type in the API is a :multi, and the field I dispatch on is being transformed. I can't get Malli to handle this. The spec looks like this:

(def RawMessageStreamEvent
  "A raw event in a message stream, which can be any of the defined event types."
  [:multi {:dispatch :type}
   [:message-start RawMessageStartEvent]
   [:message-delta RawMessageDeltaEvent]
... etc etc ...
I have two transformers:
(def ->kebab
  (mt/key-transformer
    {:decode (let [do-it #(clojure.string/replace % "_" "-")]
               (fn [x] (if (keyword? x)
                         (if-let [ns (namespace x)]
                           (keyword (do-it ns) (do-it (name x)))
                           (keyword (do-it (name x))))
                         x)))
     :encode (let [do-it #(clojure.string/replace % "-" "_")]
               (fn [x] (if (keyword? x)
                         (if-let [ns (namespace x)]
                           (keyword (do-it ns) (do-it (name x)))
                           (keyword (do-it (name x))))
                         x)))}))

(defn entry-transformer [key decode-fn encode-fn]
  (mt/transformer
    {:decoders
     {:map {:compile (fn [_ _]
                       (fn [x]
                         (if (contains? x key)
                           (update x key decode-fn)
                           x)))}}
     :encoders
     {:map {:compile (fn [_ _]
                       (fn [x]
                         (if (contains? x key)
                           (update x key encode-fn)
                           x)))}}}))
Which I compose like:
(mt/transformer
  ->kebab
  (entry-transformer :type
                     #(keyword (str/replace % "_" "-"))
                     #(str/replace (name %) "-" "_")))
The problem is that the :type field comes in like "message_start". If I dispatch on "message_start" in the spec, then the value gets transformed, and the coercion fails because the :type field is now :message-start. If I dispatch on :message-start, then the :type field is not transformed, and the coercion also fails. How should I handle this?

juhoteperi 2024-09-09T11:08:30.248199Z

I guess entry-transformer only affects the map schemas inside the :multi. I'm not sure there's an easy way to create a transformer that matches the :multi schema like this. I've usually used the string transformer for these cases: [:multi {:dispatch :type, :decode/string (fn [x] (update x :type decode))} ...] The decode property makes it easy to decide when the decoding happens.

cfleming 2024-09-09T11:15:09.225939Z

So something like this?

(def RawMessageStreamEvent
  "A raw event in a message stream, which can be any of the defined event types."
  [:multi {:dispatch :type
           :decode/string (fn [x] (update x :type #(keyword (str/replace % "_" "-"))))}
   [:message-start RawMessageStartEvent]
...

cfleming 2024-09-09T11:15:29.781149Z

That still doesn't seem to decode the type field for the dispatch.

juhoteperi 2024-09-09T11:20:14.540939Z

Have you enabled the string transformer? https://github.com/metosin/malli/blob/master/src/malli/transform.cljc#L421 (You could also use json-transformer / :json/decode if it makes more sense)

cfleming 2024-09-09T11:41:04.159199Z

Thanks for the pointers, I'm fiddling with those now...

cfleming 2024-09-09T11:59:28.617989Z

The combination of the :decode/string and the string and json transformers seems to have worked, thanks!

Ben Sless 2024-09-10T06:24:52.855379Z

String transformers are a superset of json transformers. Something is weird here.

juhoteperi 2024-09-10T07:30:27.401739Z

Yeah, it's either a :decode/string or :decode/json property with string-transformer enabled, OR :decode/json with json-transformer enabled (`json-transformer` won't inherit string transformations, so :decode/string wouldn't be used). But enabling both string- and json-transformer shouldn't be needed.

juhoteperi 2024-09-10T07:31:16.064519Z

If the API is using JSON/EDN/Transit, using json-transformer probably makes more sense (even though it includes some transformations that you won't need with EDN/Transit).

juhoteperi 2024-09-10T07:32:12.672029Z

You should be able to roll your own transformer which uses those/similar properties also, if needed. We might not have examples of that though.

cfleming 2024-09-10T07:33:20.672569Z

The API is using JSON (Anthropic's LLM API). I'll try to come up with a minimal repro tomorrow, at the moment I'm getting by with this:

(defn munge-keywords-for-output [data]
  ; fuck it, I need to get some work done...
  (walk/postwalk
    #(if (keyword? %)
       (transform-keyword snake-case %)
       %)
    data))
But it's clearly not ideal.
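
For reference, a self-contained sketch of that workaround. `transform-keyword` and `snake-case` are not shown in the thread, so the `snake-case-keyword` helper below is a hypothetical stand-in that only swaps hyphens for underscores in the keyword's namespace and name.

```clojure
(require '[clojure.string :as str]
         '[clojure.walk :as walk])

;; Hypothetical stand-in for (transform-keyword snake-case %):
;; replaces "-" with "_" in both the namespace and the name.
(defn snake-case-keyword [k]
  (let [f #(str/replace % "-" "_")]
    (if-let [ns (namespace k)]
      (keyword (f ns) (f (name k)))
      (keyword (f (name k))))))

;; Walks the whole structure and rewrites every keyword it finds,
;; map keys and map values alike. Strings are left untouched.
(defn munge-keywords-for-output [data]
  (walk/postwalk #(if (keyword? %) (snake-case-keyword %) %) data))

(munge-keywords-for-output {:type :first-message
                            :delta {:stop-reason :end-turn}})
;; => {:type :first_message, :delta {:stop_reason :end_turn}}
```

Because this ignores the schema entirely, it cannot distinguish keyword identifiers from any other keyword in the data, which is what makes it a blunt instrument.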

Ben Sless 2024-09-10T07:39:24.103159Z

Do you have it all specified?

cfleming 2024-09-10T07:53:31.771949Z

Here's a gist with a stripped-down example: https://gist.github.com/cursive-ide/af961a5c513adcb12bd75509813b1619

cfleming 2024-09-10T07:54:57.573389Z

As shown, testit outputs:

{"type":"first_message","reason":"foo_bar"}
{"type":"first-message","reason":"foo-bar"}
{"type":"second_message","value":0.0}
{"type":"second-message","value":0.0}

cfleming 2024-09-10T07:55:19.892069Z

So the ->kebab isn't being applied on the encode, for some reason.

cfleming 2024-09-10T07:57:21.245169Z

Also, I have string-transformer on the input, if I don't I get a failure because the value that is being conformed is: {:type "first_message", :reason "foo_bar"}, i.e. the entry values are not converted to keywords despite being spec'ed as such. However, I need to use json-transformer on the output, because if I use string-transformer there, the float value is coerced to a string.
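
The difference described here can be reproduced in isolation. This is a minimal sketch on JVM Clojure using only the built-in transformers; the schemas are single-value stand-ins, not the ones from the gist.

```clojure
(require '[malli.core :as m]
         '[malli.transform :as mt])

;; Decoding: json-transformer turns a string into a keyword
;; when the schema asks for a keyword.
(m/decode :keyword "first_message" (mt/json-transformer))
;; => :first_message

;; Encoding a double: string-transformer stringifies it,
;; json-transformer leaves the number alone.
(m/encode :double 0.0 (mt/string-transformer))
;; => "0.0"
(m/encode :double 0.0 (mt/json-transformer))
;; => 0.0
```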

cfleming 2024-09-10T07:59:58.528409Z

I'm very new to Malli, so my understanding of how this works is sketchy at best, but despite having tried to read the doc to sort these issues out, it's been hard to make it work.

cfleming 2024-09-10T08:04:44.384299Z

Ok, based on @juhoteperi's comment above, I've made some progress, if I switch to decode/json instead of decode/string for the type field, then I can use json-transformer on the input too. It's definitely not clear from the doc that those annotations are related to the corresponding built-in transformers, but it makes sense now I know that.
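
A stripped-down sketch of that setup, assuming a two-branch :multi. The `Event` schema and the `decode-type` helper are made up for illustration and are not the real API schemas or the code from the gist.

```clojure
(require '[clojure.string :as str]
         '[malli.core :as m]
         '[malli.transform :as mt])

;; Hypothetical helper: turns "message_delta" into :message-delta
;; so that the :multi can dispatch on the keyword.
(defn decode-type [x]
  (if (and (map? x) (string? (:type x)))
    (update x :type #(keyword (str/replace % "_" "-")))
    x))

(def Event
  [:multi {:dispatch :type
           ;; picked up by json-transformer because of the /json suffix
           :decode/json decode-type}
   [:message-start [:map [:type [:= :message-start]]]]
   [:message-delta [:map [:type [:= :message-delta]] [:text :string]]]])

(def decoded
  (m/decode Event
            {:type "message_delta" :text "hi"}
            (mt/json-transformer)))
;; decoded => {:type :message-delta, :text "hi"}

(m/validate Event decoded)
;; => true
```

The property on the :multi itself runs before the dispatch function looks at :type, which is why it succeeds where a transformer keyed on :map does not.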

cfleming 2024-09-10T08:08:05.541959Z

But I would be very interested in knowing why the ->kebab transformer isn't applied on the encoding.

juhoteperi 2024-09-10T08:10:26.413719Z

Maybe = doesn't match enum or any of those

Ben Sless 2024-09-10T08:10:37.695809Z

transforming the inspected values under :multi - my only weakness

juhoteperi 2024-09-10T08:11:11.159149Z

Try adding encoder for :=

juhoteperi 2024-09-10T08:11:34.290329Z

https://github.com/metosin/malli/blob/master/src/malli/transform.cljc#L274 the built-in transformers have it separate from enum also

cfleming 2024-09-10T08:17:04.448919Z

Ok, adding one for := gets me closer, weirdly it seems to get a string not a keyword, so := #(clojure.string/replace % "-" "_"). That works for the type fields, but the enums are still not being transformed.

cfleming 2024-09-10T08:17:58.595499Z

Oh wait, the enums also receive a string, so :enum #(if (string? %) (clojure.string/replace % "-" "_") %) fixes that, too.

cfleming 2024-09-10T08:21:58.154449Z

Ugh, but then other values which are actually strings also get munged (in the real data). So "claude-3-5-sonnet-20240620" gets converted to "claude_3_5_sonnet_20240620". That one is a string enum. Can I use the schema type somehow in the encoder?

cfleming 2024-09-10T08:25:11.977729Z

I thought :encoder/json identity might help there, but it still gets translated.

cfleming 2024-09-10T08:40:57.835239Z

I updated the gist, the config there works except for the string enums.

juhoteperi 2024-09-10T08:40:59.123709Z

:encode/json

cfleming 2024-09-10T08:42:11.142659Z

Arg. Unfortunately, it still doesn't work.

cfleming 2024-09-10T08:43:01.732729Z

When composing transformers, is it expected to use the same transformer chain in the same order for encoding and decoding?

juhoteperi 2024-09-10T09:03:42.580809Z

same transformer chain, encoders work on interceptor enter and decoders on leave

juhoteperi 2024-09-10T09:09:09.796759Z

On the gist, keyword or = encoder runs before it gets to encode-type (if you add that to the :multi schema props)

cfleming 2024-09-10T09:10:51.172809Z

I've added another case to the gist. The reason enum at the top level is correctly mapped, but the stop-reason one embedded inside delta is not.

cfleming 2024-09-10T09:11:01.915179Z

Ok, I'll try that.

juhoteperi 2024-09-10T09:11:31.007489Z

I've probably usually handled the map-key transformations on json decoding/encoding part, with Jsonista options. It is the most performant option. But if you want to do it with malli... I think having :map handler on the ->kebab transformer would be enough

juhoteperi 2024-09-10T09:12:10.950369Z

Or I guess you also want to change values to snake-case, then this is ok

cfleming 2024-09-10T09:12:21.725289Z

The map key part actually works ok, it's the map values that are the problem.

cfleming 2024-09-10T09:13:23.902229Z

But I already do have a :map handler on the ->kebab transformer.

cfleming 2024-09-10T09:15:46.052989Z

encode-type is actually not used in that gist, I had that there from when I was trying to add it manually to some fields.

juhoteperi 2024-09-10T09:22:23.377709Z

One of your decoders returns STRING values, e.g. for :reason, though schema is enum of keyword for them. Then you call encoder with non-conforming value, so the encoding doesn't run correctly

juhoteperi 2024-09-10T09:23:11.292919Z

or hmm. checking m/validate does say decoded value is valid.

cfleming 2024-09-10T09:23:29.975529Z

All the decoders should return keywords.

juhoteperi 2024-09-10T09:24:08.952349Z

(defn round-trip [msg-json]
  (let [decoded (m/coerce Message
                          (json/read-str msg-json :key-fn keyword)
                          message-input-transformer)
        _ (println "decoded" decoded)
        valid? (m/validate Message decoded)
        _ (println "valid" valid?)
        encoded (m/encode Message
                          decoded
                          message-output-transformer)
        _ (println "encoded" encoded)]
    (json/write-str encoded)))


decoded {:type :second-message, :value 0.0, :reason foo-bar}
valid true
encoded {:type second_message, :value 0.0, :reason foo_bar}

juhoteperi 2024-09-10T09:25:11.603199Z

Oh reason is enum of strings so this is correct

cfleming 2024-09-10T09:26:17.109919Z

Yes, I have two cases of enums in this API - they have some which are identifiers (e.g. for stop reasons), which I would like to have as keywords since that makes sense. But they also have enums of strings, e.g. for model names, which need to be preserved.

cfleming 2024-09-10T09:26:36.738189Z

I mean, I could just make the model name a string and not validate it, that wouldn't be terrible.

cfleming 2024-09-10T09:29:04.410339Z

But the other weird case is stop-reason, which is inside delta. That looks the same as reason at the top level in the same object, but one is correctly encoded and the other is not.

juhoteperi 2024-09-10T09:32:28.159139Z

your enum encoder fn has keyword call instead of keyword?

cfleming 2024-09-10T09:33:19.818589Z

Arg, that will do it.

juhoteperi 2024-09-10T09:36:06.993979Z

I added println to enum encoder and it seems like it isn't even called for the stop_reason

cfleming 2024-09-10T09:37:30.747069Z

And it looks like that actually receives strings anyway. So in fact the erroneous keyword check was doing nothing anyway, it always returned true because it's always passed a string, just :enum #(clojure.string/replace % "-" "_") works fine. But the string enums are still incorrectly encoded, and stop_reason is still skipped, right.

cfleming 2024-09-10T09:38:30.078149Z

If it's not called, I wonder if a higher level schema is not validating and that's causing that to be skipped.

juhoteperi 2024-09-10T09:38:53.701009Z

Or is it that the json transformer is running first

juhoteperi 2024-09-10T09:39:20.337039Z

Probably, and json transformer encoder fn will take care of doing keyword -> string conversion so the second transformer sees the values as strings already

juhoteperi 2024-09-10T09:39:34.788849Z

but hmm, why aren't the encoder fns even called

cfleming 2024-09-10T09:40:23.252829Z

I don't understand why the :enum and := encoders see strings, but the :keyword one sees keywords.

juhoteperi 2024-09-10T09:42:01.547659Z

decode keyword sees keywords because json-transformer already is doing that transformation first

juhoteperi 2024-09-10T09:42:57.335289Z

decode enum for FirstMessage :reason also sees keywords for same reason

cfleming 2024-09-10T09:43:00.113909Z

Actually, in json-transformer they both have:

:enum {:compile (-infer-child-compiler :decode)}
   := {:compile (-infer-child-compiler :decode)}
And if I understand correctly the child compilers will convert to strings. So right, in those two cases, json-transformer will convert them to strings.

cfleming 2024-09-10T09:43:20.449439Z

(for encoding, that is)

cfleming 2024-09-10T09:45:25.540719Z

Right, I understand how that works a little better now.

cfleming 2024-09-10T09:47:16.628379Z

So by the time ->kebab is called, all enums are strings on encoding. I can't tell them apart unless I can access the schema in the encoder. It looks like I would have to use a :compile encoder, is that right?

juhoteperi 2024-09-10T09:48:50.479319Z

I don't really understand that part, but yes, seems like you would be able to access the schema and possibly even control which interceptor stage those run

juhoteperi 2024-09-10T09:49:36.497739Z

I still don't understand why enum encoder isn't called for FirstMessage :delta :stop-reason :enum

juhoteperi 2024-09-10T09:50:23.350959Z

Is there something that stops the encoding process once the value is already encoded once

cfleming 2024-09-10T09:57:20.940579Z

Ok! I can handle the string/keyword enums doing this:

{:enum    {:compile
                (fn [schema _]
                  (if-let [type (some-> schema
                                        (m/children)
                                        (m/-infer))]
                    (if (= type :keyword)
                      #(clojure.string/replace % "-" "_")
                      identity)
                    identity))}

cfleming 2024-09-10T09:58:08.192209Z

The only remaining mystery is the stop-reason one.

juhoteperi 2024-09-10T10:00:22.494789Z

Ha! After the json-transformer the map key is transformed to stop_reason -> the ->kebab transformer doesn't then see the stop-reason key.

cfleming 2024-09-10T10:00:55.813399Z

Ohhhh

juhoteperi 2024-09-10T10:01:01.858719Z

I'm not sure what would be a correct way to handle the case where you combine transformers and the first one changes map keys

cfleming 2024-09-10T10:01:25.008009Z

So the map key is a string by then?

juhoteperi 2024-09-10T10:01:53.762299Z

not sure if string/keyword matters here, but the value is different

juhoteperi 2024-09-10T10:03:06.154839Z

It works if you rename the key to stopreason

cfleming 2024-09-10T10:03:10.894189Z

So because the value is different, the encoder is no longer called because the key no longer matches the schema, is that right?

juhoteperi 2024-09-10T10:03:15.297099Z

so there is no - _ difference

juhoteperi 2024-09-10T10:03:20.800449Z

yeah

cfleming 2024-09-10T10:03:54.595479Z

Surely I can't be the only person in the world wanting to convert snake to kebab when calling an API 🙂

juhoteperi 2024-09-10T10:04:56.898999Z

Well, I have experience with several solutions to this, also from before Malli. We've had the most success when avoiding this and just using whatever naming the external system or API definition needs. It doesn't look too bad to just have :foo_bar or :fooBar in the Clojure code.

juhoteperi 2024-09-10T10:05:19.762219Z

But we have also done this at JSON level or JDBC options level for map keys.

cfleming 2024-09-10T10:05:22.557189Z

Yeah, I can see that being easiest and most robust.

juhoteperi 2024-09-10T10:05:28.158209Z

But very rarely for values I think.

juhoteperi 2024-09-10T10:05:46.111029Z

We might coerce values to keyword etc, but I don't think we've done letter-case type changes

cfleming 2024-09-10T10:06:20.656939Z

Yes, keys are easy, I can just use key-fn or whatever jsonista does for that.

juhoteperi 2024-09-10T10:07:09.422959Z

One option is to drop json-transformer and extend your own transformer to handle parts of what json-transformer would do which you need

cfleming 2024-09-10T10:07:26.990239Z

Ok. I will think about my best solution, but I suspect it will be just transforming the keys at the json level, and using snake case.

cfleming 2024-09-10T10:07:38.710819Z

Right, I was also thinking that - I have a very limited set of types.

juhoteperi 2024-09-10T10:08:13.001649Z

If you handle keys in json level, Malli might work better for values also, as the transformers won't need to change map keys then

cfleming 2024-09-10T10:08:28.947209Z

That's also true, yeah.

cfleming 2024-09-10T10:09:17.663199Z

Ok, thank you very much for your help, that was quite enlightening. I'll buy you a beer if I see you next week!

juhoteperi 2024-09-10T10:11:48.077649Z

I'm skipping Heart of Clojure but there will be a few others from Metosin 🙂

cfleming 2024-09-10T10:12:38.161279Z

I'm sure there will be beer anyway - thanks again!

juhoteperi 2024-09-10T10:15:38.813149Z

Correction: it isn't json-transformer changing stop-reason -> stop_reason that makes the enum transformer not match; rather, the ->kebab :map transformer seems to run before the :enum transformer

juhoteperi 2024-09-10T10:15:53.749549Z

{"type":"first_message","reason":"foo_bar","delta":{"stop_reason":"end_turn"}}
decode enum :foo_bar
decode enum :end_turn
valid true
encode map {:type :first-message, :reason :foo-bar, :delta {:stop-reason :end-turn}}
encode enum foo-bar
encode map {:stop-reason :end-turn}
{"type":"first_message","reason":"foo-bar","delta":{"stop_reason":"end-turn"}}

juhoteperi 2024-09-10T10:16:11.685819Z

I would have thought the enum encoder fn runs before map

juhoteperi 2024-09-10T10:19:23.497669Z

this seems to work

juhoteperi 2024-09-10T10:19:36.252179Z

you can use :compile to make :map encoder run after :enum

cfleming 2024-09-10T10:21:20.055739Z

I think I'm going to remove the :map encoder altogether, and just convert the keys at the JSON level. Like you say, that seems to make everything else work better.
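
A minimal sketch of converting keys at the JSON level, assuming clojure.data.json (the library used in the round-trip snippet earlier in the thread). The two key functions are hypothetical helpers, not part of any library.

```clojure
(require '[clojure.data.json :as json]
         '[clojure.string :as str])

;; Hypothetical helpers: data.json passes the raw string key to :key-fn
;; when reading, and expects a string back from :key-fn when writing.
(defn snake->kebab-key [k] (keyword (str/replace k "_" "-")))
(defn kebab->snake-key [k] (str/replace (name k) "-" "_"))

(json/read-str "{\"stop_reason\":\"end_turn\"}" :key-fn snake->kebab-key)
;; => {:stop-reason "end_turn"}

(json/write-str {:stop-reason "end_turn"} :key-fn kebab->snake-key)
;; => "{\"stop_reason\":\"end_turn\"}"
```

With the keys already in kebab-case before Malli sees the data, the transformers only have to deal with values, so no transformer renames a key out from under another one.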

👍 1

cfleming 2024-09-09T23:16:55.436959Z

Is there any way to debug the transformation process? This has been by far the hardest part of using Malli, for me, and when it doesn't work it's incredibly opaque trying to figure out what's not working and why. Are there any tips for figuring out why something is not working?

Ben Sless 2024-09-10T03:11:20.258549Z

That sort of depends on the problem. What I did when trying to debug my own transformers was test individual transformers against values before composing them. Can you provide more details?
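
One way to do that, sketched under assumptions: `kebab-keys` is a cut-down, hypothetical version of a key-renaming transformer, and `Delta` is a made-up one-field schema. Each direction is run on its own against a tiny value before anything else is added to the chain.

```clojure
(require '[clojure.string :as str]
         '[malli.core :as m]
         '[malli.transform :as mt])

;; Hypothetical helper: returns a fn that renames keyword keys.
(defn rename-key [from to]
  (fn [k]
    (if (keyword? k)
      (keyword (str/replace (name k) from to))
      k)))

(def kebab-keys
  (mt/key-transformer {:decode (rename-key "_" "-")
                       :encode (rename-key "-" "_")}))

(def Delta [:map [:stop-reason :string]])

;; Check each direction separately, with only this transformer in play.
(m/decode Delta {:stop_reason "end_turn"} kebab-keys)
;; => {:stop-reason "end_turn"}
(m/encode Delta {:stop-reason "end_turn"} kebab-keys)
;; => {:stop_reason "end_turn"}
```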

cfleming 2024-09-10T05:39:22.390239Z

I don't have specific cases at the moment, I've managed to get through them, but I was more interested in developing a better intuition for how they work. In particular, the fact that they're driven by the schema seems to make them quite brittle - if some values end up not being transformed for some reason, that then seems to stop other values being transformed in a cascade.

cfleming 2024-09-10T05:40:10.719749Z

I'm making a list of things which were not obvious (and some of which are still not obvious) when getting started, in case that's helpful, it's still a WIP though.

ikitommi 2024-09-10T05:40:11.250419Z

Transforming is one of the hardest parts to figure out. What could be done: 1. change -intercepting so that it sees both the schema and the transformer 2. allow adding a listener (via alter-var-root or option) 3. … e.g. allow capturing intermediate values and their context

ikitommi 2024-09-10T05:40:23.684239Z

poor man's solution (just values):

(defn -printing [f]
  (fn [x]
    (println "BEFORE:" x)
    (let [res (f x)]
      (println " AFTER:" res)
      res)))

(defn -intercepting
  ([interceptor] (-intercepting interceptor nil))
  ([{:keys [enter leave]} f]
   (some->> [leave f enter] (keep identity) (seq) (map -printing) (apply m/-comp))))

(alter-var-root
 (var m/-intercepting)
 (constantly -intercepting))

(m/decode
 [:map
  {:decode/math {:enter #(update % :x inc)
                 :leave #(update % :x (partial * 2))}}
  [:x [int? {:decode/math {:enter (partial + 2)
                           :leave (partial * 3)}}]]]
 {:x 1}
 (mt/transformer {:name :math}))
;BEFORE: {:x 1}
; AFTER: {:x 2}
;BEFORE: {:x 2}
;BEFORE: 2
;BEFORE: 2
; AFTER: 4
;BEFORE: 4
; AFTER: 12
; AFTER: 12
; AFTER: {:x 12}
;BEFORE: {:x 12}
; AFTER: {:x 24}
;=> {:x 24}

ikitommi 2024-09-10T05:41:19.952049Z

could be tap>d etc. but without schema and the transformer, it's not fully useful.

cfleming 2024-09-10T05:43:06.055199Z

That looks interesting, thanks Tommi. I'll play around with that next time I'm having problems. I think one of the most mysterious parts is when particular transformers are applied (i.e. how the dispatch works), that looks like it would be helpful.

👍 1

Jonathan Bennett 2024-09-09T02:59:34.953479Z

Ok, I've got a large project (at least for me. I'm guessing 3800 lines of code is more like "medium", but it's more code than I've ever written in a project before) that I wrote before I understood what spec/malli/etc were. And in that project, since it's a cljfx project, I keep getting errors to do with nils ending up places they shouldn't be. Would I be correct in thinking that Malli could help with that?

Thomas Moerman 2024-09-09T07:49:06.307299Z

I think it could. If you pass data structures around, that end up missing keys in inappropriate places, you could add Malli (or spec)-based assertions in strategic places (where these are depends on your project) to assert that the required keys are present when required, that the values have the appropriate type/shape etc. Malli can help with verifying deep or complex data structures and give you humanized error messages, so you don't have to craft these by hand.

Thomas Moerman 2024-09-09T07:58:21.565749Z

You should regard Malli as a handy tool (because strictly speaking you don't need it for this) to write runtime assertions, e.g. at the start of certain functions that do DB calls. Although you could add assertions everywhere, the art is to find a sweet spot of how many and where you write these assertions, cfr. diminishing returns.

Thomas Moerman 2024-09-09T07:59:00.795979Z

If you want, you can go fancy and use a library to add input/output validation to functions, like snoop.

Jonathan Bennett 2024-09-09T11:52:37.541529Z

Ok, I do have some pretty complex data structures.

Jonathan Bennett 2024-09-09T11:53:01.964499Z

So how do I try to add Malli to an existing project? Any advice on that?

Thomas Moerman 2024-09-09T13:17:09.743179Z

I'd start with writing some malli specs and play around with m/validate, m/explain in the repl. Once you get the hang of that, you can add these assertions in the code base.

Thomas Moerman 2024-09-09T13:20:37.983129Z

(let [schema [:map
              [:a {:optional true} :int]
              [:b :string]]]
  (when-let [explanation (m/explain schema {:a 123})]
    (throw (ex-info (str "Invalid spec: " (me/humanize explanation)) {}))))

Thomas Moerman 2024-09-09T13:20:44.640499Z

that should get you started
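
Building on that snippet, a small reusable helper might look like the sketch below. `Unit` and `assert-valid!` are hypothetical names chosen for illustration; the point is that a nil where a string is required gets reported at the boundary instead of surfacing later.

```clojure
(require '[malli.core :as m]
         '[malli.error :as me])

;; Hypothetical schema for one of the project's data structures.
(def Unit
  [:map
   [:id :string]
   [:name :string]
   [:hp {:optional true} :int]])

;; Returns the value unchanged when it matches the schema, otherwise
;; throws with the humanized errors attached as ex-data.
(defn assert-valid! [schema value]
  (when-let [explanation (m/explain schema value)]
    (throw (ex-info "Value does not match schema"
                    {:errors (me/humanize explanation)})))
  value)

(assert-valid! Unit {:id "u1" :name "Scout"})
;; => {:id "u1", :name "Scout"}

;; A nil :name fails the check; the ex-data carries an entry for :name.
(try
  (assert-valid! Unit {:id "u1" :name nil})
  (catch clojure.lang.ExceptionInfo e
    (:errors (ex-data e))))
```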

👍 1

Jonathan Bennett 2024-09-09T03:00:05.822609Z

In short, I'm really confused by Malli and what it does, but it feels like it would solve a problem I keep having. Can you help me understand if it actually would help?