2025-01-07 malli | Clojure Slack Archive

malli 2025-01-07

2025-01-07T22:02:53.889749Z

Hello friends. I am looking for inspiration on how to solve a problem. The problem is identifying whether two schemas are "compatible", in the sense that for any value that is valid by Foo it will also be valid per Bar (but not necessarily in the other direction). Example:

(def Foo [:map [:a :int] [:b :string]])
(def Bar [:map [:a :int]])
(compatible? Foo Bar) ; => true, all Foos are valid Bars
(compatible? Bar Foo) ; => false, some Bars are not valid Foos

There are two main approaches I can think of to pursue: 1. Analyze the structure of the schema vector/keyword syntax and hand-write the compatibility logic at each node type. 2. Generate example values with malli.generator and check that all values generated from one schema are valid by the other schema. Is there any prior art for either approach? Approach #2 sounds fairly straightforward assuming all my output schemas are generable, though the results won't be 100% guaranteed to cover all the important possibilities. Approach #1 seems like a significant amount of effort, but as long as the schemas are constrained to a narrow slice (i.e. :map/`:sequential` collection types, common built-in leaf types, etc.) this seems doable and fairly reliable once finished.

opqdonut 2025-01-08T08:22:18.741879Z

for map schemas in particular, malli.util/merge will help you

opqdonut 2025-01-08T08:23:08.509099Z

you want to test something like (mu/equals Foo (mu/merge Foo Bar))

2025-01-08T16:54:57.032339Z

That will handle some cases, but there are some complicated cases that need a more sophisticated approach. Consider this example:

(let [foo [:map [:a :int]]
      bar [:map [:a :int] [:b {:optional true} :string]]]
  (mu/equals foo (mu/merge foo bar)))                       ; => false, should be true

✅ 1

2025-01-12T19:26:41.184809Z

This problem you're solving is called subtyping. Generative testing is a simple solution but won't always return the correct answer. If you really need a correct answer at any cost, you can translate the schemas to Typed Clojure types and use the subtyping algorithm there. I'm guessing this is overkill for you, but it would look like (subtype? (malli->type t1) (malli->type t2)).

😍 1

2025-01-12T19:28:30.712779Z

I think I've sketched out subtyping algorithms for spec and malli at some point also somewhere.

2025-01-12T19:28:56.637629Z

Probably would be a good contribution to malli itself tbh.

2025-01-12T19:32:10.489589Z

Here's the subtype algorithm for spec fwiw https://github.com/typedclojure/typedclojure/blob/main/typed/clj.spec/src/typed/clj/spec/subtype.clj

2025-01-12T19:32:33.436269Z

Do those subtype? and malli->type fns exist somewhere already or is this more hypothetical?

2025-01-12T19:32:51.233059Z

If you're interested I can find the real names, but yes they exist

2025-01-12T19:33:37.394509Z

I'm away from my computer today but it would be nice to have a breadcrumb to follow. I am definitely interested in this approach!

2025-01-12T19:33:55.264409Z

Alright I'll send you a demo

2025-01-12T19:35:01.937239Z

That subtype.clj looks like all the code I imagined I would need to write and was hoping someone else wrote it already 😅

😁 1

2025-01-12T21:00:37.749109Z

Added a test here https://github.com/typedclojure/typedclojure/blob/b9dfdba29091e70675828189372b9df676693420/typed/malli/test/typed_test/malli/schema_to_type.clj Should be able to run ./script/repl at the root to get an nrepl port to play with that file directly.

Piotr Roterski 2025-01-10T15:06:06.287349Z

hey @ben.grabow The other approach I could think of, is to embrace schemas as data and operate on diffs. Even though that might be just an expansion of your approach 1. and would probably require some fine-tunning on edge cases, I came up with the following:

(ns malli-compatible.core
  (:require [malli.core :as ma]
            [clojure.walk :as w]
            [clojure.data :as cd]))

(def optional?
  (ma/validator [:cat :any [:map
                            [:optional [:= true]]] :any]))

(defn compatible?
  [a b]
  (let [[_only-in-a only-in-b _in-both] (cd/diff a b)]
    (or (nil? only-in-b)
        (->> only-in-b
             (w/prewalk (fn [f]
                          (when-not (optional? f)
                            f)))
             (every? nil?)))))

^:rct/test
(comment
  (compatible? [:map [:a :int] [:b :string]]
               [:map [:a :int]])
  ;;=> true
  (compatible? [:map [:a :int]]
               [:map [:a :int] [:b :string]])
  ;;=> false

  (compatible? [:map [:a :int]]
               [:map [:a :int] [:b {:optional true} :string]])
  ;;=> true
  )

2025-01-10T16:17:03.766799Z

Hmm, that's an interesting approach! I think most of the complexity of this problem will be in the list of special cases and exceptions. Basically I think (when-not (optional? f) will need to turn into a big case/cond statement to cover many different situations. Consider:

(compatible? [:sequential :string] [:vector :string]) ; => should be false
(compatible? [:vector :string] [:sequential :string]) ; => should be true
(compatible? [:= :foo] [:enum :foo :bar]) ; => should be true
(compatible? [:enum :foo :bar] [:= :foo]) ; => should be false

Checking for map key membership is one small slice of the problem. I think this is solveable with some significant toil but I'd like to avoid all that toil if possible.

💯 1

2025-01-10T16:18:22.165059Z

For my short term I'm going with a generative approach and will probably punt on the analytical approach unless there is some battle-tested prior art that covers many of these cases already.

(let [generated-values (mg/sample producer)
      valid? (m/validator consumer)]
  (every? valid? generated-values))

🙌 1

2025-01-13T19:00:06.601229Z

I'm not sure if/when I'll be able to try out this analytical approach, but I greatly appreciate your work and your suggestion @ambrosebs. I will try to remember to post about my experience here if/when this becomes a priority for my company beyond the generative approach.

👍 1

2025-01-07T22:17:00.410229Z

Some favorable constraints that I am able to assume: • This check happens at configuration time, not runtime, so it's okay if it's a little slow. • My org is in control of all the schemas, so we can require that every schema has a reasonable generator. • It's okay to have some false negatives, e.g. (compatible? [:fn (constantly true)] [:fn (constantly true)]) ; => false is okay. We don't need to exhaustively compare opaque functions beyond what their generators and/or var identities can tell us.

Clojurians Log v2

malli 2025-01-07