Hello friends. I am looking for inspiration on how to solve a problem. The problem is identifying whether two schemas are "compatible", in the sense that for any value that is valid by Foo it will also be valid per Bar (but not necessarily in the other direction). Example:
(def Foo [:map [:a :int] [:b :string]])
(def Bar [:map [:a :int]])
(compatible? Foo Bar) ; => true, all Foos are valid Bars
(compatible? Bar Foo) ; => false, some Bars are not valid Foos
There are two main approaches I can think of to pursue:
1. Analyze the structure of the schema vector/keyword syntax and hand-write the compatibility logic at each node type.
2. Generate example values with malli.generator and check that all values generated from one schema are valid by the other schema.
Is there any prior art for either approach? Approach #2 sounds fairly straightforward assuming all my output schemas are generable, though the results won't be 100% guaranteed to cover all the important possibilities. Approach #1 seems like a significant amount of effort, but as long as the schemas are constrained to a narrow slice (i.e. :map/`:sequential` collection types, common built-in leaf types, etc.) this seems doable and fairly reliable once finished.for map schemas in particular, malli.util/merge will help you
you want to test something like (mu/equals Foo (mu/merge Foo Bar))
That will handle some cases, but there are some complicated cases that need a more sophisticated approach. Consider this example:
(let [foo [:map [:a :int]]
bar [:map [:a :int] [:b {:optional true} :string]]]
(mu/equals foo (mu/merge foo bar))) ; => false, should be trueThis problem you're solving is called subtyping. Generative testing is a simple solution but won't always return the correct answer. If you really need a correct answer at any cost, you can translate the schemas to Typed Clojure types and use the subtyping algorithm there. I'm guessing this is overkill for you, but it would look like (subtype? (malli->type t1) (malli->type t2)).
I think I've sketched out subtyping algorithms for spec and malli at some point also somewhere.
Probably would be a good contribution to malli itself tbh.
Here's the subtype algorithm for spec fwiw https://github.com/typedclojure/typedclojure/blob/main/typed/clj.spec/src/typed/clj/spec/subtype.clj
Do those subtype? and malli->type fns exist somewhere already or is this more hypothetical?
If you're interested I can find the real names, but yes they exist
I'm away from my computer today but it would be nice to have a breadcrumb to follow. I am definitely interested in this approach!
Alright I'll send you a demo
That subtype.clj looks like all the code I imagined I would need to write and was hoping someone else wrote it already 😅
Added a test here https://github.com/typedclojure/typedclojure/blob/b9dfdba29091e70675828189372b9df676693420/typed/malli/test/typed_test/malli/schema_to_type.clj
Should be able to run ./script/repl at the root to get an nrepl port to play with that file directly.
hey @ben.grabow The other approach I could think of, is to embrace schemas as data and operate on diffs. Even though that might be just an expansion of your approach 1. and would probably require some fine-tunning on edge cases, I came up with the following:
(ns malli-compatible.core
(:require [malli.core :as ma]
[clojure.walk :as w]
[clojure.data :as cd]))
(def optional?
(ma/validator [:cat :any [:map
[:optional [:= true]]] :any]))
(defn compatible?
[a b]
(let [[_only-in-a only-in-b _in-both] (cd/diff a b)]
(or (nil? only-in-b)
(->> only-in-b
(w/prewalk (fn [f]
(when-not (optional? f)
f)))
(every? nil?)))))
^:rct/test
(comment
(compatible? [:map [:a :int] [:b :string]]
[:map [:a :int]])
;;=> true
(compatible? [:map [:a :int]]
[:map [:a :int] [:b :string]])
;;=> false
(compatible? [:map [:a :int]]
[:map [:a :int] [:b {:optional true} :string]])
;;=> true
)Hmm, that's an interesting approach!
I think most of the complexity of this problem will be in the list of special cases and exceptions. Basically I think (when-not (optional? f) will need to turn into a big case/cond statement to cover many different situations. Consider:
(compatible? [:sequential :string] [:vector :string]) ; => should be false
(compatible? [:vector :string] [:sequential :string]) ; => should be true
(compatible? [:= :foo] [:enum :foo :bar]) ; => should be true
(compatible? [:enum :foo :bar] [:= :foo]) ; => should be false
Checking for map key membership is one small slice of the problem. I think this is solveable with some significant toil but I'd like to avoid all that toil if possible.For my short term I'm going with a generative approach and will probably punt on the analytical approach unless there is some battle-tested prior art that covers many of these cases already.
(let [generated-values (mg/sample producer)
valid? (m/validator consumer)]
(every? valid? generated-values))I'm not sure if/when I'll be able to try out this analytical approach, but I greatly appreciate your work and your suggestion @ambrosebs. I will try to remember to post about my experience here if/when this becomes a priority for my company beyond the generative approach.
Some favorable constraints that I am able to assume:
• This check happens at configuration time, not runtime, so it's okay if it's a little slow.
• My org is in control of all the schemas, so we can require that every schema has a reasonable generator.
• It's okay to have some false negatives, e.g. (compatible? [:fn (constantly true)] [:fn (constantly true)]) ; => false is okay. We don't need to exhaustively compare opaque functions beyond what their generators and/or var identities can tell us.