Some news from Metosin open source team: We want to ensure the continuity of malli maintenance and to that end I have been named as main responsible person. I will be triaging new issues and PRs and handling releasing for the most part.
has tommi stepped back from maintenance or is he no longer with metosin? either way, thank you for taking over, i'm glad it's not been abandoned
Tommi is still with Metosin. I would say this means we have assigned clearer responsibilities and allocated time for the maintenance tasks.
cool. thank you for the clarification
Wonderful news, Malli works so nicely/naturally it's one of those things that just feels like it's a part of clojure.core for me.
Alive and kicking, but -30% time for OS just now.
Releasing an alpha release for malli. We merged a PR that tightens up some previously unspecified behavior about transforming parsers. More information in the release notes and linked PR https://github.com/metosin/malli/releases/tag/0.20.0-alpha2
Clojars: https://clojars.org/metosin/malli/versions/0.20.0-alpha2
Looking forward to community feedback about the alpha. In our internal testing we have not noticed regressions.
@matti.uusitalo117 This change seems to break metosin/oksa. I've posted an issue to its repository for further details: https://github.com/metosin/oksa/issues/27
I've narrowed it down to (m/parser [:schema {:registry (oksa.parser/registry nil)} :oksa.parser/Arguments])
I'm guessing we need cycle detection like in malli.generators. I'll work on it.
Here's the inlined version that blows the stack:
(m/parser
 [:schema {:registry
           {::Name [:or :keyword :string]
            ::Value [:or
                     number?
                     :string
                     :boolean
                     :nil
                     :keyword
                     [:sequential ::Value]
                     [:map-of ::Name ::Value]]
            ::Arguments [:map-of ::Name ::Value]}}
  ::Arguments])
minimal failure:
(is (m/parser
     [:schema {:registry
               {::Value [:sequential [:ref ::Value]]}}
      ::Value]))
fixed it, both schemas are inferred as having simple parsers, and it resolves the oksa issue. I'll add a bunch of interesting unit tests with strange cycles and send a pr.
a nice abstraction emerged from using the same principle to detect cycles as generators, I'm sure it's useful in other places like tying the knot for validators. the problem is -validator doesn't take an opts arg 😕
here's the pr https://github.com/metosin/malli/pull/1234
Here's a sketch of my idea for tying the knot for :ref validators: https://github.com/metosin/malli/pull/1235
Should the :or schema also disallow multiple transforming parsers? Otherwise you'd get ambiguous parses like
(m/parse [:or
          [:orn [:x :int] [:y :boolean]]
          [:orn [:x :boolean] [:y :int]]]
         123)
;; => #malli.core.Tag{:key :x, :value 123}
The main problem with :and is its "flowing" behavior of the parsed value going from left to right. In [:and [:orn ..] S], S would receive the parsed value of :orn to parse, instead of the original value. This doesn't really make sense. OTOH :or is unambiguous in this respect. I think you bring up a good point that attempting to unparse this :or would be ambiguous.
If you were thinking about unparsing, could you find a concrete example of it breaking unparsing?
It may just be the nature of :or that it works poorly with unparsing. If you really cared about it, you'd use :orn today.
maybe there's a third schema we can think about [:orpos ..] which parses schemas using positional tag keys, or even [:or {:parse/mode :positional} ..] , the default being :transparent mode. IIRC this really crystallizes an important deviation with spec's design, that everything is tagged. This is the consequence AFAICT.
but we can't really change the format of :or's default parsed value without disruption. ditto restricting its parsing to only schemas that unambiguously unparse.
IMO. would love to hear your thoughts @qythium
yeah I can't think of a 'realistic' example offhand, the above is of course really artificial but it kinda makes me wary about the compositional soundness of the whole parsing system in the large
🤔
(let [schema [:or [:fn record?] [:orn [:i :int]]]]
  (->> 123
       (m/parse schema)
       (m/unparse schema)))
;; => #malli.core.Tag{:key :i, :value 123}
Yes, maybe there's something we can do about that at least in terms of removing the footgun. We could have an extensible linter to check if your schema is fully roundtrip-able. For :or that contain overlapping schemas, we return false. So if you're doing real work with parsing, you'd assert that check before you m/parse anything.
is it overly conservative on all regex schemas by design treating them as always transforming? I realise this can be handled with the :parse/transforming-child 0 prop
(m/parse [:and
          [:orn [:v vector?] [:l list?]]
          [:cat :keyword [:* :any]]]
         [:x])
;; => Execution error (ExceptionInfo) at malli.core/-exception (core.cljc:203).
;; :malli.core/and-schema-multiple-transforming-parsers
I don't think I gave much thought to regex schemas.
so no, that's open to suggestions
Which regexes are actually transforming?
Here's the linter I proposed, it detects your example using record? and seems pretty accurate https://github.com/frenchy64/malli/pull/38
It actually leverages -parser-info so that's a point to it being a good abstraction. I'm exploring using malli schemas to write Typed Clojure typing rules and I think I'm going to use this to force users to only use schemas with roundtripping parsers.
Yup, can double-confirm malli master now works against oksa. Thanks @ambrosebs!
yeah, apologies for somewhat derailing the conversation :P I haven't actually used m/parse in production (mostly validate/coerce) so the above was all theoretical edge-case finding, would be great if others with real experience could chime in.
I did find the behaviour of -parser-info somewhat unintuitive (how it appeared to 'drop information' from child schemas by returning nil) - but I guess that's an internal API detail users shouldn't have to interact with.
@qythium fair point on -parser-info. I was a bit vague on what nil means, the way I think about it is there's 3 values of (:simple-parser (-parser-info s)):
• true (non-transforming parser)
• false (always-transforming parser)
• nil (unknown)
I conflated the last two because I couldn't motivate distinguishing them.
hmm I just had a look at this, not quite convinced that it's worth the engineering effort to prevent such edge cases - after all I don't think Malli is trying to be a completely sound type system in the formal sense
eg here's another constructed example
(m/parse :keyword :malli.core/invalid)
;; => :malli.core/invalid
is that a passing or failing parse? Malli uses social namespacing conventions to say there's a 99.9% chance that users won't be manipulating this specific keyword in their domain, I think that's a reasonable tradeoff but it means that you sacrifice 'correctness' on such gotchas like
(m/parse [:tuple :int :keyword] [1 :malli.core/invalid])
;; => :malli.core/invalid
(m/unparse [:tuple :int :keyword] [1 :malli.core/invalid])
;; => :malli.core/invalid
re regex schemas, I think it's also reasonable to treat them as always-transforming, even the versions without -n because of what happens when they're nested:
(m/parse [:cat :int [:cat :int]]
[1 2])
;; => [1 [2]]
which.. if you squint a little lets you construct all kinds of non-injective parses which violate the roundtrip property:
(map (m/parser [:cat :int [:? [:maybe :int]] :int])
     [[1 2 3]
      [1 3]
      [1 nil 3]])
;; => ([1 2 3] [1 nil 3] [1 nil 3])
I do think this one's a bit more significant than the above one, returning nil as a Nothing sentinel feels a little iffy given how prevalent nils are in actual data (vs the m/invalid)
to be a little more realistic:
(def Hiccup ; (leaf node)
  [:cat :keyword [:? [:maybe [:map-of :keyword :any]]] [:* string?]])
(m/parse Hiccup [:div "edge" "case"])
;; => [:div nil ["edge" "case"]]
(let [v [:div "edge" "case"]]
  (= v (->> v (m/parse Hiccup) (m/unparse Hiccup))))
;; => false
Here's one that doesn't involve nils / maybes:
(let [s [:cat :keyword [:alt [:vector :any] [:+ :int]]]
      v [:k 1 2 3]
      roundtrip #(->> % (m/parse s) (m/unparse s))]
  (list v (roundtrip v)))
;; => ([:k 1 2 3] [:k [1 2 3]])
Parsing regexes is weird! :? especially seems either broken or needing a tagged counterpart.
(m/parse :keyword :malli.core/invalid)
;; => :malli.core/invalid
I guess this one still roundtrips tho? To distinguish the two cases you'd use a singleton :orn I think.
Oh, did you conclude this was actually a failing parse?
yeah check out the following tuple thing, the impl basically treats any sub-parser return value of m/invalid as a monadic failure and threads that through to the top
Since there's no subparser in this case, is this a successful parse?
Is the :+ unparser broken? Why doesn't it unwrap the vector?
or is that because of :alt?
that's a useful feature actually - it means you can destructure the parsed value according to the nested structure of the defining schema, rather than the flat input
IIUC that's the parser, I was confused that the unparser wasn't firing, but then I realized you constructed the :alt such that it used :vector for the unparser.
yeah exactly, messing around with the asymmetry
different clauses being activated on parse vs unparse
does that mean :alt has similar rules to :or in terms of roundtripping?
hmm it seems strange to me to use :alt outside of an enclosing :cat, although I guess that's totally valid - really haven't used these regex schemas much in practice. if you really wanted guaranteed roundtrippability then probably the thing to do is use the -n versions, though that means desugaring [:? s] into something like [:altn [:zero [:cat]] [:one [:cat s]]] and similarly for [:+ s] into [:catn [:head s] [:tail [:* s]]]
seems analogous to the situation with :or/:orn. I like the tip about :? tho, might be useful in the readme about general tips for robust roundtripping.
I do wonder what the practical upshot of guaranteeing 'true' roundtrippability is besides satisfying oneself of the unp ∘ p ≅ id 'law' - in the hiccup example above the output is actually in a sense isomorphic / 'normalized' version of the original in the sense of what it denotes? Agree that the regex ones might need a bit of closer look though, I can imagine some variation of my examples above making it into a real world bug in a much subtler / hard to trace form
Well the original problem was to make -parser-info more precise for as many schemas as possible so the new parser for :and causes minimal disruption.
yeah my gut feeling is that you'd have to do a ground-up breaking change to solve this in fuller generality, in malli idiom something like (defprotocol Parsed (-ptype [_]) (-pval [_])), and then a (parse :int 5) would return something like (reify Parsed (-ptype [_] :int) (-pval [_] 5)) - though then the ergonomics get a lot worse
can regex parsers be unparsed? don't they nest?
> ground-up breaking change
FWIW the whole point behind my linter idea is that we can't fix this. We can only try and detect if the subset of schemas we're using roundtrip. That's a useful idea to me, sometimes it's mission-critical. Most of the time, you never unparse.
One of the really disappointing things about spec for me is that you can't use it to manipulate programs using associative structures in practice, since you always lose some information during unconforming. This is one usecase for thinking about this problem systematically. You could then go even further, by preserving metadata.
> can regex parsers be unparsed? don't they nest?
I think they do unparse, the examples above are just edge cases where the unparsing logic has been hijacked. like in [:alt A B] you parse with B then unparse with A.
ah yeah, looking at the code, there's X-unparser for all of them
the implicit lesson here is always use tagged schemas for parsing if you intend to unparse.
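A minimal sketch of the roundtrip check discussed above, using only m/parse and m/unparse. The roundtrips? helper is a hypothetical name, not part of malli, and it treats a top-level :malli.core/invalid as a failed parse, which is itself one of the ambiguities raised earlier in the thread.

```clojure
(require '[malli.core :as m])

(defn roundtrips?
  "Hypothetical helper (not in malli): true when v parses and
  unparsing the parsed value gives back exactly v."
  [schema v]
  (let [parsed (m/parse schema v)]
    (and (not= :malli.core/invalid parsed)
         (= v (m/unparse schema parsed)))))

;; The untagged Hiccup schema from the thread.
(def Hiccup
  [:cat :keyword [:? [:maybe [:map-of :keyword :any]]] [:* string?]])

;; Reported in the thread as not roundtripping.
(roundtrips? Hiccup [:div "edge" "case"])

;; A tagged alternative keeps track of which branch matched.
(roundtrips? [:orn [:i :int] [:s :string]] 123)
```

Asserting a check like this in tests for the specific values you care about is a cheap stand-in until something like the proposed linter exists.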
hmm I've never really used spec conform/unform in practice, could you give an example of what that 'information loss' looks like? I suppose you're thinking in terms of using it as bidirectional lens for updating some nested part of a structure - curious actually where unparse/unform gets actually used in real world scenarios
Something like a set losing its sortedness or a list turning into a vector. Some information is just lost. It's been probably a decade since I really tried it, but you can find historical reports of people using the core specs for destructuring etc to manipulate syntax and not having success.
My frustration was more that the general idea of robust roundtripping was not as big a focus as I'd hoped.
update-in is so much nicer than a huge positional destructure with an apply list...
I think -parser-info opens up a lot of doors for tooling in this direction, that just isn't there in spec. Especially if the community extends it to their own schemas. So while Malli roundtripping is much less robust, I have hope that this problem can be addressed, even outside of malli itself, for people who want it.
(that being said, I don't know if -parser-info is really needed in spec because IIRC it prefers tags for conforming)
In terms of action items, I think we could improve the inference of the transforming spec for schemas like [:and [:alt ..] ..] and [:and [:cat ..] ..] in some cases. People do get :alt mixed up with :or and the :cat might be useful in the real world.
I think there's one in the readme like [:and [:cat ..] vector?] for improving the generator.
yeah I think it would be nice if :cat took a property to say whether it should match against lists/vectors/other(?) , the current way of saying [:and .. vector?] (and then having to manually add a :gen/fmap vec for generative testing) always felt like a bit of a kludge
Yes I have a solution for that in my long long backlog. Maybe next year.
But your solution is much more straightforward 🙂
What's a good name for such a property?
feeling something like :gen/into []
Could also create a :catv schema.
I can't think of any other base types you'd need. I think :cat uses seqs by default.
it's something I was planning to put a bit more thought into as well - wondering if it could be unified with another friction point I've noticed with [:multi {:dispatch first}] - or any other partial predicate - there doesn't seem to be any clean way of telling it to only match sequential things besides [:and sequential? :multi..] and even then it would throw an error on m/explain - thinking aloud, both seem to point to a sort of :pre-like invariant, like a subschema that does double duty as a precondition for validating/parsing and post-transformation for unparsing/generating?
seems relevant https://github.com/metosin/malli?tab=readme-ov-file#distributive-schemas
should :and distribute over :multi?
Oh are you worried about guarding the dispatch function from getting bad input?
yeah, for example
(m/explain [:and vector? [:multi {:dispatch first}
                          [:k [:tuple :keyword :any]]]]
           :oops)
;; => Execution error (IllegalArgumentException) at malli.core/-multi-schema$reify$reify$fn (core.cljc:1870).
;; Don't know how to create ISeq from: clojure.lang.Keyword
(although this is maybe getting a bit off-topic from the original thread)
nah I appreciate wanting to get maximum value from new abstractions!
I bet that could be automatically generated by finding the commonalities between all branches.
brings up a lot of questions tho.
I think this :multi generates just fine, it's probably on the user to have the dispatch function cover all vals (I'm surprised :multi doesn't wrap a catch implicitly). So yeah maybe a different axis of improvement than :catv.
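One way to make the dispatch function total today, as a sketch: guard it so non-sequential input dispatches to nil, which matches no branch. The Tagged name is made up for illustration, and this assumes the schema is not being serialized (the dispatch is a plain function).

```clojure
(require '[malli.core :as m])

;; Hypothetical schema name. The dispatch fn returns nil for
;; non-sequential input instead of calling first on it.
(def Tagged
  [:multi {:dispatch (fn [x] (when (sequential? x) (first x)))}
   [:k [:tuple :keyword :any]]])

(m/validate Tagged [:k 1])   ; dispatches to :k
(m/validate Tagged :oops)    ; no matching branch
(m/explain Tagged :oops)     ; returns an explanation rather than throwing
```

This moves the precondition into the dispatch function rather than an enclosing :and, so it does not address the generator or :catv side of the discussion.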
I think :catv has other uses too, it's basically :tuple that supports regexes.
not a great name tho since it's not a regex. should be more like :tuple-regex or :vcat
i think :variant is a nice term for it - maybe a bit specific though I'm thinking of the tagged-variant idiom for sum types that malli doesn't seem to have a nice way of speccing, probably something that would live in malli.util though edit: will flesh that last bit out and start a new thread - maybe there's just something I'm missing
ok so I think simply reusing the same logic of :or in :alt's -parser-info is the lowest hanging fruit from this discussion.
and perhaps :tuple's in :cat.
and leave :? :+ and :* as transforming.
ah but only top-level regexes?
it's a bit different though, I think alt and cat are non-transforming only if their children are all non-regex schemas - see the [:cat [:cat ]] example
Great that still sounds achievable to implement.
(don't take my word for it though haha, I've taken a look at the regex impl and have no idea how its crazy CPS stuff works)
they wrote a great blog post about it here: https://www.metosin.fi/blog/malli-regex-schemas
summarized https://github.com/metosin/malli/issues/1230
yep even after reading that blog - the actual productionized/optimized impl turned out quite a bit different
I submitted a bugfix to the regex impl and I still have no idea how it works.
https://github.com/metosin/malli/commit/50e196109f2120cd42f7d92b7797c64b98946e1f
I'm sure there was a fleeting moment when I thought I understood.
Reading through these comments in the context of conversation about the alpha, this is mostly "yes and we might want to look at these things in the future" comments? I didn't see "no that's breaking my workflows" or "no that will be bad going towards the future" comments.
Vocabulary question: for the following map:
(def ex-schema
  (m/schema
   [:map
    [:id [:int {:min 1}]]
    [:name :string]
    [:email {:optional true} email?]
    [:phone {:optional true} phone-number?]]))
Is there any terminology you would use to differentiate between the properties of the schema which is the value of the :id child ({:min 1}), and the properties of each of the children of the :map schema ({:optional true})?
I want to call both the :min 1 and :optional true "child properties", but that's ambiguous
Those children are nested at different levels.
The :map has entry schemas like :id.
The entry schema has a key, props, and value schema part.
The value schema then can have its own props.
So :map contains the :id entry schema which contains the :int value schema.
Not sure if that helps.
To my knowledge we just refer to them as properties, but I see your point. In the :int spec we have a property that defines details about the value. In the case of :optional key it controls if the key is required in the map. So there is a difference
(map) entry properties
I appreciate the clarifications, everyone.
Sticking to the calling the map's children "entries" helps, as I think calling them "entry schema" is a bit of a misnomer (calling m/schema and m/properties on [:email {:optional true} email?] would fail (which actually inspired this question!)).
Helps me keep a better picture :^)
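For anyone reading later, a small sketch of the two levels using m/children and m/properties. The schema is a trimmed version of the one above, with :string standing in for the email? predicate, which isn't defined in this thread.

```clojure
(require '[malli.core :as m])

(def ex-schema
  (m/schema
   [:map
    [:id [:int {:min 1}]]
    [:name :string]
    [:email {:optional true} :string]]))

;; Each child of a :map is an entry: [key entry-props value-schema].
;; Entry properties live on the entry; the value schema carries its own.
(def levels
  (vec (for [[k entry-props value-schema] (m/children ex-schema)]
         {:key         k
          :entry-props entry-props
          :value-props (m/properties value-schema)})))

levels
```

Here :optional shows up under :entry-props for :email, while {:min 1} shows up under :value-props for :id.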