#clojure-spec
2019-10-14
yuhan12:10:10

Is there a regex op which expresses "n to m repetitions of a spec"?

vlaaad12:10:56

@qythium use s/&:

(require '[clojure.spec.alpha :as s])
(s/valid? (s/cat :3-to-5-evens (s/& (s/* even?) #(<= 3 (count %) 5)))
          [2 2 2 2 2])
=> true

yuhan12:10:38

hmm, what if the sub-specs are variable length regex ops themselves?

yuhan12:10:19

something like: (re-find #"^(?:a+|b+){3,5}$" "aababbbbaa")

jaihindhreddy12:10:17

I believe that cannot be done with a regular language. You can do (s/coll-of ::s :min-count n :max-count m) though.
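For a plain collection (rather than a run of elements inside a larger regex), the bounded-count options on s/coll-of look roughly like this. It is a minimal sketch; the spec name ::three-to-five-evens is made up for illustration. Because s/coll-of is not a regex op, nesting it inside s/cat matches one element that must itself be a collection, not a subsequence.

```clojure
(require '[clojure.spec.alpha :as s])

;; Hypothetical spec name. :min-count and :max-count are both inclusive.
(s/def ::three-to-five-evens
  (s/coll-of even? :min-count 3 :max-count 5))

(s/valid? ::three-to-five-evens [2 4 6])          ;=> true
(s/valid? ::three-to-five-evens [2 4])            ;=> false (too few)
(s/valid? ::three-to-five-evens [2 4 6 8 10 12])  ;=> false (too many)
(s/valid? ::three-to-five-evens [2 4 5])          ;=> false (5 is odd)

;; Inside s/cat it consumes ONE element, which must itself be a collection:
(s/valid? (s/cat :xs ::three-to-five-evens) [[2 4 6]]) ;=> true
(s/valid? (s/cat :xs ::three-to-five-evens) [2 4 6])   ;=> false
```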

yuhan14:10:23

hmm, I'm not so familiar with automata theory, but isn't the above string-based regex constructed using a regular language?

yuhan14:10:02

Good to know that I'm not missing anything obvious built-in though

andy.fingerhut15:10:32

Something like a{1,2} can be hand-expanded into what it means, which is a|aa. The expansions can get pretty long, of course.
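Applying that hand-expansion idea to the (?:a+|b+){3,5} example gives something like the sketch below. The spec names ::run and ::three-to-five-runs are hypothetical. Instead of a flat alternation, x{3,5} is written as the equivalent x x x (x x?)?, and because regex ops nest inside s/cat without creating nested collections, each ::run can itself be variable-length. Strings are not sequential, so the input is passed as a vector of characters.

```clojure
(require '[clojure.spec.alpha :as s])

;; One run: a+ or b+ (itself a variable-length regex op).
(s/def ::run (s/alt :a-run (s/+ #{\a})
                    :b-run (s/+ #{\b})))

;; run{3,5} hand-expanded as: run run run (run run?)?
(s/def ::three-to-five-runs
  (s/cat :r1   ::run
         :r2   ::run
         :r3   ::run
         :more (s/? (s/cat :r4 ::run
                           :r5 (s/? ::run)))))

(s/valid? ::three-to-five-runs (vec "aababbbbaa")) ;=> true  (5 maximal runs)
(s/valid? ::three-to-five-runs (vec "ababab"))     ;=> false (needs at least 6 runs)
(s/valid? ::three-to-five-runs (vec "ab"))         ;=> false (at most 2 runs)
```

The expansion grows with m, so for large bounds the s/& count check from earlier in the thread is usually more practical.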

Alex Whitt14:10:01

Hi! I have some s/keys specs in which the keys' specs use s/or, and the map is only valid for certain combinations of s/or paths. I ended up writing a function, keys-conform?, which takes a map of keys to spec paths and checks those paths against the conformed values:

(require '[clojure.spec.alpha :as s]
         '[clojure.test :refer [deftest are]])
(s/def ::foo (s/or :int    int?
                   :string string?))
(s/def ::bar (s/or :branching ::foo
                   :kw        #{:a :b}))
(s/def ::map (s/keys :req [::foo ::bar]))

(deftest keys-conform?-test
  (are [?conform-test ?input ?result]
      (= ?result (keys-conform? ?conform-test (s/conform ::map ?input)))
    {::foo :int ::bar :kw}               {::foo 1 ::bar :a}            true
    {::foo :int ::bar :kw}               {::foo "hi" ::bar :a}         false
    {::foo :int ::bar :branching}        {::foo 1 ::bar "hi"}          true
    {::foo :int ::bar [:branching :int]} {::foo 1 ::bar :a}            false
    {::foo :int ::bar [:branching :int]} {::foo 1 ::bar 1}             true))

My typical use case involves using keys-conform? in other specs:

(s/def ::specific-map
  (s/and ::map
         (s/or :both-int
               #(keys-conform? {::foo :int ::bar [:branching :int]} %)

               :string-and-kw
               #(keys-conform? {::foo :string ::bar :kw} %))))

(s/valid? ::specific-map {::foo 1 ::bar 1})
;=> true

(s/valid? ::specific-map {::foo 1 ::bar "hi"})
;=> false

(s/valid? ::specific-map {::foo "hi" ::bar :a})
;=> true

(s/conform ::specific-map {::foo "hi" ::bar :a})
;=> [:string-and-kw #:user{:foo [:string "hi"], :bar [:kw :a]}]

This works, but as I've been using it, I'm finding I also want to express more subtle conditions on valid paths, like "or" and "not", and the logic is getting more complex. So I'm wondering: is there already a built-in way to do this, or has someone written a library like it? I can't think of a good way to use multi-specs here, but maybe someone else can? (Also, to preempt any objection that ::foo and ::bar should be split into different keys: in my use case, they really do have to be the same key with different possible paths.)
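The thread never shows keys-conform? itself. Below is a hypothetical minimal sketch that satisfies the tests above; it is not Alex's implementation. It assumes each path is either a single tag keyword or a vector of nested tags, and that s/or conforms to a two-element [tag value] pair (a MapEntry in recent spec.alpha versions, for which vector? is also true). The helper tags-match? is likewise made up.

```clojure
(require '[clojure.spec.alpha :as s])

;; Same specs as in the message above.
(s/def ::foo (s/or :int int? :string string?))
(s/def ::bar (s/or :branching ::foo :kw #{:a :b}))
(s/def ::map (s/keys :req [::foo ::bar]))

(defn- tags-match?
  "True when the conformed value v follows the s/or tags in order."
  [tags v]
  (or (empty? tags)
      (and (vector? v)                  ; s/or conforms to [tag value]
           (= 2 (count v))
           (= (first tags) (first v))
           (tags-match? (rest tags) (second v)))))

(defn keys-conform?
  "Hypothetical sketch: path-map maps each key to a tag keyword or a
  vector of nested tags; every path must match the conformed map."
  [path-map conformed]
  (and (map? conformed)                 ; ::s/invalid is not a map
       (every? (fn [[k path]]
                 (tags-match? (if (keyword? path) [path] path)
                              (get conformed k)))
               path-map)))
```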

seancorfield20:10:10

@alex.joseph.whitt Is there any sort of discriminant you could use to turn this into a multi-spec?

Alex Whitt20:10:31

The problem seems to be that the branching at the level of ::specific-map is sort of multi-dimensional. So there's not necessarily one keyword that changes everything. The options are sort of mix-and-matchable. Do you think there might still be a way?

Alex Whitt20:10:05

The domain in question is parsing a rather nuanced binary protocol called BACnet. The packets have a number of valid shapes, but just as many invalid combinations of fields.

Alex Whitt20:10:40

And the fields are obnoxiously polyvalent

Alex Whitt20:10:29

Perhaps there could be a solution in nested multi-specs? Like if the flag in the header says this is a foo-packet, check foo-spec, and foo-spec has a number of other options under it based on the given values?

Alex Whitt20:10:54

Well actually, it seems like multi-specs would just be replacing the use of s/or in ::specific-map... I don't think that would change the use of keys-conform?...

misha20:10:33

@alex.joseph.whitt do you have a URL for the specification document?

Alex Whitt20:10:25

Do you mean you want to see my real use case?

misha20:10:57

but I think the issue here is that your attribute tries to be too many different things. I'd try to model it with many attributes instead, e.g.

(s/def : int?)
(s/def :bac.str/foo string?)
,,,
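A minimal sketch of that idea applied to the ::specific-map example from earlier, assuming hypothetical type-encoding namespaces (:bac.int, :bac.str, :bac.kw) and a made-up spec name ::specific-map-split. Each type gets its own attribute, and the valid combinations move into the (or ...)/(and ...) key groups that s/keys accepts in :req. One caveat: s/keys does not reject extra keys, so a map carrying both variants would still pass.

```clojure
(require '[clojure.spec.alpha :as s])

;; Hypothetical attribute names: the namespace encodes the value's type.
(s/def :bac.int/foo int?)
(s/def :bac.str/foo string?)
(s/def :bac.int/bar int?)
(s/def :bac.kw/bar  #{:a :b})

;; The valid combinations are listed once, in :req, instead of via s/or tags.
(s/def ::specific-map-split
  (s/keys :req [(or (and :bac.int/foo :bac.int/bar)
                    (and :bac.str/foo :bac.kw/bar))]))

(s/valid? ::specific-map-split {:bac.int/foo 1 :bac.int/bar 1})    ;=> true
(s/valid? ::specific-map-split {:bac.str/foo "hi" :bac.kw/bar :a}) ;=> true
(s/valid? ::specific-map-split {:bac.int/foo 1 :bac.kw/bar :a})    ;=> false
```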

misha20:10:31

might save you some ifs down the road

misha20:10:09

another question is: "why exactly do you want to spec that?"

misha20:10:39

and "how many valid combinations there are vs. how many possible ones?"

misha20:10:14

lol, it's $150 for the PDF

Alex Whitt20:10:43

I can put a portion of my code up in a gist in a bit. I have a few main reasons for wanting to spec it:
1. To validate packets and my parsing logic.
2. To generate random but valid packets for automated testing (of an embedded device that supports the BACnet protocol).
3. Generative testing for code that touches BACnet.
How many valid combinations? It depends. For most parts that would use this pattern, probably fewer than 6 at each level of abstraction; the possible combinations are probably 2-3 times that number. Heh, yeah, it's a closed standard. As for splitting the keys, that's a pattern I can and do use, but at some point it boils down to one key representing multiple possibilities because of the fixed field positions in the binary protocol. I'll keep noodling on it, though.

misha21:10:24

> BACnet currently defines 35 message types, or "services," that are divided into 5 classes.
sounds like you can put 'em into a multi-spec
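A rough sketch of dispatching on the message type with s/multi-spec. The :service key, the two service keywords, the required keys, and the service-spec multimethod are all illustrative placeholders, not the actual BACnet encoding.

```clojure
(require '[clojure.spec.alpha :as s])

;; Hypothetical keys and service names, for illustration only.
(s/def ::service   #{:read-property :write-property})
(s/def ::object-id nat-int?)
(s/def ::value     any?)

;; Dispatch on the :service value of the message map.
(defmulti service-spec :service)

(defmethod service-spec :read-property [_]
  (s/keys :req-un [::service ::object-id]))

(defmethod service-spec :write-property [_]
  (s/keys :req-un [::service ::object-id ::value]))

(s/def ::message (s/multi-spec service-spec :service))

(s/valid? ::message {:service :read-property :object-id 1})  ;=> true
(s/valid? ::message {:service :write-property :object-id 1}) ;=> false (no :value)
```

A message whose :service has no defmethod (and no :default method) is simply invalid.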

misha21:10:35

if messages are sequential, did you try writing regex specs for them instead of map ones? https://clojure.org/guides/spec#_sequences
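A minimal sketch of a regex spec over a flat byte sequence, using a made-up tag/length/payload layout (not BACnet's actual tag encoding); ::byte and ::tlv are hypothetical names. s/& ties the length field to the payload count, the same trick as the 3-to-5 evens example at the top of the thread.

```clojure
(require '[clojure.spec.alpha :as s])

(s/def ::byte (s/int-in 0 256))

;; Hypothetical layout: [tag len payload...], where len = payload size.
(s/def ::tlv
  (s/& (s/cat :tag     ::byte
              :len     ::byte
              :payload (s/* ::byte))
       ;; the predicate sees the conformed map produced by s/cat
       #(= (:len %) (count (:payload %)))))

(s/conform ::tlv [7 2 10 20]) ;=> {:tag 7, :len 2, :payload [10 20]}
(s/valid? ::tlv [7 3 10 20])  ;=> false (length mismatch)
```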

Alex Whitt21:10:07

I'm working at a different level than the message types at the moment. At the message type level, absolutely a multi-spec works (and that's what I use.) Here's my code for the generic header: https://gist.github.com/WhittlesJr/4571fed1596400e242724972ee39b2d4

misha21:10:35

so the general advice is to reduce combinatorial explosion, and spec only things you need to spec. how exactly – depends on that protocol, and what exactly you are doing with it.

misha21:10:10

can you show raw header/message/whatever?

Alex Whitt21:10:39

yeah, one sec

misha21:10:59

I think this

(s/def ::class
  (s/or :application #{:application}
        :contextual  #{:contextual}))
can be just
(s/def ::class #{:application :contextual})
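The practical difference shows up in the conformed value: s/or tags its result, while a set spec returns the value unchanged. A quick comparison, with the two spec names made up so both versions can coexist:

```clojure
(require '[clojure.spec.alpha :as s])

(s/def ::class-or  (s/or :application #{:application}
                         :contextual  #{:contextual}))
(s/def ::class-set #{:application :contextual})

(s/conform ::class-or  :contextual) ;=> [:contextual :contextual]
(s/conform ::class-set :contextual) ;=> :contextual
```

The tagged form is only useful if something downstream, such as a path check like keys-conform?, inspects the branch taken.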

misha21:10:32

sorry, Sean :kappa:

Alex Whitt21:10:52

Here's an example capture of BACnet packets. Open in Wireshark and apply the bacnet filter. You'll probably also need to add UDP port 47824 as a BVLC decoding. (Analyze -> Decode as... -> +)

Alex Whitt21:10:37

I used the s/or there for testing branching in ::attributes

Alex Whitt21:10:04

Alternatively, I suppose you could use a multi-spec, but it seems like six of one, half a dozen of the other

misha21:10:12

well, looking at the raw data did not help :opieop:

Alex Whitt21:10:05

Merp. The BACnet spec is really dense and nuanced, takes a ton of contextualizing to grok the issues involved.

misha21:10:38

this is what I dislike about it: context. spec works great when attributes are the same regardless of context. and it (still) seems to me that flattening the valid attribute combinations out into differently named attributes would be a simpler solution than trying to replicate all the branchiness 1:1 in specs (don't forget you'd have to write custom generators too, hehe)

misha21:10:39

looking at spec names (all ::/), the namespaces of your attributes don't mean much to you, so I'd try to encode types in them and keep the attribute name "polymorphic" (if you need that for something)

Alex Whitt21:10:57

Yeah, I have to wrestle with the high-context nature of the spec when writing the parsing logic. It's horrible. I'll look into breaking it up, but something tells me that it won't be a fully satisfying solution. Part of the purpose of my code is to represent the packet fully, much like Wireshark does. It will make it less obvious what the actual binary fields are saying if I have three different optional keys rather than one key that can have three different types. But that's more of a human-level issue rather than a code-level issue.

misha21:10:27

what are you parsing exactly? what is your input to spec?

Alex Whitt21:10:53

If you want to look at the capture, check out packet 337 -> "Building Automation and Control Network APDU" -> "list of Values:" -> "{[4]". You should be able to recognize the header in there, which is the part of the spec that my gist is dealing with.

Alex Whitt21:10:38

Everything under the "list of Values:" has one of those headers (or is a standalone "header" acting as an opening or closing tag)

Alex Whitt21:10:49

Anyway, the approach I'm taking is working, I was just curious if there was already a solution out there or if I was coming at this from the wrong angle. Depending on how this goes, I may just make a library out of keys-conform? with some well-defined syntax. If my employer will let me open source anything, that is... :c

misha21:10:20

does it have generator tho? :)

Alex Whitt21:10:45

Yeah, actually I'm doing pretty well with generators, surprisingly. I'm finding that the coercion logic a la spec-coerce is synergizing really well with generators.

Alex Whitt21:10:31

Especially as you climb up the abstraction chain and you get more and more fields that are interdependent, the coercion approach is helping to keep generation sane
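One way to get a similar effect in plain spec (this is not spec-coerce's API, just an analogous sketch): generate only the independent fields, then derive the dependent ones with gen/fmap, so every generated value is consistent by construction. The ::frame, ::len, and ::payload specs and the fix-len helper are hypothetical, and actually running the generator requires org.clojure/test.check on the classpath.

```clojure
(require '[clojure.spec.alpha :as s]
         '[clojure.spec.gen.alpha :as gen])

(s/def ::len     (s/int-in 0 256))
(s/def ::payload (s/coll-of (s/int-in 0 256) :kind vector? :max-count 16))

(defn fix-len
  "Hypothetical helper: recompute the dependent :len field from :payload."
  [m]
  (assoc m :len (count (:payload m))))

(s/def ::frame
  (s/with-gen
    (s/and (s/keys :req-un [::len ::payload])
           #(= (:len %) (count (:payload %))))
    ;; Generate only :payload, then derive :len from it.
    #(gen/fmap fix-len (s/gen (s/keys :req-un [::payload])))))

;; With test.check available:
;; (every? #(s/valid? ::frame %) (gen/sample (s/gen ::frame) 20)) ;=> true
```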

Alex Whitt21:10:53

But maybe I have yet to hit a real barrier and it's lurking somewhere in the thickets, waiting to spring an ambuscade on my productivity

Alex Whitt21:10:59

::attributes in the gist generates out of the box