Fork me on GitHub
#clojure-spec
<
2020-11-04
>
Jim Newton10:11:39

Yet another question about spec. This time about the semantics of the sequence designators such as s/alt, s/+, s/* etc. an element such as (s/+ even?) means (if I understand correctly, a sequence of 1 or more objects for which the even? predicate is true. but I can substitute a spec name for even? and say (s/+ ::big-even) and that will be a sequence of 1 or more objects each of which validates the ::big-even spec. However, what does (s/+ (s/alt :x even? :y prime?)) mean. I would guess that it means a sequence of 1 or more elements each of which either satisfy even? or satisfy prime? But if I define

(s/def ::even-or-prime (s/alt :x even? :y prime?)) 
what does this mean? (s/+ ::even-or-prime) It is a sequence of 1 or more sequences (each of which contains elements which are either even or prime), or is it a sequence of 1 or more integers each of which is either even or prime?

Jim Newton10:11:38

I'm pretty sure spec has a way to designate both, as they are both possible things the user might want to express. right?

borkdude10:11:46

@jimka.issy s/alt is for regexes, use s/or for normal predicates

borkdude10:11:48

(s/+ (s/alt ....)) means a sequence of one or more alternatives

borkdude10:11:36

it doesn't mean each, this can be expressed with s/every or s/coll

Jim Newton10:11:23

Sorry, my question too convoluted. Let me ask simpler. If ::foo is a spec defined somewhere using s/def , what does (s/+ ::foo) mean? Does it mean a sequence of 1 or more objects which each validate ::foo ?

Jim Newton10:11:49

So it is not referntially transparent?

Jim Newton10:11:07

I.e. substituting the expression for ::foo in place, changes the meaning?

borkdude10:11:32

I don't see why that would be the case?

Jim Newton10:11:21

suppose

(s/def ::foo (s/alt :x even? :y prime?))
Now (s/+ (s/alt :x even? :y prime?)) matches a sequence of 1 or more integers each of which are either even or prime, right? but (s/+ ::foo) matches a sequence of 1 or more objects which each satisfy :foo , i.e., a sequence of sequences each of which contain even or prime integers.

borkdude10:11:34

@jimka.issy Please give a working example. E.g.:

user=> (s/def ::foo (s/or :x even? :y neg?))
:user/foo
user=> (s/conform (s/+ ::foo) [2 4 -11])
[[:x 2] [:x 4] [:y -11]]
user=> (s/conform (s/+ (s/or :x even? :y neg?)) [2 4 -11])
[[:x 2] [:x 4] [:y -11]]

borkdude10:11:05

A spec is only applied to one thing, not to all elements of a collection, unless you use s/every or s/coll-of.

borkdude10:11:55

user=> (s/def ::nums-or-strings (s/or :nums (s/every number?) :strings (s/every string?)))
:user/nums-or-strings
user=> (s/conform ::nums-or-strings [1 2 3])
[:nums [1 2 3]]
user=> (s/conform ::nums-or-strings ["foo" "bar"])
[:strings ["foo" "bar"]]

Jim Newton10:11:57

yes, but you answer avoids my question. I'm not asking how to write a good spec. I'm asking about the semantics.

borkdude10:11:29

And I asked you about a counter-example of referential transparency. A working one without imaginary functions like prime? which does not exist in core.

Jim Newton10:11:54

Ok here is a very simple one.

vlaaad10:11:17

I mean you don’t really need prime? source to know its semantics

Jim Newton10:11:28

(s/+ ::foo) matches a sequence for which all the elements are either even or negative.

vlaaad10:11:55

looks right

borkdude10:11:15

Your words are ambiguous to me. Do you mean: every element is a ..., or on a case by case basis?

Jim Newton10:11:52

lets stick with integers and sequences thereof.

Jim Newton10:11:35

Now define

(s/def ::foo2 (s/+ ::foo))

Jim Newton10:11:31

what does (s/+ ::foo2) mean. It does not mean a sequence of 1 or more objects which of which match ::foo2

borkdude10:11:00

What does ::foo mean in your code here?

borkdude10:11:38

some kind of integer that's either this or that?

Jim Newton10:11:54

(s/def ::foo (s/or :x even? :y neg?))

borkdude10:11:22

then ::foo2 means multiple even or negs, but they can be mixed. it's not all or nothing

vlaaad10:11:23

I would. say (s/+ ::foo2) is a seq of 1 or more objects each of which is is ::foo2, why would you say it isn’t that?

Jim Newton10:11:36

(s/def ::foo (s/or :x even? :y neg?))
(s/def ::foo2 (s/+ ::foo))

Jim Newton10:11:00

Now ::foo matches even and negative integers

Jim Newton10:11:27

(s/+ ::foo) matches a non-empty sequence of even and negative integers

vlaaad10:11:00

sounds right

3
Jim Newton10:11:40

but (s/+ ::foo2)does not match a non-empty sequence of objects which match ::foo2

vlaaad10:11:16

ah, I think that’s because of how regex ops nest

Jim Newton10:11:07

yess!!!!!!!!

Jim Newton10:11:36

this was my original question (s/+ x) does not match a non empty sequence of things which match x

borkdude10:11:48

@jimka.issy If you want to do this, you need to use s/spec around the ::foo2

3
borkdude10:11:09

This has to do with regex nesting indeed, not with referential transparency. Read the spec guide, which explains this

borkdude10:11:59

> When regex ops are combined, they describe a single sequence. If you need to spec a nested sequential collection, you must use an explicit call to spec to start a new nested regex context.

vlaaad10:11:08

(s/valid? (s/+ (s/spec ::foo2)) [[2 -3 4] [2 -3 4] [2 -3 4]]) => true

Jim Newton10:11:10

Did I miss the paragraph about when regex ops are combined.

borkdude10:11:21

I assume you know how Ctrl-f works ;)

Jim Newton10:11:23

Besides, I'm confused about what "regex" operator means.

borkdude10:11:52

regex operators are things like s/+, s/?

Jim Newton10:11:57

sometimes it means string regexps. and some times it means sequence operations.

borkdude10:11:25

in the context of spec, regex means a spec which describes a sequence of things

borkdude10:11:39

The idea of spec is based on this paper: http://matt.might.net/papers/might2011derivatives.pdf where that terminology comes from

Jim Newton10:11:45

so does this mean that if a sequence operator is directly within another sequence operator it has this special modal/merging meaning. otherwise it has its normal meaning?

borkdude10:11:46

(or maybe not that one, but the paper where that paper is based on)

borkdude10:11:13

> When regex ops are combined, they describe a single sequence. If you need to spec a nested sequential collection, you must use an explicit call to spec to start a new nested regex context.

borkdude10:11:36

s/alt returns a regex op, s/or is for combining predicates

borkdude10:11:46

so s/or is not a regex op.

Jim Newton10:11:51

and this one, where I introduced the derivate for implementing something quite similar to spec in common lisp https://www.lrde.epita.fr/dload/papers/newton.16.els.pdf

Jim Newton10:11:53

the system I devised (in 2016) and after, is called RTE (regular type expressions).

Jim Newton10:11:19

What i'm trying to do now is figure how similar or different it is from spec at a theoretical leve.

Jim Newton10:11:32

So I'm trying to see if I can "compile" a spec into an RTE.

Jim Newton10:11:13

if so, then an RTE can me validated in linear time with no backtracking.

Jim Newton10:11:36

as I understand it, a spec takes exponential time worst case, so users avoid worst cases.

Jim Newton10:11:13

however, since I have found no academic papers on spec, and I am not an expert of spec, I'm just guessing. Maybe I'm completely wrong about my supposition

borkdude10:11:15

@jimka.issy I'm interested in that for clj-kondo as well. Clj-kondo has a tiny type system in which I try to avoid backtracking of type checking arguments. Currently by just not allowing sophisticated constructs :)

Jim Newton10:11:45

interesting*

Jim Newton10:11:43

I submitted a paper recently to the Dynamic Languages Symposium where I introduced a Simple Type System for use in dynamic languages. The paper was rejected. One of the reasons for rejection was the revewers didn't see practical applications.

Jim Newton10:11:45

a simple type system allows you to reason about types (where type is defined as any set of values) . Furthermore intersections, unions, and complements of types are also types.

Jim Newton10:11:19

If your type logic can conclude that a type is empty, then you've usually found a bug in the user's code.

borkdude10:11:29

It sounds like it could be very useful for clojure. Are you aware of the work done in core.typed?

Jim Newton10:11:42

Thats Ambrose?

Jim Newton10:11:11

He's looked at my work and is interested in helping out. but everyone is busy.

Jim Newton10:11:41

I'd also love to have a student researcher interested in working with clojure.

borkdude10:11:44

It looks like your work could help improve clj-kondo's type system since linear checking is necessary for performance

Jim Newton11:11:33

what's clj-kondo written in? is it written in clojure?

Jim Newton11:11:02

are you more of a hacker, or more of an academic?

Jim Newton11:11:23

or more of an engineer? I don't mean that in any way insulting. sorry if it sounds as such....

borkdude11:11:29

haha. I do have a CS master degree, but not a Phd in type systems ;)

borkdude11:11:24

clj-kondo focuses on usefulness though, whereas something like core.typed is maybe more an academic project

borkdude11:11:35

I don't care about publishing papers, I want the tool to help me

Jim Newton11:11:13

I'm not sure why my next best step should be. For the moment I have an implementation of genus, (the simple type system in clojure) and RTE (regular type expressions) which allows genus to represent sets of sequences which match regular expressions in terms of types. Genus claims to be extensible. For the moment my experimental task is see if I can indeed extend genus (in user space) to incorporate the spec type. I.e., given a spec, the type will be the set of all objects which validate the spec.

Jim Newton11:11:33

spec types will be treated as opaque types such as arbitrary predicate. except in the case where I can compile a spec into more fundamental types in which case the system can reason about them.

borkdude11:11:46

Another tool I made recently also uses clojure.spec for searching code: https://github.com/borkdude/grasp This could potentially made faster Another project which tries to use spec at compile time: https://github.com/arohner/spectrum It's experimental in nature.

Jim Newton11:11:10

for example it will be able to decide whether the set of objects satisfying spec1 and also spec2 is empty. or decide whether two given specs are equivelent.

Jim Newton11:11:53

in order for RTE to represent the regular type expression as a deterministic finite automaton, it needs to be able to determine whether two given types are disjoint. This is currently not possible with spec in the most general case, but could be possible in many particular cases.

Jim Newton11:11:50

As I understand spec's internal data structure uses non-deterministic finite automata, thus the exponential behavior in the worst case.

Jim Newton11:11:42

ANYWAY, maybe now you understand a bit better the reason for my strange beginner questions about the semantics of spec ???

borkdude11:11:54

yes, thanks for explaining!

borkdude11:11:26

The authors of spec usually emphasise that spec is not a replacement for a type system, but a runtime validation system.

Jim Newton11:11:43

in my opinion there is no difference

borkdude11:11:51

What might be of interest, I have a collection of specs for core functions here: https://github.com/borkdude/speculative

borkdude11:11:18

As in, spec could be expressed as a case of dependent typing?

Jim Newton11:11:54

what is speculative? I didn't immediately understand by reading the intro.

borkdude11:11:12

> a collection of specs for core functions

Jim Newton11:11:40

For me a type is a set of values. And a type system allows programmers to designate and reason about certain types.

Jim Newton11:11:44

that's what spec does.

borkdude11:11:42

yes, if you define it like that, it makes sense. but this set of values can often not be known at compile time, which makes the difference between static compile time checking

Jim Newton11:11:45

what is s/fdef ?

borkdude11:11:54

clojure.spec supports defining specs for checking function arguments and return values using fdef

Jim Newton11:11:39

yes many times you can indeed know a lot about a type at compile time. in those cases, compilers like the SBCL common lisp compiler can create more efficient code, or tell the user about errors or unreachable code.

Jim Newton11:11:17

so an fdef decribes what a valid function call site looks like?

Jim Newton11:11:45

I can see how the predicate based assumption of spec, could be a bottle neck for a code analyzer like kondo.

borkdude11:11:15

I initially used spec in clj-kondo, but I found the performance not good enough

Jim Newton11:11:42

one think that I've noticed is that many many many of these predicates can be rewritten as a simple type check. And I have code which automates this.

Jim Newton11:11:02

ahhh, so you have abandoned that piste now?

borkdude11:11:04

the format now uses something more explicit: you provide the seq of types per arity. so you don't have to do any backtracking to to match a seq based on some s/alt expression. which makes more sense to me considering the structure of multi-arity functions.

borkdude11:11:31

and a seq of types is usually a fixed number of things or a fixed number of thing + a variable amount of things of the same type

borkdude11:11:46

I think this matches 99% of cases how people call functions in clojure

Jim Newton11:11:06

for example

(gns/canonicalize-type '(satisfies int?))
  ==> (or Long Integer Short Byte)
(or ...) is a union type. when intersected and unioned with other types, it can be arranged so that redundant checks are eliminated.

borkdude11:11:37

clj-kondo also supports union types, it's just #{:int :string}

borkdude11:11:12

it also tries to infer returns types and threads types through let expressions

Jim Newton11:11:02

so it has type inferencing?

borkdude11:11:43

It has inferencing yes, but limited to the things it knows, else it's any? and it won't complain

Jim Newton11:11:05

I wrote something pretty similar some years ago for SKILL++ which was a proprietary lisp used by the company I worked for for many years.

Jim Newton11:11:50

in my system, called "loathing" it would see that foo returns type string (because of the definition), and the call site inc expects an integer. So the type at that point is (and integer string) which is the empty-type.

borkdude11:11:47

clj-kondo does something similar. but it just has a simple graph of things that are reachable or not: you can never go from string to int, so this is an error

borkdude11:11:09

but you can go from any to int, or any to string, so this is not an error

Jim Newton11:11:16

however, if inc had input type (or integer string), then the intersection would be (and string (or string integer)) = string != empty-set

borkdude11:11:31

yep, same thing in kondo

borkdude11:11:37

e.g.: this will not give a warning:

(defn foo [x]
  (if (int? x)
    x
    (str x)))

(inc (foo 1))

Jim Newton11:11:11

You mentioned that you are not interested in publications. But I need to make publications. It's part of my job.

borkdude11:11:29

Well, I meant that this is not the goal of clj-kondo.

Jim Newton11:11:36

I need to get several publications out of this work.

borkdude11:11:39

It's not the product of academic research

borkdude11:11:51

I have nothing against making publications

borkdude11:11:05

I do have my name on one publication ;)

Jim Newton11:11:33

and I think there are several interesting results to publish. But i'm a very poor publicist. I don't do well at convincing people of the interest or novelty of my work.

borkdude11:11:11

If you could get practical benefit out of your work by e.g. improving clj-kondo or a similar tool, I'm sure it will convince people

borkdude11:11:32

@jimka.issy I'm sharing this for fun, but it's also related: https://borkdude.github.io/re-find.web/ Find functions based on example in and outputs

borkdude11:11:46

This uses the speculative specs to search for matches

Jim Newton11:11:01

the clojure community is friendly for the most part. although I do see some snide comments now and then.

Jim Newton11:11:11

So it is usally pretty easy to get questions answered.

Jim Newton11:11:40

I would love to have some spec examples for test cases.

borkdude11:11:44

I would say the clojure community is better in that respect than some others

Jim Newton11:11:27

Could I solicit you to give me some spec examples, simple and complex. some which are trivial, and some which are very slow to validate?

Jim Newton11:11:53

For the moment I'm just worried about sequences, not functions or maps or data structures. My code is not advanced enough to handle those.

Jim Newton11:11:20

I want to try to compile these to RTE and see if I can come up with some cases where RTE is faster.

Jim Newton11:11:35

if RTE is never faster, then i'm in trouble

borkdude11:11:37

You could take a look at speculative, which has more than hundred specs

borkdude11:11:59

I don't have a particular example of a slow spec, but I guess you can fabricate one based one which does backtracking

Jim Newton11:11:18

when analyzing code statically, i suppose you are usually looking at nested sequences, not really at maps and such.

borkdude11:11:58

This tool: https://github.com/borkdude/grasp searches through code. It applies the spec at every sub-tree of an s-expression

borkdude11:11:09

So performance improvements would be really noticeable there

borkdude11:11:49

I have to go now. speak to you later

Jim Newton11:11:20

ok, thanks. talk later

Jim Newton11:11:40

BTW i filed an issue for kondo, was it helpful?

borkdude11:11:05

Yes, thank you. I'm not sure if clj-kondo will be able to support this, as it sees one file as a standalone unit and in-ns doesn't really match with this model. So either using in-ns with :refer on the original namespace requires you to help clj-kondo with some annotations or config, or clj-kondo has to make non-trivial changes.

borkdude11:11:44

I will think about this for a while

Jim Newton13:11:44

I don't think s/& is a regular expression operation. It doesn't seem regular. as far as I can tell, it requires the implementation to remember the sequence being tested. a regular expression demands that every decision be based only one the current object of the sequence, and a state. whereas only a finite number of states are allowed. In this case if there are N states, you cannot test a sequence of length N+1.

Jim Newton10:11:33

Wait. I think I'm confused. Let me work out a more consistent example. And come back to you in 5 minutes. OK?

👍 3
Jim Newton10:11:28

taking this conversation to a thread. I hope everyone follows.

Jim Newton11:11:42

another spec question. Are all spec definitions global? If I'm writing test cases, I'd love to introduce local definitions that don't pollute the global space.

borkdude11:11:25

@jimka.issy Yes, spec has one global registry (compare spec keywords to RDF uris for example). Malli (#malli) which aims to be an alternative schema library has local registries as well.

Jim Newton11:11:48

I suppose for test cases I can use ::xyzzy to keep the symbols in my local namespace.

Jim Newton11:11:00

Do I understand correctly that there is no intersection equivalent of s/alt for regex semantics. I.e., s/or is to s/alt as s/and is to what?

Jim Newton12:11:55

ahh, it's not included in https://clojure.org/guides/spec#_sequences, at least not in the table with the others

Jim Newton12:11:25

syntactically does s/& work like s/alt and s/cat in that I need to tag the components? or like s/and where no tags are used?

borkdude12:11:27

@jimka.issy In the guide, search for: > Spec also defines one additional regex operator, &, which takes a regex operator and constrains it with one or more additional predicates. to see an example

Jim Newton12:11:07

ahh so s/& is not really analogous to s/alt it is yet a third syntax and semantic.

borkdude12:11:48

I thought you were asking analogous to s/and. > I.e., s/or is to s/alt as s/and is to what?

borkdude12:11:59

s/alt and s/& are supposed to be used to build regexes, s/and and s/or are just general ways of combining predicates

borkdude12:11:06

but sometimes they do the same thing

Jim Newton12:11:42

It looks (from the documentation) that s/& is not analgous to s/and in the same way that s/alt is analogous to s/or. if they were analogous I'd expect s/& to work like this.

(s/* (s/alt :x (s/cat :a neg? :b even?) :y (s/cat :c odd? :d pos?)))
(s/* (s/& (s/cat :a neg? :b even?) (s/cat :c odd? :d pos)))
but instead it works like this
(s/* (s/& (s/cat :a neg? :b even?) #(even? (count %))))

borkdude12:11:18

See the difference here:

user=> (s/conform (s/alt :foo int? :bar string?) [1])
[:foo 1]

user=> (s/conform (s/or :foo int? :bar string?) [1])
:clojure.spec.alpha/invalid
user=> (s/conform (s/or :foo int? :bar string?) 1)
[:foo 1]

Jim Newton12:11:06

yes s/or matches an object, and s/alt matches a sequence of objects

Jim Newton12:11:16

So why doesn't this return true?

(s/valid? (s/* (s/&  (s/cat :a neg? :b even?)  
                     (s/cat :c odd? :d pos?))) 
          [-3 4 -5 2])
it is both a sequence of neg even occurring 0 or more times, and also a sequence of odd pos occuring 0 or more times

Jim Newton12:11:05

If I replace s/& with s/alt (and insert the required keys), it works as expected.

(s/valid? (s/* (s/alt :x  (s/cat :a neg? :b even?)  
                      :y  (s/cat :c odd? :d pos?))) 
          [-3 4 -5 2])

Jim Newton12:11:41

It looks to me (on first and second reading of the specification) that (s/& pattern predicate) means that the subsequence which matches the pattern must as a sequence match the predicate. So the following

(s/cat :a (s/& (s/* string?) #(even? (count %)))
       :b (s/& (s/+ int?)    #(odd? (count %))))
matches a sequence which begins with an even number of strings and ends with an odd number of integers. Is my interpretation correct? I don't think these are really regular expressions any more. I have to think about it, but my suspicion is that this cannot be solved without using a stack.

Jim Newton14:11:50

How can I recognize an object such as the one returned by

(s/* (s/alt :x  (s/cat :a neg? :b even?)  
            :y  (s/cat :c odd? :d pos?)))
s/spec? returns false . type returns clojure.lang.PersistentArrayMap . s/get-spec returns nil.

Jim Newton14:11:53

so regex? returns non-nil. is that one I should use?

Jim Newton14:11:04

seems to be any object for which ::op returns boolean true.

Jim Newton14:11:09

where ::op is :clojure-spec-alpha/op