clojure-spec

karol.adamiec 2023-04-23T20:50:08.594499Z

Hi, I get the mysterious "Additional data must be non-nil." Below is a sample. What do i do wrong?

(defn mean
  [numbers]
  (when-let [num-count (seq numbers)]
    (/ (reduce + 0 num-count) (count num-count))))

(defn within-range?
  [avg coll]
  (if (empty? coll)
    true
    (let [min-val (apply min coll)
          max-val (apply max coll)]
      (<= min-val avg max-val))))

(defn avg-unchanged-by-order?
  [avg coll]
  (let [shuffled-coll (shuffle coll)
        shuffled-avg (mean shuffled-coll)]
    (= avg shuffled-avg)))

(s/fdef mean
  :args (s/cat :numbers (s/nilable (s/coll-of (s/double-in :NaN? false :infinite? false))))
  :ret  (s/nilable number?)
  :fn (fn [{:keys [args ret]}]
              (if-let [coll (:numbers args)]
                (and (within-range? ret coll)
                     (avg-unchanged-by-order? ret coll))
                true)))

(stest/check `mean)

2023-05-26T19:28:54.803009Z

I'm using metosin data specs and have a datastructure like this:

[{:type "foo" :message "foo-msg"}]
In my code I have various maps of type->msg
(def errors-a
  {:foo "foo-msg-a"
   :bar "bar-msg-a"})

(def errors-b
  {:foo "foo-msg-b"
   :bar "bar-msg-b"})

(defn keyset [m]
  (->> m keys (map name) (into #{})))

(defn ->error-spec [errors]
  [{:type (s/spec (keyset errors))
    :message string?}])
s/spec expects a macro and so s/explain-data outputs the symbol (e.g. keyset) instead of the set inlined. I've tried writing a macro to inline the set produced but have been getting errors to do with the mixing of macros and trying to evaluate values at compile time. Anyone able to help me write a macro so (s/spec (keyset errors)) inlines to (s/spec #{"foo" "bar"}) as this makes the explain-data most understandable

phronmophobic 2023-04-24T22:53:50.237419Z

It depends on your goals. If you just want a function that calculates the mean of a sequence of numbers correctly, then I would use a library. However, there might be other goals (like learning spec) where you might do something different.

phronmophobic 2023-04-24T22:58:36.939219Z

There are lots of fiddly trade offs when working with floats. Most libs that work with floats use a number of tricks like comparing within some tolerance for tests. Another trick is to pre sort the numbers before summing to improve accuracy and consistency. I’m not an expert here which is why I would just use a library.

karol.adamiec 2023-04-25T06:21:50.184579Z

yeah, the goal is to use spec to learn. i am not implementing new statistical package... yet :D

πŸ‘ 1
karol.adamiec 2023-04-25T06:24:23.574339Z

i went with tolerance delta. The biggest takeaway is: if your spec is non-deterministic, you will get the mysterious exception mentioned above. Easy trap to fall into and burn, but once burned unlikely to happen again. And seeing that error immediately points in right direction. Typical newbie problem...

phronmophobic 2023-04-25T07:45:34.839989Z

I actually haven't used s/fdef . For these types of things, I usually go straight to https://clojure.org/guides/test_check_beginner. The nice thing is that you can create separate tests for each property (eg. within-range, avg-unchanged-by-order?) and run them independently for however long it takes to feel comfortable that your implementation works. For avg-unchanged-order?, doing one shuffle only tests a small part of the test space. I would probably write it to use https://github.com/clojure/math.combinatorics to test a broader part of the input space.

phronmophobic 2023-04-25T07:50:51.789919Z

For test.check, there's a way to get the RNG seed to make tests consistent and reproducible. There's probably also a way to do that with s/fdef so that avg-unchanged-order? is consistent/reproducible and avoids the problem you were running into.

karol.adamiec 2023-04-25T07:56:47.774659Z

Thanks for ideas. Will think on that. One point intrigued me especially. You say: _I actually haven't used `s/fdef` . For these types of things, I usually go straight to <https://clojure.org/guides/test_check_beginner|test.check>. The nice thing is that you can create separate tests for each property (eg. within-range, avg-unchanged-by-order?) and run them independently for however long it takes to feel comfortable that your implementation works._ One could add `s/fdef` to helpers as well too. It seems to mee then, that desired test "pyramid" would be: 1. s/fdef for most important and public facing parts 2. test.check for these as well, 3. exaple based classic unit tests, to present few cases to readers. It gets hard to get a glance of "shapes" involved without examples.

karol.adamiec 2023-04-25T07:59:11.048069Z

will try to roll with this and see how my perceptions change πŸ˜„

karol.adamiec 2023-04-25T08:00:12.306279Z

i do feel there is this a bit awkward dichotomy with spec/gen and test.check/gen i.e. some things are wrapped, but sooner or later (rather sooner) one HAS TO land in test.check land and learn it good.

karol.adamiec 2023-04-25T08:01:39.741879Z

OTOH, using test.check for double generator, where there is a nice s/double-in in spec feels wrong... so i end up doing things twice... and have extra pass => can i do that using spec namespace only...

phronmophobic 2023-04-25T08:03:28.981679Z

I still use stuff like s/def and s/double-in. You can get the generator to use with test.check via s/gen.

πŸ‘ 1
karol.adamiec 2023-04-25T08:06:34.390839Z

in official docs they do mention that one should use test.check as last order of biz. as that will force that dependency on prod build. So it seems like the spirit of what i should do is try to avoid test.check in my prod namespaces, and reach for that only in testing namespaces. Although i have to say, i do not really follow the argument there, as spec depends on it anyway... so who cares whether i include that explicit or it is sucked in in deps of deps...

phronmophobic 2023-04-25T08:08:26.662489Z

spec dynamically loads it so you can load spec without test.check

karol.adamiec 2023-04-25T08:11:25.352269Z

yes it does. so i do not fully understand this: There are three ways to build up custom generators - in decreasing order of preference: 1. Let spec create a generator based on a predicate/spec 2. Create your own generator from the tools in clojure.spec.gen.alpha 3. Use test.check or other test.check compatible libraries (like https://github.com/gfredericks/test.chuck) The last option requires a runtime dependency on test.check so the first two options are strongly preferred over using test.check directly.

karol.adamiec 2023-04-25T08:12:37.331639Z

test.check bytecode ends up in my prod slim and shiny build anyway... What is the perspective here that i miss? Maybe some more intricate interactions when designing libs based on libs...

karol.adamiec 2023-04-25T08:13:10.048059Z

Or just a rule of thumb to make code as future proof as possible as this will be the direction?

phronmophobic 2023-04-25T08:13:32.504069Z

I don't think test.check should show up unless you explicitly include it.

πŸ€” 1
phronmophobic 2023-04-25T08:14:17.380239Z

maybe some other dependency is pulling it in?

karol.adamiec 2023-04-25T08:17:51.095609Z

hmm... i might have misunderstood the wording then. i assumed that spec is using test.check namespace internally

karol.adamiec 2023-04-25T08:18:06.674689Z

i have not inspected the build image btw, just to be clear

karol.adamiec 2023-04-25T08:19:08.522459Z

but what you say makes more sense for sure πŸ™‚

karol.adamiec 2023-04-25T08:19:22.690289Z

and clears my confusion, so this is 99% correct i think

phronmophobic 2023-04-25T08:19:46.954589Z

It does use test.check , but only for functionality starting at generators and later, https://clojure.org/guides/spec#_generators

phronmophobic 2023-04-25T08:20:04.551249Z

which you won't usually need at runtime

karol.adamiec 2023-04-25T08:20:22.361699Z

spec generators rely on the Clojure property testing library https://github.com/clojure/test.check. However, this dependency is dynamically loaded

karol.adamiec 2023-04-25T08:20:35.762229Z

yes, to that was the part that set my mind on it

karol.adamiec 2023-04-25T08:21:20.973629Z

so, i hack away happily using spec namespace, but ... it will load in test.check at some point even in prod. even without me saying it has to

karol.adamiec 2023-04-25T08:21:39.301109Z

so we go back to the 1.2.3. above. why would that matter?

karol.adamiec 2023-04-25T08:22:05.544099Z

anyway. i can live with not knowing for now. Thank You for help πŸ™‚

phronmophobic 2023-04-25T08:23:12.801309Z

If your prod code doesn't use generators or things that need them, then you don't need test.check as a prod dependency and it won't be included

phronmophobic 2023-04-25T08:25:30.378639Z

That's what all the delay and dynaload stuff is doing.

phronmophobic 2023-04-25T08:27:18.569269Z

and if test.check isn't included and you try to use generators, you'll get an exception, https://github.com/clojure/spec.alpha/blob/ad06cdc7407c11990c7e93206133fb14eb62cacf/src/main/clojure/clojure/spec/gen/alpha.clj#L40

karol.adamiec 2023-04-25T08:37:23.951359Z

nice. πŸ‘

phronmophobic 2023-04-23T21:04:03.022439Z

The docs suggest using stest/abbrev-result to get more info about what happened when it failed, https://clojure.org/guides/spec#_testing. Have you tried looking at that info?

karol.adamiec 2023-04-23T21:13:43.297649Z

if i do as per docs (stest/abbrev-result (first (stest/check mean`) i do not see any more relevant information. There is the sample case that "failed", but it is correct input and if i give it to mean or its validators/helpers everything seems fine :/

phronmophobic 2023-04-23T21:14:35.204279Z

What is the failing sample?

karol.adamiec 2023-04-23T21:23:02.731909Z

it is a bit random. sometimes it says spec failed and collection is [ 1.0000004768371582 1.000000238418579 2.147483648E9 ] and calculated mean is 7.158278833333336E8 other times there is nothing inside, other than the message in exception "additional data must not be null".

karol.adamiec 2023-04-23T21:24:53.107269Z

stacks point at explainInfo --- Contents: 0. clojure.spec.test.alpha$explain_check 1. invokeStatic 2. "alpha.clj" 3. 390

karol.adamiec 2023-04-23T21:27:29.599389Z

found some old @alexmiller explanations about similiar thing... but unsure if that still applies...

phronmophobic 2023-04-23T21:30:31.034479Z

one thing that is suspicious is that mean is called in the validation function. I would try to narrow down where the problem might be. Examples: β€’ run stest/check with just the :args specification β€’ run with just the :ret β€’ run with just :fn β€’ run with just :fn and only check within-range? β€’ etc

karol.adamiec 2023-04-23T21:39:55.163569Z

yeah, that mean call maybe sth... i changed the avg-unchanged-by-order? to be a dummy that always returns true and now 1000 tests pass...

karol.adamiec 2023-04-23T21:40:09.450439Z

thanks πŸ™‚

πŸŽ‰ 1
phronmophobic 2023-04-23T21:41:31.306789Z

I think there’s a way to write those sorts of checks, but I’m not sure how to do it off the top of my head.

phronmophobic 2023-04-23T21:43:00.672909Z

They’re definitely supported by test.check

karol.adamiec 2023-04-23T21:43:33.325249Z

i would think calling function under test is ok... it is not like recursion...

karol.adamiec 2023-04-23T21:46:05.573269Z

yeah, all pass if i return true, but still do all the calcs and logic. so just calling mean seems to not be the issue

phronmophobic 2023-04-23T23:01:38.994759Z

Another thing that just popped into my head is that your test expects the order of the numbers not to make a difference in the calculated mean, but that's not true for all floating point numbers!

πŸ‘ 1
πŸ’ƒ 1
🧐 1
1
πŸ™Œ 1
phronmophobic 2023-04-23T23:02:37.753719Z

not sure why that didn't occur to me earlier doh

karol.adamiec 2023-04-24T05:14:54.439529Z

You saved my bacon sir! Swapped the spec to be ints, not doubles, and behold it is rock solid now. That mysterious error pops up when the spec is non-deterministic, which Alex has said years ago more less. So a better error message would be nice πŸ™‚

karol.adamiec 2023-04-24T06:03:47.622919Z

So a new question arises, what now? What would be best way to move forward? 1. drop the invariant (yikes...) 2. replace the generator in spec to gen ints, even though we say doubles... 3. use a tolerance delta in the invariant <= my favourite i think... but adds so much noise ;/