#clojure-spec
2017-02-12
nblumoe08:02:09

Does anyone have experience with using clojure.spec in a scenario where numerical precision becomes an issue? I would like to have generative tests for a function whose output should match its input (except for some numerical imprecision introduced by matrix calculations). I struggle to come up with a good :fn spec which does not require crippling the generator. A simple :fn #(m/equals (:ret %) (-> % :args :matrix) 1e-6) does not work, because the generator ends up using “large” doubles which break the max. diff of 1e-6. I tried a spec using ratios but this ends up with “Couldn’t satisfy such-that” issues:

#(-> (m/div (:ret %) (-> % :args :matrix))
       (m/sub 1.0)
       m/abs
       m/maximum
       (< 1e-8))

nblumoe08:02:11

I am pretty sure I could constrain the generator in a way to make the simple check work but of course I would like to avoid doing that.

gfredericks13:02:24

@nblumoe what does the function you're trying to spec do?

nblumoe15:02:55

@gfredericks a simplified answer would be: Principal component analysis and then reconstruction of the original data from scores and loadings. Some special twists due to the specific domain of the problem though (chemistry).

gfredericks15:02:02

@nblumoe so your problem is that the algorithm breaks down at extreme values, but it would be arbitrary and artificial to impose constraints on the values in the spec?

nblumoe15:02:01

no, the algorithm works fine. the only issue is specing it reliably without making the generator too specific (with “specing” I mean including generative testing)

nblumoe15:02:50

(well, if by “algorithm” you mean the :fn spec predicate then YES 😉 )

gfredericks15:02:06

okay, so it's not that the algorithm doesn't work at extreme values, just that the arbitrary equality threshold of 1e-6 breaks?

nblumoe15:02:02

yes exactly. because this criterion is easy to break for example when using doubles in the range of 10e100 or whatever large enough

gfredericks15:02:24

can you make the equality threshold a function of the input data?

gfredericks15:02:42

that might have the advantage of giving you a tighter threshold at the other end as well

nblumoe15:02:47

I was trying that too but ended up with unsatisfied “such-that” issues. I would also need to set the threshold element-wise (e.g. if a matrix contains 1e100 as well as 1e1, those should have different thresholds)
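
A minimal sketch of what an element-wise, input-dependent threshold could look like, written in plain Clojure rather than core.matrix. The names `close?` and `matrix-close?` and the tolerance values are assumptions made for illustration and do not come from the conversation. Unlike a ratio-based check, this comparison does not divide by the input, so zero entries need no special handling.

```clojure
;; Hypothetical helpers, not part of clojure.spec or core.matrix.

(defn close?
  "True when a and b differ by at most rel-tol times the larger magnitude.
  abs-tol acts as a floor so that values near zero can still match."
  [a b rel-tol abs-tol]
  (let [a (double a)
        b (double b)]
    (<= (Math/abs (- a b))
        (max abs-tol
             (* rel-tol (max (Math/abs a) (Math/abs b)))))))

(defn matrix-close?
  "Element-wise comparison of two matrices given as vectors of row vectors.
  Each pair of elements gets its own threshold, scaled to its magnitude."
  [m1 m2 rel-tol abs-tol]
  (and (= (count m1) (count m2))
       (every? true?
               (map (fn [r1 r2]
                      (and (= (count r1) (count r2))
                           (every? true?
                                   (map #(close? %1 %2 rel-tol abs-tol) r1 r2))))
                    m1 m2))))

;; Possible use inside an fdef (tolerances are placeholders):
;; :fn #(matrix-close? (:ret %) (-> % :args :matrix) 1e-9 1e-12)
```

With this shape, a matrix containing both 1e100 and 1e1 is checked with a different absolute threshold per element, while the relative tolerance stays the same.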

gfredericks15:02:47

what does such-that have to do with it?

nblumoe15:02:57

using ratios of the inputs and returns was how I tried to circumvent the issues....

nblumoe15:02:18

quote from above: “I tried a spec using ratios but this ends up with ‘Couldn’t satisfy such-that’ issues”

gfredericks15:02:22

asserting the ratio should be close to 1?

gfredericks15:02:50

are you using such-that directly, or is it generated by spec because of a spec/and?

nblumoe15:02:12

the such-that issue basically means that the generator was not able to find valid data within 100 tries, which can happen when the spec is quite specific and the search space for data is rather large

nblumoe15:02:34

(lol pun not intended)

nblumoe15:02:28

indeed I am using s/and

gfredericks15:02:30

I'm just not seeing why the "using ratios" idea entails modifying the input spec; I'd imagine it just means modifying the :fn on the spec

nblumoe15:02:11

yeah I was also surprised that changes to :fn could result in those issues

gfredericks15:02:44

I'm 80% sure changes to :fn should be independent of the probability of such-that errors; what does your s/and look like?

nblumoe15:02:08

I would have thought that only :args could have such an effect. is anything generated for the :fn itself maybe?!?

nblumoe15:02:58

:args is a bit complicated tbh:

:args (s/and (s/cat :matrix (s/and ::matrix
                                     ::range-transformable
                                     ::non-zero-row-sums
                                     ::non-zero-length-rows
                                     #(s/valid? ::non-zero-length-rows (range-transformation %)))
                      :const-row-sum (s/int-in 1 100)
                      ;; upper limit of eigenvectors should not be hardcoded
                      :num-eigenvectors (s/with-gen pos-int?
                                          #(s/gen (set (range 0 20)))))
               #(= (:num-eigenvectors %)
                   (min (count (first (:matrix %)))
                        (count (:matrix %))))
               #(s/valid? ::non-zero-row-sums (-> (:matrix %)
                                                  range-transformation
                                                  (initial-loadings (:num-eigenvectors %))))) 

gfredericks15:02:03

the such-that errors might be nondeterministic -- are you sure you don't get them with the non-ratio code?

nblumoe15:02:15

usually that does not cause any such-that issues on generation, works pretty flawlessly except for the aforementioned changes to :fn

gfredericks15:02:16

maybe your test fails before it's likely to encounter them?

nblumoe15:02:42

can test again… how many tries should I give it? 😉

gfredericks15:02:59

I'm not sure what you're asking

nblumoe15:02:10

sry, edited

gfredericks15:02:57

I think just try s/exercise on the args spec

gfredericks15:02:17

(s/exercise (s/and (s/cat ...) ...) 10000)

gfredericks15:02:41

probably wrap that in (count ...) 🙂

gfredericks15:02:47

to avoid printing big things

nblumoe15:02:04

indeed! it occasionally fails, so the issue is rather the :args spec and the generator....

gfredericks16:02:32

it could be either of the s/ands, or both

gfredericks16:02:14

the way s/and works as a generator is to generate something from the first spec and then filter on the rest of them; so the generator of the first spec needs to be highly likely to generate something that passes all of the predicates, or you get the such-that error
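
The generate-then-filter behaviour described above can be observed with two toy specs. This is a sketch that assumes the `clojure.spec.alpha` namespace and test.check on the classpath; the spec names and the `exercisable?` helper are hypothetical and only serve the illustration.

```clojure
(require '[clojure.spec.alpha :as s])

;; Values come from the generator of int?, then even? filters them.
;; About half of the candidates pass, so generation succeeds.
(s/def ::even-int (s/and int? even?))

;; Same first spec, but the filter accepts a single value. The int?
;; generator practically never produces it, so the filter gives up.
(s/def ::needle (s/and int? #(= % 123456789)))

(defn exercisable?
  "Hypothetical helper: true when n samples can be generated for spec,
  false when generation throws (for example on a such-that failure)."
  [spec n]
  (try
    ;; count forces the lazy sample without printing large values
    (= n (count (s/exercise spec n)))
    (catch Exception _ false)))

(exercisable? ::even-int 100)
(exercisable? ::needle 1)
```

Running the same helper on each nested `s/and` separately is one way to find out which of them is the one that fails to generate.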

gfredericks16:02:05

sometimes you can simply restructure the s/and to tailor to that, and it works out; other times the only way to get a good generator is to write a custom one for the s/and

gfredericks16:02:23

since you have nested s/ands it's not obvious which one is the problem

gfredericks16:02:29

I'd try exercising the inner one and see what you get

nblumoe16:02:09

ok thanks, that is valuable information! I already have a custom generator for ::matrix but that is not really tailored in any way to satisfy the following predicates. will do some exercises 🙂

nblumoe16:02:58

great! the outer s/and was the issue: the second predicate for :num-eigenvectors was apparently failing too often
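
One way to avoid filtering on a predicate like this is a custom generator that derives the dependent argument from the matrix instead of generating it independently. The sketch below uses a deliberately simplified, hypothetical stand-in for the specs in the log: `::cell`, `::matrix`, `::args` and `n-components` are assumptions for illustration, and the additional domain predicates from the original `:args` spec are left out.

```clojure
(require '[clojure.spec.alpha :as s]
         '[clojure.spec.gen.alpha :as gen])

;; Simplified stand-in for the real ::matrix spec: a non-empty vector
;; of equally long, non-empty rows of finite doubles.
(s/def ::cell (s/double-in :min -1e6 :max 1e6 :NaN? false :infinite? false))

(s/def ::matrix
  (s/with-gen
    (s/and (s/coll-of (s/coll-of ::cell :kind vector? :min-count 1)
                      :kind vector? :min-count 1)
           #(apply = (map count %)))
    ;; pick the dimensions first, then build rows of exactly that length
    (fn []
      (gen/bind (gen/tuple (gen/choose 1 5) (gen/choose 1 5))
                (fn [[rows cols]]
                  (gen/vector (gen/vector (s/gen ::cell) cols) rows))))))

(defn n-components
  "Smaller of the row count and the column count of matrix m."
  [m]
  (min (count m) (count (first m))))

(s/def ::args
  (s/with-gen
    (s/and (s/cat :matrix ::matrix :num-eigenvectors pos-int?)
           #(= (:num-eigenvectors %) (n-components (:matrix %))))
    ;; derive :num-eigenvectors from the generated matrix, so the
    ;; equality predicate holds by construction
    #(gen/fmap (fn [m] [m (n-components m)]) (s/gen ::matrix))))

(count (s/exercise ::args 1000))
```

The spec still validates what the generator produces, so the predicate remains the source of truth. To cover fewer eigenvectors as well, the `gen/fmap` step could presumably be replaced by a `gen/bind` that picks a value with `(gen/choose 1 (n-components m))`, with the predicate relaxed to `<=`.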

nblumoe16:02:36

that specific predicate also only covers the case where the number of eigenvectors matches either the number of columns or the number of rows. that is only one specific case though (the one in which the input data will be reproduced). I need to think about how to cover the usage with fewer eigenvectors than that… the invariant for the generative testing seems to be quite tricky then