Fork me on GitHub
#test-check
<
2017-04-26
>
gfredericks12:04:12

I'm working on a biginteger generator for test.check, presumably to be used by spec as well (directly or indirectly); the main decision to make is what the distribution should be. Here's what I've got so far, interested in any comments. https://gist.github.com/gfredericks/b6b59f1c531dc36017e45f2f0beeff9e

bbloom18:04:31

let’s say i want to test some stateful thing by generating a sequence of actions to perform

bbloom18:04:42

how do i use state to inform future generated actions?

bbloom18:04:06

simple example: let’s say i want to test a growable array class with three methods: getLength, append, and getNth

bbloom18:04:24

that was easy to test, but as soon as i wanted to test setNth, i was at a loss

bbloom18:04:40

how do i make it so that setNth can check getLength first?

bbloom18:04:22

my current hack is to just do modulo math at the time of applying the action - but that only works for this simple example & i have more complex stuff i want to test

bbloom18:04:06

my best guest is to try defining generators recursively, deferred by an fmap or bind or something - but wasn’t sure if that was a hack or recommended or what

bbloom18:04:44

ie parameterize them based on a model of the state

Alex Miller (Clojure team)18:04:40

the general approach is to first build a (random) model, then produce the action list via fmap/bind

bbloom18:04:44

not 100% sure i follow. is there a super simple example somewhere?

gfredericks18:04:28

There's a whole lib for this kind of thing I think

bbloom18:04:00

i’m looking at two such libraries now, trying to see if they clarify things for me: stateful-check and states

gfredericks18:04:19

I haven't used either, but I've reviewed stateful-check a bit and it seems solid

bbloom18:04:43

there’s a non-trivial amount of code in this lib. i’ll study it, but i’m hoping to identify the essence of it.

gfredericks18:04:40

@bbloom do you understand the idea of modeling the whole interaction ahead of time?

Alex Miller (Clojure team)18:04:21

it’s been a while since I’ve looked at collection-check but I know it does a lot of this operation kind of thing (can’t remember if it’s stateful though)

bbloom18:04:38

so i haven’t used test-check in anger at all, but i did successfully roll my own generative/simulation testing thing in go for a wire protocol. but that was a hierarchical temporal marckov model, i didn’t do shrinking, and i did all the validation later on the log file - worked mostly as a stress test for a long running system

bbloom18:04:28

to get multiple tests, i didn’t have a backtracking generator type thing - instead just ran a bunch in parallel, since they were also testing limited hardware resources (ie physical devices)

gfredericks18:04:41

I think the shrinking model is the biggest reason that the generate-everything-up-front approach works best

bbloom18:04:41

so i have a bit of a grasp on the concepts, but no knowledge of the specific apis

bbloom18:04:03

sooo now back to your question: i’m not sure what you mean “modeling the whole itneraction ahead of time”

gfredericks18:04:11

the only viable alternative I've seen is the python/hypothesis approach which is extremely imperative and wildly different

bbloom18:04:16

@alexmiller: i’m looking at collection-check, as what i’m doing is quite similar. thanks

gfredericks18:04:19

@bbloom I mean the generator generates the entire intended interaction, e.g. - insert a - insert b - read 0, expect a - read 1, expect b - etc.

gfredericks18:04:32

I believe this is essentially the idea behind stateful-check

gfredericks18:04:44

it does all the wiring-up for you

bbloom19:04:10

yeah, so i’m trying to test some java code that is stateful & i’m successfully generating a sequence of actions & applying them

bbloom19:04:44

the tricky bit is the bit that stateful-check seems to address: using the state of the model in order inform the generators

gfredericks19:04:48

"Why can't I call generators in the middle of my test?" is a common thing people run into; currently test.check doesn't try to give you a way to do that, but I'm not 100% convinced it can't be done

gfredericks19:04:00

I've thought about it a lot; the shrinking is the hard part

bbloom19:04:38

the fact that it’s asked for a lot leads me to wonder: is it a misunderstanding that leads to it being wanted? or it just hard to provide?

bbloom19:04:44

ie should i be thinking about it differently?

Alex Miller (Clojure team)19:04:32

We've done a number of sim testing projects at Cognitect that pushed this kind of thing really far

Alex Miller (Clojure team)19:04:05

But there you don't really care about shrinking at all

bbloom19:04:51

glad to hear that - since i had no use for shrinking in my sim test and wondered if i was just missing out

Alex Miller (Clojure team)19:04:15

I'd ask @luke about some of those ideas - he has some libs that I think are oss that do statistical / stateful generative stuff

bbloom19:04:22

this is the first time i wanted to test something that felt complex enough to justify test check and not complex enough to justify distributed sim testing

gfredericks19:04:25

it should be possible to write a thingamajigger to call generators in the test deterministically, without doing any shrinking

Alex Miller (Clojure team)19:04:42

In general with sim testing you want some simple (ish) model how users use the system such that you can generate realistic (ish) streams of random activity

bbloom19:04:27

yeah - in my case, i simulated a police officer on patrol. turning cameras on and off, watching videos, driving to and from places with or without wifi, etc

bbloom19:04:39

worked nicely

Alex Miller (Clojure team)19:04:17

Our domains were a lot more complicated :)

bbloom19:04:29

i’d imagine

Alex Miller (Clojure team)19:04:54

Cognitect does arch consult gigs, just saying :)

gfredericks19:04:45

like...CPU chips? big buildings?

gfredericks19:04:53

large-scale computer systems?

bbloom19:04:53

i don’t work on that any more, but the team seems quite happy with the architecture i built for them - which is largely clojure/hickey-inspired, thank you very much 😉

bbloom19:04:13

@gfredericks i don’t quite understand the api well enough yet to know if that code is useful to me

gfredericks19:04:26

@bbloom I added an example

gfredericks19:04:44

the idea is you generate one of these fs and then you can call it from the body of your property as many times as you want with arbitrary generators

gfredericks19:04:35

it's a stateful function, whose state is randomness derived from the normal test.check source of randomness, so the whole thing should be deterministic as long as you don't call the function in nondeterministic ways

bbloom19:04:12

i’m not sure i need something that sophisticated

gfredericks19:04:30

so e.g., if you have your stateful java collection and want to generate a valid index, you'd call (f (gen/choose 0 (dec (.getLength thingamajig))))

gfredericks19:04:00

and that call would return an index between 0 and (dec (.getLength thingamajig))

gfredericks19:04:18

the sophistication is mostly to satisfy my own ideals of reproducibility

gfredericks19:04:40

if you couldn't care less about that you can just call gen/generate and not bother with any of this

bbloom19:04:10

so i don’t actually need to call .getLength on the real stateful object during test execution

bbloom19:04:19

i just want to track some state in the model & use that to inform which actions to generate

bbloom19:04:51

looking at collection-check, they seem to use clojure’s types as a model

bbloom19:04:15

but zach does basically what i mentioned earlier: he generates wild indexes for nth/assoc and then fixes them up during execution

bbloom19:04:00

how much am i expected to know about the rose tree stuff in order to make effective use of test-check? this stateful-check thing manipulates it quite a bit, plus has some gen-do monadic bind syntax etc that seems redundant w/ fmap, gen/let, etc

gfredericks19:04:41

users don't normally need to care about the rose tree

gfredericks19:04:28

my guess is that stateful-check's low-level internals are an attempt to shrink better than the naive shrinking you get by making a pile of binds

gfredericks19:04:46

gen/bind is pretty dumb about shrinking

gfredericks19:04:11

and stateful-check might be able to make assumptions that bind can't

gfredericks19:04:53

maybe a simple example of generating a model with actions would be helpful

gfredericks19:04:28

gimme a minute

bbloom19:04:43

thanks for your help. greatly appreciated

bbloom19:04:05

i’m still reading and trying to make sense of stateful-check, but some things concern me

bbloom19:04:29

for example, command argument generation seems to attempt to implicitly and recursively lift values to be generators

bbloom19:04:45

which seems like magic i don’t want, at least not before i am comfortable w/ the monadic api

bbloom19:04:20

meanwhile, i’m failing to extract the essence from the core of the generate-commands* routine

gfredericks19:04:04

yeah I've never liked the values-can-be-generators idea

gfredericks19:04:23

in practice it's probably not too confusing

bbloom19:04:56

….maybe with a type system? 😛

gfredericks19:04:00

could fancy it up to add the intermediate state at each step, or generate random assertions if you don't want to just compare the whole map

gfredericks19:04:12

note that the gen/lets are where bind is happening

gfredericks19:04:12

there might be stack problems with this; maybe that's another reason for stateful-check's complexity

gfredericks19:04:26

e.g., if you wanted to generate 10000 actions you might have problems

bbloom19:04:33

yeah - so i tried something like this and got a stackoverflow

bbloom19:04:59

but thanks, this example is much closer to my intuition

gfredericks19:04:20

I just generated 20,000 of these up to the normal max-size of 200 and got no exceptions

gfredericks19:04:49

looks like the sizes don't get above 20 though

gfredericks19:04:52

so that might be bad

gfredericks19:04:34

that's probably why it's not SOing 🙂

gfredericks19:04:50

to get larger you'd use gen/frequency instead of gen/one-of and tweak the weights

gfredericks19:04:02

alternately you could generate a size up front and pass that through the recursion

bbloom19:04:02

well you had equal assoc/dissoc frequency

bbloom19:04:25

i wasn’t doing any dissoc to reduce the size, so it grew faster

gfredericks19:04:02

I think stack problems would be related to action-count, not the size of the state

bbloom19:04:36

oh, hmm that’s right

bbloom19:04:47

not sure what i did, since it’s like 5000 undos ago 🙂

bbloom19:04:10

with respect to shrinking: how well does that work as things grow more complex?

bbloom19:04:35

i’m somewhat skeptical it can work well for stateful tests - since each removed operation can potentially invalidate the remainder of the operations

gfredericks19:04:42

that's exactly the problem

gfredericks19:04:15

if you're using the naive bind structure in my example, what would happen is if test.check tries to shrink an earlier action it will generate a totally new set of following actions

gfredericks19:04:28

it might be that in practice this isn't as bad as it sounds

gfredericks19:04:44

I get almost 0 complaints about this

bbloom19:04:30

which of course could mean 1) it’s perfect 2) no body is using it or 3) people don’t understand it well enough to complain about it without embarassing themselves like i’m doing now 🙂

gfredericks19:04:31

I made a ticket about it at least, but don't know if it will lead anywhere https://dev.clojure.org/jira/browse/TCHECK-112

gfredericks19:04:57

2 isn't plausible; people are definitely using bind, at least via gen/let

gfredericks19:04:08

in fact gen/let tends to encourage over using bind

gfredericks19:04:41

i.e., using bind to combine two independent generators in a way that you could do with gen/tuple

bbloom19:04:23

does tuple do “parallel” bind? ie it shrinks from any position fairly?

bbloom19:04:39

and bind is “left biased”?

gfredericks19:04:02

that...might be right

gfredericks19:04:08

tuple shrinks each thing independently

gfredericks19:04:19

you can always rewrite tuple with bind but not vice versa

bbloom19:04:22

gotcha - so tuple is applicative

gfredericks19:04:32

that's probably true

bbloom19:04:58

ok - so i guess i’ve confirmed 1) i understand this and 2) i still have no idea how to use the API 😛 thanks

bbloom19:04:11

i’ll keep playing with it

bbloom19:04:21

will let you know how it turns out

gfredericks19:04:24

oh well; do let me know if you have more questions or accusations

bbloom19:04:09

heh, i hope the accusations thing was a joke and/or not about me. let me know if i failed to politely convey my gratitude!

gfredericks19:04:33

oh yeah sorry, definitely a joke

bbloom20:04:05

so shrinking only removes elements from the reproduction, right? no rewriting in any way?

bbloom20:04:50

i’m attempting a “loose” approach to action generation - will leave state out of that part & use state in the code that “runs” the actions. that will screw up shrinking probably, but i’ll deal with that when i have trouble pinpointing an issue in the future

bbloom20:04:56

so like if i were testing that random access stack example from before, actions may just be like “do 5 pushes, then randomly swap elements 8 times, then do two pops, etc….”

bbloom20:04:00

and see how that does for me

bbloom21:04:17

is there a generator that will (with high likelihood?) test boundary cases such as max/min integers? - the large-integer generator doesn’t seem to explicitly select those, so i guess i need to just use one-of or frequency to ensure those cases get covered (which is relevant to me b/c the code under test does some low level binary fiddling sometimes & i want to make sure it doesn’t screw that up)

bbloom21:04:46

i’ve gotta say - the learning curve is relatively steep with this stuff

bbloom21:04:14

but that might just be b/c i’m not comfortable using stuff w/o understanding it - and the monadic stuff makes it a bit opaque

bbloom21:04:57

for example, it wasn’t immediately clear what a “property” really was under the hood (just a generator that produces a particular shape of data) or that it was safe to use side effecting asserts in such a property

bbloom21:04:12

the readme shows an example that returns a boolean

bbloom21:04:30

but if i tested each property individually, my tests would be far too slow

bbloom22:04:45

while i’m rambling to myself (or whoever is listening?):

bbloom22:04:27

i’m not quite happy with my attempts to generate action sequences yet - i’m getting more variability in parameters and less variability in the sequence of actions - i’m not sure how to influence that properly yet

bbloom22:04:59

to use the stack example again, i’m getting a lot of tests that try just using different elements to push on the stack, but really what i want is a lot more tests that have longer sequences of pushes/pops/etc

bbloom22:04:48

the internal generation strategy doesn’t appear to be documented anywhere. is it a top-down strategy? or a bottom-up one? or some hybrid? i can’t really tell easily, nor what the implications of that would be

gfredericks23:04:16

https://clojurians.slack.com/archives/C0JKW8K62/p1493238965936535 If you're talking about something like the event modeling code I shared earlier, new events CAN be generated during shrinking

bbloom23:04:47

how does that work?

gfredericks23:04:48

So bind generates a value from one generator, then uses the user supplied function to create a new generator and generates a value from that

gfredericks23:04:57

That much is straightforward?

gfredericks23:04:15

There are two ways to shrink from there

gfredericks23:04:46

You can shrink the thing generated in the second step - that's the easy way

gfredericks23:04:04

The hard way is to shrink the thing generated in the first step, because then you're obligated to call the user function with the new smaller value, which returns a potentiality entirely new generator that may have nothing to do with the original second-step value

gfredericks23:04:18

In any case you can't assume they're related

gfredericks23:04:02

So the only way to proceed is to generate something entirely new and probably unrelated from that new generator

gfredericks23:04:50

I suppose you could at least use the same randomness, just in case that helps. I think test.check already does that

bbloom23:04:24

i guess i don’t understand the shrinking process at all then - b/c what you’re saying here doesn’t make sense to me

bbloom23:04:06

is it just that shrinking re-runs the generators in some “shrink mode” and can do whatever the hell it wants?

gfredericks23:04:13

So shrinking is not an independent operation on generated values -- it's essentially done at generation time, by each generator in the composition

gfredericks23:04:42

This is a difference from haskell as I understand it

bbloom23:04:18

so is shrinking just “re-generate, but smaller”?

bbloom23:04:29

i would have expected generate & shrink to be two methods on a protocol

gfredericks23:04:55

It's not - a generator returns a Very Large lazy tree of values

gfredericks23:04:22

The root is the generated value; each node's children are ways to shrink from the value at that node

gfredericks23:04:32

The shrinking algorithm walks that tree

gfredericks23:04:54

Higher order generators combine these trees

gfredericks23:04:10

E.g., gen/fmap calls rose/fmap

bbloom23:04:15

ah, see, my mental model was one of producer/consumer

bbloom23:04:20

emitting generated values

bbloom23:04:28

totally wrong apparently 😛

gfredericks23:04:59

I can't remember if reid came up with it himself, but people don't seem to expect it

gfredericks23:04:17

I think maybe he guessed the impl based on erlang's API

gfredericks23:04:51

So perhaps john hughes is the source

gfredericks23:04:06

My talk Purely Random is about a lot of these details and about how I converted test.check to use an immutable rng

bbloom23:04:20

i might have to watch that

gfredericks23:04:18

I wish I knew how to get slack to notify me of any activity in this channel

gfredericks23:04:43

Maybe I just did that

bbloom23:04:51

so if i do (gen/sample (gen/list gen/int)) for a large sample size - and take the max of the lengths of the lists, it seems to stop at 99

bbloom23:04:55

i have no idea what controls that value

bbloom23:04:01

and i have no idea how to go about figuring it out lol

gfredericks23:04:31

There's a doc page in the repo on growth and shrinking

gfredericks23:04:15

The other thing people don't usually expect is that growth and shrinking are not related

bbloom23:04:52

the word “grow” does not appear in any of the .md files in the doc directory 😛

gfredericks23:04:08

Should be in one of the titles

bbloom23:04:44

nope - maybe it’s somewhere else? i’ve read all the docs in the repo that i could find, heh

gfredericks23:04:41

It's new, maybe you didn't have it locally

bbloom23:04:16

ah, sorry - my bad. my last pull failed, i have code from december

bbloom23:04:33

will read that

bbloom23:04:33

lol, i had code checked out from ~2 days before you added that

bbloom23:04:55

ok - that doc gave me a much better idea of how growth & size works. thank you!

bbloom23:04:03

however, it didn’t give me much insight in to shrinking

bbloom23:04:16

other than the fact that shrinking is unrelated to growth 😛

bbloom23:04:05

also, i’m still not clear on how size is related to whether or not a generator is run

bbloom23:04:12

for example, if i do this: (gen/sample (gen/vector (gen/resize 0 gen/int)))

bbloom23:04:37

i get stuff like [0] multiple times, which i’d expect - but it suggests that the frequency of generator execution is unrelated to the size of the range of the generator

bbloom23:04:25

in order to have any confidence… i guess i’m going to need to understand this rose tree thing….

gfredericks23:04:22

gen/generate can give you better control than sample for playing with sizing

bbloom23:04:13

ok, i think i understand the rose tree / shrinking now….

bbloom23:04:50

Reid’s rationale suggests getting a shrinker “for free” from the generator, but i’m skeptical of that approach vs the two-method approach that apparently haskell uses

gfredericks23:04:26

You're thinking a generator is an implementation of a protocol with one method for generating a value and another for shrinking it?

bbloom23:04:49

i did think that, which is apparently true in QuickCheck according to Reid’s blog post

gfredericks23:04:26

Consider how to implement fmap then

gfredericks23:04:55

(gen/fmap str gen/nat), e.g.

gfredericks23:04:33

How do you shrink "42" given a function that can shrink 42

bbloom23:04:21

same way it does now? seems like it just divides by two in some complex round-about fashion

gfredericks23:04:35

It's a string, not a number

bbloom23:04:06

sorry, i misread

bbloom23:04:26

(defn fmap [f x] (reify IGenerator (gen [this ctx] (f (gen x ctx))) (shrink [this ctx value] (f (shrink x ctx))))

bbloom23:04:00

er sub that second value with x

bbloom23:04:05

or second x with value

bbloom23:04:08

you get the idea

bbloom23:04:20

but yeah, i get my intuition doesn’t match reality

bbloom23:04:12

but as far as i can tell, the public API of test.check doesn’t expose the rose tree stuff

bbloom23:04:22

all the constructors for generators assume you return a value, not a rose tree

bbloom23:04:05

which i guess that explains why stateful-check mucks around with internals a bit