#test-check
2016-07-28
mattly00:07:05

I found a few other places in the actual generation of things where I'm being wasteful

mattly00:07:46

but cutting that out wasn't nearly as effective as getting rid of the distinct requirement

mattly00:07:01

btw, thanks for all the work on test.chuck, checking and subsequence are invaluable tools in my toolbox these days

gfredericks00:07:03

I don't quite understand what you mean by "putting those values in now after-the-fact"

mattly00:07:52

I'll do an example

gfredericks00:07:49

FYI distinct collections are generated by maintaining the elements in a transient set: https://github.com/clojure/test.check/blob/master/src/main/clojure/clojure/test/check/generators.cljc#L546
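The strategy in that link can be sketched in plain Clojure (heavily simplified: the hypothetical `gen-distinct-ints` below ignores test.check's sizing, retry limits, and give-up behavior, and just illustrates the transient-set bookkeeping):

```clojure
;; Sketch of distinct generation: keep a transient set of elements
;; seen so far and only accept values not already in it.
(defn gen-distinct-ints
  "Returns a vector of n distinct random ints in [0, bound)."
  [n bound]
  (loop [seen (transient #{})
         out  []]
    (if (= (count out) n)
      out
      (let [x (rand-int bound)]
        (if (contains? seen x)
          (recur seen out)                       ; duplicate: retry
          (recur (conj! seen x) (conj out x))))))) ; accept
```

Every candidate pays for a hash computation and a set lookup, which is the overhead being discussed below.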

mattly00:07:59

oh interesting

gfredericks00:07:35

so it's not inconceivable that the overhead of adding and looking up elements is slowing things down

gfredericks00:07:56

if that's really your issue I have a hard time imagining how to make it faster

mattly00:07:16

(gen/fmap (fn [things]
            (map #(assoc %2 :name (str %1 "-" (:name %2)))
                 (range)
                 things))
          (gen/list (gen/hash-map :name gen/string-alphanumeric)))

mattly00:07:02

I get that, and really for me and my use case it comes down to, am I using the generator for actual values I want to test or just random input?

mattly00:07:20

and in the case of these names, it's just random input that needs to be distinct

gfredericks00:07:03

I suppose you probably have to compute hash values for the data when you wouldn't otherwise

mattly00:07:05

I won't get much value out of shrinking

mattly00:07:41

where I get the value of shrinking here is the number and depth of branches and the leaf values

mattly00:07:50

but not the id

gfredericks00:07:30

FYI gen/uuid is evenly distributed and doesn't shrink, so you could use that for uniqueness without needing to check
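A minimal sketch of that suggestion, assuming test.check is on the classpath (`user-gen` is an illustrative name, not something from the conversation):

```clojure
(require '[clojure.test.check.generators :as gen])

;; gen/uuid draws uniformly-random UUIDs and does not shrink,
;; so ids are (almost surely) unique with no distinct check and
;; none of the transient-set bookkeeping described above.
(def user-gen
  (gen/hash-map :id   gen/uuid
                :name gen/string-alphanumeric))
```

Sampling `user-gen` then yields maps whose `:id` values are effectively guaranteed distinct.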

mattly00:07:08

yeah, that occurred to me as well

mattly00:07:50

but this works better for my use-case

mattly00:07:59

it's... complicated

mattly00:07:46

I'm working on a system to let people do self-serve analytics on a data warehouse, but with a complicated permissions structure on top of it

mattly00:07:58

from my experience doing similar things in the past, I know that scenario-based testing has its own set of gotchas

mattly00:07:59

so I'm basically generating the dimension/fact graph and putting that into our data store, which, well, it's not what I would have chosen

mattly00:07:55

property-based testing of it though has helped me catch a ton of bugs in the prototype I'm replacing that I don't think anyone would have ever thought to look for

mattly06:07:44

having gotten rid of the distinct name requirements across my entire gen'd tree, and doing some other things around the frequency of size and depth of branches, I cut my test suite's run time down to 1/10th of what it was before

mattly06:07:57

and it still shrinks awesomely

mattly06:07:33

I also included in the trunk generator some flags to turn off certain branches of the tree if they're not needed for a test

gfredericks14:07:34

@mattly do you think parallelizing tests and/or generators would help out?

gfredericks14:07:57

I've worked on parallelizing tests before, but it just occurred to me that slow generators could be parallelized even if the tests themselves can't

gfredericks14:07:53

E.g., during the test run you have one or more background threads doing the generating while the main thread does the actual tests
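That idea could look roughly like this (a hypothetical sketch: `value-queue` and `start-producer` are illustrative names, not test.check API, and `rand-int` stands in for a slow generator):

```clojure
(import '[java.util.concurrent ArrayBlockingQueue])

(defn start-producer
  "Starts a daemon thread that fills q with values from (make-value)."
  [^ArrayBlockingQueue q make-value]
  (doto (Thread. (fn [] (while true (.put q (make-value)))))
    (.setDaemon true)
    (.start)))

;; Bounded queue: .put blocks when full, so the producer can't
;; race arbitrarily far ahead of the consuming test thread.
(def value-queue (ArrayBlockingQueue. 16))
(start-producer value-queue #(rand-int 100))

;; Main test thread just takes pre-generated values:
(def v (.take value-queue))
```

The bounded capacity gives natural backpressure; generation overlaps with test execution instead of alternating with it.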

mattly14:07:20

I'm not sure, tbh

mattly14:07:29

One thing I'm looking into now, after seeing a deeper branch end up with 100k+ nodes by the time it starts to fail, is exponential scaling of node sizes

mattly14:07:51

which would actually fit the shape of the data I'm trying to model well

gfredericks16:07:40

You're saying you like it being so large?

mattly17:07:52

eh, well, that's the count of something akin to (gen/vector (gen/vector (gen/vector gen/int))) but flattened; and I've found a few bugs that have only manifested when the node size gets that large

mattly17:07:20

shrinking, of course, will end up narrowing that down to like the 2 or 3 end nodes that cause the failure

mattly17:07:43

and really it's more due to the nature of the data I'm working with / replicating, and the complex query sets I have to run on top of them

mattly17:07:04

as I find specific cases like that I tend to break out specific tests for that data scenario