I've noticed that gen/large-integer* does not distribute uniformly when providing small bounds. This is not surprising once you read its implementation, but has anyone tackled this problem before? Here's the kinds of distributions with a range of 13 numbers, using gen/large-integer* vs gen/elements https://github.com/metosin/malli/issues/1291
I have some stuff here that isn't a drop in solution but may help https://git.sr.ht/~hiredman/surmise/tree/master/item/src/com/manigfeald/surmise.clj#L124 I think it should be uniformly distributed, but I believe it tops out in the range of longs
IIUC your approach looks similar in some ways, in that it compresses 64 bits of randomness into a smaller size. I think this is what skews bounded large-integer's distribution, can you explain your intuition why yours is uniformly distributed where gen/large-integer** isn't?
Have you looked into having the range change the number of bits we generate? e.g., have max-bit-count in gen/large-integer** be dependent on min/max args?
My intuition(possibly wrong) is that if you start with a random number from a uniform distribution between bounds (in this case min and max long) and scale it to be within another set of bounds, it should be uniformly distributed in that other set of bounds
https://git.sr.ht/~hiredman/surmise/tree/master/item/src/com/manigfeald/surmise.clj#L86
user=> (clojure.pprint/pprint (into (sorted-map) (frequencies (take 100000 (repeatedly #(clojure.test.check.generators/generate (s/choose 60 72)))))))
{60 8375,
61 8273,
62 8188,
63 8269,
64 8305,
65 8368,
66 8325,
67 8359,
68 8314,
69 8307,
70 8489,
71 4258,
72 4170}
nil
user=>
interesting, I am missing the high endre: bits, I think the interfaces to randomness sources tend to be something like "get a random byte" so going from that to random bits is still a scaling problem?
hah, test.check may lready be doing that for floating point https://github.com/clojure/test.check/blob/master/src/main/clojure/clojure/test/check/generators.cljc#L1124
Aha https://docs.oracle.com/javase/8/docs/api/java/util/Random.html#nextLong-- only being 48 bits may be what is killing my distribution
Since the range does not overlap with zero I think in this case s/choose is using choose-within-offside-range
offside range tweaks the range so it does include zero and calls choose' again
My wife helped me talk through why shifting bits until you're in the range does not result in a uniform distribution (large-integer's impl). Sharing because it helped me. Say you're going from 2 bit numbers to 1 bit numbers, you're 75% likely to get a 1 and 25% likely for 0.
11 -> 01
10 -> 01
01 -> 01
00 -> 00OTOH if you select the last 1 bit it's uniform.
11 -> 01
10 -> 00
01 -> 01
00 -> 00
I wonder if the bit shifts in choose-within-offside-range are hitting this kind of problem.
the shifts there don't really change the distribution though, they are about transforming a range centered on 0 to one that is not (or the other way around, been a bit since I really stared at this thing)
they accomplish a mapping between {... -1 0 1 ...} and {0 1 ...}
is it uniform if you choose a range that is a power of 2?
I don't think so
the non-uniformity is in the highest numbers, but folded that is likely the highest and the lowest, so the bounds of the scale, I sort of wonder if the double and round thing I do is blurring the edges of the scaling
user=> (clojure.pprint/pprint (into (sorted-map) (frequencies (take 100000 (repeatedly #(clojure.test.check.generators/generate (s/choose -10 10)))))))
{-10 2443,
-9 5043,
-8 4979,
-7 4953,
-6 5022,
-5 4998,
-4 4940,
-3 5021,
-2 4981,
-1 5025,
0 5060,
1 5087,
2 4946,
3 5058,
4 4988,
5 4903,
6 4943,
7 4906,
8 5017,
9 5142,
10 2545}
nil
user=>
yeah with a 0 centered range the highest and the lowest are out of whack