test-check

2026-06-17T03:06:31.942899Z

I've noticed that gen/large-integer* does not distribute uniformly when providing small bounds. This is not surprising once you read its implementation, but has anyone tackled this problem before? Here's the kinds of distributions with a range of 13 numbers, using gen/large-integer* vs gen/elements https://github.com/metosin/malli/issues/1291

2026-06-17T04:53:58.706869Z

I have some stuff here that isn't a drop in solution but may help https://git.sr.ht/~hiredman/surmise/tree/master/item/src/com/manigfeald/surmise.clj#L124 I think it should be uniformly distributed, but I believe it tops out in the range of longs

👏 1
2026-06-17T23:55:31.532269Z

IIUC your approach looks similar in some ways, in that it compresses 64 bits of randomness into a smaller size. I think this is what skews bounded large-integer's distribution, can you explain your intuition why yours is uniformly distributed where gen/large-integer** isn't? Have you looked into having the range change the number of bits we generate? e.g., have max-bit-count in gen/large-integer** be dependent on min/max args?

2026-06-18T00:34:41.139529Z

My intuition(possibly wrong) is that if you start with a random number from a uniform distribution between bounds (in this case min and max long) and scale it to be within another set of bounds, it should be uniformly distributed in that other set of bounds

2026-06-18T00:47:01.644029Z

user=> (clojure.pprint/pprint  (into (sorted-map) (frequencies (take 100000 (repeatedly #(clojure.test.check.generators/generate (s/choose 60 72)))))))
{60 8375,
 61 8273,
 62 8188,
 63 8269,
 64 8305,
 65 8368,
 66 8325,
 67 8359,
 68 8314,
 69 8307,
 70 8489,
 71 4258,
 72 4170}
nil
user=>
interesting, I am missing the high end

2026-06-18T00:50:05.092379Z

re: bits, I think the interfaces to randomness sources tend to be something like "get a random byte" so going from that to random bits is still a scaling problem?

2026-06-18T00:52:45.337129Z

hah, test.check may lready be doing that for floating point https://github.com/clojure/test.check/blob/master/src/main/clojure/clojure/test/check/generators.cljc#L1124

2026-06-18T01:18:13.033559Z

Aha https://docs.oracle.com/javase/8/docs/api/java/util/Random.html#nextLong-- only being 48 bits may be what is killing my distribution

2026-06-18T01:26:58.162709Z

Since the range does not overlap with zero I think in this case s/choose is using choose-within-offside-range

2026-06-18T01:29:17.324689Z

offside range tweaks the range so it does include zero and calls choose' again

👍 1
2026-06-18T01:30:19.124579Z

My wife helped me talk through why shifting bits until you're in the range does not result in a uniform distribution (large-integer's impl). Sharing because it helped me. Say you're going from 2 bit numbers to 1 bit numbers, you're 75% likely to get a 1 and 25% likely for 0.

11 -> 01
10 -> 01
01 -> 01
00 -> 00

🙌 1
2026-06-18T01:34:37.824319Z

OTOH if you select the last 1 bit it's uniform.

11 -> 01
10 -> 00
01 -> 01
00 -> 00

2026-06-18T01:37:49.213999Z

I wonder if the bit shifts in choose-within-offside-range are hitting this kind of problem.

2026-06-18T01:40:28.478449Z

the shifts there don't really change the distribution though, they are about transforming a range centered on 0 to one that is not (or the other way around, been a bit since I really stared at this thing)

👍 1
2026-06-18T01:41:24.452679Z

they accomplish a mapping between {... -1 0 1 ...} and {0 1 ...}

2026-06-18T01:41:28.779859Z

is it uniform if you choose a range that is a power of 2?

2026-06-18T01:43:11.010609Z

I don't think so

2026-06-18T01:44:03.777899Z

the non-uniformity is in the highest numbers, but folded that is likely the highest and the lowest, so the bounds of the scale, I sort of wonder if the double and round thing I do is blurring the edges of the scaling

2026-06-18T01:44:56.174569Z

user=> (clojure.pprint/pprint  (into (sorted-map) (frequencies (take 100000 (repeatedly #(clojure.test.check.generators/generate (s/choose -10 10)))))))
{-10 2443,
 -9 5043,
 -8 4979,
 -7 4953,
 -6 5022,
 -5 4998,
 -4 4940,
 -3 5021,
 -2 4981,
 -1 5025,
 0 5060,
 1 5087,
 2 4946,
 3 5058,
 4 4988,
 5 4903,
 6 4943,
 7 4906,
 8 5017,
 9 5142,
 10 2545}
nil
user=>

2026-06-18T01:45:11.207869Z

yeah with a 0 centered range the highest and the lowest are out of whack

👌 1