#datascript
2017-03-20
thedavidmeister02:03:15

For example, the number of random version 4 UUIDs which need to be generated in order to have a 50% probability of at least one collision is 2.71 quintillion. This number is equivalent to generating 1 billion UUIDs per second for about 85 years, and a file containing this many UUIDs, at 16 bytes per UUID, would be about 45 exabytes, many times larger than the largest databases currently in existence, which are on the order of hundreds of petabytes.
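(Those figures follow from the standard birthday-bound approximation; a version 4 UUID carries 122 random bits, so for collision probability p:)

```latex
% birthday bound over 2^{122} equally likely values, inverted for n
n(p) \approx \sqrt{2 \cdot 2^{122} \cdot \ln\tfrac{1}{1-p}},
\qquad
n(0.5) \approx \sqrt{2^{123} \ln 2} \approx 2.71 \times 10^{18}.
```

At 10^9 UUIDs per second that is about 2.7 × 10^9 seconds, roughly 86 years, and at 16 bytes each about 4.3 × 10^19 bytes, i.e. tens of exabytes, consistent with the quoted figures.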

thedavidmeister02:03:31

you can't guarantee uniqueness without some kind of centralised co-ordination

qqq02:03:31

eh, 50% probability is too high for comfort

qqq02:03:46

I'd prefer "you're more likely to die from an asteroid collision than a uuid collision"

thedavidmeister02:03:12

a 50% chance of at least one collision across 2.71 quintillion samples

qqq02:03:50

that's not a useful metric

qqq02:03:06

a useful metric would be: what is the largest N such that even if you generated N uuids, the probability of a collision is < the chance of dying from an asteroid attack

thedavidmeister02:03:26

Putting a probability number on the chances of being hit by a space rock is difficult, since the events are so rare. Still, Tulane University earth sciences professor Stephen A. Nelson published a paper in 2014 that made the effort. He put the lifetime odds of dying from a local meteorite, asteroid, or comet impact at 1 in 1,600,000.
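(Taking that 1 in 1,600,000 as the threshold qqq asked for, the same birthday approximation can be inverted directly. A minimal sketch in plain Clojure; `uuids-for-collision-prob` is a name made up here for illustration:)

```clojure
;; approximate number of random v4 uuids (122 random bits) that can be
;; generated before the probability of at least one collision reaches p
(defn uuids-for-collision-prob [p]
  (Math/sqrt (* 2.0 (Math/pow 2.0 122.0) (Math/log (/ 1.0 (- 1.0 p))))))

(uuids-for-collision-prob 0.5)             ;; => ~2.71e18, the wikipedia figure
(uuids-for-collision-prob (/ 1 1600000.0)) ;; => ~2.6e15, the asteroid threshold
```

So on those numbers you could still mint a few quadrillion uuids before a collision becomes more likely than the asteroid.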

thedavidmeister02:03:53

the equation is in the wiki article

thedavidmeister02:03:32

actually, i can't find the article now

thedavidmeister02:03:00

but a more useful metric would be considering the chance that a centralised UUID scheme results in collisions thanks to human/technological error

thedavidmeister02:03:35

"dying from asteroid attack" is not a seriously useful anything

thedavidmeister02:03:57

i think you have to propose what alternative exists that doesn't have a non-zero probability of issuing a duplicate
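(For what it's worth, datascript ships both options already; a minimal sketch against the stock datascript.core API, using a fresh in-memory db:)

```clojure
(require '[datascript.core :as d])

(def conn (d/create-conn {}))

;; centralised co-ordination: the db allocates entity ids itself, so they are
;; guaranteed unique within this db, precisely because one allocator issues them
(:tempids (d/transact! conn [{:db/id -1 :name "a"}
                             {:db/id -2 :name "b"}]))
;; => {-1 1, -2 2, :db/current-tx 536870913}

;; no co-ordination: squuids are time-prefixed random uuids, unique with
;; overwhelming probability but never with a hard guarantee
(d/squuid)
```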

thedavidmeister03:03:34

@qqq or, explain what you're doing that has a good chance of producing entity ids every microsecond consistently for 285 years straight in the same datascript db
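(The 285-year figure works out if the limit in question is the 2^53 integers JavaScript can represent exactly, which is what bounds integer entity ids in a cljs datascript db; that reading is an assumption, the log doesn't spell it out:)

```clojure
;; years to exhaust a 2^53 eid space at one id per microsecond
;; (assumes the js 2^53 exactly-representable integer range is the limit)
(/ (Math/pow 2 53) 1e6 31557600) ;; => ~285.4
```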

thedavidmeister03:03:05

with a solid use case, it's more likely that other people can:

thedavidmeister03:03:11

- agree there is a problem to be solved

thedavidmeister03:03:17

- solve the problem without causing new problems

thedavidmeister03:03:01

even twitter "only" receives 6000 tweets per second, based on a quick google
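(At that rate the same 2^53 id space lasts several orders of magnitude longer:)

```clojure
;; years to exhaust a 2^53 eid space at twitter's ~6000 writes per second
(/ (Math/pow 2 53) 6000 31557600) ;; => ~47,570
```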

thedavidmeister03:03:00

and datascript on a single machine cannot physically transact at the rate you're talking, as highlighted by the benchmarks from the 11th