Is it safe to discard bytes from a UUID and still expect it to remain unique?

Question

Is it safe to discard bytes from a UUID and still expect it to remain unique?

I wrote the following module, which encodes a UUID in an arbitrary base:

http://pypi.python.org/pypi/shortuuid/

With the default alphabet it shortens a UUID to 22 characters while preserving uniqueness, but I would like to know how many (and which) digits I could cut off while retaining as much uniqueness as possible.

Are all UUID digits equally random / unique, or are some digits more random than others? For example, if the first few digits identify the machine / application, then they will obviously be less random than the last few. I have not noticed anything like this in my experiments, but I want to be sure before I advise people on it.

If I truncate it to, say, 8 digits, is the probability of a collision 1/57^8, or is the randomness unevenly distributed across the digits?
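For reference, the arithmetic behind that figure can be sketched as follows. This is only an idealization: it assumes every digit is uniformly and independently distributed over a 57-character alphabet, which is exactly the property being asked about. The function name `collision_probability` is made up for this illustration. Note that 1/57^8 is the chance that two specific IDs collide; with many IDs in circulation the birthday effect makes a collision somewhere much more likely.

```python
import math

def collision_probability(n_ids, alphabet_size=57, length=8):
    """Approximate chance that at least two of n_ids random IDs collide.

    Assumes every ID is drawn uniformly from alphabet_size ** length values.
    Uses the birthday approximation 1 - exp(-n(n-1) / (2N)).
    """
    space = alphabet_size ** length
    return -math.expm1(-n_ids * (n_ids - 1) / (2 * space))

space = 57 ** 8
print(space)                      # size of the ID space
print(math.log2(space))           # equivalent number of random bits
print(1 / space)                  # chance that two specific IDs collide
print(collision_probability(10 ** 6))  # chance of any collision among a million IDs
```

Under these assumptions, 8 digits carry a little under 47 bits, so collisions become plausible once millions of IDs are in use.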

+4

python uuid random

Stavros Korokithakis Jan 9 '11 at 17:47

2 answers

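It seems to depend on which UUID version you are dealing with. From version 3 onward (the hash-based versions 3 and 5 and the random version 4), the bits should be fairly evenly distributed, apart from the fixed version and variant bits.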

http://en.wikipedia.org/wiki/Universally_unique_identifier

+1

Jens Schauder Jan 9 '11 at 17:55

Wolph · Accepted Answer · 2011-01-09T18:01:40+0000

Because of the way UUIDs are designed, this is highly version dependent. And yes, some parts will be more random than others; version 1, for example, embeds the MAC address: http://en.wikipedia.org/wiki/Uuid#Version_1_.28MAC_address.29
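A quick way to see the version dependence with Python's standard `uuid` module is sketched below. The node value is a made-up example (normally it would be the machine's MAC address), passed in explicitly only so that the result is reproducible.

```python
import uuid

# Version 1: the last 48 bits hold the node ID, so they never change
# between UUIDs generated with the same node.
node = 0x123456789ABC  # hypothetical node ID, for illustration only
a = uuid.uuid1(node=node)
b = uuid.uuid1(node=node)
print(a)
print(b)
print(a.hex[-12:] == b.hex[-12:])  # the trailing 12 hex digits are identical

# Version 4: random, except for 4 version bits and 2 variant bits,
# which sit at fixed positions in the hex string.
r = uuid.uuid4()
print(r.hex[12])  # version digit
print(r.hex[16])  # variant digit: one of 8, 9, a, b
```

So truncating a version 1 UUID to its trailing digits would keep mostly constant data, while a version 4 UUID has 122 random bits spread over almost all positions.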

One way to work around this is to take a hash (e.g. SHA-256) of the UUID. The hash output should be uniformly distributed.
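A minimal sketch of that idea, assuming a 57-character alphabet like the one mentioned in the question. The alphabet constant and the function name `hashed_short_id` are hypothetical and chosen only for illustration; this is not the shortuuid API.

```python
import hashlib
import uuid

# Hypothetical 57-character alphabet, chosen only for illustration.
ALPHABET = "23456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"

def hashed_short_id(u, length=8, alphabet=ALPHABET):
    """Hash the UUID's 16 bytes with SHA-256, then emit `length` digits."""
    digest = hashlib.sha256(u.bytes).digest()
    number = int.from_bytes(digest, "big")
    base = len(alphabet)
    digits = []
    for _ in range(length):
        # Peel off one base-`base` digit at a time from the low end.
        number, remainder = divmod(number, base)
        digits.append(alphabet[remainder])
    return "".join(digits)

u = uuid.uuid1()  # works the same for any UUID version
print(u)
print(hashed_short_id(u))
```

Hashing spreads whatever entropy the UUID has evenly over the output, so it does not matter which digits are kept. It cannot add entropy, though: an 8-digit result still has at most 57^8 possible values.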

Note that I have not done a thorough analysis here. My answer should be in the right ballpark, but I cannot guarantee that it is completely correct.
