Ability to duplicate a hash when using the first 8 characters of SHA1

Question

Ability to duplicate a hash when using the first 8 characters of SHA1

If I have an index of URLs and their ID by the first 8 characters of the SHA1 hash, what is the likelihood that two different URLs have the same identifiers?

+6

math probability hash-collision sha1

zino May 31 '15 at 18:29

source share

2 answers

The SHA-1 hash has 40 base 16 digits. If you look only at the first 8 of them, then the probability that the second URL will have the same 8 digits is (1/16)^8 ~ 2.32e-10 . In fact, it does not depend on the fact that for a start there should be 40 digits, or even that it is SHA-1. The only assumption you need is that SHA-1 has the first 8 digits independently and equally distributed.

+2

Teepeemm Jun 01 '15 at 1:04

source share

Norman gray · Accepted Answer · 2015-06-01T03:11:55+0000

@Teepeemm answered the related question correctly, "given a certain sequence of 8 hexadecimal digits, what is the likelihood that another SHA-1 hash will appear with the same 8 digits? This is a very small number.

However, another question arises on this question: “Given the large number of sequences with 8 six digits, what is the probability that they will be the same? As the first comment on this question notes, this is due to a paradoxical birthday, which is not“ that that someone in the room has the same birthday as me ?, but instead, "what is the likelihood that two people in this room will have the same birthday? As is well known, the probability that it is 50% , is only 23 people.

The hash collision problem is essentially the same problem, but generalized from N = 365 days to N = 16 8 8-byte sequences, which is about 4.30e9. This is a 'generalized happy birthday problem . Using the expression expressed here (n = sqrt (2 * d * ln (1 / (1-p))), with d = 4.30e9 and p = 0.5, we find a 50% chance of collision with only 77,000 samples. If If you build the corresponding function, you will see that the probability increases quite quickly as the number of tests increases.

Even with 16 bytes of a hash (so d = 16 ^ 16) there is a 50% chance of collision after only 5 billion samples.

Happy birthday

Ability to duplicate a hash when using the first 8 characters of SHA1

More articles: