Why is the most common hashed (SHA1) password prefix "00000"?

I read Troy Hunt's blog post ( https://www.troyhunt.com/ive-just-launched-pwned-passwords-version-2/ ) about a feature called "Pwned Passwords" that checks whether your password is in a database of more than half a billion leaked passwords.

To do this without sending your password, the client code hashes it and sends only the first five characters of the hash; the backend returns all SHA-1 hashes of passwords that begin with that prefix. The client code then performs the comparison itself to check whether the hash of your password is in the database.
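A minimal sketch of this prefix lookup (often called a k-anonymity range query), in Python. To keep it self-contained, the backend is simulated by a local table (`fake_range_backend` and the toy `LEAKED` list are hypothetical). The real service exposes a range endpoint of the form `https://api.pwnedpasswords.com/range/<first 5 hash chars>` returning `SUFFIX:COUNT` lines, but this sketch makes no network call.

```python
import hashlib

def sha1_hex(password: str) -> str:
    """Upper-case hex SHA-1 of the UTF-8 encoded password."""
    return hashlib.sha1(password.encode("utf-8")).hexdigest().upper()

# Hypothetical stand-in for the server: maps each 5-char prefix to the set of
# 35-char suffixes of every leaked hash that falls in that bucket.
LEAKED = ["password", "123456", "qwerty"]
fake_range_backend: dict[str, set[str]] = {}
for pw in LEAKED:
    h = sha1_hex(pw)
    fake_range_backend.setdefault(h[:5], set()).add(h[5:])

def range_query(prefix: str) -> set[str]:
    """Server side: sees only the 5-char prefix, returns every matching suffix."""
    return fake_range_backend.get(prefix, set())

def is_pwned(password: str) -> bool:
    """Client side: send only the prefix, then compare the suffix locally."""
    h = sha1_hex(password)
    prefix, suffix = h[:5], h[5:]
    return suffix in range_query(prefix)

print(is_pwned("password"))                      # True
print(is_pwned("correct horse battery staple"))  # False (not in the toy list)
```

The server only ever learns five hex characters, which match hundreds of leaked hashes, so it cannot tell which password (if any) the client was checking.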

He also published some statistics about these hashed passwords:

  • Each hash prefix from 00000 to FFFFF is populated with data (16^5 = 1,048,576 prefixes)
  • The average number of returned hashes is 478
  • The smallest is 381 (hash prefixes "E0812" and "E613D")
  • The largest is 584 (hash prefixes "00000" and "4A4E8")
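A quick back-of-the-envelope check of these figures, in Python: how many five-hex-digit prefixes exist, and roughly how many hashes the average bucket size implies (an estimate only, since 478 is itself a rounded average).

```python
# Five hex characters with 16 possible values each.
num_prefixes = 16 ** 5

# First and last prefix, formatted the way the service writes them.
first_prefix = f"{0:05X}"
last_prefix = f"{num_prefixes - 1:05X}"

avg_per_prefix = 478  # rounded average reported in the blog post
implied_total = num_prefixes * avg_per_prefix

print(num_prefixes)              # 1048576
print(first_prefix, last_prefix) # 00000 FFFFF
print(f"{implied_total:,}")      # 501,219,328
```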

In the comments, people wondered whether the prominence of "00000" is a coincidence or has a mathematical explanation...

Can someone who understands the SHA1 algorithm explain this to us?

+6
3 answers

Well, since the passwords originally come from data breaches, I think the password table in one of the breached systems was sorted or grouped by the SHA-1 password hash (unsalted, which is how people get their passwords stolen). When that system was breached, the attackers started with the "00000" hashes and simply never got all the way to the end...
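A toy simulation of this hypothesis, in Python, assuming a breached table stored in SHA-1 order whose dump was cut off early. All inputs (`user{i}`, `victim{i}`) and sizes are hypothetical, and buckets use only the first hex digit as a coarse stand-in for the real five-character prefix.

```python
import hashlib
from collections import Counter

def sha1_hex(s: str) -> str:
    return hashlib.sha1(s.encode("utf-8")).hexdigest()

# Hypothetical "background" corpus: hashes spread evenly over all buckets.
background = [sha1_hex(f"user{i}") for i in range(16_000)]

# Hypothetical breached table, stored sorted by its unsalted SHA-1 hash.
# The attackers' dump stops after the first 2,000 rows.
breached_table = sorted(sha1_hex(f"victim{i}") for i in range(16_000))
truncated_dump = breached_table[:2_000]

# Bucket the combined corpus by the first hex digit of each hash.
counts = Counter(h[0] for h in background + truncated_dump)
for digit in "0123456789abcdef":
    print(digit, counts[digit])
```

The low buckets that the truncated dump reaches come out inflated relative to the rest, which is the kind of skew this answer suggests.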

Or maybe the list Troy used included the first part of a SHA-1 rainbow table ( https://en.wikipedia.org/wiki/Rainbow_table )...

Or something like that. The basic idea is that the SHA-1 hash of the password played a part in how the passwords were selected.

+11

This is either a coincidence or (less likely) an artifact of, or an error in, how the results were collected or assembled for publication.

It doesn't look like a significant spike anyway. The distribution described (381 minimum, 478 average, 584 maximum) looks consistent with an even distribution for a sample of this size. A graph of the entire corpus would probably look pretty random.
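One way to check "evenly distributed for this sample size", sketched in Python under the assumption that the hashes are uniformly random: each prefix count is then approximately Poisson with mean 478, and we can ask how many of the 1,048,576 prefixes we would expect to be at least as extreme as the observed 584 and 381. Buckets are treated as independent, which is an approximation.

```python
import math

LAMBDA = 478       # average hashes per prefix (from the post)
BUCKETS = 16 ** 5  # number of five-hex-digit prefixes

def log_pmf(k: int, lam: float) -> float:
    """log P(X = k) for X ~ Poisson(lam), in log space to avoid overflow."""
    return -lam + k * math.log(lam) - math.lgamma(k + 1)

def upper_tail(k: int, lam: float) -> float:
    """P(X >= k); terms beyond k + 2000 are negligible for lam = 478."""
    return sum(math.exp(log_pmf(j, lam)) for j in range(k, k + 2000))

def lower_tail(k: int, lam: float) -> float:
    """P(X <= k)."""
    return sum(math.exp(log_pmf(j, lam)) for j in range(0, k + 1))

sd = math.sqrt(LAMBDA)  # Poisson standard deviation
print(f"max 584: {(584 - LAMBDA) / sd:.2f} sd above the mean")
print(f"min 381: {(LAMBDA - 381) / sd:.2f} sd below the mean")

# Expected number of prefixes (out of ~1M) at least as extreme as observed.
expected_ge_584 = BUCKETS * upper_tail(584, LAMBDA)
expected_le_381 = BUCKETS * lower_tail(381, LAMBDA)
print(f"expected prefixes with >= 584 hashes: {expected_ge_584:.1f}")
print(f"expected prefixes with <= 381 hashes: {expected_le_381:.1f}")
```

The extremes sit roughly 4.8 and 4.4 standard deviations from the mean, and the expected number of prefixes that far out comes to only a few, which fits the two prefixes observed at each end.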

As with any reasonably designed hash algorithm, the characters in SHA-1 output should be uniformly distributed. (If SHA-1 had any such bias, it would be breaking news in the mathematics and cryptography community!)
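A small empirical sketch of that uniformity, in Python: hash a batch of inputs and count how often each hex digit appears as the first character of the digest, with a Pearson chi-square statistic against a uniform distribution. The inputs (decimal strings 0..N-1) and N are arbitrary choices for illustration.

```python
import hashlib
from collections import Counter

N = 200_000  # number of inputs to hash (arbitrary)
HEX_DIGITS = "0123456789abcdef"

# Count the first hex character of each SHA-1 digest.
first_digit = Counter(
    hashlib.sha1(str(i).encode("ascii")).hexdigest()[0] for i in range(N)
)

expected = N / 16
# Pearson chi-square statistic against a uniform distribution over 16 digits.
chi2 = sum((first_digit[d] - expected) ** 2 / expected for d in HEX_DIGITS)

for d in HEX_DIGITS:
    print(d, first_digit[d])
print(f"chi-square = {chi2:.1f} with 15 degrees of freedom (typically around 15)")
```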

+8

Someone will have to check my assumptions about the SHA-1 algorithm (and Troy may already have refuted this, since according to his reply on the blog he worked from the passwords [in plain text]), but since passwords consist only of letters, digits, and a limited set of symbols, the ASCII bytes that go into the hash will ALWAYS start with a ZERO bit (ASCII codes run from 0 to 127, and the letters, digits, and symbols used in passwords fall in the printable range 32-126, so the first bit of every 8 is always zero). Although the hash function is supposed to hide this, I'd bet that predictably positioned bits are not as easy to scramble as one might expect. This would also tie in with 4: as bytes, 0 is 00000000 and 4 is 00000100, so both have their first FIVE bits as 0.

Also note that the two least frequent hash prefixes both start with E, which is 1110 in binary, so they are nearly the opposite of the most frequent ones in bit pattern (1s versus 0s) as well as in frequency (low versus high). That points to the zero bits mattering: it could be a direct side effect of the algorithm (doubtful), or of the algorithm operating on a limited subset of inputs skewed by convention. In other words, letters and digits occupy only about 1/3 to 1/4 of the 256 possible byte values, which is the most likely explanation.
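This hypothesis can be tested directly. A minimal sketch in Python, assuming hypothetical 10-character passwords drawn only from letters and digits (so every input byte has its top bit clear): it measures how often the resulting SHA-1 digests start with a 0 bit, and how often the first hex digit is 0 versus E.

```python
import hashlib
import random
import string

rng = random.Random(12345)  # fixed seed so the run is reproducible
ALPHABET = string.ascii_letters + string.digits  # 62 chars, every byte < 128

def random_password(length: int = 10) -> bytes:
    """Hypothetical ASCII-only password: every byte has its top bit clear."""
    return "".join(rng.choice(ALPHABET) for _ in range(length)).encode("ascii")

N = 50_000
passwords = [random_password() for _ in range(N)]
digests = [hashlib.sha1(p).digest() for p in passwords]

# Fraction of digests whose very first bit is 0 (first byte below 128).
top_bit_zero = sum(d[0] < 128 for d in digests) / N
# Fraction whose first hex digit (high nibble of byte 0) is 0x0 or 0xE.
first_nibble_0 = sum(d[0] >> 4 == 0x0 for d in digests) / N
first_nibble_e = sum(d[0] >> 4 == 0xE for d in digests) / N

print(f"digests whose first bit is 0: {top_bit_zero:.3f}")
print(f"first hex digit 0: {first_nibble_0:.4f}, E: {first_nibble_e:.4f}")
```

If the zero high bits of the input leaked through, the first fraction would drift away from 0.5 and the 0/E fractions would drift apart; with SHA-1's mixing they are expected to land near 0.5 and 1/16 respectively.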

Of course, we could go full tinfoil hat with this, but I would argue that coincidence and ASCII are more to blame than a man on the grassy knoll.

0

Source: https://habr.com/ru/post/1275521/

