Many iterations of a hash: doesn't that decrease entropy?

I see this technique recommended in many places (including on this site), and I can't shake the feeling that it must reduce entropy: after all, you are hashing something that is itself already a hash output and already carries a chance of collision. Wouldn't a collision on top of a collision make a collision more likely overall? After some research it seems I'm wrong, but why?

+6
2 answers

Since you tagged md5, I will use that as an example. From Wikipedia:

if two prefixes with the same hash can be constructed, a common suffix can be added to both to make a collision more likely to be accepted as valid data by the application using it. Furthermore, current collision-finding techniques allow specifying an arbitrary prefix: an attacker can create two colliding files that both begin with the same content. All the attacker needs to generate two colliding files is a template file with a 128-byte block of data, aligned on a 64-byte boundary, that can be changed freely by the collision-finding algorithm. An example MD5 collision, with the two messages differing in 6 bits, is:

The example they then give is 256 bytes long. Since the collision attack relies on a 128-byte block of data, and the hash digest is only 128 bits, there really is no increased risk of a collision attack succeeding beyond the first iteration; that is to say, you cannot really influence the likelihood of a collision beyond the first hash.

Also note that the entropy of the hash is the aforementioned 128 bits. Even considering that the aggregate chance of a collision is only about 1 in 2^20.96 (again from Wikipedia), it would take a great many iterations before two inputs collide. The first-glance reasoning that I think you are falling victim to is:

  • Any two arbitrary inputs have a collision chance of x%.
  • The outputs of the first hash are themselves two such inputs.
  • Therefore, every iteration adds another x% to the chance of a collision.

This can be quite easily refuted by a counterexample. Consider MD5 again:

  • The chance of a collision between two inputs is 1 in 2^21 (taking the worst case from the MD5 cryptanalysis).
  • Hashing again incurs an equally likely chance of collision, so the chance of a collision by the second round would be 1 in 2^20.
  • By extension, any two inputs hashed a number of times equal to the entropy of the digest would be guaranteed to collide.

MD5 any two inputs 128 times in a row and you will see that this is not true. You probably won't find a single repeated hash among them; after all, you have only produced 256 of the 2^128 possible hash values, leaving 2^128 - 256 of them untouched. The probability of a collision in each round is independent of all other rounds.
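
As a quick sanity check, here is a minimal sketch in Python (standard library only; the seeds "input A" and "input B" are arbitrary placeholders) that iterates MD5 on two inputs 128 times each and counts the distinct digests produced:

```python
import hashlib

def md5_chain(seed: bytes, rounds: int) -> list:
    """Repeatedly hash `seed`, returning the digest produced at each round."""
    digests = []
    current = seed
    for _ in range(rounds):
        current = hashlib.md5(current).digest()
        digests.append(current)
    return digests

chain_a = md5_chain(b"input A", 128)
chain_b = md5_chain(b"input B", 128)

all_digests = chain_a + chain_b
print("digests produced:", len(all_digests))        # 256
print("distinct digests:", len(set(all_digests)))   # with overwhelming probability, also 256
```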

There are two ways to think about why this is so. The first is that each iteration is essentially trying to hit a moving target. I think you could construct a proof based on the birthday paradox that there is a surprisingly small number of hash iterations after which you are likely to see a digest from one input match a digest of the other input, but those matches would almost certainly occur at different steps of the iteration. And once that happens, the two chains can never produce the same output on the same iteration, because the hash algorithm itself is deterministic.
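
To make the "moving target" picture concrete, here is a toy simulation of my own (not part of the original argument): it truncates MD5 to 3 bytes, a deliberately tiny 24-bit digest chosen only so that cross-chain matches become observable in a short run, and reports the first pair of iteration indices at which the two chains produce the same digest. Matches typically show up at different indices, which is exactly the point.

```python
import hashlib

def tiny_hash(data: bytes) -> bytes:
    # Toy 24-bit digest: real MD5 chains would need on the order of 2^64
    # iterations before a cross-chain match becomes likely.
    return hashlib.md5(data).digest()[:3]

def chain(seed: bytes, rounds: int) -> list:
    out, cur = [], seed
    for _ in range(rounds):
        cur = tiny_hash(cur)
        out.append(cur)
    return out

ROUNDS = 10_000
a = chain(b"seed A", ROUNDS)
b = chain(b"seed B", ROUNDS)

position_in_b = {digest: j for j, digest in enumerate(b)}
matches = [(i, position_in_b[d]) for i, d in enumerate(a) if d in position_in_b]
if matches:
    i, j = matches[0]
    print(f"first shared digest: iteration {i} of chain A vs iteration {j} of chain B")
else:
    print(f"no shared digest within {ROUNDS} rounds")
```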

The other way to think about it is that the hash function actually adds entropy as it runs. Consider that the empty string has a 128-bit digest just like any other input; that cannot happen without entropy being added during the steps of the algorithm. This is actually a necessary part of a cryptographic hash function: data must be destroyed, otherwise the input could be recovered from the digest. For inputs longer than the digest, yes, entropy is lost on the whole; it has to be in order to fit into the digest length. But some entropy is also added along the way.
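
A tiny illustration of that point: the empty string carries no entropy of its own, yet it still maps to a full 128-bit digest.

```python
import hashlib

# The empty input still yields a full 128-bit digest.
print(hashlib.md5(b"").hexdigest())  # d41d8cd98f00b204e9800998ecf8427e
```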

I don't have exact numbers for other hash algorithms, but I think all of the points I've made generalize to other hash functions and one-way mapping functions.

+3

It reduces entropy.

In the paper "Random Mapping Statistics" by Flajolet and Odlyzko, Theorem 2 shows the following:

"If an n-bit random function is repeated k times, the expected number of image points is (1 - t_k) * 2 ^ n (for large n), where t_k satisfies the recurrence relation t_0 = 0 and t_ {k + 1} = e ^ { - 1 + t_k}. From this it can be shown that the expected number of image points is 2 ^ {ni + 1}, when a random function iterates k = 2 ^ times. "

Further relevant references are:

  • Gligoroski, D. and Klima, V., 2010, September. Practical consequences of the aberration of narrow-pipe hash designs from ideal random functions. In International Conference on ICT Innovations (pp. 81-93). Springer Berlin Heidelberg.

  • Bhaumik, R., Dutta, A., Guo, J., Jean, J., Mouha, N. and Nikolić, I., 2015. More rounds, less security?

  • Dinur, I. and Leurent, G., 2014, August. Improved generic attacks against hash-based MACs and HAIFA. In International Cryptology Conference (pp. 149-168). Springer Berlin Heidelberg.

The last of these references gives two lemmas on the loss of entropy. Thus, the observation about entropy loss also holds if k independent random functions are used instead of one random function iterated k times.
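
As a hedged empirical check of that last observation, here is a toy simulation on a small domain of 2^16 points (an arbitrary size chosen for speed): it measures the fraction of the domain that survives as image points after 8 rounds, first for one random function iterated 8 times and then for 8 independent random functions composed.

```python
import random

N = 1 << 16          # toy domain size (assumption; real hash outputs have 2^128+ values)
K = 8                # number of hashing rounds
rng = random.Random(0)

def random_function():
    """A uniformly random function on {0, ..., N-1}, stored as a lookup table."""
    return [rng.randrange(N) for _ in range(N)]

# (a) a single random function iterated K times
f = random_function()
image = set(range(N))
for _ in range(K):
    image = {f[x] for x in image}
print("one function iterated: ", len(image) / N)

# (b) K independent random functions composed
image = set(range(N))
for _ in range(K):
    g = random_function()
    image = {g[x] for x in image}
print("independent functions: ", len(image) / N)

# Both fractions should come out near 1 - t_8, roughly 0.19 by the recurrence above.
```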

+1
