Collisions in a truncated MD5 hash

I need a 4-character hash. At the moment I am taking the first 4 characters of the md5() hash. The input is a string of 80 characters or less. Will this lead to collisions? Or, what is the probability of a collision, assuming I hash fewer than 65,536 (16^4) different elements?

3 answers

Well, every md5 character is a hexadecimal digit. This means that it can have one of 16 possible values. Therefore, if you use only the first 4 hexadecimal digits, you have 16 * 16 * 16 * 16 = 16^4 = 65,536 = 2^16 possibilities.

So the total available "space" for results is only 16 bits wide. Now, according to the birthday problem, the chances of a collision are as follows:

  • 50% chance → 300 entries
  • 1% chance → 36 entries
  • about 0.0015% chance (1 in 65,536) → 2 entries

Thus, the probability of collisions is very high.
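The figures above can be checked with a short calculation. This is only a sketch: it assumes the truncated hashes are uniformly distributed over the 65,536 possible values, and the function name collisionProbability is made up for this illustration.

```php
<?php
// Exact birthday-problem probability that at least two of $n uniformly
// distributed hashes collide in a space of $space possible values.
// (Hypothetical helper, not part of the original answer.)
function collisionProbability(int $n, int $space = 65536): float
{
    $noCollision = 1.0;
    for ($i = 1; $i < $n; $i++) {
        // The next entry must avoid the $i values that are already taken.
        $noCollision *= 1.0 - $i / $space;
        if ($noCollision <= 0.0) {
            return 1.0; // more entries than values: a collision is certain
        }
    }
    return 1.0 - $noCollision;
}

foreach ([2, 36, 300] as $n) {
    printf("%d entries: %.4f%%\n", $n, 100 * collisionProbability($n));
}
```

Passing a larger $space (for example 36 ** 4) shows how quickly the odds improve with a bigger alphabet.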

Now you say that you need a hash with 4 characters. Depending on the exact requirements, you can:

  • 4 hexadecimal characters for 16^4 (65,536) possible values
  • 4 alphabetic characters for 26^4 (456,976) possible values
  • 4 alphanumeric characters for 36^4 (1,679,616) possible values
  • 4 printable ASCII characters for about 93^4 (74,805,201) possible values (using 93 of the printable characters from ASCII 33 → 126)
  • 4 bytes for 256^4 (4,294,967,296) possible values.

Which one you choose will depend on the actual use case. Do you need to send the hash to the browser? How will you store it? And so on.
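To compare the options, one can estimate how many entries it takes before a collision becomes more likely than not. The sketch below uses the standard birthday approximation n ~ sqrt(2 * N * ln(1 / (1 - p))) and assumes uniformly distributed hashes; entriesForProbability is a hypothetical helper name.

```php
<?php
// Approximate number of entries at which the chance of at least one
// collision reaches $p, for $space equally likely hash values.
// Uses the birthday approximation n ~ sqrt(2 * space * ln(1 / (1 - p))).
function entriesForProbability(float $space, float $p = 0.5): int
{
    return (int) ceil(sqrt(2.0 * $space * log(1.0 / (1.0 - $p))));
}

// Alphabet sizes taken from the list above.
$alphabets = [
    'hex (16)'             => 16,
    'letters (26)'         => 26,
    'alphanumeric (36)'    => 36,
    'printable ASCII (93)' => 93,
    'raw bytes (256)'      => 256,
];

foreach ($alphabets as $name => $size) {
    $space = $size ** 4; // four characters drawn from this alphabet
    printf("%-22s %13d values, ~%d entries for 50%%\n",
        $name, $space, entriesForProbability($space));
}
```

Even four raw bytes reach even odds of a collision after some tens of thousands of entries, so the choice only shifts the threshold; it does not remove the problem.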

I will give an example of each (in PHP, but it should be easy to translate / see what is going on):

4 hexadecimal characters:

 $hash = substr(md5($data), 0, 4); 

4 alphabetic characters:

 $hash = substr(base_convert(md5($data), 16, 26), 0, 4);
 // Base 26 yields the digits 0-9 and a-p; map the ten digits onto q-z
 $hash = str_replace(range(0, 9), range('q', 'z'), $hash);

4 alphanumeric characters:

 $hash = substr(base_convert(md5($data), 16, 36), 0, 4); 

4 printable ASCII characters:

 $hash = hash('md5', $data, true); // We want the raw bytes
 $out = '';
 for ($i = 0; $i < 4; $i++) {
     $out .= chr((ord($hash[$i]) % 93) + 33); // maps each byte to ASCII 33-125
 }

4 full bytes:

 $hash = substr(hash('md5', $data, true), 0, 4); // We want the raw bytes 

Surprisingly high. As you can see from this graph of the approximate collision probability (the formula is on the Wikipedia page), with only a few hundred elements the probability of a collision already exceeds 50%.

Note that if there is any possibility of an attacker supplying the string, you should assume the probability is 100%: scanning a 16-bit search space to find a collision can be done almost instantly on any modern PC. Or even any modern cell phone, for that matter.
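To illustrate how little work that is, here is a minimal sketch of such a scan. The function name findPrefixCollision and the "input-N" strings are placeholders invented for this example; any stream of distinct strings would do.

```php
<?php
// Brute-force search for two distinct strings whose MD5 hashes share the
// same first $length hex characters. The "input-N" strings are arbitrary
// placeholders, not anything taken from the question.
function findPrefixCollision(int $length = 4): array
{
    $seen = []; // prefix => first input that produced it
    for ($i = 0; ; $i++) {
        $input  = "input-$i";
        $prefix = substr(md5($input), 0, $length);
        if (isset($seen[$prefix])) {
            // Two different inputs, same truncated hash.
            return [$seen[$prefix], $input, $prefix, $i + 1];
        }
        $seen[$prefix] = $input;
    }
}

[$first, $second, $prefix, $tried] = findPrefixCollision();
printf("%s and %s both hash to %s... (after %d inputs)\n",
    $first, $second, $prefix, $tried);
```

With 4 hex characters the loop is guaranteed to stop within 65,537 inputs, and by the birthday estimate it typically stops after a few hundred.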


The first 4 characters contain 4 * 4 = 16 bits of data, so a collision is guaranteed once you have more than 65,536 elements, and because of the birthday problem one will be found much sooner than that. You should use more bits of the hash.


Source: https://habr.com/ru/post/1335417/

