Well, every character of an md5 hex digest is a hexadecimal digit, so it can take one of 16 possible values. Therefore, if you use only the first 4 hex characters, you have 16 * 16 * 16 * 16, i.e. 16^4 = 65,536 = 2^16, possible values.
So the total available “space” for the results is only 16 bits wide. According to the birthday attack / birthday problem, that gives the following chances of a collision:

- 50% chance → 300 entries
- 1% chance → 36 entries
- 0.0000001% chance → 2 entries (in fact, with just 2 entries the chance is already 1/65,536 ≈ 0.0015%, far above that)

So the chance of a collision is quite high.
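To sanity-check these figures, here is a small PHP sketch that computes the exact birthday-problem probability that at least two of `$n` uniformly random values out of `$space` possible ones are equal. The function name `collisionProbability` is just an illustrative helper, not a built-in.

```php
<?php
// Exact birthday-problem probability that at least two of $n values,
// drawn uniformly from $space possibilities, are equal.
// (Hypothetical helper for illustration, not a PHP built-in.)
function collisionProbability(int $n, int $space): float
{
    $noCollision = 1.0;
    for ($i = 1; $i < $n; $i++) {
        // the next value must avoid the $i values already drawn
        $noCollision *= ($space - $i) / $space;
    }
    return 1.0 - $noCollision;
}

$space = 16 ** 4; // 65,536 possible values for 4 hex characters
foreach ([2, 36, 300] as $n) {
    printf("%3d entries -> %.7f%%\n", $n, 100 * collisionProbability($n, $space));
}
```

With 2 entries this gives exactly 1/65,536 (about 0.0015%), with 36 entries just under 1%, and with 300 entries just under 50%.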
Now, you say you need a 4-character hash. Depending on the exact requirements, you can use:

- 4 hex characters, for 16^4 (65,536) possible values
- 4 letters, for 26^4 (456,976) possible values
- 4 alphanumeric characters, for 36^4 (1,679,616) possible values
- 4 printable ASCII characters, for 93^4 (74,805,201) possible values (using ASCII 33 → 125; the full printable range 33 → 126 has 94 characters)
- 4 full bytes, for 256^4 (4,294,967,296) possible values
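The alphabet sizes above can be turned into a rough "coin-flip" point, i.e. the number of random entries at which a collision becomes about 50% likely, using the usual birthday approximation n ≈ √(2 · N · ln 2). This is only a sketch; the array keys are arbitrary labels.

```php
<?php
// For 4 characters drawn from each alphabet: how many distinct hashes exist,
// and roughly how many random entries give a ~50% chance of a collision
// (birthday approximation: n ≈ sqrt(2 * N * ln 2)).
$alphabets = [
    'hex'          => 16,
    'alpha'        => 26,
    'alphanumeric' => 36,
    'printable'    => 93,
    'bytes'        => 256,
];

$spaces = [];
foreach ($alphabets as $name => $size) {
    $values = $size ** 4;                             // N = size^4 possible hashes
    $spaces[$name] = $values;
    $half = (int) round(sqrt(2 * $values * log(2)));  // entries for ~50% collision chance
    printf("%-12s %13s values, ~%s entries for 50%%\n",
        $name, number_format($values), number_format($half));
}
```

For 4 hex characters this gives about 301, in line with the 300 figure above; even 4 full bytes only get you to roughly 77,000 entries before a collision is as likely as not.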
Which one you choose depends on the actual use case: do you need to send the hash to the browser? How will you store it? And so on.
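For instance, the raw-byte option can contain NUL or non-UTF-8 bytes, so it cannot go into a URL, JSON or HTML as-is. A minimal sketch, assuming a made-up input string, of what encoding those 4 bytes for transport costs in length:

```php
<?php
// Illustration only: 4 raw md5 bytes must be encoded before being sent as text,
// which makes them longer than 4 characters again.
$data = 'example input';                             // hypothetical input
$raw  = substr(hash('md5', $data, true), 0, 4);      // 4 arbitrary bytes

$hex = bin2hex($raw);                                // 8 chars, same as substr(md5($data), 0, 8)
$b64 = rtrim(strtr(base64_encode($raw), '+/', '-_'), '='); // 6 URL-safe chars

echo $hex, ' ', $b64, PHP_EOL;
```

So if the hash has to travel as text, the 4-byte variant ends up as 6 to 8 characters, which is worth weighing against the goal of a 4-character hash.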
Here is an example of each (in PHP, but it should be easy to translate, or at least to see what is going on):
4 hex characters:

```php
$hash = substr(md5($data), 0, 4);
```
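To see the birthday effect with this 4-hex-character variant, the following sketch hashes the strings "0", "1", "2", ... until two different inputs share a hash. The input scheme and the `firstCollision` name are arbitrary choices for illustration; by the pigeonhole principle it must stop within 65,537 inputs, but you should expect it to stop after a few hundred.

```php
<?php
// Hash consecutive integers (as strings) with the 4-hex-char truncation until
// two different inputs produce the same hash. Guaranteed to terminate within
// 65,537 inputs, since there are only 65,536 possible hashes.
function firstCollision(): array
{
    $seen = [];                          // hash => input that produced it
    for ($i = 0; ; $i++) {
        $data = (string) $i;
        $hash = substr(md5($data), 0, 4);
        if (isset($seen[$hash])) {
            return [$seen[$hash], $data, $hash];
        }
        $seen[$hash] = $data;
    }
}

[$a, $b, $hash] = firstCollision();
echo "\"$a\" and \"$b\" both hash to $hash\n";
```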
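4 letters (a-z):

```php
// base_convert() loses precision on large numbers (it falls back to a float), so
// convert only the first 8 hex digits (32 bits) and keep the last 4 base-26 digits.
$hash = str_pad(substr(base_convert(substr(md5($data), 0, 8), 16, 26), -4), 4, '0', STR_PAD_LEFT);
// Base-26 digits are 0-9 and a-p; map 0-9 onto q-z so that only a-z remain.
$hash = str_replace(range(0, 9), range('q', 'z'), $hash);
```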
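4 alphanumeric characters (0-9, a-z):

```php
// Same trick as above: convert a 32-bit prefix and keep the last 4 base-36 digits.
$hash = str_pad(substr(base_convert(substr(md5($data), 0, 8), 16, 36), -4), 4, '0', STR_PAD_LEFT);
```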
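4 printable ASCII characters:

```php
$hash = hash('md5', $data, true); // we want the raw bytes
$out = '';
for ($i = 0; $i < 4; $i++) {
    // map each byte onto ASCII 33..125 (93 characters); since 256 is not a
    // multiple of 93, some characters come up slightly more often than others
    $out .= chr((ord($hash[$i]) % 93) + 33);
}
```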
4 full bytes:

```php
$hash = substr(hash('md5', $data, true), 0, 4);
```