CRC32 + Size vs MD5 / SHA1

We have a file vault, and the vault uniquely identifies each file by its CRC32 with the file size appended.

I wanted to know whether this checksum (CRC32 + size) is good enough to identify files, or whether we should consider another hashing method such as MD5 or SHA1.
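For reference, the scheme described in the question could look roughly like the sketch below. The function name `crc32_size_key` and the chunk size are assumptions made for illustration; the question does not say how the vault actually computes its key.

```python
import zlib


def crc32_size_key(path, chunk_size=1 << 20):
    """Return a (crc32, size) pair for the file at `path`.

    Hypothetical helper: the real vault's key format is not given
    in the question, so a plain tuple is used here.
    """
    crc = 0
    size = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            # zlib.crc32 accepts the running CRC as its second argument,
            # so the file never has to be held in memory all at once.
            crc = zlib.crc32(chunk, crc)
            size += len(chunk)
    return crc, size
```

Two different files of the same length collide under this key whenever their CRC32 values match, which is the case the answers below discuss.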

+4
source share
4 answers

The space that CRC32 + size would occupy gives you enough room for a larger CRC, which would be a much better choice. That holds as long as you are not worried about malicious collisions; Thomas's answer covers that case.

You did not specify a language, but in C++, for example, you have Boost CRC, which gives you a CRC of whatever size you want (or can afford to store).
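To illustrate what a wider CRC means, here is a minimal bit-by-bit CRC-64 in Python. This is not Boost's API: it is a stand-in sketch that assumes the ECMA-182 polynomial with a zero initial value, no bit reflection and no final XOR. The names `crc64` and `POLY` are chosen for this example only.

```python
# Assumption: ECMA-182 generator polynomial, MSB-first, init 0, no final XOR.
POLY = 0x42F0E1EBA9EA3693
MASK = (1 << 64) - 1


def crc64(data, crc=0):
    """Bitwise CRC-64 of `data`; pass a previous result as `crc` to continue."""
    for byte in data:
        # Feed the next byte into the top 8 bits of the register.
        crc ^= byte << 56
        for _ in range(8):
            if crc & (1 << 63):
                crc = ((crc << 1) ^ POLY) & MASK
            else:
                crc = (crc << 1) & MASK
    return crc
```

A table-driven or native implementation would be much faster; the loop above only shows that the 64-bit result fits in the same space as a 32-bit CRC plus a 32-bit size.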

0
source

CRC is more of an error-detection method than a serious hash function. It helps to identify corrupted files rather than to identify files uniquely. Therefore, your choice should be between MD5 and SHA1.

If you do not have strong security requirements, you can choose MD5, which should be faster (but remember that MD5 is vulnerable to collision attacks). If you need extra security, it is better to use SHA1 or even SHA2.

+2
source

CRC-32 is not good enough; it is trivial to build collisions, i.e. two files (of the same length, if you wish) that have the same CRC-32. Even without a malicious attacker, collisions will occur by chance once you have about 65,000 distinct files of the same length.

Hash functions are designed to avoid collisions. With MD5 or SHA-1 you will not get random collisions. If your setup is security-related (i.e. there is someone, somewhere, who may actively try to create collisions), you need a secure hash function. MD5 is no longer secure (creating MD5 collisions is very easy), and SHA-1 is somewhat weak in this respect (at the time of writing no actual collision had been computed, but a method for creating one is known and, although expensive, is much cheaper than it should be). The usual recommendation is to use SHA-256 or SHA-512. For security, SHA-256 is enough; SHA-512 may be a little faster on large 64-bit systems, but file-reading bandwidth will be more of a limit than hashing speed.
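The figure of about 65,000 files is the birthday bound for a 32-bit value (the square root of 2^32 is 65,536). As a rough check, the standard approximation p ≈ 1 − exp(−n(n−1)/2N) can be evaluated as below; the function name is an assumption for this sketch, and the formula assumes checksums are uniformly distributed.

```python
import math


def collision_probability(n_files, bits=32):
    """Approximate chance that at least two of n_files share a checksum.

    Uses the birthday approximation 1 - exp(-n(n-1) / (2 * 2**bits)),
    which assumes uniformly distributed checksum values.
    """
    space = 2 ** bits
    exponent = -n_files * (n_files - 1) / (2 * space)
    # expm1 keeps precision when the probability is extremely small.
    return -math.expm1(exponent)


if __name__ == "__main__":
    for n in (10_000, 65_536, 77_163, 200_000):
        print(n, round(collision_probability(n), 4))
```

With 65,536 same-length files the estimate is roughly 0.39, and it passes 0.5 at around 77,000 files, so accidental collisions are realistic for a vault of that size.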

Note: when using a cryptographic hash function, there is no need to store and compare the file length; the hash is sufficient to disambiguate the files.
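As a minimal sketch of that recommendation, a file can be identified by its SHA-256 digest alone using Python's standard `hashlib`. The helper name `sha256_of_file` and the 1 MiB chunk size are assumptions for illustration.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of the file at `path`."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in chunks so large files do not need to fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

The returned string can serve as the vault key directly, with no separate length field.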

In a non-security setup (i.e. you only fear accidental collisions), use MD4. It is thoroughly "broken" as a cryptographic hash function, but it is still a very good checksum, and it is very fast (on some ARM-based platforms it is even faster than CRC-32, and it is far more resistant to accidental collisions). Basically, you should not use MD5: if you have security concerns, MD5 must not be used (it is broken; use SHA-256), and if you have no security concerns, MD4 is faster than MD5.

+2
source

As others have said, CRC does not guarantee the absence of collisions. However, your problem is solved simply by giving the files incrementing 64-bit numbers. These are guaranteed never to collide (unless you want to store a gazillion files in the same directory, which is not a good idea anyway).
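A minimal sketch of that idea, assuming an in-memory counter; the class name `FileIdAllocator` is hypothetical, and a real vault would need to persist the last issued value.

```python
import itertools

MAX_ID = 2 ** 64 - 1  # largest value that fits in an unsigned 64-bit field


class FileIdAllocator:
    """Hands out strictly increasing 64-bit identifiers (illustrative only)."""

    def __init__(self, start=1):
        self._counter = itertools.count(start)

    def next_id(self):
        value = next(self._counter)
        if value > MAX_ID:
            raise OverflowError("64-bit identifier space exhausted")
        return value
```

Identifiers issued this way never repeat, but they identify vault entries rather than file contents, so two identical files would receive different numbers.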

0
source

Source: https://habr.com/ru/post/1346960/

