Comparing two files in C++

I have a function that compares two files to make sure they are the same. It reads the files byte by byte and checks that the bytes match.
The problem that I am facing now is that for large files this function takes quite a lot of time.

What is the best and fastest way to check if files match?

+4
5 answers

Can files that do not match still be the same size? If not, you can determine the file sizes up front (fseek to the end, then ftell to get the position), and if the sizes differ you know the files do not match without comparing any data. If the sizes are the same, remember to seek back to the beginning before reading.
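A minimal sketch of that size check, using fseek and ftell as described. The helper names file_size and same_size are made up for illustration, and because ftell returns a long, this only works for files whose size fits in a long on your platform.

```cpp
#include <cstdio>

// Returns the file size in bytes, or -1 if the file cannot be opened or measured.
// Hypothetical helper; limited to sizes that fit in a long.
long file_size(const char* path) {
    std::FILE* f = std::fopen(path, "rb");
    if (!f) return -1;
    long size = -1;
    if (std::fseek(f, 0, SEEK_END) == 0) {
        size = std::ftell(f);  // position at the end == size in bytes
    }
    std::fclose(f);
    return size;
}

// True only if both files could be measured and their sizes are equal.
// Equal sizes do not prove equal contents; they only mean a full comparison is still needed.
bool same_size(const char* a, const char* b) {
    long sa = file_size(a);
    long sb = file_size(b);
    return sa >= 0 && sa == sb;
}
```

If same_size returns false you can stop right away; otherwise continue with the actual content comparison.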

If you read the files into large memory buffers and compare each buffer with memcmp(), you will improve performance. You don't have to read a whole file at once; just pick a large buffer size and read one block of that size from each file on every iteration of the comparison loop. The memcmp() function will probably work on wider units, such as 32-bit values, rather than on single 8-bit bytes.
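The block-wise loop described above could look roughly like this. It is a sketch, not a tuned implementation: the function name files_equal and the 1 MiB default block size are arbitrary choices for illustration.

```cpp
#include <cstdio>
#include <cstring>
#include <vector>

// Compares two files block by block. Returns true only if both files
// can be opened and read, and their contents are identical.
bool files_equal(const char* path_a, const char* path_b,
                 std::size_t block_size = 1 << 20) {  // 1 MiB, an arbitrary default
    if (block_size == 0) block_size = 1;  // avoid a loop that never advances
    std::FILE* a = std::fopen(path_a, "rb");
    std::FILE* b = std::fopen(path_b, "rb");
    bool equal = (a != nullptr && b != nullptr);
    if (equal) {
        std::vector<unsigned char> buf_a(block_size), buf_b(block_size);
        for (;;) {
            std::size_t n_a = std::fread(buf_a.data(), 1, block_size, a);
            std::size_t n_b = std::fread(buf_b.data(), 1, block_size, b);
            // Different block lengths mean different file sizes (or a read error).
            if (n_a != n_b || std::memcmp(buf_a.data(), buf_b.data(), n_a) != 0) {
                equal = false;
                break;
            }
            if (n_a < block_size) break;  // short read: both files are exhausted
        }
        if (std::ferror(a) || std::ferror(b)) equal = false;
    }
    if (a) std::fclose(a);
    if (b) std::fclose(b);
    return equal;
}
```

The loop stops at the first differing block, so files that differ early are rejected without reading the rest.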

+6

If you really want a brute-force comparison of the two files, mmap-ing them can help.
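A rough sketch of the mmap approach, assuming a POSIX system (Linux, macOS, and similar; it will not build on Windows as written). The function name files_equal_mmap is made up for illustration, and very large files on a 32-bit system may not fit in the address space.

```cpp
#include <cstring>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// POSIX-only sketch: maps both files read-only and compares them with one memcmp.
bool files_equal_mmap(const char* path_a, const char* path_b) {
    int fd_a = open(path_a, O_RDONLY);
    int fd_b = open(path_b, O_RDONLY);
    bool equal = false;
    struct stat st_a, st_b;
    if (fd_a >= 0 && fd_b >= 0 &&
        fstat(fd_a, &st_a) == 0 && fstat(fd_b, &st_b) == 0 &&
        st_a.st_size == st_b.st_size) {  // different sizes can never match
        std::size_t len = static_cast<std::size_t>(st_a.st_size);
        if (len == 0) {
            equal = true;  // mmap rejects zero-length mappings; two empty files match
        } else {
            void* a = mmap(nullptr, len, PROT_READ, MAP_PRIVATE, fd_a, 0);
            void* b = mmap(nullptr, len, PROT_READ, MAP_PRIVATE, fd_b, 0);
            if (a != MAP_FAILED && b != MAP_FAILED) {
                equal = std::memcmp(a, b, len) == 0;
            }
            if (a != MAP_FAILED) munmap(a, len);
            if (b != MAP_FAILED) munmap(b, len);
        }
    }
    if (fd_a >= 0) close(fd_a);
    if (fd_b >= 0) close(fd_b);
    return equal;
}
```

Whether this beats plain buffered reads depends on the OS and the storage, so it is worth measuring both on your own files.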

If you know the structure of the files you are reading, read the distinctive sections that let you identify them quickly (for example, the header and the relevant chunks or sections). Of course, you will want to check the basic attributes, such as size, before comparing.

Generate hashes (or something similar) if you are going to make several comparisons.

+2

Read the files in chunks of size X, where X is somewhere around 1, 10, or 50 megabytes. Use memcmp() on these chunks.

0

Although many examples use cryptographic hash functions such as SHA or MD5, it is better to use a non-cryptographic hash to compare files, since it will be faster:

https://en.wikipedia.org/wiki/List_of_hash_functions#Non-cryptographic_hash_functions

The FNV hash is considered fast and should suit your needs:

https://en.wikipedia.org/wiki/Fowler_Noll_Vo_hash
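As an illustration, here is a small sketch of the 64-bit FNV-1a variant applied to a file's contents. The function names fnv1a_update and fnv1a_file and the 64 KiB buffer are choices made for this example. Keep in mind that equal hashes only make equal contents very likely, not certain, and that hashing still reads each file in full, so it pays off mainly when the same file is compared more than once.

```cpp
#include <cstdint>
#include <cstdio>

// Feeds n bytes into a running 64-bit FNV-1a hash state h and returns the new state.
std::uint64_t fnv1a_update(std::uint64_t h, const unsigned char* data, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        h ^= data[i];
        h *= 1099511628211ULL;  // 64-bit FNV prime
    }
    return h;
}

// Hashes a whole file. Returns false if the file cannot be opened or read;
// on success the hash is stored in `out`.
bool fnv1a_file(const char* path, std::uint64_t& out) {
    std::FILE* f = std::fopen(path, "rb");
    if (!f) return false;
    std::uint64_t h = 14695981039346656037ULL;  // 64-bit FNV offset basis
    unsigned char buf[65536];                   // 64 KiB buffer, an arbitrary size
    std::size_t n;
    while ((n = std::fread(buf, 1, sizeof buf, f)) > 0) {
        h = fnv1a_update(h, buf, n);
    }
    bool ok = !std::ferror(f);
    std::fclose(f);
    if (ok) out = h;
    return ok;
}
```

Two files whose hashes differ are definitely different. If the hashes match and you need certainty, follow up with a byte-wise comparison.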

0

If you are not familiar with hashing, search Google for the "MD5" or "SHA" algorithms. Hashing is one of the most effective approaches for checking whether files match. You only need to find an implementation of one of these algorithms and test it, e.g.:

 if (md5(file1Path) == md5(file2Path)) cout << "Files are equal" << endl;
 else cout << "Files are not equal" << endl;
-1

Source: https://habr.com/ru/post/1398596/

