How to find all files with the same content?

This is the question: "Given a directory with a large number of files, find the files that have the same content." I would suggest using a hash function to generate a hash value for the contents of each file and then comparing only the files whose hash values match. Does this make sense?

The next question is how to choose a hash function. Would you use SHA-1 for this purpose?
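To make the idea concrete, here is a minimal sketch of the approach described above, assuming Python and SHA-1 from the standard `hashlib` module. The names `file_digest` and `group_by_digest` are hypothetical, and the sketch only looks at regular files directly inside the given directory.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-1 hex digest of a file, reading it in chunks."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        # Read in fixed-size chunks so large files are never loaded whole.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def group_by_digest(directory):
    """Map each digest to the files in `directory` that share it."""
    groups = defaultdict(list)
    for entry in os.scandir(directory):
        if entry.is_file(follow_symlinks=False):
            groups[file_digest(entry.path)].append(entry.path)
    # Only digests shared by two or more files are duplicate candidates.
    return {d: sorted(paths) for d, paths in groups.items() if len(paths) > 1}
```

Each value in the returned dictionary is a list of files that hash to the same digest, so only those files need any further comparison.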

+3
4 answers

Like most interview questions, this one is more about the discussion than about the answer itself.

, , , ( ). , , . , , , . hIt , .

-, SHA-1. SHA-1 , . Adler 32, , 2-3 . , , . IO , , , , IO, , .

- , .

+4

. .

+6

, , SHA-1 MD5. MD5. , - .

+2

Yes, hashing is the first thing that comes to mind. For this specific task you want the fastest hash function available, and Adler32 will do. Collisions are not a problem here, because files whose hashes match can still be compared byte by byte, so you do not need a cryptographically strong function.
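As a rough sketch of that idea, assuming Python: `zlib.adler32` serves as the cheap checksum and `filecmp.cmp` with `shallow=False` confirms candidates byte by byte. The names `adler32_of` and `find_duplicates` are hypothetical, and the confirmation step is my reading of why collisions are harmless, not something spelled out above.

```python
import filecmp
import os
import zlib
from collections import defaultdict


def adler32_of(path, chunk_size=1 << 20):
    """Compute the Adler-32 checksum of a file incrementally."""
    checksum = 1  # standard Adler-32 starting value
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            checksum = zlib.adler32(chunk, checksum)
    return checksum


def find_duplicates(directory):
    """Return lists of files in `directory` with byte-identical content."""
    candidates = defaultdict(list)
    for entry in os.scandir(directory):
        if entry.is_file(follow_symlinks=False):
            candidates[adler32_of(entry.path)].append(entry.path)

    duplicates = []
    for paths in candidates.values():
        # A shared checksum only makes files candidates; split each bucket
        # into groups that are confirmed equal by a full byte comparison.
        confirmed = []
        for path in sorted(paths):
            for group in confirmed:
                if filecmp.cmp(group[0], path, shallow=False):
                    group.append(path)
                    break
            else:
                confirmed.append([path])
        duplicates.extend(g for g in confirmed if len(g) > 1)
    return duplicates
```

With this structure a collision costs one extra file comparison and never produces a wrong result, which is why a weak but fast checksum is enough.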

+1

Source: https://habr.com/ru/post/1773607/

