How to find all files with the same content?

Question


This is the question: "Given a directory with a large number of files, find files that have the same content." I would suggest using a hash function to generate a hash value for the contents of each file and comparing only files with the same hash value. Does this make sense?
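A minimal Python sketch of the proposed approach, assuming a flat directory (no recursion), SHA-1 from the standard `hashlib` module, and a final byte-by-byte check with `filecmp.cmp` so that only files sharing a hash are ever compared directly. The helper names `file_digest` and `find_duplicates` are hypothetical.

```python
import filecmp
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1 << 16):
    """Hash a file's contents in chunks so large files need not fit in memory."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(directory):
    """Return lists of paths in `directory` whose contents are identical."""
    # Assumption: only the top level of `directory` is scanned.
    by_hash = defaultdict(list)
    for name in os.listdir(directory):
        path = os.path.join(directory, name)
        if os.path.isfile(path):
            by_hash[file_digest(path)].append(path)

    groups = []
    for paths in by_hash.values():
        # Files sharing a hash are only candidates; confirm byte by byte
        # so a hash collision can never produce a false match.
        remaining = list(paths)
        while len(remaining) > 1:
            first = remaining[0]
            same = [p for p in remaining if filecmp.cmp(first, p, shallow=False)]
            if len(same) > 1:
                groups.append(same)
            remaining = [p for p in remaining if p not in same]
    return groups
```

Files whose hash is unique never reach the comparison loop, which is where the saving comes from.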

The next question is how to choose a hash function. Would you use SHA-1 for this purpose?
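One way to approach the choice empirically is to time the candidates on your own machine. A rough sketch, assuming a random in-memory buffer stands in for file contents (so disk reads are not measured); the candidates are SHA-1 and MD5 from `hashlib` and Adler-32 from `zlib`, and the function names `time_hash` and `compare_hashes` are hypothetical.

```python
import hashlib
import os
import time
import zlib


def time_hash(fn, data, repeat=5):
    """Return the best wall-clock time (seconds) over `repeat` runs of fn(data)."""
    best = float("inf")
    for _ in range(repeat):
        start = time.perf_counter()
        fn(data)
        best = min(best, time.perf_counter() - start)
    return best


def compare_hashes(size=16 * 1024 * 1024):
    """Time each candidate on the same random buffer of `size` bytes."""
    data = os.urandom(size)  # stand-in for file contents; no disk I/O involved
    candidates = {
        "sha1": lambda b: hashlib.sha1(b).digest(),
        "md5": lambda b: hashlib.md5(b).digest(),
        "adler32": zlib.adler32,
    }
    return {name: time_hash(fn, data) for name, fn in candidates.items()}


if __name__ == "__main__":
    for name, seconds in sorted(compare_hashes().items(), key=lambda kv: kv[1]):
        print(f"{name:8s} {seconds * 1000:.1f} ms")
```

Because this measures only the hashing step, it shows relative CPU cost; on a real directory, reading the files from disk adds a cost the sketch deliberately leaves out.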

+3

filesystems hash

Michael Nov 08 '10 at 12:30


4 answers


+6

Dr. belisarius Nov 08 '10 at 13:57

[Answer text garbled in the source; only mentions of SHA-1 and MD5 survive.]

+2

sharptooth Nov 08 '10 at 12:31

Yes, hashing is the first thing that comes to mind. For this task you want the fastest hash function available; Adler-32 will work. Collisions are not a problem here, because files with matching hashes are still compared directly, so you do not need a cryptographically strong function.
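Python exposes Adler-32 as `zlib.adler32`, which accepts a running value so a file can be checksummed in chunks. A sketch with hypothetical helper names (`adler32_of_file`, `same_content`) that also shows why collisions are harmless in this setting: a checksum match only nominates a pair, which `filecmp.cmp` then confirms byte by byte.

```python
import filecmp
import zlib


def adler32_of_file(path, chunk_size=1 << 16):
    """Compute Adler-32 of a file incrementally using zlib's running checksum."""
    value = 1  # Adler-32's initial value, also zlib.adler32's default
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            value = zlib.adler32(chunk, value)
    return value


def same_content(a, b):
    """Cheap checksum first; a match is then confirmed byte by byte, so an
    Adler-32 collision costs an extra comparison but never a wrong answer."""
    if adler32_of_file(a) != adler32_of_file(b):
        return False
    return filecmp.cmp(a, b, shallow=False)
```

For example, the Adler-32 of the bytes `b"Wikipedia"` is `0x11E60398`, whether computed in one call or chunk by chunk.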

+1

Eugene Mayevski 'Allied Bits Nov 08 '10 at 12:32


Zack Bloom · Accepted Answer · 2010-11-08T12:37:08+0000

Like most interview questions, this is more about talking than about answering.

[Remainder of this answer garbled in the source; legible fragments mention SHA-1, Adler-32, "2-3", and IO.]
