This is the question: "Given a directory with a large number of files, find files that have the same content." I would suggest computing a hash of each file's contents and then comparing only files whose hash values match. Does this make sense?
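Here is a minimal Python sketch of this idea. It assumes the files sit directly in one directory (no recursion). SHA-1 from `hashlib` is only a placeholder hash here. `hash_file` and `find_duplicate_files` are hypothetical names for illustration. Files with equal hashes are treated as candidates and confirmed with a byte-by-byte comparison, so a hash collision cannot produce a false match.

```python
import filecmp
import hashlib
import os
from collections import defaultdict

CHUNK_SIZE = 1 << 20  # read 1 MiB at a time so large files are never loaded whole


def hash_file(path, algorithm="sha1"):
    """Return the hex digest of a file's contents, read in chunks.

    `algorithm` is any name accepted by hashlib.new (placeholder default).
    """
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(CHUNK_SIZE), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicate_files(directory, algorithm="sha1"):
    """Group the regular files directly inside `directory` by identical content.

    Step 1: bucket files by hash; files in different buckets cannot be equal.
    Step 2: inside each bucket, compare byte by byte to rule out collisions.
    """
    buckets = defaultdict(list)
    with os.scandir(directory) as entries:
        for entry in entries:
            if entry.is_file(follow_symlinks=False):
                buckets[hash_file(entry.path, algorithm)].append(entry.path)

    duplicates = []
    for paths in buckets.values():
        if len(paths) < 2:
            continue  # a unique hash means a unique file
        groups = []  # each group holds files proven identical to its first member
        for path in paths:
            for group in groups:
                if filecmp.cmp(group[0], path, shallow=False):
                    group.append(path)
                    break
            else:
                groups.append([path])
        duplicates.extend(g for g in groups if len(g) > 1)
    return duplicates
```

Hashing reads every file once. The full comparison then only touches files that already share a hash, which is usually a small fraction of the directory.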
The next question is how to choose a hash function. Would you use SHA-1 for this purpose?
Like most interview questions, this is more about talking than about answering.
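Yes, hashing is the first thing that comes to mind. For your specific task, you want the fastest hash function available, and Adler-32 will work. Collisions are not a problem in your case: files whose hashes match are compared in full anyway, so a collision costs only one extra comparison. That means you do not need a cryptographically strong function such as SHA-1.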
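As a rough sketch, here is how the Adler-32 approach could look using Python's standard `zlib.adler32`, which accepts a running value and so can checksum a file chunk by chunk. The helper names `adler32_of_file` and `candidate_groups` are hypothetical. The sketch only produces candidate groups. The final byte-by-byte check is left to something like `filecmp.cmp(a, b, shallow=False)`.

```python
import zlib
from collections import defaultdict

CHUNK_SIZE = 1 << 20  # 1 MiB reads keep memory use flat for large files


def adler32_of_file(path):
    """Compute the Adler-32 checksum of a file incrementally."""
    checksum = 1  # Adler-32's defined starting value (also zlib's default)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(CHUNK_SIZE), b""):
            checksum = zlib.adler32(chunk, checksum)
    return checksum


def candidate_groups(paths):
    """Bucket paths by Adler-32 and keep only buckets with 2+ files.

    Adler-32 is only 32 bits, so each bucket is a set of *candidates* that
    still needs a byte-by-byte comparison before calling the files equal.
    """
    buckets = defaultdict(list)
    for path in paths:
        buckets[adler32_of_file(path)].append(path)
    return [group for group in buckets.values() if len(group) > 1]
```

Because a bucket only narrows the search, a weak checksum never causes a wrong answer. At worst, it adds a few extra full comparisons.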
Source: https://habr.com/ru/post/1773607/