What is the fastest way to compare two lists of items?

I have two folders, each of which contains about 10,000 files. I would like to write a script or program that can tell me if these folders are synchronized, and then tell me which files are missing from each one in order to synchronize them.

So, after generating a list of files from each folder, what is the fastest algorithm for finding the files unique to each list? What I have in mind now is comparing the first file in each list; if they differ, delete one until they match, and then remove both from the lists (because they are not unique).

Is there a faster algorithm?

+3
5 answers

diff -s [1] [2]
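If you would rather do the comparison from a script than shell out to diff, Python's standard filecmp module does the same directory comparison; a minimal sketch (the throwaway temp directories stand in for your real folders):

```python
import filecmp
import os
import tempfile

# Build two small example directories (stand-ins for the real folders)
dir1, dir2 = tempfile.mkdtemp(), tempfile.mkdtemp()
for d, names in ((dir1, ["a.txt", "b.txt"]), (dir2, ["b.txt", "c.txt"])):
    for name in names:
        open(os.path.join(d, name), "w").close()

# dircmp compares the two directory listings by name
cmp = filecmp.dircmp(dir1, dir2)
print("only in dir1:", cmp.left_only)   # ['a.txt']
print("only in dir2:", cmp.right_only)  # ['c.txt']
```

By default dircmp compares names and stat info only; pass the common files to filecmp.cmpfiles() with shallow=False if you also need a byte-for-byte content check.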

+8

In C, run qsort() on both lists, then walk them in parallel, comparing the current item of each:

  • if the two items are equal, the file is in both lists: advance both
  • if item 1 > item 2, item 2 is missing from list 1: advance list 2
  • if item 1 < item 2, item 1 is missing from list 2: advance list 1

When one list runs out, everything remaining in the other list is unique to it.

Sorting costs O(n log n) and the parallel walk is O(n), which is far better than scanning one whole list for every item of the other. Note also that a directory listing often comes back already sorted, in which case you can skip the sort step entirely.
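The parallel walk over two sorted lists can be sketched in Python like this (sorted() stands in for qsort(); the function name is my own):

```python
def unique_to_each(list1, list2):
    """Walk two sorted lists in parallel, collecting items unique to each."""
    a, b = sorted(list1), sorted(list2)
    only_a, only_b = [], []
    i = j = 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:       # present in both lists: skip
            i += 1
            j += 1
        elif a[i] > b[j]:      # b[j] is missing from list 1
            only_b.append(b[j])
            j += 1
        else:                  # a[i] is missing from list 2
            only_a.append(a[i])
            i += 1
    # whatever remains in either list is unique to it
    only_a.extend(a[i:])
    only_b.extend(b[j:])
    return only_a, only_b
```

For example, `unique_to_each(["a", "b", "d"], ["b", "c", "d", "e"])` returns `(["a"], ["c", "e"])`.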

+5

Another approach, using a hash table:

Put every name from one list into a hash table. Building it takes O(N) total, since each insert is O(1). Then scan the other list, probing the table for each name: another O(N) pass with O(1) lookups. The whole comparison is O(N), with no sorting required.

Hard to beat O(N)!
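In Python the hash-table approach collapses to set membership tests; a sketch (function name is my own):

```python
def diff_by_set(names1, names2):
    """O(N) comparison: hash each list, probe it with the other."""
    set2 = set(names2)   # build the hash table: O(N) inserts
    set1 = set(names1)
    missing_from_2 = [n for n in names1 if n not in set2]  # O(1) lookups
    missing_from_1 = [n for n in names2 if n not in set1]
    return missing_from_2, missing_from_1
```

For example, `diff_by_set(["a", "b"], ["b", "c"])` returns `(["a"], ["c"])`.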

+3

Compute an md5 or sha1 hash of every file in each folder. Something along these lines:

cd dir1; md5sum * | sort > /tmp/hash1
cd dir2; md5sum * | sort > /tmp/hash2
diff /tmp/hash1 /tmp/hash2  # could also use comm

That compares file contents as well as names. If you only care about which names are present, a plain diff dir1 dir2 will do.
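A Python equivalent of the md5sum pipeline, assuming flat directories with no subfolders (function name is my own):

```python
import hashlib
import os

def hash_dir(path):
    """Map filename -> md5 of contents, like `md5sum * | sort`."""
    hashes = {}
    for name in sorted(os.listdir(path)):
        full = os.path.join(path, name)
        if os.path.isfile(full):
            with open(full, "rb") as f:
                hashes[name] = hashlib.md5(f.read()).hexdigest()
    return hashes

# Comparing two such dicts finds names present in only one tree, and
# names present in both whose contents differ:
#   changed = {n for n in h1.keys() & h2.keys() if h1[n] != h2[n]}
```

For 10,000 files this reads every byte of both trees, so it is slower than a name-only comparison but catches files that exist on both sides with different contents.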

+1

Assuming both lists are sorted, walk them together:

  • while neither list is empty:
    • if the first names are equal, drop both and move on
    • else if the first name of list 1 sorts earlier: copy that file over, drop it from list 1
    • else: copy the other file over, drop it from list 2
  • when one list is empty, copy over everything remaining in the other
If you want to do this in two passes, or if you just need a record of what to copy, replace "copy over" with "put the name and direction in the list of results."
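The two-pass variant, recording name and direction instead of copying immediately, could look like this (a sketch; the function name and the direction strings are my own placeholders):

```python
def sync_actions(list1, list2):
    """Compare two name lists; return (name, direction) copy actions."""
    a, b = sorted(list1), sorted(list2)
    actions = []
    i = j = 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:            # present on both sides: nothing to do
            i += 1
            j += 1
        elif a[i] < b[j]:           # only in dir1: needs copying to dir2
            actions.append((a[i], "copy to dir2"))
            i += 1
        else:                       # only in dir2: needs copying to dir1
            actions.append((b[j], "copy to dir1"))
            j += 1
    actions += [(n, "copy to dir2") for n in a[i:]]
    actions += [(n, "copy to dir1") for n in b[j:]]
    return actions
```

A second pass can then perform the actual copies from the returned list, e.g. `sync_actions(["a", "b"], ["b", "c"])` yields one copy in each direction.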

+1

Source: https://habr.com/ru/post/1736851/

