Search for duplicate image files

Question


I have about 1 TB of images stored on my hard drive: photographs of friends and family taken over the years. Many of them are duplicates, meaning the same file saved in different places, possibly under a different name. Is there any tool, utility or approach (I can code one myself) for finding duplicate files?


image image-processing

abhinav Mar 6 '13 at 5:26


1 answer

mvp · Accepted Answer · 2013-03-06T05:39:09+0000

I would recommend using md5deep or sha1deep. On Linux, just install the md5deep package (it is included in most Linux distributions).

After installing it, run it recursively from the top directory of the disk (the `.` in the command below is the current directory) and save the checksum of every file into a text file:

 md5deep -r -l . > filelist.txt

If you like sha1 better than md5, use sha1deep instead (it is part of the same package).

Once you have the file, sort it with sort (or pipe md5deep's output straight into sort in the previous step):

 sort < filelist.txt > filelist_sorted.txt

Now view the result in any text editor. Because identical files have identical checksums, sorting places all copies of a file on adjacent lines, so you will quickly see every duplicate along with its location on the disk.
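Rather than scanning the whole sorted list by eye, GNU uniq can print only the groups of lines that share a hash. This is a minimal sketch, assuming GNU coreutils (`-w` and `--all-repeated` are GNU extensions) and md5deep output, where each line starts with a 32-character hex hash; use `-w 40` for sha1deep. The function name `show_duplicate_groups` is hypothetical.

```shell
# Hypothetical helper: print only the lines whose hash occurs more than
# once in a sorted md5deep list, with a blank line between groups.
# Assumes GNU uniq and lines of the form "<32-hex-hash>  <path>".
show_duplicate_groups() {
    # -w 32 compares only the first 32 characters (the MD5 hash);
    # use -w 40 for sha1deep output.
    uniq -w 32 --all-repeated=separate "$1"
}

# Usage:
#   show_duplicate_groups filelist_sorted.txt
```

Since uniq only compares adjacent lines, this relies on the list having been sorted first.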

If you are so inclined, you can write a simple script in Perl or Python to remove duplicates based on this list of files.
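A minimal sketch of such a clean-up step, written in shell with awk rather than Perl or Python so that it matches the commands above. It deletes nothing: it only prints the path of every file after the first in each group of identical hashes, so the list can be reviewed before anything is removed. It assumes the sorted list uses the md5deep format of a hex hash, two spaces, then the path (paths may contain spaces but not newlines). The name `list_redundant_copies` is hypothetical.

```shell
# Hypothetical helper: from a sorted hash list ("<hex-hash>  <path>"),
# print the path of every file except the first one in each group of
# identical hashes. Nothing is deleted; review the output first.
list_redundant_copies() {
    awk '
        {
            # The hash is the first field; the path starts after the
            # hash and the two separating spaces.
            hash = $1
            path = substr($0, length(hash) + 3)
            # seen[hash]++ yields 0 (false) the first time a hash appears,
            # so the first copy is kept and every later copy is printed.
            if (seen[hash]++)
                print path
        }
    ' "$1"
}

# Usage (review to_delete.txt before removing anything):
#   list_redundant_copies filelist_sorted.txt > to_delete.txt
```

Taking the hash length from the first field means the same function works for both md5deep (32 characters) and sha1deep (40 characters) output.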
