I would write a script that computes a hash of each file. Store the hashes in a set, iterate over the files, and whenever a file hashes to a value already in the set, delete that file. This would be trivial to do in Python, for example.
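A minimal sketch of that idea follows. The choice of SHA-256, the function names `file_digest` and `remove_duplicates`, and the `dry_run` flag are my own assumptions for illustration, not anything prescribed above. It treats two files with the same digest as identical, and it keeps whichever copy it encounters first.

```python
import hashlib
import os


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # Read 1 MiB at a time so large files are never fully loaded in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def remove_duplicates(root, dry_run=True):
    """Walk `root` and return the paths whose content hash was already seen.

    With dry_run=True (the default, an assumption added for safety) nothing
    is deleted; with dry_run=False each duplicate is removed.
    """
    seen = set()      # digests of files encountered so far
    duplicates = []   # paths whose digest was already in `seen`
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # visit subdirectories in a deterministic order
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            # Skip symlinks and anything that is not a regular file.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            digest = file_digest(path)
            if digest in seen:
                duplicates.append(path)
                if not dry_run:
                    os.remove(path)
            else:
                seen.add(digest)
    return duplicates
```

Calling `remove_duplicates("some/dir")` only reports the duplicates; pass `dry_run=False` once the list looks right. If accidental hash collisions are a concern, a byte-by-byte comparison before deleting would be a reasonable extra check.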
For 30,000 files, at 64 bytes per hash-table entry, you are only looking at about 2 megabytes of memory (30,000 × 64 bytes ≈ 1.9 MB).