As part of a Java-based web application, I will be receiving uploaded .xls and .csv files (and possibly other types). Each file will be uniquely renamed using a combination of parameters and a timestamp.
I would like to be able to identify any duplicate files. By duplicate, I mean files with identical content, regardless of name. Ideally, I would like to detect duplicates as quickly as possible after upload, so that the server can include this information in its response. (Provided the processing time for large files doesn't cause too much lag.)
I've read about running MD5 on the files and storing the result as a unique key, etc., but I have a suspicion that there is a much better way. (Is there a better way?)
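For reference, here is a minimal sketch of the MD5 approach in Java using the standard `java.security.MessageDigest` API. The class name `FileHasher` is just an illustrative choice; the file is read in chunks so large uploads aren't loaded into memory at once, and the resulting hex string could be used as the lookup key for duplicates:

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class FileHasher {
    // Compute the MD5 digest of a file, streaming it in 8 KB chunks
    // so that large files don't have to fit in memory.
    public static String md5Of(Path file) throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("MD5");
        byte[] buffer = new byte[8192];
        try (InputStream in = Files.newInputStream(file)) {
            int read;
            while ((read = in.read(buffer)) != -1) {
                md.update(buffer, 0, read);
            }
        }
        // Convert the 16-byte digest to a lowercase hex string.
        StringBuilder sb = new StringBuilder();
        for (byte b : md.digest()) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }
}
```

Two uploads are then duplicates exactly when `md5Of` returns the same string for both, so a `Map<String, Path>` keyed by the digest is enough to detect them at upload time.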
Any advice on how best to approach this is welcome.
Thanks.
UPDATE:
I have nothing against using MD5. I have used it several times in the past with Perl (Digest::MD5). I thought a different (better) solution might exist in the Java world, but it seems I was wrong.
Thanks to everyone for the answers and comments. I'm happy to use MD5 for now.