As part of a Java-based web application, I will be receiving uploaded .xls and .csv files (and possibly other types). Each file will be uniquely renamed using a combination of parameters and a timestamp.
I would like to be able to identify any duplicate files. By duplicate, I mean files with identical content, regardless of name. Ideally, I would like to detect duplicates as quickly as possible after upload, so that the server can include this information in its response. (Provided the processing time for large files doesn't cause too much lag.)
I've read about running MD5 on the files and storing the result as a unique key, etc., but I have a suspicion that there is a much better way. (Is there a better way?)
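For reference, here is a minimal sketch of the MD5 approach in Java using the standard `java.security.MessageDigest` API. The class name `FileHasher` is just an illustrative choice; the file is read in chunks so large uploads aren't loaded into memory at once, and the resulting hex string could be used as the lookup key for duplicates:

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class FileHasher {
    // Compute the MD5 digest of a file, streaming it in 8 KB chunks
    // so that large files don't have to fit in memory.
    public static String md5Of(Path file) throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("MD5");
        byte[] buffer = new byte[8192];
        try (InputStream in = Files.newInputStream(file)) {
            int read;
            while ((read = in.read(buffer)) != -1) {
                md.update(buffer, 0, read);
            }
        }
        // Convert the 16-byte digest to a lowercase hex string.
        StringBuilder sb = new StringBuilder();
        for (byte b : md.digest()) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }
}
```

Two uploads are then duplicates exactly when `md5Of` returns the same string for both, so a `Map<String, Path>` keyed by the digest is enough to detect them at upload time.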
Any advice on how best to approach this is welcome.
Thanks.
UPDATE:
I have nothing against using MD5. I have used it several times in the past with Perl (Digest::MD5). I thought a different (better) solution might exist in the Java world, but it seems I was wrong.
Thanks to everyone for the answers and comments. I'm happy to use MD5 for now.