Reusing an already calculated MD5 (or any other checksum)

I'm trying to calculate an incremental MD5 digest for all files in deep directory trees, but I cannot "reuse" an already calculated digest.

Here is my test code:

#!/usr/bin/env perl
use 5.014;
use warnings;
use Digest::MD5;
use Path::Tiny;

# create some test-files in the tempdir
my @filenames = qw(a b);
my $testdir = Path::Tiny->tempdir;
$testdir->child($_)->spew($_) for @filenames; #create 2 files

dirmd5($testdir, @filenames);
exit;

sub dirmd5 {
    my($dir, @files) = @_;

    my $dirctx = Digest::MD5->new;  #the md5 for the whole directory

    for my $fname (@files) {

        # calculate the md5 for one file
        my $filectx = Digest::MD5->new;
        my $fd = $dir->child($fname)->openr_raw;
        $filectx->addfile($fd);
        close $fd;
        say "md5 for $fname  : ", $filectx->clone->hexdigest;

        # I want to somehow "add" the file digest above to the directory digest.
        # This does not work, even though $filectx has not been reset
        # (note the "clone" above):
        #$dirctx->add($filectx);

        # Adding the file as below works, but it calculates the MD5 again,
        # i.e. each file is hashed twice: once for the file alone (above)
        # and a second time for the directory.
        # That is bad with many large files, especially when I want
        # the MD5 for whole directory trees.
        $fd = $dir->child($fname)->openr_raw;
        $dirctx->addfile($fd);
        close $fd;
    }
    say "md5 for dir: ", $dirctx->hexdigest;
}

The above prints:

md5 for a  : 0cc175b9c0f1b6a831c399e269772661
md5 for b  : 92eb5ffee6ae2fec3ad71c777531578f
md5 for dir: 187ef4436122d1cc2f40dc2b92f0eba0

This is correct, but unfortunately inefficient (see the comments).

Reading the docs, I did not find a way to reuse an already calculated MD5, for example with $dirctx->add($filectx); as shown above. This is probably not possible.

Is there any way to avoid hashing each file twice, or to otherwise shorten the calculation?

Answer:

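No. Nothing relates MD5(initial data) and MD5(new data) to MD5(initial data + new data), because the position of the data in the stream matters as well as its value. Otherwise aba, aab and baa would all have the same checksum.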
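To make the point about position concrete, here is a small sketch using only the functional md5_hex interface of Digest::MD5. The variable names are illustrative and not part of the original post. It hashes the three orderings mentioned above and then compares the digest of the concatenated data with a digest built from two finished digests.

```perl
#!/usr/bin/env perl
use 5.014;
use warnings;
use Digest::MD5 qw(md5_hex);

# Position matters: the same bytes in a different order hash differently.
my %digest_of = map { $_ => md5_hex($_) } qw(aba aab baa);
say "$_ : $digest_of{$_}" for sort keys %digest_of;

# Hashing two finished digests is not the same as hashing the
# concatenated contents, so a digest cannot stand in for the data.
my $combined_data    = md5_hex('a' . 'b');
my $combined_digests = md5_hex(md5_hex('a') . md5_hex('b'));

say "md5('ab')               : $combined_data";
say "md5(md5('a') . md5('b')): $combined_digests";
say "equal? ", ($combined_data eq $combined_digests ? "yes" : "no");
```

The value printed for md5('ab') is the same as the "md5 for dir" line in the question, since the two test files contain "a" and "b".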

If the files are small enough to read into memory, you could do something like this.

#!/usr/bin/env perl

use 5.014;
use warnings 'all';

use Digest::MD5;
use Path::Tiny;

# create some test-files in the tempdir
my @filenames = qw(a b);
my $testdir   = Path::Tiny->tempdir;
$testdir->child($_)->spew($_) for @filenames; # create 2 files

dirmd5($testdir, @filenames);

sub dirmd5 {
    my ($dir, @files) = @_;

    my $dir_ctx = Digest::MD5->new;  #the md5 for the whole directory

    for my $fname ( @files ) {

        my $data = $dir->child($fname)->slurp_raw;

        # calculate the md5 for one file
        my $file_md5 = Digest::MD5->new->add($data)->hexdigest;
        say "md5 for $fname  : $file_md5";

        $dir_ctx->add($data);
    }

    my $dir_md5 = $dir_ctx->hexdigest;
    say "md5 for dir: $dir_md5";
}

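Alternatively, if the files are too large to slurp into memory, you can rewind the file handle and pass it to addfile a second time: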

#!/usr/bin/env perl

use 5.014;
use warnings 'all';

use Digest::MD5;
use Path::Tiny;
use Fcntl ':seek';

# create some test-files in the tempdir
my @filenames = qw(a b);
my $testdir   = Path::Tiny->tempdir;
$testdir->child($_)->spew($_) for @filenames; # create 2 files

dirmd5($testdir, @filenames);

sub dirmd5 {
    my ($dir, @files) = @_;

    my $dir_ctx = Digest::MD5->new;  # The digest for the whole directory

    for my $fname ( @files ) {

        my $fh = $dir->child($fname)->openr_raw;

        # The digest for just the current file
        my $file_md5 = Digest::MD5->new->addfile($fh)->hexdigest;
        say "md5 for $fname  : $file_md5";

        seek $fh, 0, SEEK_SET;
        $dir_ctx->addfile($fh);
    }

    my $dir_md5 = $dir_ctx->hexdigest;
    say "md5 for dir: $dir_md5";
}
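
A further variation, offered as a sketch and not part of the original answer: read each file once in fixed-size chunks and feed every chunk to both digest objects. This keeps memory use bounded and avoids the second pass over the file. The helper name dirmd5_single_pass and the 64 KiB buffer size are assumptions chosen for illustration.

```perl
#!/usr/bin/env perl
use 5.014;
use warnings;
use Digest::MD5;
use Path::Tiny;

# Hypothetical helper: returns (\%per_file_hex, $dir_hex), reading every
# file exactly once in fixed-size chunks.
sub dirmd5_single_pass {
    my ($dir, @files) = @_;
    my $chunk_size = 64 * 1024;    # arbitrary buffer size (an assumption)

    my $dir_ctx = Digest::MD5->new;    # digest for the whole directory
    my %file_md5;

    for my $fname (@files) {
        my $fh       = $dir->child($fname)->openr_raw;
        my $file_ctx = Digest::MD5->new;    # digest for this file only

        while (1) {
            my $n = read($fh, my $buf, $chunk_size);
            die "read failed for $fname: $!" unless defined $n;
            last if $n == 0;           # end of file

            # Feed the same chunk to both contexts.
            $file_ctx->add($buf);
            $dir_ctx->add($buf);
        }
        close $fh;

        $file_md5{$fname} = $file_ctx->hexdigest;
    }

    return (\%file_md5, $dir_ctx->hexdigest);
}

# Same test setup as in the question: two files containing "a" and "b".
my @filenames = qw(a b);
my $testdir   = Path::Tiny->tempdir;
$testdir->child($_)->spew($_) for @filenames;

my ($file_md5, $dir_md5) = dirmd5_single_pass($testdir, @filenames);
say "md5 for $_  : $file_md5->{$_}" for @filenames;
say "md5 for dir: $dir_md5";
```

With the two test files this should print the same three digests as the question's script. The digest work is still done twice per chunk, because both contexts must see the data, but each file is read from disk only once.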

Source: https://habr.com/ru/post/1016703/

