Programmatically compare file sizes on Linux

I have two versions of a very large and complex directory structure with tens of thousands of separate files, and I want to look for significant file changes from one version to another.

Every file has changed slightly. For example, a file called intro.txt might contain:

[Build 1057 done by Mike 12:00] - (version 1)

[Build 1065 made by Mike 18:10] - (version 2)

I'm not interested in changes like this, because they carry no useful information. I also don't care about spelling corrections or the addition of a word or two.

What I really want to do is pull out the files that have changed in a more significant way. One way they may have changed is that a lot of content was added, which increases the file size; that is the kind of change I'm interested in.

So, how would you recursively scan the directories for files that grew (or shrank) by more than a given amount from one version to the other?

I'm running Linux, but any language will do.

+3
7 answers

In Python, you want to start with the filecmp module.

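Its dircmp class lists the files that exist in only one of the two directories (left_only and right_only).

For the files in diff_files, call os.stat on each copy and compare the sizes.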
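The filecmp approach could be sketched roughly as follows. This is only an illustration, not the answerer's code: the function name size_changes and the threshold parameter are made up for the example, and it assumes both trees are readable and that files which differ in size show up in diff_files (which holds for dircmp's default comparison).

```python
import filecmp
import os


def size_changes(left, right, threshold):
    """Yield (new_path, old_size, new_size) for files present in both trees
    whose size changed by at least `threshold` bytes (hypothetical helper)."""
    comparison = filecmp.dircmp(left, right)

    # diff_files: names present in both directories whose contents differ
    for name in comparison.diff_files:
        old_size = os.stat(os.path.join(left, name)).st_size
        new_size = os.stat(os.path.join(right, name)).st_size
        if abs(new_size - old_size) >= threshold:
            yield os.path.join(right, name), old_size, new_size

    # common_dirs: subdirectories present on both sides; recurse into each
    for name in comparison.common_dirs:
        yield from size_changes(
            os.path.join(left, name), os.path.join(right, name), threshold
        )
```

Calling list(size_changes("foo.old", "foo.new", 1024)) would then give the files that grew or shrank by at least 1024 bytes. Files that exist on only one side are skipped here; they are available as left_only and right_only on the same dircmp object.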

+3

CPAN has modules for this kind of thing. For example, File::DirCompare:

 use File::DirCompare;

 File::DirCompare->compare('dirA', 'dirB', sub {
     my ($a, $b) = @_;

     # callback runs on different or missing files,
     # so perform extra checks on files $a & $b here

 });

So, for example, to report files whose sizes differ by more than a given number of bytes:

File::DirCompare->compare('dirA', 'dirB', size_diff_by_more_than(1024) );

sub size_diff_by_more_than {
    my $this = shift;

    return sub {
        my @files = grep { $_ } @_;

        if ( @files == 2 ) {
            # get the two file sizes and report if more than $this
            my @sizes = sort { $a <=> $b } map { (stat)[7] } @files;
            print "Different by more than $this bytes: $files[1]\n"
                if $sizes[1] - $sizes[0] > $this
        }
        else {
            print "Only: $files[0]\n";
        }
    };
}
+4

Run diff -r -b FOLDER1 FOLDER2, then post-process the output (for example with a bash script), working through it filename by filename.

-b makes diff ignore changes in the amount of whitespace.

-r makes diff recurse into subdirectories.

+2

bash:

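before_dir=foo.old
after_dir=foo.new
interesting_size=10   # minimum number of diff output lines worth reporting
# NUL-delimited so that file names containing spaces or newlines survive
find "$before_dir" -type f -print0 | while IFS= read -r -d '' file; do
    # map foo.old/path to foo.new/path by stripping the leading directory
    diff_size=$(diff -u "$file" "$after_dir${file#"$before_dir"}" | wc -l)
    if [ "$diff_size" -ge "$interesting_size" ]; then
        echo "$file"
    fi
done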
+2

Combine diff with diffstat. Diffstat summarizes diff output per file: how many lines were inserted, deleted, or modified.
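To illustrate the kind of per-file summary diffstat produces, here is a small sketch that counts inserted and deleted lines for one pair of files. It does not call diff or diffstat; it swaps in Python's difflib, and the names count_changes and count_file_changes are invented for the example.

```python
import difflib


def count_changes(old_lines, new_lines):
    """Return (inserted, deleted) line counts between two lists of lines."""
    inserted = deleted = 0
    matcher = difflib.SequenceMatcher(None, old_lines, new_lines, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # "replace" removes old lines and adds new ones, so it counts for both
        if tag in ("replace", "delete"):
            deleted += i2 - i1
        if tag in ("replace", "insert"):
            inserted += j2 - j1
    return inserted, deleted


def count_file_changes(old_path, new_path):
    """Same as count_changes, but reads the two files from disk as text."""
    with open(old_path, errors="replace") as old_file:
        old_lines = old_file.read().splitlines()
    with open(new_path, errors="replace") as new_file:
        new_lines = new_file.read().splitlines()
    return count_changes(old_lines, new_lines)
```

A file where only the build line changed would come back as (1, 1), while a file with a large block of new content would show a large inserted count, which makes it easy to filter on a threshold.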

+2

In C, you can use stat.

#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>

int main( int argc, char* argv[] )
{
   struct stat fileInfoA;
   struct stat fileInfoB;

   if( argc == 3 )
   {
     stat( argv[1], &fileInfoA );
     stat( argv[2], &fileInfoB );

     // Now, you can use the following fields of stat to compare the files:
     //      struct stat {
     //          dev_t     st_dev;     /* ID of device containing file */
     //          ino_t     st_ino;     /* inode number */
     //          mode_t    st_mode;    /* protection */
     //          nlink_t   st_nlink;   /* number of hard links */
     //          uid_t     st_uid;     /* user ID of owner */
     //          gid_t     st_gid;     /* group ID of owner */
     //          dev_t     st_rdev;    /* device ID (if special file) */
     //          off_t     st_size;    /* total size, in bytes */
     //          blksize_t st_blksize; /* blocksize for filesystem I/O */
     //          blkcnt_t  st_blocks;  /* number of blocks allocated */
     //          time_t    st_atime;   /* time of last access */
     //          time_t    st_mtime;   /* time of last modification */
     //          time_t    st_ctime;   /* time of last status change */
     //      };

   }

   return 0;
}

To handle whole directory trees, walk them with opendir() and readdir().
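For comparison, the same walk-and-stat idea can be sketched in Python, where os.walk plays the role of opendir()/readdir() and os.lstat the role of stat. This is a rough analogue rather than a translation of the C code above; the names file_sizes and changed_by and the threshold parameter are made up for the example.

```python
import os


def file_sizes(root):
    """Map each file's path (relative to root) to its size in bytes."""
    sizes = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full_path = os.path.join(dirpath, name)
            # lstat does not follow symlinks, so a dangling link cannot raise;
            # for a symlink, st_size is the length of the link target path
            sizes[os.path.relpath(full_path, root)] = os.lstat(full_path).st_size
    return sizes


def changed_by(old_root, new_root, threshold):
    """Return sorted (relative_path, old_size, new_size) tuples for files that
    exist in both trees and changed size by at least `threshold` bytes."""
    old = file_sizes(old_root)
    new = file_sizes(new_root)
    return sorted(
        (path, old[path], new[path])
        for path in old.keys() & new.keys()
        if abs(new[path] - old[path]) >= threshold
    )
```

Because this looks only at st_size, it never reads file contents, which keeps it cheap on trees with tens of thousands of files. The trade-off is that a file rewritten to exactly the same length goes unnoticed.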

+2

:

, diff diff .

( ) , , . .

0

Source: https://habr.com/ru/post/1704447/

