How to find duplicate directories

Let me create a test directory tree:

#!/bin/bash

top="./testdir"
[[ -e "$top" ]] && { echo "$top already exists!" >&2; exit 1; }

# write each file's own name into it as its content
mkfile() { printf "%s\n" "$(basename "$1")" > "$1"; }

mkdir -p "$top"/d1/d1{1,2}
mkdir -p "$top"/d2/d1some/d12copy
mkfile "$top/d1/d12/a"
mkfile "$top/d1/d12/b"
mkfile "$top/d2/d1some/d12copy/a"
mkfile "$top/d2/d1some/d12copy/b"
mkfile "$top/d2/x"
mkfile "$top/z"

Structure: find testdir \( -type d -printf "%p/\n" , -type f -print \)

testdir/
testdir/d1/
testdir/d1/d11/
testdir/d1/d12/
testdir/d1/d12/a
testdir/d1/d12/b
testdir/d2/
testdir/d2/d1some/
testdir/d2/d1some/d12copy/
testdir/d2/d1some/d12copy/a
testdir/d2/d1some/d12copy/b
testdir/d2/x
testdir/z

I need to find duplicate directories, but only files should be considered (for example, directories without files should be ignored). So, for the test tree above, the desired result is:

duplicate directories:
testdir/d1
testdir/d2/d1some

because both subtrees contain only the same two files, a and b (plus several directories without files).

I tried md5deep -Zr . and a perl script (File::Find + Digest::MD5, or Path::Tiny) to compute md5-digests, but have not found a solution :(

Edit2

- : 2,5 , . . $HOME dirs ( ) . , . , .

/some/path/project1/a
/some/path/project1/b

/some/path/project2/a
/some/path/project2/x

. a - ( , ), . a , . "", .

For example:

dir1/subdir1/{a,b}  # duplicates (files 'a' and 'b' are considered equal)
dir2/subdir2/{a,b}

proj1/subproj1/{a,b,X}  # NOT duplicates, since there are different files
proj2/subproj2/{a,b,Y}

'dir1/subdir1/a,b',
'dir2/subdir2/a,b',
'proj1/subproj1/a,b,X',
'proj2/subproj2/a,b,Y';

The (sub)string 'a,b' identifies dir1/subdir1 and dir2/subdir2 as duplicates.

   dirA/          dirB/
a b sdA/       a X sdB/
    c d            c d

dirA/sdA/ and dirB/sdB/ are duplicates, while dirA/ and dirB/ are not.

. , . ( ). , , (/sdA/). ,

'dirA/sdA,a,b/c,d',  'dirB/sdB,a,X/c,d'

(c,d) . , c,d, , , ( ) .


, , sdA (, sdA2). , (a,b, dirA/sdaA2,a,b/). (c,d) , , , , a,b .

, , , . - , , . , , - jm666.

  • . = . ( ).:) , md5deep -Zr -of /top/dir .
  • -of, , fifo - .
  • md5 2.5TB, , . md5deep cpu-. , , .
  • md5deep sudo, , , , -... ( ):):)

"":

  • "" "-" , .
  • - :
    • , . , . , , . , . , script , md5 ( md5deep.)
    • "-" . ( ). " ", md5 , , !

For example, suppose /path/to/some contains two files, a and b:

if file "a" has md5: 0cc175b9c0f1b6a831c399e269772661
and file "b" has md5: 92eb5ffee6ae2fec3ad71c777531578f

then the "directory-digest" is computed from the sorted file digests. With Digest::MD5, for example:

perl -MDigest::MD5=md5_hex -E 'say md5_hex(sort qw( 92eb5ffee6ae2fec3ad71c777531578f 0cc175b9c0f1b6a831c399e269772661))'

3bc22fb7aaebe9c8c5d7de312b876bb8 is the "directory-digest". Sorting is crucial (!), because without it the result is different:

perl -MDigest::MD5=md5_hex -E 'say md5_hex(qw( 92eb5ffee6ae2fec3ad71c777531578f 0cc175b9c0f1b6a831c399e269772661))'

prints: 3a13f2408f269db87ef0110a90e168ae.
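The effect of sorting can also be sketched in shell, in the style of the test script from the question. This is only an illustration under assumptions: combine_digests is a hypothetical helper name, and GNU coreutils md5sum is assumed to be installed. It sorts the given file digests, concatenates them without a separator, and hashes the result, which should correspond to what the sorted Perl one-liner does.

```shell
#!/bin/bash
# combine_digests: hypothetical helper, not part of md5deep or Digest::MD5.
# Sorts the given hex digests, joins them with no separator,
# and prints the MD5 of the joined string.
combine_digests() {
    printf '%s\n' "$@" | LC_ALL=C sort | tr -d '\n' | md5sum | cut -d' ' -f1
}

a=0cc175b9c0f1b6a831c399e269772661   # md5 of file "a" in the example
b=92eb5ffee6ae2fec3ad71c777531578f   # md5 of file "b" in the example

combine_digests "$a" "$b"
combine_digests "$b" "$a"   # same output: argument order does not matter
```

Because the digests are sorted before hashing, both calls print the same value regardless of the order in which the files were visited.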

, , . ( - md5). , , . -

file "aaa" has md5 : 92eb5ffee6ae2fec3ad71c777531578f
file "bbb" has md5 : 0cc175b9c0f1b6a831c399e269772661

then after sort and md5 the result is the same: 3bc22fb7aaebe9c8c5d7de312b876bb8.

, "-" , , , 3bc22fb7aaebe9c8c5d7de312b876bb8 thats, : a b ( ).

, "-" 32- , .
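A directory-level digest of this kind could be sketched as follows. The helper name dir_digest is hypothetical, and the sketch assumes find, sort and GNU md5sum are available. It hashes every regular file below a directory, ignores all file and subdirectory names, and combines the sorted file digests into one value. For 2.5TB of data this would be slow; it only illustrates the idea.

```shell
#!/bin/bash
# dir_digest: hypothetical helper, not an md5deep feature.
# Prints one digest for all regular files below directory $1.
# File and subdirectory names are ignored; only file contents count.
# Note: every directory without files gets the same digest (that of the
# empty string), so such directories should be filtered out separately.
dir_digest() {
    find "$1" -type f -exec sh -c 'for f; do md5sum < "$f"; done' sh {} + |
        cut -c1-32 | LC_ALL=C sort | tr -d '\n' | md5sum | cut -c1-32
}

# Example on a small tree shaped like the test tree from the question.
tmp=$(mktemp -d)
mkdir -p "$tmp/d1/d12" "$tmp/d2/d1some/d12copy"
for f in a b; do
    printf '%s\n' "$f" > "$tmp/d1/d12/$f"
    printf '%s\n' "$f" > "$tmp/d2/d1some/d12copy/$f"
done
printf 'x\n' > "$tmp/d2/x"

dir_digest "$tmp/d1"          # same digest as the next line
dir_digest "$tmp/d2/d1some"
dir_digest "$tmp/d2"          # differs, because of the extra file x
rm -rf "$tmp"
```

Note that d1/d12 and d2/d1some/d12copy also get this same digest, so reporting only the topmost duplicate directories, as the question asks, would need an extra step that is not shown here.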

. :

3a13f2408f269db87ef0110a90e168ae /some/directory
16ea2389b5e62bc66b873e27072b0d20 /another/directory
3a13f2408f269db87ef0110a90e168ae /path/to/other/directory

So /some/directory and /path/to/other/directory are duplicates, because they have the same "directory-digest".
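Grouping such a listing by digest can be sketched with a few lines of awk. The function name report_duplicates is hypothetical, and the input format (digest, one space, path) is an assumption taken from the listing above.

```shell
#!/bin/bash
# report_duplicates: hypothetical helper. Reads "digest path" lines on stdin
# and prints one line per group of paths sharing a digest (tab-separated).
report_duplicates() {
    awk '{
        digest = $1
        path = substr($0, length(digest) + 2)   # rest of line, spaces kept
        if (seen[digest]++)
            paths[digest] = paths[digest] "\t" path
        else
            paths[digest] = path
    }
    END {
        for (d in seen)
            if (seen[d] > 1) print paths[d]
    }'
}

report_duplicates <<'EOF'
3a13f2408f269db87ef0110a90e168ae /some/directory
16ea2389b5e62bc66b873e27072b0d20 /another/directory
3a13f2408f269db87ef0110a90e168ae /path/to/other/directory
EOF
```

For the three lines above this prints /some/directory and /path/to/other/directory on one line, separated by a tab. With several groups, the order in which groups are printed is not defined.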

Hm... perl script. , perl- script , , , ...:):)

Source: https://habr.com/ru/post/1016699/

