Detect duplicate music files

I have two directories containing about 20 GB of music files (mostly MP3, some Ogg), and I would like to detect all duplicate songs. There are two complicating factors:

  • The same song can have different file names in the two directories.
  • Two files containing the same song can have different ID3 tags and therefore different checksums.

What is a good approach to solving this issue?

+3
8 answers

As I have said about this before, you want genpuid from MusicIP. It is closed-source software that creates an audio fingerprint of a file regardless of format, ID3 tags, checksum, etc.


+4

( )...

... ! ( digg: ​​ "... !" )

/

+2

ID3 ​​/OGG-equiv ? , .

Edit: , , , ... , , , .

, .

+1

, , . , , // ..

+1

, - , ID3 ​​ . ID3, .

+1

, Last.fm API. track.getInfo, XML, , , .. , N , , , .

, , API 40 .

0

How about something like this: find a library that gives you the length of the MP3 as well as a pointer to the audio data (there seem to be several libraries that can do this), do a first-pass filter based on song length, and for songs whose lengths match, compare the audio data. This is similar to how scripts for finding duplicate files or images work.
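A minimal sketch of that two-pass idea, using only the Python standard library. It rests on some assumptions that are not in the answer. It uses the byte length of the audio payload, not the playing time, as the first-pass filter. It locates the payload by skipping an ID3v2 header at the start of the file and an ID3v1 trailer at the end. It treats two files as duplicates only when the remaining bytes are identical. The function names (`audio_bounds`, `audio_digest`, `find_duplicates`) are made up for illustration. Ogg files keep their tags inside the stream, so this approach does not cover them, and re-encoded copies of a song would still need audio fingerprinting.

```python
import hashlib
import os
from collections import defaultdict


def audio_bounds(path):
    """Return (start, end) byte offsets of the data between ID3 tags."""
    size = os.path.getsize(path)
    start, end = 0, size
    with open(path, "rb") as f:
        header = f.read(10)
        # ID3v2 header: "ID3", 2 version bytes, 1 flag byte,
        # then a 4-byte synchsafe size (7 useful bits per byte).
        if len(header) == 10 and header[:3] == b"ID3":
            tag_size = 0
            for b in header[6:10]:
                tag_size = (tag_size << 7) | (b & 0x7F)
            start = 10 + tag_size
            if header[5] & 0x10:  # ID3v2.4 footer flag: 10 more bytes
                start += 10
        # ID3v1 trailer: fixed 128-byte block starting with "TAG".
        if size - start >= 128:
            f.seek(size - 128)
            if f.read(3) == b"TAG":
                end = size - 128
    return min(start, end), end


def audio_digest(path, start, end):
    """SHA-1 of the bytes in [start, end), read in 1 MiB chunks."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        f.seek(start)
        remaining = end - start
        while remaining > 0:
            chunk = f.read(min(1 << 20, remaining))
            if not chunk:
                break
            h.update(chunk)
            remaining -= len(chunk)
    return h.hexdigest()


def find_duplicates(roots, extensions=(".mp3",)):
    """Return lists of paths whose tag-stripped audio bytes are identical."""
    # First pass: bucket files by payload length (cheap, reads only the tags).
    by_length = defaultdict(list)
    for root in roots:
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                if name.lower().endswith(extensions):
                    path = os.path.join(dirpath, name)
                    start, end = audio_bounds(path)
                    by_length[end - start].append((path, start, end))
    # Second pass: hash the payload only where lengths collide.
    groups = []
    for candidates in by_length.values():
        if len(candidates) < 2:
            continue
        by_digest = defaultdict(list)
        for path, start, end in candidates:
            by_digest[audio_digest(path, start, end)].append(path)
        groups.extend(sorted(p) for p in by_digest.values() if len(p) > 1)
    return groups
```

Calling `find_duplicates(["dir_a", "dir_b"])` (placeholder directory names) returns one list of paths per set of duplicates. Because only files with equal payload lengths are hashed, most of the collection is never read in full.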

0

An adaptation of ffTES did a great job for me on a very similar task.

0

Source: https://habr.com/ru/post/1698533/

