I have two directories containing about 20 GB of music files (mostly mp3, some ogg), and I would like to detect all duplicate songs. There are two complicating factors:
What is a good approach to solving this issue?
As I have said about this before, you need to use genpuid, which comes from MusicIP. This closed-source software creates an audio fingerprint of a file regardless of format, ID3 tags, checksum, etc.
[Answer text lost in extraction. The surviving fragments mention ID3 tags, the Last.fm API method track.getInfo, and XML.]
How about something like this: find a library that gives you the length of each mp3 as well as a pointer to its audio data (there seem to be several libraries that can do this). Do a first-pass filter based on song length, and for songs whose lengths match, compare the audio data. This is similar to how scripts for finding duplicate files or images work.
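A minimal Python sketch of that two-pass idea, under some loud assumptions: `find_duplicates`, `get_duration` and `get_audio_bytes` are hypothetical names, and the two callables are left for you to implement with whatever tag or decoding library you pick. Rounding the length to whole seconds is also just one possible choice, and two lengths that straddle a rounding boundary will land in different buckets. Note that hashing the raw audio data only catches files whose audio payload is byte-identical (for example, the same encode with different tags), not re-encodes at another bitrate.

```python
import hashlib
from collections import defaultdict
from typing import Callable, Dict, Iterable, List


def find_duplicates(
    paths: Iterable[str],
    get_duration: Callable[[str], float],    # hypothetical: length in seconds
    get_audio_bytes: Callable[[str], bytes],  # hypothetical: audio data without tags
) -> List[List[str]]:
    """Group paths whose rounded length and audio payload both match."""
    # First pass: bucket by length, which is cheap to obtain.
    by_length: Dict[int, List[str]] = defaultdict(list)
    for path in paths:
        by_length[round(get_duration(path))].append(path)

    duplicates: List[List[str]] = []
    for bucket in by_length.values():
        if len(bucket) < 2:
            continue  # a unique length cannot have a duplicate
        # Second pass: hash the audio data only within a length bucket.
        by_digest: Dict[str, List[str]] = defaultdict(list)
        for path in bucket:
            digest = hashlib.sha1(get_audio_bytes(path)).hexdigest()
            by_digest[digest].append(path)
        duplicates.extend(g for g in by_digest.values() if len(g) > 1)
    return duplicates
```

For a quick check you can pass plain dictionaries in place of real files, e.g. `find_duplicates(durations, durations.get, audio.get)`, where `durations` maps each path to its length and `audio` maps each path to its audio bytes.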
An adaptation of ffTES did a great job for me on a very similar task.
Source: https://habr.com/ru/post/1698533/