I record a daily 2-minute radio broadcast from the Internet. It always begins and ends with the same jingle. Since the exact start time of the broadcast can vary by plus or minus 6 minutes, I have to record about 15 minutes of radio.
I want to determine the exact times at which these jingles occur in the 15-minute recording, so I can extract the part of the audio that I want.
I have already started a C# application that decodes the MP3 data to PCM and converts the PCM data to a spectrogram, based on http://www.codeproject.com/KB/audio-video/SoundCatcher.aspx
I tried running a cross-correlation algorithm on the PCM data, but it is very slow (about 6 minutes with a 10 ms step), and in some cases it fails to find the start time of the jingle.
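For reference, the kind of sliding cross-correlation I mean can be sketched roughly as below. This is only a minimal illustration in plain Python, not my actual C# code: the function name `normalized_cross_correlation`, the `step` parameter, and the toy sample values are all made up for the example. It normalizes each window so that a louder or quieter copy of the jingle still scores close to 1.

```python
import math

def normalized_cross_correlation(signal, template, step=1):
    """Slide `template` over `signal` and return (best_offset, best_score).

    The score is the normalized (Pearson-style) correlation, in [-1, 1].
    """
    m = len(template)
    t_mean = sum(template) / m
    t_centered = [t - t_mean for t in template]
    t_norm = math.sqrt(sum(t * t for t in t_centered))

    best_offset, best_score = -1, -2.0
    for offset in range(0, len(signal) - m + 1, step):
        window = signal[offset:offset + m]
        w_mean = sum(window) / m
        w_centered = [w - w_mean for w in window]
        w_norm = math.sqrt(sum(w * w for w in w_centered))
        if w_norm == 0 or t_norm == 0:
            continue  # flat window or template: correlation is undefined
        score = sum(w * t for w, t in zip(w_centered, t_centered)) / (w_norm * t_norm)
        if score > best_score:
            best_offset, best_score = offset, score
    return best_offset, best_score

# Toy data (hypothetical): background noise, then a louder copy of the jingle.
jingle = [0.0, 1.0, 0.0, -1.0, 0.0, 1.0, 0.5, -0.5]
recording = [0.1, -0.1] * 10 + [2 * s for s in jingle] + [0.05, -0.05] * 10
offset, score = normalized_cross_correlation(recording, jingle)
```

Every offset costs a full pass over the template, so the work grows with (recording length / step) times the jingle length, which is why this gets slow on raw PCM. A larger `step` is faster but can skip over the best alignment.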
Any ideas for an algorithm that compares two spectrograms to find a match? Or a better way to find the start time of the jingle?
Thanks,
Update, sorry for the delay
First of all, thanks for all the answers; most of them were relevant and contained interesting ideas.
I tried to implement the Shazam algorithm proposed by fonzo, but I could not detect the peaks in the spectrogram. Here are three spectrograms of the opening jingle from three different recordings. I tried AForge.NET with a blob filter (but it could not identify the peaks), blurring the image and checking for differences in height, Laplace convolution, and slope analysis to detect the series of vertical bars (but there were too many false positives) ...
In the meantime, I tried the Hough algorithm proposed by Dave Aaron Smith, where I calculate the RMS for each column. Yes, for each column, so it is O(N * M), but M < N (note that a column is about 8k samples). So overall it is not that bad. The algorithm still takes about 3 minutes, but it has never failed.
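To make the column-by-column RMS idea concrete, here is a rough sketch of one way to read it. This is an assumption about the approach rather than my exact implementation: slide the jingle's spectrogram over the recording's spectrogram and keep the offset with the smallest RMS difference. The function names `rms_difference` and `find_jingle` and the tiny 3-bin spectrograms are invented for the illustration.

```python
import math

def rms_difference(cols_a, cols_b):
    """RMS of the element-wise difference between two equally sized slices."""
    total, count = 0.0, 0
    for col_a, col_b in zip(cols_a, cols_b):
        for a, b in zip(col_a, col_b):
            total += (a - b) ** 2
            count += 1
    return math.sqrt(total / count)

def find_jingle(recording_cols, jingle_cols):
    """Return (best_offset, best_error) in units of spectrogram columns."""
    m = len(jingle_cols)
    best_offset, best_error = -1, float("inf")
    for offset in range(len(recording_cols) - m + 1):
        error = rms_difference(recording_cols[offset:offset + m], jingle_cols)
        if error < best_error:
            best_offset, best_error = offset, error
    return best_offset, best_error

# Toy spectrograms (hypothetical): each column is a list of bin magnitudes.
jingle_cols = [[1.0, 0.0, 3.0], [0.0, 2.0, 0.0], [4.0, 0.0, 1.0]]
background = [[0.2, 0.1, 0.2] for _ in range(5)]
noisy_jingle = [[v + 0.05 for v in col] for col in jingle_cols]
recording_cols = background + noisy_jingle + background
offset, error = find_jingle(recording_cols, jingle_cols)
```

The column offset converts to a time by multiplying by the hop between spectrogram columns. Unlike the normalized correlation on PCM, this comparison is sensitive to overall volume, so it assumes the recordings are at a similar level.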
I could go with this solution, but if possible I would prefer Shazam, because it is O(N) and probably much faster (and cooler). So if any of you has an idea for an algorithm that always detects the same points in these spectrograms (they do not have to be peaks), please add a comment.



New update
In the end, I went with the algorithm described above. I tried to implement the Shazam algorithm but could not find the right peaks in the spectrogram: the points it identified were not consistent from one sound file to another. In theory, the Shazam algorithm is the solution to this problem. The Hough algorithm proposed by Dave Aaron Smith was more stable and efficient. I split about 400 files, and only 20 of them could not be split correctly. Disk usage went from 8 GB to 1 GB.
Thank you for your help.