Pitch Detection in Performous Code

I am trying to detect voice pitch on the iPhone using the HPS (Harmonic Product Spectrum) method, but the detected tones are not very accurate. Performous does a decent job of detecting pitch.
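For reference, HPS multiplies the magnitude spectrum by copies of itself sampled at every 2nd, 3rd and 4th bin, so the bin where the harmonics line up (the fundamental) dominates the product. Below is a minimal textbook sketch in C++, assuming a mono frame whose length is a power of two and a Hann window. The names fft, magnitudeSpectrum and hpsPitch are made up for illustration; this is neither the asker's code nor Performous's.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using cd = std::complex<double>;

// Recursive radix-2 FFT (in place); a.size() must be a power of two.
void fft(std::vector<cd>& a) {
    const std::size_t n = a.size();
    if (n <= 1) return;
    assert((n & (n - 1)) == 0 && "frame length must be a power of two");
    std::vector<cd> even(n / 2), odd(n / 2);
    for (std::size_t i = 0; i < n / 2; ++i) {
        even[i] = a[2 * i];
        odd[i] = a[2 * i + 1];
    }
    fft(even);
    fft(odd);
    const double pi = std::acos(-1.0);
    for (std::size_t k = 0; k < n / 2; ++k) {
        const cd t = std::polar(1.0, -2.0 * pi * k / n) * odd[k];
        a[k] = even[k] + t;
        a[k + n / 2] = even[k] - t;
    }
}

// Magnitudes of bins 0..N/2 of a Hann-windowed frame.
std::vector<double> magnitudeSpectrum(const std::vector<double>& frame) {
    const std::size_t n = frame.size();
    const double pi = std::acos(-1.0);
    std::vector<cd> buf(n);
    for (std::size_t i = 0; i < n; ++i)
        buf[i] = frame[i] * (0.5 - 0.5 * std::cos(2.0 * pi * i / n));
    fft(buf);
    std::vector<double> mag(n / 2 + 1);
    for (std::size_t k = 0; k <= n / 2; ++k) mag[k] = std::abs(buf[k]);
    return mag;
}

// Harmonic Product Spectrum: for each candidate bin k, multiply the magnitudes
// at k, 2k, ..., harmonics*k and return the frequency of the largest product.
double hpsPitch(const std::vector<double>& frame, double sampleRate, int harmonics = 4) {
    assert(harmonics >= 1);
    const std::vector<double> mag = magnitudeSpectrum(frame);
    const std::size_t h = static_cast<std::size_t>(harmonics);
    const std::size_t limit = (mag.size() - 1) / h; // keep h*k inside the spectrum
    std::size_t best = 1;
    double bestProduct = -1.0;
    for (std::size_t k = 1; k <= limit; ++k) {
        double product = 1.0;
        for (std::size_t m = 1; m <= h; ++m) product *= mag[m * k];
        if (product > bestProduct) { bestProduct = product; best = k; }
    }
    return static_cast<double>(best) * sampleRate / static_cast<double>(frame.size());
}
```

Note that the result is quantised to one FFT bin (sampleRate / N, about 43 Hz for 1024 samples at 44.1 kHz), which is likely part of why plain HPS looks inaccurate for voice.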

I looked at the code but did not fully understand the theory behind the calculations. They use an FFT and find peaks, but the part where they use the phase of the FFT output confused me. I suspect they also use some heuristics for voice frequencies.
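The phase is most likely used to refine peak frequencies beyond the FFT bin spacing. A common way to do this (phase-vocoder style, a finite-difference relative of the reassignment method mentioned in the answer) is to compare the phase of the same bin in two FFTs taken hop samples apart: the measured phase advance minus the advance expected at the bin centre gives the offset, f = (k + dev·N / (2π·hop))·fs / N. The sketch below assumes a Hann window; dftBin and phaseFrequency are hypothetical helpers, and a single-bin DFT stands in for a real FFT to keep it short. It is not Performous's code.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using cd = std::complex<double>;

// Hann-windowed DFT of a single bin k over x[start, start + n).
// (A single-bin DFT keeps the demo short; real code would use an FFT.)
cd dftBin(const std::vector<double>& x, std::size_t start, std::size_t n, std::size_t k) {
    assert(start + n <= x.size());
    const double pi = std::acos(-1.0);
    cd sum = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        const double w = 0.5 - 0.5 * std::cos(2.0 * pi * i / n);
        sum += x[start + i] * w * std::polar(1.0, -2.0 * pi * k * i / n);
    }
    return sum;
}

// Refine the frequency of a peak in bin k from the phase advance between two
// frames whose starts are `hop` samples apart.
double phaseFrequency(cd prev, cd curr, std::size_t k, std::size_t n,
                      std::size_t hop, double fs) {
    assert(hop > 0);
    const double pi = std::acos(-1.0);
    // Phase advance that a sinusoid exactly at the centre of bin k would show.
    const double expected = 2.0 * pi * k * hop / n;
    // Deviation from that, wrapped into [-pi, pi].
    const double dev = std::remainder(std::arg(curr) - std::arg(prev) - expected, 2.0 * pi);
    // Convert the phase deviation into a fractional bin offset.
    const double binOffset = dev * n / (2.0 * pi * hop);
    return (k + binOffset) * fs / n;
}
```

With N = 1024 and hop = 200 the offset is unambiguous only within about ±2.5 bins (N / (2·hop)), which covers the main lobe of a Hann-windowed peak.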

So, can anyone explain the algorithm used by Performous to determine the pitch?

1 answer

Performous extracts the pitch from the microphone input, and its code is open source. Here is a description of what the algorithm does, from the developer who wrote it (Tronic on irc.freenode.net #performous):

  • PCM input (buffered)
  • FFT (1024 samples at a time; afterwards 200 samples are removed from the front of the buffer)
  • Reassignment method (against the previous FFT, which was taken 200 samples earlier)
  • Peak filtering (this part could be done much better or even left out)
  • Combining peaks into sets of harmonics (we call such a combination a tone)
  • Temporal filtering of tones (update the set of tones detected earlier instead of simply using the newly detected ones)
  • Choosing the best vocal tone (frequency limits, weighting; the harmonic array could also be used, but I don't think we do that)
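The first two steps (buffered PCM input, 1024-sample FFT frames with 200 samples dropped from the front after each) amount to a sliding window with a hop of 200 samples. A minimal sketch of just the buffering, assuming mono float samples; FrameBuffer is a hypothetical name, not a Performous class:

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

// Hypothetical helper: accumulates PCM samples and yields overlapping frames of
// `frameSize` samples, dropping `hop` samples from the front after each frame.
class FrameBuffer {
public:
    explicit FrameBuffer(std::size_t frameSize = 1024, std::size_t hop = 200)
        : frameSize_(frameSize), hop_(hop) {
        assert(hop > 0 && hop <= frameSize);
    }

    // Append newly captured samples (e.g. one audio callback's worth).
    void push(const std::vector<float>& samples) {
        buf_.insert(buf_.end(), samples.begin(), samples.end());
    }

    // Copy the next full frame into `frame` and advance by `hop`;
    // returns false if not enough samples have been buffered yet.
    bool nextFrame(std::vector<float>& frame) {
        if (buf_.size() < frameSize_) return false;
        const auto frameEnd = buf_.begin() + static_cast<std::ptrdiff_t>(frameSize_);
        frame.assign(buf_.begin(), frameEnd);
        buf_.erase(buf_.begin(), buf_.begin() + static_cast<std::ptrdiff_t>(hop_));
        return true;
    }

private:
    std::size_t frameSize_;
    std::size_t hop_;
    std::deque<float> buf_;
};
```

Each returned frame would then be windowed and passed to the FFT; consecutive frames overlap by 824 samples.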
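The harmonic-grouping and tone-selection steps can be sketched roughly as follows. The scoring here is my own simplification (each peak near an integer multiple n of a candidate fundamental adds level / n), not Performous's weighting; Peak, harmonicScore and bestVocalTone are hypothetical, as are the 3% tolerance and the 65 to 1000 Hz vocal range.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

struct Peak {
    double freq;  // Hz (ideally already refined, e.g. via phase)
    double level; // linear magnitude
};

// Score a candidate fundamental f0: every peak within `tolerance` (relative)
// of an integer multiple n*f0 contributes level / n.
double harmonicScore(const std::vector<Peak>& peaks, double f0, double tolerance = 0.03) {
    double score = 0.0;
    for (const Peak& p : peaks) {
        const double n = std::round(p.freq / f0);
        if (n < 1.0 || n > 16.0) continue; // ignore sub-fundamental and very high harmonics
        if (std::abs(p.freq - n * f0) <= tolerance * n * f0) score += p.level / n;
    }
    return score;
}

// Try each peak, and its submultiples f/2, f/3, f/4, as the fundamental;
// keep only candidates inside [fmin, fmax] and return the best one (0 if none).
double bestVocalTone(const std::vector<Peak>& peaks, double fmin = 65.0, double fmax = 1000.0) {
    assert(fmin > 0.0 && fmin < fmax);
    double best = 0.0;
    double bestScore = 0.0;
    for (const Peak& p : peaks) {
        for (int div = 1; div <= 4; ++div) {
            const double f0 = p.freq / div;
            if (f0 < fmin || f0 > fmax) continue;
            const double s = harmonicScore(peaks, f0);
            if (s > bestScore) { bestScore = s; best = f0; }
        }
    }
    return best;
}
```

With peaks at 220, 440, 660 and 880 Hz this picks 220 Hz. Dividing each contribution by n penalises candidates an octave too low: a candidate at f0/2 matches the same peaks only at even multiples, so it gets exactly half the score.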
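For the temporal filtering step, one simple interpretation is to match each newly detected tone to a tone from earlier frames, smooth its frequency, and forget tones that have gone unseen for a few frames. ToneTracker and its parameters (3% matching tolerance, smoothing factor 0.3, three missed frames) are illustrative assumptions, not taken from Performous:

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical tone state kept across frames.
struct TrackedTone {
    double freq; // smoothed frequency in Hz
    int age;     // number of frames in which this tone was detected
    int missed;  // consecutive frames without a match
};

class ToneTracker {
public:
    explicit ToneTracker(double tolerance = 0.03, double alpha = 0.3, int maxMissed = 3)
        : tolerance_(tolerance), alpha_(alpha), maxMissed_(maxMissed) {
        assert(alpha > 0.0 && alpha <= 1.0);
    }

    // Feed the fundamentals detected in the current frame.
    void update(const std::vector<double>& detected) {
        for (TrackedTone& t : tones_) ++t.missed;
        for (double f : detected) {
            TrackedTone* match = nullptr;
            for (TrackedTone& t : tones_)
                if (std::abs(f - t.freq) <= tolerance_ * t.freq) { match = &t; break; }
            if (match) {
                match->freq += alpha_ * (f - match->freq); // exponential smoothing
                ++match->age;
                match->missed = 0;
            } else {
                tones_.push_back({f, 1, 0}); // start tracking a new tone
            }
        }
        // Forget tones that have not been seen for too long.
        std::vector<TrackedTone> kept;
        for (const TrackedTone& t : tones_)
            if (t.missed <= maxMissed_) kept.push_back(t);
        tones_.swap(kept);
    }

    const std::vector<TrackedTone>& tones() const { return tones_; }

private:
    double tolerance_;
    double alpha_;
    int maxMissed_;
    std::vector<TrackedTone> tones_;
};
```

Choosing from these persistent, smoothed tones rather than from each frame's raw detections should keep the reported pitch from jumping when a single frame is noisy.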

I still haven't managed to work out how to implement it from this information. If anyone manages it, please post your results here and comment on this answer so that SO notifies me.

The challenge would be to create a minimal C++ wrapper around this code.


Source: https://habr.com/ru/post/1337863/

