Detect the beginning of sound or voice in Android

I would like to listen to the microphone (using AudioRecord, I think) and perform some action at the very moment a person begins to speak. I know that I can record audio with AudioRecord, but how do I analyze it?

1 answer

Well, the tricky part is getting the phone to recognize that what it hears is a voice. You could use the voice recognition system as the input instead of the raw microphone, but I don't think that will do it, because (in fact, I commented on exactly this yesterday) the phone doesn't actually do the recognition itself. It just opens a live stream (much like a phone call) to Google's servers, and they do the recognizing.

In addition, the information I have found so far suggests that Android does not support analyzing live sound from the microphone. All those other applications that appear to work "live" actually just take a series of short samples and analyze them very quickly, so that they seem live. A 500 millisecond sample taken every 300 milliseconds seems to be common.
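To make the "500 ms sample every 300 ms" idea concrete, here is a minimal sketch of how such overlapping windows could be laid out over a buffer of samples. The class and method names are made up for illustration, and the 8000 Hz sample rate in the usage note is only an assumption; on a device the samples would come from something like AudioRecord.read(short[], int, int).

```java
final class WindowSketch {
    /**
     * Returns {start, endExclusive} sample index pairs for windows of
     * windowMs milliseconds that begin every hopMs milliseconds.
     * Only windows that fit entirely inside the buffer are returned.
     */
    static int[][] windowRanges(int totalSamples, int sampleRate, int windowMs, int hopMs) {
        int window = sampleRate * windowMs / 1000; // window length in samples
        int hop = sampleRate * hopMs / 1000;       // distance between window starts
        if (window <= 0 || hop <= 0 || totalSamples < window) {
            return new int[0][];
        }
        int count = (totalSamples - window) / hop + 1;
        int[][] ranges = new int[count][];
        for (int i = 0; i < count; i++) {
            int start = i * hop;
            ranges[i] = new int[] {start, start + window};
        }
        return ranges;
    }
}
```

With two seconds of audio at 8000 Hz (16000 samples), windowRanges(16000, 8000, 500, 300) gives six windows of 4000 samples each, starting 2400 samples apart, so consecutive windows overlap by 200 ms.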

Fortunately, alongside my programming work I am also a sound engineer, so I can tell you that (if you are willing to do the work) there is a way to detect an actual voice, not just sound. Every voice is made up of several distinct frequency ratios that combine into the voice we hear. The ratios common to all voices stay fairly constant, while each individual's ratios differ (which is why voice-based passwords work). So, if you could take a sample, split it into frequency bands of about 10 Hz each, and monitor the amplitude of each band, then as soon as you saw a frequency/amplitude pattern that looked like a voice and not just "white noise", you would be in business. Actually doing that, however easy it may sound, is another matter. Something similar has been done before by the SpectralView application, which displays the entire sound spectrum.
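As a rough illustration of splitting a sample into roughly 10 Hz bands, the sketch below computes a power spectrum with a plain (slow) discrete Fourier transform and reports how much of the energy falls inside a chosen band. It is not the answerer's method and it does not recognize a voice: the class name, the method names and the 300 to 3400 Hz "speech" band used in the usage note are all assumptions, and noise inside that band would pass this check just as well. A frame of 800 samples at 8000 Hz gives bins exactly 10 Hz wide.

```java
final class VoiceBandSketch {
    /** Power of each DFT bin from 0 up to n/2 (naive O(n^2) transform). */
    static double[] powerSpectrum(double[] x) {
        int n = x.length;
        double[] power = new double[n / 2 + 1];
        for (int k = 0; k <= n / 2; k++) {
            double re = 0;
            double im = 0;
            for (int t = 0; t < n; t++) {
                double angle = 2 * Math.PI * k * t / n;
                re += x[t] * Math.cos(angle);
                im -= x[t] * Math.sin(angle);
            }
            power[k] = re * re + im * im;
        }
        return power;
    }

    /** Fraction of non-DC energy lying between loHz and hiHz, in [0, 1]. */
    static double bandEnergyRatio(double[] x, int sampleRate, double loHz, double hiHz) {
        double[] power = powerSpectrum(x);
        double binHz = (double) sampleRate / x.length; // width of one bin in Hz
        double inBand = 0;
        double total = 0;
        for (int k = 1; k < power.length; k++) { // k = 0 is the DC offset, skip it
            double freq = k * binHz;
            total += power[k];
            if (freq >= loHz && freq <= hiHz) {
                inBand += power[k];
            }
        }
        return total == 0 ? 0 : inBand / total;
    }
}
```

For a pure 1000 Hz tone, bandEnergyRatio(frame, 8000, 300, 3400) comes out very close to 1, while a 100 Hz hum comes out very close to 0. Telling a real voice from other in-band sound would need the pattern matching described above; a production version would also use an FFT rather than this quadratic loop.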

In addition, as you can see when using Voice Search, a voice also fluctuates a great deal in loudness. You could look for that, but it would not be as reliable.
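For completeness, the loudness approach can be sketched as a simple RMS threshold over fixed-size frames. This is the less reliable option described above: any loud sound will trigger it, not only speech. The class name, method names and the threshold value in the usage note are assumptions for illustration; the short[] buffer stands in for 16-bit PCM data such as AudioRecord.read(short[], int, int) would fill.

```java
final class LoudnessSketch {
    /** Root-mean-square amplitude of samples[offset .. offset + length). */
    static double rms(short[] samples, int offset, int length) {
        double sumSquares = 0;
        for (int i = offset; i < offset + length; i++) {
            sumSquares += (double) samples[i] * samples[i];
        }
        return Math.sqrt(sumSquares / length);
    }

    /**
     * Index of the first complete frame whose RMS exceeds the threshold,
     * or -1 if no frame does.
     */
    static int firstLoudFrame(short[] samples, int frameLength, double threshold) {
        for (int frame = 0; (frame + 1) * frameLength <= samples.length; frame++) {
            if (rms(samples, frame * frameLength, frameLength) > threshold) {
                return frame;
            }
        }
        return -1;
    }
}
```

Calling firstLoudFrame(buffer, 100, 1000.0) on a buffer that is silent for its first 400 samples and loud afterwards returns frame 4. The threshold would have to be tuned per device and environment.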

In conclusion, how do you analyze it? Well, you have to look for a pattern of frequencies that looks like a voice. How do you do that? Well, to be honest, I don't know for sure. I'm sorry.


Source: https://habr.com/ru/post/1335117/

