Voice Trigger Detection

I have a voice application that would be greatly improved if a spoken "trigger word" could start the audio recording. I don't need a full speech-to-text engine, just the ability to detect the trigger word reliably and efficiently.

Are there any specialized speech engines that support this particular use case, or any libraries or methods for building such a single-word detector? Ideally it would work in noisy environments, and it is fine if it has to be trained for a single user's voice.

Pointers to scientific papers or topics would also be appreciated, so I know what to search for.

+3
5 answers

A colleague of mine on the Red5 project built a similar demo that used trigger words to run a search against an image repository. Saying "cat" brought up a cat image for about a second. The client application was written in Flash and the back end in Red5, using the free Sphinx library. You could certainly do what you want with Sphinx without much effort.

Sphinx Project: http://cmusphinx.sourceforge.net/sphinx4/
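
To make the idea concrete, here is a minimal sketch of the gating step only, assuming you already have a recognizer (Sphinx or anything else) that hands you text hypotheses one at a time. The function name `gate_on_trigger` and the sample phrases are made up for illustration and are not part of Sphinx or Red5; the list of strings simply stands in for recognizer output.

```python
from typing import Iterable, Iterator, List


def gate_on_trigger(hypotheses: Iterable[str], trigger: str) -> Iterator[str]:
    """Yield only the hypotheses that arrive after the trigger word.

    `hypotheses` is a hypothetical stand-in for the text output of a
    speech recognizer; producing that text is outside this sketch.
    """
    armed = False
    wanted = trigger.lower()
    for text in hypotheses:
        words = text.lower().split()
        if not armed:
            if wanted in words:
                armed = True  # trigger heard: capture starts with the next item
            continue
        yield text


# Simulated recognizer output (assumed data, not real Sphinx results).
heard = ["background chatter", "okay computer", "take a note", "buy milk"]
captured: List[str] = list(gate_on_trigger(heard, "computer"))
print(captured)  # ['take a note', 'buy milk']
```

In a real application the "capture" step would start the audio recorder instead of collecting strings, and you would probably want a stop word or a timeout as well.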

+2

, ( ) , 80% . 80% - /. threshold , .


+1

O/S? , , Windows Vista. .

0

Linux. , , , , . , joeforker, .

0

win32. OCX /.

I know this is not exactly the solution you are asking for, but you might consider a foot pedal. It is easy to program and would work much like a spoken word for starting and stopping the recording. Check them out: www.pedalpower.com

Hope this helps,

Reynaldo.

0

Source: https://habr.com/ru/post/1708950/

