I have a voice application that would be greatly improved if a “trigger word” could be used to start recording audio. I don’t need a full speech-to-text engine, just the ability to detect the trigger word reliably and efficiently.
I am wondering whether any specialized speech engines support this particular use case, or whether there are libraries or methods for building such a single-word detector. Ideally it would work in noisy environments, and it is acceptable if it has to be trained for a single user's voice.
Pointers to scientific papers or topics would also be appreciated, so I know what to search for.