Speech Recognition API - High Accuracy for Specific Phrases

I have some ideas for voice-controlled applications. Unfortunately, judging by what I have seen from Siri and Google Voice Actions, the technology is not quite there yet. Even in a completely quiet environment, accuracy is so poor that it is often easier to just type on the phone.

One way to alleviate the problem would be to limit the system to a few commands specifically chosen to sound very different from one another, as opposed to sending audio to a service and simply getting text back.
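To make the idea concrete, here is a minimal Python sketch of a closed command set. It is not real grammar-constrained recognition: it assumes some recognizer has already produced a free-form transcript, and it only snaps that text onto the nearest allowed command using the standard library's `difflib`. The command list, the `match_command` name, and the 0.6 cutoff are all hypothetical choices for illustration.

```python
import difflib

# Hypothetical command set, chosen so the phrases share few words or sounds.
COMMANDS = ["call home", "play music", "stop", "next track"]


def match_command(transcript, commands=COMMANDS, cutoff=0.6):
    """Return the closest allowed command, or None if nothing is close enough.

    This is plain text similarity applied after recognition, not
    acoustic matching inside the recognizer.
    """
    # Normalize case and whitespace before comparing.
    cleaned = " ".join(transcript.lower().split())
    matches = difflib.get_close_matches(cleaned, commands, n=1, cutoff=cutoff)
    return matches[0] if matches else None


if __name__ == "__main__":
    for heard in ["play musik", "  STOP ", "what is the weather"]:
        print(repr(heard), "->", match_command(heard))
```

Rejecting input that falls below the cutoff matters as much as matching: a small command set only helps accuracy if out-of-vocabulary speech is refused rather than forced onto the nearest command.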

So, I have the following requirements:

  • Very high accuracy when recognizing a limited set of commands.
  • Ideally it works on mobile devices, but desktop-only libraries could also be useful.
  • Offline operation is preferred, but not required.
  • It does not need to be open source; a commercial license is fine.

Is there such an API or software?

+4
4 answers

I recently participated in a project to develop a platform for grammar-based mobile speech recognition applications with the following features:

All components are open source, and it should not be too complicated to set up your own server and port the system to your language, provided you have acoustic models for that language.

+4

VoiceXML and SRGS can be a good starting point for your search. There is not much in the open-source world, unfortunately, because getting this kind of thing “right” means big money.

+1

Using a speech recognition system that supports grammars (SRGS) will increase your recognition rate. Grammars limit the search space by defining the expected words and phrases as rules that the recognizer matches against, and can therefore improve both recognition accuracy and speed.
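As a rough illustration of what such a grammar looks like, the Python sketch below generates a minimal SRGS XML grammar with a single public rule that accepts exactly one phrase from a fixed list. The `build_srgs` helper, the rule name `command`, and the example phrases are assumptions made for this sketch; how a grammar is loaded depends entirely on the recognizer you use.

```python
import xml.etree.ElementTree as ET

SRGS_NS = "http://www.w3.org/2001/06/grammar"
XML_NS = "http://www.w3.org/XML/1998/namespace"


def build_srgs(phrases, rule_id="command", lang="en-US"):
    """Build a minimal SRGS XML grammar accepting exactly one of `phrases`."""
    # Serialize SRGS elements with a default namespace instead of ns0: prefixes.
    ET.register_namespace("", SRGS_NS)
    grammar = ET.Element(
        f"{{{SRGS_NS}}}grammar",
        {
            "version": "1.0",
            "root": rule_id,   # the rule the recognizer starts from
            "mode": "voice",
            f"{{{XML_NS}}}lang": lang,
        },
    )
    rule = ET.SubElement(
        grammar, f"{{{SRGS_NS}}}rule", {"id": rule_id, "scope": "public"}
    )
    # <one-of> lists the alternatives; each <item> is one allowed phrase.
    one_of = ET.SubElement(rule, f"{{{SRGS_NS}}}one-of")
    for phrase in phrases:
        item = ET.SubElement(one_of, f"{{{SRGS_NS}}}item")
        item.text = phrase
    return ET.tostring(grammar, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical command phrases.
    print(build_srgs(["call home", "play music", "stop"]))
```

Anything not listed under `<one-of>` falls outside the grammar, which is what narrows the recognizer's search space.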

VoiceXML is a good language for developing speech applications that use the phone as the means of interaction. By that I mean the user dials an IVR system, which answers the call and then interacts with the user through recorded audio prompts, taking input by speech or keypad. VoiceXML is not intended for mobile applications with visual interfaces, such as a native Android application or a web application. To develop visual applications that use speech, you can use something like the “Nuance Mobile Tool,” which can carry a hefty price tag, or something open source, like Sphinx.

+1

Most cloud-based speech recognition APIs (Google, AT&T, Siri, etc.) do not allow the use of standard SRGS grammars to improve accuracy, which is really unfortunate.

One possibility is to combine two technologies from Voxeo, namely Tropo and Phono. The first is an API-based voice platform that is much easier to use than a VoiceXML platform, and the second is a jQuery plugin for placing (and managing) voice calls from your browser. Tropo supports SRGS grammars.

+1

Source: https://habr.com/ru/post/1435820/

