Accuracy of MS System.Speech.Recognizer and SpeechRecognitionEngine

I am currently testing SpeechRecognitionEngine by loading a fairly simple rule from an XML file. In fact, it just chooses between (“decrypt email”, “remove encryption”) and (“encrypt email”, “add encryption”).

I trained speech recognition on my Windows 7 PC and additionally added the words encryption and decryption, since I realize they sound very similar. The recognizer already has trouble telling the two apart.

The problem I am facing is that it recognizes things far too often. I set the confidence threshold to 0.93 because, with my own voice in a quiet room, saying the exact words sometimes only reaches 0.93. But if I then turn on the radio, the announcer's voice or a song can make the recognizer believe, with a confidence above 0.93, that it heard the words "decrypt the email".

Maybe Lady Gaga's Applause is a disguise for secretly decrypting emails :-)

Can anyone help me work out how to make this recognizer usable?

In fact, the recognizer also picks up keyboard noise as "decrypt email." I do not understand how this is possible.

Further to an edit made to this question: there are at least two managed namespaces for MS speech recognition, Microsoft.Speech and System.Speech. For this question, it is important to know that the one in use is System.Speech.

1 answer

If the only thing the System.Speech recognizer is listening for is "encrypt email", then the recognizer will generate a lot of false positives (especially in a noisy environment). If you load a DictationGrammar (in particular, a pronunciation grammar) in parallel, the DictationGrammar will absorb the noise, and you can check, for example, the name of the grammar in the event handler to discard the bogus recognitions.

A (subset) example:

    using System.Globalization;
    using System.Speech.Recognition;

    class Program
    {
        static void Main(string[] args)
        {
            // The commands we actually care about.
            Choices gb = new Choices();
            gb.Add("encrypt the document");
            gb.Add("decrypt the document");
            Grammar commands = new Grammar(gb);
            commands.Name = "commands";

            // Catch-all grammar that absorbs noise and unrelated speech.
            DictationGrammar dg = new DictationGrammar("grammar:dictation#pronunciation");
            dg.Name = "Random";

            using (SpeechRecognitionEngine recoEngine = new SpeechRecognitionEngine(new CultureInfo("en-US")))
            {
                recoEngine.SetInputToDefaultAudioDevice();
                recoEngine.LoadGrammar(commands);
                recoEngine.LoadGrammar(dg);
                recoEngine.RecognizeCompleted += recoEngine_RecognizeCompleted;
                recoEngine.RecognizeAsync();
                System.Console.ReadKey(true);
                recoEngine.RecognizeAsyncStop();
            }
        }

        static void recoEngine_RecognizeCompleted(object sender, RecognizeCompletedEventArgs e)
        {
            // Result is null when nothing was recognized; ignore matches from the catch-all grammar.
            if (e.Result != null && e.Result.Grammar.Name != "Random")
            {
                System.Console.WriteLine(e.Result.Text);
            }
        }
    }
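
The same idea can be combined with the confidence threshold from the question and run continuously. The following is only a sketch under assumptions, not a tested recipe: the CommandFilter helper, the grammar name "noise", and the 0.93 cut-off (borrowed from the question) are illustrative choices, and the phrases are the ones the question mentions. It uses the SpeechRecognized event with RecognizeMode.Multiple so that the engine keeps listening after each result.

```csharp
using System;
using System.Globalization;
using System.Speech.Recognition;

// Hypothetical helper: keeps the accept/reject decision separate from the audio code.
static class CommandFilter
{
    // Assumed values for illustration; tune them for your own setup.
    public const float MinConfidence = 0.93f;
    public const string CommandGrammarName = "commands";

    // Accept a result only if it came from the command grammar
    // and its confidence reaches the threshold.
    public static bool ShouldAccept(string grammarName, float confidence)
    {
        return grammarName == CommandGrammarName && confidence >= MinConfidence;
    }
}

static class Program
{
    static void Main()
    {
        // Command grammar with the phrases from the question.
        var commands = new Grammar(new GrammarBuilder(new Choices("encrypt email", "decrypt email")));
        commands.Name = CommandFilter.CommandGrammarName;

        // Catch-all dictation grammar that absorbs noise and unrelated speech.
        var noise = new DictationGrammar("grammar:dictation#pronunciation");
        noise.Name = "noise";

        using (var engine = new SpeechRecognitionEngine(new CultureInfo("en-US")))
        {
            engine.SetInputToDefaultAudioDevice();
            engine.LoadGrammar(commands);
            engine.LoadGrammar(noise);

            engine.SpeechRecognized += (sender, e) =>
            {
                // Drop anything matched by the noise grammar or below the threshold.
                if (CommandFilter.ShouldAccept(e.Result.Grammar.Name, e.Result.Confidence))
                {
                    Console.WriteLine("{0} ({1:F2})", e.Result.Text, e.Result.Confidence);
                }
            };

            // Keep recognizing until a key is pressed.
            engine.RecognizeAsync(RecognizeMode.Multiple);
            Console.ReadKey(true);
            engine.RecognizeAsyncStop();
        }
    }
}
```

Keeping the decision in one small function makes it easy to log rejected results while tuning the threshold, without touching the recognizer setup.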

Source: https://habr.com/ru/post/1502348/