Ultra fast text to speech (WAV → MP3) in ASP.NET MVC

This question is mainly about whether the Microsoft Speech API (SAPI) is suitable for server workloads and whether it can be used reliably inside w3wp for speech synthesis. We have an asynchronous controller that uses the built-in System.Speech assembly in .NET 4 (not Microsoft.Speech, which ships as part of the Microsoft Speech Platform - Runtime Version 11) and lame.exe to generate MP3 files, as follows:

    [CacheFilter]
    public void ListenAsync(string url)
    {
        string fileName = string.Format(@"C:\test\{0}.wav", Guid.NewGuid());
        try
        {
            // Synthesis runs on a separate thread; t.Join() below waits for it.
            var t = new System.Threading.Thread(() =>
            {
                using (SpeechSynthesizer ss = new SpeechSynthesizer())
                {
                    ss.SetOutputToWaveFile(fileName, new SpeechAudioFormatInfo(22050, AudioBitsPerSample.Eight, AudioChannel.Mono));
                    ss.Speak("Here is a test sentence...");
                    ss.SetOutputToNull();
                    ss.Dispose();
                }

                // Encode WAV -> MP3 with LAME; the Exited handler deletes the WAV
                // and completes the async action.
                var process = new Process() { EnableRaisingEvents = true };
                process.StartInfo.FileName = Path.Combine(AppDomain.CurrentDomain.BaseDirectory, @"bin\lame.exe");
                process.StartInfo.Arguments = string.Format("-V2 {0} {1}", fileName, fileName.Replace(".wav", ".mp3"));
                process.StartInfo.UseShellExecute = false;
                process.StartInfo.RedirectStandardOutput = false;
                process.StartInfo.RedirectStandardError = false;
                process.Exited += (sender, e) =>
                {
                    System.IO.File.Delete(fileName);
                    AsyncManager.OutstandingOperations.Decrement();
                };
                AsyncManager.OutstandingOperations.Increment();
                process.Start();
            });
            t.Start();
            t.Join();
        }
        catch { }
        AsyncManager.Parameters["fileName"] = fileName;
    }

    public FileResult ListenCompleted(string fileName)
    {
        return base.File(fileName.Replace(".wav", ".mp3"), "audio/mp3");
    }

The question is: why does SpeechSynthesizer need to run on a separate thread like this in order to return (this is also reported elsewhere on SO, here and here), and would implementing an STAThreadRouteHandler for this request be more efficient or scalable than the approach above?
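One way to stop blocking the request thread in t.Join() is to hand the result back as a Task. The sketch below is an illustration under assumptions, not the STAThreadRouteHandler approach and not a documented System.Speech pattern: StaRunner is a hypothetical name, and awaiting the task from an action requires async controllers (MVC 4 on .NET 4.5 or later). Each call still occupies one dedicated thread for as long as the work runs, so it frees the ASP.NET request thread but does not make synthesis itself any cheaper.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper (not part of System.Speech or ASP.NET): runs a delegate on a
// dedicated thread and exposes the result as a Task instead of blocking on Join().
public static class StaRunner
{
    public static Task<T> Run<T>(Func<T> work)
    {
        var tcs = new TaskCompletionSource<T>();

        var thread = new Thread(() =>
        {
            try
            {
                tcs.SetResult(work());
            }
            catch (Exception ex)
            {
                // Surface failures through the task rather than losing them on the worker thread.
                tcs.SetException(ex);
            }
        });

        // Request a single-threaded apartment for COM components that expect one.
        // TrySetApartmentState returns false (instead of throwing) where STA is unsupported.
        thread.TrySetApartmentState(ApartmentState.STA);
        thread.IsBackground = true; // do not keep the process alive on shutdown
        thread.Start();

        return tcs.Task;
    }
}
```

In an action this could look like `byte[] wav = await StaRunner.Run(() => SynthesizeToWav(text));`, where SynthesizeToWav is a hypothetical method wrapping the SpeechSynthesizer calls from the question. Per Update #2 below, an ordinary MTA thread also appears to work, so the STA request may be unnecessary for System.Speech.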

Secondly, what are the options for running SpeakAsync in the context of ASP.NET (MVC or WebForms)? None of the options I tried seem to work (see Update below).

Any other suggestions for improving this pattern are welcome (i.e. two dependencies that must be executed in series, each of which has async support). I don't think this scheme is stable under load, especially considering the known memory leaks in SpeechSynthesizer. I am considering running this service on a different stack altogether.
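The "two dependencies that must run in series, each with async support" shape maps directly onto tasks: wrap each step in a Task and start the second one when the first completes. Here is a minimal sketch of the LAME step, assuming .NET 4.5 or later; ProcessRunner is a hypothetical name. It replaces the manual AsyncManager.OutstandingOperations counting with a TaskCompletionSource that the Exited event completes.

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;

// Hypothetical helper: starts an external process (e.g. lame.exe) and returns a Task
// that completes with its exit code when the process's Exited event fires.
public static class ProcessRunner
{
    public static Task<int> RunAsync(string fileName, string arguments)
    {
        var tcs = new TaskCompletionSource<int>();
        var process = new Process
        {
            // Required for the Exited event to be raised.
            EnableRaisingEvents = true,
            StartInfo = new ProcessStartInfo
            {
                FileName = fileName,
                Arguments = arguments,
                UseShellExecute = false,
                CreateNoWindow = true
            }
        };

        process.Exited += (sender, e) =>
        {
            tcs.TrySetResult(process.ExitCode);
            process.Dispose();
        };

        try
        {
            process.Start();
        }
        catch (Exception ex)
        {
            // e.g. the executable was not found: fail the task instead of hanging forever.
            process.Dispose();
            tcs.TrySetException(ex);
        }

        return tcs.Task;
    }
}
```

Chained after synthesis, the sequence could read `int exitCode = await ProcessRunner.RunAsync(lamePath, string.Format("-V2 \"{0}\" \"{1}\"", wavPath, mp3Path));` followed by deleting the WAV, where lamePath, wavPath and mp3Path are placeholders. Quoting the paths keeps the arguments intact if they ever contain spaces, which the unquoted format string in the question does not handle.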

Update: Neither the Speak nor the SpeakAsync variant works under STAThreadRouteHandler. The former produces:

System.InvalidOperationException: Asynchronous operations are not allowed in this context. Page starting an asynchronous operation has to have the Async attribute set to true and an asynchronous operation can only be started on a page prior to PreRenderComplete event.
    at System.Web.LegacyAspNetSynchronizationContext.OperationStarted()
    at System.ComponentModel.AsyncOperationManager.CreateOperation(Object userSuppliedState)
    at System.Speech.Internal.Synthesis.VoiceSynthesis..ctor(WeakReference speechSynthesizer)
    at System.Speech.Synthesis.SpeechSynthesizer.get_VoiceSynthesizer()
    at System.Speech.Synthesis.SpeechSynthesizer.SetOutputToWaveFile(String path, SpeechAudioFormatInfo formatInfo)

The latter leads to:

System.InvalidOperationException: The asynchronous action method 'Listen' cannot be executed synchronously.
    at System.Web.Mvc.Async.AsyncActionDescriptor.Execute(ControllerContext controllerContext, IDictionary`2 parameters)

It seems that a custom STA thread pool (with [ThreadStatic] instances of the COM objects) is the best approach: http://marcinbudny.blogspot.ca/2012/04/dealing-with-sta-coms-in-web.html
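A minimal sketch of that idea, written from scratch here rather than taken from the linked post: a fixed number of long-lived STA threads take work items from a BlockingCollection (.NET 4.0 or later). Because each thread lives as long as the pool, an expensive COM-backed object such as a SpeechSynthesizer could be created once per thread in a [ThreadStatic] field and reused instead of being created per request. StaWorkerPool is a hypothetical name.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical fixed-size pool of long-lived STA worker threads.
public sealed class StaWorkerPool : IDisposable
{
    private readonly BlockingCollection<Action> _queue = new BlockingCollection<Action>();
    private readonly Thread[] _threads;

    public StaWorkerPool(int threadCount)
    {
        _threads = new Thread[threadCount];
        for (int i = 0; i < threadCount; i++)
        {
            var thread = new Thread(() =>
            {
                // Blocks until work arrives; the loop ends after CompleteAdding().
                foreach (var action in _queue.GetConsumingEnumerable())
                    action();
            });
            thread.IsBackground = true;
            thread.TrySetApartmentState(ApartmentState.STA); // returns false where STA is unsupported
            thread.Start();
            _threads[i] = thread;
        }
    }

    // Queues work for the next free worker and returns a Task for its result.
    public Task<T> Run<T>(Func<T> work)
    {
        var tcs = new TaskCompletionSource<T>();
        _queue.Add(() =>
        {
            // Exceptions go to the task, so a failing item never kills a worker thread.
            try { tcs.SetResult(work()); }
            catch (Exception ex) { tcs.SetException(ex); }
        });
        return tcs.Task;
    }

    public void Dispose()
    {
        // Let queued items finish, then stop the workers.
        _queue.CompleteAdding();
        foreach (var thread in _threads) thread.Join();
        _queue.Dispose();
    }
}
```

The pool size also caps how many syntheses run concurrently, which bounds the load this puts on w3wp; a suitable size would have to be found by load testing.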

Update #2: It appears that System.Speech.SpeechSynthesizer does not need STA processing; it seems to run fine on MTA threads as long as you follow this Start/Join pattern. Here is a new version that correctly uses SpeakAsync (the problem was disposing the synthesizer prematurely!) and splits the WAV generation and the MP3 generation into two separate requests:

    [CacheFilter]
    [ActionName("listen-to-text")]
    public void ListenToTextAsync(string text)
    {
        AsyncManager.OutstandingOperations.Increment();
        var t = new Thread(() =>
        {
            SpeechSynthesizer ss = new SpeechSynthesizer();
            string fileName = string.Format(@"C:\test\{0}.wav", Guid.NewGuid());
            ss.SetOutputToWaveFile(fileName, new SpeechAudioFormatInfo(22050, AudioBitsPerSample.Eight, AudioChannel.Mono));
            ss.SpeakCompleted += (sender, e) =>
            {
                // Dispose only once speaking has completed.
                ss.SetOutputToNull();
                ss.Dispose();
                AsyncManager.Parameters["fileName"] = fileName;
                AsyncManager.OutstandingOperations.Decrement();
            };
            // CustomPromptBuilder and settings are the author's own code, not shown here.
            CustomPromptBuilder pb = new CustomPromptBuilder(settings.DefaultVoiceName);
            pb.AppendParagraphText(text);
            ss.SpeakAsync(pb);
        });
        t.Start();
        t.Join();
    }

    [CacheFilter]
    public ActionResult ListenToTextCompleted(string fileName)
    {
        // Second request: hand the WAV off to the MP3 action.
        return RedirectToAction("mp3", new { fileName = fileName });
    }

    [CacheFilter]
    [ActionName("mp3")]
    public void Mp3Async(string fileName)
    {
        var process = new Process()
        {
            EnableRaisingEvents = true,
            StartInfo = new ProcessStartInfo()
            {
                FileName = Path.Combine(AppDomain.CurrentDomain.BaseDirectory, @"bin\lame.exe"),
                Arguments = string.Format("-V2 {0} {1}", fileName, fileName.Replace(".wav", ".mp3")),
                UseShellExecute = false,
                RedirectStandardOutput = false,
                RedirectStandardError = false
            }
        };
        process.Exited += (sender, e) =>
        {
            // Remove the intermediate WAV once LAME has finished.
            System.IO.File.Delete(fileName);
            AsyncManager.Parameters["fileName"] = fileName;
            AsyncManager.OutstandingOperations.Decrement();
        };
        AsyncManager.OutstandingOperations.Increment();
        process.Start();
    }

    [CacheFilter]
    public ActionResult Mp3Completed(string fileName)
    {
        return base.File(fileName.Replace(".wav", ".mp3"), "audio/mp3");
    }
2 answers

I/O is very expensive on a server. How many concurrent streams of WAV files do you think you can write to the server's hard drive? Why not do all of this in memory and only write the MP3 once it is fully processed? The MP3 is much smaller, so the disk is busy for a much shorter time. You could even change the code to return the stream directly to the user instead of saving the MP3 at all.

How can I use LAME to encode a WAV to an MP3 in C#?
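A minimal sketch of the in-memory variant, assuming LAME's convention that "-" as the input or output file name means standard input or standard output (so the arguments would be along the lines of "-V2 - -"). The WAV bytes go to LAME's stdin and the MP3 is read back from stdout, so nothing is written to disk. PipedProcess.PipeThrough is a hypothetical helper and needs .NET 4.5 or later for CopyToAsync. Stdout is drained concurrently with writing stdin, because a child that fills its output pipe while the parent is still writing would otherwise deadlock both processes.

```csharp
using System;
using System.Diagnostics;
using System.IO;

public static class PipedProcess
{
    // Hypothetical helper: runs an external encoder (e.g. lame.exe with "-V2 - -"),
    // sends `input` to its stdin and returns everything it writes to stdout.
    public static byte[] PipeThrough(string fileName, string arguments, byte[] input)
    {
        var startInfo = new ProcessStartInfo(fileName, arguments)
        {
            UseShellExecute = false,
            RedirectStandardInput = true,
            RedirectStandardOutput = true,
            CreateNoWindow = true
        };

        using (var process = Process.Start(startInfo))
        using (var output = new MemoryStream())
        {
            // Start draining stdout before writing stdin so neither pipe can fill up.
            var copyTask = process.StandardOutput.BaseStream.CopyToAsync(output);

            // Write raw bytes (not text), then close stdin so the encoder sees end-of-input.
            using (var stdin = process.StandardInput)
            {
                stdin.BaseStream.Write(input, 0, input.Length);
            }

            copyTask.Wait();
            process.WaitForExit();
            if (process.ExitCode != 0)
                throw new InvalidOperationException(
                    string.Format("{0} exited with code {1}", fileName, process.ExitCode));
            return output.ToArray();
        }
    }
}
```

The input could come from SpeechSynthesizer.SetOutputToWaveStream with a MemoryStream, as in the other answer, and the result could be returned with `File(mp3Bytes, "audio/mpeg")` (audio/mpeg being the registered MIME type for MP3).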


This question is a bit old, but this is what I'm doing and it works fine:

    public Task<FileStreamResult> Speak(string text)
    {
        return Task.Factory.StartNew(() =>
        {
            using (var synthesizer = new SpeechSynthesizer())
            {
                var ms = new MemoryStream();
                synthesizer.SetOutputToWaveStream(ms);
                synthesizer.Speak(text);
                ms.Position = 0;
                // ms is not disposed here: FileStreamResult disposes it after writing the response.
                return new FileStreamResult(ms, "audio/wav");
            }
        });
    }

Hope this helps someone...


Source: https://habr.com/ru/post/1433259/

