KitKat takes 6 seconds longer than Froyo to respond to TextToSpeech.speak() on first call

On a recent phone running the latest version of Android, the TextToSpeech engine can take about 6 seconds longer to respond to its first call than it does on an old phone.

My test code is shown below. (EDITED: Alternative code for Android 4.0.3 Ice Cream Sandwich, API 15 and above, shown at the end.)

On a 1-year-old Motorola Moto G running 4.4.4 KitKat, it can take more than 7 seconds for the TextToSpeech engine to complete the first speak() call of the word “Started”. Here is the output of my code:

    D/speak﹕ call: 1415501851978
    D/speak﹕ done: 1415501859122, delay: 7144

On a 3-year-old Samsung SGH-T499Y running 2.2 Froyo, the same call takes less than a second to finish speaking:

    D/speak﹕ call: 1415502283050
    D/speak﹕ done: 1415502283900, delay: 850

Is there any way to find out what happens during this 6-second delay?
Is there a way to get the newer (and presumably faster) device to respond more quickly?

    package com.example.speak;

    import android.app.Activity;
    import android.speech.tts.TextToSpeech;
    import android.os.Bundle;
    import android.util.Log;

    import java.util.HashMap;
    import java.util.Locale;

    public class MainActivity extends Activity
            implements TextToSpeech.OnInitListener, TextToSpeech.OnUtteranceCompletedListener {

        private final String TAG = "speak";
        private TextToSpeech tts;
        private long launchTime;

        @Override
        protected void onCreate(Bundle savedInstanceState) {
            super.onCreate(savedInstanceState);
            setContentView(R.layout.activity_main);
            tts = new TextToSpeech(getApplicationContext(), this);
        }

        // Called once the TTS engine has been bound.
        public void onInit(int status) {
            if (status == TextToSpeech.SUCCESS) {
                tts.setOnUtteranceCompletedListener(this);
                tts.setLanguage(Locale.UK);
                ttsSay("Started");
            }
        }

        private void ttsSay(String toSpeak) {
            int mode = TextToSpeech.QUEUE_FLUSH;
            // An utterance ID is required for the completion callback to fire.
            HashMap<String, String> hashMap = new HashMap<String, String>();
            hashMap.put(TextToSpeech.Engine.KEY_PARAM_UTTERANCE_ID, TAG);

            launchTime = System.currentTimeMillis();
            Log.d(TAG, "call: " + launchTime);
            tts.speak(toSpeak, mode, hashMap);
        }

        // Logs the time elapsed between the speak() call and the end of the utterance.
        public void onUtteranceCompleted(String utteranceID) {
            long millis = System.currentTimeMillis();
            Log.d(TAG, "done: " + millis + ", delay: " + (millis - launchTime));
        }
    }

EDIT: Starting with Ice Cream Sandwich 4.0.3 (API 15), Android provides an UtteranceProgressListener that reports when an utterance starts and when it finishes. The following code is incompatible with Froyo:

    package com.example.announceappprogress;

    import android.app.Activity;
    import android.speech.tts.TextToSpeech;
    import android.os.Bundle;
    import android.speech.tts.UtteranceProgressListener;
    import android.util.Log;

    import java.util.HashMap;
    import java.util.Locale;

    public class MainActivity extends Activity implements TextToSpeech.OnInitListener {

        private final String TAG = "speak";
        private TextToSpeech tts;
        private long launchTime;
        private long startTime;

        @Override
        protected void onCreate(Bundle savedInstanceState) {
            super.onCreate(savedInstanceState);
            setContentView(R.layout.activity_main);
            tts = new TextToSpeech(getApplicationContext(), this);
            tts.setOnUtteranceProgressListener(mProgressListener);
        }

        // Called once the TTS engine has been bound.
        public void onInit(int status) {
            if (status == TextToSpeech.SUCCESS) {
                tts.setLanguage(Locale.UK);
                ttsSay("Started");
            }
        }

        private void ttsSay(String toSpeak) {
            int mode = TextToSpeech.QUEUE_FLUSH;
            // An utterance ID is required for the progress callbacks to fire.
            HashMap<String, String> hashMap = new HashMap<String, String>();
            hashMap.put(TextToSpeech.Engine.KEY_PARAM_UTTERANCE_ID, TAG);

            launchTime = System.currentTimeMillis();
            Log.d(TAG, "called: " + launchTime);
            tts.speak(toSpeak, mode, hashMap);
        }

        private final UtteranceProgressListener mProgressListener = new UtteranceProgressListener() {
            // Audio begins: the gap since launchTime is the startup delay.
            @Override
            public void onStart(String utteranceId) {
                startTime = System.currentTimeMillis();
                Log.d(TAG, "started: " + startTime + ", delay: " + (startTime - launchTime));
            }

            @Override
            public void onError(String utteranceId) {} // Do nothing.

            // Audio ends: log the total time and the duration of the speech itself.
            @Override
            public void onDone(String utteranceId) {
                long millis = System.currentTimeMillis();
                Log.d(TAG, "done: " + millis + ", total: " + (millis - launchTime)
                        + ", duration: " + (millis - startTime));
            }
        };
    }

Here is an example of the output that it gives on a Motorola Moto G running on 4.4.4 KitKat:

    D/speak﹕ called: 1415654293442
    D/speak﹕ started: 1415654299287, delay: 5845
    D/speak﹕ done: 1415654299995, total: 6553, duration: 708
1 answer

You probably are not using the same TTS engine on both devices.

The more human-sounding concatenative TTS engines (which you might have installed on your new device) can use hundreds of megabytes of data files to generate speech. Most of these systems need some setup time before the first utterance. Simpler (and more mechanical-sounding) formant-based systems may need only a couple of megabytes, so they load much faster.

An interesting experiment would be to time the second utterance. I predict that it will be faster than the first on your new phone. In addition, the more natural-sounding TTS systems typically have a longer wait between the call to the TTS engine and the start of audio for the utterance. This is especially true for a long sentence, since the system scans the entire sentence to work out the best phrasing before it starts speaking.
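The first-call versus later-call comparison can be sketched in plain Java, off the device. This is only an illustration under stated assumptions: `Engine` and `LazyEngine` are hypothetical stand-ins (not Android APIs), and the one-time "voice data load" is simulated with a sleep rather than measured from a real TTS engine.

```java
import java.util.ArrayList;
import java.util.List;

public class WarmUpTiming {

    /** Hypothetical stand-in for a speech engine; not part of the Android API. */
    public interface Engine {
        void speak(String text);
    }

    /** Fake engine that pays a one-time (simulated) loading cost on its first call. */
    public static class LazyEngine implements Engine {
        private final long loadMillis;
        private boolean loaded = false;

        public LazyEngine(long loadMillis) {
            this.loadMillis = loadMillis;
        }

        @Override
        public void speak(String text) {
            if (!loaded) {
                try {
                    Thread.sleep(loadMillis); // assumption: stands in for loading voice data
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
                loaded = true;
            }
            // A real engine would synthesize audio here.
        }
    }

    /** Times each of `calls` consecutive speak() calls and returns the latencies in ms. */
    public static List<Long> timeCalls(Engine engine, int calls) {
        List<Long> latencies = new ArrayList<>();
        for (int i = 0; i < calls; i++) {
            long t0 = System.nanoTime();
            engine.speak("Started");
            latencies.add((System.nanoTime() - t0) / 1_000_000);
        }
        return latencies;
    }

    public static void main(String[] args) {
        List<Long> latencies = timeCalls(new LazyEngine(200), 3);
        System.out.println("latencies (ms): " + latencies);
    }
}
```

On a real device, the equivalent measurement would presumably call `ttsSay()` a second time from the completion callback in the question's code and compare the two logged delays. If only the first delay is large, that points to engine start-up rather than per-utterance synthesis cost.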

Are you also sure that your new device is not using a cloud-based TTS service? That would add further variables that affect latency.


Source: https://habr.com/ru/post/1206508/

