Saving audio input to the Android Stock speech recognition system

Question


I am trying to save the audio data captured by the Android speech recognition service to a file.

I implement a RecognitionListener as described here: Speech to text on Android

save the data to a buffer as shown here: Capturing sound sent to Google's speech recognition server

and write the buffer to a WAV file, as shown here: Android Write raw bytes to WAVE file for Http streaming

My problem is how to get the right audio parameters to write into the WAV file header. When I play the WAV file, I hear only a strange noise with these parameters:

    short nChannels = 2;  // audio channels
    int sRate = 44100;    // sample rate
    short bSamples = 16;  // bits per sample

or nothing at all with these:

    short nChannels = 1;  // audio channels
    int sRate = 8000;     // sample rate
    short bSamples = 16;  // bits per sample

What confuses me is that, looking at the speech recognition task's parameters in logcat, I first find the PLAYBACK sample rate set to 44100 Hz:

    12-20 14:41:34.007: DEBUG/AudioHardwareALSA(2364): Set PLAYBACK PCM format to S16_LE (Signed 16 bit Little Endian)
    12-20 14:41:34.007: DEBUG/AudioHardwareALSA(2364): Using 2 channels for PLAYBACK.
    12-20 14:41:34.007: DEBUG/AudioHardwareALSA(2364): Set PLAYBACK sample rate to 44100 HZ
    12-20 14:41:34.007: DEBUG/AudioHardwareALSA(2364): Buffer size: 2048
    12-20 14:41:34.007: DEBUG/AudioHardwareALSA(2364): Latency: 46439

and then aInfo.SampleRate = 8000 when it plays the file to be sent to the Google server:

    12-20 14:41:36.152: DEBUG/(2364): PV_Wav_Parser::InitWavParser
    12-20 14:41:36.152: DEBUG/(2364): File open Succes
    12-20 14:41:36.152: DEBUG/(2364): File SEEK End Succes
    ...
    12-20 14:41:36.152: DEBUG/(2364): PV_Wav_Parser::ReadData
    12-20 14:41:36.152: DEBUG/(2364): Data Read buff = RIFF?
    12-20 14:41:36.152: DEBUG/(2364): Data Read = RIFF?
    12-20 14:41:36.152: DEBUG/(2364): PV_Wav_Parser::ReadData
    12-20 14:41:36.152: DEBUG/(2364): Data Read buff = fmt
    ...
    12-20 14:41:36.152: DEBUG/(2364): PVWAVPARSER_OK
    12-20 14:41:36.156: DEBUG/(2364): aInfo.AudioFormat = 1
    12-20 14:41:36.156: DEBUG/(2364): aInfo.NumChannels = 1
    12-20 14:41:36.156: DEBUG/(2364): aInfo.SampleRate = 8000
    12-20 14:41:36.156: DEBUG/(2364): aInfo.ByteRate = 16000
    12-20 14:41:36.156: DEBUG/(2364): aInfo.BlockAlign = 2
    12-20 14:41:36.156: DEBUG/(2364): aInfo.BitsPerSample = 16
    12-20 14:41:36.156: DEBUG/(2364): aInfo.BytesPerSample = 2
    12-20 14:41:36.156: DEBUG/(2364): aInfo.NumSamples = 2258

So how can I find out the correct parameters for saving the sound buffer as a valid WAV file?


android speech-recognition voice-recognition audio wav

mmmx Dec 20 '11 at 23:40


3 answers

Malcolm Smith · Answer 1 · 2012-05-29 21:50

You did not show the code that actually writes the PCM data, so it is hard to diagnose. But if you hear strange sounds, you are most likely writing the data with the wrong endianness or the wrong number of channels. A wrong sample rate only makes the audio play slower or faster; if it sounds completely distorted, the error is probably in the number of channels or the endianness of your byte stream.
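To illustrate why the wrong endianness garbles the sound instead of just changing its speed, here is a small sketch that decodes the same 16-bit bytes both ways with java.nio.ByteBuffer. The class and method names are hypothetical. Read with the wrong byte order, small sample values turn into large, unrelated ones, which is heard as noise.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

class EndiannessDemo {
    /** Decodes 16-bit signed PCM samples from raw bytes using the given byte order. */
    static short[] decode16(byte[] raw, ByteOrder order) {
        ByteBuffer buf = ByteBuffer.wrap(raw).order(order);
        short[] samples = new short[raw.length / 2];
        for (int i = 0; i < samples.length; i++) {
            samples[i] = buf.getShort();
        }
        return samples;
    }

    public static void main(String[] args) {
        // Two small samples (1 and -2) stored little-endian, as in a RIFF/WAV file.
        byte[] raw = {0x01, 0x00, (byte) 0xFE, (byte) 0xFF};
        short[] ok = decode16(raw, ByteOrder.LITTLE_ENDIAN);
        short[] wrong = decode16(raw, ByteOrder.BIG_ENDIAN);
        System.out.println(ok[0] + " " + ok[1]);       // 1 -2
        System.out.println(wrong[0] + " " + wrong[1]); // 256 -257
    }
}
```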

To know for sure, write your bytes straight to a file with no header (raw PCM data). That rules out any errors in writing the file header. Then import the raw data into Audacity, experimenting with different parameters (bit depth, endianness, channels) until you get audio that sounds right (only one combination will). You do this from File -> Import -> Raw Data...
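A minimal sketch of the headerless dump, assuming the chunks arrive through RecognitionListener.onBufferReceived(byte[]) as in the linked question. RawPcmDump and its methods are hypothetical names, not part of any Android API. The bytes are written exactly as received, so Audacity's raw import sees the unmodified stream.

```java
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;

class RawPcmDump {
    // Accumulates chunks, e.g. those passed to onBufferReceived(byte[]).
    private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();

    void append(byte[] chunk) {
        buffer.write(chunk, 0, chunk.length);
    }

    int size() {
        return buffer.size();
    }

    /** Writes the collected bytes with no header, for import as raw data. */
    void writeRaw(File out) throws IOException {
        try (FileOutputStream fos = new FileOutputStream(out)) {
            buffer.writeTo(fos);
        }
    }
}
```

Call append from onBufferReceived and writeRaw once recognition has finished (for example from onResults).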

Once you have determined your byte format this way, you only need to make sure you are setting the header fields correctly. For the file format, see http://www-mmsp.ece.mcgill.ca/Documents/AudioFormats/WAVE/WAVE.html. Alternatively, see these existing Java solutions for writing audio files: Java - reading, managing and writing WAV files, or FMJ, though I suspect they may not be usable on Android.
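Once the format is known, the header itself is small. Below is a sketch of the canonical 44-byte header for uncompressed PCM, following the layout described on the McGill page. WavHeader.build is a hypothetical helper, and it assumes a single fmt chunk followed by a single data chunk. A ByteBuffer set to ByteOrder.LITTLE_ENDIAN takes care of RIFF's byte order.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

class WavHeader {
    /** Builds a 44-byte PCM WAV header; all multibyte fields are little-endian. */
    static byte[] build(int sampleRate, short channels, short bitsPerSample, int dataLen) {
        int blockAlign = channels * bitsPerSample / 8; // bytes per sample frame
        int byteRate = sampleRate * blockAlign;        // bytes per second
        ByteBuffer b = ByteBuffer.allocate(44).order(ByteOrder.LITTLE_ENDIAN);
        b.put("RIFF".getBytes(StandardCharsets.US_ASCII));
        b.putInt(36 + dataLen);          // RIFF chunk size: everything after this field
        b.put("WAVE".getBytes(StandardCharsets.US_ASCII));
        b.put("fmt ".getBytes(StandardCharsets.US_ASCII));
        b.putInt(16);                    // fmt chunk size for plain PCM
        b.putShort((short) 1);           // audio format 1 = PCM
        b.putShort(channels);
        b.putInt(sampleRate);
        b.putInt(byteRate);
        b.putShort((short) blockAlign);
        b.putShort(bitsPerSample);
        b.put("data".getBytes(StandardCharsets.US_ASCII));
        b.putInt(dataLen);               // number of PCM bytes that follow
        return b.array();
    }
}
```

Write the returned 44 bytes followed by the raw PCM bytes. Because dataLen must be known in advance, the simplest approach is to build the header after recording has finished, from the buffer's final size.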

If you need to roll your own WAV/RIFF writer, remember that Java's standard I/O classes (such as DataOutputStream) write multibyte primitives in big-endian order, so any multibyte values you write to the file must have their bytes reversed to match RIFF's little-endian format.
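If you write the fields with java.io.DataOutputStream, one way to get little-endian output is to reverse each value first with Integer.reverseBytes and Short.reverseBytes. The helper names below are hypothetical; the byte patterns in the comments are for 16000 (0x3E80), the byte rate shown in the logcat output above.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

class LittleEndianFields {
    // DataOutputStream always writes big-endian, so swap the bytes first.
    static void writeIntLE(DataOutputStream out, int v) throws IOException {
        out.writeInt(Integer.reverseBytes(v));
    }

    static void writeShortLE(DataOutputStream out, short v) throws IOException {
        out.writeShort(Short.reverseBytes(v));
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        out.writeInt(16000);          // big-endian 00 00 3E 80: wrong for RIFF
        writeIntLE(out, 16000);       // little-endian 80 3E 00 00: what RIFF expects
        writeShortLE(out, (short) 2); // little-endian 02 00
        out.flush();
        for (byte b : bytes.toByteArray()) {
            System.out.printf("%02X ", b);
        }
        System.out.println();
    }
}
```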

chandru · Answer 2 · 2012-07-11 19:35

8000 Hz, little-endian, 16-bit PCM, mono did the trick.
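These settings are consistent with the question's logcat output. As a quick check, here is a hypothetical helper that derives the header's BlockAlign and ByteRate fields from the basic parameters, assuming uncompressed integer PCM:

```java
class PcmFormat {
    final int sampleRate;
    final int channels;
    final int bitsPerSample;

    PcmFormat(int sampleRate, int channels, int bitsPerSample) {
        this.sampleRate = sampleRate;
        this.channels = channels;
        this.bitsPerSample = bitsPerSample;
    }

    /** Bytes per sample frame (one sample for every channel). */
    int blockAlign() {
        return channels * bitsPerSample / 8;
    }

    /** Bytes of audio per second. */
    int byteRate() {
        return sampleRate * blockAlign();
    }

    /** Duration in seconds of a raw PCM payload of the given size. */
    double seconds(long dataBytes) {
        return (double) dataBytes / byteRate();
    }
}
```

For 8000 Hz, mono, 16-bit it gives a block align of 2 and a byte rate of 16000, matching aInfo.BlockAlign and aInfo.ByteRate in the log. Assuming aInfo.NumSamples = 2258 counts mono samples, the data is 4516 bytes, or about 0.28 s of audio; declared as 44100 Hz stereo, the same bytes would last only about 0.026 s.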

Nikolay Shmyrev · Answer 3 · 2016-02-03 15:30

onBufferReceived no longer works in recent versions; instead, see recording/saving audio from a voice recognition intent.
