So it doesn't seem like the SDK accepts text input, but it does accept an audio file as input. It even outputs an audio file.
python -m pushtotalk -i somefile.wav -o outputfile.wav
This made me think, and I wrote a script:
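#!/bin/bash
# Write the question to a text file and convert it to speech
echo "$1" > query.txt
espeak -f query.txt -w audio_query.wav
# Send the spoken question to the Assistant and save the spoken answer
python -m pushtotalk -i audio_query.wav -o audio_response.wav &> pushtotalk.log
# Transcribe the spoken answer and print it
pocketsphinx_continuous -infile audio_response.wav 2> pocketsphinx.log > response.txt
cat response.txt
# Clean up the temporary files
rm response.txt query.txt audio_query.wav audio_response.wav pocketsphinx.log pushtotalk.log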
This is just a shell script, but it can probably also be converted to Python. To use it, save the script as pushtotalk_script.sh and run:
./pushtotalk_script.sh "how tall is mount kilimanjaro?"
I use espeak to turn the text into a wav file. Then, using the SDK helper, you get the answer as audio. You can stop here and just play the answer. Pocketsphinx is a speech recognition tool created by CMU, used here to transcribe the answer back to text. You can find packages for these tools using apt-get, but if you are on OS X, the pocketsphinx package does not work and you will need to use these formulas. Also, there is a Python module that wraps espeak, and there is a repo for pocketsphinx as a Python module, but I cannot post more than two links.
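For anyone who wants the Python version, here is a rough sketch of the same three steps using subprocess. I have not run it against the Assistant. It assumes espeak and pocketsphinx_continuous are on the PATH and that pushtotalk is importable by the same interpreter that runs the script. The function names build_commands and ask are made up for illustration. Unlike the shell script, it discards the tool logs, works in a temporary directory, and stops with an exception if one of the steps fails.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def build_commands(workdir):
    """Return the query text path and the three commands of the pipeline.

    The file names mirror the ones used in the shell script.
    """
    workdir = Path(workdir)
    query_txt = workdir / "query.txt"
    query_wav = workdir / "audio_query.wav"
    response_wav = workdir / "audio_response.wav"
    commands = [
        # Text to speech: read the query file, write a wav file.
        ["espeak", "-f", str(query_txt), "-w", str(query_wav)],
        # Send the spoken query to the Assistant sample, save the spoken answer.
        # Assumption: pushtotalk is installed for this same interpreter.
        [sys.executable, "-m", "pushtotalk",
         "-i", str(query_wav), "-o", str(response_wav)],
        # Speech to text: the transcript goes to stdout, the logs to stderr.
        ["pocketsphinx_continuous", "-infile", str(response_wav)],
    ]
    return query_txt, commands


def ask(query):
    """Run the whole pipeline and return the transcribed answer."""
    # The temporary directory replaces the explicit rm at the end of the script.
    with tempfile.TemporaryDirectory() as workdir:
        query_txt, commands = build_commands(workdir)
        query_txt.write_text(query + "\n")
        result = None
        for command in commands:
            # check=True raises CalledProcessError if a step exits non-zero.
            result = subprocess.run(
                command,
                stdout=subprocess.PIPE,
                stderr=subprocess.DEVNULL,
                universal_newlines=True,
                check=True,
            )
        # Only the output of the last step (the transcript) is kept.
        return result.stdout.strip()
```

Calling print(ask("how tall is mount kilimanjaro?")) should then behave like running the shell script with the same question.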
The Google Assistant doesn't seem to be very good at understanding espeak's output, and Pocketsphinx has a few problems transcribing the response. But it works well for simple answers. Depending on the length of the question and of the answer's audio file, the whole process takes from 5 to 10 seconds.