Siri-like voice chat with Raspberry Pi : keep kids busy for a while :)

Here is a fun project to try out: a Siri-like voice chat with your Raspberry Pi. It's a lovely way to attract youngsters and keep them entertained for a while. I use three components for the project; the code is mainly scraped together from various Internet sources:

1. A speech-to-text component that does the voice recognition
2. Some “brains” to analyze the captured text
3. A text-to-speech component that speaks out the result from component 2

The hardware required is a Raspberry Pi with Internet connectivity and a USB microphone. My Pi is running the 2012-12-16-wheezy-raspbian image. I don’t have a USB microphone, but I have a USB webcam (Logitech V-UAV35) with a built-in microphone, and that worked fine without any driver installation.

Speech recognition on the Raspberry Pi can be done in a number of ways, but I thought the most elegant would be to use Google’s voice recognition service. I used this bash script to get that part done (source):

#!/bin/bash
arecord -D "plughw:1,0" -q -f cd -t wav | ffmpeg -loglevel panic -y -i - -ar 16000 -acodec flac file.flac  > /dev/null 2>&1
wget -q -U "Mozilla/5.0" --post-file file.flac --header "Content-Type: audio/x-flac; rate=16000" -O - "http://www.google.com/speech-api/v1/recognize?lang=en-us&client=chromium" | cut -d\" -f12  >stt.txt
cat stt.txt
rm file.flac  > /dev/null 2>&1

Save it as stt.sh and make it executable:

chmod +x stt.sh

You may need to install ffmpeg:

sudo apt-get install ffmpeg

This records audio from the USB microphone until you press Ctrl+C, converts it to a 16 kHz FLAC file, and then posts that file to Google for analysis, which in turn returns the recognized text. Let's give it a try:

It works pretty well, even with my bad accent. The output is saved to the stt.txt file.
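
To see why `cut -d\" -f12` picks out the recognized text, here is a small Python 3 sketch. It assumes the service answers with a single-line JSON body shaped like `sample` below; that layout is an assumption and the values are made up for illustration. The helper names `field_12` and `best_utterance` are hypothetical.

```python
import json

# Made-up example of a response body; the field layout is an assumption.
sample = ('{"status":0,"id":"abc123","hypotheses":'
          '[{"utterance":"what time is it","confidence":0.9}]}')


def field_12(body):
    # Roughly what `cut -d\" -f12` does: split on double quotes
    # and take the 12th field (index 11), or nothing if it is missing.
    fields = body.split('"')
    return fields[11] if len(fields) >= 12 else ""


def best_utterance(body):
    # Parse the body as JSON instead of counting quote characters.
    try:
        hypotheses = json.loads(body).get("hypotheses", [])
    except ValueError:
        return ""
    return hypotheses[0].get("utterance", "") if hypotheses else ""


print(field_12(sample))
print(best_utterance(sample))
```

Counting quotes only works while the fields keep that exact order; parsing the JSON is less fragile if the layout ever shifts.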

Now onto the “brains” section. This is without a doubt a task for Wolfram Alpha. I used Python to interface with it; there is already a library for that. It is pretty easy to install, just follow the instructions in the link. I had to get an API key, which is a two-minute task and gives you 2000 queries a month.

#!/usr/bin/python
import sys

import wolframalpha

# Get a free API key here: http://products.wolframalpha.com/api/
# I may disable this key if I see lots of abuse
app_id = 'Q59EW4-7K8AHE858R'

client = wolframalpha.Client(app_id)

# Treat all command-line arguments as one question
query = ' '.join(sys.argv[1:])
res = client.query(query)

# Materialize the pods so they can be counted and indexed
pods = list(res.pods)

# pods[1] is read below, so at least two pods are needed
if len(pods) > 1:
    pod = pods[1]
    if pod.text:
        texts = pod.text
    else:
        texts = "I have no answer for that"
    print(texts)
else:
    print("I am not sure")
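
The fallback logic in the script can be tried without an API key. This Python 3 sketch uses plain strings (or `None`) as stand-ins for pod texts. `pick_answer` is a hypothetical helper, and the idea that the second pod holds the result (the first typically being the input interpretation) is an assumption carried over from the `pods[1]` lookup in the script.

```python
def pick_answer(pod_texts):
    # pod_texts: one entry per pod, either a string or None.
    # The second pod is read, so fewer than two pods means no answer.
    if len(pod_texts) < 2:
        return "I am not sure"
    # An empty or missing text falls back to a fixed reply.
    return pod_texts[1] or "I have no answer for that"


print(pick_answer(["2+2", "4"]))
print(pick_answer(["gibberish"]))
print(pick_answer(["a plot", None]))
```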

Save it as wa.py, make it executable, and let's try it out with the questions that keep me up at night:

Yep, the brains are there. Now to the last part: speaking that answer out. Sure enough, we use Google’s speech services again (source):

#!/bin/bash
say() { local IFS=+;/usr/bin/mplayer -ao alsa -really-quiet -noconsolecontrols "http://translate.google.com/translate_tts?tl=en&q=$*"; }
say $*

Save this as tts.sh. You may need to run “sudo apt-get install mplayer” first.
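
The `local IFS=+` trick in the script joins the words with `+` so they fit into the query string. A rough Python 3 equivalent, as a sketch only: `tts_url` and `TTS_BASE` are hypothetical names, and unlike the bash version this one also percent-encodes punctuation.

```python
from urllib.parse import urlencode

# Same endpoint as in the bash script above.
TTS_BASE = "http://translate.google.com/translate_tts"


def tts_url(words, lang="en"):
    # urlencode turns spaces into '+' and escapes other unsafe characters.
    query = urlencode({"tl": lang, "q": " ".join(words)})
    return TTS_BASE + "?" + query


print(tts_url(["hello", "world"]))
```

Escaping matters once an answer contains characters such as `&` or `?`, which would otherwise be read as part of the URL syntax.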

It sounds pretty cool indeed.

Finally, a small script to make these work together:

#!/bin/bash
echo Please speak now and press Ctrl+C when done
./stt.sh
./tts.sh $(./wa.py $(cat stt.txt))

Overall a fun project, maybe with some potential for use in home automation.

By Martin | March 21, 2013 | Raspberry Pi |

17 thoughts on “Siri-like voice chat with Raspberry Pi : keep kids busy for a while :)”

Duncan March 25, 2013 at 2:27 pm

I’ve been experimenting with Google Voice recognition too, but found problems using a USB mic with the Pi (too quiet, even with alsamixer turned right up). The same mic works fine on Windows, which led me to discover some Google results that suggest recording audio via USB isn’t great on Linux. Did you have any problems in this area, and if so, how did you overcome them?
1. Martin Post author March 25, 2013 at 2:34 pm
  
  I have to speak somewhat loudly for this to work, but just with a raised voice, not shouting at it. I use a webcam’s mic; maybe I am just lucky that it is sensitive enough. I just put alsamixer to max and nothing else.
  1. James Rowley July 30, 2013 at 9:31 pm
    
    Hi, I find the Logitech webcams with integrated mic work really nicely. I have tried other USB mics and had similar problems (too quiet, distorted, and unsupported audio rates).
Rob March 28, 2013 at 12:55 am

Never mind the kids; I’m enjoying making the computer shout stuff at me!
Lance April 1, 2013 at 5:05 am

Hey now — this is fantastic! What a brilliant way to integrate all those services! I’m bookmarking this for dissection later.
viraj May 3, 2013 at 10:55 am

I have read all of your post.
I am having a problem at the very first step. Please help me.
When I paste the first bash script, I get the following error:

pi@raspberrypi ~ $ #!/bin/bash
pi@raspberrypi ~ $ arecord -D "plughw:1,0" -q -f cd -t wav | ffmpeg -y -i - -ar 16000 -acodec flac file.flac
ALSA lib pcm_hw.c:1401:(_snd_pcm_hw_open) Invalid value for card
arecord: main:682: audio open error: No such file or directory
ffmpeg version 0.8.6-6:0.8.6-1+rpi1, Copyright (c) 2000-2013 the Libav developers
built on Mar 31 2013 13:58:10 with gcc 4.6.3
*** THIS PROGRAM IS DEPRECATED ***
This program is only provided for compatibility and will be removed in a future release. Please use avconv instead.
pipe:: Invalid data found when processing input
pi@raspberrypi ~ $ wget -q -U "Mozilla/5.0" --post-file file.flac --header "Content-Type: audio/x-flac; rate=16000" -O - "http://www.google.com/speech-api/v1/recognize?lang=en-us&client=chromium" | cut -d\" -f12
pi@raspberrypi ~ $ rm file.flac
rm: cannot remove `file.flac': No such file or directory
1. Martin Post author May 3, 2013 at 11:58 am
  
  Can you try to run “sudo usermod -a -G audio pi”? You will have to log out and log in again for this to take effect.
Lokesh May 15, 2013 at 9:55 am

I tried the Google speech service but it didn't work. I put the code below in a speech.sh file, hardcoded to say ‘Hello’:

#!/bin/bash
say() { local IFS=+;/usr/bin/mplayer -ao alsa -really-quiet -noconsolecontrols "http://translate.google.com/translate_tts?tl=en&q=hello"; }
say $*

When I put the URL http://translate.google.com/translate_tts?tl=en&q=hello in my Chrome browser it says hello, but not when I run speech.sh.

I have installed mplayer and was able to play a few mp3 files too. Please help.
matt May 18, 2013 at 4:00 am

Can you use a microphone from the audio port?
1. Martin Post author May 18, 2013 at 11:00 am
  
  No, it is audio out only
captaindyson June 4, 2013 at 6:09 pm

I have compiled everything, but I keep getting the error:
mplayer: could not connect to socket
mplayer: No such file or directory
Then the speaker says:
“I am not sure”

I have trawled the Internet for these error messages but I just can't find the solution. Any ideas?
Hermit June 5, 2013 at 9:16 am

I also had the same problem with mplayer. I could play wav files but could not get Google Translate or mp3s to play. So I just swapped out the call to mplayer with mpg321:

#!/bin/bash
say() { local IFS=+;/usr/bin/mpg321 -q "http://translate.google.com/translate_tts?tl=en&q=$*"; }
say $*

I know it doesn’t solve the mplayer problem, but it is a workaround to get you going.
1. Martin Post author June 5, 2013 at 9:17 am
  
  Thanks for sharing
Chris July 12, 2013 at 3:45 am

so… the stt.sh file simply didn’t work for me, I HAVE installed ffmpeg, I’m using the newest version of raspbian, I’ve followed all instructions, and… the stt.sh file simply doesn’t work for me
Minja November 13, 2013 at 11:44 pm

Hey! I keep getting that “bad fd number”-error in stt.sh, can anyone help me with that?
Tyler Swain November 23, 2013 at 12:48 am

Hello, I am having issues recording with the speech to text. I am using a Logitech C905 webcam with a built-in mic.
When I run speech2text or the entire program I receive ^Ccut: invalid byte or field list
Try 'cut --help' for more information.

I assume this means nothing is being picked up from the microphone.
I am thinking that something is not configured properly in ALSA. When I try to run arecord, I receive the following message:
arecord: main:682: audio open error: No such file or directory
I have verified that my user is a member of the audio group, and I don’t really know what else to try before I buy a new webcam or USB mic. I am currently running facial recognition through OpenCV with the cam, and it’s working wonderfully, so I really don’t want to fiddle with a different camera if I can avoid it. The arecord error makes me think it must be a configuration issue?
Pingback: Cubieboard Voice Recognition | cubieboard

Martin's corner on the web

personal blog
