Martin's corner on the web

Siri-like voice chat with Raspberry Pi : keep kids busy for a while :)

Here is a fun project to try out: a Siri-like voice chat with your Raspberry Pi. It’s a lovely way to attract youngsters and keep them entertained for a while. I use three components for the project; the code is mainly scraped together from various Internet sources:

  1. A speech-to-text component that will do the voice recognition
  2. Some “brains” to analyze the captured text
  3. A text-to-speech component that will speak out the result from component 2

The hardware required is a Raspberry Pi with Internet connectivity and a USB microphone. The Pi is running the 2012-12-16-wheezy-raspbian image. I don’t have a USB microphone, but I have a USB webcam (Logitech V-UAV35) with a built-in microphone, and that worked fine without any driver installation.
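
If you are not sure which ALSA card your microphone ended up on, `arecord -l` lists the capture devices. Here is a minimal sketch, assuming the usual `card N: ..., device M: ...` line format of that listing, that turns it into the `plughw:N,M` strings that arecord's `-D` option accepts. The function names and the sample device line are made up for illustration.

```python
import re
import subprocess

# Matches listing lines such as (illustrative device name):
#   card 1: U0x46d0x821 [USB Device 0x46d:0x821], device 0: USB Audio [USB Audio]
CARD_LINE = re.compile(r"^card (\d+): .*?, device (\d+):", re.MULTILINE)


def capture_devices(listing):
    """Return 'plughw:CARD,DEVICE' strings found in `arecord -l` output."""
    return ["plughw:%s,%s" % (card, dev)
            for card, dev in CARD_LINE.findall(listing)]


def list_capture_devices():
    """Run `arecord -l` and parse it (needs alsa-utils on the machine)."""
    raw = subprocess.check_output(["arecord", "-l"])
    return capture_devices(raw.decode("utf-8", "replace"))


if __name__ == "__main__":
    # Offline demo with a hand-written sample instead of real hardware
    sample = (
        "**** List of CAPTURE Hardware Devices ****\n"
        "card 1: U0x46d0x821 [USB Device 0x46d:0x821], "
        "device 0: USB Audio [USB Audio]\n"
        "  Subdevices: 1/1\n"
    )
    print(capture_devices(sample))
```

On a real Pi you would call `list_capture_devices()` and use whichever entry belongs to the webcam or microphone.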

Speech recognition for Raspberry Pi can be done in a number of ways, but I thought the most elegant would be to use Google’s voice recognition functions. I used this bash script to get that part done (source):

#!/bin/bash
arecord -D "plughw:1,0" -q -f cd -t wav | ffmpeg -loglevel panic -y -i - -ar 16000 -acodec flac file.flac  > /dev/null 2>&1
wget -q -U "Mozilla/5.0" --post-file file.flac --header "Content-Type: audio/x-flac; rate=16000" -O - "http://www.google.com/speech-api/v1/recognize?lang=en-us&client=chromium" | cut -d\" -f12  >stt.txt
cat stt.txt
rm file.flac  > /dev/null 2>&1

Save it as stt.sh and make it executable:

chmod +x stt.sh

You may need to install ffmpeg:

sudo apt-get install ffmpeg

What this does is record from the USB microphone until you press Ctrl+C, convert the audio to a 16 kHz FLAC file with ffmpeg, and then pass that file to Google for analysis, which in turn returns the recognized text. Let’s give it a try:

[Screenshot: running stt.sh in a terminal]

It works pretty well even with my bad accent. The output is saved to the stt.txt file.
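
The `cut -d\" -f12` step picks the twelfth quote-delimited field of the reply, so it only works while the reply keeps exactly that shape. Here is a minimal sketch of a more forgiving parser. It assumes the reply is a single JSON object with a `hypotheses` list whose entries carry an `utterance` string; that layout is inferred from the field position the script relies on, not taken from official documentation, and the function name is my own.

```python
import json


def best_utterance(response_text):
    """Return the first recognized utterance, or "" if there is none."""
    try:
        data = json.loads(response_text)
    except ValueError:
        # Empty or non-JSON reply, e.g. when nothing was recorded
        return ""
    if not isinstance(data, dict):
        return ""
    hypotheses = data.get("hypotheses")
    if not isinstance(hypotheses, list) or not hypotheses:
        return ""
    first = hypotheses[0]
    if not isinstance(first, dict):
        return ""
    return first.get("utterance", "")


if __name__ == "__main__":
    # Hand-written sample in the assumed reply format
    sample = ('{"status":0,"id":"abc","hypotheses":'
              '[{"utterance":"hello world","confidence":0.9}]}')
    print(best_utterance(sample))
```

Unlike `cut`, this returns an empty string instead of an error or a stray field when the reply is empty or has no hypotheses.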

Now on to the “brains” section. This is without a doubt a task for Wolfram Alpha. I used Python to interface with it; there is already a library for that. It is pretty easy to install, just follow the instructions in the link. I had to get an API key, which is a two-minute task and gives you 2000 queries a month.

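#!/usr/bin/python
# Ask Wolfram Alpha the question given on the command line and print the answer.
import sys

import wolframalpha

# Get a free API key here http://products.wolframalpha.com/api/
# I may disable this key if I see lots of abuse
app_id = 'Q59EW4-7K8AHE858R'

client = wolframalpha.Client(app_id)

# Join all command-line words back into a single query string
query = ' '.join(sys.argv[1:])
res = client.query(query)

# The answer is read from the second pod (index 1),
# so make sure at least two pods came back before indexing
if len(res.pods) > 1:
    pod = res.pods[1]
    if pod.text:
        texts = pod.text
    else:
        texts = "I have no answer for that"
    print(texts)
else:
    print("I am not sure")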

...and let’s try it out with the questions that keep me up at night:

[Screenshot: asking wa.py a few questions in a terminal]
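
The script always reads the second pod. A slightly more tolerant variation, which is my own tweak and not something the library prescribes, is to skip the first pod and take the first later pod that has any text. The sketch below works on a plain list of strings standing in for the pods' text values, so it can be tried without a network connection or API key; `pick_answer` is a made-up name.

```python
def pick_answer(pod_texts, fallback="I am not sure"):
    """Return the first non-empty text after the leading pod.

    pod_texts is a list of strings (or None) standing in for the
    text of each pod, in the order the service returned them.
    """
    for text in pod_texts[1:]:
        if text and text.strip():
            return text.strip()
    return fallback


if __name__ == "__main__":
    # First entry plays the role of the leading pod, the rest are candidates
    print(pick_answer(["2+2", None, "4"]))
    print(pick_answer(["gibberish"]))
```

In wa.py this would be fed with something like `[pod.text for pod in res.pods]`.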

 

Yep, the brains are there. Now to the last part: speaking that answer out. Sure enough, we use Google’s speech services again (source):

#!/bin/bash
say() { local IFS=+;/usr/bin/mplayer -ao alsa -really-quiet -noconsolecontrols "http://translate.google.com/translate_tts?tl=en&q=$*"; }
say $*

You may need to “sudo apt-get install mplayer” first.

It sounds pretty cool indeed.
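
The `local IFS=+` trick in the script joins the words with `+` so they can sit in the `q=` parameter, but it does not escape characters such as `?`, `&` or `+` inside the answer itself. Here is a small sketch of the same URL construction with proper escaping. The endpoint and the `tl`/`q` parameters are copied from the script above; the function name is made up.

```python
try:
    from urllib.parse import quote_plus  # Python 3
except ImportError:
    from urllib import quote_plus  # Python 2

# Endpoint as used in tts.sh above
TTS_ENDPOINT = "http://translate.google.com/translate_tts"


def tts_url(words, lang="en"):
    """Build the text-to-speech URL for a list of words."""
    text = " ".join(words)
    # quote_plus turns spaces into '+' and percent-encodes the rest
    return "%s?tl=%s&q=%s" % (TTS_ENDPOINT, lang, quote_plus(text))


if __name__ == "__main__":
    print(tts_url(["hello", "world"]))
```

The resulting string can be handed to mplayer in place of the hand-built URL.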

So finally, a small script to make these work together:

#!/bin/bash
echo Please speak now and press Ctrl+C when done
./stt.sh
./tts.sh $(./wa.py $(cat stt.txt))
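
One thing the script does not handle is the case where nothing was recognized: the empty text still goes to the brains and whatever comes back is spoken. Here is a minimal sketch of the same three-stage chain with that guard added. The three stages are passed in as plain callables (stand-ins here, not the real scripts), and the function name and the apology text are my own inventions.

```python
def run_pipeline(recognize, answer, speak):
    """Chain speech-to-text, brains and text-to-speech.

    recognize() returns the heard text, answer(text) returns a reply,
    speak(text) says it out loud. Returns the reply, or None when
    nothing was recognized.
    """
    heard = recognize().strip()
    if not heard:
        # Skip the lookup entirely when the recognizer returned nothing
        speak("Sorry, I did not catch that")
        return None
    reply = answer(heard)
    speak(reply)
    return reply


if __name__ == "__main__":
    # Stand-in stages so the flow can be tried without a microphone
    said = []
    run_pipeline(lambda: "what is 2+2\n", lambda q: "4", said.append)
    print(said)
```

In a real setup each callable would wrap one of the three scripts.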

So overall, a fun project, maybe with some potential for use in home automation.

17 thoughts on “Siri-like voice chat with Raspberry Pi : keep kids busy for a while :)”

  1. Duncan

    I’ve been experimenting with Google Voice recognition too, but found problems using a USB mic with the Pi (too quiet, even with alsamixer turned right up). The same mic works fine on Windows, which led me to discover some Google results that suggest recording audio via USB isn’t great on Linux. Did you have any problems in this area, and if so, how did you overcome them?

    1. Martin Post author

      I have to speak somewhat loudly for this to work, just a raised voice, not shouting at it. I use a webcam’s mic; maybe I am just lucky that it is sensitive enough. I just set alsamixer to max and nothing else.

      1. James Rowley

        Hi, I find the Logitech webcams with an integrated mic work really nicely. I have tried other USB mics and had similar problems (too quiet, distorted, and unsupported audio rates).

  2. Lance

    Hey now — this is fantastic! What a brilliant way to integrate all those services! I’m bookmarking this for dissection later.

  3. viraj

    I have read your whole post.
    I am having a problem at the very first step. Please help me.
    When I paste the first bash script, I get the following error:

    pi@raspberrypi ~ $ #!/bin/bash
    pi@raspberrypi ~ $ arecord -D "plughw:1,0" -q -f cd -t wav | ffmpeg -y -i - -ar 16000 -acodec flac file.flac
    ALSA lib pcm_hw.c:1401:(_snd_pcm_hw_open) Invalid value for card
    arecord: main:682: audio open error: No such file or directory
    ffmpeg version 0.8.6-6:0.8.6-1+rpi1, Copyright (c) 2000-2013 the Libav developers
    built on Mar 31 2013 13:58:10 with gcc 4.6.3
    *** THIS PROGRAM IS DEPRECATED ***
    This program is only provided for compatibility and will be removed in a future release. Please use avconv instead.
    pipe:: Invalid data found when processing input
    pi@raspberrypi ~ $ wget -q -U "Mozilla/5.0" --post-file file.flac --header "Content-Type: audio/x-flac; rate=16000" -O - "http://www.google.com/speech-api/v1/recognize?lang=en-us&client=chromium" | cut -d\" -f12
    pi@raspberrypi ~ $ rm file.flac
    rm: cannot remove `file.flac': No such file or directory

    1. Martin Post author

      Can you try to run “sudo usermod -a -G audio pi”? You will have to log out and log in again for this to take effect.

  4. Lokesh

    I tried the Google speech service but it didn’t work. I put the code below in a speech.sh file, hardcoded to say ‘Hello’:

    #!/bin/bash
    say() { local IFS=+;/usr/bin/mplayer -ao alsa -really-quiet -noconsolecontrols "http://translate.google.com/translate_tts?tl=en&q=hello"; }
    say $*

    When I put the URL http://translate.google.com/translate_tts?tl=en&q=hello in my Chrome browser it says hello out loud, but not when I run speech.sh.

    I have installed mplayer and was able to play a few mp3 files too. Please help.

  5. captaindyson

    I have compiled everything, but I keep getting the error:
    mplayer: could not connect to socket
    mplayer: No such file or directory
    then the speaker says:
    “I am not sure”

    I have trawled the internet for these error messages but I just can’t find the solution. Any ideas?

  6. Hermit

    I also had the same problem with mplayer. I could play wav files but could not get google translate to play or mp3’s to play. So I just swapped out the call to mplayer with mpg321:

    #!/bin/bash
    say() { local IFS=+;/usr/bin/mpg321 -q "http://translate.google.com/translate_tts?tl=en&q=$*"; }
    say $*

    I know it doesn’t solve the mplayer problem, but it is a workaround to get you going.

  7. Chris

    so… the stt.sh file simply didn’t work for me, I HAVE installed ffmpeg, I’m using the newest version of raspbian, I’ve followed all instructions, and… the stt.sh file simply doesn’t work for me

  8. Tyler Swain

    Hello, I am having issues recording with the speech-to-text script. I am using a Logitech C905 webcam with a built-in mic.
    When I run speech2text or the entire program I receive:
    ^Ccut: invalid byte or field list
    Try 'cut --help' for more information.

    I assume this means nothing is being picked up from the microphone.
    I am thinking that something is not configured properly in ALSA. When I try to run arecord, I receive the following message:
    arecord: main:682: audio open error: No such file or directory
    I have verified that my user is a member of the audio group, and I don’t really know what else to try before I buy a new webcam or USB mic. I am currently running facial recognition through OpenCV with the cam, and it’s working wonderfully, so I really don’t want to fiddle with a different camera if I can avoid it. The arecord error makes me think it must be a configuration issue?
