├── .gitignore
├── requirements.txt
├── requirements_Python2.txt
├── images
    ├── favicon.png
    └── little_pi.png
├── audio_files
    ├── harvard.wav
    └── jackhammer.wav
├── LICENSE
├── guessing_game.py
└── README.md


/.gitignore:
--------------------------------------------------------------------------------
*.py[cod]

--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
pyaudio>=0.2.11
SpeechRecognition>=3.8.1

--------------------------------------------------------------------------------
/requirements_Python2.txt:
--------------------------------------------------------------------------------
monotonic>=1.4
pyaudio>=0.2.11
SpeechRecognition>=3.8.1

--------------------------------------------------------------------------------
/images/favicon.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/mramshaw/Speech-Recognition/HEAD/images/favicon.png

--------------------------------------------------------------------------------
/images/little_pi.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/mramshaw/Speech-Recognition/HEAD/images/little_pi.png

--------------------------------------------------------------------------------
/audio_files/harvard.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/mramshaw/Speech-Recognition/HEAD/audio_files/harvard.wav

--------------------------------------------------------------------------------
/audio_files/jackhammer.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/mramshaw/Speech-Recognition/HEAD/audio_files/jackhammer.wav

--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
MIT License

Copyright (c) 2018 Real Python

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

--------------------------------------------------------------------------------
/guessing_game.py:
--------------------------------------------------------------------------------
#!/usr/bin/env python

# pylint: disable=C0103

"""
A simple Guessing Game to test speech recognition.

A random fruit is chosen, and the user has three tries to guess which.
"""

from __future__ import print_function

import random
import time

import speech_recognition as sr


def recognize_speech_from_mic(recognizer, microphone):
    """Transcribe speech recorded from `microphone`.

    Returns a dictionary with three keys:
    "success": a boolean indicating whether or not the API request was
               successful
    "error":   `None` if no error occurred, otherwise a string containing
               an error message if the API could not be reached or
               speech was unrecognizable
    "transcription": `None` if speech could not be transcribed,
               otherwise a string containing the transcribed text
    """
    # check that recognizer and microphone arguments are appropriate type
    if not isinstance(recognizer, sr.Recognizer):
        raise TypeError("`recognizer` must be `Recognizer` instance")

    if not isinstance(microphone, sr.Microphone):
        raise TypeError("`microphone` must be `Microphone` instance")

    # adjust the recognizer sensitivity to ambient noise and record audio
    # from the microphone
    with microphone as source:
        recognizer.adjust_for_ambient_noise(source)
        audio = recognizer.listen(source)

    # set up the response object
    response = {
        "success": True,
        "error": None,
        "transcription": None
    }

    # try recognizing the speech in the recording
    # if a RequestError or UnknownValueError exception is caught,
    #     update the response object accordingly
    try:
        response["transcription"] = recognizer.recognize_google(audio)
    except sr.RequestError:
        # API was unreachable or unresponsive
        response["success"] = False
        response["error"] = "API unavailable"
    except sr.UnknownValueError:
        # speech was unintelligible
        response["error"] = "Unable to recognize speech"

    return response


if __name__ == "__main__":
    # set the list of words, max number of guesses, and prompt limit
    WORDS = ["apple", "banana", "grape", "orange", "mango", "lemon"]
    NUM_GUESSES = 3
    PROMPT_LIMIT = 5

    # create recognizer and mic instances
    recognizer = sr.Recognizer()
    microphone = sr.Microphone()

    # get a random word from the list
    word = random.choice(WORDS)

    # format the instructions string
    instructions = (
        "I'm thinking of one of these words:\n"
        "{words}\n"
        "You have {n} tries to guess which one.\n"
    ).format(words=', '.join(WORDS), n=NUM_GUESSES)

    # show instructions and wait 3 seconds before starting the game
    print(instructions)
    time.sleep(3)

    for i in range(NUM_GUESSES):
        # get the guess from the user
        # if a transcription is returned, break out of the loop and
        #     continue
        # if no transcription returned and API request failed, break
        #     loop and continue
        # if API request succeeded but no transcription was returned,
        #     re-prompt the user to say their guess again. Do this up
        #     to PROMPT_LIMIT times
        for j in range(PROMPT_LIMIT):
            print('Guess {}. Speak!'.format(i+1))
            guess = recognize_speech_from_mic(recognizer, microphone)
            if guess["transcription"]:
                break
            if not guess["success"]:
                break
            print("I didn't catch that. What did you say?\n")

        # if there was an error, stop the game
        if guess["error"]:
            print("ERROR: {}".format(guess["error"]))
            break

        # show the user the transcription
        print("You said: {}".format(guess["transcription"]))

        # determine if guess is correct and if any attempts remain
        guess_is_correct = guess["transcription"].lower() == word.lower()
        user_has_more_attempts = i < NUM_GUESSES - 1

        # determine if the user has won the game
        # if not, repeat the loop if user has more attempts
        # if no attempts left, the user loses the game
        if guess_is_correct:
            print("Correct! You win!")
            break
        elif user_has_more_attempts:
            print("Incorrect. Try again.\n")
        else:
            print("Sorry, you lose!\nI was thinking of '{}'.".format(word))
            break

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# Speech Recognition with Python

[![Known Vulnerabilities](http://snyk.io/test/github/mramshaw/Speech-Recognition/badge.svg?style=plastic&targetFile=requirements.txt)](http://snyk.io/test/github/mramshaw/Speech-Recognition?style=plastic&targetFile=requirements.txt)

I stumbled across this great tutorial, so why not try it out?

http://realpython.com/python-speech-recognition/

As recommended, we will use [SpeechRecognition](http://github.com/Uberi/speech_recognition).

After setting up my own repo, I found the author's:

http://github.com/realpython/python-speech-recognition

![Raspberry](images/favicon.png)

This also works with Raspberry Pi (using Python 3).

## Contents

The contents are as follows:

* [Prerequisites](#prerequisites)
    * [For microphone use](#for-microphone-use)
    * [Optional: monotonic (for Python 2)](#optional-monotonic-for-python-2)
    * [For speech recognition](#for-speech-recognition)
* [Speech Engine](#speech-engine)
    * [Smoke Test](#smoke-test)
    * [Ambient Noise](#ambient-noise)
* [Speech testing](#speech-testing)
* [And finally, the guessing game](#and-finally-the-guessing-game)
* [Raspberry Pi](#raspberry-pi)
* [To Do](#to-do)

## Prerequisites

Python 3 and `pip` installed (Python 2 is scheduled for end-of-life, although the instructions and
code have been tested with Python 2 and an appropriate `requirements` file for Python 2 is provided).

#### For microphone use

1. Check for `pyaudio`:

    ``` Python
    >>> import pyaudio as pa
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    ImportError: No module named pyaudio
    >>>
    ```

[The next step is for Linux; check the [pyaudio requirements](http://people.csail.mit.edu/hubert/pyaudio/#downloads) first.]

2. Install `portaudio19-dev`:

    ```
    $ sudo apt-get install portaudio19-dev
    ```

3. Install `pyaudio`:

    ```
    $ pip install --user pyaudio
    ```

4. Verify installation:

    ``` Python
    >>> import pyaudio as pa
    >>> pa.__version__
    '0.2.11'
    >>>
    ```

#### Optional: monotonic (for Python 2)

[SpeechRecognition](http://github.com/Uberi/speech_recognition#monotonic-for-python-2-for-faster-operations-in-some-functions-on-python-2)
recommends installing [monotonic](http://pypi.python.org/pypi/monotonic) for Python 2 users.

1. Check for `monotonic`:

    ```
    $ pip list --format=freeze | grep monotonic
    ```

2. Install `monotonic`:

    ```
    $ pip install --user monotonic
    ```

3. Verify installation:

    ```
    $ pip list --format=freeze | grep monotonic
    monotonic==1.4
    $
    ```

#### For speech recognition

SpeechRecognition can be used as a _sound recorder_:

http://github.com/Uberi/speech_recognition/blob/master/examples/write_audio.py

This is probably fine for occasional use - but there are better options available.

1. Check for `SpeechRecognition`:

    ```
    $ pip list --format=freeze | grep SpeechRecognition
    ```

2. Install `SpeechRecognition`:

    ```
    $ pip install --user SpeechRecognition
    ```

3. Verify:

    ``` Python
    >>> import speech_recognition as sr
    >>> sr.__version__
    '3.8.1'
    >>>
    ```


## Speech Engine

The tutorial uses the __Google Web Speech API__; however, installing [PocketSphinx](http://cmusphinx.github.io/)
(which can work offline) is fairly easy.

[Snowboy](http://snowboy.kitt.ai/) (which can also work offline) is an option for Hotword Detection, but perhaps
unsuitable for speech recognition (SpeechRecognition tellingly refers to Snowboy as "Snowboy Hotword Detection").

For another online option, there is [Wit.ai](http://github.com/wit-ai/pywit) (which also has a [Node.js SDK](http://github.com/wit-ai/node-wit)).

#### Smoke Test

The final step may take a few seconds to execute:

``` Python
>>> import speech_recognition as sr
>>> r = sr.Recognizer()
>>> harvard = sr.AudioFile('audio_files/harvard.wav')
>>> with harvard as source:
...     audio = r.record(source)
...
>>> type(audio)
<class 'speech_recognition.AudioData'>
>>> r.recognize_google(audio)
u'the stale smell of old beer lingers it takes heat to bring out the odor a cold dip restores health and zest a salt pickle taste fine with ham tacos al Pastore are my favorite a zestful food is the hot cross bun'
>>>
```

#### Ambient Noise

``` Python
>>> jackhammer = sr.AudioFile('audio_files/jackhammer.wav')
>>> with jackhammer as source:
...     audio = r.record(source)
...
>>> r.recognize_google(audio)
u'the snail smell of old beer drinkers'
>>> with jackhammer as source:
...     r.adjust_for_ambient_noise(source)
...     audio = r.record(source)
...
>>> r.recognize_google(audio)
u'still smell old gear vendors'
>>>
```

[Slightly different from the tutorial's `the snail smell of old gear vendors` and `still smell of old beer vendors`.]

And:

``` Python
>>> with jackhammer as source:
...     r.adjust_for_ambient_noise(source, duration=0.5)
...     audio = r.record(source)
...
>>> r.recognize_google(audio)
u'the snail smell like old beermongers'
>>>
```

[Pretty much the same as `the snail smell like old Beer Mongers`.]


## Speech testing

Using the speech recognition module:

    $ python -m speech_recognition
    A moment of silence, please...
    Set minimum energy threshold to 259.109953712
    Say something!
    Got it! Now to recognize it...
    You said hello hello
    Got it! Now to recognize it...
    You said the rain in Spain
    Say something!
    ^C$

And:

``` Python
>>> mic = sr.Microphone()
>>> with mic as source:
...     audio = r.listen(source)
...
>>> r.recognize_google(audio)
u'Shazam'
>>>
```

And, as stated in the article, a loud hand-clap generates an exception:

``` Python
>>> with mic as source:
...     audio = r.listen(source)
...
>>> r.recognize_google(audio)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/owner/.local/lib/python2.7/site-packages/speech_recognition/__init__.py", line 858, in recognize_google
    if not isinstance(actual_result, dict) or len(actual_result.get("alternative", [])) == 0: raise UnknownValueError()
speech_recognition.UnknownValueError
>>>
```


## And finally, the guessing game

Run the guessing game as follows:

    $ python guessing_game.py
    I'm thinking of one of these words:
    apple, banana, grape, orange, mango, lemon
    You have 3 tries to guess which one.

    Guess 1. Speak!
    You said: banana
    Incorrect. Try again.

    Guess 2. Speak!
    You said: Orange
    Incorrect. Try again.

    Guess 3. Speak!
    You said: mango
    Sorry, you lose!
    I was thinking of 'apple'.
    $


## Raspberry Pi

![Raspberry Pi](images/little_pi.png)

After hooking up a Raspberry Pi with a Logitech 4000 webcam (for its microphone)
and configuring with AlsaMixer, everything worked pretty much as expected with
Python 3.

There were some installation stumbles, but `sudo apt-get update` fixed them.

It turned out that `flac` was required, so it was also installed.
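
A quick pre-flight check might save some head-scratching on a Pi. The following is only a sketch and not part of the original repo: `find_flac`, `pick_microphone` and `main` are hypothetical helper names, and the `"webcam"` keyword is an assumed fragment of the device name (the real name depends on the hardware). It relies on the standard library's `shutil.which` (Python 3) to look for `flac`, and on SpeechRecognition's `sr.Microphone.list_microphone_names()` to enumerate the audio devices.

```python
import shutil


def find_flac():
    """Return the path of the `flac` executable, or None if it is not on PATH."""
    return shutil.which("flac")


def pick_microphone(names, keyword):
    """Return the index of the first device whose name contains `keyword`.

    The match is case-insensitive; None is returned when nothing matches.
    """
    keyword = keyword.lower()
    for index, name in enumerate(names):
        # skip entries without a usable name
        if name and keyword in name.lower():
            return index
    return None


def main():
    """Report on `flac` and return a Microphone for the matching device, if any."""
    # imported here so that the two helpers above work without the library
    import speech_recognition as sr

    print("flac: {}".format(find_flac() or "not found"))

    names = sr.Microphone.list_microphone_names()
    index = pick_microphone(names, "webcam")  # "webcam" is an assumed keyword
    if index is None:
        print("No matching device; available devices: {}".format(names))
        return None

    print("Using device {}: {}".format(index, names[index]))
    return sr.Microphone(device_index=index)
```

Calling `main()` on the Pi should print what was found; the `Microphone` it returns could then be passed to `recognize_speech_from_mic` in place of the default `sr.Microphone()`.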


## To Do

- [x] Add original License (this is probably 'fair use' but better safe than sorry)
- [x] Add `monotonic` as an optional component for Python 2
- [x] Retry with PocketSphinx (works offline)
- [x] Retry with [Snowboy](http://snowboy.kitt.ai/) (works offline)
- [ ] Retry with [Wit.ai](http://github.com/wit-ai/pywit) (which also has a [Node.js SDK](http://github.com/wit-ai/node-wit))
- [x] Try with Raspberry Pi (works nicely)
- [x] Update for recent versions of `pip`
- [x] Update code to conform to `pylint`, `pycodestyle` and `pydocstyle`
- [x] Update `requirements` files to fix Snyk.io quibbles
- [x] Update code for Python 3
- [x] Add table of Contents

--------------------------------------------------------------------------------