├── .gitignore
├── requirements.txt
├── requirements_Python2.txt
├── images
│   ├── favicon.png
│   └── little_pi.png
├── audio_files
│   ├── harvard.wav
│   └── jackhammer.wav
├── LICENSE
├── guessing_game.py
└── README.md


/.gitignore:
--------------------------------------------------------------------------------
1 | *.py[cod]
2 | 


--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
1 | pyaudio>=0.2.11
2 | SpeechRecognition>=3.8.1
3 | 


--------------------------------------------------------------------------------
/requirements_Python2.txt:
--------------------------------------------------------------------------------
1 | monotonic>=1.4
2 | pyaudio>=0.2.11
3 | SpeechRecognition>=3.8.1
4 | 


--------------------------------------------------------------------------------
/images/favicon.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/mramshaw/Speech-Recognition/HEAD/images/favicon.png


--------------------------------------------------------------------------------
/images/little_pi.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/mramshaw/Speech-Recognition/HEAD/images/little_pi.png


--------------------------------------------------------------------------------
/audio_files/harvard.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/mramshaw/Speech-Recognition/HEAD/audio_files/harvard.wav


--------------------------------------------------------------------------------
/audio_files/jackhammer.wav:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/mramshaw/Speech-Recognition/HEAD/audio_files/jackhammer.wav


--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
 1 | MIT License
 2 | 
 3 | Copyright (c) 2018 Real Python
 4 | 
 5 | Permission is hereby granted, free of charge, to any person obtaining a copy
 6 | of this software and associated documentation files (the "Software"), to deal
 7 | in the Software without restriction, including without limitation the rights
 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
 9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 | 
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 | 
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 | 


--------------------------------------------------------------------------------
/guessing_game.py:
--------------------------------------------------------------------------------
  1 | #!/usr/bin/env python
  2 | 
  3 | # pylint: disable=C0103
  4 | 
  5 | """
  6 | A simple Guessing Game to test speech recognition.
  7 | 
  8 | A random fruit is chosen, and the user has three tries to guess which.
  9 | """
 10 | 
 11 | from __future__ import print_function
 12 | 
 13 | import random
 14 | import time
 15 | 
 16 | import speech_recognition as sr
 17 | 
 18 | 
 19 | def recognize_speech_from_mic(recognizer, microphone):
 20 |     """Transcribe speech recorded from `microphone`.
 21 | 
 22 |     Returns a dictionary with three keys:
 23 |     "success": a boolean indicating whether or not the API request was
 24 |                successful
 25 |     "error":   `None` if no error occurred, otherwise a string containing
 26 |                an error message if the API could not be reached or
 27 |                speech was unrecognizable
 28 |     "transcription": `None` if speech could not be transcribed,
 29 |                otherwise a string containing the transcribed text
 30 |     """
 31 |     # check that recognizer and microphone arguments are appropriate type
 32 |     if not isinstance(recognizer, sr.Recognizer):
 33 |         raise TypeError("`recognizer` must be `Recognizer` instance")
 34 | 
 35 |     if not isinstance(microphone, sr.Microphone):
 36 |         raise TypeError("`microphone` must be `Microphone` instance")
 37 | 
 38 |     # adjust the recognizer sensitivity to ambient noise and record audio
 39 |     # from the microphone
 40 |     with microphone as source:
 41 |         recognizer.adjust_for_ambient_noise(source)
 42 |         audio = recognizer.listen(source)
 43 | 
 44 |     # set up the response object
 45 |     response = {
 46 |         "success": True,
 47 |         "error": None,
 48 |         "transcription": None
 49 |     }
 50 | 
 51 |     # try recognizing the speech in the recording
 52 |     # if a RequestError or UnknownValueError exception is caught,
 53 |     #     update the response object accordingly
 54 |     try:
 55 |         response["transcription"] = recognizer.recognize_google(audio)
 56 |     except sr.RequestError:
 57 |         # API was unreachable or unresponsive
 58 |         response["success"] = False
 59 |         response["error"] = "API unavailable"
 60 |     except sr.UnknownValueError:
 61 |         # speech was unintelligible
 62 |         response["error"] = "Unable to recognize speech"
 63 | 
 64 |     return response
 65 | 
 66 | 
 67 | if __name__ == "__main__":
 68 |     # set the list of words, maximum number of guesses, and prompt limit
 69 |     WORDS = ["apple", "banana", "grape", "orange", "mango", "lemon"]
 70 |     NUM_GUESSES = 3
 71 |     PROMPT_LIMIT = 5
 72 | 
 73 |     # create recognizer and mic instances
 74 |     recognizer = sr.Recognizer()
 75 |     microphone = sr.Microphone()
 76 | 
 77 |     # get a random word from the list
 78 |     word = random.choice(WORDS)
 79 | 
 80 |     # format the instructions string
 81 |     instructions = (
 82 |         "I'm thinking of one of these words:\n"
 83 |         "{words}\n"
 84 |         "You have {n} tries to guess which one.\n"
 85 |     ).format(words=', '.join(WORDS), n=NUM_GUESSES)
 86 | 
 87 |     # show instructions and wait 3 seconds before starting the game
 88 |     print(instructions)
 89 |     time.sleep(3)
 90 | 
 91 |     for i in range(NUM_GUESSES):
 92 |         # get the guess from the user
 93 |         # if a transcription is returned, break out of the loop and
 94 |         #     continue
 95 |         # if no transcription returned and API request failed, break
 96 |         #     loop and continue
 97 |         # if API request succeeded but no transcription was returned,
 98 |         #     re-prompt the user to say their guess again. Do this up
 99 |         #     to PROMPT_LIMIT times
100 |         for j in range(PROMPT_LIMIT):
101 |             print('Guess {}. Speak!'.format(i+1))
102 |             guess = recognize_speech_from_mic(recognizer, microphone)
103 |             if guess["transcription"]:
104 |                 break
105 |             if not guess["success"]:
106 |                 break
107 |             print("I didn't catch that. What did you say?\n")
108 | 
109 |         # if there was an error, stop the game
110 |         if guess["error"]:
111 |             print("ERROR: {}".format(guess["error"]))
112 |             break
113 | 
114 |         # show the user the transcription
115 |         print("You said: {}".format(guess["transcription"]))
116 | 
117 |         # determine if guess is correct and if any attempts remain
118 |         guess_is_correct = guess["transcription"].lower() == word.lower()
119 |         user_has_more_attempts = i < NUM_GUESSES - 1
120 | 
121 |         # determine if the user has won the game
122 |         # if not, repeat the loop if user has more attempts
123 |         # if no attempts left, the user loses the game
124 |         if guess_is_correct:
125 |             print("Correct! You win!")
126 |             break
127 |         elif user_has_more_attempts:
128 |             print("Incorrect. Try again.\n")
129 |         else:
130 |             print("Sorry, you lose!\nI was thinking of '{}'.".format(word))
131 |             break
132 | 


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
  1 | # Speech Recognition with Python
  2 | 
  3 | [![Known Vulnerabilities](http://snyk.io/test/github/mramshaw/Speech-Recognition/badge.svg?style=plastic&targetFile=requirements.txt)](http://snyk.io/test/github/mramshaw/Speech-Recognition?style=plastic&targetFile=requirements.txt)
  4 | 
  5 | I stumbled across this great tutorial, so why not try it out?
  6 | 
  7 |     http://realpython.com/python-speech-recognition/
  8 | 
  9 | As recommended, we will use [SpeechRecognition](http://github.com/Uberi/speech_recognition).
 10 | 
 11 | After setting up my own repo, I found the author's:
 12 | 
 13 |     http://github.com/realpython/python-speech-recognition
 14 | 
 15 | ![Raspberry](images/favicon.png)
 16 | 
 17 | This also works with Raspberry Pi (using Python 3).
 18 | 
 19 | ## Contents
 20 | 
 21 | The contents are as follows:
 22 | 
 23 | * [Prerequisites](#prerequisites)
 24 |     * [For microphone use](#for-microphone-use)
 25 |     * [Optional: monotonic (for Python 2)](#optional-monotonic-for-python-2)
 26 |     * [For speech recognition](#for-speech-recognition)
 27 | * [Speech Engine](#speech-engine)
 28 |     * [Smoke Test](#smoke-test)
 29 |     * [Ambient Noise](#ambient-noise)
 30 | * [Speech testing](#speech-testing)
 31 | * [And finally, the guessing game](#and-finally-the-guessing-game)
 32 | * [Raspberry Pi](#raspberry-pi)
 33 | * [To Do](#to-do)
 34 | 
 35 | ## Prerequisites
 36 | 
 37 | Python 3 and `pip` installed (Python 2 reached End-of-life on January 1, 2020, although the instructions and
 38 | code have been tested with Python 2 and an appropriate `requirements` file for Python 2 is provided).
 39 | 
 40 | #### For microphone use
 41 | 
 42 | 1. Check for `pyaudio`:
 43 | 
 44 |     ``` Python
 45 |     >>> import pyaudio as pa
 46 |     Traceback (most recent call last):
 47 |       File "<stdin>", line 1, in <module>
 48 |     ImportError: No module named pyaudio
 49 |     >>>
 50 |     ```
 51 | 
 52 | [The next step is for Debian-based Linux (it uses `apt-get`); check the [pyaudio requirements](http://people.csail.mit.edu/hubert/pyaudio/#downloads) first.]
 53 | 
 54 | 2. Install `portaudio19-dev`:
 55 | 
 56 |     ```
 57 |     $ sudo apt-get install portaudio19-dev
 58 |     ```
 59 | 
 60 | 3. Install `pyaudio`:
 61 | 
 62 |     ```
 63 |     $ pip install --user pyaudio
 64 |     ```
 65 | 
 66 | 4. Verify installation:
 67 | 
 68 |     ``` Python
 69 |     >>> import pyaudio as pa
 70 |     >>> pa.__version__
 71 |     '0.2.11'
 72 |     >>>
 73 |     ```
 74 | 
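The interactive check in step 1 raises `ImportError` when `pyaudio` is missing. For a scripted check, here is a minimal Python 3 sketch that asks the import system whether a module can be found without actually importing it (the helper `module_available` is hypothetical, not part of pyaudio or SpeechRecognition):

```python
import importlib.util


def module_available(name):
    """Return True if top-level module `name` could be imported (Python 3.4+)."""
    # find_spec() locates the module without executing it; None means not found
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    for module in ("pyaudio", "speech_recognition"):
        print(module, "found" if module_available(module) else "missing")
```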
 75 | #### Optional: monotonic (for Python 2)
 76 | 
 77 | [SpeechRecognition](http://github.com/Uberi/speech_recognition#monotonic-for-python-2-for-faster-operations-in-some-functions-on-python-2)
 78 | recommends installing [monotonic](http://pypi.python.org/pypi/monotonic) for Python 2 users.
 79 | 
 80 | 1. Check for `monotonic`:
 81 | 
 82 |     ```
 83 |     $ pip list --format=freeze | grep monotonic
 84 |     ```
 85 | 
 86 | 2. Install `monotonic`:
 87 | 
 88 |     ```
 89 |     $ pip install --user monotonic
 90 |     ```
 91 | 
 92 | 3. Verify installation:
 93 | 
 94 |     ```
 95 |     $ pip list --format=freeze | grep monotonic
 96 |     monotonic==1.4
 97 |     $
 98 |     ```
 99 | 
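Python 3.3 and later already provide `time.monotonic()` in the standard library, so the backport only matters on Python 2. Code meant to run on both can pick whichever is available; a minimal sketch (the `elapsed` helper is illustrative only):

```python
try:
    # Python 3.3+: a monotonic clock is built into the standard library
    from time import monotonic
except ImportError:
    # Python 2: fall back to the `monotonic` backport installed above
    from monotonic import monotonic


def elapsed(start):
    """Seconds since `start` (a monotonic() reading); immune to wall-clock changes."""
    return monotonic() - start


if __name__ == "__main__":
    t0 = monotonic()
    print("elapsed: {:.6f} s".format(elapsed(t0)))
```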
100 | #### For speech recognition
101 | 
102 | SpeechRecognition can be used as a _sound recorder_:
103 | 
104 |     http://github.com/Uberi/speech_recognition/blob/master/examples/write_audio.py
105 | 
106 | This is probably fine for occasional use, but there are better options available.
107 | 
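Once `pyaudio` and SpeechRecognition are installed (steps below), recording along the lines of the linked `write_audio.py` example comes down to writing the bytes returned by `AudioData.get_wav_data()` to a file. A minimal sketch; the helper names `save_wav` and `record_to_wav` are hypothetical:

```python
def save_wav(audio, path):
    """Write a SpeechRecognition AudioData object to `path` as a WAV file."""
    # get_wav_data() returns the complete WAV file (header included) as bytes
    with open(path, "wb") as f:
        f.write(audio.get_wav_data())


def record_to_wav(path):
    """Record one phrase from the default microphone into `path` (needs pyaudio)."""
    # Imported here so that save_wav() can be used and tested without a microphone
    import speech_recognition as sr

    recognizer = sr.Recognizer()
    with sr.Microphone() as source:
        recognizer.adjust_for_ambient_noise(source)
        audio = recognizer.listen(source)  # records until the speaker pauses
    save_wav(audio, path)
```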
108 | 1. Check for `SpeechRecognition`:
109 | 
110 |     ```
111 |     $ pip list --format=freeze | grep SpeechRecognition
112 |     ```
113 | 
114 | 2. Install `SpeechRecognition`:
115 | 
116 |     ```
117 |     $ pip install --user SpeechRecognition
118 |     ```
119 | 
120 | 3. Verify:
121 | 
122 |     ``` Python
123 |     >>> import speech_recognition as sr
124 |     >>> sr.__version__
125 |     '3.8.1'
126 |     >>>
127 |     ```
128 | 
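The minimum versions in `requirements.txt` can also be checked from Python. A minimal sketch (all three helpers are hypothetical) that compares versions as tuples of integers, so that `3.10.0` correctly ranks above `3.8.1`, which plain string comparison gets wrong:

```python
def parse_requirement(line):
    """Split a 'name>=version' line from requirements.txt into (name, version)."""
    name, _, version = line.strip().partition(">=")
    return name.strip(), version.strip()


def version_tuple(version):
    """Turn '3.8.1' into (3, 8, 1) so that versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def meets_minimum(installed, minimum):
    """True if `installed` >= `minimum`; handles plain dotted-number versions only."""
    return version_tuple(installed) >= version_tuple(minimum)
```

For example, `meets_minimum(sr.__version__, "3.8.1")` after `import speech_recognition as sr`. Versions with suffixes such as `rc1` would need a proper parser, such as the one in the `packaging` library.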
129 | 
130 | ## Speech Engine
131 | 
132 | The tutorial uses the __Google Web Speech API__; however, installing [PocketSphinx](http://cmusphinx.github.io/)
133 | (which can work offline) is fairly easy.
134 | 
135 | [Snowboy](http://snowboy.kitt.ai/) (which can also work offline) is an option for Hotword Detection, but perhaps
136 | unsuitable for speech recognition (SpeechRecognition tellingly refers to Snowboy as "Snowboy Hotword Detection").
137 | 
138 | For another online option, there is [Wit.ai](http://github.com/wit-ai/pywit) (which also has a [Node.js SDK](http://github.com/wit-ai/node-wit)).
139 | 
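With SpeechRecognition, switching engines is mostly a matter of calling a different `recognize_*` method on the same `Recognizer` (for example `recognize_sphinx()` for PocketSphinx). A minimal sketch of trying engines in order and falling back when one fails; `transcribe_with_fallback` is a hypothetical helper, not part of the library:

```python
def transcribe_with_fallback(audio, engines, recoverable=(Exception,)):
    """Try each (name, recognize) pair in order and return (name, text).

    `recoverable` lists the exceptions that mean "try the next engine";
    with SpeechRecognition that would be (sr.RequestError, sr.UnknownValueError).
    Returns (None, None) if every engine fails.
    """
    for name, recognize in engines:
        try:
            return name, recognize(audio)
        except recoverable:
            continue  # this engine failed; fall through to the next one
    return None, None


# Usage with SpeechRecognition (recognize_sphinx needs PocketSphinx installed):
#   r = sr.Recognizer()
#   engines = [("google", r.recognize_google), ("sphinx", r.recognize_sphinx)]
#   name, text = transcribe_with_fallback(audio, engines,
#                                         (sr.RequestError, sr.UnknownValueError))
```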
140 | #### Smoke Test
141 | 
142 | Run this from the repository root so that `audio_files/harvard.wav` can be found. The final step may take a few seconds to execute:
143 | 
144 | ``` Python
145 | >>> import speech_recognition as sr
146 | >>> r = sr.Recognizer()
147 | >>> harvard = sr.AudioFile('audio_files/harvard.wav')
148 | >>> with harvard as source:
149 | ...     audio = r.record(source)
150 | ... 
151 | >>> type(audio)
152 | <class 'speech_recognition.AudioData'>
153 | >>> r.recognize_google(audio)
154 | u'the stale smell of old beer lingers it takes heat to bring out the odor a cold dip restores health and zest a salt pickle taste fine with ham tacos al Pastore are my favorite a zestful food is the hot cross bun'
155 | >>> 
156 | ```
157 | 
158 | #### Ambient Noise
159 | 
160 | ``` Python
161 | >>> jackhammer = sr.AudioFile('audio_files/jackhammer.wav')
162 | >>> with jackhammer as source:
163 | ...     audio = r.record(source)
164 | ... 
165 | >>> r.recognize_google(audio)
166 | u'the snail smell of old beer drinkers'
167 | >>> with jackhammer as source:
168 | ...     r.adjust_for_ambient_noise(source)
169 | ...     audio = r.record(source)
170 | ... 
171 | >>> r.recognize_google(audio)
172 | u'still smell old gear vendors'
173 | >>> 
174 | ```
175 | 
176 | [Slightly different from the tutorial's `the snail smell of old gear vendors` and `still smell of old beer vendors`.]
177 | 
178 | And:
179 | 
180 | ``` Python
181 | >>> with jackhammer as source:
182 | ...     r.adjust_for_ambient_noise(source, duration=0.5)
183 | ...     audio = r.record(source)
184 | ... 
185 | >>> r.recognize_google(audio)
186 | u'the snail smell like old beermongers'
187 | >>>
188 | ```
189 | 
190 | [Pretty much the same as `the snail smell like old Beer Mongers`.]
191 | 
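Calibration can cost part of the clip: as the tutorial explains, `adjust_for_ambient_noise()` reads its sample (`duration` seconds, 1 by default) from the same stream that `record()` then consumes. A minimal standard-library sketch for checking how much audio is left for recognition; the helper names are hypothetical:

```python
import wave


def wav_duration(path):
    """Length of a WAV file in seconds: frame count divided by frame rate."""
    wav = wave.open(path, "rb")
    try:
        return wav.getnframes() / float(wav.getframerate())
    finally:
        wav.close()


def seconds_left_after_calibration(path, calibration=1.0):
    """Audio left for record() once adjust_for_ambient_noise() has read `calibration` s."""
    return max(0.0, wav_duration(path) - calibration)
```

For example, `seconds_left_after_calibration('audio_files/jackhammer.wav', 0.5)` corresponds to the `duration=0.5` run above.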
192 | 
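To put a number on "slightly different" or "pretty much the same", two transcripts can be compared word by word. A rough sketch using `difflib.SequenceMatcher` (the helper `word_similarity` is hypothetical, and this ratio is not the word error rate normally used to benchmark recognizers):

```python
import difflib


def word_similarity(expected, recognized):
    """Similarity in [0, 1] between two transcripts, compared word by word.

    Uses difflib's ratio (2 * matching words / total words); this is a rough
    measure, not the word error rate used to benchmark speech recognizers.
    """
    matcher = difflib.SequenceMatcher(
        None, expected.lower().split(), recognized.lower().split())
    return matcher.ratio()
```

For example, `the stale smell of old beer lingers` against the uncalibrated `the snail smell of old beer drinkers` shares 5 of 7 words in order, giving 2 * 5 / 14, or about 0.71.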
193 | ## Speech testing
194 | 
195 | Running the `speech_recognition` package as a module:
196 | 
197 |     $ python -m speech_recognition
198 |     A moment of silence, please...
199 |     Set minimum energy threshold to 259.109953712
200 |     Say something!
201 |     Got it! Now to recognize it...
202 |     You said hello hello
203 |     Got it! Now to recognize it...
204 |     You said the rain in Spain
205 |     Say something!
206 |     ^C$
207 | 
208 | And, in the same interpreter session (reusing `r` from the smoke test, with `mic = sr.Microphone()`):
209 | 
210 | ``` Python
211 | >>> with mic as source:
212 | ...     audio = r.listen(source)
213 | ... 
214 | >>> r.recognize_google(audio)
215 | u'Shazam'
216 | >>>
217 | ```
218 | 
219 | And, as stated in the article, a loud hand-clap generates an exception:
220 | 
221 | ``` Python
222 | >>> with mic as source:
223 | ...     audio = r.listen(source)
224 | ... 
225 | >>> r.recognize_google(audio)
226 | Traceback (most recent call last):
227 |   File "<stdin>", line 1, in <module>
228 |   File "/home/owner/.local/lib/python2.7/site-packages/speech_recognition/__init__.py", line 858, in recognize_google
229 |     if not isinstance(actual_result, dict) or len(actual_result.get("alternative", [])) == 0: raise UnknownValueError()
230 | speech_recognition.UnknownValueError
231 | >>>
232 | ```
233 | 
234 | 
235 | ## And finally, the guessing game
236 | 
237 | Run the guessing game as follows:
238 | 
239 |     $ python guessing_game.py
240 |     I'm thinking of one of these words:
241 |     apple, banana, grape, orange, mango, lemon
242 |     You have 3 tries to guess which one.
243 |     
244 |     Guess 1. Speak!
245 |     You said: banana
246 |     Incorrect. Try again.
247 |     
248 |     Guess 2. Speak!
249 |     You said: Orange
250 |     Incorrect. Try again.
251 |     
252 |     Guess 3. Speak!
253 |     You said: mango
254 |     Sorry, you lose!
255 |     I was thinking of 'apple'.
256 |     $
257 | 
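The game's control flow can be exercised without a microphone by feeding it canned response dictionaries. This is a hypothetical refactoring, not code from `guessing_game.py`: `play` mirrors the loop in that file but returns the outcome instead of printing, and `response` builds fake `recognize_speech_from_mic()` results.

```python
def play(word, responses, num_guesses=3, prompt_limit=5):
    """Replay guessing_game.py's game loop against scripted responses.

    `responses` is an iterator of dicts shaped like the return value of
    recognize_speech_from_mic(). Returns "win", "lose", or "error: <message>".
    """
    for _attempt in range(num_guesses):
        # re-prompt up to prompt_limit times, as the inner loop in the game does
        for _prompt in range(prompt_limit):
            guess = next(responses)
            if guess["transcription"] or not guess["success"]:
                break
        if guess["error"]:
            return "error: " + guess["error"]
        if guess["transcription"].lower() == word.lower():
            return "win"
    return "lose"


def response(transcription=None, success=True, error=None):
    """Build a fake recognize_speech_from_mic() result."""
    return {"success": success, "error": error, "transcription": transcription}
```

Replaying the session above, `play("apple", iter([response("banana"), response("Orange"), response("mango")]))` returns `"lose"`.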
258 | 
259 | ## Raspberry Pi
260 | 
261 | ![Raspberry Pi](images/little_pi.png)
262 | 
263 | After hooking up a Raspberry Pi with a Logitech 4000 webcam (for its microphone)
264 | and configuring it with AlsaMixer, everything worked pretty much as expected with
265 | Python 3.
266 | 
267 | There were some installation stumbles, but `sudo apt-get update` fixed them.
268 | 
269 | It turned out that `flac` was also required, so it was installed too.
270 | 
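As far as I can tell, SpeechRecognition looks for a `flac` executable on the PATH when it needs to FLAC-encode audio (for example, for `recognize_google`), and its bundled encoders do not cover the Pi's ARM processor, which would explain the requirement. A minimal Python 3 sketch for checking up front; `missing_executables` is a hypothetical helper:

```python
import shutil


def missing_executables(names):
    """Return the entries of `names` that are not found on the PATH (Python 3.3+)."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_executables(["flac"])
    if missing:
        print("missing:", ", ".join(missing), "- try: sudo apt-get install flac")
    else:
        print("flac is on the PATH")
```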
271 | 
272 | ## To Do
273 | 
274 | - [x] Add original License (this is probably 'fair use' but better safe than sorry)
275 | - [x] Add `monotonic` as an optional component for Python 2
276 | - [x] Retry with PocketSphinx (works offline)
277 | - [x] Retry with [Snowboy](http://snowboy.kitt.ai/) (works offline)
278 | - [ ] Retry with [Wit.ai](http://github.com/wit-ai/pywit) (which also has a [Node.js SDK](http://github.com/wit-ai/node-wit))
279 | - [x] Try with Raspberry Pi (works nicely)
280 | - [x] Update for recent versions of `pip`
281 | - [x] Update code to conform to `pylint`, `pycodestyle` and `pydocstyle`
282 | - [x] Update `requirements` files to fix Snyk.io quibbles
283 | - [x] Update code for Python 3
284 | - [x] Add table of Contents
285 | 


--------------------------------------------------------------------------------