├── .github
│   └── ISSUE_TEMPLATE.md
├── .gitignore
├── .travis.yml
├── LICENSE-FLAC.txt
├── LICENSE.txt
├── MANIFEST.in
├── README.rst
├── examples
│   ├── audio_transcribe.py
│   ├── background_listening.py
│   ├── calibrate_energy_threshold.py
│   ├── chinese.flac
│   ├── counting.gram
│   ├── english.wav
│   ├── extended_results.py
│   ├── french.aiff
│   ├── microphone_recognition.py
│   ├── special_recognizer_features.py
│   ├── tensorflow_commands.py
│   ├── threaded_workers.py
│   └── write_audio.py
├── make-release.sh
├── reference
│   ├── library-reference.rst
│   └── pocketsphinx.rst
├── setup.cfg
├── setup.py
├── speech_recognition
│   ├── __init__.py
│   ├── __main__.py
│   ├── deepspeech-data
│   │   └── README
│   ├── flac-linux-x86
│   ├── flac-linux-x86_64
│   ├── flac-mac
│   ├── flac-win32.exe
│   └── pocketsphinx-data
│       └── en-US
│           ├── LICENSE.txt
│           ├── acoustic-model
│           │   ├── README
│           │   ├── feat.params
│           │   ├── mdef
│           │   ├── means
│           │   ├── noisedict
│           │   ├── sendump
│           │   ├── transition_matrices
│           │   └── variances
│           ├── language-model.lm.bin
│           └── pronounciation-dictionary.dict
├── tests
│   ├── __init__.py
│   ├── audio-mono-16-bit-44100Hz.aiff
│   ├── audio-mono-16-bit-44100Hz.flac
│   ├── audio-mono-16-bit-44100Hz.wav
│   ├── audio-mono-24-bit-44100Hz.flac
│   ├── audio-mono-24-bit-44100Hz.wav
│   ├── audio-mono-32-bit-44100Hz.wav
│   ├── audio-mono-8-bit-44100Hz.wav
│   ├── audio-stereo-16-bit-44100Hz.aiff
│   ├── audio-stereo-16-bit-44100Hz.flac
│   ├── audio-stereo-16-bit-44100Hz.wav
│   ├── audio-stereo-24-bit-44100Hz.flac
│   ├── audio-stereo-24-bit-44100Hz.wav
│   ├── audio-stereo-32-bit-44100Hz.wav
│   ├── audio-stereo-8-bit-44100Hz.wav
│   ├── chinese.flac
│   ├── english.wav
│   ├── french.aiff
│   ├── test_audio.py
│   ├── test_recognition.py
│   └── test_special_features.py
└── third-party
    ├── Compiling Python extensions on Windows.pdf
    ├── LICENSE-PyAudio.txt
    ├── LICENSE-Sphinx.txt
    ├── PyAudio-0.2.11-cp27-cp27m-win_amd64.whl
    ├── PyAudio-0.2.11-cp34-cp34m-win_amd64.whl
    ├── PyAudio-0.2.11-cp35-cp35m-win_amd64.whl
    ├── PyAudio-0.2.11-cp36-cp36m-win_amd64.whl
    ├── PyAudio-0.2.11.tar.gz
    ├── Source code for Google API Client Library for Python and its dependencies
    │   ├── google-api-python-client-1.6.0.tar.gz
    │   ├── httplib2-0.9.2.tar.gz
    │   ├── oauth2client-4.0.0.tar.gz
    │   ├── pyasn1-0.1.9.tar.gz
    │   ├── pyasn1-modules-0.0.8.tar.gz
    │   ├── rsa-3.4.2.tar.gz
    │   ├── six-1.10.0.tar.gz
    │   └── uritemplate-3.0.0.tar.gz
    ├── flac-1.3.2.tar.xz
    ├── irstlm-master.zip
    ├── pocketsphinx-0.1.3-cp27-cp27m-win_amd64.whl
    ├── pocketsphinx-0.1.3-cp35-cp35m-win_amd64.whl
    └── pocketsphinx-0.1.3.zip
/.github/ISSUE_TEMPLATE.md: -------------------------------------------------------------------------------- 1 | Steps to reproduce 2 | ------------------ 3 | 4 | 1. (How do you make the issue happen? Does it happen every time you try it?) 5 | 2. (Make sure to go into as much detail as needed to reproduce the issue. Posting your code here can help us resolve the problem much faster!) 6 | 3. (If there are any files, like audio recordings, don't forget to include them.) 7 | 8 | Expected behaviour 9 | ------------------ 10 | 11 | (What did you expect to happen?) 12 | 13 | Actual behaviour 14 | ---------------- 15 | 16 | (What happened instead? How is it different from what you expected?) 17 | 18 | ``` 19 | (If the library threw an exception, paste the full stack trace here) 20 | ``` 21 | 22 | System information 23 | ------------------ 24 | 25 | (Delete all the statements that don't apply.) 26 | 27 | My **system** is . (For example, "Ubuntu 16.04 LTS x64", "Windows 10 x64", or "macOS Sierra".) 28 | 29 | My **Python version** is . (You can check this by running `python -V`.) 30 | 31 | My **Pip version** is . 
(You can check this by running `pip -V`.) 32 | 33 | My **SpeechRecognition library version** is . (You can check this by running `python -c "import speech_recognition as sr;print(sr.__version__)"`.) 34 | 35 | My **PyAudio library version** is / I don't have PyAudio installed. (You can check this by running `python -c "import pyaudio as p;print(p.__version__)"`.) 36 | 37 | My **microphones** are: (You can check this by running `python -c "import speech_recognition as sr;print(sr.Microphone.list_microphone_names())"`.) 38 | 39 | My **working microphones** are: (You can check this by running `python -c "import speech_recognition as sr;print(sr.Microphone.list_working_microphones())"`.) 40 | 41 | I **installed PocketSphinx from** . (For example, from the Debian repositories, from Homebrew, or from the source code.) 42 | -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | *.egg-info 2 | build 3 | dist 4 | __pycache__ 5 | *.pyc 6 | speech_recognition/pocketsphinx-data/fr-FR/ 7 | speech_recognition/pocketsphinx-data/zh-CN/ 8 | fr-FR.zip 9 | zh-CN.zip 10 | it-IT.zip 11 | pocketsphinx-python/ 12 | examples/TEST.py 13 | speech_recognition/deepspeech-data/en-US 14 | -------------------------------------------------------------------------------- /.travis.yml: -------------------------------------------------------------------------------- 1 | language: python 2 | python: 3 | - "2.7" 4 | - "3.3" 5 | - "3.4" 6 | - "3.5" 7 | - "3.6" 8 | - "3.6-dev" 9 | - "3.7-dev" 10 | - "nightly" 11 | 12 | addons: 13 | apt: 14 | packages: 15 | - swig 16 | - libpulse-dev 17 | - libasound2-dev 18 | install: 19 | - trap 'sleep 3' ERR 20 | - pip install pocketsphinx monotonic 21 | - pip install flake8 rstcheck 22 | - pip install -e . 
23 | script: 24 | - flake8 --ignore=E501,E701,W504 speech_recognition tests examples setup.py # ignore errors for long lines and multi-statement lines 25 | - rstcheck README.rst reference/*.rst # ensure RST is well-formed 26 | - python -m unittest discover --verbose # run unit tests 27 | sudo: false # this allows TravisCI to use the fast Docker build environment rather than the slower VMs 28 | env: 29 | global: 30 | - secure: "jFHi/NK+hkf8Jw/bA06utypMRAzOcpeKPEZz/P2U79c70aIcmeAOGNUG6t5x2hmaeNpaP1STDtOLVdDawLY904rv/2sAhdMExlLUYubVQrJumvfgwyHRep0NLxrWV/Sf7y6FBPsvS0We29sn5HeEUlSzFwLrANyagpZYGeeWI3SGfdseDK/n4SlD436i7n5jM0Vlbmo07JDtdTN5Ov17APtuqy0ZViNhhTG+wvU8RCd/0/1IvstaaOhSa/82jABXNzH12hY4ynSuK75EVdVLj/WstSmH90r+8TS+YHH1D68yFeoub8kjTzZirqDuwb1s0nGOzx3VAC03+Fb48jHNfz2X0LJEj6gOpaaxgXOr4qkb1+Bx4L1bUkMk3ywjKoXFF0BU/haZfPbzG0fFUDubEXYjhC88gM1CR0LrFf4qtIqFcdM4sjasfv7TD2peiuWqVRZeHzjcvQVC8aDxVFFbTF+Cx1xZ1qLxAY5iJ/dUPWpOVcSs0GIJaJw7LQJU5uQbiU0vg17k9QcVYbASJu0cFAt/OsWGDZp/uArSWrMcSoexe8wI8/k5u9XFnOmlEu5kUJXOrZANjniUk5ilFUe+lag2Zl/ZasNtW16qke+vaWfBnpKl7NOoQemWNdYOxgyc/4x9B3x8gryf5XAmfBeqneh7k10O18u6GYpt33r0zuQ=" # encrypted version of "WIT_AI_KEY=(my key)" 31 | - secure: "ZKs+ywhJett8CpA24wR8js3C5B0uzaXMFIaiWBgkQfVhwbwkecCjG2HbLaJ1ncXP5VZnrXF6Ym4pZm87q0mIp/S0dMS7ZC5Jikowc3Bdyph9L49MDubZL0SO98+YR9j0QeKw8wxiVP6kv9cw12uVWn4VNgGcuW6AYZ0AqzdvUfW4+zby+Ua9U8LC0RcDKY3GR4Svq6dUjNFtJmI5uJ129UFO4oujCzuHNZL3KSSUJVt1KelVX+1eUNJ67sN3AvoMfx86jXNtN0kS12lZ+dP4YDo+lCtViG/W1dHCCdBmnUZsPE4Bc+Uyvg/BeKZaL1hgrNb6QHCNWmZC7jGxzkP2akwX5PxmKW7ClXn/79c7e84YUiRHlYQgL0qP+kZ7WDG6nJyKqLNFAtTHAw5F++5cpomNThYoCJeQOmkhi+KLEs9KMmn4d/bOLzW1RCeuf0bhjOOCI89+761aqJ1ky8UHJUZCKjYegHLM/bZ9LkKnUi+d+KYNQB8qpluZSLqObknCczh6ekKt/1FdrC+FBbFmpkTCuru1K9APdz01+ipVV8Av6NB+ax0+KPlKp49TA9uzANKWyLRkW9j6LD67MGF6SH/h8t5OeNZXdmf4DGjqv1erbKZeW+y25Hw7lVbqEo1m4T9wn1lmA1nse0kBrqGF+kQ4mNdfNSmWGWKxj+gFuxA=" # encrypted version of "BING_KEY=(my key)" 32 | - secure: "JEtMaAhDglqRrHdKZapxIaY0zlCohsepgxfRckhuCB3RZljeIKjt15Q/8LzFcx0ZdQV2thOQ/2oA0WpnfTckEnh42X+Ki0AUlezjXXYII2DenCs9q7jXxuOYK5AjxcNzyfeh7NnI2R3jdAyf49FdnoOa/OdEZq7aYRouP0yZtVKK/eMueURfr7JMsTrmuYoy1LXkF/yEyxns9HiaSebn7YqeQ7cb9Q5LcSigM6kCXZrtG1K4MqWGrvnqGeabE6xoZVxkf+az6fMv91oZ4spZRfjjlFpGx050gP4SCpk8XQqVS2HAtzVSFBdnLld4ydRoGVHVMAOmvQY5xbk5y9REVj4EVdfeErOhaEz6CfFqZi9UpAS0Zza/7khGDDWkHmfg4O4CzrVLkfdcPIgIKcz9TT9zP+wPVCYmfN2Qq0XB+PJkewjmgPuWZnUyBb402iPs1hWEze8oK6Yk5K3OnBuSqeE4EtvpT/SUrLtroSNcWJJ7i585cqgNB5KwzDDKNnyn0zteQQTj+fUzrumQ+/FTYjaafOVZ6ZAiZ+xvgge0+foB94GCoV/8LUm5rVTtk8vV3c3oJu9jdzsyiOSargYPSYg7iy1kzkC/eQ12rX89EWLGjoP+mveLGBpUebQNbB8vxaVRd8uaozW/G3Vwgelqg7gzrvmwkaYK3g6a1TAVpcs=" # encrypted version of "HOUNDIFY_CLIENT_ID=(my client ID)" 33 | - secure: "izFPobia0Luga6mL0pXDBmp/V1/kzZdFc09PbYUBNoyx63DPmDwP8dtSFy9ynEERJg4HQ6KeQzsPED3ZhnYO3C3lD3y078+k6Ryl15aONLrou6jzDiYMw6KV1CQ6V1OIz3tLwZoS7wwWdr0ZYdMEklYVVVu8wJOzl6aZ8gtp8Y3woev6qrxFeXhkkNZOybtQ8ugV6a5EypVEVQ2IGTEVvA6A8oSGDd8BDOSYyKPQ3LXPx7imA6freqio/b5HaACkBIidFRykly3xkBib2phhww2D18Zdu5imJtCmHxFQ3V+N5ZzlUkgmR9gyvdblQgJ7sCwpQAC/Mb0KWqUDar59nRA1WmY+onVN/t7sjBBCPjS0Ddu5Ls3X9Qdh3rflQ2Fc7nSi8iVITAHFreUKEW/jgJyBnFuau0Cu5DNcZYy24W+GBzwks1g/uoy4vWVbijaIzSEXu352CqClSJpBTltp3z0KZ/9D9VRB1tFoFmlVWkW39bBBqpJy/49mGVlbrG2J+hyCW+J+BQFpTcjXSd+JS57XXYKcm3QXnNxxnIQ5lw/6t92SbWWP+IeJB9fJENFLteE5XjtQWQ7gHbb7hP0iH9u92mJbehzvdo9KwePlIeWFC1Wyw3ZHrLa56DykfPNg9kYcuJdTwLMRxI4X5aG/e1QBVAwM8tii6Zrjag684iM=" # encrypted version of "HOUNDIFY_CLIENT_KEY=(my client key)" 34 | - secure: 
"uj5LUKDtf214EZPqsjpy6tk8iXEfydC3z/px98xbXa/H6PVN6wMPTHsF1DuuTWCbLrqNyi9/rMbpJFiNuqMm+q0LarrvvuTKHA9JFe/ZA11R1w3WI2ZMTvub6vzCbmcznIkjq981BjFWz5aCazPXhLt18e0iMit2FR+D6jwZ4al8TIo9i6RjkJ3MimH2/Sgm2BnXZ7qHsmDlG+4VsABiPiH0SPzrxqJJ4WSOb8EnNkNcOujiHuYvDNR+6R566bXjV1x+z2ewKb2nae5LOEl8L+6B/CsNT2cyeds2imYWAw9vTZoTajXf2u21M3pqRINQ67CuWhGFOdUXiEd6E/jTQFcsE4GuB7eMIYcHCmPzhhHn1b6XzNJtf923+YlSnayf63Y5jHjeSWSWs6pjJOUjJquuXS8vQYuJYX4n8sXDeEsZg0yD2jdxFMqMmjZoKKJzWPTPUkDTLawZdZs2q6bOF+xBQysUPozgSnxe3koCMFLeA1cU6fUkXWWIFDuAehR0JqYQHaUovoO0ZYx8Env0Ojhl6IZclONxaLVA41CbzkSUC1pg0k/VeMiv6YB2SQsFxV1riKM/OPDxq7AAuUuNVDCj/SGya4BJEYrxtagtmq0em8Q8SJzLq7IFNBNq5pO8IaqA0JO/tieSIsutrhdRzVMI35apuwbE+5jxoDmiGW0=" # encrypted version of "IBM_USERNAME=(my username)" 35 | - secure: "fqWkYnsx5xrYjDosEkHramkzuuRjAu6NUkSx/yJf78WTDgJ0XAvu7BP9vdfO9g+KvwVZ9uBSClBXiNM7c1i/CpZCJcZJQtQS9PqL3YB9+76J3hPwOsQx0t3oRiYKPDmHX3WFUFuGhI2k90iw4n6nWHUUiU2WxWk/8sibXxyCf99CRMGwpfycd+w8mhsi/MkzbyxWBjzgRKIFu6tg28bs6GcjrMyoq6avD3jpwghGAu1CA3UnuxdOqY9WI2+d9MwmtK6cUQ88o/5MX7GjPZzfkiohru03yn1sBBBivf1V7Vwvd7xsnUZ+6/WiJnzRkaqoGaYSOnI5bhJ/qR21zNMwNEaYrbdyCWau+YLuOheTJzihyeTN9f5zQg/PiBQMLDyKWBw7+v2rQMzTmKoif7fz+SAN5GMXvqgcoMlZ7se9sk0QH6z+GLYbnZNtu0Qpf01gNaJaveQRuurdLtihF8EBTET+hBouiRTUWHvJMgd6PI2pp9BRdnvwwHlhCQLwUjqprLUHX6OdbhFc2ixHwao+Qbg+oCEv+IhCrW1HoTCFIBy/SllRx0l7MfroEiRDRkaZeKA6bOr+3yirVmUOQVLH5rmVUuoNCmI0BZG5GPt5+AhZ36Wlw3/CXkcJAf7VNcya+u4ls+Hdxb9SyFNsZ5IF0ZWNRPfZlG8uEGDy/o05fbY=" # encrypted version of "IBM_PASSWORD=(my password)" 36 | -------------------------------------------------------------------------------- /LICENSE-FLAC.txt: -------------------------------------------------------------------------------- 1 | GNU GENERAL PUBLIC LICENSE 2 | Version 2, June 1991 3 | 4 | Copyright (C) 1989, 1991 Free Software Foundation, Inc., 5 | 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA 6 | Everyone is permitted to copy and distribute verbatim copies 7 | of this license document, but changing it is not allowed. 8 | 9 | Preamble 10 | 11 | The licenses for most software are designed to take away your 12 | freedom to share and change it. By contrast, the GNU General Public 13 | License is intended to guarantee your freedom to share and change free 14 | software--to make sure the software is free for all its users. This 15 | General Public License applies to most of the Free Software 16 | Foundation's software and to any other program whose authors commit to 17 | using it. (Some other Free Software Foundation software is covered by 18 | the GNU Lesser General Public License instead.) You can apply it to 19 | your programs, too. 20 | 21 | When we speak of free software, we are referring to freedom, not 22 | price. Our General Public Licenses are designed to make sure that you 23 | have the freedom to distribute copies of free software (and charge for 24 | this service if you wish), that you receive source code or can get it 25 | if you want it, that you can change the software or use pieces of it 26 | in new free programs; and that you know you can do these things. 27 | 28 | To protect your rights, we need to make restrictions that forbid 29 | anyone to deny you these rights or to ask you to surrender the rights. 30 | These restrictions translate to certain responsibilities for you if you 31 | distribute copies of the software, or if you modify it. 32 | 33 | For example, if you distribute copies of such a program, whether 34 | gratis or for a fee, you must give the recipients all the rights that 35 | you have. 
You must make sure that they, too, receive or can get the 36 | source code. And you must show them these terms so they know their 37 | rights. 38 | 39 | We protect your rights with two steps: (1) copyright the software, and 40 | (2) offer you this license which gives you legal permission to copy, 41 | distribute and/or modify the software. 42 | 43 | Also, for each author's protection and ours, we want to make certain 44 | that everyone understands that there is no warranty for this free 45 | software. If the software is modified by someone else and passed on, we 46 | want its recipients to know that what they have is not the original, so 47 | that any problems introduced by others will not reflect on the original 48 | authors' reputations. 49 | 50 | Finally, any free program is threatened constantly by software 51 | patents. We wish to avoid the danger that redistributors of a free 52 | program will individually obtain patent licenses, in effect making the 53 | program proprietary. To prevent this, we have made it clear that any 54 | patent must be licensed for everyone's free use or not licensed at all. 55 | 56 | The precise terms and conditions for copying, distribution and 57 | modification follow. 58 | 59 | GNU GENERAL PUBLIC LICENSE 60 | TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION 61 | 62 | 0. This License applies to any program or other work which contains 63 | a notice placed by the copyright holder saying it may be distributed 64 | under the terms of this General Public License. The "Program", below, 65 | refers to any such program or work, and a "work based on the Program" 66 | means either the Program or any derivative work under copyright law: 67 | that is to say, a work containing the Program or a portion of it, 68 | either verbatim or with modifications and/or translated into another 69 | language. (Hereinafter, translation is included without limitation in 70 | the term "modification".) Each licensee is addressed as "you". 71 | 72 | Activities other than copying, distribution and modification are not 73 | covered by this License; they are outside its scope. The act of 74 | running the Program is not restricted, and the output from the Program 75 | is covered only if its contents constitute a work based on the 76 | Program (independent of having been made by running the Program). 77 | Whether that is true depends on what the Program does. 78 | 79 | 1. You may copy and distribute verbatim copies of the Program's 80 | source code as you receive it, in any medium, provided that you 81 | conspicuously and appropriately publish on each copy an appropriate 82 | copyright notice and disclaimer of warranty; keep intact all the 83 | notices that refer to this License and to the absence of any warranty; 84 | and give any other recipients of the Program a copy of this License 85 | along with the Program. 86 | 87 | You may charge a fee for the physical act of transferring a copy, and 88 | you may at your option offer warranty protection in exchange for a fee. 89 | 90 | 2. You may modify your copy or copies of the Program or any portion 91 | of it, thus forming a work based on the Program, and copy and 92 | distribute such modifications or work under the terms of Section 1 93 | above, provided that you also meet all of these conditions: 94 | 95 | a) You must cause the modified files to carry prominent notices 96 | stating that you changed the files and the date of any change. 
97 | 98 | b) You must cause any work that you distribute or publish, that in 99 | whole or in part contains or is derived from the Program or any 100 | part thereof, to be licensed as a whole at no charge to all third 101 | parties under the terms of this License. 102 | 103 | c) If the modified program normally reads commands interactively 104 | when run, you must cause it, when started running for such 105 | interactive use in the most ordinary way, to print or display an 106 | announcement including an appropriate copyright notice and a 107 | notice that there is no warranty (or else, saying that you provide 108 | a warranty) and that users may redistribute the program under 109 | these conditions, and telling the user how to view a copy of this 110 | License. (Exception: if the Program itself is interactive but 111 | does not normally print such an announcement, your work based on 112 | the Program is not required to print an announcement.) 113 | 114 | These requirements apply to the modified work as a whole. If 115 | identifiable sections of that work are not derived from the Program, 116 | and can be reasonably considered independent and separate works in 117 | themselves, then this License, and its terms, do not apply to those 118 | sections when you distribute them as separate works. But when you 119 | distribute the same sections as part of a whole which is a work based 120 | on the Program, the distribution of the whole must be on the terms of 121 | this License, whose permissions for other licensees extend to the 122 | entire whole, and thus to each and every part regardless of who wrote it. 123 | 124 | Thus, it is not the intent of this section to claim rights or contest 125 | your rights to work written entirely by you; rather, the intent is to 126 | exercise the right to control the distribution of derivative or 127 | collective works based on the Program. 128 | 129 | In addition, mere aggregation of another work not based on the Program 130 | with the Program (or with a work based on the Program) on a volume of 131 | a storage or distribution medium does not bring the other work under 132 | the scope of this License. 133 | 134 | 3. You may copy and distribute the Program (or a work based on it, 135 | under Section 2) in object code or executable form under the terms of 136 | Sections 1 and 2 above provided that you also do one of the following: 137 | 138 | a) Accompany it with the complete corresponding machine-readable 139 | source code, which must be distributed under the terms of Sections 140 | 1 and 2 above on a medium customarily used for software interchange; or, 141 | 142 | b) Accompany it with a written offer, valid for at least three 143 | years, to give any third party, for a charge no more than your 144 | cost of physically performing source distribution, a complete 145 | machine-readable copy of the corresponding source code, to be 146 | distributed under the terms of Sections 1 and 2 above on a medium 147 | customarily used for software interchange; or, 148 | 149 | c) Accompany it with the information you received as to the offer 150 | to distribute corresponding source code. (This alternative is 151 | allowed only for noncommercial distribution and only if you 152 | received the program in object code or executable form with such 153 | an offer, in accord with Subsection b above.) 154 | 155 | The source code for a work means the preferred form of the work for 156 | making modifications to it. 
For an executable work, complete source 157 | code means all the source code for all modules it contains, plus any 158 | associated interface definition files, plus the scripts used to 159 | control compilation and installation of the executable. However, as a 160 | special exception, the source code distributed need not include 161 | anything that is normally distributed (in either source or binary 162 | form) with the major components (compiler, kernel, and so on) of the 163 | operating system on which the executable runs, unless that component 164 | itself accompanies the executable. 165 | 166 | If distribution of executable or object code is made by offering 167 | access to copy from a designated place, then offering equivalent 168 | access to copy the source code from the same place counts as 169 | distribution of the source code, even though third parties are not 170 | compelled to copy the source along with the object code. 171 | 172 | 4. You may not copy, modify, sublicense, or distribute the Program 173 | except as expressly provided under this License. Any attempt 174 | otherwise to copy, modify, sublicense or distribute the Program is 175 | void, and will automatically terminate your rights under this License. 176 | However, parties who have received copies, or rights, from you under 177 | this License will not have their licenses terminated so long as such 178 | parties remain in full compliance. 179 | 180 | 5. You are not required to accept this License, since you have not 181 | signed it. However, nothing else grants you permission to modify or 182 | distribute the Program or its derivative works. These actions are 183 | prohibited by law if you do not accept this License. Therefore, by 184 | modifying or distributing the Program (or any work based on the 185 | Program), you indicate your acceptance of this License to do so, and 186 | all its terms and conditions for copying, distributing or modifying 187 | the Program or works based on it. 188 | 189 | 6. Each time you redistribute the Program (or any work based on the 190 | Program), the recipient automatically receives a license from the 191 | original licensor to copy, distribute or modify the Program subject to 192 | these terms and conditions. You may not impose any further 193 | restrictions on the recipients' exercise of the rights granted herein. 194 | You are not responsible for enforcing compliance by third parties to 195 | this License. 196 | 197 | 7. If, as a consequence of a court judgment or allegation of patent 198 | infringement or for any other reason (not limited to patent issues), 199 | conditions are imposed on you (whether by court order, agreement or 200 | otherwise) that contradict the conditions of this License, they do not 201 | excuse you from the conditions of this License. If you cannot 202 | distribute so as to satisfy simultaneously your obligations under this 203 | License and any other pertinent obligations, then as a consequence you 204 | may not distribute the Program at all. For example, if a patent 205 | license would not permit royalty-free redistribution of the Program by 206 | all those who receive copies directly or indirectly through you, then 207 | the only way you could satisfy both it and this License would be to 208 | refrain entirely from distribution of the Program. 
209 | 210 | If any portion of this section is held invalid or unenforceable under 211 | any particular circumstance, the balance of the section is intended to 212 | apply and the section as a whole is intended to apply in other 213 | circumstances. 214 | 215 | It is not the purpose of this section to induce you to infringe any 216 | patents or other property right claims or to contest validity of any 217 | such claims; this section has the sole purpose of protecting the 218 | integrity of the free software distribution system, which is 219 | implemented by public license practices. Many people have made 220 | generous contributions to the wide range of software distributed 221 | through that system in reliance on consistent application of that 222 | system; it is up to the author/donor to decide if he or she is willing 223 | to distribute software through any other system and a licensee cannot 224 | impose that choice. 225 | 226 | This section is intended to make thoroughly clear what is believed to 227 | be a consequence of the rest of this License. 228 | 229 | 8. If the distribution and/or use of the Program is restricted in 230 | certain countries either by patents or by copyrighted interfaces, the 231 | original copyright holder who places the Program under this License 232 | may add an explicit geographical distribution limitation excluding 233 | those countries, so that distribution is permitted only in or among 234 | countries not thus excluded. In such case, this License incorporates 235 | the limitation as if written in the body of this License. 236 | 237 | 9. The Free Software Foundation may publish revised and/or new versions 238 | of the General Public License from time to time. Such new versions will 239 | be similar in spirit to the present version, but may differ in detail to 240 | address new problems or concerns. 241 | 242 | Each version is given a distinguishing version number. If the Program 243 | specifies a version number of this License which applies to it and "any 244 | later version", you have the option of following the terms and conditions 245 | either of that version or of any later version published by the Free 246 | Software Foundation. If the Program does not specify a version number of 247 | this License, you may choose any version ever published by the Free Software 248 | Foundation. 249 | 250 | 10. If you wish to incorporate parts of the Program into other free 251 | programs whose distribution conditions are different, write to the author 252 | to ask for permission. For software which is copyrighted by the Free 253 | Software Foundation, write to the Free Software Foundation; we sometimes 254 | make exceptions for this. Our decision will be guided by the two goals 255 | of preserving the free status of all derivatives of our free software and 256 | of promoting the sharing and reuse of software generally. 257 | 258 | NO WARRANTY 259 | 260 | 11. BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY 261 | FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN 262 | OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES 263 | PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED 264 | OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF 265 | MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS 266 | TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU. 
SHOULD THE 267 | PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING, 268 | REPAIR OR CORRECTION. 269 | 270 | 12. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING 271 | WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR 272 | REDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, 273 | INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING 274 | OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED 275 | TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY 276 | YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER 277 | PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE 278 | POSSIBILITY OF SUCH DAMAGES. 279 | 280 | END OF TERMS AND CONDITIONS 281 | 282 | How to Apply These Terms to Your New Programs 283 | 284 | If you develop a new program, and you want it to be of the greatest 285 | possible use to the public, the best way to achieve this is to make it 286 | free software which everyone can redistribute and change under these terms. 287 | 288 | To do so, attach the following notices to the program. It is safest 289 | to attach them to the start of each source file to most effectively 290 | convey the exclusion of warranty; and each file should have at least 291 | the "copyright" line and a pointer to where the full notice is found. 292 | 293 | 294 | Copyright (C) 295 | 296 | This program is free software; you can redistribute it and/or modify 297 | it under the terms of the GNU General Public License as published by 298 | the Free Software Foundation; either version 2 of the License, or 299 | (at your option) any later version. 300 | 301 | This program is distributed in the hope that it will be useful, 302 | but WITHOUT ANY WARRANTY; without even the implied warranty of 303 | MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the 304 | GNU General Public License for more details. 305 | 306 | You should have received a copy of the GNU General Public License along 307 | with this program; if not, write to the Free Software Foundation, Inc., 308 | 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. 309 | 310 | Also add information on how to contact you by electronic and paper mail. 311 | 312 | If the program is interactive, make it output a short notice like this 313 | when it starts in an interactive mode: 314 | 315 | Gnomovision version 69, Copyright (C) year name of author 316 | Gnomovision comes with ABSOLUTELY NO WARRANTY; for details type `show w'. 317 | This is free software, and you are welcome to redistribute it 318 | under certain conditions; type `show c' for details. 319 | 320 | The hypothetical commands `show w' and `show c' should show the appropriate 321 | parts of the General Public License. Of course, the commands you use may 322 | be called something other than `show w' and `show c'; they could even be 323 | mouse-clicks or menu items--whatever suits your program. 324 | 325 | You should also get your employer (if you work as a programmer) or your 326 | school, if any, to sign a "copyright disclaimer" for the program, if 327 | necessary. Here is a sample; alter the names: 328 | 329 | Yoyodyne, Inc., hereby disclaims all copyright interest in the program 330 | `Gnomovision' (which makes passes at compilers) written by James Hacker. 
331 | 332 | , 1 April 1989 333 | Ty Coon, President of Vice 334 | 335 | This General Public License does not permit incorporating your program into 336 | proprietary programs. If your program is a subroutine library, you may 337 | consider it more useful to permit linking proprietary applications with the 338 | library. If this is what you want to do, use the GNU Lesser General 339 | Public License instead of this License. 340 | -------------------------------------------------------------------------------- /LICENSE.txt: -------------------------------------------------------------------------------- 1 | Copyright (c) 2014-2017, Anthony Zhang 2 | All rights reserved. 3 | 4 | Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: 5 | 6 | 1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. 7 | 8 | 2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution. 9 | 10 | 3. Neither the name of the copyright holder nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission. 11 | 12 | THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. -------------------------------------------------------------------------------- /MANIFEST.in: -------------------------------------------------------------------------------- 1 | graft speech_recognition 2 | graft reference 3 | recursive-exclude speech_recognition *.pyc 4 | include README.rst 5 | include LICENSE.txt 6 | include LICENSE-FLAC.txt 7 | -------------------------------------------------------------------------------- /README.rst: -------------------------------------------------------------------------------- 1 | SpeechRecognition 2 | ================= 3 | 4 | .. image:: https://img.shields.io/pypi/v/SpeechRecognition.svg 5 | :target: https://pypi.python.org/pypi/SpeechRecognition/ 6 | :alt: Latest Version 7 | 8 | .. image:: https://img.shields.io/pypi/status/SpeechRecognition.svg 9 | :target: https://pypi.python.org/pypi/SpeechRecognition/ 10 | :alt: Development Status 11 | 12 | .. image:: https://img.shields.io/pypi/pyversions/SpeechRecognition.svg 13 | :target: https://pypi.python.org/pypi/SpeechRecognition/ 14 | :alt: Supported Python Versions 15 | 16 | .. image:: https://img.shields.io/pypi/l/SpeechRecognition.svg 17 | :target: https://pypi.python.org/pypi/SpeechRecognition/ 18 | :alt: License 19 | 20 | .. 
image:: https://api.travis-ci.org/Uberi/speech_recognition.svg?branch=master 21 | :target: https://travis-ci.org/Uberi/speech_recognition 22 | :alt: Continuous Integration Test Results 23 | 24 | Library for performing speech recognition, with support for several engines and APIs, online and offline. 25 | 26 | Speech recognition engine/API support: 27 | 28 | * `CMU Sphinx `__ (works offline) 29 | * Google Speech Recognition 30 | * `Google Cloud Speech API `__ 31 | * `Wit.ai `__ 32 | * `Microsoft Azure Speech `__ 33 | * `Microsoft Bing Voice Recognition (Deprecated) `__ 34 | * `Houndify API `__ 35 | * `IBM Speech to Text `__ 36 | * `Snowboy Hotword Detection `__ (works offline) 37 | 38 | **Quickstart:** ``pip install SpeechRecognition``. See the "Installing" section for more details. 39 | 40 | To quickly try it out, run ``python -m speech_recognition`` after installing (which additionally requires the ``pyaudio`` package). 41 | 42 | Project links: 43 | 44 | - `PyPI `__ 45 | - `Source code `__ 46 | - `Issue tracker `__ 47 | 48 | Library Reference 49 | ----------------- 50 | 51 | The `library reference `__ documents every publicly accessible object in the library. 52 | 53 | See `Notes on using PocketSphinx `__ for information about installing languages, compiling PocketSphinx, and building language packs from online resources. 54 | 55 | Examples 56 | -------- 57 | 58 | See the `examples directory `__ in the repository root for usage examples: 59 | 60 | - `Recognize speech input from the microphone `__ 61 | - `Transcribe an audio file `__ 62 | - `Save audio data to an audio file `__ 63 | - `Show extended recognition results `__ 64 | - `Calibrate the recognizer energy threshold for ambient noise levels `__ (see ``recognizer_instance.energy_threshold`` for details) 65 | - `Listening to a microphone in the background `__ 66 | - `Various other useful recognizer features `__ 67 | 68 | Installing 69 | ---------- 70 | 71 | First, make sure you have all the requirements listed in the "Requirements" section. 72 | 73 | The easiest way to install this is using ``pip install SpeechRecognition``. 74 | 75 | Otherwise, download the source distribution from `PyPI `__, and extract the archive. 76 | 77 | In the folder, run ``python setup.py install``. 78 | 79 | Requirements 80 | ------------ 81 | 82 | To use all of the functionality of the library, you should have: 83 | 84 | * **Python** `2.6, 2.7, or 3.3+ `__ (required) 85 | * **PyAudio** 0.2.11+ (required only if you use microphone input, ``Microphone``) 86 | * **PocketSphinx** (required only if you use the Sphinx recognizer, ``recognizer_instance.recognize_sphinx``) 87 | * **Google API Client Library for Python** (required only if you use the Google Cloud Speech API, ``recognizer_instance.recognize_google_cloud``) 88 | * **FLAC encoder** (required only if the system is not x86-based Windows/Linux/OS X) 89 | 90 | The following requirements are optional, but can improve or extend functionality in some situations: 91 | 92 | * On Python 2, and only on Python 2, some functions (like ``recognizer_instance.recognize_bing``) will run slower if you do not have **Monotonic for Python 2** installed. 93 | * If using CMU Sphinx, you may want to `install additional language packs `__ to support languages like International French or Mandarin Chinese. 94 | 95 | The following sections go over the details of each requirement. 
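For quick reference, the optional dependencies can all be installed with Pip, as sketched below (one possible combination - install only the pieces you need, and see the sections below for platform-specific build steps and caveats):

.. code:: bash

    pip install SpeechRecognition    # the library itself (required)
    pip install pyaudio              # microphone input (the Microphone class)
    pip install pocketsphinx         # offline recognition (recognize_sphinx)
    pip install google-cloud-speech  # Google Cloud Speech API (recognize_google_cloud)
    pip install monotonic            # faster operation in some functions, Python 2 only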
96 | 97 | PyAudio (for microphone users) 98 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 99 | 100 | `PyAudio `__ is required if and only if you want to use microphone input (``Microphone``). PyAudio version 0.2.11+ is required, as earlier versions have known memory management bugs when recording from microphones in certain situations. 101 | 102 | If not installed, everything in the library will still work, except attempting to instantiate a ``Microphone`` object will raise an ``AttributeError``. 103 | 104 | The installation instructions on the PyAudio website are quite good - for convenience, they are summarized below: 105 | 106 | * On Windows, install PyAudio using `Pip `__: execute ``pip install pyaudio`` in a terminal. 107 | * On Debian-derived Linux distributions (like Ubuntu and Mint), install PyAudio using `APT `__: execute ``sudo apt-get install python-pyaudio python3-pyaudio`` in a terminal. 108 | * If the version in the repositories is too old, install the latest release using Pip: execute ``sudo apt-get install portaudio19-dev python-all-dev python3-all-dev && sudo pip install pyaudio`` (replace ``pip`` with ``pip3`` if using Python 3). 109 | * On OS X, install PortAudio using `Homebrew `__: ``brew install portaudio``. Then, install PyAudio using `Pip `__: ``pip install pyaudio``. 110 | * On other POSIX-based systems, install the ``portaudio19-dev`` and ``python-all-dev`` (or ``python3-all-dev`` if using Python 3) packages (or their closest equivalents) using a package manager of your choice, and then install PyAudio using `Pip `__: ``pip install pyaudio`` (replace ``pip`` with ``pip3`` if using Python 3). 111 | 112 | PyAudio `wheel packages `__ for common 64-bit Python versions on Windows and Linux are included for convenience, under the ``third-party/`` `directory `__ in the repository root. To install, simply run ``pip install wheel`` followed by ``pip install ./third-party/WHEEL_FILENAME`` (replace ``pip`` with ``pip3`` if using Python 3) in the repository root directory. 113 | 114 | PocketSphinx-Python (for Sphinx users) 115 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 116 | 117 | `PocketSphinx-Python `__ is **required if and only if you want to use the Sphinx recognizer** (``recognizer_instance.recognize_sphinx``). 118 | 119 | PocketSphinx-Python `wheel packages `__ for 64-bit Python 2.7, 3.4, and 3.5 on Windows are included for convenience, under the ``third-party/`` `directory `__. To install, simply run ``pip install wheel`` followed by ``pip install ./third-party/WHEEL_FILENAME`` (replace ``pip`` with ``pip3`` if using Python 3) in the SpeechRecognition folder. 120 | 121 | On Linux and other POSIX systems (such as OS X), follow the instructions under "Building PocketSphinx-Python from source" in `Notes on using PocketSphinx `__ for installation instructions. 122 | 123 | Note that the versions available in most package repositories are outdated and will not work with the bundled language data. Using the bundled wheel packages or building from source is recommended. 124 | 125 | See `Notes on using PocketSphinx `__ for information about installing languages, compiling PocketSphinx, and building language packs from online resources. This document is also included under ``reference/pocketsphinx.rst``. 
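Once PocketSphinx-Python is installed, a quick way to verify that the Sphinx recognizer works is to transcribe one of the bundled sample files. The following is a minimal sketch along the lines of ``examples/audio_transcribe.py``; it assumes ``english.wav`` (shipped in the ``examples/`` directory) is in the current directory:

.. code:: python

    import speech_recognition as sr

    r = sr.Recognizer()
    with sr.AudioFile("english.wav") as source:  # sample file from the examples/ directory
        audio = r.record(source)  # read the entire audio file
    try:
        print("Sphinx thinks you said " + r.recognize_sphinx(audio))
    except sr.UnknownValueError:
        print("Sphinx could not understand audio")
    except sr.RequestError as e:
        print("Sphinx error; {0}".format(e))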
126 | 127 | Google Cloud Speech Library for Python (for Google Cloud Speech API users) 128 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 129 | 130 | `Google Cloud Speech library for Python `__ is required if and only if you want to use the Google Cloud Speech API (``recognizer_instance.recognize_google_cloud``). 131 | 132 | If not installed, everything in the library will still work, except that calling ``recognizer_instance.recognize_google_cloud`` will raise a ``RequestError``. 133 | 134 | According to the `official installation instructions `__, the recommended way to install this is using `Pip `__: execute ``pip install google-cloud-speech`` (replace ``pip`` with ``pip3`` if using Python 3). 135 | 136 | FLAC (for some systems) 137 | ~~~~~~~~~~~~~~~~~~~~~~~ 138 | 139 | A `FLAC encoder `__ is required to encode the audio data to send to the API. If using Windows (x86 or x86-64), OS X (Intel Macs only, OS X 10.6 or higher), or Linux (x86 or x86-64), this is **already bundled with this library - you do not need to install anything**. 140 | 141 | Otherwise, ensure that you have the ``flac`` command line tool, which is often available through the system package manager. For example, this would usually be ``sudo apt-get install flac`` on Debian-derivatives, or ``brew install flac`` on OS X with Homebrew. 142 | 143 | Monotonic for Python 2 (for faster operations in some functions on Python 2) 144 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 145 | 146 | On Python 2, and only on Python 2, if you do not install the `Monotonic for Python 2 `__ library, some functions will run slower than they otherwise could (though everything will still work correctly). 147 | 148 | On Python 3, that library's functionality is built into the Python standard library, which makes it unnecessary. 149 | 150 | This is because monotonic time is necessary to handle cache expiry properly in the face of system time changes and other time-related issues. If monotonic time functionality is not available, then things like access token requests will not be cached. 151 | 152 | To install, use `Pip `__: execute ``pip install monotonic`` in a terminal. 153 | 154 | Troubleshooting 155 | --------------- 156 | 157 | The recognizer tries to recognize speech even when I'm not speaking, or after I'm done speaking. 158 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 159 | 160 | Try increasing the ``recognizer_instance.energy_threshold`` property. This property controls how sensitive the recognizer is to when recognition should start - essentially, how loud the audio has to be before it is treated as speech. Higher values mean that it will be less sensitive, which is useful if you are in a loud room. 161 | 162 | This value depends entirely on your microphone or audio data. There is no one-size-fits-all value, but good values typically range from 50 to 4000. 163 | 164 | Also, check on your microphone volume settings. If it is too sensitive, the microphone may be picking up a lot of ambient noise. If it is too insensitive, the microphone may be rejecting speech as just noise. 165 | 166 | The recognizer can't recognize speech right after it starts listening for the first time. 167 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 168 | 169 | The ``recognizer_instance.energy_threshold`` property is probably set to a value that is too high to start off with, and is then adjusted lower automatically by dynamic energy threshold adjustment. 
Before it is at a good level, the energy threshold is so high that speech is just considered ambient noise. 170 | 171 | The solution is to decrease this threshold, or call ``recognizer_instance.adjust_for_ambient_noise`` beforehand, which will set the threshold to a good value automatically. 172 | 173 | The recognizer doesn't understand my particular language/dialect. 174 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 175 | 176 | Try setting the recognition language to your language/dialect. To do this, see the documentation for ``recognizer_instance.recognize_sphinx``, ``recognizer_instance.recognize_google``, ``recognizer_instance.recognize_wit``, ``recognizer_instance.recognize_bing``, ``recognizer_instance.recognize_api``, ``recognizer_instance.recognize_houndify``, and ``recognizer_instance.recognize_ibm``. 177 | 178 | For example, if your language/dialect is British English, it is better to use ``"en-GB"`` as the language rather than ``"en-US"``. 179 | 180 | The recognizer hangs on ``recognizer_instance.listen``; specifically, when it's calling ``Microphone.MicrophoneStream.read``. 181 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 182 | 183 | This usually happens when you're using a Raspberry Pi board, which doesn't have audio input capabilities by itself. This causes the default microphone used by PyAudio to simply block when we try to read it. If you happen to be using a Raspberry Pi, you'll need a USB sound card (or USB microphone). 184 | 185 | Once you do this, change all instances of ``Microphone()`` to ``Microphone(device_index=MICROPHONE_INDEX)``, where ``MICROPHONE_INDEX`` is the hardware-specific index of the microphone. 186 | 187 | To figure out what the value of ``MICROPHONE_INDEX`` should be, run the following code: 188 | 189 | .. code:: python 190 | 191 | import speech_recognition as sr 192 | for index, name in enumerate(sr.Microphone.list_microphone_names()): 193 | print("Microphone with name \"{1}\" found for `Microphone(device_index={0})`".format(index, name)) 194 | 195 | This will print out something like the following: 196 | 197 | :: 198 | 199 | Microphone with name "HDA Intel HDMI: 0 (hw:0,3)" found for `Microphone(device_index=0)` 200 | Microphone with name "HDA Intel HDMI: 1 (hw:0,7)" found for `Microphone(device_index=1)` 201 | Microphone with name "HDA Intel HDMI: 2 (hw:0,8)" found for `Microphone(device_index=2)` 202 | Microphone with name "Blue Snowball: USB Audio (hw:1,0)" found for `Microphone(device_index=3)` 203 | Microphone with name "hdmi" found for `Microphone(device_index=4)` 204 | Microphone with name "pulse" found for `Microphone(device_index=5)` 205 | Microphone with name "default" found for `Microphone(device_index=6)` 206 | 207 | Now, to use the Snowball microphone, you would change ``Microphone()`` to ``Microphone(device_index=3)``. 208 | 209 | Calling ``Microphone()`` gives the error ``IOError: No Default Input Device Available``. 210 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 211 | 212 | As the error says, the program doesn't know which microphone to use. 213 | 214 | To proceed, either use ``Microphone(device_index=MICROPHONE_INDEX, ...)`` instead of ``Microphone(...)``, or set a default microphone in your OS. You can obtain possible values of ``MICROPHONE_INDEX`` using the code in the troubleshooting entry right above this one. 
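For example, to pin the recognizer to a specific device and calibrate it before listening (a minimal sketch - the value ``device_index=3`` is purely illustrative, so substitute the index you found above):

.. code:: python

    import speech_recognition as sr

    r = sr.Recognizer()
    with sr.Microphone(device_index=3) as source:  # 3 is illustrative; use your own index
        r.adjust_for_ambient_noise(source)  # calibrates energy_threshold for ambient noise
        audio = r.listen(source)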
215 | 216 | The code examples raise ``UnicodeEncodeError: 'ascii' codec can't encode character`` when run. 217 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 218 | 219 | When you're using Python 2, and your language uses non-ASCII characters, and the terminal or file-like object you're printing to only supports ASCII, an error is raised when trying to write non-ASCII characters. 220 | 221 | This is because in Python 2, ``recognizer_instance.recognize_sphinx``, ``recognizer_instance.recognize_google``, ``recognizer_instance.recognize_wit``, ``recognizer_instance.recognize_bing``, ``recognizer_instance.recognize_api``, ``recognizer_instance.recognize_houndify``, and ``recognizer_instance.recognize_ibm`` return unicode strings (``u"something"``) rather than byte strings (``"something"``). In Python 3, all strings are unicode strings. 222 | 223 | To make printing of unicode strings work in Python 2 as well, replace all print statements in your code of the following form: 224 | 225 | .. code:: python 226 | 227 | print SOME_UNICODE_STRING 228 | 229 | With the following: 230 | 231 | .. code:: python 232 | 233 | print SOME_UNICODE_STRING.encode("utf8") 234 | 235 | This change, however, will prevent the code from working in Python 3. 236 | 237 | The program doesn't run when compiled with `PyInstaller `__. 238 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 239 | 240 | As of PyInstaller version 3.0, SpeechRecognition is supported out of the box. If you're getting weird issues when compiling your program using PyInstaller, simply update PyInstaller. 241 | 242 | You can easily do this by running ``pip install --upgrade pyinstaller``. 243 | 244 | On Ubuntu/Debian, I get annoying output in the terminal saying things like "bt_audio_service_open: [...] Connection refused" and various others. 245 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 246 | 247 | The "bt_audio_service_open" error means that you have a Bluetooth audio device, but as a physical device is not currently connected, we can't actually use it - if you're not using a Bluetooth microphone, then this can be safely ignored. If you are, and audio isn't working, then double check to make sure your microphone is actually connected. There does not seem to be a simple way to disable these messages. 248 | 249 | For errors of the form "ALSA lib [...] Unknown PCM", see `this StackOverflow answer `__. Basically, to get rid of an error of the form "Unknown PCM cards.pcm.rear", simply comment out ``pcm.rear cards.pcm.rear`` in ``/usr/share/alsa/alsa.conf``, ``~/.asoundrc``, and ``/etc/asound.conf``. 250 | 251 | For "jack server is not running or cannot be started" or "connect(2) call to /dev/shm/jack-1000/default/jack_0 failed (err=No such file or directory)" or "attempt to connect to server failed", these are caused by ALSA trying to connect to JACK, and can be safely ignored. I'm not aware of any simple way to turn those messages off at this time, besides `entirely disabling printing while starting the microphone `__. 252 | 253 | On OS X, I get a ``ChildProcessError`` saying that it couldn't find the system FLAC converter, even though it's installed. 
254 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 255 | 256 | Installing `FLAC for OS X `__ directly from the source code will not work, since it doesn't correctly add the executables to the search path. 257 | 258 | Installing FLAC using `Homebrew `__ ensures that the search path is correctly updated. First, ensure you have Homebrew, then run ``brew install flac`` to install the necessary files. 259 | 260 | Developing 261 | ---------- 262 | 263 | To hack on this library, first make sure you have all the requirements listed in the "Requirements" section. 264 | 265 | - Most of the library code lives in ``speech_recognition/__init__.py``. 266 | - Examples live under the ``examples/`` `directory `__, and the demo script lives in ``speech_recognition/__main__.py``. 267 | - The FLAC encoder binaries are in the ``speech_recognition/`` `directory `__. 268 | - Documentation can be found in the ``reference/`` `directory `__. 269 | - Third-party libraries, utilities, and reference material are in the ``third-party/`` `directory `__. 270 | 271 | To install/reinstall the library locally, run ``python setup.py install`` in the project root directory. 272 | 273 | Before a release, the version number is bumped in ``README.rst`` and ``speech_recognition/__init__.py``. Version tags are then created using ``git config gpg.program gpg2 && git config user.signingkey DB45F6C431DE7C2DCD99FF7904882258A4063489 && git tag -s VERSION_GOES_HERE -m "Version VERSION_GOES_HERE"``. 274 | 275 | Releases are done by running ``make-release.sh VERSION_GOES_HERE`` to build the Python source packages, sign them, and upload them to PyPI. 276 | 277 | Testing 278 | ~~~~~~~ 279 | 280 | To run all the tests: 281 | 282 | .. code:: bash 283 | 284 | python -m unittest discover --verbose 285 | 286 | Testing is also done automatically by TravisCI, upon every push. To set up the environment for offline/local Travis-like testing on a Debian-like system: 287 | 288 | .. code:: bash 289 | 290 | sudo docker run --volume "$(pwd):/speech_recognition" --interactive --tty quay.io/travisci/travis-python:latest /bin/bash 291 | su - travis && cd /speech_recognition 292 | sudo apt-get update && sudo apt-get install swig libpulse-dev 293 | pip install --user pocketsphinx monotonic && pip install --user flake8 rstcheck && pip install --user -e . 294 | python -m unittest discover --verbose # run unit tests 295 | python -m flake8 --ignore=E501,E701 speech_recognition tests examples setup.py # ignore errors for long lines and multi-statement lines 296 | python -m rstcheck README.rst reference/*.rst # ensure RST is well-formed 297 | 298 | FLAC Executables 299 | ~~~~~~~~~~~~~~~~ 300 | 301 | The included ``flac-win32`` executable is the `official FLAC 1.3.2 32-bit Windows binary `__. 302 | 303 | The included ``flac-linux-x86`` and ``flac-linux-x86_64`` executables are built from the `FLAC 1.3.2 source code `__ with `Manylinux `__ to ensure that it's compatible with a wide variety of distributions. 304 | 305 | The built FLAC executables should be bit-for-bit reproducible. To rebuild them, run the following inside the project directory on a Debian-like system: 306 | 307 | .. 
code:: bash 308 | 309 | # download and extract the FLAC source code 310 | cd third-party 311 | sudo apt-get install --yes docker.io 312 | 313 | # build FLAC inside the Manylinux i686 Docker image 314 | tar xf flac-1.3.2.tar.xz 315 | sudo docker run --tty --interactive --rm --volume "$(pwd):/root" quay.io/pypa/manylinux1_i686:latest bash 316 | cd /root/flac-1.3.2 317 | ./configure LDFLAGS=-static # compiler flags to make a static build 318 | make 319 | exit 320 | cp flac-1.3.2/src/flac/flac ../speech_recognition/flac-linux-x86 && sudo rm -rf flac-1.3.2/ 321 | 322 | # build FLAC inside the Manylinux x86_64 Docker image 323 | tar xf flac-1.3.2.tar.xz 324 | sudo docker run --tty --interactive --rm --volume "$(pwd):/root" quay.io/pypa/manylinux1_x86_64:latest bash 325 | cd /root/flac-1.3.2 326 | ./configure LDFLAGS=-static # compiler flags to make a static build 327 | make 328 | exit 329 | cp flac-1.3.2/src/flac/flac ../speech_recognition/flac-linux-x86_64 && sudo rm -r flac-1.3.2/ 330 | 331 | The included ``flac-mac`` executable is extracted from `xACT 2.39 `__, which is a frontend for FLAC 1.3.2 that conveniently includes binaries for all of its encoders. Specifically, it is a copy of ``xACT 2.39/xACT.app/Contents/Resources/flac`` in ``xACT2.39.zip``. 332 | 333 | Authors 334 | ------- 335 | 336 | :: 337 | 338 | Uberi (Anthony Zhang) 339 | bobsayshilol 340 | arvindch (Arvind Chembarpu) 341 | kevinismith (Kevin Smith) 342 | haas85 343 | DelightRun 344 | maverickagm 345 | kamushadenes (Kamus Hadenes) 346 | sbraden (Sarah Braden) 347 | tb0hdan (Bohdan Turkynewych) 348 | Thynix (Steve Dougherty) 349 | beeedy (Broderick Carlin) 350 | 351 | Please report bugs and suggestions at the `issue tracker `__! 352 | 353 | How to cite this library (APA style): 354 | 355 | Zhang, A. (2017). Speech Recognition (Version 3.8) [Software]. Available from https://github.com/Uberi/speech_recognition#readme. 356 | 357 | How to cite this library (Chicago style): 358 | 359 | Zhang, Anthony. 2017. *Speech Recognition* (version 3.8). 360 | 361 | Also check out the `Python Baidu Yuyin API `__, which is based on an older version of this project, and adds support for `Baidu Yuyin `__. Note that Baidu Yuyin is only available inside China. 362 | 363 | License 364 | ------- 365 | 366 | Copyright 2014-2017 `Anthony Zhang (Uberi) `__. 367 | The source code for this library is available online at `GitHub `__. 368 | 369 | SpeechRecognition is made available under the 3-clause BSD license. See ``LICENSE.txt`` in the project's `root directory `__ for more information. 370 | 371 | For convenience, all the official distributions of SpeechRecognition already include a copy of the necessary copyright notices and licenses. In your project, you can simply **say that licensing information for SpeechRecognition can be found within the SpeechRecognition README, and make sure SpeechRecognition is visible to users if they wish to see it**. 372 | 373 | SpeechRecognition distributes source code, binaries, and language files from `CMU Sphinx `__. These files are BSD-licensed and redistributable as long as copyright notices are correctly retained. See ``speech_recognition/pocketsphinx-data/*/LICENSE*.txt`` and ``third-party/LICENSE-Sphinx.txt`` for license details for individual parts. 374 | 375 | SpeechRecognition distributes source code and binaries from `PyAudio `__. These files are MIT-licensed and redistributable as long as copyright notices are correctly retained. See ``third-party/LICENSE-PyAudio.txt`` for license details. 
376 | 377 | SpeechRecognition distributes binaries from `FLAC `__ - ``speech_recognition/flac-win32.exe``, ``speech_recognition/flac-linux-x86``, and ``speech_recognition/flac-mac``. These files are GPLv2-licensed and redistributable, as long as the terms of the GPL are satisfied. The FLAC binaries are an `aggregate `__ of `separate programs `__, so these GPL restrictions do not apply to the library or your programs that use the library, only to FLAC itself. See ``LICENSE-FLAC.txt`` for license details. 378 | -------------------------------------------------------------------------------- /examples/audio_transcribe.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | 3 | import speech_recognition as sr 4 | 5 | # obtain path to "english.wav" in the same folder as this script 6 | from os import path 7 | AUDIO_FILE = path.join(path.dirname(path.realpath(__file__)), "english.wav") 8 | # AUDIO_FILE = path.join(path.dirname(path.realpath(__file__)), "french.aiff") 9 | # AUDIO_FILE = path.join(path.dirname(path.realpath(__file__)), "chinese.flac") 10 | 11 | # use the audio file as the audio source 12 | r = sr.Recognizer() 13 | with sr.AudioFile(AUDIO_FILE) as source: 14 | audio = r.record(source) # read the entire audio file 15 | 16 | # recognize speech using Sphinx 17 | try: 18 | print("Sphinx thinks you said " + r.recognize_sphinx(audio)) 19 | except sr.UnknownValueError: 20 | print("Sphinx could not understand audio") 21 | except sr.RequestError as e: 22 | print("Sphinx error; {0}".format(e)) 23 | 24 | # recognize speech using Google Speech Recognition 25 | try: 26 | # for testing purposes, we're just using the default API key 27 | # to use another API key, use `r.recognize_google(audio, key="GOOGLE_SPEECH_RECOGNITION_API_KEY")` 28 | # instead of `r.recognize_google(audio)` 29 | print("Google Speech Recognition thinks you said " + r.recognize_google(audio)) 30 | except sr.UnknownValueError: 31 | print("Google Speech Recognition could not understand audio") 32 | except sr.RequestError as e: 33 | print("Could not request results from Google Speech Recognition service; {0}".format(e)) 34 | 35 | # recognize speech using Google Cloud Speech 36 | GOOGLE_CLOUD_SPEECH_CREDENTIALS = r"""INSERT THE CONTENTS OF THE GOOGLE CLOUD SPEECH JSON CREDENTIALS FILE HERE""" 37 | try: 38 | print("Google Cloud Speech thinks you said " + r.recognize_google_cloud(audio, credentials_json=GOOGLE_CLOUD_SPEECH_CREDENTIALS)) 39 | except sr.UnknownValueError: 40 | print("Google Cloud Speech could not understand audio") 41 | except sr.RequestError as e: 42 | print("Could not request results from Google Cloud Speech service; {0}".format(e)) 43 | 44 | # recognize speech using Wit.ai 45 | WIT_AI_KEY = "INSERT WIT.AI API KEY HERE" # Wit.ai keys are 32-character uppercase alphanumeric strings 46 | try: 47 | print("Wit.ai thinks you said " + r.recognize_wit(audio, key=WIT_AI_KEY)) 48 | except sr.UnknownValueError: 49 | print("Wit.ai could not understand audio") 50 | except sr.RequestError as e: 51 | print("Could not request results from Wit.ai service; {0}".format(e)) 52 | 53 | # recognize speech using Microsoft Azure Speech 54 | AZURE_SPEECH_KEY = "INSERT AZURE SPEECH API KEY HERE" # Microsoft Speech API keys are 32-character lowercase hexadecimal strings 55 | try: 56 | print("Microsoft Azure Speech thinks you said " + r.recognize_azure(audio, key=AZURE_SPEECH_KEY)) 57 | except sr.UnknownValueError: 58 | print("Microsoft Azure Speech could not understand audio") 59 | 
except sr.RequestError as e: 60 | print("Could not request results from Microsoft Azure Speech service; {0}".format(e)) 61 | 62 | # recognize speech using Microsoft Bing Voice Recognition 63 | BING_KEY = "INSERT BING API KEY HERE" # Microsoft Bing Voice Recognition API keys 32-character lowercase hexadecimal strings 64 | try: 65 | print("Microsoft Bing Voice Recognition thinks you said " + r.recognize_bing(audio, key=BING_KEY)) 66 | except sr.UnknownValueError: 67 | print("Microsoft Bing Voice Recognition could not understand audio") 68 | except sr.RequestError as e: 69 | print("Could not request results from Microsoft Bing Voice Recognition service; {0}".format(e)) 70 | 71 | # recognize speech using Houndify 72 | HOUNDIFY_CLIENT_ID = "INSERT HOUNDIFY CLIENT ID HERE" # Houndify client IDs are Base64-encoded strings 73 | HOUNDIFY_CLIENT_KEY = "INSERT HOUNDIFY CLIENT KEY HERE" # Houndify client keys are Base64-encoded strings 74 | try: 75 | print("Houndify thinks you said " + r.recognize_houndify(audio, client_id=HOUNDIFY_CLIENT_ID, client_key=HOUNDIFY_CLIENT_KEY)) 76 | except sr.UnknownValueError: 77 | print("Houndify could not understand audio") 78 | except sr.RequestError as e: 79 | print("Could not request results from Houndify service; {0}".format(e)) 80 | 81 | # recognize speech using IBM Speech to Text 82 | IBM_USERNAME = "INSERT IBM SPEECH TO TEXT USERNAME HERE" # IBM Speech to Text usernames are strings of the form XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX 83 | IBM_PASSWORD = "INSERT IBM SPEECH TO TEXT PASSWORD HERE" # IBM Speech to Text passwords are mixed-case alphanumeric strings 84 | try: 85 | print("IBM Speech to Text thinks you said " + r.recognize_ibm(audio, username=IBM_USERNAME, password=IBM_PASSWORD)) 86 | except sr.UnknownValueError: 87 | print("IBM Speech to Text could not understand audio") 88 | except sr.RequestError as e: 89 | print("Could not request results from IBM Speech to Text service; {0}".format(e)) 90 | -------------------------------------------------------------------------------- /examples/background_listening.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | 3 | # NOTE: this example requires PyAudio because it uses the Microphone class 4 | 5 | import time 6 | 7 | import speech_recognition as sr 8 | 9 | 10 | # this is called from the background thread 11 | def callback(recognizer, audio): 12 | # received audio data, now we'll recognize it using Google Speech Recognition 13 | try: 14 | # for testing purposes, we're just using the default API key 15 | # to use another API key, use `r.recognize_google(audio, key="GOOGLE_SPEECH_RECOGNITION_API_KEY")` 16 | # instead of `r.recognize_google(audio)` 17 | print("Google Speech Recognition thinks you said " + recognizer.recognize_google(audio)) 18 | except sr.UnknownValueError: 19 | print("Google Speech Recognition could not understand audio") 20 | except sr.RequestError as e: 21 | print("Could not request results from Google Speech Recognition service; {0}".format(e)) 22 | 23 | 24 | r = sr.Recognizer() 25 | m = sr.Microphone() 26 | with m as source: 27 | r.adjust_for_ambient_noise(source) # we only need to calibrate once, before we start listening 28 | 29 | # start listening in the background (note that we don't have to do this inside a `with` statement) 30 | stop_listening = r.listen_in_background(m, callback) 31 | # `stop_listening` is now a function that, when called, stops background listening 32 | 33 | # do some unrelated computations for 5 seconds 34 
| for _ in range(50): time.sleep(0.1) # we're still listening even though the main thread is doing other things 35 | 36 | # calling this function requests that the background listener stop listening 37 | stop_listening(wait_for_stop=False) 38 | 39 | # do some more unrelated things 40 | while True: time.sleep(0.1) # we're not listening anymore, even though the background thread might still be running for a second or two while cleaning up and stopping 41 | -------------------------------------------------------------------------------- /examples/calibrate_energy_threshold.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | 3 | # NOTE: this example requires PyAudio because it uses the Microphone class 4 | 5 | import speech_recognition as sr 6 | 7 | # obtain audio from the microphone 8 | r = sr.Recognizer() 9 | with sr.Microphone() as source: 10 | r.adjust_for_ambient_noise(source) # listen for 1 second to calibrate the energy threshold for ambient noise levels 11 | print("Say something!") 12 | audio = r.listen(source) 13 | 14 | # recognize speech using Google Speech Recognition 15 | try: 16 | # for testing purposes, we're just using the default API key 17 | # to use another API key, use `r.recognize_google(audio, key="GOOGLE_SPEECH_RECOGNITION_API_KEY")` 18 | # instead of `r.recognize_google(audio)` 19 | print("Google Speech Recognition thinks you said " + r.recognize_google(audio)) 20 | except sr.UnknownValueError: 21 | print("Google Speech Recognition could not understand audio") 22 | except sr.RequestError as e: 23 | print("Could not request results from Google Speech Recognition service; {0}".format(e)) 24 | -------------------------------------------------------------------------------- /examples/chinese.flac: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/fossasia/speech_recognition/8ba42818419138c34669a693dea73a2b80faf447/examples/chinese.flac -------------------------------------------------------------------------------- /examples/counting.gram: -------------------------------------------------------------------------------- 1 | #JSGF V1.0; 2 | 3 | /** 4 | * JSGF Grammar for English counting example 5 | */ 6 | 7 | grammar counting; 8 | 9 | public <counting> = ( <digit> ) +; 10 | 11 | <digit> = one | two | three | four | five | six | seven ; 12 | -------------------------------------------------------------------------------- /examples/english.wav: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/fossasia/speech_recognition/8ba42818419138c34669a693dea73a2b80faf447/examples/english.wav -------------------------------------------------------------------------------- /examples/extended_results.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | 3 | from pprint import pprint 4 | 5 | import speech_recognition as sr 6 | 7 | # obtain path to "english.wav" in the same folder as this script 8 | from os import path 9 | 10 | AUDIO_FILE = path.join(path.dirname(path.realpath(__file__)), "english.wav") 11 | # AUDIO_FILE = path.join(path.dirname(path.realpath(__file__)), "french.aiff") 12 | # AUDIO_FILE = path.join(path.dirname(path.realpath(__file__)), "chinese.flac") 13 | 14 | # use the audio file as the audio source 15 | r = sr.Recognizer() 16 | with sr.AudioFile(AUDIO_FILE) as source: 17 | audio = r.record(source) # read the entire audio file 18 | 19 | # recognize
speech using Sphinx 20 | try: 21 | print("Sphinx thinks you said " + r.recognize_sphinx(audio)) 22 | except sr.UnknownValueError: 23 | print("Sphinx could not understand audio") 24 | except sr.RequestError as e: 25 | print("Sphinx error; {0}".format(e)) 26 | 27 | # recognize speech using Google Speech Recognition 28 | try: 29 | # for testing purposes, we're just using the default API key 30 | # to use another API key, use `r.recognize_google(audio, key="GOOGLE_SPEECH_RECOGNITION_API_KEY", show_all=True)` 31 | # instead of `r.recognize_google(audio, show_all=True)` 32 | print("Google Speech Recognition results:") 33 | pprint(r.recognize_google(audio, show_all=True)) # pretty-print the recognition result 34 | except sr.UnknownValueError: 35 | print("Google Speech Recognition could not understand audio") 36 | except sr.RequestError as e: 37 | print("Could not request results from Google Speech Recognition service; {0}".format(e)) 38 | 39 | # recognize speech using Google Cloud Speech 40 | GOOGLE_CLOUD_SPEECH_CREDENTIALS = r"""INSERT THE CONTENTS OF THE GOOGLE CLOUD SPEECH JSON CREDENTIALS FILE HERE""" 41 | try: 42 | print("Google Cloud Speech recognition results:") 43 | pprint(r.recognize_google_cloud(audio, credentials_json=GOOGLE_CLOUD_SPEECH_CREDENTIALS, show_all=True)) # pretty-print the recognition result 44 | except sr.UnknownValueError: 45 | print("Google Cloud Speech could not understand audio") 46 | except sr.RequestError as e: 47 | print("Could not request results from Google Cloud Speech service; {0}".format(e)) 48 | 49 | # recognize speech using Wit.ai 50 | WIT_AI_KEY = "INSERT WIT.AI API KEY HERE" # Wit.ai keys are 32-character uppercase alphanumeric strings 51 | try: 52 | print("Wit.ai recognition results:") 53 | pprint(r.recognize_wit(audio, key=WIT_AI_KEY, show_all=True)) # pretty-print the recognition result 54 | except sr.UnknownValueError: 55 | print("Wit.ai could not understand audio") 56 | except sr.RequestError as e: 57 | print("Could not request results from Wit.ai service; {0}".format(e)) 58 | 59 | # recognize speech using Microsoft Bing Voice Recognition 60 | BING_KEY = "INSERT BING API KEY HERE" # Microsoft Bing Voice Recognition API keys 32-character lowercase hexadecimal strings 61 | try: 62 | print("Bing recognition results:") 63 | pprint(r.recognize_bing(audio, key=BING_KEY, show_all=True)) 64 | except sr.UnknownValueError: 65 | print("Microsoft Bing Voice Recognition could not understand audio") 66 | except sr.RequestError as e: 67 | print("Could not request results from Microsoft Bing Voice Recognition service; {0}".format(e)) 68 | 69 | # recognize speech using Houndify 70 | HOUNDIFY_CLIENT_ID = "INSERT HOUNDIFY CLIENT ID HERE" # Houndify client IDs are Base64-encoded strings 71 | HOUNDIFY_CLIENT_KEY = "INSERT HOUNDIFY CLIENT KEY HERE" # Houndify client keys are Base64-encoded strings 72 | try: 73 | print("Houndify recognition results:") 74 | pprint(r.recognize_houndify(audio, client_id=HOUNDIFY_CLIENT_ID, client_key=HOUNDIFY_CLIENT_KEY, show_all=True)) 75 | except sr.UnknownValueError: 76 | print("Houndify could not understand audio") 77 | except sr.RequestError as e: 78 | print("Could not request results from Houndify service; {0}".format(e)) 79 | 80 | # recognize speech using IBM Speech to Text 81 | IBM_USERNAME = "INSERT IBM SPEECH TO TEXT USERNAME HERE" # IBM Speech to Text usernames are strings of the form XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX 82 | IBM_PASSWORD = "INSERT IBM SPEECH TO TEXT PASSWORD HERE" # IBM Speech to Text passwords are mixed-case 
alphanumeric strings 83 | try: 84 | print("IBM Speech to Text results:") 85 | pprint(r.recognize_ibm(audio, username=IBM_USERNAME, password=IBM_PASSWORD, show_all=True)) # pretty-print the recognition result 86 | except sr.UnknownValueError: 87 | print("IBM Speech to Text could not understand audio") 88 | except sr.RequestError as e: 89 | print("Could not request results from IBM Speech to Text service; {0}".format(e)) 90 | -------------------------------------------------------------------------------- /examples/french.aiff: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/fossasia/speech_recognition/8ba42818419138c34669a693dea73a2b80faf447/examples/french.aiff -------------------------------------------------------------------------------- /examples/microphone_recognition.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | 3 | # NOTE: this example requires PyAudio because it uses the Microphone class 4 | 5 | import speech_recognition as sr 6 | 7 | # obtain audio from the microphone 8 | r = sr.Recognizer() 9 | with sr.Microphone() as source: 10 | print("Say something!") 11 | audio = r.listen(source) 12 | 13 | # recognize speech using Sphinx 14 | try: 15 | print("Sphinx thinks you said " + r.recognize_sphinx(audio)) 16 | except sr.UnknownValueError: 17 | print("Sphinx could not understand audio") 18 | except sr.RequestError as e: 19 | print("Sphinx error; {0}".format(e)) 20 | 21 | # recognize speech using Google Speech Recognition 22 | try: 23 | # for testing purposes, we're just using the default API key 24 | # to use another API key, use `r.recognize_google(audio, key="GOOGLE_SPEECH_RECOGNITION_API_KEY")` 25 | # instead of `r.recognize_google(audio)` 26 | print("Google Speech Recognition thinks you said " + r.recognize_google(audio)) 27 | except sr.UnknownValueError: 28 | print("Google Speech Recognition could not understand audio") 29 | except sr.RequestError as e: 30 | print("Could not request results from Google Speech Recognition service; {0}".format(e)) 31 | 32 | # recognize speech using Google Cloud Speech 33 | GOOGLE_CLOUD_SPEECH_CREDENTIALS = r"""INSERT THE CONTENTS OF THE GOOGLE CLOUD SPEECH JSON CREDENTIALS FILE HERE""" 34 | try: 35 | print("Google Cloud Speech thinks you said " + r.recognize_google_cloud(audio, credentials_json=GOOGLE_CLOUD_SPEECH_CREDENTIALS)) 36 | except sr.UnknownValueError: 37 | print("Google Cloud Speech could not understand audio") 38 | except sr.RequestError as e: 39 | print("Could not request results from Google Cloud Speech service; {0}".format(e)) 40 | 41 | # recognize speech using Wit.ai 42 | WIT_AI_KEY = "INSERT WIT.AI API KEY HERE" # Wit.ai keys are 32-character uppercase alphanumeric strings 43 | try: 44 | print("Wit.ai thinks you said " + r.recognize_wit(audio, key=WIT_AI_KEY)) 45 | except sr.UnknownValueError: 46 | print("Wit.ai could not understand audio") 47 | except sr.RequestError as e: 48 | print("Could not request results from Wit.ai service; {0}".format(e)) 49 | 50 | # recognize speech using Microsoft Bing Voice Recognition 51 | BING_KEY = "INSERT BING API KEY HERE" # Microsoft Bing Voice Recognition API keys 32-character lowercase hexadecimal strings 52 | try: 53 | print("Microsoft Bing Voice Recognition thinks you said " + r.recognize_bing(audio, key=BING_KEY)) 54 | except sr.UnknownValueError: 55 | print("Microsoft Bing Voice Recognition could not understand audio") 56 | except sr.RequestError as e: 57 | 
print("Could not request results from Microsoft Bing Voice Recognition service; {0}".format(e)) 58 | 59 | # recognize speech using Microsoft Azure Speech 60 | AZURE_SPEECH_KEY = "INSERT AZURE SPEECH API KEY HERE" # Microsoft Speech API keys 32-character lowercase hexadecimal strings 61 | try: 62 | print("Microsoft Azure Speech thinks you said " + r.recognize_azure(audio, key=AZURE_SPEECH_KEY)) 63 | except sr.UnknownValueError: 64 | print("Microsoft Azure Speech could not understand audio") 65 | except sr.RequestError as e: 66 | print("Could not request results from Microsoft Azure Speech service; {0}".format(e)) 67 | 68 | # recognize speech using Houndify 69 | HOUNDIFY_CLIENT_ID = "INSERT HOUNDIFY CLIENT ID HERE" # Houndify client IDs are Base64-encoded strings 70 | HOUNDIFY_CLIENT_KEY = "INSERT HOUNDIFY CLIENT KEY HERE" # Houndify client keys are Base64-encoded strings 71 | try: 72 | print("Houndify thinks you said " + r.recognize_houndify(audio, client_id=HOUNDIFY_CLIENT_ID, client_key=HOUNDIFY_CLIENT_KEY)) 73 | except sr.UnknownValueError: 74 | print("Houndify could not understand audio") 75 | except sr.RequestError as e: 76 | print("Could not request results from Houndify service; {0}".format(e)) 77 | 78 | # recognize speech using IBM Speech to Text 79 | IBM_USERNAME = "INSERT IBM SPEECH TO TEXT USERNAME HERE" # IBM Speech to Text usernames are strings of the form XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX 80 | IBM_PASSWORD = "INSERT IBM SPEECH TO TEXT PASSWORD HERE" # IBM Speech to Text passwords are mixed-case alphanumeric strings 81 | try: 82 | print("IBM Speech to Text thinks you said " + r.recognize_ibm(audio, username=IBM_USERNAME, password=IBM_PASSWORD)) 83 | except sr.UnknownValueError: 84 | print("IBM Speech to Text could not understand audio") 85 | except sr.RequestError as e: 86 | print("Could not request results from IBM Speech to Text service; {0}".format(e)) 87 | -------------------------------------------------------------------------------- /examples/special_recognizer_features.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | 3 | import speech_recognition as sr 4 | 5 | from os import path 6 | AUDIO_FILE_EN = path.join(path.dirname(path.realpath(__file__)), "english.wav") 7 | AUDIO_FILE_FR = path.join(path.dirname(path.realpath(__file__)), "french.aiff") 8 | 9 | # use the audio file as the audio source 10 | r = sr.Recognizer() 11 | with sr.AudioFile(AUDIO_FILE_EN) as source: 12 | audio_en = r.record(source) # read the entire audio file 13 | with sr.AudioFile(AUDIO_FILE_FR) as source: 14 | audio_fr = r.record(source) # read the entire audio file 15 | 16 | # recognize keywords using Sphinx 17 | try: 18 | print("Sphinx recognition for \"one two three\" with different sets of keywords:") 19 | print(r.recognize_sphinx(audio_en, keyword_entries=[("one", 1.0), ("two", 1.0), ("three", 1.0)])) 20 | print(r.recognize_sphinx(audio_en, keyword_entries=[("wan", 0.95), ("too", 1.0), ("tree", 1.0)])) 21 | print(r.recognize_sphinx(audio_en, keyword_entries=[("un", 0.95), ("to", 1.0), ("tee", 1.0)])) 22 | except sr.UnknownValueError: 23 | print("Sphinx could not understand audio") 24 | except sr.RequestError as e: 25 | print("Sphinx error; {0}".format(e)) 26 | 27 | # grammar example using Sphinx 28 | try: 29 | print("Sphinx recognition for \"one two three\" for counting grammar:") 30 | print(r.recognize_sphinx(audio_en, grammar='counting.gram')) 31 | except sr.UnknownValueError: 32 | print("Sphinx could not understand audio") 
33 | except sr.RequestError as e: 34 | print("Sphinx error; {0}".format(e)) 35 | 36 | 37 | # recognize preferred phrases using Google Cloud Speech 38 | GOOGLE_CLOUD_SPEECH_CREDENTIALS = r"""INSERT THE CONTENTS OF THE GOOGLE CLOUD SPEECH JSON CREDENTIALS FILE HERE""" 39 | try: 40 | print("Google Cloud Speech recognition for \"numero\" with different sets of preferred phrases:") 41 | print(r.recognize_google_cloud(audio_fr, credentials_json=GOOGLE_CLOUD_SPEECH_CREDENTIALS, preferred_phrases=["noomarow"])) 42 | print(r.recognize_google_cloud(audio_fr, credentials_json=GOOGLE_CLOUD_SPEECH_CREDENTIALS, preferred_phrases=["newmarrow"])) 43 | except sr.UnknownValueError: 44 | print("Google Cloud Speech could not understand audio") 45 | except sr.RequestError as e: 46 | print("Could not request results from Google Cloud Speech service; {0}".format(e)) 47 | -------------------------------------------------------------------------------- /examples/tensorflow_commands.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | import time 3 | import speech_recognition as sr 4 | from tensorflow.contrib.framework.python.ops import audio_ops as contrib_audio # noqa 5 | 6 | # obtain audio from the microphone 7 | r = sr.Recognizer() 8 | m = sr.Microphone() 9 | 10 | with m as source: 11 | r.adjust_for_ambient_noise(source) 12 | 13 | 14 | def callback(recognizer, audio): 15 | try: 16 | # You can download the data here: http://download.tensorflow.org/models/speech_commands_v0.01.zip 17 | spoken = recognizer.recognize_tensorflow(audio, tensor_graph='speech_recognition/tensorflow-data/conv_actions_frozen.pb', tensor_label='speech_recognition/tensorflow-data/conv_actions_labels.txt') 18 | print(spoken) 19 | except sr.UnknownValueError: 20 | print("Tensorflow could not understand audio") 21 | except sr.RequestError as e: 22 | print("Could not request results from Tensorflow service; {0}".format(e)) 23 | 24 | 25 | stop_listening = r.listen_in_background(m, callback, phrase_time_limit=0.6) 26 | time.sleep(100) 27 | -------------------------------------------------------------------------------- /examples/threaded_workers.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | 3 | # NOTE: this example requires PyAudio because it uses the Microphone class 4 | 5 | from threading import Thread 6 | try: 7 | from queue import Queue # Python 3 import 8 | except ImportError: 9 | from Queue import Queue # Python 2 import 10 | 11 | import speech_recognition as sr 12 | 13 | 14 | r = sr.Recognizer() 15 | audio_queue = Queue() 16 | 17 | 18 | def recognize_worker(): 19 | # this runs in a background thread 20 | while True: 21 | audio = audio_queue.get() # retrieve the next audio processing job from the main thread 22 | if audio is None: break # stop processing if the main thread is done 23 | 24 | # received audio data, now we'll recognize it using Google Speech Recognition 25 | try: 26 | # for testing purposes, we're just using the default API key 27 | # to use another API key, use `r.recognize_google(audio, key="GOOGLE_SPEECH_RECOGNITION_API_KEY")` 28 | # instead of `r.recognize_google(audio)` 29 | print("Google Speech Recognition thinks you said " + r.recognize_google(audio)) 30 | except sr.UnknownValueError: 31 | print("Google Speech Recognition could not understand audio") 32 | except sr.RequestError as e: 33 | print("Could not request results from Google Speech Recognition service; {0}".format(e)) 34 | 35 | 
audio_queue.task_done() # mark the audio processing job as completed in the queue 36 | 37 | 38 | # start a new thread to recognize audio, while this thread focuses on listening 39 | recognize_thread = Thread(target=recognize_worker) 40 | recognize_thread.daemon = True 41 | recognize_thread.start() 42 | with sr.Microphone() as source: 43 | try: 44 | while True: # repeatedly listen for phrases and put the resulting audio on the audio processing job queue 45 | audio_queue.put(r.listen(source)) 46 | except KeyboardInterrupt: # allow Ctrl + C to shut down the program 47 | pass 48 | 49 | audio_queue.join() # block until all current audio processing jobs are done 50 | audio_queue.put(None) # tell the recognize_thread to stop 51 | recognize_thread.join() # wait for the recognize_thread to actually stop 52 | -------------------------------------------------------------------------------- /examples/write_audio.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | 3 | # NOTE: this example requires PyAudio because it uses the Microphone class 4 | 5 | import speech_recognition as sr 6 | 7 | # obtain audio from the microphone 8 | r = sr.Recognizer() 9 | with sr.Microphone() as source: 10 | print("Say something!") 11 | audio = r.listen(source) 12 | 13 | # write audio to a RAW file 14 | with open("microphone-results.raw", "wb") as f: 15 | f.write(audio.get_raw_data()) 16 | 17 | # write audio to a WAV file 18 | with open("microphone-results.wav", "wb") as f: 19 | f.write(audio.get_wav_data()) 20 | 21 | # write audio to an AIFF file 22 | with open("microphone-results.aiff", "wb") as f: 23 | f.write(audio.get_aiff_data()) 24 | 25 | # write audio to a FLAC file 26 | with open("microphone-results.flac", "wb") as f: 27 | f.write(audio.get_flac_data()) 28 | -------------------------------------------------------------------------------- /make-release.sh: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env bash 2 | 3 | # set up bash to handle errors more aggressively - a "strict mode" of sorts 4 | set -e # give an error if any command finishes with a non-zero exit code 5 | set -u # give an error if we reference unset variables 6 | set -o pipefail # for a pipeline, if any of the commands fail with a non-zero exit code, fail the entire pipeline with that exit code 7 | 8 | echo "Making release for SpeechRecognition-$1" 9 | 10 | python setup.py bdist_wheel 11 | gpg --detach-sign -a dist/SpeechRecognition-$1-*.whl 12 | twine upload dist/SpeechRecognition-$1-*.whl dist/SpeechRecognition-$1-*.whl.asc 13 | -------------------------------------------------------------------------------- /reference/library-reference.rst: -------------------------------------------------------------------------------- 1 | Speech Recognition Library Reference 2 | ==================================== 3 | 4 | ``Microphone(device_index: Union[int, None] = None, sample_rate: int = 16000, chunk_size: int = 1024) -> Microphone`` 5 | --------------------------------------------------------------------------------------------------------------------- 6 | 7 | Creates a new ``Microphone`` instance, which represents a physical microphone on the computer. Subclass of ``AudioSource``. 8 | 9 | This will throw an ``AttributeError`` if you don't have PyAudio 0.2.11 or later installed. 10 | 11 | If ``device_index`` is unspecified or ``None``, the default microphone is used as the audio source. 
Otherwise, ``device_index`` should be the index of the device to use for audio input. 12 | 13 | A device index is an integer between 0 and ``pyaudio.get_device_count() - 1`` inclusive (assuming ``pyaudio`` has been imported). It represents an audio device such as a microphone or speaker. See the `PyAudio documentation `__ for more details. 14 | 15 | The microphone audio is recorded in chunks of ``chunk_size`` samples, at a rate of ``sample_rate`` samples per second (Hertz). 16 | 17 | Higher ``sample_rate`` values result in better audio quality, but also more bandwidth (and therefore, slower recognition). Additionally, some machines, such as some Raspberry Pi models, can't keep up if this value is too high. 18 | 19 | Higher ``chunk_size`` values help avoid triggering on rapidly changing ambient noise, but also make detection less sensitive. Generally, this value should be left at its default. 20 | 21 | Instances of this class are context managers, and are designed to be used with ``with`` statements: 22 | 23 | .. code:: python 24 | 25 | with Microphone() as source: # open the microphone and start recording 26 | pass # do things here - ``source`` is the Microphone instance created above 27 | # the microphone is automatically released at this point 28 | 29 | ``Microphone.list_microphone_names() -> List[str]`` 30 | --------------------------------------------------- 31 | 32 | Returns a list of the names of all available microphones. For microphones where the name can't be retrieved, the list entry contains ``None`` instead. 33 | 34 | The index of each microphone's name in the returned list is the same as its device index when creating a ``Microphone`` instance - if you want to use the microphone at index 3 in the returned list, use ``Microphone(device_index=3)``. 35 | 36 | To create a ``Microphone`` instance by name: 37 | 38 | .. code:: python 39 | 40 | m = None 41 | for i, microphone_name in enumerate(Microphone.list_microphone_names()): 42 | if microphone_name == "HDA Intel HDMI: 0 (hw:0,3)": 43 | m = Microphone(device_index=i) 44 | 45 | ``Microphone.list_working_microphones() -> Dict[int, str]`` 46 | ----------------------------------------------------------- 47 | 48 | Returns a dictionary mapping device indices to microphone names, for microphones that are currently hearing sounds. When using this function, ensure that your microphone is unmuted and make some noise at it to ensure it will be detected as working. 49 | 50 | Each key in the returned dictionary can be passed to the ``Microphone`` constructor to use that microphone. For example, if the return value is ``{3: "HDA Intel PCH: ALC3232 Analog (hw:1,0)"}``, you can do ``Microphone(device_index=3)`` to use that microphone. 51 | 52 | To create a ``Microphone`` instance for the first working microphone: 53 | 54 | .. code:: python 55 | 56 | for device_index in Microphone.list_working_microphones(): 57 | m = Microphone(device_index=device_index) 58 | break 59 | else: 60 | print("No working microphones found!") 61 | 62 | ``AudioFile(filename_or_fileobject: Union[str, io.IOBase]) -> AudioFile`` 63 | ------------------------------------------------------------------------- 64 | 65 | Creates a new ``AudioFile`` instance given a WAV/AIFF/FLAC audio file ``filename_or_fileobject``. Subclass of ``AudioSource``. 66 | 67 | If ``filename_or_fileobject`` is a string, then it is interpreted as a path to an audio file on the filesystem. Otherwise, ``filename_or_fileobject`` should be a file-like object such as ``io.BytesIO`` or similar.
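For example, a minimal sketch of the file-like-object case, reading a WAV file into memory first (the path ``"SOME_AUDIO_FILE.wav"`` is a placeholder):

.. code:: python

    import io
    import speech_recognition as sr

    r = sr.Recognizer()
    with open("SOME_AUDIO_FILE.wav", "rb") as f:  # placeholder path; any supported audio file works
        wrapped_bytes = io.BytesIO(f.read())  # wrap the raw bytes in a file-like object
    with sr.AudioFile(wrapped_bytes) as source:
        audio = r.record(source)  # read the entire audio file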
68 | 69 | Note that functions that read from the audio (such as ``recognizer_instance.record`` or ``recognizer_instance.listen``) will move ahead in the stream. For example, if you execute ``recognizer_instance.record(audiofile_instance, duration=10)`` twice, the first time it will return the first 10 seconds of audio, and the second time it will return the 10 seconds of audio right after that. This is always reset when entering the context with a context manager. 70 | 71 | WAV files must be in PCM/LPCM format; WAVE_FORMAT_EXTENSIBLE and compressed WAV are not supported and may result in undefined behaviour. 72 | 73 | Both AIFF and AIFF-C (compressed AIFF) formats are supported. 74 | 75 | FLAC files must be in native FLAC format; OGG-FLAC is not supported and may result in undefined behaviour. 76 | 77 | Instances of this class are context managers, and are designed to be used with ``with`` statements: 78 | 79 | .. code:: python 80 | 81 | import speech_recognition as sr 82 | with sr.AudioFile("SOME_AUDIO_FILE") as source: # open the audio file for reading 83 | pass # do things here - ``source`` is the AudioFile instance created above 84 | 85 | ``audiofile_instance.DURATION # type: float`` 86 | ---------------------------------------------- 87 | 88 | Represents the length of the audio stored in the audio file in seconds. This property is only available when inside a context - essentially, that means it should only be accessed inside the body of a ``with audiofile_instance ...`` statement. Outside of contexts, this property is ``None``. 89 | 90 | This is useful when combined with the ``offset`` parameter of ``recognizer_instance.record``, since when together it is possible to perform speech recognition in chunks. 91 | 92 | However, note that recognizing speech in multiple chunks is not the same as recognizing the whole thing at once. If spoken words appear on the boundaries that we split the audio into chunks on, each chunk only gets part of the word, which may result in inaccurate results. 93 | 94 | ``Recognizer() -> Recognizer`` 95 | ------------------------------ 96 | 97 | Creates a new ``Recognizer`` instance, which represents a collection of speech recognition settings and functionality. 98 | 99 | ``recognizer_instance.energy_threshold = 300 # type: float`` 100 | ------------------------------------------------------------- 101 | 102 | Represents the energy level threshold for sounds. Values below this threshold are considered silence, and values above this threshold are considered speech. Can be changed. 103 | 104 | This is adjusted automatically if dynamic thresholds are enabled (see ``recognizer_instance.dynamic_energy_threshold``). A good starting value will generally allow the automatic adjustment to reach a good value faster. 105 | 106 | This threshold is associated with the perceived loudness of the sound, but it is a nonlinear relationship. The actual energy threshold you will need depends on your microphone sensitivity or audio data. Typical values for a silent room are 0 to 100, and typical values for speaking are between 150 and 3500. Ambient (non-speaking) noise has a significant impact on what values will work best. 107 | 108 | If you're having trouble with the recognizer trying to recognize words even when you're not speaking, try tweaking this to a higher value. If you're having trouble with the recognizer not recognizing your words when you are speaking, try tweaking this to a lower value. 
For example, a sensitive microphone or microphones in louder rooms might have an ambient energy level of up to 4000: 109 | 110 | .. code:: python 111 | 112 | import speech_recognition as sr 113 | r = sr.Recognizer() 114 | r.energy_threshold = 4000 115 | # rest of your code goes here 116 | 117 | The dynamic energy threshold setting can mitigate this by automatically increasing or decreasing the threshold to account for ambient noise. However, this takes time to adjust, so it is still possible to get false positive detections before the threshold settles into a good value. 118 | 119 | To avoid this, use ``recognizer_instance.adjust_for_ambient_noise(source, duration = 1)`` to calibrate the level to a good value. Alternatively, simply set this property to a high value initially (4000 works well), so the threshold is always above ambient noise levels: over time, it will be automatically decreased to account for them. 120 | 121 | ``recognizer_instance.dynamic_energy_threshold = True # type: bool`` 122 | --------------------------------------------------------------------- 123 | 124 | Represents whether the energy level threshold (see ``recognizer_instance.energy_threshold``) for sounds should be automatically adjusted based on the current ambient noise level while listening. Can be changed. 125 | 126 | Recommended for situations where the ambient noise level is unpredictable, which seems to be the majority of use cases. If the ambient noise level is strictly controlled, better results might be achieved by setting this to ``False`` to turn it off. 127 | 128 | ``recognizer_instance.dynamic_energy_adjustment_damping = 0.15 # type: float`` 129 | ------------------------------------------------------------------------------- 130 | 131 | If the dynamic energy threshold setting is enabled (see ``recognizer_instance.dynamic_energy_threshold``), represents approximately the fraction of the current energy threshold that is retained after one second of dynamic threshold adjustment. Can be changed (not recommended). 132 | 133 | Lower values allow for faster adjustment, but also make it more likely to miss certain phrases (especially those with slowly changing volume). This value should be between 0 and 1. As this value approaches 1, dynamic adjustment has less of an effect over time. When this value is 1, dynamic adjustment has no effect. 134 | 135 | ``recognizer_instance.dynamic_energy_adjustment_ratio = 1.5 # type: float`` 136 | ---------------------------------------------------------------------------- 137 | 138 | If the dynamic energy threshold setting is enabled (see ``recognizer_instance.dynamic_energy_threshold``), represents the minimum factor by which speech is louder than ambient noise. Can be changed (not recommended). 139 | 140 | For example, the default value of 1.5 means that speech is at least 1.5 times louder than ambient noise. Smaller values result in more false positives (but fewer false negatives) when ambient noise is loud compared to speech. 141 | 142 | ``recognizer_instance.pause_threshold = 0.8 # type: float`` 143 | ------------------------------------------------------------ 144 | 145 | Represents the minimum length of silence (in seconds) that will register as the end of a phrase. Can be changed. 146 | 147 | Smaller values result in the recognition completing more quickly, but might result in slower speakers being cut off.
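For example, a minimal sketch of relaxing it for slower speakers (the value 1.2 is only an illustrative choice):

.. code:: python

    import speech_recognition as sr
    r = sr.Recognizer()
    r.pause_threshold = 1.2  # treat 1.2 seconds of silence as the end of a phrase (default is 0.8)
    # rest of your code goes here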
148 | 149 | ``recognizer_instance.operation_timeout = None # type: Union[float, None]`` 150 | ---------------------------------------------------------------------------- 151 | 152 | Represents the timeout (in seconds) for internal operations, such as API requests. Can be changed. 153 | 154 | Setting this to a reasonable value ensures that these operations will never block indefinitely, though good values depend on your network speed and the expected length of the audio to recognize. 155 | 156 | ``recognizer_instance.record(source: AudioSource, duration: Union[float, None] = None, offset: Union[float, None] = None) -> AudioData`` 157 | ---------------------------------------------------------------------------------------------------------------------------------------- 158 | 159 | Records up to ``duration`` seconds of audio from ``source`` (an ``AudioSource`` instance) starting at ``offset`` (or at the beginning if not specified) into an ``AudioData`` instance, which it returns. 160 | 161 | If ``duration`` is not specified, then it will record until there is no more audio input. 162 | 163 | ``recognizer_instance.adjust_for_ambient_noise(source: AudioSource, duration: float = 1) -> None`` 164 | -------------------------------------------------------------------------------------------------- 165 | 166 | Adjusts the energy threshold dynamically using audio from ``source`` (an ``AudioSource`` instance) to account for ambient noise. 167 | 168 | Intended to calibrate the energy threshold with the ambient energy level. Should be used on periods of audio without speech - will stop early if any speech is detected. 169 | 170 | The ``duration`` parameter is the maximum number of seconds that it will dynamically adjust the threshold for before returning. This value should be at least 0.5 in order to get a representative sample of the ambient noise. 171 | 172 | ``recognizer_instance.listen(source: AudioSource, timeout: Union[float, None] = None, phrase_time_limit: Union[float, None] = None, snowboy_configuration: Union[Tuple[str, Iterable[str]], None] = None) -> AudioData`` 173 | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ 174 | 175 | Records a single phrase from ``source`` (an ``AudioSource`` instance) into an ``AudioData`` instance, which it returns. 176 | 177 | This is done by waiting until the audio has an energy above ``recognizer_instance.energy_threshold`` (the user has started speaking), and then recording until it encounters ``recognizer_instance.pause_threshold`` seconds of non-speaking or there is no more audio input. The ending silence is not included. 178 | 179 | The ``timeout`` parameter is the maximum number of seconds that this will wait for a phrase to start before giving up and throwing a ``speech_recognition.WaitTimeoutError`` exception. If ``timeout`` is ``None``, there will be no wait timeout. 180 | 181 | The ``phrase_time_limit`` parameter is the maximum number of seconds that this will allow a phrase to continue before stopping and returning the part of the phrase processed before the time limit was reached. The resulting audio will be the phrase cut off at the time limit. If ``phrase_time_limit`` is ``None``, there will be no phrase time limit. 182 | 183 | The ``snowboy_configuration`` parameter allows integration with `Snowboy `__, an offline, high-accuracy, power-efficient hotword recognition engine.
When used, this function will pause until Snowboy detects a hotword, after which it will unpause. This parameter should either be ``None`` to turn off Snowboy support, or a tuple of the form ``(SNOWBOY_LOCATION, LIST_OF_HOT_WORD_FILES)``, where ``SNOWBOY_LOCATION`` is the path to the Snowboy root directory, and ``LIST_OF_HOT_WORD_FILES`` is a list of paths to Snowboy hotword configuration files (``*.pmdl`` or ``*.umdl`` format). 184 | 185 | This operation will always complete within ``timeout + phrase_time_limit`` seconds if both are numbers, either by returning the audio data, or by raising a ``speech_recognition.WaitTimeoutError`` exception. 186 | 187 | ``recognizer_instance.listen_in_background(source: AudioSource, callback: Callable[[Recognizer, AudioData], Any], phrase_time_limit: Union[float, None] = None) -> Callable[[bool], None]`` 188 | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 189 | 190 | Spawns a thread to repeatedly record phrases from ``source`` (an ``AudioSource`` instance) into an ``AudioData`` instance and call ``callback`` with that ``AudioData`` instance as soon as each phrase is detected. 191 | 192 | Returns a function object that, when called, requests that the background listener thread stop. The background thread is a daemon and will not stop the program from exiting if there are no other non-daemon threads. The function accepts one parameter, ``wait_for_stop``: if truthy, the function will wait for the background listener to stop before returning, otherwise it will return immediately and the background listener thread might still be running for a second or two afterwards. Additionally, if you are using a truthy value for ``wait_for_stop``, you must call the function from the same thread you originally called ``listen_in_background`` from. 193 | 194 | Phrase recognition uses the exact same mechanism as ``recognizer_instance.listen(source)``. The ``phrase_time_limit`` parameter works in the same way as the ``phrase_time_limit`` parameter for ``recognizer_instance.listen(source)``, as well. 195 | 196 | The ``callback`` parameter is a function that should accept two parameters - the ``recognizer_instance``, and an ``AudioData`` instance representing the captured audio. Note that the ``callback`` function will be called from a non-main thread. 197 | 198 | ``recognizer_instance.recognize_sphinx(audio_data: AudioData, language: str = "en-US", keyword_entries: Union[Iterable[Tuple[str, float]], None] = None, grammar: Union[str, None] = None, show_all: bool = False) -> Union[str, pocketsphinx.pocketsphinx.Decoder]`` 199 | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 200 | 201 | Performs speech recognition on ``audio_data`` (an ``AudioData`` instance), using CMU Sphinx. 202 | 203 | The recognition language is determined by ``language``, an RFC5646 language tag like ``"en-US"`` or ``"en-GB"``, defaulting to US English. Out of the box, only ``en-US`` is supported. See `Notes on using PocketSphinx `__ for information about installing other languages. This document is also included under ``reference/pocketsphinx.rst``.
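For example, assuming the French language pack has been installed as described in that document, a minimal sketch of selecting it (``french.aiff`` is a placeholder for any French recording, such as the one in the ``examples`` directory):

.. code:: python

    import speech_recognition as sr

    r = sr.Recognizer()
    with sr.AudioFile("french.aiff") as source:  # placeholder French audio file
        audio = r.record(source)
    print(r.recognize_sphinx(audio, language="fr-FR"))  # requires the fr-FR language pack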
The ``language`` parameter can also be a tuple of filesystem paths, of the form ``(acoustic_parameters_directory, language_model_file, phoneme_dictionary_file)`` - this allows you to load arbitrary Sphinx models. 204 | 205 | If specified, the keywords to search for are determined by ``keyword_entries``, an iterable of tuples of the form ``(keyword, sensitivity)``, where ``keyword`` is a phrase, and ``sensitivity`` is how sensitive to this phrase the recognizer should be, on a scale of 0 (very insensitive, more false negatives) to 1 (very sensitive, more false positives) inclusive. If not specified or ``None``, no keywords are used and Sphinx will simply transcribe whatever words it recognizes. Specifying ``keyword_entries`` is more accurate than just looking for those same keywords in non-keyword-based transcriptions, because Sphinx knows specifically what sounds to look for. 206 | 207 | Sphinx can also handle FSG or JSGF grammars. The ``grammar`` parameter expects a path to the grammar file. Note that if a JSGF grammar is passed, an FSG grammar will be created at the same location to speed up execution in the next run. If ``keyword_entries`` is passed, the content of ``grammar`` will be ignored. 208 | 209 | Returns the most likely transcription if ``show_all`` is false (the default). Otherwise, returns the Sphinx ``pocketsphinx.pocketsphinx.Decoder`` object resulting from the recognition. 210 | 211 | Raises a ``speech_recognition.UnknownValueError`` exception if the speech is unintelligible. Raises a ``speech_recognition.RequestError`` exception if there are any issues with the Sphinx installation. 212 | 213 | ``recognizer_instance.recognize_google(audio_data: AudioData, key: Union[str, None] = None, language: str = "en-US", pfilter: Union[0, 1], show_all: bool = False) -> Union[str, Dict[str, Any]]`` 214 | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 215 | 216 | Performs speech recognition on ``audio_data`` (an ``AudioData`` instance), using the Google Speech Recognition API. 217 | 218 | The Google Speech Recognition API key is specified by ``key``. If not specified, it uses a generic key that works out of the box. This should generally be used for personal or testing purposes only, as it **may be revoked by Google at any time**. 219 | 220 | To obtain your own API key, simply follow the steps on the `API Keys `__ page at the Chromium Developers site. In the Google Developers Console, Google Speech Recognition is listed as "Speech API". Note that **the API quota for your own keys is 50 requests per day**, and there is currently no way to raise this limit. 221 | 222 | The recognition language is determined by ``language``, an IETF language tag like ``"en-US"`` or ``"en-GB"``, defaulting to US English. A list of supported language tags can be found `here `__. Basically, language codes can be just the language (``en``), or a language with a dialect (``en-US``). 223 | 224 | The profanity filter level can be adjusted with ``pfilter``: 0 - No filter, 1 - Only shows the first character and replaces the rest with asterisks. The default is level 0. 225 | 226 | Returns the most likely transcription if ``show_all`` is false (the default). Otherwise, returns the raw API response as a JSON dictionary. 227 | 228 | Raises a ``speech_recognition.UnknownValueError`` exception if the speech is unintelligible.
Raises a ``speech_recognition.RequestError`` exception if the speech recognition operation failed, if the key isn't valid, or if there is no internet connection. 229 | 230 | ``recognizer_instance.recognize_google_cloud(audio_data: AudioData, credentials_json: Union[str, None] = None, language: str = "en-US", preferred_phrases: Union[Iterable[str], None] = None, show_all: bool = False) -> Union[str, Dict[str, Any]]`` 231 | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 232 | 233 | Performs speech recognition on ``audio_data`` (an ``AudioData`` instance), using the Google Cloud Speech API. 234 | 235 | This function requires a Google Cloud Platform account; see the `Google Cloud Speech API Quickstart `__ for details and instructions. Basically, create a project, enable billing for the project, enable the Google Cloud Speech API for the project, and set up Service Account Key credentials for the project. The result is a JSON file containing the API credentials. The text content of this JSON file is specified by ``credentials_json``. If not specified, the library will try to automatically `find the default API credentials JSON file `__. 236 | 237 | The recognition language is determined by ``language``, which is a BCP-47 language tag like ``"en-US"`` (US English). A list of supported language tags can be found in the `Google Cloud Speech API documentation `__. 238 | 239 | If ``preferred_phrases`` is an iterable of phrase strings, those given phrases will be more likely to be recognized over similar-sounding alternatives. This is useful for things like keyword/command recognition or adding new phrases that aren't in Google's vocabulary. Note that the API imposes certain `restrictions on the list of phrase strings `__. 240 | 241 | Returns the most likely transcription if ``show_all`` is False (the default). Otherwise, returns the raw API response as a JSON dictionary. 242 | 243 | Raises a ``speech_recognition.UnknownValueError`` exception if the speech is unintelligible. Raises a ``speech_recognition.RequestError`` exception if the speech recognition operation failed, if the credentials aren't valid, or if there is no Internet connection. 244 | 245 | ``recognizer_instance.recognize_wit(audio_data: AudioData, key: str, show_all: bool = False) -> Union[str, Dict[str, Any]]`` 246 | ---------------------------------------------------------------------------------------------------------------------------- 247 | 248 | Performs speech recognition on ``audio_data`` (an ``AudioData`` instance), using the Wit.ai API. 249 | 250 | The Wit.ai API key is specified by ``key``. Unfortunately, these are not available without `signing up for an account `__ and creating an app. You will need to add at least one intent to the app before you can see the API key, though the actual intent settings don't matter. 251 | 252 | To get the API key for a Wit.ai app, go to the app's overview page, go to the section titled "Make an API request", and look for something along the lines of ``Authorization: Bearer XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX``; ``XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX`` is the API key. Wit.ai API keys are 32-character uppercase alphanumeric strings. 253 | 254 | The recognition language is configured in the Wit.ai app settings. 255 | 256 | Returns the most likely transcription if ``show_all`` is false (the default). 
Otherwise, returns the `raw API response `__ as a JSON dictionary. 257 | 258 | Raises a ``speech_recognition.UnknownValueError`` exception if the speech is unintelligible. Raises a ``speech_recognition.RequestError`` exception if the speech recognition operation failed, if the key isn't valid, or if there is no internet connection. 259 | 260 | ``recognizer_instance.recognize_bing(audio_data: AudioData, key: str, language: str = "en-US", show_all: bool = False) -> Union[str, Dict[str, Any]]`` 261 | ------------------------------------------------------------------------------------------------------------------------------------------------------ 262 | 263 | Performs speech recognition on ``audio_data`` (an ``AudioData`` instance), using the Microsoft Bing Speech API. 264 | 265 | The Microsoft Bing Speech API key is specified by ``key``. Unfortunately, these are not available without `signing up for an account `__ with Microsoft Azure. 266 | 267 | To get the API key, go to the `Microsoft Azure Portal Resources `__ page, go to "All Resources" > "Add" > "See All" > Search "Bing Speech API" > "Create", and fill in the form to make a "Bing Speech API" resource. On the resulting page (which is also accessible from the "All Resources" page in the Azure Portal), go to the "Show Access Keys" page, which will have two API keys, either of which can be used for the ``key`` parameter. Microsoft Bing Speech API keys are 32-character lowercase hexadecimal strings. 268 | 269 | The recognition language is determined by ``language``, a BCP-47 language tag like ``"en-US"`` (US English) or ``"fr-FR"`` (International French), defaulting to US English. A list of supported language values can be found in the `API documentation `__ under "Interactive and dictation mode". 270 | 271 | Returns the most likely transcription if ``show_all`` is false (the default). Otherwise, returns the `raw API response `__ as a JSON dictionary. 272 | 273 | Raises a ``speech_recognition.UnknownValueError`` exception if the speech is unintelligible. Raises a ``speech_recognition.RequestError`` exception if the speech recognition operation failed, if the key isn't valid, or if there is no internet connection. 274 | 275 | ``recognizer_instance.recognize_houndify(audio_data: AudioData, client_id: str, client_key: str, show_all: bool = False) -> Union[str, Dict[str, Any]]`` 276 | -------------------------------------------------------------------------------------------------------------------------------------------------------- 277 | 278 | Performs speech recognition on ``audio_data`` (an ``AudioData`` instance), using the Houndify API. 279 | 280 | The Houndify client ID and client key are specified by ``client_id`` and ``client_key``, respectively. Unfortunately, these are not available without `signing up for an account `__. Once logged into the `dashboard `__, you will want to select "Register a new client", and fill in the form as necessary. When at the "Enable Domains" page, enable the "Speech To Text Only" domain, and then select "Save & Continue". 281 | 282 | To get the client ID and client key for a Houndify client, go to the `dashboard `__ and select the client's "View Details" link. On the resulting page, the client ID and client key will be visible. Client IDs and client keys are both Base64-encoded strings. 283 | 284 | Currently, only English is supported as a recognition language. 285 | 286 | Returns the most likely transcription if ``show_all`` is false (the default).
Otherwise, returns the raw API response as a JSON dictionary. 287 | 288 | Raises a ``speech_recognition.UnknownValueError`` exception if the speech is unintelligible. Raises a ``speech_recognition.RequestError`` exception if the speech recognition operation failed, if the key isn't valid, or if there is no internet connection. 289 | 290 | ``recognizer_instance.recognize_ibm(audio_data: AudioData, username: str, password: str, language: str = "en-US", show_all: bool = False) -> Union[str, Dict[str, Any]]`` 291 | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 292 | 293 | Performs speech recognition on ``audio_data`` (an ``AudioData`` instance), using the IBM Speech to Text API. 294 | 295 | The IBM Speech to Text username and password are specified by ``username`` and ``password``, respectively. Unfortunately, these are not available without `signing up for an account `__. Once logged into the Bluemix console, follow the instructions for `creating an IBM Watson service instance `__, where the Watson service is "Speech To Text". IBM Speech to Text usernames are strings of the form XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX, while passwords are mixed-case alphanumeric strings. 296 | 297 | The recognition language is determined by ``language``, an RFC5646 language tag with a dialect like ``"en-US"`` (US English) or ``"zh-CN"`` (Mandarin Chinese), defaulting to US English. The supported language values are listed under the ``model`` parameter of the `audio recognition API documentation `__, in the form ``LANGUAGE_BroadbandModel``, where ``LANGUAGE`` is the language value. 298 | 299 | Returns the most likely transcription if ``show_all`` is false (the default). Otherwise, returns the `raw API response `__ as a JSON dictionary. 300 | 301 | Raises a ``speech_recognition.UnknownValueError`` exception if the speech is unintelligible. Raises a ``speech_recognition.RequestError`` exception if the speech recognition operation failed, if the key isn't valid, or if there is no internet connection. 302 | 303 | ``AudioSource`` 304 | --------------- 305 | 306 | Base class representing audio sources. Do not instantiate. 307 | 308 | Instances of subclasses of this class, such as ``Microphone`` and ``AudioFile``, can be passed to things like ``recognizer_instance.record`` and ``recognizer_instance.listen``. Those instances act like context managers, and are designed to be used with ``with`` statements. 309 | 310 | For more information, see the documentation for the individual subclasses. 311 | 312 | ``AudioData(frame_data: bytes, sample_rate: int, sample_width: int) -> AudioData`` 313 | ---------------------------------------------------------------------------------- 314 | 315 | Creates a new ``AudioData`` instance, which represents mono audio data. 316 | 317 | The raw audio data is specified by ``frame_data``, which is a sequence of bytes representing audio samples. This is the frame data structure used by the PCM WAV format. 318 | 319 | The width of each sample, in bytes, is specified by ``sample_width``. Each group of ``sample_width`` bytes represents a single audio sample. 320 | 321 | The audio data is assumed to have a sample rate of ``sample_rate`` samples per second (Hertz). 
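For illustration, a minimal sketch of constructing an instance by hand from raw PCM bytes - here, one second of 16-bit silence at 16000 samples per second (the values are only illustrative):

.. code:: python

    import speech_recognition as sr

    frame_data = b"\x00\x00" * 16000  # 16000 samples of 2 bytes each - one second of silence
    audio = sr.AudioData(frame_data, sample_rate=16000, sample_width=2)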
322 | 323 | Usually, instances of this class are obtained from ``recognizer_instance.record`` or ``recognizer_instance.listen``, or in the callback for ``recognizer_instance.listen_in_background``, rather than instantiating them directly. 324 | 325 | ``audiodata_instance.get_segment(start_ms: Union[float, None] = None, end_ms: Union[float, None] = None) -> AudioData`` 326 | ----------------------------------------------------------------------------------------------------------------------- 327 | 328 | Returns a new ``AudioData`` instance, trimmed to a given time interval. In other words, an ``AudioData`` instance with the same audio data except starting at ``start_ms`` milliseconds in and ending ``end_ms`` milliseconds in. 329 | 330 | If not specified, ``start_ms`` defaults to the beginning of the audio, and ``end_ms`` defaults to the end. 331 | 332 | ``audiodata_instance.get_raw_data(convert_rate: Union[int, None] = None, convert_width: Union[int, None] = None) -> bytes`` 333 | --------------------------------------------------------------------------------------------------------------------------- 334 | 335 | Returns a byte string representing the raw frame data for the audio represented by the ``AudioData`` instance. 336 | 337 | If ``convert_rate`` is specified and the audio sample rate is not ``convert_rate`` Hz, the resulting audio is resampled to match. 338 | 339 | If ``convert_width`` is specified and the audio samples are not ``convert_width`` bytes each, the resulting audio is converted to match. 340 | 341 | Writing these bytes directly to a file results in a valid `RAW/PCM audio file `__. 342 | 343 | ``audiodata_instance.get_wav_data(convert_rate: Union[int, None] = None, convert_width: Union[int, None] = None) -> bytes`` 344 | --------------------------------------------------------------------------------------------------------------------------- 345 | 346 | Returns a byte string representing the contents of a WAV file containing the audio represented by the ``AudioData`` instance. 347 | 348 | If ``convert_width`` is specified and the audio samples are not ``convert_width`` bytes each, the resulting audio is converted to match. 349 | 350 | If ``convert_rate`` is specified and the audio sample rate is not ``convert_rate`` Hz, the resulting audio is resampled to match. 351 | 352 | Writing these bytes directly to a file results in a valid `WAV file `__. 353 | 354 | ``audiodata_instance.get_aiff_data(convert_rate: Union[int, None] = None, convert_width: Union[int, None] = None) -> bytes`` 355 | ---------------------------------------------------------------------------------------------------------------------------- 356 | 357 | Returns a byte string representing the contents of an AIFF-C file containing the audio represented by the ``AudioData`` instance. 358 | 359 | If ``convert_width`` is specified and the audio samples are not ``convert_width`` bytes each, the resulting audio is converted to match. 360 | 361 | If ``convert_rate`` is specified and the audio sample rate is not ``convert_rate`` Hz, the resulting audio is resampled to match. 362 | 363 | Writing these bytes directly to a file results in a valid `AIFF-C file `__. 
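As a combined usage sketch (the trim interval and conversion values are only illustrative), trimming an ``AudioData`` instance with ``get_segment`` and writing the result out as a downsampled WAV file:

.. code:: python

    # assumes ``audio`` is an AudioData instance obtained from ``recognizer_instance.record`` or similar
    first_five_seconds = audio.get_segment(start_ms=0, end_ms=5000)  # keep only the first 5 seconds
    with open("trimmed.wav", "wb") as f:
        f.write(first_five_seconds.get_wav_data(convert_rate=8000, convert_width=2))  # 8 kHz, 16-bit WAV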
364 | 365 | ``audiodata_instance.get_flac_data(convert_rate: Union[int, None] = None, convert_width: Union[int, None] = None) -> bytes`` 366 | ---------------------------------------------------------------------------------------------------------------------------- 367 | 368 | Returns a byte string representing the contents of a FLAC file containing the audio represented by the ``AudioData`` instance. 369 | 370 | Note that 32-bit FLAC is not supported. If the audio data is 32-bit and ``convert_width`` is not specified, then the resulting FLAC will be a 24-bit FLAC. 371 | 372 | If ``convert_rate`` is specified and the audio sample rate is not ``convert_rate`` Hz, the resulting audio is resampled to match. 373 | 374 | If ``convert_width`` is specified and the audio samples are not ``convert_width`` bytes each, the resulting audio is converted to match. 375 | 376 | Writing these bytes directly to a file results in a valid `FLAC file `__. 377 | -------------------------------------------------------------------------------- /reference/pocketsphinx.rst: -------------------------------------------------------------------------------- 1 | Notes on using PocketSphinx 2 | =========================== 3 | 4 | Installing other languages 5 | -------------------------- 6 | 7 | By default, SpeechRecognition's Sphinx functionality supports only US English. Additional language packs are available, but are not included because the files are too large: 8 | 9 | * `International French `__ 10 | * `Mandarin Chinese `__ 11 | * `Italian `__ 12 | 13 | To install a language pack, download the ZIP archives and extract them directly into the module install directory (you can find the module install directory by running ``python -c "import speech_recognition as sr, os.path as p; print(p.dirname(sr.__file__))"``). 14 | 15 | Here is a simple Bash script to install all of them, assuming you've downloaded all three ZIP files into your current directory: 16 | 17 | .. code:: bash 18 | 19 | #!/usr/bin/env bash 20 | SR_LIB=$(python -c "import speech_recognition as sr, os.path as p; print(p.dirname(sr.__file__))") 21 | sudo apt-get install --yes unzip 22 | sudo unzip -o fr-FR.zip -d "$SR_LIB" 23 | sudo chmod --recursive a+r "$SR_LIB/fr-FR/" 24 | sudo unzip -o zh-CN.zip -d "$SR_LIB" 25 | sudo chmod --recursive a+r "$SR_LIB/zh-CN/" 26 | sudo unzip -o it-IT.zip -d "$SR_LIB" 27 | sudo chmod --recursive a+r "$SR_LIB/it-IT/" 28 | 29 | Once installed, you can simply specify the language using the ``language`` parameter of ``recognizer_instance.recognize_sphinx``. For example, French would be specified with ``"fr-FR"`` and Mandarin with ``"zh-CN"``. 30 | 31 | Building PocketSphinx-Python from source 32 | ---------------------------------------- 33 | 34 | For Windows, it is recommended to install the precompiled Wheel packages in the ``third-party`` directory. These are provided because building PocketSphinx on Windows requires a lot of work, and downloading and installing all of the supporting software can take hours. 35 | 36 | For Linux and other POSIX systems (like OS X), you'll want to build from source. It should take less than two minutes on a fast machine. 37 | 38 | * On any Debian-derived Linux distribution (like Ubuntu and Mint): 39 | 1. Run ``sudo apt-get install python python-all-dev python-pip build-essential swig git libpulse-dev libasound2-dev`` for Python 2, or ``sudo apt-get install python3 python3-all-dev python3-pip build-essential swig git libpulse-dev libasound2-dev`` for Python 3. 40 | 2.
Run ``pip install pocketsphinx`` for Python 2, or ``pip3 install pocketsphinx`` for Python 3. 41 | * On OS X: 42 | 1. Run ``brew install swig git python`` for Python 2, or ``brew install swig git python3`` for Python 3. 43 | 2. Install PocketSphinx-Python using Pip: ``pip install pocketsphinx``. 44 | * If this gives errors when importing the library in your program, try running ``brew link --overwrite python``. 45 | * On other POSIX-based systems: 46 | 1. Install `Python `__, `Pip `__, `SWIG `__, and `Git `__, preferably using a package manager. 47 | 2. Install PocketSphinx-Python using Pip: ``pip install pocketsphinx``. 48 | * On Windows: 49 | 1. Install `Python `__, `Pip `__, `SWIG `__, and `Git `__, preferably using a package manager. 50 | 2. Install the necessary `compiler suite `__ (`here's a PDF version `__ in case the link goes down) for compiling modules for your particular Python version: 51 | * `Microsoft Visual C++ Compiler for Python 2.7 `__ for Python 2.7. 52 | * `Visual Studio 2015 Community Edition `__ for Python 3.5. 53 | * The installation process for Python 3.4 is outlined in the article above. 54 | 3. Add the folders containing the Python, SWIG, and Git binaries to your ``PATH`` environment variable. 55 | * My ``PATH`` environment variable looks something like: ``C:\Users\Anthony\Desktop\swigwin-3.0.8;C:\Program Files\Git\cmd;(A BUNCH OF OTHER PATHS)``. 56 | 4. Reboot to apply changes. 57 | 5. Download the full PocketSphinx-Python source code by running ``git clone --recursive --depth 1 https://github.com/cmusphinx/pocketsphinx-python`` (downloading the ZIP archive from GitHub will not work). 58 | 6. Run ``python setup.py install`` in the PocketSphinx-Python source code folder to compile and install PocketSphinx. 59 | 7. Side note: when I build the precompiled Wheel packages, I skip steps 5 and 6 and do the following instead: 60 | * For Python 2.7: ``C:\Python27\python.exe setup.py bdist_wheel``. 61 | * For Python 3.4: ``C:\Python34\python.exe setup.py bdist_wheel``. 62 | * For Python 3.5: ``C:\Users\Anthony\AppData\Local\Programs\Python\Python35\python.exe setup.py bdist_wheel``. 63 | * The resulting packages are located in the ``dist`` folder of the PocketSphinx-Python project directory. 64 | 65 | Notes on the structure of the language data 66 | ------------------------------------------- 67 | 68 | * Every language has its own folder under ``/speech_recognition/pocketsphinx-data/LANGUAGE_NAME/``, where ``LANGUAGE_NAME`` is the IETF language tag, like ``"en-US"`` (US English) or ``"en-GB"`` (UK English). 69 | * For example, the US English data is stored in ``/speech_recognition/pocketsphinx-data/en-US/``. 70 | * The ``language`` parameter of ``recognizer_instance.recognize_sphinx`` simply chooses the folder with the given name. 71 | * Languages are composed of three parts: 72 | * An acoustic model ``/speech_recognition/pocketsphinx-data/LANGUAGE_NAME/acoustic-model/``, which describes how to interpret audio data. 73 | * Acoustic models can be downloaded from the `CMU Sphinx files `__. These are pretty disorganized, but instructions for cleaning up specific versions are listed below. 74 | * All of these should be 16 kHz (broadband) models, since that's what the library will assume is being used. 75 | * A language model ``/speech_recognition/pocketsphinx-data/LANGUAGE_NAME/language-model.lm.bin`` (in `CMU binary format `__).
76 | * A pronunciation dictionary ``/speech_recognition/pocketsphinx-data/LANGUAGE_NAME/pronounciation-dictionary.dict``, which describes how words in the language are pronounced. 77 | 78 | Notes on building the language data from source 79 | ----------------------------------------------- 80 | 81 | * All of the following points assume a Debian-derived Linux distribution (like Ubuntu or Mint). 82 | * To work with any complete, real-world language, you will need quite a bit of RAM (16 GB recommended) and a fair bit of disk space (20 GB recommended). 83 | * `SphinxBase `__ is needed for all language model file format conversions. We use it to convert between ``*.dmp`` DMP files (an obsolete Sphinx binary format), ``*.lm`` ARPA files, and Sphinx binary ``*.lm.bin`` files: 84 | * Install all the SphinxBase build dependencies with ``sudo apt-get install build-essential automake autotools-dev autoconf libtool``. 85 | * Download and extract the `SphinxBase source code `__. 86 | * Follow the instructions in the README to install SphinxBase. Basically, run ``sh autogen.sh --force && ./configure && make && sudo make install`` in the SphinxBase folder. 87 | * Pruning (getting rid of less important information) is useful if language model files are too large. We can do this using `IRSTLM `__: 88 | * Install all the IRSTLM build dependencies with ``sudo apt-get install build-essential automake autotools-dev autoconf libtool``. 89 | * Download and extract the `IRSTLM source code `__. 90 | * Follow the instructions in the README to install IRSTLM. Basically, run ``sh regenerate-makefiles.sh --force && ./configure && make && sudo make install`` in the IRSTLM folder. 91 | * If the language model is not in ARPA format, convert it to ARPA format. To do this, ensure that SphinxBase is installed and run ``sphinx_lm_convert -i LANGUAGE_MODEL_FILE_GOES_HERE -o language-model.lm -ofmt arpa``. 92 | * Prune the model using IRSTLM: run ``prune-lm --threshold=1e-8 language-model.lm pruned.lm`` to prune with a threshold of 0.00000001. The higher the threshold, the smaller the resulting file. 93 | * Convert the model back into binary format if it was not originally in ARPA format. To do this, ensure that SphinxBase is installed and run ``sphinx_lm_convert -i pruned.lm -o LANGUAGE_MODEL_FILE_GOES_HERE`` (a Python sketch automating this convert/prune/convert pipeline appears below, after the French notes). 94 | * US English: ``/speech_recognition/pocketsphinx-data/en-US/`` is taken directly from the contents of `PocketSphinx's US English model `__. 95 | * International French: ``/speech_recognition/pocketsphinx-data/fr-FR/``: 96 | * ``/speech_recognition/pocketsphinx-data/fr-FR/language-model.lm.bin`` is ``fr-small.lm.bin`` from the `Sphinx French language model `__. 97 | * ``/speech_recognition/pocketsphinx-data/fr-FR/pronounciation-dictionary.dict`` is ``fr.dict`` from the `Sphinx French language model `__. 98 | * ``/speech_recognition/pocketsphinx-data/fr-FR/acoustic-model/`` contains all of the files extracted from ``cmusphinx-fr-5.2.tar.gz`` in the `Sphinx French acoustic model `__. 99 | * To get better French recognition accuracy at the expense of higher disk space and RAM usage: 100 | 1. Download ``fr.lm.gmp`` from the `Sphinx French language model `__. 101 | 2. Convert from DMP (an obsolete Sphinx binary format) to the current Sphinx binary format: ``sphinx_lm_convert -i fr.lm.gmp -o french.lm.bin``. 102 | 3. Replace ``/speech_recognition/pocketsphinx-data/fr-FR/language-model.lm.bin`` with ``french.lm.bin`` created in the previous step.
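For convenience, here is a rough Python sketch of the convert/prune/convert pipeline described above, driven through ``subprocess``. It assumes ``sphinx_lm_convert`` and ``prune-lm`` are already installed as described earlier; the function name and intermediate file names are placeholders, not part of the library:

.. code:: python

    import subprocess

    def rebuild_language_model(input_path, output_path, threshold="1e-8"):
        # convert the original model (DMP or another Sphinx binary format) to ARPA format
        subprocess.check_call(["sphinx_lm_convert", "-i", input_path, "-o", "model.lm", "-ofmt", "arpa"])
        # prune less important n-grams (a higher threshold gives a smaller file)
        subprocess.check_call(["prune-lm", "--threshold=" + threshold, "model.lm", "pruned.lm"])
        # convert the pruned ARPA model back into Sphinx binary format
        subprocess.check_call(["sphinx_lm_convert", "-i", "pruned.lm", "-o", output_path])

    # for example, roughly the Mandarin Chinese steps described below:
    rebuild_language_model("zh_broadcastnews_64000_utf8.DMP", "language-model.lm.bin", threshold="4e-8")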
103 | * Mandarin Chinese: ``/speech_recognition/pocketsphinx-data/zh-CN/``: 104 | * ``/speech_recognition/pocketsphinx-data/zh-CN/language-model.lm.bin`` is generated as follows: 105 | 1. Download ``zh_broadcastnews_64000_utf8.DMP`` from the `Sphinx Mandarin language model `__. 106 | 2. Convert from DMP (an obsolete Sphinx binary format) to ARPA format: ``sphinx_lm_convert -i zh_broadcastnews_64000_utf8.DMP -o chinese.lm -ofmt arpa``. 107 | 3. Prune with a threshold of 0.00000004 using ``prune-lm --threshold=4e-8 chinese.lm chinese.lm``. 108 | 4. Convert from ARPA format to Sphinx binary format: ``sphinx_lm_convert -i chinese.lm -o chinese.lm.bin``. 109 | 5. Replace ``/speech_recognition/pocketsphinx-data/zh-CN/language-model.lm.bin`` with ``chinese.lm.bin`` created in the previous step. 110 | * ``/speech_recognition/pocketsphinx-data/zh-CN/pronounciation-dictionary.dict`` is ``zh_broadcastnews_utf8.dic`` from the `Sphinx Mandarin language model `__. 111 | * ``/speech_recognition/pocketsphinx-data/zh-CN/acoustic-model/`` contains all of the files extracted from ``zh_broadcastnews_16k_ptm256_8000.tar.bz2`` in the `Sphinx Mandarin acoustic model `__. 112 | * To get better Chinese recognition accuracy at the expense of higher disk space and RAM usage, simply skip step 3 when preparing ``zh_broadcastnews_64000_utf8.DMP``. 113 | * Italian: ``/speech_recognition/pocketsphinx-data/it-IT/``: 114 | * ``/speech_recognition/pocketsphinx-data/it-IT/language-model.lm.bin`` is generated as follows: 115 | 1. Download ``cmusphinx-it-5.2.tar.gz`` from the `Sphinx Italian language model `__. 116 | 2. Extract ``/etc/voxforge_it_sphinx.lm`` from ``cmusphinx-it-5.2.tar.gz`` as ``italian.lm``. 117 | 3. Convert from ARPA format to Sphinx binary format: ``sphinx_lm_convert -i italian.lm -o italian.lm.bin``. 118 | 4. Replace ``/speech_recognition/pocketsphinx-data/it-IT/language-model.lm.bin`` with ``italian.lm.bin`` created in the previous step. 119 | * ``/speech_recognition/pocketsphinx-data/it-IT/pronounciation-dictionary.dict`` is ``/etc/voxforge_it_sphinx.dic`` from ``cmusphinx-it-5.2.tar.gz`` (from the `Sphinx Italian language model `__). 120 | * ``/speech_recognition/pocketsphinx-data/it-IT/acoustic-model/`` contains all of the files in ``/model_parameters`` extracted from ``cmusphinx-it-5.2.tar.gz`` (from the `Sphinx Italian language model `__). 121 | -------------------------------------------------------------------------------- /setup.cfg: -------------------------------------------------------------------------------- 1 | [bdist_wheel] 2 | # the `universal` setting means that the project runs unmodified on both Python 2 and 3, 3 | # and doesn't use any C extensions to Python 4 | universal=1 5 | -------------------------------------------------------------------------------- /setup.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | 3 | import sys 4 | import os 5 | import stat 6 | 7 | from setuptools import setup 8 | from setuptools.command.install import install 9 | from distutils import log 10 | 11 | import speech_recognition 12 | 13 | if sys.version_info < (2, 6): 14 | print("THIS MODULE REQUIRES PYTHON 2.6, 2.7, OR 3.3+.
YOU ARE CURRENTLY USING PYTHON {0}".format(sys.version)) 15 | sys.exit(1) 16 | 17 | 18 | FILES_TO_MARK_EXECUTABLE = ["flac-linux-x86", "flac-linux-x86_64", "flac-mac", "flac-win32.exe"] 19 | 20 | 21 | class InstallWithExtraSteps(install): 22 | def run(self): 23 | install.run(self) # do the original install steps 24 | 25 | # mark the FLAC executables as executable by all users (this fixes occasional issues when file permissions get messed up) 26 | for output_path in self.get_outputs(): 27 | if os.path.basename(output_path) in FILES_TO_MARK_EXECUTABLE: 28 | log.info("setting executable permissions on {}".format(output_path)) 29 | stat_info = os.stat(output_path) 30 | os.chmod( 31 | output_path, 32 | stat_info.st_mode | 33 | stat.S_IRUSR | stat.S_IXUSR | # owner can read/execute 34 | stat.S_IRGRP | stat.S_IXGRP | # group can read/execute 35 | stat.S_IROTH | stat.S_IXOTH # everyone else can read/execute 36 | ) 37 | 38 | 39 | setup( 40 | name="SpeechRecognition", 41 | version=speech_recognition.__version__, 42 | packages=["speech_recognition"], 43 | include_package_data=True, 44 | cmdclass={"install": InstallWithExtraSteps}, 45 | 46 | # PyPI metadata 47 | author=speech_recognition.__author__, 48 | author_email="azhang9@gmail.com", 49 | description=speech_recognition.__doc__, 50 | long_description=open("README.rst").read(), 51 | license=speech_recognition.__license__, 52 | keywords="speech recognition voice sphinx google wit bing api houndify ibm snowboy", 53 | url="https://github.com/Uberi/speech_recognition#readme", 54 | classifiers=[ 55 | "Development Status :: 5 - Production/Stable", 56 | "Intended Audience :: Developers", 57 | "Natural Language :: English", 58 | "License :: OSI Approved :: BSD License", 59 | "Operating System :: Microsoft :: Windows", 60 | "Operating System :: POSIX :: Linux", 61 | "Operating System :: MacOS :: MacOS X", 62 | "Operating System :: Other OS", 63 | "Programming Language :: Python", 64 | "Programming Language :: Python :: 2", 65 | "Programming Language :: Python :: 2.7", 66 | "Programming Language :: Python :: 3", 67 | "Programming Language :: Python :: 3.3", 68 | "Programming Language :: Python :: 3.4", 69 | "Programming Language :: Python :: 3.5", 70 | "Programming Language :: Python :: 3.6", 71 | "Topic :: Software Development :: Libraries :: Python Modules", 72 | "Topic :: Multimedia :: Sound/Audio :: Speech", 73 | ], 74 | ) 75 | -------------------------------------------------------------------------------- /speech_recognition/__main__.py: -------------------------------------------------------------------------------- 1 | import speech_recognition as sr 2 | 3 | r = sr.Recognizer() 4 | m = sr.Microphone() 5 | 6 | try: 7 | print("A moment of silence, please...") 8 | with m as source: r.adjust_for_ambient_noise(source) 9 | print("Set minimum energy threshold to {}".format(r.energy_threshold)) 10 | while True: 11 | print("Say something!") 12 | with m as source: audio = r.listen(source) 13 | print("Got it! Now to recognize it...") 14 | try: 15 | # recognize speech using Google Speech Recognition 16 | value = r.recognize_google(audio) 17 | 18 | # we need some special handling here to correctly print unicode characters to standard output 19 | if str is bytes: # this version of Python uses bytes for strings (Python 2) 20 | print(u"You said {}".format(value).encode("utf-8")) 21 | else: # this version of Python uses unicode for strings (Python 3+) 22 | print("You said {}".format(value)) 23 | except sr.UnknownValueError: 24 | print("Oops! 
Didn't catch that") 25 | except sr.RequestError as e: 26 | print("Uh oh! Couldn't request results from Google Speech Recognition service; {0}".format(e)) 27 | except KeyboardInterrupt: 28 | pass 29 | -------------------------------------------------------------------------------- /speech_recognition/deepspeech-data/README: -------------------------------------------------------------------------------- 1 | Directory for deepspeech data 2 | 3 | Get the following DeepSpeech model files 4 | https://github.com/mozilla/DeepSpeech/releases/download/v0.7.0/deepspeech-0.7.0-models.pbmm 5 | https://github.com/mozilla/DeepSpeech/releases/download/v0.7.0/deepspeech-0.7.0-models.tflite 6 | https://github.com/mozilla/DeepSpeech/releases/download/v0.7.0/deepspeech-0.7.0-models.scorer 7 | and put them into the 8 | en-US 9 | directory. 10 | 11 | Further languages can be added to respective directories named after language codes. 12 | 13 | -------------------------------------------------------------------------------- /speech_recognition/flac-linux-x86: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/fossasia/speech_recognition/8ba42818419138c34669a693dea73a2b80faf447/speech_recognition/flac-linux-x86 -------------------------------------------------------------------------------- /speech_recognition/flac-linux-x86_64: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/fossasia/speech_recognition/8ba42818419138c34669a693dea73a2b80faf447/speech_recognition/flac-linux-x86_64 -------------------------------------------------------------------------------- /speech_recognition/flac-mac: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/fossasia/speech_recognition/8ba42818419138c34669a693dea73a2b80faf447/speech_recognition/flac-mac -------------------------------------------------------------------------------- /speech_recognition/flac-win32.exe: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/fossasia/speech_recognition/8ba42818419138c34669a693dea73a2b80faf447/speech_recognition/flac-win32.exe -------------------------------------------------------------------------------- /speech_recognition/pocketsphinx-data/en-US/LICENSE.txt: -------------------------------------------------------------------------------- 1 | Copyright (c) 1999-2015 Carnegie Mellon University. All rights 2 | reserved. 3 | 4 | Redistribution and use in source and binary forms, with or without 5 | modification, are permitted provided that the following conditions 6 | are met: 7 | 8 | 1. Redistributions of source code must retain the above copyright 9 | notice, this list of conditions and the following disclaimer. 10 | 11 | 2. Redistributions in binary form must reproduce the above copyright 12 | notice, this list of conditions and the following disclaimer in 13 | the documentation and/or other materials provided with the 14 | distribution. 15 | 16 | This work was supported in part by funding from the Defense Advanced 17 | Research Projects Agency and the National Science Foundation of the 18 | United States of America, and the CMU Sphinx Speech Consortium.
19 | 20 | THIS SOFTWARE IS PROVIDED BY CARNEGIE MELLON UNIVERSITY ``AS IS'' AND 21 | ANY EXPRESSED OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, 22 | THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR 23 | PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL CARNEGIE MELLON UNIVERSITY 24 | NOR ITS EMPLOYEES BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, 25 | SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT 26 | LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, 27 | DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY 28 | THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT 29 | (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE 30 | OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 31 | -------------------------------------------------------------------------------- /speech_recognition/pocketsphinx-data/en-US/acoustic-model/README: -------------------------------------------------------------------------------- 1 | /* ==================================================================== 2 | * Copyright (c) 2015 Alpha Cephei Inc. All rights 3 | * reserved. 4 | * 5 | * Redistribution and use in source and binary forms, with or without 6 | * modification, are permitted provided that the following conditions 7 | * are met: 8 | * 9 | * 1. Redistributions of source code must retain the above copyright 10 | * notice, this list of conditions and the following disclaimer. 11 | * 12 | * 2. Redistributions in binary form must reproduce the above copyright 13 | * notice, this list of conditions and the following disclaimer in 14 | * the documentation and/or other materials provided with the 15 | * distribution. 16 | * 17 | * THIS SOFTWARE IS PROVIDED BY ALPHA CEPHEI INC. ``AS IS'' AND. 18 | * ANY EXPRESSED OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO,. 19 | * THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR 20 | * PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL ALPHA CEPHEI INC. 21 | * NOR ITS EMPLOYEES BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, 22 | * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT. 23 | * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,. 24 | * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY. 25 | * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT. 26 | * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE. 27 | * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 28 | * 29 | * ==================================================================== 30 | * 31 | */ 32 | 33 | This directory contains generic US english acoustic model trained with 34 | latest sphinxtrain. 
35 | -------------------------------------------------------------------------------- /speech_recognition/pocketsphinx-data/en-US/acoustic-model/feat.params: -------------------------------------------------------------------------------- 1 | -lowerf 130 2 | -upperf 6800 3 | -nfilt 25 4 | -transform dct 5 | -lifter 22 6 | -feat 1s_c_d_dd 7 | -svspec 0-12/13-25/26-38 8 | -agc none 9 | -cmn current 10 | -varnorm no 11 | -model ptm 12 | -cmninit 40,3,-1 13 | -------------------------------------------------------------------------------- /speech_recognition/pocketsphinx-data/en-US/acoustic-model/mdef: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/fossasia/speech_recognition/8ba42818419138c34669a693dea73a2b80faf447/speech_recognition/pocketsphinx-data/en-US/acoustic-model/mdef -------------------------------------------------------------------------------- /speech_recognition/pocketsphinx-data/en-US/acoustic-model/means: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/fossasia/speech_recognition/8ba42818419138c34669a693dea73a2b80faf447/speech_recognition/pocketsphinx-data/en-US/acoustic-model/means -------------------------------------------------------------------------------- /speech_recognition/pocketsphinx-data/en-US/acoustic-model/noisedict: -------------------------------------------------------------------------------- 1 | SIL 2 | SIL 3 | SIL 4 | [NOISE] +NSN+ 5 | [SPEECH] +SPN+ 6 | -------------------------------------------------------------------------------- /speech_recognition/pocketsphinx-data/en-US/acoustic-model/sendump: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/fossasia/speech_recognition/8ba42818419138c34669a693dea73a2b80faf447/speech_recognition/pocketsphinx-data/en-US/acoustic-model/sendump -------------------------------------------------------------------------------- /speech_recognition/pocketsphinx-data/en-US/acoustic-model/transition_matrices: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/fossasia/speech_recognition/8ba42818419138c34669a693dea73a2b80faf447/speech_recognition/pocketsphinx-data/en-US/acoustic-model/transition_matrices -------------------------------------------------------------------------------- /speech_recognition/pocketsphinx-data/en-US/acoustic-model/variances: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/fossasia/speech_recognition/8ba42818419138c34669a693dea73a2b80faf447/speech_recognition/pocketsphinx-data/en-US/acoustic-model/variances -------------------------------------------------------------------------------- /speech_recognition/pocketsphinx-data/en-US/language-model.lm.bin: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/fossasia/speech_recognition/8ba42818419138c34669a693dea73a2b80faf447/speech_recognition/pocketsphinx-data/en-US/language-model.lm.bin -------------------------------------------------------------------------------- /tests/__init__.py: -------------------------------------------------------------------------------- 1 | # placeholder file to make this folder a module - this allows tests in this folder to be discovered by `python -m unittest discover` 2 | 
-------------------------------------------------------------------------------- /tests/audio-mono-16-bit-44100Hz.aiff: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/fossasia/speech_recognition/8ba42818419138c34669a693dea73a2b80faf447/tests/audio-mono-16-bit-44100Hz.aiff -------------------------------------------------------------------------------- /tests/audio-mono-16-bit-44100Hz.flac: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/fossasia/speech_recognition/8ba42818419138c34669a693dea73a2b80faf447/tests/audio-mono-16-bit-44100Hz.flac -------------------------------------------------------------------------------- /tests/audio-mono-16-bit-44100Hz.wav: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/fossasia/speech_recognition/8ba42818419138c34669a693dea73a2b80faf447/tests/audio-mono-16-bit-44100Hz.wav -------------------------------------------------------------------------------- /tests/audio-mono-24-bit-44100Hz.flac: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/fossasia/speech_recognition/8ba42818419138c34669a693dea73a2b80faf447/tests/audio-mono-24-bit-44100Hz.flac -------------------------------------------------------------------------------- /tests/audio-mono-24-bit-44100Hz.wav: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/fossasia/speech_recognition/8ba42818419138c34669a693dea73a2b80faf447/tests/audio-mono-24-bit-44100Hz.wav -------------------------------------------------------------------------------- /tests/audio-mono-32-bit-44100Hz.wav: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/fossasia/speech_recognition/8ba42818419138c34669a693dea73a2b80faf447/tests/audio-mono-32-bit-44100Hz.wav -------------------------------------------------------------------------------- /tests/audio-mono-8-bit-44100Hz.wav: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/fossasia/speech_recognition/8ba42818419138c34669a693dea73a2b80faf447/tests/audio-mono-8-bit-44100Hz.wav -------------------------------------------------------------------------------- /tests/audio-stereo-16-bit-44100Hz.aiff: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/fossasia/speech_recognition/8ba42818419138c34669a693dea73a2b80faf447/tests/audio-stereo-16-bit-44100Hz.aiff -------------------------------------------------------------------------------- /tests/audio-stereo-16-bit-44100Hz.flac: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/fossasia/speech_recognition/8ba42818419138c34669a693dea73a2b80faf447/tests/audio-stereo-16-bit-44100Hz.flac -------------------------------------------------------------------------------- /tests/audio-stereo-16-bit-44100Hz.wav: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/fossasia/speech_recognition/8ba42818419138c34669a693dea73a2b80faf447/tests/audio-stereo-16-bit-44100Hz.wav -------------------------------------------------------------------------------- /tests/audio-stereo-24-bit-44100Hz.flac: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/fossasia/speech_recognition/8ba42818419138c34669a693dea73a2b80faf447/tests/audio-stereo-24-bit-44100Hz.flac -------------------------------------------------------------------------------- /tests/audio-stereo-24-bit-44100Hz.wav: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/fossasia/speech_recognition/8ba42818419138c34669a693dea73a2b80faf447/tests/audio-stereo-24-bit-44100Hz.wav -------------------------------------------------------------------------------- /tests/audio-stereo-32-bit-44100Hz.wav: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/fossasia/speech_recognition/8ba42818419138c34669a693dea73a2b80faf447/tests/audio-stereo-32-bit-44100Hz.wav -------------------------------------------------------------------------------- /tests/audio-stereo-8-bit-44100Hz.wav: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/fossasia/speech_recognition/8ba42818419138c34669a693dea73a2b80faf447/tests/audio-stereo-8-bit-44100Hz.wav -------------------------------------------------------------------------------- /tests/chinese.flac: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/fossasia/speech_recognition/8ba42818419138c34669a693dea73a2b80faf447/tests/chinese.flac -------------------------------------------------------------------------------- /tests/english.wav: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/fossasia/speech_recognition/8ba42818419138c34669a693dea73a2b80faf447/tests/english.wav -------------------------------------------------------------------------------- /tests/french.aiff: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/fossasia/speech_recognition/8ba42818419138c34669a693dea73a2b80faf447/tests/french.aiff -------------------------------------------------------------------------------- /tests/test_audio.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | 3 | import unittest 4 | from os import path 5 | 6 | import speech_recognition as sr 7 | 8 | 9 | class TestAudioFile(unittest.TestCase): 10 | def assertSimilar(self, bytes_1, bytes_2): 11 | for i, (byte_1, byte_2) in enumerate(zip(bytes_1, bytes_2)): 12 | if str is bytes: byte_1, byte_2 = ord(byte_1), ord(byte_2) # Python 2 compatibility - get the bytes as integer values 13 | if abs(byte_1 - byte_2) > 2: 14 | raise AssertionError("{} is really different from {} at index {}".format(bytes_1, bytes_2, i)) 15 | 16 | def test_get_segment(self): 17 | r = sr.Recognizer() 18 | with sr.AudioFile(path.join(path.dirname(path.realpath(__file__)), "audio-mono-32-bit-44100Hz.wav")) as source: audio = r.record(source) 19 | self.assertEqual(audio.get_raw_data(), audio.get_segment().get_raw_data()) 20 | self.assertEqual(audio.get_raw_data()[8:], audio.get_segment(0.022675738 * 2).get_raw_data()) 21 | self.assertEqual(audio.get_raw_data()[:16], audio.get_segment(None, 0.022675738 * 4).get_raw_data()) 22 | self.assertEqual(audio.get_raw_data()[8:16], audio.get_segment(0.022675738 * 2, 0.022675738 * 4).get_raw_data()) 23 | 24 | def test_wav_mono_8_bit(self): 25 | r = 
sr.Recognizer() 26 | with sr.AudioFile(path.join(path.dirname(path.realpath(__file__)), "audio-mono-8-bit-44100Hz.wav")) as source: audio = r.record(source) 27 | self.assertIsInstance(audio, sr.AudioData) 28 | self.assertEqual(audio.sample_rate, 44100) 29 | self.assertEqual(audio.sample_width, 1) 30 | self.assertSimilar(audio.get_raw_data()[:32], b"\x00\xff\x00\xff\x00\xff\xff\x00\xff\x00\xff\x00\xff\x00\x00\xff\x00\x00\xff\x00\xff\x00\xff\x00\xff\x00\xff\x00\xff\x00\xff\xff") 31 | 32 | def test_wav_mono_16_bit(self): 33 | r = sr.Recognizer() 34 | with sr.AudioFile(path.join(path.dirname(path.realpath(__file__)), "audio-mono-16-bit-44100Hz.wav")) as source: audio = r.record(source) 35 | self.assertIsInstance(audio, sr.AudioData) 36 | self.assertEqual(audio.sample_rate, 44100) 37 | self.assertEqual(audio.sample_width, 2) 38 | self.assertSimilar(audio.get_raw_data()[:32], b"\x00\x00\xff\xff\x01\x00\xff\xff\x00\x00\x01\x00\xfe\xff\x01\x00\xfe\xff\x04\x00\xfc\xff\x04\x00\xfe\xff\xff\xff\x03\x00\xfe\xff") 39 | 40 | def test_wav_mono_24_bit(self): 41 | r = sr.Recognizer() 42 | with sr.AudioFile(path.join(path.dirname(path.realpath(__file__)), "audio-mono-24-bit-44100Hz.wav")) as source: audio = r.record(source) 43 | self.assertIsInstance(audio, sr.AudioData) 44 | self.assertEqual(audio.sample_rate, 44100) 45 | if audio.sample_width == 3: 46 | self.assertSimilar(audio.get_raw_data()[:32], b"\x00\x00\x00\x00\xff\xff\x00\x01\x00\x00\xff\xff\x00\x00\x00\x00\x01\x00\x00\xfe\xff\x00\x01\x00\x00\xfe\xff\x00\x04\x00\x00\xfb") 47 | else: 48 | self.assertSimilar(audio.get_raw_data()[:32], b"\x00\x00\x00\x00\x00\x00\xff\xff\x00\x00\x01\x00\x00\x00\xff\xff\x00\x00\x00\x00\x00\x00\x01\x00\x00\x00\xfe\xff\x00\x00\x01\x00") 49 | 50 | def test_wav_mono_32_bit(self): 51 | r = sr.Recognizer() 52 | audio_file_path = path.join(path.dirname(path.realpath(__file__)), "audio-mono-32-bit-44100Hz.wav") 53 | with sr.AudioFile(audio_file_path) as source: audio = r.record(source) 54 | self.assertIsInstance(audio, sr.AudioData) 55 | self.assertEqual(audio.sample_rate, 44100) 56 | self.assertEqual(audio.sample_width, 4) 57 | self.assertSimilar(audio.get_raw_data()[:32], b"\x00\x00\x00\x00\x00\x00\xff\xff\x00\x00\x01\x00\x00\x00\xff\xff\x00\x00\x00\x00\x00\x00\x01\x00\x00\x00\xfe\xff\x00\x00\x01\x00") 58 | 59 | def test_wav_stereo_8_bit(self): 60 | r = sr.Recognizer() 61 | with sr.AudioFile(path.join(path.dirname(path.realpath(__file__)), "audio-stereo-8-bit-44100Hz.wav")) as source: audio = r.record(source) 62 | self.assertIsInstance(audio, sr.AudioData) 63 | self.assertEqual(audio.sample_rate, 44100) 64 | self.assertEqual(audio.sample_width, 1) 65 | self.assertSimilar(audio.get_raw_data()[:32], b"\x00\xff\x00\xff\x00\x00\xff\x7f\x7f\x00\xff\x00\xff\x00\x00\xff\x00\x7f\x7f\x7f\x00\x00\xff\x00\xff\x00\xff\x00\x7f\x7f\x7f\x7f") 66 | 67 | def test_wav_stereo_16_bit(self): 68 | r = sr.Recognizer() 69 | with sr.AudioFile(path.join(path.dirname(path.realpath(__file__)), "audio-stereo-16-bit-44100Hz.wav")) as source: audio = r.record(source) 70 | self.assertIsInstance(audio, sr.AudioData) 71 | self.assertEqual(audio.sample_rate, 44100) 72 | self.assertEqual(audio.sample_width, 2) 73 | self.assertSimilar(audio.get_raw_data()[:32], b"\x02\x00\xfb\xff\x04\x00\xfe\xff\xfe\xff\x07\x00\xf6\xff\x07\x00\xf9\xff\t\x00\xf5\xff\x0c\x00\xf8\xff\x02\x00\x04\x00\xfa\xff") 74 | 75 | def test_wav_stereo_24_bit(self): 76 | r = sr.Recognizer() 77 | with sr.AudioFile(path.join(path.dirname(path.realpath(__file__)), "audio-stereo-24-bit-44100Hz.wav")) 
as source: audio = r.record(source) 78 | self.assertIsInstance(audio, sr.AudioData) 79 | self.assertEqual(audio.sample_rate, 44100) 80 | if audio.sample_width == 3: 81 | self.assertSimilar(audio.get_raw_data()[:32], b"\x00\x00\x00\x00\xfe\xff\x00\x02\x00\x00\xfe\xff\x00\x00\x00\x00\x02\x00\x00\xfc\xff\x00\x02\x00\x00\xfc\xff\x00\x08\x00\x00\xf6") 82 | else: 83 | self.assertSimilar(audio.get_raw_data()[:32], b"\x00\x00\x00\x00\x00\x00\xfe\xff\x00\x00\x02\x00\x00\x00\xfe\xff\x00\x00\x00\x00\x00\x00\x02\x00\x00\x00\xfc\xff\x00\x00\x02\x00") 84 | 85 | def test_wav_stereo_32_bit(self): 86 | r = sr.Recognizer() 87 | with sr.AudioFile(path.join(path.dirname(path.realpath(__file__)), "audio-stereo-32-bit-44100Hz.wav")) as source: audio = r.record(source) 88 | self.assertIsInstance(audio, sr.AudioData) 89 | self.assertEqual(audio.sample_rate, 44100) 90 | self.assertEqual(audio.sample_width, 4) 91 | self.assertSimilar(audio.get_raw_data()[:32], b"\x00\x00\x00\x00\x00\x00\xfe\xff\x00\x00\x02\x00\x00\x00\xfe\xff\x00\x00\x00\x00\x00\x00\x02\x00\x00\x00\xfc\xff\x00\x00\x02\x00") 92 | 93 | def test_aiff_mono_16_bit(self): 94 | r = sr.Recognizer() 95 | with sr.AudioFile(path.join(path.dirname(path.realpath(__file__)), "audio-mono-16-bit-44100Hz.aiff")) as source: audio = r.record(source) 96 | self.assertIsInstance(audio, sr.AudioData) 97 | self.assertEqual(audio.sample_rate, 44100) 98 | self.assertEqual(audio.sample_width, 2) 99 | self.assertSimilar(audio.get_raw_data()[:32], b"\x00\x00\x00\x00\xff\xff\x01\x00\xff\xff\x01\x00\xfe\xff\x02\x00\xfd\xff\x04\x00\xfc\xff\x03\x00\x00\x00\xfe\xff\x03\x00\xfd\xff") 100 | 101 | def test_aiff_stereo_16_bit(self): 102 | r = sr.Recognizer() 103 | with sr.AudioFile(path.join(path.dirname(path.realpath(__file__)), "audio-stereo-16-bit-44100Hz.aiff")) as source: audio = r.record(source) 104 | self.assertIsInstance(audio, sr.AudioData) 105 | self.assertEqual(audio.sample_rate, 44100) 106 | self.assertEqual(audio.sample_width, 2) 107 | self.assertSimilar(audio.get_raw_data()[:32], b"\x00\x00\xfe\xff\x02\x00\xfe\xff\xff\xff\x04\x00\xfa\xff\x04\x00\xfa\xff\t\x00\xf6\xff\n\x00\xfa\xff\xff\xff\x08\x00\xf5\xff") 108 | 109 | def test_flac_mono_16_bit(self): 110 | r = sr.Recognizer() 111 | with sr.AudioFile(path.join(path.dirname(path.realpath(__file__)), "audio-mono-16-bit-44100Hz.flac")) as source: audio = r.record(source) 112 | self.assertIsInstance(audio, sr.AudioData) 113 | self.assertEqual(audio.sample_rate, 44100) 114 | self.assertEqual(audio.sample_width, 2) 115 | self.assertSimilar(audio.get_raw_data()[:32], b"\x00\x00\xff\xff\x01\x00\xff\xff\x00\x00\x01\x00\xfe\xff\x02\x00\xfc\xff\x06\x00\xf9\xff\x06\x00\xfe\xff\xfe\xff\x05\x00\xfa\xff") 116 | 117 | def test_flac_mono_24_bit(self): 118 | r = sr.Recognizer() 119 | with sr.AudioFile(path.join(path.dirname(path.realpath(__file__)), "audio-mono-24-bit-44100Hz.flac")) as source: audio = r.record(source) 120 | self.assertIsInstance(audio, sr.AudioData) 121 | self.assertEqual(audio.sample_rate, 44100) 122 | if audio.sample_width == 3: 123 | self.assertSimilar(audio.get_raw_data()[:32], b"\x00\x00\x00\xff\xfe\xff\x02\x01\x00\xfd\xfe\xff\x04\x00\x00\xfc\x00\x00\x04\xfe\xff\xfb\x00\x00\x05\xfe\xff\xfc\x03\x00\x04\xfb") 124 | else: 125 | self.assertSimilar(audio.get_raw_data()[:32], b"\x00\x00\x00\x00\x00\xff\xfe\xff\x00\x02\x01\x00\x00\xfd\xfe\xff\x00\x04\x00\x00\x00\xfc\x00\x00\x00\x04\xfe\xff\x00\xfb\x00\x00") 126 | 127 | def test_flac_stereo_16_bit(self): 128 | r = sr.Recognizer() 129 | with 
sr.AudioFile(path.join(path.dirname(path.realpath(__file__)), "audio-stereo-16-bit-44100Hz.flac")) as source: audio = r.record(source) 130 | self.assertIsInstance(audio, sr.AudioData) 131 | self.assertEqual(audio.sample_rate, 44100) 132 | self.assertEqual(audio.sample_width, 2) 133 | self.assertSimilar(audio.get_raw_data()[:32], b"\xff\xff\xff\xff\x02\x00\xfe\xff\x00\x00\x01\x00\xfd\xff\x01\x00\xff\xff\x04\x00\xfa\xff\x05\x00\xff\xff\xfd\xff\x08\x00\xf6\xff") 134 | 135 | def test_flac_stereo_24_bit(self): 136 | r = sr.Recognizer() 137 | with sr.AudioFile(path.join(path.dirname(path.realpath(__file__)), "audio-stereo-24-bit-44100Hz.flac")) as source: audio = r.record(source) 138 | self.assertIsInstance(audio, sr.AudioData) 139 | self.assertEqual(audio.sample_rate, 44100) 140 | if audio.sample_width == 3: 141 | self.assertSimilar(audio.get_raw_data()[:32], b"\x00\x00\x00\x00\xfe\xff\x00\x02\x00\x00\xfe\xff\x00\x00\x00\xff\x01\x00\x02\xfc\xff\xfe\x01\x00\x02\xfc\xff\xfe\x07\x00\x01\xf6") 142 | else: 143 | self.assertSimilar(audio.get_raw_data()[:32], b"\x00\x00\x00\x00\x00\x00\xfe\xff\x00\x00\x02\x00\x00\x00\xfe\xff\x00\x00\x00\x00\x00\xff\x01\x00\x00\x02\xfc\xff\x00\xfe\x01\x00") 144 | 145 | 146 | if __name__ == "__main__": 147 | unittest.main() 148 | -------------------------------------------------------------------------------- /tests/test_recognition.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | # -*- coding: utf-8 -*- 3 | 4 | import os 5 | import unittest 6 | 7 | import speech_recognition as sr 8 | 9 | 10 | class TestRecognition(unittest.TestCase): 11 | def setUp(self): 12 | self.AUDIO_FILE_EN = os.path.join(os.path.dirname(os.path.realpath(__file__)), "english.wav") 13 | self.AUDIO_FILE_FR = os.path.join(os.path.dirname(os.path.realpath(__file__)), "french.aiff") 14 | self.AUDIO_FILE_ZH = os.path.join(os.path.dirname(os.path.realpath(__file__)), "chinese.flac") 15 | 16 | def test_sphinx_english(self): 17 | r = sr.Recognizer() 18 | with sr.AudioFile(self.AUDIO_FILE_EN) as source: audio = r.record(source) 19 | self.assertEqual(r.recognize_sphinx(audio), "one two three") 20 | 21 | def test_google_english(self): 22 | r = sr.Recognizer() 23 | with sr.AudioFile(self.AUDIO_FILE_EN) as source: audio = r.record(source) 24 | self.assertIn(r.recognize_google(audio), ["1 2 3", "one two three"]) 25 | 26 | def test_google_french(self): 27 | r = sr.Recognizer() 28 | with sr.AudioFile(self.AUDIO_FILE_FR) as source: audio = r.record(source) 29 | self.assertEqual(r.recognize_google(audio, language="fr-FR"), u"et c'est la dictée numéro 1") 30 | 31 | def test_google_chinese(self): 32 | r = sr.Recognizer() 33 | with sr.AudioFile(self.AUDIO_FILE_ZH) as source: audio = r.record(source) 34 | self.assertEqual(r.recognize_google(audio, language="zh-CN"), u"砸自己的脚") 35 | 36 | @unittest.skipUnless("WIT_AI_KEY" in os.environ, "requires Wit.ai key to be specified in WIT_AI_KEY environment variable") 37 | def test_wit_english(self): 38 | r = sr.Recognizer() 39 | with sr.AudioFile(self.AUDIO_FILE_EN) as source: audio = r.record(source) 40 | self.assertEqual(r.recognize_wit(audio, key=os.environ["WIT_AI_KEY"]), "one two three") 41 | 42 | @unittest.skipUnless("BING_KEY" in os.environ, "requires Microsoft Bing Voice Recognition key to be specified in BING_KEY environment variable") 43 | def test_bing_english(self): 44 | r = sr.Recognizer() 45 | with sr.AudioFile(self.AUDIO_FILE_EN) as source: audio = r.record(source) 46 | 
self.assertEqual(r.recognize_bing(audio, key=os.environ["BING_KEY"]), "123.") 47 | 48 | @unittest.skipUnless("BING_KEY" in os.environ, "requires Microsoft Bing Voice Recognition key to be specified in BING_KEY environment variable") 49 | def test_bing_french(self): 50 | r = sr.Recognizer() 51 | with sr.AudioFile(self.AUDIO_FILE_FR) as source: audio = r.record(source) 52 | self.assertEqual(r.recognize_bing(audio, key=os.environ["BING_KEY"], language="fr-FR"), u"Essaye la dictée numéro un.") 53 | 54 | @unittest.skipUnless("BING_KEY" in os.environ, "requires Microsoft Bing Voice Recognition key to be specified in BING_KEY environment variable") 55 | def test_bing_chinese(self): 56 | r = sr.Recognizer() 57 | with sr.AudioFile(self.AUDIO_FILE_ZH) as source: audio = r.record(source) 58 | self.assertEqual(r.recognize_bing(audio, key=os.environ["BING_KEY"], language="zh-CN"), u"砸自己的脚。") 59 | 60 | @unittest.skipUnless("HOUNDIFY_CLIENT_ID" in os.environ and "HOUNDIFY_CLIENT_KEY" in os.environ, "requires Houndify client ID and client key to be specified in HOUNDIFY_CLIENT_ID and HOUNDIFY_CLIENT_KEY environment variables") 61 | def test_houndify_english(self): 62 | r = sr.Recognizer() 63 | with sr.AudioFile(self.AUDIO_FILE_EN) as source: audio = r.record(source) 64 | self.assertEqual(r.recognize_houndify(audio, client_id=os.environ["HOUNDIFY_CLIENT_ID"], client_key=os.environ["HOUNDIFY_CLIENT_KEY"]), "one two three") 65 | 66 | @unittest.skipUnless("IBM_USERNAME" in os.environ and "IBM_PASSWORD" in os.environ, "requires IBM Speech to Text username and password to be specified in IBM_USERNAME and IBM_PASSWORD environment variables") 67 | def test_ibm_english(self): 68 | r = sr.Recognizer() 69 | with sr.AudioFile(self.AUDIO_FILE_EN) as source: audio = r.record(source) 70 | self.assertEqual(r.recognize_ibm(audio, username=os.environ["IBM_USERNAME"], password=os.environ["IBM_PASSWORD"]), "one two three ") 71 | 72 | @unittest.skipUnless("IBM_USERNAME" in os.environ and "IBM_PASSWORD" in os.environ, "requires IBM Speech to Text username and password to be specified in IBM_USERNAME and IBM_PASSWORD environment variables") 73 | def test_ibm_french(self): 74 | r = sr.Recognizer() 75 | with sr.AudioFile(self.AUDIO_FILE_FR) as source: audio = r.record(source) 76 | self.assertEqual(r.recognize_ibm(audio, username=os.environ["IBM_USERNAME"], password=os.environ["IBM_PASSWORD"], language="fr-FR"), u"si la dictée numéro un ") 77 | 78 | @unittest.skipUnless("IBM_USERNAME" in os.environ and "IBM_PASSWORD" in os.environ, "requires IBM Speech to Text username and password to be specified in IBM_USERNAME and IBM_PASSWORD environment variables") 79 | def test_ibm_chinese(self): 80 | r = sr.Recognizer() 81 | with sr.AudioFile(self.AUDIO_FILE_ZH) as source: audio = r.record(source) 82 | self.assertEqual(r.recognize_ibm(audio, username=os.environ["IBM_USERNAME"], password=os.environ["IBM_PASSWORD"], language="zh-CN"), u"砸 自己 的 脚 ") 83 | 84 | 85 | if __name__ == "__main__": 86 | unittest.main() 87 | -------------------------------------------------------------------------------- /tests/test_special_features.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | # -*- coding: utf-8 -*- 3 | 4 | import os 5 | import unittest 6 | 7 | import speech_recognition as sr 8 | 9 | 10 | class TestSpecialFeatures(unittest.TestCase): 11 | def setUp(self): 12 | self.AUDIO_FILE_EN = os.path.join(os.path.dirname(os.path.realpath(__file__)), "english.wav") 13 | 
self.addTypeEqualityFunc(str, self.assertSameWords) 14 | 15 | def test_sphinx_keywords(self): 16 | r = sr.Recognizer() 17 | with sr.AudioFile(self.AUDIO_FILE_EN) as source: audio = r.record(source) 18 | self.assertEqual(r.recognize_sphinx(audio, keyword_entries=[("one", 1.0), ("two", 1.0), ("three", 1.0)]), "three two one") 19 | self.assertEqual(r.recognize_sphinx(audio, keyword_entries=[("wan", 0.95), ("too", 1.0), ("tree", 1.0)]), "tree too wan") 20 | self.assertEqual(r.recognize_sphinx(audio, keyword_entries=[("un", 0.95), ("to", 1.0), ("tee", 1.0)]), "tee to un") 21 | 22 | def assertSameWords(self, tested, reference, msg=None): 23 | set_tested = set(tested.split()) 24 | set_reference = set(reference.split()) 25 | if set_tested != set_reference: 26 | raise self.failureException(msg if msg is not None else "%r doesn't consist of the same words as %r" % (tested, reference)) 27 | 28 | 29 | if __name__ == "__main__": 30 | unittest.main() 31 | -------------------------------------------------------------------------------- /third-party/Compiling Python extensions on Windows.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/fossasia/speech_recognition/8ba42818419138c34669a693dea73a2b80faf447/third-party/Compiling Python extensions on Windows.pdf -------------------------------------------------------------------------------- /third-party/LICENSE-PyAudio.txt: -------------------------------------------------------------------------------- 1 | Copyright (c) 2006 Hubert Pham 2 | 3 | Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: 4 | 5 | The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. 6 | 7 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 8 | -------------------------------------------------------------------------------- /third-party/LICENSE-Sphinx.txt: -------------------------------------------------------------------------------- 1 | Copyright (c) 1999-2015 Carnegie Mellon University. All rights 2 | reserved. 3 | 4 | Redistribution and use in source and binary forms, with or without 5 | modification, are permitted provided that the following conditions 6 | are met: 7 | 8 | 1. Redistributions of source code must retain the above copyright 9 | notice, this list of conditions and the following disclaimer. 10 | 11 | 2. Redistributions in binary form must reproduce the above copyright 12 | notice, this list of conditions and the following disclaimer in 13 | the documentation and/or other materials provided with the 14 | distribution. 
15 | 16 | This work was supported in part by funding from the Defense Advanced 17 | Research Projects Agency and the National Science Foundation of the 18 | United States of America, and the CMU Sphinx Speech Consortium. 19 | 20 | THIS SOFTWARE IS PROVIDED BY CARNEGIE MELLON UNIVERSITY ``AS IS'' AND 21 | ANY EXPRESSED OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, 22 | THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR 23 | PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL CARNEGIE MELLON UNIVERSITY 24 | NOR ITS EMPLOYEES BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, 25 | SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT 26 | LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, 27 | DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY 28 | THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT 29 | (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE 30 | OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 31 | -------------------------------------------------------------------------------- /third-party/PyAudio-0.2.11-cp27-cp27m-win_amd64.whl: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/fossasia/speech_recognition/8ba42818419138c34669a693dea73a2b80faf447/third-party/PyAudio-0.2.11-cp27-cp27m-win_amd64.whl -------------------------------------------------------------------------------- /third-party/PyAudio-0.2.11-cp34-cp34m-win_amd64.whl: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/fossasia/speech_recognition/8ba42818419138c34669a693dea73a2b80faf447/third-party/PyAudio-0.2.11-cp34-cp34m-win_amd64.whl -------------------------------------------------------------------------------- /third-party/PyAudio-0.2.11-cp35-cp35m-win_amd64.whl: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/fossasia/speech_recognition/8ba42818419138c34669a693dea73a2b80faf447/third-party/PyAudio-0.2.11-cp35-cp35m-win_amd64.whl -------------------------------------------------------------------------------- /third-party/PyAudio-0.2.11-cp36-cp36m-win_amd64.whl: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/fossasia/speech_recognition/8ba42818419138c34669a693dea73a2b80faf447/third-party/PyAudio-0.2.11-cp36-cp36m-win_amd64.whl -------------------------------------------------------------------------------- /third-party/PyAudio-0.2.11.tar.gz: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/fossasia/speech_recognition/8ba42818419138c34669a693dea73a2b80faf447/third-party/PyAudio-0.2.11.tar.gz -------------------------------------------------------------------------------- /third-party/Source code for Google API Client Library for Python and its dependencies/google-api-python-client-1.6.0.tar.gz: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/fossasia/speech_recognition/8ba42818419138c34669a693dea73a2b80faf447/third-party/Source code for Google API Client Library for Python and its dependencies/google-api-python-client-1.6.0.tar.gz -------------------------------------------------------------------------------- /third-party/Source code for Google API Client Library for Python and its 
dependencies/httplib2-0.9.2.tar.gz: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/fossasia/speech_recognition/8ba42818419138c34669a693dea73a2b80faf447/third-party/Source code for Google API Client Library for Python and its dependencies/httplib2-0.9.2.tar.gz -------------------------------------------------------------------------------- /third-party/Source code for Google API Client Library for Python and its dependencies/oauth2client-4.0.0.tar.gz: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/fossasia/speech_recognition/8ba42818419138c34669a693dea73a2b80faf447/third-party/Source code for Google API Client Library for Python and its dependencies/oauth2client-4.0.0.tar.gz -------------------------------------------------------------------------------- /third-party/Source code for Google API Client Library for Python and its dependencies/pyasn1-0.1.9.tar.gz: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/fossasia/speech_recognition/8ba42818419138c34669a693dea73a2b80faf447/third-party/Source code for Google API Client Library for Python and its dependencies/pyasn1-0.1.9.tar.gz -------------------------------------------------------------------------------- /third-party/Source code for Google API Client Library for Python and its dependencies/pyasn1-modules-0.0.8.tar.gz: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/fossasia/speech_recognition/8ba42818419138c34669a693dea73a2b80faf447/third-party/Source code for Google API Client Library for Python and its dependencies/pyasn1-modules-0.0.8.tar.gz -------------------------------------------------------------------------------- /third-party/Source code for Google API Client Library for Python and its dependencies/rsa-3.4.2.tar.gz: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/fossasia/speech_recognition/8ba42818419138c34669a693dea73a2b80faf447/third-party/Source code for Google API Client Library for Python and its dependencies/rsa-3.4.2.tar.gz -------------------------------------------------------------------------------- /third-party/Source code for Google API Client Library for Python and its dependencies/six-1.10.0.tar.gz: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/fossasia/speech_recognition/8ba42818419138c34669a693dea73a2b80faf447/third-party/Source code for Google API Client Library for Python and its dependencies/six-1.10.0.tar.gz -------------------------------------------------------------------------------- /third-party/Source code for Google API Client Library for Python and its dependencies/uritemplate-3.0.0.tar.gz: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/fossasia/speech_recognition/8ba42818419138c34669a693dea73a2b80faf447/third-party/Source code for Google API Client Library for Python and its dependencies/uritemplate-3.0.0.tar.gz -------------------------------------------------------------------------------- /third-party/flac-1.3.2.tar.xz: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/fossasia/speech_recognition/8ba42818419138c34669a693dea73a2b80faf447/third-party/flac-1.3.2.tar.xz -------------------------------------------------------------------------------- /third-party/irstlm-master.zip: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/fossasia/speech_recognition/8ba42818419138c34669a693dea73a2b80faf447/third-party/irstlm-master.zip -------------------------------------------------------------------------------- /third-party/pocketsphinx-0.1.3-cp27-cp27m-win_amd64.whl: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/fossasia/speech_recognition/8ba42818419138c34669a693dea73a2b80faf447/third-party/pocketsphinx-0.1.3-cp27-cp27m-win_amd64.whl -------------------------------------------------------------------------------- /third-party/pocketsphinx-0.1.3-cp35-cp35m-win_amd64.whl: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/fossasia/speech_recognition/8ba42818419138c34669a693dea73a2b80faf447/third-party/pocketsphinx-0.1.3-cp35-cp35m-win_amd64.whl -------------------------------------------------------------------------------- /third-party/pocketsphinx-0.1.3.zip: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/fossasia/speech_recognition/8ba42818419138c34669a693dea73a2b80faf447/third-party/pocketsphinx-0.1.3.zip --------------------------------------------------------------------------------