├── .github └── FUNDING.yml ├── LICENSE ├── README.md ├── client.py ├── requirements-client.txt ├── requirements-server.txt └── server.py /.github/FUNDING.yml: -------------------------------------------------------------------------------- 1 | github: daanzu 2 | patreon: daanzu 3 | custom: "https://paypal.me/daanzu" 4 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | Mozilla Public License Version 2.0 2 | ================================== 3 | 4 | 1. Definitions 5 | -------------- 6 | 7 | 1.1. "Contributor" 8 | means each individual or legal entity that creates, contributes to 9 | the creation of, or owns Covered Software. 10 | 11 | 1.2. "Contributor Version" 12 | means the combination of the Contributions of others (if any) used 13 | by a Contributor and that particular Contributor's Contribution. 14 | 15 | 1.3. "Contribution" 16 | means Covered Software of a particular Contributor. 17 | 18 | 1.4. "Covered Software" 19 | means Source Code Form to which the initial Contributor has attached 20 | the notice in Exhibit A, the Executable Form of such Source Code 21 | Form, and Modifications of such Source Code Form, in each case 22 | including portions thereof. 23 | 24 | 1.5. "Incompatible With Secondary Licenses" 25 | means 26 | 27 | (a) that the initial Contributor has attached the notice described 28 | in Exhibit B to the Covered Software; or 29 | 30 | (b) that the Covered Software was made available under the terms of 31 | version 1.1 or earlier of the License, but not also under the 32 | terms of a Secondary License. 33 | 34 | 1.6. "Executable Form" 35 | means any form of the work other than Source Code Form. 36 | 37 | 1.7. "Larger Work" 38 | means a work that combines Covered Software with other material, in 39 | a separate file or files, that is not Covered Software. 40 | 41 | 1.8. "License" 42 | means this document. 43 | 44 | 1.9. 
"Licensable" 45 | means having the right to grant, to the maximum extent possible, 46 | whether at the time of the initial grant or subsequently, any and 47 | all of the rights conveyed by this License. 48 | 49 | 1.10. "Modifications" 50 | means any of the following: 51 | 52 | (a) any file in Source Code Form that results from an addition to, 53 | deletion from, or modification of the contents of Covered 54 | Software; or 55 | 56 | (b) any new file in Source Code Form that contains any Covered 57 | Software. 58 | 59 | 1.11. "Patent Claims" of a Contributor 60 | means any patent claim(s), including without limitation, method, 61 | process, and apparatus claims, in any patent Licensable by such 62 | Contributor that would be infringed, but for the grant of the 63 | License, by the making, using, selling, offering for sale, having 64 | made, import, or transfer of either its Contributions or its 65 | Contributor Version. 66 | 67 | 1.12. "Secondary License" 68 | means either the GNU General Public License, Version 2.0, the GNU 69 | Lesser General Public License, Version 2.1, the GNU Affero General 70 | Public License, Version 3.0, or any later versions of those 71 | licenses. 72 | 73 | 1.13. "Source Code Form" 74 | means the form of the work preferred for making modifications. 75 | 76 | 1.14. "You" (or "Your") 77 | means an individual or a legal entity exercising rights under this 78 | License. For legal entities, "You" includes any entity that 79 | controls, is controlled by, or is under common control with You. For 80 | purposes of this definition, "control" means (a) the power, direct 81 | or indirect, to cause the direction or management of such entity, 82 | whether by contract or otherwise, or (b) ownership of more than 83 | fifty percent (50%) of the outstanding shares or beneficial 84 | ownership of such entity. 85 | 86 | 2. License Grants and Conditions 87 | -------------------------------- 88 | 89 | 2.1. 
Grants 90 | 91 | Each Contributor hereby grants You a world-wide, royalty-free, 92 | non-exclusive license: 93 | 94 | (a) under intellectual property rights (other than patent or trademark) 95 | Licensable by such Contributor to use, reproduce, make available, 96 | modify, display, perform, distribute, and otherwise exploit its 97 | Contributions, either on an unmodified basis, with Modifications, or 98 | as part of a Larger Work; and 99 | 100 | (b) under Patent Claims of such Contributor to make, use, sell, offer 101 | for sale, have made, import, and otherwise transfer either its 102 | Contributions or its Contributor Version. 103 | 104 | 2.2. Effective Date 105 | 106 | The licenses granted in Section 2.1 with respect to any Contribution 107 | become effective for each Contribution on the date the Contributor first 108 | distributes such Contribution. 109 | 110 | 2.3. Limitations on Grant Scope 111 | 112 | The licenses granted in this Section 2 are the only rights granted under 113 | this License. No additional rights or licenses will be implied from the 114 | distribution or licensing of Covered Software under this License. 115 | Notwithstanding Section 2.1(b) above, no patent license is granted by a 116 | Contributor: 117 | 118 | (a) for any code that a Contributor has removed from Covered Software; 119 | or 120 | 121 | (b) for infringements caused by: (i) Your and any other third party's 122 | modifications of Covered Software, or (ii) the combination of its 123 | Contributions with other software (except as part of its Contributor 124 | Version); or 125 | 126 | (c) under Patent Claims infringed by Covered Software in the absence of 127 | its Contributions. 128 | 129 | This License does not grant any rights in the trademarks, service marks, 130 | or logos of any Contributor (except as may be necessary to comply with 131 | the notice requirements in Section 3.4). 132 | 133 | 2.4. 
Subsequent Licenses 134 | 135 | No Contributor makes additional grants as a result of Your choice to 136 | distribute the Covered Software under a subsequent version of this 137 | License (see Section 10.2) or under the terms of a Secondary License (if 138 | permitted under the terms of Section 3.3). 139 | 140 | 2.5. Representation 141 | 142 | Each Contributor represents that the Contributor believes its 143 | Contributions are its original creation(s) or it has sufficient rights 144 | to grant the rights to its Contributions conveyed by this License. 145 | 146 | 2.6. Fair Use 147 | 148 | This License is not intended to limit any rights You have under 149 | applicable copyright doctrines of fair use, fair dealing, or other 150 | equivalents. 151 | 152 | 2.7. Conditions 153 | 154 | Sections 3.1, 3.2, 3.3, and 3.4 are conditions of the licenses granted 155 | in Section 2.1. 156 | 157 | 3. Responsibilities 158 | ------------------- 159 | 160 | 3.1. Distribution of Source Form 161 | 162 | All distribution of Covered Software in Source Code Form, including any 163 | Modifications that You create or to which You contribute, must be under 164 | the terms of this License. You must inform recipients that the Source 165 | Code Form of the Covered Software is governed by the terms of this 166 | License, and how they can obtain a copy of this License. You may not 167 | attempt to alter or restrict the recipients' rights in the Source Code 168 | Form. 169 | 170 | 3.2. 
Distribution of Executable Form 171 | 172 | If You distribute Covered Software in Executable Form then: 173 | 174 | (a) such Covered Software must also be made available in Source Code 175 | Form, as described in Section 3.1, and You must inform recipients of 176 | the Executable Form how they can obtain a copy of such Source Code 177 | Form by reasonable means in a timely manner, at a charge no more 178 | than the cost of distribution to the recipient; and 179 | 180 | (b) You may distribute such Executable Form under the terms of this 181 | License, or sublicense it under different terms, provided that the 182 | license for the Executable Form does not attempt to limit or alter 183 | the recipients' rights in the Source Code Form under this License. 184 | 185 | 3.3. Distribution of a Larger Work 186 | 187 | You may create and distribute a Larger Work under terms of Your choice, 188 | provided that You also comply with the requirements of this License for 189 | the Covered Software. If the Larger Work is a combination of Covered 190 | Software with a work governed by one or more Secondary Licenses, and the 191 | Covered Software is not Incompatible With Secondary Licenses, this 192 | License permits You to additionally distribute such Covered Software 193 | under the terms of such Secondary License(s), so that the recipient of 194 | the Larger Work may, at their option, further distribute the Covered 195 | Software under the terms of either this License or such Secondary 196 | License(s). 197 | 198 | 3.4. Notices 199 | 200 | You may not remove or alter the substance of any license notices 201 | (including copyright notices, patent notices, disclaimers of warranty, 202 | or limitations of liability) contained within the Source Code Form of 203 | the Covered Software, except that You may alter any license notices to 204 | the extent required to remedy known factual inaccuracies. 205 | 206 | 3.5. 
Application of Additional Terms 207 | 208 | You may choose to offer, and to charge a fee for, warranty, support, 209 | indemnity or liability obligations to one or more recipients of Covered 210 | Software. However, You may do so only on Your own behalf, and not on 211 | behalf of any Contributor. You must make it absolutely clear that any 212 | such warranty, support, indemnity, or liability obligation is offered by 213 | You alone, and You hereby agree to indemnify every Contributor for any 214 | liability incurred by such Contributor as a result of warranty, support, 215 | indemnity or liability terms You offer. You may include additional 216 | disclaimers of warranty and limitations of liability specific to any 217 | jurisdiction. 218 | 219 | 4. Inability to Comply Due to Statute or Regulation 220 | --------------------------------------------------- 221 | 222 | If it is impossible for You to comply with any of the terms of this 223 | License with respect to some or all of the Covered Software due to 224 | statute, judicial order, or regulation then You must: (a) comply with 225 | the terms of this License to the maximum extent possible; and (b) 226 | describe the limitations and the code they affect. Such description must 227 | be placed in a text file included with all distributions of the Covered 228 | Software under this License. Except to the extent prohibited by statute 229 | or regulation, such description must be sufficiently detailed for a 230 | recipient of ordinary skill to be able to understand it. 231 | 232 | 5. Termination 233 | -------------- 234 | 235 | 5.1. The rights granted under this License will terminate automatically 236 | if You fail to comply with any of its terms. 
However, if You become 237 | compliant, then the rights granted under this License from a particular 238 | Contributor are reinstated (a) provisionally, unless and until such 239 | Contributor explicitly and finally terminates Your grants, and (b) on an 240 | ongoing basis, if such Contributor fails to notify You of the 241 | non-compliance by some reasonable means prior to 60 days after You have 242 | come back into compliance. Moreover, Your grants from a particular 243 | Contributor are reinstated on an ongoing basis if such Contributor 244 | notifies You of the non-compliance by some reasonable means, this is the 245 | first time You have received notice of non-compliance with this License 246 | from such Contributor, and You become compliant prior to 30 days after 247 | Your receipt of the notice. 248 | 249 | 5.2. If You initiate litigation against any entity by asserting a patent 250 | infringement claim (excluding declaratory judgment actions, 251 | counter-claims, and cross-claims) alleging that a Contributor Version 252 | directly or indirectly infringes any patent, then the rights granted to 253 | You by any and all Contributors for the Covered Software under Section 254 | 2.1 of this License shall terminate. 255 | 256 | 5.3. In the event of termination under Sections 5.1 or 5.2 above, all 257 | end user license agreements (excluding distributors and resellers) which 258 | have been validly granted by You or Your distributors under this License 259 | prior to termination shall survive termination. 260 | 261 | ************************************************************************ 262 | * * 263 | * 6. 
Disclaimer of Warranty * 264 | * ------------------------- * 265 | * * 266 | * Covered Software is provided under this License on an "as is" * 267 | * basis, without warranty of any kind, either expressed, implied, or * 268 | * statutory, including, without limitation, warranties that the * 269 | * Covered Software is free of defects, merchantable, fit for a * 270 | * particular purpose or non-infringing. The entire risk as to the * 271 | * quality and performance of the Covered Software is with You. * 272 | * Should any Covered Software prove defective in any respect, You * 273 | * (not any Contributor) assume the cost of any necessary servicing, * 274 | * repair, or correction. This disclaimer of warranty constitutes an * 275 | * essential part of this License. No use of any Covered Software is * 276 | * authorized under this License except under this disclaimer. * 277 | * * 278 | ************************************************************************ 279 | 280 | ************************************************************************ 281 | * * 282 | * 7. Limitation of Liability * 283 | * -------------------------- * 284 | * * 285 | * Under no circumstances and under no legal theory, whether tort * 286 | * (including negligence), contract, or otherwise, shall any * 287 | * Contributor, or anyone who distributes Covered Software as * 288 | * permitted above, be liable to You for any direct, indirect, * 289 | * special, incidental, or consequential damages of any character * 290 | * including, without limitation, damages for lost profits, loss of * 291 | * goodwill, work stoppage, computer failure or malfunction, or any * 292 | * and all other commercial damages or losses, even if such party * 293 | * shall have been informed of the possibility of such damages. 
This * 294 | * limitation of liability shall not apply to liability for death or * 295 | * personal injury resulting from such party's negligence to the * 296 | * extent applicable law prohibits such limitation. Some * 297 | * jurisdictions do not allow the exclusion or limitation of * 298 | * incidental or consequential damages, so this exclusion and * 299 | * limitation may not apply to You. * 300 | * * 301 | ************************************************************************ 302 | 303 | 8. Litigation 304 | ------------- 305 | 306 | Any litigation relating to this License may be brought only in the 307 | courts of a jurisdiction where the defendant maintains its principal 308 | place of business and such litigation shall be governed by laws of that 309 | jurisdiction, without reference to its conflict-of-law provisions. 310 | Nothing in this Section shall prevent a party's ability to bring 311 | cross-claims or counter-claims. 312 | 313 | 9. Miscellaneous 314 | ---------------- 315 | 316 | This License represents the complete agreement concerning the subject 317 | matter hereof. If any provision of this License is held to be 318 | unenforceable, such provision shall be reformed only to the extent 319 | necessary to make it enforceable. Any law or regulation which provides 320 | that the language of a contract shall be construed against the drafter 321 | shall not be used to construe this License against a Contributor. 322 | 323 | 10. Versions of the License 324 | --------------------------- 325 | 326 | 10.1. New Versions 327 | 328 | Mozilla Foundation is the license steward. Except as provided in Section 329 | 10.3, no one other than the license steward has the right to modify or 330 | publish new versions of this License. Each version will be given a 331 | distinguishing version number. 332 | 333 | 10.2. 
Effect of New Versions 334 | 335 | You may distribute the Covered Software under the terms of the version 336 | of the License under which You originally received the Covered Software, 337 | or under the terms of any subsequent version published by the license 338 | steward. 339 | 340 | 10.3. Modified Versions 341 | 342 | If you create software not governed by this License, and you want to 343 | create a new license for such software, you may create and use a 344 | modified version of this License if you rename the license and remove 345 | any references to the name of the license steward (except to note that 346 | such modified license differs from this License). 347 | 348 | 10.4. Distributing Source Code Form that is Incompatible With Secondary 349 | Licenses 350 | 351 | If You choose to distribute Source Code Form that is Incompatible With 352 | Secondary Licenses under the terms of this version of the License, the 353 | notice described in Exhibit B of this License must be attached. 354 | 355 | Exhibit A - Source Code Form License Notice 356 | ------------------------------------------- 357 | 358 | This Source Code Form is subject to the terms of the Mozilla Public 359 | License, v. 2.0. If a copy of the MPL was not distributed with this 360 | file, You can obtain one at http://mozilla.org/MPL/2.0/. 361 | 362 | If it is not possible or desirable to put the notice in a particular 363 | file, then You may include the notice in a location (such as a LICENSE 364 | file in a relevant directory) where a recipient would be likely to look 365 | for such a notice. 366 | 367 | You may add additional accurate notices of copyright ownership. 368 | 369 | Exhibit B - "Incompatible With Secondary Licenses" Notice 370 | --------------------------------------------------------- 371 | 372 | This Source Code Form is "Incompatible With Secondary Licenses", as 373 | defined by the Mozilla Public License, v. 2.0. 
-------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # DeepSpeech WebSocket Server 2 | 3 | [![Donate](https://img.shields.io/badge/donate-GitHub-pink.svg)](https://github.com/sponsors/daanzu) 4 | [![Donate](https://img.shields.io/badge/donate-Patreon-orange.svg)](https://www.patreon.com/daanzu) 5 | [![Donate](https://img.shields.io/badge/donate-PayPal-green.svg)](https://paypal.me/daanzu) 6 | [![Donate](https://img.shields.io/badge/preferred-GitHub-black.svg)](https://github.com/sponsors/daanzu) 7 | [**GitHub** is currently matching all my donations $-for-$.] 8 | 9 | This is a [WebSocket](https://en.wikipedia.org/wiki/WebSocket) server (& client) for Mozilla's [DeepSpeech](https://github.com/mozilla/DeepSpeech), to allow easy real-time speech recognition, using a separate client & server that can be run in different environments, either locally or remotely. 10 | 11 | Work in progress. Developed to quickly test new models running DeepSpeech in [Windows Subsystem for Linux](https://docs.microsoft.com/en-us/windows/wsl/about) using microphone input from host Windows. Available to save others some time. 12 | 13 | ## Features 14 | 15 | * Server 16 | - Tested and works with DeepSpeech v0.7 (thanks [@Kai-Karren](https://github.com/Kai-Karren)) 17 | - Streaming inference via DeepSpeech v0.2+ 18 | - Streams raw audio data from client via WebSocket 19 | - Multi-user (only decodes one stream at a time, but can block until decoding is available) 20 | * Client 21 | - Streams raw audio data from microphone to server via WebSocket 22 | - Voice activity detection (VAD) to ignore noise and segment microphone input into separate utterances 23 | - Hypnotizing spinner to indicate voice activity is detected! 24 | - Option to automatically save each utterance to a separate .wav file, for later testing 25 | - Need to pause/unpause listening? 
[See here](https://github.com/daanzu/deepspeech-websocket-server/issues/6). 26 | 27 | ## Installation 28 | 29 | This package is developed in Python 3. 30 | Activate a virtualenv, then install the requirements for the server and/or client, depending on usage: 31 | 32 | ```bash 33 | pip install -r requirements-server.txt 34 | ### AND/OR ### 35 | pip install -r requirements-client.txt 36 | ``` 37 | 38 | To run the server in an environment, you also need to install DeepSpeech, which requires choosing either the CPU xor GPU version: 39 | 40 | ```bash 41 | pip install deepspeech 42 | ### XOR ### 43 | pip install deepspeech-gpu 44 | ``` 45 | 46 | Upgrade to the latest DeepSpeech with `pip install deepspeech --upgrade` (or gpu version). This package works with v0.3.0. 47 | 48 | The client uses `pyaudio` and `portaudio` for microphone access. In my experience, this works out of the box on Windows. 49 | On Linux, you may need to install portaudio header files to compile the pyaudio package: `sudo apt install portaudio19-dev` . 50 | On MacOS, try installing portaudio with brew: `brew install portaudio` . 51 | 52 | ## Server 53 | 54 | ``` 55 | > python server.py --model ../models/daanzu-6h-512l-0001lr-425dr/ -l -t 56 | Initializing model... 57 | 2018-10-06 AM 05:55:16.357: __main__: INFO: (): args.model: ../models/daanzu-6h-512l-0001lr-425dr/output_graph.pb 58 | 2018-10-06 AM 05:55:16.357: __main__: INFO: (): args.alphabet: ../models/daanzu-6h-512l-0001lr-425dr/alphabet.txt 59 | TensorFlow: v1.6.0-18-g5021473 60 | DeepSpeech: v0.2.0-0-g009f9b6 61 | Warning: reading entire model file into memory. Transform model file into an mmapped graph to reduce heap usage. 
62 | 2018-10-06 05:55:16.358385: I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA 63 | 2018-10-06 AM 05:55:16.395: __main__: INFO: (): args.lm: ../models/daanzu-6h-512l-0001lr-425dr/lm.binary 64 | 2018-10-06 AM 05:55:16.395: __main__: INFO: (): args.trie: ../models/daanzu-6h-512l-0001lr-425dr/trie 65 | Bottle v0.12.13 server starting up (using GeventWebSocketServer())... 66 | Listening on http://127.0.0.1:8080/ 67 | Hit Ctrl-C to quit. 68 | 69 | 2018-10-06 AM 05:55:30.194: __main__: INFO: echo(): recognized: 'alpha bravo charlie' 70 | 2018-10-06 AM 05:55:32.297: __main__: INFO: echo(): recognized: 'delta echo foxtrot' 71 | 2018-10-06 AM 05:55:54.747: __main__: INFO: echo(): dead websocket 72 | ^CKeyboardInterrupt 73 | ``` 74 | 75 | ``` 76 | > python server.py -h 77 | usage: server.py [-h] -m MODEL [-a [ALPHABET]] [-l [LM]] [-t [TRIE]] [--lw LW] 78 | [--vwcw VWCW] [--bw BW] [-p PORT] 79 | 80 | optional arguments: 81 | -h, --help show this help message and exit 82 | -m MODEL, --model MODEL 83 | Path to the model (protocol buffer binary file, or 84 | directory containing all files for model) 85 | -a [ALPHABET], --alphabet [ALPHABET] 86 | Path to the configuration file specifying the alphabet 87 | used by the network. Default: alphabet.txt 88 | -l [LM], --lm [LM] Path to the language model binary file. Default: 89 | lm.binary 90 | -t [TRIE], --trie [TRIE] 91 | Path to the language model trie file created with 92 | native_client/generate_trie. Default: trie 93 | --lw LW The alpha hyperparameter of the CTC decoder. Language 94 | Model weight. Default: 1.5 95 | --vwcw VWCW Valid word insertion weight. This is used to lessen 96 | the word insertion penalty when the inserted word is 97 | part of the vocabulary. Default: 2.25 98 | --bw BW Beam width used in the CTC decoder when building 99 | candidate transcriptions. Default: 1024 100 | -p PORT, --port PORT Port to run server on. 
Default: 8080 101 | ``` 102 | 103 | ## Client 104 | 105 | ``` 106 | λ py client.py 107 | Listening... 108 | Recognized: alpha bravo charlie 109 | Recognized: delta echo foxtrot 110 | ^C 111 | ``` 112 | 113 | ``` 114 | λ py client.py -h 115 | usage: client.py [-h] [-s SERVER] [-a AGGRESSIVENESS] [--nospinner] 116 | [-w SAVEWAV] 117 | 118 | Streams raw audio data from microphone with VAD to server via WebSocket 119 | 120 | optional arguments: 121 | -h, --help show this help message and exit 122 | -s SERVER, --server SERVER 123 | Default: ws://localhost:8080/recognize 124 | -a AGGRESSIVENESS, --aggressiveness AGGRESSIVENESS 125 | Set aggressiveness of VAD: an integer between 0 and 3, 126 | 0 being the least aggressive about filtering out non- 127 | speech, 3 the most aggressive. Default: 3 128 | --nospinner Disable spinner 129 | -w SAVEWAV, --savewav SAVEWAV 130 | Save .wav files of utterences to given directory 131 | ``` 132 | 133 | ## Contributions 134 | 135 | Pull requests welcome. 136 | 137 | Contributors: 138 | * [@Zeddy913](https://github.com/Zeddy913) 139 | -------------------------------------------------------------------------------- /client.py: -------------------------------------------------------------------------------- 1 | import time, logging 2 | from datetime import datetime 3 | import threading, collections, queue, os, os.path 4 | import wave 5 | import pyaudio 6 | import webrtcvad 7 | from lomond import WebSocket, events 8 | from halo import Halo 9 | 10 | logger = logging.getLogger(__name__) 11 | logging.basicConfig(level=30, 12 | format="%(asctime)s.%(msecs)03d: %(name)s: %(levelname)s: %(funcName)s(): %(message)s", 13 | datefmt="%Y-%m-%d %p %I:%M:%S", 14 | ) 15 | logging.getLogger('lomond').setLevel(30) 16 | 17 | 18 | ############################################################################################################################################################### 19 | 20 | class Audio(object): 21 | """Streams raw audio from 
microphone. Data is received in a separate thread, and stored in a buffer, to be read from.""" 22 | 23 | FORMAT = pyaudio.paInt16 24 | RATE = 16000 25 | CHANNELS = 1 26 | BLOCKS_PER_SECOND = 50 27 | 28 | def __init__(self, callback=None, buffer_s=0, flush_queue=True): 29 | def proxy_callback(in_data, frame_count, time_info, status): 30 | callback(in_data) 31 | return (None, pyaudio.paContinue) 32 | if callback is None: callback = lambda in_data: self.buffer_queue.put(in_data, block=False) 33 | self.sample_rate = self.RATE 34 | self.flush_queue = flush_queue 35 | self.buffer_queue = queue.Queue(maxsize=(buffer_s * 1000 // self.block_duration_ms)) 36 | self.pa = pyaudio.PyAudio() 37 | self.stream = self.pa.open(format=self.FORMAT, 38 | channels=self.CHANNELS, 39 | rate=self.sample_rate, 40 | input=True, 41 | frames_per_buffer=self.block_size, 42 | stream_callback=proxy_callback) 43 | self.stream.start_stream() 44 | self.active = True 45 | 46 | def destroy(self): 47 | self.stream.stop_stream() 48 | self.stream.close() 49 | self.pa.terminate() 50 | self.active = False 51 | 52 | def read(self): 53 | """Return a block of audio data, blocking if necessary.""" 54 | if self.active or (self.flush_queue and not self.buffer_queue.empty()): 55 | return self.buffer_queue.get() 56 | else: 57 | return None 58 | 59 | def read_loop(self, callback): 60 | """Block looping reading, repeatedly passing a block of audio data to callback.""" 61 | for block in iter(self): 62 | callback(block) 63 | 64 | def __iter__(self): 65 | """Generator that yields all audio blocks from microphone.""" 66 | while True: 67 | block = self.read() 68 | if block is None: 69 | break 70 | yield block 71 | 72 | block_size = property(lambda self: int(self.sample_rate / float(self.BLOCKS_PER_SECOND))) 73 | block_duration_ms = property(lambda self: 1000 * self.block_size // self.sample_rate) 74 | 75 | def write_wav(self, filename, data): 76 | logging.info("write wav %s", filename) 77 | wf = wave.open(filename, 'wb') 
78 | wf.setnchannels(self.CHANNELS) 79 | # wf.setsampwidth(self.pa.get_sample_size(FORMAT)) 80 | assert self.FORMAT == pyaudio.paInt16 81 | wf.setsampwidth(2) 82 | wf.setframerate(self.sample_rate) 83 | wf.writeframes(data) 84 | wf.close() 85 | 86 | 87 | ############################################################################################################################################################### 88 | 89 | class VADAudio(Audio): 90 | """Filter & segment audio with voice activity detection.""" 91 | 92 | def __init__(self, aggressiveness=3): 93 | super().__init__() 94 | self.vad = webrtcvad.Vad(aggressiveness) 95 | 96 | def vad_collector_simple(self, pre_padding_ms, blocks=None): 97 | if blocks is None: blocks = iter(self) 98 | num_padding_blocks = pre_padding_ms // self.block_duration_ms 99 | buff = collections.deque(maxlen=num_padding_blocks) 100 | triggered = False 101 | 102 | for block in blocks: 103 | is_speech = self.vad.is_speech(block, self.sample_rate) 104 | 105 | if not triggered: 106 | if is_speech: 107 | triggered = True 108 | for f in buff: 109 | yield f 110 | buff.clear() 111 | yield block 112 | else: 113 | buff.append(block) 114 | 115 | else: 116 | if is_speech: 117 | yield block 118 | else: 119 | triggered = False 120 | yield None 121 | buff.append(block) 122 | 123 | def vad_collector(self, padding_ms=300, ratio=0.75, blocks=None): 124 | """Generator that yields series of consecutive audio blocks comprising each utterance, separated by yielding a single None. 125 | Determines voice activity by ratio of blocks in padding_ms. Uses a buffer to include padding_ms prior to being triggered. 126 | Example: (block, ..., block, None, block, ..., block, None, ...) 
127 | |---utterance---| |---utterance---| 128 | """ 129 | if blocks is None: blocks = iter(self) 130 | num_padding_blocks = padding_ms // self.block_duration_ms 131 | ring_buffer = collections.deque(maxlen=num_padding_blocks) 132 | triggered = False 133 | 134 | for block in blocks: 135 | is_speech = self.vad.is_speech(block, self.sample_rate) 136 | 137 | if not triggered: 138 | ring_buffer.append((block, is_speech)) 139 | num_voiced = len([f for f, speech in ring_buffer if speech]) 140 | if num_voiced > ratio * ring_buffer.maxlen: 141 | triggered = True 142 | for f, s in ring_buffer: 143 | yield f 144 | ring_buffer.clear() 145 | 146 | else: 147 | yield block 148 | ring_buffer.append((block, is_speech)) 149 | num_unvoiced = len([f for f, speech in ring_buffer if not speech]) 150 | if num_unvoiced > ratio * ring_buffer.maxlen: 151 | triggered = False 152 | yield None 153 | ring_buffer.clear() 154 | 155 | @classmethod 156 | def test_vad(cls, aggressiveness): 157 | self = cls(aggressiveness=aggressiveness) 158 | blocks = iter(self) 159 | for block in blocks: 160 | is_speech = self.vad.is_speech(block, self.sample_rate) 161 | print('|' if is_speech else '.', end='', flush=True) 162 | 163 | 164 | ############################################################################################################################################################### 165 | 166 | ready = False 167 | 168 | def print_output(*args): 169 | if logger.isEnabledFor(40): 170 | print(*args) 171 | 172 | def audio_consumer(vad_audio, websocket): 173 | """blocks""" 174 | spinner = None 175 | if not ARGS.nospinner: spinner = Halo(spinner='line') # circleHalves point arc boxBounce2 bounce line 176 | length_ms = 0 177 | wav_data = bytearray() 178 | 179 | for block in vad_audio.vad_collector(): 180 | if ready and websocket.is_active: 181 | if block is not None: 182 | if not length_ms: 183 | logging.debug("begin utterance") 184 | if spinner: spinner.start() 185 | logging.log(5, "sending block") 186 | 
websocket.send_binary(block) 187 | if ARGS.savewav: wav_data.extend(block) 188 | length_ms += vad_audio.block_duration_ms 189 | 190 | else: 191 | if spinner: spinner.stop() 192 | if not length_ms: raise RuntimeError("ended utterance without beginning") 193 | logging.debug("end utterance") 194 | if ARGS.savewav: 195 | vad_audio.write_wav(os.path.join(ARGS.savewav, datetime.now().strftime("savewav_%Y-%m-%d_%H-%M-%S_%f.wav")), wav_data) 196 | wav_data = bytearray() 197 | logging.info("sent audio length_ms: %d", length_ms) 198 | logging.log(5, "sending EOS") 199 | websocket.send_text('EOS') 200 | length_ms = 0 201 | 202 | def websocket_runner(websocket): 203 | """blocks""" 204 | 205 | def on_event(event): 206 | if isinstance(event, events.Ready): 207 | global ready 208 | if not ready: 209 | print_output("Connected!") 210 | ready = True 211 | elif isinstance(event, events.Text): 212 | if 1: print_output("Recognized: %s" % event.text) 213 | elif 1: 214 | logging.debug(event) 215 | 216 | for event in websocket: 217 | try: 218 | on_event(event) 219 | except: 220 | logger.exception('error handling %r', event) 221 | websocket.close() 222 | 223 | def main(): 224 | websocket = WebSocket(ARGS.server) 225 | # TODO: compress? 226 | print_output("Connecting to '%s'..." 
% websocket.url) 227 | 228 | vad_audio = VADAudio(aggressiveness=ARGS.aggressiveness) 229 | print_output("Listening (ctrl-C to exit)...") 230 | audio_consumer_thread = threading.Thread(target=lambda: audio_consumer(vad_audio, websocket)) 231 | audio_consumer_thread.start() 232 | 233 | websocket_runner(websocket) 234 | 235 | 236 | ############################################################################################################################################################### 237 | 238 | def main_test(): 239 | if 0: 240 | def consumer(self, blocks): 241 | length_ms = 0 242 | for block in blocks: 243 | if block is not None: 244 | print('|', end='', flush=True) 245 | length_ms += self.block_duration_ms 246 | else: 247 | print('.', end='', flush=True) 248 | length_ms = 0 249 | VADAudio(consumer) 250 | elif 1: 251 | VADAudio.test_vad(3) 252 | 253 | if __name__ == '__main__': 254 | import argparse 255 | parser = argparse.ArgumentParser(description="Streams raw audio data from microphone with VAD to server via WebSocket") 256 | parser.add_argument('-s', '--server', default='ws://localhost:8080/recognize', 257 | help="Default: ws://localhost:8080/recognize") 258 | parser.add_argument('-a', '--aggressiveness', type=int, default=3, 259 | help="Set aggressiveness of VAD: an integer between 0 and 3, 0 being the least aggressive about filtering out non-speech, 3 the most aggressive. Default: 3") 260 | parser.add_argument('--nospinner', action='store_true', 261 | help="Disable spinner") 262 | parser.add_argument('-w', '--savewav', 263 | help="Save .wav files of utterences to given directory. 
Example for current directory: -w .") 264 | parser.add_argument('-v', '--verbose', action='store_true', 265 | help="Print debugging info") 266 | ARGS = parser.parse_args() 267 | 268 | if ARGS.verbose: logging.getLogger().setLevel(10) 269 | if ARGS.savewav: os.makedirs(ARGS.savewav, exist_ok=True) 270 | 271 | if 0: 272 | main_test() 273 | else: 274 | main() 275 | -------------------------------------------------------------------------------- /requirements-client.txt: -------------------------------------------------------------------------------- 1 | lomond>=0.3.3 2 | pyaudio>=0.2.11 3 | webrtcvad>=2.0.10 4 | halo>=0.0.18 5 | -------------------------------------------------------------------------------- /requirements-server.txt: -------------------------------------------------------------------------------- 1 | numpy>=1.15.1 2 | bottle>=0.12.13 3 | bottle-websocket>=0.2.9 4 | -------------------------------------------------------------------------------- /server.py: -------------------------------------------------------------------------------- 1 | import argparse, logging, os.path 2 | from time import time 3 | 4 | from bottle import get, run, template 5 | from bottle.ext.websocket import GeventWebSocketServer 6 | from bottle.ext.websocket import websocket 7 | from gevent.lock import BoundedSemaphore 8 | 9 | import deepspeech 10 | import numpy as np 11 | 12 | logger = logging.getLogger(__name__) 13 | logging.basicConfig(level=20, 14 | format="%(asctime)s.%(msecs)03d: %(name)s: %(levelname)s: %(funcName)s(): %(message)s", 15 | datefmt="%Y-%m-%d %p %I:%M:%S", 16 | ) 17 | 18 | parser = argparse.ArgumentParser(description='') 19 | parser.add_argument('-m', '--model', required=True, 20 | help='Path to the model (protocol buffer binary file, or directory containing all files for model)') 21 | parser.add_argument('-s', '--scorer', help='The path to the scorer that adds an (optional) external language model to deepspeech') 22 | parser.add_argument('-a', '--alphabet', 
    nargs='?', const='alphabet.txt',
    help='Path to the configuration file specifying the alphabet used by the network. Default: alphabet.txt')
parser.add_argument('-l', '--lm', nargs='?', const='lm.binary',
    help='Path to the language model binary file. Default: lm.binary')
parser.add_argument('-t', '--trie', nargs='?', const='trie',
    help='Path to the language model trie file created with native_client/generate_trie. Default: trie')
parser.add_argument('--lw', type=float, default=1.5,
    help='The alpha hyperparameter of the CTC decoder. Language Model weight. Default: 1.5')
parser.add_argument('--vwcw', type=float, default=2.25,
    help='Valid word insertion weight. This is used to lessen the word insertion penalty when the inserted word is part of the vocabulary. Default: 2.25')
parser.add_argument('--bw', type=int, default=1024,
    help='Beam width used in the CTC decoder when building candidate transcriptions. Default: 1024')
parser.add_argument('-p', '--port', type=int, default=8080,
    help='Port to run server on. Default: 8080')
parser.add_argument('--debuglevel', type=int, default=20,
    help='Debug logging level. Default: 20')
ARGS = parser.parse_args()

logging.getLogger().setLevel(int(ARGS.debuglevel))

gSem = BoundedSemaphore(1)  # Only one DeepSpeech instance available at a time

if os.path.isdir(ARGS.model):
    model_dir = ARGS.model
    ARGS.model = os.path.join(model_dir, 'model.pbmm')

LM_WEIGHT = ARGS.lw
VALID_WORD_COUNT_WEIGHT = ARGS.vwcw
BEAM_WIDTH = ARGS.bw

print('Initializing model...')
logger.info("ARGS.model: %s", ARGS.model)

# code for DeepSpeech version 0.7 and above
model = deepspeech.Model(ARGS.model)

if ARGS.scorer:
    model.enableExternalScorer(ARGS.scorer)
    logger.info("ARGS.scorer: %s", ARGS.scorer)

if ARGS.lw and ARGS.vwcw:
    model.setScorerAlphaBeta(ARGS.lw, ARGS.vwcw)

if ARGS.bw:
    model.setBeamWidth(ARGS.bw)

@get('/recognize', apply=[websocket])
def recognize(ws):
    logger.debug("new websocket")
    start_time = None
    gSem_acquired = False

    while True:
        data = ws.receive()
        # logger.log(5, "got websocket data: %r", data)

        if isinstance(data, bytearray):
            # Receive stream data
            if not start_time:
                # Start of stream (utterance)
                start_time = time()
                stream = model.createStream()
                assert not gSem_acquired
                # logger.debug("acquiring lock for deepspeech ...")
                gSem.acquire(blocking=True)
                gSem_acquired = True
                # logger.debug("lock acquired")
            stream.feedAudioContent(np.frombuffer(data, np.int16))

        elif isinstance(data, str) and data == 'EOS':
            # End of stream (utterance)
            if not gSem_acquired:
                # EOS with no preceding audio: there is no stream to finish
                logger.warning("EOS received before any audio; ignoring")
                continue
            eos_time = time()
            text = stream.finishStream()
            logger.info("recognized: %r", text)
            logger.info(" time: total=%s post_eos=%s", time()-start_time, time()-eos_time)
            ws.send(text)
            # FIXME: handle ConnectionResetError & geventwebsocket.exceptions.WebSocketError
            # logger.debug("releasing lock ...")
            gSem.release()
            gSem_acquired = False
            # logger.debug("lock released")
            start_time = None

        else:
            # Lost connection
            logger.debug("dead websocket")
            if gSem_acquired:
                # logger.debug("releasing lock ...")
                gSem.release()
                gSem_acquired = False
                # logger.debug("lock released")
            break

@get('/')
def index():
    return template('index')

run(host='127.0.0.1', port=ARGS.port, server=GeventWebSocketServer)

# python server.py --model ../models/daanzu-30330/output_graph.pb --alphabet ../models/daanzu-30330/alphabet.txt --lm ../models/daanzu-30330/lm.binary --trie ../models/daanzu-30330/trie
# python server.py --model ../models/daanzu-30330.2/output_graph.pb --alphabet ../models/daanzu-30330.2/alphabet.txt --lm ../models/daanzu-30330.2/lm.binary --trie ../models/daanzu-30330.2/trie
--------------------------------------------------------------------------------
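The message protocol shared by client.py and server.py (binary audio blocks for an utterance, followed by the text message 'EOS') can be tried without a microphone or a model. The following is a minimal sketch and not part of the repository: `FakeStream` and `handle_messages` are hypothetical names, and `FakeStream` only counts samples where the real server would feed a DeepSpeech stream. It assumes 16-bit mono PCM, as implied by the `np.int16` conversion in server.py.

```python
class FakeStream:
    """Hypothetical stand-in for a recognizer stream: it only counts samples."""

    def __init__(self):
        self.num_samples = 0

    def feed(self, data):
        # 16-bit PCM: two bytes per sample
        self.num_samples += len(data) // 2

    def finish(self):
        return "<%d samples>" % self.num_samples


def handle_messages(messages, make_stream=FakeStream):
    """Replay the per-connection message loop over an iterable of messages.

    bytes/bytearray -> audio belonging to the current utterance
    'EOS'           -> finish the utterance and collect its result
    anything else   -> treated as a lost connection; stop processing
    """
    results = []
    stream = None
    for data in messages:
        if isinstance(data, (bytes, bytearray)):
            if stream is None:
                # First block of a new utterance
                stream = make_stream()
            stream.feed(data)
        elif data == 'EOS':
            if stream is not None:  # ignore an EOS with no audio before it
                results.append(stream.finish())
                stream = None
        else:
            break
    return results


if __name__ == '__main__':
    print(handle_messages([bytes(320), bytearray(320), 'EOS', None]))
```

Passing a different `make_stream` factory is one way to try other recognizer stand-ins against the same message sequence.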