├── .github
    └── FUNDING.yml
├── LICENSE
├── README.md
├── client.py
├── requirements-client.txt
├── requirements-server.txt
└── server.py


/.github/FUNDING.yml:
--------------------------------------------------------------------------------
1 | github: daanzu
2 | patreon: daanzu
3 | custom: "https://paypal.me/daanzu"
4 | 


--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
  1 | Mozilla Public License Version 2.0
  2 | ==================================
  3 | 
  4 | 1. Definitions
  5 | --------------
  6 | 
  7 | 1.1. "Contributor"
  8 |     means each individual or legal entity that creates, contributes to
  9 |     the creation of, or owns Covered Software.
 10 | 
 11 | 1.2. "Contributor Version"
 12 |     means the combination of the Contributions of others (if any) used
 13 |     by a Contributor and that particular Contributor's Contribution.
 14 | 
 15 | 1.3. "Contribution"
 16 |     means Covered Software of a particular Contributor.
 17 | 
 18 | 1.4. "Covered Software"
 19 |     means Source Code Form to which the initial Contributor has attached
 20 |     the notice in Exhibit A, the Executable Form of such Source Code
 21 |     Form, and Modifications of such Source Code Form, in each case
 22 |     including portions thereof.
 23 | 
 24 | 1.5. "Incompatible With Secondary Licenses"
 25 |     means
 26 | 
 27 |     (a) that the initial Contributor has attached the notice described
 28 |         in Exhibit B to the Covered Software; or
 29 | 
 30 |     (b) that the Covered Software was made available under the terms of
 31 |         version 1.1 or earlier of the License, but not also under the
 32 |         terms of a Secondary License.
 33 | 
 34 | 1.6. "Executable Form"
 35 |     means any form of the work other than Source Code Form.
 36 | 
 37 | 1.7. "Larger Work"
 38 |     means a work that combines Covered Software with other material, in
 39 |     a separate file or files, that is not Covered Software.
 40 | 
 41 | 1.8. "License"
 42 |     means this document.
 43 | 
 44 | 1.9. "Licensable"
 45 |     means having the right to grant, to the maximum extent possible,
 46 |     whether at the time of the initial grant or subsequently, any and
 47 |     all of the rights conveyed by this License.
 48 | 
 49 | 1.10. "Modifications"
 50 |     means any of the following:
 51 | 
 52 |     (a) any file in Source Code Form that results from an addition to,
 53 |         deletion from, or modification of the contents of Covered
 54 |         Software; or
 55 | 
 56 |     (b) any new file in Source Code Form that contains any Covered
 57 |         Software.
 58 | 
 59 | 1.11. "Patent Claims" of a Contributor
 60 |     means any patent claim(s), including without limitation, method,
 61 |     process, and apparatus claims, in any patent Licensable by such
 62 |     Contributor that would be infringed, but for the grant of the
 63 |     License, by the making, using, selling, offering for sale, having
 64 |     made, import, or transfer of either its Contributions or its
 65 |     Contributor Version.
 66 | 
 67 | 1.12. "Secondary License"
 68 |     means either the GNU General Public License, Version 2.0, the GNU
 69 |     Lesser General Public License, Version 2.1, the GNU Affero General
 70 |     Public License, Version 3.0, or any later versions of those
 71 |     licenses.
 72 | 
 73 | 1.13. "Source Code Form"
 74 |     means the form of the work preferred for making modifications.
 75 | 
 76 | 1.14. "You" (or "Your")
 77 |     means an individual or a legal entity exercising rights under this
 78 |     License. For legal entities, "You" includes any entity that
 79 |     controls, is controlled by, or is under common control with You. For
 80 |     purposes of this definition, "control" means (a) the power, direct
 81 |     or indirect, to cause the direction or management of such entity,
 82 |     whether by contract or otherwise, or (b) ownership of more than
 83 |     fifty percent (50%) of the outstanding shares or beneficial
 84 |     ownership of such entity.
 85 | 
 86 | 2. License Grants and Conditions
 87 | --------------------------------
 88 | 
 89 | 2.1. Grants
 90 | 
 91 | Each Contributor hereby grants You a world-wide, royalty-free,
 92 | non-exclusive license:
 93 | 
 94 | (a) under intellectual property rights (other than patent or trademark)
 95 |     Licensable by such Contributor to use, reproduce, make available,
 96 |     modify, display, perform, distribute, and otherwise exploit its
 97 |     Contributions, either on an unmodified basis, with Modifications, or
 98 |     as part of a Larger Work; and
 99 | 
100 | (b) under Patent Claims of such Contributor to make, use, sell, offer
101 |     for sale, have made, import, and otherwise transfer either its
102 |     Contributions or its Contributor Version.
103 | 
104 | 2.2. Effective Date
105 | 
106 | The licenses granted in Section 2.1 with respect to any Contribution
107 | become effective for each Contribution on the date the Contributor first
108 | distributes such Contribution.
109 | 
110 | 2.3. Limitations on Grant Scope
111 | 
112 | The licenses granted in this Section 2 are the only rights granted under
113 | this License. No additional rights or licenses will be implied from the
114 | distribution or licensing of Covered Software under this License.
115 | Notwithstanding Section 2.1(b) above, no patent license is granted by a
116 | Contributor:
117 | 
118 | (a) for any code that a Contributor has removed from Covered Software;
119 |     or
120 | 
121 | (b) for infringements caused by: (i) Your and any other third party's
122 |     modifications of Covered Software, or (ii) the combination of its
123 |     Contributions with other software (except as part of its Contributor
124 |     Version); or
125 | 
126 | (c) under Patent Claims infringed by Covered Software in the absence of
127 |     its Contributions.
128 | 
129 | This License does not grant any rights in the trademarks, service marks,
130 | or logos of any Contributor (except as may be necessary to comply with
131 | the notice requirements in Section 3.4).
132 | 
133 | 2.4. Subsequent Licenses
134 | 
135 | No Contributor makes additional grants as a result of Your choice to
136 | distribute the Covered Software under a subsequent version of this
137 | License (see Section 10.2) or under the terms of a Secondary License (if
138 | permitted under the terms of Section 3.3).
139 | 
140 | 2.5. Representation
141 | 
142 | Each Contributor represents that the Contributor believes its
143 | Contributions are its original creation(s) or it has sufficient rights
144 | to grant the rights to its Contributions conveyed by this License.
145 | 
146 | 2.6. Fair Use
147 | 
148 | This License is not intended to limit any rights You have under
149 | applicable copyright doctrines of fair use, fair dealing, or other
150 | equivalents.
151 | 
152 | 2.7. Conditions
153 | 
154 | Sections 3.1, 3.2, 3.3, and 3.4 are conditions of the licenses granted
155 | in Section 2.1.
156 | 
157 | 3. Responsibilities
158 | -------------------
159 | 
160 | 3.1. Distribution of Source Form
161 | 
162 | All distribution of Covered Software in Source Code Form, including any
163 | Modifications that You create or to which You contribute, must be under
164 | the terms of this License. You must inform recipients that the Source
165 | Code Form of the Covered Software is governed by the terms of this
166 | License, and how they can obtain a copy of this License. You may not
167 | attempt to alter or restrict the recipients' rights in the Source Code
168 | Form.
169 | 
170 | 3.2. Distribution of Executable Form
171 | 
172 | If You distribute Covered Software in Executable Form then:
173 | 
174 | (a) such Covered Software must also be made available in Source Code
175 |     Form, as described in Section 3.1, and You must inform recipients of
176 |     the Executable Form how they can obtain a copy of such Source Code
177 |     Form by reasonable means in a timely manner, at a charge no more
178 |     than the cost of distribution to the recipient; and
179 | 
180 | (b) You may distribute such Executable Form under the terms of this
181 |     License, or sublicense it under different terms, provided that the
182 |     license for the Executable Form does not attempt to limit or alter
183 |     the recipients' rights in the Source Code Form under this License.
184 | 
185 | 3.3. Distribution of a Larger Work
186 | 
187 | You may create and distribute a Larger Work under terms of Your choice,
188 | provided that You also comply with the requirements of this License for
189 | the Covered Software. If the Larger Work is a combination of Covered
190 | Software with a work governed by one or more Secondary Licenses, and the
191 | Covered Software is not Incompatible With Secondary Licenses, this
192 | License permits You to additionally distribute such Covered Software
193 | under the terms of such Secondary License(s), so that the recipient of
194 | the Larger Work may, at their option, further distribute the Covered
195 | Software under the terms of either this License or such Secondary
196 | License(s).
197 | 
198 | 3.4. Notices
199 | 
200 | You may not remove or alter the substance of any license notices
201 | (including copyright notices, patent notices, disclaimers of warranty,
202 | or limitations of liability) contained within the Source Code Form of
203 | the Covered Software, except that You may alter any license notices to
204 | the extent required to remedy known factual inaccuracies.
205 | 
206 | 3.5. Application of Additional Terms
207 | 
208 | You may choose to offer, and to charge a fee for, warranty, support,
209 | indemnity or liability obligations to one or more recipients of Covered
210 | Software. However, You may do so only on Your own behalf, and not on
211 | behalf of any Contributor. You must make it absolutely clear that any
212 | such warranty, support, indemnity, or liability obligation is offered by
213 | You alone, and You hereby agree to indemnify every Contributor for any
214 | liability incurred by such Contributor as a result of warranty, support,
215 | indemnity or liability terms You offer. You may include additional
216 | disclaimers of warranty and limitations of liability specific to any
217 | jurisdiction.
218 | 
219 | 4. Inability to Comply Due to Statute or Regulation
220 | ---------------------------------------------------
221 | 
222 | If it is impossible for You to comply with any of the terms of this
223 | License with respect to some or all of the Covered Software due to
224 | statute, judicial order, or regulation then You must: (a) comply with
225 | the terms of this License to the maximum extent possible; and (b)
226 | describe the limitations and the code they affect. Such description must
227 | be placed in a text file included with all distributions of the Covered
228 | Software under this License. Except to the extent prohibited by statute
229 | or regulation, such description must be sufficiently detailed for a
230 | recipient of ordinary skill to be able to understand it.
231 | 
232 | 5. Termination
233 | --------------
234 | 
235 | 5.1. The rights granted under this License will terminate automatically
236 | if You fail to comply with any of its terms. However, if You become
237 | compliant, then the rights granted under this License from a particular
238 | Contributor are reinstated (a) provisionally, unless and until such
239 | Contributor explicitly and finally terminates Your grants, and (b) on an
240 | ongoing basis, if such Contributor fails to notify You of the
241 | non-compliance by some reasonable means prior to 60 days after You have
242 | come back into compliance. Moreover, Your grants from a particular
243 | Contributor are reinstated on an ongoing basis if such Contributor
244 | notifies You of the non-compliance by some reasonable means, this is the
245 | first time You have received notice of non-compliance with this License
246 | from such Contributor, and You become compliant prior to 30 days after
247 | Your receipt of the notice.
248 | 
249 | 5.2. If You initiate litigation against any entity by asserting a patent
250 | infringement claim (excluding declaratory judgment actions,
251 | counter-claims, and cross-claims) alleging that a Contributor Version
252 | directly or indirectly infringes any patent, then the rights granted to
253 | You by any and all Contributors for the Covered Software under Section
254 | 2.1 of this License shall terminate.
255 | 
256 | 5.3. In the event of termination under Sections 5.1 or 5.2 above, all
257 | end user license agreements (excluding distributors and resellers) which
258 | have been validly granted by You or Your distributors under this License
259 | prior to termination shall survive termination.
260 | 
261 | ************************************************************************
262 | *                                                                      *
263 | *  6. Disclaimer of Warranty                                           *
264 | *  -------------------------                                           *
265 | *                                                                      *
266 | *  Covered Software is provided under this License on an "as is"       *
267 | *  basis, without warranty of any kind, either expressed, implied, or  *
268 | *  statutory, including, without limitation, warranties that the       *
269 | *  Covered Software is free of defects, merchantable, fit for a        *
270 | *  particular purpose or non-infringing. The entire risk as to the     *
271 | *  quality and performance of the Covered Software is with You.        *
272 | *  Should any Covered Software prove defective in any respect, You     *
273 | *  (not any Contributor) assume the cost of any necessary servicing,   *
274 | *  repair, or correction. This disclaimer of warranty constitutes an   *
275 | *  essential part of this License. No use of any Covered Software is   *
276 | *  authorized under this License except under this disclaimer.         *
277 | *                                                                      *
278 | ************************************************************************
279 | 
280 | ************************************************************************
281 | *                                                                      *
282 | *  7. Limitation of Liability                                          *
283 | *  --------------------------                                          *
284 | *                                                                      *
285 | *  Under no circumstances and under no legal theory, whether tort      *
286 | *  (including negligence), contract, or otherwise, shall any           *
287 | *  Contributor, or anyone who distributes Covered Software as          *
288 | *  permitted above, be liable to You for any direct, indirect,         *
289 | *  special, incidental, or consequential damages of any character      *
290 | *  including, without limitation, damages for lost profits, loss of    *
291 | *  goodwill, work stoppage, computer failure or malfunction, or any    *
292 | *  and all other commercial damages or losses, even if such party      *
293 | *  shall have been informed of the possibility of such damages. This   *
294 | *  limitation of liability shall not apply to liability for death or   *
295 | *  personal injury resulting from such party's negligence to the       *
296 | *  extent applicable law prohibits such limitation. Some               *
297 | *  jurisdictions do not allow the exclusion or limitation of           *
298 | *  incidental or consequential damages, so this exclusion and          *
299 | *  limitation may not apply to You.                                    *
300 | *                                                                      *
301 | ************************************************************************
302 | 
303 | 8. Litigation
304 | -------------
305 | 
306 | Any litigation relating to this License may be brought only in the
307 | courts of a jurisdiction where the defendant maintains its principal
308 | place of business and such litigation shall be governed by laws of that
309 | jurisdiction, without reference to its conflict-of-law provisions.
310 | Nothing in this Section shall prevent a party's ability to bring
311 | cross-claims or counter-claims.
312 | 
313 | 9. Miscellaneous
314 | ----------------
315 | 
316 | This License represents the complete agreement concerning the subject
317 | matter hereof. If any provision of this License is held to be
318 | unenforceable, such provision shall be reformed only to the extent
319 | necessary to make it enforceable. Any law or regulation which provides
320 | that the language of a contract shall be construed against the drafter
321 | shall not be used to construe this License against a Contributor.
322 | 
323 | 10. Versions of the License
324 | ---------------------------
325 | 
326 | 10.1. New Versions
327 | 
328 | Mozilla Foundation is the license steward. Except as provided in Section
329 | 10.3, no one other than the license steward has the right to modify or
330 | publish new versions of this License. Each version will be given a
331 | distinguishing version number.
332 | 
333 | 10.2. Effect of New Versions
334 | 
335 | You may distribute the Covered Software under the terms of the version
336 | of the License under which You originally received the Covered Software,
337 | or under the terms of any subsequent version published by the license
338 | steward.
339 | 
340 | 10.3. Modified Versions
341 | 
342 | If you create software not governed by this License, and you want to
343 | create a new license for such software, you may create and use a
344 | modified version of this License if you rename the license and remove
345 | any references to the name of the license steward (except to note that
346 | such modified license differs from this License).
347 | 
348 | 10.4. Distributing Source Code Form that is Incompatible With Secondary
349 | Licenses
350 | 
351 | If You choose to distribute Source Code Form that is Incompatible With
352 | Secondary Licenses under the terms of this version of the License, the
353 | notice described in Exhibit B of this License must be attached.
354 | 
355 | Exhibit A - Source Code Form License Notice
356 | -------------------------------------------
357 | 
358 |   This Source Code Form is subject to the terms of the Mozilla Public
359 |   License, v. 2.0. If a copy of the MPL was not distributed with this
360 |   file, You can obtain one at http://mozilla.org/MPL/2.0/.
361 | 
362 | If it is not possible or desirable to put the notice in a particular
363 | file, then You may include the notice in a location (such as a LICENSE
364 | file in a relevant directory) where a recipient would be likely to look
365 | for such a notice.
366 | 
367 | You may add additional accurate notices of copyright ownership.
368 | 
369 | Exhibit B - "Incompatible With Secondary Licenses" Notice
370 | ---------------------------------------------------------
371 | 
372 |   This Source Code Form is "Incompatible With Secondary Licenses", as
373 |   defined by the Mozilla Public License, v. 2.0.


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
  1 | # DeepSpeech WebSocket Server
  2 | 
  3 | [![Donate](https://img.shields.io/badge/donate-GitHub-pink.svg)](https://github.com/sponsors/daanzu)
  4 | [![Donate](https://img.shields.io/badge/donate-Patreon-orange.svg)](https://www.patreon.com/daanzu)
  5 | [![Donate](https://img.shields.io/badge/donate-PayPal-green.svg)](https://paypal.me/daanzu)
  6 | [![Donate](https://img.shields.io/badge/preferred-GitHub-black.svg)](https://github.com/sponsors/daanzu)
  7 | [**GitHub** is currently matching all my donations $-for-$.]
  8 | 
  9 | This is a [WebSocket](https://en.wikipedia.org/wiki/WebSocket) server (& client) for Mozilla's [DeepSpeech](https://github.com/mozilla/DeepSpeech), to allow easy real-time speech recognition, using a separate client & server that can be run in different environments, either locally or remotely.
 10 | 
 11 | This is a work in progress. It was developed to quickly test new models running DeepSpeech in [Windows Subsystem for Linux](https://docs.microsoft.com/en-us/windows/wsl/about) using microphone input from the Windows host, and is published to save others some time.
 12 | 
 13 | ## Features
 14 | 
 15 | * Server
 16 |     - Tested and works with DeepSpeech v0.7 (thanks [@Kai-Karren](https://github.com/Kai-Karren))
 17 |     - Streaming inference via DeepSpeech v0.2+
 18 |     - Streams raw audio data from client via WebSocket
 19 |     - Multi-user (only decodes one stream at a time, but can block until decoding is available)
 20 | * Client
 21 |     - Streams raw audio data from microphone to server via WebSocket
 22 |     - Voice activity detection (VAD) to ignore noise and segment microphone input into separate utterances
 23 |     - Hypnotizing spinner to indicate voice activity is detected!
 24 |     - Option to automatically save each utterance to a separate .wav file, for later testing
 25 |     - Need to pause/unpause listening? [See here](https://github.com/daanzu/deepspeech-websocket-server/issues/6).
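The utterance segmentation mentioned in the client features can be sketched without a microphone or the `webrtcvad` package. This is a minimal sketch modeled on the ratio-based collector in `client.py`; the function name `segment_utterances`, the `is_speech` callable, and the toy input are assumptions made for illustration, not part of the package.

```python
import collections


def segment_utterances(blocks, is_speech, padding_blocks=3, ratio=0.75):
    """Yield the blocks of each utterance, followed by a single None as separator."""
    ring = collections.deque(maxlen=padding_blocks)
    triggered = False
    for block in blocks:
        voiced = is_speech(block)
        if not triggered:
            # Wait until most of the recent blocks are voiced, then flush the padding.
            ring.append((block, voiced))
            if sum(1 for _, v in ring if v) > ratio * ring.maxlen:
                triggered = True
                for buffered, _ in ring:
                    yield buffered
                ring.clear()
        else:
            # Keep streaming until most of the recent blocks are unvoiced.
            yield block
            ring.append((block, voiced))
            if sum(1 for _, v in ring if not v) > ratio * ring.maxlen:
                triggered = False
                yield None
                ring.clear()


# Toy input (hypothetical): "s" stands for a voiced block, "." for silence.
toy_blocks = list("..sss...ss")
segmented = list(segment_utterances(toy_blocks, lambda b: b == "s"))
```

In the real client the blocks are 20 ms chunks of 16-bit audio and the speech decision comes from the VAD, so the thresholds behave the same way but over a longer padding window.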
 26 | 
 27 | ## Installation
 28 | 
 29 | This package is developed in Python 3.
 30 | Activate a virtualenv, then install the requirements for the server and/or client, depending on usage:
 31 | 
 32 | ```bash
 33 | pip install -r requirements-server.txt
 34 | ### AND/OR ###
 35 | pip install -r requirements-client.txt
 36 | ```
 37 | 
 38 | To run the server, you also need to install DeepSpeech in the same environment, choosing either the CPU or the GPU version (not both):
 39 | 
 40 | ```bash
 41 | pip install deepspeech
 42 | ### XOR ###
 43 | pip install deepspeech-gpu
 44 | ```
 45 | 
 46 | Upgrade to the latest DeepSpeech with `pip install deepspeech --upgrade` (or `pip install deepspeech-gpu --upgrade` for the GPU version). This package works with v0.3.0.
 47 | 
 48 | The client uses `pyaudio` and `portaudio` for microphone access. In my experience, this works out of the box on Windows.
 49 | On Linux, you may need to install the portaudio header files to compile the pyaudio package: `sudo apt install portaudio19-dev`.
 50 | On macOS, try installing portaudio with brew: `brew install portaudio`.
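After installing, a quick way to confirm that the client's dependencies are importable is sketched below. The module names are taken from the imports at the top of `client.py`; the helper `missing_modules` is a hypothetical name introduced here, not something the package provides.

```python
import importlib.util

# Third-party modules imported by client.py.
CLIENT_MODULES = ["pyaudio", "webrtcvad", "lomond", "halo"]


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(CLIENT_MODULES)
    print("missing:", ", ".join(missing) if missing else "none")
```

If `pyaudio` is reported missing on Linux or macOS, the portaudio step above is the likely cause.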
 51 | 
 52 | ## Server
 53 | 
 54 | ```
 55 | > python server.py --model ../models/daanzu-6h-512l-0001lr-425dr/ -l -t
 56 | Initializing model...
 57 | 2018-10-06 AM 05:55:16.357: __main__: INFO: <module>(): args.model: ../models/daanzu-6h-512l-0001lr-425dr/output_graph.pb
 58 | 2018-10-06 AM 05:55:16.357: __main__: INFO: <module>(): args.alphabet: ../models/daanzu-6h-512l-0001lr-425dr/alphabet.txt
 59 | TensorFlow: v1.6.0-18-g5021473
 60 | DeepSpeech: v0.2.0-0-g009f9b6
 61 | Warning: reading entire model file into memory. Transform model file into an mmapped graph to reduce heap usage.
 62 | 2018-10-06 05:55:16.358385: I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
 63 | 2018-10-06 AM 05:55:16.395: __main__: INFO: <module>(): args.lm: ../models/daanzu-6h-512l-0001lr-425dr/lm.binary
 64 | 2018-10-06 AM 05:55:16.395: __main__: INFO: <module>(): args.trie: ../models/daanzu-6h-512l-0001lr-425dr/trie
 65 | Bottle v0.12.13 server starting up (using GeventWebSocketServer())...
 66 | Listening on http://127.0.0.1:8080/
 67 | Hit Ctrl-C to quit.
 68 | 
 69 | 2018-10-06 AM 05:55:30.194: __main__: INFO: echo(): recognized: 'alpha bravo charlie'
 70 | 2018-10-06 AM 05:55:32.297: __main__: INFO: echo(): recognized: 'delta echo foxtrot'
 71 | 2018-10-06 AM 05:55:54.747: __main__: INFO: echo(): dead websocket
 72 | ^CKeyboardInterrupt
 73 | ```
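The log above shows `--model` pointing at a directory and the other paths being filled in from it. A minimal sketch of that kind of resolution follows, assuming the file names seen in the log and help text (`output_graph.pb`, `alphabet.txt`, `lm.binary`, `trie`). The function `resolve_model_paths` is hypothetical and is not taken from `server.py`; in particular, leaving the arguments untouched when `--model` is a file is a guess.

```python
import os


def resolve_model_paths(model, alphabet="alphabet.txt", lm="lm.binary", trie="trie"):
    """Expand a model directory into the paths of the files it is expected to hold."""
    if os.path.isdir(model):
        return {
            "model": os.path.join(model, "output_graph.pb"),
            "alphabet": os.path.join(model, alphabet),
            "lm": os.path.join(model, lm),
            "trie": os.path.join(model, trie),
        }
    # Assumption: when a graph file is given directly, use every path as passed.
    return {"model": model, "alphabet": alphabet, "lm": lm, "trie": trie}
```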
 74 | 
 75 | ```
 76 | > python server.py -h
 77 | usage: server.py [-h] -m MODEL [-a [ALPHABET]] [-l [LM]] [-t [TRIE]] [--lw LW]
 78 |                  [--vwcw VWCW] [--bw BW] [-p PORT]
 79 | 
 80 | optional arguments:
 81 |   -h, --help            show this help message and exit
 82 |   -m MODEL, --model MODEL
 83 |                         Path to the model (protocol buffer binary file, or
 84 |                         directory containing all files for model)
 85 |   -a [ALPHABET], --alphabet [ALPHABET]
 86 |                         Path to the configuration file specifying the alphabet
 87 |                         used by the network. Default: alphabet.txt
 88 |   -l [LM], --lm [LM]    Path to the language model binary file. Default:
 89 |                         lm.binary
 90 |   -t [TRIE], --trie [TRIE]
 91 |                         Path to the language model trie file created with
 92 |                         native_client/generate_trie. Default: trie
 93 |   --lw LW               The alpha hyperparameter of the CTC decoder. Language
 94 |                         Model weight. Default: 1.5
 95 |   --vwcw VWCW           Valid word insertion weight. This is used to lessen
 96 |                         the word insertion penalty when the inserted word is
 97 |                         part of the vocabulary. Default: 2.25
 98 |   --bw BW               Beam width used in the CTC decoder when building
 99 |                         candidate transcriptions. Default: 1024
100 |   -p PORT, --port PORT  Port to run server on. Default: 8080
101 | ```
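The feature list says the server decodes only one stream at a time but can block until decoding is available. One common way to get that behavior is a lock around the decoder, sketched here with the standard library. This is an illustration under stated assumptions, not the code in `server.py`: the names `decoder_lock` and `decode_exclusively` are hypothetical, and a gevent-based server would more likely use a gevent lock than `threading.Lock`.

```python
import threading

# A single lock shared by all connections (hypothetical name).
decoder_lock = threading.Lock()


def decode_exclusively(decode, audio, block=True):
    """Run decode(audio) while holding the lock.

    With block=True the caller waits for its turn; with block=False the
    function returns None right away if another stream is being decoded.
    """
    if not decoder_lock.acquire(blocking=block):
        return None
    try:
        return decode(audio)
    finally:
        decoder_lock.release()
```

Waiting callers are served as the lock becomes free, which matches the "block until decoding is available" behavior described above.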
102 | 
103 | ## Client
104 | 
105 | ```
106 | λ py client.py
107 | Listening...
108 | Recognized: alpha bravo charlie
109 | Recognized: delta echo foxtrot
110 | ^C
111 | ```
112 | 
113 | ```
114 | λ py client.py -h
115 | usage: client.py [-h] [-s SERVER] [-a AGGRESSIVENESS] [--nospinner]
116 |                  [-w SAVEWAV]
117 | 
118 | Streams raw audio data from microphone with VAD to server via WebSocket
119 | 
120 | optional arguments:
121 |   -h, --help            show this help message and exit
122 |   -s SERVER, --server SERVER
123 |                         Default: ws://localhost:8080/recognize
124 |   -a AGGRESSIVENESS, --aggressiveness AGGRESSIVENESS
125 |                         Set aggressiveness of VAD: an integer between 0 and 3,
126 |                         0 being the least aggressive about filtering out non-
127 |                         speech, 3 the most aggressive. Default: 3
128 |   --nospinner           Disable spinner
129 |   -w SAVEWAV, --savewav SAVEWAV
130 |                         Save .wav files of utterences to given directory
131 | ```
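The `--savewav` option stores each utterance as a `.wav` file. The format involved can be sketched with the standard `wave` module, assuming the parameters used by the `Audio` class in `client.py` (mono, 16-bit samples, 16000 Hz). The helper `utterance_to_wav_bytes` is a hypothetical name, and it writes to memory instead of a directory so that it runs anywhere.

```python
import io
import wave


def utterance_to_wav_bytes(pcm, sample_rate=16000, channels=1, sample_width=2):
    """Wrap raw little-endian PCM bytes in a WAV container and return the result."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wf:
        wf.setnchannels(channels)
        wf.setsampwidth(sample_width)  # 2 bytes per sample for 16-bit audio
        wf.setframerate(sample_rate)
        wf.writeframes(pcm)
    return buf.getvalue()


# One 20 ms block of silence: 16000 Hz / 50 blocks per second = 320 samples.
one_block = b"\x00\x00" * 320
wav_bytes = utterance_to_wav_bytes(one_block)
```

Files written this way can be replayed later against a different model, which is the testing use case the option is meant for.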
132 | 
133 | ## Contributions
134 | 
135 | Pull requests welcome.
136 | 
137 | Contributors:
138 | * [@Zeddy913](https://github.com/Zeddy913)
139 | 


--------------------------------------------------------------------------------
/client.py:
--------------------------------------------------------------------------------
  1 | import time, logging
  2 | from datetime import datetime
  3 | import threading, collections, queue, os, os.path
  4 | import wave
  5 | import pyaudio
  6 | import webrtcvad
  7 | from lomond import WebSocket, events
  8 | from halo import Halo
  9 | 
 10 | logger = logging.getLogger(__name__)
 11 | logging.basicConfig(level=30,
 12 |     format="%(asctime)s.%(msecs)03d: %(name)s: %(levelname)s: %(funcName)s(): %(message)s",
 13 |     datefmt="%Y-%m-%d %p %I:%M:%S",
 14 |     )
 15 | logging.getLogger('lomond').setLevel(30)
 16 | 
 17 | 
 18 | ###############################################################################################################################################################
 19 | 
 20 | class Audio(object):
 21 |     """Streams raw audio from microphone. Data is received in a separate thread, and stored in a buffer, to be read from."""
 22 | 
 23 |     FORMAT = pyaudio.paInt16
 24 |     RATE = 16000
 25 |     CHANNELS = 1
 26 |     BLOCKS_PER_SECOND = 50
 27 | 
 28 |     def __init__(self, callback=None, buffer_s=0, flush_queue=True):
 29 |         def proxy_callback(in_data, frame_count, time_info, status):
 30 |             callback(in_data)
 31 |             return (None, pyaudio.paContinue)
 32 |         if callback is None: callback = lambda in_data: self.buffer_queue.put(in_data, block=False)
 33 |         self.sample_rate = self.RATE
 34 |         self.flush_queue = flush_queue
 35 |         self.buffer_queue = queue.Queue(maxsize=(buffer_s * 1000 // self.block_duration_ms))
 36 |         self.pa = pyaudio.PyAudio()
 37 |         self.stream = self.pa.open(format=self.FORMAT,
 38 |                                    channels=self.CHANNELS,
 39 |                                    rate=self.sample_rate,
 40 |                                    input=True,
 41 |                                    frames_per_buffer=self.block_size,
 42 |                                    stream_callback=proxy_callback)
 43 |         self.stream.start_stream()
 44 |         self.active = True
 45 | 
 46 |     def destroy(self):
 47 |         self.stream.stop_stream()
 48 |         self.stream.close()
 49 |         self.pa.terminate()
 50 |         self.active = False
 51 | 
 52 |     def read(self):
 53 |         """Return a block of audio data, blocking if necessary."""
 54 |         if self.active or (self.flush_queue and not self.buffer_queue.empty()):
 55 |             return self.buffer_queue.get()
 56 |         else:
 57 |             return None
 58 | 
 59 |     def read_loop(self, callback):
 60 |         """Read in a blocking loop, passing each block of audio data to callback."""
 61 |         for block in iter(self):
 62 |             callback(block)
 63 | 
 64 |     def __iter__(self):
 65 |         """Generator that yields all audio blocks from microphone."""
 66 |         while True:
 67 |             block = self.read()
 68 |             if block is None:
 69 |                 break
 70 |             yield block
 71 | 
 72 |     block_size = property(lambda self: int(self.sample_rate / float(self.BLOCKS_PER_SECOND)))
 73 |     block_duration_ms = property(lambda self: 1000 * self.block_size // self.sample_rate)
 74 | 
 75 |     def write_wav(self, filename, data):
 76 |         logger.info("write wav %s", filename)
 77 |         wf = wave.open(filename, 'wb')
 78 |         wf.setnchannels(self.CHANNELS)
 79 |         # wf.setsampwidth(self.pa.get_sample_size(FORMAT))
 80 |         assert self.FORMAT == pyaudio.paInt16
 81 |         wf.setsampwidth(2)
 82 |         wf.setframerate(self.sample_rate)
 83 |         wf.writeframes(data)
 84 |         wf.close()
 85 | 
 86 | 
 87 | ###############################################################################################################################################################
 88 | 
 89 | class VADAudio(Audio):
 90 |     """Filter & segment audio with voice activity detection."""
 91 | 
 92 |     def __init__(self, aggressiveness=3):
 93 |         super().__init__()
 94 |         self.vad = webrtcvad.Vad(aggressiveness)
 95 | 
 96 |     def vad_collector_simple(self, pre_padding_ms, blocks=None):
 97 |         if blocks is None: blocks = iter(self)
 98 |         num_padding_blocks = pre_padding_ms // self.block_duration_ms  # fixed: parameter is pre_padding_ms, not padding_ms
 99 |         buff = collections.deque(maxlen=num_padding_blocks)
100 |         triggered = False
101 | 
102 |         for block in blocks:
103 |             is_speech = self.vad.is_speech(block, self.sample_rate)
104 | 
105 |             if not triggered:
106 |                 if is_speech:
107 |                     triggered = True
108 |                     for f in buff:
109 |                         yield f
110 |                     buff.clear()
111 |                     yield block
112 |                 else:
113 |                     buff.append(block)
114 | 
115 |             else:
116 |                 if is_speech:
117 |                     yield block
118 |                 else:
119 |                     triggered = False
120 |                     yield None
121 |                     buff.append(block)
122 | 
123 |     def vad_collector(self, padding_ms=300, ratio=0.75, blocks=None):
124 |         """Generator that yields series of consecutive audio blocks comprising each utterance, separated by yielding a single None.
125 |             Determines voice activity by ratio of blocks in padding_ms. Uses a buffer to include padding_ms prior to being triggered.
126 |             Example: (block, ..., block, None, block, ..., block, None, ...)
127 |                       |---utterance---|        |---utterance---|
128 |         """
129 |         if blocks is None: blocks = iter(self)
130 |         num_padding_blocks = padding_ms // self.block_duration_ms
131 |         ring_buffer = collections.deque(maxlen=num_padding_blocks)
132 |         triggered = False
133 | 
134 |         for block in blocks:
135 |             is_speech = self.vad.is_speech(block, self.sample_rate)
136 | 
137 |             if not triggered:
138 |                 ring_buffer.append((block, is_speech))
139 |                 num_voiced = len([f for f, speech in ring_buffer if speech])
140 |                 if num_voiced > ratio * ring_buffer.maxlen:
141 |                     triggered = True
142 |                     for f, s in ring_buffer:
143 |                         yield f
144 |                     ring_buffer.clear()
145 | 
146 |             else:
147 |                 yield block
148 |                 ring_buffer.append((block, is_speech))
149 |                 num_unvoiced = len([f for f, speech in ring_buffer if not speech])
150 |                 if num_unvoiced > ratio * ring_buffer.maxlen:
151 |                     triggered = False
152 |                     yield None
153 |                     ring_buffer.clear()
154 | 
155 |     @classmethod
156 |     def test_vad(cls, aggressiveness):
157 |         self = cls(aggressiveness=aggressiveness)
158 |         blocks = iter(self)
159 |         for block in blocks:
160 |             is_speech = self.vad.is_speech(block, self.sample_rate)
161 |             print('|' if is_speech else '.', end='', flush=True)
162 | 
163 | 
164 | ###############################################################################################################################################################
165 | 
166 | ready = False
167 | 
168 | def print_output(*args):
169 |     if logger.isEnabledFor(logging.ERROR):
170 |         print(*args)
171 | 
172 | def audio_consumer(vad_audio, websocket):
173 |     """Blocking: send each utterance's audio blocks over the websocket, followed by an 'EOS' text message."""
174 |     spinner = None
175 |     if not ARGS.nospinner: spinner = Halo(spinner='line') # circleHalves point arc boxBounce2 bounce line
176 |     length_ms = 0
177 |     wav_data = bytearray()
178 | 
179 |     for block in vad_audio.vad_collector():
180 |         if ready and websocket.is_active:
181 |             if block is not None:
182 |                 if not length_ms:
183 |                     logging.debug("begin utterance")
184 |                 if spinner: spinner.start()
185 |                 logging.log(5, "sending block")
186 |                 websocket.send_binary(block)
187 |                 if ARGS.savewav: wav_data.extend(block)
188 |                 length_ms += vad_audio.block_duration_ms
189 | 
190 |             else:
191 |                 if spinner: spinner.stop()
192 |                 if not length_ms: raise RuntimeError("ended utterance without beginning")
193 |                 logging.debug("end utterance")
194 |                 if ARGS.savewav:
195 |                     vad_audio.write_wav(os.path.join(ARGS.savewav, datetime.now().strftime("savewav_%Y-%m-%d_%H-%M-%S_%f.wav")), wav_data)
196 |                     wav_data = bytearray()
197 |                 logging.info("sent audio length_ms: %d", length_ms)
198 |                 logging.log(5, "sending EOS")
199 |                 websocket.send_text('EOS')
200 |                 length_ms = 0
201 | 
202 | def websocket_runner(websocket):
203 |     """Blocking: dispatch incoming websocket events until the connection closes."""
204 | 
205 |     def on_event(event):
206 |         if isinstance(event, events.Ready):
207 |             global ready
208 |             if not ready:
209 |                 print_output("Connected!")
210 |             ready = True
211 |         elif isinstance(event, events.Text):
212 |             if 1: print_output("Recognized: %s" % event.text)
213 |         elif 1:
214 |             logging.debug(event)
215 | 
216 |     for event in websocket:
217 |         try:
218 |             on_event(event)
219 |         except Exception:
220 |             logger.exception('error handling %r', event)
221 |             websocket.close()
222 | 
223 | def main():
224 |     websocket = WebSocket(ARGS.server)
225 |     # TODO: compress?
226 |     print_output("Connecting to '%s'..." % websocket.url)
227 | 
228 |     vad_audio = VADAudio(aggressiveness=ARGS.aggressiveness)
229 |     print_output("Listening (ctrl-C to exit)...")
230 |     audio_consumer_thread = threading.Thread(target=lambda: audio_consumer(vad_audio, websocket))
231 |     audio_consumer_thread.start()
232 | 
233 |     websocket_runner(websocket)
234 | 
235 | 
236 | ###############################################################################################################################################################
237 | 
238 | def main_test():
239 |     if 0:
240 |         def consumer(self, blocks):
241 |             length_ms = 0
242 |             for block in blocks:
243 |                 if block is not None:
244 |                     print('|', end='', flush=True)
245 |                     length_ms += self.block_duration_ms
246 |                 else:
247 |                     print('.', end='', flush=True)
248 |                     length_ms = 0
249 |         VADAudio(consumer)
250 |     elif 1:
251 |         VADAudio.test_vad(3)
252 | 
253 | if __name__ == '__main__':
254 |     import argparse
255 |     parser = argparse.ArgumentParser(description="Streams raw audio data from microphone with VAD to server via WebSocket")
256 |     parser.add_argument('-s', '--server', default='ws://localhost:8080/recognize',
257 |         help="Default: ws://localhost:8080/recognize")
258 |     parser.add_argument('-a', '--aggressiveness', type=int, default=3,
259 |         help="Set aggressiveness of VAD: an integer between 0 and 3, 0 being the least aggressive about filtering out non-speech, 3 the most aggressive. Default: 3")
260 |     parser.add_argument('--nospinner', action='store_true',
261 |         help="Disable spinner")
262 |     parser.add_argument('-w', '--savewav',
263 |         help="Save .wav files of utterances to given directory. Example for current directory: -w .")
264 |     parser.add_argument('-v', '--verbose', action='store_true',
265 |         help="Print debugging info")
266 |     ARGS = parser.parse_args()
267 | 
268 |     if ARGS.verbose: logging.getLogger().setLevel(10)
269 |     if ARGS.savewav: os.makedirs(ARGS.savewav, exist_ok=True)
270 | 
271 |     if 0:
272 |         main_test()
273 |     else:
274 |         main()
275 | 
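
The ratio-based triggering in `VADAudio.vad_collector` above can be tried without a microphone or `webrtcvad`. The following is a minimal sketch, not part of client.py: `collect_utterances` and its `is_speech` callback are hypothetical names introduced here, and the buffer length is given directly in blocks rather than derived from `padding_ms`.

```python
import collections

def collect_utterances(blocks, is_speech, num_padding_blocks=4, ratio=0.75):
    """Yield the blocks of each utterance, then a single None.

    Hypothetical stand-in for VADAudio.vad_collector: is_speech is any
    callable returning True/False per block (assumption; the original
    asks webrtcvad instead).
    """
    ring = collections.deque(maxlen=num_padding_blocks)
    triggered = False
    for block in blocks:
        speech = is_speech(block)
        if not triggered:
            ring.append((block, speech))
            num_voiced = sum(1 for _, s in ring if s)
            # Start an utterance once enough of the recent blocks are voiced.
            if num_voiced > ratio * ring.maxlen:
                triggered = True
                for buffered, _ in ring:
                    yield buffered  # flush the buffered lead-in blocks
                ring.clear()
        else:
            yield block
            ring.append((block, speech))
            num_unvoiced = sum(1 for _, s in ring if not s)
            # End the utterance once enough of the recent blocks are silent.
            if num_unvoiced > ratio * ring.maxlen:
                triggered = False
                yield None
                ring.clear()

if __name__ == "__main__":
    # 'S' marks a voiced block and '.' a silent one (toy data).
    stream = list("..SSSS.....")
    print(list(collect_utterances(stream, lambda b: b == "S")))
```

With the default values the threshold is 3 of 4 blocks, so an utterance only starts or ends once all four buffered blocks agree. Trailing silence is therefore still sent until the buffer has filled with unvoiced blocks.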


--------------------------------------------------------------------------------
/requirements-client.txt:
--------------------------------------------------------------------------------
1 | lomond>=0.3.3
2 | pyaudio>=0.2.11
3 | webrtcvad>=2.0.10
4 | halo>=0.0.18
5 | 


--------------------------------------------------------------------------------
/requirements-server.txt:
--------------------------------------------------------------------------------
1 | numpy>=1.15.1
2 | bottle>=0.12.13
3 | bottle-websocket>=0.2.9
4 | 


--------------------------------------------------------------------------------
/server.py:
--------------------------------------------------------------------------------
  1 | import argparse, logging, os.path
  2 | from time import time
  3 | 
  4 | from bottle import get, run, template
  5 | from bottle.ext.websocket import GeventWebSocketServer
  6 | from bottle.ext.websocket import websocket
  7 | from gevent.lock import BoundedSemaphore
  8 | 
  9 | import deepspeech
 10 | import numpy as np
 11 | 
 12 | logger = logging.getLogger(__name__)
 13 | logging.basicConfig(level=20,
 14 |     format="%(asctime)s.%(msecs)03d: %(name)s: %(levelname)s: %(funcName)s(): %(message)s",
 15 |     datefmt="%Y-%m-%d %p %I:%M:%S",
 16 |     )
 17 | 
 18 | parser = argparse.ArgumentParser(description='')
 19 | parser.add_argument('-m', '--model', required=True,
 20 |                     help='Path to the model (protocol buffer binary file, or directory containing all files for model)')
 21 | parser.add_argument('-s', '--scorer', help='The path to the scorer that adds an (optional) external language model to deepspeech')
 22 | parser.add_argument('-a', '--alphabet', nargs='?', const='alphabet.txt',
 23 |                     help='Path to the configuration file specifying the alphabet used by the network. Default: alphabet.txt')
 24 | parser.add_argument('-l', '--lm', nargs='?', const='lm.binary',
 25 |                     help='Path to the language model binary file. Default: lm.binary')
 26 | parser.add_argument('-t', '--trie', nargs='?', const='trie',
 27 |                     help='Path to the language model trie file created with native_client/generate_trie. Default: trie')
 28 | parser.add_argument('--lw', type=float, default=1.5,
 29 |                     help='The alpha hyperparameter of the CTC decoder. Language Model weight. Default: 1.5')
 30 | parser.add_argument('--vwcw', type=float, default=2.25,
 31 |                     help='Valid word insertion weight. This is used to lessen the word insertion penalty when the inserted word is part of the vocabulary. Default: 2.25')
 32 | parser.add_argument('--bw', type=int, default=1024,
 33 |                     help='Beam width used in the CTC decoder when building candidate transcriptions. Default: 1024')
  34 | parser.add_argument('-p', '--port', type=int, default=8080,
 35 |                     help='Port to run server on. Default: 8080')
 36 | parser.add_argument('--debuglevel', default=20,
 37 |                     help='Debug logging level. Default: 20')
 38 | ARGS = parser.parse_args()
 39 | 
 40 | logging.getLogger().setLevel(int(ARGS.debuglevel))
 41 | 
 42 | gSem = BoundedSemaphore(1)  # Only one Deepspeech instance available at a time
 43 | 
 44 | if os.path.isdir(ARGS.model):
 45 |     model_dir = ARGS.model
 46 |     ARGS.model = os.path.join(model_dir, 'model.pbmm')
 47 | 
 48 | LM_WEIGHT = ARGS.lw
 49 | VALID_WORD_COUNT_WEIGHT = ARGS.vwcw
 50 | BEAM_WIDTH = ARGS.bw
 51 | 
 52 | print('Initializing model...')
 53 | logger.info("ARGS.model: %s", ARGS.model)
 54 | 
  55 | # code for DeepSpeech version 0.7 and above
 56 | model = deepspeech.Model(ARGS.model)
 57 | 
 58 | if ARGS.scorer:
 59 |     model.enableExternalScorer(ARGS.scorer)
 60 |     logger.info("ARGS.scorer: %s", ARGS.scorer)
 61 | 
 62 | if ARGS.lw and ARGS.vwcw:
 63 |     model.setScorerAlphaBeta(ARGS.lw, ARGS.vwcw)
 64 | 
 65 | if ARGS.bw:
 66 |     model.setBeamWidth(ARGS.bw)
 67 | 
 68 | @get('/recognize', apply=[websocket])
 69 | def recognize(ws):
 70 |     logger.debug("new websocket")
 71 |     start_time = None
 72 |     gSem_acquired = False
 73 | 
 74 |     while True:
 75 |         data = ws.receive()
 76 |         # logger.log(5, "got websocket data: %r", data)
 77 | 
 78 |         if isinstance(data, bytearray):
 79 |             # Receive stream data
 80 |             if not start_time:
 81 |                 # Start of stream (utterance)
 82 |                 start_time = time()
 83 |                 stream = model.createStream()
 84 |                 assert not gSem_acquired
 85 |                 # logger.debug("acquiring lock for deepspeech ...")
 86 |                 gSem.acquire(blocking=True)
 87 |                 gSem_acquired = True
 88 |                 # logger.debug("lock acquired")
 89 |             stream.feedAudioContent(np.frombuffer(data, np.int16))
 90 | 
 91 |         elif isinstance(data, str) and data == 'EOS':
 92 |             # End of stream (utterance)
 93 |             eos_time = time()
 94 |             text = stream.finishStream()
 95 |             logger.info("recognized: %r", text)
 96 |             logger.info("    time: total=%s post_eos=%s", time()-start_time, time()-eos_time)
 97 |             ws.send(text)
 98 |             # FIXME: handle ConnectionResetError & geventwebsocket.exceptions.WebSocketError
 99 |             # logger.debug("releasing lock ...")
100 |             gSem.release()
101 |             gSem_acquired = False
102 |             # logger.debug("lock released")
103 |             start_time = None
104 | 
105 |         else:
106 |             # Lost connection
107 |             logger.debug("dead websocket")
108 |             if gSem_acquired:
109 |                 # logger.debug("releasing lock ...")
110 |                 gSem.release()
111 |                 gSem_acquired = False
112 |                 # logger.debug("lock released")
113 |             break
114 | 
115 | @get('/')
116 | def index():
117 |     return template('index')
118 | 
119 | run(host='127.0.0.1', port=ARGS.port, server=GeventWebSocketServer)
120 | 
121 | # python server.py --model ../models/daanzu-30330/output_graph.pb --alphabet ../models/daanzu-30330/alphabet.txt --lm ../models/daanzu-30330/lm.binary --trie ../models/daanzu-30330/trie
122 | # python server.py --model ../models/daanzu-30330.2/output_graph.pb --alphabet ../models/daanzu-30330.2/alphabet.txt --lm ../models/daanzu-30330.2/lm.binary --trie ../models/daanzu-30330.2/trie
123 | 
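
The message protocol that `recognize()` in server.py handles (binary audio frames, then the text message `'EOS'`, with anything else treated as a lost connection) can be sketched without a websocket or a DeepSpeech model. This is only an illustration under assumptions: `run_session` and `FakeStream` are hypothetical names, the fake stream counts bytes instead of decoding speech, and unlike the original loop the sketch skips an `'EOS'` that arrives before any audio.

```python
def run_session(messages, make_stream):
    """Process messages the way the recognize() loop does; return the results.

    messages: iterable of bytes/bytearray (audio), 'EOS' (end of utterance),
    or anything else (treated as a closed connection).
    make_stream: hypothetical factory standing in for model.createStream.
    """
    results = []
    stream = None
    for data in messages:
        if isinstance(data, (bytes, bytearray)):
            if stream is None:
                stream = make_stream()  # first audio frame starts an utterance
            stream.feed(bytes(data))
        elif data == "EOS":
            if stream is None:
                continue  # assumption: ignore EOS with no preceding audio
            results.append(stream.finish())
            stream = None
        else:
            break  # e.g. None from a dead websocket
    return results

class FakeStream:
    """Hypothetical recognizer stream that only counts the bytes it is fed."""
    def __init__(self):
        self.num_bytes = 0
    def feed(self, chunk):
        self.num_bytes += len(chunk)
    def finish(self):
        return "%d bytes" % self.num_bytes

if __name__ == "__main__":
    msgs = [b"\x00\x01", bytearray(b"\x02\x03"), "EOS", b"\x00\x00", "EOS", None]
    print(run_session(msgs, FakeStream))
```

Keeping the per-utterance state in a single `stream` variable that is reset on `'EOS'` makes it easy to see where a lock such as `gSem` would be acquired and released in the real handler.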


--------------------------------------------------------------------------------