├── README.md ├── output └── hypotheses.txt ├── recordings.txt ├── recordings ├── 0001.wav ├── 0002.wav ├── 0003.wav ├── 0004.wav ├── 0005.wav ├── 0006.wav ├── 0007.wav ├── 0008.wav ├── 0009.wav └── 0010.wav ├── requirements.txt └── sttClient.py /README.md: -------------------------------------------------------------------------------- 1 | ## This sample has been deprecated. Please use the official [Watson Python SDK](https://github.com/watson-developer-cloud/python-sdk) 2 | 3 | ## Synopsis 4 | 5 | This project consists of a Python client that interacts with the IBM Watson Speech To Text service through its WebSockets interface. The client streams audio to the STT service and receives recognition hypotheses in real time. It can run N simultaneous recognition sessions. 6 | 7 | ## Installation 8 | 9 | This script depends on several packages that must be installed before it will work. It is advisable to install them in a separate virtual environment, because certain packages have been observed to conflict with the requirements of this script; in particular, the package nose conflicts with them. To interact with the STT service via WebSockets, install [pip](https://pip.readthedocs.org/en/1.1/installing.html), then run the following command: 10 | 11 | ` 12 | pip install -r requirements.txt 13 | ` 14 | 15 | You may also need to run this command: 16 | 17 | ` 18 | $ apt-get install build-essential python-dev 19 | ` 20 | 21 | If you are creating an environment using Anaconda, use the above pip command to install the packages; do not use conda to install the requirements, as conda will install nose as a dependency. 22 | 23 | ## Examples 24 | 25 | The example below will run the default 10 WAV files through the WebSockets interface of the Speech To Text (STT) service and will dump the recognition hypotheses to a file under the "./output" directory. 
26 | 27 | ` 28 | $ python ./sttClient.py -credentials : -model en-US_BroadbandModel 29 | ` 30 | 31 | The example below performs the same task much faster by opening 10 simultaneous recognition sessions (WebSocket connections) against the STT service. 32 | 33 | ` 34 | $ python ./sttClient.py -credentials : -model en-US_BroadbandModel -threads 10 35 | ` 36 | 37 | ## Options 38 | 39 | To see the list of available options type: 40 | 41 | ` 42 | $ python sttClient.py -h 43 | ` 44 | 45 | ## Motivation 46 | 47 | This script has been created by Daniel Bolanos in order to facilitate and promote the utilization of the IBM Watson Speech To Text service. 48 | 49 | 50 | 51 | 52 | 53 | -------------------------------------------------------------------------------- /output/hypotheses.txt: -------------------------------------------------------------------------------- 1 | 1: several tornadoes touch down as a line of severe thunderstorms swept through colorado on sunday 2 | 2: with one of the twisters hitting near a junior golf tournament 3 | 3: m. m. 
during a caddy 4 | 4: six of the tornado struck in northeast colorado 5 | 5: well to others hit in park county 6 | 6: in the center of the state 7 | 7: the national weather service said 8 | 8: at least three of them caused the damage 9 | 9: aurora fire department officials said a twister touched down near the blackstone country club 10 | 10: causing one minor injury and flipping an empty trailer 11 | -------------------------------------------------------------------------------- /recordings.txt: -------------------------------------------------------------------------------- 1 | ./recordings/0001.wav 2 | ./recordings/0002.wav 3 | ./recordings/0003.wav 4 | ./recordings/0004.wav 5 | ./recordings/0005.wav 6 | ./recordings/0006.wav 7 | ./recordings/0007.wav 8 | ./recordings/0008.wav 9 | ./recordings/0009.wav 10 | ./recordings/0010.wav 11 | -------------------------------------------------------------------------------- /recordings/0001.wav: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/watson-developer-cloud/speech-to-text-websockets-python/f34493bfb323917420ba9ca4c48cbec20d4d826e/recordings/0001.wav -------------------------------------------------------------------------------- /recordings/0002.wav: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/watson-developer-cloud/speech-to-text-websockets-python/f34493bfb323917420ba9ca4c48cbec20d4d826e/recordings/0002.wav -------------------------------------------------------------------------------- /recordings/0003.wav: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/watson-developer-cloud/speech-to-text-websockets-python/f34493bfb323917420ba9ca4c48cbec20d4d826e/recordings/0003.wav -------------------------------------------------------------------------------- /recordings/0004.wav: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/watson-developer-cloud/speech-to-text-websockets-python/f34493bfb323917420ba9ca4c48cbec20d4d826e/recordings/0004.wav -------------------------------------------------------------------------------- /recordings/0005.wav: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/watson-developer-cloud/speech-to-text-websockets-python/f34493bfb323917420ba9ca4c48cbec20d4d826e/recordings/0005.wav -------------------------------------------------------------------------------- /recordings/0006.wav: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/watson-developer-cloud/speech-to-text-websockets-python/f34493bfb323917420ba9ca4c48cbec20d4d826e/recordings/0006.wav -------------------------------------------------------------------------------- /recordings/0007.wav: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/watson-developer-cloud/speech-to-text-websockets-python/f34493bfb323917420ba9ca4c48cbec20d4d826e/recordings/0007.wav -------------------------------------------------------------------------------- /recordings/0008.wav: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/watson-developer-cloud/speech-to-text-websockets-python/f34493bfb323917420ba9ca4c48cbec20d4d826e/recordings/0008.wav -------------------------------------------------------------------------------- /recordings/0009.wav: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/watson-developer-cloud/speech-to-text-websockets-python/f34493bfb323917420ba9ca4c48cbec20d4d826e/recordings/0009.wav -------------------------------------------------------------------------------- 
/recordings/0010.wav: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/watson-developer-cloud/speech-to-text-websockets-python/f34493bfb323917420ba9ca4c48cbec20d4d826e/recordings/0010.wav -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | autobahn>=0.10.9 2 | pyOpenSSL>=0.13.1 3 | requests>=2.8.1 4 | Twisted>=13.2.0 5 | txaio>=2.0.4 6 | service_identity>=16.0.0 7 | -------------------------------------------------------------------------------- /sttClient.py: -------------------------------------------------------------------------------- 1 | # 2 | # Copyright IBM Corp. 2014 3 | # 4 | # Licensed under the Apache License, Version 2.0 (the "License"); 5 | # you may not use this file except in compliance with the License. 6 | # You may obtain a copy of the License at 7 | # 8 | # http://www.apache.org/licenses/LICENSE-2.0 9 | # 10 | # Unless required by applicable law or agreed to in writing, software 11 | # distributed under the License is distributed on an "AS IS" BASIS, 12 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 13 | # See the License for the specific language governing permissions and 14 | # limitations under the License. 
15 | # 16 | 17 | # Author: Daniel Bolanos 18 | # Date: 2015 19 | 20 | # coding=utf-8 21 | import json # json 22 | import threading # multi threading 23 | import os # for listing directories 24 | import queue as Queue # queue used for thread synchronization 25 | import sys # system calls 26 | import argparse # for parsing arguments 27 | import base64 # necessary to encode in base64 28 | # # according to the RFC2045 standard 29 | import requests # python HTTP requests library 30 | 31 | # WebSockets 32 | from autobahn.twisted.websocket import WebSocketClientProtocol, \ 33 | WebSocketClientFactory, connectWS 34 | from twisted.python import log 35 | from twisted.internet import ssl, reactor 36 | 37 | try: 38 | raw_input # Python 2 39 | except NameError: 40 | raw_input = input # Python 3 41 | 42 | 43 | class Utils: 44 | 45 | @staticmethod 46 | def getAuthenticationToken(hostname, serviceName, username, password): 47 | 48 | fmt = "{0}/authorization/api/v1/token?url={0}/{1}/api" 49 | uri = fmt.format(hostname, serviceName) 50 | uri = uri.replace("wss://", "https://").replace("ws://", "https://") 51 | print(uri) 52 | auth = (username, password) 53 | headers = {'Accept': 'application/json'} 54 | resp = requests.get(uri, auth=auth, verify=False, headers=headers, 55 | timeout=(30, 30)) 56 | print(resp.text) 57 | jsonObject = resp.json() 58 | return jsonObject['token'] 59 | 60 | 61 | class WSInterfaceFactory(WebSocketClientFactory): 62 | 63 | def __init__(self, queue, summary, dirOutput, contentType, model, 64 | url=None, headers=None, debug=None): 65 | 66 | WebSocketClientFactory.__init__(self, url=url, headers=headers) 67 | self.queue = queue 68 | self.summary = summary 69 | self.dirOutput = dirOutput 70 | self.contentType = contentType 71 | self.model = model 72 | self.queueProto = Queue.Queue() 73 | 74 | self.openHandshakeTimeout = 10 75 | self.closeHandshakeTimeout = 10 76 | 77 | # start the thread that takes care of ending the reactor so 78 | # the script can 
finish automatically (without ctrl+c) 79 | endingThread = threading.Thread(target=self.endReactor, args=()) 80 | endingThread.daemon = True 81 | endingThread.start() 82 | 83 | def prepareUtterance(self): 84 | 85 | try: 86 | utt = self.queue.get_nowait() 87 | self.queueProto.put(utt) 88 | return True 89 | except Queue.Empty: 90 | print("prepareUtterance: no more utterances to process, queue is " 91 | "empty!") 92 | return False 93 | 94 | def endReactor(self): 95 | 96 | self.queue.join() 97 | print("about to stop the reactor!") 98 | reactor.callFromThread(reactor.stop) # runs in a helper thread, so stop the reactor thread-safely 99 | 100 | # this function gets called every time connectWS is called (once 101 | # per WebSocket connection/session) 102 | def buildProtocol(self, addr): 103 | 104 | try: 105 | utt = self.queueProto.get_nowait() 106 | proto = WSInterfaceProtocol(self, self.queue, self.summary, 107 | self.dirOutput, self.contentType) 108 | proto.setUtterance(utt) 109 | return proto 110 | except Queue.Empty: 111 | print("queue should not be empty, otherwise this function should " 112 | "not have been called") 113 | return None 114 | 115 | 116 | # WebSockets interface to the STT service 117 | # 118 | # note: an object of this class is created for each WebSocket 119 | # connection, every time we call connectWS 120 | class WSInterfaceProtocol(WebSocketClientProtocol): 121 | 122 | def __init__(self, factory, queue, summary, dirOutput, contentType): 123 | self.factory = factory 124 | self.queue = queue 125 | self.summary = summary 126 | self.dirOutput = dirOutput 127 | self.contentType = contentType 128 | self.packetRate = 20 129 | self.listeningMessages = 0 130 | self.timeFirstInterim = -1 131 | self.bytesSent = 0 132 | self.chunkSize = 2000 # in bytes 133 | super(WSInterfaceProtocol, self).__init__() 134 | print(dirOutput) 135 | print("contentType: {} queueSize: {}".format(self.contentType, 136 | self.queue.qsize())) 137 | 138 | def setUtterance(self, utt): 139 | 140 | self.uttNumber = utt[0] 141 | self.uttFilename = utt[1] 142 | 
self.summary[self.uttNumber] = {"hypothesis": "", 143 | "status": {"code": "", "reason": ""}} 144 | self.fileJson = "{}/{}.json.txt".format(self.dirOutput, self.uttNumber) 145 | try: 146 | os.remove(self.fileJson) 147 | except OSError: 148 | pass 149 | 150 | # helper method that sends a chunk of audio if needed (as required 151 | # by the specified pacing) 152 | def maybeSendChunk(self, data): 153 | 154 | def sendChunk(chunk, final=False): 155 | self.bytesSent += len(chunk) 156 | self.sendMessage(chunk, isBinary=True) 157 | if final: 158 | self.sendMessage(b'', isBinary=True) 159 | 160 | if (self.bytesSent + self.chunkSize >= len(data)): 161 | if (len(data) > self.bytesSent): 162 | sendChunk(data[self.bytesSent:len(data)], True) 163 | return 164 | sendChunk(data[self.bytesSent:self.bytesSent + self.chunkSize]) 165 | self.factory.reactor.callLater(0.01, self.maybeSendChunk, data=data) 166 | return 167 | 168 | def onConnect(self, response): 169 | print("onConnect, server connected: {}".format(response.peer)) 170 | 171 | def onOpen(self): 172 | print("onOpen") 173 | data = {"action": "start", 174 | "content-type": str(self.contentType), 175 | "continuous": True, 176 | "interim_results": True, 177 | "inactivity_timeout": 600, 178 | 'max_alternatives': 3, 179 | 'timestamps': True, 180 | 'word_confidence': True} 181 | print("sendMessage(init)") 182 | # send the initialization parameters 183 | self.sendMessage(json.dumps(data).encode('utf8')) 184 | 185 | # start sending audio right away (it will get buffered in the 186 | # STT service) 187 | print(self.uttFilename) 188 | with open(str(self.uttFilename), 'rb') as f: 189 | self.bytesSent = 0 190 | dataFile = f.read() 191 | self.maybeSendChunk(dataFile) 192 | print("onOpen ends") 193 | 194 | def onMessage(self, payload, isBinary): 195 | 196 | if isBinary: 197 | print("Binary message received: {0} bytes".format(len(payload))) 198 | else: 199 | print(u"Text message received: {0}".format(payload.decode('utf8'))) 200 | 201 
| # if uninitialized, receive the initialization response 202 | # from the server 203 | jsonObject = json.loads(payload.decode('utf8')) 204 | if 'state' in jsonObject: 205 | self.listeningMessages += 1 206 | if self.listeningMessages == 2: 207 | print("sending close 1000") 208 | # close the connection 209 | self.sendClose(1000) 210 | 211 | # if in streaming 212 | elif 'results' in jsonObject: 213 | jsonObject = json.loads(payload.decode('utf8')) 214 | hypothesis = "" 215 | # empty hypothesis 216 | if len(jsonObject['results']) == 0: 217 | print("empty hypothesis!") 218 | # regular hypothesis 219 | else: 220 | # dump the message to the output directory 221 | jsonObject = json.loads(payload.decode('utf8')) 222 | with open(self.fileJson, "a") as f: 223 | f.write(json.dumps(jsonObject, indent=4, 224 | sort_keys=True)) 225 | 226 | res = jsonObject['results'][0] 227 | hypothesis = res['alternatives'][0]['transcript'] 228 | bFinal = (res['final'] is True) 229 | if bFinal: 230 | print('final hypothesis: "' + hypothesis + '"') 231 | self.summary[self.uttNumber]['hypothesis'] += hypothesis 232 | else: 233 | print('interim hyp: "' + hypothesis + '"') 234 | 235 | def onClose(self, wasClean, code, reason): 236 | 237 | print("onClose") 238 | print("WebSocket connection closed, code: {1}, clean: {2}, " 239 | "reason: {0}".format(reason, code, wasClean)) 240 | self.summary[self.uttNumber]['status']['code'] = code 241 | self.summary[self.uttNumber]['status']['reason'] = reason 242 | 243 | # create a new WebSocket connection if there are still 244 | # utterances in the queue that need to be processed 245 | self.queue.task_done() 246 | 247 | if not self.factory.prepareUtterance(): 248 | return 249 | 250 | # SSL client context: default 251 | if self.factory.isSecure: 252 | contextFactory = ssl.ClientContextFactory() 253 | else: 254 | contextFactory = None 255 | connectWS(self.factory, contextFactory) 256 | 257 | 258 | # function to check that a value is a positive integer 259 | 
def check_positive_int(value): 260 | ivalue = int(value) 261 | if ivalue < 1: 262 | raise argparse.ArgumentTypeError( 263 | '"%s" is an invalid positive int value' % value) 264 | return ivalue 265 | 266 | 267 | # function to check the credentials format 268 | def check_credentials(credentials): 269 | elements = credentials.split(":", 1) # split once so the password may contain ':' 270 | if len(elements) == 2: 271 | return elements 272 | else: 273 | raise argparse.ArgumentTypeError( 274 | '"%s" is not a valid format for the credentials' % credentials) 275 | 276 | 277 | if __name__ == '__main__': 278 | 279 | # parse command line parameters 280 | parser = argparse.ArgumentParser( 281 | description=('client to do speech recognition using the WebSocket ' 282 | 'interface to the Watson STT service')) 283 | parser.add_argument( 284 | '-credentials', action='store', dest='credentials', 285 | help="Basic Authentication credentials in the form 'username:password'", 286 | required=True, type=check_credentials) 287 | parser.add_argument( 288 | '-in', action='store', dest='fileInput', default='./recordings.txt', 289 | help='text file containing audio files') 290 | parser.add_argument( 291 | '-out', action='store', dest='dirOutput', default='./output', 292 | help='output directory') 293 | parser.add_argument( 294 | '-type', action='store', dest='contentType', default='audio/wav', 295 | help='audio content type, for example: \'audio/l16; rate=44100\'') 296 | parser.add_argument( 297 | '-model', action='store', dest='model', default='en-US_BroadbandModel', 298 | help='STT model that will be used') 299 | parser.add_argument( 300 | '-amcustom', action='store', dest='am_custom_id', default=None, 301 | help='id of the acoustic model customization that will be used', required=False) 302 | parser.add_argument( 303 | '-lmcustom', action='store', dest='lm_custom_id', default=None, 304 | help='id of the language model customization that will be used', required=False) 305 | parser.add_argument( 306 | '-threads', action='store', 
dest='threads', default=1, 307 | help='number of simultaneous STT sessions', type=check_positive_int) 308 | parser.add_argument( 309 | '-optout', action='store_true', dest='optOut', 310 | help=('specify opt-out header so user data, such as speech and ' 311 | 'hypotheses are not logged into the server')) 312 | parser.add_argument( 313 | '-tokenauth', action='store_true', dest='tokenauth', 314 | help='use token based authentication') 315 | 316 | args = parser.parse_args() 317 | 318 | # create output directory if necessary 319 | if os.path.isdir(args.dirOutput): 320 | fmt = 'the output directory "{}" already exists, overwrite? (y/n)? ' 321 | while True: 322 | answer = raw_input(fmt.format(args.dirOutput)).strip().lower() 323 | if answer == "n": 324 | sys.stderr.write("exiting...\n") 325 | sys.exit() 326 | elif answer == "y": 327 | break 328 | else: 329 | os.makedirs(args.dirOutput) 330 | 331 | # logging 332 | log.startLogging(sys.stdout) 333 | 334 | # add audio files to the processing queue 335 | q = Queue.Queue() 336 | lines = [line.rstrip('\n') for line in open(args.fileInput)] 337 | fileNumber = 0 338 | for fileName in lines: 339 | print(fileName) 340 | q.put((fileNumber, fileName)) 341 | fileNumber += 1 342 | 343 | hostname = "stream.watsonplatform.net" 344 | headers = {'X-WDC-PL-OPT-OUT': '1'} if args.optOut else {} 345 | 346 | # authentication header 347 | if args.tokenauth: 348 | headers['X-Watson-Authorization-Token'] = ( 349 | Utils.getAuthenticationToken('https://' + hostname, 350 | 'speech-to-text', 351 | args.credentials[0], 352 | args.credentials[1])) 353 | else: 354 | auth = args.credentials[0] + ":" + args.credentials[1] 355 | headers["Authorization"] = "Basic " + base64.b64encode(auth.encode()).decode('utf-8') 356 | 357 | print(headers) 358 | # create a WS server factory with our protocol 359 | fmt = "wss://{}/speech-to-text/api/v1/recognize?model={}" 360 | url = fmt.format(hostname, args.model) 361 | if args.am_custom_id is not None: 362 | url += 
"&acoustic_customization_id=" + args.am_custom_id 363 | if args.lm_custom_id is not None: 364 | url += "&customization_id=" + args.lm_custom_id 365 | print(url) 366 | summary = {} 367 | factory = WSInterfaceFactory(q, summary, args.dirOutput, args.contentType, 368 | args.model, url, headers, debug=False) 369 | factory.protocol = WSInterfaceProtocol 370 | 371 | for i in range(min(int(args.threads), q.qsize())): 372 | 373 | factory.prepareUtterance() 374 | 375 | # SSL client context: default 376 | if factory.isSecure: 377 | contextFactory = ssl.ClientContextFactory() 378 | else: 379 | contextFactory = None 380 | connectWS(factory, contextFactory) 381 | 382 | reactor.run() 383 | 384 | # dump the hypotheses to the output file 385 | fileHypotheses = args.dirOutput + "/hypotheses.txt" 386 | f = open(fileHypotheses, "w") 387 | successful = 0 388 | emptyHypotheses = 0 389 | print(sorted(summary.items())) 390 | counter = 0 391 | for key, value in enumerate(sorted(summary.items())): 392 | value = value[1] 393 | if value['status']['code'] == 1000: 394 | print('{}: {} {}'.format(key, value['status']['code'], 395 | value['hypothesis'])) 396 | successful += 1 397 | if value['hypothesis'] == "": 398 | emptyHypotheses += 1 399 | else: 400 | fmt = '{}: {status[code]} REASON: {status[reason]}' 401 | print(fmt.format(key, **value)) 402 | f.write('{}: {}\n'.format(counter + 1, value['hypothesis'])) 403 | counter += 1 404 | f.close() 405 | fmt = "successful sessions: {} ({} errors) ({} empty hypotheses)" 406 | print(fmt.format(successful, len(summary) - successful, emptyHypotheses)) 407 | --------------------------------------------------------------------------------