├── LICENSE
├── README.md
├── attack.py
├── classify.py
├── docker
│   ├── aae_deepspeech_041_cpu.dockerfile
│   └── aae_deepspeech_041_gpu.dockerfile
├── ds_ctcdecoder-0.4.1-cp35-cp35m-linux_x86_64.whl
├── filterbanks.npy
├── sample-000000.wav
├── setup.sh
├── test_setup.sh
├── tf_logits.py
└── xdg.py

--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
Copyright (c) 2017 Nicholas Carlini

LICENSE

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:

1. Redistributions of source code must retain the above copyright notice, this
list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright notice,
this list of conditions and the following disclaimer in the documentation
and/or other materials provided with the distribution.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR
ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
(INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# Audio Adversarial Examples
This is the code corresponding to the paper
"Audio Adversarial Examples: Targeted Attacks on Speech-to-Text"
Nicholas Carlini and David Wagner
https://arxiv.org/abs/1801.01944

To generate adversarial examples for your own files, follow the process below
and modify the arguments to attack.py. Ensure that the file is sampled at
16kHz and uses signed 16-bit ints as the data type. You may want to modify
the number of iterations that the attack algorithm is allowed to run.
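If you are not sure whether a file meets these requirements, here is a quick
check using scipy (which this project already depends on); the filename is a
placeholder:

```
import scipy.io.wavfile as wav
fs, audio = wav.read("my-input.wav")
assert fs == 16000, "expected a 16kHz sample rate, got %d" % fs
assert audio.dtype == "int16", "expected signed 16-bit samples, got %s" % audio.dtype
```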
WARNING: THIS IS NOT THE CODE USED IN THE PAPER. If you just want to get going
generating adversarial examples on audio then proceed as described below.

The current master branch points to code which will run on TensorFlow 1.14 and
DeepSpeech 0.4.1, an almost-recent version of the dependencies. (Large portions
of tf_logits.py will need to be rewritten to run on DeepSpeech 0.5.1, which uses
a new feature extraction pipeline backed by TensorFlow's C++ implementation. If
you feel motivated to do that I would gladly accept a PR.)

However, IF YOU ARE TRYING TO REPRODUCE THE PAPER (or just have decided
that you enjoy pain and want to suffer through dependency hell) then you
will have to check out commit a8d5f675ac8659072732d3de2152411f07c7aa3a and
follow the README from there.

There are two ways to install this project. The first is to just use Docker
with a buildfile provided by Tom Doerr. It works. The second is to try to
set up everything on your machine directly. This might work, if you happen
to have the right versions of things.


# Docker Installation (highly recommended)

These Docker instructions were kindly provided by Tom Doerr, and are simple to follow if you have Docker set up.


1. Install Docker.
On Ubuntu/Debian/Linux-Mint etc.:
```
sudo apt-get install docker.io
sudo systemctl enable --now docker
```
Instructions for other platforms:
https://docs.docker.com/install/


2. Download DeepSpeech and build the Docker images:
```
./setup.sh
```

### With Nvidia-GPU support:
3. Install the NVIDIA Container Toolkit.
This step will only work on Linux and is only necessary if you want GPU support.
As far as I know it's not possible to use a GPU with Docker under Windows/Mac.
On Ubuntu/Debian/Linux-Mint etc. you can install the toolkit with the following commands:
```sh
# Add the package repositories
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit
sudo systemctl restart docker
```
Instructions for other platforms (CentOS/RHEL):
https://github.com/NVIDIA/nvidia-docker

4. Start the container using the GPU image we just built:
```
docker run --gpus all -it --mount src=$(pwd),target=/audio_adversarial_examples,type=bind -w /audio_adversarial_examples aae_deepspeech_041_gpu
```

### CPU-only (Skip if already started with Nvidia-GPU support):
4. Start the container using the CPU image we just built:
```
docker run -it --mount src=$(pwd),target=/audio_adversarial_examples,type=bind -w /audio_adversarial_examples aae_deepspeech_041_cpu
```


### Test Setup
5. Check that you can classify normal audio correctly:
```
python3 classify.py --in sample-000000.wav --restore_path deepspeech-0.4.1-checkpoint/model.v0.4.1
```

6. Generate adversarial examples:
```
python3 attack.py --in sample-000000.wav --target "this is a test" --out adv.wav --iterations 1000 --restore_path deepspeech-0.4.1-checkpoint/model.v0.4.1
```

7. Verify the attack succeeded:
```
python3 classify.py --in adv.wav --restore_path deepspeech-0.4.1-checkpoint/model.v0.4.1
```

## Docker Hub
The Docker images are available on Docker Hub.

CPU-Version: `tomdoerr/aae_deepspeech_041_cpu`

GPU-Version: `tomdoerr/aae_deepspeech_041_gpu`
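If you would rather not build the images yourself, you can pull one of these
and start it with the same bind-mount flags as above, e.g. for the CPU image:

```
docker pull tomdoerr/aae_deepspeech_041_cpu
docker run -it --mount src=$(pwd),target=/audio_adversarial_examples,type=bind -w /audio_adversarial_examples tomdoerr/aae_deepspeech_041_cpu
```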


# Direct Install

These are the original instructions from earlier. They will work, but require
installing everything manually.


1. Install the dependencies
```
pip3 install tensorflow-gpu==1.14 progressbar numpy scipy pandas python_speech_features tables attrdict pyxdg
pip3 install $(python3 util/taskcluster.py --decoder)
```

Download and install
https://git-lfs.github.com/

1b. Make sure you have installed Git LFS. Otherwise later steps will mysteriously fail.

2. Clone the Mozilla DeepSpeech repository into a folder called DeepSpeech:
```
git clone https://github.com/mozilla/DeepSpeech.git
```

2b. Check out the correct version of the code:
```
(cd DeepSpeech; git checkout tags/v0.4.1)
```

2c. If you get an error with tflite_convert, comment out line 21 of DeepSpeech.py:
```
from tensorflow.contrib.lite.python import tflite_convert
```

3. Download the DeepSpeech model

```
wget https://github.com/mozilla/DeepSpeech/releases/download/v0.4.1/deepspeech-0.4.1-checkpoint.tar.gz
tar -xzf deepspeech-0.4.1-checkpoint.tar.gz
```

4. Verify that you have a file deepspeech-0.4.1-checkpoint/model.v0.4.1.data-00000-of-00001
Its MD5 sum should be
```
ca825ad95066b10f5e080db8cb24b165
```

5. Check that you can classify normal audio correctly
```
python3 classify.py --in sample-000000.wav --restore_path deepspeech-0.4.1-checkpoint/model.v0.4.1
```

6. Generate adversarial examples
```
python3 attack.py --in sample-000000.wav --target "this is a test" --out adv.wav --iterations 1000 --restore_path deepspeech-0.4.1-checkpoint/model.v0.4.1
```

7. Verify the attack succeeded
```
python3 classify.py --in adv.wav --restore_path deepspeech-0.4.1-checkpoint/model.v0.4.1
```
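A few attack.py options are not shown above: `--in` accepts several files at
once (pass a matching number of `--out` paths, or a single `--outprefix`),
`--finetune` starts from existing adversarial examples, and `--mp3` generates
MP3-compression-resistant examples (this requires pydub, and the result is
written to the first `--out` path). For example, with placeholder file names:

```
python3 attack.py --in first.wav second.wav --target "this is a test" --outprefix adv --iterations 1000 --restore_path deepspeech-0.4.1-checkpoint/model.v0.4.1
```

This writes the adversarial examples to adv0.wav and adv1.wav.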
--------------------------------------------------------------------------------
/attack.py:
--------------------------------------------------------------------------------
## attack.py -- generate audio adversarial examples
##
## Copyright (C) 2017, Nicholas Carlini.
##
## This program is licensed under the BSD 2-Clause license,
## contained in the LICENSE file in this directory.

import numpy as np
import tensorflow as tf
import argparse
from shutil import copyfile

import scipy.io.wavfile as wav

import struct
import time
import os
import sys
from collections import namedtuple
sys.path.append("DeepSpeech")

try:
    import pydub
except ImportError:
    print("pydub was not loaded, MP3 compression will not work")

import DeepSpeech

from tensorflow.python.keras.backend import ctc_label_dense_to_sparse
from tf_logits import get_logits

# These are the tokens that we're allowed to use.
# The - token is special and corresponds to the epsilon
# value in CTC decoding, and cannot occur in the phrase.
toks = " abcdefghijklmnopqrstuvwxyz'-"

def convert_mp3(new, lengths):
    import pydub
    wav.write("/tmp/load.wav", 16000,
              np.array(np.clip(np.round(new[0][:lengths[0]]),
                               -2**15, 2**15-1), dtype=np.int16))
    pydub.AudioSegment.from_wav("/tmp/load.wav").export("/tmp/saved.mp3")
    raw = pydub.AudioSegment.from_mp3("/tmp/saved.mp3")
    mp3ed = np.array([struct.unpack("<h", raw.raw_data[i:i+2])[0]
                      for i in range(0, len(raw.raw_data), 2)])[np.newaxis, :lengths[0]]
    return mp3ed


class Attack:
    def __init__(self, sess, loss_fn, phrase_length, max_audio_len,
                 learning_rate=10, num_iterations=5000, batch_size=1,
                 mp3=False, l2penalty=float('inf'), restore_path=None):
        """
        Set up the attack procedure.

        Here we create the TF graph that we're going to use to
        actually generate the adversarial examples.
        """

        self.sess = sess
        self.learning_rate = learning_rate
        self.num_iterations = num_iterations
        self.batch_size = batch_size
        self.phrase_length = phrase_length
        self.max_audio_len = max_audio_len
        self.mp3 = mp3
        self.loss_fn = loss_fn

        # Create all the variables necessary. They are prefixed with
        # qq_ so that we know which ones are ours and don't clobber
        # them when we restore the DeepSpeech checkpoint below.
        self.delta = delta = tf.Variable(np.zeros((batch_size, max_audio_len), dtype=np.float32), name='qq_delta')
        self.mask = mask = tf.Variable(np.zeros((batch_size, max_audio_len), dtype=np.float32), name='qq_mask')
        self.original = original = tf.Variable(np.zeros((batch_size, max_audio_len), dtype=np.float32), name='qq_original')
        self.lengths = lengths = tf.Variable(np.zeros(batch_size, dtype=np.int32), name='qq_lengths')
        self.target_phrase = tf.Variable(np.zeros((batch_size, phrase_length), dtype=np.int32), name='qq_phrase')
        self.target_phrase_lengths = tf.Variable(np.zeros((batch_size), dtype=np.int32), name='qq_phrase_lengths')
        self.rescale = tf.Variable(np.ones((batch_size, 1), dtype=np.float32), name='qq_rescale')

        # Initially we bound the l_infinity norm of the perturbation by
        # 2000; the rescale variable shrinks this bound over time.
        self.apply_delta = tf.clip_by_value(delta, -2000, 2000)*self.rescale

        # The new input is the perturbation (masked so that the padding
        # beyond each clip's length stays zero) plus the original audio.
        self.new_input = new_input = self.apply_delta*mask + original

        # Add a tiny bit of noise so the adversarial example remains
        # effective after rounding to 16-bit integers.
        noise = tf.random_normal(new_input.shape, stddev=2)
        pass_in = tf.clip_by_value(new_input + noise, -2**15, 2**15-1)

        # Feed the perturbed input through DeepSpeech to get the logits.
        self.logits = logits = get_logits(pass_in, lengths)

        # And restore the DeepSpeech checkpoint (everything except our
        # own qq_ variables).
        saver = tf.train.Saver([x for x in tf.global_variables() if 'qq' not in x.name])
        saver.restore(sess, restore_path)

        if loss_fn == "CTC":
            target = ctc_label_dense_to_sparse(self.target_phrase, self.target_phrase_lengths)
            ctcloss = tf.nn.ctc_loss(labels=tf.cast(target, tf.int32),
                                     inputs=logits, sequence_length=lengths)

            # An infinite l2 penalty means that we don't penalize the
            # l2 distortion at all and minimize the CTC loss only.
            if not np.isinf(l2penalty):
                loss = tf.reduce_mean((new_input - original)**2, axis=1) + l2penalty*ctcloss
            else:
                loss = ctcloss
            self.expanded_loss = tf.constant(0)
        else:
            raise NotImplementedError("Only CTC loss is currently supported")

        self.loss = loss
        self.ctcloss = ctcloss

        # Set up Adam to do the optimization, on the delta variable only.
        start_vars = set(x.name for x in tf.global_variables())
        optimizer = tf.train.AdamOptimizer(learning_rate)
        grad, var = optimizer.compute_gradients(self.loss, [delta])[0]
        self.train = optimizer.apply_gradients([(tf.sign(grad), var)])

        end_vars = tf.global_variables()
        new_vars = [x for x in end_vars if x.name not in start_vars]
        sess.run(tf.variables_initializer(new_vars + [delta]))

        # A beam search decoder over the logits, to check how we're doing.
        self.decoded, _ = tf.nn.ctc_beam_search_decoder(logits, lengths,
                                                        merge_repeated=False,
                                                        beam_width=100)

    def attack(self, audio, lengths, target, finetune=None):
        sess = self.sess

        # Initialize all of the variables for this batch of inputs.
        sess.run(tf.variables_initializer([self.delta]))
        sess.run(self.original.assign(np.array(audio)))
        sess.run(self.lengths.assign((np.array(lengths)-1)//320))
        sess.run(self.mask.assign(np.array([[1 if i < l else 0 for i in range(self.max_audio_len)] for l in lengths])))
        sess.run(self.target_phrase_lengths.assign(np.array([len(x) for x in target])))
        sess.run(self.target_phrase.assign(np.array([list(t)+[0]*(self.phrase_length-len(t)) for t in target])))
        sess.run(self.rescale.assign(np.ones((self.batch_size, 1))))

        # Here we'll keep track of the best solution we've found so far.
        final_deltas = [None]*self.batch_size

        if finetune is not None and len(finetune) > 0:
            sess.run(self.delta.assign(finetune-audio))

        # We'll make a bunch of iterations of gradient descent here
        now = time.time()
        MAX = self.num_iterations
        for i in range(MAX):
            iteration = i
            now = time.time()

            # Print out some debug information every 10 iterations.
            if i%10 == 0:
                new, delta, r_out, r_logits = sess.run((self.new_input, self.delta, self.decoded, self.logits))
                lst = [(r_out, r_logits)]
                if self.mp3:
                    mp3ed = convert_mp3(new, lengths)
                    mp3_out, mp3_logits = sess.run((self.decoded, self.logits),
                                                   {self.new_input: mp3ed})
                    lst.append((mp3_out, mp3_logits))

                for out, logits in lst:
                    chars = out[0].values

                    res = np.zeros(out[0].dense_shape)+len(toks)-1

                    for ii in range(len(out[0].values)):
                        x, y = out[0].indices[ii]
                        res[x, y] = out[0].values[ii]

                    # Here we print the strings that are recognized.
                    res = ["".join(toks[int(x)] for x in y).replace("-", "") for y in res]
                    print("\n".join(res))

                    # And here we print the argmax of the alignment.
                    res2 = np.argmax(logits, axis=2).T
                    res2 = ["".join(toks[int(x)] for x in y[:(l-1)//320]) for y, l in zip(res2, lengths)]
                    print("\n".join(res2))

            if self.mp3:
                new = sess.run(self.new_input)
                mp3ed = convert_mp3(new, lengths)
                feed_dict = {self.new_input: mp3ed}
            else:
                feed_dict = {}

            # Actually do the optimization step
            d, el, cl, l, logits, new_input, _ = sess.run((self.delta, self.expanded_loss,
                                                           self.ctcloss, self.loss,
                                                           self.logits, self.new_input,
                                                           self.train),
                                                          feed_dict)

            # Report progress
            print("%.3f"%np.mean(cl), "\t", "\t".join("%.3f"%x for x in cl))

            logits = np.argmax(logits, axis=2).T
            for ii in range(self.batch_size):
                # Every 10 iterations, check if we've succeeded.
                # If we have (or if it's the final iteration) then we
                # should record our progress and decrease the
                # rescale constant.
                if (self.loss_fn == "CTC" and i%10 == 0 and res[ii] == "".join([toks[x] for x in target[ii]])) \
                   or (i == MAX-1 and final_deltas[ii] is None):
                    # Get the current constant
                    rescale = sess.run(self.rescale)
                    if rescale[ii]*2000 > np.max(np.abs(d[ii])):
                        # If we're already below the threshold, then
                        # just reduce the threshold to the current
                        # point and save some time.
                        print("It's way over", np.max(np.abs(d[ii]))/2000.0)
                        rescale[ii] = np.max(np.abs(d[ii]))/2000.0

                    # Otherwise reduce it by some constant. The closer
                    # this number is to 1, the better quality the result
                    # will be. The smaller, the quicker we'll converge
                    # on a result but it will be lower quality.
                    rescale[ii] *= .8

                    # Adjust the best solution found so far
                    final_deltas[ii] = new_input[ii]

                    print("Worked i=%d ctcloss=%f bound=%f"%(ii, cl[ii], 2000*rescale[ii][0]))
                    #print('delta',np.max(np.abs(new_input[ii]-audio[ii])))
                    sess.run(self.rescale.assign(rescale))

                    # Just for debugging, save the adversarial example
                    # to /tmp so we can see it if we want
                    wav.write("/tmp/adv.wav", 16000,
                              np.array(np.clip(np.round(new_input[ii]),
                                               -2**15, 2**15-1), dtype=np.int16))

        return final_deltas
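
# The loop above reports distortion as a raw maximum absolute sample
# difference, while the paper reports the relative loudness of the
# perturbation in decibels: dB(x) = 20 log10(max_i |x_i|) and
# dB_x(delta) = dB(delta) - dB(x). The helper below (added for
# illustration; nothing in this script calls it) converts to the
# paper's metric. More negative values mean a quieter perturbation.
def distortion_db(audio, adv):
    audio = np.array(audio, dtype=np.float64)
    delta = np.array(adv, dtype=np.float64) - audio
    return 20*np.log10(np.max(np.abs(delta))) - 20*np.log10(np.max(np.abs(audio)))
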
def main():
    """
    Do the attack here.

    This is all just boilerplate; nothing interesting
    happens in this method.

    For now we only support using CTC loss and only generating
    one adversarial example at a time.
    """
    parser = argparse.ArgumentParser(description=None)
    parser.add_argument('--in', type=str, dest="input", nargs='+',
                        required=True,
                        help="Input audio .wav file(s), at 16kHz (separated by spaces)")
    parser.add_argument('--target', type=str,
                        required=True,
                        help="Target transcription")
    parser.add_argument('--out', type=str, nargs='+',
                        required=False,
                        help="Path for the adversarial example(s)")
    parser.add_argument('--outprefix', type=str,
                        required=False,
                        help="Prefix of path for adversarial examples")
    parser.add_argument('--finetune', type=str, nargs='+',
                        required=False,
                        help="Initial .wav file(s) to use as a starting point")
    parser.add_argument('--lr', type=int,
                        required=False, default=100,
                        help="Learning rate for optimization")
    parser.add_argument('--iterations', type=int,
                        required=False, default=1000,
                        help="Maximum number of iterations of gradient descent")
    parser.add_argument('--l2penalty', type=float,
                        required=False, default=float('inf'),
                        help="Weight for l2 penalty on loss function")
    parser.add_argument('--mp3', action="store_const", const=True,
                        required=False,
                        help="Generate MP3 compression resistant adversarial examples")
    parser.add_argument('--restore_path', type=str,
                        required=True,
                        help="Path to the DeepSpeech checkpoint (ending in model.v0.4.1)")
    args = parser.parse_args()
    # DeepSpeech builds its own flags from sys.argv, so clear our
    # arguments out of the way before it gets loaded.
    while len(sys.argv) > 1:
        sys.argv.pop()

    with tf.Session() as sess:
        finetune = []
        audios = []
        lengths = []

        if args.out is None:
            assert args.outprefix is not None
        else:
            assert args.outprefix is None
            assert len(args.input) == len(args.out)
        if args.finetune is not None and len(args.finetune):
            assert len(args.input) == len(args.finetune)

        # Load the inputs that we're given
        for i in range(len(args.input)):
            fs, audio = wav.read(args.input[i])
            assert fs == 16000
            assert audio.dtype == np.int16
            print('source dB', 20*np.log10(np.max(np.abs(audio))))
            audios.append(list(audio))
            lengths.append(len(audio))

            if args.finetune is not None:
                finetune.append(list(wav.read(args.finetune[i])[1]))

        maxlen = max(map(len, audios))
        audios = np.array([x+[0]*(maxlen-len(x)) for x in audios])
        finetune = np.array([x+[0]*(maxlen-len(x)) for x in finetune])

        phrase = args.target

        # Set up the attack class and run it
        attack = Attack(sess, 'CTC', len(phrase), maxlen,
                        batch_size=len(audios),
                        mp3=args.mp3,
                        learning_rate=args.lr,
                        num_iterations=args.iterations,
                        l2penalty=args.l2penalty,
                        restore_path=args.restore_path)
        deltas = attack.attack(audios,
                               lengths,
                               [[toks.index(x) for x in phrase]]*len(audios),
                               finetune)

        # And now save it to the desired output
        if args.mp3:
            convert_mp3(deltas, lengths)
            copyfile("/tmp/saved.mp3", args.out[0])
            print("Final distortion", np.max(np.abs(deltas[0][:lengths[0]]-audios[0][:lengths[0]])))
        else:
            for i in range(len(args.input)):
                if args.out is not None:
                    path = args.out[i]
                else:
                    path = args.outprefix+str(i)+".wav"
                wav.write(path, 16000,
                          np.array(np.clip(np.round(deltas[i][:lengths[i]]),
                                           -2**15, 2**15-1), dtype=np.int16))
                print("Final distortion", np.max(np.abs(deltas[i][:lengths[i]]-audios[i][:lengths[i]])))

main()

--------------------------------------------------------------------------------
/classify.py:
--------------------------------------------------------------------------------
## classify.py -- actually classify a sequence with DeepSpeech
##
## Copyright (C) 2017, Nicholas Carlini.
##
## This program is licensed under the BSD 2-Clause license,
## contained in the LICENSE file in this directory.

import numpy as np
import tensorflow as tf
import argparse

import scipy.io.wavfile as wav

import time
import os
os.environ['CUDA_VISIBLE_DEVICES'] = ''  # classification runs fine on the CPU
import sys
from collections import namedtuple
sys.path.append("DeepSpeech")
import DeepSpeech

try:
    import pydub
    import struct
except ImportError:
    print("pydub was not loaded, MP3 compression will not work")

from tf_logits import get_logits


# These are the tokens that we're allowed to use.
# The - token is special and corresponds to the epsilon
# value in CTC decoding, and cannot occur in the phrase.
toks = " abcdefghijklmnopqrstuvwxyz'-"



def main():
    parser = argparse.ArgumentParser(description=None)
    parser.add_argument('--in', type=str, dest="input",
                        required=True,
                        help="Input audio .wav (or .mp3) file, at 16kHz")
    parser.add_argument('--restore_path', type=str,
                        required=True,
                        help="Path to the DeepSpeech checkpoint (ending in model.v0.4.1)")
    args = parser.parse_args()
    # DeepSpeech builds its own flags from sys.argv, so clear our
    # arguments out of the way before it gets loaded.
    while len(sys.argv) > 1:
        sys.argv.pop()
    with tf.Session() as sess:
        if args.input.split(".")[-1] == 'mp3':
            raw = pydub.AudioSegment.from_mp3(args.input)
            audio = np.array([struct.unpack("<h", raw.raw_data[i:i+2])[0]
                              for i in range(0, len(raw.raw_data), 2)])
        else:
            _, audio = wav.read(args.input)
        N = len(audio)

        # Set up the DeepSpeech graph and restore the checkpoint.
        new_input = tf.placeholder(tf.float32, [1, N])
        lengths = tf.placeholder(tf.int32, [1])
        with tf.variable_scope("", reuse=tf.AUTO_REUSE):
            logits = get_logits(new_input, lengths)
        saver = tf.train.Saver()
        saver.restore(sess, args.restore_path)

        # Decode with a beam search and print the transcription.
        decoded, _ = tf.nn.ctc_beam_search_decoder(logits, lengths,
                                                   merge_repeated=False,
                                                   beam_width=500)
        length = (len(audio)-1)//320
        r = sess.run(decoded, {new_input: [audio],
                               lengths: [length]})

        print("-"*80)
        print("Classification:")
        print("".join([toks[x] for x in r[0].values]))
        print("-"*80)

main()

--------------------------------------------------------------------------------
/tf_logits.py:
--------------------------------------------------------------------------------
## tf_logits.py -- compute the DeepSpeech logits for a waveform
##
## Copyright (C) 2017, Nicholas Carlini.
##
## This program is licensed under the BSD 2-Clause license,
## contained in the LICENSE file in this directory.


import numpy as np
import tensorflow as tf
import argparse

import scipy.io.wavfile as wav

import time
import os
import sys

sys.path.append("DeepSpeech")
import DeepSpeech

def compute_mfcc(audio, **kwargs):
    """
    Compute the MFCC for a given audio waveform. This is
    identical to how DeepSpeech does it, but does it all in
    TensorFlow so that we can differentiate through it.
    """

    batch_size, size = audio.get_shape().as_list()
    audio = tf.cast(audio, tf.float32)

    # 1. Pre-emphasizer, a high-pass filter
    audio = tf.concat((audio[:, :1], audio[:, 1:] - 0.97*audio[:, :-1], np.zeros((batch_size, 512), dtype=np.float32)), 1)

    # 2. Window into overlapping frames of 512 samples, with a hop of 320
    windowed = tf.stack([audio[:, i:i+512] for i in range(0, size-320, 320)], 1)

    window = np.hamming(512)
    windowed = windowed * window

    # 3. Take the FFT to convert to frequency space
    ffted = tf.spectral.rfft(windowed, [512])
    ffted = 1.0 / 512 * tf.square(tf.abs(ffted))

    # 4. Compute the Mel windowing of the FFT
    energy = tf.reduce_sum(ffted, axis=2)+np.finfo(float).eps
    filters = np.load("filterbanks.npy").T
    feat = tf.matmul(ffted, np.array([filters]*batch_size, dtype=np.float32))+np.finfo(float).eps

    # 5. Take the DCT again, because why not
    feat = tf.log(feat)
    feat = tf.spectral.dct(feat, type=2, norm='ortho')[:, :, :26]

    # 6. Amplify high frequencies (the standard cepstral liftering step)
    _, nframes, ncoeff = feat.get_shape().as_list()
    n = np.arange(ncoeff)
    lift = 1 + (22/2.)*np.sin(np.pi*n/22)
    feat = lift*feat
    width = feat.get_shape().as_list()[1]

    # 7. And now stick the energy next to the features
    feat = tf.concat((tf.reshape(tf.log(energy), (-1, width, 1)), feat[:, :, 1:]), axis=2)

    return feat
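
# A quick sanity check of the pipeline above (added for illustration;
# nothing in this repository calls it): one second of 16kHz audio should
# produce 49 frames of 26 coefficients each, given the 512-sample windows
# with a 320-sample hop. Run it from the repository root so that
# filterbanks.npy is found.
def _mfcc_shape_check():
    signal = tf.constant(np.random.randn(1, 16000).astype(np.float32))
    with tf.Session() as sess:
        feat = sess.run(compute_mfcc(signal))
    assert feat.shape == (1, 49, 26), feat.shape
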

def get_logits(new_input, length, first=[]):
    """
    Compute the logits for a given waveform.

    First, preprocess with the TF version of MFCC above,
    and then call DeepSpeech on the features.
    """
    # Note: the mutable default argument `first` is (ab)used as a
    # run-once flag, so that the DeepSpeech globals are only
    # initialized on the first call.

    batch_size = new_input.get_shape()[0]

    # 1. Compute the MFCCs for the input audio
    # (this is differentiable with our implementation above)
    empty_context = np.zeros((batch_size, 9, 26), dtype=np.float32)
    new_input_to_mfcc = compute_mfcc(new_input)
    features = tf.concat((empty_context, new_input_to_mfcc, empty_context), 1)

    # 2. The model sees 19 frames at a time (the current frame plus
    # 9 frames of context on each side), so concatenate them together.
    features = tf.reshape(features, [new_input.get_shape()[0], -1])
    features = tf.stack([features[:, i:i+19*26] for i in range(0, features.shape[1]-19*26+1, 26)], 1)
    features = tf.reshape(features, [batch_size, -1, 19, 26])


    # 3. Finally we process it with DeepSpeech.
    # We need to init DeepSpeech the first time we're called.
    if first == []:
        first.append(False)

        DeepSpeech.create_flags()
        tf.app.flags.FLAGS.alphabet_config_path = "DeepSpeech/data/alphabet.txt"
        DeepSpeech.initialize_globals()

    logits, _ = DeepSpeech.BiRNN(features, length, [0]*10)

    return logits

--------------------------------------------------------------------------------
/xdg.py:
--------------------------------------------------------------------------------
# Even more hacks: this stub shadows the real pyxdg package when
# DeepSpeech imports xdg, so its XDG data-path lookup becomes a no-op.

class BaseDirectory:
    def save_data_path(*args, **kwargs):
        return

--------------------------------------------------------------------------------