├── .gitignore
├── LICENSE
├── README.md
├── src
│   ├── corpus_statistics.py
│   ├── custom_ops.py
│   ├── deepSpeech.py
│   ├── deepSpeech_input.py
│   ├── deepSpeech_test.py
│   ├── deepSpeech_train.py
│   ├── helper_routines.py
│   ├── mkldnn_rnn_op.py
│   ├── preprocess_LibriSpeech.py
│   ├── setenvs.py
│   ├── test.sh
│   ├── train.sh
│   ├── validation.sh
│   └── vtune.sh
└── tools
    ├── parse_log.py
    ├── prof.py
    └── vtune.sh

/.gitignore:
--------------------------------------------------------------------------------
1 | *.pyc
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | Copyright (c) 2017 Ford Motor Company
2 | All rights reserved.
3 |
4 | Redistribution and use in source and binary forms, with or without
5 | modification, are permitted provided that the following conditions are met:
6 |     * Redistributions of source code must retain the above copyright
7 |       notice, this list of conditions and the following disclaimer.
8 |     * Redistributions in binary form must reproduce the above copyright
9 |       notice, this list of conditions and the following disclaimer in the
10 |       documentation and/or other materials provided with the distribution.
11 |     * Neither the name of the copyright holder nor the
12 |       names of its contributors may be used to endorse or promote products
13 |       derived from this software without specific prior written permission.
14 |
15 | THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
16 | ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
17 | WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
18 | DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY
19 | DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
20 | (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
21 | LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
22 | ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
23 | (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
24 | SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
25 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # TensorFlow implementation of DeepSpeech2
2 | End-to-end speech recognition using TensorFlow
3 |
4 | This repository contains TensorFlow code for an end-to-end speech recognition engine implementing Baidu's DeepSpeech2 model on Intel Architecture (IA). This work is based on the code developed by [Ford](https://github.com/fordDeepDSP/deepSpeech), with many changes made to fit our solution.
5 |
6 | This software is released under a BSD license. The license to this software does not apply to TensorFlow, which is available under the Apache 2.0 license, or the third party pre-requisites listed below, which are available under their own respective licenses.
7 |
8 | Pre-requisites
9 | -------------
10 | * TensorFlow - version: 1.1.0, 1.2.0
11 | * Python - version: 2.7
12 | * python-levenshtein - to compute the Character Error Rate (CER)
13 | * python_speech_features - to generate MFCC features
14 | * PySoundFile - to read FLAC files
15 | * scipy - helper functions for windowing
16 | * tqdm - for displaying a progress bar
17 |
18 | Getting started
19 | ------------------
20 | *Step 1: Install all dependencies.*
21 |
22 | ```shell
23 | $ yum install libsndfile
24 | $ pip install python-Levenshtein
25 | $ pip install python_speech_features
26 | $ pip install PySoundFile
27 | $ pip install scipy
28 | $ pip install tqdm
29 |
30 | # Install TensorFlow 1.2.0:
31 | $ pip install 'tensorflow==1.2.0'
32 |
33 | # [GPU ONLY] Update ~/.bashrc to reflect the path for CUDA.
34 | 1. Add these lines to your ~/.bashrc:
35 | export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64"
36 | export CUDA_HOME=/usr/local/cuda
37 | 2. Install the TensorFlow GPU package
38 | $ pip install --upgrade 'tensorflow-gpu==1.2.0'
39 |
40 | ```
41 | *Step 2: Clone this git repo.*
42 | ```shell
43 | $ git clone https://github.com/yao-matrix/deepSpeech2.git
44 | $ cd deepSpeech2
45 | ```
46 |
47 | Preprocessing the data
48 | ----------------------
49 | *Step 1: Download and unpack the LibriSpeech data*
50 |
51 | Inside the GitHub repo that you have cloned, run:
52 | ```shell
53 | $ mkdir -p data/librispeech
54 | $ cd data/librispeech
55 | $ wget http://www.openslr.org/resources/12/train-clean-100.tar.gz
56 | $ wget http://www.openslr.org/resources/12/dev-clean.tar.gz
57 | $ wget http://www.openslr.org/resources/12/test-clean.tar.gz
58 | $ mkdir audio
59 | $ cd audio
60 | $ tar xvzf ../train-clean-100.tar.gz LibriSpeech/train-clean-100 --strip-components=1
61 | $ tar xvzf ../dev-clean.tar.gz LibriSpeech/dev-clean --strip-components=1
62 | $ tar xvzf ../test-clean.tar.gz LibriSpeech/test-clean --strip-components=1
63 | # delete audio files that are too short
64 | $ rm -rf LibriSpeech/train-clean-100/1578/6379/1578-6379-0029.flac
65 | $ rm -rf LibriSpeech/train-clean-100/460/172359/460-172359-0090.flac
66 | ```
67 | *Step 2: Run this command to preprocess the audio and generate TFRecord files.*
68 |
69 | The computed MFCC features will be stored in TFRecord files inside data/librispeech/processed/.
70 | ```shell
71 | $ cd ./src
72 | $ python preprocess_LibriSpeech.py
73 | ```
74 |
75 | Training a model w/ dummy data
76 | ----------------
77 | ```shell
78 | $ cd ./src
79 | $ vim ./train.sh
80 | # set dummy=1 in train.sh
81 | $ ./train.sh
82 | ```
83 |
84 | Training a model w/ real data
85 | ----------------
86 | ```shell
87 | # To continue training from a saved checkpoint file
88 | $ cd ./src
89 | $ vim ./train.sh
90 | # set dummy=0 in train.sh
91 | $ ./train.sh
92 | ```
93 | The script train.sh contains commands to train on utterances in sorted order for the first epoch and then to resume training on shuffled utterances.
94 | Note that during the first epoch the cost will increase, and later steps will take longer to train, because the utterances are presented to the network in sorted order.
95 |
96 | Monitoring training
97 | --------------------
98 | Since the training data is fed through a shuffled queue, a separate graph needs to be set up in a different session to check validation loss. This graph is fed with the validation data to compute predictions. The deepSpeech_test.py script initializes the graph from a previously saved checkpoint file and computes the CER on the eval_data every 5 minutes by default. It saves the computed CER values in the models/librispeech/eval folder. By calling tensorboard with logdir set to models/librispeech, it is possible to monitor validation CER and training loss during training.
99 | ```shell
100 | $ cd ./src
101 | $ ./validation.sh
102 | $ tensorboard --logdir PATH_TO_SUMMARY
103 | ```
104 | Testing a model
105 | ----------------
106 | ```shell
107 | $ cd ./src
108 | $ ./test.sh
109 | ```
110 |
111 | # Thanks
112 | Thanks to Aswathy for helping refine the README.
--------------------------------------------------------------------------------
/src/corpus_statistics.py:
--------------------------------------------------------------------------------
1 |
2 | librispeech = {1: 28, 2: 563, 3: 681, 4: 672, 5: 623, 6: 637, 7: 661, 8: 722, 9: 784, 10: 980, 11: 1384, 12: 2505, 13: 4090, 14: 6041, 15: 6402, 16: 1714, 17: 47, 18: 1, 19: 2, 20: 0, 21: 0, 22: 0, 23: 0, 24: 1}
3 |
4 | total = sum(librispeech.values())
5 | print "total is: %d" % (total)
--------------------------------------------------------------------------------
/src/custom_ops.py:
--------------------------------------------------------------------------------
1 | """
2 | Custom RNN Cell definition.
3 | Default RNNCell in TensorFlow throws errors when
4 | variables are re-used between devices.
5 | """
6 | import tensorflow as tf
7 |
8 | from tensorflow.contrib.rnn import BasicRNNCell
9 | from tensorflow.python.util import nest
10 | from tensorflow.python.training import moving_averages
11 | from tensorflow.python.ops import array_ops
12 | from tensorflow.python.ops import control_flow_ops
13 | from tensorflow.python.framework import ops
14 |
15 | from helper_routines import _variable_on_cpu
16 |
17 | class CustomRNNCell(BasicRNNCell):
18 |     """ A custom RNN cell that allows the weights
19 |     to be re-used on multiple devices. In particular, the matrix of weights is
20 |     set using _variable_on_cpu.
21 |     The default version of BasicRNNCell does not support the ability to
22 |     pin weights to one device (say, the CPU).
23 |     """
24 |
25 |     def __init__(self, num_units, input_size=None, activation=tf.nn.relu6, use_fp16=False):
26 |         self._num_units = num_units
27 |         self._activation = activation
28 |         self.use_fp16 = use_fp16
29 |
30 |     def __call__(self, inputs, state, scope=None):
31 |         """Most basic RNN:
32 |         output = new_state = activation(W * input + U * state + B)."""
33 |         with tf.variable_scope(scope or type(self).__name__):
34 |             output = self._activation(_linear([inputs, state], self._num_units,
35 |                                                True, use_fp16=self.use_fp16))
36 |         return output, output
37 |
38 |
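# --- Illustrative usage sketch (not part of the original file) ---
# A minimal example of dropping CustomRNNCell into the standard TF 1.x RNN
# API; the shapes here (20 time steps, batch of 4, 161 features) are
# assumptions made up for illustration:
#
#   cell = CustomRNNCell(num_units=256)
#   inputs = tf.placeholder(tf.float32, [20, 4, 161])   # time-major: T, N, F
#   outputs, final_state = tf.nn.dynamic_rnn(cell, inputs, dtype=tf.float32,
#                                            time_major=True)
#
# Because the weights come from _variable_on_cpu, reusing the enclosing
# variable scope shares one weight matrix across devices.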
39 | class CustomRNNCell2(BasicRNNCell):
40 |     """ A custom RNN cell (batch-normalized variant) that allows the weights
41 |     to be re-used on multiple devices. In particular, the matrix of weights is
42 |     set using _variable_on_cpu.
43 |     The default version of BasicRNNCell does not support the ability to
44 |     pin weights to one device (say, the CPU).
45 |     """
46 |
47 |     def __init__(self, num_units, input_size=None, activation=tf.nn.relu6):
48 |         self._num_units = num_units
49 |
50 |     def __call__(self, inputs, state, scope=None):
51 |         """
52 |         output = new_state = activation(BN(W * input) + U * state + B).
53 |         state dim: batch_size * num_units
54 |         input dim: batch_size * feature_size
55 |         W: feature_size * num_units
56 |         U: num_units * num_units
57 |         """
58 |         with tf.variable_scope(scope or type(self).__name__):
59 |             # print "rnn cell input size: ", inputs.get_shape().as_list()
60 |             # print "rnn cell state size: ", state.get_shape().as_list()
61 |             wsize = inputs.get_shape()[1]
62 |             w = _variable_on_cpu('W', [self._num_units, wsize], initializer=tf.orthogonal_initializer())
63 |             # print w.name
64 |             resi = tf.matmul(inputs, w, transpose_a=False, transpose_b=True)
65 |             # batch_size * num_units
66 |             bn_resi = seq_batch_norm(resi)
67 |             # bn_resi = resi
68 |             usize = state.get_shape()[1]
69 |             u = _variable_on_cpu('U', [self._num_units, usize], initializer=tf.orthogonal_initializer())
70 |             resu = tf.matmul(state, u, transpose_a=False, transpose_b=True)
71 |             # res_nb = tf.add_n([bn_resi, resu])
72 |             res_nb = tf.add(bn_resi, resu)
73 |             bias = _variable_on_cpu('B', [self._num_units],
74 |                                     tf.constant_initializer(0))
75 |             res = tf.add(res_nb, bias)
76 |             output = relux(res, capping=20)
77 |         return output, output
78 |
79 |
80 | def stacked_brnn(cell_fw, cell_bw, num_units, num_layers, inputs, batch_size, conved_seq_lens):
81 |     """
82 |     Multi-layer bidirectional RNN.
83 |     :param cell_fw, cell_bw: forward and backward RNN cells
84 |     :param num_units: number of hidden units per cell
85 |     :param num_layers: the number of layers
86 |     :param inputs: the input sequence, time-major [T, batch_size, F]
87 |     :param conved_seq_lens: per-utterance sequence lengths after the conv layers
88 |     :return: the output of the last bidirectional layer, with the forward and backward outputs concatenated
89 |     """
90 |     prev_layer = inputs
91 |     for i in xrange(num_layers):
92 |         with tf.variable_scope("brnn-%d" % i) as scope:
93 |             state_fw = cell_fw.zero_state(batch_size, tf.float32)
94 |             state_bw = cell_bw.zero_state(batch_size, tf.float32)
95 |             (outputs, state) = tf.nn.bidirectional_dynamic_rnn(cell_fw, cell_bw, prev_layer, sequence_length=conved_seq_lens,
96 |                                                                initial_state_fw=state_fw, initial_state_bw=state_bw, dtype=tf.float32, time_major=True)
97 |             outputs_fw, outputs_bw = outputs
98 |             # prev_layer = tf.add_n([outputs_fw, outputs_bw])
99 |             prev_layer = array_ops.concat(outputs, 2)
100 |     return prev_layer
101 |
102 |
103 | def relux(x, capping=None):
104 |     """Clipped ReLU"""
105 |     x = tf.nn.relu(x)
106 |     if capping is not None:
107 |         x = tf.minimum(x, capping)
108 |     return x
109 |
110 |
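# --- Illustrative usage sketch (not part of the original file) ---
# Mirrors the call made in deepSpeech.py's inference(), with the default
# hyper-parameters from deepSpeech_train.py (1024 hidden units, 2 layers,
# batch of 32); the tensor names are placeholders:
#
#   fw_cell = CustomRNNCell2(1024)
#   out = stacked_brnn(fw_cell, fw_cell, num_units=1024, num_layers=2,
#                      inputs=rnn_input,             # time-major [T, N, F]
#                      batch_size=32,
#                      conved_seq_lens=seq_lens)     # [N] lengths after convs
#
# Each layer concatenates its forward and backward outputs, so `out` has
# shape [T, N, 2 * num_units].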
111 | def batch_norm2(inputs,
112 |                 decay=0.999,
113 |                 center=True,
114 |                 scale=True,
115 |                 epsilon=0.001,
116 |                 moving_vars='moving_vars',
117 |                 activation=None,
118 |                 is_training=True,
119 |                 trainable=True,
120 |                 scope=None,
121 |                 data_format='NHWC'):
122 |     """Adds a Batch Normalization layer.
123 |
124 |     Args:
125 |         inputs: a tensor of size [batch_size, height, width, channels]
126 |                 or [batch_size, channels].
127 |         decay: decay for the moving average.
128 |         center: If True, subtract beta. If False, beta is not created and ignored.
129 |         scale: If True, multiply by gamma. If False, gamma is
130 |             not used. When the next layer is linear (e.g. ReLU), this can be
131 |             disabled since the scaling can be done by the next layer.
132 |         epsilon: small float added to variance to avoid dividing by zero.
133 |         moving_vars: collection to store the moving_mean and moving_variance.
134 |         activation: activation function.
135 |         is_training: whether or not the model is in training mode.
136 |         trainable: whether or not the variables should be trainable.
137 |         scope: Optional scope for variable_scope.
138 |
139 |     Returns:
140 |         a tensor representing the output of the operation.
141 |
142 |     """
143 |     inputs_shape = inputs.get_shape()
144 |     with tf.variable_scope('bn2'):
145 |         if data_format == 'NCHW':
146 |             params_shape = inputs_shape[1]
147 |         else:
148 |             params_shape = inputs_shape[-1]
149 |
150 |         # scale
151 |         scale = _variable_on_cpu('scale', params_shape, initializer=tf.ones_initializer())
152 |         # shift
153 |         shift = _variable_on_cpu('shift', params_shape, initializer=tf.zeros_initializer())
154 |
155 |         moving_mean = _variable_on_cpu('moving_mean', [params_shape], initializer=tf.zeros_initializer(), trainable=False)
156 |         moving_var = _variable_on_cpu('moving_variance', [params_shape], initializer=tf.ones_initializer(), trainable=False)
157 |
158 |         if is_training:
159 |             y, batch_mean, batch_var = tf.nn.fused_batch_norm(inputs, scale, shift, mean=None, variance=None, epsilon=epsilon,
160 |                                                               data_format=data_format, is_training=is_training)
161 |             # Make the moving-average updates a dependency of the output so
162 |             # they run with every training step.
163 |             update_mean = moving_averages.assign_moving_average(moving_mean, batch_mean, decay)
164 |             update_var = moving_averages.assign_moving_average(moving_var, batch_var, decay)
165 |             with tf.control_dependencies([update_mean, update_var]):
166 |                 y = tf.identity(y)
167 |         else:
168 |             y, _, _ = tf.nn.fused_batch_norm(inputs, scale, shift, mean=moving_mean, variance=moving_var, epsilon=epsilon,
169 |                                              data_format=data_format, is_training=is_training)
170 |     return y
171 |
172 |
173 | def batch_norm(x, scope=None, is_train=True, data_format=None):
174 |     """batch normalization, currently only works on NHWC"""
175 |     with tf.variable_scope(scope or 'bn'):
176 |         inputs_shape = x.get_shape()
177 |         param_shape = inputs_shape[-1]
178 |
179 |         batch_mean, batch_var = tf.nn.moments(x, [0, 1, 2], name='moments')
180 |
181 |         ema = tf.train.ExponentialMovingAverage(decay=0.9997)
182 |         def mean_var_with_update():
183 |             ema_apply_op = ema.apply([batch_mean, batch_var])
184 |             with tf.control_dependencies([ema_apply_op]):
185 |                 return tf.identity(batch_mean), tf.identity(batch_var)
186 |
187 |         mean, var = control_flow_ops.cond(tf.cast(is_train, "bool"), mean_var_with_update, lambda: (ema.average(batch_mean), ema.average(batch_var)))
188 |
189 |         offset = _variable_on_cpu('offset', [param_shape], initializer=tf.zeros_initializer())
190 |         scale = _variable_on_cpu('scale', [param_shape], initializer=tf.ones_initializer())
191 |
192 |         normed = tf.nn.batch_normalization(x, mean, var, offset, scale, 0.001)
193 |     return normed
194 |
195 |
196 | def seq_batch_norm(x, scope=None, is_train=True):
197 |     """sequence batch normalization, input N * D"""
198 |     with tf.variable_scope(scope or 'sbn'):
199 |         inputs_shape = x.get_shape()
200 |         param_shape = inputs_shape[-1]
201 |
202 |         batch_mean, batch_var = tf.nn.moments(x, [0], name='moments')
203 |
204 |         ema = tf.train.ExponentialMovingAverage(decay=0.9997)
205 |         def mean_var_with_update():
206 |             ema_apply_op = ema.apply([batch_mean, batch_var])
207 |             with tf.control_dependencies([ema_apply_op]):
208 |                 return tf.identity(batch_mean), tf.identity(batch_var)
209 |
210 |         mean, var = control_flow_ops.cond(tf.cast(is_train, "bool"), mean_var_with_update, lambda: (ema.average(batch_mean), ema.average(batch_var)))
211 |
212 |         offset = _variable_on_cpu('offset', [param_shape], initializer=tf.zeros_initializer(), trainable=True)
213 |         scale = _variable_on_cpu('scale', [param_shape], initializer=tf.ones_initializer(), trainable=True)
214 |
215 |         normed = tf.nn.batch_normalization(x, mean, var, offset, scale, 0.001)
216 |     return normed
217 |
218 |
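# --- Illustrative note (not part of the original file) ---
# seq_batch_norm normalizes each feature column of an N x D input over the
# batch dimension. A made-up example: for x = [[1., 3.], [3., 5.]] the
# per-column moments are mean [2., 4.] and variance [1., 1.], so before the
# learned scale/offset both columns normalize to [-1., +1.] (up to epsilon).
# This is the batch norm applied to the W * input term in CustomRNNCell2.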
219 | def _linear(args, output_size, bias, scope=None, use_fp16=False):
220 |     """Linear map: sum_i(args[i] * W[i]), where W[i] is a variable.
221 |
222 |     Args:
223 |         args: a 2D Tensor or a list of 2D, batch x n, Tensors.
224 |         output_size: int, second dimension of W[i].
225 |         bias: boolean, whether to add a bias term or not.
226 |         use_fp16: bool, if True create the variables as float16.
227 |         scope: VariableScope for the created subgraph; defaults to "Linear".
228 |
229 |     Returns:
230 |         A 2D Tensor with shape [batch x output_size] equal to
231 |         sum_i(args[i] * W[i]), where W[i]s are newly created matrices.
232 |
233 |     Raises:
234 |         ValueError: if some of the arguments have unspecified or wrong shapes.
235 |     """
236 |     if args is None or (nest.is_sequence(args) and not args):
237 |         raise ValueError("`args` must be specified")
238 |     if not nest.is_sequence(args):
239 |         args = [args]
240 |
241 |     # Calculate the total size of arguments on dimension 1.
242 |     total_arg_size = 0
243 |     shapes = [a.get_shape().as_list() for a in args]
244 |     for shape in shapes:
245 |         if len(shape) != 2:
246 |             raise ValueError(
247 |                 "Linear is expecting 2D arguments: %s" % str(shapes))
248 |         if not shape[1]:
249 |             raise ValueError(
250 |                 "Linear expects shape[1] of arguments: %s" % str(shapes))
251 |         else:
252 |             total_arg_size += shape[1]
253 |
254 |     dtype = [a.dtype for a in args][0]
255 |
256 |     # Now the computation.
257 |     with tf.variable_scope(scope or "Linear"):
258 |         matrix = _variable_on_cpu('Matrix', [total_arg_size, output_size],
259 |                                   use_fp16=use_fp16)
260 |         if use_fp16:
261 |             dtype = tf.float16
262 |         else:
263 |             dtype = tf.float32
264 |         args = [tf.cast(x, dtype) for x in args]
265 |         # Matrix is [total_arg_size, output_size] and the (concatenated) input
266 |         # is [batch, total_arg_size], so a plain matmul yields [batch, output_size].
267 |         if len(args) == 1:
268 |             res = tf.matmul(args[0], matrix)
269 |         else:
270 |             res = tf.matmul(tf.concat(args, 1), matrix)
271 |         if not bias:
272 |             return res
273 |         bias_term = _variable_on_cpu('Bias', [output_size],
274 |                                      tf.constant_initializer(0),
275 |                                      use_fp16=use_fp16)
276 |     return res + bias_term
--------------------------------------------------------------------------------
/src/deepSpeech.py:
--------------------------------------------------------------------------------
1 | # Author: Lakshmi Krishnan
2 | # Email: lkrishn7@ford.com
3 | # Author: YAO Matrix
4 | # Email: yaoweifeng0301@126.com
5 |
6 |
7 | """Builds the deepSpeech network.
8 |
9 | Summary of major functions:
10 |
11 |   # Compute input feats and labels for training.
12 |   inputs, labels, seq_len = inputs()
13 |
14 |   # Compute inference on the model inputs to make a prediction.
15 |   predictions = inference(inputs)
16 |
17 |   # Compute the total loss of the prediction with respect to the labels.
18 |   loss = loss(predictions, labels)
19 |
20 | """
21 |
22 |
23 | import tensorflow as tf
24 |
25 | from helper_routines import _variable_on_cpu
26 | from helper_routines import _variable_with_weight_decay
27 | from helper_routines import _activation_summary
28 | import custom_ops
29 | import deepSpeech_input
30 |
31 | # Global constants describing the speech data set.
32 | NUM_CLASSES = deepSpeech_input.NUM_CLASSES
33 | NUM_PER_EPOCH_FOR_TRAIN = deepSpeech_input.NUM_PER_EPOCH_FOR_TRAIN
34 | NUM_PER_EPOCH_FOR_EVAL = deepSpeech_input.NUM_PER_EPOCH_FOR_EVAL
35 | NUM_PER_EPOCH_FOR_TEST = deepSpeech_input.NUM_PER_EPOCH_FOR_TEST
36 |
37 |
38 | def get_rnn_seqlen(seq_lens):
39 |     # seq_lens = tf.Print(seq_lens, [seq_lens], "Original seq len: ", 32)
40 |     seq_lens = tf.cast(seq_lens, tf.float64)
41 |     rnn_seq_lens = tf.div(tf.subtract(seq_lens, 10), 2.0)
42 |     rnn_seq_lens = tf.ceil(rnn_seq_lens)
43 |     rnn_seq_lens = tf.div(tf.subtract(rnn_seq_lens, 10), 1.0)
44 |     rnn_seq_lens = tf.ceil(rnn_seq_lens)
45 |     rnn_seq_lens = tf.cast(rnn_seq_lens, tf.int32)
46 |
47 |
48 |     # rnn_seq_lens = tf.Print(rnn_seq_lens, [rnn_seq_lens], "Conved seq len: ", summarize=32)
49 |     # print "rnn_seq_lens shape: ", rnn_seq_lens.get_shape().as_list()
50 |     return rnn_seq_lens
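# --- Illustrative note (not part of the original file) ---
# Worked example of the arithmetic in get_rnn_seqlen(), which mirrors the two
# VALID convolutions in inference() below (time-axis filter 11 with stride 2,
# then filter 11 with stride 1):
#
#   1000 input frames -> conv1: ceil((1000 - 10) / 2.0) = 495
#                     -> conv2: ceil((495 - 10) / 1.0)  = 485
#
# so an utterance with 1000 spectrogram frames yields 485 RNN time steps.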
51 |
52 |
53 | def inputs(eval_data, data_dir, batch_size, use_fp16, shuffle):
54 |     """Construct input for LibriSpeech model evaluation using the Reader ops.
55 |
56 |     Args:
57 |         eval_data: 'train', 'val' or 'test'
58 |         data_dir: folder containing the pre-processed data
59 |         batch_size: int, size of mini-batch
60 |         use_fp16: bool, if True use fp16 else fp32
61 |         shuffle: bool, to shuffle the tfrecords or not.
62 |
63 |     Returns:
64 |         feats: Spectrogram. 3D tensor of [batch_size, T, F] size.
65 |         labels: Labels. 1D tensor of [batch_size] size.
66 |         seq_lens: SeqLens. 1D tensor of [batch_size] size.
67 |
68 |     Raises:
69 |         ValueError: If no data_dir is provided.
70 |     """
71 |     if not data_dir:
72 |         raise ValueError('Please supply a data_dir')
73 |     print 'Using LibriSpeech Corpus'
74 |     feats, labels, seq_lens = deepSpeech_input.inputs(eval_data=eval_data,
75 |                                                       data_dir=data_dir,
76 |                                                       batch_size=batch_size,
77 |                                                       shuffle=shuffle)
78 |     return feats, labels, seq_lens
79 |
80 |
81 | def inference(sess, feats, seq_lens, params):
82 |     """Build the deepSpeech model.
83 |
84 |     Args:
85 |         feats: MFCC features returned from distorted_inputs() or inputs().
86 |         seq_lens: Input sequence length per utterance.
87 |         params: parameters of the model.
88 |
89 |     Returns:
90 |         logits.
91 | """ 92 | # data layout: N, T, F 93 | # feat_len = feats.get_shape().as_list()[-1] 94 | # print "feat shape: ", feats.get_shape().as_list() 95 | 96 | ######################### 97 | # convolutional layers 98 | ######################### 99 | with tf.variable_scope('conv1') as scope: 100 | ## N, T, F 101 | feats = tf.expand_dims(feats, axis=3) 102 | 103 | ## N, T, F, 1 104 | # convolution 105 | kernel = _variable_with_weight_decay('weights', 106 | shape=[11, 41, 1, params.num_filters], 107 | wd_value=None, 108 | use_fp16=params.use_fp16) 109 | conv = tf.nn.conv2d(feats, kernel, 110 | [1, 2, 2, 1], 111 | padding='VALID') 112 | # biases = _variable_on_cpu('biases', [params.num_filters], 113 | # tf.constant_initializer(-0.05), 114 | # params.use_fp16) 115 | # bias = tf.nn.bias_add(conv, biases) 116 | 117 | ## N, T, F, 32 118 | # batch normalization 119 | bn = custom_ops.batch_norm(conv) 120 | 121 | # clipped ReLU 122 | conv1 = custom_ops.relux(bn, capping=20) 123 | _activation_summary(conv1) 124 | 125 | with tf.variable_scope('conv2') as scope: 126 | ## N, T, F, 32 127 | # convolution 128 | kernel = _variable_with_weight_decay('weights', 129 | shape=[11, 21, params.num_filters, params.num_filters], 130 | wd_value=None, 131 | use_fp16=params.use_fp16) 132 | conv = tf.nn.conv2d(conv1, 133 | kernel, 134 | [1, 1, 2, 1], 135 | padding='VALID') 136 | # biases = _variable_on_cpu('biases', 137 | # [params.num_filters], 138 | # tf.constant_initializer(-0.05), 139 | # params.use_fp16) 140 | # bias = tf.nn.bias_add(conv, biases) 141 | 142 | ## N, T, F, 32 143 | # batch normalization 144 | bn = custom_ops.batch_norm(conv) 145 | 146 | # clipped ReLU 147 | conv2 = custom_ops.relux(bn, capping=20) 148 | _activation_summary(conv2) 149 | 150 | ###################### 151 | # recurrent layers 152 | ###################### 153 | # Reshape conv output to fit rnn input: N, T, F * C 154 | fdim = conv2.get_shape().dims 155 | feat_dim = fdim[2].value * fdim[3].value 156 | rnn_input = tf.reshape(conv2, [params.batch_size, -1, feat_dim]) 157 | 158 | # Permute into time major order for rnn: T, N, F * C 159 | rnn_input = tf.transpose(rnn_input, perm=[1, 0, 2]) 160 | 161 | fw_cell = custom_ops.CustomRNNCell2(params.num_hidden) 162 | # fw_cell_list = [fw_cell] * params.num_rnn_layers 163 | 164 | # bw_cell = custom_ops.CustomRNNCell2(params.num_hidden) 165 | # bw_cell_list = [bw_cell] * params.num_rnn_layers 166 | 167 | 168 | conved_seq_lens = get_rnn_seqlen(seq_lens) 169 | 170 | rnn_outputs = custom_ops.stacked_brnn(fw_cell, fw_cell, params.num_hidden, params.num_rnn_layers, rnn_input, params.batch_size, conved_seq_lens) 171 | _activation_summary(rnn_outputs) 172 | 173 | # Linear layer(WX + b) - softmax is applied by CTC cost function. 174 | with tf.variable_scope('softmax_linear') as scope: 175 | weights = _variable_with_weight_decay('weights', [NUM_CLASSES, params.num_hidden * 2], 176 | wd_value=None, 177 | use_fp16=params.use_fp16) 178 | biases = _variable_on_cpu('biases', [NUM_CLASSES], 179 | tf.constant_initializer(0.0), 180 | params.use_fp16) 181 | logit_inputs = tf.reshape(rnn_outputs, [-1, params.num_hidden * 2]) 182 | logits = tf.add(tf.matmul(logit_inputs, weights, transpose_a=False, transpose_b=True), 183 | biases, name=scope.name) 184 | logits = tf.reshape(logits, [-1, params.batch_size, NUM_CLASSES]) 185 | _activation_summary(logits) 186 | 187 | return logits 188 | 189 | 190 | def loss(logits, labels, seq_lens): 191 | """Compute mean CTC Loss. 192 | 193 | Add summary for "Loss" and "Loss/avg". 
194 | Args: 195 | logits: Logits from inference(). layout: T N F 196 | labels: Labels from distorted_inputs or inputs(). 1-D tensor 197 | of shape [batch_size] 198 | seq_lens: Length of each utterance for ctc cost computation. 199 | 200 | Returns: 201 | Loss tensor of type float. 202 | """ 203 | # dense_labels = tf.sparse_tensor_to_dense(labels) 204 | # dense_labels = tf.Print(dense_labels, [dense_labels], "labels: ") 205 | 206 | ## scheme 1 207 | # logits_shape = tf.shape(logits) 208 | # logits_shape = tf.Print(logits_shape, [logits_shape], "logits shape: ") 209 | # max_seq_len = logits_shape[0] 210 | # batch_size = logits_shape[1] 211 | # max_seq_len = tf.Print(max_seq_len, [max_seq_len], "max seq len: ") 212 | # conved_seq_lens = tf.fill([batch_size], max_seq_len) 213 | # conved_seq_lens = tf.Print(conved_seq_lens, [conved_seq_lens], "conved seq len: ", summarize=32) 214 | 215 | ## scheme 2 216 | conved_seq_lens = get_rnn_seqlen(seq_lens) 217 | 218 | conved_seq_lens = tf.Print(conved_seq_lens, [conved_seq_lens], "conved seq len: ", summarize=32) 219 | 220 | # Calculate the average ctc loss across the batch. 221 | ctc_loss = tf.nn.ctc_loss(labels=labels, inputs=tf.cast(logits, tf.float32), 222 | sequence_length=conved_seq_lens, 223 | preprocess_collapse_repeated=False, 224 | ctc_merge_repeated=True, 225 | time_major=True, 226 | ignore_longer_outputs_than_inputs=True) 227 | ctc_loss = tf.Print(ctc_loss, [ctc_loss], "CTC loss: ", summarize=32) 228 | ctc_loss_mean = tf.reduce_mean(ctc_loss, name='ctc_loss') 229 | ctc_loss_mean = tf.Print(ctc_loss_mean, [ctc_loss_mean], "mean CTC loss: ") 230 | # tf.add_to_collection('losses', ctc_loss_mean) 231 | 232 | # The total loss is defined as the cross entropy loss plus all 233 | # of the weight decay terms (L2 loss). 234 | # return tf.add_n(tf.get_collection('losses'), name='total_loss') 235 | return ctc_loss_mean 236 | 237 | 238 | def _add_loss_summaries(total_loss): 239 | """Add summaries for losses in deepSpeech model. 240 | 241 | Generates moving average for all losses and associated summaries for 242 | visualizing the performance of the network. 243 | 244 | Args: 245 | total_loss: Total loss from loss(). 246 | Returns: 247 | loss_averages_op: op for generating moving averages of losses. 248 | """ 249 | # Compute the moving average of all individual losses and the total loss. 250 | loss_averages = tf.train.ExponentialMovingAverage(0.9, name='avg') 251 | losses = tf.get_collection('losses') 252 | loss_averages_op = loss_averages.apply(losses + [total_loss]) 253 | 254 | # Attach a scalar summary to all individual losses and the total loss; 255 | # do the same for the averaged version of the losses. 256 | for each_loss in losses + [total_loss]: 257 | # Name each loss as '(raw)' and name the moving average 258 | # version of the loss as the original loss name. 
259 |         tf.summary.scalar(each_loss.op.name + ' (raw)', each_loss)
260 |         tf.summary.scalar(each_loss.op.name, loss_averages.average(each_loss))
261 |
262 |     return loss_averages_op
263 |
--------------------------------------------------------------------------------
/src/deepSpeech_input.py:
--------------------------------------------------------------------------------
1 | # Author: Lakshmi Krishnan
2 | # Email : lkrishn7@ford.com
3 |
4 | """Routines for reading the audio data."""
5 |
6 | import os.path
7 | import glob
8 | import tensorflow as tf
9 |
10 | # Global constants describing the dataset
11 | # Note this definition must match the ALPHABET chosen in
12 | # preprocess_Librispeech.py
13 | ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ' "  # for LibriSpeech
14 | NUM_CLASSES = len(ALPHABET) + 1  # Additional class for blank character
15 | NUM_PER_EPOCH_FOR_TRAIN = 28538
16 | NUM_PER_EPOCH_FOR_EVAL = 2703
17 | NUM_PER_EPOCH_FOR_TEST = 2620
18 |
19 |
20 | def _generate_feats_and_label_batch(filename_queue, batch_size):
21 |     """Construct a queued batch of spectral features and transcriptions.
22 |
23 |     Args:
24 |         filename_queue: queue of filenames to read data from.
25 |         batch_size: Number of utterances per batch.
26 |
27 |     Returns:
28 |         feats: spectral features. 3D tensor of [batch_size, T, 161] size.
29 |         labels: transcripts. Sparse int32 tensor.
30 |         seq_lens: sequence lengths. 1D tensor of [batch_size] size.
31 |     """
32 |
33 |     # Define how to parse the example
34 |     reader = tf.TFRecordReader()
35 |     _, serialized_example = reader.read(filename_queue)
36 |     context_features = {
37 |         "seq_len": tf.FixedLenFeature([], dtype=tf.int64),
38 |         "labels": tf.VarLenFeature(dtype=tf.int64)
39 |     }
40 |     sequence_features = {
41 |         # features are 161 dimensional
42 |         "feats": tf.FixedLenSequenceFeature([161], dtype=tf.float32)
43 |     }
44 |
45 |     # Parse the example (returns a dictionary of tensors)
46 |     context_parsed, sequence_parsed = tf.parse_single_sequence_example(
47 |         serialized=serialized_example,
48 |         context_features=context_features,
49 |         sequence_features=sequence_features
50 |     )
51 |
52 |     # Generate a batch worth of examples after bucketing
53 |     seq_len, (feats, labels) = tf.contrib.training.bucket_by_sequence_length(
54 |         input_length=tf.cast(context_parsed['seq_len'], tf.int32),
55 |         tensors=[sequence_parsed['feats'], context_parsed['labels']],
56 |         batch_size=batch_size,
57 |         bucket_boundaries=list(range(100, 2500, 100)),
58 |         allow_smaller_final_batch=True,
59 |         num_threads=16,
60 |         dynamic_pad=True)
61 |
62 |     return feats, tf.cast(labels, tf.int32), seq_len
63 |
64 |
65 | def inputs(eval_data, data_dir, batch_size, shuffle=False):
66 |     """Construct input for LibriSpeech evaluation using the Reader ops.
67 |
68 |     Args:
69 |         eval_data: 'train', 'val' or 'test', selecting the data set to read.
70 |         data_dir: Path to the LibriSpeech data directory.
71 |         batch_size: Number of utterances per batch.
72 |
73 |     Returns:
74 |         feats: spectral features. 3D tensor of
75 |                [batch_size, T, 161] size.
76 |         labels: Labels. Sparse tensor; seq_lens: 1D tensor of [batch_size] size.
77 | """ 78 | if eval_data == 'train': 79 | num_files = len(glob.glob(os.path.join(data_dir, 'train*/*.tfrecords'))) 80 | filenames = [os.path.join(data_dir, 'train-clean-100/train_' + str(i) + '.tfrecords') 81 | for i in range(1, num_files + 1)] 82 | # print filenames 83 | elif eval_data == 'val': 84 | filenames = glob.glob(os.path.join(data_dir, 'dev*/*.tfrecords')) 85 | 86 | elif eval_data == 'test': 87 | filenames = glob.glob(os.path.join(data_dir, 'test*/*.tfrecords')) 88 | 89 | for file in filenames: 90 | if not tf.gfile.Exists(file): 91 | raise ValueError('Failed to find file: ' + file) 92 | 93 | # Create a queue that produces the filenames to read. 94 | filename_queue = tf.train.string_input_producer(filenames, shuffle=shuffle) 95 | 96 | # Generate a batch of images and labels by building up a queue of examples. 97 | return _generate_feats_and_label_batch(filename_queue, batch_size) 98 | -------------------------------------------------------------------------------- /src/deepSpeech_test.py: -------------------------------------------------------------------------------- 1 | # Author: Lakshmi Krishnan 2 | # Email: lkrishn7@ford.com 3 | # Author: YAO Matrix 4 | # Email: yaoweifeng0301@126.com 5 | 6 | """Evaluation for DeepSpeech2. 7 | 8 | Usage: 9 | Please see the tutorial and website for how to download the Librispeech 10 | data set, generate the TFRecord files of features and train the model. 11 | 12 | """ 13 | 14 | import json 15 | import os 16 | import math 17 | import time 18 | import argparse 19 | from datetime import datetime 20 | import numpy as np 21 | from Levenshtein import distance 22 | import distutils.util 23 | 24 | import tensorflow as tf 25 | 26 | # Note this definition must match the ALPHABET chosen in 27 | # preprocess_Librispeech.py 28 | ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ' " 29 | IX_TO_CHAR = {i: ch for (i, ch) in enumerate(ALPHABET)} 30 | 31 | 32 | def parse_args(): 33 | """ Parses command line arguments.""" 34 | parser = argparse.ArgumentParser() 35 | parser.add_argument('--eval_dir', type=str, 36 | default='../models/librispeech/eval', 37 | help='Directory to write event logs') 38 | parser.add_argument('--checkpoint_dir', type=str, 39 | default='../models/librispeech/train', 40 | help='Directory where to read model checkpoints.') 41 | parser.add_argument('--eval_data', type=str, default='val', 42 | help="Either 'test' or 'val' or 'train' ") 43 | parser.add_argument('--batch_size', type=int, default=1, 44 | help='Number of feats to process in a batch') 45 | parser.add_argument('--eval_interval_secs', type=int, default=60 * 5, 46 | help='How often to run the eval') 47 | parser.add_argument('--data_dir', type=str, 48 | default='../data/LibriSpeech/processed/', 49 | help='Path to the deepSpeech data directory') 50 | parser.add_argument('--run_once', type=distutils.util.strtobool, default=False, 51 | help='Whether to run eval only once') 52 | parser.add_argument('--engine', type=str, default='tf', 53 | help = 'Select the engine you use: tf, mkl, mkldnn_rnn, cudnn_rnn') 54 | parser.add_argument('--nchw', type=distutils.util.strtobool, default=True, 55 | help = 'Whether to use nchw memory layout') 56 | args = parser.parse_args() 57 | 58 | print "nchw: ", args.nchw 59 | print "engine: ", args.engine 60 | 61 | # Read saved parameters from file 62 | param_file = os.path.join(args.checkpoint_dir, 63 | 'deepSpeech_parameters.json') 64 | with open(param_file, 'r') as file: 65 | params = json.load(file) 66 | # Read network architecture parameters from 67 | # previously 
saved parameter file.
68 |     args.num_hidden = params['num_hidden']
69 |     args.num_rnn_layers = params['num_rnn_layers']
70 |     args.rnn_type = params['rnn_type']
71 |     args.num_filters = params['num_filters']
72 |     args.use_fp16 = params['use_fp16']
73 |     args.moving_avg_decay = params['moving_avg_decay']
74 |     return args
75 |
76 | ARGS = parse_args()
77 |
78 | if ARGS.nchw:
79 |     import deepSpeech_NCHW as deepSpeech
80 | else:
81 |     import deepSpeech
82 |
83 |
84 | def sparse_to_labels(sparse_matrix):
85 |     """ Convert index-based transcripts to strings."""
86 |
87 |     results = [''] * sparse_matrix.dense_shape[0]
88 |     for i, val in enumerate(sparse_matrix.values.tolist()):
89 |         results[sparse_matrix.indices[i, 0]] += IX_TO_CHAR[val]
90 |     return results
91 |
92 |
93 | def initialize_from_checkpoint(sess, saver):
94 |     """ Initialize variables on the graph from a checkpoint, if available."""
95 |
96 |     # Initialise variables from a checkpoint file, if provided.
97 |     ckpt = tf.train.get_checkpoint_state(ARGS.checkpoint_dir)
98 |     if ckpt and ckpt.model_checkpoint_path:
99 |         # Restores from checkpoint
100 |         saver.restore(sess, ckpt.model_checkpoint_path)
101 |         # Assuming model_checkpoint_path looks something like:
102 |         # /my-favorite-path/train/model.ckpt-0,
103 |         # extract global_step from it.
104 |         checkpoint_path = ckpt.model_checkpoint_path
105 |         global_step = checkpoint_path.split('/')[-1].split('-')[-1]
106 |         return global_step
107 |     else:
108 |         print('No checkpoint file found')
109 |         return
110 |
111 |
112 | def inference(predictions_op, true_labels_op, display, sess):
113 |     """ Perform inference per batch on pre-trained model.
114 |     This function performs inference and computes the CER per utterance.
115 |     Args:
116 |         predictions_op: Prediction op
117 |         true_labels_op: True Labels op
118 |         display: print sample predictions if True
119 |         sess: default session to evaluate the ops.
120 |     Returns:
121 |         char_err_rate: list of CER per utterance.
122 |     """
123 |     char_err_rate = []
124 |     # Perform inference on a batch worth of data at a time.
125 |     [predictions, true_labels] = sess.run([predictions_op,
126 |                                            true_labels_op])
127 |     pred_label = sparse_to_labels(predictions[0][0])
128 |     actual_label = sparse_to_labels(true_labels)
129 |     for (label, pred) in zip(actual_label, pred_label):
130 |         char_err_rate.append(distance(label, pred) / float(len(label)))  # float division: both operands are ints
131 |
132 |     if display:
133 |         # Print sample responses
134 |         for i in range(ARGS.batch_size):
135 |             print(actual_label[i] + ' vs ' + pred_label[i])
136 |     return char_err_rate
137 |
138 |
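# --- Illustrative note (not part of the original file) ---
# The per-utterance CER above is edit distance over reference length, e.g.
# for a made-up pair:
#
#   distance('HELLO', 'HELO') = 1   ->   CER = 1 / 5.0 = 0.2
#
# eval_once() below accumulates these over the whole eval set and reports
# the mean as a percentage.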
139 | def eval_once(sess, saver, summary_writer, predictions_op, summary_op,
140 |               true_labels_op):
141 |     """Run Eval once.
142 |
143 |     Args:
144 |         saver: Saver.
145 |         summary_writer: Summary writer.
146 |         predictions_op: Op to compute predictions.
147 |         summary_op: Summary op.
148 |     """
149 |
150 |     # Initialize weights from checkpoint file.
151 |     global_step = initialize_from_checkpoint(sess, saver)
152 |
153 |     # Start the queue runners.
154 |     coord = tf.train.Coordinator()
155 |     try:
156 |         threads = []
157 |         for queue_runners in tf.get_collection(tf.GraphKeys.QUEUE_RUNNERS):
158 |             threads.extend(queue_runners.create_threads(sess, coord=coord,
159 |                                                         daemon=True,
160 |                                                         start=True))
161 |         # Only using a subset of the training data
162 |         if ARGS.eval_data == 'train':
163 |             num_examples = 2048
164 |         elif ARGS.eval_data == 'val':
165 |             num_examples = 2703
166 |         elif ARGS.eval_data == 'test':
167 |             num_examples = 2620
168 |         num_iter = int(math.ceil(num_examples / float(ARGS.batch_size)))  # float division so ceil rounds up
169 |         step = 0
170 |         char_err_rate = []
171 |         while step < num_iter and not coord.should_stop():
172 |             print "step: ", step
173 |             char_err_rate.extend(inference(predictions_op, true_labels_op,
174 |                                            True, sess))  # inference() returns a list of per-utterance CERs
175 |             step += 1
176 |
177 |         # Compute and print mean CER
178 |         avg_cer = np.mean(char_err_rate) * 100
179 |         print('%s: char_err_rate = %.3f %%' % (datetime.now(), avg_cer))
180 |
181 |         # Add summary ops
182 |         summary = tf.Summary()
183 |         summary.ParseFromString(sess.run(summary_op))
184 |         summary.value.add(tag='char_err_rate', simple_value=avg_cer)
185 |         summary_writer.add_summary(summary, global_step)
186 |     except Exception as exc:  # pylint: disable=broad-except
187 |         coord.request_stop(exc)
188 |
189 |     # Close threads
190 |     coord.request_stop()
191 |     coord.join(threads, stop_grace_period_secs=10)
192 |
193 |
194 | def evaluate():
195 |     """ Evaluate deepSpeech model for a number of steps."""
196 |
197 |     with tf.Graph().as_default() as graph:
198 |         # Get feats and labels for deepSpeech.
199 |         feats, labels, seq_lens = deepSpeech.inputs(ARGS.eval_data,
200 |                                                     data_dir=ARGS.data_dir,
201 |                                                     batch_size=ARGS.batch_size,
202 |                                                     use_fp16=ARGS.use_fp16,
203 |                                                     shuffle=True)
204 |         session = tf.Session()
205 |
206 |         # Build ops that compute the logits predictions from the
207 |         # inference model.
208 |         ARGS.keep_prob = 1.0  # Disable dropout during testing.
209 |         logits = deepSpeech.inference(session, feats, seq_lens, ARGS)
210 |
211 |         # Calculate predictions.
212 |         output_log_prob = tf.nn.log_softmax(logits)
213 |         decoder = tf.nn.ctc_greedy_decoder
214 |         strided_seq_lens = deepSpeech.get_rnn_seqlen(seq_lens)
215 |         predictions = decoder(output_log_prob, strided_seq_lens)
216 |
217 |         # Restore the moving average version of the learned variables for eval.
218 |         variable_averages = tf.train.ExponentialMovingAverage(ARGS.moving_avg_decay)
219 |         variables_to_restore = variable_averages.variables_to_restore()
220 |         saver = tf.train.Saver(variables_to_restore)
221 |
222 |         # Build the summary operation based on the TF collection of Summaries.
223 |         summary_op = tf.summary.merge_all()
224 |         summary_writer = tf.summary.FileWriter(ARGS.eval_dir, graph)
225 |
226 |         while True:
227 |             eval_once(session, saver, summary_writer, predictions, summary_op, labels)
228 |
229 |             if ARGS.run_once:
230 |                 break
231 |             time.sleep(ARGS.eval_interval_secs)
232 |
233 |
234 | def main():
235 |     """
236 |     Create eval directory and perform inference on checkpointed model.
237 | """ 238 | if tf.gfile.Exists(ARGS.eval_dir): 239 | tf.gfile.DeleteRecursively(ARGS.eval_dir) 240 | tf.gfile.MakeDirs(ARGS.eval_dir) 241 | evaluate() 242 | 243 | 244 | if __name__ == '__main__': 245 | main() 246 | -------------------------------------------------------------------------------- /src/deepSpeech_train.py: -------------------------------------------------------------------------------- 1 | # Author: Lakshmi Krishnan 2 | # Email: lkrishn7@ford.com 3 | # Author: YAO Matrix 4 | # Email: yaoweifeng0301@126.com 5 | 6 | 7 | """A script to train a deepSpeech model on LibriSpeech data. 8 | 9 | References: 10 | 1. Hannun, Awni, et al. "Deep speech: Scaling up end-to-end 11 | speech recognition." arXiv preprint arXiv:1412.5567 (2014). 12 | 2. Amodei, Dario, et al. "Deep speech 2: End-to-end 13 | speech recognition in english and mandarin." 14 | arXiv preprint arXiv:1512.02595 (2015). 15 | """ 16 | 17 | from datetime import datetime 18 | import os.path 19 | import re 20 | import time 21 | import argparse 22 | import json 23 | import sys 24 | import numpy as np 25 | import distutils.util 26 | 27 | import tensorflow as tf 28 | from tensorflow.python.client import device_lib 29 | from tensorflow.python.client import timeline 30 | from tensorflow.python import debug as tf_debug 31 | 32 | import helper_routines 33 | from setenvs import setenvs 34 | from setenvs import arglist 35 | 36 | def parse_args(): 37 | " Parses command line arguments." 38 | parser = argparse.ArgumentParser() 39 | parser.add_argument('--train_dir', type=str, 40 | default='../models/librispeech/train', 41 | help='Directory to write event logs and checkpoints') 42 | parser.add_argument('--platform', type=str, 43 | default='knl', 44 | help='running platform: knl or bdw') 45 | parser.add_argument('--data_dir', type=str, 46 | default='', 47 | help='Path to the audio data directory') 48 | parser.add_argument('--max_steps', type=int, default=20000, 49 | help='Number of batches to run') 50 | parser.add_argument('--log_device_placement', type=bool, default=False, 51 | help='Whether to log device placement') 52 | parser.add_argument('--batch_size', type=int, default=32, 53 | help='Number of inputs to process in a batch per GPU') 54 | 55 | feature_parser = parser.add_mutually_exclusive_group(required=False) 56 | feature_parser.add_argument('--shuffle', dest='shuffle', 57 | action='store_true') 58 | feature_parser.add_argument('--no-shuffle', dest='shuffle', 59 | action='store_false') 60 | parser.set_defaults(shuffle=True) 61 | 62 | feature_parser = parser.add_mutually_exclusive_group(required=False) 63 | feature_parser.add_argument('--use_fp16', dest='use_fp16', 64 | action='store_true') 65 | feature_parser.add_argument('--use_fp32', dest='use_fp16', 66 | action='store_false') 67 | parser.set_defaults(use_fp16=False) 68 | 69 | parser.add_argument('--keep_prob', type=float, default=0.5, 70 | help='Keep probability for dropout') 71 | parser.add_argument('--num_hidden', type=int, default=1024, 72 | help='Number of hidden nodes') 73 | parser.add_argument('--num_rnn_layers', type=int, default=2, 74 | help='Number of recurrent layers') 75 | parser.add_argument('--checkpoint', type=str, default=None, 76 | help='Continue training from checkpoint file') 77 | parser.add_argument('--rnn_type', type=str, default='bidirectional', 78 | help='unidirectional or bidirectional') 79 | parser.add_argument('--initial_lr', type=float, default=2e-5, 80 | help='Initial learning rate for training') 81 | parser.add_argument('--num_filters', type=int, 
default=32, 82 | help='Number of convolutional filters') 83 | parser.add_argument('--moving_avg_decay', type=float, default=0.9999, 84 | help='Decay to use for the moving average of weights') 85 | parser.add_argument('--num_epochs_per_decay', type=int, default=5, 86 | help='Epochs after which learning rate decays') 87 | parser.add_argument('--lr_decay_factor', type=float, default=0.999, 88 | help='Learning rate decay factor') 89 | parser.add_argument('--intra_op', type=int, default=44, 90 | help='Intra op thread num') 91 | parser.add_argument('--inter_op', type=int, default=1, 92 | help='Inter op thread num') 93 | parser.add_argument('--engine', type=str, default='tf', 94 | help='Select the engine you use: tf, mkl, mkldnn_rnn, cudnn_rnn') 95 | parser.add_argument('--debug', type=distutils.util.strtobool, default=False, 96 | help='Switch on to enable debug log') 97 | parser.add_argument('--nchw', type=distutils.util.strtobool, default=True, 98 | help='Whether to use nchw memory layout') 99 | parser.add_argument('--dummy', type=distutils.util.strtobool, default=False, 100 | help='Whether to use dummy data rather than librispeech data') 101 | 102 | args = parser.parse_args() 103 | 104 | print "debug: ", args.debug 105 | print "nchw: ", args.nchw 106 | print "dummy: ", args.dummy 107 | print "engine: ", args.engine 108 | print "initial lr: ", args.initial_lr 109 | 110 | # Read architecture hyper-parameters from checkpoint file 111 | # if one is provided. 112 | if args.checkpoint is not None: 113 | param_file = os.path.join(args.checkpoint, 'deepSpeech_parameters.json') 114 | with open(param_file, 'r') as file: 115 | params = json.load(file) 116 | # Read network architecture parameters from previously saved 117 | # parameter file. 118 | args.num_hidden = params['num_hidden'] 119 | args.num_rnn_layers = params['num_rnn_layers'] 120 | args.rnn_type = params['rnn_type'] 121 | args.num_filters = params['num_filters'] 122 | args.use_fp16 = params['use_fp16'] 123 | args.initial_lr = params['initial_lr'] 124 | args.engine = params['engine'] 125 | return args 126 | 127 | ARGS = parse_args() 128 | 129 | import deepSpeech 130 | 131 | g = tf.Graph() 132 | profiling = [] 133 | 134 | def tower_loss(sess, feats, labels, seq_lens): 135 | """Calculate the total loss on a single tower running the deepSpeech model. 136 | 137 | This function builds the graph for computing the loss per tower(GPU). 138 | 139 | ARGS: 140 | feats: Tensor of shape BxFxT representing the 141 | audio features (mfccs or spectrogram). 142 | labels: sparse tensor holding labels of each utterance. 143 | seq_lens: tensor of shape [batch_size] holding 144 | the sequence length per input utterance. 145 | Returns: 146 | Tensor of shape [batch_size] containing 147 | the total loss for a batch of data 148 | """ 149 | 150 | # Build inference Graph. 151 | logits = deepSpeech.inference(sess, feats, seq_lens, ARGS) 152 | 153 | # Build the portion of the Graph calculating the losses. Note that we will 154 | # assemble the total_loss using a custom function below. 155 | total_loss = deepSpeech.loss(logits, labels, seq_lens) 156 | 157 | # Compute the moving average of all individual losses and the total loss. 158 | # loss_averages = tf.train.ExponentialMovingAverage(0.9, name='avg') 159 | # loss_averages_op = loss_averages.apply([total_loss]) 160 | 161 | # Attach a scalar summary to all individual losses and the total loss; 162 | # do the same for the averaged version of the losses. 
163 | loss_name = total_loss.op.name 164 | # Name each loss as '(raw)' and name the moving average 165 | # version of the loss as the original loss name. 166 | tf.summary.scalar(loss_name + '(raw)', total_loss) 167 | 168 | # Without this loss_averages_op would never run 169 | # with tf.control_dependencies([loss_averages_op]): 170 | # total_loss = tf.identity(total_loss) 171 | 172 | return total_loss 173 | 174 | 175 | def average_gradients(tower_grads): 176 | """Calculate the average gradient for each shared variable across all towers. 177 | 178 | Note that this function provides a synchronization point across all towers. 179 | 180 | Args: 181 | tower_grads: List of lists of (gradient, variable) tuples. The outer list 182 | is over individual gradients. The inner list is over the gradient 183 | calculation for each tower. 184 | Returns: 185 | List of pairs of (gradient, variable) where the 186 | gradient has been averaged across all towers. 187 | """ 188 | average_grads = [] 189 | for grad_and_vars in zip(*tower_grads): 190 | # Note that each grad_and_vars looks like the following: 191 | # ((grad0_gpu0, var0_gpu0), ... , (grad0_gpuN, var0_gpuN)) 192 | grads = [] 193 | for each_grad, _ in grad_and_vars: 194 | # Add 0 dimension to the gradients to represent the tower. 195 | expanded_g = tf.expand_dims(each_grad, 0) 196 | 197 | # Append on a 'tower' dimension which we will average over below. 198 | grads.append(expanded_g) 199 | 200 | # Average over the 'tower' dimension. 201 | grad = tf.concat(grads, 0) 202 | grad = tf.reduce_mean(grad, 0) 203 | 204 | # The variables are redundant because they are shared 205 | # across towers. So we will just return the first tower's pointer to 206 | # the Variable. 207 | weights = grad_and_vars[0][1] 208 | grad_and_var = (grad, weights) 209 | average_grads.append(grad_and_var) 210 | return average_grads 211 | 212 | 213 | def set_learning_rate(): 214 | """ Set up learning rate schedule """ 215 | 216 | # Create a variable to count the number of train() calls. 217 | # This equals the number of batches processed. 218 | global_step = tf.get_variable('global_step', [], 219 | initializer=tf.constant_initializer(0), trainable=False) 220 | 221 | # Calculate the learning rate schedule. 222 | num_batches_per_epoch = (deepSpeech.NUM_PER_EPOCH_FOR_TRAIN / ARGS.batch_size) 223 | decay_steps = int(num_batches_per_epoch * ARGS.num_epochs_per_decay) 224 | 225 | # Decay the learning rate exponentially based on the number of steps. 226 | learning_rate = tf.train.exponential_decay(ARGS.initial_lr, 227 | global_step, 228 | decay_steps, 229 | ARGS.lr_decay_factor, 230 | staircase=True) 231 | 232 | return learning_rate, global_step 233 | 234 | 235 | def fetch_data(): 236 | """ Fetch features, labels and sequence_lengths from a common queue.""" 237 | tot_batch_size = ARGS.batch_size 238 | with tf.device('/cpu'): 239 | feats, labels, seq_lens = deepSpeech.inputs(eval_data='train', 240 | data_dir=ARGS.data_dir, 241 | batch_size=tot_batch_size, 242 | use_fp16=ARGS.use_fp16, 243 | shuffle=ARGS.shuffle) 244 | dense_labels = tf.sparse_tensor_to_dense(labels) 245 | tf.Print(dense_labels, [dense_labels], "labels") 246 | 247 | # Split features and labels and sequence lengths for each tower 248 | return feats, labels, seq_lens 249 | 250 | 251 | def get_loss_grads(sess, data, optimizer): 252 | """ Set up loss and gradient ops. 
253 | Add summaries to trainable variables """ 254 | 255 | # Calculate the gradients 256 | [feats, labels, seq_lens] = data 257 | grads_and_vars = None 258 | with tf.device('/cpu'): 259 | # Calculate the loss for the deepSpeech model. 260 | loss = tower_loss(sess, feats, labels, seq_lens) 261 | 262 | # Retain the summaries from the final tower. 263 | summaries = tf.get_collection(tf.GraphKeys.SUMMARIES) 264 | 265 | # Calculate the gradients for the batch of data. 266 | grads_and_vars = optimizer.compute_gradients(loss) 267 | 268 | # Clip the gradients. 269 | clipped_grads_and_vars = [(tf.clip_by_value(grad, clip_value_min=-400, clip_value_max=400), var) for grad, var in grads_and_vars] 270 | 271 | return loss, clipped_grads_and_vars, summaries 272 | 273 | 274 | def run_train_loop(sess, operations, saver): 275 | """ Train the model for required number of steps.""" 276 | (loss_op, train_op, summary_op) = operations 277 | 278 | run_options = None 279 | run_metadata = None 280 | trace_file = None 281 | if ARGS.debug: 282 | run_options = tf.RunOptions(trace_level=tf.RunOptions.FULL_TRACE) 283 | run_metadata = tf.RunMetadata() 284 | trace_file = open('profiling.json', 'w') 285 | 286 | # Evaluate the ops for max_steps 287 | for step in range(ARGS.max_steps): 288 | print "Step: ", step 289 | 290 | start_time = time.time() 291 | 292 | # print "Trainable Variables: " 293 | # tvariables_names = [v.name for v in tf.trainable_variables()] 294 | # tvalues = sess.run(tvariables_names) 295 | # for k, v in zip(tvariables_names, tvalues): 296 | # print "Variable: ", k 297 | # print v 298 | # print "Global Variables: " 299 | # gvariables_names = [v.name for v in tf.global_variables()] 300 | # gvalues = sess.run(gvariables_names) 301 | # for k, v in zip(gvariables_names, gvalues): 302 | # print "Variable: ", k 303 | # print v 304 | # print "Moving Average Variables: " 305 | # mvariables_names = [v.name for v in tf.moving_average_variables()] 306 | # mvalues = sess.run(mvariables_names) 307 | # for k, v in zip(mvariables_names, mvalues): 308 | # print "Variable: ", k 309 | # print v 310 | 311 | loss_value, _ = sess.run([loss_op, train_op], options=run_options, run_metadata=run_metadata) 312 | 313 | duration = time.time() - start_time 314 | assert not np.isnan(loss_value), 'Model diverged with loss = NaN' 315 | 316 | if step >= 10: 317 | profiling.append(duration) 318 | 319 | # tf.clear_collection("losses") 320 | 321 | # Print progress periodically 322 | if step > 10 and step % 10 == 0: 323 | examples_per_sec = (ARGS.batch_size * 1) / np.average(profiling) 324 | format_str = ('%s: step %d, ' 325 | 'loss = %.2f (%.1f examples/sec; %.3f ' 326 | 'sec/batch)') 327 | print(format_str % (datetime.now(), step, loss_value, 328 | examples_per_sec, np.average(profiling) / 1)) 329 | 330 | # Run the summary ops periodically 331 | if 0: 332 | summary_writer = tf.summary.FileWriter(ARGS.train_dir, sess.graph) 333 | summary_writer.add_summary(sess.run(summary_op), step) 334 | 335 | # Save the model checkpoint periodically 336 | if step % 20000 == 0 or (step + 1) == ARGS.max_steps: 337 | checkpoint_path = os.path.join(ARGS.train_dir, 'model.ckpt') 338 | saver.save(sess, checkpoint_path, global_step=step) 339 | 340 | if ARGS.debug and step == 20: 341 | trace = timeline.Timeline(run_metadata.step_stats) 342 | trace_file.write(trace.generate_chrome_trace_format()) 343 | 344 | prof_options = tf.contrib.tfprof.model_analyzer.TRAINABLE_VARS_PARAMS_STAT_OPTIONS 345 | prof_options['output'] = "file:outfile=./params.log" 346 | 
param_stats = tf.contrib.tfprof.model_analyzer.print_model_analysis(tf.get_default_graph(), 347 | tfprof_options=prof_options) 348 | # sys.stdout.write('total_params: %d\n' % param_stats.total_parameters) 349 | 350 | prof_options = tf.contrib.tfprof.model_analyzer.FLOAT_OPS_OPTIONS 351 | prof_options['output'] = "file:outfile=./flops.log" 352 | tf.contrib.tfprof.model_analyzer.print_model_analysis(tf.get_default_graph(), 353 | run_meta=run_metadata, 354 | tfprof_options=prof_options) 355 | 356 | prof_options = tf.contrib.tfprof.model_analyzer.PRINT_ALL_TIMING_MEMORY 357 | prof_options['output'] = "file:outfile=./timing_memory.log" 358 | prof_options['start_name_regexes'] = "ctc_loss" 359 | tf.contrib.tfprof.model_analyzer.print_model_analysis(tf.get_default_graph(), 360 | tfprof_cmd='graph', 361 | run_meta=run_metadata, 362 | tfprof_options=prof_options) 363 | 364 | 365 | def initialize_from_checkpoint(sess, saver): 366 | """ Initialize variables on the graph""" 367 | # Initialise variables from a checkpoint file, if provided. 368 | ckpt = tf.train.get_checkpoint_state(ARGS.checkpoint) 369 | if ckpt and ckpt.model_checkpoint_path: 370 | # Restores from checkpoint 371 | saver.restore(sess, ckpt.model_checkpoint_path) 372 | # Assuming model_checkpoint_path looks something like: 373 | # /my-favorite-path/train/model.ckpt-0, 374 | # extract global_step from it. 375 | checkpoint_path = ckpt.model_checkpoint_path 376 | global_step = checkpoint_path.split('/')[-1].split('-')[-1] 377 | return global_step 378 | else: 379 | print('No checkpoint file found') 380 | return 381 | 382 | 383 | def add_summaries(summaries, learning_rate, grads): 384 | """ Add summary ops""" 385 | 386 | # Track quantities for Tensorboard display 387 | summaries.append(tf.summary.scalar('learning_rate', learning_rate)) 388 | # Add histograms for gradients. 389 | for grad, var in grads: 390 | if grad is not None: 391 | summaries.append(tf.summary.histogram(var.op.name + '/gradients', grad)) 392 | # Add histograms for trainable variables. 393 | for var in tf.trainable_variables(): 394 | summaries.append(tf.summary.histogram(var.op.name, var)) 395 | 396 | # Build the summary operation from the last tower summaries. 397 | summary_op = tf.summary.merge(summaries) 398 | return summary_op 399 | 400 | 401 | def train(): 402 | """ 403 | Train deepSpeech for a number of steps. 404 | This function build a set of ops required to build the model and optimize 405 | weights. 406 | """ 407 | with g.as_default(), tf.device('/device:GPU:0'): 408 | # Learning rate set up 409 | learning_rate, global_step = set_learning_rate() 410 | 411 | # Create an optimizer that performs gradient descent. 412 | optimizer = tf.train.AdamOptimizer(learning_rate) 413 | 414 | # Fetch a batch worth of data 415 | data = fetch_data() 416 | 417 | # Start running operations on the Graph. allow_soft_placement 418 | # must be set to True to build towers on GPU, as some of the 419 | # ops do not have GPU implementations. 420 | sess = tf.Session(config=tf.ConfigProto(allow_soft_placement=True, 421 | log_device_placement=ARGS.log_device_placement)) 422 | 423 | # Construct loss and gradient ops 424 | loss_op, grads, summaries = get_loss_grads(sess, data, optimizer) 425 | 426 | # Apply the gradients to adjust the shared variables. 427 | apply_gradient_op = optimizer.apply_gradients(grads, 428 | global_step=global_step) 429 | 430 | # Track the moving averages of all trainable variables. 
431 |         # variable_averages = tf.train.ExponentialMovingAverage(ARGS.moving_avg_decay, global_step)
432 |         # variables_averages_op = variable_averages.apply(tf.trainable_variables())
433 | 
434 |         # Group all updates into a single train op.
435 |         # train_op = tf.group(apply_gradient_op, variables_averages_op)
436 | 
437 |         train_op = apply_gradient_op
438 | 
439 |         # Build summary op.
440 |         summary_op = add_summaries(summaries, learning_rate, grads)
441 | 
442 |         # Create a saver.
443 |         saver = tf.train.Saver(tf.global_variables(), max_to_keep=100)
444 | 
445 |         # sess = tf_debug.LocalCLIDebugWrapperSession(sess)
446 |         # sess.add_tensor_filter("has_inf_or_nan", tf_debug.has_inf_or_nan)
447 | 
448 |         # Initialize variables, from a checkpoint if one is provided.
449 |         if ARGS.checkpoint is not None:
450 |             print "Restoring variables from checkpoint"
451 |             global_step = initialize_from_checkpoint(sess, saver)
452 |         else:
453 |             print "No checkpoint provided; initializing variables from scratch"
454 |             sess.run(tf.global_variables_initializer())
455 | 
456 |         # print "Trainable Variables: "
457 |         # tvariables_names = [v.name for v in tf.trainable_variables()]
458 |         # tvalues = sess.run(tvariables_names)
459 |         # for k, v in zip(tvariables_names, tvalues):
460 |         #     print "Variable: ", k
461 |         # print "Global Variables: "
462 |         # gvariables_names = [v.name for v in tf.global_variables()]
463 |         # gvalues = sess.run(gvariables_names)
464 |         # for k, v in zip(gvariables_names, gvalues):
465 |         #     print "Variable: ", k
466 |         # print "Moving Average Variables: "
467 |         # mvariables_names = [v.name for v in tf.moving_average_variables()]
468 |         # mvalues = sess.run(mvariables_names)
469 |         # for k, v in zip(mvariables_names, mvalues):
470 |         #     print "Variable: ", k
471 | 
472 |         # Start the queue runners.
473 |         tf.train.start_queue_runners(sess)
474 | 
475 |         g.finalize()
476 | 
477 |         # Run training loop.
478 |         run_train_loop(sess, (loss_op, train_op, summary_op), saver)
479 | 
480 | 
481 | def main():
482 |     """
483 |     Creates a checkpoint directory to save training progress and records
484 |     training parameters in a json file before initiating the training session.
485 |     """
486 |     if ARGS.train_dir != ARGS.checkpoint:
487 |         if tf.gfile.Exists(ARGS.train_dir):
488 |             tf.gfile.DeleteRecursively(ARGS.train_dir)
489 |         tf.gfile.MakeDirs(ARGS.train_dir)
490 | 
491 |     # Dump command line arguments to a parameter file,
492 |     # in case the network training resumes at a later time.
493 |     with open(os.path.join(ARGS.train_dir,
494 |                            'deepSpeech_parameters.json'), 'w') as outfile:
495 |         json.dump(vars(ARGS), outfile, sort_keys=True, indent=4)
496 | 
497 |     args = setenvs(sys.argv)
498 | 
499 |     train()
500 | 
501 | if __name__ == '__main__':
502 |     main()
503 | 
--------------------------------------------------------------------------------
/src/helper_routines.py:
--------------------------------------------------------------------------------
1 | """
2 | Collection of helper routines to set up variables on cpu
3 | and apply weight decay.
4 | """
5 | import re
6 | import tensorflow as tf
7 | 
8 | 
9 | # If a model is trained with multiple GPUs, prefix all Op names with tower_name
10 | # to differentiate the operations. Note that this prefix is removed from the
11 | # names of the summaries when visualizing a model.
12 | TOWER_NAME = 'tower'
13 | 
14 | 
15 | def _activation_summary(act):
16 |     """Helper to create summaries for activations.
17 | 
18 |     Creates a summary that provides a histogram of activations.
19 |     Creates a summary that measures the sparsity of activations.
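    For example, an activation tensor named 'tower_0/conv1/Relu' is
    summarized under 'conv1/Relu/activations' and 'conv1/Relu/sparsity'.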
20 | 
21 |     Args:
22 |         act: Tensor
23 |     """
24 |     # Remove 'tower_[0-9]/' from the name in case this is a multi-GPU training
25 |     # session. This helps the clarity of presentation on tensorboard.
26 |     tensor_name = re.sub('%s_[0-9]*/' % TOWER_NAME, '', act.op.name)
27 |     tf.summary.histogram(tensor_name + '/activations', act)
28 |     tf.summary.scalar(tensor_name + '/sparsity', tf.nn.zero_fraction(act))
29 | 
30 | 
31 | def _variable_on_cpu(name, shape, initializer=None, use_fp16=False, trainable=True):
32 |     """Helper to create a Variable stored on cpu memory.
33 | 
34 |     Args:
35 |         name: name of the variable
36 |         shape: list of ints
37 |         initializer: initializer for the Variable
38 | 
39 |     Returns:
40 |         Variable Tensor
41 |     """
42 |     with tf.device('/cpu'):
43 |         dtype = tf.float16 if use_fp16 else tf.float32
44 |         var = tf.get_variable(name, shape,
45 |                               initializer=initializer, dtype=dtype, trainable=trainable)
46 |     return var
47 | 
48 | 
49 | def _variable_with_weight_decay(name, shape, wd_value, use_fp16):
50 |     """Helper to create an initialized Variable with weight decay.
51 | 
52 |     Note that the Variable is initialized with a truncated normal distribution.
53 |     A weight decay is added only if one is specified.
54 | 
55 |     Args:
56 |         name: name of the variable
57 |         shape: list of ints
58 |         wd_value: add L2Loss weight decay multiplied by this float. If None,
59 |             weight decay is not added for this Variable.
60 | 
61 |     Returns:
62 |         Variable Tensor
63 |     """
64 |     dtype = tf.float16 if use_fp16 else tf.float32
65 |     var = _variable_on_cpu(name, shape,
66 |                            tf.contrib.layers.variance_scaling_initializer(mode='FAN_IN',
67 |                                                                           uniform=False,
68 |                                                                           seed=None,
69 |                                                                           dtype=dtype), use_fp16)
70 |     if wd_value is not None:
71 |         weight_decay = tf.cast(tf.multiply(tf.nn.l2_loss(var),
72 |                                            wd_value, name='weight_loss'),
73 |                                tf.float32)
74 |         tf.add_to_collection('losses', weight_decay)
75 |     return var
76 | 
--------------------------------------------------------------------------------
/src/mkldnn_rnn_op.py:
--------------------------------------------------------------------------------
1 | """
2 | Custom RNN Cell definition.
3 | Default RNNCell in TensorFlow throws errors when
4 | variables are re-used between devices.
5 | """
6 | import tensorflow as tf
7 | 
8 | from tensorflow.contrib.rnn import BasicRNNCell
9 | from tensorflow.python.util import nest
10 | from tensorflow.python.training import moving_averages
11 | 
12 | from tensorflow.contrib.mkldnn_rnn.python.ops import mkldnn_rnn_ops
13 | 
14 | from helper_routines import _variable_on_cpu
15 | 
16 | class MkldnnRNNCell(BasicRNNCell):
17 |     """ An RNN cell backed by the MKL-DNN engine. The matrix of weights is
18 |     set using _variable_on_cpu.
19 |     The default version of BasicRNNCell did not support the ability to
20 |     pin weights on one device (say, the cpu).
21 | """ 22 | def __init__(self, sess, num_units, input_size = None, activation=tf.nn.relu6, use_fp16=False): 23 | self._num_units = num_units 24 | self.use_fp16 = use_fp16 25 | self.model = mkldnn_rnn_ops.MkldnnRNNRelu(1, self._num_units, input_size, dropout=0.0) 26 | param_size_t = self.model.params_size() 27 | if sess is not None: 28 | self.param_size = sess.run(param_size_t) 29 | # print "param size: ", self.param_size 30 | 31 | def __call__(self, inputs, state, scope=None, weight_size=None): 32 | with tf.variable_scope(scope or type(self).__name__): 33 | # if len(inputs.get_shape()) == 2: 34 | # inputs = tf.expand_dims(inputs, axis=0) 35 | # state = tf.expand_dims(state, axis=0) 36 | # print "input size: ", inputs.get_shape(), " state size: ", state.get_shape() 37 | rnn_weights = _variable_on_cpu("rnn_weights", [self.param_size], tf.constant_initializer(1.0 / self.param_size), self.use_fp16) 38 | output, output_h = self.model(input_data=inputs, 39 | input_h=state, 40 | params=rnn_weights, 41 | is_training=True) 42 | # print "output size: ", output.get_shape(), "output h size: ", output_h.get_shape() 43 | # output = tf.squeeze(output, axis=0) 44 | # output_h = tf.squeeze(output_h, axis=0) 45 | return output, output_h 46 | -------------------------------------------------------------------------------- /src/preprocess_LibriSpeech.py: -------------------------------------------------------------------------------- 1 | # Author: Lakshmi Krishnan 2 | # Email: lkrishn7@ford.com 3 | 4 | """Creates SequenceExamples and stores them in TFRecords format. 5 | 6 | Computes spectral features from raw audio waveforms and groups the audio into 7 | multiple TFRecords files based on their length. The utterances are stored in 8 | sorted order based on length to allow for sorta-grad implementation. 9 | 10 | Note: 11 | This script can take a few hours to run to compute and store the mfcc 12 | features on the 100 hour Librispeech dataset. 
13 | 
14 | """
15 | import os
16 | import glob2
17 | import soundfile as sf
18 | from python_speech_features import mfcc
19 | import numpy as np
20 | import tensorflow as tf
21 | from tqdm import tqdm
22 | 
23 | 
24 | def compute_linear_specgram(samples,
25 |                             sample_rate,
26 |                             stride_ms=10.0,
27 |                             window_ms=20.0,
28 |                             max_freq=None,
29 |                             eps=1e-14):
30 |     """Compute the linear spectrogram from FFT energy."""
31 |     if max_freq is None:
32 |         max_freq = sample_rate / 2
33 |     if max_freq > sample_rate / 2:
34 |         raise ValueError("max_freq must not be greater than half of "
35 |                          "sample rate.")
36 |     if stride_ms > window_ms:
37 |         raise ValueError("Stride size must not be greater than "
38 |                          "window size.")
39 |     stride_size = int(0.001 * sample_rate * stride_ms)
40 |     window_size = int(0.001 * sample_rate * window_ms)
41 | 
42 |     ## z-score normalizer
43 |     # samples = samples - np.mean(samples)
44 |     # samples = samples / np.std(samples)
45 | 
46 |     specgram, freqs = _specgram_real(samples,
47 |                                      window_size=window_size,
48 |                                      stride_size=stride_size,
49 |                                      sample_rate=sample_rate)
50 |     ind = np.where(freqs <= max_freq)[0][-1] + 1
51 |     spectrogram = np.log(specgram[:ind, :] + eps)
52 | 
53 |     spectrogram = spectrogram.transpose()
54 | 
55 |     # z-score normalizer
56 |     spectrogram = spectrogram - np.mean(spectrogram)
57 |     spectrogram = spectrogram / np.std(spectrogram)
58 | 
59 |     # print "spectrogram shape: ", spectrogram.shape
60 |     return spectrogram
61 | 
62 | def _specgram_real(samples, window_size, stride_size, sample_rate):
63 |     """Compute the spectrogram for samples from a real signal."""
64 |     # extract strided windows
65 |     truncate_size = (len(samples) - window_size) % stride_size
66 |     samples = samples[:len(samples) - truncate_size]
67 |     nshape = (window_size, (len(samples) - window_size) // stride_size + 1)
68 |     nstrides = (samples.strides[0], samples.strides[0] * stride_size)
69 |     windows = np.lib.stride_tricks.as_strided(samples, shape=nshape, strides=nstrides)
70 |     assert np.all(windows[:, 1] == samples[stride_size:(stride_size + window_size)])
71 |     # window weighting, squared Fast Fourier Transform (fft), scaling
72 |     weighting = np.hanning(window_size)[:, None]
73 |     fft = np.fft.rfft(windows * weighting, axis=0)
74 |     fft = np.absolute(fft)
75 |     fft = fft**2
76 |     scale = np.sum(weighting**2) * sample_rate
77 |     fft[1:-1, :] *= (2.0 / scale)
78 |     fft[(0, -1), :] /= scale
79 |     # prepare fft frequency list
80 |     freqs = float(sample_rate) / window_size * np.arange(fft.shape[0])
81 |     return fft, freqs
82 | 
83 | 
84 | def compute_mfcc(audio_data, sample_rate):
85 |     ''' Computes the mel-frequency cepstral coefficients.
86 |     The audio time series is normalised and its mfcc features are computed.
87 | 
88 |     Args:
89 |         audio_data: time series of the speech utterance.
90 |         sample_rate: sampling rate.
91 |     Returns:
92 |         mfcc_feat: [num_frames x F] matrix representing the mfcc.
93 | 
94 |     '''
95 | 
96 |     # z-score normalizer
97 |     audio_data = audio_data - np.mean(audio_data)
98 |     audio_data = audio_data / np.std(audio_data)
99 | 
100 |     mfcc_feat = mfcc(audio_data, sample_rate, winlen=0.02, winstep=0.01,
101 |                      numcep=13, nfilt=26, nfft=512, lowfreq=0, highfreq=None,
102 |                      preemph=0.97, ceplifter=22, appendEnergy=True)
103 |     print "mfcc shape: ", mfcc_feat.shape
104 |     return mfcc_feat
105 | 
106 | 
107 | def make_example(seq_len, spec_feat, labels):
108 |     ''' Creates a SequenceExample for a single utterance.
109 |     This function makes a SequenceExample given the sequence length,
110 |     mfcc features and corresponding transcript.
These sequence examples are read using tf.parse_single_sequence_example
112 |     during training.
113 | 
114 |     Note: Some of the tf modules used in this function (such as
115 |     tf.train.Feature) do not have comprehensive documentation in v0.12.
116 |     This function was put together using the test routines in the
117 |     tensorflow repo.
118 |     See: https://github.com/tensorflow/tensorflow/
119 |     blob/246a3724f5406b357aefcad561407720f5ccb5dc/
120 |     tensorflow/python/kernel_tests/parsing_ops_test.py
121 | 
122 | 
123 |     Args:
124 |         seq_len: integer representing the sequence length in time frames.
125 |         spec_feat: [TxF] matrix of mfcc features.
126 |         labels: list of ints representing the encoded transcript.
127 |     Returns:
128 |         Serialized sequence example.
129 | 
130 |     '''
131 |     # Feature lists for the sequential features of the example
132 |     feats_list = [tf.train.Feature(float_list=tf.train.FloatList(value=frame))
133 |                   for frame in spec_feat]
134 |     feat_dict = {"feats": tf.train.FeatureList(feature=feats_list)}
135 |     sequence_feats = tf.train.FeatureLists(feature_list=feat_dict)
136 | 
137 |     # Context features for the entire sequence
138 |     len_feat = tf.train.Feature(int64_list=tf.train.Int64List(value=[seq_len]))
139 |     label_feat = tf.train.Feature(int64_list=tf.train.Int64List(value=labels))
140 | 
141 |     context_feats = tf.train.Features(feature={"seq_len": len_feat,
142 |                                                "labels": label_feat})
143 | 
144 |     ex = tf.train.SequenceExample(context=context_feats,
145 |                                   feature_lists=sequence_feats)
146 | 
147 |     return ex.SerializeToString()
148 | 
149 | 
150 | def process_data(partition):
151 |     """ Reads audio waveforms and transcripts from a dataset partition
152 |     and generates mfcc features.
153 | 
154 |     Args:
155 |         partition - represents the dataset partition name.
156 | 
157 |     Returns:
158 |         feats: dict containing mfcc feature per utterance
159 |         transcripts: dict of lists representing transcript.
160 |         utt_len: dict of ints holding sequence length of each
161 |             utterance in time frames.
162 | 
163 |     """
164 | 
165 |     feats = {}
166 |     transcripts = {}
167 |     utt_len = {}  # Required for sorting the utterances based on length
168 | 
169 |     for filename in glob2.iglob(partition + '/**/*.txt'):
170 |         with open(filename, 'r') as f:
171 |             for line in f:
172 |                 parts = line.split()
173 |                 audio_file = parts[0]
174 |                 file_path = os.path.join(os.path.dirname(filename), audio_file + '.flac')
175 |                 audio, sample_rate = sf.read(file_path)
176 |                 # feats[audio_file] = compute_mfcc(audio, sample_rate)
177 |                 feats[audio_file] = compute_linear_specgram(audio, sample_rate)
178 |                 utt_len[audio_file] = feats[audio_file].shape[0]
179 |                 target = ' '.join(parts[1:])
180 |                 transcripts[audio_file] = [CHAR_TO_IX[i] for i in target]
181 |                 if ((utt_len[audio_file] - 19) // 2 - 9) // 2 == 60:
182 |                     print("file[%s] -- utterance length: %d, transcript length: %d" % (audio_file, ((utt_len[audio_file] - 19) // 2 - 9) // 2, len(transcripts[audio_file])))
183 |     return feats, transcripts, utt_len
184 | 
185 | 
186 | def create_records():
187 |     """ Pre-processes the raw audio and generates TFRecords.
188 |     This function computes the mfcc features, encodes string transcripts
189 |     into integers, and generates sequence examples for each utterance.
190 |     Multiple sequence records are then written into TFRecord files.
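    For example, since training utterances are bucketed by
    int(num_frames / 100), a 740-frame utterance is written to
    train_7.tfrecords.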
191 | """ 192 | for partition in sorted(glob2.glob(AUDIO_PATH + '/*')): 193 | if os.path.isfile(partition): 194 | continue 195 | print('Processing ' + partition) 196 | feats, transcripts, utt_len = process_data(partition) 197 | sorted_utts = sorted(utt_len, key=utt_len.get) 198 | # bin into groups of 100 frames. 199 | max_t = int(utt_len[sorted_utts[-1]] / 100) 200 | min_t = int(utt_len[sorted_utts[0]] / 100) 201 | 202 | # Create destination directory 203 | write_dir = os.path.join(AUDIO_PATH, '../../processed', partition.split('/')[-1]) 204 | if tf.gfile.Exists(write_dir): 205 | tf.gfile.DeleteRecursively(write_dir) 206 | tf.gfile.MakeDirs(write_dir) 207 | 208 | if os.path.basename(partition) == 'train-clean-100': 209 | # Create multiple TFRecords based on utterance length for training 210 | writer = {} 211 | count = {} 212 | print('Processing training files...') 213 | for i in range(min_t, max_t + 1): 214 | filename = os.path.join(write_dir, 'train' + '_' + str(i) + '.tfrecords') 215 | writer[i] = tf.python_io.TFRecordWriter(filename) 216 | count[i] = 0 217 | 218 | for utt in tqdm(sorted_utts): 219 | example = make_example(utt_len[utt], feats[utt].tolist(), transcripts[utt]) 220 | index = int(utt_len[utt] / 100) 221 | writer[index].write(example) 222 | count[index] += 1 223 | 224 | for i in range(min_t, max_t + 1): 225 | writer[i].close() 226 | print(count) 227 | 228 | # Remove bins which have fewer than 20 utterances 229 | for i in range(min_t, max_t + 1): 230 | if count[i] < 20: 231 | os.remove(os.path.join(write_dir, 'train' + '_' + str(i) + '.tfrecords')) 232 | else: 233 | # Create single TFRecord for dev and test partition 234 | filename = os.path.join(write_dir, os.path.basename(write_dir) + '.tfrecords') 235 | print('Creating', filename) 236 | record_writer = tf.python_io.TFRecordWriter(filename) 237 | for utt in sorted_utts: 238 | example = make_example(utt_len[utt], feats[utt].tolist(), transcripts[utt]) 239 | record_writer.write(example) 240 | record_writer.close() 241 | print('Processed ' + str(len(sorted_utts)) + ' audio files') 242 | 243 | # Audio path is the location of the directory that contains the librispeech 244 | # data partitioned into three folders: dev-clean, train-clean-100, test-clean 245 | AUDIO_PATH = '/home/matrix/data/librispeech/LibriSpeech/' # '../data/LibriSpeech/audio' 246 | ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ' " 247 | CHAR_TO_IX = {ch: i for (i, ch) in enumerate(ALPHABET)} 248 | 249 | if __name__ == '__main__': 250 | create_records() 251 | -------------------------------------------------------------------------------- /src/setenvs.py: -------------------------------------------------------------------------------- 1 | from __future__ import absolute_import 2 | from __future__ import division 3 | from __future__ import print_function 4 | 5 | import os 6 | import sys 7 | 8 | class arglist: 9 | platform = 'knl' 10 | 11 | def setenvs(inargv): 12 | args = arglist() 13 | for i in range(0, len(inargv) - 1): 14 | if inargv[i] == '--platform' : 15 | args.platform = inargv[i + 1] 16 | assert (args.platform == 'knl' or args.platform == 'bdw') 17 | # print 'Using platform ', args.platform 18 | # print 'Groups set to ', args.groups 19 | if (args.platform == 'bdw'): 20 | os.environ["KMP_BLOCKTIME"] = "1" 21 | os.environ["KMP_SETTINGS"] = "1" 22 | os.environ["OMP_NUM_THREADS"] = "8" 23 | os.environ["MKL_NUM_THREADS"] = "8" 24 | os.environ["OMP_DYNAMIC"] = "false" 25 | os.environ["KMP_AFFINITY"]= "granularity=fine,verbose,compact,1,0" 26 | else: 27 | 
os.environ["KMP_BLOCKTIME"] = "0" 28 | os.environ["KMP_SETTINGS"] = "1" 29 | os.environ["OMP_NUM_THREADS"] = "8" 30 | os.environ["MKL_NUM_THREADS"] = "8" 31 | os.environ["OMP_DYNAMIC"] = "false" 32 | os.environ["KMP_AFFINITY"] = "granularity=fine,verbose,explicit,proclist=[4-67]" 33 | # os.environ["KMP_AFFINITY"] = "granularity=core,verbose,scatter" 34 | return args 35 | -------------------------------------------------------------------------------- /src/test.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | # This script trains a deepspeech model in tensorflow with sorta-grad. 3 | # usage ./train.sh or ./train.sh dummy 4 | 5 | 6 | clear 7 | cur_dir=$(cd "$(dirname $0)";pwd) 8 | # echo ${cur_dir} 9 | export PYTHONPATH=${cur_path}:/home/matrix/inteltf/:$PYTHONPATH 10 | # echo $PYTHONPATH 11 | export LD_LIBRARY_PATH=/usr/local/cuda/extras/CUPTI/lib64/:$LD_LIBRARY_PATH 12 | 13 | # activate Intel Python 14 | # source /opt/intel/intelpython2/bin/activate 15 | 16 | # environment variables 17 | unset TF_CPP_MIN_VLOG_LEVEL 18 | # export TF_CPP_MIN_VLOG_LEVEL=1 19 | 20 | # clear 21 | echo "-----------------------------------" 22 | echo "Start testing" 23 | 24 | nchw=True # True or False 25 | engine="mkl" # tf, mkl, cudnn_rnn, mkldnn_rnn 26 | 27 | config_check_one=`test "${nchw}" = "False" && test "${engine}"x = "tf"x -o "${engine}"x = "cudnn_rnn"x && echo 'OK'` 28 | # echo "check one: "$config_check_one 29 | config_check_two=`test "${nchw}" = "True" && test "${engine}"x == "mkl"x -o "${engine}"x = "mkldnn_rnn"x && echo 'OK'` 30 | # echo "check two: "$config_check_two 31 | check=`test ${config_check_one}x = "OK"x -o ${config_check_two}x = "OK"x && echo 'OK'` 32 | # echo "check: "$check 33 | 34 | if [[ ${check}x != "OK"x ]];then 35 | echo "unsupported configuration conbimation" 36 | exit -1 37 | fi 38 | 39 | python deepSpeech_test.py --eval_data 'test' --nchw ${nchw} --engine ${engine} --run_once True 40 | echo "Done" 41 | 42 | # deactivate Intel Python 43 | # source /opt/intel/intelpython2/bin/deactivate 44 | 45 | -------------------------------------------------------------------------------- /src/train.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | clear 4 | cur_dir=$(cd "$(dirname $0)";pwd) 5 | # echo ${cur_dir} 6 | export PYTHONPATH=${cur_path}:$PYTHONPATH 7 | # echo $PYTHONPATH 8 | export LD_LIBRARY_PATH=/usr/local/cuda/extras/CUPTI/lib64/:$LD_LIBRARY_PATH 9 | 10 | # activate Intel Python 11 | # source /opt/intel/intelpython2/bin/activate 12 | 13 | # environment variables 14 | unset TF_CPP_MIN_VLOG_LEVEL 15 | # export TF_CPP_MIN_VLOG_LEVEL=2 16 | 17 | # clear 18 | echo "-----------------------------------" 19 | echo "Start training" 20 | 21 | dummy=False # True or False 22 | nchw=False # True or False 23 | debug=False # True or False 24 | engine="tf" # tf, mkl, cudnn_rnn, mkldnn_rnn 25 | 26 | # echo $dummy 27 | 28 | config_check_one=`test "${nchw}" = "False" && test "${engine}"x = "tf"x -o "${engine}"x = "cudnn_rnn"x && echo 'OK'` 29 | # echo "check one: "$config_check_one 30 | config_check_two=`test "${nchw}" = "True" && test "${engine}"x == "mkl"x -o "${engine}"x = "mkldnn_rnn"x && echo 'OK'` 31 | # echo "check two: "$config_check_two 32 | check=`test ${config_check_one}x = "OK"x -o ${config_check_two}x = "OK"x && echo 'OK'` 33 | # echo "check: "$check 34 | 35 | if [[ ${check}x != "OK"x ]];then 36 | echo "unsupported configuration conbimation" 37 | exit -1 38 | fi 39 | 40 | 
40 | model_dir='../models/librispeech/train'
41 | data_dir='/home/matrix/data/processed/'
42 | python deepSpeech_train.py --batch_size 32 --no-shuffle --max_steps 400000 --num_rnn_layers 3 --num_hidden 1024 --num_filters 32 --initial_lr 5e-4 --train_dir $model_dir --data_dir $data_dir --debug ${debug} --nchw ${nchw} --engine ${engine} --dummy ${dummy}
43 | 
44 | echo "Done"
45 | 
46 | # deactivate Intel Python
47 | # source /opt/intel/intelpython2/bin/deactivate
48 | 
--------------------------------------------------------------------------------
/src/validation.sh:
--------------------------------------------------------------------------------
1 | #!/bin/bash
2 | # This script runs validation for a trained deepSpeech model.
3 | # usage: ./validation.sh
4 | 
5 | 
6 | clear
7 | cur_dir=$(cd "$(dirname $0)";pwd)
8 | # echo ${cur_dir}
9 | export PYTHONPATH=${cur_dir}:/home/matrix/inteltf/:$PYTHONPATH
10 | # echo $PYTHONPATH
11 | export LD_LIBRARY_PATH=/usr/local/cuda/extras/CUPTI/lib64/:$LD_LIBRARY_PATH
12 | 
13 | # activate Intel Python
14 | # source /opt/intel/intelpython2/bin/activate
15 | 
16 | # environment variables
17 | unset TF_CPP_MIN_VLOG_LEVEL
18 | # export TF_CPP_MIN_VLOG_LEVEL=1
19 | 
20 | # clear
21 | echo "-----------------------------------"
22 | echo "Start validation"
23 | 
24 | nchw=True     # True or False
25 | engine="mkl"  # tf, mkl, cudnn_rnn, mkldnn_rnn
26 | 
27 | 
28 | config_check_one=`test "${nchw}" = "False" && test "${engine}"x = "tf"x -o "${engine}"x = "cudnn_rnn"x && echo 'OK'`
29 | # echo "check one: "$config_check_one
30 | config_check_two=`test "${nchw}" = "True" && test "${engine}"x = "mkl"x -o "${engine}"x = "mkldnn_rnn"x && echo 'OK'`
31 | # echo "check two: "$config_check_two
32 | check=`test ${config_check_one}x = "OK"x -o ${config_check_two}x = "OK"x && echo 'OK'`
33 | # echo "check: "$check
34 | 
35 | if [[ ${check}x != "OK"x ]];then
36 |     echo "unsupported configuration combination"
37 |     exit -1
38 | fi
39 | 
40 | python deepSpeech_test.py --eval_data 'val' --nchw ${nchw} --engine ${engine}
41 | 
42 | echo "Done"
43 | 
44 | # deactivate Intel Python
45 | # source /opt/intel/intelpython2/bin/deactivate
46 | 
--------------------------------------------------------------------------------
/src/vtune.sh:
--------------------------------------------------------------------------------
1 | source /opt/intel/vtune_amplifier_xe/amplxe-vars.sh
2 | source /opt/intel/vtune_amplifier_xe/sep_vars.sh
3 | source /opt/intel/advisor/advixe-vars.sh
4 | 
5 | # amplxe-cl -collect advanced-hotspots -- ./train.sh
6 | amplxe-cl -collect hotspots -- ./train.sh
7 | 
8 | # amplxe-cl -collect memory-access -- ./train.sh
9 | 
10 | # amplxe-cl -collect-with runsa -knob event-config=?
-- ./train.sh 11 | # amplxe-cl -collect general-exploration -- ./train.sh 12 | # amplxe-gui 13 | -------------------------------------------------------------------------------- /tools/parse_log.py: -------------------------------------------------------------------------------- 1 | import fileinput as fin 2 | 3 | # funcs: 4 | def findValWithFormat(line): 5 | lines.append(line) 6 | taken = line.split(" ") 7 | raw_val = taken[-1] 8 | val = raw_val.split("/")[-1] 9 | val = val[0:-2] 10 | if 'us' in val: 11 | val = float(val[0:val.find('us')]) 12 | val = val/1000 13 | else: 14 | val = float(val[0:val.find('ms')]) 15 | return val 16 | 17 | def getCellNum(line): 18 | cell_num = line[line.find(rnn_cell_string):line.find(rnn_cell_string) + len(rnn_cell_string) + 1] 19 | return cell_num 20 | 21 | def profRNNCell(line, rnncell_prof): 22 | cell_num = getCellNum(line) 23 | val = findValWithFormat(line) 24 | rnncell_prof[cell_num] += val 25 | 26 | # variables: 27 | lines = [] 28 | module_rnncell = "CustomRNNCell2" 29 | module_grad = 'gradients' 30 | num_rnn_layer = 7 31 | rnn_cell_string = "cell_" 32 | module_rnn = 'rnn' 33 | module_conv1 = 'conv1' 34 | module_conv2 = 'conv2' 35 | module_softmax = 'softmax_linear' 36 | module_ctc = ['ctc_loss', 'CTCLoss'] 37 | module_bn = 'bn2' 38 | 39 | rnn_cells = [rnn_cell_string+str(i) for i in range(num_rnn_layer)] 40 | 41 | rnncell_f_prof = dict.fromkeys(rnn_cells) 42 | rnncell_b_prof = dict.fromkeys(rnn_cells) 43 | 44 | # prf estimator: 45 | for el in rnncell_f_prof: 46 | rnncell_f_prof[el] = 0.0 47 | for el in rnncell_b_prof: 48 | rnncell_b_prof[el] = 0.0 49 | 50 | overall_cost = 0.0 51 | 52 | profs ={\ 53 | 'rnn_trans_f_prof': 0.0, \ 54 | 'rnn_trans_b_prof': 0.0, \ 55 | 'rnn_reshape_f_prof': 0.0, \ 56 | 'rnn_reshape_b_prof': 0.0, \ 57 | 'rnn_ReverseSequence_f_prof': 0.0, \ 58 | 'rnn_ReverseSequence_b_prof': 0.0, \ 59 | 'conv1_f_prof': 0.0, \ 60 | 'conv1_b_prof': 0.0, \ 61 | 'bn1_f_prof': 0.0, \ 62 | 'bn1_b_prof': 0.0, \ 63 | 'relu1_f_prof': 0.0, \ 64 | 'relu1_b_prof': 0.0, \ 65 | 'conv2_f_prof': 0.0, \ 66 | 'conv2_b_prof': 0.0, \ 67 | 'bn2_f_prof': 0.0, \ 68 | 'bn2_b_prof': 0.0, \ 69 | 'relu2_f_prof': 0.0, \ 70 | 'relu2_b_prof': 0.0, \ 71 | 'softmax_f_prof': 0.0, \ 72 | 'softmax_b_prof': 0.0, \ 73 | 'ctc_f_prof': 0.0, \ 74 | 'ctc_b_prof': 0.0 \ 75 | } 76 | 77 | 78 | with open('timing_memory.log', 'r') as f: 79 | for line in f: 80 | if len(line) > 3: 81 | if ((line[3] != ' ') or 'Adam/update_' in line) and ('flops' not in line): 82 | # flops is not considered 83 | # conv1 84 | if (module_grad not in line) and (module_conv1 in line) and ('Minimum' not in line) and ('Relu' not in line) and (module_bn not in line): 85 | val = findValWithFormat(line) 86 | profs['conv1_f_prof'] += val 87 | if (module_grad in line) and (module_conv1 in line) and ('Minimum' not in line) and ('Relu' not in line) and (module_bn not in line): 88 | val = findValWithFormat(line) 89 | profs['conv1_b_prof'] += val 90 | 91 | # BN1 92 | if (module_grad not in line) and (module_conv1 in line) and ('Minimum' not in line) and ('Relu' not in line) and (module_bn in line): 93 | val = findValWithFormat(line) 94 | profs['bn1_f_prof'] += val 95 | if (module_grad in line) and (module_conv1 in line) and ('Minimum' not in line) and ('Relu' not in line) and (module_bn in line): 96 | val = findValWithFormat(line) 97 | profs['bn1_b_prof'] += val 98 | 99 | # Relu1 100 | if (module_grad not in line) and (module_conv1 in line) and ('Minimum' in line or 'Relu' in line) and (module_bn not in line): 101 | val = 
findValWithFormat(line) 102 | profs['relu1_f_prof'] += val 103 | if (module_grad in line) and (module_conv1 in line) and ('Minimum' in line or 'Relu' in line) and (module_bn not in line): 104 | val = findValWithFormat(line) 105 | profs['relu1_b_prof'] += val 106 | 107 | # conv2 108 | if (module_grad not in line) and (module_conv2 in line) and ('Minimum' not in line) and ('Relu' not in line) and (module_bn not in line): 109 | val = findValWithFormat(line) 110 | profs['conv2_f_prof'] += val 111 | if (module_grad in line) and (module_conv2 in line) and ('Minimum' not in line) and ('Relu' not in line) and (module_bn not in line): 112 | val = findValWithFormat(line) 113 | profs['conv2_b_prof'] += val 114 | 115 | # BN2 116 | if (module_grad not in line) and (module_conv2 in line) and ('Minimum' not in line) and ('Relu' not in line) and (module_bn in line): 117 | val = findValWithFormat(line) 118 | profs['bn2_f_prof'] += val 119 | if (module_grad in line) and (module_conv2 in line) and ('Minimum' not in line) and ('Relu' not in line) and (module_bn in line): 120 | val = findValWithFormat(line) 121 | profs['bn2_b_prof'] += val 122 | 123 | # Relu2 124 | if (module_grad not in line) and (module_conv2 in line) and ('Minimum' in line or 'Relu' in line) and (module_bn not in line): 125 | val = findValWithFormat(line) 126 | profs['relu2_f_prof'] += val 127 | if (module_grad in line) and (module_conv2 in line) and ('Minimum' in line or 'Relu' in line) and (module_bn not in line): 128 | val = findValWithFormat(line) 129 | profs['relu2_b_prof'] += val 130 | 131 | #rnn transpose 132 | if (module_grad not in line) and (module_rnn in line) and ('transpose' in line) and (module_rnncell not in line): 133 | val = findValWithFormat(line) 134 | profs['rnn_trans_f_prof'] += val 135 | if (module_grad in line) and (module_rnn in line) and ('transpose' in line) and (module_rnncell not in line): 136 | val = findValWithFormat(line) 137 | profs['rnn_trans_b_prof'] += val 138 | 139 | #rnn reshape 140 | if (module_grad not in line) and (module_rnn in line) and ('rnn/Reshape' in line) and (module_rnncell not in line): 141 | val = findValWithFormat(line) 142 | profs['rnn_reshape_f_prof'] += val 143 | if (module_grad in line) and (module_rnn in line) and ('rnn/Reshape' in line) and (module_rnncell not in line): 144 | val = findValWithFormat(line) 145 | profs['rnn_reshape_b_prof'] += val 146 | 147 | #rnn reshape 148 | if (module_grad not in line) and (module_rnn in line) and ('ReverseSequence' in line): 149 | val = findValWithFormat(line) 150 | profs['rnn_ReverseSequence_f_prof'] += val 151 | if (module_grad in line) and (module_rnn in line) and ('ReverseSequence' in line): 152 | val = findValWithFormat(line) 153 | profs['rnn_ReverseSequence_b_prof'] += val 154 | 155 | # rnn forward profiling by cell 156 | if (module_grad not in line) and (module_rnncell in line): 157 | profRNNCell(line, rnncell_f_prof) 158 | # rnn backward profiling by cell 159 | if (module_grad in line) and (module_rnncell in line): 160 | profRNNCell(line, rnncell_b_prof) 161 | 162 | # softmax 163 | if (module_grad not in line) and (module_softmax in line): 164 | val = findValWithFormat(line) 165 | profs['softmax_f_prof'] += val 166 | if (module_grad in line) and (module_softmax in line): 167 | val = findValWithFormat(line) 168 | profs['softmax_b_prof'] += val 169 | 170 | # ctc 171 | for c in module_ctc: 172 | if (c in line) and (module_grad not in line): 173 | val = findValWithFormat(line) 174 | profs['ctc_f_prof'] += val 175 | if (c in line) and 
(module_grad in line): 176 | val = findValWithFormat(line) 177 | profs['ctc_b_prof'] +=val 178 | 179 | 180 | for key, val in dict.iteritems(rnncell_f_prof): 181 | overall_cost += val 182 | print "(RNN forward by cell) " + str(key) + ": " + str(val) + "ms" 183 | for key, val in dict.iteritems(rnncell_b_prof): 184 | overall_cost += val 185 | print "(RNN backward by cell) " + str(key) + ": " + str(val) + "ms" 186 | 187 | 188 | # Profiling result 189 | for k in dict.fromkeys(profs): 190 | overall_cost += profs[k] 191 | print k + ": " + str(profs[k]) + "ms" 192 | 193 | print "overall: " + str(overall_cost) + "ms" 194 | 195 | 196 | prf_file1 = open('prf1.txt', 'w') 197 | for k in dict.fromkeys(profs): 198 | prf_file1.write("%s:%f\n" % (k, profs[k])) 199 | prf_file1.close() 200 | 201 | # write including modules 202 | prf_file2 = open('prf2.txt', 'w') 203 | for el in lines: 204 | prf_file2.write("%s\n" % el) 205 | prf_file2.close() 206 | 207 | 208 | 209 | 210 | 211 | 212 | 213 | 214 | 215 | 216 | 217 | 218 | 219 | 220 | -------------------------------------------------------------------------------- /tools/prof.py: -------------------------------------------------------------------------------- 1 | import fileinput as fin 2 | from collections import OrderedDict 3 | import argparse 4 | import json 5 | from operator import itemgetter 6 | import os 7 | DEBUG = True 8 | # Global variables 9 | time_cond = ['ts', 'dur'] 10 | global_vas = OrderedDict([ \ 11 | ('markTF', ''), \ 12 | ('gobal_beg_time', 0), \ 13 | ('output_folder', 'Output'),\ 14 | ]) 15 | layers_names = OrderedDict([ \ 16 | ('conv1_f', 'conv1_forward'), \ 17 | ('conv1_b', 'conv1_backward'), \ 18 | ('bn1_f', 'bn1_forward'), \ 19 | ('bn1_b', 'bn1_backward'), \ 20 | ('relu1_f', 'relu1_forward'), \ 21 | ('relu1_b', 'relu1_backward'), \ 22 | ('conv2_f', 'conv2_forward'), \ 23 | ('conv2_b', 'conv2_backward'), \ 24 | ('bn2_f', 'bn2_forward'), \ 25 | ('bn2_b', 'bn2_backward'), \ 26 | ('relu2_f', 'relu2_forward'), \ 27 | ('relu2_b', 'relu2_backward'), \ 28 | ('rnn_trans_f', 'rnn_transpose_forward'), \ 29 | ('rnn_trans_b', 'rnn_transpose_backward'), \ 30 | ('rnn_reshape_f', 'rnn_reshape_forward'), \ 31 | ('rnn_reshape_b', 'rnn_reshape_backward'), \ 32 | ('rnn_Revs_f', 'rnn_ReverseSequence_forward'), \ 33 | ('rnn_Revs_b', 'rnn_ReverseSequence_backward'), \ 34 | ('rnn_f_0', 'rnn_forward_cell_0'), \ 35 | ('rnn_b_0', 'rnn_backward_cell_0'), \ 36 | ('rnn_f_1', 'rnn_forward_cell_1'), \ 37 | ('rnn_b_1', 'rnn_backward_cell_1'), \ 38 | ('rnn_f_2', 'rnn_forward_cell_2'), \ 39 | ('rnn_b_2', 'rnn_backward_cell_2'), \ 40 | ('rnn_f_3', 'rnn_forward_cell_3'), \ 41 | ('rnn_b_3', 'rnn_backward_cell_3'), \ 42 | ('rnn_f_4', 'rnn_forward_cell_4'), \ 43 | ('rnn_b_4', 'rnn_backward_cell_4'), \ 44 | ('rnn_f_5', 'rnn_forward_cell_5'), \ 45 | ('rnn_b_5', 'rnn_backward_cell_5'), \ 46 | ('rnn_f_6', 'rnn_forward_cell_6'), \ 47 | ('rnn_b_6', 'rnn_backward_cell_6'), \ 48 | ('softmax_f', 'softmax_forward'), \ 49 | ('softmax_b', 'softmax_backward'), \ 50 | ('ctc_f', 'ctc_forward'), \ 51 | ('ctc_b', 'ctc_backward'), \ 52 | ('ema', 'ExponentialMovingAverage') \ 53 | ]) 54 | # Define ops per layer 55 | layers_ops = OrderedDict() 56 | for key, val in layers_names.iteritems(): 57 | layers_ops[val] = [] 58 | # Define modules 59 | module_grad = 'gradients' 60 | module_rnncell = "CustomRNNCell2/" 61 | rnn_cell_string = "cell_" 62 | module_rnn = 'rnn' 63 | module_conv1 = 'conv1' 64 | module_conv2 = 'conv2' 65 | module_bn = '/bn' 66 | module_softmax = 'softmax_linear' 67 | 
module_Exponential = 'ExponentialMovingAverage/AssignMovingAvg_' 68 | module_trans = 'transpose' 69 | module_resp ='rnn/Reshape' 70 | module_rvseq = 'ReverseSequence' 71 | module_relu = ['Minimum', 'Relu'] 72 | module_ctc = ['ctc_loss', 'CTCLoss'] 73 | # Define cond for branch instructions 74 | # opt_exc = ['biases/ApplyAdam', 'weights/ApplyAdam', 'Conv2D_grad/Shape'] 75 | opt_inc = OrderedDict() 76 | opt_exc = OrderedDict() 77 | opt_any = OrderedDict() 78 | opt_inc[layers_names['conv1_f']] = [module_conv1] 79 | opt_exc[layers_names['conv1_f']] = [module_grad, module_bn, 'Minimum', 'Relu'] 80 | opt_inc[layers_names['conv1_b']] = [module_conv1, module_grad] 81 | opt_exc[layers_names['conv1_b']] = [module_bn, 'Minimum', 'Relu'] 82 | opt_inc[layers_names['bn1_f']] = [module_conv1, module_bn] 83 | opt_exc[layers_names['bn1_f']] = [module_grad, 'Minimum', 'Relu'] 84 | opt_inc[layers_names['bn1_b']] = [module_conv1, module_bn, module_grad] 85 | opt_exc[layers_names['bn1_b']] = ['Minimum', 'Relu'] 86 | opt_inc[layers_names['relu1_f']] = [module_conv1] 87 | opt_exc[layers_names['relu1_f']] = [module_grad, module_bn] 88 | opt_inc[layers_names['relu1_b']] = [module_conv1, module_grad] 89 | opt_exc[layers_names['relu1_b']] = [module_bn] 90 | # conv2 91 | opt_inc[layers_names['conv2_f']] = [module_conv2] 92 | opt_exc[layers_names['conv2_f']] = [module_grad, module_bn, 'Minimum', 'Relu'] 93 | opt_inc[layers_names['conv2_b']] = [module_conv2, module_grad] 94 | opt_exc[layers_names['conv2_b']] = [module_bn, 'Minimum', 'Relu'] 95 | opt_inc[layers_names['bn2_f']] = [module_conv2, module_bn] 96 | opt_exc[layers_names['bn2_f']] = [module_grad, 'Minimum', 'Relu'] 97 | opt_inc[layers_names['bn2_b']] = [module_conv2, module_bn, module_grad] 98 | opt_exc[layers_names['bn2_b']] = ['Minimum', 'Relu'] 99 | opt_inc[layers_names['relu2_f']] = [module_conv2] 100 | opt_exc[layers_names['relu2_f']] = [module_grad, module_bn] 101 | opt_inc[layers_names['relu2_b']] = [module_conv2, module_grad] 102 | opt_exc[layers_names['relu2_b']] = [module_bn] 103 | # full connection layer 104 | opt_inc[layers_names['softmax_f']] = [module_softmax] 105 | opt_exc[layers_names['softmax_f']] = [module_grad] 106 | opt_inc[layers_names['softmax_b']] = [module_softmax, module_grad] 107 | opt_exc[layers_names['softmax_b']] = [] 108 | opt_inc[layers_names['ctc_f']] = [] 109 | opt_exc[layers_names['ctc_f']] = [module_grad] 110 | opt_inc[layers_names['ctc_b']] = [module_grad] 111 | opt_exc[layers_names['ctc_b']] = [] 112 | 113 | opt_any[layers_names['relu1_f']] = module_relu 114 | opt_any[layers_names['relu1_b']] = module_relu 115 | opt_any[layers_names['relu2_f']] = module_relu 116 | opt_any[layers_names['relu2_b']] = module_relu 117 | opt_any[layers_names['ctc_f']] = module_ctc 118 | opt_any[layers_names['ctc_b']] = module_ctc 119 | debug_exc = [] 120 | for i in range(7): 121 | if i>=0: 122 | debug_exc.append("rnn_backward_cell_"+str(i)) 123 | debug_exc.append("rnn_forward_cell_"+str(i)) 124 | # rnn 125 | cell = "cell_" 126 | i1 = 0 127 | i2 = 0 128 | for key, val in layers_names.iteritems(): 129 | if 'rnn_f' in key: 130 | opt_inc[val] = [module_rnncell, cell+str(i1)] 131 | opt_exc[val] = [module_grad] 132 | i1 += 1 133 | elif 'rnn_b' in key: 134 | opt_inc[val] = [module_rnncell, cell+str(i2), module_grad] 135 | opt_exc[val] = [] 136 | i2 += 1 137 | # others 138 | opt_inc[layers_names['rnn_trans_f']] = [module_rnn, module_trans] 139 | opt_exc[layers_names['rnn_trans_f']] = [module_grad, module_rnncell] 140 | 
opt_inc[layers_names['rnn_trans_b']] = [module_rnn, module_trans, module_grad] 141 | opt_exc[layers_names['rnn_trans_b']] = [module_rnncell] 142 | opt_inc[layers_names['rnn_reshape_f']] = [module_rnn, module_resp] 143 | opt_exc[layers_names['rnn_reshape_f']] = [module_grad, module_rnncell] 144 | opt_inc[layers_names['rnn_reshape_b']] = [module_rnn, module_resp, module_grad] 145 | opt_exc[layers_names['rnn_reshape_b']] = [module_rnncell] 146 | opt_inc[layers_names['rnn_Revs_f']] = [module_rnn, module_rvseq] 147 | opt_exc[layers_names['rnn_Revs_f']] = [module_grad] 148 | opt_inc[layers_names['rnn_Revs_b']] = [module_rnn, module_rvseq, module_grad] 149 | opt_exc[layers_names['rnn_Revs_b']] = [] 150 | opt_inc[layers_names['ema']] = [module_Exponential] 151 | opt_exc[layers_names['ema']] = [module_rnn] 152 | 153 | # Functions and classes 154 | def parse_args(): 155 | parser = argparse.ArgumentParser() 156 | parser.add_argument('--input', type = str, 157 | default = 'ITF_profiling_32.json') 158 | args = parser.parse_args() 159 | return args 160 | 161 | def updateGlobalVas(args): 162 | if 'ITF' in args.input: 163 | global_vas['markTF'] = 'ITF_' 164 | elif 'PTF' in args.input: 165 | global_vas['markTF'] = 'PTF_' 166 | else: 167 | global_vas['markTF'] = 'MTF_' 168 | if not os.path.exists(os.path.join(os.getcwd(), global_vas['output_folder'])): 169 | os.makedirs(os.path.join(os.getcwd(), global_vas['output_folder'])) 170 | if not os.path.exists(os.path.join(os.getcwd(), global_vas['output_folder'] + '/inc_ops')): 171 | os.makedirs(os.path.join(os.getcwd(), global_vas['output_folder'] + '/inc_ops')) 172 | if not os.path.exists(os.path.join(os.getcwd(), global_vas['output_folder'] + '/rnn_gaps')): 173 | os.makedirs(os.path.join(os.getcwd(), global_vas['output_folder'] + '/rnn_gaps')) 174 | 175 | 176 | class RawPeriodList: 177 | def __init__(self): 178 | self.beg_time = 0.0 179 | self.end_time = 0.0 180 | self.beg_op = '' 181 | self.end_op = '' 182 | self.periodList = [] 183 | self.period = OrderedDict([\ 184 | ('layer_name', ''), \ 185 | ('period', 0.0), \ 186 | ('beg_time', 0.0), \ 187 | ('end_time', 0.0), \ 188 | ('beg_op', ''),\ 189 | ('end_op', '')]) 190 | def initPeriod(self): 191 | self.beg_time = 0.0 192 | self.end_time = 0.0 193 | def createPeriod(self, op): 194 | if self.beg_time == 0 and self.end_time == 0: 195 | self.beg_op = op['name'] 196 | self.end_op = op['name'] 197 | self.beg_time = op['beg'] 198 | self.end_time = op['end'] 199 | elif self.beg_time > op['beg']: 200 | self.beg_op = op['name'] 201 | self.beg_time = op['beg'] 202 | elif self.end_time < op['end']: 203 | self.end_op = op['name'] 204 | self.end_time = op['end'] 205 | def append2List(self, layer_name): 206 | self.period['layer_name'] = layer_name 207 | self.period['period'] = self.end_time - self.beg_time 208 | self.period['beg_time'] = self.beg_time 209 | self.period['end_time'] = self.end_time 210 | self.period['beg_op'] = self.beg_op 211 | self.period['end_op'] = self.end_op 212 | self.periodList.append(self.period.copy()) 213 | def getRawPeriodList(self): 214 | return self.periodList 215 | def printPeriods(self): 216 | file_name = global_vas['output_folder'] + '/'+ global_vas['markTF'] + 'periods.csv' 217 | fp = open(file_name, 'w') 218 | fp.write("layer_name, period, beg_time, beg_op, end_time, end_op\n") 219 | for period in self.periodList: 220 | fp.write("%s, %g, %g, %s, %g, %s\n" % (period['layer_name'], period['period'],\ 221 | period['beg_time'], period['beg_op'],\ 222 | period['end_time'], period['end_op'])) 223 | 
fp.close() 224 | 225 | class TimeStampsList: 226 | stampsList = [] 227 | stamps = OrderedDict() 228 | stamp = OrderedDict() 229 | 230 | def __init__(self, layers_ops): 231 | self.layers = layers_ops 232 | self.rawPeriodList = RawPeriodList() 233 | self.stamps = OrderedDict([\ 234 | ('layer_name', ''),\ 235 | ('stamps', [])]) 236 | self.stamp = OrderedDict([\ 237 | ('time', 0.0),\ 238 | ('op_name', ''),\ 239 | ('pos', '')]) 240 | def initStamps(self): 241 | self.stamps['stamps'] = [] 242 | def createTimeStamp(self, op): 243 | self.stamp['time'] = op['beg'] 244 | self.stamp['op_name'] = op['name'] 245 | self.stamp['pos'] = 'beg' 246 | stamp_beg = self.stamp.copy() 247 | self.stamp['time'] = op['end'] 248 | self.stamp['op_name'] = op['name'] 249 | self.stamp['pos'] = 'end' 250 | stamp_end = self.stamp.copy() 251 | return stamp_beg, stamp_end 252 | def createStampsList(self): 253 | for layer_name, ops in self.layers.iteritems(): 254 | self.initStamps() 255 | self.stamps['layer_name'] = layer_name 256 | self.rawPeriodList.initPeriod() 257 | for op in ops: 258 | stamp_beg, stamp_end = self.createTimeStamp(op) 259 | self.stamps['stamps'].append(stamp_beg) 260 | self.stamps['stamps'].append(stamp_end) 261 | self.rawPeriodList.createPeriod(op) 262 | self.stamps['stamps'].sort(key = lambda x: x['time']) 263 | # append the stamps of the layer into list 264 | self.stampsList.append(self.stamps.copy()) 265 | self.rawPeriodList.append2List(layer_name) 266 | def getStampsList(self): 267 | return self.stampsList 268 | def getRawPeriodList(self): 269 | return self.rawPeriodList.getRawPeriodList() 270 | def printStamps(self): 271 | file_name = global_vas['output_folder'] + '/'+ global_vas['markTF'] +'stamps.csv' 272 | fp = open(file_name, 'w') 273 | for stamps in self.stampsList: 274 | fp.write("%s\n" % stamps['layer_name']) 275 | fp.write("time, op_name, pos\n") 276 | for stamp in stamps['stamps']: 277 | fp.write("%g, %s, %s\n" % (stamp['time'], stamp['op_name'], stamp['pos'])) 278 | fp.close() 279 | def printPeriods(self): 280 | self.rawPeriodList.printPeriods() 281 | 282 | class TimeInfo: 283 | def __init__(self, layers_ops, threshold): 284 | self.threshold = threshold 285 | self.rawPeriodList = RawPeriodList() 286 | self.timeStampsList = TimeStampsList(layers_ops) 287 | self.layerExeTimeList = LayerExeTimeList(layers_ops) 288 | self.layers = layers_ops 289 | 290 | def createInfo(self): 291 | self.timeStampsList.createStampsList() 292 | stampsList = self.timeStampsList.getStampsList() 293 | self.layerExeTimeList.createExeTimeList(stampsList, self.threshold) 294 | 295 | self.timeStampsList.printStamps() 296 | self.timeStampsList.printPeriods() 297 | self.layerExeTimeList.printExeTimes() 298 | self.layerExeTimeList.printGapsList() 299 | 300 | class LayerExeTimeList: 301 | def __init__(self, layers): 302 | self.interGapsList = InterGapsList() 303 | self.layers = layers 304 | self.exeTimeList = [] 305 | self.exeTime = OrderedDict([\ 306 | ('layer_name', ''),\ 307 | ('wall_time', 0.0),\ 308 | ('wall_time_thres', 0.0)]) 309 | def checkInsideOp(self, mid, layer_name): 310 | ops = self.layers[layer_name] 311 | for op in ops: 312 | if mid > op['beg'] and mid < op['end']: 313 | return True 314 | return False 315 | def createExeTimeList(self, stampsList, threshold=0.0): 316 | # Compute the wall time for layers 317 | for item in stampsList: 318 | # For one layer 319 | layer_name = item['layer_name'] 320 | self.exeTime['layer_name'] = layer_name 321 | stamps = item['stamps'] 322 | wallTime = 0.0 323 | wallTime_thres 
= 0.0 324 | self.interGapsList.initGaps() 325 | for i in range(len(stamps)): 326 | if i == 0: 327 | continue 328 | prev = stamps[i-1]['time'] 329 | current = stamps[i]['time'] 330 | period = current - prev 331 | mid = prev + period/2 332 | if self.checkInsideOp(mid, layer_name): 333 | wallTime += period 334 | wallTime_thres += period 335 | elif period < threshold: 336 | wallTime_thres += period 337 | self.interGapsList.append2Gaps(layer_name, period, stamps[i-1], stamps[i]) 338 | self.exeTime['wall_time'] = wallTime 339 | self.exeTime['wall_time_thres'] = wallTime_thres 340 | self.exeTimeList.append(self.exeTime.copy()) 341 | self.interGapsList.append2GapsList() 342 | print ("ExeTime of %s has been computed" % (layer_name)) 343 | 344 | def getExeTimeList(self): 345 | return self.exeTimeList 346 | def printExeTimes(self): 347 | file_name = global_vas['output_folder'] + '/'+ global_vas['markTF'] +'exeTime.csv' 348 | fp = open(file_name, 'w') 349 | fp.write("layer_name, wall_time, wall_time_thres\n") 350 | for exeTime in self.exeTimeList: 351 | fp.write("%s, %g, %g\n" % (exeTime['layer_name'], exeTime['wall_time'], exeTime['wall_time_thres'])) 352 | fp.close() 353 | def printGapsList(self): 354 | self.interGapsList.printGapsList() 355 | 356 | class InterGapsList: 357 | def __init__(self): 358 | self.gapsList = [] 359 | self.gaps = OrderedDict([\ 360 | ('layer_name', ''),\ 361 | ('gaps', [])]) 362 | self.gap = OrderedDict([\ 363 | ('period', 0.0),\ 364 | ('beg_time', 0.0),\ 365 | ('end_time', 0.0),\ 366 | ('beg_op', 0.0),\ 367 | ('end_op', 0.0)]) 368 | def initGaps(self): 369 | self.gaps['gaps'] = [] 370 | def append2Gaps(self, layer_name, period, prev, current): 371 | self.gap['period'] = period 372 | self.gap['beg_time'] = prev['time'] 373 | self.gap['beg_op'] = prev['op_name'] 374 | self.gap['end_time'] = current['time'] 375 | self.gap['end_op'] = current['op_name'] 376 | self.gaps['layer_name'] = layer_name 377 | self.gaps['gaps'].append(self.gap.copy()) 378 | def append2GapsList(self): 379 | self.gapsList.append(self.gaps.copy()) 380 | def printGapsList(self): 381 | file_name = global_vas['output_folder'] + '/'+ global_vas['markTF'] +'gaps.csv' 382 | fp = open(file_name, 'w') 383 | for gaps in self.gapsList: 384 | if 'cell' in gaps['layer_name']: 385 | file_name = global_vas['output_folder'] + '/rnn_gaps/'+ global_vas['markTF'] + gaps['layer_name'] +'gaps.csv' 386 | fp_rnn = open(file_name, 'w') 387 | for gap in gaps['gaps']: 388 | fp_rnn.write("%g, %g, %s, %g, %s\n" % \ 389 | (gap['period'],gap['beg_time'],gap['beg_op'],gap['end_time'],gap['end_op'])) 390 | fp_rnn.close() 391 | else: 392 | fp.write("%s\n" % (gaps['layer_name'])) 393 | for gap in gaps['gaps']: 394 | fp.write("%g, %g, %s, %g, %s\n" % \ 395 | (gap['period'],gap['beg_time'],gap['beg_op'],gap['end_time'],gap['end_op'])) 396 | fp.close() 397 | 398 | class Operator: 399 | opInfo = OrderedDict() 400 | def __init__(self): 401 | self.opInfo = OrderedDict([\ 402 | ('name', ''),\ 403 | ('beg', 0.0),\ 404 | ('end', 0.0) ]) 405 | 406 | def createOpInfo(self, name, beg, end): 407 | self.opInfo['name'] = name 408 | self.opInfo['beg'] = float(beg)/1000.0 409 | self.opInfo['end'] = float(end)/1000.0 410 | 411 | def getOpInfo(self): 412 | return self.opInfo 413 | 414 | def insert(self, alist, input, input_name): 415 | beg = input['ts'] - global_vas['gobal_beg_time'] 416 | end = beg + input['dur'] 417 | self.createOpInfo(input_name, beg, end) 418 | alist.append(self.getOpInfo().copy()) 419 | 420 | class OpsList: 421 | gmark = False 422 | op 
= Operator() 423 | layers = OrderedDict() 424 | 425 | def __init__(self, layers_ops): 426 | self.layers = layers_ops 427 | 428 | def recordBegTime(self, item): 429 | if self.gmark == False and all(c in item.iterkeys() for c in time_cond): 430 | self.gmark = True 431 | global_vas['gobal_beg_time'] = item['ts'] 432 | 433 | def groupByLayer(self, input, cmd): 434 | for key, layer_name in layers_names.iteritems(): 435 | if 'relu' in key or 'ctc' in key: 436 | if all(c in cmd for c in opt_inc[layer_name]) and all(c not in cmd for c in opt_exc[layer_name]) \ 437 | and any(c in cmd for c in opt_any[layer_name]): 438 | self.op.insert(self.layers[layer_name], input, cmd) 439 | break 440 | else: 441 | if all(c in cmd for c in opt_inc[layer_name]) and all(c not in cmd for c in opt_exc[layer_name]): 442 | self.op.insert(self.layers[layer_name], input, cmd) 443 | break 444 | 445 | def append2List(self, input): 446 | input_name = str(input["name"]) 447 | self.recordBegTime(input) 448 | if "args" in input.iterkeys() and all(c in input.iterkeys() for c in time_cond): 449 | input_name = str(input["args"]["name"]) 450 | self.groupByLayer(input, input_name) 451 | elif "args" not in input.iterkeys() and all(c in input.iterkeys() for c in time_cond): 452 | self.groupByLayer(input, input_name) 453 | 454 | def printList(self): 455 | for key, val in self.layers.iteritems(): 456 | file_name = global_vas['output_folder'] + '/inc_ops/'+ global_vas['markTF'] +'include_ops_'+key+'.log' 457 | fp = open(file_name, 'w') 458 | for v in val: 459 | fp.write("%s:\n" % (v)) 460 | fp.close() 461 | 462 | # Read Json file 463 | args = parse_args() 464 | updateGlobalVas(args) 465 | json_data=open(args.input).read() 466 | jdata = json.loads(json_data) 467 | 468 | # Create layer's OPs list 469 | opsList = OpsList(layers_ops) 470 | for item in jdata["traceEvents"]: 471 | opsList.append2List(item) 472 | 473 | print 'opsList has been created' 474 | opsList.printList() 475 | 476 | threshold = 50000.0 477 | timeInfo = TimeInfo(layers_ops, threshold) 478 | timeInfo.createInfo() 479 | 480 | 481 | 482 | 483 | 484 | 485 | 486 | 487 | 488 | 489 | 490 | 491 | 492 | 493 | 494 | 495 | 496 | -------------------------------------------------------------------------------- /tools/vtune.sh: -------------------------------------------------------------------------------- 1 | source /opt/intel/vtune_amplifier_xe/amplxe-vars.sh 2 | 3 | amplxe-cl -collect advanced-hotspots -- ./train.sh 4 | 5 | # amplxe-gui 6 | --------------------------------------------------------------------------------
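For reference, the make_example docstring in src/preprocess_LibriSpeech.py notes that the stored SequenceExamples are read back with tf.parse_single_sequence_example during training. The sketch below (not a file in this repository) shows one way to decode a single serialized utterance. FEATURE_DIM is an assumption: 13 matches compute_mfcc's numcep=13, while the linear-spectrogram path yields a window-size-dependent dimension; parse_utterance is a hypothetical helper name.

import tensorflow as tf

FEATURE_DIM = 13  # assumed per-frame feature size; adjust to the featurizer used

def parse_utterance(serialized_example):
    # Context features hold per-utterance values; sequence features hold
    # the per-frame feature vectors, mirroring make_example().
    context, sequences = tf.parse_single_sequence_example(
        serialized_example,
        context_features={
            "seq_len": tf.FixedLenFeature([], dtype=tf.int64),
            "labels": tf.VarLenFeature(dtype=tf.int64),
        },
        sequence_features={
            "feats": tf.FixedLenSequenceFeature([FEATURE_DIM], dtype=tf.float32),
        })
    return sequences["feats"], context["labels"], context["seq_len"]

In a training input pipeline, the serialized strings would come from a reader over the train_*.tfrecords files; the labels come back as a tf.SparseTensor, which can feed a CTC loss without densifying, and the length-based bucketing of the record files keeps per-batch zero-padding small.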