├── README.md
├── cifar
│   ├── __init__.py
│   ├── cifar.py
│   └── dist.py
├── evaluate_mfacenet_reranking.py
├── evaluate_retrieval.py
├── flags.py
├── get_embeddings_reranking.py
├── imagenet_preprocessing.py
├── input_pipeline.py
├── megaface_embeddings.py
├── megaface_embeddings_hdf.py
├── mobilefacenet.py
├── mobilenet
│   ├── __init__.py
│   ├── conv_blocks.py
│   ├── mobilenet.py
│   ├── mobilenet_example.ipynb
│   ├── mobilenet_v2.py
│   ├── mobilenet_v2_test.py
│   └── readme.md
├── model.py
├── preprocessing
│   ├── build_image_data.py
│   ├── msceleb_to_jpeg.py
│   ├── preprocess_fs_mf_official.sh
│   └── preprocess_msceleb.sh
├── proxy_metric_learning.py
├── resnet
│   ├── __init__.py
│   └── resnet_model.py
├── run_scripts
│   ├── cifar
│   │   ├── dense_64.sh
│   │   └── sparse_64.sh
│   ├── face
│   │   ├── dense_mobilenet_128.sh
│   │   ├── dense_mobilenet_512.sh
│   │   ├── dense_resnet_128.sh
│   │   ├── dense_resnet_512.sh
│   │   ├── sparse_mobilenet_fl.sh
│   │   ├── sparse_mobilenet_l1.sh
│   │   ├── sparse_resnet_fl.sh
│   │   └── sparse_resnet_l1.sh
│   └── get_embeddings
│       ├── face_mobilenet_reranking.sh
│       └── face_resnet_reranking.sh
├── search
│   ├── Makefile
│   ├── hdf5_to_bin.py
│   ├── mobilefacenet_to_bin.sh
│   ├── resnet_to_bin.sh
│   └── search.cpp
└── train.py

/README.md:
--------------------------------------------------------------------------------
1 | # sparse-embed
2 | Code for the paper 'Minimizing FLOPs to Learn Efficient Sparse Representations', published at ICLR 2020: https://openreview.net/forum?id=SygpC6Ntvr
3 | 
4 | The main training and testing code is based on [TensorFlow](https://www.tensorflow.org/), and multi-GPU training is achieved using [Horovod](https://github.com/horovod/horovod).
5 | 
6 | ## Requirements
7 | The code has been tested to work with the following versions:
8 | * python 3.6
9 | * tensorflow-gpu 1.12
10 | * horovod 0.16
11 | 
12 | 
13 | ## Datasets
14 | **Training data:** The MS1M dataset for training was downloaded from the InsightFace [repository](https://github.com/deepinsight/insightface/wiki/Dataset-Zoo). The dataset is aligned by the authors of InsightFace using [MTCNN](https://kpzhang93.github.io/MTCNN_face_detection_alignment/index.html).
15 | 
16 | The downloaded dataset is in MXNet format, and must be converted to JPEG images using ``python preprocessing/msceleb_to_jpeg.py <dataset_directory>``. The extracted images will be stored in ``dataset_directory/images``.
17 | 
18 | **Testing data:** The MegaFace and FaceScrub datasets can be downloaded from the official MegaFace challenge [website](http://megaface.cs.washington.edu/participate/challenge.html). We used [MTCNN](https://kpzhang93.github.io/MTCNN_face_detection_alignment/index.html) to align the images.
19 | 
20 | The CIFAR-100 training and testing data can be downloaded from https://www.cs.toronto.edu/~kriz/cifar.html.
21 | 
22 | ## Preprocessing
23 | All the datasets must be converted to the TFRecords format for fast, multi-threaded pre-fetching of data. The scripts in ``preprocessing/*.sh`` can be used for this conversion. The output directories can be modified within the code. The preprocessing code has been taken from the official TF models [github](https://github.com/tensorflow/models).
24 | 
25 | ### Training
26 | The pipeline to read images (``input_pipeline.py``, ``imagenet_preprocessing.py``) during training has also been taken from the TF models [github](https://github.com/tensorflow/models), and the main models have been implemented using the [tf.Estimator](https://www.tensorflow.org/guide/estimator) framework. Sample training scripts can be found in ``run_scripts/face`` and ``run_scripts/cifar``. The directories for the trained models and the datasets can be specified within the shell scripts.
27 | 
28 | ### Evaluation
29 | For evaluation, we first have to compute the trained embeddings for FaceScrub and MegaFace. Example scripts using ``get_embeddings_reranking.py`` are given in ``run_scripts/get_embeddings``. The computed embeddings will be saved as [HDF5](https://en.wikipedia.org/wiki/Hierarchical_Data_Format) files in a directory named ``embeddings_reranking``. The MegaFace and FaceScrub embedding files start with the prefixes ``megaface`` and ``facescrub``, respectively. For instance, the MegaFace embeddings corresponding to the model ``sparse_mobilenet_fl_run_1`` will be saved as ``megaface_sparse_mobilenet_fl_run_1.hdf5``.
30 | 
31 | The accuracy and the retrieval times are evaluated independently, as described below.
32 | 
33 | **Accuracy:** Accuracy can be evaluated as
34 | ```
35 | python evaluate_mfacenet_reranking.py --dense_suffix=dense_mobilenet_512_run_1.hdf5 --sparse_suffix=sparse_mobilenet_fl_run_1.hdf5 --threshold=0.25 --rerank_topk=1000
36 | ```
37 | The ``dense_suffix`` and ``sparse_suffix`` parameters denote the dense and sparse embeddings to use; the dense embeddings are used for re-ranking as described in the paper. The ``threshold`` and ``rerank_topk`` parameters denote the filtering threshold and the number of top candidates to re-rank, respectively. Refer to the paper for a more detailed description of these parameters.
38 | 
39 | **Retrieval Time:** For measuring the time, a more efficient C++ program is used. The HDF5 files must first be converted to binary files so that they can be read by this program. ``mobilefacenet_to_bin.sh`` and ``resnet_to_bin.sh`` provide examples that convert the HDF5 files to binary format and save them in the directory ``embedding_bin``. The retrieval time can then be measured using ``search/search.cpp``. The makefile ``search/Makefile`` provides a sample flow to compile the program and measure the retrieval time in milliseconds. The ``runmegaface`` target measures the time using MegaFace distractors and held-out MegaFace queries, while the ``runfacescrub`` target uses MegaFace distractors and FaceScrub queries.
40 | 
41 | 
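As a quick sanity check, the saved HDF5 files can also be inspected directly before running the full evaluation. The following is a minimal sketch, assuming the layout written by ``get_embeddings_reranking.py`` (datasets ``embedding`` and ``label``); the file name is an example from the runs above:
```
import h5py
import numpy as np

# Example output of get_embeddings_reranking.py; substitute your own run name.
with h5py.File('embeddings_reranking/megaface_sparse_mobilenet_fl_run_1.hdf5', 'r') as f:
    embeddings = np.asarray(f['embedding'])  # [num_images, embedding_size]

# Magnitudes below 1e-8 are treated as zeros, as in the evaluation scripts.
nnz = np.abs(embeddings) >= 1e-8
per_dim = np.mean(nnz, axis=0)           # activation frequency of each dimension
flops = np.sum(per_dim * per_dim)        # expected cost of one sparse dot product
mean_nnz = np.mean(np.sum(nnz, axis=1))  # average active dimensions per image
print('active fraction', np.mean(nnz), 'flops', flops,
      'ratio', flops * nnz.shape[1] / mean_nnz ** 2)
```
The same ``flops`` and ``ratio`` statistics are reported by ``evaluate_mfacenet_reranking.py`` before it computes the recalls.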
-------------------------------------------------------------------------------- /cifar/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/biswajitsc/sparse-embed/775404d38ce4382251b88168820da3740c1323ea/cifar/__init__.py -------------------------------------------------------------------------------- /cifar/cifar.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | import tensorflow as tf 3 | import pickle as pkl 4 | import os 5 | 6 | import absl 7 | from tqdm import tqdm 8 | 9 | flags = absl.flags 10 | FLAGS = flags.FLAGS 11 | slim = tf.contrib.slim 12 | 13 | 14 | def flip_rl(inputs): 15 | ''' 16 | Args: 17 | inputs - 4D tensor [batch, height, width, nchannel] 18 | Return: 19 | flips - 4D tensor [batch, height, width, nchannel] 20 | 21 | flips[i][j][width-1-k][l] = inputs[i][j][k][l] 22 | ''' 23 | flips = tf.reverse(inputs, axis=[2]) 24 | return flips 25 | 26 | 27 | def random_flip_rl(inputs): 28 | ''' 29 | Args: 30 | inputs - 4D tensor 31 | Return: 32 | the input flipped left-right or unchanged, each with probability 0.5 33 | ''' 34 | rand = tf.random_uniform([], minval=-1, maxval=1) 35 | return tf.cond(tf.greater(rand, 0.0), lambda: inputs, lambda: flip_rl(inputs)) 36 | 37 | 38 | def get_shape(x): 39 | '''get the shape of tensor as list''' 40 | return x.get_shape().as_list() 41 | 42 | 43 | def random_crop(inputs, ch, cw): 44 | ''' 45 | apply tf.random_crop on a 4D tensor 46 | Args: 47 | inputs - 4D tensor 48 | ch - int 49 | crop height 50 | cw - int 51 | crop width 52 | ''' 53 | ib, ih, iw, ic = get_shape(inputs) 54 | return tf.random_crop(inputs, [FLAGS.batch_size, ch, cw, ic]) 55 | 56 | 57 | def read_cifar_split(data_dir, train, seen=False): 58 | if train: 59 | train_images = np.load(data_dir + '/train_image.npy') 60 | train_labels = np.load(data_dir + '/train_label.npy') 61 | 62 | print('CIFAR-100: images {} labels {}'.format(train_images.shape, train_labels.shape)) 63 | return train_images, train_labels 64 | 65 | elif not seen: 66 | # val_images = np.load(data_dir + '/val_image.npy') 67 | # val_labels = np.load(data_dir + '/val_label.npy') 68 | 69 | # test_images_u = np.load(data_dir + '/test_image_u.npy') 70 | # test_labels_u = np.load(data_dir + '/test_label_u.npy') 71 | 72 | # images = np.concatenate((val_images, test_images_u)) 73 | # labels = np.concatenate((val_labels, test_labels_u)) 74 | 75 | images = np.load(data_dir + '/test_image_u.npy') 76 | labels = np.load(data_dir + '/test_label_u.npy') 77 | 78 | print('CIFAR-100: images {} labels {}'.format(images.shape, labels.shape)) 79 | return images, labels 80 | 81 | else: 82 | images = np.load(data_dir + '/test_image_s.npy') 83 | labels = np.load(data_dir + '/test_label_s.npy') 84 | 85 | print('CIFAR-100: images {} labels {}'.format(images.shape, labels.shape)) 86 | 87 | return images, labels 88 | 89 | 90 | def input_fn_split(data_dir, train, batch_size, num_epochs, seen=False): 91 | images, labels = read_cifar_split(data_dir, train, seen) 92 | labels = np.asarray(labels.reshape((-1, 1)), dtype=np.int32) 93 | images = np.asarray(images, dtype=np.float32) 94 | idx = np.random.permutation(len(images)) 95 | images = images[idx] 96 | labels = labels[idx] 97 | # images = images / 128.0 # images are already centered 98 | 99 | texts = np.asarray(['-'] * len(labels)) 100 | 101 | fn = tf.estimator.inputs.numpy_input_fn( 102 | { 103 | "features": images, 104 | "labels": labels, 105 | "label_texts": texts, 106 | "filename": texts 107 | },
108 | labels, batch_size=batch_size, num_epochs=num_epochs, 109 | shuffle=True) 110 | return fn() 111 | 112 | # texts = tf.constant(['-'] * len(labels)) 113 | # labels = tf.constant(labels, tf.int32) 114 | # images = tf.constant(images, tf.float32) # channels last 115 | # dataset = tf.data.Dataset.from_tensor_slices((images, labels, texts, texts)) 116 | # dataset = dataset.repeat(num_epochs) 117 | # dataset = dataset.batch(batch_size) 118 | 119 | # return dataset 120 | 121 | 122 | def input_fn(data_dir, train, batch_size, num_epochs): 123 | with open(os.path.join(data_dir, 'train'), 'rb') as f: 124 | data = pkl.load(f, encoding='bytes') 125 | train_images = data[b'data'].astype(np.float32) 126 | train_labels = data[b'fine_labels'] 127 | image_mean = np.mean(train_images, axis=0) 128 | 129 | with open(os.path.join(data_dir, 'test'), 'rb') as f: 130 | data = pkl.load(f, encoding='bytes') 131 | test_images = data[b'data'].astype(np.float32) 132 | test_labels = data[b'fine_labels'] 133 | 134 | train_images -= image_mean 135 | test_images -= image_mean 136 | 137 | train_images /= 128.0 138 | test_images /= 128.0 139 | 140 | if train: 141 | images = train_images 142 | labels = train_labels 143 | else: 144 | images = test_images 145 | labels = test_labels 146 | 147 | labels = np.asarray(labels, dtype=np.int32).reshape((-1, 1)) 148 | images = np.asarray(images, dtype=np.float32) 149 | images = np.transpose(np.reshape(images, [-1,3,32,32]), [0,2,3,1]) 150 | idx = np.random.permutation(len(images)) 151 | images = images[idx] 152 | labels = labels[idx] 153 | 154 | texts = np.asarray(['-'] * len(labels)) 155 | 156 | fn = tf.estimator.inputs.numpy_input_fn( 157 | { 158 | "features": images, 159 | "labels": labels, 160 | "label_texts": texts, 161 | "filename": texts 162 | }, 163 | labels, batch_size=batch_size, num_epochs=num_epochs, 164 | shuffle=True) 165 | return fn() 166 | 167 | 168 | def proxy_layer(embedding, labels, num_classes): 169 | with tf.variable_scope('Logits'): 170 | weights = tf.get_variable(name='proxy_wts', 171 | shape=(embedding.shape[-1], num_classes), 172 | initializer=tf.random_normal_initializer(stddev=0.001), 173 | dtype=tf.float32) 174 | weights = tf.nn.l2_normalize(weights, axis=0) 175 | bias = tf.Variable(tf.zeros([num_classes])) 176 | # alpha = 64.0 177 | logits = tf.matmul(embedding, weights) # + bias 178 | # one_hot_labels = tf.one_hot(labels, num_classes) 179 | # logits -= one_hot_labels * 0.1 180 | return logits 181 | 182 | 183 | def soft_thresh(lam): 184 | # lam = tf.get_variable( 185 | # "soft_thresh", shape=(), 186 | # initializer=tf.zeros_initializer, 187 | # constraint=lambda x: "hello") 188 | # lam = lam 189 | def activation(x): 190 | ret = tf.nn.relu(x - lam) - tf.nn.relu(-x - lam) 191 | return ret 192 | return activation 193 | 194 | 195 | def nin(features, labels, is_training, num_classes): 196 | if is_training: 197 | features = random_flip_rl( 198 | random_crop( 199 | tf.pad( 200 | features, [[0,0],[4,4],[4,4],[0,0]], 201 | "CONSTANT"), 32, 32)) 202 | 203 | with tf.variable_scope("NIN"): 204 | with slim.arg_scope([slim.conv2d], 205 | weights_initializer=tf.contrib.slim.variance_scaling_initializer(), 206 | weights_regularizer=slim.l2_regularizer(0.0001)): 207 | with slim.arg_scope([slim.conv2d], stride=1, padding='SAME', activation_fn=tf.nn.relu): 208 | with slim.arg_scope(([slim.dropout]), is_training=is_training, keep_prob=0.5): 209 | n = features 210 | n = slim.conv2d(n, 192, [5, 5], scope='conv2d_0') 211 | n = slim.conv2d(n, 160, [1, 1], scope='conv2d_1') 212 
| n = slim.conv2d(n, 96, [1, 1], scope='conv2d_2') 213 | n = slim.max_pool2d(n, [3,3], stride=2, padding='SAME') 214 | n = slim.dropout(n) 215 | n = slim.conv2d(n, 192, [5, 5], scope='conv2d_3') 216 | n = slim.conv2d(n, 192, [1, 1], scope='conv2d_4') 217 | n = slim.conv2d(n, 192, [1, 1], scope='conv2d_5') 218 | n = slim.avg_pool2d(n, [3,3], stride=2, padding='SAME') 219 | n = slim.dropout(n) 220 | n = slim.conv2d(n, 192, [3, 3], scope='conv2d_6') 221 | n = slim.conv2d(n, 192, [1, 1], activation_fn=None, scope='conv2d_7') 222 | 223 | n = tf.reduce_mean(n, [1, 2], keepdims=False) 224 | n = tf.nn.relu(n) 225 | 226 | if FLAGS.final_activation == 'none': 227 | final_activation = None 228 | elif FLAGS.final_activation == 'relu': 229 | final_activation = tf.nn.relu 230 | elif FLAGS.final_activation == 'soft_thresh': 231 | final_activation = soft_thresh(0.1) 232 | else: 233 | raise ValueError('Unknown final_activation %s' % FLAGS.final_activation) 234 | 235 | with tf.variable_scope('embedding'): 236 | with slim.arg_scope([slim.fully_connected], 237 | activation_fn=final_activation, 238 | weights_regularizer=slim.l2_regularizer(0.001), 239 | biases_initializer=tf.zeros_initializer()): 240 | n = slim.fully_connected(n, FLAGS.embedding_size, scope = "fc1") 241 | embedding = tf.nn.l2_normalize(n, axis=1) 242 | 243 | end_points = {} 244 | end_points['embedding'] = embedding 245 | 246 | logits = proxy_layer(embedding, labels, num_classes) 247 | end_points['Logits'] = logits 248 | 249 | return logits, end_points 250 | 251 | 252 | def get_embeddings(estimator, global_step, is_train): 253 | fn = lambda: input_fn( 254 | FLAGS.data_dir, is_train, FLAGS.batch_size, 1) 255 | 256 | predictions = estimator.predict( 257 | input_fn=fn, 258 | yield_single_examples=True, 259 | predict_keys=["true_labels", "true_label_texts", "small_embeddings", "filename"], 260 | checkpoint_path=os.path.join(FLAGS.model_dir, 'model.ckpt-%d' % global_step) 261 | ) 262 | 263 | labels = [] 264 | embeddings = [] 265 | 266 | for prediction in tqdm(predictions): 267 | labels.append(prediction["true_labels"][0]) 268 | embeddings.append(prediction["small_embeddings"]) 269 | 270 | labels = np.asarray(labels) 271 | embeddings = np.asarray(embeddings) 272 | embeddings[np.abs(embeddings) <= FLAGS.zero_threshold] = 0.0 273 | return embeddings, labels 274 | 275 | def eval_precision(qemb, qlabel, demb, dlabel, ks): 276 | MAX_QUERY = 10000 277 | 278 | num_correct = np.zeros(len(ks)) 279 | num_queries = min(qemb.shape[0], MAX_QUERY) 280 | for i in tqdm(range(num_queries)): 281 | dis = - np.squeeze(np.matmul(qemb[i], demb.T)) 282 | dis[i] = 1e10 283 | pred = np.argsort(dis) 284 | for j, k in enumerate(ks): 285 | num_correct[j] += np.mean(dlabel[pred[:k]] == qlabel[i]) 286 | recall = num_correct / num_queries 287 | return recall 288 | 289 | def compute_metrics(estimator, global_step, is_training, summary_writer=None): 290 | print("Computing for global step %d" % global_step) 291 | 292 | ks = [1, 4, 16, 64] 293 | 294 | test_embeddings, test_labels = get_embeddings(estimator, global_step, False) 295 | if is_training: 296 | train_embeddings, train_labels = get_embeddings(estimator, global_step, True) 297 | 298 | if is_training: 299 | precisions = eval_precision(test_embeddings, test_labels, train_embeddings, train_labels, ks) 300 | else: 301 | precisions = eval_precision(test_embeddings, test_labels, test_embeddings, test_labels, ks) 302 | 303 | print("Global Step %d" % global_step) 304 | # print("Max dot product") 305 | # print("\tmean", np.mean(max_dots), 
"std", np.std(max_dots)) 306 | 307 | if is_training: 308 | embeddings = train_embeddings 309 | else: 310 | embeddings = test_embeddings 311 | 312 | nnz = np.abs(embeddings) >= FLAGS.zero_threshold 313 | sparsity = 1.0 - np.mean(nnz) 314 | flops = np.mean(nnz, axis=0) 315 | flops = np.sum(flops * flops) 316 | 317 | mean_nnz = np.mean(np.sum(nnz, axis=1)) 318 | flops_ratio = flops * FLAGS.embedding_size / (mean_nnz**2) 319 | print("\tSparsity", sparsity, "Flops", flops, "Ratio", flops_ratio) 320 | 321 | print("Precision") 322 | print("".join(["\t@%d %f" % (ks[i], precisions[i]) for i in range(len(ks))])) 323 | 324 | row_sparsity = np.mean(embeddings <= FLAGS.zero_threshold, axis=1) 325 | print("Row sparsity") 326 | print("\tmean", np.mean(row_sparsity), "std", np.std(row_sparsity)) 327 | 328 | col_sparsity = np.mean(embeddings <= FLAGS.zero_threshold, axis=0) 329 | print("Column sparsity") 330 | print("\tmean", np.mean(col_sparsity), "std", np.std(col_sparsity)) 331 | 332 | # print("NMI %f" % nmi) 333 | 334 | if summary_writer is not None: 335 | summary = tf.Summary( 336 | value=[tf.Summary.Value(tag='sparsity/small', simple_value=mean_nnz), 337 | tf.Summary.Value(tag='sparsity/flops', simple_value=flops), 338 | tf.Summary.Value(tag='sparsity/ratio', simple_value=flops_ratio)] + 339 | [tf.Summary.Value(tag='accuracy/precision_%d' % (ks[i]), simple_value=precisions[i]) \ 340 | for i in range(len(ks))] 341 | ) 342 | 343 | summary_writer.add_summary(summary, global_step=global_step) 344 | -------------------------------------------------------------------------------- /cifar/dist.py: -------------------------------------------------------------------------------- 1 | from tensorflow.python.framework import dtypes, ops, sparse_tensor, tensor_shape 2 | from tensorflow.python.ops import array_ops, control_flow_ops, logging_ops, math_ops,\ 3 | nn, script_ops, sparse_ops 4 | from tensorflow.python.summary import summary 5 | import tensorflow as tf 6 | 7 | def deviation_from_kth_element(feature, k): 8 | ''' 9 | Args: 10 | feature - 2-D tensor of size [number of data, feature dimensions] 11 | k - int 12 | number of bits to be activated 13 | k < feature dimensions 14 | Return: 15 | deviation : 2-D Tensor of size [number of data, feature dimensions] 16 | deviation from k th element 17 | ''' 18 | feature_top_k = tf.nn.top_k(feature, k=k+1)[0] # [number of data, k+1] 19 | 20 | rho = tf.stop_gradient(tf.add(feature_top_k[:,k-1], feature_top_k[:,k])/2) # [number of data] 21 | rho_r = tf.reshape(rho, [-1,1]) # [number of data, 1] 22 | 23 | deviation = tf.subtract(feature, rho_r) # [number of data, feature dimensions] 24 | return deviation 25 | 26 | 27 | def pairwise_distance_euclid_v2(feature1, feature2): 28 | """Computes the pairwise distance matrix with numerical stability. 29 | output[i, j] = || feature[i, :] - feature[j, :] ||_2 30 | Args: 31 | feature1 - 2-D Tensor of size [number of data1, feature dimension]. 32 | feature2 - 2-D Tensor of size [number of data2, feature dimension]. 33 | Returns: 34 | pairwise_distances - 2-D Tensor of size [number of data1, number of data2]. 35 | """ 36 | pairwise_distances = math_ops.add( 37 | math_ops.reduce_sum( 38 | math_ops.square(feature1), 39 | axis=[1], 40 | keep_dims=True), 41 | math_ops.reduce_sum( 42 | math_ops.square( 43 | array_ops.transpose(feature2)), 44 | axis=[0], 45 | keep_dims=True)) - 2.0 * math_ops.matmul( 46 | feature1, array_ops.transpose(feature2)) 47 | 48 | # Deal with numerical inaccuracies. Set small negatives to zero. 
49 | pairwise_distances = math_ops.maximum(pairwise_distances, 0.0) 50 | # Get the mask where the zero distances are at. 51 | error_mask = math_ops.less_equal(pairwise_distances, 0.0) 52 | 53 | pairwise_distances = math_ops.multiply( 54 | pairwise_distances, math_ops.to_float(math_ops.logical_not(error_mask))) 55 | return pairwise_distances 56 | 57 | def pairwise_distance_euclid(feature, squared=False): 58 | """Computes the pairwise distance matrix with numerical stability. 59 | output[i, j] = || feature[i, :] - feature[j, :] ||_2 60 | Args: 61 | feature - 2-D Tensor of size [number of data, feature dimension]. 62 | squared - Boolean, whether or not to square the pairwise distances. 63 | Returns: 64 | pairwise_distances - 2-D Tensor of size [number of data, number of data]. 65 | """ 66 | pairwise_distances_squared = math_ops.add( 67 | math_ops.reduce_sum( 68 | math_ops.square(feature), 69 | axis=[1], 70 | keep_dims=True), 71 | math_ops.reduce_sum( 72 | math_ops.square( 73 | array_ops.transpose(feature)), 74 | axis=[0], 75 | keep_dims=True)) - 2.0 * math_ops.matmul( 76 | feature, array_ops.transpose(feature)) 77 | 78 | # Deal with numerical inaccuracies. Set small negatives to zero. 79 | pairwise_distances_squared = math_ops.maximum(pairwise_distances_squared, 0.0) 80 | # Get the mask where the zero distances are at. 81 | error_mask = math_ops.less_equal(pairwise_distances_squared, 0.0) 82 | 83 | # Optionally take the sqrt. 84 | if squared: 85 | pairwise_distances = pairwise_distances_squared 86 | else: 87 | pairwise_distances = math_ops.sqrt( 88 | pairwise_distances_squared + math_ops.to_float(error_mask) * 1e-16) 89 | 90 | # Undo conditionally adding 1e-16. 91 | pairwise_distances = math_ops.multiply( 92 | pairwise_distances, math_ops.to_float(math_ops.logical_not(error_mask))) 93 | 94 | num_data = array_ops.shape(feature)[0] 95 | # Explicitly set diagonals to zero. 96 | mask_offdiagonals = array_ops.ones_like(pairwise_distances) - array_ops.diag( 97 | array_ops.ones([num_data])) 98 | pairwise_distances = math_ops.multiply(pairwise_distances, mask_offdiagonals) 99 | return pairwise_distances 100 | 101 | def masked_maximum(data, mask, dim=1): 102 | """Computes the axis wise maximum over chosen elements. 103 | Args: 104 | data : float, 2D tensor [n, m] 105 | mask : bool, 2D tensor [n, m] 106 | dim : int, the dimension over which to compute the maximum. 107 | Returns: 108 | masked_maximums : 2D Tensor 109 | dim=0 => [n,1] 110 | dim=1 => [1,m] 111 | get maximum among mask=1 112 | """ 113 | axis_minimums = math_ops.reduce_min(data, dim, keep_dims=True) 114 | masked_maximums = math_ops.reduce_max(math_ops.multiply(data - axis_minimums, mask), dim, keep_dims=True) + axis_minimums 115 | return masked_maximums 116 | 117 | def masked_minimum(data, mask, dim=1): 118 | """Computes the axis wise minimum over chosen elements. 119 | Args: 120 | data : float, 2D tensor [n, m] 121 | mask : bool, 2D tensor [n, m] 122 | dim : int, the dimension over which to compute the minimum. 
123 | Returns: 124 | masked_minimums : 2D Tensor 125 | dim=0 => [n,1] 126 | dim=1 => [1,m] 127 | get minimum among mask=1 128 | """ 129 | axis_maximums = math_ops.reduce_max(data, dim, keep_dims=True) # [n, 1] or [1, m] 130 | masked_minimums = math_ops.reduce_min(math_ops.multiply(data - axis_maximums, mask), dim, keep_dims=True) + axis_maximums # [n, 1] or [1, m] 131 | return masked_minimums 132 | 133 | def triplet_semihard_loss(labels, embeddings, pairwise_distance, margin=1.0): 134 | """Computes the triplet loss with semi-hard negative mining. 135 | Args: 136 | labels - 1-D tensor [batch_size] as tf.int32 137 | multiclass integer labels. 138 | embeddings - 2-D tensor [batch_size, feature dimensions] as tf.float32 139 | `Tensor` of embedding vectors. Embeddings should be l2 normalized. 140 | pairwise_distance - func 141 | takes a 2D tensor [number of data, feature dims] 142 | returns a 2D tensor [number of data, number of data] 143 | margin - float, defaults to 1.0 144 | margin term in the loss definition. 145 | 146 | Returns: 147 | triplet_loss - tf.float32 scalar. 148 | """ 149 | # Reshape [batch_size] label tensor to a [batch_size, 1] label tensor. 150 | lshape = array_ops.shape(labels) 151 | assert lshape.shape == 1 152 | labels = array_ops.reshape(labels, [lshape[0], 1]) 153 | 154 | # pdist_matrix[i][j] = dist(i, j) 155 | pdist_matrix = pairwise_distance(embeddings) # [batch_size, batch_size] 156 | # adjacency[i][j]=1 if label[i]==label[j] else 0 157 | adjacency = math_ops.equal(labels, array_ops.transpose(labels)) # [batch_size, batch_size] 158 | # adjacency_not[i][j]=0 if label[i]==label[j] else 1 159 | adjacency_not = math_ops.logical_not(adjacency) # [batch_size, batch_size] 160 | batch_size = array_ops.size(labels) 161 | 162 | # Compute the mask. 163 | # pdist_matrix_tile[batch_size*i+j, k] = distance(j, k) 164 | pdist_matrix_tile = array_ops.tile(pdist_matrix, [batch_size, 1]) # [batch_size*batch_size, batch_size] 165 | # reshape(transpose(pdist_matrix), [-1, 1])[batch_size*i+j][0] = distance(j, i) 166 | # tile(adjacency_not, [batch_size, 1])[batch_size*i+j][k] = 1 if label[j]!=label[k] otherwise 0 167 | # mask[batch_size*i+j][k] = 1 if label[j]!=label[k] (different labels) and distance(j,k)>distance(j,i) 168 | mask = math_ops.logical_and( 169 | array_ops.tile(adjacency_not, [batch_size, 1]), 170 | math_ops.greater(pdist_matrix_tile, array_ops.reshape(array_ops.transpose(pdist_matrix), [-1, 1]))) # [batch_size*batch_size, batch_size] 171 | # mask_final[i][j]=1 if there exists k s.t. label[j]!=label[k] and distance(j,k)>distance(j,i) 172 | mask_final = array_ops.reshape( 173 | math_ops.greater(math_ops.reduce_sum(math_ops.cast(mask, dtype=dtypes.float32), 1, keep_dims=True), 0.0), 174 | [batch_size, batch_size])# [batch_size, batch_size] 175 | # mask_final[i][j]=1 if there exists k s.t. label[i]!=label[k] and distance(i,k)>distance(i,j)
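# (In words: for an anchor i with positive j, mask_final[i][j] says whether the
# batch contains at least one k with a different label that is farther from i
# than j is; if so, the "outside" negative below is used, otherwise the
# "inside" one.)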
176 | mask_final = array_ops.transpose(mask_final)# [batch_size, batch_size] 177 | 178 | adjacency_not = math_ops.cast(adjacency_not, dtype=dtypes.float32) # [batch_size, batch_size] 179 | mask = math_ops.cast(mask, dtype=dtypes.float32) # [batch_size*batch_size, batch_size] 180 | 181 | # masked_minimum(pdist_matrix_tile, mask)[batch_size*i+j][0] = pdist_matrix[j][k] s.t. minimum over k's with label[j]!=label[k], distance(j,k)>distance(j,i) 182 | # negatives_outside[i][j] = pdist_matrix[j][k] s.t. minimum over k's with label[j]!=label[k], distance(j,k)>distance(j,i) 183 | negatives_outside = array_ops.reshape(masked_minimum(pdist_matrix_tile, mask), [batch_size, batch_size]) # [batch_size, batch_size] 184 | # negatives_outside[i][j] = pdist_matrix[i][k] s.t. minimum over k's with label[i]!=label[k], distance(i,k)>distance(i,j) 185 | negatives_outside = array_ops.transpose(negatives_outside) # [batch_size, batch_size] 186 | 187 | # negatives_inside[i][j] = pdist_matrix[i][k] s.t. maximum over k's with label[i]!=label[k] 188 | negatives_inside = array_ops.tile(masked_maximum(pdist_matrix, adjacency_not), [1, batch_size]) # [batch_size, batch_size] 189 | 190 | # semi_hard_negatives[i][j] = negatives_outside[i][j] if mask_final[i][j] else negatives_inside[i][j] 191 | semi_hard_negatives = array_ops.where(mask_final, negatives_outside, negatives_inside) # [batch_size, batch_size] 192 | loss_mat = math_ops.add(margin, pdist_matrix - semi_hard_negatives) # [batch_size, batch_size] 193 | 194 | # mask_positives[i][j] = 1 if label[i]==label[j], and i!=j 195 | mask_positives = math_ops.cast(adjacency, dtype=dtypes.float32) - array_ops.diag(array_ops.ones([batch_size])) 196 | 197 | # In lifted-struct, the authors multiply by 0.5 for the upper triangular part; 198 | # in semihard, all positive pairs except the diagonal are taken. 199 | num_positives = math_ops.reduce_sum(mask_positives) 200 | 201 | triplet_loss = math_ops.truediv( 202 | math_ops.reduce_sum(math_ops.maximum(math_ops.multiply(loss_mat, mask_positives), 0.0)), 203 | num_positives, 204 | name='triplet_semihard_loss') # hinge 205 | return triplet_loss 206 | 207 | def npairs_loss(labels, embeddings_anchor, embeddings_positive, 208 | reg_lambda=0.002, print_losses=False): 209 | """Computes the npairs loss. 210 | Npairs loss expects paired data where a pair is composed of samples from the 211 | same label, and each pair in the minibatch has a different label. The loss 212 | has two components. The first component is the L2 regularizer on the 213 | embedding vectors. The second component is the sum of cross entropy loss 214 | which takes each row of the pair-wise similarity matrix as logits and 215 | the remapped one-hot labels as labels. 216 | See: http://www.nec-labs.com/uploads/images/Department-Images/MediaAnalytics/papers/nips16_npairmetriclearning.pdf 217 | 218 | Args: 219 | labels: 1-D tf.int32 `Tensor` of shape [batch_size/2]. 220 | embeddings_anchor: 2-D Tensor of shape [batch_size/2, embedding_dim] for the 221 | embedding vectors for the anchor images. Embeddings should not be 222 | l2 normalized. 223 | embeddings_positive: 2-D Tensor of shape [batch_size/2, embedding_dim] for the 224 | embedding vectors for the positive images. Embeddings should not be 225 | l2 normalized. 226 | reg_lambda: Float. L2 regularization term on the embedding vectors. 227 | print_losses: Boolean. Option to print the xent and l2loss. 228 | Returns: 229 | npairs_loss: tf.float32 scalar. 230 | """ 231 | # Add the regularizer on the embedding. 
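# (Written out, the penalty computed below is
#   l2loss = 0.25 * reg_lambda * (mean_i ||anchor_i||_2^2 + mean_i ||positive_i||_2^2),
# i.e. the L2 regularizer on the unnormalized embeddings described in the docstring.)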
232 | reg_anchor = math_ops.reduce_mean(math_ops.reduce_sum(math_ops.square(embeddings_anchor), 1)) 233 | reg_positive = math_ops.reduce_mean(math_ops.reduce_sum(math_ops.square(embeddings_positive), 1)) 234 | l2loss = math_ops.multiply(0.25 * reg_lambda, reg_anchor + reg_positive, name='l2loss') 235 | 236 | # Get per pair similarities. 237 | similarity_matrix = math_ops.matmul(embeddings_anchor, embeddings_positive, transpose_a=False, transpose_b=True) # [batch_size/2, batch_size/2] 238 | 239 | # Reshape [batch_size/2] label tensor to a [batch_size/2, 1] label tensor. 240 | lshape = array_ops.shape(labels) 241 | assert lshape.shape == 1 242 | labels = array_ops.reshape(labels, [lshape[0], 1]) 243 | 244 | # labels_remapped[i][j] = 1 if label[i] == label[j] otherwise 0 245 | labels_remapped = math_ops.to_float(math_ops.equal(labels, array_ops.transpose(labels))) # [batch_size/2, batch_size/2] 246 | labels_remapped /= math_ops.reduce_sum(labels_remapped, 1, keep_dims=True) 247 | 248 | # Add the softmax loss. 249 | xent_loss = nn.softmax_cross_entropy_with_logits(logits=similarity_matrix, labels=labels_remapped) 250 | xent_loss = math_ops.reduce_mean(xent_loss, name='xentropy') 251 | 252 | if print_losses: 253 | xent_loss = logging_ops.Print(xent_loss, ['cross entropy:', xent_loss, 'l2loss:', l2loss]) 254 | return l2loss + xent_loss 255 | -------------------------------------------------------------------------------- /evaluate_mfacenet_reranking.py: -------------------------------------------------------------------------------- 1 | import pprint 2 | import time 3 | import json 4 | from tqdm import tqdm 5 | import sys 6 | import os 7 | import numpy as np 8 | import shutil 9 | import glob 10 | from sklearn.model_selection import KFold 11 | import pickle 12 | from copy import deepcopy 13 | import multiprocessing 14 | import h5py 15 | 16 | from absl import flags 17 | from absl import app 18 | 19 | flags.DEFINE_string(name='dense_suffix', 20 | help='suffix of the dense embedding files', 21 | default='dense_resnet_512_run_1.hdf5') 22 | flags.DEFINE_string(name='sparse_suffix', 23 | help='suffix of the sparse embedding files', 24 | default=None) 25 | flags.DEFINE_float(name='threshold', 26 | help='Threshold used to filter the examples', 27 | default=None) 28 | flags.DEFINE_integer(name='rerank_topk', 29 | help='topk elements to rerank', 30 | default=1000) 31 | 32 | 33 | FLAGS = flags.FLAGS 34 | 35 | def compute_recall(sorted_labels, correct_label, top_k): 36 | scores = [[] for _ in top_k] 37 | incorrect_seen = 0 38 | for label in sorted_labels: 39 | if label != -1 and label != correct_label: 40 | continue 41 | if label != correct_label: 42 | incorrect_seen += 1 43 | else: 44 | for i, k in enumerate(top_k): 45 | scores[i].append(1 if incorrect_seen < k else 0) 46 | scores = np.mean(scores, axis=1) 47 | return scores 48 | 49 | 50 | def get_top_k_idx(labels, scores, k): 51 | count = 0 52 | for i, label in enumerate(labels): 53 | if label == -1: 54 | count += 1 55 | if count >= k: 56 | return i 57 | if scores[i] < FLAGS.threshold: 58 | return i 59 | 60 | 61 | def get_recall_k(megaface_sparse, megaface_dense, facescrub_sparse, facescrub_dense, facescrub_labels, top_k, rerank_k): 62 | sparse_database = np.concatenate((facescrub_sparse, megaface_sparse)) 63 | dense_database = np.concatenate((facescrub_dense, megaface_dense)) 64 | dlabels = np.concatenate((facescrub_labels, [-1] * len(megaface_sparse))) 65 | num_probes = len(facescrub_labels) 66 | print("Num probes", num_probes) 67 | print("Multiplying 
matrices") 68 | dot = np.matmul(facescrub_sparse, sparse_database.T) 69 | print("Sorting the matrix") 70 | ranked_indices = np.argsort(-dot, axis=1) 71 | recalls = [] 72 | it = tqdm(range(num_probes)) 73 | for i in it: 74 | ranked = ranked_indices[i] 75 | labels = dlabels[ranked] 76 | scores = dot[i][ranked] 77 | if rerank_k > 0: 78 | print("Re-ranking top-%d" % rerank_k) 79 | top_idx = get_top_k_idx(labels, scores, rerank_k) 80 | labels, rest = labels[:top_idx+1], labels[top_idx+1:] 81 | ranked = ranked[:top_idx+1] 82 | dense_emb = dense_database[ranked] 83 | dense_dot = np.matmul(facescrub_dense[i], dense_emb.T).ravel() 84 | reranked_indices = np.argsort(-dense_dot) 85 | reranked_labels = labels[reranked_indices] 86 | labels = np.concatenate((reranked_labels, rest)) 87 | recalls.append(compute_recall(labels, facescrub_labels[i], top_k)) 88 | it.set_description('Recalls %.3f %.3f %.3f %.3f' % tuple(np.mean(recalls, axis=0))) 89 | 90 | return np.mean(recalls, axis=0) 91 | 92 | 93 | def main(_): 94 | print("Official Megaface Metrics") 95 | 96 | megaface_sparse = h5py.File('embeddings_reranking/megaface_' + FLAGS.sparse_suffix, 'r')['embedding'] 97 | megaface_dense = h5py.File('embeddings_reranking/megaface_' + FLAGS.dense_suffix, 'r')['embedding'] 98 | 99 | facescrub_sparse = h5py.File('embeddings_reranking/facescrub_' + FLAGS.sparse_suffix, 'r')['embedding'] 100 | facescrub_dense = h5py.File('embeddings_reranking/facescrub_' + FLAGS.dense_suffix, 'r')['embedding'] 101 | facescrub_labels = np.asarray(h5py.File('embeddings_reranking/facescrub_' + FLAGS.sparse_suffix, 'r')['label']) 102 | facescrub_dense_labels = np.asarray( 103 | h5py.File('embeddings_reranking/facescrub_' + FLAGS.dense_suffix, 'r')['label']) 104 | assert(np.all(facescrub_labels == facescrub_dense_labels)) 105 | 106 | nnz = np.abs(megaface_sparse) >= 1e-8 107 | sparsity = np.mean(nnz) 108 | flops = np.mean(nnz, axis=0) 109 | flops = np.sum(flops * flops) 110 | 111 | mean_nnz = np.mean(np.sum(nnz, axis=1)) 112 | flops_ratio = flops * nnz.shape[1] / (mean_nnz**2) 113 | 114 | print("\tSparsity", sparsity, "Flops", flops, "Ratio", flops_ratio) 115 | 116 | top_k = [1, 10, 50, 100] 117 | rerank_k = FLAGS.rerank_topk 118 | print("Rerank TOP_K", rerank_k) 119 | recalls = get_recall_k(megaface_sparse, megaface_dense, 120 | facescrub_sparse, facescrub_dense, facescrub_labels, top_k, rerank_k) 121 | 122 | print("Non Binarized") 123 | for recall, k in zip(recalls, top_k): 124 | print("\trecall@%d" % k, recall) 125 | 126 | 127 | if __name__ == '__main__': 128 | app.run(main) 129 | -------------------------------------------------------------------------------- /evaluate_retrieval.py: -------------------------------------------------------------------------------- 1 | import tensorflow as tf 2 | import model 3 | import pprint 4 | import time 5 | from PIL import Image 6 | import absl 7 | import sys 8 | import os 9 | import numpy as np 10 | import horovod.tensorflow as hvd 11 | from tensorflow.core.framework.summary_pb2 import Summary 12 | from tensorflow import logging 13 | import shutil 14 | import glob 15 | from sklearn.cluster import MiniBatchKMeans 16 | from sklearn.metrics.cluster import normalized_mutual_info_score 17 | import sklearn 18 | import traceback 19 | import logging 20 | import scipy 21 | import flags 22 | from tqdm import tqdm 23 | from copy import deepcopy 24 | from cifar import cifar 25 | 26 | # Define all necessary hyperparameters as flags 27 | FLAGS = absl.flags.FLAGS 28 | 29 | def save_image(image, filename): 30 | 
feature_ = (image + 1.0) * 128 31 | img = Image.fromarray(np.uint8(feature_), 'RGB') 32 | img.save(filename) 33 | 34 | 35 | def img_full_path(filename): 36 | synset = filename.split('_')[0] 37 | full_path = os.path.join('../imagenet10k/images', synset, filename) 38 | return full_path 39 | 40 | 41 | def get_global_step(ckpt_path=None): 42 | if ckpt_path is None: 43 | ckpt_path = tf.train.latest_checkpoint(FLAGS.model_dir) 44 | base = os.path.basename(ckpt_path) 45 | global_step = int(base.split('-')[1]) 46 | return global_step 47 | 48 | 49 | def get_sorted_global_steps(): 50 | all_paths = glob.glob(os.path.join(FLAGS.model_dir, '*.ckpt*data*')) 51 | global_steps = [int(path.split('-')[-4].split('.')[0]) for path in all_paths] 52 | global_steps.sort() 53 | return global_steps 54 | 55 | 56 | def get_last_eval_step(): 57 | try: 58 | last_step = np.loadtxt(os.path.join(FLAGS.model_dir, 'eval', 'last_eval.txt')) 59 | return last_step 60 | except OSError: 61 | save_last_eval_step(0) 62 | return 0 63 | 64 | 65 | def save_last_eval_step(last_step): 66 | file_path = os.path.join(FLAGS.model_dir, 'eval', 'last_eval.txt') 67 | np.savetxt(file_path, [last_step]) 68 | print('Saved last_step=%d to %s' % (last_step, file_path)) 69 | 70 | 71 | def binarize(mat): 72 | mat = deepcopy(mat) 73 | mat[mat > FLAGS.zero_threshold] = 1.0 74 | mat[mat < FLAGS.zero_threshold] = -1.0 75 | norm = np.linalg.norm(mat, axis=1, keepdims=True) 76 | mat = mat / norm 77 | return mat 78 | 79 | 80 | def assign_by_euclidian_at_k(X, T, k): 81 | """ 82 | X : [nb_samples x nb_features], e.g. 100 x 64 (embeddings) 83 | k : for each sample, assign target labels of k nearest points 84 | """ 85 | # distances = sklearn.metrics.pairwise.pairwise_distances(X) 86 | distances = scipy.spatial.distance.squareform(scipy.spatial.distance.pdist(X)) 87 | # get nearest points 88 | indices = np.argsort(distances, axis=1)[:, 1:k + 1] 89 | return np.array([[T[i] for i in ii] for ii in indices]) 90 | 91 | 92 | def calc_recall_at_k(T, Y, k): 93 | """ 94 | T : [nb_samples] (target labels) 95 | Y : [nb_samples x k] (k predicted labels/neighbours) 96 | """ 97 | s = sum([1 for t, y in zip(T, Y) if t in y[:k]]) 98 | return s / (1. 
* len(T)) 99 | 100 | 101 | def evaluate(X, T, nb_classes): 102 | # calculate NMI with kmeans clustering 103 | # nmi = sklearn.metrics.cluster.normalized_mutual_info_score( 104 | # sklearn.cluster.KMeans(nb_classes).fit(X).labels_, 105 | # T 106 | # ) 107 | # logging.info("NMI: {:.3f}".format(nmi * 100)) 108 | 109 | # get predictions by assigning nearest 8 neighbors with euclidian 110 | Y = assign_by_euclidian_at_k(X, T, 8) 111 | 112 | # calculate recall @ 1, 2, 4, 8 113 | recall = [] 114 | for k in [1, 2, 4, 8]: 115 | r_at_k = calc_recall_at_k(T, Y, k) 116 | recall.append(r_at_k) 117 | logging.info("R@{} : {:.3f}".format(k, 100 * r_at_k)) 118 | 119 | # return nmi, recall 120 | 121 | 122 | def compute_nmi(embeddings, labels): 123 | print("Clustering ...") 124 | num_classes = len(set(labels)) 125 | print("num_classes %d\tnum_examples %d" % (num_classes, len(embeddings))) 126 | 127 | centers = {} 128 | for emb, label in zip(embeddings, labels): 129 | if label not in centers: 130 | centers[label] = (emb, 1) 131 | else: 132 | pair = centers[label] 133 | centers[label] = (pair[0] + emb, pair[1] + 1) 134 | centers = [centers[key] for key in centers] 135 | centers = [pair[0] / pair[1] for pair in centers] 136 | centers = np.asarray(centers) 137 | print("Computed centers") 138 | 139 | kmeans = MiniBatchKMeans(n_clusters=num_classes, init=centers, 140 | batch_size=1000, max_iter=1000) 141 | kmeans = kmeans.fit(embeddings) 142 | cluster_labels = kmeans.labels_ 143 | nmi = normalized_mutual_info_score(labels, cluster_labels) 144 | return nmi 145 | 146 | MAX_QUERY = 1000 147 | MAX_DISTRACTORS = 500000 148 | def eval_recall(embedding, label, ks): 149 | embedding = np.copy(embedding) 150 | # Normalized embeddings 151 | embedding /= np.linalg.norm(embedding, axis=1).reshape((-1, 1)) 152 | norm = np.sum(embedding * embedding, axis = 1) 153 | num_correct = np.zeros(len(ks)) 154 | num_queries = min(embedding.shape[0], MAX_QUERY) 155 | for i in tqdm(range(num_queries)): 156 | dis = norm[i] + norm - 2 * np.squeeze(np.matmul(embedding[i],embedding.T)) 157 | dis[i] = 1e10 158 | pred = np.argsort(dis) 159 | for j, k in enumerate(ks): 160 | if label[i] in label[pred[:k]]: 161 | num_correct[j] += 1 162 | recall = num_correct / num_queries 163 | return recall 164 | 165 | 166 | def eval_precision(embedding, label, ks): 167 | embedding = np.copy(embedding) 168 | # Normalized embeddings 169 | embedding /= np.linalg.norm(embedding, axis=1).reshape((-1, 1)) 170 | num_correct = np.zeros(len(ks)) 171 | num_queries = min(embedding.shape[0], MAX_QUERY) 172 | for i in tqdm(range(num_queries)): 173 | dis = - np.squeeze(np.matmul(embedding[i],embedding.T)) 174 | dis[i] = 1e10 175 | pred = np.argsort(dis) 176 | for j, k in enumerate(ks): 177 | num_correct[j] += np.mean(label[pred[:k]] == label[i]) 178 | recall = num_correct / num_queries 179 | return recall 180 | 181 | 182 | def compute_metrics(estimator, global_step, is_training, summary_writer=None): 183 | global MAX_DISTRACTORS 184 | global MAX_QUERY 185 | 186 | print("Computing for global step %d" % global_step) 187 | if FLAGS.model == 'cifar100': 188 | input_fn = lambda: cifar.input_fn( 189 | FLAGS.data_dir, is_training, FLAGS.batch_size, 1) 190 | else: 191 | if not is_training: 192 | MAX_DISTRACTORS = 1e10 193 | MAX_QUERY = 10000 194 | input_fn = lambda: model.imagenet_iterator(is_training=is_training) 195 | 196 | predictions = estimator.predict( 197 | input_fn=input_fn, 198 | yield_single_examples=True, 199 | predict_keys=["true_labels", "true_label_texts", 
"small_embeddings", "filename"], 200 | checkpoint_path=os.path.join(FLAGS.model_dir, 'model.ckpt-%d' % global_step) 201 | ) 202 | 203 | true_labels = [] 204 | true_label_texts = [] 205 | small_embeddings = [] 206 | filenames = [] 207 | 208 | start_time = time.time() 209 | step = 0 210 | for prediction in tqdm(predictions): 211 | filenames.append(prediction["filename"]) 212 | true_labels.append(prediction["true_labels"][0]) 213 | true_label_texts.append(prediction["true_label_texts"]) 214 | small_embeddings.append(prediction["small_embeddings"]) 215 | step += 1 216 | if FLAGS.debug and step >= 100000: 217 | break 218 | if step >= MAX_DISTRACTORS: 219 | break 220 | inference_time = time.time() - start_time 221 | 222 | true_labels = np.asarray(true_labels) 223 | small_embeddings = np.asarray(small_embeddings) 224 | small_embeddings[np.abs(small_embeddings) <= FLAGS.zero_threshold] = 0.0 225 | binarized_embeddings = binarize(small_embeddings) 226 | 227 | ks = [1, 4, 16, 64] 228 | precisions = eval_precision(small_embeddings, true_labels, ks) 229 | bin_precisions = eval_precision(binarized_embeddings, true_labels, ks) 230 | 231 | print("Global Step %d" % global_step) 232 | # print("Max dot product") 233 | # print("\tmean", np.mean(max_dots), "std", np.std(max_dots)) 234 | 235 | metrics_time = time.time() - start_time 236 | 237 | print("Time") 238 | print("\tInference", inference_time / 60, "minutes") 239 | print("\tMetrics", metrics_time / 60, "minutes") 240 | 241 | sys.stdout.flush() 242 | 243 | nnz = np.abs(small_embeddings) >= FLAGS.zero_threshold 244 | sparsity = 1.0 - np.mean(nnz) 245 | flops = np.mean(nnz, axis=0) 246 | flops = np.sum(flops * flops) 247 | 248 | mean_nnz = np.mean(np.sum(nnz, axis=1)) 249 | flops_ratio = flops * FLAGS.embedding_size / (mean_nnz**2) 250 | print("\tSparsity", sparsity, "Flops", flops, "Ratio", flops_ratio) 251 | 252 | print("Precision") 253 | print("".join(["\t@%d %f" % (ks[i], precisions[i]) for i in range(len(ks))])) 254 | 255 | print("Bin Precision") 256 | print("".join(["\t@%d %f" % (ks[i], bin_precisions[i]) for i in range(len(ks))])) 257 | 258 | row_sparsity = np.mean(small_embeddings <= FLAGS.zero_threshold, axis=1) 259 | print("Row sparsity") 260 | print("\tmean", np.mean(row_sparsity), "std", np.std(row_sparsity)) 261 | 262 | col_sparsity = np.mean(small_embeddings <= FLAGS.zero_threshold, axis=0) 263 | print("Column sparsity") 264 | print("\tmean", np.mean(col_sparsity), "std", np.std(col_sparsity)) 265 | 266 | # print("NMI %f" % nmi) 267 | 268 | if summary_writer is not None: 269 | summary = tf.Summary( 270 | value=[tf.Summary.Value(tag='sparsity/small', simple_value=mean_nnz), 271 | tf.Summary.Value(tag='sparsity/flops', simple_value=flops), 272 | tf.Summary.Value(tag='sparsity/ratio', simple_value=flops_ratio)] + 273 | [tf.Summary.Value(tag='accuracy/precision_%d' % (ks[i]), simple_value=precisions[i]) \ 274 | for i in range(len(ks))] + 275 | [tf.Summary.Value(tag='accuracy/precision_%d_bin' % (ks[i]), simple_value=bin_precisions[i]) \ 276 | for i in range(len(ks))] 277 | ) 278 | 279 | summary_writer.add_summary(summary, global_step=global_step) 280 | 281 | 282 | def main(_): 283 | global MAX_DISTRACTORS 284 | global MAX_QUERY 285 | 286 | config = tf.estimator.RunConfig( 287 | model_dir=FLAGS.model_dir, 288 | keep_checkpoint_every_n_hours=1, 289 | save_summary_steps=50, 290 | save_checkpoints_secs=60 * 20, 291 | ) 292 | estimator = tf.estimator.Estimator( 293 | model_fn=model.model_fn, 294 | config=config 295 | ) 296 | 297 | if not 
FLAGS.evaluate_all: 298 | global_step = get_global_step() 299 | MAX_DISTRACTORS = 1e10 300 | MAX_QUERY = 10000 301 | compute_metrics(estimator, global_step, False, None) 302 | 303 | else: 304 | logdir = os.path.join(FLAGS.model_dir, 'eval') 305 | print('Writing to', logdir) 306 | last_eval_time = time.time() 307 | with tf.summary.FileWriter( 308 | logdir=logdir, 309 | flush_secs=30) as writer: 310 | while True: 311 | global_steps = get_sorted_global_steps() 312 | last_eval_step = get_last_eval_step() 313 | global_steps = [step for step in global_steps if step > last_eval_step] 314 | if len(global_steps) == 0: 315 | curr_time = time.time() 316 | if curr_time - last_eval_time >= 10 * 60: 317 | print("Nothing else to evaluate. Exiting") 318 | break 319 | else: 320 | print("Waiting for 2 minutes") 321 | time.sleep(2 * 60) 322 | continue 323 | 324 | evaluate_next = global_steps[0] 325 | print("Models to be evaluated", global_steps) 326 | print("Attempting to evaluate now, Step %d" % evaluate_next) 327 | try: 328 | compute_metrics(estimator, evaluate_next, False, writer) # is_training=False; writer is the summary_writer 329 | save_last_eval_step(evaluate_next) 330 | last_eval_time = time.time() 331 | except Exception as e: 332 | print("*" * 5, "Failed to evaluate Step %d" % evaluate_next) 333 | traceback.print_exc() 334 | 335 | 336 | if __name__ == '__main__': 337 | absl.app.run(main) 338 | -------------------------------------------------------------------------------- /flags.py: -------------------------------------------------------------------------------- 1 | import absl 2 | 3 | # gb_limit = int(3e11 / 8) # 300 gb divided into 8 processes 4 | # resource.setrlimit(resource.RLIMIT_AS, (gb_limit, gb_limit)) 5 | 6 | # Define all necessary hyperparameters as flags 7 | flags = absl.flags 8 | 9 | # The same flag is used for the various kinds of l1 parameters 10 | # used in the different l1 weighting schemes 11 | flags.DEFINE_float(name='l1_parameter', 12 | help='The value of the l1_parameter', 13 | default=0.0) 14 | flags.DEFINE_float(name='zero_threshold', 15 | help='Threshold below which values will be considered zero', 16 | default=1e-8) 17 | flags.DEFINE_float(name='learning_rate', help='Learning rate', default=1e-2) 18 | flags.DEFINE_float(name='momentum', 19 | help='Momentum value if using momentum optimizer', 20 | default=0.7) 21 | flags.DEFINE_float(name='l1_p_steps', 22 | help='Number of steps to reach maximum weight', 23 | default=5e4) 24 | 25 | flags.DEFINE_integer(name='batch_size', help='Batch size', default=64) 26 | # flags.DEFINE_integer(name='num_epochs', help='Number of training epochs', default=10000) 27 | # Number of classes for Megaface should be 604854 + 1 28 | flags.DEFINE_integer(name='num_classes', help='Number of training classes', default=1001) 29 | flags.DEFINE_integer(name='embedding_size', 30 | help='Embedding size', 31 | default=1024) 32 | flags.DEFINE_integer(name='test_size', help='Test size', default=2000) 33 | flags.DEFINE_integer(name='decay_step', help='Number of steps after which to decay lr by 10', default=None) 34 | flags.DEFINE_integer(name='max_steps', help='Number of steps to train', default=1000000) 35 | 36 | flags.DEFINE_string(name='mobilenet_checkpoint_path', 37 | help='Path to the checkpoint file for MobileNetV2', 38 | default=None) 39 | flags.DEFINE_string(name='model_dir', 40 | help='Path to the model directory for checkpoints and summaries', 41 | default='checkpoints/sparse_mobilenet/') 42 | flags.DEFINE_string(name='data_dir', 43 | help='Directory containing the training and validation data', 
default='../imagenet/tfrecords') 45 | flags.DEFINE_string(name='lfw_dir', 46 | help='Directory containing the LFW data', 47 | default='../msceleb1m') 48 | flags.DEFINE_string(name='optimizer', 49 | help="The optimizer to use. Options are 'sgd', 'mom' and 'adam'", 50 | default='sgd') 51 | flags.DEFINE_string(name='model', 52 | help='Options are mobilenet, mobilefacenet, metric_learning', 53 | default='mobilenet') 54 | # Flag for the l1 weighting scheme to be used. 55 | # Constant: constant l1 weight equal to l1_parameter 56 | # Dynamic_1: l1_parameter / EMA(l1_norm) 57 | # Dynamic_2: l1_weight += \lambda * (c - cl_loss) 58 | flags.DEFINE_string(name='l1_weighing_scheme', 59 | help='constant, dynamic_1, dynamic_2, dynamic_3', 60 | default=None) 61 | flags.DEFINE_string(name='sparsity_type', 62 | help='Options are l1_norm, flops_sur', 63 | default=None) 64 | flags.DEFINE_string(name='final_activation', 65 | help='The activation to use in the final layer producing the embeddings', 66 | default=None) 67 | flags.DEFINE_string(name='megaface_dir', 68 | help='Root directory of the megaface tfrecords', 69 | default='../megaface_distractors/tfrecords_official/') 70 | flags.DEFINE_string(name='facescrub_dir', 71 | help='Root directory of the facescrub tfrecords', 72 | default='../facescrub/tfrecords_official/') 73 | # flags.DEFINE_string(name='megaface_list', 74 | # help='List of megaface images', 75 | # default='../megaface_distractors/templatelists/megaface_features_list.json_10000_1') 76 | # flags.DEFINE_string(name='facescrub_list', 77 | # help='List of facescrub images', 78 | # default='../megaface_distractors/templatelists/facescrub_features_list.json') 79 | 80 | 81 | flags.DEFINE_boolean(name='restore_last_layer', 82 | help='If True, the last layer will be restored when warm starting from ' 83 | 'checkpoint. 
If checkpoint has a different number of output classes, ' 84 | 'then set to False.', 85 | default=True) 86 | flags.DEFINE_boolean(name='evaluate', 87 | help='If true then evaluates model while training', 88 | default=False) 89 | 90 | # Used by the evaluation scripts 91 | flags.DEFINE_boolean(name='debug', 92 | help='If True then inference will be stopped at 10000 steps', 93 | default=False) 94 | -------------------------------------------------------------------------------- /get_embeddings_reranking.py: -------------------------------------------------------------------------------- 1 | import tensorflow as tf 2 | import model 3 | import pprint 4 | import time 5 | from PIL import Image 6 | import absl 7 | import sys 8 | import os 9 | import numpy as np 10 | import horovod.tensorflow as hvd 11 | from tensorflow.core.framework.summary_pb2 import Summary 12 | from tensorflow import logging 13 | import shutil 14 | import h5py 15 | import json 16 | 17 | # Define all necessary hyperparameters as flags 18 | flags = absl.flags 19 | 20 | flags.DEFINE_float(name='zero_threshold', 21 | help='Threshold below which values will be considered zero', 22 | default=1e-8) 23 | flags.DEFINE_integer(name='batch_size', help='Batch size', default=64) 24 | flags.DEFINE_integer(name='num_classes', help='Number of classes', default=9001) 25 | flags.DEFINE_integer(name='num_epochs', help='Number of epochs', default=1) 26 | flags.DEFINE_integer(name='embedding_size', 27 | help='Embedding size', 28 | default=1024) 29 | 30 | flags.DEFINE_string(name='model_dir', 31 | help='Path to the model directory for checkpoints and summaries', 32 | default=None) 33 | flags.DEFINE_string(name='data_dir', 34 | help='Directory containing the validation data', 35 | default=None) 36 | flags.DEFINE_string(name='model', 37 | help='Options are mobilenet and mobilefacenet', 38 | default=None) 39 | flags.DEFINE_string(name='output', 40 | help='The output filename', 41 | default=None) 42 | flags.DEFINE_string(name='final_activation', 43 | help='The activation to use in the final layer producing the embeddings', 44 | default=None) 45 | flags.DEFINE_string(name='filelist', 46 | help='The list of evaluation files', 47 | default=None) 48 | flags.DEFINE_string(name='prefix', 49 | help='appended prefix for the hdf5 file', 50 | default=None) 51 | 52 | flags.DEFINE_boolean(name='debug', 53 | help='If True, inference will be stopped at 10000 steps', 54 | default=True) 55 | flags.DEFINE_boolean(name='is_training', 56 | help='If True, embeddings for the training data will be computed', 57 | default=False) 58 | 59 | 60 | FLAGS = flags.FLAGS 61 | 62 | 63 | def replace_extension(filename): 64 | pos = filename.rfind(b'.') 65 | if pos != -1: 66 | filename = filename[:pos+1] 67 | filename = filename + b'png' 68 | return filename 69 | 70 | def main(_): 71 | with open(FLAGS.filelist, 'r') as fin: 72 | filelist = json.load(fin) 73 | filelist = filelist['path'] 74 | filelist = [np.string_(os.path.basename(filename)) for filename in filelist] 75 | if FLAGS.prefix == 'facescrub': 76 | filelist = [filename.replace(b" ", b"_") for filename in filelist] 77 | filelist = [replace_extension(filename) for filename in filelist] 78 | 79 | file_dict = {filename:i for i, filename in enumerate(filelist)} 80 | num_samples = len(filelist) 81 | 82 | config = tf.estimator.RunConfig( 83 | model_dir=FLAGS.model_dir, 84 | keep_checkpoint_every_n_hours=1, 85 | save_summary_steps=50, 86 | save_checkpoints_secs=60 * 20, 87 | ) 88 | estimator = tf.estimator.Estimator( 89 | 
model_fn=model.model_fn, 90 | config=config 91 | ) 92 | 93 | predictions = estimator.predict( 94 | input_fn=lambda: model.imagenet_iterator(is_training=FLAGS.is_training), 95 | yield_single_examples=True, 96 | predict_keys=["true_labels", "true_label_texts", "small_embeddings", "filename"] 97 | ) 98 | 99 | if not os.path.exists('embeddings_reranking'): 100 | os.mkdir('embeddings_reranking') 101 | 102 | if FLAGS.output is None: 103 | output_file = FLAGS.prefix + "_" + os.path.basename(FLAGS.model_dir.rstrip('/')) + '.hdf5' 104 | else: 105 | output_file = FLAGS.output 106 | output_file = os.path.join('embeddings_reranking', output_file) 107 | if os.path.exists(output_file): 108 | print('%s exists. Exiting' % output_file) 109 | sys.exit() 110 | 111 | print("*" * 10, "Writing to", output_file) 112 | 113 | sparsity = [] 114 | embeddings = [] 115 | labels = [] 116 | idx = -np.ones(num_samples, dtype=np.int32) 117 | 118 | for step, prediction in enumerate(predictions): 119 | embedding = prediction["small_embeddings"] 120 | true_label = prediction["true_labels"][0] 121 | true_label_text = np.string_(prediction["true_label_texts"]) 122 | filename = np.string_(prediction["filename"]) 123 | 124 | idxs = np.where(np.abs(embedding) >= FLAGS.zero_threshold)[0] 125 | sparsity.append(len(idxs)) 126 | 127 | embeddings.append(embedding) 128 | labels.append(true_label) 129 | 130 | id = file_dict[filename] 131 | assert(idx[id] == -1) 132 | idx[id] = step 133 | 134 | if step % 10000 == 0: 135 | print('Step %d Av. Sparsity %f' % (step, np.mean(sparsity))) 136 | sys.stdout.flush() 137 | if FLAGS.debug and step >= 10000: 138 | break 139 | 140 | embeddings = np.asarray(embeddings) 141 | embeddings = embeddings[idx] 142 | print("embeddings shape", embeddings.shape) 143 | 144 | labels = np.asarray(labels) 145 | labels = labels[idx] 146 | print("labels shape", labels.shape) 147 | 148 | 149 | h5file = h5py.File(output_file, mode='w') 150 | h5file.create_dataset(name='embedding', data=embeddings) 151 | h5file.create_dataset(name='label', data=labels) 152 | h5file.close() 153 | print("Done writing to", output_file) 154 | 155 | 156 | if __name__ == '__main__': 157 | absl.app.run(main) 158 | -------------------------------------------------------------------------------- /imagenet_preprocessing.py: -------------------------------------------------------------------------------- 1 | # Copyright 2016 The TensorFlow Authors. All Rights Reserved. 2 | # 3 | # Licensed under the Apache License, Version 2.0 (the "License"); 4 | # you may not use this file except in compliance with the License. 5 | # You may obtain a copy of the License at 6 | # 7 | # http://www.apache.org/licenses/LICENSE-2.0 8 | # 9 | # Unless required by applicable law or agreed to in writing, software 10 | # distributed under the License is distributed on an "AS IS" BASIS, 11 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 12 | # See the License for the specific language governing permissions and 13 | # limitations under the License. 14 | # ============================================================================== 15 | """Provides utilities to preprocess images. 16 | 17 | Training images are sampled using the provided bounding boxes, and subsequently 18 | cropped to the sampled bounding box. Images are additionally flipped randomly, 19 | then resized to the target output size (without aspect-ratio preservation). 20 | 21 | Images used during evaluation are resized (with aspect-ratio preservation) and 22 | centrally cropped. 
23 | 24 | All images undergo mean color subtraction. 25 | 26 | Note that these steps are colloquially referred to as "ResNet preprocessing," 27 | and they differ from "VGG preprocessing," which does not use bounding boxes 28 | and instead does an aspect-preserving resize followed by random crop during 29 | training. (These both differ from "Inception preprocessing," which introduces 30 | color distortion steps.) 31 | 32 | """ 33 | 34 | from __future__ import absolute_import 35 | from __future__ import division 36 | from __future__ import print_function 37 | 38 | import tensorflow as tf 39 | 40 | _R_MEAN = 123.68 41 | _G_MEAN = 116.78 42 | _B_MEAN = 103.94 43 | _CHANNEL_MEANS = [_R_MEAN, _G_MEAN, _B_MEAN] 44 | 45 | # The lower bound for the smallest side of the image for aspect-preserving 46 | # resizing. For example, if an image is 500 x 1000, it will be resized to 47 | # _RESIZE_MIN x (_RESIZE_MIN * 2). 48 | _RESIZE_MIN = 256 49 | _CROP_SIZE = 227 50 | _MIN_OBJECT_COVERED = 0.1 51 | _ASPECT_RATIO_RANGE = [0.75, 1.33] 52 | _AREA_RANGE = [0.05, 1.0] 53 | 54 | 55 | def _decode_crop_and_flip(image_buffer, bbox, num_channels): 56 | """Crops the given image to a random part of the image, and randomly flips. 57 | 58 | We use the fused decode_and_crop op, which performs better than the two ops 59 | used separately in series, but note that this requires that the image be 60 | passed in as an un-decoded string Tensor. 61 | 62 | Args: 63 | image_buffer: scalar string Tensor representing the raw JPEG image buffer. 64 | bbox: 3-D float Tensor of bounding boxes arranged [1, num_boxes, coords] 65 | where each coordinate is [0, 1) and the coordinates are arranged as 66 | [ymin, xmin, ymax, xmax]. 67 | num_channels: Integer depth of the image buffer for decoding. 68 | 69 | Returns: 70 | 3-D tensor with cropped image. 71 | 72 | """ 73 | # A large fraction of image datasets contain a human-annotated bounding box 74 | # delineating the region of the image containing the object of interest. We 75 | # choose to create a new bounding box for the object which is a randomly 76 | # distorted version of the human-annotated bounding box that obeys an 77 | # allowed range of aspect ratios, sizes and overlap with the human-annotated 78 | # bounding box. If no box is supplied, then we assume the bounding box is 79 | # the entire image. 80 | sample_distorted_bounding_box = tf.image.sample_distorted_bounding_box( 81 | tf.image.extract_jpeg_shape(image_buffer), 82 | bounding_boxes=bbox, 83 | min_object_covered=_MIN_OBJECT_COVERED, 84 | aspect_ratio_range=_ASPECT_RATIO_RANGE, 85 | area_range=_AREA_RANGE, 86 | max_attempts=100, 87 | use_image_if_no_bounding_boxes=True) 88 | bbox_begin, bbox_size, _ = sample_distorted_bounding_box 89 | 90 | # Reassemble the bounding box in the format the crop op requires. 91 | offset_y, offset_x, _ = tf.unstack(bbox_begin) 92 | target_height, target_width, _ = tf.unstack(bbox_size) 93 | crop_window = tf.stack([offset_y, offset_x, target_height, target_width]) 94 | 95 | # Use the fused decode and crop op here, which is faster than each in series. 96 | cropped = tf.image.decode_and_crop_jpeg( 97 | image_buffer, crop_window, channels=num_channels) 98 | 99 | # Flip to add a little more random distortion in. 100 | cropped = tf.image.random_flip_left_right(cropped) 101 | return cropped 102 | 103 | 104 | def _central_crop(image, crop_height, crop_width): 105 | """Performs central crops of the given image list. 
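Despite the name, this operates on a single 3-D image tensor; the crop
window is centered by splitting the excess height and width evenly
between the two sides of each spatial dimension.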
106 | 107 | Args: 108 | image: a 3-D image tensor 109 | crop_height: the height of the image following the crop. 110 | crop_width: the width of the image following the crop. 111 | 112 | Returns: 113 | 3-D tensor with cropped image. 114 | """ 115 | shape = tf.shape(image) 116 | height, width = shape[0], shape[1] 117 | 118 | amount_to_be_cropped_h = (height - crop_height) 119 | crop_top = amount_to_be_cropped_h // 2 120 | amount_to_be_cropped_w = (width - crop_width) 121 | crop_left = amount_to_be_cropped_w // 2 122 | return tf.slice( 123 | image, [crop_top, crop_left, 0], [crop_height, crop_width, -1]) 124 | 125 | 126 | def _mean_image_subtraction(image, means, num_channels): 127 | """Subtracts the given means from each image channel. 128 | 129 | For example: 130 | means = [123.68, 116.779, 103.939] 131 | image = _mean_image_subtraction(image, means) 132 | 133 | Note that the rank of `image` must be known. 134 | 135 | Args: 136 | image: a tensor of size [height, width, C]. 137 | means: a C-vector of values to subtract from each channel. 138 | num_channels: number of color channels in the image that will be distorted. 139 | 140 | Returns: 141 | the centered image. 142 | 143 | Raises: 144 | ValueError: If the rank of `image` is unknown, if `image` has a rank other 145 | than three or if the number of channels in `image` doesn't match the 146 | number of values in `means`. 147 | """ 148 | if image.get_shape().ndims != 3: 149 | raise ValueError('Input must be of size [height, width, C>0]') 150 | 151 | if len(means) != num_channels: 152 | raise ValueError('len(means) must match the number of channels') 153 | 154 | # We have a 1-D tensor of means; convert to 3-D. 155 | means = tf.expand_dims(tf.expand_dims(means, 0), 0) 156 | 157 | return image - means 158 | 159 | 160 | def _smallest_size_at_least(height, width, resize_min): 161 | """Computes new shape with the smallest side equal to `smallest_side`. 162 | 163 | Computes new shape with the smallest side equal to `smallest_side` while 164 | preserving the original aspect ratio. 165 | 166 | Args: 167 | height: an int32 scalar tensor indicating the current height. 168 | width: an int32 scalar tensor indicating the current width. 169 | resize_min: A python integer or scalar `Tensor` indicating the size of 170 | the smallest side after resize. 171 | 172 | Returns: 173 | new_height: an int32 scalar tensor indicating the new height. 174 | new_width: an int32 scalar tensor indicating the new width. 175 | """ 176 | resize_min = tf.cast(resize_min, tf.float32) 177 | 178 | # Convert to floats to make subsequent calculations go smoothly. 179 | height, width = tf.cast(height, tf.float32), tf.cast(width, tf.float32) 180 | 181 | smaller_dim = tf.minimum(height, width) 182 | scale_ratio = resize_min / smaller_dim 183 | 184 | # Convert back to ints to make heights and widths that TF ops will accept. 185 | new_height = tf.cast(height * scale_ratio, tf.int32) 186 | new_width = tf.cast(width * scale_ratio, tf.int32) 187 | 188 | return new_height, new_width 189 | 190 | 191 | def _aspect_preserving_resize(image, resize_min): 192 | """Resize images preserving the original aspect ratio. 193 | 194 | Args: 195 | image: A 3-D image `Tensor`. 196 | resize_min: A python integer or scalar `Tensor` indicating the size of 197 | the smallest side after resize. 198 | 199 | Returns: 200 | resized_image: A 3-D tensor containing the resized image. 
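  For example, resize_min = 256 maps a 500 x 1000 image to 256 x 512.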
201 |   """
202 |   shape = tf.shape(image)
203 |   height, width = shape[0], shape[1]
204 | 
205 |   new_height, new_width = _smallest_size_at_least(height, width, resize_min)
206 | 
207 |   return _resize_image(image, new_height, new_width)
208 | 
209 | 
210 | def _resize_image(image, height, width):
211 |   """Simple wrapper around tf.image.resize_images.
212 | 
213 |   This is primarily to make sure we use the same `ResizeMethod` and other
214 |   details each time.
215 | 
216 |   Args:
217 |     image: A 3-D image `Tensor`.
218 |     height: The target height for the resized image.
219 |     width: The target width for the resized image.
220 | 
221 |   Returns:
222 |     resized_image: A 3-D tensor containing the resized image. The first two
223 |       dimensions have the shape [height, width].
224 |   """
225 |   return tf.image.resize_images(
226 |       image, [height, width], method=tf.image.ResizeMethod.BILINEAR,
227 |       align_corners=False)
228 | 
229 | 
230 | def preprocess_image(image_buffer, bbox, output_height, output_width,
231 |                      num_channels, is_training=False, fixed_crop=False):
232 |   """Preprocesses the given image.
233 | 
234 |   Preprocessing includes decoding, cropping, and resizing for both training
235 |   and eval images. Training preprocessing, however, introduces some random
236 |   distortion of the image to improve accuracy.
237 | 
238 |   Args:
239 |     image_buffer: scalar string Tensor representing the raw JPEG image buffer.
240 |     bbox: 3-D float Tensor of bounding boxes arranged [1, num_boxes, coords]
241 |       where each coordinate is [0, 1) and the coordinates are arranged as
242 |       [ymin, xmin, ymax, xmax].
243 |     output_height: The height of the image after preprocessing.
244 |     output_width: The width of the image after preprocessing.
245 |     num_channels: Integer depth of the image buffer for decoding.
246 |     is_training: `True` if we're preprocessing the image for training and
247 |       `False` otherwise.
    fixed_crop: If True, bypass the crop logic below and simply resize the
      decoded image to _RESIZE_MIN x _RESIZE_MIN.
248 | 
249 |   Returns:
250 |     A preprocessed image.
251 |   """
252 |   if fixed_crop:
253 |     image = tf.image.decode_jpeg(image_buffer, channels=num_channels)
254 |     image = tf.image.resize_images(image, [_RESIZE_MIN, _RESIZE_MIN])
255 |     # if is_training:
256 |     #   image = tf.image.random_flip_left_right(image)
257 |     #   image = tf.random_crop(image, size=[output_height, output_width, 3])
258 |     # else:
259 |     #   image = _central_crop(image, output_height, output_width)
260 |     image = tf.cast(image, tf.float32)
261 |   elif is_training:
262 |     # For training, we want to randomize some of the distortions.
263 |     image = _decode_crop_and_flip(image_buffer, bbox, num_channels)
264 |     image = _resize_image(image, output_height, output_width)
265 |   else:
266 |     # For validation, we want to decode, resize, then just crop the middle.
267 | image = tf.image.decode_jpeg(image_buffer, channels=num_channels) 268 | image = _aspect_preserving_resize(image, _RESIZE_MIN) 269 | image = _central_crop(image, output_height, output_width) 270 | 271 | image.set_shape([output_height, output_width, num_channels]) 272 | 273 | # return _mean_image_subtraction(image, _CHANNEL_MEANS, num_channels) 274 | # Return the rescaled images instead of centered images as above 275 | return image / 128.0 - 1.0 276 | -------------------------------------------------------------------------------- /input_pipeline.py: -------------------------------------------------------------------------------- 1 | """ 2 | The following code has been adapted from 3 | https://github.com/tensorflow/models/blob/master/official/resnet/imagenet_main.py 4 | """ 5 | from __future__ import absolute_import 6 | from __future__ import division 7 | from __future__ import print_function 8 | 9 | import os 10 | import glob 11 | 12 | from absl import app as absl_app 13 | from absl import flags 14 | import tensorflow as tf # pylint: disable=g-bad-import-order 15 | 16 | import imagenet_preprocessing 17 | 18 | _DEFAULT_IMAGE_SIZE = 224 19 | _NUM_CHANNELS = 3 20 | # _NUM_CLASSES = 1001 21 | 22 | # _NUM_IMAGES = { 23 | # 'train': 1281167, 24 | # 'validation': 50000, 25 | # } 26 | 27 | _NUM_TRAIN_FILES = 1024 28 | _SHUFFLE_BUFFER_SIZE = 1000 29 | 30 | # DATASET_NAME = 'ImageNet' 31 | 32 | ############################################################################### 33 | # Data processing 34 | ############################################################################### 35 | def get_filenames(is_training, data_dir): 36 | """Return filenames for dataset.""" 37 | if is_training: 38 | # return [ 39 | # os.path.join(data_dir, 'train-%05d-of-01024' % i) 40 | # for i in range(_NUM_TRAIN_FILES)] 41 | return glob.glob(os.path.join(data_dir, 'train-*')) 42 | else: 43 | # return [ 44 | # os.path.join(data_dir, 'validation-%05d-of-00128' % i) 45 | # for i in range(128)] 46 | return glob.glob(os.path.join(data_dir, 'validation-*')) 47 | 48 | 49 | def _parse_example_proto(example_serialized): 50 | """Parses an Example proto containing a training example of an image. 51 | 52 | The output of the build_image_data.py image preprocessing script is a dataset 53 | containing serialized Example protocol buffers. Each Example proto contains 54 | the following fields (values are included as examples): 55 | 56 | image/height: 462 57 | image/width: 581 58 | image/colorspace: 'RGB' 59 | image/channels: 3 60 | image/class/label: 615 61 | image/class/synset: 'n03623198' 62 | image/class/text: 'knee pad' 63 | image/object/bbox/xmin: 0.1 64 | image/object/bbox/xmax: 0.9 65 | image/object/bbox/ymin: 0.2 66 | image/object/bbox/ymax: 0.6 67 | image/object/bbox/label: 615 68 | image/format: 'JPEG' 69 | image/filename: 'ILSVRC2012_val_00041207.JPEG' 70 | image/encoded: 71 | 72 | Args: 73 | example_serialized: scalar Tensor tf.string containing a serialized 74 | Example protocol buffer. 75 | 76 | Returns: 77 | image_buffer: Tensor tf.string containing the contents of a JPEG file. 78 | label: Tensor tf.int32 containing the label. 79 | bbox: 3-D float Tensor of bounding boxes arranged [1, num_boxes, coords] 80 | where each coordinate is [0, 1) and the coordinates are arranged as 81 | [ymin, xmin, ymax, xmax]. 82 | """ 83 | # Dense features in Example proto. 
84 | feature_map = { 85 | 'image/encoded': tf.FixedLenFeature([], dtype=tf.string, 86 | default_value=''), 87 | 'image/class/label': tf.FixedLenFeature([1], dtype=tf.int64, 88 | default_value=-1), 89 | 'image/class/text': tf.FixedLenFeature([], dtype=tf.string, 90 | default_value=''), 91 | 'image/filename': tf.FixedLenFeature([], dtype=tf.string, 92 | default_value=''), 93 | } 94 | sparse_float32 = tf.VarLenFeature(dtype=tf.float32) 95 | # Sparse features in Example proto. 96 | feature_map.update( 97 | {k: sparse_float32 for k in ['image/object/bbox/xmin', 98 | 'image/object/bbox/ymin', 99 | 'image/object/bbox/xmax', 100 | 'image/object/bbox/ymax']}) 101 | 102 | features = tf.parse_single_example(example_serialized, feature_map) 103 | label = tf.cast(features['image/class/label'], dtype=tf.int32) 104 | label_text = features['image/class/text'] 105 | filename = features['image/filename'] 106 | 107 | xmin = tf.expand_dims(features['image/object/bbox/xmin'].values, 0) 108 | ymin = tf.expand_dims(features['image/object/bbox/ymin'].values, 0) 109 | xmax = tf.expand_dims(features['image/object/bbox/xmax'].values, 0) 110 | ymax = tf.expand_dims(features['image/object/bbox/ymax'].values, 0) 111 | 112 | # Note that we impose an ordering of (y, x) just to make life difficult. 113 | bbox = tf.concat([ymin, xmin, ymax, xmax], 0) 114 | 115 | # Force the variable number of bounding boxes into the shape 116 | # [1, num_boxes, coords]. 117 | bbox = tf.expand_dims(bbox, 0) 118 | bbox = tf.transpose(bbox, [0, 2, 1]) 119 | 120 | return features['image/encoded'], label, bbox, label_text, filename 121 | 122 | 123 | def parse_record(raw_record, is_training, return_label_text=True, fixed_crop=False): 124 | """Parses a record containing a training example of an image. 125 | 126 | The input record is parsed into a label and image, and the image is passed 127 | through preprocessing steps (cropping, flipping, and so on). 128 | 129 | Args: 130 | raw_record: scalar Tensor tf.string containing a serialized 131 | Example protocol buffer. 132 | is_training: A boolean denoting whether the input is for training. 133 | 134 | Returns: 135 | Tuple with processed image tensor and one-hot-encoded label tensor. 136 | """ 137 | image_buffer, label, bbox, label_text, filename = _parse_example_proto(raw_record) 138 | 139 | image = imagenet_preprocessing.preprocess_image( 140 | image_buffer=image_buffer, 141 | bbox=bbox, 142 | output_height=_DEFAULT_IMAGE_SIZE, 143 | output_width=_DEFAULT_IMAGE_SIZE, 144 | num_channels=_NUM_CHANNELS, 145 | is_training=is_training, 146 | fixed_crop=fixed_crop) 147 | 148 | if return_label_text: 149 | return image, label, label_text, filename 150 | else: 151 | return image, label 152 | 153 | 154 | def process_record_dataset(dataset, is_training, batch_size, shuffle_buffer, 155 | parse_record_fn, num_epochs=1, return_label_text=True, 156 | filter_fn=None, fixed_crop=False): 157 | """Given a Dataset with raw records, return an iterator over the records. 158 | 159 | Args: 160 | dataset: A Dataset representing raw records 161 | is_training: A boolean denoting whether the input is for training. 162 | batch_size: The number of samples per batch. 163 | shuffle_buffer: The buffer size to use when shuffling records. A larger 164 | value results in better randomness, but smaller values reduce startup 165 | time and use less memory. 166 | parse_record_fn: A function that takes a raw record and returns the 167 | corresponding (image, label) pair. 168 | num_epochs: The number of epochs to repeat the dataset. 
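    return_label_text: If True, the label text and filename are returned with
      each example (see parse_record above).
    filter_fn: Optional predicate passed to dataset.filter for dropping records.
    fixed_crop: Forwarded to the image preprocessing step; see
      imagenet_preprocessing.preprocess_image.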
169 | 
170 |   Returns:
171 |     Dataset of (image, label) pairs ready for iteration.
172 |   """
173 | 
174 |   # We prefetch a batch at a time. This can help smooth out the time taken to
175 |   # load input files as we go through shuffling and processing.
176 |   dataset = dataset.prefetch(buffer_size=batch_size)
177 |   dataset = dataset.map(
178 |       lambda value: parse_record_fn(value, is_training, return_label_text, fixed_crop))
179 |   if filter_fn is not None:
180 |     dataset = dataset.filter(filter_fn)
181 | 
182 |   # Shuffle the records. Note that we shuffle before repeating to ensure
183 |   # that the shuffling respects epoch boundaries.
184 |   dataset = dataset.shuffle(buffer_size=shuffle_buffer)
185 | 
186 |   # If we are training over multiple epochs before evaluating, repeat the
187 |   # dataset for the appropriate number of epochs.
188 |   dataset = dataset.repeat(num_epochs)
189 |   dataset = dataset.batch(batch_size)
190 | 
191 |   # Operations between the final prefetch and the get_next call to the iterator
192 |   # will happen synchronously during run time. We prefetch here again to
193 |   # background all of the above processing work and keep it out of the
194 |   # critical training path. Setting buffer_size to tf.contrib.data.AUTOTUNE
195 |   # allows DistributionStrategies to adjust how many batches to fetch based
196 |   # on how many devices are present.
197 |   dataset = dataset.prefetch(buffer_size=tf.contrib.data.AUTOTUNE)
198 | 
199 |   return dataset
200 | 
201 | 
202 | def input_fn(is_training, data_dir, batch_size, num_epochs=1,
203 |              return_label_text=True, filter_fn=None, fixed_crop=False):
204 |   """Input function which provides batches for train or eval.
205 | 
206 |   Args:
207 |     is_training: A boolean denoting whether the input is for training.
208 |     data_dir: The directory containing the input data.
209 |     batch_size: The number of samples per batch.
210 |     num_epochs: The number of epochs to repeat the dataset.
211 | 
212 |   Returns:
213 |     A Dataset object that can be used for iteration over the input data.
214 |   """
215 |   filenames = get_filenames(is_training, data_dir)
216 |   if len(filenames) == 0:
217 |     raise ValueError("No files found in %s" % data_dir)
218 | 
219 |   dataset = tf.data.Dataset.from_tensor_slices(filenames)
220 | 
221 |   # if is_training:
222 |   # Shuffle the input files
223 |   dataset = dataset.shuffle(buffer_size=_NUM_TRAIN_FILES)
224 | 
225 |   # Convert to individual records.
226 |   # cycle_length = 30 means 30 files will be read and deserialized in parallel.
227 |   # This number is low enough to not cause too much contention on small systems
228 |   # but high enough to provide the benefits of parallelization. You may want
229 |   # to increase this number if you have a large number of CPU cores.
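  # NOTE: tf.contrib.data.parallel_interleave was moved to tf.data.experimental
  # in later TF 1.x releases; on TF 2.x, Dataset.interleave(...,
  # num_parallel_calls=...) is the closest replacement.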
230 | dataset = dataset.apply(tf.contrib.data.parallel_interleave( 231 | tf.data.TFRecordDataset, cycle_length=30)) 232 | 233 | return process_record_dataset( 234 | dataset, is_training, batch_size, _SHUFFLE_BUFFER_SIZE, parse_record, 235 | num_epochs, return_label_text, filter_fn, fixed_crop 236 | ) 237 | -------------------------------------------------------------------------------- /megaface_embeddings.py: -------------------------------------------------------------------------------- 1 | from __future__ import absolute_import 2 | from __future__ import division 3 | from __future__ import print_function 4 | 5 | import tensorflow as tf 6 | import model 7 | import pprint 8 | import time 9 | from PIL import Image 10 | import absl 11 | import sys 12 | import os 13 | import numpy as np 14 | import horovod.tensorflow as hvd 15 | from tensorflow.core.framework.summary_pb2 import Summary 16 | from tensorflow import logging 17 | import shutil 18 | import glob 19 | from sklearn.model_selection import KFold 20 | import traceback 21 | import imagenet_preprocessing 22 | import pickle 23 | import matio 24 | import json 25 | import multiprocessing 26 | import tqdm 27 | 28 | 29 | flags = absl.flags 30 | flags.DEFINE_float(name='zero_threshold', 31 | help='Threshold below which values will be considered zero', 32 | default=1e-8) 33 | 34 | flags.DEFINE_integer(name='batch_size', help='Batch size', default=64) 35 | flags.DEFINE_integer(name='num_classes', help='Number of classes', default=9001) 36 | flags.DEFINE_integer(name='num_epochs', help='Number of epochs', default=1) 37 | flags.DEFINE_integer(name='embedding_size', 38 | help='Embedding size', 39 | default=1024) 40 | flags.DEFINE_integer(name='num_threads_preprocess', 41 | help='Number of threads for tensorflow preprocess pipeline', 42 | default=50) 43 | flags.DEFINE_integer(name='num_threads_save', 44 | help='Number of threads for saving mat files', 45 | default=50) 46 | flags.DEFINE_integer(name='step', 47 | help='The global step of the checkpoint to load', 48 | default=None) 49 | 50 | flags.DEFINE_string(name='model_dir', 51 | help='Path to the model directory for checkpoints and summaries', 52 | default='checkpoints/kubernetes/sparse_imagenet10k_dynamic_2_mom/') 53 | flags.DEFINE_string(name='data_dir', 54 | help='Directory containing the validation data', 55 | default='../imagenet10k/tfrecords/val') 56 | flags.DEFINE_string(name='filelist', 57 | help='JSON file containing the list of files', 58 | default=None) 59 | flags.DEFINE_string(name='model', 60 | help='Options are mobilenet and mobilefacenet', 61 | default='mobilenet') 62 | flags.DEFINE_string(name='output_suffix', 63 | help='The output filename', 64 | default='embeddings') 65 | flags.DEFINE_string(name='final_activation', 66 | help='The activation to use in the final layer producing the embeddings', 67 | default=None) 68 | 69 | flags.DEFINE_boolean(name='debug', 70 | help='If True then inference will be stopped at 10000 steps', 71 | default=True) 72 | flags.DEFINE_boolean(name='is_training', 73 | help='If True then embeddings for the training data will be computed', 74 | default=False) 75 | 76 | DENSE_DIR = "checkpoints/mobilenet_retrained/" 77 | FLAGS = flags.FLAGS 78 | 79 | def input_fn(filelist): 80 | dummy_labels = tf.constant([1] * len(filelist)) 81 | filelist = tf.constant(filelist) 82 | dataset = tf.data.Dataset.from_tensor_slices((filelist, dummy_labels)) 83 | 84 | def _parse_function(filename, label): 85 | image_string = tf.read_file(filename) 86 | image = 
tf.image.decode_jpeg(image_string, channels=3) 87 | image = tf.image.resize_images(image, [112, 112]) 88 | image = tf.cast(image, dtype=tf.float32) 89 | image = image / 128.0 - 1.0 90 | return image, filename, label 91 | 92 | dataset = dataset.map(_parse_function, 93 | num_parallel_calls=FLAGS.num_threads_preprocess) 94 | dataset = dataset.prefetch(buffer_size=tf.contrib.data.AUTOTUNE) 95 | dataset = dataset.batch(FLAGS.batch_size) 96 | dataset = dataset.prefetch(buffer_size=tf.contrib.data.AUTOTUNE) 97 | 98 | iterator = dataset.make_one_shot_iterator() 99 | image, filename, label = iterator.get_next() 100 | feature_dict = { 101 | "features": image, 102 | "labels": label, 103 | "label_texts": label, 104 | "filename": filename 105 | } 106 | 107 | return feature_dict, None 108 | 109 | 110 | def get_global_step(ckpt_path=None): 111 | if ckpt_path is None: 112 | ckpt_path = tf.train.latest_checkpoint(FLAGS.model_dir) 113 | base = os.path.basename(ckpt_path) 114 | global_step = int(base.split('-')[1]) 115 | return global_step 116 | 117 | 118 | def main(_): 119 | filelist = json.load(open(FLAGS.filelist, 'r'))["path"] 120 | filelist = [os.path.join(FLAGS.data_dir, f) for f in filelist] 121 | 122 | config = tf.ConfigProto() 123 | config.gpu_options.allow_growth = True 124 | config = tf.estimator.RunConfig( 125 | model_dir=FLAGS.model_dir, 126 | session_config=config 127 | ) 128 | 129 | estimator = tf.estimator.Estimator( 130 | model_fn=model.model_fn, 131 | config=config 132 | ) 133 | 134 | predictions = estimator.predict( 135 | input_fn=lambda: input_fn(filelist), 136 | yield_single_examples=True, 137 | predict_keys=["small_embeddings", "filename"], 138 | checkpoint_path=\ 139 | tf.train.latest_checkpoint(FLAGS.model_dir) 140 | if FLAGS.step is None 141 | else 'model.ckpt-%d' % FLAGS.step 142 | ) 143 | 144 | P = multiprocessing.Pool(FLAGS.num_threads_save) 145 | step = 0 146 | avg = 0.0 147 | 148 | for prediction in tqdm.tqdm(predictions): 149 | embedding = prediction["small_embeddings"] 150 | filename = prediction["filename"] 151 | filename = filename.decode() + FLAGS.output_suffix 152 | P.apply_async(matio.save_mat, (filename, embedding)) 153 | step += 1 154 | avg = (1 - 1/step) * avg + np.sum(np.abs(embedding) >= 1e-8) / step 155 | if step % 10000 == 0: 156 | print('Step %d\tSparsity %f' % (step, avg)) 157 | 158 | print('Step %d\tSparsity %f' % (step, avg)) 159 | 160 | P.close() 161 | P.join() 162 | 163 | 164 | if __name__=='__main__': 165 | absl.app.run(main) -------------------------------------------------------------------------------- /megaface_embeddings_hdf.py: -------------------------------------------------------------------------------- 1 | from __future__ import absolute_import 2 | from __future__ import division 3 | from __future__ import print_function 4 | 5 | import tensorflow as tf 6 | import model 7 | import pprint 8 | import time 9 | from PIL import Image 10 | import absl 11 | import sys 12 | import os 13 | import numpy as np 14 | import horovod.tensorflow as hvd 15 | from tensorflow.core.framework.summary_pb2 import Summary 16 | from tensorflow import logging 17 | import shutil 18 | import glob 19 | from sklearn.model_selection import KFold 20 | import traceback 21 | import imagenet_preprocessing 22 | import pickle 23 | import matio 24 | import json 25 | import multiprocessing 26 | import tqdm 27 | import h5py 28 | 29 | 30 | flags = absl.flags 31 | flags.DEFINE_float(name='zero_threshold', 32 | help='Threshold below which values will be considered zero', 33 | default=1e-8) 
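# NOTE: the sparsity bookkeeping in main() below hardcodes a 1e-8 cutoff
# instead of reading this flag; keep the two in sync if either is changed.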
34 | 35 | flags.DEFINE_integer(name='batch_size', help='Batch size', default=64) 36 | flags.DEFINE_integer(name='num_classes', help='Number of classes', default=9001) 37 | flags.DEFINE_integer(name='num_epochs', help='Number of epochs', default=1) 38 | flags.DEFINE_integer(name='embedding_size', 39 | help='Embedding size', 40 | default=1024) 41 | flags.DEFINE_integer(name='num_threads_preprocess', 42 | help='Number of threads for tensorflow preprocess pipeline', 43 | default=50) 44 | flags.DEFINE_integer(name='num_threads_save', 45 | help='Number of threads for saving mat files', 46 | default=50) 47 | 48 | flags.DEFINE_string(name='model_dir', 49 | help='Path to the model directory for checkpoints and summaries', 50 | default='checkpoints/kubernetes/sparse_imagenet10k_dynamic_2_mom/') 51 | flags.DEFINE_string(name='data_dir', 52 | help='Directory containing the validation data', 53 | default='../imagenet10k/tfrecords/val') 54 | flags.DEFINE_string(name='filelist', 55 | help='JSON file containing the list of files', 56 | default=None) 57 | flags.DEFINE_string(name='model', 58 | help='Options are mobilenet and mobilefacenet', 59 | default='mobilenet') 60 | flags.DEFINE_string(name='output_file', 61 | help='The output filename', 62 | default='embeddings') 63 | flags.DEFINE_string(name='final_activation', 64 | help='The activation to use in the final layer producing the embeddings', 65 | default=None) 66 | 67 | flags.DEFINE_boolean(name='debug', 68 | help='If True then inference will be stopped at 10000 steps', 69 | default=True) 70 | flags.DEFINE_boolean(name='is_training', 71 | help='If True then embeddings for the training data will be computed', 72 | default=False) 73 | 74 | DENSE_DIR = "checkpoints/mobilenet_retrained/" 75 | FLAGS = flags.FLAGS 76 | 77 | def input_fn(filelist): 78 | dummy_labels = tf.constant([1] * len(filelist)) 79 | filelist = tf.constant(filelist) 80 | dataset = tf.data.Dataset.from_tensor_slices((filelist, dummy_labels)) 81 | 82 | base_dir = tf.constant(FLAGS.data_dir + '/') 83 | 84 | def _parse_function(filename, label): 85 | image_string = tf.read_file(tf.strings.join([base_dir, filename])) 86 | image = tf.image.decode_jpeg(image_string, channels=3) 87 | image = tf.image.resize_images(image, [112, 112]) 88 | image = tf.cast(image, dtype=tf.float32) 89 | image = image / 128.0 - 1.0 90 | return image, filename, label 91 | 92 | dataset = dataset.map(_parse_function, 93 | num_parallel_calls=FLAGS.num_threads_preprocess) 94 | dataset = dataset.prefetch(buffer_size=tf.contrib.data.AUTOTUNE) 95 | dataset = dataset.batch(FLAGS.batch_size) 96 | dataset = dataset.prefetch(buffer_size=tf.contrib.data.AUTOTUNE) 97 | 98 | iterator = dataset.make_one_shot_iterator() 99 | image, filename, label = iterator.get_next() 100 | feature_dict = { 101 | "features": image, 102 | "labels": label, 103 | "label_texts": label, 104 | "filename": filename 105 | } 106 | 107 | return feature_dict, None 108 | 109 | 110 | def get_global_step(ckpt_path=None): 111 | if ckpt_path is None: 112 | ckpt_path = tf.train.latest_checkpoint(FLAGS.model_dir) 113 | base = os.path.basename(ckpt_path) 114 | global_step = int(base.split('-')[1]) 115 | return global_step 116 | 117 | 118 | def main(_): 119 | filelist = json.load(open(FLAGS.filelist, 'r'))["path"] 120 | 121 | config = tf.ConfigProto() 122 | config.gpu_options.allow_growth = True 123 | config = tf.estimator.RunConfig( 124 | model_dir=FLAGS.model_dir, 125 | session_config=config 126 | ) 127 | 128 | estimator = tf.estimator.Estimator( 129 | 
model_fn=model.model_fn, 130 | config=config 131 | ) 132 | 133 | predictions = estimator.predict( 134 | input_fn=lambda: input_fn(filelist), 135 | yield_single_examples=True, 136 | predict_keys=["small_embeddings", "filename"], 137 | checkpoint_path=tf.train.latest_checkpoint(FLAGS.model_dir) 138 | ) 139 | 140 | data = h5py.File(os.path.join('embeddings', FLAGS.output_file), mode='w') 141 | inttype = h5py.special_dtype(vlen=np.int32) 142 | floattype = h5py.special_dtype(vlen=np.float32) 143 | 144 | maxlen = len(filelist) + 1 145 | embedding_vals = data.create_dataset(name='embedding_vals', 146 | shape=(maxlen,), dtype=floattype) 147 | embedding_idx = data.create_dataset(name='embedding_idx', 148 | shape=(maxlen,), dtype=inttype) 149 | embedding_dense = data.create_dataset(name='embedding_dense', 150 | shape=(maxlen,), dtype=floattype) 151 | filenames = data.create_dataset(name='filenames', shape=(maxlen,), dtype='S50') 152 | tot_embeddings = data.create_dataset(name='tot_len', shape=(1,), dtype=np.int32) 153 | 154 | sparsity = [] 155 | 156 | step = 0 157 | avg = 0.0 158 | for prediction in tqdm.tqdm(predictions): 159 | embedding = prediction["small_embeddings"] 160 | filename = prediction["filename"] 161 | 162 | idxs = np.where(np.abs(embedding) >= 1e-8)[0] 163 | sparsity.append(len(idxs)) 164 | 165 | embedding_vals[step] = [embedding[idx] for idx in idxs] 166 | embedding_idx[step] = [idx for idx in idxs] 167 | embedding_dense[step] = [emb for emb in embedding] 168 | filenames[step] = filename 169 | step += 1 170 | tot_embeddings[0] = step 171 | avg = (1 - 1/step) * avg + len(idxs) / step 172 | if step % 10000 == 0: 173 | print('Step %d\tSparsity %f' % (step, avg)) 174 | 175 | print('Step %d\tSparsity %f' % (step, avg)) 176 | 177 | data.close() 178 | 179 | if __name__=='__main__': 180 | absl.app.run(main) -------------------------------------------------------------------------------- /mobilefacenet.py: -------------------------------------------------------------------------------- 1 | import tensorflow as tf 2 | from mobilenet import mobilenet_v2 3 | from mobilenet import conv_blocks as ops 4 | from mobilenet import mobilenet as lib 5 | import input_pipeline as input_pipeline 6 | import pprint 7 | import time 8 | from PIL import Image 9 | import imagenet_preprocessing 10 | import absl 11 | from argparse import Namespace 12 | import sys 13 | import os 14 | import numpy as np 15 | import horovod.tensorflow as hvd 16 | from tensorflow.core.framework.summary_pb2 import Summary 17 | from tensorflow import logging 18 | import math 19 | from resnet import resnet_model 20 | 21 | # Define all necessary hyperparameters as flags 22 | flags = absl.flags 23 | FLAGS = flags.FLAGS 24 | 25 | slim = tf.contrib.slim 26 | op = lib.op 27 | 28 | expand_input = ops.expand_input_by_factor 29 | 30 | 31 | # The Arcface loss function is taken from 32 | # https://github.com/auroua/InsightFace_TF/blob/master/losses/face_losses.py 33 | def arcface_logits(embedding, out_num): 34 | with tf.variable_scope('Arcface'): 35 | weights = tf.get_variable(name='embedding_weights', shape=(embedding.shape[-1], out_num), 36 | initializer=tf.random_normal_initializer(stddev=0.001), 37 | dtype=tf.float32) 38 | weights = tf.nn.l2_normalize(weights, axis=0) 39 | # assign_op = tf.assign(weights, tf.nn.l2_normalize(weights, axis=0)) 40 | # with tf.control_dependencies([assign_op]): 41 | cos_t = tf.matmul(embedding, weights, name='cos_t') 42 | 43 | return cos_t 44 | 45 | 46 | def arcface_loss(logits, labels, out_num, s=64.0, m=0.5): 47 
| assert len(labels.shape) == 1 48 | cos_m = math.cos(m) 49 | sin_m = math.sin(m) 50 | threshold = math.cos(math.pi - m) 51 | 52 | mask = tf.one_hot(labels, depth=out_num, name='one_hot_mask', on_value=1.0, off_value=0.0) 53 | cos_t = tf.multiply(logits, mask, name='cos_t') 54 | cos_t = tf.reduce_sum(cos_t, axis=1) 55 | 56 | cos_t2 = tf.square(cos_t, name='cos_t2') 57 | sin_t2 = tf.subtract(1.0, cos_t2, name='sin_t2') 58 | sin_t = tf.sqrt(sin_t2, name='sin_t') 59 | 60 | cos_mt = tf.subtract(tf.multiply(cos_t, cos_m), tf.multiply(sin_t, sin_m), name='cos_mt') 61 | 62 | cond = tf.less(threshold, cos_t, name='if_else') 63 | keep_val = cos_t - (threshold + 1) 64 | cos_mt = tf.where(cond, cos_mt, keep_val) 65 | # cos_mt = cond * cos_mt + (1 - cond) * keep_val 66 | 67 | update = tf.reshape(cos_mt, [-1, 1]) * mask 68 | mask = tf.cast(mask, tf.bool) 69 | logits = s * tf.where(mask, update, logits) 70 | # logits = s * (tf.reshape(cos_mt, [-1, 1]) * mask + logits * (1.0 - mask)) 71 | loss = tf.nn.sparse_softmax_cross_entropy_with_logits(logits=logits, labels=labels) 72 | return loss 73 | 74 | 75 | def soft_thresh(lam): 76 | # lam = tf.get_variable( 77 | # "soft_thresh", shape=(), 78 | # initializer=tf.zeros_initializer, 79 | # constraint=lambda x: "hello") 80 | # lam = lam 81 | def activation(x): 82 | ret = tf.nn.relu(x - lam) - tf.nn.relu(-x - lam) 83 | return ret 84 | return activation 85 | 86 | 87 | def sign(x): 88 | return tf.stop_gradient(tf.sign(x) - tf.clip_by_value(x, -1, 1)) \ 89 | + tf.clip_by_value(x, -1, 1) 90 | 91 | 92 | def step(x): 93 | return (sign(x) + 1) / 2 94 | 95 | 96 | def mobilefacenet(features, is_training, num_classes): 97 | if FLAGS.final_activation == 'step': 98 | final_activation = step 99 | elif FLAGS.final_activation == 'relu': 100 | final_activation = tf.nn.relu 101 | elif FLAGS.final_activation == 'relu6': 102 | final_activation = tf.nn.relu6 103 | elif FLAGS.final_activation == 'soft_thresh': 104 | final_activation = soft_thresh(0.5) 105 | elif FLAGS.final_activation == 'none': 106 | final_activation = None 107 | else: 108 | raise ValueError('Unknown activation %s' % str(FLAGS.final_activation)) 109 | 110 | MFACENET_DEF = dict( 111 | defaults={ 112 | # Note: these parameters of batch norm affect the architecture 113 | # that's why they are here and not in training_scope. 
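# NOTE: the original MobileFaceNet paper uses PReLU as the nonlinearity;
# this implementation uses tf.nn.leaky_relu (set in the defaults below).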
114 | (slim.batch_norm,): {'center': True, 'scale': True}, 115 | (slim.conv2d, slim.fully_connected, slim.separable_conv2d): { 116 | 'normalizer_fn': slim.batch_norm, 'activation_fn': tf.nn.leaky_relu, 117 | 'weights_regularizer': slim.l2_regularizer(4e-5) 118 | }, 119 | (ops.expanded_conv,): { 120 | 'split_expansion': 1, 121 | 'normalizer_fn': slim.batch_norm, 122 | 'residual': True 123 | }, 124 | (slim.conv2d, slim.separable_conv2d): {'padding': 'SAME'}, 125 | }, 126 | spec=[ 127 | op(slim.conv2d, stride=2, num_outputs=64, kernel_size=[3, 3]), 128 | 129 | op(slim.separable_conv2d, num_outputs=None, kernel_size=[3, 3], 130 | depth_multiplier=1, stride=1), 131 | 132 | op(ops.expanded_conv, expansion_size=expand_input(2), stride=2, num_outputs=64), 133 | op(ops.expanded_conv, expansion_size=expand_input(2), stride=1, num_outputs=64), 134 | op(ops.expanded_conv, expansion_size=expand_input(2), stride=1, num_outputs=64), 135 | op(ops.expanded_conv, expansion_size=expand_input(2), stride=1, num_outputs=64), 136 | op(ops.expanded_conv, expansion_size=expand_input(2), stride=1, num_outputs=64), 137 | 138 | op(ops.expanded_conv, expansion_size=expand_input(4), stride=2, num_outputs=128), 139 | 140 | op(ops.expanded_conv, expansion_size=expand_input(2), stride=1, num_outputs=128), 141 | op(ops.expanded_conv, expansion_size=expand_input(2), stride=1, num_outputs=128), 142 | op(ops.expanded_conv, expansion_size=expand_input(2), stride=1, num_outputs=128), 143 | op(ops.expanded_conv, expansion_size=expand_input(2), stride=1, num_outputs=128), 144 | op(ops.expanded_conv, expansion_size=expand_input(2), stride=1, num_outputs=128), 145 | op(ops.expanded_conv, expansion_size=expand_input(2), stride=1, num_outputs=128), 146 | 147 | op(ops.expanded_conv, expansion_size=expand_input(4), stride=2, num_outputs=128), 148 | 149 | op(ops.expanded_conv, expansion_size=expand_input(2), stride=1, num_outputs=128), 150 | op(ops.expanded_conv, expansion_size=expand_input(2), stride=1, num_outputs=128), 151 | 152 | op(slim.conv2d, stride=1, kernel_size=[1, 1], num_outputs=512), 153 | 154 | op(slim.separable_conv2d, num_outputs=None, kernel_size=[7, 7], 155 | depth_multiplier=1, stride=1, padding="VALID", activation_fn=None), 156 | 157 | op(slim.conv2d, stride=1, kernel_size=[1, 1], num_outputs=FLAGS.embedding_size, 158 | weights_regularizer=slim.l2_regularizer(4e-4), activation_fn=final_activation), 159 | ], 160 | ) 161 | 162 | net, end_points = mobilenet_v2.mobilenet( 163 | features, is_training=is_training, num_classes=num_classes, 164 | base_only=True, conv_defs=MFACENET_DEF) 165 | 166 | embedding = end_points['embedding'] 167 | embedding = tf.squeeze(embedding, [1, 2]) 168 | embedding_unnorm = embedding 169 | embedding = tf.nn.l2_normalize(embedding, axis=1) 170 | if FLAGS.final_activation == 'step': 171 | end_points['embedding'] = embedding_unnorm 172 | else: 173 | end_points['embedding'] = embedding 174 | logits = arcface_logits(embedding, num_classes) 175 | end_points['Logits'] = logits 176 | 177 | return logits, end_points 178 | 179 | 180 | def faceresnet(features, is_training, num_classes): 181 | if FLAGS.final_activation == 'step': 182 | final_activation = step 183 | elif FLAGS.final_activation == 'relu': 184 | final_activation = tf.nn.relu 185 | elif FLAGS.final_activation == 'relu6': 186 | final_activation = tf.nn.relu6 187 | elif FLAGS.final_activation == 'soft_thresh': 188 | final_activation = soft_thresh(0.5) 189 | elif FLAGS.final_activation == 'none': 190 | final_activation = None 191 | 192 | 
filter_list = [64, 64, 128, 256, 512] 193 | units = [3, 13, 30, 3] 194 | 195 | model = resnet_model.Model( 196 | resnet_size=100, bottleneck=False, num_classes=FLAGS.embedding_size, 197 | num_filters=filter_list, kernel_size=(3, 3), conv_stride=1, 198 | first_pool_size=None, first_pool_stride=None, 199 | block_sizes=units, block_strides=[2] * 4, resnet_version=3, 200 | data_format='channels_last') 201 | # Relu/soft_thresh is not applied in the model 202 | embedding = model(features, is_training) 203 | if final_activation is not None: 204 | embedding = final_activation(embedding) 205 | embedding = tf.nn.l2_normalize(embedding, axis=1) 206 | 207 | end_points = dict() 208 | end_points['embedding'] = embedding 209 | logits = arcface_logits(embedding, num_classes) 210 | end_points['Logits'] = logits 211 | 212 | return logits, end_points 213 | -------------------------------------------------------------------------------- /mobilenet/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/biswajitsc/sparse-embed/775404d38ce4382251b88168820da3740c1323ea/mobilenet/__init__.py -------------------------------------------------------------------------------- /mobilenet/conv_blocks.py: -------------------------------------------------------------------------------- 1 | # Copyright 2018 The TensorFlow Authors. All Rights Reserved. 2 | # 3 | # Licensed under the Apache License, Version 2.0 (the "License"); 4 | # you may not use this file except in compliance with the License. 5 | # You may obtain a copy of the License at 6 | # 7 | # http://www.apache.org/licenses/LICENSE-2.0 8 | # 9 | # Unless required by applicable law or agreed to in writing, software 10 | # distributed under the License is distributed on an "AS IS" BASIS, 11 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 12 | # See the License for the specific language governing permissions and 13 | # limitations under the License. 14 | # ============================================================================== 15 | """Convolution blocks for mobilenet.""" 16 | import contextlib 17 | import functools 18 | 19 | import tensorflow as tf 20 | 21 | slim = tf.contrib.slim 22 | 23 | 24 | def _fixed_padding(inputs, kernel_size, rate=1): 25 | """Pads the input along the spatial dimensions independently of input size. 26 | 27 | Pads the input such that if it was used in a convolution with 'VALID' padding, 28 | the output would have the same dimensions as if the unpadded input was used 29 | in a convolution with 'SAME' padding. 30 | 31 | Args: 32 | inputs: A tensor of size [batch, height_in, width_in, channels]. 33 | kernel_size: The kernel to be used in the conv2d or max_pool2d operation. 34 | rate: An integer, rate for atrous convolution. 35 | 36 | Returns: 37 | output: A tensor of size [batch, height_out, width_out, channels] with the 38 | input, either intact (if kernel_size == 1) or padded (if kernel_size > 1). 
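  For example, a [3, 3] kernel with rate=1 yields pad_total = 2, i.e. one
  pixel of zero padding on each spatial side; with rate=2 the effective
  kernel is 5x5 and two pixels are padded on each side.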
39 |   """
40 |   kernel_size_effective = [kernel_size[0] + (kernel_size[0] - 1) * (rate - 1),
41 |                            kernel_size[0] + (kernel_size[0] - 1) * (rate - 1)]
42 |   pad_total = [kernel_size_effective[0] - 1, kernel_size_effective[1] - 1]
43 |   pad_beg = [pad_total[0] // 2, pad_total[1] // 2]
44 |   pad_end = [pad_total[0] - pad_beg[0], pad_total[1] - pad_beg[1]]
45 |   padded_inputs = tf.pad(inputs, [[0, 0], [pad_beg[0], pad_end[0]],
46 |                                   [pad_beg[1], pad_end[1]], [0, 0]])
47 |   return padded_inputs
48 | 
49 | 
50 | def _make_divisible(v, divisor, min_value=None):
51 |   if min_value is None:
52 |     min_value = divisor
53 |   new_v = max(min_value, int(v + divisor / 2) // divisor * divisor)
54 |   # Make sure that round down does not go down by more than 10%.
55 |   if new_v < 0.9 * v:
56 |     new_v += divisor
57 |   return new_v
58 | 
59 | 
60 | def _split_divisible(num, num_ways, divisible_by=8):
61 |   """Evenly splits num, num_ways so each piece is a multiple of divisible_by."""
62 |   assert num % divisible_by == 0
63 |   assert num / num_ways >= divisible_by
64 |   # Note: want to round down, we adjust each split to match the total.
65 |   base = num // num_ways // divisible_by * divisible_by
66 |   result = []
67 |   accumulated = 0
68 |   for i in range(num_ways):
69 |     r = base
70 |     while accumulated + r < num * (i + 1) / num_ways:
71 |       r += divisible_by
72 |     result.append(r)
73 |     accumulated += r
74 |   assert accumulated == num
75 |   return result
76 | 
77 | 
78 | @contextlib.contextmanager
79 | def _v1_compatible_scope_naming(scope):
80 |   if scope is None:  # Create uniquified separable blocks.
81 |     with tf.variable_scope(None, default_name='separable') as s, \
82 |          tf.name_scope(s.original_name_scope):
83 |       yield ''
84 |   else:
85 |     # We use scope_depthwise, scope_pointwise for compatibility with V1 ckpts,
86 |     # which provide numbered scopes.
87 |     scope += '_'
88 |     yield scope
89 | 
90 | 
91 | @slim.add_arg_scope
92 | def split_separable_conv2d(input_tensor,
93 |                            num_outputs,
94 |                            scope=None,
95 |                            normalizer_fn=None,
96 |                            stride=1,
97 |                            rate=1,
98 |                            endpoints=None,
99 |                            use_explicit_padding=False):
100 |   """Separable mobilenet V1 style convolution.
101 | 
102 |   Depthwise convolution, with default non-linearity,
103 |   followed by 1x1 pointwise convolution. This is similar to
104 |   slim.separable_conv2d, but differs in that it applies batch
105 |   normalization and non-linearity to the depthwise step. This matches
106 |   the basic building block of the MobileNet paper
107 |   (https://arxiv.org/abs/1704.04861)
108 | 
109 |   Args:
110 |     input_tensor: input
111 |     num_outputs: number of outputs
112 |     scope: optional name of the scope. Note if provided it will use
113 |       scope_depthwise for depthwise, and scope_pointwise for pointwise.
114 |     normalizer_fn: which normalizer function to use for depthwise/pointwise
115 |     stride: stride
116 |     rate: output rate (also known as dilation rate)
117 |     endpoints: optional, if provided, will export additional tensors to it.
118 |     use_explicit_padding: Use 'VALID' padding for convolutions, but prepad
119 |       inputs so that the output dimensions are the same as if 'SAME' padding
120 |       were used.
121 | 
122 |   Returns:
123 |     output tensor
124 |   """
125 | 
126 |   with _v1_compatible_scope_naming(scope) as scope:
127 |     dw_scope = scope + 'depthwise'
128 |     endpoints = endpoints if endpoints is not None else {}
129 |     kernel_size = [3, 3]
130 |     padding = 'SAME'
131 |     if use_explicit_padding:
132 |       padding = 'VALID'
133 |       input_tensor = _fixed_padding(input_tensor, kernel_size, rate)
134 |     net = slim.separable_conv2d(
135 |         input_tensor,
136 |         None,
137 |         kernel_size,
138 |         depth_multiplier=1,
139 |         stride=stride,
140 |         rate=rate,
141 |         normalizer_fn=normalizer_fn,
142 |         padding=padding,
143 |         scope=dw_scope)
144 | 
145 |     endpoints[dw_scope] = net
146 | 
147 |     pw_scope = scope + 'pointwise'
148 |     net = slim.conv2d(
149 |         net,
150 |         num_outputs, [1, 1],
151 |         stride=1,
152 |         normalizer_fn=normalizer_fn,
153 |         scope=pw_scope)
154 |     endpoints[pw_scope] = net
155 |     return net
156 | 
157 | 
158 | def expand_input_by_factor(n, divisible_by=8):
159 |   return lambda num_inputs, **_: _make_divisible(num_inputs * n, divisible_by)
160 | 
161 | 
162 | @slim.add_arg_scope
163 | def expanded_conv(input_tensor,
164 |                   num_outputs,
165 |                   expansion_size=expand_input_by_factor(6),
166 |                   stride=1,
167 |                   rate=1,
168 |                   kernel_size=(3, 3),
169 |                   residual=True,
170 |                   normalizer_fn=None,
171 |                   split_projection=1,
172 |                   split_expansion=1,
173 |                   expansion_transform=None,
174 |                   depthwise_location='expansion',
175 |                   depthwise_channel_multiplier=1,
176 |                   endpoints=None,
177 |                   use_explicit_padding=False,
178 |                   padding='SAME',
179 |                   scope=None):
180 |   """Depthwise Convolution Block with expansion.
181 | 
182 |   Builds a composite convolution that has the following structure
183 |   expansion (1x1) -> depthwise (kernel_size) -> projection (1x1)
184 | 
185 |   Args:
186 |     input_tensor: input
187 |     num_outputs: number of outputs in the final layer.
188 |     expansion_size: the size of expansion, could be a constant or a callable.
189 |       If latter it will be provided 'num_inputs' as an input. For forward
190 |       compatibility it should accept arbitrary keyword arguments.
191 |       Default will expand the input by factor of 6.
192 |     stride: depthwise stride
193 |     rate: depthwise rate
194 |     kernel_size: depthwise kernel
195 |     residual: whether to include residual connection between input
196 |       and output.
197 |     normalizer_fn: batchnorm or otherwise
198 |     split_projection: how many ways to split projection operator
199 |       (that is conv expansion->bottleneck)
200 |     split_expansion: how many ways to split expansion op
201 |       (that is conv bottleneck->expansion) ops will keep depth divisible
202 |       by this value.
203 |     expansion_transform: Optional function that takes expansion
204 |       as a single input and returns output.
205 |     depthwise_location: where to put depthwise convolutions; supported
206 |       values are None, 'input', 'output', 'expansion'
207 |     depthwise_channel_multiplier: depthwise channel multiplier:
208 |       each input will be replicated (with different filters)
209 |       that many times. So if input had c channels,
210 |       output will have c x depthwise_channel_multiplier.
211 |     endpoints: An optional dictionary into which intermediate endpoints are
212 |       placed. The keys "expansion_output", "depthwise_output",
213 |       "projection_output" and "expansion_transform" are always populated, even
214 |       if the corresponding functions are not invoked.
215 |     use_explicit_padding: Use 'VALID' padding for convolutions, but prepad
216 |       inputs so that the output dimensions are the same as if 'SAME' padding
217 |       were used.
218 | padding: Padding type to use if `use_explicit_padding` is not set. 219 | scope: optional scope. 220 | 221 | Returns: 222 | Tensor of depth num_outputs 223 | 224 | Raises: 225 | TypeError: on inval 226 | """ 227 | with tf.variable_scope(scope, default_name='expanded_conv') as s, \ 228 | tf.name_scope(s.original_name_scope): 229 | prev_depth = input_tensor.get_shape().as_list()[3] 230 | if depthwise_location not in [None, 'input', 'output', 'expansion']: 231 | raise TypeError('%r is unknown value for depthwise_location' % 232 | depthwise_location) 233 | if use_explicit_padding: 234 | if padding != 'SAME': 235 | raise TypeError('`use_explicit_padding` should only be used with ' 236 | '"SAME" padding.') 237 | padding = 'VALID' 238 | depthwise_func = functools.partial( 239 | slim.separable_conv2d, 240 | num_outputs=None, 241 | kernel_size=kernel_size, 242 | depth_multiplier=depthwise_channel_multiplier, 243 | stride=stride, 244 | rate=rate, 245 | normalizer_fn=normalizer_fn, 246 | padding=padding, 247 | scope='depthwise') 248 | # b1 -> b2 * r -> b2 249 | # i -> (o * r) (bottleneck) -> o 250 | input_tensor = tf.identity(input_tensor, 'input') 251 | net = input_tensor 252 | 253 | if depthwise_location == 'input': 254 | if use_explicit_padding: 255 | net = _fixed_padding(net, kernel_size, rate) 256 | net = depthwise_func(net, activation_fn=None) 257 | 258 | if callable(expansion_size): 259 | inner_size = expansion_size(num_inputs=prev_depth) 260 | else: 261 | inner_size = expansion_size 262 | 263 | if inner_size > net.shape[3]: 264 | net = split_conv( 265 | net, 266 | inner_size, 267 | num_ways=split_expansion, 268 | scope='expand', 269 | stride=1, 270 | normalizer_fn=normalizer_fn) 271 | net = tf.identity(net, 'expansion_output') 272 | if endpoints is not None: 273 | endpoints['expansion_output'] = net 274 | 275 | if depthwise_location == 'expansion': 276 | if use_explicit_padding: 277 | net = _fixed_padding(net, kernel_size, rate) 278 | net = depthwise_func(net) 279 | 280 | net = tf.identity(net, name='depthwise_output') 281 | if endpoints is not None: 282 | endpoints['depthwise_output'] = net 283 | if expansion_transform: 284 | net = expansion_transform(expansion_tensor=net, input_tensor=input_tensor) 285 | # Note in contrast with expansion, we always have 286 | # projection to produce the desired output size. 287 | net = split_conv( 288 | net, 289 | num_outputs, 290 | num_ways=split_projection, 291 | stride=1, 292 | scope='project', 293 | normalizer_fn=normalizer_fn, 294 | activation_fn=tf.identity) 295 | if endpoints is not None: 296 | endpoints['projection_output'] = net 297 | if depthwise_location == 'output': 298 | if use_explicit_padding: 299 | net = _fixed_padding(net, kernel_size, rate) 300 | net = depthwise_func(net, activation_fn=None) 301 | 302 | if callable(residual): # custom residual 303 | net = residual(input_tensor=input_tensor, output_tensor=net) 304 | elif (residual and 305 | # stride check enforces that we don't add residuals when spatial 306 | # dimensions are None 307 | stride == 1 and 308 | # Depth matches 309 | net.get_shape().as_list()[3] == 310 | input_tensor.get_shape().as_list()[3]): 311 | net += input_tensor 312 | return tf.identity(net, name='output') 313 | 314 | 315 | def split_conv(input_tensor, 316 | num_outputs, 317 | num_ways, 318 | scope, 319 | divisible_by=8, 320 | **kwargs): 321 | """Creates a split convolution. 
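(For example: num_ways=2 mapping 64 input channels to 32 outputs yields two
independent 1x1 convolutions of 32 -> 16 channels each, concatenated along
the channel axis.)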
322 | 323 | Split convolution splits the input and output into 324 | 'num_blocks' blocks of approximately the same size each, 325 | and only connects $i$-th input to $i$ output. 326 | 327 | Args: 328 | input_tensor: input tensor 329 | num_outputs: number of output filters 330 | num_ways: num blocks to split by. 331 | scope: scope for all the operators. 332 | divisible_by: make sure that every part is divisiable by this. 333 | **kwargs: will be passed directly into conv2d operator 334 | Returns: 335 | tensor 336 | """ 337 | b = input_tensor.get_shape().as_list()[3] 338 | 339 | if num_ways == 1 or min(b // num_ways, 340 | num_outputs // num_ways) < divisible_by: 341 | # Don't do any splitting if we end up with less than 8 filters 342 | # on either side. 343 | return slim.conv2d(input_tensor, num_outputs, [1, 1], scope=scope, **kwargs) 344 | 345 | outs = [] 346 | input_splits = _split_divisible(b, num_ways, divisible_by=divisible_by) 347 | output_splits = _split_divisible( 348 | num_outputs, num_ways, divisible_by=divisible_by) 349 | inputs = tf.split(input_tensor, input_splits, axis=3, name='split_' + scope) 350 | base = scope 351 | for i, (input_tensor, out_size) in enumerate(zip(inputs, output_splits)): 352 | scope = base + '_part_%d' % (i,) 353 | n = slim.conv2d(input_tensor, out_size, [1, 1], scope=scope, **kwargs) 354 | n = tf.identity(n, scope + '_output') 355 | outs.append(n) 356 | return tf.concat(outs, 3, name=scope + '_concat') 357 | -------------------------------------------------------------------------------- /mobilenet/mobilenet.py: -------------------------------------------------------------------------------- 1 | # Copyright 2018 The TensorFlow Authors. All Rights Reserved. 2 | # 3 | # Licensed under the Apache License, Version 2.0 (the "License"); 4 | # you may not use this file except in compliance with the License. 5 | # You may obtain a copy of the License at 6 | # 7 | # http://www.apache.org/licenses/LICENSE-2.0 8 | # 9 | # Unless required by applicable law or agreed to in writing, software 10 | # distributed under the License is distributed on an "AS IS" BASIS, 11 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 12 | # See the License for the specific language governing permissions and 13 | # limitations under the License. 14 | # ============================================================================== 15 | """Mobilenet Base Class.""" 16 | 17 | from __future__ import absolute_import 18 | from __future__ import division 19 | from __future__ import print_function 20 | import collections 21 | import contextlib 22 | import copy 23 | import os 24 | import sys 25 | 26 | import tensorflow as tf 27 | 28 | 29 | slim = tf.contrib.slim 30 | 31 | 32 | @slim.add_arg_scope 33 | def apply_activation(x, name=None, activation_fn=None): 34 | return activation_fn(x, name=name) if activation_fn else x 35 | 36 | 37 | def _fixed_padding(inputs, kernel_size, rate=1): 38 | """Pads the input along the spatial dimensions independently of input size. 39 | 40 | Pads the input such that if it was used in a convolution with 'VALID' padding, 41 | the output would have the same dimensions as if the unpadded input was used 42 | in a convolution with 'SAME' padding. 43 | 44 | Args: 45 | inputs: A tensor of size [batch, height_in, width_in, channels]. 46 | kernel_size: The kernel to be used in the conv2d or max_pool2d operation. 47 | rate: An integer, rate for atrous convolution. 
48 | 49 | Returns: 50 | output: A tensor of size [batch, height_out, width_out, channels] with the 51 | input, either intact (if kernel_size == 1) or padded (if kernel_size > 1). 52 | """ 53 | kernel_size_effective = [kernel_size[0] + (kernel_size[0] - 1) * (rate - 1), 54 | kernel_size[0] + (kernel_size[0] - 1) * (rate - 1)] 55 | pad_total = [kernel_size_effective[0] - 1, kernel_size_effective[1] - 1] 56 | pad_beg = [pad_total[0] // 2, pad_total[1] // 2] 57 | pad_end = [pad_total[0] - pad_beg[0], pad_total[1] - pad_beg[1]] 58 | padded_inputs = tf.pad(inputs, [[0, 0], [pad_beg[0], pad_end[0]], 59 | [pad_beg[1], pad_end[1]], [0, 0]]) 60 | return padded_inputs 61 | 62 | 63 | def _make_divisible(v, divisor, min_value=None): 64 | if min_value is None: 65 | min_value = divisor 66 | new_v = max(min_value, int(v + divisor / 2) // divisor * divisor) 67 | # Make sure that round down does not go down by more than 10%. 68 | if new_v < 0.9 * v: 69 | new_v += divisor 70 | return new_v 71 | 72 | 73 | @contextlib.contextmanager 74 | def _set_arg_scope_defaults(defaults): 75 | """Sets arg scope defaults for all items present in defaults. 76 | 77 | Args: 78 | defaults: dictionary/list of pairs, containing a mapping from 79 | function to a dictionary of default args. 80 | 81 | Yields: 82 | context manager where all defaults are set. 83 | """ 84 | if hasattr(defaults, 'items'): 85 | items = list(defaults.items()) 86 | else: 87 | items = defaults 88 | if not items: 89 | yield 90 | else: 91 | func, default_arg = items[0] 92 | with slim.arg_scope(func, **default_arg): 93 | with _set_arg_scope_defaults(items[1:]): 94 | yield 95 | 96 | 97 | @slim.add_arg_scope 98 | def depth_multiplier(output_params, 99 | multiplier, 100 | divisible_by=8, 101 | min_depth=8, 102 | **unused_kwargs): 103 | if 'num_outputs' not in output_params: 104 | return 105 | d = output_params['num_outputs'] 106 | if d is not None: 107 | output_params['num_outputs'] = _make_divisible(d * multiplier, divisible_by, 108 | min_depth) 109 | 110 | 111 | _Op = collections.namedtuple('Op', ['op', 'params', 'multiplier_func']) 112 | 113 | 114 | def op(opfunc, **params): 115 | multiplier = params.pop('multiplier_transorm', depth_multiplier) 116 | return _Op(opfunc, params=params, multiplier_func=multiplier) 117 | 118 | 119 | class NoOpScope(object): 120 | """No-op context manager.""" 121 | 122 | def __enter__(self): 123 | return None 124 | 125 | def __exit__(self, exc_type, exc_value, traceback): 126 | return False 127 | 128 | 129 | def safe_arg_scope(funcs, **kwargs): 130 | """Returns `slim.arg_scope` with all None arguments removed. 131 | 132 | Arguments: 133 | funcs: Functions to pass to `arg_scope`. 134 | **kwargs: Arguments to pass to `arg_scope`. 135 | 136 | Returns: 137 | arg_scope or No-op context manager. 138 | 139 | Note: can be useful if None value should be interpreted as "do not overwrite 140 | this parameter value". 141 | """ 142 | filtered_args = {name: value for name, value in kwargs.items() 143 | if value is not None} 144 | if filtered_args: 145 | return slim.arg_scope(funcs, **filtered_args) 146 | else: 147 | return NoOpScope() 148 | 149 | 150 | @slim.add_arg_scope 151 | def mobilenet_base( # pylint: disable=invalid-name 152 | inputs, 153 | conv_defs, 154 | multiplier=1.0, 155 | final_endpoint=None, 156 | output_stride=None, 157 | use_explicit_padding=False, 158 | scope=None, 159 | is_training=False): 160 | """Mobilenet base network. 161 | 162 | Constructs a network from inputs to the given final endpoint. 
By default 163 | the network is constructed in inference mode. To create the network 164 | in training mode use: 165 | 166 | with slim.arg_scope(mobilenet.training_scope()): 167 | logits, endpoints = mobilenet_base(...) 168 | 169 | Args: 170 | inputs: a tensor of shape [batch_size, height, width, channels]. 171 | conv_defs: A list of op(...) layers specifying the net architecture. 172 | multiplier: Float multiplier for the depth (number of channels) 173 | for all convolution ops. The value must be greater than zero. Typical 174 | usage will be to set this value in (0, 1) to reduce the number of 175 | parameters or computation cost of the model. 176 | final_endpoint: The name of the last layer, for early termination. 177 | For V1-based networks the last layer is "layer_14"; for V2 it is "layer_20". 178 | output_stride: An integer that specifies the requested ratio of input to 179 | output spatial resolution. If not None, then we invoke atrous convolution 180 | if necessary to prevent the network from reducing the spatial resolution 181 | of the activation maps. Allowed values are 1 or any even number, excluding 182 | zero. Typical values are 8 (accurate fully convolutional mode), 16 183 | (fast fully convolutional mode), and 32 (classification mode). 184 | 185 | NOTE- output_stride relies on all subsequent operators to support dilated 186 | operators via the "rate" parameter. This might require wrapping non-conv 187 | operators to operate properly. 188 | 189 | use_explicit_padding: Use 'VALID' padding for convolutions, but prepad 190 | inputs so that the output dimensions are the same as if 'SAME' padding 191 | were used. 192 | scope: optional variable scope. 193 | is_training: How to set up batch_norm and other ops. Note: most of the time 194 | this does not need to be set directly. Use mobilenet.training_scope() to set 195 | up training instead. This parameter is here for backward compatibility 196 | only. It is safe to set it to the value matching 197 | training_scope(is_training=...). It is also safe to explicitly set 198 | it to False, even if there is an outer training_scope set to training. 199 | (The network will be built in inference mode). If this is set to None, 200 | no arg_scope is added for slim.batch_norm's is_training parameter. 201 | 202 | Returns: 203 | tensor_out: output tensor. 204 | end_points: a set of activations for external use, for example summaries or 205 | losses. 206 | 207 | Raises: 208 | ValueError: depth_multiplier <= 0, or the target output_stride is not 209 | allowed. 210 | """ 211 | if multiplier <= 0: 212 | raise ValueError('multiplier is not greater than zero.') 213 | 214 | # Set conv defs defaults and overrides. 215 | conv_defs_defaults = conv_defs.get('defaults', {}) 216 | conv_defs_overrides = conv_defs.get('overrides', {}) 217 | if use_explicit_padding: 218 | conv_defs_overrides = copy.deepcopy(conv_defs_overrides) 219 | conv_defs_overrides[ 220 | (slim.conv2d, slim.separable_conv2d)] = {'padding': 'VALID'} 221 | 222 | if output_stride is not None: 223 | if output_stride == 0 or (output_stride > 1 and output_stride % 2): 224 | raise ValueError('Output stride must be None, 1 or a multiple of 2.') 225 | 226 | # a) Set the tensorflow scope 227 | # b) set padding to default: note we might consider removing this 228 | # since it is also set by mobilenet_scope 229 | # c) set all defaults 230 | # d) set all extra overrides.
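Editorial note: the `use_explicit_padding` option documented above relies on `_fixed_padding` (defined earlier in this file), whose arithmetic is easy to check by hand. This small sketch mirrors that computation for a square kernel:

```
# Mirrors the padding arithmetic of _fixed_padding for a square kernel.
def fixed_pad_amounts(kernel, rate=1):
    effective = kernel + (kernel - 1) * (rate - 1)  # dilated kernel extent
    total = effective - 1
    beg = total // 2
    return beg, total - beg

print(fixed_pad_amounts(3))          # (1, 1): plain 3x3, SAME-equivalent
print(fixed_pad_amounts(3, rate=2))  # (2, 2): dilated 3x3 spans 5 pixels
```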
231 | with _scope_all(scope, default_scope='Mobilenet'), \ 232 | safe_arg_scope([slim.batch_norm], is_training=is_training), \ 233 | _set_arg_scope_defaults(conv_defs_defaults), \ 234 | _set_arg_scope_defaults(conv_defs_overrides): 235 | # The current_stride variable keeps track of the output stride of the 236 | # activations, i.e., the running product of convolution strides up to the 237 | # current network layer. This allows us to invoke atrous convolution 238 | # whenever applying the next convolution would result in the activations 239 | # having output stride larger than the target output_stride. 240 | current_stride = 1 241 | 242 | # The atrous convolution rate parameter. 243 | rate = 1 244 | 245 | net = inputs 246 | # Insert default parameters before the base scope which includes 247 | # any custom overrides set in mobilenet. 248 | end_points = {} 249 | scopes = {} 250 | for i, opdef in enumerate(conv_defs['spec']): 251 | params = dict(opdef.params) 252 | opdef.multiplier_func(params, multiplier) 253 | stride = params.get('stride', 1) 254 | if output_stride is not None and current_stride == output_stride: 255 | # If we have reached the target output_stride, then we need to employ 256 | # atrous convolution with stride=1 and multiply the atrous rate by the 257 | # current unit's stride for use in subsequent layers. 258 | layer_stride = 1 259 | layer_rate = rate 260 | rate *= stride 261 | else: 262 | layer_stride = stride 263 | layer_rate = 1 264 | current_stride *= stride 265 | # Update params. 266 | params['stride'] = layer_stride 267 | # Only insert rate to params if rate > 1. 268 | if layer_rate > 1: 269 | params['rate'] = layer_rate 270 | # Set padding 271 | if use_explicit_padding: 272 | if 'kernel_size' in params: 273 | net = _fixed_padding(net, params['kernel_size'], layer_rate) 274 | else: 275 | params['use_explicit_padding'] = True 276 | 277 | end_point = 'layer_%d' % (i + 1) 278 | try: 279 | net = opdef.op(net, **params) 280 | except Exception: 281 | print('Failed to create op %i: %r params: %r' % (i, opdef, params)) 282 | raise 283 | end_points[end_point] = net 284 | scope = os.path.dirname(net.name) 285 | scopes[scope] = end_point 286 | if final_endpoint is not None and end_point == final_endpoint: 287 | break 288 | 289 | # Add all tensors that end with 'output' to 290 | # endpoints 291 | for t in net.graph.get_operations(): 292 | scope = os.path.dirname(t.name) 293 | bn = os.path.basename(t.name) 294 | if scope in scopes and t.name.endswith('output'): 295 | end_points[scopes[scope] + '/' + bn] = t.outputs[0] 296 | return net, end_points 297 | 298 | 299 | @contextlib.contextmanager 300 | def _scope_all(scope, default_scope=None): 301 | with tf.variable_scope(scope, default_name=default_scope) as s,\ 302 | tf.name_scope(s.original_name_scope): 303 | yield s 304 | 305 | 306 | @slim.add_arg_scope 307 | def mobilenet(inputs, 308 | num_classes=1001, 309 | prediction_fn=slim.softmax, 310 | reuse=None, 311 | scope='Mobilenet', 312 | base_only=False, 313 | **mobilenet_args): 314 | """Mobilenet model for classification, supports both V1 and V2. 315 | 316 | Note: default mode is inference, use mobilenet.training_scope to create 317 | training network. 318 | 319 | 320 | Args: 321 | inputs: a tensor of shape [batch_size, height, width, channels]. 322 | num_classes: number of predicted classes. If 0 or None, the logits layer 323 | is omitted and the input features to the logits layer (before dropout) 324 | are returned instead. 
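Editorial note: the `current_stride`/`rate` bookkeeping in the loop above can be replayed without TensorFlow. This sketch simulates a stride pattern against a target `output_stride` and shows where strides get converted into atrous rates (intermediate stride-1 layers are omitted for brevity):

```
def simulate_strides(strides, output_stride):
    layers, current_stride, rate = [], 1, 1
    for stride in strides:
        if output_stride is not None and current_stride == output_stride:
            layers.append((1, rate))   # (layer_stride, layer_rate)
            rate *= stride
        else:
            layers.append((stride, 1))
            current_stride *= stride
    return layers

# Five stride-2 ops (as in MobileNet V2) with output_stride=8: the last two
# become stride-1 convolutions with growing atrous rates.
print(simulate_strides([2, 2, 2, 2, 2], output_stride=8))
# [(2, 1), (2, 1), (2, 1), (1, 1), (1, 2)]
```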
325 | prediction_fn: a function to get predictions out of logits 326 | (default softmax). 327 | reuse: whether or not the network and its variables should be reused. To be 328 | able to reuse, 'scope' must be given. 329 | scope: Optional variable_scope. 330 | base_only: if True will only create the base of the network (no pooling 331 | and no logits). 332 | **mobilenet_args: passed to mobilenet_base verbatim. 333 | - conv_defs: list of conv defs 334 | - multiplier: Float multiplier for the depth (number of channels) 335 | for all convolution ops. The value must be greater than zero. Typical 336 | usage will be to set this value in (0, 1) to reduce the number of 337 | parameters or computation cost of the model. 338 | - output_stride: will ensure that the last layer has at most total stride. 339 | If the architecture calls for more stride than that provided 340 | (e.g. output_stride=16, but the architecture has 5 stride=2 operators), 341 | it will replace output_stride with fractional convolutions using Atrous 342 | Convolutions. 343 | 344 | Returns: 345 | logits: the pre-softmax activations, a tensor of size 346 | [batch_size, num_classes] 347 | end_points: a dictionary from components of the network to the corresponding 348 | activation tensor. 349 | 350 | Raises: 351 | ValueError: Input rank is invalid. 352 | """ 353 | is_training = mobilenet_args.get('is_training', False) 354 | input_shape = inputs.get_shape().as_list() 355 | if len(input_shape) != 4: 356 | raise ValueError('Expected rank 4 input, was: %d' % len(input_shape)) 357 | 358 | with tf.variable_scope(scope, 'Mobilenet', reuse=reuse) as scope: 359 | inputs = tf.identity(inputs, 'input') 360 | net, end_points = mobilenet_base(inputs, scope=scope, **mobilenet_args) 361 | net = tf.identity(net, name='embedding') 362 | end_points['embedding'] = net 363 | 364 | if base_only: 365 | return net, end_points 366 | 367 | with tf.variable_scope('Logits'): 368 | net = global_pool(net) 369 | end_points['global_pool'] = net 370 | if not num_classes: 371 | return net, end_points 372 | # Dropout turned off 373 | # net = slim.dropout(net, scope='Dropout', is_training=False) 374 | # 1 x 1 x num_classes 375 | # Note: legacy scope name. 376 | logits = slim.conv2d( 377 | net, 378 | num_classes, [1, 1], 379 | activation_fn=None, 380 | normalizer_fn=None, 381 | biases_initializer=tf.zeros_initializer(), 382 | scope='Conv2d_1c_1x1') 383 | 384 | logits = tf.squeeze(logits, [1, 2]) 385 | 386 | logits = tf.identity(logits, name='output') 387 | end_points['Logits'] = logits 388 | if prediction_fn: 389 | end_points['Predictions'] = prediction_fn(logits, 'Predictions') 390 | return logits, end_points 391 | 392 | 393 | def global_pool(input_tensor, pool_op=tf.nn.avg_pool): 394 | """Applies avg pool to produce 1x1 output. 395 | 396 | NOTE: This function is functionally equivalent to reduce_mean, but it uses 397 | a baked-in average pool, which has better support across hardware. 398 | 399 | Args: 400 | input_tensor: input tensor 401 | pool_op: pooling op (avg pool is default) 402 | Returns: 403 | a tensor batch_size x 1 x 1 x depth. 404 | """ 405 | shape = input_tensor.get_shape().as_list() 406 | if shape[1] is None or shape[2] is None: 407 | kernel_size = tf.convert_to_tensor( 408 | [1, tf.shape(input_tensor)[1], 409 | tf.shape(input_tensor)[2], 1]) 410 | else: 411 | kernel_size = [1, shape[1], shape[2], 1] 412 | output = pool_op( 413 | input_tensor, ksize=kernel_size, strides=[1, 1, 1, 1], padding='VALID') 414 | # Recover output shape, for unknown shape.
415 | output.set_shape([None, 1, 1, None]) 416 | return output 417 | 418 | 419 | def training_scope(is_training=True, 420 | weight_decay=0.00004, 421 | stddev=0.09, 422 | dropout_keep_prob=0.8, 423 | bn_decay=0.997): 424 | """Defines Mobilenet training scope. 425 | 426 | Usage: 427 | with tf.contrib.slim.arg_scope(mobilenet.training_scope()): 428 | logits, endpoints = mobilenet_v2.mobilenet(input_tensor) 429 | 430 | # the network created will be trainable with dropout/batch norm 431 | # initialized appropriately. 432 | Args: 433 | is_training: if set to False this will ensure that all customizations are 434 | set to non-training mode. This might be helpful for code that is reused 435 | across both training/evaluation, but most of the time training_scope with 436 | value False is not needed. If this is set to None, the parameter is not 437 | added to the batch_norm arg_scope. 438 | 439 | weight_decay: The weight decay to use for regularizing the model. 440 | stddev: Standard deviation for initialization, if negative uses xavier. 441 | dropout_keep_prob: dropout keep probability (not set if equal to None). 442 | bn_decay: decay for the batch norm moving averages (not set if equal to 443 | None). 444 | 445 | Returns: 446 | An argument scope to use via arg_scope. 447 | """ 448 | # Note: do not introduce parameters that would change the inference 449 | # model here (for example whether to use bias), modify conv_def instead. 450 | batch_norm_params = { 451 | 'decay': bn_decay, 452 | 'is_training': is_training 453 | } 454 | if stddev < 0: 455 | weight_initializer = slim.initializers.xavier_initializer() 456 | else: 457 | weight_initializer = tf.truncated_normal_initializer(stddev=stddev) 458 | 459 | # Set weight_decay for weights in Conv and FC layers. 460 | with slim.arg_scope( 461 | [slim.conv2d, slim.fully_connected, slim.separable_conv2d], 462 | weights_initializer=weight_initializer, 463 | normalizer_fn=slim.batch_norm), \ 464 | slim.arg_scope([mobilenet_base, mobilenet], is_training=is_training),\ 465 | safe_arg_scope([slim.batch_norm], **batch_norm_params), \ 466 | safe_arg_scope([slim.dropout], is_training=False, 467 | keep_prob=dropout_keep_prob), \ 468 | slim.arg_scope([slim.conv2d], \ 469 | weights_regularizer=slim.l2_regularizer(weight_decay)), \ 470 | slim.arg_scope([slim.separable_conv2d], weights_regularizer=None) as s: 471 | return s 472 | -------------------------------------------------------------------------------- /mobilenet/mobilenet_v2.py: -------------------------------------------------------------------------------- 1 | # Copyright 2018 The TensorFlow Authors. All Rights Reserved. 2 | # 3 | # Licensed under the Apache License, Version 2.0 (the "License"); 4 | # you may not use this file except in compliance with the License. 5 | # You may obtain a copy of the License at 6 | # 7 | # http://www.apache.org/licenses/LICENSE-2.0 8 | # 9 | # Unless required by applicable law or agreed to in writing, software 10 | # distributed under the License is distributed on an "AS IS" BASIS, 11 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 12 | # See the License for the specific language governing permissions and 13 | # limitations under the License. 14 | # ============================================================================== 15 | """Implementation of Mobilenet V2. 16 | 17 | Architecture: https://arxiv.org/abs/1801.04381 18 | 19 | The base model gives 72.2% accuracy on ImageNet, with 300M multiply-adds 20 | and 3.4M parameters.
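Editorial note on `global_pool`, defined at the end of `mobilenet.py` above: it averages over the full spatial extent while keeping 1x1 spatial dimensions, implemented as `avg_pool` because (as its docstring notes) that has better hardware support than `reduce_mean`. In numpy terms, for illustration only:

```
import numpy as np

# What global_pool computes, expressed with numpy.
x = np.random.rand(2, 7, 7, 320)             # batch x H x W x depth
pooled = x.mean(axis=(1, 2), keepdims=True)  # average over all spatial positions
assert pooled.shape == (2, 1, 1, 320)
```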
21 | """ 22 | 23 | from __future__ import absolute_import 24 | from __future__ import division 25 | from __future__ import print_function 26 | 27 | import copy 28 | import functools 29 | import tensorflow as tf 30 | 31 | from mobilenet import conv_blocks as ops 32 | from mobilenet import mobilenet as lib 33 | 34 | slim = tf.contrib.slim 35 | op = lib.op 36 | 37 | expand_input = ops.expand_input_by_factor 38 | 39 | # pyformat: disable 40 | # Architecture: https://arxiv.org/abs/1801.04381 41 | V2_DEF = dict( 42 | defaults={ 43 | # Note: these parameters of batch norm affect the architecture 44 | # that's why they are here and not in training_scope. 45 | (slim.batch_norm,): {'center': True, 'scale': True}, 46 | (slim.conv2d, slim.fully_connected, slim.separable_conv2d): { 47 | 'normalizer_fn': slim.batch_norm, 'activation_fn': tf.nn.relu6 48 | }, 49 | (ops.expanded_conv,): { 50 | 'expansion_size': expand_input(6), 51 | 'split_expansion': 1, 52 | 'normalizer_fn': slim.batch_norm, 53 | 'residual': True 54 | }, 55 | (slim.conv2d, slim.separable_conv2d): {'padding': 'SAME'} 56 | }, 57 | spec=[ 58 | op(slim.conv2d, stride=2, num_outputs=32, kernel_size=[3, 3]), 59 | op(ops.expanded_conv, 60 | expansion_size=expand_input(1, divisible_by=1), 61 | num_outputs=16), 62 | op(ops.expanded_conv, stride=2, num_outputs=24), 63 | op(ops.expanded_conv, stride=1, num_outputs=24), 64 | op(ops.expanded_conv, stride=2, num_outputs=32), 65 | op(ops.expanded_conv, stride=1, num_outputs=32), 66 | op(ops.expanded_conv, stride=1, num_outputs=32), 67 | op(ops.expanded_conv, stride=2, num_outputs=64), 68 | op(ops.expanded_conv, stride=1, num_outputs=64), 69 | op(ops.expanded_conv, stride=1, num_outputs=64), 70 | op(ops.expanded_conv, stride=1, num_outputs=64), 71 | op(ops.expanded_conv, stride=1, num_outputs=96), 72 | op(ops.expanded_conv, stride=1, num_outputs=96), 73 | op(ops.expanded_conv, stride=1, num_outputs=96), 74 | op(ops.expanded_conv, stride=2, num_outputs=160), 75 | op(ops.expanded_conv, stride=1, num_outputs=160), 76 | op(ops.expanded_conv, stride=1, num_outputs=160), 77 | op(ops.expanded_conv, stride=1, num_outputs=320), 78 | op(slim.conv2d, stride=1, kernel_size=[1, 1], num_outputs=1280) 79 | ], 80 | ) 81 | # pyformat: enable 82 | 83 | 84 | @slim.add_arg_scope 85 | def mobilenet(input_tensor, 86 | num_classes=1001, 87 | depth_multiplier=1.0, 88 | scope='MobilenetV2', 89 | conv_defs=None, 90 | finegrain_classification_mode=False, 91 | min_depth=None, 92 | divisible_by=None, 93 | **kwargs): 94 | """Creates mobilenet V2 network. 95 | 96 | Inference mode is created by default. To create training use training_scope 97 | below. 98 | 99 | with tf.contrib.slim.arg_scope(mobilenet_v2.training_scope()): 100 | logits, endpoints = mobilenet_v2.mobilenet(input_tensor) 101 | 102 | Args: 103 | input_tensor: The input tensor 104 | num_classes: number of classes 105 | depth_multiplier: The multiplier applied to scale number of 106 | channels in each layer. Note: this is called depth multiplier in the 107 | paper but the name is kept for consistency with slim's model builder. 108 | scope: Scope of the operator 109 | conv_defs: Allows to override default conv def. 110 | finegrain_classification_mode: When set to True, the model 111 | will keep the last layer large even for small multipliers. Following 112 | https://arxiv.org/abs/1801.04381 113 | suggests that it improves performance for ImageNet-type of problems. 114 | *Note* ignored if final_endpoint makes the builder exit earlier. 
115 | min_depth: If provided, will ensure that all layers have at least that 116 | many channels after application of the depth multiplier. 117 | divisible_by: If provided, will ensure that all layers' channel counts 118 | are divisible by this number. 119 | **kwargs: passed directly to mobilenet.mobilenet: 120 | prediction_fn- what prediction function to use. 121 | reuse- whether to reuse variables (if reuse is set to True, scope 122 | must be given). 123 | Returns: 124 | logits/endpoints pair 125 | 126 | Raises: 127 | ValueError: On invalid arguments 128 | """ 129 | if conv_defs is None: 130 | conv_defs = V2_DEF 131 | if 'multiplier' in kwargs: 132 | raise ValueError('mobilenetv2 doesn\'t support the generic ' 133 | 'multiplier parameter; use "depth_multiplier" instead.') 134 | if finegrain_classification_mode: 135 | conv_defs = copy.deepcopy(conv_defs) 136 | if depth_multiplier < 1: 137 | conv_defs['spec'][-1].params['num_outputs'] /= depth_multiplier 138 | 139 | depth_args = {} 140 | # NB: do not set depth_args unless they are provided to avoid overriding 141 | # whatever default depth_multiplier might have thanks to arg_scope. 142 | if min_depth is not None: 143 | depth_args['min_depth'] = min_depth 144 | if divisible_by is not None: 145 | depth_args['divisible_by'] = divisible_by 146 | 147 | with slim.arg_scope((lib.depth_multiplier,), **depth_args): 148 | return lib.mobilenet( 149 | input_tensor, 150 | num_classes=num_classes, 151 | conv_defs=conv_defs, 152 | scope=scope, 153 | multiplier=depth_multiplier, 154 | **kwargs) 155 | 156 | 157 | @slim.add_arg_scope 158 | def mobilenet_base(input_tensor, depth_multiplier=1.0, **kwargs): 159 | """Creates base of the mobilenet (no pooling and no logits).""" 160 | return mobilenet(input_tensor, 161 | depth_multiplier=depth_multiplier, 162 | base_only=True, **kwargs) 163 | 164 | 165 | def training_scope(**kwargs): 166 | """Defines MobilenetV2 training scope. 167 | 168 | Usage: 169 | with tf.contrib.slim.arg_scope(mobilenet_v2.training_scope()): 170 | logits, endpoints = mobilenet_v2.mobilenet(input_tensor) 171 | 172 | 173 | 174 | Args: 175 | **kwargs: Passed to mobilenet.training_scope. The following parameters 176 | are supported: 177 | weight_decay- The weight decay to use for regularizing the model. 178 | stddev- Standard deviation for initialization, if negative uses xavier. 179 | dropout_keep_prob- dropout keep probability 180 | bn_decay- decay for the batch norm moving averages. 181 | 182 | Returns: 183 | An `arg_scope` to use for the mobilenet v2 model. 184 | """ 185 | return lib.training_scope(**kwargs) 186 | 187 | 188 | def wrapped_partial(func, *args, **kwargs): 189 | partial_func = functools.partial(func, *args, **kwargs) 190 | functools.update_wrapper(partial_func, func) 191 | return partial_func 192 | 193 | 194 | # Wrappers for mobilenet v2 with depth-multipliers. Note that 195 | # 'finegrain_classification_mode' is set to True, which means the embedding 196 | # layer will not be shrunk when given a depth-multiplier < 1.0.
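Editorial note: the `finegrain_classification_mode` branch above is a bit subtle, so here is the arithmetic for `depth_multiplier=0.35`. The last conv's `num_outputs` is pre-divided by the multiplier, so that the later multiplication inside `depth_multiplier()` (followed by rounding to a multiple of 8) lands back on the full width, while every earlier layer still shrinks:

```
last = 1280 / 0.35                         # ~3657.1, stored in conv_defs
restored = int(last * 0.35 + 4) // 8 * 8   # what depth_multiplier() computes
assert restored == 1280                    # the final layer keeps its full width
```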
197 | mobilenet_v2_140 = wrapped_partial(mobilenet, depth_multiplier=1.4) 198 | mobilenet_v2_050 = wrapped_partial(mobilenet, depth_multiplier=0.50, 199 | finegrain_classification_mode=True) 200 | mobilenet_v2_035 = wrapped_partial(mobilenet, depth_multiplier=0.35, 201 | finegrain_classification_mode=True) 202 | 203 | 204 | __all__ = ['training_scope', 'mobilenet_base', 'mobilenet', 'V2_DEF'] 205 | -------------------------------------------------------------------------------- /mobilenet/mobilenet_v2_test.py: -------------------------------------------------------------------------------- 1 | # Copyright 2018 The TensorFlow Authors. All Rights Reserved. 2 | # 3 | # Licensed under the Apache License, Version 2.0 (the "License"); 4 | # you may not use this file except in compliance with the License. 5 | # You may obtain a copy of the License at 6 | # 7 | # http://www.apache.org/licenses/LICENSE-2.0 8 | # 9 | # Unless required by applicable law or agreed to in writing, software 10 | # distributed under the License is distributed on an "AS IS" BASIS, 11 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 12 | # See the License for the specific language governing permissions and 13 | # limitations under the License. 14 | # ============================================================================== 15 | """Tests for mobilenet_v2.""" 16 | 17 | from __future__ import absolute_import 18 | from __future__ import division 19 | from __future__ import print_function 20 | import copy 21 | import tensorflow as tf 22 | from mobilenet import conv_blocks as ops 23 | from mobilenet import mobilenet 24 | from mobilenet import mobilenet_v2 25 | 26 | 27 | slim = tf.contrib.slim 28 | 29 | 30 | def find_ops(optype): 31 | """Find ops of a given type in graphdef or a graph. 32 | 33 | Args: 34 | optype: operation type (e.g. Conv2D) 35 | Returns: 36 | List of operations. 37 | """ 38 | gd = tf.get_default_graph() 39 | return [var for var in gd.get_operations() if var.type == optype] 40 | 41 | 42 | class MobilenetV2Test(tf.test.TestCase): 43 | 44 | def setUp(self): 45 | tf.reset_default_graph() 46 | 47 | def testCreation(self): 48 | spec = dict(mobilenet_v2.V2_DEF) 49 | _, ep = mobilenet.mobilenet( 50 | tf.placeholder(tf.float32, (10, 224, 224, 16)), conv_defs=spec) 51 | num_convs = len(find_ops('Conv2D')) 52 | 53 | # This is mostly a sanity test. No deep reason for these particular 54 | # constants. 55 | # 56 | # All but first 2 and last one have two convolutions, and there is one 57 | # extra conv that is not in the spec. (logits) 58 | self.assertEqual(num_convs, len(spec['spec']) * 2 - 2) 59 | # Check that depthwise outputs are exposed.
60 | for i in range(2, 17): 61 | self.assertIn('layer_%d/depthwise_output' % i, ep) 62 | 63 | def testCreationNoClasses(self): 64 | spec = copy.deepcopy(mobilenet_v2.V2_DEF) 65 | net, ep = mobilenet.mobilenet( 66 | tf.placeholder(tf.float32, (10, 224, 224, 16)), conv_defs=spec, 67 | num_classes=None) 68 | self.assertIs(net, ep['global_pool']) 69 | 70 | def testImageSizes(self): 71 | for input_size, output_size in [(224, 7), (192, 6), (160, 5), 72 | (128, 4), (96, 3)]: 73 | tf.reset_default_graph() 74 | _, ep = mobilenet_v2.mobilenet( 75 | tf.placeholder(tf.float32, (10, input_size, input_size, 3))) 76 | 77 | self.assertEqual(ep['layer_18/output'].get_shape().as_list()[1:3], 78 | [output_size] * 2) 79 | 80 | def testWithSplits(self): 81 | spec = copy.deepcopy(mobilenet_v2.V2_DEF) 82 | spec['overrides'] = { 83 | (ops.expanded_conv,): dict(split_expansion=2), 84 | } 85 | _, _ = mobilenet.mobilenet( 86 | tf.placeholder(tf.float32, (10, 224, 224, 16)), conv_defs=spec) 87 | num_convs = len(find_ops('Conv2D')) 88 | # All but 3 ops have 3 conv operators; the remaining 3 have one, 89 | # and there is one unaccounted for. 90 | self.assertEqual(num_convs, len(spec['spec']) * 3 - 5) 91 | 92 | def testWithOutputStride8(self): 93 | out, _ = mobilenet.mobilenet_base( 94 | tf.placeholder(tf.float32, (10, 224, 224, 16)), 95 | conv_defs=mobilenet_v2.V2_DEF, 96 | output_stride=8, 97 | scope='MobilenetV2') 98 | self.assertEqual(out.get_shape().as_list()[1:3], [28, 28]) 99 | 100 | def testDivisibleBy(self): 101 | tf.reset_default_graph() 102 | mobilenet_v2.mobilenet( 103 | tf.placeholder(tf.float32, (10, 224, 224, 16)), 104 | conv_defs=mobilenet_v2.V2_DEF, 105 | divisible_by=16, 106 | min_depth=32) 107 | s = [op.outputs[0].get_shape().as_list()[-1] for op in find_ops('Conv2D')] 108 | s = set(s) 109 | self.assertSameElements([32, 64, 96, 160, 192, 320, 384, 576, 960, 1280, 110 | 1001], s) 111 | 112 | def testDivisibleByWithArgScope(self): 113 | tf.reset_default_graph() 114 | # Verifies that depth_multiplier arg scope actually works 115 | # if no default min_depth is provided. 116 | with slim.arg_scope((mobilenet.depth_multiplier,), min_depth=32): 117 | mobilenet_v2.mobilenet( 118 | tf.placeholder(tf.float32, (10, 224, 224, 2)), 119 | conv_defs=mobilenet_v2.V2_DEF, depth_multiplier=0.1) 120 | s = [op.outputs[0].get_shape().as_list()[-1] for op in find_ops('Conv2D')] 121 | s = set(s) 122 | self.assertSameElements(s, [32, 192, 128, 1001]) 123 | 124 | def testFineGrained(self): 125 | tf.reset_default_graph() 126 | # Verifies that finegrain_classification_mode keeps the last layer 127 | # wide even for a very small depth_multiplier. 128 | 129 | mobilenet_v2.mobilenet( 130 | tf.placeholder(tf.float32, (10, 224, 224, 2)), 131 | conv_defs=mobilenet_v2.V2_DEF, depth_multiplier=0.01, 132 | finegrain_classification_mode=True) 133 | s = [op.outputs[0].get_shape().as_list()[-1] for op in find_ops('Conv2D')] 134 | s = set(s) 135 | # All convolutions will be 8->48, except for the last one. 136 | self.assertSameElements(s, [8, 48, 1001, 1280]) 137 | 138 | def testMobilenetBase(self): 139 | tf.reset_default_graph() 140 | # Verifies that mobilenet_base returns pre-pooling layer.
141 | with slim.arg_scope((mobilenet.depth_multiplier,), min_depth=32): 142 | net, _ = mobilenet_v2.mobilenet_base( 143 | tf.placeholder(tf.float32, (10, 224, 224, 16)), 144 | conv_defs=mobilenet_v2.V2_DEF, depth_multiplier=0.1) 145 | self.assertEqual(net.get_shape().as_list(), [10, 7, 7, 128]) 146 | 147 | def testWithOutputStride16(self): 148 | tf.reset_default_graph() 149 | out, _ = mobilenet.mobilenet_base( 150 | tf.placeholder(tf.float32, (10, 224, 224, 16)), 151 | conv_defs=mobilenet_v2.V2_DEF, 152 | output_stride=16) 153 | self.assertEqual(out.get_shape().as_list()[1:3], [14, 14]) 154 | 155 | def testWithOutputStride8AndExplicitPadding(self): 156 | tf.reset_default_graph() 157 | out, _ = mobilenet.mobilenet_base( 158 | tf.placeholder(tf.float32, (10, 224, 224, 16)), 159 | conv_defs=mobilenet_v2.V2_DEF, 160 | output_stride=8, 161 | use_explicit_padding=True, 162 | scope='MobilenetV2') 163 | self.assertEqual(out.get_shape().as_list()[1:3], [28, 28]) 164 | 165 | def testWithOutputStride16AndExplicitPadding(self): 166 | tf.reset_default_graph() 167 | out, _ = mobilenet.mobilenet_base( 168 | tf.placeholder(tf.float32, (10, 224, 224, 16)), 169 | conv_defs=mobilenet_v2.V2_DEF, 170 | output_stride=16, 171 | use_explicit_padding=True) 172 | self.assertEqual(out.get_shape().as_list()[1:3], [14, 14]) 173 | 174 | def testBatchNormScopeDoesNotHaveIsTrainingWhenItsSetToNone(self): 175 | sc = mobilenet.training_scope(is_training=None) 176 | self.assertNotIn('is_training', sc[slim.arg_scope_func_key( 177 | slim.batch_norm)]) 178 | 179 | def testBatchNormScopeDoesHasIsTrainingWhenItsNotNone(self): 180 | sc = mobilenet.training_scope(is_training=False) 181 | self.assertIn('is_training', sc[slim.arg_scope_func_key(slim.batch_norm)]) 182 | sc = mobilenet.training_scope(is_training=True) 183 | self.assertIn('is_training', sc[slim.arg_scope_func_key(slim.batch_norm)]) 184 | sc = mobilenet.training_scope() 185 | self.assertIn('is_training', sc[slim.arg_scope_func_key(slim.batch_norm)]) 186 | 187 | 188 | if __name__ == '__main__': 189 | tf.test.main() 190 | -------------------------------------------------------------------------------- /mobilenet/readme.md: -------------------------------------------------------------------------------- 1 | # MobileNetV2 2 | Placeholder for the code for MobileNetV2 from the TFSlim model library. 
3 | Copy the code from the following link and place in this folder: 4 | 5 | https://github.com/tensorflow/models/tree/master/research/slim/nets/mobilenet 6 | -------------------------------------------------------------------------------- /model.py: -------------------------------------------------------------------------------- 1 | import tensorflow as tf 2 | from mobilenet import mobilenet_v2 3 | import input_pipeline 4 | import pprint 5 | import time 6 | from PIL import Image 7 | import imagenet_preprocessing 8 | import absl 9 | from argparse import Namespace 10 | import sys 11 | import os 12 | import numpy as np 13 | import horovod.tensorflow as hvd 14 | from tensorflow.core.framework.summary_pb2 import Summary 15 | from tensorflow import logging 16 | import mobilefacenet 17 | import horovod.tensorflow as hvd 18 | from mobilenet.mobilenet import global_pool 19 | from mobilenet import conv_blocks as ops 20 | import proxy_metric_learning 21 | from inception import inception_utils 22 | from inception import inception_v1 23 | import resnet 24 | from cifar import cifar 25 | from cifar.dist import pairwise_distance_euclid, triplet_semihard_loss 26 | 27 | slim = tf.contrib.slim 28 | # Define all necessary hyperparameters as flags 29 | flags = absl.flags 30 | FLAGS = flags.FLAGS 31 | 32 | 33 | def imagenet_iterator(is_training, num_epochs=1): 34 | filter_fn = None 35 | fixed_crop = False 36 | if 'face' in FLAGS.model: 37 | input_pipeline._DEFAULT_IMAGE_SIZE = 112 38 | imagenet_preprocessing._RESIZE_MIN = 112 39 | # imagenet_preprocessing._AREA_RANGE = [1.0, 1.0] 40 | # imagenet_preprocessing._ASPECT_RATIO_RANGE = [0.9, 1.1] 41 | # is_training = False 42 | fixed_crop = True 43 | 44 | if FLAGS.model == 'metric_learning': 45 | input_pipeline._DEFAULT_IMAGE_SIZE = 112 46 | imagenet_preprocessing._RESIZE_MIN = 112 47 | imagenet_preprocessing._AREA_RANGE = [0.7, 1.0] 48 | imagenet_preprocessing._ASPECT_RATIO_RANGE = [0.9, 1.1] 49 | fixed_crop = False 50 | 51 | if FLAGS.model == 'mobilenet': 52 | imagenet_preprocessing._AREA_RANGE = [0.2, 1.0] 53 | fixed_crop = False 54 | 55 | if FLAGS.model == 'cifar100': 56 | return cifar.input_fn(FLAGS.data_dir, is_training, FLAGS.batch_size, num_epochs) 57 | else: 58 | data = input_pipeline.input_fn( 59 | is_training=is_training, 60 | data_dir=FLAGS.data_dir, 61 | batch_size=FLAGS.batch_size, 62 | num_epochs=num_epochs, 63 | return_label_text=True, 64 | filter_fn=filter_fn, 65 | fixed_crop=fixed_crop 66 | ) 67 | 68 | iterator = data.make_one_shot_iterator() 69 | features, labels, label_texts, filename = iterator.get_next() 70 | feature_dict = { 71 | "features": features, 72 | "labels": labels, 73 | "label_texts": label_texts, 74 | "filename": filename 75 | } 76 | 77 | return feature_dict, labels 78 | 79 | 80 | def model_fn(features, labels, mode): 81 | feature_dict = features 82 | labels = tf.reshape(feature_dict["labels"], [-1]) 83 | 84 | if mode == tf.estimator.ModeKeys.TRAIN: 85 | is_training = True 86 | else: 87 | is_training = False 88 | 89 | if FLAGS.model == 'mobilenet': 90 | scope = mobilenet_v2.training_scope(is_training=is_training) 91 | with tf.contrib.slim.arg_scope(scope): 92 | net, end_points = mobilenet_v2.mobilenet( 93 | feature_dict["features"], is_training=is_training, num_classes=FLAGS.num_classes) 94 | end_points['embedding'] = end_points['global_pool'] 95 | elif FLAGS.model == 'mobilefacenet': 96 | scope = mobilenet_v2.training_scope(is_training=is_training) 97 | with tf.contrib.slim.arg_scope(scope): 98 | net, end_points = 
mobilefacenet.mobilefacenet( 99 | feature_dict["features"], is_training=is_training, num_classes=FLAGS.num_classes) 100 | elif FLAGS.model == 'metric_learning': 101 | with slim.arg_scope(inception_v1.inception_v1_arg_scope(weight_decay=0.0)): 102 | net, end_points = proxy_metric_learning.metric_learning( 103 | feature_dict["features"], is_training=is_training, num_classes=FLAGS.num_classes) 104 | elif FLAGS.model == 'faceresnet': 105 | net, end_points = mobilefacenet.faceresnet( 106 | feature_dict["features"], is_training=is_training, num_classes=FLAGS.num_classes) 107 | elif FLAGS.model == 'cifar100': 108 | net, end_points = cifar.nin(feature_dict["features"], labels, 109 | is_training=is_training, num_classes=FLAGS.num_classes) 110 | else: 111 | raise ValueError("Unknown model %s" % FLAGS.model) 112 | 113 | small_embeddings = tf.squeeze(end_points['embedding']) 114 | logits = end_points['Logits'] 115 | predictions = tf.cast(tf.argmax(logits, 1), dtype=tf.int32) 116 | 117 | if FLAGS.final_activation in ['soft_thresh', 'none']: 118 | abs_embeddings = tf.abs(small_embeddings) 119 | else: 120 | abs_embeddings = small_embeddings 121 | nnz_small_embeddings = tf.cast(tf.less(FLAGS.zero_threshold, abs_embeddings), dtype=tf.float32) 122 | small_sparsity = tf.reduce_sum(nnz_small_embeddings, axis=1) 123 | small_sparsity = tf.reduce_mean(small_sparsity) 124 | 125 | mean_nnz_col = tf.reduce_mean(nnz_small_embeddings, axis=0) 126 | sum_nnz_col = tf.reduce_sum(nnz_small_embeddings, axis=0) 127 | mean_flops_ub = tf.reduce_sum(sum_nnz_col * (sum_nnz_col - 1)) / (FLAGS.batch_size * (FLAGS.batch_size - 1)) 128 | mean_flops = tf.reduce_sum(mean_nnz_col * mean_nnz_col) 129 | 130 | l1_norm_row = tf.reduce_sum(abs_embeddings, axis=1) 131 | l1_norm_col = tf.reduce_mean(abs_embeddings, axis=0) 132 | 133 | mean_l1_norm = tf.reduce_mean(l1_norm_row) 134 | mean_flops_sur = tf.reduce_sum(l1_norm_col * l1_norm_col) 135 | 136 | if mode == tf.estimator.ModeKeys.PREDICT: 137 | predictions_dict = { 138 | 'predictions': predictions, 139 | 'true_labels': feature_dict["labels"], 140 | 'true_label_texts': feature_dict["label_texts"], 141 | 'small_embeddings': small_embeddings, 142 | 'sparsity/small': small_sparsity, 143 | 'sparsity/flops': mean_flops_ub, 144 | 'filename': features["filename"] 145 | } 146 | return tf.estimator.EstimatorSpec(mode, predictions=predictions_dict) 147 | 148 | if FLAGS.model == 'mobilenet': 149 | cr_ent_loss = tf.nn.sparse_softmax_cross_entropy_with_logits(logits=logits, labels=labels) 150 | elif 'face' in FLAGS.model: 151 | cr_ent_loss = mobilefacenet.arcface_loss(logits=logits, labels=labels, out_num=FLAGS.num_classes) 152 | elif FLAGS.model == 'metric_learning': 153 | cr_ent_loss = tf.nn.sparse_softmax_cross_entropy_with_logits(logits=logits, labels=labels) 154 | # cr_ent_loss = mobilefacenet.arcface_loss(logits=logits, labels=labels, out_num=FLAGS.num_classes) 155 | elif FLAGS.model == 'cifar100': 156 | # cr_ent_loss = tf.nn.sparse_softmax_cross_entropy_with_logits(logits=logits, labels=labels) 157 | cr_ent_loss = tf.contrib.losses.metric_learning.triplet_semihard_loss( 158 | labels, small_embeddings, margin=0.1) 159 | # cr_ent_loss = mobilefacenet.arcface_loss(logits=logits, 160 | # labels=labels, out_num=FLAGS.num_classes, m=0.0) 161 | # cr_ent_loss = triplet_semihard_loss(labels=labels, embeddings=small_embeddings, 162 | # pairwise_distance=lambda embed: pairwise_distance_euclid(embed, squared=True), margin=0.3) 163 | else: 164 | raise ValueError('Unknown model %s' % FLAGS.model) 165 | 
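Editorial note: the sparsity statistics computed above are the heart of the FLOPs regularizer from the paper. `mean_flops` estimates the expected cost of comparing two random embeddings as the sum of squared per-dimension activation probabilities, `mean_flops_ub` is the matching leave-one-out estimate over the batch, and the squared l1 column means give the differentiable surrogate that is actually penalized. A numpy restatement of the same formulas (illustration only; `zero_threshold` taken as 0 here):

```
import numpy as np

emb = np.maximum(np.random.randn(32, 64), 0.0)  # batch_size x embedding_size
nnz = (emb > 0.0).astype(np.float32)
b = emb.shape[0]
mean_nnz_col = nnz.mean(axis=0)                 # activation probability p_j
sum_nnz_col = nnz.sum(axis=0)
mean_flops = np.sum(mean_nnz_col ** 2)          # sum_j p_j^2
mean_flops_ub = np.sum(sum_nnz_col * (sum_nnz_col - 1)) / (b * (b - 1))
l1_norm_col = np.abs(emb).mean(axis=0)
mean_flops_sur = np.sum(l1_norm_col ** 2)       # convex surrogate for the above
print(mean_flops, mean_flops_ub, mean_flops_sur)
```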
cr_ent_loss = tf.reduce_mean(cr_ent_loss) + tf.losses.get_regularization_loss() 166 | 167 | ema = tf.train.ExponentialMovingAverage(decay=0.99) 168 | ema_op = ema.apply([mean_l1_norm]) 169 | moving_l1_norm = ema.average(mean_l1_norm) 170 | 171 | global_step = tf.train.get_or_create_global_step() 172 | all_ops = [ema_op] 173 | 174 | l1_weight = tf.Variable(0.0, name='l1_weight', trainable=False) 175 | 176 | if FLAGS.l1_weighing_scheme == 'constant': 177 | l1_weight = FLAGS.l1_parameter 178 | elif FLAGS.l1_weighing_scheme == 'dynamic_1': 179 | l1_weight = FLAGS.l1_parameter / moving_l1_norm 180 | l1_weight = tf.stop_gradient(l1_weight) 181 | l1_weight = tf.train.piecewise_constant( 182 | x=global_step, boundaries=[5], values=[0.0, l1_weight]) 183 | elif FLAGS.l1_weighing_scheme == 'dynamic_2': 184 | if FLAGS.sparsity_type == "flops_sur": 185 | update_lr = 1e-5 186 | else: 187 | update_lr = 1e-4 188 | update = update_lr * (FLAGS.l1_parameter - cr_ent_loss) 189 | assign_op = tf.assign(l1_weight, tf.nn.relu(l1_weight + update)) 190 | all_ops.append(assign_op) 191 | elif FLAGS.l1_weighing_scheme == 'dynamic_3': 192 | update_lr = 1e-4 193 | global_step = tf.train.get_or_create_global_step() 194 | upper_bound = FLAGS.l1_parameter - (FLAGS.l1_parameter - 12.0) * tf.cast(global_step, tf.float32) / 5e5 195 | upper_bound = tf.cast(upper_bound, tf.float32) 196 | update = update_lr * tf.sign(upper_bound - cr_ent_loss) 197 | assign_op = tf.assign(l1_weight, l1_weight + tf.nn.relu(update)) 198 | all_ops.append(assign_op) 199 | elif FLAGS.l1_weighing_scheme == 'dynamic_4': 200 | l1_weight = FLAGS.l1_parameter * tf.minimum(1.0, 201 | (tf.cast(global_step, tf.float32) / FLAGS.l1_p_steps)**2) 202 | elif FLAGS.l1_weighing_scheme is None: 203 | l1_weight = 0.0 204 | else: 205 | raise ValueError('Unknown l1_weighing_scheme %s' % FLAGS.l1_weighing_scheme) 206 | 207 | if FLAGS.sparsity_type == 'l1_norm': 208 | sparsity_loss = l1_weight * mean_l1_norm 209 | elif FLAGS.sparsity_type == 'flops_sur': 210 | sparsity_loss = l1_weight * mean_flops_sur 211 | elif FLAGS.sparsity_type is None: 212 | sparsity_loss = 0.0 213 | else: 214 | raise ValueError("Unknown sparsity_type %d" % FLAGS.sparsity_type) 215 | 216 | total_loss = cr_ent_loss + sparsity_loss 217 | accuracy = tf.reduce_mean(tf.cast(tf.equal(predictions, labels), dtype=tf.float32)) 218 | 219 | if mode == tf.estimator.ModeKeys.EVAL: 220 | global_accuracy = tf.metrics.mean(accuracy) 221 | global_small_sparsity = tf.metrics.mean(small_sparsity) 222 | global_flops = tf.metrics.mean(mean_flops_ub) 223 | global_l1_norm = tf.metrics.mean(mean_l1_norm) 224 | global_mean_flops_sur = tf.metrics.mean(mean_flops_sur) 225 | global_cr_ent_loss = tf.metrics.mean(cr_ent_loss) 226 | 227 | metrics = { 228 | 'accuracy': global_accuracy, 229 | 'sparsity/small': global_small_sparsity, 230 | 'sparsity/flops': global_flops, 231 | 'l1_norm': global_l1_norm, 232 | 'l1_norm/flops_sur': global_mean_flops_sur, 233 | 'loss/cr_ent_loss': global_cr_ent_loss, 234 | } 235 | return tf.estimator.EstimatorSpec(mode, loss=total_loss, eval_metric_ops=metrics) 236 | 237 | if mode == tf.estimator.ModeKeys.TRAIN: 238 | base_lrate = FLAGS.learning_rate 239 | learning_rate = tf.train.piecewise_constant( 240 | x=global_step, 241 | boundaries=[FLAGS.decay_step if FLAGS.decay_step is not None else int(1e7)], 242 | values=[base_lrate, base_lrate / 10.0]) 243 | 244 | tf.summary.image("input", feature_dict["features"], max_outputs=1) 245 | tf.summary.scalar('sparsity/small', small_sparsity) 246 | 
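Editorial note: among the l1-weighing schemes above, `dynamic_2` adapts the regularization weight toward a target loss value: the weight grows while the classification loss is below `FLAGS.l1_parameter` and shrinks (clipped at 0) once sparsity starts hurting accuracy. A minimal sketch of that update, with a hypothetical loss trajectory:

```
def dynamic_2_step(l1_weight, cr_ent_loss, target, update_lr=1e-4):
    # l1_weight <- relu(l1_weight + lr * (target - loss)), as in the code above.
    return max(0.0, l1_weight + update_lr * (target - cr_ent_loss))

w = 0.0
for loss in [10.0, 11.0, 12.5, 13.0]:  # hypothetical loss trajectory
    w = dynamic_2_step(w, loss, target=12.0)
print(w)  # the weight rose while loss < 12, then decayed
```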
tf.summary.scalar('sparsity/flops', mean_flops_ub) 247 | tf.summary.scalar('accuracy', accuracy) 248 | tf.summary.scalar('l1_norm', mean_l1_norm) 249 | tf.summary.scalar('l1_norm/ema', moving_l1_norm) # Comment this for mom 250 | tf.summary.scalar('l1_norm/l1_weight', l1_weight) 251 | tf.summary.scalar('l1_norm/flops_sur', mean_flops_sur) 252 | tf.summary.scalar('learning_rate', learning_rate) 253 | tf.summary.scalar('loss/cr_ent_loss', cr_ent_loss) 254 | tf.summary.scalar('sparsity/ratio', 255 | mean_flops * FLAGS.embedding_size / (small_sparsity * small_sparsity)) 256 | try: 257 | tf.summary.scalar('loss/upper_bound', upper_bound) 258 | except NameError: 259 | print("Skipping 'upper_bound' summary") 260 | for variable in tf.trainable_variables(): 261 | if 'soft_thresh' in variable.name: 262 | print('Adding summary for lambda') 263 | tf.summary.scalar('lambda', variable) 264 | 265 | # Histogram summaries 266 | # gamma = tf.get_default_graph().get_tensor_by_name('MobilenetV2/Conv_2/BatchNorm/gamma:0') 267 | # pre_relu = tf.get_default_graph().get_tensor_by_name( 268 | # 'MobilenetV2/Conv_2/BatchNorm/FusedBatchNorm:0') 269 | # pre_relu = tf.squeeze(pre_relu) 270 | # tf.summary.histogram('gamma', gamma) 271 | # tf.summary.histogram('pre_relu', pre_relu[:, 237]) 272 | # tf.summary.histogram('small_activations', nnz_small_embeddings) 273 | # tf.summary.histogram('small_activations/log', tf.log(nnz_small_embeddings + 1e-10)) 274 | # fl_sur_ratio = (mean_nnz_col * mean_nnz_col) / (l1_norm_col * l1_norm_col) 275 | # tf.summary.histogram('fl_sur_ratio', fl_sur_ratio) 276 | # tf.summary.histogram('fl_sur_ratio/log', tf.log(fl_sur_ratio + 1e-10)) 277 | # l1_sur_ratio = (mean_nnz_col * mean_nnz_col) / l1_norm_col 278 | # tf.summary.histogram('l1_sur_ratio', l1_sur_ratio) 279 | # tf.summary.histogram('l1_sur_ratio/log', tf.log(l1_sur_ratio + 1e-10)) 280 | 281 | if FLAGS.optimizer == 'mom': 282 | optimizer = tf.train.MomentumOptimizer(learning_rate=learning_rate, 283 | momentum=FLAGS.momentum) 284 | elif FLAGS.optimizer == 'sgd': 285 | optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate) 286 | elif FLAGS.optimizer == 'adam': 287 | optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate, epsilon=0.0001) 288 | elif FLAGS.optimizer == 'rmsprop': 289 | optimizer = tf.train.RMSPropOptimizer(learning_rate=learning_rate, epsilon=0.01, 290 | momentum=FLAGS.momentum) 291 | else: 292 | raise ValueError("Unknown optimizer %s" % FLAGS.optimizer) 293 | 294 | # Make the optimizer distributed 295 | optimizer = hvd.DistributedOptimizer(optimizer) 296 | 297 | train_op = tf.contrib.slim.learning.create_train_op( 298 | total_loss, 299 | optimizer, 300 | global_step=tf.train.get_or_create_global_step() 301 | ) 302 | all_ops.append(train_op) 303 | merged = tf.group(*all_ops) 304 | 305 | return tf.estimator.EstimatorSpec(mode, loss=total_loss, train_op=merged) 306 | -------------------------------------------------------------------------------- /preprocessing/build_image_data.py: -------------------------------------------------------------------------------- 1 | # Copyright 2016 Google Inc. All Rights Reserved. 2 | # 3 | # Licensed under the Apache License, Version 2.0 (the "License"); 4 | # you may not use this file except in compliance with the License. 
5 | # You may obtain a copy of the License at 6 | # 7 | # http://www.apache.org/licenses/LICENSE-2.0 8 | # 9 | # Unless required by applicable law or agreed to in writing, software 10 | # distributed under the License is distributed on an "AS IS" BASIS, 11 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 12 | # See the License for the specific language governing permissions and 13 | # limitations under the License. 14 | # ============================================================================== 15 | """Converts image data to TFRecords file format with Example protos. 16 | 17 | The image data set is expected to reside in JPEG files located in the 18 | following directory structure. 19 | 20 | data_dir/label_0/image0.jpeg 21 | data_dir/label_0/image1.jpg 22 | ... 23 | data_dir/label_1/weird-image.jpeg 24 | data_dir/label_1/my-image.jpeg 25 | ... 26 | 27 | where the sub-directory is the unique label associated with these images. 28 | 29 | This TensorFlow script converts the training and evaluation data into 30 | a sharded data set consisting of TFRecord files 31 | 32 | train_directory/train-00000-of-01024 33 | train_directory/train-00001-of-01024 34 | ... 35 | train_directory/train-01023-of-01024 36 | 37 | and 38 | 39 | validation_directory/validation-00000-of-00128 40 | validation_directory/validation-00001-of-00128 41 | ... 42 | validation_directory/validation-00127-of-00128 43 | 44 | where we have selected 1024 and 128 shards for each data set. Each record 45 | within the TFRecord file is a serialized Example proto. The Example proto 46 | contains the following fields: 47 | 48 | image/encoded: string containing JPEG encoded image in RGB colorspace 49 | image/height: integer, image height in pixels 50 | image/width: integer, image width in pixels 51 | image/colorspace: string, specifying the colorspace, always 'RGB' 52 | image/channels: integer, specifying the number of channels, always 3 53 | image/format: string, specifying the format, always 'JPEG' 54 | 55 | image/filename: string containing the basename of the image file 56 | e.g. 'n01440764_10026.JPEG' or 'ILSVRC2012_val_00000293.JPEG' 57 | image/class/label: integer specifying the index in a classification layer. 58 | The label ranges from [0, num_labels] where 0 is unused and left as 59 | the background class. 60 | image/class/text: string specifying the human-readable version of the label 61 | e.g. 'dog' 62 | 63 | If your data set involves bounding boxes, please look at build_imagenet_data.py. 64 | """ 65 | from __future__ import absolute_import 66 | from __future__ import division 67 | from __future__ import print_function 68 | 69 | from datetime import datetime 70 | import os 71 | import random 72 | import sys 73 | import threading 74 | import json 75 | 76 | import numpy as np 77 | import tensorflow as tf 78 | 79 | tf.app.flags.DEFINE_string('train_directory', None, 80 | 'Training data directory') 81 | tf.app.flags.DEFINE_string('validation_directory', None, 82 | 'Validation data directory') 83 | tf.app.flags.DEFINE_string('output_directory', None, 84 | 'Output data directory') 85 | 86 | tf.app.flags.DEFINE_integer('train_shards', 1024, 87 | 'Number of shards in training TFRecord files.') 88 | tf.app.flags.DEFINE_integer('validation_shards', 1024, 89 | 'Number of shards in validation TFRecord files.') 90 | 91 | tf.app.flags.DEFINE_integer('num_threads', 16, 92 | 'Number of threads to preprocess the images.') 93 | 94 | # The labels file contains a list of valid labels are held in this file. 
95 | # Assumes that the file contains entries as such: 96 | # dog 97 | # cat 98 | # flower 99 | # where each line corresponds to a label. We map each label contained in 100 | # the file to an integer corresponding to the line number starting from 0. 101 | tf.app.flags.DEFINE_string('labels_file', None, 'Labels file') 102 | tf.app.flags.DEFINE_string('filelist', None, 'File list') 103 | 104 | 105 | FLAGS = tf.app.flags.FLAGS 106 | 107 | 108 | def _int64_feature(value): 109 | """Wrapper for inserting int64 features into Example proto.""" 110 | if not isinstance(value, list): 111 | value = [value] 112 | return tf.train.Feature(int64_list=tf.train.Int64List(value=value)) 113 | 114 | 115 | def _bytes_feature(value): 116 | """Wrapper for inserting bytes features into Example proto.""" 117 | return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value])) 118 | 119 | 120 | def _convert_to_example(filename, image_buffer, label, text, height, width): 121 | """Build an Example proto for an example. 122 | 123 | Args: 124 | filename: string, path to an image file, e.g., '/path/to/example.JPG' 125 | image_buffer: string, JPEG encoding of RGB image 126 | label: integer, identifier for the ground truth for the network 127 | text: string, unique human-readable, e.g. 'dog' 128 | height: integer, image height in pixels 129 | width: integer, image width in pixels 130 | Returns: 131 | Example proto 132 | """ 133 | 134 | colorspace = 'RGB' 135 | channels = 3 136 | image_format = 'JPEG' 137 | 138 | example = tf.train.Example(features=tf.train.Features(feature={ 139 | 'image/height': _int64_feature(height), 140 | 'image/width': _int64_feature(width), 141 | 'image/colorspace': _bytes_feature(tf.compat.as_bytes(colorspace)), 142 | 'image/channels': _int64_feature(channels), 143 | 'image/class/label': _int64_feature(label), 144 | 'image/class/text': _bytes_feature(tf.compat.as_bytes(text)), 145 | 'image/format': _bytes_feature(tf.compat.as_bytes(image_format)), 146 | 'image/filename': _bytes_feature(tf.compat.as_bytes(os.path.basename(filename))), 147 | 'image/encoded': _bytes_feature(tf.compat.as_bytes(image_buffer))})) 148 | return example 149 | 150 | 151 | class ImageCoder(object): 152 | """Helper class that provides TensorFlow image coding utilities.""" 153 | 154 | def __init__(self): 155 | # Create a single Session to run all image coding calls. 156 | self._sess = tf.Session() 157 | 158 | # Initializes function that converts PNG to JPEG data. 159 | self._png_data = tf.placeholder(dtype=tf.string) 160 | image = tf.image.decode_png(self._png_data, channels=3) 161 | self._png_to_jpeg = tf.image.encode_jpeg(image, format='rgb', quality=100) 162 | 163 | # Initializes function that decodes RGB JPEG data. 164 | self._decode_jpeg_data = tf.placeholder(dtype=tf.string) 165 | self._decode_jpeg = tf.image.decode_jpeg(self._decode_jpeg_data, channels=3) 166 | 167 | def png_to_jpeg(self, image_data): 168 | return self._sess.run(self._png_to_jpeg, 169 | feed_dict={self._png_data: image_data}) 170 | 171 | def decode_jpeg(self, image_data): 172 | image = self._sess.run(self._decode_jpeg, 173 | feed_dict={self._decode_jpeg_data: image_data}) 174 | assert len(image.shape) == 3 175 | assert image.shape[2] == 3 176 | return image 177 | 178 | 179 | def _is_png(filename): 180 | """Determine if a file contains a PNG format image. 181 | 182 | Args: 183 | filename: string, path of the image file. 184 | 185 | Returns: 186 | boolean indicating if the image is a PNG. 
187 | """ 188 | return filename.endswith('.png') 189 | 190 | 191 | def _process_image(filename, coder): 192 | """Process a single image file. 193 | 194 | Args: 195 | filename: string, path to an image file e.g., '/path/to/example.JPG'. 196 | coder: instance of ImageCoder to provide TensorFlow image coding utils. 197 | Returns: 198 | image_buffer: string, JPEG encoding of RGB image. 199 | height: integer, image height in pixels. 200 | width: integer, image width in pixels. 201 | """ 202 | # Read the image file. 203 | with tf.gfile.FastGFile(filename, 'rb') as f: 204 | image_data = f.read() 205 | 206 | # Convert any PNG to JPEG's for consistency. 207 | if _is_png(filename): 208 | # print('Converting PNG to JPEG for %s' % filename) 209 | image_data = coder.png_to_jpeg(image_data) 210 | 211 | # Decode the RGB JPEG. 212 | image = coder.decode_jpeg(image_data) 213 | 214 | # Check that image converted to RGB 215 | assert len(image.shape) == 3 216 | height = image.shape[0] 217 | width = image.shape[1] 218 | assert image.shape[2] == 3 219 | 220 | return image_data, height, width 221 | 222 | 223 | def _process_image_files_batch(coder, thread_index, ranges, name, filenames, 224 | texts, labels, num_shards): 225 | """Processes and saves list of images as TFRecord in 1 thread. 226 | 227 | Args: 228 | coder: instance of ImageCoder to provide TensorFlow image coding utils. 229 | thread_index: integer, unique batch to run index is within [0, len(ranges)). 230 | ranges: list of pairs of integers specifying ranges of each batches to 231 | analyze in parallel. 232 | name: string, unique identifier specifying the data set 233 | filenames: list of strings; each string is a path to an image file 234 | texts: list of strings; each string is human readable, e.g. 'dog' 235 | labels: list of integer; each integer identifies the ground truth 236 | num_shards: integer number of shards for this data set. 237 | """ 238 | # Each thread produces N shards where N = int(num_shards / num_threads). 239 | # For instance, if num_shards = 128, and the num_threads = 2, then the first 240 | # thread would produce shards [0, 64). 241 | num_threads = len(ranges) 242 | assert not num_shards % num_threads 243 | num_shards_per_batch = int(num_shards / num_threads) 244 | 245 | shard_ranges = np.linspace(ranges[thread_index][0], 246 | ranges[thread_index][1], 247 | num_shards_per_batch + 1).astype(int) 248 | num_files_in_thread = ranges[thread_index][1] - ranges[thread_index][0] 249 | 250 | counter = 0 251 | for s in range(num_shards_per_batch): 252 | # Generate a sharded version of the file name, e.g. 'train-00002-of-00010' 253 | shard = thread_index * num_shards_per_batch + s 254 | output_filename = '%s-%.5d-of-%.5d' % (name, shard, num_shards) 255 | output_file = os.path.join(FLAGS.output_directory, output_filename) 256 | writer = tf.python_io.TFRecordWriter(output_file) 257 | 258 | shard_counter = 0 259 | files_in_shard = np.arange(shard_ranges[s], shard_ranges[s + 1], dtype=int) 260 | for i in files_in_shard: 261 | filename = filenames[i] 262 | label = labels[i] 263 | text = texts[i] 264 | 265 | try: 266 | image_buffer, height, width = _process_image(filename, coder) 267 | except Exception as e: 268 | print(e) 269 | print('SKIPPED: Unexpected error while decoding %s.' 
% filename) 270 | continue 271 | 272 | example = _convert_to_example(filename, image_buffer, label, 273 | text, height, width) 274 | writer.write(example.SerializeToString()) 275 | shard_counter += 1 276 | counter += 1 277 | 278 | if not counter % 1000: 279 | print('%s [thread %d]: Processed %d of %d images in thread batch.' % 280 | (datetime.now(), thread_index, counter, num_files_in_thread)) 281 | sys.stdout.flush() 282 | 283 | writer.close() 284 | print('%s [thread %d]: Wrote %d images to %s' % 285 | (datetime.now(), thread_index, shard_counter, output_file)) 286 | sys.stdout.flush() 287 | shard_counter = 0 288 | print('%s [thread %d]: Wrote %d images to %d shards.' % 289 | (datetime.now(), thread_index, counter, num_files_in_thread)) 290 | sys.stdout.flush() 291 | 292 | 293 | def _process_image_files(name, filenames, texts, labels, num_shards): 294 | """Process and save list of images as TFRecord of Example protos. 295 | 296 | Args: 297 | name: string, unique identifier specifying the data set 298 | filenames: list of strings; each string is a path to an image file 299 | texts: list of strings; each string is human readable, e.g. 'dog' 300 | labels: list of integer; each integer identifies the ground truth 301 | num_shards: integer number of shards for this data set. 302 | """ 303 | assert len(filenames) == len(texts) 304 | assert len(filenames) == len(labels) 305 | 306 | # Break all images into batches with a [ranges[i][0], ranges[i][1]]. 307 | spacing = np.linspace(0, len(filenames), FLAGS.num_threads + 1).astype(np.int) 308 | ranges = [] 309 | for i in range(len(spacing) - 1): 310 | ranges.append([spacing[i], spacing[i + 1]]) 311 | 312 | # Launch a thread for each batch. 313 | print('Launching %d threads for spacings: %s' % (FLAGS.num_threads, ranges)) 314 | sys.stdout.flush() 315 | 316 | # Create a mechanism for monitoring when all threads are finished. 317 | coord = tf.train.Coordinator() 318 | 319 | # Create a generic TensorFlow-based utility for converting all image codings. 320 | coder = ImageCoder() 321 | 322 | threads = [] 323 | for thread_index in range(len(ranges)): 324 | args = (coder, thread_index, ranges, name, filenames, 325 | texts, labels, num_shards) 326 | t = threading.Thread(target=_process_image_files_batch, args=args) 327 | t.start() 328 | threads.append(t) 329 | 330 | # Wait for all the threads to terminate. 331 | coord.join(threads) 332 | print('%s: Finished writing all %d images in data set.' % 333 | (datetime.now(), len(filenames))) 334 | sys.stdout.flush() 335 | 336 | 337 | def _find_image_files(data_dir, labels_file): 338 | """Build a list of all images files and labels in the data set. 339 | 340 | Args: 341 | data_dir: string, path to the root directory of images. 342 | 343 | Assumes that the image data set resides in JPEG files located in 344 | the following directory structure. 345 | 346 | data_dir/dog/another-image.JPEG 347 | data_dir/dog/my-image.jpg 348 | 349 | where 'dog' is the label associated with these images. 350 | 351 | labels_file: string, path to the labels file. 352 | 353 | The list of valid labels are held in this file. Assumes that the file 354 | contains entries as such: 355 | dog 356 | cat 357 | flower 358 | where each line corresponds to a label. We map each label contained in 359 | the file to an integer starting with the integer 0 corresponding to the 360 | label contained in the first line. 361 | 362 | Returns: 363 | filenames: list of strings; each string is a path to an image file. 
364 | texts: list of strings; each string is the class, e.g. 'dog' 365 | labels: list of integer; each integer identifies the ground truth. 366 | """ 367 | print('Determining list of input files and labels from %s.' % data_dir) 368 | unique_labels = [l.strip() for l in tf.gfile.FastGFile( 369 | labels_file, 'r').readlines()] 370 | 371 | labels = [] 372 | filenames = [] 373 | texts = [] 374 | 375 | # Leave label index 0 empty as a background class. 376 | label_index = 1 377 | 378 | # Construct the list of JPEG files and labels. 379 | for text in unique_labels: 380 | patterns = [] 381 | for extension in ['JPEG', 'jpg', 'JPG', 'jpeg', 'png', 'PNG']: 382 | pattern = '%s/%s/*.%s' % (data_dir, text, extension) 383 | patterns.append(pattern) 384 | matching_files = tf.gfile.Glob(patterns) 385 | 386 | labels.extend([label_index] * len(matching_files)) 387 | texts.extend([text] * len(matching_files)) 388 | filenames.extend(matching_files) 389 | 390 | if not label_index % 100: 391 | print('Finished finding files in %d of %d classes.' % ( 392 | label_index, len(unique_labels))) 393 | label_index += 1 394 | 395 | # Shuffle the ordering of all image files in order to guarantee 396 | # random ordering of the images with respect to label in the 397 | # saved TFRecord files. Make the randomization repeatable. 398 | shuffled_index = list(range(len(filenames))) 399 | random.seed(12345) 400 | random.shuffle(shuffled_index) 401 | 402 | filenames = [filenames[i] for i in shuffled_index] 403 | texts = [texts[i] for i in shuffled_index] 404 | labels = [labels[i] for i in shuffled_index] 405 | 406 | print('Found %d JPEG files across %d labels inside %s.' % 407 | (len(filenames), len(unique_labels), data_dir)) 408 | return filenames, texts, labels 409 | 410 | 411 | def _process_dataset(name, directory, num_shards, labels_file=None, filelist=None): 412 | """Process a complete data set and save it as a TFRecord. 413 | 414 | Args: 415 | name: string, unique identifier specifying the data set. 416 | directory: string, root path to the data set. 417 | num_shards: integer number of shards for this data set. 418 | labels_file: string, path to the labels file. 
419 | """ 420 | if filelist is not None: 421 | with open(filelist, 'r') as fin: 422 | filelist = json.load(fin) 423 | filenames = filelist["path"] 424 | filenames = [os.path.join(directory, path) for path in filenames] 425 | if "id" in filelist: 426 | texts = filelist["id"] 427 | label_dict = {} 428 | labels = [] 429 | for label in texts: 430 | if label not in label_dict: 431 | label_dict[label] = len(label_dict) 432 | labels.append(label_dict[label]) 433 | else: 434 | texts = ["_"] * len(filenames) 435 | labels = [-1] * len(filenames) 436 | 437 | shuffled_index = list(range(len(filenames))) 438 | random.seed(12345) 439 | random.shuffle(shuffled_index) 440 | 441 | filenames = [filenames[i] for i in shuffled_index] 442 | texts = [texts[i] for i in shuffled_index] 443 | labels = [labels[i] for i in shuffled_index] 444 | 445 | print("*" * 10, "Found %d files" % len(filenames)) 446 | 447 | else: 448 | filenames, texts, labels = _find_image_files(directory, labels_file) 449 | _process_image_files(name, filenames, texts, labels, num_shards) 450 | 451 | 452 | def main(unused_argv): 453 | assert not FLAGS.train_shards % FLAGS.num_threads, ( 454 | 'Please make the FLAGS.num_threads commensurate with FLAGS.train_shards') 455 | assert not FLAGS.validation_shards % FLAGS.num_threads, ( 456 | 'Please make the FLAGS.num_threads commensurate with ' 457 | 'FLAGS.validation_shards') 458 | print('Saving results to %s' % FLAGS.output_directory) 459 | 460 | assert not FLAGS.output_directory is None, ('Please specify a output directory') 461 | 462 | # Run it! 463 | if FLAGS.validation_directory is not None: 464 | if FLAGS.filelist is not None: 465 | _process_dataset('validation', FLAGS.validation_directory, 466 | FLAGS.validation_shards, filelist=FLAGS.filelist) 467 | else: 468 | _process_dataset('validation', FLAGS.validation_directory, 469 | FLAGS.validation_shards, labels_file=FLAGS.labels_file) 470 | else: 471 | print("Validation directory not specified, skipping") 472 | 473 | if FLAGS.train_directory is not None: 474 | if FLAGS.filelist is not None: 475 | _process_dataset('train', FLAGS.train_directory, 476 | FLAGS.train_shards, filelist=FLAGS.filelist) 477 | else: 478 | _process_dataset('train', FLAGS.train_directory, 479 | FLAGS.train_shards, labels_file=FLAGS.labels_file) 480 | else: 481 | print("Train directory not specified, skipping") 482 | 483 | if __name__ == '__main__': 484 | tf.app.run() 485 | -------------------------------------------------------------------------------- /preprocessing/msceleb_to_jpeg.py: -------------------------------------------------------------------------------- 1 | ''' 2 | Extracts jpeg images from MXNet format dataset. 
3 | Usage: python msceleb_to_jpeg.py dataset_directory
4 | Extracted images will be stored in dataset_directory/images
5 | '''
6 |
7 | import os
8 | import random
9 | import logging
10 | import sys
11 | import numbers
12 | import numpy as np
13 |
14 | import mxnet as mx
15 | from mxnet import ndarray as nd
16 | from mxnet import io
17 | from mxnet import recordio
18 | from PIL import Image
19 | import pickle
20 |
21 |
22 | def get_msceleb_images(records_dir):
23 |     imgidx_path = os.path.join(records_dir, "train.idx")
24 |     imgrec_path = os.path.join(records_dir, "train.rec")
25 |     images_dir = os.path.join(records_dir, "images")
26 |
27 |     imgrec = recordio.MXIndexedRecordIO(imgidx_path, imgrec_path, 'r')
28 |     s = imgrec.read_idx(0)
29 |     header, _ = recordio.unpack(s)
30 |     tot_images = int(header.label[0]) - 1
31 |     print("Total images", tot_images)
32 |     for i in range(tot_images):
33 |         print("Reading ", i)
34 |         s = imgrec.read()
35 |         header, img = recordio.unpack(s)
36 |         img = mx.image.imdecode(img).asnumpy()
37 |         label = int(header.label)
38 |         img = Image.fromarray(np.uint8(img), "RGB")
39 |         images_subdir = os.path.join(images_dir, "identity_%d" % label)
40 |         if not os.path.exists(images_subdir):
41 |             os.makedirs(images_subdir)
42 |         image_path = os.path.join(images_subdir, "image_%d.jpg" % i)
43 |         img.save(image_path)
44 |
45 |
46 | def get_lfw_images(records_dir):
47 |     lfw_bin = os.path.join(records_dir, "lfw.bin")
48 |     bins, issame_list = pickle.load(open(lfw_bin, "rb"), encoding='bytes')
49 |     lfw_images = os.path.join(records_dir, "lfw_images")
50 |     if not os.path.exists(lfw_images):
51 |         os.makedirs(lfw_images)
52 |
53 |     for i in range(1000):
54 |         bin_ = bins[i]
55 |         img = mx.image.imdecode(bin_).asnumpy()
56 |         img = Image.fromarray(np.uint8(img), "RGB")
57 |         image_path = os.path.join(lfw_images, "image_%d.jpg" % i)
58 |         img.save(image_path)
59 |
60 |
61 | def main(records_dir):
62 |     get_msceleb_images(records_dir)
63 |     # get_lfw_images(records_dir)
64 |
65 |
66 | if __name__ == "__main__":
67 |     main(sys.argv[1])
68 |
-------------------------------------------------------------------------------- /preprocessing/preprocess_fs_mf_official.sh: --------------------------------------------------------------------------------
1 | #!/bin/bash
2 |
3 | # Script to preprocess a set of labelled images and store them in
4 | # sharded TFRecord files. The sharded files are stored under
5 | # the data directory.
6 | # Assumes that the aligned images live inside the data directory.
7 | # The file list is a JSON file holding the relative image paths
8 | # (key "path") and, optionally, their identities (key "id"), as
9 | # expected by build_image_data.py. For example
10 | # {"path": ["a/1.jpg", "a/2.jpg"], "id": ["a", "a"]}
11 | #
12 | #
13 | # usage:
14 | # ./preprocess_fs_mf_official.sh data-dir file-list
15 |
16 | set -e
17 |
18 | if (( $# != 2 )); then
19 |   echo "Usage: preprocess_fs_mf_official.sh [data dir] [file list]"
20 |   exit 1
21 | fi
22 |
23 | # Build the TFRecords version of the FaceScrub / MegaFace data.
24 | echo "Now creating TFRecord files. This might take a while."
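# For reference, a typical invocation might be (paths are illustrative,
# assuming the MegaFace feature list shipped with the challenge data):
#   ./preprocess_fs_mf_official.sh ../megaface_distractors ../megaface_distractors/megaface_features_list.json_1000000_1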
25 | BUILD_SCRIPT="build_image_data.py"
26 |
27 | IMAGES_DIR="${1}/aligned_if/"
28 | OUTPUT_DIR="${1}/tfrecords_official/"
29 | FILELIST="${2}"
30 |
31 | echo "Writing to ${OUTPUT_DIR}"
32 | mkdir -p "${OUTPUT_DIR}"
33 |
34 | export CUDA_VISIBLE_DEVICES=""
35 | python "${BUILD_SCRIPT}" \
36 |   --validation_directory="${IMAGES_DIR}" \
37 |   --output_directory="${OUTPUT_DIR}" \
38 |   --filelist="${FILELIST}" \
39 |   --validation_shards=128
40 |
41 | echo "Done"
42 |
-------------------------------------------------------------------------------- /preprocessing/preprocess_msceleb.sh: --------------------------------------------------------------------------------
1 | #!/bin/bash
2 |
3 | # Script to preprocess a set of labelled images and store them in
4 | # sharded TFRecord files. The sharded files are stored under
5 | # the data directory.
6 | # Assumes that the images are organized into folders according
7 | # to their labels inside the data directory.
8 | # labels.txt should contain a list of the labels. For example
9 | # identity_0
10 | # identity_1
11 | # ...
12 | #
13 | # usage:
14 | # ./preprocess_msceleb.sh data-dir
15 |
16 | set -e
17 |
18 | if (( $# != 1 )); then
19 |   echo "Usage: preprocess_msceleb.sh [data dir]"
20 |   exit 1
21 | fi
22 |
23 | # Build the TFRecords version of the MS-Celeb-1M data.
24 | echo "Now creating TFRecord files. This might take a while."
25 | BUILD_SCRIPT="build_image_data.py"
26 |
27 | IMAGES_DIR="${1}/images"
28 | OUTPUT_DIR="${1}/tfrecords/train"
29 | LABELS_FILE="${1}/labels.txt"
30 |
31 | echo "Writing to ${OUTPUT_DIR}"
32 | mkdir -p "${OUTPUT_DIR}"
33 |
34 | python "${BUILD_SCRIPT}" \
35 |   --train_directory="${IMAGES_DIR}" \
36 |   --output_directory="${OUTPUT_DIR}" \
37 |   --labels_file="${LABELS_FILE}"
38 |
39 | echo "Done"
40 |
-------------------------------------------------------------------------------- /proxy_metric_learning.py: --------------------------------------------------------------------------------
1 | import tensorflow as tf
2 | from mobilenet import mobilenet_v2
3 | import input_pipeline
4 | import pprint
5 | import time
6 | from PIL import Image
7 | import imagenet_preprocessing
8 | import absl
9 | from argparse import Namespace
10 | import sys
11 | import os
12 | import numpy as np
13 | import horovod.tensorflow as hvd
14 | from tensorflow.core.framework.summary_pb2 import Summary
15 | from tensorflow import logging
16 | import mobilefacenet
17 |
18 | from mobilenet.mobilenet import global_pool
19 | from mobilenet import conv_blocks as ops
20 | from inception import inception_v1, inception_v2, inception_v3, inception_v4
21 | from inception import inception_utils
22 |
23 | inception_v1.default_image_size = 112
24 | inception_v2.default_image_size = 112
25 | inception_v3.default_image_size = 112
26 | inception_v4.default_image_size = 112
27 |
28 | slim = tf.contrib.slim
29 | # Define all necessary hyperparameters as flags
30 | flags = absl.flags
31 | FLAGS = flags.FLAGS
32 |
33 | expand_input = ops.expand_input_by_factor
34 |
35 |
36 | # class ProxyInitializerHook(tf.train.SessionRunHook):
37 | #     def begin(self):
38 | #         print("*" * 20, "begin called")
39 | #         with tf.variable_scope('Logits', reuse=True):
40 | #             weights = tf.get_variable(name='proxy_wts')
41 | #         with tf.variable_scope('proxy_init'):
42 | #             weights_placeholder = tf.placeholder(dtype=tf.float32, name='weights_placeholder')
43 | #             self.assign_op = tf.assign(weights, weights_placeholder)
44 |
45 | #     def after_create_session(self, session, coord):
46 | #
global_step = tf.train.get_or_create_global_step() 47 | # step = sess.run([global_step]) 48 | # if step == 0: 49 | 50 | 51 | 52 | def proxy_layer(embedding, num_classes): 53 | with tf.variable_scope('Logits'): 54 | weights = tf.get_variable(name='proxy_wts', 55 | shape=(embedding.shape[-1], num_classes), 56 | initializer=tf.random_normal_initializer(stddev=0.001), 57 | dtype=tf.float32) 58 | weights = tf.nn.l2_normalize(weights, axis=0) 59 | # bias = tf.Variable(tf.zeros([num_classes])) 60 | alpha = 16.0 61 | logits = alpha * tf.matmul(embedding, weights) # + bias 62 | return logits 63 | 64 | 65 | def arc_logits(embedding, out_num): 66 | with tf.variable_scope('Logits'): 67 | weights = tf.get_variable(name='proxy_wts', shape=(embedding.shape[-1], out_num), 68 | initializer=tf.random_normal_initializer(stddev=0.001), 69 | dtype=tf.float32) 70 | weights = tf.nn.l2_normalize(weights, axis=0) 71 | # assign_op = tf.assign(weights, tf.nn.l2_normalize(weights, axis=0)) 72 | # with tf.control_dependencies([assign_op]): 73 | cos_t = tf.matmul(embedding, weights, name='cos_t') 74 | 75 | return cos_t 76 | 77 | 78 | def soft_thresh(lam): 79 | # lam = tf.get_variable( 80 | # "soft_thresh", shape=(), 81 | # initializer=tf.zeros_initializer, 82 | # constraint=lambda x: "hello") 83 | # lam = lam 84 | def activation(x): 85 | ret = tf.nn.relu(x - lam) - tf.nn.relu(-x - lam) 86 | return ret 87 | return activation 88 | 89 | 90 | def metric_learning(features, is_training, num_classes): 91 | # embedding, end_points = inception_v1.inception_v1( 92 | # features, embedding_dim=FLAGS.embedding_size, use_bn=True, is_training=is_training, 93 | # global_pool=True) 94 | embedding, end_points = inception_v4.inception_v4( 95 | features, num_classes=FLAGS.embedding_size, is_training=is_training, 96 | create_aux_logits=False) 97 | if FLAGS.final_activation == 'soft_thresh': 98 | act = soft_thresh(0.5) 99 | elif FLAGS.final_activation == 'relu': 100 | act = tf.nn.relu 101 | elif FLAGS.final_activation == 'none': 102 | act = lambda x: x 103 | else: 104 | raise ValueError('Unsupported activation', FLAGS.final_activation) 105 | embedding = act(embedding) 106 | embedding = tf.nn.l2_normalize(embedding, axis=1) 107 | end_points["embedding"] = embedding 108 | # net = tf.squeeze(net, axis=[1, 2]) 109 | logits = proxy_layer(embedding, num_classes) 110 | end_points["Logits"] = logits 111 | 112 | return logits, end_points 113 | -------------------------------------------------------------------------------- /resnet/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/biswajitsc/sparse-embed/775404d38ce4382251b88168820da3740c1323ea/resnet/__init__.py -------------------------------------------------------------------------------- /run_scripts/cifar/dense_64.sh: -------------------------------------------------------------------------------- 1 | export CUDA_VISIBLE_DEVICES="1" 2 | python train.py \ 3 | --model_dir=checkpoints/cifar/dense_64_run_1 \ 4 | --learning_rate=1e-4 --optimizer=adam \ 5 | --data_dir=../cifar-100 --evaluate=True --num_classes=101 \ 6 | --model=cifar100 --batch_size=512 --embedding_size=64 --final_activation=none \ 7 | --max_steps=20000 --decay_step=10000 8 | -------------------------------------------------------------------------------- /run_scripts/cifar/sparse_64.sh: -------------------------------------------------------------------------------- 1 | # soft thresh lam = 0.1 margin = 0.3 -> 0.5 l1_param decreased 2 | # export 
CUDA_VISIBLE_DEVICES="1" 3 | # python train.py \ 4 | # --model_dir=checkpoints/kubernetes/cifar/sparse_64_run_1 \ 5 | # --learning_rate=1e-3 --optimizer=adam \ 6 | # --data_dir=../cifar-100 --evaluate=True --num_classes=101 \ 7 | # --model=cifar100 --batch_size=512 --embedding_size=64 --final_activation=soft_thresh \ 8 | # --max_steps=25000 --sparsity_type=flops_sur --l1_parameter=0.01 --l1_p_steps=21000 \ 9 | # --decay_step=10000 --l1_weighing_scheme=dynamic_4 10 | 11 | # soft thresh lam = 0.1 margin = 0.5 12 | # export CUDA_VISIBLE_DEVICES="1" 13 | # python train.py \ 14 | # --model_dir=checkpoints/kubernetes/cifar/sparse_64_run_2 \ 15 | # --learning_rate=1e-3 --optimizer=adam \ 16 | # --data_dir=../cifar-100 --evaluate=True --num_classes=101 \ 17 | # --model=cifar100 --batch_size=512 --embedding_size=64 --final_activation=soft_thresh \ 18 | # --max_steps=20000 --sparsity_type=flops_sur --l1_parameter=0.002 --l1_p_steps=21000 \ 19 | # --decay_step=10000 --l1_weighing_scheme=dynamic_4 20 | 21 | # soft_thresh lam = 0.2 margin = 0.3 22 | # export CUDA_VISIBLE_DEVICES="1" 23 | # python train.py \ 24 | # --model_dir=checkpoints/kubernetes/cifar/sparse_64_run_3 \ 25 | # --learning_rate=1e-3 --optimizer=adam \ 26 | # --data_dir=../cifar-100 --evaluate=True --num_classes=101 \ 27 | # --model=cifar100 --batch_size=512 --embedding_size=64 --final_activation=soft_thresh \ 28 | # --max_steps=25000 --decay_step=10000 29 | 30 | # soft_thresh lam = 0.1 margin = 0.3 31 | # export CUDA_VISIBLE_DEVICES="1" 32 | # python train.py \ 33 | # --model_dir=checkpoints/kubernetes/cifar/sparse_64_run_4 \ 34 | # --learning_rate=1e-3 --optimizer=adam \ 35 | # --data_dir=../cifar-100 --evaluate=True --num_classes=101 \ 36 | # --model=cifar100 --batch_size=512 --embedding_size=64 --final_activation=soft_thresh \ 37 | # --max_steps=25000 --decay_step=10000 38 | 39 | 40 | # changing l2 regularization from 0.00004 -> 0.0001 and 0.0005 -> 0.001 41 | # export CUDA_VISIBLE_DEVICES="3" 42 | # python train.py \ 43 | # --model_dir=checkpoints/kubernetes/cifar/sparse_64_run_5 \ 44 | # --learning_rate=1e-3 --optimizer=adam \ 45 | # --data_dir=../cifar-100 --evaluate=True --num_classes=101 \ 46 | # --model=cifar100 --batch_size=512 --embedding_size=64 --final_activation=soft_thresh \ 47 | # --max_steps=25000 --decay_step=10000 48 | 49 | # changing last layer l2 regularization from 0.001 -> 0.005 50 | # export CUDA_VISIBLE_DEVICES="2" 51 | # python train.py \ 52 | # --model_dir=checkpoints/kubernetes/cifar/sparse_64_run_6 \ 53 | # --learning_rate=1e-3 --optimizer=adam \ 54 | # --data_dir=../cifar-100 --evaluate=True --num_classes=101 \ 55 | # --model=cifar100 --batch_size=512 --embedding_size=64 --final_activation=soft_thresh \ 56 | # --max_steps=25000 --decay_step=10000 57 | 58 | # export CUDA_VISIBLE_DEVICES="0" 59 | # python train.py \ 60 | # --model_dir=checkpoints/kubernetes/cifar/sparse_64_run_7 \ 61 | # --learning_rate=1e-3 --optimizer=adam \ 62 | # --data_dir=../cifar-100 --evaluate=True --num_classes=101 \ 63 | # --model=cifar100 --batch_size=512 --embedding_size=64 --final_activation=soft_thresh \ 64 | # --max_steps=30000 --decay_step=10000 --l1_weighing_scheme=dynamic_4 \ 65 | # --l1_parameter=0.002 --l1_p_steps=12000 66 | 67 | # ****************************** 68 | # Changing margin from 0.3 to 0.2 69 | # export CUDA_VISIBLE_DEVICES="2" 70 | # python train.py \ 71 | # --model_dir=checkpoints/kubernetes/cifar/sparse_64_run_8 \ 72 | # --learning_rate=1e-3 --optimizer=adam \ 73 | # --data_dir=../cifar-100 --evaluate=True 
--num_classes=101 \ 74 | # --model=cifar100 --batch_size=512 --embedding_size=64 --final_activation=soft_thresh \ 75 | # --max_steps=40000 --decay_step=10000 --l1_weighing_scheme=dynamic_4 \ 76 | # --l1_parameter=0.01 --l1_p_steps=12000 77 | 78 | # export CUDA_VISIBLE_DEVICES="0" 79 | # python train.py \ 80 | # --model_dir=checkpoints/kubernetes/cifar/sparse_64_run_9 \ 81 | # --learning_rate=1e-3 --optimizer=adam \ 82 | # --data_dir=../cifar-100 --evaluate=True --num_classes=101 \ 83 | # --model=cifar100 --batch_size=512 --embedding_size=64 --final_activation=soft_thresh \ 84 | # --max_steps=30000 --decay_step=6000 --l1_weighing_scheme=dynamic_4 \ 85 | # --l1_parameter=0.02 --l1_p_steps=20000 86 | 87 | # export CUDA_VISIBLE_DEVICES="0" 88 | # python train.py \ 89 | # --model_dir=checkpoints/kubernetes/cifar/sparse_64_run_10 \ 90 | # --learning_rate=1e-3 --optimizer=adam \ 91 | # --data_dir=../cifar-100 --evaluate=True --num_classes=101 \ 92 | # --model=cifar100 --batch_size=512 --embedding_size=64 --final_activation=soft_thresh \ 93 | # --max_steps=50000 --decay_step=10000 --l1_weighing_scheme=dynamic_4 \ 94 | # --l1_parameter=0.001 --l1_p_steps=12000 95 | 96 | # decreasing final regularization to 0.001 and margin to 0.1 97 | # export CUDA_VISIBLE_DEVICES="1" 98 | # python train.py \ 99 | # --model_dir=checkpoints/kubernetes/cifar/sparse_64_run_11 \ 100 | # --learning_rate=1e-3 --optimizer=adam \ 101 | # --data_dir=../cifar-100 --evaluate=True --num_classes=101 \ 102 | # --model=cifar100 --batch_size=512 --embedding_size=64 --final_activation=soft_thresh \ 103 | # --max_steps=50000 --decay_step=15000 --l1_weighing_scheme=dynamic_4 \ 104 | # --l1_parameter=0.005 --l1_p_steps=15000 105 | 106 | # export CUDA_VISIBLE_DEVICES="0" 107 | # python train.py \ 108 | # --model_dir=checkpoints/kubernetes/cifar/sparse_64_run_12 \ 109 | # --learning_rate=1e-3 --optimizer=adam \ 110 | # --data_dir=../cifar-100 --evaluate=True --num_classes=101 \ 111 | # --model=cifar100 --batch_size=512 --embedding_size=64 --final_activation=soft_thresh \ 112 | # --max_steps=50000 --decay_step=15000 --l1_weighing_scheme=dynamic_4 \ 113 | # --l1_parameter=0.0 --l1_p_steps=15000 114 | 115 | # export CUDA_VISIBLE_DEVICES="0" 116 | # python train.py \ 117 | # --model_dir=checkpoints/kubernetes/cifar/sparse_64_run_13 \ 118 | # --learning_rate=1e-3 --optimizer=adam \ 119 | # --data_dir=../cifar-100 --evaluate=True --num_classes=101 \ 120 | # --model=cifar100 --batch_size=512 --embedding_size=64 --final_activation=soft_thresh \ 121 | # --max_steps=50000 --decay_step=15000 --l1_weighing_scheme=dynamic_4 \ 122 | # --l1_parameter=0.001 --l1_p_steps=15000 --sparsity_type=flops_sur 123 | 124 | export CUDA_VISIBLE_DEVICES="3" 125 | python train.py \ 126 | --model_dir=checkpoints/kubernetes/cifar/sparse_64_run_14 \ 127 | --learning_rate=1e-3 --optimizer=adam \ 128 | --data_dir=../cifar-100 --evaluate=True --num_classes=101 \ 129 | --model=cifar100 --batch_size=512 --embedding_size=64 --final_activation=soft_thresh \ 130 | --max_steps=50000 --decay_step=15000 --l1_weighing_scheme=dynamic_4 \ 131 | --l1_parameter=0.001 --l1_p_steps=15000 --sparsity_type=l1_norm 132 | -------------------------------------------------------------------------------- /run_scripts/face/dense_mobilenet_128.sh: -------------------------------------------------------------------------------- 1 | export CUDA_VISIBLE_DEVICES="0,1,2,3" 2 | 3 | horovodrun -np 4 -H localhost:4 \ 4 | python train.py \ 5 | --model_dir=checkpoints/msceleb/dense_mobilenet_128_run_1 
--max_steps=180000 \ 6 | --learning_rate=1e-2 --decay_step=150000 --optimizer=mom --momentum=0.9 \ 7 | --data_dir=../msceleb1m/tfrecords/train --evaluate=True \ 8 | --num_classes=85743 --batch_size=256 --model=mobilefacenet \ 9 | --embedding_size=128 --evaluate=True --final_activation=none; 10 | -------------------------------------------------------------------------------- /run_scripts/face/dense_mobilenet_512.sh: -------------------------------------------------------------------------------- 1 | export CUDA_VISIBLE_DEVICES="0,1,2,3" 2 | 3 | horovodrun -np 4 -H localhost:4 \ 4 | python train.py \ 5 | --model_dir=checkpoints/msceleb/dense_mobilenet_512_run_1 --max_steps=180000 \ 6 | --learning_rate=1e-2 --decay_step=150000 --optimizer=mom --momentum=0.9 \ 7 | --data_dir=../msceleb1m/tfrecords/train --evaluate=True \ 8 | --num_classes=85743 --batch_size=256 --model=mobilefacenet \ 9 | --embedding_size=512 --evaluate=True --final_activation=none; 10 | -------------------------------------------------------------------------------- /run_scripts/face/dense_resnet_128.sh: -------------------------------------------------------------------------------- 1 | export CUDA_VISIBLE_DEVICES="0,1,2,3" 2 | 3 | horovodrun -np 4 -H localhost:4 \ 4 | python train.py \ 5 | --model_dir=checkpoints/msceleb/dense_resnet_128_run_1 --max_steps=180000 \ 6 | --learning_rate=1e-2 --decay_step=150000 --optimizer=mom --momentum=0.9 \ 7 | --data_dir=../msceleb1m/tfrecords/train --evaluate=True \ 8 | --num_classes=85743 --batch_size=64 --model=faceresnet \ 9 | --embedding_size=128 --evaluate=True --final_activation=none; 10 | -------------------------------------------------------------------------------- /run_scripts/face/dense_resnet_512.sh: -------------------------------------------------------------------------------- 1 | export CUDA_VISIBLE_DEVICES="0,1,2,3" 2 | 3 | horovodrun -np 4 -H localhost:4 \ 4 | python train.py \ 5 | --model_dir=checkpoints/msceleb/dense_resnet_512_run_1 --max_steps=180000 \ 6 | --learning_rate=1e-2 --decay_step=150000 --optimizer=mom --momentum=0.9 \ 7 | --data_dir=../msceleb1m/tfrecords/train --evaluate=True \ 8 | --num_classes=85743 --batch_size=64 --model=faceresnet \ 9 | --embedding_size=512 --evaluate=True --final_activation=none; 10 | -------------------------------------------------------------------------------- /run_scripts/face/sparse_mobilenet_fl.sh: -------------------------------------------------------------------------------- 1 | export CUDA_VISIBLE_DEVICES="0,1,2,3" 2 | 3 | horovodrun -np 4 -H localhost:4 \ 4 | python train.py \ 5 | --model_dir=checkpoints/msceleb/sparse_mobilenet_fl_run_1 \ 6 | --learning_rate=1e-4 --optimizer=mom --momentum=0.9 \ 7 | --data_dir=../msceleb1m/tfrecords/train --evaluate=True \ 8 | --num_classes=85743 --batch_size=256 --model=mobilefacenet \ 9 | --embedding_size=1024 --sparsity_type=flops_sur \ 10 | --l1_weighing_scheme=dynamic_4 --l1_parameter=100.0 \ 11 | --evaluate=True --final_activation=relu; 12 | -------------------------------------------------------------------------------- /run_scripts/face/sparse_mobilenet_l1.sh: -------------------------------------------------------------------------------- 1 | export CUDA_VISIBLE_DEVICES="1,2,3,4" 2 | 3 | horovodrun -np 4 -H localhost:4 \ 4 | python train.py \ 5 | --model_dir=checkpoints/msceleb/sparse_mobilenet_l1_run_1 \ 6 | --learning_rate=1e-3 --optimizer=mom --momentum=0.9 \ 7 | --l1_weighing_scheme=dynamic_4 --l1_parameter=1.0 \ 8 | --data_dir=../msceleb1m/tfrecords/train --evaluate=True 
\ 9 | --num_classes=85743 --batch_size=256 --model=mobilefacenet \ 10 | --embedding_size=1024 --sparsity_type=l1_norm \ 11 | --evaluate=True --final_activation=relu; 12 | -------------------------------------------------------------------------------- /run_scripts/face/sparse_resnet_fl.sh: -------------------------------------------------------------------------------- 1 | export CUDA_VISIBLE_DEVICES="0,1,2,3" 2 | 3 | horovodrun -np 4 -H localhost:4 \ 4 | python train.py \ 5 | --model_dir=checkpoints/msceleb/sparse_resnet_dynamic_4_fl_run_1 \ 6 | --learning_rate=1e-3 --decay_step=160000 --optimizer=mom --momentum=0.9 \ 7 | --data_dir=../msceleb1m/tfrecords/train --evaluate=True \ 8 | --num_classes=85743 --batch_size=64 --model=faceresnet \ 9 | --embedding_size=1024 --sparsity_type=flops_sur --max_steps=230000 \ 10 | --l1_weighing_scheme=dynamic_4 --l1_parameter=100.0 -l1_p_steps=100000 \ 11 | --evaluate=True --final_activation=soft_thresh; 12 | -------------------------------------------------------------------------------- /run_scripts/face/sparse_resnet_l1.sh: -------------------------------------------------------------------------------- 1 | export CUDA_VISIBLE_DEVICES="0,1,2,3" 2 | 3 | horovodrun -np 4 -H localhost:4 \ 4 | python train.py \ 5 | --model_dir=checkpoints/msceleb/sparse_resnet_flops_run_1 \ 6 | --learning_rate=1e-3 --decay_step=160000 --optimizer=mom --momentum=0.9 \ 7 | --data_dir=../msceleb1m/tfrecords/train --evaluate=True \ 8 | --num_classes=85743 --batch_size=64 --model=faceresnet \ 9 | --embedding_size=1024 --sparsity_type=l1_norm --max_steps=230000 \ 10 | --l1_weighing_scheme=dynamic_4 --l1_parameter=3.0 -l1_p_steps=100000 \ 11 | --evaluate=True --final_activation=soft_thresh; 12 | -------------------------------------------------------------------------------- /run_scripts/get_embeddings/face_mobilenet_reranking.sh: -------------------------------------------------------------------------------- 1 | export CUDA_VISIBLE_DEVICES="1" 2 | set -e 3 | 4 | python get_embeddings_reranking.py --num_classes=85743 --embedding_size=128 \ 5 | --model_dir=checkpoints/msceleb/dense_mobilenet_128_run_1 \ 6 | --data_dir=../facescrub/tfrecords_official/ --is_training=False \ 7 | --final_activation=none --model=mobilefacenet --debug=False \ 8 | --filelist=../megaface_distractors/facescrub_features_list.json \ 9 | --prefix=facescrub 10 | 11 | python get_embeddings_reranking.py --num_classes=85743 --embedding_size=128 \ 12 | --model_dir=checkpoints/msceleb/dense_mobilenet_128_run_1 \ 13 | --data_dir=../megaface_distractors/tfrecords_official/ --is_training=False \ 14 | --final_activation=none --model=mobilefacenet --debug=False \ 15 | --filelist=../megaface_distractors/megaface_features_list.json_1000000_1 \ 16 | --prefix=megaface 17 | 18 | python get_embeddings_reranking.py --num_classes=85743 --embedding_size=512 \ 19 | --model_dir=checkpoints/msceleb/dense_mobilenet_512_run_1 \ 20 | --data_dir=../facescrub/tfrecords_official/ --is_training=False \ 21 | --final_activation=none --model=mobilefacenet --debug=False \ 22 | --filelist=../megaface_distractors/facescrub_features_list.json \ 23 | --prefix=facescrub 24 | 25 | python get_embeddings_reranking.py --num_classes=85743 --embedding_size=512 \ 26 | --model_dir=checkpoints/msceleb/dense_mobilenet_512_run_1 \ 27 | --data_dir=../megaface_distractors/tfrecords_official/ --is_training=False \ 28 | --final_activation=none --model=mobilefacenet --debug=False \ 29 | --filelist=../megaface_distractors/megaface_features_list.json_1000000_1 \ 
30 | --prefix=megaface 31 | 32 | 33 | for i in {3..8} 34 | do 35 | python get_embeddings_reranking.py --num_classes=85743 --embedding_size=1024 \ 36 | --model_dir=checkpoints/msceleb/sparse_mobilenet_fl_run_$i \ 37 | --data_dir=../facescrub/tfrecords_official/ --is_training=False \ 38 | --final_activation=relu --model=mobilefacenet --debug=False \ 39 | --filelist=../megaface_distractors/facescrub_features_list.json \ 40 | --prefix=facescrub 41 | done 42 | 43 | for i in {1..6} 44 | do 45 | python get_embeddings_reranking.py --num_classes=85743 --embedding_size=1024 \ 46 | --model_dir=checkpoints/msceleb/sparse_mobilenet_l1_run_$i \ 47 | --data_dir=../facescrub/tfrecords_official/ --is_training=False \ 48 | --final_activation=relu --model=mobilefacenet --debug=False \ 49 | --filelist=../megaface_distractors/facescrub_features_list.json \ 50 | --prefix=facescrub 51 | done 52 | 53 | for i in {3..8} 54 | do 55 | python get_embeddings_reranking.py --num_classes=85743 --embedding_size=1024 \ 56 | --model_dir=checkpoints/msceleb/sparse_mobilenet_fl_run_$i \ 57 | --data_dir=../megaface_distractors/tfrecords_official/ --is_training=False \ 58 | --final_activation=relu --model=mobilefacenet --debug=False \ 59 | --filelist=../megaface_distractors/megaface_features_list.json_1000000_1 \ 60 | --prefix=megaface 61 | done 62 | 63 | for i in {1..6} 64 | do 65 | python get_embeddings_reranking.py --num_classes=85743 --embedding_size=1024 \ 66 | --model_dir=checkpoints/msceleb/sparse_mobilenet_l1_run_$i \ 67 | --data_dir=../megaface_distractors/tfrecords_official/ --is_training=False \ 68 | --final_activation=relu --model=mobilefacenet --debug=False \ 69 | --filelist=../megaface_distractors/megaface_features_list.json_1000000_1 \ 70 | --prefix=megaface 71 | done -------------------------------------------------------------------------------- /run_scripts/get_embeddings/face_resnet_reranking.sh: -------------------------------------------------------------------------------- 1 | export CUDA_VISIBLE_DEVICES="2" 2 | set -e 3 | 4 | python get_embeddings_reranking.py --num_classes=85743 --embedding_size=512 \ 5 | --model_dir=checkpoints/msceleb/dense_resnet_512_run_1 \ 6 | --data_dir=../facescrub/tfrecords_official/ --is_training=False \ 7 | --final_activation=none --model=faceresnet --debug=False \ 8 | --filelist=../megaface_distractors/facescrub_features_list.json \ 9 | --prefix=facescrub 10 | 11 | python get_embeddings_reranking.py --num_classes=85743 --embedding_size=512 \ 12 | --model_dir=checkpoints/msceleb/dense_resnet_512_run_1 \ 13 | --data_dir=../megaface_distractors/tfrecords_official/ --is_training=False \ 14 | --final_activation=none --model=faceresnet --debug=False \ 15 | --filelist=../megaface_distractors/megaface_features_list.json_1000000_1 \ 16 | --prefix=megaface 17 | 18 | 19 | for i in {1..4} 20 | do 21 | python get_embeddings_reranking.py --num_classes=85743 --embedding_size=1024 \ 22 | --model_dir=checkpoints/msceleb/sparse_resnet_fl_run_$i \ 23 | --data_dir=../facescrub/tfrecords_official/ --is_training=False \ 24 | --final_activation=soft_thresh --model=faceresnet --debug=False \ 25 | --filelist=../megaface_distractors/facescrub_features_list.json \ 26 | --prefix=facescrub 27 | done 28 | 29 | for i in {1..4} 30 | do 31 | python get_embeddings_reranking.py --num_classes=85743 --embedding_size=1024 \ 32 | --model_dir=checkpoints/msceleb/sparse_resnet_l1_run_$i \ 33 | --data_dir=../facescrub/tfrecords_official/ --is_training=False \ 34 | --final_activation=soft_thresh --model=faceresnet 
--debug=False \
35 |     --filelist=../megaface_distractors/facescrub_features_list.json \
36 |     --prefix=facescrub
37 | done
38 |
39 | for i in {1..4}
40 | do
41 | python get_embeddings_reranking.py --num_classes=85743 --embedding_size=1024 \
42 |     --model_dir=checkpoints/msceleb/sparse_resnet_fl_run_$i \
43 |     --data_dir=../megaface_distractors/tfrecords_official/ --is_training=False \
44 |     --final_activation=soft_thresh --model=faceresnet --debug=False \
45 |     --filelist=../megaface_distractors/megaface_features_list.json_1000000_1 \
46 |     --prefix=megaface
47 | done
48 |
49 | for i in {1..4}
50 | do
51 | python get_embeddings_reranking.py --num_classes=85743 --embedding_size=1024 \
52 |     --model_dir=checkpoints/msceleb/sparse_resnet_l1_run_$i \
53 |     --data_dir=../megaface_distractors/tfrecords_official/ --is_training=False \
54 |     --final_activation=soft_thresh --model=faceresnet --debug=False \
55 |     --filelist=../megaface_distractors/megaface_features_list.json_1000000_1 \
56 |     --prefix=megaface
57 | done
-------------------------------------------------------------------------------- /search/Makefile: --------------------------------------------------------------------------------
1 | all: clean compile_search
2 |
3 | compile_search:
4 | 	g++ -O3 -fopenmp -std=c++11 search.cpp -o search
5 |
6 | runmegaface:
7 | 	./search ../face/megaface_sparse_1024.hdf5.bin 1000 0.25
8 |
9 | runfacescrub:
10 | 	./search ../face/megaface_sparse_1024.hdf5.bin 1000 0.25 ../face/facescrub_sparse_1024.hdf5.bin
11 |
12 | clean:
13 | 	rm -f ./search
-------------------------------------------------------------------------------- /search/hdf5_to_bin.py: --------------------------------------------------------------------------------
1 | import numpy as np
2 | import h5py
3 | import sys
4 | from tqdm import tqdm
5 |
6 | try:
7 |     embed_fpath = sys.argv[1]
8 |     output_dir = sys.argv[2]
9 |     print("Reading from", embed_fpath)
10 | except IndexError:
11 |     print('Usage: python3 hdf5_to_bin.py [embed_fpath] [output_dir]')
12 |     exit(1)
13 |
14 | embedfname = embed_fpath.split('/')[-1]
15 | idx = embedfname.rfind('.')
16 | embedfname = embedfname[:idx]
17 |
18 | print('Writing to', embedfname + '.bin')
19 | #with h5py.File(embed_fpath, 'r') as fp, \
20 | #        open(output_dir+'/'+embedfname+'.bin','wb') as fpw, open(output_dir+'/'+embedfname+'.filemap', 'w') as fpw2:
21 |
22 | with h5py.File(embed_fpath, 'r') as fp, open(output_dir+'/'+embedfname+'.bin','wb') as fpw:
23 |
24 |     N = fp['embedding'].shape[0]
25 |     dim = fp['embedding'].shape[1]
26 |     batch_size = 1000
27 |
28 |     # Number of instances
29 |     #print( 'total number of instances={}'.format(N) )
30 |     fpw.write( np.int32(N) )
31 |
32 |     for i in tqdm(range(N)):  # iterate over the instances
33 |         embedding = fp['embedding'][i]
34 |         label = fp['label'][i]
35 |
36 |         nnz_ind = np.where(embedding > 1e-5)[0]
37 |         nnz_vals = embedding[nnz_ind]
38 |
39 |         # ind_batch = fp[KEY_IND][b:b+batch_size]
40 |         # val_batch = fp[KEY_VAL][b:b+batch_size]
41 |         # lab_batch = fp[KEY_LABEL][b:b+batch_size]
42 |         #txt_batch = fp[KEY_TEXT][b:b+batch_size]
43 |         #fname_batch = fp[KEY_FNAME][b:b+batch_size]
44 |
45 |         #print('processing {}...'.format(b))
46 |         fpw.write(np.int32(label))  #label
47 |         #fpw.write(np.int32(-1))  #label
48 |         nnz = np.int32(len(nnz_ind))
49 |         fpw.write(nnz)  #number of nonzero elements in the instance
50 |         fpw.write( np.int32(nnz_ind).tobytes() )  #indices
51 |         fpw.write( np.float32(nnz_vals).tobytes() )  #values
52 |
53 |
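For reference, the binary layout written above is: one int32 instance count, then per instance an int32 label, an int32 nonzero count, the int32 indices, and the float32 values. Below is a minimal sketch of a reader that can sanity-check a converted file; it is not part of the repo, and the function name read_bin is ours.

import numpy as np
import sys

def read_bin(path, limit=5):
    with open(path, 'rb') as f:
        # Header: total number of instances.
        n = int(np.frombuffer(f.read(4), dtype=np.int32)[0])
        print('instances:', n)
        # Decode the first few records: label, nnz, indices, values.
        for _ in range(min(n, limit)):
            label = int(np.frombuffer(f.read(4), dtype=np.int32)[0])
            nnz = int(np.frombuffer(f.read(4), dtype=np.int32)[0])
            idx = np.frombuffer(f.read(4 * nnz), dtype=np.int32)
            vals = np.frombuffer(f.read(4 * nnz), dtype=np.float32)
            print(label, nnz, idx[:5], vals[:5])

if __name__ == '__main__':
    read_bin(sys.argv[1])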
-------------------------------------------------------------------------------- /search/mobilefacenet_to_bin.sh: --------------------------------------------------------------------------------
1 | set -e
2 |
3 | for i in {1..6};
4 | do
5 |     python hdf5_to_bin.py ../embeddings_reranking/facescrub_sparse_mobilenet_dynamic_4_l1_run_${i}.hdf5 ../embeddings_bin
6 |     python hdf5_to_bin.py ../embeddings_reranking/megaface_sparse_mobilenet_dynamic_4_l1_run_${i}.hdf5 ../embeddings_bin
7 | done
8 |
9 | for i in {3..8};
10 | do
11 |     python hdf5_to_bin.py ../embeddings_reranking/facescrub_sparse_mobilenet_dynamic_4_fl_run_${i}.hdf5 ../embeddings_bin
12 |     python hdf5_to_bin.py ../embeddings_reranking/megaface_sparse_mobilenet_dynamic_4_fl_run_${i}.hdf5 ../embeddings_bin
13 | done
-------------------------------------------------------------------------------- /search/resnet_to_bin.sh: --------------------------------------------------------------------------------
1 | set -e
2 |
3 | for i in {1..4};
4 | do
5 |     python hdf5_to_bin.py ../embeddings_reranking/facescrub_sparse_resnet_dynamic_4_l1_run_${i}.hdf5 ../embeddings_bin
6 |     python hdf5_to_bin.py ../embeddings_reranking/megaface_sparse_resnet_dynamic_4_l1_run_${i}.hdf5 ../embeddings_bin
7 | done
8 |
9 | for i in {1..4};
10 | do
11 |     python hdf5_to_bin.py ../embeddings_reranking/facescrub_sparse_resnet_dynamic_4_fl_run_${i}.hdf5 ../embeddings_bin
12 |     python hdf5_to_bin.py ../embeddings_reranking/megaface_sparse_resnet_dynamic_4_fl_run_${i}.hdf5 ../embeddings_bin
13 | done
-------------------------------------------------------------------------------- /search/search.cpp: --------------------------------------------------------------------------------
1 | #include <iostream>
2 | #include <fstream>
3 | #include <cstdio>
4 | #include <cstdlib>
5 | #include <cstring>
6 | #include <cmath>
7 | #include <vector>
8 | #include <algorithm>
9 | #include <utility>
10 | #include <omp.h>
11 |
12 | using namespace std;
13 |
14 | typedef vector<float> Vec;
15 | typedef vector<pair<int, float> > SparseVec;
16 | typedef int Label;
17 |
18 | typedef pair<Label, SparseVec>* Instance;
19 | typedef vector<SparseVec> InvertedIndex;
20 |
21 | const int QUERY_SIZE = 10000;
22 | const int NUM_INS_PER_PRINT = 100000;
23 |
24 | int TOP_R = 1;
25 | float CONF_THD = 0.25f;
26 |
27 |
28 | // NOTE: template arguments and a few statements in this file did not survive
29 | // extraction; lines marked (reconstructed) are inferred from the surrounding code.
30 | void read_embed_data(char* embed_fpath, vector<Instance>& embed_data, int& dim, int& num_class, int limit_N = 10000000){
31 |
32 |     ifstream fin(embed_fpath, ios::in | ios::binary);
33 |     int N;
34 |     fin.read( (char*) &N, sizeof(int) );
35 |     N = min( N, limit_N );
36 |
37 |     cerr << "Reading " << N << " instances..." << endl;
38 |
39 |     int label_offset = num_class;
40 |     embed_data.resize(N);
41 |     for(int i=0;i<N;i++){
42 |         embed_data[i] = new pair<Label, SparseVec>();  // (reconstructed)
43 |         int lab, nnz;
44 |         fin.read((char*) &lab, sizeof(int));
45 |         lab += label_offset;  // (reconstructed) offset labels of a second file
46 |         embed_data[i]->first = lab;
47 |         if(lab+1 > num_class) num_class = lab+1;
48 |         fin.read((char*) &nnz, sizeof(int));
49 |         SparseVec& embed = embed_data[i]->second;
50 |         embed.resize(nnz);
51 |         for(int j=0;j<nnz;j++){
52 |             fin.read((char*) &(embed[j].first), sizeof(int));
53 |             if(embed[j].first+1 > dim)
54 |                 dim = embed[j].first+1;
55 |         }
56 |         for(int j=0;j<nnz;j++)
57 |             fin.read((char*) &(embed[j].second), sizeof(float));
58 |     }
59 |     fin.close();
60 | }
61 |
62 |
63 | float inner_prod(SparseVec& a, SparseVec& b){
64 |     float sum = 0.0f;
65 |     SparseVec::iterator it1 = a.begin();
66 |     SparseVec::iterator it2 = b.begin();
67 |     while( it1 != a.end() && it2 != b.end() ){
68 |         if( it1->first < it2->first )
69 |             it1++;
70 |         else if( it1->first > it2->first )
71 |             it2++;
72 |         else{
73 |             sum += it1->second * it2->second;
74 |             it1++;
75 |             it2++;
76 |         }
77 |     }
78 |     return sum;
79 | }
80 |
81 |
82 | void split_into_query_db(vector<Instance>& data, int query_size,
83 |         vector<Instance>& query_list, vector<Instance>& db_list){
84 |     //Assume the data is already shuffled (the TFRecords and embeddings were
85 |     //written in an order shuffled with a fixed seed; see build_image_data.py).
86 |     // random_shuffle(data.begin(), data.end());
87 |     query_list.clear();
88 |     for(int i=0;i<query_size;i++)  // (reconstructed)
89 |         query_list.push_back(data[i]);
90 |     db_list.clear();
91 |     for(int i=query_size;i<(int)data.size();i++)
92 |         db_list.push_back(data[i]);
93 | }
94 |
95 | bool sortbysec(const pair<int,float> &a, const pair<int,float> &b){
96 |     return (a.second > b.second);
97 | }
98 |
99 |
100 | int topk = 1000;
101 |
102 | class SearchEngine {
103 | public:
104 |     SearchEngine(vector<Instance>& data, int _dim){
105 |         N = data.size();
106 |         db = data;
107 |         dim = _dim;
108 |         build_index(db, dim);
109 |         score_vec.resize(N);
110 |         rank_list.resize(N);
111 |         dense_query.resize(dim);
112 |     }
113 |
114 |     void build_index(vector<Instance>& data, int dim){
115 |         int N = data.size();
116 |         long nnz = 0, nnz_thd = 0;
117 |         db_index.resize(dim);
118 |         for(int i=0;i<N;i++){
119 |             SparseVec& embed = data[i]->second;
120 |             for(auto& p: embed){
121 |                 db_index[p.first].push_back(make_pair(i,p.second));
122 |                 nnz++;
123 |             }
124 |         }
125 |         cerr << "Index built. NNZ% = " << ((float)nnz/N/dim*100.0) << endl;
126 |
127 |         // Dense copy of the database, used to re-score the shortlist.
128 |         mata.resize(N, vector<float>(dim));
129 |         for (size_t i=0; i<(size_t)N; i++)  // (reconstructed)
130 |             for (auto& p: data[i]->second)
131 |                 mata[i][p.first] = p.second;
132 |     }
133 |
134 |     // (reconstructed) Score the database through the inverted index, keep
135 |     // candidates above CONF_THD, shortlist the topk, then re-score the
136 |     // shortlist with dense dot products and select the TOP_R best.
137 |     int search(SparseVec& query){
138 |         fill(score_vec.begin(), score_vec.end(), 0.0f);
139 |         for(auto& q: query)
140 |             for(auto& p: db_index[q.first])
141 |                 score_vec[p.first] += q.second * p.second;
142 |
143 |         int rank_list_size = 0;
144 |         for (size_t i=0; i<(size_t)N; i++){
145 |             if(score_vec[i] > CONF_THD)
146 |                 rank_list[rank_list_size++] = make_pair((int)i,score_vec[i]);
147 |         }
148 |         if(rank_list_size >= topk)
149 |             nth_element(rank_list.begin(), rank_list.begin()+topk-1, rank_list.begin() + rank_list_size, sortbysec);
150 |         int max_iter = min(rank_list_size, topk);
151 |
152 |         for (size_t j=0; j<(size_t)max_iter; j++){  // (reconstructed)
153 |             int id = rank_list[j].first;
154 |             float score = 0.0f;
155 |             for(auto& q: query)
156 |                 score += q.second * mata[id][q.first];
157 |             rank_list[j].second = score;
158 |         }
159 |         if(max_iter > TOP_R)
160 |             nth_element(rank_list.begin(), rank_list.begin()+TOP_R-1, rank_list.begin()+max_iter-1, sortbysec);
161 |         return max_iter;
162 |     }
163 |
164 |
165 | private:
166 |     int N;
167 |     int dim;
168 |     vector<Instance> db;
169 |     InvertedIndex db_index;
170 |     vector<float> score_vec;
171 |     vector<vector<float> > mata;
172 |     vector< pair<int,float> > rank_list;
173 |     Vec dense_query;
174 | };
175 |
176 |
177 | int main(int argc, char** argv){
178 |     if( argc-1 < 3 ){
179 |         cerr << "Command line arguments error" << endl;
180 |         cerr << "Usage: ./search [embed_fpath] [topk] [conf_thd] [queryset_fpath (optional)]" << endl;
181 |         exit(1);
182 |     }
183 |     srand(1);
184 |     char* embed_fpath = argv[1];
185 |     topk = atoi(argv[2]);
186 |     CONF_THD = atof(argv[3]);
187 |     cerr << "Confidence thr " << CONF_THD << endl;
188 |     TOP_R = 1;
189 |     char* queryset_fpath = NULL;
190 |     if( argc-1 > 3 ){
191 |         queryset_fpath = argv[4];
192 |     }
193 |
194 |     //Read Embeddings
195 |     vector<Instance> embed_data;
196 |     int dim=-1, num_class=0;
197 |     read_embed_data(embed_fpath, embed_data, dim, num_class);
198 |     cerr << "N=" << embed_data.size() << ", D=" << dim << ", #class=" << num_class << endl;
199 |     int num_nonquery_class = num_class;
200 |
201 |     vector<Instance> queryset_data;
202 |     if(queryset_fpath!=NULL){
203 |         read_embed_data(queryset_fpath, queryset_data, dim, num_class);
204 |         cerr << "Nq=" << queryset_data.size() << ", D=" << dim << ", #class=" << num_class << endl;
205 |     }
206 |
207 |     //Split into Query and DB
208 |     vector<Instance> query_list, db_list;
209 |     if(queryset_fpath==NULL) {
210 |         // (reconstructed) hold out the first QUERY_SIZE instances as queries
211 |         int i;
212 |         for(i=0; i<QUERY_SIZE && i<(int)embed_data.size(); i++)
213 |             query_list.push_back(embed_data[i]);
214 |         for(; i<(int)embed_data.size(); i++)
215 |             db_list.push_back(embed_data[i]);
216 |     } else {
217 |         query_list = queryset_data;
218 |         db_list = embed_data;
219 |     }
220 |
221 |     SearchEngine* engine = new SearchEngine(db_list, dim);  // (reconstructed)
222 |
223 |     //count the number of instances for each class
224 |     vector<int> num_ins(num_class, 0);
225 |     for(int i=0;i<(int)db_list.size();i++){
226 |         num_ins[db_list[i]->first]++;
227 |     }
228 |
229 |     //Search
230 |     double start = omp_get_wtime();
231 |     int Nt = query_list.size();
232 |
233 |     int empty_count=0;
234 |     int max_rank_len = -1;
235 |     long cum_rank_len = 0;
236 |     for(int i=0;i<Nt;i++){
237 |         Instance q = query_list[i];  // (reconstructed)
238 |         Label lab = q->first;
239 |         SparseVec& embed = q->second;
240 |         int rank_list_size = engine->search(embed);
241 |         cum_rank_len += rank_list_size;
242 |         max_rank_len = max(max_rank_len, rank_list_size);
243 |     }
244 |     double end = omp_get_wtime();
245 |
246 |     float avg_rank_len = (float)cum_rank_len / Nt;
247 |     cerr << "Max reranking list length = " << max_rank_len << endl;
248 |     cerr << "Average reranking list length = " << avg_rank_len << endl;
249 |     cerr << "Total search time = " << end-start << "s" << endl;
250 |     cerr << "Search Time per Query = " << 1e3*(end-start)/Nt << "ms" << endl;
251 |
252 |     return 0;
253 | }
-------------------------------------------------------------------------------- /train.py: --------------------------------------------------------------------------------
1 | import tensorflow as tf
2 | import model
3 | import pprint
4 | import time
5 | import datetime
6 | from PIL import Image
7 | import absl
8 | import sys
9 | import os
10 | import numpy as np
11 | import horovod.tensorflow as hvd
12 | from tensorflow.core.framework.summary_pb2 import Summary
13 | from tensorflow import logging
14 | import resource
15 | import flags
16 | import evaluate_retrieval
17 | import evaluate_mfacenet
18 | import traceback
19 | from cifar import cifar
20 |
21 |
22 | # Define all necessary hyperparameters as flags
23 | FLAGS = absl.flags.FLAGS
24 | IMAGENET_NUM_TRAINING = 1281167
25 |
26 |
27 | class ImageRateHook(tf.train.StepCounterHook):
28 |     '''
29 |     A simple extension of StepCounterHook to count images/sec.
30 |     Overriding the _log_and_record function is a simple hack to achieve this.
31 |     '''
32 |     def _log_and_record(self, elapsed_steps, elapsed_time, global_step):
33 |         images_per_sec = elapsed_steps / elapsed_time * FLAGS.batch_size * hvd.size()
34 |         summary_tag = 'images/sec'
35 |         if self._summary_writer is not None:
36 |             summary = Summary(value=[Summary.Value(
37 |                 tag=summary_tag, simple_value=images_per_sec)])
38 |             self._summary_writer.add_summary(summary, global_step)
39 |         logging.info("%s: %g", summary_tag, images_per_sec)
40 |
41 |
42 | def main(_):
43 |     hvd.init()
44 |
45 |     # Only see a single unique GPU, based on process rank
46 |     config = tf.ConfigProto()
47 |
48 |     # We don't need allow_growth=True as we will use a whole GPU for each process.
49 |     # config.gpu_options.allow_growth = True
50 |     config.gpu_options.visible_device_list = str(hvd.local_rank())
51 |
52 |     # Only one of the workers saves checkpoints and summaries in the
53 |     # model directory
54 |     if hvd.rank() == 0:
55 |         config = tf.estimator.RunConfig(
56 |             model_dir=FLAGS.model_dir,
57 |             keep_checkpoint_every_n_hours=5,
58 |             save_summary_steps=100,
59 |             save_checkpoints_secs=60 * 5,
60 |             session_config=config
61 |         )
62 |     else:
63 |         config = tf.estimator.RunConfig(
64 |             session_config=config,
65 |             keep_checkpoint_max=1
66 |         )
67 |
68 |     if FLAGS.mobilenet_checkpoint_path is not None:
69 |         # ^((?!badword).)*$ matches all strings which do not contain the badword
70 |         ws = tf.estimator.WarmStartSettings(
71 |             ckpt_to_initialize_from=FLAGS.mobilenet_checkpoint_path,
72 |             vars_to_warm_start='.*' if FLAGS.restore_last_layer else "^((?!Logits).)*$",
73 |         )
74 |     else:
75 |         ws = None
76 |
77 |     estimator = tf.estimator.Estimator(
78 |         model_fn=model.model_fn,
79 |         config=config,
80 |         warm_start_from=ws
81 |     )
82 |
83 |     bcast_hook = hvd.BroadcastGlobalVariablesHook(0)
84 |     image_counter_hook = ImageRateHook()
85 |     # Caching file writers, which will then be retrieved during evaluation.
86 |     writer = tf.summary.FileWriter(logdir=FLAGS.model_dir, flush_secs=30)
87 |     eval_writer = tf.summary.FileWriter(
88 |         logdir=os.path.join(FLAGS.model_dir, "eval"), flush_secs=30)
89 |
90 |     try:
91 |         steps = estimator.get_variable_value('global_step')
92 |     except ValueError:
93 |         steps = 0
94 |     evaluate_every_n = 1000
95 |     # evaluate(estimator, True)   # debugging leftover: evaluating and exiting
96 |     # evaluate(estimator, False)  # here would skip the training loop below
97 |     # if hvd.rank() == 0 and FLAGS.evaluate:
98 |     #     evaluate(estimator, False)
99 |     # sys.exit()
100 |     print("Steps", steps, "Max steps", FLAGS.max_steps)
101 |     while steps < FLAGS.max_steps:
102 |         evaluate_every_n = min(evaluate_every_n, FLAGS.max_steps - steps)
103 |         estimator.train(input_fn=lambda: model.imagenet_iterator(
104 |             is_training=True, num_epochs=10000),
105 |             steps=evaluate_every_n,
106 |             hooks=[bcast_hook, image_counter_hook])
107 |         if hvd.rank() == 0 and FLAGS.evaluate:
108 |             # Evaluate on the training set only for metric_learning and cifar100
109 |             if FLAGS.model in ['metric_learning', 'cifar100']:
110 |                 evaluate(estimator, True)
111 |                 evaluate(estimator, False)
112 |             else:
113 |                 evaluate(estimator, False)
114 |
115 |         steps += evaluate_every_n
116 |
117 |
118 | def evaluate(estimator, is_training):
119 |     if is_training:
120 |         logdir = FLAGS.model_dir
121 |     else:
122 |         logdir = os.path.join(FLAGS.model_dir, 'eval')
123 |     print('Writing to', logdir)
124 |     print('Evaluating for %s' % FLAGS.model)
125 |     writer = tf.summary.FileWriterCache.get(logdir=logdir)
126 |     if FLAGS.model in ['metric_learning']:
127 |         global_step = evaluate_retrieval.get_global_step()
128 |         evaluate_retrieval.compute_metrics(estimator, global_step, is_training, writer)
129 |     if FLAGS.model in ['cifar100']:
130 |         global_step = evaluate_retrieval.get_global_step()
131 |         cifar.compute_metrics(estimator, global_step, is_training, writer)
132 |     if "face" in FLAGS.model:
133 |         global_step = evaluate_mfacenet.get_global_step()
134 |         evaluate_mfacenet.compute_megaface_metrics(estimator, global_step, writer)
135 |         for dataset in ["lfw", "agedb_30"]:
136 |             evaluate_mfacenet.compute_metrics(dataset, estimator, global_step, writer)
137 |
138 |
139 | def log_error():
140 |     with open('errors.log', 'a') as fout:
141 |         fout.write(('#' * 20) + '\n')
142 |         fout.write(str(datetime.datetime.now()) + '\n')
143 |         traceback.print_exc(file=fout)
144 |
145 | if __name__ == '__main__':
146 |     try:
147 |         absl.app.run(main)
148 |     except tf.errors.ResourceExhaustedError as E:
149 |         print("#" * 20, "\nCaught ResourceExhaustedError. Re-raising exception ...\n", "#" * 20)
150 |         log_error()
151 |         raise E
152 |     except tf.errors.InternalError as E:
153 |         print("#" * 20, "\nCaught InternalError. Re-raising exception ...\n", "#" * 20)
154 |         log_error()
155 |         raise E
156 |     except Exception as E:
157 |         print("*" * 20, "\nCaught exception", type(E), "Exiting now ...")
158 |         traceback.print_exc()
--------------------------------------------------------------------------------
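A note on the soft_thresh activation selected by --final_activation in the run scripts above: proxy_metric_learning.py implements it as relu(x - lam) - relu(-x - lam), which equals the usual soft-thresholding operator sign(x) * max(|x| - lam, 0); values in [-lam, lam] are zeroed out, which is what produces exact zeros in the sparse embeddings. A quick standalone check of that identity (not part of the repo):

import numpy as np

def soft_thresh(x, lam):
    # Same functional form as in proxy_metric_learning.py, in NumPy.
    return np.maximum(x - lam, 0) - np.maximum(-x - lam, 0)

x = np.linspace(-2, 2, 9)
lam = 0.5
reference = np.sign(x) * np.maximum(np.abs(x) - lam, 0)
assert np.allclose(soft_thresh(x, lam), reference)
print(soft_thresh(x, lam))  # entries with |x| <= lam come out exactly zero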