├── README.md
├── cifar
│   ├── __init__.py
│   ├── cifar.py
│   └── dist.py
├── evaluate_mfacenet_reranking.py
├── evaluate_retrieval.py
├── flags.py
├── get_embeddings_reranking.py
├── imagenet_preprocessing.py
├── input_pipeline.py
├── megaface_embeddings.py
├── megaface_embeddings_hdf.py
├── mobilefacenet.py
├── mobilenet
│   ├── __init__.py
│   ├── conv_blocks.py
│   ├── mobilenet.py
│   ├── mobilenet_example.ipynb
│   ├── mobilenet_v2.py
│   ├── mobilenet_v2_test.py
│   └── readme.md
├── model.py
├── preprocessing
│   ├── build_image_data.py
│   ├── msceleb_to_jpeg.py
│   ├── preprocess_fs_mf_official.sh
│   └── preprocess_msceleb.sh
├── proxy_metric_learning.py
├── resnet
│   ├── __init__.py
│   └── resnet_model.py
├── run_scripts
│   ├── cifar
│   │   ├── dense_64.sh
│   │   └── sparse_64.sh
│   ├── face
│   │   ├── dense_mobilenet_128.sh
│   │   ├── dense_mobilenet_512.sh
│   │   ├── dense_resnet_128.sh
│   │   ├── dense_resnet_512.sh
│   │   ├── sparse_mobilenet_fl.sh
│   │   ├── sparse_mobilenet_l1.sh
│   │   ├── sparse_resnet_fl.sh
│   │   └── sparse_resnet_l1.sh
│   └── get_embeddings
│       ├── face_mobilenet_reranking.sh
│       └── face_resnet_reranking.sh
├── search
│   ├── Makefile
│   ├── hdf5_to_bin.py
│   ├── mobilefacenet_to_bin.sh
│   ├── resnet_to_bin.sh
│   └── search.cpp
└── train.py

/README.md:
--------------------------------------------------------------------------------
1 | # sparse-embed
2 | Code for the paper 'Minimizing FLOPs to Learn Efficient Sparse Representations', published at ICLR 2020: https://openreview.net/forum?id=SygpC6Ntvr
3 | 
4 | The main training and testing code is based on [TensorFlow](https://www.tensorflow.org/), and multi-GPU training is achieved using [Horovod](https://github.com/horovod/horovod).
5 | 
6 | ## Requirements
7 | The code has been tested to work with the following versions:
8 | * python 3.6
9 | * tensorflow-gpu 1.12
10 | * horovod 0.16
11 | 
12 | 
13 | ## Datasets
14 | **Training data:** The MS1M dataset for training was downloaded from the InsightFace [repository](https://github.com/deepinsight/insightface/wiki/Dataset-Zoo). The dataset is aligned by the authors of InsightFace using [MTCNN](https://kpzhang93.github.io/MTCNN_face_detection_alignment/index.html).
15 | 
16 | The downloaded dataset is in MXNet format, and must be converted to JPEG images using ``python preprocessing/msceleb_to_jpeg.py <dataset_directory>``. The extracted images will be stored in ``dataset_directory/images``.
17 | 
18 | **Testing data:** The MegaFace and FaceScrub datasets can be downloaded from the official MegaFace challenge [website](http://megaface.cs.washington.edu/participate/challenge.html). We used [MTCNN](https://kpzhang93.github.io/MTCNN_face_detection_alignment/index.html) to align the images.
19 | 
20 | The CIFAR-100 training and testing data can be downloaded from https://www.cs.toronto.edu/~kriz/cifar.html.
21 | 
22 | ## Preprocessing
23 | All the datasets must be converted to the TFRecords format for fast, multi-threaded pre-fetching of data. The scripts in ``preprocessing/*.sh`` can be used for this conversion. The output directories can be modified within the code. The preprocessing code has been taken from the official TF models [github](https://github.com/tensorflow/models).
24 | 
25 | ### Training
26 | The pipeline to read images (``input_pipeline.py``, ``imagenet_preprocessing.py``) during training has also been taken from the TF models [github](https://github.com/tensorflow/models), and the main models have been implemented using the [tf.Estimator](https://www.tensorflow.org/guide/estimator) framework. Sample training scripts can be found in ``run_scripts/face`` and ``run_scripts/cifar``. The directories for the trained models and the datasets can be specified within the shell scripts.
27 | 
28 | ### Evaluation
29 | For evaluation, we first have to compute the trained embeddings for FaceScrub and MegaFace. Example scripts using ``get_embeddings_reranking.py`` are given in ``run_scripts/get_embeddings``. The computed embeddings will be saved as [HDF5](https://en.wikipedia.org/wiki/Hierarchical_Data_Format) files in a directory named ``embeddings_reranking``. The MegaFace and FaceScrub embedding files start with the prefixes ``megaface`` and ``facescrub``, respectively. For instance, the MegaFace embeddings corresponding to the model ``sparse_mobilenet_fl_run_1`` will be saved as ``megaface_sparse_mobilenet_fl_run_1.hdf5``.
30 | 
31 | The accuracy and the retrieval times are evaluated independently, as described below.
32 | 
33 | **Accuracy:** Accuracy can be evaluated as
34 | ```
35 | python evaluate_mfacenet_reranking.py --dense_suffix=dense_mobilenet_512_run_1.hdf5 --sparse_suffix=sparse_mobilenet_fl_run_1.hdf5 --threshold=0.25 --rerank_topk=1000
36 | ```
37 | The ``dense_suffix`` and ``sparse_suffix`` parameters denote the dense and sparse embeddings to use; the dense embeddings are used for re-ranking as described in the paper. The ``threshold`` and ``rerank_topk`` parameters denote the filtering threshold and the number of top candidates to re-rank, respectively. Refer to the paper for a more detailed description of these parameters.
38 | 
39 | **Retrieval Time:** For measuring the time, a more efficient C++ program is used. The HDF5 files must first be converted to binary files so that they can be read by this program. ``mobilefacenet_to_bin.sh`` and ``resnet_to_bin.sh`` provide examples that convert the HDF5 files to binary format and save them in the directory ``embedding_bin``. The retrieval time can then be measured using ``search/search.cpp``. The makefile ``search/Makefile`` provides a sample flow to compile the program and measure the retrieval time in milliseconds. The ``runmegaface`` target measures the time using MegaFace distractors and held-out MegaFace queries, while the ``runfacescrub`` target uses MegaFace distractors and FaceScrub queries.
40 | 
41 | 
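As a quick sanity check, the saved HDF5 files can also be inspected directly before running the full evaluation. The following is a minimal sketch, assuming the layout written by ``get_embeddings_reranking.py`` (datasets ``embedding`` and ``label``); the file name is an example from the runs above:
```
import h5py
import numpy as np

# Example output of get_embeddings_reranking.py; substitute your own run name.
with h5py.File('embeddings_reranking/megaface_sparse_mobilenet_fl_run_1.hdf5', 'r') as f:
    embeddings = np.asarray(f['embedding'])  # [num_images, embedding_size]

# Magnitudes below 1e-8 are treated as zeros, as in the evaluation scripts.
nnz = np.abs(embeddings) >= 1e-8
per_dim = np.mean(nnz, axis=0)           # activation frequency of each dimension
flops = np.sum(per_dim * per_dim)        # expected cost of one sparse dot product
mean_nnz = np.mean(np.sum(nnz, axis=1))  # average active dimensions per image
print('active fraction', np.mean(nnz), 'flops', flops,
      'ratio', flops * nnz.shape[1] / mean_nnz ** 2)
```
The same ``flops`` and ``ratio`` statistics are reported by ``evaluate_mfacenet_reranking.py`` before it computes the recalls.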
-------------------------------------------------------------------------------- /cifar/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/biswajitsc/sparse-embed/775404d38ce4382251b88168820da3740c1323ea/cifar/__init__.py -------------------------------------------------------------------------------- /cifar/cifar.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | import tensorflow as tf 3 | import pickle as pkl 4 | import os 5 | 6 | import absl 7 | from tqdm import tqdm 8 | 9 | flags = absl.flags 10 | FLAGS = flags.FLAGS 11 | slim = tf.contrib.slim 12 | 13 | 14 | def flip_rl(inputs): 15 | ''' 16 | Args: 17 | inputs - 4D tensor [batch, height, width, nchannel] 18 | Return: 19 | flips - 4D tensor [batch, height, width, nchannel] 20 | 21 | flips[i][j][width-1-k][l] = inputs[i][j][k][l] 22 | ''' 23 | flips = tf.reverse(inputs, axis=[2]) 24 | return flips 25 | 26 | 27 | def random_flip_rl(inputs): 28 | ''' 29 | Args: 30 | inputs - 4D tensor 31 | Return: 32 | the input flipped left-right or unchanged, each with probability 0.5 33 | ''' 34 | rand = tf.random_uniform([], minval=-1, maxval=1) 35 | return tf.cond(tf.greater(rand, 0.0), lambda: inputs, lambda: flip_rl(inputs)) 36 | 37 | 38 | def get_shape(x): 39 | '''get the shape of tensor as list''' 40 | return x.get_shape().as_list() 41 | 42 | 43 | def random_crop(inputs, ch, cw): 44 | ''' 45 | apply tf.random_crop on a 4D tensor 46 | Args: 47 | inputs - 4D tensor 48 | ch - int 49 | crop height 50 | cw - int 51 | crop width 52 | ''' 53 | ib, ih, iw, ic = get_shape(inputs) 54 | return tf.random_crop(inputs, [FLAGS.batch_size, ch, cw, ic]) 55 | 56 | 57 | def read_cifar_split(data_dir, train, seen=False): 58 | if train: 59 | train_images = np.load(data_dir + '/train_image.npy') 60 | train_labels = np.load(data_dir + '/train_label.npy') 61 | 62 | print('CIFAR-100: images {} labels {}'.format(train_images.shape, train_labels.shape)) 63 | return train_images, train_labels 64 | 65 | elif not seen: 66 | # val_images = np.load(data_dir + '/val_image.npy') 67 | # val_labels = np.load(data_dir + '/val_label.npy') 68 | 69 | # test_images_u = np.load(data_dir + '/test_image_u.npy') 70 | # test_labels_u = np.load(data_dir + '/test_label_u.npy') 71 | 72 | # images = np.concatenate((val_images, test_images_u)) 73 | # labels = np.concatenate((val_labels, test_labels_u)) 74 | 75 | images = np.load(data_dir + '/test_image_u.npy') 76 | labels = np.load(data_dir + '/test_label_u.npy') 77 | 78 | print('CIFAR-100: images {} labels {}'.format(images.shape, labels.shape)) 79 | return images, labels 80 | 81 | else: 82 | images = np.load(data_dir + '/test_image_s.npy') 83 | labels = np.load(data_dir + '/test_label_s.npy') 84 | 85 | print('CIFAR-100: images {} labels {}'.format(images.shape, labels.shape)) 86 | 87 | return images, labels 88 | 89 | 90 | def input_fn_split(data_dir, train, batch_size, num_epochs, seen=False): 91 | images, labels = read_cifar_split(data_dir, train, seen) 92 | labels = np.asarray(labels.reshape((-1, 1)), dtype=np.int32) 93 | images = np.asarray(images, dtype=np.float32) 94 | idx = np.random.permutation(len(images)) 95 | images = images[idx] 96 | labels = labels[idx] 97 | # images = images / 128.0 # images are already centered 98 | 99 | texts = np.asarray(['-'] * len(labels)) 100 | 101 | fn = tf.estimator.inputs.numpy_input_fn( 102 | { 103 | "features": images, 104 | "labels": labels, 105 | "label_texts": texts, 106 | "filename": texts 107 | },
108 | labels, batch_size=batch_size, num_epochs=num_epochs, 109 | shuffle=True) 110 | return fn() 111 | 112 | # texts = tf.constant(['-'] * len(labels)) 113 | # labels = tf.constant(labels, tf.int32) 114 | # images = tf.constant(images, tf.float32) # channels last 115 | # dataset = tf.data.Dataset.from_tensor_slices((images, labels, texts, texts)) 116 | # dataset = dataset.repeat(num_epochs) 117 | # dataset = dataset.batch(batch_size) 118 | 119 | # return dataset 120 | 121 | 122 | def input_fn(data_dir, train, batch_size, num_epochs): 123 | with open(os.path.join(data_dir, 'train'), 'rb') as f: 124 | data = pkl.load(f, encoding='bytes') 125 | train_images = data[b'data'].astype(np.float32) 126 | train_labels = data[b'fine_labels'] 127 | image_mean = np.mean(train_images, axis=0) 128 | 129 | with open(os.path.join(data_dir, 'test'), 'rb') as f: 130 | data = pkl.load(f, encoding='bytes') 131 | test_images = data[b'data'].astype(np.float32) 132 | test_labels = data[b'fine_labels'] 133 | 134 | train_images -= image_mean 135 | test_images -= image_mean 136 | 137 | train_images /= 128.0 138 | test_images /= 128.0 139 | 140 | if train: 141 | images = train_images 142 | labels = train_labels 143 | else: 144 | images = test_images 145 | labels = test_labels 146 | 147 | labels = np.asarray(labels, dtype=np.int32).reshape((-1, 1)) 148 | images = np.asarray(images, dtype=np.float32) 149 | images = np.transpose(np.reshape(images, [-1,3,32,32]), [0,2,3,1]) 150 | idx = np.random.permutation(len(images)) 151 | images = images[idx] 152 | labels = labels[idx] 153 | 154 | texts = np.asarray(['-'] * len(labels)) 155 | 156 | fn = tf.estimator.inputs.numpy_input_fn( 157 | { 158 | "features": images, 159 | "labels": labels, 160 | "label_texts": texts, 161 | "filename": texts 162 | }, 163 | labels, batch_size=batch_size, num_epochs=num_epochs, 164 | shuffle=True) 165 | return fn() 166 | 167 | 168 | def proxy_layer(embedding, labels, num_classes): 169 | with tf.variable_scope('Logits'): 170 | weights = tf.get_variable(name='proxy_wts', 171 | shape=(embedding.shape[-1], num_classes), 172 | initializer=tf.random_normal_initializer(stddev=0.001), 173 | dtype=tf.float32) 174 | weights = tf.nn.l2_normalize(weights, axis=0) 175 | bias = tf.Variable(tf.zeros([num_classes])) 176 | # alpha = 64.0 177 | logits = tf.matmul(embedding, weights) # + bias 178 | # one_hot_labels = tf.one_hot(labels, num_classes) 179 | # logits -= one_hot_labels * 0.1 180 | return logits 181 | 182 | 183 | def soft_thresh(lam): 184 | # lam = tf.get_variable( 185 | # "soft_thresh", shape=(), 186 | # initializer=tf.zeros_initializer, 187 | # constraint=lambda x: "hello") 188 | # lam = lam 189 | def activation(x): 190 | ret = tf.nn.relu(x - lam) - tf.nn.relu(-x - lam) 191 | return ret 192 | return activation 193 | 194 | 195 | def nin(features, labels, is_training, num_classes): 196 | if is_training: 197 | features = random_flip_rl( 198 | random_crop( 199 | tf.pad( 200 | features, [[0,0],[4,4],[4,4],[0,0]], 201 | "CONSTANT"), 32, 32)) 202 | 203 | with tf.variable_scope("NIN"): 204 | with slim.arg_scope([slim.conv2d], 205 | weights_initializer=tf.contrib.slim.variance_scaling_initializer(), 206 | weights_regularizer=slim.l2_regularizer(0.0001)): 207 | with slim.arg_scope([slim.conv2d], stride=1, padding='SAME', activation_fn=tf.nn.relu): 208 | with slim.arg_scope(([slim.dropout]), is_training=is_training, keep_prob=0.5): 209 | n = features 210 | n = slim.conv2d(n, 192, [5, 5], scope='conv2d_0') 211 | n = slim.conv2d(n, 160, [1, 1], scope='conv2d_1') 212 
| n = slim.conv2d(n, 96, [1, 1], scope='conv2d_2') 213 | n = slim.max_pool2d(n, [3,3], stride=2, padding='SAME') 214 | n = slim.dropout(n) 215 | n = slim.conv2d(n, 192, [5, 5], scope='conv2d_3') 216 | n = slim.conv2d(n, 192, [1, 1], scope='conv2d_4') 217 | n = slim.conv2d(n, 192, [1, 1], scope='conv2d_5') 218 | n = slim.avg_pool2d(n, [3,3], stride=2, padding='SAME') 219 | n = slim.dropout(n) 220 | n = slim.conv2d(n, 192, [3, 3], scope='conv2d_6') 221 | n = slim.conv2d(n, 192, [1, 1], activation_fn=None, scope='conv2d_7') 222 | 223 | n = tf.reduce_mean(n, [1, 2], keepdims=False) 224 | n = tf.nn.relu(n) 225 | 226 | if FLAGS.final_activation == 'none': 227 | final_activation = None 228 | elif FLAGS.final_activation == 'relu': 229 | final_activation = tf.nn.relu 230 | elif FLAGS.final_activation == 'soft_thresh': 231 | final_activation = soft_thresh(0.1) 232 | else: 233 | raise ValueError('Unknown final_activation %s' % FLAGS.final_activation) 234 | 235 | with tf.variable_scope('embedding'): 236 | with slim.arg_scope([slim.fully_connected], 237 | activation_fn=final_activation, 238 | weights_regularizer=slim.l2_regularizer(0.001), 239 | biases_initializer=tf.zeros_initializer()): 240 | n = slim.fully_connected(n, FLAGS.embedding_size, scope = "fc1") 241 | embedding = tf.nn.l2_normalize(n, axis=1) 242 | 243 | end_points = {} 244 | end_points['embedding'] = embedding 245 | 246 | logits = proxy_layer(embedding, labels, num_classes) 247 | end_points['Logits'] = logits 248 | 249 | return logits, end_points 250 | 251 | 252 | def get_embeddings(estimator, global_step, is_train): 253 | fn = lambda: input_fn( 254 | FLAGS.data_dir, is_train, FLAGS.batch_size, 1) 255 | 256 | predictions = estimator.predict( 257 | input_fn=fn, 258 | yield_single_examples=True, 259 | predict_keys=["true_labels", "true_label_texts", "small_embeddings", "filename"], 260 | checkpoint_path=os.path.join(FLAGS.model_dir, 'model.ckpt-%d' % global_step) 261 | ) 262 | 263 | labels = [] 264 | embeddings = [] 265 | 266 | for prediction in tqdm(predictions): 267 | labels.append(prediction["true_labels"][0]) 268 | embeddings.append(prediction["small_embeddings"]) 269 | 270 | labels = np.asarray(labels) 271 | embeddings = np.asarray(embeddings) 272 | embeddings[np.abs(embeddings) <= FLAGS.zero_threshold] = 0.0 273 | return embeddings, labels 274 | 275 | def eval_precision(qemb, qlabel, demb, dlabel, ks): 276 | MAX_QUERY = 10000 277 | 278 | num_correct = np.zeros(len(ks)) 279 | num_queries = min(qemb.shape[0], MAX_QUERY) 280 | for i in tqdm(range(num_queries)): 281 | dis = - np.squeeze(np.matmul(qemb[i], demb.T)) 282 | dis[i] = 1e10 283 | pred = np.argsort(dis) 284 | for j, k in enumerate(ks): 285 | num_correct[j] += np.mean(dlabel[pred[:k]] == qlabel[i]) 286 | recall = num_correct / num_queries 287 | return recall 288 | 289 | def compute_metrics(estimator, global_step, is_training, summary_writer=None): 290 | print("Computing for global step %d" % global_step) 291 | 292 | ks = [1, 4, 16, 64] 293 | 294 | test_embeddings, test_labels = get_embeddings(estimator, global_step, False) 295 | if is_training: 296 | train_embeddings, train_labels = get_embeddings(estimator, global_step, True) 297 | 298 | if is_training: 299 | precisions = eval_precision(test_embeddings, test_labels, train_embeddings, train_labels, ks) 300 | else: 301 | precisions = eval_precision(test_embeddings, test_labels, test_embeddings, test_labels, ks) 302 | 303 | print("Global Step %d" % global_step) 304 | # print("Max dot product") 305 | # print("\tmean", np.mean(max_dots), 
"std", np.std(max_dots)) 306 | 307 | if is_training: 308 | embeddings = train_embeddings 309 | else: 310 | embeddings = test_embeddings 311 | 312 | nnz = np.abs(embeddings) >= FLAGS.zero_threshold 313 | sparsity = 1.0 - np.mean(nnz) 314 | flops = np.mean(nnz, axis=0) 315 | flops = np.sum(flops * flops) 316 | 317 | mean_nnz = np.mean(np.sum(nnz, axis=1)) 318 | flops_ratio = flops * FLAGS.embedding_size / (mean_nnz**2) 319 | print("\tSparsity", sparsity, "Flops", flops, "Ratio", flops_ratio) 320 | 321 | print("Precision") 322 | print("".join(["\t@%d %f" % (ks[i], precisions[i]) for i in range(len(ks))])) 323 | 324 | row_sparsity = np.mean(embeddings <= FLAGS.zero_threshold, axis=1) 325 | print("Row sparsity") 326 | print("\tmean", np.mean(row_sparsity), "std", np.std(row_sparsity)) 327 | 328 | col_sparsity = np.mean(embeddings <= FLAGS.zero_threshold, axis=0) 329 | print("Column sparsity") 330 | print("\tmean", np.mean(col_sparsity), "std", np.std(col_sparsity)) 331 | 332 | # print("NMI %f" % nmi) 333 | 334 | if summary_writer is not None: 335 | summary = tf.Summary( 336 | value=[tf.Summary.Value(tag='sparsity/small', simple_value=mean_nnz), 337 | tf.Summary.Value(tag='sparsity/flops', simple_value=flops), 338 | tf.Summary.Value(tag='sparsity/ratio', simple_value=flops_ratio)] + 339 | [tf.Summary.Value(tag='accuracy/precision_%d' % (ks[i]), simple_value=precisions[i]) \ 340 | for i in range(len(ks))] 341 | ) 342 | 343 | summary_writer.add_summary(summary, global_step=global_step) 344 | -------------------------------------------------------------------------------- /cifar/dist.py: -------------------------------------------------------------------------------- 1 | from tensorflow.python.framework import dtypes, ops, sparse_tensor, tensor_shape 2 | from tensorflow.python.ops import array_ops, control_flow_ops, logging_ops, math_ops,\ 3 | nn, script_ops, sparse_ops 4 | from tensorflow.python.summary import summary 5 | import tensorflow as tf 6 | 7 | def deviation_from_kth_element(feature, k): 8 | ''' 9 | Args: 10 | feature - 2-D tensor of size [number of data, feature dimensions] 11 | k - int 12 | number of bits to be activated 13 | k < feature dimensions 14 | Return: 15 | deviation : 2-D Tensor of size [number of data, feature dimensions] 16 | deviation from k th element 17 | ''' 18 | feature_top_k = tf.nn.top_k(feature, k=k+1)[0] # [number of data, k+1] 19 | 20 | rho = tf.stop_gradient(tf.add(feature_top_k[:,k-1], feature_top_k[:,k])/2) # [number of data] 21 | rho_r = tf.reshape(rho, [-1,1]) # [number of data, 1] 22 | 23 | deviation = tf.subtract(feature, rho_r) # [number of data, feature dimensions] 24 | return deviation 25 | 26 | 27 | def pairwise_distance_euclid_v2(feature1, feature2): 28 | """Computes the pairwise distance matrix with numerical stability. 29 | output[i, j] = || feature[i, :] - feature[j, :] ||_2 30 | Args: 31 | feature1 - 2-D Tensor of size [number of data1, feature dimension]. 32 | feature2 - 2-D Tensor of size [number of data2, feature dimension]. 33 | Returns: 34 | pairwise_distances - 2-D Tensor of size [number of data1, number of data2]. 35 | """ 36 | pairwise_distances = math_ops.add( 37 | math_ops.reduce_sum( 38 | math_ops.square(feature1), 39 | axis=[1], 40 | keep_dims=True), 41 | math_ops.reduce_sum( 42 | math_ops.square( 43 | array_ops.transpose(feature2)), 44 | axis=[0], 45 | keep_dims=True)) - 2.0 * math_ops.matmul( 46 | feature1, array_ops.transpose(feature2)) 47 | 48 | # Deal with numerical inaccuracies. Set small negatives to zero. 
49 | pairwise_distances = math_ops.maximum(pairwise_distances, 0.0) 50 | # Get the mask where the zero distances are at. 51 | error_mask = math_ops.less_equal(pairwise_distances, 0.0) 52 | 53 | pairwise_distances = math_ops.multiply( 54 | pairwise_distances, math_ops.to_float(math_ops.logical_not(error_mask))) 55 | return pairwise_distances 56 | 57 | def pairwise_distance_euclid(feature, squared=False): 58 | """Computes the pairwise distance matrix with numerical stability. 59 | output[i, j] = || feature[i, :] - feature[j, :] ||_2 60 | Args: 61 | feature - 2-D Tensor of size [number of data, feature dimension]. 62 | squared - Boolean, whether or not to square the pairwise distances. 63 | Returns: 64 | pairwise_distances - 2-D Tensor of size [number of data, number of data]. 65 | """ 66 | pairwise_distances_squared = math_ops.add( 67 | math_ops.reduce_sum( 68 | math_ops.square(feature), 69 | axis=[1], 70 | keep_dims=True), 71 | math_ops.reduce_sum( 72 | math_ops.square( 73 | array_ops.transpose(feature)), 74 | axis=[0], 75 | keep_dims=True)) - 2.0 * math_ops.matmul( 76 | feature, array_ops.transpose(feature)) 77 | 78 | # Deal with numerical inaccuracies. Set small negatives to zero. 79 | pairwise_distances_squared = math_ops.maximum(pairwise_distances_squared, 0.0) 80 | # Get the mask where the zero distances are at. 81 | error_mask = math_ops.less_equal(pairwise_distances_squared, 0.0) 82 | 83 | # Optionally take the sqrt. 84 | if squared: 85 | pairwise_distances = pairwise_distances_squared 86 | else: 87 | pairwise_distances = math_ops.sqrt( 88 | pairwise_distances_squared + math_ops.to_float(error_mask) * 1e-16) 89 | 90 | # Undo conditionally adding 1e-16. 91 | pairwise_distances = math_ops.multiply( 92 | pairwise_distances, math_ops.to_float(math_ops.logical_not(error_mask))) 93 | 94 | num_data = array_ops.shape(feature)[0] 95 | # Explicitly set diagonals to zero. 96 | mask_offdiagonals = array_ops.ones_like(pairwise_distances) - array_ops.diag( 97 | array_ops.ones([num_data])) 98 | pairwise_distances = math_ops.multiply(pairwise_distances, mask_offdiagonals) 99 | return pairwise_distances 100 | 101 | def masked_maximum(data, mask, dim=1): 102 | """Computes the axis wise maximum over chosen elements. 103 | Args: 104 | data : float, 2D tensor [n, m] 105 | mask : bool, 2D tensor [n, m] 106 | dim : int, the dimension over which to compute the maximum. 107 | Returns: 108 | masked_maximums : 2D Tensor 109 | dim=0 => [n,1] 110 | dim=1 => [1,m] 111 | get maximum among mask=1 112 | """ 113 | axis_minimums = math_ops.reduce_min(data, dim, keep_dims=True) 114 | masked_maximums = math_ops.reduce_max(math_ops.multiply(data - axis_minimums, mask), dim, keep_dims=True) + axis_minimums 115 | return masked_maximums 116 | 117 | def masked_minimum(data, mask, dim=1): 118 | """Computes the axis wise minimum over chosen elements. 119 | Args: 120 | data : float, 2D tensor [n, m] 121 | mask : bool, 2D tensor [n, m] 122 | dim : int, the dimension over which to compute the minimum. 
123 | Returns: 124 | masked_minimums : 2D Tensor 125 | dim=0 => [n,1] 126 | dim=1 => [1,m] 127 | get minimum among mask=1 128 | """ 129 | axis_maximums = math_ops.reduce_max(data, dim, keep_dims=True) # [n, 1] or [1, m] 130 | masked_minimums = math_ops.reduce_min(math_ops.multiply(data - axis_maximums, mask), dim, keep_dims=True) + axis_maximums # [n, 1] or [1, m] 131 | return masked_minimums 132 | 133 | def triplet_semihard_loss(labels, embeddings, pairwise_distance, margin=1.0): 134 | """Computes the triplet loss with semi-hard negative mining. 135 | Args: 136 | labels - 1-D tensor [batch_size] as tf.int32 137 | multiclass integer labels. 138 | embeddings - 2-D tensor [batch_size, feature dimensions] as tf.float32 139 | `Tensor` of embedding vectors. Embeddings should be l2 normalized. 140 | pairwise_distance - func 141 | takes a 2D tensor [number of data, feature dims] 142 | returns a 2D tensor [number of data, number of data] 143 | margin - float, defaults to 1.0 144 | margin term in the loss definition. 145 | 146 | Returns: 147 | triplet_loss - tf.float32 scalar. 148 | """ 149 | # Reshape [batch_size] label tensor to a [batch_size, 1] label tensor. 150 | lshape = array_ops.shape(labels) 151 | assert lshape.shape == 1 152 | labels = array_ops.reshape(labels, [lshape[0], 1]) 153 | 154 | # pdist_matrix[i][j] = dist(i, j) 155 | pdist_matrix = pairwise_distance(embeddings) # [batch_size, batch_size] 156 | # adjacency[i][j]=1 if label[i]==label[j] else 0 157 | adjacency = math_ops.equal(labels, array_ops.transpose(labels)) # [batch_size, batch_size] 158 | # adjacency_not[i][j]=0 if label[i]==label[j] else 1 159 | adjacency_not = math_ops.logical_not(adjacency) # [batch_size, batch_size] 160 | batch_size = array_ops.size(labels) 161 | 162 | # Compute the mask. 163 | # pdist_matrix_tile[batch_size*i+j, k] = distance(j, k) 164 | pdist_matrix_tile = array_ops.tile(pdist_matrix, [batch_size, 1]) # [batch_size*batch_size, batch_size] 165 | # reshape(transpose(pdist_matrix), [-1, 1])[batch_size*i+j][0] = distance(j, i) 166 | # tile(adjacency_not, [batch_size, 1])[batch_size*i+j][k] = 1 if label[j]!=label[k] otherwise 0 167 | # mask[batch_size*i+j][k] = 1 if label[j]!=label[k] (different labels) and distance(j,k)>distance(j,i) 168 | mask = math_ops.logical_and( 169 | array_ops.tile(adjacency_not, [batch_size, 1]), 170 | math_ops.greater(pdist_matrix_tile, array_ops.reshape(array_ops.transpose(pdist_matrix), [-1, 1]))) # [batch_size*batch_size, batch_size] 171 | # mask_final[i][j]=1 if there exists k s.t. label[j]!=label[k] and distance(j,k)>distance(j,i) 172 | mask_final = array_ops.reshape( 173 | math_ops.greater(math_ops.reduce_sum(math_ops.cast(mask, dtype=dtypes.float32), 1, keep_dims=True), 0.0), 174 | [batch_size, batch_size])# [batch_size, batch_size] 175 | # mask_final[i][j]=1 if there exists k s.t. label[i]!=label[k] and distance(i,k)>distance(i,j)
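# (In words: for an anchor i with positive j, mask_final[i][j] says whether the
# batch contains at least one k with a different label that is farther from i
# than j is; if so, the "outside" negative below is used, otherwise the
# "inside" one.)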
176 | mask_final = array_ops.transpose(mask_final)# [batch_size, batch_size] 177 | 178 | adjacency_not = math_ops.cast(adjacency_not, dtype=dtypes.float32) # [batch_size, batch_size] 179 | mask = math_ops.cast(mask, dtype=dtypes.float32) # [batch_size*batch_size, batch_size] 180 | 181 | # masked_minimum(pdist_matrix_tile, mask)[batch_size*i+j][0] = pdist_matrix[j][k] s.t. minimum over k's with label[j]!=label[k], distance(j,k)>distance(j,i) 182 | # negatives_outside[i][j] = pdist_matrix[j][k] s.t. minimum over k's with label[j]!=label[k], distance(j,k)>distance(j,i) 183 | negatives_outside = array_ops.reshape(masked_minimum(pdist_matrix_tile, mask), [batch_size, batch_size]) # [batch_size, batch_size] 184 | # negatives_outside[i][j] = pdist_matrix[i][k] s.t. minimum over k's with label[i]!=label[k], distance(i,k)>distance(i,j) 185 | negatives_outside = array_ops.transpose(negatives_outside) # [batch_size, batch_size] 186 | 187 | # negatives_inside[i][j] = pdist_matrix[i][k] s.t. maximum over k's with label[i]!=label[k] 188 | negatives_inside = array_ops.tile(masked_maximum(pdist_matrix, adjacency_not), [1, batch_size]) # [batch_size, batch_size] 189 | 190 | # semi_hard_negatives[i][j] = negatives_outside[i][j] if mask_final[i][j] else negatives_inside[i][j] 191 | semi_hard_negatives = array_ops.where(mask_final, negatives_outside, negatives_inside) # [batch_size, batch_size] 192 | loss_mat = math_ops.add(margin, pdist_matrix - semi_hard_negatives) # [batch_size, batch_size] 193 | 194 | # mask_positives[i][j] = 1 if label[i]==label[j], and i!=j 195 | mask_positives = math_ops.cast(adjacency, dtype=dtypes.float32) - array_ops.diag(array_ops.ones([batch_size])) 196 | 197 | # In lifted-struct, the authors multiply by 0.5 for the upper triangular part; 198 | # in semihard, all positive pairs except the diagonal are taken. 199 | num_positives = math_ops.reduce_sum(mask_positives) 200 | 201 | triplet_loss = math_ops.truediv( 202 | math_ops.reduce_sum(math_ops.maximum(math_ops.multiply(loss_mat, mask_positives), 0.0)), 203 | num_positives, 204 | name='triplet_semihard_loss') # hinge 205 | return triplet_loss 206 | 207 | def npairs_loss(labels, embeddings_anchor, embeddings_positive, 208 | reg_lambda=0.002, print_losses=False): 209 | """Computes the npairs loss. 210 | Npairs loss expects paired data where a pair is composed of samples from the 211 | same label, and each pair in the minibatch has a different label. The loss 212 | has two components. The first component is the L2 regularizer on the 213 | embedding vectors. The second component is the sum of cross entropy loss 214 | which takes each row of the pair-wise similarity matrix as logits and 215 | the remapped one-hot labels as labels. 216 | See: http://www.nec-labs.com/uploads/images/Department-Images/MediaAnalytics/papers/nips16_npairmetriclearning.pdf 217 | 218 | Args: 219 | labels: 1-D tf.int32 `Tensor` of shape [batch_size/2]. 220 | embeddings_anchor: 2-D Tensor of shape [batch_size/2, embedding_dim] for the 221 | embedding vectors for the anchor images. Embeddings should not be 222 | l2 normalized. 223 | embeddings_positive: 2-D Tensor of shape [batch_size/2, embedding_dim] for the 224 | embedding vectors for the positive images. Embeddings should not be 225 | l2 normalized. 226 | reg_lambda: Float. L2 regularization term on the embedding vectors. 227 | print_losses: Boolean. Option to print the xent and l2loss. 228 | Returns: 229 | npairs_loss: tf.float32 scalar. 230 | """ 231 | # Add the regularizer on the embedding. 
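# (Written out, the penalty computed below is
#   l2loss = 0.25 * reg_lambda * (mean_i ||anchor_i||_2^2 + mean_i ||positive_i||_2^2),
# i.e. the L2 regularizer on the unnormalized embeddings described in the docstring.)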
232 | reg_anchor = math_ops.reduce_mean(math_ops.reduce_sum(math_ops.square(embeddings_anchor), 1)) 233 | reg_positive = math_ops.reduce_mean(math_ops.reduce_sum(math_ops.square(embeddings_positive), 1)) 234 | l2loss = math_ops.multiply(0.25 * reg_lambda, reg_anchor + reg_positive, name='l2loss') 235 | 236 | # Get per pair similarities. 237 | similarity_matrix = math_ops.matmul(embeddings_anchor, embeddings_positive, transpose_a=False, transpose_b=True) # [batch_size/2, batch_size/2] 238 | 239 | # Reshape [batch_size/2] label tensor to a [batch_size/2, 1] label tensor. 240 | lshape = array_ops.shape(labels) 241 | assert lshape.shape == 1 242 | labels = array_ops.reshape(labels, [lshape[0], 1]) 243 | 244 | # labels_remapped[i][j] = 1 if label[i] == label[j] otherwise 0 245 | labels_remapped = math_ops.to_float(math_ops.equal(labels, array_ops.transpose(labels))) # [batch_size/2, batch_size/2] 246 | labels_remapped /= math_ops.reduce_sum(labels_remapped, 1, keep_dims=True) 247 | 248 | # Add the softmax loss. 249 | xent_loss = nn.softmax_cross_entropy_with_logits(logits=similarity_matrix, labels=labels_remapped) 250 | xent_loss = math_ops.reduce_mean(xent_loss, name='xentropy') 251 | 252 | if print_losses: 253 | xent_loss = logging_ops.Print(xent_loss, ['cross entropy:', xent_loss, 'l2loss:', l2loss]) 254 | return l2loss + xent_loss 255 | -------------------------------------------------------------------------------- /evaluate_mfacenet_reranking.py: -------------------------------------------------------------------------------- 1 | import pprint 2 | import time 3 | import json 4 | from tqdm import tqdm 5 | import sys 6 | import os 7 | import numpy as np 8 | import shutil 9 | import glob 10 | from sklearn.model_selection import KFold 11 | import pickle 12 | from copy import deepcopy 13 | import multiprocessing 14 | import h5py 15 | 16 | from absl import flags 17 | from absl import app 18 | 19 | flags.DEFINE_string(name='dense_suffix', 20 | help='suffix of the dense embedding files', 21 | default='dense_resnet_512_run_1.hdf5') 22 | flags.DEFINE_string(name='sparse_suffix', 23 | help='suffix of the sparse embedding files', 24 | default=None) 25 | flags.DEFINE_float(name='threshold', 26 | help='Threshold used to filter the examples', 27 | default=None) 28 | flags.DEFINE_integer(name='rerank_topk', 29 | help='topk elements to rerank', 30 | default=1000) 31 | 32 | 33 | FLAGS = flags.FLAGS 34 | 35 | def compute_recall(sorted_labels, correct_label, top_k): 36 | scores = [[] for _ in top_k] 37 | incorrect_seen = 0 38 | for label in sorted_labels: 39 | if label != -1 and label != correct_label: 40 | continue 41 | if label != correct_label: 42 | incorrect_seen += 1 43 | else: 44 | for i, k in enumerate(top_k): 45 | scores[i].append(1 if incorrect_seen < k else 0) 46 | scores = np.mean(scores, axis=1) 47 | return scores 48 | 49 | 50 | def get_top_k_idx(labels, scores, k): 51 | count = 0 52 | for i, label in enumerate(labels): 53 | if label == -1: 54 | count += 1 55 | if count >= k: 56 | return i 57 | if scores[i] < FLAGS.threshold: 58 | return i 59 | 60 | 61 | def get_recall_k(megaface_sparse, megaface_dense, facescrub_sparse, facescrub_dense, facescrub_labels, top_k, rerank_k): 62 | sparse_database = np.concatenate((facescrub_sparse, megaface_sparse)) 63 | dense_database = np.concatenate((facescrub_dense, megaface_dense)) 64 | dlabels = np.concatenate((facescrub_labels, [-1] * len(megaface_sparse))) 65 | num_probes = len(facescrub_labels) 66 | print("Num probes", num_probes) 67 | print("Multiplying 
matrices") 68 | dot = np.matmul(facescrub_sparse, sparse_database.T) 69 | print("Sorting the matrix") 70 | ranked_indices = np.argsort(-dot, axis=1) 71 | recalls = [] 72 | it = tqdm(range(num_probes)) 73 | for i in it: 74 | ranked = ranked_indices[i] 75 | labels = dlabels[ranked] 76 | scores = dot[i][ranked] 77 | if rerank_k > 0: 78 | print("Re-ranking top-%d" % rerank_k) 79 | top_idx = get_top_k_idx(labels, scores, rerank_k) 80 | labels, rest = labels[:top_idx+1], labels[top_idx+1:] 81 | ranked = ranked[:top_idx+1] 82 | dense_emb = dense_database[ranked] 83 | dense_dot = np.matmul(facescrub_dense[i], dense_emb.T).ravel() 84 | reranked_indices = np.argsort(-dense_dot) 85 | reranked_labels = labels[reranked_indices] 86 | labels = np.concatenate((reranked_labels, rest)) 87 | recalls.append(compute_recall(labels, facescrub_labels[i], top_k)) 88 | it.set_description('Recalls %.3f %.3f %.3f %.3f' % tuple(np.mean(recalls, axis=0))) 89 | 90 | return np.mean(recalls, axis=0) 91 | 92 | 93 | def main(_): 94 | print("Official Megaface Metrics") 95 | 96 | megaface_sparse = h5py.File('embeddings_reranking/megaface_' + FLAGS.sparse_suffix, 'r')['embedding'] 97 | megaface_dense = h5py.File('embeddings_reranking/megaface_' + FLAGS.dense_suffix, 'r')['embedding'] 98 | 99 | facescrub_sparse = h5py.File('embeddings_reranking/facescrub_' + FLAGS.sparse_suffix, 'r')['embedding'] 100 | facescrub_dense = h5py.File('embeddings_reranking/facescrub_' + FLAGS.dense_suffix, 'r')['embedding'] 101 | facescrub_labels = np.asarray(h5py.File('embeddings_reranking/facescrub_' + FLAGS.sparse_suffix, 'r')['label']) 102 | facescrub_dense_labels = np.asarray( 103 | h5py.File('embeddings_reranking/facescrub_' + FLAGS.dense_suffix, 'r')['label']) 104 | assert(np.all(facescrub_labels == facescrub_dense_labels)) 105 | 106 | nnz = np.abs(megaface_sparse) >= 1e-8 107 | sparsity = np.mean(nnz) 108 | flops = np.mean(nnz, axis=0) 109 | flops = np.sum(flops * flops) 110 | 111 | mean_nnz = np.mean(np.sum(nnz, axis=1)) 112 | flops_ratio = flops * nnz.shape[1] / (mean_nnz**2) 113 | 114 | print("\tSparsity", sparsity, "Flops", flops, "Ratio", flops_ratio) 115 | 116 | top_k = [1, 10, 50, 100] 117 | rerank_k = FLAGS.rerank_topk 118 | print("Rerank TOP_K", rerank_k) 119 | recalls = get_recall_k(megaface_sparse, megaface_dense, 120 | facescrub_sparse, facescrub_dense, facescrub_labels, top_k, rerank_k) 121 | 122 | print("Non Binarized") 123 | for recall, k in zip(recalls, top_k): 124 | print("\trecall@%d" % k, recall) 125 | 126 | 127 | if __name__ == '__main__': 128 | app.run(main) 129 | -------------------------------------------------------------------------------- /evaluate_retrieval.py: -------------------------------------------------------------------------------- 1 | import tensorflow as tf 2 | import model 3 | import pprint 4 | import time 5 | from PIL import Image 6 | import absl 7 | import sys 8 | import os 9 | import numpy as np 10 | import horovod.tensorflow as hvd 11 | from tensorflow.core.framework.summary_pb2 import Summary 12 | from tensorflow import logging 13 | import shutil 14 | import glob 15 | from sklearn.cluster import MiniBatchKMeans 16 | from sklearn.metrics.cluster import normalized_mutual_info_score 17 | import sklearn 18 | import traceback 19 | import logging 20 | import scipy 21 | import flags 22 | from tqdm import tqdm 23 | from copy import deepcopy 24 | from cifar import cifar 25 | 26 | # Define all necessary hyperparameters as flags 27 | FLAGS = absl.flags.FLAGS 28 | 29 | def save_image(image, filename): 30 | 
feature_ = (image + 1.0) * 128 31 | img = Image.fromarray(np.uint8(feature_), 'RGB') 32 | img.save(filename) 33 | 34 | 35 | def img_full_path(filename): 36 | synset = filename.split('_')[0] 37 | full_path = os.path.join('../imagenet10k/images', synset, filename) 38 | return full_path 39 | 40 | 41 | def get_global_step(ckpt_path=None): 42 | if ckpt_path is None: 43 | ckpt_path = tf.train.latest_checkpoint(FLAGS.model_dir) 44 | base = os.path.basename(ckpt_path) 45 | global_step = int(base.split('-')[1]) 46 | return global_step 47 | 48 | 49 | def get_sorted_global_steps(): 50 | all_paths = glob.glob(os.path.join(FLAGS.model_dir, '*.ckpt*data*')) 51 | global_steps = [int(path.split('-')[-4].split('.')[0]) for path in all_paths] 52 | global_steps.sort() 53 | return global_steps 54 | 55 | 56 | def get_last_eval_step(): 57 | try: 58 | last_step = np.loadtxt(os.path.join(FLAGS.model_dir, 'eval', 'last_eval.txt')) 59 | return last_step 60 | except OSError: 61 | save_last_eval_step(0) 62 | return 0 63 | 64 | 65 | def save_last_eval_step(last_step): 66 | file_path = os.path.join(FLAGS.model_dir, 'eval', 'last_eval.txt') 67 | np.savetxt(file_path, [last_step]) 68 | print('Saved last_step=%d to %s' % (last_step, file_path)) 69 | 70 | 71 | def binarize(mat): 72 | mat = deepcopy(mat) 73 | mat[mat > FLAGS.zero_threshold] = 1.0 74 | mat[mat < FLAGS.zero_threshold] = -1.0 75 | norm = np.linalg.norm(mat, axis=1, keepdims=True) 76 | mat = mat / norm 77 | return mat 78 | 79 | 80 | def assign_by_euclidian_at_k(X, T, k): 81 | """ 82 | X : [nb_samples x nb_features], e.g. 100 x 64 (embeddings) 83 | k : for each sample, assign target labels of k nearest points 84 | """ 85 | # distances = sklearn.metrics.pairwise.pairwise_distances(X) 86 | distances = scipy.spatial.distance.squareform(scipy.spatial.distance.pdist(X)) 87 | # get nearest points 88 | indices = np.argsort(distances, axis=1)[:, 1:k + 1] 89 | return np.array([[T[i] for i in ii] for ii in indices]) 90 | 91 | 92 | def calc_recall_at_k(T, Y, k): 93 | """ 94 | T : [nb_samples] (target labels) 95 | Y : [nb_samples x k] (k predicted labels/neighbours) 96 | """ 97 | s = sum([1 for t, y in zip(T, Y) if t in y[:k]]) 98 | return s / (1. 
* len(T)) 99 | 100 | 101 | def evaluate(X, T, nb_classes): 102 | # calculate NMI with kmeans clustering 103 | # nmi = sklearn.metrics.cluster.normalized_mutual_info_score( 104 | # sklearn.cluster.KMeans(nb_classes).fit(X).labels_, 105 | # T 106 | # ) 107 | # logging.info("NMI: {:.3f}".format(nmi * 100)) 108 | 109 | # get predictions by assigning nearest 8 neighbors with euclidian 110 | Y = assign_by_euclidian_at_k(X, T, 8) 111 | 112 | # calculate recall @ 1, 2, 4, 8 113 | recall = [] 114 | for k in [1, 2, 4, 8]: 115 | r_at_k = calc_recall_at_k(T, Y, k) 116 | recall.append(r_at_k) 117 | logging.info("R@{} : {:.3f}".format(k, 100 * r_at_k)) 118 | 119 | # return nmi, recall 120 | 121 | 122 | def compute_nmi(embeddings, labels): 123 | print("Clustering ...") 124 | num_classes = len(set(labels)) 125 | print("num_classes %d\tnum_examples %d" % (num_classes, len(embeddings))) 126 | 127 | centers = {} 128 | for emb, label in zip(embeddings, labels): 129 | if label not in centers: 130 | centers[label] = (emb, 1) 131 | else: 132 | pair = centers[label] 133 | centers[label] = (pair[0] + emb, pair[1] + 1) 134 | centers = [centers[key] for key in centers] 135 | centers = [pair[0] / pair[1] for pair in centers] 136 | centers = np.asarray(centers) 137 | print("Computed centers") 138 | 139 | kmeans = MiniBatchKMeans(n_clusters=num_classes, init=centers, 140 | batch_size=1000, max_iter=1000) 141 | kmeans = kmeans.fit(embeddings) 142 | cluster_labels = kmeans.labels_ 143 | nmi = normalized_mutual_info_score(labels, cluster_labels) 144 | return nmi 145 | 146 | MAX_QUERY = 1000 147 | MAX_DISTRACTORS = 500000 148 | def eval_recall(embedding, label, ks): 149 | embedding = np.copy(embedding) 150 | # Normalized embeddings 151 | embedding /= np.linalg.norm(embedding, axis=1).reshape((-1, 1)) 152 | norm = np.sum(embedding * embedding, axis = 1) 153 | num_correct = np.zeros(len(ks)) 154 | num_queries = min(embedding.shape[0], MAX_QUERY) 155 | for i in tqdm(range(num_queries)): 156 | dis = norm[i] + norm - 2 * np.squeeze(np.matmul(embedding[i],embedding.T)) 157 | dis[i] = 1e10 158 | pred = np.argsort(dis) 159 | for j, k in enumerate(ks): 160 | if label[i] in label[pred[:k]]: 161 | num_correct[j] += 1 162 | recall = num_correct / num_queries 163 | return recall 164 | 165 | 166 | def eval_precision(embedding, label, ks): 167 | embedding = np.copy(embedding) 168 | # Normalized embeddings 169 | embedding /= np.linalg.norm(embedding, axis=1).reshape((-1, 1)) 170 | num_correct = np.zeros(len(ks)) 171 | num_queries = min(embedding.shape[0], MAX_QUERY) 172 | for i in tqdm(range(num_queries)): 173 | dis = - np.squeeze(np.matmul(embedding[i],embedding.T)) 174 | dis[i] = 1e10 175 | pred = np.argsort(dis) 176 | for j, k in enumerate(ks): 177 | num_correct[j] += np.mean(label[pred[:k]] == label[i]) 178 | recall = num_correct / num_queries 179 | return recall 180 | 181 | 182 | def compute_metrics(estimator, global_step, is_training, summary_writer=None): 183 | global MAX_DISTRACTORS 184 | global MAX_QUERY 185 | 186 | print("Computing for global step %d" % global_step) 187 | if FLAGS.model == 'cifar100': 188 | input_fn = lambda: cifar.input_fn( 189 | FLAGS.data_dir, is_training, FLAGS.batch_size, 1) 190 | else: 191 | if not is_training: 192 | MAX_DISTRACTORS = 1e10 193 | MAX_QUERY = 10000 194 | input_fn = lambda: model.imagenet_iterator(is_training=is_training) 195 | 196 | predictions = estimator.predict( 197 | input_fn=input_fn, 198 | yield_single_examples=True, 199 | predict_keys=["true_labels", "true_label_texts", 
"small_embeddings", "filename"], 200 | checkpoint_path=os.path.join(FLAGS.model_dir, 'model.ckpt-%d' % global_step) 201 | ) 202 | 203 | true_labels = [] 204 | true_label_texts = [] 205 | small_embeddings = [] 206 | filenames = [] 207 | 208 | start_time = time.time() 209 | step = 0 210 | for prediction in tqdm(predictions): 211 | filenames.append(prediction["filename"]) 212 | true_labels.append(prediction["true_labels"][0]) 213 | true_label_texts.append(prediction["true_label_texts"]) 214 | small_embeddings.append(prediction["small_embeddings"]) 215 | step += 1 216 | if FLAGS.debug and step >= 100000: 217 | break 218 | if step >= MAX_DISTRACTORS: 219 | break 220 | inference_time = time.time() - start_time 221 | 222 | true_labels = np.asarray(true_labels) 223 | small_embeddings = np.asarray(small_embeddings) 224 | small_embeddings[np.abs(small_embeddings) <= FLAGS.zero_threshold] = 0.0 225 | binarized_embeddings = binarize(small_embeddings) 226 | 227 | ks = [1, 4, 16, 64] 228 | precisions = eval_precision(small_embeddings, true_labels, ks) 229 | bin_precisions = eval_precision(binarized_embeddings, true_labels, ks) 230 | 231 | print("Global Step %d" % global_step) 232 | # print("Max dot product") 233 | # print("\tmean", np.mean(max_dots), "std", np.std(max_dots)) 234 | 235 | metrics_time = time.time() - start_time 236 | 237 | print("Time") 238 | print("\tInference", inference_time / 60, "minutes") 239 | print("\tMetrics", metrics_time / 60, "minutes") 240 | 241 | sys.stdout.flush() 242 | 243 | nnz = np.abs(small_embeddings) >= FLAGS.zero_threshold 244 | sparsity = 1.0 - np.mean(nnz) 245 | flops = np.mean(nnz, axis=0) 246 | flops = np.sum(flops * flops) 247 | 248 | mean_nnz = np.mean(np.sum(nnz, axis=1)) 249 | flops_ratio = flops * FLAGS.embedding_size / (mean_nnz**2) 250 | print("\tSparsity", sparsity, "Flops", flops, "Ratio", flops_ratio) 251 | 252 | print("Precision") 253 | print("".join(["\t@%d %f" % (ks[i], precisions[i]) for i in range(len(ks))])) 254 | 255 | print("Bin Precision") 256 | print("".join(["\t@%d %f" % (ks[i], bin_precisions[i]) for i in range(len(ks))])) 257 | 258 | row_sparsity = np.mean(small_embeddings <= FLAGS.zero_threshold, axis=1) 259 | print("Row sparsity") 260 | print("\tmean", np.mean(row_sparsity), "std", np.std(row_sparsity)) 261 | 262 | col_sparsity = np.mean(small_embeddings <= FLAGS.zero_threshold, axis=0) 263 | print("Column sparsity") 264 | print("\tmean", np.mean(col_sparsity), "std", np.std(col_sparsity)) 265 | 266 | # print("NMI %f" % nmi) 267 | 268 | if summary_writer is not None: 269 | summary = tf.Summary( 270 | value=[tf.Summary.Value(tag='sparsity/small', simple_value=mean_nnz), 271 | tf.Summary.Value(tag='sparsity/flops', simple_value=flops), 272 | tf.Summary.Value(tag='sparsity/ratio', simple_value=flops_ratio)] + 273 | [tf.Summary.Value(tag='accuracy/precision_%d' % (ks[i]), simple_value=precisions[i]) \ 274 | for i in range(len(ks))] + 275 | [tf.Summary.Value(tag='accuracy/precision_%d_bin' % (ks[i]), simple_value=bin_precisions[i]) \ 276 | for i in range(len(ks))] 277 | ) 278 | 279 | summary_writer.add_summary(summary, global_step=global_step) 280 | 281 | 282 | def main(_): 283 | global MAX_DISTRACTORS 284 | global MAX_QUERY 285 | 286 | config = tf.estimator.RunConfig( 287 | model_dir=FLAGS.model_dir, 288 | keep_checkpoint_every_n_hours=1, 289 | save_summary_steps=50, 290 | save_checkpoints_secs=60 * 20, 291 | ) 292 | estimator = tf.estimator.Estimator( 293 | model_fn=model.model_fn, 294 | config=config 295 | ) 296 | 297 | if not 
FLAGS.evaluate_all: 298 | global_step = get_global_step() 299 | MAX_DISTRACTORS = 1e10 300 | MAX_QUERY = 10000 301 | compute_metrics(estimator, global_step, False, None) 302 | 303 | else: 304 | logdir = os.path.join(FLAGS.model_dir, 'eval') 305 | print('Writing to', logdir) 306 | last_eval_time = time.time() 307 | with tf.summary.FileWriter( 308 | logdir=logdir, 309 | flush_secs=30) as writer: 310 | while True: 311 | global_steps = get_sorted_global_steps() 312 | last_eval_step = get_last_eval_step() 313 | global_steps = [step for step in global_steps if step > last_eval_step] 314 | if len(global_steps) == 0: 315 | curr_time = time.time() 316 | if curr_time - last_eval_time >= 10 * 60: 317 | print("Nothing else to evaluate. Exiting") 318 | break 319 | else: 320 | print("Waiting for 2 minutes") 321 | time.sleep(2 * 60) 322 | continue 323 | 324 | evaluate_next = global_steps[0] 325 | print("Models to be evaluated", global_steps) 326 | print("Attempting to evaluate now, Step %d" % evaluate_next) 327 | try: 328 | compute_metrics(estimator, evaluate_next, False, writer) # is_training=False; writer is the summary_writer 329 | save_last_eval_step(evaluate_next) 330 | last_eval_time = time.time() 331 | except Exception as e: 332 | print("*" * 5, "Failed to evaluate Step %d" % evaluate_next) 333 | traceback.print_exc() 334 | 335 | 336 | if __name__ == '__main__': 337 | absl.app.run(main) 338 | -------------------------------------------------------------------------------- /flags.py: -------------------------------------------------------------------------------- 1 | import absl 2 | 3 | # gb_limit = int(3e11 / 8) # 300 gb divided into 8 processes 4 | # resource.setrlimit(resource.RLIMIT_AS, (gb_limit, gb_limit)) 5 | 6 | # Define all necessary hyperparameters as flags 7 | flags = absl.flags 8 | 9 | # The same flag is used for the various kinds of l1 parameters 10 | # used in the different l1 weighting schemes 11 | flags.DEFINE_float(name='l1_parameter', 12 | help='The value of the l1_parameter', 13 | default=0.0) 14 | flags.DEFINE_float(name='zero_threshold', 15 | help='Threshold below which values will be considered zero', 16 | default=1e-8) 17 | flags.DEFINE_float(name='learning_rate', help='Learning rate', default=1e-2) 18 | flags.DEFINE_float(name='momentum', 19 | help='Momentum value if using momentum optimizer', 20 | default=0.7) 21 | flags.DEFINE_float(name='l1_p_steps', 22 | help='Number of steps to reach maximum weight', 23 | default=5e4) 24 | 25 | flags.DEFINE_integer(name='batch_size', help='Batch size', default=64) 26 | # flags.DEFINE_integer(name='num_epochs', help='Number of training epochs', default=10000) 27 | # Number of classes for Megaface should be 604854 + 1 28 | flags.DEFINE_integer(name='num_classes', help='Number of training classes', default=1001) 29 | flags.DEFINE_integer(name='embedding_size', 30 | help='Embedding size', 31 | default=1024) 32 | flags.DEFINE_integer(name='test_size', help='Test size', default=2000) 33 | flags.DEFINE_integer(name='decay_step', help='Number of steps after which to decay lr by 10', default=None) 34 | flags.DEFINE_integer(name='max_steps', help='Number of steps to train', default=1000000) 35 | 36 | flags.DEFINE_string(name='mobilenet_checkpoint_path', 37 | help='Path to the checkpoint file for MobileNetV2', 38 | default=None) 39 | flags.DEFINE_string(name='model_dir', 40 | help='Path to the model directory for checkpoints and summaries', 41 | default='checkpoints/sparse_mobilenet/') 42 | flags.DEFINE_string(name='data_dir', 43 | help='Directory containing the training and validation data', 
default='../imagenet/tfrecords') 45 | flags.DEFINE_string(name='lfw_dir', 46 | help='Directory containing the LFW data', 47 | default='../msceleb1m') 48 | flags.DEFINE_string(name='optimizer', 49 | help="The optimizer to use. Options are 'sgd', 'mom' and 'adam'", 50 | default='sgd') 51 | flags.DEFINE_string(name='model', 52 | help='Options are mobilenet, mobilefacenet, metric_learning', 53 | default='mobilenet') 54 | # Flag for the l1 weighting scheme to be used. 55 | # Constant: constant l1 weight equal to l1_parameter 56 | # Dynamic_1: l1_parameter / EMA(l1_norm) 57 | # Dynamic_2: l1_weight += \lambda * (c - cl_loss) 58 | flags.DEFINE_string(name='l1_weighing_scheme', 59 | help='constant, dynamic_1, dynamic_2, dynamic_3', 60 | default=None) 61 | flags.DEFINE_string(name='sparsity_type', 62 | help='Options are l1_norm, flops_sur', 63 | default=None) 64 | flags.DEFINE_string(name='final_activation', 65 | help='The activation to use in the final layer producing the embeddings', 66 | default=None) 67 | flags.DEFINE_string(name='megaface_dir', 68 | help='Root directory of the megaface tfrecords', 69 | default='../megaface_distractors/tfrecords_official/') 70 | flags.DEFINE_string(name='facescrub_dir', 71 | help='Root directory of the facescrub tfrecords', 72 | default='../facescrub/tfrecords_official/') 73 | # flags.DEFINE_string(name='megaface_list', 74 | # help='List of megaface images', 75 | # default='../megaface_distractors/templatelists/megaface_features_list.json_10000_1') 76 | # flags.DEFINE_string(name='facescrub_list', 77 | # help='List of facescrub images', 78 | # default='../megaface_distractors/templatelists/facescrub_features_list.json') 79 | 80 | 81 | flags.DEFINE_boolean(name='restore_last_layer', 82 | help='If True, the last layer will be restored when warm starting from ' 83 | 'checkpoint. 
If checkpoint has a different number of output classes, ' 84 | 'then set to False.', 85 | default=True) 86 | flags.DEFINE_boolean(name='evaluate', 87 | help='If true then evaluates model while training', 88 | default=False) 89 | 90 | # Used by the evaluation scripts 91 | flags.DEFINE_boolean(name='debug', 92 | help='If True then inference will be stopped at 10000 steps', 93 | default=False) 94 | -------------------------------------------------------------------------------- /get_embeddings_reranking.py: -------------------------------------------------------------------------------- 1 | import tensorflow as tf 2 | import model 3 | import pprint 4 | import time 5 | from PIL import Image 6 | import absl 7 | import sys 8 | import os 9 | import numpy as np 10 | import horovod.tensorflow as hvd 11 | from tensorflow.core.framework.summary_pb2 import Summary 12 | from tensorflow import logging 13 | import shutil 14 | import h5py 15 | import json 16 | 17 | # Define all necessary hyperparameters as flags 18 | flags = absl.flags 19 | 20 | flags.DEFINE_float(name='zero_threshold', 21 | help='Threshold below which values will be considered zero', 22 | default=1e-8) 23 | flags.DEFINE_integer(name='batch_size', help='Batch size', default=64) 24 | flags.DEFINE_integer(name='num_classes', help='Number of classes', default=9001) 25 | flags.DEFINE_integer(name='num_epochs', help='Number of epochs', default=1) 26 | flags.DEFINE_integer(name='embedding_size', 27 | help='Embedding size', 28 | default=1024) 29 | 30 | flags.DEFINE_string(name='model_dir', 31 | help='Path to the model directory for checkpoints and summaries', 32 | default=None) 33 | flags.DEFINE_string(name='data_dir', 34 | help='Directory containing the validation data', 35 | default=None) 36 | flags.DEFINE_string(name='model', 37 | help='Options are mobilenet and mobilefacenet', 38 | default=None) 39 | flags.DEFINE_string(name='output', 40 | help='The output filename', 41 | default=None) 42 | flags.DEFINE_string(name='final_activation', 43 | help='The activation to use in the final layer producing the embeddings', 44 | default=None) 45 | flags.DEFINE_string(name='filelist', 46 | help='The list of evaluation files', 47 | default=None) 48 | flags.DEFINE_string(name='prefix', 49 | help='appended prefix for the hdf5 file', 50 | default=None) 51 | 52 | flags.DEFINE_boolean(name='debug', 53 | help='If True, inference will be stopped at 10000 steps', 54 | default=True) 55 | flags.DEFINE_boolean(name='is_training', 56 | help='If True, embeddings for the training data will be computed', 57 | default=False) 58 | 59 | 60 | FLAGS = flags.FLAGS 61 | 62 | 63 | def replace_extension(filename): 64 | pos = filename.rfind(b'.') 65 | if pos != -1: 66 | filename = filename[:pos+1] 67 | filename = filename + b'png' 68 | return filename 69 | 70 | def main(_): 71 | with open(FLAGS.filelist, 'r') as fin: 72 | filelist = json.load(fin) 73 | filelist = filelist['path'] 74 | filelist = [np.string_(os.path.basename(filename)) for filename in filelist] 75 | if FLAGS.prefix == 'facescrub': 76 | filelist = [filename.replace(b" ", b"_") for filename in filelist] 77 | filelist = [replace_extension(filename) for filename in filelist] 78 | 79 | file_dict = {filename:i for i, filename in enumerate(filelist)} 80 | num_samples = len(filelist) 81 | 82 | config = tf.estimator.RunConfig( 83 | model_dir=FLAGS.model_dir, 84 | keep_checkpoint_every_n_hours=1, 85 | save_summary_steps=50, 86 | save_checkpoints_secs=60 * 20, 87 | ) 88 | estimator = tf.estimator.Estimator( 89 | 
model_fn=model.model_fn, 90 | config=config 91 | ) 92 | 93 | predictions = estimator.predict( 94 | input_fn=lambda: model.imagenet_iterator(is_training=FLAGS.is_training), 95 | yield_single_examples=True, 96 | predict_keys=["true_labels", "true_label_texts", "small_embeddings", "filename"] 97 | ) 98 | 99 | if not os.path.exists('embeddings_reranking'): 100 | os.mkdir('embeddings_reranking') 101 | 102 | if FLAGS.output is None: 103 | output_file = FLAGS.prefix + "_" + os.path.basename(FLAGS.model_dir.rstrip('/')) + '.hdf5' 104 | else: 105 | output_file = FLAGS.output 106 | output_file = os.path.join('embeddings_reranking', output_file) 107 | if os.path.exists(output_file): 108 | print('%s exists. Exiting' % output_file) 109 | sys.exit() 110 | 111 | print("*" * 10, "Writing to", output_file) 112 | 113 | sparsity = [] 114 | embeddings = [] 115 | labels = [] 116 | idx = -np.ones(num_samples, dtype=np.int32) 117 | 118 | for step, prediction in enumerate(predictions): 119 | embedding = prediction["small_embeddings"] 120 | true_label = prediction["true_labels"][0] 121 | true_label_text = np.string_(prediction["true_label_texts"]) 122 | filename = np.string_(prediction["filename"]) 123 | 124 | idxs = np.where(np.abs(embedding) >= FLAGS.zero_threshold)[0] 125 | sparsity.append(len(idxs)) 126 | 127 | embeddings.append(embedding) 128 | labels.append(true_label) 129 | 130 | id = file_dict[filename] 131 | assert(idx[id] == -1) 132 | idx[id] = step 133 | 134 | if step % 10000 == 0: 135 | print('Step %d Av. Sparsity %f' % (step, np.mean(sparsity))) 136 | sys.stdout.flush() 137 | if FLAGS.debug and step >= 10000: 138 | break 139 | 140 | embeddings = np.asarray(embeddings) 141 | embeddings = embeddings[idx] 142 | print("embeddings shape", embeddings.shape) 143 | 144 | labels = np.asarray(labels) 145 | labels = labels[idx] 146 | print("labels shape", labels.shape) 147 | 148 | 149 | h5file = h5py.File(output_file, mode='w') 150 | h5file.create_dataset(name='embedding', data=embeddings) 151 | h5file.create_dataset(name='label', data=labels) 152 | h5file.close() 153 | print("Done writing to", output_file) 154 | 155 | 156 | if __name__ == '__main__': 157 | absl.app.run(main) 158 | -------------------------------------------------------------------------------- /imagenet_preprocessing.py: -------------------------------------------------------------------------------- 1 | # Copyright 2016 The TensorFlow Authors. All Rights Reserved. 2 | # 3 | # Licensed under the Apache License, Version 2.0 (the "License"); 4 | # you may not use this file except in compliance with the License. 5 | # You may obtain a copy of the License at 6 | # 7 | # http://www.apache.org/licenses/LICENSE-2.0 8 | # 9 | # Unless required by applicable law or agreed to in writing, software 10 | # distributed under the License is distributed on an "AS IS" BASIS, 11 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 12 | # See the License for the specific language governing permissions and 13 | # limitations under the License. 14 | # ============================================================================== 15 | """Provides utilities to preprocess images. 16 | 17 | Training images are sampled using the provided bounding boxes, and subsequently 18 | cropped to the sampled bounding box. Images are additionally flipped randomly, 19 | then resized to the target output size (without aspect-ratio preservation). 20 | 21 | Images used during evaluation are resized (with aspect-ratio preservation) and 22 | centrally cropped. 
23 | 24 | All images undergo mean color subtraction. 25 | 26 | Note that these steps are colloquially referred to as "ResNet preprocessing," 27 | and they differ from "VGG preprocessing," which does not use bounding boxes 28 | and instead does an aspect-preserving resize followed by random crop during 29 | training. (These both differ from "Inception preprocessing," which introduces 30 | color distortion steps.) 31 | 32 | """ 33 | 34 | from __future__ import absolute_import 35 | from __future__ import division 36 | from __future__ import print_function 37 | 38 | import tensorflow as tf 39 | 40 | _R_MEAN = 123.68 41 | _G_MEAN = 116.78 42 | _B_MEAN = 103.94 43 | _CHANNEL_MEANS = [_R_MEAN, _G_MEAN, _B_MEAN] 44 | 45 | # The lower bound for the smallest side of the image for aspect-preserving 46 | # resizing. For example, if an image is 500 x 1000, it will be resized to 47 | # _RESIZE_MIN x (_RESIZE_MIN * 2). 48 | _RESIZE_MIN = 256 49 | _CROP_SIZE = 227 50 | _MIN_OBJECT_COVERED = 0.1 51 | _ASPECT_RATIO_RANGE = [0.75, 1.33] 52 | _AREA_RANGE = [0.05, 1.0] 53 | 54 | 55 | def _decode_crop_and_flip(image_buffer, bbox, num_channels): 56 | """Crops the given image to a random part of the image, and randomly flips. 57 | 58 | We use the fused decode_and_crop op, which performs better than the two ops 59 | used separately in series, but note that this requires that the image be 60 | passed in as an un-decoded string Tensor. 61 | 62 | Args: 63 | image_buffer: scalar string Tensor representing the raw JPEG image buffer. 64 | bbox: 3-D float Tensor of bounding boxes arranged [1, num_boxes, coords] 65 | where each coordinate is [0, 1) and the coordinates are arranged as 66 | [ymin, xmin, ymax, xmax]. 67 | num_channels: Integer depth of the image buffer for decoding. 68 | 69 | Returns: 70 | 3-D tensor with cropped image. 71 | 72 | """ 73 | # A large fraction of image datasets contain a human-annotated bounding box 74 | # delineating the region of the image containing the object of interest. We 75 | # choose to create a new bounding box for the object which is a randomly 76 | # distorted version of the human-annotated bounding box that obeys an 77 | # allowed range of aspect ratios, sizes and overlap with the human-annotated 78 | # bounding box. If no box is supplied, then we assume the bounding box is 79 | # the entire image. 80 | sample_distorted_bounding_box = tf.image.sample_distorted_bounding_box( 81 | tf.image.extract_jpeg_shape(image_buffer), 82 | bounding_boxes=bbox, 83 | min_object_covered=_MIN_OBJECT_COVERED, 84 | aspect_ratio_range=_ASPECT_RATIO_RANGE, 85 | area_range=_AREA_RANGE, 86 | max_attempts=100, 87 | use_image_if_no_bounding_boxes=True) 88 | bbox_begin, bbox_size, _ = sample_distorted_bounding_box 89 | 90 | # Reassemble the bounding box in the format the crop op requires. 91 | offset_y, offset_x, _ = tf.unstack(bbox_begin) 92 | target_height, target_width, _ = tf.unstack(bbox_size) 93 | crop_window = tf.stack([offset_y, offset_x, target_height, target_width]) 94 | 95 | # Use the fused decode and crop op here, which is faster than each in series. 96 | cropped = tf.image.decode_and_crop_jpeg( 97 | image_buffer, crop_window, channels=num_channels) 98 | 99 | # Flip to add a little more random distortion in. 100 | cropped = tf.image.random_flip_left_right(cropped) 101 | return cropped 102 | 103 | 104 | def _central_crop(image, crop_height, crop_width): 105 | """Performs central crops of the given image list. 
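Despite the name, this operates on a single 3-D image tensor; the crop
window is centered by splitting the excess height and width evenly
between the two sides of each spatial dimension.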
106 | 107 | Args: 108 | image: a 3-D image tensor 109 | crop_height: the height of the image following the crop. 110 | crop_width: the width of the image following the crop. 111 | 112 | Returns: 113 | 3-D tensor with cropped image. 114 | """ 115 | shape = tf.shape(image) 116 | height, width = shape[0], shape[1] 117 | 118 | amount_to_be_cropped_h = (height - crop_height) 119 | crop_top = amount_to_be_cropped_h // 2 120 | amount_to_be_cropped_w = (width - crop_width) 121 | crop_left = amount_to_be_cropped_w // 2 122 | return tf.slice( 123 | image, [crop_top, crop_left, 0], [crop_height, crop_width, -1]) 124 | 125 | 126 | def _mean_image_subtraction(image, means, num_channels): 127 | """Subtracts the given means from each image channel. 128 | 129 | For example: 130 | means = [123.68, 116.779, 103.939] 131 | image = _mean_image_subtraction(image, means) 132 | 133 | Note that the rank of `image` must be known. 134 | 135 | Args: 136 | image: a tensor of size [height, width, C]. 137 | means: a C-vector of values to subtract from each channel. 138 | num_channels: number of color channels in the image that will be distorted. 139 | 140 | Returns: 141 | the centered image. 142 | 143 | Raises: 144 | ValueError: If the rank of `image` is unknown, if `image` has a rank other 145 | than three or if the number of channels in `image` doesn't match the 146 | number of values in `means`. 147 | """ 148 | if image.get_shape().ndims != 3: 149 | raise ValueError('Input must be of size [height, width, C>0]') 150 | 151 | if len(means) != num_channels: 152 | raise ValueError('len(means) must match the number of channels') 153 | 154 | # We have a 1-D tensor of means; convert to 3-D. 155 | means = tf.expand_dims(tf.expand_dims(means, 0), 0) 156 | 157 | return image - means 158 | 159 | 160 | def _smallest_size_at_least(height, width, resize_min): 161 | """Computes new shape with the smallest side equal to `smallest_side`. 162 | 163 | Computes new shape with the smallest side equal to `smallest_side` while 164 | preserving the original aspect ratio. 165 | 166 | Args: 167 | height: an int32 scalar tensor indicating the current height. 168 | width: an int32 scalar tensor indicating the current width. 169 | resize_min: A python integer or scalar `Tensor` indicating the size of 170 | the smallest side after resize. 171 | 172 | Returns: 173 | new_height: an int32 scalar tensor indicating the new height. 174 | new_width: an int32 scalar tensor indicating the new width. 175 | """ 176 | resize_min = tf.cast(resize_min, tf.float32) 177 | 178 | # Convert to floats to make subsequent calculations go smoothly. 179 | height, width = tf.cast(height, tf.float32), tf.cast(width, tf.float32) 180 | 181 | smaller_dim = tf.minimum(height, width) 182 | scale_ratio = resize_min / smaller_dim 183 | 184 | # Convert back to ints to make heights and widths that TF ops will accept. 185 | new_height = tf.cast(height * scale_ratio, tf.int32) 186 | new_width = tf.cast(width * scale_ratio, tf.int32) 187 | 188 | return new_height, new_width 189 | 190 | 191 | def _aspect_preserving_resize(image, resize_min): 192 | """Resize images preserving the original aspect ratio. 193 | 194 | Args: 195 | image: A 3-D image `Tensor`. 196 | resize_min: A python integer or scalar `Tensor` indicating the size of 197 | the smallest side after resize. 198 | 199 | Returns: 200 | resized_image: A 3-D tensor containing the resized image. 
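  For example, resize_min = 256 maps a 500 x 1000 image to 256 x 512.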
201 |   """
202 |   shape = tf.shape(image)
203 |   height, width = shape[0], shape[1]
204 | 
205 |   new_height, new_width = _smallest_size_at_least(height, width, resize_min)
206 | 
207 |   return _resize_image(image, new_height, new_width)
208 | 
209 | 
210 | def _resize_image(image, height, width):
211 |   """Simple wrapper around tf.image.resize_images.
212 | 
213 |   This is primarily to make sure we use the same `ResizeMethod` and other
214 |   details each time.
215 | 
216 |   Args:
217 |     image: A 3-D image `Tensor`.
218 |     height: The target height for the resized image.
219 |     width: The target width for the resized image.
220 | 
221 |   Returns:
222 |     resized_image: A 3-D tensor containing the resized image. The first two
223 |       dimensions have the shape [height, width].
224 |   """
225 |   return tf.image.resize_images(
226 |       image, [height, width], method=tf.image.ResizeMethod.BILINEAR,
227 |       align_corners=False)
228 | 
229 | 
230 | def preprocess_image(image_buffer, bbox, output_height, output_width,
231 |                      num_channels, is_training=False, fixed_crop=False):
232 |   """Preprocesses the given image.
233 | 
234 |   Preprocessing includes decoding, cropping, and resizing for both training
235 |   and eval images. Training preprocessing, however, introduces some random
236 |   distortion of the image to improve accuracy.
237 | 
238 |   Args:
239 |     image_buffer: scalar string Tensor representing the raw JPEG image buffer.
240 |     bbox: 3-D float Tensor of bounding boxes arranged [1, num_boxes, coords]
241 |       where each coordinate is [0, 1) and the coordinates are arranged as
242 |       [ymin, xmin, ymax, xmax].
243 |     output_height: The height of the image after preprocessing.
244 |     output_width: The width of the image after preprocessing.
245 |     num_channels: Integer depth of the image buffer for decoding.
246 |     is_training: `True` if we're preprocessing the image for training and
247 |       `False` otherwise.
    fixed_crop: If True, bypass the crop logic below and simply resize the
      decoded image to _RESIZE_MIN x _RESIZE_MIN.
248 | 
249 |   Returns:
250 |     A preprocessed image.
251 |   """
252 |   if fixed_crop:
253 |     image = tf.image.decode_jpeg(image_buffer, channels=num_channels)
254 |     image = tf.image.resize_images(image, [_RESIZE_MIN, _RESIZE_MIN])
255 |     # if is_training:
256 |     #   image = tf.image.random_flip_left_right(image)
257 |     #   image = tf.random_crop(image, size=[output_height, output_width, 3])
258 |     # else:
259 |     #   image = _central_crop(image, output_height, output_width)
260 |     image = tf.cast(image, tf.float32)
261 |   elif is_training:
262 |     # For training, we want to randomize some of the distortions.
263 |     image = _decode_crop_and_flip(image_buffer, bbox, num_channels)
264 |     image = _resize_image(image, output_height, output_width)
265 |   else:
266 |     # For validation, we want to decode, resize, then just crop the middle.
267 | image = tf.image.decode_jpeg(image_buffer, channels=num_channels) 268 | image = _aspect_preserving_resize(image, _RESIZE_MIN) 269 | image = _central_crop(image, output_height, output_width) 270 | 271 | image.set_shape([output_height, output_width, num_channels]) 272 | 273 | # return _mean_image_subtraction(image, _CHANNEL_MEANS, num_channels) 274 | # Return the rescaled images instead of centered images as above 275 | return image / 128.0 - 1.0 276 | -------------------------------------------------------------------------------- /input_pipeline.py: -------------------------------------------------------------------------------- 1 | """ 2 | The following code has been adapted from 3 | https://github.com/tensorflow/models/blob/master/official/resnet/imagenet_main.py 4 | """ 5 | from __future__ import absolute_import 6 | from __future__ import division 7 | from __future__ import print_function 8 | 9 | import os 10 | import glob 11 | 12 | from absl import app as absl_app 13 | from absl import flags 14 | import tensorflow as tf # pylint: disable=g-bad-import-order 15 | 16 | import imagenet_preprocessing 17 | 18 | _DEFAULT_IMAGE_SIZE = 224 19 | _NUM_CHANNELS = 3 20 | # _NUM_CLASSES = 1001 21 | 22 | # _NUM_IMAGES = { 23 | # 'train': 1281167, 24 | # 'validation': 50000, 25 | # } 26 | 27 | _NUM_TRAIN_FILES = 1024 28 | _SHUFFLE_BUFFER_SIZE = 1000 29 | 30 | # DATASET_NAME = 'ImageNet' 31 | 32 | ############################################################################### 33 | # Data processing 34 | ############################################################################### 35 | def get_filenames(is_training, data_dir): 36 | """Return filenames for dataset.""" 37 | if is_training: 38 | # return [ 39 | # os.path.join(data_dir, 'train-%05d-of-01024' % i) 40 | # for i in range(_NUM_TRAIN_FILES)] 41 | return glob.glob(os.path.join(data_dir, 'train-*')) 42 | else: 43 | # return [ 44 | # os.path.join(data_dir, 'validation-%05d-of-00128' % i) 45 | # for i in range(128)] 46 | return glob.glob(os.path.join(data_dir, 'validation-*')) 47 | 48 | 49 | def _parse_example_proto(example_serialized): 50 | """Parses an Example proto containing a training example of an image. 51 | 52 | The output of the build_image_data.py image preprocessing script is a dataset 53 | containing serialized Example protocol buffers. Each Example proto contains 54 | the following fields (values are included as examples): 55 | 56 | image/height: 462 57 | image/width: 581 58 | image/colorspace: 'RGB' 59 | image/channels: 3 60 | image/class/label: 615 61 | image/class/synset: 'n03623198' 62 | image/class/text: 'knee pad' 63 | image/object/bbox/xmin: 0.1 64 | image/object/bbox/xmax: 0.9 65 | image/object/bbox/ymin: 0.2 66 | image/object/bbox/ymax: 0.6 67 | image/object/bbox/label: 615 68 | image/format: 'JPEG' 69 | image/filename: 'ILSVRC2012_val_00041207.JPEG' 70 | image/encoded: 71 | 72 | Args: 73 | example_serialized: scalar Tensor tf.string containing a serialized 74 | Example protocol buffer. 75 | 76 | Returns: 77 | image_buffer: Tensor tf.string containing the contents of a JPEG file. 78 | label: Tensor tf.int32 containing the label. 79 | bbox: 3-D float Tensor of bounding boxes arranged [1, num_boxes, coords] 80 | where each coordinate is [0, 1) and the coordinates are arranged as 81 | [ymin, xmin, ymax, xmax]. 82 | """ 83 | # Dense features in Example proto. 
84 | feature_map = { 85 | 'image/encoded': tf.FixedLenFeature([], dtype=tf.string, 86 | default_value=''), 87 | 'image/class/label': tf.FixedLenFeature([1], dtype=tf.int64, 88 | default_value=-1), 89 | 'image/class/text': tf.FixedLenFeature([], dtype=tf.string, 90 | default_value=''), 91 | 'image/filename': tf.FixedLenFeature([], dtype=tf.string, 92 | default_value=''), 93 | } 94 | sparse_float32 = tf.VarLenFeature(dtype=tf.float32) 95 | # Sparse features in Example proto. 96 | feature_map.update( 97 | {k: sparse_float32 for k in ['image/object/bbox/xmin', 98 | 'image/object/bbox/ymin', 99 | 'image/object/bbox/xmax', 100 | 'image/object/bbox/ymax']}) 101 | 102 | features = tf.parse_single_example(example_serialized, feature_map) 103 | label = tf.cast(features['image/class/label'], dtype=tf.int32) 104 | label_text = features['image/class/text'] 105 | filename = features['image/filename'] 106 | 107 | xmin = tf.expand_dims(features['image/object/bbox/xmin'].values, 0) 108 | ymin = tf.expand_dims(features['image/object/bbox/ymin'].values, 0) 109 | xmax = tf.expand_dims(features['image/object/bbox/xmax'].values, 0) 110 | ymax = tf.expand_dims(features['image/object/bbox/ymax'].values, 0) 111 | 112 | # Note that we impose an ordering of (y, x) just to make life difficult. 113 | bbox = tf.concat([ymin, xmin, ymax, xmax], 0) 114 | 115 | # Force the variable number of bounding boxes into the shape 116 | # [1, num_boxes, coords]. 117 | bbox = tf.expand_dims(bbox, 0) 118 | bbox = tf.transpose(bbox, [0, 2, 1]) 119 | 120 | return features['image/encoded'], label, bbox, label_text, filename 121 | 122 | 123 | def parse_record(raw_record, is_training, return_label_text=True, fixed_crop=False): 124 | """Parses a record containing a training example of an image. 125 | 126 | The input record is parsed into a label and image, and the image is passed 127 | through preprocessing steps (cropping, flipping, and so on). 128 | 129 | Args: 130 | raw_record: scalar Tensor tf.string containing a serialized 131 | Example protocol buffer. 132 | is_training: A boolean denoting whether the input is for training. 133 | 134 | Returns: 135 | Tuple with processed image tensor and one-hot-encoded label tensor. 136 | """ 137 | image_buffer, label, bbox, label_text, filename = _parse_example_proto(raw_record) 138 | 139 | image = imagenet_preprocessing.preprocess_image( 140 | image_buffer=image_buffer, 141 | bbox=bbox, 142 | output_height=_DEFAULT_IMAGE_SIZE, 143 | output_width=_DEFAULT_IMAGE_SIZE, 144 | num_channels=_NUM_CHANNELS, 145 | is_training=is_training, 146 | fixed_crop=fixed_crop) 147 | 148 | if return_label_text: 149 | return image, label, label_text, filename 150 | else: 151 | return image, label 152 | 153 | 154 | def process_record_dataset(dataset, is_training, batch_size, shuffle_buffer, 155 | parse_record_fn, num_epochs=1, return_label_text=True, 156 | filter_fn=None, fixed_crop=False): 157 | """Given a Dataset with raw records, return an iterator over the records. 158 | 159 | Args: 160 | dataset: A Dataset representing raw records 161 | is_training: A boolean denoting whether the input is for training. 162 | batch_size: The number of samples per batch. 163 | shuffle_buffer: The buffer size to use when shuffling records. A larger 164 | value results in better randomness, but smaller values reduce startup 165 | time and use less memory. 166 | parse_record_fn: A function that takes a raw record and returns the 167 | corresponding (image, label) pair. 168 | num_epochs: The number of epochs to repeat the dataset. 
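    return_label_text: If True, the label text and filename are returned with
      each example (see parse_record above).
    filter_fn: Optional predicate passed to dataset.filter for dropping records.
    fixed_crop: Forwarded to the image preprocessing step; see
      imagenet_preprocessing.preprocess_image.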
169 | 
170 |   Returns:
171 |     Dataset of (image, label) pairs ready for iteration.
172 |   """
173 | 
174 |   # We prefetch a batch at a time. This can help smooth out the time taken to
175 |   # load input files as we go through shuffling and processing.
176 |   dataset = dataset.prefetch(buffer_size=batch_size)
177 |   dataset = dataset.map(
178 |       lambda value: parse_record_fn(value, is_training, return_label_text, fixed_crop))
179 |   if filter_fn is not None:
180 |     dataset = dataset.filter(filter_fn)
181 | 
182 |   # Shuffle the records. Note that we shuffle before repeating to ensure
183 |   # that the shuffling respects epoch boundaries.
184 |   dataset = dataset.shuffle(buffer_size=shuffle_buffer)
185 | 
186 |   # If we are training over multiple epochs before evaluating, repeat the
187 |   # dataset for the appropriate number of epochs.
188 |   dataset = dataset.repeat(num_epochs)
189 |   dataset = dataset.batch(batch_size)
190 | 
191 |   # Operations between the final prefetch and the get_next call to the iterator
192 |   # will happen synchronously during run time. We prefetch here again to
193 |   # background all of the above processing work and keep it out of the
194 |   # critical training path. Setting buffer_size to tf.contrib.data.AUTOTUNE
195 |   # allows DistributionStrategies to adjust how many batches to fetch based
196 |   # on how many devices are present.
197 |   dataset = dataset.prefetch(buffer_size=tf.contrib.data.AUTOTUNE)
198 | 
199 |   return dataset
200 | 
201 | 
202 | def input_fn(is_training, data_dir, batch_size, num_epochs=1,
203 |              return_label_text=True, filter_fn=None, fixed_crop=False):
204 |   """Input function which provides batches for train or eval.
205 | 
206 |   Args:
207 |     is_training: A boolean denoting whether the input is for training.
208 |     data_dir: The directory containing the input data.
209 |     batch_size: The number of samples per batch.
210 |     num_epochs: The number of epochs to repeat the dataset.
211 | 
212 |   Returns:
213 |     A Dataset object that can be used for iteration over the input data.
214 |   """
215 |   filenames = get_filenames(is_training, data_dir)
216 |   if len(filenames) == 0:
217 |     raise ValueError("No files found in %s" % data_dir)
218 | 
219 |   dataset = tf.data.Dataset.from_tensor_slices(filenames)
220 | 
221 |   # if is_training:
222 |   # Shuffle the input files
223 |   dataset = dataset.shuffle(buffer_size=_NUM_TRAIN_FILES)
224 | 
225 |   # Convert to individual records.
226 |   # cycle_length = 30 means 30 files will be read and deserialized in parallel.
227 |   # This number is low enough to not cause too much contention on small systems
228 |   # but high enough to provide the benefits of parallelization. You may want
229 |   # to increase this number if you have a large number of CPU cores.
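  # NOTE: tf.contrib.data.parallel_interleave was moved to tf.data.experimental
  # in later TF 1.x releases; on TF 2.x, Dataset.interleave(...,
  # num_parallel_calls=...) is the closest replacement.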
230 | dataset = dataset.apply(tf.contrib.data.parallel_interleave( 231 | tf.data.TFRecordDataset, cycle_length=30)) 232 | 233 | return process_record_dataset( 234 | dataset, is_training, batch_size, _SHUFFLE_BUFFER_SIZE, parse_record, 235 | num_epochs, return_label_text, filter_fn, fixed_crop 236 | ) 237 | -------------------------------------------------------------------------------- /megaface_embeddings.py: -------------------------------------------------------------------------------- 1 | from __future__ import absolute_import 2 | from __future__ import division 3 | from __future__ import print_function 4 | 5 | import tensorflow as tf 6 | import model 7 | import pprint 8 | import time 9 | from PIL import Image 10 | import absl 11 | import sys 12 | import os 13 | import numpy as np 14 | import horovod.tensorflow as hvd 15 | from tensorflow.core.framework.summary_pb2 import Summary 16 | from tensorflow import logging 17 | import shutil 18 | import glob 19 | from sklearn.model_selection import KFold 20 | import traceback 21 | import imagenet_preprocessing 22 | import pickle 23 | import matio 24 | import json 25 | import multiprocessing 26 | import tqdm 27 | 28 | 29 | flags = absl.flags 30 | flags.DEFINE_float(name='zero_threshold', 31 | help='Threshold below which values will be considered zero', 32 | default=1e-8) 33 | 34 | flags.DEFINE_integer(name='batch_size', help='Batch size', default=64) 35 | flags.DEFINE_integer(name='num_classes', help='Number of classes', default=9001) 36 | flags.DEFINE_integer(name='num_epochs', help='Number of epochs', default=1) 37 | flags.DEFINE_integer(name='embedding_size', 38 | help='Embedding size', 39 | default=1024) 40 | flags.DEFINE_integer(name='num_threads_preprocess', 41 | help='Number of threads for tensorflow preprocess pipeline', 42 | default=50) 43 | flags.DEFINE_integer(name='num_threads_save', 44 | help='Number of threads for saving mat files', 45 | default=50) 46 | flags.DEFINE_integer(name='step', 47 | help='The global step of the checkpoint to load', 48 | default=None) 49 | 50 | flags.DEFINE_string(name='model_dir', 51 | help='Path to the model directory for checkpoints and summaries', 52 | default='checkpoints/kubernetes/sparse_imagenet10k_dynamic_2_mom/') 53 | flags.DEFINE_string(name='data_dir', 54 | help='Directory containing the validation data', 55 | default='../imagenet10k/tfrecords/val') 56 | flags.DEFINE_string(name='filelist', 57 | help='JSON file containing the list of files', 58 | default=None) 59 | flags.DEFINE_string(name='model', 60 | help='Options are mobilenet and mobilefacenet', 61 | default='mobilenet') 62 | flags.DEFINE_string(name='output_suffix', 63 | help='The output filename', 64 | default='embeddings') 65 | flags.DEFINE_string(name='final_activation', 66 | help='The activation to use in the final layer producing the embeddings', 67 | default=None) 68 | 69 | flags.DEFINE_boolean(name='debug', 70 | help='If True then inference will be stopped at 10000 steps', 71 | default=True) 72 | flags.DEFINE_boolean(name='is_training', 73 | help='If True then embeddings for the training data will be computed', 74 | default=False) 75 | 76 | DENSE_DIR = "checkpoints/mobilenet_retrained/" 77 | FLAGS = flags.FLAGS 78 | 79 | def input_fn(filelist): 80 | dummy_labels = tf.constant([1] * len(filelist)) 81 | filelist = tf.constant(filelist) 82 | dataset = tf.data.Dataset.from_tensor_slices((filelist, dummy_labels)) 83 | 84 | def _parse_function(filename, label): 85 | image_string = tf.read_file(filename) 86 | image = 
tf.image.decode_jpeg(image_string, channels=3) 87 | image = tf.image.resize_images(image, [112, 112]) 88 | image = tf.cast(image, dtype=tf.float32) 89 | image = image / 128.0 - 1.0 90 | return image, filename, label 91 | 92 | dataset = dataset.map(_parse_function, 93 | num_parallel_calls=FLAGS.num_threads_preprocess) 94 | dataset = dataset.prefetch(buffer_size=tf.contrib.data.AUTOTUNE) 95 | dataset = dataset.batch(FLAGS.batch_size) 96 | dataset = dataset.prefetch(buffer_size=tf.contrib.data.AUTOTUNE) 97 | 98 | iterator = dataset.make_one_shot_iterator() 99 | image, filename, label = iterator.get_next() 100 | feature_dict = { 101 | "features": image, 102 | "labels": label, 103 | "label_texts": label, 104 | "filename": filename 105 | } 106 | 107 | return feature_dict, None 108 | 109 | 110 | def get_global_step(ckpt_path=None): 111 | if ckpt_path is None: 112 | ckpt_path = tf.train.latest_checkpoint(FLAGS.model_dir) 113 | base = os.path.basename(ckpt_path) 114 | global_step = int(base.split('-')[1]) 115 | return global_step 116 | 117 | 118 | def main(_): 119 | filelist = json.load(open(FLAGS.filelist, 'r'))["path"] 120 | filelist = [os.path.join(FLAGS.data_dir, f) for f in filelist] 121 | 122 | config = tf.ConfigProto() 123 | config.gpu_options.allow_growth = True 124 | config = tf.estimator.RunConfig( 125 | model_dir=FLAGS.model_dir, 126 | session_config=config 127 | ) 128 | 129 | estimator = tf.estimator.Estimator( 130 | model_fn=model.model_fn, 131 | config=config 132 | ) 133 | 134 | predictions = estimator.predict( 135 | input_fn=lambda: input_fn(filelist), 136 | yield_single_examples=True, 137 | predict_keys=["small_embeddings", "filename"], 138 | checkpoint_path=\ 139 | tf.train.latest_checkpoint(FLAGS.model_dir) 140 | if FLAGS.step is None 141 | else 'model.ckpt-%d' % FLAGS.step 142 | ) 143 | 144 | P = multiprocessing.Pool(FLAGS.num_threads_save) 145 | step = 0 146 | avg = 0.0 147 | 148 | for prediction in tqdm.tqdm(predictions): 149 | embedding = prediction["small_embeddings"] 150 | filename = prediction["filename"] 151 | filename = filename.decode() + FLAGS.output_suffix 152 | P.apply_async(matio.save_mat, (filename, embedding)) 153 | step += 1 154 | avg = (1 - 1/step) * avg + np.sum(np.abs(embedding) >= 1e-8) / step 155 | if step % 10000 == 0: 156 | print('Step %d\tSparsity %f' % (step, avg)) 157 | 158 | print('Step %d\tSparsity %f' % (step, avg)) 159 | 160 | P.close() 161 | P.join() 162 | 163 | 164 | if __name__=='__main__': 165 | absl.app.run(main) -------------------------------------------------------------------------------- /megaface_embeddings_hdf.py: -------------------------------------------------------------------------------- 1 | from __future__ import absolute_import 2 | from __future__ import division 3 | from __future__ import print_function 4 | 5 | import tensorflow as tf 6 | import model 7 | import pprint 8 | import time 9 | from PIL import Image 10 | import absl 11 | import sys 12 | import os 13 | import numpy as np 14 | import horovod.tensorflow as hvd 15 | from tensorflow.core.framework.summary_pb2 import Summary 16 | from tensorflow import logging 17 | import shutil 18 | import glob 19 | from sklearn.model_selection import KFold 20 | import traceback 21 | import imagenet_preprocessing 22 | import pickle 23 | import matio 24 | import json 25 | import multiprocessing 26 | import tqdm 27 | import h5py 28 | 29 | 30 | flags = absl.flags 31 | flags.DEFINE_float(name='zero_threshold', 32 | help='Threshold below which values will be considered zero', 33 | default=1e-8) 
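# NOTE: the sparsity bookkeeping in main() below hardcodes a 1e-8 cutoff
# instead of reading this flag; keep the two in sync if either is changed.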
34 | 35 | flags.DEFINE_integer(name='batch_size', help='Batch size', default=64) 36 | flags.DEFINE_integer(name='num_classes', help='Number of classes', default=9001) 37 | flags.DEFINE_integer(name='num_epochs', help='Number of epochs', default=1) 38 | flags.DEFINE_integer(name='embedding_size', 39 | help='Embedding size', 40 | default=1024) 41 | flags.DEFINE_integer(name='num_threads_preprocess', 42 | help='Number of threads for tensorflow preprocess pipeline', 43 | default=50) 44 | flags.DEFINE_integer(name='num_threads_save', 45 | help='Number of threads for saving mat files', 46 | default=50) 47 | 48 | flags.DEFINE_string(name='model_dir', 49 | help='Path to the model directory for checkpoints and summaries', 50 | default='checkpoints/kubernetes/sparse_imagenet10k_dynamic_2_mom/') 51 | flags.DEFINE_string(name='data_dir', 52 | help='Directory containing the validation data', 53 | default='../imagenet10k/tfrecords/val') 54 | flags.DEFINE_string(name='filelist', 55 | help='JSON file containing the list of files', 56 | default=None) 57 | flags.DEFINE_string(name='model', 58 | help='Options are mobilenet and mobilefacenet', 59 | default='mobilenet') 60 | flags.DEFINE_string(name='output_file', 61 | help='The output filename', 62 | default='embeddings') 63 | flags.DEFINE_string(name='final_activation', 64 | help='The activation to use in the final layer producing the embeddings', 65 | default=None) 66 | 67 | flags.DEFINE_boolean(name='debug', 68 | help='If True then inference will be stopped at 10000 steps', 69 | default=True) 70 | flags.DEFINE_boolean(name='is_training', 71 | help='If True then embeddings for the training data will be computed', 72 | default=False) 73 | 74 | DENSE_DIR = "checkpoints/mobilenet_retrained/" 75 | FLAGS = flags.FLAGS 76 | 77 | def input_fn(filelist): 78 | dummy_labels = tf.constant([1] * len(filelist)) 79 | filelist = tf.constant(filelist) 80 | dataset = tf.data.Dataset.from_tensor_slices((filelist, dummy_labels)) 81 | 82 | base_dir = tf.constant(FLAGS.data_dir + '/') 83 | 84 | def _parse_function(filename, label): 85 | image_string = tf.read_file(tf.strings.join([base_dir, filename])) 86 | image = tf.image.decode_jpeg(image_string, channels=3) 87 | image = tf.image.resize_images(image, [112, 112]) 88 | image = tf.cast(image, dtype=tf.float32) 89 | image = image / 128.0 - 1.0 90 | return image, filename, label 91 | 92 | dataset = dataset.map(_parse_function, 93 | num_parallel_calls=FLAGS.num_threads_preprocess) 94 | dataset = dataset.prefetch(buffer_size=tf.contrib.data.AUTOTUNE) 95 | dataset = dataset.batch(FLAGS.batch_size) 96 | dataset = dataset.prefetch(buffer_size=tf.contrib.data.AUTOTUNE) 97 | 98 | iterator = dataset.make_one_shot_iterator() 99 | image, filename, label = iterator.get_next() 100 | feature_dict = { 101 | "features": image, 102 | "labels": label, 103 | "label_texts": label, 104 | "filename": filename 105 | } 106 | 107 | return feature_dict, None 108 | 109 | 110 | def get_global_step(ckpt_path=None): 111 | if ckpt_path is None: 112 | ckpt_path = tf.train.latest_checkpoint(FLAGS.model_dir) 113 | base = os.path.basename(ckpt_path) 114 | global_step = int(base.split('-')[1]) 115 | return global_step 116 | 117 | 118 | def main(_): 119 | filelist = json.load(open(FLAGS.filelist, 'r'))["path"] 120 | 121 | config = tf.ConfigProto() 122 | config.gpu_options.allow_growth = True 123 | config = tf.estimator.RunConfig( 124 | model_dir=FLAGS.model_dir, 125 | session_config=config 126 | ) 127 | 128 | estimator = tf.estimator.Estimator( 129 | 
model_fn=model.model_fn, 130 | config=config 131 | ) 132 | 133 | predictions = estimator.predict( 134 | input_fn=lambda: input_fn(filelist), 135 | yield_single_examples=True, 136 | predict_keys=["small_embeddings", "filename"], 137 | checkpoint_path=tf.train.latest_checkpoint(FLAGS.model_dir) 138 | ) 139 | 140 | data = h5py.File(os.path.join('embeddings', FLAGS.output_file), mode='w') 141 | inttype = h5py.special_dtype(vlen=np.int32) 142 | floattype = h5py.special_dtype(vlen=np.float32) 143 | 144 | maxlen = len(filelist) + 1 145 | embedding_vals = data.create_dataset(name='embedding_vals', 146 | shape=(maxlen,), dtype=floattype) 147 | embedding_idx = data.create_dataset(name='embedding_idx', 148 | shape=(maxlen,), dtype=inttype) 149 | embedding_dense = data.create_dataset(name='embedding_dense', 150 | shape=(maxlen,), dtype=floattype) 151 | filenames = data.create_dataset(name='filenames', shape=(maxlen,), dtype='S50') 152 | tot_embeddings = data.create_dataset(name='tot_len', shape=(1,), dtype=np.int32) 153 | 154 | sparsity = [] 155 | 156 | step = 0 157 | avg = 0.0 158 | for prediction in tqdm.tqdm(predictions): 159 | embedding = prediction["small_embeddings"] 160 | filename = prediction["filename"] 161 | 162 | idxs = np.where(np.abs(embedding) >= 1e-8)[0] 163 | sparsity.append(len(idxs)) 164 | 165 | embedding_vals[step] = [embedding[idx] for idx in idxs] 166 | embedding_idx[step] = [idx for idx in idxs] 167 | embedding_dense[step] = [emb for emb in embedding] 168 | filenames[step] = filename 169 | step += 1 170 | tot_embeddings[0] = step 171 | avg = (1 - 1/step) * avg + len(idxs) / step 172 | if step % 10000 == 0: 173 | print('Step %d\tSparsity %f' % (step, avg)) 174 | 175 | print('Step %d\tSparsity %f' % (step, avg)) 176 | 177 | data.close() 178 | 179 | if __name__=='__main__': 180 | absl.app.run(main) -------------------------------------------------------------------------------- /mobilefacenet.py: -------------------------------------------------------------------------------- 1 | import tensorflow as tf 2 | from mobilenet import mobilenet_v2 3 | from mobilenet import conv_blocks as ops 4 | from mobilenet import mobilenet as lib 5 | import input_pipeline as input_pipeline 6 | import pprint 7 | import time 8 | from PIL import Image 9 | import imagenet_preprocessing 10 | import absl 11 | from argparse import Namespace 12 | import sys 13 | import os 14 | import numpy as np 15 | import horovod.tensorflow as hvd 16 | from tensorflow.core.framework.summary_pb2 import Summary 17 | from tensorflow import logging 18 | import math 19 | from resnet import resnet_model 20 | 21 | # Define all necessary hyperparameters as flags 22 | flags = absl.flags 23 | FLAGS = flags.FLAGS 24 | 25 | slim = tf.contrib.slim 26 | op = lib.op 27 | 28 | expand_input = ops.expand_input_by_factor 29 | 30 | 31 | # The Arcface loss function is taken from 32 | # https://github.com/auroua/InsightFace_TF/blob/master/losses/face_losses.py 33 | def arcface_logits(embedding, out_num): 34 | with tf.variable_scope('Arcface'): 35 | weights = tf.get_variable(name='embedding_weights', shape=(embedding.shape[-1], out_num), 36 | initializer=tf.random_normal_initializer(stddev=0.001), 37 | dtype=tf.float32) 38 | weights = tf.nn.l2_normalize(weights, axis=0) 39 | # assign_op = tf.assign(weights, tf.nn.l2_normalize(weights, axis=0)) 40 | # with tf.control_dependencies([assign_op]): 41 | cos_t = tf.matmul(embedding, weights, name='cos_t') 42 | 43 | return cos_t 44 | 45 | 46 | def arcface_loss(logits, labels, out_num, s=64.0, m=0.5): 47 
| assert len(labels.shape) == 1 48 | cos_m = math.cos(m) 49 | sin_m = math.sin(m) 50 | threshold = math.cos(math.pi - m) 51 | 52 | mask = tf.one_hot(labels, depth=out_num, name='one_hot_mask', on_value=1.0, off_value=0.0) 53 | cos_t = tf.multiply(logits, mask, name='cos_t') 54 | cos_t = tf.reduce_sum(cos_t, axis=1) 55 | 56 | cos_t2 = tf.square(cos_t, name='cos_t2') 57 | sin_t2 = tf.subtract(1.0, cos_t2, name='sin_t2') 58 | sin_t = tf.sqrt(sin_t2, name='sin_t') 59 | 60 | cos_mt = tf.subtract(tf.multiply(cos_t, cos_m), tf.multiply(sin_t, sin_m), name='cos_mt') 61 | 62 | cond = tf.less(threshold, cos_t, name='if_else') 63 | keep_val = cos_t - (threshold + 1) 64 | cos_mt = tf.where(cond, cos_mt, keep_val) 65 | # cos_mt = cond * cos_mt + (1 - cond) * keep_val 66 | 67 | update = tf.reshape(cos_mt, [-1, 1]) * mask 68 | mask = tf.cast(mask, tf.bool) 69 | logits = s * tf.where(mask, update, logits) 70 | # logits = s * (tf.reshape(cos_mt, [-1, 1]) * mask + logits * (1.0 - mask)) 71 | loss = tf.nn.sparse_softmax_cross_entropy_with_logits(logits=logits, labels=labels) 72 | return loss 73 | 74 | 75 | def soft_thresh(lam): 76 | # lam = tf.get_variable( 77 | # "soft_thresh", shape=(), 78 | # initializer=tf.zeros_initializer, 79 | # constraint=lambda x: "hello") 80 | # lam = lam 81 | def activation(x): 82 | ret = tf.nn.relu(x - lam) - tf.nn.relu(-x - lam) 83 | return ret 84 | return activation 85 | 86 | 87 | def sign(x): 88 | return tf.stop_gradient(tf.sign(x) - tf.clip_by_value(x, -1, 1)) \ 89 | + tf.clip_by_value(x, -1, 1) 90 | 91 | 92 | def step(x): 93 | return (sign(x) + 1) / 2 94 | 95 | 96 | def mobilefacenet(features, is_training, num_classes): 97 | if FLAGS.final_activation == 'step': 98 | final_activation = step 99 | elif FLAGS.final_activation == 'relu': 100 | final_activation = tf.nn.relu 101 | elif FLAGS.final_activation == 'relu6': 102 | final_activation = tf.nn.relu6 103 | elif FLAGS.final_activation == 'soft_thresh': 104 | final_activation = soft_thresh(0.5) 105 | elif FLAGS.final_activation == 'none': 106 | final_activation = None 107 | else: 108 | raise ValueError('Unknown activation %s' % str(FLAGS.final_activation)) 109 | 110 | MFACENET_DEF = dict( 111 | defaults={ 112 | # Note: these parameters of batch norm affect the architecture 113 | # that's why they are here and not in training_scope. 
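# NOTE: the original MobileFaceNet paper uses PReLU as the nonlinearity;
# this implementation uses tf.nn.leaky_relu (set in the defaults below).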
114 | (slim.batch_norm,): {'center': True, 'scale': True}, 115 | (slim.conv2d, slim.fully_connected, slim.separable_conv2d): { 116 | 'normalizer_fn': slim.batch_norm, 'activation_fn': tf.nn.leaky_relu, 117 | 'weights_regularizer': slim.l2_regularizer(4e-5) 118 | }, 119 | (ops.expanded_conv,): { 120 | 'split_expansion': 1, 121 | 'normalizer_fn': slim.batch_norm, 122 | 'residual': True 123 | }, 124 | (slim.conv2d, slim.separable_conv2d): {'padding': 'SAME'}, 125 | }, 126 | spec=[ 127 | op(slim.conv2d, stride=2, num_outputs=64, kernel_size=[3, 3]), 128 | 129 | op(slim.separable_conv2d, num_outputs=None, kernel_size=[3, 3], 130 | depth_multiplier=1, stride=1), 131 | 132 | op(ops.expanded_conv, expansion_size=expand_input(2), stride=2, num_outputs=64), 133 | op(ops.expanded_conv, expansion_size=expand_input(2), stride=1, num_outputs=64), 134 | op(ops.expanded_conv, expansion_size=expand_input(2), stride=1, num_outputs=64), 135 | op(ops.expanded_conv, expansion_size=expand_input(2), stride=1, num_outputs=64), 136 | op(ops.expanded_conv, expansion_size=expand_input(2), stride=1, num_outputs=64), 137 | 138 | op(ops.expanded_conv, expansion_size=expand_input(4), stride=2, num_outputs=128), 139 | 140 | op(ops.expanded_conv, expansion_size=expand_input(2), stride=1, num_outputs=128), 141 | op(ops.expanded_conv, expansion_size=expand_input(2), stride=1, num_outputs=128), 142 | op(ops.expanded_conv, expansion_size=expand_input(2), stride=1, num_outputs=128), 143 | op(ops.expanded_conv, expansion_size=expand_input(2), stride=1, num_outputs=128), 144 | op(ops.expanded_conv, expansion_size=expand_input(2), stride=1, num_outputs=128), 145 | op(ops.expanded_conv, expansion_size=expand_input(2), stride=1, num_outputs=128), 146 | 147 | op(ops.expanded_conv, expansion_size=expand_input(4), stride=2, num_outputs=128), 148 | 149 | op(ops.expanded_conv, expansion_size=expand_input(2), stride=1, num_outputs=128), 150 | op(ops.expanded_conv, expansion_size=expand_input(2), stride=1, num_outputs=128), 151 | 152 | op(slim.conv2d, stride=1, kernel_size=[1, 1], num_outputs=512), 153 | 154 | op(slim.separable_conv2d, num_outputs=None, kernel_size=[7, 7], 155 | depth_multiplier=1, stride=1, padding="VALID", activation_fn=None), 156 | 157 | op(slim.conv2d, stride=1, kernel_size=[1, 1], num_outputs=FLAGS.embedding_size, 158 | weights_regularizer=slim.l2_regularizer(4e-4), activation_fn=final_activation), 159 | ], 160 | ) 161 | 162 | net, end_points = mobilenet_v2.mobilenet( 163 | features, is_training=is_training, num_classes=num_classes, 164 | base_only=True, conv_defs=MFACENET_DEF) 165 | 166 | embedding = end_points['embedding'] 167 | embedding = tf.squeeze(embedding, [1, 2]) 168 | embedding_unnorm = embedding 169 | embedding = tf.nn.l2_normalize(embedding, axis=1) 170 | if FLAGS.final_activation == 'step': 171 | end_points['embedding'] = embedding_unnorm 172 | else: 173 | end_points['embedding'] = embedding 174 | logits = arcface_logits(embedding, num_classes) 175 | end_points['Logits'] = logits 176 | 177 | return logits, end_points 178 | 179 | 180 | def faceresnet(features, is_training, num_classes): 181 | if FLAGS.final_activation == 'step': 182 | final_activation = step 183 | elif FLAGS.final_activation == 'relu': 184 | final_activation = tf.nn.relu 185 | elif FLAGS.final_activation == 'relu6': 186 | final_activation = tf.nn.relu6 187 | elif FLAGS.final_activation == 'soft_thresh': 188 | final_activation = soft_thresh(0.5) 189 | elif FLAGS.final_activation == 'none': 190 | final_activation = None 191 | 192 | 
filter_list = [64, 64, 128, 256, 512] 193 | units = [3, 13, 30, 3] 194 | 195 | model = resnet_model.Model( 196 | resnet_size=100, bottleneck=False, num_classes=FLAGS.embedding_size, 197 | num_filters=filter_list, kernel_size=(3, 3), conv_stride=1, 198 | first_pool_size=None, first_pool_stride=None, 199 | block_sizes=units, block_strides=[2] * 4, resnet_version=3, 200 | data_format='channels_last') 201 | # Relu/soft_thresh is not applied in the model 202 | embedding = model(features, is_training) 203 | if final_activation is not None: 204 | embedding = final_activation(embedding) 205 | embedding = tf.nn.l2_normalize(embedding, axis=1) 206 | 207 | end_points = dict() 208 | end_points['embedding'] = embedding 209 | logits = arcface_logits(embedding, num_classes) 210 | end_points['Logits'] = logits 211 | 212 | return logits, end_points 213 | -------------------------------------------------------------------------------- /mobilenet/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/biswajitsc/sparse-embed/775404d38ce4382251b88168820da3740c1323ea/mobilenet/__init__.py -------------------------------------------------------------------------------- /mobilenet/conv_blocks.py: -------------------------------------------------------------------------------- 1 | # Copyright 2018 The TensorFlow Authors. All Rights Reserved. 2 | # 3 | # Licensed under the Apache License, Version 2.0 (the "License"); 4 | # you may not use this file except in compliance with the License. 5 | # You may obtain a copy of the License at 6 | # 7 | # http://www.apache.org/licenses/LICENSE-2.0 8 | # 9 | # Unless required by applicable law or agreed to in writing, software 10 | # distributed under the License is distributed on an "AS IS" BASIS, 11 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 12 | # See the License for the specific language governing permissions and 13 | # limitations under the License. 14 | # ============================================================================== 15 | """Convolution blocks for mobilenet.""" 16 | import contextlib 17 | import functools 18 | 19 | import tensorflow as tf 20 | 21 | slim = tf.contrib.slim 22 | 23 | 24 | def _fixed_padding(inputs, kernel_size, rate=1): 25 | """Pads the input along the spatial dimensions independently of input size. 26 | 27 | Pads the input such that if it was used in a convolution with 'VALID' padding, 28 | the output would have the same dimensions as if the unpadded input was used 29 | in a convolution with 'SAME' padding. 30 | 31 | Args: 32 | inputs: A tensor of size [batch, height_in, width_in, channels]. 33 | kernel_size: The kernel to be used in the conv2d or max_pool2d operation. 34 | rate: An integer, rate for atrous convolution. 35 | 36 | Returns: 37 | output: A tensor of size [batch, height_out, width_out, channels] with the 38 | input, either intact (if kernel_size == 1) or padded (if kernel_size > 1). 
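  For example, a [3, 3] kernel with rate=1 yields pad_total = 2, i.e. one
  pixel of zero padding on each spatial side; with rate=2 the effective
  kernel is 5x5 and two pixels are padded on each side.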
39 |   """
40 |   kernel_size_effective = [kernel_size[0] + (kernel_size[0] - 1) * (rate - 1),
41 |                            kernel_size[0] + (kernel_size[0] - 1) * (rate - 1)]
42 |   pad_total = [kernel_size_effective[0] - 1, kernel_size_effective[1] - 1]
43 |   pad_beg = [pad_total[0] // 2, pad_total[1] // 2]
44 |   pad_end = [pad_total[0] - pad_beg[0], pad_total[1] - pad_beg[1]]
45 |   padded_inputs = tf.pad(inputs, [[0, 0], [pad_beg[0], pad_end[0]],
46 |                                   [pad_beg[1], pad_end[1]], [0, 0]])
47 |   return padded_inputs
48 | 
49 | 
50 | def _make_divisible(v, divisor, min_value=None):
51 |   if min_value is None:
52 |     min_value = divisor
53 |   new_v = max(min_value, int(v + divisor / 2) // divisor * divisor)
54 |   # Make sure that round down does not go down by more than 10%.
55 |   if new_v < 0.9 * v:
56 |     new_v += divisor
57 |   return new_v
58 | 
59 | 
60 | def _split_divisible(num, num_ways, divisible_by=8):
61 |   """Evenly splits num, num_ways so each piece is a multiple of divisible_by."""
62 |   assert num % divisible_by == 0
63 |   assert num / num_ways >= divisible_by
64 |   # Note: want to round down, we adjust each split to match the total.
65 |   base = num // num_ways // divisible_by * divisible_by
66 |   result = []
67 |   accumulated = 0
68 |   for i in range(num_ways):
69 |     r = base
70 |     while accumulated + r < num * (i + 1) / num_ways:
71 |       r += divisible_by
72 |     result.append(r)
73 |     accumulated += r
74 |   assert accumulated == num
75 |   return result
76 | 
77 | 
78 | @contextlib.contextmanager
79 | def _v1_compatible_scope_naming(scope):
80 |   if scope is None:  # Create uniquified separable blocks.
81 |     with tf.variable_scope(None, default_name='separable') as s, \
82 |          tf.name_scope(s.original_name_scope):
83 |       yield ''
84 |   else:
85 |     # We use scope_depthwise, scope_pointwise for compatibility with V1 ckpts,
86 |     # which provide numbered scopes.
87 |     scope += '_'
88 |     yield scope
89 | 
90 | 
91 | @slim.add_arg_scope
92 | def split_separable_conv2d(input_tensor,
93 |                            num_outputs,
94 |                            scope=None,
95 |                            normalizer_fn=None,
96 |                            stride=1,
97 |                            rate=1,
98 |                            endpoints=None,
99 |                            use_explicit_padding=False):
100 |   """Separable mobilenet V1 style convolution.
101 | 
102 |   Depthwise convolution, with default non-linearity,
103 |   followed by 1x1 pointwise convolution. This is similar to
104 |   slim.separable_conv2d, but differs in that it applies batch
105 |   normalization and non-linearity to the depthwise step. This matches
106 |   the basic building block of the MobileNet paper
107 |   (https://arxiv.org/abs/1704.04861)
108 | 
109 |   Args:
110 |     input_tensor: input
111 |     num_outputs: number of outputs
112 |     scope: optional name of the scope. Note if provided it will use
113 |       scope_depthwise for depthwise, and scope_pointwise for pointwise.
114 |     normalizer_fn: which normalizer function to use for depthwise/pointwise
115 |     stride: stride
116 |     rate: output rate (also known as dilation rate)
117 |     endpoints: optional, if provided, will export additional tensors to it.
118 |     use_explicit_padding: Use 'VALID' padding for convolutions, but prepad
119 |       inputs so that the output dimensions are the same as if 'SAME' padding
120 |       were used.
121 | 
122 |   Returns:
123 |     output tensor
124 |   """
125 | 
126 |   with _v1_compatible_scope_naming(scope) as scope:
127 |     dw_scope = scope + 'depthwise'
128 |     endpoints = endpoints if endpoints is not None else {}
129 |     kernel_size = [3, 3]
130 |     padding = 'SAME'
131 |     if use_explicit_padding:
132 |       padding = 'VALID'
133 |       input_tensor = _fixed_padding(input_tensor, kernel_size, rate)
134 |     net = slim.separable_conv2d(
135 |         input_tensor,
136 |         None,
137 |         kernel_size,
138 |         depth_multiplier=1,
139 |         stride=stride,
140 |         rate=rate,
141 |         normalizer_fn=normalizer_fn,
142 |         padding=padding,
143 |         scope=dw_scope)
144 | 
145 |     endpoints[dw_scope] = net
146 | 
147 |     pw_scope = scope + 'pointwise'
148 |     net = slim.conv2d(
149 |         net,
150 |         num_outputs, [1, 1],
151 |         stride=1,
152 |         normalizer_fn=normalizer_fn,
153 |         scope=pw_scope)
154 |     endpoints[pw_scope] = net
155 |     return net
156 | 
157 | 
158 | def expand_input_by_factor(n, divisible_by=8):
159 |   return lambda num_inputs, **_: _make_divisible(num_inputs * n, divisible_by)
160 | 
161 | 
162 | @slim.add_arg_scope
163 | def expanded_conv(input_tensor,
164 |                   num_outputs,
165 |                   expansion_size=expand_input_by_factor(6),
166 |                   stride=1,
167 |                   rate=1,
168 |                   kernel_size=(3, 3),
169 |                   residual=True,
170 |                   normalizer_fn=None,
171 |                   split_projection=1,
172 |                   split_expansion=1,
173 |                   expansion_transform=None,
174 |                   depthwise_location='expansion',
175 |                   depthwise_channel_multiplier=1,
176 |                   endpoints=None,
177 |                   use_explicit_padding=False,
178 |                   padding='SAME',
179 |                   scope=None):
180 |   """Depthwise Convolution Block with expansion.
181 | 
182 |   Builds a composite convolution that has the following structure
183 |   expansion (1x1) -> depthwise (kernel_size) -> projection (1x1)
184 | 
185 |   Args:
186 |     input_tensor: input
187 |     num_outputs: number of outputs in the final layer.
188 |     expansion_size: the size of expansion, could be a constant or a callable.
189 |       If latter it will be provided 'num_inputs' as an input. For forward
190 |       compatibility it should accept arbitrary keyword arguments.
191 |       Default will expand the input by factor of 6.
192 |     stride: depthwise stride
193 |     rate: depthwise rate
194 |     kernel_size: depthwise kernel
195 |     residual: whether to include residual connection between input
196 |       and output.
197 |     normalizer_fn: batchnorm or otherwise
198 |     split_projection: how many ways to split projection operator
199 |       (that is conv expansion->bottleneck)
200 |     split_expansion: how many ways to split expansion op
201 |       (that is conv bottleneck->expansion) ops will keep depth divisible
202 |       by this value.
203 |     expansion_transform: Optional function that takes expansion
204 |       as a single input and returns output.
205 |     depthwise_location: where to put depthwise convolutions; supported
206 |       values are None, 'input', 'output', 'expansion'
207 |     depthwise_channel_multiplier: depthwise channel multiplier:
208 |       each input will be replicated (with different filters)
209 |       that many times. So if input had c channels,
210 |       output will have c x depthwise_channel_multiplier.
211 |     endpoints: An optional dictionary into which intermediate endpoints are
212 |       placed. The keys "expansion_output", "depthwise_output",
213 |       "projection_output" and "expansion_transform" are always populated, even
214 |       if the corresponding functions are not invoked.
215 |     use_explicit_padding: Use 'VALID' padding for convolutions, but prepad
216 |       inputs so that the output dimensions are the same as if 'SAME' padding
217 |       were used.
218 | padding: Padding type to use if `use_explicit_padding` is not set. 219 | scope: optional scope. 220 | 221 | Returns: 222 | Tensor of depth num_outputs 223 | 224 | Raises: 225 | TypeError: on inval 226 | """ 227 | with tf.variable_scope(scope, default_name='expanded_conv') as s, \ 228 | tf.name_scope(s.original_name_scope): 229 | prev_depth = input_tensor.get_shape().as_list()[3] 230 | if depthwise_location not in [None, 'input', 'output', 'expansion']: 231 | raise TypeError('%r is unknown value for depthwise_location' % 232 | depthwise_location) 233 | if use_explicit_padding: 234 | if padding != 'SAME': 235 | raise TypeError('`use_explicit_padding` should only be used with ' 236 | '"SAME" padding.') 237 | padding = 'VALID' 238 | depthwise_func = functools.partial( 239 | slim.separable_conv2d, 240 | num_outputs=None, 241 | kernel_size=kernel_size, 242 | depth_multiplier=depthwise_channel_multiplier, 243 | stride=stride, 244 | rate=rate, 245 | normalizer_fn=normalizer_fn, 246 | padding=padding, 247 | scope='depthwise') 248 | # b1 -> b2 * r -> b2 249 | # i -> (o * r) (bottleneck) -> o 250 | input_tensor = tf.identity(input_tensor, 'input') 251 | net = input_tensor 252 | 253 | if depthwise_location == 'input': 254 | if use_explicit_padding: 255 | net = _fixed_padding(net, kernel_size, rate) 256 | net = depthwise_func(net, activation_fn=None) 257 | 258 | if callable(expansion_size): 259 | inner_size = expansion_size(num_inputs=prev_depth) 260 | else: 261 | inner_size = expansion_size 262 | 263 | if inner_size > net.shape[3]: 264 | net = split_conv( 265 | net, 266 | inner_size, 267 | num_ways=split_expansion, 268 | scope='expand', 269 | stride=1, 270 | normalizer_fn=normalizer_fn) 271 | net = tf.identity(net, 'expansion_output') 272 | if endpoints is not None: 273 | endpoints['expansion_output'] = net 274 | 275 | if depthwise_location == 'expansion': 276 | if use_explicit_padding: 277 | net = _fixed_padding(net, kernel_size, rate) 278 | net = depthwise_func(net) 279 | 280 | net = tf.identity(net, name='depthwise_output') 281 | if endpoints is not None: 282 | endpoints['depthwise_output'] = net 283 | if expansion_transform: 284 | net = expansion_transform(expansion_tensor=net, input_tensor=input_tensor) 285 | # Note in contrast with expansion, we always have 286 | # projection to produce the desired output size. 287 | net = split_conv( 288 | net, 289 | num_outputs, 290 | num_ways=split_projection, 291 | stride=1, 292 | scope='project', 293 | normalizer_fn=normalizer_fn, 294 | activation_fn=tf.identity) 295 | if endpoints is not None: 296 | endpoints['projection_output'] = net 297 | if depthwise_location == 'output': 298 | if use_explicit_padding: 299 | net = _fixed_padding(net, kernel_size, rate) 300 | net = depthwise_func(net, activation_fn=None) 301 | 302 | if callable(residual): # custom residual 303 | net = residual(input_tensor=input_tensor, output_tensor=net) 304 | elif (residual and 305 | # stride check enforces that we don't add residuals when spatial 306 | # dimensions are None 307 | stride == 1 and 308 | # Depth matches 309 | net.get_shape().as_list()[3] == 310 | input_tensor.get_shape().as_list()[3]): 311 | net += input_tensor 312 | return tf.identity(net, name='output') 313 | 314 | 315 | def split_conv(input_tensor, 316 | num_outputs, 317 | num_ways, 318 | scope, 319 | divisible_by=8, 320 | **kwargs): 321 | """Creates a split convolution. 
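(For example: num_ways=2 mapping 64 input channels to 32 outputs yields two
independent 1x1 convolutions of 32 -> 16 channels each, concatenated along
the channel axis.)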
322 | 323 | Split convolution splits the input and output into 324 | 'num_blocks' blocks of approximately the same size each, 325 | and only connects $i$-th input to $i$ output. 326 | 327 | Args: 328 | input_tensor: input tensor 329 | num_outputs: number of output filters 330 | num_ways: num blocks to split by. 331 | scope: scope for all the operators. 332 | divisible_by: make sure that every part is divisiable by this. 333 | **kwargs: will be passed directly into conv2d operator 334 | Returns: 335 | tensor 336 | """ 337 | b = input_tensor.get_shape().as_list()[3] 338 | 339 | if num_ways == 1 or min(b // num_ways, 340 | num_outputs // num_ways) < divisible_by: 341 | # Don't do any splitting if we end up with less than 8 filters 342 | # on either side. 343 | return slim.conv2d(input_tensor, num_outputs, [1, 1], scope=scope, **kwargs) 344 | 345 | outs = [] 346 | input_splits = _split_divisible(b, num_ways, divisible_by=divisible_by) 347 | output_splits = _split_divisible( 348 | num_outputs, num_ways, divisible_by=divisible_by) 349 | inputs = tf.split(input_tensor, input_splits, axis=3, name='split_' + scope) 350 | base = scope 351 | for i, (input_tensor, out_size) in enumerate(zip(inputs, output_splits)): 352 | scope = base + '_part_%d' % (i,) 353 | n = slim.conv2d(input_tensor, out_size, [1, 1], scope=scope, **kwargs) 354 | n = tf.identity(n, scope + '_output') 355 | outs.append(n) 356 | return tf.concat(outs, 3, name=scope + '_concat') 357 | -------------------------------------------------------------------------------- /mobilenet/mobilenet.py: -------------------------------------------------------------------------------- 1 | # Copyright 2018 The TensorFlow Authors. All Rights Reserved. 2 | # 3 | # Licensed under the Apache License, Version 2.0 (the "License"); 4 | # you may not use this file except in compliance with the License. 5 | # You may obtain a copy of the License at 6 | # 7 | # http://www.apache.org/licenses/LICENSE-2.0 8 | # 9 | # Unless required by applicable law or agreed to in writing, software 10 | # distributed under the License is distributed on an "AS IS" BASIS, 11 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 12 | # See the License for the specific language governing permissions and 13 | # limitations under the License. 14 | # ============================================================================== 15 | """Mobilenet Base Class.""" 16 | 17 | from __future__ import absolute_import 18 | from __future__ import division 19 | from __future__ import print_function 20 | import collections 21 | import contextlib 22 | import copy 23 | import os 24 | import sys 25 | 26 | import tensorflow as tf 27 | 28 | 29 | slim = tf.contrib.slim 30 | 31 | 32 | @slim.add_arg_scope 33 | def apply_activation(x, name=None, activation_fn=None): 34 | return activation_fn(x, name=name) if activation_fn else x 35 | 36 | 37 | def _fixed_padding(inputs, kernel_size, rate=1): 38 | """Pads the input along the spatial dimensions independently of input size. 39 | 40 | Pads the input such that if it was used in a convolution with 'VALID' padding, 41 | the output would have the same dimensions as if the unpadded input was used 42 | in a convolution with 'SAME' padding. 43 | 44 | Args: 45 | inputs: A tensor of size [batch, height_in, width_in, channels]. 46 | kernel_size: The kernel to be used in the conv2d or max_pool2d operation. 47 | rate: An integer, rate for atrous convolution. 
48 | 49 | Returns: 50 | output: A tensor of size [batch, height_out, width_out, channels] with the 51 | input, either intact (if kernel_size == 1) or padded (if kernel_size > 1). 52 | """ 53 | kernel_size_effective = [kernel_size[0] + (kernel_size[0] - 1) * (rate - 1), 54 | kernel_size[0] + (kernel_size[0] - 1) * (rate - 1)] 55 | pad_total = [kernel_size_effective[0] - 1, kernel_size_effective[1] - 1] 56 | pad_beg = [pad_total[0] // 2, pad_total[1] // 2] 57 | pad_end = [pad_total[0] - pad_beg[0], pad_total[1] - pad_beg[1]] 58 | padded_inputs = tf.pad(inputs, [[0, 0], [pad_beg[0], pad_end[0]], 59 | [pad_beg[1], pad_end[1]], [0, 0]]) 60 | return padded_inputs 61 | 62 | 63 | def _make_divisible(v, divisor, min_value=None): 64 | if min_value is None: 65 | min_value = divisor 66 | new_v = max(min_value, int(v + divisor / 2) // divisor * divisor) 67 | # Make sure that round down does not go down by more than 10%. 68 | if new_v < 0.9 * v: 69 | new_v += divisor 70 | return new_v 71 | 72 | 73 | @contextlib.contextmanager 74 | def _set_arg_scope_defaults(defaults): 75 | """Sets arg scope defaults for all items present in defaults. 76 | 77 | Args: 78 | defaults: dictionary/list of pairs, containing a mapping from 79 | function to a dictionary of default args. 80 | 81 | Yields: 82 | context manager where all defaults are set. 83 | """ 84 | if hasattr(defaults, 'items'): 85 | items = list(defaults.items()) 86 | else: 87 | items = defaults 88 | if not items: 89 | yield 90 | else: 91 | func, default_arg = items[0] 92 | with slim.arg_scope(func, **default_arg): 93 | with _set_arg_scope_defaults(items[1:]): 94 | yield 95 | 96 | 97 | @slim.add_arg_scope 98 | def depth_multiplier(output_params, 99 | multiplier, 100 | divisible_by=8, 101 | min_depth=8, 102 | **unused_kwargs): 103 | if 'num_outputs' not in output_params: 104 | return 105 | d = output_params['num_outputs'] 106 | if d is not None: 107 | output_params['num_outputs'] = _make_divisible(d * multiplier, divisible_by, 108 | min_depth) 109 | 110 | 111 | _Op = collections.namedtuple('Op', ['op', 'params', 'multiplier_func']) 112 | 113 | 114 | def op(opfunc, **params): 115 | multiplier = params.pop('multiplier_transorm', depth_multiplier) 116 | return _Op(opfunc, params=params, multiplier_func=multiplier) 117 | 118 | 119 | class NoOpScope(object): 120 | """No-op context manager.""" 121 | 122 | def __enter__(self): 123 | return None 124 | 125 | def __exit__(self, exc_type, exc_value, traceback): 126 | return False 127 | 128 | 129 | def safe_arg_scope(funcs, **kwargs): 130 | """Returns `slim.arg_scope` with all None arguments removed. 131 | 132 | Arguments: 133 | funcs: Functions to pass to `arg_scope`. 134 | **kwargs: Arguments to pass to `arg_scope`. 135 | 136 | Returns: 137 | arg_scope or No-op context manager. 138 | 139 | Note: can be useful if None value should be interpreted as "do not overwrite 140 | this parameter value". 141 | """ 142 | filtered_args = {name: value for name, value in kwargs.items() 143 | if value is not None} 144 | if filtered_args: 145 | return slim.arg_scope(funcs, **filtered_args) 146 | else: 147 | return NoOpScope() 148 | 149 | 150 | @slim.add_arg_scope 151 | def mobilenet_base( # pylint: disable=invalid-name 152 | inputs, 153 | conv_defs, 154 | multiplier=1.0, 155 | final_endpoint=None, 156 | output_stride=None, 157 | use_explicit_padding=False, 158 | scope=None, 159 | is_training=False): 160 | """Mobilenet base network. 161 | 162 | Constructs a network from inputs to the given final endpoint. 
By default 163 | the network is constructed in inference mode. To create the network 164 | in training mode use: 165 | 166 | with slim.arg_scope(mobilenet.training_scope()): 167 | logits, endpoints = mobilenet_base(...) 168 | 169 | Args: 170 | inputs: a tensor of shape [batch_size, height, width, channels]. 171 | conv_defs: A list of op(...) layers specifying the net architecture. 172 | multiplier: Float multiplier for the depth (number of channels) 173 | for all convolution ops. The value must be greater than zero. Typical 174 | usage will be to set this value in (0, 1) to reduce the number of 175 | parameters or computation cost of the model. 176 | final_endpoint: The name of the last layer, for early termination. 177 | For V1-based networks the last layer is "layer_14"; for V2 it is "layer_20". 178 | output_stride: An integer that specifies the requested ratio of input to 179 | output spatial resolution. If not None, then we invoke atrous convolution 180 | if necessary to prevent the network from reducing the spatial resolution 181 | of the activation maps. Allowed values are 1 or any even number, excluding 182 | zero. Typical values are 8 (accurate fully convolutional mode), 16 183 | (fast fully convolutional mode), and 32 (classification mode). 184 | 185 | NOTE- output_stride relies on all subsequent operators to support dilated 186 | operators via the "rate" parameter. This might require wrapping non-conv 187 | operators to operate properly. 188 | 189 | use_explicit_padding: Use 'VALID' padding for convolutions, but prepad 190 | inputs so that the output dimensions are the same as if 'SAME' padding 191 | were used. 192 | scope: optional variable scope. 193 | is_training: How to set up batch_norm and other ops. Note: most of the time 194 | this does not need to be set directly. Use mobilenet.training_scope() to set 195 | up training instead. This parameter is here for backward compatibility 196 | only. It is safe to set it to the value matching 197 | training_scope(is_training=...). It is also safe to explicitly set 198 | it to False, even if there is an outer training_scope set to training. 199 | (The network will be built in inference mode). If this is set to None, 200 | no arg_scope is added for slim.batch_norm's is_training parameter. 201 | 202 | Returns: 203 | tensor_out: output tensor. 204 | end_points: a set of activations for external use, for example summaries or 205 | losses. 206 | 207 | Raises: 208 | ValueError: depth_multiplier <= 0, or the target output_stride is not 209 | allowed. 210 | """ 211 | if multiplier <= 0: 212 | raise ValueError('multiplier is not greater than zero.') 213 | 214 | # Set conv defs defaults and overrides. 215 | conv_defs_defaults = conv_defs.get('defaults', {}) 216 | conv_defs_overrides = conv_defs.get('overrides', {}) 217 | if use_explicit_padding: 218 | conv_defs_overrides = copy.deepcopy(conv_defs_overrides) 219 | conv_defs_overrides[ 220 | (slim.conv2d, slim.separable_conv2d)] = {'padding': 'VALID'} 221 | 222 | if output_stride is not None: 223 | if output_stride == 0 or (output_stride > 1 and output_stride % 2): 224 | raise ValueError('Output stride must be None, 1 or a multiple of 2.') 225 | 226 | # a) Set the tensorflow scope 227 | # b) set padding to default: note we might consider removing this 228 | # since it is also set by mobilenet_scope 229 | # c) set all defaults 230 | # d) set all extra overrides.
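Editorial note: the `use_explicit_padding` option documented above relies on `_fixed_padding` (defined earlier in this file), whose arithmetic is easy to check by hand. This small sketch mirrors that computation for a square kernel:

```
# Mirrors the padding arithmetic of _fixed_padding for a square kernel.
def fixed_pad_amounts(kernel, rate=1):
    effective = kernel + (kernel - 1) * (rate - 1)  # dilated kernel extent
    total = effective - 1
    beg = total // 2
    return beg, total - beg

print(fixed_pad_amounts(3))          # (1, 1): plain 3x3, SAME-equivalent
print(fixed_pad_amounts(3, rate=2))  # (2, 2): dilated 3x3 spans 5 pixels
```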
231 | with _scope_all(scope, default_scope='Mobilenet'), \ 232 | safe_arg_scope([slim.batch_norm], is_training=is_training), \ 233 | _set_arg_scope_defaults(conv_defs_defaults), \ 234 | _set_arg_scope_defaults(conv_defs_overrides): 235 | # The current_stride variable keeps track of the output stride of the 236 | # activations, i.e., the running product of convolution strides up to the 237 | # current network layer. This allows us to invoke atrous convolution 238 | # whenever applying the next convolution would result in the activations 239 | # having output stride larger than the target output_stride. 240 | current_stride = 1 241 | 242 | # The atrous convolution rate parameter. 243 | rate = 1 244 | 245 | net = inputs 246 | # Insert default parameters before the base scope which includes 247 | # any custom overrides set in mobilenet. 248 | end_points = {} 249 | scopes = {} 250 | for i, opdef in enumerate(conv_defs['spec']): 251 | params = dict(opdef.params) 252 | opdef.multiplier_func(params, multiplier) 253 | stride = params.get('stride', 1) 254 | if output_stride is not None and current_stride == output_stride: 255 | # If we have reached the target output_stride, then we need to employ 256 | # atrous convolution with stride=1 and multiply the atrous rate by the 257 | # current unit's stride for use in subsequent layers. 258 | layer_stride = 1 259 | layer_rate = rate 260 | rate *= stride 261 | else: 262 | layer_stride = stride 263 | layer_rate = 1 264 | current_stride *= stride 265 | # Update params. 266 | params['stride'] = layer_stride 267 | # Only insert rate to params if rate > 1. 268 | if layer_rate > 1: 269 | params['rate'] = layer_rate 270 | # Set padding 271 | if use_explicit_padding: 272 | if 'kernel_size' in params: 273 | net = _fixed_padding(net, params['kernel_size'], layer_rate) 274 | else: 275 | params['use_explicit_padding'] = True 276 | 277 | end_point = 'layer_%d' % (i + 1) 278 | try: 279 | net = opdef.op(net, **params) 280 | except Exception: 281 | print('Failed to create op %i: %r params: %r' % (i, opdef, params)) 282 | raise 283 | end_points[end_point] = net 284 | scope = os.path.dirname(net.name) 285 | scopes[scope] = end_point 286 | if final_endpoint is not None and end_point == final_endpoint: 287 | break 288 | 289 | # Add all tensors that end with 'output' to 290 | # endpoints 291 | for t in net.graph.get_operations(): 292 | scope = os.path.dirname(t.name) 293 | bn = os.path.basename(t.name) 294 | if scope in scopes and t.name.endswith('output'): 295 | end_points[scopes[scope] + '/' + bn] = t.outputs[0] 296 | return net, end_points 297 | 298 | 299 | @contextlib.contextmanager 300 | def _scope_all(scope, default_scope=None): 301 | with tf.variable_scope(scope, default_name=default_scope) as s,\ 302 | tf.name_scope(s.original_name_scope): 303 | yield s 304 | 305 | 306 | @slim.add_arg_scope 307 | def mobilenet(inputs, 308 | num_classes=1001, 309 | prediction_fn=slim.softmax, 310 | reuse=None, 311 | scope='Mobilenet', 312 | base_only=False, 313 | **mobilenet_args): 314 | """Mobilenet model for classification, supports both V1 and V2. 315 | 316 | Note: default mode is inference, use mobilenet.training_scope to create 317 | training network. 318 | 319 | 320 | Args: 321 | inputs: a tensor of shape [batch_size, height, width, channels]. 322 | num_classes: number of predicted classes. If 0 or None, the logits layer 323 | is omitted and the input features to the logits layer (before dropout) 324 | are returned instead. 
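Editorial note: the `current_stride`/`rate` bookkeeping in the loop above can be replayed without TensorFlow. This sketch simulates a stride pattern against a target `output_stride` and shows where strides get converted into atrous rates (intermediate stride-1 layers are omitted for brevity):

```
def simulate_strides(strides, output_stride):
    layers, current_stride, rate = [], 1, 1
    for stride in strides:
        if output_stride is not None and current_stride == output_stride:
            layers.append((1, rate))   # (layer_stride, layer_rate)
            rate *= stride
        else:
            layers.append((stride, 1))
            current_stride *= stride
    return layers

# Five stride-2 ops (as in MobileNet V2) with output_stride=8: the last two
# become stride-1 convolutions with growing atrous rates.
print(simulate_strides([2, 2, 2, 2, 2], output_stride=8))
# [(2, 1), (2, 1), (2, 1), (1, 1), (1, 2)]
```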
325 | prediction_fn: a function to get predictions out of logits 326 | (default softmax). 327 | reuse: whether or not the network and its variables should be reused. To be 328 | able to reuse, 'scope' must be given. 329 | scope: Optional variable_scope. 330 | base_only: if True will only create the base of the network (no pooling 331 | and no logits). 332 | **mobilenet_args: passed to mobilenet_base verbatim. 333 | - conv_defs: list of conv defs 334 | - multiplier: Float multiplier for the depth (number of channels) 335 | for all convolution ops. The value must be greater than zero. Typical 336 | usage will be to set this value in (0, 1) to reduce the number of 337 | parameters or computation cost of the model. 338 | - output_stride: will ensure that the last layer has at most total stride. 339 | If the architecture calls for more stride than that provided 340 | (e.g. output_stride=16, but the architecture has 5 stride=2 operators), 341 | it will replace output_stride with fractional convolutions using Atrous 342 | Convolutions. 343 | 344 | Returns: 345 | logits: the pre-softmax activations, a tensor of size 346 | [batch_size, num_classes] 347 | end_points: a dictionary from components of the network to the corresponding 348 | activation tensor. 349 | 350 | Raises: 351 | ValueError: Input rank is invalid. 352 | """ 353 | is_training = mobilenet_args.get('is_training', False) 354 | input_shape = inputs.get_shape().as_list() 355 | if len(input_shape) != 4: 356 | raise ValueError('Expected rank 4 input, was: %d' % len(input_shape)) 357 | 358 | with tf.variable_scope(scope, 'Mobilenet', reuse=reuse) as scope: 359 | inputs = tf.identity(inputs, 'input') 360 | net, end_points = mobilenet_base(inputs, scope=scope, **mobilenet_args) 361 | net = tf.identity(net, name='embedding') 362 | end_points['embedding'] = net 363 | 364 | if base_only: 365 | return net, end_points 366 | 367 | with tf.variable_scope('Logits'): 368 | net = global_pool(net) 369 | end_points['global_pool'] = net 370 | if not num_classes: 371 | return net, end_points 372 | # Dropout turned off 373 | # net = slim.dropout(net, scope='Dropout', is_training=False) 374 | # 1 x 1 x num_classes 375 | # Note: legacy scope name. 376 | logits = slim.conv2d( 377 | net, 378 | num_classes, [1, 1], 379 | activation_fn=None, 380 | normalizer_fn=None, 381 | biases_initializer=tf.zeros_initializer(), 382 | scope='Conv2d_1c_1x1') 383 | 384 | logits = tf.squeeze(logits, [1, 2]) 385 | 386 | logits = tf.identity(logits, name='output') 387 | end_points['Logits'] = logits 388 | if prediction_fn: 389 | end_points['Predictions'] = prediction_fn(logits, 'Predictions') 390 | return logits, end_points 391 | 392 | 393 | def global_pool(input_tensor, pool_op=tf.nn.avg_pool): 394 | """Applies avg pool to produce 1x1 output. 395 | 396 | NOTE: This function is functionally equivalent to reduce_mean, but it uses 397 | a baked-in average pool, which has better support across hardware. 398 | 399 | Args: 400 | input_tensor: input tensor 401 | pool_op: pooling op (avg pool is default) 402 | Returns: 403 | a tensor batch_size x 1 x 1 x depth. 404 | """ 405 | shape = input_tensor.get_shape().as_list() 406 | if shape[1] is None or shape[2] is None: 407 | kernel_size = tf.convert_to_tensor( 408 | [1, tf.shape(input_tensor)[1], 409 | tf.shape(input_tensor)[2], 1]) 410 | else: 411 | kernel_size = [1, shape[1], shape[2], 1] 412 | output = pool_op( 413 | input_tensor, ksize=kernel_size, strides=[1, 1, 1, 1], padding='VALID') 414 | # Recover output shape, for unknown shape.
415 | output.set_shape([None, 1, 1, None]) 416 | return output 417 | 418 | 419 | def training_scope(is_training=True, 420 | weight_decay=0.00004, 421 | stddev=0.09, 422 | dropout_keep_prob=0.8, 423 | bn_decay=0.997): 424 | """Defines Mobilenet training scope. 425 | 426 | Usage: 427 | with tf.contrib.slim.arg_scope(mobilenet.training_scope()): 428 | logits, endpoints = mobilenet_v2.mobilenet(input_tensor) 429 | 430 | # the network created will be trainable with dropout/batch norm 431 | # initialized appropriately. 432 | Args: 433 | is_training: if set to False this will ensure that all customizations are 434 | set to non-training mode. This might be helpful for code that is reused 435 | across both training/evaluation, but most of the time training_scope with 436 | value False is not needed. If this is set to None, the parameter is not 437 | added to the batch_norm arg_scope. 438 | 439 | weight_decay: The weight decay to use for regularizing the model. 440 | stddev: Standard deviation for initialization, if negative uses xavier. 441 | dropout_keep_prob: dropout keep probability (not set if equal to None). 442 | bn_decay: decay for the batch norm moving averages (not set if equal to 443 | None). 444 | 445 | Returns: 446 | An argument scope to use via arg_scope. 447 | """ 448 | # Note: do not introduce parameters that would change the inference 449 | # model here (for example whether to use bias), modify conv_def instead. 450 | batch_norm_params = { 451 | 'decay': bn_decay, 452 | 'is_training': is_training 453 | } 454 | if stddev < 0: 455 | weight_initializer = slim.initializers.xavier_initializer() 456 | else: 457 | weight_initializer = tf.truncated_normal_initializer(stddev=stddev) 458 | 459 | # Set weight_decay for weights in Conv and FC layers. 460 | with slim.arg_scope( 461 | [slim.conv2d, slim.fully_connected, slim.separable_conv2d], 462 | weights_initializer=weight_initializer, 463 | normalizer_fn=slim.batch_norm), \ 464 | slim.arg_scope([mobilenet_base, mobilenet], is_training=is_training),\ 465 | safe_arg_scope([slim.batch_norm], **batch_norm_params), \ 466 | safe_arg_scope([slim.dropout], is_training=False, 467 | keep_prob=dropout_keep_prob), \ 468 | slim.arg_scope([slim.conv2d], \ 469 | weights_regularizer=slim.l2_regularizer(weight_decay)), \ 470 | slim.arg_scope([slim.separable_conv2d], weights_regularizer=None) as s: 471 | return s 472 | -------------------------------------------------------------------------------- /mobilenet/mobilenet_v2.py: -------------------------------------------------------------------------------- 1 | # Copyright 2018 The TensorFlow Authors. All Rights Reserved. 2 | # 3 | # Licensed under the Apache License, Version 2.0 (the "License"); 4 | # you may not use this file except in compliance with the License. 5 | # You may obtain a copy of the License at 6 | # 7 | # http://www.apache.org/licenses/LICENSE-2.0 8 | # 9 | # Unless required by applicable law or agreed to in writing, software 10 | # distributed under the License is distributed on an "AS IS" BASIS, 11 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 12 | # See the License for the specific language governing permissions and 13 | # limitations under the License. 14 | # ============================================================================== 15 | """Implementation of Mobilenet V2. 16 | 17 | Architecture: https://arxiv.org/abs/1801.04381 18 | 19 | The base model gives 72.2% accuracy on ImageNet, with 300M multiply-adds 20 | and 3.4M parameters.
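Editorial note on `global_pool`, defined at the end of `mobilenet.py` above: it averages over the full spatial extent while keeping 1x1 spatial dimensions, implemented as `avg_pool` because (as its docstring notes) that has better hardware support than `reduce_mean`. In numpy terms, for illustration only:

```
import numpy as np

# What global_pool computes, expressed with numpy.
x = np.random.rand(2, 7, 7, 320)             # batch x H x W x depth
pooled = x.mean(axis=(1, 2), keepdims=True)  # average over all spatial positions
assert pooled.shape == (2, 1, 1, 320)
```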
21 | """ 22 | 23 | from __future__ import absolute_import 24 | from __future__ import division 25 | from __future__ import print_function 26 | 27 | import copy 28 | import functools 29 | import tensorflow as tf 30 | 31 | from mobilenet import conv_blocks as ops 32 | from mobilenet import mobilenet as lib 33 | 34 | slim = tf.contrib.slim 35 | op = lib.op 36 | 37 | expand_input = ops.expand_input_by_factor 38 | 39 | # pyformat: disable 40 | # Architecture: https://arxiv.org/abs/1801.04381 41 | V2_DEF = dict( 42 | defaults={ 43 | # Note: these parameters of batch norm affect the architecture 44 | # that's why they are here and not in training_scope. 45 | (slim.batch_norm,): {'center': True, 'scale': True}, 46 | (slim.conv2d, slim.fully_connected, slim.separable_conv2d): { 47 | 'normalizer_fn': slim.batch_norm, 'activation_fn': tf.nn.relu6 48 | }, 49 | (ops.expanded_conv,): { 50 | 'expansion_size': expand_input(6), 51 | 'split_expansion': 1, 52 | 'normalizer_fn': slim.batch_norm, 53 | 'residual': True 54 | }, 55 | (slim.conv2d, slim.separable_conv2d): {'padding': 'SAME'} 56 | }, 57 | spec=[ 58 | op(slim.conv2d, stride=2, num_outputs=32, kernel_size=[3, 3]), 59 | op(ops.expanded_conv, 60 | expansion_size=expand_input(1, divisible_by=1), 61 | num_outputs=16), 62 | op(ops.expanded_conv, stride=2, num_outputs=24), 63 | op(ops.expanded_conv, stride=1, num_outputs=24), 64 | op(ops.expanded_conv, stride=2, num_outputs=32), 65 | op(ops.expanded_conv, stride=1, num_outputs=32), 66 | op(ops.expanded_conv, stride=1, num_outputs=32), 67 | op(ops.expanded_conv, stride=2, num_outputs=64), 68 | op(ops.expanded_conv, stride=1, num_outputs=64), 69 | op(ops.expanded_conv, stride=1, num_outputs=64), 70 | op(ops.expanded_conv, stride=1, num_outputs=64), 71 | op(ops.expanded_conv, stride=1, num_outputs=96), 72 | op(ops.expanded_conv, stride=1, num_outputs=96), 73 | op(ops.expanded_conv, stride=1, num_outputs=96), 74 | op(ops.expanded_conv, stride=2, num_outputs=160), 75 | op(ops.expanded_conv, stride=1, num_outputs=160), 76 | op(ops.expanded_conv, stride=1, num_outputs=160), 77 | op(ops.expanded_conv, stride=1, num_outputs=320), 78 | op(slim.conv2d, stride=1, kernel_size=[1, 1], num_outputs=1280) 79 | ], 80 | ) 81 | # pyformat: enable 82 | 83 | 84 | @slim.add_arg_scope 85 | def mobilenet(input_tensor, 86 | num_classes=1001, 87 | depth_multiplier=1.0, 88 | scope='MobilenetV2', 89 | conv_defs=None, 90 | finegrain_classification_mode=False, 91 | min_depth=None, 92 | divisible_by=None, 93 | **kwargs): 94 | """Creates mobilenet V2 network. 95 | 96 | Inference mode is created by default. To create training use training_scope 97 | below. 98 | 99 | with tf.contrib.slim.arg_scope(mobilenet_v2.training_scope()): 100 | logits, endpoints = mobilenet_v2.mobilenet(input_tensor) 101 | 102 | Args: 103 | input_tensor: The input tensor 104 | num_classes: number of classes 105 | depth_multiplier: The multiplier applied to scale number of 106 | channels in each layer. Note: this is called depth multiplier in the 107 | paper but the name is kept for consistency with slim's model builder. 108 | scope: Scope of the operator 109 | conv_defs: Allows to override default conv def. 110 | finegrain_classification_mode: When set to True, the model 111 | will keep the last layer large even for small multipliers. Following 112 | https://arxiv.org/abs/1801.04381 113 | suggests that it improves performance for ImageNet-type of problems. 114 | *Note* ignored if final_endpoint makes the builder exit earlier. 
115 | min_depth: If provided, will ensure that all layers have at least that 116 | many channels after application of the depth multiplier. 117 | divisible_by: If provided, will ensure that all layers' channel counts 118 | are divisible by this number. 119 | **kwargs: passed directly to mobilenet.mobilenet: 120 | prediction_fn- what prediction function to use. 121 | reuse- whether to reuse variables (if reuse is set to True, scope 122 | must be given). 123 | Returns: 124 | logits/endpoints pair 125 | 126 | Raises: 127 | ValueError: On invalid arguments 128 | """ 129 | if conv_defs is None: 130 | conv_defs = V2_DEF 131 | if 'multiplier' in kwargs: 132 | raise ValueError('mobilenetv2 doesn\'t support the generic ' 133 | 'multiplier parameter; use "depth_multiplier" instead.') 134 | if finegrain_classification_mode: 135 | conv_defs = copy.deepcopy(conv_defs) 136 | if depth_multiplier < 1: 137 | conv_defs['spec'][-1].params['num_outputs'] /= depth_multiplier 138 | 139 | depth_args = {} 140 | # NB: do not set depth_args unless they are provided to avoid overriding 141 | # whatever default depth_multiplier might have thanks to arg_scope. 142 | if min_depth is not None: 143 | depth_args['min_depth'] = min_depth 144 | if divisible_by is not None: 145 | depth_args['divisible_by'] = divisible_by 146 | 147 | with slim.arg_scope((lib.depth_multiplier,), **depth_args): 148 | return lib.mobilenet( 149 | input_tensor, 150 | num_classes=num_classes, 151 | conv_defs=conv_defs, 152 | scope=scope, 153 | multiplier=depth_multiplier, 154 | **kwargs) 155 | 156 | 157 | @slim.add_arg_scope 158 | def mobilenet_base(input_tensor, depth_multiplier=1.0, **kwargs): 159 | """Creates base of the mobilenet (no pooling and no logits).""" 160 | return mobilenet(input_tensor, 161 | depth_multiplier=depth_multiplier, 162 | base_only=True, **kwargs) 163 | 164 | 165 | def training_scope(**kwargs): 166 | """Defines MobilenetV2 training scope. 167 | 168 | Usage: 169 | with tf.contrib.slim.arg_scope(mobilenet_v2.training_scope()): 170 | logits, endpoints = mobilenet_v2.mobilenet(input_tensor) 171 | 172 | 173 | 174 | Args: 175 | **kwargs: Passed to mobilenet.training_scope. The following parameters 176 | are supported: 177 | weight_decay- The weight decay to use for regularizing the model. 178 | stddev- Standard deviation for initialization, if negative uses xavier. 179 | dropout_keep_prob- dropout keep probability 180 | bn_decay- decay for the batch norm moving averages. 181 | 182 | Returns: 183 | An `arg_scope` to use for the mobilenet v2 model. 184 | """ 185 | return lib.training_scope(**kwargs) 186 | 187 | 188 | def wrapped_partial(func, *args, **kwargs): 189 | partial_func = functools.partial(func, *args, **kwargs) 190 | functools.update_wrapper(partial_func, func) 191 | return partial_func 192 | 193 | 194 | # Wrappers for mobilenet v2 with depth-multipliers. Note that 195 | # 'finegrain_classification_mode' is set to True, which means the embedding 196 | # layer will not be shrunk when given a depth-multiplier < 1.0.
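Editorial note: the `finegrain_classification_mode` branch above is a bit subtle, so here is the arithmetic for `depth_multiplier=0.35`. The last conv's `num_outputs` is pre-divided by the multiplier, so that the later multiplication inside `depth_multiplier()` (followed by rounding to a multiple of 8) lands back on the full width, while every earlier layer still shrinks:

```
last = 1280 / 0.35                         # ~3657.1, stored in conv_defs
restored = int(last * 0.35 + 4) // 8 * 8   # what depth_multiplier() computes
assert restored == 1280                    # the final layer keeps its full width
```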
197 | mobilenet_v2_140 = wrapped_partial(mobilenet, depth_multiplier=1.4) 198 | mobilenet_v2_050 = wrapped_partial(mobilenet, depth_multiplier=0.50, 199 | finegrain_classification_mode=True) 200 | mobilenet_v2_035 = wrapped_partial(mobilenet, depth_multiplier=0.35, 201 | finegrain_classification_mode=True) 202 | 203 | 204 | __all__ = ['training_scope', 'mobilenet_base', 'mobilenet', 'V2_DEF'] 205 | -------------------------------------------------------------------------------- /mobilenet/mobilenet_v2_test.py: -------------------------------------------------------------------------------- 1 | # Copyright 2018 The TensorFlow Authors. All Rights Reserved. 2 | # 3 | # Licensed under the Apache License, Version 2.0 (the "License"); 4 | # you may not use this file except in compliance with the License. 5 | # You may obtain a copy of the License at 6 | # 7 | # http://www.apache.org/licenses/LICENSE-2.0 8 | # 9 | # Unless required by applicable law or agreed to in writing, software 10 | # distributed under the License is distributed on an "AS IS" BASIS, 11 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 12 | # See the License for the specific language governing permissions and 13 | # limitations under the License. 14 | # ============================================================================== 15 | """Tests for mobilenet_v2.""" 16 | 17 | from __future__ import absolute_import 18 | from __future__ import division 19 | from __future__ import print_function 20 | import copy 21 | import tensorflow as tf 22 | from mobilenet import conv_blocks as ops 23 | from mobilenet import mobilenet 24 | from mobilenet import mobilenet_v2 25 | 26 | 27 | slim = tf.contrib.slim 28 | 29 | 30 | def find_ops(optype): 31 | """Find ops of a given type in graphdef or a graph. 32 | 33 | Args: 34 | optype: operation type (e.g. Conv2D) 35 | Returns: 36 | List of operations. 37 | """ 38 | gd = tf.get_default_graph() 39 | return [var for var in gd.get_operations() if var.type == optype] 40 | 41 | 42 | class MobilenetV2Test(tf.test.TestCase): 43 | 44 | def setUp(self): 45 | tf.reset_default_graph() 46 | 47 | def testCreation(self): 48 | spec = dict(mobilenet_v2.V2_DEF) 49 | _, ep = mobilenet.mobilenet( 50 | tf.placeholder(tf.float32, (10, 224, 224, 16)), conv_defs=spec) 51 | num_convs = len(find_ops('Conv2D')) 52 | 53 | # This is mostly a sanity test. No deep reason for these particular 54 | # constants. 55 | # 56 | # All but first 2 and last one have two convolutions, and there is one 57 | # extra conv that is not in the spec. (logits) 58 | self.assertEqual(num_convs, len(spec['spec']) * 2 - 2) 59 | # Check that depthwise outputs are exposed.
60 | for i in range(2, 17): 61 | self.assertIn('layer_%d/depthwise_output' % i, ep) 62 | 63 | def testCreationNoClasses(self): 64 | spec = copy.deepcopy(mobilenet_v2.V2_DEF) 65 | net, ep = mobilenet.mobilenet( 66 | tf.placeholder(tf.float32, (10, 224, 224, 16)), conv_defs=spec, 67 | num_classes=None) 68 | self.assertIs(net, ep['global_pool']) 69 | 70 | def testImageSizes(self): 71 | for input_size, output_size in [(224, 7), (192, 6), (160, 5), 72 | (128, 4), (96, 3)]: 73 | tf.reset_default_graph() 74 | _, ep = mobilenet_v2.mobilenet( 75 | tf.placeholder(tf.float32, (10, input_size, input_size, 3))) 76 | 77 | self.assertEqual(ep['layer_18/output'].get_shape().as_list()[1:3], 78 | [output_size] * 2) 79 | 80 | def testWithSplits(self): 81 | spec = copy.deepcopy(mobilenet_v2.V2_DEF) 82 | spec['overrides'] = { 83 | (ops.expanded_conv,): dict(split_expansion=2), 84 | } 85 | _, _ = mobilenet.mobilenet( 86 | tf.placeholder(tf.float32, (10, 224, 224, 16)), conv_defs=spec) 87 | num_convs = len(find_ops('Conv2D')) 88 | # All but 3 ops have 3 conv operators; the remaining 3 have one, 89 | # and there is one unaccounted for. 90 | self.assertEqual(num_convs, len(spec['spec']) * 3 - 5) 91 | 92 | def testWithOutputStride8(self): 93 | out, _ = mobilenet.mobilenet_base( 94 | tf.placeholder(tf.float32, (10, 224, 224, 16)), 95 | conv_defs=mobilenet_v2.V2_DEF, 96 | output_stride=8, 97 | scope='MobilenetV2') 98 | self.assertEqual(out.get_shape().as_list()[1:3], [28, 28]) 99 | 100 | def testDivisibleBy(self): 101 | tf.reset_default_graph() 102 | mobilenet_v2.mobilenet( 103 | tf.placeholder(tf.float32, (10, 224, 224, 16)), 104 | conv_defs=mobilenet_v2.V2_DEF, 105 | divisible_by=16, 106 | min_depth=32) 107 | s = [op.outputs[0].get_shape().as_list()[-1] for op in find_ops('Conv2D')] 108 | s = set(s) 109 | self.assertSameElements([32, 64, 96, 160, 192, 320, 384, 576, 960, 1280, 110 | 1001], s) 111 | 112 | def testDivisibleByWithArgScope(self): 113 | tf.reset_default_graph() 114 | # Verifies that depth_multiplier arg scope actually works 115 | # if no default min_depth is provided. 116 | with slim.arg_scope((mobilenet.depth_multiplier,), min_depth=32): 117 | mobilenet_v2.mobilenet( 118 | tf.placeholder(tf.float32, (10, 224, 224, 2)), 119 | conv_defs=mobilenet_v2.V2_DEF, depth_multiplier=0.1) 120 | s = [op.outputs[0].get_shape().as_list()[-1] for op in find_ops('Conv2D')] 121 | s = set(s) 122 | self.assertSameElements(s, [32, 192, 128, 1001]) 123 | 124 | def testFineGrained(self): 125 | tf.reset_default_graph() 126 | # Verifies that finegrain_classification_mode keeps the last layer 127 | # wide even for a very small depth_multiplier. 128 | 129 | mobilenet_v2.mobilenet( 130 | tf.placeholder(tf.float32, (10, 224, 224, 2)), 131 | conv_defs=mobilenet_v2.V2_DEF, depth_multiplier=0.01, 132 | finegrain_classification_mode=True) 133 | s = [op.outputs[0].get_shape().as_list()[-1] for op in find_ops('Conv2D')] 134 | s = set(s) 135 | # All convolutions will be 8->48, except for the last one. 136 | self.assertSameElements(s, [8, 48, 1001, 1280]) 137 | 138 | def testMobilenetBase(self): 139 | tf.reset_default_graph() 140 | # Verifies that mobilenet_base returns pre-pooling layer.
141 | with slim.arg_scope((mobilenet.depth_multiplier,), min_depth=32): 142 | net, _ = mobilenet_v2.mobilenet_base( 143 | tf.placeholder(tf.float32, (10, 224, 224, 16)), 144 | conv_defs=mobilenet_v2.V2_DEF, depth_multiplier=0.1) 145 | self.assertEqual(net.get_shape().as_list(), [10, 7, 7, 128]) 146 | 147 | def testWithOutputStride16(self): 148 | tf.reset_default_graph() 149 | out, _ = mobilenet.mobilenet_base( 150 | tf.placeholder(tf.float32, (10, 224, 224, 16)), 151 | conv_defs=mobilenet_v2.V2_DEF, 152 | output_stride=16) 153 | self.assertEqual(out.get_shape().as_list()[1:3], [14, 14]) 154 | 155 | def testWithOutputStride8AndExplicitPadding(self): 156 | tf.reset_default_graph() 157 | out, _ = mobilenet.mobilenet_base( 158 | tf.placeholder(tf.float32, (10, 224, 224, 16)), 159 | conv_defs=mobilenet_v2.V2_DEF, 160 | output_stride=8, 161 | use_explicit_padding=True, 162 | scope='MobilenetV2') 163 | self.assertEqual(out.get_shape().as_list()[1:3], [28, 28]) 164 | 165 | def testWithOutputStride16AndExplicitPadding(self): 166 | tf.reset_default_graph() 167 | out, _ = mobilenet.mobilenet_base( 168 | tf.placeholder(tf.float32, (10, 224, 224, 16)), 169 | conv_defs=mobilenet_v2.V2_DEF, 170 | output_stride=16, 171 | use_explicit_padding=True) 172 | self.assertEqual(out.get_shape().as_list()[1:3], [14, 14]) 173 | 174 | def testBatchNormScopeDoesNotHaveIsTrainingWhenItsSetToNone(self): 175 | sc = mobilenet.training_scope(is_training=None) 176 | self.assertNotIn('is_training', sc[slim.arg_scope_func_key( 177 | slim.batch_norm)]) 178 | 179 | def testBatchNormScopeDoesHasIsTrainingWhenItsNotNone(self): 180 | sc = mobilenet.training_scope(is_training=False) 181 | self.assertIn('is_training', sc[slim.arg_scope_func_key(slim.batch_norm)]) 182 | sc = mobilenet.training_scope(is_training=True) 183 | self.assertIn('is_training', sc[slim.arg_scope_func_key(slim.batch_norm)]) 184 | sc = mobilenet.training_scope() 185 | self.assertIn('is_training', sc[slim.arg_scope_func_key(slim.batch_norm)]) 186 | 187 | 188 | if __name__ == '__main__': 189 | tf.test.main() 190 | -------------------------------------------------------------------------------- /mobilenet/readme.md: -------------------------------------------------------------------------------- 1 | # MobileNetV2 2 | Placeholder for the code for MobileNetV2 from the TFSlim model library. 
3 | Copy the code from the following link and place in this folder: 4 | 5 | https://github.com/tensorflow/models/tree/master/research/slim/nets/mobilenet 6 | -------------------------------------------------------------------------------- /model.py: -------------------------------------------------------------------------------- 1 | import tensorflow as tf 2 | from mobilenet import mobilenet_v2 3 | import input_pipeline 4 | import pprint 5 | import time 6 | from PIL import Image 7 | import imagenet_preprocessing 8 | import absl 9 | from argparse import Namespace 10 | import sys 11 | import os 12 | import numpy as np 13 | import horovod.tensorflow as hvd 14 | from tensorflow.core.framework.summary_pb2 import Summary 15 | from tensorflow import logging 16 | import mobilefacenet 17 | import horovod.tensorflow as hvd 18 | from mobilenet.mobilenet import global_pool 19 | from mobilenet import conv_blocks as ops 20 | import proxy_metric_learning 21 | from inception import inception_utils 22 | from inception import inception_v1 23 | import resnet 24 | from cifar import cifar 25 | from cifar.dist import pairwise_distance_euclid, triplet_semihard_loss 26 | 27 | slim = tf.contrib.slim 28 | # Define all necessary hyperparameters as flags 29 | flags = absl.flags 30 | FLAGS = flags.FLAGS 31 | 32 | 33 | def imagenet_iterator(is_training, num_epochs=1): 34 | filter_fn = None 35 | fixed_crop = False 36 | if 'face' in FLAGS.model: 37 | input_pipeline._DEFAULT_IMAGE_SIZE = 112 38 | imagenet_preprocessing._RESIZE_MIN = 112 39 | # imagenet_preprocessing._AREA_RANGE = [1.0, 1.0] 40 | # imagenet_preprocessing._ASPECT_RATIO_RANGE = [0.9, 1.1] 41 | # is_training = False 42 | fixed_crop = True 43 | 44 | if FLAGS.model == 'metric_learning': 45 | input_pipeline._DEFAULT_IMAGE_SIZE = 112 46 | imagenet_preprocessing._RESIZE_MIN = 112 47 | imagenet_preprocessing._AREA_RANGE = [0.7, 1.0] 48 | imagenet_preprocessing._ASPECT_RATIO_RANGE = [0.9, 1.1] 49 | fixed_crop = False 50 | 51 | if FLAGS.model == 'mobilenet': 52 | imagenet_preprocessing._AREA_RANGE = [0.2, 1.0] 53 | fixed_crop = False 54 | 55 | if FLAGS.model == 'cifar100': 56 | return cifar.input_fn(FLAGS.data_dir, is_training, FLAGS.batch_size, num_epochs) 57 | else: 58 | data = input_pipeline.input_fn( 59 | is_training=is_training, 60 | data_dir=FLAGS.data_dir, 61 | batch_size=FLAGS.batch_size, 62 | num_epochs=num_epochs, 63 | return_label_text=True, 64 | filter_fn=filter_fn, 65 | fixed_crop=fixed_crop 66 | ) 67 | 68 | iterator = data.make_one_shot_iterator() 69 | features, labels, label_texts, filename = iterator.get_next() 70 | feature_dict = { 71 | "features": features, 72 | "labels": labels, 73 | "label_texts": label_texts, 74 | "filename": filename 75 | } 76 | 77 | return feature_dict, labels 78 | 79 | 80 | def model_fn(features, labels, mode): 81 | feature_dict = features 82 | labels = tf.reshape(feature_dict["labels"], [-1]) 83 | 84 | if mode == tf.estimator.ModeKeys.TRAIN: 85 | is_training = True 86 | else: 87 | is_training = False 88 | 89 | if FLAGS.model == 'mobilenet': 90 | scope = mobilenet_v2.training_scope(is_training=is_training) 91 | with tf.contrib.slim.arg_scope(scope): 92 | net, end_points = mobilenet_v2.mobilenet( 93 | feature_dict["features"], is_training=is_training, num_classes=FLAGS.num_classes) 94 | end_points['embedding'] = end_points['global_pool'] 95 | elif FLAGS.model == 'mobilefacenet': 96 | scope = mobilenet_v2.training_scope(is_training=is_training) 97 | with tf.contrib.slim.arg_scope(scope): 98 | net, end_points = 
mobilefacenet.mobilefacenet( 99 | feature_dict["features"], is_training=is_training, num_classes=FLAGS.num_classes) 100 | elif FLAGS.model == 'metric_learning': 101 | with slim.arg_scope(inception_v1.inception_v1_arg_scope(weight_decay=0.0)): 102 | net, end_points = proxy_metric_learning.metric_learning( 103 | feature_dict["features"], is_training=is_training, num_classes=FLAGS.num_classes) 104 | elif FLAGS.model == 'faceresnet': 105 | net, end_points = mobilefacenet.faceresnet( 106 | feature_dict["features"], is_training=is_training, num_classes=FLAGS.num_classes) 107 | elif FLAGS.model == 'cifar100': 108 | net, end_points = cifar.nin(feature_dict["features"], labels, 109 | is_training=is_training, num_classes=FLAGS.num_classes) 110 | else: 111 | raise ValueError("Unknown model %s" % FLAGS.model) 112 | 113 | small_embeddings = tf.squeeze(end_points['embedding']) 114 | logits = end_points['Logits'] 115 | predictions = tf.cast(tf.argmax(logits, 1), dtype=tf.int32) 116 | 117 | if FLAGS.final_activation in ['soft_thresh', 'none']: 118 | abs_embeddings = tf.abs(small_embeddings) 119 | else: 120 | abs_embeddings = small_embeddings 121 | nnz_small_embeddings = tf.cast(tf.less(FLAGS.zero_threshold, abs_embeddings), dtype=tf.float32) 122 | small_sparsity = tf.reduce_sum(nnz_small_embeddings, axis=1) 123 | small_sparsity = tf.reduce_mean(small_sparsity) 124 | 125 | mean_nnz_col = tf.reduce_mean(nnz_small_embeddings, axis=0) 126 | sum_nnz_col = tf.reduce_sum(nnz_small_embeddings, axis=0) 127 | mean_flops_ub = tf.reduce_sum(sum_nnz_col * (sum_nnz_col - 1)) / (FLAGS.batch_size * (FLAGS.batch_size - 1)) 128 | mean_flops = tf.reduce_sum(mean_nnz_col * mean_nnz_col) 129 | 130 | l1_norm_row = tf.reduce_sum(abs_embeddings, axis=1) 131 | l1_norm_col = tf.reduce_mean(abs_embeddings, axis=0) 132 | 133 | mean_l1_norm = tf.reduce_mean(l1_norm_row) 134 | mean_flops_sur = tf.reduce_sum(l1_norm_col * l1_norm_col) 135 | 136 | if mode == tf.estimator.ModeKeys.PREDICT: 137 | predictions_dict = { 138 | 'predictions': predictions, 139 | 'true_labels': feature_dict["labels"], 140 | 'true_label_texts': feature_dict["label_texts"], 141 | 'small_embeddings': small_embeddings, 142 | 'sparsity/small': small_sparsity, 143 | 'sparsity/flops': mean_flops_ub, 144 | 'filename': features["filename"] 145 | } 146 | return tf.estimator.EstimatorSpec(mode, predictions=predictions_dict) 147 | 148 | if FLAGS.model == 'mobilenet': 149 | cr_ent_loss = tf.nn.sparse_softmax_cross_entropy_with_logits(logits=logits, labels=labels) 150 | elif 'face' in FLAGS.model: 151 | cr_ent_loss = mobilefacenet.arcface_loss(logits=logits, labels=labels, out_num=FLAGS.num_classes) 152 | elif FLAGS.model == 'metric_learning': 153 | cr_ent_loss = tf.nn.sparse_softmax_cross_entropy_with_logits(logits=logits, labels=labels) 154 | # cr_ent_loss = mobilefacenet.arcface_loss(logits=logits, labels=labels, out_num=FLAGS.num_classes) 155 | elif FLAGS.model == 'cifar100': 156 | # cr_ent_loss = tf.nn.sparse_softmax_cross_entropy_with_logits(logits=logits, labels=labels) 157 | cr_ent_loss = tf.contrib.losses.metric_learning.triplet_semihard_loss( 158 | labels, small_embeddings, margin=0.1) 159 | # cr_ent_loss = mobilefacenet.arcface_loss(logits=logits, 160 | # labels=labels, out_num=FLAGS.num_classes, m=0.0) 161 | # cr_ent_loss = triplet_semihard_loss(labels=labels, embeddings=small_embeddings, 162 | # pairwise_distance=lambda embed: pairwise_distance_euclid(embed, squared=True), margin=0.3) 163 | else: 164 | raise ValueError('Unknown model %s' % FLAGS.model) 165 | 
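Editorial note: the sparsity statistics computed above are the heart of the FLOPs regularizer from the paper. `mean_flops` estimates the expected cost of comparing two random embeddings as the sum of squared per-dimension activation probabilities, `mean_flops_ub` is the matching leave-one-out estimate over the batch, and the squared l1 column means give the differentiable surrogate that is actually penalized. A numpy restatement of the same formulas (illustration only; `zero_threshold` taken as 0 here):

```
import numpy as np

emb = np.maximum(np.random.randn(32, 64), 0.0)  # batch_size x embedding_size
nnz = (emb > 0.0).astype(np.float32)
b = emb.shape[0]
mean_nnz_col = nnz.mean(axis=0)                 # activation probability p_j
sum_nnz_col = nnz.sum(axis=0)
mean_flops = np.sum(mean_nnz_col ** 2)          # sum_j p_j^2
mean_flops_ub = np.sum(sum_nnz_col * (sum_nnz_col - 1)) / (b * (b - 1))
l1_norm_col = np.abs(emb).mean(axis=0)
mean_flops_sur = np.sum(l1_norm_col ** 2)       # convex surrogate for the above
print(mean_flops, mean_flops_ub, mean_flops_sur)
```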
cr_ent_loss = tf.reduce_mean(cr_ent_loss) + tf.losses.get_regularization_loss() 166 | 167 | ema = tf.train.ExponentialMovingAverage(decay=0.99) 168 | ema_op = ema.apply([mean_l1_norm]) 169 | moving_l1_norm = ema.average(mean_l1_norm) 170 | 171 | global_step = tf.train.get_or_create_global_step() 172 | all_ops = [ema_op] 173 | 174 | l1_weight = tf.Variable(0.0, name='l1_weight', trainable=False) 175 | 176 | if FLAGS.l1_weighing_scheme == 'constant': 177 | l1_weight = FLAGS.l1_parameter 178 | elif FLAGS.l1_weighing_scheme == 'dynamic_1': 179 | l1_weight = FLAGS.l1_parameter / moving_l1_norm 180 | l1_weight = tf.stop_gradient(l1_weight) 181 | l1_weight = tf.train.piecewise_constant( 182 | x=global_step, boundaries=[5], values=[0.0, l1_weight]) 183 | elif FLAGS.l1_weighing_scheme == 'dynamic_2': 184 | if FLAGS.sparsity_type == "flops_sur": 185 | update_lr = 1e-5 186 | else: 187 | update_lr = 1e-4 188 | update = update_lr * (FLAGS.l1_parameter - cr_ent_loss) 189 | assign_op = tf.assign(l1_weight, tf.nn.relu(l1_weight + update)) 190 | all_ops.append(assign_op) 191 | elif FLAGS.l1_weighing_scheme == 'dynamic_3': 192 | update_lr = 1e-4 193 | global_step = tf.train.get_or_create_global_step() 194 | upper_bound = FLAGS.l1_parameter - (FLAGS.l1_parameter - 12.0) * tf.cast(global_step, tf.float32) / 5e5 195 | upper_bound = tf.cast(upper_bound, tf.float32) 196 | update = update_lr * tf.sign(upper_bound - cr_ent_loss) 197 | assign_op = tf.assign(l1_weight, l1_weight + tf.nn.relu(update)) 198 | all_ops.append(assign_op) 199 | elif FLAGS.l1_weighing_scheme == 'dynamic_4': 200 | l1_weight = FLAGS.l1_parameter * tf.minimum(1.0, 201 | (tf.cast(global_step, tf.float32) / FLAGS.l1_p_steps)**2) 202 | elif FLAGS.l1_weighing_scheme is None: 203 | l1_weight = 0.0 204 | else: 205 | raise ValueError('Unknown l1_weighing_scheme %s' % FLAGS.l1_weighing_scheme) 206 | 207 | if FLAGS.sparsity_type == 'l1_norm': 208 | sparsity_loss = l1_weight * mean_l1_norm 209 | elif FLAGS.sparsity_type == 'flops_sur': 210 | sparsity_loss = l1_weight * mean_flops_sur 211 | elif FLAGS.sparsity_type is None: 212 | sparsity_loss = 0.0 213 | else: 214 | raise ValueError("Unknown sparsity_type %d" % FLAGS.sparsity_type) 215 | 216 | total_loss = cr_ent_loss + sparsity_loss 217 | accuracy = tf.reduce_mean(tf.cast(tf.equal(predictions, labels), dtype=tf.float32)) 218 | 219 | if mode == tf.estimator.ModeKeys.EVAL: 220 | global_accuracy = tf.metrics.mean(accuracy) 221 | global_small_sparsity = tf.metrics.mean(small_sparsity) 222 | global_flops = tf.metrics.mean(mean_flops_ub) 223 | global_l1_norm = tf.metrics.mean(mean_l1_norm) 224 | global_mean_flops_sur = tf.metrics.mean(mean_flops_sur) 225 | global_cr_ent_loss = tf.metrics.mean(cr_ent_loss) 226 | 227 | metrics = { 228 | 'accuracy': global_accuracy, 229 | 'sparsity/small': global_small_sparsity, 230 | 'sparsity/flops': global_flops, 231 | 'l1_norm': global_l1_norm, 232 | 'l1_norm/flops_sur': global_mean_flops_sur, 233 | 'loss/cr_ent_loss': global_cr_ent_loss, 234 | } 235 | return tf.estimator.EstimatorSpec(mode, loss=total_loss, eval_metric_ops=metrics) 236 | 237 | if mode == tf.estimator.ModeKeys.TRAIN: 238 | base_lrate = FLAGS.learning_rate 239 | learning_rate = tf.train.piecewise_constant( 240 | x=global_step, 241 | boundaries=[FLAGS.decay_step if FLAGS.decay_step is not None else int(1e7)], 242 | values=[base_lrate, base_lrate / 10.0]) 243 | 244 | tf.summary.image("input", feature_dict["features"], max_outputs=1) 245 | tf.summary.scalar('sparsity/small', small_sparsity) 246 | 
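Editorial note: among the l1-weighing schemes above, `dynamic_2` adapts the regularization weight toward a target loss value: the weight grows while the classification loss is below `FLAGS.l1_parameter` and shrinks (clipped at 0) once sparsity starts hurting accuracy. A minimal sketch of that update, with a hypothetical loss trajectory:

```
def dynamic_2_step(l1_weight, cr_ent_loss, target, update_lr=1e-4):
    # l1_weight <- relu(l1_weight + lr * (target - loss)), as in the code above.
    return max(0.0, l1_weight + update_lr * (target - cr_ent_loss))

w = 0.0
for loss in [10.0, 11.0, 12.5, 13.0]:  # hypothetical loss trajectory
    w = dynamic_2_step(w, loss, target=12.0)
print(w)  # the weight rose while loss < 12, then decayed
```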
tf.summary.scalar('sparsity/flops', mean_flops_ub) 247 | tf.summary.scalar('accuracy', accuracy) 248 | tf.summary.scalar('l1_norm', mean_l1_norm) 249 | tf.summary.scalar('l1_norm/ema', moving_l1_norm) # Comment this for mom 250 | tf.summary.scalar('l1_norm/l1_weight', l1_weight) 251 | tf.summary.scalar('l1_norm/flops_sur', mean_flops_sur) 252 | tf.summary.scalar('learning_rate', learning_rate) 253 | tf.summary.scalar('loss/cr_ent_loss', cr_ent_loss) 254 | tf.summary.scalar('sparsity/ratio', 255 | mean_flops * FLAGS.embedding_size / (small_sparsity * small_sparsity)) 256 | try: 257 | tf.summary.scalar('loss/upper_bound', upper_bound) 258 | except NameError: 259 | print("Skipping 'upper_bound' summary") 260 | for variable in tf.trainable_variables(): 261 | if 'soft_thresh' in variable.name: 262 | print('Adding summary for lambda') 263 | tf.summary.scalar('lambda', variable) 264 | 265 | # Histogram summaries 266 | # gamma = tf.get_default_graph().get_tensor_by_name('MobilenetV2/Conv_2/BatchNorm/gamma:0') 267 | # pre_relu = tf.get_default_graph().get_tensor_by_name( 268 | # 'MobilenetV2/Conv_2/BatchNorm/FusedBatchNorm:0') 269 | # pre_relu = tf.squeeze(pre_relu) 270 | # tf.summary.histogram('gamma', gamma) 271 | # tf.summary.histogram('pre_relu', pre_relu[:, 237]) 272 | # tf.summary.histogram('small_activations', nnz_small_embeddings) 273 | # tf.summary.histogram('small_activations/log', tf.log(nnz_small_embeddings + 1e-10)) 274 | # fl_sur_ratio = (mean_nnz_col * mean_nnz_col) / (l1_norm_col * l1_norm_col) 275 | # tf.summary.histogram('fl_sur_ratio', fl_sur_ratio) 276 | # tf.summary.histogram('fl_sur_ratio/log', tf.log(fl_sur_ratio + 1e-10)) 277 | # l1_sur_ratio = (mean_nnz_col * mean_nnz_col) / l1_norm_col 278 | # tf.summary.histogram('l1_sur_ratio', l1_sur_ratio) 279 | # tf.summary.histogram('l1_sur_ratio/log', tf.log(l1_sur_ratio + 1e-10)) 280 | 281 | if FLAGS.optimizer == 'mom': 282 | optimizer = tf.train.MomentumOptimizer(learning_rate=learning_rate, 283 | momentum=FLAGS.momentum) 284 | elif FLAGS.optimizer == 'sgd': 285 | optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate) 286 | elif FLAGS.optimizer == 'adam': 287 | optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate, epsilon=0.0001) 288 | elif FLAGS.optimizer == 'rmsprop': 289 | optimizer = tf.train.RMSPropOptimizer(learning_rate=learning_rate, epsilon=0.01, 290 | momentum=FLAGS.momentum) 291 | else: 292 | raise ValueError("Unknown optimizer %s" % FLAGS.optimizer) 293 | 294 | # Make the optimizer distributed 295 | optimizer = hvd.DistributedOptimizer(optimizer) 296 | 297 | train_op = tf.contrib.slim.learning.create_train_op( 298 | total_loss, 299 | optimizer, 300 | global_step=tf.train.get_or_create_global_step() 301 | ) 302 | all_ops.append(train_op) 303 | merged = tf.group(*all_ops) 304 | 305 | return tf.estimator.EstimatorSpec(mode, loss=total_loss, train_op=merged) 306 | -------------------------------------------------------------------------------- /preprocessing/build_image_data.py: -------------------------------------------------------------------------------- 1 | # Copyright 2016 Google Inc. All Rights Reserved. 2 | # 3 | # Licensed under the Apache License, Version 2.0 (the "License"); 4 | # you may not use this file except in compliance with the License. 
5 | # You may obtain a copy of the License at 6 | # 7 | # http://www.apache.org/licenses/LICENSE-2.0 8 | # 9 | # Unless required by applicable law or agreed to in writing, software 10 | # distributed under the License is distributed on an "AS IS" BASIS, 11 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 12 | # See the License for the specific language governing permissions and 13 | # limitations under the License. 14 | # ============================================================================== 15 | """Converts image data to TFRecords file format with Example protos. 16 | 17 | The image data set is expected to reside in JPEG files located in the 18 | following directory structure. 19 | 20 | data_dir/label_0/image0.jpeg 21 | data_dir/label_0/image1.jpg 22 | ... 23 | data_dir/label_1/weird-image.jpeg 24 | data_dir/label_1/my-image.jpeg 25 | ... 26 | 27 | where the sub-directory is the unique label associated with these images. 28 | 29 | This TensorFlow script converts the training and evaluation data into 30 | a sharded data set consisting of TFRecord files 31 | 32 | train_directory/train-00000-of-01024 33 | train_directory/train-00001-of-01024 34 | ... 35 | train_directory/train-01023-of-01024 36 | 37 | and 38 | 39 | validation_directory/validation-00000-of-00128 40 | validation_directory/validation-00001-of-00128 41 | ... 42 | validation_directory/validation-00127-of-00128 43 | 44 | where we have selected 1024 and 128 shards for each data set. Each record 45 | within the TFRecord file is a serialized Example proto. The Example proto 46 | contains the following fields: 47 | 48 | image/encoded: string containing JPEG encoded image in RGB colorspace 49 | image/height: integer, image height in pixels 50 | image/width: integer, image width in pixels 51 | image/colorspace: string, specifying the colorspace, always 'RGB' 52 | image/channels: integer, specifying the number of channels, always 3 53 | image/format: string, specifying the format, always 'JPEG' 54 | 55 | image/filename: string containing the basename of the image file 56 | e.g. 'n01440764_10026.JPEG' or 'ILSVRC2012_val_00000293.JPEG' 57 | image/class/label: integer specifying the index in a classification layer. 58 | The label ranges from [0, num_labels] where 0 is unused and left as 59 | the background class. 60 | image/class/text: string specifying the human-readable version of the label 61 | e.g. 'dog' 62 | 63 | If your data set involves bounding boxes, please look at build_imagenet_data.py. 64 | """ 65 | from __future__ import absolute_import 66 | from __future__ import division 67 | from __future__ import print_function 68 | 69 | from datetime import datetime 70 | import os 71 | import random 72 | import sys 73 | import threading 74 | import json 75 | 76 | import numpy as np 77 | import tensorflow as tf 78 | 79 | tf.app.flags.DEFINE_string('train_directory', None, 80 | 'Training data directory') 81 | tf.app.flags.DEFINE_string('validation_directory', None, 82 | 'Validation data directory') 83 | tf.app.flags.DEFINE_string('output_directory', None, 84 | 'Output data directory') 85 | 86 | tf.app.flags.DEFINE_integer('train_shards', 1024, 87 | 'Number of shards in training TFRecord files.') 88 | tf.app.flags.DEFINE_integer('validation_shards', 1024, 89 | 'Number of shards in validation TFRecord files.') 90 | 91 | tf.app.flags.DEFINE_integer('num_threads', 16, 92 | 'Number of threads to preprocess the images.') 93 | 94 | # The labels file contains a list of valid labels are held in this file. 
95 | # Assumes that the file contains entries as such: 96 | # dog 97 | # cat 98 | # flower 99 | # where each line corresponds to a label. We map each label contained in 100 | # the file to an integer corresponding to the line number starting from 0. 101 | tf.app.flags.DEFINE_string('labels_file', None, 'Labels file') 102 | tf.app.flags.DEFINE_string('filelist', None, 'File list') 103 | 104 | 105 | FLAGS = tf.app.flags.FLAGS 106 | 107 | 108 | def _int64_feature(value): 109 | """Wrapper for inserting int64 features into Example proto.""" 110 | if not isinstance(value, list): 111 | value = [value] 112 | return tf.train.Feature(int64_list=tf.train.Int64List(value=value)) 113 | 114 | 115 | def _bytes_feature(value): 116 | """Wrapper for inserting bytes features into Example proto.""" 117 | return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value])) 118 | 119 | 120 | def _convert_to_example(filename, image_buffer, label, text, height, width): 121 | """Build an Example proto for an example. 122 | 123 | Args: 124 | filename: string, path to an image file, e.g., '/path/to/example.JPG' 125 | image_buffer: string, JPEG encoding of RGB image 126 | label: integer, identifier for the ground truth for the network 127 | text: string, unique human-readable, e.g. 'dog' 128 | height: integer, image height in pixels 129 | width: integer, image width in pixels 130 | Returns: 131 | Example proto 132 | """ 133 | 134 | colorspace = 'RGB' 135 | channels = 3 136 | image_format = 'JPEG' 137 | 138 | example = tf.train.Example(features=tf.train.Features(feature={ 139 | 'image/height': _int64_feature(height), 140 | 'image/width': _int64_feature(width), 141 | 'image/colorspace': _bytes_feature(tf.compat.as_bytes(colorspace)), 142 | 'image/channels': _int64_feature(channels), 143 | 'image/class/label': _int64_feature(label), 144 | 'image/class/text': _bytes_feature(tf.compat.as_bytes(text)), 145 | 'image/format': _bytes_feature(tf.compat.as_bytes(image_format)), 146 | 'image/filename': _bytes_feature(tf.compat.as_bytes(os.path.basename(filename))), 147 | 'image/encoded': _bytes_feature(tf.compat.as_bytes(image_buffer))})) 148 | return example 149 | 150 | 151 | class ImageCoder(object): 152 | """Helper class that provides TensorFlow image coding utilities.""" 153 | 154 | def __init__(self): 155 | # Create a single Session to run all image coding calls. 156 | self._sess = tf.Session() 157 | 158 | # Initializes function that converts PNG to JPEG data. 159 | self._png_data = tf.placeholder(dtype=tf.string) 160 | image = tf.image.decode_png(self._png_data, channels=3) 161 | self._png_to_jpeg = tf.image.encode_jpeg(image, format='rgb', quality=100) 162 | 163 | # Initializes function that decodes RGB JPEG data. 164 | self._decode_jpeg_data = tf.placeholder(dtype=tf.string) 165 | self._decode_jpeg = tf.image.decode_jpeg(self._decode_jpeg_data, channels=3) 166 | 167 | def png_to_jpeg(self, image_data): 168 | return self._sess.run(self._png_to_jpeg, 169 | feed_dict={self._png_data: image_data}) 170 | 171 | def decode_jpeg(self, image_data): 172 | image = self._sess.run(self._decode_jpeg, 173 | feed_dict={self._decode_jpeg_data: image_data}) 174 | assert len(image.shape) == 3 175 | assert image.shape[2] == 3 176 | return image 177 | 178 | 179 | def _is_png(filename): 180 | """Determine if a file contains a PNG format image. 181 | 182 | Args: 183 | filename: string, path of the image file. 184 | 185 | Returns: 186 | boolean indicating if the image is a PNG. 
187 | """ 188 | return filename.endswith('.png') 189 | 190 | 191 | def _process_image(filename, coder): 192 | """Process a single image file. 193 | 194 | Args: 195 | filename: string, path to an image file e.g., '/path/to/example.JPG'. 196 | coder: instance of ImageCoder to provide TensorFlow image coding utils. 197 | Returns: 198 | image_buffer: string, JPEG encoding of RGB image. 199 | height: integer, image height in pixels. 200 | width: integer, image width in pixels. 201 | """ 202 | # Read the image file. 203 | with tf.gfile.FastGFile(filename, 'rb') as f: 204 | image_data = f.read() 205 | 206 | # Convert any PNG to JPEG's for consistency. 207 | if _is_png(filename): 208 | # print('Converting PNG to JPEG for %s' % filename) 209 | image_data = coder.png_to_jpeg(image_data) 210 | 211 | # Decode the RGB JPEG. 212 | image = coder.decode_jpeg(image_data) 213 | 214 | # Check that image converted to RGB 215 | assert len(image.shape) == 3 216 | height = image.shape[0] 217 | width = image.shape[1] 218 | assert image.shape[2] == 3 219 | 220 | return image_data, height, width 221 | 222 | 223 | def _process_image_files_batch(coder, thread_index, ranges, name, filenames, 224 | texts, labels, num_shards): 225 | """Processes and saves list of images as TFRecord in 1 thread. 226 | 227 | Args: 228 | coder: instance of ImageCoder to provide TensorFlow image coding utils. 229 | thread_index: integer, unique batch to run index is within [0, len(ranges)). 230 | ranges: list of pairs of integers specifying ranges of each batches to 231 | analyze in parallel. 232 | name: string, unique identifier specifying the data set 233 | filenames: list of strings; each string is a path to an image file 234 | texts: list of strings; each string is human readable, e.g. 'dog' 235 | labels: list of integer; each integer identifies the ground truth 236 | num_shards: integer number of shards for this data set. 237 | """ 238 | # Each thread produces N shards where N = int(num_shards / num_threads). 239 | # For instance, if num_shards = 128, and the num_threads = 2, then the first 240 | # thread would produce shards [0, 64). 241 | num_threads = len(ranges) 242 | assert not num_shards % num_threads 243 | num_shards_per_batch = int(num_shards / num_threads) 244 | 245 | shard_ranges = np.linspace(ranges[thread_index][0], 246 | ranges[thread_index][1], 247 | num_shards_per_batch + 1).astype(int) 248 | num_files_in_thread = ranges[thread_index][1] - ranges[thread_index][0] 249 | 250 | counter = 0 251 | for s in range(num_shards_per_batch): 252 | # Generate a sharded version of the file name, e.g. 'train-00002-of-00010' 253 | shard = thread_index * num_shards_per_batch + s 254 | output_filename = '%s-%.5d-of-%.5d' % (name, shard, num_shards) 255 | output_file = os.path.join(FLAGS.output_directory, output_filename) 256 | writer = tf.python_io.TFRecordWriter(output_file) 257 | 258 | shard_counter = 0 259 | files_in_shard = np.arange(shard_ranges[s], shard_ranges[s + 1], dtype=int) 260 | for i in files_in_shard: 261 | filename = filenames[i] 262 | label = labels[i] 263 | text = texts[i] 264 | 265 | try: 266 | image_buffer, height, width = _process_image(filename, coder) 267 | except Exception as e: 268 | print(e) 269 | print('SKIPPED: Unexpected error while decoding %s.' 
% filename) 270 | continue 271 | 272 | example = _convert_to_example(filename, image_buffer, label, 273 | text, height, width) 274 | writer.write(example.SerializeToString()) 275 | shard_counter += 1 276 | counter += 1 277 | 278 | if not counter % 1000: 279 | print('%s [thread %d]: Processed %d of %d images in thread batch.' % 280 | (datetime.now(), thread_index, counter, num_files_in_thread)) 281 | sys.stdout.flush() 282 | 283 | writer.close() 284 | print('%s [thread %d]: Wrote %d images to %s' % 285 | (datetime.now(), thread_index, shard_counter, output_file)) 286 | sys.stdout.flush() 287 | shard_counter = 0 288 | print('%s [thread %d]: Wrote %d images to %d shards.' % 289 | (datetime.now(), thread_index, counter, num_files_in_thread)) 290 | sys.stdout.flush() 291 | 292 | 293 | def _process_image_files(name, filenames, texts, labels, num_shards): 294 | """Process and save list of images as TFRecord of Example protos. 295 | 296 | Args: 297 | name: string, unique identifier specifying the data set 298 | filenames: list of strings; each string is a path to an image file 299 | texts: list of strings; each string is human readable, e.g. 'dog' 300 | labels: list of integer; each integer identifies the ground truth 301 | num_shards: integer number of shards for this data set. 302 | """ 303 | assert len(filenames) == len(texts) 304 | assert len(filenames) == len(labels) 305 | 306 | # Break all images into batches with a [ranges[i][0], ranges[i][1]]. 307 | spacing = np.linspace(0, len(filenames), FLAGS.num_threads + 1).astype(np.int) 308 | ranges = [] 309 | for i in range(len(spacing) - 1): 310 | ranges.append([spacing[i], spacing[i + 1]]) 311 | 312 | # Launch a thread for each batch. 313 | print('Launching %d threads for spacings: %s' % (FLAGS.num_threads, ranges)) 314 | sys.stdout.flush() 315 | 316 | # Create a mechanism for monitoring when all threads are finished. 317 | coord = tf.train.Coordinator() 318 | 319 | # Create a generic TensorFlow-based utility for converting all image codings. 320 | coder = ImageCoder() 321 | 322 | threads = [] 323 | for thread_index in range(len(ranges)): 324 | args = (coder, thread_index, ranges, name, filenames, 325 | texts, labels, num_shards) 326 | t = threading.Thread(target=_process_image_files_batch, args=args) 327 | t.start() 328 | threads.append(t) 329 | 330 | # Wait for all the threads to terminate. 331 | coord.join(threads) 332 | print('%s: Finished writing all %d images in data set.' % 333 | (datetime.now(), len(filenames))) 334 | sys.stdout.flush() 335 | 336 | 337 | def _find_image_files(data_dir, labels_file): 338 | """Build a list of all images files and labels in the data set. 339 | 340 | Args: 341 | data_dir: string, path to the root directory of images. 342 | 343 | Assumes that the image data set resides in JPEG files located in 344 | the following directory structure. 345 | 346 | data_dir/dog/another-image.JPEG 347 | data_dir/dog/my-image.jpg 348 | 349 | where 'dog' is the label associated with these images. 350 | 351 | labels_file: string, path to the labels file. 352 | 353 | The list of valid labels are held in this file. Assumes that the file 354 | contains entries as such: 355 | dog 356 | cat 357 | flower 358 | where each line corresponds to a label. We map each label contained in 359 | the file to an integer starting with the integer 0 corresponding to the 360 | label contained in the first line. 361 | 362 | Returns: 363 | filenames: list of strings; each string is a path to an image file. 
364 | texts: list of strings; each string is the class, e.g. 'dog' 365 | labels: list of integer; each integer identifies the ground truth. 366 | """ 367 | print('Determining list of input files and labels from %s.' % data_dir) 368 | unique_labels = [l.strip() for l in tf.gfile.FastGFile( 369 | labels_file, 'r').readlines()] 370 | 371 | labels = [] 372 | filenames = [] 373 | texts = [] 374 | 375 | # Leave label index 0 empty as a background class. 376 | label_index = 1 377 | 378 | # Construct the list of JPEG files and labels. 379 | for text in unique_labels: 380 | patterns = [] 381 | for extension in ['JPEG', 'jpg', 'JPG', 'jpeg', 'png', 'PNG']: 382 | pattern = '%s/%s/*.%s' % (data_dir, text, extension) 383 | patterns.append(pattern) 384 | matching_files = tf.gfile.Glob(patterns) 385 | 386 | labels.extend([label_index] * len(matching_files)) 387 | texts.extend([text] * len(matching_files)) 388 | filenames.extend(matching_files) 389 | 390 | if not label_index % 100: 391 | print('Finished finding files in %d of %d classes.' % ( 392 | label_index, len(unique_labels))) 393 | label_index += 1 394 | 395 | # Shuffle the ordering of all image files in order to guarantee 396 | # random ordering of the images with respect to label in the 397 | # saved TFRecord files. Make the randomization repeatable. 398 | shuffled_index = list(range(len(filenames))) 399 | random.seed(12345) 400 | random.shuffle(shuffled_index) 401 | 402 | filenames = [filenames[i] for i in shuffled_index] 403 | texts = [texts[i] for i in shuffled_index] 404 | labels = [labels[i] for i in shuffled_index] 405 | 406 | print('Found %d JPEG files across %d labels inside %s.' % 407 | (len(filenames), len(unique_labels), data_dir)) 408 | return filenames, texts, labels 409 | 410 | 411 | def _process_dataset(name, directory, num_shards, labels_file=None, filelist=None): 412 | """Process a complete data set and save it as a TFRecord. 413 | 414 | Args: 415 | name: string, unique identifier specifying the data set. 416 | directory: string, root path to the data set. 417 | num_shards: integer number of shards for this data set. 418 | labels_file: string, path to the labels file. 
419 | """ 420 | if filelist is not None: 421 | with open(filelist, 'r') as fin: 422 | filelist = json.load(fin) 423 | filenames = filelist["path"] 424 | filenames = [os.path.join(directory, path) for path in filenames] 425 | if "id" in filelist: 426 | texts = filelist["id"] 427 | label_dict = {} 428 | labels = [] 429 | for label in texts: 430 | if label not in label_dict: 431 | label_dict[label] = len(label_dict) 432 | labels.append(label_dict[label]) 433 | else: 434 | texts = ["_"] * len(filenames) 435 | labels = [-1] * len(filenames) 436 | 437 | shuffled_index = list(range(len(filenames))) 438 | random.seed(12345) 439 | random.shuffle(shuffled_index) 440 | 441 | filenames = [filenames[i] for i in shuffled_index] 442 | texts = [texts[i] for i in shuffled_index] 443 | labels = [labels[i] for i in shuffled_index] 444 | 445 | print("*" * 10, "Found %d files" % len(filenames)) 446 | 447 | else: 448 | filenames, texts, labels = _find_image_files(directory, labels_file) 449 | _process_image_files(name, filenames, texts, labels, num_shards) 450 | 451 | 452 | def main(unused_argv): 453 | assert not FLAGS.train_shards % FLAGS.num_threads, ( 454 | 'Please make the FLAGS.num_threads commensurate with FLAGS.train_shards') 455 | assert not FLAGS.validation_shards % FLAGS.num_threads, ( 456 | 'Please make the FLAGS.num_threads commensurate with ' 457 | 'FLAGS.validation_shards') 458 | print('Saving results to %s' % FLAGS.output_directory) 459 | 460 | assert not FLAGS.output_directory is None, ('Please specify a output directory') 461 | 462 | # Run it! 463 | if FLAGS.validation_directory is not None: 464 | if FLAGS.filelist is not None: 465 | _process_dataset('validation', FLAGS.validation_directory, 466 | FLAGS.validation_shards, filelist=FLAGS.filelist) 467 | else: 468 | _process_dataset('validation', FLAGS.validation_directory, 469 | FLAGS.validation_shards, labels_file=FLAGS.labels_file) 470 | else: 471 | print("Validation directory not specified, skipping") 472 | 473 | if FLAGS.train_directory is not None: 474 | if FLAGS.filelist is not None: 475 | _process_dataset('train', FLAGS.train_directory, 476 | FLAGS.train_shards, filelist=FLAGS.filelist) 477 | else: 478 | _process_dataset('train', FLAGS.train_directory, 479 | FLAGS.train_shards, labels_file=FLAGS.labels_file) 480 | else: 481 | print("Train directory not specified, skipping") 482 | 483 | if __name__ == '__main__': 484 | tf.app.run() 485 | -------------------------------------------------------------------------------- /preprocessing/msceleb_to_jpeg.py: -------------------------------------------------------------------------------- 1 | ''' 2 | Extracts jpeg images from MXNet format dataset. 
3 | Usage: python msceleb_to_jpeg.py dataset_directory
4 | Extracted images will be stored in dataset_directory/images
5 | '''
6 |
7 | import os
8 | import random
9 | import logging
10 | import sys
11 | import numbers
12 | import numpy as np
13 |
14 | import mxnet as mx
15 | from mxnet import ndarray as nd
16 | from mxnet import io
17 | from mxnet import recordio
18 | from PIL import Image
19 | import pickle
20 |
21 |
22 | def get_msceleb_images(records_dir):
23 |     imgidx_path = os.path.join(records_dir, "train.idx")
24 |     imgrec_path = os.path.join(records_dir, "train.rec")
25 |     images_dir = os.path.join(records_dir, "images")
26 |
27 |     imgrec = recordio.MXIndexedRecordIO(imgidx_path, imgrec_path, 'r')
28 |     s = imgrec.read_idx(0)
29 |     header, _ = recordio.unpack(s)
30 |     tot_images = int(header.label[0]) - 1
31 |     print("Total images", tot_images)
32 |     for i in range(tot_images):
33 |         print("Reading ", i)
34 |         s = imgrec.read()
35 |         header, img = recordio.unpack(s)
36 |         img = mx.image.imdecode(img).asnumpy()
37 |         label = int(header.label)
38 |         img = Image.fromarray(np.uint8(img), "RGB")
39 |         images_subdir = os.path.join(images_dir, "identity_%d" % label)
40 |         if not os.path.exists(images_subdir):
41 |             os.makedirs(images_subdir)
42 |         image_path = os.path.join(images_subdir, "image_%d.jpg" % i)
43 |         img.save(image_path)
44 |
45 |
46 | def get_lfw_images(records_dir):
47 |     lfw_bin = os.path.join(records_dir, "lfw.bin")
48 |     bins, issame_list = pickle.load(open(lfw_bin, "rb"), encoding='bytes')
49 |     lfw_images = os.path.join(records_dir, "lfw_images")
50 |     if not os.path.exists(lfw_images):
51 |         os.makedirs(lfw_images)
52 |
53 |     for i in range(1000):
54 |         bin_ = bins[i]
55 |         img = mx.image.imdecode(bin_).asnumpy()
56 |         img = Image.fromarray(np.uint8(img), "RGB")
57 |         image_path = os.path.join(lfw_images, "image_%d.jpg" % i)
58 |         img.save(image_path)
59 |
60 |
61 | def main(records_dir):
62 |     get_msceleb_images(records_dir)
63 |     # get_lfw_images(records_dir)
64 |
65 |
66 | if __name__ == "__main__":
67 |     main(sys.argv[1])
68 |
-------------------------------------------------------------------------------- /preprocessing/preprocess_fs_mf_official.sh: --------------------------------------------------------------------------------
1 | #!/bin/bash
2 |
3 | # Script to preprocess a set of labelled images and store them in
4 | # sharded TFRecord files. The sharded files are stored under
5 | # the data directory.
6 | # Assumes that the aligned images live inside the data directory.
7 | # The file list is a JSON file holding the relative image paths
8 | # (key "path") and, optionally, their identities (key "id"), as
9 | # expected by build_image_data.py. For example
10 | # {"path": ["a/1.jpg", "a/2.jpg"], "id": ["a", "a"]}
11 | #
12 | #
13 | # usage:
14 | # ./preprocess_fs_mf_official.sh data-dir file-list
15 |
16 | set -e
17 |
18 | if (( $# != 2 )); then
19 |   echo "Usage: preprocess_fs_mf_official.sh [data dir] [file list]"
20 |   exit 1
21 | fi
22 |
23 | # Build the TFRecords version of the FaceScrub / MegaFace data.
24 | echo "Now creating TFRecord files. This might take a while."
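# For reference, a typical invocation might be (paths are illustrative,
# assuming the MegaFace feature list shipped with the challenge data):
#   ./preprocess_fs_mf_official.sh ../megaface_distractors ../megaface_distractors/megaface_features_list.json_1000000_1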
25 | BUILD_SCRIPT="build_image_data.py"
26 |
27 | IMAGES_DIR="${1}/aligned_if/"
28 | OUTPUT_DIR="${1}/tfrecords_official/"
29 | FILELIST="${2}"
30 |
31 | echo "Writing to ${OUTPUT_DIR}"
32 | mkdir -p "${OUTPUT_DIR}"
33 |
34 | export CUDA_VISIBLE_DEVICES=""
35 | python "${BUILD_SCRIPT}" \
36 |   --validation_directory="${IMAGES_DIR}" \
37 |   --output_directory="${OUTPUT_DIR}" \
38 |   --filelist="${FILELIST}" \
39 |   --validation_shards=128
40 |
41 | echo "Done"
42 |
-------------------------------------------------------------------------------- /preprocessing/preprocess_msceleb.sh: --------------------------------------------------------------------------------
1 | #!/bin/bash
2 |
3 | # Script to preprocess a set of labelled images and store them in
4 | # sharded TFRecord files. The sharded files are stored under
5 | # the data directory.
6 | # Assumes that the images are organized into folders according
7 | # to their labels inside the data directory.
8 | # labels.txt should contain a list of the labels. For example
9 | # identity_0
10 | # identity_1
11 | # ...
12 | #
13 | # usage:
14 | # ./preprocess_msceleb.sh data-dir
15 |
16 | set -e
17 |
18 | if (( $# != 1 )); then
19 |   echo "Usage: preprocess_msceleb.sh [data dir]"
20 |   exit 1
21 | fi
22 |
23 | # Build the TFRecords version of the MS-Celeb-1M data.
24 | echo "Now creating TFRecord files. This might take a while."
25 | BUILD_SCRIPT="build_image_data.py"
26 |
27 | IMAGES_DIR="${1}/images"
28 | OUTPUT_DIR="${1}/tfrecords/train"
29 | LABELS_FILE="${1}/labels.txt"
30 |
31 | echo "Writing to ${OUTPUT_DIR}"
32 | mkdir -p "${OUTPUT_DIR}"
33 |
34 | python "${BUILD_SCRIPT}" \
35 |   --train_directory="${IMAGES_DIR}" \
36 |   --output_directory="${OUTPUT_DIR}" \
37 |   --labels_file="${LABELS_FILE}"
38 |
39 | echo "Done"
40 |
-------------------------------------------------------------------------------- /proxy_metric_learning.py: --------------------------------------------------------------------------------
1 | import tensorflow as tf
2 | from mobilenet import mobilenet_v2
3 | import input_pipeline
4 | import pprint
5 | import time
6 | from PIL import Image
7 | import imagenet_preprocessing
8 | import absl
9 | from argparse import Namespace
10 | import sys
11 | import os
12 | import numpy as np
13 | import horovod.tensorflow as hvd
14 | from tensorflow.core.framework.summary_pb2 import Summary
15 | from tensorflow import logging
16 | import mobilefacenet
17 |
18 | from mobilenet.mobilenet import global_pool
19 | from mobilenet import conv_blocks as ops
20 | from inception import inception_v1, inception_v2, inception_v3, inception_v4
21 | from inception import inception_utils
22 |
23 | inception_v1.default_image_size = 112
24 | inception_v2.default_image_size = 112
25 | inception_v3.default_image_size = 112
26 | inception_v4.default_image_size = 112
27 |
28 | slim = tf.contrib.slim
29 | # Define all necessary hyperparameters as flags
30 | flags = absl.flags
31 | FLAGS = flags.FLAGS
32 |
33 | expand_input = ops.expand_input_by_factor
34 |
35 |
36 | # class ProxyInitializerHook(tf.train.SessionRunHook):
37 | #     def begin(self):
38 | #         print("*" * 20, "begin called")
39 | #         with tf.variable_scope('Logits', reuse=True):
40 | #             weights = tf.get_variable(name='proxy_wts')
41 | #         with tf.variable_scope('proxy_init'):
42 | #             weights_placeholder = tf.placeholder(dtype=tf.float32, name='weights_placeholder')
43 | #             self.assign_op = tf.assign(weights, weights_placeholder)
44 |
45 | #     def after_create_session(self, session, coord):
46 | #
global_step = tf.train.get_or_create_global_step() 47 | # step = sess.run([global_step]) 48 | # if step == 0: 49 | 50 | 51 | 52 | def proxy_layer(embedding, num_classes): 53 | with tf.variable_scope('Logits'): 54 | weights = tf.get_variable(name='proxy_wts', 55 | shape=(embedding.shape[-1], num_classes), 56 | initializer=tf.random_normal_initializer(stddev=0.001), 57 | dtype=tf.float32) 58 | weights = tf.nn.l2_normalize(weights, axis=0) 59 | # bias = tf.Variable(tf.zeros([num_classes])) 60 | alpha = 16.0 61 | logits = alpha * tf.matmul(embedding, weights) # + bias 62 | return logits 63 | 64 | 65 | def arc_logits(embedding, out_num): 66 | with tf.variable_scope('Logits'): 67 | weights = tf.get_variable(name='proxy_wts', shape=(embedding.shape[-1], out_num), 68 | initializer=tf.random_normal_initializer(stddev=0.001), 69 | dtype=tf.float32) 70 | weights = tf.nn.l2_normalize(weights, axis=0) 71 | # assign_op = tf.assign(weights, tf.nn.l2_normalize(weights, axis=0)) 72 | # with tf.control_dependencies([assign_op]): 73 | cos_t = tf.matmul(embedding, weights, name='cos_t') 74 | 75 | return cos_t 76 | 77 | 78 | def soft_thresh(lam): 79 | # lam = tf.get_variable( 80 | # "soft_thresh", shape=(), 81 | # initializer=tf.zeros_initializer, 82 | # constraint=lambda x: "hello") 83 | # lam = lam 84 | def activation(x): 85 | ret = tf.nn.relu(x - lam) - tf.nn.relu(-x - lam) 86 | return ret 87 | return activation 88 | 89 | 90 | def metric_learning(features, is_training, num_classes): 91 | # embedding, end_points = inception_v1.inception_v1( 92 | # features, embedding_dim=FLAGS.embedding_size, use_bn=True, is_training=is_training, 93 | # global_pool=True) 94 | embedding, end_points = inception_v4.inception_v4( 95 | features, num_classes=FLAGS.embedding_size, is_training=is_training, 96 | create_aux_logits=False) 97 | if FLAGS.final_activation == 'soft_thresh': 98 | act = soft_thresh(0.5) 99 | elif FLAGS.final_activation == 'relu': 100 | act = tf.nn.relu 101 | elif FLAGS.final_activation == 'none': 102 | act = lambda x: x 103 | else: 104 | raise ValueError('Unsupported activation', FLAGS.final_activation) 105 | embedding = act(embedding) 106 | embedding = tf.nn.l2_normalize(embedding, axis=1) 107 | end_points["embedding"] = embedding 108 | # net = tf.squeeze(net, axis=[1, 2]) 109 | logits = proxy_layer(embedding, num_classes) 110 | end_points["Logits"] = logits 111 | 112 | return logits, end_points 113 | -------------------------------------------------------------------------------- /resnet/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/biswajitsc/sparse-embed/775404d38ce4382251b88168820da3740c1323ea/resnet/__init__.py -------------------------------------------------------------------------------- /run_scripts/cifar/dense_64.sh: -------------------------------------------------------------------------------- 1 | export CUDA_VISIBLE_DEVICES="1" 2 | python train.py \ 3 | --model_dir=checkpoints/cifar/dense_64_run_1 \ 4 | --learning_rate=1e-4 --optimizer=adam \ 5 | --data_dir=../cifar-100 --evaluate=True --num_classes=101 \ 6 | --model=cifar100 --batch_size=512 --embedding_size=64 --final_activation=none \ 7 | --max_steps=20000 --decay_step=10000 8 | -------------------------------------------------------------------------------- /run_scripts/cifar/sparse_64.sh: -------------------------------------------------------------------------------- 1 | # soft thresh lam = 0.1 margin = 0.3 -> 0.5 l1_param decreased 2 | # export 
CUDA_VISIBLE_DEVICES="1" 3 | # python train.py \ 4 | # --model_dir=checkpoints/kubernetes/cifar/sparse_64_run_1 \ 5 | # --learning_rate=1e-3 --optimizer=adam \ 6 | # --data_dir=../cifar-100 --evaluate=True --num_classes=101 \ 7 | # --model=cifar100 --batch_size=512 --embedding_size=64 --final_activation=soft_thresh \ 8 | # --max_steps=25000 --sparsity_type=flops_sur --l1_parameter=0.01 --l1_p_steps=21000 \ 9 | # --decay_step=10000 --l1_weighing_scheme=dynamic_4 10 | 11 | # soft thresh lam = 0.1 margin = 0.5 12 | # export CUDA_VISIBLE_DEVICES="1" 13 | # python train.py \ 14 | # --model_dir=checkpoints/kubernetes/cifar/sparse_64_run_2 \ 15 | # --learning_rate=1e-3 --optimizer=adam \ 16 | # --data_dir=../cifar-100 --evaluate=True --num_classes=101 \ 17 | # --model=cifar100 --batch_size=512 --embedding_size=64 --final_activation=soft_thresh \ 18 | # --max_steps=20000 --sparsity_type=flops_sur --l1_parameter=0.002 --l1_p_steps=21000 \ 19 | # --decay_step=10000 --l1_weighing_scheme=dynamic_4 20 | 21 | # soft_thresh lam = 0.2 margin = 0.3 22 | # export CUDA_VISIBLE_DEVICES="1" 23 | # python train.py \ 24 | # --model_dir=checkpoints/kubernetes/cifar/sparse_64_run_3 \ 25 | # --learning_rate=1e-3 --optimizer=adam \ 26 | # --data_dir=../cifar-100 --evaluate=True --num_classes=101 \ 27 | # --model=cifar100 --batch_size=512 --embedding_size=64 --final_activation=soft_thresh \ 28 | # --max_steps=25000 --decay_step=10000 29 | 30 | # soft_thresh lam = 0.1 margin = 0.3 31 | # export CUDA_VISIBLE_DEVICES="1" 32 | # python train.py \ 33 | # --model_dir=checkpoints/kubernetes/cifar/sparse_64_run_4 \ 34 | # --learning_rate=1e-3 --optimizer=adam \ 35 | # --data_dir=../cifar-100 --evaluate=True --num_classes=101 \ 36 | # --model=cifar100 --batch_size=512 --embedding_size=64 --final_activation=soft_thresh \ 37 | # --max_steps=25000 --decay_step=10000 38 | 39 | 40 | # changing l2 regularization from 0.00004 -> 0.0001 and 0.0005 -> 0.001 41 | # export CUDA_VISIBLE_DEVICES="3" 42 | # python train.py \ 43 | # --model_dir=checkpoints/kubernetes/cifar/sparse_64_run_5 \ 44 | # --learning_rate=1e-3 --optimizer=adam \ 45 | # --data_dir=../cifar-100 --evaluate=True --num_classes=101 \ 46 | # --model=cifar100 --batch_size=512 --embedding_size=64 --final_activation=soft_thresh \ 47 | # --max_steps=25000 --decay_step=10000 48 | 49 | # changing last layer l2 regularization from 0.001 -> 0.005 50 | # export CUDA_VISIBLE_DEVICES="2" 51 | # python train.py \ 52 | # --model_dir=checkpoints/kubernetes/cifar/sparse_64_run_6 \ 53 | # --learning_rate=1e-3 --optimizer=adam \ 54 | # --data_dir=../cifar-100 --evaluate=True --num_classes=101 \ 55 | # --model=cifar100 --batch_size=512 --embedding_size=64 --final_activation=soft_thresh \ 56 | # --max_steps=25000 --decay_step=10000 57 | 58 | # export CUDA_VISIBLE_DEVICES="0" 59 | # python train.py \ 60 | # --model_dir=checkpoints/kubernetes/cifar/sparse_64_run_7 \ 61 | # --learning_rate=1e-3 --optimizer=adam \ 62 | # --data_dir=../cifar-100 --evaluate=True --num_classes=101 \ 63 | # --model=cifar100 --batch_size=512 --embedding_size=64 --final_activation=soft_thresh \ 64 | # --max_steps=30000 --decay_step=10000 --l1_weighing_scheme=dynamic_4 \ 65 | # --l1_parameter=0.002 --l1_p_steps=12000 66 | 67 | # ****************************** 68 | # Changing margin from 0.3 to 0.2 69 | # export CUDA_VISIBLE_DEVICES="2" 70 | # python train.py \ 71 | # --model_dir=checkpoints/kubernetes/cifar/sparse_64_run_8 \ 72 | # --learning_rate=1e-3 --optimizer=adam \ 73 | # --data_dir=../cifar-100 --evaluate=True 
--num_classes=101 \ 74 | # --model=cifar100 --batch_size=512 --embedding_size=64 --final_activation=soft_thresh \ 75 | # --max_steps=40000 --decay_step=10000 --l1_weighing_scheme=dynamic_4 \ 76 | # --l1_parameter=0.01 --l1_p_steps=12000 77 | 78 | # export CUDA_VISIBLE_DEVICES="0" 79 | # python train.py \ 80 | # --model_dir=checkpoints/kubernetes/cifar/sparse_64_run_9 \ 81 | # --learning_rate=1e-3 --optimizer=adam \ 82 | # --data_dir=../cifar-100 --evaluate=True --num_classes=101 \ 83 | # --model=cifar100 --batch_size=512 --embedding_size=64 --final_activation=soft_thresh \ 84 | # --max_steps=30000 --decay_step=6000 --l1_weighing_scheme=dynamic_4 \ 85 | # --l1_parameter=0.02 --l1_p_steps=20000 86 | 87 | # export CUDA_VISIBLE_DEVICES="0" 88 | # python train.py \ 89 | # --model_dir=checkpoints/kubernetes/cifar/sparse_64_run_10 \ 90 | # --learning_rate=1e-3 --optimizer=adam \ 91 | # --data_dir=../cifar-100 --evaluate=True --num_classes=101 \ 92 | # --model=cifar100 --batch_size=512 --embedding_size=64 --final_activation=soft_thresh \ 93 | # --max_steps=50000 --decay_step=10000 --l1_weighing_scheme=dynamic_4 \ 94 | # --l1_parameter=0.001 --l1_p_steps=12000 95 | 96 | # decreasing final regularization to 0.001 and margin to 0.1 97 | # export CUDA_VISIBLE_DEVICES="1" 98 | # python train.py \ 99 | # --model_dir=checkpoints/kubernetes/cifar/sparse_64_run_11 \ 100 | # --learning_rate=1e-3 --optimizer=adam \ 101 | # --data_dir=../cifar-100 --evaluate=True --num_classes=101 \ 102 | # --model=cifar100 --batch_size=512 --embedding_size=64 --final_activation=soft_thresh \ 103 | # --max_steps=50000 --decay_step=15000 --l1_weighing_scheme=dynamic_4 \ 104 | # --l1_parameter=0.005 --l1_p_steps=15000 105 | 106 | # export CUDA_VISIBLE_DEVICES="0" 107 | # python train.py \ 108 | # --model_dir=checkpoints/kubernetes/cifar/sparse_64_run_12 \ 109 | # --learning_rate=1e-3 --optimizer=adam \ 110 | # --data_dir=../cifar-100 --evaluate=True --num_classes=101 \ 111 | # --model=cifar100 --batch_size=512 --embedding_size=64 --final_activation=soft_thresh \ 112 | # --max_steps=50000 --decay_step=15000 --l1_weighing_scheme=dynamic_4 \ 113 | # --l1_parameter=0.0 --l1_p_steps=15000 114 | 115 | # export CUDA_VISIBLE_DEVICES="0" 116 | # python train.py \ 117 | # --model_dir=checkpoints/kubernetes/cifar/sparse_64_run_13 \ 118 | # --learning_rate=1e-3 --optimizer=adam \ 119 | # --data_dir=../cifar-100 --evaluate=True --num_classes=101 \ 120 | # --model=cifar100 --batch_size=512 --embedding_size=64 --final_activation=soft_thresh \ 121 | # --max_steps=50000 --decay_step=15000 --l1_weighing_scheme=dynamic_4 \ 122 | # --l1_parameter=0.001 --l1_p_steps=15000 --sparsity_type=flops_sur 123 | 124 | export CUDA_VISIBLE_DEVICES="3" 125 | python train.py \ 126 | --model_dir=checkpoints/kubernetes/cifar/sparse_64_run_14 \ 127 | --learning_rate=1e-3 --optimizer=adam \ 128 | --data_dir=../cifar-100 --evaluate=True --num_classes=101 \ 129 | --model=cifar100 --batch_size=512 --embedding_size=64 --final_activation=soft_thresh \ 130 | --max_steps=50000 --decay_step=15000 --l1_weighing_scheme=dynamic_4 \ 131 | --l1_parameter=0.001 --l1_p_steps=15000 --sparsity_type=l1_norm 132 | -------------------------------------------------------------------------------- /run_scripts/face/dense_mobilenet_128.sh: -------------------------------------------------------------------------------- 1 | export CUDA_VISIBLE_DEVICES="0,1,2,3" 2 | 3 | horovodrun -np 4 -H localhost:4 \ 4 | python train.py \ 5 | --model_dir=checkpoints/msceleb/dense_mobilenet_128_run_1 
--max_steps=180000 \ 6 | --learning_rate=1e-2 --decay_step=150000 --optimizer=mom --momentum=0.9 \ 7 | --data_dir=../msceleb1m/tfrecords/train --evaluate=True \ 8 | --num_classes=85743 --batch_size=256 --model=mobilefacenet \ 9 | --embedding_size=128 --evaluate=True --final_activation=none; 10 | -------------------------------------------------------------------------------- /run_scripts/face/dense_mobilenet_512.sh: -------------------------------------------------------------------------------- 1 | export CUDA_VISIBLE_DEVICES="0,1,2,3" 2 | 3 | horovodrun -np 4 -H localhost:4 \ 4 | python train.py \ 5 | --model_dir=checkpoints/msceleb/dense_mobilenet_512_run_1 --max_steps=180000 \ 6 | --learning_rate=1e-2 --decay_step=150000 --optimizer=mom --momentum=0.9 \ 7 | --data_dir=../msceleb1m/tfrecords/train --evaluate=True \ 8 | --num_classes=85743 --batch_size=256 --model=mobilefacenet \ 9 | --embedding_size=512 --evaluate=True --final_activation=none; 10 | -------------------------------------------------------------------------------- /run_scripts/face/dense_resnet_128.sh: -------------------------------------------------------------------------------- 1 | export CUDA_VISIBLE_DEVICES="0,1,2,3" 2 | 3 | horovodrun -np 4 -H localhost:4 \ 4 | python train.py \ 5 | --model_dir=checkpoints/msceleb/dense_resnet_128_run_1 --max_steps=180000 \ 6 | --learning_rate=1e-2 --decay_step=150000 --optimizer=mom --momentum=0.9 \ 7 | --data_dir=../msceleb1m/tfrecords/train --evaluate=True \ 8 | --num_classes=85743 --batch_size=64 --model=faceresnet \ 9 | --embedding_size=128 --evaluate=True --final_activation=none; 10 | -------------------------------------------------------------------------------- /run_scripts/face/dense_resnet_512.sh: -------------------------------------------------------------------------------- 1 | export CUDA_VISIBLE_DEVICES="0,1,2,3" 2 | 3 | horovodrun -np 4 -H localhost:4 \ 4 | python train.py \ 5 | --model_dir=checkpoints/msceleb/dense_resnet_512_run_1 --max_steps=180000 \ 6 | --learning_rate=1e-2 --decay_step=150000 --optimizer=mom --momentum=0.9 \ 7 | --data_dir=../msceleb1m/tfrecords/train --evaluate=True \ 8 | --num_classes=85743 --batch_size=64 --model=faceresnet \ 9 | --embedding_size=512 --evaluate=True --final_activation=none; 10 | -------------------------------------------------------------------------------- /run_scripts/face/sparse_mobilenet_fl.sh: -------------------------------------------------------------------------------- 1 | export CUDA_VISIBLE_DEVICES="0,1,2,3" 2 | 3 | horovodrun -np 4 -H localhost:4 \ 4 | python train.py \ 5 | --model_dir=checkpoints/msceleb/sparse_mobilenet_fl_run_1 \ 6 | --learning_rate=1e-4 --optimizer=mom --momentum=0.9 \ 7 | --data_dir=../msceleb1m/tfrecords/train --evaluate=True \ 8 | --num_classes=85743 --batch_size=256 --model=mobilefacenet \ 9 | --embedding_size=1024 --sparsity_type=flops_sur \ 10 | --l1_weighing_scheme=dynamic_4 --l1_parameter=100.0 \ 11 | --evaluate=True --final_activation=relu; 12 | -------------------------------------------------------------------------------- /run_scripts/face/sparse_mobilenet_l1.sh: -------------------------------------------------------------------------------- 1 | export CUDA_VISIBLE_DEVICES="1,2,3,4" 2 | 3 | horovodrun -np 4 -H localhost:4 \ 4 | python train.py \ 5 | --model_dir=checkpoints/msceleb/sparse_mobilenet_l1_run_1 \ 6 | --learning_rate=1e-3 --optimizer=mom --momentum=0.9 \ 7 | --l1_weighing_scheme=dynamic_4 --l1_parameter=1.0 \ 8 | --data_dir=../msceleb1m/tfrecords/train --evaluate=True 
\ 9 | --num_classes=85743 --batch_size=256 --model=mobilefacenet \ 10 | --embedding_size=1024 --sparsity_type=l1_norm \ 11 | --evaluate=True --final_activation=relu; 12 | -------------------------------------------------------------------------------- /run_scripts/face/sparse_resnet_fl.sh: -------------------------------------------------------------------------------- 1 | export CUDA_VISIBLE_DEVICES="0,1,2,3" 2 | 3 | horovodrun -np 4 -H localhost:4 \ 4 | python train.py \ 5 | --model_dir=checkpoints/msceleb/sparse_resnet_dynamic_4_fl_run_1 \ 6 | --learning_rate=1e-3 --decay_step=160000 --optimizer=mom --momentum=0.9 \ 7 | --data_dir=../msceleb1m/tfrecords/train --evaluate=True \ 8 | --num_classes=85743 --batch_size=64 --model=faceresnet \ 9 | --embedding_size=1024 --sparsity_type=flops_sur --max_steps=230000 \ 10 | --l1_weighing_scheme=dynamic_4 --l1_parameter=100.0 -l1_p_steps=100000 \ 11 | --evaluate=True --final_activation=soft_thresh; 12 | -------------------------------------------------------------------------------- /run_scripts/face/sparse_resnet_l1.sh: -------------------------------------------------------------------------------- 1 | export CUDA_VISIBLE_DEVICES="0,1,2,3" 2 | 3 | horovodrun -np 4 -H localhost:4 \ 4 | python train.py \ 5 | --model_dir=checkpoints/msceleb/sparse_resnet_flops_run_1 \ 6 | --learning_rate=1e-3 --decay_step=160000 --optimizer=mom --momentum=0.9 \ 7 | --data_dir=../msceleb1m/tfrecords/train --evaluate=True \ 8 | --num_classes=85743 --batch_size=64 --model=faceresnet \ 9 | --embedding_size=1024 --sparsity_type=l1_norm --max_steps=230000 \ 10 | --l1_weighing_scheme=dynamic_4 --l1_parameter=3.0 -l1_p_steps=100000 \ 11 | --evaluate=True --final_activation=soft_thresh; 12 | -------------------------------------------------------------------------------- /run_scripts/get_embeddings/face_mobilenet_reranking.sh: -------------------------------------------------------------------------------- 1 | export CUDA_VISIBLE_DEVICES="1" 2 | set -e 3 | 4 | python get_embeddings_reranking.py --num_classes=85743 --embedding_size=128 \ 5 | --model_dir=checkpoints/msceleb/dense_mobilenet_128_run_1 \ 6 | --data_dir=../facescrub/tfrecords_official/ --is_training=False \ 7 | --final_activation=none --model=mobilefacenet --debug=False \ 8 | --filelist=../megaface_distractors/facescrub_features_list.json \ 9 | --prefix=facescrub 10 | 11 | python get_embeddings_reranking.py --num_classes=85743 --embedding_size=128 \ 12 | --model_dir=checkpoints/msceleb/dense_mobilenet_128_run_1 \ 13 | --data_dir=../megaface_distractors/tfrecords_official/ --is_training=False \ 14 | --final_activation=none --model=mobilefacenet --debug=False \ 15 | --filelist=../megaface_distractors/megaface_features_list.json_1000000_1 \ 16 | --prefix=megaface 17 | 18 | python get_embeddings_reranking.py --num_classes=85743 --embedding_size=512 \ 19 | --model_dir=checkpoints/msceleb/dense_mobilenet_512_run_1 \ 20 | --data_dir=../facescrub/tfrecords_official/ --is_training=False \ 21 | --final_activation=none --model=mobilefacenet --debug=False \ 22 | --filelist=../megaface_distractors/facescrub_features_list.json \ 23 | --prefix=facescrub 24 | 25 | python get_embeddings_reranking.py --num_classes=85743 --embedding_size=512 \ 26 | --model_dir=checkpoints/msceleb/dense_mobilenet_512_run_1 \ 27 | --data_dir=../megaface_distractors/tfrecords_official/ --is_training=False \ 28 | --final_activation=none --model=mobilefacenet --debug=False \ 29 | --filelist=../megaface_distractors/megaface_features_list.json_1000000_1 \ 
30 | --prefix=megaface 31 | 32 | 33 | for i in {3..8} 34 | do 35 | python get_embeddings_reranking.py --num_classes=85743 --embedding_size=1024 \ 36 | --model_dir=checkpoints/msceleb/sparse_mobilenet_fl_run_$i \ 37 | --data_dir=../facescrub/tfrecords_official/ --is_training=False \ 38 | --final_activation=relu --model=mobilefacenet --debug=False \ 39 | --filelist=../megaface_distractors/facescrub_features_list.json \ 40 | --prefix=facescrub 41 | done 42 | 43 | for i in {1..6} 44 | do 45 | python get_embeddings_reranking.py --num_classes=85743 --embedding_size=1024 \ 46 | --model_dir=checkpoints/msceleb/sparse_mobilenet_l1_run_$i \ 47 | --data_dir=../facescrub/tfrecords_official/ --is_training=False \ 48 | --final_activation=relu --model=mobilefacenet --debug=False \ 49 | --filelist=../megaface_distractors/facescrub_features_list.json \ 50 | --prefix=facescrub 51 | done 52 | 53 | for i in {3..8} 54 | do 55 | python get_embeddings_reranking.py --num_classes=85743 --embedding_size=1024 \ 56 | --model_dir=checkpoints/msceleb/sparse_mobilenet_fl_run_$i \ 57 | --data_dir=../megaface_distractors/tfrecords_official/ --is_training=False \ 58 | --final_activation=relu --model=mobilefacenet --debug=False \ 59 | --filelist=../megaface_distractors/megaface_features_list.json_1000000_1 \ 60 | --prefix=megaface 61 | done 62 | 63 | for i in {1..6} 64 | do 65 | python get_embeddings_reranking.py --num_classes=85743 --embedding_size=1024 \ 66 | --model_dir=checkpoints/msceleb/sparse_mobilenet_l1_run_$i \ 67 | --data_dir=../megaface_distractors/tfrecords_official/ --is_training=False \ 68 | --final_activation=relu --model=mobilefacenet --debug=False \ 69 | --filelist=../megaface_distractors/megaface_features_list.json_1000000_1 \ 70 | --prefix=megaface 71 | done -------------------------------------------------------------------------------- /run_scripts/get_embeddings/face_resnet_reranking.sh: -------------------------------------------------------------------------------- 1 | export CUDA_VISIBLE_DEVICES="2" 2 | set -e 3 | 4 | python get_embeddings_reranking.py --num_classes=85743 --embedding_size=512 \ 5 | --model_dir=checkpoints/msceleb/dense_resnet_512_run_1 \ 6 | --data_dir=../facescrub/tfrecords_official/ --is_training=False \ 7 | --final_activation=none --model=faceresnet --debug=False \ 8 | --filelist=../megaface_distractors/facescrub_features_list.json \ 9 | --prefix=facescrub 10 | 11 | python get_embeddings_reranking.py --num_classes=85743 --embedding_size=512 \ 12 | --model_dir=checkpoints/msceleb/dense_resnet_512_run_1 \ 13 | --data_dir=../megaface_distractors/tfrecords_official/ --is_training=False \ 14 | --final_activation=none --model=faceresnet --debug=False \ 15 | --filelist=../megaface_distractors/megaface_features_list.json_1000000_1 \ 16 | --prefix=megaface 17 | 18 | 19 | for i in {1..4} 20 | do 21 | python get_embeddings_reranking.py --num_classes=85743 --embedding_size=1024 \ 22 | --model_dir=checkpoints/msceleb/sparse_resnet_fl_run_$i \ 23 | --data_dir=../facescrub/tfrecords_official/ --is_training=False \ 24 | --final_activation=soft_thresh --model=faceresnet --debug=False \ 25 | --filelist=../megaface_distractors/facescrub_features_list.json \ 26 | --prefix=facescrub 27 | done 28 | 29 | for i in {1..4} 30 | do 31 | python get_embeddings_reranking.py --num_classes=85743 --embedding_size=1024 \ 32 | --model_dir=checkpoints/msceleb/sparse_resnet_l1_run_$i \ 33 | --data_dir=../facescrub/tfrecords_official/ --is_training=False \ 34 | --final_activation=soft_thresh --model=faceresnet 
--debug=False \
35 |     --filelist=../megaface_distractors/facescrub_features_list.json \
36 |     --prefix=facescrub
37 | done
38 |
39 | for i in {1..4}
40 | do
41 | python get_embeddings_reranking.py --num_classes=85743 --embedding_size=1024 \
42 |     --model_dir=checkpoints/msceleb/sparse_resnet_fl_run_$i \
43 |     --data_dir=../megaface_distractors/tfrecords_official/ --is_training=False \
44 |     --final_activation=soft_thresh --model=faceresnet --debug=False \
45 |     --filelist=../megaface_distractors/megaface_features_list.json_1000000_1 \
46 |     --prefix=megaface
47 | done
48 |
49 | for i in {1..4}
50 | do
51 | python get_embeddings_reranking.py --num_classes=85743 --embedding_size=1024 \
52 |     --model_dir=checkpoints/msceleb/sparse_resnet_l1_run_$i \
53 |     --data_dir=../megaface_distractors/tfrecords_official/ --is_training=False \
54 |     --final_activation=soft_thresh --model=faceresnet --debug=False \
55 |     --filelist=../megaface_distractors/megaface_features_list.json_1000000_1 \
56 |     --prefix=megaface
57 | done
-------------------------------------------------------------------------------- /search/Makefile: --------------------------------------------------------------------------------
1 | all: clean compile_search
2 |
3 | compile_search:
4 | 	g++ -O3 -fopenmp -std=c++11 search.cpp -o search
5 |
6 | runmegaface:
7 | 	./search ../face/megaface_sparse_1024.hdf5.bin 1000 0.25
8 |
9 | runfacescrub:
10 | 	./search ../face/megaface_sparse_1024.hdf5.bin 1000 0.25 ../face/facescrub_sparse_1024.hdf5.bin
11 |
12 | clean:
13 | 	rm -f ./search
-------------------------------------------------------------------------------- /search/hdf5_to_bin.py: --------------------------------------------------------------------------------
1 | import numpy as np
2 | import h5py
3 | import sys
4 | from tqdm import tqdm
5 |
6 | try:
7 |     embed_fpath = sys.argv[1]
8 |     output_dir = sys.argv[2]
9 |     print("Reading from", embed_fpath)
10 | except IndexError:
11 |     print('Usage: python3 hdf5_to_bin.py [embed_fpath] [output_dir]')
12 |     exit(1)
13 |
14 | embedfname = embed_fpath.split('/')[-1]
15 | idx = embedfname.rfind('.')
16 | embedfname = embedfname[:idx]
17 |
18 | print('Writing to', embedfname + '.bin')
19 | #with h5py.File(embed_fpath, 'r') as fp, \
20 | #        open(output_dir+'/'+embedfname+'.bin','wb') as fpw, open(output_dir+'/'+embedfname+'.filemap', 'w') as fpw2:
21 |
22 | with h5py.File(embed_fpath, 'r') as fp, open(output_dir+'/'+embedfname+'.bin','wb') as fpw:
23 |
24 |     N = fp['embedding'].shape[0]
25 |     dim = fp['embedding'].shape[1]
26 |     batch_size = 1000
27 |
28 |     # Number of instances
29 |     #print( 'total number of instances={}'.format(N) )
30 |     fpw.write( np.int32(N) )
31 |
32 |     for i in tqdm(range(N)):  # iterate over the instances
33 |         embedding = fp['embedding'][i]
34 |         label = fp['label'][i]
35 |
36 |         nnz_ind = np.where(embedding > 1e-5)[0]
37 |         nnz_vals = embedding[nnz_ind]
38 |
39 |         # ind_batch = fp[KEY_IND][b:b+batch_size]
40 |         # val_batch = fp[KEY_VAL][b:b+batch_size]
41 |         # lab_batch = fp[KEY_LABEL][b:b+batch_size]
42 |         #txt_batch = fp[KEY_TEXT][b:b+batch_size]
43 |         #fname_batch = fp[KEY_FNAME][b:b+batch_size]
44 |
45 |         #print('processing {}...'.format(b))
46 |         fpw.write(np.int32(label))  #label
47 |         #fpw.write(np.int32(-1))  #label
48 |         nnz = np.int32(len(nnz_ind))
49 |         fpw.write(nnz)  #number of nonzero elements in the instance
50 |         fpw.write( np.int32(nnz_ind).tobytes() )  #indices
51 |         fpw.write( np.float32(nnz_vals).tobytes() )  #values
52 |
53 |
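For reference, the binary layout written above is: one int32 instance count, then per instance an int32 label, an int32 nonzero count, the int32 indices, and the float32 values. Below is a minimal sketch of a reader that can sanity-check a converted file; it is not part of the repo, and the function name read_bin is ours.

import numpy as np
import sys

def read_bin(path, limit=5):
    with open(path, 'rb') as f:
        # Header: total number of instances.
        n = int(np.frombuffer(f.read(4), dtype=np.int32)[0])
        print('instances:', n)
        # Decode the first few records: label, nnz, indices, values.
        for _ in range(min(n, limit)):
            label = int(np.frombuffer(f.read(4), dtype=np.int32)[0])
            nnz = int(np.frombuffer(f.read(4), dtype=np.int32)[0])
            idx = np.frombuffer(f.read(4 * nnz), dtype=np.int32)
            vals = np.frombuffer(f.read(4 * nnz), dtype=np.float32)
            print(label, nnz, idx[:5], vals[:5])

if __name__ == '__main__':
    read_bin(sys.argv[1])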
-------------------------------------------------------------------------------- /search/mobilefacenet_to_bin.sh: --------------------------------------------------------------------------------
1 | set -e
2 |
3 | for i in {1..6};
4 | do
5 |     python hdf5_to_bin.py ../embeddings_reranking/facescrub_sparse_mobilenet_dynamic_4_l1_run_${i}.hdf5 ../embeddings_bin
6 |     python hdf5_to_bin.py ../embeddings_reranking/megaface_sparse_mobilenet_dynamic_4_l1_run_${i}.hdf5 ../embeddings_bin
7 | done
8 |
9 | for i in {3..8};
10 | do
11 |     python hdf5_to_bin.py ../embeddings_reranking/facescrub_sparse_mobilenet_dynamic_4_fl_run_${i}.hdf5 ../embeddings_bin
12 |     python hdf5_to_bin.py ../embeddings_reranking/megaface_sparse_mobilenet_dynamic_4_fl_run_${i}.hdf5 ../embeddings_bin
13 | done
-------------------------------------------------------------------------------- /search/resnet_to_bin.sh: --------------------------------------------------------------------------------
1 | set -e
2 |
3 | for i in {1..4};
4 | do
5 |     python hdf5_to_bin.py ../embeddings_reranking/facescrub_sparse_resnet_dynamic_4_l1_run_${i}.hdf5 ../embeddings_bin
6 |     python hdf5_to_bin.py ../embeddings_reranking/megaface_sparse_resnet_dynamic_4_l1_run_${i}.hdf5 ../embeddings_bin
7 | done
8 |
9 | for i in {1..4};
10 | do
11 |     python hdf5_to_bin.py ../embeddings_reranking/facescrub_sparse_resnet_dynamic_4_fl_run_${i}.hdf5 ../embeddings_bin
12 |     python hdf5_to_bin.py ../embeddings_reranking/megaface_sparse_resnet_dynamic_4_fl_run_${i}.hdf5 ../embeddings_bin
13 | done
-------------------------------------------------------------------------------- /search/search.cpp: --------------------------------------------------------------------------------
1 | #include <iostream>
2 | #include <fstream>
3 | #include <cstdio>
4 | #include <cstdlib>
5 | #include <cstring>
6 | #include <cmath>
7 | #include <vector>
8 | #include <algorithm>
9 | #include <utility>
10 | #include <omp.h>
11 |
12 | using namespace std;
13 |
14 | typedef vector<float> Vec;
15 | typedef vector<pair<int, float> > SparseVec;
16 | typedef int Label;
17 |
18 | typedef pair<Label, SparseVec>* Instance;
19 | typedef vector<SparseVec> InvertedIndex;
20 |
21 | const int QUERY_SIZE = 10000;
22 | const int NUM_INS_PER_PRINT = 100000;
23 |
24 | int TOP_R = 1;
25 | float CONF_THD = 0.25f;
26 |
27 |
28 | // NOTE: template arguments and a few statements in this file did not survive
29 | // extraction; lines marked (reconstructed) are inferred from the surrounding code.
30 | void read_embed_data(char* embed_fpath, vector<Instance>& embed_data, int& dim, int& num_class, int limit_N = 10000000){
31 |
32 |     ifstream fin(embed_fpath, ios::in | ios::binary);
33 |     int N;
34 |     fin.read( (char*) &N, sizeof(int) );
35 |     N = min( N, limit_N );
36 |
37 |     cerr << "Reading " << N << " instances..." << endl;
38 |
39 |     int label_offset = num_class;
40 |     embed_data.resize(N);
41 |     for(int i=0;i<N;i++){
42 |         embed_data[i] = new pair<Label, SparseVec>();  // (reconstructed)
43 |         int lab, nnz;
44 |         fin.read((char*) &lab, sizeof(int));
45 |         lab += label_offset;  // (reconstructed) offset labels of a second file
46 |         embed_data[i]->first = lab;
47 |         if(lab+1 > num_class) num_class = lab+1;
48 |         fin.read((char*) &nnz, sizeof(int));
49 |         SparseVec& embed = embed_data[i]->second;
50 |         embed.resize(nnz);
51 |         for(int j=0;j<nnz;j++){
52 |             fin.read((char*) &(embed[j].first), sizeof(int));
53 |             if(embed[j].first+1 > dim)
54 |                 dim = embed[j].first+1;
55 |         }
56 |         for(int j=0;j<nnz;j++)
57 |             fin.read((char*) &(embed[j].second), sizeof(float));
58 |     }
59 |     fin.close();
60 | }
61 |
62 |
63 | float inner_prod(SparseVec& a, SparseVec& b){
64 |     float sum = 0.0f;
65 |     SparseVec::iterator it1 = a.begin();
66 |     SparseVec::iterator it2 = b.begin();
67 |     while( it1 != a.end() && it2 != b.end() ){
68 |         if( it1->first < it2->first )
69 |             it1++;
70 |         else if( it1->first > it2->first )
71 |             it2++;
72 |         else{
73 |             sum += it1->second * it2->second;
74 |             it1++;
75 |             it2++;
76 |         }
77 |     }
78 |     return sum;
79 | }
80 |
81 |
82 | void split_into_query_db(vector<Instance>& data, int query_size,
83 |         vector<Instance>& query_list, vector<Instance>& db_list){
84 |     //Assume the data is already shuffled (the TFRecords and embeddings were
85 |     //written in an order shuffled with a fixed seed; see build_image_data.py).
86 |     // random_shuffle(data.begin(), data.end());
87 |     query_list.clear();
88 |     for(int i=0;i<query_size;i++)  // (reconstructed)
89 |         query_list.push_back(data[i]);
90 |     db_list.clear();
91 |     for(int i=query_size;i<(int)data.size();i++)
92 |         db_list.push_back(data[i]);
93 | }
94 |
95 | bool sortbysec(const pair<int,float> &a, const pair<int,float> &b){
96 |     return (a.second > b.second);
97 | }
98 |
99 |
100 | int topk = 1000;
101 |
102 | class SearchEngine {
103 | public:
104 |     SearchEngine(vector<Instance>& data, int _dim){
105 |         N = data.size();
106 |         db = data;
107 |         dim = _dim;
108 |         build_index(db, dim);
109 |         score_vec.resize(N);
110 |         rank_list.resize(N);
111 |         dense_query.resize(dim);
112 |     }
113 |
114 |     void build_index(vector<Instance>& data, int dim){
115 |         int N = data.size();
116 |         long nnz = 0, nnz_thd = 0;
117 |         db_index.resize(dim);
118 |         for(int i=0;i<N;i++){
119 |             SparseVec& embed = data[i]->second;
120 |             for(auto& p: embed){
121 |                 db_index[p.first].push_back(make_pair(i,p.second));
122 |                 nnz++;
123 |             }
124 |         }
125 |         cerr << "Index built. NNZ% = " << ((float)nnz/N/dim*100.0) << endl;
126 |
127 |         // Dense copy of the database, used to re-score the shortlist.
128 |         mata.resize(N, vector<float>(dim));
129 |         for (size_t i=0; i<(size_t)N; i++)  // (reconstructed)
130 |             for (auto& p: data[i]->second)
131 |                 mata[i][p.first] = p.second;
132 |     }
133 |
134 |     // (reconstructed) Score the database through the inverted index, keep
135 |     // candidates above CONF_THD, shortlist the topk, then re-score the
136 |     // shortlist with dense dot products and select the TOP_R best.
137 |     int search(SparseVec& query){
138 |         fill(score_vec.begin(), score_vec.end(), 0.0f);
139 |         for(auto& q: query)
140 |             for(auto& p: db_index[q.first])
141 |                 score_vec[p.first] += q.second * p.second;
142 |
143 |         int rank_list_size = 0;
144 |         for (size_t i=0; i<(size_t)N; i++){
145 |             if(score_vec[i] > CONF_THD)
146 |                 rank_list[rank_list_size++] = make_pair((int)i,score_vec[i]);
147 |         }
148 |         if(rank_list_size >= topk)
149 |             nth_element(rank_list.begin(), rank_list.begin()+topk-1, rank_list.begin() + rank_list_size, sortbysec);
150 |         int max_iter = min(rank_list_size, topk);
151 |
152 |         for (size_t j=0; j<(size_t)max_iter; j++){  // (reconstructed)
153 |             int id = rank_list[j].first;
154 |             float score = 0.0f;
155 |             for(auto& q: query)
156 |                 score += q.second * mata[id][q.first];
157 |             rank_list[j].second = score;
158 |         }
159 |         if(max_iter > TOP_R)
160 |             nth_element(rank_list.begin(), rank_list.begin()+TOP_R-1, rank_list.begin()+max_iter-1, sortbysec);
161 |         return max_iter;
162 |     }
163 |
164 |
165 | private:
166 |     int N;
167 |     int dim;
168 |     vector<Instance> db;
169 |     InvertedIndex db_index;
170 |     vector<float> score_vec;
171 |     vector<vector<float> > mata;
172 |     vector< pair<int,float> > rank_list;
173 |     Vec dense_query;
174 | };
175 |
176 |
177 | int main(int argc, char** argv){
178 |     if( argc-1 < 3 ){
179 |         cerr << "Command line arguments error" << endl;
180 |         cerr << "Usage: ./search [embed_fpath] [topk] [conf_thd] [queryset_fpath (optional)]" << endl;
181 |         exit(1);
182 |     }
183 |     srand(1);
184 |     char* embed_fpath = argv[1];
185 |     topk = atoi(argv[2]);
186 |     CONF_THD = atof(argv[3]);
187 |     cerr << "Confidence thr " << CONF_THD << endl;
188 |     TOP_R = 1;
189 |     char* queryset_fpath = NULL;
190 |     if( argc-1 > 3 ){
191 |         queryset_fpath = argv[4];
192 |     }
193 |
194 |     //Read Embeddings
195 |     vector<Instance> embed_data;
196 |     int dim=-1, num_class=0;
197 |     read_embed_data(embed_fpath, embed_data, dim, num_class);
198 |     cerr << "N=" << embed_data.size() << ", D=" << dim << ", #class=" << num_class << endl;
199 |     int num_nonquery_class = num_class;
200 |
201 |     vector<Instance> queryset_data;
202 |     if(queryset_fpath!=NULL){
203 |         read_embed_data(queryset_fpath, queryset_data, dim, num_class);
204 |         cerr << "Nq=" << queryset_data.size() << ", D=" << dim << ", #class=" << num_class << endl;
205 |     }
206 |
207 |     //Split into Query and DB
208 |     vector<Instance> query_list, db_list;
209 |     if(queryset_fpath==NULL) {
210 |         // (reconstructed) hold out the first QUERY_SIZE instances as queries
211 |         int i;
212 |         for(i=0; i<QUERY_SIZE && i<(int)embed_data.size(); i++)
213 |             query_list.push_back(embed_data[i]);
214 |         for(; i<(int)embed_data.size(); i++)
215 |             db_list.push_back(embed_data[i]);
216 |     } else {
217 |         query_list = queryset_data;
218 |         db_list = embed_data;
219 |     }
220 |
221 |     SearchEngine* engine = new SearchEngine(db_list, dim);  // (reconstructed)
222 |
223 |     //count the number of instances for each class
224 |     vector<int> num_ins(num_class, 0);
225 |     for(int i=0;i<(int)db_list.size();i++){
226 |         num_ins[db_list[i]->first]++;
227 |     }
228 |
229 |     //Search
230 |     double start = omp_get_wtime();
231 |     int Nt = query_list.size();
232 |
233 |     int empty_count=0;
234 |     int max_rank_len = -1;
235 |     long cum_rank_len = 0;
236 |     for(int i=0;i<Nt;i++){
237 |         Instance q = query_list[i];  // (reconstructed)
238 |         Label lab = q->first;
239 |         SparseVec& embed = q->second;
240 |         int rank_list_size = engine->search(embed);
241 |         cum_rank_len += rank_list_size;
242 |         max_rank_len = max(max_rank_len, rank_list_size);
243 |     }
244 |     double end = omp_get_wtime();
245 |
246 |     float avg_rank_len = (float)cum_rank_len / Nt;
247 |     cerr << "Max reranking list length = " << max_rank_len << endl;
248 |     cerr << "Average reranking list length = " << avg_rank_len << endl;
249 |     cerr << "Total search time = " << end-start << "s" << endl;
250 |     cerr << "Search Time per Query = " << 1e3*(end-start)/Nt << "ms" << endl;
251 |
252 |     return 0;
253 | }
-------------------------------------------------------------------------------- /train.py: --------------------------------------------------------------------------------
1 | import tensorflow as tf
2 | import model
3 | import pprint
4 | import time
5 | import datetime
6 | from PIL import Image
7 | import absl
8 | import sys
9 | import os
10 | import numpy as np
11 | import horovod.tensorflow as hvd
12 | from tensorflow.core.framework.summary_pb2 import Summary
13 | from tensorflow import logging
14 | import resource
15 | import flags
16 | import evaluate_retrieval
17 | import evaluate_mfacenet
18 | import traceback
19 | from cifar import cifar
20 |
21 |
22 | # Define all necessary hyperparameters as flags
23 | FLAGS = absl.flags.FLAGS
24 | IMAGENET_NUM_TRAINING = 1281167
25 |
26 |
27 | class ImageRateHook(tf.train.StepCounterHook):
28 |     '''
29 |     A simple extension of StepCounterHook to count images/sec.
30 |     Overriding the _log_and_record function is a simple hack to achieve this.
31 |     '''
32 |     def _log_and_record(self, elapsed_steps, elapsed_time, global_step):
33 |         images_per_sec = elapsed_steps / elapsed_time * FLAGS.batch_size * hvd.size()
34 |         summary_tag = 'images/sec'
35 |         if self._summary_writer is not None:
36 |             summary = Summary(value=[Summary.Value(
37 |                 tag=summary_tag, simple_value=images_per_sec)])
38 |             self._summary_writer.add_summary(summary, global_step)
39 |         logging.info("%s: %g", summary_tag, images_per_sec)
40 |
41 |
42 | def main(_):
43 |     hvd.init()
44 |
45 |     # Only see a single unique GPU, based on process rank
46 |     config = tf.ConfigProto()
47 |
48 |     # We don't need allow_growth=True as we will use a whole GPU for each process.
49 |     # config.gpu_options.allow_growth = True
50 |     config.gpu_options.visible_device_list = str(hvd.local_rank())
51 |
52 |     # Only one of the workers saves checkpoints and summaries in the
53 |     # model directory
54 |     if hvd.rank() == 0:
55 |         config = tf.estimator.RunConfig(
56 |             model_dir=FLAGS.model_dir,
57 |             keep_checkpoint_every_n_hours=5,
58 |             save_summary_steps=100,
59 |             save_checkpoints_secs=60 * 5,
60 |             session_config=config
61 |         )
62 |     else:
63 |         config = tf.estimator.RunConfig(
64 |             session_config=config,
65 |             keep_checkpoint_max=1
66 |         )
67 |
68 |     if FLAGS.mobilenet_checkpoint_path is not None:
69 |         # ^((?!badword).)*$ matches all strings which do not contain the badword
70 |         ws = tf.estimator.WarmStartSettings(
71 |             ckpt_to_initialize_from=FLAGS.mobilenet_checkpoint_path,
72 |             vars_to_warm_start='.*' if FLAGS.restore_last_layer else "^((?!Logits).)*$",
73 |         )
74 |     else:
75 |         ws = None
76 |
77 |     estimator = tf.estimator.Estimator(
78 |         model_fn=model.model_fn,
79 |         config=config,
80 |         warm_start_from=ws
81 |     )
82 |
83 |     bcast_hook = hvd.BroadcastGlobalVariablesHook(0)
84 |     image_counter_hook = ImageRateHook()
85 |     # Caching file writers, which will then be retrieved during evaluation.
86 |     writer = tf.summary.FileWriter(logdir=FLAGS.model_dir, flush_secs=30)
87 |     eval_writer = tf.summary.FileWriter(
88 |         logdir=os.path.join(FLAGS.model_dir, "eval"), flush_secs=30)
89 |
90 |     try:
91 |         steps = estimator.get_variable_value('global_step')
92 |     except ValueError:
93 |         steps = 0
94 |     evaluate_every_n = 1000
95 |     # evaluate(estimator, True)   # debugging leftover: evaluating and exiting
96 |     # evaluate(estimator, False)  # here would skip the training loop below
97 |     # if hvd.rank() == 0 and FLAGS.evaluate:
98 |     #     evaluate(estimator, False)
99 |     # sys.exit()
100 |     print("Steps", steps, "Max steps", FLAGS.max_steps)
101 |     while steps < FLAGS.max_steps:
102 |         evaluate_every_n = min(evaluate_every_n, FLAGS.max_steps - steps)
103 |         estimator.train(input_fn=lambda: model.imagenet_iterator(
104 |             is_training=True, num_epochs=10000),
105 |             steps=evaluate_every_n,
106 |             hooks=[bcast_hook, image_counter_hook])
107 |         if hvd.rank() == 0 and FLAGS.evaluate:
108 |             # Evaluate on the training set only for metric_learning and cifar100
109 |             if FLAGS.model in ['metric_learning', 'cifar100']:
110 |                 evaluate(estimator, True)
111 |                 evaluate(estimator, False)
112 |             else:
113 |                 evaluate(estimator, False)
114 |
115 |         steps += evaluate_every_n
116 |
117 |
118 | def evaluate(estimator, is_training):
119 |     if is_training:
120 |         logdir = FLAGS.model_dir
121 |     else:
122 |         logdir = os.path.join(FLAGS.model_dir, 'eval')
123 |     print('Writing to', logdir)
124 |     print('Evaluating for %s' % FLAGS.model)
125 |     writer = tf.summary.FileWriterCache.get(logdir=logdir)
126 |     if FLAGS.model in ['metric_learning']:
127 |         global_step = evaluate_retrieval.get_global_step()
128 |         evaluate_retrieval.compute_metrics(estimator, global_step, is_training, writer)
129 |     if FLAGS.model in ['cifar100']:
130 |         global_step = evaluate_retrieval.get_global_step()
131 |         cifar.compute_metrics(estimator, global_step, is_training, writer)
132 |     if "face" in FLAGS.model:
133 |         global_step = evaluate_mfacenet.get_global_step()
134 |         evaluate_mfacenet.compute_megaface_metrics(estimator, global_step, writer)
135 |         for dataset in ["lfw", "agedb_30"]:
136 |             evaluate_mfacenet.compute_metrics(dataset, estimator, global_step, writer)
137 |
138 |
139 | def log_error():
140 |     with open('errors.log', 'a') as fout:
141 |         fout.write(('#' * 20) + '\n')
142 |         fout.write(str(datetime.datetime.now()) + '\n')
143 |         traceback.print_exc(file=fout)
144 |
145 | if __name__ == '__main__':
146 |     try:
147 |         absl.app.run(main)
148 |     except tf.errors.ResourceExhaustedError as E:
149 |         print("#" * 20, "\nCaught ResourceExhaustedError. Re-raising exception ...\n", "#" * 20)
150 |         log_error()
151 |         raise E
152 |     except tf.errors.InternalError as E:
153 |         print("#" * 20, "\nCaught InternalError. Re-raising exception ...\n", "#" * 20)
154 |         log_error()
155 |         raise E
156 |     except Exception as E:
157 |         print("*" * 20, "\nCaught exception", type(E), "Exiting now ...")
158 |         traceback.print_exc()
--------------------------------------------------------------------------------
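A note on the soft_thresh activation selected by --final_activation in the run scripts above: proxy_metric_learning.py implements it as relu(x - lam) - relu(-x - lam), which equals the usual soft-thresholding operator sign(x) * max(|x| - lam, 0); values in [-lam, lam] are zeroed out, which is what produces exact zeros in the sparse embeddings. A quick standalone check of that identity (not part of the repo):

import numpy as np

def soft_thresh(x, lam):
    # Same functional form as in proxy_metric_learning.py, in NumPy.
    return np.maximum(x - lam, 0) - np.maximum(-x - lam, 0)

x = np.linspace(-2, 2, 9)
lam = 0.5
reference = np.sign(x) * np.maximum(np.abs(x) - lam, 0)
assert np.allclose(soft_thresh(x, lam), reference)
print(soft_thresh(x, lam))  # entries with |x| <= lam come out exactly zero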