├── .gitignore ├── ABC-layer-inference-support.ipynb ├── ABC.ipynb ├── LICENSE ├── README.md └── requirements.txt /.gitignore: -------------------------------------------------------------------------------- 1 | # Byte-compiled / optimized / DLL files 2 | __pycache__/ 3 | *.py[cod] 4 | *$py.class 5 | 6 | # C extensions 7 | *.so 8 | 9 | # Distribution / packaging 10 | .Python 11 | env/ 12 | build/ 13 | develop-eggs/ 14 | dist/ 15 | downloads/ 16 | eggs/ 17 | .eggs/ 18 | lib/ 19 | lib64/ 20 | parts/ 21 | sdist/ 22 | var/ 23 | wheels/ 24 | *.egg-info/ 25 | .installed.cfg 26 | *.egg 27 | 28 | # PyInstaller 29 | # Usually these files are written by a python script from a template 30 | # before PyInstaller builds the exe, so as to inject date/other infos into it. 31 | *.manifest 32 | *.spec 33 | 34 | # Installer logs 35 | pip-log.txt 36 | pip-delete-this-directory.txt 37 | 38 | # Unit test / coverage reports 39 | htmlcov/ 40 | .tox/ 41 | .coverage 42 | .coverage.* 43 | .cache 44 | nosetests.xml 45 | coverage.xml 46 | *.cover 47 | .hypothesis/ 48 | 49 | # Translations 50 | *.mo 51 | *.pot 52 | 53 | # Django stuff: 54 | *.log 55 | local_settings.py 56 | 57 | # Flask stuff: 58 | instance/ 59 | .webassets-cache 60 | 61 | # Scrapy stuff: 62 | .scrapy 63 | 64 | # Sphinx documentation 65 | docs/_build/ 66 | 67 | # PyBuilder 68 | target/ 69 | 70 | # Jupyter Notebook 71 | .ipynb_checkpoints 72 | 73 | # pyenv 74 | .python-version 75 | 76 | # celery beat schedule file 77 | celerybeat-schedule 78 | 79 | # SageMath parsed files 80 | *.sage.py 81 | 82 | # dotenv 83 | .env 84 | 85 | # virtualenv 86 | .venv 87 | venv/ 88 | ENV/ 89 | 90 | # Spyder project settings 91 | .spyderproject 92 | .spyproject 93 | 94 | # Rope project settings 95 | .ropeproject 96 | 97 | # mkdocs documentation 98 | /site 99 | 100 | # mypy 101 | .mypy_cache/ 102 | -------------------------------------------------------------------------------- /ABC-layer-inference-support.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Implementation of Accurate Binary Convolution Layer\n", 8 | "The main notebook is **ABC.ipynb**. In this notebook, *alphas* training is moved out of the layer, so that the variables and functions can be made reusable for inference time." 9 | ] 10 | }, 11 | { 12 | "cell_type": "code", 13 | "execution_count": 1, 14 | "metadata": { 15 | "collapsed": true 16 | }, 17 | "outputs": [], 18 | "source": [ 19 | "from __future__ import division, print_function\n", 20 | "import tensorflow as tf\n", 21 | "import numpy as np" 22 | ] 23 | }, 24 | { 25 | "cell_type": "markdown", 26 | "metadata": {}, 27 | "source": [ 28 | "#### See *ABC* notebook for explanation of all the functions" 29 | ] 30 | }, 31 | { 32 | "cell_type": "code", 33 | "execution_count": 2, 34 | "metadata": { 35 | "collapsed": true 36 | }, 37 | "outputs": [], 38 | "source": [ 39 | "def get_mean_stddev(input_tensor):\n", 40 | " with tf.name_scope('mean_stddev_cal'):\n", 41 | " mean, variance = tf.nn.moments(input_tensor, axes=range(len(input_tensor.get_shape())))\n", 42 | " stddev = tf.sqrt(variance, name=\"standard_deviation\")\n", 43 | " return mean, stddev\n", 44 | " \n", 45 | "# TODO: Allow shift parameters to be learnable\n", 46 | "def get_shifted_stddev(stddev, no_filters):\n", 47 | " with tf.name_scope('shifted_stddev'):\n", 48 | " spreaded_deviation = -1. 
+ (2./(no_filters - 1)) * tf.convert_to_tensor(range(no_filters),\n", 49 | " dtype=tf.float32)\n", 50 | " return spreaded_deviation * stddev\n", 51 | " \n", 52 | "def get_binary_filters(convolution_filters, no_filters, name=None):\n", 53 | " with tf.name_scope(name, default_name=\"get_binary_filters\"):\n", 54 | " mean, stddev = get_mean_stddev(convolution_filters)\n", 55 | " shifted_stddev = get_shifted_stddev(stddev, no_filters)\n", 56 | " \n", 57 | " # Normalize the filters by subtracting mean from them\n", 58 | " mean_adjusted_filters = convolution_filters - mean\n", 59 | " \n", 60 | " # Tiling filters to match the number of filters\n", 61 | " expanded_filters = tf.expand_dims(mean_adjusted_filters, axis=0, name=\"expanded_filters\")\n", 62 | " tiled_filters = tf.tile(expanded_filters, [no_filters] + [1] * len(convolution_filters.get_shape()),\n", 63 | " name=\"tiled_filters\")\n", 64 | " \n", 65 | " # Similarly tiling spreaded stddev to match the shape of tiled_filters\n", 66 | " expanded_stddev = tf.reshape(shifted_stddev, [no_filters] + [1] * len(convolution_filters.get_shape()),\n", 67 | " name=\"expanded_stddev\")\n", 68 | " \n", 69 | " binarized_filters = tf.sign(tiled_filters + expanded_stddev, name=\"binarized_filters\")\n", 70 | " return binarized_filters" 71 | ] 72 | }, 73 | { 74 | "cell_type": "markdown", 75 | "metadata": {}, 76 | "source": [ 77 | "Now, instead of get_alphas, implementation of **alpha training** is provided, which takes input of the *filters*, *binarized filters*, and *alphas* and returns the loss and the alpha training operation" 78 | ] 79 | }, 80 | { 81 | "cell_type": "code", 82 | "execution_count": 3, 83 | "metadata": { 84 | "collapsed": true 85 | }, 86 | "outputs": [], 87 | "source": [ 88 | "def alpha_training(convolution_filters, binary_filters, alphas, no_filters):\n", 89 | " with tf.name_scope(\"alpha_training\"):\n", 90 | " reshaped_convolution_filters = tf.reshape(convolution_filters, [-1], name=\"reshaped_convolution_filters\")\n", 91 | " reshaped_binary_filters = tf.reshape(binary_filters, [no_filters, -1],\n", 92 | " name=\"reshaped_binary_filters\")\n", 93 | " \n", 94 | " weighted_sum_filters = tf.reduce_sum(tf.multiply(alphas, reshaped_binary_filters),\n", 95 | " axis=0, name=\"weighted_sum_filters\")\n", 96 | " \n", 97 | " # Defining loss\n", 98 | " error = tf.square(reshaped_convolution_filters - weighted_sum_filters, name=\"alphas_error\")\n", 99 | " loss = tf.reduce_mean(error, axis=0, name=\"alphas_loss\")\n", 100 | " \n", 101 | " # Defining optimizer\n", 102 | " training_op = tf.train.AdamOptimizer().minimize(loss, var_list=[alphas],\n", 103 | " name=\"alphas_training_op\")\n", 104 | " \n", 105 | " return training_op, loss" 106 | ] 107 | }, 108 | { 109 | "cell_type": "markdown", 110 | "metadata": {}, 111 | "source": [ 112 | "Now, both *ABC* and *ApproxConv* is updated to incorporate this change" 113 | ] 114 | }, 115 | { 116 | "cell_type": "code", 117 | "execution_count": 4, 118 | "metadata": { 119 | "collapsed": true 120 | }, 121 | "outputs": [], 122 | "source": [ 123 | "def ApproxConv(no_filters, alphas, binary_filters, convolution_biases=None,\n", 124 | " strides=(1, 1), padding=\"VALID\", name=None):\n", 125 | " with tf.name_scope(name, \"ApproxConv\"):\n", 126 | " if convolution_biases is None:\n", 127 | " biases = 0.\n", 128 | " else:\n", 129 | " biases = convolution_biases\n", 130 | " \n", 131 | " # Defining function for closure to accept multiple inputs with same filters\n", 132 | " def ApproxConvLayer(input_tensor, name=None):\n", 
133 | " with tf.name_scope(name, \"ApproxConv_Layer\"):\n", 134 | " # Reshaping alphas to match the input tensor\n", 135 | " reshaped_alphas = tf.reshape(alphas,\n", 136 | " shape=[no_filters] + [1] * len(input_tensor.get_shape()),\n", 137 | " name=\"reshaped_alphas\")\n", 138 | " \n", 139 | " # Calculating convolution for each binary filter\n", 140 | " approxConv_outputs = []\n", 141 | " for index in range(no_filters):\n", 142 | " # Binary convolution\n", 143 | " this_conv = tf.nn.conv2d(input_tensor, binary_filters[index],\n", 144 | " strides=(1,) + strides + (1,),\n", 145 | " padding=padding)\n", 146 | " approxConv_outputs.append(this_conv + biases)\n", 147 | " conv_outputs = tf.convert_to_tensor(approxConv_outputs, dtype=tf.float32,\n", 148 | " name=\"conv_outputs\")\n", 149 | " \n", 150 | " # Summing up each of the binary convolution\n", 151 | " ApproxConv_output = tf.reduce_sum(tf.multiply(conv_outputs, reshaped_alphas), axis=0)\n", 152 | " \n", 153 | " return ApproxConv_output\n", 154 | " \n", 155 | " return ApproxConvLayer\n", 156 | " \n", 157 | "def ABC(binary_filters, alphas, shift_parameters, betas, \n", 158 | " convolution_biases=None, no_binary_filters=5, no_ApproxConvLayers=5,\n", 159 | " strides=(1, 1), padding=\"VALID\", name=None):\n", 160 | " with tf.name_scope(name, \"ABC\"): \n", 161 | " # Instantiating the ApproxConv Layer\n", 162 | " ApproxConvLayer= ApproxConv(no_binary_filters, alphas, binary_filters, convolution_biases,\n", 163 | " strides, padding)\n", 164 | " \n", 165 | " def ABCLayer(input_tensor, name=None):\n", 166 | " with tf.name_scope(name, \"ABCLayer\"):\n", 167 | " # Reshaping betas to match the input tensor\n", 168 | " reshaped_betas = tf.reshape(betas,\n", 169 | " shape=[no_ApproxConvLayers] + [1] * len(input_tensor.get_shape()),\n", 170 | " name=\"reshaped_betas\")\n", 171 | " \n", 172 | " # Calculating ApproxConv for each shifted input\n", 173 | " ApproxConv_layers = []\n", 174 | " for index in range(no_ApproxConvLayers):\n", 175 | " # Shifting and binarizing input\n", 176 | " shifted_input = tf.clip_by_value(input_tensor + shift_parameters[index], 0., 1.,\n", 177 | " name=\"shifted_input_\" + str(index))\n", 178 | " binarized_activation = tf.sign(shifted_input - 0.5)\n", 179 | " \n", 180 | " # Passing through the ApproxConv layer\n", 181 | " ApproxConv_layers.append(ApproxConvLayer(binarized_activation))\n", 182 | " ApproxConv_output = tf.convert_to_tensor(ApproxConv_layers, dtype=tf.float32,\n", 183 | " name=\"ApproxConv_output\")\n", 184 | " \n", 185 | " # Taking the weighted sum using the betas\n", 186 | " ABC_output = tf.reduce_sum(tf.multiply(ApproxConv_output, reshaped_betas), axis=0)\n", 187 | " return ABC_output\n", 188 | " \n", 189 | " return ABCLayer" 190 | ] 191 | }, 192 | { 193 | "cell_type": "markdown", 194 | "metadata": {}, 195 | "source": [ 196 | "#### Now a layer can be created as follows" 197 | ] 198 | }, 199 | { 200 | "cell_type": "code", 201 | "execution_count": 10, 202 | "metadata": { 203 | "collapsed": true 204 | }, 205 | "outputs": [], 206 | "source": [ 207 | "test_filters = np.random.normal(size=(3, 3, 1, 64))\n", 208 | "test_biases = np.random.normal(size=(64,))\n", 209 | "test_input = np.random.normal(size=(32, 28, 28, 1))" 210 | ] 211 | }, 212 | { 213 | "cell_type": "code", 214 | "execution_count": 11, 215 | "metadata": { 216 | "collapsed": true 217 | }, 218 | "outputs": [], 219 | "source": [ 220 | "g = tf.Graph()" 221 | ] 222 | }, 223 | { 224 | "cell_type": "code", 225 | "execution_count": 12, 226 | "metadata": { 227 | 
"collapsed": true 228 | }, 229 | "outputs": [], 230 | "source": [ 231 | "with g.as_default():\n", 232 | " filters = tf.Variable(tf.convert_to_tensor(test_filters, dtype=tf.float32), name=\"convolution_filters\")\n", 233 | " biases = tf.Variable(tf.convert_to_tensor(test_biases, dtype=tf.float32), name=\"convolution_biases\")\n", 234 | " alphas = tf.Variable(tf.constant(1., shape=(5, 1)), dtype=tf.float32,\n", 235 | " name=\"alphas\")\n", 236 | " shift_parameters = tf.Variable(tf.constant(0., shape=(5, 1)), dtype=tf.float32,\n", 237 | " name=\"shift_parameters\")\n", 238 | " betas = tf.Variable(tf.constant(1., shape=(5, 1)), dtype=tf.float32,\n", 239 | " name=\"betas\")\n", 240 | " \n", 241 | " binary_filters = get_binary_filters(filters, 5)\n", 242 | " alphas_training_op, alphas_loss = alpha_training(tf.stop_gradient(filters),\n", 243 | " tf.stop_gradient(binary_filters),\n", 244 | " alphas, 5)\n", 245 | " ABC_layer = ABC(binary_filters, tf.stop_gradient(alphas), shift_parameters, betas, biases)\n", 246 | " \n", 247 | " output = ABC_layer(tf.convert_to_tensor(test_input, dtype=tf.float32))" 248 | ] 249 | }, 250 | { 251 | "cell_type": "markdown", 252 | "metadata": {}, 253 | "source": [ 254 | "### Testing\n", 255 | "Let's test the updated architecture on MNIST again" 256 | ] 257 | }, 258 | { 259 | "cell_type": "code", 260 | "execution_count": 5, 261 | "metadata": {}, 262 | "outputs": [ 263 | { 264 | "name": "stdout", 265 | "output_type": "stream", 266 | "text": [ 267 | "Successfully downloaded train-images-idx3-ubyte.gz 9912422 bytes.\n", 268 | "Extracting /tmp/data/train-images-idx3-ubyte.gz\n", 269 | "Successfully downloaded train-labels-idx1-ubyte.gz 28881 bytes.\n", 270 | "Extracting /tmp/data/train-labels-idx1-ubyte.gz\n", 271 | "Successfully downloaded t10k-images-idx3-ubyte.gz 1648877 bytes.\n", 272 | "Extracting /tmp/data/t10k-images-idx3-ubyte.gz\n", 273 | "Successfully downloaded t10k-labels-idx1-ubyte.gz 4542 bytes.\n", 274 | "Extracting /tmp/data/t10k-labels-idx1-ubyte.gz\n" 275 | ] 276 | } 277 | ], 278 | "source": [ 279 | "# MNIST data import\n", 280 | "# Importing data\n", 281 | "from tensorflow.examples.tutorials.mnist import input_data\n", 282 | "!mkdir -p /tmp/data\n", 283 | "mnist = input_data.read_data_sets(\"/tmp/data/\")" 284 | ] 285 | }, 286 | { 287 | "cell_type": "markdown", 288 | "metadata": {}, 289 | "source": [ 290 | "The following is exactly same as in the other notebook *ABC*" 291 | ] 292 | }, 293 | { 294 | "cell_type": "code", 295 | "execution_count": 6, 296 | "metadata": { 297 | "collapsed": true 298 | }, 299 | "outputs": [], 300 | "source": [ 301 | "# Defining utils function\n", 302 | "def weight_variable(shape, name=\"weight\"):\n", 303 | " initial = tf.truncated_normal(shape, stddev=0.1)\n", 304 | " return tf.Variable(initial, name=name)\n", 305 | "\n", 306 | "def bias_variable(shape, name=\"bias\"):\n", 307 | " initial = tf.constant(0.1, shape=shape)\n", 308 | " return tf.Variable(initial, name=name)\n", 309 | "\n", 310 | "def conv2d(x, W):\n", 311 | " return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')\n", 312 | "\n", 313 | "def max_pool_2x2(x):\n", 314 | " return tf.nn.max_pool(x, ksize=[1, 2, 2, 1],\n", 315 | " strides=[1, 2, 2, 1], padding='SAME')" 316 | ] 317 | }, 318 | { 319 | "cell_type": "code", 320 | "execution_count": 7, 321 | "metadata": { 322 | "collapsed": true 323 | }, 324 | "outputs": [], 325 | "source": [ 326 | "# Creating the graph\n", 327 | "without_ABC_graph = tf.Graph()\n", 328 | "with without_ABC_graph.as_default():\n", 329 | 
" # Defining inputs\n", 330 | " x = tf.placeholder(dtype=tf.float32)\n", 331 | " x_image = tf.reshape(x, [-1, 28, 28, 1])\n", 332 | " \n", 333 | " # Convolution Layer 1\n", 334 | " W_conv1 = weight_variable(shape=([5, 5, 1, 32]), name=\"W_conv1\")\n", 335 | " b_conv1 = bias_variable(shape=[32], name=\"b_conv1\")\n", 336 | " conv1 = (conv2d(x_image, W_conv1) + b_conv1)\n", 337 | " pool1 = max_pool_2x2(conv1)\n", 338 | " bn_conv1 = tf.layers.batch_normalization(pool1, axis=-1, name=\"batchNorm1\")\n", 339 | " h_conv1 = tf.nn.relu(bn_conv1)\n", 340 | "\n", 341 | " # Convolution Layer 2\n", 342 | " W_conv2 = weight_variable(shape=([5, 5, 32, 64]), name=\"W_conv2\")\n", 343 | " b_conv2 = bias_variable(shape=[64], name=\"b_conv2\")\n", 344 | " conv2 = (conv2d(h_conv1, W_conv2) + b_conv2)\n", 345 | " pool2 = max_pool_2x2(conv2)\n", 346 | " bn_conv2 = tf.layers.batch_normalization(pool2, axis=-1, name=\"batchNorm2\")\n", 347 | " h_conv2 = tf.nn.relu(bn_conv2)\n", 348 | "\n", 349 | " # Flat the conv2 output\n", 350 | " h_conv2_flat = tf.reshape(h_conv2, shape=(-1, 7*7*64))\n", 351 | "\n", 352 | " # Dense layer1\n", 353 | " W_fc1 = weight_variable([7 * 7 * 64, 1024])\n", 354 | " b_fc1 = bias_variable([1024])\n", 355 | " h_fc1 = tf.nn.relu(tf.matmul(h_conv2_flat, W_fc1) + b_fc1)\n", 356 | "\n", 357 | " # Dropout\n", 358 | " keep_prob = tf.placeholder(tf.float32)\n", 359 | " h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)\n", 360 | "\n", 361 | " # Output layer\n", 362 | " W_fc2 = weight_variable([1024, 10])\n", 363 | " b_fc2 = bias_variable([10])\n", 364 | "\n", 365 | " y_conv = tf.matmul(h_fc1_drop, W_fc2) + b_fc2\n", 366 | " \n", 367 | " # Labels\n", 368 | " y = tf.placeholder(tf.int32, [None])\n", 369 | " y_ = tf.one_hot(y, 10)\n", 370 | " \n", 371 | " # Defining optimizer and loss\n", 372 | " cross_entropy = tf.reduce_mean(\n", 373 | " tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=y_conv))\n", 374 | " train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)\n", 375 | " correct_prediction = tf.equal(tf.argmax(y_conv, 1), tf.argmax(y_, 1))\n", 376 | " accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))\n", 377 | " \n", 378 | " # Initializer\n", 379 | " graph_init = tf.global_variables_initializer()" 380 | ] 381 | }, 382 | { 383 | "cell_type": "code", 384 | "execution_count": 8, 385 | "metadata": { 386 | "collapsed": true 387 | }, 388 | "outputs": [], 389 | "source": [ 390 | "# Defining variables to save. 
These will be fed to our custom layer\n", 391 | "variables_to_save = {\"W_conv1\": W_conv1,\n", 392 | "                     \"b_conv1\": b_conv1,\n", 393 | "                     \"W_conv2\": W_conv2,\n", 394 | "                     \"b_conv2\": b_conv2,\n", 395 | "                     \"W_fc1\": W_fc1,\n", 396 | "                     \"b_fc1\": b_fc1,\n", 397 | "                     \"W_fc2\": W_fc2,\n", 398 | "                     \"b_fc2\": b_fc2}\n", 399 | "values = {}" 400 | ] 401 | }, 402 | { 403 | "cell_type": "code", 404 | "execution_count": 9, 405 | "metadata": {}, 406 | "outputs": [ 407 | { 408 | "name": "stdout", 409 | "output_type": "stream", 410 | "text": [ 411 | "Epoch: 1 Val accuracy: 80.0000% Loss: 0.575571\n", 412 | "Epoch: 2 Val accuracy: 88.0000% Loss: 0.516295\n", 413 | "Epoch: 3 Val accuracy: 98.0000% Loss: 0.074902\n", 414 | "Epoch: 4 Val accuracy: 96.0000% Loss: 0.114960\n", 415 | "Epoch: 5 Val accuracy: 96.0000% Loss: 0.108748 \n" 416 | ] 417 | } 418 | ], 419 | "source": [ 420 | "n_epochs = 5\n", 421 | "batch_size = 32\n", 422 | " \n", 423 | "with tf.Session(graph=without_ABC_graph) as sess:\n", 424 | "    sess.run(graph_init)\n", 425 | "    for epoch in range(n_epochs):\n", 426 | "        for iteration in range(1, 200 + 1):\n", 427 | "            batch = mnist.train.next_batch(50)\n", 428 | "            \n", 429 | "            # Run operation and calculate loss\n", 430 | "            _, loss_train = sess.run([train_step, cross_entropy],\n", 431 | "                                     feed_dict={x: batch[0], y: batch[1], keep_prob: 0.5})\n", 432 | "            print(\"\\rIteration: {}/{} ({:.1f}%) Loss: {:.5f}\".format(\n", 433 | "                      iteration, 200,\n", 434 | "                      iteration * 100 / 200,\n", 435 | "                      loss_train),\n", 436 | "                  end=\"\")\n", 437 | "\n", 438 | "        # At the end of each epoch,\n", 439 | "        # measure the validation loss and accuracy:\n", 440 | "        loss_vals = []\n", 441 | "        acc_vals = []\n", 442 | "        for iteration in range(1, 200 + 1):\n", 443 | "            X_batch, y_batch = mnist.validation.next_batch(batch_size)\n", 444 | "            acc_val, loss_val = sess.run([accuracy, cross_entropy],\n", 445 | "                                         feed_dict={x: batch[0], y: batch[1], keep_prob: 1.0})\n", 446 | "            loss_vals.append(loss_val)\n", 447 | "            acc_vals.append(acc_val)\n", 448 | "            print(\"\\rEvaluating the model: {}/{} ({:.1f}%)\".format(iteration, 200,\n", 449 | "                                                                   iteration * 100 / 200),\n", 450 | "                  end=\" \" * 10)\n", 451 | "        loss_val = np.mean(loss_vals)\n", 452 | "        acc_val = np.mean(acc_vals)\n", 453 | "        print(\"\\rEpoch: {} Val accuracy: {:.4f}% Loss: {:.6f}\".format(\n", 454 | "            epoch + 1, acc_val * 100, loss_val))\n", 455 | "    \n", 456 | "    # On completion of training, save the variables to be fed to custom model\n", 457 | "    for var_name in variables_to_save:\n", 458 | "        values[var_name] = sess.run(variables_to_save[var_name])" 459 | ] 460 | }, 461 | { 462 | "cell_type": "markdown", 463 | "metadata": {}, 464 | "source": [ 465 | "The 100% accuracy is not an error. It appears because the complete validation set is not being evaluated; only a part of it is, and the model classified that part entirely correctly." 466 | ] 467 | }, 468 | { 469 | "cell_type": "markdown", 470 | "metadata": {}, 471 | "source": [ 472 | "#### Creating the custom model\n", 473 | "While creating the custom model, we will need to create all the variables ourselves." 474 | ] 475 | }, 476 | { 477 | "cell_type": "markdown", 478 | "metadata": {}, 479 | "source": [ 480 | "First, let's create a function that returns the required mean and variance for the batchnorm layer. 
Batchnorm layer requires that mean and variance be calculated of every layer except that of the channels layer" 481 | ] 482 | }, 483 | { 484 | "cell_type": "code", 485 | "execution_count": 10, 486 | "metadata": { 487 | "collapsed": true 488 | }, 489 | "outputs": [], 490 | "source": [ 491 | "def bn_mean_variance(input_tensor, axis=-1, keep_dims=True):\n", 492 | " shape = len(input_tensor.get_shape())\n", 493 | " if axis < 0:\n", 494 | " axis += shape\n", 495 | " dimension_range = range(shape)\n", 496 | " return tf.nn.moments(input_tensor, axes=dimension_range[:axis] + dimension_range[axis+1:],\n", 497 | " keep_dims=keep_dims)" 498 | ] 499 | }, 500 | { 501 | "cell_type": "code", 502 | "execution_count": 13, 503 | "metadata": { 504 | "collapsed": true 505 | }, 506 | "outputs": [], 507 | "source": [ 508 | "custom_graph = tf.Graph()\n", 509 | "with custom_graph.as_default():\n", 510 | " alphas_training_operations = []\n", 511 | " alphas_variables = []\n", 512 | " \n", 513 | " # Setting configuration\n", 514 | " no_filters_conv1 = 5\n", 515 | " no_layers_conv1 = 5\n", 516 | " no_filters_conv2 = 5\n", 517 | " no_layers_conv2 = 5\n", 518 | " \n", 519 | " # Inputs\n", 520 | " x = tf.placeholder(dtype=tf.float32)\n", 521 | " x_image = tf.reshape(x, [-1, 28, 28, 1])\n", 522 | " \n", 523 | " # Convolution Layer 1\n", 524 | " W_conv1 = tf.Variable(values[\"W_conv1\"], name=\"W_conv1\")\n", 525 | " b_conv1 = tf.Variable(values[\"b_conv1\"], name=\"b_conv1\")\n", 526 | " # Creating new variables\n", 527 | " alphas_conv1 = tf.Variable(tf.random_normal(shape=(no_filters_conv1, 1), mean=1.0, stddev=0.1),\n", 528 | " dtype=tf.float32, name=\"alphas_conv1\")\n", 529 | " shift_parameters_conv1 = tf.Variable(tf.constant(0., shape=(no_layers_conv1, 1)),\n", 530 | " dtype=tf.float32, name=\"shift_parameters_conv1\")\n", 531 | " betas_conv1 = tf.Variable(tf.constant(1., shape=(no_layers_conv1, 1)),\n", 532 | " dtype=tf.float32, name=\"betas_conv1\")\n", 533 | " # Performing the operations\n", 534 | " binary_filters_conv1 = get_binary_filters(W_conv1, no_filters_conv1)\n", 535 | " alpha_training_conv1, alpha_loss_conv1 = alpha_training(tf.stop_gradient(W_conv1, \"no_gradient_W_conv1\"),\n", 536 | " tf.stop_gradient(binary_filters_conv1,\n", 537 | " \"no_gradient_binary_filters_conv1\"),\n", 538 | " alphas_conv1, no_filters_conv1)\n", 539 | " conv1 = ABC(binary_filters_conv1, tf.stop_gradient(alphas_conv1), shift_parameters_conv1,\n", 540 | " betas_conv1, b_conv1, padding=\"SAME\")(x_image)\n", 541 | " # Saving the alphas training operation and the variable\n", 542 | " alphas_training_operations.append(alpha_training_conv1)\n", 543 | " alphas_variables.append(alphas_conv1)\n", 544 | " \n", 545 | " # Other layers\n", 546 | " pool1 = max_pool_2x2(conv1)\n", 547 | " # BatchNorm \n", 548 | " mean_conv1, variance_conv1 = bn_mean_variance(pool1)\n", 549 | " bn_gamma_conv1 = tf.Variable(tf.ones(shape=(32,), dtype=tf.float32), name=\"bn_gamma_conv1\")\n", 550 | " bn_beta_conv1 = tf.Variable(tf.zeros(shape=(32,), dtype=tf.float32), name=\"bn_beta_conv1\")\n", 551 | " bn_conv1 = tf.nn.batch_normalization(pool1, mean_conv1, variance_conv1,\n", 552 | " bn_beta_conv1, bn_gamma_conv1, 0.001)\n", 553 | " h_conv1 = tf.nn.relu(bn_conv1)\n", 554 | "\n", 555 | " # Convolution Layer 2\n", 556 | " W_conv2 = tf.Variable(values[\"W_conv2\"], name=\"W_conv2\")\n", 557 | " b_conv2 = tf.Variable(values[\"b_conv2\"], name=\"b_conv2\")\n", 558 | " \n", 559 | " # Creating new variables\n", 560 | " alphas_conv2 = 
tf.Variable(tf.random_normal(shape=(no_filters_conv2, 1), mean=1.0, stddev=0.1),\n", 561 | " dtype=tf.float32, name=\"alphas_conv2\")\n", 562 | " shift_parameters_conv2 = tf.Variable(tf.constant(0., shape=(no_layers_conv2, 1)),\n", 563 | " dtype=tf.float32, name=\"shift_parameters_conv2\")\n", 564 | " betas_conv2 = tf.Variable(tf.constant(1., shape=(no_layers_conv2, 1)),\n", 565 | " dtype=tf.float32, name=\"betas_conv2\")\n", 566 | " \n", 567 | " # Performing the operations\n", 568 | " binary_filters_conv2 = get_binary_filters(W_conv2, no_filters_conv2)\n", 569 | " alpha_training_conv2, alpha_loss_conv2 = alpha_training(tf.stop_gradient(W_conv2, \"no_gradient_W_conv2\"),\n", 570 | " tf.stop_gradient(binary_filters_conv2,\n", 571 | " \"no_gradient_binary_filters_conv2\"),\n", 572 | " alphas_conv2, no_filters_conv2)\n", 573 | " conv2 = ABC(binary_filters_conv2, tf.stop_gradient(alphas_conv2), shift_parameters_conv2,\n", 574 | " betas_conv2, b_conv2, padding=\"SAME\")(h_conv1)\n", 575 | " \n", 576 | " # Saving the alphas training operation and the variable\n", 577 | " alphas_training_operations.append(alpha_training_conv2)\n", 578 | " alphas_variables.append(alphas_conv2)\n", 579 | " \n", 580 | " # Other layers\n", 581 | " pool2 = max_pool_2x2(conv2)\n", 582 | " # BatchNorm\n", 583 | " mean_conv2, variance_conv2 = bn_mean_variance(pool2)\n", 584 | " bn_gamma_conv2 = tf.Variable(tf.ones(shape=(64,), dtype=tf.float32), name=\"bn_gamma_conv2\")\n", 585 | " bn_beta_conv2 = tf.Variable(tf.zeros(shape=(64,), dtype=tf.float32), name=\"bn_beta_conv2\")\n", 586 | " bn_conv2 = tf.nn.batch_normalization(pool2, mean_conv2, variance_conv2,\n", 587 | " bn_beta_conv2, bn_gamma_conv2, 0.001)\n", 588 | " h_conv2 = tf.nn.relu(bn_conv2)\n", 589 | "\n", 590 | " # Flat the conv2 output\n", 591 | " h_conv2_flat = tf.reshape(h_conv2, shape=(-1, 7*7*64))\n", 592 | "\n", 593 | " # Dense layer1\n", 594 | " W_fc1 = tf.convert_to_tensor(values[\"W_fc1\"], dtype=tf.float32)\n", 595 | " b_fc1 = tf.convert_to_tensor(values[\"b_fc1\"], dtype=tf.float32)\n", 596 | " h_fc1 = tf.nn.relu(tf.matmul(h_conv2_flat, W_fc1) + b_fc1)\n", 597 | "\n", 598 | " # Dropout\n", 599 | " keep_prob = tf.placeholder(tf.float32)\n", 600 | " h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)\n", 601 | "\n", 602 | " # Output layer\n", 603 | " W_fc2 = tf.convert_to_tensor(values[\"W_fc2\"], dtype=tf.float32)\n", 604 | " b_fc2 = tf.convert_to_tensor(values[\"b_fc2\"], dtype=tf.float32)\n", 605 | " y_conv = tf.matmul(h_fc1_drop, W_fc2) + b_fc2\n", 606 | " \n", 607 | " # Labels\n", 608 | " y = tf.placeholder(tf.int32, [None])\n", 609 | " y_ = tf.one_hot(y, 10)\n", 610 | " \n", 611 | " # Defining optimizer and loss\n", 612 | " cross_entropy = tf.reduce_mean(\n", 613 | " tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=y_conv))\n", 614 | " train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)\n", 615 | " correct_prediction = tf.equal(tf.argmax(y_conv, 1), tf.argmax(y_, 1))\n", 616 | " accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))\n", 617 | " \n", 618 | " graph_init = tf.global_variables_initializer()\n", 619 | " alphas_init = tf.variables_initializer(alphas_variables)" 620 | ] 621 | }, 622 | { 623 | "cell_type": "markdown", 624 | "metadata": {}, 625 | "source": [ 626 | "Let's create the dictionary of variables to save" 627 | ] 628 | }, 629 | { 630 | "cell_type": "code", 631 | "execution_count": 14, 632 | "metadata": { 633 | "collapsed": true 634 | }, 635 | "outputs": [], 636 | "source": [ 637 | "# Defining variables to 
save. These will be fed to our custom layer\n", 638 | "variables_to_save = {\"W_conv1\": W_conv1,\n", 639 | " \"b_conv1\": b_conv1,\n", 640 | " \"alphas_conv1\": alphas_conv1,\n", 641 | " \"betas_conv1\": betas_conv1,\n", 642 | " \"shift_parameters_conv1\": shift_parameters_conv1,\n", 643 | " \"bn_gamma_conv1\": bn_gamma_conv1,\n", 644 | " \"bn_beta_conv1\": bn_beta_conv1,\n", 645 | " \"W_conv2\": W_conv2,\n", 646 | " \"b_conv2\": b_conv2,\n", 647 | " \"alphas_conv2\": alphas_conv2,\n", 648 | " \"betas_conv2\": betas_conv2,\n", 649 | " \"shift_parameters_conv2\": shift_parameters_conv2,\n", 650 | " \"bn_gamma_conv2\": bn_gamma_conv2,\n", 651 | " \"bn_beta_conv2\": bn_beta_conv2,\n", 652 | " \"W_fc1\": W_fc1,\n", 653 | " \"b_fc1\": b_fc1,\n", 654 | " \"W_fc2\": W_fc2,\n", 655 | " \"b_fc2\": b_fc2}\n", 656 | "values = {}" 657 | ] 658 | }, 659 | { 660 | "cell_type": "code", 661 | "execution_count": 15, 662 | "metadata": {}, 663 | "outputs": [ 664 | { 665 | "name": "stdout", 666 | "output_type": "stream", 667 | "text": [ 668 | "Epoch: 1 Val accuracy: 90.0000% Loss: 0.314954\n", 669 | "Epoch: 2 Val accuracy: 76.0000% Loss: 0.954873\n", 670 | "Epoch: 3 Val accuracy: 80.0000% Loss: 0.985948\n", 671 | "Epoch: 4 Val accuracy: 84.0000% Loss: 1.012544\n", 672 | "Epoch: 5 Val accuracy: 78.0000% Loss: 1.004487\n", 673 | "CPU times: user 4min 42s, sys: 26.4 s, total: 5min 8s\n", 674 | "Wall time: 5min 6s\n" 675 | ] 676 | } 677 | ], 678 | "source": [ 679 | "%%time\n", 680 | "n_epochs = 5\n", 681 | "batch_size = 32\n", 682 | "alpha_training_epochs = 200\n", 683 | " \n", 684 | "with tf.Session(graph=custom_graph) as sess:\n", 685 | " sess.run(graph_init)\n", 686 | " for epoch in range(n_epochs):\n", 687 | " for iteration in range(1, 200 + 1):\n", 688 | " # Training alphas\n", 689 | " sess.run(alphas_init)\n", 690 | " for alpha_training_op in alphas_training_operations:\n", 691 | " for alpha_epoch in range(alpha_training_epochs):\n", 692 | " sess.run(alpha_training_op)\n", 693 | " \n", 694 | " batch = mnist.train.next_batch(50)\n", 695 | " \n", 696 | " # Run operation and calculate loss\n", 697 | " _, loss_train = sess.run([train_step, cross_entropy],\n", 698 | " feed_dict={x: batch[0], y: batch[1], keep_prob: 0.5})\n", 699 | " print(\"\\rIteration: {}/{} ({:.1f}%) Loss: {:.5f}\".format(\n", 700 | " iteration, 200,\n", 701 | " iteration * 100 / 200,\n", 702 | " loss_train),\n", 703 | " end=\"\")\n", 704 | "\n", 705 | " # At the end of each epoch,\n", 706 | " # measure the validation loss and accuracy:\n", 707 | " \n", 708 | " # Training alphas\n", 709 | " sess.run(alphas_init)\n", 710 | " for alpha_training_op in alphas_training_operations:\n", 711 | " for alpha_epoch in range(alpha_training_epochs):\n", 712 | " sess.run(alpha_training_op)\n", 713 | " \n", 714 | " loss_vals = []\n", 715 | " acc_vals = []\n", 716 | " for iteration in range(1, 200 + 1): \n", 717 | " X_batch, y_batch = mnist.validation.next_batch(batch_size)\n", 718 | " acc_val, loss_val = sess.run([accuracy, cross_entropy],\n", 719 | " feed_dict={x: batch[0], y: batch[1], keep_prob: 1.0})\n", 720 | " loss_vals.append(loss_val)\n", 721 | " acc_vals.append(acc_val)\n", 722 | " print(\"\\rEvaluating the model: {}/{} ({:.1f}%)\".format(iteration, 200,\n", 723 | " iteration * 100 / 200),\n", 724 | " end=\" \" * 10)\n", 725 | " loss_val = np.mean(loss_vals)\n", 726 | " acc_val = np.mean(acc_vals)\n", 727 | " print(\"\\rEpoch: {} Val accuracy: {:.4f}% Loss: {:.6f}\".format(\n", 728 | " epoch + 1, acc_val * 100, loss_val))\n", 729 | " \n", 730 | " # 
On completion of training, save the variables to be fed to custom model\n", 731 | " for var_name in variables_to_save:\n", 732 | " values[var_name] = sess.run(variables_to_save[var_name])" 733 | ] 734 | }, 735 | { 736 | "cell_type": "markdown", 737 | "metadata": {}, 738 | "source": [ 739 | "Now, only the required variables can be saved for inference time. Using the **W_conv1** and **W_conv2**, values for binary filters and alphas can be calculated and those can be used along with **shift_parameters** and **betas** to create ABC layer for inference" 740 | ] 741 | }, 742 | { 743 | "cell_type": "markdown", 744 | "metadata": {}, 745 | "source": [ 746 | "### Pure inference testing\n", 747 | "OK! Let's extract the binary filters and alphas and throw away the weights and test our network. This will ensure that we do not have any bug in the implementation of the ABC layer" 748 | ] 749 | }, 750 | { 751 | "cell_type": "markdown", 752 | "metadata": {}, 753 | "source": [ 754 | "Creating graphs for alphas calculation" 755 | ] 756 | }, 757 | { 758 | "cell_type": "code", 759 | "execution_count": 22, 760 | "metadata": { 761 | "collapsed": true 762 | }, 763 | "outputs": [], 764 | "source": [ 765 | "alpha1_cal_graph = tf.Graph()\n", 766 | "with alpha1_cal_graph.as_default():\n", 767 | " alphas1 = tf.Variable(tf.random_normal(shape=(no_filters_conv1, 1), mean=1.0, stddev=0.1))\n", 768 | " conv_filters1 = tf.placeholder(dtype=tf.float32, shape=(5, 5, 1, 32))\n", 769 | " bin_filters1 = get_binary_filters(convolution_filters=conv_filters1,\n", 770 | " no_filters=no_filters_conv1)\n", 771 | " alpha_training_op1, alpha_training_loss1 = alpha_training(conv_filters1, bin_filters1,\n", 772 | " alphas1, no_filters_conv1)\n", 773 | " al_init1 = tf.global_variables_initializer()\n", 774 | " \n", 775 | "alpha2_cal_graph = tf.Graph()\n", 776 | "with alpha2_cal_graph.as_default():\n", 777 | " alphas2 = tf.Variable(tf.random_normal(shape=(no_filters_conv1, 1), mean=1.0, stddev=0.1))\n", 778 | " conv_filters2 = tf.placeholder(dtype=tf.float32, shape=(5, 5, 32, 64))\n", 779 | " bin_filters2 = get_binary_filters(convolution_filters=conv_filters2,\n", 780 | " no_filters=no_filters_conv2)\n", 781 | " alpha_training_op2, alpha_training_loss2 = alpha_training(conv_filters2, bin_filters2,\n", 782 | " alphas2, no_filters_conv2)\n", 783 | " al_init2 = tf.global_variables_initializer()" 784 | ] 785 | }, 786 | { 787 | "cell_type": "markdown", 788 | "metadata": {}, 789 | "source": [ 790 | "Calculating alphas and binary filters" 791 | ] 792 | }, 793 | { 794 | "cell_type": "code", 795 | "execution_count": 23, 796 | "metadata": { 797 | "collapsed": true 798 | }, 799 | "outputs": [], 800 | "source": [ 801 | "with tf.Session(graph=alpha1_cal_graph) as sess:\n", 802 | " al_init1.run()\n", 803 | " for epoch in range(200):\n", 804 | " sess.run(alpha_training_op1, feed_dict={conv_filters1: values[\"W_conv1\"]})\n", 805 | " cal_bin_filters, cal_alphas = sess.run([bin_filters1, alphas1], feed_dict={conv_filters1: values[\"W_conv1\"]})\n", 806 | " values[\"binary_filters_conv1\"] = cal_bin_filters\n", 807 | " values[\"alphas_conv1\"] = cal_alphas\n", 808 | "\n", 809 | "with tf.Session(graph=alpha2_cal_graph) as sess:\n", 810 | " al_init2.run()\n", 811 | " for epoch in range(200):\n", 812 | " sess.run(alpha_training_op2, feed_dict={conv_filters2: values[\"W_conv2\"]})\n", 813 | " cal_bin_filters, cal_alphas = sess.run([bin_filters2, alphas2], feed_dict={conv_filters2: values[\"W_conv2\"]})\n", 814 | " values[\"binary_filters_conv2\"] = 
cal_bin_filters\n", 815 | " values[\"alphas_conv2\"] = cal_alphas" 816 | ] 817 | }, 818 | { 819 | "cell_type": "markdown", 820 | "metadata": {}, 821 | "source": [ 822 | "#### Building inference model\n", 823 | "Now, we have all our variables, let's build an inference model" 824 | ] 825 | }, 826 | { 827 | "cell_type": "code", 828 | "execution_count": 25, 829 | "metadata": { 830 | "collapsed": true 831 | }, 832 | "outputs": [], 833 | "source": [ 834 | "inference_graph = tf.Graph()\n", 835 | "with inference_graph.as_default():\n", 836 | " # Setting configuration\n", 837 | " no_filters_conv1 = 5\n", 838 | " no_layers_conv1 = 5\n", 839 | " no_filters_conv2 = 5\n", 840 | " no_layers_conv2 = 5\n", 841 | " \n", 842 | " # Inputs\n", 843 | " x = tf.placeholder(dtype=tf.float32)\n", 844 | " x_image = tf.reshape(x, [-1, 28, 28, 1])\n", 845 | " \n", 846 | " # Convolution Layer 1\n", 847 | " b_conv1 = tf.convert_to_tensor(values[\"b_conv1\"], dtype=tf.float32, name=\"b_conv1\")\n", 848 | " alphas_conv1 = tf.convert_to_tensor(values[\"alphas_conv1\"],\n", 849 | " dtype=tf.float32, name=\"alphas_conv1\")\n", 850 | " shift_parameters_conv1 = tf.convert_to_tensor(values[\"shift_parameters_conv1\"],\n", 851 | " dtype=tf.float32, name=\"shift_parameters_conv1\")\n", 852 | " betas_conv1 = tf.convert_to_tensor(values[\"betas_conv1\"],\n", 853 | " dtype=tf.float32, name=\"betas_conv1\")\n", 854 | " # Performing the operations\n", 855 | " binary_filters_conv1 = tf.convert_to_tensor(values[\"binary_filters_conv1\"], dtype=tf.float32,\n", 856 | " name=\"binary_filters_conv1\")\n", 857 | " conv1 = ABC(binary_filters_conv1, tf.stop_gradient(alphas_conv1), shift_parameters_conv1,\n", 858 | " betas_conv1, b_conv1, padding=\"SAME\")(x_image)\n", 859 | " # Other layers\n", 860 | " pool1 = max_pool_2x2(conv1)\n", 861 | " # batch norm parameters\n", 862 | " mean_conv1, variance_conv1 = bn_mean_variance(pool1)\n", 863 | " bn_gamma_conv1 = tf.convert_to_tensor(values[\"bn_gamma_conv1\"], dtype=tf.float32,\n", 864 | " name=\"bn_gamma_conv1\")\n", 865 | " bn_beta_conv1 = tf.convert_to_tensor(values[\"bn_beta_conv1\"], dtype=tf.float32,\n", 866 | " name=\"bn_beta_conv1\")\n", 867 | " bn_conv1 = tf.nn.batch_normalization(pool1, mean_conv1, variance_conv1,\n", 868 | " bn_beta_conv1, bn_gamma_conv1, 0.001)\n", 869 | " h_conv1 = tf.nn.relu(bn_conv1)\n", 870 | "\n", 871 | " # Convolution Layer 2\n", 872 | " b_conv2 = tf.convert_to_tensor(values[\"b_conv2\"], dtype=tf.float32, name=\"b_conv2\")\n", 873 | " alphas_conv2 = tf.convert_to_tensor(values[\"alphas_conv2\"],\n", 874 | " dtype=tf.float32, name=\"alphas_conv2\")\n", 875 | " shift_parameters_conv2 = tf.convert_to_tensor(values[\"shift_parameters_conv2\"],\n", 876 | " dtype=tf.float32, name=\"shift_parameters_conv2\")\n", 877 | " betas_conv2 = tf.convert_to_tensor(values[\"betas_conv2\"],\n", 878 | " dtype=tf.float32, name=\"betas_conv2\")\n", 879 | " # Performing the operations\n", 880 | " binary_filters_conv2 = tf.convert_to_tensor(values[\"binary_filters_conv2\"], dtype=tf.float32,\n", 881 | " name=\"binary_filters_conv2\")\n", 882 | " conv2 = ABC(binary_filters_conv2, tf.stop_gradient(alphas_conv2), shift_parameters_conv2,\n", 883 | " betas_conv2, b_conv2, padding=\"SAME\")(h_conv1)\n", 884 | " # Other layers\n", 885 | " pool2 = max_pool_2x2(conv2)\n", 886 | " # batch norm parameters\n", 887 | " mean_conv2, variance_conv2 = bn_mean_variance(pool2)\n", 888 | " bn_gamma_conv2 = tf.convert_to_tensor(values[\"bn_gamma_conv2\"], dtype=tf.float32,\n", 889 | " 
name=\"bn_gamma_conv2\")\n", 890 | " bn_beta_conv2 = tf.convert_to_tensor(values[\"bn_beta_conv2\"], dtype=tf.float32,\n", 891 | " name=\"bn_beta_conv2\")\n", 892 | " bn_conv2 = tf.nn.batch_normalization(pool2, mean_conv2, variance_conv2,\n", 893 | " bn_beta_conv2, bn_gamma_conv2, 0.001)\n", 894 | " h_conv2 = tf.nn.relu(bn_conv2)\n", 895 | "\n", 896 | " # Flat the conv2 output\n", 897 | " h_conv2_flat = tf.reshape(h_conv2, shape=(-1, 7*7*64))\n", 898 | "\n", 899 | " # Dense layer1\n", 900 | " W_fc1 = tf.convert_to_tensor(values[\"W_fc1\"], dtype=tf.float32)\n", 901 | " b_fc1 = tf.convert_to_tensor(values[\"b_fc1\"], dtype=tf.float32)\n", 902 | " h_fc1 = tf.nn.relu(tf.matmul(h_conv2_flat, W_fc1) + b_fc1)\n", 903 | "\n", 904 | " # Dropout\n", 905 | " keep_prob = tf.placeholder(tf.float32)\n", 906 | " h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)\n", 907 | "\n", 908 | " # Output layer\n", 909 | " W_fc2 = tf.convert_to_tensor(values[\"W_fc2\"], dtype=tf.float32)\n", 910 | " b_fc2 = tf.convert_to_tensor(values[\"b_fc2\"], dtype=tf.float32)\n", 911 | " y_conv = tf.matmul(h_fc1_drop, W_fc2) + b_fc2\n", 912 | " \n", 913 | " # Labels\n", 914 | " y = tf.placeholder(tf.int32, [None])\n", 915 | " y_ = tf.one_hot(y, 10)\n", 916 | " \n", 917 | " # Defining optimizer and loss\n", 918 | " cross_entropy = tf.reduce_mean(\n", 919 | " tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=y_conv))\n", 920 | " correct_prediction = tf.equal(tf.argmax(y_conv, 1), tf.argmax(y_, 1))\n", 921 | " accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))" 922 | ] 923 | }, 924 | { 925 | "cell_type": "markdown", 926 | "metadata": {}, 927 | "source": [ 928 | "Let's test the inference model" 929 | ] 930 | }, 931 | { 932 | "cell_type": "code", 933 | "execution_count": 26, 934 | "metadata": {}, 935 | "outputs": [ 936 | { 937 | "name": "stdout", 938 | "output_type": "stream", 939 | "text": [ 940 | "Epoch: 200 Val accuracy: 78.0000% Loss: 0.884985\n", 941 | "CPU times: user 6.03 s, sys: 832 ms, total: 6.86 s\n", 942 | "Wall time: 5.95 s\n" 943 | ] 944 | } 945 | ], 946 | "source": [ 947 | "%%time\n", 948 | "with tf.Session(graph=inference_graph) as sess:\n", 949 | " loss_vals = []\n", 950 | " acc_vals = []\n", 951 | " for iteration in range(1, 500 + 1): \n", 952 | " X_batch, y_batch = mnist.validation.next_batch(batch_size)\n", 953 | " acc_val, loss_val = sess.run([accuracy, cross_entropy],\n", 954 | " feed_dict={x: batch[0], y: batch[1], keep_prob: 1.0})\n", 955 | " loss_vals.append(loss_val)\n", 956 | " acc_vals.append(acc_val)\n", 957 | " print(\"\\rEvaluating the model: {}/{} ({:.1f}%)\".format(iteration, 500,\n", 958 | " iteration * 100 / 500),\n", 959 | " end=\" \" * 10)\n", 960 | " loss_val = np.mean(loss_vals)\n", 961 | " acc_val = np.mean(acc_vals)\n", 962 | " print(\"\\rEpoch: {} Val accuracy: {:.4f}% Loss: {:.6f}\".format(\n", 963 | " epoch + 1, acc_val * 100, loss_val))" 964 | ] 965 | }, 966 | { 967 | "cell_type": "code", 968 | "execution_count": null, 969 | "metadata": { 970 | "collapsed": true 971 | }, 972 | "outputs": [], 973 | "source": [] 974 | } 975 | ], 976 | "metadata": { 977 | "kernelspec": { 978 | "display_name": "tensorflow", 979 | "language": "python", 980 | "name": "tensorflow" 981 | }, 982 | "language_info": { 983 | "codemirror_mode": { 984 | "name": "ipython", 985 | "version": 2 986 | }, 987 | "file_extension": ".py", 988 | "mimetype": "text/x-python", 989 | "name": "python", 990 | "nbconvert_exporter": "python", 991 | "pygments_lexer": "ipython2", 992 | "version": "2.7.15" 993 | } 994 | 
}, 995 | "nbformat": 4, 996 | "nbformat_minor": 2 997 | } 998 | -------------------------------------------------------------------------------- /ABC.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Implementation of Accurate Binary Convolution Layer\n", 8 | "[Original Paper](https://arxiv.org/abs/1711.11294)" 9 | ] 10 | }, 11 | { 12 | "cell_type": "code", 13 | "execution_count": 1, 14 | "metadata": { 15 | "collapsed": true 16 | }, 17 | "outputs": [], 18 | "source": [ 19 | "from __future__ import division, print_function\n", 20 | "import tensorflow as tf\n", 21 | "import numpy as np" 22 | ] 23 | }, 24 | { 25 | "cell_type": "markdown", 26 | "metadata": {}, 27 | "source": [ 28 | "The inspiration for this network is the use of Deep Neural Networks for real-time object recognition. Currently available **Convolution Layers** require large amount of computation power at runtime and that hinders the use of very deep networks in embedded systems or ASICs. Xiaofan Lin, Cong Zhao, and Wei Pan presented a way to convert Convolution Layers to **Binary Convolution Layers** for faster realtime computation." 29 | ] 30 | }, 31 | { 32 | "cell_type": "markdown", 33 | "metadata": {}, 34 | "source": [ 35 | "### Approximating Convolution weights using binary weights\n", 36 | "Here the hope is to approximate $\\mathbf{W}\\in\\mathbb{R}^{w*h*c_{in}*c_{out}}$ using $\\alpha_1\\mathbf{B_1}+\\alpha_2\\mathbf{B_2}+...+\\alpha_m\\mathbf{B_m}$ where $\\mathbf{B_1}, \\mathbf{B_2}, ..., \\mathbf{B_m}\\in\\mathbb{R}^{w*h*c_{in}*c_{out}}$ and $\\alpha_1, \\alpha_2, ..., \\alpha_m\\in\\mathbb{R}^1$" 37 | ] 38 | }, 39 | { 40 | "cell_type": "markdown", 41 | "metadata": {}, 42 | "source": [ 43 | "#### Conversion from convolution filter to binary filter\n", 44 | "Let's implement the conversion of convolution filter to binary convolution filters first.\n", 45 | "To approximate $\\mathbf{W}$ with $\\alpha_1\\mathbf{B_1}+\\alpha_2\\mathbf{B_2}+...+\\alpha_m\\mathbf{B_m}$ we'll use the equation from the paper $\\mathbf{B_i}=\\operatorname{sign}(\\bar{\\mathbf{W}} + \\mu_i\\operatorname{std}(\\mathbf{W}))$" 46 | ] 47 | }, 48 | { 49 | "cell_type": "markdown", 50 | "metadata": {}, 51 | "source": [ 52 | "We'll need mean and standard deviation of the complete convolution filters" 53 | ] 54 | }, 55 | { 56 | "cell_type": "code", 57 | "execution_count": 2, 58 | "metadata": { 59 | "collapsed": true 60 | }, 61 | "outputs": [], 62 | "source": [ 63 | "def get_mean_stddev(input_tensor):\n", 64 | " with tf.name_scope('mean_stddev_cal'):\n", 65 | " mean, variance = tf.nn.moments(input_tensor, axes=range(len(input_tensor.get_shape())))\n", 66 | " stddev = tf.sqrt(variance, name=\"standard_deviation\")\n", 67 | " return mean, stddev" 68 | ] 69 | }, 70 | { 71 | "cell_type": "markdown", 72 | "metadata": {}, 73 | "source": [ 74 | "We need to spread the standard deviation by the number of filters being used as in the original paper\n", 75 | "$\\mu_i= -1 + (i - 1)\\frac{2}{\\mathbf{M} - 1}$" 76 | ] 77 | }, 78 | { 79 | "cell_type": "code", 80 | "execution_count": 3, 81 | "metadata": { 82 | "collapsed": true 83 | }, 84 | "outputs": [], 85 | "source": [ 86 | "# TODO: Allow shift parameters to be learnable\n", 87 | "def get_shifted_stddev(stddev, no_filters):\n", 88 | " with tf.name_scope('shifted_stddev'):\n", 89 | " spreaded_deviation = -1. 
+ (2./(no_filters - 1)) * tf.convert_to_tensor(range(no_filters),\n", 90 | " dtype=tf.float32)\n", 91 | " return spreaded_deviation * stddev" 92 | ] 93 | }, 94 | { 95 | "cell_type": "markdown", 96 | "metadata": {}, 97 | "source": [ 98 | "Now, we can get the values of $\\mathbf{B_{i}s}$" 99 | ] 100 | }, 101 | { 102 | "cell_type": "code", 103 | "execution_count": 4, 104 | "metadata": { 105 | "collapsed": true 106 | }, 107 | "outputs": [], 108 | "source": [ 109 | "def get_binary_filters(convolution_filters, no_filters, name=None):\n", 110 | " with tf.name_scope(name, default_name=\"get_binary_filters\"):\n", 111 | " mean, stddev = get_mean_stddev(convolution_filters)\n", 112 | " shifted_stddev = get_shifted_stddev(stddev, no_filters)\n", 113 | " \n", 114 | " # Normalize the filters by subtracting mean from them\n", 115 | " mean_adjusted_filters = convolution_filters - mean\n", 116 | " \n", 117 | " # Tiling filters to match the number of filters\n", 118 | " expanded_filters = tf.expand_dims(mean_adjusted_filters, axis=0, name=\"expanded_filters\")\n", 119 | " tiled_filters = tf.tile(expanded_filters, [no_filters] + [1] * len(convolution_filters.get_shape()),\n", 120 | " name=\"tiled_filters\")\n", 121 | " \n", 122 | " # Similarly tiling spreaded stddev to match the shape of tiled_filters\n", 123 | " expanded_stddev = tf.reshape(shifted_stddev, [no_filters] + [1] * len(convolution_filters.get_shape()),\n", 124 | " name=\"expanded_stddev\")\n", 125 | " \n", 126 | " binarized_filters = tf.sign(tiled_filters + expanded_stddev, name=\"binarized_filters\")\n", 127 | " return binarized_filters" 128 | ] 129 | }, 130 | { 131 | "cell_type": "markdown", 132 | "metadata": {}, 133 | "source": [ 134 | "#### Calculating alphas\n", 135 | "Now, we can calculate alphas using the *binary filters* and *convolution filters* by minimizing the *squared difference*\n", 136 | "$\\|\\mathbf{W}-\\mathbf{B}\\alpha\\|^2$" 137 | ] 138 | }, 139 | { 140 | "cell_type": "code", 141 | "execution_count": 5, 142 | "metadata": { 143 | "collapsed": true 144 | }, 145 | "outputs": [], 146 | "source": [ 147 | "def get_alphas(convolution_filters, binary_filters, no_filters, name=None):\n", 148 | " with tf.name_scope(name, \"get_alphas\"):\n", 149 | " # Reshaping convolution filters to be one dimensional and binary filters to be of [no_filters, -1] dimension\n", 150 | " reshaped_convolution_filters = tf.reshape(convolution_filters, [-1], name=\"reshaped_convolution_filters\")\n", 151 | " reshaped_binary_filters = tf.reshape(binary_filters, [no_filters, -1],\n", 152 | " name=\"reshaped_binary_filters\")\n", 153 | " \n", 154 | " # Creating variable for alphas\n", 155 | " alphas = tf.Variable(tf.random_normal(shape=(no_filters, 1), mean=1.0, stddev=0.1), name=\"alphas\")\n", 156 | " \n", 157 | " # Calculating W*alpha\n", 158 | " weighted_sum_filters = tf.reduce_sum(tf.multiply(alphas, reshaped_binary_filters),\n", 159 | " axis=0, name=\"weighted_sum_filters\")\n", 160 | " \n", 161 | " # Defining loss\n", 162 | " error = tf.square(reshaped_convolution_filters - weighted_sum_filters, name=\"alphas_error\")\n", 163 | " loss = tf.reduce_mean(error, axis=0, name=\"alphas_loss\")\n", 164 | " \n", 165 | " # Defining optimizer\n", 166 | " training_op = tf.train.AdamOptimizer().minimize(loss, var_list=[alphas],\n", 167 | " name=\"alphas_training_op\")\n", 168 | " \n", 169 | " return alphas, training_op, loss" 170 | ] 171 | }, 172 | { 173 | "cell_type": "markdown", 174 | "metadata": {}, 175 | "source": [ 176 | "### Creating ApproxConv using the 
binary filters\n", 177 | "$\mathbf{O}=\sum\limits_{m=1}^M\alpha_m\operatorname{Conv}(\mathbf{B}_m, \mathbf{A})$" 178 | ] 179 | }, 180 | { 181 | "cell_type": "markdown", 182 | "metadata": {}, 183 | "source": [ 184 | "As mentioned in the paper, it is better to train the network first with simple convolution layers and then convert the filters into binary filters, allowing the original filters to be trained." 185 | ] 186 | }, 187 | { 188 | "cell_type": "code", 189 | "execution_count": 6, 190 | "metadata": { 191 | "collapsed": true 192 | }, 193 | "outputs": [], 194 | "source": [ 195 | "def ApproxConv(no_filters, convolution_filters, convolution_biases=None,\n", 196 | "               strides=(1, 1), padding=\"VALID\", name=None):\n", 197 | "    with tf.name_scope(name, \"ApproxConv\"):\n", 198 | "        # Creating variables from input convolution filters and convolution biases\n", 199 | "        filters = tf.Variable(convolution_filters, dtype=tf.float32, name=\"filters\")\n", 200 | "        if convolution_biases is None:\n", 201 | "            biases = 0.\n", 202 | "        else:\n", 203 | "            biases = tf.Variable(convolution_biases, dtype=tf.float32, name=\"biases\")\n", 204 | "        \n", 205 | "        # Creating binary filters\n", 206 | "        binary_filters = get_binary_filters(filters, no_filters)\n", 207 | "        \n", 208 | "        # Getting alphas\n", 209 | "        alphas, alphas_training_op, alphas_loss = get_alphas(filters, binary_filters,\n", 210 | "                                                             no_filters)\n", 211 | "        \n", 212 | "        # Defining function for closure to accept multiple inputs with same filters\n", 213 | "        def ApproxConvLayer(input_tensor, name=None):\n", 214 | "            with tf.name_scope(name, \"ApproxConv_Layer\"):\n", 215 | "                # Reshaping alphas to match the input tensor\n", 216 | "                reshaped_alphas = tf.reshape(alphas,\n", 217 | "                                             shape=[no_filters] + [1] * len(input_tensor.get_shape()),\n", 218 | "                                             name=\"reshaped_alphas\")\n", 219 | "                \n", 220 | "                # Calculating convolution for each binary filter\n", 221 | "                approxConv_outputs = []\n", 222 | "                for index in range(no_filters):\n", 223 | "                    # Binary convolution\n", 224 | "                    this_conv = tf.nn.conv2d(input_tensor, binary_filters[index],\n", 225 | "                                             strides=(1,) + strides + (1,),\n", 226 | "                                             padding=padding)\n", 227 | "                    approxConv_outputs.append(this_conv + biases)\n", 228 | "                conv_outputs = tf.convert_to_tensor(approxConv_outputs, dtype=tf.float32,\n", 229 | "                                                    name=\"conv_outputs\")\n", 230 | "                \n", 231 | "                # Summing up each of the binary convolutions\n", 232 | "                ApproxConv_output = tf.reduce_sum(tf.multiply(conv_outputs, reshaped_alphas), axis=0)\n", 233 | "                \n", 234 | "                return ApproxConv_output\n", 235 | "        \n", 236 | "        return alphas_training_op, ApproxConvLayer, alphas_loss" 237 | ] 238 | }, 239 | { 240 | "cell_type": "markdown", 241 | "metadata": {}, 242 | "source": [ 243 | "### Multiple binary activations and bitwise convolution\n", 244 | "Now, convolution can be achieved using just summation operations by means of the ApproxConv layers. But the paper suggests something even better: we can even replace the summation with bitwise operations only, if the input to the convolution layer is also binarized.\n", 245 | "For that, the authors suggest that an input can be binarized (creating multiple binary inputs) by shifting the input and binarizing each shifted copy." 246 | ] 247 | }, 248 | { 249 | "cell_type": "markdown", 250 | "metadata": {}, 251 | "source": [ 252 | "First, the input is clipped between 0. and 1. 
using multiple shift parameters $\\nu$, learnable by the network \n", 253 | "$\\operatorname{h_{\\nu}}(x)=\\operatorname{clip}(x + \\nu, 0, 1)$ \n", 254 | " \n", 255 | "Then using the following function it is binarized \n", 256 | "$\\operatorname{H_{\\nu}}(\\mathbf{R})=2\\mathbb{I}_{\\operatorname{h_{\\nu}}(\\mathbf{R})\\geq0.5}-1$\n", 257 | "\n", 258 | "The above function can be implemented as \n", 259 | "$\\operatorname{H_{\\nu}}(\\mathbf{R})=\\operatorname{sign}(\\mathbf{R} - 0.5)$\n", 260 | "\n", 261 | "Now, after calculating the **ApproxConv** over each separated input, their weighted summation can be taken using trainable paramters $\\beta s$" 262 | ] 263 | }, 264 | { 265 | "cell_type": "code", 266 | "execution_count": 7, 267 | "metadata": { 268 | "collapsed": true 269 | }, 270 | "outputs": [], 271 | "source": [ 272 | "def ABC(convolution_filters, convolution_biases=None, no_binary_filters=5, no_ApproxConvLayers=5,\n", 273 | " strides=(1, 1), padding=\"VALID\", name=None):\n", 274 | " with tf.name_scope(name, \"ABC\"):\n", 275 | " # Creating variables shift parameters and weighted sum parameters (betas)\n", 276 | " shift_parameters = tf.Variable(tf.constant(0., shape=(no_ApproxConvLayers, 1)), dtype=tf.float32,\n", 277 | " name=\"shift_parameters\")\n", 278 | " betas = tf.Variable(tf.constant(1., shape=(no_ApproxConvLayers, 1)), dtype=tf.float32,\n", 279 | " name=\"betas\")\n", 280 | " \n", 281 | " # Instantiating the ApproxConv Layer\n", 282 | " alphas_training_op, ApproxConvLayer, alphas_loss = ApproxConv(no_binary_filters,\n", 283 | " convolution_filters, convolution_biases,\n", 284 | " strides, padding)\n", 285 | " \n", 286 | " def ABCLayer(input_tensor, name=None):\n", 287 | " with tf.name_scope(name, \"ABCLayer\"):\n", 288 | " # Reshaping betas to match the input tensor\n", 289 | " reshaped_betas = tf.reshape(betas,\n", 290 | " shape=[no_ApproxConvLayers] + [1] * len(input_tensor.get_shape()),\n", 291 | " name=\"reshaped_betas\")\n", 292 | " \n", 293 | " # Calculating ApproxConv for each shifted input\n", 294 | " ApproxConv_layers = []\n", 295 | " for index in range(no_ApproxConvLayers):\n", 296 | " # Shifting and binarizing input\n", 297 | " shifted_input = tf.clip_by_value(input_tensor + shift_parameters[index], 0., 1.,\n", 298 | " name=\"shifted_input_\" + str(index))\n", 299 | " binarized_activation = tf.sign(shifted_input - 0.5)\n", 300 | " \n", 301 | " # Passing through the ApproxConv layer\n", 302 | " ApproxConv_layers.append(ApproxConvLayer(binarized_activation))\n", 303 | " ApproxConv_output = tf.convert_to_tensor(ApproxConv_layers, dtype=tf.float32,\n", 304 | " name=\"ApproxConv_output\")\n", 305 | " \n", 306 | " # Taking the weighted sum using the betas\n", 307 | " ABC_output = tf.reduce_sum(tf.multiply(ApproxConv_output, reshaped_betas), axis=0)\n", 308 | " return ABC_output\n", 309 | " \n", 310 | " return alphas_training_op, ABCLayer, alphas_loss" 311 | ] 312 | }, 313 | { 314 | "cell_type": "markdown", 315 | "metadata": {}, 316 | "source": [ 317 | "## Testing\n", 318 | "Let's just test our network using MNIST" 319 | ] 320 | }, 321 | { 322 | "cell_type": "code", 323 | "execution_count": 8, 324 | "metadata": {}, 325 | "outputs": [ 326 | { 327 | "name": "stdout", 328 | "output_type": "stream", 329 | "text": [ 330 | "Successfully downloaded train-images-idx3-ubyte.gz 9912422 bytes.\n", 331 | "Extracting /tmp/data/train-images-idx3-ubyte.gz\n", 332 | "Successfully downloaded train-labels-idx1-ubyte.gz 28881 bytes.\n", 333 | "Extracting 
/tmp/data/train-labels-idx1-ubyte.gz\n", 334 | "Successfully downloaded t10k-images-idx3-ubyte.gz 1648877 bytes.\n", 335 | "Extracting /tmp/data/t10k-images-idx3-ubyte.gz\n", 336 | "Successfully downloaded t10k-labels-idx1-ubyte.gz 4542 bytes.\n", 337 | "Extracting /tmp/data/t10k-labels-idx1-ubyte.gz\n" 338 | ] 339 | } 340 | ], 341 | "source": [ 342 | "# Importing data\n", 343 | "from tensorflow.examples.tutorials.mnist import input_data\n", 344 | "!mkdir -p /tmp/data\n", 345 | "mnist = input_data.read_data_sets(\"/tmp/data/\")" 346 | ] 347 | }, 348 | { 349 | "cell_type": "code", 350 | "execution_count": 9, 351 | "metadata": { 352 | "collapsed": true 353 | }, 354 | "outputs": [], 355 | "source": [ 356 | "# Defining utils function\n", 357 | "def weight_variable(shape, name=\"weight\"):\n", 358 | " initial = tf.truncated_normal(shape, stddev=0.1)\n", 359 | " return tf.Variable(initial, name=name)\n", 360 | "\n", 361 | "def bias_variable(shape, name=\"bias\"):\n", 362 | " initial = tf.constant(0.1, shape=shape)\n", 363 | " return tf.Variable(initial, name=name)\n", 364 | "\n", 365 | "def conv2d(x, W):\n", 366 | " return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')\n", 367 | "\n", 368 | "def max_pool_2x2(x):\n", 369 | " return tf.nn.max_pool(x, ksize=[1, 2, 2, 1],\n", 370 | " strides=[1, 2, 2, 1], padding='SAME')" 371 | ] 372 | }, 373 | { 374 | "cell_type": "code", 375 | "execution_count": 10, 376 | "metadata": { 377 | "collapsed": true 378 | }, 379 | "outputs": [], 380 | "source": [ 381 | "# Creating the graph\n", 382 | "without_ABC_graph = tf.Graph()\n", 383 | "with without_ABC_graph.as_default():\n", 384 | " # Defining inputs\n", 385 | " x = tf.placeholder(dtype=tf.float32)\n", 386 | " x_image = tf.reshape(x, [-1, 28, 28, 1])\n", 387 | " \n", 388 | " # Convolution Layer 1\n", 389 | " W_conv1 = weight_variable(shape=([5, 5, 1, 32]), name=\"W_conv1\")\n", 390 | " b_conv1 = bias_variable(shape=[32], name=\"b_conv1\")\n", 391 | " conv1 = (conv2d(x_image, W_conv1) + b_conv1)\n", 392 | " pool1 = max_pool_2x2(conv1)\n", 393 | " bn_conv1 = tf.layers.batch_normalization(pool1, axis=-1, name=\"batchNorm1\")\n", 394 | " h_conv1 = tf.nn.relu(bn_conv1)\n", 395 | "\n", 396 | " # Convolution Layer 2\n", 397 | " W_conv2 = weight_variable(shape=([5, 5, 32, 64]), name=\"W_conv2\")\n", 398 | " b_conv2 = bias_variable(shape=[64], name=\"b_conv2\")\n", 399 | " conv2 = (conv2d(h_conv1, W_conv2) + b_conv2)\n", 400 | " pool2 = max_pool_2x2(conv2)\n", 401 | " bn_conv2 = tf.layers.batch_normalization(pool2, axis=-1, name=\"batchNorm2\")\n", 402 | " h_conv2 = tf.nn.relu(bn_conv2)\n", 403 | "\n", 404 | " # Flat the conv2 output\n", 405 | " h_conv2_flat = tf.reshape(h_conv2, shape=(-1, 7*7*64))\n", 406 | "\n", 407 | " # Dense layer1\n", 408 | " W_fc1 = weight_variable([7 * 7 * 64, 1024])\n", 409 | " b_fc1 = bias_variable([1024])\n", 410 | " h_fc1 = tf.nn.relu(tf.matmul(h_conv2_flat, W_fc1) + b_fc1)\n", 411 | "\n", 412 | " # Dropout\n", 413 | " keep_prob = tf.placeholder(tf.float32)\n", 414 | " h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)\n", 415 | "\n", 416 | " # Output layer\n", 417 | " W_fc2 = weight_variable([1024, 10])\n", 418 | " b_fc2 = bias_variable([10])\n", 419 | "\n", 420 | " y_conv = tf.matmul(h_fc1_drop, W_fc2) + b_fc2\n", 421 | " \n", 422 | " # Labels\n", 423 | " y = tf.placeholder(tf.int32, [None])\n", 424 | " y_ = tf.one_hot(y, 10)\n", 425 | " \n", 426 | " # Defining optimizer and loss\n", 427 | " cross_entropy = tf.reduce_mean(\n", 428 | " tf.nn.softmax_cross_entropy_with_logits(labels=y_, 
logits=y_conv))\n", 429 | " train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)\n", 430 | " correct_prediction = tf.equal(tf.argmax(y_conv, 1), tf.argmax(y_, 1))\n", 431 | " accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))\n", 432 | " \n", 433 | " # Initializer\n", 434 | " graph_init = tf.global_variables_initializer()" 435 | ] 436 | }, 437 | { 438 | "cell_type": "markdown", 439 | "metadata": {}, 440 | "source": [ 441 | "Let's just define a dictionary to hold the numpy values of the trained parameters of the network, so that we can feed them directly to our custom network" 442 | ] 443 | }, 444 | { 445 | "cell_type": "code", 446 | "execution_count": 11, 447 | "metadata": { 448 | "collapsed": true 449 | }, 450 | "outputs": [], 451 | "source": [ 452 | "# Defining variables to save. These will be fed to our custom layer\n", 453 | "variables_to_save = {\"W_conv1\": W_conv1,\n", 454 | " \"b_conv1\": b_conv1,\n", 455 | " \"W_conv2\": W_conv2,\n", 456 | " \"b_conv2\": b_conv2,\n", 457 | " \"W_fc1\": W_fc1,\n", 458 | " \"b_fc1\": b_fc1,\n", 459 | " \"W_fc2\": W_fc2,\n", 460 | " \"b_fc2\": b_fc2}\n", 461 | "values = {}" 462 | ] 463 | }, 464 | { 465 | "cell_type": "code", 466 | "execution_count": 14, 467 | "metadata": {}, 468 | "outputs": [ 469 | { 470 | "name": "stdout", 471 | "output_type": "stream", 472 | "text": [ 473 | "Epoch: 1 Val accuracy: 88.0000% Loss: 0.432063\n", 474 | "Epoch: 2 Val accuracy: 98.0000% Loss: 0.128601\n", 475 | "Epoch: 3 Val accuracy: 96.0000% Loss: 0.197146\n", 476 | "Epoch: 4 Val accuracy: 96.0000% Loss: 0.111511\n", 477 | "Epoch: 5 Val accuracy: 92.0000% Loss: 0.232009\n" 478 | ] 479 | } 480 | ], 481 | "source": [ 482 | "n_epochs = 5\n", 483 | "batch_size = 32\n", 484 | " \n", 485 | "with tf.Session(graph=without_ABC_graph) as sess:\n", 486 | " sess.run(graph_init)\n", 487 | " for epoch in range(n_epochs):\n", 488 | " for iteration in range(1, 200 + 1):\n", 489 | " batch = mnist.train.next_batch(50)\n", 490 | " \n", 491 | " # Run operation and calculate loss\n", 492 | " _, loss_train = sess.run([train_step, cross_entropy],\n", 493 | " feed_dict={x: batch[0], y: batch[1], keep_prob: 0.5})\n", 494 | " print(\"\\rIteration: {}/{} ({:.1f}%) Loss: {:.5f}\".format(\n", 495 | " iteration, 200,\n", 496 | " iteration * 100 / 200,\n", 497 | " loss_train),\n", 498 | " end=\"\")\n", 499 | "\n", 500 | " # At the end of each epoch,\n", 501 | " # measure the validation loss and accuracy:\n", 502 | " loss_vals = []\n", 503 | " acc_vals = []\n", 504 | " for iteration in range(1, 200 + 1):\n", 505 | " X_batch, y_batch = mnist.validation.next_batch(batch_size)\n", 506 | " acc_val, loss_val = sess.run([accuracy, cross_entropy],\n", 507 | " feed_dict={x: X_batch, y: y_batch, keep_prob: 1.0})\n", 508 | " loss_vals.append(loss_val)\n", 509 | " acc_vals.append(acc_val)\n", 510 | " print(\"\\rEvaluating the model: {}/{} ({:.1f}%)\".format(iteration, 200,\n", 511 | " iteration * 100 / 200),\n", 512 | " end=\" \" * 10)\n", 513 | " loss_val = np.mean(loss_vals)\n", 514 | " acc_val = np.mean(acc_vals)\n", 515 | " print(\"\\rEpoch: {} Val accuracy: {:.4f}% Loss: {:.6f}\".format(\n", 516 | " epoch + 1, acc_val * 100, loss_val))\n", 517 | " \n", 518 | " # On completion of training, save the variables to be fed to custom model\n", 519 | " for var_name in variables_to_save:\n", 520 | " values[var_name] = sess.run(variables_to_save[var_name])" 521 | ] 522 | }, 523 | { 524 | "cell_type": "markdown", 525 | "metadata": {}, 526 | "source": [ 527 | "### Let's build our model 
now" 528 | ] 529 | }, 530 | { 531 | "cell_type": "code", 532 | "execution_count": 17, 533 | "metadata": { 534 | "collapsed": true 535 | }, 536 | "outputs": [], 537 | "source": [ 538 | "custom_graph = tf.Graph()\n", 539 | "with custom_graph.as_default():\n", 540 | " alphas_training_operations = []\n", 541 | " \n", 542 | " # Inputs\n", 543 | " x = tf.placeholder(dtype=tf.float32)\n", 544 | " x_image = tf.reshape(x, [-1, 28, 28, 1])\n", 545 | " \n", 546 | " # Convolution Layer 1\n", 547 | " W_conv1 = tf.Variable(values[\"W_conv1\"], name=\"W_conv1\")\n", 548 | " b_conv1 = tf.Variable(values[\"b_conv1\"], name=\"b_conv1\")\n", 549 | " alphas_training_op1, ABCLayer1, alphas_loss1 = ABC(W_conv1, b_conv1,\n", 550 | " no_binary_filters=5,\n", 551 | " no_ApproxConvLayers=5,\n", 552 | " padding=\"SAME\")\n", 553 | " alphas_training_operations.append(alphas_training_op1)\n", 554 | " conv1 = ABCLayer1(x_image)\n", 555 | " pool1 = max_pool_2x2(conv1)\n", 556 | " bn_conv1 = tf.layers.batch_normalization(pool1, axis=-1)\n", 557 | " h_conv1 = tf.nn.relu(bn_conv1)\n", 558 | "\n", 559 | " # Convolution Layer 2\n", 560 | " W_conv2 = tf.Variable(values[\"W_conv2\"], name=\"W_conv2\")\n", 561 | " b_conv2 = tf.Variable(values[\"b_conv2\"], name=\"b_conv2\")\n", 562 | " alphas_training_op2, ABCLayer2, alphas_loss2 = ABC(W_conv2, b_conv2,\n", 563 | " no_binary_filters=5,\n", 564 | " no_ApproxConvLayers=5,\n", 565 | " padding=\"SAME\")\n", 566 | " alphas_training_operations.append(alphas_training_op2)\n", 567 | " conv2 = ABCLayer2(h_conv1)\n", 568 | " pool2 = max_pool_2x2(conv2)\n", 569 | " bn_conv2 = tf.layers.batch_normalization(pool2, axis=-1)\n", 570 | " h_conv2 = tf.nn.relu(bn_conv2)\n", 571 | "\n", 572 | " # Flat the conv2 output\n", 573 | " h_conv2_flat = tf.reshape(h_conv2, shape=(-1, 7*7*64))\n", 574 | "\n", 575 | " # Dense layer1\n", 576 | " W_fc1 = weight_variable([7 * 7 * 64, 1024])\n", 577 | " b_fc1 = bias_variable([1024])\n", 578 | " h_fc1 = tf.nn.relu(tf.matmul(h_conv2_flat, W_fc1) + b_fc1)\n", 579 | "\n", 580 | " # Dropout\n", 581 | " keep_prob = tf.placeholder(tf.float32)\n", 582 | " h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)\n", 583 | "\n", 584 | " # Output layer\n", 585 | " W_fc2 = weight_variable([1024, 10])\n", 586 | " b_fc2 = bias_variable([10])\n", 587 | " y_conv = tf.matmul(h_fc1_drop, W_fc2) + b_fc2\n", 588 | " \n", 589 | " # Labels\n", 590 | " y = tf.placeholder(tf.int32, [None])\n", 591 | " y_ = tf.one_hot(y, 10)\n", 592 | " \n", 593 | " # Defining optimizer and loss\n", 594 | " cross_entropy = tf.reduce_mean(\n", 595 | " tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=y_conv))\n", 596 | " train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)\n", 597 | " correct_prediction = tf.equal(tf.argmax(y_conv, 1), tf.argmax(y_, 1))\n", 598 | " accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))\n", 599 | " \n", 600 | " graph_init = tf.global_variables_initializer()" 601 | ] 602 | }, 603 | { 604 | "cell_type": "code", 605 | "execution_count": 20, 606 | "metadata": { 607 | "scrolled": true 608 | }, 609 | "outputs": [ 610 | { 611 | "name": "stdout", 612 | "output_type": "stream", 613 | "text": [ 614 | "Epoch: 1 Val accuracy: 88.0000% Loss: 6.530759\n", 615 | "Epoch: 2 Val accuracy: 86.0000% Loss: 4.208882\n", 616 | "Epoch: 3 Val accuracy: 92.0000% Loss: 1.455365\n", 617 | "Epoch: 4 Val accuracy: 92.0000% Loss: 0.708834\n", 618 | "Epoch: 5 Val accuracy: 86.0000% Loss: 0.366106\n" 619 | ] 620 | } 621 | ], 622 | "source": [ 623 | "n_epochs = 5\n", 624 | 
"batch_size = 32\n", 625 | "alpha_training_epochs = 200\n", 626 | " \n", 627 | "with tf.Session(graph=custom_graph) as sess:\n", 628 | " sess.run(graph_init)\n", 629 | " for epoch in range(n_epochs):\n", 630 | " for iteration in range(1, 200 + 1):\n", 631 | " # Training alphas\n", 632 | " for alpha_training_op in alphas_training_operations:\n", 633 | " for alpha_epoch in range(alpha_training_epochs):\n", 634 | " sess.run(alpha_training_op)\n", 635 | " \n", 636 | " batch = mnist.train.next_batch(50)\n", 637 | " \n", 638 | " # Run operation and calculate loss\n", 639 | " _, loss_train = sess.run([train_step, cross_entropy],\n", 640 | " feed_dict={x: batch[0], y: batch[1], keep_prob: 0.5})\n", 641 | " print(\"\\rIteration: {}/{} ({:.1f}%) Loss: {:.5f}\".format(\n", 642 | " iteration, 200,\n", 643 | " iteration * 100 / 200,\n", 644 | " loss_train),\n", 645 | " end=\"\")\n", 646 | "\n", 647 | " # At the end of each epoch,\n", 648 | " # measure the validation loss and accuracy:\n", 649 | " \n", 650 | " # Training alphas\n", 651 | " for alpha_training_op in alphas_training_operations:\n", 652 | " for alpha_epoch in range(alpha_training_epochs):\n", 653 | " sess.run(alpha_training_op)\n", 654 | " \n", 655 | " loss_vals = []\n", 656 | " acc_vals = []\n", 657 | " for iteration in range(1, 200 + 1): \n", 658 | " X_batch, y_batch = mnist.validation.next_batch(batch_size)\n", 659 | " acc_val, loss_val = sess.run([accuracy, cross_entropy],\n", 660 | " feed_dict={x: batch[0], y: batch[1], keep_prob: 1.0})\n", 661 | " loss_vals.append(loss_val)\n", 662 | " acc_vals.append(acc_val)\n", 663 | " print(\"\\rEvaluating the model: {}/{} ({:.1f}%)\".format(iteration, 200,\n", 664 | " iteration * 100 / 200),\n", 665 | " end=\" \" * 10)\n", 666 | " loss_val = np.mean(loss_vals)\n", 667 | " acc_val = np.mean(acc_vals)\n", 668 | " print(\"\\rEpoch: {} Val accuracy: {:.4f}% Loss: {:.6f}\".format(\n", 669 | " epoch + 1, acc_val * 100, loss_val))" 670 | ] 671 | }, 672 | { 673 | "cell_type": "code", 674 | "execution_count": null, 675 | "metadata": { 676 | "collapsed": true 677 | }, 678 | "outputs": [], 679 | "source": [] 680 | } 681 | ], 682 | "metadata": { 683 | "kernelspec": { 684 | "display_name": "tensorflow", 685 | "language": "python", 686 | "name": "tensorflow" 687 | }, 688 | "language_info": { 689 | "codemirror_mode": { 690 | "name": "ipython", 691 | "version": 2 692 | }, 693 | "file_extension": ".py", 694 | "mimetype": "text/x-python", 695 | "name": "python", 696 | "nbconvert_exporter": "python", 697 | "pygments_lexer": "ipython2", 698 | "version": "2.7.15" 699 | } 700 | }, 701 | "nbformat": 4, 702 | "nbformat_minor": 2 703 | } 704 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2018 layog 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 
14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 22 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Accurate-Binary-Convolution-Network 2 | Binary Convolution Network for faster real-time processing in ASICs 3 | 4 | --- 5 | 6 | Tensorflow implementation of [Towards Accurate Binary Convolutional Neural Network](https://arxiv.org/abs/1711.11294) by Xiaofan Lin, Cong Zhao, and Wei Pan. 7 | Why this network? Let's quote the authors: 8 | > It has been known that using binary weights and activations drastically reduce memory size and accesses, and can replace arithmetic operations with more efficient bitwise operations, leading to much faster test-time inference and lower power consumption. 9 | > The implementation of the resulting binary CNN, denoted as ABC-Net, is shown to achieve much closer performance to its full-precision counterpart, and even reach the comparable prediction accuracy on ImageNet and forest trail datasets, given adequate binary weight bases and activations. 10 | 11 | ### Dependencies 12 | ```sh 13 | pip install -r requirements.txt 14 | ``` 15 | By default `tensorflow-gpu` will be installed. Make sure to have `CUDA` properly set up. 16 | 17 | ### Notebooks 18 | * **ABC** - Contains the original implementation of the ABC network 19 | * **ABC-layer-inference-support** - Slightly modified functions for better inference time support (tl;dr moved the alpha training operation out of the layer) 20 | 21 | ### Testing 22 | * MNIST - Accuracy on validation set reached up to 94%. (Check the notebook for information) 23 | * ImageNet - To be added 24 | 25 | > NOTE: shift_parameters and beta values are currently not trainable. This is because the gradients for `tf.sign` and `tf.clip_by_value` were not implemented in `tensorflow v1.4`. Even in the current version (`tensorflow v1.8`) the gradient for `tf.sign` is not implemented. Implementation of a custom Straight Through Estimator (STE) is required. 26 | 27 | ### TODO 28 | - [ ] Test on ImageNet (2012) 29 | - [ ] Add visualization of the complete `ABC` layer 30 | - [ ] Port to `tensorflow v1.8.0` 31 | - [ ] Implement custom STE for `tf.sign` -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | tensorflow-gpu==1.4.1 2 | ipykernel==4.7.0 3 | numpy==1.14.0 4 | 5 | --------------------------------------------------------------------------------
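A possible starting point for the custom Straight Through Estimator mentioned in the README's NOTE and TODO list — a minimal sketch, assuming TensorFlow 1.x graph mode; the helper name `ste_sign` and the registered gradient name `"STESign"` are illustrative and not part of the repository:

```python
import tensorflow as tf

# Straight Through Estimator (STE) for tf.sign: the forward pass keeps the
# hard sign, while the backward pass lets the incoming gradient through
# unchanged wherever |x| <= 1 (the BinaryNet-style clipped identity).
@tf.RegisterGradient("STESign")
def _ste_sign_grad(op, grad):
    x = op.inputs[0]
    return grad * tf.cast(tf.abs(x) <= 1., tf.float32)

def ste_sign(x, name=None):
    # Swap the (missing) gradient of the Sign op for the STE registered above.
    graph = tf.get_default_graph()
    with graph.gradient_override_map({"Sign": "STESign"}):
        return tf.sign(x, name=name)
```

With such a helper, a call like `tf.sign(shifted_input - 0.5)` in the notebooks could be replaced by `ste_sign(shifted_input - 0.5)`, which would allow gradients to flow back into the shift parameters and betas.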