├── .gitignore ├── ABC-layer-inference-support.ipynb ├── ABC.ipynb ├── LICENSE ├── README.md └── requirements.txt /.gitignore: -------------------------------------------------------------------------------- 1 | # Byte-compiled / optimized / DLL files 2 | __pycache__/ 3 | *.py[cod] 4 | *$py.class 5 | 6 | # C extensions 7 | *.so 8 | 9 | # Distribution / packaging 10 | .Python 11 | env/ 12 | build/ 13 | develop-eggs/ 14 | dist/ 15 | downloads/ 16 | eggs/ 17 | .eggs/ 18 | lib/ 19 | lib64/ 20 | parts/ 21 | sdist/ 22 | var/ 23 | wheels/ 24 | *.egg-info/ 25 | .installed.cfg 26 | *.egg 27 | 28 | # PyInstaller 29 | # Usually these files are written by a python script from a template 30 | # before PyInstaller builds the exe, so as to inject date/other infos into it. 31 | *.manifest 32 | *.spec 33 | 34 | # Installer logs 35 | pip-log.txt 36 | pip-delete-this-directory.txt 37 | 38 | # Unit test / coverage reports 39 | htmlcov/ 40 | .tox/ 41 | .coverage 42 | .coverage.* 43 | .cache 44 | nosetests.xml 45 | coverage.xml 46 | *.cover 47 | .hypothesis/ 48 | 49 | # Translations 50 | *.mo 51 | *.pot 52 | 53 | # Django stuff: 54 | *.log 55 | local_settings.py 56 | 57 | # Flask stuff: 58 | instance/ 59 | .webassets-cache 60 | 61 | # Scrapy stuff: 62 | .scrapy 63 | 64 | # Sphinx documentation 65 | docs/_build/ 66 | 67 | # PyBuilder 68 | target/ 69 | 70 | # Jupyter Notebook 71 | .ipynb_checkpoints 72 | 73 | # pyenv 74 | .python-version 75 | 76 | # celery beat schedule file 77 | celerybeat-schedule 78 | 79 | # SageMath parsed files 80 | *.sage.py 81 | 82 | # dotenv 83 | .env 84 | 85 | # virtualenv 86 | .venv 87 | venv/ 88 | ENV/ 89 | 90 | # Spyder project settings 91 | .spyderproject 92 | .spyproject 93 | 94 | # Rope project settings 95 | .ropeproject 96 | 97 | # mkdocs documentation 98 | /site 99 | 100 | # mypy 101 | .mypy_cache/ 102 | -------------------------------------------------------------------------------- /ABC-layer-inference-support.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Implementation of Accurate Binary Convolution Layer\n", 8 | "The main notebook is **ABC.ipynb**. In this notebook, *alphas* training is moved out of the layer, so that the variables and functions can be made reusable for inference time." 9 | ] 10 | }, 11 | { 12 | "cell_type": "code", 13 | "execution_count": 1, 14 | "metadata": { 15 | "collapsed": true 16 | }, 17 | "outputs": [], 18 | "source": [ 19 | "from __future__ import division, print_function\n", 20 | "import tensorflow as tf\n", 21 | "import numpy as np" 22 | ] 23 | }, 24 | { 25 | "cell_type": "markdown", 26 | "metadata": {}, 27 | "source": [ 28 | "#### See *ABC* notebook for explanation of all the functions" 29 | ] 30 | }, 31 | { 32 | "cell_type": "code", 33 | "execution_count": 2, 34 | "metadata": { 35 | "collapsed": true 36 | }, 37 | "outputs": [], 38 | "source": [ 39 | "def get_mean_stddev(input_tensor):\n", 40 | " with tf.name_scope('mean_stddev_cal'):\n", 41 | " mean, variance = tf.nn.moments(input_tensor, axes=range(len(input_tensor.get_shape())))\n", 42 | " stddev = tf.sqrt(variance, name=\"standard_deviation\")\n", 43 | " return mean, stddev\n", 44 | " \n", 45 | "# TODO: Allow shift parameters to be learnable\n", 46 | "def get_shifted_stddev(stddev, no_filters):\n", 47 | " with tf.name_scope('shifted_stddev'):\n", 48 | " spreaded_deviation = -1. 
+ (2./(no_filters - 1)) * tf.convert_to_tensor(range(no_filters),\n", 49 | " dtype=tf.float32)\n", 50 | " return spreaded_deviation * stddev\n", 51 | " \n", 52 | "def get_binary_filters(convolution_filters, no_filters, name=None):\n", 53 | " with tf.name_scope(name, default_name=\"get_binary_filters\"):\n", 54 | " mean, stddev = get_mean_stddev(convolution_filters)\n", 55 | " shifted_stddev = get_shifted_stddev(stddev, no_filters)\n", 56 | " \n", 57 | " # Normalize the filters by subtracting mean from them\n", 58 | " mean_adjusted_filters = convolution_filters - mean\n", 59 | " \n", 60 | " # Tiling filters to match the number of filters\n", 61 | " expanded_filters = tf.expand_dims(mean_adjusted_filters, axis=0, name=\"expanded_filters\")\n", 62 | " tiled_filters = tf.tile(expanded_filters, [no_filters] + [1] * len(convolution_filters.get_shape()),\n", 63 | " name=\"tiled_filters\")\n", 64 | " \n", 65 | " # Similarly tiling spreaded stddev to match the shape of tiled_filters\n", 66 | " expanded_stddev = tf.reshape(shifted_stddev, [no_filters] + [1] * len(convolution_filters.get_shape()),\n", 67 | " name=\"expanded_stddev\")\n", 68 | " \n", 69 | " binarized_filters = tf.sign(tiled_filters + expanded_stddev, name=\"binarized_filters\")\n", 70 | " return binarized_filters" 71 | ] 72 | }, 73 | { 74 | "cell_type": "markdown", 75 | "metadata": {}, 76 | "source": [ 77 | "Now, instead of get_alphas, implementation of **alpha training** is provided, which takes input of the *filters*, *binarized filters*, and *alphas* and returns the loss and the alpha training operation" 78 | ] 79 | }, 80 | { 81 | "cell_type": "code", 82 | "execution_count": 3, 83 | "metadata": { 84 | "collapsed": true 85 | }, 86 | "outputs": [], 87 | "source": [ 88 | "def alpha_training(convolution_filters, binary_filters, alphas, no_filters):\n", 89 | " with tf.name_scope(\"alpha_training\"):\n", 90 | " reshaped_convolution_filters = tf.reshape(convolution_filters, [-1], name=\"reshaped_convolution_filters\")\n", 91 | " reshaped_binary_filters = tf.reshape(binary_filters, [no_filters, -1],\n", 92 | " name=\"reshaped_binary_filters\")\n", 93 | " \n", 94 | " weighted_sum_filters = tf.reduce_sum(tf.multiply(alphas, reshaped_binary_filters),\n", 95 | " axis=0, name=\"weighted_sum_filters\")\n", 96 | " \n", 97 | " # Defining loss\n", 98 | " error = tf.square(reshaped_convolution_filters - weighted_sum_filters, name=\"alphas_error\")\n", 99 | " loss = tf.reduce_mean(error, axis=0, name=\"alphas_loss\")\n", 100 | " \n", 101 | " # Defining optimizer\n", 102 | " training_op = tf.train.AdamOptimizer().minimize(loss, var_list=[alphas],\n", 103 | " name=\"alphas_training_op\")\n", 104 | " \n", 105 | " return training_op, loss" 106 | ] 107 | }, 108 | { 109 | "cell_type": "markdown", 110 | "metadata": {}, 111 | "source": [ 112 | "Now, both *ABC* and *ApproxConv* is updated to incorporate this change" 113 | ] 114 | }, 115 | { 116 | "cell_type": "code", 117 | "execution_count": 4, 118 | "metadata": { 119 | "collapsed": true 120 | }, 121 | "outputs": [], 122 | "source": [ 123 | "def ApproxConv(no_filters, alphas, binary_filters, convolution_biases=None,\n", 124 | " strides=(1, 1), padding=\"VALID\", name=None):\n", 125 | " with tf.name_scope(name, \"ApproxConv\"):\n", 126 | " if convolution_biases is None:\n", 127 | " biases = 0.\n", 128 | " else:\n", 129 | " biases = convolution_biases\n", 130 | " \n", 131 | " # Defining function for closure to accept multiple inputs with same filters\n", 132 | " def ApproxConvLayer(input_tensor, name=None):\n", 
133 | " with tf.name_scope(name, \"ApproxConv_Layer\"):\n", 134 | " # Reshaping alphas to match the input tensor\n", 135 | " reshaped_alphas = tf.reshape(alphas,\n", 136 | " shape=[no_filters] + [1] * len(input_tensor.get_shape()),\n", 137 | " name=\"reshaped_alphas\")\n", 138 | " \n", 139 | " # Calculating convolution for each binary filter\n", 140 | " approxConv_outputs = []\n", 141 | " for index in range(no_filters):\n", 142 | " # Binary convolution\n", 143 | " this_conv = tf.nn.conv2d(input_tensor, binary_filters[index],\n", 144 | " strides=(1,) + strides + (1,),\n", 145 | " padding=padding)\n", 146 | " approxConv_outputs.append(this_conv + biases)\n", 147 | " conv_outputs = tf.convert_to_tensor(approxConv_outputs, dtype=tf.float32,\n", 148 | " name=\"conv_outputs\")\n", 149 | " \n", 150 | " # Summing up each of the binary convolution\n", 151 | " ApproxConv_output = tf.reduce_sum(tf.multiply(conv_outputs, reshaped_alphas), axis=0)\n", 152 | " \n", 153 | " return ApproxConv_output\n", 154 | " \n", 155 | " return ApproxConvLayer\n", 156 | " \n", 157 | "def ABC(binary_filters, alphas, shift_parameters, betas, \n", 158 | " convolution_biases=None, no_binary_filters=5, no_ApproxConvLayers=5,\n", 159 | " strides=(1, 1), padding=\"VALID\", name=None):\n", 160 | " with tf.name_scope(name, \"ABC\"): \n", 161 | " # Instantiating the ApproxConv Layer\n", 162 | " ApproxConvLayer= ApproxConv(no_binary_filters, alphas, binary_filters, convolution_biases,\n", 163 | " strides, padding)\n", 164 | " \n", 165 | " def ABCLayer(input_tensor, name=None):\n", 166 | " with tf.name_scope(name, \"ABCLayer\"):\n", 167 | " # Reshaping betas to match the input tensor\n", 168 | " reshaped_betas = tf.reshape(betas,\n", 169 | " shape=[no_ApproxConvLayers] + [1] * len(input_tensor.get_shape()),\n", 170 | " name=\"reshaped_betas\")\n", 171 | " \n", 172 | " # Calculating ApproxConv for each shifted input\n", 173 | " ApproxConv_layers = []\n", 174 | " for index in range(no_ApproxConvLayers):\n", 175 | " # Shifting and binarizing input\n", 176 | " shifted_input = tf.clip_by_value(input_tensor + shift_parameters[index], 0., 1.,\n", 177 | " name=\"shifted_input_\" + str(index))\n", 178 | " binarized_activation = tf.sign(shifted_input - 0.5)\n", 179 | " \n", 180 | " # Passing through the ApproxConv layer\n", 181 | " ApproxConv_layers.append(ApproxConvLayer(binarized_activation))\n", 182 | " ApproxConv_output = tf.convert_to_tensor(ApproxConv_layers, dtype=tf.float32,\n", 183 | " name=\"ApproxConv_output\")\n", 184 | " \n", 185 | " # Taking the weighted sum using the betas\n", 186 | " ABC_output = tf.reduce_sum(tf.multiply(ApproxConv_output, reshaped_betas), axis=0)\n", 187 | " return ABC_output\n", 188 | " \n", 189 | " return ABCLayer" 190 | ] 191 | }, 192 | { 193 | "cell_type": "markdown", 194 | "metadata": {}, 195 | "source": [ 196 | "#### Now a layer can be created as follows" 197 | ] 198 | }, 199 | { 200 | "cell_type": "code", 201 | "execution_count": 10, 202 | "metadata": { 203 | "collapsed": true 204 | }, 205 | "outputs": [], 206 | "source": [ 207 | "test_filters = np.random.normal(size=(3, 3, 1, 64))\n", 208 | "test_biases = np.random.normal(size=(64,))\n", 209 | "test_input = np.random.normal(size=(32, 28, 28, 1))" 210 | ] 211 | }, 212 | { 213 | "cell_type": "code", 214 | "execution_count": 11, 215 | "metadata": { 216 | "collapsed": true 217 | }, 218 | "outputs": [], 219 | "source": [ 220 | "g = tf.Graph()" 221 | ] 222 | }, 223 | { 224 | "cell_type": "code", 225 | "execution_count": 12, 226 | "metadata": { 227 | 
"collapsed": true 228 | }, 229 | "outputs": [], 230 | "source": [ 231 | "with g.as_default():\n", 232 | " filters = tf.Variable(tf.convert_to_tensor(test_filters, dtype=tf.float32), name=\"convolution_filters\")\n", 233 | " biases = tf.Variable(tf.convert_to_tensor(test_biases, dtype=tf.float32), name=\"convolution_biases\")\n", 234 | " alphas = tf.Variable(tf.constant(1., shape=(5, 1)), dtype=tf.float32,\n", 235 | " name=\"alphas\")\n", 236 | " shift_parameters = tf.Variable(tf.constant(0., shape=(5, 1)), dtype=tf.float32,\n", 237 | " name=\"shift_parameters\")\n", 238 | " betas = tf.Variable(tf.constant(1., shape=(5, 1)), dtype=tf.float32,\n", 239 | " name=\"betas\")\n", 240 | " \n", 241 | " binary_filters = get_binary_filters(filters, 5)\n", 242 | " alphas_training_op, alphas_loss = alpha_training(tf.stop_gradient(filters),\n", 243 | " tf.stop_gradient(binary_filters),\n", 244 | " alphas, 5)\n", 245 | " ABC_layer = ABC(binary_filters, tf.stop_gradient(alphas), shift_parameters, betas, biases)\n", 246 | " \n", 247 | " output = ABC_layer(tf.convert_to_tensor(test_input, dtype=tf.float32))" 248 | ] 249 | }, 250 | { 251 | "cell_type": "markdown", 252 | "metadata": {}, 253 | "source": [ 254 | "### Testing\n", 255 | "Let's test the updated architecture on MNIST again" 256 | ] 257 | }, 258 | { 259 | "cell_type": "code", 260 | "execution_count": 5, 261 | "metadata": {}, 262 | "outputs": [ 263 | { 264 | "name": "stdout", 265 | "output_type": "stream", 266 | "text": [ 267 | "Successfully downloaded train-images-idx3-ubyte.gz 9912422 bytes.\n", 268 | "Extracting /tmp/data/train-images-idx3-ubyte.gz\n", 269 | "Successfully downloaded train-labels-idx1-ubyte.gz 28881 bytes.\n", 270 | "Extracting /tmp/data/train-labels-idx1-ubyte.gz\n", 271 | "Successfully downloaded t10k-images-idx3-ubyte.gz 1648877 bytes.\n", 272 | "Extracting /tmp/data/t10k-images-idx3-ubyte.gz\n", 273 | "Successfully downloaded t10k-labels-idx1-ubyte.gz 4542 bytes.\n", 274 | "Extracting /tmp/data/t10k-labels-idx1-ubyte.gz\n" 275 | ] 276 | } 277 | ], 278 | "source": [ 279 | "# MNIST data import\n", 280 | "# Importing data\n", 281 | "from tensorflow.examples.tutorials.mnist import input_data\n", 282 | "!mkdir -p /tmp/data\n", 283 | "mnist = input_data.read_data_sets(\"/tmp/data/\")" 284 | ] 285 | }, 286 | { 287 | "cell_type": "markdown", 288 | "metadata": {}, 289 | "source": [ 290 | "The following is exactly same as in the other notebook *ABC*" 291 | ] 292 | }, 293 | { 294 | "cell_type": "code", 295 | "execution_count": 6, 296 | "metadata": { 297 | "collapsed": true 298 | }, 299 | "outputs": [], 300 | "source": [ 301 | "# Defining utils function\n", 302 | "def weight_variable(shape, name=\"weight\"):\n", 303 | " initial = tf.truncated_normal(shape, stddev=0.1)\n", 304 | " return tf.Variable(initial, name=name)\n", 305 | "\n", 306 | "def bias_variable(shape, name=\"bias\"):\n", 307 | " initial = tf.constant(0.1, shape=shape)\n", 308 | " return tf.Variable(initial, name=name)\n", 309 | "\n", 310 | "def conv2d(x, W):\n", 311 | " return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')\n", 312 | "\n", 313 | "def max_pool_2x2(x):\n", 314 | " return tf.nn.max_pool(x, ksize=[1, 2, 2, 1],\n", 315 | " strides=[1, 2, 2, 1], padding='SAME')" 316 | ] 317 | }, 318 | { 319 | "cell_type": "code", 320 | "execution_count": 7, 321 | "metadata": { 322 | "collapsed": true 323 | }, 324 | "outputs": [], 325 | "source": [ 326 | "# Creating the graph\n", 327 | "without_ABC_graph = tf.Graph()\n", 328 | "with without_ABC_graph.as_default():\n", 329 | 
" # Defining inputs\n", 330 | " x = tf.placeholder(dtype=tf.float32)\n", 331 | " x_image = tf.reshape(x, [-1, 28, 28, 1])\n", 332 | " \n", 333 | " # Convolution Layer 1\n", 334 | " W_conv1 = weight_variable(shape=([5, 5, 1, 32]), name=\"W_conv1\")\n", 335 | " b_conv1 = bias_variable(shape=[32], name=\"b_conv1\")\n", 336 | " conv1 = (conv2d(x_image, W_conv1) + b_conv1)\n", 337 | " pool1 = max_pool_2x2(conv1)\n", 338 | " bn_conv1 = tf.layers.batch_normalization(pool1, axis=-1, name=\"batchNorm1\")\n", 339 | " h_conv1 = tf.nn.relu(bn_conv1)\n", 340 | "\n", 341 | " # Convolution Layer 2\n", 342 | " W_conv2 = weight_variable(shape=([5, 5, 32, 64]), name=\"W_conv2\")\n", 343 | " b_conv2 = bias_variable(shape=[64], name=\"b_conv2\")\n", 344 | " conv2 = (conv2d(h_conv1, W_conv2) + b_conv2)\n", 345 | " pool2 = max_pool_2x2(conv2)\n", 346 | " bn_conv2 = tf.layers.batch_normalization(pool2, axis=-1, name=\"batchNorm2\")\n", 347 | " h_conv2 = tf.nn.relu(bn_conv2)\n", 348 | "\n", 349 | " # Flat the conv2 output\n", 350 | " h_conv2_flat = tf.reshape(h_conv2, shape=(-1, 7*7*64))\n", 351 | "\n", 352 | " # Dense layer1\n", 353 | " W_fc1 = weight_variable([7 * 7 * 64, 1024])\n", 354 | " b_fc1 = bias_variable([1024])\n", 355 | " h_fc1 = tf.nn.relu(tf.matmul(h_conv2_flat, W_fc1) + b_fc1)\n", 356 | "\n", 357 | " # Dropout\n", 358 | " keep_prob = tf.placeholder(tf.float32)\n", 359 | " h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)\n", 360 | "\n", 361 | " # Output layer\n", 362 | " W_fc2 = weight_variable([1024, 10])\n", 363 | " b_fc2 = bias_variable([10])\n", 364 | "\n", 365 | " y_conv = tf.matmul(h_fc1_drop, W_fc2) + b_fc2\n", 366 | " \n", 367 | " # Labels\n", 368 | " y = tf.placeholder(tf.int32, [None])\n", 369 | " y_ = tf.one_hot(y, 10)\n", 370 | " \n", 371 | " # Defining optimizer and loss\n", 372 | " cross_entropy = tf.reduce_mean(\n", 373 | " tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=y_conv))\n", 374 | " train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)\n", 375 | " correct_prediction = tf.equal(tf.argmax(y_conv, 1), tf.argmax(y_, 1))\n", 376 | " accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))\n", 377 | " \n", 378 | " # Initializer\n", 379 | " graph_init = tf.global_variables_initializer()" 380 | ] 381 | }, 382 | { 383 | "cell_type": "code", 384 | "execution_count": 8, 385 | "metadata": { 386 | "collapsed": true 387 | }, 388 | "outputs": [], 389 | "source": [ 390 | "# Defining variables to save. 
These will be fed to our custom layer\n", 391 | "variables_to_save = {\"W_conv1\": W_conv1,\n", 392 | "                     \"b_conv1\": b_conv1,\n", 393 | "                     \"W_conv2\": W_conv2,\n", 394 | "                     \"b_conv2\": b_conv2,\n", 395 | "                     \"W_fc1\": W_fc1,\n", 396 | "                     \"b_fc1\": b_fc1,\n", 397 | "                     \"W_fc2\": W_fc2,\n", 398 | "                     \"b_fc2\": b_fc2}\n", 399 | "values = {}" 400 | ] 401 | }, 402 | { 403 | "cell_type": "code", 404 | "execution_count": 9, 405 | "metadata": {}, 406 | "outputs": [ 407 | { 408 | "name": "stdout", 409 | "output_type": "stream", 410 | "text": [ 411 | "Epoch: 1 Val accuracy: 80.0000% Loss: 0.575571\n", 412 | "Epoch: 2 Val accuracy: 88.0000% Loss: 0.516295\n", 413 | "Epoch: 3 Val accuracy: 98.0000% Loss: 0.074902\n", 414 | "Epoch: 4 Val accuracy: 96.0000% Loss: 0.114960\n", 415 | "Epoch: 5 Val accuracy: 96.0000% Loss: 0.108748 \n" 416 | ] 417 | } 418 | ], 419 | "source": [ 420 | "n_epochs = 5\n", 421 | "batch_size = 32\n", 422 | " \n", 423 | "with tf.Session(graph=without_ABC_graph) as sess:\n", 424 | "    sess.run(graph_init)\n", 425 | "    for epoch in range(n_epochs):\n", 426 | "        for iteration in range(1, 200 + 1):\n", 427 | "            batch = mnist.train.next_batch(50)\n", 428 | "            \n", 429 | "            # Run operation and calculate loss\n", 430 | "            _, loss_train = sess.run([train_step, cross_entropy],\n", 431 | "                                     feed_dict={x: batch[0], y: batch[1], keep_prob: 0.5})\n", 432 | "            print(\"\\rIteration: {}/{} ({:.1f}%) Loss: {:.5f}\".format(\n", 433 | "                      iteration, 200,\n", 434 | "                      iteration * 100 / 200,\n", 435 | "                      loss_train),\n", 436 | "                  end=\"\")\n", 437 | "\n", 438 | "        # At the end of each epoch,\n", 439 | "        # measure the validation loss and accuracy:\n", 440 | "        loss_vals = []\n", 441 | "        acc_vals = []\n", 442 | "        for iteration in range(1, 200 + 1):\n", 443 | "            X_batch, y_batch = mnist.validation.next_batch(batch_size)\n", 444 | "            acc_val, loss_val = sess.run([accuracy, cross_entropy],\n", 445 | "                                         feed_dict={x: batch[0], y: batch[1], keep_prob: 1.0})\n", 446 | "            loss_vals.append(loss_val)\n", 447 | "            acc_vals.append(acc_val)\n", 448 | "            print(\"\\rEvaluating the model: {}/{} ({:.1f}%)\".format(iteration, 200,\n", 449 | "                                                                   iteration * 100 / 200),\n", 450 | "                  end=\" \" * 10)\n", 451 | "        loss_val = np.mean(loss_vals)\n", 452 | "        acc_val = np.mean(acc_vals)\n", 453 | "        print(\"\\rEpoch: {} Val accuracy: {:.4f}% Loss: {:.6f}\".format(\n", 454 | "            epoch + 1, acc_val * 100, loss_val))\n", 455 | "    \n", 456 | "    # On completion of training, save the variables to be fed to custom model\n", 457 | "    for var_name in variables_to_save:\n", 458 | "        values[var_name] = sess.run(variables_to_save[var_name])" 459 | ] 460 | }, 461 | { 462 | "cell_type": "markdown", 463 | "metadata": {}, 464 | "source": [ 465 | "The 100% accuracy is not an error. It appears because the complete validation set is not being evaluated; only a part of it is, and the model classified that part entirely correctly." 466 | ] 467 | }, 468 | { 469 | "cell_type": "markdown", 470 | "metadata": {}, 471 | "source": [ 472 | "#### Creating the custom model\n", 473 | "While creating the custom model, we will need to create all the variables ourselves." 474 | ] 475 | }, 476 | { 477 | "cell_type": "markdown", 478 | "metadata": {}, 479 | "source": [ 480 | "First, let's create a function that returns the required mean and variance for the batchnorm layer. 
Batchnorm layer requires that mean and variance be calculated of every layer except that of the channels layer" 481 | ] 482 | }, 483 | { 484 | "cell_type": "code", 485 | "execution_count": 10, 486 | "metadata": { 487 | "collapsed": true 488 | }, 489 | "outputs": [], 490 | "source": [ 491 | "def bn_mean_variance(input_tensor, axis=-1, keep_dims=True):\n", 492 | " shape = len(input_tensor.get_shape())\n", 493 | " if axis < 0:\n", 494 | " axis += shape\n", 495 | " dimension_range = range(shape)\n", 496 | " return tf.nn.moments(input_tensor, axes=dimension_range[:axis] + dimension_range[axis+1:],\n", 497 | " keep_dims=keep_dims)" 498 | ] 499 | }, 500 | { 501 | "cell_type": "code", 502 | "execution_count": 13, 503 | "metadata": { 504 | "collapsed": true 505 | }, 506 | "outputs": [], 507 | "source": [ 508 | "custom_graph = tf.Graph()\n", 509 | "with custom_graph.as_default():\n", 510 | " alphas_training_operations = []\n", 511 | " alphas_variables = []\n", 512 | " \n", 513 | " # Setting configuration\n", 514 | " no_filters_conv1 = 5\n", 515 | " no_layers_conv1 = 5\n", 516 | " no_filters_conv2 = 5\n", 517 | " no_layers_conv2 = 5\n", 518 | " \n", 519 | " # Inputs\n", 520 | " x = tf.placeholder(dtype=tf.float32)\n", 521 | " x_image = tf.reshape(x, [-1, 28, 28, 1])\n", 522 | " \n", 523 | " # Convolution Layer 1\n", 524 | " W_conv1 = tf.Variable(values[\"W_conv1\"], name=\"W_conv1\")\n", 525 | " b_conv1 = tf.Variable(values[\"b_conv1\"], name=\"b_conv1\")\n", 526 | " # Creating new variables\n", 527 | " alphas_conv1 = tf.Variable(tf.random_normal(shape=(no_filters_conv1, 1), mean=1.0, stddev=0.1),\n", 528 | " dtype=tf.float32, name=\"alphas_conv1\")\n", 529 | " shift_parameters_conv1 = tf.Variable(tf.constant(0., shape=(no_layers_conv1, 1)),\n", 530 | " dtype=tf.float32, name=\"shift_parameters_conv1\")\n", 531 | " betas_conv1 = tf.Variable(tf.constant(1., shape=(no_layers_conv1, 1)),\n", 532 | " dtype=tf.float32, name=\"betas_conv1\")\n", 533 | " # Performing the operations\n", 534 | " binary_filters_conv1 = get_binary_filters(W_conv1, no_filters_conv1)\n", 535 | " alpha_training_conv1, alpha_loss_conv1 = alpha_training(tf.stop_gradient(W_conv1, \"no_gradient_W_conv1\"),\n", 536 | " tf.stop_gradient(binary_filters_conv1,\n", 537 | " \"no_gradient_binary_filters_conv1\"),\n", 538 | " alphas_conv1, no_filters_conv1)\n", 539 | " conv1 = ABC(binary_filters_conv1, tf.stop_gradient(alphas_conv1), shift_parameters_conv1,\n", 540 | " betas_conv1, b_conv1, padding=\"SAME\")(x_image)\n", 541 | " # Saving the alphas training operation and the variable\n", 542 | " alphas_training_operations.append(alpha_training_conv1)\n", 543 | " alphas_variables.append(alphas_conv1)\n", 544 | " \n", 545 | " # Other layers\n", 546 | " pool1 = max_pool_2x2(conv1)\n", 547 | " # BatchNorm \n", 548 | " mean_conv1, variance_conv1 = bn_mean_variance(pool1)\n", 549 | " bn_gamma_conv1 = tf.Variable(tf.ones(shape=(32,), dtype=tf.float32), name=\"bn_gamma_conv1\")\n", 550 | " bn_beta_conv1 = tf.Variable(tf.zeros(shape=(32,), dtype=tf.float32), name=\"bn_beta_conv1\")\n", 551 | " bn_conv1 = tf.nn.batch_normalization(pool1, mean_conv1, variance_conv1,\n", 552 | " bn_beta_conv1, bn_gamma_conv1, 0.001)\n", 553 | " h_conv1 = tf.nn.relu(bn_conv1)\n", 554 | "\n", 555 | " # Convolution Layer 2\n", 556 | " W_conv2 = tf.Variable(values[\"W_conv2\"], name=\"W_conv2\")\n", 557 | " b_conv2 = tf.Variable(values[\"b_conv2\"], name=\"b_conv2\")\n", 558 | " \n", 559 | " # Creating new variables\n", 560 | " alphas_conv2 = 
tf.Variable(tf.random_normal(shape=(no_filters_conv2, 1), mean=1.0, stddev=0.1),\n", 561 | " dtype=tf.float32, name=\"alphas_conv2\")\n", 562 | " shift_parameters_conv2 = tf.Variable(tf.constant(0., shape=(no_layers_conv2, 1)),\n", 563 | " dtype=tf.float32, name=\"shift_parameters_conv2\")\n", 564 | " betas_conv2 = tf.Variable(tf.constant(1., shape=(no_layers_conv2, 1)),\n", 565 | " dtype=tf.float32, name=\"betas_conv2\")\n", 566 | " \n", 567 | " # Performing the operations\n", 568 | " binary_filters_conv2 = get_binary_filters(W_conv2, no_filters_conv2)\n", 569 | " alpha_training_conv2, alpha_loss_conv2 = alpha_training(tf.stop_gradient(W_conv2, \"no_gradient_W_conv2\"),\n", 570 | " tf.stop_gradient(binary_filters_conv2,\n", 571 | " \"no_gradient_binary_filters_conv2\"),\n", 572 | " alphas_conv2, no_filters_conv2)\n", 573 | " conv2 = ABC(binary_filters_conv2, tf.stop_gradient(alphas_conv2), shift_parameters_conv2,\n", 574 | " betas_conv2, b_conv2, padding=\"SAME\")(h_conv1)\n", 575 | " \n", 576 | " # Saving the alphas training operation and the variable\n", 577 | " alphas_training_operations.append(alpha_training_conv2)\n", 578 | " alphas_variables.append(alphas_conv2)\n", 579 | " \n", 580 | " # Other layers\n", 581 | " pool2 = max_pool_2x2(conv2)\n", 582 | " # BatchNorm\n", 583 | " mean_conv2, variance_conv2 = bn_mean_variance(pool2)\n", 584 | " bn_gamma_conv2 = tf.Variable(tf.ones(shape=(64,), dtype=tf.float32), name=\"bn_gamma_conv2\")\n", 585 | " bn_beta_conv2 = tf.Variable(tf.zeros(shape=(64,), dtype=tf.float32), name=\"bn_beta_conv2\")\n", 586 | " bn_conv2 = tf.nn.batch_normalization(pool2, mean_conv2, variance_conv2,\n", 587 | " bn_beta_conv2, bn_gamma_conv2, 0.001)\n", 588 | " h_conv2 = tf.nn.relu(bn_conv2)\n", 589 | "\n", 590 | " # Flat the conv2 output\n", 591 | " h_conv2_flat = tf.reshape(h_conv2, shape=(-1, 7*7*64))\n", 592 | "\n", 593 | " # Dense layer1\n", 594 | " W_fc1 = tf.convert_to_tensor(values[\"W_fc1\"], dtype=tf.float32)\n", 595 | " b_fc1 = tf.convert_to_tensor(values[\"b_fc1\"], dtype=tf.float32)\n", 596 | " h_fc1 = tf.nn.relu(tf.matmul(h_conv2_flat, W_fc1) + b_fc1)\n", 597 | "\n", 598 | " # Dropout\n", 599 | " keep_prob = tf.placeholder(tf.float32)\n", 600 | " h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)\n", 601 | "\n", 602 | " # Output layer\n", 603 | " W_fc2 = tf.convert_to_tensor(values[\"W_fc2\"], dtype=tf.float32)\n", 604 | " b_fc2 = tf.convert_to_tensor(values[\"b_fc2\"], dtype=tf.float32)\n", 605 | " y_conv = tf.matmul(h_fc1_drop, W_fc2) + b_fc2\n", 606 | " \n", 607 | " # Labels\n", 608 | " y = tf.placeholder(tf.int32, [None])\n", 609 | " y_ = tf.one_hot(y, 10)\n", 610 | " \n", 611 | " # Defining optimizer and loss\n", 612 | " cross_entropy = tf.reduce_mean(\n", 613 | " tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=y_conv))\n", 614 | " train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)\n", 615 | " correct_prediction = tf.equal(tf.argmax(y_conv, 1), tf.argmax(y_, 1))\n", 616 | " accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))\n", 617 | " \n", 618 | " graph_init = tf.global_variables_initializer()\n", 619 | " alphas_init = tf.variables_initializer(alphas_variables)" 620 | ] 621 | }, 622 | { 623 | "cell_type": "markdown", 624 | "metadata": {}, 625 | "source": [ 626 | "Let's create the dictionary of variables to save" 627 | ] 628 | }, 629 | { 630 | "cell_type": "code", 631 | "execution_count": 14, 632 | "metadata": { 633 | "collapsed": true 634 | }, 635 | "outputs": [], 636 | "source": [ 637 | "# Defining variables to 
save. These will be fed to our custom layer\n", 638 | "variables_to_save = {\"W_conv1\": W_conv1,\n", 639 | " \"b_conv1\": b_conv1,\n", 640 | " \"alphas_conv1\": alphas_conv1,\n", 641 | " \"betas_conv1\": betas_conv1,\n", 642 | " \"shift_parameters_conv1\": shift_parameters_conv1,\n", 643 | " \"bn_gamma_conv1\": bn_gamma_conv1,\n", 644 | " \"bn_beta_conv1\": bn_beta_conv1,\n", 645 | " \"W_conv2\": W_conv2,\n", 646 | " \"b_conv2\": b_conv2,\n", 647 | " \"alphas_conv2\": alphas_conv2,\n", 648 | " \"betas_conv2\": betas_conv2,\n", 649 | " \"shift_parameters_conv2\": shift_parameters_conv2,\n", 650 | " \"bn_gamma_conv2\": bn_gamma_conv2,\n", 651 | " \"bn_beta_conv2\": bn_beta_conv2,\n", 652 | " \"W_fc1\": W_fc1,\n", 653 | " \"b_fc1\": b_fc1,\n", 654 | " \"W_fc2\": W_fc2,\n", 655 | " \"b_fc2\": b_fc2}\n", 656 | "values = {}" 657 | ] 658 | }, 659 | { 660 | "cell_type": "code", 661 | "execution_count": 15, 662 | "metadata": {}, 663 | "outputs": [ 664 | { 665 | "name": "stdout", 666 | "output_type": "stream", 667 | "text": [ 668 | "Epoch: 1 Val accuracy: 90.0000% Loss: 0.314954\n", 669 | "Epoch: 2 Val accuracy: 76.0000% Loss: 0.954873\n", 670 | "Epoch: 3 Val accuracy: 80.0000% Loss: 0.985948\n", 671 | "Epoch: 4 Val accuracy: 84.0000% Loss: 1.012544\n", 672 | "Epoch: 5 Val accuracy: 78.0000% Loss: 1.004487\n", 673 | "CPU times: user 4min 42s, sys: 26.4 s, total: 5min 8s\n", 674 | "Wall time: 5min 6s\n" 675 | ] 676 | } 677 | ], 678 | "source": [ 679 | "%%time\n", 680 | "n_epochs = 5\n", 681 | "batch_size = 32\n", 682 | "alpha_training_epochs = 200\n", 683 | " \n", 684 | "with tf.Session(graph=custom_graph) as sess:\n", 685 | " sess.run(graph_init)\n", 686 | " for epoch in range(n_epochs):\n", 687 | " for iteration in range(1, 200 + 1):\n", 688 | " # Training alphas\n", 689 | " sess.run(alphas_init)\n", 690 | " for alpha_training_op in alphas_training_operations:\n", 691 | " for alpha_epoch in range(alpha_training_epochs):\n", 692 | " sess.run(alpha_training_op)\n", 693 | " \n", 694 | " batch = mnist.train.next_batch(50)\n", 695 | " \n", 696 | " # Run operation and calculate loss\n", 697 | " _, loss_train = sess.run([train_step, cross_entropy],\n", 698 | " feed_dict={x: batch[0], y: batch[1], keep_prob: 0.5})\n", 699 | " print(\"\\rIteration: {}/{} ({:.1f}%) Loss: {:.5f}\".format(\n", 700 | " iteration, 200,\n", 701 | " iteration * 100 / 200,\n", 702 | " loss_train),\n", 703 | " end=\"\")\n", 704 | "\n", 705 | " # At the end of each epoch,\n", 706 | " # measure the validation loss and accuracy:\n", 707 | " \n", 708 | " # Training alphas\n", 709 | " sess.run(alphas_init)\n", 710 | " for alpha_training_op in alphas_training_operations:\n", 711 | " for alpha_epoch in range(alpha_training_epochs):\n", 712 | " sess.run(alpha_training_op)\n", 713 | " \n", 714 | " loss_vals = []\n", 715 | " acc_vals = []\n", 716 | " for iteration in range(1, 200 + 1): \n", 717 | " X_batch, y_batch = mnist.validation.next_batch(batch_size)\n", 718 | " acc_val, loss_val = sess.run([accuracy, cross_entropy],\n", 719 | " feed_dict={x: batch[0], y: batch[1], keep_prob: 1.0})\n", 720 | " loss_vals.append(loss_val)\n", 721 | " acc_vals.append(acc_val)\n", 722 | " print(\"\\rEvaluating the model: {}/{} ({:.1f}%)\".format(iteration, 200,\n", 723 | " iteration * 100 / 200),\n", 724 | " end=\" \" * 10)\n", 725 | " loss_val = np.mean(loss_vals)\n", 726 | " acc_val = np.mean(acc_vals)\n", 727 | " print(\"\\rEpoch: {} Val accuracy: {:.4f}% Loss: {:.6f}\".format(\n", 728 | " epoch + 1, acc_val * 100, loss_val))\n", 729 | " \n", 730 | " # 
On completion of training, save the variables to be fed to custom model\n", 731 | " for var_name in variables_to_save:\n", 732 | " values[var_name] = sess.run(variables_to_save[var_name])" 733 | ] 734 | }, 735 | { 736 | "cell_type": "markdown", 737 | "metadata": {}, 738 | "source": [ 739 | "Now, only the required variables can be saved for inference time. Using the **W_conv1** and **W_conv2**, values for binary filters and alphas can be calculated and those can be used along with **shift_parameters** and **betas** to create ABC layer for inference" 740 | ] 741 | }, 742 | { 743 | "cell_type": "markdown", 744 | "metadata": {}, 745 | "source": [ 746 | "### Pure inference testing\n", 747 | "OK! Let's extract the binary filters and alphas and throw away the weights and test our network. This will ensure that we do not have any bug in the implementation of the ABC layer" 748 | ] 749 | }, 750 | { 751 | "cell_type": "markdown", 752 | "metadata": {}, 753 | "source": [ 754 | "Creating graphs for alphas calculation" 755 | ] 756 | }, 757 | { 758 | "cell_type": "code", 759 | "execution_count": 22, 760 | "metadata": { 761 | "collapsed": true 762 | }, 763 | "outputs": [], 764 | "source": [ 765 | "alpha1_cal_graph = tf.Graph()\n", 766 | "with alpha1_cal_graph.as_default():\n", 767 | " alphas1 = tf.Variable(tf.random_normal(shape=(no_filters_conv1, 1), mean=1.0, stddev=0.1))\n", 768 | " conv_filters1 = tf.placeholder(dtype=tf.float32, shape=(5, 5, 1, 32))\n", 769 | " bin_filters1 = get_binary_filters(convolution_filters=conv_filters1,\n", 770 | " no_filters=no_filters_conv1)\n", 771 | " alpha_training_op1, alpha_training_loss1 = alpha_training(conv_filters1, bin_filters1,\n", 772 | " alphas1, no_filters_conv1)\n", 773 | " al_init1 = tf.global_variables_initializer()\n", 774 | " \n", 775 | "alpha2_cal_graph = tf.Graph()\n", 776 | "with alpha2_cal_graph.as_default():\n", 777 | " alphas2 = tf.Variable(tf.random_normal(shape=(no_filters_conv1, 1), mean=1.0, stddev=0.1))\n", 778 | " conv_filters2 = tf.placeholder(dtype=tf.float32, shape=(5, 5, 32, 64))\n", 779 | " bin_filters2 = get_binary_filters(convolution_filters=conv_filters2,\n", 780 | " no_filters=no_filters_conv2)\n", 781 | " alpha_training_op2, alpha_training_loss2 = alpha_training(conv_filters2, bin_filters2,\n", 782 | " alphas2, no_filters_conv2)\n", 783 | " al_init2 = tf.global_variables_initializer()" 784 | ] 785 | }, 786 | { 787 | "cell_type": "markdown", 788 | "metadata": {}, 789 | "source": [ 790 | "Calculating alphas and binary filters" 791 | ] 792 | }, 793 | { 794 | "cell_type": "code", 795 | "execution_count": 23, 796 | "metadata": { 797 | "collapsed": true 798 | }, 799 | "outputs": [], 800 | "source": [ 801 | "with tf.Session(graph=alpha1_cal_graph) as sess:\n", 802 | " al_init1.run()\n", 803 | " for epoch in range(200):\n", 804 | " sess.run(alpha_training_op1, feed_dict={conv_filters1: values[\"W_conv1\"]})\n", 805 | " cal_bin_filters, cal_alphas = sess.run([bin_filters1, alphas1], feed_dict={conv_filters1: values[\"W_conv1\"]})\n", 806 | " values[\"binary_filters_conv1\"] = cal_bin_filters\n", 807 | " values[\"alphas_conv1\"] = cal_alphas\n", 808 | "\n", 809 | "with tf.Session(graph=alpha2_cal_graph) as sess:\n", 810 | " al_init2.run()\n", 811 | " for epoch in range(200):\n", 812 | " sess.run(alpha_training_op2, feed_dict={conv_filters2: values[\"W_conv2\"]})\n", 813 | " cal_bin_filters, cal_alphas = sess.run([bin_filters2, alphas2], feed_dict={conv_filters2: values[\"W_conv2\"]})\n", 814 | " values[\"binary_filters_conv2\"] = 
cal_bin_filters\n", 815 | " values[\"alphas_conv2\"] = cal_alphas" 816 | ] 817 | }, 818 | { 819 | "cell_type": "markdown", 820 | "metadata": {}, 821 | "source": [ 822 | "#### Building inference model\n", 823 | "Now, we have all our variables, let's build an inference model" 824 | ] 825 | }, 826 | { 827 | "cell_type": "code", 828 | "execution_count": 25, 829 | "metadata": { 830 | "collapsed": true 831 | }, 832 | "outputs": [], 833 | "source": [ 834 | "inference_graph = tf.Graph()\n", 835 | "with inference_graph.as_default():\n", 836 | " # Setting configuration\n", 837 | " no_filters_conv1 = 5\n", 838 | " no_layers_conv1 = 5\n", 839 | " no_filters_conv2 = 5\n", 840 | " no_layers_conv2 = 5\n", 841 | " \n", 842 | " # Inputs\n", 843 | " x = tf.placeholder(dtype=tf.float32)\n", 844 | " x_image = tf.reshape(x, [-1, 28, 28, 1])\n", 845 | " \n", 846 | " # Convolution Layer 1\n", 847 | " b_conv1 = tf.convert_to_tensor(values[\"b_conv1\"], dtype=tf.float32, name=\"b_conv1\")\n", 848 | " alphas_conv1 = tf.convert_to_tensor(values[\"alphas_conv1\"],\n", 849 | " dtype=tf.float32, name=\"alphas_conv1\")\n", 850 | " shift_parameters_conv1 = tf.convert_to_tensor(values[\"shift_parameters_conv1\"],\n", 851 | " dtype=tf.float32, name=\"shift_parameters_conv1\")\n", 852 | " betas_conv1 = tf.convert_to_tensor(values[\"betas_conv1\"],\n", 853 | " dtype=tf.float32, name=\"betas_conv1\")\n", 854 | " # Performing the operations\n", 855 | " binary_filters_conv1 = tf.convert_to_tensor(values[\"binary_filters_conv1\"], dtype=tf.float32,\n", 856 | " name=\"binary_filters_conv1\")\n", 857 | " conv1 = ABC(binary_filters_conv1, tf.stop_gradient(alphas_conv1), shift_parameters_conv1,\n", 858 | " betas_conv1, b_conv1, padding=\"SAME\")(x_image)\n", 859 | " # Other layers\n", 860 | " pool1 = max_pool_2x2(conv1)\n", 861 | " # batch norm parameters\n", 862 | " mean_conv1, variance_conv1 = bn_mean_variance(pool1)\n", 863 | " bn_gamma_conv1 = tf.convert_to_tensor(values[\"bn_gamma_conv1\"], dtype=tf.float32,\n", 864 | " name=\"bn_gamma_conv1\")\n", 865 | " bn_beta_conv1 = tf.convert_to_tensor(values[\"bn_beta_conv1\"], dtype=tf.float32,\n", 866 | " name=\"bn_beta_conv1\")\n", 867 | " bn_conv1 = tf.nn.batch_normalization(pool1, mean_conv1, variance_conv1,\n", 868 | " bn_beta_conv1, bn_gamma_conv1, 0.001)\n", 869 | " h_conv1 = tf.nn.relu(bn_conv1)\n", 870 | "\n", 871 | " # Convolution Layer 2\n", 872 | " b_conv2 = tf.convert_to_tensor(values[\"b_conv2\"], dtype=tf.float32, name=\"b_conv2\")\n", 873 | " alphas_conv2 = tf.convert_to_tensor(values[\"alphas_conv2\"],\n", 874 | " dtype=tf.float32, name=\"alphas_conv2\")\n", 875 | " shift_parameters_conv2 = tf.convert_to_tensor(values[\"shift_parameters_conv2\"],\n", 876 | " dtype=tf.float32, name=\"shift_parameters_conv2\")\n", 877 | " betas_conv2 = tf.convert_to_tensor(values[\"betas_conv2\"],\n", 878 | " dtype=tf.float32, name=\"betas_conv2\")\n", 879 | " # Performing the operations\n", 880 | " binary_filters_conv2 = tf.convert_to_tensor(values[\"binary_filters_conv2\"], dtype=tf.float32,\n", 881 | " name=\"binary_filters_conv2\")\n", 882 | " conv2 = ABC(binary_filters_conv2, tf.stop_gradient(alphas_conv2), shift_parameters_conv2,\n", 883 | " betas_conv2, b_conv2, padding=\"SAME\")(h_conv1)\n", 884 | " # Other layers\n", 885 | " pool2 = max_pool_2x2(conv2)\n", 886 | " # batch norm parameters\n", 887 | " mean_conv2, variance_conv2 = bn_mean_variance(pool2)\n", 888 | " bn_gamma_conv2 = tf.convert_to_tensor(values[\"bn_gamma_conv2\"], dtype=tf.float32,\n", 889 | " 
name=\"bn_gamma_conv2\")\n", 890 | " bn_beta_conv2 = tf.convert_to_tensor(values[\"bn_beta_conv2\"], dtype=tf.float32,\n", 891 | " name=\"bn_beta_conv2\")\n", 892 | " bn_conv2 = tf.nn.batch_normalization(pool2, mean_conv2, variance_conv2,\n", 893 | " bn_beta_conv2, bn_gamma_conv2, 0.001)\n", 894 | " h_conv2 = tf.nn.relu(bn_conv2)\n", 895 | "\n", 896 | " # Flat the conv2 output\n", 897 | " h_conv2_flat = tf.reshape(h_conv2, shape=(-1, 7*7*64))\n", 898 | "\n", 899 | " # Dense layer1\n", 900 | " W_fc1 = tf.convert_to_tensor(values[\"W_fc1\"], dtype=tf.float32)\n", 901 | " b_fc1 = tf.convert_to_tensor(values[\"b_fc1\"], dtype=tf.float32)\n", 902 | " h_fc1 = tf.nn.relu(tf.matmul(h_conv2_flat, W_fc1) + b_fc1)\n", 903 | "\n", 904 | " # Dropout\n", 905 | " keep_prob = tf.placeholder(tf.float32)\n", 906 | " h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)\n", 907 | "\n", 908 | " # Output layer\n", 909 | " W_fc2 = tf.convert_to_tensor(values[\"W_fc2\"], dtype=tf.float32)\n", 910 | " b_fc2 = tf.convert_to_tensor(values[\"b_fc2\"], dtype=tf.float32)\n", 911 | " y_conv = tf.matmul(h_fc1_drop, W_fc2) + b_fc2\n", 912 | " \n", 913 | " # Labels\n", 914 | " y = tf.placeholder(tf.int32, [None])\n", 915 | " y_ = tf.one_hot(y, 10)\n", 916 | " \n", 917 | " # Defining optimizer and loss\n", 918 | " cross_entropy = tf.reduce_mean(\n", 919 | " tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=y_conv))\n", 920 | " correct_prediction = tf.equal(tf.argmax(y_conv, 1), tf.argmax(y_, 1))\n", 921 | " accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))" 922 | ] 923 | }, 924 | { 925 | "cell_type": "markdown", 926 | "metadata": {}, 927 | "source": [ 928 | "Let's test the inference model" 929 | ] 930 | }, 931 | { 932 | "cell_type": "code", 933 | "execution_count": 26, 934 | "metadata": {}, 935 | "outputs": [ 936 | { 937 | "name": "stdout", 938 | "output_type": "stream", 939 | "text": [ 940 | "Epoch: 200 Val accuracy: 78.0000% Loss: 0.884985\n", 941 | "CPU times: user 6.03 s, sys: 832 ms, total: 6.86 s\n", 942 | "Wall time: 5.95 s\n" 943 | ] 944 | } 945 | ], 946 | "source": [ 947 | "%%time\n", 948 | "with tf.Session(graph=inference_graph) as sess:\n", 949 | " loss_vals = []\n", 950 | " acc_vals = []\n", 951 | " for iteration in range(1, 500 + 1): \n", 952 | " X_batch, y_batch = mnist.validation.next_batch(batch_size)\n", 953 | " acc_val, loss_val = sess.run([accuracy, cross_entropy],\n", 954 | " feed_dict={x: batch[0], y: batch[1], keep_prob: 1.0})\n", 955 | " loss_vals.append(loss_val)\n", 956 | " acc_vals.append(acc_val)\n", 957 | " print(\"\\rEvaluating the model: {}/{} ({:.1f}%)\".format(iteration, 500,\n", 958 | " iteration * 100 / 500),\n", 959 | " end=\" \" * 10)\n", 960 | " loss_val = np.mean(loss_vals)\n", 961 | " acc_val = np.mean(acc_vals)\n", 962 | " print(\"\\rEpoch: {} Val accuracy: {:.4f}% Loss: {:.6f}\".format(\n", 963 | " epoch + 1, acc_val * 100, loss_val))" 964 | ] 965 | }, 966 | { 967 | "cell_type": "code", 968 | "execution_count": null, 969 | "metadata": { 970 | "collapsed": true 971 | }, 972 | "outputs": [], 973 | "source": [] 974 | } 975 | ], 976 | "metadata": { 977 | "kernelspec": { 978 | "display_name": "tensorflow", 979 | "language": "python", 980 | "name": "tensorflow" 981 | }, 982 | "language_info": { 983 | "codemirror_mode": { 984 | "name": "ipython", 985 | "version": 2 986 | }, 987 | "file_extension": ".py", 988 | "mimetype": "text/x-python", 989 | "name": "python", 990 | "nbconvert_exporter": "python", 991 | "pygments_lexer": "ipython2", 992 | "version": "2.7.15" 993 | } 994 | 
}, 995 | "nbformat": 4, 996 | "nbformat_minor": 2 997 | } 998 | -------------------------------------------------------------------------------- /ABC.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Implementation of Accurate Binary Convolution Layer\n", 8 | "[Original Paper](https://arxiv.org/abs/1711.11294)" 9 | ] 10 | }, 11 | { 12 | "cell_type": "code", 13 | "execution_count": 1, 14 | "metadata": { 15 | "collapsed": true 16 | }, 17 | "outputs": [], 18 | "source": [ 19 | "from __future__ import division, print_function\n", 20 | "import tensorflow as tf\n", 21 | "import numpy as np" 22 | ] 23 | }, 24 | { 25 | "cell_type": "markdown", 26 | "metadata": {}, 27 | "source": [ 28 | "The inspiration for this network is the use of Deep Neural Networks for real-time object recognition. Currently available **Convolution Layers** require large amount of computation power at runtime and that hinders the use of very deep networks in embedded systems or ASICs. Xiaofan Lin, Cong Zhao, and Wei Pan presented a way to convert Convolution Layers to **Binary Convolution Layers** for faster realtime computation." 29 | ] 30 | }, 31 | { 32 | "cell_type": "markdown", 33 | "metadata": {}, 34 | "source": [ 35 | "### Approximating Convolution weights using binary weights\n", 36 | "Here the hope is to approximate $\\mathbf{W}\\in\\mathbb{R}^{w*h*c_{in}*c_{out}}$ using $\\alpha_1\\mathbf{B_1}+\\alpha_2\\mathbf{B_2}+...+\\alpha_m\\mathbf{B_m}$ where $\\mathbf{B_1}, \\mathbf{B_2}, ..., \\mathbf{B_m}\\in\\mathbb{R}^{w*h*c_{in}*c_{out}}$ and $\\alpha_1, \\alpha_2, ..., \\alpha_m\\in\\mathbb{R}^1$" 37 | ] 38 | }, 39 | { 40 | "cell_type": "markdown", 41 | "metadata": {}, 42 | "source": [ 43 | "#### Conversion from convolution filter to binary filter\n", 44 | "Let's implement the conversion of convolution filter to binary convolution filters first.\n", 45 | "To approximate $\\mathbf{W}$ with $\\alpha_1\\mathbf{B_1}+\\alpha_2\\mathbf{B_2}+...+\\alpha_m\\mathbf{B_m}$ we'll use the equation from the paper $\\mathbf{B_i}=\\operatorname{sign}(\\bar{\\mathbf{W}} + \\mu_i\\operatorname{std}(\\mathbf{W}))$" 46 | ] 47 | }, 48 | { 49 | "cell_type": "markdown", 50 | "metadata": {}, 51 | "source": [ 52 | "We'll need mean and standard deviation of the complete convolution filters" 53 | ] 54 | }, 55 | { 56 | "cell_type": "code", 57 | "execution_count": 2, 58 | "metadata": { 59 | "collapsed": true 60 | }, 61 | "outputs": [], 62 | "source": [ 63 | "def get_mean_stddev(input_tensor):\n", 64 | " with tf.name_scope('mean_stddev_cal'):\n", 65 | " mean, variance = tf.nn.moments(input_tensor, axes=range(len(input_tensor.get_shape())))\n", 66 | " stddev = tf.sqrt(variance, name=\"standard_deviation\")\n", 67 | " return mean, stddev" 68 | ] 69 | }, 70 | { 71 | "cell_type": "markdown", 72 | "metadata": {}, 73 | "source": [ 74 | "We need to spread the standard deviation by the number of filters being used as in the original paper\n", 75 | "$\\mu_i= -1 + (i - 1)\\frac{2}{\\mathbf{M} - 1}$" 76 | ] 77 | }, 78 | { 79 | "cell_type": "code", 80 | "execution_count": 3, 81 | "metadata": { 82 | "collapsed": true 83 | }, 84 | "outputs": [], 85 | "source": [ 86 | "# TODO: Allow shift parameters to be learnable\n", 87 | "def get_shifted_stddev(stddev, no_filters):\n", 88 | " with tf.name_scope('shifted_stddev'):\n", 89 | " spreaded_deviation = -1. 
+ (2./(no_filters - 1)) * tf.convert_to_tensor(range(no_filters),\n", 90 | " dtype=tf.float32)\n", 91 | " return spreaded_deviation * stddev" 92 | ] 93 | }, 94 | { 95 | "cell_type": "markdown", 96 | "metadata": {}, 97 | "source": [ 98 | "Now, we can get the values of $\\mathbf{B_{i}s}$" 99 | ] 100 | }, 101 | { 102 | "cell_type": "code", 103 | "execution_count": 4, 104 | "metadata": { 105 | "collapsed": true 106 | }, 107 | "outputs": [], 108 | "source": [ 109 | "def get_binary_filters(convolution_filters, no_filters, name=None):\n", 110 | " with tf.name_scope(name, default_name=\"get_binary_filters\"):\n", 111 | " mean, stddev = get_mean_stddev(convolution_filters)\n", 112 | " shifted_stddev = get_shifted_stddev(stddev, no_filters)\n", 113 | " \n", 114 | " # Normalize the filters by subtracting mean from them\n", 115 | " mean_adjusted_filters = convolution_filters - mean\n", 116 | " \n", 117 | " # Tiling filters to match the number of filters\n", 118 | " expanded_filters = tf.expand_dims(mean_adjusted_filters, axis=0, name=\"expanded_filters\")\n", 119 | " tiled_filters = tf.tile(expanded_filters, [no_filters] + [1] * len(convolution_filters.get_shape()),\n", 120 | " name=\"tiled_filters\")\n", 121 | " \n", 122 | " # Similarly tiling spreaded stddev to match the shape of tiled_filters\n", 123 | " expanded_stddev = tf.reshape(shifted_stddev, [no_filters] + [1] * len(convolution_filters.get_shape()),\n", 124 | " name=\"expanded_stddev\")\n", 125 | " \n", 126 | " binarized_filters = tf.sign(tiled_filters + expanded_stddev, name=\"binarized_filters\")\n", 127 | " return binarized_filters" 128 | ] 129 | }, 130 | { 131 | "cell_type": "markdown", 132 | "metadata": {}, 133 | "source": [ 134 | "#### Calculating alphas\n", 135 | "Now, we can calculate alphas using the *binary filters* and *convolution filters* by minimizing the *squared difference*\n", 136 | "$\\|\\mathbf{W}-\\mathbf{B}\\alpha\\|^2$" 137 | ] 138 | }, 139 | { 140 | "cell_type": "code", 141 | "execution_count": 5, 142 | "metadata": { 143 | "collapsed": true 144 | }, 145 | "outputs": [], 146 | "source": [ 147 | "def get_alphas(convolution_filters, binary_filters, no_filters, name=None):\n", 148 | " with tf.name_scope(name, \"get_alphas\"):\n", 149 | " # Reshaping convolution filters to be one dimensional and binary filters to be of [no_filters, -1] dimension\n", 150 | " reshaped_convolution_filters = tf.reshape(convolution_filters, [-1], name=\"reshaped_convolution_filters\")\n", 151 | " reshaped_binary_filters = tf.reshape(binary_filters, [no_filters, -1],\n", 152 | " name=\"reshaped_binary_filters\")\n", 153 | " \n", 154 | " # Creating variable for alphas\n", 155 | " alphas = tf.Variable(tf.random_normal(shape=(no_filters, 1), mean=1.0, stddev=0.1), name=\"alphas\")\n", 156 | " \n", 157 | " # Calculating W*alpha\n", 158 | " weighted_sum_filters = tf.reduce_sum(tf.multiply(alphas, reshaped_binary_filters),\n", 159 | " axis=0, name=\"weighted_sum_filters\")\n", 160 | " \n", 161 | " # Defining loss\n", 162 | " error = tf.square(reshaped_convolution_filters - weighted_sum_filters, name=\"alphas_error\")\n", 163 | " loss = tf.reduce_mean(error, axis=0, name=\"alphas_loss\")\n", 164 | " \n", 165 | " # Defining optimizer\n", 166 | " training_op = tf.train.AdamOptimizer().minimize(loss, var_list=[alphas],\n", 167 | " name=\"alphas_training_op\")\n", 168 | " \n", 169 | " return alphas, training_op, loss" 170 | ] 171 | }, 172 | { 173 | "cell_type": "markdown", 174 | "metadata": {}, 175 | "source": [ 176 | "### Creating ApproxConv using the 
binary filters\n", 177 | "$\mathbf{O}=\sum\limits_{m=1}^M\alpha_m\operatorname{Conv}(\mathbf{B}_m, \mathbf{A})$" 178 | ] 179 | }, 180 | { 181 | "cell_type": "markdown", 182 | "metadata": {}, 183 | "source": [ 184 | "As mentioned in the paper, it is better to train the network first with simple convolution layers and then convert the filters into binary filters, allowing the original filters to be trained." 185 | ] 186 | }, 187 | { 188 | "cell_type": "code", 189 | "execution_count": 6, 190 | "metadata": { 191 | "collapsed": true 192 | }, 193 | "outputs": [], 194 | "source": [ 195 | "def ApproxConv(no_filters, convolution_filters, convolution_biases=None,\n", 196 | "               strides=(1, 1), padding=\"VALID\", name=None):\n", 197 | "    with tf.name_scope(name, \"ApproxConv\"):\n", 198 | "        # Creating variables from input convolution filters and convolution biases\n", 199 | "        filters = tf.Variable(convolution_filters, dtype=tf.float32, name=\"filters\")\n", 200 | "        if convolution_biases is None:\n", 201 | "            biases = 0.\n", 202 | "        else:\n", 203 | "            biases = tf.Variable(convolution_biases, dtype=tf.float32, name=\"biases\")\n", 204 | "        \n", 205 | "        # Creating binary filters\n", 206 | "        binary_filters = get_binary_filters(filters, no_filters)\n", 207 | "        \n", 208 | "        # Getting alphas\n", 209 | "        alphas, alphas_training_op, alphas_loss = get_alphas(filters, binary_filters,\n", 210 | "                                                             no_filters)\n", 211 | "        \n", 212 | "        # Defining function for closure to accept multiple inputs with same filters\n", 213 | "        def ApproxConvLayer(input_tensor, name=None):\n", 214 | "            with tf.name_scope(name, \"ApproxConv_Layer\"):\n", 215 | "                # Reshaping alphas to match the input tensor\n", 216 | "                reshaped_alphas = tf.reshape(alphas,\n", 217 | "                                             shape=[no_filters] + [1] * len(input_tensor.get_shape()),\n", 218 | "                                             name=\"reshaped_alphas\")\n", 219 | "                \n", 220 | "                # Calculating convolution for each binary filter\n", 221 | "                approxConv_outputs = []\n", 222 | "                for index in range(no_filters):\n", 223 | "                    # Binary convolution\n", 224 | "                    this_conv = tf.nn.conv2d(input_tensor, binary_filters[index],\n", 225 | "                                             strides=(1,) + strides + (1,),\n", 226 | "                                             padding=padding)\n", 227 | "                    approxConv_outputs.append(this_conv + biases)\n", 228 | "                conv_outputs = tf.convert_to_tensor(approxConv_outputs, dtype=tf.float32,\n", 229 | "                                                    name=\"conv_outputs\")\n", 230 | "                \n", 231 | "                # Summing up each of the binary convolutions\n", 232 | "                ApproxConv_output = tf.reduce_sum(tf.multiply(conv_outputs, reshaped_alphas), axis=0)\n", 233 | "                \n", 234 | "                return ApproxConv_output\n", 235 | "        \n", 236 | "        return alphas_training_op, ApproxConvLayer, alphas_loss" 237 | ] 238 | }, 239 | { 240 | "cell_type": "markdown", 241 | "metadata": {}, 242 | "source": [ 243 | "### Multiple binary activations and bitwise convolution\n", 244 | "Now, convolution can be achieved using just summation operations by means of the ApproxConv layers. But the paper suggests something even better: we can even replace the summation with bitwise operations only, if the input to the convolution layer is also binarized.\n", 245 | "For that, the authors suggest that an input can be binarized (creating multiple binary inputs) by shifting the input and binarizing each shifted copy." 246 | ] 247 | }, 248 | { 249 | "cell_type": "markdown", 250 | "metadata": {}, 251 | "source": [ 252 | "First, the input is clipped between 0. and 1. 
using multiple shift parameters $\\nu$, learnable by the network \n", 253 | "$\\operatorname{h_{\\nu}}(x)=\\operatorname{clip}(x + \\nu, 0, 1)$ \n", 254 | " \n", 255 | "Then using the following function it is binarized \n", 256 | "$\\operatorname{H_{\\nu}}(\\mathbf{R})=2\\mathbb{I}_{\\operatorname{h_{\\nu}}(\\mathbf{R})\\geq0.5}-1$\n", 257 | "\n", 258 | "The above function can be implemented as \n", 259 | "$\\operatorname{H_{\\nu}}(\\mathbf{R})=\\operatorname{sign}(\\mathbf{R} - 0.5)$\n", 260 | "\n", 261 | "Now, after calculating the **ApproxConv** over each separated input, their weighted summation can be taken using trainable paramters $\\beta s$" 262 | ] 263 | }, 264 | { 265 | "cell_type": "code", 266 | "execution_count": 7, 267 | "metadata": { 268 | "collapsed": true 269 | }, 270 | "outputs": [], 271 | "source": [ 272 | "def ABC(convolution_filters, convolution_biases=None, no_binary_filters=5, no_ApproxConvLayers=5,\n", 273 | " strides=(1, 1), padding=\"VALID\", name=None):\n", 274 | " with tf.name_scope(name, \"ABC\"):\n", 275 | " # Creating variables shift parameters and weighted sum parameters (betas)\n", 276 | " shift_parameters = tf.Variable(tf.constant(0., shape=(no_ApproxConvLayers, 1)), dtype=tf.float32,\n", 277 | " name=\"shift_parameters\")\n", 278 | " betas = tf.Variable(tf.constant(1., shape=(no_ApproxConvLayers, 1)), dtype=tf.float32,\n", 279 | " name=\"betas\")\n", 280 | " \n", 281 | " # Instantiating the ApproxConv Layer\n", 282 | " alphas_training_op, ApproxConvLayer, alphas_loss = ApproxConv(no_binary_filters,\n", 283 | " convolution_filters, convolution_biases,\n", 284 | " strides, padding)\n", 285 | " \n", 286 | " def ABCLayer(input_tensor, name=None):\n", 287 | " with tf.name_scope(name, \"ABCLayer\"):\n", 288 | " # Reshaping betas to match the input tensor\n", 289 | " reshaped_betas = tf.reshape(betas,\n", 290 | " shape=[no_ApproxConvLayers] + [1] * len(input_tensor.get_shape()),\n", 291 | " name=\"reshaped_betas\")\n", 292 | " \n", 293 | " # Calculating ApproxConv for each shifted input\n", 294 | " ApproxConv_layers = []\n", 295 | " for index in range(no_ApproxConvLayers):\n", 296 | " # Shifting and binarizing input\n", 297 | " shifted_input = tf.clip_by_value(input_tensor + shift_parameters[index], 0., 1.,\n", 298 | " name=\"shifted_input_\" + str(index))\n", 299 | " binarized_activation = tf.sign(shifted_input - 0.5)\n", 300 | " \n", 301 | " # Passing through the ApproxConv layer\n", 302 | " ApproxConv_layers.append(ApproxConvLayer(binarized_activation))\n", 303 | " ApproxConv_output = tf.convert_to_tensor(ApproxConv_layers, dtype=tf.float32,\n", 304 | " name=\"ApproxConv_output\")\n", 305 | " \n", 306 | " # Taking the weighted sum using the betas\n", 307 | " ABC_output = tf.reduce_sum(tf.multiply(ApproxConv_output, reshaped_betas), axis=0)\n", 308 | " return ABC_output\n", 309 | " \n", 310 | " return alphas_training_op, ABCLayer, alphas_loss" 311 | ] 312 | }, 313 | { 314 | "cell_type": "markdown", 315 | "metadata": {}, 316 | "source": [ 317 | "## Testing\n", 318 | "Let's just test our network using MNIST" 319 | ] 320 | }, 321 | { 322 | "cell_type": "code", 323 | "execution_count": 8, 324 | "metadata": {}, 325 | "outputs": [ 326 | { 327 | "name": "stdout", 328 | "output_type": "stream", 329 | "text": [ 330 | "Successfully downloaded train-images-idx3-ubyte.gz 9912422 bytes.\n", 331 | "Extracting /tmp/data/train-images-idx3-ubyte.gz\n", 332 | "Successfully downloaded train-labels-idx1-ubyte.gz 28881 bytes.\n", 333 | "Extracting 
/tmp/data/train-labels-idx1-ubyte.gz\n", 334 | "Successfully downloaded t10k-images-idx3-ubyte.gz 1648877 bytes.\n", 335 | "Extracting /tmp/data/t10k-images-idx3-ubyte.gz\n", 336 | "Successfully downloaded t10k-labels-idx1-ubyte.gz 4542 bytes.\n", 337 | "Extracting /tmp/data/t10k-labels-idx1-ubyte.gz\n" 338 | ] 339 | } 340 | ], 341 | "source": [ 342 | "# Importing data\n", 343 | "from tensorflow.examples.tutorials.mnist import input_data\n", 344 | "!mkdir -p /tmp/data\n", 345 | "mnist = input_data.read_data_sets(\"/tmp/data/\")" 346 | ] 347 | }, 348 | { 349 | "cell_type": "code", 350 | "execution_count": 9, 351 | "metadata": { 352 | "collapsed": true 353 | }, 354 | "outputs": [], 355 | "source": [ 356 | "# Defining utils function\n", 357 | "def weight_variable(shape, name=\"weight\"):\n", 358 | " initial = tf.truncated_normal(shape, stddev=0.1)\n", 359 | " return tf.Variable(initial, name=name)\n", 360 | "\n", 361 | "def bias_variable(shape, name=\"bias\"):\n", 362 | " initial = tf.constant(0.1, shape=shape)\n", 363 | " return tf.Variable(initial, name=name)\n", 364 | "\n", 365 | "def conv2d(x, W):\n", 366 | " return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')\n", 367 | "\n", 368 | "def max_pool_2x2(x):\n", 369 | " return tf.nn.max_pool(x, ksize=[1, 2, 2, 1],\n", 370 | " strides=[1, 2, 2, 1], padding='SAME')" 371 | ] 372 | }, 373 | { 374 | "cell_type": "code", 375 | "execution_count": 10, 376 | "metadata": { 377 | "collapsed": true 378 | }, 379 | "outputs": [], 380 | "source": [ 381 | "# Creating the graph\n", 382 | "without_ABC_graph = tf.Graph()\n", 383 | "with without_ABC_graph.as_default():\n", 384 | " # Defining inputs\n", 385 | " x = tf.placeholder(dtype=tf.float32)\n", 386 | " x_image = tf.reshape(x, [-1, 28, 28, 1])\n", 387 | " \n", 388 | " # Convolution Layer 1\n", 389 | " W_conv1 = weight_variable(shape=([5, 5, 1, 32]), name=\"W_conv1\")\n", 390 | " b_conv1 = bias_variable(shape=[32], name=\"b_conv1\")\n", 391 | " conv1 = (conv2d(x_image, W_conv1) + b_conv1)\n", 392 | " pool1 = max_pool_2x2(conv1)\n", 393 | " bn_conv1 = tf.layers.batch_normalization(pool1, axis=-1, name=\"batchNorm1\")\n", 394 | " h_conv1 = tf.nn.relu(bn_conv1)\n", 395 | "\n", 396 | " # Convolution Layer 2\n", 397 | " W_conv2 = weight_variable(shape=([5, 5, 32, 64]), name=\"W_conv2\")\n", 398 | " b_conv2 = bias_variable(shape=[64], name=\"b_conv2\")\n", 399 | " conv2 = (conv2d(h_conv1, W_conv2) + b_conv2)\n", 400 | " pool2 = max_pool_2x2(conv2)\n", 401 | " bn_conv2 = tf.layers.batch_normalization(pool2, axis=-1, name=\"batchNorm2\")\n", 402 | " h_conv2 = tf.nn.relu(bn_conv2)\n", 403 | "\n", 404 | " # Flat the conv2 output\n", 405 | " h_conv2_flat = tf.reshape(h_conv2, shape=(-1, 7*7*64))\n", 406 | "\n", 407 | " # Dense layer1\n", 408 | " W_fc1 = weight_variable([7 * 7 * 64, 1024])\n", 409 | " b_fc1 = bias_variable([1024])\n", 410 | " h_fc1 = tf.nn.relu(tf.matmul(h_conv2_flat, W_fc1) + b_fc1)\n", 411 | "\n", 412 | " # Dropout\n", 413 | " keep_prob = tf.placeholder(tf.float32)\n", 414 | " h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)\n", 415 | "\n", 416 | " # Output layer\n", 417 | " W_fc2 = weight_variable([1024, 10])\n", 418 | " b_fc2 = bias_variable([10])\n", 419 | "\n", 420 | " y_conv = tf.matmul(h_fc1_drop, W_fc2) + b_fc2\n", 421 | " \n", 422 | " # Labels\n", 423 | " y = tf.placeholder(tf.int32, [None])\n", 424 | " y_ = tf.one_hot(y, 10)\n", 425 | " \n", 426 | " # Defining optimizer and loss\n", 427 | " cross_entropy = tf.reduce_mean(\n", 428 | " tf.nn.softmax_cross_entropy_with_logits(labels=y_, 
logits=y_conv))\n", 429 | " train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)\n", 430 | " correct_prediction = tf.equal(tf.argmax(y_conv, 1), tf.argmax(y_, 1))\n", 431 | " accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))\n", 432 | " \n", 433 | " # Initializer\n", 434 | " graph_init = tf.global_variables_initializer()" 435 | ] 436 | }, 437 | { 438 | "cell_type": "markdown", 439 | "metadata": {}, 440 | "source": [ 441 | "Let's just define a dictionary to hold the numpy values of the trained parameters of the network, so that we can feed them directly to our custom network" 442 | ] 443 | }, 444 | { 445 | "cell_type": "code", 446 | "execution_count": 11, 447 | "metadata": { 448 | "collapsed": true 449 | }, 450 | "outputs": [], 451 | "source": [ 452 | "# Defining variables to save. These will be fed to our custom layer\n", 453 | "variables_to_save = {\"W_conv1\": W_conv1,\n", 454 | " \"b_conv1\": b_conv1,\n", 455 | " \"W_conv2\": W_conv2,\n", 456 | " \"b_conv2\": b_conv2,\n", 457 | " \"W_fc1\": W_fc1,\n", 458 | " \"b_fc1\": b_fc1,\n", 459 | " \"W_fc2\": W_fc2,\n", 460 | " \"b_fc2\": b_fc2}\n", 461 | "values = {}" 462 | ] 463 | }, 464 | { 465 | "cell_type": "code", 466 | "execution_count": 14, 467 | "metadata": {}, 468 | "outputs": [ 469 | { 470 | "name": "stdout", 471 | "output_type": "stream", 472 | "text": [ 473 | "Epoch: 1 Val accuracy: 88.0000% Loss: 0.432063\n", 474 | "Epoch: 2 Val accuracy: 98.0000% Loss: 0.128601\n", 475 | "Epoch: 3 Val accuracy: 96.0000% Loss: 0.197146\n", 476 | "Epoch: 4 Val accuracy: 96.0000% Loss: 0.111511\n", 477 | "Epoch: 5 Val accuracy: 92.0000% Loss: 0.232009\n" 478 | ] 479 | } 480 | ], 481 | "source": [ 482 | "n_epochs = 5\n", 483 | "batch_size = 32\n", 484 | " \n", 485 | "with tf.Session(graph=without_ABC_graph) as sess:\n", 486 | " sess.run(graph_init)\n", 487 | " for epoch in range(n_epochs):\n", 488 | " for iteration in range(1, 200 + 1):\n", 489 | " batch = mnist.train.next_batch(50)\n", 490 | " \n", 491 | " # Run operation and calculate loss\n", 492 | " _, loss_train = sess.run([train_step, cross_entropy],\n", 493 | " feed_dict={x: batch[0], y: batch[1], keep_prob: 0.5})\n", 494 | " print(\"\\rIteration: {}/{} ({:.1f}%) Loss: {:.5f}\".format(\n", 495 | " iteration, 200,\n", 496 | " iteration * 100 / 200,\n", 497 | " loss_train),\n", 498 | " end=\"\")\n", 499 | "\n", 500 | " # At the end of each epoch,\n", 501 | " # measure the validation loss and accuracy:\n", 502 | " loss_vals = []\n", 503 | " acc_vals = []\n", 504 | " for iteration in range(1, 200 + 1):\n", 505 | " X_batch, y_batch = mnist.validation.next_batch(batch_size)\n", 506 | " acc_val, loss_val = sess.run([accuracy, cross_entropy],\n", 507 | " feed_dict={x: X_batch, y: y_batch, keep_prob: 1.0})\n", 508 | " loss_vals.append(loss_val)\n", 509 | " acc_vals.append(acc_val)\n", 510 | " print(\"\\rEvaluating the model: {}/{} ({:.1f}%)\".format(iteration, 200,\n", 511 | " iteration * 100 / 200),\n", 512 | " end=\" \" * 10)\n", 513 | " loss_val = np.mean(loss_vals)\n", 514 | " acc_val = np.mean(acc_vals)\n", 515 | " print(\"\\rEpoch: {} Val accuracy: {:.4f}% Loss: {:.6f}\".format(\n", 516 | " epoch + 1, acc_val * 100, loss_val))\n", 517 | " \n", 518 | " # On completion of training, save the variables to be fed to custom model\n", 519 | " for var_name in variables_to_save:\n", 520 | " values[var_name] = sess.run(variables_to_save[var_name])" 521 | ] 522 | }, 523 | { 524 | "cell_type": "markdown", 525 | "metadata": {}, 526 | "source": [ 527 | "### Let's build our model 
now" 528 | ] 529 | }, 530 | { 531 | "cell_type": "code", 532 | "execution_count": 17, 533 | "metadata": { 534 | "collapsed": true 535 | }, 536 | "outputs": [], 537 | "source": [ 538 | "custom_graph = tf.Graph()\n", 539 | "with custom_graph.as_default():\n", 540 | " alphas_training_operations = []\n", 541 | " \n", 542 | " # Inputs\n", 543 | " x = tf.placeholder(dtype=tf.float32)\n", 544 | " x_image = tf.reshape(x, [-1, 28, 28, 1])\n", 545 | " \n", 546 | " # Convolution Layer 1\n", 547 | " W_conv1 = tf.Variable(values[\"W_conv1\"], name=\"W_conv1\")\n", 548 | " b_conv1 = tf.Variable(values[\"b_conv1\"], name=\"b_conv1\")\n", 549 | " alphas_training_op1, ABCLayer1, alphas_loss1 = ABC(W_conv1, b_conv1,\n", 550 | " no_binary_filters=5,\n", 551 | " no_ApproxConvLayers=5,\n", 552 | " padding=\"SAME\")\n", 553 | " alphas_training_operations.append(alphas_training_op1)\n", 554 | " conv1 = ABCLayer1(x_image)\n", 555 | " pool1 = max_pool_2x2(conv1)\n", 556 | " bn_conv1 = tf.layers.batch_normalization(pool1, axis=-1)\n", 557 | " h_conv1 = tf.nn.relu(bn_conv1)\n", 558 | "\n", 559 | " # Convolution Layer 2\n", 560 | " W_conv2 = tf.Variable(values[\"W_conv2\"], name=\"W_conv2\")\n", 561 | " b_conv2 = tf.Variable(values[\"b_conv2\"], name=\"b_conv2\")\n", 562 | " alphas_training_op2, ABCLayer2, alphas_loss2 = ABC(W_conv2, b_conv2,\n", 563 | " no_binary_filters=5,\n", 564 | " no_ApproxConvLayers=5,\n", 565 | " padding=\"SAME\")\n", 566 | " alphas_training_operations.append(alphas_training_op2)\n", 567 | " conv2 = ABCLayer2(h_conv1)\n", 568 | " pool2 = max_pool_2x2(conv2)\n", 569 | " bn_conv2 = tf.layers.batch_normalization(pool2, axis=-1)\n", 570 | " h_conv2 = tf.nn.relu(bn_conv2)\n", 571 | "\n", 572 | " # Flat the conv2 output\n", 573 | " h_conv2_flat = tf.reshape(h_conv2, shape=(-1, 7*7*64))\n", 574 | "\n", 575 | " # Dense layer1\n", 576 | " W_fc1 = weight_variable([7 * 7 * 64, 1024])\n", 577 | " b_fc1 = bias_variable([1024])\n", 578 | " h_fc1 = tf.nn.relu(tf.matmul(h_conv2_flat, W_fc1) + b_fc1)\n", 579 | "\n", 580 | " # Dropout\n", 581 | " keep_prob = tf.placeholder(tf.float32)\n", 582 | " h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)\n", 583 | "\n", 584 | " # Output layer\n", 585 | " W_fc2 = weight_variable([1024, 10])\n", 586 | " b_fc2 = bias_variable([10])\n", 587 | " y_conv = tf.matmul(h_fc1_drop, W_fc2) + b_fc2\n", 588 | " \n", 589 | " # Labels\n", 590 | " y = tf.placeholder(tf.int32, [None])\n", 591 | " y_ = tf.one_hot(y, 10)\n", 592 | " \n", 593 | " # Defining optimizer and loss\n", 594 | " cross_entropy = tf.reduce_mean(\n", 595 | " tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=y_conv))\n", 596 | " train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)\n", 597 | " correct_prediction = tf.equal(tf.argmax(y_conv, 1), tf.argmax(y_, 1))\n", 598 | " accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))\n", 599 | " \n", 600 | " graph_init = tf.global_variables_initializer()" 601 | ] 602 | }, 603 | { 604 | "cell_type": "code", 605 | "execution_count": 20, 606 | "metadata": { 607 | "scrolled": true 608 | }, 609 | "outputs": [ 610 | { 611 | "name": "stdout", 612 | "output_type": "stream", 613 | "text": [ 614 | "Epoch: 1 Val accuracy: 88.0000% Loss: 6.530759\n", 615 | "Epoch: 2 Val accuracy: 86.0000% Loss: 4.208882\n", 616 | "Epoch: 3 Val accuracy: 92.0000% Loss: 1.455365\n", 617 | "Epoch: 4 Val accuracy: 92.0000% Loss: 0.708834\n", 618 | "Epoch: 5 Val accuracy: 86.0000% Loss: 0.366106\n" 619 | ] 620 | } 621 | ], 622 | "source": [ 623 | "n_epochs = 5\n", 624 | 
"batch_size = 32\n", 625 | "alpha_training_epochs = 200\n", 626 | " \n", 627 | "with tf.Session(graph=custom_graph) as sess:\n", 628 | " sess.run(graph_init)\n", 629 | " for epoch in range(n_epochs):\n", 630 | " for iteration in range(1, 200 + 1):\n", 631 | " # Training alphas\n", 632 | " for alpha_training_op in alphas_training_operations:\n", 633 | " for alpha_epoch in range(alpha_training_epochs):\n", 634 | " sess.run(alpha_training_op)\n", 635 | " \n", 636 | " batch = mnist.train.next_batch(50)\n", 637 | " \n", 638 | " # Run operation and calculate loss\n", 639 | " _, loss_train = sess.run([train_step, cross_entropy],\n", 640 | " feed_dict={x: batch[0], y: batch[1], keep_prob: 0.5})\n", 641 | " print(\"\\rIteration: {}/{} ({:.1f}%) Loss: {:.5f}\".format(\n", 642 | " iteration, 200,\n", 643 | " iteration * 100 / 200,\n", 644 | " loss_train),\n", 645 | " end=\"\")\n", 646 | "\n", 647 | " # At the end of each epoch,\n", 648 | " # measure the validation loss and accuracy:\n", 649 | " \n", 650 | " # Training alphas\n", 651 | " for alpha_training_op in alphas_training_operations:\n", 652 | " for alpha_epoch in range(alpha_training_epochs):\n", 653 | " sess.run(alpha_training_op)\n", 654 | " \n", 655 | " loss_vals = []\n", 656 | " acc_vals = []\n", 657 | " for iteration in range(1, 200 + 1): \n", 658 | " X_batch, y_batch = mnist.validation.next_batch(batch_size)\n", 659 | " acc_val, loss_val = sess.run([accuracy, cross_entropy],\n", 660 | " feed_dict={x: batch[0], y: batch[1], keep_prob: 1.0})\n", 661 | " loss_vals.append(loss_val)\n", 662 | " acc_vals.append(acc_val)\n", 663 | " print(\"\\rEvaluating the model: {}/{} ({:.1f}%)\".format(iteration, 200,\n", 664 | " iteration * 100 / 200),\n", 665 | " end=\" \" * 10)\n", 666 | " loss_val = np.mean(loss_vals)\n", 667 | " acc_val = np.mean(acc_vals)\n", 668 | " print(\"\\rEpoch: {} Val accuracy: {:.4f}% Loss: {:.6f}\".format(\n", 669 | " epoch + 1, acc_val * 100, loss_val))" 670 | ] 671 | }, 672 | { 673 | "cell_type": "code", 674 | "execution_count": null, 675 | "metadata": { 676 | "collapsed": true 677 | }, 678 | "outputs": [], 679 | "source": [] 680 | } 681 | ], 682 | "metadata": { 683 | "kernelspec": { 684 | "display_name": "tensorflow", 685 | "language": "python", 686 | "name": "tensorflow" 687 | }, 688 | "language_info": { 689 | "codemirror_mode": { 690 | "name": "ipython", 691 | "version": 2 692 | }, 693 | "file_extension": ".py", 694 | "mimetype": "text/x-python", 695 | "name": "python", 696 | "nbconvert_exporter": "python", 697 | "pygments_lexer": "ipython2", 698 | "version": "2.7.15" 699 | } 700 | }, 701 | "nbformat": 4, 702 | "nbformat_minor": 2 703 | } 704 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2018 layog 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 
14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 22 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Accurate-Binary-Convolution-Network 2 | Binary Convolution Network for faster real-time processing in ASICs 3 | 4 | --- 5 | 6 | Tensorflow implementation of [Towards Accurate Binary Convolutional Neural Network](https://arxiv.org/abs/1711.11294) by Xiaofan Lin, Cong Zhao, and Wei Pan. 7 | Why this network? Let's quote the authors: 8 | > It has been known that using binary weights and activations drastically reduce memory size and accesses, and can replace arithmetic operations with more efficient bitwise operations, leading to much faster test-time inference and lower power consumption. 9 | > The implementation of the resulting binary CNN, denoted as ABC-Net, is shown to achieve much closer performance to its full-precision counterpart, and even reach the comparable prediction accuracy on ImageNet and forest trail datasets, given adequate binary weight bases and activations. 10 | 11 | ### Dependencies 12 | ```sh 13 | pip install -r requirements.txt 14 | ``` 15 | By default `tensorflow-gpu` will be installed. Make sure to have `CUDA` properly set up. 16 | 17 | ### Notebooks 18 | * **ABC** - Contains the original implementation of the ABC network 19 | * **ABC-layer-inference-support** - Slightly modified functions for better inference time support (tl;dr moved the alpha training operation out of the layer) 20 | 21 | ### Testing 22 | * MNIST - Accuracy on validation set reached up to 94%. (Check the notebook for information) 23 | * ImageNet - To be added 24 | 25 | > NOTE: shift_parameters and beta values are currently not trainable. This is because the gradients for `tf.sign` and `tf.clip_by_value` were not implemented in `tensorflow v1.4`. Even in the current version (`tensorflow v1.8`) the gradient for `tf.sign` is not implemented. Implementation of a custom Straight Through Estimator (STE) is required. 26 | 27 | ### TODO 28 | - [ ] Test on ImageNet (2012) 29 | - [ ] Add visualization of the complete `ABC` layer 30 | - [ ] Port to `tensorflow v1.8.0` 31 | - [ ] Implement custom STE for `tf.sign` -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | tensorflow-gpu==1.4.1 2 | ipykernel==4.7.0 3 | numpy==1.14.0 4 | 5 | --------------------------------------------------------------------------------
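A possible starting point for the custom Straight Through Estimator mentioned in the README's NOTE and TODO list — a minimal sketch, assuming TensorFlow 1.x graph mode; the helper name `ste_sign` and the registered gradient name `"STESign"` are illustrative and not part of the repository:

```python
import tensorflow as tf

# Straight Through Estimator (STE) for tf.sign: the forward pass keeps the
# hard sign, while the backward pass lets the incoming gradient through
# unchanged wherever |x| <= 1 (the BinaryNet-style clipped identity).
@tf.RegisterGradient("STESign")
def _ste_sign_grad(op, grad):
    x = op.inputs[0]
    return grad * tf.cast(tf.abs(x) <= 1., tf.float32)

def ste_sign(x, name=None):
    # Swap the (missing) gradient of the Sign op for the STE registered above.
    graph = tf.get_default_graph()
    with graph.gradient_override_map({"Sign": "STESign"}):
        return tf.sign(x, name=name)
```

With such a helper, a call like `tf.sign(shifted_input - 0.5)` in the notebooks could be replaced by `ste_sign(shifted_input - 0.5)`, which would allow gradients to flow back into the shift parameters and betas.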