├── LICENSE ├── README.md ├── img ├── Rnn.png ├── graph.png ├── nas.jpeg ├── swish_.png ├── swish_com.png └── swish_graph.png └── src ├── Searching for activation function.ipynb ├── WMT14.py ├── __pycache__ ├── config.cpython-35.pyc ├── config.cpython-36.pyc ├── dataset.cpython-35.pyc ├── models.cpython-35.pyc ├── models.cpython-36.pyc ├── network.cpython-35.pyc ├── network.cpython-36.pyc ├── parser.cpython-35.pyc ├── parser.cpython-36.pyc ├── resnet.cpython-35.pyc ├── resnet.cpython-36.pyc ├── resnet_model.cpython-35.pyc ├── resnet_model.cpython-36.pyc ├── utils.cpython-35.pyc └── utils.cpython-36.pyc ├── childnetwork.py ├── cifar100_download_and_extract.py ├── cifar100_test.py ├── cifar100_train.py ├── cifar10_download_and_extract.py ├── config.py ├── dataset.py ├── img ├── Rnn.png ├── activationfunctions.png ├── graph.png ├── loss_rmsprop.png └── nas.jpeg ├── main.py ├── parser.py ├── rnn_controller.py ├── swish.py ├── train └── events.out.tfevents.1514935452.6bf252a4b161 └── utils.py /LICENSE: -------------------------------------------------------------------------------- 1 | The MIT License 2 | 3 | Copyright (c) Ang Ming Liang 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in 13 | all copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN 21 | THE SOFTWARE. 22 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Searching for activation functions 2 | 3 | This project attempts to implement the 2017 paper "Searching for Activation Functions" (Ramachandran, Zoph & Le, 2017). Although neural networks are powerful and flexible models, they are still hard to design and limited by human creativity. Using a combination of exhaustive and reinforcement learning-based search, the paper claims to be able to discover multiple novel activation functions. We tried to verify the claims of this paper by replicating the original study. However, we were unable to get good results, probably because we lacked the massive computing resources used in the original experiment (800 Titan X GPUs). 4 | 5 | ![alt text](https://github.com/Neoanarika/Searching-for-activation-functions/blob/master/img/nas.jpeg) 6 | 7 | # Dependencies 8 | 9 | - Anaconda3 10 | - TensorFlow-GPU >=1.4 11 | 12 | # Setting up the docker environment 13 | If you do not have the right dependencies to run this project, you can use our Docker image, which we also used to run these experiments. 
14 | ``` 15 | docker pull etheleon/dotfiles 16 | docker run --runtime=nvidia -it etheleon/dotfiles 17 | ``` 18 | 19 | 20 | # Running the code 21 | Clone the repo first, then navigate into the src folder, where the code of this project is stored 22 | ``` 23 | git clone https://github.com/Neoanarika/Searching-for-activation-functions.git 24 | cd Searching-for-activation-functions 25 | cd src 26 | ``` 27 | Download the data first, then search for activation functions 28 | ``` 29 | python cifar10_download_and_extract.py 30 | python main.py 31 | ``` 32 | 33 | Next, test against your newly generated activation functions 34 | ``` 35 | python cifar100_download_and_extract.py 36 | python cifar100_train.py 37 | python cifar100_test.py 38 | ``` 39 | 40 | Alternatively, open the Jupyter notebook in the repo and run everything from there. 41 | 42 | # RNN controller 43 | 44 | ![alt text](https://github.com/Neoanarika/Searching-for-activation-functions/blob/master/img/Rnn.png) 45 | ![alt text](https://github.com/Neoanarika/Searching-for-activation-functions/blob/master/img/graph.png) 46 | 47 | # Some sample activation functions found 48 | | Activation functions | 49 | | ------------- | 50 | | 3x | 51 | | 1 | 52 | | -3 | 53 | 54 | Clearly we are doing something wrong. The problem with reimplementing papers like this one is that when the result is negative, the cause is hard to pin down: we may not have run the search long enough, or there may be a bug in the program that we are unaware of. 
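One way to sanity-check a sampled function by hand is to decode its sequence outside TensorFlow. The sketch below mirrors the `input_fun`, `unary`, and `binary` lookup tables from `src/childnetwork.py`, but uses plain Python floats instead of tensors. `decode_activation` is a hypothetical helper name that is not part of the repo, and the nine-token ordering is an assumption read off from how `batch_norm_relu` indexes the contents of the `tmp` file.

```python
import math

# Plain-float stand-ins for the TensorFlow lookup tables in src/childnetwork.py.
INPUT_FUN = {
    "1": lambda x: float(x),  # the pre-activation itself
    "2": lambda x: 0.0,
    "3": lambda x: 2.0,
    "4": lambda x: 1.0,
    "5": lambda x: -1.0,
}
UNARY = {
    "1": lambda x: x,
    "2": lambda x: -x,
    "3": lambda x: max(x, 0.0),
    "4": lambda x: x ** 2,
    "5": math.tanh,
}
BINARY = {
    "1": lambda x, y: x + y,
    "2": lambda x, y: x * y,
    "3": lambda x, y: x - y,
    "4": lambda x, y: max(x, y),
    "5": lambda x, y: y / (1.0 + math.exp(-x)),  # sigmoid(x) * y
}


def decode_activation(sequence):
    """Hypothetical helper: turn nine whitespace-separated choices into a function."""
    a = sequence.split()
    if len(a) != 9:
        raise ValueError("expected 9 tokens, got %d" % len(a))

    def activation(x):
        # First core unit: binary(unary(input), unary(input)).
        inner = BINARY[a[4]](UNARY[a[2]](INPUT_FUN[a[0]](x)),
                             UNARY[a[3]](INPUT_FUN[a[1]](x)))
        # Second core unit combines unary(inner) with one more unary(input).
        return BINARY[a[8]](UNARY[a[5]](inner), UNARY[a[7]](INPUT_FUN[a[6]](x)))

    return activation


linear = decode_activation("3 4 1 1 1 1 1 1 2")  # (2 + 1) * x
swish = decode_activation("1 2 1 1 1 1 1 1 5")   # sigmoid(x + 0) * x
print(linear(2.0))  # 6.0
print(swish(0.0))   # 0.0
```

Under this assumed encoding, a degenerate result such as `3x` and swish itself are both reachable in the same search space, so a table like the one above does not by itself tell us whether the controller or the reward signal is at fault.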
55 | 56 | # Evaluating Swish 57 | We also implemented swish, which was the activation function found and discussed in the original paper 58 | 59 | ![alt text](https://github.com/Neoanarika/Searching-for-activation-functions/blob/master/img/swish_.png) 60 | 61 | ``` 62 | python swish.py 63 | ``` 64 | 65 | ![alt text](https://github.com/Neoanarika/Searching-for-activation-functions/blob/master/src/img/loss_rmsprop.png) 66 | 67 | We found a few things. First, during the initial phase of training the loss sometimes stays flat on average. This suggests that swish is sensitive to initialisation, at least when the weights are initially drawn from a normal distribution with std_dev = 0.1. We tried various initialisations but found no improvement. Finally, changing the optimiser from SGD to RMSProp solved the problem. The diagram above is from training with RMSProp. 68 | 69 | 70 | # Visualising Swish activation function 71 | ![alt text](https://github.com/Neoanarika/Searching-for-activation-functions/blob/master/img/swish_com.png) 72 | 73 | Swish has a sharp global minimum, especially when compared with ReLU, which may account for the high variance of the gradient updates, as the model might get stuck in the wedge on its way to the global minimum. Learning rate decay might thus help training for models that use swish. Furthermore, a sharper minimum tends to correspond with poorer generalisation, which might explain why swish performs slightly worse than ReLU in practice. 74 | 75 | # Citation 76 | ``` 77 | @article{DBLP:journals/corr/abs-1710-05941, 78 | author = {Prajit Ramachandran and 79 | Barret Zoph and 80 | Quoc V. 
Le}, 81 | title = {Searching for Activation Functions}, 82 | journal = {CoRR}, 83 | volume = {abs/1710.05941}, 84 | year = {2017}, 85 | url = {http://arxiv.org/abs/1710.05941}, 86 | archivePrefix = {arXiv}, 87 | eprint = {1710.05941}, 88 | timestamp = {Wed, 01 Nov 2017 19:05:42 +0100}, 89 | biburl = {http://dblp.org/rec/bib/journals/corr/abs-1710-05941}, 90 | bibsource = {dblp computer science bibliography, http://dblp.org} 91 | } 92 | ``` 93 | 94 | -------------------------------------------------------------------------------- /img/Rnn.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Neoanarika/Searching-for-activation-functions/1615f3fa423ab6a7977aad9d3ebf4a0ace2fab63/img/Rnn.png -------------------------------------------------------------------------------- /img/graph.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Neoanarika/Searching-for-activation-functions/1615f3fa423ab6a7977aad9d3ebf4a0ace2fab63/img/graph.png -------------------------------------------------------------------------------- /img/nas.jpeg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Neoanarika/Searching-for-activation-functions/1615f3fa423ab6a7977aad9d3ebf4a0ace2fab63/img/nas.jpeg -------------------------------------------------------------------------------- /img/swish_.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Neoanarika/Searching-for-activation-functions/1615f3fa423ab6a7977aad9d3ebf4a0ace2fab63/img/swish_.png -------------------------------------------------------------------------------- /img/swish_com.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/Neoanarika/Searching-for-activation-functions/1615f3fa423ab6a7977aad9d3ebf4a0ace2fab63/img/swish_com.png -------------------------------------------------------------------------------- /img/swish_graph.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Neoanarika/Searching-for-activation-functions/1615f3fa423ab6a7977aad9d3ebf4a0ace2fab63/img/swish_graph.png -------------------------------------------------------------------------------- /src/Searching for activation function.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Overview of the project " 8 | ] 9 | }, 10 | { 11 | "cell_type": "markdown", 12 | "metadata": {}, 13 | "source": [ 14 | "This project attempts to implement the 2017 paper \"Searching for Activation Functions\" (Ramachandran, Zoph & Le, 2017). Although neural networks are powerful and flexible models, they are still hard to design and limited by human creativity. One important consideration in designing a neural network is the activation function, as it provides the non-linearity between the affine transformations in the network. However, how do we choose which basic functions to use and combine to construct a new activation function? Essentially, the problem becomes a search for the best activation function in a search space. The approach taken in the paper is similar to Zoph and Le's earlier work \"Neural Architecture Search with Reinforcement Learning\", which used an RNN to sample the possible hyperparameters of a neural network while using a policy gradient approach (specifically REINFORCE) to train the RNN to maximise the validation accuracy of the child network (i.e. the reward signal)." 
15 | ] 16 | }, 17 | { 18 | "cell_type": "markdown", 19 | "metadata": {}, 20 | "source": [ 21 | "![title](img/nas.jpeg)" 22 | ] 23 | }, 24 | { 25 | "cell_type": "markdown", 26 | "metadata": {}, 27 | "source": [ 28 | "In this paper the RNN instead generates the choice of unary and binary functions used to build the activation function, and it is trained with the Proximal Policy Optimization (PPO) algorithm. " 29 | ] 30 | }, 31 | { 32 | "cell_type": "markdown", 33 | "metadata": {}, 34 | "source": [ 35 | "![title](img/activationfunctions.png)" 36 | ] 37 | }, 38 | { 39 | "cell_type": "markdown", 40 | "metadata": {}, 41 | "source": [ 42 | "Using PPO is in stark contrast with the two papers previously published by Zoph & Le, which used REINFORCE and Trust Region Policy Optimization (TRPO). A reason for not using REINFORCE was given in their paper \"Neural Optimizer Search with Reinforcement Learning\": the authors' explanation was that REINFORCE exhibited poor sample efficiency compared to TRPO. However, no explanation was given as to why PPO was used in the activation function search instead of TRPO. Another thing the authors did not explain is why they used policy gradient methods instead of other approaches such as evolutionary strategies. Finally, none of the papers included a control showing whether their approach is better than, say, random search, and by how much. Such a controlled experiment might also make it possible to compare other approaches, such as evolutionary strategies, with their method more fairly. This is, however, outside the scope of a mere paper implementation and is worth considering as a future project. 
" 43 | ] 44 | }, 45 | { 46 | "cell_type": "markdown", 47 | "metadata": {}, 48 | "source": [ 49 | "![title](img/Rnn.png)" 50 | ] 51 | }, 52 | { 53 | "cell_type": "markdown", 54 | "metadata": {}, 55 | "source": [ 56 | "# Dependencies" 57 | ] 58 | }, 59 | { 60 | "cell_type": "markdown", 61 | "metadata": {}, 62 | "source": [ 63 | "This project requires tensorflow gpu" 64 | ] 65 | }, 66 | { 67 | "cell_type": "code", 68 | "execution_count": null, 69 | "metadata": { 70 | "collapsed": true 71 | }, 72 | "outputs": [], 73 | "source": [ 74 | "!pip install tensorflow-gpu" 75 | ] 76 | }, 77 | { 78 | "cell_type": "markdown", 79 | "metadata": {}, 80 | "source": [ 81 | "# Start searching for activation functions" 82 | ] 83 | }, 84 | { 85 | "cell_type": "markdown", 86 | "metadata": {}, 87 | "source": [ 88 | "Download cifar-10 datset" 89 | ] 90 | }, 91 | { 92 | "cell_type": "code", 93 | "execution_count": null, 94 | "metadata": { 95 | "collapsed": true 96 | }, 97 | "outputs": [], 98 | "source": [ 99 | "!python cifar10_download_and_extract.py" 100 | ] 101 | }, 102 | { 103 | "cell_type": "markdown", 104 | "metadata": {}, 105 | "source": [ 106 | "Run the training program" 107 | ] 108 | }, 109 | { 110 | "cell_type": "code", 111 | "execution_count": null, 112 | "metadata": { 113 | "collapsed": true 114 | }, 115 | "outputs": [], 116 | "source": [ 117 | "!python main.py" 118 | ] 119 | }, 120 | { 121 | "cell_type": "markdown", 122 | "metadata": {}, 123 | "source": [ 124 | "The search is conducted using ResNet-20 as the child network architecture and trained on CIFAR-10 for 10k. " 125 | ] 126 | }, 127 | { 128 | "cell_type": "markdown", 129 | "metadata": {}, 130 | "source": [ 131 | "# Testing on CIFAR-100" 132 | ] 133 | }, 134 | { 135 | "cell_type": "markdown", 136 | "metadata": {}, 137 | "source": [ 138 | "In the paper 3 datasets were used to test the transfer capabilties of the newly found activation functions \n", 139 | "1. CIFAR-100\n", 140 | "2. ImageNet\n", 141 | "3. 
WMT\n", 142 | "\n", 143 | "I wasn't able to download ImageNet because of its large size. In this notebook we only look at CIFAR-100 and the WMT 2014 English-German dataset." 144 | ] 145 | }, 146 | { 147 | "cell_type": "code", 148 | "execution_count": null, 149 | "metadata": { 150 | "collapsed": true 151 | }, 152 | "outputs": [], 153 | "source": [ 154 | "!python cifar100_download_and_extract.py\n", 155 | "!python cifar100_train.py\n", 156 | "!python cifar100_test.py" 157 | ] 158 | }, 159 | { 160 | "cell_type": "markdown", 161 | "metadata": {}, 162 | "source": [ 163 | "# Swish" 164 | ] 165 | }, 166 | { 167 | "cell_type": "markdown", 168 | "metadata": {}, 169 | "source": [ 170 | "Swish was found by the activation function search in the original paper and was demonstrated to **improve top-1 classification accuracy on ImageNet by 0.9% by simply\n", 171 | "replacing all ReLU activation functions with swish**. " 172 | ] 173 | }, 174 | { 175 | "cell_type": "code", 176 | "execution_count": null, 177 | "metadata": { 178 | "collapsed": true 179 | }, 180 | "outputs": [], 181 | "source": [ 182 | "!python swish.py" 183 | ] 184 | }, 185 | { 186 | "cell_type": "markdown", 187 | "metadata": {}, 188 | "source": [ 189 | "![title](img/loss_rmsprop.png)" 190 | ] 191 | }, 192 | { 193 | "cell_type": "code", 194 | "execution_count": null, 195 | "metadata": { 196 | "collapsed": true 197 | }, 198 | "outputs": [], 199 | "source": [] 200 | } 201 | ], 202 | "metadata": { 203 | "anaconda-cloud": {}, 204 | "kernelspec": { 205 | "display_name": "Python [conda root]", 206 | "language": "python", 207 | "name": "conda-root-py" 208 | }, 209 | "language_info": { 210 | "codemirror_mode": { 211 | "name": "ipython", 212 | "version": 3 213 | }, 214 | "file_extension": ".py", 215 | "mimetype": "text/x-python", 216 | "name": "python", 217 | "nbconvert_exporter": "python", 218 | "pygments_lexer": "ipython3", 219 | "version": "3.5.2" 220 | } 221 | }, 
222 | "nbformat": 4, 223 | "nbformat_minor": 1 224 | } 225 | -------------------------------------------------------------------------------- /src/WMT14.py: -------------------------------------------------------------------------------- 1 | # Copyright 2015 The TensorFlow Authors. All Rights Reserved. 2 | # 3 | # Licensed under the Apache License, Version 2.0 (the "License"); 4 | # you may not use this file except in compliance with the License. 5 | # You may obtain a copy of the License at 6 | # 7 | # http://www.apache.org/licenses/LICENSE-2.0 8 | # 9 | # Unless required by applicable law or agreed to in writing, software 10 | # distributed under the License is distributed on an "AS IS" BASIS, 11 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 12 | # See the License for the specific language governing permissions and 13 | # limitations under the License. 14 | # ============================================================================== 15 | 16 | """Downloads and extracts the German-English text archive at DATA_URL.""" 17 | 18 | from __future__ import absolute_import 19 | from __future__ import division 20 | from __future__ import print_function 21 | 22 | import argparse 23 | import os 24 | import sys 25 | import tarfile 26 | 27 | from six.moves import urllib 28 | import tensorflow as tf 29 | 30 | DATA_URL = 'https://wit3.fbk.eu/archive/2016-01//texts/de/en/de-en.tgz' 31 | 32 | parser = argparse.ArgumentParser() 33 | 34 | parser.add_argument( 35 | '--data_dir', type=str, default='../data/wmt/', 36 | help='Directory to download data and extract the tarball') 37 | 38 | 39 | def main(unused_argv): 40 | """Download and extract the tarball from DATA_URL.""" 41 | if not os.path.exists(FLAGS.data_dir): 42 | os.makedirs(FLAGS.data_dir) 43 | 44 | filename = DATA_URL.split('/')[-1] 45 | filepath = os.path.join(FLAGS.data_dir, filename) 46 | 47 | if not os.path.exists(filepath): 48 | def _progress(count, block_size, total_size): 49 | 
sys.stdout.write('\r>> Downloading %s %.1f%%' % ( 50 | filename, 100.0 * count * block_size / total_size)) 51 | sys.stdout.flush() 52 | 53 | filepath, _ = urllib.request.urlretrieve(DATA_URL, filepath, _progress) 54 | print() 55 | statinfo = os.stat(filepath) 56 | print('Successfully downloaded', filename, statinfo.st_size, 'bytes.') 57 | 58 | tarfile.open(filepath, 'r:gz').extractall(FLAGS.data_dir) 59 | 60 | 61 | if __name__ == '__main__': 62 | FLAGS, unparsed = parser.parse_known_args() 63 | tf.app.run(argv=[sys.argv[0]] + unparsed) 64 | -------------------------------------------------------------------------------- /src/__pycache__/config.cpython-35.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Neoanarika/Searching-for-activation-functions/1615f3fa423ab6a7977aad9d3ebf4a0ace2fab63/src/__pycache__/config.cpython-35.pyc -------------------------------------------------------------------------------- /src/__pycache__/config.cpython-36.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Neoanarika/Searching-for-activation-functions/1615f3fa423ab6a7977aad9d3ebf4a0ace2fab63/src/__pycache__/config.cpython-36.pyc -------------------------------------------------------------------------------- /src/__pycache__/dataset.cpython-35.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Neoanarika/Searching-for-activation-functions/1615f3fa423ab6a7977aad9d3ebf4a0ace2fab63/src/__pycache__/dataset.cpython-35.pyc -------------------------------------------------------------------------------- /src/__pycache__/models.cpython-35.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Neoanarika/Searching-for-activation-functions/1615f3fa423ab6a7977aad9d3ebf4a0ace2fab63/src/__pycache__/models.cpython-35.pyc 
-------------------------------------------------------------------------------- /src/__pycache__/models.cpython-36.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Neoanarika/Searching-for-activation-functions/1615f3fa423ab6a7977aad9d3ebf4a0ace2fab63/src/__pycache__/models.cpython-36.pyc -------------------------------------------------------------------------------- /src/__pycache__/network.cpython-35.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Neoanarika/Searching-for-activation-functions/1615f3fa423ab6a7977aad9d3ebf4a0ace2fab63/src/__pycache__/network.cpython-35.pyc -------------------------------------------------------------------------------- /src/__pycache__/network.cpython-36.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Neoanarika/Searching-for-activation-functions/1615f3fa423ab6a7977aad9d3ebf4a0ace2fab63/src/__pycache__/network.cpython-36.pyc -------------------------------------------------------------------------------- /src/__pycache__/parser.cpython-35.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Neoanarika/Searching-for-activation-functions/1615f3fa423ab6a7977aad9d3ebf4a0ace2fab63/src/__pycache__/parser.cpython-35.pyc -------------------------------------------------------------------------------- /src/__pycache__/parser.cpython-36.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Neoanarika/Searching-for-activation-functions/1615f3fa423ab6a7977aad9d3ebf4a0ace2fab63/src/__pycache__/parser.cpython-36.pyc -------------------------------------------------------------------------------- /src/__pycache__/resnet.cpython-35.pyc: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/Neoanarika/Searching-for-activation-functions/1615f3fa423ab6a7977aad9d3ebf4a0ace2fab63/src/__pycache__/resnet.cpython-35.pyc -------------------------------------------------------------------------------- /src/__pycache__/resnet.cpython-36.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Neoanarika/Searching-for-activation-functions/1615f3fa423ab6a7977aad9d3ebf4a0ace2fab63/src/__pycache__/resnet.cpython-36.pyc -------------------------------------------------------------------------------- /src/__pycache__/resnet_model.cpython-35.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Neoanarika/Searching-for-activation-functions/1615f3fa423ab6a7977aad9d3ebf4a0ace2fab63/src/__pycache__/resnet_model.cpython-35.pyc -------------------------------------------------------------------------------- /src/__pycache__/resnet_model.cpython-36.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Neoanarika/Searching-for-activation-functions/1615f3fa423ab6a7977aad9d3ebf4a0ace2fab63/src/__pycache__/resnet_model.cpython-36.pyc -------------------------------------------------------------------------------- /src/__pycache__/utils.cpython-35.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Neoanarika/Searching-for-activation-functions/1615f3fa423ab6a7977aad9d3ebf4a0ace2fab63/src/__pycache__/utils.cpython-35.pyc -------------------------------------------------------------------------------- /src/__pycache__/utils.cpython-36.pyc: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/Neoanarika/Searching-for-activation-functions/1615f3fa423ab6a7977aad9d3ebf4a0ace2fab63/src/__pycache__/utils.cpython-36.pyc -------------------------------------------------------------------------------- /src/childnetwork.py: -------------------------------------------------------------------------------- 1 | # Copyright 2017 The TensorFlow Authors. All Rights Reserved. 2 | # 3 | # Licensed under the Apache License, Version 2.0 (the "License"); 4 | # you may not use this file except in compliance with the License. 5 | # You may obtain a copy of the License at 6 | # 7 | # http://www.apache.org/licenses/LICENSE-2.0 8 | # 9 | # Unless required by applicable law or agreed to in writing, software 10 | # distributed under the License is distributed on an "AS IS" BASIS, 11 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 12 | # See the License for the specific language governing permissions and 13 | # limitations under the License. 14 | # ============================================================================== 15 | """Contains definitions for the preactivation form of Residual Networks. 16 | 17 | Residual networks (ResNets) were originally proposed in: 18 | [1] Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun 19 | Deep Residual Learning for Image Recognition. arXiv:1512.03385 20 | 21 | The full preactivation 'v2' ResNet variant implemented in this module was 22 | introduced by: 23 | [2] Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun 24 | Identity Mappings in Deep Residual Networks. arXiv: 1603.05027 25 | 26 | The key difference of the full preactivation 'v2' variant compared to the 27 | 'v1' variant in [1] is the use of batch normalization before every weight layer 28 | rather than after. 
29 | """ 30 | 31 | from __future__ import absolute_import 32 | from __future__ import division 33 | from __future__ import print_function 34 | 35 | import tensorflow as tf 36 | 37 | _BATCH_NORM_DECAY = 0.997 38 | _BATCH_NORM_EPSILON = 1e-5 39 | 40 | def batch_norm_relu(inputs, is_training, data_format): 41 | """Performs a batch normalization followed by the activation encoded in `tmp`.""" 42 | # We set fused=True for a significant performance boost. See 43 | # https://www.tensorflow.org/performance/performance_guide#common_fused_ops 44 | inputs = tf.layers.batch_normalization( 45 | inputs=inputs, axis=1 if data_format == 'channels_first' else 3, 46 | momentum=_BATCH_NORM_DECAY, epsilon=_BATCH_NORM_EPSILON, center=True, 47 | scale=True, training=is_training, fused=True) 48 | 49 | unary = {"1":lambda x:x ,"2":lambda x: -x, "3": lambda x: tf.maximum(x,0), "4":lambda x : tf.pow(x,2),"5":lambda x : tf.tanh(tf.cast(x,tf.float32))} 50 | binary = {"1":lambda x,y: tf.add(x,y),"2":lambda x,y:tf.multiply(x,y),"3":lambda x,y:tf.add(x,-y),"4":lambda x,y:tf.maximum(x,y),"5":lambda x,y: tf.sigmoid(x)*y} 51 | input_fun = {"1":lambda x:tf.cast(x,tf.float32) , "2":lambda x:tf.zeros(tf.shape(x)), "3": lambda x:2*tf.ones(tf.shape(x)),"4": lambda x : tf.ones(tf.shape(x)), "5": lambda x: -tf.ones(tf.shape(x))} 52 | # `tmp` holds the nine choices that encode the candidate activation. 53 | with open("tmp","r") as f: 54 | activation = f.readline() 55 | activation = activation.split() # split on any whitespace so a trailing newline is dropped 56 | # Compose the core units: binary(unary(binary(unary(in), unary(in))), unary(in)). 57 | inputs = binary[activation[8]](unary[activation[5]](binary[activation[4]](unary[activation[2]](input_fun[activation[0]](inputs)),unary[activation[3]](input_fun[activation[1]](inputs)))),unary[activation[7]](input_fun[activation[6]](inputs))) 58 | #inputs = tf.nn.relu(inputs) 59 | return inputs 60 | 61 | 62 | def fixed_padding(inputs, kernel_size, data_format): 63 | """Pads the input along the spatial dimensions independently of input size. 
64 | 65 | Args: 66 | inputs: A tensor of size [batch, channels, height_in, width_in] or 67 | [batch, height_in, width_in, channels] depending on data_format. 68 | kernel_size: The kernel to be used in the conv2d or max_pool2d operation. 69 | Should be a positive integer. 70 | data_format: The input format ('channels_last' or 'channels_first'). 71 | 72 | Returns: 73 | A tensor with the same format as the input with the data either intact 74 | (if kernel_size == 1) or padded (if kernel_size > 1). 75 | """ 76 | pad_total = kernel_size - 1 77 | pad_beg = pad_total // 2 78 | pad_end = pad_total - pad_beg 79 | 80 | if data_format == 'channels_first': 81 | padded_inputs = tf.pad(inputs, [[0, 0], [0, 0], 82 | [pad_beg, pad_end], [pad_beg, pad_end]]) 83 | else: 84 | padded_inputs = tf.pad(inputs, [[0, 0], [pad_beg, pad_end], 85 | [pad_beg, pad_end], [0, 0]]) 86 | return padded_inputs 87 | 88 | 89 | def conv2d_fixed_padding(inputs, filters, kernel_size, strides, data_format): 90 | """Strided 2-D convolution with explicit padding.""" 91 | # The padding is consistent and is based only on `kernel_size`, not on the 92 | # dimensions of `inputs` (as opposed to using `tf.layers.conv2d` alone). 93 | if strides > 1: 94 | inputs = fixed_padding(inputs, kernel_size, data_format) 95 | 96 | return tf.layers.conv2d( 97 | inputs=inputs, filters=filters, kernel_size=kernel_size, strides=strides, 98 | padding=('SAME' if strides == 1 else 'VALID'), use_bias=False, 99 | kernel_initializer=tf.variance_scaling_initializer(), 100 | data_format=data_format) 101 | 102 | 103 | def building_block(inputs, filters, is_training, projection_shortcut, strides, 104 | data_format): 105 | """Standard building block for residual networks with BN before convolutions. 106 | 107 | Args: 108 | inputs: A tensor of size [batch, channels, height_in, width_in] or 109 | [batch, height_in, width_in, channels] depending on data_format. 110 | filters: The number of filters for the convolutions. 
111 | is_training: A Boolean for whether the model is in training or inference 112 | mode. Needed for batch normalization. 113 | projection_shortcut: The function to use for projection shortcuts (typically 114 | a 1x1 convolution when downsampling the input). 115 | strides: The block's stride. If greater than 1, this block will ultimately 116 | downsample the input. 117 | data_format: The input format ('channels_last' or 'channels_first'). 118 | 119 | Returns: 120 | The output tensor of the block. 121 | """ 122 | shortcut = inputs 123 | inputs = batch_norm_relu(inputs, is_training, data_format) 124 | 125 | # The projection shortcut should come after the first batch norm and ReLU 126 | # since it performs a 1x1 convolution. 127 | if projection_shortcut is not None: 128 | shortcut = projection_shortcut(inputs) 129 | 130 | inputs = conv2d_fixed_padding( 131 | inputs=inputs, filters=filters, kernel_size=3, strides=strides, 132 | data_format=data_format) 133 | 134 | inputs = batch_norm_relu(inputs, is_training, data_format) 135 | inputs = conv2d_fixed_padding( 136 | inputs=inputs, filters=filters, kernel_size=3, strides=1, 137 | data_format=data_format) 138 | 139 | return inputs + shortcut 140 | 141 | 142 | def bottleneck_block(inputs, filters, is_training, projection_shortcut, 143 | strides, data_format): 144 | """Bottleneck block variant for residual networks with BN before convolutions. 145 | 146 | Args: 147 | inputs: A tensor of size [batch, channels, height_in, width_in] or 148 | [batch, height_in, width_in, channels] depending on data_format. 149 | filters: The number of filters for the first two convolutions. Note that the 150 | third and final convolution will use 4 times as many filters. 151 | is_training: A Boolean for whether the model is in training or inference 152 | mode. Needed for batch normalization. 153 | projection_shortcut: The function to use for projection shortcuts (typically 154 | a 1x1 convolution when downsampling the input). 
155 | strides: The block's stride. If greater than 1, this block will ultimately 156 | downsample the input. 157 | data_format: The input format ('channels_last' or 'channels_first'). 158 | 159 | Returns: 160 | The output tensor of the block. 161 | """ 162 | shortcut = inputs 163 | inputs = batch_norm_relu(inputs, is_training, data_format) 164 | 165 | # The projection shortcut should come after the first batch norm and ReLU 166 | # since it performs a 1x1 convolution. 167 | if projection_shortcut is not None: 168 | shortcut = projection_shortcut(inputs) 169 | 170 | inputs = conv2d_fixed_padding( 171 | inputs=inputs, filters=filters, kernel_size=1, strides=1, 172 | data_format=data_format) 173 | 174 | inputs = batch_norm_relu(inputs, is_training, data_format) 175 | inputs = conv2d_fixed_padding( 176 | inputs=inputs, filters=filters, kernel_size=3, strides=strides, 177 | data_format=data_format) 178 | 179 | inputs = batch_norm_relu(inputs, is_training, data_format) 180 | inputs = conv2d_fixed_padding( 181 | inputs=inputs, filters=4 * filters, kernel_size=1, strides=1, 182 | data_format=data_format) 183 | 184 | return inputs + shortcut 185 | 186 | 187 | def block_layer(inputs, filters, block_fn, blocks, strides, is_training, name, 188 | data_format): 189 | """Creates one layer of blocks for the ResNet model. 190 | 191 | Args: 192 | inputs: A tensor of size [batch, channels, height_in, width_in] or 193 | [batch, height_in, width_in, channels] depending on data_format. 194 | filters: The number of filters for the first convolution of the layer. 195 | block_fn: The block to use within the model, either `building_block` or 196 | `bottleneck_block`. 197 | blocks: The number of blocks contained in the layer. 198 | strides: The stride to use for the first convolution of the layer. If 199 | greater than 1, this layer will ultimately downsample the input. 200 | is_training: Either True or False, whether we are currently training the 201 | model. Needed for batch norm. 
202 | name: A string name for the tensor output of the block layer. 203 | data_format: The input format ('channels_last' or 'channels_first'). 204 | 205 | Returns: 206 | The output tensor of the block layer. 207 | """ 208 | # Bottleneck blocks end with 4x the number of filters as they start with 209 | filters_out = 4 * filters if block_fn is bottleneck_block else filters 210 | 211 | def projection_shortcut(inputs): 212 | return conv2d_fixed_padding( 213 | inputs=inputs, filters=filters_out, kernel_size=1, strides=strides, 214 | data_format=data_format) 215 | 216 | # Only the first block per block_layer uses projection_shortcut and strides 217 | inputs = block_fn(inputs, filters, is_training, projection_shortcut, strides, 218 | data_format) 219 | 220 | for _ in range(1, blocks): 221 | inputs = block_fn(inputs, filters, is_training, None, 1, data_format) 222 | 223 | return tf.identity(inputs, name) 224 | 225 | 226 | def cifar10_resnet_v2_generator(resnet_size, num_classes, data_format=None): 227 | """Generator for CIFAR-10 ResNet v2 models. 228 | 229 | Args: resnet_size: A single integer for the size of the ResNet model. 230 | num_classes: The number of possible classes for image classification. 231 | data_format: The input format ('channels_last', 'channels_first', or None). 232 | If set to None, the format is dependent on whether a GPU is available. 233 | 234 | Returns: 235 | The model function that takes in `inputs` and `is_training` and 236 | returns the output tensor of the ResNet model. 237 | 238 | Raises: 239 | ValueError: If `resnet_size` is invalid.
240 | """ 241 | if resnet_size % 6 != 2: 242 | raise ValueError('resnet_size must be 6n + 2:', resnet_size) 243 | 244 | num_blocks = (resnet_size - 2) // 6 245 | 246 | if data_format is None: 247 | data_format = ( 248 | 'channels_first' if tf.test.is_built_with_cuda() else 'channels_last') 249 | 250 | def model(inputs, is_training): 251 | """Constructs the ResNet model given the inputs.""" 252 | if data_format == 'channels_first': 253 | # Convert the inputs from channels_last (NHWC) to channels_first (NCHW). 254 | # This provides a large performance boost on GPU. See 255 | # https://www.tensorflow.org/performance/performance_guide#data_formats 256 | inputs = tf.transpose(inputs, [0, 3, 1, 2]) 257 | 258 | with tf.device("/gpu:0"): 259 | inputs = conv2d_fixed_padding( 260 | inputs=inputs, filters=16, kernel_size=3, strides=1, 261 | data_format=data_format) 262 | inputs = tf.identity(inputs, 'initial_conv') 263 | 264 | inputs = block_layer( 265 | inputs=inputs, filters=16, block_fn=building_block, blocks=num_blocks, 266 | strides=1, is_training=is_training, name='block_layer1', 267 | data_format=data_format) 268 | inputs = block_layer( 269 | inputs=inputs, filters=32, block_fn=building_block, blocks=num_blocks, 270 | strides=2, is_training=is_training, name='block_layer2', 271 | data_format=data_format) 272 | inputs = block_layer( 273 | inputs=inputs, filters=64, block_fn=building_block, blocks=num_blocks, 274 | strides=2, is_training=is_training, name='block_layer3', 275 | data_format=data_format) 276 | 277 | inputs = batch_norm_relu(inputs, is_training, data_format) 278 | inputs = tf.layers.average_pooling2d( 279 | inputs=inputs, pool_size=8, strides=1, padding='VALID', 280 | data_format=data_format) 281 | inputs = tf.identity(inputs, 'final_avg_pool') 282 | inputs = tf.reshape(inputs, [-1, 64]) 283 | inputs = tf.layers.dense(inputs=inputs, units=num_classes) 284 | inputs = tf.identity(inputs, 'final_dense') 285 | # inputs = conv2d_fixed_padding( 286 | # 
inputs=inputs, filters=16, kernel_size=3, strides=1, 287 | # data_format=data_format) 288 | # inputs = tf.identity(inputs, 'initial_conv') 289 | # 290 | # inputs = block_layer( 291 | # inputs=inputs, filters=16, block_fn=building_block, blocks=num_blocks, 292 | # strides=1, is_training=is_training, name='block_layer1', 293 | # data_format=data_format) 294 | # inputs = block_layer( 295 | # inputs=inputs, filters=32, block_fn=building_block, blocks=num_blocks, 296 | # strides=2, is_training=is_training, name='block_layer2', 297 | # data_format=data_format) 298 | # inputs = block_layer( 299 | # inputs=inputs, filters=64, block_fn=building_block, blocks=num_blocks, 300 | # strides=2, is_training=is_training, name='block_layer3', 301 | # data_format=data_format) 302 | # 303 | # inputs = batch_norm_relu(inputs, is_training, data_format) 304 | # inputs = tf.layers.average_pooling2d( 305 | # inputs=inputs, pool_size=8, strides=1, padding='VALID', 306 | # data_format=data_format) 307 | # inputs = tf.identity(inputs, 'final_avg_pool') 308 | # inputs = tf.reshape(inputs, [-1, 64]) 309 | # inputs = tf.layers.dense(inputs=inputs, units=num_classes) 310 | # inputs = tf.identity(inputs, 'final_dense') 311 | return inputs 312 | 313 | return model 314 | 315 | 316 | def imagenet_resnet_v2_generator(block_fn, layers, num_classes, 317 | data_format=None): 318 | """Generator for ImageNet ResNet v2 models. 319 | 320 | Args: 321 | block_fn: The block to use within the model, either `building_block` or 322 | `bottleneck_block`. 323 | layers: A length-4 array denoting the number of blocks to include in each 324 | layer. Each layer consists of blocks that take inputs of the same size. 325 | num_classes: The number of possible classes for image classification. 326 | data_format: The input format ('channels_last', 'channels_first', or None). 327 | If set to None, the format is dependent on whether a GPU is available. 
328 | 329 | Returns: 330 | The model function that takes in `inputs` and `is_training` and 331 | returns the output tensor of the ResNet model. 332 | """ 333 | if data_format is None: 334 | data_format = ( 335 | 'channels_first' if tf.test.is_built_with_cuda() else 'channels_last') 336 | 337 | def model(inputs, is_training): 338 | """Constructs the ResNet model given the inputs.""" 339 | if data_format == 'channels_first': 340 | # Convert the inputs from channels_last (NHWC) to channels_first (NCHW). 341 | # This provides a large performance boost on GPU. See 342 | # https://www.tensorflow.org/performance/performance_guide#data_formats 343 | inputs = tf.transpose(inputs, [0, 3, 1, 2]) 344 | 345 | inputs = conv2d_fixed_padding( 346 | inputs=inputs, filters=64, kernel_size=7, strides=2, 347 | data_format=data_format) 348 | inputs = tf.identity(inputs, 'initial_conv') 349 | inputs = tf.layers.max_pooling2d( 350 | inputs=inputs, pool_size=3, strides=2, padding='SAME', 351 | data_format=data_format) 352 | inputs = tf.identity(inputs, 'initial_max_pool') 353 | 354 | inputs = block_layer( 355 | inputs=inputs, filters=64, block_fn=block_fn, blocks=layers[0], 356 | strides=1, is_training=is_training, name='block_layer1', 357 | data_format=data_format) 358 | inputs = block_layer( 359 | inputs=inputs, filters=128, block_fn=block_fn, blocks=layers[1], 360 | strides=2, is_training=is_training, name='block_layer2', 361 | data_format=data_format) 362 | inputs = block_layer( 363 | inputs=inputs, filters=256, block_fn=block_fn, blocks=layers[2], 364 | strides=2, is_training=is_training, name='block_layer3', 365 | data_format=data_format) 366 | inputs = block_layer( 367 | inputs=inputs, filters=512, block_fn=block_fn, blocks=layers[3], 368 | strides=2, is_training=is_training, name='block_layer4', 369 | data_format=data_format) 370 | 371 | inputs = batch_norm_relu(inputs, is_training, data_format) 372 | inputs = tf.layers.average_pooling2d( 373 | inputs=inputs, pool_size=7, 
strides=1, padding='VALID', 374 | data_format=data_format) 375 | inputs = tf.identity(inputs, 'final_avg_pool') 376 | inputs = tf.reshape(inputs, 377 | [-1, 512 if block_fn is building_block else 2048]) 378 | inputs = tf.layers.dense(inputs=inputs, units=num_classes) 379 | inputs = tf.identity(inputs, 'final_dense') 380 | return inputs 381 | 382 | return model 383 | 384 | 385 | def imagenet_resnet_v2(resnet_size, num_classes, data_format=None): 386 | """Returns the ResNet model for a given size and number of output classes.""" 387 | model_params = { 388 | 18: {'block': building_block, 'layers': [2, 2, 2, 2]}, 389 | 34: {'block': building_block, 'layers': [3, 4, 6, 3]}, 390 | 50: {'block': bottleneck_block, 'layers': [3, 4, 6, 3]}, 391 | 101: {'block': bottleneck_block, 'layers': [3, 4, 23, 3]}, 392 | 152: {'block': bottleneck_block, 'layers': [3, 8, 36, 3]}, 393 | 200: {'block': bottleneck_block, 'layers': [3, 24, 36, 3]} 394 | } 395 | 396 | if resnet_size not in model_params: 397 | raise ValueError('Not a valid resnet_size:', resnet_size) 398 | 399 | params = model_params[resnet_size] 400 | return imagenet_resnet_v2_generator( 401 | params['block'], params['layers'], num_classes, data_format) 402 | -------------------------------------------------------------------------------- /src/cifar100_download_and_extract.py: -------------------------------------------------------------------------------- 1 | # Copyright 2015 The TensorFlow Authors. All Rights Reserved. 2 | # 3 | # Licensed under the Apache License, Version 2.0 (the "License"); 4 | # you may not use this file except in compliance with the License. 5 | # You may obtain a copy of the License at 6 | # 7 | # http://www.apache.org/licenses/LICENSE-2.0 8 | # 9 | # Unless required by applicable law or agreed to in writing, software 10 | # distributed under the License is distributed on an "AS IS" BASIS, 11 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
12 | # See the License for the specific language governing permissions and 13 | # limitations under the License. 14 | # ============================================================================== 15 | 16 | """Downloads and extracts the binary version of the CIFAR-100 dataset.""" 17 | 18 | from __future__ import absolute_import 19 | from __future__ import division 20 | from __future__ import print_function 21 | 22 | import argparse 23 | import os 24 | import sys 25 | import tarfile 26 | 27 | from six.moves import urllib 28 | import tensorflow as tf 29 | 30 | DATA_URL = 'https://www.cs.toronto.edu/~kriz/cifar-100-binary.tar.gz' 31 | 32 | parser = argparse.ArgumentParser() 33 | 34 | parser.add_argument( 35 | '--data_dir', type=str, default='../data/cifar100_data', 36 | help='Directory to download data and extract the tarball') 37 | 38 | 39 | def main(unused_argv): 40 | """Download and extract the tarball from Alex's website.""" 41 | if not os.path.exists(FLAGS.data_dir): 42 | os.makedirs(FLAGS.data_dir) 43 | 44 | filename = DATA_URL.split('/')[-1] 45 | filepath = os.path.join(FLAGS.data_dir, filename) 46 | 47 | if not os.path.exists(filepath): 48 | def _progress(count, block_size, total_size): 49 | sys.stdout.write('\r>> Downloading %s %.1f%%' % ( 50 | filename, 100.0 * count * block_size / total_size)) 51 | sys.stdout.flush() 52 | 53 | filepath, _ = urllib.request.urlretrieve(DATA_URL, filepath, _progress) 54 | print() 55 | statinfo = os.stat(filepath) 56 | print('Successfully downloaded', filename, statinfo.st_size, 'bytes.') 57 | 58 | tarfile.open(filepath, 'r:gz').extractall(FLAGS.data_dir) 59 | 60 | 61 | if __name__ == '__main__': 62 | FLAGS, unparsed = parser.parse_known_args() 63 | tf.app.run(argv=[sys.argv[0]] + unparsed) 64 | -------------------------------------------------------------------------------- /src/cifar100_test.py: -------------------------------------------------------------------------------- 1 | # Copyright 2017 The TensorFlow Authors.
All Rights Reserved. 2 | # 3 | # Licensed under the Apache License, Version 2.0 (the "License"); 4 | # you may not use this file except in compliance with the License. 5 | # You may obtain a copy of the License at 6 | # 7 | # http://www.apache.org/licenses/LICENSE-2.0 8 | # 9 | # Unless required by applicable law or agreed to in writing, software 10 | # distributed under the License is distributed on an "AS IS" BASIS, 11 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 12 | # See the License for the specific language governing permissions and 13 | # limitations under the License. 14 | # ============================================================================== 15 | 16 | from __future__ import absolute_import 17 | from __future__ import division 18 | from __future__ import print_function 19 | 20 | from tempfile import mkstemp 21 | 22 | import numpy as np 23 | import tensorflow as tf 24 | 25 | import main as cifar10_main # the tests below refer to the module as cifar10_main 26 | 27 | tf.logging.set_verbosity(tf.logging.ERROR) 28 | 29 | _BATCH_SIZE = 128 30 | 31 | 32 | class BaseTest(tf.test.TestCase): 33 | 34 | def test_dataset_input_fn(self): 35 | fake_data = bytearray() 36 | fake_data.append(7) 37 | for i in range(3): 38 | for _ in range(1024): 39 | fake_data.append(i) 40 | 41 | _, filename = mkstemp(dir=self.get_temp_dir()) 42 | data_file = open(filename, 'wb') 43 | data_file.write(fake_data) 44 | data_file.close() 45 | 46 | fake_dataset = cifar10_main.record_dataset(filename) 47 | fake_dataset = fake_dataset.map(cifar10_main.parse_record) 48 | image, label = fake_dataset.make_one_shot_iterator().get_next() 49 | 50 | self.assertEqual(label.get_shape().as_list(), [10]) 51 | self.assertEqual(image.get_shape().as_list(), [32, 32, 3]) 52 | 53 | with self.test_session() as sess: 54 | image, label = sess.run([image, label]) 55 | 56 | self.assertAllEqual(label, np.array([int(i == 7) for i in range(10)])) 57 | 58 | for row in image: 59 | for pixel in row: 60 | self.assertAllEqual(pixel, np.array([0, 1, 2]))
61 | 62 | def input_fn(self): 63 | features = tf.random_uniform([_BATCH_SIZE, 32, 32, 3]) 64 | labels = tf.random_uniform( 65 | [_BATCH_SIZE], maxval=9, dtype=tf.int32) 66 | return features, tf.one_hot(labels, 10) 67 | 68 | def cifar10_model_fn_helper(self, mode): 69 | features, labels = self.input_fn() 70 | spec = cifar10_main.cifar10_model_fn( 71 | features, labels, mode, { 72 | 'resnet_size': 32, 73 | 'data_format': 'channels_last', 74 | 'batch_size': _BATCH_SIZE, 75 | }) 76 | 77 | predictions = spec.predictions 78 | self.assertAllEqual(predictions['probabilities'].shape, 79 | (_BATCH_SIZE, 10)) 80 | self.assertEqual(predictions['probabilities'].dtype, tf.float32) 81 | self.assertAllEqual(predictions['classes'].shape, (_BATCH_SIZE,)) 82 | self.assertEqual(predictions['classes'].dtype, tf.int64) 83 | 84 | if mode != tf.estimator.ModeKeys.PREDICT: 85 | loss = spec.loss 86 | self.assertAllEqual(loss.shape, ()) 87 | self.assertEqual(loss.dtype, tf.float32) 88 | 89 | if mode == tf.estimator.ModeKeys.EVAL: 90 | eval_metric_ops = spec.eval_metric_ops 91 | self.assertAllEqual(eval_metric_ops['accuracy'][0].shape, ()) 92 | self.assertAllEqual(eval_metric_ops['accuracy'][1].shape, ()) 93 | self.assertEqual(eval_metric_ops['accuracy'][0].dtype, tf.float32) 94 | self.assertEqual(eval_metric_ops['accuracy'][1].dtype, tf.float32) 95 | 96 | def test_cifar10_model_fn_train_mode(self): 97 | self.cifar10_model_fn_helper(tf.estimator.ModeKeys.TRAIN) 98 | 99 | def test_cifar10_model_fn_eval_mode(self): 100 | self.cifar10_model_fn_helper(tf.estimator.ModeKeys.EVAL) 101 | 102 | def test_cifar10_model_fn_predict_mode(self): 103 | self.cifar10_model_fn_helper(tf.estimator.ModeKeys.PREDICT) 104 | 105 | 106 | if __name__ == '__main__': 107 | tf.test.main() 108 | -------------------------------------------------------------------------------- /src/cifar100_train.py: -------------------------------------------------------------------------------- 1 | # Copyright 2017 The TensorFlow 
Authors. All Rights Reserved. 2 | # 3 | # Licensed under the Apache License, Version 2.0 (the "License"); 4 | # you may not use this file except in compliance with the License. 5 | # You may obtain a copy of the License at 6 | # 7 | # http://www.apache.org/licenses/LICENSE-2.0 8 | # 9 | # Unless required by applicable law or agreed to in writing, software 10 | # distributed under the License is distributed on an "AS IS" BASIS, 11 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 12 | # See the License for the specific language governing permissions and 13 | # limitations under the License. 14 | # ============================================================================== 15 | """Runs a ResNet model on the CIFAR-100 dataset.""" 16 | 17 | from __future__ import absolute_import 18 | from __future__ import division 19 | from __future__ import print_function 20 | 21 | import argparse 22 | import os 23 | import sys 24 | 25 | import tensorflow as tf 26 | 27 | import childnetwork as resnet_model 28 | from rnn_controller import Network 29 | from config import Config 30 | from parser import Parser 31 | 32 | parser = argparse.ArgumentParser() 33 | 34 | # Basic model parameters. 
35 | parser.add_argument('--data_dir', type=str, default='../data/cifar100_data', 36 | help='The path to the CIFAR-10 data directory.') 37 | 38 | parser.add_argument('--model_dir', type=str, default='/tmp/cifar100_model', 39 | help='The directory where the model will be stored.') 40 | 41 | parser.add_argument('--resnet_size', type=int, default=164, 42 | help='The size of the ResNet model to use.') 43 | 44 | parser.add_argument('--train_epochs', type=int, default=250, 45 | help='The number of epochs to train.') 46 | 47 | parser.add_argument('--epochs_per_eval', type=int, default=10, 48 | help='The number of epochs to run in between evaluations.') 49 | 50 | parser.add_argument('--batch_size', type=int, default=5, 51 | help='The number of images per batch.') 52 | 53 | parser.add_argument( 54 | '--data_format', type=str, default=None, 55 | choices=['channels_first', 'channels_last'], 56 | help='A flag to override the data format used in the model. channels_first ' 57 | 'provides a performance boost on GPU but is not always compatible ' 58 | 'with CPU. If left unspecified, the data format will be chosen ' 59 | 'automatically based on whether TensorFlow was built for CPU or GPU.') 60 | 61 | _HEIGHT = 32 62 | _WIDTH = 32 63 | _DEPTH = 3 64 | _NUM_CLASSES = 10 65 | _NUM_DATA_FILES = 5 66 | 67 | # We use a weight decay of 0.0002, which performs better than the 0.0001 that 68 | # was originally suggested. 
69 | _WEIGHT_DECAY = 2e-4 70 | _MOMENTUM = 0.9 71 | 72 | _NUM_IMAGES = { 73 | 'train': 50000, 74 | 'validation': 10000, 75 | } 76 | 77 | 78 | def record_dataset(filenames): 79 | """Returns an input pipeline Dataset from `filenames`.""" 80 | record_bytes = _HEIGHT * _WIDTH * _DEPTH + 1 81 | return tf.data.FixedLengthRecordDataset(filenames, record_bytes) 82 | 83 | 84 | def get_filenames(is_training, data_dir): 85 | """Returns a list of filenames.""" 86 | data_dir = os.path.join(data_dir, 'cifar-10-batches-bin') 87 | 88 | assert os.path.exists(data_dir), ( 89 | 'Run cifar10_download_and_extract.py first to download and extract the ' 90 | 'CIFAR-10 data.') 91 | 92 | if is_training: 93 | return [ 94 | os.path.join(data_dir, 'data_batch_%d.bin' % i) 95 | for i in range(1, _NUM_DATA_FILES + 1) 96 | ] 97 | else: 98 | return [os.path.join(data_dir, 'test_batch.bin')] 99 | 100 | 101 | def parse_record(raw_record): 102 | """Parse CIFAR-10 image and label from a raw record.""" 103 | # Every record consists of a label followed by the image, with a fixed number 104 | # of bytes for each. 105 | label_bytes = 1 106 | image_bytes = _HEIGHT * _WIDTH * _DEPTH 107 | record_bytes = label_bytes + image_bytes 108 | 109 | # Convert bytes to a vector of uint8 that is record_bytes long. 110 | record_vector = tf.decode_raw(raw_record, tf.uint8) 111 | 112 | # The first byte represents the label, which we convert from uint8 to int32 113 | # and then to one-hot. 114 | label = tf.cast(record_vector[0], tf.int32) 115 | label = tf.one_hot(label, _NUM_CLASSES) 116 | 117 | # The remaining bytes after the label represent the image, which we reshape 118 | # from [depth * height * width] to [depth, height, width]. 119 | depth_major = tf.reshape( 120 | record_vector[label_bytes:record_bytes], [_DEPTH, _HEIGHT, _WIDTH]) 121 | 122 | # Convert from [depth, height, width] to [height, width, depth], and cast as 123 | # float32. 
124 | image = tf.cast(tf.transpose(depth_major, [1, 2, 0]), tf.float32) 125 | 126 | return image, label 127 | 128 | 129 | def preprocess_image(image, is_training): 130 | """Preprocess a single image of layout [height, width, depth].""" 131 | if is_training: 132 | # Resize the image to add four extra pixels on each side. 133 | image = tf.image.resize_image_with_crop_or_pad( 134 | image, _HEIGHT + 8, _WIDTH + 8) 135 | 136 | # Randomly crop a [_HEIGHT, _WIDTH] section of the image. 137 | image = tf.random_crop(image, [_HEIGHT, _WIDTH, _DEPTH]) 138 | 139 | # Randomly flip the image horizontally. 140 | image = tf.image.random_flip_left_right(image) 141 | 142 | # Subtract off the mean and divide by the variance of the pixels. 143 | image = tf.image.per_image_standardization(image) 144 | return image 145 | 146 | 147 | def input_fn(is_training, data_dir, batch_size, num_epochs=1): 148 | """Input_fn using the tf.data input pipeline for CIFAR-10 dataset. 149 | 150 | Args: 151 | is_training: A boolean denoting whether the input is for training. 152 | data_dir: The directory containing the input data. 153 | batch_size: The number of samples per batch. 154 | num_epochs: The number of epochs to repeat the dataset. 155 | 156 | Returns: 157 | A tuple of images and labels. 158 | """ 159 | dataset = record_dataset(get_filenames(is_training, data_dir)) 160 | 161 | if is_training: 162 | # When choosing shuffle buffer sizes, larger sizes result in better 163 | # randomness, while smaller sizes have better performance. Because CIFAR-10 164 | # is a relatively small dataset, we choose to shuffle the full epoch. 
165 | dataset = dataset.shuffle(buffer_size=_NUM_IMAGES['train']) 166 | 167 | dataset = dataset.map(parse_record) 168 | dataset = dataset.map( 169 | lambda image, label: (preprocess_image(image, is_training), label)) 170 | 171 | dataset = dataset.prefetch(2 * batch_size) 172 | 173 | # We call repeat after shuffling, rather than before, to prevent separate 174 | # epochs from blending together. 175 | dataset = dataset.repeat(num_epochs) 176 | 177 | # Batch results by up to batch_size, and then fetch the tuple from the 178 | # iterator. 179 | dataset = dataset.batch(batch_size) 180 | iterator = dataset.make_one_shot_iterator() 181 | images, labels = iterator.get_next() 182 | 183 | return images, labels 184 | 185 | 186 | def cifar10_model_fn(features, labels, mode, params): 187 | """Model function for CIFAR-10.""" 188 | tf.summary.image('images', features, max_outputs=6) 189 | 190 | network = resnet_model.cifar10_resnet_v2_generator( 191 | params['resnet_size'], _NUM_CLASSES, params['data_format']) 192 | 193 | inputs = tf.reshape(features, [-1, _HEIGHT, _WIDTH, _DEPTH]) 194 | logits = network(inputs, mode == tf.estimator.ModeKeys.TRAIN) 195 | 196 | predictions = { 197 | 'classes': tf.argmax(logits, axis=1), 198 | 'probabilities': tf.nn.softmax(logits, name='softmax_tensor') 199 | } 200 | 201 | if mode == tf.estimator.ModeKeys.PREDICT: 202 | return tf.estimator.EstimatorSpec(mode=mode, predictions=predictions) 203 | 204 | # Calculate loss, which includes softmax cross entropy and L2 regularization. 205 | cross_entropy = tf.losses.softmax_cross_entropy( 206 | logits=logits, onehot_labels=labels) 207 | 208 | # Create a tensor named cross_entropy for logging purposes. 209 | tf.identity(cross_entropy, name='cross_entropy') 210 | tf.summary.scalar('cross_entropy', cross_entropy) 211 | 212 | # Add weight decay to the loss. 
213 | loss = cross_entropy + _WEIGHT_DECAY * tf.add_n( 214 | [tf.nn.l2_loss(v) for v in tf.trainable_variables()]) 215 | 216 | if mode == tf.estimator.ModeKeys.TRAIN: 217 | # Scale the learning rate linearly with the batch size. When the batch size 218 | # is 128, the learning rate should be 0.1. 219 | initial_learning_rate = 0.1 * params['batch_size'] / 128 220 | batches_per_epoch = _NUM_IMAGES['train'] / params['batch_size'] 221 | global_step = tf.train.get_or_create_global_step() 222 | 223 | # Multiply the learning rate by 0.1 at 100, 150, and 200 epochs. 224 | boundaries = [int(batches_per_epoch * epoch) for epoch in [100, 150, 200]] 225 | values = [initial_learning_rate * decay for decay in [1, 0.1, 0.01, 0.001]] 226 | learning_rate = tf.train.piecewise_constant( 227 | tf.cast(global_step, tf.int32), boundaries, values) 228 | 229 | # Create a tensor named learning_rate for logging purposes 230 | tf.identity(learning_rate, name='learning_rate') 231 | tf.summary.scalar('learning_rate', learning_rate) 232 | 233 | optimizer = tf.train.MomentumOptimizer( 234 | learning_rate=learning_rate, 235 | momentum=_MOMENTUM) 236 | 237 | # Batch norm requires update ops to be added as a dependency to the train_op 238 | update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS) 239 | with tf.control_dependencies(update_ops): 240 | train_op = optimizer.minimize(loss, global_step) 241 | else: 242 | train_op = None 243 | 244 | accuracy = tf.metrics.accuracy( 245 | tf.argmax(labels, axis=1), predictions['classes']) 246 | metrics = {'accuracy': accuracy} 247 | 248 | # Create a tensor named train_accuracy for logging purposes 249 | tf.identity(accuracy[1], name='train_accuracy') 250 | tf.summary.scalar('train_accuracy', accuracy[1]) 251 | 252 | return tf.estimator.EstimatorSpec( 253 | mode=mode, 254 | predictions=predictions, 255 | loss=loss, 256 | train_op=train_op, 257 | eval_metric_ops=metrics) 258 | 259 | 260 | def main(unused_argv): 261 | # Using the Winograd non-fused algorithms 
provides a small performance boost. 262 | os.environ['TF_ENABLE_WINOGRAD_NONFUSED'] = '1' 263 | os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3' 264 | # Assumption: the start of this statement is missing in the source; run_config is rebuilt here from the surviving session options. 265 | run_config = tf.estimator.RunConfig().replace(session_config=tf.ConfigProto( allow_soft_placement=True,log_device_placement=True)) 266 | cifar_classifier = tf.estimator.Estimator( 267 | model_fn=cifar10_model_fn, model_dir=FLAGS.model_dir, config=run_config, 268 | params={ 269 | 'resnet_size': FLAGS.resnet_size, 270 | 'data_format': FLAGS.data_format, 271 | 'batch_size': FLAGS.batch_size, 272 | }) 273 | 274 | # FLAGS.train_epochs // FLAGS.epochs_per_eval 275 | for _ in range(FLAGS.train_epochs): 276 | tensors_to_log = { 277 | 'learning_rate': 'learning_rate', 278 | 'cross_entropy': 'cross_entropy', 279 | 'train_accuracy': 'train_accuracy' 280 | } 281 | 282 | logging_hook = tf.train.LoggingTensorHook( 283 | tensors=tensors_to_log, every_n_iter=100) 284 | 285 | # cifar_classifier.train( 286 | # input_fn=lambda: input_fn( 287 | # True, FLAGS.data_dir, FLAGS.batch_size, FLAGS.epochs_per_eval), 288 | # hooks=[logging_hook]) 289 | cifar_classifier.train( 290 | input_fn=lambda: input_fn( 291 | True, FLAGS.data_dir, FLAGS.batch_size, FLAGS.epochs_per_eval)) 292 | 293 | # Evaluate the model and print results 294 | eval_results = cifar_classifier.evaluate( 295 | input_fn=lambda: input_fn(False, FLAGS.data_dir, FLAGS.batch_size)) 296 | print(eval_results) 297 | 298 | 299 | 300 | if __name__ == '__main__': 301 | tf.logging.set_verbosity(tf.logging.INFO) 302 | FLAGS, unparsed = parser.parse_known_args() 303 | tf.app.run(argv=[sys.argv[0]] + unparsed) 304 | -------------------------------------------------------------------------------- /src/cifar10_download_and_extract.py: -------------------------------------------------------------------------------- 1 | # Copyright 2015 The TensorFlow Authors. All Rights Reserved. 2 | # 3 | # Licensed under the Apache License, Version 2.0 (the "License"); 4 | # you may not use this file except in compliance with the License.
5 | # You may obtain a copy of the License at 6 | # 7 | # http://www.apache.org/licenses/LICENSE-2.0 8 | # 9 | # Unless required by applicable law or agreed to in writing, software 10 | # distributed under the License is distributed on an "AS IS" BASIS, 11 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 12 | # See the License for the specific language governing permissions and 13 | # limitations under the License. 14 | # ============================================================================== 15 | 16 | """Downloads and extracts the binary version of the CIFAR-10 dataset.""" 17 | 18 | from __future__ import absolute_import 19 | from __future__ import division 20 | from __future__ import print_function 21 | 22 | import argparse 23 | import os 24 | import sys 25 | import tarfile 26 | 27 | from six.moves import urllib 28 | import tensorflow as tf 29 | 30 | DATA_URL = 'https://www.cs.toronto.edu/~kriz/cifar-10-binary.tar.gz' 31 | 32 | parser = argparse.ArgumentParser() 33 | 34 | parser.add_argument( 35 | '--data_dir', type=str, default='../data/cifar10_data', 36 | help='Directory to download data and extract the tarball') 37 | 38 | 39 | def main(unused_argv): 40 | """Download and extract the tarball from Alex's website.""" 41 | if not os.path.exists(FLAGS.data_dir): 42 | os.makedirs(FLAGS.data_dir) 43 | 44 | filename = DATA_URL.split('/')[-1] 45 | filepath = os.path.join(FLAGS.data_dir, filename) 46 | 47 | if not os.path.exists(filepath): 48 | def _progress(count, block_size, total_size): 49 | sys.stdout.write('\r>> Downloading %s %.1f%%' % ( 50 | filename, 100.0 * count * block_size / total_size)) 51 | sys.stdout.flush() 52 | 53 | filepath, _ = urllib.request.urlretrieve(DATA_URL, filepath, _progress) 54 | print() 55 | statinfo = os.stat(filepath) 56 | print('Successfully downloaded', filename, statinfo.st_size, 'bytes.') 57 | 58 | tarfile.open(filepath, 'r:gz').extractall(FLAGS.data_dir) 59 | 60 | 61 | if __name__ == '__main__': 62 | 
FLAGS, unparsed = parser.parse_known_args() 63 | tf.app.run(argv=[sys.argv[0]] + unparsed) 64 | -------------------------------------------------------------------------------- /src/config.py: -------------------------------------------------------------------------------- 1 | import os 2 | import utils 3 | import tensorflow as tf 4 | 5 | class Config(object): 6 | def __init__(self, args): 7 | self.codebase_root_path = args.path 8 | self.folder_suffix = args.folder_suffix 9 | self.project_name = args.project 10 | self.dataset_name = args.dataset 11 | self.batch_size = args.batch_size 12 | self.max_epochs = args.max_epochs 13 | self.num_classes = args.num_classes 14 | self.hyperparams = args.hyperparams 15 | self.load = args.load 16 | self.debug = args.debug 17 | class Solver(object): 18 | def __init__(self, t_args): 19 | self.learning_rate = t_args.lr 20 | self.dropout = t_args.dropout 21 | if t_args.opt.lower() not in ["adam", "rmsprop", "sgd", "normal"]: 22 | raise ValueError('Undefined type of optimizer') 23 | else: 24 | self.optimizer = {"adam": tf.train.AdamOptimizer, "rmsprop": tf.train.RMSPropOptimizer, "sgd": tf.train.GradientDescentOptimizer, "normal": tf.train.Optimizer}[t_args.opt.lower()](self.learning_rate) 25 | 26 | self.solver = Solver(args) 27 | self.project_path, self.project_prefix_path, self.dataset_path, self.train_path, self.test_path, self.ckptdir_path = self.set_paths() 28 | 29 | def set_paths(self): 30 | project_path = utils.path_exists(self.codebase_root_path) 31 | project_prefix_path = "" #utils.path_exists(os.path.join(self.codebase_root_path, self.project_name, self.folder_suffix)) 32 | dataset_path = utils.path_exists(os.path.join(self.codebase_root_path, "data", self.dataset_name)) 33 | ckptdir_path = utils.path_exists(os.path.join(self.codebase_root_path, "checkpoint")) 34 | train_path = os.path.join(dataset_path, "data_batch_") 35 | test_path = os.path.join(dataset_path, "test_batch") 36 | 37 | return project_path,
project_prefix_path, dataset_path, train_path, test_path, ckptdir_path 38 | -------------------------------------------------------------------------------- /src/dataset.py: -------------------------------------------------------------------------------- 1 | import os 2 | import utils 3 | import pickle 4 | import numpy as np 5 | import matplotlib.pyplot as plt 6 | import matplotlib.image as mpimg 7 | 8 | class DataSet(object): 9 | def __init__(self, config): 10 | self.config = config 11 | self.batch_count = 0 # number of the training batch file loaded last 12 | 13 | def load_data(self, file_name): 14 | with open(file_name, 'rb') as file: 15 | unpickler = pickle._Unpickler(file) 16 | unpickler.encoding = 'latin1' 17 | contents = unpickler.load() 18 | X, Y = np.asarray(contents['data'], dtype=np.float32), np.asarray(contents['labels']) 19 | one_hot = np.zeros((Y.size, Y.max() + 1)) 20 | one_hot[np.arange(Y.size), Y] = 1 21 | return X, one_hot 22 | 23 | def get_batch(self, type_): 24 | if type_ == "test": 25 | return self.load_data(self.config.test_path) 26 | elif type_ == "train": 27 | self.batch_count += 1 28 | return self.load_data(self.config.train_path + str(self.batch_count)) 29 | elif type_ == "validation": 30 | return self.load_data(self.config.train_path + "5") 31 | 32 | def next_batch(self, type_): 33 | if self.batch_count > 3: # cycle over data_batch_1..4; data_batch_5 is the validation split 34 | self.batch_count = 0 35 | X, Y = self.get_batch(type_) 36 | batch_size, tot = self.config.batch_size, len(X) 37 | total = tot // batch_size # number of full batches; a trailing partial batch is dropped 38 | for i in range(total): 39 | start = i * batch_size 40 | end = start + batch_size 41 | x = X[start : end, :] 42 | y = Y[start : end, :] 43 | yield (x, y, total) -------------------------------------------------------------------------------- /src/img/Rnn.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Neoanarika/Searching-for-activation-functions/1615f3fa423ab6a7977aad9d3ebf4a0ace2fab63/src/img/Rnn.png
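A note on `load_data` in `src/dataset.py` above: it sizes the one-hot matrix from `Y.max() + 1`, so the label width depends on which classes happen to appear in the file being loaded. The sketch below is one possible alternative that takes the class count explicitly. The helper name `one_hot` and its `num_classes` argument are hypothetical and are not part of the repository; treat this as an illustration, not the project's implementation.

```python
import numpy as np


def one_hot(labels, num_classes):
    """Encode integer labels as rows of a (len(labels), num_classes) array.

    Hypothetical helper: unlike sizing the array from labels.max() + 1, the
    width is fixed by the caller, so every file yields the same label shape.
    """
    labels = np.asarray(labels, dtype=np.int64)
    if labels.size and (labels.min() < 0 or labels.max() >= num_classes):
        raise ValueError('label out of range for num_classes=%d' % num_classes)
    encoded = np.zeros((labels.size, num_classes), dtype=np.float32)
    # Advanced indexing sets exactly one entry per row.
    encoded[np.arange(labels.size), labels] = 1.0
    return encoded


if __name__ == '__main__':
    # These labels never reach class 9, yet the output still has 10 columns.
    print(one_hot([0, 3, 7], num_classes=10).shape)
```

If something like this were adopted, `Config.num_classes` would be a natural source for the class count, though whether that field is populated for every run is an assumption.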
-------------------------------------------------------------------------------- /src/img/activationfunctions.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Neoanarika/Searching-for-activation-functions/1615f3fa423ab6a7977aad9d3ebf4a0ace2fab63/src/img/activationfunctions.png -------------------------------------------------------------------------------- /src/img/graph.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Neoanarika/Searching-for-activation-functions/1615f3fa423ab6a7977aad9d3ebf4a0ace2fab63/src/img/graph.png -------------------------------------------------------------------------------- /src/img/loss_rmsprop.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Neoanarika/Searching-for-activation-functions/1615f3fa423ab6a7977aad9d3ebf4a0ace2fab63/src/img/loss_rmsprop.png -------------------------------------------------------------------------------- /src/img/nas.jpeg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Neoanarika/Searching-for-activation-functions/1615f3fa423ab6a7977aad9d3ebf4a0ace2fab63/src/img/nas.jpeg -------------------------------------------------------------------------------- /src/main.py: -------------------------------------------------------------------------------- 1 | # Copyright 2017 The TensorFlow Authors. All Rights Reserved. 2 | # 3 | # Licensed under the Apache License, Version 2.0 (the "License"); 4 | # you may not use this file except in compliance with the License. 
5 | # You may obtain a copy of the License at 6 | # 7 | # http://www.apache.org/licenses/LICENSE-2.0 8 | # 9 | # Unless required by applicable law or agreed to in writing, software 10 | # distributed under the License is distributed on an "AS IS" BASIS, 11 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 12 | # See the License for the specific language governing permissions and 13 | # limitations under the License. 14 | # ============================================================================== 15 | """Runs a ResNet model on the CIFAR-10 dataset.""" 16 | 17 | from __future__ import absolute_import 18 | from __future__ import division 19 | from __future__ import print_function 20 | 21 | import argparse 22 | import os 23 | import sys 24 | 25 | import tensorflow as tf 26 | 27 | import childnetwork as resnet_model 28 | from rnn_controller import Network 29 | from config import Config 30 | from parser import Parser 31 | 32 | parser = argparse.ArgumentParser() 33 | 34 | # Basic model parameters. 35 | parser.add_argument('--data_dir', type=str, default='../data/cifar10_data', 36 | help='The path to the CIFAR-10 data directory.') 37 | 38 | parser.add_argument('--model_dir', type=str, default='/tmp/cifar10_model', 39 | help='The directory where the model will be stored.') 40 | 41 | parser.add_argument('--resnet_size', type=int, default=20, 42 | help='The size of the ResNet model to use. 
We set as 20 by default following the paper') 43 | 44 | parser.add_argument('--train_epochs', type=int, default=100, 45 | help='The number of epochs to train the RNN controller to generate the Activation function') 46 | 47 | parser.add_argument('--epochs_per_eval', type=int, default=10, 48 | help='The number of epochs to run in between evaluations.') 49 | 50 | parser.add_argument('--batch_size', type=int, default=5, 51 | help='The number of images per batch.') 52 | 53 | parser.add_argument( 54 | '--data_format', type=str, default=None, 55 | choices=['channels_first', 'channels_last'], 56 | help='A flag to override the data format used in the model. channels_first ' 57 | 'provides a performance boost on GPU but is not always compatible ' 58 | 'with CPU. If left unspecified, the data format will be chosen ' 59 | 'automatically based on whether TensorFlow was built for CPU or GPU.') 60 | 61 | _HEIGHT = 32 62 | _WIDTH = 32 63 | _DEPTH = 3 64 | _NUM_CLASSES = 10 65 | _NUM_DATA_FILES = 5 66 | 67 | # We use a weight decay of 0.0002, which performs better than the 0.0001 that 68 | # was originally suggested. 
69 | _WEIGHT_DECAY = 2e-4 70 | _MOMENTUM = 0.9 71 | 72 | _NUM_IMAGES = { 73 | 'train': 50000, 74 | 'validation': 10000, 75 | } 76 | 77 | 78 | def record_dataset(filenames): 79 | """Returns an input pipeline Dataset from `filenames`.""" 80 | record_bytes = _HEIGHT * _WIDTH * _DEPTH + 1 81 | return tf.data.FixedLengthRecordDataset(filenames, record_bytes) 82 | 83 | 84 | def get_filenames(is_training, data_dir): 85 | """Returns a list of filenames.""" 86 | data_dir = os.path.join(data_dir, 'cifar-10-batches-bin') 87 | 88 | assert os.path.exists(data_dir), ( 89 | 'Run cifar10_download_and_extract.py first to download and extract the ' 90 | 'CIFAR-10 data.') 91 | 92 | if is_training: 93 | return [ 94 | os.path.join(data_dir, 'data_batch_%d.bin' % i) 95 | for i in range(1, _NUM_DATA_FILES + 1) 96 | ] 97 | else: 98 | return [os.path.join(data_dir, 'test_batch.bin')] 99 | 100 | 101 | def parse_record(raw_record): 102 | """Parse CIFAR-10 image and label from a raw record.""" 103 | # Every record consists of a label followed by the image, with a fixed number 104 | # of bytes for each. 105 | label_bytes = 1 106 | image_bytes = _HEIGHT * _WIDTH * _DEPTH 107 | record_bytes = label_bytes + image_bytes 108 | 109 | # Convert bytes to a vector of uint8 that is record_bytes long. 110 | record_vector = tf.decode_raw(raw_record, tf.uint8) 111 | 112 | # The first byte represents the label, which we convert from uint8 to int32 113 | # and then to one-hot. 114 | label = tf.cast(record_vector[0], tf.int32) 115 | label = tf.one_hot(label, _NUM_CLASSES) 116 | 117 | # The remaining bytes after the label represent the image, which we reshape 118 | # from [depth * height * width] to [depth, height, width]. 119 | depth_major = tf.reshape( 120 | record_vector[label_bytes:record_bytes], [_DEPTH, _HEIGHT, _WIDTH]) 121 | 122 | # Convert from [depth, height, width] to [height, width, depth], and cast as 123 | # float32. 
124 | image = tf.cast(tf.transpose(depth_major, [1, 2, 0]), tf.float32) 125 | 126 | return image, label 127 | 128 | 129 | def preprocess_image(image, is_training): 130 | """Preprocess a single image of layout [height, width, depth].""" 131 | if is_training: 132 | # Resize the image to add four extra pixels on each side. 133 | image = tf.image.resize_image_with_crop_or_pad( 134 | image, _HEIGHT + 8, _WIDTH + 8) 135 | 136 | # Randomly crop a [_HEIGHT, _WIDTH] section of the image. 137 | image = tf.random_crop(image, [_HEIGHT, _WIDTH, _DEPTH]) 138 | 139 | # Randomly flip the image horizontally. 140 | image = tf.image.random_flip_left_right(image) 141 | 142 | # Subtract off the mean and divide by the standard deviation of the pixels. 143 | image = tf.image.per_image_standardization(image) 144 | return image 145 | 146 | 147 | def input_fn(is_training, data_dir, batch_size, num_epochs=1): 148 | """Input_fn using the tf.data input pipeline for CIFAR-10 dataset. 149 | 150 | Args: 151 | is_training: A boolean denoting whether the input is for training. 152 | data_dir: The directory containing the input data. 153 | batch_size: The number of samples per batch. 154 | num_epochs: The number of epochs to repeat the dataset. 155 | 156 | Returns: 157 | A tuple of images and labels. 158 | """ 159 | dataset = record_dataset(get_filenames(is_training, data_dir)) 160 | 161 | if is_training: 162 | # When choosing shuffle buffer sizes, larger sizes result in better 163 | # randomness, while smaller sizes have better performance. Because CIFAR-10 164 | # is a relatively small dataset, we choose to shuffle the full epoch. 
165 | dataset = dataset.shuffle(buffer_size=_NUM_IMAGES['train']) 166 | 167 | dataset = dataset.map(parse_record) 168 | dataset = dataset.map( 169 | lambda image, label: (preprocess_image(image, is_training), label)) 170 | 171 | dataset = dataset.prefetch(2 * batch_size) 172 | 173 | # We call repeat after shuffling, rather than before, to prevent separate 174 | # epochs from blending together. 175 | dataset = dataset.repeat(num_epochs) 176 | 177 | # Batch results by up to batch_size, and then fetch the tuple from the 178 | # iterator. 179 | dataset = dataset.batch(batch_size) 180 | iterator = dataset.make_one_shot_iterator() 181 | images, labels = iterator.get_next() 182 | 183 | return images, labels 184 | 185 | 186 | def cifar10_model_fn(features, labels, mode, params): 187 | """Model function for CIFAR-10.""" 188 | tf.summary.image('images', features, max_outputs=6) 189 | 190 | network = resnet_model.cifar10_resnet_v2_generator( 191 | params['resnet_size'], _NUM_CLASSES, params['data_format']) 192 | 193 | inputs = tf.reshape(features, [-1, _HEIGHT, _WIDTH, _DEPTH]) 194 | logits = network(inputs, mode == tf.estimator.ModeKeys.TRAIN) 195 | 196 | predictions = { 197 | 'classes': tf.argmax(logits, axis=1), 198 | 'probabilities': tf.nn.softmax(logits, name='softmax_tensor') 199 | } 200 | 201 | if mode == tf.estimator.ModeKeys.PREDICT: 202 | return tf.estimator.EstimatorSpec(mode=mode, predictions=predictions) 203 | 204 | # Calculate loss, which includes softmax cross entropy and L2 regularization. 205 | cross_entropy = tf.losses.softmax_cross_entropy( 206 | logits=logits, onehot_labels=labels) 207 | 208 | # Create a tensor named cross_entropy for logging purposes. 209 | tf.identity(cross_entropy, name='cross_entropy') 210 | tf.summary.scalar('cross_entropy', cross_entropy) 211 | 212 | # Add weight decay to the loss. 
213 | loss = cross_entropy + _WEIGHT_DECAY * tf.add_n( 214 | [tf.nn.l2_loss(v) for v in tf.trainable_variables()]) 215 | 216 | if mode == tf.estimator.ModeKeys.TRAIN: 217 | # Scale the learning rate linearly with the batch size. When the batch size 218 | # is 128, the learning rate should be 0.1. 219 | initial_learning_rate = 0.1 * params['batch_size'] / 128 220 | batches_per_epoch = _NUM_IMAGES['train'] / params['batch_size'] 221 | global_step = tf.train.get_or_create_global_step() 222 | 223 | # Multiply the learning rate by 0.1 at 100, 150, and 200 epochs. 224 | boundaries = [int(batches_per_epoch * epoch) for epoch in [100, 150, 200]] 225 | values = [initial_learning_rate * decay for decay in [1, 0.1, 0.01, 0.001]] 226 | learning_rate = tf.train.piecewise_constant( 227 | tf.cast(global_step, tf.int32), boundaries, values) 228 | 229 | # Create a tensor named learning_rate for logging purposes 230 | tf.identity(learning_rate, name='learning_rate') 231 | tf.summary.scalar('learning_rate', learning_rate) 232 | 233 | optimizer = tf.train.MomentumOptimizer( 234 | learning_rate=learning_rate, 235 | momentum=_MOMENTUM) 236 | 237 | # Batch norm requires update ops to be added as a dependency to the train_op 238 | update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS) 239 | with tf.control_dependencies(update_ops): 240 | train_op = optimizer.minimize(loss, global_step) 241 | else: 242 | train_op = None 243 | 244 | accuracy = tf.metrics.accuracy( 245 | tf.argmax(labels, axis=1), predictions['classes']) 246 | metrics = {'accuracy': accuracy} 247 | 248 | # Create a tensor named train_accuracy for logging purposes 249 | tf.identity(accuracy[1], name='train_accuracy') 250 | tf.summary.scalar('train_accuracy', accuracy[1]) 251 | 252 | return tf.estimator.EstimatorSpec( 253 | mode=mode, 254 | predictions=predictions, 255 | loss=loss, 256 | train_op=train_op, 257 | eval_metric_ops=metrics) 258 | 259 | 260 | def main(unused_argv): 261 | # Using the Winograd non-fused algorithms 
provides a small performance boost. 262 | os.environ['TF_ENABLE_WINOGRAD_NONFUSED'] = '1' 263 | os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3' 264 | gamma = 0.95 265 | #RNN controller 266 | args = Parser().get_parser().parse_args() 267 | #Defining rnn 268 | val_accuracy = tf.placeholder(tf.float32) 269 | config = Config(args) 270 | net = Network(config) 271 | 272 | #Generate hyperparams 273 | A_t = tf.zeros((1,1)) 274 | # PPO implementation 275 | for i in range(FLAGS.train_epochs): 276 | outputs,prob,value = net.neural_search() 277 | hyperparams = net.gen_hyperparams(outputs) 278 | tf.assert_rank_at_least(tf.convert_to_tensor(prob),1,message="prob must have rank at least 1") 279 | c_1=1 280 | c_2=0.01 281 | if i >0 : 282 | #Policy ratio 283 | #We write it as tf.exp(tf.log(prob) - tf.log(old_prob)) instead of prob/old_prob 284 | #to improve numerical stability 285 | r = tf.exp(tf.log(prob) - tf.log(old_prob)) 286 | #Enforcing the Bellman equation 287 | delta_t = eval_results["accuracy"] + gamma*value - old_value 288 | A_t = delta_t + gamma*A_t 289 | L_clip = net.Lclip(eval_results["accuracy"],A_t) 290 | L_vf = net.Lvf(delta_t) 291 | entropy_penalty = -tf.reduce_sum(tf.exp(tf.add(tf.log(prob),tf.log(tf.log(tf.clip_by_value(prob, 1e-10, 1.0)))))) 292 | tf.assert_rank(L_clip,0,message="L_clip is computed wrongly, wrong rank") 293 | tf.assert_rank(L_vf,0,message="L_vf is computed wrongly, wrong rank") 294 | tf.assert_rank(entropy_penalty,0,message="entropy_penalty is computed wrongly, wrong rank") 295 | total_loss = L_clip - c_1*L_vf + c_2 * entropy_penalty 296 | tf.summary.scalar('loss',total_loss) 297 | 298 | tf_config=tf.ConfigProto(allow_soft_placement=True, log_device_placement=True) 299 | tf_config.gpu_options.allow_growth = True 300 | sess = tf.Session(config=tf_config) 301 | sess.run(tf.global_variables_initializer()) 302 | sess.run(tf.local_variables_initializer()) 303 | merged = tf.summary.merge_all() 304 | train_writer = 
tf.summary.FileWriter('train',sess.graph) 305 | 306 | # Set up a RunConfig to only save checkpoints once per training cycle. 307 | #run_config = tf.estimator.RunConfig().replace(session_config=tf.ConfigProto(log_device_placement=True),save_checkpoints_secs=1e9) 308 | print(sess.run(hyperparams)) 309 | print(sess.run(value)) 310 | # tmp is a temporary file which stores the encoded activation function. 311 | # main.py uses it to pass the activation function to the child network, which reads the file while the program is running. 312 | # It also acts as a cache file that stores the final activation function found by the algorithm. 313 | with open("tmp","w") as f: 314 | f.write(' '.join(map(str,sess.run(hyperparams)))) 315 | 316 | run_config = tf.estimator.RunConfig().replace(session_config=tf.ConfigProto(allow_soft_placement=True,log_device_placement=True)) 317 | cifar_classifier = tf.estimator.Estimator( 318 | model_fn=cifar10_model_fn, model_dir=FLAGS.model_dir, config=run_config, 319 | params={ 320 | 'resnet_size': FLAGS.resnet_size, 321 | 'data_format': FLAGS.data_format, 322 | 'batch_size': FLAGS.batch_size, 323 | }) 324 | 325 | for _ in range(FLAGS.train_epochs // FLAGS.epochs_per_eval): 326 | tensors_to_log = { 327 | 'learning_rate': 'learning_rate', 328 | 'cross_entropy': 'cross_entropy', 329 | 'train_accuracy': 'train_accuracy' 330 | } 331 | 332 | logging_hook = tf.train.LoggingTensorHook( 333 | tensors=tensors_to_log, every_n_iter=100) 334 | 335 | cifar_classifier.train( 336 | input_fn=lambda: input_fn( 337 | True, FLAGS.data_dir, FLAGS.batch_size, FLAGS.epochs_per_eval)) 338 | 339 | # Evaluate the model and print results 340 | eval_results = cifar_classifier.evaluate( 341 | input_fn=lambda: input_fn(False, FLAGS.data_dir, FLAGS.batch_size)) 342 | print(eval_results) 343 | 344 | old_prob = tf.identity(prob) 345 | old_value = tf.identity(value) 346 | 347 | if i >0 : 348 | print("Training RNN") 349 | tr_cont_step = net.update(total_loss) 350 | 
sess.run(tf.global_variables_initializer()) 351 | _ = sess.run(tr_cont_step, feed_dict={val_accuracy : eval_results["accuracy"]}) 352 | print("RNN Trained") 353 | assert A_t !=tf.zeros((1,1)), "Advantage function was not computed correctly" 354 | 355 | 356 | if __name__ == '__main__': 357 | tf.logging.set_verbosity(tf.logging.INFO) 358 | FLAGS, unparsed = parser.parse_known_args() 359 | tf.app.run(argv=[sys.argv[0]] + unparsed) 360 | -------------------------------------------------------------------------------- /src/parser.py: -------------------------------------------------------------------------------- 1 | import argparse 2 | 3 | class Parser(object): 4 | def __init__(self): 5 | parser = argparse.ArgumentParser() 6 | parser.add_argument("--path", default="../", help="Base Path for the Folder") 7 | parser.add_argument("--project", default="NASuRL", help="Project Folder") 8 | parser.add_argument("--folder_suffix", default="Default", help="Folder Name Suffix") 9 | parser.add_argument("--dataset", default="cifar-10", help="Name of the Dataset") 10 | parser.add_argument("--num_classes", default=10, type=int, help="Number of Classes") 11 | parser.add_argument("--opt", default="sgd", help="Optimizer : adam, rmsprop, sgd, normal") 12 | parser.add_argument("--hyperparams", default=10, help="Number of Hyperparameters to search") 13 | parser.add_argument("--lr", default=0.1, help="Learning Rate", type=float) 14 | parser.add_argument("--batch_size", default=75, help="Batch Size", type=int) 15 | parser.add_argument("--dropout", default=0.5, help="Dropout Probab. 
for Pre-Final Layer", type=float) 16 | parser.add_argument("--max_epochs", default=100, help="Maximum Number of Epochs", type=int) 17 | parser.add_argument("--debug", default=False, type=self.str_to_bool, help="Debug Mode") 18 | parser.add_argument("--load", default=False, type=self.str_to_bool, help="Load Model to calculate accuracy") 19 | self.parser=parser 20 | 21 | def str_to_bool(self, string): 22 | if string.lower() == "true": 23 | return True 24 | elif string.lower() == "false": 25 | return False 26 | else : 27 | raise argparse.ArgumentTypeError("Boolean Value Expected") 28 | 29 | def get_parser(self): 30 | return self.parser -------------------------------------------------------------------------------- /src/rnn_controller.py: -------------------------------------------------------------------------------- 1 | #RNN controller code 2 | 3 | import os 4 | import sys 5 | import utils 6 | import numpy as np 7 | import tensorflow as tf 8 | 9 | class Network(object): 10 | # My concern is that some of these activation functions might be numerically unstable due to the implementation 11 | # tf.log(1+exp(x)) is one of these things 12 | 13 | def __init__(self, config): 14 | self.config = config 15 | self.n_steps = 10 16 | self.n_input, self.n_hidden = 4, 2 17 | self.state = tf.Variable(tf.random_normal(shape=[1, 4])) 18 | self.lstm = tf.contrib.rnn.BasicLSTMCell(self.n_hidden, forget_bias=1.0, state_is_tuple=False) 19 | self.Wc, self.bc = self.init_controller_vars() 20 | self.Wv, self.bv = self.init_value_vars() 21 | 22 | # Other functions used in the paper 23 | # self.full_list_unary = {1:lambda x:x ,2:lambda x: -x, 3: tf.abs, 4:lambda x : tf.pow(x,2),5:lambda x : tf.pow(x,3), 24 | # 6:tf.sqrt,7:lambda x: tf.Variable(tf.truncated_normal([1], stddev=0.08))*x, 25 | # 8:lambda x : x + tf.Variable(tf.truncated_normal([1], stddev=0.08)),9:lambda x: tf.log(tf.abs(x)+10e-8), 26 | # 10:tf.exp,11:tf.sin,12:tf.sinh,13:tf.cosh,14:tf.tanh,15:tf.asinh,16:tf.atan,17:lambda x: 
tf.sin(x)/x, 27 | # 18:lambda x : tf.maximum(x,0),19:lambda x : tf.minimum(x,0),20:tf.sigmoid,21:lambda x:tf.log(1+tf.exp(x)), 28 | # 22:lambda x:tf.exp(-tf.pow(x,2)),23:tf.erf,24:lambda x: tf.Variable(tf.truncated_normal([1], stddev=0.08))} 29 | # 30 | # self.full_list_binary = {1:lambda x,y: x+y,2:lambda x,y:x*y,3:lambda x,y:x-y,4:lambda x,y:x/(y+10e-8), 31 | # 5:lambda x,y:tf.maximum(x,y),6:lambda x,y: tf.sigmoid(x)*y,7:lambda x,y:tf.exp(-tf.Variable(tf.truncated_normal([1], stddev=0.08))*tf.pow(x-y,2)), 32 | # 8:lambda x,y:tf.exp(-tf.Variable(tf.truncated_normal([1], stddev=0.08))*tf.abs(x-y)), 33 | # 9:lambda x,y: tf.Variable(tf.truncated_normal([1], stddev=0.08))*x + (1-tf.Variable(tf.truncated_normal([1], stddev=0.08)))*y} 34 | # 35 | # self.unary = {1:lambda x:x ,2:lambda x: -x, 3: lambda x: tf.maximum(x,0), 4:lambda x : tf.pow(x,2),5:tf.tanh} 36 | # binary = {1:lambda x,y: x+y,2:lambda x,y:x*y,3:lambda x,y:x-y,4:lambda x,y:tf.maximum(x,y),5:lambda x,y: tf.sigmoid(x)*y} 37 | # inputs = {1:lambda x:x , 2:lambda x:0, 3: lambda x:3.14159265,4: lambda x : 1, 5: lambda x: 1.61803399} 38 | 39 | def weight_variable(self, shape, name): 40 | return tf.Variable(tf.random_normal(shape=shape), name=name) 41 | 42 | def bias_variable(self, shape, name): 43 | return tf.Variable(tf.random_normal(shape=shape), name=name) 44 | 45 | def init_controller_vars(self): 46 | Wc = self.weight_variable(shape=[self.n_hidden, self.n_input], name="w_controller") 47 | bc = self.bias_variable(shape=[self.n_input], name="b_controller") 48 | return Wc, bc 49 | 50 | def init_value_vars(self): 51 | Wv = self.weight_variable(shape=[self.n_hidden, 1], name="w_controller") 52 | bv = self.bias_variable(shape=[1], name="b_controller") 53 | return Wv, bv 54 | 55 | def neural_search(self): 56 | inp = tf.constant(np.ones((1, 4), dtype="float32")) 57 | output = list() 58 | for _ in range(self.n_steps): 59 | inp, self.state = self.lstm(inp, self.state) 60 | value = tf.nn.softmax(tf.matmul(inp, self.Wv) 
+ self.bv) 61 | inp = tf.nn.softmax(tf.matmul(inp, self.Wc) + self.bc) 62 | output.append(inp[0, :]) 63 | out = [utils.max(output[i]) for i in range(self.n_steps)] 64 | return out, output[-1],value 65 | 66 | def gen_hyperparams(self, output): 67 | options = tf.constant([1,2,3,4], dtype=tf.int32) 68 | hyperparams = [1 for _ in range(self.n_steps)] 69 | # Change the following based on number of hyperparameters to be predicted 70 | # Removing strides for now 71 | hyperparams[0], hyperparams[1] = options[output[0]], options[output[1]] 72 | hyperparams[2] = options[output[2]] # Layer 1 73 | hyperparams[3], hyperparams[4] = options[output[3]], options[output[4]] 74 | hyperparams[5] = options[output[5]] # Layer 2 75 | hyperparams[6], hyperparams[7] = options[output[6]], options[output[7]] 76 | hyperparams[8] = options[output[8]] # Layer 3 77 | hyperparams[9] = options[output[9]] # FNN Layer 78 | return hyperparams 79 | 80 | def REINFORCE(self, prob): 81 | loss = tf.reduce_mean(tf.log(prob)) # Might have to take the negative 82 | return loss 83 | 84 | def entropyloss(self,prob): 85 | tf.assert_rank_at_least(tf.log(tf.log(tf.clip_by_value(prob, 1e-10, 1.0))),1,message="clipping is computed wrongly, wrong rank") 86 | tf.assert_rank_at_least(tf.log(prob),1,message="log(prob) is computed wrongly, wrong rank") 87 | entropy = -tf.reduce_mean(tf.exp(tf.add(tf.log(prob),tf.log(tf.log(tf.clip_by_value(prob, 1e-10, 1.0))))), axis=1) 88 | return entropy 89 | 90 | def Lclip(self,val_accuracy,a_t): 91 | e = 0.2 92 | return tf.reduce_mean(tf.minimum(val_accuracy*a_t,tf.clip_by_value(val_accuracy,1-e,1+e)*a_t)) 93 | 94 | def Lvf(self,delta): 95 | return tf.reduce_mean(tf.square(delta)) 96 | 97 | def train_controller(self, reinforce_loss, val_accuracy): 98 | #Adam was used to train the RNN controller Bello et al 2017 99 | learning_rate = 1e-5 #As per Bello et al 2017 100 | optimizer = tf.train.AdamOptimizer(learning_rate) 101 | var_list = [self.Wc, self.bc] 102 | gradients = 
optimizer.compute_gradients(loss=reinforce_loss, var_list=var_list) 103 | for i, (grad, var) in enumerate(gradients): 104 | if grad is not None: 105 | gradients[i] = (grad * val_accuracy, var) 106 | return optimizer.apply_gradients(gradients) 107 | 108 | def update(self, reinforce_loss): 109 | #Adam was used to train the RNN controller Bello et al 2017 110 | learning_rate = 1e-5 #As per Bello et al 2017 111 | optimizer = tf.train.AdamOptimizer(learning_rate) 112 | var_list = [self.Wc, self.bc] 113 | gradients = optimizer.compute_gradients(loss=reinforce_loss, var_list=var_list) 114 | return optimizer.apply_gradients(gradients) 115 | -------------------------------------------------------------------------------- /src/swish.py: -------------------------------------------------------------------------------- 1 | # Module 7: Convolutional Neural Network (CNN) 2 | # CNN model with dropout for MNIST dataset 3 | 4 | # CNN structure: 5 | # · · · · · · · · · · input data X [batch, 28, 28, 1] 6 | # @ @ @ @ @ @ @ @ @ @ -- conv. layer 5x5x1x4 stride 1 W1 [5, 5, 1, 4] 7 | # ∶∶∶∶∶∶∶∶∶∶∶∶∶∶∶∶∶∶∶ Y1 [batch, 28, 28, 4] 8 | # @ @ @ @ @ @ @ @ -- conv. layer 5x5x4x8 with max pooling stride 2 W2 [5, 5, 4, 8] 9 | # ∶∶∶∶∶∶∶∶∶∶∶∶∶∶∶ Y2 [batch, 14, 14, 8] 10 | # @ @ @ @ @ @ -- conv. 
layer 4x4x8x12 stride 2 with max pooling stride 2 W3 [4, 4, 8, 12] 11 | # ∶∶∶∶∶∶∶∶∶∶∶ Y3 [batch, 7, 7, 12] 12 | # \x/x\x\x/ -- fully connected layer (relu) W4 [7*7*12, 200] 13 | # · · · · Y4 [batch, 200] 14 | # \x/x\x/ -- fully connected layer (softmax) W5 [200, 10] 15 | # · · · Y [batch, 10] 16 | 17 | import os 18 | os.environ['TF_CPP_MIN_LOG_LEVEL']='2' 19 | 20 | # Hyper Parameters 21 | learning_rate = 0.01 22 | training_epochs = 2 23 | batch_size = 100 24 | 25 | import tensorflow as tf 26 | 27 | from tensorflow.examples.tutorials.mnist import input_data 28 | mnist = input_data.read_data_sets("mnist", one_hot=True,reshape=False,validation_size=0) 29 | logdir = '/users/mingliangang/Desktop/grad_cnn' 30 | # Step 1: Initial Setup 31 | X = tf.placeholder(tf.float32, [None, 28, 28, 1]) 32 | y = tf.placeholder(tf.float32, [None, 10]) 33 | pkeep = tf.placeholder(tf.float32) 34 | 35 | L1 = 4 # first convolutional filters 36 | L2 = 8 # second convolutional filters 37 | L3 = 12 # third convolutional filters 38 | L4 = 200 # fully connected neurons 39 | 40 | W1 = tf.Variable(tf.truncated_normal([5,5,1,L1], stddev=0.08)) 41 | B1 = tf.Variable(tf.zeros([L1])) 42 | beta1 = tf.Variable(tf.truncated_normal([1], stddev=0.08)) 43 | W2 = tf.Variable(tf.truncated_normal([5,5,L1,L2], stddev=0.08)) 44 | B2 = tf.Variable(tf.zeros([L2])) 45 | beta2 = tf.Variable(tf.truncated_normal([1], stddev=0.08)) 46 | W3 = tf.Variable(tf.truncated_normal([4,4,L2,L3], stddev=0.08)) 47 | B3 = tf.Variable(tf.zeros([L3])) 48 | beta3 = tf.Variable(tf.truncated_normal([1], stddev=0.08)) 49 | W4 = tf.Variable(tf.truncated_normal([7*7*L3,L4], stddev=0.08)) 50 | B4 = tf.Variable(tf.zeros([L4])) 51 | W5 = tf.Variable(tf.truncated_normal([L4, 10], stddev=0.08)) 52 | B5 = tf.Variable(tf.zeros([10])) 53 | #tf.summary.scalar('W1',tf.reduce_mean(W1)) 54 | 55 | # Step 2: Setup Model 56 | x1 = tf.nn.conv2d(X, W1, strides=[1,1,1,1], padding='SAME') + B1 57 | Y1 = x1*tf.nn.sigmoid(beta1*x1)# output is 28x28 58 | x2 = 
tf.nn.conv2d(Y1, W2, strides=[1,1,1,1], padding='SAME') + B2 59 | Y2 = x2*tf.nn.sigmoid(beta2*x2) 60 | Y2 = tf.nn.max_pool(Y2, ksize=[1,2,2,1], strides=[1,2,2,1], padding='SAME') # output is 14x14 61 | Y2= tf.nn.dropout(Y2, pkeep) 62 | x3 = tf.nn.conv2d(Y2, W3, strides=[1,1,1,1], padding='SAME') + B3 63 | Y3 = x3*tf.nn.sigmoid(beta3*x3) 64 | Y3 = tf.nn.max_pool(Y3, ksize=[1,2,2,1], strides=[1,2,2,1], padding='SAME') # output is 7x7 65 | Y3= tf.nn.dropout(Y3, pkeep) 66 | 67 | # Flatten the third convolution for the fully connected layer 68 | YY = tf.reshape(Y3, shape=[-1, 7 * 7 * L3]) 69 | 70 | Y4 = tf.nn.relu(tf.matmul(YY, W4) + B4) 71 | #YY4 = tf.nn.dropout(Y4, 0.3) 72 | Ylogits = tf.matmul(Y4, W5) + B5 73 | yhat = tf.nn.softmax(Ylogits) 74 | 75 | # Step 3: Loss Functions 76 | loss = tf.reduce_mean( 77 | tf.nn.softmax_cross_entropy_with_logits(logits=Ylogits, labels=y)) 78 | tf.summary.scalar('loss',loss) 79 | # Step 4: Optimizer 80 | #optimizer = tf.train.RMSPropOptimizer(learning_rate) 81 | optimizer = tf.train.AdamOptimizer(learning_rate) 82 | #optimizer = tf.train.AdamOptimizer() 83 | grad = optimizer.compute_gradients(loss) 84 | tf.summary.scalar('beta1',tf.reduce_mean(beta1)) 85 | tf.summary.scalar('grad',tf.reduce_mean(grad[0][0])) 86 | tf.summary.scalar('W1',tf.reduce_mean(grad[0][1])) 87 | tf.summary.histogram('grad',tf.reduce_mean(grad[0][0])) 88 | tf.summary.histogram('W1',tf.reduce_mean(grad[0][1])) 89 | tf.summary.histogram('beta1',tf.reduce_mean(beta1)) 90 | 91 | train = optimizer.minimize(loss) 92 | 93 | # accuracy of the trained model, between 0 (worst) and 1 (best) 94 | is_correct = tf.equal(tf.argmax(y,1),tf.argmax(yhat,1)) 95 | accuracy = tf.reduce_mean(tf.cast(is_correct,tf.float32)) 96 | 97 | init = tf.global_variables_initializer() 98 | sess = tf.Session() 99 | merged = tf.summary.merge_all() 100 | writer = tf.summary.FileWriter(logdir+ '/train', 101 | sess.graph) 102 | sess.run(init) 103 | 104 | # Step 5: Training Loop 105 | for epoch in 
range(training_epochs): 106 | num_batches = int(mnist.train.num_examples / batch_size) 107 | for i in range(num_batches): 108 | batch_X, batch_y = mnist.train.next_batch(batch_size) 109 | train_data = {X: batch_X, y: batch_y, pkeep: 0.5} 110 | summary,_ = sess.run([merged,train], feed_dict=train_data) 111 | writer.add_summary(summary,epoch*num_batches+i+1) 112 | print(epoch * num_batches + i + 1, "Training accuracy =", sess.run(accuracy, feed_dict=train_data), 113 | "Loss =", sess.run(loss, feed_dict=train_data)) 114 | 115 | # Step 6: Evaluation 116 | test_data = {X:mnist.test.images,y:mnist.test.labels, pkeep: 1.0} 117 | print("Testing Accuracy = ", sess.run(accuracy, feed_dict = test_data)) 118 | -------------------------------------------------------------------------------- /src/train/events.out.tfevents.1514935452.6bf252a4b161: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Neoanarika/Searching-for-activation-functions/1615f3fa423ab6a7977aad9d3ebf4a0ace2fab63/src/train/events.out.tfevents.1514935452.6bf252a4b161 -------------------------------------------------------------------------------- /src/utils.py: -------------------------------------------------------------------------------- 1 | import os 2 | import shutil 3 | import numpy as np 4 | import tensorflow as tf 5 | 6 | def path_exists(path, overwrite=False): 7 | if not os.path.isdir(path): 8 | os.mkdir(path) 9 | elif overwrite == True : 10 | shutil.rmtree(path) 11 | return path 12 | 13 | def remove_dir(path): 14 | os.rmdir(path) 15 | return True 16 | 17 | def relu_init(shape, dtype=tf.float32, partition_info=None): 18 | init_range = np.sqrt(2.0 / shape[1]) 19 | return tf.random_normal(shape, dtype=dtype) * init_range 20 | 21 | def ones(shape, dtype=tf.float32): 22 | return tf.ones(shape, dtype=dtype) 23 | 24 | def zeros(shape, dtype=tf.float32): 25 | return tf.zeros(shape, dtype=dtype) 26 | 27 | def tanh_init(shape, dtype=tf.float32, 
partition_info=None): 28 | init_range = np.sqrt(6.0 / (shape[0] + shape[1])) 29 | return tf.random_uniform(shape, minval=-init_range, maxval=init_range, dtype=dtype) 30 | 31 | def leaky_relu(X, alpha=0.01): 32 | return tf.maximum(X, alpha * X) 33 | 34 | def max(input): 35 | return tf.argmax(input) 36 | --------------------------------------------------------------------------------
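The two initializers in `src/utils.py` scale random weights by a factor derived from the weight-matrix shape: `relu_init` multiplies a standard normal by the square root of `2 / shape[1]` (assuming a square root is what is intended there), and `tanh_init` samples uniformly within plus or minus `sqrt(6 / (shape[0] + shape[1]))`. A minimal sketch of just those scale factors and of `leaky_relu` for scalars, in plain Python with no TensorFlow dependency; the helper names `relu_init_scale`, `tanh_init_limit` and `leaky_relu_scalar` are hypothetical and not part of the repository:

```python
import math


def relu_init_scale(shape):
    """Multiplier applied to a standard normal in relu_init: sqrt(2 / shape[1])."""
    return math.sqrt(2.0 / shape[1])


def tanh_init_limit(shape):
    """Half-width of the uniform range in tanh_init: sqrt(6 / (shape[0] + shape[1]))."""
    return math.sqrt(6.0 / (shape[0] + shape[1]))


def leaky_relu_scalar(x, alpha=0.01):
    """Scalar version of leaky_relu: max(x, alpha * x)."""
    return max(x, alpha * x)


if __name__ == "__main__":
    print(relu_init_scale((3, 8)))    # scale for a [3, 8] weight matrix
    print(tanh_init_limit((2, 4)))    # uniform limit for a [2, 4] weight matrix
    print(leaky_relu_scalar(-2.0))    # negative inputs are scaled by alpha
```

Wider layers get smaller scales, which keeps the variance of the pre-activations roughly constant as the layer size changes.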