├── LICENSE
├── README.md
├── img
    ├── Rnn.png
    ├── graph.png
    ├── nas.jpeg
    ├── swish_.png
    ├── swish_com.png
    └── swish_graph.png
└── src
    ├── Searching for activation function.ipynb
    ├── WMT14.py
    ├── __pycache__
        ├── config.cpython-35.pyc
        ├── config.cpython-36.pyc
        ├── dataset.cpython-35.pyc
        ├── models.cpython-35.pyc
        ├── models.cpython-36.pyc
        ├── network.cpython-35.pyc
        ├── network.cpython-36.pyc
        ├── parser.cpython-35.pyc
        ├── parser.cpython-36.pyc
        ├── resnet.cpython-35.pyc
        ├── resnet.cpython-36.pyc
        ├── resnet_model.cpython-35.pyc
        ├── resnet_model.cpython-36.pyc
        ├── utils.cpython-35.pyc
        └── utils.cpython-36.pyc
    ├── childnetwork.py
    ├── cifar100_download_and_extract.py
    ├── cifar100_test.py
    ├── cifar100_train.py
    ├── cifar10_download_and_extract.py
    ├── config.py
    ├── dataset.py
    ├── img
        ├── Rnn.png
        ├── activationfunctions.png
        ├── graph.png
        ├── loss_rmsprop.png
        └── nas.jpeg
    ├── main.py
    ├── parser.py
    ├── rnn_controller.py
    ├── swish.py
    ├── train
        └── events.out.tfevents.1514935452.6bf252a4b161
    └── utils.py


/LICENSE:
--------------------------------------------------------------------------------
 1 | The MIT License
 2 | 
 3 | Copyright (c) Ang Ming Liang
 4 | 
 5 | Permission is hereby granted, free of charge, to any person obtaining a copy
 6 | of this software and associated documentation files (the "Software"), to deal
 7 | in the Software without restriction, including without limitation the rights
 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
 9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 | 
12 | The above copyright notice and this permission notice shall be included in
13 | all copies or substantial portions of the Software.
14 | 
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
21 | THE SOFTWARE.
22 | 


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
 1 | # Searching for activation functions 
 2 | 
 3 | This project attempts to implement the 2017 arXiv paper "Searching for Activation Functions" (Ramachandran, Zoph & Le, 2017). Although neural networks are powerful and flexible models, they are still hard to design and limited by human creativity. Using a combination of exhaustive and reinforcement learning-based search, the paper claims to discover multiple novel activation functions. We tried to verify these claims by replicating the original study. However, we were unable to get good results, probably because we lacked the massive computing resources used in the original experiment (800 Titan X GPUs).
 4 | 
 5 | ![alt text](https://github.com/Neoanarika/Searching-for-activation-functions/blob/master/img/nas.jpeg)
 6 | 
 7 | # Dependencies 
 8 | 
 9 | - Anaconda3
10 | - TensorFlow-GPU >=1.4
11 | 
12 | # Setting up the docker environment
13 | If you do not have the right dependencies to run this project, you can use the docker image that we used to run these experiments.
14 | ```
15 | docker pull etheleon/dotfiles
16 | docker run --runtime=nvidia -it etheleon/dotfiles
17 | ```
18 | 
19 | 
20 | # Running the code
21 | Clone the repo first, then navigate into the src folder, where the code of this project is stored
22 | ``` 
23 | git clone https://github.com/Neoanarika/Searching-for-activation-functions.git
24 | cd Searching-for-activation-functions
25 | cd src
26 | ```
27 | Download the data first, then find the activation functions
28 | ```
29 | python cifar10_download_and_extract.py
30 | python main.py
31 | ```
32 | 
33 | Next, test against your newly generated activation functions 
34 | ```
35 | python cifar100_download_and_extract.py
36 | python cifar100_train.py
37 | python cifar100_test.py
38 | ```
39 | 
40 | Or you can open up the jupyter notebook in the repo and run from there. 
41 | 
42 | # RNN controller 
43 | 
44 | ![alt text](https://github.com/Neoanarika/Searching-for-activation-functions/blob/master/img/Rnn.png)
45 | ![alt text](https://github.com/Neoanarika/Searching-for-activation-functions/blob/master/img/graph.png)
46 | 
47 | # Some sample activation functions found
48 | | Activation functions  |
49 | | ------------- |
50 | | 3x  |
51 | | 1  |
52 | | -3  |
53 | 
54 | Clearly we are doing something wrong. The difficulty with reimplementing a paper like this is that a negative result is hard to interpret: the search may not have run long enough, or there may be a bug in the program that we are unaware of.
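
To see how degenerate results such as `3x`, `1` and `-3` can arise, it helps to decode a token string by hand. The sketch below mirrors the nesting used in `src/childnetwork.py`, but with plain Python scalars instead of TensorFlow tensors. The `decode` helper and the example token strings are illustrative assumptions; they are not the strings our search actually produced.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


# Scalar stand-ins for the lookup tables in src/childnetwork.py.
UNARY = {
    "1": lambda x: x,
    "2": lambda x: -x,
    "3": lambda x: max(x, 0.0),
    "4": lambda x: x ** 2,
    "5": math.tanh,
}
BINARY = {
    "1": lambda x, y: x + y,
    "2": lambda x, y: x * y,
    "3": lambda x, y: x - y,
    "4": lambda x, y: max(x, y),
    "5": lambda x, y: sigmoid(x) * y,
}
INPUT = {
    "1": lambda x: x,
    "2": lambda x: 0.0,
    "3": lambda x: 2.0,
    "4": lambda x: 1.0,
    "5": lambda x: -1.0,
}


def decode(tokens):
    """Build f(x) from nine whitespace-separated tokens (hypothetical helper)."""
    a = tokens.split()

    def f(x):
        left = UNARY[a[2]](INPUT[a[0]](x))
        right = UNARY[a[3]](INPUT[a[1]](x))
        core = UNARY[a[5]](BINARY[a[4]](left, right))
        extra = UNARY[a[7]](INPUT[a[6]](x))
        return BINARY[a[8]](core, extra)

    return f


for tokens in ("1 1 1 1 1 1 1 1 1",   # (x + x) + x
               "4 4 1 1 2 1 4 1 2",   # (1 * 1) * 1
               "3 4 1 1 1 2 2 1 1",   # -(2 + 1) + 0
               "1 2 1 1 1 1 1 1 5"):  # sigmoid(x + 0) * x
    print(tokens, "->", decode(tokens)(0.5))
```

The search space therefore contains linear and constant functions alongside Swish-like ones, so a controller that has not yet learned anything useful can easily emit them.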
55 | 
56 | # Evaluating Swish
57 | We also implemented Swish, the activation function found and discussed in the original paper.
58 | 
59 | ![alt text](https://github.com/Neoanarika/Searching-for-activation-functions/blob/master/img/swish_.png)
60 | 
61 | ```
62 | python swish.py
63 | ```
64 | 
65 | ![alt text](https://github.com/Neoanarika/Searching-for-activation-functions/blob/master/src/img/loss_rmsprop.png)
66 | 
67 | We found a few things. First, during the initial phase of training, the loss sometimes stays flat on average. This suggests that Swish suffers from poor initialisation, at least when the weights are initially drawn from a normal distribution with std_dev = 0.1. We tried various initialisations but found no improvement. Finally, changing the optimiser from SGD to RMSprop solved the problem. The diagram above is from training with RMSprop.
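
For reference, here is a minimal sketch of Swish with beta fixed to 1, f(x) = x * sigmoid(x), together with its derivative. It uses plain Python scalars rather than the TensorFlow ops in `swish.py`, and the function names are our own. It only illustrates how the function behaves for small inputs, as produced by small initial weights; it does not by itself explain the flat loss described above.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def swish(x):
    """Swish with beta = 1: x * sigmoid(x)."""
    return x * sigmoid(x)


def swish_grad(x):
    """Derivative of swish: sigmoid(x) + x * sigmoid(x) * (1 - sigmoid(x))."""
    s = sigmoid(x)
    return s + x * s * (1.0 - s)


# Near zero, Swish is close to the linear function x / 2.
for x in (-0.1, 0.0, 0.1):
    print(x, swish(x), swish_grad(x))
```

Unlike ReLU, whose derivative is 0 for negative inputs and 1 for positive ones, Swish passes gradient on both sides of zero, with a slope of 0.5 at the origin.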
68 | 
69 | 
70 | # Visualising Swish activation function
71 | ![alt text](https://github.com/Neoanarika/Searching-for-activation-functions/blob/master/img/swish_com.png)
72 | 
73 | Swish has a sharp global minimum, especially when compared with ReLU, which may account for the high variance of the gradient updates, as the model might get stuck in the wedge on its way to the global minimum. Learning rate decay might thus help improve training for models using Swish. Furthermore, a sharper minimum is associated with poorer generalisation, which might explain why Swish performs slightly worse than ReLU in practice.
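
As a sketch of what learning rate decay could look like, the function below implements a simple exponential schedule in plain Python. The base rate, decay interval and decay factor are hypothetical values chosen for illustration; we have not tested whether such a schedule actually helps `swish.py`.

```python
def exponential_decay(base_lr, step, decay_steps, decay_rate, staircase=False):
    """Return base_lr * decay_rate ** (step / decay_steps).

    With staircase=True the exponent uses integer division, so the rate
    drops in discrete jumps every decay_steps steps.
    """
    exponent = step // decay_steps if staircase else step / decay_steps
    return base_lr * decay_rate ** exponent


# Hypothetical settings: start at 0.1 and halve every 1000 steps.
for step in (0, 500, 1000, 2000):
    print(step, exponential_decay(0.1, step, decay_steps=1000, decay_rate=0.5))
```

TensorFlow 1.x provides a schedule of the same form as `tf.train.exponential_decay`, whose output can be passed as the learning rate of an optimiser.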
74 | 
75 | # Citation
76 | ```
77 | @article{DBLP:journals/corr/abs-1710-05941,
78 |   author    = {Prajit Ramachandran and
79 |                Barret Zoph and
80 |                Quoc V. Le},
81 |   title     = {Searching for Activation Functions},
82 |   journal   = {CoRR},
83 |   volume    = {abs/1710.05941},
84 |   year      = {2017},
85 |   url       = {http://arxiv.org/abs/1710.05941},
86 |   archivePrefix = {arXiv},
87 |   eprint    = {1710.05941},
88 |   timestamp = {Wed, 01 Nov 2017 19:05:42 +0100},
89 |   biburl    = {http://dblp.org/rec/bib/journals/corr/abs-1710-05941},
90 |   bibsource = {dblp computer science bibliography, http://dblp.org}
91 | }
92 | ```
93 | 
94 | 


--------------------------------------------------------------------------------
/img/Rnn.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Neoanarika/Searching-for-activation-functions/1615f3fa423ab6a7977aad9d3ebf4a0ace2fab63/img/Rnn.png


--------------------------------------------------------------------------------
/img/graph.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Neoanarika/Searching-for-activation-functions/1615f3fa423ab6a7977aad9d3ebf4a0ace2fab63/img/graph.png


--------------------------------------------------------------------------------
/img/nas.jpeg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Neoanarika/Searching-for-activation-functions/1615f3fa423ab6a7977aad9d3ebf4a0ace2fab63/img/nas.jpeg


--------------------------------------------------------------------------------
/img/swish_.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Neoanarika/Searching-for-activation-functions/1615f3fa423ab6a7977aad9d3ebf4a0ace2fab63/img/swish_.png


--------------------------------------------------------------------------------
/img/swish_com.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Neoanarika/Searching-for-activation-functions/1615f3fa423ab6a7977aad9d3ebf4a0ace2fab63/img/swish_com.png


--------------------------------------------------------------------------------
/img/swish_graph.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Neoanarika/Searching-for-activation-functions/1615f3fa423ab6a7977aad9d3ebf4a0ace2fab63/img/swish_graph.png


--------------------------------------------------------------------------------
/src/Searching for activation function.ipynb:
--------------------------------------------------------------------------------
  1 | {
  2 |  "cells": [
  3 |   {
  4 |    "cell_type": "markdown",
  5 |    "metadata": {},
  6 |    "source": [
  7 |     "# Overview of the project "
  8 |    ]
  9 |   },
 10 |   {
 11 |    "cell_type": "markdown",
 12 |    "metadata": {},
 13 |    "source": [
 14 |     "This project attempts to implement the 2017 arXiv paper \"Searching for Activation Functions\" (Ramachandran, Zoph & Le, 2017). Although neural networks are powerful and flexible models, they are still hard to design and limited by human creativity. One important design consideration is the activation function, which supplies the non-linearity between the affine transformations in a neural network. But how do we choose which basic functions to use and combine to construct a new activation function? Essentially, the problem becomes a search for the best activation function in a search space. The approach taken in the paper is similar to Zoph and Le's earlier work \"Neural Architecture Search with Reinforcement Learning\", which used an RNN to sample the hyperparameters of a neural network and trained it with a policy gradient method (specifically REINFORCE) to maximise the validation accuracy of the child network (i.e. the reward signal)."
 15 |    ]
 16 |   },
 17 |   {
 18 |    "cell_type": "markdown",
 19 |    "metadata": {},
 20 |    "source": [
 21 |     "![title](img/nas.jpeg)"
 22 |    ]
 23 |   },
 24 |   {
 25 |    "cell_type": "markdown",
 26 |    "metadata": {},
 27 |    "source": [
 28 |     "In this paper, an RNN instead generates the choice of unary and binary functions used to build the activation function, and it is trained with the Proximal Policy Optimization (PPO) algorithm."
 29 |    ]
 30 |   },
 31 |   {
 32 |    "cell_type": "markdown",
 33 |    "metadata": {},
 34 |    "source": [
 35 |     "![title](img/activationfunctions.png)"
 36 |    ]
 37 |   },
 38 |   {
 39 |    "cell_type": "markdown",
 40 |    "metadata": {},
 41 |    "source": [
 42 |     "Using PPO is in stark contrast with the two papers previously published by Zoph & Le, which used REINFORCE and Trust Region Policy Optimization (TRPO). A reason for not using REINFORCE was given in their paper \"Neural Optimizer Search with Reinforcement Learning\": the authors explained that REINFORCE exhibited poor sample efficiency compared to TRPO. However, no explanation was given for why PPO rather than TRPO was used in searching for activation functions. The authors also did not explain why they used policy gradient methods instead of other approaches such as evolution strategies. Finally, none of the papers included a control experiment showing whether, and by how much, their approach beats random search. Such a controlled experiment would also allow a fairer comparison between their method and other approaches, such as evolution strategies. This is, however, outside the scope of a mere paper implementation and is worth considering as a future project."
 43 |    ]
 44 |   },
 45 |   {
 46 |    "cell_type": "markdown",
 47 |    "metadata": {},
 48 |    "source": [
 49 |     "![title](img/Rnn.png)"
 50 |    ]
 51 |   },
 52 |   {
 53 |    "cell_type": "markdown",
 54 |    "metadata": {},
 55 |    "source": [
 56 |     "# Dependencies"
 57 |    ]
 58 |   },
 59 |   {
 60 |    "cell_type": "markdown",
 61 |    "metadata": {},
 62 |    "source": [
 63 |     "This project requires tensorflow gpu"
 64 |    ]
 65 |   },
 66 |   {
 67 |    "cell_type": "code",
 68 |    "execution_count": null,
 69 |    "metadata": {
 70 |     "collapsed": true
 71 |    },
 72 |    "outputs": [],
 73 |    "source": [
 74 |     "!pip install tensorflow-gpu"
 75 |    ]
 76 |   },
 77 |   {
 78 |    "cell_type": "markdown",
 79 |    "metadata": {},
 80 |    "source": [
 81 |     "# Start searching for activation functions"
 82 |    ]
 83 |   },
 84 |   {
 85 |    "cell_type": "markdown",
 86 |    "metadata": {},
 87 |    "source": [
 88 |     "Download the CIFAR-10 dataset"
 89 |    ]
 90 |   },
 91 |   {
 92 |    "cell_type": "code",
 93 |    "execution_count": null,
 94 |    "metadata": {
 95 |     "collapsed": true
 96 |    },
 97 |    "outputs": [],
 98 |    "source": [
 99 |     "!python cifar10_download_and_extract.py"
100 |    ]
101 |   },
102 |   {
103 |    "cell_type": "markdown",
104 |    "metadata": {},
105 |    "source": [
106 |     "Run the training program"
107 |    ]
108 |   },
109 |   {
110 |    "cell_type": "code",
111 |    "execution_count": null,
112 |    "metadata": {
113 |     "collapsed": true
114 |    },
115 |    "outputs": [],
116 |    "source": [
117 |     "!python main.py"
118 |    ]
119 |   },
120 |   {
121 |    "cell_type": "markdown",
122 |    "metadata": {},
123 |    "source": [
124 |     "The search is conducted using ResNet-20 as the child network architecture, trained on CIFAR-10 for 10k steps."
125 |    ]
126 |   },
127 |   {
128 |    "cell_type": "markdown",
129 |    "metadata": {},
130 |    "source": [
131 |     "# Testing on CIFAR-100"
132 |    ]
133 |   },
134 |   {
135 |    "cell_type": "markdown",
136 |    "metadata": {},
137 |    "source": [
138 |     "In the paper, 3 datasets were used to test the transfer capabilities of the newly found activation functions:\n",
139 |     "1. CIFAR-100\n",
140 |     "2. ImageNet\n",
141 |     "3. WMT\n",
142 |     "\n",
143 |     "I wasn't able to download ImageNet because of its large size. In this notebook we only look at CIFAR-100 and the WMT 2014 English-German dataset."
144 |    ]
145 |   },
146 |   {
147 |    "cell_type": "code",
148 |    "execution_count": null,
149 |    "metadata": {
150 |     "collapsed": true
151 |    },
152 |    "outputs": [],
153 |    "source": [
154 |     "!python cifar100_download_and_extract.py\n",
155 |     "!python cifar100_train.py\n",
156 |     "!python cifar100_test.py"
157 |    ]
158 |   },
159 |   {
160 |    "cell_type": "markdown",
161 |    "metadata": {},
162 |    "source": [
163 |     "# Swish"
164 |    ]
165 |   },
166 |   {
167 |    "cell_type": "markdown",
168 |    "metadata": {},
169 |    "source": [
170 |     "Swish was found by the activation function search in the original paper and was demonstrated to **improve top-1 classification accuracy on ImageNet by 0.9% simply by\n",
171 |     "replacing all ReLU activation functions with Swish**."
172 |    ]
173 |   },
174 |   {
175 |    "cell_type": "code",
176 |    "execution_count": null,
177 |    "metadata": {
178 |     "collapsed": true
179 |    },
180 |    "outputs": [],
181 |    "source": [
182 |     "!python swish.py"
183 |    ]
184 |   },
185 |   {
186 |    "cell_type": "markdown",
187 |    "metadata": {},
188 |    "source": [
189 |     "![title](img/loss_rmsprop.png)"
190 |    ]
191 |   },
192 |   {
193 |    "cell_type": "code",
194 |    "execution_count": null,
195 |    "metadata": {
196 |     "collapsed": true
197 |    },
198 |    "outputs": [],
199 |    "source": []
200 |   }
201 |  ],
202 |  "metadata": {
203 |   "anaconda-cloud": {},
204 |   "kernelspec": {
205 |    "display_name": "Python [conda root]",
206 |    "language": "python",
207 |    "name": "conda-root-py"
208 |   },
209 |   "language_info": {
210 |    "codemirror_mode": {
211 |     "name": "ipython",
212 |     "version": 3
213 |    },
214 |    "file_extension": ".py",
215 |    "mimetype": "text/x-python",
216 |    "name": "python",
217 |    "nbconvert_exporter": "python",
218 |    "pygments_lexer": "ipython3",
219 |    "version": "3.5.2"
220 |   }
221 |  },
222 |  "nbformat": 4,
223 |  "nbformat_minor": 1
224 | }
225 | 


--------------------------------------------------------------------------------
/src/WMT14.py:
--------------------------------------------------------------------------------
 1 | # Copyright 2015 The TensorFlow Authors. All Rights Reserved.
 2 | #
 3 | # Licensed under the Apache License, Version 2.0 (the "License");
 4 | # you may not use this file except in compliance with the License.
 5 | # You may obtain a copy of the License at
 6 | #
 7 | #     http://www.apache.org/licenses/LICENSE-2.0
 8 | #
 9 | # Unless required by applicable law or agreed to in writing, software
10 | # distributed under the License is distributed on an "AS IS" BASIS,
11 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
12 | # See the License for the specific language governing permissions and
13 | # limitations under the License.
14 | # ==============================================================================
15 | 
16 | """Downloads and extracts the German-English text archive at DATA_URL."""
17 | 
18 | from __future__ import absolute_import
19 | from __future__ import division
20 | from __future__ import print_function
21 | 
22 | import argparse
23 | import os
24 | import sys
25 | import tarfile
26 | 
27 | from six.moves import urllib
28 | import tensorflow as tf
29 | 
30 | DATA_URL = 'https://wit3.fbk.eu/archive/2016-01//texts/de/en/de-en.tgz'
31 | 
32 | parser = argparse.ArgumentParser()
33 | 
34 | parser.add_argument(
35 |     '--data_dir', type=str, default='../data/wmt/',
36 |     help='Directory to download data and extract the tarball')
37 | 
38 | 
39 | def main(unused_argv):
40 |   """Download the tarball from DATA_URL and extract it into FLAGS.data_dir."""
41 |   if not os.path.exists(FLAGS.data_dir):
42 |     os.makedirs(FLAGS.data_dir)
43 | 
44 |   filename = DATA_URL.split('/')[-1]
45 |   filepath = os.path.join(FLAGS.data_dir, filename)
46 | 
47 |   if not os.path.exists(filepath):
48 |     def _progress(count, block_size, total_size):
49 |       sys.stdout.write('\r>> Downloading %s %.1f%%' % (
50 |           filename, 100.0 * count * block_size / total_size))
51 |       sys.stdout.flush()
52 | 
53 |     filepath, _ = urllib.request.urlretrieve(DATA_URL, filepath, _progress)
54 |     print()
55 |     statinfo = os.stat(filepath)
56 |     print('Successfully downloaded', filename, statinfo.st_size, 'bytes.')
57 | 
58 |   tarfile.open(filepath, 'r:gz').extractall(FLAGS.data_dir)
59 | 
60 | 
61 | if __name__ == '__main__':
62 |   FLAGS, unparsed = parser.parse_known_args()
63 |   tf.app.run(argv=[sys.argv[0]] + unparsed)
64 | 


--------------------------------------------------------------------------------
/src/__pycache__/config.cpython-35.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Neoanarika/Searching-for-activation-functions/1615f3fa423ab6a7977aad9d3ebf4a0ace2fab63/src/__pycache__/config.cpython-35.pyc


--------------------------------------------------------------------------------
/src/__pycache__/config.cpython-36.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Neoanarika/Searching-for-activation-functions/1615f3fa423ab6a7977aad9d3ebf4a0ace2fab63/src/__pycache__/config.cpython-36.pyc


--------------------------------------------------------------------------------
/src/__pycache__/dataset.cpython-35.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Neoanarika/Searching-for-activation-functions/1615f3fa423ab6a7977aad9d3ebf4a0ace2fab63/src/__pycache__/dataset.cpython-35.pyc


--------------------------------------------------------------------------------
/src/__pycache__/models.cpython-35.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Neoanarika/Searching-for-activation-functions/1615f3fa423ab6a7977aad9d3ebf4a0ace2fab63/src/__pycache__/models.cpython-35.pyc


--------------------------------------------------------------------------------
/src/__pycache__/models.cpython-36.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Neoanarika/Searching-for-activation-functions/1615f3fa423ab6a7977aad9d3ebf4a0ace2fab63/src/__pycache__/models.cpython-36.pyc


--------------------------------------------------------------------------------
/src/__pycache__/network.cpython-35.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Neoanarika/Searching-for-activation-functions/1615f3fa423ab6a7977aad9d3ebf4a0ace2fab63/src/__pycache__/network.cpython-35.pyc


--------------------------------------------------------------------------------
/src/__pycache__/network.cpython-36.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Neoanarika/Searching-for-activation-functions/1615f3fa423ab6a7977aad9d3ebf4a0ace2fab63/src/__pycache__/network.cpython-36.pyc


--------------------------------------------------------------------------------
/src/__pycache__/parser.cpython-35.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Neoanarika/Searching-for-activation-functions/1615f3fa423ab6a7977aad9d3ebf4a0ace2fab63/src/__pycache__/parser.cpython-35.pyc


--------------------------------------------------------------------------------
/src/__pycache__/parser.cpython-36.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Neoanarika/Searching-for-activation-functions/1615f3fa423ab6a7977aad9d3ebf4a0ace2fab63/src/__pycache__/parser.cpython-36.pyc


--------------------------------------------------------------------------------
/src/__pycache__/resnet.cpython-35.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Neoanarika/Searching-for-activation-functions/1615f3fa423ab6a7977aad9d3ebf4a0ace2fab63/src/__pycache__/resnet.cpython-35.pyc


--------------------------------------------------------------------------------
/src/__pycache__/resnet.cpython-36.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Neoanarika/Searching-for-activation-functions/1615f3fa423ab6a7977aad9d3ebf4a0ace2fab63/src/__pycache__/resnet.cpython-36.pyc


--------------------------------------------------------------------------------
/src/__pycache__/resnet_model.cpython-35.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Neoanarika/Searching-for-activation-functions/1615f3fa423ab6a7977aad9d3ebf4a0ace2fab63/src/__pycache__/resnet_model.cpython-35.pyc


--------------------------------------------------------------------------------
/src/__pycache__/resnet_model.cpython-36.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Neoanarika/Searching-for-activation-functions/1615f3fa423ab6a7977aad9d3ebf4a0ace2fab63/src/__pycache__/resnet_model.cpython-36.pyc


--------------------------------------------------------------------------------
/src/__pycache__/utils.cpython-35.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Neoanarika/Searching-for-activation-functions/1615f3fa423ab6a7977aad9d3ebf4a0ace2fab63/src/__pycache__/utils.cpython-35.pyc


--------------------------------------------------------------------------------
/src/__pycache__/utils.cpython-36.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Neoanarika/Searching-for-activation-functions/1615f3fa423ab6a7977aad9d3ebf4a0ace2fab63/src/__pycache__/utils.cpython-36.pyc


--------------------------------------------------------------------------------
/src/childnetwork.py:
--------------------------------------------------------------------------------
  1 | # Copyright 2017 The TensorFlow Authors. All Rights Reserved.
  2 | #
  3 | # Licensed under the Apache License, Version 2.0 (the "License");
  4 | # you may not use this file except in compliance with the License.
  5 | # You may obtain a copy of the License at
  6 | #
  7 | #     http://www.apache.org/licenses/LICENSE-2.0
  8 | #
  9 | # Unless required by applicable law or agreed to in writing, software
 10 | # distributed under the License is distributed on an "AS IS" BASIS,
 11 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 12 | # See the License for the specific language governing permissions and
 13 | # limitations under the License.
 14 | # ==============================================================================
 15 | """Contains definitions for the preactivation form of Residual Networks.
 16 | 
 17 | Residual networks (ResNets) were originally proposed in:
 18 | [1] Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun
 19 |     Deep Residual Learning for Image Recognition. arXiv:1512.03385
 20 | 
 21 | The full preactivation 'v2' ResNet variant implemented in this module was
 22 | introduced by:
 23 | [2] Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun
 24 |     Identity Mappings in Deep Residual Networks. arXiv: 1603.05027
 25 | 
 26 | The key difference of the full preactivation 'v2' variant compared to the
 27 | 'v1' variant in [1] is the use of batch normalization before every weight layer
 28 | rather than after.
 29 | """
 30 | 
 31 | from __future__ import absolute_import
 32 | from __future__ import division
 33 | from __future__ import print_function
 34 | 
 35 | import tensorflow as tf
 36 | 
 37 | _BATCH_NORM_DECAY = 0.997
 38 | _BATCH_NORM_EPSILON = 1e-5
 39 | 
 40 | def batch_norm_relu(inputs, is_training, data_format):
 41 |   """Performs a batch normalization followed by the activation read from the 'tmp' file."""
 42 |   # We set fused=True for a significant performance boost. See
 43 |   # https://www.tensorflow.org/performance/performance_guide#common_fused_ops
 44 |   inputs = tf.layers.batch_normalization(
 45 |       inputs=inputs, axis=1 if data_format == 'channels_first' else 3,
 46 |       momentum=_BATCH_NORM_DECAY, epsilon=_BATCH_NORM_EPSILON, center=True,
 47 |       scale=True, training=is_training, fused=True)
 48 |   # Candidate unary, binary and input functions, keyed by the tokens read from 'tmp'.
 49 |   unary = {"1":lambda x:x ,"2":lambda x: -x, "3": lambda x: tf.maximum(x,0), "4":lambda x : tf.pow(x,2),"5":lambda x : tf.tanh(tf.cast(x,tf.float32))}
 50 |   binary = {"1":lambda x,y: tf.add(x,y),"2":lambda x,y:tf.multiply(x,y),"3":lambda x,y:tf.add(x,-y),"4":lambda x,y:tf.maximum(x,y),"5":lambda x,y: tf.sigmoid(x)*y}
 51 |   input_fun = {"1":lambda x:tf.cast(x,tf.float32) , "2":lambda x:tf.zeros(tf.shape(x)), "3": lambda x:2*tf.ones(tf.shape(x)),"4": lambda x : tf.ones(tf.shape(x)), "5": lambda x: -tf.ones(tf.shape(x))}
 52 | 
 53 |   with open("tmp","r") as f:
 54 |       activation = f.readline()
 55 |       activation = activation.split()  # split on any whitespace so a trailing newline is dropped
 56 | 
 57 |   inputs = binary[activation[8]](unary[activation[5]](binary[activation[4]](unary[activation[2]](input_fun[activation[0]](inputs)),unary[activation[3]](input_fun[activation[1]](inputs)))),unary[activation[7]](input_fun[activation[6]](inputs)))
 58 |   #inputs = tf.nn.relu(inputs)
 59 |   return inputs
 60 | 
 61 | 
 62 | def fixed_padding(inputs, kernel_size, data_format):
 63 |   """Pads the input along the spatial dimensions independently of input size.
 64 | 
 65 |   Args:
 66 |     inputs: A tensor of size [batch, channels, height_in, width_in] or
 67 |       [batch, height_in, width_in, channels] depending on data_format.
 68 |     kernel_size: The kernel to be used in the conv2d or max_pool2d operation.
 69 |                  Should be a positive integer.
 70 |     data_format: The input format ('channels_last' or 'channels_first').
 71 | 
 72 |   Returns:
 73 |     A tensor with the same format as the input with the data either intact
 74 |     (if kernel_size == 1) or padded (if kernel_size > 1).
 75 |   """
 76 |   pad_total = kernel_size - 1
 77 |   pad_beg = pad_total // 2
 78 |   pad_end = pad_total - pad_beg
 79 | 
 80 |   if data_format == 'channels_first':
 81 |     padded_inputs = tf.pad(inputs, [[0, 0], [0, 0],
 82 |                                     [pad_beg, pad_end], [pad_beg, pad_end]])
 83 |   else:
 84 |     padded_inputs = tf.pad(inputs, [[0, 0], [pad_beg, pad_end],
 85 |                                     [pad_beg, pad_end], [0, 0]])
 86 |   return padded_inputs
 87 | 
 88 | 
 89 | def conv2d_fixed_padding(inputs, filters, kernel_size, strides, data_format):
 90 |   """Strided 2-D convolution with explicit padding."""
 91 |   # The padding is consistent and is based only on `kernel_size`, not on the
 92 |   # dimensions of `inputs` (as opposed to using `tf.layers.conv2d` alone).
 93 |   if strides > 1:
 94 |     inputs = fixed_padding(inputs, kernel_size, data_format)
 95 | 
 96 |   return tf.layers.conv2d(
 97 |       inputs=inputs, filters=filters, kernel_size=kernel_size, strides=strides,
 98 |       padding=('SAME' if strides == 1 else 'VALID'), use_bias=False,
 99 |       kernel_initializer=tf.variance_scaling_initializer(),
100 |       data_format=data_format)
101 | 
102 | 
103 | def building_block(inputs, filters, is_training, projection_shortcut, strides,
104 |                    data_format):
105 |   """Standard building block for residual networks with BN before convolutions.
106 | 
107 |   Args:
108 |     inputs: A tensor of size [batch, channels, height_in, width_in] or
109 |       [batch, height_in, width_in, channels] depending on data_format.
110 |     filters: The number of filters for the convolutions.
111 |     is_training: A Boolean for whether the model is in training or inference
112 |       mode. Needed for batch normalization.
113 |     projection_shortcut: The function to use for projection shortcuts (typically
114 |       a 1x1 convolution when downsampling the input).
115 |     strides: The block's stride. If greater than 1, this block will ultimately
116 |       downsample the input.
117 |     data_format: The input format ('channels_last' or 'channels_first').
118 | 
119 |   Returns:
120 |     The output tensor of the block.
121 |   """
122 |   shortcut = inputs
123 |   inputs = batch_norm_relu(inputs, is_training, data_format)
124 | 
125 |   # The projection shortcut should come after the first batch norm and ReLU
126 |   # since it performs a 1x1 convolution.
127 |   if projection_shortcut is not None:
128 |     shortcut = projection_shortcut(inputs)
129 | 
130 |   inputs = conv2d_fixed_padding(
131 |       inputs=inputs, filters=filters, kernel_size=3, strides=strides,
132 |       data_format=data_format)
133 | 
134 |   inputs = batch_norm_relu(inputs, is_training, data_format)
135 |   inputs = conv2d_fixed_padding(
136 |       inputs=inputs, filters=filters, kernel_size=3, strides=1,
137 |       data_format=data_format)
138 | 
139 |   return inputs + shortcut
140 | 
141 | 
142 | def bottleneck_block(inputs, filters, is_training, projection_shortcut,
143 |                      strides, data_format):
144 |   """Bottleneck block variant for residual networks with BN before convolutions.
145 | 
146 |   Args:
147 |     inputs: A tensor of size [batch, channels, height_in, width_in] or
148 |       [batch, height_in, width_in, channels] depending on data_format.
149 |     filters: The number of filters for the first two convolutions. Note that the
150 |       third and final convolution will use 4 times as many filters.
151 |     is_training: A Boolean for whether the model is in training or inference
152 |       mode. Needed for batch normalization.
153 |     projection_shortcut: The function to use for projection shortcuts (typically
154 |       a 1x1 convolution when downsampling the input).
155 |     strides: The block's stride. If greater than 1, this block will ultimately
156 |       downsample the input.
157 |     data_format: The input format ('channels_last' or 'channels_first').
158 | 
159 |   Returns:
160 |     The output tensor of the block.
161 |   """
162 |   shortcut = inputs
163 |   inputs = batch_norm_relu(inputs, is_training, data_format)
164 | 
165 |   # The projection shortcut should come after the first batch norm and ReLU
166 |   # since it performs a 1x1 convolution.
167 |   if projection_shortcut is not None:
168 |     shortcut = projection_shortcut(inputs)
169 | 
170 |   inputs = conv2d_fixed_padding(
171 |       inputs=inputs, filters=filters, kernel_size=1, strides=1,
172 |       data_format=data_format)
173 | 
174 |   inputs = batch_norm_relu(inputs, is_training, data_format)
175 |   inputs = conv2d_fixed_padding(
176 |       inputs=inputs, filters=filters, kernel_size=3, strides=strides,
177 |       data_format=data_format)
178 | 
179 |   inputs = batch_norm_relu(inputs, is_training, data_format)
180 |   inputs = conv2d_fixed_padding(
181 |       inputs=inputs, filters=4 * filters, kernel_size=1, strides=1,
182 |       data_format=data_format)
183 | 
184 |   return inputs + shortcut
185 | 
186 | 
187 | def block_layer(inputs, filters, block_fn, blocks, strides, is_training, name,
188 |                 data_format):
189 |   """Creates one layer of blocks for the ResNet model.
190 | 
191 |   Args:
192 |     inputs: A tensor of size [batch, channels, height_in, width_in] or
193 |       [batch, height_in, width_in, channels] depending on data_format.
194 |     filters: The number of filters for the first convolution of the layer.
195 |     block_fn: The block to use within the model, either `building_block` or
196 |       `bottleneck_block`.
197 |     blocks: The number of blocks contained in the layer.
198 |     strides: The stride to use for the first convolution of the layer. If
199 |       greater than 1, this layer will ultimately downsample the input.
200 |     is_training: Either True or False, whether we are currently training the
201 |       model. Needed for batch norm.
202 |     name: A string name for the tensor output of the block layer.
203 |     data_format: The input format ('channels_last' or 'channels_first').
204 | 
205 |   Returns:
206 |     The output tensor of the block layer.
207 |   """
208 |   # Bottleneck blocks end with 4x the number of filters as they start with
209 |   filters_out = 4 * filters if block_fn is bottleneck_block else filters
210 | 
211 |   def projection_shortcut(inputs):
212 |     return conv2d_fixed_padding(
213 |         inputs=inputs, filters=filters_out, kernel_size=1, strides=strides,
214 |         data_format=data_format)
215 | 
216 |   # Only the first block per block_layer uses projection_shortcut and strides
217 |   inputs = block_fn(inputs, filters, is_training, projection_shortcut, strides,
218 |                     data_format)
219 | 
220 |   for _ in range(1, blocks):
221 |     inputs = block_fn(inputs, filters, is_training, None, 1, data_format)
222 | 
223 |   return tf.identity(inputs, name)
224 | 
225 | 
226 | def cifar10_resnet_v2_generator(resnet_size, num_classes, data_format=None):
227 |   """Generator for CIFAR-10 ResNet v2 models.
228 | 
229 |   Args:
230 |     resnet_size: A single integer for the size of the ResNet model.
231 |     num_classes: The number of possible classes for image classification.
232 |     data_format: 'channels_last', 'channels_first', or None to pick by CUDA build.
233 | 
234 |   Returns:
235 |     The model function that takes in `inputs` and `is_training` and
236 |     returns the output tensor of the ResNet model.
237 | 
238 |   Raises:
239 |     ValueError: If `resnet_size` is invalid.
240 |   """
241 |   if resnet_size % 6 != 2:
242 |     raise ValueError('resnet_size must be 6n + 2:', resnet_size)
243 | 
244 |   num_blocks = (resnet_size - 2) // 6
245 | 
246 |   if data_format is None:
247 |     data_format = (
248 |         'channels_first' if tf.test.is_built_with_cuda() else 'channels_last')
249 | 
250 |   def model(inputs, is_training):
251 |     """Constructs the ResNet model given the inputs."""
252 |     if data_format == 'channels_first':
253 |       # Convert the inputs from channels_last (NHWC) to channels_first (NCHW).
254 |       # This provides a large performance boost on GPU. See
255 |       # https://www.tensorflow.org/performance/performance_guide#data_formats
256 |       inputs = tf.transpose(inputs, [0, 3, 1, 2])
257 | 
258 |     with tf.device("/gpu:0"):
259 |         inputs = conv2d_fixed_padding(
260 |             inputs=inputs, filters=16, kernel_size=3, strides=1,
261 |             data_format=data_format)
262 |         inputs = tf.identity(inputs, 'initial_conv')
263 | 
264 |         inputs = block_layer(
265 |             inputs=inputs, filters=16, block_fn=building_block, blocks=num_blocks,
266 |             strides=1, is_training=is_training, name='block_layer1',
267 |             data_format=data_format)
268 |         inputs = block_layer(
269 |             inputs=inputs, filters=32, block_fn=building_block, blocks=num_blocks,
270 |             strides=2, is_training=is_training, name='block_layer2',
271 |             data_format=data_format)
272 |         inputs = block_layer(
273 |             inputs=inputs, filters=64, block_fn=building_block, blocks=num_blocks,
274 |             strides=2, is_training=is_training, name='block_layer3',
275 |             data_format=data_format)
276 | 
277 |         inputs = batch_norm_relu(inputs, is_training, data_format)
278 |         inputs = tf.layers.average_pooling2d(
279 |             inputs=inputs, pool_size=8, strides=1, padding='VALID',
280 |             data_format=data_format)
281 |         inputs = tf.identity(inputs, 'final_avg_pool')
282 |         inputs = tf.reshape(inputs, [-1, 64])
283 |         inputs = tf.layers.dense(inputs=inputs, units=num_classes)
284 |         inputs = tf.identity(inputs, 'final_dense')
285 |     # inputs = conv2d_fixed_padding(
286 |     #     inputs=inputs, filters=16, kernel_size=3, strides=1,
287 |     #     data_format=data_format)
288 |     # inputs = tf.identity(inputs, 'initial_conv')
289 |     #
290 |     # inputs = block_layer(
291 |     #     inputs=inputs, filters=16, block_fn=building_block, blocks=num_blocks,
292 |     #     strides=1, is_training=is_training, name='block_layer1',
293 |     #     data_format=data_format)
294 |     # inputs = block_layer(
295 |     #     inputs=inputs, filters=32, block_fn=building_block, blocks=num_blocks,
296 |     #     strides=2, is_training=is_training, name='block_layer2',
297 |     #     data_format=data_format)
298 |     # inputs = block_layer(
299 |     #     inputs=inputs, filters=64, block_fn=building_block, blocks=num_blocks,
300 |     #     strides=2, is_training=is_training, name='block_layer3',
301 |     #     data_format=data_format)
302 |     #
303 |     # inputs = batch_norm_relu(inputs, is_training, data_format)
304 |     # inputs = tf.layers.average_pooling2d(
305 |     #     inputs=inputs, pool_size=8, strides=1, padding='VALID',
306 |     #     data_format=data_format)
307 |     # inputs = tf.identity(inputs, 'final_avg_pool')
308 |     # inputs = tf.reshape(inputs, [-1, 64])
309 |     # inputs = tf.layers.dense(inputs=inputs, units=num_classes)
310 |     # inputs = tf.identity(inputs, 'final_dense')
311 |     return inputs
312 | 
313 |   return model
314 | 
315 | 
316 | def imagenet_resnet_v2_generator(block_fn, layers, num_classes,
317 |                                  data_format=None):
318 |   """Generator for ImageNet ResNet v2 models.
319 | 
320 |   Args:
321 |     block_fn: The block to use within the model, either `building_block` or
322 |       `bottleneck_block`.
323 |     layers: A length-4 array denoting the number of blocks to include in each
324 |       layer. Each layer consists of blocks that take inputs of the same size.
325 |     num_classes: The number of possible classes for image classification.
326 |     data_format: The input format ('channels_last', 'channels_first', or None).
327 |       If set to None, the format is dependent on whether a GPU is available.
328 | 
329 |   Returns:
330 |     The model function that takes in `inputs` and `is_training` and
331 |     returns the output tensor of the ResNet model.
332 |   """
333 |   if data_format is None:
334 |     data_format = (
335 |         'channels_first' if tf.test.is_built_with_cuda() else 'channels_last')
336 | 
337 |   def model(inputs, is_training):
338 |     """Constructs the ResNet model given the inputs."""
339 |     if data_format == 'channels_first':
340 |       # Convert the inputs from channels_last (NHWC) to channels_first (NCHW).
341 |       # This provides a large performance boost on GPU. See
342 |       # https://www.tensorflow.org/performance/performance_guide#data_formats
343 |       inputs = tf.transpose(inputs, [0, 3, 1, 2])
344 | 
345 |     inputs = conv2d_fixed_padding(
346 |         inputs=inputs, filters=64, kernel_size=7, strides=2,
347 |         data_format=data_format)
348 |     inputs = tf.identity(inputs, 'initial_conv')
349 |     inputs = tf.layers.max_pooling2d(
350 |         inputs=inputs, pool_size=3, strides=2, padding='SAME',
351 |         data_format=data_format)
352 |     inputs = tf.identity(inputs, 'initial_max_pool')
353 | 
354 |     inputs = block_layer(
355 |         inputs=inputs, filters=64, block_fn=block_fn, blocks=layers[0],
356 |         strides=1, is_training=is_training, name='block_layer1',
357 |         data_format=data_format)
358 |     inputs = block_layer(
359 |         inputs=inputs, filters=128, block_fn=block_fn, blocks=layers[1],
360 |         strides=2, is_training=is_training, name='block_layer2',
361 |         data_format=data_format)
362 |     inputs = block_layer(
363 |         inputs=inputs, filters=256, block_fn=block_fn, blocks=layers[2],
364 |         strides=2, is_training=is_training, name='block_layer3',
365 |         data_format=data_format)
366 |     inputs = block_layer(
367 |         inputs=inputs, filters=512, block_fn=block_fn, blocks=layers[3],
368 |         strides=2, is_training=is_training, name='block_layer4',
369 |         data_format=data_format)
370 | 
371 |     inputs = batch_norm_relu(inputs, is_training, data_format)
372 |     inputs = tf.layers.average_pooling2d(
373 |         inputs=inputs, pool_size=7, strides=1, padding='VALID',
374 |         data_format=data_format)
375 |     inputs = tf.identity(inputs, 'final_avg_pool')
376 |     inputs = tf.reshape(inputs,
377 |                         [-1, 512 if block_fn is building_block else 2048])
378 |     inputs = tf.layers.dense(inputs=inputs, units=num_classes)
379 |     inputs = tf.identity(inputs, 'final_dense')
380 |     return inputs
381 | 
382 |   return model
383 | 
384 | 
385 | def imagenet_resnet_v2(resnet_size, num_classes, data_format=None):
386 |   """Returns the ResNet model for a given size and number of output classes."""
387 |   model_params = {
388 |       18: {'block': building_block, 'layers': [2, 2, 2, 2]},
389 |       34: {'block': building_block, 'layers': [3, 4, 6, 3]},
390 |       50: {'block': bottleneck_block, 'layers': [3, 4, 6, 3]},
391 |       101: {'block': bottleneck_block, 'layers': [3, 4, 23, 3]},
392 |       152: {'block': bottleneck_block, 'layers': [3, 8, 36, 3]},
393 |       200: {'block': bottleneck_block, 'layers': [3, 24, 36, 3]}
394 |   }
395 | 
396 |   if resnet_size not in model_params:
397 |     raise ValueError('Not a valid resnet_size:', resnet_size)
398 | 
399 |   params = model_params[resnet_size]
400 |   return imagenet_resnet_v2_generator(
401 |       params['block'], params['layers'], num_classes, data_format)
402 | 
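
The size bookkeeping behind `conv2d_fixed_padding` and `cifar10_resnet_v2_generator` can be checked without TensorFlow. The sketch below is a plain-Python illustration, not part of the repository: the helper names are hypothetical, and the padding split (`pad_total = kernel_size - 1`, divided between the two sides) is an assumption about the part of `fixed_padding` that is not fully shown here. It suggests why the CIFAR model can use `pool_size=8` before reshaping to `[-1, 64]`: a 32x32 input is halved twice by the two stride-2 block layers.

```python
def fixed_padding_amounts(kernel_size):
    """Return (pad_beg, pad_end); depends only on kernel_size (assumed split)."""
    pad_total = kernel_size - 1
    pad_beg = pad_total // 2
    pad_end = pad_total - pad_beg
    return pad_beg, pad_end


def conv_output_size(input_size, kernel_size, strides):
    """Output size along one spatial dimension, mirroring conv2d_fixed_padding."""
    if strides == 1:
        # 'SAME' padding with stride 1 preserves the spatial size.
        return input_size
    pad_beg, pad_end = fixed_padding_amounts(kernel_size)
    padded = input_size + pad_beg + pad_end
    # 'VALID' convolution over the explicitly padded input.
    return (padded - kernel_size) // strides + 1


def cifar_num_blocks(resnet_size):
    """Blocks per block layer for a CIFAR ResNet of depth 6n + 2."""
    if resnet_size % 6 != 2:
        raise ValueError('resnet_size must be 6n + 2:', resnet_size)
    return (resnet_size - 2) // 6


# Initial conv (stride 1), then block layers with strides 1, 2, 2.
size = 32
for strides in (1, 1, 2, 2):
    size = conv_output_size(size, kernel_size=3, strides=strides)
print('final feature map side:', size)
print('blocks per layer for ResNet-164:', cifar_num_blocks(164))
```

The same arithmetic applied to the ImageNet stem (7x7 kernel, stride 2, 224-pixel input) gives a 112-pixel feature map before the max pool.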


--------------------------------------------------------------------------------
/src/cifar100_download_and_extract.py:
--------------------------------------------------------------------------------
 1 | # Copyright 2015 The TensorFlow Authors. All Rights Reserved.
 2 | #
 3 | # Licensed under the Apache License, Version 2.0 (the "License");
 4 | # you may not use this file except in compliance with the License.
 5 | # You may obtain a copy of the License at
 6 | #
 7 | #     http://www.apache.org/licenses/LICENSE-2.0
 8 | #
 9 | # Unless required by applicable law or agreed to in writing, software
10 | # distributed under the License is distributed on an "AS IS" BASIS,
11 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
12 | # See the License for the specific language governing permissions and
13 | # limitations under the License.
14 | # ==============================================================================
15 | 
16 | """Downloads and extracts the binary version of the CIFAR-100 dataset."""
17 | 
18 | from __future__ import absolute_import
19 | from __future__ import division
20 | from __future__ import print_function
21 | 
22 | import argparse
23 | import os
24 | import sys
25 | import tarfile
26 | 
27 | from six.moves import urllib
28 | import tensorflow as tf
29 | 
30 | DATA_URL = 'https://www.cs.toronto.edu/~kriz/cifar-100-binary.tar.gz'
31 | 
32 | parser = argparse.ArgumentParser()
33 | 
34 | parser.add_argument(
35 |     '--data_dir', type=str, default='../data/cifar100_data',
36 |     help='Directory to download data and extract the tarball')
37 | 
38 | 
39 | def main(unused_argv):
40 |   """Download and extract the tarball from Alex's website."""
41 |   if not os.path.exists(FLAGS.data_dir):
42 |     os.makedirs(FLAGS.data_dir)
43 | 
44 |   filename = DATA_URL.split('/')[-1]
45 |   filepath = os.path.join(FLAGS.data_dir, filename)
46 | 
47 |   if not os.path.exists(filepath):
48 |     def _progress(count, block_size, total_size):
49 |       sys.stdout.write('\r>> Downloading %s %.1f%%' % (
50 |           filename, 100.0 * count * block_size / total_size))
51 |       sys.stdout.flush()
52 | 
53 |     filepath, _ = urllib.request.urlretrieve(DATA_URL, filepath, _progress)
54 |     print()
55 |     statinfo = os.stat(filepath)
56 |     print('Successfully downloaded', filename, statinfo.st_size, 'bytes.')
57 | 
58 |   tarfile.open(filepath, 'r:gz').extractall(FLAGS.data_dir)
59 | 
60 | 
61 | if __name__ == '__main__':
62 |   FLAGS, unparsed = parser.parse_known_args()
63 |   tf.app.run(argv=[sys.argv[0]] + unparsed)
64 | 


--------------------------------------------------------------------------------
/src/cifar100_test.py:
--------------------------------------------------------------------------------
  1 | # Copyright 2017 The TensorFlow Authors. All Rights Reserved.
  2 | #
  3 | # Licensed under the Apache License, Version 2.0 (the "License");
  4 | # you may not use this file except in compliance with the License.
  5 | # You may obtain a copy of the License at
  6 | #
  7 | #     http://www.apache.org/licenses/LICENSE-2.0
  8 | #
  9 | # Unless required by applicable law or agreed to in writing, software
 10 | # distributed under the License is distributed on an "AS IS" BASIS,
 11 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 12 | # See the License for the specific language governing permissions and
 13 | # limitations under the License.
 14 | # ==============================================================================
 15 | 
 16 | from __future__ import absolute_import
 17 | from __future__ import division
 18 | from __future__ import print_function
 19 | 
 20 | from tempfile import mkstemp
 21 | 
 22 | import numpy as np
 23 | import tensorflow as tf
 24 | 
 25 | import main as cifar10_main
 26 | 
 27 | tf.logging.set_verbosity(tf.logging.ERROR)
 28 | 
 29 | _BATCH_SIZE = 128
 30 | 
 31 | 
 32 | class BaseTest(tf.test.TestCase):
 33 | 
 34 |   def test_dataset_input_fn(self):
 35 |     fake_data = bytearray()
 36 |     fake_data.append(7)
 37 |     for i in range(3):
 38 |       for _ in range(1024):
 39 |         fake_data.append(i)
 40 | 
 41 |     _, filename = mkstemp(dir=self.get_temp_dir())
 42 |     data_file = open(filename, 'wb')
 43 |     data_file.write(fake_data)
 44 |     data_file.close()
 45 | 
 46 |     fake_dataset = cifar10_main.record_dataset(filename)
 47 |     fake_dataset = fake_dataset.map(cifar10_main.parse_record)
 48 |     image, label = fake_dataset.make_one_shot_iterator().get_next()
 49 | 
 50 |     self.assertEqual(label.get_shape().as_list(), [10])
 51 |     self.assertEqual(image.get_shape().as_list(), [32, 32, 3])
 52 | 
 53 |     with self.test_session() as sess:
 54 |       image, label = sess.run([image, label])
 55 | 
 56 |       self.assertAllEqual(label, np.array([int(i == 7) for i in range(10)]))
 57 | 
 58 |       for row in image:
 59 |         for pixel in row:
 60 |           self.assertAllEqual(pixel, np.array([0, 1, 2]))
 61 | 
 62 |   def input_fn(self):
 63 |     features = tf.random_uniform([_BATCH_SIZE, 32, 32, 3])
 64 |     labels = tf.random_uniform(
 65 |         [_BATCH_SIZE], maxval=9, dtype=tf.int32)
 66 |     return features, tf.one_hot(labels, 10)
 67 | 
 68 |   def cifar10_model_fn_helper(self, mode):
 69 |     features, labels = self.input_fn()
 70 |     spec = cifar10_main.cifar10_model_fn(
 71 |         features, labels, mode, {
 72 |             'resnet_size': 32,
 73 |             'data_format': 'channels_last',
 74 |             'batch_size': _BATCH_SIZE,
 75 |         })
 76 | 
 77 |     predictions = spec.predictions
 78 |     self.assertAllEqual(predictions['probabilities'].shape,
 79 |                         (_BATCH_SIZE, 10))
 80 |     self.assertEqual(predictions['probabilities'].dtype, tf.float32)
 81 |     self.assertAllEqual(predictions['classes'].shape, (_BATCH_SIZE,))
 82 |     self.assertEqual(predictions['classes'].dtype, tf.int64)
 83 | 
 84 |     if mode != tf.estimator.ModeKeys.PREDICT:
 85 |       loss = spec.loss
 86 |       self.assertAllEqual(loss.shape, ())
 87 |       self.assertEqual(loss.dtype, tf.float32)
 88 | 
 89 |     if mode == tf.estimator.ModeKeys.EVAL:
 90 |       eval_metric_ops = spec.eval_metric_ops
 91 |       self.assertAllEqual(eval_metric_ops['accuracy'][0].shape, ())
 92 |       self.assertAllEqual(eval_metric_ops['accuracy'][1].shape, ())
 93 |       self.assertEqual(eval_metric_ops['accuracy'][0].dtype, tf.float32)
 94 |       self.assertEqual(eval_metric_ops['accuracy'][1].dtype, tf.float32)
 95 | 
 96 |   def test_cifar10_model_fn_train_mode(self):
 97 |     self.cifar10_model_fn_helper(tf.estimator.ModeKeys.TRAIN)
 98 | 
 99 |   def test_cifar10_model_fn_eval_mode(self):
100 |     self.cifar10_model_fn_helper(tf.estimator.ModeKeys.EVAL)
101 | 
102 |   def test_cifar10_model_fn_predict_mode(self):
103 |     self.cifar10_model_fn_helper(tf.estimator.ModeKeys.PREDICT)
104 | 
105 | 
106 | if __name__ == '__main__':
107 |   tf.test.main()
108 | 
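
The fake record built in `test_dataset_input_fn` (one label byte followed by three constant 1024-byte channel planes) can also be decoded without TensorFlow, which makes the depth-major layout easier to see. The sketch below is an illustration only: `parse_record_py` and `one_hot` are hypothetical helpers, not functions from this repository. The `label_bytes` parameter is an assumption added so the same code can read CIFAR-100 records, which carry a coarse and a fine label byte before the pixels.

```python
def parse_record_py(record, label_bytes=1, height=32, width=32, depth=3):
    """Decode one raw CIFAR record into (label, image[y][x][channel])."""
    image_bytes = height * width * depth
    if len(record) != label_bytes + image_bytes:
        raise ValueError('unexpected record length: %d' % len(record))
    # With several label bytes (CIFAR-100: coarse, fine) the last one is used.
    label = record[label_bytes - 1]
    pixels = record[label_bytes:]
    plane = height * width
    # Pixels are stored depth-major: all of channel 0, then 1, then 2.
    image = [[[pixels[c * plane + y * width + x] for c in range(depth)]
              for x in range(width)]
             for y in range(height)]
    return label, image


def one_hot(label, num_classes):
    """Return a list with a single 1 at index `label`."""
    return [int(i == label) for i in range(num_classes)]


# Same fake record as the unit test: label 7, then three constant planes.
fake = bytearray([7])
for channel in range(3):
    fake.extend([channel] * 1024)

label, image = parse_record_py(fake)
print(label, image[0][0], one_hot(label, 10))
```

Every pixel of the decoded image holds the channel values 0, 1 and 2, which is what the test asserts after the `[depth, height, width]` to `[height, width, depth]` transpose.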


--------------------------------------------------------------------------------
/src/cifar100_train.py:
--------------------------------------------------------------------------------
  1 | # Copyright 2017 The TensorFlow Authors. All Rights Reserved.
  2 | #
  3 | # Licensed under the Apache License, Version 2.0 (the "License");
  4 | # you may not use this file except in compliance with the License.
  5 | # You may obtain a copy of the License at
  6 | #
  7 | #     http://www.apache.org/licenses/LICENSE-2.0
  8 | #
  9 | # Unless required by applicable law or agreed to in writing, software
 10 | # distributed under the License is distributed on an "AS IS" BASIS,
 11 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 12 | # See the License for the specific language governing permissions and
 13 | # limitations under the License.
 14 | # ==============================================================================
 15 | """Runs a ResNet model on the CIFAR-100 dataset."""
 16 | 
 17 | from __future__ import absolute_import
 18 | from __future__ import division
 19 | from __future__ import print_function
 20 | 
 21 | import argparse
 22 | import os
 23 | import sys
 24 | 
 25 | import tensorflow as tf
 26 | 
 27 | import childnetwork as resnet_model
 28 | from rnn_controller import Network
 29 | from config import Config
 30 | from parser import Parser
 31 | 
 32 | parser = argparse.ArgumentParser()
 33 | 
 34 | # Basic model parameters.
 35 | parser.add_argument('--data_dir', type=str, default='../data/cifar100_data',
 36 |                     help='The path to the CIFAR-100 data directory.')
 37 | 
 38 | parser.add_argument('--model_dir', type=str, default='/tmp/cifar100_model',
 39 |                     help='The directory where the model will be stored.')
 40 | 
 41 | parser.add_argument('--resnet_size', type=int, default=164,
 42 |                     help='The size of the ResNet model to use.')
 43 | 
 44 | parser.add_argument('--train_epochs', type=int, default=250,
 45 |                     help='The number of epochs to train.')
 46 | 
 47 | parser.add_argument('--epochs_per_eval', type=int, default=10,
 48 |                     help='The number of epochs to run in between evaluations.')
 49 | 
 50 | parser.add_argument('--batch_size', type=int, default=5,
 51 |                     help='The number of images per batch.')
 52 | 
 53 | parser.add_argument(
 54 |     '--data_format', type=str, default=None,
 55 |     choices=['channels_first', 'channels_last'],
 56 |     help='A flag to override the data format used in the model. channels_first '
 57 |          'provides a performance boost on GPU but is not always compatible '
 58 |          'with CPU. If left unspecified, the data format will be chosen '
 59 |          'automatically based on whether TensorFlow was built for CPU or GPU.')
 60 | 
 61 | _HEIGHT = 32
 62 | _WIDTH = 32
 63 | _DEPTH = 3
 64 | _NUM_CLASSES = 100
 65 | _LABEL_BYTES = 2  # Coarse label byte followed by fine label byte.
 66 | 
 67 | # We use a weight decay of 0.0002, which performs better than the 0.0001 that
 68 | # was originally suggested.
 69 | _WEIGHT_DECAY = 2e-4
 70 | _MOMENTUM = 0.9
 71 | 
 72 | _NUM_IMAGES = {
 73 |     'train': 50000,
 74 |     'validation': 10000,
 75 | }
 76 | 
 77 | 
 78 | def record_dataset(filenames):
 79 |   """Returns an input pipeline Dataset from `filenames`."""
 80 |   record_bytes = _HEIGHT * _WIDTH * _DEPTH + _LABEL_BYTES
 81 |   return tf.data.FixedLengthRecordDataset(filenames, record_bytes)
 82 | 
 83 | 
 84 | def get_filenames(is_training, data_dir):
 85 |   """Returns a list of filenames."""
 86 |   data_dir = os.path.join(data_dir, 'cifar-100-binary')
 87 | 
 88 |   assert os.path.exists(data_dir), (
 89 |       'Run cifar100_download_and_extract.py first to download and extract the '
 90 |       'CIFAR-100 data.')
 91 | 
 92 |   # The CIFAR-100 binary archive holds a single training file and a single
 93 |   # test file, unlike CIFAR-10, which splits the training set into five batch
 94 |   # files.
 95 |   if is_training:
 96 |     return [os.path.join(data_dir, 'train.bin')]
 97 |   else:
 98 |     return [os.path.join(data_dir, 'test.bin')]
 99 | 
100 | 
101 | def parse_record(raw_record):
102 |   """Parse CIFAR-100 image and fine label from a raw record."""
103 |   # Every record consists of a coarse label byte and a fine label byte followed
104 |   # by the image, with a fixed number of bytes for each.
105 |   label_bytes = _LABEL_BYTES
106 |   image_bytes = _HEIGHT * _WIDTH * _DEPTH
107 |   record_bytes = label_bytes + image_bytes
108 | 
109 |   # Convert bytes to a vector of uint8 that is record_bytes long.
110 |   record_vector = tf.decode_raw(raw_record, tf.uint8)
111 | 
112 |   # The second byte is the fine (100-class) label, which we convert from uint8
113 |   # to int32 and then to one-hot.
114 |   label = tf.cast(record_vector[1], tf.int32)
115 |   label = tf.one_hot(label, _NUM_CLASSES)
116 | 
117 |   # The remaining bytes after the label represent the image, which we reshape
118 |   # from [depth * height * width] to [depth, height, width].
119 |   depth_major = tf.reshape(
120 |       record_vector[label_bytes:record_bytes], [_DEPTH, _HEIGHT, _WIDTH])
121 | 
122 |   # Convert from [depth, height, width] to [height, width, depth], and cast as
123 |   # float32.
124 |   image = tf.cast(tf.transpose(depth_major, [1, 2, 0]), tf.float32)
125 | 
126 |   return image, label
127 | 
128 | 
129 | def preprocess_image(image, is_training):
130 |   """Preprocess a single image of layout [height, width, depth]."""
131 |   if is_training:
132 |     # Resize the image to add four extra pixels on each side.
133 |     image = tf.image.resize_image_with_crop_or_pad(
134 |         image, _HEIGHT + 8, _WIDTH + 8)
135 | 
136 |     # Randomly crop a [_HEIGHT, _WIDTH] section of the image.
137 |     image = tf.random_crop(image, [_HEIGHT, _WIDTH, _DEPTH])
138 | 
139 |     # Randomly flip the image horizontally.
140 |     image = tf.image.random_flip_left_right(image)
141 | 
142 |   # Subtract off the mean and divide by the standard deviation of the pixels.
143 |   image = tf.image.per_image_standardization(image)
144 |   return image
145 | 
146 | 
147 | def input_fn(is_training, data_dir, batch_size, num_epochs=1):
148 |   """Input_fn using the tf.data input pipeline for the CIFAR-100 dataset.
149 | 
150 |   Args:
151 |     is_training: A boolean denoting whether the input is for training.
152 |     data_dir: The directory containing the input data.
153 |     batch_size: The number of samples per batch.
154 |     num_epochs: The number of epochs to repeat the dataset.
155 | 
156 |   Returns:
157 |     A tuple of images and labels.
158 |   """
159 |   dataset = record_dataset(get_filenames(is_training, data_dir))
160 | 
161 |   if is_training:
162 |     # When choosing shuffle buffer sizes, larger sizes result in better
163 |     # randomness, while smaller sizes have better performance. Because CIFAR-100
164 |     # is a relatively small dataset, we choose to shuffle the full epoch.
165 |     dataset = dataset.shuffle(buffer_size=_NUM_IMAGES['train'])
166 | 
167 |   dataset = dataset.map(parse_record)
168 |   dataset = dataset.map(
169 |       lambda image, label: (preprocess_image(image, is_training), label))
170 | 
171 |   dataset = dataset.prefetch(2 * batch_size)
172 | 
173 |   # We call repeat after shuffling, rather than before, to prevent separate
174 |   # epochs from blending together.
175 |   dataset = dataset.repeat(num_epochs)
176 | 
177 |   # Batch results by up to batch_size, and then fetch the tuple from the
178 |   # iterator.
179 |   dataset = dataset.batch(batch_size)
180 |   iterator = dataset.make_one_shot_iterator()
181 |   images, labels = iterator.get_next()
182 | 
183 |   return images, labels
184 | 
185 | 
186 | def cifar10_model_fn(features, labels, mode, params):
187 |   """Model function for CIFAR-100."""
188 |   tf.summary.image('images', features, max_outputs=6)
189 | 
190 |   network = resnet_model.cifar10_resnet_v2_generator(
191 |           params['resnet_size'], _NUM_CLASSES, params['data_format'])
192 | 
193 |   inputs = tf.reshape(features, [-1, _HEIGHT, _WIDTH, _DEPTH])
194 |   logits = network(inputs, mode == tf.estimator.ModeKeys.TRAIN)
195 | 
196 |   predictions = {
197 |       'classes': tf.argmax(logits, axis=1),
198 |       'probabilities': tf.nn.softmax(logits, name='softmax_tensor')
199 |   }
200 | 
201 |   if mode == tf.estimator.ModeKeys.PREDICT:
202 |     return tf.estimator.EstimatorSpec(mode=mode, predictions=predictions)
203 | 
204 |   # Calculate loss, which includes softmax cross entropy and L2 regularization.
205 |   cross_entropy = tf.losses.softmax_cross_entropy(
206 |       logits=logits, onehot_labels=labels)
207 | 
208 |   # Create a tensor named cross_entropy for logging purposes.
209 |   tf.identity(cross_entropy, name='cross_entropy')
210 |   tf.summary.scalar('cross_entropy', cross_entropy)
211 | 
212 |   # Add weight decay to the loss.
213 |   loss = cross_entropy + _WEIGHT_DECAY * tf.add_n(
214 |       [tf.nn.l2_loss(v) for v in tf.trainable_variables()])
215 | 
216 |   if mode == tf.estimator.ModeKeys.TRAIN:
217 |     # Scale the learning rate linearly with the batch size. When the batch size
218 |     # is 128, the learning rate should be 0.1.
219 |     initial_learning_rate = 0.1 * params['batch_size'] / 128
220 |     batches_per_epoch = _NUM_IMAGES['train'] / params['batch_size']
221 |     global_step = tf.train.get_or_create_global_step()
222 | 
223 |     # Multiply the learning rate by 0.1 at 100, 150, and 200 epochs.
224 |     boundaries = [int(batches_per_epoch * epoch) for epoch in [100, 150, 200]]
225 |     values = [initial_learning_rate * decay for decay in [1, 0.1, 0.01, 0.001]]
226 |     learning_rate = tf.train.piecewise_constant(
227 |         tf.cast(global_step, tf.int32), boundaries, values)
228 | 
229 |     # Create a tensor named learning_rate for logging purposes
230 |     tf.identity(learning_rate, name='learning_rate')
231 |     tf.summary.scalar('learning_rate', learning_rate)
232 | 
233 |     optimizer = tf.train.MomentumOptimizer(
234 |         learning_rate=learning_rate,
235 |         momentum=_MOMENTUM)
236 | 
237 |     # Batch norm requires update ops to be added as a dependency to the train_op
238 |     update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
239 |     with tf.control_dependencies(update_ops):
240 |       train_op = optimizer.minimize(loss, global_step)
241 |   else:
242 |     train_op = None
243 | 
244 |   accuracy = tf.metrics.accuracy(
245 |       tf.argmax(labels, axis=1), predictions['classes'])
246 |   metrics = {'accuracy': accuracy}
247 | 
248 |   # Create a tensor named train_accuracy for logging purposes
249 |   tf.identity(accuracy[1], name='train_accuracy')
250 |   tf.summary.scalar('train_accuracy', accuracy[1])
251 | 
252 |   return tf.estimator.EstimatorSpec(
253 |       mode=mode,
254 |       predictions=predictions,
255 |       loss=loss,
256 |       train_op=train_op,
257 |       eval_metric_ops=metrics)
258 | 
259 | 
260 | def main(unused_argv):
261 |   # Using the Winograd non-fused algorithms provides a small performance boost.
262 |   os.environ['TF_ENABLE_WINOGRAD_NONFUSED'] = '1'
263 |   os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'
264 |   run_config = tf.estimator.RunConfig(session_config=tf.ConfigProto(
265 |       allow_soft_placement=True, log_device_placement=True))
266 |   cifar_classifier = tf.estimator.Estimator(
267 |       model_fn=cifar10_model_fn, model_dir=FLAGS.model_dir, config=run_config,
268 |       params={
269 |           'resnet_size': FLAGS.resnet_size,
270 |           'data_format': FLAGS.data_format,
271 |           'batch_size': FLAGS.batch_size,
272 |       })
273 | 
274 |   # FLAGS.train_epochs // FLAGS.epochs_per_eval
275 |   for _ in range(FLAGS.train_epochs):
276 |     tensors_to_log = {
277 |         'learning_rate': 'learning_rate',
278 |         'cross_entropy': 'cross_entropy',
279 |         'train_accuracy': 'train_accuracy'
280 |     }
281 | 
282 |     logging_hook = tf.train.LoggingTensorHook(
283 |         tensors=tensors_to_log, every_n_iter=100)
284 | 
285 |     # cifar_classifier.train(
286 |     #     input_fn=lambda: input_fn(
287 |     #         True, FLAGS.data_dir, FLAGS.batch_size, FLAGS.epochs_per_eval),
288 |     #     hooks=[logging_hook])
289 |     cifar_classifier.train(
290 |         input_fn=lambda: input_fn(
291 |             True, FLAGS.data_dir, FLAGS.batch_size, FLAGS.epochs_per_eval))
292 | 
293 |     # Evaluate the model and print results
294 |     eval_results = cifar_classifier.evaluate(
295 |         input_fn=lambda: input_fn(False, FLAGS.data_dir, FLAGS.batch_size))
296 |     print(eval_results)
297 | 
298 | 
299 | 
300 | if __name__ == '__main__':
301 |   tf.logging.set_verbosity(tf.logging.INFO)
302 |   FLAGS, unparsed = parser.parse_known_args()
303 |   tf.app.run(argv=[sys.argv[0]] + unparsed)
304 | 


--------------------------------------------------------------------------------
/src/cifar10_download_and_extract.py:
--------------------------------------------------------------------------------
 1 | # Copyright 2015 The TensorFlow Authors. All Rights Reserved.
 2 | #
 3 | # Licensed under the Apache License, Version 2.0 (the "License");
 4 | # you may not use this file except in compliance with the License.
 5 | # You may obtain a copy of the License at
 6 | #
 7 | #     http://www.apache.org/licenses/LICENSE-2.0
 8 | #
 9 | # Unless required by applicable law or agreed to in writing, software
10 | # distributed under the License is distributed on an "AS IS" BASIS,
11 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
12 | # See the License for the specific language governing permissions and
13 | # limitations under the License.
14 | # ==============================================================================
15 | 
16 | """Downloads and extracts the binary version of the CIFAR-10 dataset."""
17 | 
18 | from __future__ import absolute_import
19 | from __future__ import division
20 | from __future__ import print_function
21 | 
22 | import argparse
23 | import os
24 | import sys
25 | import tarfile
26 | 
27 | from six.moves import urllib
28 | import tensorflow as tf
29 | 
30 | DATA_URL = 'https://www.cs.toronto.edu/~kriz/cifar-10-binary.tar.gz'
31 | 
32 | parser = argparse.ArgumentParser()
33 | 
34 | parser.add_argument(
35 |     '--data_dir', type=str, default='../data/cifar10_data',
36 |     help='Directory to download data and extract the tarball')
37 | 
38 | 
39 | def main(unused_argv):
40 |   """Download and extract the tarball from Alex's website."""
41 |   if not os.path.exists(FLAGS.data_dir):
42 |     os.makedirs(FLAGS.data_dir)
43 | 
44 |   filename = DATA_URL.split('/')[-1]
45 |   filepath = os.path.join(FLAGS.data_dir, filename)
46 | 
47 |   if not os.path.exists(filepath):
48 |     def _progress(count, block_size, total_size):
49 |       sys.stdout.write('\r>> Downloading %s %.1f%%' % (
50 |           filename, 100.0 * count * block_size / total_size))
51 |       sys.stdout.flush()
52 | 
53 |     filepath, _ = urllib.request.urlretrieve(DATA_URL, filepath, _progress)
54 |     print()
55 |     statinfo = os.stat(filepath)
56 |     print('Successfully downloaded', filename, statinfo.st_size, 'bytes.')
57 | 
58 |   tarfile.open(filepath, 'r:gz').extractall(FLAGS.data_dir)
59 | 
60 | 
61 | if __name__ == '__main__':
62 |   FLAGS, unparsed = parser.parse_known_args()
63 |   tf.app.run(argv=[sys.argv[0]] + unparsed)
64 | 
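
Note on the extracted data: according to `parse_record` in main.py, each record in the binary files is one label byte followed by 32 * 32 * 3 image bytes stored depth-major. The NumPy sketch below decodes such a buffer without TensorFlow. The function name `parse_records` and the synthetic two-record buffer are illustrative assumptions, not part of this repository.

```python
import numpy as np

HEIGHT, WIDTH, DEPTH = 32, 32, 3
NUM_CLASSES = 10
RECORD_BYTES = 1 + HEIGHT * WIDTH * DEPTH  # 1 label byte + 3072 image bytes


def parse_records(raw_bytes):
    """Split a bytes buffer into (images, one_hot_labels) NumPy arrays."""
    data = np.frombuffer(raw_bytes, dtype=np.uint8)
    if data.size % RECORD_BYTES != 0:
        raise ValueError("buffer length is not a multiple of the record size")
    records = data.reshape(-1, RECORD_BYTES)
    # The first byte of every record is the class label.
    labels = records[:, 0].astype(np.int64)
    # The remaining bytes are the image in [depth, height, width] order.
    depth_major = records[:, 1:].reshape(-1, DEPTH, HEIGHT, WIDTH)
    # Convert to [height, width, depth] and cast to float32.
    images = depth_major.transpose(0, 2, 3, 1).astype(np.float32)
    one_hot = np.eye(NUM_CLASSES, dtype=np.float32)[labels]
    return images, one_hot


# Two synthetic records stand in for the contents of a data_batch_*.bin file.
fake = bytes([3]) + bytes(range(256)) * 12 + bytes([7]) + bytes(3072)
images, labels = parse_records(fake)
print(images.shape, labels.argmax(axis=1))
```

To decode a real file, the same function could be applied to the bytes read from one of the extracted `.bin` files.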


--------------------------------------------------------------------------------
/src/config.py:
--------------------------------------------------------------------------------
 1 | import os
 2 | import utils
 3 | import tensorflow as tf
 4 | 
 5 | class Config(object):
 6 |     def __init__(self, args):
 7 |         self.codebase_root_path = args.path
 8 |         self.folder_suffix = args.folder_suffix
 9 |         self.project_name = args.project
10 |         self.dataset_name = args.dataset
11 |         self.batch_size = args.batch_size
12 |         self.max_epochs = args.max_epochs
13 |         self.num_classes = args.num_classes
14 |         self.hyperparams = args.hyperparams
15 |         self.load = args.load
16 |         self.debug = args.debug
17 |         class Solver(object):
18 |             def __init__(self, t_args):
19 |                 self.learning_rate = t_args.lr
20 |                 self.dropout = t_args.dropout
21 |                 if t_args.opt.lower() not in ["adam", "rmsprop", "sgd", "normal"]:
 22 |                     raise ValueError('Undefined type of optimizer')
23 |                 else:
24 |                     self.optimizer = {"adam": tf.train.AdamOptimizer, "rmsprop": tf.train.RMSPropOptimizer, "sgd": tf.train.GradientDescentOptimizer, "normal": tf.train.Optimizer}[t_args.opt.lower()](self.learning_rate)
25 | 
26 |         self.solver = Solver(args)
27 |         self.project_path, self.project_prefix_path, self.dataset_path, self.train_path, self.test_path, self.ckptdir_path = self.set_paths()
28 | 
29 |     def set_paths(self):
30 |         project_path = utils.path_exists(self.codebase_root_path)
31 |         project_prefix_path = "" #utils.path_exists(os.path.join(self.codebase_root_path, self.project_name, self.folder_suffix))
32 |         dataset_path = utils.path_exists(os.path.join(self.codebase_root_path, "data", self.dataset_name))
33 |         ckptdir_path = utils.path_exists(os.path.join(self.codebase_root_path, "checkpoint"))
34 |         train_path = os.path.join(dataset_path, "data_batch_")
35 |         test_path = os.path.join(dataset_path, "test_batch")
36 | 
37 |         return project_path, project_prefix_path, dataset_path, train_path, test_path, ckptdir_path
38 | 


--------------------------------------------------------------------------------
/src/dataset.py:
--------------------------------------------------------------------------------
 1 | import os
 2 | import utils
 3 | import pickle
 4 | import numpy as np
 5 | import matplotlib.pyplot as plt
 6 | import matplotlib.image as mpimg
 7 | 
 8 | class DataSet(object):
 9 |     def __init__(self, config):
10 |         self.config = config
11 |         self.batch_count = 1
12 | 
13 |     def load_data(self, file_name):
14 |         with open(file_name, 'rb') as file:
15 |             unpickler = pickle._Unpickler(file)
16 |             unpickler.encoding = 'latin1'
17 |             contents = unpickler.load()
18 |             X, Y = np.asarray(contents['data'], dtype=np.float32), np.asarray(contents['labels'])
19 |             one_hot = np.zeros((Y.size, Y.max() + 1))
20 |             one_hot[np.arange(Y.size), Y] = 1
21 |             return X, one_hot
22 | 
23 |     def get_batch(self, type_):
24 |         if type_ == "test":
25 |             return self.load_data(self.config.test_path)
26 |         elif type_ == "train": 
 27 |             self.batch_count += 1  # advance the counter, then load the batch it pointed to
 28 |             return self.load_data(self.config.train_path + str(self.batch_count - 1))
29 |         elif type_ == "validation":
30 |             return self.load_data(self.config.train_path + "5")
31 | 
32 |     def next_batch(self, type_):
33 |         if self.batch_count > 4:
34 |             self.batch_count = 1
35 |         X, Y = self.get_batch(type_)
36 |         start, batch_size, tot = 0, self.config.batch_size, len(X)
 37 |         total = tot // batch_size  # number of full batches; a trailing partial batch is dropped
 38 |         while start < total:
 39 |             end = (start + 1) * batch_size
 40 |             x = X[start * batch_size : end, :]
 41 |             y = Y[start * batch_size : end, :]
42 |             start += 1
43 |             yield (x, y, int(total))
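
Note on batching: `next_batch` is meant to walk the loaded arrays in consecutive, non-overlapping slices of `batch_size` rows, with labels one-hot encoded as in `load_data`. A minimal standalone sketch of that idea follows. The helper names `one_hot` and `iter_batches` and the toy arrays are assumptions made for illustration only.

```python
import numpy as np


def one_hot(labels, num_classes):
    """Return a [len(labels), num_classes] matrix with a single 1 per row."""
    encoded = np.zeros((labels.size, num_classes), dtype=np.float32)
    encoded[np.arange(labels.size), labels] = 1.0
    return encoded


def iter_batches(X, Y, batch_size):
    """Yield (x, y, total) for each full batch; a trailing partial batch is dropped."""
    total = len(X) // batch_size
    for index in range(total):
        start = index * batch_size
        end = start + batch_size
        yield X[start:end], Y[start:end], total


# Toy data: 10 samples with 2 features each, 3 classes.
X = np.arange(20, dtype=np.float32).reshape(10, 2)
Y = one_hot(np.arange(10) % 3, num_classes=3)
batches = list(iter_batches(X, Y, batch_size=4))
print(len(batches), batches[1][0][0])
```

With 10 samples and a batch size of 4, the last two samples are skipped, which mirrors the "fix the last batch" caveat in the original comment.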


--------------------------------------------------------------------------------
/src/img/Rnn.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Neoanarika/Searching-for-activation-functions/1615f3fa423ab6a7977aad9d3ebf4a0ace2fab63/src/img/Rnn.png


--------------------------------------------------------------------------------
/src/img/activationfunctions.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Neoanarika/Searching-for-activation-functions/1615f3fa423ab6a7977aad9d3ebf4a0ace2fab63/src/img/activationfunctions.png


--------------------------------------------------------------------------------
/src/img/graph.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Neoanarika/Searching-for-activation-functions/1615f3fa423ab6a7977aad9d3ebf4a0ace2fab63/src/img/graph.png


--------------------------------------------------------------------------------
/src/img/loss_rmsprop.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Neoanarika/Searching-for-activation-functions/1615f3fa423ab6a7977aad9d3ebf4a0ace2fab63/src/img/loss_rmsprop.png


--------------------------------------------------------------------------------
/src/img/nas.jpeg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Neoanarika/Searching-for-activation-functions/1615f3fa423ab6a7977aad9d3ebf4a0ace2fab63/src/img/nas.jpeg


--------------------------------------------------------------------------------
/src/main.py:
--------------------------------------------------------------------------------
  1 | # Copyright 2017 The TensorFlow Authors. All Rights Reserved.
  2 | #
  3 | # Licensed under the Apache License, Version 2.0 (the "License");
  4 | # you may not use this file except in compliance with the License.
  5 | # You may obtain a copy of the License at
  6 | #
  7 | #     http://www.apache.org/licenses/LICENSE-2.0
  8 | #
  9 | # Unless required by applicable law or agreed to in writing, software
 10 | # distributed under the License is distributed on an "AS IS" BASIS,
 11 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 12 | # See the License for the specific language governing permissions and
 13 | # limitations under the License.
 14 | # ==============================================================================
 15 | """Runs a ResNet model on the CIFAR-10 dataset."""
 16 | 
 17 | from __future__ import absolute_import
 18 | from __future__ import division
 19 | from __future__ import print_function
 20 | 
 21 | import argparse
 22 | import os
 23 | import sys
 24 | 
 25 | import tensorflow as tf
 26 | 
 27 | import childnetwork as resnet_model
 28 | from rnn_controller import Network
 29 | from config import Config
 30 | from parser import Parser
 31 | 
 32 | parser = argparse.ArgumentParser()
 33 | 
 34 | # Basic model parameters.
 35 | parser.add_argument('--data_dir', type=str, default='../data/cifar10_data',
 36 |                     help='The path to the CIFAR-10 data directory.')
 37 | 
 38 | parser.add_argument('--model_dir', type=str, default='/tmp/cifar10_model',
 39 |                     help='The directory where the model will be stored.')
 40 | 
 41 | parser.add_argument('--resnet_size', type=int, default=20,
 42 |                     help='The size of the ResNet model to use. We set as 20 by default following the paper')
 43 | 
 44 | parser.add_argument('--train_epochs', type=int, default=100,
 45 |                     help='The number of epochs to train the RNN controller to generate the Activation function')
 46 | 
 47 | parser.add_argument('--epochs_per_eval', type=int, default=10,
 48 |                     help='The number of epochs to run in between evaluations.')
 49 | 
 50 | parser.add_argument('--batch_size', type=int, default=5,
 51 |                     help='The number of images per batch.')
 52 | 
 53 | parser.add_argument(
 54 |     '--data_format', type=str, default=None,
 55 |     choices=['channels_first', 'channels_last'],
 56 |     help='A flag to override the data format used in the model. channels_first '
 57 |          'provides a performance boost on GPU but is not always compatible '
 58 |          'with CPU. If left unspecified, the data format will be chosen '
 59 |          'automatically based on whether TensorFlow was built for CPU or GPU.')
 60 | 
 61 | _HEIGHT = 32
 62 | _WIDTH = 32
 63 | _DEPTH = 3
 64 | _NUM_CLASSES = 10
 65 | _NUM_DATA_FILES = 5
 66 | 
 67 | # We use a weight decay of 0.0002, which performs better than the 0.0001 that
 68 | # was originally suggested.
 69 | _WEIGHT_DECAY = 2e-4
 70 | _MOMENTUM = 0.9
 71 | 
 72 | _NUM_IMAGES = {
 73 |     'train': 50000,
 74 |     'validation': 10000,
 75 | }
 76 | 
 77 | 
 78 | def record_dataset(filenames):
 79 |   """Returns an input pipeline Dataset from `filenames`."""
 80 |   record_bytes = _HEIGHT * _WIDTH * _DEPTH + 1
 81 |   return tf.data.FixedLengthRecordDataset(filenames, record_bytes)
 82 | 
 83 | 
 84 | def get_filenames(is_training, data_dir):
 85 |   """Returns a list of filenames."""
 86 |   data_dir = os.path.join(data_dir, 'cifar-10-batches-bin')
 87 | 
 88 |   assert os.path.exists(data_dir), (
 89 |       'Run cifar10_download_and_extract.py first to download and extract the '
 90 |       'CIFAR-10 data.')
 91 | 
 92 |   if is_training:
 93 |     return [
 94 |         os.path.join(data_dir, 'data_batch_%d.bin' % i)
 95 |         for i in range(1, _NUM_DATA_FILES + 1)
 96 |     ]
 97 |   else:
 98 |     return [os.path.join(data_dir, 'test_batch.bin')]
 99 | 
100 | 
101 | def parse_record(raw_record):
102 |   """Parse CIFAR-10 image and label from a raw record."""
103 |   # Every record consists of a label followed by the image, with a fixed number
104 |   # of bytes for each.
105 |   label_bytes = 1
106 |   image_bytes = _HEIGHT * _WIDTH * _DEPTH
107 |   record_bytes = label_bytes + image_bytes
108 | 
109 |   # Convert bytes to a vector of uint8 that is record_bytes long.
110 |   record_vector = tf.decode_raw(raw_record, tf.uint8)
111 | 
112 |   # The first byte represents the label, which we convert from uint8 to int32
113 |   # and then to one-hot.
114 |   label = tf.cast(record_vector[0], tf.int32)
115 |   label = tf.one_hot(label, _NUM_CLASSES)
116 | 
117 |   # The remaining bytes after the label represent the image, which we reshape
118 |   # from [depth * height * width] to [depth, height, width].
119 |   depth_major = tf.reshape(
120 |       record_vector[label_bytes:record_bytes], [_DEPTH, _HEIGHT, _WIDTH])
121 | 
122 |   # Convert from [depth, height, width] to [height, width, depth], and cast as
123 |   # float32.
124 |   image = tf.cast(tf.transpose(depth_major, [1, 2, 0]), tf.float32)
125 | 
126 |   return image, label
127 | 
128 | 
129 | def preprocess_image(image, is_training):
130 |   """Preprocess a single image of layout [height, width, depth]."""
131 |   if is_training:
132 |     # Resize the image to add four extra pixels on each side.
133 |     image = tf.image.resize_image_with_crop_or_pad(
134 |         image, _HEIGHT + 8, _WIDTH + 8)
135 | 
136 |     # Randomly crop a [_HEIGHT, _WIDTH] section of the image.
137 |     image = tf.random_crop(image, [_HEIGHT, _WIDTH, _DEPTH])
138 | 
139 |     # Randomly flip the image horizontally.
140 |     image = tf.image.random_flip_left_right(image)
141 | 
142 |   # Subtract off the mean and divide by the standard deviation of the pixels.
143 |   image = tf.image.per_image_standardization(image)
144 |   return image
145 | 
146 | 
147 | def input_fn(is_training, data_dir, batch_size, num_epochs=1):
148 |   """Input_fn using the tf.data input pipeline for CIFAR-10 dataset.
149 | 
150 |   Args:
151 |     is_training: A boolean denoting whether the input is for training.
152 |     data_dir: The directory containing the input data.
153 |     batch_size: The number of samples per batch.
154 |     num_epochs: The number of epochs to repeat the dataset.
155 | 
156 |   Returns:
157 |     A tuple of images and labels.
158 |   """
159 |   dataset = record_dataset(get_filenames(is_training, data_dir))
160 | 
161 |   if is_training:
162 |     # When choosing shuffle buffer sizes, larger sizes result in better
163 |     # randomness, while smaller sizes have better performance. Because CIFAR-10
164 |     # is a relatively small dataset, we choose to shuffle the full epoch.
165 |     dataset = dataset.shuffle(buffer_size=_NUM_IMAGES['train'])
166 | 
167 |   dataset = dataset.map(parse_record)
168 |   dataset = dataset.map(
169 |       lambda image, label: (preprocess_image(image, is_training), label))
170 | 
171 |   dataset = dataset.prefetch(2 * batch_size)
172 | 
173 |   # We call repeat after shuffling, rather than before, to prevent separate
174 |   # epochs from blending together.
175 |   dataset = dataset.repeat(num_epochs)
176 | 
177 |   # Batch results by up to batch_size, and then fetch the tuple from the
178 |   # iterator.
179 |   dataset = dataset.batch(batch_size)
180 |   iterator = dataset.make_one_shot_iterator()
181 |   images, labels = iterator.get_next()
182 | 
183 |   return images, labels
184 | 
185 | 
186 | def cifar10_model_fn(features, labels, mode, params):
187 |   """Model function for CIFAR-10."""
188 |   tf.summary.image('images', features, max_outputs=6)
189 | 
190 |   network = resnet_model.cifar10_resnet_v2_generator(
191 |           params['resnet_size'], _NUM_CLASSES, params['data_format'])
192 | 
193 |   inputs = tf.reshape(features, [-1, _HEIGHT, _WIDTH, _DEPTH])
194 |   logits = network(inputs, mode == tf.estimator.ModeKeys.TRAIN)
195 | 
196 |   predictions = {
197 |       'classes': tf.argmax(logits, axis=1),
198 |       'probabilities': tf.nn.softmax(logits, name='softmax_tensor')
199 |   }
200 | 
201 |   if mode == tf.estimator.ModeKeys.PREDICT:
202 |     return tf.estimator.EstimatorSpec(mode=mode, predictions=predictions)
203 | 
204 |   # Calculate loss, which includes softmax cross entropy and L2 regularization.
205 |   cross_entropy = tf.losses.softmax_cross_entropy(
206 |       logits=logits, onehot_labels=labels)
207 | 
208 |   # Create a tensor named cross_entropy for logging purposes.
209 |   tf.identity(cross_entropy, name='cross_entropy')
210 |   tf.summary.scalar('cross_entropy', cross_entropy)
211 | 
212 |   # Add weight decay to the loss.
213 |   loss = cross_entropy + _WEIGHT_DECAY * tf.add_n(
214 |       [tf.nn.l2_loss(v) for v in tf.trainable_variables()])
215 | 
216 |   if mode == tf.estimator.ModeKeys.TRAIN:
217 |     # Scale the learning rate linearly with the batch size. When the batch size
218 |     # is 128, the learning rate should be 0.1.
219 |     initial_learning_rate = 0.1 * params['batch_size'] / 128
220 |     batches_per_epoch = _NUM_IMAGES['train'] / params['batch_size']
221 |     global_step = tf.train.get_or_create_global_step()
222 | 
223 |     # Multiply the learning rate by 0.1 at 100, 150, and 200 epochs.
224 |     boundaries = [int(batches_per_epoch * epoch) for epoch in [100, 150, 200]]
225 |     values = [initial_learning_rate * decay for decay in [1, 0.1, 0.01, 0.001]]
226 |     learning_rate = tf.train.piecewise_constant(
227 |         tf.cast(global_step, tf.int32), boundaries, values)
228 | 
229 |     # Create a tensor named learning_rate for logging purposes
230 |     tf.identity(learning_rate, name='learning_rate')
231 |     tf.summary.scalar('learning_rate', learning_rate)
232 | 
233 |     optimizer = tf.train.MomentumOptimizer(
234 |         learning_rate=learning_rate,
235 |         momentum=_MOMENTUM)
236 | 
237 |     # Batch norm requires update ops to be added as a dependency to the train_op
238 |     update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
239 |     with tf.control_dependencies(update_ops):
240 |       train_op = optimizer.minimize(loss, global_step)
241 |   else:
242 |     train_op = None
243 | 
244 |   accuracy = tf.metrics.accuracy(
245 |       tf.argmax(labels, axis=1), predictions['classes'])
246 |   metrics = {'accuracy': accuracy}
247 | 
248 |   # Create a tensor named train_accuracy for logging purposes
249 |   tf.identity(accuracy[1], name='train_accuracy')
250 |   tf.summary.scalar('train_accuracy', accuracy[1])
251 | 
252 |   return tf.estimator.EstimatorSpec(
253 |       mode=mode,
254 |       predictions=predictions,
255 |       loss=loss,
256 |       train_op=train_op,
257 |       eval_metric_ops=metrics)
258 | 
259 | 
260 | def main(unused_argv):
261 |   # Using the Winograd non-fused algorithms provides a small performance boost.
262 |   os.environ['TF_ENABLE_WINOGRAD_NONFUSED'] = '1'
263 |   os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'
264 |   gamma = 0.95
265 |   #RNN controller
266 |   args = Parser().get_parser().parse_args()
267 |   #Defining rnn
268 |   val_accuracy = tf.placeholder(tf.float32)
269 |   config = Config(args)
270 |   net = Network(config)
271 | 
272 |   #Generate hyperparams
273 |   A_t = tf.zeros((1,1))
274 |   # PPO implementation
275 |   for i in range(FLAGS.train_epochs):
276 |       outputs,prob,value = net.neural_search()
277 |       hyperparams = net.gen_hyperparams(outputs)
278 |       tf.assert_rank_at_least(tf.convert_to_tensor(prob),1,message="prob has the wrong rank")
279 |       c_1=1
280 |       c_2=0.01
281 |       if i >0 :
282 |           # Policy ratio.
283 |           # Written as tf.exp(tf.log(prob) - tf.log(old_prob)) instead of prob/old_prob
284 |           # to improve numerical stability.
285 |           r = tf.exp(tf.log(prob) - tf.log(old_prob))
286 |           # Enforcing the Bellman equation
287 |           delta_t = eval_results["accuracy"] + gamma*value - old_value
288 |           A_t = delta_t + gamma*A_t
289 |           L_clip = net.Lclip(r,A_t)
290 |           L_vf = net.Lvf(delta_t)
291 |           entropy_penalty = -tf.reduce_sum(prob * tf.log(tf.clip_by_value(prob, 1e-10, 1.0)))
292 |           tf.assert_rank(L_clip,0,message="L_clip is computed wrongly, wrong rank")
293 |           tf.assert_rank(L_vf,0,message="L_vf is computed wrongly, wrong rank")
294 |           tf.assert_rank(entropy_penalty,0,message="entropy_penalty is computed wrongly, wrong rank")
295 |           total_loss = L_clip - c_1*L_vf + c_2 * entropy_penalty
296 |           tf.summary.scalar('loss',total_loss)
297 | 
298 |       tf_config=tf.ConfigProto(allow_soft_placement=True, log_device_placement=True)
299 |       tf_config.gpu_options.allow_growth = True
300 |       sess = tf.Session(config=tf_config)
301 |       sess.run(tf.global_variables_initializer())
302 |       sess.run(tf.local_variables_initializer())
303 |       merged = tf.summary.merge_all()
304 |       train_writer = tf.summary.FileWriter('train',sess.graph)
305 | 
306 |       # Set up a RunConfig to only save checkpoints once per training cycle.
307 |       #run_config = tf.estimator.RunConfig().replace(session_config=tf.ConfigProto(log_device_placement=True),save_checkpoints_secs=1e9)
308 |       print(sess.run(hyperparams))
309 |       print(sess.run(value))
310 |       # "tmp" is a temporary file that stores the encoded activation function.
311 |       # main.py uses it to pass the activation function to the child network, which reads the file while the program is running.
312 |       # It also acts as a cache for the final activation function found by the algorithm.
313 |       with open("tmp","w") as f:
314 |           f.write(' '.join(map(str,sess.run(hyperparams))))
315 | 
316 |       run_config = tf.estimator.RunConfig().replace(session_config=tf.ConfigProto(allow_soft_placement=True,log_device_placement=True))
317 |       cifar_classifier = tf.estimator.Estimator(
318 |       model_fn=cifar10_model_fn, model_dir=FLAGS.model_dir, config=run_config,
319 |       params={
320 |         'resnet_size': FLAGS.resnet_size,
321 |         'data_format': FLAGS.data_format,
322 |         'batch_size': FLAGS.batch_size,
323 |       })
324 | 
325 |       for _ in range(FLAGS.train_epochs // FLAGS.epochs_per_eval):
326 |         tensors_to_log = {
327 |             'learning_rate': 'learning_rate',
328 |             'cross_entropy': 'cross_entropy',
329 |             'train_accuracy': 'train_accuracy'
330 |         }
331 | 
332 |         logging_hook = tf.train.LoggingTensorHook(
333 |             tensors=tensors_to_log, every_n_iter=100)
334 | 
335 |         cifar_classifier.train(
336 |             input_fn=lambda: input_fn(
337 |                 True, FLAGS.data_dir, FLAGS.batch_size, FLAGS.epochs_per_eval))
338 | 
339 |         # Evaluate the model and print results
340 |         eval_results = cifar_classifier.evaluate(
341 |             input_fn=lambda: input_fn(False, FLAGS.data_dir, FLAGS.batch_size))
342 |         print(eval_results)
343 | 
344 |         old_prob = tf.identity(prob)
345 |         old_value = tf.identity(value)
346 | 
347 |         if i >0 :
348 |           print("Training RNN")
349 |           tr_cont_step = net.update(total_loss)
350 |           sess.run(tf.global_variables_initializer())
351 |           _ = sess.run(tr_cont_step, feed_dict={val_accuracy : eval_results["accuracy"]})
352 |           print("RNN Trained")
353 |   assert A_t !=tf.zeros((1,1)),  "Advantage function was not computed correctly"
354 | 
355 | 
356 | if __name__ == '__main__':
357 |   tf.logging.set_verbosity(tf.logging.INFO)
358 |   FLAGS, unparsed = parser.parse_known_args()
359 |   tf.app.run(argv=[sys.argv[0]] + unparsed)
360 | 
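
Note on the controller objective: the loop in `main` combines a clipped surrogate term, a value-function term and an entropy term as `L_clip - c_1*L_vf + c_2*entropy_penalty`. The NumPy sketch below shows one common way to compute those three pieces. It assumes that the clipped term is driven by the policy ratio `r`, as in the standard PPO formulation, and the function names are hypothetical. The defaults `epsilon=0.2`, `c_1=1` and `c_2=0.01` are taken from the values used in this repository.

```python
import numpy as np


def policy_ratio(prob, old_prob, eps=1e-10):
    """Compute prob / old_prob in log space for numerical stability."""
    return np.exp(np.log(np.clip(prob, eps, 1.0)) - np.log(np.clip(old_prob, eps, 1.0)))


def clipped_surrogate(ratio, advantage, epsilon=0.2):
    """Mean of min(r * A, clip(r, 1 - epsilon, 1 + epsilon) * A)."""
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1.0 - epsilon, 1.0 + epsilon) * advantage
    return np.mean(np.minimum(unclipped, clipped))


def entropy(prob, eps=1e-10):
    """Shannon entropy -sum(p * log p); p is clipped to avoid log(0)."""
    return -np.sum(prob * np.log(np.clip(prob, eps, 1.0)))


def ppo_objective(prob, old_prob, advantage, value_error, c_1=1.0, c_2=0.01):
    """Clipped term minus weighted value loss plus weighted entropy bonus."""
    ratio = policy_ratio(prob, old_prob)
    return (clipped_surrogate(ratio, advantage)
            - c_1 * np.mean(np.square(value_error))
            + c_2 * entropy(prob))


old = np.array([0.25, 0.25, 0.25, 0.25])
new = np.array([0.40, 0.30, 0.20, 0.10])
print(ppo_objective(new, old, advantage=np.array([0.5]), value_error=np.array([0.1])))
```

Note that the entropy is computed as `p * log(p)` directly; taking `log(log(p))` is undefined because `log(p)` is never positive for a probability.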


--------------------------------------------------------------------------------
/src/parser.py:
--------------------------------------------------------------------------------
 1 | import argparse
 2 | 
 3 | class Parser(object):
 4 |     def __init__(self):
 5 |         parser = argparse.ArgumentParser()
 6 |         parser.add_argument("--path", default="../", help="Base Path for the Folder")
 7 |         parser.add_argument("--project", default="NASuRL", help="Project Folder")
 8 |         parser.add_argument("--folder_suffix", default="Default", help="Folder Name Suffix")
 9 |         parser.add_argument("--dataset", default="cifar-10", help="Name of the Dataset")
10 |         parser.add_argument("--num_classes", default=10, type=int, help="Number of Classes")
11 |         parser.add_argument("--opt", default="sgd", help="Optimizer : adam, rmsprop, sgd, normal")
 12 |         parser.add_argument("--hyperparams", default=10, type=int, help="Number of Hyperparameters to search")
13 |         parser.add_argument("--lr", default=0.1, help="Learning Rate", type=float)
14 |         parser.add_argument("--batch_size", default=75, help="Batch Size", type=int)
15 |         parser.add_argument("--dropout", default=0.5, help="Dropout Probab. for Pre-Final Layer", type=float)
16 |         parser.add_argument("--max_epochs", default=100, help="Maximum Number of Epochs", type=int)
17 |         parser.add_argument("--debug", default=False, type=self.str_to_bool, help="Debug Mode")
18 |         parser.add_argument("--load", default=False, type=self.str_to_bool, help="Load Model to calculate accuracy")
19 |         self.parser=parser
20 | 
21 |     def str_to_bool(self, string):
22 |         if string.lower() == "true":
23 |             return True
24 |         elif string.lower() == "false":
25 |             return False
26 |         else :
 27 |             raise argparse.ArgumentTypeError("Boolean Value Expected")
28 | 
29 |     def get_parser(self):
30 |         return self.parser
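
Note on boolean flags: argparse reports a clean usage error only when the `type` callable raises `argparse.ArgumentTypeError` (or `TypeError`/`ValueError`); returning the exception object would store it as the argument value. A small standalone sketch follows, using a module-level `str_to_bool` rather than the method on `Parser` for brevity.

```python
import argparse


def str_to_bool(string):
    """Convert 'true'/'false' (any case) to a bool, rejecting anything else."""
    lowered = string.lower()
    if lowered == "true":
        return True
    if lowered == "false":
        return False
    raise argparse.ArgumentTypeError("Boolean Value Expected")


parser = argparse.ArgumentParser()
parser.add_argument("--debug", default=False, type=str_to_bool, help="Debug Mode")

# Parse an explicit argument list instead of sys.argv so the sketch is reproducible.
args = parser.parse_args(["--debug", "TRUE"])
print(args.debug)
```

Passing an unsupported value such as `--debug maybe` makes argparse print a usage message and exit instead of silently accepting it.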


--------------------------------------------------------------------------------
/src/rnn_controller.py:
--------------------------------------------------------------------------------
  1 | #RNN controller code
  2 | 
  3 | import os
  4 | import sys
  5 | import utils
  6 | import numpy as np
  7 | import tensorflow as tf
  8 | 
  9 | class Network(object):
 10 |     # Some of these activation functions may be numerically unstable as implemented;
 11 |     # tf.log(1+tf.exp(x)), for example, overflows for large x.
 12 | 
 13 |     def __init__(self, config):
 14 |         self.config = config
 15 |         self.n_steps = 10
 16 |         self.n_input, self.n_hidden =  4, 2
 17 |         self.state = tf.Variable(tf.random_normal(shape=[1, 4]))
 18 |         self.lstm = tf.contrib.rnn.BasicLSTMCell(self.n_hidden, forget_bias=1.0, state_is_tuple=False)
 19 |         self.Wc, self.bc = self.init_controller_vars()
 20 |         self.Wv, self.bv = self.init_value_vars()
 21 | 
 22 |         # Other functions used in the paper
 23 |         # self.full_list_unary = {1:lambda x:x ,2:lambda x: -x, 3: tf.abs, 4:lambda x : tf.pow(x,2),5:lambda x : tf.pow(x,3),
 24 |         #   6:tf.sqrt,7:lambda x: tf.Variable(tf.truncated_normal([1], stddev=0.08))*x,
 25 |         #   8:lambda x : x + tf.Variable(tf.truncated_normal([1], stddev=0.08)),9:lambda x: tf.log(tf.abs(x)+10e-8),
 26 |         #   10:tf.exp,11:tf.sin,12:tf.sinh,13:tf.cosh,14:tf.tanh,15:tf.asinh,16:tf.atan,17:lambda x: tf.sin(x)/x,
 27 |         #   18:lambda x : tf.maximum(x,0),19:lambda x : tf.minimum(x,0),20:tf.sigmoid,21:lambda x:tf.log(1+tf.exp(x)),
 28 |         #   22:lambda x:tf.exp(-tf.pow(x,2)),23:tf.erf,24:lambda x: tf.Variable(tf.truncated_normal([1], stddev=0.08))}
 29 |         #
 30 |         # self.full_list_binary = {1:lambda x,y: x+y,2:lambda x,y:x*y,3:lambda x,y:x-y,4:lambda x,y:x/(y+10e-8),
 31 |         # 5:lambda x,y:tf.maximum(x,y),6:lambda x,y: tf.sigmoid(x)*y,7:lambda x,y:tf.exp(-tf.Variable(tf.truncated_normal([1], stddev=0.08))*tf.pow(x-y,2)),
 32 |         # 8:lambda x,y:tf.exp(-tf.Variable(tf.truncated_normal([1], stddev=0.08))*tf.abs(x-y)),
 33 |         # 9:lambda x,y: tf.Variable(tf.truncated_normal([1], stddev=0.08))*x + (1-tf.Variable(tf.truncated_normal([1], stddev=0.08)))*y}
 34 |         #
 35 |         # self.unary = {1:lambda x:x ,2:lambda x: -x, 3: lambda x: tf.maximum(x,0), 4:lambda x : tf.pow(x,2),5:tf.tanh}
 36 |         # binary = {1:lambda x,y: x+y,2:lambda x,y:x*y,3:lambda x,y:x-y,4:lambda x,y:tf.maximum(x,y),5:lambda x,y: tf.sigmoid(x)*y}
 37 |         # inputs = {1:lambda x:x , 2:lambda x:0, 3: lambda x:3.14159265,4: lambda x : 1, 5: lambda x: 1.61803399}
 38 | 
 39 |     def weight_variable(self, shape, name):
 40 |         return tf.Variable(tf.random_normal(shape=shape), name=name)
 41 | 
 42 |     def bias_variable(self, shape, name):
 43 |         return tf.Variable(tf.random_normal(shape=shape), name=name)
 44 | 
 45 |     def init_controller_vars(self):
 46 |         Wc = self.weight_variable(shape=[self.n_hidden, self.n_input], name="w_controller")
 47 |         bc = self.bias_variable(shape=[self.n_input], name="b_controller")
 48 |         return Wc, bc
 49 | 
 50 |     def init_value_vars(self):
 51 |         Wv = self.weight_variable(shape=[self.n_hidden, 1], name="w_value")
 52 |         bv = self.bias_variable(shape=[1], name="b_value")
 53 |         return Wv, bv
 54 | 
 55 |     def neural_search(self):
 56 |         inp = tf.constant(np.ones((1, 4), dtype="float32"))
 57 |         output = list()
 58 |         for _ in range(self.n_steps):
 59 |             inp, self.state = self.lstm(inp, self.state)
 60 |             value = tf.matmul(inp, self.Wv) + self.bv  # linear value head; softmax over a single unit is always 1
 61 |             inp = tf.nn.softmax(tf.matmul(inp, self.Wc) + self.bc)
 62 |             output.append(inp[0, :])
 63 |         out = [utils.max(output[i]) for i in range(self.n_steps)]
 64 |         return out, output[-1],value
 65 | 
 66 |     def gen_hyperparams(self, output):
 67 |         options = tf.constant([1,2,3,4], dtype=tf.int32)
 68 |         hyperparams = [1 for _ in range(self.n_steps)]
 69 |         # Change the following based on number of hyperparameters to be predicted
 70 |         # Removing strides for now
 71 |         hyperparams[0], hyperparams[1] = options[output[0]], options[output[1]]
 72 |         hyperparams[2] = options[output[2]]  # Layer 1
 73 |         hyperparams[3], hyperparams[4] = options[output[3]], options[output[4]]
 74 |         hyperparams[5] = options[output[5]]  # Layer 2
 75 |         hyperparams[6], hyperparams[7] = options[output[6]], options[output[7]]
 76 |         hyperparams[8] = options[output[8]] # Layer 3
 77 |         hyperparams[9] = options[output[9]] # FNN Layer
 78 |         return hyperparams
 79 | 
 80 |     def REINFORCE(self, prob):
 81 |         loss = tf.reduce_mean(tf.log(prob)) # Might have to take the negative
 82 |         return loss
 83 | 
 84 |     def entropyloss(self,prob):
 85 |         # log(log(p)) is not finite for p in (0, 1], so compute p * log(p) directly on clipped probabilities
 86 |         log_prob = tf.log(tf.clip_by_value(prob, 1e-10, 1.0))
 87 |         entropy = -tf.reduce_mean(prob * log_prob, axis=1)
 88 |         return entropy
 89 | 
 90 |     def Lclip(self,val_accuracy,a_t):
 91 |         e = 0.2
 92 |         return tf.reduce_mean(tf.minimum(val_accuracy*a_t,tf.clip_by_value(val_accuracy,1-e,1+e)*a_t))
 93 | 
 94 |     def Lvf(self,delta):
 95 |         return tf.reduce_mean(tf.square(delta))
 96 | 
 97 |     def train_controller(self, reinforce_loss, val_accuracy):
 98 |         #Adam was used to train the RNN controller Bello et al 2017
 99 |         learning_rate = 1e-5 #As per Bello et al 2017
100 |         optimizer = tf.train.AdamOptimizer(learning_rate)
101 |         var_list = [self.Wc, self.bc]
102 |         gradients = optimizer.compute_gradients(loss=reinforce_loss, var_list=var_list)
103 |         for i, (grad, var) in enumerate(gradients):
104 |             if grad is not None:
105 |                 gradients[i] = (grad * val_accuracy, var)
106 |         return optimizer.apply_gradients(gradients)
107 | 
108 |     def update(self, reinforce_loss):
109 |         #Adam was used to train the RNN controller Bello et al 2017
110 |         learning_rate = 1e-5 #As per Bello et al 2017
111 |         optimizer = tf.train.AdamOptimizer(learning_rate)
112 |         var_list = [self.Wc, self.bc]
113 |         gradients = optimizer.compute_gradients(loss=reinforce_loss, var_list=var_list)
114 |         return optimizer.apply_gradients(gradients)
115 | 
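
The two loss terms used by the controller above (`entropyloss` and `Lclip`) can be checked by hand with a small framework-free sketch. This is only an illustration in plain Python, not the repository's TensorFlow code: the function names `categorical_entropy` and `clipped_surrogate` are hypothetical, and the entropy here sums over the categories (the usual definition) rather than averaging.

```python
import math

def categorical_entropy(probs, eps=1e-10):
    """Entropy H(p) = -sum_i p_i * log(p_i) of a categorical distribution.

    Probabilities are clipped to eps inside the log so that p_i = 0
    contributes 0 instead of producing -inf or NaN.
    """
    return -sum(p * math.log(max(p, eps)) for p in probs)

def clipped_surrogate(ratio, advantage, e=0.2):
    """Clipped surrogate term: min(r * A, clip(r, 1 - e, 1 + e) * A).

    Mirrors the shape of Lclip above for a single sample; `ratio` plays
    the role of the first argument and `advantage` the role of a_t.
    """
    clipped_ratio = min(max(ratio, 1.0 - e), 1.0 + e)
    return min(ratio * advantage, clipped_ratio * advantage)

if __name__ == "__main__":
    print(categorical_entropy([0.25, 0.25, 0.25, 0.25]))  # uniform over 4 options
    print(clipped_surrogate(1.5, 1.0))                    # ratio above 1 + e is clipped
```

A uniform distribution over the four options gives the maximum entropy, log(4), while a one-hot distribution gives zero, which is why an entropy bonus pushes the controller towards exploration.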


--------------------------------------------------------------------------------
/src/swish.py:
--------------------------------------------------------------------------------
  1 | # Module 7: Convolutional Neural Network (CNN)
  2 | # CNN model with dropout for MNIST dataset
  3 | 
  4 | # CNN structure:
  5 | # · · · · · · · · · ·      input data                                               X  [batch, 28, 28, 1]
  6 | # @ @ @ @ @ @ @ @ @ @   -- conv. layer 5x5x1x4  stride 1                            W1 [5, 5, 1, 4]
  7 | # ∶∶∶∶∶∶∶∶∶∶∶∶∶∶∶∶∶∶∶                                                               Y1 [batch, 28, 28, 4]
  8 | #   @ @ @ @ @ @ @ @     -- conv. layer 5x5x4x8  with max pooling stride 2           W2 [5, 5, 4, 8]
  9 | #   ∶∶∶∶∶∶∶∶∶∶∶∶∶∶∶                                                                 Y2 [batch, 14, 14, 8]
 10 | #     @ @ @ @ @ @       -- conv. layer 4x4x8x12 with max pooling stride 2           W3 [4, 4, 8, 12]
 11 | #     ∶∶∶∶∶∶∶∶∶∶∶                                                                   Y3 [batch, 7, 7, 12]
 12 | #      \x/x\x\x/        -- fully connected layer (relu)                             W4 [7*7*12, 200]
 13 | #       · · · ·                                                                     Y4 [batch, 200]
 14 | #       \x/x\x/         -- fully connected layer (softmax)                          W5 [200, 10]
 15 | #        · · ·                                                                      Y [batch, 10]
 16 | 
 17 | import os
 18 | os.environ['TF_CPP_MIN_LOG_LEVEL']='2'
 19 | 
 20 | # Hyper Parameters
 21 | learning_rate = 0.01
 22 | training_epochs = 2
 23 | batch_size = 100
 24 | 
 25 | import tensorflow as tf
 26 | 
 27 | from tensorflow.examples.tutorials.mnist import input_data
 28 | mnist = input_data.read_data_sets("mnist", one_hot=True,reshape=False,validation_size=0)
 29 | logdir = '/users/mingliangang/Desktop/grad_cnn'
 30 | # Step 1: Initial Setup
 31 | X = tf.placeholder(tf.float32, [None, 28, 28, 1])
 32 | y = tf.placeholder(tf.float32, [None, 10])
 33 | pkeep = tf.placeholder(tf.float32)
 34 | 
 35 | L1 = 4 # first convolutional filters
 36 | L2 = 8 # second convolutional filters
 37 | L3 = 12 # third convolutional filters
 38 | L4 = 200 # fully connected neurons
 39 | 
 40 | W1 = tf.Variable(tf.truncated_normal([5,5,1,L1], stddev=0.08))
 41 | B1 = tf.Variable(tf.zeros([L1]))
 42 | beta1 =  tf.Variable(tf.truncated_normal([1], stddev=0.08))
 43 | W2 = tf.Variable(tf.truncated_normal([5,5,L1,L2], stddev=0.08))
 44 | B2 = tf.Variable(tf.zeros([L2]))
 45 | beta2 =  tf.Variable(tf.truncated_normal([1], stddev=0.08))
 46 | W3 = tf.Variable(tf.truncated_normal([4,4,L2,L3], stddev=0.08))
 47 | B3 = tf.Variable(tf.zeros([L3]))
 48 | beta3 =  tf.Variable(tf.truncated_normal([1], stddev=0.08))
 49 | W4 = tf.Variable(tf.truncated_normal([7*7*L3,L4], stddev=0.08))
 50 | B4 = tf.Variable(tf.zeros([L4]))
 51 | W5 = tf.Variable(tf.truncated_normal([L4, 10], stddev=0.08))
 52 | B5 = tf.Variable(tf.zeros([10]))
 53 | #tf.summary.scalar('W1',tf.reduce_mean(W1))
 54 | 
 55 | # Step 2: Setup Model
 56 | x1 = tf.nn.conv2d(X, W1, strides=[1,1,1,1], padding='SAME') + B1
 57 | Y1 = x1*tf.nn.sigmoid(beta1*x1)# output is 28x28
 58 | x2 = tf.nn.conv2d(Y1, W2, strides=[1,1,1,1], padding='SAME') + B2
 59 | Y2 = x2*tf.nn.sigmoid(beta2*x2)
 60 | Y2 = tf.nn.max_pool(Y2, ksize=[1,2,2,1], strides=[1,2,2,1], padding='SAME') # output is 14x14
 61 | Y2= tf.nn.dropout(Y2, pkeep)
 62 | x3 = tf.nn.conv2d(Y2, W3, strides=[1,1,1,1], padding='SAME') + B3
 63 | Y3 = x3*tf.nn.sigmoid(beta3*x3)
 64 | Y3 = tf.nn.max_pool(Y3, ksize=[1,2,2,1], strides=[1,2,2,1], padding='SAME') # output is 7x7
 65 | Y3= tf.nn.dropout(Y3, pkeep)
 66 | 
 67 | # Flatten the third convolution for the fully connected layer
 68 | YY = tf.reshape(Y3, shape=[-1, 7 * 7 * L3])
 69 | 
 70 | Y4 = tf.nn.relu(tf.matmul(YY, W4) + B4)
 71 | #YY4 = tf.nn.dropout(Y4, 0.3)
 72 | Ylogits = tf.matmul(Y4, W5) + B5
 73 | yhat = tf.nn.softmax(Ylogits)
 74 | 
 75 | # Step 3: Loss Functions
 76 | loss = tf.reduce_mean(
 77 |     tf.nn.softmax_cross_entropy_with_logits(logits=Ylogits, labels=y))
 78 | tf.summary.scalar('loss',loss)
 79 | # Step 4: Optimizer
 80 | #optimizer = tf.train.RMSPropOptimizer(learning_rate)
 81 | optimizer = tf.train.AdamOptimizer(learning_rate)
 82 | #optimizer = tf.train.AdamOptimizer()
 83 | grad = optimizer.compute_gradients(loss)
 84 | tf.summary.scalar('beta1',tf.reduce_mean(beta1))
 85 | tf.summary.scalar('grad',tf.reduce_mean(grad[0][0]))
 86 | tf.summary.scalar('W1',tf.reduce_mean(grad[0][1]))
 87 | tf.summary.histogram('grad',grad[0][0])  # histogram of the full tensor, not of its mean
 88 | tf.summary.histogram('W1',grad[0][1])
 89 | tf.summary.histogram('beta1',beta1)
 90 | 
 91 | train = optimizer.minimize(loss)
 92 | 
 93 | # accuracy of the trained model, between 0 (worst) and 1 (best)
 94 | is_correct = tf.equal(tf.argmax(y,1),tf.argmax(yhat,1))
 95 | accuracy = tf.reduce_mean(tf.cast(is_correct,tf.float32))
 96 | 
 97 | init = tf.global_variables_initializer()
 98 | sess = tf.Session()
 99 | merged = tf.summary.merge_all()
100 | writer = tf.summary.FileWriter(logdir+ '/train',
101 |                                       sess.graph)
102 | sess.run(init)
103 | 
104 | # Step 5: Training Loop
105 | for epoch in range(training_epochs):
106 |     num_batches = int(mnist.train.num_examples / batch_size)
107 |     for i in range(num_batches):
108 |         batch_X, batch_y = mnist.train.next_batch(batch_size)
109 |         train_data = {X: batch_X, y: batch_y, pkeep: 0.5}
110 |         summary,_ = sess.run([merged,train], feed_dict=train_data)
111 |         writer.add_summary(summary,epoch*num_batches+i+1)
112 |         print(epoch * num_batches + i + 1, "Training accuracy =", sess.run(accuracy, feed_dict=train_data),
113 |           "Loss =", sess.run(loss, feed_dict=train_data))
114 | 
115 | # Step 6: Evaluation
116 | test_data = {X:mnist.test.images,y:mnist.test.labels, pkeep: 1.0}
117 | print("Testing Accuracy = ", sess.run(accuracy, feed_dict = test_data))
118 | 
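
The activation applied after each convolution above is Swish with a trainable `beta`, i.e. `x * sigmoid(beta * x)`. A minimal scalar sketch in plain Python, assuming nothing beyond the standard library, makes the formula and its derivative easy to inspect; the helper names `sigmoid`, `swish` and `swish_grad` are illustrative and do not appear in the repository.

```python
import math

def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def swish(x, beta=1.0):
    """Swish activation: x * sigmoid(beta * x)."""
    return x * sigmoid(beta * x)

def swish_grad(x, beta=1.0):
    """Derivative of swish with respect to x.

    d/dx [x * s(beta * x)] = s + beta * x * s * (1 - s), with s = sigmoid(beta * x).
    """
    s = sigmoid(beta * x)
    return s + beta * x * s * (1.0 - s)

if __name__ == "__main__":
    for x in (-2.0, 0.0, 2.0):
        print(x, swish(x), swish_grad(x))
```

With `beta = 0` the function reduces to the linear map `x / 2`, and for large positive `beta * x` it approaches the identity, so a learned `beta` lets each layer interpolate between a linear unit and a ReLU-like one.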


--------------------------------------------------------------------------------
/src/train/events.out.tfevents.1514935452.6bf252a4b161:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Neoanarika/Searching-for-activation-functions/1615f3fa423ab6a7977aad9d3ebf4a0ace2fab63/src/train/events.out.tfevents.1514935452.6bf252a4b161


--------------------------------------------------------------------------------
/src/utils.py:
--------------------------------------------------------------------------------
 1 | import os
 2 | import shutil
 3 | import numpy as np
 4 | import tensorflow as tf
 5 | 
 6 | def path_exists(path, overwrite=False):
 7 |     if not os.path.isdir(path):
 8 |         os.mkdir(path)
 9 |     elif overwrite:
10 |         shutil.rmtree(path)
11 |     return path
12 | 
13 | def remove_dir(path):
14 |     os.rmdir(path)
15 |     return True
16 | 
17 | def relu_init(shape, dtype=tf.float32, partition_info=None):
18 |     init_range = np.sqrt(2.0 / shape[1])
19 |     return tf.random_normal(shape, dtype=dtype) * init_range
20 | 
21 | def ones(shape, dtype=tf.float32):
22 |     return tf.ones(shape, dtype=dtype)
23 | 
24 | def zeros(shape, dtype=tf.float32):
25 |     return tf.zeros(shape, dtype=dtype)
26 | 
27 | def tanh_init(shape, dtype=tf.float32, partition_info=None):
28 |     init_range = np.sqrt(6.0 / (shape[0] + shape[1]))
29 |     return tf.random_uniform(shape, minval=-init_range, maxval=init_range, dtype=dtype)
30 | 
31 | def leaky_relu(X, alpha=0.01):
32 |     return tf.maximum(X, alpha * X)
33 | 
34 | def max(input):
35 |     return tf.argmax(input)
36 | 
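
The scale factors computed by `relu_init` and `tanh_init` above can be reproduced without TensorFlow. The sketch below is a plain-Python illustration under that assumption; `he_std`, `glorot_limit` and `sample_uniform_matrix` are hypothetical helper names, and the choice of which dimension counts as the fan is left to the caller, as it is in the source.

```python
import math
import random

def he_std(fan):
    """Standard deviation sqrt(2 / fan), the scale used by relu_init."""
    return math.sqrt(2.0 / fan)

def glorot_limit(fan_in, fan_out):
    """Uniform bound sqrt(6 / (fan_in + fan_out)), the range used by tanh_init."""
    return math.sqrt(6.0 / (fan_in + fan_out))

def sample_uniform_matrix(rows, cols, rng):
    """Draw a rows x cols matrix uniformly from [-limit, limit]."""
    limit = glorot_limit(rows, cols)
    return [[rng.uniform(-limit, limit) for _ in range(cols)] for _ in range(rows)]

if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so repeated runs draw the same matrix
    weights = sample_uniform_matrix(2, 4, rng)
    print(he_std(8), glorot_limit(2, 4), len(weights), len(weights[0]))
```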


--------------------------------------------------------------------------------