├── .gitignore ├── 1_Neural_Network_Tutorial_Visualizations.ipynb ├── 2_Neural_Network_Tutorial_Matrix_Representations.ipynb ├── 3_Neural_Network_Tutorial_Writing_NN_ForwardProp_In_Python.ipynb ├── 4_Neural_Network_Tutorial_Backpropagation.ipynb ├── 5_Neural_Network_Tutorial_Training_And_Testing.ipynb ├── 6_Neural_Network_Tutorial_Descent_Experimenting_with_Optimizers.ipynb ├── MNIST experiments.ipynb ├── README.md ├── images ├── Title_ANN.png ├── digitsNN.png └── optimizers.gif ├── myPyNN.py └── myPyNNTest.py /.gitignore: -------------------------------------------------------------------------------- 1 | # Byte-compiled / optimized / DLL files 2 | __pycache__/ 3 | *.py[cod] 4 | *$py.class 5 | 6 | # C extensions 7 | *.so 8 | 9 | # Distribution / packaging 10 | .Python 11 | env/ 12 | build/ 13 | develop-eggs/ 14 | dist/ 15 | downloads/ 16 | eggs/ 17 | .eggs/ 18 | lib/ 19 | lib64/ 20 | parts/ 21 | sdist/ 22 | var/ 23 | *.egg-info/ 24 | .installed.cfg 25 | *.egg 26 | 27 | # PyInstaller 28 | # Usually these files are written by a python script from a template 29 | # before PyInstaller builds the exe, so as to inject date/other infos into it. 30 | *.manifest 31 | *.spec 32 | 33 | # Installer logs 34 | pip-log.txt 35 | pip-delete-this-directory.txt 36 | 37 | # Unit test / coverage reports 38 | htmlcov/ 39 | .tox/ 40 | .coverage 41 | .coverage.* 42 | .cache 43 | nosetests.xml 44 | coverage.xml 45 | *,cover 46 | .hypothesis/ 47 | 48 | # Translations 49 | *.mo 50 | *.pot 51 | 52 | # Django stuff: 53 | *.log 54 | local_settings.py 55 | 56 | # Flask stuff: 57 | instance/ 58 | .webassets-cache 59 | 60 | # Scrapy stuff: 61 | .scrapy 62 | 63 | # Sphinx documentation 64 | docs/_build/ 65 | 66 | # PyBuilder 67 | target/ 68 | 69 | # IPython Notebook 70 | .ipynb_checkpoints 71 | 72 | # pyenv 73 | .python-version 74 | 75 | # celery beat schedule file 76 | celerybeat-schedule 77 | 78 | # dotenv 79 | .env 80 | 81 | # virtualenv 82 | venv/ 83 | ENV/ 84 | 85 | # Spyder project settings 86 | .spyderproject 87 | 88 | # Rope project settings 89 | .ropeproject 90 | 91 | # OS X 92 | .DS_Store 93 | -------------------------------------------------------------------------------- /2_Neural_Network_Tutorial_Matrix_Representations.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Matrix representations\n", 8 | "\n", 9 | "## Matrix representations - input to the network\n", 10 | "\n", 11 | "Suppose an input has $d_{i}$ dimensions. (Remember that the input has been normalized to range between 0 and 1.)\n", 12 | "\n", 13 | "Then each input would be:\n", 14 | "\n", 15 | "$$X \\; (without bias) _{1{\\times}d_{i}} = \\left[ \\begin{array}{c} x_{0} & x_{1} & \\cdots & x_{(d_{i}-1)} \\end{array} \\right] _{1{\\times}d_{i}}$$\n", 16 | "\n", 17 | "After adding the bias term,\n", 18 | "\n", 19 | "$$X_{1{\\times}(d_{i}+1)} = \\left[ \\begin{array}{c} 1 & X_{1{\\times}d_{i}} \\end{array} \\right] _{1{\\times}(d_{i}+1)}$$\n", 20 | "\n", 21 | "For example, one of the data points given above to make a logic gate was $(0,1)$. Here, $X = \\left[ \\begin{array}{c} 1 & 0 & 1 \\end{array} \\right]_{1{\\times}(2+1)}$" 22 | ] 23 | }, 24 | { 25 | "cell_type": "markdown", 26 | "metadata": {}, 27 | "source": [ 28 | "Suppose we provide $n$ $d_{i}$-dimensional data points. 
For the first layer of neurons, we can make an input matrix of $n{\\times}d_{i}$ dimension.\n", 29 | "\n", 30 | "$$X^{(1)}_{n{\\times}(d_{i}+1)} = \n", 31 | "\\left[ \\begin{array}{c} 1 & _{(0)}X \\\\ 1 & _{(1)}X \\\\ \\vdots & \\vdots \\\\ 1 & _{(n-1)}X \\end{array} \\right] _{n{\\times}(d_{i}+1)}\n", 32 | "=\n", 33 | "\\left[ \\begin{array}{c} \n", 34 | "1 & _{(0)}x_{0} & _{(0)}x_{1} & _{(0)}x_{2} & \\cdots & _{(0)}x_{(d_{i}-1)} \\\\ \n", 35 | "1 & _{(1)}x_{0} & _{(1)}x_{1} & _{(1)}x_{2} & \\cdots & _{(1)}x_{(d_{i}-1)} \\\\ \n", 36 | "\\vdots & \\vdots & \\vdots & \\vdots & \\ddots & \\vdots \\\\ \n", 37 | "1 & _{(n-1)}x_{0} & _{(n-1)}x_{1} & _{(n-1)}x_{2} & \\cdots & _{(n-1)}x_{(d_{i}-1)} \n", 38 | "\\end{array} \\right] _{n{\\times}(d_{i}+1)}$$\n", 39 | "\n", 40 | "For example, for logic gates, the input matrix was $X = \\left[ \\begin{array}{c} 1 & 0 & 0 \\\\ 1 & 0 & 1 \\\\ 1 & 1 & 0 \\\\ 1 & 1 & 1 \\end{array} \\right] _{4{\\times}3} $" 41 | ] 42 | }, 43 | { 44 | "cell_type": "markdown", 45 | "metadata": {}, 46 | "source": [ 47 | "## Matrix representations - output of a layer\n", 48 | "\n", 49 | "Suppose the output of the $l^{th}$ layer has $o_{l}$ dimensions, meaning there are $o_{l}$ neurons in the layer.\n", 50 | "\n", 51 | "In the above example, the output of the 1st Layer of 2 neurons is $o_{1} = 2$, and the output of the 2nd layer of 1 neuron is $o_{2} = 1$.\n", 52 | "\n", 53 | "For each input, the output is an $o_{l}$-dimensional vector:\n", 54 | "\n", 55 | "$$Y^{(l)} = \\left[ \\begin{array}{c} y_{[0]}^{(l)} & y_{[1]}^{(l)} & \\cdots & y_{[o_{l}-1]}^{(l)} \\end{array} \\right] _{1{\\times}o_{l}}$$\n", 56 | "\n", 57 | "\n", 58 | "For example, for an AND gate, the output of $(0,1)$ is $Y = \\left[ \\begin{array}{c} 0 \\end{array} \\right] _{1{\\times}1}$" 59 | ] 60 | }, 61 | { 62 | "cell_type": "markdown", 63 | "metadata": {}, 64 | "source": [ 65 | "Thus, for $n$ data points, the output is:\n", 66 | "\n", 67 | "$$Y^{(l)} = \\left[ \\begin{array}{c} \n", 68 | "{_{(0)}}Y^{(l)} \\\\ {_{(1)}}Y^{(l)} \\\\ \\vdots \\\\ _{(n-1)}Y^{(l)} \\end{array} \\right] _{n{\\times}o_{l}} \n", 69 | "= \\left[ \\begin{array}{c} \n", 70 | "{_{(0)}}y_{[0]}^{(l)} & \\cdots & {_{(0)}}y_{[o_{l}-1]}^{(l)} \\\\ \n", 71 | "{_{(1)}}y_{[0]}^{(l)} & \\cdots & {_{(1)}}y_{[o_{l}-1]}^{(l)} \\\\ \n", 72 | "\\vdots & \\ddots & \\vdots \\\\ \n", 73 | "_{(n-1)}y_{[0]}^{(l)} & \\cdots & _{(n-1)}y_{[o_{l}-1]}^{(l)} \n", 74 | "\\end{array} \\right] _{n{\\times}o_{l}}$$\n", 75 | "\n", 76 | "For example, for an AND gate, the output matrix is $Y = \\left[ \\begin{array}{c} 0 \\\\ 0 \\\\ 0 \\\\ 1 \\end{array} \\right] _{4{\\times}1}$" 77 | ] 78 | }, 79 | { 80 | "cell_type": "markdown", 81 | "metadata": {}, 82 | "source": [ 83 | "## Matrix representations - input to a layer\n", 84 | "\n", 85 | "Suppose at the $l^{th}$ layer, the input has $i_{l}$ dimensions.\n", 86 | "\n", 87 | "(The number of inputs to the layer) = (1 bias term) + (the number of outputs from the previous layer):\n", 88 | "$$i_{l} = 1 + o_{(l-1)}$$\n", 89 | "\n", 90 | "In the above example, the input to the first layer of 2 neurons has $i_{1} = d_{i}+1 = 3$, and the second layer of 1 neuron has $i_{2} = o_{1} + 1 = 3$." 
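As a quick, illustrative sketch of how that bias column is prepended in practice (this mirrors the `addBiasTerms` helper defined in a later notebook of this repo; the input here can be the raw data $X$ or the previous layer's outputs):

```python
import numpy as np

# Raw logic-gate inputs: n x d_i = 4 x 2
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])

# Prepend a column of 1s (the bias terms): n x (d_i + 1) = 4 x 3
X_biased = np.insert(X, 0, 1, axis=1)
print(X_biased)
# [[1 0 0]
#  [1 0 1]
#  [1 1 0]
#  [1 1 1]]
```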
91 | ] 92 | }, 93 | { 94 | "cell_type": "markdown", 95 | "metadata": {}, 96 | "source": [ 97 | "If there are $n$ data points given, the input to the $l^{th}$ layer would be an $n{\\times}i_{l} = n{\\times}(o_{(l-1)}+1)$ matrix:\n", 98 | "\n", 99 | "$$X^{(l)}_{n{\\times}i_{l}} \n", 100 | "= \\left[ \\begin{array}{c} \n", 101 | "1 & _{(0)}Y^{(l-1)} \\\\ \n", 102 | "1 & _{(1)}Y^{(l-1)} \\\\ \n", 103 | "\\vdots & \\vdots \\\\ \n", 104 | "1 & _{(n-1)}Y^{(l-1)} \n", 105 | "\\end{array} \\right] _{n{\\times}i_{l}}\n", 106 | "= \\left[ \\begin{array}{c} \n", 107 | "1 & _{(0)}y^{(l-1)}_{[0]} & \\cdots & _{(0)}y^{(l-1)}_{[o_{l-1}-1]} \\\\ \n", 108 | "1 & _{(1)}y^{(l-1)}_{[0]} & \\cdots & _{(1)}y^{(l-1)}_{[o_{l-1}-1]} \\\\ \n", 109 | "\\vdots & \\vdots & \\ddots & \\vdots \\\\ \n", 110 | "1 & _{(n-1)}y^{(l-1)}_{[0]} & \\cdots & _{(n-1)}y^{(l-1)}_{[o_{l-1}-1]} \n", 111 | "\\end{array} \\right] _{n{\\times}i_{l}}$$\n", 112 | "\n", 113 | "For example, in the 3-neurons neural network above, input matrix to the first layer is $\\left[ \\begin{array}{c} 1 & x_0 & x_1 \\end{array} \\right] _{1{\\times}3}$, and the input matrix to the second layer is $\\left[ \\begin{array}{c} 1 & y_0 & y_1 \\end{array} \\right] _{1{\\times}3}$" 114 | ] 115 | }, 116 | { 117 | "cell_type": "markdown", 118 | "metadata": {}, 119 | "source": [ 120 | "## Matrix representations - weight matrix of one neuron\n", 121 | "\n", 122 | "For a neuron, the weight matrix multiplies a weight with each input in every dimension, and sums them. This can be represented by a dot product.\n", 123 | "\n", 124 | "Assuming the input to the $k^{th}$ neuron in the $l^{th}$ layer has $i_{l}$ dimensions,\n", 125 | "\n", 126 | "$$W^{(l)}_{[k]} {_{1{\\times}i_{l}}} = \\left[ \\begin{array}{c} w^{(l)}_{[k],0} & w^{(l)}_{[k],1} & \\cdots & w^{(l)}_{[k],i_{l}-1} \\end{array} \\right] _{1{\\times}i_{l}}$$\n", 127 | "\n", 128 | "(Remember $i_{l} = 1 + o_{(l-1)}$)" 129 | ] 130 | }, 131 | { 132 | "cell_type": "markdown", 133 | "metadata": {}, 134 | "source": [ 135 | "Then the output of that neuron for one data point is x < dot product \\> weights.\n", 136 | "\n", 137 | "$$y^{(l)}_{[k]} {_{1{\\times}1}} = Sigmoid( x^{(l)} {_{1{\\times}i_{l}}} \\; .* \\; W^{(l)}_{[k]}{^T}{_{i_{l}{\\times}1}} )$$\n", 138 | "\n", 139 | "$$\n", 140 | "=\n", 141 | "Sigmoid \\left(\n", 142 | "x^{(l)}_{[k]}\n", 143 | "\\left[ \\begin{array}{c} 1 & y^{(l-1)}_{0} & \\cdots & y^{(l-1)}_{(o_{l-1}-1)}\n", 144 | "\\end{array} \\right] _{1{\\times}i_{l}}\n", 145 | "\\;\\;\\; .* \\;\\;\\;\n", 146 | "W^{(l)}_{[k]} {^{T}}\n", 147 | "\\left[ \\begin{array}{c} w^{(l)}_{[k],0} \\\\ w^{(l)}_{[k],1} \\\\ \\vdots \\\\ w^{(l)}_{[k],i_{l}-1} \\end{array} \\right] _{i_{l}{\\times}1}\n", 148 | "\\right)\n", 149 | "$$\n", 150 | "\n", 151 | "$$\n", 152 | "= Sigmoid(1*w^{(l)}_{[k],0} \\;\\;+\\;\\; y^{(l-1)}_{0}*w^{(l)}_{[k],1} \\;\\;+\\;\\; ... 
\\;\\;+\\;\\; y^{(l-1)}_{o_{l-1}-1}*w^{(l)}_{[k],i_{l}-1})\n", 153 | "$$\n", 154 | "\n", 155 | "(We can see that the dot product of the $x$ and $W$ matrices does indeed give the output of the neuron)" 156 | ] 157 | }, 158 | { 159 | "cell_type": "markdown", 160 | "metadata": {}, 161 | "source": [ 162 | "For $n$ data points, the output of the $k^{th}$ neuron in the $l^{th}$ layer is:\n", 163 | "$$Y^{(l)}_{[k]} {_{n{\\times}1}}\n", 164 | "=\n", 165 | "Sigmoid \\left(\n", 166 | "X^{(l)}_{[k]}\n", 167 | "\\left[ \\begin{array}{c} \n", 168 | "1 & _{(0)}y^{(l-1)}_{0} & \\cdots & _{(0)}y^{(l-1)}_{(o_{l-1}-1)} \\\\\n", 169 | "1 & _{(1)}y^{(l-1)}_{0} & \\cdots & _{(1)}y^{(l-1)}_{(o_{l-1}-1)} \\\\\n", 170 | "\\vdots & \\vdots & \\ddots & \\vdots \\\\\n", 171 | "1 & _{(n-1)}y^{(l-1)}_{0} & \\cdots & _{(n-1)}y^{(l-1)}_{(o_{l-1}-1)}\n", 172 | "\\end{array} \\right] _{n{\\times}i_{l}}\n", 173 | "\\; .* \\;\n", 174 | "W^{(l)}_{[k]} {^{T}}\n", 175 | "\\left[ \\begin{array}{c} w^{(l)}_{[k],0} \\\\ w^{(l)}_{[k],1} \\\\ \\vdots \\\\ w^{(l)}_{[k],i_{l}-1} \\end{array} \\right] _{i_{l}{\\times}1}\n", 176 | "\\right)\n", 177 | "$$\n", 178 | "\n", 179 | "$$\n", 180 | "=\n", 181 | "Sigmoid \\left(\n", 182 | "\\left[ \\begin{array}{c} \n", 183 | "1*w^{(l)}_{[k],0} \\;\\;+\\;\\; _{(0)}y^{(l-1)}_{(0)}*w^{(l)}_{[k],1} \\;\\;+\\;\\; ... \\;\\;+\\;\\; _{(0)}y^{(l-1)}_{o_{l-1}-1}*w^{(l)}_{[k],i_{l}-1} \\\\\n", 184 | "1*w^{(l)}_{[k],0} \\;\\;+\\;\\; _{(1)}y^{(l-1)}_{0}*w^{(l)}_{[k],1} \\;\\;+\\;\\; ... \\;\\;+\\;\\; _{(1)}y^{(l-1)}_{o_{l-1}-1}*w^{(l)}_{[k],i_{l}-1} \\\\\n", 185 | "\\vdots \\\\\n", 186 | "1*w^{(l)}_{[k],0} \\;\\;+\\;\\; _{(n-1)}y^{(l-1)}_{0}*w^{(l)}_{[k],1} \\;\\;+\\;\\; ... \\;\\;+\\;\\; _{(n-1)}y^{(l-1)}_{o_{l-1}-1}*w^{(l)}_{[k],i_{l}-1} \\\\\n", 187 | "\\end{array} \\right] _{n{\\times}1}\n", 188 | "\\right)\n", 189 | "$$" 190 | ] 191 | }, 192 | { 193 | "cell_type": "markdown", 194 | "metadata": {}, 195 | "source": [ 196 | "## Matrix representations - weight of a layer of neurons\n", 197 | "\n", 198 | "Suppose the $l^{th}$ layer in a neural network has $o_{l}$ neurons.\n", 199 | "\n", 200 | "Each neuron would produce one number as its output - the dot product of its weights, and the inputs.\n", 201 | "\n", 202 | "In matrix form, the weight matrix of the layer is:\n", 203 | "\n", 204 | "$$\n", 205 | "W^{(l)}_{o_{l}{\\times}i_{l}} = \\left[ \\begin{array}{c} W^{(l)}_{[0]} \\\\ W^{(l)}_{[1]} \\\\ \\cdots \\\\ W^{(l)}_{[o_{l}-1]} \\end{array} \\right] _{o_{l}{\\times}i_{l}} \n", 206 | "= \n", 207 | "\\left[ \\begin{array}{c} \n", 208 | "w^{(l)}_{[0],0} & w^{(l)}_{[0],1} & w^{(l)}_{[0],2} & \\cdots & w^{(l)}_{[0],i_{l}-1} \\\\ \n", 209 | "w^{(l)}_{[1],0} & w^{(l)}_{[1],1} & w^{(l)}_{[1],2} & \\cdots & w^{(l)}_{[1],i_{l}-1} \\\\ \n", 210 | "\\vdots & \\vdots & \\vdots & \\ddots & \\vdots \\\\ \n", 211 | "w^{(l)}_{[o_{l}-1],0} & w^{(l)}_{[o_{l}-1],1} & w^{(l)}_{[o_{l}-1],2} & \\cdots & w^{(l)}_{[o_{l}-1],i_{l}-1} \n", 212 | "\\end{array} \\right] _{o_{l}{\\times}i_{l}}\n", 213 | "$$" 214 | ] 215 | }, 216 | { 217 | "cell_type": "markdown", 218 | "metadata": {}, 219 | "source": [ 220 | "The output of this layer of neurons is:\n", 221 | "\n", 222 | "$$ Y^{(l)}_{n{\\times}o_{l}} = Sigmoid\\;(\\;X^{(l)}_{n{\\times}i_{l}} \\; .* \\; W^{(l)}{^{T}}_{i_{l}{\\times}o_{l}} \\;)\\; $$\n", 223 | "\n", 224 | "$$\n", 225 | "Y^{(l)}_{n{\\times}o_{l}} \\left[ \\begin{array}{c} \n", 226 | "{_{(0)}}y_{0}^{(l)} & \\cdots & {_{(0)}}y_{o_{l}-1}^{(l)} \\\\ \n", 227 | "{_{(1)}}y_{0}^{(l)} & \\cdots & {_{(1)}}y_{o_{l}-1}^{(l)} \\\\ 
\n", 228 | "\\vdots & \\ddots & \\vdots \\\\ \n", 229 | "_{(n-1)}y_{0}^{(l)} & \\cdots & _{(n-1)}y_{o_{l}-1}^{(l)} \n", 230 | "\\end{array} \\right] _{n{\\times}o_{l}}\n", 231 | "=\n", 232 | "Sigmoid \\left(\n", 233 | "X^{(l)}_{n{\\times}i_{l}} \\left[ \\begin{array}{c} \n", 234 | "1 & _{(0)}y^{(l-1)}_{0} & \\cdots & _{(0)}y^{(l-1)}_{(o_{l-1}-1)} \\\\ \n", 235 | "1 & _{(1)}y^{(l-1)}_{0} & \\cdots & _{(1)}y^{(l-1)}_{(o_{l-1}-1)} \\\\ \n", 236 | "\\vdots & \\vdots & \\ddots & \\vdots \\\\ \n", 237 | "1 & _{(n-1)}y^{(l-1)}_{0} & \\cdots & _{(n-1)}y^{(l-1)}_{(o_{l-1}-1)} \n", 238 | "\\end{array} \\right] _{n{\\times}i_{l}}\n", 239 | "\\; .* \\;\n", 240 | "W^{(l)}{^{T}}_{i_{l}{\\times}o_{l}} \\left[ \\begin{array}{c} \n", 241 | "w^{(l)}_{[0],0} & w^{(l)}_{[1],1} & \\cdots & w^{(l)}_{[o_{l}-1],0} \\\\ \n", 242 | "w^{(l)}_{[0],1} & w^{(l)}_{[1],1} & \\cdots & w^{(l)}_{[o_{l}-1],1} \\\\ \n", 243 | "\\vdots & \\vdots & \\ddots & \\vdots \\\\ \n", 244 | "w^{(l)}_{[0],i_{l}-1} & w^{(l)}_{[1],1} & \\cdots & w^{(l)}_{[o_{l}-1],i_{l}-1} \n", 245 | "\\end{array} \\right] _{i_{l}{\\times}o_{l}}\n", 246 | "\\right)\n", 247 | "$$\n", 248 | "\n", 249 | "$$\n", 250 | "=\n", 251 | "Sigmoid \\left(\n", 252 | "\\left[ \\begin{array}{c} \n", 253 | "1*w^{(l)}_{[0],0} + \\cdots + _{(0)}y^{(l-1)}_{(i_{l-1}-1)}*w^{(l)}_{[0],i_{l-1}-1}\n", 254 | "&\n", 255 | "\\cdots\n", 256 | "&\n", 257 | "1*w^{(l)}_{[(o_{l}-1)],0} + \\cdots + _{(0)}y^{(l-1)}_{(i_{l-1}-1)}*w^{(l)}_{[(o_{l}-1)],i_{l-1}-1}\n", 258 | "\\\\\n", 259 | "\\vdots & \\ddots & \\vdots\n", 260 | "\\\\\n", 261 | "1*w^{(l)}_{[0],0} + \\cdots + _{(n-1)}y^{(l-1)}_{(i_{l-1}-1)}*w^{(l)}_{[0],i_{l-1}-1}\n", 262 | "&\n", 263 | "\\cdots\n", 264 | "&\n", 265 | "1*w^{(l)}_{[(o_{l}-1)],0} + \\cdots + _{(n-1)}y^{(l-1)}_{(i_{l-1}-1)}*w^{(l)}_{[(o_{l}-1)],i_{l-1}-1}\n", 266 | "\\end{array} \\right] _{n{\\times}o_{l}}\n", 267 | "\\right)\n", 268 | "$$" 269 | ] 270 | }, 271 | { 272 | "cell_type": "markdown", 273 | "metadata": {}, 274 | "source": [ 275 | "## Conclusion\n", 276 | "\n", 277 | "We have seen that the action of a layer of a neural network can be written as the following matrix operation:\n", 278 | "\n", 279 | "$$ Y^{(l)}_{n{\\times}o_{l}} = Sigmoid\\;(\\;X^{(l)}_{n{\\times}i_{l}} \\; .* \\; W^{(l)}{^{T}}_{i_{l}{\\times}o_{l}} \\;)\\; $$\n", 280 | "\n", 281 | "So, a neural network can be defined as the set of weights $W^{(l)}_{i_{l}{\\times}o_{l}}$ for all its layers, where $l$ is the index of the layer we are considering, $i_{l}$ and $o_{l}$ are its input and output dimensions.\n", 282 | "\n", 283 | "Also, because of adding a bias term at every layer,\n", 284 | "\n", 285 | "$$i_{l} = 1 + o_{(l-1)}$$\n", 286 | "\n", 287 | "The utility of neural networks can be exploited only once the weight matrices $W^{(l)}_{i_{l}{\\times}o_{l}}$ for all $l$ have ben set according to need." 
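To make the conclusion concrete, here is a minimal NumPy sketch of the single-layer equation $Y^{(l)} = Sigmoid(X^{(l)} \;.*\; W^{(l)T})$ (illustrative only; the sigmoid and dot-product code matches what the next notebook implements, and the weight values are just random placeholders):

```python
import numpy as np

def sigmoid(a):
    return 1 / (1 + np.exp(-a))

# Biased input to the layer: n x i_l = 4 x 3 (logic-gate data with a bias column)
X = np.array([[1, 0, 0],
              [1, 0, 1],
              [1, 1, 0],
              [1, 1, 1]])

# Randomly initialized weights for a layer of 2 neurons: o_l x i_l = 2 x 3
W = np.random.randn(2, 3)

# Y^(l) = Sigmoid(X^(l) .* W^(l)^T): n x o_l = 4 x 2
Y = sigmoid(np.dot(X, W.T))
print(Y.shape)   # (4, 2)
```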
288 | ] 289 | } 290 | ], 291 | "metadata": { 292 | "kernelspec": { 293 | "display_name": "Python 3", 294 | "language": "python", 295 | "name": "python3" 296 | }, 297 | "language_info": { 298 | "codemirror_mode": { 299 | "name": "ipython", 300 | "version": 3 301 | }, 302 | "file_extension": ".py", 303 | "mimetype": "text/x-python", 304 | "name": "python", 305 | "nbconvert_exporter": "python", 306 | "pygments_lexer": "ipython3", 307 | "version": "3.5.1" 308 | } 309 | }, 310 | "nbformat": 4, 311 | "nbformat_minor": 2 312 | } 313 | -------------------------------------------------------------------------------- /3_Neural_Network_Tutorial_Writing_NN_ForwardProp_In_Python.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "code", 5 | "execution_count": 19, 6 | "metadata": { 7 | "collapsed": true 8 | }, 9 | "outputs": [], 10 | "source": [ 11 | "# Pre-requisites\n", 12 | "import numpy as np" 13 | ] 14 | }, 15 | { 16 | "cell_type": "markdown", 17 | "metadata": {}, 18 | "source": [ 19 | "# Writing a neural network in python\n", 20 | "\n", 21 | "Firstly, a neural network is defined by the number of layers, and the number of neurons in each layer.\n", 22 | "\n", 23 | "Let us use a list to denote this." 24 | ] 25 | }, 26 | { 27 | "cell_type": "markdown", 28 | "metadata": {}, 29 | "source": [ 30 | "## Defining layer sizes" 31 | ] 32 | }, 33 | { 34 | "cell_type": "code", 35 | "execution_count": 31, 36 | "metadata": {}, 37 | "outputs": [ 38 | { 39 | "name": "stdout", 40 | "output_type": "stream", 41 | "text": [ 42 | "jwifhiwfn\n" 43 | ] 44 | } 45 | ], 46 | "source": [ 47 | "# Defining the sizes of the layers in our neural network\n", 48 | "layers = [2, 2, 1]\n", 49 | "print(\"jwifhiwfn\")" 50 | ] 51 | }, 52 | { 53 | "cell_type": "markdown", 54 | "metadata": {}, 55 | "source": [ 56 | "The above code denotes the 3-neuron neural network we saw previously: 2-dimensional input, 2 neurons in a hidden layer, 1 neuron in the output layer.\n", 57 | "\n", 58 | "Generally speaking, a neural network than has more than 1 hidden layer is a **deep** neural network." 59 | ] 60 | }, 61 | { 62 | "cell_type": "markdown", 63 | "metadata": {}, 64 | "source": [ 65 | "## Defining weight matrices\n", 66 | "\n", 67 | "Using the sizes of the layers in our neural network, let us initialize the weight matrices to random values (sampled from a standard normal gaussian, because we know that we need both positive and negative weights)." 
68 | ] 69 | }, 70 | { 71 | "cell_type": "code", 72 | "execution_count": 3, 73 | "metadata": { 74 | "collapsed": true 75 | }, 76 | "outputs": [], 77 | "source": [ 78 | "# Initializing weight matrices from layer sizes\n", 79 | "def initializeWeights(layers):\n", 80 | " weights = [np.random.randn(o, i+1) for i, o in zip(layers[:-1], layers[1:])]\n", 81 | " return weights" 82 | ] 83 | }, 84 | { 85 | "cell_type": "code", 86 | "execution_count": 37, 87 | "metadata": {}, 88 | "outputs": [ 89 | { 90 | "name": "stdout", 91 | "output_type": "stream", 92 | "text": [ 93 | "1\n", 94 | "(2, 3)\n", 95 | "[[ 0.45147937 2.36764603 -0.44038386]\n", 96 | " [ 1.25899973 -1.06551598 0.20563357]]\n", 97 | "2\n", 98 | "(1, 3)\n", 99 | "[[-0.76261718 -0.90078965 -0.01774495]]\n" 100 | ] 101 | } 102 | ], 103 | "source": [ 104 | "# Displaying weight matrices\n", 105 | "layers = [2, 2, 1]\n", 106 | "weights = initializeWeights(layers)\n", 107 | "\n", 108 | "for i in range(len(weights)):\n", 109 | " print(i+1); print(weights[i].shape); print(weights[i])" 110 | ] 111 | }, 112 | { 113 | "cell_type": "markdown", 114 | "metadata": {}, 115 | "source": [ 116 | "# Forward Propagation\n", 117 | "\n", 118 | "The output of the neural network is calculated by **propagating forward** the outputs of each layer.\n", 119 | "\n", 120 | "Let us define our input as an np.array, since we want to represent matrices." 121 | ] 122 | }, 123 | { 124 | "cell_type": "code", 125 | "execution_count": 9, 126 | "metadata": { 127 | "collapsed": true 128 | }, 129 | "outputs": [], 130 | "source": [ 131 | "# We shall use np.array() to represent matrices\n", 132 | "#X = np.array([23, 42, 56])\n", 133 | "X = np.array([[0,0], [0,1], [1,0], [1,1]])" 134 | ] 135 | }, 136 | { 137 | "cell_type": "markdown", 138 | "metadata": {}, 139 | "source": [ 140 | "## Adding bias terms\n", 141 | "\n", 142 | "Since the input to every layer needs a bias term (1) added to it, let us define a function to do that." 
143 | ] 144 | }, 145 | { 146 | "cell_type": "code", 147 | "execution_count": 29, 148 | "metadata": { 149 | "collapsed": true 150 | }, 151 | "outputs": [], 152 | "source": [ 153 | "# Add a bias term to every data point in the input\n", 154 | "def addBiasTerms(X):\n", 155 | " # Make the input an np.array()\n", 156 | " X = np.array(X)\n", 157 | " \n", 158 | " # Forcing 1D vectors to be 2D matrices of 1xlength dimensions\n", 159 | " if X.ndim==1:\n", 160 | " X = np.reshape(X, (1, len(X)))\n", 161 | " \n", 162 | " # Inserting bias terms\n", 163 | " X = np.insert(X, 0, 1, axis=1)\n", 164 | " \n", 165 | " return X" 166 | ] 167 | }, 168 | { 169 | "cell_type": "markdown", 170 | "metadata": {}, 171 | "source": [ 172 | "Use the following cell to test the addBiasTerms function:" 173 | ] 174 | }, 175 | { 176 | "cell_type": "code", 177 | "execution_count": 30, 178 | "metadata": {}, 179 | "outputs": [ 180 | { 181 | "name": "stdout", 182 | "output_type": "stream", 183 | "text": [ 184 | "Before adding bias terms: \n", 185 | "[[0 0]\n", 186 | " [0 1]\n", 187 | " [1 0]\n", 188 | " [1 1]]\n", 189 | "After adding bias terms: \n", 190 | "[[1 0 0]\n", 191 | " [1 0 1]\n", 192 | " [1 1 0]\n", 193 | " [1 1 1]]\n" 194 | ] 195 | } 196 | ], 197 | "source": [ 198 | "# TESTING addBiasTerms\n", 199 | "\n", 200 | "# We shall use np.array() to represent matrices\n", 201 | "#X = np.array([23, 42, 56])\n", 202 | "X = np.array([[0,0], [0,1], [1,0], [1,1]])\n", 203 | "print(\"Before adding bias terms: \"); print(X)\n", 204 | "X = addBiasTerms(X)\n", 205 | "print(\"After adding bias terms: \"); print(X)" 206 | ] 207 | }, 208 | { 209 | "cell_type": "markdown", 210 | "metadata": {}, 211 | "source": [ 212 | "## Sigmoid function\n", 213 | "\n", 214 | "Let us also define a function to calculate the sigmoid of any np.array given to it:" 215 | ] 216 | }, 217 | { 218 | "cell_type": "code", 219 | "execution_count": 13, 220 | "metadata": { 221 | "collapsed": true 222 | }, 223 | "outputs": [], 224 | "source": [ 225 | "# Sigmoid function\n", 226 | "def sigmoid(a):\n", 227 | " return 1/(1 + np.exp(-a))" 228 | ] 229 | }, 230 | { 231 | "cell_type": "markdown", 232 | "metadata": {}, 233 | "source": [ 234 | "## Forward propagation of inputs\n", 235 | "\n", 236 | "Let us store the outputs of the layers in a list called \"outputs\". We shall use that the output of one layer as the input to the next layer." 
237 | ] 238 | }, 239 | { 240 | "cell_type": "code", 241 | "execution_count": 17, 242 | "metadata": { 243 | "collapsed": true 244 | }, 245 | "outputs": [], 246 | "source": [ 247 | "# Forward Propagation of outputs\n", 248 | "def forwardProp(X, weights):\n", 249 | " # Initializing an empty list of outputs\n", 250 | " outputs = []\n", 251 | " \n", 252 | " # Assigning a name to reuse as inputs\n", 253 | " inputs = X\n", 254 | " \n", 255 | " # For each layer\n", 256 | " for w in weights:\n", 257 | " # Add bias term to input\n", 258 | " inputs = addBiasTerms(inputs)\n", 259 | " \n", 260 | " # Y = Sigmoid ( X .* W^T )\n", 261 | " outputs.append(sigmoid(np.dot(inputs, w.T)))\n", 262 | " \n", 263 | " # Input of next layer is output of this layer\n", 264 | " inputs = outputs[-1]\n", 265 | " \n", 266 | " return outputs" 267 | ] 268 | }, 269 | { 270 | "cell_type": "markdown", 271 | "metadata": {}, 272 | "source": [ 273 | "Use the following cell to test forward propagation:" 274 | ] 275 | }, 276 | { 277 | "cell_type": "code", 278 | "execution_count": 24, 279 | "metadata": {}, 280 | "outputs": [ 281 | { 282 | "name": "stdout", 283 | "output_type": "stream", 284 | "text": [ 285 | "weights:\n", 286 | "1\n", 287 | "(2, 3)\n", 288 | "[[-250 350 350]\n", 289 | " [-250 200 200]]\n", 290 | "2\n", 291 | "(1, 3)\n", 292 | "[[-100 500 -500]]\n", 293 | "X:\n", 294 | "[[0, 0], [0, 1], [1, 0], [1, 1]]\n", 295 | "outputs:\n", 296 | "1\n", 297 | "(4, 2)\n", 298 | "[[ 2.66919022e-109 2.66919022e-109]\n", 299 | " [ 1.00000000e+000 1.92874985e-022]\n", 300 | " [ 1.00000000e+000 1.92874985e-022]\n", 301 | " [ 1.00000000e+000 1.00000000e+000]]\n", 302 | "2\n", 303 | "(4, 1)\n", 304 | "[[ 3.72007598e-44]\n", 305 | " [ 1.00000000e+00]\n", 306 | " [ 1.00000000e+00]\n", 307 | " [ 3.72007598e-44]]\n" 308 | ] 309 | } 310 | ], 311 | "source": [ 312 | "# VIEWING FORWARD PROPAGATION\n", 313 | "\n", 314 | "# Initialize network\n", 315 | "layers = [2, 2, 1]\n", 316 | "#weights = initializeWeights(layers)\n", 317 | "\n", 318 | "# 3-neuron network\n", 319 | "weights = []\n", 320 | "weights.append(np.array([[-250, 350, 350], [-250, 200, 200]]))\n", 321 | "weights.append(np.array([[-100, 500, -500]]))\n", 322 | "\n", 323 | "print(\"weights:\")\n", 324 | "for i in range(len(weights)):\n", 325 | " print(i+1); print(weights[i].shape); print(weights[i])\n", 326 | "\n", 327 | "# Input\n", 328 | "X = [[0,0], [0,1], [1,0], [1,1]]\n", 329 | "\n", 330 | "print(\"X:\"); print(X)\n", 331 | "\n", 332 | "# Forward propagate X, and save outputs\n", 333 | "outputs = forwardProp(X, weights)\n", 334 | "\n", 335 | "print(\"outputs:\")\n", 336 | "for o in range(len(outputs)):\n", 337 | " print(o+1); print(outputs[o].shape); print(outputs[o])" 338 | ] 339 | }, 340 | { 341 | "cell_type": "code", 342 | "execution_count": null, 343 | "metadata": { 344 | "collapsed": true 345 | }, 346 | "outputs": [], 347 | "source": [] 348 | } 349 | ], 350 | "metadata": { 351 | "kernelspec": { 352 | "display_name": "Python 3", 353 | "language": "python", 354 | "name": "python3" 355 | }, 356 | "language_info": { 357 | "codemirror_mode": { 358 | "name": "ipython", 359 | "version": 3 360 | }, 361 | "file_extension": ".py", 362 | "mimetype": "text/x-python", 363 | "name": "python", 364 | "nbconvert_exporter": "python", 365 | "pygments_lexer": "ipython3", 366 | "version": "3.5.1" 367 | } 368 | }, 369 | "nbformat": 4, 370 | "nbformat_minor": 2 371 | } 372 | -------------------------------------------------------------------------------- 
/4_Neural_Network_Tutorial_Backpropagation.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "code", 5 | "execution_count": 707, 6 | "metadata": { 7 | "collapsed": true 8 | }, 9 | "outputs": [], 10 | "source": [ 11 | "# Pre-requisites\n", 12 | "import numpy as np\n", 13 | "import time\n", 14 | "\n", 15 | "# To clear print buffer\n", 16 | "from IPython.display import clear_output" 17 | ] 18 | }, 19 | { 20 | "cell_type": "markdown", 21 | "metadata": {}, 22 | "source": [ 23 | "# Importing code from previous tutorial:" 24 | ] 25 | }, 26 | { 27 | "cell_type": "code", 28 | "execution_count": 30, 29 | "metadata": { 30 | "collapsed": true 31 | }, 32 | "outputs": [], 33 | "source": [ 34 | "# Initializing weight matrices from layer sizes\n", 35 | "def initializeWeights(layers):\n", 36 | " weights = [np.random.randn(o, i+1) for i, o in zip(layers[:-1], layers[1:])]\n", 37 | " return weights\n", 38 | "\n", 39 | "# Add a bias term to every data point in the input\n", 40 | "def addBiasTerms(X):\n", 41 | " # Make the input an np.array()\n", 42 | " X = np.array(X)\n", 43 | " \n", 44 | " # Forcing 1D vectors to be 2D matrices of 1xlength dimensions\n", 45 | " if X.ndim==1:\n", 46 | " X = np.reshape(X, (1, len(X)))\n", 47 | " \n", 48 | " # Inserting bias terms\n", 49 | " X = np.insert(X, 0, 1, axis=1)\n", 50 | " \n", 51 | " return X\n", 52 | "\n", 53 | "# Sigmoid function\n", 54 | "def sigmoid(a):\n", 55 | " return 1/(1 + np.exp(-a))\n", 56 | "\n", 57 | "# Forward Propagation of outputs\n", 58 | "def forwardProp(X, weights):\n", 59 | " # Initializing an empty list of outputs\n", 60 | " outputs = []\n", 61 | " \n", 62 | " # Assigning a name to reuse as inputs\n", 63 | " inputs = X\n", 64 | " \n", 65 | " # For each layer\n", 66 | " for w in weights:\n", 67 | " # Add bias term to input\n", 68 | " inputs = addBiasTerms(inputs)\n", 69 | " \n", 70 | " # Y = Sigmoid ( X .* W^T )\n", 71 | " outputs.append(sigmoid(np.dot(inputs, w.T)))\n", 72 | " \n", 73 | " # Input of next layer is output of this layer\n", 74 | " inputs = outputs[-1]\n", 75 | " \n", 76 | " return outputs" 77 | ] 78 | }, 79 | { 80 | "cell_type": "markdown", 81 | "metadata": {}, 82 | "source": [ 83 | "# Training Neural Networks\n", 84 | "\n", 85 | "$$ Y^{(l)}_{n{\\times}o_{l}} = Sigmoid\\;(\\;X^{(l)}_{n{\\times}i_{l}} \\; .* \\; W^{(l)}{^{T}}_{i_{l}{\\times}o_{l}}) \\;\\;\\;\\;\\;\\;-------------(1)$$\n", 86 | "\n", 87 | "Neural networks are advantageous when we are able to compute that $W$ which satisfies $Y = Sigmoid(X\\cdot*W)$, for given $X$ and $Y$ (in supervised training).\n", 88 | "\n", 89 | "But, since there are so many weights (for bigger networks), it is time-intensive to algebraically solve the above equation. 
(Something like $W = X^{-1} \\;.*\\; Sigmoid^{-1}(Y)$...)" 90 | ] 91 | }, 92 | { 93 | "cell_type": "markdown", 94 | "metadata": {}, 95 | "source": [ 96 | "## Set W to minimize cost (computationally intensive)\n", 97 | "\n", 98 | "A quicker way to compute W would be to randomly initialize it, and keep updating its value in such a way as to decrease the cost of the neural network.\n", 99 | "\n", 100 | "Define the cost as the mean squared error of the output of the neural network:\n", 101 | "\n", 102 | "$$error = yPred-Y$$\n", 103 | "\n", 104 | "Here, $yPred$ = ``forwardProp``$(X)$, and $Y$ is the desired output value from the neural network.\n", 105 | "\n", 106 | "$$Cost \\; J = \\frac{1}{2} \\sum \\limits_{n} \\frac{ {\\left( error \\right)}^2 }{n} = \\frac{1}{2} \\sum \\limits_{n} \\frac{ {\\left( yPred-Y \\right)}^2 }{n}$$\n", 107 | "\n", 108 | "Once we have initialized W, we need to change it such that J is minimized.\n", 109 | "\n", 110 | "The best way to minimize J w.r.t. W, is to partially derive J w.r.t. W and equate it to 0: $\\frac{{\\partial}J}{{\\partial}W} = 0$. But, this is computationally intensive." 111 | ] 112 | }, 113 | { 114 | "cell_type": "code", 115 | "execution_count": 433, 116 | "metadata": { 117 | "collapsed": true 118 | }, 119 | "outputs": [], 120 | "source": [ 121 | "# Compute COST (J) of Neural Network\n", 122 | "def nnCost(weights, X, Y):\n", 123 | " # Calculate yPred\n", 124 | " yPred = forwardProp(X, weights)[-1]\n", 125 | " \n", 126 | " # Compute J\n", 127 | " J = 0.5*np.sum((yPred-Y)**2)/len(Y)\n", 128 | " \n", 129 | " return J" 130 | ] 131 | }, 132 | { 133 | "cell_type": "code", 134 | "execution_count": 434, 135 | "metadata": { 136 | "collapsed": true 137 | }, 138 | "outputs": [], 139 | "source": [ 140 | "# Initialize network\n", 141 | "layers = [2, 2, 1]\n", 142 | "weights = initializeWeights(layers)" 143 | ] 144 | }, 145 | { 146 | "cell_type": "code", 147 | "execution_count": 435, 148 | "metadata": { 149 | "collapsed": true 150 | }, 151 | "outputs": [], 152 | "source": [ 153 | "# Declare input and desired output for AND gate\n", 154 | "X = np.array([[0,0], [0,1], [1,0], [1,1]])\n", 155 | "Y = np.array([[0], [0], [0], [1]])" 156 | ] 157 | }, 158 | { 159 | "cell_type": "code", 160 | "execution_count": 436, 161 | "metadata": {}, 162 | "outputs": [ 163 | { 164 | "name": "stdout", 165 | "output_type": "stream", 166 | "text": [ 167 | "0.284231765606\n" 168 | ] 169 | } 170 | ], 171 | "source": [ 172 | "# Cost\n", 173 | "J = nnCost(weights, X, Y)\n", 174 | "print(J)" 175 | ] 176 | }, 177 | { 178 | "cell_type": "markdown", 179 | "metadata": {}, 180 | "source": [ 181 | "## Randomly initialize W, change it to decrease cost (more feasible)\n", 182 | "\n", 183 | "Instead, we initialize $W$ by randomly sampling from a standard normal distribution, and then keep changing $W$ so as to decrease the cost $J$.\n", 184 | "\n", 185 | "But what value to change $W$ by? 
To find out, let us focus on the weights of one of the neurons in the last layer, $W^{(L)}_{[k]}$, differentiate $J$ by it to see what we get:\n", 186 | "\n", 187 | "$$\\frac{ {\\partial}J} {{\\partial}W^{(L)}_{[k]} }=\\frac{\\partial}{{\\partial}W^{(L)}_{[k]}}\\left(\\frac{1}{2}\\sum\\limits_{n}{\\frac{ {\\left( yPred-Y \\right)}^2 }{n} }\\right)=\\frac{1}{2*n}\\sum\\limits_{n} \\left( \\frac{\\partial} {{\\partial}W^{(L)}_{[k]}} (yPred-Y)^2 \\right)=\\frac{1}{n}\\sum\\limits_{n} \\left( (yPred-Y) * \\frac {{\\partial} \\; yPred} { {\\partial}W^{(L)}_{[k]} } \\right)$$\n", 188 | "\n", 189 | "$$\\Rightarrow \\frac{ {\\partial}J} {{\\partial}W^{(L)}_{[k]} } = \\frac{1}{n}\\sum\\limits_{n} \\left( (error) * \\frac {{\\partial} \\; yPred} { {\\partial}W^{(L)}_{[k]} } \\right)$$" 190 | ] 191 | }, 192 | { 193 | "cell_type": "markdown", 194 | "metadata": {}, 195 | "source": [ 196 | "The above equation tells us how $J$ changes by changing $W^{(L)}_{[k]}$. Approximating it for numerical analysis:\n", 197 | "\n", 198 | "$${\\Delta}J ={{\\Delta}W^{(L)}_{[k]}} * \\left[ \\frac{1}{n}\\sum\\limits_{n} \\left( (error) * \\frac {{\\partial} \\; yPred} { {\\partial}W^{(L)}_{[k]} } \\right) \\right] \\;\\;\\;\\;\\;\\;-------------(2)$$ " 199 | ] 200 | }, 201 | { 202 | "cell_type": "markdown", 203 | "metadata": {}, 204 | "source": [ 205 | "## Change $W^{(L)}_{[k]}$ so that $J$ always decreases\n", 206 | "\n", 207 | "If we ensure that ${\\Delta}W^{(L)}_{[k]}$ is equal to $-\\left[ \\frac{1}{n}\\sum\\limits_{n} \\left( (error) * \\frac {{\\partial} \\; yPred} { {\\partial}W^{(L)}_{[k]} } \\right) \\right]$, we see that ${\\Delta}J = {\\Delta}W^{(L)}_{[k]}*\\left(-\\left[{\\Delta}W^{(L)}_{[k]}\\right]\\right) = -\\left[{\\Delta}W^{(L)}_{[k]}\\right]^{2} \\Rightarrow$ negative! \n", 208 | "\n", 209 | "Thus, we decide to change $W^{(L)}_{[k]}$ by that amount which ensures $J$ always decreases!\n", 210 | "\n", 211 | "$${\\Delta}W^{(L)}_{[k]} = -\\left[ \\frac{1}{n}\\sum\\limits_{n} \\left( (error) * \\frac {{\\partial} \\; yPred} { {\\partial}W^{(L)}_{[k]} } \\right) \\right] \\;\\;\\;\\;\\;\\;-------------(3)$$ \n", 212 | "\n", 213 | "So, for each weight in the last layer, that ${\\Delta}W^{(L)}_{[k]}$ which shall (for sure) decrease J can be computed. " 214 | ] 215 | }, 216 | { 217 | "cell_type": "markdown", 218 | "metadata": {}, 219 | "source": [ 220 | "## Gradient Descent\n", 221 | "\n", 222 | "If we update each weight as $W^{(L)}_{[k]} \\leftarrow W^{(L)}_{[k]} + {\\Delta}W^{(L)}_{[k]}$, it is guaranteed that with the new weights, the neural network shall produce outputs that are closer to the desired output.\n", 223 | "\n", 224 | "This is how to train a neural network - randomly initialize $W$, iteratively change $W$ according to eq (3).\n", 225 | "\n", 226 | "**This is called Gradient Descent.**\n", 227 | "\n", 228 | "One way to think about this is - assuming the graph of $J$ vs. $W$ is like an upturned hill, we are slowly descending down the hill by changing $W$, to the point where $J$ is minimum.\n", 229 | "\n", 230 | "J is (sort of) a quadratic function on W, so we can assume it's (sort of) like an upturned hill." 231 | ] 232 | }, 233 | { 234 | "cell_type": "markdown", 235 | "metadata": {}, 236 | "source": [ 237 | "# Computing ${\\Delta}W^{(L)}$ of last layer\n", 238 | "\n", 239 | "To compute ${\\Delta}W$, we need to compute $error$ and $\\frac{{\\partial}\\;yPred}{{\\partial}W^{(L)}}$" 240 | ] 241 | }, 242 | { 243 | "cell_type": "markdown", 244 | "metadata": {}, 245 | "source": [ 246 | "## 1. 
Computing error\n", 247 | "\n", 248 | "$ error = yPred - Y = $ ``forwardProp``$(X) - Y \\;\\;\\;\\;\\;\\;-------------(4)$" 249 | ] 250 | }, 251 | { 252 | "cell_type": "markdown", 253 | "metadata": {}, 254 | "source": [ 255 | "For example, suppose we want to compute those $W$'s in a 3-neuron network that are able to perform AND logic on two inputs.\n", 256 | "\n", 257 | "Here, for $X = \\left[\\begin{array}{c}(0,0)\\\\(0,1)\\\\(1,0)\\\\(1,1)\\end{array}\\right]$, $Y = \\left[\\begin{array}{c}0\\\\0\\\\0\\\\1\\end{array}\\right]$" 258 | ] 259 | }, 260 | { 261 | "cell_type": "code", 262 | "execution_count": 686, 263 | "metadata": {}, 264 | "outputs": [ 265 | { 266 | "name": "stdout", 267 | "output_type": "stream", 268 | "text": [ 269 | "weights:\n", 270 | "1\n", 271 | "(2, 3)\n", 272 | "[[-0.87271574 0.35621485 0.95252276]\n", 273 | " [-0.61981924 -1.49164222 0.55011796]]\n", 274 | "2\n", 275 | "(1, 3)\n", 276 | "[[-1.57656753 -1.10359895 -0.34594249]]\n" 277 | ] 278 | } 279 | ], 280 | "source": [ 281 | "# Initialize network\n", 282 | "layers = [2, 2, 1]\n", 283 | "weights = initializeWeights(layers)\n", 284 | "\n", 285 | "print(\"weights:\")\n", 286 | "for i in range(len(weights)):\n", 287 | " print(i+1); print(weights[i].shape); print(weights[i])" 288 | ] 289 | }, 290 | { 291 | "cell_type": "markdown", 292 | "metadata": {}, 293 | "source": [ 294 | "Our weights have been randomly initialized. Let us see what yPred they give:" 295 | ] 296 | }, 297 | { 298 | "cell_type": "code", 299 | "execution_count": 687, 300 | "metadata": { 301 | "collapsed": true 302 | }, 303 | "outputs": [], 304 | "source": [ 305 | "# Declare input and desired output for AND gate\n", 306 | "X = np.array([[0,0], [0,1], [1,0], [1,1]])\n", 307 | "Y = np.array([[0], [0], [0], [1]])" 308 | ] 309 | }, 310 | { 311 | "cell_type": "code", 312 | "execution_count": 688, 313 | "metadata": {}, 314 | "outputs": [ 315 | { 316 | "name": "stdout", 317 | "output_type": "stream", 318 | "text": [ 319 | "outputs\n", 320 | "[array([[ 0.29468953, 0.34982256],\n", 321 | " [ 0.51994117, 0.48258173],\n", 322 | " [ 0.37367081, 0.10798781],\n", 323 | " [ 0.60731071, 0.17345395]]), array([[ 0.11682925],\n", 324 | " [ 0.08969868],\n", 325 | " [ 0.11646832],\n", 326 | " [ 0.09056134]])]\n" 327 | ] 328 | } 329 | ], 330 | "source": [ 331 | "# Calculate outputs at each layer by forward propagation\n", 332 | "outputs = forwardProp(X, weights)\n", 333 | "print(\"outputs\"); print(outputs)" 334 | ] 335 | }, 336 | { 337 | "cell_type": "code", 338 | "execution_count": 689, 339 | "metadata": {}, 340 | "outputs": [ 341 | { 342 | "name": "stdout", 343 | "output_type": "stream", 344 | "text": [ 345 | "(4, 1)\n", 346 | "[[ 0.11682925]\n", 347 | " [ 0.08969868]\n", 348 | " [ 0.11646832]\n", 349 | " [ 0.09056134]]\n" 350 | ] 351 | } 352 | ], 353 | "source": [ 354 | "# Calculate yPred as the last output from forward propagation\n", 355 | "yPred = outputs[-1]\n", 356 | "print(yPred.shape); print(yPred)" 357 | ] 358 | }, 359 | { 360 | "cell_type": "code", 361 | "execution_count": 690, 362 | "metadata": {}, 363 | "outputs": [ 364 | { 365 | "name": "stdout", 366 | "output_type": "stream", 367 | "text": [ 368 | "(4, 1)\n", 369 | "[[ 0.11682925]\n", 370 | " [ 0.08969868]\n", 371 | " [ 0.11646832]\n", 372 | " [-0.90943866]]\n" 373 | ] 374 | } 375 | ], 376 | "source": [ 377 | "# Error = yPred - Y\n", 378 | "error = yPred - Y\n", 379 | "print(error.shape); print(error)" 380 | ] 381 | }, 382 | { 383 | "cell_type": "markdown", 384 | "metadata": {}, 385 | "source": [ 386 | "## 
2. Computing $\\frac{{\\partial}\\;yPred}{{\\partial}W^{(L)}_{[k]}}$\n", 387 | "\n", 388 | "From eq. (1), $yPred$ can be written as:\n", 389 | "\n", 390 | "$$yPred = Sigmoid(X^{(L)}\\;.*\\;W^{(L)}{^{T}})$$\n", 391 | "\n", 392 | "So,\n", 393 | "\n", 394 | "$$\\frac{{\\partial}\\;yPred}{{\\partial}W^{(L)}_{[k]}} = \\frac{{\\partial}}{{\\partial}W^{(L)}_{[k]}}\\left(Sigmoid\\left(X^{(L)}.*W^{(L)}{^{T}}\\right)\\right) = Sigmoid^{'}\\left(X^{(L)}.*W^{(L)}{^{T}}\\right)*\\left(\\frac{{\\partial}}{{\\partial}W^{(L)}_{[k]}}\\left((X^{(L)}.*W^{(L)}{^{T}})\\right)\\right)$$\n", 395 | "\n", 396 | "Here, $yPred$ is an $o_{L}$-dimensional number, and $W^{(L)}_{[k]}$ only affect the $k$-th dimension of $yPred$, i.e. $yPred_{[k]}$. So,\n", 397 | "\n", 398 | "$$\\frac{{\\partial}\\;yPred_{[k]}}{{\\partial}W^{(L)}_{[k]}} = Sigmoid^{'}\\left(X^{(L)}.*W^{(L)}_{[k]}\\right)*\\left(\\frac{{\\partial}}{{\\partial}W^{(L)}_{[k]}}\\left((X^{(L)}.*W^{(L)}_{[k]})\\right)\\right)$$" 399 | ] 400 | }, 401 | { 402 | "cell_type": "markdown", 403 | "metadata": {}, 404 | "source": [ 405 | "### - Computing $Sigmoid^{'}\\left(X^{(L)}.*W^{(L)}_{[k]}\\right)$\n", 406 | "\n", 407 | "It can be verified that $Sigmoid^{'}(a) = Sigmoid(a)*(1-Sigmoid(a))$. Thus, $Sigmoid^{'}(X^{(L)}.*W^{(L)}_{[k]}{^{T}}) = yPred_{[k]}*(1 - yPred_{[k]})$. So,\n", 408 | "\n", 409 | "$$\\frac{{\\partial}\\;yPred_{[k]}}{{\\partial}W^{(L)}_{[k]}} = \\left(yPred_{[k]}*(1 - yPred_{[k]})*\\left(\\frac{{\\partial}(X^{(L)}.*W^{(L)}_{[k]})}{{\\partial}W^{(L)}_{[k]}}\\right)\\right)$$\n", 410 | "\n", 411 | "$${\\Delta}W^{(L)}_{[k]} = -\\left[\\frac{1}{n}\\sum\\limits_{n}\\left(error_{[k]}*yPred_{[k]}*(1 - yPred_{[k]})*\\left(\\frac{{\\partial}(X^{(L)}.*W^{(L)}_{[k]})}{{\\partial}W^{(L)}_{[k]}}\\right)\\right)\\right] \\;\\;\\;\\;\\;\\;-------------(5)$$" 412 | ] 413 | }, 414 | { 415 | "cell_type": "markdown", 416 | "metadata": {}, 417 | "source": [ 418 | "### - Computing $\\frac{{\\partial}}{{\\partial}W^{(L)}_{[k]}}((X^{(L)}.*W^{(L)}_{[k]}))$\n", 419 | "\n", 420 | "It can be seen that $\\frac{{\\partial}}{{\\partial}W^{(L)}_{[k]}}((X^{(L)}.*W^{(L)}_{[k]})) = X^{(L)}$\n", 421 | "\n", 422 | "We also know that $X^{(L)} = \\left[ \\begin{array}{c} 1 & Y^{(L-1)} \\end{array} \\right]_{n{\\times}i_{L}}$, and $Y^{(L-1)}$ have been computed during Forward Propagation. So,\n", 423 | "\n", 424 | "$$\\frac{{\\partial}\\;yPred_{[k]}}{{\\partial}W^{(L)}} = (yPred_{[k]}*(1-yPred_{[k]}))*X^{(L)} $$\n", 425 | "\n", 426 | "$${\\Delta}W^{(L)}_{[k]} = -\\left[\\frac{1}{n}\\sum\\limits_{n}\\left(error_{[k]}*yPred_{[k]}*(1 - yPred_{[k]})*X^{(L)}\\right)\\right]\\;\\;\\;\\;\\;\\;-------------(6)$$" 427 | ] 428 | }, 429 | { 430 | "cell_type": "markdown", 431 | "metadata": {}, 432 | "source": [ 433 | "## Combining terms to simplify computation\n", 434 | "\n", 435 | "Here, dimension of $error$, $yPred$, and $(1-yPred)$ is $n{\\times}o_{L}$, while that of $X^{(L)}$ is $n{\\times}i_{L}$. 
A little thought has to be given towards how those quantities are multiplied.\n", 436 | "\n", 437 | "First of all, we can combine the mentioned three into one and call it $\\delta$.\n", 438 | "\n", 439 | "$${\\delta}_{n{\\times}o_{L}} = error_{n{\\times}o_{L}}*yPred_{n{\\times}o_{L}}*(1-yPred)_{n{\\times}o_{L}} \\;\\;\\;\\;\\;\\;-----(7)$$\n", 440 | "\n", 441 | "$${\\Delta}W^{(L)}_{[k]} = -\\left[\\frac{1}{n}\\sum\\limits_{n}\\left({\\delta}_{[k]}*X^{(L)}\\right)\\right]$$\n", 442 | "\n", 443 | "We can now combine calculations of all dimensions into this matrix operation: (We will figure out the matrix dimensions below)\n", 444 | "\n", 445 | "$${\\Delta}W^{(L)} = -\\left[\\frac{1}{n}\\sum\\limits_{n}\\left({\\delta}*X^{(L)}\\right)\\right]$$\n" 446 | ] 447 | }, 448 | { 449 | "cell_type": "markdown", 450 | "metadata": {}, 451 | "source": [ 452 | "One way of figuring out how $\\delta$ and $X^{(L)}$ are combined is to see that the dimension of ${\\Delta}W$ is $o_{L}{\\times}i_{L}$, dimension of $\\delta$ is $n{\\times}o_{L}$, and the dimension of $X^{(L)}$ is $n{\\times}i_{L}$.\n", 453 | "\n", 454 | "Clearly, the $\\sum\\limits_{n}\\left({\\delta}*X^{(L)}\\right)$ term, when considered for all the weights, is equal to $\\delta^{T}_{o_{L}{\\times}n}\\;.*\\;X^{(L)}_{n{\\times}i_{L}}$, the summation over $n$ being taken care of by the dot product, and the output dimension ${o_{L}{\\times}i_{L}}$ matches that of $W^{(L)}$.\n", 455 | "\n", 456 | "Hence, using matrix operations, ${\\Delta}W^{(L)}$ can be found as:\n", 457 | "\n", 458 | "$${\\Delta}W^{(L)}_{{o_{L}{\\times}i_{L}}} = -\\frac{1}{n}\\left({\\delta}^{T}{_{o_{L}{\\times}n}}\\;.*\\;X^{(L)}_{n{\\times}i_{L}}\\right) \\;\\;\\;\\;\\;\\;-------------(8)$$" 459 | ] 460 | }, 461 | { 462 | "cell_type": "code", 463 | "execution_count": 691, 464 | "metadata": {}, 465 | "outputs": [ 466 | { 467 | "name": "stdout", 468 | "output_type": "stream", 469 | "text": [ 470 | "(4, 1)\n", 471 | "[[ 0.01205446]\n", 472 | " [ 0.00732415]\n", 473 | " [ 0.01198499]\n", 474 | " [-0.07490136]]\n" 475 | ] 476 | } 477 | ], 478 | "source": [ 479 | "# Calculate delta for the last layer\n", 480 | "delta = np.multiply(np.multiply(error, yPred), 1-yPred)\n", 481 | "print(delta.shape); print(delta)" 482 | ] 483 | }, 484 | { 485 | "cell_type": "code", 486 | "execution_count": 692, 487 | "metadata": {}, 488 | "outputs": [ 489 | { 490 | "name": "stdout", 491 | "output_type": "stream", 492 | "text": [ 493 | "(4, 3)\n", 494 | "[[ 1. 0.29468953 0.34982256]\n", 495 | " [ 1. 0.51994117 0.48258173]\n", 496 | " [ 1. 0.37367081 0.10798781]\n", 497 | " [ 1. 
0.60731071 0.17345395]]\n" 498 | ] 499 | } 500 | ], 501 | "source": [ 502 | "# Find input to the last layer\n", 503 | "xL = addBiasTerms(outputs[-2])\n", 504 | "print(xL.shape); print(xL)" 505 | ] 506 | }, 507 | { 508 | "cell_type": "code", 509 | "execution_count": 693, 510 | "metadata": {}, 511 | "outputs": [ 512 | { 513 | "name": "stdout", 514 | "output_type": "stream", 515 | "text": [ 516 | "(1, 3)\n", 517 | "[[ 0.01088444 0.00841238 0.00098657]]\n" 518 | ] 519 | } 520 | ], 521 | "source": [ 522 | "# Find deltaW for last layer\n", 523 | "deltaW = -np.dot(delta.T, xL)/len(Y)\n", 524 | "print(deltaW.shape); print(deltaW)" 525 | ] 526 | }, 527 | { 528 | "cell_type": "code", 529 | "execution_count": 694, 530 | "metadata": {}, 531 | "outputs": [ 532 | { 533 | "name": "stdout", 534 | "output_type": "stream", 535 | "text": [ 536 | "old weights:\n", 537 | "1\n", 538 | "(2, 3)\n", 539 | "[[-0.87271574 0.35621485 0.95252276]\n", 540 | " [-0.61981924 -1.49164222 0.55011796]]\n", 541 | "2\n", 542 | "(1, 3)\n", 543 | "[[-1.57656753 -1.10359895 -0.34594249]]\n", 544 | "new weights:\n", 545 | "1\n", 546 | "(2, 3)\n", 547 | "[[-0.87271574 0.35621485 0.95252276]\n", 548 | " [-0.61981924 -1.49164222 0.55011796]]\n", 549 | "2\n", 550 | "(1, 3)\n", 551 | "[[-1.5656831 -1.09518657 -0.34495592]]\n", 552 | "old cost:\n", 553 | "0.107792308277\n", 554 | "new cost:\n", 555 | "0.107601673739\n" 556 | ] 557 | } 558 | ], 559 | "source": [ 560 | "# Checking cost of neural network before and after change in W^{L}\n", 561 | "newWeights = [np.array(w) for w in weights]\n", 562 | "newWeights[-1] += deltaW\n", 563 | "\n", 564 | "print(\"old weights:\")\n", 565 | "for i in range(len(weights)):\n", 566 | " print(i+1); print(weights[i].shape); print(weights[i])\n", 567 | "\n", 568 | "print(\"new weights:\")\n", 569 | "for i in range(len(newWeights)):\n", 570 | " print(i+1); print(newWeights[i].shape); print(newWeights[i])\n", 571 | "\n", 572 | "print(\"old cost:\"); print(nnCost(weights, X, Y))\n", 573 | "print(\"new cost:\"); print(nnCost(newWeights, X, Y))" 574 | ] 575 | }, 576 | { 577 | "cell_type": "markdown", 578 | "metadata": {}, 579 | "source": [ 580 | "### **Congratulations! You've just learned how to back propagate!**\n", 581 | "(1 layer only)" 582 | ] 583 | }, 584 | { 585 | "cell_type": "markdown", 586 | "metadata": {}, 587 | "source": [ 588 | "# Back-propagation through layers\n", 589 | "\n", 590 | "For the last layer, according to eq. 
(5),\n", 591 | "$${\\Delta}W^{(L)}_{[k]} = -\\frac{1}{n}\\sum\\limits_{n}\\left(error_{[k]}*yPred_{[k]}*(1 - yPred_{[k]})*\\left(\\frac{{\\partial}(X^{(L)}.*W^{(L)}_{[k]})}{{\\partial}W^{(L)}_{[k]}}\\right)\\right) = -\\frac{1}{n}\\sum\\limits_{n}\\left(\\delta^{(L)}_{[k]}*\\frac{{\\partial}(X^{(L)}.*W^{(L)}_{[k]})}{{\\partial}W^{(L)}_{[k]}}\\right)$$" 592 | ] 593 | }, 594 | { 595 | "cell_type": "markdown", 596 | "metadata": {}, 597 | "source": [ 598 | "### Computing for Layer L-1\n", 599 | "\n", 600 | "If we go back one more layer to find out ${\\Delta}W$ for the $p^{th}$ neuron in the $(L-1)^{th}$ layer, backpropagated from the $k^{th}$ neuron in the $L^{th}$ layer, noting that $X^{L} = Y^{L-1}$:\n", 601 | "\n", 602 | "$${\\Delta}W^{(L-1)}_{[p]from[k]} = -\\frac{1}{n}\\sum\\limits_{n}\\left(\\delta^{(L)}_{[k]}*\\frac{{\\partial}(X^{(L)}.*W^{(L)}_{[k]})}{{\\partial}W^{(L-1)}_{[p]}}\\right) = -\\frac{1}{n}\\sum\\limits_{n}\\left(\\delta^{(L)}_{[k]}*\\frac{{\\partial}(Y^{(L-1)}.*W^{(L)}_{[k]})}{{\\partial}W^{(L-1)}_{[p]}}\\right)$$" 603 | ] 604 | }, 605 | { 606 | "cell_type": "markdown", 607 | "metadata": {}, 608 | "source": [ 609 | "Here, $Y^{(L-1)}$ is the collected output of the penultimate layer, i.e. the collected output of all neurons in the penultimate layer. $W^{(L-1)}_{[p]}$ is the weight matrix of the $p^{th}$ neuron in the penultimate layer. So,\n", 610 | "\n", 611 | "$$\\frac{{\\partial}(Y^{(L-1)}.*W^{(L)}_{[k]})}{{\\partial}W^{(L-1)}_{[p]}} = \\frac{{\\partial}((Sigmoid(X^{(L-1)}.*W^{(L-1)}_{[0]}) + Sigmoid(X^{(L-1)}.*W^{(L-1)}_{[1]}) + ... + Sigmoid(X^{(L-1)}.*W^{(L-1)}_{[p]}) + ... + Sigmoid(X^{(L-1)}.*W^{(L-1)}_{[i_{L-1}-1]})).*W^{(L)}_{[k]})}{{\\partial}W^{(L-1)}_{[p]}}$$\n", 612 | "\n", 613 | "We know that change in $W^{(L-1)}_{[p]}$ does not affect $W^{(L)}$ or any $W^{(L-1)}$ weight matrix other than $W^{(L-1)}_{[p]}$. So:\n", 614 | "\n", 615 | "$$\\frac{{\\partial}(Y^{(L-1)}.*W^{(L)}_{[k]})}{{\\partial}W^{(L-1)}_{[p]}} = \\frac{{\\partial}(Sigmoid(X^{(L-1)}.*W^{(L-1)}_{[p]}))}{{\\partial}W^{(L-1)}_{[p]}}$$\n", 616 | "\n", 617 | "$${\\Delta}W^{(L-1)}_{[p]from[k]} = -\\frac{1}{n}\\sum\\limits_{n}\\left(\\delta^{(L)}_{[k]}*W^{(L)}_{[k]}*\\frac{{\\partial}\\;(Sigmoid(X^{(L-1)}.*W^{(L-1)}_{[p]}))}{{\\partial}W^{(L-1)}_{[p]}}\\right)$$\n", 618 | "\n", 619 | "(Ignoring dimensions for now)" 620 | ] 621 | }, 622 | { 623 | "cell_type": "markdown", 624 | "metadata": {}, 625 | "source": [ 626 | "We know how this goes now.\n", 627 | "\n", 628 | "$$\\frac{{\\partial}\\;(Sigmoid(X^{(L-1)}.*W^{(L-1)}_{[p]}))}{{\\partial}W^{(L-1)}_{[p]}} = Sigmoid^{'}(X^{(L-1)}.*W^{(L-1)}_{[p]})*\\frac{{\\partial}(X^{(L-1)}.*W^{(L-1)}_{[p]})}{{\\partial}W^{(L-1)}_{[p]}} = Y^{(L-1)}_{[p]}*(1 - Y^{(L-1)}_{[p]}))*X^{(L-1)}$$\n" 629 | ] 630 | }, 631 | { 632 | "cell_type": "markdown", 633 | "metadata": {}, 634 | "source": [ 635 | "Thus,\n", 636 | "\n", 637 | "$${\\Delta}W^{(L-1)}_{[p]from[k]} = -\\left[\\frac{1}{n}\\sum\\limits_{n}(\\delta^{(L)}_{[k]}*W^{(L)}_{[k]}*(Y^{(L-1)}_{[p]}*(1 - Y^{(L-1)}_{[p]})*X^{(L-1)})\\right]$$\n", 638 | "\n", 639 | "We need to take care of the dimensions. Here, there are two parts: $\\delta^{(L)}_{[k]}*W^{(L)}_{[k]}$, which is only concerned with the $L^{th}$ layer, and $Y^{(L-1)}_{[p]}*(1 - Y^{(L-1)}_{[p]}))*X^{(L-1)}$, which is only concerned with the $(L-1)^{th}$ layer." 
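As a quick sanity check on those dimensions, here is a hedged, illustrative NumPy sketch using the 3-neuron network sizes ($n=4$ data points, $o_L=1$ output neuron, $o_{L-1}=2$ hidden neurons); the formal matrix treatment of the two parts follows below:

```python
import numpy as np

n, o_L, o_Lm1 = 4, 1, 2
delta_L = np.random.randn(n, o_L)          # delta of the last layer:  n x o_L
W_L = np.random.randn(o_L, o_Lm1 + 1)      # last-layer weights:       o_L x i_L, with i_L = o_{L-1} + 1

# Part concerned with layer L: delta^(L) * W^(L)  ->  n x i_L
layerL_part = np.dot(delta_L, W_L)
print(layerL_part.shape)                   # (4, 3); dropping the bias column leaves n x o_{L-1}

# Part concerned with layer L-1: Y^(L-1) * (1 - Y^(L-1)), element-wise  ->  n x o_{L-1}
Y_Lm1 = np.random.rand(n, o_Lm1)
layerLm1_part = Y_Lm1 * (1 - Y_Lm1)
print(layerLm1_part.shape)                 # (4, 2)
```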
640 | ] 641 | }, 642 | { 643 | "cell_type": "markdown", 644 | "metadata": {}, 645 | "source": [ 646 | "### 1) Back-propagation Error\n", 647 | "\n", 648 | "We can observe here that the terms $\\delta^{(L)}_{[k]}$ and $W^{(L)}_{[k]}$ are back-propagated from the $k^{th}$ neuron of the final layer. Let's combine them and call it the back-propagated error:\n", 649 | "$$bpError^{(L-1)}_{[k]}{_{n{\\times}i_{L}}} = \\delta^{(L)}_{[k]}*W^{(L)}_{[k]}$$\n", 650 | "\n", 651 | "We know that $\\delta^{(L)}{_{n{\\times}o_{L}}}*W^{(L)}{_{o_{L}{\\times}{i_{L}}}}$ is a matrix of dimensions $n{\\times}i_{L} = n{\\times}o_{(L-1)}$, which is the sum of the backprop errors from each neuron in the final layer. Thus,\n", 652 | "\n", 653 | "$$bpError^{(L-1)}{_{n{\\times}i_{L}}} = \\delta^{(L)}*W^{(L)} \\;\\;\\;\\;\\;\\;--------------(9)$$\n", 654 | "\n", 655 | "We see that for a neuron in the $(L-1)^{th}$ layer, the total error back-propagated to it is the sum of the back-propagated errors from each of the neurons connected to it in the $L^{th}$ layer." 656 | ] 657 | }, 658 | { 659 | "cell_type": "markdown", 660 | "metadata": {}, 661 | "source": [ 662 | "Thus, instead of ${\\Delta}W^{(L-1)}_{[p]from[k]}$ from the $k^{th}$ neuron, we can directly consider ${\\Delta}W^{(L-1)}_{[p]}$:\n", 663 | "\n", 664 | "$${\\Delta}W^{(L-1)}_{[p]} = -\\left[\\frac{1}{n}\\sum\\limits_{n}(bpError^{(L-1)}_{[p]}*(Y^{(L-1)}_{[p]}*(1 - Y^{(L-1)}_{[p]})*X^{(L-1)})\\right]$$" 665 | ] 666 | }, 667 | { 668 | "cell_type": "markdown", 669 | "metadata": {}, 670 | "source": [ 671 | "### 2) The $Y*(1-Y)*X$ term\n", 672 | "\n", 673 | "We can convert the $Y*(1-Y)*X$ term into a matrix operation, with summation over $n$ inherently taken care. Directly considering $Y$ instead of $Y_{[p]}$:\n", 674 | "\n", 675 | "$$Y^{(L-1)}*(1 - Y^{(L-1)}))*X^{(L-1)} == (Y^{(L-1)}.*(1 - Y^{(L-1)}))^{T}{_{o_{(L-1)}{\\times}n}} * X^{(L-1)}{_{n{\\times}i_{(L-1)}}}$$\n", 676 | "\n", 677 | "We can see that the resultant matrix has the same dimensions as $W^{(L-1)}$ : $o_{(L-1)}{\\times}i_{(L-1)}$." 678 | ] 679 | }, 680 | { 681 | "cell_type": "markdown", 682 | "metadata": {}, 683 | "source": [ 684 | "### Combining the two\n", 685 | "\n", 686 | "To combine $bpError$ and the $Y*(1-Y)*X$ terms, for consistency in dimensions, we need to first dot multiply $bpError_{n{\\times}o_{(L-1)}}$ with $Y^{(L-1)}_{n{\\times}o_{(L-1)}}.*(1 - Y^{(L-1)})_{n{\\times}o_{(L-1)}}$, and then multiply the transpose of that with X.\n", 687 | "\n", 688 | "Thus,\n", 689 | "\n", 690 | "$${\\Delta}W^{(L-1)}_{o_{(L-1)}{\\times}i_{(L-1)}} = -\\left[\\frac{1}{n}((bpError^{(L-1)}.*Y^{(L-1)}.*(1 - Y^{(L-1)}))^{T} _{o_{(L-1)}{\\times}n}* X^{(L-1)}_{n{\\times}i_{(L-1)}}\\right] \\;\\;\\;\\;\\;\\;--------------(10)$$\n", 691 | "\n", 692 | "(Summation across $n$ is taken care of within the matrix multiplication)" 693 | ] 694 | }, 695 | { 696 | "cell_type": "markdown", 697 | "metadata": {}, 698 | "source": [ 699 | "## Simplifying to matrix operation of any layer $l$\n", 700 | "\n", 701 | "Just as we had done for the final layer, from equation 9:\n", 702 | "\n", 703 | "
$bpError^{(l)}_{n{\\times}o_{l}} = \\delta^{(l+1)}_{n{\\times}o_{l+1}}*W^{(l+1)}_{o_{l+1}{\\times}o_{l}}$\n", 704 | "\n", 705 | "If we compare equation (10) with equation (6), we can generalize \"error\" there as Backpropagation Error, and the formula for ${\\delta}$ as:\n", 706 | "\n", 707 | "$${\\delta}^{(l)}_{n{\\times}o_{l}} = {bpError^{(l)}_{n{\\times}o_{l}}} .* {Y^{(l)}_{n{\\times}o_{l}}} .* (1-Y^{(l)})_{n{\\times}o_{l}}$$\n", 708 | "\n", 709 | "Thus,\n", 710 | "\n", 711 | "$${\\Delta}W^{(l)}_{{o_{l}{\\times}i_{l}}} = -\\frac{1}{n}\\left({\\delta^{(l)}}^{T}{_{o_{l}{\\times}n}}\\;.*\\;X^{(l)}_{n{\\times}i_{l}}\\right) \\;\\;\\;\\;\\;\\;-------------(11)$$\n" 712 | ] 713 | }, 714 | { 715 | "cell_type": "code", 716 | "execution_count": 695, 717 | "metadata": { 718 | "collapsed": true 719 | }, 720 | "outputs": [], 721 | "source": [ 722 | "# IMPLEMENTING BACK-PROPAGATION\n", 723 | "def backProp(weights, X, Y):\n", 724 | " # Forward propagate to find outputs\n", 725 | " outputs = forwardProp(X, weights)\n", 726 | " \n", 727 | " # For the last layer, bpError = error = yPred - Y\n", 728 | " bpError = outputs[-1] - Y\n", 729 | " \n", 730 | " # Back-propagating from the last layer to the first\n", 731 | " for l, w in enumerate(reversed(weights)):\n", 732 | " \n", 733 | " # Find yPred for this layer\n", 734 | " yPred = outputs[-l-1]\n", 735 | " \n", 736 | " # Calculate delta for this layer using bpError from next layer\n", 737 | " delta = np.multiply(np.multiply(bpError, yPred), 1-yPred)\n", 738 | " \n", 739 | " # Find input to the layer, by adding bias to the output of the previous layer\n", 740 | " # Take care, l goes from 0 to 1, while the weights are in reverse order\n", 741 | " if l==len(weights)-1: # If 1st layer has been reached\n", 742 | " xL = addBiasTerms(X)\n", 743 | " else:\n", 744 | " xL = addBiasTerms(outputs[-l-2])\n", 745 | " \n", 746 | " # Calculate deltaW for this layer\n", 747 | " deltaW = -np.dot(delta.T, xL)/len(Y)\n", 748 | " \n", 749 | " # Calculate bpError for previous layer to be back-propagated\n", 750 | " bpError = np.dot(delta, w)\n", 751 | " \n", 752 | " # Ignore bias term in bpError\n", 753 | " bpError = bpError[:,1:]\n", 754 | " \n", 755 | " # Change weights of the current layer (W <- W + deltaW)\n", 756 | " w += deltaW" 757 | ] 758 | }, 759 | { 760 | "cell_type": "code", 761 | "execution_count": 698, 762 | "metadata": {}, 763 | "outputs": [ 764 | { 765 | "name": "stdout", 766 | "output_type": "stream", 767 | "text": [ 768 | "old weights:\n", 769 | "1\n", 770 | "(2, 3)\n", 771 | "[[-0.87271574 0.35621485 0.95252276]\n", 772 | " [-0.61981924 -1.49164222 0.55011796]]\n", 773 | "2\n", 774 | "(1, 3)\n", 775 | "[[-1.57656753 -1.10359895 -0.34594249]]\n", 776 | "old cost:\n", 777 | "0.107792308277\n" 778 | ] 779 | } 780 | ], 781 | "source": [ 782 | "# To check with the single back-propagation step done before,\n", 783 | "# back up the current weights\n", 784 | "oldWeights = [np.array(w) for w in weights]\n", 785 | "print(\"old weights:\")\n", 786 | "for i in range(len(oldWeights)):\n", 787 | " print(i+1); print(oldWeights[i].shape); print(oldWeights[i])\n", 788 | "\n", 789 | "print(\"old cost:\"); print(nnCost(oldWeights, X, Y))" 790 | ] 791 | }, 792 | { 793 | "cell_type": "markdown", 794 | "metadata": {}, 795 | "source": [ 796 | "Let us define a function to compute the accuracy of our model, irrespective of the number of neuron in the output layer." 
797 | ] 798 | }, 799 | { 800 | "cell_type": "code", 801 | "execution_count": 699, 802 | "metadata": { 803 | "collapsed": true 804 | }, 805 | "outputs": [], 806 | "source": [ 807 | "# Evaluate the accuracy of weights for input X and desired outptut Y\n", 808 | "def evaluate(weights, X, Y):\n", 809 | " yPreds = forwardProp(X, weights)[-1]\n", 810 | " # Check if maximum probability is from that neuron corresponding to desired class,\n", 811 | " # AND check if that maximum probability is greater than 0.5\n", 812 | " yes = sum( int( ( np.argmax(yPreds[i]) == np.argmax(Y[i]) ) and \n", 813 | " ( (yPreds[i][np.argmax(yPreds[i])]>0.5) == (Y[i][np.argmax(Y[i])]>0.5) ) )\n", 814 | " for i in range(len(Y)) )\n", 815 | " print(str(yes)+\" out of \"+str(len(Y))+\" : \"+str(float(yes/len(Y))))" 816 | ] 817 | }, 818 | { 819 | "cell_type": "markdown", 820 | "metadata": {}, 821 | "source": [ 822 | "Check the results of back-propagation:" 823 | ] 824 | }, 825 | { 826 | "cell_type": "code", 827 | "execution_count": 722, 828 | "metadata": {}, 829 | "outputs": [ 830 | { 831 | "name": "stdout", 832 | "output_type": "stream", 833 | "text": [ 834 | "950\n", 835 | "new cost:\n", 836 | "0.0113971310862\n", 837 | "new accuracy: \n", 838 | "4 out of 4 : 1.0\n", 839 | "[[ 0.03022141]\n", 840 | " [ 0.13740936]\n", 841 | " [ 0.13683374]\n", 842 | " [ 0.7705247 ]]\n" 843 | ] 844 | } 845 | ], 846 | "source": [ 847 | "# BACK-PROPAGATE, checking old & new weights and costs\n", 848 | "\n", 849 | "# Re-initialize to old weights\n", 850 | "weights = [np.array(w) for w in oldWeights]\n", 851 | "\n", 852 | "#print(\"old weights:\")\n", 853 | "#for i in range(len(weights)):\n", 854 | "# print(i+1); print(weights[i].shape); print(weights[i])\n", 855 | "\n", 856 | "print(\"old cost: \"); print(nnCost(weights, X, Y))\n", 857 | "print(\"old accuracy: \"); print(evaluate(weights, X, Y))\n", 858 | "for i in range(1000):\n", 859 | " # Back propagate\n", 860 | " backProp(weights, X, Y)\n", 861 | "\n", 862 | " #print(\"new weights:\")\n", 863 | " #for i in range(len(weights)):\n", 864 | " # print(i+1); print(weights[i].shape); print(weights[i])\n", 865 | " \n", 866 | " if i%50==0:\n", 867 | " time.sleep(1)\n", 868 | " clear_output()\n", 869 | " print(i)\n", 870 | " print(\"new cost:\"); print(nnCost(weights, X, Y))\n", 871 | " print(\"new accuracy: \"); evaluate(weights, X, Y)\n", 872 | " print(forwardProp(X, weights)[-1])\n" 873 | ] 874 | }, 875 | { 876 | "cell_type": "code", 877 | "execution_count": 718, 878 | "metadata": { 879 | "collapsed": true 880 | }, 881 | "outputs": [], 882 | "source": [ 883 | "# Revert back to original weights (if needed)\n", 884 | "weights = [np.array(w) for w in oldWeights]" 885 | ] 886 | }, 887 | { 888 | "cell_type": "markdown", 889 | "metadata": {}, 890 | "source": [ 891 | "### Training\n", 892 | "\n", 893 | "Keep calling backProp() again and again until the cost decreases so much that we reach our desired accuracy.\n", 894 | "\n", 895 | "You can observe the cost of the function going down with iterations." 
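A minimal sketch of such a loop, assuming the `backProp` and `nnCost` defined above; `targetCost` and `maxIterations` are illustrative values, not part of the original tutorial:

```python
# Keep calling backProp until the cost falls below a chosen target
# (targetCost and maxIterations are illustrative, not from the tutorial)
def trainUntil(weights, X, Y, targetCost=0.01, maxIterations=5000):
    for i in range(maxIterations):
        backProp(weights, X, Y)          # one full-batch weight update
        cost = nnCost(weights, X, Y)     # recompute the cost after the update
        if cost < targetCost:
            break
    print("Stopped after "+str(i+1)+" iterations, cost = "+str(cost))
```

Printing or plotting `cost` across iterations is exactly the "cost going down" behaviour described above.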
896 | ] 897 | }, 898 | { 899 | "cell_type": "markdown", 900 | "metadata": {}, 901 | "source": [ 902 | "# Problems\n", 903 | "\n", 904 | "### - Not reaching desired accuracy fast enough\n", 905 | "\n", 906 | "It takes too many iterations of the backProp algorithm for the network to reach the desired output.\n", 907 | "\n", 908 | "One of the simplest ways of solving this problem is by adding a Learning Rate (described below) to the back-propagation algorithm.\n", 909 | "\n", 910 | "### - Taking too long to compute one iteration\n", 911 | "\n", 912 | "Within one iteration, the multiplication and summing operations take too long because there are too many data points feeded into the network.\n", 913 | "\n", 914 | "This problem is tackled using Stochastic Gradient Descent (talked about in the next tutorial). The above algorithm is running Batch Gradient Descent. " 915 | ] 916 | }, 917 | { 918 | "cell_type": "markdown", 919 | "metadata": {}, 920 | "source": [ 921 | "# Learning Rate\n", 922 | "\n", 923 | "Usually, it is desired that we change the amount with which we back propagate, so that we can train our network to reach the desired accuracy faster. So we multiply ${\\Delta}W$ with a factor to control this.\n", 924 | "\n", 925 | "$$W \\leftarrow W + \\eta*{\\Delta}W$$" 926 | ] 927 | }, 928 | { 929 | "cell_type": "markdown", 930 | "metadata": {}, 931 | "source": [ 932 | "If $\\eta$ is large, then we take bigger steps to the assumed minimum. If $\\eta$ is small, we take smaller steps.\n", 933 | "\n", 934 | "Remember that we are not actually travelling on the gradient, we are only approximating the direction using a ${\\Delta}W$ instead of a ${\\delta}W$. So we don't always point in the direction of the minimum, we could undershoot or overshoot." 935 | ] 936 | }, 937 | { 938 | "cell_type": "markdown", 939 | "metadata": {}, 940 | "source": [ 941 | "If $\\eta$ is too small, we might take too long to get to the minimum.\n", 942 | "\n", 943 | "If $\\eta$ is too big, we might start climbing back up the hill and our cost would keep increasing instead of decreasing!" 944 | ] 945 | }, 946 | { 947 | "cell_type": "markdown", 948 | "metadata": {}, 949 | "source": [ 950 | "One way to ensure that we get the best learning rate is to start at, say, 1,\n", 951 | "- increase $\\eta$ by 5% if the cost is decreasing\n", 952 | "- decrease $\\eta$ to 50% if the cost is increasing" 953 | ] 954 | }, 955 | { 956 | "cell_type": "markdown", 957 | "metadata": {}, 958 | "source": [ 959 | "### Different ways to manipulate learning rate\n", 960 | "\n", 961 | "There are various methods available that leverage the variability of learning rate, to produce results that \"converge\" (reach a minimum) faster. The following list includes those with even more complicated methods of trying to converge faster:\n", 962 | "\n", 963 | "
![Optimizers](images/optimizers.gif)" 964 | ] 965 | }, 966 | { 967 | "cell_type": "markdown", 968 | "metadata": {}, 969 | "source": [ 970 | "As can be seen, Stochastic Gradient Descent (SGD) itself performs slower than all the other methods, and the one that we are using (Batch Gradient Descent) is even slower." 971 | ] 972 | }, 973 | { 974 | "cell_type": "markdown", 975 | "metadata": {}, 976 | "source": [ 977 | "Below is an implementation of backProp with provision for learning rate:" 978 | ] 979 | }, 980 | { 981 | "cell_type": "code", 982 | "execution_count": 485, 983 | "metadata": { 984 | "collapsed": true 985 | }, 986 | "outputs": [], 987 | "source": [ 988 | "# IMPLEMENTING BACK-PROPAGATION WITH LEARNING RATE\n", 989 | "# Added eta, the learning rate, as an input\n", 990 | "def backProp(weights, X, Y, learningRate):\n", 991 | " # Forward propagate to find outputs\n", 992 | " outputs = forwardProp(X, weights)\n", 993 | " \n", 994 | " # For the last layer, bpError = error = yPred - Y\n", 995 | " bpError = outputs[-1] - Y\n", 996 | " \n", 997 | " # Back-propagating from the last layer to the first\n", 998 | " for l, w in enumerate(reversed(weights)):\n", 999 | " \n", 1000 | " # Find yPred for this layer\n", 1001 | " yPred = outputs[-l-1]\n", 1002 | " \n", 1003 | " # Calculate delta for this layer using bpError from next layer\n", 1004 | " delta = np.multiply(np.multiply(bpError, yPred), 1-yPred)\n", 1005 | " \n", 1006 | " # Find input to the layer, by adding bias to the output of the previous layer\n", 1007 | " # Take care, l goes from 0 to 1, while the weights are in reverse order\n", 1008 | " if l==len(weights)-1: # If 1st layer has been reached\n", 1009 | " xL = addBiasTerms(X)\n", 1010 | " else:\n", 1011 | " xL = addBiasTerms(outputs[-l-2])\n", 1012 | " \n", 1013 | " # Calculate deltaW for this layer\n", 1014 | " deltaW = -np.dot(delta.T, xL)/len(Y)\n", 1015 | " \n", 1016 | " # Calculate bpError for previous layer to be back-propagated\n", 1017 | " bpError = np.dot(delta, w)\n", 1018 | " \n", 1019 | " # Ignore bias term in bpError\n", 1020 | " bpError = bpError[:,1:]\n", 1021 | " \n", 1022 | " # Change weights of the current layer (W <- W + eta*deltaW)\n", 1023 | " w += learningRate*deltaW" 1024 | ] 1025 | }, 1026 | { 1027 | "cell_type": "markdown", 1028 | "metadata": {}, 1029 | "source": [ 1030 | "Given this back-propagation code, it is better to launch another function that calls it iteratively until we reach the desired accuracy." 1031 | ] 1032 | }, 1033 | { 1034 | "cell_type": "markdown", 1035 | "metadata": {}, 1036 | "source": [ 1037 | "We shall look at training schemes and experiments in the next tutorial." 
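Before moving on, a quick sketch of how the new `learningRate` argument changes a run. It reuses the AND-gate `X`, `Y` and the `oldWeights` backup from earlier in this notebook; the $\eta$ values and the iteration count are arbitrary choices for illustration:

```python
# Same starting weights, same number of backProp calls, different learning rates
# (the eta values and the 200 iterations are arbitrary illustrative choices)
for eta in [0.1, 1.0, 3.0]:
    w = [np.array(wl) for wl in oldWeights]   # fresh copy of the backed-up weights
    for i in range(200):
        backProp(w, X, Y, eta)
    print("eta = "+str(eta)+", cost = "+str(nnCost(w, X, Y)))
```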
1038 | ] 1039 | } 1040 | ], 1041 | "metadata": { 1042 | "kernelspec": { 1043 | "display_name": "Python 3", 1044 | "language": "python", 1045 | "name": "python3" 1046 | }, 1047 | "language_info": { 1048 | "codemirror_mode": { 1049 | "name": "ipython", 1050 | "version": 3 1051 | }, 1052 | "file_extension": ".py", 1053 | "mimetype": "text/x-python", 1054 | "name": "python", 1055 | "nbconvert_exporter": "python", 1056 | "pygments_lexer": "ipython3", 1057 | "version": "3.5.2" 1058 | } 1059 | }, 1060 | "nbformat": 4, 1061 | "nbformat_minor": 2 1062 | } 1063 | -------------------------------------------------------------------------------- /5_Neural_Network_Tutorial_Training_And_Testing.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "code", 5 | "execution_count": 2, 6 | "metadata": { 7 | "collapsed": true 8 | }, 9 | "outputs": [], 10 | "source": [ 11 | "# Pre-requisites\n", 12 | "import numpy as np\n", 13 | "import time\n", 14 | "\n", 15 | "# For plots\n", 16 | "%matplotlib inline\n", 17 | "import matplotlib.pyplot as plt\n", 18 | "\n", 19 | "# To clear print buffer\n", 20 | "from IPython.display import clear_output" 21 | ] 22 | }, 23 | { 24 | "cell_type": "markdown", 25 | "metadata": {}, 26 | "source": [ 27 | "# Importing functions from the previous tutorials:" 28 | ] 29 | }, 30 | { 31 | "cell_type": "code", 32 | "execution_count": 3, 33 | "metadata": {}, 34 | "outputs": [ 35 | { 36 | "name": "stdout", 37 | "output_type": "stream", 38 | "text": [ 39 | "weights:\n", 40 | "1\n", 41 | "(2, 3)\n", 42 | "[[-0.33589735 -0.396816 0.45849862]\n", 43 | " [-0.64374374 -2.41279823 0.78403628]]\n", 44 | "2\n", 45 | "(1, 3)\n", 46 | "[[ 1.54182154 -0.12516091 -0.28203429]]\n" 47 | ] 48 | } 49 | ], 50 | "source": [ 51 | "# Initializing weight matrices from layer sizes\n", 52 | "def initializeWeights(layers):\n", 53 | " weights = [np.random.randn(o, i+1) for i, o in zip(layers[:-1], layers[1:])]\n", 54 | " return weights\n", 55 | "\n", 56 | "# Add a bias term to every data point in the input\n", 57 | "def addBiasTerms(X):\n", 58 | " # Make the input an np.array()\n", 59 | " X = np.array(X)\n", 60 | " \n", 61 | " # Forcing 1D vectors to be 2D matrices of 1xlength dimensions\n", 62 | " if X.ndim==1:\n", 63 | " X = np.reshape(X, (1, len(X)))\n", 64 | " \n", 65 | " # Inserting bias terms\n", 66 | " X = np.insert(X, 0, 1, axis=1)\n", 67 | " \n", 68 | " return X\n", 69 | "\n", 70 | "# Sigmoid function\n", 71 | "def sigmoid(a):\n", 72 | " return 1/(1 + np.exp(-a))\n", 73 | "\n", 74 | "# Forward Propagation of outputs\n", 75 | "def forwardProp(X, weights):\n", 76 | " # Initializing an empty list of outputs\n", 77 | " outputs = []\n", 78 | " \n", 79 | " # Assigning a name to reuse as inputs\n", 80 | " inputs = X\n", 81 | " \n", 82 | " # For each layer\n", 83 | " for w in weights:\n", 84 | " # Add bias term to input\n", 85 | " inputs = addBiasTerms(inputs)\n", 86 | " \n", 87 | " # Y = Sigmoid ( X .* W^T )\n", 88 | " outputs.append(sigmoid(np.dot(inputs, w.T)))\n", 89 | " \n", 90 | " # Input of next layer is output of this layer\n", 91 | " inputs = outputs[-1]\n", 92 | " \n", 93 | " return outputs\n", 94 | "\n", 95 | "# Compute COST (J) of Neural Network\n", 96 | "def nnCost(weights, X, Y):\n", 97 | " # Calculate yPred\n", 98 | " yPred = forwardProp(X, weights)[-1]\n", 99 | " \n", 100 | " # Compute J\n", 101 | " J = 0.5*np.sum((yPred-Y)**2)/len(Y)\n", 102 | " \n", 103 | " return J\n", 104 | "\n", 105 | "# IMPLEMENTING BACK-PROPAGATION 
WITH LEARNING RATE\n", 106 | "# Added eta, the learning rate, as an input\n", 107 | "def backProp(weights, X, Y, learningRate):\n", 108 | " # Forward propagate to find outputs\n", 109 | " outputs = forwardProp(X, weights)\n", 110 | " \n", 111 | " # For the last layer, bpError = error = yPred - Y\n", 112 | " bpError = outputs[-1] - Y\n", 113 | " \n", 114 | " # Back-propagating from the last layer to the first\n", 115 | " for l, w in enumerate(reversed(weights)):\n", 116 | " \n", 117 | " # Find yPred for this layer\n", 118 | " yPred = outputs[-l-1]\n", 119 | " \n", 120 | " # Calculate delta for this layer using bpError from next layer\n", 121 | " delta = np.multiply(np.multiply(bpError, yPred), 1-yPred)\n", 122 | " \n", 123 | " # Find input to the layer, by adding bias to the output of the previous layer\n", 124 | " # Take care, l goes from 0 to 1, while the weights are in reverse order\n", 125 | " if l==len(weights)-1: # If 1st layer has been reached\n", 126 | " xL = addBiasTerms(X)\n", 127 | " else:\n", 128 | " xL = addBiasTerms(outputs[-l-2])\n", 129 | " \n", 130 | " # Calculate deltaW for this layer\n", 131 | " deltaW = -np.dot(delta.T, xL)/len(Y)\n", 132 | " \n", 133 | " # Calculate bpError for previous layer to be back-propagated\n", 134 | " bpError = np.dot(delta, w)\n", 135 | " \n", 136 | " # Ignore bias term in bpError\n", 137 | " bpError = bpError[:,1:]\n", 138 | " \n", 139 | " # Change weights of the current layer (W <- W + eta*deltaW)\n", 140 | " w += learningRate*deltaW\n", 141 | "\n", 142 | "# Evaluate the accuracy of weights for input X and desired outptut Y\n", 143 | "def evaluate(weights, X, Y):\n", 144 | " yPreds = forwardProp(X, weights)[-1]\n", 145 | " # Check if maximum probability is from that neuron corresponding to desired class,\n", 146 | " # AND check if that maximum probability is greater than 0.5\n", 147 | " yes = sum( int( ( np.argmax(yPreds[i]) == np.argmax(Y[i]) ) and \n", 148 | " ( (yPreds[i][np.argmax(yPreds[i])]>0.5) == (Y[i][np.argmax(Y[i])]>0.5) ) )\n", 149 | " for i in range(len(Y)) )\n", 150 | " print(str(yes)+\" out of \"+str(len(Y))+\" : \"+str(float(yes/len(Y))))\n", 151 | "\n", 152 | "# Initialize network\n", 153 | "layers = [2, 2, 1]\n", 154 | "weights = initializeWeights(layers)\n", 155 | "\n", 156 | "print(\"weights:\")\n", 157 | "for i in range(len(weights)):\n", 158 | " print(i+1); print(weights[i].shape); print(weights[i])\n", 159 | "\n", 160 | "# Declare input and desired output for AND gate\n", 161 | "X = np.array([[0,0], [0,1], [1,0], [1,1]])\n", 162 | "Y = np.array([[0], [0], [0], [1]])" 163 | ] 164 | }, 165 | { 166 | "cell_type": "markdown", 167 | "metadata": {}, 168 | "source": [ 169 | "# Batch Gradient Descent\n", 170 | "\n", 171 | "Batch Gradient Descent is how we have tried to train our network so far - give it ALL the data points, compute ${\\Delta}W$s by summing up quantities across ALL the data points, change all the weights once, Repeat." 172 | ] 173 | }, 174 | { 175 | "cell_type": "markdown", 176 | "metadata": {}, 177 | "source": [ 178 | "Suppose we want to train our 3-neuron network to implement Logical XOR.\n", 179 | "\n", 180 | "Inputs are: $X=\\left[\\begin{array}{c}(0,0)\\\\(0,1)\\\\(1,0)\\\\(1,1)\\end{array}\\right]$, and the desired output is $Y=\\left[\\begin{array}{c}0\\\\1\\\\1\\\\0\\end{array}\\right]$." 181 | ] 182 | }, 183 | { 184 | "cell_type": "markdown", 185 | "metadata": {}, 186 | "source": [ 187 | "We know that in order to train the network, we need to call backProp repeatedly. Let us use a function to do that." 
188 | ] 189 | }, 190 | { 191 | "cell_type": "code", 192 | "execution_count": 4, 193 | "metadata": { 194 | "collapsed": true 195 | }, 196 | "outputs": [], 197 | "source": [ 198 | "# TRAINING FUNCTION, USING GD\n", 199 | "def train(weights, X, Y, nIterations, learningRate=1):\n", 200 | " for i in range(nIterations):\n", 201 | " # Run backprop\n", 202 | " backProp(weights, X, Y, learningRate)\n", 203 | " \n", 204 | " # Clears screen output\n", 205 | " if (i+1)%(nIterations/10)==0:\n", 206 | " clear_output()\n", 207 | " print(\"Iteration \"+str(i+1)+\" of \"+str(nIterations))\n", 208 | " # Prints Cost and Accuracy\n", 209 | " print(\"Cost: \"+str(nnCost(weights, X, Y)))\n", 210 | " print(\"Accuracy:\")\n", 211 | " evaluate(weights, X, Y)" 212 | ] 213 | }, 214 | { 215 | "cell_type": "code", 216 | "execution_count": 5, 217 | "metadata": {}, 218 | "outputs": [ 219 | { 220 | "name": "stdout", 221 | "output_type": "stream", 222 | "text": [ 223 | "weights:\n", 224 | "1\n", 225 | "(2, 3)\n", 226 | "[[ 0.04837515 0.26989845 -0.24049688]\n", 227 | " [ 0.40457749 -1.12764482 1.62391936]]\n", 228 | "2\n", 229 | "(1, 3)\n", 230 | "[[-0.21690785 -0.77508326 0.61363791]]\n" 231 | ] 232 | } 233 | ], 234 | "source": [ 235 | "# Initialize network\n", 236 | "layers = [2, 2, 1]\n", 237 | "weights = initializeWeights(layers)\n", 238 | "\n", 239 | "print(\"weights:\")\n", 240 | "for i in range(len(weights)):\n", 241 | " print(i+1); print(weights[i].shape); print(weights[i])\n", 242 | "\n", 243 | "# Take backup of weights to be used later for comparison\n", 244 | "initialWeights = [np.array(w) for w in weights]" 245 | ] 246 | }, 247 | { 248 | "cell_type": "code", 249 | "execution_count": 6, 250 | "metadata": { 251 | "collapsed": true 252 | }, 253 | "outputs": [], 254 | "source": [ 255 | "# Declare input and desired output for XOR gate\n", 256 | "X = np.array([[0,0], [0,1], [1,0], [1,1]])\n", 257 | "Y = np.array([[0], [1], [1], [0]])" 258 | ] 259 | }, 260 | { 261 | "cell_type": "code", 262 | "execution_count": 7, 263 | "metadata": {}, 264 | "outputs": [ 265 | { 266 | "name": "stdout", 267 | "output_type": "stream", 268 | "text": [ 269 | "Cost: 0.12907524705\n", 270 | "Accuracy: \n", 271 | "2 out of 4 : 0.5\n", 272 | "[[ 0.43886508]\n", 273 | " [ 0.49374299]\n", 274 | " [ 0.38577198]\n", 275 | " [ 0.4543426 ]]\n" 276 | ] 277 | } 278 | ], 279 | "source": [ 280 | "# Check current accuracy and cost\n", 281 | "print(\"Cost: \"+str(nnCost(weights, X, Y)))\n", 282 | "print(\"Accuracy: \")\n", 283 | "evaluate(weights, X, Y)\n", 284 | "print(forwardProp(X, weights)[-1])" 285 | ] 286 | }, 287 | { 288 | "cell_type": "markdown", 289 | "metadata": {}, 290 | "source": [ 291 | "Say we want to train our model 600 times." 
292 | ] 293 | }, 294 | { 295 | "cell_type": "code", 296 | "execution_count": 8, 297 | "metadata": {}, 298 | "outputs": [ 299 | { 300 | "name": "stdout", 301 | "output_type": "stream", 302 | "text": [ 303 | "Iteration 400 of 400\n", 304 | "Cost: 0.124997811474\n", 305 | "Accuracy:\n", 306 | "3 out of 4 : 0.75\n", 307 | "[[ 0.49895486]\n", 308 | " [ 0.50338071]\n", 309 | " [ 0.49407386]\n", 310 | " [ 0.4984321 ]]\n" 311 | ] 312 | } 313 | ], 314 | "source": [ 315 | "nIterations = 400\n", 316 | "train(weights, X, Y, nIterations)\n", 317 | "print(forwardProp(X, weights)[-1])" 318 | ] 319 | }, 320 | { 321 | "cell_type": "code", 322 | "execution_count": 9, 323 | "metadata": { 324 | "collapsed": true 325 | }, 326 | "outputs": [], 327 | "source": [ 328 | "# In case we want to revert the weight back\n", 329 | "weights = [np.array(w) for w in initialWeights]" 330 | ] 331 | }, 332 | { 333 | "cell_type": "markdown", 334 | "metadata": {}, 335 | "source": [ 336 | "It took our function a long time to train.\n", 337 | "\n", 338 | "What if we speed up using adaptive learning rate?" 339 | ] 340 | }, 341 | { 342 | "cell_type": "code", 343 | "execution_count": 10, 344 | "metadata": { 345 | "collapsed": true 346 | }, 347 | "outputs": [], 348 | "source": [ 349 | "# TRAINING FUNCTION, USING GD\n", 350 | "# Default learning rate = 1.0\n", 351 | "def trainUsingGD(weights, X, Y, nIterations, learningRate=1.0):\n", 352 | " # Setting initial cost to infinity\n", 353 | " prevCost = np.inf\n", 354 | " \n", 355 | " # For nIterations number of iterations:\n", 356 | " for i in range(nIterations):\n", 357 | " # Run backprop\n", 358 | " backProp(weights, X, Y, learningRate)\n", 359 | " \n", 360 | " #clear_output()\n", 361 | " print(\"Iteration \"+str(i+1)+\" of \"+str(nIterations))\n", 362 | " cost = nnCost(weights, X, Y)\n", 363 | " print(\"Cost: \"+str(cost))\n", 364 | " \n", 365 | " # ADAPT LEARNING RATE\n", 366 | " # If cost increases\n", 367 | " if (cost > prevCost):\n", 368 | " # Halve the learning rate\n", 369 | " learningRate /= 2.0\n", 370 | " # If cost decreases\n", 371 | " else:\n", 372 | " # Increase learning rate by 5%\n", 373 | " learningRate *= 1.05\n", 374 | " \n", 375 | " prevCost = cost" 376 | ] 377 | }, 378 | { 379 | "cell_type": "code", 380 | "execution_count": 11, 381 | "metadata": { 382 | "collapsed": true 383 | }, 384 | "outputs": [], 385 | "source": [ 386 | "# Revert weights back to initial values\n", 387 | "weights = [np.array(w) for w in initialWeights]" 388 | ] 389 | }, 390 | { 391 | "cell_type": "code", 392 | "execution_count": 12, 393 | "metadata": {}, 394 | "outputs": [ 395 | { 396 | "name": "stdout", 397 | "output_type": "stream", 398 | "text": [ 399 | "Iteration 1 of 100\n", 400 | "Cost: 0.128848112614\n", 401 | "Iteration 2 of 100\n", 402 | "Cost: 0.128650869728\n", 403 | "Iteration 3 of 100\n", 404 | "Cost: 0.12848026395\n", 405 | "Iteration 4 of 100\n", 406 | "Cost: 0.128332996448\n", 407 | "Iteration 5 of 100\n", 408 | "Cost: 0.128205816336\n", 409 | "Iteration 6 of 100\n", 410 | "Cost: 0.128095601033\n", 411 | "Iteration 7 of 100\n", 412 | "Cost: 0.127999422128\n", 413 | "Iteration 8 of 100\n", 414 | "Cost: 0.12791459536\n", 415 | "Iteration 9 of 100\n", 416 | "Cost: 0.127838714376\n", 417 | "Iteration 10 of 100\n", 418 | "Cost: 0.12776966891\n", 419 | "Iteration 11 of 100\n", 420 | "Cost: 0.127705648793\n", 421 | "Iteration 12 of 100\n", 422 | "Cost: 0.127645135859\n", 423 | "Iteration 13 of 100\n", 424 | "Cost: 0.127586886153\n", 425 | "Iteration 14 of 100\n", 426 | "Cost: 
0.127529905081\n", 427 | "Iteration 15 of 100\n", 428 | "Cost: 0.127473418103\n", 429 | "Iteration 16 of 100\n", 430 | "Cost: 0.127416839401\n", 431 | "Iteration 17 of 100\n", 432 | "Cost: 0.127359740627\n", 433 | "Iteration 18 of 100\n", 434 | "Cost: 0.127301821408\n", 435 | "Iteration 19 of 100\n", 436 | "Cost: 0.127242882839\n", 437 | "Iteration 20 of 100\n", 438 | "Cost: 0.127182804686\n", 439 | "Iteration 21 of 100\n", 440 | "Cost: 0.127121526616\n", 441 | "Iteration 22 of 100\n", 442 | "Cost: 0.127059033379\n", 443 | "Iteration 23 of 100\n", 444 | "Cost: 0.126995343612\n", 445 | "Iteration 24 of 100\n", 446 | "Cost: 0.126930501761\n", 447 | "Iteration 25 of 100\n", 448 | "Cost: 0.126864572508\n", 449 | "Iteration 26 of 100\n", 450 | "Cost: 0.126797637148\n", 451 | "Iteration 27 of 100\n", 452 | "Cost: 0.126729791334\n", 453 | "Iteration 28 of 100\n", 454 | "Cost: 0.126661143778\n", 455 | "Iteration 29 of 100\n", 456 | "Cost: 0.126591815524\n", 457 | "Iteration 30 of 100\n", 458 | "Cost: 0.126521939537\n", 459 | "Iteration 31 of 100\n", 460 | "Cost: 0.126451660431\n", 461 | "Iteration 32 of 100\n", 462 | "Cost: 0.126381134199\n", 463 | "Iteration 33 of 100\n", 464 | "Cost: 0.126310527861\n", 465 | "Iteration 34 of 100\n", 466 | "Cost: 0.126240018977\n", 467 | "Iteration 35 of 100\n", 468 | "Cost: 0.126169794983\n", 469 | "Iteration 36 of 100\n", 470 | "Cost: 0.126100052304\n", 471 | "Iteration 37 of 100\n", 472 | "Cost: 0.126030995231\n", 473 | "Iteration 38 of 100\n", 474 | "Cost: 0.125962834522\n", 475 | "Iteration 39 of 100\n", 476 | "Cost: 0.125895785707\n", 477 | "Iteration 40 of 100\n", 478 | "Cost: 0.12583006709\n", 479 | "Iteration 41 of 100\n", 480 | "Cost: 0.125765897414\n", 481 | "Iteration 42 of 100\n", 482 | "Cost: 0.125703493213\n", 483 | "Iteration 43 of 100\n", 484 | "Cost: 0.125643065835\n", 485 | "Iteration 44 of 100\n", 486 | "Cost: 0.12558481818\n", 487 | "Iteration 45 of 100\n", 488 | "Cost: 0.125528941182\n", 489 | "Iteration 46 of 100\n", 490 | "Cost: 0.125475610101\n", 491 | "Iteration 47 of 100\n", 492 | "Cost: 0.125424980703\n", 493 | "Iteration 48 of 100\n", 494 | "Cost: 0.125377185428\n", 495 | "Iteration 49 of 100\n", 496 | "Cost: 0.12533232967\n", 497 | "Iteration 50 of 100\n", 498 | "Cost: 0.125290488278\n", 499 | "Iteration 51 of 100\n", 500 | "Cost: 0.125251702426\n", 501 | "Iteration 52 of 100\n", 502 | "Cost: 0.125215976974\n", 503 | "Iteration 53 of 100\n", 504 | "Cost: 0.125183278403\n", 505 | "Iteration 54 of 100\n", 506 | "Cost: 0.125153533406\n", 507 | "Iteration 55 of 100\n", 508 | "Cost: 0.125126628143\n", 509 | "Iteration 56 of 100\n", 510 | "Cost: 0.125102408083\n", 511 | "Iteration 57 of 100\n", 512 | "Cost: 0.125080678309\n", 513 | "Iteration 58 of 100\n", 514 | "Cost: 0.125061203999\n", 515 | "Iteration 59 of 100\n", 516 | "Cost: 0.125043710737\n", 517 | "Iteration 60 of 100\n", 518 | "Cost: 0.125027884097\n", 519 | "Iteration 61 of 100\n", 520 | "Cost: 0.125013367839\n", 521 | "Iteration 62 of 100\n", 522 | "Cost: 0.124999759817\n", 523 | "Iteration 63 of 100\n", 524 | "Cost: 0.124986604465\n", 525 | "Iteration 64 of 100\n", 526 | "Cost: 0.12497338044\n", 527 | "Iteration 65 of 100\n", 528 | "Cost: 0.124959481574\n", 529 | "Iteration 66 of 100\n", 530 | "Cost: 0.124944188837\n", 531 | "Iteration 67 of 100\n", 532 | "Cost: 0.12492663033\n", 533 | "Iteration 68 of 100\n", 534 | "Cost: 0.124905725647\n", 535 | "Iteration 69 of 100\n", 536 | "Cost: 0.124880110131\n", 537 | "Iteration 70 of 100\n", 538 | "Cost: 0.124848033934\n", 539 | 
"Iteration 71 of 100\n", 540 | "Cost: 0.124807230659\n", 541 | "Iteration 72 of 100\n", 542 | "Cost: 0.124754751262\n", 543 | "Iteration 73 of 100\n", 544 | "Cost: 0.124686761318\n", 545 | "Iteration 74 of 100\n", 546 | "Cost: 0.124598303179\n", 547 | "Iteration 75 of 100\n", 548 | "Cost: 0.124483025338\n", 549 | "Iteration 76 of 100\n", 550 | "Cost: 0.12433286859\n", 551 | "Iteration 77 of 100\n", 552 | "Cost: 0.124137651121\n", 553 | "Iteration 78 of 100\n", 554 | "Cost: 0.123884387194\n", 555 | "Iteration 79 of 100\n", 556 | "Cost: 0.123556010954\n", 557 | "Iteration 80 of 100\n", 558 | "Cost: 0.123129051477\n", 559 | "Iteration 81 of 100\n", 560 | "Cost: 0.122569925578\n", 561 | "Iteration 82 of 100\n", 562 | "Cost: 0.121830095196\n", 563 | "Iteration 83 of 100\n", 564 | "Cost: 0.120841446193\n", 565 | "Iteration 84 of 100\n", 566 | "Cost: 0.119514949723\n", 567 | "Iteration 85 of 100\n", 568 | "Cost: 0.117748246786\n", 569 | "Iteration 86 of 100\n", 570 | "Cost: 0.115450497266\n", 571 | "Iteration 87 of 100\n", 572 | "Cost: 0.112589941634\n", 573 | "Iteration 88 of 100\n", 574 | "Cost: 0.109249438151\n", 575 | "Iteration 89 of 100\n", 576 | "Cost: 0.105640175411\n", 577 | "Iteration 90 of 100\n", 578 | "Cost: 0.102027696196\n", 579 | "Iteration 91 of 100\n", 580 | "Cost: 0.0990064970213\n", 581 | "Iteration 92 of 100\n", 582 | "Cost: 0.123641875887\n", 583 | "Iteration 93 of 100\n", 584 | "Cost: 0.206124967964\n", 585 | "Iteration 94 of 100\n", 586 | "Cost: 0.128853866919\n", 587 | "Iteration 95 of 100\n", 588 | "Cost: 0.100914621849\n", 589 | "Iteration 96 of 100\n", 590 | "Cost: 0.0954172210932\n", 591 | "Iteration 97 of 100\n", 592 | "Cost: 0.0925797728969\n", 593 | "Iteration 98 of 100\n", 594 | "Cost: 0.0909633907318\n", 595 | "Iteration 99 of 100\n", 596 | "Cost: 0.0897659003613\n", 597 | "Iteration 100 of 100\n", 598 | "Cost: 0.0886726139317\n" 599 | ] 600 | } 601 | ], 602 | "source": [ 603 | "# Train for nIterations\n", 604 | "# Don't expect same results for running with 20 iterations\n", 605 | "# as with running twice with 10 iterations - learning rates are different!\n", 606 | "nIterations = 100\n", 607 | "trainUsingGD(weights, X, Y, nIterations)" 608 | ] 609 | }, 610 | { 611 | "cell_type": "markdown", 612 | "metadata": {}, 613 | "source": [ 614 | "We see that with adaptive learning rate, we reach the desired output much faster!" 615 | ] 616 | }, 617 | { 618 | "cell_type": "markdown", 619 | "metadata": {}, 620 | "source": [ 621 | "# MNIST Dataset\n", 622 | "\n", 623 | "MNIST is a dataset of 60000 images of hand-written numbers." 
624 | ] 625 | }, 626 | { 627 | "cell_type": "code", 628 | "execution_count": 13, 629 | "metadata": { 630 | "collapsed": true 631 | }, 632 | "outputs": [], 633 | "source": [ 634 | "# Load MNIST DATA\n", 635 | "# Use numpy.load() to load the .npz file\n", 636 | "f = np.load('mnist.npz')\n", 637 | "# Saving the files\n", 638 | "x_train = f['x_train']\n", 639 | "y_train = f['y_train']\n", 640 | "x_test = f['x_test']\n", 641 | "y_test = f['y_test']\n", 642 | "f.close()" 643 | ] 644 | }, 645 | { 646 | "cell_type": "code", 647 | "execution_count": 14, 648 | "metadata": {}, 649 | "outputs": [ 650 | { 651 | "name": "stdout", 652 | "output_type": "stream", 653 | "text": [ 654 | "x_train.shape = (60000, 28, 28)\n", 655 | "y_train.shape = (60000,)\n" 656 | ] 657 | }, 658 | { 659 | "data": { 660 | "image/png": "iVBORw0KGgoAAAANSUhEUgAAAlMAAACNCAYAAACT6v+eAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAIABJREFUeJzt3WeAFMX29/HvkhQDIKBEI4JgJoMiiChiAgVFURQxoKhg\nRBC9JpJIUBAEAdM1Z1BQMSCgcPUa0D+KRHPEAIogK2GfF/2c6tnd2djTM917f5836M7sTNX2THf1\nqVOnsnJychARERGR0imX6QaIiIiIxJkGUyIiIiIBaDAlIiIiEoAGUyIiIiIBaDAlIiIiEoAGUyIi\nIiIBaDAlIiIiEoAGUyIiIiIBaDAlIiIiEkCFdL5ZVlZWrMut5+TkZBX1nLLex7LeP1Af40B9LPv9\nA/UxDtRHjyJTIiIiIgFoMCUiIiISgAZTIiIiIgFoMCUiIiISgAZTIiIiIgFoMBVTzZs358EHH+TB\nBx9k27ZtbNu2zf1/s2bNMt08EYmhCRMmkJOTQ05ODkuXLmXp0qXsvffemW6WSCjefPNN5s2bx7x5\n8wK/lgZTIiIiIgGktc5UGMqXL0/VqlXz/fyKK64AYKeddgLggAMOAODyyy9n7NixAPTq1QuAzZs3\nc8cddwBw2223hd7mIA4//HAAXn/9dapUqQJATo5XwuPcc88FoGvXrtSoUSMzDUyTTp06AfDYY48B\n0KFDB1asWJHJJqXETTfdBHifw3LlvHudo48+GoAFCxZkqllSiF133ZVddtkFgJNOOgmA3XffHYDx\n48eTnZ2dsbYV1z777ANA79692b59OwBNmjQBoHHjxnz99deZalrKNGrUCICKFSvSvn17AO69914A\n1+eCzJo1C4CzzjoLgH/++SesZqZExYoVOeKIIwAYOXIkAEceeWQmmxQpd911FwBHHHEE//73v1Py\nmrEYTO21115UqlQJwH1A2rVrB0C1atXo0aNHka/x3XffATBx4kROO+00ADZs2ADAJ598EvkLVatW\nrQB47rnnAKhataobRFk/7Ateo0YN2rRpA8BHH32U67Ew2QmqRo0avPDCC6G+V8uWLQF4//33Q32f\ndDn//PMBGDx4MJD75G7HWaLBBh52rNq2bcvBBx+c9Ll16tRh4MCB6Wpaqf3yyy8ALFy4kK5du2a4\nNalx0EEHAf5364wzzgCgXLly1K1bF/C/Z0V9x+xvMnXqVACuuuoq/vzzz5S3OVWqVq3KW2+9BcBP\nP/0EQO3atd1//6+yoMmll14KwJYtW3jzzTdT8tqa5hMREREJINKRKZvSmjdvXtKpvOKwOw+bPvnr\nr7/c1NCPP/4IwLp16yI5RWRTlM2aNePRRx8FvDvdvFatWgXAnXfeCcCTTz7JokWLAL/fo0aNCr29\nNh3VsGHDUCNT5cqVY9999wVwybFZWUVW+48068eOO+6Y4ZaUXuvWrenduzfgTbuCHx0AuO666wD4\n4YcfAC+6bJ/r9957L51NLbHGjRsDXkTinHPOAaBy5cqA99n79ttvAT9KbFNkPXv2dFNJy5cvT2ub\nS2Ljxo0AZWI6z9g578QTT0zZa5533nkA3H///e4cG3W1a9d2//6vR6ZsxqZixYoAvPPOOzz99NMp\neW1FpkREREQCiHRk6ptvvgHgt99+K1Zkyu5u169fT8eOHQE/V+iRRx4JqZXhue+++wA/Ub4gVgrB\nkmAXLFjgokSHHnpoeA3Mw+7a/vOf/4T6PnXq1OHiiy8GcJGNKN/1F+bYY48FYMCAAbl+vnz5ck4+\n+WQAfv7557S3qyTOPPNMwFtWX7NmTcCPFM6fP98lY48ZMybX72VlZbnHLLE3Kux8M3r0aMDv4667\n7prvuatWreL4448H/Dte+zzWrFnT/U2irFq1agAcdthhGW5J6rz++utA/sjU2rVruf/++wHcIo/E\nHEXLy7XoatzFPWpfkPbt23PjjTcC/jXy999/L/D5vXr1crmNa9asAfxoeSpEejBlf5hBgwa5C8uS\nJUsAL5HcfPzxxwAcd9xxgBeytumFK6+8Mm3tTZXmzZsD/sqgxC+DJcq/9NJLblWiTZvY32bdunUc\nc8wx+X43bHZiCtuMGTPcf9sUZxy1a9eOBx98ECDfzcKYMWMiO+VSoYJ32mjRogUA06dPB7xp6YUL\nFwIwbNgwwAuj77DDDgAunN65c2f3Wh988EF6Gl1CtkjloosuKvA5dkI+7rjj3DTf/vvvH37jQmAp\nBXvttVe+x1q2bOkGh1H9TCYzZcoUAGbOnJnr51u2bCl0ustWSX/66acALlk98bWi+rlNxpLr45xC\nkMy0adNo2LAhAAceeCDgnW8KMnToULfK3W7GP/nkk5S1R9N8IiIiIgFEOjJlZs6c6SqUWoKnhaMv\nvPBCF6GxJEqAzz77DIB+/fqls6mBJNaQAnLVkXrllVcAP5zZoUMHl1xukRpb3vzJJ5+4sLVFt5o1\na+bKJKSaTSXWqlUrlNfPKzGKY3+rOOrTp0+uu17wpsWAlNU+CYMlmSdGCME7FjYdlrhs3H6WGJEC\nr1zJww8/HGZTS82W0ef11VdfuXIcVhrBolLgJ57HjUW3H3roIW699dZcj916662sX78egEmTJqW7\naaW2detWIPfxKQ6bst1tt93yPWYlduJQOyyvFi1a8O6772a6GSmzadOmYkXd7Lq69957u+tiGFE6\nRaZEREREAohFZAr
IVyDtjz/+cP9t859PPfUUUHQ12yhq1KgRgwYNAvzIy6+//gp4JRzsDv6vv/4C\nYM6cOcyZM6fI17Xl29dee61b0p1qluBp7xUWi3xZWQSA77//PtT3DIMlJF9wwQXus2p3/sOHD89Y\nu4pj2LBhDB06FPBzMWzp/0033ZS0kKElieY1cOBAF02NGjunWGT7tddeA2D16tWsXbu2wN9LV3Q2\nLMOGDcsXmfpfYYsg7NgnO5/dfPPNaW1TaW3dutVdI+160qBBg0w2KWUsH/OQQw7h888/B5LnPu28\n886AH0HeaaedXGTu2WefTXm7FJkSERERCSA2kam87O6pefPmbgmrLTO3u8g4sJVOY8eOdREeywuz\nUgMffPBB4KhPslU6qWL7HhrLV0s1y42rVasWK1euBPy/VRzYNiS2JVCie+65B8BtARE1dkc+dOhQ\nV25k7ty5gH/n9/fff7vnW05C586d3WfPVpZa9M32O4siyyEqaZSmbdu2IbQmvZKVCyirLFo/ZMgQ\ntxLTylskshXjW7ZsSV/jAli/fj1vv/02gFsJH3d77rkn4EcOt27d6vbgTRbhHj9+PODnP/7www+h\n7k8Y28GUJZtffPHFLrHalmi/9dZbbunq5MmTgejub9a0aVMgdy2Ubt26AfHd2DYV++VVqVKFLl26\nAH7Cc2ICs4V6bXosDqw/ibW/bF+oCRMmZKRNRbH6Q5dddhngfY9sEHXqqafme75dkGyXASvzAX5o\n3Sr1x5XttWfTCIkOOeSQXP+/ePHi0OuupVpx96uLOrt5sQ3g7WY7ke3xmqyvNmU9ZMgQXn75ZSD3\nDYOkh9WGsl01LE3innvuSXqNtNpRtiejGTFiRIit1DSfiIiISCCxjUyZNWvWuBGoFUA899xz3d2I\n3T3aUnPbjy8qLBSZlZXlRtmpiEhlMlRfvXr1pD+3chY23WN3ivXr16dSpUqAH3YvV66cuwu0yva2\nHLlChQp8+OGHIbU+HKeeeqrbsdy888479OnTB8i9oCJK7LgkVvG2yMwee+wBQN++fQHo2rWru4u0\navw5OTnurt+q1SeWMIk6K2ZpRQFvueWWfBW1y5Url+97ZtOEffv2Zdu2bWloqSQ6+OCDefHFF4HS\npzjYNNm0adNS1q5MsoKVcWCFgXv37l1gtfq2bdtyww03AP51tHr16m5az64zdu23HUXCosiUiIiI\nSACxj0yBP5dqW4uMHz+eTp06ATBy5EjAK9gF3rxpFJbTW1KgFRTLyclxd1KpkDfvwRIow2ARJHuv\nqVOnuuXziSxXyO4YrKjepk2bWLZsGQAPPPAA4CXdW4TO9qazgnmVK1eOzV58hSWdf/HFF5Hfd8+S\nzS3Bc/fdd+fLL78EkueZWETG8k3q1KnjSny89NJLobc3FSpWrOhyGe241alTB/A+69ZHy4Xq0qWL\ni2AZu7Pu3r27y4ezv6Wkh51nCttSq7AIvp2jTzjhBFc0Oc66du2a6SYUm5WpmDFjhjvP2DFavXo1\n4BUhtS2tLM+4Xr167rtq56wLLrggLW0uE4MpY3sp9ezZk1NOOQXwp/4uueQSABo2bOj28MskW51n\n0yhr1651dbJKy1YGJq5AssrxFg4NgyUn275dtlFoXrZxte1vZTVCiqrKa7V+bFPcL774ImCL08dW\nuiU7Weed9osiS/C3ZPPZs2e7aVzbm85W5T300ENuP80nn3wS8AYh9t9RZ9/FLl268Pzzz+d67Lbb\nbgO879OiRYsAfzp73rx5bnrT2Gd11KhR+T73Ua+enWyA0b59eyA+FdA//fRTt9m7LWCxhRObN29O\n+jsXXnghkH/T8biylcFxWs1nuyXYdXvLli3uHHT22WcD3t6zAOPGjXMr+W1QlZWV5QZflppgFfCP\nPvpod84Kg6b5RERERAIoU5Eps379eh555BHA3z/Mwu7t27d3dyy2D1oUZGdnlzo53iJStlffoEGD\n3JTYuHHjAL9yephGjx4dyuvalK1JNmUWNTZ9m3c/OvAjOStWrEhrm4KwRQAWcSmIRTDsjnH79u2R\njyRaXSGLPtlOBICb3rE6YOvXr3d/A1suf8ghh7gpPCv7YJGqbt26uTIRb7zxBuB9T+zu2oQ5DV9S\nyUojdO/eHfAT8W1aPsosUl7cJfEW0S8rkSmLiJqKFSu6dBf720SNzSBZ24cPH+6iVHkNGDDAJZUn\nq+9m07sWoQszKgWKTImIiIgEUqYiU5bgfPrpp9OyZUvAj0iZZcuWsXDhwrS3rSilST636IfdSdt8\n86xZs+jRo0fqGhcxtuAgyqwKf+LO85YblreYXFliuYCJ0Y0o50yVL1/eFYC1Yn8bN25kyJAhgJ/7\nZXkbLVq0cHlDlqS+atUq+vfvD/h3wVWqVAG8/EEr92EJwK+//rp7f8vnSNxvMtOmTp0K+FGCRJa/\neNVVV6W1Telw/PHHZ7oJKWULfExWVpabxYgqi9pbzqJ9P5KpWbNmvlzFXr16udxpY7M0YVNkSkRE\nRCSA2EemDjjgALc/j83r165dO9/zrHDejz/+GIk9p/Iu2z311FO58sori/37V199Nf/6178Af1dw\ny82wPf0kc6xAXuJn7d577wXSk7+WKbZiKi769evnIlKbNm0CvIiMRRbbtGkD+IVJTzjhBBd9u/32\n2wFv5VHeO2grDfHqq6/y6quvAt5dM/irksD7HkdNXMqOJLK8N8tRnDdvXom2funbt29kt3QqLYvy\n2PFs3LixiyjaCuyoKc4xsOvdGWec4SLAlg/19NNPh9e4IsRuMGUDJTsxXXHFFa6WTzK2R58lIaay\nllMQltxp/9auXZuJEycCfq2l3377DfBO6FbR3aqI169f3yXp2QXMLtZllQ08GzVqVGQ5hUyxZElb\nXp5o8eLF6W5O2sVtqsQ2cAZvyg+8aXNLRra9BhPZY6NGjQIodoXzJ554Ite/UWXJ9paI3aBBA/eY\n3fDZc8JO6i2Odu3aceONNwK4sjf77rtvoVNEVtbCqtmPHz8+X60wG4wVVEohLuzGoF69elxzzTUZ\nbk1wNhDs378/a9euBeCYY47JZJMATfOJiIiIBBKLyFStWrXcklxL/mzcuHGBz3/vvfcYM2YM4Ic6\nozC1V5jy5cu7Ebclj9tUQcOGDfM9f/HixS7ZNfHuuiyzKF6yqE8UHH744W6/Qfu82ZL5yZMnR77a\neSrst99+mW5Cifz000+u1IEl51r0F/zyB7ZoZebMmXz11VdA8SNScfXZZ58BuY9pFM+jkyZNypeI\nfP3117Nhw4YCf8ciWM2aNQNyl4GwkjlTpkwB/EUFcZeTkxPrKvxW1uGiiy4CvP7YvonpSjIvTDSv\nSiIiIiIxEcnIlM1nW0Guww8/vNA7XstFsQKVc+fOLVHyYSbYvl7vv/8+gCvlAH5eWK1atdzPLH/K\nlmqXJFm9rGnbti0PPfRQppuRT7Vq1fItfrB9IC3Juax7++23gcL3PIuS9u3b
u61yLEqxdu1al7do\nxTXjfEdfWnbXb1tzxYmVqiiutWvXur0j7dwa91ypvKpUqeL2sItDeZm8rKSIRageffRRbrnllkw2\nKZfIDKZat24NeMmfrVq1AryEuYLYypuJEye6zYw3btwYcitTx8KStgLxkksucRXM85owYYILOdsm\nj/+LCtuwVKLBarzYpuP77befS2C2jUejZMOGDW63BPtXPFbl/PPPP6dJkyYZbk3Bzj//fJcs36dP\nnyKfv2bNGnf9sMH/tGnT8tUnKit69uwJeLts2H6ocWSLe6wunKXwRIWm+UREREQCyEpMvAv9zbKy\nCnyzO+64A8i9L5ZZtmwZs2fPBvyqrjalZ5WJ0yEnJ6fI0EhhfYyDovqYif5ZxXCbepk+fXrS6szF\nEeYxrF27Nk899RTgLdcG+PLLL4HkS+zDEoXPqR2zGTNmsGDBAsBfap+Kfd2i0MewRfG7mEqpPIa2\neMA+d8OHD3e7D8ycORPwp4lmzZrFTz/9VPIGl0IUPqeWGtKkSRNXhT+Ve/NFoY9hK04fFZkSERER\nCSAykak40Ai87PcP1MdUsMrETz/9tCsXYfttWTXxIDmOUehj2PRdVB/jQH30KDIlIiIiEoAiUyWg\nEXjZ7x+oj6lUpUoVt5WTLVc/9NBDgWC5U1HqY1j0XVQf40B99GgwVQL60JT9/oH6GAfqY9nvH6iP\ncaA+ejTNJyIiIhJAWiNTIiIiImWNIlMiIiIiAWgwJSIiIhKABlMiIiIiAWgwJSIiIhKABlMiIiIi\nAWgwJSIiIhKABlMiIiIiAWgwJSIiIhKABlMiIiIiAWgwJSIiIhKABlMiIiIiAWgwJSIiIhJAhXS+\nWVZWVqx3Vc7Jyckq6jllvY9lvX+gPsaB+lj2+wfqYxyojx5FpkREREQCSGtkSkRya9SoEQCvvvoq\nAOXLlwdg7733zlibRESkZBSZEhEREQlAkSmRDLnnnns488wzAahevToAs2fPzmSTRKQM22+//QAY\nNWoUAKeddhoAhx56KMuXL89Yu8oCRaZEREREAohtZOrAAw8E4OSTT6Zfv34AvP/++wAsWbLEPe/u\nu+8G4J9//klzC0Vyq1WrFgDPP/88AG3atCEnx1vk8umnnwJw4YUXZqZxIlKmHXHEES4385dffgFg\n8uTJAPz8888Za1dZociUiIiISABZdmecljdLQa2JSy65BICxY8cCsMsuuxT6/GOOOQaAt956K+hb\nq54Gyftnx8DyfzZv3kzz5s0B2HXXXQE455xzmD9/PgDff/99ga//008/ATBr1iw++OCDkja/SJk6\nho0aNXKf2RNPPNHehyFDhgC4vsbxc5qV5b3dE0884fpmkePvvvsuVW+Ti76Lqe3fueeeC0Dnzp05\n/PDDATjggAPc4++++y4Ap5xyCgB//PFH4PeMyzHceeed3bmrbt26ABx55JF89dVXRf5uFPp40kkn\nAfDss88ydepUAG688UYANm3aFPj1o9DHsBWrj3EbTFmi7ueffw7AHnvsUejz169fD/gX+tdee63U\n760PTfL+3XnnnQBcd911KWvH9u3bWbZsGeBdpBP/Lc5JrCCZOoZt2rThnXfeyfs+9O7dG/D7lgrp\n7uNOO+0EwIoVK6hXrx6Am3qfMWNGqt4mF30Xg/WvZs2agH98bJC0fv16Fi9enOu5Rx99NDvvvDOA\nS1K2wXIQUTqGdevWZffdd8/1s3Xr1gHQsWNHHnzwQcD7jAO0atWKDRs2FPm6mezj/vvvD8Ann3wC\nwNtvv+1udrZv356y94nScQyLinaKiIiIhCx2Cei///47ALfccgsA48aNc3fG33zzDQB77bWXe361\natUA6NKlCxAsMhUnVvSxcuXKAPTq1Yv+/fvnes6cOXMA6Nu3b6D36t69e4GP/fbbbwD83//9X4HP\nWbFihZtSsOPVtGlTDj74YABGjBiR6zWCRKbSzYpyPv744246zHTv3p1Zs2ZlolkpZVMFq1atcpGp\nvHf5Zdm1115LpUqVAGjSpAngTWsbi+YcdNBB6W9cASwReZ999gH86PKYMWPcOdY0btyY//73v4D/\neb755psBuP3229PR3JSw88nAgQPzFcVt1KhRrusGwB133AF4UTj77lqKgh3vqNpxxx1d1HHp0qUA\n9OzZM6URqSiwmSqbeRo6dKibijU33XQT4JeDCIsiUyIiIiIBxC5nKq+PP/6Yww47DPCXl9sdSKIG\nDRoA8MUXX5T6vaI+N3zssccCXsSjV69eAFStWhWAZMd55cqVgH83/f+fV+I8Dfvb2l2rvS74UYsf\nf/yxWH2whPWlS5fmu1OcPn064C9CKI10H8Nhw4YBcMMNN/DKK68AcOmllwKFJ+IHkanPaY8ePXjm\nmWcAePTRRwE477zzUv02QOb62KFDB3d+6dChA+AVPswbdUxk0YDVq1cDxc83Citn6rjjjnORqaef\nfhrAnS8KYhEou8v/+uuvAdh3331L0wQg/cdw4MCBANx11135HsvOznafXVu0lBjhsONrn2f7fBcl\nU5/TMWPGcMUVVwDQsGFDoOwtBmnTpo07lq1atbK2FPj8Rx55pNSzMMXpY+ym+fIaPny4W5lgq1CS\niXpYtjQsjHvIIYcA0LJly3zPsSTJxx57zNXhsmTnzZs3p6Qda9asyfVvECeffDKQe6o2Ozsb8AdT\ncWBJvPaZ/Oqrr7j66quB8AZRmWZTQeBNKQAMHjy42APpqKhTp477jljFaFO1alWXjG0X2A8//JBm\nzZoV+HrlynkTAPZ7mVahQgU3sHvyySeL9TvPPvss4A+mdtxxRwCqVKnCn3/+GUIrU+fWW28FYNCg\nQe5nDz/8MODXWxo7dqz7b/vOzp07F/CS9e0x+ztE1Q477ABA79693QrEsAZRmWKLJ6ZPn+4CAXZ8\nZs6c6VInbOB7xhlnAN7gy8YBYdSd1DSfiIiISACxj0w9++yzbsm5JZdbpCbR8OHDATj99NPT17gQ\n1KhRA/CS6S644ALAT8r/8MMPAS9x0qY8//77b8BPzo+iSpUqMXHiRCD5tFDbtm0Bb0o36rp16wZA\n69atAT/s/Mwzz6QsEhhlFq2xO8CuXbty3333ZbJJxWbT5NOnT2fPPfcs8vk2Xffrr7+6u2WbGrKl\n9PXr13fPt1IfmfbWW2/RtGlToPh1hiw6bKya/9lnn+1qF0WVRQRtMc7XX3/tZjMSo6ZWSmDo0KGA\nv4hi48aNLroV9e/w9ddfD3i1/6yPZY1Fnpo0aeKu+VbyIdGqVasA/3tdv359F8mychGppMiUiIiI\nSACxj0ydc845LgE9WeK5yVswMa7+9a9/Ad4ebvfccw/gV7P966+/Mtau0ujYsSPgVV8+//zzcz22\nZcsWlzAal93Mq1WrxlFHHZX0sXXr1hWau3DllVcC5IqIpLIIarrkTQCNU66i3dUni0pZZGbw4MGu\nGrgVcAS/BIgdx8SIlJXysCrjmVa
a6Iot3Pnss88Av8yDJTdHmeU5WXmcAw880JU9uOyyywAvF278\n+PGAXzHcIv4jRoxgypQpaW1zaXXu3BmARYsW8dFHH2W4NeGw2RagRKVl/vzzT3799dcwmgQoMiUi\nIiISSOwiU40bNwbghRdeALx57goViu7Giy++GGq7wmDFSAcPHuzuaq+66irAy3uw1SZRn8fPy5ax\n2nx3+fLl8z0nJyfH5Xlt27YtfY0LYNu2bW5PQlvBZcviFy5cmO/5troPYMCAAQC5iglee+21gB/l\nKKurADPN7ubbtGmT7zH7DNr3b9GiRYW+VmJEytjdc5h3xWHbsmULAFu3bs1wS0rOci0tonjggQe6\n8gfHHXcc4JVLyFuK5bbbbgNwMwBR1q5dO8D/DCfLGwZvayDwV79ZpDFOLC8zKyvLbfljq0sbNGjg\nZjnsXGz7vfbq1SvUc2jsBlOWQGb1TYozkAL/wmUXrTiwZciDBw929WBsABK3AVQiWzafbBBlKlWq\n5Cq02ybAL730EuANpC3BPko6dOjgpvlsEGUX48QLqS29Puqoo+jatWuu19i4cSPgLWe2qvA2TXHW\nWWe5+j6SOjZotZsX8Etb2AW1sEHUbrvt5qaQ2rdvn+uxxYsX8/LLL6e0vZlgS+7tomWKsz9dptkU\nbWIJB1so8NxzzwHehdmmqO+//37AW2YfF7bHp+1Z++WXX7rHbHAxbtw4dtttN8D/m1gqweTJk9PV\n1MBsijknJ4drrrkG8L/DNoAC73wJ6StnoWk+ERERkQBiF5my6T1LFh09enS+u6Vk6tSpE2q7wnDD\nDTcA3gg81YU2M+n5558H/Chjy5Yt3dLyZFq0aJHr31tuuYW7774b8PcUW7t2bWjtLYpVbU+sBv3D\nDz8AXtVd8KpfW4V4Kx7YrVs3F7GyiOO4ceMALyF23rx57r/jwkLw6dxZIahp06YBfjHAP/74g7PP\nPhvwpwgKc+mll7pK98amT3r27Fms14g628PPoqXGKqknqlmzplsUZGVNrLp4YtJ+uhUV1bUI4tix\nYwH49ttvQ29TqliZHPvcZmdnu8Ufto/tJZdc4lJDrJSAlfBYs2ZN0mMZRbbYY9ddd3XXhMTzjpX7\nSHcpEkWmRERERAKIXWTKWJHHVatWUa1atVyPVahQgUmTJgHedgdxZdtztGjRwvXHloW+/vrrGWtX\nUJaPYkuQ99prLxcVsGKA3bt3d3dbefc9K1eunJsrtznyTp06ZWxHdEv+TNzzy7a+sT3NatWq5e54\n7a5ww4YNLhfOchdsqfnUqVNdPsqbb74JFH1nHQVxikgZy5uxf4vrlFNOAeDmm292P7MEbStkGeeo\nlOVJ1a9fnyOOOCLpc6ZOneqKBduWOtWrV3flJewzbAUx85ZASQfLzbR8xmT7KM6ZM8cdzzix/CHL\nHU5cIGAtR4t3AAAJHklEQVTHwyJOiblDTz31FOCfu2644YbYRKasz23atHELPqw/4M98pDsyFfuN\njgt4H1ex1k50tm9cp06dSn1RCnNDx9atW7NkyRLA3zeoevXqgLdBp9WXslpSrVu3DqX+Ulibq5bG\nOeecA/iLBmwVYDJDhgxxU36FCeMYDh48GPDq0Zi8CyMWLVrkqqKbTp06sWDBAsBfhZNYD82mMkta\nbypTG4/uueee+b5bHTt2dH1MpShsOm6rTBPPoVa3yKYOgwjru1i5cmX22GMPwL/g2ufPVrmBn2xu\nF69ktm3blq9+2kMPPeQWj9g0ttXaSpSuY2hTjN27dy/wOXPmzMm3GCQVwu5jp06dAP/m2qryL1++\n3KUf2HSfTY8lsucvXbq00AVBhcnkd9FqS1pF85ycHNenlStXpux9itNHTfOJiIiIBBDbab7CVKpU\nKVfoHfw6KVGpWWQJ8bNnzwa8qS4r3/Doo48CfgXeSZMmucjULrvsAvhRq7LsscceA/wQ7htvvAHk\nX34O/jRCJtg0c1ZWVr6KvFYGYZ999nHTC7aMd8GCBS4p/fHHH3evYc+xyFScWUS4LBk5ciSQv5YY\nEEoULijbk86i9aeccoqr15eMlRCwKbqtW7fmi7TOmDED8Kb5olhpu27duvTt2xeAHj16AH4E8aOP\nPnKRDHuOReriLrGOUnHKVhS2K0McWD2tZN/FdFNkSkRERCSAMhmZGj58eL6fWSG2qIzE7W7OEuQH\nDx7sIlJ52X5f4Ednoli0MiyWVGmJrskiU6mcHy+tnJycAhOwt2/f7h479NBDAa+gp+WlWJE9S5L9\n448/wm6ulEKlSpVo2rQp4N8F5+TkuO+o7VQfJVZ80qp9Z2dnu5wm+9xZRDU7O9vlN9m5cvny5S6C\nanv02QKQqO4H2qlTJ7f4w1gR5EmTJnHqqacCfmQq3cnKqZJYDbw0OnToAMSj+GoytiDLvovz5893\nOcfppsiUiIiISACRjEzVqFED8AuKPfHEE65oZWEsD6lfv375HrPlklFhpR3sbmnixInuZ8buchs2\nbOhWSVkhz8StEeKgTp06XHzxxQBuFaKVBSiKrTKxQoCJLGpl+25lgt3VDxo0iG7dugH+6ijLmbKV\nNQDnnXce4N1N2mony2cpa/vv2fL6uLOtZnr37u0iPOaJJ55w+X2ZzNkoiO09aFGo7t27u/3qkrH8\nqNGjRwNQr149VxTXtoKKakTK9p5LPJfaKj2L6teuXTtfTm2y1YZxYNHukq7Kr1ixIuAVnAW/uHCc\nNG7cmAsvvBDw9xqcMmVKxo5lJAdT9kWwuh+NGjVyFaXtYrN69WrAqzNkIWirip5YW8oqStvvR8Wo\nUaMAPzG+adOmHHvssbmeY/sozZkzxy2Pt37HRe3atQGv1oklC1q/imI1p2xKIXHZtrG9qBJLCqSb\nHcNNmza5i67t5VbYSS6xztQrr7wScisz48QTT4zFRrEFsUGw1Q07/fTT3WO2YGTSpEmRHEQZ+wyu\nX78eKDxFYMcdd3SlBKwOXHZ2ttvnLIrJ5olsoFu1alW3GMAW+dgA4uSTT3a7Ctj0mF2M48amJ3/8\n8UfA36NvypQpSZ9vfwN73Crb9+nTJ8xmppQdu7lz51KvXj3AL0+Trn34ktE0n4iIiEgAkYxM2Z2s\n7XXWtm1b5s+fD/jhWBuRH3XUUbmmUMC7E7OpJNuXKKp72llV7LLKlvdbVAr842r7dFkSIfjLuK+/\n/noXkcp7fLOyslzC5MCBA0NqefFZYnyvXr1cm226IdHDDz8MeAXyAJYsWRLJpfSl9fPPP7s96Qor\n9BgnduebGJGycg95p+WjyhZn2JTztGnTXCqFlQiwxPJBgwa5/ffee+89APr371/otGCUJC4KsIic\nRWMs6XzChAmsW7cO8Es8FBTJiTqLSFm5DpuJAb+0zH777Qd4aRJDhw4F/OuhTQFbukEcWHHmevXq\nufSfxH5niiJTIiIiIgFEejsZG22uXr2ae++9t9i/9/vvv7s7r1SKwhYWYUv1FhaWdH7ffffl
e8y2\nz0ksA2Dz4bb8PJm//vqL0047DfD3rSsuHUNPWH18//33AX/PxNmzZ8dymw4ramkFVm0J/cqVKznh\nhBOA8PdKTPV3cdiwYYC3PZEVOczrxRdfdGVkwt6rLYxjaOeZiy66yOXPWO6llR0BP0r10ksvleTl\nSyzd38XLL78cgDFjxuRb/LFhwwYXTbXyQakoI5CuPlpOsS342b59u8sRy1ssOdWK1ccoD6bMDjvs\nkG86xy62vXr1cj+zi/IxxxwTSqKkLsQl758lOI4cOdIlsZaUrdizKcPnnnvOTUGUlI6hJ6w+WqK2\nrbKZP39+0oUDQYXdR5siOfPMM3P9fMCAAWmbEorSPplhCOMYXnXVVUDuaR9LMrcdJSZPnswdd9wB\n5E4xCIPON54gfbRriKVTWG2+3r1788ILL5T2ZUtEe/OJiIiIhCySCeh5ZWdnM2bMmKSPnX322Wlu\njZSELRjo27cvL774IuCXOLDE2MRpIFs4ADBv3rxcP4tLEuz/shEjRgD+bu7FrSUWJQcddFCu8irg\nJW2D/5mUaLJFHpUqVXL7mX7wwQcA7vxz1113ZaZxUmKVK1d2U+2WAvLcc88BpC0qVVyKTImIiIgE\nEIucqajQ/HfZ7x+oj3EQZh9Hjx7t7oYtyfzEE08E/HIe6aDvovoYB2H2sX///kyaNAmAxYsXA34i\nenZ2dmleslSUMyUiIiISMkWmSkB3GWW/f6A+xkGYfezUqRNz584FoEePHkD4S6+T0XdRfYyDMPrY\nqlUrwMuPeuCBBwB/pfB3331X4jYGVWZKI0SFvhhlv3+gPsaB+lj2+wfqYxyojx5N84mIiIgEkNbI\nlIiIiEhZo8iUiIiISAAaTImIiIgEoMGUiIiISAAaTImIiIgEoMGUiIiISAAaTImIiIgEoMGUiIiI\nSAAaTImIiIgEoMGUiIiISAAaTImIiIgEoMGUiIiISAAaTImIiIgEoMGUiIiISAAaTImIiIgEoMGU\niIiISAAaTImIiIgEoMGUiIiISAAaTImIiIgEoMGUiIiISAAaTImIiIgEoMGUiIiISAAaTImIiIgE\noMGUiIiISAD/D2VmfeQeqcmwAAAAAElFTkSuQmCC\n", 661 | "text/plain": [ 662 | "" 663 | ] 664 | }, 665 | "metadata": {}, 666 | "output_type": "display_data" 667 | } 668 | ], 669 | "source": [ 670 | "# To check MNIST data\n", 671 | "print(\"x_train.shape = \"+str(x_train.shape))\n", 672 | "print(\"y_train.shape = \"+str(y_train.shape))\n", 673 | "fig = plt.figure(figsize=(10, 2))\n", 674 | "for i in range(20):\n", 675 | " ax1 = fig.add_subplot(2, 10, i+1)\n", 676 | " ax1.imshow(x_train[i], cmap='gray');\n", 677 | " ax1.axis('off')" 678 | ] 679 | }, 680 | { 681 | "cell_type": "markdown", 682 | "metadata": {}, 683 | "source": [ 684 | "(In supervised learning) Every (good) dataset consists of a training set and a test set.\n", 685 | "\n", 686 | "The training data set consists of data points and their desired outputs.\n", 687 | "\n", 688 | "In this case, the data points are grayscale images of hand-written numbers, and their desired outputs are the numbers that have been drawn.\n", 689 | "\n", 690 | "The test data set consists of data points whose outputs need to be found." 691 | ] 692 | }, 693 | { 694 | "cell_type": "markdown", 695 | "metadata": {}, 696 | "source": [ 697 | "Let us implement the following neural network to classify MNIST data:\n", 698 | "
![MNIST NN](images/digitsNN.png)" 699 | ] 700 | }, 701 | { 702 | "cell_type": "markdown", 703 | "metadata": {}, 704 | "source": [ 705 | "## Initialize network\n", 706 | "\n", 707 | "MNIST dataset has images of size 28x28. So the input layer to our network must have $28*28=784$ neurons.\n", 708 | "\n", 709 | "Since we are tring to classify whether the image is that of 0 or 1 or 2 ... or 9, we need to have 10 output neurons, each catering to the probability of one number among 0-9.\n", 710 | "\n", 711 | "Let our hidden layer (as shown in the diagram) have 15 neurons." 712 | ] 713 | }, 714 | { 715 | "cell_type": "markdown", 716 | "metadata": {}, 717 | "source": [ 718 | "Before initializing the network though, let's ensure our inputs and outputs are appropriate for the task at hand." 719 | ] 720 | }, 721 | { 722 | "cell_type": "markdown", 723 | "metadata": {}, 724 | "source": [ 725 | "## Are our inputs in the right format and shape?\n", 726 | "\n", 727 | "Remember that we give inputs as np.arrays of $n{\\times}784$ dimensions, $n$ being the number of data points we want to input to the network." 728 | ] 729 | }, 730 | { 731 | "cell_type": "markdown", 732 | "metadata": {}, 733 | "source": [ 734 | "Is ``x_train`` an np.array?" 735 | ] 736 | }, 737 | { 738 | "cell_type": "code", 739 | "execution_count": 15, 740 | "metadata": {}, 741 | "outputs": [ 742 | { 743 | "data": { 744 | "text/plain": [ 745 | "numpy.ndarray" 746 | ] 747 | }, 748 | "execution_count": 15, 749 | "metadata": {}, 750 | "output_type": "execute_result" 751 | } 752 | ], 753 | "source": [ 754 | "# Check type of x_train\n", 755 | "type(x_train)" 756 | ] 757 | }, 758 | { 759 | "cell_type": "markdown", 760 | "metadata": {}, 761 | "source": [ 762 | "Yup, ``x_train`` is an np.array" 763 | ] 764 | }, 765 | { 766 | "cell_type": "markdown", 767 | "metadata": {}, 768 | "source": [ 769 | "Is ``x_train`` in the shape required by the network?" 770 | ] 771 | }, 772 | { 773 | "cell_type": "code", 774 | "execution_count": 16, 775 | "metadata": {}, 776 | "outputs": [ 777 | { 778 | "data": { 779 | "text/plain": [ 780 | "(60000, 28, 28)" 781 | ] 782 | }, 783 | "execution_count": 16, 784 | "metadata": {}, 785 | "output_type": "execute_result" 786 | } 787 | ], 788 | "source": [ 789 | "# Check shape of x_train\n", 790 | "x_train.shape" 791 | ] 792 | }, 793 | { 794 | "cell_type": "markdown", 795 | "metadata": {}, 796 | "source": [ 797 | "Clearly not.\n", 798 | "\n", 799 | "We need to reshape this matrix to $60000{\\times}784$." 800 | ] 801 | }, 802 | { 803 | "cell_type": "code", 804 | "execution_count": 17, 805 | "metadata": {}, 806 | "outputs": [ 807 | { 808 | "data": { 809 | "text/plain": [ 810 | "(60000, 784)" 811 | ] 812 | }, 813 | "execution_count": 17, 814 | "metadata": {}, 815 | "output_type": "execute_result" 816 | } 817 | ], 818 | "source": [ 819 | "# Reshaping x_train and x_test for our network with 784 inputs neurons\n", 820 | "x_train = np.reshape(x_train, (len(x_train), 784))\n", 821 | "x_test = np.reshape(x_test, (len(x_test), 784))\n", 822 | "\n", 823 | "# Check the dimensions\n", 824 | "x_train.shape" 825 | ] 826 | }, 827 | { 828 | "cell_type": "markdown", 829 | "metadata": {}, 830 | "source": [ 831 | "Now our input is in the right format and shape." 832 | ] 833 | }, 834 | { 835 | "cell_type": "markdown", 836 | "metadata": {}, 837 | "source": [ 838 | "## Are our inputs normalized?\n", 839 | "\n", 840 | "Remember that we had decided to limit the range of values for the input to 0-1." 
841 | ] 842 | }, 843 | { 844 | "cell_type": "markdown", 845 | "metadata": {}, 846 | "source": [ 847 | "Are all the values of ``x_train`` between 0 and 1?" 848 | ] 849 | }, 850 | { 851 | "cell_type": "code", 852 | "execution_count": 18, 853 | "metadata": {}, 854 | "outputs": [ 855 | { 856 | "name": "stdout", 857 | "output_type": "stream", 858 | "text": [ 859 | "Values in x_train lie between 0 and 255\n" 860 | ] 861 | } 862 | ], 863 | "source": [ 864 | "# Check range of values of x_train\n", 865 | "print(\"Values in x_train lie between \"+str(np.min(x_train))+\" and \"+str(np.max(np.max(x_train))))" 866 | ] 867 | }, 868 | { 869 | "cell_type": "markdown", 870 | "metadata": {}, 871 | "source": [ 872 | "Our inputs are images, their values range from 0 to 255. We need to bring them down to 0-1." 873 | ] 874 | }, 875 | { 876 | "cell_type": "code", 877 | "execution_count": 19, 878 | "metadata": { 879 | "collapsed": true 880 | }, 881 | "outputs": [], 882 | "source": [ 883 | "# Normalize x_train\n", 884 | "x_train = x_train / 255.0\n", 885 | "x_test = x_test / 255.0" 886 | ] 887 | }, 888 | { 889 | "cell_type": "code", 890 | "execution_count": 20, 891 | "metadata": {}, 892 | "outputs": [ 893 | { 894 | "name": "stdout", 895 | "output_type": "stream", 896 | "text": [ 897 | "Values in x_train lie between 0.0 and 1.0\n" 898 | ] 899 | } 900 | ], 901 | "source": [ 902 | "# Check range of values of x_train\n", 903 | "print(\"Values in x_train lie between \"+str(np.min(x_train))+\" and \"+str(np.max(np.max(x_train))))" 904 | ] 905 | }, 906 | { 907 | "cell_type": "markdown", 908 | "metadata": {}, 909 | "source": [ 910 | "Perfect." 911 | ] 912 | }, 913 | { 914 | "cell_type": "markdown", 915 | "metadata": {}, 916 | "source": [ 917 | "## Are our outputs in the right format and shape?" 918 | ] 919 | }, 920 | { 921 | "cell_type": "markdown", 922 | "metadata": {}, 923 | "source": [ 924 | "Is ``y_train`` an np.array?" 925 | ] 926 | }, 927 | { 928 | "cell_type": "code", 929 | "execution_count": 21, 930 | "metadata": {}, 931 | "outputs": [ 932 | { 933 | "data": { 934 | "text/plain": [ 935 | "numpy.ndarray" 936 | ] 937 | }, 938 | "execution_count": 21, 939 | "metadata": {}, 940 | "output_type": "execute_result" 941 | } 942 | ], 943 | "source": [ 944 | "# Check type of y_train\n", 945 | "type(y_train)" 946 | ] 947 | }, 948 | { 949 | "cell_type": "markdown", 950 | "metadata": {}, 951 | "source": [ 952 | "Yup, ``y_train`` is an np.array" 953 | ] 954 | }, 955 | { 956 | "cell_type": "markdown", 957 | "metadata": {}, 958 | "source": [ 959 | "Remember that we have 10 neurons in the output layer. That means our output needs to be of ${n{\\times}10}$ dimensions." 960 | ] 961 | }, 962 | { 963 | "cell_type": "markdown", 964 | "metadata": {}, 965 | "source": [ 966 | "Is the shape of ``y_train`` $n{\\times}10$?" 967 | ] 968 | }, 969 | { 970 | "cell_type": "code", 971 | "execution_count": 22, 972 | "metadata": {}, 973 | "outputs": [ 974 | { 975 | "data": { 976 | "text/plain": [ 977 | "(60000,)" 978 | ] 979 | }, 980 | "execution_count": 22, 981 | "metadata": {}, 982 | "output_type": "execute_result" 983 | } 984 | ], 985 | "source": [ 986 | "# Check shape of y_train\n", 987 | "y_train.shape" 988 | ] 989 | }, 990 | { 991 | "cell_type": "markdown", 992 | "metadata": {}, 993 | "source": [ 994 | "Nope, ``y_train`` is of shape $60000{\\times}1$" 995 | ] 996 | }, 997 | { 998 | "cell_type": "markdown", 999 | "metadata": {}, 1000 | "source": [ 1001 | "What are its values like?" 
1002 | ] 1003 | }, 1004 | { 1005 | "cell_type": "code", 1006 | "execution_count": 23, 1007 | "metadata": {}, 1008 | "outputs": [ 1009 | { 1010 | "name": "stdout", 1011 | "output_type": "stream", 1012 | "text": [ 1013 | "5\n", 1014 | "0\n", 1015 | "4\n", 1016 | "1\n", 1017 | "9\n" 1018 | ] 1019 | } 1020 | ], 1021 | "source": [ 1022 | "for i in range(5):\n", 1023 | " print(y_train[i])" 1024 | ] 1025 | }, 1026 | { 1027 | "cell_type": "markdown", 1028 | "metadata": {}, 1029 | "source": [ 1030 | "So ``y_train`` carries the numbers of the digits the images represent." 1031 | ] 1032 | }, 1033 | { 1034 | "cell_type": "markdown", 1035 | "metadata": {}, 1036 | "source": [ 1037 | "We need to make a new binary array of $60000{\\times}10$ and insert a 1 in the column corresponding to the number of the digit its image shows.\n", 1038 | "\n", 1039 | "For example, the first row of our new y_train should look like $\\left[\\begin{array}{c}0&0&0&0&0&1&0&0&0&0\\end{array}\\right]$, since it represents 5. This is called one-hot encoding." 1040 | ] 1041 | }, 1042 | { 1043 | "cell_type": "code", 1044 | "execution_count": 24, 1045 | "metadata": { 1046 | "collapsed": true 1047 | }, 1048 | "outputs": [], 1049 | "source": [ 1050 | "# Make new y_train of nx10 elements\n", 1051 | "new_y_train = np.zeros((len(y_train), 10))\n", 1052 | "for i in range(len(y_train)):\n", 1053 | " new_y_train[i, y_train[i]] = 1" 1054 | ] 1055 | }, 1056 | { 1057 | "cell_type": "code", 1058 | "execution_count": 25, 1059 | "metadata": { 1060 | "collapsed": true 1061 | }, 1062 | "outputs": [], 1063 | "source": [ 1064 | "# Make new y_test of nx10 elements\n", 1065 | "new_y_test = np.zeros((len(y_test), 10))\n", 1066 | "for i in range(len(y_test)):\n", 1067 | " new_y_test[i, y_test[i]] = 1" 1068 | ] 1069 | }, 1070 | { 1071 | "cell_type": "code", 1072 | "execution_count": 26, 1073 | "metadata": {}, 1074 | "outputs": [ 1075 | { 1076 | "name": "stdout", 1077 | "output_type": "stream", 1078 | "text": [ 1079 | "[ 0. 0. 0. 0. 0. 1. 0. 0. 0. 0.]\n", 1080 | "[ 0. 0. 0. 0. 0. 0. 0. 1. 0. 0.]\n" 1081 | ] 1082 | } 1083 | ], 1084 | "source": [ 1085 | "# Check first row of y_train\n", 1086 | "print(new_y_train[0])\n", 1087 | "print(new_y_test[0])" 1088 | ] 1089 | }, 1090 | { 1091 | "cell_type": "markdown", 1092 | "metadata": {}, 1093 | "source": [ 1094 | "Now that new_y_train is correctly shaped and formatted, let us reassign the name y_train to the matrix new_y_train." 
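(A small aside before reassigning: the same one-hot conversion can be written without the explicit loops. The sketch below produces the same matrices as the two cells above.)

```python
# Vectorized one-hot encoding, equivalent to the loops above:
# np.eye(10) is the 10x10 identity matrix; indexing it with the label array
# picks out one identity row (a one-hot vector) per label.
new_y_train = np.eye(10)[y_train]
new_y_test = np.eye(10)[y_test]
```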
1095 | ] 1096 | }, 1097 | { 1098 | "cell_type": "code", 1099 | "execution_count": 27, 1100 | "metadata": { 1101 | "collapsed": true 1102 | }, 1103 | "outputs": [], 1104 | "source": [ 1105 | "# Reassign the name \"y_train\" to new_y_train\n", 1106 | "y_train = new_y_train\n", 1107 | "y_test = new_y_test" 1108 | ] 1109 | }, 1110 | { 1111 | "cell_type": "markdown", 1112 | "metadata": {}, 1113 | "source": [ 1114 | "## Initialize the network" 1115 | ] 1116 | }, 1117 | { 1118 | "cell_type": "code", 1119 | "execution_count": 28, 1120 | "metadata": { 1121 | "collapsed": true 1122 | }, 1123 | "outputs": [], 1124 | "source": [ 1125 | "# Initialize network\n", 1126 | "layers = [784, 15, 10]\n", 1127 | "weights = initializeWeights(layers)\n", 1128 | "\n", 1129 | "# Take backup of weights to be used later for comparison\n", 1130 | "initialWeights = [np.array(w) for w in weights]" 1131 | ] 1132 | }, 1133 | { 1134 | "cell_type": "code", 1135 | "execution_count": 29, 1136 | "metadata": {}, 1137 | "outputs": [ 1138 | { 1139 | "data": { 1140 | "text/plain": [ 1141 | "'\\nprint(\"weights:\")\\nfor i in range(len(weights)):\\n print(i+1); print(weights[i].shape); print(weights[i])\\n'" 1142 | ] 1143 | }, 1144 | "execution_count": 29, 1145 | "metadata": {}, 1146 | "output_type": "execute_result" 1147 | } 1148 | ], 1149 | "source": [ 1150 | "# Please don't print the weights\n", 1151 | "# There are 15*784=11760 weights in the first layer,\n", 1152 | "# + 10*15=150 weights in the second layer\n", 1153 | "'''\n", 1154 | "print(\"weights:\")\n", 1155 | "for i in range(len(weights)):\n", 1156 | " print(i+1); print(weights[i].shape); print(weights[i])\n", 1157 | "'''\n" 1158 | ] 1159 | }, 1160 | { 1161 | "cell_type": "markdown", 1162 | "metadata": {}, 1163 | "source": [ 1164 | "## Train the network\n", 1165 | "\n", 1166 | "Use the proper inputs ``x_train`` and ``y_train`` to train your neural network." 1167 | ] 1168 | }, 1169 | { 1170 | "cell_type": "markdown", 1171 | "metadata": {}, 1172 | "source": [ 1173 | "How many iterations do you want to perform? How much should be the learning rate? Should it be adaptive? How many neurons per layer?" 1174 | ] 1175 | }, 1176 | { 1177 | "cell_type": "markdown", 1178 | "metadata": {}, 1179 | "source": [ 1180 | "Remember that there are 60,000 images in the training set." 1181 | ] 1182 | }, 1183 | { 1184 | "cell_type": "code", 1185 | "execution_count": 30, 1186 | "metadata": {}, 1187 | "outputs": [ 1188 | { 1189 | "name": "stdout", 1190 | "output_type": "stream", 1191 | "text": [ 1192 | "Iteration 1 of 1\n", 1193 | "Cost: 1.97857726345\n", 1194 | "Time: 3.7738959789276123 seconds\n" 1195 | ] 1196 | } 1197 | ], 1198 | "source": [ 1199 | "# Train the network using Gradient Descent\n", 1200 | "# Let's check how much time it takes for 1 iteration\n", 1201 | "\n", 1202 | "# Set options\n", 1203 | "nIterations = 1\n", 1204 | "learningRate = 1.0\n", 1205 | "\n", 1206 | "# Start time\n", 1207 | "start = time.time()\n", 1208 | "\n", 1209 | "# Train\n", 1210 | "trainUsingGD(weights, x_train, y_train, nIterations, learningRate)\n", 1211 | "\n", 1212 | "# End time\n", 1213 | "end = time.time()\n", 1214 | "\n", 1215 | "print(\"Time: \"+str(end - start)+\" seconds\")" 1216 | ] 1217 | }, 1218 | { 1219 | "cell_type": "markdown", 1220 | "metadata": { 1221 | "collapsed": true 1222 | }, 1223 | "source": [ 1224 | "See how it takes SO LONG for just one iteration?" 
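At roughly 3.8 seconds per iteration, even a modest number of full-batch iterations adds up. A rough, illustrative projection from the timing measured above:

```python
# Rough projection from the single-iteration timing measured above
secondsPerIteration = 3.8      # approximately what the previous cell reported
nIterations = 1000             # an arbitrary but modest training run
print(str(secondsPerIteration*nIterations/3600.0)+" hours")   # ~1.06 hours
```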
1225 | ] 1226 | }, 1227 | { 1228 | "cell_type": "markdown", 1229 | "metadata": {}, 1230 | "source": [ 1231 | "**Problem: Batch Gradient Descent computes the error, deltas, etc. over the entire input data set before making a single weight update**\n", 1232 | "\n", 1233 | "Solution: Don't compute each update over the entire data set; instead, repeatedly use a randomly sampled subset of the data set.\n", 1234 | "\n", 1235 | "Estimating the full gradient from a random subset is the idea behind Stochastic and Mini-batch Gradient Descent." 1236 | ] 1237 | }, 1238 | { 1239 | "cell_type": "markdown", 1240 | "metadata": {}, 1241 | "source": [ 1242 | "# Mini-batch Gradient Descent" 1243 | ] 1244 | }, 1245 | { 1246 | "cell_type": "markdown", 1247 | "metadata": { 1248 | "collapsed": true 1249 | }, 1250 | "source": [ 1251 | "We shall define a $minibatchSize$ smaller than the number of data points input to the network ($n$). Say $minibatchSize = 100$.\n", 1252 | "\n", 1253 | "**Mini-batch GD**:\n", 1254 | "\n", 1255 | "For every epoch:\n", 1256 | "- randomly group the input data set into mini-batches of ($minibatchSize=$) 100 images:\n", 1257 | " - randomly shuffle the entire data set\n", 1258 | " - consider every 100 consecutive images as one mini-batch - so there are ``int(n/minibatchSize)`` mini-batches\n", 1259 | "- use gradient descent on every mini-batch to update the weights\n", 1260 | "- Repeat.\n", 1261 | "\n", 1262 | "If $minibatchSize=n$, this is the same as Batch Gradient Descent.\n", 1263 | "\n", 1264 | "If $minibatchSize=1$, i.e. we update the weights after backpropagating through only one image, it is called **Stochastic Gradient Descent**." 1265 | ] 1266 | }, 1267 | { 1268 | "cell_type": "markdown", 1269 | "metadata": {}, 1270 | "source": [ 1271 | "So, at every weight update we are running gradient descent on only $minibatchSize$ images.\n", 1272 | "\n", 1273 | "Convergence results exist showing why this works as well as (and often faster than) batch gradient descent, under some assumptions (such as stationarity of the data distribution, which holds for our purposes)."
1274 | ] 1275 | }, 1276 | { 1277 | "cell_type": "markdown", 1278 | "metadata": {}, 1279 | "source": [ 1280 | "Let's code Mini-batch Gradient Descent:" 1281 | ] 1282 | }, 1283 | { 1284 | "cell_type": "code", 1285 | "execution_count": 43, 1286 | "metadata": { 1287 | "collapsed": true 1288 | }, 1289 | "outputs": [], 1290 | "source": [ 1291 | "# TRAINING USING MINI-BATCH GRADIENT DESCENT\n", 1292 | "# Default learning rate = 1.0\n", 1293 | "def trainUsingMinibatchGD(weights, X, Y, minibatchSize, nEpochs, learningRate=1.0):\n", 1294 | " # For nIterations number of iterations:\n", 1295 | " for i in range(nEpochs):\n", 1296 | " # clear output\n", 1297 | " #clear_output()\n", 1298 | " print(\"Epoch \"+str(i+1)+\" of \"+str(nEpochs))\n", 1299 | " \n", 1300 | " # Make a list of all the indices\n", 1301 | " fullIdx = list(range(len(Y)))\n", 1302 | " \n", 1303 | " # Shuffle the full index\n", 1304 | " np.random.shuffle(fullIdx)\n", 1305 | " \n", 1306 | " # Count number of mini-batches\n", 1307 | " nOfMinibatches = int(len(X)/minibatchSize)\n", 1308 | " \n", 1309 | " # For each mini-batch\n", 1310 | " for m in range(nOfMinibatches):\n", 1311 | " # Compute the starting index of this mini-batch\n", 1312 | " startIdx = m*minibatchSize\n", 1313 | " \n", 1314 | " # Declare sampled inputs and outputs\n", 1315 | " xSample = X[fullIdx[startIdx:startIdx+minibatchSize]]\n", 1316 | " ySample = Y[fullIdx[startIdx:startIdx+minibatchSize]]\n", 1317 | "\n", 1318 | " # Run backprop\n", 1319 | " backProp(weights, xSample, ySample, learningRate)" 1320 | ] 1321 | }, 1322 | { 1323 | "cell_type": "markdown", 1324 | "metadata": {}, 1325 | "source": [ 1326 | "Using MinibatchGD, training upto the same accuracy should take lesser time than GD." 1327 | ] 1328 | }, 1329 | { 1330 | "cell_type": "code", 1331 | "execution_count": 44, 1332 | "metadata": { 1333 | "collapsed": true 1334 | }, 1335 | "outputs": [], 1336 | "source": [ 1337 | "# Initialize network\n", 1338 | "layers = [784, 30, 10]\n", 1339 | "weights = initializeWeights(layers)\n", 1340 | "\n", 1341 | "# Take backup of weights to be used later for comparison\n", 1342 | "initialWeights = [np.array(w) for w in weights]" 1343 | ] 1344 | }, 1345 | { 1346 | "cell_type": "code", 1347 | "execution_count": 45, 1348 | "metadata": {}, 1349 | "outputs": [ 1350 | { 1351 | "name": "stdout", 1352 | "output_type": "stream", 1353 | "text": [ 1354 | "5570 out of 60000 : 0.09283333333333334\n" 1355 | ] 1356 | } 1357 | ], 1358 | "source": [ 1359 | "# Evaluate initial weights on training data\n", 1360 | "evaluate(weights, x_train, y_train)" 1361 | ] 1362 | }, 1363 | { 1364 | "cell_type": "code", 1365 | "execution_count": 46, 1366 | "metadata": {}, 1367 | "outputs": [ 1368 | { 1369 | "name": "stdout", 1370 | "output_type": "stream", 1371 | "text": [ 1372 | "948 out of 10000 : 0.0948\n" 1373 | ] 1374 | } 1375 | ], 1376 | "source": [ 1377 | "# Evaluate initial weights on test data\n", 1378 | "evaluate(weights, x_test, y_test)" 1379 | ] 1380 | }, 1381 | { 1382 | "cell_type": "markdown", 1383 | "metadata": {}, 1384 | "source": [ 1385 | "- Let's first use Batch Gradient Descent ($minibatchSize = size\\;of \\;full\\;input$) to evaluate the accuracy and time with one iteration " 1386 | ] 1387 | }, 1388 | { 1389 | "cell_type": "code", 1390 | "execution_count": 47, 1391 | "metadata": {}, 1392 | "outputs": [ 1393 | { 1394 | "name": "stdout", 1395 | "output_type": "stream", 1396 | "text": [ 1397 | "Epoch 1 of 1\n", 1398 | "Training accuracy:\n", 1399 | "5889 out of 60000 : 0.09815\n", 1400 | "Test 
accuracy:\n", 1401 | "1012 out of 10000 : 0.1012\n", 1402 | "Time: 2.8622570037841797 seconds\n" 1403 | ] 1404 | } 1405 | ], 1406 | "source": [ 1407 | "# Train the network ONCE using Batch Gradient Descent to check accuracy and time\n", 1408 | "\n", 1409 | "# Re-initialize weights\n", 1410 | "weights = [np.array(w) for w in initialWeights]\n", 1411 | "\n", 1412 | "# Set options for batch gradient descent\n", 1413 | "minibatchSize = len(y_train)\n", 1414 | "nEpochs = 1\n", 1415 | "learningRate = 3.0\n", 1416 | "\n", 1417 | "# Start time\n", 1418 | "start = time.time()\n", 1419 | "\n", 1420 | "# Train\n", 1421 | "trainUsingMinibatchGD(weights, x_train, y_train, minibatchSize, nEpochs, learningRate)\n", 1422 | "\n", 1423 | "# End time\n", 1424 | "end = time.time()\n", 1425 | "\n", 1426 | "# Evaluate accuracy\n", 1427 | "print(\"Training accuracy:\")\n", 1428 | "evaluate(weights, x_train, y_train)\n", 1429 | "print(\"Test accuracy:\")\n", 1430 | "evaluate(weights, x_test, y_test)\n", 1431 | "\n", 1432 | "# Print time taken\n", 1433 | "print(\"Time: \"+str(end-start)+\" seconds\")" 1434 | ] 1435 | }, 1436 | { 1437 | "cell_type": "markdown", 1438 | "metadata": {}, 1439 | "source": [ 1440 | "- Okay, let's check with Stochastic Gradient Descent, i.e. $minibatchSize = 1$" 1441 | ] 1442 | }, 1443 | { 1444 | "cell_type": "code", 1445 | "execution_count": 48, 1446 | "metadata": {}, 1447 | "outputs": [ 1448 | { 1449 | "name": "stdout", 1450 | "output_type": "stream", 1451 | "text": [ 1452 | "Epoch 1 of 1\n", 1453 | "Training accuracy:\n", 1454 | "44816 out of 60000 : 0.7469333333333333\n", 1455 | "Test accuracy:\n", 1456 | "7539 out of 10000 : 0.7539\n", 1457 | "Time: 21.746292114257812 seconds\n" 1458 | ] 1459 | } 1460 | ], 1461 | "source": [ 1462 | "# Train the network ONCE using Stochastic Gradient Descent to check accuracy and time\n", 1463 | "\n", 1464 | "# Re-initialize weights\n", 1465 | "weights = [np.array(w) for w in initialWeights]\n", 1466 | "\n", 1467 | "# Set options of stochastic gradient descent\n", 1468 | "minibatchSize = 1\n", 1469 | "nEpochs = 1\n", 1470 | "learningRate = 3.0\n", 1471 | "\n", 1472 | "# Start time\n", 1473 | "start = time.time()\n", 1474 | "\n", 1475 | "# Train\n", 1476 | "trainUsingMinibatchGD(weights, x_train, y_train, minibatchSize, nEpochs, learningRate)\n", 1477 | "\n", 1478 | "# End time\n", 1479 | "end = time.time()\n", 1480 | "\n", 1481 | "# Evaluate accuracy\n", 1482 | "print(\"Training accuracy:\")\n", 1483 | "evaluate(weights, x_train, y_train)\n", 1484 | "print(\"Test accuracy:\")\n", 1485 | "evaluate(weights, x_test, y_test)\n", 1486 | "\n", 1487 | "# Print time taken\n", 1488 | "print(\"Time: \"+str(end-start)+\" seconds\")" 1489 | ] 1490 | }, 1491 | { 1492 | "cell_type": "markdown", 1493 | "metadata": {}, 1494 | "source": [ 1495 | "Stochastic Gradient Descent took more time, but gave much better accuracy in just 1 epoch." 
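Much of this difference comes down to how many weight updates each variant makes per epoch (roughly ``n/minibatchSize``). A quick back-of-the-envelope comparison for the three settings used here:

```
# Weight updates per epoch for n = 60000 training images
n = 60000
for minibatchSize in [60000, 1, 10]:
    print(minibatchSize, "->", int(n / minibatchSize), "updates per epoch")
# 60000 -> 1      (Batch GD)
# 1     -> 60000  (Stochastic GD)
# 10    -> 6000   (Mini-batch GD, tried next)
```

Stochastic Gradient Descent makes 60,000 separate updates in one epoch, which explains the much better accuracy; but each update forward- and back-propagates a single image, so the per-call overhead adds up to more wall-clock time.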
1496 | ] 1497 | }, 1498 | { 1499 | "cell_type": "markdown", 1500 | "metadata": {}, 1501 | "source": [ 1502 | "- Let's now check for Mini-batch Gradient Descent, with $minibatchSize = $ (say) $10$" 1503 | ] 1504 | }, 1505 | { 1506 | "cell_type": "code", 1507 | "execution_count": 49, 1508 | "metadata": {}, 1509 | "outputs": [ 1510 | { 1511 | "name": "stdout", 1512 | "output_type": "stream", 1513 | "text": [ 1514 | "Epoch 1 of 1\n", 1515 | "Training accuracy:\n", 1516 | "52428 out of 60000 : 0.8738\n", 1517 | "Test accuracy:\n", 1518 | "8752 out of 10000 : 0.8752\n", 1519 | "Time: 4.0647711753845215 seconds\n" 1520 | ] 1521 | } 1522 | ], 1523 | "source": [ 1524 | "# Train the network ONCE using Mini-batch Gradient Descent to check accuracy and time\n", 1525 | "\n", 1526 | "# Re-initialize weights\n", 1527 | "weights = [np.array(w) for w in initialWeights]\n", 1528 | "\n", 1529 | "# Set options of mini-batch gradient descent\n", 1530 | "minibatchSize = 10\n", 1531 | "nEpochs = 1\n", 1532 | "learningRate = 3.0\n", 1533 | "\n", 1534 | "# Start time\n", 1535 | "start = time.time()\n", 1536 | "\n", 1537 | "# Train\n", 1538 | "trainUsingMinibatchGD(weights, x_train, y_train, minibatchSize, nEpochs, learningRate)\n", 1539 | "\n", 1540 | "# End time\n", 1541 | "end = time.time()\n", 1542 | "\n", 1543 | "# Evaluate accuracy\n", 1544 | "print(\"Training accuracy:\")\n", 1545 | "evaluate(weights, x_train, y_train)\n", 1546 | "print(\"Test accuracy:\")\n", 1547 | "evaluate(weights, x_test, y_test)\n", 1548 | "\n", 1549 | "# Print time taken\n", 1550 | "print(\"Time: \"+str(end-start)+\" seconds\")" 1551 | ] 1552 | }, 1553 | { 1554 | "cell_type": "markdown", 1555 | "metadata": {}, 1556 | "source": [ 1557 | "Thus, (in 1 epoch) Mini-batch Gradient descent gives comparable accuracy to Stochastic Gradient Descent, which is much better than the accuracy given by Batch Gradient Descent, in much lesser time." 1558 | ] 1559 | }, 1560 | { 1561 | "cell_type": "markdown", 1562 | "metadata": {}, 1563 | "source": [ 1564 | "## Classifying MNIST data set\n", 1565 | "\n", 1566 | "Let us try to classify the MNIST data set up to more than 99%. 
This means deciding the number of layers, size of each layer, number of Epochs, the mini-batch size, and the learning (constant, for now).\n", 1567 | "\n", 1568 | "Let us try, $layers = [784$ (input layer, because each MNIST image is 28$x$28)$, 30$ (hidden layer)$, 10$ (outputs layer, one neuron for each digit)$], nEpochs = 30, minibatchSize = 10, learningRate = 3.0$" 1569 | ] 1570 | }, 1571 | { 1572 | "cell_type": "code", 1573 | "execution_count": 55, 1574 | "metadata": {}, 1575 | "outputs": [ 1576 | { 1577 | "name": "stdout", 1578 | "output_type": "stream", 1579 | "text": [ 1580 | "Epoch 1 of 50\n", 1581 | "Epoch 2 of 50\n", 1582 | "Epoch 3 of 50\n", 1583 | "Epoch 4 of 50\n", 1584 | "Epoch 5 of 50\n", 1585 | "Epoch 6 of 50\n", 1586 | "Epoch 7 of 50\n", 1587 | "Epoch 8 of 50\n", 1588 | "Epoch 9 of 50\n", 1589 | "Epoch 10 of 50\n", 1590 | "Epoch 11 of 50\n", 1591 | "Epoch 12 of 50\n", 1592 | "Epoch 13 of 50\n", 1593 | "Epoch 14 of 50\n", 1594 | "Epoch 15 of 50\n", 1595 | "Epoch 16 of 50\n", 1596 | "Epoch 17 of 50\n", 1597 | "Epoch 18 of 50\n", 1598 | "Epoch 19 of 50\n", 1599 | "Epoch 20 of 50\n", 1600 | "Epoch 21 of 50\n", 1601 | "Epoch 22 of 50\n", 1602 | "Epoch 23 of 50\n", 1603 | "Epoch 24 of 50\n", 1604 | "Epoch 25 of 50\n", 1605 | "Epoch 26 of 50\n", 1606 | "Epoch 27 of 50\n", 1607 | "Epoch 28 of 50\n", 1608 | "Epoch 29 of 50\n", 1609 | "Epoch 30 of 50\n", 1610 | "Epoch 31 of 50\n", 1611 | "Epoch 32 of 50\n", 1612 | "Epoch 33 of 50\n", 1613 | "Epoch 34 of 50\n", 1614 | "Epoch 35 of 50\n", 1615 | "Epoch 36 of 50\n", 1616 | "Epoch 37 of 50\n", 1617 | "Epoch 38 of 50\n", 1618 | "Epoch 39 of 50\n", 1619 | "Epoch 40 of 50\n", 1620 | "Epoch 41 of 50\n", 1621 | "Epoch 42 of 50\n", 1622 | "Epoch 43 of 50\n", 1623 | "Epoch 44 of 50\n", 1624 | "Epoch 45 of 50\n", 1625 | "Epoch 46 of 50\n", 1626 | "Epoch 47 of 50\n", 1627 | "Epoch 48 of 50\n", 1628 | "Epoch 49 of 50\n", 1629 | "Epoch 50 of 50\n", 1630 | "Training accuracy:\n", 1631 | "58180 out of 60000 : 0.9696666666666667\n", 1632 | "Test accuracy:\n", 1633 | "9397 out of 10000 : 0.9397\n" 1634 | ] 1635 | } 1636 | ], 1637 | "source": [ 1638 | "# TRAIN A NETWORK TO CLASSIFY MNIST\n", 1639 | "\n", 1640 | "# Initialize network\n", 1641 | "layers = [784, 30, 10]\n", 1642 | "weights = initializeWeights(layers)\n", 1643 | "\n", 1644 | "# Take backup of weights to be used later for comparison\n", 1645 | "initialWeights = [np.array(w) for w in weights]\n", 1646 | "\n", 1647 | "# Set options of mini-batch gradient descent\n", 1648 | "minibatchSize = 10\n", 1649 | "nEpochs = 50\n", 1650 | "learningRate = 3.0\n", 1651 | "\n", 1652 | "# Train\n", 1653 | "trainUsingMinibatchGD(weights, x_train, y_train, minibatchSize, nEpochs, learningRate)\n", 1654 | "\n", 1655 | "# Evaluate accuracy\n", 1656 | "print(\"Training accuracy:\")\n", 1657 | "evaluate(weights, x_train, y_train)\n", 1658 | "print(\"Test accuracy:\")\n", 1659 | "evaluate(weights, x_test, y_test)" 1660 | ] 1661 | }, 1662 | { 1663 | "cell_type": "markdown", 1664 | "metadata": {}, 1665 | "source": [ 1666 | "About 93%-95%.. What if we increase the mini-batch size?" 
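Mini-batch size is only one knob; the number and width of hidden layers, the number of epochs, and the learning rate matter too. Below is a minimal sketch of how one might compare a few settings side by side, reusing the ``initializeWeights``, ``trainUsingMinibatchGD`` and ``evaluate`` functions defined above (the settings listed are just examples, and each run takes several minutes):

```
# Compare a few hyperparameter settings (slow: trains one network per setting)
for layers, minibatchSize, nEpochs in [([784, 30, 10], 10, 30),
                                       ([784, 30, 10], 100, 30),
                                       ([784, 10, 10, 10], 10, 30)]:
    weights = initializeWeights(layers)
    trainUsingMinibatchGD(weights, x_train, y_train, minibatchSize, nEpochs, learningRate=3.0)
    print("layers =", layers, ", minibatchSize =", minibatchSize)
    evaluate(weights, x_test, y_test)
```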
1667 | ] 1668 | }, 1669 | { 1670 | "cell_type": "code", 1671 | "execution_count": 59, 1672 | "metadata": {}, 1673 | "outputs": [ 1674 | { 1675 | "name": "stdout", 1676 | "output_type": "stream", 1677 | "text": [ 1678 | "Epoch 1 of 10\n", 1679 | "Epoch 2 of 10\n", 1680 | "Epoch 3 of 10\n", 1681 | "Epoch 4 of 10\n", 1682 | "Epoch 5 of 10\n", 1683 | "Epoch 6 of 10\n", 1684 | "Epoch 7 of 10\n", 1685 | "Epoch 8 of 10\n", 1686 | "Epoch 9 of 10\n", 1687 | "Epoch 10 of 10\n", 1688 | "Training accuracy:\n", 1689 | "53245 out of 60000 : 0.8874166666666666\n", 1690 | "Test accuracy:\n", 1691 | "8846 out of 10000 : 0.8846\n" 1692 | ] 1693 | } 1694 | ], 1695 | "source": [ 1696 | "# TRAIN A NETWORK TO CLASSIFY MNIST\n", 1697 | "\n", 1698 | "# Initialize network\n", 1699 | "layers = [784, 10, 10, 10]\n", 1700 | "weights = initializeWeights(layers)\n", 1701 | "\n", 1702 | "# Take backup of weights to be used later for comparison\n", 1703 | "initialWeights = [np.array(w) for w in weights]\n", 1704 | "\n", 1705 | "# Set options of mini-batch gradient descent\n", 1706 | "minibatchSize = 10\n", 1707 | "nEpochs = 30\n", 1708 | "learningRate = 3.0\n", 1709 | "\n", 1710 | "# Train\n", 1711 | "trainUsingMinibatchGD(weights, x_train, y_train, minibatchSize, nEpochs, learningRate)\n", 1712 | "\n", 1713 | "# Evaluate accuracy\n", 1714 | "print(\"Training accuracy:\")\n", 1715 | "evaluate(weights, x_train, y_train)\n", 1716 | "print(\"Test accuracy:\")\n", 1717 | "evaluate(weights, x_test, y_test)" 1718 | ] 1719 | }, 1720 | { 1721 | "cell_type": "markdown", 1722 | "metadata": { 1723 | "collapsed": true 1724 | }, 1725 | "source": [ 1726 | "## Coming up next\n", 1727 | "\n", 1728 | "In the next tutorial, we shall see the different types of optimizations that can be done in gradient descent, and compare their performances." 1729 | ] 1730 | } 1731 | ], 1732 | "metadata": { 1733 | "kernelspec": { 1734 | "display_name": "Python 3", 1735 | "language": "python", 1736 | "name": "python3" 1737 | }, 1738 | "language_info": { 1739 | "codemirror_mode": { 1740 | "name": "ipython", 1741 | "version": 3 1742 | }, 1743 | "file_extension": ".py", 1744 | "mimetype": "text/x-python", 1745 | "name": "python", 1746 | "nbconvert_exporter": "python", 1747 | "pygments_lexer": "ipython3", 1748 | "version": "3.5.1" 1749 | } 1750 | }, 1751 | "nbformat": 4, 1752 | "nbformat_minor": 2 1753 | } 1754 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Neural Network in Python 2 | 3 | An implementation of a Multi-Layer Perceptron, with forward propagation, back propagation using Gradient Descent, training usng Batch or Stochastic Gradient Descent 4 | 5 | Use: myNN = MyPyNN(nOfInputDims, nOfHiddenLayers, sizesOfHiddenLayers, nOfOutputDims, alpha, regLambda) 6 | Here, alpha = learning rate of gradient descent, regLambda = regularization parameter 7 | 8 | ## Example 1 9 | 10 | ``` 11 | from myPyNN import * 12 | X = [0, 0.5, 1] 13 | y = [0, 0.5, 1] 14 | myNN = MyPyNN([1, 1, 1]] 15 | ``` 16 | Input Layer : 1-dimensional (Eg: X) 17 | 18 | 1 Hidden Layer : 1-dimensional 19 | 20 | Output Layer : 1-dimensional (Eg. 
y) 21 | 22 | Learning Rate : 0.05 (default) 23 | ``` 24 | print myNN.predict(0.2) 25 | ``` 26 | 27 | 28 | ## Example 2 29 | ``` 30 | X = [[0,0], [1,1]] 31 | y = [0, 1] 32 | myNN = MyPyNN([2, 3, 1]) 33 | ``` 34 | Input Layer : 2-dimensional (Eg: X) 35 | 36 | 1 Hidden Layer : 3-dimensional 37 | 38 | Output Layer : 1-dimensional (Eg. y) 39 | 40 | Learning rate : 0.8 41 | ``` 42 | print myNN.predict(X) 43 | #myNN.trainUsingGD(X, y, 899) 44 | myNN.trainUsingSGD(X, y, 1000) 45 | print myNN.predict(X) 46 | ``` 47 | 48 | ## Example 3 49 | 50 | ``` 51 | X = [[2,2,2], [3,3,3], [4,4,4], [5,5,5], [6,6,6], [7,7,7], [8,8,8], [9,9,9], [10,10,10], [11,11,11]] 52 | y = [.2, .3, .4, .5, .6, .7, .8, .9, 0, .1] 53 | myNN = MyPyNN([3, 10, 10, 5, 1]) 54 | ``` 55 | Input Layer : 3-dimensional (Eg: X) 56 | 57 | 3 Hidden Layers: 10-dimensional, 10-dimensional, 5-dimensional 58 | 59 | Output Layer : 1-dimensional (Eg. y) 60 | 61 | Learning rate : 0.9 62 | 63 | Regularization parameter : 0.5 64 | ``` 65 | print myNN.predict(X) 66 | #myNN.trainUsingGD(X, y, 899) 67 | myNN.trainUsingSGD(X, y, 1000) 68 | print myNN.predict(X) 69 | ``` 70 | 71 | ## Requirements for interactive tutorial (myPyNN.ipynb) 72 | 73 | I ran this in OS X, after installing brew for command-line use, and pip for python-related stuff. 74 | 75 | ### Python 76 | 77 | I designed the tutorial on Python 2.7, can be run on Python 3 as well. 78 | 79 | ### Packages 80 | 81 | - numpy 82 | - matplotlib 83 | - ipywidgets 84 | 85 | ### Jupyter 86 | 87 | The tutorial is an iPython notebook. It is designed and meant to run in Jupyter. To install Jupyter, one can install Anaconda which would install Python, Jupyter, along with a lot of other stuff. Or, one can install only Jupyter using: 88 | ``` 89 | pip install jupyter 90 | ``` 91 | 92 | ### ipywidgets 93 | 94 | ipywidgets comes pre-installed with Jupyter. 
However, widgets might need to be actived using: 95 | ``` 96 | jupyter nbextension enable --py widgetsnbextension 97 | jupyter nbextension enable --py --sys-prefix widgetsnbextension 98 | ``` 99 | 100 | ## References 101 | - [Machine Learning Mastery's excellent tutorial](https://machinelearningmastery.com/implement-backpropagation-algorithm-scratch-python/) 102 | 103 | - [Mattmazur's example](https://mattmazur.com/2015/03/17/a-step-by-step-backpropagation-example/) 104 | 105 | - [Welch Lab's excellent video playlist on neural networks](https://www.youtube.com/playlist?list=PLiaHhY2iBX9hdHaRr6b7XevZtgZRa1PoU) 106 | 107 | - [Michael Nielsen's brilliant hands-on interactive tutorial on the awesome power of neural networks as universal approximators](https://neuralnetworksanddeeplearning.com/chap4.html) 108 | 109 | - [Excellent overview of gradient descent algorithms](http://sebastianruder.com/optimizing-gradient-descent/) 110 | 111 | - [CS321n's iPython tutorial](https://cs231n.github.io/ipython-tutorial/) 112 | 113 | - [Karlijn Willem's definitive Jupyter guide](https://www.datacamp.com/community/tutorials/tutorial-jupyter-notebook#gs.SJPul58) 114 | 115 | - [matplotlib](https://matplotlib.org/) 116 | 117 | - [Tutorial on using Matplotlib in Jupyter](https://nbviewer.jupyter.org/github/jrjohansson/scientific-python-lectures/blob/master/Lecture-4-Matplotlib.ipynb) 118 | 119 | - [Interactive dashboards in Jupyter](https://blog.dominodatalab.com/interactive-dashboards-in-jupyter/) 120 | 121 | - [ipywidgets - for interactive dashboards in Jupyter](http://ipywidgets.readthedocs.io/) 122 | 123 | - [drawing-animating-shapes-matplotlib](https://nickcharlton.net/posts/drawing-animating-shapes-matplotlib.html) 124 | 125 | - [RISE - for Jupyter presentations](https://github.com/damianavila/RISE) 126 | 127 | - [MathJax syntax list](https://math.meta.stackexchange.com/questions/5020/mathjax-basic-tutorial-and-quick-reference) 128 | 129 | - [MNIST dataset and results](http://yann.lecun.com/exdb/mnist/) 130 | 131 | - [MNIST dataset .npz file (Amazon AWS)](https://s3.amazonaws.com/img-datasets/mnist.npz) 132 | 133 | - [NpzFile doc](http://docr.it/numpy/lib/npyio/NpzFile) 134 | 135 | - [matplotlib examples from SciPy](http://scipython.com/book/chapter-7-matplotlib/examples/simple-surface-plots/) 136 | 137 | - [Yann LeCun's backprop paper, containing tips for efficient backpropagation](http://yann.lecun.com/exdb/publis/pdf/lecun-98b.pdf) 138 | 139 | - [Mathematical notations for LaTeX, which can also be used in Jupyter](https://en.wikibooks.org/wiki/LaTeX/Mathematics) 140 | 141 | - [JupyterHub](http://jupyterhub.readthedocs.io/en/latest/getting-started.html) 142 | 143 | - [Optional code visibility in iPython notebooks](https://chris-said.io/2016/02/13/how-to-make-polished-jupyter-presentations-with-optional-code-visibility/) 144 | 145 | - [Ultimate iPython notebook tips](https://blog.juliusschulz.de/blog/ultimate-ipython-notebook) 146 | 147 | - [Full preprocessing for medical images tutorial](https://www.kaggle.com/gzuidhof/data-science-bowl-2017/full-preprocessing-tutorial) 148 | 149 | - [Example ConvNet for a kaggle problem (cats vs dogs)](https://www.kaggle.com/sentdex/dogs-vs-cats-redux-kernels-edition/full-classification-example-with-convnet) 150 | 151 | - Fernando Pérez, Brian E. Granger, IPython: A System for Interactive Scientific Computing, Computing in Science and Engineering, vol. 9, no. 3, pp. 21-29, May/June 2007, doi:10.1109/MCSE.2007.53. 
URL: (http://ipython.org) 152 | 153 | -------------------------------------------------------------------------------- /images/Title_ANN.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/voletiv/myPythonNeuralNetwork/e215c6f20ca0b2d7aa947f956049110f4e60b094/images/Title_ANN.png -------------------------------------------------------------------------------- /images/digitsNN.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/voletiv/myPythonNeuralNetwork/e215c6f20ca0b2d7aa947f956049110f4e60b094/images/digitsNN.png -------------------------------------------------------------------------------- /images/optimizers.gif: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/voletiv/myPythonNeuralNetwork/e215c6f20ca0b2d7aa947f956049110f4e60b094/images/optimizers.gif -------------------------------------------------------------------------------- /myPyNN.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | DEBUG = 0 3 | 4 | class MyPyNN(object): 5 | 6 | def __init__(self, layers=[3, 4, 2]): 7 | 8 | self.layers = layers 9 | 10 | # Network 11 | self.weights = [np.random.randn(x+1, y) 12 | for x, y in zip(self.layers[:-1], self.layers[1:])] 13 | 14 | # For mean-centering 15 | self.meanX = np.zeros((1, self.layers[0])) 16 | 17 | # Default options 18 | self.learningRate = 1.0 19 | self.regLambda = 0 20 | self.adaptLearningRate = False 21 | self.normalizeInputs = False 22 | self.meanCentering = False 23 | self.visible = False 24 | 25 | def predict(self, X, visible=False): 26 | self.visible = visible 27 | # mean-centering 28 | inputs = self.preprocessTestingInputs(X) - self.meanX 29 | 30 | if inputs.ndim!=1 and inputs.ndim!=2: 31 | print "X is not one or two dimensional, please check." 
32 | return 33 | 34 | if DEBUG or self.visible: 35 | print "PREDICT:" 36 | print inputs 37 | 38 | for l, w in enumerate(self.weights): 39 | inputs = self.addBiasTerms(inputs) 40 | inputs = self.sigmoid(np.dot(inputs, w)) 41 | if DEBUG or self.visible: 42 | print "Layer "+str(l+1) 43 | print inputs 44 | 45 | return inputs 46 | 47 | def trainUsingMinibatchGD(self, X, y, nEpochs=1000, minibatchSize=100, 48 | learningRate=0.05, regLambda=0, adaptLearningRate=False, 49 | normalizeInputs=False, meanCentering=False, 50 | printTestAccuracy=False, testX=None, testY=None, 51 | visible=False): 52 | self.learningRate = float(learningRate) 53 | self.regLambda = regLambda 54 | self.adaptLearningRate = adaptLearningRate 55 | self.normalizeInputs = normalizeInputs 56 | self.meanCentering = meanCentering 57 | self.visible = visible 58 | 59 | X = self.preprocessTrainingInputs(X) 60 | y = self.preprocessOutputs(y) 61 | 62 | yPred = self.predict(X, visible=self.visible) 63 | 64 | if yPred.shape != y.shape: 65 | print "Shape of y ("+str(y.shape)+") does not match what shape of y is supposed to be: "+str(yPred.shape) 66 | return 67 | 68 | self.trainAccuracy = (np.sum([np.argmax(yPred[k])==np.argmax(y[k]) 69 | for k in range(len(y))])).astype(float)/len(y) 70 | print "train accuracy = " + str(self.trainAccuracy) 71 | 72 | self.prevCost = 0.5*np.sum((yPred-y)**2)/len(y) 73 | print "cost = " + str(self.prevCost) 74 | self.cost = self.prevCost 75 | 76 | # mean-centering 77 | if self.meanCentering: 78 | X = X - self.meanX 79 | else: 80 | X = X 81 | 82 | self.inputs = X 83 | 84 | if DEBUG or self.visible: 85 | print "train input:"+str(inputs) 86 | 87 | # Just to ensure minibatchSize !> len(X) 88 | if minibatchSize > len(X): 89 | minibatchSize = int(len(X)/10)+1 90 | 91 | # Test data 92 | if printTestAccuracy: 93 | if testX==None and testY==None: 94 | print "No test data given" 95 | testX = np.zeros((1, len(X))) 96 | testY = np.zeros((1,1)) 97 | elif testX==None or testY==None: 98 | print "One of testData not available" 99 | return 100 | else: 101 | testX = self.preprocessTrainingInputs(testX) 102 | testY = self.preprocessOutputs(testY) 103 | if len(testX)!=len(testY): 104 | print "Test Datas not of same length" 105 | return 106 | 107 | yTestPred = self.predict(testX, visible=self.visible) 108 | self.testAccuracy = np.sum([np.argmax(yTestPred[k])==np.argmax(testY[k]) 109 | for k in range(len(testY))])/float(len(testY)) 110 | print "test accuracy = " + str(self.testAccuracy) 111 | 112 | # Randomly initialize old weights (for adaptive learning), will copy values later 113 | if adaptLearningRate: 114 | self.oldWeights = [np.random.randn(i+1, j) 115 | for i, j in zip(self.layers[:-1], self.layers[1:])] 116 | 117 | # For each epoch 118 | for i in range(nEpochs): 119 | 120 | print "Epoch "+str(i)+" of "+str(nEpochs) 121 | 122 | ## Find minibatches 123 | # Generate list of indices of full training data 124 | fullIdx = list(range(len(X))) 125 | # Shuffle the list 126 | np.random.shuffle(fullIdx) 127 | # Make list of mininbatches 128 | minibatches = [fullIdx[k:k+minibatchSize] 129 | for k in xrange(0, len(X), minibatchSize)] 130 | 131 | # For each minibatch 132 | for mininbatch in mininbatches: 133 | # Find X and y for each minibatch 134 | miniX = X[idx] 135 | miniY = y[idx] 136 | 137 | # Forward propagate through miniX 138 | a = self.forwardProp(miniX) 139 | 140 | # Check if Forward Propagation was successful 141 | if a==False: 142 | return 143 | 144 | # Save old weights before backProp in case of adaptLR 145 | if 
adaptLearningRate: 146 | for i in range(len(self.weights)): 147 | self.oldWeights[i] = np.array(self.weights[i]) 148 | 149 | # Back propagate, update weights for minibatch 150 | self.backPropGradDescent(miniX, miniY) 151 | 152 | yPred = self.predict(X, visible=self.visible) 153 | 154 | self.trainAccuracy = (np.sum([np.argmax(yPred[k])==np.argmax(y[k]) 155 | for k in range(len(y))])).astype(float)/len(y) 156 | print "train accuracy = " + str(self.trainAccuracy) 157 | if printTestAccuracy: 158 | yTestPred = self.predict(testX, visible=self.visible) 159 | self.testAccuracy = (np.sum([np.argmax(yTestPred[k])==np.argmax(testY[k]) 160 | for k in range(len(testY))])).astype(float)/len(testY) 161 | print "test accuracy = " + str(self.testAccuracy) 162 | 163 | self.cost = 0.5*np.sum((yPred-y)**2)/len(y) 164 | print "cost = " + str(self.cost) 165 | 166 | if adaptLearningRate: 167 | self.adaptLR() 168 | 169 | self.evaluate(X, y) 170 | 171 | self.prevCost = self.cost 172 | 173 | def forwardProp(self, inputs): 174 | inputs = self.preprocessInputs(inputs) 175 | print "Forward..." 176 | 177 | if inputs.ndim!=1 and inputs.ndim!=2: 178 | print "Input argument " + str(inputs.ndim) + \ 179 | "is not one or two dimensional, please check." 180 | return False 181 | 182 | if (inputs.ndim==1 and len(inputs)!=self.layers[0]) or \ 183 | (inputs.ndim==2 and inputs.shape[1]!=self.layers[0]): 184 | print "Input argument does not match input dimensions (" + \ 185 | str(self.layers[0]) + ") of network." 186 | return False 187 | 188 | if DEBUG or self.visible: 189 | print inputs 190 | 191 | # Save the outputs of each layer 192 | self.outputs = [] 193 | 194 | # For each layer 195 | for l, w in enumerate(self.weights): 196 | # Add bias term to the input 197 | inputs = self.addBiasTerms(inputs) 198 | 199 | # Calculate the output 200 | self.outputs.append(self.sigmoid(np.dot(inputs, w))) 201 | 202 | # Set this as the input to the next layer 203 | inputs = np.array(self.outputs[-1]) 204 | 205 | if DEBUG or self.visible: 206 | print "Layer "+str(l+1) 207 | print "inputs: "+str(inputs) 208 | print "weights: "+str(w) 209 | print "output: "+str(inputs) 210 | del inputs 211 | 212 | return True 213 | 214 | def backPropGradDescent(self, X, y): 215 | print "...Backward" 216 | 217 | # Correct the formats of inputs and outputs 218 | X = self.preprocessInputs(X) 219 | y = self.preprocessOutputs(y) 220 | 221 | # Compute first error 222 | bpError = self.outputs[-1] - y 223 | 224 | if DEBUG or self.visible: 225 | print "error = self.outputs[-1] - y:" 226 | print error 227 | 228 | # For each layer in reverse order (last layer to first layer) 229 | for l, w in enumerate(reversed(self.weights)): 230 | if DEBUG or self.visible: 231 | print "LAYER "+str(len(self.weights)-l) 232 | 233 | # The calculated output "z" of that layer 234 | predOutputs = self.outputs[-l-1] 235 | 236 | if DEBUG or self.visible: 237 | print "predOutputs" 238 | print predOutputs 239 | 240 | # delta = error*(z*(1-z)) === nxneurons 241 | delta = np.multiply(error, np.multiply(predOutputs, 1 - predOutputs)) 242 | 243 | if DEBUG or self.visible: 244 | print "To compute error to be backpropagated:" 245 | print "del = predOutputs*(1 - predOutputs)*error :" 246 | print delta 247 | print "weights:" 248 | print w 249 | 250 | # Compute new error to be propagated back (bias term neglected in backpropagation) 251 | bpError = np.dot(delta, w[1:,:].T) 252 | 253 | if DEBUG or self.visible: 254 | print "backprop error = np.dot(del, w[1:,:].T) :" 255 | print error 256 | 257 | # If we are 
at first layer, inputs are data points 258 | if l==len(self.weights)-1: 259 | inputs = self.addBiasTerms(X) 260 | # Else, inputs === outputs from previous layer 261 | else: 262 | inputs = self.addBiasTerms(self.outputs[-l-2]) 263 | 264 | if DEBUG or self.visible: 265 | print "To compute errorTerm:" 266 | print "inputs:" 267 | print inputs 268 | print "del:" 269 | print delta 270 | 271 | # errorTerm = (inputs.T).*(delta)/n 272 | # delta === nxneurons, inputs === nxprev, W === prevxneurons 273 | errorTerm = np.dot(inputs.T, delta)/len(y) 274 | if errorTerm.ndim==1: 275 | errorTerm.reshape((len(errorTerm), 1)) 276 | 277 | if DEBUG or self.visible: 278 | print "errorTerm = np.dot(inputs.T, del) :" 279 | print errorTerm 280 | 281 | # regularization term 282 | regWeight = np.zeros(w.shape) 283 | regWeight[1:,:] = self.regLambda #bias term neglected 284 | 285 | if DEBUG or self.visible: 286 | print "To update weights:" 287 | print "learningRate*errorTerm:" 288 | print self.learningRate*errorTerm 289 | print "regWeight:" 290 | print regWeight 291 | print "weights:" 292 | print w 293 | print "regTerm = regWeight*w :" 294 | print regWeight*w 295 | 296 | # Update weights 297 | self.weights[-l-1] = w - \ 298 | (self.learningRate*errorTerm + np.multiply(regWeight,w)) 299 | 300 | if DEBUG or self.visible: 301 | print "Updated 'weights' = learningRate*errorTerm + regTerm :" 302 | print self.weights[len(self.weights)-l-1] 303 | 304 | def adaptLR(self): 305 | if self.cost > self.prevCost: 306 | print "Cost increased!!" 307 | self.learningRate /= 2.0 308 | print " - learningRate halved to: "+str(self.learningRate) 309 | for i in range(len(self.weights)): 310 | self.weights[i] = self.oldWeights[i] 311 | print " - weights reverted back" 312 | # good function 313 | else: 314 | self.learningRate *= 1.05 315 | print " - learningRate increased by 5% to: "+str(self.learningRate) 316 | 317 | def preprocessTrainingInputs(self, X): 318 | X = self.preprocessInputs(X) 319 | if self.normalizeInputs and np.max(X) > 1.0: 320 | X = X/255.0 321 | if np.all(self.meanX == np.zeros((1, self.layers[0]))) and self.meanCentering: 322 | self.meanX = np.reshape(np.mean(X, axis=0), (1, X.shape[1])) 323 | return X 324 | 325 | def preprocessTestingInputs(self, X): 326 | X = self.preprocessInputs(X) 327 | if self.normalizeInputs and np.max(X) > 1.0: 328 | X = X/255.0 329 | return X 330 | 331 | def preprocessInputs(self, X): 332 | X = np.array(X, dtype=float) 333 | # if X is int 334 | if X.ndim==0: 335 | X = np.array([X]) 336 | # if X is 1D 337 | if X.ndim==1: 338 | if self.layers[0]==1: #if ndim=1 339 | X = np.reshape(X, (len(X),1)) 340 | else: #if X is only 1 nd-ndimensional vector 341 | X = np.reshape(X, (1,len(X))) 342 | return X 343 | 344 | def preprocessOutputs(self, Y): 345 | Y = np.array(Y, dtype=float) 346 | # if Y is int 347 | if Y.ndim==0: 348 | Y = np.array([Y]) 349 | # if Y is 1D 350 | if Y.ndim==1: 351 | if self.layers[-1]==1: 352 | Y = np.reshape(Y, (len(Y),1)) 353 | else: 354 | Y = np.reshape(Y, (1,len(Y))) 355 | return Y 356 | 357 | def addBiasTerms(self, X): 358 | if X.ndim==0 or X.ndim==1: 359 | X = np.insert(X, 0, 1) 360 | elif X.ndim==2: 361 | X = np.insert(X, 0, 1, axis=1) 362 | return X 363 | 364 | def sigmoid(self, z): 365 | return 1/(1 + np.exp(-z)) 366 | 367 | def evaluate(self, X, Y): 368 | yPreds = forwardProp(X, self.weights)[-1] 369 | test_results = [(np.argmax(yPreds[i]), np.argmax(Y[i])) 370 | for i in range(len(Y))] 371 | yes = sum(int(x == y) for (x, y) in test_results) 372 | print(str(yes)+" out of 
"+str(len(Y))) 373 | 374 | def loadMNISTData(self, path='/Users/vikram.v/Downloads/mnist.npz'): 375 | # Use numpy.load() to load the .npz file 376 | f = np.load(path) 377 | 378 | # To check files stored in .npz file 379 | f.files 380 | 381 | # Saving the files 382 | x_train = f['x_train'] 383 | y_train = f['y_train'] 384 | x_test = f['x_test'] 385 | y_test = f['y_test'] 386 | f.close() 387 | 388 | # Preprocess inputs 389 | x_train_new = np.array([x.flatten() for x in x_train]) 390 | y_train_new = np.zeros((len(y_train), 10)) 391 | for i in range(len(y_train)): 392 | y_train_new[i][y_train[i]] = 1 393 | 394 | x_test_new = np.array([x.flatten() for x in x_test]) 395 | y_test_new = np.zeros((len(y_test), 10)) 396 | for i in range(len(y_test)): 397 | y_test_new[i][y_test[i]] = 1 398 | 399 | return [x_train_new, y_train_new, x_test_new, y_test_new] 400 | -------------------------------------------------------------------------------- /myPyNNTest.py: -------------------------------------------------------------------------------- 1 | from myPyNN import * 2 | 3 | # RANDOM 4 | X = [[2,2,2], [3,3,3], [4,4,4], [5,5,5], [6,6,6], [7,7,7], [8,8,8], [9,9,9], [10,10,10], [11,11,11]] 5 | y = [.2, .3, .4, .5, .6, .7, .8, .9, 0, .1] 6 | myNN = MyPyNN([3, 10, 1]) 7 | 8 | 9 | # MANUAL CALCULATIONS TO CHECK NETWORK 10 | def addBiasTerms(X): 11 | if X.ndim==0 or X.ndim==1: 12 | X = np.insert(X, 0, 1) 13 | elif X.ndim==2: 14 | X = np.insert(X, 0, 1, axis=1) 15 | return X 16 | 17 | def sigmoid(z): 18 | return 1/(1 + np.exp(-z)) 19 | 20 | X = np.array([[0,0], [0,1], [1,0], [1,1]]) 21 | y = np.array([[0], [1], [1], [1]]) 22 | myNN = MyPyNN([2, 1, 1]) 23 | lr = 1.5 24 | nIterations = 1 25 | W01 = myNN.weights[0] 26 | W02 = myNN.weights[1] 27 | W1 = W01 28 | W2 = W02 29 | X = X.astype('float') 30 | inputs = X - np.reshape(np.mean(X, axis=0), (1, X.shape[1])) 31 | for i in range(nIterations): 32 | yPred = sigmoid(np.dot(addBiasTerms(sigmoid(np.dot(addBiasTerms(inputs), W1))), W2)) 33 | err2 = yPred - y 34 | output1 = sigmoid(np.dot(addBiasTerms(inputs), W1)) 35 | del2 = np.multiply(np.multiply(yPred, (1-yPred)), err2) 36 | err1 = np.dot(del2, W2[1:].T) 37 | deltaW2 = lr*np.dot(addBiasTerms(output1).T, del2)/len(yPred) 38 | newW2 = W2 - deltaW2 39 | del1 = np.multiply(np.multiply(output1, 1-output1), err1) 40 | deltaW1 = lr*np.dot(addBiasTerms(inputs).T, del1)/len(yPred) 41 | newW1 = W1 - deltaW1 42 | W1 = newW1 43 | W2 = newW2 44 | 45 | myNN.trainUsingGD(X, y, learningRate=lr, nIterations=nIterations, visible=True) 46 | newW1 == myNN.weights[0] 47 | newW2 == myNN.weights[1] 48 | 49 | yPred == myNN.outputs[1] 50 | output1 == myNN.outputs[0] 51 | 52 | 53 | # COMPARING LEARNING RATES 54 | myNN1 = MyPyNN([2, 3, 1]) 55 | myNN2 = MyPyNN([2, 3, 1]) 56 | myNN3 = MyPyNN([2, 3, 1]) 57 | myNN4 = MyPyNN([2, 3, 1]) 58 | myNN5 = MyPyNN([2, 3, 1]) 59 | myNN2.weights[0] = myNN1.weights[0] 60 | myNN2.weights[1] = myNN1.weights[1] 61 | myNN3.weights[0] = myNN1.weights[0] 62 | myNN3.weights[1] = myNN1.weights[1] 63 | myNN4.weights[0] = myNN1.weights[0] 64 | myNN4.weights[1] = myNN1.weights[1] 65 | myNN5.weights[0] = myNN1.weights[0] 66 | myNN5.weights[1] = myNN1.weights[1] 67 | myNN1.trainUsingGD(X, y, learningRate=0.1, nIterations=2500) 68 | myNN2.trainUsingGD(X, y, learningRate=0.5, nIterations=600) 69 | myNN3.trainUsingGD(X, y, learningRate=1, nIterations=400) 70 | myNN4.trainUsingGD(X, y, learningRate=2, nIterations=200) 71 | myNN5.trainUsingGD(X, y, learningRate=200, nIterations=1000) 72 | 73 | 74 | # Make network 75 | myNN = 
MyPyNN([784, 30, 10]) 76 | lr = 3 77 | nIterations = 30 78 | minibatchSize = 10 79 | 80 | # MNIST DATA 81 | ''' 82 | f = np.load(path) 83 | 84 | # To check files stored in .npz file 85 | f.files 86 | 87 | # Saving the files 88 | x_train = f['x_train'] 89 | y_train = f['y_train'] 90 | x_test = f['x_test'] 91 | y_test = f['y_test'] 92 | f.close() 93 | 94 | # Preprocess inputs 95 | x_train_new = np.array([x.flatten() for x in x_train]) 96 | y_train_new = np.zeros((len(y_train), 10)) 97 | for i in range(len(y_train)): 98 | y_train_new[i][y_train[i]] = 1 99 | 100 | x_test_new = np.array([x.flatten() for x in x_test]) 101 | y_test_new = np.zeros((len(y_test), 10)) 102 | for i in range(len(y_test)): 103 | y_test_new[i][y_test[i]] = 1 104 | ''' 105 | 106 | [x_train_new, y_train_new, x_test_new, y_test_new] = myNN.loadMNISTData() 107 | 108 | myNN.trainUsingGD(x_train_new, y_train_new, nIterations=nIterations, learningRate=lr) 109 | myNN.trainUsingMinibatchGD(x_train_new, y_train_new, nIterations=nIterations, minibatchSize=minibatchSize, learningRate=lr) 110 | myNN.trainUsingminibatchGD(x_train_new, y_train_new, nIterations=nIterations, minibatchSize=minibatchSize, learningRate=lr, printTestAccuracy=True, testX=x_test_new, testY=y_test_new) 111 | 112 | # Make network 113 | myNN = MyPyNN([784, 5, 5, 10]) 114 | lr = 1.5 115 | nIterations = 1000 116 | minibatchSize = 100 117 | myNN.trainUsingSGD(x_train_new, y_train_new, nIterations=nIterations, minibatchSize=minibatchSize, learningRate=lr) 118 | 119 | # To check type of the dataset 120 | type(x_train) 121 | type(y_train) 122 | # To check data 123 | x_train.shape 124 | y_train.shape 125 | fig = plt.figure(figsize=(10, 2)) 126 | for i in range(20): 127 | ax1 = fig.add_subplot(2, 10, i+1) 128 | ax1.imshow(x_train[i], cmap='gray'); 129 | ax1.axis('off') 130 | 131 | --------------------------------------------------------------------------------