├── .gitignore
├── 1_Neural_Network_Tutorial_Visualizations.ipynb
├── 2_Neural_Network_Tutorial_Matrix_Representations.ipynb
├── 3_Neural_Network_Tutorial_Writing_NN_ForwardProp_In_Python.ipynb
├── 4_Neural_Network_Tutorial_Backpropagation.ipynb
├── 5_Neural_Network_Tutorial_Training_And_Testing.ipynb
├── 6_Neural_Network_Tutorial_Descent_Experimenting_with_Optimizers.ipynb
├── MNIST experiments.ipynb
├── README.md
├── images
│   ├── Title_ANN.png
│   ├── digitsNN.png
│   └── optimizers.gif
├── myPyNN.py
└── myPyNNTest.py
/.gitignore:
--------------------------------------------------------------------------------
1 | # Byte-compiled / optimized / DLL files
2 | __pycache__/
3 | *.py[cod]
4 | *$py.class
5 |
6 | # C extensions
7 | *.so
8 |
9 | # Distribution / packaging
10 | .Python
11 | env/
12 | build/
13 | develop-eggs/
14 | dist/
15 | downloads/
16 | eggs/
17 | .eggs/
18 | lib/
19 | lib64/
20 | parts/
21 | sdist/
22 | var/
23 | *.egg-info/
24 | .installed.cfg
25 | *.egg
26 |
27 | # PyInstaller
28 | # Usually these files are written by a python script from a template
29 | # before PyInstaller builds the exe, so as to inject date/other infos into it.
30 | *.manifest
31 | *.spec
32 |
33 | # Installer logs
34 | pip-log.txt
35 | pip-delete-this-directory.txt
36 |
37 | # Unit test / coverage reports
38 | htmlcov/
39 | .tox/
40 | .coverage
41 | .coverage.*
42 | .cache
43 | nosetests.xml
44 | coverage.xml
45 | *,cover
46 | .hypothesis/
47 |
48 | # Translations
49 | *.mo
50 | *.pot
51 |
52 | # Django stuff:
53 | *.log
54 | local_settings.py
55 |
56 | # Flask stuff:
57 | instance/
58 | .webassets-cache
59 |
60 | # Scrapy stuff:
61 | .scrapy
62 |
63 | # Sphinx documentation
64 | docs/_build/
65 |
66 | # PyBuilder
67 | target/
68 |
69 | # IPython Notebook
70 | .ipynb_checkpoints
71 |
72 | # pyenv
73 | .python-version
74 |
75 | # celery beat schedule file
76 | celerybeat-schedule
77 |
78 | # dotenv
79 | .env
80 |
81 | # virtualenv
82 | venv/
83 | ENV/
84 |
85 | # Spyder project settings
86 | .spyderproject
87 |
88 | # Rope project settings
89 | .ropeproject
90 |
91 | # OS X
92 | .DS_Store
93 |
--------------------------------------------------------------------------------
/2_Neural_Network_Tutorial_Matrix_Representations.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "# Matrix representations\n",
8 | "\n",
9 | "## Matrix representations - input to the network\n",
10 | "\n",
11 | "Suppose an input has $d_{i}$ dimensions. (Remember that the input has been normalized to range between 0 and 1.)\n",
12 | "\n",
13 | "Then each input would be:\n",
14 | "\n",
 15 |     "$$X\\text{ (without bias)}_{1{\\times}d_{i}} = \\left[ \\begin{array}{c} x_{0} & x_{1} & \\cdots & x_{(d_{i}-1)} \\end{array} \\right] _{1{\\times}d_{i}}$$\n",
16 | "\n",
17 | "After adding the bias term,\n",
18 | "\n",
19 | "$$X_{1{\\times}(d_{i}+1)} = \\left[ \\begin{array}{c} 1 & X_{1{\\times}d_{i}} \\end{array} \\right] _{1{\\times}(d_{i}+1)}$$\n",
20 | "\n",
21 | "For example, one of the data points given above to make a logic gate was $(0,1)$. Here, $X = \\left[ \\begin{array}{c} 1 & 0 & 1 \\end{array} \\right]_{1{\\times}(2+1)}$"
22 | ]
23 | },
24 | {
25 | "cell_type": "markdown",
26 | "metadata": {},
27 | "source": [
 28 |     "Suppose we provide $n$ $d_{i}$-dimensional data points. For the first layer of neurons, we can stack them into an input matrix of $n{\\times}(d_{i}+1)$ dimensions (after adding the bias term to each data point).\n",
29 | "\n",
30 | "$$X^{(1)}_{n{\\times}(d_{i}+1)} = \n",
31 | "\\left[ \\begin{array}{c} 1 & _{(0)}X \\\\ 1 & _{(1)}X \\\\ \\vdots & \\vdots \\\\ 1 & _{(n-1)}X \\end{array} \\right] _{n{\\times}(d_{i}+1)}\n",
32 | "=\n",
33 | "\\left[ \\begin{array}{c} \n",
34 | "1 & _{(0)}x_{0} & _{(0)}x_{1} & _{(0)}x_{2} & \\cdots & _{(0)}x_{(d_{i}-1)} \\\\ \n",
35 | "1 & _{(1)}x_{0} & _{(1)}x_{1} & _{(1)}x_{2} & \\cdots & _{(1)}x_{(d_{i}-1)} \\\\ \n",
36 | "\\vdots & \\vdots & \\vdots & \\vdots & \\ddots & \\vdots \\\\ \n",
37 | "1 & _{(n-1)}x_{0} & _{(n-1)}x_{1} & _{(n-1)}x_{2} & \\cdots & _{(n-1)}x_{(d_{i}-1)} \n",
38 | "\\end{array} \\right] _{n{\\times}(d_{i}+1)}$$\n",
39 | "\n",
40 | "For example, for logic gates, the input matrix was $X = \\left[ \\begin{array}{c} 1 & 0 & 0 \\\\ 1 & 0 & 1 \\\\ 1 & 1 & 0 \\\\ 1 & 1 & 1 \\end{array} \\right] _{4{\\times}3} $"
41 | ]
42 | },
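 43 |   {
 44 |    "cell_type": "markdown",
 45 |    "metadata": {},
 46 |    "source": [
 47 |     "As a quick illustration (a minimal numpy sketch added for concreteness; it is not part of the derivation), the bias column can be prepended to the four logic-gate inputs like this:\n",
 48 |     "\n",
 49 |     "```python\n",
 50 |     "import numpy as np\n",
 51 |     "\n",
 52 |     "X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])  # n = 4 points, d_i = 2\n",
 53 |     "X_bias = np.insert(X, 0, 1, axis=1)             # prepend a column of 1s\n",
 54 |     "print(X_bias.shape)  # (4, 3), i.e. n x (d_i + 1)\n",
 55 |     "print(X_bias)\n",
 56 |     "```"
 57 |    ]
 58 |   },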
43 | {
44 | "cell_type": "markdown",
45 | "metadata": {},
46 | "source": [
47 | "## Matrix representations - output of a layer\n",
48 | "\n",
49 | "Suppose the output of the $l^{th}$ layer has $o_{l}$ dimensions, meaning there are $o_{l}$ neurons in the layer.\n",
50 | "\n",
 51 |     "In the above example, the 1st layer of 2 neurons has output dimension $o_{1} = 2$, and the 2nd layer of 1 neuron has output dimension $o_{2} = 1$.\n",
52 | "\n",
53 | "For each input, the output is an $o_{l}$-dimensional vector:\n",
54 | "\n",
55 | "$$Y^{(l)} = \\left[ \\begin{array}{c} y_{[0]}^{(l)} & y_{[1]}^{(l)} & \\cdots & y_{[o_{l}-1]}^{(l)} \\end{array} \\right] _{1{\\times}o_{l}}$$\n",
56 | "\n",
57 | "\n",
58 | "For example, for an AND gate, the output of $(0,1)$ is $Y = \\left[ \\begin{array}{c} 0 \\end{array} \\right] _{1{\\times}1}$"
59 | ]
60 | },
61 | {
62 | "cell_type": "markdown",
63 | "metadata": {},
64 | "source": [
65 | "Thus, for $n$ data points, the output is:\n",
66 | "\n",
67 | "$$Y^{(l)} = \\left[ \\begin{array}{c} \n",
68 | "{_{(0)}}Y^{(l)} \\\\ {_{(1)}}Y^{(l)} \\\\ \\vdots \\\\ _{(n-1)}Y^{(l)} \\end{array} \\right] _{n{\\times}o_{l}} \n",
69 | "= \\left[ \\begin{array}{c} \n",
70 | "{_{(0)}}y_{[0]}^{(l)} & \\cdots & {_{(0)}}y_{[o_{l}-1]}^{(l)} \\\\ \n",
71 | "{_{(1)}}y_{[0]}^{(l)} & \\cdots & {_{(1)}}y_{[o_{l}-1]}^{(l)} \\\\ \n",
72 | "\\vdots & \\ddots & \\vdots \\\\ \n",
73 | "_{(n-1)}y_{[0]}^{(l)} & \\cdots & _{(n-1)}y_{[o_{l}-1]}^{(l)} \n",
74 | "\\end{array} \\right] _{n{\\times}o_{l}}$$\n",
75 | "\n",
76 | "For example, for an AND gate, the output matrix is $Y = \\left[ \\begin{array}{c} 0 \\\\ 0 \\\\ 0 \\\\ 1 \\end{array} \\right] _{4{\\times}1}$"
77 | ]
78 | },
79 | {
80 | "cell_type": "markdown",
81 | "metadata": {},
82 | "source": [
83 | "## Matrix representations - input to a layer\n",
84 | "\n",
85 | "Suppose at the $l^{th}$ layer, the input has $i_{l}$ dimensions.\n",
86 | "\n",
87 | "(The number of inputs to the layer) = (1 bias term) + (the number of outputs from the previous layer):\n",
88 | "$$i_{l} = 1 + o_{(l-1)}$$\n",
89 | "\n",
90 | "In the above example, the input to the first layer of 2 neurons has $i_{1} = d_{i}+1 = 3$, and the second layer of 1 neuron has $i_{2} = o_{1} + 1 = 3$."
91 | ]
92 | },
93 | {
94 | "cell_type": "markdown",
95 | "metadata": {},
96 | "source": [
97 | "If there are $n$ data points given, the input to the $l^{th}$ layer would be an $n{\\times}i_{l} = n{\\times}(o_{(l-1)}+1)$ matrix:\n",
98 | "\n",
99 | "$$X^{(l)}_{n{\\times}i_{l}} \n",
100 | "= \\left[ \\begin{array}{c} \n",
101 | "1 & _{(0)}Y^{(l-1)} \\\\ \n",
102 | "1 & _{(1)}Y^{(l-1)} \\\\ \n",
103 | "\\vdots & \\vdots \\\\ \n",
104 | "1 & _{(n-1)}Y^{(l-1)} \n",
105 | "\\end{array} \\right] _{n{\\times}i_{l}}\n",
106 | "= \\left[ \\begin{array}{c} \n",
107 | "1 & _{(0)}y^{(l-1)}_{[0]} & \\cdots & _{(0)}y^{(l-1)}_{[o_{l-1}-1]} \\\\ \n",
108 | "1 & _{(1)}y^{(l-1)}_{[0]} & \\cdots & _{(1)}y^{(l-1)}_{[o_{l-1}-1]} \\\\ \n",
109 | "\\vdots & \\vdots & \\ddots & \\vdots \\\\ \n",
110 | "1 & _{(n-1)}y^{(l-1)}_{[0]} & \\cdots & _{(n-1)}y^{(l-1)}_{[o_{l-1}-1]} \n",
111 | "\\end{array} \\right] _{n{\\times}i_{l}}$$\n",
112 | "\n",
113 |     "For example, in the 3-neuron neural network above, the input matrix to the first layer is $\\left[ \\begin{array}{c} 1 & x_0 & x_1 \\end{array} \\right] _{1{\\times}3}$, and the input matrix to the second layer is $\\left[ \\begin{array}{c} 1 & y_0 & y_1 \\end{array} \\right] _{1{\\times}3}$"
114 | ]
115 | },
116 | {
117 | "cell_type": "markdown",
118 | "metadata": {},
119 | "source": [
120 | "## Matrix representations - weight matrix of one neuron\n",
121 | "\n",
122 |     "For a neuron, each dimension of the input is multiplied by its corresponding weight, and the products are summed. This can be represented by a dot product.\n",
123 | "\n",
124 | "Assuming the input to the $k^{th}$ neuron in the $l^{th}$ layer has $i_{l}$ dimensions,\n",
125 | "\n",
126 | "$$W^{(l)}_{[k]} {_{1{\\times}i_{l}}} = \\left[ \\begin{array}{c} w^{(l)}_{[k],0} & w^{(l)}_{[k],1} & \\cdots & w^{(l)}_{[k],i_{l}-1} \\end{array} \\right] _{1{\\times}i_{l}}$$\n",
127 | "\n",
128 | "(Remember $i_{l} = 1 + o_{(l-1)}$)"
129 | ]
130 | },
131 | {
132 | "cell_type": "markdown",
133 | "metadata": {},
134 | "source": [
135 |     "Then the output of that neuron for one data point is the sigmoid of the dot product of the input $x$ with the weights $W$:\n",
136 | "\n",
137 | "$$y^{(l)}_{[k]} {_{1{\\times}1}} = Sigmoid( x^{(l)} {_{1{\\times}i_{l}}} \\; .* \\; W^{(l)}_{[k]}{^T}{_{i_{l}{\\times}1}} )$$\n",
138 | "\n",
139 | "$$\n",
140 | "=\n",
141 | "Sigmoid \\left(\n",
142 | "x^{(l)}_{[k]}\n",
143 | "\\left[ \\begin{array}{c} 1 & y^{(l-1)}_{0} & \\cdots & y^{(l-1)}_{(o_{l-1}-1)}\n",
144 | "\\end{array} \\right] _{1{\\times}i_{l}}\n",
145 | "\\;\\;\\; .* \\;\\;\\;\n",
146 | "W^{(l)}_{[k]} {^{T}}\n",
147 | "\\left[ \\begin{array}{c} w^{(l)}_{[k],0} \\\\ w^{(l)}_{[k],1} \\\\ \\vdots \\\\ w^{(l)}_{[k],i_{l}-1} \\end{array} \\right] _{i_{l}{\\times}1}\n",
148 | "\\right)\n",
149 | "$$\n",
150 | "\n",
151 | "$$\n",
152 | "= Sigmoid(1*w^{(l)}_{[k],0} \\;\\;+\\;\\; y^{(l-1)}_{0}*w^{(l)}_{[k],1} \\;\\;+\\;\\; ... \\;\\;+\\;\\; y^{(l-1)}_{o_{l-1}-1}*w^{(l)}_{[k],i_{l}-1})\n",
153 | "$$\n",
154 | "\n",
155 | "(We can see that the dot product of the $x$ and $W$ matrices does indeed give the output of the neuron)"
156 | ]
157 | },
158 | {
159 | "cell_type": "markdown",
160 | "metadata": {},
161 | "source": [
162 | "For $n$ data points, the output of the $k^{th}$ neuron in the $l^{th}$ layer is:\n",
163 | "$$Y^{(l)}_{[k]} {_{n{\\times}1}}\n",
164 | "=\n",
165 | "Sigmoid \\left(\n",
166 | "X^{(l)}_{[k]}\n",
167 | "\\left[ \\begin{array}{c} \n",
168 | "1 & _{(0)}y^{(l-1)}_{0} & \\cdots & _{(0)}y^{(l-1)}_{(o_{l-1}-1)} \\\\\n",
169 | "1 & _{(1)}y^{(l-1)}_{0} & \\cdots & _{(1)}y^{(l-1)}_{(o_{l-1}-1)} \\\\\n",
170 | "\\vdots & \\vdots & \\ddots & \\vdots \\\\\n",
171 | "1 & _{(n-1)}y^{(l-1)}_{0} & \\cdots & _{(n-1)}y^{(l-1)}_{(o_{l-1}-1)}\n",
172 | "\\end{array} \\right] _{n{\\times}i_{l}}\n",
173 | "\\; .* \\;\n",
174 | "W^{(l)}_{[k]} {^{T}}\n",
175 | "\\left[ \\begin{array}{c} w^{(l)}_{[k],0} \\\\ w^{(l)}_{[k],1} \\\\ \\vdots \\\\ w^{(l)}_{[k],i_{l}-1} \\end{array} \\right] _{i_{l}{\\times}1}\n",
176 | "\\right)\n",
177 | "$$\n",
178 | "\n",
179 | "$$\n",
180 | "=\n",
181 | "Sigmoid \\left(\n",
182 | "\\left[ \\begin{array}{c} \n",
183 | "1*w^{(l)}_{[k],0} \\;\\;+\\;\\; _{(0)}y^{(l-1)}_{(0)}*w^{(l)}_{[k],1} \\;\\;+\\;\\; ... \\;\\;+\\;\\; _{(0)}y^{(l-1)}_{o_{l-1}-1}*w^{(l)}_{[k],i_{l}-1} \\\\\n",
184 | "1*w^{(l)}_{[k],0} \\;\\;+\\;\\; _{(1)}y^{(l-1)}_{0}*w^{(l)}_{[k],1} \\;\\;+\\;\\; ... \\;\\;+\\;\\; _{(1)}y^{(l-1)}_{o_{l-1}-1}*w^{(l)}_{[k],i_{l}-1} \\\\\n",
185 | "\\vdots \\\\\n",
186 | "1*w^{(l)}_{[k],0} \\;\\;+\\;\\; _{(n-1)}y^{(l-1)}_{0}*w^{(l)}_{[k],1} \\;\\;+\\;\\; ... \\;\\;+\\;\\; _{(n-1)}y^{(l-1)}_{o_{l-1}-1}*w^{(l)}_{[k],i_{l}-1} \\\\\n",
187 | "\\end{array} \\right] _{n{\\times}1}\n",
188 | "\\right)\n",
189 | "$$"
190 | ]
191 | },
192 | {
193 | "cell_type": "markdown",
194 | "metadata": {},
195 | "source": [
196 | "## Matrix representations - weight of a layer of neurons\n",
197 | "\n",
198 | "Suppose the $l^{th}$ layer in a neural network has $o_{l}$ neurons.\n",
199 | "\n",
200 | "Each neuron would produce one number as its output - the dot product of its weights, and the inputs.\n",
201 | "\n",
202 | "In matrix form, the weight matrix of the layer is:\n",
203 | "\n",
204 | "$$\n",
205 | "W^{(l)}_{o_{l}{\\times}i_{l}} = \\left[ \\begin{array}{c} W^{(l)}_{[0]} \\\\ W^{(l)}_{[1]} \\\\ \\cdots \\\\ W^{(l)}_{[o_{l}-1]} \\end{array} \\right] _{o_{l}{\\times}i_{l}} \n",
206 | "= \n",
207 | "\\left[ \\begin{array}{c} \n",
208 | "w^{(l)}_{[0],0} & w^{(l)}_{[0],1} & w^{(l)}_{[0],2} & \\cdots & w^{(l)}_{[0],i_{l}-1} \\\\ \n",
209 | "w^{(l)}_{[1],0} & w^{(l)}_{[1],1} & w^{(l)}_{[1],2} & \\cdots & w^{(l)}_{[1],i_{l}-1} \\\\ \n",
210 | "\\vdots & \\vdots & \\vdots & \\ddots & \\vdots \\\\ \n",
211 | "w^{(l)}_{[o_{l}-1],0} & w^{(l)}_{[o_{l}-1],1} & w^{(l)}_{[o_{l}-1],2} & \\cdots & w^{(l)}_{[o_{l}-1],i_{l}-1} \n",
212 | "\\end{array} \\right] _{o_{l}{\\times}i_{l}}\n",
213 | "$$"
214 | ]
215 | },
216 | {
217 | "cell_type": "markdown",
218 | "metadata": {},
219 | "source": [
220 | "The output of this layer of neurons is:\n",
221 | "\n",
222 | "$$ Y^{(l)}_{n{\\times}o_{l}} = Sigmoid\\;(\\;X^{(l)}_{n{\\times}i_{l}} \\; .* \\; W^{(l)}{^{T}}_{i_{l}{\\times}o_{l}} \\;)\\; $$\n",
223 | "\n",
224 | "$$\n",
225 | "Y^{(l)}_{n{\\times}o_{l}} \\left[ \\begin{array}{c} \n",
226 | "{_{(0)}}y_{0}^{(l)} & \\cdots & {_{(0)}}y_{o_{l}-1}^{(l)} \\\\ \n",
227 | "{_{(1)}}y_{0}^{(l)} & \\cdots & {_{(1)}}y_{o_{l}-1}^{(l)} \\\\ \n",
228 | "\\vdots & \\ddots & \\vdots \\\\ \n",
229 | "_{(n-1)}y_{0}^{(l)} & \\cdots & _{(n-1)}y_{o_{l}-1}^{(l)} \n",
230 | "\\end{array} \\right] _{n{\\times}o_{l}}\n",
231 | "=\n",
232 | "Sigmoid \\left(\n",
233 | "X^{(l)}_{n{\\times}i_{l}} \\left[ \\begin{array}{c} \n",
234 | "1 & _{(0)}y^{(l-1)}_{0} & \\cdots & _{(0)}y^{(l-1)}_{(o_{l-1}-1)} \\\\ \n",
235 | "1 & _{(1)}y^{(l-1)}_{0} & \\cdots & _{(1)}y^{(l-1)}_{(o_{l-1}-1)} \\\\ \n",
236 | "\\vdots & \\vdots & \\ddots & \\vdots \\\\ \n",
237 | "1 & _{(n-1)}y^{(l-1)}_{0} & \\cdots & _{(n-1)}y^{(l-1)}_{(o_{l-1}-1)} \n",
238 | "\\end{array} \\right] _{n{\\times}i_{l}}\n",
239 | "\\; .* \\;\n",
240 | "W^{(l)}{^{T}}_{i_{l}{\\times}o_{l}} \\left[ \\begin{array}{c} \n",
241 |     "w^{(l)}_{[0],0} & w^{(l)}_{[1],0} & \\cdots & w^{(l)}_{[o_{l}-1],0} \\\\ \n",
242 |     "w^{(l)}_{[0],1} & w^{(l)}_{[1],1} & \\cdots & w^{(l)}_{[o_{l}-1],1} \\\\ \n",
243 |     "\\vdots & \\vdots & \\ddots & \\vdots \\\\ \n",
244 |     "w^{(l)}_{[0],i_{l}-1} & w^{(l)}_{[1],i_{l}-1} & \\cdots & w^{(l)}_{[o_{l}-1],i_{l}-1} \n",
245 | "\\end{array} \\right] _{i_{l}{\\times}o_{l}}\n",
246 | "\\right)\n",
247 | "$$\n",
248 | "\n",
249 | "$$\n",
250 | "=\n",
251 | "Sigmoid \\left(\n",
252 | "\\left[ \\begin{array}{c} \n",
253 |     "1*w^{(l)}_{[0],0} + \\cdots + _{(0)}y^{(l-1)}_{(o_{l-1}-1)}*w^{(l)}_{[0],i_{l}-1}\n",
254 | "&\n",
255 | "\\cdots\n",
256 | "&\n",
257 |     "1*w^{(l)}_{[(o_{l}-1)],0} + \\cdots + _{(0)}y^{(l-1)}_{(o_{l-1}-1)}*w^{(l)}_{[(o_{l}-1)],i_{l}-1}\n",
258 | "\\\\\n",
259 | "\\vdots & \\ddots & \\vdots\n",
260 | "\\\\\n",
261 |     "1*w^{(l)}_{[0],0} + \\cdots + _{(n-1)}y^{(l-1)}_{(o_{l-1}-1)}*w^{(l)}_{[0],i_{l}-1}\n",
262 | "&\n",
263 | "\\cdots\n",
264 | "&\n",
265 |     "1*w^{(l)}_{[(o_{l}-1)],0} + \\cdots + _{(n-1)}y^{(l-1)}_{(o_{l-1}-1)}*w^{(l)}_{[(o_{l}-1)],i_{l}-1}\n",
266 | "\\end{array} \\right] _{n{\\times}o_{l}}\n",
267 | "\\right)\n",
268 | "$$"
269 | ]
270 | },
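271 |   {
272 |    "cell_type": "markdown",
273 |    "metadata": {},
274 |    "source": [
275 |     "To make this concrete, here is a minimal numpy sketch of $Y^{(l)} = Sigmoid(X^{(l)} \\; .* \\; W^{(l)}{^{T}})$ for one layer of 2 neurons on the 4 logic-gate inputs (the weight values below are made up purely for illustration):\n",
276 |     "\n",
277 |     "```python\n",
278 |     "import numpy as np\n",
279 |     "\n",
280 |     "def sigmoid(a):\n",
281 |     "    return 1 / (1 + np.exp(-a))\n",
282 |     "\n",
283 |     "# X already contains the bias column: shape (n, i_l) = (4, 3)\n",
284 |     "X = np.array([[1, 0, 0], [1, 0, 1], [1, 1, 0], [1, 1, 1]])\n",
285 |     "\n",
286 |     "# One row of weights per neuron: shape (o_l, i_l) = (2, 3)\n",
287 |     "W = np.array([[-30.0, 20.0, 20.0],\n",
288 |     "              [ 10.0, -20.0, -20.0]])\n",
289 |     "\n",
290 |     "Y = sigmoid(np.dot(X, W.T))  # shape (n, o_l) = (4, 2)\n",
291 |     "print(Y)\n",
292 |     "```"
293 |    ]
294 |   },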
271 | {
272 | "cell_type": "markdown",
273 | "metadata": {},
274 | "source": [
275 | "## Conclusion\n",
276 | "\n",
277 | "We have seen that the action of a layer of a neural network can be written as the following matrix operation:\n",
278 | "\n",
279 | "$$ Y^{(l)}_{n{\\times}o_{l}} = Sigmoid\\;(\\;X^{(l)}_{n{\\times}i_{l}} \\; .* \\; W^{(l)}{^{T}}_{i_{l}{\\times}o_{l}} \\;)\\; $$\n",
280 | "\n",
281 |     "So, a neural network can be defined by the set of weight matrices $W^{(l)}_{o_{l}{\\times}i_{l}}$ for all its layers, where $l$ is the index of the layer we are considering, and $i_{l}$ and $o_{l}$ are its input and output dimensions.\n",
282 | "\n",
283 | "Also, because of adding a bias term at every layer,\n",
284 | "\n",
285 | "$$i_{l} = 1 + o_{(l-1)}$$\n",
286 | "\n",
287 |     "The utility of neural networks can be exploited only once the weight matrices $W^{(l)}_{o_{l}{\\times}i_{l}}$ for all $l$ have been set according to need."
288 | ]
289 | }
290 | ],
291 | "metadata": {
292 | "kernelspec": {
293 | "display_name": "Python 3",
294 | "language": "python",
295 | "name": "python3"
296 | },
297 | "language_info": {
298 | "codemirror_mode": {
299 | "name": "ipython",
300 | "version": 3
301 | },
302 | "file_extension": ".py",
303 | "mimetype": "text/x-python",
304 | "name": "python",
305 | "nbconvert_exporter": "python",
306 | "pygments_lexer": "ipython3",
307 | "version": "3.5.1"
308 | }
309 | },
310 | "nbformat": 4,
311 | "nbformat_minor": 2
312 | }
313 |
--------------------------------------------------------------------------------
/3_Neural_Network_Tutorial_Writing_NN_ForwardProp_In_Python.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "code",
5 | "execution_count": 19,
6 | "metadata": {
7 | "collapsed": true
8 | },
9 | "outputs": [],
10 | "source": [
11 | "# Pre-requisites\n",
12 | "import numpy as np"
13 | ]
14 | },
15 | {
16 | "cell_type": "markdown",
17 | "metadata": {},
18 | "source": [
19 | "# Writing a neural network in python\n",
20 | "\n",
21 | "Firstly, a neural network is defined by the number of layers, and the number of neurons in each layer.\n",
22 | "\n",
23 | "Let us use a list to denote this."
24 | ]
25 | },
26 | {
27 | "cell_type": "markdown",
28 | "metadata": {},
29 | "source": [
30 | "## Defining layer sizes"
31 | ]
32 | },
33 | {
34 | "cell_type": "code",
35 | "execution_count": 31,
36 | "metadata": {},
37 | "outputs": [
38 | {
39 | "name": "stdout",
40 | "output_type": "stream",
41 | "text": [
 42 |       "[2, 2, 1]\n"
43 | ]
44 | }
45 | ],
46 | "source": [
47 | "# Defining the sizes of the layers in our neural network\n",
48 | "layers = [2, 2, 1]\n",
 49 |     "print(layers)"
50 | ]
51 | },
52 | {
53 | "cell_type": "markdown",
54 | "metadata": {},
55 | "source": [
56 | "The above code denotes the 3-neuron neural network we saw previously: 2-dimensional input, 2 neurons in a hidden layer, 1 neuron in the output layer.\n",
57 | "\n",
 58 |     "Generally speaking, a neural network that has more than 1 hidden layer is a **deep** neural network."
59 | ]
60 | },
61 | {
62 | "cell_type": "markdown",
63 | "metadata": {},
64 | "source": [
65 | "## Defining weight matrices\n",
66 | "\n",
 67 |     "Using the sizes of the layers in our neural network, let us initialize the weight matrices to random values (sampled from a standard normal distribution, since we need both positive and negative weights)."
68 | ]
69 | },
70 | {
71 | "cell_type": "code",
72 | "execution_count": 3,
73 | "metadata": {
74 | "collapsed": true
75 | },
76 | "outputs": [],
77 | "source": [
78 | "# Initializing weight matrices from layer sizes\n",
79 | "def initializeWeights(layers):\n",
80 | " weights = [np.random.randn(o, i+1) for i, o in zip(layers[:-1], layers[1:])]\n",
81 | " return weights"
82 | ]
83 | },
84 | {
85 | "cell_type": "code",
86 | "execution_count": 37,
87 | "metadata": {},
88 | "outputs": [
89 | {
90 | "name": "stdout",
91 | "output_type": "stream",
92 | "text": [
93 | "1\n",
94 | "(2, 3)\n",
95 | "[[ 0.45147937 2.36764603 -0.44038386]\n",
96 | " [ 1.25899973 -1.06551598 0.20563357]]\n",
97 | "2\n",
98 | "(1, 3)\n",
99 | "[[-0.76261718 -0.90078965 -0.01774495]]\n"
100 | ]
101 | }
102 | ],
103 | "source": [
104 | "# Displaying weight matrices\n",
105 | "layers = [2, 2, 1]\n",
106 | "weights = initializeWeights(layers)\n",
107 | "\n",
108 | "for i in range(len(weights)):\n",
109 | " print(i+1); print(weights[i].shape); print(weights[i])"
110 | ]
111 | },
112 | {
113 | "cell_type": "markdown",
114 | "metadata": {},
115 | "source": [
116 | "# Forward Propagation\n",
117 | "\n",
118 | "The output of the neural network is calculated by **propagating forward** the outputs of each layer.\n",
119 | "\n",
120 | "Let us define our input as an np.array, since we want to represent matrices."
121 | ]
122 | },
123 | {
124 | "cell_type": "code",
125 | "execution_count": 9,
126 | "metadata": {
127 | "collapsed": true
128 | },
129 | "outputs": [],
130 | "source": [
131 | "# We shall use np.array() to represent matrices\n",
132 | "#X = np.array([23, 42, 56])\n",
133 | "X = np.array([[0,0], [0,1], [1,0], [1,1]])"
134 | ]
135 | },
136 | {
137 | "cell_type": "markdown",
138 | "metadata": {},
139 | "source": [
140 | "## Adding bias terms\n",
141 | "\n",
142 | "Since the input to every layer needs a bias term (1) added to it, let us define a function to do that."
143 | ]
144 | },
145 | {
146 | "cell_type": "code",
147 | "execution_count": 29,
148 | "metadata": {
149 | "collapsed": true
150 | },
151 | "outputs": [],
152 | "source": [
153 | "# Add a bias term to every data point in the input\n",
154 | "def addBiasTerms(X):\n",
155 | " # Make the input an np.array()\n",
156 | " X = np.array(X)\n",
157 | " \n",
158 | " # Forcing 1D vectors to be 2D matrices of 1xlength dimensions\n",
159 | " if X.ndim==1:\n",
160 | " X = np.reshape(X, (1, len(X)))\n",
161 | " \n",
162 | " # Inserting bias terms\n",
163 | " X = np.insert(X, 0, 1, axis=1)\n",
164 | " \n",
165 | " return X"
166 | ]
167 | },
168 | {
169 | "cell_type": "markdown",
170 | "metadata": {},
171 | "source": [
172 | "Use the following cell to test the addBiasTerms function:"
173 | ]
174 | },
175 | {
176 | "cell_type": "code",
177 | "execution_count": 30,
178 | "metadata": {},
179 | "outputs": [
180 | {
181 | "name": "stdout",
182 | "output_type": "stream",
183 | "text": [
184 | "Before adding bias terms: \n",
185 | "[[0 0]\n",
186 | " [0 1]\n",
187 | " [1 0]\n",
188 | " [1 1]]\n",
189 | "After adding bias terms: \n",
190 | "[[1 0 0]\n",
191 | " [1 0 1]\n",
192 | " [1 1 0]\n",
193 | " [1 1 1]]\n"
194 | ]
195 | }
196 | ],
197 | "source": [
198 | "# TESTING addBiasTerms\n",
199 | "\n",
200 | "# We shall use np.array() to represent matrices\n",
201 | "#X = np.array([23, 42, 56])\n",
202 | "X = np.array([[0,0], [0,1], [1,0], [1,1]])\n",
203 | "print(\"Before adding bias terms: \"); print(X)\n",
204 | "X = addBiasTerms(X)\n",
205 | "print(\"After adding bias terms: \"); print(X)"
206 | ]
207 | },
208 | {
209 | "cell_type": "markdown",
210 | "metadata": {},
211 | "source": [
212 | "## Sigmoid function\n",
213 | "\n",
214 | "Let us also define a function to calculate the sigmoid of any np.array given to it:"
215 | ]
216 | },
217 | {
218 | "cell_type": "code",
219 | "execution_count": 13,
220 | "metadata": {
221 | "collapsed": true
222 | },
223 | "outputs": [],
224 | "source": [
225 | "# Sigmoid function\n",
226 | "def sigmoid(a):\n",
227 | " return 1/(1 + np.exp(-a))"
228 | ]
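229 |   {
230 |    "cell_type": "markdown",
231 |    "metadata": {},
232 |    "source": [
233 |     "A couple of quick spot checks (illustrative values, using the sigmoid function and numpy import above) confirm the expected behaviour: large negative inputs go to 0, zero goes to 0.5, and large positive inputs go to 1.\n",
234 |     "\n",
235 |     "```python\n",
236 |     "print(sigmoid(np.array([-10, 0, 10])))  # approx [4.5e-05, 0.5, 0.99995]\n",
237 |     "```"
238 |    ]
239 |   },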
229 | },
230 | {
231 | "cell_type": "markdown",
232 | "metadata": {},
233 | "source": [
234 | "## Forward propagation of inputs\n",
235 | "\n",
236 |     "Let us store the outputs of the layers in a list called \"outputs\". We shall use the output of one layer as the input to the next layer."
237 | ]
238 | },
239 | {
240 | "cell_type": "code",
241 | "execution_count": 17,
242 | "metadata": {
243 | "collapsed": true
244 | },
245 | "outputs": [],
246 | "source": [
247 | "# Forward Propagation of outputs\n",
248 | "def forwardProp(X, weights):\n",
249 | " # Initializing an empty list of outputs\n",
250 | " outputs = []\n",
251 | " \n",
252 | " # Assigning a name to reuse as inputs\n",
253 | " inputs = X\n",
254 | " \n",
255 | " # For each layer\n",
256 | " for w in weights:\n",
257 | " # Add bias term to input\n",
258 | " inputs = addBiasTerms(inputs)\n",
259 | " \n",
260 | " # Y = Sigmoid ( X .* W^T )\n",
261 | " outputs.append(sigmoid(np.dot(inputs, w.T)))\n",
262 | " \n",
263 | " # Input of next layer is output of this layer\n",
264 | " inputs = outputs[-1]\n",
265 | " \n",
266 | " return outputs"
267 | ]
268 | },
269 | {
270 | "cell_type": "markdown",
271 | "metadata": {},
272 | "source": [
273 | "Use the following cell to test forward propagation:"
274 | ]
275 | },
276 | {
277 | "cell_type": "code",
278 | "execution_count": 24,
279 | "metadata": {},
280 | "outputs": [
281 | {
282 | "name": "stdout",
283 | "output_type": "stream",
284 | "text": [
285 | "weights:\n",
286 | "1\n",
287 | "(2, 3)\n",
288 | "[[-250 350 350]\n",
289 | " [-250 200 200]]\n",
290 | "2\n",
291 | "(1, 3)\n",
292 | "[[-100 500 -500]]\n",
293 | "X:\n",
294 | "[[0, 0], [0, 1], [1, 0], [1, 1]]\n",
295 | "outputs:\n",
296 | "1\n",
297 | "(4, 2)\n",
298 | "[[ 2.66919022e-109 2.66919022e-109]\n",
299 | " [ 1.00000000e+000 1.92874985e-022]\n",
300 | " [ 1.00000000e+000 1.92874985e-022]\n",
301 | " [ 1.00000000e+000 1.00000000e+000]]\n",
302 | "2\n",
303 | "(4, 1)\n",
304 | "[[ 3.72007598e-44]\n",
305 | " [ 1.00000000e+00]\n",
306 | " [ 1.00000000e+00]\n",
307 | " [ 3.72007598e-44]]\n"
308 | ]
309 | }
310 | ],
311 | "source": [
312 | "# VIEWING FORWARD PROPAGATION\n",
313 | "\n",
314 | "# Initialize network\n",
315 | "layers = [2, 2, 1]\n",
316 | "#weights = initializeWeights(layers)\n",
317 | "\n",
318 | "# 3-neuron network\n",
319 | "weights = []\n",
320 | "weights.append(np.array([[-250, 350, 350], [-250, 200, 200]]))\n",
321 | "weights.append(np.array([[-100, 500, -500]]))\n",
322 | "\n",
323 | "print(\"weights:\")\n",
324 | "for i in range(len(weights)):\n",
325 | " print(i+1); print(weights[i].shape); print(weights[i])\n",
326 | "\n",
327 | "# Input\n",
328 | "X = [[0,0], [0,1], [1,0], [1,1]]\n",
329 | "\n",
330 | "print(\"X:\"); print(X)\n",
331 | "\n",
332 | "# Forward propagate X, and save outputs\n",
333 | "outputs = forwardProp(X, weights)\n",
334 | "\n",
335 | "print(\"outputs:\")\n",
336 | "for o in range(len(outputs)):\n",
337 | " print(o+1); print(outputs[o].shape); print(outputs[o])"
338 | ]
339 | },
340 | {
341 | "cell_type": "code",
342 | "execution_count": null,
343 | "metadata": {
344 | "collapsed": true
345 | },
346 | "outputs": [],
347 | "source": []
348 | }
349 | ],
350 | "metadata": {
351 | "kernelspec": {
352 | "display_name": "Python 3",
353 | "language": "python",
354 | "name": "python3"
355 | },
356 | "language_info": {
357 | "codemirror_mode": {
358 | "name": "ipython",
359 | "version": 3
360 | },
361 | "file_extension": ".py",
362 | "mimetype": "text/x-python",
363 | "name": "python",
364 | "nbconvert_exporter": "python",
365 | "pygments_lexer": "ipython3",
366 | "version": "3.5.1"
367 | }
368 | },
369 | "nbformat": 4,
370 | "nbformat_minor": 2
371 | }
372 |
--------------------------------------------------------------------------------
/4_Neural_Network_Tutorial_Backpropagation.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "code",
5 | "execution_count": 707,
6 | "metadata": {
7 | "collapsed": true
8 | },
9 | "outputs": [],
10 | "source": [
11 | "# Pre-requisites\n",
12 | "import numpy as np\n",
13 | "import time\n",
14 | "\n",
15 | "# To clear print buffer\n",
16 | "from IPython.display import clear_output"
17 | ]
18 | },
19 | {
20 | "cell_type": "markdown",
21 | "metadata": {},
22 | "source": [
23 | "# Importing code from previous tutorial:"
24 | ]
25 | },
26 | {
27 | "cell_type": "code",
28 | "execution_count": 30,
29 | "metadata": {
30 | "collapsed": true
31 | },
32 | "outputs": [],
33 | "source": [
34 | "# Initializing weight matrices from layer sizes\n",
35 | "def initializeWeights(layers):\n",
36 | " weights = [np.random.randn(o, i+1) for i, o in zip(layers[:-1], layers[1:])]\n",
37 | " return weights\n",
38 | "\n",
39 | "# Add a bias term to every data point in the input\n",
40 | "def addBiasTerms(X):\n",
41 | " # Make the input an np.array()\n",
42 | " X = np.array(X)\n",
43 | " \n",
44 | " # Forcing 1D vectors to be 2D matrices of 1xlength dimensions\n",
45 | " if X.ndim==1:\n",
46 | " X = np.reshape(X, (1, len(X)))\n",
47 | " \n",
48 | " # Inserting bias terms\n",
49 | " X = np.insert(X, 0, 1, axis=1)\n",
50 | " \n",
51 | " return X\n",
52 | "\n",
53 | "# Sigmoid function\n",
54 | "def sigmoid(a):\n",
55 | " return 1/(1 + np.exp(-a))\n",
56 | "\n",
57 | "# Forward Propagation of outputs\n",
58 | "def forwardProp(X, weights):\n",
59 | " # Initializing an empty list of outputs\n",
60 | " outputs = []\n",
61 | " \n",
62 | " # Assigning a name to reuse as inputs\n",
63 | " inputs = X\n",
64 | " \n",
65 | " # For each layer\n",
66 | " for w in weights:\n",
67 | " # Add bias term to input\n",
68 | " inputs = addBiasTerms(inputs)\n",
69 | " \n",
70 | " # Y = Sigmoid ( X .* W^T )\n",
71 | " outputs.append(sigmoid(np.dot(inputs, w.T)))\n",
72 | " \n",
73 | " # Input of next layer is output of this layer\n",
74 | " inputs = outputs[-1]\n",
75 | " \n",
76 | " return outputs"
77 | ]
78 | },
79 | {
80 | "cell_type": "markdown",
81 | "metadata": {},
82 | "source": [
83 | "# Training Neural Networks\n",
84 | "\n",
85 | "$$ Y^{(l)}_{n{\\times}o_{l}} = Sigmoid\\;(\\;X^{(l)}_{n{\\times}i_{l}} \\; .* \\; W^{(l)}{^{T}}_{i_{l}{\\times}o_{l}}) \\;\\;\\;\\;\\;\\;-------------(1)$$\n",
86 | "\n",
 87 |     "Neural networks are advantageous when we are able to compute the $W$ which satisfies $Y = Sigmoid(X\\;.*\\;W^{T})$, for given $X$ and $Y$ (in supervised training).\n",
88 | "\n",
89 | "But, since there are so many weights (for bigger networks), it is time-intensive to algebraically solve the above equation. (Something like $W = X^{-1} \\;.*\\; Sigmoid^{-1}(Y)$...)"
90 | ]
91 | },
92 | {
93 | "cell_type": "markdown",
94 | "metadata": {},
95 | "source": [
96 | "## Set W to minimize cost (computationally intensive)\n",
97 | "\n",
98 | "A quicker way to compute W would be to randomly initialize it, and keep updating its value in such a way as to decrease the cost of the neural network.\n",
99 | "\n",
100 | "Define the cost as the mean squared error of the output of the neural network:\n",
101 | "\n",
102 | "$$error = yPred-Y$$\n",
103 | "\n",
104 | "Here, $yPred$ = ``forwardProp``$(X)$, and $Y$ is the desired output value from the neural network.\n",
105 | "\n",
106 | "$$Cost \\; J = \\frac{1}{2} \\sum \\limits_{n} \\frac{ {\\left( error \\right)}^2 }{n} = \\frac{1}{2} \\sum \\limits_{n} \\frac{ {\\left( yPred-Y \\right)}^2 }{n}$$\n",
107 | "\n",
108 | "Once we have initialized W, we need to change it such that J is minimized.\n",
109 | "\n",
110 |     "The most direct way to minimize J w.r.t. W is to take the partial derivative of J w.r.t. W and equate it to 0: $\\frac{{\\partial}J}{{\\partial}W} = 0$. But this is computationally intensive."
111 | ]
112 | },
113 | {
114 | "cell_type": "code",
115 | "execution_count": 433,
116 | "metadata": {
117 | "collapsed": true
118 | },
119 | "outputs": [],
120 | "source": [
121 | "# Compute COST (J) of Neural Network\n",
122 | "def nnCost(weights, X, Y):\n",
123 | " # Calculate yPred\n",
124 | " yPred = forwardProp(X, weights)[-1]\n",
125 | " \n",
126 | " # Compute J\n",
127 | " J = 0.5*np.sum((yPred-Y)**2)/len(Y)\n",
128 | " \n",
129 | " return J"
130 | ]
131 | },
132 | {
133 | "cell_type": "code",
134 | "execution_count": 434,
135 | "metadata": {
136 | "collapsed": true
137 | },
138 | "outputs": [],
139 | "source": [
140 | "# Initialize network\n",
141 | "layers = [2, 2, 1]\n",
142 | "weights = initializeWeights(layers)"
143 | ]
144 | },
145 | {
146 | "cell_type": "code",
147 | "execution_count": 435,
148 | "metadata": {
149 | "collapsed": true
150 | },
151 | "outputs": [],
152 | "source": [
153 | "# Declare input and desired output for AND gate\n",
154 | "X = np.array([[0,0], [0,1], [1,0], [1,1]])\n",
155 | "Y = np.array([[0], [0], [0], [1]])"
156 | ]
157 | },
158 | {
159 | "cell_type": "code",
160 | "execution_count": 436,
161 | "metadata": {},
162 | "outputs": [
163 | {
164 | "name": "stdout",
165 | "output_type": "stream",
166 | "text": [
167 | "0.284231765606\n"
168 | ]
169 | }
170 | ],
171 | "source": [
172 | "# Cost\n",
173 | "J = nnCost(weights, X, Y)\n",
174 | "print(J)"
175 | ]
176 | },
177 | {
178 | "cell_type": "markdown",
179 | "metadata": {},
180 | "source": [
181 | "## Randomly initialize W, change it to decrease cost (more feasible)\n",
182 | "\n",
183 | "Instead, we initialize $W$ by randomly sampling from a standard normal distribution, and then keep changing $W$ so as to decrease the cost $J$.\n",
184 | "\n",
185 |     "But by what amount should we change $W$? To find out, let us focus on the weights of one of the neurons in the last layer, $W^{(L)}_{[k]}$, and differentiate $J$ with respect to it to see what we get:\n",
186 | "\n",
187 | "$$\\frac{ {\\partial}J} {{\\partial}W^{(L)}_{[k]} }=\\frac{\\partial}{{\\partial}W^{(L)}_{[k]}}\\left(\\frac{1}{2}\\sum\\limits_{n}{\\frac{ {\\left( yPred-Y \\right)}^2 }{n} }\\right)=\\frac{1}{2*n}\\sum\\limits_{n} \\left( \\frac{\\partial} {{\\partial}W^{(L)}_{[k]}} (yPred-Y)^2 \\right)=\\frac{1}{n}\\sum\\limits_{n} \\left( (yPred-Y) * \\frac {{\\partial} \\; yPred} { {\\partial}W^{(L)}_{[k]} } \\right)$$\n",
188 | "\n",
189 | "$$\\Rightarrow \\frac{ {\\partial}J} {{\\partial}W^{(L)}_{[k]} } = \\frac{1}{n}\\sum\\limits_{n} \\left( (error) * \\frac {{\\partial} \\; yPred} { {\\partial}W^{(L)}_{[k]} } \\right)$$"
190 | ]
191 | },
192 | {
193 | "cell_type": "markdown",
194 | "metadata": {},
195 | "source": [
196 |     "The above equation tells us how $J$ changes as $W^{(L)}_{[k]}$ changes. Approximating it for a small change ${\\Delta}W^{(L)}_{[k]}$:\n",
197 | "\n",
198 | "$${\\Delta}J ={{\\Delta}W^{(L)}_{[k]}} * \\left[ \\frac{1}{n}\\sum\\limits_{n} \\left( (error) * \\frac {{\\partial} \\; yPred} { {\\partial}W^{(L)}_{[k]} } \\right) \\right] \\;\\;\\;\\;\\;\\;-------------(2)$$ "
199 | ]
200 | },
201 | {
202 | "cell_type": "markdown",
203 | "metadata": {},
204 | "source": [
205 | "## Change $W^{(L)}_{[k]}$ so that $J$ always decreases\n",
206 | "\n",
207 | "If we ensure that ${\\Delta}W^{(L)}_{[k]}$ is equal to $-\\left[ \\frac{1}{n}\\sum\\limits_{n} \\left( (error) * \\frac {{\\partial} \\; yPred} { {\\partial}W^{(L)}_{[k]} } \\right) \\right]$, we see that ${\\Delta}J = {\\Delta}W^{(L)}_{[k]}*\\left(-\\left[{\\Delta}W^{(L)}_{[k]}\\right]\\right) = -\\left[{\\Delta}W^{(L)}_{[k]}\\right]^{2} \\Rightarrow$ negative! \n",
208 | "\n",
209 | "Thus, we decide to change $W^{(L)}_{[k]}$ by that amount which ensures $J$ always decreases!\n",
210 | "\n",
211 | "$${\\Delta}W^{(L)}_{[k]} = -\\left[ \\frac{1}{n}\\sum\\limits_{n} \\left( (error) * \\frac {{\\partial} \\; yPred} { {\\partial}W^{(L)}_{[k]} } \\right) \\right] \\;\\;\\;\\;\\;\\;-------------(3)$$ \n",
212 | "\n",
213 | "So, for each weight in the last layer, that ${\\Delta}W^{(L)}_{[k]}$ which shall (for sure) decrease J can be computed. "
214 | ]
215 | },
216 | {
217 | "cell_type": "markdown",
218 | "metadata": {},
219 | "source": [
220 | "## Gradient Descent\n",
221 | "\n",
222 | "If we update each weight as $W^{(L)}_{[k]} \\leftarrow W^{(L)}_{[k]} + {\\Delta}W^{(L)}_{[k]}$, it is guaranteed that with the new weights, the neural network shall produce outputs that are closer to the desired output.\n",
223 | "\n",
224 | "This is how to train a neural network - randomly initialize $W$, iteratively change $W$ according to eq (3).\n",
225 | "\n",
226 | "**This is called Gradient Descent.**\n",
227 | "\n",
228 | "One way to think about this is - assuming the graph of $J$ vs. $W$ is like an upturned hill, we are slowly descending down the hill by changing $W$, to the point where $J$ is minimum.\n",
229 | "\n",
230 |     "J is (sort of) a quadratic function of W, so we can assume it's (sort of) like an upturned hill."
231 | ]
232 | },
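233 |   {
234 |    "cell_type": "markdown",
235 |    "metadata": {},
236 |    "source": [
237 |     "As a toy illustration of the idea (separate from the network code; the learning rate 0.1 is an arbitrary choice), gradient descent on the 1-D quadratic $J(w) = (w-3)^2$ walks $w$ down to the minimum at $w = 3$:\n",
238 |     "\n",
239 |     "```python\n",
240 |     "w = 0.0                    # arbitrary starting point\n",
241 |     "for _ in range(25):\n",
242 |     "    grad = 2 * (w - 3)     # dJ/dw for J(w) = (w - 3)^2\n",
243 |     "    w = w - 0.1 * grad     # step against the gradient\n",
244 |     "print(w)                   # close to 3, where J is minimum\n",
245 |     "```"
246 |    ]
247 |   },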
233 | {
234 | "cell_type": "markdown",
235 | "metadata": {},
236 | "source": [
237 | "# Computing ${\\Delta}W^{(L)}$ of last layer\n",
238 | "\n",
239 | "To compute ${\\Delta}W$, we need to compute $error$ and $\\frac{{\\partial}\\;yPred}{{\\partial}W^{(L)}}$"
240 | ]
241 | },
242 | {
243 | "cell_type": "markdown",
244 | "metadata": {},
245 | "source": [
246 | "## 1. Computing error\n",
247 | "\n",
248 | "$ error = yPred - Y = $ ``forwardProp``$(X) - Y \\;\\;\\;\\;\\;\\;-------------(4)$"
249 | ]
250 | },
251 | {
252 | "cell_type": "markdown",
253 | "metadata": {},
254 | "source": [
255 | "For example, suppose we want to compute those $W$'s in a 3-neuron network that are able to perform AND logic on two inputs.\n",
256 | "\n",
257 | "Here, for $X = \\left[\\begin{array}{c}(0,0)\\\\(0,1)\\\\(1,0)\\\\(1,1)\\end{array}\\right]$, $Y = \\left[\\begin{array}{c}0\\\\0\\\\0\\\\1\\end{array}\\right]$"
258 | ]
259 | },
260 | {
261 | "cell_type": "code",
262 | "execution_count": 686,
263 | "metadata": {},
264 | "outputs": [
265 | {
266 | "name": "stdout",
267 | "output_type": "stream",
268 | "text": [
269 | "weights:\n",
270 | "1\n",
271 | "(2, 3)\n",
272 | "[[-0.87271574 0.35621485 0.95252276]\n",
273 | " [-0.61981924 -1.49164222 0.55011796]]\n",
274 | "2\n",
275 | "(1, 3)\n",
276 | "[[-1.57656753 -1.10359895 -0.34594249]]\n"
277 | ]
278 | }
279 | ],
280 | "source": [
281 | "# Initialize network\n",
282 | "layers = [2, 2, 1]\n",
283 | "weights = initializeWeights(layers)\n",
284 | "\n",
285 | "print(\"weights:\")\n",
286 | "for i in range(len(weights)):\n",
287 | " print(i+1); print(weights[i].shape); print(weights[i])"
288 | ]
289 | },
290 | {
291 | "cell_type": "markdown",
292 | "metadata": {},
293 | "source": [
294 | "Our weights have been randomly initialized. Let us see what yPred they give:"
295 | ]
296 | },
297 | {
298 | "cell_type": "code",
299 | "execution_count": 687,
300 | "metadata": {
301 | "collapsed": true
302 | },
303 | "outputs": [],
304 | "source": [
305 | "# Declare input and desired output for AND gate\n",
306 | "X = np.array([[0,0], [0,1], [1,0], [1,1]])\n",
307 | "Y = np.array([[0], [0], [0], [1]])"
308 | ]
309 | },
310 | {
311 | "cell_type": "code",
312 | "execution_count": 688,
313 | "metadata": {},
314 | "outputs": [
315 | {
316 | "name": "stdout",
317 | "output_type": "stream",
318 | "text": [
319 | "outputs\n",
320 | "[array([[ 0.29468953, 0.34982256],\n",
321 | " [ 0.51994117, 0.48258173],\n",
322 | " [ 0.37367081, 0.10798781],\n",
323 | " [ 0.60731071, 0.17345395]]), array([[ 0.11682925],\n",
324 | " [ 0.08969868],\n",
325 | " [ 0.11646832],\n",
326 | " [ 0.09056134]])]\n"
327 | ]
328 | }
329 | ],
330 | "source": [
331 | "# Calculate outputs at each layer by forward propagation\n",
332 | "outputs = forwardProp(X, weights)\n",
333 | "print(\"outputs\"); print(outputs)"
334 | ]
335 | },
336 | {
337 | "cell_type": "code",
338 | "execution_count": 689,
339 | "metadata": {},
340 | "outputs": [
341 | {
342 | "name": "stdout",
343 | "output_type": "stream",
344 | "text": [
345 | "(4, 1)\n",
346 | "[[ 0.11682925]\n",
347 | " [ 0.08969868]\n",
348 | " [ 0.11646832]\n",
349 | " [ 0.09056134]]\n"
350 | ]
351 | }
352 | ],
353 | "source": [
354 | "# Calculate yPred as the last output from forward propagation\n",
355 | "yPred = outputs[-1]\n",
356 | "print(yPred.shape); print(yPred)"
357 | ]
358 | },
359 | {
360 | "cell_type": "code",
361 | "execution_count": 690,
362 | "metadata": {},
363 | "outputs": [
364 | {
365 | "name": "stdout",
366 | "output_type": "stream",
367 | "text": [
368 | "(4, 1)\n",
369 | "[[ 0.11682925]\n",
370 | " [ 0.08969868]\n",
371 | " [ 0.11646832]\n",
372 | " [-0.90943866]]\n"
373 | ]
374 | }
375 | ],
376 | "source": [
377 | "# Error = yPred - Y\n",
378 | "error = yPred - Y\n",
379 | "print(error.shape); print(error)"
380 | ]
381 | },
382 | {
383 | "cell_type": "markdown",
384 | "metadata": {},
385 | "source": [
386 | "## 2. Computing $\\frac{{\\partial}\\;yPred}{{\\partial}W^{(L)}_{[k]}}$\n",
387 | "\n",
388 | "From eq. (1), $yPred$ can be written as:\n",
389 | "\n",
390 | "$$yPred = Sigmoid(X^{(L)}\\;.*\\;W^{(L)}{^{T}})$$\n",
391 | "\n",
392 | "So,\n",
393 | "\n",
394 | "$$\\frac{{\\partial}\\;yPred}{{\\partial}W^{(L)}_{[k]}} = \\frac{{\\partial}}{{\\partial}W^{(L)}_{[k]}}\\left(Sigmoid\\left(X^{(L)}.*W^{(L)}{^{T}}\\right)\\right) = Sigmoid^{'}\\left(X^{(L)}.*W^{(L)}{^{T}}\\right)*\\left(\\frac{{\\partial}}{{\\partial}W^{(L)}_{[k]}}\\left((X^{(L)}.*W^{(L)}{^{T}})\\right)\\right)$$\n",
395 | "\n",
396 |     "Here, $yPred$ is an $o_{L}$-dimensional vector, and $W^{(L)}_{[k]}$ only affects the $k$-th dimension of $yPred$, i.e. $yPred_{[k]}$. So,\n",
397 | "\n",
398 | "$$\\frac{{\\partial}\\;yPred_{[k]}}{{\\partial}W^{(L)}_{[k]}} = Sigmoid^{'}\\left(X^{(L)}.*W^{(L)}_{[k]}\\right)*\\left(\\frac{{\\partial}}{{\\partial}W^{(L)}_{[k]}}\\left((X^{(L)}.*W^{(L)}_{[k]})\\right)\\right)$$"
399 | ]
400 | },
401 | {
402 | "cell_type": "markdown",
403 | "metadata": {},
404 | "source": [
405 | "### - Computing $Sigmoid^{'}\\left(X^{(L)}.*W^{(L)}_{[k]}\\right)$\n",
406 | "\n",
407 | "It can be verified that $Sigmoid^{'}(a) = Sigmoid(a)*(1-Sigmoid(a))$. Thus, $Sigmoid^{'}(X^{(L)}.*W^{(L)}_{[k]}{^{T}}) = yPred_{[k]}*(1 - yPred_{[k]})$. So,\n",
408 | "\n",
409 | "$$\\frac{{\\partial}\\;yPred_{[k]}}{{\\partial}W^{(L)}_{[k]}} = \\left(yPred_{[k]}*(1 - yPred_{[k]})*\\left(\\frac{{\\partial}(X^{(L)}.*W^{(L)}_{[k]})}{{\\partial}W^{(L)}_{[k]}}\\right)\\right)$$\n",
410 | "\n",
411 | "$${\\Delta}W^{(L)}_{[k]} = -\\left[\\frac{1}{n}\\sum\\limits_{n}\\left(error_{[k]}*yPred_{[k]}*(1 - yPred_{[k]})*\\left(\\frac{{\\partial}(X^{(L)}.*W^{(L)}_{[k]})}{{\\partial}W^{(L)}_{[k]}}\\right)\\right)\\right] \\;\\;\\;\\;\\;\\;-------------(5)$$"
412 | ]
413 | },
414 | {
415 | "cell_type": "markdown",
416 | "metadata": {},
417 | "source": [
418 | "### - Computing $\\frac{{\\partial}}{{\\partial}W^{(L)}_{[k]}}((X^{(L)}.*W^{(L)}_{[k]}))$\n",
419 | "\n",
420 | "It can be seen that $\\frac{{\\partial}}{{\\partial}W^{(L)}_{[k]}}((X^{(L)}.*W^{(L)}_{[k]})) = X^{(L)}$\n",
421 | "\n",
422 | "We also know that $X^{(L)} = \\left[ \\begin{array}{c} 1 & Y^{(L-1)} \\end{array} \\right]_{n{\\times}i_{L}}$, and $Y^{(L-1)}$ have been computed during Forward Propagation. So,\n",
423 | "\n",
424 | "$$\\frac{{\\partial}\\;yPred_{[k]}}{{\\partial}W^{(L)}} = (yPred_{[k]}*(1-yPred_{[k]}))*X^{(L)} $$\n",
425 | "\n",
426 | "$${\\Delta}W^{(L)}_{[k]} = -\\left[\\frac{1}{n}\\sum\\limits_{n}\\left(error_{[k]}*yPred_{[k]}*(1 - yPred_{[k]})*X^{(L)}\\right)\\right]\\;\\;\\;\\;\\;\\;-------------(6)$$"
427 | ]
428 | },
429 | {
430 | "cell_type": "markdown",
431 | "metadata": {},
432 | "source": [
433 | "## Combining terms to simplify computation\n",
434 | "\n",
435 | "Here, dimension of $error$, $yPred$, and $(1-yPred)$ is $n{\\times}o_{L}$, while that of $X^{(L)}$ is $n{\\times}i_{L}$. A little thought has to be given towards how those quantities are multiplied.\n",
436 | "\n",
437 | "First of all, we can combine the mentioned three into one and call it $\\delta$.\n",
438 | "\n",
439 | "$${\\delta}_{n{\\times}o_{L}} = error_{n{\\times}o_{L}}*yPred_{n{\\times}o_{L}}*(1-yPred)_{n{\\times}o_{L}} \\;\\;\\;\\;\\;\\;-----(7)$$\n",
440 | "\n",
441 | "$${\\Delta}W^{(L)}_{[k]} = -\\left[\\frac{1}{n}\\sum\\limits_{n}\\left({\\delta}_{[k]}*X^{(L)}\\right)\\right]$$\n",
442 | "\n",
443 | "We can now combine calculations of all dimensions into this matrix operation: (We will figure out the matrix dimensions below)\n",
444 | "\n",
445 | "$${\\Delta}W^{(L)} = -\\left[\\frac{1}{n}\\sum\\limits_{n}\\left({\\delta}*X^{(L)}\\right)\\right]$$\n"
446 | ]
447 | },
448 | {
449 | "cell_type": "markdown",
450 | "metadata": {},
451 | "source": [
452 | "One way of figuring out how $\\delta$ and $X^{(L)}$ are combined is to see that the dimension of ${\\Delta}W$ is $o_{L}{\\times}i_{L}$, dimension of $\\delta$ is $n{\\times}o_{L}$, and the dimension of $X^{(L)}$ is $n{\\times}i_{L}$.\n",
453 | "\n",
454 | "Clearly, the $\\sum\\limits_{n}\\left({\\delta}*X^{(L)}\\right)$ term, when considered for all the weights, is equal to $\\delta^{T}_{o_{L}{\\times}n}\\;.*\\;X^{(L)}_{n{\\times}i_{L}}$, the summation over $n$ being taken care of by the dot product, and the output dimension ${o_{L}{\\times}i_{L}}$ matches that of $W^{(L)}$.\n",
455 | "\n",
456 | "Hence, using matrix operations, ${\\Delta}W^{(L)}$ can be found as:\n",
457 | "\n",
458 | "$${\\Delta}W^{(L)}_{{o_{L}{\\times}i_{L}}} = -\\frac{1}{n}\\left({\\delta}^{T}{_{o_{L}{\\times}n}}\\;.*\\;X^{(L)}_{n{\\times}i_{L}}\\right) \\;\\;\\;\\;\\;\\;-------------(8)$$"
459 | ]
460 | },
461 | {
462 | "cell_type": "code",
463 | "execution_count": 691,
464 | "metadata": {},
465 | "outputs": [
466 | {
467 | "name": "stdout",
468 | "output_type": "stream",
469 | "text": [
470 | "(4, 1)\n",
471 | "[[ 0.01205446]\n",
472 | " [ 0.00732415]\n",
473 | " [ 0.01198499]\n",
474 | " [-0.07490136]]\n"
475 | ]
476 | }
477 | ],
478 | "source": [
479 | "# Calculate delta for the last layer\n",
480 | "delta = np.multiply(np.multiply(error, yPred), 1-yPred)\n",
481 | "print(delta.shape); print(delta)"
482 | ]
483 | },
484 | {
485 | "cell_type": "code",
486 | "execution_count": 692,
487 | "metadata": {},
488 | "outputs": [
489 | {
490 | "name": "stdout",
491 | "output_type": "stream",
492 | "text": [
493 | "(4, 3)\n",
494 | "[[ 1. 0.29468953 0.34982256]\n",
495 | " [ 1. 0.51994117 0.48258173]\n",
496 | " [ 1. 0.37367081 0.10798781]\n",
497 | " [ 1. 0.60731071 0.17345395]]\n"
498 | ]
499 | }
500 | ],
501 | "source": [
502 | "# Find input to the last layer\n",
503 | "xL = addBiasTerms(outputs[-2])\n",
504 | "print(xL.shape); print(xL)"
505 | ]
506 | },
507 | {
508 | "cell_type": "code",
509 | "execution_count": 693,
510 | "metadata": {},
511 | "outputs": [
512 | {
513 | "name": "stdout",
514 | "output_type": "stream",
515 | "text": [
516 | "(1, 3)\n",
517 | "[[ 0.01088444 0.00841238 0.00098657]]\n"
518 | ]
519 | }
520 | ],
521 | "source": [
522 | "# Find deltaW for last layer\n",
523 | "deltaW = -np.dot(delta.T, xL)/len(Y)\n",
524 | "print(deltaW.shape); print(deltaW)"
525 | ]
526 | },
527 | {
528 | "cell_type": "code",
529 | "execution_count": 694,
530 | "metadata": {},
531 | "outputs": [
532 | {
533 | "name": "stdout",
534 | "output_type": "stream",
535 | "text": [
536 | "old weights:\n",
537 | "1\n",
538 | "(2, 3)\n",
539 | "[[-0.87271574 0.35621485 0.95252276]\n",
540 | " [-0.61981924 -1.49164222 0.55011796]]\n",
541 | "2\n",
542 | "(1, 3)\n",
543 | "[[-1.57656753 -1.10359895 -0.34594249]]\n",
544 | "new weights:\n",
545 | "1\n",
546 | "(2, 3)\n",
547 | "[[-0.87271574 0.35621485 0.95252276]\n",
548 | " [-0.61981924 -1.49164222 0.55011796]]\n",
549 | "2\n",
550 | "(1, 3)\n",
551 | "[[-1.5656831 -1.09518657 -0.34495592]]\n",
552 | "old cost:\n",
553 | "0.107792308277\n",
554 | "new cost:\n",
555 | "0.107601673739\n"
556 | ]
557 | }
558 | ],
559 | "source": [
560 | "# Checking cost of neural network before and after change in W^{L}\n",
561 | "newWeights = [np.array(w) for w in weights]\n",
562 | "newWeights[-1] += deltaW\n",
563 | "\n",
564 | "print(\"old weights:\")\n",
565 | "for i in range(len(weights)):\n",
566 | " print(i+1); print(weights[i].shape); print(weights[i])\n",
567 | "\n",
568 | "print(\"new weights:\")\n",
569 | "for i in range(len(newWeights)):\n",
570 | " print(i+1); print(newWeights[i].shape); print(newWeights[i])\n",
571 | "\n",
572 | "print(\"old cost:\"); print(nnCost(weights, X, Y))\n",
573 | "print(\"new cost:\"); print(nnCost(newWeights, X, Y))"
574 | ]
575 | },
576 | {
577 | "cell_type": "markdown",
578 | "metadata": {},
579 | "source": [
580 | "### **Congratulations! You've just learned how to back propagate!**\n",
581 | "(1 layer only)"
582 | ]
583 | },
584 | {
585 | "cell_type": "markdown",
586 | "metadata": {},
587 | "source": [
588 | "# Back-propagation through layers\n",
589 | "\n",
590 | "For the last layer, according to eq. (5),\n",
591 | "$${\\Delta}W^{(L)}_{[k]} = -\\frac{1}{n}\\sum\\limits_{n}\\left(error_{[k]}*yPred_{[k]}*(1 - yPred_{[k]})*\\left(\\frac{{\\partial}(X^{(L)}.*W^{(L)}_{[k]})}{{\\partial}W^{(L)}_{[k]}}\\right)\\right) = -\\frac{1}{n}\\sum\\limits_{n}\\left(\\delta^{(L)}_{[k]}*\\frac{{\\partial}(X^{(L)}.*W^{(L)}_{[k]})}{{\\partial}W^{(L)}_{[k]}}\\right)$$"
592 | ]
593 | },
594 | {
595 | "cell_type": "markdown",
596 | "metadata": {},
597 | "source": [
598 | "### Computing for Layer L-1\n",
599 | "\n",
600 | "If we go back one more layer to find out ${\\Delta}W$ for the $p^{th}$ neuron in the $(L-1)^{th}$ layer, backpropagated from the $k^{th}$ neuron in the $L^{th}$ layer, noting that $X^{L} = Y^{L-1}$:\n",
601 | "\n",
602 | "$${\\Delta}W^{(L-1)}_{[p]from[k]} = -\\frac{1}{n}\\sum\\limits_{n}\\left(\\delta^{(L)}_{[k]}*\\frac{{\\partial}(X^{(L)}.*W^{(L)}_{[k]})}{{\\partial}W^{(L-1)}_{[p]}}\\right) = -\\frac{1}{n}\\sum\\limits_{n}\\left(\\delta^{(L)}_{[k]}*\\frac{{\\partial}(Y^{(L-1)}.*W^{(L)}_{[k]})}{{\\partial}W^{(L-1)}_{[p]}}\\right)$$"
603 | ]
604 | },
605 | {
606 | "cell_type": "markdown",
607 | "metadata": {},
608 | "source": [
609 | "Here, $Y^{(L-1)}$ is the collected output of the penultimate layer, i.e. the collected output of all neurons in the penultimate layer. $W^{(L-1)}_{[p]}$ is the weight matrix of the $p^{th}$ neuron in the penultimate layer. So,\n",
610 | "\n",
611 | "$$\\frac{{\\partial}(Y^{(L-1)}.*W^{(L)}_{[k]})}{{\\partial}W^{(L-1)}_{[p]}} = \\frac{{\\partial}((Sigmoid(X^{(L-1)}.*W^{(L-1)}_{[0]}) + Sigmoid(X^{(L-1)}.*W^{(L-1)}_{[1]}) + ... + Sigmoid(X^{(L-1)}.*W^{(L-1)}_{[p]}) + ... + Sigmoid(X^{(L-1)}.*W^{(L-1)}_{[i_{L-1}-1]})).*W^{(L)}_{[k]})}{{\\partial}W^{(L-1)}_{[p]}}$$\n",
612 | "\n",
613 | "We know that change in $W^{(L-1)}_{[p]}$ does not affect $W^{(L)}$ or any $W^{(L-1)}$ weight matrix other than $W^{(L-1)}_{[p]}$. So:\n",
614 | "\n",
615 | "$$\\frac{{\\partial}(Y^{(L-1)}.*W^{(L)}_{[k]})}{{\\partial}W^{(L-1)}_{[p]}} = \\frac{{\\partial}(Sigmoid(X^{(L-1)}.*W^{(L-1)}_{[p]}))}{{\\partial}W^{(L-1)}_{[p]}}$$\n",
616 | "\n",
617 | "$${\\Delta}W^{(L-1)}_{[p]from[k]} = -\\frac{1}{n}\\sum\\limits_{n}\\left(\\delta^{(L)}_{[k]}*W^{(L)}_{[k]}*\\frac{{\\partial}\\;(Sigmoid(X^{(L-1)}.*W^{(L-1)}_{[p]}))}{{\\partial}W^{(L-1)}_{[p]}}\\right)$$\n",
618 | "\n",
619 | "(Ignoring dimensions for now)"
620 | ]
621 | },
622 | {
623 | "cell_type": "markdown",
624 | "metadata": {},
625 | "source": [
626 | "We know how this goes now.\n",
627 | "\n",
628 |     "$$\\frac{{\\partial}\\;(Sigmoid(X^{(L-1)}.*W^{(L-1)}_{[p]}))}{{\\partial}W^{(L-1)}_{[p]}} = Sigmoid^{'}(X^{(L-1)}.*W^{(L-1)}_{[p]})*\\frac{{\\partial}(X^{(L-1)}.*W^{(L-1)}_{[p]})}{{\\partial}W^{(L-1)}_{[p]}} = Y^{(L-1)}_{[p]}*(1 - Y^{(L-1)}_{[p]})*X^{(L-1)}$$\n"
629 | ]
630 | },
631 | {
632 | "cell_type": "markdown",
633 | "metadata": {},
634 | "source": [
635 | "Thus,\n",
636 | "\n",
637 |     "$${\\Delta}W^{(L-1)}_{[p]from[k]} = -\\left[\\frac{1}{n}\\sum\\limits_{n}(\\delta^{(L)}_{[k]}*W^{(L)}_{[k]}*(Y^{(L-1)}_{[p]}*(1 - Y^{(L-1)}_{[p]})*X^{(L-1)}))\\right]$$\n",
638 | "\n",
639 |     "We need to take care of the dimensions. Here, there are two parts: $\\delta^{(L)}_{[k]}*W^{(L)}_{[k]}$, which is only concerned with the $L^{th}$ layer, and $Y^{(L-1)}_{[p]}*(1 - Y^{(L-1)}_{[p]})*X^{(L-1)}$, which is only concerned with the $(L-1)^{th}$ layer."
640 | ]
641 | },
642 | {
643 | "cell_type": "markdown",
644 | "metadata": {},
645 | "source": [
646 | "### 1) Back-propagation Error\n",
647 | "\n",
648 | "We can observe here that the terms $\\delta^{(L)}_{[k]}$ and $W^{(L)}_{[k]}$ are back-propagated from the $k^{th}$ neuron of the final layer. Let's combine them and call it the back-propagated error:\n",
649 | "$$bpError^{(L-1)}_{[k]}{_{n{\\times}i_{L}}} = \\delta^{(L)}_{[k]}*W^{(L)}_{[k]}$$\n",
650 | "\n",
651 |     "We know that $\\delta^{(L)}{_{n{\\times}o_{L}}}*W^{(L)}{_{o_{L}{\\times}{i_{L}}}}$ is a matrix of dimensions $n{\\times}i_{L}$ (dropping its first column, which corresponds to the bias term, leaves $n{\\times}o_{(L-1)}$), and it sums the backprop errors from each neuron in the final layer. Thus,\n",
652 | "\n",
653 | "$$bpError^{(L-1)}{_{n{\\times}i_{L}}} = \\delta^{(L)}*W^{(L)} \\;\\;\\;\\;\\;\\;--------------(9)$$\n",
654 | "\n",
655 | "We see that for a neuron in the $(L-1)^{th}$ layer, the total error back-propagated to it is the sum of the back-propagated errors from each of the neurons connected to it in the $L^{th}$ layer."
656 | ]
657 | },
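658 |   {
659 |    "cell_type": "markdown",
660 |    "metadata": {},
661 |    "source": [
662 |     "In numpy terms this is a one-liner (a sketch using the delta and weights computed above; the first column of the result corresponds to the bias input and is dropped when back-propagating further):\n",
663 |     "\n",
664 |     "```python\n",
665 |     "bpError = np.dot(delta, weights[-1])  # shape (n, i_L) = (4, 3)\n",
666 |     "print(bpError.shape)\n",
667 |     "```"
668 |    ]
669 |   },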
658 | {
659 | "cell_type": "markdown",
660 | "metadata": {},
661 | "source": [
662 | "Thus, instead of ${\\Delta}W^{(L-1)}_{[p]from[k]}$ from the $k^{th}$ neuron, we can directly consider ${\\Delta}W^{(L-1)}_{[p]}$:\n",
663 | "\n",
664 |     "$${\\Delta}W^{(L-1)}_{[p]} = -\\left[\\frac{1}{n}\\sum\\limits_{n}(bpError^{(L-1)}_{[p]}*(Y^{(L-1)}_{[p]}*(1 - Y^{(L-1)}_{[p]})*X^{(L-1)}))\\right]$$"
665 | ]
666 | },
667 | {
668 | "cell_type": "markdown",
669 | "metadata": {},
670 | "source": [
671 | "### 2) The $Y*(1-Y)*X$ term\n",
672 | "\n",
673 |     "We can convert the $Y*(1-Y)*X$ term into a matrix operation, with summation over $n$ inherently taken care of. Directly considering $Y$ instead of $Y_{[p]}$:\n",
674 | "\n",
675 |     "$$Y^{(L-1)}*(1 - Y^{(L-1)})*X^{(L-1)} == (Y^{(L-1)}.*(1 - Y^{(L-1)}))^{T}{_{o_{(L-1)}{\\times}n}} * X^{(L-1)}{_{n{\\times}i_{(L-1)}}}$$\n",
676 | "\n",
677 | "We can see that the resultant matrix has the same dimensions as $W^{(L-1)}$ : $o_{(L-1)}{\\times}i_{(L-1)}$."
678 | ]
679 | },
680 | {
681 | "cell_type": "markdown",
682 | "metadata": {},
683 | "source": [
684 | "### Combining the two\n",
685 | "\n",
686 |     "To combine $bpError$ and the $Y*(1-Y)*X$ terms, for consistency in dimensions, we need to first element-wise multiply $bpError_{n{\\times}o_{(L-1)}}$ with $Y^{(L-1)}_{n{\\times}o_{(L-1)}}.*(1 - Y^{(L-1)})_{n{\\times}o_{(L-1)}}$, and then matrix-multiply the transpose of that with $X$.\n",
687 | "\n",
688 | "Thus,\n",
689 | "\n",
690 | "$${\\Delta}W^{(L-1)}_{o_{(L-1)}{\\times}i_{(L-1)}} = -\\left[\\frac{1}{n}((bpError^{(L-1)}.*Y^{(L-1)}.*(1 - Y^{(L-1)}))^{T} _{o_{(L-1)}{\\times}n}* X^{(L-1)}_{n{\\times}i_{(L-1)}}\\right] \\;\\;\\;\\;\\;\\;--------------(10)$$\n",
691 | "\n",
692 | "(Summation across $n$ is taken care of within the matrix multiplication)"
693 | ]
694 | },
695 | {
696 | "cell_type": "markdown",
697 | "metadata": {},
698 | "source": [
699 | "## Simplifying to matrix operation of any layer $l$\n",
700 | "\n",
701 | "Just as we had done for the final layer, from equation 9:\n",
702 | "\n",
703 |     "$$bpError^{(l)}_{n{\\times}o_{l}} = \\delta^{(l+1)}_{n{\\times}o_{l+1}}*W^{(l+1)}_{o_{l+1}{\\times}o_{l}}$$\n",
704 | "\n",
705 | "If we compare equation (10) with equation (6), we can generalize \"error\" there as Backpropagation Error, and the formula for ${\\delta}$ as:\n",
706 | "\n",
707 | "$${\\delta}^{(l)}_{n{\\times}o_{l}} = {bpError^{(l)}_{n{\\times}o_{l}}} .* {Y^{(l)}_{n{\\times}o_{l}}} .* (1-Y^{(l)})_{n{\\times}o_{l}}$$\n",
708 | "\n",
709 | "Thus,\n",
710 | "\n",
711 | "$${\\Delta}W^{(l)}_{{o_{l}{\\times}i_{l}}} = -\\frac{1}{n}\\left({\\delta^{(l)}}^{T}{_{o_{l}{\\times}n}}\\;.*\\;X^{(l)}_{n{\\times}i_{l}}\\right) \\;\\;\\;\\;\\;\\;-------------(11)$$\n"
712 | ]
713 | },
714 | {
715 | "cell_type": "code",
716 | "execution_count": 695,
717 | "metadata": {
718 | "collapsed": true
719 | },
720 | "outputs": [],
721 | "source": [
722 | "# IMPLEMENTING BACK-PROPAGATION\n",
723 | "def backProp(weights, X, Y):\n",
724 | " # Forward propagate to find outputs\n",
725 | " outputs = forwardProp(X, weights)\n",
726 | " \n",
727 | " # For the last layer, bpError = error = yPred - Y\n",
728 | " bpError = outputs[-1] - Y\n",
729 | " \n",
730 | " # Back-propagating from the last layer to the first\n",
731 | " for l, w in enumerate(reversed(weights)):\n",
732 | " \n",
733 | " # Find yPred for this layer\n",
734 | " yPred = outputs[-l-1]\n",
735 | " \n",
736 | " # Calculate delta for this layer using bpError from next layer\n",
737 | " delta = np.multiply(np.multiply(bpError, yPred), 1-yPred)\n",
738 | " \n",
739 | " # Find input to the layer, by adding bias to the output of the previous layer\n",
740 | " # Take care, l goes from 0 to 1, while the weights are in reverse order\n",
741 | " if l==len(weights)-1: # If 1st layer has been reached\n",
742 | " xL = addBiasTerms(X)\n",
743 | " else:\n",
744 | " xL = addBiasTerms(outputs[-l-2])\n",
745 | " \n",
746 | " # Calculate deltaW for this layer\n",
747 | " deltaW = -np.dot(delta.T, xL)/len(Y)\n",
748 | " \n",
749 | " # Calculate bpError for previous layer to be back-propagated\n",
750 | " bpError = np.dot(delta, w)\n",
751 | " \n",
752 | " # Ignore bias term in bpError\n",
753 | " bpError = bpError[:,1:]\n",
754 | " \n",
755 | " # Change weights of the current layer (W <- W + deltaW)\n",
756 | " w += deltaW"
757 | ]
758 | },
759 | {
760 | "cell_type": "code",
761 | "execution_count": 698,
762 | "metadata": {},
763 | "outputs": [
764 | {
765 | "name": "stdout",
766 | "output_type": "stream",
767 | "text": [
768 | "old weights:\n",
769 | "1\n",
770 | "(2, 3)\n",
771 | "[[-0.87271574 0.35621485 0.95252276]\n",
772 | " [-0.61981924 -1.49164222 0.55011796]]\n",
773 | "2\n",
774 | "(1, 3)\n",
775 | "[[-1.57656753 -1.10359895 -0.34594249]]\n",
776 | "old cost:\n",
777 | "0.107792308277\n"
778 | ]
779 | }
780 | ],
781 | "source": [
782 | "# To check with the single back-propagation step done before,\n",
783 | "# back up the current weights\n",
784 | "oldWeights = [np.array(w) for w in weights]\n",
785 | "print(\"old weights:\")\n",
786 | "for i in range(len(oldWeights)):\n",
787 | " print(i+1); print(oldWeights[i].shape); print(oldWeights[i])\n",
788 | "\n",
789 | "print(\"old cost:\"); print(nnCost(oldWeights, X, Y))"
790 | ]
791 | },
792 | {
793 | "cell_type": "markdown",
794 | "metadata": {},
795 | "source": [
796 |     "Let us define a function to compute the accuracy of our model, irrespective of the number of neurons in the output layer."
797 | ]
798 | },
799 | {
800 | "cell_type": "code",
801 | "execution_count": 699,
802 | "metadata": {
803 | "collapsed": true
804 | },
805 | "outputs": [],
806 | "source": [
807 | "# Evaluate the accuracy of weights for input X and desired outptut Y\n",
808 | "def evaluate(weights, X, Y):\n",
809 | " yPreds = forwardProp(X, weights)[-1]\n",
810 | " # Check if maximum probability is from that neuron corresponding to desired class,\n",
811 | " # AND check if that maximum probability is greater than 0.5\n",
812 | " yes = sum( int( ( np.argmax(yPreds[i]) == np.argmax(Y[i]) ) and \n",
813 | " ( (yPreds[i][np.argmax(yPreds[i])]>0.5) == (Y[i][np.argmax(Y[i])]>0.5) ) )\n",
814 | " for i in range(len(Y)) )\n",
815 | " print(str(yes)+\" out of \"+str(len(Y))+\" : \"+str(float(yes/len(Y))))"
816 | ]
817 | },
818 | {
819 | "cell_type": "markdown",
820 | "metadata": {},
821 | "source": [
822 | "Check the results of back-propagation:"
823 | ]
824 | },
825 | {
826 | "cell_type": "code",
827 | "execution_count": 722,
828 | "metadata": {},
829 | "outputs": [
830 | {
831 | "name": "stdout",
832 | "output_type": "stream",
833 | "text": [
834 | "950\n",
835 | "new cost:\n",
836 | "0.0113971310862\n",
837 | "new accuracy: \n",
838 | "4 out of 4 : 1.0\n",
839 | "[[ 0.03022141]\n",
840 | " [ 0.13740936]\n",
841 | " [ 0.13683374]\n",
842 | " [ 0.7705247 ]]\n"
843 | ]
844 | }
845 | ],
846 | "source": [
847 | "# BACK-PROPAGATE, checking old & new weights and costs\n",
848 | "\n",
849 | "# Re-initialize to old weights\n",
850 | "weights = [np.array(w) for w in oldWeights]\n",
851 | "\n",
852 | "#print(\"old weights:\")\n",
853 | "#for i in range(len(weights)):\n",
854 | "# print(i+1); print(weights[i].shape); print(weights[i])\n",
855 | "\n",
856 | "print(\"old cost: \"); print(nnCost(weights, X, Y))\n",
857 | "print(\"old accuracy: \"); print(evaluate(weights, X, Y))\n",
858 | "for i in range(1000):\n",
859 | " # Back propagate\n",
860 | " backProp(weights, X, Y)\n",
861 | "\n",
862 | " #print(\"new weights:\")\n",
863 | " #for i in range(len(weights)):\n",
864 | " # print(i+1); print(weights[i].shape); print(weights[i])\n",
865 | " \n",
866 | " if i%50==0:\n",
867 | " time.sleep(1)\n",
868 | " clear_output()\n",
869 | " print(i)\n",
870 | " print(\"new cost:\"); print(nnCost(weights, X, Y))\n",
871 | " print(\"new accuracy: \"); evaluate(weights, X, Y)\n",
872 | " print(forwardProp(X, weights)[-1])\n"
873 | ]
874 | },
875 | {
876 | "cell_type": "code",
877 | "execution_count": 718,
878 | "metadata": {
879 | "collapsed": true
880 | },
881 | "outputs": [],
882 | "source": [
883 | "# Revert back to original weights (if needed)\n",
884 | "weights = [np.array(w) for w in oldWeights]"
885 | ]
886 | },
887 | {
888 | "cell_type": "markdown",
889 | "metadata": {},
890 | "source": [
891 | "### Training\n",
892 | "\n",
893 | "Keep calling backProp() again and again until the cost decreases so much that we reach our desired accuracy.\n",
894 | "\n",
895 | "You can observe the cost of the function going down with iterations."
896 | ]
897 | },
898 | {
899 | "cell_type": "markdown",
900 | "metadata": {},
901 | "source": [
902 | "# Problems\n",
903 | "\n",
904 | "### - Not reaching desired accuracy fast enough\n",
905 | "\n",
906 | "It takes too many iterations of the backProp algorithm for the network to reach the desired output.\n",
907 | "\n",
908 | "One of the simplest ways of solving this problem is by adding a Learning Rate (described below) to the back-propagation algorithm.\n",
909 | "\n",
910 | "### - Taking too long to compute one iteration\n",
911 | "\n",
912 | "Within one iteration, the multiplication and summing operations take too long because there are too many data points feeded into the network.\n",
913 | "\n",
914 | "This problem is tackled using Stochastic Gradient Descent (talked about in the next tutorial). The above algorithm is running Batch Gradient Descent. "
915 | ]
916 | },
917 | {
918 | "cell_type": "markdown",
919 | "metadata": {},
920 | "source": [
921 | "# Learning Rate\n",
922 | "\n",
923 | "Usually, it is desired that we change the amount with which we back propagate, so that we can train our network to reach the desired accuracy faster. So we multiply ${\\Delta}W$ with a factor to control this.\n",
924 | "\n",
925 | "$$W \\leftarrow W + \\eta*{\\Delta}W$$"
926 | ]
927 | },
928 | {
929 | "cell_type": "markdown",
930 | "metadata": {},
931 | "source": [
932 | "If $\\eta$ is large, then we take bigger steps to the assumed minimum. If $\\eta$ is small, we take smaller steps.\n",
933 | "\n",
934 | "Remember that we are not actually travelling on the gradient, we are only approximating the direction using a ${\\Delta}W$ instead of a ${\\delta}W$. So we don't always point in the direction of the minimum, we could undershoot or overshoot."
935 | ]
936 | },
937 | {
938 | "cell_type": "markdown",
939 | "metadata": {},
940 | "source": [
941 | "If $\\eta$ is too small, we might take too long to get to the minimum.\n",
942 | "\n",
943 | "If $\\eta$ is too big, we might start climbing back up the hill and our cost would keep increasing instead of decreasing!"
944 | ]
945 | },
946 | {
947 | "cell_type": "markdown",
948 | "metadata": {},
949 | "source": [
950 | "One way to ensure that we get the best learning rate is to start at, say, 1,\n",
951 | "- increase $\\eta$ by 5% if the cost is decreasing\n",
952 | "- decrease $\\eta$ to 50% if the cost is increasing"
953 | ]
954 | },
955 | {
956 | "cell_type": "markdown",
957 | "metadata": {},
958 | "source": [
959 | "### Different ways to manipulate learning rate\n",
960 | "\n",
961 | "There are various methods available that leverage the variability of learning rate, to produce results that \"converge\" (reach a minimum) faster. The following list includes those with even more complicated methods of trying to converge faster:\n",
962 | "\n",
963 | ""
964 | ]
965 | },
966 | {
967 | "cell_type": "markdown",
968 | "metadata": {},
969 | "source": [
970 | "As can be seen, Stochastic Gradient Descent (SGD) itself performs slower than all the other methods, and the one that we are using (Batch Gradient Descent) is even slower."
971 | ]
972 | },
973 | {
974 | "cell_type": "markdown",
975 | "metadata": {},
976 | "source": [
977 | "Below is an implementation of backProp with provision for learning rate:"
978 | ]
979 | },
980 | {
981 | "cell_type": "code",
982 | "execution_count": 485,
983 | "metadata": {
984 | "collapsed": true
985 | },
986 | "outputs": [],
987 | "source": [
988 | "# IMPLEMENTING BACK-PROPAGATION WITH LEARNING RATE\n",
989 | "# Added eta, the learning rate, as an input\n",
990 | "def backProp(weights, X, Y, learningRate):\n",
991 | " # Forward propagate to find outputs\n",
992 | " outputs = forwardProp(X, weights)\n",
993 | " \n",
994 | " # For the last layer, bpError = error = yPred - Y\n",
995 | " bpError = outputs[-1] - Y\n",
996 | " \n",
997 | " # Back-propagating from the last layer to the first\n",
998 | " for l, w in enumerate(reversed(weights)):\n",
999 | " \n",
1000 | " # Find yPred for this layer\n",
1001 | " yPred = outputs[-l-1]\n",
1002 | " \n",
1003 | " # Calculate delta for this layer using bpError from next layer\n",
1004 | " delta = np.multiply(np.multiply(bpError, yPred), 1-yPred)\n",
1005 | " \n",
1006 | " # Find input to the layer, by adding bias to the output of the previous layer\n",
1007 | " # Take care, l goes from 0 to 1, while the weights are in reverse order\n",
1008 | " if l==len(weights)-1: # If 1st layer has been reached\n",
1009 | " xL = addBiasTerms(X)\n",
1010 | " else:\n",
1011 | " xL = addBiasTerms(outputs[-l-2])\n",
1012 | " \n",
1013 | " # Calculate deltaW for this layer\n",
1014 | " deltaW = -np.dot(delta.T, xL)/len(Y)\n",
1015 | " \n",
1016 | " # Calculate bpError for previous layer to be back-propagated\n",
1017 | " bpError = np.dot(delta, w)\n",
1018 | " \n",
1019 | " # Ignore bias term in bpError\n",
1020 | " bpError = bpError[:,1:]\n",
1021 | " \n",
1022 | " # Change weights of the current layer (W <- W + eta*deltaW)\n",
1023 | " w += learningRate*deltaW"
1024 | ]
1025 | },
1026 | {
1027 | "cell_type": "markdown",
1028 | "metadata": {},
1029 | "source": [
1030 | "Given this back-propagation code, it is better to launch another function that calls it iteratively until we reach the desired accuracy."
1031 | ]
1032 | },
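  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Below is a minimal sketch (illustrative only) of such a loop, combining the learning-rate version of backProp() above with the adaptive learning-rate heuristic described earlier. The iteration count and starting rate are arbitrary choices; the next tutorial builds a proper training function."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# SKETCH: iterate backProp() with an adaptive learning rate\n",
    "# (illustrative only - the next tutorial develops a full training function)\n",
    "def trainSketch(weights, X, Y, nIterations=100, learningRate=1.0):\n",
    "    prevCost = np.inf\n",
    "    for i in range(nIterations):\n",
    "        # One gradient descent step with the current learning rate\n",
    "        backProp(weights, X, Y, learningRate)\n",
    "        cost = nnCost(weights, X, Y)\n",
    "        if cost > prevCost:\n",
    "            learningRate /= 2.0   # cost went up: halve the learning rate\n",
    "        else:\n",
    "            learningRate *= 1.05  # cost went down: increase it by 5%\n",
    "        prevCost = cost"
   ]
  },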
1033 | {
1034 | "cell_type": "markdown",
1035 | "metadata": {},
1036 | "source": [
1037 | "We shall look at training schemes and experiments in the next tutorial."
1038 | ]
1039 | }
1040 | ],
1041 | "metadata": {
1042 | "kernelspec": {
1043 | "display_name": "Python 3",
1044 | "language": "python",
1045 | "name": "python3"
1046 | },
1047 | "language_info": {
1048 | "codemirror_mode": {
1049 | "name": "ipython",
1050 | "version": 3
1051 | },
1052 | "file_extension": ".py",
1053 | "mimetype": "text/x-python",
1054 | "name": "python",
1055 | "nbconvert_exporter": "python",
1056 | "pygments_lexer": "ipython3",
1057 | "version": "3.5.2"
1058 | }
1059 | },
1060 | "nbformat": 4,
1061 | "nbformat_minor": 2
1062 | }
1063 |
--------------------------------------------------------------------------------
/5_Neural_Network_Tutorial_Training_And_Testing.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "code",
5 | "execution_count": 2,
6 | "metadata": {
7 | "collapsed": true
8 | },
9 | "outputs": [],
10 | "source": [
11 | "# Pre-requisites\n",
12 | "import numpy as np\n",
13 | "import time\n",
14 | "\n",
15 | "# For plots\n",
16 | "%matplotlib inline\n",
17 | "import matplotlib.pyplot as plt\n",
18 | "\n",
19 | "# To clear print buffer\n",
20 | "from IPython.display import clear_output"
21 | ]
22 | },
23 | {
24 | "cell_type": "markdown",
25 | "metadata": {},
26 | "source": [
27 | "# Importing functions from the previous tutorials:"
28 | ]
29 | },
30 | {
31 | "cell_type": "code",
32 | "execution_count": 3,
33 | "metadata": {},
34 | "outputs": [
35 | {
36 | "name": "stdout",
37 | "output_type": "stream",
38 | "text": [
39 | "weights:\n",
40 | "1\n",
41 | "(2, 3)\n",
42 | "[[-0.33589735 -0.396816 0.45849862]\n",
43 | " [-0.64374374 -2.41279823 0.78403628]]\n",
44 | "2\n",
45 | "(1, 3)\n",
46 | "[[ 1.54182154 -0.12516091 -0.28203429]]\n"
47 | ]
48 | }
49 | ],
50 | "source": [
51 | "# Initializing weight matrices from layer sizes\n",
52 | "def initializeWeights(layers):\n",
53 | " weights = [np.random.randn(o, i+1) for i, o in zip(layers[:-1], layers[1:])]\n",
54 | " return weights\n",
55 | "\n",
56 | "# Add a bias term to every data point in the input\n",
57 | "def addBiasTerms(X):\n",
58 | " # Make the input an np.array()\n",
59 | " X = np.array(X)\n",
60 | " \n",
61 | " # Forcing 1D vectors to be 2D matrices of 1xlength dimensions\n",
62 | " if X.ndim==1:\n",
63 | " X = np.reshape(X, (1, len(X)))\n",
64 | " \n",
65 | " # Inserting bias terms\n",
66 | " X = np.insert(X, 0, 1, axis=1)\n",
67 | " \n",
68 | " return X\n",
69 | "\n",
70 | "# Sigmoid function\n",
71 | "def sigmoid(a):\n",
72 | " return 1/(1 + np.exp(-a))\n",
73 | "\n",
74 | "# Forward Propagation of outputs\n",
75 | "def forwardProp(X, weights):\n",
76 | " # Initializing an empty list of outputs\n",
77 | " outputs = []\n",
78 | " \n",
79 | " # Assigning a name to reuse as inputs\n",
80 | " inputs = X\n",
81 | " \n",
82 | " # For each layer\n",
83 | " for w in weights:\n",
84 | " # Add bias term to input\n",
85 | " inputs = addBiasTerms(inputs)\n",
86 | " \n",
87 | " # Y = Sigmoid ( X .* W^T )\n",
88 | " outputs.append(sigmoid(np.dot(inputs, w.T)))\n",
89 | " \n",
90 | " # Input of next layer is output of this layer\n",
91 | " inputs = outputs[-1]\n",
92 | " \n",
93 | " return outputs\n",
94 | "\n",
95 | "# Compute COST (J) of Neural Network\n",
96 | "def nnCost(weights, X, Y):\n",
97 | " # Calculate yPred\n",
98 | " yPred = forwardProp(X, weights)[-1]\n",
99 | " \n",
100 | " # Compute J\n",
101 | " J = 0.5*np.sum((yPred-Y)**2)/len(Y)\n",
102 | " \n",
103 | " return J\n",
104 | "\n",
105 | "# IMPLEMENTING BACK-PROPAGATION WITH LEARNING RATE\n",
106 | "# Added eta, the learning rate, as an input\n",
107 | "def backProp(weights, X, Y, learningRate):\n",
108 | " # Forward propagate to find outputs\n",
109 | " outputs = forwardProp(X, weights)\n",
110 | " \n",
111 | " # For the last layer, bpError = error = yPred - Y\n",
112 | " bpError = outputs[-1] - Y\n",
113 | " \n",
114 | " # Back-propagating from the last layer to the first\n",
115 | " for l, w in enumerate(reversed(weights)):\n",
116 | " \n",
117 | " # Find yPred for this layer\n",
118 | " yPred = outputs[-l-1]\n",
119 | " \n",
120 | " # Calculate delta for this layer using bpError from next layer\n",
121 | " delta = np.multiply(np.multiply(bpError, yPred), 1-yPred)\n",
122 | " \n",
123 | " # Find input to the layer, by adding bias to the output of the previous layer\n",
124 | " # Take care, l goes from 0 to 1, while the weights are in reverse order\n",
125 | " if l==len(weights)-1: # If 1st layer has been reached\n",
126 | " xL = addBiasTerms(X)\n",
127 | " else:\n",
128 | " xL = addBiasTerms(outputs[-l-2])\n",
129 | " \n",
130 | " # Calculate deltaW for this layer\n",
131 | " deltaW = -np.dot(delta.T, xL)/len(Y)\n",
132 | " \n",
133 | " # Calculate bpError for previous layer to be back-propagated\n",
134 | " bpError = np.dot(delta, w)\n",
135 | " \n",
136 | " # Ignore bias term in bpError\n",
137 | " bpError = bpError[:,1:]\n",
138 | " \n",
139 | " # Change weights of the current layer (W <- W + eta*deltaW)\n",
140 | " w += learningRate*deltaW\n",
141 | "\n",
142 | "# Evaluate the accuracy of weights for input X and desired outptut Y\n",
143 | "def evaluate(weights, X, Y):\n",
144 | " yPreds = forwardProp(X, weights)[-1]\n",
145 | " # Check if maximum probability is from that neuron corresponding to desired class,\n",
146 | " # AND check if that maximum probability is greater than 0.5\n",
147 | " yes = sum( int( ( np.argmax(yPreds[i]) == np.argmax(Y[i]) ) and \n",
148 | " ( (yPreds[i][np.argmax(yPreds[i])]>0.5) == (Y[i][np.argmax(Y[i])]>0.5) ) )\n",
149 | " for i in range(len(Y)) )\n",
150 | " print(str(yes)+\" out of \"+str(len(Y))+\" : \"+str(float(yes/len(Y))))\n",
151 | "\n",
152 | "# Initialize network\n",
153 | "layers = [2, 2, 1]\n",
154 | "weights = initializeWeights(layers)\n",
155 | "\n",
156 | "print(\"weights:\")\n",
157 | "for i in range(len(weights)):\n",
158 | " print(i+1); print(weights[i].shape); print(weights[i])\n",
159 | "\n",
160 | "# Declare input and desired output for AND gate\n",
161 | "X = np.array([[0,0], [0,1], [1,0], [1,1]])\n",
162 | "Y = np.array([[0], [0], [0], [1]])"
163 | ]
164 | },
165 | {
166 | "cell_type": "markdown",
167 | "metadata": {},
168 | "source": [
169 | "# Batch Gradient Descent\n",
170 | "\n",
171 | "Batch Gradient Descent is how we have tried to train our network so far - give it ALL the data points, compute ${\\Delta}W$s by summing up quantities across ALL the data points, change all the weights once, Repeat."
172 | ]
173 | },
174 | {
175 | "cell_type": "markdown",
176 | "metadata": {},
177 | "source": [
178 | "Suppose we want to train our 3-neuron network to implement Logical XOR.\n",
179 | "\n",
180 | "Inputs are: $X=\\left[\\begin{array}{c}(0,0)\\\\(0,1)\\\\(1,0)\\\\(1,1)\\end{array}\\right]$, and the desired output is $Y=\\left[\\begin{array}{c}0\\\\1\\\\1\\\\0\\end{array}\\right]$."
181 | ]
182 | },
183 | {
184 | "cell_type": "markdown",
185 | "metadata": {},
186 | "source": [
187 | "We know that in order to train the network, we need to call backProp repeatedly. Let us use a function to do that."
188 | ]
189 | },
190 | {
191 | "cell_type": "code",
192 | "execution_count": 4,
193 | "metadata": {
194 | "collapsed": true
195 | },
196 | "outputs": [],
197 | "source": [
198 | "# TRAINING FUNCTION, USING GD\n",
199 | "def train(weights, X, Y, nIterations, learningRate=1):\n",
200 | " for i in range(nIterations):\n",
201 | " # Run backprop\n",
202 | " backProp(weights, X, Y, learningRate)\n",
203 | " \n",
204 | " # Clears screen output\n",
205 | " if (i+1)%(nIterations/10)==0:\n",
206 | " clear_output()\n",
207 | " print(\"Iteration \"+str(i+1)+\" of \"+str(nIterations))\n",
208 | " # Prints Cost and Accuracy\n",
209 | " print(\"Cost: \"+str(nnCost(weights, X, Y)))\n",
210 | " print(\"Accuracy:\")\n",
211 | " evaluate(weights, X, Y)"
212 | ]
213 | },
214 | {
215 | "cell_type": "code",
216 | "execution_count": 5,
217 | "metadata": {},
218 | "outputs": [
219 | {
220 | "name": "stdout",
221 | "output_type": "stream",
222 | "text": [
223 | "weights:\n",
224 | "1\n",
225 | "(2, 3)\n",
226 | "[[ 0.04837515 0.26989845 -0.24049688]\n",
227 | " [ 0.40457749 -1.12764482 1.62391936]]\n",
228 | "2\n",
229 | "(1, 3)\n",
230 | "[[-0.21690785 -0.77508326 0.61363791]]\n"
231 | ]
232 | }
233 | ],
234 | "source": [
235 | "# Initialize network\n",
236 | "layers = [2, 2, 1]\n",
237 | "weights = initializeWeights(layers)\n",
238 | "\n",
239 | "print(\"weights:\")\n",
240 | "for i in range(len(weights)):\n",
241 | " print(i+1); print(weights[i].shape); print(weights[i])\n",
242 | "\n",
243 | "# Take backup of weights to be used later for comparison\n",
244 | "initialWeights = [np.array(w) for w in weights]"
245 | ]
246 | },
247 | {
248 | "cell_type": "code",
249 | "execution_count": 6,
250 | "metadata": {
251 | "collapsed": true
252 | },
253 | "outputs": [],
254 | "source": [
255 | "# Declare input and desired output for XOR gate\n",
256 | "X = np.array([[0,0], [0,1], [1,0], [1,1]])\n",
257 | "Y = np.array([[0], [1], [1], [0]])"
258 | ]
259 | },
260 | {
261 | "cell_type": "code",
262 | "execution_count": 7,
263 | "metadata": {},
264 | "outputs": [
265 | {
266 | "name": "stdout",
267 | "output_type": "stream",
268 | "text": [
269 | "Cost: 0.12907524705\n",
270 | "Accuracy: \n",
271 | "2 out of 4 : 0.5\n",
272 | "[[ 0.43886508]\n",
273 | " [ 0.49374299]\n",
274 | " [ 0.38577198]\n",
275 | " [ 0.4543426 ]]\n"
276 | ]
277 | }
278 | ],
279 | "source": [
280 | "# Check current accuracy and cost\n",
281 | "print(\"Cost: \"+str(nnCost(weights, X, Y)))\n",
282 | "print(\"Accuracy: \")\n",
283 | "evaluate(weights, X, Y)\n",
284 | "print(forwardProp(X, weights)[-1])"
285 | ]
286 | },
287 | {
288 | "cell_type": "markdown",
289 | "metadata": {},
290 | "source": [
291 | "Say we want to train our model 600 times."
292 | ]
293 | },
294 | {
295 | "cell_type": "code",
296 | "execution_count": 8,
297 | "metadata": {},
298 | "outputs": [
299 | {
300 | "name": "stdout",
301 | "output_type": "stream",
302 | "text": [
303 | "Iteration 400 of 400\n",
304 | "Cost: 0.124997811474\n",
305 | "Accuracy:\n",
306 | "3 out of 4 : 0.75\n",
307 | "[[ 0.49895486]\n",
308 | " [ 0.50338071]\n",
309 | " [ 0.49407386]\n",
310 | " [ 0.4984321 ]]\n"
311 | ]
312 | }
313 | ],
314 | "source": [
315 | "nIterations = 400\n",
316 | "train(weights, X, Y, nIterations)\n",
317 | "print(forwardProp(X, weights)[-1])"
318 | ]
319 | },
320 | {
321 | "cell_type": "code",
322 | "execution_count": 9,
323 | "metadata": {
324 | "collapsed": true
325 | },
326 | "outputs": [],
327 | "source": [
328 | "# In case we want to revert the weight back\n",
329 | "weights = [np.array(w) for w in initialWeights]"
330 | ]
331 | },
332 | {
333 | "cell_type": "markdown",
334 | "metadata": {},
335 | "source": [
336 | "It took our function a long time to train.\n",
337 | "\n",
338 | "What if we speed up using adaptive learning rate?"
339 | ]
340 | },
341 | {
342 | "cell_type": "code",
343 | "execution_count": 10,
344 | "metadata": {
345 | "collapsed": true
346 | },
347 | "outputs": [],
348 | "source": [
349 | "# TRAINING FUNCTION, USING GD\n",
350 | "# Default learning rate = 1.0\n",
351 | "def trainUsingGD(weights, X, Y, nIterations, learningRate=1.0):\n",
352 | " # Setting initial cost to infinity\n",
353 | " prevCost = np.inf\n",
354 | " \n",
355 | " # For nIterations number of iterations:\n",
356 | " for i in range(nIterations):\n",
357 | " # Run backprop\n",
358 | " backProp(weights, X, Y, learningRate)\n",
359 | " \n",
360 | " #clear_output()\n",
361 | " print(\"Iteration \"+str(i+1)+\" of \"+str(nIterations))\n",
362 | " cost = nnCost(weights, X, Y)\n",
363 | " print(\"Cost: \"+str(cost))\n",
364 | " \n",
365 | " # ADAPT LEARNING RATE\n",
366 | " # If cost increases\n",
367 | " if (cost > prevCost):\n",
368 | " # Halve the learning rate\n",
369 | " learningRate /= 2.0\n",
370 | " # If cost decreases\n",
371 | " else:\n",
372 | " # Increase learning rate by 5%\n",
373 | " learningRate *= 1.05\n",
374 | " \n",
375 | " prevCost = cost"
376 | ]
377 | },
378 | {
379 | "cell_type": "code",
380 | "execution_count": 11,
381 | "metadata": {
382 | "collapsed": true
383 | },
384 | "outputs": [],
385 | "source": [
386 | "# Revert weights back to initial values\n",
387 | "weights = [np.array(w) for w in initialWeights]"
388 | ]
389 | },
390 | {
391 | "cell_type": "code",
392 | "execution_count": 12,
393 | "metadata": {},
394 | "outputs": [
395 | {
396 | "name": "stdout",
397 | "output_type": "stream",
398 | "text": [
399 | "Iteration 1 of 100\n",
400 | "Cost: 0.128848112614\n",
401 | "Iteration 2 of 100\n",
402 | "Cost: 0.128650869728\n",
403 | "Iteration 3 of 100\n",
404 | "Cost: 0.12848026395\n",
405 | "Iteration 4 of 100\n",
406 | "Cost: 0.128332996448\n",
407 | "Iteration 5 of 100\n",
408 | "Cost: 0.128205816336\n",
409 | "Iteration 6 of 100\n",
410 | "Cost: 0.128095601033\n",
411 | "Iteration 7 of 100\n",
412 | "Cost: 0.127999422128\n",
413 | "Iteration 8 of 100\n",
414 | "Cost: 0.12791459536\n",
415 | "Iteration 9 of 100\n",
416 | "Cost: 0.127838714376\n",
417 | "Iteration 10 of 100\n",
418 | "Cost: 0.12776966891\n",
419 | "Iteration 11 of 100\n",
420 | "Cost: 0.127705648793\n",
421 | "Iteration 12 of 100\n",
422 | "Cost: 0.127645135859\n",
423 | "Iteration 13 of 100\n",
424 | "Cost: 0.127586886153\n",
425 | "Iteration 14 of 100\n",
426 | "Cost: 0.127529905081\n",
427 | "Iteration 15 of 100\n",
428 | "Cost: 0.127473418103\n",
429 | "Iteration 16 of 100\n",
430 | "Cost: 0.127416839401\n",
431 | "Iteration 17 of 100\n",
432 | "Cost: 0.127359740627\n",
433 | "Iteration 18 of 100\n",
434 | "Cost: 0.127301821408\n",
435 | "Iteration 19 of 100\n",
436 | "Cost: 0.127242882839\n",
437 | "Iteration 20 of 100\n",
438 | "Cost: 0.127182804686\n",
439 | "Iteration 21 of 100\n",
440 | "Cost: 0.127121526616\n",
441 | "Iteration 22 of 100\n",
442 | "Cost: 0.127059033379\n",
443 | "Iteration 23 of 100\n",
444 | "Cost: 0.126995343612\n",
445 | "Iteration 24 of 100\n",
446 | "Cost: 0.126930501761\n",
447 | "Iteration 25 of 100\n",
448 | "Cost: 0.126864572508\n",
449 | "Iteration 26 of 100\n",
450 | "Cost: 0.126797637148\n",
451 | "Iteration 27 of 100\n",
452 | "Cost: 0.126729791334\n",
453 | "Iteration 28 of 100\n",
454 | "Cost: 0.126661143778\n",
455 | "Iteration 29 of 100\n",
456 | "Cost: 0.126591815524\n",
457 | "Iteration 30 of 100\n",
458 | "Cost: 0.126521939537\n",
459 | "Iteration 31 of 100\n",
460 | "Cost: 0.126451660431\n",
461 | "Iteration 32 of 100\n",
462 | "Cost: 0.126381134199\n",
463 | "Iteration 33 of 100\n",
464 | "Cost: 0.126310527861\n",
465 | "Iteration 34 of 100\n",
466 | "Cost: 0.126240018977\n",
467 | "Iteration 35 of 100\n",
468 | "Cost: 0.126169794983\n",
469 | "Iteration 36 of 100\n",
470 | "Cost: 0.126100052304\n",
471 | "Iteration 37 of 100\n",
472 | "Cost: 0.126030995231\n",
473 | "Iteration 38 of 100\n",
474 | "Cost: 0.125962834522\n",
475 | "Iteration 39 of 100\n",
476 | "Cost: 0.125895785707\n",
477 | "Iteration 40 of 100\n",
478 | "Cost: 0.12583006709\n",
479 | "Iteration 41 of 100\n",
480 | "Cost: 0.125765897414\n",
481 | "Iteration 42 of 100\n",
482 | "Cost: 0.125703493213\n",
483 | "Iteration 43 of 100\n",
484 | "Cost: 0.125643065835\n",
485 | "Iteration 44 of 100\n",
486 | "Cost: 0.12558481818\n",
487 | "Iteration 45 of 100\n",
488 | "Cost: 0.125528941182\n",
489 | "Iteration 46 of 100\n",
490 | "Cost: 0.125475610101\n",
491 | "Iteration 47 of 100\n",
492 | "Cost: 0.125424980703\n",
493 | "Iteration 48 of 100\n",
494 | "Cost: 0.125377185428\n",
495 | "Iteration 49 of 100\n",
496 | "Cost: 0.12533232967\n",
497 | "Iteration 50 of 100\n",
498 | "Cost: 0.125290488278\n",
499 | "Iteration 51 of 100\n",
500 | "Cost: 0.125251702426\n",
501 | "Iteration 52 of 100\n",
502 | "Cost: 0.125215976974\n",
503 | "Iteration 53 of 100\n",
504 | "Cost: 0.125183278403\n",
505 | "Iteration 54 of 100\n",
506 | "Cost: 0.125153533406\n",
507 | "Iteration 55 of 100\n",
508 | "Cost: 0.125126628143\n",
509 | "Iteration 56 of 100\n",
510 | "Cost: 0.125102408083\n",
511 | "Iteration 57 of 100\n",
512 | "Cost: 0.125080678309\n",
513 | "Iteration 58 of 100\n",
514 | "Cost: 0.125061203999\n",
515 | "Iteration 59 of 100\n",
516 | "Cost: 0.125043710737\n",
517 | "Iteration 60 of 100\n",
518 | "Cost: 0.125027884097\n",
519 | "Iteration 61 of 100\n",
520 | "Cost: 0.125013367839\n",
521 | "Iteration 62 of 100\n",
522 | "Cost: 0.124999759817\n",
523 | "Iteration 63 of 100\n",
524 | "Cost: 0.124986604465\n",
525 | "Iteration 64 of 100\n",
526 | "Cost: 0.12497338044\n",
527 | "Iteration 65 of 100\n",
528 | "Cost: 0.124959481574\n",
529 | "Iteration 66 of 100\n",
530 | "Cost: 0.124944188837\n",
531 | "Iteration 67 of 100\n",
532 | "Cost: 0.12492663033\n",
533 | "Iteration 68 of 100\n",
534 | "Cost: 0.124905725647\n",
535 | "Iteration 69 of 100\n",
536 | "Cost: 0.124880110131\n",
537 | "Iteration 70 of 100\n",
538 | "Cost: 0.124848033934\n",
539 | "Iteration 71 of 100\n",
540 | "Cost: 0.124807230659\n",
541 | "Iteration 72 of 100\n",
542 | "Cost: 0.124754751262\n",
543 | "Iteration 73 of 100\n",
544 | "Cost: 0.124686761318\n",
545 | "Iteration 74 of 100\n",
546 | "Cost: 0.124598303179\n",
547 | "Iteration 75 of 100\n",
548 | "Cost: 0.124483025338\n",
549 | "Iteration 76 of 100\n",
550 | "Cost: 0.12433286859\n",
551 | "Iteration 77 of 100\n",
552 | "Cost: 0.124137651121\n",
553 | "Iteration 78 of 100\n",
554 | "Cost: 0.123884387194\n",
555 | "Iteration 79 of 100\n",
556 | "Cost: 0.123556010954\n",
557 | "Iteration 80 of 100\n",
558 | "Cost: 0.123129051477\n",
559 | "Iteration 81 of 100\n",
560 | "Cost: 0.122569925578\n",
561 | "Iteration 82 of 100\n",
562 | "Cost: 0.121830095196\n",
563 | "Iteration 83 of 100\n",
564 | "Cost: 0.120841446193\n",
565 | "Iteration 84 of 100\n",
566 | "Cost: 0.119514949723\n",
567 | "Iteration 85 of 100\n",
568 | "Cost: 0.117748246786\n",
569 | "Iteration 86 of 100\n",
570 | "Cost: 0.115450497266\n",
571 | "Iteration 87 of 100\n",
572 | "Cost: 0.112589941634\n",
573 | "Iteration 88 of 100\n",
574 | "Cost: 0.109249438151\n",
575 | "Iteration 89 of 100\n",
576 | "Cost: 0.105640175411\n",
577 | "Iteration 90 of 100\n",
578 | "Cost: 0.102027696196\n",
579 | "Iteration 91 of 100\n",
580 | "Cost: 0.0990064970213\n",
581 | "Iteration 92 of 100\n",
582 | "Cost: 0.123641875887\n",
583 | "Iteration 93 of 100\n",
584 | "Cost: 0.206124967964\n",
585 | "Iteration 94 of 100\n",
586 | "Cost: 0.128853866919\n",
587 | "Iteration 95 of 100\n",
588 | "Cost: 0.100914621849\n",
589 | "Iteration 96 of 100\n",
590 | "Cost: 0.0954172210932\n",
591 | "Iteration 97 of 100\n",
592 | "Cost: 0.0925797728969\n",
593 | "Iteration 98 of 100\n",
594 | "Cost: 0.0909633907318\n",
595 | "Iteration 99 of 100\n",
596 | "Cost: 0.0897659003613\n",
597 | "Iteration 100 of 100\n",
598 | "Cost: 0.0886726139317\n"
599 | ]
600 | }
601 | ],
602 | "source": [
603 | "# Train for nIterations\n",
604 | "# Don't expect same results for running with 20 iterations\n",
605 | "# as with running twice with 10 iterations - learning rates are different!\n",
606 | "nIterations = 100\n",
607 | "trainUsingGD(weights, X, Y, nIterations)"
608 | ]
609 | },
610 | {
611 | "cell_type": "markdown",
612 | "metadata": {},
613 | "source": [
614 | "We see that with adaptive learning rate, we reach the desired output much faster!"
615 | ]
616 | },
617 | {
618 | "cell_type": "markdown",
619 | "metadata": {},
620 | "source": [
621 | "# MNIST Dataset\n",
622 | "\n",
623 | "MNIST is a dataset of 60000 images of hand-written numbers."
624 | ]
625 | },
626 | {
627 | "cell_type": "code",
628 | "execution_count": 13,
629 | "metadata": {
630 | "collapsed": true
631 | },
632 | "outputs": [],
633 | "source": [
634 | "# Load MNIST DATA\n",
635 | "# Use numpy.load() to load the .npz file\n",
636 | "f = np.load('mnist.npz')\n",
637 | "# Saving the files\n",
638 | "x_train = f['x_train']\n",
639 | "y_train = f['y_train']\n",
640 | "x_test = f['x_test']\n",
641 | "y_test = f['y_test']\n",
642 | "f.close()"
643 | ]
644 | },
645 | {
646 | "cell_type": "code",
647 | "execution_count": 14,
648 | "metadata": {},
649 | "outputs": [
650 | {
651 | "name": "stdout",
652 | "output_type": "stream",
653 | "text": [
654 | "x_train.shape = (60000, 28, 28)\n",
655 | "y_train.shape = (60000,)\n"
656 | ]
657 | },
658 | {
659 | "data": {
660 | "image/png": "iVBORw0KGgoAAAANSUhEUgAAAlMAAACNCAYAAACT6v+eAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAIABJREFUeJzt3WeAFMX29/HvkhQDIKBEI4JgJoMiiChiAgVFURQxoKhg\nRBC9JpJIUBAEAdM1Z1BQMSCgcPUa0D+KRHPEAIogK2GfF/2c6tnd2djTM917f5836M7sTNX2THf1\nqVOnsnJychARERGR0imX6QaIiIiIxJkGUyIiIiIBaDAlIiIiEoAGUyIiIiIBaDAlIiIiEoAGUyIi\nIiIBaDAlIiIiEoAGUyIiIiIBaDAlIiIiEkCFdL5ZVlZWrMut5+TkZBX1nLLex7LeP1Af40B9LPv9\nA/UxDtRHjyJTIiIiIgFoMCUiIiISgAZTIiIiIgFoMCUiIiISgAZTIiIiIgFoMBVTzZs358EHH+TB\nBx9k27ZtbNu2zf1/s2bNMt08EYmhCRMmkJOTQ05ODkuXLmXp0qXsvffemW6WSCjefPNN5s2bx7x5\n8wK/lgZTIiIiIgGktc5UGMqXL0/VqlXz/fyKK64AYKeddgLggAMOAODyyy9n7NixAPTq1QuAzZs3\nc8cddwBw2223hd7mIA4//HAAXn/9dapUqQJATo5XwuPcc88FoGvXrtSoUSMzDUyTTp06AfDYY48B\n0KFDB1asWJHJJqXETTfdBHifw3LlvHudo48+GoAFCxZkqllSiF133ZVddtkFgJNOOgmA3XffHYDx\n48eTnZ2dsbYV1z777ANA79692b59OwBNmjQBoHHjxnz99deZalrKNGrUCICKFSvSvn17AO69914A\n1+eCzJo1C4CzzjoLgH/++SesZqZExYoVOeKIIwAYOXIkAEceeWQmmxQpd911FwBHHHEE//73v1Py\nmrEYTO21115UqlQJwH1A2rVrB0C1atXo0aNHka/x3XffATBx4kROO+00ADZs2ADAJ598EvkLVatW\nrQB47rnnAKhataobRFk/7Ateo0YN2rRpA8BHH32U67Ew2QmqRo0avPDCC6G+V8uWLQF4//33Q32f\ndDn//PMBGDx4MJD75G7HWaLBBh52rNq2bcvBBx+c9Ll16tRh4MCB6Wpaqf3yyy8ALFy4kK5du2a4\nNalx0EEHAf5364wzzgCgXLly1K1bF/C/Z0V9x+xvMnXqVACuuuoq/vzzz5S3OVWqVq3KW2+9BcBP\nP/0EQO3atd1//6+yoMmll14KwJYtW3jzzTdT8tqa5hMREREJINKRKZvSmjdvXtKpvOKwOw+bPvnr\nr7/c1NCPP/4IwLp16yI5RWRTlM2aNePRRx8FvDvdvFatWgXAnXfeCcCTTz7JokWLAL/fo0aNCr29\nNh3VsGHDUCNT5cqVY9999wVwybFZWUVW+48068eOO+6Y4ZaUXuvWrenduzfgTbuCHx0AuO666wD4\n4YcfAC+6bJ/r9957L51NLbHGjRsDXkTinHPOAaBy5cqA99n79ttvAT9KbFNkPXv2dFNJy5cvT2ub\nS2Ljxo0AZWI6z9g578QTT0zZa5533nkA3H///e4cG3W1a9d2//6vR6ZsxqZixYoAvPPOOzz99NMp\neW1FpkREREQCiHRk6ptvvgHgt99+K1Zkyu5u169fT8eOHQE/V+iRRx4JqZXhue+++wA/Ub4gVgrB\nkmAXLFjgokSHHnpoeA3Mw+7a/vOf/4T6PnXq1OHiiy8GcJGNKN/1F+bYY48FYMCAAbl+vnz5ck4+\n+WQAfv7557S3qyTOPPNMwFtWX7NmTcCPFM6fP98lY48ZMybX72VlZbnHLLE3Kux8M3r0aMDv4667\n7prvuatWreL4448H/Dte+zzWrFnT/U2irFq1agAcdthhGW5J6rz++utA/sjU2rVruf/++wHcIo/E\nHEXLy7XoatzFPWpfkPbt23PjjTcC/jXy999/L/D5vXr1crmNa9asAfxoeSpEejBlf5hBgwa5C8uS\nJUsAL5HcfPzxxwAcd9xxgBeytumFK6+8Mm3tTZXmzZsD/sqgxC+DJcq/9NJLblWiTZvY32bdunUc\nc8wx+X43bHZiCtuMGTPcf9sUZxy1a9eOBx98ECDfzcKYMWMiO+VSoYJ32mjRogUA06dPB7xp6YUL\nFwIwbNgwwAuj77DDDgAunN65c2f3Wh988EF6Gl1CtkjloosuKvA5dkI+7rjj3DTf/vvvH37jQmAp\nBXvttVe+x1q2bOkGh1H9TCYzZcoUAGbOnJnr51u2bCl0ustWSX/66acALlk98bWi+rlNxpLr45xC\nkMy0adNo2LAhAAceeCDgnW8KMnToULfK3W7GP/nkk5S1R9N8IiIiIgFEOjJlZs6c6SqUWoKnhaMv\nvPBCF6GxJEqAzz77DIB+/fqls6mBJNaQAnLVkXrllVcAP5zZoUMHl1xukRpb3vzJJ5+4sLVFt5o1\na+bKJKSaTSXWqlUrlNfPKzGKY3+rOOrTp0+uu17wpsWAlNU+CYMlmSdGCME7FjYdlrhs3H6WGJEC\nr1zJww8/HGZTS82W0ef11VdfuXIcVhrBolLgJ57HjUW3H3roIW699dZcj916662sX78egEmTJqW7\naaW2detWIPfxKQ6bst1tt93yPWYlduJQOyyvFi1a8O6772a6GSmzadOmYkXd7Lq69957u+tiGFE6\nRaZEREREAohFZArIVyDtjz/+cP9t859PPfUUUHQ12yhq1KgRgwYNAvzIy6+//gp4JRzsDv6vv/4C\nYM6cOcyZM6fI17Xl29dee61b0p1qluBp7xUWi3xZWQSA77//PtT3DIMlJF9wwQXus2p3/sOHD89Y\nu4pj2LBhDB06FPBzMWzp/0033ZS0kKElieY1cOBAF02NGjunWGT7tddeA2D16tWsXbu2wN9LV3Q2\nLMOGDcsXmfpfYYsg7NgnO5/dfPPNaW1TaW3dutVdI+160qBBg0w2KWUsH/OQQw7h888/B5LnPu28\n886AH0HeaaedXGTu2WefTXm7FJkSERERCSA2kam87O6pefPmbgmrLTO3u8g4sJVOY8eOdREeywuz\nUgMffPBB4KhPslU6qWL7HhrLV0s1y42rVasWK1euBPy/VRzYNiS2JVCie+65B8BtARE1dkc+dOhQ\nV25k7ty5gH/n9/fff7vnW05C586d3WfPVpZa9M32O4siyyEqaZSmbdu2IbQmvZKVCyirLFo/ZMgQ\ntxLTylskshXjW7ZsSV/jAli/fj1vv/02gFsJH3d77rkn4EcOt27d6vbgTRbhHj9+PODnP/7www+h\n7k8Y28GUJZtffPHFLrHalmi/9dZbbunq5MmTgejub9a0aVMgdy2Ubt26AfHd2DYV++VVqVKFLl26\nAH7Cc2ICs4V6bXosDqw/ibW/bF+oCRMmZKRNRbH6Q5dddhngfY9sEHXqqafme75dkGyXASvzAX5o\n3Sr1x5XttWfTCIkOOeSQXP+/e
PHi0OuupVpx96uLOrt5sQ3g7WY7ke3xmqyvNmU9ZMgQXn75ZSD3\nDYOkh9WGsl01LE3innvuSXqNtNpRtiejGTFiRIit1DSfiIiISCCxjUyZNWvWuBGoFUA899xz3d2I\n3T3aUnPbjy8qLBSZlZXlRtmpiEhlMlRfvXr1pD+3chY23WN3ivXr16dSpUqAH3YvV66cuwu0yva2\nHLlChQp8+OGHIbU+HKeeeqrbsdy888479OnTB8i9oCJK7LgkVvG2yMwee+wBQN++fQHo2rWru4u0\navw5OTnurt+q1SeWMIk6K2ZpRQFvueWWfBW1y5Url+97ZtOEffv2Zdu2bWloqSQ6+OCDefHFF4HS\npzjYNNm0adNS1q5MsoKVcWCFgXv37l1gtfq2bdtyww03AP51tHr16m5az64zdu23HUXCosiUiIiI\nSACxj0yBP5dqW4uMHz+eTp06ATBy5EjAK9gF3rxpFJbTW1KgFRTLyclxd1KpkDfvwRIow2ARJHuv\nqVOnuuXziSxXyO4YrKjepk2bWLZsGQAPPPAA4CXdW4TO9qazgnmVK1eOzV58hSWdf/HFF5Hfd8+S\nzS3Bc/fdd+fLL78EkueZWETG8k3q1KnjSny89NJLobc3FSpWrOhyGe241alTB/A+69ZHy4Xq0qWL\ni2AZu7Pu3r27y4ezv6Wkh51nCttSq7AIvp2jTzjhBFc0Oc66du2a6SYUm5WpmDFjhjvP2DFavXo1\n4BUhtS2tLM+4Xr167rtq56wLLrggLW0uE4MpY3sp9ezZk1NOOQXwp/4uueQSABo2bOj28MskW51n\n0yhr1651dbJKy1YGJq5AssrxFg4NgyUn275dtlFoXrZxte1vZTVCiqrKa7V+bFPcL774ImCL08dW\nuiU7Weed9osiS/C3ZPPZs2e7aVzbm85W5T300ENuP80nn3wS8AYh9t9RZ9/FLl268Pzzz+d67Lbb\nbgO879OiRYsAfzp73rx5bnrT2Gd11KhR+T73Ua+enWyA0b59eyA+FdA//fRTt9m7LWCxhRObN29O\n+jsXXnghkH/T8biylcFxWs1nuyXYdXvLli3uHHT22WcD3t6zAOPGjXMr+W1QlZWV5QZflppgFfCP\nPvpod84Kg6b5RERERAIoU5Eps379eh555BHA3z/Mwu7t27d3dyy2D1oUZGdnlzo53iJStlffoEGD\n3JTYuHHjAL9yephGjx4dyuvalK1JNmUWNTZ9m3c/OvAjOStWrEhrm4KwRQAWcSmIRTDsjnH79u2R\njyRaXSGLPtlOBICb3rE6YOvXr3d/A1suf8ghh7gpPCv7YJGqbt26uTIRb7zxBuB9T+zu2oQ5DV9S\nyUojdO/eHfAT8W1aPsosUl7cJfEW0S8rkSmLiJqKFSu6dBf720SNzSBZ24cPH+6iVHkNGDDAJZUn\nq+9m07sWoQszKgWKTImIiIgEUqYiU5bgfPrpp9OyZUvAj0iZZcuWsXDhwrS3rSilST636IfdSdt8\n86xZs+jRo0fqGhcxtuAgyqwKf+LO85YblreYXFliuYCJ0Y0o50yVL1/eFYC1Yn8bN25kyJAhgJ/7\nZXkbLVq0cHlDlqS+atUq+vfvD/h3wVWqVAG8/EEr92EJwK+//rp7f8vnSNxvMtOmTp0K+FGCRJa/\neNVVV6W1Telw/PHHZ7oJKWULfExWVpabxYgqi9pbzqJ9P5KpWbNmvlzFXr16udxpY7M0YVNkSkRE\nRCSA2EemDjjgALc/j83r165dO9/zrHDejz/+GIk9p/Iu2z311FO58sori/37V199Nf/6178Af1dw\ny82wPf0kc6xAXuJn7d577wXSk7+WKbZiKi769evnIlKbNm0CvIiMRRbbtGkD+IVJTzjhBBd9u/32\n2wFv5VHeO2grDfHqq6/y6quvAt5dM/irksD7HkdNXMqOJLK8N8tRnDdvXom2funbt29kt3QqLYvy\n2PFs3LixiyjaCuyoKc4xsOvdGWec4SLAlg/19NNPh9e4IsRuMGUDJTsxXXHFFa6WTzK2R58lIaay\nllMQltxp/9auXZuJEycCfq2l3377DfBO6FbR3aqI169f3yXp2QXMLtZllQ08GzVqVGQ5hUyxZElb\nXp5o8eLF6W5O2sVtqsQ2cAZvyg+8aXNLRra9BhPZY6NGjQIodoXzJ554Ite/UWXJ9paI3aBBA/eY\n3fDZc8JO6i2Odu3aceONNwK4sjf77rtvoVNEVtbCqtmPHz8+X60wG4wVVEohLuzGoF69elxzzTUZ\nbk1wNhDs378/a9euBeCYY47JZJMATfOJiIiIBBKLyFStWrXcklxL/mzcuHGBz3/vvfcYM2YM4Ic6\nozC1V5jy5cu7Ebclj9tUQcOGDfM9f/HixS7ZNfHuuiyzKF6yqE8UHH744W6/Qfu82ZL5yZMnR77a\neSrst99+mW5Cifz000+u1IEl51r0F/zyB7ZoZebMmXz11VdA8SNScfXZZ58BuY9pFM+jkyZNypeI\nfP3117Nhw4YCf8ciWM2aNQNyl4GwkjlTpkwB/EUFcZeTkxPrKvxW1uGiiy4CvP7YvonpSjIvTDSv\nSiIiIiIxEcnIlM1nW0Guww8/vNA7XstFsQKVc+fOLVHyYSbYvl7vv/8+gCvlAH5eWK1atdzPLH/K\nlmqXJFm9rGnbti0PPfRQppuRT7Vq1fItfrB9IC3Juax7++23gcL3PIuS9u3bu61yLEqxdu1al7do\nxTXjfEdfWnbXb1tzxYmVqiiutWvXur0j7dwa91ypvKpUqeL2sItDeZm8rKSIRageffRRbrnllkw2\nKZfIDKZat24NeMmfrVq1AryEuYLYypuJEye6zYw3btwYcitTx8KStgLxkksucRXM85owYYILOdsm\nj/+LCtuwVKLBarzYpuP77befS2C2jUejZMOGDW63BPtXPFbl/PPPP6dJkyYZbk3Bzj//fJcs36dP\nnyKfv2bNGnf9sMH/tGnT8tUnKit69uwJeLts2H6ocWSLe6wunKXwRIWm+UREREQCyEpMvAv9zbKy\nCnyzO+64A8i9L5ZZtmwZs2fPBvyqrjalZ5WJ0yEnJ6fI0EhhfYyDovqYif5ZxXCbepk+fXrS6szF\nEeYxrF27Nk899RTgLdcG+PLLL4HkS+zDEoXPqR2zGTNmsGDBAsBfap+Kfd2i0MewRfG7mEqpPIa2\neMA+d8OHD3e7D8ycORPwp4lmzZrFTz/9VPIGl0IUPqeWGtKkSRNXhT+Ve/NFoY9hK04fFZkSERER\nCSAykak40Ai87PcP1MdUsMrETz/9tCsXYfttWTXxIDmOUehj2PRdVB/jQH30KDIlIiIiEoAiUyWg\nEXjZ7x+oj6lUpUoVt5WTLVc/9NBDgWC5U1HqY1j0XVQf40B99GgwVQL60JT9/oH6GAfqY9nvH6iP\ncaA+ejTNJyIiIhJAWiNTIiIiImWNIlMiIiIiAWgwJSIiIhKABlMiIiIiAWgwJSIiIhKABl
MiIiIi\nAWgwJSIiIhKABlMiIiIiAWgwJSIiIhKABlMiIiIiAWgwJSIiIhKABlMiIiIiAWgwJSIiIhJAhXS+\nWVZWVqx3Vc7Jyckq6jllvY9lvX+gPsaB+lj2+wfqYxyojx5FpkREREQCSGtkSkRya9SoEQCvvvoq\nAOXLlwdg7733zlibRESkZBSZEhEREQlAkSmRDLnnnns488wzAahevToAs2fPzmSTRKQM22+//QAY\nNWoUAKeddhoAhx56KMuXL89Yu8oCRaZEREREAohtZOrAAw8E4OSTT6Zfv34AvP/++wAsWbLEPe/u\nu+8G4J9//klzC0Vyq1WrFgDPP/88AG3atCEnx1vk8umnnwJw4YUXZqZxIlKmHXHEES4385dffgFg\n8uTJAPz8888Za1dZociUiIiISABZdmecljdLQa2JSy65BICxY8cCsMsuuxT6/GOOOQaAt956K+hb\nq54Gyftnx8DyfzZv3kzz5s0B2HXXXQE455xzmD9/PgDff/99ga//008/ATBr1iw++OCDkja/SJk6\nho0aNXKf2RNPPNHehyFDhgC4vsbxc5qV5b3dE0884fpmkePvvvsuVW+Ti76Lqe3fueeeC0Dnzp05\n/PDDATjggAPc4++++y4Ap5xyCgB//PFH4PeMyzHceeed3bmrbt26ABx55JF89dVXRf5uFPp40kkn\nAfDss88ydepUAG688UYANm3aFPj1o9DHsBWrj3EbTFmi7ueffw7AHnvsUejz169fD/gX+tdee63U\n760PTfL+3XnnnQBcd911KWvH9u3bWbZsGeBdpBP/Lc5JrCCZOoZt2rThnXfeyfs+9O7dG/D7lgrp\n7uNOO+0EwIoVK6hXrx6Am3qfMWNGqt4mF30Xg/WvZs2agH98bJC0fv16Fi9enOu5Rx99NDvvvDOA\nS1K2wXIQUTqGdevWZffdd8/1s3Xr1gHQsWNHHnzwQcD7jAO0atWKDRs2FPm6mezj/vvvD8Ann3wC\nwNtvv+1udrZv356y94nScQyLinaKiIiIhCx2Cei///47ALfccgsA48aNc3fG33zzDQB77bWXe361\natUA6NKlCxAsMhUnVvSxcuXKAPTq1Yv+/fvnes6cOXMA6Nu3b6D36t69e4GP/fbbbwD83//9X4HP\nWbFihZtSsOPVtGlTDj74YABGjBiR6zWCRKbSzYpyPv744246zHTv3p1Zs2ZlolkpZVMFq1atcpGp\nvHf5Zdm1115LpUqVAGjSpAngTWsbi+YcdNBB6W9cASwReZ999gH86PKYMWPcOdY0btyY//73v4D/\neb755psBuP3229PR3JSw88nAgQPzFcVt1KhRrusGwB133AF4UTj77lqKgh3vqNpxxx1d1HHp0qUA\n9OzZM6URqSiwmSqbeRo6dKibijU33XQT4JeDCIsiUyIiIiIBxC5nKq+PP/6Yww47DPCXl9sdSKIG\nDRoA8MUXX5T6vaI+N3zssccCXsSjV69eAFStWhWAZMd55cqVgH83/f+fV+I8Dfvb2l2rvS74UYsf\nf/yxWH2whPWlS5fmu1OcPn064C9CKI10H8Nhw4YBcMMNN/DKK68AcOmllwKFJ+IHkanPaY8ePXjm\nmWcAePTRRwE477zzUv02QOb62KFDB3d+6dChA+AVPswbdUxk0YDVq1cDxc83Citn6rjjjnORqaef\nfhrAnS8KYhEou8v/+uuvAdh3331L0wQg/cdw4MCBANx11135HsvOznafXVu0lBjhsONrn2f7fBcl\nU5/TMWPGcMUVVwDQsGFDoOwtBmnTpo07lq1atbK2FPj8Rx55pNSzMMXpY+ym+fIaPny4W5lgq1CS\niXpYtjQsjHvIIYcA0LJly3zPsSTJxx57zNXhsmTnzZs3p6Qda9asyfVvECeffDKQe6o2Ozsb8AdT\ncWBJvPaZ/Oqrr7j66quB8AZRmWZTQeBNKQAMHjy42APpqKhTp477jljFaFO1alWXjG0X2A8//JBm\nzZoV+HrlynkTAPZ7mVahQgU3sHvyySeL9TvPPvss4A+mdtxxRwCqVKnCn3/+GUIrU+fWW28FYNCg\nQe5nDz/8MODXWxo7dqz7b/vOzp07F/CS9e0x+ztE1Q477ABA79693QrEsAZRmWKLJ6ZPn+4CAXZ8\nZs6c6VInbOB7xhlnAN7gy8YBYdSd1DSfiIiISACxj0w9++yzbsm5JZdbpCbR8OHDATj99NPT17gQ\n1KhRA/CS6S644ALAT8r/8MMPAS9x0qY8//77b8BPzo+iSpUqMXHiRCD5tFDbtm0Bb0o36rp16wZA\n69atAT/s/Mwzz6QsEhhlFq2xO8CuXbty3333ZbJJxWbT5NOnT2fPPfcs8vk2Xffrr7+6u2WbGrKl\n9PXr13fPt1IfmfbWW2/RtGlToPh1hiw6bKya/9lnn+1qF0WVRQRtMc7XX3/tZjMSo6ZWSmDo0KGA\nv4hi48aNLroV9e/w9ddfD3i1/6yPZY1Fnpo0aeKu+VbyIdGqVasA/3tdv359F8mychGppMiUiIiI\nSACxj0ydc845LgE9WeK5yVswMa7+9a9/Ad4ebvfccw/gV7P966+/Mtau0ujYsSPgVV8+//zzcz22\nZcsWlzAal93Mq1WrxlFHHZX0sXXr1hWau3DllVcC5IqIpLIIarrkTQCNU66i3dUni0pZZGbw4MGu\nGrgVcAS/BIgdx8SIlJXysCrjmVaa6Iot3Pnss88Av8yDJTdHmeU5WXmcAw880JU9uOyyywAvF278\n+PGAXzHcIv4jRoxgypQpaW1zaXXu3BmARYsW8dFHH2W4NeGw2RagRKVl/vzzT3799dcwmgQoMiUi\nIiISSOwiU40bNwbghRdeALx57goViu7Giy++GGq7wmDFSAcPHuzuaq+66irAy3uw1SZRn8fPy5ax\n2nx3+fLl8z0nJyfH5Xlt27YtfY0LYNu2bW5PQlvBZcviFy5cmO/5troPYMCAAQC5iglee+21gB/l\nKKurADPN7ubbtGmT7zH7DNr3b9GiRYW+VmJEytjdc5h3xWHbsmULAFu3bs1wS0rOci0tonjggQe6\n8gfHHXcc4JVLyFuK5bbbbgNwMwBR1q5dO8D/DCfLGwZvayDwV79ZpDFOLC8zKyvLbfljq0sbNGjg\nZjnsXGz7vfbq1SvUc2jsBlOWQGb1TYozkAL/wmUXrTiwZciDBw929WBsABK3AVQiWzafbBBlKlWq\n5Cq02ybAL730EuANpC3BPko6dOjgpvlsEGUX48QLqS29Puqoo+jatWuu19i4cSPgLWe2qvA2TXHW\nWWe5+j6SOjZotZsX8Etb2AW1sEHUbrvt5qaQ2rdvn+uxxYsX8/LLL6e0vZlgS+7tomWKsz9dptkU\nbWIJB1so8NxzzwHehdmmqO+//37AW2YfF7bHp+1Z++WXX7rHbHAxbtw4dtttN8D/m1gqweTJk9PV\n1MBsijknJ4drrrkG8L/DNoAC73wJ6StnoWk+E
RERkQBiF5my6T1LFh09enS+u6Vk6tSpE2q7wnDD\nDTcA3gg81YU2M+n5558H/Chjy5Yt3dLyZFq0aJHr31tuuYW7774b8PcUW7t2bWjtLYpVbU+sBv3D\nDz8AXtVd8KpfW4V4Kx7YrVs3F7GyiOO4ceMALyF23rx57r/jwkLw6dxZIahp06YBfjHAP/74g7PP\nPhvwpwgKc+mll7pK98amT3r27Fms14g628PPoqXGKqknqlmzplsUZGVNrLp4YtJ+uhUV1bUI4tix\nYwH49ttvQ29TqliZHPvcZmdnu8Ufto/tJZdc4lJDrJSAlfBYs2ZN0mMZRbbYY9ddd3XXhMTzjpX7\nSHcpEkWmRERERAKIXWTKWJHHVatWUa1atVyPVahQgUmTJgHedgdxZdtztGjRwvXHloW+/vrrGWtX\nUJaPYkuQ99prLxcVsGKA3bt3d3dbefc9K1eunJsrtznyTp06ZWxHdEv+TNzzy7a+sT3NatWq5e54\n7a5ww4YNLhfOchdsqfnUqVNdPsqbb74JFH1nHQVxikgZy5uxf4vrlFNOAeDmm292P7MEbStkGeeo\nlOVJ1a9fnyOOOCLpc6ZOneqKBduWOtWrV3flJewzbAUx85ZASQfLzbR8xmT7KM6ZM8cdzzix/CHL\nHU5cIGAtR4t3AAAJHklEQVTHwyJOiblDTz31FOCfu2644YbYRKasz23atHELPqw/4M98pDsyFfuN\njgt4H1ex1k50tm9cp06dSn1RCnNDx9atW7NkyRLA3zeoevXqgLdBp9WXslpSrVu3DqX+Ulibq5bG\nOeecA/iLBmwVYDJDhgxxU36FCeMYDh48GPDq0Zi8CyMWLVrkqqKbTp06sWDBAsBfhZNYD82mMkta\nbypTG4/uueee+b5bHTt2dH1MpShsOm6rTBPPoVa3yKYOgwjru1i5cmX22GMPwL/g2ufPVrmBn2xu\nF69ktm3blq9+2kMPPeQWj9g0ttXaSpSuY2hTjN27dy/wOXPmzMm3GCQVwu5jp06dAP/m2qryL1++\n3KUf2HSfTY8lsucvXbq00AVBhcnkd9FqS1pF85ycHNenlStXpux9itNHTfOJiIiIBBDbab7CVKpU\nKVfoHfw6KVGpWWQJ8bNnzwa8qS4r3/Doo48CfgXeSZMmucjULrvsAvhRq7LsscceA/wQ7htvvAHk\nX34O/jRCJtg0c1ZWVr6KvFYGYZ999nHTC7aMd8GCBS4p/fHHH3evYc+xyFScWUS4LBk5ciSQv5YY\nEEoULijbk86i9aeccoqr15eMlRCwKbqtW7fmi7TOmDED8Kb5olhpu27duvTt2xeAHj16AH4E8aOP\nPnKRDHuOReriLrGOUnHKVhS2K0McWD2tZN/FdFNkSkRERCSAMhmZGj58eL6fWSG2qIzE7W7OEuQH\nDx7sIlJ52X5f4Ednoli0MiyWVGmJrskiU6mcHy+tnJycAhOwt2/f7h479NBDAa+gp+WlWJE9S5L9\n448/wm6ulEKlSpVo2rQp4N8F5+TkuO+o7VQfJVZ80qp9Z2dnu5wm+9xZRDU7O9vlN9m5cvny5S6C\nanv02QKQqO4H2qlTJ7f4w1gR5EmTJnHqqacCfmQq3cnKqZJYDbw0OnToAMSj+GoytiDLvovz5893\nOcfppsiUiIiISACRjEzVqFED8AuKPfHEE65oZWEsD6lfv375HrPlklFhpR3sbmnixInuZ8buchs2\nbOhWSVkhz8StEeKgTp06XHzxxQBuFaKVBSiKrTKxQoCJLGpl+25lgt3VDxo0iG7dugH+6ijLmbKV\nNQDnnXce4N1N2mony2cpa/vv2fL6uLOtZnr37u0iPOaJJ55w+X2ZzNkoiO09aFGo7t27u/3qkrH8\nqNGjRwNQr149VxTXtoKKakTK9p5LPJfaKj2L6teuXTtfTm2y1YZxYNHukq7Kr1ixIuAVnAW/uHCc\nNG7cmAsvvBDw9xqcMmVKxo5lJAdT9kWwuh+NGjVyFaXtYrN69WrAqzNkIWirip5YW8oqStvvR8Wo\nUaMAPzG+adOmHHvssbmeY/sozZkzxy2Pt37HRe3atQGv1oklC1q/imI1p2xKIXHZtrG9qBJLCqSb\nHcNNmza5i67t5VbYSS6xztQrr7wScisz48QTT4zFRrEFsUGw1Q07/fTT3WO2YGTSpEmRHEQZ+wyu\nX78eKDxFYMcdd3SlBKwOXHZ2ttvnLIrJ5olsoFu1alW3GMAW+dgA4uSTT3a7Ctj0mF2M48amJ3/8\n8UfA36NvypQpSZ9vfwN73Crb9+nTJ8xmppQdu7lz51KvXj3AL0+Trn34ktE0n4iIiEgAkYxM2Z2s\n7XXWtm1b5s+fD/jhWBuRH3XUUbmmUMC7E7OpJNuXKKp72llV7LLKlvdbVAr842r7dFkSIfjLuK+/\n/noXkcp7fLOyslzC5MCBA0NqefFZYnyvXr1cm226IdHDDz8MeAXyAJYsWRLJpfSl9fPPP7s96Qor\n9BgnduebGJGycg95p+WjyhZn2JTztGnTXCqFlQiwxPJBgwa5/ffee+89APr371/otGCUJC4KsIic\nRWMs6XzChAmsW7cO8Es8FBTJiTqLSFm5DpuJAb+0zH777Qd4aRJDhw4F/OuhTQFbukEcWHHmevXq\nufSfxH5niiJTIiIiIgFEejsZG22uXr2ae++9t9i/9/vvv7s7r1SKwhYWYUv1FhaWdH7ffffle8y2\nz0ksA2Dz4bb8PJm//vqL0047DfD3rSsuHUNPWH18//33AX/PxNmzZ8dymw4ramkFVm0J/cqVKznh\nhBOA8PdKTPV3cdiwYYC3PZEVOczrxRdfdGVkwt6rLYxjaOeZiy66yOXPWO6llR0BP0r10ksvleTl\nSyzd38XLL78cgDFjxuRb/LFhwwYXTbXyQakoI5CuPlpOsS342b59u8sRy1ssOdWK1ccoD6bMDjvs\nkG86xy62vXr1cj+zi/IxxxwTSqKkLsQl758lOI4cOdIlsZaUrdizKcPnnnvOTUGUlI6hJ6w+WqK2\nrbKZP39+0oUDQYXdR5siOfPMM3P9fMCAAWmbEorSPplhCOMYXnXVVUDuaR9LMrcdJSZPnswdd9wB\n5E4xCIPON54gfbRriKVTWG2+3r1788ILL5T2ZUtEe/OJiIiIhCySCeh5ZWdnM2bMmKSPnX322Wlu\njZSELRjo27cvL774IuCXOLDE2MRpIFs4ADBv3rxcP4tLEuz/shEjRgD+bu7FrSUWJQcddFCu8irg\nJW2D/5mUaLJFHpUqVXL7mX7wwQcA7vxz1113ZaZxUmKVK1d2U+2WAvLcc88BpC0qVVyKTImIiIgE\nEIucqajQ/HfZ7x+oj3EQZh9Hjx7t7oYtyfzEE08E/HIe6aDvovoYB2H2sX///kyaNAmAxYsXA34i\nenZ2dmleslSUMyUiIiISMkWmSkB3GWW/f6A+xkGYfezUqRNz584FoEePHkD4S6+T0XdRfYyDMPrY\nqlUr
wMuPeuCBBwB/pfB3331X4jYGVWZKI0SFvhhlv3+gPsaB+lj2+wfqYxyojx5N84mIiIgEkNbI\nlIiIiEhZo8iUiIiISAAaTImIiIgEoMGUiIiISAAaTImIiIgEoMGUiIiISAAaTImIiIgEoMGUiIiI\nSAAaTImIiIgEoMGUiIiISAAaTImIiIgEoMGUiIiISAAaTImIiIgEoMGUiIiISAAaTImIiIgEoMGU\niIiISAAaTImIiIgEoMGUiIiISAAaTImIiIgEoMGUiIiISAAaTImIiIgEoMGUiIiISAAaTImIiIgE\noMGUiIiISAD/D2VmfeQeqcmwAAAAAElFTkSuQmCC\n",
661 | "text/plain": [
662 | ""
663 | ]
664 | },
665 | "metadata": {},
666 | "output_type": "display_data"
667 | }
668 | ],
669 | "source": [
670 | "# To check MNIST data\n",
671 | "print(\"x_train.shape = \"+str(x_train.shape))\n",
672 | "print(\"y_train.shape = \"+str(y_train.shape))\n",
673 | "fig = plt.figure(figsize=(10, 2))\n",
674 | "for i in range(20):\n",
675 | " ax1 = fig.add_subplot(2, 10, i+1)\n",
676 | " ax1.imshow(x_train[i], cmap='gray');\n",
677 | " ax1.axis('off')"
678 | ]
679 | },
680 | {
681 | "cell_type": "markdown",
682 | "metadata": {},
683 | "source": [
684 | "(In supervised learning) Every (good) dataset consists of a training set and a test set.\n",
685 | "\n",
686 | "The training data set consists of data points and their desired outputs.\n",
687 | "\n",
688 | "In this case, the data points are grayscale images of hand-written numbers, and their desired outputs are the numbers that have been drawn.\n",
689 | "\n",
690 | "The test data set consists of data points whose outputs need to be found."
691 | ]
692 | },
693 | {
694 | "cell_type": "markdown",
695 | "metadata": {},
696 | "source": [
697 | "Let us implement the following neural network to classify MNIST data:\n",
698 | ""
699 | ]
700 | },
701 | {
702 | "cell_type": "markdown",
703 | "metadata": {},
704 | "source": [
705 | "## Initialize network\n",
706 | "\n",
707 | "MNIST dataset has images of size 28x28. So the input layer to our network must have $28*28=784$ neurons.\n",
708 | "\n",
709 | "Since we are tring to classify whether the image is that of 0 or 1 or 2 ... or 9, we need to have 10 output neurons, each catering to the probability of one number among 0-9.\n",
710 | "\n",
711 | "Let our hidden layer (as shown in the diagram) have 15 neurons."
712 | ]
713 | },
714 | {
715 | "cell_type": "markdown",
716 | "metadata": {},
717 | "source": [
718 | "Before initializing the network though, let's ensure our inputs and outputs are appropriate for the task at hand."
719 | ]
720 | },
721 | {
722 | "cell_type": "markdown",
723 | "metadata": {},
724 | "source": [
725 | "## Are our inputs in the right format and shape?\n",
726 | "\n",
727 | "Remember that we give inputs as np.arrays of $n{\\times}784$ dimensions, $n$ being the number of data points we want to input to the network."
728 | ]
729 | },
730 | {
731 | "cell_type": "markdown",
732 | "metadata": {},
733 | "source": [
734 | "Is ``x_train`` an np.array?"
735 | ]
736 | },
737 | {
738 | "cell_type": "code",
739 | "execution_count": 15,
740 | "metadata": {},
741 | "outputs": [
742 | {
743 | "data": {
744 | "text/plain": [
745 | "numpy.ndarray"
746 | ]
747 | },
748 | "execution_count": 15,
749 | "metadata": {},
750 | "output_type": "execute_result"
751 | }
752 | ],
753 | "source": [
754 | "# Check type of x_train\n",
755 | "type(x_train)"
756 | ]
757 | },
758 | {
759 | "cell_type": "markdown",
760 | "metadata": {},
761 | "source": [
762 | "Yup, ``x_train`` is an np.array"
763 | ]
764 | },
765 | {
766 | "cell_type": "markdown",
767 | "metadata": {},
768 | "source": [
769 | "Is ``x_train`` in the shape required by the network?"
770 | ]
771 | },
772 | {
773 | "cell_type": "code",
774 | "execution_count": 16,
775 | "metadata": {},
776 | "outputs": [
777 | {
778 | "data": {
779 | "text/plain": [
780 | "(60000, 28, 28)"
781 | ]
782 | },
783 | "execution_count": 16,
784 | "metadata": {},
785 | "output_type": "execute_result"
786 | }
787 | ],
788 | "source": [
789 | "# Check shape of x_train\n",
790 | "x_train.shape"
791 | ]
792 | },
793 | {
794 | "cell_type": "markdown",
795 | "metadata": {},
796 | "source": [
797 | "Clearly not.\n",
798 | "\n",
799 | "We need to reshape this matrix to $60000{\\times}784$."
800 | ]
801 | },
802 | {
803 | "cell_type": "code",
804 | "execution_count": 17,
805 | "metadata": {},
806 | "outputs": [
807 | {
808 | "data": {
809 | "text/plain": [
810 | "(60000, 784)"
811 | ]
812 | },
813 | "execution_count": 17,
814 | "metadata": {},
815 | "output_type": "execute_result"
816 | }
817 | ],
818 | "source": [
819 | "# Reshaping x_train and x_test for our network with 784 inputs neurons\n",
820 | "x_train = np.reshape(x_train, (len(x_train), 784))\n",
821 | "x_test = np.reshape(x_test, (len(x_test), 784))\n",
822 | "\n",
823 | "# Check the dimensions\n",
824 | "x_train.shape"
825 | ]
826 | },
827 | {
828 | "cell_type": "markdown",
829 | "metadata": {},
830 | "source": [
831 | "Now our input is in the right format and shape."
832 | ]
833 | },
834 | {
835 | "cell_type": "markdown",
836 | "metadata": {},
837 | "source": [
838 | "## Are our inputs normalized?\n",
839 | "\n",
840 | "Remember that we had decided to limit the range of values for the input to 0-1."
841 | ]
842 | },
843 | {
844 | "cell_type": "markdown",
845 | "metadata": {},
846 | "source": [
847 | "Are all the values of ``x_train`` between 0 and 1?"
848 | ]
849 | },
850 | {
851 | "cell_type": "code",
852 | "execution_count": 18,
853 | "metadata": {},
854 | "outputs": [
855 | {
856 | "name": "stdout",
857 | "output_type": "stream",
858 | "text": [
859 | "Values in x_train lie between 0 and 255\n"
860 | ]
861 | }
862 | ],
863 | "source": [
864 | "# Check range of values of x_train\n",
865 | "print(\"Values in x_train lie between \"+str(np.min(x_train))+\" and \"+str(np.max(np.max(x_train))))"
866 | ]
867 | },
868 | {
869 | "cell_type": "markdown",
870 | "metadata": {},
871 | "source": [
872 | "Our inputs are images, their values range from 0 to 255. We need to bring them down to 0-1."
873 | ]
874 | },
875 | {
876 | "cell_type": "code",
877 | "execution_count": 19,
878 | "metadata": {
879 | "collapsed": true
880 | },
881 | "outputs": [],
882 | "source": [
883 | "# Normalize x_train\n",
884 | "x_train = x_train / 255.0\n",
885 | "x_test = x_test / 255.0"
886 | ]
887 | },
888 | {
889 | "cell_type": "code",
890 | "execution_count": 20,
891 | "metadata": {},
892 | "outputs": [
893 | {
894 | "name": "stdout",
895 | "output_type": "stream",
896 | "text": [
897 | "Values in x_train lie between 0.0 and 1.0\n"
898 | ]
899 | }
900 | ],
901 | "source": [
902 | "# Check range of values of x_train\n",
903 | "print(\"Values in x_train lie between \"+str(np.min(x_train))+\" and \"+str(np.max(np.max(x_train))))"
904 | ]
905 | },
906 | {
907 | "cell_type": "markdown",
908 | "metadata": {},
909 | "source": [
910 | "Perfect."
911 | ]
912 | },
913 | {
914 | "cell_type": "markdown",
915 | "metadata": {},
916 | "source": [
917 | "## Are our outputs in the right format and shape?"
918 | ]
919 | },
920 | {
921 | "cell_type": "markdown",
922 | "metadata": {},
923 | "source": [
924 | "Is ``y_train`` an np.array?"
925 | ]
926 | },
927 | {
928 | "cell_type": "code",
929 | "execution_count": 21,
930 | "metadata": {},
931 | "outputs": [
932 | {
933 | "data": {
934 | "text/plain": [
935 | "numpy.ndarray"
936 | ]
937 | },
938 | "execution_count": 21,
939 | "metadata": {},
940 | "output_type": "execute_result"
941 | }
942 | ],
943 | "source": [
944 | "# Check type of y_train\n",
945 | "type(y_train)"
946 | ]
947 | },
948 | {
949 | "cell_type": "markdown",
950 | "metadata": {},
951 | "source": [
952 | "Yup, ``y_train`` is an np.array"
953 | ]
954 | },
955 | {
956 | "cell_type": "markdown",
957 | "metadata": {},
958 | "source": [
959 | "Remember that we have 10 neurons in the output layer. That means our output needs to be of ${n{\\times}10}$ dimensions."
960 | ]
961 | },
962 | {
963 | "cell_type": "markdown",
964 | "metadata": {},
965 | "source": [
966 | "Is the shape of ``y_train`` $n{\\times}10$?"
967 | ]
968 | },
969 | {
970 | "cell_type": "code",
971 | "execution_count": 22,
972 | "metadata": {},
973 | "outputs": [
974 | {
975 | "data": {
976 | "text/plain": [
977 | "(60000,)"
978 | ]
979 | },
980 | "execution_count": 22,
981 | "metadata": {},
982 | "output_type": "execute_result"
983 | }
984 | ],
985 | "source": [
986 | "# Check shape of y_train\n",
987 | "y_train.shape"
988 | ]
989 | },
990 | {
991 | "cell_type": "markdown",
992 | "metadata": {},
993 | "source": [
994 | "Nope, ``y_train`` is of shape $60000{\\times}1$"
995 | ]
996 | },
997 | {
998 | "cell_type": "markdown",
999 | "metadata": {},
1000 | "source": [
1001 | "What are its values like?"
1002 | ]
1003 | },
1004 | {
1005 | "cell_type": "code",
1006 | "execution_count": 23,
1007 | "metadata": {},
1008 | "outputs": [
1009 | {
1010 | "name": "stdout",
1011 | "output_type": "stream",
1012 | "text": [
1013 | "5\n",
1014 | "0\n",
1015 | "4\n",
1016 | "1\n",
1017 | "9\n"
1018 | ]
1019 | }
1020 | ],
1021 | "source": [
1022 | "for i in range(5):\n",
1023 | " print(y_train[i])"
1024 | ]
1025 | },
1026 | {
1027 | "cell_type": "markdown",
1028 | "metadata": {},
1029 | "source": [
1030 | "So ``y_train`` carries the numbers of the digits the images represent."
1031 | ]
1032 | },
1033 | {
1034 | "cell_type": "markdown",
1035 | "metadata": {},
1036 | "source": [
1037 | "We need to make a new binary array of $60000{\\times}10$ and insert a 1 in the column corresponding to the number of the digit its image shows.\n",
1038 | "\n",
1039 | "For example, the first row of our new y_train should look like $\\left[\\begin{array}{c}0&0&0&0&0&1&0&0&0&0\\end{array}\\right]$, since it represents 5. This is called one-hot encoding."
1040 | ]
1041 | },
1042 | {
1043 | "cell_type": "code",
1044 | "execution_count": 24,
1045 | "metadata": {
1046 | "collapsed": true
1047 | },
1048 | "outputs": [],
1049 | "source": [
1050 | "# Make new y_train of nx10 elements\n",
1051 | "new_y_train = np.zeros((len(y_train), 10))\n",
1052 | "for i in range(len(y_train)):\n",
1053 | " new_y_train[i, y_train[i]] = 1"
1054 | ]
1055 | },
1056 | {
1057 | "cell_type": "code",
1058 | "execution_count": 25,
1059 | "metadata": {
1060 | "collapsed": true
1061 | },
1062 | "outputs": [],
1063 | "source": [
1064 | "# Make new y_test of nx10 elements\n",
1065 | "new_y_test = np.zeros((len(y_test), 10))\n",
1066 | "for i in range(len(y_test)):\n",
1067 | " new_y_test[i, y_test[i]] = 1"
1068 | ]
1069 | },
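  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a side note, the same one-hot encoding can be written as a single vectorized NumPy expression. This is just an equivalent alternative to the loops above, shown for reference:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Equivalent vectorized one-hot encoding (for reference only)\n",
    "# np.eye(10) is the 10x10 identity matrix; indexing it with the integer label array\n",
    "# picks out the matching one-hot row for every label at once.\n",
    "alt_y_train = np.eye(10)[y_train]\n",
    "print(np.array_equal(alt_y_train, new_y_train))  # should print True"
   ]
  },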
1070 | {
1071 | "cell_type": "code",
1072 | "execution_count": 26,
1073 | "metadata": {},
1074 | "outputs": [
1075 | {
1076 | "name": "stdout",
1077 | "output_type": "stream",
1078 | "text": [
1079 | "[ 0. 0. 0. 0. 0. 1. 0. 0. 0. 0.]\n",
1080 | "[ 0. 0. 0. 0. 0. 0. 0. 1. 0. 0.]\n"
1081 | ]
1082 | }
1083 | ],
1084 | "source": [
1085 | "# Check first row of y_train\n",
1086 | "print(new_y_train[0])\n",
1087 | "print(new_y_test[0])"
1088 | ]
1089 | },
1090 | {
1091 | "cell_type": "markdown",
1092 | "metadata": {},
1093 | "source": [
1094 | "Now that new_y_train is correctly shaped and formatted, let us reassign the name y_train to the matrix new_y_train."
1095 | ]
1096 | },
1097 | {
1098 | "cell_type": "code",
1099 | "execution_count": 27,
1100 | "metadata": {
1101 | "collapsed": true
1102 | },
1103 | "outputs": [],
1104 | "source": [
1105 | "# Reassign the name \"y_train\" to new_y_train\n",
1106 | "y_train = new_y_train\n",
1107 | "y_test = new_y_test"
1108 | ]
1109 | },
1110 | {
1111 | "cell_type": "markdown",
1112 | "metadata": {},
1113 | "source": [
1114 | "## Initialize the network"
1115 | ]
1116 | },
1117 | {
1118 | "cell_type": "code",
1119 | "execution_count": 28,
1120 | "metadata": {
1121 | "collapsed": true
1122 | },
1123 | "outputs": [],
1124 | "source": [
1125 | "# Initialize network\n",
1126 | "layers = [784, 15, 10]\n",
1127 | "weights = initializeWeights(layers)\n",
1128 | "\n",
1129 | "# Take backup of weights to be used later for comparison\n",
1130 | "initialWeights = [np.array(w) for w in weights]"
1131 | ]
1132 | },
1133 | {
1134 | "cell_type": "code",
1135 | "execution_count": 29,
1136 | "metadata": {},
1137 | "outputs": [
1138 | {
1139 | "data": {
1140 | "text/plain": [
1141 | "'\\nprint(\"weights:\")\\nfor i in range(len(weights)):\\n print(i+1); print(weights[i].shape); print(weights[i])\\n'"
1142 | ]
1143 | },
1144 | "execution_count": 29,
1145 | "metadata": {},
1146 | "output_type": "execute_result"
1147 | }
1148 | ],
1149 | "source": [
1150 | "# Please don't print the weights\n",
1151 | "# There are 15*784=11760 weights in the first layer,\n",
1152 | "# + 10*15=150 weights in the second layer\n",
1153 | "'''\n",
1154 | "print(\"weights:\")\n",
1155 | "for i in range(len(weights)):\n",
1156 | " print(i+1); print(weights[i].shape); print(weights[i])\n",
1157 | "'''\n"
1158 | ]
1159 | },
1160 | {
1161 | "cell_type": "markdown",
1162 | "metadata": {},
1163 | "source": [
1164 | "## Train the network\n",
1165 | "\n",
1166 | "Use the proper inputs ``x_train`` and ``y_train`` to train your neural network."
1167 | ]
1168 | },
1169 | {
1170 | "cell_type": "markdown",
1171 | "metadata": {},
1172 | "source": [
1173 | "How many iterations do you want to perform? How much should be the learning rate? Should it be adaptive? How many neurons per layer?"
1174 | ]
1175 | },
1176 | {
1177 | "cell_type": "markdown",
1178 | "metadata": {},
1179 | "source": [
1180 | "Remember that there are 60,000 images in the training set."
1181 | ]
1182 | },
1183 | {
1184 | "cell_type": "code",
1185 | "execution_count": 30,
1186 | "metadata": {},
1187 | "outputs": [
1188 | {
1189 | "name": "stdout",
1190 | "output_type": "stream",
1191 | "text": [
1192 | "Iteration 1 of 1\n",
1193 | "Cost: 1.97857726345\n",
1194 | "Time: 3.7738959789276123 seconds\n"
1195 | ]
1196 | }
1197 | ],
1198 | "source": [
1199 | "# Train the network using Gradient Descent\n",
1200 | "# Let's check how much time it takes for 1 iteration\n",
1201 | "\n",
1202 | "# Set options\n",
1203 | "nIterations = 1\n",
1204 | "learningRate = 1.0\n",
1205 | "\n",
1206 | "# Start time\n",
1207 | "start = time.time()\n",
1208 | "\n",
1209 | "# Train\n",
1210 | "trainUsingGD(weights, x_train, y_train, nIterations, learningRate)\n",
1211 | "\n",
1212 | "# End time\n",
1213 | "end = time.time()\n",
1214 | "\n",
1215 | "print(\"Time: \"+str(end - start)+\" seconds\")"
1216 | ]
1217 | },
1218 | {
1219 | "cell_type": "markdown",
1220 | "metadata": {
1221 | "collapsed": true
1222 | },
1223 | "source": [
1224 | "See how it takes SO LONG for just one iteration?"
1225 | ]
1226 | },
1227 | {
1228 | "cell_type": "markdown",
1229 | "metadata": {},
1230 | "source": [
1231 | "**Problem: Batch Gradient Descent computes error, delta, etc. over the entire input data set**\n",
1232 | "\n",
1233 | "Solution: Don't change weights over the entire data set, repeatedly use a randomly sampled subset of the data set.\n",
1234 | "\n",
1235 | "This is called the Monte Carlo method, and in this case it has been developed into Stochastic Gradient Descent."
1236 | ]
1237 | },
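  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a minimal sketch of that idea (not run here), a single random-subset update could look like the following, reusing the ``backProp``, ``weights``, ``x_train``, ``y_train`` and ``learningRate`` defined earlier in this notebook:\n",
    "\n",
    "```python\n",
    "# One weight update computed on a random subset of 100 training images\n",
    "idx = np.random.choice(len(y_train), 100, replace=False)\n",
    "backProp(weights, x_train[idx], y_train[idx], learningRate)\n",
    "```"
   ]
  },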
1238 | {
1239 | "cell_type": "markdown",
1240 | "metadata": {},
1241 | "source": [
1242 | "# Mini-batch Gradient Descent"
1243 | ]
1244 | },
1245 | {
1246 | "cell_type": "markdown",
1247 | "metadata": {
1248 | "collapsed": true
1249 | },
1250 | "source": [
1251 | "We shall define a $minibatchSize$ lesser than the number of data points input to the network ($n$). Say $minibatchSize = 100$.\n",
1252 | "\n",
1253 | "**Mini-batch GD**:\n",
1254 | "\n",
1255 | "For every epoch:\n",
1256 | "- randomly group the input data set into mini-batches of ($minibatchSize=$) 100 images:\n",
1257 | " - randomly shuffle the entire data set\n",
1258 | " - consider every 100 images as one mini-batch - so there are ``int(n/minibatchSize)`` number of mini-batches\n",
1259 | "- use gradient descent on every mini-batch to update weights\n",
1260 | "- Repeat.\n",
1261 | "\n",
1262 | "If $minibatchSize=n$, this is the same as Batch Gradient Descent.\n",
1263 | "\n",
1264 | "If $minibatchSize=1$, i.e. we update the weights after backpropagating for only one image, it is called **Stochastic Grdient Descent**."
1265 | ]
1266 | },
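  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For example, with $n = 60000$ training images and $minibatchSize = 100$, every epoch shuffles the data and splits it into ``int(60000/100) = 600`` mini-batches, so the weights get updated 600 times per epoch instead of once per epoch as in Batch Gradient Descent."
   ]
  },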
1267 | {
1268 | "cell_type": "markdown",
1269 | "metadata": {},
1270 | "source": [
1271 | "So, at every iteration we are using gradient descent on only $minibatchSize$ number of images.\n",
1272 | "\n",
1273 | "Mathematical proofs exist on why this works better than gradient descent, under some assumptions (like stationarity, which holds true for our purposes)."
1274 | ]
1275 | },
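  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A sketch of the intuition (not a proof): writing $\nabla J_{k}$ for the gradient of the cost on the $k^{th}$ image, the full-data gradient is the average of the $n$ per-image gradients, and the gradient averaged over a randomly sampled mini-batch of $m$ images is an unbiased estimate of it:\n",
    "\n",
    "$$E\left[ \frac{1}{m} \sum_{k \in minibatch} \nabla J_{k} \right] = \frac{1}{n} \sum_{k=1}^{n} \nabla J_{k}$$\n",
    "\n",
    "So each mini-batch update moves the weights in the right direction on average, while costing only $m$ forward and backward passes instead of $n$."
   ]
  },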
1276 | {
1277 | "cell_type": "markdown",
1278 | "metadata": {},
1279 | "source": [
1280 | "Let's code Mini-batch Gradient Descent:"
1281 | ]
1282 | },
1283 | {
1284 | "cell_type": "code",
1285 | "execution_count": 43,
1286 | "metadata": {
1287 | "collapsed": true
1288 | },
1289 | "outputs": [],
1290 | "source": [
1291 | "# TRAINING USING MINI-BATCH GRADIENT DESCENT\n",
1292 | "# Default learning rate = 1.0\n",
1293 | "def trainUsingMinibatchGD(weights, X, Y, minibatchSize, nEpochs, learningRate=1.0):\n",
1294 | " # For nIterations number of iterations:\n",
1295 | " for i in range(nEpochs):\n",
1296 | " # clear output\n",
1297 | " #clear_output()\n",
1298 | " print(\"Epoch \"+str(i+1)+\" of \"+str(nEpochs))\n",
1299 | " \n",
1300 | " # Make a list of all the indices\n",
1301 | " fullIdx = list(range(len(Y)))\n",
1302 | " \n",
1303 | " # Shuffle the full index\n",
1304 | " np.random.shuffle(fullIdx)\n",
1305 | " \n",
1306 | " # Count number of mini-batches\n",
1307 | " nOfMinibatches = int(len(X)/minibatchSize)\n",
1308 | " \n",
1309 | " # For each mini-batch\n",
1310 | " for m in range(nOfMinibatches):\n",
1311 | " # Compute the starting index of this mini-batch\n",
1312 | " startIdx = m*minibatchSize\n",
1313 | " \n",
1314 | " # Declare sampled inputs and outputs\n",
1315 | " xSample = X[fullIdx[startIdx:startIdx+minibatchSize]]\n",
1316 | " ySample = Y[fullIdx[startIdx:startIdx+minibatchSize]]\n",
1317 | "\n",
1318 | " # Run backprop\n",
1319 | " backProp(weights, xSample, ySample, learningRate)"
1320 | ]
1321 | },
1322 | {
1323 | "cell_type": "markdown",
1324 | "metadata": {},
1325 | "source": [
1326 | "Using MinibatchGD, training upto the same accuracy should take lesser time than GD."
1327 | ]
1328 | },
1329 | {
1330 | "cell_type": "code",
1331 | "execution_count": 44,
1332 | "metadata": {
1333 | "collapsed": true
1334 | },
1335 | "outputs": [],
1336 | "source": [
1337 | "# Initialize network\n",
1338 | "layers = [784, 30, 10]\n",
1339 | "weights = initializeWeights(layers)\n",
1340 | "\n",
1341 | "# Take backup of weights to be used later for comparison\n",
1342 | "initialWeights = [np.array(w) for w in weights]"
1343 | ]
1344 | },
1345 | {
1346 | "cell_type": "code",
1347 | "execution_count": 45,
1348 | "metadata": {},
1349 | "outputs": [
1350 | {
1351 | "name": "stdout",
1352 | "output_type": "stream",
1353 | "text": [
1354 | "5570 out of 60000 : 0.09283333333333334\n"
1355 | ]
1356 | }
1357 | ],
1358 | "source": [
1359 | "# Evaluate initial weights on training data\n",
1360 | "evaluate(weights, x_train, y_train)"
1361 | ]
1362 | },
1363 | {
1364 | "cell_type": "code",
1365 | "execution_count": 46,
1366 | "metadata": {},
1367 | "outputs": [
1368 | {
1369 | "name": "stdout",
1370 | "output_type": "stream",
1371 | "text": [
1372 | "948 out of 10000 : 0.0948\n"
1373 | ]
1374 | }
1375 | ],
1376 | "source": [
1377 | "# Evaluate initial weights on test data\n",
1378 | "evaluate(weights, x_test, y_test)"
1379 | ]
1380 | },
1381 | {
1382 | "cell_type": "markdown",
1383 | "metadata": {},
1384 | "source": [
1385 | "- Let's first use Batch Gradient Descent ($minibatchSize = size\\;of \\;full\\;input$) to evaluate the accuracy and time with one iteration "
1386 | ]
1387 | },
1388 | {
1389 | "cell_type": "code",
1390 | "execution_count": 47,
1391 | "metadata": {},
1392 | "outputs": [
1393 | {
1394 | "name": "stdout",
1395 | "output_type": "stream",
1396 | "text": [
1397 | "Epoch 1 of 1\n",
1398 | "Training accuracy:\n",
1399 | "5889 out of 60000 : 0.09815\n",
1400 | "Test accuracy:\n",
1401 | "1012 out of 10000 : 0.1012\n",
1402 | "Time: 2.8622570037841797 seconds\n"
1403 | ]
1404 | }
1405 | ],
1406 | "source": [
1407 | "# Train the network ONCE using Batch Gradient Descent to check accuracy and time\n",
1408 | "\n",
1409 | "# Re-initialize weights\n",
1410 | "weights = [np.array(w) for w in initialWeights]\n",
1411 | "\n",
1412 | "# Set options for batch gradient descent\n",
1413 | "minibatchSize = len(y_train)\n",
1414 | "nEpochs = 1\n",
1415 | "learningRate = 3.0\n",
1416 | "\n",
1417 | "# Start time\n",
1418 | "start = time.time()\n",
1419 | "\n",
1420 | "# Train\n",
1421 | "trainUsingMinibatchGD(weights, x_train, y_train, minibatchSize, nEpochs, learningRate)\n",
1422 | "\n",
1423 | "# End time\n",
1424 | "end = time.time()\n",
1425 | "\n",
1426 | "# Evaluate accuracy\n",
1427 | "print(\"Training accuracy:\")\n",
1428 | "evaluate(weights, x_train, y_train)\n",
1429 | "print(\"Test accuracy:\")\n",
1430 | "evaluate(weights, x_test, y_test)\n",
1431 | "\n",
1432 | "# Print time taken\n",
1433 | "print(\"Time: \"+str(end-start)+\" seconds\")"
1434 | ]
1435 | },
1436 | {
1437 | "cell_type": "markdown",
1438 | "metadata": {},
1439 | "source": [
1440 | "- Okay, let's check with Stochastic Gradient Descent, i.e. $minibatchSize = 1$"
1441 | ]
1442 | },
1443 | {
1444 | "cell_type": "code",
1445 | "execution_count": 48,
1446 | "metadata": {},
1447 | "outputs": [
1448 | {
1449 | "name": "stdout",
1450 | "output_type": "stream",
1451 | "text": [
1452 | "Epoch 1 of 1\n",
1453 | "Training accuracy:\n",
1454 | "44816 out of 60000 : 0.7469333333333333\n",
1455 | "Test accuracy:\n",
1456 | "7539 out of 10000 : 0.7539\n",
1457 | "Time: 21.746292114257812 seconds\n"
1458 | ]
1459 | }
1460 | ],
1461 | "source": [
1462 | "# Train the network ONCE using Stochastic Gradient Descent to check accuracy and time\n",
1463 | "\n",
1464 | "# Re-initialize weights\n",
1465 | "weights = [np.array(w) for w in initialWeights]\n",
1466 | "\n",
1467 | "# Set options of stochastic gradient descent\n",
1468 | "minibatchSize = 1\n",
1469 | "nEpochs = 1\n",
1470 | "learningRate = 3.0\n",
1471 | "\n",
1472 | "# Start time\n",
1473 | "start = time.time()\n",
1474 | "\n",
1475 | "# Train\n",
1476 | "trainUsingMinibatchGD(weights, x_train, y_train, minibatchSize, nEpochs, learningRate)\n",
1477 | "\n",
1478 | "# End time\n",
1479 | "end = time.time()\n",
1480 | "\n",
1481 | "# Evaluate accuracy\n",
1482 | "print(\"Training accuracy:\")\n",
1483 | "evaluate(weights, x_train, y_train)\n",
1484 | "print(\"Test accuracy:\")\n",
1485 | "evaluate(weights, x_test, y_test)\n",
1486 | "\n",
1487 | "# Print time taken\n",
1488 | "print(\"Time: \"+str(end-start)+\" seconds\")"
1489 | ]
1490 | },
1491 | {
1492 | "cell_type": "markdown",
1493 | "metadata": {},
1494 | "source": [
1495 | "Stochastic Gradient Descent took more time, but gave much better accuracy in just 1 epoch."
1496 | ]
1497 | },
1498 | {
1499 | "cell_type": "markdown",
1500 | "metadata": {},
1501 | "source": [
1502 | "- Let's now check for Mini-batch Gradient Descent, with $minibatchSize = $ (say) $10$"
1503 | ]
1504 | },
1505 | {
1506 | "cell_type": "code",
1507 | "execution_count": 49,
1508 | "metadata": {},
1509 | "outputs": [
1510 | {
1511 | "name": "stdout",
1512 | "output_type": "stream",
1513 | "text": [
1514 | "Epoch 1 of 1\n",
1515 | "Training accuracy:\n",
1516 | "52428 out of 60000 : 0.8738\n",
1517 | "Test accuracy:\n",
1518 | "8752 out of 10000 : 0.8752\n",
1519 | "Time: 4.0647711753845215 seconds\n"
1520 | ]
1521 | }
1522 | ],
1523 | "source": [
1524 | "# Train the network ONCE using Mini-batch Gradient Descent to check accuracy and time\n",
1525 | "\n",
1526 | "# Re-initialize weights\n",
1527 | "weights = [np.array(w) for w in initialWeights]\n",
1528 | "\n",
1529 | "# Set options of mini-batch gradient descent\n",
1530 | "minibatchSize = 10\n",
1531 | "nEpochs = 1\n",
1532 | "learningRate = 3.0\n",
1533 | "\n",
1534 | "# Start time\n",
1535 | "start = time.time()\n",
1536 | "\n",
1537 | "# Train\n",
1538 | "trainUsingMinibatchGD(weights, x_train, y_train, minibatchSize, nEpochs, learningRate)\n",
1539 | "\n",
1540 | "# End time\n",
1541 | "end = time.time()\n",
1542 | "\n",
1543 | "# Evaluate accuracy\n",
1544 | "print(\"Training accuracy:\")\n",
1545 | "evaluate(weights, x_train, y_train)\n",
1546 | "print(\"Test accuracy:\")\n",
1547 | "evaluate(weights, x_test, y_test)\n",
1548 | "\n",
1549 | "# Print time taken\n",
1550 | "print(\"Time: \"+str(end-start)+\" seconds\")"
1551 | ]
1552 | },
1553 | {
1554 | "cell_type": "markdown",
1555 | "metadata": {},
1556 | "source": [
1557 | "Thus, (in 1 epoch) Mini-batch Gradient descent gives comparable accuracy to Stochastic Gradient Descent, which is much better than the accuracy given by Batch Gradient Descent, in much lesser time."
1558 | ]
1559 | },
1560 | {
1561 | "cell_type": "markdown",
1562 | "metadata": {},
1563 | "source": [
1564 | "## Classifying MNIST data set\n",
1565 | "\n",
1566 | "Let us try to classify the MNIST data set up to more than 99%. This means deciding the number of layers, size of each layer, number of Epochs, the mini-batch size, and the learning (constant, for now).\n",
1567 | "\n",
1568 | "Let us try, $layers = [784$ (input layer, because each MNIST image is 28$x$28)$, 30$ (hidden layer)$, 10$ (outputs layer, one neuron for each digit)$], nEpochs = 30, minibatchSize = 10, learningRate = 3.0$"
1569 | ]
1570 | },
1571 | {
1572 | "cell_type": "code",
1573 | "execution_count": 55,
1574 | "metadata": {},
1575 | "outputs": [
1576 | {
1577 | "name": "stdout",
1578 | "output_type": "stream",
1579 | "text": [
1580 | "Epoch 1 of 50\n",
1581 | "Epoch 2 of 50\n",
1582 | "Epoch 3 of 50\n",
1583 | "Epoch 4 of 50\n",
1584 | "Epoch 5 of 50\n",
1585 | "Epoch 6 of 50\n",
1586 | "Epoch 7 of 50\n",
1587 | "Epoch 8 of 50\n",
1588 | "Epoch 9 of 50\n",
1589 | "Epoch 10 of 50\n",
1590 | "Epoch 11 of 50\n",
1591 | "Epoch 12 of 50\n",
1592 | "Epoch 13 of 50\n",
1593 | "Epoch 14 of 50\n",
1594 | "Epoch 15 of 50\n",
1595 | "Epoch 16 of 50\n",
1596 | "Epoch 17 of 50\n",
1597 | "Epoch 18 of 50\n",
1598 | "Epoch 19 of 50\n",
1599 | "Epoch 20 of 50\n",
1600 | "Epoch 21 of 50\n",
1601 | "Epoch 22 of 50\n",
1602 | "Epoch 23 of 50\n",
1603 | "Epoch 24 of 50\n",
1604 | "Epoch 25 of 50\n",
1605 | "Epoch 26 of 50\n",
1606 | "Epoch 27 of 50\n",
1607 | "Epoch 28 of 50\n",
1608 | "Epoch 29 of 50\n",
1609 | "Epoch 30 of 50\n",
1610 | "Epoch 31 of 50\n",
1611 | "Epoch 32 of 50\n",
1612 | "Epoch 33 of 50\n",
1613 | "Epoch 34 of 50\n",
1614 | "Epoch 35 of 50\n",
1615 | "Epoch 36 of 50\n",
1616 | "Epoch 37 of 50\n",
1617 | "Epoch 38 of 50\n",
1618 | "Epoch 39 of 50\n",
1619 | "Epoch 40 of 50\n",
1620 | "Epoch 41 of 50\n",
1621 | "Epoch 42 of 50\n",
1622 | "Epoch 43 of 50\n",
1623 | "Epoch 44 of 50\n",
1624 | "Epoch 45 of 50\n",
1625 | "Epoch 46 of 50\n",
1626 | "Epoch 47 of 50\n",
1627 | "Epoch 48 of 50\n",
1628 | "Epoch 49 of 50\n",
1629 | "Epoch 50 of 50\n",
1630 | "Training accuracy:\n",
1631 | "58180 out of 60000 : 0.9696666666666667\n",
1632 | "Test accuracy:\n",
1633 | "9397 out of 10000 : 0.9397\n"
1634 | ]
1635 | }
1636 | ],
1637 | "source": [
1638 | "# TRAIN A NETWORK TO CLASSIFY MNIST\n",
1639 | "\n",
1640 | "# Initialize network\n",
1641 | "layers = [784, 30, 10]\n",
1642 | "weights = initializeWeights(layers)\n",
1643 | "\n",
1644 | "# Take backup of weights to be used later for comparison\n",
1645 | "initialWeights = [np.array(w) for w in weights]\n",
1646 | "\n",
1647 | "# Set options of mini-batch gradient descent\n",
1648 | "minibatchSize = 10\n",
1649 | "nEpochs = 50\n",
1650 | "learningRate = 3.0\n",
1651 | "\n",
1652 | "# Train\n",
1653 | "trainUsingMinibatchGD(weights, x_train, y_train, minibatchSize, nEpochs, learningRate)\n",
1654 | "\n",
1655 | "# Evaluate accuracy\n",
1656 | "print(\"Training accuracy:\")\n",
1657 | "evaluate(weights, x_train, y_train)\n",
1658 | "print(\"Test accuracy:\")\n",
1659 | "evaluate(weights, x_test, y_test)"
1660 | ]
1661 | },
1662 | {
1663 | "cell_type": "markdown",
1664 | "metadata": {},
1665 | "source": [
1666 | "About 93%-95%.. What if we increase the mini-batch size?"
1667 | ]
1668 | },
1669 | {
1670 | "cell_type": "code",
1671 | "execution_count": 59,
1672 | "metadata": {},
1673 | "outputs": [
1674 | {
1675 | "name": "stdout",
1676 | "output_type": "stream",
1677 | "text": [
1678 | "Epoch 1 of 10\n",
1679 | "Epoch 2 of 10\n",
1680 | "Epoch 3 of 10\n",
1681 | "Epoch 4 of 10\n",
1682 | "Epoch 5 of 10\n",
1683 | "Epoch 6 of 10\n",
1684 | "Epoch 7 of 10\n",
1685 | "Epoch 8 of 10\n",
1686 | "Epoch 9 of 10\n",
1687 | "Epoch 10 of 10\n",
1688 | "Training accuracy:\n",
1689 | "53245 out of 60000 : 0.8874166666666666\n",
1690 | "Test accuracy:\n",
1691 | "8846 out of 10000 : 0.8846\n"
1692 | ]
1693 | }
1694 | ],
1695 | "source": [
1696 | "# TRAIN A NETWORK TO CLASSIFY MNIST\n",
1697 | "\n",
1698 | "# Initialize network\n",
1699 | "layers = [784, 10, 10, 10]\n",
1700 | "weights = initializeWeights(layers)\n",
1701 | "\n",
1702 | "# Take backup of weights to be used later for comparison\n",
1703 | "initialWeights = [np.array(w) for w in weights]\n",
1704 | "\n",
1705 | "# Set options of mini-batch gradient descent\n",
1706 | "minibatchSize = 10\n",
1707 | "nEpochs = 30\n",
1708 | "learningRate = 3.0\n",
1709 | "\n",
1710 | "# Train\n",
1711 | "trainUsingMinibatchGD(weights, x_train, y_train, minibatchSize, nEpochs, learningRate)\n",
1712 | "\n",
1713 | "# Evaluate accuracy\n",
1714 | "print(\"Training accuracy:\")\n",
1715 | "evaluate(weights, x_train, y_train)\n",
1716 | "print(\"Test accuracy:\")\n",
1717 | "evaluate(weights, x_test, y_test)"
1718 | ]
1719 | },
1720 | {
1721 | "cell_type": "markdown",
1722 | "metadata": {
1723 | "collapsed": true
1724 | },
1725 | "source": [
1726 | "## Coming up next\n",
1727 | "\n",
1728 | "In the next tutorial, we shall see the different types of optimizations that can be done in gradient descent, and compare their performances."
1729 | ]
1730 | }
1731 | ],
1732 | "metadata": {
1733 | "kernelspec": {
1734 | "display_name": "Python 3",
1735 | "language": "python",
1736 | "name": "python3"
1737 | },
1738 | "language_info": {
1739 | "codemirror_mode": {
1740 | "name": "ipython",
1741 | "version": 3
1742 | },
1743 | "file_extension": ".py",
1744 | "mimetype": "text/x-python",
1745 | "name": "python",
1746 | "nbconvert_exporter": "python",
1747 | "pygments_lexer": "ipython3",
1748 | "version": "3.5.1"
1749 | }
1750 | },
1751 | "nbformat": 4,
1752 | "nbformat_minor": 2
1753 | }
1754 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # Neural Network in Python
2 |
3 | An implementation of a Multi-Layer Perceptron, with forward propagation, back propagation using Gradient Descent, and training using Batch, Mini-batch or Stochastic Gradient Descent.
4 |
5 | Use: myNN = MyPyNN(layers), where layers is a list of layer sizes: [nOfInputDims, sizesOfHiddenLayers..., nOfOutputDims]
6 | The learning rate (alpha) and the regularization parameter (regLambda) are passed to the training methods.
7 |
8 | ## Example 1
9 |
10 | ```
11 | from myPyNN import *
12 | X = [0, 0.5, 1]
13 | y = [0, 0.5, 1]
14 | myNN = MyPyNN([1, 1, 1])
15 | ```
16 | Input Layer : 1-dimensional (Eg: X)
17 |
18 | 1 Hidden Layer : 1-dimensional
19 |
20 | Output Layer : 1-dimensional (Eg. y)
21 |
22 | Learning Rate : 0.05 (default)
23 | ```
24 | print myNN.predict(0.2)
25 | ```
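
To then fit this tiny network, here is a sketch using the mini-batch trainer from myPyNN.py (with only 3 data points, a mini-batch size of 1 is effectively Stochastic Gradient Descent):
```
myNN.trainUsingMinibatchGD(X, y, nEpochs=1000, minibatchSize=1, learningRate=0.05)
print myNN.predict(0.2)
```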
26 |
27 |
28 | ## Example 2
29 | ```
30 | X = [[0,0], [1,1]]
31 | y = [0, 1]
32 | myNN = MyPyNN([2, 3, 1])
33 | ```
34 | Input Layer : 2-dimensional (Eg: X)
35 |
36 | 1 Hidden Layer : 3-dimensional
37 |
38 | Output Layer : 1-dimensional (Eg. y)
39 |
40 | Learning rate : 0.8
41 | ```
42 | print myNN.predict(X)
43 | #myNN.trainUsingGD(X, y, 899)
44 | myNN.trainUsingSGD(X, y, 1000)
45 | print myNN.predict(X)
46 | ```
47 |
48 | ## Example 3
49 |
50 | ```
51 | X = [[2,2,2], [3,3,3], [4,4,4], [5,5,5], [6,6,6], [7,7,7], [8,8,8], [9,9,9], [10,10,10], [11,11,11]]
52 | y = [.2, .3, .4, .5, .6, .7, .8, .9, 0, .1]
53 | myNN = MyPyNN([3, 10, 10, 5, 1])
54 | ```
55 | Input Layer : 3-dimensional (Eg: X)
56 |
57 | 3 Hidden Layers: 10-dimensional, 10-dimensional, 5-dimensional
58 |
59 | Output Layer : 1-dimensional (Eg. y)
60 |
61 | Learning rate : 0.9
62 |
63 | Regularization parameter : 0.5
64 | ```
65 | print myNN.predict(X)
66 | #myNN.trainUsingGD(X, y, 899)
67 | myNN.trainUsingSGD(X, y, 1000)
68 | print myNN.predict(X)
69 | ```
70 |
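## Example 4 (MNIST)

A sketch of training on the MNIST data set using the loader and mini-batch trainer defined in myPyNN.py (loadMNISTData expects a local mnist.npz file; pass its path on your machine):
```
from myPyNN import *

myNN = MyPyNN([784, 30, 10])   # 784 input pixels, 30 hidden neurons, 10 output digits

# Returns flattened images and one-hot labels
x_train, y_train, x_test, y_test = myNN.loadMNISTData('mnist.npz')

myNN.trainUsingMinibatchGD(x_train, y_train, nEpochs=30, minibatchSize=10,
                           learningRate=3.0, normalizeInputs=True,
                           printTestAccuracy=True, testX=x_test, testY=y_test)
```
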
71 | ## Requirements for the interactive tutorial notebooks (*.ipynb)
72 |
73 | I ran this on OS X, after installing Homebrew (brew) for command-line tools and pip for Python packages.
74 |
75 | ### Python
76 |
77 | I designed the tutorial on Python 2.7; it can be run on Python 3 as well.
78 |
79 | ### Packages
80 |
81 | - numpy
82 | - matplotlib
83 | - ipywidgets
84 |
85 | ### Jupyter
86 |
87 | The tutorial is an iPython notebook. It is designed and meant to run in Jupyter. To install Jupyter, one can install Anaconda which would install Python, Jupyter, along with a lot of other stuff. Or, one can install only Jupyter using:
88 | ```
89 | pip install jupyter
90 | ```
91 |
92 | ### ipywidgets
93 |
94 | ipywidgets comes pre-installed with Jupyter. However, widgets might need to be activated using:
95 | ```
96 | jupyter nbextension enable --py widgetsnbextension
97 | jupyter nbextension enable --py --sys-prefix widgetsnbextension
98 | ```
99 |
100 | ## References
101 | - [Machine Learning Mastery's excellent tutorial](https://machinelearningmastery.com/implement-backpropagation-algorithm-scratch-python/)
102 |
103 | - [Mattmazur's example](https://mattmazur.com/2015/03/17/a-step-by-step-backpropagation-example/)
104 |
105 | - [Welch Lab's excellent video playlist on neural networks](https://www.youtube.com/playlist?list=PLiaHhY2iBX9hdHaRr6b7XevZtgZRa1PoU)
106 |
107 | - [Michael Nielsen's brilliant hands-on interactive tutorial on the awesome power of neural networks as universal approximators](https://neuralnetworksanddeeplearning.com/chap4.html)
108 |
109 | - [Excellent overview of gradient descent algorithms](http://sebastianruder.com/optimizing-gradient-descent/)
110 |
111 | - [CS321n's iPython tutorial](https://cs231n.github.io/ipython-tutorial/)
112 |
113 | - [Karlijn Willem's definitive Jupyter guide](https://www.datacamp.com/community/tutorials/tutorial-jupyter-notebook#gs.SJPul58)
114 |
115 | - [matplotlib](https://matplotlib.org/)
116 |
117 | - [Tutorial on using Matplotlib in Jupyter](https://nbviewer.jupyter.org/github/jrjohansson/scientific-python-lectures/blob/master/Lecture-4-Matplotlib.ipynb)
118 |
119 | - [Interactive dashboards in Jupyter](https://blog.dominodatalab.com/interactive-dashboards-in-jupyter/)
120 |
121 | - [ipywidgets - for interactive dashboards in Jupyter](http://ipywidgets.readthedocs.io/)
122 |
123 | - [drawing-animating-shapes-matplotlib](https://nickcharlton.net/posts/drawing-animating-shapes-matplotlib.html)
124 |
125 | - [RISE - for Jupyter presentations](https://github.com/damianavila/RISE)
126 |
127 | - [MathJax syntax list](https://math.meta.stackexchange.com/questions/5020/mathjax-basic-tutorial-and-quick-reference)
128 |
129 | - [MNIST dataset and results](http://yann.lecun.com/exdb/mnist/)
130 |
131 | - [MNIST dataset .npz file (Amazon AWS)](https://s3.amazonaws.com/img-datasets/mnist.npz)
132 |
133 | - [NpzFile doc](http://docr.it/numpy/lib/npyio/NpzFile)
134 |
135 | - [matplotlib examples from SciPy](http://scipython.com/book/chapter-7-matplotlib/examples/simple-surface-plots/)
136 |
137 | - [Yann LeCun's backprop paper, containing tips for efficient backpropagation](http://yann.lecun.com/exdb/publis/pdf/lecun-98b.pdf)
138 |
139 | - [Mathematical notations for LaTeX, which can also be used in Jupyter](https://en.wikibooks.org/wiki/LaTeX/Mathematics)
140 |
141 | - [JupyterHub](http://jupyterhub.readthedocs.io/en/latest/getting-started.html)
142 |
143 | - [Optional code visibility in iPython notebooks](https://chris-said.io/2016/02/13/how-to-make-polished-jupyter-presentations-with-optional-code-visibility/)
144 |
145 | - [Ultimate iPython notebook tips](https://blog.juliusschulz.de/blog/ultimate-ipython-notebook)
146 |
147 | - [Full preprocessing for medical images tutorial](https://www.kaggle.com/gzuidhof/data-science-bowl-2017/full-preprocessing-tutorial)
148 |
149 | - [Example ConvNet for a kaggle problem (cats vs dogs)](https://www.kaggle.com/sentdex/dogs-vs-cats-redux-kernels-edition/full-classification-example-with-convnet)
150 |
151 | - Fernando Pérez, Brian E. Granger, IPython: A System for Interactive Scientific Computing, Computing in Science and Engineering, vol. 9, no. 3, pp. 21-29, May/June 2007, doi:10.1109/MCSE.2007.53. URL: (http://ipython.org)
152 |
153 |
--------------------------------------------------------------------------------
/images/Title_ANN.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/voletiv/myPythonNeuralNetwork/e215c6f20ca0b2d7aa947f956049110f4e60b094/images/Title_ANN.png
--------------------------------------------------------------------------------
/images/digitsNN.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/voletiv/myPythonNeuralNetwork/e215c6f20ca0b2d7aa947f956049110f4e60b094/images/digitsNN.png
--------------------------------------------------------------------------------
/images/optimizers.gif:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/voletiv/myPythonNeuralNetwork/e215c6f20ca0b2d7aa947f956049110f4e60b094/images/optimizers.gif
--------------------------------------------------------------------------------
/myPyNN.py:
--------------------------------------------------------------------------------
1 | import numpy as np
2 | DEBUG = 0
3 |
4 | class MyPyNN(object):
5 |
6 | def __init__(self, layers=[3, 4, 2]):
7 |
8 | self.layers = layers
9 |
10 |         # Network: one (x+1) x y weight matrix per pair of consecutive layers; the extra row holds the bias weights
11 | self.weights = [np.random.randn(x+1, y)
12 | for x, y in zip(self.layers[:-1], self.layers[1:])]
13 |
14 | # For mean-centering
15 | self.meanX = np.zeros((1, self.layers[0]))
16 |
17 | # Default options
18 | self.learningRate = 1.0
19 | self.regLambda = 0
20 | self.adaptLearningRate = False
21 | self.normalizeInputs = False
22 | self.meanCentering = False
23 | self.visible = False
24 |
25 | def predict(self, X, visible=False):
26 | self.visible = visible
27 | # mean-centering
28 | inputs = self.preprocessTestingInputs(X) - self.meanX
29 |
30 | if inputs.ndim!=1 and inputs.ndim!=2:
31 | print "X is not one or two dimensional, please check."
32 | return
33 |
34 | if DEBUG or self.visible:
35 | print "PREDICT:"
36 | print inputs
37 |
38 | for l, w in enumerate(self.weights):
39 | inputs = self.addBiasTerms(inputs)
40 | inputs = self.sigmoid(np.dot(inputs, w))
41 | if DEBUG or self.visible:
42 | print "Layer "+str(l+1)
43 | print inputs
44 |
45 | return inputs
46 |
47 | def trainUsingMinibatchGD(self, X, y, nEpochs=1000, minibatchSize=100,
48 | learningRate=0.05, regLambda=0, adaptLearningRate=False,
49 | normalizeInputs=False, meanCentering=False,
50 | printTestAccuracy=False, testX=None, testY=None,
51 | visible=False):
52 | self.learningRate = float(learningRate)
53 | self.regLambda = regLambda
54 | self.adaptLearningRate = adaptLearningRate
55 | self.normalizeInputs = normalizeInputs
56 | self.meanCentering = meanCentering
57 | self.visible = visible
58 |
59 | X = self.preprocessTrainingInputs(X)
60 | y = self.preprocessOutputs(y)
61 |
62 | yPred = self.predict(X, visible=self.visible)
63 |
64 | if yPred.shape != y.shape:
65 | print "Shape of y ("+str(y.shape)+") does not match what shape of y is supposed to be: "+str(yPred.shape)
66 | return
67 |
68 | self.trainAccuracy = (np.sum([np.argmax(yPred[k])==np.argmax(y[k])
69 | for k in range(len(y))])).astype(float)/len(y)
70 | print "train accuracy = " + str(self.trainAccuracy)
71 |
72 | self.prevCost = 0.5*np.sum((yPred-y)**2)/len(y)
73 | print "cost = " + str(self.prevCost)
74 | self.cost = self.prevCost
75 |
76 | # mean-centering
77 | if self.meanCentering:
78 | X = X - self.meanX
79 | else:
80 | X = X
81 |
82 | self.inputs = X
83 |
84 | if DEBUG or self.visible:
85 | print "train input:"+str(inputs)
86 |
87 | # Just to ensure minibatchSize !> len(X)
88 | if minibatchSize > len(X):
89 | minibatchSize = int(len(X)/10)+1
90 |
91 | # Test data
92 | if printTestAccuracy:
93 |             if testX is None and testY is None:
94 | print "No test data given"
95 | testX = np.zeros((1, len(X)))
96 | testY = np.zeros((1,1))
97 |             elif testX is None or testY is None:
98 | print "One of testData not available"
99 | return
100 | else:
101 | testX = self.preprocessTrainingInputs(testX)
102 | testY = self.preprocessOutputs(testY)
103 | if len(testX)!=len(testY):
104 | print "Test Datas not of same length"
105 | return
106 |
107 | yTestPred = self.predict(testX, visible=self.visible)
108 | self.testAccuracy = np.sum([np.argmax(yTestPred[k])==np.argmax(testY[k])
109 | for k in range(len(testY))])/float(len(testY))
110 | print "test accuracy = " + str(self.testAccuracy)
111 |
112 | # Randomly initialize old weights (for adaptive learning), will copy values later
113 | if adaptLearningRate:
114 | self.oldWeights = [np.random.randn(i+1, j)
115 | for i, j in zip(self.layers[:-1], self.layers[1:])]
116 |
117 | # For each epoch
118 | for i in range(nEpochs):
119 |
120 | print "Epoch "+str(i)+" of "+str(nEpochs)
121 |
122 | ## Find minibatches
123 | # Generate list of indices of full training data
124 | fullIdx = list(range(len(X)))
125 | # Shuffle the list
126 | np.random.shuffle(fullIdx)
127 |             # Make list of minibatches (each is a list of indices into the training data)
128 |             minibatches = [fullIdx[k:k+minibatchSize]
129 |                             for k in xrange(0, len(X), minibatchSize)]
130 | 
131 |             # For each minibatch
132 |             for minibatch in minibatches:
133 |                 # Find X and y for each minibatch
134 |                 miniX = X[minibatch]
135 |                 miniY = y[minibatch]
136 |
137 | # Forward propagate through miniX
138 | a = self.forwardProp(miniX)
139 |
140 | # Check if Forward Propagation was successful
141 | if a==False:
142 | return
143 |
144 | # Save old weights before backProp in case of adaptLR
145 | if adaptLearningRate:
146 | for i in range(len(self.weights)):
147 | self.oldWeights[i] = np.array(self.weights[i])
148 |
149 | # Back propagate, update weights for minibatch
150 | self.backPropGradDescent(miniX, miniY)
151 |
152 | yPred = self.predict(X, visible=self.visible)
153 |
154 | self.trainAccuracy = (np.sum([np.argmax(yPred[k])==np.argmax(y[k])
155 | for k in range(len(y))])).astype(float)/len(y)
156 | print "train accuracy = " + str(self.trainAccuracy)
157 | if printTestAccuracy:
158 | yTestPred = self.predict(testX, visible=self.visible)
159 | self.testAccuracy = (np.sum([np.argmax(yTestPred[k])==np.argmax(testY[k])
160 | for k in range(len(testY))])).astype(float)/len(testY)
161 | print "test accuracy = " + str(self.testAccuracy)
162 |
163 | self.cost = 0.5*np.sum((yPred-y)**2)/len(y)
164 | print "cost = " + str(self.cost)
165 |
166 | if adaptLearningRate:
167 | self.adaptLR()
168 |
169 | self.evaluate(X, y)
170 |
171 | self.prevCost = self.cost
172 |
173 | def forwardProp(self, inputs):
174 | inputs = self.preprocessInputs(inputs)
175 | print "Forward..."
176 |
177 | if inputs.ndim!=1 and inputs.ndim!=2:
178 | print "Input argument " + str(inputs.ndim) + \
179 | "is not one or two dimensional, please check."
180 | return False
181 |
182 | if (inputs.ndim==1 and len(inputs)!=self.layers[0]) or \
183 | (inputs.ndim==2 and inputs.shape[1]!=self.layers[0]):
184 | print "Input argument does not match input dimensions (" + \
185 | str(self.layers[0]) + ") of network."
186 | return False
187 |
188 | if DEBUG or self.visible:
189 | print inputs
190 |
191 | # Save the outputs of each layer
192 | self.outputs = []
193 |
194 | # For each layer
195 | for l, w in enumerate(self.weights):
196 | # Add bias term to the input
197 | inputs = self.addBiasTerms(inputs)
198 |
199 | # Calculate the output
200 | self.outputs.append(self.sigmoid(np.dot(inputs, w)))
201 |
202 | # Set this as the input to the next layer
203 | inputs = np.array(self.outputs[-1])
204 |
205 | if DEBUG or self.visible:
206 | print "Layer "+str(l+1)
207 | print "inputs: "+str(inputs)
208 | print "weights: "+str(w)
209 | print "output: "+str(inputs)
210 | del inputs
211 |
212 | return True
213 |
214 | def backPropGradDescent(self, X, y):
215 | print "...Backward"
216 |
217 | # Correct the formats of inputs and outputs
218 | X = self.preprocessInputs(X)
219 | y = self.preprocessOutputs(y)
220 |
221 | # Compute first error
222 | bpError = self.outputs[-1] - y
223 |
224 | if DEBUG or self.visible:
225 | print "error = self.outputs[-1] - y:"
226 |             print bpError
227 |
228 | # For each layer in reverse order (last layer to first layer)
229 | for l, w in enumerate(reversed(self.weights)):
230 | if DEBUG or self.visible:
231 | print "LAYER "+str(len(self.weights)-l)
232 |
233 | # The calculated output "z" of that layer
234 | predOutputs = self.outputs[-l-1]
235 |
236 | if DEBUG or self.visible:
237 | print "predOutputs"
238 | print predOutputs
239 |
240 | # delta = error*(z*(1-z)) === nxneurons
241 |             delta = np.multiply(bpError, np.multiply(predOutputs, 1 - predOutputs))
242 |
243 | if DEBUG or self.visible:
244 | print "To compute error to be backpropagated:"
245 | print "del = predOutputs*(1 - predOutputs)*error :"
246 | print delta
247 | print "weights:"
248 | print w
249 |
250 | # Compute new error to be propagated back (bias term neglected in backpropagation)
251 | bpError = np.dot(delta, w[1:,:].T)
252 |
253 | if DEBUG or self.visible:
254 | print "backprop error = np.dot(del, w[1:,:].T) :"
255 |                 print bpError
256 |
257 | # If we are at first layer, inputs are data points
258 | if l==len(self.weights)-1:
259 | inputs = self.addBiasTerms(X)
260 | # Else, inputs === outputs from previous layer
261 | else:
262 | inputs = self.addBiasTerms(self.outputs[-l-2])
263 |
264 | if DEBUG or self.visible:
265 | print "To compute errorTerm:"
266 | print "inputs:"
267 | print inputs
268 | print "del:"
269 | print delta
270 |
271 | # errorTerm = (inputs.T).*(delta)/n
272 | # delta === nxneurons, inputs === nxprev, W === prevxneurons
273 | errorTerm = np.dot(inputs.T, delta)/len(y)
274 | if errorTerm.ndim==1:
275 | errorTerm.reshape((len(errorTerm), 1))
276 |
277 | if DEBUG or self.visible:
278 | print "errorTerm = np.dot(inputs.T, del) :"
279 | print errorTerm
280 |
281 | # regularization term
282 | regWeight = np.zeros(w.shape)
283 | regWeight[1:,:] = self.regLambda #bias term neglected
284 |
285 | if DEBUG or self.visible:
286 | print "To update weights:"
287 | print "learningRate*errorTerm:"
288 | print self.learningRate*errorTerm
289 | print "regWeight:"
290 | print regWeight
291 | print "weights:"
292 | print w
293 | print "regTerm = regWeight*w :"
294 | print regWeight*w
295 |
296 | # Update weights
297 | self.weights[-l-1] = w - \
298 | (self.learningRate*errorTerm + np.multiply(regWeight,w))
299 |
300 | if DEBUG or self.visible:
301 | print "Updated 'weights' = learningRate*errorTerm + regTerm :"
302 | print self.weights[len(self.weights)-l-1]
303 |
304 | def adaptLR(self):
305 | if self.cost > self.prevCost:
306 | print "Cost increased!!"
307 | self.learningRate /= 2.0
308 | print " - learningRate halved to: "+str(self.learningRate)
309 | for i in range(len(self.weights)):
310 | self.weights[i] = self.oldWeights[i]
311 | print " - weights reverted back"
312 | # good function
313 | else:
314 | self.learningRate *= 1.05
315 | print " - learningRate increased by 5% to: "+str(self.learningRate)
316 |
317 | def preprocessTrainingInputs(self, X):
318 | X = self.preprocessInputs(X)
319 | if self.normalizeInputs and np.max(X) > 1.0:
320 | X = X/255.0
321 | if np.all(self.meanX == np.zeros((1, self.layers[0]))) and self.meanCentering:
322 | self.meanX = np.reshape(np.mean(X, axis=0), (1, X.shape[1]))
323 | return X
324 |
325 | def preprocessTestingInputs(self, X):
326 | X = self.preprocessInputs(X)
327 | if self.normalizeInputs and np.max(X) > 1.0:
328 | X = X/255.0
329 | return X
330 |
331 | def preprocessInputs(self, X):
332 | X = np.array(X, dtype=float)
333 | # if X is int
334 | if X.ndim==0:
335 | X = np.array([X])
336 | # if X is 1D
337 | if X.ndim==1:
338 | if self.layers[0]==1: #if ndim=1
339 | X = np.reshape(X, (len(X),1))
340 | else: #if X is only 1 nd-ndimensional vector
341 | X = np.reshape(X, (1,len(X)))
342 | return X
343 |
344 | def preprocessOutputs(self, Y):
345 | Y = np.array(Y, dtype=float)
346 | # if Y is int
347 | if Y.ndim==0:
348 | Y = np.array([Y])
349 | # if Y is 1D
350 | if Y.ndim==1:
351 | if self.layers[-1]==1:
352 | Y = np.reshape(Y, (len(Y),1))
353 | else:
354 | Y = np.reshape(Y, (1,len(Y)))
355 | return Y
356 |
357 | def addBiasTerms(self, X):
358 | if X.ndim==0 or X.ndim==1:
359 | X = np.insert(X, 0, 1)
360 | elif X.ndim==2:
361 | X = np.insert(X, 0, 1, axis=1)
362 | return X
363 |
364 | def sigmoid(self, z):
365 | return 1/(1 + np.exp(-z))
366 |
367 | def evaluate(self, X, Y):
368 |         yPreds = self.predict(X)
369 | test_results = [(np.argmax(yPreds[i]), np.argmax(Y[i]))
370 | for i in range(len(Y))]
371 | yes = sum(int(x == y) for (x, y) in test_results)
372 | print(str(yes)+" out of "+str(len(Y)))
373 |
374 | def loadMNISTData(self, path='/Users/vikram.v/Downloads/mnist.npz'):
375 | # Use numpy.load() to load the .npz file
376 | f = np.load(path)
377 |
378 | # To check files stored in .npz file
379 | f.files
380 |
381 | # Saving the files
382 | x_train = f['x_train']
383 | y_train = f['y_train']
384 | x_test = f['x_test']
385 | y_test = f['y_test']
386 | f.close()
387 |
388 | # Preprocess inputs
389 | x_train_new = np.array([x.flatten() for x in x_train])
390 | y_train_new = np.zeros((len(y_train), 10))
391 | for i in range(len(y_train)):
392 | y_train_new[i][y_train[i]] = 1
393 |
394 | x_test_new = np.array([x.flatten() for x in x_test])
395 | y_test_new = np.zeros((len(y_test), 10))
396 | for i in range(len(y_test)):
397 | y_test_new[i][y_test[i]] = 1
398 |
399 | return [x_train_new, y_train_new, x_test_new, y_test_new]
400 |
--------------------------------------------------------------------------------
/myPyNNTest.py:
--------------------------------------------------------------------------------
1 | from myPyNN import *
2 |
3 | # RANDOM
4 | X = [[2,2,2], [3,3,3], [4,4,4], [5,5,5], [6,6,6], [7,7,7], [8,8,8], [9,9,9], [10,10,10], [11,11,11]]
5 | y = [.2, .3, .4, .5, .6, .7, .8, .9, 0, .1]
6 | myNN = MyPyNN([3, 10, 1])
7 |
8 |
9 | # MANUAL CALCULATIONS TO CHECK NETWORK
10 | def addBiasTerms(X):
11 | if X.ndim==0 or X.ndim==1:
12 | X = np.insert(X, 0, 1)
13 | elif X.ndim==2:
14 | X = np.insert(X, 0, 1, axis=1)
15 | return X
16 |
17 | def sigmoid(z):
18 | return 1/(1 + np.exp(-z))
19 |
20 | X = np.array([[0,0], [0,1], [1,0], [1,1]])
21 | y = np.array([[0], [1], [1], [1]])
22 | myNN = MyPyNN([2, 1, 1])
23 | lr = 1.5
24 | nIterations = 1
25 | W01 = myNN.weights[0]
26 | W02 = myNN.weights[1]
27 | W1 = W01
28 | W2 = W02
29 | X = X.astype('float')
30 | inputs = X - np.reshape(np.mean(X, axis=0), (1, X.shape[1]))
31 | for i in range(nIterations):
32 | yPred = sigmoid(np.dot(addBiasTerms(sigmoid(np.dot(addBiasTerms(inputs), W1))), W2))
33 | err2 = yPred - y
34 | output1 = sigmoid(np.dot(addBiasTerms(inputs), W1))
35 | del2 = np.multiply(np.multiply(yPred, (1-yPred)), err2)
36 | err1 = np.dot(del2, W2[1:].T)
37 | deltaW2 = lr*np.dot(addBiasTerms(output1).T, del2)/len(yPred)
38 | newW2 = W2 - deltaW2
39 | del1 = np.multiply(np.multiply(output1, 1-output1), err1)
40 | deltaW1 = lr*np.dot(addBiasTerms(inputs).T, del1)/len(yPred)
41 | newW1 = W1 - deltaW1
42 | W1 = newW1
43 | W2 = newW2
44 |
45 | myNN.trainUsingGD(X, y, learningRate=lr, nIterations=nIterations, visible=True)
46 | newW1 == myNN.weights[0]
47 | newW2 == myNN.weights[1]
48 |
49 | yPred == myNN.outputs[1]
50 | output1 == myNN.outputs[0]
51 |
52 |
53 | # COMPARING LEARNING RATES
54 | myNN1 = MyPyNN([2, 3, 1])
55 | myNN2 = MyPyNN([2, 3, 1])
56 | myNN3 = MyPyNN([2, 3, 1])
57 | myNN4 = MyPyNN([2, 3, 1])
58 | myNN5 = MyPyNN([2, 3, 1])
59 | myNN2.weights[0] = myNN1.weights[0]
60 | myNN2.weights[1] = myNN1.weights[1]
61 | myNN3.weights[0] = myNN1.weights[0]
62 | myNN3.weights[1] = myNN1.weights[1]
63 | myNN4.weights[0] = myNN1.weights[0]
64 | myNN4.weights[1] = myNN1.weights[1]
65 | myNN5.weights[0] = myNN1.weights[0]
66 | myNN5.weights[1] = myNN1.weights[1]
67 | myNN1.trainUsingGD(X, y, learningRate=0.1, nIterations=2500)
68 | myNN2.trainUsingGD(X, y, learningRate=0.5, nIterations=600)
69 | myNN3.trainUsingGD(X, y, learningRate=1, nIterations=400)
70 | myNN4.trainUsingGD(X, y, learningRate=2, nIterations=200)
71 | myNN5.trainUsingGD(X, y, learningRate=200, nIterations=1000)
72 |
73 |
74 | # Make network
75 | myNN = MyPyNN([784, 30, 10])
76 | lr = 3
77 | nIterations = 30
78 | minibatchSize = 10
79 |
80 | # MNIST DATA
81 | '''
82 | f = np.load(path)
83 |
84 | # To check files stored in .npz file
85 | f.files
86 |
87 | # Saving the files
88 | x_train = f['x_train']
89 | y_train = f['y_train']
90 | x_test = f['x_test']
91 | y_test = f['y_test']
92 | f.close()
93 |
94 | # Preprocess inputs
95 | x_train_new = np.array([x.flatten() for x in x_train])
96 | y_train_new = np.zeros((len(y_train), 10))
97 | for i in range(len(y_train)):
98 | y_train_new[i][y_train[i]] = 1
99 |
100 | x_test_new = np.array([x.flatten() for x in x_test])
101 | y_test_new = np.zeros((len(y_test), 10))
102 | for i in range(len(y_test)):
103 | y_test_new[i][y_test[i]] = 1
104 | '''
105 |
106 | [x_train_new, y_train_new, x_test_new, y_test_new] = myNN.loadMNISTData()
107 |
108 | myNN.trainUsingGD(x_train_new, y_train_new, nIterations=nIterations, learningRate=lr)
109 | myNN.trainUsingMinibatchGD(x_train_new, y_train_new, nEpochs=nIterations, minibatchSize=minibatchSize, learningRate=lr)
110 | myNN.trainUsingMinibatchGD(x_train_new, y_train_new, nEpochs=nIterations, minibatchSize=minibatchSize, learningRate=lr, printTestAccuracy=True, testX=x_test_new, testY=y_test_new)
111 |
112 | # Make network
113 | myNN = MyPyNN([784, 5, 5, 10])
114 | lr = 1.5
115 | nIterations = 1000
116 | minibatchSize = 100
117 | myNN.trainUsingSGD(x_train_new, y_train_new, nIterations=nIterations, minibatchSize=minibatchSize, learningRate=lr)
118 |
119 | # To check type of the dataset (x_train/y_train here are the raw MNIST arrays from the commented-out block above; plotting also needs matplotlib.pyplot imported as plt)
120 | type(x_train)
121 | type(y_train)
122 | # To check data
123 | x_train.shape
124 | y_train.shape
125 | fig = plt.figure(figsize=(10, 2))
126 | for i in range(20):
127 | ax1 = fig.add_subplot(2, 10, i+1)
128 | ax1.imshow(x_train[i], cmap='gray');
129 | ax1.axis('off')
130 |
131 |
--------------------------------------------------------------------------------