├── .gitignore
├── 1_Neural_Network_Tutorial_Visualizations.ipynb
├── 2_Neural_Network_Tutorial_Matrix_Representations.ipynb
├── 3_Neural_Network_Tutorial_Writing_NN_ForwardProp_In_Python.ipynb
├── 4_Neural_Network_Tutorial_Backpropagation.ipynb
├── 5_Neural_Network_Tutorial_Training_And_Testing.ipynb
├── 6_Neural_Network_Tutorial_Descent_Experimenting_with_Optimizers.ipynb
├── MNIST experiments.ipynb
├── README.md
├── images
    ├── Title_ANN.png
    ├── digitsNN.png
    └── optimizers.gif
├── myPyNN.py
└── myPyNNTest.py


/.gitignore:
--------------------------------------------------------------------------------
 1 | # Byte-compiled / optimized / DLL files
 2 | __pycache__/
 3 | *.py[cod]
 4 | *$py.class
 5 | 
 6 | # C extensions
 7 | *.so
 8 | 
 9 | # Distribution / packaging
10 | .Python
11 | env/
12 | build/
13 | develop-eggs/
14 | dist/
15 | downloads/
16 | eggs/
17 | .eggs/
18 | lib/
19 | lib64/
20 | parts/
21 | sdist/
22 | var/
23 | *.egg-info/
24 | .installed.cfg
25 | *.egg
26 | 
27 | # PyInstaller
28 | #  Usually these files are written by a python script from a template
29 | #  before PyInstaller builds the exe, so as to inject date/other infos into it.
30 | *.manifest
31 | *.spec
32 | 
33 | # Installer logs
34 | pip-log.txt
35 | pip-delete-this-directory.txt
36 | 
37 | # Unit test / coverage reports
38 | htmlcov/
39 | .tox/
40 | .coverage
41 | .coverage.*
42 | .cache
43 | nosetests.xml
44 | coverage.xml
45 | *,cover
46 | .hypothesis/
47 | 
48 | # Translations
49 | *.mo
50 | *.pot
51 | 
52 | # Django stuff:
53 | *.log
54 | local_settings.py
55 | 
56 | # Flask stuff:
57 | instance/
58 | .webassets-cache
59 | 
60 | # Scrapy stuff:
61 | .scrapy
62 | 
63 | # Sphinx documentation
64 | docs/_build/
65 | 
66 | # PyBuilder
67 | target/
68 | 
69 | # IPython Notebook
70 | .ipynb_checkpoints
71 | 
72 | # pyenv
73 | .python-version
74 | 
75 | # celery beat schedule file
76 | celerybeat-schedule
77 | 
78 | # dotenv
79 | .env
80 | 
81 | # virtualenv
82 | venv/
83 | ENV/
84 | 
85 | # Spyder project settings
86 | .spyderproject
87 | 
88 | # Rope project settings
89 | .ropeproject
90 | 
91 | # OS X
92 | .DS_Store
93 | 


--------------------------------------------------------------------------------
/2_Neural_Network_Tutorial_Matrix_Representations.ipynb:
--------------------------------------------------------------------------------
  1 | {
  2 |  "cells": [
  3 |   {
  4 |    "cell_type": "markdown",
  5 |    "metadata": {},
  6 |    "source": [
  7 |     "# Matrix representations\n",
  8 |     "\n",
  9 |     "## Matrix representations - input to the network\n",
 10 |     "\n",
 11 |     "Suppose an input has $d_{i}$ dimensions. (Remember that the input has been normalized to range between 0 and 1.)\n",
 12 |     "\n",
 13 |     "Then each input would be:\n",
 14 |     "\n",
 15 |     "$$X \\; (without bias) _{1{\\times}d_{i}} = \\left[ \\begin{array}{c} x_{0} & x_{1} & \\cdots & x_{(d_{i}-1)} \\end{array} \\right] _{1{\\times}d_{i}}$$\n",
 16 |     "\n",
 17 |     "After adding the bias term,\n",
 18 |     "\n",
 19 |     "$$X_{1{\\times}(d_{i}+1)} = \\left[ \\begin{array}{c} 1 & X_{1{\\times}d_{i}} \\end{array} \\right] _{1{\\times}(d_{i}+1)}$$\n",
 20 |     "\n",
 21 |     "For example, one of the data points given above to make a logic gate was $(0,1)$. Here, $X = \\left[ \\begin{array}{c} 1 & 0 & 1 \\end{array} \\right]_{1{\\times}(2+1)}$"
 22 |    ]
 23 |   },
 24 |   {
 25 |    "cell_type": "markdown",
 26 |    "metadata": {},
 27 |    "source": [
 28 |     "Suppose we provide $n$ $d_{i}$-dimensional data points. For the first layer of neurons, we can make an input matrix of $n{\\times}d_{i}$ dimension.\n",
 29 |     "\n",
 30 |     "$$X^{(1)}_{n{\\times}(d_{i}+1)} = \n",
 31 |     "\\left[ \\begin{array}{c} 1 & _{(0)}X \\\\ 1 & _{(1)}X \\\\ \\vdots & \\vdots \\\\ 1 & _{(n-1)}X \\end{array} \\right] _{n{\\times}(d_{i}+1)}\n",
 32 |     "=\n",
 33 |     "\\left[ \\begin{array}{c} \n",
 34 |     "1 & _{(0)}x_{0} & _{(0)}x_{1} & _{(0)}x_{2} & \\cdots & _{(0)}x_{(d_{i}-1)} \\\\ \n",
 35 |     "1 & _{(1)}x_{0} & _{(1)}x_{1} & _{(1)}x_{2} & \\cdots & _{(1)}x_{(d_{i}-1)} \\\\ \n",
 36 |     "\\vdots & \\vdots & \\vdots & \\vdots & \\ddots & \\vdots \\\\ \n",
 37 |     "1 & _{(n-1)}x_{0} & _{(n-1)}x_{1} & _{(n-1)}x_{2} & \\cdots & _{(n-1)}x_{(d_{i}-1)} \n",
 38 |     "\\end{array} \\right] _{n{\\times}(d_{i}+1)}$$\n",
 39 |     "\n",
 40 |     "For example, for logic gates, the input matrix was $X = \\left[ \\begin{array}{c} 1 & 0 & 0 \\\\ 1 & 0 & 1 \\\\ 1 & 1 & 0 \\\\ 1 & 1 & 1 \\end{array} \\right] _{4{\\times}3} $"
 41 |    ]
 42 |   },
 43 |   {
 44 |    "cell_type": "markdown",
 45 |    "metadata": {},
 46 |    "source": [
 47 |     "## Matrix representations - output of a layer\n",
 48 |     "\n",
 49 |     "Suppose the output of the $l^{th}$ layer has $o_{l}$ dimensions, meaning there are $o_{l}$ neurons in the layer.\n",
 50 |     "\n",
 51 |     "In the above example, the output of the 1st Layer of 2 neurons is $o_{1} = 2$, and the output of the 2nd layer of 1 neuron is $o_{2} = 1$.\n",
 52 |     "\n",
 53 |     "For each input, the output is an $o_{l}$-dimensional vector:\n",
 54 |     "\n",
 55 |     "$$Y^{(l)} = \\left[ \\begin{array}{c} y_{[0]}^{(l)} & y_{[1]}^{(l)} & \\cdots & y_{[o_{l}-1]}^{(l)} \\end{array} \\right] _{1{\\times}o_{l}}$$\n",
 56 |     "\n",
 57 |     "\n",
 58 |     "For example, for an AND gate, the output of $(0,1)$ is $Y = \\left[ \\begin{array}{c} 0 \\end{array} \\right] _{1{\\times}1}$"
 59 |    ]
 60 |   },
 61 |   {
 62 |    "cell_type": "markdown",
 63 |    "metadata": {},
 64 |    "source": [
 65 |     "Thus, for $n$ data points, the output is:\n",
 66 |     "\n",
 67 |     "$$Y^{(l)} = \\left[ \\begin{array}{c} \n",
 68 |     "{_{(0)}}Y^{(l)} \\\\ {_{(1)}}Y^{(l)} \\\\ \\vdots \\\\ _{(n-1)}Y^{(l)} \\end{array} \\right] _{n{\\times}o_{l}} \n",
 69 |     "= \\left[ \\begin{array}{c} \n",
 70 |     "{_{(0)}}y_{[0]}^{(l)} & \\cdots & {_{(0)}}y_{[o_{l}-1]}^{(l)} \\\\ \n",
 71 |     "{_{(1)}}y_{[0]}^{(l)} & \\cdots & {_{(1)}}y_{[o_{l}-1]}^{(l)} \\\\ \n",
 72 |     "\\vdots & \\ddots & \\vdots \\\\ \n",
 73 |     "_{(n-1)}y_{[0]}^{(l)} & \\cdots & _{(n-1)}y_{[o_{l}-1]}^{(l)} \n",
 74 |     "\\end{array} \\right] _{n{\\times}o_{l}}$$\n",
 75 |     "\n",
 76 |     "For example, for an AND gate, the output matrix is $Y = \\left[ \\begin{array}{c} 0 \\\\ 0 \\\\ 0 \\\\ 1 \\end{array} \\right] _{4{\\times}1}$"
 77 |    ]
 78 |   },
 79 |   {
 80 |    "cell_type": "markdown",
 81 |    "metadata": {},
 82 |    "source": [
 83 |     "## Matrix representations - input to a layer\n",
 84 |     "\n",
 85 |     "Suppose at the $l^{th}$ layer, the input has $i_{l}$ dimensions.\n",
 86 |     "\n",
 87 |     "(The number of inputs to the layer) = (1 bias term) + (the number of outputs from the previous layer):\n",
 88 |     "$$i_{l} = 1 + o_{(l-1)}$$\n",
 89 |     "\n",
 90 |     "In the above example, the input to the first layer of 2 neurons has $i_{1} = d_{i}+1 = 3$, and the second layer of 1 neuron has $i_{2} = o_{1} + 1 = 3$."
 91 |    ]
 92 |   },
 93 |   {
 94 |    "cell_type": "markdown",
 95 |    "metadata": {},
 96 |    "source": [
 97 |     "If there are $n$ data points given, the input to the $l^{th}$ layer would be an $n{\\times}i_{l} = n{\\times}(o_{(l-1)}+1)$ matrix:\n",
 98 |     "\n",
 99 |     "$$X^{(l)}_{n{\\times}i_{l}} \n",
100 |     "= \\left[ \\begin{array}{c} \n",
101 |     "1 & _{(0)}Y^{(l-1)} \\\\ \n",
102 |     "1 & _{(1)}Y^{(l-1)} \\\\ \n",
103 |     "\\vdots & \\vdots \\\\  \n",
104 |     "1 & _{(n-1)}Y^{(l-1)} \n",
105 |     "\\end{array} \\right] _{n{\\times}i_{l}}\n",
106 |     "= \\left[ \\begin{array}{c} \n",
107 |     "1 & _{(0)}y^{(l-1)}_{[0]} & \\cdots & _{(0)}y^{(l-1)}_{[o_{l-1}-1]} \\\\ \n",
108 |     "1 & _{(1)}y^{(l-1)}_{[0]} & \\cdots & _{(1)}y^{(l-1)}_{[o_{l-1}-1]} \\\\ \n",
109 |     "\\vdots & \\vdots & \\ddots & \\vdots \\\\ \n",
110 |     "1 & _{(n-1)}y^{(l-1)}_{[0]} & \\cdots & _{(n-1)}y^{(l-1)}_{[o_{l-1}-1]} \n",
111 |     "\\end{array} \\right] _{n{\\times}i_{l}}$$\n",
112 |     "\n",
113 |     "For example, in the 3-neurons neural network above, input matrix to the first layer is $\\left[ \\begin{array}{c} 1 & x_0 & x_1 \\end{array} \\right] _{1{\\times}3}$, and the input matrix to the second layer is $\\left[ \\begin{array}{c} 1 & y_0 & y_1 \\end{array} \\right] _{1{\\times}3}$"
114 |    ]
115 |   },
116 |   {
117 |    "cell_type": "markdown",
118 |    "metadata": {},
119 |    "source": [
120 |     "## Matrix representations - weight matrix of one neuron\n",
121 |     "\n",
122 |     "For a neuron, the weight matrix multiplies a weight with each input in every dimension, and sums them. This can be represented by a dot product.\n",
123 |     "\n",
124 |     "Assuming the input to the $k^{th}$ neuron in the $l^{th}$ layer has $i_{l}$ dimensions,\n",
125 |     "\n",
126 |     "$$W^{(l)}_{[k]} {_{1{\\times}i_{l}}} = \\left[ \\begin{array}{c} w^{(l)}_{[k],0} & w^{(l)}_{[k],1} & \\cdots & w^{(l)}_{[k],i_{l}-1} \\end{array} \\right] _{1{\\times}i_{l}}$$\n",
127 |     "\n",
128 |     "(Remember $i_{l} = 1 + o_{(l-1)}$)"
129 |    ]
130 |   },
131 |   {
132 |    "cell_type": "markdown",
133 |    "metadata": {},
134 |    "source": [
135 |     "Then the output of that neuron for one data point is x < dot product \\> weights.\n",
136 |     "\n",
137 |     "$$y^{(l)}_{[k]} {_{1{\\times}1}} = Sigmoid( x^{(l)} {_{1{\\times}i_{l}}} \\; .* \\; W^{(l)}_{[k]}{^T}{_{i_{l}{\\times}1}} )$$\n",
138 |     "\n",
139 |     "$$\n",
140 |     "=\n",
141 |     "Sigmoid \\left(\n",
142 |     "x^{(l)}_{[k]}\n",
143 |     "\\left[ \\begin{array}{c} 1 & y^{(l-1)}_{0} & \\cdots & y^{(l-1)}_{(o_{l-1}-1)}\n",
144 |     "\\end{array} \\right] _{1{\\times}i_{l}}\n",
145 |     "\\;\\;\\; .* \\;\\;\\;\n",
146 |     "W^{(l)}_{[k]} {^{T}}\n",
147 |     "\\left[ \\begin{array}{c} w^{(l)}_{[k],0} \\\\ w^{(l)}_{[k],1} \\\\ \\vdots \\\\ w^{(l)}_{[k],i_{l}-1} \\end{array} \\right] _{i_{l}{\\times}1}\n",
148 |     "\\right)\n",
149 |     "$$\n",
150 |     "\n",
151 |     "$$\n",
152 |     "= Sigmoid(1*w^{(l)}_{[k],0} \\;\\;+\\;\\; y^{(l-1)}_{0}*w^{(l)}_{[k],1} \\;\\;+\\;\\; ... \\;\\;+\\;\\; y^{(l-1)}_{o_{l-1}-1}*w^{(l)}_{[k],i_{l}-1})\n",
153 |     "$$\n",
154 |     "\n",
155 |     "(We can see that the dot product of the $x$ and $W$ matrices does indeed give the output of the neuron)"
156 |    ]
157 |   },
158 |   {
159 |    "cell_type": "markdown",
160 |    "metadata": {},
161 |    "source": [
162 |     "For $n$ data points, the output of the $k^{th}$ neuron in the $l^{th}$ layer is:\n",
163 |     "$$Y^{(l)}_{[k]} {_{n{\\times}1}}\n",
164 |     "=\n",
165 |     "Sigmoid \\left(\n",
166 |     "X^{(l)}_{[k]}\n",
167 |     "\\left[ \\begin{array}{c} \n",
168 |     "1 & _{(0)}y^{(l-1)}_{0} & \\cdots & _{(0)}y^{(l-1)}_{(o_{l-1}-1)} \\\\\n",
169 |     "1 & _{(1)}y^{(l-1)}_{0} & \\cdots & _{(1)}y^{(l-1)}_{(o_{l-1}-1)} \\\\\n",
170 |     "\\vdots & \\vdots & \\ddots & \\vdots \\\\\n",
171 |     "1 & _{(n-1)}y^{(l-1)}_{0} & \\cdots & _{(n-1)}y^{(l-1)}_{(o_{l-1}-1)}\n",
172 |     "\\end{array} \\right] _{n{\\times}i_{l}}\n",
173 |     "\\; .* \\;\n",
174 |     "W^{(l)}_{[k]} {^{T}}\n",
175 |     "\\left[ \\begin{array}{c} w^{(l)}_{[k],0} \\\\ w^{(l)}_{[k],1} \\\\ \\vdots \\\\ w^{(l)}_{[k],i_{l}-1} \\end{array} \\right] _{i_{l}{\\times}1}\n",
176 |     "\\right)\n",
177 |     "$$\n",
178 |     "\n",
179 |     "$$\n",
180 |     "=\n",
181 |     "Sigmoid \\left(\n",
182 |     "\\left[ \\begin{array}{c} \n",
183 |     "1*w^{(l)}_{[k],0} \\;\\;+\\;\\; _{(0)}y^{(l-1)}_{(0)}*w^{(l)}_{[k],1} \\;\\;+\\;\\; ... \\;\\;+\\;\\; _{(0)}y^{(l-1)}_{o_{l-1}-1}*w^{(l)}_{[k],i_{l}-1} \\\\\n",
184 |     "1*w^{(l)}_{[k],0} \\;\\;+\\;\\; _{(1)}y^{(l-1)}_{0}*w^{(l)}_{[k],1} \\;\\;+\\;\\; ... \\;\\;+\\;\\; _{(1)}y^{(l-1)}_{o_{l-1}-1}*w^{(l)}_{[k],i_{l}-1} \\\\\n",
185 |     "\\vdots \\\\\n",
186 |     "1*w^{(l)}_{[k],0} \\;\\;+\\;\\; _{(n-1)}y^{(l-1)}_{0}*w^{(l)}_{[k],1} \\;\\;+\\;\\; ... \\;\\;+\\;\\; _{(n-1)}y^{(l-1)}_{o_{l-1}-1}*w^{(l)}_{[k],i_{l}-1} \\\\\n",
187 |     "\\end{array} \\right] _{n{\\times}1}\n",
188 |     "\\right)\n",
189 |     "$$"
190 |    ]
191 |   },
192 |   {
193 |    "cell_type": "markdown",
194 |    "metadata": {},
195 |    "source": [
196 |     "## Matrix representations - weight of a layer of neurons\n",
197 |     "\n",
198 |     "Suppose the $l^{th}$ layer in a neural network has $o_{l}$ neurons.\n",
199 |     "\n",
200 |     "Each neuron would produce one number as its output - the dot product of its weights, and the inputs.\n",
201 |     "\n",
202 |     "In matrix form, the weight matrix of the layer is:\n",
203 |     "\n",
204 |     "$$\n",
205 |     "W^{(l)}_{o_{l}{\\times}i_{l}} = \\left[ \\begin{array}{c} W^{(l)}_{[0]} \\\\ W^{(l)}_{[1]} \\\\ \\cdots \\\\ W^{(l)}_{[o_{l}-1]} \\end{array} \\right] _{o_{l}{\\times}i_{l}} \n",
206 |     "= \n",
207 |     "\\left[ \\begin{array}{c} \n",
208 |     "w^{(l)}_{[0],0} & w^{(l)}_{[0],1} & w^{(l)}_{[0],2} & \\cdots & w^{(l)}_{[0],i_{l}-1} \\\\ \n",
209 |     "w^{(l)}_{[1],0} & w^{(l)}_{[1],1} & w^{(l)}_{[1],2} & \\cdots & w^{(l)}_{[1],i_{l}-1} \\\\ \n",
210 |     "\\vdots & \\vdots & \\vdots & \\ddots & \\vdots \\\\ \n",
211 |     "w^{(l)}_{[o_{l}-1],0} & w^{(l)}_{[o_{l}-1],1} & w^{(l)}_{[o_{l}-1],2} & \\cdots & w^{(l)}_{[o_{l}-1],i_{l}-1} \n",
212 |     "\\end{array} \\right] _{o_{l}{\\times}i_{l}}\n",
213 |     "$$"
214 |    ]
215 |   },
216 |   {
217 |    "cell_type": "markdown",
218 |    "metadata": {},
219 |    "source": [
220 |     "The output of this layer of neurons is:\n",
221 |     "\n",
222 |     "$$ Y^{(l)}_{n{\\times}o_{l}} = Sigmoid\\;(\\;X^{(l)}_{n{\\times}i_{l}} \\; .* \\; W^{(l)}{^{T}}_{i_{l}{\\times}o_{l}} \\;)\\; $$\n",
223 |     "\n",
224 |     "$$\n",
225 |     "Y^{(l)}_{n{\\times}o_{l}} \\left[ \\begin{array}{c} \n",
226 |     "{_{(0)}}y_{0}^{(l)} & \\cdots & {_{(0)}}y_{o_{l}-1}^{(l)} \\\\ \n",
227 |     "{_{(1)}}y_{0}^{(l)} & \\cdots & {_{(1)}}y_{o_{l}-1}^{(l)} \\\\ \n",
228 |     "\\vdots & \\ddots & \\vdots \\\\ \n",
229 |     "_{(n-1)}y_{0}^{(l)} & \\cdots & _{(n-1)}y_{o_{l}-1}^{(l)} \n",
230 |     "\\end{array} \\right] _{n{\\times}o_{l}}\n",
231 |     "=\n",
232 |     "Sigmoid \\left(\n",
233 |     "X^{(l)}_{n{\\times}i_{l}} \\left[ \\begin{array}{c} \n",
234 |     "1 & _{(0)}y^{(l-1)}_{0} & \\cdots & _{(0)}y^{(l-1)}_{(o_{l-1}-1)} \\\\ \n",
235 |     "1 & _{(1)}y^{(l-1)}_{0} & \\cdots & _{(1)}y^{(l-1)}_{(o_{l-1}-1)} \\\\ \n",
236 |     "\\vdots & \\vdots & \\ddots & \\vdots \\\\ \n",
237 |     "1 & _{(n-1)}y^{(l-1)}_{0} & \\cdots & _{(n-1)}y^{(l-1)}_{(o_{l-1}-1)} \n",
238 |     "\\end{array} \\right] _{n{\\times}i_{l}}\n",
239 |     "\\; .* \\;\n",
240 |     "W^{(l)}{^{T}}_{i_{l}{\\times}o_{l}} \\left[ \\begin{array}{c} \n",
241 |     "w^{(l)}_{[0],0} & w^{(l)}_{[1],1} & \\cdots & w^{(l)}_{[o_{l}-1],0} \\\\ \n",
242 |     "w^{(l)}_{[0],1} & w^{(l)}_{[1],1} & \\cdots & w^{(l)}_{[o_{l}-1],1} \\\\ \n",
243 |     "\\vdots & \\vdots & \\ddots & \\vdots \\\\ \n",
244 |     "w^{(l)}_{[0],i_{l}-1} & w^{(l)}_{[1],1} & \\cdots & w^{(l)}_{[o_{l}-1],i_{l}-1} \n",
245 |     "\\end{array} \\right] _{i_{l}{\\times}o_{l}}\n",
246 |     "\\right)\n",
247 |     "$$\n",
248 |     "\n",
249 |     "$$\n",
250 |     "=\n",
251 |     "Sigmoid \\left(\n",
252 |     "\\left[ \\begin{array}{c} \n",
253 |     "1*w^{(l)}_{[0],0} + \\cdots + _{(0)}y^{(l-1)}_{(i_{l-1}-1)}*w^{(l)}_{[0],i_{l-1}-1}\n",
254 |     "&\n",
255 |     "\\cdots\n",
256 |     "&\n",
257 |     "1*w^{(l)}_{[(o_{l}-1)],0} + \\cdots + _{(0)}y^{(l-1)}_{(i_{l-1}-1)}*w^{(l)}_{[(o_{l}-1)],i_{l-1}-1}\n",
258 |     "\\\\\n",
259 |     "\\vdots & \\ddots & \\vdots\n",
260 |     "\\\\\n",
261 |     "1*w^{(l)}_{[0],0} + \\cdots + _{(n-1)}y^{(l-1)}_{(i_{l-1}-1)}*w^{(l)}_{[0],i_{l-1}-1}\n",
262 |     "&\n",
263 |     "\\cdots\n",
264 |     "&\n",
265 |     "1*w^{(l)}_{[(o_{l}-1)],0} + \\cdots + _{(n-1)}y^{(l-1)}_{(i_{l-1}-1)}*w^{(l)}_{[(o_{l}-1)],i_{l-1}-1}\n",
266 |     "\\end{array} \\right] _{n{\\times}o_{l}}\n",
267 |     "\\right)\n",
268 |     "$$"
269 |    ]
270 |   },
271 |   {
272 |    "cell_type": "markdown",
273 |    "metadata": {},
274 |    "source": [
275 |     "## Conclusion\n",
276 |     "\n",
277 |     "We have seen that the action of a layer of a neural network can be written as the following matrix operation:\n",
278 |     "\n",
279 |     "$$ Y^{(l)}_{n{\\times}o_{l}} = Sigmoid\\;(\\;X^{(l)}_{n{\\times}i_{l}} \\; .* \\; W^{(l)}{^{T}}_{i_{l}{\\times}o_{l}} \\;)\\; $$\n",
280 |     "\n",
281 |     "So, a neural network can be defined as the set of weights $W^{(l)}_{i_{l}{\\times}o_{l}}$ for all its layers, where $l$ is the index of the layer we are considering, $i_{l}$ and $o_{l}$ are its input and output dimensions.\n",
282 |     "\n",
283 |     "Also, because of adding a bias term at every layer,\n",
284 |     "\n",
285 |     "$$i_{l} = 1 + o_{(l-1)}$$\n",
286 |     "\n",
287 |     "The utility of neural networks can be exploited only once the weight matrices $W^{(l)}_{i_{l}{\\times}o_{l}}$ for all $l$ have ben set according to need."
288 |    ]
289 |   }
290 |  ],
291 |  "metadata": {
292 |   "kernelspec": {
293 |    "display_name": "Python 3",
294 |    "language": "python",
295 |    "name": "python3"
296 |   },
297 |   "language_info": {
298 |    "codemirror_mode": {
299 |     "name": "ipython",
300 |     "version": 3
301 |    },
302 |    "file_extension": ".py",
303 |    "mimetype": "text/x-python",
304 |    "name": "python",
305 |    "nbconvert_exporter": "python",
306 |    "pygments_lexer": "ipython3",
307 |    "version": "3.5.1"
308 |   }
309 |  },
310 |  "nbformat": 4,
311 |  "nbformat_minor": 2
312 | }
313 | 


--------------------------------------------------------------------------------
/3_Neural_Network_Tutorial_Writing_NN_ForwardProp_In_Python.ipynb:
--------------------------------------------------------------------------------
  1 | {
  2 |  "cells": [
  3 |   {
  4 |    "cell_type": "code",
  5 |    "execution_count": 19,
  6 |    "metadata": {
  7 |     "collapsed": true
  8 |    },
  9 |    "outputs": [],
 10 |    "source": [
 11 |     "# Pre-requisites\n",
 12 |     "import numpy as np"
 13 |    ]
 14 |   },
 15 |   {
 16 |    "cell_type": "markdown",
 17 |    "metadata": {},
 18 |    "source": [
 19 |     "# Writing a neural network in python\n",
 20 |     "\n",
 21 |     "Firstly, a neural network is defined by the number of layers, and the number of neurons in each layer.\n",
 22 |     "\n",
 23 |     "Let us use a list to denote this."
 24 |    ]
 25 |   },
 26 |   {
 27 |    "cell_type": "markdown",
 28 |    "metadata": {},
 29 |    "source": [
 30 |     "## Defining layer sizes"
 31 |    ]
 32 |   },
 33 |   {
 34 |    "cell_type": "code",
 35 |    "execution_count": 31,
 36 |    "metadata": {},
 37 |    "outputs": [
 38 |     {
 39 |      "name": "stdout",
 40 |      "output_type": "stream",
 41 |      "text": [
 42 |       "jwifhiwfn\n"
 43 |      ]
 44 |     }
 45 |    ],
 46 |    "source": [
 47 |     "# Defining the sizes of the layers in our neural network\n",
 48 |     "layers = [2, 2, 1]\n",
 49 |     "print(\"jwifhiwfn\")"
 50 |    ]
 51 |   },
 52 |   {
 53 |    "cell_type": "markdown",
 54 |    "metadata": {},
 55 |    "source": [
 56 |     "The above code denotes the 3-neuron neural network we saw previously: 2-dimensional input, 2 neurons in a hidden layer, 1 neuron in the output layer.\n",
 57 |     "\n",
 58 |     "Generally speaking, a neural network than has more than 1 hidden layer is a **deep** neural network."
 59 |    ]
 60 |   },
 61 |   {
 62 |    "cell_type": "markdown",
 63 |    "metadata": {},
 64 |    "source": [
 65 |     "## Defining weight matrices\n",
 66 |     "\n",
 67 |     "Using the sizes of the layers in our neural network, let us initialize the weight matrices to random values (sampled from a standard normal gaussian, because we know that we need both positive and negative weights)."
 68 |    ]
 69 |   },
 70 |   {
 71 |    "cell_type": "code",
 72 |    "execution_count": 3,
 73 |    "metadata": {
 74 |     "collapsed": true
 75 |    },
 76 |    "outputs": [],
 77 |    "source": [
 78 |     "# Initializing weight matrices from layer sizes\n",
 79 |     "def initializeWeights(layers):\n",
 80 |     "    weights = [np.random.randn(o, i+1) for i, o in zip(layers[:-1], layers[1:])]\n",
 81 |     "    return weights"
 82 |    ]
 83 |   },
 84 |   {
 85 |    "cell_type": "code",
 86 |    "execution_count": 37,
 87 |    "metadata": {},
 88 |    "outputs": [
 89 |     {
 90 |      "name": "stdout",
 91 |      "output_type": "stream",
 92 |      "text": [
 93 |       "1\n",
 94 |       "(2, 3)\n",
 95 |       "[[ 0.45147937  2.36764603 -0.44038386]\n",
 96 |       " [ 1.25899973 -1.06551598  0.20563357]]\n",
 97 |       "2\n",
 98 |       "(1, 3)\n",
 99 |       "[[-0.76261718 -0.90078965 -0.01774495]]\n"
100 |      ]
101 |     }
102 |    ],
103 |    "source": [
104 |     "# Displaying weight matrices\n",
105 |     "layers = [2, 2, 1]\n",
106 |     "weights = initializeWeights(layers)\n",
107 |     "\n",
108 |     "for i in range(len(weights)):\n",
109 |     "    print(i+1); print(weights[i].shape); print(weights[i])"
110 |    ]
111 |   },
112 |   {
113 |    "cell_type": "markdown",
114 |    "metadata": {},
115 |    "source": [
116 |     "# Forward Propagation\n",
117 |     "\n",
118 |     "The output of the neural network is calculated by **propagating forward** the outputs of each layer.\n",
119 |     "\n",
120 |     "Let us define our input as an np.array, since we want to represent matrices."
121 |    ]
122 |   },
123 |   {
124 |    "cell_type": "code",
125 |    "execution_count": 9,
126 |    "metadata": {
127 |     "collapsed": true
128 |    },
129 |    "outputs": [],
130 |    "source": [
131 |     "# We shall use np.array() to represent matrices\n",
132 |     "#X = np.array([23, 42, 56])\n",
133 |     "X = np.array([[0,0], [0,1], [1,0], [1,1]])"
134 |    ]
135 |   },
136 |   {
137 |    "cell_type": "markdown",
138 |    "metadata": {},
139 |    "source": [
140 |     "## Adding bias terms\n",
141 |     "\n",
142 |     "Since the input to every layer needs a bias term (1) added to it, let us define a function to do that."
143 |    ]
144 |   },
145 |   {
146 |    "cell_type": "code",
147 |    "execution_count": 29,
148 |    "metadata": {
149 |     "collapsed": true
150 |    },
151 |    "outputs": [],
152 |    "source": [
153 |     "# Add a bias term to every data point in the input\n",
154 |     "def addBiasTerms(X):\n",
155 |     "        # Make the input an np.array()\n",
156 |     "        X = np.array(X)\n",
157 |     "        \n",
158 |     "        # Forcing 1D vectors to be 2D matrices of 1xlength dimensions\n",
159 |     "        if X.ndim==1:\n",
160 |     "            X = np.reshape(X, (1, len(X)))\n",
161 |     "        \n",
162 |     "        # Inserting bias terms\n",
163 |     "        X = np.insert(X, 0, 1, axis=1)\n",
164 |     "        \n",
165 |     "        return X"
166 |    ]
167 |   },
168 |   {
169 |    "cell_type": "markdown",
170 |    "metadata": {},
171 |    "source": [
172 |     "Use the following cell to test the addBiasTerms function:"
173 |    ]
174 |   },
175 |   {
176 |    "cell_type": "code",
177 |    "execution_count": 30,
178 |    "metadata": {},
179 |    "outputs": [
180 |     {
181 |      "name": "stdout",
182 |      "output_type": "stream",
183 |      "text": [
184 |       "Before adding bias terms: \n",
185 |       "[[0 0]\n",
186 |       " [0 1]\n",
187 |       " [1 0]\n",
188 |       " [1 1]]\n",
189 |       "After adding bias terms: \n",
190 |       "[[1 0 0]\n",
191 |       " [1 0 1]\n",
192 |       " [1 1 0]\n",
193 |       " [1 1 1]]\n"
194 |      ]
195 |     }
196 |    ],
197 |    "source": [
198 |     "# TESTING addBiasTerms\n",
199 |     "\n",
200 |     "# We shall use np.array() to represent matrices\n",
201 |     "#X = np.array([23, 42, 56])\n",
202 |     "X = np.array([[0,0], [0,1], [1,0], [1,1]])\n",
203 |     "print(\"Before adding bias terms: \"); print(X)\n",
204 |     "X = addBiasTerms(X)\n",
205 |     "print(\"After adding bias terms: \"); print(X)"
206 |    ]
207 |   },
208 |   {
209 |    "cell_type": "markdown",
210 |    "metadata": {},
211 |    "source": [
212 |     "## Sigmoid function\n",
213 |     "\n",
214 |     "Let us also define a function to calculate the sigmoid of any np.array given to it:"
215 |    ]
216 |   },
217 |   {
218 |    "cell_type": "code",
219 |    "execution_count": 13,
220 |    "metadata": {
221 |     "collapsed": true
222 |    },
223 |    "outputs": [],
224 |    "source": [
225 |     "# Sigmoid function\n",
226 |     "def sigmoid(a):\n",
227 |     "    return 1/(1 + np.exp(-a))"
228 |    ]
229 |   },
230 |   {
231 |    "cell_type": "markdown",
232 |    "metadata": {},
233 |    "source": [
234 |     "## Forward propagation of inputs\n",
235 |     "\n",
236 |     "Let us store the outputs of the layers in a list called \"outputs\". We shall use that the output of one layer as the input to the next layer."
237 |    ]
238 |   },
239 |   {
240 |    "cell_type": "code",
241 |    "execution_count": 17,
242 |    "metadata": {
243 |     "collapsed": true
244 |    },
245 |    "outputs": [],
246 |    "source": [
247 |     "# Forward Propagation of outputs\n",
248 |     "def forwardProp(X, weights):\n",
249 |     "    # Initializing an empty list of outputs\n",
250 |     "    outputs = []\n",
251 |     "    \n",
252 |     "    # Assigning a name to reuse as inputs\n",
253 |     "    inputs = X\n",
254 |     "    \n",
255 |     "    # For each layer\n",
256 |     "    for w in weights:\n",
257 |     "        # Add bias term to input\n",
258 |     "        inputs = addBiasTerms(inputs)\n",
259 |     "        \n",
260 |     "        # Y = Sigmoid ( X .* W^T )\n",
261 |     "        outputs.append(sigmoid(np.dot(inputs, w.T)))\n",
262 |     "        \n",
263 |     "        # Input of next layer is output of this layer\n",
264 |     "        inputs = outputs[-1]\n",
265 |     "        \n",
266 |     "    return outputs"
267 |    ]
268 |   },
269 |   {
270 |    "cell_type": "markdown",
271 |    "metadata": {},
272 |    "source": [
273 |     "Use the following cell to test forward propagation:"
274 |    ]
275 |   },
276 |   {
277 |    "cell_type": "code",
278 |    "execution_count": 24,
279 |    "metadata": {},
280 |    "outputs": [
281 |     {
282 |      "name": "stdout",
283 |      "output_type": "stream",
284 |      "text": [
285 |       "weights:\n",
286 |       "1\n",
287 |       "(2, 3)\n",
288 |       "[[-250  350  350]\n",
289 |       " [-250  200  200]]\n",
290 |       "2\n",
291 |       "(1, 3)\n",
292 |       "[[-100  500 -500]]\n",
293 |       "X:\n",
294 |       "[[0, 0], [0, 1], [1, 0], [1, 1]]\n",
295 |       "outputs:\n",
296 |       "1\n",
297 |       "(4, 2)\n",
298 |       "[[  2.66919022e-109   2.66919022e-109]\n",
299 |       " [  1.00000000e+000   1.92874985e-022]\n",
300 |       " [  1.00000000e+000   1.92874985e-022]\n",
301 |       " [  1.00000000e+000   1.00000000e+000]]\n",
302 |       "2\n",
303 |       "(4, 1)\n",
304 |       "[[  3.72007598e-44]\n",
305 |       " [  1.00000000e+00]\n",
306 |       " [  1.00000000e+00]\n",
307 |       " [  3.72007598e-44]]\n"
308 |      ]
309 |     }
310 |    ],
311 |    "source": [
312 |     "# VIEWING FORWARD PROPAGATION\n",
313 |     "\n",
314 |     "# Initialize network\n",
315 |     "layers = [2, 2, 1]\n",
316 |     "#weights = initializeWeights(layers)\n",
317 |     "\n",
318 |     "# 3-neuron network\n",
319 |     "weights = []\n",
320 |     "weights.append(np.array([[-250, 350, 350], [-250, 200, 200]]))\n",
321 |     "weights.append(np.array([[-100, 500, -500]]))\n",
322 |     "\n",
323 |     "print(\"weights:\")\n",
324 |     "for i in range(len(weights)):\n",
325 |     "    print(i+1); print(weights[i].shape); print(weights[i])\n",
326 |     "\n",
327 |     "# Input\n",
328 |     "X = [[0,0], [0,1], [1,0], [1,1]]\n",
329 |     "\n",
330 |     "print(\"X:\"); print(X)\n",
331 |     "\n",
332 |     "# Forward propagate X, and save outputs\n",
333 |     "outputs = forwardProp(X, weights)\n",
334 |     "\n",
335 |     "print(\"outputs:\")\n",
336 |     "for o in range(len(outputs)):\n",
337 |     "    print(o+1); print(outputs[o].shape); print(outputs[o])"
338 |    ]
339 |   },
340 |   {
341 |    "cell_type": "code",
342 |    "execution_count": null,
343 |    "metadata": {
344 |     "collapsed": true
345 |    },
346 |    "outputs": [],
347 |    "source": []
348 |   }
349 |  ],
350 |  "metadata": {
351 |   "kernelspec": {
352 |    "display_name": "Python 3",
353 |    "language": "python",
354 |    "name": "python3"
355 |   },
356 |   "language_info": {
357 |    "codemirror_mode": {
358 |     "name": "ipython",
359 |     "version": 3
360 |    },
361 |    "file_extension": ".py",
362 |    "mimetype": "text/x-python",
363 |    "name": "python",
364 |    "nbconvert_exporter": "python",
365 |    "pygments_lexer": "ipython3",
366 |    "version": "3.5.1"
367 |   }
368 |  },
369 |  "nbformat": 4,
370 |  "nbformat_minor": 2
371 | }
372 | 


--------------------------------------------------------------------------------
/4_Neural_Network_Tutorial_Backpropagation.ipynb:
--------------------------------------------------------------------------------
   1 | {
   2 |  "cells": [
   3 |   {
   4 |    "cell_type": "code",
   5 |    "execution_count": 707,
   6 |    "metadata": {
   7 |     "collapsed": true
   8 |    },
   9 |    "outputs": [],
  10 |    "source": [
  11 |     "# Pre-requisites\n",
  12 |     "import numpy as np\n",
  13 |     "import time\n",
  14 |     "\n",
  15 |     "# To clear print buffer\n",
  16 |     "from IPython.display import clear_output"
  17 |    ]
  18 |   },
  19 |   {
  20 |    "cell_type": "markdown",
  21 |    "metadata": {},
  22 |    "source": [
  23 |     "# Importing code from previous tutorial:"
  24 |    ]
  25 |   },
  26 |   {
  27 |    "cell_type": "code",
  28 |    "execution_count": 30,
  29 |    "metadata": {
  30 |     "collapsed": true
  31 |    },
  32 |    "outputs": [],
  33 |    "source": [
  34 |     "# Initializing weight matrices from layer sizes\n",
  35 |     "def initializeWeights(layers):\n",
  36 |     "    weights = [np.random.randn(o, i+1) for i, o in zip(layers[:-1], layers[1:])]\n",
  37 |     "    return weights\n",
  38 |     "\n",
  39 |     "# Add a bias term to every data point in the input\n",
  40 |     "def addBiasTerms(X):\n",
  41 |     "        # Make the input an np.array()\n",
  42 |     "        X = np.array(X)\n",
  43 |     "        \n",
  44 |     "        # Forcing 1D vectors to be 2D matrices of 1xlength dimensions\n",
  45 |     "        if X.ndim==1:\n",
  46 |     "            X = np.reshape(X, (1, len(X)))\n",
  47 |     "        \n",
  48 |     "        # Inserting bias terms\n",
  49 |     "        X = np.insert(X, 0, 1, axis=1)\n",
  50 |     "        \n",
  51 |     "        return X\n",
  52 |     "\n",
  53 |     "# Sigmoid function\n",
  54 |     "def sigmoid(a):\n",
  55 |     "    return 1/(1 + np.exp(-a))\n",
  56 |     "\n",
  57 |     "# Forward Propagation of outputs\n",
  58 |     "def forwardProp(X, weights):\n",
  59 |     "    # Initializing an empty list of outputs\n",
  60 |     "    outputs = []\n",
  61 |     "    \n",
  62 |     "    # Assigning a name to reuse as inputs\n",
  63 |     "    inputs = X\n",
  64 |     "    \n",
  65 |     "    # For each layer\n",
  66 |     "    for w in weights:\n",
  67 |     "        # Add bias term to input\n",
  68 |     "        inputs = addBiasTerms(inputs)\n",
  69 |     "        \n",
  70 |     "        # Y = Sigmoid ( X .* W^T )\n",
  71 |     "        outputs.append(sigmoid(np.dot(inputs, w.T)))\n",
  72 |     "        \n",
  73 |     "        # Input of next layer is output of this layer\n",
  74 |     "        inputs = outputs[-1]\n",
  75 |     "        \n",
  76 |     "    return outputs"
  77 |    ]
  78 |   },
  79 |   {
  80 |    "cell_type": "markdown",
  81 |    "metadata": {},
  82 |    "source": [
  83 |     "# Training Neural Networks\n",
  84 |     "\n",
  85 |     "$$ Y^{(l)}_{n{\\times}o_{l}} = Sigmoid\\;(\\;X^{(l)}_{n{\\times}i_{l}} \\; .* \\; W^{(l)}{^{T}}_{i_{l}{\\times}o_{l}}) \\;\\;\\;\\;\\;\\;-------------(1)$$\n",
  86 |     "\n",
  87 |     "Neural networks are advantageous when we are able to compute that $W$ which satisfies $Y = Sigmoid(X\\cdot*W)$, for given $X$ and $Y$ (in supervised training).\n",
  88 |     "\n",
  89 |     "But, since there are so many weights (for bigger networks), it is time-intensive to algebraically solve the above equation. (Something like $W = X^{-1} \\;.*\\; Sigmoid^{-1}(Y)$...)"
  90 |    ]
  91 |   },
  92 |   {
  93 |    "cell_type": "markdown",
  94 |    "metadata": {},
  95 |    "source": [
  96 |     "## Set W to minimize cost (computationally intensive)\n",
  97 |     "\n",
  98 |     "A quicker way to compute W would be to randomly initialize it, and keep updating its value in such a way as to decrease the cost of the neural network.\n",
  99 |     "\n",
 100 |     "Define the cost as the mean squared error of the output of the neural network:\n",
 101 |     "\n",
 102 |     "$$error = yPred-Y$$\n",
 103 |     "\n",
 104 |     "Here, $yPred$ = ``forwardProp``$(X)$, and $Y$ is the desired output value from the neural network.\n",
 105 |     "\n",
 106 |     "$$Cost \\; J = \\frac{1}{2} \\sum \\limits_{n} \\frac{ {\\left( error \\right)}^2 }{n} = \\frac{1}{2} \\sum \\limits_{n} \\frac{ {\\left( yPred-Y \\right)}^2 }{n}$$\n",
 107 |     "\n",
 108 |     "Once we have initialized W, we need to change it such that J is minimized.\n",
 109 |     "\n",
 110 |     "The best way to minimize J w.r.t. W, is to partially derive J w.r.t. W and equate it to 0: $\\frac{{\\partial}J}{{\\partial}W} = 0$. But, this is computationally intensive."
 111 |    ]
 112 |   },
 113 |   {
 114 |    "cell_type": "code",
 115 |    "execution_count": 433,
 116 |    "metadata": {
 117 |     "collapsed": true
 118 |    },
 119 |    "outputs": [],
 120 |    "source": [
 121 |     "# Compute COST (J) of Neural Network\n",
 122 |     "def nnCost(weights, X, Y):\n",
 123 |     "    # Calculate yPred\n",
 124 |     "    yPred = forwardProp(X, weights)[-1]\n",
 125 |     "    \n",
 126 |     "    # Compute J\n",
 127 |     "    J = 0.5*np.sum((yPred-Y)**2)/len(Y)\n",
 128 |     "    \n",
 129 |     "    return J"
 130 |    ]
 131 |   },
 132 |   {
 133 |    "cell_type": "code",
 134 |    "execution_count": 434,
 135 |    "metadata": {
 136 |     "collapsed": true
 137 |    },
 138 |    "outputs": [],
 139 |    "source": [
 140 |     "# Initialize network\n",
 141 |     "layers = [2, 2, 1]\n",
 142 |     "weights = initializeWeights(layers)"
 143 |    ]
 144 |   },
 145 |   {
 146 |    "cell_type": "code",
 147 |    "execution_count": 435,
 148 |    "metadata": {
 149 |     "collapsed": true
 150 |    },
 151 |    "outputs": [],
 152 |    "source": [
 153 |     "# Declare input and desired output for AND gate\n",
 154 |     "X = np.array([[0,0], [0,1], [1,0], [1,1]])\n",
 155 |     "Y = np.array([[0], [0], [0], [1]])"
 156 |    ]
 157 |   },
 158 |   {
 159 |    "cell_type": "code",
 160 |    "execution_count": 436,
 161 |    "metadata": {},
 162 |    "outputs": [
 163 |     {
 164 |      "name": "stdout",
 165 |      "output_type": "stream",
 166 |      "text": [
 167 |       "0.284231765606\n"
 168 |      ]
 169 |     }
 170 |    ],
 171 |    "source": [
 172 |     "# Cost\n",
 173 |     "J = nnCost(weights, X, Y)\n",
 174 |     "print(J)"
 175 |    ]
 176 |   },
 177 |   {
 178 |    "cell_type": "markdown",
 179 |    "metadata": {},
 180 |    "source": [
 181 |     "## Randomly initialize W, change it to decrease cost (more feasible)\n",
 182 |     "\n",
 183 |     "Instead, we initialize $W$ by randomly sampling from a standard normal distribution, and then keep changing $W$ so as to decrease the cost $J$.\n",
 184 |     "\n",
 185 |     "But what value to change $W$ by? To find out, let us focus on the weights of one of the neurons in the last layer, $W^{(L)}_{[k]}$, differentiate $J$ by it to see what we get:\n",
 186 |     "\n",
 187 |     "$$\\frac{ {\\partial}J} {{\\partial}W^{(L)}_{[k]} }=\\frac{\\partial}{{\\partial}W^{(L)}_{[k]}}\\left(\\frac{1}{2}\\sum\\limits_{n}{\\frac{ {\\left( yPred-Y \\right)}^2 }{n} }\\right)=\\frac{1}{2*n}\\sum\\limits_{n} \\left( \\frac{\\partial} {{\\partial}W^{(L)}_{[k]}} (yPred-Y)^2 \\right)=\\frac{1}{n}\\sum\\limits_{n} \\left( (yPred-Y) * \\frac {{\\partial} \\; yPred} { {\\partial}W^{(L)}_{[k]} } \\right)$$\n",
 188 |     "\n",
 189 |     "$$\\Rightarrow \\frac{ {\\partial}J} {{\\partial}W^{(L)}_{[k]} } = \\frac{1}{n}\\sum\\limits_{n} \\left( (error) * \\frac {{\\partial} \\; yPred} { {\\partial}W^{(L)}_{[k]} }  \\right)$$"
 190 |    ]
 191 |   },
 192 |   {
 193 |    "cell_type": "markdown",
 194 |    "metadata": {},
 195 |    "source": [
 196 |     "The above equation tells us how $J$ changes by changing $W^{(L)}_{[k]}$. Approximating it for numerical analysis:\n",
 197 |     "\n",
 198 |     "$${\\Delta}J ={{\\Delta}W^{(L)}_{[k]}} * \\left[ \\frac{1}{n}\\sum\\limits_{n} \\left( (error) * \\frac {{\\partial} \\; yPred} { {\\partial}W^{(L)}_{[k]} } \\right) \\right] \\;\\;\\;\\;\\;\\;-------------(2)$$ "
 199 |    ]
 200 |   },
 201 |   {
 202 |    "cell_type": "markdown",
 203 |    "metadata": {},
 204 |    "source": [
 205 |     "## Change $W^{(L)}_{[k]}$ so that $J$ always decreases\n",
 206 |     "\n",
 207 |     "If we ensure that ${\\Delta}W^{(L)}_{[k]}$ is equal to $-\\left[ \\frac{1}{n}\\sum\\limits_{n} \\left( (error) * \\frac {{\\partial} \\; yPred} { {\\partial}W^{(L)}_{[k]} } \\right) \\right]$, we see that ${\\Delta}J = {\\Delta}W^{(L)}_{[k]}*\\left(-\\left[{\\Delta}W^{(L)}_{[k]}\\right]\\right) = -\\left[{\\Delta}W^{(L)}_{[k]}\\right]^{2} \\Rightarrow$ negative! \n",
 208 |     "\n",
 209 |     "Thus, we decide to change $W^{(L)}_{[k]}$ by that amount which ensures $J$ always decreases!\n",
 210 |     "\n",
 211 |     "$${\\Delta}W^{(L)}_{[k]} = -\\left[ \\frac{1}{n}\\sum\\limits_{n} \\left( (error) * \\frac {{\\partial} \\; yPred} { {\\partial}W^{(L)}_{[k]} } \\right) \\right] \\;\\;\\;\\;\\;\\;-------------(3)$$ \n",
 212 |     "\n",
 213 |     "So, for each weight in the last layer, that ${\\Delta}W^{(L)}_{[k]}$ which shall (for sure) decrease J can be computed. "
 214 |    ]
 215 |   },
 216 |   {
 217 |    "cell_type": "markdown",
 218 |    "metadata": {},
 219 |    "source": [
 220 |     "## Gradient Descent\n",
 221 |     "\n",
 222 |     "If we update each weight as $W^{(L)}_{[k]} \\leftarrow W^{(L)}_{[k]} + {\\Delta}W^{(L)}_{[k]}$, it is guaranteed that with the new weights, the neural network shall produce outputs that are closer to the desired output.\n",
 223 |     "\n",
 224 |     "This is how to train a neural network - randomly initialize $W$, iteratively change $W$ according to eq (3).\n",
 225 |     "\n",
 226 |     "**This is called Gradient Descent.**\n",
 227 |     "\n",
 228 |     "One way to think about this is - assuming the graph of $J$ vs. $W$ is like an upturned hill, we are slowly descending down the hill by changing $W$, to the point where $J$ is minimum.\n",
 229 |     "\n",
 230 |     "J is (sort of) a quadratic function on W, so we can assume it's (sort of) like an upturned hill."
 231 |    ]
 232 |   },
 233 |   {
 234 |    "cell_type": "markdown",
 235 |    "metadata": {},
 236 |    "source": [
 237 |     "# Computing ${\\Delta}W^{(L)}$ of last layer\n",
 238 |     "\n",
 239 |     "To compute ${\\Delta}W$, we need to compute $error$ and $\\frac{{\\partial}\\;yPred}{{\\partial}W^{(L)}}$"
 240 |    ]
 241 |   },
 242 |   {
 243 |    "cell_type": "markdown",
 244 |    "metadata": {},
 245 |    "source": [
 246 |     "## 1. Computing error\n",
 247 |     "\n",
 248 |     "$ error = yPred - Y = $ ``forwardProp``$(X) - Y \\;\\;\\;\\;\\;\\;-------------(4)$"
 249 |    ]
 250 |   },
 251 |   {
 252 |    "cell_type": "markdown",
 253 |    "metadata": {},
 254 |    "source": [
 255 |     "For example, suppose we want to compute those $W$'s in a 3-neuron network that are able to perform AND logic on two inputs.\n",
 256 |     "\n",
 257 |     "Here, for $X = \\left[\\begin{array}{c}(0,0)\\\\(0,1)\\\\(1,0)\\\\(1,1)\\end{array}\\right]$, $Y = \\left[\\begin{array}{c}0\\\\0\\\\0\\\\1\\end{array}\\right]$"
 258 |    ]
 259 |   },
 260 |   {
 261 |    "cell_type": "code",
 262 |    "execution_count": 686,
 263 |    "metadata": {},
 264 |    "outputs": [
 265 |     {
 266 |      "name": "stdout",
 267 |      "output_type": "stream",
 268 |      "text": [
 269 |       "weights:\n",
 270 |       "1\n",
 271 |       "(2, 3)\n",
 272 |       "[[-0.87271574  0.35621485  0.95252276]\n",
 273 |       " [-0.61981924 -1.49164222  0.55011796]]\n",
 274 |       "2\n",
 275 |       "(1, 3)\n",
 276 |       "[[-1.57656753 -1.10359895 -0.34594249]]\n"
 277 |      ]
 278 |     }
 279 |    ],
 280 |    "source": [
 281 |     "# Initialize network\n",
 282 |     "layers = [2, 2, 1]\n",
 283 |     "weights = initializeWeights(layers)\n",
 284 |     "\n",
 285 |     "print(\"weights:\")\n",
 286 |     "for i in range(len(weights)):\n",
 287 |     "    print(i+1); print(weights[i].shape); print(weights[i])"
 288 |    ]
 289 |   },
 290 |   {
 291 |    "cell_type": "markdown",
 292 |    "metadata": {},
 293 |    "source": [
 294 |     "Our weights have been randomly initialized. Let us see what yPred they give:"
 295 |    ]
 296 |   },
 297 |   {
 298 |    "cell_type": "code",
 299 |    "execution_count": 687,
 300 |    "metadata": {
 301 |     "collapsed": true
 302 |    },
 303 |    "outputs": [],
 304 |    "source": [
 305 |     "# Declare input and desired output for AND gate\n",
 306 |     "X = np.array([[0,0], [0,1], [1,0], [1,1]])\n",
 307 |     "Y = np.array([[0], [0], [0], [1]])"
 308 |    ]
 309 |   },
 310 |   {
 311 |    "cell_type": "code",
 312 |    "execution_count": 688,
 313 |    "metadata": {},
 314 |    "outputs": [
 315 |     {
 316 |      "name": "stdout",
 317 |      "output_type": "stream",
 318 |      "text": [
 319 |       "outputs\n",
 320 |       "[array([[ 0.29468953,  0.34982256],\n",
 321 |       "       [ 0.51994117,  0.48258173],\n",
 322 |       "       [ 0.37367081,  0.10798781],\n",
 323 |       "       [ 0.60731071,  0.17345395]]), array([[ 0.11682925],\n",
 324 |       "       [ 0.08969868],\n",
 325 |       "       [ 0.11646832],\n",
 326 |       "       [ 0.09056134]])]\n"
 327 |      ]
 328 |     }
 329 |    ],
 330 |    "source": [
 331 |     "# Calculate outputs at each layer by forward propagation\n",
 332 |     "outputs = forwardProp(X, weights)\n",
 333 |     "print(\"outputs\"); print(outputs)"
 334 |    ]
 335 |   },
 336 |   {
 337 |    "cell_type": "code",
 338 |    "execution_count": 689,
 339 |    "metadata": {},
 340 |    "outputs": [
 341 |     {
 342 |      "name": "stdout",
 343 |      "output_type": "stream",
 344 |      "text": [
 345 |       "(4, 1)\n",
 346 |       "[[ 0.11682925]\n",
 347 |       " [ 0.08969868]\n",
 348 |       " [ 0.11646832]\n",
 349 |       " [ 0.09056134]]\n"
 350 |      ]
 351 |     }
 352 |    ],
 353 |    "source": [
 354 |     "# Calculate yPred as the last output from forward propagation\n",
 355 |     "yPred = outputs[-1]\n",
 356 |     "print(yPred.shape); print(yPred)"
 357 |    ]
 358 |   },
 359 |   {
 360 |    "cell_type": "code",
 361 |    "execution_count": 690,
 362 |    "metadata": {},
 363 |    "outputs": [
 364 |     {
 365 |      "name": "stdout",
 366 |      "output_type": "stream",
 367 |      "text": [
 368 |       "(4, 1)\n",
 369 |       "[[ 0.11682925]\n",
 370 |       " [ 0.08969868]\n",
 371 |       " [ 0.11646832]\n",
 372 |       " [-0.90943866]]\n"
 373 |      ]
 374 |     }
 375 |    ],
 376 |    "source": [
 377 |     "# Error = yPred - Y\n",
 378 |     "error = yPred - Y\n",
 379 |     "print(error.shape); print(error)"
 380 |    ]
 381 |   },
 382 |   {
 383 |    "cell_type": "markdown",
 384 |    "metadata": {},
 385 |    "source": [
 386 |     "## 2. Computing $\\frac{{\\partial}\\;yPred}{{\\partial}W^{(L)}_{[k]}}$\n",
 387 |     "\n",
 388 |     "From eq. (1), $yPred$ can be written as:\n",
 389 |     "\n",
 390 |     "$$yPred = Sigmoid(X^{(L)}\\;.*\\;W^{(L)}{^{T}})$$\n",
 391 |     "\n",
 392 |     "So,\n",
 393 |     "\n",
 394 |     "$$\\frac{{\\partial}\\;yPred}{{\\partial}W^{(L)}_{[k]}} = \\frac{{\\partial}}{{\\partial}W^{(L)}_{[k]}}\\left(Sigmoid\\left(X^{(L)}.*W^{(L)}{^{T}}\\right)\\right) = Sigmoid^{'}\\left(X^{(L)}.*W^{(L)}{^{T}}\\right)*\\left(\\frac{{\\partial}}{{\\partial}W^{(L)}_{[k]}}\\left((X^{(L)}.*W^{(L)}{^{T}})\\right)\\right)$$\n",
 395 |     "\n",
 396 |     "Here, $yPred$ is an $o_{L}$-dimensional number, and $W^{(L)}_{[k]}$ only affect the $k$-th dimension of $yPred$, i.e. $yPred_{[k]}$. So,\n",
 397 |     "\n",
 398 |     "$$\\frac{{\\partial}\\;yPred_{[k]}}{{\\partial}W^{(L)}_{[k]}} = Sigmoid^{'}\\left(X^{(L)}.*W^{(L)}_{[k]}\\right)*\\left(\\frac{{\\partial}}{{\\partial}W^{(L)}_{[k]}}\\left((X^{(L)}.*W^{(L)}_{[k]})\\right)\\right)$$"
 399 |    ]
 400 |   },
 401 |   {
 402 |    "cell_type": "markdown",
 403 |    "metadata": {},
 404 |    "source": [
 405 |     "### - Computing $Sigmoid^{'}\\left(X^{(L)}.*W^{(L)}_{[k]}\\right)$\n",
 406 |     "\n",
 407 |     "It can be verified that $Sigmoid^{'}(a) = Sigmoid(a)*(1-Sigmoid(a))$. Thus, $Sigmoid^{'}(X^{(L)}.*W^{(L)}_{[k]}{^{T}}) = yPred_{[k]}*(1 - yPred_{[k]})$. So,\n",
 408 |     "\n",
 409 |     "$$\\frac{{\\partial}\\;yPred_{[k]}}{{\\partial}W^{(L)}_{[k]}} = \\left(yPred_{[k]}*(1 - yPred_{[k]})*\\left(\\frac{{\\partial}(X^{(L)}.*W^{(L)}_{[k]})}{{\\partial}W^{(L)}_{[k]}}\\right)\\right)$$\n",
 410 |     "\n",
 411 |     "$${\\Delta}W^{(L)}_{[k]} = -\\left[\\frac{1}{n}\\sum\\limits_{n}\\left(error_{[k]}*yPred_{[k]}*(1 - yPred_{[k]})*\\left(\\frac{{\\partial}(X^{(L)}.*W^{(L)}_{[k]})}{{\\partial}W^{(L)}_{[k]}}\\right)\\right)\\right]  \\;\\;\\;\\;\\;\\;-------------(5)$$"
 412 |    ]
 413 |   },
 414 |   {
 415 |    "cell_type": "markdown",
 416 |    "metadata": {},
 417 |    "source": [
 418 |     "### - Computing $\\frac{{\\partial}}{{\\partial}W^{(L)}_{[k]}}((X^{(L)}.*W^{(L)}_{[k]}))$\n",
 419 |     "\n",
 420 |     "It can be seen that $\\frac{{\\partial}}{{\\partial}W^{(L)}_{[k]}}((X^{(L)}.*W^{(L)}_{[k]})) = X^{(L)}$\n",
 421 |     "\n",
 422 |     "We also know that $X^{(L)} = \\left[ \\begin{array}{c} 1 & Y^{(L-1)} \\end{array} \\right]_{n{\\times}i_{L}}$, and $Y^{(L-1)}$ have been computed during Forward Propagation. So,\n",
 423 |     "\n",
 424 |     "$$\\frac{{\\partial}\\;yPred_{[k]}}{{\\partial}W^{(L)}} = (yPred_{[k]}*(1-yPred_{[k]}))*X^{(L)} $$\n",
 425 |     "\n",
 426 |     "$${\\Delta}W^{(L)}_{[k]} = -\\left[\\frac{1}{n}\\sum\\limits_{n}\\left(error_{[k]}*yPred_{[k]}*(1 - yPred_{[k]})*X^{(L)}\\right)\\right]\\;\\;\\;\\;\\;\\;-------------(6)$$"
 427 |    ]
 428 |   },
 429 |   {
 430 |    "cell_type": "markdown",
 431 |    "metadata": {},
 432 |    "source": [
 433 |     "## Combining terms to simplify computation\n",
 434 |     "\n",
 435 |     "Here, dimension of $error$, $yPred$, and $(1-yPred)$ is $n{\\times}o_{L}$, while that of $X^{(L)}$ is $n{\\times}i_{L}$. A little thought has to be given towards how those quantities are multiplied.\n",
 436 |     "\n",
 437 |     "First of all, we can combine the mentioned three into one and call it $\\delta$.\n",
 438 |     "\n",
 439 |     "$${\\delta}_{n{\\times}o_{L}} = error_{n{\\times}o_{L}}*yPred_{n{\\times}o_{L}}*(1-yPred)_{n{\\times}o_{L}} \\;\\;\\;\\;\\;\\;-----(7)$$\n",
 440 |     "\n",
 441 |     "$${\\Delta}W^{(L)}_{[k]} = -\\left[\\frac{1}{n}\\sum\\limits_{n}\\left({\\delta}_{[k]}*X^{(L)}\\right)\\right]$$\n",
 442 |     "\n",
 443 |     "We can now combine calculations of all dimensions into this matrix operation: (We will figure out the matrix dimensions below)\n",
 444 |     "\n",
 445 |     "$${\\Delta}W^{(L)} = -\\left[\\frac{1}{n}\\sum\\limits_{n}\\left({\\delta}*X^{(L)}\\right)\\right]$$\n"
 446 |    ]
 447 |   },
 448 |   {
 449 |    "cell_type": "markdown",
 450 |    "metadata": {},
 451 |    "source": [
 452 |     "One way of figuring out how $\\delta$ and $X^{(L)}$ are combined is to see that the dimension of ${\\Delta}W$ is $o_{L}{\\times}i_{L}$, dimension of $\\delta$ is $n{\\times}o_{L}$, and the dimension of $X^{(L)}$ is $n{\\times}i_{L}$.\n",
 453 |     "\n",
 454 |     "Clearly, the $\\sum\\limits_{n}\\left({\\delta}*X^{(L)}\\right)$ term, when considered for all the weights, is equal to $\\delta^{T}_{o_{L}{\\times}n}\\;.*\\;X^{(L)}_{n{\\times}i_{L}}$, the summation over $n$ being taken care of by the dot product, and the output dimension ${o_{L}{\\times}i_{L}}$ matches that of $W^{(L)}$.\n",
 455 |     "\n",
 456 |     "Hence, using matrix operations, ${\\Delta}W^{(L)}$ can be found as:\n",
 457 |     "\n",
 458 |     "$${\\Delta}W^{(L)}_{{o_{L}{\\times}i_{L}}} = -\\frac{1}{n}\\left({\\delta}^{T}{_{o_{L}{\\times}n}}\\;.*\\;X^{(L)}_{n{\\times}i_{L}}\\right) \\;\\;\\;\\;\\;\\;-------------(8)$$"
 459 |    ]
 460 |   },
 461 |   {
 462 |    "cell_type": "code",
 463 |    "execution_count": 691,
 464 |    "metadata": {},
 465 |    "outputs": [
 466 |     {
 467 |      "name": "stdout",
 468 |      "output_type": "stream",
 469 |      "text": [
 470 |       "(4, 1)\n",
 471 |       "[[ 0.01205446]\n",
 472 |       " [ 0.00732415]\n",
 473 |       " [ 0.01198499]\n",
 474 |       " [-0.07490136]]\n"
 475 |      ]
 476 |     }
 477 |    ],
 478 |    "source": [
 479 |     "# Calculate delta for the last layer\n",
 480 |     "delta = np.multiply(np.multiply(error, yPred), 1-yPred)\n",
 481 |     "print(delta.shape); print(delta)"
 482 |    ]
 483 |   },
 484 |   {
 485 |    "cell_type": "code",
 486 |    "execution_count": 692,
 487 |    "metadata": {},
 488 |    "outputs": [
 489 |     {
 490 |      "name": "stdout",
 491 |      "output_type": "stream",
 492 |      "text": [
 493 |       "(4, 3)\n",
 494 |       "[[ 1.          0.29468953  0.34982256]\n",
 495 |       " [ 1.          0.51994117  0.48258173]\n",
 496 |       " [ 1.          0.37367081  0.10798781]\n",
 497 |       " [ 1.          0.60731071  0.17345395]]\n"
 498 |      ]
 499 |     }
 500 |    ],
 501 |    "source": [
 502 |     "# Find input to the last layer\n",
 503 |     "xL = addBiasTerms(outputs[-2])\n",
 504 |     "print(xL.shape); print(xL)"
 505 |    ]
 506 |   },
 507 |   {
 508 |    "cell_type": "code",
 509 |    "execution_count": 693,
 510 |    "metadata": {},
 511 |    "outputs": [
 512 |     {
 513 |      "name": "stdout",
 514 |      "output_type": "stream",
 515 |      "text": [
 516 |       "(1, 3)\n",
 517 |       "[[ 0.01088444  0.00841238  0.00098657]]\n"
 518 |      ]
 519 |     }
 520 |    ],
 521 |    "source": [
 522 |     "# Find deltaW for last layer\n",
 523 |     "deltaW = -np.dot(delta.T, xL)/len(Y)\n",
 524 |     "print(deltaW.shape); print(deltaW)"
 525 |    ]
 526 |   },
 527 |   {
 528 |    "cell_type": "code",
 529 |    "execution_count": 694,
 530 |    "metadata": {},
 531 |    "outputs": [
 532 |     {
 533 |      "name": "stdout",
 534 |      "output_type": "stream",
 535 |      "text": [
 536 |       "old weights:\n",
 537 |       "1\n",
 538 |       "(2, 3)\n",
 539 |       "[[-0.87271574  0.35621485  0.95252276]\n",
 540 |       " [-0.61981924 -1.49164222  0.55011796]]\n",
 541 |       "2\n",
 542 |       "(1, 3)\n",
 543 |       "[[-1.57656753 -1.10359895 -0.34594249]]\n",
 544 |       "new weights:\n",
 545 |       "1\n",
 546 |       "(2, 3)\n",
 547 |       "[[-0.87271574  0.35621485  0.95252276]\n",
 548 |       " [-0.61981924 -1.49164222  0.55011796]]\n",
 549 |       "2\n",
 550 |       "(1, 3)\n",
 551 |       "[[-1.5656831  -1.09518657 -0.34495592]]\n",
 552 |       "old cost:\n",
 553 |       "0.107792308277\n",
 554 |       "new cost:\n",
 555 |       "0.107601673739\n"
 556 |      ]
 557 |     }
 558 |    ],
 559 |    "source": [
 560 |     "# Checking cost of neural network before and after change in W^{L}\n",
 561 |     "newWeights = [np.array(w) for w in weights]\n",
 562 |     "newWeights[-1] += deltaW\n",
 563 |     "\n",
 564 |     "print(\"old weights:\")\n",
 565 |     "for i in range(len(weights)):\n",
 566 |     "    print(i+1); print(weights[i].shape); print(weights[i])\n",
 567 |     "\n",
 568 |     "print(\"new weights:\")\n",
 569 |     "for i in range(len(newWeights)):\n",
 570 |     "    print(i+1); print(newWeights[i].shape); print(newWeights[i])\n",
 571 |     "\n",
 572 |     "print(\"old cost:\"); print(nnCost(weights, X, Y))\n",
 573 |     "print(\"new cost:\"); print(nnCost(newWeights, X, Y))"
 574 |    ]
 575 |   },
 576 |   {
 577 |    "cell_type": "markdown",
 578 |    "metadata": {},
 579 |    "source": [
 580 |     "### **Congratulations! You've just learned how to back propagate!**\n",
 581 |     "(1 layer only)"
 582 |    ]
 583 |   },
 584 |   {
 585 |    "cell_type": "markdown",
 586 |    "metadata": {},
 587 |    "source": [
 588 |     "# Back-propagation through layers\n",
 589 |     "\n",
 590 |     "For the last layer, according to eq. (5),\n",
 591 |     "$${\\Delta}W^{(L)}_{[k]} = -\\frac{1}{n}\\sum\\limits_{n}\\left(error_{[k]}*yPred_{[k]}*(1 - yPred_{[k]})*\\left(\\frac{{\\partial}(X^{(L)}.*W^{(L)}_{[k]})}{{\\partial}W^{(L)}_{[k]}}\\right)\\right) = -\\frac{1}{n}\\sum\\limits_{n}\\left(\\delta^{(L)}_{[k]}*\\frac{{\\partial}(X^{(L)}.*W^{(L)}_{[k]})}{{\\partial}W^{(L)}_{[k]}}\\right)$$"
 592 |    ]
 593 |   },
 594 |   {
 595 |    "cell_type": "markdown",
 596 |    "metadata": {},
 597 |    "source": [
 598 |     "### Computing for Layer L-1\n",
 599 |     "\n",
 600 |     "If we go back one more layer to find out ${\\Delta}W$ for the $p^{th}$ neuron in the $(L-1)^{th}$ layer, backpropagated from the $k^{th}$ neuron in the $L^{th}$ layer, noting that $X^{L} = Y^{L-1}$:\n",
 601 |     "\n",
 602 |     "$${\\Delta}W^{(L-1)}_{[p]from[k]} = -\\frac{1}{n}\\sum\\limits_{n}\\left(\\delta^{(L)}_{[k]}*\\frac{{\\partial}(X^{(L)}.*W^{(L)}_{[k]})}{{\\partial}W^{(L-1)}_{[p]}}\\right) = -\\frac{1}{n}\\sum\\limits_{n}\\left(\\delta^{(L)}_{[k]}*\\frac{{\\partial}(Y^{(L-1)}.*W^{(L)}_{[k]})}{{\\partial}W^{(L-1)}_{[p]}}\\right)$$"
 603 |    ]
 604 |   },
 605 |   {
 606 |    "cell_type": "markdown",
 607 |    "metadata": {},
 608 |    "source": [
 609 |     "Here, $Y^{(L-1)}$ is the collected output of the penultimate layer, i.e. the collected output of all neurons in the penultimate layer. $W^{(L-1)}_{[p]}$ is the weight matrix of the $p^{th}$ neuron in the penultimate layer. So,\n",
 610 |     "\n",
 611 |     "$$\\frac{{\\partial}(Y^{(L-1)}.*W^{(L)}_{[k]})}{{\\partial}W^{(L-1)}_{[p]}} = \\frac{{\\partial}((Sigmoid(X^{(L-1)}.*W^{(L-1)}_{[0]}) + Sigmoid(X^{(L-1)}.*W^{(L-1)}_{[1]}) + ... + Sigmoid(X^{(L-1)}.*W^{(L-1)}_{[p]}) + ... + Sigmoid(X^{(L-1)}.*W^{(L-1)}_{[i_{L-1}-1]})).*W^{(L)}_{[k]})}{{\\partial}W^{(L-1)}_{[p]}}$$\n",
 612 |     "\n",
 613 |     "We know that change in $W^{(L-1)}_{[p]}$ does not affect $W^{(L)}$ or any $W^{(L-1)}$ weight matrix other than $W^{(L-1)}_{[p]}$. So:\n",
 614 |     "\n",
 615 |     "$$\\frac{{\\partial}(Y^{(L-1)}.*W^{(L)}_{[k]})}{{\\partial}W^{(L-1)}_{[p]}} = \\frac{{\\partial}(Sigmoid(X^{(L-1)}.*W^{(L-1)}_{[p]}))}{{\\partial}W^{(L-1)}_{[p]}}$$\n",
 616 |     "\n",
 617 |     "$${\\Delta}W^{(L-1)}_{[p]from[k]} = -\\frac{1}{n}\\sum\\limits_{n}\\left(\\delta^{(L)}_{[k]}*W^{(L)}_{[k]}*\\frac{{\\partial}\\;(Sigmoid(X^{(L-1)}.*W^{(L-1)}_{[p]}))}{{\\partial}W^{(L-1)}_{[p]}}\\right)$$\n",
 618 |     "\n",
 619 |     "(Ignoring dimensions for now)"
 620 |    ]
 621 |   },
 622 |   {
 623 |    "cell_type": "markdown",
 624 |    "metadata": {},
 625 |    "source": [
 626 |     "We know how this goes now.\n",
 627 |     "\n",
 628 |     "$$\\frac{{\\partial}\\;(Sigmoid(X^{(L-1)}.*W^{(L-1)}_{[p]}))}{{\\partial}W^{(L-1)}_{[p]}} = Sigmoid^{'}(X^{(L-1)}.*W^{(L-1)}_{[p]})*\\frac{{\\partial}(X^{(L-1)}.*W^{(L-1)}_{[p]})}{{\\partial}W^{(L-1)}_{[p]}} = Y^{(L-1)}_{[p]}*(1 - Y^{(L-1)}_{[p]}))*X^{(L-1)}$$\n"
 629 |    ]
 630 |   },
 631 |   {
 632 |    "cell_type": "markdown",
 633 |    "metadata": {},
 634 |    "source": [
 635 |     "Thus,\n",
 636 |     "\n",
 637 |     "$${\\Delta}W^{(L-1)}_{[p]from[k]} = -\\left[\\frac{1}{n}\\sum\\limits_{n}(\\delta^{(L)}_{[k]}*W^{(L)}_{[k]}*(Y^{(L-1)}_{[p]}*(1 - Y^{(L-1)}_{[p]})*X^{(L-1)})\\right]$$\n",
 638 |     "\n",
 639 |     "We need to take care of the dimensions. Here, there are two parts: $\\delta^{(L)}_{[k]}*W^{(L)}_{[k]}$, which is only concerned with the $L^{th}$ layer, and $Y^{(L-1)}_{[p]}*(1 - Y^{(L-1)}_{[p]}))*X^{(L-1)}$, which is only concerned with the $(L-1)^{th}$ layer."
 640 |    ]
 641 |   },
 642 |   {
 643 |    "cell_type": "markdown",
 644 |    "metadata": {},
 645 |    "source": [
 646 |     "### 1) Back-propagation Error\n",
 647 |     "\n",
 648 |     "We can observe here that the terms $\\delta^{(L)}_{[k]}$ and $W^{(L)}_{[k]}$ are back-propagated from the $k^{th}$ neuron of the final layer. Let's combine them and call it the back-propagated error:\n",
 649 |     "$$bpError^{(L-1)}_{[k]}{_{n{\\times}i_{L}}} = \\delta^{(L)}_{[k]}*W^{(L)}_{[k]}$$\n",
 650 |     "\n",
 651 |     "We know that $\\delta^{(L)}{_{n{\\times}o_{L}}}*W^{(L)}{_{o_{L}{\\times}{i_{L}}}}$ is a matrix of dimensions $n{\\times}i_{L} = n{\\times}o_{(L-1)}$, which is the sum of the backprop errors from each neuron in the final layer. Thus,\n",
 652 |     "\n",
 653 |     "$$bpError^{(L-1)}{_{n{\\times}i_{L}}} = \\delta^{(L)}*W^{(L)} \\;\\;\\;\\;\\;\\;--------------(9)$$\n",
 654 |     "\n",
 655 |     "We see that for a neuron in the $(L-1)^{th}$ layer, the total error back-propagated to it is the sum of the back-propagated errors from each of the neurons connected to it in the $L^{th}$ layer."
 656 |    ]
 657 |   },
 658 |   {
 659 |    "cell_type": "markdown",
 660 |    "metadata": {},
 661 |    "source": [
 662 |     "Thus, instead of ${\\Delta}W^{(L-1)}_{[p]from[k]}$ from the $k^{th}$ neuron, we can directly consider ${\\Delta}W^{(L-1)}_{[p]}$:\n",
 663 |     "\n",
 664 |     "$${\\Delta}W^{(L-1)}_{[p]} = -\\left[\\frac{1}{n}\\sum\\limits_{n}(bpError^{(L-1)}_{[p]}*(Y^{(L-1)}_{[p]}*(1 - Y^{(L-1)}_{[p]})*X^{(L-1)})\\right]$$"
 665 |    ]
 666 |   },
 667 |   {
 668 |    "cell_type": "markdown",
 669 |    "metadata": {},
 670 |    "source": [
 671 |     "### 2) The $Y*(1-Y)*X$ term\n",
 672 |     "\n",
 673 |     "We can convert the $Y*(1-Y)*X$ term into a matrix operation, with summation over $n$ inherently taken care. Directly considering $Y$ instead of $Y_{[p]}$:\n",
 674 |     "\n",
 675 |     "$$Y^{(L-1)}*(1 - Y^{(L-1)}))*X^{(L-1)} == (Y^{(L-1)}.*(1 - Y^{(L-1)}))^{T}{_{o_{(L-1)}{\\times}n}} * X^{(L-1)}{_{n{\\times}i_{(L-1)}}}$$\n",
 676 |     "\n",
 677 |     "We can see that the resultant matrix has the same dimensions as $W^{(L-1)}$ : $o_{(L-1)}{\\times}i_{(L-1)}$."
 678 |    ]
 679 |   },
 680 |   {
 681 |    "cell_type": "markdown",
 682 |    "metadata": {},
 683 |    "source": [
 684 |     "### Combining the two\n",
 685 |     "\n",
 686 |     "To combine $bpError$ and the $Y*(1-Y)*X$ terms, for consistency in dimensions, we need to first dot multiply $bpError_{n{\\times}o_{(L-1)}}$ with $Y^{(L-1)}_{n{\\times}o_{(L-1)}}.*(1 - Y^{(L-1)})_{n{\\times}o_{(L-1)}}$, and then multiply the transpose of that with X.\n",
 687 |     "\n",
 688 |     "Thus,\n",
 689 |     "\n",
 690 |     "$${\\Delta}W^{(L-1)}_{o_{(L-1)}{\\times}i_{(L-1)}} = -\\left[\\frac{1}{n}((bpError^{(L-1)}.*Y^{(L-1)}.*(1 - Y^{(L-1)}))^{T} _{o_{(L-1)}{\\times}n}* X^{(L-1)}_{n{\\times}i_{(L-1)}}\\right] \\;\\;\\;\\;\\;\\;--------------(10)$$\n",
 691 |     "\n",
 692 |     "(Summation across $n$ is taken care of within the matrix multiplication)"
 693 |    ]
 694 |   },
 695 |   {
 696 |    "cell_type": "markdown",
 697 |    "metadata": {},
 698 |    "source": [
 699 |     "## Simplifying to matrix operation of any layer $l$\n",
 700 |     "\n",
 701 |     "Just as we had done for the final layer, from equation 9:\n",
 702 |     "\n",
 703 |     "<center>$bpError^{(l)}_{n{\\times}o_{l}} = \\delta^{(l+1)}_{n{\\times}o_{l+1}}*W^{(l+1)}_{o_{l+1}{\\times}o_{l}}$\n",
 704 |     "\n",
 705 |     "If we compare equation (10) with equation (6), we can generalize \"error\" there as Backpropagation Error, and the formula for ${\\delta}$ as:\n",
 706 |     "\n",
 707 |     "$${\\delta}^{(l)}_{n{\\times}o_{l}} = {bpError^{(l)}_{n{\\times}o_{l}}} .* {Y^{(l)}_{n{\\times}o_{l}}} .* (1-Y^{(l)})_{n{\\times}o_{l}}$$\n",
 708 |     "\n",
 709 |     "Thus,\n",
 710 |     "\n",
 711 |     "$${\\Delta}W^{(l)}_{{o_{l}{\\times}i_{l}}} = -\\frac{1}{n}\\left({\\delta^{(l)}}^{T}{_{o_{l}{\\times}n}}\\;.*\\;X^{(l)}_{n{\\times}i_{l}}\\right) \\;\\;\\;\\;\\;\\;-------------(11)$$\n"
 712 |    ]
 713 |   },
 714 |   {
 715 |    "cell_type": "code",
 716 |    "execution_count": 695,
 717 |    "metadata": {
 718 |     "collapsed": true
 719 |    },
 720 |    "outputs": [],
 721 |    "source": [
 722 |     "# IMPLEMENTING BACK-PROPAGATION\n",
 723 |     "def backProp(weights, X, Y):\n",
 724 |     "    # Forward propagate to find outputs\n",
 725 |     "    outputs = forwardProp(X, weights)\n",
 726 |     "    \n",
 727 |     "    # For the last layer, bpError = error = yPred - Y\n",
 728 |     "    bpError = outputs[-1] - Y\n",
 729 |     "    \n",
 730 |     "    # Back-propagating from the last layer to the first\n",
 731 |     "    for l, w in enumerate(reversed(weights)):\n",
 732 |     "        \n",
 733 |     "        # Find yPred for this layer\n",
 734 |     "        yPred = outputs[-l-1]\n",
 735 |     "        \n",
 736 |     "        # Calculate delta for this layer using bpError from next layer\n",
 737 |     "        delta = np.multiply(np.multiply(bpError, yPred), 1-yPred)\n",
 738 |     "        \n",
 739 |     "        # Find input to the layer, by adding bias to the output of the previous layer\n",
 740 |     "        # Take care, l goes from 0 to 1, while the weights are in reverse order\n",
 741 |     "        if l==len(weights)-1: # If 1st layer has been reached\n",
 742 |     "            xL = addBiasTerms(X)\n",
 743 |     "        else:\n",
 744 |     "            xL = addBiasTerms(outputs[-l-2])\n",
 745 |     "        \n",
 746 |     "        # Calculate deltaW for this layer\n",
 747 |     "        deltaW = -np.dot(delta.T, xL)/len(Y)\n",
 748 |     "        \n",
 749 |     "        # Calculate bpError for previous layer to be back-propagated\n",
 750 |     "        bpError = np.dot(delta, w)\n",
 751 |     "        \n",
 752 |     "        # Ignore bias term in bpError\n",
 753 |     "        bpError = bpError[:,1:]\n",
 754 |     "        \n",
 755 |     "        # Change weights of the current layer (W <- W + deltaW)\n",
 756 |     "        w += deltaW"
 757 |    ]
 758 |   },
 759 |   {
 760 |    "cell_type": "code",
 761 |    "execution_count": 698,
 762 |    "metadata": {},
 763 |    "outputs": [
 764 |     {
 765 |      "name": "stdout",
 766 |      "output_type": "stream",
 767 |      "text": [
 768 |       "old weights:\n",
 769 |       "1\n",
 770 |       "(2, 3)\n",
 771 |       "[[-0.87271574  0.35621485  0.95252276]\n",
 772 |       " [-0.61981924 -1.49164222  0.55011796]]\n",
 773 |       "2\n",
 774 |       "(1, 3)\n",
 775 |       "[[-1.57656753 -1.10359895 -0.34594249]]\n",
 776 |       "old cost:\n",
 777 |       "0.107792308277\n"
 778 |      ]
 779 |     }
 780 |    ],
 781 |    "source": [
 782 |     "# To check with the single back-propagation step done before,\n",
 783 |     "# back up the current weights\n",
 784 |     "oldWeights = [np.array(w) for w in weights]\n",
 785 |     "print(\"old weights:\")\n",
 786 |     "for i in range(len(oldWeights)):\n",
 787 |     "    print(i+1); print(oldWeights[i].shape); print(oldWeights[i])\n",
 788 |     "\n",
 789 |     "print(\"old cost:\"); print(nnCost(oldWeights, X, Y))"
 790 |    ]
 791 |   },
 792 |   {
 793 |    "cell_type": "markdown",
 794 |    "metadata": {},
 795 |    "source": [
 796 |     "Let us define a function to compute the accuracy of our model, irrespective of the number of neuron in the output layer."
 797 |    ]
 798 |   },
 799 |   {
 800 |    "cell_type": "code",
 801 |    "execution_count": 699,
 802 |    "metadata": {
 803 |     "collapsed": true
 804 |    },
 805 |    "outputs": [],
 806 |    "source": [
 807 |     "# Evaluate the accuracy of weights for input X and desired outptut Y\n",
 808 |     "def evaluate(weights, X, Y):\n",
 809 |     "    yPreds = forwardProp(X, weights)[-1]\n",
 810 |     "    # Check if maximum probability is from that neuron corresponding to desired class,\n",
 811 |     "    # AND check if that maximum probability is greater than 0.5\n",
 812 |     "    yes = sum( int( ( np.argmax(yPreds[i]) == np.argmax(Y[i]) ) and \n",
 813 |     "                    ( (yPreds[i][np.argmax(yPreds[i])]>0.5) == (Y[i][np.argmax(Y[i])]>0.5) ) )\n",
 814 |     "              for i in range(len(Y)) )\n",
 815 |     "    print(str(yes)+\" out of \"+str(len(Y))+\" : \"+str(float(yes/len(Y))))"
 816 |    ]
 817 |   },
 818 |   {
 819 |    "cell_type": "markdown",
 820 |    "metadata": {},
 821 |    "source": [
 822 |     "Check the results of back-propagation:"
 823 |    ]
 824 |   },
 825 |   {
 826 |    "cell_type": "code",
 827 |    "execution_count": 722,
 828 |    "metadata": {},
 829 |    "outputs": [
 830 |     {
 831 |      "name": "stdout",
 832 |      "output_type": "stream",
 833 |      "text": [
 834 |       "950\n",
 835 |       "new cost:\n",
 836 |       "0.0113971310862\n",
 837 |       "new accuracy: \n",
 838 |       "4 out of 4 : 1.0\n",
 839 |       "[[ 0.03022141]\n",
 840 |       " [ 0.13740936]\n",
 841 |       " [ 0.13683374]\n",
 842 |       " [ 0.7705247 ]]\n"
 843 |      ]
 844 |     }
 845 |    ],
 846 |    "source": [
 847 |     "# BACK-PROPAGATE, checking old & new weights and costs\n",
 848 |     "\n",
 849 |     "# Re-initialize to old weights\n",
 850 |     "weights = [np.array(w) for w in oldWeights]\n",
 851 |     "\n",
 852 |     "#print(\"old weights:\")\n",
 853 |     "#for i in range(len(weights)):\n",
 854 |     "#    print(i+1); print(weights[i].shape); print(weights[i])\n",
 855 |     "\n",
 856 |     "print(\"old cost: \"); print(nnCost(weights, X, Y))\n",
 857 |     "print(\"old accuracy: \"); print(evaluate(weights, X, Y))\n",
 858 |     "for i in range(1000):\n",
 859 |     "    # Back propagate\n",
 860 |     "    backProp(weights, X, Y)\n",
 861 |     "\n",
 862 |     "    #print(\"new weights:\")\n",
 863 |     "    #for i in range(len(weights)):\n",
 864 |     "    #    print(i+1); print(weights[i].shape); print(weights[i])\n",
 865 |     "    \n",
 866 |     "    if i%50==0:\n",
 867 |     "        time.sleep(1)\n",
 868 |     "        clear_output()\n",
 869 |     "        print(i)\n",
 870 |     "        print(\"new cost:\"); print(nnCost(weights, X, Y))\n",
 871 |     "        print(\"new accuracy: \"); evaluate(weights, X, Y)\n",
 872 |     "        print(forwardProp(X, weights)[-1])\n"
 873 |    ]
 874 |   },
 875 |   {
 876 |    "cell_type": "code",
 877 |    "execution_count": 718,
 878 |    "metadata": {
 879 |     "collapsed": true
 880 |    },
 881 |    "outputs": [],
 882 |    "source": [
 883 |     "# Revert back to original weights (if needed)\n",
 884 |     "weights = [np.array(w) for w in oldWeights]"
 885 |    ]
 886 |   },
 887 |   {
 888 |    "cell_type": "markdown",
 889 |    "metadata": {},
 890 |    "source": [
 891 |     "### Training\n",
 892 |     "\n",
 893 |     "Keep calling backProp() again and again until the cost decreases so much that we reach our desired accuracy.\n",
 894 |     "\n",
 895 |     "You can observe the cost of the function going down with iterations."
 896 |    ]
 897 |   },
 898 |   {
 899 |    "cell_type": "markdown",
 900 |    "metadata": {},
 901 |    "source": [
 902 |     "# Problems\n",
 903 |     "\n",
 904 |     "### - Not reaching desired accuracy fast enough\n",
 905 |     "\n",
 906 |     "It takes too many iterations of the backProp algorithm for the network to reach the desired output.\n",
 907 |     "\n",
 908 |     "One of the simplest ways of solving this problem is by adding a Learning Rate (described below) to the back-propagation algorithm.\n",
 909 |     "\n",
 910 |     "### - Taking too long to compute one iteration\n",
 911 |     "\n",
 912 |     "Within one iteration, the multiplication and summing operations take too long because there are too many data points feeded into the network.\n",
 913 |     "\n",
 914 |     "This problem is tackled using Stochastic Gradient Descent (talked about in the next tutorial). The above algorithm is running Batch Gradient Descent. "
 915 |    ]
 916 |   },
 917 |   {
 918 |    "cell_type": "markdown",
 919 |    "metadata": {},
 920 |    "source": [
 921 |     "# Learning Rate\n",
 922 |     "\n",
 923 |     "Usually, it is desired that we change the amount with which we back propagate, so that we can train our network to reach the desired accuracy faster. So we multiply ${\\Delta}W$ with a factor to control this.\n",
 924 |     "\n",
 925 |     "$$W \\leftarrow W + \\eta*{\\Delta}W$$"
 926 |    ]
 927 |   },
 928 |   {
 929 |    "cell_type": "markdown",
 930 |    "metadata": {},
 931 |    "source": [
 932 |     "If $\\eta$ is large, then we take bigger steps to the assumed minimum. If $\\eta$ is small, we take smaller steps.\n",
 933 |     "\n",
 934 |     "Remember that we are not actually travelling on the gradient, we are only approximating the direction using a ${\\Delta}W$ instead of a ${\\delta}W$. So we don't always point in the direction of the minimum, we could undershoot or overshoot."
 935 |    ]
 936 |   },
 937 |   {
 938 |    "cell_type": "markdown",
 939 |    "metadata": {},
 940 |    "source": [
 941 |     "If $\\eta$ is too small, we might take too long to get to the minimum.\n",
 942 |     "\n",
 943 |     "If $\\eta$ is too big, we might start climbing back up the hill and our cost would keep increasing instead of decreasing!"
 944 |    ]
 945 |   },
 946 |   {
 947 |    "cell_type": "markdown",
 948 |    "metadata": {},
 949 |    "source": [
 950 |     "One way to ensure that we get the best learning rate is to start at, say, 1,\n",
 951 |     "- increase $\\eta$ by 5% if the cost is decreasing\n",
 952 |     "- decrease $\\eta$ to 50% if the cost is increasing"
 953 |    ]
 954 |   },
 955 |   {
 956 |    "cell_type": "markdown",
 957 |    "metadata": {},
 958 |    "source": [
 959 |     "### Different ways to manipulate learning rate\n",
 960 |     "\n",
 961 |     "There are various methods available that leverage the variability of learning rate, to produce results that \"converge\" (reach a minimum) faster. The following list includes those with even more complicated methods of trying to converge faster:\n",
 962 |     "\n",
 963 |     "<center>![Optimizers](images/optimizers.gif)"
 964 |    ]
 965 |   },
 966 |   {
 967 |    "cell_type": "markdown",
 968 |    "metadata": {},
 969 |    "source": [
 970 |     "As can be seen, Stochastic Gradient Descent (SGD) itself performs slower than all the other methods, and the one that we are using (Batch Gradient Descent) is even slower."
 971 |    ]
 972 |   },
 973 |   {
 974 |    "cell_type": "markdown",
 975 |    "metadata": {},
 976 |    "source": [
 977 |     "Below is an implementation of backProp with provision for learning rate:"
 978 |    ]
 979 |   },
 980 |   {
 981 |    "cell_type": "code",
 982 |    "execution_count": 485,
 983 |    "metadata": {
 984 |     "collapsed": true
 985 |    },
 986 |    "outputs": [],
 987 |    "source": [
 988 |     "# IMPLEMENTING BACK-PROPAGATION WITH LEARNING RATE\n",
 989 |     "# Added eta, the learning rate, as an input\n",
 990 |     "def backProp(weights, X, Y, learningRate):\n",
 991 |     "    # Forward propagate to find outputs\n",
 992 |     "    outputs = forwardProp(X, weights)\n",
 993 |     "    \n",
 994 |     "    # For the last layer, bpError = error = yPred - Y\n",
 995 |     "    bpError = outputs[-1] - Y\n",
 996 |     "    \n",
 997 |     "    # Back-propagating from the last layer to the first\n",
 998 |     "    for l, w in enumerate(reversed(weights)):\n",
 999 |     "        \n",
1000 |     "        # Find yPred for this layer\n",
1001 |     "        yPred = outputs[-l-1]\n",
1002 |     "        \n",
1003 |     "        # Calculate delta for this layer using bpError from next layer\n",
1004 |     "        delta = np.multiply(np.multiply(bpError, yPred), 1-yPred)\n",
1005 |     "        \n",
1006 |     "        # Find input to the layer, by adding bias to the output of the previous layer\n",
1007 |     "        # Take care, l goes from 0 to 1, while the weights are in reverse order\n",
1008 |     "        if l==len(weights)-1: # If 1st layer has been reached\n",
1009 |     "            xL = addBiasTerms(X)\n",
1010 |     "        else:\n",
1011 |     "            xL = addBiasTerms(outputs[-l-2])\n",
1012 |     "        \n",
1013 |     "        # Calculate deltaW for this layer\n",
1014 |     "        deltaW = -np.dot(delta.T, xL)/len(Y)\n",
1015 |     "        \n",
1016 |     "        # Calculate bpError for previous layer to be back-propagated\n",
1017 |     "        bpError = np.dot(delta, w)\n",
1018 |     "        \n",
1019 |     "        # Ignore bias term in bpError\n",
1020 |     "        bpError = bpError[:,1:]\n",
1021 |     "        \n",
1022 |     "        # Change weights of the current layer (W <- W + eta*deltaW)\n",
1023 |     "        w += learningRate*deltaW"
1024 |    ]
1025 |   },
1026 |   {
1027 |    "cell_type": "markdown",
1028 |    "metadata": {},
1029 |    "source": [
1030 |     "Given this back-propagation code, it is better to launch another function that calls it iteratively until we reach the desired accuracy."
1031 |    ]
1032 |   },
1033 |   {
1034 |    "cell_type": "markdown",
1035 |    "metadata": {},
1036 |    "source": [
1037 |     "We shall look at training schemes and experiments in the next tutorial."
1038 |    ]
1039 |   }
1040 |  ],
1041 |  "metadata": {
1042 |   "kernelspec": {
1043 |    "display_name": "Python 3",
1044 |    "language": "python",
1045 |    "name": "python3"
1046 |   },
1047 |   "language_info": {
1048 |    "codemirror_mode": {
1049 |     "name": "ipython",
1050 |     "version": 3
1051 |    },
1052 |    "file_extension": ".py",
1053 |    "mimetype": "text/x-python",
1054 |    "name": "python",
1055 |    "nbconvert_exporter": "python",
1056 |    "pygments_lexer": "ipython3",
1057 |    "version": "3.5.2"
1058 |   }
1059 |  },
1060 |  "nbformat": 4,
1061 |  "nbformat_minor": 2
1062 | }
1063 | 


--------------------------------------------------------------------------------
/5_Neural_Network_Tutorial_Training_And_Testing.ipynb:
--------------------------------------------------------------------------------
   1 | {
   2 |  "cells": [
   3 |   {
   4 |    "cell_type": "code",
   5 |    "execution_count": 2,
   6 |    "metadata": {
   7 |     "collapsed": true
   8 |    },
   9 |    "outputs": [],
  10 |    "source": [
  11 |     "# Pre-requisites\n",
  12 |     "import numpy as np\n",
  13 |     "import time\n",
  14 |     "\n",
  15 |     "# For plots\n",
  16 |     "%matplotlib inline\n",
  17 |     "import matplotlib.pyplot as plt\n",
  18 |     "\n",
  19 |     "# To clear print buffer\n",
  20 |     "from IPython.display import clear_output"
  21 |    ]
  22 |   },
  23 |   {
  24 |    "cell_type": "markdown",
  25 |    "metadata": {},
  26 |    "source": [
  27 |     "# Importing functions from the previous tutorials:"
  28 |    ]
  29 |   },
  30 |   {
  31 |    "cell_type": "code",
  32 |    "execution_count": 3,
  33 |    "metadata": {},
  34 |    "outputs": [
  35 |     {
  36 |      "name": "stdout",
  37 |      "output_type": "stream",
  38 |      "text": [
  39 |       "weights:\n",
  40 |       "1\n",
  41 |       "(2, 3)\n",
  42 |       "[[-0.33589735 -0.396816    0.45849862]\n",
  43 |       " [-0.64374374 -2.41279823  0.78403628]]\n",
  44 |       "2\n",
  45 |       "(1, 3)\n",
  46 |       "[[ 1.54182154 -0.12516091 -0.28203429]]\n"
  47 |      ]
  48 |     }
  49 |    ],
  50 |    "source": [
  51 |     "# Initializing weight matrices from layer sizes\n",
  52 |     "def initializeWeights(layers):\n",
  53 |     "    weights = [np.random.randn(o, i+1) for i, o in zip(layers[:-1], layers[1:])]\n",
  54 |     "    return weights\n",
  55 |     "\n",
  56 |     "# Add a bias term to every data point in the input\n",
  57 |     "def addBiasTerms(X):\n",
  58 |     "        # Make the input an np.array()\n",
  59 |     "        X = np.array(X)\n",
  60 |     "        \n",
  61 |     "        # Forcing 1D vectors to be 2D matrices of 1xlength dimensions\n",
  62 |     "        if X.ndim==1:\n",
  63 |     "            X = np.reshape(X, (1, len(X)))\n",
  64 |     "        \n",
  65 |     "        # Inserting bias terms\n",
  66 |     "        X = np.insert(X, 0, 1, axis=1)\n",
  67 |     "        \n",
  68 |     "        return X\n",
  69 |     "\n",
  70 |     "# Sigmoid function\n",
  71 |     "def sigmoid(a):\n",
  72 |     "    return 1/(1 + np.exp(-a))\n",
  73 |     "\n",
  74 |     "# Forward Propagation of outputs\n",
  75 |     "def forwardProp(X, weights):\n",
  76 |     "    # Initializing an empty list of outputs\n",
  77 |     "    outputs = []\n",
  78 |     "    \n",
  79 |     "    # Assigning a name to reuse as inputs\n",
  80 |     "    inputs = X\n",
  81 |     "    \n",
  82 |     "    # For each layer\n",
  83 |     "    for w in weights:\n",
  84 |     "        # Add bias term to input\n",
  85 |     "        inputs = addBiasTerms(inputs)\n",
  86 |     "        \n",
  87 |     "        # Y = Sigmoid ( X .* W^T )\n",
  88 |     "        outputs.append(sigmoid(np.dot(inputs, w.T)))\n",
  89 |     "        \n",
  90 |     "        # Input of next layer is output of this layer\n",
  91 |     "        inputs = outputs[-1]\n",
  92 |     "        \n",
  93 |     "    return outputs\n",
  94 |     "\n",
  95 |     "# Compute COST (J) of Neural Network\n",
  96 |     "def nnCost(weights, X, Y):\n",
  97 |     "    # Calculate yPred\n",
  98 |     "    yPred = forwardProp(X, weights)[-1]\n",
  99 |     "    \n",
 100 |     "    # Compute J\n",
 101 |     "    J = 0.5*np.sum((yPred-Y)**2)/len(Y)\n",
 102 |     "    \n",
 103 |     "    return J\n",
 104 |     "\n",
 105 |     "# IMPLEMENTING BACK-PROPAGATION WITH LEARNING RATE\n",
 106 |     "# Added eta, the learning rate, as an input\n",
 107 |     "def backProp(weights, X, Y, learningRate):\n",
 108 |     "    # Forward propagate to find outputs\n",
 109 |     "    outputs = forwardProp(X, weights)\n",
 110 |     "    \n",
 111 |     "    # For the last layer, bpError = error = yPred - Y\n",
 112 |     "    bpError = outputs[-1] - Y\n",
 113 |     "    \n",
 114 |     "    # Back-propagating from the last layer to the first\n",
 115 |     "    for l, w in enumerate(reversed(weights)):\n",
 116 |     "        \n",
 117 |     "        # Find yPred for this layer\n",
 118 |     "        yPred = outputs[-l-1]\n",
 119 |     "        \n",
 120 |     "        # Calculate delta for this layer using bpError from next layer\n",
 121 |     "        delta = np.multiply(np.multiply(bpError, yPred), 1-yPred)\n",
 122 |     "        \n",
 123 |     "        # Find input to the layer, by adding bias to the output of the previous layer\n",
 124 |     "        # Take care, l goes from 0 to 1, while the weights are in reverse order\n",
 125 |     "        if l==len(weights)-1: # If 1st layer has been reached\n",
 126 |     "            xL = addBiasTerms(X)\n",
 127 |     "        else:\n",
 128 |     "            xL = addBiasTerms(outputs[-l-2])\n",
 129 |     "        \n",
 130 |     "        # Calculate deltaW for this layer\n",
 131 |     "        deltaW = -np.dot(delta.T, xL)/len(Y)\n",
 132 |     "        \n",
 133 |     "        # Calculate bpError for previous layer to be back-propagated\n",
 134 |     "        bpError = np.dot(delta, w)\n",
 135 |     "        \n",
 136 |     "        # Ignore bias term in bpError\n",
 137 |     "        bpError = bpError[:,1:]\n",
 138 |     "        \n",
 139 |     "        # Change weights of the current layer (W <- W + eta*deltaW)\n",
 140 |     "        w += learningRate*deltaW\n",
 141 |     "\n",
 142 |     "# Evaluate the accuracy of weights for input X and desired outptut Y\n",
 143 |     "def evaluate(weights, X, Y):\n",
 144 |     "    yPreds = forwardProp(X, weights)[-1]\n",
 145 |     "    # Check if maximum probability is from that neuron corresponding to desired class,\n",
 146 |     "    # AND check if that maximum probability is greater than 0.5\n",
 147 |     "    yes = sum( int( ( np.argmax(yPreds[i]) == np.argmax(Y[i]) ) and \n",
 148 |     "                    ( (yPreds[i][np.argmax(yPreds[i])]>0.5) == (Y[i][np.argmax(Y[i])]>0.5) ) )\n",
 149 |     "              for i in range(len(Y)) )\n",
 150 |     "    print(str(yes)+\" out of \"+str(len(Y))+\" : \"+str(float(yes/len(Y))))\n",
 151 |     "\n",
 152 |     "# Initialize network\n",
 153 |     "layers = [2, 2, 1]\n",
 154 |     "weights = initializeWeights(layers)\n",
 155 |     "\n",
 156 |     "print(\"weights:\")\n",
 157 |     "for i in range(len(weights)):\n",
 158 |     "    print(i+1); print(weights[i].shape); print(weights[i])\n",
 159 |     "\n",
 160 |     "# Declare input and desired output for AND gate\n",
 161 |     "X = np.array([[0,0], [0,1], [1,0], [1,1]])\n",
 162 |     "Y = np.array([[0], [0], [0], [1]])"
 163 |    ]
 164 |   },
 165 |   {
 166 |    "cell_type": "markdown",
 167 |    "metadata": {},
 168 |    "source": [
 169 |     "# Batch Gradient Descent\n",
 170 |     "\n",
 171 |     "Batch Gradient Descent is how we have tried to train our network so far - give it ALL the data points, compute ${\\Delta}W$s by summing up quantities across ALL the data points, change all the weights once, Repeat."
 172 |    ]
 173 |   },
 174 |   {
 175 |    "cell_type": "markdown",
 176 |    "metadata": {},
 177 |    "source": [
 178 |     "Suppose we want to train our 3-neuron network to implement Logical XOR.\n",
 179 |     "\n",
 180 |     "Inputs are: $X=\\left[\\begin{array}{c}(0,0)\\\\(0,1)\\\\(1,0)\\\\(1,1)\\end{array}\\right]$, and the desired output is $Y=\\left[\\begin{array}{c}0\\\\1\\\\1\\\\0\\end{array}\\right]$."
 181 |    ]
 182 |   },
 183 |   {
 184 |    "cell_type": "markdown",
 185 |    "metadata": {},
 186 |    "source": [
 187 |     "We know that in order to train the network, we need to call backProp repeatedly. Let us use a function to do that."
 188 |    ]
 189 |   },
 190 |   {
 191 |    "cell_type": "code",
 192 |    "execution_count": 4,
 193 |    "metadata": {
 194 |     "collapsed": true
 195 |    },
 196 |    "outputs": [],
 197 |    "source": [
 198 |     "# TRAINING FUNCTION, USING GD\n",
 199 |     "def train(weights, X, Y, nIterations, learningRate=1):\n",
 200 |     "    for i in range(nIterations):\n",
 201 |     "        # Run backprop\n",
 202 |     "        backProp(weights, X, Y, learningRate)\n",
 203 |     "        \n",
 204 |     "        # Clears screen output\n",
 205 |     "        if (i+1)%(nIterations/10)==0:\n",
 206 |     "            clear_output()\n",
 207 |     "            print(\"Iteration \"+str(i+1)+\" of \"+str(nIterations))\n",
 208 |     "            # Prints Cost and Accuracy\n",
 209 |     "            print(\"Cost: \"+str(nnCost(weights, X, Y)))\n",
 210 |     "            print(\"Accuracy:\")\n",
 211 |     "            evaluate(weights, X, Y)"
 212 |    ]
 213 |   },
 214 |   {
 215 |    "cell_type": "code",
 216 |    "execution_count": 5,
 217 |    "metadata": {},
 218 |    "outputs": [
 219 |     {
 220 |      "name": "stdout",
 221 |      "output_type": "stream",
 222 |      "text": [
 223 |       "weights:\n",
 224 |       "1\n",
 225 |       "(2, 3)\n",
 226 |       "[[ 0.04837515  0.26989845 -0.24049688]\n",
 227 |       " [ 0.40457749 -1.12764482  1.62391936]]\n",
 228 |       "2\n",
 229 |       "(1, 3)\n",
 230 |       "[[-0.21690785 -0.77508326  0.61363791]]\n"
 231 |      ]
 232 |     }
 233 |    ],
 234 |    "source": [
 235 |     "# Initialize network\n",
 236 |     "layers = [2, 2, 1]\n",
 237 |     "weights = initializeWeights(layers)\n",
 238 |     "\n",
 239 |     "print(\"weights:\")\n",
 240 |     "for i in range(len(weights)):\n",
 241 |     "    print(i+1); print(weights[i].shape); print(weights[i])\n",
 242 |     "\n",
 243 |     "# Take backup of weights to be used later for comparison\n",
 244 |     "initialWeights = [np.array(w) for w in weights]"
 245 |    ]
 246 |   },
 247 |   {
 248 |    "cell_type": "code",
 249 |    "execution_count": 6,
 250 |    "metadata": {
 251 |     "collapsed": true
 252 |    },
 253 |    "outputs": [],
 254 |    "source": [
 255 |     "# Declare input and desired output for XOR gate\n",
 256 |     "X = np.array([[0,0], [0,1], [1,0], [1,1]])\n",
 257 |     "Y = np.array([[0], [1], [1], [0]])"
 258 |    ]
 259 |   },
 260 |   {
 261 |    "cell_type": "code",
 262 |    "execution_count": 7,
 263 |    "metadata": {},
 264 |    "outputs": [
 265 |     {
 266 |      "name": "stdout",
 267 |      "output_type": "stream",
 268 |      "text": [
 269 |       "Cost: 0.12907524705\n",
 270 |       "Accuracy: \n",
 271 |       "2 out of 4 : 0.5\n",
 272 |       "[[ 0.43886508]\n",
 273 |       " [ 0.49374299]\n",
 274 |       " [ 0.38577198]\n",
 275 |       " [ 0.4543426 ]]\n"
 276 |      ]
 277 |     }
 278 |    ],
 279 |    "source": [
 280 |     "# Check current accuracy and cost\n",
 281 |     "print(\"Cost: \"+str(nnCost(weights, X, Y)))\n",
 282 |     "print(\"Accuracy: \")\n",
 283 |     "evaluate(weights, X, Y)\n",
 284 |     "print(forwardProp(X, weights)[-1])"
 285 |    ]
 286 |   },
 287 |   {
 288 |    "cell_type": "markdown",
 289 |    "metadata": {},
 290 |    "source": [
 291 |     "Say we want to train our model 600 times."
 292 |    ]
 293 |   },
 294 |   {
 295 |    "cell_type": "code",
 296 |    "execution_count": 8,
 297 |    "metadata": {},
 298 |    "outputs": [
 299 |     {
 300 |      "name": "stdout",
 301 |      "output_type": "stream",
 302 |      "text": [
 303 |       "Iteration 400 of 400\n",
 304 |       "Cost: 0.124997811474\n",
 305 |       "Accuracy:\n",
 306 |       "3 out of 4 : 0.75\n",
 307 |       "[[ 0.49895486]\n",
 308 |       " [ 0.50338071]\n",
 309 |       " [ 0.49407386]\n",
 310 |       " [ 0.4984321 ]]\n"
 311 |      ]
 312 |     }
 313 |    ],
 314 |    "source": [
 315 |     "nIterations = 400\n",
 316 |     "train(weights, X, Y, nIterations)\n",
 317 |     "print(forwardProp(X, weights)[-1])"
 318 |    ]
 319 |   },
 320 |   {
 321 |    "cell_type": "code",
 322 |    "execution_count": 9,
 323 |    "metadata": {
 324 |     "collapsed": true
 325 |    },
 326 |    "outputs": [],
 327 |    "source": [
 328 |     "# In case we want to revert the weight back\n",
 329 |     "weights = [np.array(w) for w in initialWeights]"
 330 |    ]
 331 |   },
 332 |   {
 333 |    "cell_type": "markdown",
 334 |    "metadata": {},
 335 |    "source": [
 336 |     "It took our function a long time to train.\n",
 337 |     "\n",
 338 |     "What if we speed up using adaptive learning rate?"
 339 |    ]
 340 |   },
 341 |   {
 342 |    "cell_type": "code",
 343 |    "execution_count": 10,
 344 |    "metadata": {
 345 |     "collapsed": true
 346 |    },
 347 |    "outputs": [],
 348 |    "source": [
 349 |     "# TRAINING FUNCTION, USING GD\n",
 350 |     "# Default learning rate = 1.0\n",
 351 |     "def trainUsingGD(weights, X, Y, nIterations, learningRate=1.0):\n",
 352 |     "    # Setting initial cost to infinity\n",
 353 |     "    prevCost = np.inf\n",
 354 |     "    \n",
 355 |     "    # For nIterations number of iterations:\n",
 356 |     "    for i in range(nIterations):\n",
 357 |     "        # Run backprop\n",
 358 |     "        backProp(weights, X, Y, learningRate)\n",
 359 |     "        \n",
 360 |     "        #clear_output()\n",
 361 |     "        print(\"Iteration \"+str(i+1)+\" of \"+str(nIterations))\n",
 362 |     "        cost = nnCost(weights, X, Y)\n",
 363 |     "        print(\"Cost: \"+str(cost))\n",
 364 |     "        \n",
 365 |     "        # ADAPT LEARNING RATE\n",
 366 |     "        # If cost increases\n",
 367 |     "        if (cost > prevCost):\n",
 368 |     "            # Halve the learning rate\n",
 369 |     "            learningRate /= 2.0\n",
 370 |     "        # If cost decreases\n",
 371 |     "        else:\n",
 372 |     "            # Increase learning rate by 5%\n",
 373 |     "            learningRate *= 1.05\n",
 374 |     "        \n",
 375 |     "        prevCost = cost"
 376 |    ]
 377 |   },
 378 |   {
 379 |    "cell_type": "code",
 380 |    "execution_count": 11,
 381 |    "metadata": {
 382 |     "collapsed": true
 383 |    },
 384 |    "outputs": [],
 385 |    "source": [
 386 |     "# Revert weights back to initial values\n",
 387 |     "weights = [np.array(w) for w in initialWeights]"
 388 |    ]
 389 |   },
 390 |   {
 391 |    "cell_type": "code",
 392 |    "execution_count": 12,
 393 |    "metadata": {},
 394 |    "outputs": [
 395 |     {
 396 |      "name": "stdout",
 397 |      "output_type": "stream",
 398 |      "text": [
 399 |       "Iteration 1 of 100\n",
 400 |       "Cost: 0.128848112614\n",
 401 |       "Iteration 2 of 100\n",
 402 |       "Cost: 0.128650869728\n",
 403 |       "Iteration 3 of 100\n",
 404 |       "Cost: 0.12848026395\n",
 405 |       "Iteration 4 of 100\n",
 406 |       "Cost: 0.128332996448\n",
 407 |       "Iteration 5 of 100\n",
 408 |       "Cost: 0.128205816336\n",
 409 |       "Iteration 6 of 100\n",
 410 |       "Cost: 0.128095601033\n",
 411 |       "Iteration 7 of 100\n",
 412 |       "Cost: 0.127999422128\n",
 413 |       "Iteration 8 of 100\n",
 414 |       "Cost: 0.12791459536\n",
 415 |       "Iteration 9 of 100\n",
 416 |       "Cost: 0.127838714376\n",
 417 |       "Iteration 10 of 100\n",
 418 |       "Cost: 0.12776966891\n",
 419 |       "Iteration 11 of 100\n",
 420 |       "Cost: 0.127705648793\n",
 421 |       "Iteration 12 of 100\n",
 422 |       "Cost: 0.127645135859\n",
 423 |       "Iteration 13 of 100\n",
 424 |       "Cost: 0.127586886153\n",
 425 |       "Iteration 14 of 100\n",
 426 |       "Cost: 0.127529905081\n",
 427 |       "Iteration 15 of 100\n",
 428 |       "Cost: 0.127473418103\n",
 429 |       "Iteration 16 of 100\n",
 430 |       "Cost: 0.127416839401\n",
 431 |       "Iteration 17 of 100\n",
 432 |       "Cost: 0.127359740627\n",
 433 |       "Iteration 18 of 100\n",
 434 |       "Cost: 0.127301821408\n",
 435 |       "Iteration 19 of 100\n",
 436 |       "Cost: 0.127242882839\n",
 437 |       "Iteration 20 of 100\n",
 438 |       "Cost: 0.127182804686\n",
 439 |       "Iteration 21 of 100\n",
 440 |       "Cost: 0.127121526616\n",
 441 |       "Iteration 22 of 100\n",
 442 |       "Cost: 0.127059033379\n",
 443 |       "Iteration 23 of 100\n",
 444 |       "Cost: 0.126995343612\n",
 445 |       "Iteration 24 of 100\n",
 446 |       "Cost: 0.126930501761\n",
 447 |       "Iteration 25 of 100\n",
 448 |       "Cost: 0.126864572508\n",
 449 |       "Iteration 26 of 100\n",
 450 |       "Cost: 0.126797637148\n",
 451 |       "Iteration 27 of 100\n",
 452 |       "Cost: 0.126729791334\n",
 453 |       "Iteration 28 of 100\n",
 454 |       "Cost: 0.126661143778\n",
 455 |       "Iteration 29 of 100\n",
 456 |       "Cost: 0.126591815524\n",
 457 |       "Iteration 30 of 100\n",
 458 |       "Cost: 0.126521939537\n",
 459 |       "Iteration 31 of 100\n",
 460 |       "Cost: 0.126451660431\n",
 461 |       "Iteration 32 of 100\n",
 462 |       "Cost: 0.126381134199\n",
 463 |       "Iteration 33 of 100\n",
 464 |       "Cost: 0.126310527861\n",
 465 |       "Iteration 34 of 100\n",
 466 |       "Cost: 0.126240018977\n",
 467 |       "Iteration 35 of 100\n",
 468 |       "Cost: 0.126169794983\n",
 469 |       "Iteration 36 of 100\n",
 470 |       "Cost: 0.126100052304\n",
 471 |       "Iteration 37 of 100\n",
 472 |       "Cost: 0.126030995231\n",
 473 |       "Iteration 38 of 100\n",
 474 |       "Cost: 0.125962834522\n",
 475 |       "Iteration 39 of 100\n",
 476 |       "Cost: 0.125895785707\n",
 477 |       "Iteration 40 of 100\n",
 478 |       "Cost: 0.12583006709\n",
 479 |       "Iteration 41 of 100\n",
 480 |       "Cost: 0.125765897414\n",
 481 |       "Iteration 42 of 100\n",
 482 |       "Cost: 0.125703493213\n",
 483 |       "Iteration 43 of 100\n",
 484 |       "Cost: 0.125643065835\n",
 485 |       "Iteration 44 of 100\n",
 486 |       "Cost: 0.12558481818\n",
 487 |       "Iteration 45 of 100\n",
 488 |       "Cost: 0.125528941182\n",
 489 |       "Iteration 46 of 100\n",
 490 |       "Cost: 0.125475610101\n",
 491 |       "Iteration 47 of 100\n",
 492 |       "Cost: 0.125424980703\n",
 493 |       "Iteration 48 of 100\n",
 494 |       "Cost: 0.125377185428\n",
 495 |       "Iteration 49 of 100\n",
 496 |       "Cost: 0.12533232967\n",
 497 |       "Iteration 50 of 100\n",
 498 |       "Cost: 0.125290488278\n",
 499 |       "Iteration 51 of 100\n",
 500 |       "Cost: 0.125251702426\n",
 501 |       "Iteration 52 of 100\n",
 502 |       "Cost: 0.125215976974\n",
 503 |       "Iteration 53 of 100\n",
 504 |       "Cost: 0.125183278403\n",
 505 |       "Iteration 54 of 100\n",
 506 |       "Cost: 0.125153533406\n",
 507 |       "Iteration 55 of 100\n",
 508 |       "Cost: 0.125126628143\n",
 509 |       "Iteration 56 of 100\n",
 510 |       "Cost: 0.125102408083\n",
 511 |       "Iteration 57 of 100\n",
 512 |       "Cost: 0.125080678309\n",
 513 |       "Iteration 58 of 100\n",
 514 |       "Cost: 0.125061203999\n",
 515 |       "Iteration 59 of 100\n",
 516 |       "Cost: 0.125043710737\n",
 517 |       "Iteration 60 of 100\n",
 518 |       "Cost: 0.125027884097\n",
 519 |       "Iteration 61 of 100\n",
 520 |       "Cost: 0.125013367839\n",
 521 |       "Iteration 62 of 100\n",
 522 |       "Cost: 0.124999759817\n",
 523 |       "Iteration 63 of 100\n",
 524 |       "Cost: 0.124986604465\n",
 525 |       "Iteration 64 of 100\n",
 526 |       "Cost: 0.12497338044\n",
 527 |       "Iteration 65 of 100\n",
 528 |       "Cost: 0.124959481574\n",
 529 |       "Iteration 66 of 100\n",
 530 |       "Cost: 0.124944188837\n",
 531 |       "Iteration 67 of 100\n",
 532 |       "Cost: 0.12492663033\n",
 533 |       "Iteration 68 of 100\n",
 534 |       "Cost: 0.124905725647\n",
 535 |       "Iteration 69 of 100\n",
 536 |       "Cost: 0.124880110131\n",
 537 |       "Iteration 70 of 100\n",
 538 |       "Cost: 0.124848033934\n",
 539 |       "Iteration 71 of 100\n",
 540 |       "Cost: 0.124807230659\n",
 541 |       "Iteration 72 of 100\n",
 542 |       "Cost: 0.124754751262\n",
 543 |       "Iteration 73 of 100\n",
 544 |       "Cost: 0.124686761318\n",
 545 |       "Iteration 74 of 100\n",
 546 |       "Cost: 0.124598303179\n",
 547 |       "Iteration 75 of 100\n",
 548 |       "Cost: 0.124483025338\n",
 549 |       "Iteration 76 of 100\n",
 550 |       "Cost: 0.12433286859\n",
 551 |       "Iteration 77 of 100\n",
 552 |       "Cost: 0.124137651121\n",
 553 |       "Iteration 78 of 100\n",
 554 |       "Cost: 0.123884387194\n",
 555 |       "Iteration 79 of 100\n",
 556 |       "Cost: 0.123556010954\n",
 557 |       "Iteration 80 of 100\n",
 558 |       "Cost: 0.123129051477\n",
 559 |       "Iteration 81 of 100\n",
 560 |       "Cost: 0.122569925578\n",
 561 |       "Iteration 82 of 100\n",
 562 |       "Cost: 0.121830095196\n",
 563 |       "Iteration 83 of 100\n",
 564 |       "Cost: 0.120841446193\n",
 565 |       "Iteration 84 of 100\n",
 566 |       "Cost: 0.119514949723\n",
 567 |       "Iteration 85 of 100\n",
 568 |       "Cost: 0.117748246786\n",
 569 |       "Iteration 86 of 100\n",
 570 |       "Cost: 0.115450497266\n",
 571 |       "Iteration 87 of 100\n",
 572 |       "Cost: 0.112589941634\n",
 573 |       "Iteration 88 of 100\n",
 574 |       "Cost: 0.109249438151\n",
 575 |       "Iteration 89 of 100\n",
 576 |       "Cost: 0.105640175411\n",
 577 |       "Iteration 90 of 100\n",
 578 |       "Cost: 0.102027696196\n",
 579 |       "Iteration 91 of 100\n",
 580 |       "Cost: 0.0990064970213\n",
 581 |       "Iteration 92 of 100\n",
 582 |       "Cost: 0.123641875887\n",
 583 |       "Iteration 93 of 100\n",
 584 |       "Cost: 0.206124967964\n",
 585 |       "Iteration 94 of 100\n",
 586 |       "Cost: 0.128853866919\n",
 587 |       "Iteration 95 of 100\n",
 588 |       "Cost: 0.100914621849\n",
 589 |       "Iteration 96 of 100\n",
 590 |       "Cost: 0.0954172210932\n",
 591 |       "Iteration 97 of 100\n",
 592 |       "Cost: 0.0925797728969\n",
 593 |       "Iteration 98 of 100\n",
 594 |       "Cost: 0.0909633907318\n",
 595 |       "Iteration 99 of 100\n",
 596 |       "Cost: 0.0897659003613\n",
 597 |       "Iteration 100 of 100\n",
 598 |       "Cost: 0.0886726139317\n"
 599 |      ]
 600 |     }
 601 |    ],
 602 |    "source": [
 603 |     "# Train for nIterations\n",
 604 |     "# Don't expect same results for running with 20 iterations\n",
 605 |     "# as with running twice with 10 iterations - learning rates are different!\n",
 606 |     "nIterations = 100\n",
 607 |     "trainUsingGD(weights, X, Y, nIterations)"
 608 |    ]
 609 |   },
 610 |   {
 611 |    "cell_type": "markdown",
 612 |    "metadata": {},
 613 |    "source": [
 614 |     "We see that with adaptive learning rate, we reach the desired output much faster!"
 615 |    ]
 616 |   },
 617 |   {
 618 |    "cell_type": "markdown",
 619 |    "metadata": {},
 620 |    "source": [
 621 |     "# MNIST Dataset\n",
 622 |     "\n",
 623 |     "MNIST is a dataset of 60000 images of hand-written numbers."
 624 |    ]
 625 |   },
 626 |   {
 627 |    "cell_type": "code",
 628 |    "execution_count": 13,
 629 |    "metadata": {
 630 |     "collapsed": true
 631 |    },
 632 |    "outputs": [],
 633 |    "source": [
 634 |     "# Load MNIST DATA\n",
 635 |     "# Use numpy.load() to load the .npz file\n",
 636 |     "f = np.load('mnist.npz')\n",
 637 |     "# Saving the files\n",
 638 |     "x_train = f['x_train']\n",
 639 |     "y_train = f['y_train']\n",
 640 |     "x_test = f['x_test']\n",
 641 |     "y_test = f['y_test']\n",
 642 |     "f.close()"
 643 |    ]
 644 |   },
 645 |   {
 646 |    "cell_type": "code",
 647 |    "execution_count": 14,
 648 |    "metadata": {},
 649 |    "outputs": [
 650 |     {
 651 |      "name": "stdout",
 652 |      "output_type": "stream",
 653 |      "text": [
 654 |       "x_train.shape = (60000, 28, 28)\n",
 655 |       "y_train.shape = (60000,)\n"
 656 |      ]
 657 |     },
 658 |     {
 659 |      "data": {
 660 |       "image/png": "iVBORw0KGgoAAAANSUhEUgAAAlMAAACNCAYAAACT6v+eAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAIABJREFUeJzt3WeAFMX29/HvkhQDIKBEI4JgJoMiiChiAgVFURQxoKhg\nRBC9JpJIUBAEAdM1Z1BQMSCgcPUa0D+KRHPEAIogK2GfF/2c6tnd2djTM917f5836M7sTNX2THf1\nqVOnsnJychARERGR0imX6QaIiIiIxJkGUyIiIiIBaDAlIiIiEoAGUyIiIiIBaDAlIiIiEoAGUyIi\nIiIBaDAlIiIiEoAGUyIiIiIBaDAlIiIiEkCFdL5ZVlZWrMut5+TkZBX1nLLex7LeP1Af40B9LPv9\nA/UxDtRHjyJTIiIiIgFoMCUiIiISgAZTIiIiIgFoMCUiIiISgAZTIiIiIgFoMBVTzZs358EHH+TB\nBx9k27ZtbNu2zf1/s2bNMt08EYmhCRMmkJOTQ05ODkuXLmXp0qXsvffemW6WSCjefPNN5s2bx7x5\n8wK/lgZTIiIiIgGktc5UGMqXL0/VqlXz/fyKK64AYKeddgLggAMOAODyyy9n7NixAPTq1QuAzZs3\nc8cddwBw2223hd7mIA4//HAAXn/9dapUqQJATo5XwuPcc88FoGvXrtSoUSMzDUyTTp06AfDYY48B\n0KFDB1asWJHJJqXETTfdBHifw3LlvHudo48+GoAFCxZkqllSiF133ZVddtkFgJNOOgmA3XffHYDx\n48eTnZ2dsbYV1z777ANA79692b59OwBNmjQBoHHjxnz99deZalrKNGrUCICKFSvSvn17AO69914A\n1+eCzJo1C4CzzjoLgH/++SesZqZExYoVOeKIIwAYOXIkAEceeWQmmxQpd911FwBHHHEE//73v1Py\nmrEYTO21115UqlQJwH1A2rVrB0C1atXo0aNHka/x3XffATBx4kROO+00ADZs2ADAJ598EvkLVatW\nrQB47rnnAKhataobRFk/7Ateo0YN2rRpA8BHH32U67Ew2QmqRo0avPDCC6G+V8uWLQF4//33Q32f\ndDn//PMBGDx4MJD75G7HWaLBBh52rNq2bcvBBx+c9Ll16tRh4MCB6Wpaqf3yyy8ALFy4kK5du2a4\nNalx0EEHAf5364wzzgCgXLly1K1bF/C/Z0V9x+xvMnXqVACuuuoq/vzzz5S3OVWqVq3KW2+9BcBP\nP/0EQO3atd1//6+yoMmll14KwJYtW3jzzTdT8tqa5hMREREJINKRKZvSmjdvXtKpvOKwOw+bPvnr\nr7/c1NCPP/4IwLp16yI5RWRTlM2aNePRRx8FvDvdvFatWgXAnXfeCcCTTz7JokWLAL/fo0aNCr29\nNh3VsGHDUCNT5cqVY9999wVwybFZWUVW+48068eOO+6Y4ZaUXuvWrenduzfgTbuCHx0AuO666wD4\n4YcfAC+6bJ/r9957L51NLbHGjRsDXkTinHPOAaBy5cqA99n79ttvAT9KbFNkPXv2dFNJy5cvT2ub\nS2Ljxo0AZWI6z9g578QTT0zZa5533nkA3H///e4cG3W1a9d2//6vR6ZsxqZixYoAvPPOOzz99NMp\neW1FpkREREQCiHRk6ptvvgHgt99+K1Zkyu5u169fT8eOHQE/V+iRRx4JqZXhue+++wA/Ub4gVgrB\nkmAXLFjgokSHHnpoeA3Mw+7a/vOf/4T6PnXq1OHiiy8GcJGNKN/1F+bYY48FYMCAAbl+vnz5ck4+\n+WQAfv7557S3qyTOPPNMwFtWX7NmTcCPFM6fP98lY48ZMybX72VlZbnHLLE3Kux8M3r0aMDv4667\n7prvuatWreL4448H/Dte+zzWrFnT/U2irFq1agAcdthhGW5J6rz++utA/sjU2rVruf/++wHcIo/E\nHEXLy7XoatzFPWpfkPbt23PjjTcC/jXy999/L/D5vXr1crmNa9asAfxoeSpEejBlf5hBgwa5C8uS\nJUsAL5HcfPzxxwAcd9xxgBeytumFK6+8Mm3tTZXmzZsD/sqgxC+DJcq/9NJLblWiTZvY32bdunUc\nc8wx+X43bHZiCtuMGTPcf9sUZxy1a9eOBx98ECDfzcKYMWMiO+VSoYJ32mjRogUA06dPB7xp6YUL\nFwIwbNgwwAuj77DDDgAunN65c2f3Wh988EF6Gl1CtkjloosuKvA5dkI+7rjj3DTf/vvvH37jQmAp\nBXvttVe+x1q2bOkGh1H9TCYzZcoUAGbOnJnr51u2bCl0ustWSX/66acALlk98bWi+rlNxpLr45xC\nkMy0adNo2LAhAAceeCDgnW8KMnToULfK3W7GP/nkk5S1R9N8IiIiIgFEOjJlZs6c6SqUWoKnhaMv\nvPBCF6GxJEqAzz77DIB+/fqls6mBJNaQAnLVkXrllVcAP5zZoUMHl1xukRpb3vzJJ5+4sLVFt5o1\na+bKJKSaTSXWqlUrlNfPKzGKY3+rOOrTp0+uu17wpsWAlNU+CYMlmSdGCME7FjYdlrhs3H6WGJEC\nr1zJww8/HGZTS82W0ef11VdfuXIcVhrBolLgJ57HjUW3H3roIW699dZcj916662sX78egEmTJqW7\naaW2detWIPfxKQ6bst1tt93yPWYlduJQOyyvFi1a8O6772a6GSmzadOmYkXd7Lq69957u+tiGFE6\nRaZEREREAohFZArIVyDtjz/+cP9t859PPfUUUHQ12yhq1KgRgwYNAvzIy6+//gp4JRzsDv6vv/4C\nYM6cOcyZM6fI17Xl29dee61b0p1qluBp7xUWi3xZWQSA77//PtT3DIMlJF9wwQXus2p3/sOHD89Y\nu4pj2LBhDB06FPBzMWzp/0033ZS0kKElieY1cOBAF02NGjunWGT7tddeA2D16tWsXbu2wN9LV3Q2\nLMOGDcsXmfpfYYsg7NgnO5/dfPPNaW1TaW3dutVdI+160qBBg0w2KWUsH/OQQw7h888/B5LnPu28\n886AH0HeaaedXGTu2WefTXm7FJkSERERCSA2kam87O6pefPmbgmrLTO3u8g4sJVOY8eOdREeywuz\nUgMffPBB4KhPslU6qWL7HhrLV0s1y42rVasWK1euBPy/VRzYNiS2JVCie+65B8BtARE1dkc+dOhQ\nV25k7ty5gH/n9/fff7vnW05C586d3WfPVpZa9M32O4siyyEqaZSmbdu2IbQmvZKVCyirLFo/ZMgQ\ntxLTylskshXjW7ZsSV/jAli/fj1vv/02gFsJH3d77rkn4EcOt27d6vbgTRbhHj9+PODnP/7www+h\n7k8Y28GUJZtffPHFLrHalmi/9dZbbunq5MmTgejub9a0aVMgdy2Ubt26AfHd2DYV++VVqVKFLl26\nAH7Cc2ICs4V6bXosDqw/ibW/bF+oCRMmZKRNRbH6Q5dddhngfY9sEHXqqafme75dkGyXASvzAX5o\n3Sr1x5XttWfTCIkOOeSQXP+/ePHi0OuupVpx96uLOrt5sQ3g7WY7ke3xmqyvNmU9ZMgQXn75ZSD3\nDYOkh9WGsl01LE3innvuSXqNtNpRtiejGTFiRIit1DSfiIiISCCxjUyZNWvWuBGoFUA899xz3d2I\n3T3aUnPbjy8qLBSZlZXlRtmpiEhlMlRfvXr1pD+3chY23WN3ivXr16dSpUqAH3YvV66cuwu0yva2\nHLlChQp8+OGHIbU+HKeeeqrbsdy888479OnTB8i9oCJK7LgkVvG2yMwee+wBQN++fQHo2rWru4u0\navw5OTnurt+q1SeWMIk6K2ZpRQFvueWWfBW1y5Url+97ZtOEffv2Zdu2bWloqSQ6+OCDefHFF4HS\npzjYNNm0adNS1q5MsoKVcWCFgXv37l1gtfq2bdtyww03AP51tHr16m5az64zdu23HUXCosiUiIiI\nSACxj0yBP5dqW4uMHz+eTp06ATBy5EjAK9gF3rxpFJbTW1KgFRTLyclxd1KpkDfvwRIow2ARJHuv\nqVOnuuXziSxXyO4YrKjepk2bWLZsGQAPPPAA4CXdW4TO9qazgnmVK1eOzV58hSWdf/HFF5Hfd8+S\nzS3Bc/fdd+fLL78EkueZWETG8k3q1KnjSny89NJLobc3FSpWrOhyGe241alTB/A+69ZHy4Xq0qWL\ni2AZu7Pu3r27y4ezv6Wkh51nCttSq7AIvp2jTzjhBFc0Oc66du2a6SYUm5WpmDFjhjvP2DFavXo1\n4BUhtS2tLM+4Xr167rtq56wLLrggLW0uE4MpY3sp9ezZk1NOOQXwp/4uueQSABo2bOj28MskW51n\n0yhr1651dbJKy1YGJq5AssrxFg4NgyUn275dtlFoXrZxte1vZTVCiqrKa7V+bFPcL774ImCL08dW\nuiU7Weed9osiS/C3ZPPZs2e7aVzbm85W5T300ENuP80nn3wS8AYh9t9RZ9/FLl268Pzzz+d67Lbb\nbgO879OiRYsAfzp73rx5bnrT2Gd11KhR+T73Ua+enWyA0b59eyA+FdA//fRTt9m7LWCxhRObN29O\n+jsXXnghkH/T8biylcFxWs1nuyXYdXvLli3uHHT22WcD3t6zAOPGjXMr+W1QlZWV5QZflppgFfCP\nPvpod84Kg6b5RERERAIoU5Eps379eh555BHA3z/Mwu7t27d3dyy2D1oUZGdnlzo53iJStlffoEGD\n3JTYuHHjAL9yephGjx4dyuvalK1JNmUWNTZ9m3c/OvAjOStWrEhrm4KwRQAWcSmIRTDsjnH79u2R\njyRaXSGLPtlOBICb3rE6YOvXr3d/A1suf8ghh7gpPCv7YJGqbt26uTIRb7zxBuB9T+zu2oQ5DV9S\nyUojdO/eHfAT8W1aPsosUl7cJfEW0S8rkSmLiJqKFSu6dBf720SNzSBZ24cPH+6iVHkNGDDAJZUn\nq+9m07sWoQszKgWKTImIiIgEUqYiU5bgfPrpp9OyZUvAj0iZZcuWsXDhwrS3rSilST636IfdSdt8\n86xZs+jRo0fqGhcxtuAgyqwKf+LO85YblreYXFliuYCJ0Y0o50yVL1/eFYC1Yn8bN25kyJAhgJ/7\nZXkbLVq0cHlDlqS+atUq+vfvD/h3wVWqVAG8/EEr92EJwK+//rp7f8vnSNxvMtOmTp0K+FGCRJa/\neNVVV6W1Telw/PHHZ7oJKWULfExWVpabxYgqi9pbzqJ9P5KpWbNmvlzFXr16udxpY7M0YVNkSkRE\nRCSA2EemDjjgALc/j83r165dO9/zrHDejz/+GIk9p/Iu2z311FO58sori/37V199Nf/6178Af1dw\ny82wPf0kc6xAXuJn7d577wXSk7+WKbZiKi769evnIlKbNm0CvIiMRRbbtGkD+IVJTzjhBBd9u/32\n2wFv5VHeO2grDfHqq6/y6quvAt5dM/irksD7HkdNXMqOJLK8N8tRnDdvXom2funbt29kt3QqLYvy\n2PFs3LixiyjaCuyoKc4xsOvdGWec4SLAlg/19NNPh9e4IsRuMGUDJTsxXXHFFa6WTzK2R58lIaay\nllMQltxp/9auXZuJEycCfq2l3377DfBO6FbR3aqI169f3yXp2QXMLtZllQ08GzVqVGQ5hUyxZElb\nXp5o8eLF6W5O2sVtqsQ2cAZvyg+8aXNLRra9BhPZY6NGjQIodoXzJ554Ite/UWXJ9paI3aBBA/eY\n3fDZc8JO6i2Odu3aceONNwK4sjf77rtvoVNEVtbCqtmPHz8+X60wG4wVVEohLuzGoF69elxzzTUZ\nbk1wNhDs378/a9euBeCYY47JZJMATfOJiIiIBBKLyFStWrXcklxL/mzcuHGBz3/vvfcYM2YM4Ic6\nozC1V5jy5cu7Ebclj9tUQcOGDfM9f/HixS7ZNfHuuiyzKF6yqE8UHH744W6/Qfu82ZL5yZMnR77a\neSrst99+mW5Cifz000+u1IEl51r0F/zyB7ZoZebMmXz11VdA8SNScfXZZ58BuY9pFM+jkyZNypeI\nfP3117Nhw4YCf8ciWM2aNQNyl4GwkjlTpkwB/EUFcZeTkxPrKvxW1uGiiy4CvP7YvonpSjIvTDSv\nSiIiIiIxEcnIlM1nW0Guww8/vNA7XstFsQKVc+fOLVHyYSbYvl7vv/8+gCvlAH5eWK1atdzPLH/K\nlmqXJFm9rGnbti0PPfRQppuRT7Vq1fItfrB9IC3Juax7++23gcL3PIuS9u3bu61yLEqxdu1al7do\nxTXjfEdfWnbXb1tzxYmVqiiutWvXur0j7dwa91ypvKpUqeL2sItDeZm8rKSIRageffRRbrnllkw2\nKZfIDKZat24NeMmfrVq1AryEuYLYypuJEye6zYw3btwYcitTx8KStgLxkksucRXM85owYYILOdsm\nj/+LCtuwVKLBarzYpuP77befS2C2jUejZMOGDW63BPtXPFbl/PPPP6dJkyYZbk3Bzj//fJcs36dP\nnyKfv2bNGnf9sMH/tGnT8tUnKit69uwJeLts2H6ocWSLe6wunKXwRIWm+UREREQCyEpMvAv9zbKy\nCnyzO+64A8i9L5ZZtmwZs2fPBvyqrjalZ5WJ0yEnJ6fI0EhhfYyDovqYif5ZxXCbepk+fXrS6szF\nEeYxrF27Nk899RTgLdcG+PLLL4HkS+zDEoXPqR2zGTNmsGDBAsBfap+Kfd2i0MewRfG7mEqpPIa2\neMA+d8OHD3e7D8ycORPwp4lmzZrFTz/9VPIGl0IUPqeWGtKkSRNXhT+Ve/NFoY9hK04fFZkSERER\nCSAykak40Ai87PcP1MdUsMrETz/9tCsXYfttWTXxIDmOUehj2PRdVB/jQH30KDIlIiIiEoAiUyWg\nEXjZ7x+oj6lUpUoVt5WTLVc/9NBDgWC5U1HqY1j0XVQf40B99GgwVQL60JT9/oH6GAfqY9nvH6iP\ncaA+ejTNJyIiIhJAWiNTIiIiImWNIlMiIiIiAWgwJSIiIhKABlMiIiIiAWgwJSIiIhKABlMiIiIi\nAWgwJSIiIhKABlMiIiIiAWgwJSIiIhKABlMiIiIiAWgwJSIiIhKABlMiIiIiAWgwJSIiIhJAhXS+\nWVZWVqx3Vc7Jyckq6jllvY9lvX+gPsaB+lj2+wfqYxyojx5FpkREREQCSGtkSkRya9SoEQCvvvoq\nAOXLlwdg7733zlibRESkZBSZEhEREQlAkSmRDLnnnns488wzAahevToAs2fPzmSTRKQM22+//QAY\nNWoUAKeddhoAhx56KMuXL89Yu8oCRaZEREREAohtZOrAAw8E4OSTT6Zfv34AvP/++wAsWbLEPe/u\nu+8G4J9//klzC0Vyq1WrFgDPP/88AG3atCEnx1vk8umnnwJw4YUXZqZxIlKmHXHEES4385dffgFg\n8uTJAPz8888Za1dZociUiIiISABZdmecljdLQa2JSy65BICxY8cCsMsuuxT6/GOOOQaAt956K+hb\nq54Gyftnx8DyfzZv3kzz5s0B2HXXXQE455xzmD9/PgDff/99ga//008/ATBr1iw++OCDkja/SJk6\nho0aNXKf2RNPPNHehyFDhgC4vsbxc5qV5b3dE0884fpmkePvvvsuVW+Ti76Lqe3fueeeC0Dnzp05\n/PDDATjggAPc4++++y4Ap5xyCgB//PFH4PeMyzHceeed3bmrbt26ABx55JF89dVXRf5uFPp40kkn\nAfDss88ydepUAG688UYANm3aFPj1o9DHsBWrj3EbTFmi7ueffw7AHnvsUejz169fD/gX+tdee63U\n760PTfL+3XnnnQBcd911KWvH9u3bWbZsGeBdpBP/Lc5JrCCZOoZt2rThnXfeyfs+9O7dG/D7lgrp\n7uNOO+0EwIoVK6hXrx6Am3qfMWNGqt4mF30Xg/WvZs2agH98bJC0fv16Fi9enOu5Rx99NDvvvDOA\nS1K2wXIQUTqGdevWZffdd8/1s3Xr1gHQsWNHHnzwQcD7jAO0atWKDRs2FPm6mezj/vvvD8Ann3wC\nwNtvv+1udrZv356y94nScQyLinaKiIiIhCx2Cei///47ALfccgsA48aNc3fG33zzDQB77bWXe361\natUA6NKlCxAsMhUnVvSxcuXKAPTq1Yv+/fvnes6cOXMA6Nu3b6D36t69e4GP/fbbbwD83//9X4HP\nWbFihZtSsOPVtGlTDj74YABGjBiR6zWCRKbSzYpyPv744246zHTv3p1Zs2ZlolkpZVMFq1atcpGp\nvHf5Zdm1115LpUqVAGjSpAngTWsbi+YcdNBB6W9cASwReZ999gH86PKYMWPcOdY0btyY//73v4D/\neb755psBuP3229PR3JSw88nAgQPzFcVt1KhRrusGwB133AF4UTj77lqKgh3vqNpxxx1d1HHp0qUA\n9OzZM6URqSiwmSqbeRo6dKibijU33XQT4JeDCIsiUyIiIiIBxC5nKq+PP/6Yww47DPCXl9sdSKIG\nDRoA8MUXX5T6vaI+N3zssccCXsSjV69eAFStWhWAZMd55cqVgH83/f+fV+I8Dfvb2l2rvS74UYsf\nf/yxWH2whPWlS5fmu1OcPn064C9CKI10H8Nhw4YBcMMNN/DKK68AcOmllwKFJ+IHkanPaY8ePXjm\nmWcAePTRRwE477zzUv02QOb62KFDB3d+6dChA+AVPswbdUxk0YDVq1cDxc83Citn6rjjjnORqaef\nfhrAnS8KYhEou8v/+uuvAdh3331L0wQg/cdw4MCBANx11135HsvOznafXVu0lBjhsONrn2f7fBcl\nU5/TMWPGcMUVVwDQsGFDoOwtBmnTpo07lq1atbK2FPj8Rx55pNSzMMXpY+ym+fIaPny4W5lgq1CS\niXpYtjQsjHvIIYcA0LJly3zPsSTJxx57zNXhsmTnzZs3p6Qda9asyfVvECeffDKQe6o2Ozsb8AdT\ncWBJvPaZ/Oqrr7j66quB8AZRmWZTQeBNKQAMHjy42APpqKhTp477jljFaFO1alWXjG0X2A8//JBm\nzZoV+HrlynkTAPZ7mVahQgU3sHvyySeL9TvPPvss4A+mdtxxRwCqVKnCn3/+GUIrU+fWW28FYNCg\nQe5nDz/8MODXWxo7dqz7b/vOzp07F/CS9e0x+ztE1Q477ABA79693QrEsAZRmWKLJ6ZPn+4CAXZ8\nZs6c6VInbOB7xhlnAN7gy8YBYdSd1DSfiIiISACxj0w9++yzbsm5JZdbpCbR8OHDATj99NPT17gQ\n1KhRA/CS6S644ALAT8r/8MMPAS9x0qY8//77b8BPzo+iSpUqMXHiRCD5tFDbtm0Bb0o36rp16wZA\n69atAT/s/Mwzz6QsEhhlFq2xO8CuXbty3333ZbJJxWbT5NOnT2fPPfcs8vk2Xffrr7+6u2WbGrKl\n9PXr13fPt1IfmfbWW2/RtGlToPh1hiw6bKya/9lnn+1qF0WVRQRtMc7XX3/tZjMSo6ZWSmDo0KGA\nv4hi48aNLroV9e/w9ddfD3i1/6yPZY1Fnpo0aeKu+VbyIdGqVasA/3tdv359F8mychGppMiUiIiI\nSACxj0ydc845LgE9WeK5yVswMa7+9a9/Ad4ebvfccw/gV7P966+/Mtau0ujYsSPgVV8+//zzcz22\nZcsWlzAal93Mq1WrxlFHHZX0sXXr1hWau3DllVcC5IqIpLIIarrkTQCNU66i3dUni0pZZGbw4MGu\nGrgVcAS/BIgdx8SIlJXysCrjmVaa6Iot3Pnss88Av8yDJTdHmeU5WXmcAw880JU9uOyyywAvF278\n+PGAXzHcIv4jRoxgypQpaW1zaXXu3BmARYsW8dFHH2W4NeGw2RagRKVl/vzzT3799dcwmgQoMiUi\nIiISSOwiU40bNwbghRdeALx57goViu7Giy++GGq7wmDFSAcPHuzuaq+66irAy3uw1SZRn8fPy5ax\n2nx3+fLl8z0nJyfH5Xlt27YtfY0LYNu2bW5PQlvBZcviFy5cmO/5troPYMCAAQC5iglee+21gB/l\nKKurADPN7ubbtGmT7zH7DNr3b9GiRYW+VmJEytjdc5h3xWHbsmULAFu3bs1wS0rOci0tonjggQe6\n8gfHHXcc4JVLyFuK5bbbbgNwMwBR1q5dO8D/DCfLGwZvayDwV79ZpDFOLC8zKyvLbfljq0sbNGjg\nZjnsXGz7vfbq1SvUc2jsBlOWQGb1TYozkAL/wmUXrTiwZciDBw929WBsABK3AVQiWzafbBBlKlWq\n5Cq02ybAL730EuANpC3BPko6dOjgpvlsEGUX48QLqS29Puqoo+jatWuu19i4cSPgLWe2qvA2TXHW\nWWe5+j6SOjZotZsX8Etb2AW1sEHUbrvt5qaQ2rdvn+uxxYsX8/LLL6e0vZlgS+7tomWKsz9dptkU\nbWIJB1so8NxzzwHehdmmqO+//37AW2YfF7bHp+1Z++WXX7rHbHAxbtw4dtttN8D/m1gqweTJk9PV\n1MBsijknJ4drrrkG8L/DNoAC73wJ6StnoWk+ERERkQBiF5my6T1LFh09enS+u6Vk6tSpE2q7wnDD\nDTcA3gg81YU2M+n5558H/Chjy5Yt3dLyZFq0aJHr31tuuYW7774b8PcUW7t2bWjtLYpVbU+sBv3D\nDz8AXtVd8KpfW4V4Kx7YrVs3F7GyiOO4ceMALyF23rx57r/jwkLw6dxZIahp06YBfjHAP/74g7PP\nPhvwpwgKc+mll7pK98amT3r27Fms14g628PPoqXGKqknqlmzplsUZGVNrLp4YtJ+uhUV1bUI4tix\nYwH49ttvQ29TqliZHPvcZmdnu8Ufto/tJZdc4lJDrJSAlfBYs2ZN0mMZRbbYY9ddd3XXhMTzjpX7\nSHcpEkWmRERERAKIXWTKWJHHVatWUa1atVyPVahQgUmTJgHedgdxZdtztGjRwvXHloW+/vrrGWtX\nUJaPYkuQ99prLxcVsGKA3bt3d3dbefc9K1eunJsrtznyTp06ZWxHdEv+TNzzy7a+sT3NatWq5e54\n7a5ww4YNLhfOchdsqfnUqVNdPsqbb74JFH1nHQVxikgZy5uxf4vrlFNOAeDmm292P7MEbStkGeeo\nlOVJ1a9fnyOOOCLpc6ZOneqKBduWOtWrV3flJewzbAUx85ZASQfLzbR8xmT7KM6ZM8cdzzix/CHL\nHU5cIGAtR4t3AAAJHklEQVTHwyJOiblDTz31FOCfu2644YbYRKasz23atHELPqw/4M98pDsyFfuN\njgt4H1ex1k50tm9cp06dSn1RCnNDx9atW7NkyRLA3zeoevXqgLdBp9WXslpSrVu3DqX+Ulibq5bG\nOeecA/iLBmwVYDJDhgxxU36FCeMYDh48GPDq0Zi8CyMWLVrkqqKbTp06sWDBAsBfhZNYD82mMkta\nbypTG4/uueee+b5bHTt2dH1MpShsOm6rTBPPoVa3yKYOgwjru1i5cmX22GMPwL/g2ufPVrmBn2xu\nF69ktm3blq9+2kMPPeQWj9g0ttXaSpSuY2hTjN27dy/wOXPmzMm3GCQVwu5jp06dAP/m2qryL1++\n3KUf2HSfTY8lsucvXbq00AVBhcnkd9FqS1pF85ycHNenlStXpux9itNHTfOJiIiIBBDbab7CVKpU\nKVfoHfw6KVGpWWQJ8bNnzwa8qS4r3/Doo48CfgXeSZMmucjULrvsAvhRq7LsscceA/wQ7htvvAHk\nX34O/jRCJtg0c1ZWVr6KvFYGYZ999nHTC7aMd8GCBS4p/fHHH3evYc+xyFScWUS4LBk5ciSQv5YY\nEEoULijbk86i9aeccoqr15eMlRCwKbqtW7fmi7TOmDED8Kb5olhpu27duvTt2xeAHj16AH4E8aOP\nPnKRDHuOReriLrGOUnHKVhS2K0McWD2tZN/FdFNkSkRERCSAMhmZGj58eL6fWSG2qIzE7W7OEuQH\nDx7sIlJ52X5f4Ednoli0MiyWVGmJrskiU6mcHy+tnJycAhOwt2/f7h479NBDAa+gp+WlWJE9S5L9\n448/wm6ulEKlSpVo2rQp4N8F5+TkuO+o7VQfJVZ80qp9Z2dnu5wm+9xZRDU7O9vlN9m5cvny5S6C\nanv02QKQqO4H2qlTJ7f4w1gR5EmTJnHqqacCfmQq3cnKqZJYDbw0OnToAMSj+GoytiDLvovz5893\nOcfppsiUiIiISACRjEzVqFED8AuKPfHEE65oZWEsD6lfv375HrPlklFhpR3sbmnixInuZ8buchs2\nbOhWSVkhz8StEeKgTp06XHzxxQBuFaKVBSiKrTKxQoCJLGpl+25lgt3VDxo0iG7dugH+6ijLmbKV\nNQDnnXce4N1N2mony2cpa/vv2fL6uLOtZnr37u0iPOaJJ55w+X2ZzNkoiO09aFGo7t27u/3qkrH8\nqNGjRwNQr149VxTXtoKKakTK9p5LPJfaKj2L6teuXTtfTm2y1YZxYNHukq7Kr1ixIuAVnAW/uHCc\nNG7cmAsvvBDw9xqcMmVKxo5lJAdT9kWwuh+NGjVyFaXtYrN69WrAqzNkIWirip5YW8oqStvvR8Wo\nUaMAPzG+adOmHHvssbmeY/sozZkzxy2Pt37HRe3atQGv1oklC1q/imI1p2xKIXHZtrG9qBJLCqSb\nHcNNmza5i67t5VbYSS6xztQrr7wScisz48QTT4zFRrEFsUGw1Q07/fTT3WO2YGTSpEmRHEQZ+wyu\nX78eKDxFYMcdd3SlBKwOXHZ2ttvnLIrJ5olsoFu1alW3GMAW+dgA4uSTT3a7Ctj0mF2M48amJ3/8\n8UfA36NvypQpSZ9vfwN73Crb9+nTJ8xmppQdu7lz51KvXj3AL0+Trn34ktE0n4iIiEgAkYxM2Z2s\n7XXWtm1b5s+fD/jhWBuRH3XUUbmmUMC7E7OpJNuXKKp72llV7LLKlvdbVAr842r7dFkSIfjLuK+/\n/noXkcp7fLOyslzC5MCBA0NqefFZYnyvXr1cm226IdHDDz8MeAXyAJYsWRLJpfSl9fPPP7s96Qor\n9BgnduebGJGycg95p+WjyhZn2JTztGnTXCqFlQiwxPJBgwa5/ffee+89APr371/otGCUJC4KsIic\nRWMs6XzChAmsW7cO8Es8FBTJiTqLSFm5DpuJAb+0zH777Qd4aRJDhw4F/OuhTQFbukEcWHHmevXq\nufSfxH5niiJTIiIiIgFEejsZG22uXr2ae++9t9i/9/vvv7s7r1SKwhYWYUv1FhaWdH7ffffle8y2\nz0ksA2Dz4bb8PJm//vqL0047DfD3rSsuHUNPWH18//33AX/PxNmzZ8dymw4ramkFVm0J/cqVKznh\nhBOA8PdKTPV3cdiwYYC3PZEVOczrxRdfdGVkwt6rLYxjaOeZiy66yOXPWO6llR0BP0r10ksvleTl\nSyzd38XLL78cgDFjxuRb/LFhwwYXTbXyQakoI5CuPlpOsS342b59u8sRy1ssOdWK1ccoD6bMDjvs\nkG86xy62vXr1cj+zi/IxxxwTSqKkLsQl758lOI4cOdIlsZaUrdizKcPnnnvOTUGUlI6hJ6w+WqK2\nrbKZP39+0oUDQYXdR5siOfPMM3P9fMCAAWmbEorSPplhCOMYXnXVVUDuaR9LMrcdJSZPnswdd9wB\n5E4xCIPON54gfbRriKVTWG2+3r1788ILL5T2ZUtEe/OJiIiIhCySCeh5ZWdnM2bMmKSPnX322Wlu\njZSELRjo27cvL774IuCXOLDE2MRpIFs4ADBv3rxcP4tLEuz/shEjRgD+bu7FrSUWJQcddFCu8irg\nJW2D/5mUaLJFHpUqVXL7mX7wwQcA7vxz1113ZaZxUmKVK1d2U+2WAvLcc88BpC0qVVyKTImIiIgE\nEIucqajQ/HfZ7x+oj3EQZh9Hjx7t7oYtyfzEE08E/HIe6aDvovoYB2H2sX///kyaNAmAxYsXA34i\nenZ2dmleslSUMyUiIiISMkWmSkB3GWW/f6A+xkGYfezUqRNz584FoEePHkD4S6+T0XdRfYyDMPrY\nqlUrwMuPeuCBBwB/pfB3331X4jYGVWZKI0SFvhhlv3+gPsaB+lj2+wfqYxyojx5N84mIiIgEkNbI\nlIiIiEhZo8iUiIiISAAaTImIiIgEoMGUiIiISAAaTImIiIgEoMGUiIiISAAaTImIiIgEoMGUiIiI\nSAAaTImIiIgEoMGUiIiISAAaTImIiIgEoMGUiIiISAAaTImIiIgEoMGUiIiISAAaTImIiIgEoMGU\niIiISAAaTImIiIgEoMGUiIiISAAaTImIiIgEoMGUiIiISAAaTImIiIgEoMGUiIiISAAaTImIiIgE\noMGUiIiISAD/D2VmfeQeqcmwAAAAAElFTkSuQmCC\n",
 661 |       "text/plain": [
 662 |        "<matplotlib.figure.Figure at 0x10bdc67f0>"
 663 |       ]
 664 |      },
 665 |      "metadata": {},
 666 |      "output_type": "display_data"
 667 |     }
 668 |    ],
 669 |    "source": [
 670 |     "# To check MNIST data\n",
 671 |     "print(\"x_train.shape = \"+str(x_train.shape))\n",
 672 |     "print(\"y_train.shape = \"+str(y_train.shape))\n",
 673 |     "fig = plt.figure(figsize=(10, 2))\n",
 674 |     "for i in range(20):\n",
 675 |     "    ax1 = fig.add_subplot(2, 10, i+1)\n",
 676 |     "    ax1.imshow(x_train[i], cmap='gray');\n",
 677 |     "    ax1.axis('off')"
 678 |    ]
 679 |   },
 680 |   {
 681 |    "cell_type": "markdown",
 682 |    "metadata": {},
 683 |    "source": [
 684 |     "(In supervised learning) Every (good) dataset consists of a training set and a test set.\n",
 685 |     "\n",
 686 |     "The training data set consists of data points and their desired outputs.\n",
 687 |     "\n",
 688 |     "In this case, the data points are grayscale images of hand-written numbers, and their desired outputs are the numbers that have been drawn.\n",
 689 |     "\n",
 690 |     "The test data set consists of data points whose outputs need to be found."
 691 |    ]
 692 |   },
 693 |   {
 694 |    "cell_type": "markdown",
 695 |    "metadata": {},
 696 |    "source": [
 697 |     "Let us implement the following neural network to classify MNIST data:\n",
 698 |     "<center>![MNIST NN](images/digitsNN.png)"
 699 |    ]
 700 |   },
 701 |   {
 702 |    "cell_type": "markdown",
 703 |    "metadata": {},
 704 |    "source": [
 705 |     "## Initialize network\n",
 706 |     "\n",
 707 |     "MNIST dataset has images of size 28x28. So the input layer to our network must have $28*28=784$ neurons.\n",
 708 |     "\n",
 709 |     "Since we are tring to classify whether the image is that of 0 or 1 or 2 ... or 9, we need to have 10 output neurons, each catering to the probability of one number among 0-9.\n",
 710 |     "\n",
 711 |     "Let our hidden layer (as shown in the diagram) have 15 neurons."
 712 |    ]
 713 |   },
 714 |   {
 715 |    "cell_type": "markdown",
 716 |    "metadata": {},
 717 |    "source": [
 718 |     "Before initializing the network though, let's ensure our inputs and outputs are appropriate for the task at hand."
 719 |    ]
 720 |   },
 721 |   {
 722 |    "cell_type": "markdown",
 723 |    "metadata": {},
 724 |    "source": [
 725 |     "## Are our inputs in the right format and shape?\n",
 726 |     "\n",
 727 |     "Remember that we give inputs as np.arrays of $n{\\times}784$ dimensions, $n$ being the number of data points we want to input to the network."
 728 |    ]
 729 |   },
 730 |   {
 731 |    "cell_type": "markdown",
 732 |    "metadata": {},
 733 |    "source": [
 734 |     "Is ``x_train`` an np.array?"
 735 |    ]
 736 |   },
 737 |   {
 738 |    "cell_type": "code",
 739 |    "execution_count": 15,
 740 |    "metadata": {},
 741 |    "outputs": [
 742 |     {
 743 |      "data": {
 744 |       "text/plain": [
 745 |        "numpy.ndarray"
 746 |       ]
 747 |      },
 748 |      "execution_count": 15,
 749 |      "metadata": {},
 750 |      "output_type": "execute_result"
 751 |     }
 752 |    ],
 753 |    "source": [
 754 |     "# Check type of x_train\n",
 755 |     "type(x_train)"
 756 |    ]
 757 |   },
 758 |   {
 759 |    "cell_type": "markdown",
 760 |    "metadata": {},
 761 |    "source": [
 762 |     "Yup, ``x_train`` is an np.array"
 763 |    ]
 764 |   },
 765 |   {
 766 |    "cell_type": "markdown",
 767 |    "metadata": {},
 768 |    "source": [
 769 |     "Is ``x_train`` in the shape required by the network?"
 770 |    ]
 771 |   },
 772 |   {
 773 |    "cell_type": "code",
 774 |    "execution_count": 16,
 775 |    "metadata": {},
 776 |    "outputs": [
 777 |     {
 778 |      "data": {
 779 |       "text/plain": [
 780 |        "(60000, 28, 28)"
 781 |       ]
 782 |      },
 783 |      "execution_count": 16,
 784 |      "metadata": {},
 785 |      "output_type": "execute_result"
 786 |     }
 787 |    ],
 788 |    "source": [
 789 |     "# Check shape of x_train\n",
 790 |     "x_train.shape"
 791 |    ]
 792 |   },
 793 |   {
 794 |    "cell_type": "markdown",
 795 |    "metadata": {},
 796 |    "source": [
 797 |     "Clearly not.\n",
 798 |     "\n",
 799 |     "We need to reshape this matrix to $60000{\\times}784$."
 800 |    ]
 801 |   },
 802 |   {
 803 |    "cell_type": "code",
 804 |    "execution_count": 17,
 805 |    "metadata": {},
 806 |    "outputs": [
 807 |     {
 808 |      "data": {
 809 |       "text/plain": [
 810 |        "(60000, 784)"
 811 |       ]
 812 |      },
 813 |      "execution_count": 17,
 814 |      "metadata": {},
 815 |      "output_type": "execute_result"
 816 |     }
 817 |    ],
 818 |    "source": [
 819 |     "# Reshaping x_train and x_test for our network with 784 inputs neurons\n",
 820 |     "x_train = np.reshape(x_train, (len(x_train), 784))\n",
 821 |     "x_test = np.reshape(x_test, (len(x_test), 784))\n",
 822 |     "\n",
 823 |     "# Check the dimensions\n",
 824 |     "x_train.shape"
 825 |    ]
 826 |   },
 827 |   {
 828 |    "cell_type": "markdown",
 829 |    "metadata": {},
 830 |    "source": [
 831 |     "Now our input is in the right format and shape."
 832 |    ]
 833 |   },
 834 |   {
 835 |    "cell_type": "markdown",
 836 |    "metadata": {},
 837 |    "source": [
 838 |     "## Are our inputs normalized?\n",
 839 |     "\n",
 840 |     "Remember that we had decided to limit the range of values for the input to 0-1."
 841 |    ]
 842 |   },
 843 |   {
 844 |    "cell_type": "markdown",
 845 |    "metadata": {},
 846 |    "source": [
 847 |     "Are all the values of ``x_train`` between 0 and 1?"
 848 |    ]
 849 |   },
 850 |   {
 851 |    "cell_type": "code",
 852 |    "execution_count": 18,
 853 |    "metadata": {},
 854 |    "outputs": [
 855 |     {
 856 |      "name": "stdout",
 857 |      "output_type": "stream",
 858 |      "text": [
 859 |       "Values in x_train lie between 0 and 255\n"
 860 |      ]
 861 |     }
 862 |    ],
 863 |    "source": [
 864 |     "# Check range of values of x_train\n",
 865 |     "print(\"Values in x_train lie between \"+str(np.min(x_train))+\" and \"+str(np.max(np.max(x_train))))"
 866 |    ]
 867 |   },
 868 |   {
 869 |    "cell_type": "markdown",
 870 |    "metadata": {},
 871 |    "source": [
 872 |     "Our inputs are images, their values range from 0 to 255. We need to bring them down to 0-1."
 873 |    ]
 874 |   },
 875 |   {
 876 |    "cell_type": "code",
 877 |    "execution_count": 19,
 878 |    "metadata": {
 879 |     "collapsed": true
 880 |    },
 881 |    "outputs": [],
 882 |    "source": [
 883 |     "# Normalize x_train\n",
 884 |     "x_train = x_train / 255.0\n",
 885 |     "x_test = x_test / 255.0"
 886 |    ]
 887 |   },
 888 |   {
 889 |    "cell_type": "code",
 890 |    "execution_count": 20,
 891 |    "metadata": {},
 892 |    "outputs": [
 893 |     {
 894 |      "name": "stdout",
 895 |      "output_type": "stream",
 896 |      "text": [
 897 |       "Values in x_train lie between 0.0 and 1.0\n"
 898 |      ]
 899 |     }
 900 |    ],
 901 |    "source": [
 902 |     "# Check range of values of x_train\n",
 903 |     "print(\"Values in x_train lie between \"+str(np.min(x_train))+\" and \"+str(np.max(np.max(x_train))))"
 904 |    ]
 905 |   },
 906 |   {
 907 |    "cell_type": "markdown",
 908 |    "metadata": {},
 909 |    "source": [
 910 |     "Perfect."
 911 |    ]
 912 |   },
 913 |   {
 914 |    "cell_type": "markdown",
 915 |    "metadata": {},
 916 |    "source": [
 917 |     "## Are our outputs in the right format and shape?"
 918 |    ]
 919 |   },
 920 |   {
 921 |    "cell_type": "markdown",
 922 |    "metadata": {},
 923 |    "source": [
 924 |     "Is ``y_train`` an np.array?"
 925 |    ]
 926 |   },
 927 |   {
 928 |    "cell_type": "code",
 929 |    "execution_count": 21,
 930 |    "metadata": {},
 931 |    "outputs": [
 932 |     {
 933 |      "data": {
 934 |       "text/plain": [
 935 |        "numpy.ndarray"
 936 |       ]
 937 |      },
 938 |      "execution_count": 21,
 939 |      "metadata": {},
 940 |      "output_type": "execute_result"
 941 |     }
 942 |    ],
 943 |    "source": [
 944 |     "# Check type of y_train\n",
 945 |     "type(y_train)"
 946 |    ]
 947 |   },
 948 |   {
 949 |    "cell_type": "markdown",
 950 |    "metadata": {},
 951 |    "source": [
 952 |     "Yup, ``y_train`` is an np.array"
 953 |    ]
 954 |   },
 955 |   {
 956 |    "cell_type": "markdown",
 957 |    "metadata": {},
 958 |    "source": [
 959 |     "Remember that we have 10 neurons in the output layer. That means our output needs to be of ${n{\\times}10}$ dimensions."
 960 |    ]
 961 |   },
 962 |   {
 963 |    "cell_type": "markdown",
 964 |    "metadata": {},
 965 |    "source": [
 966 |     "Is the shape of ``y_train`` $n{\\times}10$?"
 967 |    ]
 968 |   },
 969 |   {
 970 |    "cell_type": "code",
 971 |    "execution_count": 22,
 972 |    "metadata": {},
 973 |    "outputs": [
 974 |     {
 975 |      "data": {
 976 |       "text/plain": [
 977 |        "(60000,)"
 978 |       ]
 979 |      },
 980 |      "execution_count": 22,
 981 |      "metadata": {},
 982 |      "output_type": "execute_result"
 983 |     }
 984 |    ],
 985 |    "source": [
 986 |     "# Check shape of y_train\n",
 987 |     "y_train.shape"
 988 |    ]
 989 |   },
 990 |   {
 991 |    "cell_type": "markdown",
 992 |    "metadata": {},
 993 |    "source": [
 994 |     "Nope, ``y_train`` is of shape $60000{\\times}1$"
 995 |    ]
 996 |   },
 997 |   {
 998 |    "cell_type": "markdown",
 999 |    "metadata": {},
1000 |    "source": [
1001 |     "What are its values like?"
1002 |    ]
1003 |   },
1004 |   {
1005 |    "cell_type": "code",
1006 |    "execution_count": 23,
1007 |    "metadata": {},
1008 |    "outputs": [
1009 |     {
1010 |      "name": "stdout",
1011 |      "output_type": "stream",
1012 |      "text": [
1013 |       "5\n",
1014 |       "0\n",
1015 |       "4\n",
1016 |       "1\n",
1017 |       "9\n"
1018 |      ]
1019 |     }
1020 |    ],
1021 |    "source": [
1022 |     "for i in range(5):\n",
1023 |     "    print(y_train[i])"
1024 |    ]
1025 |   },
1026 |   {
1027 |    "cell_type": "markdown",
1028 |    "metadata": {},
1029 |    "source": [
1030 |     "So ``y_train`` carries the numbers of the digits the images represent."
1031 |    ]
1032 |   },
1033 |   {
1034 |    "cell_type": "markdown",
1035 |    "metadata": {},
1036 |    "source": [
1037 |     "We need to make a new binary array of $60000{\\times}10$ and insert a 1 in the column corresponding to the number of the digit its image shows.\n",
1038 |     "\n",
1039 |     "For example, the first row of our new y_train should look like $\\left[\\begin{array}{c}0&0&0&0&0&1&0&0&0&0\\end{array}\\right]$, since it represents 5. This is called one-hot encoding."
1040 |    ]
1041 |   },
1042 |   {
1043 |    "cell_type": "code",
1044 |    "execution_count": 24,
1045 |    "metadata": {
1046 |     "collapsed": true
1047 |    },
1048 |    "outputs": [],
1049 |    "source": [
1050 |     "# Make new y_train of nx10 elements\n",
1051 |     "new_y_train = np.zeros((len(y_train), 10))\n",
1052 |     "for i in range(len(y_train)):\n",
1053 |     "    new_y_train[i, y_train[i]] = 1"
1054 |    ]
1055 |   },
1056 |   {
1057 |    "cell_type": "code",
1058 |    "execution_count": 25,
1059 |    "metadata": {
1060 |     "collapsed": true
1061 |    },
1062 |    "outputs": [],
1063 |    "source": [
1064 |     "# Make new y_test of nx10 elements\n",
1065 |     "new_y_test = np.zeros((len(y_test), 10))\n",
1066 |     "for i in range(len(y_test)):\n",
1067 |     "    new_y_test[i, y_test[i]] = 1"
1068 |    ]
1069 |   },
1070 |   {
1071 |    "cell_type": "code",
1072 |    "execution_count": 26,
1073 |    "metadata": {},
1074 |    "outputs": [
1075 |     {
1076 |      "name": "stdout",
1077 |      "output_type": "stream",
1078 |      "text": [
1079 |       "[ 0.  0.  0.  0.  0.  1.  0.  0.  0.  0.]\n",
1080 |       "[ 0.  0.  0.  0.  0.  0.  0.  1.  0.  0.]\n"
1081 |      ]
1082 |     }
1083 |    ],
1084 |    "source": [
1085 |     "# Check first row of y_train\n",
1086 |     "print(new_y_train[0])\n",
1087 |     "print(new_y_test[0])"
1088 |    ]
1089 |   },
1090 |   {
1091 |    "cell_type": "markdown",
1092 |    "metadata": {},
1093 |    "source": [
1094 |     "Now that new_y_train is correctly shaped and formatted, let us reassign the name y_train to the matrix new_y_train."
1095 |    ]
1096 |   },
1097 |   {
1098 |    "cell_type": "code",
1099 |    "execution_count": 27,
1100 |    "metadata": {
1101 |     "collapsed": true
1102 |    },
1103 |    "outputs": [],
1104 |    "source": [
1105 |     "# Reassign the name \"y_train\" to new_y_train\n",
1106 |     "y_train = new_y_train\n",
1107 |     "y_test = new_y_test"
1108 |    ]
1109 |   },
1110 |   {
1111 |    "cell_type": "markdown",
1112 |    "metadata": {},
1113 |    "source": [
1114 |     "## Initialize the network"
1115 |    ]
1116 |   },
1117 |   {
1118 |    "cell_type": "code",
1119 |    "execution_count": 28,
1120 |    "metadata": {
1121 |     "collapsed": true
1122 |    },
1123 |    "outputs": [],
1124 |    "source": [
1125 |     "# Initialize network\n",
1126 |     "layers = [784, 15, 10]\n",
1127 |     "weights = initializeWeights(layers)\n",
1128 |     "\n",
1129 |     "# Take backup of weights to be used later for comparison\n",
1130 |     "initialWeights = [np.array(w) for w in weights]"
1131 |    ]
1132 |   },
1133 |   {
1134 |    "cell_type": "code",
1135 |    "execution_count": 29,
1136 |    "metadata": {},
1137 |    "outputs": [
1138 |     {
1139 |      "data": {
1140 |       "text/plain": [
1141 |        "'\\nprint(\"weights:\")\\nfor i in range(len(weights)):\\n    print(i+1); print(weights[i].shape); print(weights[i])\\n'"
1142 |       ]
1143 |      },
1144 |      "execution_count": 29,
1145 |      "metadata": {},
1146 |      "output_type": "execute_result"
1147 |     }
1148 |    ],
1149 |    "source": [
1150 |     "# Please don't print the weights\n",
1151 |     "# There are 15*784=11760 weights in the first layer,\n",
1152 |     "# + 10*15=150 weights in the second layer\n",
1153 |     "'''\n",
1154 |     "print(\"weights:\")\n",
1155 |     "for i in range(len(weights)):\n",
1156 |     "    print(i+1); print(weights[i].shape); print(weights[i])\n",
1157 |     "'''\n"
1158 |    ]
1159 |   },
1160 |   {
1161 |    "cell_type": "markdown",
1162 |    "metadata": {},
1163 |    "source": [
1164 |     "## Train the network\n",
1165 |     "\n",
1166 |     "Use the proper inputs ``x_train`` and ``y_train`` to train your neural network."
1167 |    ]
1168 |   },
1169 |   {
1170 |    "cell_type": "markdown",
1171 |    "metadata": {},
1172 |    "source": [
1173 |     "How many iterations do you want to perform? How much should be the learning rate? Should it be adaptive? How many neurons per layer?"
1174 |    ]
1175 |   },
1176 |   {
1177 |    "cell_type": "markdown",
1178 |    "metadata": {},
1179 |    "source": [
1180 |     "Remember that there are 60,000 images in the training set."
1181 |    ]
1182 |   },
1183 |   {
1184 |    "cell_type": "code",
1185 |    "execution_count": 30,
1186 |    "metadata": {},
1187 |    "outputs": [
1188 |     {
1189 |      "name": "stdout",
1190 |      "output_type": "stream",
1191 |      "text": [
1192 |       "Iteration 1 of 1\n",
1193 |       "Cost: 1.97857726345\n",
1194 |       "Time: 3.7738959789276123 seconds\n"
1195 |      ]
1196 |     }
1197 |    ],
1198 |    "source": [
1199 |     "# Train the network using Gradient Descent\n",
1200 |     "# Let's check how much time it takes for 1 iteration\n",
1201 |     "\n",
1202 |     "# Set options\n",
1203 |     "nIterations = 1\n",
1204 |     "learningRate = 1.0\n",
1205 |     "\n",
1206 |     "# Start time\n",
1207 |     "start = time.time()\n",
1208 |     "\n",
1209 |     "# Train\n",
1210 |     "trainUsingGD(weights, x_train, y_train, nIterations, learningRate)\n",
1211 |     "\n",
1212 |     "# End time\n",
1213 |     "end = time.time()\n",
1214 |     "\n",
1215 |     "print(\"Time: \"+str(end - start)+\" seconds\")"
1216 |    ]
1217 |   },
1218 |   {
1219 |    "cell_type": "markdown",
1220 |    "metadata": {
1221 |     "collapsed": true
1222 |    },
1223 |    "source": [
1224 |     "See how it takes SO LONG for just one iteration?"
1225 |    ]
1226 |   },
1227 |   {
1228 |    "cell_type": "markdown",
1229 |    "metadata": {},
1230 |    "source": [
1231 |     "**Problem: Batch Gradient Descent computes error, delta, etc. over the entire input data set**\n",
1232 |     "\n",
1233 |     "Solution: Don't change weights over the entire data set, repeatedly use a randomly sampled subset of the data set.\n",
1234 |     "\n",
1235 |     "This is called the Monte Carlo method, and in this case it has been developed into Stochastic Gradient Descent."
1236 |    ]
1237 |   },
1238 |   {
1239 |    "cell_type": "markdown",
1240 |    "metadata": {},
1241 |    "source": [
1242 |     "# Mini-batch Gradient Descent"
1243 |    ]
1244 |   },
1245 |   {
1246 |    "cell_type": "markdown",
1247 |    "metadata": {
1248 |     "collapsed": true
1249 |    },
1250 |    "source": [
1251 |     "We shall define a $minibatchSize$ lesser than the number of data points input to the network ($n$). Say $minibatchSize = 100$.\n",
1252 |     "\n",
1253 |     "**Mini-batch GD**:\n",
1254 |     "\n",
1255 |     "For every epoch:\n",
1256 |     "- randomly group the input data set into mini-batches of ($minibatchSize=$) 100 images:\n",
1257 |     "    - randomly shuffle the entire data set\n",
1258 |     "    - consider every 100 images as one mini-batch - so there are ``int(n/minibatchSize)`` number of mini-batches\n",
1259 |     "- use gradient descent on every mini-batch to update weights\n",
1260 |     "- Repeat.\n",
1261 |     "\n",
1262 |     "If $minibatchSize=n$, this is the same as Batch Gradient Descent.\n",
1263 |     "\n",
1264 |     "If $minibatchSize=1$, i.e. we update the weights after backpropagating for only one image, it is called **Stochastic Grdient Descent**."
1265 |    ]
1266 |   },
1267 |   {
1268 |    "cell_type": "markdown",
1269 |    "metadata": {},
1270 |    "source": [
1271 |     "So, at every iteration we are using gradient descent on only $minibatchSize$ number of images.\n",
1272 |     "\n",
1273 |     "Mathematical proofs exist on why this works better than gradient descent, under some assumptions (like stationarity, which holds true for our purposes)."
1274 |    ]
1275 |   },
1276 |   {
1277 |    "cell_type": "markdown",
1278 |    "metadata": {},
1279 |    "source": [
1280 |     "Let's code Mini-batch Gradient Descent:"
1281 |    ]
1282 |   },
1283 |   {
1284 |    "cell_type": "code",
1285 |    "execution_count": 43,
1286 |    "metadata": {
1287 |     "collapsed": true
1288 |    },
1289 |    "outputs": [],
1290 |    "source": [
1291 |     "# TRAINING USING MINI-BATCH GRADIENT DESCENT\n",
1292 |     "# Default learning rate = 1.0\n",
1293 |     "def trainUsingMinibatchGD(weights, X, Y, minibatchSize, nEpochs, learningRate=1.0):\n",
1294 |     "    # For nIterations number of iterations:\n",
1295 |     "    for i in range(nEpochs):\n",
1296 |     "        # clear output\n",
1297 |     "        #clear_output()\n",
1298 |     "        print(\"Epoch \"+str(i+1)+\" of \"+str(nEpochs))\n",
1299 |     "        \n",
1300 |     "        # Make a list of all the indices\n",
1301 |     "        fullIdx = list(range(len(Y)))\n",
1302 |     "        \n",
1303 |     "        # Shuffle the full index\n",
1304 |     "        np.random.shuffle(fullIdx)\n",
1305 |     "        \n",
1306 |     "        # Count number of mini-batches\n",
1307 |     "        nOfMinibatches = int(len(X)/minibatchSize)\n",
1308 |     "        \n",
1309 |     "        # For each mini-batch\n",
1310 |     "        for m in range(nOfMinibatches):\n",
1311 |     "            # Compute the starting index of this mini-batch\n",
1312 |     "            startIdx = m*minibatchSize\n",
1313 |     "            \n",
1314 |     "            # Declare sampled inputs and outputs\n",
1315 |     "            xSample = X[fullIdx[startIdx:startIdx+minibatchSize]]\n",
1316 |     "            ySample = Y[fullIdx[startIdx:startIdx+minibatchSize]]\n",
1317 |     "\n",
1318 |     "            # Run backprop\n",
1319 |     "            backProp(weights, xSample, ySample, learningRate)"
1320 |    ]
1321 |   },
1322 |   {
1323 |    "cell_type": "markdown",
1324 |    "metadata": {},
1325 |    "source": [
1326 |     "Using MinibatchGD, training upto the same accuracy should take lesser time than GD."
1327 |    ]
1328 |   },
1329 |   {
1330 |    "cell_type": "code",
1331 |    "execution_count": 44,
1332 |    "metadata": {
1333 |     "collapsed": true
1334 |    },
1335 |    "outputs": [],
1336 |    "source": [
1337 |     "# Initialize network\n",
1338 |     "layers = [784, 30, 10]\n",
1339 |     "weights = initializeWeights(layers)\n",
1340 |     "\n",
1341 |     "# Take backup of weights to be used later for comparison\n",
1342 |     "initialWeights = [np.array(w) for w in weights]"
1343 |    ]
1344 |   },
1345 |   {
1346 |    "cell_type": "code",
1347 |    "execution_count": 45,
1348 |    "metadata": {},
1349 |    "outputs": [
1350 |     {
1351 |      "name": "stdout",
1352 |      "output_type": "stream",
1353 |      "text": [
1354 |       "5570 out of 60000 : 0.09283333333333334\n"
1355 |      ]
1356 |     }
1357 |    ],
1358 |    "source": [
1359 |     "# Evaluate initial weights on training data\n",
1360 |     "evaluate(weights, x_train, y_train)"
1361 |    ]
1362 |   },
1363 |   {
1364 |    "cell_type": "code",
1365 |    "execution_count": 46,
1366 |    "metadata": {},
1367 |    "outputs": [
1368 |     {
1369 |      "name": "stdout",
1370 |      "output_type": "stream",
1371 |      "text": [
1372 |       "948 out of 10000 : 0.0948\n"
1373 |      ]
1374 |     }
1375 |    ],
1376 |    "source": [
1377 |     "# Evaluate initial weights on test data\n",
1378 |     "evaluate(weights, x_test, y_test)"
1379 |    ]
1380 |   },
1381 |   {
1382 |    "cell_type": "markdown",
1383 |    "metadata": {},
1384 |    "source": [
1385 |     "- Let's first use Batch Gradient Descent ($minibatchSize = size\\;of \\;full\\;input$) to evaluate the accuracy and time with one iteration "
1386 |    ]
1387 |   },
1388 |   {
1389 |    "cell_type": "code",
1390 |    "execution_count": 47,
1391 |    "metadata": {},
1392 |    "outputs": [
1393 |     {
1394 |      "name": "stdout",
1395 |      "output_type": "stream",
1396 |      "text": [
1397 |       "Epoch 1 of 1\n",
1398 |       "Training accuracy:\n",
1399 |       "5889 out of 60000 : 0.09815\n",
1400 |       "Test accuracy:\n",
1401 |       "1012 out of 10000 : 0.1012\n",
1402 |       "Time: 2.8622570037841797 seconds\n"
1403 |      ]
1404 |     }
1405 |    ],
1406 |    "source": [
1407 |     "# Train the network ONCE using Batch Gradient Descent to check accuracy and time\n",
1408 |     "\n",
1409 |     "# Re-initialize weights\n",
1410 |     "weights = [np.array(w) for w in initialWeights]\n",
1411 |     "\n",
1412 |     "# Set options for batch gradient descent\n",
1413 |     "minibatchSize = len(y_train)\n",
1414 |     "nEpochs = 1\n",
1415 |     "learningRate = 3.0\n",
1416 |     "\n",
1417 |     "# Start time\n",
1418 |     "start = time.time()\n",
1419 |     "\n",
1420 |     "# Train\n",
1421 |     "trainUsingMinibatchGD(weights, x_train, y_train, minibatchSize, nEpochs, learningRate)\n",
1422 |     "\n",
1423 |     "# End time\n",
1424 |     "end = time.time()\n",
1425 |     "\n",
1426 |     "# Evaluate accuracy\n",
1427 |     "print(\"Training accuracy:\")\n",
1428 |     "evaluate(weights, x_train, y_train)\n",
1429 |     "print(\"Test accuracy:\")\n",
1430 |     "evaluate(weights, x_test, y_test)\n",
1431 |     "\n",
1432 |     "# Print time taken\n",
1433 |     "print(\"Time: \"+str(end-start)+\" seconds\")"
1434 |    ]
1435 |   },
1436 |   {
1437 |    "cell_type": "markdown",
1438 |    "metadata": {},
1439 |    "source": [
1440 |     "- Okay, let's check with Stochastic Gradient Descent, i.e. $minibatchSize = 1$"
1441 |    ]
1442 |   },
1443 |   {
1444 |    "cell_type": "code",
1445 |    "execution_count": 48,
1446 |    "metadata": {},
1447 |    "outputs": [
1448 |     {
1449 |      "name": "stdout",
1450 |      "output_type": "stream",
1451 |      "text": [
1452 |       "Epoch 1 of 1\n",
1453 |       "Training accuracy:\n",
1454 |       "44816 out of 60000 : 0.7469333333333333\n",
1455 |       "Test accuracy:\n",
1456 |       "7539 out of 10000 : 0.7539\n",
1457 |       "Time: 21.746292114257812 seconds\n"
1458 |      ]
1459 |     }
1460 |    ],
1461 |    "source": [
1462 |     "# Train the network ONCE using Stochastic Gradient Descent to check accuracy and time\n",
1463 |     "\n",
1464 |     "# Re-initialize weights\n",
1465 |     "weights = [np.array(w) for w in initialWeights]\n",
1466 |     "\n",
1467 |     "# Set options of stochastic gradient descent\n",
1468 |     "minibatchSize = 1\n",
1469 |     "nEpochs = 1\n",
1470 |     "learningRate = 3.0\n",
1471 |     "\n",
1472 |     "# Start time\n",
1473 |     "start = time.time()\n",
1474 |     "\n",
1475 |     "# Train\n",
1476 |     "trainUsingMinibatchGD(weights, x_train, y_train, minibatchSize, nEpochs, learningRate)\n",
1477 |     "\n",
1478 |     "# End time\n",
1479 |     "end = time.time()\n",
1480 |     "\n",
1481 |     "# Evaluate accuracy\n",
1482 |     "print(\"Training accuracy:\")\n",
1483 |     "evaluate(weights, x_train, y_train)\n",
1484 |     "print(\"Test accuracy:\")\n",
1485 |     "evaluate(weights, x_test, y_test)\n",
1486 |     "\n",
1487 |     "# Print time taken\n",
1488 |     "print(\"Time: \"+str(end-start)+\" seconds\")"
1489 |    ]
1490 |   },
1491 |   {
1492 |    "cell_type": "markdown",
1493 |    "metadata": {},
1494 |    "source": [
1495 |     "Stochastic Gradient Descent took more time, but gave much better accuracy in just 1 epoch."
1496 |    ]
1497 |   },
1498 |   {
1499 |    "cell_type": "markdown",
1500 |    "metadata": {},
1501 |    "source": [
1502 |     "- Let's now check for Mini-batch Gradient Descent, with $minibatchSize = $ (say) $10$"
1503 |    ]
1504 |   },
1505 |   {
1506 |    "cell_type": "code",
1507 |    "execution_count": 49,
1508 |    "metadata": {},
1509 |    "outputs": [
1510 |     {
1511 |      "name": "stdout",
1512 |      "output_type": "stream",
1513 |      "text": [
1514 |       "Epoch 1 of 1\n",
1515 |       "Training accuracy:\n",
1516 |       "52428 out of 60000 : 0.8738\n",
1517 |       "Test accuracy:\n",
1518 |       "8752 out of 10000 : 0.8752\n",
1519 |       "Time: 4.0647711753845215 seconds\n"
1520 |      ]
1521 |     }
1522 |    ],
1523 |    "source": [
1524 |     "# Train the network ONCE using Mini-batch Gradient Descent to check accuracy and time\n",
1525 |     "\n",
1526 |     "# Re-initialize weights\n",
1527 |     "weights = [np.array(w) for w in initialWeights]\n",
1528 |     "\n",
1529 |     "# Set options of mini-batch gradient descent\n",
1530 |     "minibatchSize = 10\n",
1531 |     "nEpochs = 1\n",
1532 |     "learningRate = 3.0\n",
1533 |     "\n",
1534 |     "# Start time\n",
1535 |     "start = time.time()\n",
1536 |     "\n",
1537 |     "# Train\n",
1538 |     "trainUsingMinibatchGD(weights, x_train, y_train, minibatchSize, nEpochs, learningRate)\n",
1539 |     "\n",
1540 |     "# End time\n",
1541 |     "end = time.time()\n",
1542 |     "\n",
1543 |     "# Evaluate accuracy\n",
1544 |     "print(\"Training accuracy:\")\n",
1545 |     "evaluate(weights, x_train, y_train)\n",
1546 |     "print(\"Test accuracy:\")\n",
1547 |     "evaluate(weights, x_test, y_test)\n",
1548 |     "\n",
1549 |     "# Print time taken\n",
1550 |     "print(\"Time: \"+str(end-start)+\" seconds\")"
1551 |    ]
1552 |   },
1553 |   {
1554 |    "cell_type": "markdown",
1555 |    "metadata": {},
1556 |    "source": [
1557 |     "Thus, (in 1 epoch) Mini-batch Gradient descent gives comparable accuracy to Stochastic Gradient Descent, which is much better than the accuracy given by Batch Gradient Descent, in much lesser time."
1558 |    ]
1559 |   },
1560 |   {
1561 |    "cell_type": "markdown",
1562 |    "metadata": {},
1563 |    "source": [
1564 |     "## Classifying MNIST data set\n",
1565 |     "\n",
1566 |     "Let us try to classify the MNIST data set up to more than 99%. This means deciding the number of layers, size of each layer, number of Epochs, the mini-batch size, and the learning (constant, for now).\n",
1567 |     "\n",
1568 |     "Let us try, $layers = [784$ (input layer, because each MNIST image is 28$x$28)$, 30$ (hidden layer)$, 10$ (outputs layer, one neuron for each digit)$], nEpochs = 30, minibatchSize = 10, learningRate = 3.0$"
1569 |    ]
1570 |   },
1571 |   {
1572 |    "cell_type": "code",
1573 |    "execution_count": 55,
1574 |    "metadata": {},
1575 |    "outputs": [
1576 |     {
1577 |      "name": "stdout",
1578 |      "output_type": "stream",
1579 |      "text": [
1580 |       "Epoch 1 of 50\n",
1581 |       "Epoch 2 of 50\n",
1582 |       "Epoch 3 of 50\n",
1583 |       "Epoch 4 of 50\n",
1584 |       "Epoch 5 of 50\n",
1585 |       "Epoch 6 of 50\n",
1586 |       "Epoch 7 of 50\n",
1587 |       "Epoch 8 of 50\n",
1588 |       "Epoch 9 of 50\n",
1589 |       "Epoch 10 of 50\n",
1590 |       "Epoch 11 of 50\n",
1591 |       "Epoch 12 of 50\n",
1592 |       "Epoch 13 of 50\n",
1593 |       "Epoch 14 of 50\n",
1594 |       "Epoch 15 of 50\n",
1595 |       "Epoch 16 of 50\n",
1596 |       "Epoch 17 of 50\n",
1597 |       "Epoch 18 of 50\n",
1598 |       "Epoch 19 of 50\n",
1599 |       "Epoch 20 of 50\n",
1600 |       "Epoch 21 of 50\n",
1601 |       "Epoch 22 of 50\n",
1602 |       "Epoch 23 of 50\n",
1603 |       "Epoch 24 of 50\n",
1604 |       "Epoch 25 of 50\n",
1605 |       "Epoch 26 of 50\n",
1606 |       "Epoch 27 of 50\n",
1607 |       "Epoch 28 of 50\n",
1608 |       "Epoch 29 of 50\n",
1609 |       "Epoch 30 of 50\n",
1610 |       "Epoch 31 of 50\n",
1611 |       "Epoch 32 of 50\n",
1612 |       "Epoch 33 of 50\n",
1613 |       "Epoch 34 of 50\n",
1614 |       "Epoch 35 of 50\n",
1615 |       "Epoch 36 of 50\n",
1616 |       "Epoch 37 of 50\n",
1617 |       "Epoch 38 of 50\n",
1618 |       "Epoch 39 of 50\n",
1619 |       "Epoch 40 of 50\n",
1620 |       "Epoch 41 of 50\n",
1621 |       "Epoch 42 of 50\n",
1622 |       "Epoch 43 of 50\n",
1623 |       "Epoch 44 of 50\n",
1624 |       "Epoch 45 of 50\n",
1625 |       "Epoch 46 of 50\n",
1626 |       "Epoch 47 of 50\n",
1627 |       "Epoch 48 of 50\n",
1628 |       "Epoch 49 of 50\n",
1629 |       "Epoch 50 of 50\n",
1630 |       "Training accuracy:\n",
1631 |       "58180 out of 60000 : 0.9696666666666667\n",
1632 |       "Test accuracy:\n",
1633 |       "9397 out of 10000 : 0.9397\n"
1634 |      ]
1635 |     }
1636 |    ],
1637 |    "source": [
1638 |     "# TRAIN A NETWORK TO CLASSIFY MNIST\n",
1639 |     "\n",
1640 |     "# Initialize network\n",
1641 |     "layers = [784, 30, 10]\n",
1642 |     "weights = initializeWeights(layers)\n",
1643 |     "\n",
1644 |     "# Take backup of weights to be used later for comparison\n",
1645 |     "initialWeights = [np.array(w) for w in weights]\n",
1646 |     "\n",
1647 |     "# Set options of mini-batch gradient descent\n",
1648 |     "minibatchSize = 10\n",
1649 |     "nEpochs = 50\n",
1650 |     "learningRate = 3.0\n",
1651 |     "\n",
1652 |     "# Train\n",
1653 |     "trainUsingMinibatchGD(weights, x_train, y_train, minibatchSize, nEpochs, learningRate)\n",
1654 |     "\n",
1655 |     "# Evaluate accuracy\n",
1656 |     "print(\"Training accuracy:\")\n",
1657 |     "evaluate(weights, x_train, y_train)\n",
1658 |     "print(\"Test accuracy:\")\n",
1659 |     "evaluate(weights, x_test, y_test)"
1660 |    ]
1661 |   },
1662 |   {
1663 |    "cell_type": "markdown",
1664 |    "metadata": {},
1665 |    "source": [
1666 |     "About 93%-95%.. What if we increase the mini-batch size?"
1667 |    ]
1668 |   },
1669 |   {
1670 |    "cell_type": "code",
1671 |    "execution_count": 59,
1672 |    "metadata": {},
1673 |    "outputs": [
1674 |     {
1675 |      "name": "stdout",
1676 |      "output_type": "stream",
1677 |      "text": [
1678 |       "Epoch 1 of 10\n",
1679 |       "Epoch 2 of 10\n",
1680 |       "Epoch 3 of 10\n",
1681 |       "Epoch 4 of 10\n",
1682 |       "Epoch 5 of 10\n",
1683 |       "Epoch 6 of 10\n",
1684 |       "Epoch 7 of 10\n",
1685 |       "Epoch 8 of 10\n",
1686 |       "Epoch 9 of 10\n",
1687 |       "Epoch 10 of 10\n",
1688 |       "Training accuracy:\n",
1689 |       "53245 out of 60000 : 0.8874166666666666\n",
1690 |       "Test accuracy:\n",
1691 |       "8846 out of 10000 : 0.8846\n"
1692 |      ]
1693 |     }
1694 |    ],
1695 |    "source": [
1696 |     "# TRAIN A NETWORK TO CLASSIFY MNIST\n",
1697 |     "\n",
1698 |     "# Initialize network\n",
1699 |     "layers = [784, 10, 10, 10]\n",
1700 |     "weights = initializeWeights(layers)\n",
1701 |     "\n",
1702 |     "# Take backup of weights to be used later for comparison\n",
1703 |     "initialWeights = [np.array(w) for w in weights]\n",
1704 |     "\n",
1705 |     "# Set options of mini-batch gradient descent\n",
1706 |     "minibatchSize = 10\n",
1707 |     "nEpochs = 30\n",
1708 |     "learningRate = 3.0\n",
1709 |     "\n",
1710 |     "# Train\n",
1711 |     "trainUsingMinibatchGD(weights, x_train, y_train, minibatchSize, nEpochs, learningRate)\n",
1712 |     "\n",
1713 |     "# Evaluate accuracy\n",
1714 |     "print(\"Training accuracy:\")\n",
1715 |     "evaluate(weights, x_train, y_train)\n",
1716 |     "print(\"Test accuracy:\")\n",
1717 |     "evaluate(weights, x_test, y_test)"
1718 |    ]
1719 |   },
1720 |   {
1721 |    "cell_type": "markdown",
1722 |    "metadata": {
1723 |     "collapsed": true
1724 |    },
1725 |    "source": [
1726 |     "## Coming up next\n",
1727 |     "\n",
1728 |     "In the next tutorial, we shall see the different types of optimizations that can be done in gradient descent, and compare their performances."
1729 |    ]
1730 |   }
1731 |  ],
1732 |  "metadata": {
1733 |   "kernelspec": {
1734 |    "display_name": "Python 3",
1735 |    "language": "python",
1736 |    "name": "python3"
1737 |   },
1738 |   "language_info": {
1739 |    "codemirror_mode": {
1740 |     "name": "ipython",
1741 |     "version": 3
1742 |    },
1743 |    "file_extension": ".py",
1744 |    "mimetype": "text/x-python",
1745 |    "name": "python",
1746 |    "nbconvert_exporter": "python",
1747 |    "pygments_lexer": "ipython3",
1748 |    "version": "3.5.1"
1749 |   }
1750 |  },
1751 |  "nbformat": 4,
1752 |  "nbformat_minor": 2
1753 | }
1754 | 


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
  1 | # Neural Network in Python
  2 | 
  3 | An implementation of a Multi-Layer Perceptron, with forward propagation, back propagation using Gradient Descent, training usng Batch or Stochastic Gradient Descent
  4 | 
  5 | Use: myNN = MyPyNN(nOfInputDims, nOfHiddenLayers, sizesOfHiddenLayers, nOfOutputDims, alpha, regLambda)
  6 | Here, alpha = learning rate of gradient descent, regLambda = regularization parameter
  7 | 
  8 | ## Example 1
  9 | 
 10 | ```
 11 | from myPyNN import *
 12 | X = [0, 0.5, 1]
 13 | y = [0, 0.5, 1]
 14 | myNN = MyPyNN([1, 1, 1]]
 15 | ```
 16 | Input Layer    : 1-dimensional (Eg: X)
 17 | 
 18 | 1 Hidden Layer : 1-dimensional
 19 | 
 20 | Output Layer   : 1-dimensional (Eg. y)
 21 | 
 22 | Learning Rate  : 0.05 (default)
 23 | ``` 
 24 | print myNN.predict(0.2)
 25 | ```
 26 | 
 27 | 
 28 | ## Example 2
 29 | ```
 30 | X = [[0,0], [1,1]]
 31 | y = [0, 1]
 32 | myNN = MyPyNN([2, 3, 1])
 33 | ```
 34 | Input Layer    : 2-dimensional (Eg: X)
 35 | 
 36 | 1 Hidden Layer : 3-dimensional
 37 | 
 38 | Output Layer   : 1-dimensional (Eg. y)
 39 | 
 40 | Learning rate  : 0.8
 41 | ``` 
 42 | print myNN.predict(X)
 43 | #myNN.trainUsingGD(X, y, 899)
 44 | myNN.trainUsingSGD(X, y, 1000)
 45 | print myNN.predict(X)
 46 | ```
 47 | 
 48 | ## Example 3
 49 | 
 50 | ```
 51 | X = [[2,2,2], [3,3,3], [4,4,4], [5,5,5], [6,6,6], [7,7,7], [8,8,8], [9,9,9], [10,10,10], [11,11,11]]
 52 | y = [.2, .3, .4, .5, .6, .7, .8, .9, 0, .1]
 53 | myNN = MyPyNN([3, 10, 10, 5, 1])
 54 | ```
 55 | Input Layer    : 3-dimensional (Eg: X)
 56 | 
 57 | 3 Hidden Layers: 10-dimensional, 10-dimensional, 5-dimensional
 58 | 
 59 | Output Layer   : 1-dimensional (Eg. y)
 60 | 
 61 | Learning rate  : 0.9
 62 | 
 63 | Regularization parameter : 0.5
 64 | ``` 
 65 | print myNN.predict(X)
 66 | #myNN.trainUsingGD(X, y, 899)
 67 | myNN.trainUsingSGD(X, y, 1000)
 68 | print myNN.predict(X)
 69 | ```
 70 | 
 71 | ## Requirements for interactive tutorial (myPyNN.ipynb)
 72 | 
 73 | I ran this in OS X, after installing brew for command-line use, and pip for python-related stuff.
 74 | 
 75 | ### Python
 76 | 
 77 | I designed the tutorial on Python 2.7, can be run on Python 3 as well.
 78 | 
 79 | ### Packages
 80 | 
 81 | - numpy
 82 | - matplotlib
 83 | - ipywidgets
 84 | 
 85 | ### Jupyter
 86 | 
 87 | The tutorial is an iPython notebook. It is designed and meant to run in Jupyter. To install Jupyter, one can install Anaconda which would install Python, Jupyter, along with a lot of other stuff. Or, one can install only Jupyter using:
 88 | ```
 89 | pip install jupyter
 90 | ```
 91 | 
 92 | ### ipywidgets
 93 | 
 94 | ipywidgets comes pre-installed with Jupyter. However, widgets might need to be actived using:
 95 | ```
 96 | jupyter nbextension enable --py widgetsnbextension
 97 | jupyter nbextension enable --py --sys-prefix widgetsnbextension
 98 | ```
 99 | 
100 | ## References
101 | - [Machine Learning Mastery's excellent tutorial](https://machinelearningmastery.com/implement-backpropagation-algorithm-scratch-python/)
102 | 
103 | - [Mattmazur's example](https://mattmazur.com/2015/03/17/a-step-by-step-backpropagation-example/)
104 | 
105 | - [Welch Lab's excellent video playlist on neural networks](https://www.youtube.com/playlist?list=PLiaHhY2iBX9hdHaRr6b7XevZtgZRa1PoU)
106 | 
107 | - [Michael Nielsen's brilliant hands-on interactive tutorial on the awesome power of neural networks as universal approximators](https://neuralnetworksanddeeplearning.com/chap4.html)
108 | 
109 | - [Excellent overview of gradient descent algorithms](http://sebastianruder.com/optimizing-gradient-descent/)
110 | 
111 | - [CS321n's iPython tutorial](https://cs231n.github.io/ipython-tutorial/)
112 | 
113 | - [Karlijn Willem's definitive Jupyter guide](https://www.datacamp.com/community/tutorials/tutorial-jupyter-notebook#gs.SJPul58)
114 | 
115 | - [matplotlib](https://matplotlib.org/)
116 | 
117 | - [Tutorial on using Matplotlib in Jupyter](https://nbviewer.jupyter.org/github/jrjohansson/scientific-python-lectures/blob/master/Lecture-4-Matplotlib.ipynb)
118 | 
119 | - [Interactive dashboards in Jupyter](https://blog.dominodatalab.com/interactive-dashboards-in-jupyter/)
120 | 
121 | - [ipywidgets - for interactive dashboards in Jupyter](http://ipywidgets.readthedocs.io/)
122 | 
123 | - [drawing-animating-shapes-matplotlib](https://nickcharlton.net/posts/drawing-animating-shapes-matplotlib.html)
124 | 
125 | - [RISE - for Jupyter presentations](https://github.com/damianavila/RISE)
126 | 
127 | - [MathJax syntax list](https://math.meta.stackexchange.com/questions/5020/mathjax-basic-tutorial-and-quick-reference)
128 | 
129 | - [MNIST dataset and results](http://yann.lecun.com/exdb/mnist/)
130 | 
131 | - [MNIST dataset .npz file (Amazon AWS)](https://s3.amazonaws.com/img-datasets/mnist.npz)
132 | 
133 | - [NpzFile doc](http://docr.it/numpy/lib/npyio/NpzFile)
134 | 
135 | - [matplotlib examples from SciPy](http://scipython.com/book/chapter-7-matplotlib/examples/simple-surface-plots/)
136 | 
137 | - [Yann LeCun's backprop paper, containing tips for efficient backpropagation](http://yann.lecun.com/exdb/publis/pdf/lecun-98b.pdf)
138 | 
139 | - [Mathematical notations for LaTeX, which can also be used in Jupyter](https://en.wikibooks.org/wiki/LaTeX/Mathematics)
140 | 
141 | - [JupyterHub](http://jupyterhub.readthedocs.io/en/latest/getting-started.html)
142 | 
143 | - [Optional code visibility in iPython notebooks](https://chris-said.io/2016/02/13/how-to-make-polished-jupyter-presentations-with-optional-code-visibility/)
144 | 
145 | - [Ultimate iPython notebook tips](https://blog.juliusschulz.de/blog/ultimate-ipython-notebook)
146 | 
147 | - [Full preprocessing for medical images tutorial](https://www.kaggle.com/gzuidhof/data-science-bowl-2017/full-preprocessing-tutorial)
148 | 
149 | - [Example ConvNet for a kaggle problem (cats vs dogs)](https://www.kaggle.com/sentdex/dogs-vs-cats-redux-kernels-edition/full-classification-example-with-convnet)
150 | 
151 | - Fernando Pérez, Brian E. Granger, IPython: A System for Interactive Scientific Computing, Computing in Science and Engineering, vol. 9, no. 3, pp. 21-29, May/June 2007, doi:10.1109/MCSE.2007.53. URL: (http://ipython.org)
152 | 
153 | 


--------------------------------------------------------------------------------
/images/Title_ANN.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/voletiv/myPythonNeuralNetwork/e215c6f20ca0b2d7aa947f956049110f4e60b094/images/Title_ANN.png


--------------------------------------------------------------------------------
/images/digitsNN.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/voletiv/myPythonNeuralNetwork/e215c6f20ca0b2d7aa947f956049110f4e60b094/images/digitsNN.png


--------------------------------------------------------------------------------
/images/optimizers.gif:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/voletiv/myPythonNeuralNetwork/e215c6f20ca0b2d7aa947f956049110f4e60b094/images/optimizers.gif


--------------------------------------------------------------------------------
/myPyNN.py:
--------------------------------------------------------------------------------
  1 | import numpy as np
  2 | DEBUG = 0
  3 | 
  4 | class MyPyNN(object):
  5 | 
  6 |     def __init__(self, layers=[3, 4, 2]):
  7 | 
  8 |         self.layers = layers
  9 | 
 10 |         # Network
 11 |         self.weights = [np.random.randn(x+1, y) 
 12 |                         for x, y in zip(self.layers[:-1], self.layers[1:])]
 13 | 
 14 |         # For mean-centering
 15 |         self.meanX = np.zeros((1, self.layers[0]))
 16 | 
 17 |         # Default options
 18 |         self.learningRate = 1.0
 19 |         self.regLambda = 0
 20 |         self.adaptLearningRate = False
 21 |         self.normalizeInputs = False
 22 |         self.meanCentering = False
 23 |         self.visible = False
 24 | 
 25 |     def predict(self, X, visible=False):
 26 |         self.visible = visible
 27 |         # mean-centering
 28 |         inputs = self.preprocessTestingInputs(X) - self.meanX
 29 | 
 30 |         if inputs.ndim!=1 and inputs.ndim!=2:
 31 |             print "X is not one or two dimensional, please check."
 32 |             return
 33 | 
 34 |         if DEBUG or self.visible:
 35 |             print "PREDICT:"
 36 |             print inputs
 37 | 
 38 |         for l, w in enumerate(self.weights):
 39 |             inputs = self.addBiasTerms(inputs)
 40 |             inputs = self.sigmoid(np.dot(inputs, w))
 41 |             if DEBUG or self.visible:
 42 |                 print "Layer "+str(l+1)
 43 |                 print inputs
 44 |         
 45 |         return inputs
 46 | 
 47 |     def trainUsingMinibatchGD(self, X, y, nEpochs=1000, minibatchSize=100,
 48 |                         learningRate=0.05, regLambda=0, adaptLearningRate=False, 
 49 |                         normalizeInputs=False, meanCentering=False, 
 50 |                         printTestAccuracy=False, testX=None, testY=None, 
 51 |                         visible=False):
 52 |         self.learningRate = float(learningRate)
 53 |         self.regLambda = regLambda
 54 |         self.adaptLearningRate = adaptLearningRate
 55 |         self.normalizeInputs = normalizeInputs
 56 |         self.meanCentering = meanCentering
 57 |         self.visible = visible
 58 |         
 59 |         X = self.preprocessTrainingInputs(X)
 60 |         y = self.preprocessOutputs(y)
 61 |         
 62 |         yPred = self.predict(X, visible=self.visible)
 63 |         
 64 |         if yPred.shape != y.shape:
 65 |             print "Shape of y ("+str(y.shape)+") does not match what shape of y is supposed to be: "+str(yPred.shape)
 66 |             return
 67 |         
 68 |         self.trainAccuracy = (np.sum([np.argmax(yPred[k])==np.argmax(y[k])
 69 |                             for k in range(len(y))])).astype(float)/len(y)
 70 |         print "train accuracy = " + str(self.trainAccuracy)
 71 |         
 72 |         self.prevCost = 0.5*np.sum((yPred-y)**2)/len(y)
 73 |         print "cost = " + str(self.prevCost)
 74 |         self.cost = self.prevCost
 75 | 
 76 |         # mean-centering
 77 |         if self.meanCentering:
 78 |             X = X - self.meanX
 79 |         else:
 80 |             X = X
 81 | 
 82 |         self.inputs = X
 83 |         
 84 |         if DEBUG or self.visible:
 85 |             print "train input:"+str(inputs)
 86 | 
 87 |         # Just to ensure minibatchSize !> len(X)
 88 |         if minibatchSize > len(X):
 89 |             minibatchSize = int(len(X)/10)+1
 90 | 
 91 |         # Test data
 92 |         if printTestAccuracy:
 93 |             if testX==None and testY==None:
 94 |                 print "No test data given"
 95 |                 testX = np.zeros((1, len(X)))
 96 |                 testY = np.zeros((1,1))
 97 |             elif testX==None or testY==None:
 98 |                 print "One of testData not available"
 99 |                 return
100 |             else:
101 |                 testX = self.preprocessTrainingInputs(testX)
102 |                 testY = self.preprocessOutputs(testY)
103 |             if len(testX)!=len(testY):
104 |                 print "Test Datas not of same length"
105 |                 return
106 |             
107 |             yTestPred = self.predict(testX, visible=self.visible)
108 |             self.testAccuracy = np.sum([np.argmax(yTestPred[k])==np.argmax(testY[k])
109 |                                 for k in range(len(testY))])/float(len(testY))
110 |             print "test accuracy = " + str(self.testAccuracy)
111 | 
112 |         # Randomly initialize old weights (for adaptive learning), will copy values later 
113 |         if adaptLearningRate:
114 |             self.oldWeights = [np.random.randn(i+1, j) 
115 |                     for i, j in zip(self.layers[:-1], self.layers[1:])]
116 |         
117 |         # For each epoch
118 |         for i in range(nEpochs):
119 |             
120 |             print "Epoch "+str(i)+" of "+str(nEpochs)
121 |             
122 |             ## Find minibatches
123 |             # Generate list of indices of full training data
124 |             fullIdx = list(range(len(X)))
125 |             # Shuffle the list
126 |             np.random.shuffle(fullIdx)
127 |             # Make list of mininbatches
128 |             minibatches = [fullIdx[k:k+minibatchSize] 
129 |                             for k in xrange(0, len(X), minibatchSize)]
130 | 
131 |             # For each minibatch
132 |             for mininbatch in mininbatches:
133 |                 # Find X and y for each minibatch
134 |                 miniX = X[idx]
135 |                 miniY = y[idx]
136 |                 
137 |                 # Forward propagate through miniX
138 |                 a = self.forwardProp(miniX)
139 |                 
140 |                 # Check if Forward Propagation was successful
141 |                 if a==False:
142 |                     return
143 | 
144 |                 # Save old weights before backProp in case of adaptLR
145 |                 if adaptLearningRate:
146 |                     for i in range(len(self.weights)):
147 |                         self.oldWeights[i] = np.array(self.weights[i])
148 | 
149 |                 # Back propagate, update weights for minibatch
150 |                 self.backPropGradDescent(miniX, miniY)
151 | 
152 |             yPred = self.predict(X, visible=self.visible)
153 | 
154 |             self.trainAccuracy = (np.sum([np.argmax(yPred[k])==np.argmax(y[k])
155 |                                 for k in range(len(y))])).astype(float)/len(y)
156 |             print "train accuracy = " + str(self.trainAccuracy)
157 |             if printTestAccuracy:
158 |                 yTestPred = self.predict(testX, visible=self.visible)
159 |                 self.testAccuracy = (np.sum([np.argmax(yTestPred[k])==np.argmax(testY[k])
160 |                                     for k in range(len(testY))])).astype(float)/len(testY)
161 |                 print "test accuracy = " + str(self.testAccuracy)
162 | 
163 |             self.cost = 0.5*np.sum((yPred-y)**2)/len(y)            
164 |             print "cost = " + str(self.cost)
165 |             
166 |             if adaptLearningRate:
167 |                 self.adaptLR()
168 |             
169 |             self.evaluate(X, y)
170 | 
171 |             self.prevCost = self.cost
172 | 
173 |     def forwardProp(self, inputs):
174 |         inputs = self.preprocessInputs(inputs)
175 |         print "Forward..."
176 | 
177 |         if inputs.ndim!=1 and inputs.ndim!=2:
178 |             print "Input argument " + str(inputs.ndim) + \
179 |                 "is not one or two dimensional, please check."
180 |             return False
181 | 
182 |         if (inputs.ndim==1 and len(inputs)!=self.layers[0]) or \
183 |             (inputs.ndim==2 and inputs.shape[1]!=self.layers[0]):
184 |             print "Input argument does not match input dimensions (" + \
185 |                 str(self.layers[0]) + ") of network."
186 |             return False
187 |         
188 |         if DEBUG or self.visible:
189 |             print inputs
190 | 
191 |         # Save the outputs of each layer
192 |         self.outputs = []
193 | 
194 |         # For each layer
195 |         for l, w in enumerate(self.weights):
196 |             # Add bias term to the input
197 |             inputs = self.addBiasTerms(inputs)
198 | 
199 |             # Calculate the output
200 |             self.outputs.append(self.sigmoid(np.dot(inputs, w)))
201 | 
202 |             # Set this as the input to the next layer
203 |             inputs = np.array(self.outputs[-1])
204 | 
205 |             if DEBUG or self.visible:
206 |                 print "Layer "+str(l+1)
207 |                 print "inputs: "+str(inputs)
208 |                 print "weights: "+str(w)
209 |                 print "output: "+str(inputs)
210 |         del inputs
211 | 
212 |         return True
213 | 
214 |     def backPropGradDescent(self, X, y):
215 |         print "...Backward"
216 | 
217 |         # Correct the formats of inputs and outputs
218 |         X = self.preprocessInputs(X)
219 |         y = self.preprocessOutputs(y)
220 | 
221 |         # Compute first error
222 |         bpError = self.outputs[-1] - y
223 | 
224 |         if DEBUG or self.visible:
225 |             print "error = self.outputs[-1] - y:"
226 |             print error
227 | 
228 |         # For each layer in reverse order (last layer to first layer)
229 |         for l, w in enumerate(reversed(self.weights)):
230 |             if DEBUG or self.visible:
231 |                 print "LAYER "+str(len(self.weights)-l)
232 |             
233 |             # The calculated output "z" of that layer
234 |             predOutputs = self.outputs[-l-1]
235 | 
236 |             if DEBUG or self.visible:
237 |                 print "predOutputs"
238 |                 print predOutputs
239 | 
240 |             # delta = error*(z*(1-z)) === nxneurons
241 |             delta = np.multiply(error, np.multiply(predOutputs, 1 - predOutputs))
242 | 
243 |             if DEBUG or self.visible:
244 |                 print "To compute error to be backpropagated:"
245 |                 print "del = predOutputs*(1 - predOutputs)*error :"
246 |                 print delta
247 |                 print "weights:"
248 |                 print w
249 | 
250 |             # Compute new error to be propagated back (bias term neglected in backpropagation)
251 |             bpError = np.dot(delta, w[1:,:].T)
252 | 
253 |             if DEBUG or self.visible:
254 |                 print "backprop error = np.dot(del, w[1:,:].T) :"
255 |                 print error
256 | 
257 |             # If we are at first layer, inputs are data points
258 |             if l==len(self.weights)-1:
259 |                 inputs = self.addBiasTerms(X)
260 |             # Else, inputs === outputs from previous layer
261 |             else:
262 |                 inputs = self.addBiasTerms(self.outputs[-l-2])
263 |             
264 |             if DEBUG or self.visible:
265 |                 print "To compute errorTerm:"
266 |                 print "inputs:"
267 |                 print inputs
268 |                 print "del:"
269 |                 print delta
270 | 
271 |             # errorTerm = (inputs.T).*(delta)/n
272 |             # delta === nxneurons, inputs === nxprev, W === prevxneurons
273 |             errorTerm = np.dot(inputs.T, delta)/len(y)
274 |             if errorTerm.ndim==1:
275 |                 errorTerm.reshape((len(errorTerm), 1))
276 | 
277 |             if DEBUG or self.visible:
278 |                 print "errorTerm = np.dot(inputs.T, del) :"
279 |                 print errorTerm
280 |             
281 |             # regularization term
282 |             regWeight = np.zeros(w.shape)
283 |             regWeight[1:,:] = self.regLambda #bias term neglected
284 | 
285 |             if DEBUG or self.visible:
286 |                 print "To update weights:"
287 |                 print "learningRate*errorTerm:"
288 |                 print self.learningRate*errorTerm
289 |                 print "regWeight:"
290 |                 print regWeight
291 |                 print "weights:"
292 |                 print w
293 |                 print "regTerm = regWeight*w :"
294 |                 print regWeight*w
295 | 
296 |             # Update weights
297 |             self.weights[-l-1] = w - \
298 |                 (self.learningRate*errorTerm + np.multiply(regWeight,w))
299 |             
300 |             if DEBUG or self.visible:
301 |                 print "Updated 'weights' = learningRate*errorTerm + regTerm :"
302 |                 print self.weights[len(self.weights)-l-1]
303 | 
304 |     def adaptLR(self):
305 |         if self.cost > self.prevCost:
306 |             print "Cost increased!!"
307 |             self.learningRate /= 2.0
308 |             print "   - learningRate halved to: "+str(self.learningRate)
309 |             for i in range(len(self.weights)):
310 |                 self.weights[i] = self.oldWeights[i]
311 |             print "   - weights reverted back"
312 |         # good function
313 |         else:
314 |             self.learningRate *= 1.05
315 |             print "   - learningRate increased by 5% to: "+str(self.learningRate)
316 | 
317 |     def preprocessTrainingInputs(self, X):
318 |         X = self.preprocessInputs(X)
319 |         if self.normalizeInputs and np.max(X) > 1.0:
320 |             X = X/255.0
321 |         if np.all(self.meanX == np.zeros((1, self.layers[0]))) and self.meanCentering:
322 |             self.meanX = np.reshape(np.mean(X, axis=0), (1, X.shape[1]))
323 |         return X
324 | 
325 |     def preprocessTestingInputs(self, X):
326 |         X = self.preprocessInputs(X)
327 |         if self.normalizeInputs and np.max(X) > 1.0:
328 |             X = X/255.0
329 |         return X
330 | 
331 |     def preprocessInputs(self, X):
332 |         X = np.array(X, dtype=float)
333 |         # if X is int
334 |         if X.ndim==0:
335 |             X = np.array([X])
336 |         # if X is 1D
337 |         if X.ndim==1:
338 |             if self.layers[0]==1: #if ndim=1
339 |                 X = np.reshape(X, (len(X),1))
340 |             else: #if X is only 1 nd-ndimensional vector
341 |                 X = np.reshape(X, (1,len(X)))
342 |         return X
343 | 
344 |     def preprocessOutputs(self, Y):
345 |         Y = np.array(Y, dtype=float)
346 |         # if Y is int
347 |         if Y.ndim==0:
348 |             Y = np.array([Y])
349 |         # if Y is 1D
350 |         if Y.ndim==1:
351 |             if self.layers[-1]==1:
352 |                 Y = np.reshape(Y, (len(Y),1))
353 |             else:
354 |                 Y = np.reshape(Y, (1,len(Y)))
355 |         return Y
356 | 
357 |     def addBiasTerms(self, X):
358 |         if X.ndim==0 or X.ndim==1:
359 |             X = np.insert(X, 0, 1)
360 |         elif X.ndim==2:
361 |             X = np.insert(X, 0, 1, axis=1)
362 |         return X
363 | 
364 |     def sigmoid(self, z):
365 |         return 1/(1 + np.exp(-z))
366 | 
367 |     def evaluate(self, X, Y):
368 |         yPreds = forwardProp(X, self.weights)[-1]
369 |         test_results = [(np.argmax(yPreds[i]), np.argmax(Y[i]))
370 |                             for i in range(len(Y))]
371 |         yes = sum(int(x == y) for (x, y) in test_results)
372 |         print(str(yes)+" out of "+str(len(Y)))
373 | 
374 |     def loadMNISTData(self, path='/Users/vikram.v/Downloads/mnist.npz'):
375 |         # Use numpy.load() to load the .npz file
376 |         f = np.load(path)
377 | 
378 |         # To check files stored in .npz file
379 |         f.files
380 | 
381 |         # Saving the files
382 |         x_train = f['x_train']
383 |         y_train = f['y_train']
384 |         x_test = f['x_test']
385 |         y_test = f['y_test']
386 |         f.close()
387 | 
388 |         # Preprocess inputs
389 |         x_train_new = np.array([x.flatten() for x in x_train])
390 |         y_train_new = np.zeros((len(y_train), 10))
391 |         for i in range(len(y_train)):
392 |             y_train_new[i][y_train[i]] = 1
393 | 
394 |         x_test_new = np.array([x.flatten() for x in x_test])
395 |         y_test_new = np.zeros((len(y_test), 10))
396 |         for i in range(len(y_test)):
397 |             y_test_new[i][y_test[i]] = 1
398 | 
399 |         return [x_train_new, y_train_new, x_test_new, y_test_new]
400 | 


--------------------------------------------------------------------------------
/myPyNNTest.py:
--------------------------------------------------------------------------------
  1 | from myPyNN import *
  2 | 
  3 | # RANDOM
  4 | X = [[2,2,2], [3,3,3], [4,4,4], [5,5,5], [6,6,6], [7,7,7], [8,8,8], [9,9,9], [10,10,10], [11,11,11]]
  5 | y = [.2, .3, .4, .5, .6, .7, .8, .9, 0, .1]
  6 | myNN = MyPyNN([3, 10, 1])
  7 | 
  8 | 
  9 | # MANUAL CALCULATIONS TO CHECK NETWORK
 10 | def addBiasTerms(X):
 11 |     if X.ndim==0 or X.ndim==1:
 12 |         X = np.insert(X, 0, 1)
 13 |     elif X.ndim==2:
 14 |         X = np.insert(X, 0, 1, axis=1)
 15 |     return X
 16 | 
 17 | def sigmoid(z):
 18 |     return 1/(1 + np.exp(-z))
 19 | 
 20 | X = np.array([[0,0], [0,1], [1,0], [1,1]])
 21 | y = np.array([[0], [1], [1], [1]])
 22 | myNN = MyPyNN([2, 1, 1])
 23 | lr = 1.5
 24 | nIterations = 1
 25 | W01 = myNN.weights[0]
 26 | W02 = myNN.weights[1]
 27 | W1 = W01
 28 | W2 = W02
 29 | X = X.astype('float')
 30 | inputs = X - np.reshape(np.mean(X, axis=0), (1, X.shape[1]))
 31 | for i in range(nIterations):
 32 |     yPred = sigmoid(np.dot(addBiasTerms(sigmoid(np.dot(addBiasTerms(inputs), W1))), W2))
 33 |     err2 = yPred - y
 34 |     output1 = sigmoid(np.dot(addBiasTerms(inputs), W1))
 35 |     del2 = np.multiply(np.multiply(yPred, (1-yPred)), err2)
 36 |     err1 = np.dot(del2, W2[1:].T)
 37 |     deltaW2 = lr*np.dot(addBiasTerms(output1).T, del2)/len(yPred)
 38 |     newW2 = W2 - deltaW2
 39 |     del1 = np.multiply(np.multiply(output1, 1-output1), err1)
 40 |     deltaW1 = lr*np.dot(addBiasTerms(inputs).T, del1)/len(yPred)
 41 |     newW1 = W1 - deltaW1
 42 |     W1 = newW1
 43 |     W2 = newW2
 44 | 
 45 | myNN.trainUsingGD(X, y, learningRate=lr, nIterations=nIterations, visible=True)
 46 | newW1 == myNN.weights[0]
 47 | newW2 == myNN.weights[1]
 48 | 
 49 | yPred == myNN.outputs[1]
 50 | output1 == myNN.outputs[0]
 51 | 
 52 | 
 53 | # COMPARING LEARNING RATES
 54 | myNN1 = MyPyNN([2, 3, 1])
 55 | myNN2 = MyPyNN([2, 3, 1])
 56 | myNN3 = MyPyNN([2, 3, 1])
 57 | myNN4 = MyPyNN([2, 3, 1])
 58 | myNN5 = MyPyNN([2, 3, 1])
 59 | myNN2.weights[0] = myNN1.weights[0]
 60 | myNN2.weights[1] = myNN1.weights[1]
 61 | myNN3.weights[0] = myNN1.weights[0]
 62 | myNN3.weights[1] = myNN1.weights[1]
 63 | myNN4.weights[0] = myNN1.weights[0]
 64 | myNN4.weights[1] = myNN1.weights[1]
 65 | myNN5.weights[0] = myNN1.weights[0]
 66 | myNN5.weights[1] = myNN1.weights[1]
 67 | myNN1.trainUsingGD(X, y, learningRate=0.1, nIterations=2500)
 68 | myNN2.trainUsingGD(X, y, learningRate=0.5, nIterations=600)
 69 | myNN3.trainUsingGD(X, y, learningRate=1, nIterations=400)
 70 | myNN4.trainUsingGD(X, y, learningRate=2, nIterations=200)
 71 | myNN5.trainUsingGD(X, y, learningRate=200, nIterations=1000)
 72 | 
 73 | 
 74 | # Make network
 75 | myNN = MyPyNN([784, 30, 10])
 76 | lr = 3
 77 | nIterations = 30
 78 | minibatchSize = 10
 79 | 
 80 | # MNIST DATA
 81 | '''
 82 | f = np.load(path)
 83 | 
 84 | # To check files stored in .npz file
 85 | f.files
 86 | 
 87 | # Saving the files
 88 | x_train = f['x_train']
 89 | y_train = f['y_train']
 90 | x_test = f['x_test']
 91 | y_test = f['y_test']
 92 | f.close()
 93 | 
 94 | # Preprocess inputs
 95 | x_train_new = np.array([x.flatten() for x in x_train])
 96 | y_train_new = np.zeros((len(y_train), 10))
 97 | for i in range(len(y_train)):
 98 |     y_train_new[i][y_train[i]] = 1
 99 | 
100 | x_test_new = np.array([x.flatten() for x in x_test])
101 | y_test_new = np.zeros((len(y_test), 10))
102 | for i in range(len(y_test)):
103 |     y_test_new[i][y_test[i]] = 1
104 | '''
105 | 
106 | [x_train_new, y_train_new, x_test_new, y_test_new] = myNN.loadMNISTData()
107 | 
108 | myNN.trainUsingGD(x_train_new, y_train_new, nIterations=nIterations, learningRate=lr)
109 | myNN.trainUsingMinibatchGD(x_train_new, y_train_new, nIterations=nIterations, minibatchSize=minibatchSize, learningRate=lr)
110 | myNN.trainUsingminibatchGD(x_train_new, y_train_new, nIterations=nIterations, minibatchSize=minibatchSize, learningRate=lr, printTestAccuracy=True, testX=x_test_new, testY=y_test_new)
111 | 
112 | # Make network
113 | myNN = MyPyNN([784, 5, 5, 10])
114 | lr = 1.5
115 | nIterations = 1000
116 | minibatchSize = 100
117 | myNN.trainUsingSGD(x_train_new, y_train_new, nIterations=nIterations, minibatchSize=minibatchSize, learningRate=lr)
118 | 
119 | # To check type of the dataset
120 | type(x_train)
121 | type(y_train)
122 | # To check data
123 | x_train.shape
124 | y_train.shape
125 | fig = plt.figure(figsize=(10, 2))
126 | for i in range(20):
127 |     ax1 = fig.add_subplot(2, 10, i+1)
128 |     ax1.imshow(x_train[i], cmap='gray');
129 |     ax1.axis('off')
130 | 
131 | 


--------------------------------------------------------------------------------