├── .gitignore
├── LICENSE
├── README.md
├── activations.py
├── main.py
├── network.py
└── requirements.txt

/.gitignore:
--------------------------------------------------------------------------------
# Compiled code files of modules
*.pyc

# IntelliJ project files
*.iml
.idea/*

# IPython Notebook checkpoints
.ipynb_checkpoints/*

# MNIST dataset directory
data/*

# Model files (compressed numpy binaries)
models/*
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
The MIT License (MIT)

Copyright (c) 2016 Karan Desai

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
MNIST Handwritten Digit Classifier
==================================

An implementation of a multilayer neural network using the `numpy` library. The
implementation is a modified version of Michael Nielsen's implementation in the
[Neural Networks and Deep Learning](http://neuralnetworksanddeeplearning.com/) book.


### Brief Background:

If you are familiar with the basics of neural networks, feel free to skip this section. For
total beginners who landed here before reading anything about neural networks:

![Sigmoid Neuron](http://i.imgur.com/dOkT9Y9.png)

* Neural networks are made up of building blocks known as **Sigmoid Neurons**. They are
named so because their output follows the [Sigmoid Function](https://en.wikipedia.org/wiki/Sigmoid_function).
* **xj** are the inputs, which are weighted by the **wj** weights, and the neuron has an
intrinsic bias **b**. The output of the neuron is known as its activation, **a**.

_**Note:** There are other activation functions in use besides sigmoid, but this
information is sufficient for beginners for now._

* A neural network is built by stacking layers of neurons, and is defined by the weights
of its connections and the biases of its neurons. Activations are the outputs the network
produces for a given input; a short sketch of a single neuron follows below.
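As a minimal, standalone illustration of the figure above (with made-up inputs and
weights; this snippet is not part of the repository's code), a single sigmoid neuron
computes its activation as:

```
import numpy as np

x = np.array([0.5, 0.8])      # inputs x_j
w = np.array([0.4, -0.6])     # weights w_j
b = 0.1                       # bias b

z = np.dot(w, x) + b          # weighted input z = w . x + b
a = 1.0 / (1.0 + np.exp(-z))  # sigmoid activation
print(a)                      # ~0.455
```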

### Why a modified implementation?

The book, and Stanford's Machine Learning course by Prof. Andrew Ng, are recommended as
good resources for beginners. At times it got confusing to me while referring to both
resources:

MATLAB has _1-indexed_ data structures, while `numpy` has them _0-indexed_. Some parameters
of a neural network are not defined for the input layer, which leads to a mismatch between
the mathematical equations in the book and the indices in code. For example, according to
the book, the bias vector of the second layer of the neural network is referred to as
`bias[0]`, since the input layer (first layer) has no bias vector. I found that a bit
inconvenient to work with.

I am fond of scikit-learn's API style, hence my class has a similar code structure. While
it theoretically resembles the book and Stanford's course, you can find simple methods such
as `fit`, `predict` and `validate` to train, test and validate the model respectively.


### Naming and Indexing Convention:

I have followed a particular convention in indexing quantities.
Dimensions of quantities are listed according to this figure.

![Small Labelled Neural Network](http://i.imgur.com/HdfentB.png)


#### **Layers**
* The input layer is the **0th** layer, and the output layer is the **Lth** layer.
Number of layers: **NL = L + 1**.
```
sizes = [2, 3, 1]
```

#### **Weights**
* Weights in this neural network implementation are a list of
matrices (`numpy.ndarray`s). `weights[l]` is the matrix of weights entering the
**lth** layer of the network (denoted **wl**).
* An element of this matrix is denoted **wljk**. It is part of the **jth** row,
which is the collection of all weights entering the **jth** neuron from all
neurons (0 to k) of the **(l-1)th** layer.
* No weights enter the input layer, hence `weights[0]` is redundant; `weights[1]` is the
collection of weights entering layer 1, and so on.
```
weights = [ [[]],          # weights[0]: unused (input layer)
            [[a, b],
             [c, d],       # weights[1]: shape (3, 2)
             [e, f]],
            [[p, q, r]] ]  # weights[2]: shape (1, 3)
```

#### **Biases**
* Biases in this neural network implementation are a list of one-dimensional
vectors (`numpy.ndarray`s). `biases[l]` is the vector of biases of neurons in the
**lth** layer of the network (denoted **bl**).
* An element of this vector is denoted **blj**: the **jth** entry, i.e. the bias
of the **jth** neuron in the layer.
* The input layer has no biases, hence `biases[0]` is redundant; `biases[1]` holds the
biases of the neurons of layer 1, and so on.
```
biases = [ [[]],      # biases[0]: unused (input layer)
           [[0],
            [1],      # biases[1]: shape (3, 1)
            [2]],
           [[0]] ]    # biases[2]: shape (1, 1)
```

#### **'Z's**
* For an input vector **x** to a layer **l**, **z** is defined as:
**zl = wl . x + bl**
* The input layer provides the **x** vector as input to layer 1, and itself has no input,
weight or bias, hence `zs[0]` is redundant.
* The dimensions of `zs` are the same as those of `biases`.

#### **Activations**
* Activations of the **lth** layer are the outputs from neurons of the **lth** layer,
which serve as inputs to the **(l+1)th** layer. The dimensions of `biases`, `zs` and
`activations` are the same.
* The input layer provides the **x** vector as input to layer 1, hence `activations[0]`
corresponds to **x**, the input training example. The sketch below shows the resulting
shapes for the `sizes = [2, 3, 1]` network above.
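A quick way to sanity-check this convention (a standalone sketch, not part of the
repository's code; the placeholder at index 0 mirrors the `np.array([0])` used in
`network.py`):

```
import numpy as np

sizes = [2, 3, 1]

# Index 0 holds a placeholder, since no weights or biases enter the input layer.
weights = [np.array([0])] + [np.random.randn(y, x)
                             for y, x in zip(sizes[1:], sizes[:-1])]
biases = [np.array([0])] + [np.random.randn(y, 1) for y in sizes[1:]]

print([w.shape for w in weights[1:]])  # [(3, 2), (1, 3)]
print([b.shape for b in biases[1:]])   # [(3, 1), (1, 1)]
```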

#### **Execution of Neural network**
```
# To train and test the neural network, run:
python main.py
```
--------------------------------------------------------------------------------
/activations.py:
--------------------------------------------------------------------------------
"""
Helper module to provide activations to network layers.
Four types of activations are available, with their derivatives where the
network needs them:

- Sigmoid
- Softmax
- Tanh
- ReLU
"""
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def sigmoid_prime(z):
    # Compute sigmoid(z) once and reuse it.
    s = sigmoid(z)
    return s * (1 - s)


def softmax(z):
    # Shift by the maximum for numerical stability; the result is unchanged.
    exps = np.exp(z - np.max(z))
    return exps / np.sum(exps)


def tanh(z):
    return np.tanh(z)


def tanh_prime(z):
    return 1 - np.tanh(z) ** 2


def relu(z):
    return np.maximum(z, 0)


def relu_prime(z):
    # 1 where z > 0, else 0.
    return (z > 0).astype(z.dtype)
--------------------------------------------------------------------------------
/main.py:
--------------------------------------------------------------------------------
import gzip
import os
import pickle

import numpy as np
import wget

from network import NeuralNetwork


def load_mnist():
    data_dir = os.path.join(os.curdir, "data")
    data_file_path = os.path.join(data_dir, "mnist.pkl.gz")
    if not os.path.exists(data_dir):
        os.mkdir(data_dir)
    # Download the dataset only if it is not already present.
    if not os.path.exists(data_file_path):
        wget.download("http://deeplearning.net/data/mnist/mnist.pkl.gz", out="data")

    with gzip.open(data_file_path, "rb") as data_file:
        train_data, val_data, test_data = pickle.load(data_file, encoding="latin1")

    # Reshape each 784-pixel image into a column vector. Training labels are
    # one-hot encoded to match the 10-neuron softmax output layer.
    train_inputs = [np.reshape(x, (784, 1)) for x in train_data[0]]
    train_results = [vectorized_result(y) for y in train_data[1]]
    train_data = list(zip(train_inputs, train_results))

    val_inputs = [np.reshape(x, (784, 1)) for x in val_data[0]]
    val_data = list(zip(val_inputs, val_data[1]))

    test_inputs = [np.reshape(x, (784, 1)) for x in test_data[0]]
    test_data = list(zip(test_inputs, test_data[1]))
    return train_data, val_data, test_data


def vectorized_result(y):
    """Return a one-hot encoded (10, 1) vector for the digit label ``y``."""
    e = np.zeros((10, 1))
    e[y] = 1.0
    return e


if __name__ == "__main__":
    np.random.seed(42)

    layers = [784, 30, 10]
    learning_rate = 0.01
    mini_batch_size = 16
    epochs = 100

    # Initialize train, val and test data.
    train_data, val_data, test_data = load_mnist()

    nn = NeuralNetwork(layers, learning_rate, mini_batch_size, "relu")
    nn.fit(train_data, val_data, epochs)

    # The test set has 10,000 examples, so dividing the number of correct
    # predictions by 100 yields a percentage.
    accuracy = nn.validate(test_data) / 100.0
    print(f"Test Accuracy: {accuracy}%.")

    nn.save()
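    # Added illustration (not in the original script): reload the model just
    # saved to ``models/model.npz`` and predict the label of the first test
    # image, confirming that saving and loading round-trip correctly.
    nn2 = NeuralNetwork()
    nn2.load()
    print(f"Reloaded model predicts: {nn2.predict(test_data[0][0])}")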
--------------------------------------------------------------------------------
/network.py:
--------------------------------------------------------------------------------
import os
import random

import numpy as np

import activations


class NeuralNetwork(object):

    def __init__(
        self,
        sizes=[784, 30, 10],
        learning_rate=1e-2,
        mini_batch_size=16,
        activation_fn="relu"
    ):
        """Initialize a Neural Network model.

        Parameters
        ----------
        sizes : list, optional
            A list of integers specifying the number of neurons in each layer.
            Not required if a pretrained model is used.

        learning_rate : float, optional
            Learning rate for gradient descent optimization. Defaults to 0.01.

        mini_batch_size : int, optional
            Size of each mini batch of training examples as used by Stochastic
            Gradient Descent. Denotes after how many examples the weights
            and biases are updated. Default size is 16.

        activation_fn : str, optional
            Which activation to use in intermediate layers, one of {"sigmoid",
            "tanh", "relu"}. The final layer activation is always "softmax".
            Defaults to "relu".
        """

        # Input layer is layer 0, followed by hidden layers 1, 2, 3...
        self.sizes = sizes
        self.num_layers = len(sizes)
        self.activation_fn = getattr(activations, activation_fn)
        self.activation_fn_prime = getattr(activations, f"{activation_fn}_prime")

        # The first term corresponds to layer 0 (input layer). No weights enter
        # the input layer, hence self.weights[0] is a redundant placeholder.
        self.weights = [np.array([0])] + [np.random.randn(y, x) / np.sqrt(x)
                                          for y, x in zip(sizes[1:], sizes[:-1])]

        # The input layer does not have any biases; self.biases[0] is redundant.
        self.biases = [np.array([0])] + [np.random.randn(y, 1) for y in sizes[1:]]

        # The input layer has no weights or biases associated, hence z = wx + b
        # is not defined for it; self._zs[0] is redundant.
        self._zs = [np.zeros(bias.shape) for bias in self.biases]

        # Training examples can be treated as activations coming out of the
        # input layer, hence self._activations[0] = (training example).
        self._activations = [np.zeros(bias.shape) for bias in self.biases]

        self.mini_batch_size = mini_batch_size
        self.lr = learning_rate

    def fit(self, training_data, validation_data=None, epochs=10):
        """Fit (train) the Neural Network on provided training data. Fitting is
        carried out using the Stochastic Gradient Descent algorithm.

        Parameters
        ----------
        training_data : list of tuple
            A list of tuples of numpy arrays, ordered as (image, label).

        validation_data : list of tuple, optional
            Same as `training_data`. If provided, the network displays
            validation accuracy after each epoch.

        epochs : int, optional
            Number of passes over the entire training data. Defaults to 10.

        """
        for epoch in range(epochs):
            random.shuffle(training_data)
            mini_batches = [
                training_data[k:k + self.mini_batch_size] for k in
                range(0, len(training_data), self.mini_batch_size)]

            for mini_batch in mini_batches:
                nabla_b = [np.zeros(bias.shape) for bias in self.biases]
                nabla_w = [np.zeros(weight.shape) for weight in self.weights]
                # Accumulate gradients over all examples of the mini batch.
                for x, y in mini_batch:
                    self._forward_prop(x)
                    delta_nabla_b, delta_nabla_w = self._back_prop(x, y)
                    nabla_b = [nb + dnb for nb, dnb in zip(nabla_b, delta_nabla_b)]
                    nabla_w = [nw + dnw for nw, dnw in zip(nabla_w, delta_nabla_w)]

                # Gradient descent step: move against the average gradient of
                # the mini batch, scaled by the learning rate.
                self.weights = [
                    w - (self.lr / self.mini_batch_size) * dw for w, dw in
                    zip(self.weights, nabla_w)]
                self.biases = [
                    b - (self.lr / self.mini_batch_size) * db for b, db in
                    zip(self.biases, nabla_b)]

            if validation_data:
                # The validation set has 10,000 examples, so dividing the number
                # of correct predictions by 100 yields a percentage.
                accuracy = self.validate(validation_data) / 100.0
                print(f"Epoch {epoch + 1}, accuracy {accuracy} %.")
            else:
                print(f"Processed epoch {epoch + 1}.")

    def validate(self, validation_data):
        """Validate the Neural Network on provided validation data.
        It uses the number of correctly predicted examples as the validation
        accuracy metric.

        Parameters
        ----------
        validation_data : list of tuple

        Returns
        -------
        int
            Number of correctly predicted images.

        """
        validation_results = [(self.predict(x) == y) for x, y in validation_data]
        return sum(validation_results)

    def predict(self, x):
        """Predict the label of a single test example (image).

        Parameters
        ----------
        x : numpy.array

        Returns
        -------
        int
            Predicted label of the example (image).

        """
        self._forward_prop(x)
        return np.argmax(self._activations[-1])

    def _forward_prop(self, x):
        self._activations[0] = x
        for i in range(1, self.num_layers):
            self._zs[i] = (
                self.weights[i].dot(self._activations[i - 1]) + self.biases[i]
            )
            # Use "softmax" for the last layer.
            if i == self.num_layers - 1:
                self._activations[i] = activations.softmax(self._zs[i])
            else:
                self._activations[i] = self.activation_fn(self._zs[i])

    def _back_prop(self, x, y):
        nabla_b = [np.zeros(bias.shape) for bias in self.biases]
        nabla_w = [np.zeros(weight.shape) for weight in self.weights]

        # Output layer error. For a softmax output with cross-entropy loss,
        # the delta simplifies to (activation - label).
        error = (self._activations[-1] - y)
        nabla_b[-1] = error
        nabla_w[-1] = error.dot(self._activations[-2].transpose())

        # Propagate the error backwards through the hidden layers.
        for l in range(self.num_layers - 2, 0, -1):
            error = np.multiply(
                self.weights[l + 1].transpose().dot(error),
                self.activation_fn_prime(self._zs[l])
            )
            nabla_b[l] = error
            nabla_w[l] = error.dot(self._activations[l - 1].transpose())

        return nabla_b, nabla_w

    def load(self, filename='model.npz'):
        """Prepare a neural network from a compressed binary containing weights
        and biases arrays. Sizes of layers are derived from the dimensions of
        the numpy arrays.

        Parameters
        ----------
        filename : str, optional
            Name of the ``.npz`` compressed binary in the models directory.

        """
        # The arrays were saved as object arrays (their shapes differ per
        # layer), so loading them requires allow_pickle=True.
        npz_members = np.load(os.path.join(os.curdir, 'models', filename),
                              allow_pickle=True)

        self.weights = list(npz_members['weights'])
        self.biases = list(npz_members['biases'])

        # The bias vector of each layer has the same length as the number of
        # neurons in that layer, so `sizes` can be rebuilt from the bias
        # vectors. biases[0] is only a placeholder, so the input layer size is
        # taken from the number of columns of weights[1] instead.
        self.sizes = [self.weights[1].shape[1]] + \
                     [b.shape[0] for b in self.biases[1:]]
        self.num_layers = len(self.sizes)

        # These are declared as per the desired shape.
        self._zs = [np.zeros(bias.shape) for bias in self.biases]
        self._activations = [np.zeros(bias.shape) for bias in self.biases]

        # Other hyperparameters are set as specified in the model. These were
        # cast to numpy arrays for saving in the compressed binary.
        self.mini_batch_size = int(npz_members['mini_batch_size'])
        self.lr = float(npz_members['lr'])

    def save(self, filename='model.npz'):
        """Save weights, biases and hyperparameters of the neural network to a
        compressed binary. This ``.npz`` binary is saved in the 'models'
        directory.

        Parameters
        ----------
        filename : str, optional
            Name of the ``.npz`` compressed binary to be saved.

        """
        # Make sure the models directory exists before writing the binary.
        os.makedirs(os.path.join(os.curdir, 'models'), exist_ok=True)
        np.savez_compressed(
            file=os.path.join(os.curdir, 'models', filename),
            weights=self.weights,
            biases=self.biases,
            mini_batch_size=self.mini_batch_size,
            lr=self.lr
        )
--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
wget==3.2
numpy==1.16.5
--------------------------------------------------------------------------------