├── .gitignore
├── LICENSE
├── README.md
├── activations.py
├── main.py
├── network.py
└── requirements.txt

/.gitignore:
--------------------------------------------------------------------------------
# Compiled code files of modules
*.pyc

# IntelliJ project files
*.iml
.idea/*

# IPython Notebook checkpoints
.ipynb_checkpoints/*

# MNIST dataset directory
data/*

# Model files (compressed numpy binaries)
models/*
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
The MIT License (MIT)

Copyright (c) 2016 Karan Desai

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
MNIST Handwritten Digit Classifier
==================================

An implementation of a multilayer neural network using the `numpy` library. The
implementation is a modified version of Michael Nielsen's implementation in the
[Neural Networks and Deep Learning](http://neuralnetworksanddeeplearning.com/) book.


### Brief Background:

If you are familiar with the basics of neural networks, feel free to skip this section. For
total beginners who landed here before reading anything about neural networks:

![Sigmoid Neuron](http://i.imgur.com/dOkT9Y9.png)

* Neural networks are made up of building blocks known as **Sigmoid Neurons**. They are
named so because their output follows the [Sigmoid Function](https://en.wikipedia.org/wiki/Sigmoid_function).
* **xj** are the inputs, which are weighted by the **wj** weights, and the neuron has an
intrinsic bias **b**. The output of the neuron is known as its activation, **a**.

_**Note:** There are other activation functions in use besides sigmoid, but this
information is sufficient for beginners for now._

* A neural network is built by stacking layers of neurons, and is defined by the weights
of its connections and the biases of its neurons. Activations are the outputs the network
produces for a given input; a short sketch of a single neuron follows below.
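As a minimal, standalone illustration of the figure above (with made-up inputs and
weights; this snippet is not part of the repository's code), a single sigmoid neuron
computes its activation as:

```
import numpy as np

x = np.array([0.5, 0.8])      # inputs x_j
w = np.array([0.4, -0.6])     # weights w_j
b = 0.1                       # bias b

z = np.dot(w, x) + b          # weighted input z = w . x + b
a = 1.0 / (1.0 + np.exp(-z))  # sigmoid activation
print(a)                      # ~0.455
```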

### Why a modified implementation?

The book, and Stanford's Machine Learning course by Prof. Andrew Ng, are recommended as
good resources for beginners. At times it got confusing to me while referring to both
resources:

MATLAB has _1-indexed_ data structures, while `numpy` has them _0-indexed_. Some parameters
of a neural network are not defined for the input layer, which leads to a mismatch between
the mathematical equations in the book and the indices in code. For example, according to
the book, the bias vector of the second layer of the neural network is referred to as
`bias[0]`, since the input layer (first layer) has no bias vector. I found that a bit
inconvenient to work with.

I am fond of scikit-learn's API style, hence my class has a similar code structure. While
it theoretically resembles the book and Stanford's course, you can find simple methods such
as `fit`, `predict` and `validate` to train, test and validate the model respectively.


### Naming and Indexing Convention:

I have followed a particular convention in indexing quantities.
Dimensions of quantities are listed according to this figure.

![Small Labelled Neural Network](http://i.imgur.com/HdfentB.png)


#### **Layers**
* The input layer is the **0th** layer, and the output layer is the **Lth** layer.
Number of layers: **NL = L + 1**.
```
sizes = [2, 3, 1]
```

#### **Weights**
* Weights in this neural network implementation are a list of
matrices (`numpy.ndarray`s). `weights[l]` is the matrix of weights entering the
**lth** layer of the network (denoted **wl**).
* An element of this matrix is denoted **wljk**. It is part of the **jth** row,
which is the collection of all weights entering the **jth** neuron from all
neurons (0 to k) of the **(l-1)th** layer.
* No weights enter the input layer, hence `weights[0]` is redundant; `weights[1]` is the
collection of weights entering layer 1, and so on.
```
weights = [ [[]],          # weights[0]: unused (input layer)
            [[a, b],
             [c, d],       # weights[1]: shape (3, 2)
             [e, f]],
            [[p, q, r]] ]  # weights[2]: shape (1, 3)
```

#### **Biases**
* Biases in this neural network implementation are a list of one-dimensional
vectors (`numpy.ndarray`s). `biases[l]` is the vector of biases of neurons in the
**lth** layer of the network (denoted **bl**).
* An element of this vector is denoted **blj**: the **jth** entry, i.e. the bias
of the **jth** neuron in the layer.
* The input layer has no biases, hence `biases[0]` is redundant; `biases[1]` holds the
biases of the neurons of layer 1, and so on.
```
biases = [ [[]],      # biases[0]: unused (input layer)
           [[0],
            [1],      # biases[1]: shape (3, 1)
            [2]],
           [[0]] ]    # biases[2]: shape (1, 1)
```

#### **'Z's**
* For an input vector **x** to a layer **l**, **z** is defined as:
**zl = wl . x + bl**
* The input layer provides the **x** vector as input to layer 1, and itself has no input,
weight or bias, hence `zs[0]` is redundant.
* The dimensions of `zs` are the same as those of `biases`.

#### **Activations**
* Activations of the **lth** layer are the outputs from neurons of the **lth** layer,
which serve as inputs to the **(l+1)th** layer. The dimensions of `biases`, `zs` and
`activations` are the same.
* The input layer provides the **x** vector as input to layer 1, hence `activations[0]`
corresponds to **x**, the input training example. The sketch below shows the resulting
shapes for the `sizes = [2, 3, 1]` network above.
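A quick way to sanity-check this convention (a standalone sketch, not part of the
repository's code; the placeholder at index 0 mirrors the `np.array([0])` used in
`network.py`):

```
import numpy as np

sizes = [2, 3, 1]

# Index 0 holds a placeholder, since no weights or biases enter the input layer.
weights = [np.array([0])] + [np.random.randn(y, x)
                             for y, x in zip(sizes[1:], sizes[:-1])]
biases = [np.array([0])] + [np.random.randn(y, 1) for y in sizes[1:]]

print([w.shape for w in weights[1:]])  # [(3, 2), (1, 3)]
print([b.shape for b in biases[1:]])   # [(3, 1), (1, 1)]
```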

#### **Execution of Neural network**
```
# To train and test the neural network, run:
python main.py
```
--------------------------------------------------------------------------------
/activations.py:
--------------------------------------------------------------------------------
"""
Helper module to provide activations to network layers.
Four types of activations are available, with their derivatives where the
network needs them:

- Sigmoid
- Softmax
- Tanh
- ReLU
"""
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def sigmoid_prime(z):
    # Compute sigmoid(z) once and reuse it.
    s = sigmoid(z)
    return s * (1 - s)


def softmax(z):
    # Shift by the maximum for numerical stability; the result is unchanged.
    exps = np.exp(z - np.max(z))
    return exps / np.sum(exps)


def tanh(z):
    return np.tanh(z)


def tanh_prime(z):
    return 1 - np.tanh(z) ** 2


def relu(z):
    return np.maximum(z, 0)


def relu_prime(z):
    # 1 where z > 0, else 0.
    return (z > 0).astype(z.dtype)
--------------------------------------------------------------------------------
/main.py:
--------------------------------------------------------------------------------
import gzip
import os
import pickle

import numpy as np
import wget

from network import NeuralNetwork


def load_mnist():
    data_dir = os.path.join(os.curdir, "data")
    data_file_path = os.path.join(data_dir, "mnist.pkl.gz")
    if not os.path.exists(data_dir):
        os.mkdir(data_dir)
    # Download the dataset only if it is not already present.
    if not os.path.exists(data_file_path):
        wget.download("http://deeplearning.net/data/mnist/mnist.pkl.gz", out="data")

    with gzip.open(data_file_path, "rb") as data_file:
        train_data, val_data, test_data = pickle.load(data_file, encoding="latin1")

    # Reshape each 784-pixel image into a column vector. Training labels are
    # one-hot encoded to match the 10-neuron softmax output layer.
    train_inputs = [np.reshape(x, (784, 1)) for x in train_data[0]]
    train_results = [vectorized_result(y) for y in train_data[1]]
    train_data = list(zip(train_inputs, train_results))

    val_inputs = [np.reshape(x, (784, 1)) for x in val_data[0]]
    val_data = list(zip(val_inputs, val_data[1]))

    test_inputs = [np.reshape(x, (784, 1)) for x in test_data[0]]
    test_data = list(zip(test_inputs, test_data[1]))
    return train_data, val_data, test_data


def vectorized_result(y):
    """Return a one-hot encoded (10, 1) vector for the digit label ``y``."""
    e = np.zeros((10, 1))
    e[y] = 1.0
    return e


if __name__ == "__main__":
    np.random.seed(42)

    layers = [784, 30, 10]
    learning_rate = 0.01
    mini_batch_size = 16
    epochs = 100

    # Initialize train, val and test data.
    train_data, val_data, test_data = load_mnist()

    nn = NeuralNetwork(layers, learning_rate, mini_batch_size, "relu")
    nn.fit(train_data, val_data, epochs)

    # The test set has 10,000 examples, so dividing the number of correct
    # predictions by 100 yields a percentage.
    accuracy = nn.validate(test_data) / 100.0
    print(f"Test Accuracy: {accuracy}%.")

    nn.save()
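    # Added illustration (not in the original script): reload the model just
    # saved to ``models/model.npz`` and predict the label of the first test
    # image, confirming that saving and loading round-trip correctly.
    nn2 = NeuralNetwork()
    nn2.load()
    print(f"Reloaded model predicts: {nn2.predict(test_data[0][0])}")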
--------------------------------------------------------------------------------
/network.py:
--------------------------------------------------------------------------------
import os
import random

import numpy as np

import activations


class NeuralNetwork(object):

    def __init__(
        self,
        sizes=[784, 30, 10],
        learning_rate=1e-2,
        mini_batch_size=16,
        activation_fn="relu"
    ):
        """Initialize a Neural Network model.

        Parameters
        ----------
        sizes : list, optional
            A list of integers specifying the number of neurons in each layer.
            Not required if a pretrained model is used.

        learning_rate : float, optional
            Learning rate for gradient descent optimization. Defaults to 0.01.

        mini_batch_size : int, optional
            Size of each mini batch of training examples as used by Stochastic
            Gradient Descent. Denotes after how many examples the weights
            and biases are updated. Default size is 16.

        activation_fn : str, optional
            Which activation to use in intermediate layers, one of {"sigmoid",
            "tanh", "relu"}. The final layer activation is always "softmax".
            Defaults to "relu".
        """

        # Input layer is layer 0, followed by hidden layers 1, 2, 3...
        self.sizes = sizes
        self.num_layers = len(sizes)
        self.activation_fn = getattr(activations, activation_fn)
        self.activation_fn_prime = getattr(activations, f"{activation_fn}_prime")

        # The first term corresponds to layer 0 (input layer). No weights enter
        # the input layer, hence self.weights[0] is a redundant placeholder.
        self.weights = [np.array([0])] + [np.random.randn(y, x) / np.sqrt(x)
                                          for y, x in zip(sizes[1:], sizes[:-1])]

        # The input layer does not have any biases; self.biases[0] is redundant.
        self.biases = [np.array([0])] + [np.random.randn(y, 1) for y in sizes[1:]]

        # The input layer has no weights or biases associated, hence z = wx + b
        # is not defined for it; self._zs[0] is redundant.
        self._zs = [np.zeros(bias.shape) for bias in self.biases]

        # Training examples can be treated as activations coming out of the
        # input layer, hence self._activations[0] = (training example).
        self._activations = [np.zeros(bias.shape) for bias in self.biases]

        self.mini_batch_size = mini_batch_size
        self.lr = learning_rate

    def fit(self, training_data, validation_data=None, epochs=10):
        """Fit (train) the Neural Network on provided training data. Fitting is
        carried out using the Stochastic Gradient Descent algorithm.

        Parameters
        ----------
        training_data : list of tuple
            A list of tuples of numpy arrays, ordered as (image, label).

        validation_data : list of tuple, optional
            Same as `training_data`. If provided, the network displays
            validation accuracy after each epoch.

        epochs : int, optional
            Number of passes over the entire training data. Defaults to 10.

        """
        for epoch in range(epochs):
            random.shuffle(training_data)
            mini_batches = [
                training_data[k:k + self.mini_batch_size] for k in
                range(0, len(training_data), self.mini_batch_size)]

            for mini_batch in mini_batches:
                nabla_b = [np.zeros(bias.shape) for bias in self.biases]
                nabla_w = [np.zeros(weight.shape) for weight in self.weights]
                # Accumulate gradients over all examples of the mini batch.
                for x, y in mini_batch:
                    self._forward_prop(x)
                    delta_nabla_b, delta_nabla_w = self._back_prop(x, y)
                    nabla_b = [nb + dnb for nb, dnb in zip(nabla_b, delta_nabla_b)]
                    nabla_w = [nw + dnw for nw, dnw in zip(nabla_w, delta_nabla_w)]

                # Gradient descent step: move against the average gradient of
                # the mini batch, scaled by the learning rate.
                self.weights = [
                    w - (self.lr / self.mini_batch_size) * dw for w, dw in
                    zip(self.weights, nabla_w)]
                self.biases = [
                    b - (self.lr / self.mini_batch_size) * db for b, db in
                    zip(self.biases, nabla_b)]

            if validation_data:
                # The validation set has 10,000 examples, so dividing the number
                # of correct predictions by 100 yields a percentage.
                accuracy = self.validate(validation_data) / 100.0
                print(f"Epoch {epoch + 1}, accuracy {accuracy} %.")
            else:
                print(f"Processed epoch {epoch + 1}.")

    def validate(self, validation_data):
        """Validate the Neural Network on provided validation data.
        It uses the number of correctly predicted examples as the validation
        accuracy metric.

        Parameters
        ----------
        validation_data : list of tuple

        Returns
        -------
        int
            Number of correctly predicted images.

        """
        validation_results = [(self.predict(x) == y) for x, y in validation_data]
        return sum(validation_results)

    def predict(self, x):
        """Predict the label of a single test example (image).

        Parameters
        ----------
        x : numpy.array

        Returns
        -------
        int
            Predicted label of the example (image).

        """
        self._forward_prop(x)
        return np.argmax(self._activations[-1])

    def _forward_prop(self, x):
        self._activations[0] = x
        for i in range(1, self.num_layers):
            self._zs[i] = (
                self.weights[i].dot(self._activations[i - 1]) + self.biases[i]
            )
            # Use "softmax" for the last layer.
            if i == self.num_layers - 1:
                self._activations[i] = activations.softmax(self._zs[i])
            else:
                self._activations[i] = self.activation_fn(self._zs[i])

    def _back_prop(self, x, y):
        nabla_b = [np.zeros(bias.shape) for bias in self.biases]
        nabla_w = [np.zeros(weight.shape) for weight in self.weights]

        # Output layer error. For a softmax output with cross-entropy loss,
        # the delta simplifies to (activation - label).
        error = (self._activations[-1] - y)
        nabla_b[-1] = error
        nabla_w[-1] = error.dot(self._activations[-2].transpose())

        # Propagate the error backwards through the hidden layers.
        for l in range(self.num_layers - 2, 0, -1):
            error = np.multiply(
                self.weights[l + 1].transpose().dot(error),
                self.activation_fn_prime(self._zs[l])
            )
            nabla_b[l] = error
            nabla_w[l] = error.dot(self._activations[l - 1].transpose())

        return nabla_b, nabla_w

    def load(self, filename='model.npz'):
        """Prepare a neural network from a compressed binary containing weights
        and biases arrays. Sizes of layers are derived from the dimensions of
        the numpy arrays.

        Parameters
        ----------
        filename : str, optional
            Name of the ``.npz`` compressed binary in the models directory.

        """
        # The arrays were saved as object arrays (their shapes differ per
        # layer), so loading them requires allow_pickle=True.
        npz_members = np.load(os.path.join(os.curdir, 'models', filename),
                              allow_pickle=True)

        self.weights = list(npz_members['weights'])
        self.biases = list(npz_members['biases'])

        # The bias vector of each layer has the same length as the number of
        # neurons in that layer, so `sizes` can be rebuilt from the bias
        # vectors. biases[0] is only a placeholder, so the input layer size is
        # taken from the number of columns of weights[1] instead.
        self.sizes = [self.weights[1].shape[1]] + \
                     [b.shape[0] for b in self.biases[1:]]
        self.num_layers = len(self.sizes)

        # These are declared as per the desired shape.
        self._zs = [np.zeros(bias.shape) for bias in self.biases]
        self._activations = [np.zeros(bias.shape) for bias in self.biases]

        # Other hyperparameters are set as specified in the model. These were
        # cast to numpy arrays for saving in the compressed binary.
        self.mini_batch_size = int(npz_members['mini_batch_size'])
        self.lr = float(npz_members['lr'])

    def save(self, filename='model.npz'):
        """Save weights, biases and hyperparameters of the neural network to a
        compressed binary. This ``.npz`` binary is saved in the 'models'
        directory.

        Parameters
        ----------
        filename : str, optional
            Name of the ``.npz`` compressed binary to be saved.

        """
        # Make sure the models directory exists before writing the binary.
        os.makedirs(os.path.join(os.curdir, 'models'), exist_ok=True)
        np.savez_compressed(
            file=os.path.join(os.curdir, 'models', filename),
            weights=self.weights,
            biases=self.biases,
            mini_batch_size=self.mini_batch_size,
            lr=self.lr
        )
--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
wget==3.2
numpy==1.16.5
--------------------------------------------------------------------------------