├── LICENSE
├── README.md
├── constants.py
├── data
│   └── mnist
│       ├── t10k-images-idx3-ubyte
│       ├── t10k-labels-idx1-ubyte
│       ├── train-images-idx3-ubyte
│       └── train-labels-idx1-ubyte
├── loader.py
├── models
│   └── naiveScratch.py
├── neuralnet.py
└── neuralnet.pyc

/LICENSE:
--------------------------------------------------------------------------------
The MIT License (MIT)

Copyright (c) 2016 Meet Vora

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
MLP Classifier
==========================
A *Handwritten* **Multilayer Perceptron Classifier**

This Python implementation extends the artificial neural networks discussed in [Python Machine Learning](https://github.com/rasbt/python-machine-learning-book) and [Neural Networks and Deep Learning](http://neuralnetworksanddeeplearning.com) to a **deep** neural network with **softmax layers**, a **log-likelihood** *loss function*, and **L1** and **L2** *regularization techniques*.

### Some Basics
An artificial neuron is a mathematical function conceived as a model of biological neurons. Each node in the diagram below is a neuron, which passes its information to the next layer through a transfer function.

![artificial-neuron](https://upload.wikimedia.org/wikipedia/commons/thumb/6/60/ArtificialNeuronModel_english.png/600px-ArtificialNeuronModel_english.png)

The transfer function is a linear combination of the input neurons plus a fixed value, the *bias* (labelled "threshold" in the figure). The coefficients of the input neurons are the *weights*.
In the code, the biases are stored as a list of NumPy arrays, one per layer except the input layer, which has no bias. The weights, also NumPy arrays, form one matrix for every pair of adjacent layers in the network.
The activation function produces the output of the given neuron.
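In NumPy terms (the layer sizes below are arbitrary and chosen only for illustration), the computation for one layer looks like:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

X = np.random.rand(4, 1)      # outputs of the previous layer (4 neurons)
w = np.random.randn(3, 4)     # weights connecting the 4 inputs to 3 neurons in this layer
bias = np.random.randn(3, 1)  # one bias (threshold) per neuron in this layer

transfer_function = np.dot(w, X)       # linear combination of the inputs
o = sigmoid(transfer_function + bias)  # activation: the output of this layer's neurons
```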
More abstractly, in pseudo-code for layer *j*:

```python
X: vectorize{(j-1)th layer}

w = weights[j-1]
bias = threshold[j-1]
transfer_function = dot_product(w, X)
o = activation(transfer_function + bias)
```

### Details
The implementation includes two types of artificial neurons:
* Sigmoid Neurons
* Softmax Neurons

The loss function associated with the softmax function is the *log-likelihood function*, while the loss function for the sigmoid function is the *cross-entropy function*. The calculus for both loss functions is discussed within the code.

Further, the two most common regularization techniques, *L1* and *L2*, are used to prevent overfitting of the training data.

##### Why Softmax?
For the *j*-th component *z<sub>j</sub><sup>L</sup>* of the output-layer vector **z**<sup>L</sup>, *softmax(z<sub>j</sub><sup>L</sup>)* is defined as
![](https://s31.postimg.org/f44eizfm3/Screenshot_from_2016_06_16_04_54_45.png)
The output from the softmax layer can be thought of as a probability distribution.
![](https://s31.postimg.org/4399dynd7/Screenshot_from_2016_06_16_04_50_36.png)

In many problems it is convenient to interpret the output activation ***O(j)*** as the network's estimate of the probability that the correct output is ***j***.

Refer to these [notes](https://www.ics.uci.edu/~pjsadows/notes.pdf) for the calculus of the softmax function.
[Source](http://yann.lecun.com/exdb/mnist/) of the MNIST training data-set.
--------------------------------------------------------------------------------
/constants.py:
--------------------------------------------------------------------------------
"""Definitions of constants used in neuralnet"""

# Activation functions
SIGMOID = 'sigmoid'
SOFTMAX = 'softmax'

# Regularization methods
L2 = 'L2'
L1 = 'L1'
--------------------------------------------------------------------------------
/data/mnist/t10k-images-idx3-ubyte:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/meetvora/mlp-classifier/0af5759e95cbfc730d0b733d65b87a6186e49fcc/data/mnist/t10k-images-idx3-ubyte
--------------------------------------------------------------------------------
/data/mnist/t10k-labels-idx1-ubyte:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/meetvora/mlp-classifier/0af5759e95cbfc730d0b733d65b87a6186e49fcc/data/mnist/t10k-labels-idx1-ubyte
--------------------------------------------------------------------------------
/data/mnist/train-images-idx3-ubyte:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/meetvora/mlp-classifier/0af5759e95cbfc730d0b733d65b87a6186e49fcc/data/mnist/train-images-idx3-ubyte
--------------------------------------------------------------------------------
/data/mnist/train-labels-idx1-ubyte:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/meetvora/mlp-classifier/0af5759e95cbfc730d0b733d65b87a6186e49fcc/data/mnist/train-labels-idx1-ubyte
--------------------------------------------------------------------------------
/loader.py:
--------------------------------------------------------------------------------
import os, struct
import numpy as np

def load_mnist(path, kind='train'):
    """ load_mnist('mnist') => returns (images, labels) """
    labels_path = os.path.join('data', path, '%s-labels-idx1-ubyte' % kind)
    images_path = os.path.join('data', path, '%s-images-idx3-ubyte' % kind)
    with open(labels_path, 'rb') as lpath:
        # IDX header: big-endian magic number and item count
        magic, n = struct.unpack('>II', lpath.read(8))
        labels = np.fromfile(lpath, dtype=np.uint8)
    with open(images_path, 'rb') as imgpath:
        # IDX header: big-endian magic number, image count, rows, columns
        magic, num, rows, cols = struct.unpack(">IIII", imgpath.read(16))
        # each image is 28x28 pixels, flattened to a 784-element row
        images = np.fromfile(imgpath, dtype=np.uint8).reshape(len(labels), 784)
    return images, labels
--------------------------------------------------------------------------------
/models/naiveScratch.py:
--------------------------------------------------------------------------------
import math

class NaiveBayesClassifier(object):
    def __init__(self, x, y):
        self.classes = set(y)
        self.class_count = len(set(y))
        self.train = zip(x, y)

    def classProb(self, clss, input, method='regular'):
        """ P(input | clss) * P(clss), under the naive independence assumption. """
        try:
            x_probab = []
            # class prior: fraction of training examples labelled clss
            clssProbability = len(filter(lambda u: u[1] == clss, self.train)) / float(len(self.train))
            if method == 'regular':
                clss_instances = filter(lambda u: u[1] == clss, self.train)
                for i in range(len(input)):
                    # relative frequency of this attribute value within the class
                    x_occurrence = len(filter(lambda u: u[0][i] == input[i], clss_instances))
                    x_probab.append(x_occurrence / float(len(clss_instances)))
            if method == 'gaussian':
                x_mean, x_stdev = self._classMean(clss), self._classStd(clss)
                for i in range(len(input)):
                    x, mean, stdev = input[i], x_mean[i], x_stdev[i]
                    if stdev == 0:
                        continue
                    # Gaussian likelihood of the attribute value under the class distribution
                    exponent = math.exp(-(math.pow(x-mean, 2) / float(2*math.pow(stdev, 2))))
                    x_probab.append((1.0 / (math.sqrt(2*math.pi) * stdev)) * exponent)
            # multiply the per-attribute likelihoods by the class prior
            return (reduce(lambda x, y: x*y, x_probab) * clssProbability)
        except Exception as e:
            print e

    def _classMean(self, clss):
        def mean(values):
            return sum(values) / float(len(values))
        clss_instances = filter(lambda u: u[1] == clss, self.train)
        X = [u[0] for u in clss_instances]
        x_mean = tuple([mean(attr) for attr in zip(*X)])
        return x_mean

    def _classStd(self, clss):
        def stdev(values):
            avg = sum(values) / float(len(values))
            variance = sum([pow(x-avg, 2) for x in values]) / float(len(values)-1)
            return math.sqrt(variance)
        clss_instances = filter(lambda u: u[1] == clss, self.train)
        X = [u[0] for u in clss_instances]
        x_stdev = tuple([stdev(attr) for attr in zip(*X)])
        return x_stdev

    def predictClass(self, input, method='regular'):
        probabilityMap = {}
        for clss in self.classes:
            probabilityMap[clss] = self.classProb(clss, input, method)
        print 'Prediction:', max(probabilityMap, key=probabilityMap.get)
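
# A hypothetical usage sketch (illustration only; the toy data below is not
# part of the original module). Assumes the Python 2 semantics used above.
if __name__ == '__main__':
    X = [(1, 20), (2, 21), (8, 40), (9, 42)]
    Y = [0, 0, 1, 1]
    clf = NaiveBayesClassifier(X, Y)
    clf.predictClass((2, 22), method='gaussian')  # prints: Prediction: 0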
--------------------------------------------------------------------------------
/neuralnet.py:
--------------------------------------------------------------------------------
import random
import numpy as np
from scipy.special import expit
from constants import *

class NeuralNetMLP(object):
    def __init__(self, layers, random_state=None):
        """ Initialise the layers as list(input_layer, ...hidden_layers..., output_layer) """
        np.random.seed(random_state)
        self.num_layers = len(layers)
        self.layers = layers
        self.initialize_weights()

    def initialize_weights(self):
        """ Randomly generate biases and weights for the hidden layers.
        Weights have a Gaussian distribution with mean 0 and
        standard deviation 1 over the square root of the number
        of weights connecting to the same neuron. """
        self.biases = [np.random.randn(y, 1) for y in self.layers[1:]]
        self.weights = [np.random.randn(y, x)/np.sqrt(x) for x, y in zip(self.layers[:-1], self.layers[1:])]

    def fit(self, training_data, l1=0.0, l2=0.0, epochs=500, eta=0.001, minibatches=1, regularization=L2):
        """ Fits the parameters to the training data (a list of (x, y) tuples).
        l1 (l2) is the L1 (L2) regularization coefficient;
        minibatches is the size of each mini-batch. """
        self.l1 = l1
        self.l2 = l2
        n = len(training_data)
        for epoch in xrange(epochs):
            random.shuffle(training_data)
            mini_batches = [training_data[k:k+minibatches] for k in xrange(0, n, minibatches)]
            for mini_batch in mini_batches:
                self.batch_update(mini_batch, eta, n, regularization)

    def batch_update(self, mini_batch, eta, n, regularization=L2):
        """ Update the network's weights and biases by applying gradient
        descent using backpropagation to a single mini-batch. """
        nabla_b = [np.zeros(b.shape) for b in self.biases]
        nabla_w = [np.zeros(w.shape) for w in self.weights]
        for x, y in mini_batch:
            delta_nabla_b, delta_nabla_w = self.back_propagation(x, y)
            nabla_b = [nb+dnb for nb, dnb in zip(nabla_b, delta_nabla_b)]
            nabla_w = [nw+dnw for nw, dnw in zip(nabla_w, delta_nabla_w)]
        self.biases = [b-(eta/len(mini_batch))*nb for b, nb in zip(self.biases, nabla_b)]
        if regularization == L2:
            # L2 weight decay: shrink each weight in proportion to its size
            self.weights = [(1-eta*(self.l2/n))*w-(eta/len(mini_batch))*nw for w, nw in zip(self.weights, nabla_w)]
        elif regularization == L1:
            # L1 decay: shrink each weight by a constant amount towards zero
            self.weights = [w - eta*self.l1*np.sign(w)/n - (eta/len(mini_batch))*nw for w, nw in zip(self.weights, nabla_w)]

    def back_propagation(self, x, y, fn=SIGMOID):
        """ The gradient of the cost function is calculated from a(L) and
        back-propagated to the input layer.
        The cross-entropy cost function is associated with sigmoid neurons, while
        the log-likelihood cost function is associated with softmax neurons. """
        nabla_b = [np.zeros(b.shape) for b in self.biases]
        nabla_w = [np.zeros(w.shape) for w in self.weights]
        # forward pass: store the weighted inputs (zs) and activations layer by layer
        activation = x
        activations = [x]
        zs = []
        for b, w in zip(self.biases, self.weights):
            z = np.dot(w, activation)+b
            zs.append(z)
            if fn == SIGMOID:
                activation = sigmoid(z)
            else:
                activation = softmax(z)
            activations.append(activation)
        # output-layer error; identical for both cost/activation pairings (see delta below)
        dell = delta(activations[-1], y)
        nabla_b[-1] = dell
        nabla_w[-1] = np.dot(dell, activations[-2].transpose())
        # propagate the error backwards through the hidden layers
        for l in xrange(2, self.num_layers):
            dell = np.dot(self.weights[-l+1].transpose(), dell) * derivative(zs[-l], fn)
            nabla_b[-l] = dell
            nabla_w[-l] = np.dot(dell, activations[-l-1].transpose())
        return (nabla_b, nabla_w)

def cross_entropy_loss(a, y):
    return np.sum(np.nan_to_num(-y*np.log(a)-(1-y)*np.log(1-a)))

def log_likelihood_loss(a, y):
    """ a is the softmax output activation and y a one-hot target vector """
    return -np.sum(np.nan_to_num(y*np.log(a)))

def delta(a, y):
    """ the output-layer delta works out to be the same for both pairings """
    return (a-y)

def sigmoid(z):
    """ expit is equivalent to 1.0/(1.0 + np.exp(-z)) """
    return expit(z)

def softmax(z):
    e = np.exp(z - np.max(z))  # subtract max(z) for numerical stability
    return (e/np.sum(e))

def derivative(z, fn):
    """ f'(z) = f(z)*(1 - f(z)) for the respective activation functions """
    if fn == SIGMOID:
        f = sigmoid
    elif fn == SOFTMAX:
        f = softmax
    return f(z)*(1-f(z))
--------------------------------------------------------------------------------
/neuralnet.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/meetvora/mlp-classifier/0af5759e95cbfc730d0b733d65b87a6186e49fcc/neuralnet.pyc
--------------------------------------------------------------------------------
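
A minimal end-to-end sketch of how these pieces could be wired together (a hypothetical driver script, not part of the repository; the hidden-layer size and hyper-parameters below are arbitrary, and the code assumes Python 2 to match the modules above):

```python
import numpy as np
from loader import load_mnist
from neuralnet import NeuralNetMLP
from constants import L2

images, labels = load_mnist('mnist', kind='train')

# neuralnet.py works with column vectors: x is (784, 1) and y is a one-hot (10, 1) vector
def one_hot(label, num_classes=10):
    y = np.zeros((num_classes, 1))
    y[label] = 1.0
    return y

training_data = [(x.reshape(784, 1) / 255.0, one_hot(y)) for x, y in zip(images, labels)]

# 784 input pixels, one hidden layer of 30 neurons, 10 output classes
net = NeuralNetMLP(layers=[784, 30, 10], random_state=1)
net.fit(training_data[:1000], l2=0.1, epochs=30, eta=0.5, minibatches=10, regularization=L2)
```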