├── LICENSE
├── README.md
├── constants.py
├── data
│   └── mnist
│       ├── t10k-images-idx3-ubyte
│       ├── t10k-labels-idx1-ubyte
│       ├── train-images-idx3-ubyte
│       └── train-labels-idx1-ubyte
├── loader.py
├── models
│   └── naiveScratch.py
├── neuralnet.py
└── neuralnet.pyc
/LICENSE:
--------------------------------------------------------------------------------
1 | The MIT License (MIT)
2 |
3 | Copyright (c) 2016 Meet Vora
4 |
5 | Permission is hereby granted, free of charge, to any person obtaining a copy
6 | of this software and associated documentation files (the "Software"), to deal
7 | in the Software without restriction, including without limitation the rights
8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 |
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 |
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | MLP Classifier
2 | ==========================
3 | A *Handwritten* **Multilayer Perceptron Classifier**
4 |
5 | This Python implementation extends the artificial neural networks discussed in [Python Machine Learning](https://github.com/rasbt/python-machine-learning-book) and [Neural Networks and Deep Learning](http://neuralnetworksanddeeplearning.com): the ANN is turned into a **deep** neural network with **softmax layers**, a **log-likelihood** *loss function*, and **L1** and **L2** *regularization techniques*.
6 |
7 | ### Some Basics
8 | An artificial neuron is a mathematical function conceived as a model of a biological neuron. Each node in the diagram is a neuron, which passes its information to the next layer through a transfer function.
9 |
10 | 
11 |
12 | The transfer function is a linear combination of the input neurons plus a fixed value, the *bias* (threshold in the figure). The coefficients of the input neurons are the *weights*.
13 | In the code, the biases form a list of numpy arrays of length `layers - 1`, since the input layer has no bias. The weights, also numpy arrays, form one matrix for each pair of adjacent layers in the network.
14 | The activation function produces the output of the given neuron, as sketched below.
15 |
16 | ```python
17 | # X is the output vector of the (j-1)-th layer
18 | X = activations[j - 1]
19 | w = weights[j - 1]                  # weight matrix between layers j-1 and j
20 | bias = threshold[j - 1]             # bias vector of layer j
21 | transfer_function = np.dot(w, X)    # linear combination of the inputs
22 | o = activation(transfer_function + bias)
23 | ```
24 |
25 | ### Details
26 | The implementation includes two types of artificial neurons:
27 | * Sigmoid Neurons
28 | * Softmax Neurons
29 |
30 | The loss function associated with the softmax function is the *log-likelihood function*, while the loss function for the sigmoid function is the *cross-entropy function*. The calculus for both loss functions is discussed within the code.
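As a rough numerical illustration (not taken from the repository; the activation values below are made up), the two costs behave as follows for a single one-hot target:

```python
import numpy as np

y = np.array([0.0, 1.0, 0.0])   # one-hot target: the correct class is index 1
a = np.array([0.1, 0.7, 0.2])   # hypothetical output activations (sum to 1)

# Cross-entropy cost, paired with sigmoid output neurons
cross_entropy = np.sum(np.nan_to_num(-y * np.log(a) - (1 - y) * np.log(1 - a)))

# Log-likelihood cost, paired with softmax output neurons: -ln(a_j) for the true class j
log_likelihood = -np.log(a[np.argmax(y)])

print(cross_entropy)    # ~0.69: also penalises probability mass on the wrong classes
print(log_likelihood)   # ~0.36: looks only at the probability of the true class
```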
31 |
32 | Further, the two most common regularization techniques, *L1* and *L2*, are used to prevent overfitting of the training data.
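The corresponding weight update mirrors `batch_update` in `neuralnet.py`; `regularized_update` itself is a hypothetical helper written only for this illustration:

```python
import numpy as np

def regularized_update(w, nabla_w, eta, batch_size, n, l1=0.0, l2=0.0, method='L2'):
    """Gradient step on one weight matrix with an L1 or L2 penalty."""
    if method == 'L2':
        # L2 (weight decay): shrink each weight in proportion to its magnitude
        return (1 - eta * l2 / n) * w - (eta / batch_size) * nabla_w
    # L1: shrink each weight by a constant amount in the direction of its sign
    return w - eta * l1 * np.sign(w) / n - (eta / batch_size) * nabla_w
```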
33 |
34 | ##### Why Softmax?
35 | For the *j*-th component z<sup>L</sup><sub>j</sub> of the output-layer vector z<sup>L</sup>, softmax(z<sup>L</sup><sub>j</sub>) is defined as
36 | 
37 | The output from the softmax layer can be thought of as a probability distribution.
38 | 
39 |
40 | In many problems it is convenient to be able to interpret the output activation ***O(j)*** as the network's estimate of the probability that the correct output is ***j***.
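A minimal, numerically stable sketch (the input values are made up) showing that the softmax outputs form a probability distribution:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))   # subtracting the max avoids overflow; the result is unchanged
    return e / np.sum(e)

z = np.array([2.0, 1.0, 0.1])   # hypothetical weighted inputs of the output layer
p = softmax(z)
print(p)          # [0.659 0.242 0.099] approximately
print(p.sum())    # 1.0 -- the activations can be read as class probabilities
```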
41 |
42 | Refer to these [notes](https://www.ics.uci.edu/~pjsadows/notes.pdf) for the calculus of the softmax function.
43 | [Source](http://yann.lecun.com/exdb/mnist/) of the MNIST training data set.
--------------------------------------------------------------------------------
/constants.py:
--------------------------------------------------------------------------------
1 | """Definitions of Constants used in neuralnet"""
2 |
3 | # Activation functions
4 | SIGMOID = 'sigmoid'
5 | SOFTMAX = 'softmax'
6 |
7 | # Regularization method
8 | L2 = 'L2'
9 | L1 = 'L1'
10 |
--------------------------------------------------------------------------------
/data/mnist/t10k-images-idx3-ubyte:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/meetvora/mlp-classifier/0af5759e95cbfc730d0b733d65b87a6186e49fcc/data/mnist/t10k-images-idx3-ubyte
--------------------------------------------------------------------------------
/data/mnist/t10k-labels-idx1-ubyte:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/meetvora/mlp-classifier/0af5759e95cbfc730d0b733d65b87a6186e49fcc/data/mnist/t10k-labels-idx1-ubyte
--------------------------------------------------------------------------------
/data/mnist/train-images-idx3-ubyte:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/meetvora/mlp-classifier/0af5759e95cbfc730d0b733d65b87a6186e49fcc/data/mnist/train-images-idx3-ubyte
--------------------------------------------------------------------------------
/data/mnist/train-labels-idx1-ubyte:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/meetvora/mlp-classifier/0af5759e95cbfc730d0b733d65b87a6186e49fcc/data/mnist/train-labels-idx1-ubyte
--------------------------------------------------------------------------------
/loader.py:
--------------------------------------------------------------------------------
1 | import os, struct
2 | import numpy as np
3 |
4 | def load_mnist(path, kind='train'):
5 |     """ load_mnist('mnist') => returns (images, labels) """
6 |     labels_path = os.path.join('data', path, '%s-labels-idx1-ubyte' % kind)
7 |     images_path = os.path.join('data', path, '%s-images-idx3-ubyte' % kind)
8 |     with open(labels_path, 'rb') as lpath:
9 |         magic, n = struct.unpack('>II', lpath.read(8))    # 8-byte IDX header: magic number, item count
10 |         labels = np.fromfile(lpath, dtype=np.uint8)
11 |     with open(images_path, 'rb') as imgpath:
12 |         magic, num, rows, cols = struct.unpack(">IIII", imgpath.read(16))    # 16-byte IDX header
13 |         images = np.fromfile(imgpath, dtype=np.uint8).reshape(len(labels), 784)    # 28x28 pixels per image
14 |     return images, labels
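
# Hypothetical usage sketch (illustration, not part of the original file);
# it assumes the IDX files sit under data/mnist/ as in the repository tree.
if __name__ == '__main__':
    X_train, y_train = load_mnist('mnist', kind='train')
    X_test, y_test = load_mnist('mnist', kind='t10k')
    print X_train.shape, y_train.shape    # (60000, 784) (60000,)
    print X_test.shape, y_test.shape      # (10000, 784) (10000,)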
--------------------------------------------------------------------------------
/models/naiveScratch.py:
--------------------------------------------------------------------------------
1 | import math
2 |
3 | class NaiveBayesClassifier(object):
4 |     def __init__(self, x, y):
5 |         self.classes = set(y)
6 |         self.class_count = len(set(y))
7 |         self.train = zip(x, y)
8 |
9 |     def classProb(self, clss, input, method='regular'):
10 |         try:
11 |             x_probab = []
12 |             clssProbability = len(filter(lambda u: u[1] == clss, self.train)) / float(len(self.train))   # prior P(class)
13 |             if method == 'regular':
14 |                 clss_instances = filter(lambda u: u[1] == clss, self.train)
15 |                 for i in range(len(input)):
16 |                     x_occurrence = len(filter(lambda u: u[0][i] == input[i], clss_instances))
17 |                     x_probab.append(x_occurrence / float(len(clss_instances)))   # relative frequency of this attribute value
18 |             if method == 'gaussian':
19 |                 x_mean, x_stdev = self._classMean(clss), self._classStd(clss)
20 |                 for i in range(len(input)):
21 |                     x, mean, stdev = input[i], x_mean[i], x_stdev[i]
22 |                     if stdev == 0:
23 |                         continue
24 |                     exponent = math.exp(-(math.pow(x - mean, 2) / float(2 * math.pow(stdev, 2))))
25 |                     x_probab.append((1.0 / (math.sqrt(2 * math.pi) * stdev)) * exponent)   # Gaussian density
26 |             return (reduce(lambda x, y: x * y, x_probab) * clssProbability)
27 |         except Exception as e:
28 |             print e
29 |
30 |     def _classMean(self, clss):
31 |         def mean(values):
32 |             return sum(values) / float(len(values))
33 |         clss_instances = filter(lambda u: u[1] == clss, self.train)
34 |         X = [u[0] for u in clss_instances]
35 |         x_mean = tuple([mean(attr) for attr in zip(*X)])
36 |         return x_mean
37 |
38 |     def _classStd(self, clss):
39 |         def stdev(values):
40 |             avg = sum(values) / float(len(values))
41 |             variance = sum([pow(x - avg, 2) for x in values]) / float(len(values) - 1)
42 |             return math.sqrt(variance)
43 |         clss_instances = filter(lambda u: u[1] == clss, self.train)
44 |         X = [u[0] for u in clss_instances]
45 |         x_stdev = tuple([stdev(attr) for attr in zip(*X)])
46 |         return x_stdev
47 |
48 |     def predictClass(self, input, method='regular'):
49 |         probabilityMap = {}
50 |         for clss in self.classes:
51 |             probabilityMap[clss] = self.classProb(clss, input, method)
52 |         print 'Prediction:', max(probabilityMap, key=probabilityMap.get)
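
# Hypothetical usage sketch (illustration, not part of the original file);
# the toy feature vectors and labels below are made up.
if __name__ == '__main__':
    x = [(1.0, 2.1), (1.2, 1.9), (8.0, 8.2), (7.9, 8.4)]
    y = ['a', 'a', 'b', 'b']
    clf = NaiveBayesClassifier(x, y)
    clf.predictClass((8.1, 8.0), method='gaussian')   # prints "Prediction: b"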
--------------------------------------------------------------------------------
/neuralnet.py:
--------------------------------------------------------------------------------
1 | import numpy as np
2 | from scipy.special import expit
3 | from constants import *
4 |
5 | class NeuralNetMLP(object):
6 |     def __init__(self, layers, random_state=None):
7 |         """ Initialise the layers as list(input_layer, ...hidden_layers..., output_layer) """
8 |         np.random.seed(random_state)
9 |         self.num_layers = len(layers)
10 |         self.layers = layers
11 |         self.initialize_weights()
12 |
13 |     def initialize_weights(self):
14 |         """ Randomly generate biases and weights for all layers after the input layer.
15 |             Weights have a Gaussian distribution with mean 0 and
16 |             standard deviation 1 over the square root of the number
17 |             of weights connecting to the same neuron """
18 |         self.biases = [np.random.randn(y, 1) for y in self.layers[1:]]
19 |         self.weights = [np.random.randn(y, x) / np.sqrt(x) for x, y in zip(self.layers[:-1], self.layers[1:])]
20 |
21 |     def fit(self, training_data, l1=0.0, l2=0.0, epochs=500, eta=0.001, minibatches=1, regularization=L2):
22 |         """ Fits the parameters according to the training data; minibatches is the
23 |             number of mini-batches per epoch and l1 (l2) the L1 (L2) regularization coefficient. """
24 |         self.l1, self.l2 = l1, l2
25 |         n = len(training_data)
26 |         mini_batch_size = n // minibatches
27 |         for epoch in xrange(epochs):
28 |             np.random.shuffle(training_data)
29 |             mini_batches = [training_data[k:k + mini_batch_size] for k in xrange(0, n, mini_batch_size)]
30 |             for mini_batch in mini_batches:
31 |                 self.batch_update(mini_batch, eta, n, regularization)
32 |
33 |     def batch_update(self, mini_batch, eta, n, regularization=L2):
34 |         """ Update the network's weights and biases by applying gradient
35 |             descent using backpropagation to a single mini batch. """
36 |         nabla_b = [np.zeros(b.shape) for b in self.biases]
37 |         nabla_w = [np.zeros(w.shape) for w in self.weights]
38 |         for x, y in mini_batch:
39 |             delta_nabla_b, delta_nabla_w = self.back_propagation(x, y)
40 |             nabla_b = [nb + dnb for nb, dnb in zip(nabla_b, delta_nabla_b)]
41 |             nabla_w = [nw + dnw for nw, dnw in zip(nabla_w, delta_nabla_w)]
42 |         self.biases = [b - (eta / len(mini_batch)) * nb for b, nb in zip(self.biases, nabla_b)]
43 |         if regularization == L2:
44 |             self.weights = [(1 - eta * (self.l2 / n)) * w - (eta / len(mini_batch)) * nw for w, nw in zip(self.weights, nabla_w)]
45 |         elif regularization == L1:
46 |             self.weights = [w - eta * self.l1 * np.sign(w) / n - (eta / len(mini_batch)) * nw for w, nw in zip(self.weights, nabla_w)]
47 |
48 |
49 |     def back_propagation(self, x, y, fn=SIGMOID):
50 |         """ The gradient of the cost function is calculated from a(L) and
51 |             back-propagated to the input layer.
52 |             The cross-entropy cost function is associated with sigmoid neurons, while
53 |             the log-likelihood cost function is associated with softmax neurons. """
54 |         nabla_b = [np.zeros(b.shape) for b in self.biases]
55 |         nabla_w = [np.zeros(w.shape) for w in self.weights]
56 |         activation = x
57 |         activations = [x]    # layer-by-layer activations
58 |         zs = []              # layer-by-layer weighted inputs
59 |         for b, w in zip(self.biases, self.weights):
60 |             z = np.dot(w, activation) + b
61 |             zs.append(z)
62 |             if fn == SIGMOID:
63 |                 activation = sigmoid(z)
64 |             else:
65 |                 activation = softmax(z)
66 |             activations.append(activation)
67 |         dell = delta(activations[-1], y)
68 |         nabla_b[-1] = dell
69 |         nabla_w[-1] = np.dot(dell, activations[-2].transpose())
70 |         for l in xrange(2, self.num_layers):
71 |             dell = np.dot(self.weights[-l + 1].transpose(), dell) * derivative(zs[-l], fn)
72 |             nabla_b[-l] = dell
73 |             nabla_w[-l] = np.dot(dell, activations[-l - 1].transpose())
74 |         return (nabla_b, nabla_w)
75 |
76 | def cross_entropy_loss(a, y):
77 |     return np.sum(np.nan_to_num(-y * np.log(a) - (1 - y) * np.log(1 - a)))
78 |
79 | def log_likelihood_loss(a, y):
80 |     """ -ln of the softmax probability assigned to the correct class (one-hot y) """
81 |     return -np.dot(y.transpose(), np.log(softmax(a)))
82 |
83 | def delta(a, y):
84 |     """ The output-layer delta works out to be the same for both activations """
85 |     return (a - y)
86 |
87 | def sigmoid(z):
88 |     """ expit is equivalent to 1.0/(1.0 + np.exp(-z)) """
89 |     return expit(z)
90 |
91 | def softmax(z):
92 |     e = np.exp(z - np.max(z))   # subtract the max for numerical stability
93 |     return (e / np.sum(e))
94 |
95 | def derivative(z, fn):
96 |     """ The derivative of f is f(1-f) for both activation functions """
97 |     if fn == SIGMOID:
98 |         f = sigmoid
99 |     elif fn == SOFTMAX:
100 |         f = softmax
101 |     return f(z) * (1 - f(z))
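
# Hypothetical usage sketch (illustration, not part of the original module);
# it assumes load_mnist from loader.py and reshapes each image into a column
# vector paired with a one-hot label, the format back_propagation expects.
if __name__ == '__main__':
    from loader import load_mnist
    X, y = load_mnist('mnist', kind='train')
    training_data = [(x.reshape(784, 1) / 255.0, np.eye(10)[label].reshape(10, 1))
                     for x, label in zip(X, y)]
    net = NeuralNetMLP([784, 30, 10])
    net.fit(training_data, l2=0.1, epochs=30, eta=0.5, minibatches=6000)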
--------------------------------------------------------------------------------
/neuralnet.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/meetvora/mlp-classifier/0af5759e95cbfc730d0b733d65b87a6186e49fcc/neuralnet.pyc
--------------------------------------------------------------------------------