├── LICENSE
├── README.md
├── constants.py
├── data
│   └── mnist
│       ├── t10k-images-idx3-ubyte
│       ├── t10k-labels-idx1-ubyte
│       ├── train-images-idx3-ubyte
│       └── train-labels-idx1-ubyte
├── loader.py
├── models
│   └── naiveScratch.py
├── neuralnet.py
└── neuralnet.pyc
/LICENSE:
--------------------------------------------------------------------------------
1 | The MIT License (MIT)
2 |
3 | Copyright (c) 2016 Meet Vora
4 |
5 | Permission is hereby granted, free of charge, to any person obtaining a copy
6 | of this software and associated documentation files (the "Software"), to deal
7 | in the Software without restriction, including without limitation the rights
8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 |
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 |
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | MLP Classifier
2 | ==========================
3 | A *Handwritten* **Multilayer Perceptron Classifier**
4 |
5 | This Python implementation extends the artificial neural networks discussed in [Python Machine Learning](https://github.com/rasbt/python-machine-learning-book) and [Neural Networks and Deep Learning](http://neuralnetworksanddeeplearning.com): the ANN is turned into a **deep** neural network with **softmax layers**, a **log-likelihood** *loss function*, and **L1** and **L2** *regularization techniques*.
6 |
7 | ### Some Basics
8 | An artificial neuron is a mathematical function conceived as a model of a biological neuron. Each node in the diagram is a neuron, which passes its information to the next layer through a transfer function.
9 |
10 | 
11 |
12 | The transfer function is a linear combination of the input neurons plus a fixed value, the *bias* (threshold in the figure). The coefficients of the input neurons are the *weights*.
13 | In the code, the biases form a list of numpy arrays of length `layers - 1`, since the input layer has no bias. The weights, also numpy arrays, form one matrix for each pair of adjacent layers in the network.
14 | The activation function produces the output of the given neuron, as sketched below.
15 |
16 | ```python
17 | # X is the output vector of the (j-1)-th layer
18 | X = activations[j - 1]
19 | w = weights[j - 1]                  # weight matrix between layers j-1 and j
20 | bias = threshold[j - 1]             # bias vector of layer j
21 | transfer_function = np.dot(w, X)    # linear combination of the inputs
22 | o = activation(transfer_function + bias)
23 | ```
24 |
25 | ### Details
26 | The implementation includes two types of artificial neurons:
27 | * Sigmoid Neurons
28 | * Softmax Neurons
29 |
30 | The loss function associated with the softmax function is the *log-likelihood function*, while the loss function for the sigmoid function is the *cross-entropy function*. The calculus for both loss functions is discussed within the code.
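As a rough numerical illustration (not taken from the repository; the activation values below are made up), the two costs behave as follows for a single one-hot target:

```python
import numpy as np

y = np.array([0.0, 1.0, 0.0])   # one-hot target: the correct class is index 1
a = np.array([0.1, 0.7, 0.2])   # hypothetical output activations (sum to 1)

# Cross-entropy cost, paired with sigmoid output neurons
cross_entropy = np.sum(np.nan_to_num(-y * np.log(a) - (1 - y) * np.log(1 - a)))

# Log-likelihood cost, paired with softmax output neurons: -ln(a_j) for the true class j
log_likelihood = -np.log(a[np.argmax(y)])

print(cross_entropy)    # ~0.69: also penalises probability mass on the wrong classes
print(log_likelihood)   # ~0.36: looks only at the probability of the true class
```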
31 |
32 | Further, the two most common regularization techniques, *L1* and *L2*, are used to prevent overfitting of the training data.
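The corresponding weight update mirrors `batch_update` in `neuralnet.py`; `regularized_update` itself is a hypothetical helper written only for this illustration:

```python
import numpy as np

def regularized_update(w, nabla_w, eta, batch_size, n, l1=0.0, l2=0.0, method='L2'):
    """Gradient step on one weight matrix with an L1 or L2 penalty."""
    if method == 'L2':
        # L2 (weight decay): shrink each weight in proportion to its magnitude
        return (1 - eta * l2 / n) * w - (eta / batch_size) * nabla_w
    # L1: shrink each weight by a constant amount in the direction of its sign
    return w - eta * l1 * np.sign(w) / n - (eta / batch_size) * nabla_w
```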
33 |
34 | ##### Why Softmax?
35 | For the *j*-th component z<sup>L</sup><sub>j</sub> of the output-layer vector z<sup>L</sup>, softmax(z<sup>L</sup><sub>j</sub>) is defined as
36 | 
37 | The output from the softmax layer can be thought of as a probability distribution.
38 | 
39 |
40 | In many problems it is convenient to be able to interpret the output activation ***O(j)*** as the network's estimate of the probability that the correct output is ***j***.
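A minimal, numerically stable sketch (the input values are made up) showing that the softmax outputs form a probability distribution:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))   # subtracting the max avoids overflow; the result is unchanged
    return e / np.sum(e)

z = np.array([2.0, 1.0, 0.1])   # hypothetical weighted inputs of the output layer
p = softmax(z)
print(p)          # [0.659 0.242 0.099] approximately
print(p.sum())    # 1.0 -- the activations can be read as class probabilities
```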
41 |
42 | Refer to these [notes](https://www.ics.uci.edu/~pjsadows/notes.pdf) for the calculus of the softmax function.
43 | [Source](http://yann.lecun.com/exdb/mnist/) of the MNIST training data set.
--------------------------------------------------------------------------------
/constants.py:
--------------------------------------------------------------------------------
1 | """Definitions of Constants used in neuralnet"""
2 |
3 | # Activation functions
4 | SIGMOID = 'sigmoid'
5 | SOFTMAX = 'softmax'
6 |
7 | # Regularization method
8 | L2 = 'L2'
9 | L1 = 'L1'
10 |
--------------------------------------------------------------------------------
/data/mnist/t10k-images-idx3-ubyte:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/meetvora/mlp-classifier/0af5759e95cbfc730d0b733d65b87a6186e49fcc/data/mnist/t10k-images-idx3-ubyte
--------------------------------------------------------------------------------
/data/mnist/t10k-labels-idx1-ubyte:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/meetvora/mlp-classifier/0af5759e95cbfc730d0b733d65b87a6186e49fcc/data/mnist/t10k-labels-idx1-ubyte
--------------------------------------------------------------------------------
/data/mnist/train-images-idx3-ubyte:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/meetvora/mlp-classifier/0af5759e95cbfc730d0b733d65b87a6186e49fcc/data/mnist/train-images-idx3-ubyte
--------------------------------------------------------------------------------
/data/mnist/train-labels-idx1-ubyte:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/meetvora/mlp-classifier/0af5759e95cbfc730d0b733d65b87a6186e49fcc/data/mnist/train-labels-idx1-ubyte
--------------------------------------------------------------------------------
/loader.py:
--------------------------------------------------------------------------------
1 | import os, struct
2 | import numpy as np
3 |
4 | def load_mnist(path, kind='train'):
5 |     """ load_mnist('mnist') => returns (images, labels) """
6 |     labels_path = os.path.join('data', path, '%s-labels-idx1-ubyte' % kind)
7 |     images_path = os.path.join('data', path, '%s-images-idx3-ubyte' % kind)
8 |     with open(labels_path, 'rb') as lpath:
9 |         magic, n = struct.unpack('>II', lpath.read(8))    # 8-byte IDX header: magic number, item count
10 |         labels = np.fromfile(lpath, dtype=np.uint8)
11 |     with open(images_path, 'rb') as imgpath:
12 |         magic, num, rows, cols = struct.unpack(">IIII", imgpath.read(16))    # 16-byte IDX header
13 |         images = np.fromfile(imgpath, dtype=np.uint8).reshape(len(labels), 784)    # 28x28 pixels per image
14 |     return images, labels
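
# Hypothetical usage sketch (illustration, not part of the original file);
# it assumes the IDX files sit under data/mnist/ as in the repository tree.
if __name__ == '__main__':
    X_train, y_train = load_mnist('mnist', kind='train')
    X_test, y_test = load_mnist('mnist', kind='t10k')
    print X_train.shape, y_train.shape    # (60000, 784) (60000,)
    print X_test.shape, y_test.shape      # (10000, 784) (10000,)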
--------------------------------------------------------------------------------
/models/naiveScratch.py:
--------------------------------------------------------------------------------
1 | import math
2 |
3 | class NaiveBayesClassifier(object):
4 |     def __init__(self, x, y):
5 |         self.classes = set(y)
6 |         self.class_count = len(set(y))
7 |         self.train = zip(x, y)
8 |
9 |     def classProb(self, clss, input, method='regular'):
10 |         try:
11 |             x_probab = []
12 |             clssProbability = len(filter(lambda u: u[1] == clss, self.train)) / float(len(self.train))   # prior P(class)
13 |             if method == 'regular':
14 |                 clss_instances = filter(lambda u: u[1] == clss, self.train)
15 |                 for i in range(len(input)):
16 |                     x_occurrence = len(filter(lambda u: u[0][i] == input[i], clss_instances))
17 |                     x_probab.append(x_occurrence / float(len(clss_instances)))   # relative frequency of this attribute value
18 |             if method == 'gaussian':
19 |                 x_mean, x_stdev = self._classMean(clss), self._classStd(clss)
20 |                 for i in range(len(input)):
21 |                     x, mean, stdev = input[i], x_mean[i], x_stdev[i]
22 |                     if stdev == 0:
23 |                         continue
24 |                     exponent = math.exp(-(math.pow(x - mean, 2) / float(2 * math.pow(stdev, 2))))
25 |                     x_probab.append((1.0 / (math.sqrt(2 * math.pi) * stdev)) * exponent)   # Gaussian density
26 |             return (reduce(lambda x, y: x * y, x_probab) * clssProbability)
27 |         except Exception as e:
28 |             print e
29 |
30 |     def _classMean(self, clss):
31 |         def mean(values):
32 |             return sum(values) / float(len(values))
33 |         clss_instances = filter(lambda u: u[1] == clss, self.train)
34 |         X = [u[0] for u in clss_instances]
35 |         x_mean = tuple([mean(attr) for attr in zip(*X)])
36 |         return x_mean
37 |
38 |     def _classStd(self, clss):
39 |         def stdev(values):
40 |             avg = sum(values) / float(len(values))
41 |             variance = sum([pow(x - avg, 2) for x in values]) / float(len(values) - 1)
42 |             return math.sqrt(variance)
43 |         clss_instances = filter(lambda u: u[1] == clss, self.train)
44 |         X = [u[0] for u in clss_instances]
45 |         x_stdev = tuple([stdev(attr) for attr in zip(*X)])
46 |         return x_stdev
47 |
48 |     def predictClass(self, input, method='regular'):
49 |         probabilityMap = {}
50 |         for clss in self.classes:
51 |             probabilityMap[clss] = self.classProb(clss, input, method)
52 |         print 'Prediction:', max(probabilityMap, key=probabilityMap.get)
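
# Hypothetical usage sketch (illustration, not part of the original file);
# the toy feature vectors and labels below are made up.
if __name__ == '__main__':
    x = [(1.0, 2.1), (1.2, 1.9), (8.0, 8.2), (7.9, 8.4)]
    y = ['a', 'a', 'b', 'b']
    clf = NaiveBayesClassifier(x, y)
    clf.predictClass((8.1, 8.0), method='gaussian')   # prints "Prediction: b"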
--------------------------------------------------------------------------------
/neuralnet.py:
--------------------------------------------------------------------------------
1 | import numpy as np
2 | from scipy.special import expit
3 | from constants import *
4 |
5 | class NeuralNetMLP(object):
6 |     def __init__(self, layers, random_state=None):
7 |         """ Initialise the layers as list(input_layer, ...hidden_layers..., output_layer) """
8 |         np.random.seed(random_state)
9 |         self.num_layers = len(layers)
10 |         self.layers = layers
11 |         self.initialize_weights()
12 |
13 |     def initialize_weights(self):
14 |         """ Randomly generate biases and weights for all layers after the input layer.
15 |             Weights have a Gaussian distribution with mean 0 and
16 |             standard deviation 1 over the square root of the number
17 |             of weights connecting to the same neuron """
18 |         self.biases = [np.random.randn(y, 1) for y in self.layers[1:]]
19 |         self.weights = [np.random.randn(y, x) / np.sqrt(x) for x, y in zip(self.layers[:-1], self.layers[1:])]
20 |
21 |     def fit(self, training_data, l1=0.0, l2=0.0, epochs=500, eta=0.001, minibatches=1, regularization=L2):
22 |         """ Fits the parameters according to the training data; minibatches is the
23 |             number of mini-batches per epoch and l1 (l2) the L1 (L2) regularization coefficient. """
24 |         self.l1, self.l2 = l1, l2
25 |         n = len(training_data)
26 |         mini_batch_size = n // minibatches
27 |         for epoch in xrange(epochs):
28 |             np.random.shuffle(training_data)
29 |             mini_batches = [training_data[k:k + mini_batch_size] for k in xrange(0, n, mini_batch_size)]
30 |             for mini_batch in mini_batches:
31 |                 self.batch_update(mini_batch, eta, n, regularization)
32 |
33 |     def batch_update(self, mini_batch, eta, n, regularization=L2):
34 |         """ Update the network's weights and biases by applying gradient
35 |             descent using backpropagation to a single mini batch. """
36 |         nabla_b = [np.zeros(b.shape) for b in self.biases]
37 |         nabla_w = [np.zeros(w.shape) for w in self.weights]
38 |         for x, y in mini_batch:
39 |             delta_nabla_b, delta_nabla_w = self.back_propagation(x, y)
40 |             nabla_b = [nb + dnb for nb, dnb in zip(nabla_b, delta_nabla_b)]
41 |             nabla_w = [nw + dnw for nw, dnw in zip(nabla_w, delta_nabla_w)]
42 |         self.biases = [b - (eta / len(mini_batch)) * nb for b, nb in zip(self.biases, nabla_b)]
43 |         if regularization == L2:
44 |             self.weights = [(1 - eta * (self.l2 / n)) * w - (eta / len(mini_batch)) * nw for w, nw in zip(self.weights, nabla_w)]
45 |         elif regularization == L1:
46 |             self.weights = [w - eta * self.l1 * np.sign(w) / n - (eta / len(mini_batch)) * nw for w, nw in zip(self.weights, nabla_w)]
47 |
48 |
49 |     def back_propagation(self, x, y, fn=SIGMOID):
50 |         """ The gradient of the cost function is calculated from a(L) and
51 |             back-propagated to the input layer.
52 |             The cross-entropy cost function is associated with sigmoid neurons, while
53 |             the log-likelihood cost function is associated with softmax neurons. """
54 |         nabla_b = [np.zeros(b.shape) for b in self.biases]
55 |         nabla_w = [np.zeros(w.shape) for w in self.weights]
56 |         activation = x
57 |         activations = [x]    # layer-by-layer activations
58 |         zs = []              # layer-by-layer weighted inputs
59 |         for b, w in zip(self.biases, self.weights):
60 |             z = np.dot(w, activation) + b
61 |             zs.append(z)
62 |             if fn == SIGMOID:
63 |                 activation = sigmoid(z)
64 |             else:
65 |                 activation = softmax(z)
66 |             activations.append(activation)
67 |         dell = delta(activations[-1], y)
68 |         nabla_b[-1] = dell
69 |         nabla_w[-1] = np.dot(dell, activations[-2].transpose())
70 |         for l in xrange(2, self.num_layers):
71 |             dell = np.dot(self.weights[-l + 1].transpose(), dell) * derivative(zs[-l], fn)
72 |             nabla_b[-l] = dell
73 |             nabla_w[-l] = np.dot(dell, activations[-l - 1].transpose())
74 |         return (nabla_b, nabla_w)
75 |
76 | def cross_entropy_loss(a, y):
77 |     return np.sum(np.nan_to_num(-y * np.log(a) - (1 - y) * np.log(1 - a)))
78 |
79 | def log_likelihood_loss(a, y):
80 |     """ -ln of the softmax probability assigned to the correct class (one-hot y) """
81 |     return -np.dot(y.transpose(), np.log(softmax(a)))
82 |
83 | def delta(a, y):
84 |     """ The output-layer delta works out to be the same for both activations """
85 |     return (a - y)
86 |
87 | def sigmoid(z):
88 |     """ expit is equivalent to 1.0/(1.0 + np.exp(-z)) """
89 |     return expit(z)
90 |
91 | def softmax(z):
92 |     e = np.exp(z - np.max(z))   # subtract the max for numerical stability
93 |     return (e / np.sum(e))
94 |
95 | def derivative(z, fn):
96 |     """ The derivative of f is f(1-f) for both activation functions """
97 |     if fn == SIGMOID:
98 |         f = sigmoid
99 |     elif fn == SOFTMAX:
100 |         f = softmax
101 |     return f(z) * (1 - f(z))
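
# Hypothetical usage sketch (illustration, not part of the original module);
# it assumes load_mnist from loader.py and reshapes each image into a column
# vector paired with a one-hot label, the format back_propagation expects.
if __name__ == '__main__':
    from loader import load_mnist
    X, y = load_mnist('mnist', kind='train')
    training_data = [(x.reshape(784, 1) / 255.0, np.eye(10)[label].reshape(10, 1))
                     for x, label in zip(X, y)]
    net = NeuralNetMLP([784, 30, 10])
    net.fit(training_data, l2=0.1, epochs=30, eta=0.5, minibatches=6000)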
--------------------------------------------------------------------------------
/neuralnet.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/meetvora/mlp-classifier/0af5759e95cbfc730d0b733d65b87a6186e49fcc/neuralnet.pyc
--------------------------------------------------------------------------------