├── README.md
├── __init__.py
├── algebra_helpers.py
├── data
│   └── mldata
│       └── mnist-original.mat
├── eval_nn.png
├── mlp_np.py
├── mlp_plain.py
└── requirements.txt


/README.md:
--------------------------------------------------------------------------------
 1 | 
 2 | # Multi Layer Perceptron from Scratch Using Python3
 3 | 
 4 | ## Table of contents
 5 | 
 6 | * [Intro](#intro)
 7 | * [Structure and Components](#structure-and-components)
 8 | * [Prerequisites](#prerequisites)
 9 | * [Quickstart](#quickstart)
10 | * [Experiments and Evaluation](#experiments-and-evaluation)
11 | * [To-Dos](#to-dos)
12 | * [Final Thoughts](#final-thoughts-and-future-work)
13 | * [References](#references)
14 | 
15 | 
16 | ## Intro
17 | To better understand the processes inside a multi layer perceptron, this project implements a simple MLP from scratch, using no external machine learning libraries. Algebra and calculus libraries are used only sparingly.
18 | The whole network is written in Python 3.
19 | ## Structure and Components
20 | This project contains three modules:
21 | - mlp_np.py uses NumPy for linear algebra and calculus operations
22 | - mlp_plain.py uses **no** additional libraries in the feed forward and backpropagation process
23 | - algebra_helpers.py contains methods for linear algebra
24 | 
25 | The MLP consists of an input layer, a hidden layer and an output layer. The hidden layer uses a ReLU activation function; sigmoid is available, too. The output layer uses a softmax function to predict a multinomial problem. The input data labels need to be encoded as one-hot vectors. You can find a one-hot vector encoder and decoder for multinomial classes in the code (see the usage sketch in the Quickstart section below).
26 | The MLP has three optimization algorithms implemented:
27 | - Stochastic gradient descent
28 | - Momentum
29 | - AdaGrad
30 | 
31 | All three optimizers are based on calculating gradients per sample.
32 | The scripts contain the following methods:
33 | 
34 | 1. **fit**
35 | 
36 | >Fits the model to the data. In verbose mode, returns the numbers of seen samples as a list and the corresponding accuracy scores as a list.
37 | 2. **predict**
38 | 
39 | >Makes predictions for given dataset samples and labels using the fitted model.
40 | 
41 | The NeuralNetwork class takes the following parameters:
42 | - dataset samples as a list of lists
43 | - dataset labels as a list of lists
44 | - size of the hidden layer (default: 30)
45 | - eta hyperparameter (learning rate, default: 0.1)
46 | - my hyperparameter (momentum factor, default: 0.9)
47 | - number of epochs (default: 10)
48 | - optimizer type (default: SGD)
49 | - show intermediate steps, including accuracy score per epoch (default: False)
50 | 
51 | ## Prerequisites
52 | - Python 3.4+
53 | - scikit-learn for evaluation and data loading
54 | - NumPy
55 | - matplotlib
56 | ## Quickstart
57 | ```
58 | $ virtualenv -p python3 venv
59 | $ source venv/bin/activate
60 | $ pip install -r requirements.txt
61 | ```
62 | Congratulations! You are now ready to train the MLP to recognize handwritten digits.
63 | By default the scripts use the MNIST dataset for fitting. Generally they can run on any dataset fulfilling the sklearn dataset format.
64 | The scripts output the mean error per epoch, the accuracy on the test set and the accuracy on the training set for each of the three optimizers. Finally, they create a graph displaying accuracy versus seen samples (see the evaluation section).
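The NeuralNetwork class itself is self-contained, so it can be wired up to your own data along the same lines as the MNIST demo. Below is a minimal sketch of that flow; `my_samples` and `my_targets` are hypothetical placeholders for your own feature vectors and integer class ids, and in practice you would adapt the code at the bottom of mlp_np.py (or mlp_plain.py), since both scripts already run a MNIST demo at module level.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# my_samples: feature vectors, my_targets: integer class ids (placeholders)
y = np.array(my_targets).reshape(-1, 1)
enc = OneHotEncoder()
labels = enc.fit_transform(y).toarray()   # labels must be one-hot vectors

# the verbose evaluation inside fit() uses the global names X_train/X_test/y_train/y_test
X_train, X_test, y_train, y_test = train_test_split(my_samples, labels, test_size=0.33, random_state=42)

NN = NeuralNetwork(samples=X_train, labels=y_train, size_hidden=30, eta=0.1,
                   epochs=10, optimizer="sgd", verbose=True)
NN.fit()

# predict() decodes the one-hot vectors back to class indices for scoring
y_pred, y_true = NN.predict(X_test, y_test)
print("Accuracy:", accuracy_score(y_true, y_pred))
```
With verbose=True, fit() also returns the number of seen samples and the accuracy per epoch, which is what the evaluation plot shown below is built from.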
65 | To run the MLP with NumPy, just type:
66 | ```
67 | $ python mlp_np.py
68 | ```
69 | Example output:
70 | ```
71 | Epoch: 0 - Error: 4.46429073186e-09 - Accuracy im Testset: 0.944415584416
72 | Epoch: 0 - Error: 4.46429073186e-09 - Accuracy im Trainingsset: 0.947846481876
73 | ############################################
74 | ```
75 | To run the plain MLP, type:
76 | ```
77 | $ python mlp_plain.py
78 | ```
79 | If you want to change hyperparameters and optimizer types, edit lines 212+ in the script.
80 | ```python
81 | etas = [0.5, 0.8, 1, 2]
82 | for eta in etas:
83 |     NN = NeuralNetwork(samples=X_train, labels=y_train, eta=eta, epochs=50, size_hidden=40, optimizer="sgd", verbose=True)
84 |     fitted = NN.fit()
85 | 
86 | ```
87 | ## Experiments and Evaluation
88 | The dataset is split into a training set (46900 samples) and a test set (23100 samples) using the train_test_split method of sklearn.
89 | mlp_np.py is used for the evaluation, as it runs much faster than mlp_plain.py: processing the 46900 training samples takes mlp_np.py 14 seconds and mlp_plain.py 399 seconds.
90 | The model is evaluated after **2,400,000** samples have been seen. The accuracy scores are based on the model's performance on the test set.
91 | 
92 | The hyperparameters for each optimizer are noted in the table. They were chosen after some evaluation test runs; a deeper hyperparameter optimization has not been done so far (see To-Dos):
93 | 
94 | | Optimizer | Best accuracy score | Mean total error |
95 | | ------------- | ------------- | ------------- |
96 | | SGD *Eta=3.5* | 97.1% | 2.97e-12 |
97 | | Momentum *Eta=0.1, My=0.9* | 96.5% | 1.97e-09 |
98 | | AdaGrad *Eta=1* | 96.3% | 1.22e-08 |
99 | 
100 | ![Evaluation Curve - Accuracy vs Samples seen](https://github.com/MaviccPRP/mlp_from_scratch/blob/master/eval_nn.png)
101 | ## To-Dos
102 | - [X] Implement per-sample SGD
103 | - [ ] Implement mini-batch training
104 | - [ ] Implement AdaGrad with diagonal matrices
105 | - [ ] Grid search for optimizing hyperparameters
106 | - [ ] Compare evaluation performance on the test set and the training set
107 | 
108 | ## Final Thoughts and Future Work
109 | 
110 | - The best results are reached using the per-sample SGD algorithm.
111 | - Momentum oscillates once higher accuracy scores are reached. This could be due to the per-sample approach.
112 | - The AdaGrad algorithm is smoother than Momentum and SGD, but does not reach the accuracy scores of SGD or Momentum.
113 | - In future work, implementing mini-batches could help improve the accuracy and smoothness of Momentum and AdaGrad (a rough sketch is given after this list).
114 | - The hyperparameters are still not fully evaluated; this could be done using grid search.
115 | - Additionally, the AdaGrad algorithm could be updated with a diagonal matrix approach.
116 | - Compare accuracy on the test and training sets to rule out overfitting.
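As a rough, hypothetical sketch of the mini-batch idea from the list above (not part of the current scripts): instead of updating the weights after every single sample, the per-sample gradients of a small batch would be averaged and applied in one update. `grad_fn` below is a placeholder for the existing backpropagation step that returns the two weight gradients.

```python
import numpy as np

def fit_minibatch(samples, labels, w01, w12, grad_fn, eta=0.1, batch_size=32, epochs=10):
    """Hypothetical mini-batch SGD: average per-sample gradients over a batch,
    then apply a single weight update per batch."""
    n = len(samples)
    for epoch in range(epochs):
        order = np.random.permutation(n)          # reshuffle samples every epoch
        for start in range(0, n, batch_size):
            batch = order[start:start + batch_size]
            g01 = np.zeros_like(w01)
            g12 = np.zeros_like(w12)
            for i in batch:                       # accumulate the gradients of the batch
                d01, d12 = grad_fn(samples[i:i + 1], labels[i:i + 1], w01, w12)
                g01 += d01
                g12 += d12
            w01 -= eta * g01 / len(batch)         # one averaged update per batch
            w12 -= eta * g12 / len(batch)
    return w01, w12
```
The averaging should smooth out the noisy per-sample updates that are likely behind the Momentum oscillation observed above.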
117 | ## References 118 | - Principles of training multi-layer neural network using backpropagation, http://galaxy.agh.edu.pl/~vlsi/AI/backp_t_en/backprop.html 119 | - A Step by Step Backpropagation Example, https://mattmazur.com/2015/03/17/a-step-by-step-backpropagation-example/ 120 | - Julia Kreutzer, Julian Hitschler, Neural Networks: Architectures and Applications for NLP, http://www.cl.uni-heidelberg.de/courses/ws16/neuralnetworks/slides/session02.pdf 121 | - Julia Kreutzer, Julian Hitschler, Neural Networks: Architectures and Applications for NLP, http://www.cl.uni-heidelberg.de/courses/ws16/neuralnetworks/slides/session03.pdf 122 | - A Neural Network in 13 lines of Python (Part 2 - Gradient Descent), http://iamtrask.github.io/2015/07/27/python-network-part2/ 123 | - How the backpropagation algorithm works, http://neuralnetworksanddeeplearning.com/chap2.html 124 | - Paul Fackler, Notes on Matrix Calculus, http://www4.ncsu.edu/%7Epfackler/MatCalc.pdf 125 | -------------------------------------------------------------------------------- /__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/MaviccPRP/mlp_from_scratch/7032036147a2ad29e3fd3226f99e72eac76b0add/__init__.py -------------------------------------------------------------------------------- /algebra_helpers.py: -------------------------------------------------------------------------------- 1 | # Some helper functions for linear algebra 2 | 3 | # scalar multiplication 4 | def scalar_mult(k, X): 5 | res = [] 6 | for i in X: 7 | row = [] 8 | for j in i: 9 | row.append(k * j) 10 | res.append(row) 11 | return res 12 | 13 | 14 | 15 | # matrix mutiplication 16 | def dot(X,Y): 17 | if len(X[0]) != len(Y): 18 | return "Matrices do not match for multiplication" 19 | return [[sum(a*b for a,b in zip(X_row,Y_col)) for Y_col in zip(*Y)] for X_row in X] 20 | 21 | # hadamard product 22 | def hadamard(X,Y): 23 | res = [] 24 | if not any(isinstance(el, list) for el in X): 25 | for i in Y: 26 | row = [] 27 | for d, j in enumerate(X): 28 | row.append(i[d] * j) 29 | res.append(row) 30 | else: 31 | for i, j in zip(X,Y): 32 | row = [] 33 | if isinstance(i, list): 34 | for k, l in zip(i,j): 35 | row.append(k*l) 36 | res.append(row) 37 | 38 | return res 39 | 40 | 41 | # element wise summing up of two matrices 42 | def summarize(X,Y): 43 | res = [] 44 | for i, j in zip(X, Y): 45 | row = [] 46 | for k, l in zip(i, j): 47 | row.append(k + l) 48 | res.append(row) 49 | return res 50 | 51 | def substract(X,Y): 52 | res = [] 53 | for i, j in zip(X, Y): 54 | row = [] 55 | for k, l in zip(i, j): 56 | row.append(k - l) 57 | res.append(row) 58 | return res 59 | 60 | def transpose(X): 61 | return list(zip(*X)) 62 | 63 | 64 | def sum_matrix(X, axis=0): 65 | if axis == 0: 66 | return [sum(row[i] for row in X) for i in range(len(X[0]))] 67 | if axis == 1: 68 | rows = len(X) 69 | cols = len(X[0]) 70 | total = [] 71 | for x in range(0, rows): 72 | rowtotal = 0 73 | for y in range(0, cols): 74 | rowtotal = rowtotal + X[x][y] 75 | total.append(rowtotal) 76 | return total 77 | 78 | -------------------------------------------------------------------------------- /data/mldata/mnist-original.mat: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/MaviccPRP/mlp_from_scratch/7032036147a2ad29e3fd3226f99e72eac76b0add/data/mldata/mnist-original.mat -------------------------------------------------------------------------------- /eval_nn.png: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/MaviccPRP/mlp_from_scratch/7032036147a2ad29e3fd3226f99e72eac76b0add/eval_nn.png -------------------------------------------------------------------------------- /mlp_np.py: -------------------------------------------------------------------------------- 1 | # Class of a NeuralNetwork with SGD, Momentum and AdaGrad 2 | import numpy as np 3 | import matplotlib.pyplot as plt # Plotting library 4 | plt.matplotlib.use('Agg') 5 | from sklearn.datasets import load_digits 6 | from sklearn.preprocessing import OneHotEncoder 7 | from sklearn.model_selection import train_test_split 8 | from sklearn.metrics import accuracy_score 9 | from sklearn.datasets import fetch_mldata 10 | 11 | class NeuralNetwork: 12 | def __init__(self, samples, labels, size_hidden=30, eta=0.1, my=0.9, epochs=10, optimizer="sgd", verbose=False): 13 | ''' 14 | 15 | :param samples: input samples 16 | :param labels: input labels 17 | :param size_hidden: number of units in hidden layer 18 | :param eta: learning rate 19 | :param my: learning factor for momentum 20 | :param epochs: number of epochs 21 | :param optimizer: type of optimizer ("sgd", "momentum", "adagrad") 22 | :param verbose: print accuracy and error per epoch 23 | ''' 24 | self.samples = samples 25 | self.labels = labels 26 | self.w01 = np.random.random((len(self.samples[0]), size_hidden)) 27 | self.w12 = np.random.random((size_hidden, len(self.labels[0]))) 28 | self.v01 = np.zeros((len(self.samples[0]), size_hidden)) 29 | self.v12 = np.zeros((size_hidden, len(self.labels[0]))) 30 | self.g01 = np.zeros((len(self.samples[0]), size_hidden)) 31 | self.g12 = np.zeros((size_hidden, len(self.labels[0]))) 32 | self.b1 = np.array([0]) 33 | self.b2 = np.array([0]) 34 | self.eta = eta 35 | self.epochs = epochs 36 | self.my = my 37 | self.optimizer = optimizer 38 | self.verbose = verbose 39 | 40 | def sigmoid(self, x, deriv=False): 41 | if (deriv == True): 42 | return x * (1 - x) 43 | return 1 / (1 + np.exp(-x)) 44 | 45 | def softmax(self, x, deriv=False): 46 | if deriv == True: 47 | # Return the partial derivation of the activation function 48 | return np.multiply(x, 1 - x) 49 | y = x - np.max(x) 50 | e_x = np.exp(y) 51 | return e_x / e_x.sum() 52 | 53 | 54 | def relu(self, x, deriv=False): 55 | if deriv == True: 56 | return 1. 
* (x > 0) 57 | return x * (x > 0) 58 | 59 | def fit(self): 60 | ''' 61 | Method to fit the input data and optimize the weights in the neural network 62 | :return: 63 | ''' 64 | accuracy = [] 65 | no_epochs = [] 66 | sample_no = 0 67 | 68 | if self.optimizer == "adagrad": 69 | # initialize matrix for adagrad 70 | gti_01 = np.zeros(len(self.w01[0])) 71 | gti_12 = np.zeros(len(self.w12[0])) 72 | 73 | for epoch in range(self.epochs): 74 | for i in range(0, len(self.samples), 1): 75 | sample_no += 1 76 | l0 = self.samples[i:i + 1] 77 | y = self.labels[i:i + 1] 78 | 79 | # Feed Forward Pass 80 | l1 = self.relu(np.dot(l0, self.w01) + 1 * self.b1) 81 | l2 = self.softmax(np.dot(l1, self.w12) + 1 * self.b2) 82 | 83 | l2_error = ((1 / 2) * np.power((y - l2), 2)) 84 | l2_error_total = str(np.mean(np.abs(l2_error))) 85 | 86 | if l2_error_total == 1.0: 87 | if self.verbose: print("Overflow") 88 | return 89 | # Backpropagation 90 | # dE_total/douto 91 | l2_delta = (-1 * (y - l2)) 92 | # douto/dneto = deriv activation 93 | l2_delta = l2_delta * self.softmax(l2, deriv=True) 94 | # dneth/dw 95 | l2_delta = np.dot(l2_delta.T, l1) 96 | 97 | # dEo/neto 98 | # dEo/douto * douto/dneto 99 | l1_delta = ((np.sum(((-1 * (y - l2)) * self.softmax(l2, deriv=True)), axis=0))) 100 | # dEo/outh 101 | # dEo/neto * dneto/douth 102 | l1_delta = l1_delta * self.w12 103 | 104 | # dEtotal/outh = Sum(Eo/outh) 105 | l1_delta = np.sum(l1_delta, axis=1) 106 | # douth/neth 107 | l1_delta = l1_delta * self.relu(l1, deriv=True) 108 | # dneth/dw 109 | l1_delta = np.dot(l1_delta.T, l0) 110 | 111 | if self.optimizer == "adagrad": 112 | # Fundamental idea using https://xcorr.net/2014/01/23/adagrad-eliminating-learning-rates-in-stochastic-gradient-descent/ 113 | # Update Weights using AdaGrad 114 | grad_12 = self.eta * l2_delta.T 115 | self.g12 += np.power(grad_12, 2) 116 | adjusted_grad = grad_12 / np.sqrt(0.0000001 + self.g12) 117 | self.w12 = self.w12 - adjusted_grad 118 | 119 | grad_01 = self.eta * l1_delta.T 120 | self.g01 += np.power(grad_01, 2) 121 | adjusted_grad = grad_01 / np.sqrt(0.0000001 + self.g01) 122 | self.w01 = self.w01 - adjusted_grad 123 | 124 | if self.optimizer == "sgd": 125 | # Update Weights 126 | self.w01 -= (self.eta/((epoch+1)/50)) * l1_delta.T 127 | self.w12 -= (self.eta/((epoch+1)/50)) * l2_delta.T 128 | 129 | if self.optimizer == "momentum": 130 | # Update Weights using Momentum 131 | self.v01 = self.my * self.v01 + self.eta * l1_delta.T 132 | self.w01 -= self.v01 133 | self.v12 = self.my * self.v12 + self.eta * l2_delta.T 134 | self.w12 -= self.v12 135 | 136 | if epoch % 1 == 0: 137 | if self.verbose: 138 | y_pred, y_true = self.predict(X_test, y_test) 139 | acc = accuracy_score(y_true, y_pred) 140 | print("Epoch: ", epoch, " - Error: ", l2_error_total, " - Accuracy im Testset: ", acc) 141 | y_pred, y_true = self.predict(X_train, y_train) 142 | print("Epoch: ", epoch, " - Error: ", l2_error_total, " - Accuracy im Trainingsset: ", accuracy_score(y_true, y_pred)) 143 | print("############################################") 144 | 145 | accuracy.append(acc) 146 | no_epochs.append(sample_no) 147 | if self.verbose: 148 | return no_epochs, accuracy 149 | 150 | def predict(self, test_samples, test_labels): 151 | ''' 152 | Predict test data using the fitted model 153 | :param test_samples: 154 | :param test_labels: 155 | :return: 156 | ''' 157 | l1 = self.relu(np.dot(test_samples, self.w01) + 1 * self.b1) 158 | l2 = self.softmax(np.dot(l1, self.w12) + 1 * self.b2) 159 | y_pred = (l2 == l2.max(axis=1)[:, 
None]).astype(float) 160 | res_pred = [] 161 | res_labels = [] 162 | 163 | def checkEqual1(iterator): 164 | iterator = iter(iterator) 165 | try: 166 | first = next(iterator) 167 | except StopIteration: 168 | return True 169 | return all(first == rest for rest in iterator) 170 | 171 | for k in y_pred: 172 | for i, j in enumerate(k): 173 | if int(j) == 1 and not checkEqual1(k): 174 | res_pred.append(i) 175 | break 176 | if checkEqual1(k): 177 | res_pred.append(0) 178 | break 179 | for k in test_labels: 180 | for i, j in enumerate(k): 181 | if j == 1.0: 182 | res_labels.append(i) 183 | 184 | return res_pred, res_labels 185 | 186 | 187 | # Prepare dataset and split into test and training data 188 | ''' 189 | # Digits 190 | digits = load_digits() 191 | samples = digits.data 192 | y = digits.target.reshape((len(samples),1)) 193 | enc = OneHotEncoder() 194 | enc.fit(y) 195 | labels = enc.transform(y).toarray() 196 | X_train, X_test, y_train, y_test = train_test_split(samples, labels, test_size=0.33, random_state=42) 197 | ''' 198 | # MNIST 199 | mnist = fetch_mldata('MNIST original', data_home="./data") 200 | samples = mnist.data 201 | samples = samples/(len(samples)*10) 202 | y = mnist.target.reshape((len(samples), 1)) 203 | 204 | enc = OneHotEncoder() 205 | enc.fit(y) 206 | labels = enc.transform(y).toarray() 207 | 208 | X_train, X_test, y_train, y_test = train_test_split(samples, labels, test_size=0.33, random_state=42) 209 | # Create instance of NeuralNetwork, fit to dataset, predict and print accuracy 210 | 211 | etas = [3.5] 212 | 213 | for eta in etas: 214 | print(eta) 215 | NN = NeuralNetwork(samples=X_train, labels=y_train, eta=eta, epochs=50, size_hidden=40, optimizer="sgd", verbose=True) 216 | fitted = NN.fit() 217 | plt.plot(fitted[0], fitted[1], 'y-', linewidth=2, label='sgd; Eta=3.5') 218 | 219 | NN = NeuralNetwork(samples=X_train, labels=y_train, eta=eta, epochs=50, size_hidden=40, optimizer="momentum", verbose=True) 220 | fitted = NN.fit() 221 | plt.plot(fitted[0], fitted[1], 'b-', linewidth=2, label='momentum; Eta=0.1, My=0.9') 222 | 223 | NN = NeuralNetwork(samples=X_train, labels=y_train, eta=1, epochs=50, size_hidden=40, optimizer="adagrad", verbose=True) 224 | fitted = NN.fit() 225 | plt.plot(fitted[0], fitted[1], 'r-', linewidth=2, label='adagrad; Eta=1') 226 | 227 | plt.xlabel("Samples seen") 228 | plt.ylabel("Accuarcy") 229 | plt.legend(loc='lower right') 230 | fig = plt.gcf() 231 | fig.savefig("eval_nn.png") 232 | 233 | #y_pred, y_true = NN.predict(X_test, y_test) 234 | 235 | 236 | # print("Accuracy: ",accuracy_score(y_true, y_pred)) 237 | 238 | -------------------------------------------------------------------------------- /mlp_plain.py: -------------------------------------------------------------------------------- 1 | # Class of a NeuralNetwork with SGD, Momentum and AdaGrad 2 | import numpy as np 3 | import matplotlib.pyplot as plt # Plotting library 4 | from sklearn.datasets import load_digits 5 | from sklearn.preprocessing import OneHotEncoder 6 | from sklearn.model_selection import train_test_split 7 | from sklearn.metrics import accuracy_score 8 | from sklearn.datasets import fetch_mldata 9 | from algebra_helpers import dot, hadamard, substract, scalar_mult, transpose, summarize, sum_matrix 10 | 11 | class NeuralNetwork: 12 | def __init__(self, samples, labels, size_hidden=30, eta=0.1, my=0.9, epochs=10, optimizer="", verbose=False): 13 | ''' 14 | 15 | :param samples: input samples 16 | :param labels: input labels 17 | :param size_hidden: number of units in 
hidden layer 18 | :param eta: learning rate 19 | :param my: learning factor for momentum 20 | :param epochs: number of epochs 21 | :param optimizer: type of optimizer ("sgd", "momentum", "adagrad") 22 | :param verbose: print accuracy and error per epoch 23 | ''' 24 | self.samples = samples 25 | self.labels = labels 26 | self.w01 = np.random.random((len(self.samples[0]), size_hidden)) 27 | self.w12 = np.random.random((size_hidden, len(self.labels[0]))) 28 | self.v01 = np.zeros((len(self.samples[0]), size_hidden)) 29 | self.v12 = np.zeros((size_hidden, len(self.labels[0]))) 30 | self.g01 = np.zeros((len(self.samples[0]), size_hidden)) 31 | self.g12 = np.zeros((size_hidden, len(self.labels[0]))) 32 | self.b1 = np.array([0]) 33 | self.b2 = np.array([0]) 34 | self.eta = eta 35 | self.epochs = epochs 36 | self.my = my 37 | self.optimizer = optimizer 38 | self.verbose = verbose 39 | 40 | def sigmoid(self, x, deriv=False): 41 | if (deriv == True): 42 | return x * (1 - x) 43 | return 1 / (1 + np.exp(-x)) 44 | 45 | def softmax(self, x, deriv=False): 46 | if deriv == True: 47 | # Return the partial derivation of the activation function 48 | return np.multiply(x, 1 - x) 49 | y = x - np.max(x) 50 | e_x = np.exp(y) 51 | return e_x / e_x.sum() 52 | 53 | 54 | def relu(self, x, deriv=False): 55 | if deriv == True: 56 | return 1. * (x > 0) 57 | return x * (x > 0) 58 | 59 | def fit(self): 60 | ''' 61 | Method to fit the input data and optimize the weights in the neural network 62 | :return: 63 | ''' 64 | accuracy = [] 65 | no_epochs = [] 66 | 67 | if self.optimizer == "adagrad": 68 | # initialize matrix for adagrad 69 | gti_01 = np.zeros(len(self.w01[0])) 70 | gti_12 = np.zeros(len(self.w12[0])) 71 | 72 | for epoch in range(self.epochs): 73 | for i in range(0, len(self.samples), 1): 74 | l0 = self.samples[i:i + 1] 75 | y = self.labels[i:i + 1] 76 | 77 | # Feed Forward Pass 78 | l1 = self.relu(dot(l0, self.w01) + 1 * self.b1) 79 | l2 = self.softmax(dot(l1, self.w12) + 1 * self.b2) 80 | 81 | err = substract(y, l2) 82 | l2_error = (scalar_mult(0.5,hadamard(err,err))) 83 | 84 | l2_error_total = str(np.mean(np.abs(l2_error))) 85 | 86 | 87 | #if l2_error_total == 1.0: 88 | # if self.verbose: print("Overflow") 89 | # return 90 | 91 | # Backpropagation 92 | # dE_total/douto 93 | l2_delta = scalar_mult(-1, err) 94 | # douto/dneto = deriv activation 95 | l2_delta = hadamard(l2_delta, self.softmax(l2, deriv=True)) 96 | # dneth/dw 97 | l2_delta = dot(transpose(l2_delta), l1) 98 | 99 | # dEo/neto 100 | # dEo/douto * douto/dneto 101 | l1_delta = sum_matrix(hadamard(scalar_mult(-1, err), self.softmax(l2, deriv=True)), axis=0) 102 | 103 | # dEo/outh 104 | # dEo/neto * dneto/douth 105 | l1_delta = hadamard(l1_delta, self.w12) 106 | 107 | # dEtotal/outh = Sum(Eo/outh) 108 | l1_delta = sum_matrix(l1_delta, axis=1) 109 | 110 | # douth/neth 111 | l1_delta = hadamard(l1_delta, self.relu(l1, deriv=True)) 112 | 113 | # dneth/dw 114 | l1_delta = dot(transpose(l1_delta), l0) 115 | 116 | if self.optimizer == "adagrad": 117 | # Fundamental idea using https://xcorr.net/2014/01/23/adagrad-eliminating-learning-rates-in-stochastic-gradient-descent/ 118 | # Update Weights using AdaGrad 119 | grad_12 = l2_delta.T 120 | self.g12 += np.power(grad_12, 2) 121 | adjusted_grad = grad_12 / np.sqrt(0.0000001 + self.g12) 122 | self.w12 = self.w12 - adjusted_grad 123 | 124 | grad_01 = l1_delta.T 125 | self.g01 += np.power(grad_01, 2) 126 | adjusted_grad = grad_01 / np.sqrt(0.0000001 + self.g01) 127 | self.w01 = self.w01 - adjusted_grad 128 | 129 | if 
self.optimizer == "sgd": 130 | # Update Weights 131 | self.w01 = substract(self.w01, scalar_mult(self.eta/((epoch+1)/50), transpose(l1_delta))) 132 | self.w12 = substract(self.w12, scalar_mult(self.eta/((epoch+1)/50), transpose(l2_delta))) 133 | 134 | if self.optimizer == "momentum": 135 | # Update Weights using Momentum 136 | self.v01 = self.my * self.v01 + self.eta * l1_delta.T 137 | self.w01 -= self.v01 138 | self.v12 = self.my * self.v12 + self.eta * l2_delta.T 139 | self.w12 -= self.v12 140 | 141 | if epoch % 1 == 0: 142 | if self.verbose: 143 | y_pred, y_true = self.predict(X_test, y_test) 144 | print("Epoch: ", epoch, " - Error: ", l2_error_total, " - Accuracy im Testset: ", accuracy_score(y_true, y_pred)) 145 | y_pred, y_true = self.predict(X_train, y_train) 146 | #print("Epoch: ", epoch, " - Error: ", l2_error_total, " - Accuracy im Trainingsset: ", accuracy_score(y_true, y_pred)) 147 | #print("############################################") 148 | 149 | y_pred, y_true = self.predict(X_test, y_test) 150 | acc = accuracy_score(y_true, y_pred) 151 | accuracy.append(acc) 152 | no_epochs.append(epoch) 153 | 154 | if self.verbose: 155 | return no_epochs, accuracy 156 | 157 | def predict(self, test_samples, test_labels): 158 | ''' 159 | Predict test data using the fitted model 160 | :param test_samples: 161 | :param test_labels: 162 | :return: 163 | ''' 164 | l1 = self.relu(np.dot(test_samples, self.w01) + 1 * self.b1) 165 | l2 = self.softmax(np.dot(l1, self.w12) + 1 * self.b2) 166 | y_pred = (l2 == l2.max(axis=1)[:, None]).astype(float) 167 | res_pred = [] 168 | res_labels = [] 169 | 170 | def checkEqual1(iterator): 171 | iterator = iter(iterator) 172 | try: 173 | first = next(iterator) 174 | except StopIteration: 175 | return True 176 | return all(first == rest for rest in iterator) 177 | 178 | for k in y_pred: 179 | for i, j in enumerate(k): 180 | if int(j) == 1 and not checkEqual1(k): 181 | res_pred.append(i) 182 | break 183 | if checkEqual1(k): 184 | res_pred.append(0) 185 | break 186 | for k in test_labels: 187 | for i, j in enumerate(k): 188 | if j == 1.0: 189 | res_labels.append(i) 190 | 191 | return res_pred, res_labels 192 | 193 | 194 | # Prepare dataset and split into test and training data 195 | 196 | # MNIST 197 | mnist = fetch_mldata('MNIST original', data_home="./data") 198 | samples = mnist.data 199 | samples = samples/(len(samples)*10) 200 | y = mnist.target.reshape((len(samples), 1)) 201 | 202 | enc = OneHotEncoder() 203 | enc.fit(y) 204 | labels = enc.transform(y).toarray() 205 | 206 | X_train, X_test, y_train, y_test = train_test_split(samples, labels, test_size=0.33, random_state=42) 207 | 208 | # Create instance of NeuralNetwork, fit to dataset, predict and print accuracy 209 | 210 | etas = [3.5] 211 | 212 | for eta in etas: 213 | NN = NeuralNetwork(samples=X_train, labels=y_train, eta=eta, epochs=100, size_hidden=5, optimizer="sgd", verbose=True) 214 | NN.fit() 215 | -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | matplotlib==1.5.3 2 | numpy==1.11.3 3 | scikit-learn==0.18.1 4 | scipy==0.18.1 5 | 6 | --------------------------------------------------------------------------------