├── README.md
├── __init__.py
├── algebra_helpers.py
├── data
│   └── mldata
│       └── mnist-original.mat
├── eval_nn.png
├── mlp_np.py
├── mlp_plain.py
└── requirements.txt


/README.md:
--------------------------------------------------------------------------------
 1 | 
 2 | # Multi Layer Perceptron from Scratch Using Python3
 3 | 
 4 | ## Table of contents
 5 | 
 6 | * [Intro](#intro)
 7 | * [Structure and Components](#structure-and-components)
 8 | * [Prerequisites](#prerequisites)
 9 | * [Quickstart](#quickstart)
10 | * [Experiments and Evaluation](#experiments-and-evaluation)
11 | * [To-Dos](#to-dos)
12 | * [Final Thoughts](#final-thoughts-and-future-work)
13 | * [References](#references)
14 | 
15 | 
16 | ## Intro
17 | To better understand the processes inside a multi layer perceptron, this project implements a simple MLP from scratch, using no external machine learning libraries. Algebra and calculus libraries are used only sparingly.
18 | The whole network is written in Python 3.
19 | ## Structure and Components
20 | This project contains three modules:
21 | - mlp_np.py uses NumPy for linear algebra and calculus operations
22 | - mlp_plain.py uses **no** additional libraries in the feed forward and backpropagation process
23 | - algebra_helpers.py contains methods for linear algebra
24 | 
25 | The MLP consists of an input layer, a hidden layer and an output layer. The hidden layer uses a ReLU activation function; sigmoid is available, too. The output layer uses a softmax function to predict a multinomial problem. The input data labels need to be encoded as one-hot vectors. You can find a one-hot vector encoder and decoder for multinomial classes in the code (see the usage sketch in the Quickstart section below).
26 | The MLP has three optimization algorithms implemented:
27 | - Stochastic gradient descent
28 | - Momentum
29 | - AdaGrad
30 | 
31 | All three optimizers are based on calculating gradients per sample.
32 | The scripts contain the following methods:
33 | 
34 | 1. **fit**
35 | 
36 | >Fits the model to the data. In verbose mode, returns the numbers of seen samples as a list and the corresponding accuracy scores as a list.
37 | 2. **predict**
38 | 
39 | >Makes predictions for given dataset samples and labels using the fitted model.
40 | 
41 | The NeuralNetwork class takes the following parameters:
42 | - dataset samples as a list of lists
43 | - dataset labels as a list of lists
44 | - size of the hidden layer (default: 30)
45 | - eta hyperparameter (learning rate, default: 0.1)
46 | - my hyperparameter (momentum factor, default: 0.9)
47 | - number of epochs (default: 10)
48 | - optimizer type (default: SGD)
49 | - show intermediate steps, including accuracy score per epoch (default: False)
50 | 
51 | ## Prerequisites
52 | - Python 3.4+
53 | - scikit-learn for evaluation and data loading
54 | - NumPy
55 | - matplotlib
56 | ## Quickstart
57 | ```
58 | $ virtualenv -p python3 venv
59 | $ source venv/bin/activate
60 | $ pip install -r requirements.txt
61 | ```
62 | Congratulations! You are now ready to train the MLP to recognize handwritten digits.
63 | By default the scripts use the MNIST dataset for fitting. Generally they can run on any dataset fulfilling the sklearn dataset format.
64 | The scripts output the mean error per epoch, the accuracy on the test set and the accuracy on the training set for each of the three optimizers. Finally, they create a graph displaying accuracy versus seen samples (see the evaluation section).
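The NeuralNetwork class itself is self-contained, so it can be wired up to your own data along the same lines as the MNIST demo. Below is a minimal sketch of that flow; `my_samples` and `my_targets` are hypothetical placeholders for your own feature vectors and integer class ids, and in practice you would adapt the code at the bottom of mlp_np.py (or mlp_plain.py), since both scripts already run a MNIST demo at module level.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# my_samples: feature vectors, my_targets: integer class ids (placeholders)
y = np.array(my_targets).reshape(-1, 1)
enc = OneHotEncoder()
labels = enc.fit_transform(y).toarray()   # labels must be one-hot vectors

# the verbose evaluation inside fit() uses the global names X_train/X_test/y_train/y_test
X_train, X_test, y_train, y_test = train_test_split(my_samples, labels, test_size=0.33, random_state=42)

NN = NeuralNetwork(samples=X_train, labels=y_train, size_hidden=30, eta=0.1,
                   epochs=10, optimizer="sgd", verbose=True)
NN.fit()

# predict() decodes the one-hot vectors back to class indices for scoring
y_pred, y_true = NN.predict(X_test, y_test)
print("Accuracy:", accuracy_score(y_true, y_pred))
```
With verbose=True, fit() also returns the number of seen samples and the accuracy per epoch, which is what the evaluation plot shown below is built from.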
65 | To run the MLP with NumPy, just type:
66 | ```
67 | $ python mlp_np.py
68 | ```
69 | Example output:
70 | ```
71 | Epoch: 0 - Error: 4.46429073186e-09 - Accuracy im Testset: 0.944415584416
72 | Epoch: 0 - Error: 4.46429073186e-09 - Accuracy im Trainingsset: 0.947846481876
73 | ############################################
74 | ```
75 | To run the plain MLP, type:
76 | ```
77 | $ python mlp_plain.py
78 | ```
79 | If you want to change hyperparameters and optimizer types, edit lines 212+ in the script.
80 | ```python
81 | etas = [0.5, 0.8, 1, 2]
82 | for eta in etas:
83 |     NN = NeuralNetwork(samples=X_train, labels=y_train, eta=eta, epochs=50, size_hidden=40, optimizer="sgd", verbose=True)
84 |     fitted = NN.fit()
85 | 
86 | ```
87 | ## Experiments and Evaluation
88 | The dataset is split into a training set (46900 samples) and a test set (23100 samples) using the train_test_split method of sklearn.
89 | mlp_np.py is used for the evaluation, as it runs much faster than mlp_plain.py: processing the 46900 training samples takes mlp_np.py 14 seconds and mlp_plain.py 399 seconds.
90 | The model is evaluated after **2,400,000** samples have been seen. The accuracy scores are based on the model's performance on the test set.
91 | 
92 | The hyperparameters for each optimizer are noted in the table. They were chosen after some evaluation test runs; a deeper hyperparameter optimization has not been done so far (see To-Dos):
93 | 
94 | | Optimizer | Best accuracy score | Mean total error |
95 | | ------------- | ------------- | ------------- |
96 | | SGD *Eta=3.5* | 97.1% | 2.97e-12 |
97 | | Momentum *Eta=0.1, My=0.9* | 96.5% | 1.97e-09 |
98 | | AdaGrad *Eta=1* | 96.3% | 1.22e-08 |
99 | 
100 | ![Evaluation Curve - Accuracy vs Samples seen](https://github.com/MaviccPRP/mlp_from_scratch/blob/master/eval_nn.png)
101 | ## To-Dos
102 | - [X] Implement per-sample SGD
103 | - [ ] Implement mini-batch training
104 | - [ ] Implement AdaGrad with diagonal matrices
105 | - [ ] Grid search for optimizing hyperparameters
106 | - [ ] Compare evaluation performance on the test set and the training set
107 | 
108 | ## Final Thoughts and Future Work
109 | 
110 | - The best results are reached using the per-sample SGD algorithm.
111 | - Momentum oscillates once higher accuracy scores are reached. This could be due to the per-sample approach.
112 | - The AdaGrad algorithm is smoother than Momentum and SGD, but does not reach the accuracy scores of SGD or Momentum.
113 | - In future work, implementing mini-batches could help improve the accuracy and smoothness of Momentum and AdaGrad (a rough sketch is given after this list).
114 | - The hyperparameters are still not fully evaluated; this could be done using grid search.
115 | - Additionally, the AdaGrad algorithm could be updated with a diagonal matrix approach.
116 | - Compare accuracy on the test and training sets to rule out overfitting.
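As a rough, hypothetical sketch of the mini-batch idea from the list above (not part of the current scripts): instead of updating the weights after every single sample, the per-sample gradients of a small batch would be averaged and applied in one update. `grad_fn` below is a placeholder for the existing backpropagation step that returns the two weight gradients.

```python
import numpy as np

def fit_minibatch(samples, labels, w01, w12, grad_fn, eta=0.1, batch_size=32, epochs=10):
    """Hypothetical mini-batch SGD: average per-sample gradients over a batch,
    then apply a single weight update per batch."""
    n = len(samples)
    for epoch in range(epochs):
        order = np.random.permutation(n)          # reshuffle samples every epoch
        for start in range(0, n, batch_size):
            batch = order[start:start + batch_size]
            g01 = np.zeros_like(w01)
            g12 = np.zeros_like(w12)
            for i in batch:                       # accumulate the gradients of the batch
                d01, d12 = grad_fn(samples[i:i + 1], labels[i:i + 1], w01, w12)
                g01 += d01
                g12 += d12
            w01 -= eta * g01 / len(batch)         # one averaged update per batch
            w12 -= eta * g12 / len(batch)
    return w01, w12
```
The averaging should smooth out the noisy per-sample updates that are likely behind the Momentum oscillation observed above.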
117 | ## References 118 | - Principles of training multi-layer neural network using backpropagation, http://galaxy.agh.edu.pl/~vlsi/AI/backp_t_en/backprop.html 119 | - A Step by Step Backpropagation Example, https://mattmazur.com/2015/03/17/a-step-by-step-backpropagation-example/ 120 | - Julia Kreutzer, Julian Hitschler, Neural Networks: Architectures and Applications for NLP, http://www.cl.uni-heidelberg.de/courses/ws16/neuralnetworks/slides/session02.pdf 121 | - Julia Kreutzer, Julian Hitschler, Neural Networks: Architectures and Applications for NLP, http://www.cl.uni-heidelberg.de/courses/ws16/neuralnetworks/slides/session03.pdf 122 | - A Neural Network in 13 lines of Python (Part 2 - Gradient Descent), http://iamtrask.github.io/2015/07/27/python-network-part2/ 123 | - How the backpropagation algorithm works, http://neuralnetworksanddeeplearning.com/chap2.html 124 | - Paul Fackler, Notes on Matrix Calculus, http://www4.ncsu.edu/%7Epfackler/MatCalc.pdf 125 | -------------------------------------------------------------------------------- /__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/MaviccPRP/mlp_from_scratch/7032036147a2ad29e3fd3226f99e72eac76b0add/__init__.py -------------------------------------------------------------------------------- /algebra_helpers.py: -------------------------------------------------------------------------------- 1 | # Some helper functions for linear algebra 2 | 3 | # scalar multiplication 4 | def scalar_mult(k, X): 5 | res = [] 6 | for i in X: 7 | row = [] 8 | for j in i: 9 | row.append(k * j) 10 | res.append(row) 11 | return res 12 | 13 | 14 | 15 | # matrix mutiplication 16 | def dot(X,Y): 17 | if len(X[0]) != len(Y): 18 | return "Matrices do not match for multiplication" 19 | return [[sum(a*b for a,b in zip(X_row,Y_col)) for Y_col in zip(*Y)] for X_row in X] 20 | 21 | # hadamard product 22 | def hadamard(X,Y): 23 | res = [] 24 | if not any(isinstance(el, list) for el in X): 25 | for i in Y: 26 | row = [] 27 | for d, j in enumerate(X): 28 | row.append(i[d] * j) 29 | res.append(row) 30 | else: 31 | for i, j in zip(X,Y): 32 | row = [] 33 | if isinstance(i, list): 34 | for k, l in zip(i,j): 35 | row.append(k*l) 36 | res.append(row) 37 | 38 | return res 39 | 40 | 41 | # element wise summing up of two matrices 42 | def summarize(X,Y): 43 | res = [] 44 | for i, j in zip(X, Y): 45 | row = [] 46 | for k, l in zip(i, j): 47 | row.append(k + l) 48 | res.append(row) 49 | return res 50 | 51 | def substract(X,Y): 52 | res = [] 53 | for i, j in zip(X, Y): 54 | row = [] 55 | for k, l in zip(i, j): 56 | row.append(k - l) 57 | res.append(row) 58 | return res 59 | 60 | def transpose(X): 61 | return list(zip(*X)) 62 | 63 | 64 | def sum_matrix(X, axis=0): 65 | if axis == 0: 66 | return [sum(row[i] for row in X) for i in range(len(X[0]))] 67 | if axis == 1: 68 | rows = len(X) 69 | cols = len(X[0]) 70 | total = [] 71 | for x in range(0, rows): 72 | rowtotal = 0 73 | for y in range(0, cols): 74 | rowtotal = rowtotal + X[x][y] 75 | total.append(rowtotal) 76 | return total 77 | 78 | -------------------------------------------------------------------------------- /data/mldata/mnist-original.mat: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/MaviccPRP/mlp_from_scratch/7032036147a2ad29e3fd3226f99e72eac76b0add/data/mldata/mnist-original.mat -------------------------------------------------------------------------------- /eval_nn.png: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/MaviccPRP/mlp_from_scratch/7032036147a2ad29e3fd3226f99e72eac76b0add/eval_nn.png -------------------------------------------------------------------------------- /mlp_np.py: -------------------------------------------------------------------------------- 1 | # Class of a NeuralNetwork with SGD, Momentum and AdaGrad 2 | import numpy as np 3 | import matplotlib.pyplot as plt # Plotting library 4 | plt.matplotlib.use('Agg') 5 | from sklearn.datasets import load_digits 6 | from sklearn.preprocessing import OneHotEncoder 7 | from sklearn.model_selection import train_test_split 8 | from sklearn.metrics import accuracy_score 9 | from sklearn.datasets import fetch_mldata 10 | 11 | class NeuralNetwork: 12 | def __init__(self, samples, labels, size_hidden=30, eta=0.1, my=0.9, epochs=10, optimizer="sgd", verbose=False): 13 | ''' 14 | 15 | :param samples: input samples 16 | :param labels: input labels 17 | :param size_hidden: number of units in hidden layer 18 | :param eta: learning rate 19 | :param my: learning factor for momentum 20 | :param epochs: number of epochs 21 | :param optimizer: type of optimizer ("sgd", "momentum", "adagrad") 22 | :param verbose: print accuracy and error per epoch 23 | ''' 24 | self.samples = samples 25 | self.labels = labels 26 | self.w01 = np.random.random((len(self.samples[0]), size_hidden)) 27 | self.w12 = np.random.random((size_hidden, len(self.labels[0]))) 28 | self.v01 = np.zeros((len(self.samples[0]), size_hidden)) 29 | self.v12 = np.zeros((size_hidden, len(self.labels[0]))) 30 | self.g01 = np.zeros((len(self.samples[0]), size_hidden)) 31 | self.g12 = np.zeros((size_hidden, len(self.labels[0]))) 32 | self.b1 = np.array([0]) 33 | self.b2 = np.array([0]) 34 | self.eta = eta 35 | self.epochs = epochs 36 | self.my = my 37 | self.optimizer = optimizer 38 | self.verbose = verbose 39 | 40 | def sigmoid(self, x, deriv=False): 41 | if (deriv == True): 42 | return x * (1 - x) 43 | return 1 / (1 + np.exp(-x)) 44 | 45 | def softmax(self, x, deriv=False): 46 | if deriv == True: 47 | # Return the partial derivation of the activation function 48 | return np.multiply(x, 1 - x) 49 | y = x - np.max(x) 50 | e_x = np.exp(y) 51 | return e_x / e_x.sum() 52 | 53 | 54 | def relu(self, x, deriv=False): 55 | if deriv == True: 56 | return 1. 
* (x > 0) 57 | return x * (x > 0) 58 | 59 | def fit(self): 60 | ''' 61 | Method to fit the input data and optimize the weights in the neural network 62 | :return: 63 | ''' 64 | accuracy = [] 65 | no_epochs = [] 66 | sample_no = 0 67 | 68 | if self.optimizer == "adagrad": 69 | # initialize matrix for adagrad 70 | gti_01 = np.zeros(len(self.w01[0])) 71 | gti_12 = np.zeros(len(self.w12[0])) 72 | 73 | for epoch in range(self.epochs): 74 | for i in range(0, len(self.samples), 1): 75 | sample_no += 1 76 | l0 = self.samples[i:i + 1] 77 | y = self.labels[i:i + 1] 78 | 79 | # Feed Forward Pass 80 | l1 = self.relu(np.dot(l0, self.w01) + 1 * self.b1) 81 | l2 = self.softmax(np.dot(l1, self.w12) + 1 * self.b2) 82 | 83 | l2_error = ((1 / 2) * np.power((y - l2), 2)) 84 | l2_error_total = str(np.mean(np.abs(l2_error))) 85 | 86 | if l2_error_total == 1.0: 87 | if self.verbose: print("Overflow") 88 | return 89 | # Backpropagation 90 | # dE_total/douto 91 | l2_delta = (-1 * (y - l2)) 92 | # douto/dneto = deriv activation 93 | l2_delta = l2_delta * self.softmax(l2, deriv=True) 94 | # dneth/dw 95 | l2_delta = np.dot(l2_delta.T, l1) 96 | 97 | # dEo/neto 98 | # dEo/douto * douto/dneto 99 | l1_delta = ((np.sum(((-1 * (y - l2)) * self.softmax(l2, deriv=True)), axis=0))) 100 | # dEo/outh 101 | # dEo/neto * dneto/douth 102 | l1_delta = l1_delta * self.w12 103 | 104 | # dEtotal/outh = Sum(Eo/outh) 105 | l1_delta = np.sum(l1_delta, axis=1) 106 | # douth/neth 107 | l1_delta = l1_delta * self.relu(l1, deriv=True) 108 | # dneth/dw 109 | l1_delta = np.dot(l1_delta.T, l0) 110 | 111 | if self.optimizer == "adagrad": 112 | # Fundamental idea using https://xcorr.net/2014/01/23/adagrad-eliminating-learning-rates-in-stochastic-gradient-descent/ 113 | # Update Weights using AdaGrad 114 | grad_12 = self.eta * l2_delta.T 115 | self.g12 += np.power(grad_12, 2) 116 | adjusted_grad = grad_12 / np.sqrt(0.0000001 + self.g12) 117 | self.w12 = self.w12 - adjusted_grad 118 | 119 | grad_01 = self.eta * l1_delta.T 120 | self.g01 += np.power(grad_01, 2) 121 | adjusted_grad = grad_01 / np.sqrt(0.0000001 + self.g01) 122 | self.w01 = self.w01 - adjusted_grad 123 | 124 | if self.optimizer == "sgd": 125 | # Update Weights 126 | self.w01 -= (self.eta/((epoch+1)/50)) * l1_delta.T 127 | self.w12 -= (self.eta/((epoch+1)/50)) * l2_delta.T 128 | 129 | if self.optimizer == "momentum": 130 | # Update Weights using Momentum 131 | self.v01 = self.my * self.v01 + self.eta * l1_delta.T 132 | self.w01 -= self.v01 133 | self.v12 = self.my * self.v12 + self.eta * l2_delta.T 134 | self.w12 -= self.v12 135 | 136 | if epoch % 1 == 0: 137 | if self.verbose: 138 | y_pred, y_true = self.predict(X_test, y_test) 139 | acc = accuracy_score(y_true, y_pred) 140 | print("Epoch: ", epoch, " - Error: ", l2_error_total, " - Accuracy im Testset: ", acc) 141 | y_pred, y_true = self.predict(X_train, y_train) 142 | print("Epoch: ", epoch, " - Error: ", l2_error_total, " - Accuracy im Trainingsset: ", accuracy_score(y_true, y_pred)) 143 | print("############################################") 144 | 145 | accuracy.append(acc) 146 | no_epochs.append(sample_no) 147 | if self.verbose: 148 | return no_epochs, accuracy 149 | 150 | def predict(self, test_samples, test_labels): 151 | ''' 152 | Predict test data using the fitted model 153 | :param test_samples: 154 | :param test_labels: 155 | :return: 156 | ''' 157 | l1 = self.relu(np.dot(test_samples, self.w01) + 1 * self.b1) 158 | l2 = self.softmax(np.dot(l1, self.w12) + 1 * self.b2) 159 | y_pred = (l2 == l2.max(axis=1)[:, 
None]).astype(float) 160 | res_pred = [] 161 | res_labels = [] 162 | 163 | def checkEqual1(iterator): 164 | iterator = iter(iterator) 165 | try: 166 | first = next(iterator) 167 | except StopIteration: 168 | return True 169 | return all(first == rest for rest in iterator) 170 | 171 | for k in y_pred: 172 | for i, j in enumerate(k): 173 | if int(j) == 1 and not checkEqual1(k): 174 | res_pred.append(i) 175 | break 176 | if checkEqual1(k): 177 | res_pred.append(0) 178 | break 179 | for k in test_labels: 180 | for i, j in enumerate(k): 181 | if j == 1.0: 182 | res_labels.append(i) 183 | 184 | return res_pred, res_labels 185 | 186 | 187 | # Prepare dataset and split into test and training data 188 | ''' 189 | # Digits 190 | digits = load_digits() 191 | samples = digits.data 192 | y = digits.target.reshape((len(samples),1)) 193 | enc = OneHotEncoder() 194 | enc.fit(y) 195 | labels = enc.transform(y).toarray() 196 | X_train, X_test, y_train, y_test = train_test_split(samples, labels, test_size=0.33, random_state=42) 197 | ''' 198 | # MNIST 199 | mnist = fetch_mldata('MNIST original', data_home="./data") 200 | samples = mnist.data 201 | samples = samples/(len(samples)*10) 202 | y = mnist.target.reshape((len(samples), 1)) 203 | 204 | enc = OneHotEncoder() 205 | enc.fit(y) 206 | labels = enc.transform(y).toarray() 207 | 208 | X_train, X_test, y_train, y_test = train_test_split(samples, labels, test_size=0.33, random_state=42) 209 | # Create instance of NeuralNetwork, fit to dataset, predict and print accuracy 210 | 211 | etas = [3.5] 212 | 213 | for eta in etas: 214 | print(eta) 215 | NN = NeuralNetwork(samples=X_train, labels=y_train, eta=eta, epochs=50, size_hidden=40, optimizer="sgd", verbose=True) 216 | fitted = NN.fit() 217 | plt.plot(fitted[0], fitted[1], 'y-', linewidth=2, label='sgd; Eta=3.5') 218 | 219 | NN = NeuralNetwork(samples=X_train, labels=y_train, eta=eta, epochs=50, size_hidden=40, optimizer="momentum", verbose=True) 220 | fitted = NN.fit() 221 | plt.plot(fitted[0], fitted[1], 'b-', linewidth=2, label='momentum; Eta=0.1, My=0.9') 222 | 223 | NN = NeuralNetwork(samples=X_train, labels=y_train, eta=1, epochs=50, size_hidden=40, optimizer="adagrad", verbose=True) 224 | fitted = NN.fit() 225 | plt.plot(fitted[0], fitted[1], 'r-', linewidth=2, label='adagrad; Eta=1') 226 | 227 | plt.xlabel("Samples seen") 228 | plt.ylabel("Accuarcy") 229 | plt.legend(loc='lower right') 230 | fig = plt.gcf() 231 | fig.savefig("eval_nn.png") 232 | 233 | #y_pred, y_true = NN.predict(X_test, y_test) 234 | 235 | 236 | # print("Accuracy: ",accuracy_score(y_true, y_pred)) 237 | 238 | -------------------------------------------------------------------------------- /mlp_plain.py: -------------------------------------------------------------------------------- 1 | # Class of a NeuralNetwork with SGD, Momentum and AdaGrad 2 | import numpy as np 3 | import matplotlib.pyplot as plt # Plotting library 4 | from sklearn.datasets import load_digits 5 | from sklearn.preprocessing import OneHotEncoder 6 | from sklearn.model_selection import train_test_split 7 | from sklearn.metrics import accuracy_score 8 | from sklearn.datasets import fetch_mldata 9 | from algebra_helpers import dot, hadamard, substract, scalar_mult, transpose, summarize, sum_matrix 10 | 11 | class NeuralNetwork: 12 | def __init__(self, samples, labels, size_hidden=30, eta=0.1, my=0.9, epochs=10, optimizer="", verbose=False): 13 | ''' 14 | 15 | :param samples: input samples 16 | :param labels: input labels 17 | :param size_hidden: number of units in 
hidden layer 18 | :param eta: learning rate 19 | :param my: learning factor for momentum 20 | :param epochs: number of epochs 21 | :param optimizer: type of optimizer ("sgd", "momentum", "adagrad") 22 | :param verbose: print accuracy and error per epoch 23 | ''' 24 | self.samples = samples 25 | self.labels = labels 26 | self.w01 = np.random.random((len(self.samples[0]), size_hidden)) 27 | self.w12 = np.random.random((size_hidden, len(self.labels[0]))) 28 | self.v01 = np.zeros((len(self.samples[0]), size_hidden)) 29 | self.v12 = np.zeros((size_hidden, len(self.labels[0]))) 30 | self.g01 = np.zeros((len(self.samples[0]), size_hidden)) 31 | self.g12 = np.zeros((size_hidden, len(self.labels[0]))) 32 | self.b1 = np.array([0]) 33 | self.b2 = np.array([0]) 34 | self.eta = eta 35 | self.epochs = epochs 36 | self.my = my 37 | self.optimizer = optimizer 38 | self.verbose = verbose 39 | 40 | def sigmoid(self, x, deriv=False): 41 | if (deriv == True): 42 | return x * (1 - x) 43 | return 1 / (1 + np.exp(-x)) 44 | 45 | def softmax(self, x, deriv=False): 46 | if deriv == True: 47 | # Return the partial derivation of the activation function 48 | return np.multiply(x, 1 - x) 49 | y = x - np.max(x) 50 | e_x = np.exp(y) 51 | return e_x / e_x.sum() 52 | 53 | 54 | def relu(self, x, deriv=False): 55 | if deriv == True: 56 | return 1. * (x > 0) 57 | return x * (x > 0) 58 | 59 | def fit(self): 60 | ''' 61 | Method to fit the input data and optimize the weights in the neural network 62 | :return: 63 | ''' 64 | accuracy = [] 65 | no_epochs = [] 66 | 67 | if self.optimizer == "adagrad": 68 | # initialize matrix for adagrad 69 | gti_01 = np.zeros(len(self.w01[0])) 70 | gti_12 = np.zeros(len(self.w12[0])) 71 | 72 | for epoch in range(self.epochs): 73 | for i in range(0, len(self.samples), 1): 74 | l0 = self.samples[i:i + 1] 75 | y = self.labels[i:i + 1] 76 | 77 | # Feed Forward Pass 78 | l1 = self.relu(dot(l0, self.w01) + 1 * self.b1) 79 | l2 = self.softmax(dot(l1, self.w12) + 1 * self.b2) 80 | 81 | err = substract(y, l2) 82 | l2_error = (scalar_mult(0.5,hadamard(err,err))) 83 | 84 | l2_error_total = str(np.mean(np.abs(l2_error))) 85 | 86 | 87 | #if l2_error_total == 1.0: 88 | # if self.verbose: print("Overflow") 89 | # return 90 | 91 | # Backpropagation 92 | # dE_total/douto 93 | l2_delta = scalar_mult(-1, err) 94 | # douto/dneto = deriv activation 95 | l2_delta = hadamard(l2_delta, self.softmax(l2, deriv=True)) 96 | # dneth/dw 97 | l2_delta = dot(transpose(l2_delta), l1) 98 | 99 | # dEo/neto 100 | # dEo/douto * douto/dneto 101 | l1_delta = sum_matrix(hadamard(scalar_mult(-1, err), self.softmax(l2, deriv=True)), axis=0) 102 | 103 | # dEo/outh 104 | # dEo/neto * dneto/douth 105 | l1_delta = hadamard(l1_delta, self.w12) 106 | 107 | # dEtotal/outh = Sum(Eo/outh) 108 | l1_delta = sum_matrix(l1_delta, axis=1) 109 | 110 | # douth/neth 111 | l1_delta = hadamard(l1_delta, self.relu(l1, deriv=True)) 112 | 113 | # dneth/dw 114 | l1_delta = dot(transpose(l1_delta), l0) 115 | 116 | if self.optimizer == "adagrad": 117 | # Fundamental idea using https://xcorr.net/2014/01/23/adagrad-eliminating-learning-rates-in-stochastic-gradient-descent/ 118 | # Update Weights using AdaGrad 119 | grad_12 = l2_delta.T 120 | self.g12 += np.power(grad_12, 2) 121 | adjusted_grad = grad_12 / np.sqrt(0.0000001 + self.g12) 122 | self.w12 = self.w12 - adjusted_grad 123 | 124 | grad_01 = l1_delta.T 125 | self.g01 += np.power(grad_01, 2) 126 | adjusted_grad = grad_01 / np.sqrt(0.0000001 + self.g01) 127 | self.w01 = self.w01 - adjusted_grad 128 | 129 | if 
self.optimizer == "sgd": 130 | # Update Weights 131 | self.w01 = substract(self.w01, scalar_mult(self.eta/((epoch+1)/50), transpose(l1_delta))) 132 | self.w12 = substract(self.w12, scalar_mult(self.eta/((epoch+1)/50), transpose(l2_delta))) 133 | 134 | if self.optimizer == "momentum": 135 | # Update Weights using Momentum 136 | self.v01 = self.my * self.v01 + self.eta * l1_delta.T 137 | self.w01 -= self.v01 138 | self.v12 = self.my * self.v12 + self.eta * l2_delta.T 139 | self.w12 -= self.v12 140 | 141 | if epoch % 1 == 0: 142 | if self.verbose: 143 | y_pred, y_true = self.predict(X_test, y_test) 144 | print("Epoch: ", epoch, " - Error: ", l2_error_total, " - Accuracy im Testset: ", accuracy_score(y_true, y_pred)) 145 | y_pred, y_true = self.predict(X_train, y_train) 146 | #print("Epoch: ", epoch, " - Error: ", l2_error_total, " - Accuracy im Trainingsset: ", accuracy_score(y_true, y_pred)) 147 | #print("############################################") 148 | 149 | y_pred, y_true = self.predict(X_test, y_test) 150 | acc = accuracy_score(y_true, y_pred) 151 | accuracy.append(acc) 152 | no_epochs.append(epoch) 153 | 154 | if self.verbose: 155 | return no_epochs, accuracy 156 | 157 | def predict(self, test_samples, test_labels): 158 | ''' 159 | Predict test data using the fitted model 160 | :param test_samples: 161 | :param test_labels: 162 | :return: 163 | ''' 164 | l1 = self.relu(np.dot(test_samples, self.w01) + 1 * self.b1) 165 | l2 = self.softmax(np.dot(l1, self.w12) + 1 * self.b2) 166 | y_pred = (l2 == l2.max(axis=1)[:, None]).astype(float) 167 | res_pred = [] 168 | res_labels = [] 169 | 170 | def checkEqual1(iterator): 171 | iterator = iter(iterator) 172 | try: 173 | first = next(iterator) 174 | except StopIteration: 175 | return True 176 | return all(first == rest for rest in iterator) 177 | 178 | for k in y_pred: 179 | for i, j in enumerate(k): 180 | if int(j) == 1 and not checkEqual1(k): 181 | res_pred.append(i) 182 | break 183 | if checkEqual1(k): 184 | res_pred.append(0) 185 | break 186 | for k in test_labels: 187 | for i, j in enumerate(k): 188 | if j == 1.0: 189 | res_labels.append(i) 190 | 191 | return res_pred, res_labels 192 | 193 | 194 | # Prepare dataset and split into test and training data 195 | 196 | # MNIST 197 | mnist = fetch_mldata('MNIST original', data_home="./data") 198 | samples = mnist.data 199 | samples = samples/(len(samples)*10) 200 | y = mnist.target.reshape((len(samples), 1)) 201 | 202 | enc = OneHotEncoder() 203 | enc.fit(y) 204 | labels = enc.transform(y).toarray() 205 | 206 | X_train, X_test, y_train, y_test = train_test_split(samples, labels, test_size=0.33, random_state=42) 207 | 208 | # Create instance of NeuralNetwork, fit to dataset, predict and print accuracy 209 | 210 | etas = [3.5] 211 | 212 | for eta in etas: 213 | NN = NeuralNetwork(samples=X_train, labels=y_train, eta=eta, epochs=100, size_hidden=5, optimizer="sgd", verbose=True) 214 | NN.fit() 215 | -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | matplotlib==1.5.3 2 | numpy==1.11.3 3 | scikit-learn==0.18.1 4 | scipy==0.18.1 5 | 6 | --------------------------------------------------------------------------------