├── Neural_Network_from_scratch_with_Numpy.ipynb
├── images
│   ├── backprop.gif
│   ├── decision_boundary.png
│   └── loss_acc.png
├── nn.py
└── readme.md

/images/backprop.gif:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ahmedbesbes/Neural-Network-from-scratch/493be2f4015d345fc68d3addd518c2b127e8c648/images/backprop.gif
--------------------------------------------------------------------------------
/images/decision_boundary.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ahmedbesbes/Neural-Network-from-scratch/493be2f4015d345fc68d3addd518c2b127e8c648/images/decision_boundary.png
--------------------------------------------------------------------------------
/images/loss_acc.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ahmedbesbes/Neural-Network-from-scratch/493be2f4015d345fc68d3addd518c2b127e8c648/images/loss_acc.png
--------------------------------------------------------------------------------
/nn.py:
--------------------------------------------------------------------------------
# -*- coding: utf-8 -*-
# Author: Ahmed BESBES

# matplotlib for plotting
import matplotlib
matplotlib.rcParams['figure.figsize'] = (10.0, 10.0)
from matplotlib import pyplot as plt

# numpy for vector and matrix manipulations
import numpy as np

# we won't use scikit-learn per se, but we'll use a few of its helper functions:
# accuracy_score, shuffle, and train_test_split
from sklearn.metrics import accuracy_score
from sklearn.utils import shuffle
from sklearn.model_selection import train_test_split

# tqdm is a progress bar; make sure it's installed: pip install tqdm
from tqdm import tqdm
from IPython import display


def activation(z, derivative=False):
    """
    Sigmoid activation function.
    It handles two modes: normal and derivative mode.
    Applies a pointwise operation on vectors.

    Parameters:
    ---
    z: pre-activation vector at layer l
        shape (n[l], batch_size)

    Returns:
    pointwise activation on each element of the input z
    """
    if derivative:
        return activation(z) * (1 - activation(z))
    else:
        return 1 / (1 + np.exp(-z))
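
# The readme suggests making the activation function configurable (ReLU, tanh, ...).
# Below is a minimal, hypothetical sketch of a tanh variant that keeps the same
# normal/derivative interface as `activation`; the NeuralNetwork class does not
# use it as-is.
def tanh_activation(z, derivative=False):
    """Tanh activation function, with the same two modes as `activation`."""
    if derivative:
        return 1 - np.tanh(z) ** 2
    else:
        return np.tanh(z)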

def cost_function(y_true, y_pred):
    """
    Computes the Mean Squared Error between a ground-truth vector and a prediction vector
    Parameters:
    ---
    y_true: ground-truth vector
    y_pred: prediction vector
    Returns:
    ---
    cost: a scalar value representing the loss
    """
    n = y_pred.shape[1]
    cost = (1./(2*n)) * np.sum((y_true - y_pred) ** 2)
    return cost


def cost_function_prime(y_true, y_pred):
    """
    Computes the derivative of the loss function w.r.t the activation of the output layer
    Parameters:
    ---
    y_true: ground-truth vector
    y_pred: prediction vector
    Returns:
    ---
    cost_prime: derivative of the loss w.r.t. the activation of the output
        shape: (n[L], batch_size)
    """
    cost_prime = y_pred - y_true
    return cost_prime
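
# The readme suggests swapping the MSE loss for binary cross-entropy on
# classification problems. Below is a minimal, hypothetical sketch with the same
# interface as cost_function / cost_function_prime; it is not wired into the
# NeuralNetwork class.
def bce_cost(y_true, y_pred, eps=1e-8):
    """Binary cross-entropy averaged over the batch."""
    n = y_pred.shape[1]
    y_pred = np.clip(y_pred, eps, 1 - eps)  # avoid log(0)
    return -(1./n) * np.sum(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))


def bce_cost_prime(y_true, y_pred, eps=1e-8):
    """Derivative of the binary cross-entropy w.r.t. the output activation."""
    y_pred = np.clip(y_pred, eps, 1 - eps)
    # multiplied by the sigmoid derivative in compute_deltas, this reduces to (y_pred - y_true)
    return (y_pred - y_true) / (y_pred * (1 - y_pred))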


class NeuralNetwork(object):
    '''
    This is a custom neural network package built from scratch with numpy.
    It allows training with SGD, inference, and live plotting of the decision boundary.
    This code is not optimized and should not be used on real-world problems.
    It's written for educational purposes only.

    The neural network, its parameters, and its training procedure all
    reside in this class.

    Parameters
    ---
    size: list of the number of neurons per layer

    Examples
    ---
    >>> from nn import NeuralNetwork
    >>> nn = NeuralNetwork([2, 3, 4, 1])

    This means:
    1 input layer with 2 neurons
    1 hidden layer with 3 neurons
    1 hidden layer with 4 neurons
    1 output layer with 1 neuron

    '''

    def __init__(self, size, seed=42):
        '''
        Instantiate the weights and biases of the network.
        weights and biases are attributes of the NeuralNetwork class.
        They are updated during training.
        '''
        self.seed = seed
        np.random.seed(self.seed)
        self.size = size
        self.weights = [np.random.randn(self.size[i], self.size[i-1]) * np.sqrt(1 / self.size[i-1]) for i in range(1, len(self.size))]
        self.biases = [np.random.rand(n, 1) for n in self.size[1:]]

    def forward(self, input):
        '''
        Perform a feed-forward computation.

        Parameters
        ---
        input: data to be fed to the network
            shape: (input_shape, batch_size)

        Returns
        ---
        a: output activation (output_shape, batch_size)
        pre_activations: list of pre-activations per layer
            each of shape (n[l], batch_size), where n[l] is the number
            of neurons at layer l
        activations: list of activations per layer
            each of shape (n[l], batch_size), where n[l] is the number
            of neurons at layer l
        '''
        a = input
        pre_activations = []
        activations = [a]
        for w, b in zip(self.weights, self.biases):
            z = np.dot(w, a) + b
            a = activation(z)
            pre_activations.append(z)
            activations.append(a)
        return a, pre_activations, activations

    def compute_deltas(self, pre_activations, y_true, y_pred):
        """
        Computes a list containing the values of delta for each layer using
        a recursion.
        Parameters:
        ---
        pre_activations: list of pre-activations, each corresponding to a layer
        y_true: ground truth values of the labels
        y_pred: predicted values of the labels
        Returns:
        ---
        deltas: a list of deltas per layer
        """
        delta_L = cost_function_prime(y_true, y_pred) * activation(pre_activations[-1], derivative=True)
        deltas = [0] * (len(self.size) - 1)
        deltas[-1] = delta_L
        for l in range(len(deltas) - 2, -1, -1):
            delta = np.dot(self.weights[l + 1].transpose(), deltas[l + 1]) * activation(pre_activations[l], derivative=True)
            deltas[l] = delta
        return deltas
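
    # compute_deltas (above) and backpropagate (below) implement the textbook
    # backpropagation equations, with sigma the activation function:
    #   delta[L] = dC/da[L] * sigma'(z[L])                     (output layer)
    #   delta[l] = (W[l+1].T @ delta[l+1]) * sigma'(z[l])      (hidden layers, backwards)
    #   dC/dW[l] = delta[l] @ a[l-1].T
    #   dC/db[l] = delta[l]
    # Both gradients are summed over the batch here and averaged (divided by
    # batch_size) in train.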

    def backpropagate(self, deltas, pre_activations, activations):
        """
        Applies back-propagation and computes the gradient of the loss
        w.r.t the weights and biases of the network.

        Parameters:
        ---
        deltas: list of deltas computed by compute_deltas
        pre_activations: a list of pre-activations per layer
        activations: a list of activations per layer
        Returns:
        ---
        dW: list of gradients w.r.t. the weight matrices of the network
        db: list of gradients w.r.t. the biases (vectors) of the network
        """
        dW = []
        db = []
        deltas = [0] + deltas
        for l in range(1, len(self.size)):
            dW_l = np.dot(deltas[l], activations[l-1].transpose())
            db_l = deltas[l]
            dW.append(dW_l)
            # sum over the batch; like dW, this is averaged by the division by batch_size in train
            db.append(np.expand_dims(db_l.sum(axis=1), 1))
        return dW, db

    def plot_decision_regions(self, X, y, iteration, train_loss, val_loss, train_acc, val_acc, res=0.01):
        """
        Plots the decision boundary at each iteration (i.e. epoch) in order to inspect the performance
        of the model.

        Parameters:
        ---
        X: the input data
        y: the labels
        iteration: the epoch number
        train_loss: value of the training loss
        val_loss: value of the validation loss
        train_acc: value of the training accuracy
        val_acc: value of the validation accuracy
        res: resolution of the plot
        Returns:
        ---
        None: this function plots the decision boundary
        """
        X, y = X.T, y.T
        x_min, x_max = X[:, 0].min() - 0.5, X[:, 0].max() + 0.5
        y_min, y_max = X[:, 1].min() - 0.5, X[:, 1].max() + 0.5
        xx, yy = np.meshgrid(np.arange(x_min, x_max, res),
                             np.arange(y_min, y_max, res))

        Z = self.predict(np.c_[xx.ravel(), yy.ravel()].T)
        Z = Z.reshape(xx.shape)
        plt.contourf(xx, yy, Z, alpha=0.5)
        plt.xlim(xx.min(), xx.max())
        plt.ylim(yy.min(), yy.max())
        plt.scatter(X[:, 0], X[:, 1], c=y.reshape(-1), alpha=0.2)
        message = 'iteration: {} | train loss: {} | val loss: {} | train acc: {} | val acc: {}'.format(iteration,
                                                                                                       train_loss,
                                                                                                       val_loss,
                                                                                                       train_acc,
                                                                                                       val_acc)
        plt.title(message)

    def train(self, X, y, batch_size, epochs, learning_rate, validation_split=0.2, print_every=10, tqdm_=True, plot_every=None):
        """
        Trains the network using the gradients computed by back-propagation.
        Splits the data into train and validation splits.
        Processes the training data in mini-batches; gradients are accumulated
        over the batches and the parameters are updated once per epoch.

        Parameters:
        ---
        X: input data
        y: input labels
        batch_size: number of data points to process in each batch
        epochs: number of epochs for the training
        learning_rate: value of the learning rate
        validation_split: fraction of the data used for validation
        print_every: log the train/validation loss and accuracy every print_every epochs
        tqdm_: use a tqdm progress bar
        plot_every: plot the decision boundary every plot_every epochs

        Returns:
        ---
        history: dictionary of train and validation metrics per epoch
            train_acc: train accuracy
            test_acc: validation accuracy
            train_loss: train loss
            test_loss: validation loss

        This history is used to plot the performance of the model
        """
        history_train_losses = []
        history_train_accuracies = []
        history_test_losses = []
        history_test_accuracies = []

        x_train, x_test, y_train, y_test = train_test_split(X.T, y.T, test_size=validation_split)
        x_train, x_test, y_train, y_test = x_train.T, x_test.T, y_train.T, y_test.T

        if tqdm_:
            epoch_iterator = tqdm(range(epochs))
        else:
            epoch_iterator = range(epochs)

        for e in epoch_iterator:
            if x_train.shape[1] % batch_size == 0:
                n_batches = x_train.shape[1] // batch_size
            else:
                # keep the smaller final batch instead of dropping samples
                n_batches = x_train.shape[1] // batch_size + 1

            x_train, y_train = shuffle(x_train.T, y_train.T)
            x_train, y_train = x_train.T, y_train.T

            batches_x = [x_train[:, batch_size*i:batch_size*(i+1)] for i in range(0, n_batches)]
            batches_y = [y_train[:, batch_size*i:batch_size*(i+1)] for i in range(0, n_batches)]

            train_losses = []
            train_accuracies = []

            test_losses = []
            test_accuracies = []

            dw_per_epoch = [np.zeros(w.shape) for w in self.weights]
            db_per_epoch = [np.zeros(b.shape) for b in self.biases]

            for batch_x, batch_y in zip(batches_x, batches_y):
                batch_y_pred, pre_activations, activations = self.forward(batch_x)
                deltas = self.compute_deltas(pre_activations, batch_y, batch_y_pred)
                dW, db = self.backpropagate(deltas, pre_activations, activations)
                for i, (dw_i, db_i) in enumerate(zip(dW, db)):
                    dw_per_epoch[i] += dw_i / batch_size
                    db_per_epoch[i] += db_i / batch_size

                batch_y_train_pred = self.predict(batch_x)

                train_loss = cost_function(batch_y, batch_y_train_pred)
                train_losses.append(train_loss)
                train_accuracy = accuracy_score(batch_y.T, batch_y_train_pred.T)
                train_accuracies.append(train_accuracy)

                batch_y_test_pred = self.predict(x_test)

                test_loss = cost_function(y_test, batch_y_test_pred)
                test_losses.append(test_loss)
                test_accuracy = accuracy_score(y_test.T, batch_y_test_pred.T)
                test_accuracies.append(test_accuracy)

            # weight update
            for i, (dw_epoch, db_epoch) in enumerate(zip(dw_per_epoch, db_per_epoch)):
                self.weights[i] = self.weights[i] - learning_rate * dw_epoch
                self.biases[i] = self.biases[i] - learning_rate * db_epoch

            history_train_losses.append(np.mean(train_losses))
            history_train_accuracies.append(np.mean(train_accuracies))

            history_test_losses.append(np.mean(test_losses))
            history_test_accuracies.append(np.mean(test_accuracies))

            if not plot_every:
                if e % print_every == 0:
                    print('Epoch {} / {} | train loss: {} | train accuracy: {} | val loss : {} | val accuracy : {} '.format(
                        e, epochs, np.round(np.mean(train_losses), 3), np.round(np.mean(train_accuracies), 3),
                        np.round(np.mean(test_losses), 3), np.round(np.mean(test_accuracies), 3)))
            else:
                if e % plot_every == 0:
                    self.plot_decision_regions(x_train, y_train, e,
                                               np.round(np.mean(train_losses), 4),
                                               np.round(np.mean(test_losses), 4),
                                               np.round(np.mean(train_accuracies), 4),
                                               np.round(np.mean(test_accuracies), 4),
                                               )
                    plt.show()
                    display.display(plt.gcf())
                    display.clear_output(wait=True)

        self.plot_decision_regions(X, y, e,
                                   np.round(np.mean(train_losses), 4),
                                   np.round(np.mean(test_losses), 4),
                                   np.round(np.mean(train_accuracies), 4),
                                   np.round(np.mean(test_accuracies), 4),
                                   )

        history = {'epochs': epochs,
                   'train_loss': history_train_losses,
                   'train_acc': history_train_accuracies,
                   'test_loss': history_test_losses,
                   'test_acc': history_test_accuracies
                   }
        return history

    def predict(self, a):
        '''
        Use the current state of the network to make predictions.

        Parameters:
        ---
        a: input data, shape: (input_shape, batch_size)

        Returns:
        ---
        predictions: vector of output predictions
        '''
        for w, b in zip(self.weights, self.biases):
            z = np.dot(w, a) + b
            a = activation(z)
        predictions = (a > 0.5).astype(int)
        return predictions
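
# Minimal usage sketch (hypothetical): the notebook exercises the class
# interactively on several non-linear datasets; the dataset and hyperparameters
# below are illustrative only.
if __name__ == '__main__':
    from sklearn.datasets import make_moons

    X, y = make_moons(n_samples=1000, noise=0.2, random_state=42)
    X, y = X.T, y.reshape(1, -1)   # the network expects (features, samples) and (1, samples)

    nn = NeuralNetwork([2, 10, 10, 1], seed=0)
    history = nn.train(X, y, batch_size=16, epochs=100, learning_rate=0.1,
                       print_every=10, tqdm_=False, plot_every=None)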
--------------------------------------------------------------------------------
/readme.md:
--------------------------------------------------------------------------------

## Learn backpropagation the **hard** way

![Backpropagation](https://github.com/ahmedbesbes/Neural-Network-from-scratch/blob/master/images/backprop.gif)

In this repository, I will show you how to build a neural network from scratch (yes, with plain Python and NumPy, no deep learning framework involved) that trains on mini-batches using gradient descent. Check **nn.py** for the code.

In the companion notebook **Neural_Network_from_scratch_with_Numpy.ipynb** we test nn.py on a set of non-linear classification problems:

- We'll train the neural network for a given number of epochs and a set of hyperparameters
- We'll plot a live/interactive decision boundary
- We'll plot the train and validation metrics, such as the loss and the accuracy


## Example: Noisy Moons (check the notebook for other kinds of problems)

### Decision boundary (this graph is animated during training)
![Decision boundary](https://github.com/ahmedbesbes/Neural-Network-from-scratch/blob/master/images/decision_boundary.png)

### Loss and accuracy monitoring on train and validation sets
![Loss/Accuracy monitoring on train/val](https://github.com/ahmedbesbes/Neural-Network-from-scratch/blob/master/images/loss_acc.png)


## Where to go from here?
nn.py is a toy neural network meant for educational purposes only, so there's a lot of room for improvement if you want to pimp it. Here are some guidelines:

- Implement a different loss function, such as binary cross-entropy. For a classification problem, it works better than the mean squared error.
- Make the code generic with respect to the activation functions, so that you can plug in any function you want: ReLU, sigmoid, tanh, etc.
- Implement other optimizers: plain SGD works, but it has limitations and can get stuck in local minima. Look into momentum, Adam, or RMSProp; a minimal momentum sketch follows below.
- Play with the hyperparameters and check the validation metrics.
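
### Sketch: SGD with momentum

As a starting point for the optimizer suggestion above, here is a minimal, hypothetical sketch of an update step with momentum. The function name `momentum_update` and the `velocities` list are not part of nn.py; the idea would be to call it once per epoch in `train`, in place of the plain `self.weights[i] -= learning_rate * dw_epoch` update, with `velocities` initialized to `[np.zeros_like(w) for w in self.weights]` (and likewise for the biases).

```python
import numpy as np

def momentum_update(params, grads, velocities, learning_rate, beta=0.9):
    """One SGD-with-momentum step over a list of parameter arrays (in place)."""
    for p, g, v in zip(params, grads, velocities):
        v *= beta                 # decay the running velocity
        v += (1 - beta) * g       # blend in the current gradient
        p -= learning_rate * v    # take the step
    return params, velocities
```

Adam and RMSProp follow the same pattern, adding a per-parameter running estimate of the squared gradients to scale the step size.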