├── Neural_Network_from_scratch_with_Numpy.ipynb
├── images
│   ├── backprop.gif
│   ├── decision_boundary.png
│   └── loss_acc.png
├── nn.py
└── readme.md

/images/backprop.gif:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ahmedbesbes/Neural-Network-from-scratch/493be2f4015d345fc68d3addd518c2b127e8c648/images/backprop.gif
--------------------------------------------------------------------------------
/images/decision_boundary.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ahmedbesbes/Neural-Network-from-scratch/493be2f4015d345fc68d3addd518c2b127e8c648/images/decision_boundary.png
--------------------------------------------------------------------------------
/images/loss_acc.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ahmedbesbes/Neural-Network-from-scratch/493be2f4015d345fc68d3addd518c2b127e8c648/images/loss_acc.png
--------------------------------------------------------------------------------
/nn.py:
--------------------------------------------------------------------------------
# -*- coding: utf-8 -*-
# Author: Ahmed BESBES

# matplotlib for plotting
import matplotlib
matplotlib.rcParams['figure.figsize'] = (10.0, 10.0)
from matplotlib import pyplot as plt

# numpy for vector and matrix manipulations
import numpy as np

# we won't use scikit-learn per se, but we'll use a few of its helper functions:
# accuracy_score, shuffle, and train_test_split
from sklearn.metrics import accuracy_score
from sklearn.utils import shuffle
from sklearn.model_selection import train_test_split

# tqdm is a progress bar; make sure it's installed: pip install tqdm
from tqdm import tqdm
from IPython import display


def activation(z, derivative=False):
    """
    Sigmoid activation function.
    It handles two modes: normal and derivative mode.
    Applies a pointwise operation on vectors.

    Parameters:
    ---
    z: pre-activation vector at layer l
        shape (n[l], batch_size)

    Returns:
    pointwise activation on each element of the input z
    """
    if derivative:
        return activation(z) * (1 - activation(z))
    else:
        return 1 / (1 + np.exp(-z))
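
# The readme suggests making the activation function configurable (ReLU, tanh, ...).
# Below is a minimal, hypothetical sketch of a tanh variant that keeps the same
# normal/derivative interface as `activation`; the NeuralNetwork class does not
# use it as-is.
def tanh_activation(z, derivative=False):
    """Tanh activation function, with the same two modes as `activation`."""
    if derivative:
        return 1 - np.tanh(z) ** 2
    else:
        return np.tanh(z)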

def cost_function(y_true, y_pred):
    """
    Computes the Mean Squared Error between a ground-truth vector and a prediction vector
    Parameters:
    ---
    y_true: ground-truth vector
    y_pred: prediction vector
    Returns:
    ---
    cost: a scalar value representing the loss
    """
    n = y_pred.shape[1]
    cost = (1./(2*n)) * np.sum((y_true - y_pred) ** 2)
    return cost


def cost_function_prime(y_true, y_pred):
    """
    Computes the derivative of the loss function w.r.t the activation of the output layer
    Parameters:
    ---
    y_true: ground-truth vector
    y_pred: prediction vector
    Returns:
    ---
    cost_prime: derivative of the loss w.r.t. the activation of the output
        shape: (n[L], batch_size)
    """
    cost_prime = y_pred - y_true
    return cost_prime
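
# The readme suggests swapping the MSE loss for binary cross-entropy on
# classification problems. Below is a minimal, hypothetical sketch with the same
# interface as cost_function / cost_function_prime; it is not wired into the
# NeuralNetwork class.
def bce_cost(y_true, y_pred, eps=1e-8):
    """Binary cross-entropy averaged over the batch."""
    n = y_pred.shape[1]
    y_pred = np.clip(y_pred, eps, 1 - eps)  # avoid log(0)
    return -(1./n) * np.sum(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))


def bce_cost_prime(y_true, y_pred, eps=1e-8):
    """Derivative of the binary cross-entropy w.r.t. the output activation."""
    y_pred = np.clip(y_pred, eps, 1 - eps)
    # multiplied by the sigmoid derivative in compute_deltas, this reduces to (y_pred - y_true)
    return (y_pred - y_true) / (y_pred * (1 - y_pred))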


class NeuralNetwork(object):
    '''
    This is a custom neural network package built from scratch with numpy.
    It allows training with SGD, inference, and live plotting of the decision boundary.
    This code is not optimized and should not be used on real-world problems.
    It's written for educational purposes only.

    The neural network, its parameters, and its training procedure all
    reside in this class.

    Parameters
    ---
    size: list of the number of neurons per layer

    Examples
    ---
    >>> from nn import NeuralNetwork
    >>> nn = NeuralNetwork([2, 3, 4, 1])

    This means:
    1 input layer with 2 neurons
    1 hidden layer with 3 neurons
    1 hidden layer with 4 neurons
    1 output layer with 1 neuron

    '''

    def __init__(self, size, seed=42):
        '''
        Instantiate the weights and biases of the network.
        weights and biases are attributes of the NeuralNetwork class.
        They are updated during training.
        '''
        self.seed = seed
        np.random.seed(self.seed)
        self.size = size
        self.weights = [np.random.randn(self.size[i], self.size[i-1]) * np.sqrt(1 / self.size[i-1]) for i in range(1, len(self.size))]
        self.biases = [np.random.rand(n, 1) for n in self.size[1:]]

    def forward(self, input):
        '''
        Perform a feed-forward computation.

        Parameters
        ---
        input: data to be fed to the network
            shape: (input_shape, batch_size)

        Returns
        ---
        a: output activation (output_shape, batch_size)
        pre_activations: list of pre-activations per layer
            each of shape (n[l], batch_size), where n[l] is the number
            of neurons at layer l
        activations: list of activations per layer
            each of shape (n[l], batch_size), where n[l] is the number
            of neurons at layer l
        '''
        a = input
        pre_activations = []
        activations = [a]
        for w, b in zip(self.weights, self.biases):
            z = np.dot(w, a) + b
            a = activation(z)
            pre_activations.append(z)
            activations.append(a)
        return a, pre_activations, activations

    def compute_deltas(self, pre_activations, y_true, y_pred):
        """
        Computes a list containing the values of delta for each layer using
        a recursion.
        Parameters:
        ---
        pre_activations: list of pre-activations, each corresponding to a layer
        y_true: ground truth values of the labels
        y_pred: predicted values of the labels
        Returns:
        ---
        deltas: a list of deltas per layer
        """
        delta_L = cost_function_prime(y_true, y_pred) * activation(pre_activations[-1], derivative=True)
        deltas = [0] * (len(self.size) - 1)
        deltas[-1] = delta_L
        for l in range(len(deltas) - 2, -1, -1):
            delta = np.dot(self.weights[l + 1].transpose(), deltas[l + 1]) * activation(pre_activations[l], derivative=True)
            deltas[l] = delta
        return deltas
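
    # compute_deltas (above) and backpropagate (below) implement the textbook
    # backpropagation equations, with sigma the activation function:
    #   delta[L] = dC/da[L] * sigma'(z[L])                     (output layer)
    #   delta[l] = (W[l+1].T @ delta[l+1]) * sigma'(z[l])      (hidden layers, backwards)
    #   dC/dW[l] = delta[l] @ a[l-1].T
    #   dC/db[l] = delta[l]
    # Both gradients are summed over the batch here and averaged (divided by
    # batch_size) in train.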

    def backpropagate(self, deltas, pre_activations, activations):
        """
        Applies back-propagation and computes the gradient of the loss
        w.r.t the weights and biases of the network.

        Parameters:
        ---
        deltas: list of deltas computed by compute_deltas
        pre_activations: a list of pre-activations per layer
        activations: a list of activations per layer
        Returns:
        ---
        dW: list of gradients w.r.t. the weight matrices of the network
        db: list of gradients w.r.t. the biases (vectors) of the network
        """
        dW = []
        db = []
        deltas = [0] + deltas
        for l in range(1, len(self.size)):
            dW_l = np.dot(deltas[l], activations[l-1].transpose())
            db_l = deltas[l]
            dW.append(dW_l)
            # sum over the batch; like dW, this is averaged by the division by batch_size in train
            db.append(np.expand_dims(db_l.sum(axis=1), 1))
        return dW, db

    def plot_decision_regions(self, X, y, iteration, train_loss, val_loss, train_acc, val_acc, res=0.01):
        """
        Plots the decision boundary at each iteration (i.e. epoch) in order to inspect the performance
        of the model.

        Parameters:
        ---
        X: the input data
        y: the labels
        iteration: the epoch number
        train_loss: value of the training loss
        val_loss: value of the validation loss
        train_acc: value of the training accuracy
        val_acc: value of the validation accuracy
        res: resolution of the plot
        Returns:
        ---
        None: this function plots the decision boundary
        """
        X, y = X.T, y.T
        x_min, x_max = X[:, 0].min() - 0.5, X[:, 0].max() + 0.5
        y_min, y_max = X[:, 1].min() - 0.5, X[:, 1].max() + 0.5
        xx, yy = np.meshgrid(np.arange(x_min, x_max, res),
                             np.arange(y_min, y_max, res))

        Z = self.predict(np.c_[xx.ravel(), yy.ravel()].T)
        Z = Z.reshape(xx.shape)
        plt.contourf(xx, yy, Z, alpha=0.5)
        plt.xlim(xx.min(), xx.max())
        plt.ylim(yy.min(), yy.max())
        plt.scatter(X[:, 0], X[:, 1], c=y.reshape(-1), alpha=0.2)
        message = 'iteration: {} | train loss: {} | val loss: {} | train acc: {} | val acc: {}'.format(iteration,
                                                                                                       train_loss,
                                                                                                       val_loss,
                                                                                                       train_acc,
                                                                                                       val_acc)
        plt.title(message)

    def train(self, X, y, batch_size, epochs, learning_rate, validation_split=0.2, print_every=10, tqdm_=True, plot_every=None):
        """
        Trains the network using the gradients computed by back-propagation.
        Splits the data into train and validation splits.
        Processes the training data in mini-batches; gradients are accumulated
        over the batches and the parameters are updated once per epoch.

        Parameters:
        ---
        X: input data
        y: input labels
        batch_size: number of data points to process in each batch
        epochs: number of epochs for the training
        learning_rate: value of the learning rate
        validation_split: fraction of the data used for validation
        print_every: log the train/validation loss and accuracy every print_every epochs
        tqdm_: use a tqdm progress bar
        plot_every: plot the decision boundary every plot_every epochs

        Returns:
        ---
        history: dictionary of train and validation metrics per epoch
            train_acc: train accuracy
            test_acc: validation accuracy
            train_loss: train loss
            test_loss: validation loss

        This history is used to plot the performance of the model
        """
        history_train_losses = []
        history_train_accuracies = []
        history_test_losses = []
        history_test_accuracies = []

        x_train, x_test, y_train, y_test = train_test_split(X.T, y.T, test_size=validation_split)
        x_train, x_test, y_train, y_test = x_train.T, x_test.T, y_train.T, y_test.T

        if tqdm_:
            epoch_iterator = tqdm(range(epochs))
        else:
            epoch_iterator = range(epochs)

        for e in epoch_iterator:
            if x_train.shape[1] % batch_size == 0:
                n_batches = x_train.shape[1] // batch_size
            else:
                # keep the smaller final batch instead of dropping samples
                n_batches = x_train.shape[1] // batch_size + 1

            x_train, y_train = shuffle(x_train.T, y_train.T)
            x_train, y_train = x_train.T, y_train.T

            batches_x = [x_train[:, batch_size*i:batch_size*(i+1)] for i in range(0, n_batches)]
            batches_y = [y_train[:, batch_size*i:batch_size*(i+1)] for i in range(0, n_batches)]

            train_losses = []
            train_accuracies = []

            test_losses = []
            test_accuracies = []

            dw_per_epoch = [np.zeros(w.shape) for w in self.weights]
            db_per_epoch = [np.zeros(b.shape) for b in self.biases]

            for batch_x, batch_y in zip(batches_x, batches_y):
                batch_y_pred, pre_activations, activations = self.forward(batch_x)
                deltas = self.compute_deltas(pre_activations, batch_y, batch_y_pred)
                dW, db = self.backpropagate(deltas, pre_activations, activations)
                for i, (dw_i, db_i) in enumerate(zip(dW, db)):
                    dw_per_epoch[i] += dw_i / batch_size
                    db_per_epoch[i] += db_i / batch_size

                batch_y_train_pred = self.predict(batch_x)

                train_loss = cost_function(batch_y, batch_y_train_pred)
                train_losses.append(train_loss)
                train_accuracy = accuracy_score(batch_y.T, batch_y_train_pred.T)
                train_accuracies.append(train_accuracy)

                batch_y_test_pred = self.predict(x_test)

                test_loss = cost_function(y_test, batch_y_test_pred)
                test_losses.append(test_loss)
                test_accuracy = accuracy_score(y_test.T, batch_y_test_pred.T)
                test_accuracies.append(test_accuracy)

            # weight update
            for i, (dw_epoch, db_epoch) in enumerate(zip(dw_per_epoch, db_per_epoch)):
                self.weights[i] = self.weights[i] - learning_rate * dw_epoch
                self.biases[i] = self.biases[i] - learning_rate * db_epoch

            history_train_losses.append(np.mean(train_losses))
            history_train_accuracies.append(np.mean(train_accuracies))

            history_test_losses.append(np.mean(test_losses))
            history_test_accuracies.append(np.mean(test_accuracies))

            if not plot_every:
                if e % print_every == 0:
                    print('Epoch {} / {} | train loss: {} | train accuracy: {} | val loss : {} | val accuracy : {} '.format(
                        e, epochs, np.round(np.mean(train_losses), 3), np.round(np.mean(train_accuracies), 3),
                        np.round(np.mean(test_losses), 3), np.round(np.mean(test_accuracies), 3)))
            else:
                if e % plot_every == 0:
                    self.plot_decision_regions(x_train, y_train, e,
                                               np.round(np.mean(train_losses), 4),
                                               np.round(np.mean(test_losses), 4),
                                               np.round(np.mean(train_accuracies), 4),
                                               np.round(np.mean(test_accuracies), 4),
                                               )
                    plt.show()
                    display.display(plt.gcf())
                    display.clear_output(wait=True)

        self.plot_decision_regions(X, y, e,
                                   np.round(np.mean(train_losses), 4),
                                   np.round(np.mean(test_losses), 4),
                                   np.round(np.mean(train_accuracies), 4),
                                   np.round(np.mean(test_accuracies), 4),
                                   )

        history = {'epochs': epochs,
                   'train_loss': history_train_losses,
                   'train_acc': history_train_accuracies,
                   'test_loss': history_test_losses,
                   'test_acc': history_test_accuracies
                   }
        return history

    def predict(self, a):
        '''
        Use the current state of the network to make predictions.

        Parameters:
        ---
        a: input data, shape: (input_shape, batch_size)

        Returns:
        ---
        predictions: vector of output predictions
        '''
        for w, b in zip(self.weights, self.biases):
            z = np.dot(w, a) + b
            a = activation(z)
        predictions = (a > 0.5).astype(int)
        return predictions
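
# Minimal usage sketch (hypothetical): the notebook exercises the class
# interactively on several non-linear datasets; the dataset and hyperparameters
# below are illustrative only.
if __name__ == '__main__':
    from sklearn.datasets import make_moons

    X, y = make_moons(n_samples=1000, noise=0.2, random_state=42)
    X, y = X.T, y.reshape(1, -1)   # the network expects (features, samples) and (1, samples)

    nn = NeuralNetwork([2, 10, 10, 1], seed=0)
    history = nn.train(X, y, batch_size=16, epochs=100, learning_rate=0.1,
                       print_every=10, tqdm_=False, plot_every=None)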
--------------------------------------------------------------------------------
/readme.md:
--------------------------------------------------------------------------------

## Learn backpropagation the **hard** way

![Backpropagation](https://github.com/ahmedbesbes/Neural-Network-from-scratch/blob/master/images/backprop.gif)

In this repository, I will show you how to build a neural network from scratch (yes, with plain Python and NumPy, no deep learning framework involved) that trains on mini-batches using gradient descent. Check **nn.py** for the code.

In the companion notebook **Neural_Network_from_scratch_with_Numpy.ipynb** we test nn.py on a set of non-linear classification problems:

- We'll train the neural network for a given number of epochs and a set of hyperparameters
- We'll plot a live/interactive decision boundary
- We'll plot the train and validation metrics, such as the loss and the accuracy


## Example: Noisy Moons (check the notebook for other kinds of problems)

### Decision boundary (this graph is animated during training)
![Decision boundary](https://github.com/ahmedbesbes/Neural-Network-from-scratch/blob/master/images/decision_boundary.png)

### Loss and accuracy monitoring on train and validation sets
![Loss/Accuracy monitoring on train/val](https://github.com/ahmedbesbes/Neural-Network-from-scratch/blob/master/images/loss_acc.png)


## Where to go from here?
nn.py is a toy neural network meant for educational purposes only, so there's a lot of room for improvement if you want to pimp it. Here are some guidelines:

- Implement a different loss function, such as binary cross-entropy. For a classification problem, it works better than the mean squared error.
- Make the code generic with respect to the activation functions, so that you can plug in any function you want: ReLU, sigmoid, tanh, etc.
- Implement other optimizers: plain SGD works, but it has limitations and can get stuck in local minima. Look into momentum, Adam, or RMSProp; a minimal momentum sketch follows below.
- Play with the hyperparameters and check the validation metrics.
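
### Sketch: SGD with momentum

As a starting point for the optimizer suggestion above, here is a minimal, hypothetical sketch of an update step with momentum. The function name `momentum_update` and the `velocities` list are not part of nn.py; the idea would be to call it once per epoch in `train`, in place of the plain `self.weights[i] -= learning_rate * dw_epoch` update, with `velocities` initialized to `[np.zeros_like(w) for w in self.weights]` (and likewise for the biases).

```python
import numpy as np

def momentum_update(params, grads, velocities, learning_rate, beta=0.9):
    """One SGD-with-momentum step over a list of parameter arrays (in place)."""
    for p, g, v in zip(params, grads, velocities):
        v *= beta                 # decay the running velocity
        v += (1 - beta) * g       # blend in the current gradient
        p -= learning_rate * v    # take the step
    return params, velocities
```

Adam and RMSProp follow the same pattern, adding a per-parameter running estimate of the squared gradients to scale the step size.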