├── LICENSE
├── README.md
├── four_layer_network.py
├── images
│   ├── circles.png
│   ├── learning_rate.png
│   ├── moons.png
│   ├── noise.png
│   ├── num_nodes.png
│   ├── num_observations.png
│   └── regularization.png
├── tests.py
└── three_layer_network.py

/LICENSE:
--------------------------------------------------------------------------------
MIT License

Copyright (c) 2016 James LeDoux

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# NumPy Neural Network
#### This is a simple multilayer perceptron implemented from scratch in pure Python and NumPy.

This repo includes a three- and a four-layer neural network (with one and two hidden layers respectively), trained via batch gradient descent with backpropagation. The tunable parameters include:
* Learning rate
* Regularization lambda
* Nodes per hidden layer
* Number of output classes
* Stopping criterion
* Activation function

A good starting point for this model is a learning rate of 0.01, regularization of 0.01, 32 nodes per hidden layer, and ReLU activations. These will differ according to the context in which the model is used.


Here are some results from the three-layer model on some particularly tricky separation boundaries. The model generalizes well to non-linear patterns.

![inline 50%](images/moons.png)![inline 50%](images/circles.png)


Parameter tuning looked as follows:


![inline 50%](images/num_nodes.png)![inline 50%](images/num_observations.png)
![inline 50%](images/regularization.png)![inline 50%](images/learning_rate.png)

As you can see, most of the patterns worked as expected. More data led to more stable training, more nodes led to a better model fit, increased regularization led to increased training loss, and a smaller learning rate produced a smoother but slower-moving training curve. Worth noting, however, is how extreme values of some of these parameters caused the model to become less stable. A very large number of observations or a very high learning rate, for example, caused erratic and sub-optimal behaviour during training. This is an indication that there is still significant work that can be done to optimize this model.
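
Putting the pieces together, a minimal sketch of a training run with the suggested starting values looks roughly like this, using the functions defined in `three_layer_network.py` (the `main()` function in that file runs a similar example on a smaller toy dataset):

```python
import numpy as np
from sklearn import datasets
from three_layer_network import build_model, train, feed_forward

# toy dataset and the suggested starting hyperparameters
X, y = datasets.make_moons(200, noise=0.2)
model = build_model(X, hidden_nodes=32, output_dim=2)
model, losses = train(model, X, y, reg_lambda=0.01, learning_rate=0.01)

# feed_forward returns (z1, a1, z2, out); the softmax output is the last element
_, _, _, out = feed_forward(model, X)
preds = np.argmax(out, axis=1)
print("training accuracy: %.3f" % np.mean(preds == y))
```

Training stops once the relative improvement in loss between consecutive checks falls below 1%, so the run length will vary with the dataset and hyperparameters.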

### Lessons learned:
* Logistic activation functions really do complicate MLP training. Too low a learning rate, too many observations, and sigmoidal activation functions all made this model unstable, and even broke it in some cases.

* These models are incredibly flexible. This simple network was able to approximate every function I threw its way.

* Neural networks are hard. I have a newfound appreciation for the layers of abstraction that TensorFlow, Keras, etc. provide between programmer and network.


Thank you to [WildML](http://www.wildml.com/2015/09/implementing-a-neural-network-from-scratch/) for providing a starting point for this project + code, and to Ian Goodfellow's book *Deep Learning* for background on the algorithms and parameter tuning.
--------------------------------------------------------------------------------
/four_layer_network.py:
--------------------------------------------------------------------------------
import numpy as np
import math
from sklearn import datasets

def relu(X):
    return np.maximum(X, 0)

def relu_derivative(X):
    return 1. * (X > 0)

def build_model(X, hidden_nodes, output_dim=2):
    model = {}
    input_dim = X.shape[1]
    model['W1'] = np.random.randn(input_dim, hidden_nodes) / np.sqrt(input_dim)
    model['b1'] = np.zeros((1, hidden_nodes))
    model['W2'] = np.random.randn(hidden_nodes, hidden_nodes) / np.sqrt(hidden_nodes)
    model['b2'] = np.zeros((1, hidden_nodes))
    model['W3'] = np.random.randn(hidden_nodes, output_dim) / np.sqrt(hidden_nodes)
    model['b3'] = np.zeros((1, output_dim))
    return model

def feed_forward(model, x):
    W1, b1, W2, b2, W3, b3 = model['W1'], model['b1'], model['W2'], model['b2'], model['W3'], model['b3']
    # Forward propagation
    z1 = x.dot(W1) + b1
    #a1 = np.tanh(z1)
    a1 = relu(z1)
    z2 = a1.dot(W2) + b2
    a2 = relu(z2)
    z3 = a2.dot(W3) + b3
    exp_scores = np.exp(z3)
    out = exp_scores / np.sum(exp_scores, axis=1, keepdims=True)  # softmax
    return z1, a1, z2, a2, z3, out

def calculate_loss(model, X, y, reg_lambda):
    num_examples = X.shape[0]
    W1, b1, W2, b2, W3, b3 = model['W1'], model['b1'], model['W2'], model['b2'], model['W3'], model['b3']
    # Forward propagation to calculate our predictions
    z1, a1, z2, a2, z3, out = feed_forward(model, X)
    probs = out / np.sum(out, axis=1, keepdims=True)
    # Calculating the cross-entropy loss
    correct_logprobs = -np.log(probs[range(num_examples), y])
    loss = np.sum(correct_logprobs)
    # Add regularization term to loss (optional)
    loss += reg_lambda/2 * (np.sum(np.square(W1)) + np.sum(np.square(W2)) + np.sum(np.square(W3)))
    return 1./num_examples * loss

def backprop(X, y, model, z1, a1, z2, a2, z3, output, reg_lambda):
    delta3 = output
    delta3[range(X.shape[0]), y] -= 1  # yhat - y
    dW3 = (a2.T).dot(delta3)
    db3 = np.sum(delta3, axis=0, keepdims=True)
    delta2 = delta3.dot(model['W3'].T) * relu_derivative(a2)  # if ReLU
    dW2 = np.dot(a1.T, delta2)
    db2 = np.sum(delta2, axis=0)
    #delta1 = delta2.dot(model['W2'].T) * (1 - np.power(a1, 2))  # if tanh
    delta1 = delta2.dot(model['W2'].T) * relu_derivative(a1)  # if ReLU
    dW1 = np.dot(X.T, delta1)
    db1 = np.sum(delta1, axis=0)
    # Add regularization terms
    dW3 += reg_lambda * model['W3']
    dW2 += reg_lambda * model['W2']
    dW1 += reg_lambda * model['W1']
    return dW1, dW2, dW3, db1, db2, db3

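# Note on the gradients above (with N = number of examples, H = hidden_nodes,
# C = output_dim): delta3 = softmax(z3) - one_hot(y) has shape (N, C),
# delta2 = (delta3 . W3^T) * relu'(z2) has shape (N, H), and
# delta1 = (delta2 . W2^T) * relu'(z1) has shape (N, H). Passing the activations
# a2, a1 to relu_derivative is equivalent to passing z2, z1, since relu(z) > 0
# exactly when z > 0. Each bias gradient is the column sum of the corresponding
# delta, the L2 penalty contributes reg_lambda * W to each weight gradient, and
# the gradients are summed (not averaged) over the batch.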

def train(model, X, y, num_passes=10000, reg_lambda=.1, learning_rate=0.1):
    # Batch gradient descent
    done = False
    previous_loss = float('inf')
    i = 0
    losses = []
    while done == False:  # comment out while performance testing
    #while i < 1500:
        # feed forward
        z1, a1, z2, a2, z3, output = feed_forward(model, X)
        # backpropagation
        dW1, dW2, dW3, db1, db2, db3 = backprop(X, y, model, z1, a1, z2, a2, z3, output, reg_lambda)
        # update weights and biases
        model['W1'] -= learning_rate * dW1
        model['b1'] -= learning_rate * db1
        model['W2'] -= learning_rate * dW2
        model['b2'] -= learning_rate * db2
        model['W3'] -= learning_rate * dW3
        model['b3'] -= learning_rate * db3
        if i % 1000 == 0:
            loss = calculate_loss(model, X, y, reg_lambda)
            losses.append(loss)
            print("Loss after iteration %i: %f" % (i, loss))  # uncomment once testing finished, return mod val to 1000
            if (previous_loss - loss) / previous_loss < 0.01:
                done = True
                #print(i)
            previous_loss = loss
        i += 1
    return model, losses

def main():
    # toy dataset
    X, y = datasets.make_moons(16, noise=0.10)
    num_examples = len(X)  # training set size
    nn_input_dim = 2  # input layer dimensionality
    nn_output_dim = 2  # output layer dimensionality
    learning_rate = 0.01  # learning rate for gradient descent
    reg_lambda = 0.01  # regularization strength
    model = build_model(X, 20, 2)
    model, losses = train(model, X, y, reg_lambda=reg_lambda, learning_rate=learning_rate)
    output = feed_forward(model, X)
    preds = np.argmax(output[5], axis=1)  # output[5] is the softmax output

if __name__ == "__main__":
    main()

--------------------------------------------------------------------------------
/images/circles.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jldbc/numpy_neural_net/24531d9674c853b15de06c1b3bed961cfd4fcf3f/images/circles.png
--------------------------------------------------------------------------------
/images/learning_rate.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jldbc/numpy_neural_net/24531d9674c853b15de06c1b3bed961cfd4fcf3f/images/learning_rate.png
--------------------------------------------------------------------------------
/images/moons.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jldbc/numpy_neural_net/24531d9674c853b15de06c1b3bed961cfd4fcf3f/images/moons.png
--------------------------------------------------------------------------------
/images/noise.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jldbc/numpy_neural_net/24531d9674c853b15de06c1b3bed961cfd4fcf3f/images/noise.png
--------------------------------------------------------------------------------
/images/num_nodes.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jldbc/numpy_neural_net/24531d9674c853b15de06c1b3bed961cfd4fcf3f/images/num_nodes.png
--------------------------------------------------------------------------------
/images/num_observations.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jldbc/numpy_neural_net/24531d9674c853b15de06c1b3bed961cfd4fcf3f/images/num_observations.png
--------------------------------------------------------------------------------
/images/regularization.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jldbc/numpy_neural_net/24531d9674c853b15de06c1b3bed961cfd4fcf3f/images/regularization.png
--------------------------------------------------------------------------------
/tests.py:
--------------------------------------------------------------------------------
from three_layer_network import build_model, relu, relu_derivative, feed_forward, \
    calculate_loss, backprop, train
#from three_layer_network import *
import matplotlib.pyplot as plt
from sklearn import datasets
import numpy as np

"""
to reproduce these tests, modify the three_layer_network.py file by commenting out
'while done == False', uncommenting 'while i < 150', and then changing
'if i % 1000 == 0' to 'if i % 150 == 0'
"""


def num_observations():
    obs_values = [10, 100, 1000]
    nn_input_dim = 2  # input layer dimensionality
    nn_output_dim = 2  # output layer dimensionality
    learning_rate = 0.01  # learning rate for gradient descent
    reg_lambda = 0.01  # regularization strength
    losses_store = []
    for i in obs_values:
        X, y = datasets.make_moons(i, noise=0.1)
        num_examples = len(X)  # training set size
        model = build_model(X, 32, 2)
        model, losses = train(model, X, y, reg_lambda=reg_lambda, learning_rate=learning_rate)
        losses_store.append(losses)
        print(losses)
    x = np.linspace(0, 145, 30)
    for i in range(len(losses_store)):
        lab = 'n_observations = ' + str(obs_values[i])
        plt.plot(x, losses_store[i], label=lab)
    plt.legend()
    plt.show()

def noise():
    noise_values = [0.01, 0.1, 0.2, 0.3, 0.4]
    nn_input_dim = 2  # input layer dimensionality
    nn_output_dim = 2  # output layer dimensionality
    learning_rate = 0.01  # learning rate for gradient descent
    reg_lambda = 0.01  # regularization strength
    losses_store = []
    for i in noise_values:
        X, y = datasets.make_moons(200, noise=i)
        num_examples = len(X)  # training set size
        model = build_model(X, 32, 2)
        model, losses = train(model, X, y, reg_lambda=reg_lambda, learning_rate=learning_rate)
        losses_store.append(losses)
        print(losses)
    x = np.linspace(0, 145, 30)
    for i in range(len(losses_store)):
        lab = 'noise_value = ' + str(noise_values[i])
        plt.plot(x, losses_store[i], label=lab)
    plt.legend()
    plt.show()

def reg():
    reg_values = [0.00, 0.01, 0.1, 0.2, 0.3]
    nn_input_dim = 2  # input layer dimensionality
    nn_output_dim = 2  # output layer dimensionality
    learning_rate = 0.01  # learning rate for gradient descent
    losses_store = []
    for i in reg_values:
        reg_lambda = i  # regularization strength
        X, y = datasets.make_moons(200, noise=0.2)
        num_examples = len(X)  # training set size
        model = build_model(X, 32, 2)
        model, losses = train(model, X, y, reg_lambda=reg_lambda, learning_rate=learning_rate)
        losses_store.append(losses)
        print(losses)
    x = np.linspace(0, 145, 30)
    for i in range(len(losses_store)):
        lab = 'regularization_value = ' + str(reg_values[i])
        plt.plot(x, losses_store[i], label=lab)
    plt.legend()
    plt.show()


def lr():
    lr_values = [0.001, 0.01, 0.05]
    nn_input_dim = 2  # input layer dimensionality
    nn_output_dim = 2  # output layer dimensionality
    reg_lambda = .01  # regularization strength
    losses_store = []
    for i in lr_values:
        learning_rate = i
        X, y = datasets.make_moons(200, noise=0.2)
        num_examples = len(X)  # training set size
        model = build_model(X, 32, 2)
        model, losses = train(model, X, y, reg_lambda=reg_lambda, learning_rate=learning_rate)
        losses_store.append(losses)
        print(losses)
    x = np.linspace(0, 145, 30)
    for i in range(len(losses_store)):
        lab = 'learning rate = ' + str(lr_values[i])
        plt.plot(x, losses_store[i], label=lab)
    plt.legend()
    plt.show()

def test_num_nodes():
    X, y = datasets.make_moons(400, noise=0.2)
    num_examples = len(X)  # training set size
    nn_input_dim = 2  # input layer dimensionality
    nn_output_dim = 2  # output layer dimensionality
    learning_rate = 0.01  # learning rate for gradient descent
    reg_lambda = 0.01  # regularization strength
    node_vals = [4, 8, 16, 32, 64, 128]
    losses_store = []
    for val in node_vals:
        model = build_model(X, val, 2)
        model, losses = train(model, X, y, reg_lambda=reg_lambda, learning_rate=learning_rate)
        losses_store.append(losses)
        print(losses)
    x = np.linspace(0, 145, 30)
    for i in range(len(losses_store)):
        lab = 'n_nodes = ' + str(node_vals[i])
        plt.plot(x, losses_store[i], label=lab)
    plt.legend()
    plt.show()

print("number of observations:")
num_observations()
print('noise:')
noise()
print('regularization:')
reg()
print('learning rate:')
lr()
print('hidden nodes:')
test_num_nodes()
--------------------------------------------------------------------------------
/three_layer_network.py:
--------------------------------------------------------------------------------
import numpy as np
import math
from sklearn import datasets

def relu(X):
    return np.maximum(X, 0)

def relu_derivative(X):
    return 1. * (X > 0)
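
# The model below is stored as a plain dict of NumPy arrays. For an input of
# width input_dim, `hidden_nodes` hidden units, and `output_dim` classes, the
# shapes are: W1 (input_dim, hidden_nodes), b1 (1, hidden_nodes),
# W2 (hidden_nodes, output_dim), b2 (1, output_dim). Weights are drawn from a
# standard normal and scaled by 1/sqrt(fan_in), a common heuristic to keep the
# initial activations at a reasonable scale; biases start at zero.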

def build_model(X, hidden_nodes, output_dim=2):
    model = {}
    input_dim = X.shape[1]
    model['W1'] = np.random.randn(input_dim, hidden_nodes) / np.sqrt(input_dim)
    model['b1'] = np.zeros((1, hidden_nodes))
    model['W2'] = np.random.randn(hidden_nodes, output_dim) / np.sqrt(hidden_nodes)
    model['b2'] = np.zeros((1, output_dim))
    return model

def feed_forward(model, x):
    W1, b1, W2, b2 = model['W1'], model['b1'], model['W2'], model['b2']
    # Forward propagation
    z1 = x.dot(W1) + b1
    #a1 = np.tanh(z1)
    a1 = relu(z1)
    z2 = a1.dot(W2) + b2
    exp_scores = np.exp(z2)
    out = exp_scores / np.sum(exp_scores, axis=1, keepdims=True)  # softmax
    return z1, a1, z2, out

def calculate_loss(model, X, y, reg_lambda):
    num_examples = X.shape[0]
    W1, b1, W2, b2 = model['W1'], model['b1'], model['W2'], model['b2']
    # Forward propagation to calculate our predictions
    z1, a1, z2, out = feed_forward(model, X)
    probs = out / np.sum(out, axis=1, keepdims=True)
    # Calculating the cross-entropy loss
    correct_logprobs = -np.log(probs[range(num_examples), y])
    loss = np.sum(correct_logprobs)
    # Add regularization term to loss (optional)
    loss += reg_lambda/2 * (np.sum(np.square(W1)) + np.sum(np.square(W2)))
    return 1./num_examples * loss

def backprop(X, y, model, z1, a1, z2, output, reg_lambda):
    delta3 = output
    delta3[range(X.shape[0]), y] -= 1  # yhat - y
    dW2 = (a1.T).dot(delta3)
    db2 = np.sum(delta3, axis=0, keepdims=True)
    #delta2 = delta3.dot(model['W2'].T) * (1 - np.power(a1, 2))  # if tanh
    delta2 = delta3.dot(model['W2'].T) * relu_derivative(a1)  # if ReLU
    dW1 = np.dot(X.T, delta2)
    db1 = np.sum(delta2, axis=0)
    # Add regularization terms
    dW2 += reg_lambda * model['W2']
    dW1 += reg_lambda * model['W1']
    return dW1, dW2, db1, db2


def train(model, X, y, num_passes=10000, reg_lambda=.1, learning_rate=0.1):
    # Batch gradient descent
    done = False
    previous_loss = float('inf')
    i = 0
    losses = []
    while done == False:  # comment out while performance testing
    #while i < 1500:
        # feed forward
        z1, a1, z2, output = feed_forward(model, X)
        # backpropagation
        dW1, dW2, db1, db2 = backprop(X, y, model, z1, a1, z2, output, reg_lambda)
        # update weights and biases
        model['W1'] -= learning_rate * dW1
        model['b1'] -= learning_rate * db1
        model['W2'] -= learning_rate * dW2
        model['b2'] -= learning_rate * db2
        if i % 1000 == 0:
            loss = calculate_loss(model, X, y, reg_lambda)
            losses.append(loss)
            print("Loss after iteration %i: %f" % (i, loss))  # uncomment once testing finished, return mod val to 1000
            if (previous_loss - loss) / previous_loss < 0.01:
                done = True
                #print(i)
            previous_loss = loss
        i += 1
    return model, losses

def main():
    # toy dataset
    X, y = datasets.make_moons(16, noise=0.10)
    num_examples = len(X)  # training set size
    nn_input_dim = 2  # input layer dimensionality
    nn_output_dim = 2  # output layer dimensionality
    learning_rate = 0.01  # learning rate for gradient descent
    reg_lambda = 0.01  # regularization strength
    model = build_model(X, 20, 2)
    model, losses = train(model, X, y, reg_lambda=reg_lambda, learning_rate=learning_rate)
    output = feed_forward(model, X)
    preds = np.argmax(output[3], axis=1)  # output[3] is the softmax output

if __name__ == "__main__":
    main()

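# Optional sketch: one way to produce decision-boundary plots like the README's
# moons/circles images for a trained `model`. This helper is an illustration
# added for clarity rather than part of the training pipeline; the grid step
# `h` and the figure styling are arbitrary choices.
def plot_decision_boundary(model, X, y, h=0.01):
    import matplotlib.pyplot as plt
    # evaluate the network on a dense grid covering the data
    x_min, x_max = X[:, 0].min() - .5, X[:, 0].max() + .5
    y_min, y_max = X[:, 1].min() - .5, X[:, 1].max() + .5
    xx, yy = np.meshgrid(np.arange(x_min, x_max, h), np.arange(y_min, y_max, h))
    grid = np.c_[xx.ravel(), yy.ravel()]
    preds = np.argmax(feed_forward(model, grid)[3], axis=1)
    # shade the predicted class regions and overlay the training points
    plt.contourf(xx, yy, preds.reshape(xx.shape), alpha=0.4)
    plt.scatter(X[:, 0], X[:, 1], c=y)
    plt.show()
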
--------------------------------------------------------------------------------