├── LICENSE
├── README.md
├── four_layer_network.py
├── images
│   ├── circles.png
│   ├── learning_rate.png
│   ├── moons.png
│   ├── noise.png
│   ├── num_nodes.png
│   ├── num_observations.png
│   └── regularization.png
├── tests.py
└── three_layer_network.py

/LICENSE:
--------------------------------------------------------------------------------
MIT License

Copyright (c) 2016 James LeDoux

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# NumPy Neural Network
#### This is a simple multilayer perceptron implemented from scratch in pure Python and NumPy.

This repo includes a three- and a four-layer neural network (with one and two hidden layers respectively), trained via batch gradient descent with backpropagation. The tunable parameters include:
* Learning rate
* Regularization lambda
* Nodes per hidden layer
* Number of output classes
* Stopping criterion
* Activation function

A good starting point for this model is a learning rate of 0.01, regularization of 0.01, 32 nodes per hidden layer, and ReLU activations. These will differ according to the context in which the model is used.


Here are some results from the three-layer model on some particularly tricky separation boundaries. The model generalizes well to non-linear patterns.

![inline 50%](images/moons.png)![inline 50%](images/circles.png)


Parameter tuning looked as follows:


![inline 50%](images/num_nodes.png)![inline 50%](images/num_observations.png)
![inline 50%](images/regularization.png)![inline 50%](images/learning_rate.png)

As you can see, most of the patterns worked as expected. More data led to more stable training, more nodes led to a better model fit, increased regularization led to increased training loss, and a smaller learning rate produced a smoother but slower-moving training curve. Worth noting, however, is how extreme values of some of these parameters caused the model to become less stable. A very large number of observations or a very high learning rate, for example, caused erratic and sub-optimal behaviour during training. This is an indication that there is still significant work that can be done to optimize this model.
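
Putting the pieces together, a minimal sketch of a training run with the suggested starting values looks roughly like this, using the functions defined in `three_layer_network.py` (the `main()` function in that file runs a similar example on a smaller toy dataset):

```python
import numpy as np
from sklearn import datasets
from three_layer_network import build_model, train, feed_forward

# toy dataset and the suggested starting hyperparameters
X, y = datasets.make_moons(200, noise=0.2)
model = build_model(X, hidden_nodes=32, output_dim=2)
model, losses = train(model, X, y, reg_lambda=0.01, learning_rate=0.01)

# feed_forward returns (z1, a1, z2, out); the softmax output is the last element
_, _, _, out = feed_forward(model, X)
preds = np.argmax(out, axis=1)
print("training accuracy: %.3f" % np.mean(preds == y))
```

Training stops once the relative improvement in loss between consecutive checks falls below 1%, so the run length will vary with the dataset and hyperparameters.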

### Lessons learned:
* Logistic activation functions really do complicate MLP training. Too low a learning rate, too many observations, and sigmoidal activation functions all made this model unstable, and even broke it in some cases.

* These models are incredibly flexible. This simple network was able to approximate every function I threw its way.

* Neural networks are hard. I have a newfound appreciation for the layers of abstraction that TensorFlow, Keras, etc. provide between programmer and network.


Thank you to [WildML](http://www.wildml.com/2015/09/implementing-a-neural-network-from-scratch/) for providing a starting point for this project + code, and to Ian Goodfellow's book *Deep Learning* for background on the algorithms and parameter tuning.
--------------------------------------------------------------------------------
/four_layer_network.py:
--------------------------------------------------------------------------------
import numpy as np
import math
from sklearn import datasets

def relu(X):
    return np.maximum(X, 0)

def relu_derivative(X):
    return 1. * (X > 0)

def build_model(X, hidden_nodes, output_dim=2):
    model = {}
    input_dim = X.shape[1]
    model['W1'] = np.random.randn(input_dim, hidden_nodes) / np.sqrt(input_dim)
    model['b1'] = np.zeros((1, hidden_nodes))
    model['W2'] = np.random.randn(hidden_nodes, hidden_nodes) / np.sqrt(hidden_nodes)
    model['b2'] = np.zeros((1, hidden_nodes))
    model['W3'] = np.random.randn(hidden_nodes, output_dim) / np.sqrt(hidden_nodes)
    model['b3'] = np.zeros((1, output_dim))
    return model

def feed_forward(model, x):
    W1, b1, W2, b2, W3, b3 = model['W1'], model['b1'], model['W2'], model['b2'], model['W3'], model['b3']
    # Forward propagation
    z1 = x.dot(W1) + b1
    #a1 = np.tanh(z1)
    a1 = relu(z1)
    z2 = a1.dot(W2) + b2
    a2 = relu(z2)
    z3 = a2.dot(W3) + b3
    exp_scores = np.exp(z3)
    out = exp_scores / np.sum(exp_scores, axis=1, keepdims=True)  # softmax
    return z1, a1, z2, a2, z3, out

def calculate_loss(model, X, y, reg_lambda):
    num_examples = X.shape[0]
    W1, b1, W2, b2, W3, b3 = model['W1'], model['b1'], model['W2'], model['b2'], model['W3'], model['b3']
    # Forward propagation to calculate our predictions
    z1, a1, z2, a2, z3, out = feed_forward(model, X)
    probs = out / np.sum(out, axis=1, keepdims=True)
    # Calculating the cross-entropy loss
    correct_logprobs = -np.log(probs[range(num_examples), y])
    loss = np.sum(correct_logprobs)
    # Add regularization term to loss (optional)
    loss += reg_lambda/2 * (np.sum(np.square(W1)) + np.sum(np.square(W2)) + np.sum(np.square(W3)))
    return 1./num_examples * loss

def backprop(X, y, model, z1, a1, z2, a2, z3, output, reg_lambda):
    delta3 = output
    delta3[range(X.shape[0]), y] -= 1  # yhat - y
    dW3 = (a2.T).dot(delta3)
    db3 = np.sum(delta3, axis=0, keepdims=True)
    delta2 = delta3.dot(model['W3'].T) * relu_derivative(a2)  # if ReLU
    dW2 = np.dot(a1.T, delta2)
    db2 = np.sum(delta2, axis=0)
    #delta1 = delta2.dot(model['W2'].T) * (1 - np.power(a1, 2))  # if tanh
    delta1 = delta2.dot(model['W2'].T) * relu_derivative(a1)  # if ReLU
    dW1 = np.dot(X.T, delta1)
    db1 = np.sum(delta1, axis=0)
    # Add regularization terms
    dW3 += reg_lambda * model['W3']
    dW2 += reg_lambda * model['W2']
    dW1 += reg_lambda * model['W1']
    return dW1, dW2, dW3, db1, db2, db3

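# Note on the gradients above (with N = number of examples, H = hidden_nodes,
# C = output_dim): delta3 = softmax(z3) - one_hot(y) has shape (N, C),
# delta2 = (delta3 . W3^T) * relu'(z2) has shape (N, H), and
# delta1 = (delta2 . W2^T) * relu'(z1) has shape (N, H). Passing the activations
# a2, a1 to relu_derivative is equivalent to passing z2, z1, since relu(z) > 0
# exactly when z > 0. Each bias gradient is the column sum of the corresponding
# delta, the L2 penalty contributes reg_lambda * W to each weight gradient, and
# the gradients are summed (not averaged) over the batch.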

def train(model, X, y, num_passes=10000, reg_lambda=.1, learning_rate=0.1):
    # Batch gradient descent
    done = False
    previous_loss = float('inf')
    i = 0
    losses = []
    while done == False:  # comment out while performance testing
    #while i < 1500:
        # feed forward
        z1, a1, z2, a2, z3, output = feed_forward(model, X)
        # backpropagation
        dW1, dW2, dW3, db1, db2, db3 = backprop(X, y, model, z1, a1, z2, a2, z3, output, reg_lambda)
        # update weights and biases
        model['W1'] -= learning_rate * dW1
        model['b1'] -= learning_rate * db1
        model['W2'] -= learning_rate * dW2
        model['b2'] -= learning_rate * db2
        model['W3'] -= learning_rate * dW3
        model['b3'] -= learning_rate * db3
        if i % 1000 == 0:
            loss = calculate_loss(model, X, y, reg_lambda)
            losses.append(loss)
            print("Loss after iteration %i: %f" % (i, loss))  # uncomment once testing finished, return mod val to 1000
            if (previous_loss - loss) / previous_loss < 0.01:
                done = True
                #print(i)
            previous_loss = loss
        i += 1
    return model, losses

def main():
    # toy dataset
    X, y = datasets.make_moons(16, noise=0.10)
    num_examples = len(X)  # training set size
    nn_input_dim = 2  # input layer dimensionality
    nn_output_dim = 2  # output layer dimensionality
    learning_rate = 0.01  # learning rate for gradient descent
    reg_lambda = 0.01  # regularization strength
    model = build_model(X, 20, 2)
    model, losses = train(model, X, y, reg_lambda=reg_lambda, learning_rate=learning_rate)
    output = feed_forward(model, X)
    preds = np.argmax(output[5], axis=1)  # output[5] is the softmax output

if __name__ == "__main__":
    main()

--------------------------------------------------------------------------------
/images/circles.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jldbc/numpy_neural_net/24531d9674c853b15de06c1b3bed961cfd4fcf3f/images/circles.png
--------------------------------------------------------------------------------
/images/learning_rate.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jldbc/numpy_neural_net/24531d9674c853b15de06c1b3bed961cfd4fcf3f/images/learning_rate.png
--------------------------------------------------------------------------------
/images/moons.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jldbc/numpy_neural_net/24531d9674c853b15de06c1b3bed961cfd4fcf3f/images/moons.png
--------------------------------------------------------------------------------
/images/noise.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jldbc/numpy_neural_net/24531d9674c853b15de06c1b3bed961cfd4fcf3f/images/noise.png
--------------------------------------------------------------------------------
/images/num_nodes.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jldbc/numpy_neural_net/24531d9674c853b15de06c1b3bed961cfd4fcf3f/images/num_nodes.png
--------------------------------------------------------------------------------
/images/num_observations.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jldbc/numpy_neural_net/24531d9674c853b15de06c1b3bed961cfd4fcf3f/images/num_observations.png
--------------------------------------------------------------------------------
/images/regularization.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jldbc/numpy_neural_net/24531d9674c853b15de06c1b3bed961cfd4fcf3f/images/regularization.png
--------------------------------------------------------------------------------
/tests.py:
--------------------------------------------------------------------------------
from three_layer_network import build_model, relu, relu_derivative, feed_forward, \
    calculate_loss, backprop, train
#from three_layer_network import *
import matplotlib.pyplot as plt
from sklearn import datasets
import numpy as np

"""
to reproduce these tests, modify the three_layer_network.py file by commenting out
'while done == False', uncommenting 'while i < 150', and then changing
'if i % 1000 == 0' to 'if i % 150 == 0'
"""


def num_observations():
    obs_values = [10, 100, 1000]
    nn_input_dim = 2  # input layer dimensionality
    nn_output_dim = 2  # output layer dimensionality
    learning_rate = 0.01  # learning rate for gradient descent
    reg_lambda = 0.01  # regularization strength
    losses_store = []
    for i in obs_values:
        X, y = datasets.make_moons(i, noise=0.1)
        num_examples = len(X)  # training set size
        model = build_model(X, 32, 2)
        model, losses = train(model, X, y, reg_lambda=reg_lambda, learning_rate=learning_rate)
        losses_store.append(losses)
        print(losses)
    x = np.linspace(0, 145, 30)
    for i in range(len(losses_store)):
        lab = 'n_observations = ' + str(obs_values[i])
        plt.plot(x, losses_store[i], label=lab)
    plt.legend()
    plt.show()

def noise():
    noise_values = [0.01, 0.1, 0.2, 0.3, 0.4]
    nn_input_dim = 2  # input layer dimensionality
    nn_output_dim = 2  # output layer dimensionality
    learning_rate = 0.01  # learning rate for gradient descent
    reg_lambda = 0.01  # regularization strength
    losses_store = []
    for i in noise_values:
        X, y = datasets.make_moons(200, noise=i)
        num_examples = len(X)  # training set size
        model = build_model(X, 32, 2)
        model, losses = train(model, X, y, reg_lambda=reg_lambda, learning_rate=learning_rate)
        losses_store.append(losses)
        print(losses)
    x = np.linspace(0, 145, 30)
    for i in range(len(losses_store)):
        lab = 'noise_value = ' + str(noise_values[i])
        plt.plot(x, losses_store[i], label=lab)
    plt.legend()
    plt.show()

def reg():
    reg_values = [0.00, 0.01, 0.1, 0.2, 0.3]
    nn_input_dim = 2  # input layer dimensionality
    nn_output_dim = 2  # output layer dimensionality
    learning_rate = 0.01  # learning rate for gradient descent
    losses_store = []
    for i in reg_values:
        reg_lambda = i  # regularization strength
        X, y = datasets.make_moons(200, noise=0.2)
        num_examples = len(X)  # training set size
        model = build_model(X, 32, 2)
        model, losses = train(model, X, y, reg_lambda=reg_lambda, learning_rate=learning_rate)
        losses_store.append(losses)
        print(losses)
    x = np.linspace(0, 145, 30)
    for i in range(len(losses_store)):
        lab = 'regularization_value = ' + str(reg_values[i])
        plt.plot(x, losses_store[i], label=lab)
    plt.legend()
    plt.show()


def lr():
    lr_values = [0.001, 0.01, 0.05]
    nn_input_dim = 2  # input layer dimensionality
    nn_output_dim = 2  # output layer dimensionality
    reg_lambda = .01  # regularization strength
    losses_store = []
    for i in lr_values:
        learning_rate = i
        X, y = datasets.make_moons(200, noise=0.2)
        num_examples = len(X)  # training set size
        model = build_model(X, 32, 2)
        model, losses = train(model, X, y, reg_lambda=reg_lambda, learning_rate=learning_rate)
        losses_store.append(losses)
        print(losses)
    x = np.linspace(0, 145, 30)
    for i in range(len(losses_store)):
        lab = 'learning rate = ' + str(lr_values[i])
        plt.plot(x, losses_store[i], label=lab)
    plt.legend()
    plt.show()

def test_num_nodes():
    X, y = datasets.make_moons(400, noise=0.2)
    num_examples = len(X)  # training set size
    nn_input_dim = 2  # input layer dimensionality
    nn_output_dim = 2  # output layer dimensionality
    learning_rate = 0.01  # learning rate for gradient descent
    reg_lambda = 0.01  # regularization strength
    node_vals = [4, 8, 16, 32, 64, 128]
    losses_store = []
    for val in node_vals:
        model = build_model(X, val, 2)
        model, losses = train(model, X, y, reg_lambda=reg_lambda, learning_rate=learning_rate)
        losses_store.append(losses)
        print(losses)
    x = np.linspace(0, 145, 30)
    for i in range(len(losses_store)):
        lab = 'n_nodes = ' + str(node_vals[i])
        plt.plot(x, losses_store[i], label=lab)
    plt.legend()
    plt.show()

print("number of observations:")
num_observations()
print('noise:')
noise()
print('regularization:')
reg()
print('learning rate:')
lr()
print('hidden nodes:')
test_num_nodes()
--------------------------------------------------------------------------------
/three_layer_network.py:
--------------------------------------------------------------------------------
import numpy as np
import math
from sklearn import datasets

def relu(X):
    return np.maximum(X, 0)

def relu_derivative(X):
    return 1. * (X > 0)
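
# The model below is stored as a plain dict of NumPy arrays. For an input of
# width input_dim, `hidden_nodes` hidden units, and `output_dim` classes, the
# shapes are: W1 (input_dim, hidden_nodes), b1 (1, hidden_nodes),
# W2 (hidden_nodes, output_dim), b2 (1, output_dim). Weights are drawn from a
# standard normal and scaled by 1/sqrt(fan_in), a common heuristic to keep the
# initial activations at a reasonable scale; biases start at zero.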

def build_model(X, hidden_nodes, output_dim=2):
    model = {}
    input_dim = X.shape[1]
    model['W1'] = np.random.randn(input_dim, hidden_nodes) / np.sqrt(input_dim)
    model['b1'] = np.zeros((1, hidden_nodes))
    model['W2'] = np.random.randn(hidden_nodes, output_dim) / np.sqrt(hidden_nodes)
    model['b2'] = np.zeros((1, output_dim))
    return model

def feed_forward(model, x):
    W1, b1, W2, b2 = model['W1'], model['b1'], model['W2'], model['b2']
    # Forward propagation
    z1 = x.dot(W1) + b1
    #a1 = np.tanh(z1)
    a1 = relu(z1)
    z2 = a1.dot(W2) + b2
    exp_scores = np.exp(z2)
    out = exp_scores / np.sum(exp_scores, axis=1, keepdims=True)  # softmax
    return z1, a1, z2, out

def calculate_loss(model, X, y, reg_lambda):
    num_examples = X.shape[0]
    W1, b1, W2, b2 = model['W1'], model['b1'], model['W2'], model['b2']
    # Forward propagation to calculate our predictions
    z1, a1, z2, out = feed_forward(model, X)
    probs = out / np.sum(out, axis=1, keepdims=True)
    # Calculating the cross-entropy loss
    correct_logprobs = -np.log(probs[range(num_examples), y])
    loss = np.sum(correct_logprobs)
    # Add regularization term to loss (optional)
    loss += reg_lambda/2 * (np.sum(np.square(W1)) + np.sum(np.square(W2)))
    return 1./num_examples * loss

def backprop(X, y, model, z1, a1, z2, output, reg_lambda):
    delta3 = output
    delta3[range(X.shape[0]), y] -= 1  # yhat - y
    dW2 = (a1.T).dot(delta3)
    db2 = np.sum(delta3, axis=0, keepdims=True)
    #delta2 = delta3.dot(model['W2'].T) * (1 - np.power(a1, 2))  # if tanh
    delta2 = delta3.dot(model['W2'].T) * relu_derivative(a1)  # if ReLU
    dW1 = np.dot(X.T, delta2)
    db1 = np.sum(delta2, axis=0)
    # Add regularization terms
    dW2 += reg_lambda * model['W2']
    dW1 += reg_lambda * model['W1']
    return dW1, dW2, db1, db2


def train(model, X, y, num_passes=10000, reg_lambda=.1, learning_rate=0.1):
    # Batch gradient descent
    done = False
    previous_loss = float('inf')
    i = 0
    losses = []
    while done == False:  # comment out while performance testing
    #while i < 1500:
        # feed forward
        z1, a1, z2, output = feed_forward(model, X)
        # backpropagation
        dW1, dW2, db1, db2 = backprop(X, y, model, z1, a1, z2, output, reg_lambda)
        # update weights and biases
        model['W1'] -= learning_rate * dW1
        model['b1'] -= learning_rate * db1
        model['W2'] -= learning_rate * dW2
        model['b2'] -= learning_rate * db2
        if i % 1000 == 0:
            loss = calculate_loss(model, X, y, reg_lambda)
            losses.append(loss)
            print("Loss after iteration %i: %f" % (i, loss))  # uncomment once testing finished, return mod val to 1000
            if (previous_loss - loss) / previous_loss < 0.01:
                done = True
                #print(i)
            previous_loss = loss
        i += 1
    return model, losses

def main():
    # toy dataset
    X, y = datasets.make_moons(16, noise=0.10)
    num_examples = len(X)  # training set size
    nn_input_dim = 2  # input layer dimensionality
    nn_output_dim = 2  # output layer dimensionality
    learning_rate = 0.01  # learning rate for gradient descent
    reg_lambda = 0.01  # regularization strength
    model = build_model(X, 20, 2)
    model, losses = train(model, X, y, reg_lambda=reg_lambda, learning_rate=learning_rate)
    output = feed_forward(model, X)
    preds = np.argmax(output[3], axis=1)  # output[3] is the softmax output

if __name__ == "__main__":
    main()

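# Optional sketch: one way to produce decision-boundary plots like the README's
# moons/circles images for a trained `model`. This helper is an illustration
# added for clarity rather than part of the training pipeline; the grid step
# `h` and the figure styling are arbitrary choices.
def plot_decision_boundary(model, X, y, h=0.01):
    import matplotlib.pyplot as plt
    # evaluate the network on a dense grid covering the data
    x_min, x_max = X[:, 0].min() - .5, X[:, 0].max() + .5
    y_min, y_max = X[:, 1].min() - .5, X[:, 1].max() + .5
    xx, yy = np.meshgrid(np.arange(x_min, x_max, h), np.arange(y_min, y_max, h))
    grid = np.c_[xx.ravel(), yy.ravel()]
    preds = np.argmax(feed_forward(model, grid)[3], axis=1)
    # shade the predicted class regions and overlay the training points
    plt.contourf(xx, yy, preds.reshape(xx.shape), alpha=0.4)
    plt.scatter(X[:, 0], X[:, 1], c=y)
    plt.show()
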
--------------------------------------------------------------------------------