├── README.md
├── cnn.py
├── fully_connected_network.py
├── layers
│   ├── activation.py
│   ├── convolution.py
│   ├── flatten.py
│   ├── fully_connected.py
│   └── pooling.py
├── loss
│   └── losses.py
└── utilities
    ├── filereader.py
    ├── initializers.py
    ├── model.py
    ├── settings.py
    └── utils.py

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# Numpy CNN
A numpy based CNN implementation for classifying images.

**status: archived**

## Usage

Follow the steps listed below for using this repository after cloning it.
For examples, you can look at the code in [fully_connected_network.py](https://github.com/ElefHead/numpy-cnn/blob/master/fully_connected_network.py) and [cnn.py](https://github.com/ElefHead/numpy-cnn/blob/master/cnn.py).
Place the data inside a folder called `data` within the project root folder (this code works by default with CIFAR-10; for other datasets, the filereader in utilities can't be used).

After placing the data, the directory structure looks as follows
- root
    * data\
        * data_batch_1
        * data_batch_2
        * ..
    * layers\
    * loss\
    * utilities\
    * cnn.py
    * fully_connected_network.py

---

1) Import the required layer classes from the layers folder, for example
```python
from layers.fully_connected import FullyConnected
from layers.convolution import Convolution
from layers.pooling import Pooling
from layers.flatten import Flatten
```
2) Import the activations and losses in a similar way, for example
```python
from layers.activation import Elu, Softmax
from loss.losses import CategoricalCrossEntropy
```
3) Import the model class from the utilities folder
```python
from utilities.model import Model
```
4) Create a model using Model and the layer classes
```python
model = Model(
    Convolution(filters=5, padding='same'),
    Elu(),
    Pooling(mode='max', kernel_shape=(2, 2), stride=2),
    Flatten(),
    FullyConnected(units=10),
    Softmax(),
    name='cnn-model'
)
```
5) Set the model loss
```python
model.set_loss(CategoricalCrossEntropy)
```
6) Train the model using
```python
model.train(data, labels)
```
* Set `load_and_continue=True` to load previously trained weights and continue training
* By default the model uses Adam optimization with AMSGrad
* It also saves the weights after each epoch to a `models` folder within the project
7) For prediction, use
```python
prediction = model.predict(data)
```
8) For calculating accuracy, the model class provides its own function
```python
accuracy = model.evaluate(data, labels)
```
9) To load the model with its trained weights in a different place, follow the steps up to step 5 and then
```python
model.load_weights()
```
Note: You will need the same directory structure.


---
This was a fun project that started out as me trying to implement a CNN by myself for classifying CIFAR-10 images. In the process, I was able to implement reusable (numpy based)
library-ish code for creating CNNs with Adam optimization.

Anyone wanting to understand how backpropagation works in CNNs is welcome to try out this code, but for all practical usage there are better frameworks
with performance that this code cannot come close to replicating.
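For a complete end-to-end run, the steps above combine into a script like the following (a condensed version of [cnn.py](https://github.com/ElefHead/numpy-cnn/blob/master/cnn.py), assuming the CIFAR-10 batches sit in `./data`):
```python
from layers.fully_connected import FullyConnected
from layers.convolution import Convolution
from layers.pooling import Pooling
from layers.flatten import Flatten
from layers.activation import Elu, Softmax

from utilities.filereader import get_data
from utilities.model import Model

from loss.losses import CategoricalCrossEntropy

train_data, train_labels = get_data(num_samples=50000)
test_data, test_labels = get_data(num_samples=10000, dataset="testing")

train_data = train_data / 255
test_data = test_data / 255

model = Model(
    Convolution(filters=5, padding='same'),
    Elu(),
    Pooling(mode='max', kernel_shape=(2, 2), stride=2),
    Flatten(),
    FullyConnected(units=10),
    Softmax(),
    name='cnn-model'
)

model.set_loss(CategoricalCrossEntropy)
model.train(train_data, train_labels.T, epochs=2)

print('Testing accuracy = {}'.format(model.evaluate(test_data, test_labels)))
```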
The CNN implemented here is based on [Andrej Karpathy's notes](http://cs231n.github.io/convolutional-networks/)
--------------------------------------------------------------------------------
/cnn.py:
--------------------------------------------------------------------------------
from layers.fully_connected import FullyConnected
from layers.convolution import Convolution
from layers.pooling import Pooling
from layers.flatten import Flatten
from layers.activation import Elu, Softmax

from utilities.filereader import get_data
from utilities.model import Model

from loss.losses import CategoricalCrossEntropy

import numpy as np
np.random.seed(0)


if __name__ == '__main__':
    train_data, train_labels = get_data(num_samples=50000)
    test_data, test_labels = get_data(num_samples=10000, dataset="testing")

    train_data = train_data / 255
    test_data = test_data / 255

    print("Train data shape: {}, {}".format(train_data.shape, train_labels.shape))
    print("Test data shape: {}, {}".format(test_data.shape, test_labels.shape))

    model = Model(
        Convolution(filters=5, padding='same'),
        Elu(),
        Pooling(mode='max', kernel_shape=(2, 2), stride=2),
        Flatten(),
        FullyConnected(units=10),
        Softmax(),
        name='cnn5'
    )

    model.set_loss(CategoricalCrossEntropy)

    model.train(train_data, train_labels.T, epochs=2)  # set load_and_continue to True if you want to start from already trained weights
    # model.load_weights()  # uncomment if loading previously trained weights and comment above line to skip training and only load trained weights.

    print('Testing accuracy = {}'.format(model.evaluate(test_data, test_labels)))
--------------------------------------------------------------------------------
/fully_connected_network.py:
--------------------------------------------------------------------------------
from layers.fully_connected import FullyConnected
from layers.flatten import Flatten
from layers.activation import Elu, Softmax

from utilities.filereader import get_data
from utilities.model import Model

from loss.losses import CategoricalCrossEntropy


import numpy as np
np.random.seed(0)


if __name__ == '__main__':
    train_data, train_labels = get_data()
    test_data, test_labels = get_data(num_samples=10000, dataset="testing")

    train_data = train_data / 255
    test_data = test_data / 255
    train_labels = train_labels.T
    test_labels = test_labels.T

    print("Train data shape: {}, {}".format(train_data.shape, train_labels.shape))
    print("Test data shape: {}, {}".format(test_data.shape, test_labels.shape))

    model = Model(
        Flatten(),
        FullyConnected(units=200),
        Elu(),
        FullyConnected(units=200),
        Elu(),
        FullyConnected(units=10),
        Softmax(),
        name='fcn200'
    )

    model.set_loss(CategoricalCrossEntropy)
    model.train(train_data, train_labels, batch_size=128, epochs=50)

    print('Testing accuracy = {}'.format(model.evaluate(test_data, test_labels)))
--------------------------------------------------------------------------------
/layers/activation.py:
--------------------------------------------------------------------------------
import numpy as np


class Relu:
    def __init__(self):
        self.cache = {}
        self.has_units = False

    def has_weights(self):
        return self.has_units

    def forward_propagate(self, Z, save_cache=False):
        if save_cache:
            self.cache['Z'] = Z
        return np.where(Z >= 0, Z, 0)

    def back_propagate(self, dA):
        Z = self.cache['Z']
        return dA * np.where(Z >= 0, 1, 0)


class Softmax:
    def __init__(self):
        self.cache = {}
        self.has_units = False

    def has_weights(self):
        return self.has_units

    def forward_propagate(self, Z, save_cache=False):
        if save_cache:
            self.cache['Z'] = Z
        Z_ = Z - np.max(Z, axis=0, keepdims=True)  # stabilize per sample rather than with the global max
        e = np.exp(Z_)
        return e / np.sum(e, axis=0, keepdims=True)

    def back_propagate(self, dA):
        # This layer is paired with CategoricalCrossEntropy, whose derivative is
        # already taken with respect to the softmax logits (predictions - labels),
        # so the gradient passes through unchanged.
        return dA


class Elu:
    def __init__(self, alpha=1.2):
        self.cache = {}
        self.params = {
            'alpha': alpha
        }
        self.has_units = False

    def has_weights(self):
        return self.has_units

    def forward_propagate(self, Z, save_cache=False):
        if save_cache:
            self.cache['Z'] = Z
        return np.where(Z >= 0, Z, self.params['alpha'] * (np.exp(Z) - 1))

    def back_propagate(self, dA):
        alpha = self.params['alpha']
        Z = self.cache['Z']
        return dA * np.where(Z >= 0, 1, alpha * np.exp(Z))  # d/dZ of alpha*(exp(Z) - 1) is alpha*exp(Z)


class Selu:
    def __init__(self, alpha=1.6733, selu_lambda=1.0507):
        self.params = {
            'alpha': alpha,
            'lambda': selu_lambda
        }
        self.cache = {}
        self.has_units = False

    def has_weights(self):
        return self.has_units

    def forward_propagate(self, Z, save_cache=False):
        if save_cache:
            self.cache['Z'] = Z
        return self.params['lambda'] * np.where(Z >= 0, Z, self.params['alpha'] * (np.exp(Z) - 1))

    def back_propagate(self, dA):
        Z = self.cache['Z']
        selu_lambda, alpha = self.params['lambda'], self.params['alpha']
        return dA * selu_lambda * np.where(Z >= 0, 1, alpha * np.exp(Z))
--------------------------------------------------------------------------------
/layers/convolution.py:
--------------------------------------------------------------------------------
import numpy as np
import pickle
from os import path, makedirs, remove

from utilities.utils import pad_inputs
from utilities.initializers import glorot_uniform
from utilities.settings import get_layer_num, inc_layer_num


class Convolution:
    def __init__(self, filters, kernel_shape=(3, 3), padding='valid', stride=1, name=None):
        self.params = {
            'filters': filters,
            'padding': padding,
            'kernel_shape': kernel_shape,
            'stride': stride
        }
        self.cache = {}
        self.rmsprop_cache = {}
        self.momentum_cache = {}
        self.grads = {}
        self.has_units = True
        self.name = name
        self.type = 'conv'

    def has_weights(self):
        return self.has_units

    def save_weights(self, dump_path):
        dump_cache = {
            'cache': self.cache,
            'grads': self.grads,
            'momentum': self.momentum_cache,
            'rmsprop': self.rmsprop_cache
        }
        save_path = path.join(dump_path, self.name+'.pickle')
        makedirs(path.dirname(save_path), exist_ok=True)
        if path.isfile(save_path):  # an unconditional remove() raises FileNotFoundError on the first save
            remove(save_path)
        with open(save_path, 'wb') as d:
            pickle.dump(dump_cache, d)

    def load_weights(self, dump_path):
        if self.name is None:
            self.name = '{}_{}'.format(self.type, get_layer_num(self.type))
            inc_layer_num(self.type)
        read_path = path.join(dump_path, self.name+'.pickle')
        with open(read_path, 'rb') as r:
            dump_cache = pickle.load(r)
        self.cache = dump_cache['cache']
        self.grads = dump_cache['grads']
        self.momentum_cache = dump_cache['momentum']
        self.rmsprop_cache = dump_cache['rmsprop']

    def conv_single_step(self, input, W, b):
        '''
        Function to apply one filter to an input slice.
        :param input:[numpy array]: slice of input data of shape (f, f, n_C_prev)
        :param W:[numpy array]: One filter of shape (f, f, n_C_prev)
        :param b:[numpy array]: Bias value for the filter. Shape (1, 1, 1)
        :return:[float]: the convolved scalar value
        '''
        return np.sum(np.multiply(input, W)) + float(b)

    def forward_propagate(self, X, save_cache=False):
        '''
        Forward pass of the convolution layer.
        :param X:[numpy array]: batch of inputs of shape (m, height, width, channels)
        :param save_cache:[boolean]: if true, caches the input for the backward pass
        :return:[numpy array]: feature maps of shape (m, n_H, n_W, filters)
        '''
        if self.name is None:
            self.name = '{}_{}'.format(self.type, get_layer_num(self.type))
            inc_layer_num(self.type)

        (num_data_points, prev_height, prev_width, prev_channels) = X.shape
        filter_shape_h, filter_shape_w = self.params['kernel_shape']

        if 'W' not in self.params:
            shape = (filter_shape_h, filter_shape_w, prev_channels, self.params['filters'])
            self.params['W'], self.params['b'] = glorot_uniform(shape=shape)

        if self.params['padding'] == 'same':
            pad_h = int(((prev_height - 1)*self.params['stride'] + filter_shape_h - prev_height) / 2)
            pad_w = int(((prev_width - 1)*self.params['stride'] + filter_shape_w - prev_width) / 2)
            n_H = prev_height
            n_W = prev_width
        else:
            pad_h = 0
            pad_w = 0
            n_H = int((prev_height - filter_shape_h) / self.params['stride']) + 1
            n_W = int((prev_width - filter_shape_w) / self.params['stride']) + 1

        self.params['pad_h'], self.params['pad_w'] = pad_h, pad_w

        Z = np.zeros(shape=(num_data_points, n_H, n_W, self.params['filters']))

        X_pad = pad_inputs(X, (pad_h, pad_w))

        for i in range(num_data_points):
            x = X_pad[i]
            for h in range(n_H):
                for w in range(n_W):
                    vert_start = self.params['stride'] * h
                    vert_end = vert_start + filter_shape_h
                    horiz_start = self.params['stride'] * w
                    horiz_end = horiz_start + filter_shape_w

                    for c in range(self.params['filters']):

                        x_slice = x[vert_start: vert_end, horiz_start: horiz_end, :]

                        Z[i, h, w, c] = self.conv_single_step(x_slice, self.params['W'][:, :, :, c],
                                                              self.params['b'][:, :, :, c])

        if save_cache:
            self.cache['A'] = X

        return Z
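
    # For instance, with X of shape (m, 32, 32, 3), filters=5, kernel_shape=(3, 3)
    # and stride 1: padding 'same' gives pad_h = pad_w = 1 and Z of shape
    # (m, 32, 32, 5), while padding 'valid' gives Z of shape (m, 30, 30, 5).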

    def back_propagate(self, dZ):
        '''
        Backward pass of the convolution layer.
        :param dZ:[numpy array]: gradient of the loss w.r.t. this layer's output, shape (m, n_H, n_W, filters)
        :return:[numpy array]: gradient of the loss w.r.t. this layer's input
        '''
        A = self.cache['A']
        filter_shape_h, filter_shape_w = self.params['kernel_shape']
        pad_h, pad_w = self.params['pad_h'], self.params['pad_w']

        (num_data_points, prev_height, prev_width, prev_channels) = A.shape
        (_, n_H, n_W, _) = dZ.shape  # iterate over the output grid, which is smaller than the input for 'valid' padding

        dA = np.zeros((num_data_points, prev_height, prev_width, prev_channels))
        self.grads = self.init_cache()

        A_pad = pad_inputs(A, (pad_h, pad_w))
        dA_pad = pad_inputs(dA, (pad_h, pad_w))

        # slicing with -0 would yield an empty array, so only strip padding when there is any
        h_slice = slice(pad_h, -pad_h) if pad_h > 0 else slice(None)
        w_slice = slice(pad_w, -pad_w) if pad_w > 0 else slice(None)

        for i in range(num_data_points):
            a_pad = A_pad[i]
            da_pad = dA_pad[i]

            for h in range(n_H):
                for w in range(n_W):

                    vert_start = self.params['stride'] * h
                    vert_end = vert_start + filter_shape_h
                    horiz_start = self.params['stride'] * w
                    horiz_end = horiz_start + filter_shape_w

                    for c in range(self.params['filters']):
                        a_slice = a_pad[vert_start: vert_end, horiz_start: horiz_end, :]

                        da_pad[vert_start: vert_end, horiz_start: horiz_end, :] += self.params['W'][:, :, :, c] * dZ[i, h, w, c]
                        self.grads['dW'][:, :, :, c] += a_slice * dZ[i, h, w, c]
                        self.grads['db'][:, :, :, c] += dZ[i, h, w, c]
            dA[i, :, :, :] = da_pad[h_slice, w_slice, :]

        return dA

    def init_cache(self):
        cache = dict()
        cache['dW'] = np.zeros_like(self.params['W'])
        cache['db'] = np.zeros_like(self.params['b'])
        return cache

    def momentum(self, beta=0.9):
        if not self.momentum_cache:
            self.momentum_cache = self.init_cache()
        self.momentum_cache['dW'] = beta * self.momentum_cache['dW'] + (1 - beta) * self.grads['dW']
        self.momentum_cache['db'] = beta * self.momentum_cache['db'] + (1 - beta) * self.grads['db']

    def rmsprop(self, beta=0.999, amsgrad=True):
        if not self.rmsprop_cache:
            self.rmsprop_cache = self.init_cache()

        new_dW = beta * self.rmsprop_cache['dW'] + (1 - beta) * (self.grads['dW']**2)
        new_db = beta * self.rmsprop_cache['db'] + (1 - beta) * (self.grads['db']**2)

        if amsgrad:
            self.rmsprop_cache['dW'] = np.maximum(self.rmsprop_cache['dW'], new_dW)
            self.rmsprop_cache['db'] = np.maximum(self.rmsprop_cache['db'], new_db)
        else:
            self.rmsprop_cache['dW'] = new_dW
            self.rmsprop_cache['db'] = new_db

    def apply_grads(self, learning_rate=0.001, l2_penalty=1e-4, optimization='adam', epsilon=1e-8,
                    correct_bias=False, beta1=0.9, beta2=0.999, iter=999):
        if optimization != 'adam':
            self.params['W'] -= learning_rate * (self.grads['dW'] + l2_penalty * self.params['W'])
            self.params['b'] -= learning_rate * (self.grads['db'] + l2_penalty * self.params['b'])

        else:
            if correct_bias:
                W_first_moment = self.momentum_cache['dW'] / (1 - beta1 ** iter)
                b_first_moment = self.momentum_cache['db'] / (1 - beta1 ** iter)
                W_second_moment = self.rmsprop_cache['dW'] / (1 - beta2 ** iter)
                b_second_moment = self.rmsprop_cache['db'] / (1 - beta2 ** iter)
            else:
                W_first_moment = self.momentum_cache['dW']
                b_first_moment = self.momentum_cache['db']
                W_second_moment = self.rmsprop_cache['dW']
                b_second_moment = self.rmsprop_cache['db']

            W_learning_rate = learning_rate / (np.sqrt(W_second_moment) + epsilon)
            b_learning_rate = learning_rate / (np.sqrt(b_second_moment) + epsilon)

            self.params['W'] -= W_learning_rate * (W_first_moment + l2_penalty * self.params['W'])
            self.params['b'] -= b_learning_rate * (b_first_moment + l2_penalty * self.params['b'])
--------------------------------------------------------------------------------
/layers/flatten.py:
--------------------------------------------------------------------------------
import numpy as np


class Flatten:
    def __init__(self, transpose=True):
        self.shape = ()
        self.transpose = transpose
        self.has_units = False

    def has_weights(self):
        return self.has_units

    def forward_propagate(self, Z, save_cache=False):
        shape = Z.shape
        if save_cache:
            self.shape = shape
        data = np.ravel(Z).reshape(shape[0], -1)
        if self.transpose:
            data = data.T
        return data

    def back_propagate(self, Z):
        if self.transpose:
            Z = Z.T
        return Z.reshape(self.shape)
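
# Example: round-tripping a batch of two 32x32 RGB images. With transpose=True
# (the default), forward_propagate returns a (features, batch) array, and
# back_propagate restores the cached input shape.
# >>> f = Flatten()
# >>> f.forward_propagate(np.zeros((2, 32, 32, 3)), save_cache=True).shape
# (3072, 2)
# >>> f.back_propagate(np.zeros((3072, 2))).shape
# (2, 32, 32, 3)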
--------------------------------------------------------------------------------
/layers/fully_connected.py:
--------------------------------------------------------------------------------
import numpy as np
import pickle
from os import path, makedirs, remove

from utilities.initializers import he_normal
from utilities.settings import get_layer_num, inc_layer_num

np.random.seed(0)


class FullyConnected:
    def __init__(self, units=200, name=None):
        self.units = units
        self.params = {}
        self.cache = {}
        self.grads = {}
        self.momentum_cache = {}
        self.rmsprop_cache = {}
        self.has_units = True
        self.type = 'fc'
        self.name = name

    def has_weights(self):
        return self.has_units

    def save_weights(self, dump_path):
        dump_cache = {
            'cache': self.cache,
            'grads': self.grads,
            'momentum': self.momentum_cache,
            'rmsprop_cache': self.rmsprop_cache
        }
        save_path = path.join(dump_path, self.name+'.pickle')
        makedirs(path.dirname(save_path), exist_ok=True)
        if path.isfile(save_path):  # an unconditional remove() raises FileNotFoundError on the first save
            remove(save_path)
        with open(save_path, 'wb') as d:
            pickle.dump(dump_cache, d)

    def load_weights(self, dump_path):
        if self.name is None:
            self.name = '{}_{}'.format(self.type, get_layer_num(self.type))
            inc_layer_num(self.type)
        read_path = path.join(dump_path, self.name+'.pickle')
        with open(read_path, 'rb') as r:
            dump_cache = pickle.load(r)
        self.cache = dump_cache['cache']
        self.grads = dump_cache['grads']
        self.momentum_cache = dump_cache['momentum']
        self.rmsprop_cache = dump_cache['rmsprop_cache']

    def forward_propagate(self, X, save_cache=False):
        if self.name is None:
            self.name = '{}_{}'.format(self.type, get_layer_num(self.type))
            inc_layer_num(self.type)

        if 'W' not in self.params:
            self.params['W'], self.params['b'] = he_normal((X.shape[0], self.units))
        Z = np.dot(self.params['W'], X) + self.params['b']
        if save_cache:
            self.cache['A'] = X
        return Z

    def back_propagate(self, dZ):
        batch_size = dZ.shape[1]
        self.grads['dW'] = np.dot(dZ, self.cache['A'].T) / batch_size
        self.grads['db'] = np.sum(dZ, axis=1, keepdims=True) / batch_size  # average over the batch, matching dW
        return np.dot(self.params['W'].T, dZ)

    def init_cache(self):
        cache = dict()
        cache['dW'] = np.zeros_like(self.params['W'])
        cache['db'] = np.zeros_like(self.params['b'])
        return cache

    def momentum(self, beta=0.9):
        if not self.momentum_cache:
            self.momentum_cache = self.init_cache()
        self.momentum_cache['dW'] = beta * self.momentum_cache['dW'] + (1 - beta) * self.grads['dW']
        self.momentum_cache['db'] = beta * self.momentum_cache['db'] + (1 - beta) * self.grads['db']

    def rmsprop(self, beta=0.999, amsgrad=True):
        if not self.rmsprop_cache:
            self.rmsprop_cache = self.init_cache()

        new_dW = beta * self.rmsprop_cache['dW'] + (1 - beta) * (self.grads['dW']**2)
        new_db = beta * self.rmsprop_cache['db'] + (1 - beta) * (self.grads['db']**2)

        if amsgrad:
            self.rmsprop_cache['dW'] = np.maximum(self.rmsprop_cache['dW'], new_dW)
            self.rmsprop_cache['db'] = np.maximum(self.rmsprop_cache['db'], new_db)
        else:
            self.rmsprop_cache['dW'] = new_dW
            self.rmsprop_cache['db'] = new_db

    def apply_grads(self, learning_rate=0.001, l2_penalty=1e-4, optimization='adam', epsilon=1e-8,
                    correct_bias=False, beta1=0.9, beta2=0.999, iter=999):
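# Example: for flattened CIFAR-10 input X of shape (3072, batch_size),
# FullyConnected(units=200).forward_propagate(X) lazily initializes
# W of shape (200, 3072) and b of shape (200, 1) via he_normal, and
# returns Z = W.X + b of shape (200, batch_size).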
        if optimization != 'adam':
            self.params['W'] -= learning_rate * (self.grads['dW'] + l2_penalty * self.params['W'])
            self.params['b'] -= learning_rate * (self.grads['db'] + l2_penalty * self.params['b'])
        else:
            if correct_bias:
                W_first_moment = self.momentum_cache['dW'] / (1 - beta1 ** iter)
                b_first_moment = self.momentum_cache['db'] / (1 - beta1 ** iter)
                W_second_moment = self.rmsprop_cache['dW'] / (1 - beta2 ** iter)
                b_second_moment = self.rmsprop_cache['db'] / (1 - beta2 ** iter)
            else:
                W_first_moment = self.momentum_cache['dW']
                b_first_moment = self.momentum_cache['db']
                W_second_moment = self.rmsprop_cache['dW']
                b_second_moment = self.rmsprop_cache['db']

            W_learning_rate = learning_rate / (np.sqrt(W_second_moment) + epsilon)
            b_learning_rate = learning_rate / (np.sqrt(b_second_moment) + epsilon)

            self.params['W'] -= W_learning_rate * (W_first_moment + l2_penalty * self.params['W'])
            self.params['b'] -= b_learning_rate * (b_first_moment + l2_penalty * self.params['b'])
--------------------------------------------------------------------------------
/layers/pooling.py:
--------------------------------------------------------------------------------
import numpy as np
from utilities.settings import get_layer_num, inc_layer_num


class Pooling:
    def __init__(self, kernel_shape=(3, 3), stride=1, mode="max", name=None):
        '''
        :param kernel_shape:[tuple]: height and width of the pooling window
        :param stride:[int]: step size of the window
        :param mode:[string]: "max" or "average"
        '''
        self.params = {
            'kernel_shape': kernel_shape,
            'stride': stride,
            'mode': mode
        }
        self.type = 'pooling'
        self.cache = {}
        self.has_units = False
        self.name = name

    def has_weights(self):
        return self.has_units

    def forward_propagate(self, X, save_cache=False):
        '''
        :param X:[numpy array]: batch of inputs of shape (m, height, width, channels)
        :param save_cache:[boolean]: if true, caches the input for the backward pass
        :return:[numpy array]: pooled output of shape (m, n_H, n_W, channels)
        '''

        (num_data_points, prev_height, prev_width, prev_channels) = X.shape
        filter_shape_h, filter_shape_w = self.params['kernel_shape']

        n_H = int(1 + (prev_height - filter_shape_h) / self.params['stride'])
        n_W = int(1 + (prev_width - filter_shape_w) / self.params['stride'])
        n_C = prev_channels

        A = np.zeros((num_data_points, n_H, n_W, n_C))

        for i in range(num_data_points):
            for h in range(n_H):
                for w in range(n_W):

                    vert_start = h * self.params['stride']
                    vert_end = vert_start + filter_shape_h
                    horiz_start = w * self.params['stride']
                    horiz_end = horiz_start + filter_shape_w

                    for c in range(n_C):

                        if self.params['mode'] == 'average':
                            A[i, h, w, c] = np.mean(X[i, vert_start: vert_end, horiz_start: horiz_end, c])
                        else:
                            A[i, h, w, c] = np.max(X[i, vert_start: vert_end, horiz_start: horiz_end, c])
        if save_cache:
            self.cache['A'] = X

        return A

    def distribute_value(self, dz, shape):
        (n_H, n_W) = shape
        average = 1 / (n_H * n_W)
        return np.ones(shape) * dz * average

    def create_mask(self, x):
        return x == np.max(x)

    def back_propagate(self, dA):
        A = self.cache['A']
        filter_shape_h, filter_shape_w = self.params['kernel_shape']

        (num_data_points, prev_height, prev_width, prev_channels) = A.shape
        m, n_H, n_W, n_C = dA.shape

        dA_prev = np.zeros(shape=(num_data_points, prev_height, prev_width, prev_channels))

        for i in range(num_data_points):
            a = A[i]

            for h in range(n_H):
                for w in range(n_W):

                    vert_start = h * self.params['stride']
                    vert_end = vert_start + filter_shape_h
                    horiz_start = w * self.params['stride']
                    horiz_end = horiz_start + filter_shape_w

                    for c in range(n_C):

                        if self.params['mode'] == 'average':
                            da = dA[i, h, w, c]
                            dA_prev[i, vert_start: vert_end, horiz_start: horiz_end, c] += \
                                self.distribute_value(da, self.params['kernel_shape'])

                        else:
                            a_slice = a[vert_start: vert_end, horiz_start: horiz_end, c]
                            mask = self.create_mask(a_slice)
                            dA_prev[i, vert_start: vert_end, horiz_start: horiz_end, c] += \
                                dA[i, h, w, c] * mask

        return dA_prev
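
# Example: Pooling(mode='max', kernel_shape=(2, 2), stride=2) maps feature maps
# of shape (m, 32, 32, 5) to (m, 16, 16, 5); back_propagate routes each gradient
# entry to the argmax position within its window (or spreads it evenly for 'average').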
--------------------------------------------------------------------------------
/loss/losses.py:
--------------------------------------------------------------------------------
import numpy as np


class CategoricalCrossEntropy:
    @staticmethod
    def compute_loss(labels, predictions, epsilon=1e-8):
        '''
        The function to compute the categorical cross entropy loss, given training labels and predictions
        :param labels:[numpy array]: Training labels
        :param predictions:[numpy array]: Predicted labels
        :param epsilon:[float default=1e-8]: A small value for applying clipping for stability
        :return:[float]: The computed value of loss.
        '''
        predictions = predictions / np.sum(predictions, axis=0, keepdims=True)  # renormalize without mutating the caller's array
        predictions = np.clip(predictions, epsilon, 1. - epsilon)
        return -np.sum(labels * np.log(predictions))

    @staticmethod
    def compute_derivative(labels, predictions):
        '''
        The function to compute the derivative of categorical cross entropy, given labels and predictions
        :param labels:[numpy array]: Training labels
        :param predictions:[numpy array]: Predicted labels
        :return:[numpy array]: The gradient of the loss with respect to the softmax logits.
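        For softmax output p and one-hot labels y, the combined
        softmax + cross-entropy gradient with respect to the logits is p - y;
        e.g. p = [0.7, 0.2, 0.1] and y = [1, 0, 0] give [-0.3, 0.2, 0.1].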
        '''
        return predictions - labels  # gradient w.r.t. the logits; the Softmax layer passes this through unchanged
--------------------------------------------------------------------------------
/utilities/filereader.py:
--------------------------------------------------------------------------------
import pickle
import numpy as np
from os import path
from utilities.utils import to_categorical


TOTAL_BATCHES = 5
NUM_DIMENSIONS = 3072
NUM_CLASSES = 10
SAMPLES_PER_BATCH = 10000
MAX_TRAINING_SAMPLES = 50000
MAX_TESTING_SAMPLES = 10000
FILE_NAME = {
    'training': 'data_batch_',
    'testing': 'test_batch'
}


def unpickle(file, num_samples=10000):
    '''
    Function to read the data from the binary files
    Description of data taken from the CIFAR-10 website
    :param file: the path to the datafile
    :param num_samples: (remaining) samples required from a particular set (not same as num_samples in get_data)
    :return: data and one-hot-encoded labels
    '''
    with open(file, 'rb') as fo:
        data = pickle.load(fo, encoding='bytes')
    return data[b'data'][:num_samples, :], to_categorical(data[b'labels'][:num_samples], NUM_CLASSES)


def get_data(data_path="data", num_samples=50000, dataset="training"):
    '''
    Function that reads and returns the required training or testing data
    :param data_path: string: the relative folder path to where the data lies (default: ./data)
    :param num_samples: int: number of samples required (MAX 50000)
    :param dataset: string: training or testing, default is training
    :return: two numpy arrays, one containing data and the other containing the corresponding labels.
        data shape = [num_samples, 32, 32, 3] and labels shape = [num_samples, 10] for cifar-10 data;
        consistency checked with the keras cifar10 dataset
    '''
    if dataset == "testing" and num_samples > MAX_TESTING_SAMPLES:
        num_samples = MAX_TESTING_SAMPLES
    if dataset == "training" and num_samples > MAX_TRAINING_SAMPLES:
        num_samples = MAX_TRAINING_SAMPLES
    data = np.zeros(shape=(num_samples, NUM_DIMENSIONS))
    labels = np.zeros(shape=(NUM_CLASSES, num_samples))
    num_batches = -(-num_samples // SAMPLES_PER_BATCH)  # ceiling division, so exact multiples don't add an empty batch
    if num_batches > TOTAL_BATCHES:
        num_batches = TOTAL_BATCHES
    remaining = num_samples
    for _ in range(num_batches):
        file_name = FILE_NAME[dataset]+str(_+1) if dataset == "training" else FILE_NAME[dataset]
        file = path.join('.', data_path, file_name)
        if remaining > SAMPLES_PER_BATCH:
            ret_val = unpickle(file, SAMPLES_PER_BATCH)
            data[_*SAMPLES_PER_BATCH: SAMPLES_PER_BATCH*(_+1)] = ret_val[0]
            labels[:, _*SAMPLES_PER_BATCH: SAMPLES_PER_BATCH*(_+1)] = ret_val[1]
        else:
            ret_val = unpickle(file, remaining)
            data[_*SAMPLES_PER_BATCH:] = ret_val[0]
            labels[:, _*SAMPLES_PER_BATCH:] = ret_val[1]
        remaining = remaining - SAMPLES_PER_BATCH
    return data.reshape(-1, 3, 32, 32).transpose(0, 2, 3, 1).astype(np.float32), labels.T
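
# Example: train_data, train_labels = get_data(num_samples=1000) yields
# train_data.shape == (1000, 32, 32, 3) and train_labels.shape == (1000, 10).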
--------------------------------------------------------------------------------
/utilities/initializers.py:
--------------------------------------------------------------------------------
import numpy as np


def get_fans(shape):
    '''
    :param shape: (fan_in, fan_out) for a fully connected layer, or a 4-d kernel shape for a convolution
    :return:[int, int]: the numbers of input and output units implied by shape
    '''
    fan_in = shape[0] if len(shape) == 2 else np.prod(shape[1:])
    fan_out = shape[1] if len(shape) == 2 else shape[0]
    return fan_in, fan_out


def normal(shape, scale=0.05):
    '''
    :param shape: shape of the array to initialize
    :param scale: standard deviation of the normal distribution
    :return: array of the given shape sampled from N(0, scale)
    '''
    return np.random.normal(0, scale, size=shape)


def uniform(shape, scale=0.05):
    '''
    :param shape: shape of the array to initialize
    :param scale: half-width of the sampling interval
    :return: array of the given shape sampled from U(-scale, scale)
    '''
    return np.random.uniform(-scale, scale, size=shape)


def he_normal(shape):
    '''
    A function for smart normal distribution based initialization of parameters
    [He et al. https://arxiv.org/abs/1502.01852]
    :param shape: (fan_in, fan_out) for a fully connected layer, or a 4-d kernel shape for a convolution
    :return:[numpy array, numpy array]: A randomly initialized array of shape [fan_out, fan_in]
        (or the kernel shape) and the corresponding bias
    '''
    fan_in, fan_out = get_fans(shape)
    scale = np.sqrt(2. / fan_in)
    shape = (fan_out, fan_in) if len(shape) == 2 else shape  # For a fully connected network
    bias_shape = (fan_out, 1) if len(shape) == 2 else (
        1, 1, 1, shape[3])  # This supports only CNNs and fully connected networks
    return normal(shape, scale), uniform(bias_shape)


def he_uniform(shape):
    '''
    A function for smart uniform distribution based initialization of parameters
    [He et al. https://arxiv.org/abs/1502.01852]
    :param shape: (fan_in, fan_out) for a fully connected layer, or a 4-d kernel shape for a convolution
    :return:[numpy array, numpy array]: A randomly initialized array of shape [fan_out, fan_in] and
        the bias of shape [fan_out, 1]
    '''
    fan_in, fan_out = get_fans(shape)
    scale = np.sqrt(6. / fan_in)
    shape = (fan_out, fan_in) if len(shape) == 2 else shape  # For a fully connected network
    bias_shape = (fan_out, 1) if len(shape) == 2 else (
        1, 1, 1, shape[3])  # This supports only CNNs and fully connected networks
    return uniform(shape, scale), uniform(shape=bias_shape)


def glorot_normal(shape):
    '''
    A function for smart normal distribution based initialization of parameters
    [Glorot et al. http://proceedings.mlr.press/v9/glorot10a/glorot10a.pdf]
    :param shape: (fan_in, fan_out) for a fully connected layer, or a 4-d kernel shape for a convolution
    :return:[numpy array, numpy array]: A randomly initialized array of shape [fan_out, fan_in] and
        the bias of shape [fan_out, 1]
    '''
    fan_in, fan_out = get_fans(shape)
    scale = np.sqrt(2. / (fan_in + fan_out))
    shape = (fan_out, fan_in) if len(shape) == 2 else shape  # For a fully connected network
    bias_shape = (fan_out, 1) if len(shape) == 2 else (
        1, 1, 1, shape[3])  # This supports only CNNs and fully connected networks
    return normal(shape, scale), uniform(shape=bias_shape)


def glorot_uniform(shape):
    '''
    A function for smart uniform distribution based initialization of parameters
    [Glorot et al. http://proceedings.mlr.press/v9/glorot10a/glorot10a.pdf]
    :param shape: (fan_in, fan_out) for a fully connected layer, or a 4-d kernel shape for a convolution
    :return:[numpy array, numpy array]: A randomly initialized array of shape [fan_out, fan_in] and
        the bias of shape [fan_out, 1]
    '''
    fan_in, fan_out = get_fans(shape)
    scale = np.sqrt(6. / (fan_in + fan_out))
    shape = (fan_out, fan_in) if len(shape) == 2 else shape  # For a fully connected network
    bias_shape = (fan_out, 1) if len(shape) == 2 else (
        1, 1, 1, shape[3])  # This supports only CNNs and fully connected networks
    return uniform(shape, scale), uniform(shape=bias_shape)
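
# Example: he_normal((3072, 200)) returns W of shape (200, 3072) and b of
# shape (200, 1); glorot_uniform(shape=(3, 3, 3, 5)) returns a (3, 3, 3, 5)
# kernel and a (1, 1, 1, 5) bias.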
--------------------------------------------------------------------------------
/utilities/model.py:
--------------------------------------------------------------------------------
import numpy as np

from os import makedirs, path

from utilities.utils import get_batches, evaluate
from utilities.settings import set_network_name, get_models_path


class Model:
    def __init__(self, *model, **kwargs):
        self.model = model
        self.num_classes = 0
        self.batch_size = 0
        self.loss = None
        self.optimizer = None
        self.name = kwargs['name'] if 'name' in kwargs else None

    def set_batch_size(self, batch_size):
        self.batch_size = batch_size

    def set_loss(self, loss):
        self.loss = loss

    def set_name(self, name):
        set_network_name(name)

    def load_weights(self):
        for layer in self.model:
            if layer.has_weights():
                layer.load_weights(path.join(get_models_path(), self.name))

    def train(self, data, labels, batch_size=256, epochs=50, optimization='adam',
              save_model=True, load_and_continue=False):
        if self.loss is None:
            raise RuntimeError("Set loss first using 'model.set_loss()'")

        self.set_batch_size(batch_size)
        if save_model:
            self.set_name(self.name)

        if load_and_continue:
            for layer in self.model:
                if layer.has_weights():
                    layer.load_weights(path.join(get_models_path(), self.name))

        iter = 1
        for epoch in range(epochs):
            print('Running Epoch:', epoch + 1)
            for i, (x_batch, y_batch) in enumerate(get_batches(data, labels, batch_size=batch_size)):  # honor the requested batch size
                batch_preds = x_batch.copy()
                for num, layer in enumerate(self.model):
                    batch_preds = layer.forward_propagate(batch_preds, save_cache=True)
                dA = self.loss.compute_derivative(y_batch, batch_preds)
                for layer in reversed(self.model):
                    dA = layer.back_propagate(dA)
                    if layer.has_weights():
                        if optimization == 'adam':
                            layer.momentum()
                            layer.rmsprop()

                for layer in self.model:
                    if layer.has_weights():
                        layer.apply_grads(optimization=optimization, correct_bias=True, iter=iter)

                iter += 1  # one timestep per parameter update, as Adam's bias correction expects

            if save_model:
                for layer in self.model:  # save after each epoch, as described in the README
                    if layer.has_weights():
                        layer.save_weights(path.join(get_models_path(), self.name))

    def predict(self, data):
        if self.batch_size == 0:
            self.batch_size = data.shape[0]
        if self.num_classes == 0:
            predictions = np.zeros((1, data.shape[0]))
        else:
            predictions = np.zeros((self.num_classes, data.shape[0]))
        num_batches = data.shape[0] // self.batch_size
        for batch_num, x_batch in enumerate(get_batches(data, batch_size=self.batch_size, shuffle=False)):
            batch_preds = x_batch.copy()
            for layer in self.model:
                batch_preds = layer.forward_propagate(batch_preds, save_cache=False)
            M, N = batch_preds.shape
            if M != predictions.shape[0]:
                predictions = np.zeros(shape=(M, data.shape[0]))
            if batch_num <= num_batches - 1:
                predictions[:, batch_num * self.batch_size:(batch_num + 1) * self.batch_size] = batch_preds
            else:
                predictions[:, batch_num * self.batch_size:] = batch_preds
        return predictions
    def evaluate(self, data, labels):
        predictions = self.predict(data)
        M, N = predictions.shape
        if (M, N) == labels.shape:
            return evaluate(labels, predictions)
        elif (N, M) == labels.shape:
            return evaluate(labels.T, predictions)
        else:
            raise RuntimeError("Prediction and label shapes don't match")
--------------------------------------------------------------------------------
/utilities/settings.py:
--------------------------------------------------------------------------------
layer_nums = {
    'fc': 1,
    'conv': 1
}
network_name = None
models_path = 'models'


def init():
    global layer_nums
    layer_nums = {
        'fc': 1,
        'conv': 1
    }
    global network_name
    network_name = None


def get_layer_num(layer_type):
    global layer_nums
    return layer_nums[layer_type]


def get_models_path():
    return models_path


def inc_layer_num(layer_type):
    global layer_nums
    layer_nums[layer_type] += 1


def set_network_name(name):
    global network_name
    network_name = name


def get_network_name():
    if network_name is None:
        raise RuntimeError("Model name not set, set name as 'model.set_name()'")
    return network_name
--------------------------------------------------------------------------------
/utilities/utils.py:
--------------------------------------------------------------------------------
import matplotlib.pyplot as plt
import numpy as np


labels_to_name_map = {
    0: 'airplane',
    1: 'automobile',
    2: 'bird',
    3: 'cat',
    4: 'deer',
    5: 'dog',
    6: 'frog',
    7: 'horse',
    8: 'ship',
    9: 'truck'
}


def get_name(label):
    return labels_to_name_map[int(np.argmax(label))]


def pad_inputs(X, pad):
    '''
    Function to apply zero padding to the image
    :param X:[numpy array]: Dataset of shape (m, height, width, depth)
    :param pad:[tuple]: (pad_h, pad_w), the number of rows and columns of zeros to add on each side
    :return:[numpy array]: padded dataset
    '''
    return np.pad(X, ((0, 0), (pad[0], pad[0]), (pad[1], pad[1]), (0, 0)), 'constant')


def show_image(image, title=None, cmap=None):
    '''
    Function to display one image
    :param image: numpy float array: of shape (32, 32, 3)
    :return: Void
    '''
    if cmap is not None:
        plt.imshow(image, cmap=cmap)
    else:
        plt.imshow(image)
    if title is not None:
        plt.title(title)
    plt.show()


def plot_graph(Y, X=None, title=None, xlabel=None, ylabel=None):
    '''
    A function to plot a line graph.
    :param Y: Values for Y axis
    :param X: Values for X axis (optional)
    :param title:[string default=None]: Graph title.
    :param xlabel:[string default=None]: X axis label.
    :param ylabel:[string default=None]: Y axis label.
    :return: Void
    '''
    if X is None:
        plt.plot(Y)
    else:
        plt.plot(X, Y)
    if title is not None:
        plt.title(title)
    if xlabel is not None:
        plt.xlabel(xlabel)
    if ylabel is not None:
        plt.ylabel(ylabel)
    plt.show()


def to_categorical(labels, num_classes, axis=0):
    '''
    Function to one-hot-encode the labels
    :param labels:[list or vector]: list of ints: list of numbers (ranging 0-9 for CIFAR-10)
    :param num_classes:[int]: the total number of unique classes or categories.
    :param axis:[int Default=0]: decides the orientation; if 0, returns a column matrix of shape
        (num_classes, len(labels)), else a row matrix of shape (len(labels), num_classes)
    :return: numpy array of ints: one-hot-encoded labels
    '''
    ohe_labels = np.zeros((len(labels), num_classes)) if axis != 0 else np.zeros((num_classes, len(labels)))
    for _ in range(len(labels)):
        if axis == 0:
            ohe_labels[labels[_], _] = 1
        else:
            ohe_labels[_, labels[_]] = 1
    return ohe_labels
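
# Example: to_categorical([0, 2], num_classes=3) returns the column matrix
# [[1, 0], [0, 0], [0, 1]], i.e. one one-hot column per label.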

def get_batches(data, labels=None, batch_size=256, shuffle=True):
    '''
    Function to get data in batches.
    :param data:[numpy array]: training or test data. Assumes shape=[M, N] where M is the features and N is samples.
    :param labels:[numpy array, Default = None (for without labels)]: actual labels corresponding to the data.
        Assumes shape=[M, N] where M is number of classes/results per sample and N is number of samples.
    :param batch_size:[int, Default = 256]: required size of batch. If data can't be exactly divided by batch_size,
        remaining samples will be in a new batch
    :param shuffle:[boolean, Default = True]: if true, function will shuffle the data
    :return:[numpy array, numpy array]: batch data and corresponding labels
    '''
    N = data.shape[1] if len(data.shape) == 2 else data.shape[0]
    num_batches = N//batch_size
    if len(data.shape) == 2:
        data = data.T
    if shuffle:
        shuffled_indices = np.random.permutation(N)
        data = data[shuffled_indices]
        labels = labels[:, shuffled_indices] if labels is not None else None
    if num_batches == 0:
        if labels is not None:
            yield (data.T, labels) if len(data.shape) == 2 else (data, labels)
        else:
            yield data.T if len(data.shape) == 2 else data
    for batch_num in range(num_batches):
        if labels is not None:
            yield (data[batch_num*batch_size:(batch_num+1)*batch_size].T,
                   labels[:, batch_num*batch_size:(batch_num+1)*batch_size]) if len(data.shape) == 2 \
                else (data[batch_num*batch_size:(batch_num+1)*batch_size],
                      labels[:, batch_num*batch_size:(batch_num+1)*batch_size])
        else:
            yield data[batch_num*batch_size:(batch_num+1)*batch_size].T if len(data.shape) == 2 else \
                data[batch_num*batch_size:(batch_num+1)*batch_size]
    if N % batch_size != 0 and num_batches != 0:
        if labels is not None:
            yield (data[num_batches*batch_size:].T, labels[:, num_batches*batch_size:]) if len(data.shape) == 2 else \
                (data[num_batches*batch_size:], labels[:, num_batches*batch_size:])
        else:
            yield data[num_batches*batch_size:].T if len(data.shape) == 2 else data[num_batches*batch_size:]


def evaluate(labels, predictions):
    '''
    A function to compute the accuracy of the predictions on a scale of 0-1.
    :param labels:[numpy array]: Training labels (or testing/validation if available)
    :param predictions:[numpy array]: Predicted labels
    :return:[float]: a number between [0, 1] denoting the accuracy of the prediction
    '''
    return np.mean(np.argmax(labels, axis=0) == np.argmax(predictions, axis=0))
--------------------------------------------------------------------------------