├── .gitignore ├── LICENSE ├── README.md ├── cnn_custom_dataset.py ├── cnn_mnist.py ├── mlp.py ├── nn ├── __init__.py ├── activations.py ├── functional.py ├── layers.py ├── losses.py ├── net.py └── utils.py ├── requirements.txt └── tests └── core_tests.py /.gitignore: -------------------------------------------------------------------------------- 1 | .idea 2 | *.xml 3 | __pycache__ -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2020 Tivadar Danka 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 22 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # neural-networks-from-scratch 2 | 3 | # Contents 4 | - [Quickstart](#quickstart) 5 | - [A simple example CNN](#CNN-example) 6 | - [The `Net` object](#net) 7 | - [Layers](#layers) 8 | - [`Linear`](#linear) 9 | - [`Conv2D`](#conv2d) 10 | - [`MaxPool2D`](#maxpool2d) 11 | - [`BatchNorm2D`](#batchnorm2d) 12 | - [`Flatten`](#flatten) 13 | - [Losses](#losses) 14 | - [`CrossEntropyLoss`](#crossentropyloss) 15 | - [`MeanSquareLoss`](#meansquareloss) 16 | - [Activations](#activations) 17 | 18 | # Quickstart 19 | 20 | ## Installation 21 | To run the examples, creating a virtual environment is recommended. 22 | ```bash 23 | virtualenv neural-networks-from scratch 24 | ``` 25 | When a virtual environment is in place, all requirements can be installed with pip. 26 | ```bash 27 | source neural-networks-from-scratch/bin/activate 28 | pip install -r requirements.txt 29 | ``` 30 | 31 | 32 | ## A simple example CNN 33 | A simple convolutional network for image classification can be found in `CNN_custom_dataset.py`. To try it on your own dataset, you should prepare your images in the following format: 34 | ```bash 35 | images_folder 36 | |-- class_01 37 | |-- 001.png 38 | |-- ... 39 | |-- class_02 40 | |-- 001.png 41 | |-- ... 42 | |-- ... 43 | ``` 44 | Its required argument is 45 | - `--dataset`: path to the dataset, 46 | 47 | while the optional arguments are 48 | - `--epochs`: number of epochs, 49 | - `--batch_size`: size of the training batch, 50 | - `--lr`: learning rate. 51 | 52 | ## The `Net` object 53 | To define a neural network, the `nn.net.Net` object can be used. 
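For example, a small fully connected classifier (a sketch along the lines of `mlp.py`, using the same layer list as the example below) can be assembled like this:
```python3
from nn.net import Net
from nn.layers import Linear
from nn.activations import ReLU
from nn.losses import CrossEntropyLoss

net = Net(
    layers=[Linear(2, 4), ReLU(), Linear(4, 2)],
    loss=CrossEntropyLoss(),
)
```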
Its parameters are
54 | * `layers`: a list of layers from `nn.layers`, for example `[Linear(2, 4), ReLU(), Linear(4, 2)]`,
55 | * `loss`: a loss function from `nn.losses`, for example `CrossEntropyLoss` or `MeanSquareLoss`.
56 | If you would like to train the model with data `X` and labels `y`, you should
57 | 1) perform the forward pass, during which local gradients are calculated,
58 | 2) calculate the loss,
59 | 3) perform the backward pass, where global gradients with respect to the variables and layer parameters are calculated,
60 | 4) update the weights.
61 | 
62 | In code, this looks like the following:
63 | ```python3
64 | out = net(X)
65 | loss = net.loss(out, y)
66 | net.backward()
67 | net.update_weights(lr)
68 | ```
69 | 
70 | # Layers
71 | The currently implemented layers can be found in `nn.layers`. Each layer is a callable object; calling it performs the forward pass and calculates the local gradients. The most important methods are:
72 | - `.forward(X)`: performs the forward pass for `X`. Instead of calling `forward` directly, the layer object itself should be called, which also calculates and caches the local gradients.
73 | - `.backward(dY)`: performs the backward pass, where `dY` is the gradient propagated backwards from the next layer.
74 | - `.local_grad(X)`: calculates the local gradient of the input.
75 | 
76 | The input to the layers should always be a `numpy.ndarray` of shape `(n_batch, ...)`. For the 2D image layers, the input should have shape `(n_batch, n_channels, n_height, n_width)`.
77 | 
78 | ## `Linear`
79 | A simple fully connected layer.
80 | Parameters:
81 | - `in_dim`: integer, dimension of the input.
82 | - `out_dim`: integer, dimension of the output.
83 | 
84 | Usage:
85 | - input: `numpy.ndarray` of shape `(N, in_dim)`.
86 | - output: `numpy.ndarray` of shape `(N, out_dim)`.
87 | 
88 | ## `Conv2D`
89 | 2D convolutional layer. Parameters:
90 | - `in_channels`: integer, number of channels in the input image.
91 | - `out_channels`: integer, number of filters to be learned.
92 | - `kernel_size`: integer or tuple, the size of the filter to be learned. Defaults to 3.
93 | - `stride`: integer, stride of the convolution. Defaults to 1.
94 | - `padding`: integer, number of zeros to be added to each edge of the images. Defaults to 0.
95 | 
96 | Usage:
97 | - input: `numpy.ndarray` of shape `(N, C_in, H_in, W_in)`.
98 | - output: `numpy.ndarray` of shape `(N, C_out, H_out, W_out)`.
99 | 
100 | ## `MaxPool2D`
101 | 2D max pooling layer. Parameters:
102 | - `kernel_size`: integer or tuple, size of the pooling window. Defaults to 2.
103 | 
104 | Usage:
105 | - input: `numpy.ndarray` of shape `(N, C, H, W)`.
106 | - output: `numpy.ndarray` of shape `(N, C, H//KH, W//KW)` with kernel size `(KH, KW)`.
107 | 
108 | ## `BatchNorm2D`
109 | 2D batch normalization layer. Parameters:
110 | - `n_channels`: integer, number of channels.
111 | - `epsilon`: epsilon parameter for BatchNorm, defaults to 1e-5.
112 | 
113 | Usage:
114 | - input: `numpy.ndarray` of shape `(N, C, H, W)`.
115 | - output: `numpy.ndarray` of shape `(N, C, H, W)`.
116 | 
117 | ## `Flatten`
118 | A simple layer which flattens the outputs of a 2D layer for images.
119 | 
120 | Usage:
121 | - input: `numpy.ndarray` of shape `(N, C, H, W)`.
122 | - output: `numpy.ndarray` of shape `(N, C*H*W)`.
123 | 
124 | # Losses
125 | The implemented loss functions are located in `nn.losses`. Like layers, they are callable objects, taking predictions and targets as input.
126 | 
127 | ## `CrossEntropyLoss`
128 | Cross-entropy loss. Usage:
129 | - input: `numpy.ndarray` of shape `(N, D)` containing the class scores for each element in the batch, along with an `(N, 1)` array of integer class labels as target.
130 | - output: `float`.
131 | 
132 | ## `MeanSquareLoss`
133 | Mean square loss. Usage:
134 | - input: `numpy.ndarray` of shape `(N, D)` for both predictions and targets.
135 | - output: `float`.
136 | 
137 | # Activations
138 | The activation layers for the network can be found in `nn.activations`. They are callable objects which apply the specified activation function elementwise to a `numpy.ndarray`. Currently, the following activation functions are implemented (a short usage sketch follows the list):
139 | - ReLU
140 | - Leaky ReLU
141 | - Sigmoid
142 | 
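A minimal usage sketch (the input values below are arbitrary and only for illustration):
```python3
import numpy as np
from nn.activations import ReLU, Sigmoid

x = np.array([[-1.0, 0.5, 2.0]])

relu = ReLU()
y = relu(x)                          # forward pass; negative entries are zeroed
dx = relu.backward(np.ones_like(x))  # upstream gradient of ones times the local ReLU gradient

print(y)
print(dx)
print(Sigmoid()(x))                  # elementwise logistic function of x
```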
--------------------------------------------------------------------------------
/cnn_custom_dataset.py:
--------------------------------------------------------------------------------
1 | import numpy as np
2 | 
3 | from argparse import ArgumentParser
4 | 
5 | from nn.layers import *
6 | from nn.losses import CrossEntropyLoss
7 | from nn.activations import ReLU
8 | from nn.net import Net
9 | from nn.utils import load_data
10 | 
11 | parser = ArgumentParser()
12 | parser.add_argument("--dataset", type=str, required=True)
13 | parser.add_argument("--epochs", type=int, default=100)
14 | parser.add_argument("--batch_size", type=int, default=100)
15 | parser.add_argument("--lr", type=float, default=1e-2)
16 | args = parser.parse_args()
17 | 
18 | # load images
19 | print("loading data ...")
20 | X, y = load_data(args.dataset)
21 | print("data loaded")
22 | # scaling and converting to float
23 | print("scaling data...")
24 | X = X.astype("float32") / 255
25 | print("data scaled")
26 | 
27 | # split to train and validation datasets
28 | idx = np.arange(len(X))
29 | np.random.shuffle(idx)
30 | val_split = int(len(X) * 0.9)
31 | X_train, y_train = X[idx[:val_split]], y[idx[:val_split]]
32 | X_val, y_val = X[idx[val_split:]], y[idx[val_split:]]
33 | 
34 | net = Net(
35 |     layers=[
36 |         Conv2D(3, 8, 3, padding=1),
37 |         MaxPool2D(kernel_size=2),
38 |         ReLU(),
39 |         BatchNorm2D(8),
40 |         Conv2D(8, 16, 3, padding=1),
41 |         MaxPool2D(kernel_size=2),
42 |         ReLU(),
43 |         BatchNorm2D(16),
44 |         Flatten(),
45 |         Linear(16 * 13 * 13, 12),  # adjust to your image size and number of classes
46 |     ],
47 |     loss=CrossEntropyLoss(),
48 | )
49 | 
50 | n_epochs = args.epochs
51 | n_batch = args.batch_size
52 | for epoch_idx in range(n_epochs):
53 |     batch_idx = np.random.choice(range(len(X_train)), size=n_batch, replace=False)
54 |     out = net(X_train[batch_idx])
55 |     preds = np.argmax(out, axis=1).reshape(-1, 1)
56 |     accuracy = 100 * (preds == y_train[batch_idx]).sum() / n_batch
57 |     loss = net.loss(out, y_train[batch_idx])
58 |     net.backward()
59 |     net.update_weights(lr=args.lr)
60 |     print("Epoch no. 
%d loss = %2f4 \t accuracy = %d %%" % (epoch_idx + 1, loss, accuracy)) 61 | if epoch_idx % 10 == 0: 62 | val_idx = np.random.choice(range(len(X_val)), size=n_batch, replace=False) 63 | val_out = net.forward(X_val[val_idx]) 64 | val_pred = np.argmax(val_out, axis=1).reshape(-1, 1) 65 | val_loss = net.loss(val_out, y_val[val_idx]) 66 | val_acc = 100 * (val_pred == y_val[val_idx]).sum() / n_batch 67 | print("Validation loss = %2f4 \t accuracy = %d %%" % (val_loss, val_acc)) 68 | -------------------------------------------------------------------------------- /cnn_mnist.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | 3 | from nn.layers import * 4 | from nn.losses import CrossEntropyLoss 5 | from nn.activations import ReLU 6 | from nn.net import Net 7 | 8 | from keras.datasets import mnist 9 | 10 | net = Net( 11 | layers=[ 12 | Conv2D(1, 4, 3, padding=1), 13 | MaxPool2D(kernel_size=2), 14 | ReLU(), 15 | BatchNorm2D(4), 16 | Conv2D(4, 8, 3, padding=1), 17 | MaxPool2D(kernel_size=2), 18 | ReLU(), 19 | BatchNorm2D(8), 20 | Flatten(), 21 | Linear(8 * 7 * 7, 10), 22 | ], 23 | loss=CrossEntropyLoss(), 24 | ) 25 | 26 | (X_train, y_train), (X_test, y_test) = mnist.load_data() 27 | # reshaping 28 | X_train, X_test = X_train.reshape(-1, 1, 28, 28), X_test.reshape(-1, 1, 28, 28) 29 | y_train, y_test = y_train.reshape(-1, 1), y_test.reshape(-1, 1) 30 | # normalizing and scaling data 31 | X_train, X_test = X_train.astype("float32") / 255, X_test.astype("float32") / 255 32 | 33 | n_epochs = 1000 34 | n_batch = 100 35 | for epoch_idx in range(n_epochs): 36 | batch_idx = np.random.choice(range(len(X_train)), size=n_batch, replace=False) 37 | out = net(X_train[batch_idx]) 38 | preds = np.argmax(out, axis=1).reshape(-1, 1) 39 | accuracy = 100 * (preds == y_train[batch_idx]).sum() / n_batch 40 | loss = net.loss(out, y_train[batch_idx]) 41 | net.backward() 42 | net.update_weights(lr=0.01) 43 | print("Epoch no. 
%d loss = %2f4 \t accuracy = %d %%" % (epoch_idx + 1, loss, accuracy)) 44 | -------------------------------------------------------------------------------- /mlp.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | 3 | import matplotlib.pyplot as plt 4 | 5 | from nn.layers import * 6 | from nn.losses import CrossEntropyLoss 7 | from nn.activations import ReLU, Softmax 8 | from nn.net import Net 9 | 10 | 11 | # functions for visualization 12 | def plot_data(X1, X2, export_path=None): 13 | with plt.style.context("seaborn-white"): 14 | plt.figure(figsize=(10, 10)) 15 | plt.scatter(X1[:, 0], X1[:, 1], c="r", edgecolor="k") 16 | plt.scatter(X2[:, 0], X2[:, 1], c="b", edgecolor="k") 17 | plt.title("The data") 18 | if export_path is None: 19 | plt.show() 20 | else: 21 | plt.savefig(export_path, dpi=500) 22 | 23 | 24 | def make_grid(X_data, n_res=20): 25 | x_min, x_max = X_data[:, 0].min() - 0.5, X_data[:, 0].max() + 0.5 26 | y_min, y_max = X_data[:, 1].min() - 0.5, X_data[:, 1].max() + 0.5 27 | x_meshgrid, y_meshgrid = np.meshgrid( 28 | np.linspace(x_min, x_max, n_res), np.linspace(y_min, y_max, n_res) 29 | ) 30 | 31 | X_grid = np.concatenate((x_meshgrid.reshape(-1, 1), y_meshgrid.reshape(-1, 1)), axis=1) 32 | 33 | return x_meshgrid, y_meshgrid, X_grid 34 | 35 | 36 | def plot_classifier(net, X_data, x_meshgrid, y_meshgrid, X_grid, export_path=None): 37 | y_grid = Softmax()(net(X_grid))[:, 0].reshape(x_meshgrid.shape) 38 | y_data = net(X_data) 39 | preds = np.argmax(y_data, axis=1) 40 | 41 | with plt.style.context("seaborn-white"): 42 | plt.figure(figsize=(5, 5)) 43 | plt.scatter(X_data[preds == 0, 0], X_data[preds == 0, 1], c="b", zorder=1, edgecolor="k") 44 | plt.scatter(X_data[preds == 1, 0], X_data[preds == 1, 1], c="r", zorder=1, edgecolor="k") 45 | plt.contourf(x_meshgrid, y_meshgrid, y_grid, zorder=0, cmap="RdBu") 46 | if not export_path: 47 | plt.show() 48 | else: 49 | plt.savefig(export_path, dpi=500) 50 | 51 | plt.close("all") 52 | 53 | 54 | # generating some data 55 | n_class_size = 100 56 | r = 2 57 | X1_offset = np.random.rand(n_class_size, 2) - 0.5 58 | np.sqrt(np.sum(X1_offset ** 2, axis=1, keepdims=True)) 59 | X1_offset = r * X1_offset / np.sqrt(np.sum(X1_offset ** 2, axis=1, keepdims=True)) 60 | X1 = np.random.multivariate_normal([0, 0], [[0.1, 0], [0, 0.1]], size=n_class_size) + X1_offset 61 | X2 = np.random.multivariate_normal([0, 0], [[0.1, 0], [0, 0.1]], size=n_class_size) 62 | 63 | X = np.concatenate((X1, X2)) 64 | Y_labels = np.array([0] * n_class_size + [1] * n_class_size) 65 | 66 | plot_data(X1, X2) 67 | # make meshgrid 68 | x_meshgrid, y_meshgrid, X_grid = make_grid(X, n_res=100) 69 | 70 | net = Net( 71 | layers=[Linear(2, 4), ReLU(), Linear(4, 2), Softmax()], loss=CrossEntropyLoss() 72 | ) 73 | 74 | n_epochs = 10000 75 | for epoch_idx in range(n_epochs): 76 | print("Epoch no. 
%d" % epoch_idx) 77 | out = net(X) 78 | # prediction accuracy 79 | pred = np.argmax(out, axis=1) 80 | print("accuracy: %1.4f" % (1 - np.abs(pred - Y_labels).sum() / 200)) 81 | loss = net.loss(out, Y_labels) 82 | print("loss: %1.4f" % loss) 83 | grad = net.backward() 84 | net.update_weights(0.1) 85 | if epoch_idx % 1000 == 0: 86 | plot_classifier(net, X, x_meshgrid, y_meshgrid, X_grid) 87 | -------------------------------------------------------------------------------- /nn/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/cosmic-cortex/neural-networks-from-scratch/8cb53a47a56455df0dc2bf3b3e8ddcc4318ab208/nn/__init__.py -------------------------------------------------------------------------------- /nn/activations.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | from .functional import * 3 | from .layers import Function 4 | 5 | 6 | class Sigmoid(Function): 7 | def forward(self, X): 8 | return sigmoid(X) 9 | 10 | def backward(self, dY): 11 | return dY * self.grad["X"] 12 | 13 | def local_grad(self, X): 14 | grads = {"X": sigmoid_prime(X)} 15 | return grads 16 | 17 | 18 | class ReLU(Function): 19 | def forward(self, X): 20 | return relu(X) 21 | 22 | def backward(self, dY): 23 | return dY * self.grad["X"] 24 | 25 | def local_grad(self, X): 26 | grads = {"X": relu_prime(X)} 27 | return grads 28 | 29 | 30 | class LeakyReLU(Function): 31 | def forward(self, X): 32 | return leaky_relu(X) 33 | 34 | def backward(self, dY): 35 | return dY * self.grad["X"] 36 | 37 | def local_grad(self, X): 38 | grads = {"X": leaky_relu_prime(X)} 39 | return grads 40 | 41 | 42 | class Softmax(Function): 43 | def forward(self, X): 44 | exp_x = np.exp(X) 45 | probs = exp_x / np.sum(exp_x, axis=1, keepdims=True) 46 | self.cache["X"] = X 47 | self.cache["output"] = probs 48 | return probs 49 | 50 | def backward(self, dY): 51 | dX = [] 52 | 53 | for dY_row, grad_row in zip(dY, self.grad["X"]): 54 | dX.append(np.dot(dY_row, grad_row)) 55 | 56 | return np.array(dX) 57 | 58 | def local_grad(self, X): 59 | grad = [] 60 | 61 | for prob in self.cache["output"]: 62 | prob = prob.reshape(-1, 1) 63 | grad_row = -np.dot(prob, prob.T) 64 | grad_row_diagonal = prob * (1 - prob) 65 | np.fill_diagonal(grad_row, grad_row_diagonal) 66 | grad.append(grad_row) 67 | 68 | grad = np.array(grad) 69 | return {"X": grad} 70 | -------------------------------------------------------------------------------- /nn/functional.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | 3 | 4 | def sigmoid(x): 5 | return 1 / (1 + np.exp(-x)) 6 | 7 | 8 | def sigmoid_prime(x): 9 | s = sigmoid(x) 10 | return s * (1 - s) 11 | 12 | 13 | def relu(x): 14 | return x * (x > 0) 15 | 16 | 17 | def relu_prime(x): 18 | return 1 * (x > 0) 19 | 20 | 21 | def leaky_relu(x, alpha): 22 | return x * (x > 0) + alpha * x * (x <= 0) 23 | 24 | 25 | def leaky_relu_prime(x, alpha): 26 | return 1 * (x > 0) + alpha * (x <= 0) 27 | -------------------------------------------------------------------------------- /nn/layers.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | 3 | from math import sqrt 4 | from itertools import product 5 | 6 | from .utils import zero_pad 7 | 8 | 9 | class Function: 10 | """ 11 | Abstract model of a differentiable function. 
12 | """ 13 | 14 | def __init__(self, *args, **kwargs): 15 | # initializing cache for intermediate results 16 | # helps with gradient calculation in some cases 17 | self.cache = {} 18 | # cache for gradients 19 | self.grad = {} 20 | 21 | def __call__(self, *args, **kwargs): 22 | # calculating output 23 | output = self.forward(*args, **kwargs) 24 | # calculating and caching local gradients 25 | self.grad = self.local_grad(*args, **kwargs) 26 | return output 27 | 28 | def forward(self, *args, **kwargs): 29 | """ 30 | Forward pass of the function. Calculates the output value and the 31 | gradient at the input as well. 32 | """ 33 | pass 34 | 35 | def backward(self, *args, **kwargs): 36 | """ 37 | Backward pass. Computes the local gradient at the input value 38 | after forward pass. 39 | """ 40 | pass 41 | 42 | def local_grad(self, *args, **kwargs): 43 | """ 44 | Calculates the local gradients of the function at the given input. 45 | 46 | Returns: 47 | grad: dictionary of local gradients. 48 | """ 49 | pass 50 | 51 | 52 | class Layer(Function): 53 | """ 54 | Abstract model of a neural network layer. In addition to Function, a Layer 55 | also has weights and gradients with respect to the weights. 56 | """ 57 | 58 | def __init__(self, *args, **kwargs): 59 | super().__init__(*args, **kwargs) 60 | self.weight = {} 61 | self.weight_update = {} 62 | 63 | def _init_weights(self, *args, **kwargs): 64 | pass 65 | 66 | def _update_weights(self, lr): 67 | """ 68 | Updates the weights using the corresponding _global_ gradients computed during 69 | backpropagation. 70 | 71 | Args: 72 | lr: float. Learning rate. 73 | """ 74 | for weight_key, weight in self.weight.items(): 75 | self.weight[weight_key] = self.weight[weight_key] - lr * self.weight_update[weight_key] 76 | 77 | 78 | class Flatten(Function): 79 | def forward(self, X): 80 | self.cache["shape"] = X.shape 81 | n_batch = X.shape[0] 82 | return X.reshape(n_batch, -1) 83 | 84 | def backward(self, dY): 85 | return dY.reshape(self.cache["shape"]) 86 | 87 | 88 | class MaxPool2D(Function): 89 | def __init__(self, kernel_size=(2, 2)): 90 | super().__init__() 91 | self.kernel_size = ( 92 | (kernel_size, kernel_size) if isinstance(kernel_size, int) else kernel_size 93 | ) 94 | 95 | def __call__(self, X): 96 | # in contrary to other Function subclasses, MaxPool2D does not need to call 97 | # .local_grad() after forward pass because the gradient is calculated during it 98 | return self.forward(X) 99 | 100 | def forward(self, X): 101 | N, C, H, W = X.shape 102 | KH, KW = self.kernel_size 103 | 104 | grad = np.zeros_like(X) 105 | Y = np.zeros((N, C, H // KH, W // KW)) 106 | 107 | # for n in range(N): 108 | for h, w in product(range(0, H // KH), range(0, W // KW)): 109 | h_offset, w_offset = h * KH, w * KW 110 | rec_field = X[:, :, h_offset : h_offset + KH, w_offset : w_offset + KW] 111 | Y[:, :, h, w] = np.max(rec_field, axis=(2, 3)) 112 | for kh, kw in product(range(KH), range(KW)): 113 | grad[:, :, h_offset + kh, w_offset + kw] = ( 114 | X[:, :, h_offset + kh, w_offset + kw] >= Y[:, :, h, w] 115 | ) 116 | 117 | # storing the gradient 118 | self.grad["X"] = grad 119 | 120 | return Y 121 | 122 | def backward(self, dY): 123 | dY = np.repeat( 124 | np.repeat(dY, repeats=self.kernel_size[0], axis=2), repeats=self.kernel_size[1], axis=3 125 | ) 126 | return self.grad["X"] * dY 127 | 128 | def local_grad(self, X): 129 | # small hack: because for MaxPool calculating the gradient is simpler during 130 | # the forward pass, it is calculated there and this function just 
returns the 131 | # grad dictionary 132 | return self.grad 133 | 134 | 135 | class BatchNorm2D(Layer): 136 | def __init__(self, n_channels, epsilon=1e-5): 137 | super().__init__() 138 | self.epsilon = epsilon 139 | self.n_channels = n_channels 140 | self._init_weights(n_channels) 141 | 142 | def _init_weights(self, n_channels): 143 | self.weight["gamma"] = np.ones(shape=(1, n_channels, 1, 1)) 144 | self.weight["beta"] = np.zeros(shape=(1, n_channels, 1, 1)) 145 | 146 | def forward(self, X): 147 | """ 148 | Forward pass for the 2D batchnorm layer. 149 | 150 | Args: 151 | X: numpy.ndarray of shape (n_batch, n_channels, height, width). 152 | 153 | Returns_ 154 | Y: numpy.ndarray of shape (n_batch, n_channels, height, width). 155 | Batch-normalized tensor of X. 156 | """ 157 | mean = np.mean(X, axis=(2, 3), keepdims=True) 158 | var = np.var(X, axis=(2, 3), keepdims=True) + self.epsilon 159 | invvar = 1.0 / var 160 | sqrt_invvar = np.sqrt(invvar) 161 | centered = X - mean 162 | scaled = centered * sqrt_invvar 163 | normalized = scaled * self.weight["gamma"] + self.weight["beta"] 164 | 165 | # caching intermediate results for backprop 166 | self.cache["mean"] = mean 167 | self.cache["var"] = var 168 | self.cache["invvar"] = invvar 169 | self.cache["sqrt_invvar"] = sqrt_invvar 170 | self.cache["centered"] = centered 171 | self.cache["scaled"] = scaled 172 | self.cache["normalized"] = normalized 173 | 174 | return normalized 175 | 176 | def backward(self, dY): 177 | """ 178 | Backward pass for the 2D batchnorm layer. Calculates global gradients 179 | for the input and the parameters. 180 | 181 | Args: 182 | dY: numpy.ndarray of shape (n_batch, n_channels, height, width). 183 | 184 | Returns: 185 | dX: numpy.ndarray of shape (n_batch, n_channels, height, width). 186 | Global gradient wrt the input X. 187 | """ 188 | # global gradients of parameters 189 | dgamma = np.sum(self.cache["scaled"] * dY, axis=(0, 2, 3), keepdims=True) 190 | dbeta = np.sum(dY, axis=(0, 2, 3), keepdims=True) 191 | 192 | # caching global gradients of parameters 193 | self.weight_update["gamma"] = dgamma 194 | self.weight_update["beta"] = dbeta 195 | 196 | # global gradient of the input 197 | dX = self.grad["X"] * dY 198 | 199 | return dX 200 | 201 | def local_grad(self, X): 202 | """ 203 | Calculates the local gradient for X. 204 | 205 | Args: 206 | dY: numpy.ndarray of shape (n_batch, n_channels, height, width). 207 | 208 | Returns: 209 | grads: dictionary of gradients. 
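
        Note: the gradient computed here is the local gradient with respect to
        the input X (the method receives X rather than dY); the parameter
        gradients for gamma and beta are computed in backward().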
210 | """ 211 | # global gradient of the input 212 | N, C, H, W = X.shape 213 | # ppc = pixels per channel, useful variable for further computations 214 | ppc = H * W 215 | 216 | # gradient for 'denominator path' 217 | dsqrt_invvar = self.cache["centered"] 218 | dinvvar = (1.0 / (2.0 * np.sqrt(self.cache["invvar"]))) * dsqrt_invvar 219 | dvar = (-1.0 / self.cache["var"] ** 2) * dinvvar 220 | ddenominator = (X - self.cache["mean"]) * (2 * (ppc - 1) / ppc ** 2) * dvar 221 | 222 | # gradient for 'numerator path' 223 | dcentered = self.cache["sqrt_invvar"] 224 | dnumerator = (1.0 - 1.0 / ppc) * dcentered 225 | 226 | dX = ddenominator + dnumerator 227 | grads = {"X": dX} 228 | return grads 229 | 230 | 231 | class Linear(Layer): 232 | def __init__(self, in_dim, out_dim): 233 | super().__init__() 234 | self._init_weights(in_dim, out_dim) 235 | 236 | def _init_weights(self, in_dim, out_dim): 237 | scale = 1 / sqrt(in_dim) 238 | self.weight["W"] = scale * np.random.randn(in_dim, out_dim) 239 | self.weight["b"] = scale * np.random.randn(1, out_dim) 240 | 241 | def forward(self, X): 242 | """ 243 | Forward pass for the Linear layer. 244 | 245 | Args: 246 | X: numpy.ndarray of shape (n_batch, in_dim) containing 247 | the input value. 248 | 249 | Returns: 250 | Y: numpy.ndarray of shape of shape (n_batch, out_dim) containing 251 | the output value. 252 | """ 253 | 254 | output = np.dot(X, self.weight["W"]) + self.weight["b"] 255 | 256 | # caching variables for backprop 257 | self.cache["X"] = X 258 | self.cache["output"] = output 259 | 260 | return output 261 | 262 | def backward(self, dY): 263 | """ 264 | Backward pass for the Linear layer. 265 | 266 | Args: 267 | dY: numpy.ndarray of shape (n_batch, n_out). Global gradient 268 | backpropagated from the next layer. 269 | 270 | Returns: 271 | dX: numpy.ndarray of shape (n_batch, n_out). Global gradient 272 | of the Linear layer. 273 | """ 274 | # calculating the global gradient, to be propagated backwards 275 | dX = dY.dot(self.grad["X"].T) 276 | # calculating the global gradient wrt to weights 277 | X = self.cache["X"] 278 | dW = self.grad["W"].T.dot(dY) 279 | db = np.sum(dY, axis=0, keepdims=True) 280 | # caching the global gradients 281 | self.weight_update = {"W": dW, "b": db} 282 | 283 | return dX 284 | 285 | def local_grad(self, X): 286 | """ 287 | Local gradients of the Linear layer at X. 288 | 289 | Args: 290 | X: numpy.ndarray of shape (n_batch, in_dim) containing the 291 | input data. 292 | 293 | Returns: 294 | grads: dictionary of local gradients with the following items: 295 | X: numpy.ndarray of shape (n_batch, in_dim). 296 | W: numpy.ndarray of shape (n_batch, in_dim). 297 | b: numpy.ndarray of shape (n_batch, 1). 
298 | """ 299 | gradX_local = self.weight["W"] 300 | gradW_local = X 301 | gradb_local = np.ones_like(self.weight["b"]) 302 | grads = {"X": gradX_local, "W": gradW_local, "b": gradb_local} 303 | return grads 304 | 305 | 306 | class Conv2D(Layer): 307 | def __init__(self, in_channels, out_channels, kernel_size=3, stride=1, padding=0): 308 | super().__init__() 309 | self.in_channels = in_channels 310 | self.out_channels = out_channels 311 | self.stride = stride 312 | self.kernel_size = ( 313 | kernel_size if isinstance(kernel_size, tuple) else (kernel_size, kernel_size) 314 | ) 315 | self.padding = padding 316 | self._init_weights(in_channels, out_channels, self.kernel_size) 317 | 318 | def _init_weights(self, in_channels, out_channels, kernel_size): 319 | scale = 2 / sqrt(in_channels * kernel_size[0] * kernel_size[1]) 320 | 321 | self.weight = { 322 | "W": np.random.normal(scale=scale, size=(out_channels, in_channels, *kernel_size)), 323 | "b": np.zeros(shape=(out_channels, 1)), 324 | } 325 | 326 | def forward(self, X): 327 | """ 328 | Forward pass for the convolution layer. 329 | 330 | Args: 331 | X: numpy.ndarray of shape (N, C, H_in, W_in). 332 | 333 | Returns: 334 | Y: numpy.ndarray of shape (N, F, H_out, W_out). 335 | """ 336 | if self.padding: 337 | X = zero_pad(X, pad_width=self.padding, dims=(2, 3)) 338 | 339 | self.cache["X"] = X 340 | 341 | N, C, H, W = X.shape 342 | KH, KW = self.kernel_size 343 | out_shape = (N, self.out_channels, 1 + (H - KH) // self.stride, 1 + (W - KW) // self.stride) 344 | Y = np.zeros(out_shape) 345 | for n in range(N): 346 | for c_w in range(self.out_channels): 347 | for h, w in product(range(out_shape[2]), range(out_shape[3])): 348 | h_offset, w_offset = h * self.stride, w * self.stride 349 | rec_field = X[n, :, h_offset : h_offset + KH, w_offset : w_offset + KW] 350 | Y[n, c_w, h, w] = ( 351 | np.sum(self.weight["W"][c_w] * rec_field) + self.weight["b"][c_w] 352 | ) 353 | 354 | return Y 355 | 356 | def backward(self, dY): 357 | # calculating the global gradient to be propagated backwards 358 | # TODO: this is actually transpose convolution, move this to a util function 359 | X = self.cache["X"] 360 | dX = np.zeros_like(X) 361 | N, C, H, W = dX.shape 362 | KH, KW = self.kernel_size 363 | for n in range(N): 364 | for c_w in range(self.out_channels): 365 | for h, w in product(range(dY.shape[2]), range(dY.shape[3])): 366 | h_offset, w_offset = h * self.stride, w * self.stride 367 | dX[n, :, h_offset : h_offset + KH, w_offset : w_offset + KW] += ( 368 | self.weight["W"][c_w] * dY[n, c_w, h, w] 369 | ) 370 | 371 | # calculating the global gradient wrt the conv filter weights 372 | dW = np.zeros_like(self.weight["W"]) 373 | for c_w in range(self.out_channels): 374 | for c_i in range(self.in_channels): 375 | for h, w in product(range(KH), range(KW)): 376 | X_rec_field = X[ 377 | :, c_i, h : H - KH + h + 1 : self.stride, w : W - KW + w + 1 : self.stride 378 | ] 379 | dY_rec_field = dY[:, c_w] 380 | dW[c_w, c_i, h, w] = np.sum(X_rec_field * dY_rec_field) 381 | 382 | # calculating the global gradient wrt to the bias 383 | db = np.sum(dY, axis=(0, 2, 3)).reshape(-1, 1) 384 | 385 | # caching the global gradients of the parameters 386 | self.weight_update["W"] = dW 387 | self.weight_update["b"] = db 388 | 389 | return dX[:, :, self.padding : -self.padding, self.padding : -self.padding] 390 | -------------------------------------------------------------------------------- /nn/losses.py: -------------------------------------------------------------------------------- 1 
| import numpy as np 2 | from .layers import Function 3 | 4 | 5 | class Loss(Function): 6 | def forward(self, X, Y): 7 | """ 8 | Computes the loss of x with respect to y. 9 | 10 | Args: 11 | X: numpy.ndarray of shape (n_batch, n_dim). 12 | Y: numpy.ndarray of shape (n_batch, n_dim). 13 | 14 | Returns: 15 | loss: numpy.float. 16 | """ 17 | pass 18 | 19 | def backward(self): 20 | """ 21 | Backward pass for the loss function. Since it should be the final layer 22 | of an architecture, no input is needed for the backward pass. 23 | 24 | Returns: 25 | gradX: numpy.ndarray of shape (n_batch, n_dim). Local gradient of the loss. 26 | """ 27 | return self.grad["X"] 28 | 29 | def local_grad(self, X, Y): 30 | """ 31 | Local gradient with respect to X at (X, Y). 32 | 33 | Args: 34 | X: numpy.ndarray of shape (n_batch, n_dim). 35 | Y: numpy.ndarray of shape (n_batch, n_dim). 36 | 37 | Returns: 38 | gradX: numpy.ndarray of shape (n_batch, n_dim). 39 | """ 40 | pass 41 | 42 | 43 | class MeanSquareLoss(Loss): 44 | def forward(self, X, Y): 45 | """ 46 | Computes the mean square error of X with respect to Y. 47 | 48 | Args: 49 | X: numpy.ndarray of shape (n_batch, n_dim). 50 | Y: numpy.ndarray of shape (n_batch, n_dim). 51 | 52 | Returns: 53 | mse_loss: numpy.float. Mean square error of x with respect to y. 54 | """ 55 | # calculating loss 56 | sum = np.sum((X - Y) ** 2, axis=1, keepdims=True) 57 | mse_loss = np.mean(sum) 58 | return mse_loss 59 | 60 | def local_grad(self, X, Y): 61 | """ 62 | Local gradient with respect to X at (X, Y). 63 | 64 | Args: 65 | X: numpy.ndarray of shape (n_batch, n_dim). 66 | Y: numpy.ndarray of shape (n_batch, n_dim). 67 | 68 | Returns: 69 | gradX: numpy.ndarray of shape (n_batch, n_dim). Gradient of MSE wrt X at X and Y. 70 | """ 71 | grads = {"X": 2 * (X - Y) / X.shape[0]} 72 | return grads 73 | 74 | 75 | class CrossEntropyLoss(Loss): 76 | def forward(self, X, y): 77 | """ 78 | Computes the cross entropy loss of x with respect to y. 79 | 80 | Args: 81 | X: numpy.ndarray of shape (n_batch, n_dim). 82 | y: numpy.ndarray of shape (n_batch, 1). Should contain class labels 83 | for each data point in x. 84 | 85 | Returns: 86 | crossentropy_loss: numpy.float. Cross entropy loss of x with respect to y. 
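
        Note: X is expected to hold raw, unnormalized class scores (logits);
        the softmax normalization is applied inside this function. As an
        illustration, CrossEntropyLoss()(np.array([[2.0, 1.0]]), np.array([[0]]))
        evaluates to roughly 0.31.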
87 | """ 88 | # calculating crossentropy 89 | exp_x = np.exp(X) 90 | probs = exp_x / np.sum(exp_x, axis=1, keepdims=True) 91 | log_probs = -np.log([probs[i, y[i]] for i in range(len(probs))]) 92 | crossentropy_loss = np.mean(log_probs) 93 | 94 | # caching for backprop 95 | self.cache["probs"] = probs 96 | self.cache["y"] = y 97 | 98 | return crossentropy_loss 99 | 100 | def local_grad(self, X, Y): 101 | probs = self.cache["probs"] 102 | ones = np.zeros_like(probs) 103 | for row_idx, col_idx in enumerate(Y): 104 | ones[row_idx, col_idx] = 1.0 105 | 106 | grads = {"X": (probs - ones) / float(len(X))} 107 | return grads 108 | -------------------------------------------------------------------------------- /nn/net.py: -------------------------------------------------------------------------------- 1 | from .losses import Loss 2 | from .layers import Function, Layer 3 | 4 | 5 | class Net: 6 | __slots__ = ["layers", "loss_fn"] 7 | 8 | def __init__(self, layers, loss): 9 | assert isinstance(loss, Loss), "loss must be an instance of nn.losses.Loss" 10 | for layer in layers: 11 | assert isinstance(layer, Function), ( 12 | "layer should be an instance of " "nn.layers.Function or nn.layers.Layer" 13 | ) 14 | 15 | self.layers = layers 16 | self.loss_fn = loss 17 | 18 | def __call__(self, *args, **kwargs): 19 | return self.forward(*args, **kwargs) 20 | 21 | def forward(self, x): 22 | """ 23 | Calculates the forward pass by propagating the input through the 24 | layers. 25 | 26 | Args: 27 | x: numpy.ndarray. Input of the net. 28 | 29 | Returns: 30 | output: numpy.ndarray. Output of the net. 31 | """ 32 | for layer in self.layers: 33 | x = layer(x) 34 | return x 35 | 36 | def loss(self, x, y): 37 | """ 38 | Calculates the loss of the forward pass output with respect to y. 39 | Should be called after forward pass. 40 | 41 | Args: 42 | x: numpy.ndarray. Output of the forward pass. 43 | y: numpy.ndarray. Ground truth. 44 | 45 | Returns: 46 | loss: numpy.float. Loss value. 47 | """ 48 | loss = self.loss_fn(x, y) 49 | return loss 50 | 51 | def backward(self): 52 | """ 53 | Complete backward pass for the net. Should be called after the forward 54 | pass and the loss are calculated. 55 | 56 | Returns: 57 | d: numpy.ndarray of shape matching the input during forward pass. 58 | """ 59 | d = self.loss_fn.backward() 60 | for layer in reversed(self.layers): 61 | d = layer.backward(d) 62 | return d 63 | 64 | def update_weights(self, lr): 65 | """ 66 | Updates the weights for all layers using the corresponding gradients 67 | computed during backpropagation. 68 | 69 | Args: 70 | lr: float. Learning rate. 71 | """ 72 | for layer in self.layers: 73 | if isinstance(layer, Layer): 74 | layer._update_weights(lr) 75 | -------------------------------------------------------------------------------- /nn/utils.py: -------------------------------------------------------------------------------- 1 | import os 2 | import numpy as np 3 | 4 | from skimage import io 5 | 6 | 7 | def zero_pad(X, pad_width, dims): 8 | """ 9 | Pads the given array X with zeroes at the both end of given dims. 10 | 11 | Args: 12 | X: numpy.ndarray. 13 | pad_width: int, width of the padding. 14 | dims: int or tuple, dimensions to be padded. 15 | 16 | Returns: 17 | X_padded: numpy.ndarray, zero padded X. 
18 | """ 19 | dims = (dims) if isinstance(dims, int) else dims 20 | pad = [(0, 0) if idx not in dims else (pad_width, pad_width) for idx in range(len(X.shape))] 21 | X_padded = np.pad(X, pad, "constant") 22 | return X_padded 23 | 24 | 25 | def load_data(folder_path): 26 | imgs = [] 27 | labels = [] 28 | for class_dir in os.listdir(folder_path): 29 | class_label = int(class_dir) - 1 30 | class_path = os.path.join(folder_path, class_dir) 31 | imgs.append( 32 | np.array( 33 | [ 34 | io.imread(os.path.join(class_path, fname)).transpose((2, 0, 1)) 35 | for fname in os.listdir(class_path) 36 | ] 37 | ) 38 | ) 39 | labels.append(np.array([class_label] * len(os.listdir(class_path)))) 40 | 41 | X = np.concatenate(imgs, axis=0) 42 | y = np.concatenate(labels).reshape(-1, 1) 43 | 44 | return X, y 45 | -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | numpy 2 | scikit-image -------------------------------------------------------------------------------- /tests/core_tests.py: -------------------------------------------------------------------------------- 1 | import os 2 | import sys 3 | 4 | sys.path.append("..") 5 | import unittest 6 | 7 | from itertools import product 8 | 9 | from nn.layers import Linear 10 | 11 | 12 | class Test(unittest.TestCase): 13 | def test_linear(self): 14 | for in_dim, out_dim, n_batch in product(range(1, 10), range(1, 10), range(1, 10)): 15 | linear = Linear(in_dim, out_dim) 16 | x = np.random.rand(n_batch, in_dim) 17 | y = linear.forward(x) 18 | self.assertEqual(y.shape, (in_dim, out_dim)) 19 | --------------------------------------------------------------------------------