├── .gitignore
├── LICENSE
├── README.md
├── cnn_custom_dataset.py
├── cnn_mnist.py
├── mlp.py
├── nn
│   ├── __init__.py
│   ├── activations.py
│   ├── functional.py
│   ├── layers.py
│   ├── losses.py
│   ├── net.py
│   └── utils.py
├── requirements.txt
└── tests
    └── core_tests.py
/.gitignore:
--------------------------------------------------------------------------------
1 | .idea
2 | *.xml
3 | __pycache__
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | MIT License
2 |
3 | Copyright (c) 2020 Tivadar Danka
4 |
5 | Permission is hereby granted, free of charge, to any person obtaining a copy
6 | of this software and associated documentation files (the "Software"), to deal
7 | in the Software without restriction, including without limitation the rights
8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 |
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 |
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # neural-networks-from-scratch
2 |
3 | # Contents
4 | - [Quickstart](#quickstart)
5 | - [A simple example CNN](#a-simple-example-cnn)
6 | - [The `Net` object](#the-net-object)
7 | - [Layers](#layers)
8 | - [`Linear`](#linear)
9 | - [`Conv2D`](#conv2d)
10 | - [`MaxPool2D`](#maxpool2d)
11 | - [`BatchNorm2D`](#batchnorm2d)
12 | - [`Flatten`](#flatten)
13 | - [Losses](#losses)
14 | - [`CrossEntropyLoss`](#crossentropyloss)
15 | - [`MeanSquareLoss`](#meansquareloss)
16 | - [Activations](#activations)
17 |
18 | # Quickstart
19 |
20 | ## Installation
21 | To run the examples, creating a virtual environment is recommended.
22 | ```bash
23 | virtualenv neural-networks-from-scratch
24 | ```
25 | When a virtual environment is in place, all requirements can be installed with pip.
26 | ```bash
27 | source neural-networks-from-scratch/bin/activate
28 | pip install -r requirements.txt
29 | ```
30 |
31 |
32 | ## A simple example CNN
33 | A simple convolutional network for image classification can be found in `cnn_custom_dataset.py`. To try it on your own dataset, arrange your images in the following directory structure:
34 | ```bash
35 | images_folder
36 | |-- class_01
37 | |-- 001.png
38 | |-- ...
39 | |-- class_02
40 | |-- 001.png
41 | |-- ...
42 | |-- ...
43 | ```
44 | Its single required argument is
45 | - `--dataset`: path to the dataset,
46 |
47 | while the optional arguments are
48 | - `--epochs`: number of training epochs (default: 100),
49 | - `--batch_size`: size of the training batches (default: 100),
50 | - `--lr`: learning rate (default: 0.01).
51 |
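For example, assuming the prepared images live in a folder called `images_folder` next to the script (the path and the hyperparameter values below are illustrative), a training run could be started with:
```bash
python cnn_custom_dataset.py --dataset ./images_folder --epochs 50 --batch_size 64 --lr 0.01
```
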
52 | ## The `Net` object
53 | To define a neural network, the `nn.net.Net` object can be used. Its parameters are
54 | * `layers`: a list of layers from `nn.layers`, for example `[Linear(2, 4), ReLU(), Linear(4, 2)]`,
55 | * `loss`: a loss function from `nn.losses`, for example `CrossEntropyLoss` or `MeanSquareLoss`.

56 | If you would like to train the model with data `X` and label `y`, you should
57 | 1) perform the forward pass, during which local gradients are calculated,
58 | 2) calculate the loss,
59 | 3) perform the backward pass, where global gradients with respect to the variables and layer parameters are calculated,
60 | 4) update the weights.
61 |
62 | In code, this looks like the following:
63 | ```python3
64 | out = net(X)
65 | loss = net.loss(out, y)
66 | net.backward()
67 | net.update_weights(lr)
68 | ```
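
Here `net` is assumed to have been constructed beforehand, for example as a small fully connected classifier similar to the one in `mlp.py`:
```python3
from nn.net import Net
from nn.layers import Linear
from nn.activations import ReLU
from nn.losses import CrossEntropyLoss

net = Net(
    layers=[Linear(2, 4), ReLU(), Linear(4, 2)],
    loss=CrossEntropyLoss(),
)
```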
69 |
70 | # Layers
71 | The currently implemented layers can be found in `nn.layers`. Each layer is a callable object, where calling performs the forward pass and calculates local gradients. The most important methods are:
72 | - `.forward(X)`: performs the forward pass for the input `X`. Instead of calling `forward` directly, the layer object itself should be called, which also calculates and caches the local gradients.
73 | - `.backward(dY)`: performs the backward pass, where `dY` is the gradient propagated back from the subsequent layer.
74 | - `.local_grad(X)`: calculates the local gradient of the input.
75 |
76 | The input to the layers should always be a `numpy.ndarray` of shape `(n_batch, ...)`. For the 2D layers for images, the input should have shape `(n_batch, n_channels, n_height, n_width)`.
77 |
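As a minimal sketch of this calling convention (the layer and the shapes are chosen arbitrarily):
```python3
import numpy as np
from nn.activations import ReLU

act = ReLU()
X = np.random.randn(4, 3)
Y = act(X)                           # calling the object runs forward() and caches the local gradient
dX = act.backward(np.ones_like(Y))   # chains an upstream gradient of ones back through the layer
```
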
78 | ## `Linear`
79 | A simple fully connected layer.
80 | Parameters:
81 | - `in_dim`: integer, dimensions of the input.
82 | - `out_dim`: integer, dimensions of the output.
83 |
84 | Usage:
85 | - input: `numpy.ndarray` of shape `(N, in_dim)`.
86 | - output: `numpy.ndarray` of shape `(N, out_dim)`.
87 |
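A quick shape check with arbitrary dimensions:
```python3
import numpy as np
from nn.layers import Linear

fc = Linear(in_dim=3, out_dim=5)
X = np.random.rand(8, 3)
print(fc(X).shape)   # (8, 5)
```
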
88 | ## `Conv2D`
89 | 2D convolutional layer. Parameters:
90 | - `in_channels`: integer, number of channels in the input image.
91 | - `out_channels`: integer, number of filters to be learned.
92 | - `kernel_size`: integer or tuple, the size of the filter to be learned. Defaults to 3.
93 | - `stride`: integer, stride of the convolution. Defaults to 1.
94 | - `padding`: integer, number of zeros to be added to each edge of the images. Defaults to 0.
95 |
96 | Usage:
97 | - input: `numpy.ndarray` of shape `(N, C_in, H_in, W_in)`.
98 | - output: `numpy.ndarray` of shape `(N, C_out, H_out, W_out)`.
99 |
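The output spatial size follows the usual convolution arithmetic, `H_out = 1 + (H_in + 2*padding - KH) // stride` (and analogously for the width). A quick shape check with arbitrary sizes:
```python3
import numpy as np
from nn.layers import Conv2D

conv = Conv2D(in_channels=3, out_channels=8, kernel_size=3, padding=1)
X = np.random.rand(2, 3, 16, 16)
print(conv(X).shape)   # (2, 8, 16, 16): padding=1 preserves the spatial size for a 3x3 kernel
```
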
100 | ## `MaxPool2D`
101 | 2D max pooling layer. Parameters:
102 | - `kernel_size`: integer or tuple, size of the pooling window. Defaults to 2.
103 |
104 | Usage:
105 | - input: `numpy.ndarray` of shape `(N, C, H, W)`.
106 | - output: `numpy.ndarray` of shape `(N, C, H//KH, W//KW)` with kernel size `(KH, KW)`.
107 |
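A quick shape check, with the spatial dimensions divided by the kernel size:
```python3
import numpy as np
from nn.layers import MaxPool2D

pool = MaxPool2D(kernel_size=2)
X = np.random.rand(2, 8, 16, 16)
print(pool(X).shape)   # (2, 8, 8, 8)
```
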
108 | ## `BatchNorm2D`
109 | 2D batch normalization layer. Parameters:
110 | - `n_channels`: integer, number of channels.
111 | - `epsilon`: epsilon parameter for BatchNorm, defaults to 1e-5.
112 |
113 | Usage:
114 | - input: `numpy.ndarray` of shape `(N, C, H, W)`.
115 | - output: `numpy.ndarray` of shape `(N, C, H, W)`.
116 |
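For example, with arbitrary sizes:
```python3
import numpy as np
from nn.layers import BatchNorm2D

bn = BatchNorm2D(n_channels=8)
X = np.random.rand(2, 8, 16, 16)
print(bn(X).shape)   # (2, 8, 16, 16): the shape is unchanged
```
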
117 | ## `Flatten`
118 | A simple layer which flattens the outputs of a 2D layer for images.
119 |
120 | Usage:
121 | - input: `numpy.ndarray` of shape `(N, C, H, W)`.
122 | - output: `numpy.ndarray` of shape `(N, C*H*W)`.
123 |
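For example:
```python3
import numpy as np
from nn.layers import Flatten

flatten = Flatten()
X = np.random.rand(2, 8, 16, 16)
print(flatten(X).shape)   # (2, 2048), i.e. (N, C*H*W)
```
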
124 | # Losses
125 | The implemented loss functions are located in `nn.losses`. Like the layers, they are callable objects, taking predictions and targets as input.
126 |
127 | ## `CrossEntropyLoss`
128 | Cross-entropy loss. Usage:
129 | - input: a `numpy.ndarray` of shape `(N, D)` containing the class scores for each element in the batch, and the ground-truth class labels for each element.
130 | - output: `float`.
131 |
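A minimal usage sketch with made-up scores and labels:
```python3
import numpy as np
from nn.losses import CrossEntropyLoss

loss_fn = CrossEntropyLoss()
scores = np.random.randn(4, 3)            # unnormalized class scores for 4 samples and 3 classes
labels = np.array([[0], [2], [1], [2]])   # ground-truth class indices
print(loss_fn(scores, labels))            # a single float
```
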
132 | ## `MeanSquareLoss`
133 | Mean square loss. Usage:
134 | - input: two `numpy.ndarray`s of shape `(N, D)`, the predictions and the targets.
135 | - output: `float`.
136 |
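A minimal usage sketch with made-up predictions and targets:
```python3
import numpy as np
from nn.losses import MeanSquareLoss

loss_fn = MeanSquareLoss()
preds = np.random.rand(4, 2)
targets = np.random.rand(4, 2)
print(loss_fn(preds, targets))   # a single float
```
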
137 | # Activations
138 | The activation layers for the network can be found in `nn.activations`. They are `Function` objects, applying the given activation function elementwise to a `numpy.ndarray`. Currently, the following activation functions are implemented:
139 | - ReLU
140 | - Leaky ReLU
141 | - Sigmoid
142 |
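A row-wise `Softmax` activation is also available in `nn.activations` (it is used in `mlp.py`). A small usage sketch of the elementwise activations:
```python3
import numpy as np
from nn.activations import ReLU, LeakyReLU, Sigmoid

x = np.array([[-2.0, -0.5, 0.0, 1.5]])
print(ReLU()(x))       # negative entries are zeroed out
print(LeakyReLU()(x))  # negative entries are scaled by a small slope
print(Sigmoid()(x))    # every entry is squashed into (0, 1)
```
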
--------------------------------------------------------------------------------
/cnn_custom_dataset.py:
--------------------------------------------------------------------------------
1 | import numpy as np
2 |
3 | from argparse import ArgumentParser
4 |
5 | from nn.layers import *
6 | from nn.losses import CrossEntropyLoss
7 | from nn.activations import ReLU
8 | from nn.net import Net
9 | from nn.utils import load_data
10 |
11 | parser = ArgumentParser()
12 | parser.add_argument("--dataset", type=str, required=True)
13 | parser.add_argument("--epochs", type=int, default=100)
14 | parser.add_argument("--batch_size", type=int, default=100)
15 | parser.add_argument("--lr", type=float, default=1e-2)
16 | args = parser.parse_args()
17 |
18 | # load images
19 | print("loading data ...")
20 | X, y = load_data(args.dataset)
21 | print("data loaded")
22 | # scaling and converting to float
23 | print("scaling data...")
24 | X = X.astype("float32") / 255
25 | print("data scaled")
26 |
27 | # split to train and validation datasets
28 | idx = np.arange(len(X))
29 | np.random.shuffle(idx)
30 | val_split = int(len(X) * 0.9)
31 | X_train, y_train = X[idx[:val_split]], y[idx[:val_split]]
32 | X_val, y_val = X[idx[val_split:]], y[idx[val_split:]]
33 |
34 | net = Net(
35 | layers=[
36 | Conv2D(3, 8, 3, padding=1),
37 | MaxPool2D(kernel_size=2),
38 | ReLU(),
39 | BatchNorm2D(8),
40 | Conv2D(8, 16, 3, padding=1),
41 | MaxPool2D(kernel_size=2),
42 | ReLU(),
43 | BatchNorm2D(16),
44 | Flatten(),
45 | Linear(16 * 13 * 13, 12),
46 | ],
47 | loss=CrossEntropyLoss(),
48 | )
49 |
50 | n_epochs = args.epochs
51 | n_batch = args.batch_size
52 | for epoch_idx in range(n_epochs):
53 | batch_idx = np.random.choice(range(len(X_train)), size=n_batch, replace=False)
54 | out = net(X_train[batch_idx])
55 | preds = np.argmax(out, axis=1).reshape(-1, 1)
56 | accuracy = 100 * (preds == y_train[batch_idx]).sum() / n_batch
57 | loss = net.loss(out, y_train[batch_idx])
58 | net.backward()
59 | net.update_weights(lr=args.lr)
60 |     print("Epoch no. %d loss = %.4f \t accuracy = %d %%" % (epoch_idx + 1, loss, accuracy))
61 | if epoch_idx % 10 == 0:
62 | val_idx = np.random.choice(range(len(X_val)), size=n_batch, replace=False)
63 | val_out = net.forward(X_val[val_idx])
64 | val_pred = np.argmax(val_out, axis=1).reshape(-1, 1)
65 | val_loss = net.loss(val_out, y_val[val_idx])
66 | val_acc = 100 * (val_pred == y_val[val_idx]).sum() / n_batch
67 |         print("Validation loss = %.4f \t accuracy = %d %%" % (val_loss, val_acc))
68 |
--------------------------------------------------------------------------------
/cnn_mnist.py:
--------------------------------------------------------------------------------
1 | import numpy as np
2 |
3 | from nn.layers import *
4 | from nn.losses import CrossEntropyLoss
5 | from nn.activations import ReLU
6 | from nn.net import Net
7 |
8 | from keras.datasets import mnist
9 |
10 | net = Net(
11 | layers=[
12 | Conv2D(1, 4, 3, padding=1),
13 | MaxPool2D(kernel_size=2),
14 | ReLU(),
15 | BatchNorm2D(4),
16 | Conv2D(4, 8, 3, padding=1),
17 | MaxPool2D(kernel_size=2),
18 | ReLU(),
19 | BatchNorm2D(8),
20 | Flatten(),
21 | Linear(8 * 7 * 7, 10),
22 | ],
23 | loss=CrossEntropyLoss(),
24 | )
25 |
26 | (X_train, y_train), (X_test, y_test) = mnist.load_data()
27 | # reshaping
28 | X_train, X_test = X_train.reshape(-1, 1, 28, 28), X_test.reshape(-1, 1, 28, 28)
29 | y_train, y_test = y_train.reshape(-1, 1), y_test.reshape(-1, 1)
30 | # normalizing and scaling data
31 | X_train, X_test = X_train.astype("float32") / 255, X_test.astype("float32") / 255
32 |
33 | n_epochs = 1000
34 | n_batch = 100
35 | for epoch_idx in range(n_epochs):
36 | batch_idx = np.random.choice(range(len(X_train)), size=n_batch, replace=False)
37 | out = net(X_train[batch_idx])
38 | preds = np.argmax(out, axis=1).reshape(-1, 1)
39 | accuracy = 100 * (preds == y_train[batch_idx]).sum() / n_batch
40 | loss = net.loss(out, y_train[batch_idx])
41 | net.backward()
42 | net.update_weights(lr=0.01)
43 |     print("Epoch no. %d loss = %.4f \t accuracy = %d %%" % (epoch_idx + 1, loss, accuracy))
44 |
--------------------------------------------------------------------------------
/mlp.py:
--------------------------------------------------------------------------------
1 | import numpy as np
2 |
3 | import matplotlib.pyplot as plt
4 |
5 | from nn.layers import *
6 | from nn.losses import CrossEntropyLoss
7 | from nn.activations import ReLU, Softmax
8 | from nn.net import Net
9 |
10 |
11 | # functions for visualization
12 | def plot_data(X1, X2, export_path=None):
13 | with plt.style.context("seaborn-white"):
14 | plt.figure(figsize=(10, 10))
15 | plt.scatter(X1[:, 0], X1[:, 1], c="r", edgecolor="k")
16 | plt.scatter(X2[:, 0], X2[:, 1], c="b", edgecolor="k")
17 | plt.title("The data")
18 | if export_path is None:
19 | plt.show()
20 | else:
21 | plt.savefig(export_path, dpi=500)
22 |
23 |
24 | def make_grid(X_data, n_res=20):
25 | x_min, x_max = X_data[:, 0].min() - 0.5, X_data[:, 0].max() + 0.5
26 | y_min, y_max = X_data[:, 1].min() - 0.5, X_data[:, 1].max() + 0.5
27 | x_meshgrid, y_meshgrid = np.meshgrid(
28 | np.linspace(x_min, x_max, n_res), np.linspace(y_min, y_max, n_res)
29 | )
30 |
31 | X_grid = np.concatenate((x_meshgrid.reshape(-1, 1), y_meshgrid.reshape(-1, 1)), axis=1)
32 |
33 | return x_meshgrid, y_meshgrid, X_grid
34 |
35 |
36 | def plot_classifier(net, X_data, x_meshgrid, y_meshgrid, X_grid, export_path=None):
37 | y_grid = Softmax()(net(X_grid))[:, 0].reshape(x_meshgrid.shape)
38 | y_data = net(X_data)
39 | preds = np.argmax(y_data, axis=1)
40 |
41 | with plt.style.context("seaborn-white"):
42 | plt.figure(figsize=(5, 5))
43 | plt.scatter(X_data[preds == 0, 0], X_data[preds == 0, 1], c="b", zorder=1, edgecolor="k")
44 | plt.scatter(X_data[preds == 1, 0], X_data[preds == 1, 1], c="r", zorder=1, edgecolor="k")
45 | plt.contourf(x_meshgrid, y_meshgrid, y_grid, zorder=0, cmap="RdBu")
46 | if not export_path:
47 | plt.show()
48 | else:
49 | plt.savefig(export_path, dpi=500)
50 |
51 | plt.close("all")
52 |
53 |
54 | # generating some data
55 | n_class_size = 100
56 | r = 2
57 | X1_offset = np.random.rand(n_class_size, 2) - 0.5
58 | X1_offset = r * X1_offset / np.sqrt(np.sum(X1_offset ** 2, axis=1, keepdims=True))
60 | X1 = np.random.multivariate_normal([0, 0], [[0.1, 0], [0, 0.1]], size=n_class_size) + X1_offset
61 | X2 = np.random.multivariate_normal([0, 0], [[0.1, 0], [0, 0.1]], size=n_class_size)
62 |
63 | X = np.concatenate((X1, X2))
64 | Y_labels = np.array([0] * n_class_size + [1] * n_class_size)
65 |
66 | plot_data(X1, X2)
67 | # make meshgrid
68 | x_meshgrid, y_meshgrid, X_grid = make_grid(X, n_res=100)
69 |
70 | net = Net(
71 | layers=[Linear(2, 4), ReLU(), Linear(4, 2), Softmax()], loss=CrossEntropyLoss()
72 | )
73 |
74 | n_epochs = 10000
75 | for epoch_idx in range(n_epochs):
76 | print("Epoch no. %d" % epoch_idx)
77 | out = net(X)
78 | # prediction accuracy
79 | pred = np.argmax(out, axis=1)
80 | print("accuracy: %1.4f" % (1 - np.abs(pred - Y_labels).sum() / 200))
81 | loss = net.loss(out, Y_labels)
82 | print("loss: %1.4f" % loss)
83 | grad = net.backward()
84 | net.update_weights(0.1)
85 | if epoch_idx % 1000 == 0:
86 | plot_classifier(net, X, x_meshgrid, y_meshgrid, X_grid)
87 |
--------------------------------------------------------------------------------
/nn/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/cosmic-cortex/neural-networks-from-scratch/8cb53a47a56455df0dc2bf3b3e8ddcc4318ab208/nn/__init__.py
--------------------------------------------------------------------------------
/nn/activations.py:
--------------------------------------------------------------------------------
1 | import numpy as np
2 | from .functional import *
3 | from .layers import Function
4 |
5 |
6 | class Sigmoid(Function):
7 | def forward(self, X):
8 | return sigmoid(X)
9 |
10 | def backward(self, dY):
11 | return dY * self.grad["X"]
12 |
13 | def local_grad(self, X):
14 | grads = {"X": sigmoid_prime(X)}
15 | return grads
16 |
17 |
18 | class ReLU(Function):
19 | def forward(self, X):
20 | return relu(X)
21 |
22 | def backward(self, dY):
23 | return dY * self.grad["X"]
24 |
25 | def local_grad(self, X):
26 | grads = {"X": relu_prime(X)}
27 | return grads
28 |
29 |
30 | class LeakyReLU(Function):
31 | def forward(self, X):
32 | return leaky_relu(X)
33 |
34 | def backward(self, dY):
35 | return dY * self.grad["X"]
36 |
37 | def local_grad(self, X):
38 | grads = {"X": leaky_relu_prime(X)}
39 | return grads
40 |
41 |
42 | class Softmax(Function):
43 | def forward(self, X):
44 |         exp_x = np.exp(X - np.max(X, axis=1, keepdims=True))  # shift by the row max for numerical stability
45 | probs = exp_x / np.sum(exp_x, axis=1, keepdims=True)
46 | self.cache["X"] = X
47 | self.cache["output"] = probs
48 | return probs
49 |
50 | def backward(self, dY):
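        # per-sample vector-Jacobian product: each row of the upstream gradient dY
        # is multiplied by the softmax Jacobian cached for that sample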
51 | dX = []
52 |
53 | for dY_row, grad_row in zip(dY, self.grad["X"]):
54 | dX.append(np.dot(dY_row, grad_row))
55 |
56 | return np.array(dX)
57 |
58 | def local_grad(self, X):
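        # for every sample, the local gradient is the softmax Jacobian:
        # J = diag(p) - p p^T, where p is the cached output probability vector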
59 | grad = []
60 |
61 | for prob in self.cache["output"]:
62 | prob = prob.reshape(-1, 1)
63 | grad_row = -np.dot(prob, prob.T)
64 | grad_row_diagonal = prob * (1 - prob)
65 | np.fill_diagonal(grad_row, grad_row_diagonal)
66 | grad.append(grad_row)
67 |
68 | grad = np.array(grad)
69 | return {"X": grad}
70 |
--------------------------------------------------------------------------------
/nn/functional.py:
--------------------------------------------------------------------------------
1 | import numpy as np
2 |
3 |
4 | def sigmoid(x):
5 | return 1 / (1 + np.exp(-x))
6 |
7 |
8 | def sigmoid_prime(x):
9 | s = sigmoid(x)
10 | return s * (1 - s)
11 |
12 |
13 | def relu(x):
14 | return x * (x > 0)
15 |
16 |
17 | def relu_prime(x):
18 | return 1 * (x > 0)
19 |
20 |
21 | def leaky_relu(x, alpha=0.01):
22 | return x * (x > 0) + alpha * x * (x <= 0)
23 |
24 |
25 | def leaky_relu_prime(x, alpha=0.01):
26 | return 1 * (x > 0) + alpha * (x <= 0)
27 |
--------------------------------------------------------------------------------
/nn/layers.py:
--------------------------------------------------------------------------------
1 | import numpy as np
2 |
3 | from math import sqrt
4 | from itertools import product
5 |
6 | from .utils import zero_pad
7 |
8 |
9 | class Function:
10 | """
11 | Abstract model of a differentiable function.
12 | """
13 |
14 | def __init__(self, *args, **kwargs):
15 | # initializing cache for intermediate results
16 | # helps with gradient calculation in some cases
17 | self.cache = {}
18 | # cache for gradients
19 | self.grad = {}
20 |
21 | def __call__(self, *args, **kwargs):
22 | # calculating output
23 | output = self.forward(*args, **kwargs)
24 | # calculating and caching local gradients
25 | self.grad = self.local_grad(*args, **kwargs)
26 | return output
27 |
28 | def forward(self, *args, **kwargs):
29 | """
30 | Forward pass of the function. Calculates the output value and the
31 | gradient at the input as well.
32 | """
33 | pass
34 |
35 | def backward(self, *args, **kwargs):
36 | """
37 |         Backward pass. Given the gradient flowing back from the next layer,
38 |         computes the global gradient with respect to the input.
39 | """
40 | pass
41 |
42 | def local_grad(self, *args, **kwargs):
43 | """
44 | Calculates the local gradients of the function at the given input.
45 |
46 | Returns:
47 | grad: dictionary of local gradients.
48 | """
49 | pass
50 |
51 |
52 | class Layer(Function):
53 | """
54 | Abstract model of a neural network layer. In addition to Function, a Layer
55 | also has weights and gradients with respect to the weights.
56 | """
57 |
58 | def __init__(self, *args, **kwargs):
59 | super().__init__(*args, **kwargs)
60 | self.weight = {}
61 | self.weight_update = {}
62 |
63 | def _init_weights(self, *args, **kwargs):
64 | pass
65 |
66 | def _update_weights(self, lr):
67 | """
68 | Updates the weights using the corresponding _global_ gradients computed during
69 | backpropagation.
70 |
71 | Args:
72 | lr: float. Learning rate.
73 | """
74 | for weight_key, weight in self.weight.items():
75 | self.weight[weight_key] = self.weight[weight_key] - lr * self.weight_update[weight_key]
76 |
77 |
78 | class Flatten(Function):
79 | def forward(self, X):
80 | self.cache["shape"] = X.shape
81 | n_batch = X.shape[0]
82 | return X.reshape(n_batch, -1)
83 |
84 | def backward(self, dY):
85 | return dY.reshape(self.cache["shape"])
86 |
87 |
88 | class MaxPool2D(Function):
89 | def __init__(self, kernel_size=(2, 2)):
90 | super().__init__()
91 | self.kernel_size = (
92 | (kernel_size, kernel_size) if isinstance(kernel_size, int) else kernel_size
93 | )
94 |
95 | def __call__(self, X):
96 |         # in contrast to other Function subclasses, MaxPool2D does not need to call
97 | # .local_grad() after forward pass because the gradient is calculated during it
98 | return self.forward(X)
99 |
100 | def forward(self, X):
101 | N, C, H, W = X.shape
102 | KH, KW = self.kernel_size
103 |
104 | grad = np.zeros_like(X)
105 | Y = np.zeros((N, C, H // KH, W // KW))
106 |
107 | # for n in range(N):
108 | for h, w in product(range(0, H // KH), range(0, W // KW)):
109 | h_offset, w_offset = h * KH, w * KW
110 | rec_field = X[:, :, h_offset : h_offset + KH, w_offset : w_offset + KW]
111 | Y[:, :, h, w] = np.max(rec_field, axis=(2, 3))
112 | for kh, kw in product(range(KH), range(KW)):
113 | grad[:, :, h_offset + kh, w_offset + kw] = (
114 | X[:, :, h_offset + kh, w_offset + kw] >= Y[:, :, h, w]
115 | )
116 |
117 | # storing the gradient
118 | self.grad["X"] = grad
119 |
120 | return Y
121 |
122 | def backward(self, dY):
123 | dY = np.repeat(
124 | np.repeat(dY, repeats=self.kernel_size[0], axis=2), repeats=self.kernel_size[1], axis=3
125 | )
126 | return self.grad["X"] * dY
127 |
128 | def local_grad(self, X):
129 | # small hack: because for MaxPool calculating the gradient is simpler during
130 | # the forward pass, it is calculated there and this function just returns the
131 | # grad dictionary
132 | return self.grad
133 |
134 |
135 | class BatchNorm2D(Layer):
136 | def __init__(self, n_channels, epsilon=1e-5):
137 | super().__init__()
138 | self.epsilon = epsilon
139 | self.n_channels = n_channels
140 | self._init_weights(n_channels)
141 |
142 | def _init_weights(self, n_channels):
143 | self.weight["gamma"] = np.ones(shape=(1, n_channels, 1, 1))
144 | self.weight["beta"] = np.zeros(shape=(1, n_channels, 1, 1))
145 |
146 | def forward(self, X):
147 | """
148 | Forward pass for the 2D batchnorm layer.
149 |
150 | Args:
151 | X: numpy.ndarray of shape (n_batch, n_channels, height, width).
152 |
153 |         Returns:
154 | Y: numpy.ndarray of shape (n_batch, n_channels, height, width).
155 | Batch-normalized tensor of X.
156 | """
157 | mean = np.mean(X, axis=(2, 3), keepdims=True)
158 | var = np.var(X, axis=(2, 3), keepdims=True) + self.epsilon
159 | invvar = 1.0 / var
160 | sqrt_invvar = np.sqrt(invvar)
161 | centered = X - mean
162 | scaled = centered * sqrt_invvar
163 | normalized = scaled * self.weight["gamma"] + self.weight["beta"]
164 |
165 | # caching intermediate results for backprop
166 | self.cache["mean"] = mean
167 | self.cache["var"] = var
168 | self.cache["invvar"] = invvar
169 | self.cache["sqrt_invvar"] = sqrt_invvar
170 | self.cache["centered"] = centered
171 | self.cache["scaled"] = scaled
172 | self.cache["normalized"] = normalized
173 |
174 | return normalized
175 |
176 | def backward(self, dY):
177 | """
178 | Backward pass for the 2D batchnorm layer. Calculates global gradients
179 | for the input and the parameters.
180 |
181 | Args:
182 | dY: numpy.ndarray of shape (n_batch, n_channels, height, width).
183 |
184 | Returns:
185 | dX: numpy.ndarray of shape (n_batch, n_channels, height, width).
186 | Global gradient wrt the input X.
187 | """
188 | # global gradients of parameters
189 | dgamma = np.sum(self.cache["scaled"] * dY, axis=(0, 2, 3), keepdims=True)
190 | dbeta = np.sum(dY, axis=(0, 2, 3), keepdims=True)
191 |
192 | # caching global gradients of parameters
193 | self.weight_update["gamma"] = dgamma
194 | self.weight_update["beta"] = dbeta
195 |
196 | # global gradient of the input
197 | dX = self.grad["X"] * dY
198 |
199 | return dX
200 |
201 | def local_grad(self, X):
202 | """
203 | Calculates the local gradient for X.
204 |
205 | Args:
206 |             X: numpy.ndarray of shape (n_batch, n_channels, height, width).
207 |
208 | Returns:
209 | grads: dictionary of gradients.
210 | """
211 | # global gradient of the input
212 | N, C, H, W = X.shape
213 | # ppc = pixels per channel, useful variable for further computations
214 | ppc = H * W
215 |
216 | # gradient for 'denominator path'
217 | dsqrt_invvar = self.cache["centered"]
218 | dinvvar = (1.0 / (2.0 * np.sqrt(self.cache["invvar"]))) * dsqrt_invvar
219 | dvar = (-1.0 / self.cache["var"] ** 2) * dinvvar
220 | ddenominator = (X - self.cache["mean"]) * (2 * (ppc - 1) / ppc ** 2) * dvar
221 |
222 | # gradient for 'numerator path'
223 | dcentered = self.cache["sqrt_invvar"]
224 | dnumerator = (1.0 - 1.0 / ppc) * dcentered
225 |
226 | dX = ddenominator + dnumerator
227 | grads = {"X": dX}
228 | return grads
229 |
230 |
231 | class Linear(Layer):
232 | def __init__(self, in_dim, out_dim):
233 | super().__init__()
234 | self._init_weights(in_dim, out_dim)
235 |
236 | def _init_weights(self, in_dim, out_dim):
237 | scale = 1 / sqrt(in_dim)
238 | self.weight["W"] = scale * np.random.randn(in_dim, out_dim)
239 | self.weight["b"] = scale * np.random.randn(1, out_dim)
240 |
241 | def forward(self, X):
242 | """
243 | Forward pass for the Linear layer.
244 |
245 | Args:
246 | X: numpy.ndarray of shape (n_batch, in_dim) containing
247 | the input value.
248 |
249 | Returns:
250 | Y: numpy.ndarray of shape of shape (n_batch, out_dim) containing
251 | the output value.
252 | """
253 |
254 | output = np.dot(X, self.weight["W"]) + self.weight["b"]
255 |
256 | # caching variables for backprop
257 | self.cache["X"] = X
258 | self.cache["output"] = output
259 |
260 | return output
261 |
262 | def backward(self, dY):
263 | """
264 | Backward pass for the Linear layer.
265 |
266 | Args:
267 | dY: numpy.ndarray of shape (n_batch, n_out). Global gradient
268 | backpropagated from the next layer.
269 |
270 | Returns:
271 |             dX: numpy.ndarray of shape (n_batch, in_dim). Global gradient
272 |                 of the loss with respect to the input of the Linear layer.
273 | """
274 | # calculating the global gradient, to be propagated backwards
275 | dX = dY.dot(self.grad["X"].T)
276 | # calculating the global gradient wrt to weights
277 | X = self.cache["X"]
278 | dW = self.grad["W"].T.dot(dY)
279 | db = np.sum(dY, axis=0, keepdims=True)
280 | # caching the global gradients
281 | self.weight_update = {"W": dW, "b": db}
282 |
283 | return dX
284 |
285 | def local_grad(self, X):
286 | """
287 | Local gradients of the Linear layer at X.
288 |
289 | Args:
290 | X: numpy.ndarray of shape (n_batch, in_dim) containing the
291 | input data.
292 |
293 | Returns:
294 | grads: dictionary of local gradients with the following items:
295 | X: numpy.ndarray of shape (n_batch, in_dim).
296 | W: numpy.ndarray of shape (n_batch, in_dim).
297 |                 b: numpy.ndarray of shape (1, out_dim).
298 | """
299 | gradX_local = self.weight["W"]
300 | gradW_local = X
301 | gradb_local = np.ones_like(self.weight["b"])
302 | grads = {"X": gradX_local, "W": gradW_local, "b": gradb_local}
303 | return grads
304 |
305 |
306 | class Conv2D(Layer):
307 | def __init__(self, in_channels, out_channels, kernel_size=3, stride=1, padding=0):
308 | super().__init__()
309 | self.in_channels = in_channels
310 | self.out_channels = out_channels
311 | self.stride = stride
312 | self.kernel_size = (
313 | kernel_size if isinstance(kernel_size, tuple) else (kernel_size, kernel_size)
314 | )
315 | self.padding = padding
316 | self._init_weights(in_channels, out_channels, self.kernel_size)
317 |
318 | def _init_weights(self, in_channels, out_channels, kernel_size):
319 | scale = 2 / sqrt(in_channels * kernel_size[0] * kernel_size[1])
320 |
321 | self.weight = {
322 | "W": np.random.normal(scale=scale, size=(out_channels, in_channels, *kernel_size)),
323 | "b": np.zeros(shape=(out_channels, 1)),
324 | }
325 |
326 | def forward(self, X):
327 | """
328 | Forward pass for the convolution layer.
329 |
330 | Args:
331 | X: numpy.ndarray of shape (N, C, H_in, W_in).
332 |
333 | Returns:
334 | Y: numpy.ndarray of shape (N, F, H_out, W_out).
335 | """
336 | if self.padding:
337 | X = zero_pad(X, pad_width=self.padding, dims=(2, 3))
338 |
339 | self.cache["X"] = X
340 |
341 | N, C, H, W = X.shape
342 | KH, KW = self.kernel_size
343 | out_shape = (N, self.out_channels, 1 + (H - KH) // self.stride, 1 + (W - KW) // self.stride)
344 | Y = np.zeros(out_shape)
345 | for n in range(N):
346 | for c_w in range(self.out_channels):
347 | for h, w in product(range(out_shape[2]), range(out_shape[3])):
348 | h_offset, w_offset = h * self.stride, w * self.stride
349 | rec_field = X[n, :, h_offset : h_offset + KH, w_offset : w_offset + KW]
350 | Y[n, c_w, h, w] = (
351 | np.sum(self.weight["W"][c_w] * rec_field) + self.weight["b"][c_w]
352 | )
353 |
354 | return Y
355 |
356 | def backward(self, dY):
357 | # calculating the global gradient to be propagated backwards
358 | # TODO: this is actually transpose convolution, move this to a util function
359 | X = self.cache["X"]
360 | dX = np.zeros_like(X)
361 | N, C, H, W = dX.shape
362 | KH, KW = self.kernel_size
363 | for n in range(N):
364 | for c_w in range(self.out_channels):
365 | for h, w in product(range(dY.shape[2]), range(dY.shape[3])):
366 | h_offset, w_offset = h * self.stride, w * self.stride
367 | dX[n, :, h_offset : h_offset + KH, w_offset : w_offset + KW] += (
368 | self.weight["W"][c_w] * dY[n, c_w, h, w]
369 | )
370 |
371 | # calculating the global gradient wrt the conv filter weights
372 | dW = np.zeros_like(self.weight["W"])
373 | for c_w in range(self.out_channels):
374 | for c_i in range(self.in_channels):
375 | for h, w in product(range(KH), range(KW)):
376 | X_rec_field = X[
377 | :, c_i, h : H - KH + h + 1 : self.stride, w : W - KW + w + 1 : self.stride
378 | ]
379 | dY_rec_field = dY[:, c_w]
380 | dW[c_w, c_i, h, w] = np.sum(X_rec_field * dY_rec_field)
381 |
382 | # calculating the global gradient wrt to the bias
383 | db = np.sum(dY, axis=(0, 2, 3)).reshape(-1, 1)
384 |
385 | # caching the global gradients of the parameters
386 | self.weight_update["W"] = dW
387 | self.weight_update["b"] = db
388 |
389 |         # slicing with -0 would return an empty array, so only strip the padding when it is nonzero
390 |         if self.padding:
391 |             dX = dX[:, :, self.padding : -self.padding, self.padding : -self.padding]
392 |         return dX
393 |
--------------------------------------------------------------------------------
/nn/losses.py:
--------------------------------------------------------------------------------
1 | import numpy as np
2 | from .layers import Function
3 |
4 |
5 | class Loss(Function):
6 | def forward(self, X, Y):
7 | """
8 | Computes the loss of x with respect to y.
9 |
10 | Args:
11 | X: numpy.ndarray of shape (n_batch, n_dim).
12 | Y: numpy.ndarray of shape (n_batch, n_dim).
13 |
14 | Returns:
15 | loss: numpy.float.
16 | """
17 | pass
18 |
19 | def backward(self):
20 | """
21 | Backward pass for the loss function. Since it should be the final layer
22 | of an architecture, no input is needed for the backward pass.
23 |
24 | Returns:
25 | gradX: numpy.ndarray of shape (n_batch, n_dim). Local gradient of the loss.
26 | """
27 | return self.grad["X"]
28 |
29 | def local_grad(self, X, Y):
30 | """
31 | Local gradient with respect to X at (X, Y).
32 |
33 | Args:
34 | X: numpy.ndarray of shape (n_batch, n_dim).
35 | Y: numpy.ndarray of shape (n_batch, n_dim).
36 |
37 | Returns:
38 | gradX: numpy.ndarray of shape (n_batch, n_dim).
39 | """
40 | pass
41 |
42 |
43 | class MeanSquareLoss(Loss):
44 | def forward(self, X, Y):
45 | """
46 | Computes the mean square error of X with respect to Y.
47 |
48 | Args:
49 | X: numpy.ndarray of shape (n_batch, n_dim).
50 | Y: numpy.ndarray of shape (n_batch, n_dim).
51 |
52 | Returns:
53 | mse_loss: numpy.float. Mean square error of x with respect to y.
54 | """
55 | # calculating loss
56 |         sum_of_squares = np.sum((X - Y) ** 2, axis=1, keepdims=True)
57 |         mse_loss = np.mean(sum_of_squares)
58 | return mse_loss
59 |
60 | def local_grad(self, X, Y):
61 | """
62 | Local gradient with respect to X at (X, Y).
63 |
64 | Args:
65 | X: numpy.ndarray of shape (n_batch, n_dim).
66 | Y: numpy.ndarray of shape (n_batch, n_dim).
67 |
68 | Returns:
69 | gradX: numpy.ndarray of shape (n_batch, n_dim). Gradient of MSE wrt X at X and Y.
70 | """
71 | grads = {"X": 2 * (X - Y) / X.shape[0]}
72 | return grads
73 |
74 |
75 | class CrossEntropyLoss(Loss):
76 | def forward(self, X, y):
77 | """
78 | Computes the cross entropy loss of x with respect to y.
79 |
80 | Args:
81 | X: numpy.ndarray of shape (n_batch, n_dim).
82 | y: numpy.ndarray of shape (n_batch, 1). Should contain class labels
83 | for each data point in x.
84 |
85 | Returns:
86 | crossentropy_loss: numpy.float. Cross entropy loss of x with respect to y.
87 | """
88 | # calculating crossentropy
89 |         exp_x = np.exp(X - np.max(X, axis=1, keepdims=True))  # shift by the row max for numerical stability
90 | probs = exp_x / np.sum(exp_x, axis=1, keepdims=True)
91 | log_probs = -np.log([probs[i, y[i]] for i in range(len(probs))])
92 | crossentropy_loss = np.mean(log_probs)
93 |
94 | # caching for backprop
95 | self.cache["probs"] = probs
96 | self.cache["y"] = y
97 |
98 | return crossentropy_loss
99 |
100 | def local_grad(self, X, Y):
101 | probs = self.cache["probs"]
102 | ones = np.zeros_like(probs)
103 | for row_idx, col_idx in enumerate(Y):
104 | ones[row_idx, col_idx] = 1.0
105 |
106 | grads = {"X": (probs - ones) / float(len(X))}
107 | return grads
108 |
--------------------------------------------------------------------------------
/nn/net.py:
--------------------------------------------------------------------------------
1 | from .losses import Loss
2 | from .layers import Function, Layer
3 |
4 |
5 | class Net:
6 | __slots__ = ["layers", "loss_fn"]
7 |
8 | def __init__(self, layers, loss):
9 | assert isinstance(loss, Loss), "loss must be an instance of nn.losses.Loss"
10 | for layer in layers:
11 | assert isinstance(layer, Function), (
12 | "layer should be an instance of " "nn.layers.Function or nn.layers.Layer"
13 | )
14 |
15 | self.layers = layers
16 | self.loss_fn = loss
17 |
18 | def __call__(self, *args, **kwargs):
19 | return self.forward(*args, **kwargs)
20 |
21 | def forward(self, x):
22 | """
23 | Calculates the forward pass by propagating the input through the
24 | layers.
25 |
26 | Args:
27 | x: numpy.ndarray. Input of the net.
28 |
29 | Returns:
30 | output: numpy.ndarray. Output of the net.
31 | """
32 | for layer in self.layers:
33 | x = layer(x)
34 | return x
35 |
36 | def loss(self, x, y):
37 | """
38 | Calculates the loss of the forward pass output with respect to y.
39 | Should be called after forward pass.
40 |
41 | Args:
42 | x: numpy.ndarray. Output of the forward pass.
43 | y: numpy.ndarray. Ground truth.
44 |
45 | Returns:
46 | loss: numpy.float. Loss value.
47 | """
48 | loss = self.loss_fn(x, y)
49 | return loss
50 |
51 | def backward(self):
52 | """
53 | Complete backward pass for the net. Should be called after the forward
54 | pass and the loss are calculated.
55 |
56 | Returns:
57 | d: numpy.ndarray of shape matching the input during forward pass.
58 | """
59 | d = self.loss_fn.backward()
60 | for layer in reversed(self.layers):
61 | d = layer.backward(d)
62 | return d
63 |
64 | def update_weights(self, lr):
65 | """
66 | Updates the weights for all layers using the corresponding gradients
67 | computed during backpropagation.
68 |
69 | Args:
70 | lr: float. Learning rate.
71 | """
72 | for layer in self.layers:
73 | if isinstance(layer, Layer):
74 | layer._update_weights(lr)
75 |
--------------------------------------------------------------------------------
/nn/utils.py:
--------------------------------------------------------------------------------
1 | import os
2 | import numpy as np
3 |
4 | from skimage import io
5 |
6 |
7 | def zero_pad(X, pad_width, dims):
8 | """
9 |     Pads the given array X with zeros at both ends of the given dims.
10 |
11 | Args:
12 | X: numpy.ndarray.
13 | pad_width: int, width of the padding.
14 | dims: int or tuple, dimensions to be padded.
15 |
16 | Returns:
17 | X_padded: numpy.ndarray, zero padded X.
18 | """
19 |     dims = (dims,) if isinstance(dims, int) else dims
20 | pad = [(0, 0) if idx not in dims else (pad_width, pad_width) for idx in range(len(X.shape))]
21 | X_padded = np.pad(X, pad, "constant")
22 | return X_padded
23 |
24 |
25 | def load_data(folder_path):
26 | imgs = []
27 | labels = []
28 | for class_dir in os.listdir(folder_path):
29 | class_label = int(class_dir) - 1
30 | class_path = os.path.join(folder_path, class_dir)
31 | imgs.append(
32 | np.array(
33 | [
34 | io.imread(os.path.join(class_path, fname)).transpose((2, 0, 1))
35 | for fname in os.listdir(class_path)
36 | ]
37 | )
38 | )
39 | labels.append(np.array([class_label] * len(os.listdir(class_path))))
40 |
41 | X = np.concatenate(imgs, axis=0)
42 | y = np.concatenate(labels).reshape(-1, 1)
43 |
44 | return X, y
45 |
--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
1 | numpy
2 | scikit-image
3 | matplotlib
4 | keras
--------------------------------------------------------------------------------
/tests/core_tests.py:
--------------------------------------------------------------------------------
1 | import os
2 | import sys
3 |
4 | sys.path.append("..")
5 | import unittest
6 |
7 | from itertools import product
8 |
import numpy as np
9 | from nn.layers import Linear
10 |
11 |
12 | class Test(unittest.TestCase):
13 | def test_linear(self):
14 | for in_dim, out_dim, n_batch in product(range(1, 10), range(1, 10), range(1, 10)):
15 | linear = Linear(in_dim, out_dim)
16 | x = np.random.rand(n_batch, in_dim)
17 | y = linear.forward(x)
18 |                 self.assertEqual(y.shape, (n_batch, out_dim))
19 |


if __name__ == "__main__":
    unittest.main()
--------------------------------------------------------------------------------