├── README.md
├── model.py
└── sample_images
    ├── 0_wrong.png
    ├── 3.png
    ├── 4.png
    ├── 9.png
    ├── all_layers.png
    ├── all_layers_fashion.png
    ├── ankle_boot.png
    ├── bag_wrong.png
    ├── dress.png
    ├── shirt.png
    ├── trouser.png
    └── tshirt_wrong.png
/README.md:
--------------------------------------------------------------------------------
# MNIST Neural Network Classifier
2 |
3 | ## The model
4 |
This is a trained neural network, built from scratch in NumPy, that classifies handwritten digits from the MNIST dataset or clothing items from the Fashion-MNIST dataset. Pretrained weights are provided for both.
6 |
By default, the neural network has the following architecture:
8 |
9 | *L* = number of layers = 8
10 |
11 | | Layer index (*l*) | Number of Activation units (*n[l]*) | Activation function |
12 | | ----------------- | ----------------------------------- | ------------------- |
13 | | 0 | 784 | N/A |
14 | | 1 | 3000 | ReLU |
15 | | 2 | 2500 | ReLU |
16 | | 3 | 2000 | ReLU |
17 | | 4 | 1500 | ReLU |
18 | | 5 | 1000 | ReLU |
19 | | 6 | 500 | ReLU |
20 | | 7 | 10 | Softmax |
21 |
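The architecture is encoded by the `n` list near the top of `model.py`, so a different layer layout can be tried by editing that list. The relevant lines from the source:

```python
#network structure: n[l] is the number of activation units in layer l
n_classes = 10
n = [n_x, 3000, 2500, 2000, 1500, 1000, 500, n_classes]  #n_x = 784 input pixels
L = len(n)
```
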
Parameters are initialized with He initialization and trained using the Adam optimizer.
23 |
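When no saved weights file is found, `load_cache()` in `model.py` initializes the parameters like this (He initialization scales each weight by √(2 / fan-in), which suits ReLU units):

```python
#He initialization for layer l: Gaussian weights scaled by sqrt(2 / n[l-1]), zero biases
W.append(np.random.randn(n[l], n[l - 1]) * np.sqrt(2. / n[l - 1]))
b.append(np.zeros((n[l], 1)))
```
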
Default hyperparameter settings:
25 |
26 | | Hyperparameter name | Value |
27 | | ----------------------------------------------------- | ----- |
28 | | Learning rate (alpha) | 0.001 |
29 | | Number of epochs (n_epochs) | 2000 |
30 | | Batch size (batch_size) | 256 |
31 | | Dropout keep activation units probability (keep_prob) | 0.7 |
32 | | L2 regularization parameter (lbd) | 0.05 |
33 | | Adam's parameter β1 (beta1) | 0.9 |
34 | | Adam's parameter β2 (beta2) | 0.999 |
35 | | Learning rate decay (decay_rate) | 1.0 |
36 |
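These settings map onto the keyword arguments of `gradient_descent` in `model.py` (note that the function's own defaults differ slightly, e.g. `learning_rate=.002` and `lbd=0.`); a call reproducing the table above would look roughly like this:

```python
gradient_descent(W, b, n_epochs=2000, batch_size=256, keep_prob=0.7,
                 lbd=0.05, learning_rate=0.001, beta1=0.9, beta2=0.999, decay_rate=1.0)
```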
37 |
38 |
39 | ## Installation, demo, and training
40 |
**Prerequisites**: Make sure you've installed the required packages: NumPy, TensorFlow (only used to download the dataset), and Matplotlib.
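
If any are missing, they can usually be installed with pip:

```shell
pip install numpy tensorflow matplotlib
```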
42 |
43 | 1. Clone this repository
44 |
45 | ```shell
46 | git clone https://github.com/pkien01/MNIST-neural-network-classifier
47 | ```
48 |
2. Download the pretrained weights for the MNIST digits and fashion datasets [here](https://drive.google.com/drive/folders/1CmQRokKnQ75ukEU_Y5Lq9DsjYVWxP6MM?usp=sharing) and move them to the `MNIST-neural-network-classifier` folder.
50 |
51 | 3. After that, the `MNIST-neural-network-classifier` folder should have the following structure
52 |
53 | ```bash
54 | MNIST-neural-network-classifier/
55 | MNIST-neural-network-classifier/model.py #the model source code
56 | MNIST-neural-network-classifier/mnist_trained_weights_deep.dat #the pretrained MNIST digits weights
57 | MNIST-neural-network-classifier/fashion_mnist_trained_weights_deep.dat #the pretrained MNIST fashion weights
58 | ```
59 |
4. Open the `model.py` source file; its final lines should look similar to the following
61 |
62 | ```python
63 | ...
64 | W, b = load_cache()
65 |
66 | #gradient_descent(W, b, keep_prob=.7, lbd =.03, learning_rate=0.001)
67 | #print(set_performance(X_train, Y_train, W, b))
68 | #print(set_performance(X_test, Y_test, W, b))
69 | demo(W, b, fashion=False)
70 | #demo_wrong(W, b, fashion=False)
71 | #demo_all_layers(W, b)
72 | ```
73 |
74 | 5. Run the source code file
75 |
76 | ```bash
77 | cd MNIST-neural-network-classifier
78 | python3 model.py
79 | ```
80 |
Congrats! You've just run the demo on the MNIST handwritten digits dataset.
82 |
83 | 6. To run it on the MNIST fashion dataset:
84 |
85 | - Open the `model.py` in a text/code editor
86 |
87 | - Change the following line (line 8 in the default source code)
88 |
89 | ```python
90 | (X_train, label_train), (X_test, label_test) = tf.keras.datasets.mnist.load_data()
91 | ```
92 |
93 | to this
94 |
```python
96 | (X_train, label_train), (X_test, label_test) = tf.keras.datasets.fashion_mnist.load_data()
97 | ```
98 |
99 | - Next, change the following line (line 24 in the default source code)
100 |
101 | ```python
102 | weights_file = './mnist_trained_weights_deep.dat'
103 | ```
104 |
105 | to this
106 |
107 | ```python
108 | weights_file = './fashion_mnist_trained_weights_deep.dat'
109 | ```
110 |
- Next, set the `fashion` argument to `True` in the `demo` call by changing the following lines
112 |
113 | ```python
114 | demo(W, b, fashion=False)
115 | #demo_wrong(W, b, fashion=False)
116 | ```
117 |
118 | to this
119 |
120 | ```python
121 | demo(W, b, fashion=True)
122 | #demo_wrong(W, b, fashion=True)
123 | ```
- Repeat step 5 to run the source code file
125 |
7. To demo on images that the model classifies incorrectly, uncomment the line `demo_wrong(W, b, fashion=False)` and comment out all the other function calls, as shown below (the `fashion` argument can be set to `True` or `False` depending on whether you want to demo on the fashion or the handwritten digits images). Then, repeat step 5 to run the source code.
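
For example, the end of `model.py` would then look like this:

```python
...
W, b = load_cache()

#gradient_descent(W, b, keep_prob=.7, lbd =.03, learning_rate=0.001)
#print(set_performance(X_train, Y_train, W, b))
#print(set_performance(X_test, Y_test, W, b))
#demo(W, b, fashion=False)
demo_wrong(W, b, fashion=False)
#demo_all_layers(W, b)
```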
127 |
8. To verify the model's performance on the training and test sets, respectively, uncomment the following lines (and comment out all the other function calls).
129 |
130 | ```python
131 | print(set_performance(X_train, Y_train, W, b))
132 | print(set_performance(X_test, Y_test, W, b))
133 | ```
134 |
Repeat step 5 to run the source code.
136 |
9. To visualize each individual layer's activations, uncomment the call `demo_all_layers(W, b)` and comment out the rest of the function calls. Repeat step 5 to run the source code.
138 |
10. To resume training from the pretrained weights, uncomment the call `gradient_descent(W, b, keep_prob=.7, lbd =.03, learning_rate=0.001)` and comment out the rest of the function calls. You can pass your own set of hyperparameters or tune them yourself if you want.
140 |
To train from scratch instead, delete the corresponding weights file before training: `mnist_trained_weights_deep.dat` for the MNIST handwritten digits dataset or `fashion_mnist_trained_weights_deep.dat` for the MNIST fashion dataset (though it may take quite a long time to train).
142 |
You can also train using a different algorithm, such as standard gradient descent or gradient descent with momentum instead of Adam, by modifying the initialization and parameter-update calls in the `gradient_descent()` function, as sketched below. For example, the update function for standard gradient descent is `update_para(W, b, dW, db, learning_rate)`, and for gradient descent with momentum it is `update_para_momentum(W, b, dW, db, VdW, Vdb, iter_idx, learning_rate, beta1)`, where `iter_idx` is the 1-based iteration counter maintained inside the training loop.
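
A minimal sketch of the swap, using the alternative update calls that already appear (commented out) inside the mini-batch loop of `gradient_descent`:

```python
#inside the mini-batch loop of gradient_descent(), replace the Adam update
#update_para_adam(W, b, dW, db, VdW, Vdb, SdW, Sdb, iter_idx, cur_learning_rate, beta1, beta2)
#with standard gradient descent
update_para(W, b, dW, db, learning_rate)
#or with gradient descent with momentum
#update_para_momentum(W, b, dW, db, VdW, Vdb, iter_idx, learning_rate, beta1)
```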
144 |
145 |
146 |
147 | ## Results of pretrained weights
148 |
149 | * On the MNIST handwritten digits dataset
150 |
151 | Training set accuracy: 99.99833%.
152 |
153 | Test set accuracy: 98.09%.
154 |
155 | * On the MNIST fashion dataset
156 |
157 | Training set accuracy: 99.39%.
158 |
159 | Test set accuracy: 89.24%.
160 |
161 |
You can train for longer and/or adjust the hyperparameters to get better performance.
163 |
164 |
165 |
Here are the results on some example images from the handwritten digits dataset:

<img src="sample_images/3.png">

<img src="sample_images/4.png">

<img src="sample_images/9.png">

<img src="sample_images/0_wrong.png">

And here are some on the fashion dataset:

<img src="sample_images/trouser.png">

<img src="sample_images/dress.png">

<img src="sample_images/shirt.png">

<img src="sample_images/ankle_boot.png">

<img src="sample_images/tshirt_wrong.png">

<img src="sample_images/bag_wrong.png">
189 |
Finally, here are the visualizations of all the layers' activations (they look pretty random, huh?):
191 | 
192 | 
193 |
Have fun playing around with the model.
195 |
196 |
197 |
198 | ## Contacts
199 |
If you have any questions or encounter serious errors/bugs in the code, you can write an email to phamkienxmas2001@gmail.com or v.kienp16@vinai.io, or open an issue on GitHub. Thank you!
201 |
--------------------------------------------------------------------------------
/model.py:
--------------------------------------------------------------------------------
1 | import tensorflow as tf
2 | import matplotlib.pyplot as plt
3 | import numpy as np
4 | import math
5 | import pickle
6 | import os
7 |
8 | (X_train, label_train), (X_test, label_test) = tf.keras.datasets.mnist.load_data()
9 |
10 | X_test_images = X_test
11 | X_train_images = X_train
#print('Train: X=%s, y=%s' % (X_train.shape, label_train.shape))
#print('Test: X=%s, y=%s' % (X_test.shape, label_test.shape))
14 | m_train = X_train.shape[0]
15 | m_test = X_test.shape[0]
16 | n_x = X_train.shape[1] * X_train.shape[2]
17 |
18 | #Network structure
19 | n_classes = 10
20 | #n = [n_x, 2500, 2000, 1500, 1000, 500, n_classes]
21 | #n = [n_x, 800, n_classes]
22 | n = [n_x, 3000, 2500, 2000, 1500, 1000, 500, n_classes]
23 | L = len(n)
24 | weights_file = './mnist_trained_weights_deep.dat'
25 |
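#flatten each 28x28 image into a 784-vector, scale pixels to [0, 1], and one-hot encode the labels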
26 | def process_data(X, label):
27 | m = X.shape[0]
28 | assert(m == label.shape[0])
29 | X = X.reshape(m, -1).T / 255
30 | Y = np.zeros((m, 10))
31 | Y[np.arange(m), label] = 1
32 | Y = Y.T
33 | return X, Y
34 |
35 | X_train, Y_train = process_data(X_train, label_train)
36 | X_test, Y_test = process_data(X_test, label_test)
37 |
38 | def relu(x):
39 | return np.maximum(0, x)
40 |
41 | def softmax(x):
42 | t = np.exp(x - np.max(x, axis = 0).reshape((1, x.shape[1])))
43 | return t / np.sum(t, axis = 0, keepdims = True)
44 |
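#forward pass through all L layers; inverted dropout scales kept activations by 1/keep_prob, so no rescaling is needed at test time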
45 | def forward_prop(X, W, b, keep_prob):
46 | m = X.shape[1]
47 | Z = [None]
48 | A = [X]
49 | D = [None]
50 | for l in range(1, L):
51 | Z.append(np.dot(W[l], A[l - 1]) + b[l])
52 | if (l == L - 1): A.append(softmax(Z[l]))
53 | else:
54 | A.append(relu(Z[l]))
55 | D.append(np.random.rand(A[l].shape[0], A[l].shape[1]) < keep_prob)
56 | assert(D[l].shape == A[l].shape)
57 | A[l] = A[l] * D[l] / keep_prob
58 |
59 | assert(Z[l].shape == (n[l], m))
60 | assert(A[l].shape == Z[l].shape)
61 | return Z, A, D
62 |
63 | def num_stable_prob(x, epsilon = 1e-18):
64 | x = np.maximum(x, epsilon)
65 | x = np.minimum(x, 1. - epsilon)
66 | return x
67 |
def cross_entropy_loss(Yhat, Y, W, lbd):
69 | m = Y.shape[1]
70 | assert(m == Yhat.shape[1])
    Yhat = num_stable_prob(Yhat)  #clip probabilities away from 0 and 1 before taking the log
72 | res = -np.squeeze(np.sum(Y * np.log(Yhat))) / m
73 | assert(res.shape == ())
74 | for l in range(1, L): res += lbd * np.sum(np.square(W[l])) / m / 2.
75 | return res
76 |
77 | def relu_der(x):
78 | return np.int64(x > 0)
79 |
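#backpropagation with L2 regularization, reusing the dropout masks D from the forward pass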
80 | def backward_prop(X, Y, W, b, Z, A, D, keep_prob, lbd):
81 | dZ = [None] * L
82 | dW = [None] * L
83 | db = [None] * L
84 | m = Y.shape[1]
85 | assert(X.shape[1] == m)
86 | for l in reversed(range(1, L)):
87 | if (l == L - 1): dZ[l] = A[l] - Y
88 | else:
89 | dA_l = np.dot(W[l + 1].T, dZ[l + 1])
90 | dA_l = dA_l * D[l] / keep_prob
91 | dZ[l] = dA_l * relu_der(Z[l])
92 |
93 | dW[l] = np.dot(dZ[l], A[l - 1].T) / m + (lbd * W[l]) / m
94 | db[l] = np.sum(dZ[l], axis = 1, keepdims = True) / m
95 | assert(dZ[l].shape == Z[l].shape)
96 | assert(dW[l].shape == W[l].shape)
97 | assert(db[l].shape == b[l].shape)
98 | return dW, db
99 |
100 | def split_batches(X, Y, batch_size):
101 | m = X.shape[1]
102 | assert(m == Y.shape[1])
103 | perm = list(np.random.permutation(m))
104 |
105 | shuffled_X = X[:, perm]
106 | shuffled_Y = Y[:, perm].reshape((n_classes, m))
107 | assert(shuffled_X.shape == X.shape)
108 | assert(shuffled_Y.shape == Y.shape)
109 | n_batches = m // batch_size
110 | batches = []
111 | for i in range(0, n_batches):
112 | batch_X = shuffled_X[:, i * batch_size : (i + 1) * batch_size]
113 | batch_Y = shuffled_Y[:, i * batch_size : (i + 1) * batch_size]
114 | batches.append((batch_X, batch_Y))
115 | if (m % batch_size != 0):
116 | batch_X = shuffled_X[:, batch_size * n_batches : m]
117 | batch_Y = shuffled_Y[:, batch_size * n_batches : m]
118 | batches.append((batch_X, batch_Y))
119 | return batches
120 |
121 | def init_adam():
122 | VSdW = [None]
123 | VSdb = [None]
124 | for l in range(1, L):
125 | VSdW.append(np.zeros_like(W[l]))
126 | VSdb.append(np.zeros_like(b[l]))
127 | return VSdW, VSdb
128 |
129 | def update_para(W, b, dW, db, alpha):
130 | for l in range(1, L):
131 | W[l] -= alpha * dW[l]
132 | b[l] -= alpha * db[l]
133 | return W, b
134 |
135 | def update_para_adam(W, b, dW, db, VdW, Vdb, SdW, Sdb, iter_idx, alpha, beta1, beta2, epsilon = 1e-8):
136 | for l in range(1, L):
137 | VdW[l] = beta1 * VdW[l] + (1. - beta1) * dW[l]
138 | Vdb[l] = beta1 * Vdb[l] + (1. - beta1) * db[l]
139 |
140 | SdW[l] = beta2 * SdW[l] + (1. - beta2) * np.square(dW[l])
141 | Sdb[l] = beta2 * Sdb[l] + (1. - beta2) * np.square(db[l])
142 |
143 | V_upd = VdW[l] / (1. - beta1 ** iter_idx)
144 | S_upd = SdW[l] / (1. - beta2 ** iter_idx)
145 | assert(V_upd.shape == S_upd.shape)
146 | assert(V_upd.shape == W[l].shape)
147 | W[l] -= alpha * V_upd / (np.sqrt(S_upd) + epsilon)
148 |
149 | V_upd = Vdb[l] / (1. - beta1 ** iter_idx)
150 | S_upd = Sdb[l] / (1. - beta2 ** iter_idx)
151 | assert(V_upd.shape == S_upd.shape)
152 | assert(V_upd.shape == b[l].shape)
153 | b[l] -= alpha * V_upd / (np.sqrt(S_upd) + epsilon)
154 | return W, b
155 |
156 | def update_para_momentum(W, b, dW, db, VdW, Vdb, iter_idx, alpha, beta):
157 | for l in range(1, L):
158 | VdW[l] = beta * VdW[l] + (1. - beta) * dW[l]
159 | Vdb[l] = beta * Vdb[l] + (1. - beta) * db[l]
160 | V_upd = VdW[l] / (1. - beta ** iter_idx)
161 | W[l] -= alpha * V_upd
162 | V_upd = Vdb[l] / (1. - beta ** iter_idx)
163 | b[l] -= alpha * V_upd
164 | return W, b
165 |
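#mini-batch Adam training loop with square-root learning rate decay; the learned weights are pickled to weights_file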
166 | def gradient_descent(W, b, n_epochs = 2000, batch_size = 2**8, keep_prob = 1., lbd = 0., learning_rate = .002, beta1 = .9, beta2 = .999, decay_rate = 1.):
167 | VdW, Vdb = init_adam()
168 | SdW, Sdb = init_adam()
169 | for epoch_num in range(n_epochs):
170 | batches = split_batches(X_train, Y_train, batch_size)
171 | n_batches = len(batches)
172 | for batch_idx in range(n_batches):
173 | X_cur, Y_cur = batches[batch_idx]
174 | Z, A, D = forward_prop(X_cur, W, b, keep_prob)
            cost = cross_entropy_loss(A[L - 1], Y_cur, W, lbd)
176 | iter_idx = epoch_num * n_batches + batch_idx + 1
177 | print("Cost after " + str(iter_idx) + " iterations: " + str(cost) + '.')
178 | dW, db = backward_prop(X_cur, Y_cur, W, b, Z, A, D, keep_prob, lbd)
            cur_learning_rate = learning_rate / math.sqrt(epoch_num + 1) / decay_rate  #scale the base rate by 1 / (decay_rate * sqrt(epoch_num + 1))
180 | update_para_adam(W, b, dW, db, VdW, Vdb, SdW, Sdb, iter_idx, cur_learning_rate, beta1, beta2)
181 | #update_para(W, b, dW, db, learning_rate)
            #update_para_momentum(W, b, dW, db, VdW, Vdb, iter_idx, learning_rate, beta1)  #iter_idx starts at 1, so the bias-correction term 1 - beta**iter_idx is nonzero
183 | Wfile = open(weights_file, 'wb')
184 | pickle.dump([W, b], Wfile)
185 | Wfile.close()
186 |
187 | def set_performance(X, Y, W, b, batch_size = 2**8):
188 | m = X.shape[1]
189 | assert(m == Y.shape[1])
190 | batches = split_batches(X, Y, batch_size)
191 | acc = 0
192 | for batch_idx in range(len(batches)):
193 | X_cur, Y_cur = batches[batch_idx]
194 | m_cur = X_cur.shape[1]
195 | assert(m_cur == Y_cur.shape[1])
196 | Z_cur, A_cur, D_cur = forward_prop(X_cur, W, b, 1.)
197 | pred = np.argmax(A_cur[L - 1], axis = 0).reshape((m_cur, 1))
198 | label = np.argmax(Y_cur, axis = 0).reshape((m_cur, 1))
199 | acc += np.sum(pred == label)
200 | return acc / m
201 |
202 | fashion_type = ["T-shirt/top", "Trouser", "Pullover", "Dress", "Coat", "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot"]
203 |
def demo(W, b, idx = None, fashion = False):
    if (idx is None): idx = np.random.randint(0, m_test)  #pick the random index at call time; a default argument would be fixed once when the function is defined
    Z, A, D = forward_prop(X_test[:, idx].reshape(n_x, -1), W, b, 1.)
206 | pred = np.squeeze(np.argmax(A[L - 1]))
207 | sample = X_test_images[idx]
208 | plt.imshow(sample)
209 | if (fashion): plt.suptitle("Prediction label: " + fashion_type[pred] + " | Ground truth label: " + fashion_type[label_test[idx]] )
210 | else: plt.suptitle("Prediction label: " + str(pred) + " | Ground truth label: " + str(label_test[idx]) )
211 | plt.title("Confidence: " + str(np.squeeze(A[L - 1][pred]) * 100.) + "%.")
212 | plt.show()
213 |
214 | def demo_wrong(W, b, fashion = False):
215 | while (True):
        idx = np.random.randint(0, m_test)  #np.random.randint's upper bound is exclusive
217 | Z, A, D = forward_prop(X_test[:, idx].reshape(n_x, -1), W, b, 1.)
218 | pred = np.squeeze(np.argmax(A[L - 1]))
219 | truth = label_test[idx]
220 | if (pred != truth):
221 | sample = X_test_images[idx]
222 | plt.imshow(sample)
223 | if (fashion): plt.suptitle("Prediction label: " + fashion_type[pred] + " | Ground truth label: " + fashion_type[label_test[idx]] )
224 | else: plt.suptitle("Prediction label: " + str(pred) + " | Ground truth label: " + str(label_test[idx]) )
225 | plt.title("Confidence: " + str(np.squeeze(A[L - 1][pred]) * 100.) + "%.")
226 | plt.show()
227 | break
228 |
layers_shape = [(28, 28), (60, 50), (50, 50), (50, 40), (50, 30), (40, 25), (25, 20), (10, 1)]  #2-D shapes used to display each layer's activations; must stay in sync with n
def demo_all_layers(W, b, idx = None):
    if (idx is None): idx = np.random.randint(0, m_test)  #same call-time random index as in demo
    Z, A, D = forward_prop(X_test[:, idx].reshape(n_x, -1), W, b, 1.)
232 | fig = plt.figure()
233 | for l in range(0, L):
234 | cur_plot = fig.add_subplot(1, L, l+1)
235 | cur_plot.set_title("Layer " + str(l))
236 | plt.imshow(A[l].reshape(layers_shape[l]))
237 | plt.show()
238 |
239 | def load_cache():
240 | file_exists = os.path.isfile(weights_file)
241 | if (file_exists):
        with open(weights_file, 'rb') as Wfile:
            W, b = pickle.load(Wfile)
245 | else:
246 | W = [None]
247 | b = [None]
248 | for l in range(1, L):
249 | W.append(np.random.randn(n[l], n[l - 1]) * np.sqrt(2. / n[l - 1]))
250 | b.append(np.zeros((n[l], 1)))
        with open(weights_file, 'wb') as Wfile:
            pickle.dump([W, b], Wfile)
254 | return W, b
255 |
256 | W, b = load_cache()
257 |
258 | #gradient_descent(W, b, keep_prob=.7, lbd =.03, learning_rate=0.001)
259 | #print(set_performance(X_train, Y_train, W, b))
260 | #print(set_performance(X_test, Y_test, W, b))
261 | demo(W, b, fashion=False)
262 | #demo_wrong(W, b, fashion=False)
263 | #demo_all_layers(W, b)
264 |
--------------------------------------------------------------------------------
/sample_images/0_wrong.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/pkien01/MNIST-neural-network-classifier/29c818727269e442e539884cd1cc57f9099d5d93/sample_images/0_wrong.png
--------------------------------------------------------------------------------
/sample_images/3.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/pkien01/MNIST-neural-network-classifier/29c818727269e442e539884cd1cc57f9099d5d93/sample_images/3.png
--------------------------------------------------------------------------------
/sample_images/4.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/pkien01/MNIST-neural-network-classifier/29c818727269e442e539884cd1cc57f9099d5d93/sample_images/4.png
--------------------------------------------------------------------------------
/sample_images/9.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/pkien01/MNIST-neural-network-classifier/29c818727269e442e539884cd1cc57f9099d5d93/sample_images/9.png
--------------------------------------------------------------------------------
/sample_images/all_layers.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/pkien01/MNIST-neural-network-classifier/29c818727269e442e539884cd1cc57f9099d5d93/sample_images/all_layers.png
--------------------------------------------------------------------------------
/sample_images/all_layers_fashion.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/pkien01/MNIST-neural-network-classifier/29c818727269e442e539884cd1cc57f9099d5d93/sample_images/all_layers_fashion.png
--------------------------------------------------------------------------------
/sample_images/ankle_boot.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/pkien01/MNIST-neural-network-classifier/29c818727269e442e539884cd1cc57f9099d5d93/sample_images/ankle_boot.png
--------------------------------------------------------------------------------
/sample_images/bag_wrong.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/pkien01/MNIST-neural-network-classifier/29c818727269e442e539884cd1cc57f9099d5d93/sample_images/bag_wrong.png
--------------------------------------------------------------------------------
/sample_images/dress.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/pkien01/MNIST-neural-network-classifier/29c818727269e442e539884cd1cc57f9099d5d93/sample_images/dress.png
--------------------------------------------------------------------------------
/sample_images/shirt.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/pkien01/MNIST-neural-network-classifier/29c818727269e442e539884cd1cc57f9099d5d93/sample_images/shirt.png
--------------------------------------------------------------------------------
/sample_images/trouser.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/pkien01/MNIST-neural-network-classifier/29c818727269e442e539884cd1cc57f9099d5d93/sample_images/trouser.png
--------------------------------------------------------------------------------
/sample_images/tshirt_wrong.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/pkien01/MNIST-neural-network-classifier/29c818727269e442e539884cd1cc57f9099d5d93/sample_images/tshirt_wrong.png
--------------------------------------------------------------------------------