├── README.md ├── model.py └── sample_images ├── 0_wrong.png ├── 3.png ├── 4.png ├── 9.png ├── all_layers.png ├── all_layers_fashion.png ├── ankle_boot.png ├── bag_wrong.png ├── dress.png ├── shirt.png ├── trouser.png └── tshirt_wrong.png /README.md: -------------------------------------------------------------------------------- 1 | # MNIST Neural Network classifier 2 | 3 | ## The model 4 | 5 | This is a trained neural network built from scratch to classify handwritten digits or fashions from the MNIST dataset. 6 | 7 | By default, the neural network has an architecture: 8 | 9 | *L* = number of layers = 8 10 | 11 | | Layer index (*l*) | Number of Activation units (*n[l]*) | Activation function | 12 | | ----------------- | ----------------------------------- | ------------------- | 13 | | 0 | 784 | N/A | 14 | | 1 | 3000 | ReLU | 15 | | 2 | 2500 | ReLU | 16 | | 3 | 2000 | ReLU | 17 | | 4 | 1500 | ReLU | 18 | | 5 | 1000 | ReLU | 19 | | 6 | 500 | ReLU | 20 | | 7 | 10 | Softmax | 21 | 22 | Trained using Adam optimizer and initialized parameters with He-et-al initialization. 23 | 24 | Default hyperparameters settings: 25 | 26 | | Hyperparameter name | Value | 27 | | ----------------------------------------------------- | ----- | 28 | | Learning rate (alpha) | 0.001 | 29 | | Number of epochs (n_epochs) | 2000 | 30 | | Batch size (batch_size) | 256 | 31 | | Dropout keep activation units probability (keep_prob) | 0.7 | 32 | | L2 regularization parameter (lbd) | 0.05 | 33 | | Adam's parameter β1 (beta1) | 0.9 | 34 | | Adam's parameter β2 (beta2) | 0.999 | 35 | | Learning rate decay (decay_rate) | 1.0 | 36 | 37 | 38 | 39 | ## Installation, demo, and training 40 | 41 | **Prerequisites**: Make sure you've installed the required libraries/packages: Numpy, Tensorflow (only used to get the dataset), and Matplotlib 42 | 43 | 1. Clone this repository 44 | 45 | ```shell 46 | git clone https://github.com/pkien01/MNIST-neural-network-classifier 47 | ``` 48 | 49 | 2. Download the pretrained weights for the MNIST digits and fashions dataset [here](https://drive.google.com/drive/folders/1CmQRokKnQ75ukEU_Y5Lq9DsjYVWxP6MM?usp=sharing) and move it to the `MNIST-neural-network-classifier` folder. 50 | 51 | 3. After that, the `MNIST-neural-network-classifier` folder should have the following structure 52 | 53 | ```bash 54 | MNIST-neural-network-classifier/ 55 | MNIST-neural-network-classifier/model.py #the model source code 56 | MNIST-neural-network-classifier/mnist_trained_weights_deep.dat #the pretrained MNIST digits weights 57 | MNIST-neural-network-classifier/fashion_mnist_trained_weights_deep.dat #the pretrained MNIST fashion weights 58 | ``` 59 | 60 | 4. Open the `model.py` source code file and at the end, it should be similar to the following 61 | 62 | ```python 63 | ... 64 | W, b = load_cache() 65 | 66 | #gradient_descent(W, b, keep_prob=.7, lbd =.03, learning_rate=0.001) 67 | #print(set_performance(X_train, Y_train, W, b)) 68 | #print(set_performance(X_test, Y_test, W, b)) 69 | demo(W, b, fashion=False) 70 | #demo_wrong(W, b, fashion=False) 71 | #demo_all_layers(W, b) 72 | ``` 73 | 74 | 5. Run the source code file 75 | 76 | ```bash 77 | cd MNIST-neural-network-classifier 78 | python3 model.py 79 | ``` 80 | 81 | Congrats! You've just ran the demo on the MNIST handwritten digits dataset 82 | 83 | 6. To run it on the MNIST fashion dataset: 84 | 85 | - Open the `model.py` in a text/code editor 86 | 87 | - Change the following line (line 8 in the default source code) 88 | 89 | ```python 90 | (X_train, label_train), (X_test, label_test) = tf.keras.datasets.mnist.load_data() 91 | ``` 92 | 93 | to this 94 | 95 | ``` 96 | (X_train, label_train), (X_test, label_test) = tf.keras.datasets.fashion_mnist.load_data() 97 | ``` 98 | 99 | - Next, change the following line (line 24 in the default source code) 100 | 101 | ```python 102 | weights_file = './mnist_trained_weights_deep.dat' 103 | ``` 104 | 105 | to this 106 | 107 | ```python 108 | weights_file = './fashion_mnist_trained_weights_deep.dat' 109 | ``` 110 | 111 | * Next, set the `fashion` variable to `True` in the `demo` function 112 | 113 | ```python 114 | demo(W, b, fashion=False) 115 | #demo_wrong(W, b, fashion=False) 116 | ``` 117 | 118 | to this 119 | 120 | ```python 121 | demo(W, b, fashion=True) 122 | #demo_wrong(W, b, fashion=True) 123 | ``` 124 | * Repeat step 5 to run the source code file 125 | 126 | 7. To demo on incorrectly labeled images, uncomment the line `demo_wrong(W, b, fashion=False)` and comment all the other function calls (of course, the `fashion`variable can be set `True` or `False` depending if you want to demo on the fashion or the handwritten digits images). Then, repeat step 5 to run the source code. 127 | 128 | 8. To verify the model performance on train and test set, respectively, uncomment the following lines (and comment all the other function calls). 129 | 130 | ```python 131 | print(set_performance(X_train, Y_train, W, b)) 132 | print(set_performance(X_test, Y_test, W, b)) 133 | ``` 134 | 135 | Repeat step 5 to run the source code 136 | 137 | 9. To visualize each of the individual layers' activations, uncomment the function `demo_all_layers(W, b)` and comment the rest function calls. Repeat step 5 to run the source code. 138 | 139 | 10. To re-train it from the trained weights, uncomment the function `gradient_descent(W, b, keep_prob=.7, lbd =.03, learning_rate=0.001)` and comment the rest function calls. You can use your own set of hyperparameters to train the neural network or tune it yourself if you want. 140 | 141 | If you want to train from scratch, delete the weights file first before training (though it may take quite a long time to train): `mnist_trained_weights_deep.dat` for the MNIST handwritten digits dataset or `fashion_mnist_trained_weights_deep.dat` for the MNIST fashion dataset. 142 | 143 | You can also train it using a different algorithm like standard gradient descent or gradient descent with momentum, instead of Adam, by modifying the initialization and parameters update function calls in the `gradient descent()` function. For example, the update function of standard gradient descent is `update_para(W, b, dW, db, learning_rate)` and gradient descent with momentum is `update_para_momentum(W, b, dW, db, VdW, Vdb, epoch_num, learning_rate, beta1)`. 144 | 145 | 146 | 147 | ## Results of pretrained weights 148 | 149 | * On the MNIST handwritten digits dataset 150 | 151 | Training set accuracy: 99.99833%. 152 | 153 | Test set accuracy: 98.09%. 154 | 155 | * On the MNIST fashion dataset 156 | 157 | Training set accuracy: 99.39%. 158 | 159 | Test set accuracy: 89.24%. 160 | 161 | 162 | You can train it for a longer period, and/or adjust the hyperparameters, to get better performance. 163 | 164 | 165 | 166 | Here are the results on some example images on the handwritten digits dataset: 167 | 168 |

169 | 170 | 171 |

172 |

173 | 174 | 175 |

176 | 177 | And here are some on the fashion dataset: 178 | 179 |
180 |
181 | 182 | 183 |
184 |
185 | 186 | 187 |
188 |
189 | 190 | Finally, here are the visualizations of all the layers' activations (looks pretty random, huh?): 191 | ![](https://raw.githubusercontent.com/pkien01/MNIST-neural-network-classifier/master/sample_images/all_layers.png) 192 | ![](https://raw.githubusercontent.com/pkien01/MNIST-neural-network-classifier/master/sample_images/all_layers_fashion.png) 193 | 194 | Have fun playing around with the model as you like. 195 | 196 | 197 | 198 | ## Contacts 199 | 200 | If you have any questions or encounter some serious errors/bugs in the code, you can write an email to phamkienxmas2001@gmail.com, v.kienp16@vinai.io, or kindly leave a message below on GitHub. Thank you! 201 | -------------------------------------------------------------------------------- /model.py: -------------------------------------------------------------------------------- 1 | import tensorflow as tf 2 | import matplotlib.pyplot as plt 3 | import numpy as np 4 | import math 5 | import pickle 6 | import os 7 | 8 | (X_train, label_train), (X_test, label_test) = tf.keras.datasets.mnist.load_data() 9 | 10 | X_test_images = X_test 11 | X_train_images = X_train 12 | #print('Train: X=%s, y=%s' % (X_train.shape, Y_train.shape)) 13 | #print('Test: X=%s, y=%s' % (X_test.shape, Y_test.shape)) 14 | m_train = X_train.shape[0] 15 | m_test = X_test.shape[0] 16 | n_x = X_train.shape[1] * X_train.shape[2] 17 | 18 | #Network structure 19 | n_classes = 10 20 | #n = [n_x, 2500, 2000, 1500, 1000, 500, n_classes] 21 | #n = [n_x, 800, n_classes] 22 | n = [n_x, 3000, 2500, 2000, 1500, 1000, 500, n_classes] 23 | L = len(n) 24 | weights_file = './mnist_trained_weights_deep.dat' 25 | 26 | def process_data(X, label): 27 | m = X.shape[0] 28 | assert(m == label.shape[0]) 29 | X = X.reshape(m, -1).T / 255 30 | Y = np.zeros((m, 10)) 31 | Y[np.arange(m), label] = 1 32 | Y = Y.T 33 | return X, Y 34 | 35 | X_train, Y_train = process_data(X_train, label_train) 36 | X_test, Y_test = process_data(X_test, label_test) 37 | 38 | def relu(x): 39 | return np.maximum(0, x) 40 | 41 | def softmax(x): 42 | t = np.exp(x - np.max(x, axis = 0).reshape((1, x.shape[1]))) 43 | return t / np.sum(t, axis = 0, keepdims = True) 44 | 45 | def forward_prop(X, W, b, keep_prob): 46 | m = X.shape[1] 47 | Z = [None] 48 | A = [X] 49 | D = [None] 50 | for l in range(1, L): 51 | Z.append(np.dot(W[l], A[l - 1]) + b[l]) 52 | if (l == L - 1): A.append(softmax(Z[l])) 53 | else: 54 | A.append(relu(Z[l])) 55 | D.append(np.random.rand(A[l].shape[0], A[l].shape[1]) < keep_prob) 56 | assert(D[l].shape == A[l].shape) 57 | A[l] = A[l] * D[l] / keep_prob 58 | 59 | assert(Z[l].shape == (n[l], m)) 60 | assert(A[l].shape == Z[l].shape) 61 | return Z, A, D 62 | 63 | def num_stable_prob(x, epsilon = 1e-18): 64 | x = np.maximum(x, epsilon) 65 | x = np.minimum(x, 1. - epsilon) 66 | return x 67 | 68 | def cross_entropy_loss(Yhat, Y, lbd): 69 | m = Y.shape[1] 70 | assert(m == Yhat.shape[1]) 71 | num_stable_prob(Yhat) 72 | res = -np.squeeze(np.sum(Y * np.log(Yhat))) / m 73 | assert(res.shape == ()) 74 | for l in range(1, L): res += lbd * np.sum(np.square(W[l])) / m / 2. 75 | return res 76 | 77 | def relu_der(x): 78 | return np.int64(x > 0) 79 | 80 | def backward_prop(X, Y, W, b, Z, A, D, keep_prob, lbd): 81 | dZ = [None] * L 82 | dW = [None] * L 83 | db = [None] * L 84 | m = Y.shape[1] 85 | assert(X.shape[1] == m) 86 | for l in reversed(range(1, L)): 87 | if (l == L - 1): dZ[l] = A[l] - Y 88 | else: 89 | dA_l = np.dot(W[l + 1].T, dZ[l + 1]) 90 | dA_l = dA_l * D[l] / keep_prob 91 | dZ[l] = dA_l * relu_der(Z[l]) 92 | 93 | dW[l] = np.dot(dZ[l], A[l - 1].T) / m + (lbd * W[l]) / m 94 | db[l] = np.sum(dZ[l], axis = 1, keepdims = True) / m 95 | assert(dZ[l].shape == Z[l].shape) 96 | assert(dW[l].shape == W[l].shape) 97 | assert(db[l].shape == b[l].shape) 98 | return dW, db 99 | 100 | def split_batches(X, Y, batch_size): 101 | m = X.shape[1] 102 | assert(m == Y.shape[1]) 103 | perm = list(np.random.permutation(m)) 104 | 105 | shuffled_X = X[:, perm] 106 | shuffled_Y = Y[:, perm].reshape((n_classes, m)) 107 | assert(shuffled_X.shape == X.shape) 108 | assert(shuffled_Y.shape == Y.shape) 109 | n_batches = m // batch_size 110 | batches = [] 111 | for i in range(0, n_batches): 112 | batch_X = shuffled_X[:, i * batch_size : (i + 1) * batch_size] 113 | batch_Y = shuffled_Y[:, i * batch_size : (i + 1) * batch_size] 114 | batches.append((batch_X, batch_Y)) 115 | if (m % batch_size != 0): 116 | batch_X = shuffled_X[:, batch_size * n_batches : m] 117 | batch_Y = shuffled_Y[:, batch_size * n_batches : m] 118 | batches.append((batch_X, batch_Y)) 119 | return batches 120 | 121 | def init_adam(): 122 | VSdW = [None] 123 | VSdb = [None] 124 | for l in range(1, L): 125 | VSdW.append(np.zeros_like(W[l])) 126 | VSdb.append(np.zeros_like(b[l])) 127 | return VSdW, VSdb 128 | 129 | def update_para(W, b, dW, db, alpha): 130 | for l in range(1, L): 131 | W[l] -= alpha * dW[l] 132 | b[l] -= alpha * db[l] 133 | return W, b 134 | 135 | def update_para_adam(W, b, dW, db, VdW, Vdb, SdW, Sdb, iter_idx, alpha, beta1, beta2, epsilon = 1e-8): 136 | for l in range(1, L): 137 | VdW[l] = beta1 * VdW[l] + (1. - beta1) * dW[l] 138 | Vdb[l] = beta1 * Vdb[l] + (1. - beta1) * db[l] 139 | 140 | SdW[l] = beta2 * SdW[l] + (1. - beta2) * np.square(dW[l]) 141 | Sdb[l] = beta2 * Sdb[l] + (1. - beta2) * np.square(db[l]) 142 | 143 | V_upd = VdW[l] / (1. - beta1 ** iter_idx) 144 | S_upd = SdW[l] / (1. - beta2 ** iter_idx) 145 | assert(V_upd.shape == S_upd.shape) 146 | assert(V_upd.shape == W[l].shape) 147 | W[l] -= alpha * V_upd / (np.sqrt(S_upd) + epsilon) 148 | 149 | V_upd = Vdb[l] / (1. - beta1 ** iter_idx) 150 | S_upd = Sdb[l] / (1. - beta2 ** iter_idx) 151 | assert(V_upd.shape == S_upd.shape) 152 | assert(V_upd.shape == b[l].shape) 153 | b[l] -= alpha * V_upd / (np.sqrt(S_upd) + epsilon) 154 | return W, b 155 | 156 | def update_para_momentum(W, b, dW, db, VdW, Vdb, iter_idx, alpha, beta): 157 | for l in range(1, L): 158 | VdW[l] = beta * VdW[l] + (1. - beta) * dW[l] 159 | Vdb[l] = beta * Vdb[l] + (1. - beta) * db[l] 160 | V_upd = VdW[l] / (1. - beta ** iter_idx) 161 | W[l] -= alpha * V_upd 162 | V_upd = Vdb[l] / (1. - beta ** iter_idx) 163 | b[l] -= alpha * V_upd 164 | return W, b 165 | 166 | def gradient_descent(W, b, n_epochs = 2000, batch_size = 2**8, keep_prob = 1., lbd = 0., learning_rate = .002, beta1 = .9, beta2 = .999, decay_rate = 1.): 167 | VdW, Vdb = init_adam() 168 | SdW, Sdb = init_adam() 169 | for epoch_num in range(n_epochs): 170 | batches = split_batches(X_train, Y_train, batch_size) 171 | n_batches = len(batches) 172 | for batch_idx in range(n_batches): 173 | X_cur, Y_cur = batches[batch_idx] 174 | Z, A, D = forward_prop(X_cur, W, b, keep_prob) 175 | cost = cross_entropy_loss(A[L - 1], Y_cur, lbd) 176 | iter_idx = epoch_num * n_batches + batch_idx + 1 177 | print("Cost after " + str(iter_idx) + " iterations: " + str(cost) + '.') 178 | dW, db = backward_prop(X_cur, Y_cur, W, b, Z, A, D, keep_prob, lbd) 179 | cur_learning_rate = learning_rate / math.sqrt(epoch_num + 1) / decay_rate 180 | update_para_adam(W, b, dW, db, VdW, Vdb, SdW, Sdb, iter_idx, cur_learning_rate, beta1, beta2) 181 | #update_para(W, b, dW, db, learning_rate) 182 | #update_para_momentum(W, b, dW, db, VdW, Vdb, epoch_num, learning_rate, beta1) 183 | Wfile = open(weights_file, 'wb') 184 | pickle.dump([W, b], Wfile) 185 | Wfile.close() 186 | 187 | def set_performance(X, Y, W, b, batch_size = 2**8): 188 | m = X.shape[1] 189 | assert(m == Y.shape[1]) 190 | batches = split_batches(X, Y, batch_size) 191 | acc = 0 192 | for batch_idx in range(len(batches)): 193 | X_cur, Y_cur = batches[batch_idx] 194 | m_cur = X_cur.shape[1] 195 | assert(m_cur == Y_cur.shape[1]) 196 | Z_cur, A_cur, D_cur = forward_prop(X_cur, W, b, 1.) 197 | pred = np.argmax(A_cur[L - 1], axis = 0).reshape((m_cur, 1)) 198 | label = np.argmax(Y_cur, axis = 0).reshape((m_cur, 1)) 199 | acc += np.sum(pred == label) 200 | return acc / m 201 | 202 | fashion_type = ["T-shirt/top", "Trouser", "Pullover", "Dress", "Coat", "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot"] 203 | 204 | def demo(W, b, idx = np.random.randint(0, m_test - 1), fashion = False): 205 | Z, A, D = forward_prop(X_test[:, idx].reshape(n_x, -1), W, b, 1.) 206 | pred = np.squeeze(np.argmax(A[L - 1])) 207 | sample = X_test_images[idx] 208 | plt.imshow(sample) 209 | if (fashion): plt.suptitle("Prediction label: " + fashion_type[pred] + " | Ground truth label: " + fashion_type[label_test[idx]] ) 210 | else: plt.suptitle("Prediction label: " + str(pred) + " | Ground truth label: " + str(label_test[idx]) ) 211 | plt.title("Confidence: " + str(np.squeeze(A[L - 1][pred]) * 100.) + "%.") 212 | plt.show() 213 | 214 | def demo_wrong(W, b, fashion = False): 215 | while (True): 216 | idx = np.random.randint(0, m_test - 1) 217 | Z, A, D = forward_prop(X_test[:, idx].reshape(n_x, -1), W, b, 1.) 218 | pred = np.squeeze(np.argmax(A[L - 1])) 219 | truth = label_test[idx] 220 | if (pred != truth): 221 | sample = X_test_images[idx] 222 | plt.imshow(sample) 223 | if (fashion): plt.suptitle("Prediction label: " + fashion_type[pred] + " | Ground truth label: " + fashion_type[label_test[idx]] ) 224 | else: plt.suptitle("Prediction label: " + str(pred) + " | Ground truth label: " + str(label_test[idx]) ) 225 | plt.title("Confidence: " + str(np.squeeze(A[L - 1][pred]) * 100.) + "%.") 226 | plt.show() 227 | break 228 | 229 | layers_shape = [(28, 28), (60, 50), (50, 50), (50, 40), (50, 30), (40, 25), (25, 20), (10, 1)] 230 | def demo_all_layers(W, b, idx = np.random.randint(0, m_test - 1)): 231 | Z, A, D = forward_prop(X_test[:, idx].reshape(n_x, -1), W, b, 1.) 232 | fig = plt.figure() 233 | for l in range(0, L): 234 | cur_plot = fig.add_subplot(1, L, l+1) 235 | cur_plot.set_title("Layer " + str(l)) 236 | plt.imshow(A[l].reshape(layers_shape[l])) 237 | plt.show() 238 | 239 | def load_cache(): 240 | file_exists = os.path.isfile(weights_file) 241 | if (file_exists): 242 | Wfile = open(weights_file, 'rb') 243 | W, b = pickle.load(Wfile) 244 | Wfile.close() 245 | else: 246 | W = [None] 247 | b = [None] 248 | for l in range(1, L): 249 | W.append(np.random.randn(n[l], n[l - 1]) * np.sqrt(2. / n[l - 1])) 250 | b.append(np.zeros((n[l], 1))) 251 | Wfile = open(weights_file, 'wb') 252 | pickle.dump([W, b], Wfile) 253 | Wfile.close() 254 | return W, b 255 | 256 | W, b = load_cache() 257 | 258 | #gradient_descent(W, b, keep_prob=.7, lbd =.03, learning_rate=0.001) 259 | #print(set_performance(X_train, Y_train, W, b)) 260 | #print(set_performance(X_test, Y_test, W, b)) 261 | demo(W, b, fashion=False) 262 | #demo_wrong(W, b, fashion=False) 263 | #demo_all_layers(W, b) 264 | -------------------------------------------------------------------------------- /sample_images/0_wrong.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/pkien01/MNIST-neural-network-classifier/29c818727269e442e539884cd1cc57f9099d5d93/sample_images/0_wrong.png -------------------------------------------------------------------------------- /sample_images/3.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/pkien01/MNIST-neural-network-classifier/29c818727269e442e539884cd1cc57f9099d5d93/sample_images/3.png -------------------------------------------------------------------------------- /sample_images/4.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/pkien01/MNIST-neural-network-classifier/29c818727269e442e539884cd1cc57f9099d5d93/sample_images/4.png -------------------------------------------------------------------------------- /sample_images/9.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/pkien01/MNIST-neural-network-classifier/29c818727269e442e539884cd1cc57f9099d5d93/sample_images/9.png -------------------------------------------------------------------------------- /sample_images/all_layers.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/pkien01/MNIST-neural-network-classifier/29c818727269e442e539884cd1cc57f9099d5d93/sample_images/all_layers.png -------------------------------------------------------------------------------- /sample_images/all_layers_fashion.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/pkien01/MNIST-neural-network-classifier/29c818727269e442e539884cd1cc57f9099d5d93/sample_images/all_layers_fashion.png -------------------------------------------------------------------------------- /sample_images/ankle_boot.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/pkien01/MNIST-neural-network-classifier/29c818727269e442e539884cd1cc57f9099d5d93/sample_images/ankle_boot.png -------------------------------------------------------------------------------- /sample_images/bag_wrong.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/pkien01/MNIST-neural-network-classifier/29c818727269e442e539884cd1cc57f9099d5d93/sample_images/bag_wrong.png -------------------------------------------------------------------------------- /sample_images/dress.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/pkien01/MNIST-neural-network-classifier/29c818727269e442e539884cd1cc57f9099d5d93/sample_images/dress.png -------------------------------------------------------------------------------- /sample_images/shirt.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/pkien01/MNIST-neural-network-classifier/29c818727269e442e539884cd1cc57f9099d5d93/sample_images/shirt.png -------------------------------------------------------------------------------- /sample_images/trouser.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/pkien01/MNIST-neural-network-classifier/29c818727269e442e539884cd1cc57f9099d5d93/sample_images/trouser.png -------------------------------------------------------------------------------- /sample_images/tshirt_wrong.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/pkien01/MNIST-neural-network-classifier/29c818727269e442e539884cd1cc57f9099d5d93/sample_images/tshirt_wrong.png --------------------------------------------------------------------------------