├── LICENSE
├── README.md
├── cnn.py
├── cnn_exercise.py
├── display_network.py
├── gradient.py
├── linear_decoder_exercise.py
├── load_MNIST.py
├── load_images.py
├── output
│   ├── patches_raw.png
│   ├── patches_zca.png
│   ├── patches_zca_features.png
│   ├── pca.png
│   ├── pca_tilde.png
│   ├── pca_zcawhite.png
│   ├── raw_pca.png
│   ├── weights_sampledata.png
│   ├── weights_selftaughtlearning.png
│   └── weights_sparseAE.png
├── pca_gen.py
├── sample_images.py
├── softmax.py
├── softmax_exercise.py
├── sparse_autoencoder.py
├── stacked_ae_exercise.py
├── stl_exercise.py
└── train.py

/LICENSE:
--------------------------------------------------------------------------------
 1 | The MIT License (MIT)
 2 | 
 3 | Copyright (c) 2014 Jatin Shah
 4 | 
 5 | Permission is hereby granted, free of charge, to any person obtaining a copy
 6 | of this software and associated documentation files (the "Software"), to deal
 7 | in the Software without restriction, including without limitation the rights
 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
 9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 | 
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 | 
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
 1 | ## Stanford Unsupervised Feature Learning and Deep Learning Tutorial
 2 | 
 3 | Tutorial Website: http://ufldl.stanford.edu/wiki/index.php/UFLDL_Tutorial
 4 | 
 5 | ### Sparse Autoencoder
 6 | Sparse autoencoder vectorized implementation, learning/visualizing features on MNIST data
 7 | 
 8 | * [load_MNIST.py](load_MNIST.py): Load MNIST images
 9 | * [sample_images.py](sample_images.py): Load sample images for testing the sparse autoencoder
10 | * [gradient.py](gradient.py): Functions to compute & check cost and gradient
11 | * [display_network.py](display_network.py): Display visualized features
12 | * [sparse_autoencoder.py](sparse_autoencoder.py): Sparse autoencoder cost & gradient functions
13 | * [train.py](train.py): Train sparse autoencoder with MNIST data and visualize learnt features
14 | 
15 | ### Preprocessing: PCA & Whitening
16 | Implement PCA, PCA whitening & ZCA whitening
17 | 
18 | * [pca_gen.py](pca_gen.py)
19 | 
20 | ### Softmax Regression
21 | Classify MNIST digits via softmax regression (multinomial logistic regression)
22 | 
23 | * [softmax.py](softmax.py): Softmax regression cost & gradient functions
24 | * [softmax_exercise.py](softmax_exercise.py): Classify MNIST digits
25 | 
26 | ### Self-Taught Learning and Unsupervised Feature Learning
27 | Classify MNIST digits via the self-taught learning paradigm, i.e. learn features via a sparse autoencoder using digits 5-9 as unlabelled examples and train softmax regression on digits 0-4 as labelled examples
28 | 
29 | * [stl_exercise.py](stl_exercise.py): Classify MNIST digits via self-taught learning
30 | 
31 | ### Building Deep Networks for Classification (Stacked Sparse Autoencoder)
32 | Stacked sparse autoencoder for MNIST digit classification
33 | 
34 | * [stacked_autoencoder.py](stacked_autoencoder.py): Stacked autoencoder cost & gradient functions
35 | * [stacked_ae_exercise.py](stacked_ae_exercise.py): Classify MNIST digits
36 | 
37 | ### Linear Decoders with Autoencoders
38 | Learn features on 8x8 patches of 96x96 STL-10 color images via linear decoder (sparse autoencoder with linear activation function in output layer)
39 | 
40 | * [linear_decoder_exercise.py](linear_decoder_exercise.py)
41 | 
42 | ### Working with Large Images (Convolutional Neural Networks)
43 | Classify 64x64 STL-10 images using features learnt via linear decoder (previous section) and convolutional neural networks
44 | 
45 | * [cnn.py](cnn.py): Convolutional neural networks: convolve & pooling functions
46 | * [cnn_exercise.py](cnn_exercise.py): Classify STL-10 images
47 | 
--------------------------------------------------------------------------------
/cnn.py:
--------------------------------------------------------------------------------
 1 | import numpy as np
 2 | import scipy.signal
 3 | 
 4 | 
 5 | def sigmoid(x):
 6 |     return 1 / (1 + np.exp(-x))
 7 | 
 8 | 
 9 | def cnn_convolve(patch_dim, num_features, images, W, b, zca_white, patch_mean):
10 |     """
11 |     Returns the convolution of the features given by W and b with
12 |     the given images
13 |     :param patch_dim: patch (feature) dimension
14 |     :param num_features: number of features
15 |     :param images: large images to convolve with, matrix in the form
16 |                    images(r, c, channel, image number)
17 |     :param W: weights of the sparse autoencoder
18 |     :param b: bias of the sparse autoencoder
19 |     :param zca_white: ZCA whitening matrix
20 |     :param patch_mean: mean of the patches
21 |     :return: convolved_features(feature_num, image_num, image_row, image_col)
22 |     """
23 | 
24 |     num_images = images.shape[3]
25 |     image_dim = images.shape[0]
26 |     image_channels = images.shape[2]
27 | 
28 |     # Instructions:
29 |     #   Convolve every feature with every large image here to produce the
30 |     #   numFeatures x numImages x (imageDim - patchDim + 1) x (imageDim - patchDim + 1)
31 |     #   matrix convolvedFeatures, such that
32 |     #   convolvedFeatures(featureNum, imageNum, imageRow, imageCol) is the
33 |     #   value of the convolved featureNum feature for the imageNum image over
34 |     #   the region (imageRow, imageCol) to (imageRow + patchDim - 1, imageCol + patchDim - 1)
35 |     #
36 |     # Expected running times:
37 |     #   Convolving with 100 images should take less than 3 minutes
38 |     #   Convolving with 5000 images should take around an hour
39 |     #   (So to save time when testing, you should convolve with fewer images, as
40 |     #   described earlier)
41 | 
42 |     convolved_features = np.zeros(shape=(num_features, num_images, image_dim - patch_dim + 1,
43 |                                          image_dim - patch_dim + 1),
44 |                                   dtype=np.float64)
45 | 
46 |     WT = W.dot(zca_white)
47 |     bT = b - WT.dot(patch_mean)
48 | 
49 |     for i in range(num_images):
50 |         for j in range(num_features):
51 |             # convolution of image with feature matrix for each channel
52 |             convolved_image = np.zeros(shape=(image_dim - patch_dim + 1, image_dim - patch_dim + 1),
53 |                                        dtype=np.float64)
54 | 
55 |             for channel in range(image_channels):
56 |                 # Obtain the feature (patchDim x patchDim) needed during the convolution
57 |                 patch_size = patch_dim * patch_dim
58 |                 feature = WT[j, patch_size * channel:patch_size * (channel + 1)].reshape(patch_dim, patch_dim)
59 | 
60 |                 # Flip the feature matrix because of the definition of convolution, as explained later
61 |                 feature = np.flipud(np.fliplr(feature))
62 | 
63 |                 # Obtain the image
64 |                 im = images[:, :, channel, i]
65 | 
66 |                 # Convolve "feature" with "im", adding the result to convolved_image
67 |                 # (be sure to do a 'valid' convolution)
68 |                 convolved_image += scipy.signal.convolve2d(im, feature, mode='valid')
69 | 
70 |             # Add the (mean-corrected) bias unit, then apply the sigmoid
71 |             # function to get the hidden activation
72 |             convolved_image = sigmoid(convolved_image + bT[j])
73 | 
74 |             # The convolved feature is the sum of the convolved values over all channels
75 |             convolved_features[j, i, :, :] = convolved_image
76 | 
77 |     return convolved_features
78 | 
79 | 
80 | def cnn_pool(pool_dim, convolved_features):
81 |     """
82 |     Pools the given convolved features
83 | 
84 |     :param pool_dim: dimension of the pooling region
85 |     :param convolved_features: convolved features to pool (as given by cnn_convolve)
86 |                                convolved_features(feature_num, image_num, image_row, image_col)
87 |     :return: pooled_features: matrix of pooled features in the form
88 |              pooled_features(feature_num, image_num, pool_row, pool_col)
89 |     """
90 | 
91 |     num_images = convolved_features.shape[1]
92 |     num_features = convolved_features.shape[0]
93 |     convolved_dim = convolved_features.shape[2]
94 | 
95 |     assert convolved_dim % pool_dim == 0, "Convolved dimension is not an exact multiple of pool dimension"
96 | 
97 |     pool_size = convolved_dim / pool_dim
98 |     pooled_features = np.zeros(shape=(num_features, num_images, pool_size, pool_size),
99 |                                dtype=np.float64)
100 | 
101 |     for i in range(pool_size):
102 |         for j in range(pool_size):
103 |             pool = convolved_features[:, :, i * pool_dim:(i + 1) * pool_dim, j * pool_dim:(j + 1) * pool_dim]
104 |             pooled_features[:, :, i, j] = np.mean(np.mean(pool, 2), 2)
105 | 
106 |     return pooled_features
--------------------------------------------------------------------------------
/cnn_exercise.py:
--------------------------------------------------------------------------------
 1 | import cPickle as pickle
 2 | import display_network
 3 | import numpy as np
 4 | import scipy.io
 5 | import cnn
 6 | import sparse_autoencoder
 7 | import sys
 8 | import time
 9 | import datetime
10 | import softmax
11 | 
12 | ## CS294A/CS294W Convolutional Neural Networks Exercise
13 | 
14 | #  Instructions
15 | #  ------------
16 | #
17 | #  This file contains code that helps you get started on the
18 | #  convolutional neural networks exercise. In this exercise, you will only
19 | #  need to implement the cnn_convolve and cnn_pool functions in cnn.py.
20 | #  You will not need to modify this file.
21 | 
22 | ##======================================================================
23 | ## STEP 0: Initialization
24 | #  Here we initialize some parameters used for the exercise.
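#  (A quick sketch of the arithmetic behind the settings below, assuming
#  image_dim = 64, patch_dim = 8 and pool_dim = 19 as defined next: each
#  convolved feature map is (64 - 8 + 1) x (64 - 8 + 1) = 57 x 57, and
#  mean-pooling over disjoint 19 x 19 regions gives 57 / 19 = 3, i.e. a
#  3 x 3 pooled map per feature.)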
25 | 
26 | image_dim = 64          # image dimension
27 | image_channels = 3      # number of channels (rgb, so 3)
28 | 
29 | patch_dim = 8           # patch dimension
30 | num_patches = 50000     # number of patches
31 | 
32 | visible_size = patch_dim * patch_dim * image_channels  # number of input units
33 | output_size = visible_size                             # number of output units
34 | hidden_size = 400                                      # number of hidden units
35 | 
36 | epsilon = 0.1           # epsilon for ZCA whitening
37 | 
38 | pool_dim = 19           # dimension of pooling region
39 | 
40 | ##======================================================================
41 | ## STEP 1: Train a sparse autoencoder (with a linear decoder) to learn
42 | #  features from color patches. If you have completed the linear decoder
43 | #  exercise, use the features that you have obtained from that exercise,
44 | #  loading them into opt_theta. Recall that we have to keep around the
45 | #  parameters used in whitening (i.e., the ZCA whitening matrix and the
46 | #  patch mean)
47 | with open('stl10_features.pickle', 'rb') as f:
48 |     opt_theta = pickle.load(f)
49 |     zca_white = pickle.load(f)
50 |     patch_mean = pickle.load(f)
51 | 
52 | # Display and check to see that the features look good
53 | W = opt_theta[0:hidden_size * visible_size].reshape(hidden_size, visible_size)
54 | b = opt_theta[2 * hidden_size * visible_size:2 * hidden_size * visible_size + hidden_size]
55 | display_network.display_color_network(W.dot(zca_white).transpose(), 'zca_features_test.png')
56 | 
57 | 
58 | ##======================================================================
59 | ## STEP 2: Implement and test convolution and pooling
60 | #  In this step, you will implement convolution and pooling, and test them
61 | #  on a small part of the data set to ensure that you have implemented
62 | #  these two functions correctly. In the next step, you will actually
63 | #  convolve and pool the features with the STL-10 images.
64 | 
65 | ## STEP 2a: Implement convolution
66 | #  Implement convolution in the function cnn_convolve in cnn.py
67 | 
68 | #  Note that we have to preprocess the images in the exact same way
69 | #  we preprocessed the patches before we can obtain the feature activations.
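#  One way to see why cnn_convolve can work directly on raw images (a sketch
#  of the algebra; WT and bT are the names used inside cnn.cnn_convolve):
#  for a preprocessed patch x, the autoencoder computes
#      a = sigmoid(W * zca_white * (x - patch_mean) + b)
#        = sigmoid((W * zca_white) * x + (b - W * zca_white * patch_mean))
#  so the whitening and mean subtraction can be folded once into
#  WT = W.dot(zca_white) and bT = b - WT.dot(patch_mean), and WT is then
#  convolved with the unpreprocessed images.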
70 | 
71 | stl_train = scipy.io.loadmat('data/stlTrainSubset.mat')
72 | train_images = stl_train['trainImages']
73 | train_labels = stl_train['trainLabels']
74 | num_train_images = stl_train['numTrainImages'][0][0]
75 | 
76 | ## Use only the first 8 images for testing
77 | conv_images = train_images[:, :, :, 0:8]
78 | 
79 | convolved_features = cnn.cnn_convolve(patch_dim, hidden_size, conv_images,
80 |                                       W, b, zca_white, patch_mean)
81 | 
82 | ## STEP 2b: Checking your convolution
83 | #  To ensure that you have convolved the features correctly, we have
84 | #  provided some code to compare the results of your convolution with
85 | #  activations from the sparse autoencoder
86 | 
87 | # For 1000 random points
88 | for i in range(1000):
89 |     feature_num = np.random.randint(0, hidden_size)
90 |     image_num = np.random.randint(0, 8)
91 |     image_row = np.random.randint(0, image_dim - patch_dim + 1)
92 |     image_col = np.random.randint(0, image_dim - patch_dim + 1)
93 | 
94 |     patch = conv_images[image_row:image_row + patch_dim, image_col:image_col + patch_dim, :, image_num]
95 | 
96 |     patch = np.concatenate((patch[:, :, 0].flatten(), patch[:, :, 1].flatten(), patch[:, :, 2].flatten()))
97 |     patch = np.reshape(patch, (patch.size, 1))
98 |     patch = patch - np.tile(patch_mean, (patch.shape[1], 1)).transpose()
99 |     patch = zca_white.dot(patch)
100 | 
101 |     features = sparse_autoencoder.sparse_autoencoder(opt_theta, hidden_size, visible_size, patch)
102 | 
103 |     if abs(features[feature_num, 0] - convolved_features[feature_num, image_num, image_row, image_col]) > 1e-9:
104 |         print 'Convolved feature does not match activation from autoencoder'
105 |         print 'Feature Number    :', feature_num
106 |         print 'Image Number      :', image_num
107 |         print 'Image Row         :', image_row
108 |         print 'Image Column      :', image_col
109 |         print 'Convolved feature :', convolved_features[feature_num, image_num, image_row, image_col]
110 |         print 'Sparse AE feature :', features[feature_num, 0]
111 |         sys.exit("Convolved feature does not match activation from autoencoder. Exiting...")
112 | 
113 | print 'Congratulations! Your convolution code passed the test.'
114 | 
115 | ## STEP 2c: Implement pooling
116 | #  Implement pooling in the function cnn_pool in cnn.py
117 | 
118 | # NOTE: Implement cnn_pool first!
119 | 
120 | ## STEP 2d: Checking your pooling
121 | #  To ensure that you have implemented pooling correctly, we will use your
122 | #  pooling function to pool over a test matrix and check the results.
123 | test_matrix = np.arange(64).reshape(8, 8)
124 | expected_matrix = np.array([[np.mean(test_matrix[0:4, 0:4]), np.mean(test_matrix[0:4, 4:8])],
125 |                             [np.mean(test_matrix[4:8, 0:4]), np.mean(test_matrix[4:8, 4:8])]])
126 | 
127 | test_matrix = np.reshape(test_matrix, (1, 1, 8, 8))
128 | 
129 | pooled_features = cnn.cnn_pool(4, test_matrix)
130 | 
131 | if not (pooled_features == expected_matrix).all():
132 |     print "Pooling incorrect"
133 |     print "Expected matrix"
134 |     print expected_matrix
135 |     print "Got"
136 |     print pooled_features
137 |     sys.exit("Pooling code failed the test. Exiting...")
138 | print 'Congratulations! Your pooling code passed the test.'
139 | 
140 | ##======================================================================
141 | ## STEP 3: Convolve and pool with the dataset
142 | #  In this step, you will convolve each of the features you learned with
143 | #  the full large images to obtain the convolved features. You will then
144 | #  pool the convolved features to obtain the pooled features for
145 | #  classification.
146 | #
147 | #  Because the convolved features matrix is very large, we will do the
148 | #  convolution and pooling step_size (here 25) features at a time to avoid
149 | #  running out of memory. Reduce this number if necessary
150 | step_size = 25
151 | assert hidden_size % step_size == 0, "step_size should divide hidden_size"
152 | 
153 | stl_train = scipy.io.loadmat('data/stlTrainSubset.mat')
154 | train_images = stl_train['trainImages']
155 | train_labels = stl_train['trainLabels']
156 | num_train_images = stl_train['numTrainImages'][0][0]
157 | 
158 | stl_test = scipy.io.loadmat('data/stlTestSubset.mat')
159 | test_images = stl_test['testImages']
160 | test_labels = stl_test['testLabels']
161 | num_test_images = stl_test['numTestImages'][0][0]
162 | 
163 | pooled_features_train = np.zeros(shape=(hidden_size, num_train_images,
164 |                                         int((image_dim - patch_dim + 1) / pool_dim),
165 |                                         int((image_dim - patch_dim + 1) / pool_dim)),
166 |                                  dtype=np.float64)
167 | pooled_features_test = np.zeros(shape=(hidden_size, num_test_images,
168 |                                        int((image_dim - patch_dim + 1) / pool_dim),
169 |                                        int((image_dim - patch_dim + 1) / pool_dim)),
170 |                                 dtype=np.float64)
171 | 
172 | start_time = time.time()
173 | for conv_part in range(hidden_size / step_size):
174 |     features_start = conv_part * step_size
175 |     features_end = (conv_part + 1) * step_size
176 |     print "Step:", conv_part, "features", features_start, "to", features_end
177 | 
178 |     Wt = W[features_start:features_end, :]
179 |     bt = b[features_start:features_end]
180 | 
181 |     print "Convolving & pooling train images"
182 |     convolved_features = cnn.cnn_convolve(patch_dim, step_size, train_images,
183 |                                           Wt, bt, zca_white, patch_mean)
184 |     pooled_features = cnn.cnn_pool(pool_dim, convolved_features)
185 |     pooled_features_train[features_start:features_end, :, :, :] = pooled_features
186 | 
187 |     print "Time elapsed:", str(datetime.timedelta(seconds=time.time() - start_time))
188 | 
189 |     print "Convolving and pooling test images"
190 |     convolved_features = cnn.cnn_convolve(patch_dim, step_size, test_images,
191 |                                           Wt, bt, zca_white, patch_mean)
192 |     pooled_features = cnn.cnn_pool(pool_dim, convolved_features)
193 |     pooled_features_test[features_start:features_end, :, :, :] = pooled_features
194 | 
195 |     print "Time elapsed:", str(datetime.timedelta(seconds=time.time() - start_time))
196 | 
197 | print('Saving pooled features...')
198 | with open('cnn_pooled_features.pickle', 'wb') as f:
199 |     pickle.dump(pooled_features_train, f)
200 |     pickle.dump(pooled_features_test, f)
201 | 
202 | print "Saved"
203 | print "Time elapsed:", str(datetime.timedelta(seconds=time.time() - start_time))
204 | 
205 | ##======================================================================
206 | ## STEP 4: Use pooled features for classification
207 | #  Now, you will use your pooled features to train a softmax classifier,
208 | #  using softmax_train from the softmax exercise.
209 | #  Training the softmax classifier for 1000 iterations should take less than
210 | #  10 minutes.
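#  (Rough dimension check, assuming hidden_size = 400 and the 3 x 3 pooled
#  maps computed above: each image is summarized by 400 * 3 * 3 = 3600 pooled
#  values, so the softmax classifier below is trained on a matrix of shape
#  (3600, num_train_images).)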
211 | 
212 | # Load pooled features
213 | with open('cnn_pooled_features.pickle', 'rb') as f:
214 |     pooled_features_train = pickle.load(f)
215 |     pooled_features_test = pickle.load(f)
216 | 
217 | # Setup parameters for softmax
218 | softmax_lambda = 1e-4
219 | num_classes = 4
220 | 
221 | # Reshape the pooled_features to form an input vector for softmax
222 | softmax_images = np.transpose(pooled_features_train, axes=[0, 2, 3, 1])
223 | softmax_images = softmax_images.reshape((softmax_images.size / num_train_images, num_train_images))
224 | softmax_labels = train_labels.flatten() - 1  # Ensure that labels are from 0..n-1 (for n classes)
225 | 
226 | options_ = {'maxiter': 1000, 'disp': True}
227 | softmax_model = softmax.softmax_train(softmax_images.size / num_train_images, num_classes,
228 |                                       softmax_lambda, softmax_images, softmax_labels, options_)
229 | 
230 | (softmax_opt_theta, softmax_input_size, softmax_num_classes) = softmax_model
231 | 
232 | 
233 | ##======================================================================
234 | ## STEP 5: Test classifier
235 | #  Now you will test your trained classifier against the test images
236 | softmax_images = np.transpose(pooled_features_test, axes=[0, 2, 3, 1])
237 | softmax_images = softmax_images.reshape((softmax_images.size / num_test_images, num_test_images))
238 | softmax_labels = test_labels.flatten() - 1
239 | 
240 | predictions = softmax.softmax_predict(softmax_model, softmax_images)
241 | print "Accuracy: {0:.2f}%".format(100 * np.sum(predictions == softmax_labels, dtype=np.float64) / test_labels.shape[0])
242 | 
243 | # You should expect to get an accuracy of around 80% on the test images.
--------------------------------------------------------------------------------
/display_network.py:
--------------------------------------------------------------------------------
 1 | import numpy as np
 2 | import matplotlib.pyplot as plt
 3 | import matplotlib
 4 | import PIL
 5 | 
 6 | 
 7 | # This function visualizes filters in matrix A. Each column of A is a
 8 | # filter. We will reshape each column into a square image and visualize it
 9 | # on a cell of the visualization panel.
10 | # All other parameters are optional, and usually you do not need to worry
11 | # about them.
12 | # opt_normalize: whether we need to normalize the filters so that all of
13 | # them can have similar contrast. Default value is true.
14 | # opt_graycolor: whether we use gray as the heat map. Default is true.
15 | # opt_colmajor: you can switch convention to row major for A. In that
16 | # case, each row of A is a filter. Default value is false.
17 | def display_network(A, filename='weights.png'): 18 | opt_normalize = True 19 | opt_graycolor = True 20 | 21 | # Rescale 22 | A = A - np.average(A) 23 | 24 | # Compute rows & cols 25 | (row, col) = A.shape 26 | sz = int(np.ceil(np.sqrt(row))) 27 | buf = 1 28 | n = np.ceil(np.sqrt(col)) 29 | m = np.ceil(col / n) 30 | 31 | image = np.ones(shape=(buf + m * (sz + buf), buf + n * (sz + buf))) 32 | 33 | if not opt_graycolor: 34 | image *= 0.1 35 | 36 | k = 0 37 | for i in range(int(m)): 38 | for j in range(int(n)): 39 | if k >= col: 40 | continue 41 | 42 | clim = np.max(np.abs(A[:, k])) 43 | 44 | if opt_normalize: 45 | image[buf + i * (sz + buf):buf + i * (sz + buf) + sz, buf + j * (sz + buf):buf + j * (sz + buf) + sz] = \ 46 | A[:, k].reshape(sz, sz) / clim 47 | else: 48 | image[buf + i * (sz + buf):buf + i * (sz + buf) + sz, buf + j * (sz + buf):buf + j * (sz + buf) + sz] = \ 49 | A[:, k].reshape(sz, sz) / np.max(np.abs(A)) 50 | k += 1 51 | 52 | plt.imsave(filename, image, cmap=matplotlib.cm.gray) 53 | 54 | 55 | def display_color_network(A, filename='weights.png'): 56 | """ 57 | # display receptive field(s) or basis vector(s) for image patches 58 | # 59 | # A the basis, with patches as column vectors 60 | 61 | # In case the midpoint is not set at 0, we shift it dynamically 62 | 63 | :param A: 64 | :param file: 65 | :return: 66 | """ 67 | if np.min(A) >= 0: 68 | A = A - np.mean(A) 69 | 70 | cols = np.round(np.sqrt(A.shape[1])) 71 | 72 | channel_size = A.shape[0] / 3 73 | dim = np.sqrt(channel_size) 74 | dimp = dim + 1 75 | rows = np.ceil(A.shape[1] / cols) 76 | 77 | B = A[0:channel_size, :] 78 | C = A[channel_size:2 * channel_size, :] 79 | D = A[2 * channel_size:3 * channel_size, :] 80 | 81 | B = B / np.max(np.abs(B)) 82 | C = C / np.max(np.abs(C)) 83 | D = D / np.max(np.abs(D)) 84 | 85 | # Initialization of the image 86 | image = np.ones(shape=(dim * rows + rows - 1, dim * cols + cols - 1, 3)) 87 | 88 | for i in range(int(rows)): 89 | for j in range(int(cols)): 90 | # This sets the patch 91 | image[i * dimp:i * dimp + dim, j * dimp:j * dimp + dim, 0] = B[:, i * cols + j].reshape(dim, dim) 92 | image[i * dimp:i * dimp + dim, j * dimp:j * dimp + dim, 1] = C[:, i * cols + j].reshape(dim, dim) 93 | image[i * dimp:i * dimp + dim, j * dimp:j * dimp + dim, 2] = D[:, i * cols + j].reshape(dim, dim) 94 | 95 | image = (image + 1) / 2 96 | 97 | PIL.Image.fromarray(np.uint8(image * 255), 'RGB').save(filename) 98 | 99 | return 0 -------------------------------------------------------------------------------- /gradient.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | import stacked_autoencoder 3 | 4 | 5 | # this function accepts a 2D vector as input. 6 | # Its outputs are: 7 | # value: h(x1, x2) = x1^2 + 3*x1*x2 8 | # grad: A 2x1 vector that gives the partial derivatives of h with respect to x1 and x2 9 | # Note that when we pass @simpleQuadraticFunction(x) to computeNumericalGradients, we're assuming 10 | # that computeNumericalGradients will use only the first returned value of this function. 11 | def simple_quadratic_function(x): 12 | value = x[0] ** 2 + 3 * x[0] * x[1] 13 | 14 | grad = np.zeros(shape=2, dtype=np.float32) 15 | grad[0] = 2 * x[0] + 3 * x[1] 16 | grad[1] = 3 * x[0] 17 | 18 | return value, grad 19 | 20 | 21 | # theta: a vector of parameters 22 | # J: a function that outputs a real-number. Calling y = J(theta) will return the 23 | # function value at theta. 
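# For reference, compute_gradient approximates each partial derivative with a
# central difference:
#     dJ/dtheta_i ~= (J(theta + epsilon * e_i) - J(theta - epsilon * e_i)) / (2 * epsilon)
# where e_i is the i-th standard basis vector and epsilon is a small constant.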
24 | def compute_gradient(J, theta): 25 | epsilon = 0.0001 26 | 27 | gradient = np.zeros(theta.shape) 28 | for i in range(theta.shape[0]): 29 | theta_epsilon_plus = np.array(theta, dtype=np.float64) 30 | theta_epsilon_plus[i] = theta[i] + epsilon 31 | theta_epsilon_minus = np.array(theta, dtype=np.float64) 32 | theta_epsilon_minus[i] = theta[i] - epsilon 33 | 34 | gradient[i] = (J(theta_epsilon_plus)[0] - J(theta_epsilon_minus)[0]) / (2 * epsilon) 35 | if i % 100 == 0: 36 | print "Computing gradient for input:", i 37 | 38 | return gradient 39 | 40 | 41 | # This code can be used to check your numerical gradient implementation 42 | # in computeNumericalGradient.m 43 | # It analytically evaluates the gradient of a very simple function called 44 | # simpleQuadraticFunction (see below) and compares the result with your numerical 45 | # solution. Your numerical gradient implementation is incorrect if 46 | # your numerical solution deviates too much from the analytical solution. 47 | def check_gradient(): 48 | x = np.array([4, 10], dtype=np.float64) 49 | (value, grad) = simple_quadratic_function(x) 50 | 51 | num_grad = compute_gradient(simple_quadratic_function, x) 52 | print num_grad, grad 53 | print "The above two columns you get should be very similar.\n" \ 54 | "(Left-Your Numerical Gradient, Right-Analytical Gradient)\n" 55 | 56 | diff = np.linalg.norm(num_grad - grad) / np.linalg.norm(num_grad + grad) 57 | print diff 58 | print "Norm of the difference between numerical and analytical num_grad (should be < 1e-9)\n" 59 | 60 | 61 | def check_stacked_autoencoder(): 62 | """ 63 | # Check the gradients for the stacked autoencoder 64 | # 65 | # In general, we recommend that the creation of such files for checking 66 | # gradients when you write new cost functions. 
67 | # 68 | 69 | :return: 70 | """ 71 | ## Setup random data / small model 72 | 73 | input_size = 64 74 | hidden_size_L1 = 36 75 | hidden_size_L2 = 25 76 | lambda_ = 0.01 77 | data = np.random.randn(input_size, 10) 78 | labels = np.random.randint(4, size=10) 79 | num_classes = 4 80 | 81 | stack = [dict() for i in range(2)] 82 | stack[0]['w'] = 0.1 * np.random.randn(hidden_size_L1, input_size) 83 | stack[0]['b'] = np.random.randn(hidden_size_L1) 84 | stack[1]['w'] = 0.1 * np.random.randn(hidden_size_L2, hidden_size_L1) 85 | stack[1]['b'] = np.random.randn(hidden_size_L2) 86 | softmax_theta = 0.005 * np.random.randn(hidden_size_L2 * num_classes) 87 | 88 | params, net_config = stacked_autoencoder.stack2params(stack) 89 | 90 | stacked_theta = np.concatenate((softmax_theta, params)) 91 | 92 | cost, grad = stacked_autoencoder.stacked_autoencoder_cost(stacked_theta, input_size, 93 | hidden_size_L2, num_classes, 94 | net_config, lambda_, data, labels) 95 | 96 | # Check that the numerical and analytic gradients are the same 97 | J = lambda x: stacked_autoencoder.stacked_autoencoder_cost(x, input_size, hidden_size_L2, 98 | num_classes, net_config, lambda_, 99 | data, labels) 100 | num_grad = compute_gradient(J, stacked_theta) 101 | 102 | print num_grad, grad 103 | print "The above two columns you get should be very similar.\n" \ 104 | "(Left-Your Numerical Gradient, Right-Analytical Gradient)\n" 105 | 106 | diff = np.linalg.norm(num_grad - grad) / np.linalg.norm(num_grad + grad) 107 | print diff 108 | print "Norm of the difference between numerical and analytical num_grad (should be < 1e-9)\n" -------------------------------------------------------------------------------- /linear_decoder_exercise.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | import sparse_autoencoder 3 | import gradient 4 | import scipy.io 5 | import display_network 6 | import scipy.optimize 7 | import cPickle 8 | 9 | ##====================================================================== 10 | ## STEP 0: Initialization 11 | # Here we initialize some parameters used for the exercise. 12 | 13 | image_channels = 3 # number of channels (rgb, so 3) 14 | 15 | patch_dim = 8 # patch dimension 16 | num_patches = 100000 # number of patches 17 | 18 | visible_size = patch_dim * patch_dim * image_channels # number of input units 19 | output_size = visible_size # number of output units 20 | hidden_size = 400 # number of hidden units 21 | 22 | sparsity_param = 0.035 # desired average activation of the hidden units. 23 | lambda_ = 3e-3 # weight decay parameter 24 | beta = 5 # weight of sparsity penalty term 25 | 26 | epsilon = 0.1 # epsilon for ZCA whitening 27 | 28 | ##====================================================================== 29 | ## STEP 1: Create and modify sparseAutoencoderLinearCost.m to use a linear decoder, 30 | # and check gradients 31 | # You should copy sparseAutoencoderCost.m from your earlier exercise 32 | # and rename it to sparseAutoencoderLinearCost.m. 33 | # Then you need to rename the function from sparseAutoencoderCost to 34 | # sparseAutoencoderLinearCost, and modify it so that the sparse autoencoder 35 | # uses a linear decoder instead. Once that is done, you should check 36 | # your gradients to verify that they are correct. 37 | 38 | # NOTE: Modify sparseAutoencoderCost first! 
39 | 
40 | # To speed up gradient checking, we will use a reduced network and some
41 | # dummy patches
42 | 
43 | debug_hidden_size = 5
44 | debug_visible_size = 8
45 | patches = np.random.rand(8, 10)
46 | 
47 | theta = sparse_autoencoder.initialize(debug_hidden_size, debug_visible_size)
48 | 
49 | cost, grad = sparse_autoencoder.sparse_autoencoder_linear_cost(theta, debug_visible_size, debug_hidden_size,
50 |                                                                lambda_, sparsity_param, beta, patches)
51 | 
52 | # Check gradients
53 | J = lambda x: sparse_autoencoder.sparse_autoencoder_linear_cost(x, debug_visible_size, debug_hidden_size,
54 |                                                                 lambda_, sparsity_param, beta, patches)
55 | num_grad = gradient.compute_gradient(J, theta)
56 | 
57 | print grad, num_grad
58 | 
59 | # Compare numerically computed gradients with the ones obtained from backpropagation
60 | diff = np.linalg.norm(num_grad - grad) / np.linalg.norm(num_grad + grad)
61 | print diff
62 | print "Norm of the difference between numerical and analytical num_grad (should be < 1e-9)\n\n"
63 | 
64 | ##======================================================================
65 | ## STEP 2: Learn features on small patches
66 | # In this step, you will use your sparse autoencoder (which now uses a
67 | # linear decoder) to learn features on small patches sampled from related
68 | # images.
69 | 
70 | ## STEP 2a: Load patches
71 | # In this step, we load 100k patches sampled from the STL-10 dataset and
72 | # visualize them. Note that these patches have been scaled to [0,1]
73 | 
74 | patches = scipy.io.loadmat('data/stlSampledPatches.mat')['patches']
75 | 
76 | display_network.display_color_network(patches[:, 0:100], filename='patches_raw.png')
77 | 
78 | 
79 | ## STEP 2b: Apply preprocessing
80 | # In this sub-step, we preprocess the sampled patches, in particular,
81 | # ZCA whitening them.
82 | #
83 | # In a later exercise on convolution and pooling, you will need to replicate
84 | # exactly the preprocessing steps you apply to these patches before
85 | # using the autoencoder to learn features on them. Hence, we will save the
86 | # ZCA whitening and mean image matrices together with the learned features
87 | # later on.
88 | 
89 | # Subtract mean patch (hence zeroing the mean of the patches)
90 | patch_mean = np.mean(patches, 1)
91 | patches = patches - np.tile(patch_mean, (patches.shape[1], 1)).transpose()
92 | 
93 | # Apply ZCA whitening
94 | sigma = patches.dot(patches.transpose()) / patches.shape[1]
95 | (u, s, v) = np.linalg.svd(sigma)
96 | zca_white = u.dot(np.diag(1 / np.sqrt(s + epsilon))).dot(u.transpose())
97 | patches_zca = zca_white.dot(patches)
98 | 
99 | display_network.display_color_network(patches_zca[:, 0:100], filename='patches_zca.png')
100 | 
101 | ## STEP 2c: Learn features
102 | # You will now use your sparse autoencoder (with linear decoder) to learn
103 | # features on the preprocessed patches. This should take around 45 minutes.
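# Optional sanity check (a sketch, not part of the original exercise): the
# covariance of the whitened patches works out to
# u.dot(np.diag(s / (s + epsilon))).dot(u.transpose()), which approaches the
# identity matrix as epsilon goes to 0, e.g.:
#
#     cov = patches_zca.dot(patches_zca.transpose()) / patches_zca.shape[1]
#     print np.abs(cov - u.dot(np.diag(s / (s + epsilon))).dot(u.transpose())).max()  # ~0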
104 | 
105 | theta = sparse_autoencoder.initialize(hidden_size, visible_size)
106 | 
107 | options_ = {'maxiter': 400, 'disp': True}
108 | 
109 | J = lambda x: sparse_autoencoder.sparse_autoencoder_linear_cost(x, visible_size, hidden_size,
110 |                                                                 lambda_, sparsity_param, beta, patches_zca)
111 | 
112 | result = scipy.optimize.minimize(J, theta, method='L-BFGS-B', jac=True, options=options_)
113 | opt_theta = result.x
114 | print result
115 | 
116 | # Save the learned features and the preprocessing matrices for use in
117 | # the later exercise on convolution and pooling
118 | print('Saving learned features and preprocessing matrices...')
119 | with open('stl10_features.pickle', 'wb') as f:
120 |     cPickle.dump(opt_theta, f)
121 |     cPickle.dump(zca_white, f)
122 |     cPickle.dump(patch_mean, f)
123 | print('Saved.')
124 | 
125 | ## STEP 2d: Visualize learned features
126 | W = opt_theta[0:hidden_size * visible_size].reshape(hidden_size, visible_size)
127 | b = opt_theta[2 * hidden_size * visible_size:2 * hidden_size * visible_size + hidden_size]
128 | display_network.display_color_network(W.dot(zca_white).transpose(), 'patches_zca_features.png')
--------------------------------------------------------------------------------
/load_MNIST.py:
--------------------------------------------------------------------------------
 1 | import numpy as np
 2 | 
 3 | 
 4 | def load_MNIST_images(filename):
 5 |     """
 6 |     returns a 28x28x[number of MNIST images] matrix containing
 7 |     the raw MNIST images
 8 |     :param filename: input data file
 9 |     """
10 |     with open(filename, 'rb') as f:
11 |         magic = np.fromfile(f, dtype=np.dtype('>i4'), count=1)
12 | 
13 |         num_images = np.fromfile(f, dtype=np.dtype('>i4'), count=1)
14 |         num_rows = np.fromfile(f, dtype=np.dtype('>i4'), count=1)
15 |         num_cols = np.fromfile(f, dtype=np.dtype('>i4'), count=1)
16 | 
17 |         images = np.fromfile(f, dtype=np.ubyte)
18 |         images = images.reshape((num_images, num_rows * num_cols)).transpose()
19 |         images = images.astype(np.float64) / 255
20 | 
21 | 
22 | 
23 |     return images
24 | 
25 | 
26 | def load_MNIST_labels(filename):
27 |     """
28 |     returns a [number of MNIST images]x1 matrix containing
29 |     the labels for the MNIST images
30 | 
31 |     :param filename: input file with labels
32 |     """
33 |     with open(filename, 'rb') as f:
34 |         magic = np.fromfile(f, dtype=np.dtype('>i4'), count=1)
35 | 
36 |         num_labels = np.fromfile(f, dtype=np.dtype('>i4'), count=1)
37 | 
38 |         labels = np.fromfile(f, dtype=np.ubyte)
39 | 
40 | 
41 | 
42 |     return labels
--------------------------------------------------------------------------------
/load_images.py:
--------------------------------------------------------------------------------
 1 | import cPickle
 2 | import numpy as np
 3 | 
 4 | 
 5 | def unpickle(file_name):
 6 |     fo = open(file_name, 'rb')
 7 |     image_dict = cPickle.load(fo)
 8 |     fo.close()
 9 |     return image_dict
10 | 
11 | 
12 | # Each column contains grayscale value for the image
13 | # Squash data to [0.1, 0.9]
14 | def normalize_data(images):
15 |     # Subtract mean of each image from its individual values
16 |     mean = images.mean(axis=0)
17 |     images = images - mean
18 | 
19 |     # Truncate to +/- 3 standard deviations and scale to -1 and +1
20 |     pstd = 3 * images.std()
21 |     images = np.maximum(np.minimum(images, pstd), -pstd) / pstd
22 | 
23 |     # Rescale from [-1,+1] to [0.1,0.9]
24 |     images = (1 + images) * 0.4 + 0.1
25 | 
26 |     return images
27 | 
28 | 
29 | # Convert RGB values to monochrome
30 | def monochrome(r, g, b):
31 |     return (0.2125 * r) + (0.7154 * g) + (0.0721 * b)
32 | 
33 | 
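# (The weights above are, to within rounding, the ITU-R BT.709 luma
# coefficients, and they sum to 1 so that intensities are preserved, e.g.
# monochrome(1.0, 1.0, 1.0) == 0.2125 + 0.7154 + 0.0721 == 1.0.)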
34 | # Returns 10000 gray scale images for training from CIFAR-10 data
35 | def load_images():
36 |     image_size = 32
37 |     num_images = 10000
38 |     image_file = 'data/cifar10/data_batch_1'
39 | 
40 |     # Load Images & select first num_images images
41 |     image_dict = unpickle(image_file)
42 |     image_data = image_dict['data'][0:num_images]
43 | 
44 |     # Convert to grayscale & normalize
45 |     red_data = image_data[:, 0:image_size * image_size]
46 |     green_data = image_data[:, image_size * image_size:2 * image_size * image_size]
47 |     blue_data = image_data[:, 2 * image_size * image_size:3 * image_size * image_size]
48 | 
49 |     grayscale_data = monochrome(red_data, green_data, blue_data)
50 |     grayscale_data = normalize_data(grayscale_data.transpose())
51 | 
52 |     return grayscale_data
--------------------------------------------------------------------------------
/output/patches_raw.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jatinshah/ufldl_tutorial/8e4a6724342dfd4cd3a8211f60b6ef9a4137e5ce/output/patches_raw.png
--------------------------------------------------------------------------------
/output/patches_zca.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jatinshah/ufldl_tutorial/8e4a6724342dfd4cd3a8211f60b6ef9a4137e5ce/output/patches_zca.png
--------------------------------------------------------------------------------
/output/patches_zca_features.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jatinshah/ufldl_tutorial/8e4a6724342dfd4cd3a8211f60b6ef9a4137e5ce/output/patches_zca_features.png
--------------------------------------------------------------------------------
/output/pca.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jatinshah/ufldl_tutorial/8e4a6724342dfd4cd3a8211f60b6ef9a4137e5ce/output/pca.png
--------------------------------------------------------------------------------
/output/pca_tilde.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jatinshah/ufldl_tutorial/8e4a6724342dfd4cd3a8211f60b6ef9a4137e5ce/output/pca_tilde.png
--------------------------------------------------------------------------------
/output/pca_zcawhite.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jatinshah/ufldl_tutorial/8e4a6724342dfd4cd3a8211f60b6ef9a4137e5ce/output/pca_zcawhite.png
--------------------------------------------------------------------------------
/output/raw_pca.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jatinshah/ufldl_tutorial/8e4a6724342dfd4cd3a8211f60b6ef9a4137e5ce/output/raw_pca.png
--------------------------------------------------------------------------------
/output/weights_sampledata.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jatinshah/ufldl_tutorial/8e4a6724342dfd4cd3a8211f60b6ef9a4137e5ce/output/weights_sampledata.png
--------------------------------------------------------------------------------
/output/weights_selftaughtlearning.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jatinshah/ufldl_tutorial/8e4a6724342dfd4cd3a8211f60b6ef9a4137e5ce/output/weights_selftaughtlearning.png -------------------------------------------------------------------------------- /output/weights_sparseAE.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jatinshah/ufldl_tutorial/8e4a6724342dfd4cd3a8211f60b6ef9a4137e5ce/output/weights_sparseAE.png -------------------------------------------------------------------------------- /pca_gen.py: -------------------------------------------------------------------------------- 1 | import sample_images 2 | import random 3 | import display_network 4 | import numpy as np 5 | 6 | 7 | ##================================================================ 8 | ## Step 0a: Load data 9 | # Here we provide the code to load natural image data into x. 10 | # x will be a 144 * 10000 matrix, where the kth column x(:, k) corresponds to 11 | # the raw image data from the kth 12x12 image patch sampled. 12 | # You do not need to change the code below. 13 | 14 | patches = sample_images.sample_images_raw() 15 | num_samples = patches.shape[1] 16 | random_sel = random.sample(range(num_samples), 400) 17 | display_network.display_network(patches[:, random_sel], 'raw_pca.png') 18 | 19 | ##================================================================ 20 | ## Step 0b: Zero-mean the data (by row) 21 | # You can make use of the mean and repmat/bsxfun functions. 22 | 23 | # patches = patches - patches.mean(axis=0) 24 | patch_mean = patches.mean(axis=1) 25 | patches = patches - np.tile(patch_mean, (patches.shape[1], 1)).transpose() 26 | 27 | ##================================================================ 28 | ## Step 1a: Implement PCA to obtain xRot 29 | # Implement PCA to obtain xRot, the matrix in which the data is expressed 30 | # with respect to the eigenbasis of sigma, which is the matrix U. 31 | 32 | sigma = patches.dot(patches.transpose()) / patches.shape[1] 33 | (u, s, v) = np.linalg.svd(sigma) 34 | 35 | patches_rot = u.transpose().dot(patches) 36 | 37 | ##================================================================ 38 | ## Step 2: Find k, the number of components to retain 39 | # Write code to determine k, the number of components to retain in order 40 | # to retain at least 99% of the variance. 41 | 42 | k = 0 43 | for k in range(s.shape[0]): 44 | if s[0:k].sum() / s.sum() >= 0.99: 45 | break 46 | print 'Optimal k to retain 99% variance is:', k 47 | 48 | ##================================================================ 49 | ## Step 3: Implement PCA with dimension reduction 50 | # Now that you have found k, you can reduce the dimension of the data by 51 | # discarding the remaining dimensions. In this way, you can represent the 52 | # data in k dimensions instead of the original 144, which will save you 53 | # computational time when running learning algorithms on the reduced 54 | # representation. 55 | # 56 | # Following the dimension reduction, invert the PCA transformation to produce 57 | # the matrix xHat, the dimension-reduced data with respect to the original basis. 58 | # Visualise the data and compare it to the raw data. You will observe that 59 | # there is little loss due to throwing away the principal components that 60 | # correspond to dimensions with low variation. 
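# In matrix form (a sketch of what the code below computes, with u the
# eigenbasis and k the number of retained components):
#     patches_tilde = u[:, 0:k]^T * patches      (k x m, reduced representation)
#     patches_hat   = u[:, 0:k] * patches_tilde  (reconstruction in the original basis)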
61 | 
62 | patches_tilde = u[:, 0:k].transpose().dot(patches)
63 | patches_hat = u[:, 0:k].dot(patches_tilde)
64 | 
65 | display_network.display_network(patches_hat[:, random_sel], 'pca_tilde.png')
66 | display_network.display_network(patches[:, random_sel], 'pca.png')
67 | 
68 | ##================================================================
69 | ## Step 4a: Implement PCA with whitening and regularisation
70 | # Implement PCA with whitening and regularisation to produce the matrix
71 | # xPCAWhite.
72 | 
73 | epsilon = 0.1
74 | patches_pcawhite = np.diag(1 / np.sqrt(s + epsilon)).dot(patches_rot)
75 | 
76 | 
77 | ##================================================================
78 | ## Step 5: Implement ZCA whitening
79 | # Now implement ZCA whitening to produce the matrix xZCAWhite.
80 | # Visualise the data and compare it to the raw data. You should observe
81 | # that whitening results in, among other things, enhanced edges.
82 | 
83 | patches_zcawhite = u.dot(patches_pcawhite)
84 | display_network.display_network(patches_zcawhite[:, random_sel], 'pca_zcawhite.png')
85 | 
--------------------------------------------------------------------------------
/sample_images.py:
--------------------------------------------------------------------------------
 1 | import random
 2 | 
 3 | import numpy as np
 4 | import scipy.io
 5 | 
 6 | 
 7 | # Returns 10000 image patches for training
 8 | # Each column contains grayscale value for the image
 9 | # Squash data to [0.1, 0.9]
10 | def normalize_data(images):
11 |     # Subtract mean of each image from its individual values
12 |     mean = images.mean(axis=0)
13 |     images = images - mean
14 | 
15 |     # Truncate to +/- 3 standard deviations and scale to -1 and +1
16 |     pstd = 3 * images.std()
17 |     images = np.maximum(np.minimum(images, pstd), -pstd) / pstd
18 | 
19 |     # Rescale from [-1,+1] to [0.1,0.9]
20 |     images = (1 + images) * 0.4 + 0.1
21 | 
22 |     return images
23 | 
24 | 
25 | # Returns 10000 patches for training
26 | # IMAGES is a 3D array containing 10 images
27 | # For instance, IMAGES(:,:,6) is a 512x512 array containing the 6th image,
28 | # (The contrast on these images looks a bit off because they have
29 | # been preprocessed using "whitening." See the lecture notes for
30 | # more details.) As a second example, IMAGES(21:30,21:30,1) is an image
31 | # patch corresponding to the pixels in the block (21,21) to (30,30) of
32 | # Image 1
33 | def sample_images():
34 |     patch_size = 8
35 |     num_patches = 10000
36 |     num_images = 10
37 |     image_size = 512
38 | 
39 |     image_data = scipy.io.loadmat('data/IMAGES.mat')['IMAGES']
40 | 
41 |     # Initialize patches with zeros.
42 |     patches = np.zeros(shape=(patch_size * patch_size, num_patches))
43 | 
44 |     for i in range(num_patches):
45 |         image_id = random.randint(0, num_images - 1)
46 |         image_x = random.randint(0, image_size - patch_size)
47 |         image_y = random.randint(0, image_size - patch_size)
48 | 
49 |         img = image_data[:, :, image_id]
50 |         patch = img[image_x:image_x + patch_size, image_y:image_y + patch_size].reshape(patch_size * patch_size)
51 |         patches[:, i] = patch
52 | 
53 |     return normalize_data(patches)
54 | 
55 | 
56 | # sampleIMAGESRAW
57 | # Returns 10000 "raw" unwhitened patches
58 | def sample_images_raw():
59 |     image_data = scipy.io.loadmat('data/IMAGES_RAW.mat')['IMAGESr']
60 | 
61 |     patch_size = 12
62 |     num_patches = 10000
63 |     num_images = image_data.shape[2]
64 |     image_size = image_data.shape[0]
65 | 
66 |     patches = np.zeros(shape=(patch_size * patch_size, num_patches))
67 | 
68 |     for i in range(num_patches):
69 |         image_id = random.randint(0, num_images - 1)
70 |         image_x = random.randint(0, image_size - patch_size)
71 |         image_y = random.randint(0, image_size - patch_size)
72 | 
73 |         img = image_data[:, :, image_id]
74 |         patch = img[image_x:image_x + patch_size, image_y:image_y + patch_size].reshape(patch_size * patch_size)
75 |         patches[:, i] = patch
76 | 
77 |     return patches
--------------------------------------------------------------------------------
/softmax.py:
--------------------------------------------------------------------------------
 1 | import numpy as np
 2 | import scipy.sparse
 3 | import scipy.optimize
 4 | 
 5 | 
 6 | def softmax_cost(theta, num_classes, input_size, lambda_, data, labels):
 7 |     """
 8 |     Compute the softmax cost function and its gradient
 9 |     :param theta: parameter vector
10 |     :param num_classes: the number of classes
11 |     :param input_size: the size N of the input vector
12 |     :param lambda_: weight decay parameter
13 |     :param data: the N x M input matrix, where each column corresponds to
14 |                  a single training example
15 |     :param labels: an M x 1 matrix containing the labels for the input data
16 |     """
17 |     m = data.shape[1]
18 |     theta = theta.reshape(num_classes, input_size)
19 |     theta_data = theta.dot(data)
20 |     theta_data = theta_data - np.max(theta_data)
21 |     prob_data = np.exp(theta_data) / np.sum(np.exp(theta_data), axis=0)
22 |     indicator = scipy.sparse.csr_matrix((np.ones(m), (labels, np.array(range(m)))))
23 |     indicator = np.array(indicator.todense())
24 |     cost = (-1.0 / m) * np.sum(indicator * np.log(prob_data)) + (lambda_ / 2) * np.sum(theta * theta)
25 | 
26 |     grad = (-1.0 / m) * (indicator - prob_data).dot(data.transpose()) + lambda_ * theta
27 | 
28 |     return cost, grad.flatten()
29 | 
30 | 
31 | def softmax_predict(model, data):
32 |     # model - model trained using softmax_train
33 |     # data  - the N x M input matrix, where each column data(:, i) corresponds to
34 |     #         a single test example
35 |     #
36 |     # Your code should produce the prediction matrix
37 |     # pred, where pred(i) is argmax_c P(y(c) | x(i)).
38 | 
39 |     opt_theta, input_size, num_classes = model
40 |     opt_theta = opt_theta.reshape(num_classes, input_size)
41 | 
42 |     prod = opt_theta.dot(data)
43 |     pred = np.exp(prod) / np.sum(np.exp(prod), axis=0)
44 |     pred = pred.argmax(axis=0)
45 | 
46 |     return pred
47 | 
48 | 
49 | def softmax_train(input_size, num_classes, lambda_, data, labels, options={'maxiter': 400, 'disp': True}):
50 |     # softmax_train: Train a softmax model with the given parameters on the given
51 |     # data. Returns opt_theta, a vector containing the trained parameters
52 |     # for the model.
53 | # 54 | # input_size: the size of an input vector x^(i) 55 | # num_classes: the number of classes 56 | # lambda_: weight decay parameter 57 | # input_data: an N by M matrix containing the input data, such that 58 | # inputData(:, c) is the cth input 59 | # labels: M by 1 matrix containing the class labels for the 60 | # corresponding inputs. labels(c) is the class label for 61 | # the cth input 62 | # options (optional): options 63 | # options.maxIter: number of iterations to train for 64 | 65 | # Initialize theta randomly 66 | theta = 0.005 * np.random.randn(num_classes * input_size) 67 | 68 | J = lambda x: softmax_cost(x, num_classes, input_size, lambda_, data, labels) 69 | 70 | result = scipy.optimize.minimize(J, theta, method='L-BFGS-B', jac=True, options=options) 71 | 72 | print result 73 | # Return optimum theta, input size & num classes 74 | opt_theta = result.x 75 | 76 | return opt_theta, input_size, num_classes 77 | 78 | -------------------------------------------------------------------------------- /softmax_exercise.py: -------------------------------------------------------------------------------- 1 | import load_MNIST 2 | import numpy as np 3 | import softmax 4 | import gradient 5 | 6 | ##====================================================================== 7 | ## STEP 0: Initialise constants and parameters 8 | # 9 | # Here we define and initialise some constants which allow your code 10 | # to be used more generally on any arbitrary input. 11 | # We also initialise some parameters used for tuning the model. 12 | 13 | # Size of input vector (MNIST images are 28x28) 14 | input_size = 28 * 28 15 | # Number of classes (MNIST images fall into 10 classes) 16 | num_classes = 10 17 | # Weight decay parameter 18 | lambda_ = 1e-4 19 | # Debug 20 | debug = False 21 | 22 | ##====================================================================== 23 | ## STEP 1: Load data 24 | # 25 | # In this section, we load the input and output data. 26 | # For softmax regression on MNIST pixels, 27 | # the input data is the images, and 28 | # the output data is the labels. 29 | # 30 | 31 | # Change the filenames if you've saved the files under different names 32 | # On some platforms, the files might be saved as 33 | # train-images.idx3-ubyte / train-labels.idx1-ubyte 34 | 35 | images = load_MNIST.load_MNIST_images('data/mnist/train-images-idx3-ubyte') 36 | labels = load_MNIST.load_MNIST_labels('data/mnist/train-labels-idx1-ubyte') 37 | 38 | if debug: 39 | input_size = 8 * 8 40 | input_data = np.random.randn(input_size, 100) 41 | labels = np.random.randint(num_classes, size=100) 42 | else: 43 | input_size = 28 * 28 44 | input_data = images 45 | 46 | # Randomly initialise theta 47 | theta = 0.005 * np.random.randn(num_classes * input_size) 48 | 49 | 50 | ##====================================================================== 51 | ## STEP 2: Implement softmaxCost 52 | # 53 | # Implement softmaxCost in softmaxCost.m. 54 | 55 | (cost, grad) = softmax.softmax_cost(theta, num_classes, input_size, lambda_, input_data, labels) 56 | 57 | ##====================================================================== 58 | ## STEP 3: Gradient checking 59 | # 60 | # As with any learning algorithm, you should always check that your 61 | # gradients are correct before learning the parameters. 
62 | # 63 | if debug: 64 | J = lambda x: softmax.softmax_cost(x, num_classes, input_size, lambda_, input_data, labels) 65 | 66 | num_grad = gradient.compute_gradient(J, theta) 67 | 68 | # Use this to visually compare the gradients side by side 69 | print num_grad, grad 70 | 71 | # Compare numerically computed gradients with the ones obtained from backpropagation 72 | diff = np.linalg.norm(num_grad - grad) / np.linalg.norm(num_grad + grad) 73 | print diff 74 | print "Norm of the difference between numerical and analytical num_grad (should be < 1e-7)\n\n" 75 | 76 | ##====================================================================== 77 | ## STEP 4: Learning parameters 78 | # 79 | # Once you have verified that your gradients are correct, 80 | # you can start training your softmax regression code using softmaxTrain 81 | # (which uses minFunc). 82 | 83 | options_ = {'maxiter': 100, 'disp': True} 84 | opt_theta, input_size, num_classes = softmax.softmax_train(input_size, num_classes, 85 | lambda_, input_data, labels, options_) 86 | 87 | ##====================================================================== 88 | ## STEP 5: Testing 89 | # 90 | # You should now test your model against the test images. 91 | # To do this, you will first need to write softmaxPredict 92 | # (in softmaxPredict.m), which should return predictions 93 | # given a softmax model and the input data. 94 | 95 | test_images = load_MNIST.load_MNIST_images('data/mnist/t10k-images.idx3-ubyte') 96 | test_labels = load_MNIST.load_MNIST_labels('data/mnist/t10k-labels.idx1-ubyte') 97 | predictions = softmax.softmax_predict((opt_theta, input_size, num_classes), test_images) 98 | print "Accuracy: {0:.2f}%".format(100 * np.sum(predictions == test_labels, dtype=np.float64) / test_labels.shape[0]) -------------------------------------------------------------------------------- /sparse_autoencoder.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | 3 | 4 | def sigmoid(x): 5 | return 1 / (1 + np.exp(-x)) 6 | 7 | 8 | def sigmoid_prime(x): 9 | return sigmoid(x) * (1 - sigmoid(x)) 10 | 11 | 12 | def KL_divergence(x, y): 13 | return x * np.log(x / y) + (1 - x) * np.log((1 - x) / (1 - y)) 14 | 15 | 16 | def initialize(hidden_size, visible_size): 17 | # we'll choose weights uniformly from the interval [-r, r] 18 | r = np.sqrt(6) / np.sqrt(hidden_size + visible_size + 1) 19 | W1 = np.random.random((hidden_size, visible_size)) * 2 * r - r 20 | W2 = np.random.random((visible_size, hidden_size)) * 2 * r - r 21 | 22 | b1 = np.zeros(hidden_size, dtype=np.float64) 23 | b2 = np.zeros(visible_size, dtype=np.float64) 24 | 25 | theta = np.concatenate((W1.reshape(hidden_size * visible_size), 26 | W2.reshape(hidden_size * visible_size), 27 | b1.reshape(hidden_size), 28 | b2.reshape(visible_size))) 29 | 30 | return theta 31 | 32 | 33 | # visible_size: the number of input units (probably 64) 34 | # hidden_size: the number of hidden units (probably 25) 35 | # lambda_: weight decay parameter 36 | # sparsity_param: The desired average activation for the hidden units (denoted in the lecture 37 | # notes by the greek alphabet rho, which looks like a lower-case "p"). 38 | # beta: weight of sparsity penalty term 39 | # data: Our 64x10000 matrix containing the training data. So, data(:,i) is the i-th training example. 40 | # 41 | # The input theta is a vector (because minFunc expects the parameters to be a vector). 
42 | # We first convert theta to the (W1, W2, b1, b2) matrix/vector format, so that this 43 | # follows the notation convention of the lecture notes. 44 | # Returns: (cost,gradient) tuple 45 | def sparse_autoencoder_cost(theta, visible_size, hidden_size, 46 | lambda_, sparsity_param, beta, data): 47 | # The input theta is a vector (because minFunc expects the parameters to be a vector). 48 | # We first convert theta to the (W1, W2, b1, b2) matrix/vector format, so that this 49 | # follows the notation convention of the lecture notes. 50 | 51 | W1 = theta[0:hidden_size * visible_size].reshape(hidden_size, visible_size) 52 | W2 = theta[hidden_size * visible_size:2 * hidden_size * visible_size].reshape(visible_size, hidden_size) 53 | b1 = theta[2 * hidden_size * visible_size:2 * hidden_size * visible_size + hidden_size] 54 | b2 = theta[2 * hidden_size * visible_size + hidden_size:] 55 | 56 | # Number of training examples 57 | m = data.shape[1] 58 | 59 | # Forward propagation 60 | z2 = W1.dot(data) + np.tile(b1, (m, 1)).transpose() 61 | a2 = sigmoid(z2) 62 | z3 = W2.dot(a2) + np.tile(b2, (m, 1)).transpose() 63 | h = sigmoid(z3) 64 | 65 | # Sparsity 66 | rho_hat = np.sum(a2, axis=1) / m 67 | rho = np.tile(sparsity_param, hidden_size) 68 | 69 | # Cost function 70 | cost = np.sum((h - data) ** 2) / (2 * m) + \ 71 | (lambda_ / 2) * (np.sum(W1 ** 2) + np.sum(W2 ** 2)) + \ 72 | beta * np.sum(KL_divergence(rho, rho_hat)) 73 | 74 | # Backprop 75 | sparsity_delta = np.tile(- rho / rho_hat + (1 - rho) / (1 - rho_hat), (m, 1)).transpose() 76 | 77 | delta3 = -(data - h) * sigmoid_prime(z3) 78 | delta2 = (W2.transpose().dot(delta3) + beta * sparsity_delta) * sigmoid_prime(z2) 79 | W1grad = delta2.dot(data.transpose()) / m + lambda_ * W1 80 | W2grad = delta3.dot(a2.transpose()) / m + lambda_ * W2 81 | b1grad = np.sum(delta2, axis=1) / m 82 | b2grad = np.sum(delta3, axis=1) / m 83 | 84 | # After computing the cost and gradient, we will convert the gradients back 85 | # to a vector format (suitable for minFunc). Specifically, we will unroll 86 | # your gradient matrices into a vector. 87 | grad = np.concatenate((W1grad.reshape(hidden_size * visible_size), 88 | W2grad.reshape(hidden_size * visible_size), 89 | b1grad.reshape(hidden_size), 90 | b2grad.reshape(visible_size))) 91 | 92 | return cost, grad 93 | 94 | 95 | def sparse_autoencoder(theta, hidden_size, visible_size, data): 96 | """ 97 | :param theta: trained weights from the autoencoder 98 | :param hidden_size: the number of hidden units (probably 25) 99 | :param visible_size: the number of input units (probably 64) 100 | :param data: Our matrix containing the training data as columns. So, data(:,i) is the i-th training example. 101 | """ 102 | 103 | # We first convert theta to the (W1, W2, b1, b2) matrix/vector format, so that this 104 | # follows the notation convention of the lecture notes. 
105 |     W1 = theta[0:hidden_size * visible_size].reshape(hidden_size, visible_size)
106 |     b1 = theta[2 * hidden_size * visible_size:2 * hidden_size * visible_size + hidden_size]
107 |
108 |     # Number of training examples
109 |     m = data.shape[1]
110 |
111 |     # Forward propagation
112 |     z2 = W1.dot(data) + np.tile(b1, (m, 1)).transpose()
113 |     a2 = sigmoid(z2)
114 |
115 |     return a2
116 |
117 |
118 | # visible_size: the number of input units (probably 64)
119 | # hidden_size: the number of hidden units (probably 25)
120 | # lambda_: weight decay parameter
121 | # sparsity_param: The desired average activation for the hidden units (denoted in the lecture
122 | #                 notes by the Greek letter rho, which looks like a lower-case "p").
123 | # beta: weight of sparsity penalty term
124 | # data: Our 64x10000 matrix containing the training data. So, data[:, i] is the i-th training example.
125 | #
126 | # The input theta is a vector (because scipy.optimize.minimize expects the parameters to be a vector).
127 | # We first convert theta to the (W1, W2, b1, b2) matrix/vector format, so that this
128 | # follows the notation convention of the lecture notes.
129 | # Returns: (cost, gradient) tuple
130 | def sparse_autoencoder_linear_cost(theta, visible_size, hidden_size,
131 |                                    lambda_, sparsity_param, beta, data):
132 |     # The input theta is a vector (because scipy.optimize.minimize expects the parameters to be a vector).
133 |     # We first convert theta to the (W1, W2, b1, b2) matrix/vector format, so that this
134 |     # follows the notation convention of the lecture notes.
135 |
136 |     W1 = theta[0:hidden_size * visible_size].reshape(hidden_size, visible_size)
137 |     W2 = theta[hidden_size * visible_size:2 * hidden_size * visible_size].reshape(visible_size, hidden_size)
138 |     b1 = theta[2 * hidden_size * visible_size:2 * hidden_size * visible_size + hidden_size]
139 |     b2 = theta[2 * hidden_size * visible_size + hidden_size:]
140 |
141 |     # Number of training examples
142 |     m = data.shape[1]
143 |
144 |     # Forward propagation (note the output layer is linear: h = z3)
145 |     z2 = W1.dot(data) + np.tile(b1, (m, 1)).transpose()
146 |     a2 = sigmoid(z2)
147 |     z3 = W2.dot(a2) + np.tile(b2, (m, 1)).transpose()
148 |     h = z3
149 |
150 |     # Sparsity
151 |     rho_hat = np.sum(a2, axis=1) / m
152 |     rho = np.tile(sparsity_param, hidden_size)
153 |
154 |
155 |     # Cost function
156 |     cost = np.sum((h - data) ** 2) / (2 * m) + \
157 |            (lambda_ / 2) * (np.sum(W1 ** 2) + np.sum(W2 ** 2)) + \
158 |            beta * np.sum(KL_divergence(rho, rho_hat))
159 |
160 |
161 |
162 |     # Backprop (delta3 has no sigmoid_prime factor because the output layer is linear)
163 |     sparsity_delta = np.tile(- rho / rho_hat + (1 - rho) / (1 - rho_hat), (m, 1)).transpose()
164 |
165 |     delta3 = -(data - h)
166 |     delta2 = (W2.transpose().dot(delta3) + beta * sparsity_delta) * sigmoid_prime(z2)
167 |     W1grad = delta2.dot(data.transpose()) / m + lambda_ * W1
168 |     W2grad = delta3.dot(a2.transpose()) / m + lambda_ * W2
169 |     b1grad = np.sum(delta2, axis=1) / m
170 |     b2grad = np.sum(delta3, axis=1) / m
171 |
172 |     # After computing the cost and gradient, we will convert the gradients back
173 |     # to a vector format (suitable for scipy.optimize.minimize). Specifically, we will
174 |     # unroll the gradient matrices into a vector.
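    # Note that the concatenation order below must mirror the slicing at the
    # top of this function: L-BFGS pairs gradient entries with parameters
    # purely by their position in the flat vector.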
175 |     grad = np.concatenate((W1grad.reshape(hidden_size * visible_size),
176 |                            W2grad.reshape(hidden_size * visible_size),
177 |                            b1grad.reshape(hidden_size),
178 |                            b2grad.reshape(visible_size)))
179 |
180 |     return cost, grad
181 |
182 |
--------------------------------------------------------------------------------
/stacked_ae_exercise.py:
--------------------------------------------------------------------------------
1 | import load_MNIST
2 | import sparse_autoencoder
3 | import scipy.optimize
4 | import softmax
5 | import stacked_autoencoder
6 | import numpy as np
7 |
8 | ##======================================================================
9 | ## STEP 0: Here we provide the relevant parameter values that will
10 | # allow your sparse autoencoder to get good filters; you do not need to
11 | # change the parameters below.
12 |
13 | input_size = 28 * 28
14 | num_classes = 10
15 | hidden_size_L1 = 200    # Layer 1 Hidden Size
16 | hidden_size_L2 = 200    # Layer 2 Hidden Size
17 | sparsity_param = 0.1    # desired average activation of the hidden units.
18 | lambda_ = 3e-3          # weight decay parameter
19 | beta = 3                # weight of sparsity penalty term
20 |
21 | ##======================================================================
22 | ## STEP 1: Load data from the MNIST database
23 | #
24 | # This loads our training data from the MNIST database files.
25 |
26 | train_images = load_MNIST.load_MNIST_images('data/mnist/train-images-idx3-ubyte')
27 | train_labels = load_MNIST.load_MNIST_labels('data/mnist/train-labels-idx1-ubyte')
28 |
29 |
30 | ##======================================================================
31 | ## STEP 2: Train the first sparse autoencoder
32 | # This trains the first sparse autoencoder on the MNIST training
33 | # images.
34 | # If you've correctly implemented sparse_autoencoder_cost, you don't need
35 | # to change anything here.
36 |
37 | # Randomly initialize the parameters
38 | sae1_theta = sparse_autoencoder.initialize(hidden_size_L1, input_size)
39 |
40 | J = lambda x: sparse_autoencoder.sparse_autoencoder_cost(x, input_size, hidden_size_L1,
41 |                                                          lambda_, sparsity_param,
42 |                                                          beta, train_images)
43 | options_ = {'maxiter': 400, 'disp': True}
44 |
45 | result = scipy.optimize.minimize(J, sae1_theta, method='L-BFGS-B', jac=True, options=options_)
46 | sae1_opt_theta = result.x
47 |
48 | print result
49 |
50 | ##======================================================================
51 | ## STEP 3: Train the second sparse autoencoder
52 | # This trains the second sparse autoencoder on the first autoencoder's
53 | # features.
54 | # If you've correctly implemented sparse_autoencoder_cost, you don't need
55 | # to change anything here.
56 |
57 | sae1_features = sparse_autoencoder.sparse_autoencoder(sae1_opt_theta, hidden_size_L1,
58 |                                                       input_size, train_images)
59 |
60 | # Randomly initialize the parameters
61 | sae2_theta = sparse_autoencoder.initialize(hidden_size_L2, hidden_size_L1)
62 |
63 | J = lambda x: sparse_autoencoder.sparse_autoencoder_cost(x, hidden_size_L1, hidden_size_L2,
64 |                                                          lambda_, sparsity_param,
65 |                                                          beta, sae1_features)
66 |
67 | options_ = {'maxiter': 400, 'disp': True}
68 |
69 | result = scipy.optimize.minimize(J, sae2_theta, method='L-BFGS-B', jac=True, options=options_)
70 | sae2_opt_theta = result.x
71 |
72 | print result
73 |
74 |
75 | ##======================================================================
76 | ## STEP 4: Train the softmax classifier
77 | # This trains the softmax classifier on the second autoencoder's features.
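# (sae2_features has shape (hidden_size_L2, m), here 200 x 60000 for the full
# MNIST training set, and plays the role of the softmax input data.)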
78 | # If you've correctly implemented softmax.softmax_cost, you don't need
79 | # to change anything here.
80 |
81 | sae2_features = sparse_autoencoder.sparse_autoencoder(sae2_opt_theta, hidden_size_L2,
82 |                                                       hidden_size_L1, sae1_features)
83 |
84 | options_ = {'maxiter': 400, 'disp': True}
85 |
86 | softmax_theta, softmax_input_size, softmax_num_classes = softmax.softmax_train(hidden_size_L2, num_classes,
87 |                                                                                lambda_, sae2_features,
88 |                                                                                train_labels, options_)
89 |
90 | ##======================================================================
91 | ## STEP 5: Finetune softmax model
92 |
93 | # Implement stacked_autoencoder_cost to give the combined cost of the whole model,
94 | # then run this step.
95 |
96 |
97 | # Initialize the stack using the parameters learned
98 | stack = [dict() for i in range(2)]
99 | stack[0]['w'] = sae1_opt_theta[0:hidden_size_L1 * input_size].reshape(hidden_size_L1, input_size)
100 | stack[0]['b'] = sae1_opt_theta[2 * hidden_size_L1 * input_size:2 * hidden_size_L1 * input_size + hidden_size_L1]
101 | stack[1]['w'] = sae2_opt_theta[0:hidden_size_L1 * hidden_size_L2].reshape(hidden_size_L2, hidden_size_L1)
102 | stack[1]['b'] = sae2_opt_theta[2 * hidden_size_L1 * hidden_size_L2:2 * hidden_size_L1 * hidden_size_L2 + hidden_size_L2]
103 |
104 | # Initialize the parameters for the deep model
105 | (stack_params, net_config) = stacked_autoencoder.stack2params(stack)
106 |
107 | stacked_autoencoder_theta = np.concatenate((softmax_theta.flatten(), stack_params))
108 |
109 | J = lambda x: stacked_autoencoder.stacked_autoencoder_cost(x, input_size, hidden_size_L2,
110 |                                                            num_classes, net_config, lambda_,
111 |                                                            train_images, train_labels)
112 |
113 | options_ = {'maxiter': 400, 'disp': True}
114 | result = scipy.optimize.minimize(J, stacked_autoencoder_theta, method='L-BFGS-B', jac=True, options=options_)
115 | stacked_autoencoder_opt_theta = result.x
116 |
117 | print result
118 |
119 | ##======================================================================
120 | ## STEP 6: Test
121 |
122 | test_images = load_MNIST.load_MNIST_images('data/mnist/t10k-images.idx3-ubyte')
123 | test_labels = load_MNIST.load_MNIST_labels('data/mnist/t10k-labels.idx1-ubyte')
124 |
125 |
126 | # Two autoencoders without fine-tuning
127 | pred = stacked_autoencoder.stacked_autoencoder_predict(stacked_autoencoder_theta, input_size, hidden_size_L2,
128 |                                                        num_classes, net_config, test_images)
129 |
130 | print "Before fine-tuning accuracy: {0:.2f}%".format(100 * np.sum(pred == test_labels, dtype=np.float64) /
131 |                                                      test_labels.shape[0])
132 |
133 | # Two autoencoders with fine-tuning
134 | pred = stacked_autoencoder.stacked_autoencoder_predict(stacked_autoencoder_opt_theta, input_size, hidden_size_L2,
135 |                                                        num_classes, net_config, test_images)
136 |
137 | print "After fine-tuning accuracy: {0:.2f}%".format(100 * np.sum(pred == test_labels, dtype=np.float64) /
138 |                                                     test_labels.shape[0])
139 |
--------------------------------------------------------------------------------
/stacked_autoencoder.py:
--------------------------------------------------------------------------------
1 | import numpy as np
2 | import scipy.sparse
3 | import softmax
4 |
5 |
6 | def sigmoid(x):
7 |     return 1 / (1 + np.exp(-x))
8 |
9 |
10 | def sigmoid_prime(x):
11 |     return sigmoid(x) * (1 - sigmoid(x))
12 |
13 |
14 | def stack2params(stack):
15 |     """
16 |     Converts a "stack" structure into a flattened parameter vector and also
17 |     stores the network configuration. This is useful when working with
18 |     optimizers such as scipy.optimize.minimize.
19 |
20 |     params, net_config = stack2params(stack)
21 |
22 |     stack - the stack structure, where stack[0]['w'] = weights of first layer
23 |             stack[0]['b'] = biases of first layer
24 |             stack[1]['w'] = weights of second layer
25 |             stack[1]['b'] = biases of second layer
26 |             ... etc.
27 |
28 |     :param stack: the stack structure
29 |     :return: params: flattened parameter vector
30 |     :return: net_config: aux. variable with network structure
31 |     """
32 |
33 |     params = []
34 |     for s in stack:
35 |         params.append(s['w'].flatten())
36 |         params.append(s['b'].flatten())
37 |     params = np.concatenate(params)
38 |
39 |     net_config = {}
40 |     if len(stack) == 0:
41 |         net_config['input_size'] = 0
42 |         net_config['layer_sizes'] = []
43 |     else:
44 |         net_config['input_size'] = stack[0]['w'].shape[1]
45 |         net_config['layer_sizes'] = []
46 |         for s in stack:
47 |             net_config['layer_sizes'].append(s['w'].shape[0])
48 |
49 |     return params, net_config
50 |
51 |
52 | def params2stack(params, net_config):
53 |     """
54 |     Converts a flattened parameter vector into a nice "stack" structure
55 |     for us to work with. This is useful when you're building multilayer
56 |     networks.
57 |     stack = params2stack(params, net_config)
58 |
59 |     :param params: flattened parameter vector
60 |     :param net_config: aux. variable containing network config.
61 |     :return: stack structure (see above)
62 |
63 |     """
64 |     # Map the params (a vector) into a stack of weights
65 |     depth = len(net_config['layer_sizes'])
66 |     stack = [dict() for i in range(depth)]
67 |
68 |     prev_layer_size = net_config['input_size']
69 |     current_pos = 0
70 |
71 |     for i in range(depth):
72 |         # Extract weights
73 |         wlen = prev_layer_size * net_config['layer_sizes'][i]
74 |         stack[i]['w'] = params[current_pos:current_pos + wlen].reshape(net_config['layer_sizes'][i], prev_layer_size)
75 |         current_pos = current_pos + wlen
76 |
77 |         # Extract bias
78 |         blen = net_config['layer_sizes'][i]
79 |         stack[i]['b'] = params[current_pos:current_pos + blen]
80 |         current_pos = current_pos + blen
81 |
82 |         # Set previous layer size
83 |         prev_layer_size = net_config['layer_sizes'][i]
84 |
85 |     return stack
86 |
87 |
88 | def stacked_autoencoder_cost(theta, input_size, hidden_size, num_classes,
89 |                              net_config, lambda_, data, labels):
90 |     """
91 |     Takes a trained softmax_theta and a training data set with labels,
92 |     and returns cost and gradient using the stacked autoencoder model.
93 |     Used only for finetuning.
94 |
95 |     :param theta: trained weights from the autoencoder
96 |     :param input_size: the number of input units
97 |     :param hidden_size: the number of hidden units (at the layer before softmax)
98 |     :param num_classes: number of categories
99 |     :param net_config: network configuration of the stack
100 |     :param lambda_: weight regularization penalty
101 |     :param data: matrix containing data as columns. data[:, i-1] is the i-th example
102 |     :param labels: vector containing labels, labels[i-1] is the label for the i-th example
103 |     """
104 |
105 |     ## Unroll the theta parameter
106 |
107 |     # We first extract the softmax parameters
108 |     softmax_theta = theta[0:hidden_size * num_classes].reshape(num_classes, hidden_size)
109 |
110 |     # Extract out the "stack"
111 |     stack = params2stack(theta[hidden_size * num_classes:], net_config)
112 |
113 |     m = data.shape[1]
114 |
115 |     # Forward propagation
116 |     a = [data]
117 |     z = [np.array(0)]  # Dummy value
118 |
119 |     for s in stack:
120 |         z.append(s['w'].dot(a[-1]) + np.tile(s['b'], (m, 1)).transpose())
121 |         a.append(sigmoid(z[-1]))
122 |
123 |     # Softmax
124 |     prod = softmax_theta.dot(a[-1])
125 |     prod = prod - np.max(prod)
126 |     prob = np.exp(prod) / np.sum(np.exp(prod), axis=0)
127 |     indicator = scipy.sparse.csr_matrix((np.ones(m), (labels, np.array(range(m)))))
128 |     indicator = np.array(indicator.todense())
129 |
130 |     cost = (-1 / float(m)) * np.sum(indicator * np.log(prob)) + (lambda_ / 2) * np.sum(softmax_theta * softmax_theta)
131 |     softmax_grad = (-1 / float(m)) * (indicator - prob).dot(a[-1].transpose()) + lambda_ * softmax_theta
132 |
133 |     # Backprop
134 |     # Compute partial of cost (J) w.r.t. the outputs of the last layer (before softmax)
135 |     softmax_grad_a = softmax_theta.transpose().dot(indicator - prob)
136 |
137 |     # Compute deltas
138 |     delta = [-softmax_grad_a * sigmoid_prime(z[-1])]
139 |     for i in reversed(range(len(stack))):
140 |         d = stack[i]['w'].transpose().dot(delta[0]) * sigmoid_prime(z[i])
141 |         delta.insert(0, d)
142 |
143 |     # Compute gradients
144 |     stack_grad = [dict() for i in range(len(stack))]
145 |     for i in range(len(stack_grad)):
146 |         stack_grad[i]['w'] = delta[i + 1].dot(a[i].transpose()) / m
147 |         stack_grad[i]['b'] = np.sum(delta[i + 1], axis=1) / m
148 |
149 |     grad_params, net_config = stack2params(stack_grad)
150 |     grad = np.concatenate((softmax_grad.flatten(), grad_params))
151 |
152 |     return cost, grad
153 |
154 |
155 | def stacked_autoencoder_predict(theta, input_size, hidden_size, num_classes, net_config, data):
156 |     """
157 |     Takes a trained theta and a test data set,
158 |     and returns the predicted labels for each example.
159 |     :param theta: trained weights from the autoencoder
160 |     :param input_size: the number of input units
161 |     :param hidden_size: the number of hidden units at the layer before softmax
162 |     :param num_classes: the number of categories
163 |     :param net_config: network configuration of the stack
164 |     :param data: the matrix containing the training data as columns. data[:, i-1] is the i-th training example
165 |     :return:
166 |
167 |     Your code should produce the prediction vector
168 |     pred, where pred[i] is argmax_c P(y = c | x[i]).
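    A usage sketch, with the sizes used in stacked_ae_exercise.py (a
    784-200-200 stack plus a 10-way softmax):
        pred = stacked_autoencoder_predict(stacked_autoencoder_opt_theta,
                                           28 * 28, 200, 10, net_config,
                                           test_images)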
169 | """ 170 | 171 | ## Unroll theta parameter 172 | # We first extract the part which compute the softmax gradient 173 | softmax_theta = theta[0:hidden_size * num_classes].reshape(num_classes, hidden_size) 174 | 175 | # Extract out the "stack" 176 | stack = params2stack(theta[hidden_size * num_classes:], net_config) 177 | 178 | m = data.shape[1] 179 | 180 | # Compute predictions 181 | a = [data] 182 | z = [np.array(0)] # Dummy value 183 | 184 | # Sparse Autoencoder Computation 185 | for s in stack: 186 | z.append(s['w'].dot(a[-1]) + np.tile(s['b'], (m, 1)).transpose()) 187 | a.append(sigmoid(z[-1])) 188 | 189 | # Softmax 190 | pred = softmax.softmax_predict((softmax_theta, hidden_size, num_classes), a[-1]) 191 | 192 | return pred -------------------------------------------------------------------------------- /stl_exercise.py: -------------------------------------------------------------------------------- 1 | import load_MNIST 2 | import numpy as np 3 | import sparse_autoencoder 4 | import scipy.optimize 5 | import display_network 6 | import softmax 7 | 8 | ## ====================================================================== 9 | # STEP 0: Here we provide the relevant parameters values that will 10 | # allow your sparse autoencoder to get good filters; you do not need to 11 | # change the parameters below. 12 | 13 | input_size = 28 * 28 14 | num_labels = 5 15 | hidden_size = 196 16 | 17 | sparsity_param = 0.1 # desired average activation of the hidden units. 18 | lambda_ = 3e-3 # weight decay parameter 19 | beta = 3 # weight of sparsity penalty term 20 | 21 | ## ====================================================================== 22 | # STEP 1: Load data from the MNIST database 23 | # 24 | # This loads our training and test data from the MNIST database files. 25 | # We have sorted the data for you in this so that you will not have to 26 | # change it. 27 | 28 | images = load_MNIST.load_MNIST_images('data/mnist/train-images-idx3-ubyte') 29 | labels = load_MNIST.load_MNIST_labels('data/mnist/train-labels-idx1-ubyte') 30 | 31 | unlabeled_index = np.argwhere(labels >= 5).flatten() 32 | labeled_index = np.argwhere(labels < 5).flatten() 33 | 34 | num_train = round(labeled_index.shape[0] / 2) 35 | train_index = labeled_index[0:num_train] 36 | test_index = labeled_index[num_train:] 37 | 38 | unlabeled_data = images[:, unlabeled_index] 39 | 40 | train_data = images[:, train_index] 41 | train_labels = labels[train_index] 42 | 43 | test_data = images[:, test_index] 44 | test_labels = labels[test_index] 45 | 46 | print '# examples in unlabeled set: {0:d}\n'.format(unlabeled_data.shape[1]) 47 | print '# examples in supervised training set: {0:d}\n'.format(train_data.shape[1]) 48 | print '# examples in supervised testing set: {0:d}\n'.format(test_data.shape[1]) 49 | 50 | ## ====================================================================== 51 | # STEP 2: Train the sparse autoencoder 52 | # This trains the sparse autoencoder on the unlabeled training 53 | # images. 
54 |
55 | # Randomly initialize the parameters
56 | theta = sparse_autoencoder.initialize(hidden_size, input_size)
57 |
58 | J = lambda x: sparse_autoencoder.sparse_autoencoder_cost(x, input_size, hidden_size,
59 |                                                          lambda_, sparsity_param,
60 |                                                          beta, unlabeled_data)
61 |
62 | options_ = {'maxiter': 400, 'disp': True}
63 | result = scipy.optimize.minimize(J, theta, method='L-BFGS-B', jac=True, options=options_)
64 | opt_theta = result.x
65 |
66 | print result
67 |
68 | # Visualize the weights
69 | W1 = opt_theta[0:hidden_size * input_size].reshape(hidden_size, input_size).transpose()
70 | display_network.display_network(W1)
71 |
72 | ##======================================================================
73 | ## STEP 3: Extract Features from the Supervised Dataset
74 | #
75 | # You need to complete sparse_autoencoder.sparse_autoencoder (the feedforward
76 | # pass) so that the following commands will extract features from the data.
77 |
78 | train_features = sparse_autoencoder.sparse_autoencoder(opt_theta, hidden_size,
79 |                                                        input_size, train_data)
80 |
81 | test_features = sparse_autoencoder.sparse_autoencoder(opt_theta, hidden_size,
82 |                                                       input_size, test_data)
83 |
84 | ##======================================================================
85 | ## STEP 4: Train the softmax classifier
86 |
87 | lambda_ = 1e-4
88 | options_ = {'maxiter': 400, 'disp': True}
89 |
90 | opt_theta, input_size, num_classes = softmax.softmax_train(hidden_size, num_labels,
91 |                                                            lambda_, train_features,
92 |                                                            train_labels, options_)
93 |
94 | ##======================================================================
95 | ## STEP 5: Testing
96 |
97 | predictions = softmax.softmax_predict((opt_theta, input_size, num_classes), test_features)
98 | print "Accuracy: {0:.2f}%".format(100 * np.sum(predictions == test_labels, dtype=np.float64) / test_labels.shape[0])
99 |
--------------------------------------------------------------------------------
/train.py:
--------------------------------------------------------------------------------
1 | import numpy as np
2 | import scipy.optimize
3 | import sample_images
4 | import sparse_autoencoder
5 | import gradient
6 | import display_network
7 | import load_MNIST
8 |
9 |
10 | ##======================================================================
11 | ## STEP 0: Here we provide the relevant parameter values that will
12 | # allow your sparse autoencoder to get good filters; you do not need to
13 | # change the parameters below.
14 |
15 | # number of input units
16 | visible_size = 28 * 28
17 | # number of hidden units
18 | hidden_size = 196
19 |
20 | # desired average activation of the hidden units.
21 | # (This was denoted by the Greek letter rho, which looks like a lower-case "p",
22 | # in the lecture notes).
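# With sigmoid units, rho_hat in sparse_autoencoder.py is each hidden unit's
# mean activation over the training set; the KL penalty pushes it toward this
# target value.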
23 | sparsity_param = 0.1
24 | # weight decay parameter
25 | lambda_ = 3e-3
26 | # weight of sparsity penalty term
27 | beta = 3
28 | # debug
29 | debug = False
30 |
31 |
32 | ##======================================================================
33 | ## STEP 1: Implement sample_images
34 | #
35 | # After implementing sample_images, the display_network command should
36 | # display a random sample of 200 patches from the dataset
37 |
38 | # Loading Sample Images
39 | # patches = sample_images.sample_images()
40 |
41 | # Loading 10K images from MNIST database
42 | images = load_MNIST.load_MNIST_images('data/mnist/train-images-idx3-ubyte')
43 | patches = images[:, 0:10000]
44 |
45 | # Obtain random parameters theta
46 | theta = sparse_autoencoder.initialize(hidden_size, visible_size)
47 |
48 | ##======================================================================
49 | ## STEP 2: Implement sparse_autoencoder_cost
50 | #
51 | # You can implement all of the components (squared error cost, weight decay term,
52 | # sparsity penalty) in the cost function at once, but it may be easier to do
53 | # it step-by-step and run gradient checking (see STEP 3) after each step. We
54 | # suggest implementing the sparse_autoencoder_cost function using the following steps:
55 | #
56 | # (a) Implement forward propagation in your neural network, and implement the
57 | #     squared error term of the cost function. Implement backpropagation to
58 | #     compute the derivatives. Then (using lambda = beta = 0), run Gradient Checking
59 | #     to verify that the calculations corresponding to the squared error cost
60 | #     term are correct.
61 | #
62 | # (b) Add in the weight decay term (in both the cost function and the derivative
63 | #     calculations), then re-run Gradient Checking to verify correctness.
64 | #
65 | # (c) Add in the sparsity penalty term, then re-run Gradient Checking to
66 | #     verify correctness.
67 | #
68 | # Feel free to change the training settings when debugging your
69 | # code. (For example, reducing the training set size or
70 | # number of hidden units may make your code run faster; and setting beta
71 | # and/or lambda to zero may be helpful for debugging.) However, in your
72 | # final submission of the visualized weights, please use the parameters we
73 | # gave in Step 0 above.
74 |
75 | (cost, grad) = sparse_autoencoder.sparse_autoencoder_cost(theta, visible_size,
76 |                                                           hidden_size, lambda_,
77 |                                                           sparsity_param, beta, patches)
78 | print cost, grad
79 | ##======================================================================
80 | ## STEP 3: Gradient Checking
81 | #
82 | # Hint: If you are debugging your code, performing gradient checking on smaller models
83 | # and smaller training sets (e.g., using only 10 training examples and 1-2 hidden
84 | # units) may speed things up.
85 |
86 | # First, let's make sure your numerical gradient computation is correct for a
87 | # simple function. After you have implemented gradient.compute_gradient,
88 | # run the following:
89 |
90 |
91 | if debug:
92 |     gradient.check_gradient()
93 |
94 |     # Now we can use it to check your cost function and derivative calculations
95 |     # for the sparse autoencoder.
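    # (A caution: numerically checking all ~308,000 parameters of this
    # 784-196-784 model requires on the order of theta.size cost evaluations,
    # so the full-size check is very slow; the reduced model suggested in the
    # hint above is the practical choice.)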
96 |     # J is the cost function
97 |
98 |     J = lambda x: sparse_autoencoder.sparse_autoencoder_cost(x, visible_size, hidden_size,
99 |                                                              lambda_, sparsity_param,
100 |                                                              beta, patches)
101 |     num_grad = gradient.compute_gradient(J, theta)
102 |
103 |     # Use this to visually compare the gradients side by side
104 |     print num_grad, grad
105 |
106 |     # Compare numerically computed gradients with the ones obtained from backpropagation
107 |     diff = np.linalg.norm(num_grad - grad) / np.linalg.norm(num_grad + grad)
108 |     print diff
109 |     print "Norm of the difference between numerical and analytical gradients (should be < 1e-9)\n\n"
110 |
111 | ##======================================================================
112 | ## STEP 4: After verifying that your implementation of
113 | # sparse_autoencoder_cost is correct, you can start training your sparse
114 | # autoencoder with L-BFGS (via scipy.optimize.minimize).
115 |
116 | # Randomly initialize the parameters
117 | theta = sparse_autoencoder.initialize(hidden_size, visible_size)
118 |
119 | J = lambda x: sparse_autoencoder.sparse_autoencoder_cost(x, visible_size, hidden_size,
120 |                                                          lambda_, sparsity_param,
121 |                                                          beta, patches)
122 | options_ = {'maxiter': 400, 'disp': True}
123 | result = scipy.optimize.minimize(J, theta, method='L-BFGS-B', jac=True, options=options_)
124 | opt_theta = result.x
125 |
126 | print result
127 |
128 | ##======================================================================
129 | ## STEP 5: Visualization
130 |
131 | W1 = opt_theta[0:hidden_size * visible_size].reshape(hidden_size, visible_size).transpose()
132 | display_network.display_network(W1)
133 |
134 |
--------------------------------------------------------------------------------