├── LICENSE
├── README.md
├── cnn.py
├── cnn_exercise.py
├── display_network.py
├── gradient.py
├── linear_decoder_exercise.py
├── load_MNIST.py
├── load_images.py
├── output
│   ├── patches_raw.png
│   ├── patches_zca.png
│   ├── patches_zca_features.png
│   ├── pca.png
│   ├── pca_tilde.png
│   ├── pca_zcawhite.png
│   ├── raw_pca.png
│   ├── weights_sampledata.png
│   ├── weights_selftaughtlearning.png
│   └── weights_sparseAE.png
├── pca_gen.py
├── sample_images.py
├── softmax.py
├── softmax_exercise.py
├── sparse_autoencoder.py
├── stacked_ae_exercise.py
├── stl_exercise.py
└── train.py

/LICENSE:
--------------------------------------------------------------------------------
 1 | The MIT License (MIT)
 2 | 
 3 | Copyright (c) 2014 Jatin Shah
 4 | 
 5 | Permission is hereby granted, free of charge, to any person obtaining a copy
 6 | of this software and associated documentation files (the "Software"), to deal
 7 | in the Software without restriction, including without limitation the rights
 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
 9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 | 
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 | 
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
 1 | ## Stanford Unsupervised Feature Learning and Deep Learning Tutorial
 2 | 
 3 | Tutorial Website: http://ufldl.stanford.edu/wiki/index.php/UFLDL_Tutorial
 4 | 
 5 | ### Sparse Autoencoder
 6 | Sparse autoencoder vectorized implementation, learning/visualizing features on MNIST data
 7 | 
 8 | * [load_MNIST.py](load_MNIST.py): Load MNIST images
 9 | * [sample_images.py](sample_images.py): Load sample images for testing the sparse autoencoder
10 | * [gradient.py](gradient.py): Functions to compute & check cost and gradient
11 | * [display_network.py](display_network.py): Display visualized features
12 | * [sparse_autoencoder.py](sparse_autoencoder.py): Sparse autoencoder cost & gradient functions
13 | * [train.py](train.py): Train sparse autoencoder with MNIST data and visualize learnt features
14 | 
15 | ### Preprocessing: PCA & Whitening
16 | Implement PCA, PCA whitening & ZCA whitening
17 | 
18 | * [pca_gen.py](pca_gen.py)
19 | 
20 | ### Softmax Regression
21 | Classify MNIST digits via softmax regression (multinomial logistic regression)
22 | 
23 | * [softmax.py](softmax.py): Softmax regression cost & gradient functions
24 | * [softmax_exercise.py](softmax_exercise.py): Classify MNIST digits
25 | 
26 | ### Self-Taught Learning and Unsupervised Feature Learning
27 | Classify MNIST digits via the self-taught learning paradigm, i.e. learn features via a sparse autoencoder using digits 5-9 as unlabelled examples and train softmax regression on digits 0-4 as labelled examples
28 | 
29 | * [stl_exercise.py](stl_exercise.py): Classify MNIST digits via self-taught learning
30 | 
31 | ### Building Deep Networks for Classification (Stacked Sparse Autoencoder)
32 | Stacked sparse autoencoder for MNIST digit classification
33 | 
34 | * [stacked_autoencoder.py](stacked_autoencoder.py): Stacked autoencoder cost & gradient functions
35 | * [stacked_ae_exercise.py](stacked_ae_exercise.py): Classify MNIST digits
36 | 
37 | ### Linear Decoders with Autoencoders
38 | Learn features on 8x8 patches of 96x96 STL-10 color images via linear decoder (sparse autoencoder with linear activation function in output layer)
39 | 
40 | * [linear_decoder_exercise.py](linear_decoder_exercise.py)
41 | 
42 | ### Working with Large Images (Convolutional Neural Networks)
43 | Classify 64x64 STL-10 images using features learnt via linear decoder (previous section) and convolutional neural networks
44 | 
45 | * [cnn.py](cnn.py): Convolutional neural networks: convolve & pooling functions
46 | * [cnn_exercise.py](cnn_exercise.py): Classify STL-10 images
47 | 
--------------------------------------------------------------------------------
/cnn.py:
--------------------------------------------------------------------------------
 1 | import numpy as np
 2 | import scipy.signal
 3 | 
 4 | 
 5 | def sigmoid(x):
 6 |     return 1 / (1 + np.exp(-x))
 7 | 
 8 | 
 9 | def cnn_convolve(patch_dim, num_features, images, W, b, zca_white, patch_mean):
10 |     """
11 |     Returns the convolution of the features given by W and b with
12 |     the given images
13 |     :param patch_dim: patch (feature) dimension
14 |     :param num_features: number of features
15 |     :param images: large images to convolve with, matrix in the form
16 |                    images(r, c, channel, image number)
17 |     :param W: weights of the sparse autoencoder
18 |     :param b: bias of the sparse autoencoder
19 |     :param zca_white: ZCA whitening matrix
20 |     :param patch_mean: mean of the patches
21 |     :return: convolved_features(feature_num, image_num, image_row, image_col)
22 |     """
23 | 
24 |     num_images = images.shape[3]
25 |     image_dim = images.shape[0]
26 |     image_channels = images.shape[2]
27 | 
28 |     # Instructions:
29 |     #   Convolve every feature with every large image here to produce the
30 |     #   numFeatures x numImages x (imageDim - patchDim + 1) x (imageDim - patchDim + 1)
31 |     #   matrix convolvedFeatures, such that
32 |     #   convolvedFeatures(featureNum, imageNum, imageRow, imageCol) is the
33 |     #   value of the convolved featureNum feature for the imageNum image over
34 |     #   the region (imageRow, imageCol) to (imageRow + patchDim - 1, imageCol + patchDim - 1)
35 |     #
36 |     # Expected running times:
37 |     #   Convolving with 100 images should take less than 3 minutes
38 |     #   Convolving with 5000 images should take around an hour
39 |     #   (So to save time when testing, you should convolve with fewer images, as
40 |     #   described earlier)
41 | 
42 |     convolved_features = np.zeros(shape=(num_features, num_images, image_dim - patch_dim + 1,
43 |                                          image_dim - patch_dim + 1),
44 |                                   dtype=np.float64)
45 | 
46 |     WT = W.dot(zca_white)
47 |     bT = b - WT.dot(patch_mean)
48 | 
49 |     for i in range(num_images):
50 |         for j in range(num_features):
51 |             # convolution of image with feature matrix for each channel
52 |             convolved_image = np.zeros(shape=(image_dim - patch_dim + 1, image_dim - patch_dim + 1),
53 |                                        dtype=np.float64)
54 | 
55 |             for channel in range(image_channels):
56 |                 # Obtain the feature (patchDim x patchDim) needed during the convolution
57 |                 patch_size = patch_dim * patch_dim
58 |                 feature = WT[j, patch_size * channel:patch_size * (channel + 1)].reshape(patch_dim, patch_dim)
59 | 
60 |                 # Flip the feature matrix because of the definition of convolution, as explained later
61 |                 feature = np.flipud(np.fliplr(feature))
62 | 
63 |                 # Obtain the image
64 |                 im = images[:, :, channel, i]
65 | 
66 |                 # Convolve "feature" with "im", adding the result to convolved_image
67 |                 # (be sure to do a 'valid' convolution)
68 |                 convolved_image += scipy.signal.convolve2d(im, feature, mode='valid')
69 | 
70 |             # Add the (mean-corrected) bias unit, then apply the sigmoid
71 |             # function to get the hidden activation
72 |             convolved_image = sigmoid(convolved_image + bT[j])
73 | 
74 |             # The convolved feature is the sum of the convolved values over all channels
75 |             convolved_features[j, i, :, :] = convolved_image
76 | 
77 |     return convolved_features
78 | 
79 | 
80 | def cnn_pool(pool_dim, convolved_features):
81 |     """
82 |     Pools the given convolved features
83 | 
84 |     :param pool_dim: dimension of the pooling region
85 |     :param convolved_features: convolved features to pool (as given by cnn_convolve)
86 |                                convolved_features(feature_num, image_num, image_row, image_col)
87 |     :return: pooled_features: matrix of pooled features in the form
88 |              pooled_features(feature_num, image_num, pool_row, pool_col)
89 |     """
90 | 
91 |     num_images = convolved_features.shape[1]
92 |     num_features = convolved_features.shape[0]
93 |     convolved_dim = convolved_features.shape[2]
94 | 
95 |     assert convolved_dim % pool_dim == 0, "Convolved dimension is not an exact multiple of pool dimension"
96 | 
97 |     pool_size = convolved_dim / pool_dim
98 |     pooled_features = np.zeros(shape=(num_features, num_images, pool_size, pool_size),
99 |                                dtype=np.float64)
100 | 
101 |     for i in range(pool_size):
102 |         for j in range(pool_size):
103 |             pool = convolved_features[:, :, i * pool_dim:(i + 1) * pool_dim, j * pool_dim:(j + 1) * pool_dim]
104 |             pooled_features[:, :, i, j] = np.mean(np.mean(pool, 2), 2)
105 | 
106 |     return pooled_features
--------------------------------------------------------------------------------
/cnn_exercise.py:
--------------------------------------------------------------------------------
 1 | import cPickle as pickle
 2 | import display_network
 3 | import numpy as np
 4 | import scipy.io
 5 | import cnn
 6 | import sparse_autoencoder
 7 | import sys
 8 | import time
 9 | import datetime
10 | import softmax
11 | 
12 | ## CS294A/CS294W Convolutional Neural Networks Exercise
13 | 
14 | #  Instructions
15 | #  ------------
16 | #
17 | #  This file contains code that helps you get started on the
18 | #  convolutional neural networks exercise. In this exercise, you will only
19 | #  need to implement the cnn_convolve and cnn_pool functions in cnn.py.
20 | #  You will not need to modify this file.
21 | 
22 | ##======================================================================
23 | ## STEP 0: Initialization
24 | #  Here we initialize some parameters used for the exercise.
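#  (A quick sketch of the arithmetic behind the settings below, assuming
#  image_dim = 64, patch_dim = 8 and pool_dim = 19 as defined next: each
#  convolved feature map is (64 - 8 + 1) x (64 - 8 + 1) = 57 x 57, and
#  mean-pooling over disjoint 19 x 19 regions gives 57 / 19 = 3, i.e. a
#  3 x 3 pooled map per feature.)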
25 | 
26 | image_dim = 64          # image dimension
27 | image_channels = 3      # number of channels (rgb, so 3)
28 | 
29 | patch_dim = 8           # patch dimension
30 | num_patches = 50000     # number of patches
31 | 
32 | visible_size = patch_dim * patch_dim * image_channels  # number of input units
33 | output_size = visible_size                             # number of output units
34 | hidden_size = 400                                      # number of hidden units
35 | 
36 | epsilon = 0.1           # epsilon for ZCA whitening
37 | 
38 | pool_dim = 19           # dimension of pooling region
39 | 
40 | ##======================================================================
41 | ## STEP 1: Train a sparse autoencoder (with a linear decoder) to learn
42 | #  features from color patches. If you have completed the linear decoder
43 | #  exercise, use the features that you have obtained from that exercise,
44 | #  loading them into opt_theta. Recall that we have to keep around the
45 | #  parameters used in whitening (i.e., the ZCA whitening matrix and the
46 | #  patch mean)
47 | with open('stl10_features.pickle', 'rb') as f:
48 |     opt_theta = pickle.load(f)
49 |     zca_white = pickle.load(f)
50 |     patch_mean = pickle.load(f)
51 | 
52 | # Display and check to see that the features look good
53 | W = opt_theta[0:hidden_size * visible_size].reshape(hidden_size, visible_size)
54 | b = opt_theta[2 * hidden_size * visible_size:2 * hidden_size * visible_size + hidden_size]
55 | display_network.display_color_network(W.dot(zca_white).transpose(), 'zca_features_test.png')
56 | 
57 | 
58 | ##======================================================================
59 | ## STEP 2: Implement and test convolution and pooling
60 | #  In this step, you will implement convolution and pooling, and test them
61 | #  on a small part of the data set to ensure that you have implemented
62 | #  these two functions correctly. In the next step, you will actually
63 | #  convolve and pool the features with the STL-10 images.
64 | 
65 | ## STEP 2a: Implement convolution
66 | #  Implement convolution in the function cnn_convolve in cnn.py
67 | 
68 | #  Note that we have to preprocess the images in the exact same way
69 | #  we preprocessed the patches before we can obtain the feature activations.
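#  One way to see why cnn_convolve can work directly on raw images (a sketch
#  of the algebra; WT and bT are the names used inside cnn.cnn_convolve):
#  for a preprocessed patch x, the autoencoder computes
#      a = sigmoid(W * zca_white * (x - patch_mean) + b)
#        = sigmoid((W * zca_white) * x + (b - W * zca_white * patch_mean))
#  so the whitening and mean subtraction can be folded once into
#  WT = W.dot(zca_white) and bT = b - WT.dot(patch_mean), and WT is then
#  convolved with the unpreprocessed images.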
70 | 
71 | stl_train = scipy.io.loadmat('data/stlTrainSubset.mat')
72 | train_images = stl_train['trainImages']
73 | train_labels = stl_train['trainLabels']
74 | num_train_images = stl_train['numTrainImages'][0][0]
75 | 
76 | ## Use only the first 8 images for testing
77 | conv_images = train_images[:, :, :, 0:8]
78 | 
79 | convolved_features = cnn.cnn_convolve(patch_dim, hidden_size, conv_images,
80 |                                       W, b, zca_white, patch_mean)
81 | 
82 | ## STEP 2b: Checking your convolution
83 | #  To ensure that you have convolved the features correctly, we have
84 | #  provided some code to compare the results of your convolution with
85 | #  activations from the sparse autoencoder
86 | 
87 | # For 1000 random points
88 | for i in range(1000):
89 |     feature_num = np.random.randint(0, hidden_size)
90 |     image_num = np.random.randint(0, 8)
91 |     image_row = np.random.randint(0, image_dim - patch_dim + 1)
92 |     image_col = np.random.randint(0, image_dim - patch_dim + 1)
93 | 
94 |     patch = conv_images[image_row:image_row + patch_dim, image_col:image_col + patch_dim, :, image_num]
95 | 
96 |     patch = np.concatenate((patch[:, :, 0].flatten(), patch[:, :, 1].flatten(), patch[:, :, 2].flatten()))
97 |     patch = np.reshape(patch, (patch.size, 1))
98 |     patch = patch - np.tile(patch_mean, (patch.shape[1], 1)).transpose()
99 |     patch = zca_white.dot(patch)
100 | 
101 |     features = sparse_autoencoder.sparse_autoencoder(opt_theta, hidden_size, visible_size, patch)
102 | 
103 |     if abs(features[feature_num, 0] - convolved_features[feature_num, image_num, image_row, image_col]) > 1e-9:
104 |         print 'Convolved feature does not match activation from autoencoder'
105 |         print 'Feature Number    :', feature_num
106 |         print 'Image Number      :', image_num
107 |         print 'Image Row         :', image_row
108 |         print 'Image Column      :', image_col
109 |         print 'Convolved feature :', convolved_features[feature_num, image_num, image_row, image_col]
110 |         print 'Sparse AE feature :', features[feature_num, 0]
111 |         sys.exit("Convolved feature does not match activation from autoencoder. Exiting...")
112 | 
113 | print 'Congratulations! Your convolution code passed the test.'
114 | 
115 | ## STEP 2c: Implement pooling
116 | #  Implement pooling in the function cnn_pool in cnn.py
117 | 
118 | # NOTE: Implement cnn_pool first!
119 | 
120 | ## STEP 2d: Checking your pooling
121 | #  To ensure that you have implemented pooling correctly, we will use your
122 | #  pooling function to pool over a test matrix and check the results.
123 | test_matrix = np.arange(64).reshape(8, 8)
124 | expected_matrix = np.array([[np.mean(test_matrix[0:4, 0:4]), np.mean(test_matrix[0:4, 4:8])],
125 |                             [np.mean(test_matrix[4:8, 0:4]), np.mean(test_matrix[4:8, 4:8])]])
126 | 
127 | test_matrix = np.reshape(test_matrix, (1, 1, 8, 8))
128 | 
129 | pooled_features = cnn.cnn_pool(4, test_matrix)
130 | 
131 | if not (pooled_features == expected_matrix).all():
132 |     print "Pooling incorrect"
133 |     print "Expected matrix"
134 |     print expected_matrix
135 |     print "Got"
136 |     print pooled_features
137 |     sys.exit("Pooling code failed the test. Exiting...")
138 | print 'Congratulations! Your pooling code passed the test.'
139 | 
140 | ##======================================================================
141 | ## STEP 3: Convolve and pool with the dataset
142 | #  In this step, you will convolve each of the features you learned with
143 | #  the full large images to obtain the convolved features. You will then
144 | #  pool the convolved features to obtain the pooled features for
145 | #  classification.
146 | #
147 | #  Because the convolved features matrix is very large, we will do the
148 | #  convolution and pooling step_size (here 25) features at a time to avoid
149 | #  running out of memory. Reduce this number if necessary
150 | step_size = 25
151 | assert hidden_size % step_size == 0, "step_size should divide hidden_size"
152 | 
153 | stl_train = scipy.io.loadmat('data/stlTrainSubset.mat')
154 | train_images = stl_train['trainImages']
155 | train_labels = stl_train['trainLabels']
156 | num_train_images = stl_train['numTrainImages'][0][0]
157 | 
158 | stl_test = scipy.io.loadmat('data/stlTestSubset.mat')
159 | test_images = stl_test['testImages']
160 | test_labels = stl_test['testLabels']
161 | num_test_images = stl_test['numTestImages'][0][0]
162 | 
163 | pooled_features_train = np.zeros(shape=(hidden_size, num_train_images,
164 |                                         int((image_dim - patch_dim + 1) / pool_dim),
165 |                                         int((image_dim - patch_dim + 1) / pool_dim)),
166 |                                  dtype=np.float64)
167 | pooled_features_test = np.zeros(shape=(hidden_size, num_test_images,
168 |                                        int((image_dim - patch_dim + 1) / pool_dim),
169 |                                        int((image_dim - patch_dim + 1) / pool_dim)),
170 |                                 dtype=np.float64)
171 | 
172 | start_time = time.time()
173 | for conv_part in range(hidden_size / step_size):
174 |     features_start = conv_part * step_size
175 |     features_end = (conv_part + 1) * step_size
176 |     print "Step:", conv_part, "features", features_start, "to", features_end
177 | 
178 |     Wt = W[features_start:features_end, :]
179 |     bt = b[features_start:features_end]
180 | 
181 |     print "Convolving & pooling train images"
182 |     convolved_features = cnn.cnn_convolve(patch_dim, step_size, train_images,
183 |                                           Wt, bt, zca_white, patch_mean)
184 |     pooled_features = cnn.cnn_pool(pool_dim, convolved_features)
185 |     pooled_features_train[features_start:features_end, :, :, :] = pooled_features
186 | 
187 |     print "Time elapsed:", str(datetime.timedelta(seconds=time.time() - start_time))
188 | 
189 |     print "Convolving and pooling test images"
190 |     convolved_features = cnn.cnn_convolve(patch_dim, step_size, test_images,
191 |                                           Wt, bt, zca_white, patch_mean)
192 |     pooled_features = cnn.cnn_pool(pool_dim, convolved_features)
193 |     pooled_features_test[features_start:features_end, :, :, :] = pooled_features
194 | 
195 |     print "Time elapsed:", str(datetime.timedelta(seconds=time.time() - start_time))
196 | 
197 | print('Saving pooled features...')
198 | with open('cnn_pooled_features.pickle', 'wb') as f:
199 |     pickle.dump(pooled_features_train, f)
200 |     pickle.dump(pooled_features_test, f)
201 | 
202 | print "Saved"
203 | print "Time elapsed:", str(datetime.timedelta(seconds=time.time() - start_time))
204 | 
205 | ##======================================================================
206 | ## STEP 4: Use pooled features for classification
207 | #  Now, you will use your pooled features to train a softmax classifier,
208 | #  using softmax_train from the softmax exercise.
209 | #  Training the softmax classifier for 1000 iterations should take less than
210 | #  10 minutes.
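#  (Rough dimension check, assuming hidden_size = 400 and the 3 x 3 pooled
#  maps computed above: each image is summarized by 400 * 3 * 3 = 3600 pooled
#  values, so the softmax classifier below is trained on a matrix of shape
#  (3600, num_train_images).)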
211 | 
212 | # Load pooled features
213 | with open('cnn_pooled_features.pickle', 'rb') as f:
214 |     pooled_features_train = pickle.load(f)
215 |     pooled_features_test = pickle.load(f)
216 | 
217 | # Setup parameters for softmax
218 | softmax_lambda = 1e-4
219 | num_classes = 4
220 | 
221 | # Reshape the pooled_features to form an input vector for softmax
222 | softmax_images = np.transpose(pooled_features_train, axes=[0, 2, 3, 1])
223 | softmax_images = softmax_images.reshape((softmax_images.size / num_train_images, num_train_images))
224 | softmax_labels = train_labels.flatten() - 1  # Ensure that labels are from 0..n-1 (for n classes)
225 | 
226 | options_ = {'maxiter': 1000, 'disp': True}
227 | softmax_model = softmax.softmax_train(softmax_images.size / num_train_images, num_classes,
228 |                                       softmax_lambda, softmax_images, softmax_labels, options_)
229 | 
230 | (softmax_opt_theta, softmax_input_size, softmax_num_classes) = softmax_model
231 | 
232 | 
233 | ##======================================================================
234 | ## STEP 5: Test classifier
235 | #  Now you will test your trained classifier against the test images
236 | softmax_images = np.transpose(pooled_features_test, axes=[0, 2, 3, 1])
237 | softmax_images = softmax_images.reshape((softmax_images.size / num_test_images, num_test_images))
238 | softmax_labels = test_labels.flatten() - 1
239 | 
240 | predictions = softmax.softmax_predict(softmax_model, softmax_images)
241 | print "Accuracy: {0:.2f}%".format(100 * np.sum(predictions == softmax_labels, dtype=np.float64) / test_labels.shape[0])
242 | 
243 | # You should expect to get an accuracy of around 80% on the test images.
--------------------------------------------------------------------------------
/display_network.py:
--------------------------------------------------------------------------------
 1 | import numpy as np
 2 | import matplotlib.pyplot as plt
 3 | import matplotlib
 4 | import PIL
 5 | 
 6 | 
 7 | # This function visualizes filters in matrix A. Each column of A is a
 8 | # filter. We will reshape each column into a square image and visualize it
 9 | # on a cell of the visualization panel.
10 | # All other parameters are optional, and usually you do not need to worry
11 | # about them.
12 | # opt_normalize: whether we need to normalize the filters so that all of
13 | # them can have similar contrast. Default value is true.
14 | # opt_graycolor: whether we use gray as the heat map. Default is true.
15 | # opt_colmajor: you can switch convention to row major for A. In that
16 | # case, each row of A is a filter. Default value is false.
17 | def display_network(A, filename='weights.png'): 18 | opt_normalize = True 19 | opt_graycolor = True 20 | 21 | # Rescale 22 | A = A - np.average(A) 23 | 24 | # Compute rows & cols 25 | (row, col) = A.shape 26 | sz = int(np.ceil(np.sqrt(row))) 27 | buf = 1 28 | n = np.ceil(np.sqrt(col)) 29 | m = np.ceil(col / n) 30 | 31 | image = np.ones(shape=(buf + m * (sz + buf), buf + n * (sz + buf))) 32 | 33 | if not opt_graycolor: 34 | image *= 0.1 35 | 36 | k = 0 37 | for i in range(int(m)): 38 | for j in range(int(n)): 39 | if k >= col: 40 | continue 41 | 42 | clim = np.max(np.abs(A[:, k])) 43 | 44 | if opt_normalize: 45 | image[buf + i * (sz + buf):buf + i * (sz + buf) + sz, buf + j * (sz + buf):buf + j * (sz + buf) + sz] = \ 46 | A[:, k].reshape(sz, sz) / clim 47 | else: 48 | image[buf + i * (sz + buf):buf + i * (sz + buf) + sz, buf + j * (sz + buf):buf + j * (sz + buf) + sz] = \ 49 | A[:, k].reshape(sz, sz) / np.max(np.abs(A)) 50 | k += 1 51 | 52 | plt.imsave(filename, image, cmap=matplotlib.cm.gray) 53 | 54 | 55 | def display_color_network(A, filename='weights.png'): 56 | """ 57 | # display receptive field(s) or basis vector(s) for image patches 58 | # 59 | # A the basis, with patches as column vectors 60 | 61 | # In case the midpoint is not set at 0, we shift it dynamically 62 | 63 | :param A: 64 | :param file: 65 | :return: 66 | """ 67 | if np.min(A) >= 0: 68 | A = A - np.mean(A) 69 | 70 | cols = np.round(np.sqrt(A.shape[1])) 71 | 72 | channel_size = A.shape[0] / 3 73 | dim = np.sqrt(channel_size) 74 | dimp = dim + 1 75 | rows = np.ceil(A.shape[1] / cols) 76 | 77 | B = A[0:channel_size, :] 78 | C = A[channel_size:2 * channel_size, :] 79 | D = A[2 * channel_size:3 * channel_size, :] 80 | 81 | B = B / np.max(np.abs(B)) 82 | C = C / np.max(np.abs(C)) 83 | D = D / np.max(np.abs(D)) 84 | 85 | # Initialization of the image 86 | image = np.ones(shape=(dim * rows + rows - 1, dim * cols + cols - 1, 3)) 87 | 88 | for i in range(int(rows)): 89 | for j in range(int(cols)): 90 | # This sets the patch 91 | image[i * dimp:i * dimp + dim, j * dimp:j * dimp + dim, 0] = B[:, i * cols + j].reshape(dim, dim) 92 | image[i * dimp:i * dimp + dim, j * dimp:j * dimp + dim, 1] = C[:, i * cols + j].reshape(dim, dim) 93 | image[i * dimp:i * dimp + dim, j * dimp:j * dimp + dim, 2] = D[:, i * cols + j].reshape(dim, dim) 94 | 95 | image = (image + 1) / 2 96 | 97 | PIL.Image.fromarray(np.uint8(image * 255), 'RGB').save(filename) 98 | 99 | return 0 -------------------------------------------------------------------------------- /gradient.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | import stacked_autoencoder 3 | 4 | 5 | # this function accepts a 2D vector as input. 6 | # Its outputs are: 7 | # value: h(x1, x2) = x1^2 + 3*x1*x2 8 | # grad: A 2x1 vector that gives the partial derivatives of h with respect to x1 and x2 9 | # Note that when we pass @simpleQuadraticFunction(x) to computeNumericalGradients, we're assuming 10 | # that computeNumericalGradients will use only the first returned value of this function. 11 | def simple_quadratic_function(x): 12 | value = x[0] ** 2 + 3 * x[0] * x[1] 13 | 14 | grad = np.zeros(shape=2, dtype=np.float32) 15 | grad[0] = 2 * x[0] + 3 * x[1] 16 | grad[1] = 3 * x[0] 17 | 18 | return value, grad 19 | 20 | 21 | # theta: a vector of parameters 22 | # J: a function that outputs a real-number. Calling y = J(theta) will return the 23 | # function value at theta. 
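# For reference, compute_gradient approximates each partial derivative with a
# central difference:
#     dJ/dtheta_i ~= (J(theta + epsilon * e_i) - J(theta - epsilon * e_i)) / (2 * epsilon)
# where e_i is the i-th standard basis vector and epsilon is a small constant.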
24 | def compute_gradient(J, theta): 25 | epsilon = 0.0001 26 | 27 | gradient = np.zeros(theta.shape) 28 | for i in range(theta.shape[0]): 29 | theta_epsilon_plus = np.array(theta, dtype=np.float64) 30 | theta_epsilon_plus[i] = theta[i] + epsilon 31 | theta_epsilon_minus = np.array(theta, dtype=np.float64) 32 | theta_epsilon_minus[i] = theta[i] - epsilon 33 | 34 | gradient[i] = (J(theta_epsilon_plus)[0] - J(theta_epsilon_minus)[0]) / (2 * epsilon) 35 | if i % 100 == 0: 36 | print "Computing gradient for input:", i 37 | 38 | return gradient 39 | 40 | 41 | # This code can be used to check your numerical gradient implementation 42 | # in computeNumericalGradient.m 43 | # It analytically evaluates the gradient of a very simple function called 44 | # simpleQuadraticFunction (see below) and compares the result with your numerical 45 | # solution. Your numerical gradient implementation is incorrect if 46 | # your numerical solution deviates too much from the analytical solution. 47 | def check_gradient(): 48 | x = np.array([4, 10], dtype=np.float64) 49 | (value, grad) = simple_quadratic_function(x) 50 | 51 | num_grad = compute_gradient(simple_quadratic_function, x) 52 | print num_grad, grad 53 | print "The above two columns you get should be very similar.\n" \ 54 | "(Left-Your Numerical Gradient, Right-Analytical Gradient)\n" 55 | 56 | diff = np.linalg.norm(num_grad - grad) / np.linalg.norm(num_grad + grad) 57 | print diff 58 | print "Norm of the difference between numerical and analytical num_grad (should be < 1e-9)\n" 59 | 60 | 61 | def check_stacked_autoencoder(): 62 | """ 63 | # Check the gradients for the stacked autoencoder 64 | # 65 | # In general, we recommend that the creation of such files for checking 66 | # gradients when you write new cost functions. 
67 | # 68 | 69 | :return: 70 | """ 71 | ## Setup random data / small model 72 | 73 | input_size = 64 74 | hidden_size_L1 = 36 75 | hidden_size_L2 = 25 76 | lambda_ = 0.01 77 | data = np.random.randn(input_size, 10) 78 | labels = np.random.randint(4, size=10) 79 | num_classes = 4 80 | 81 | stack = [dict() for i in range(2)] 82 | stack[0]['w'] = 0.1 * np.random.randn(hidden_size_L1, input_size) 83 | stack[0]['b'] = np.random.randn(hidden_size_L1) 84 | stack[1]['w'] = 0.1 * np.random.randn(hidden_size_L2, hidden_size_L1) 85 | stack[1]['b'] = np.random.randn(hidden_size_L2) 86 | softmax_theta = 0.005 * np.random.randn(hidden_size_L2 * num_classes) 87 | 88 | params, net_config = stacked_autoencoder.stack2params(stack) 89 | 90 | stacked_theta = np.concatenate((softmax_theta, params)) 91 | 92 | cost, grad = stacked_autoencoder.stacked_autoencoder_cost(stacked_theta, input_size, 93 | hidden_size_L2, num_classes, 94 | net_config, lambda_, data, labels) 95 | 96 | # Check that the numerical and analytic gradients are the same 97 | J = lambda x: stacked_autoencoder.stacked_autoencoder_cost(x, input_size, hidden_size_L2, 98 | num_classes, net_config, lambda_, 99 | data, labels) 100 | num_grad = compute_gradient(J, stacked_theta) 101 | 102 | print num_grad, grad 103 | print "The above two columns you get should be very similar.\n" \ 104 | "(Left-Your Numerical Gradient, Right-Analytical Gradient)\n" 105 | 106 | diff = np.linalg.norm(num_grad - grad) / np.linalg.norm(num_grad + grad) 107 | print diff 108 | print "Norm of the difference between numerical and analytical num_grad (should be < 1e-9)\n" -------------------------------------------------------------------------------- /linear_decoder_exercise.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | import sparse_autoencoder 3 | import gradient 4 | import scipy.io 5 | import display_network 6 | import scipy.optimize 7 | import cPickle 8 | 9 | ##====================================================================== 10 | ## STEP 0: Initialization 11 | # Here we initialize some parameters used for the exercise. 12 | 13 | image_channels = 3 # number of channels (rgb, so 3) 14 | 15 | patch_dim = 8 # patch dimension 16 | num_patches = 100000 # number of patches 17 | 18 | visible_size = patch_dim * patch_dim * image_channels # number of input units 19 | output_size = visible_size # number of output units 20 | hidden_size = 400 # number of hidden units 21 | 22 | sparsity_param = 0.035 # desired average activation of the hidden units. 23 | lambda_ = 3e-3 # weight decay parameter 24 | beta = 5 # weight of sparsity penalty term 25 | 26 | epsilon = 0.1 # epsilon for ZCA whitening 27 | 28 | ##====================================================================== 29 | ## STEP 1: Create and modify sparseAutoencoderLinearCost.m to use a linear decoder, 30 | # and check gradients 31 | # You should copy sparseAutoencoderCost.m from your earlier exercise 32 | # and rename it to sparseAutoencoderLinearCost.m. 33 | # Then you need to rename the function from sparseAutoencoderCost to 34 | # sparseAutoencoderLinearCost, and modify it so that the sparse autoencoder 35 | # uses a linear decoder instead. Once that is done, you should check 36 | # your gradients to verify that they are correct. 37 | 38 | # NOTE: Modify sparseAutoencoderCost first! 
39 | 
40 | # To speed up gradient checking, we will use a reduced network and some
41 | # dummy patches
42 | 
43 | debug_hidden_size = 5
44 | debug_visible_size = 8
45 | patches = np.random.rand(8, 10)
46 | 
47 | theta = sparse_autoencoder.initialize(debug_hidden_size, debug_visible_size)
48 | 
49 | cost, grad = sparse_autoencoder.sparse_autoencoder_linear_cost(theta, debug_visible_size, debug_hidden_size,
50 |                                                                lambda_, sparsity_param, beta, patches)
51 | 
52 | # Check gradients
53 | J = lambda x: sparse_autoencoder.sparse_autoencoder_linear_cost(x, debug_visible_size, debug_hidden_size,
54 |                                                                 lambda_, sparsity_param, beta, patches)
55 | num_grad = gradient.compute_gradient(J, theta)
56 | 
57 | print grad, num_grad
58 | 
59 | # Compare numerically computed gradients with the ones obtained from backpropagation
60 | diff = np.linalg.norm(num_grad - grad) / np.linalg.norm(num_grad + grad)
61 | print diff
62 | print "Norm of the difference between numerical and analytical num_grad (should be < 1e-9)\n\n"
63 | 
64 | ##======================================================================
65 | ## STEP 2: Learn features on small patches
66 | # In this step, you will use your sparse autoencoder (which now uses a
67 | # linear decoder) to learn features on small patches sampled from related
68 | # images.
69 | 
70 | ## STEP 2a: Load patches
71 | # In this step, we load 100k patches sampled from the STL-10 dataset and
72 | # visualize them. Note that these patches have been scaled to [0,1]
73 | 
74 | patches = scipy.io.loadmat('data/stlSampledPatches.mat')['patches']
75 | 
76 | display_network.display_color_network(patches[:, 0:100], filename='patches_raw.png')
77 | 
78 | 
79 | ## STEP 2b: Apply preprocessing
80 | # In this sub-step, we preprocess the sampled patches, in particular,
81 | # ZCA whitening them.
82 | #
83 | # In a later exercise on convolution and pooling, you will need to replicate
84 | # exactly the preprocessing steps you apply to these patches before
85 | # using the autoencoder to learn features on them. Hence, we will save the
86 | # ZCA whitening and mean image matrices together with the learned features
87 | # later on.
88 | 
89 | # Subtract mean patch (hence zeroing the mean of the patches)
90 | patch_mean = np.mean(patches, 1)
91 | patches = patches - np.tile(patch_mean, (patches.shape[1], 1)).transpose()
92 | 
93 | # Apply ZCA whitening
94 | sigma = patches.dot(patches.transpose()) / patches.shape[1]
95 | (u, s, v) = np.linalg.svd(sigma)
96 | zca_white = u.dot(np.diag(1 / np.sqrt(s + epsilon))).dot(u.transpose())
97 | patches_zca = zca_white.dot(patches)
98 | 
99 | display_network.display_color_network(patches_zca[:, 0:100], filename='patches_zca.png')
100 | 
101 | ## STEP 2c: Learn features
102 | # You will now use your sparse autoencoder (with linear decoder) to learn
103 | # features on the preprocessed patches. This should take around 45 minutes.
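# Optional sanity check (a sketch, not part of the original exercise): the
# covariance of the whitened patches works out to
# u.dot(np.diag(s / (s + epsilon))).dot(u.transpose()), which approaches the
# identity matrix as epsilon goes to 0, e.g.:
#
#     cov = patches_zca.dot(patches_zca.transpose()) / patches_zca.shape[1]
#     print np.abs(cov - u.dot(np.diag(s / (s + epsilon))).dot(u.transpose())).max()  # ~0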
104 | 
105 | theta = sparse_autoencoder.initialize(hidden_size, visible_size)
106 | 
107 | options_ = {'maxiter': 400, 'disp': True}
108 | 
109 | J = lambda x: sparse_autoencoder.sparse_autoencoder_linear_cost(x, visible_size, hidden_size,
110 |                                                                 lambda_, sparsity_param, beta, patches_zca)
111 | 
112 | result = scipy.optimize.minimize(J, theta, method='L-BFGS-B', jac=True, options=options_)
113 | opt_theta = result.x
114 | print result
115 | 
116 | # Save the learned features and the preprocessing matrices for use in
117 | # the later exercise on convolution and pooling
118 | print('Saving learned features and preprocessing matrices...')
119 | with open('stl10_features.pickle', 'wb') as f:
120 |     cPickle.dump(opt_theta, f)
121 |     cPickle.dump(zca_white, f)
122 |     cPickle.dump(patch_mean, f)
123 | print('Saved.')
124 | 
125 | ## STEP 2d: Visualize learned features
126 | W = opt_theta[0:hidden_size * visible_size].reshape(hidden_size, visible_size)
127 | b = opt_theta[2 * hidden_size * visible_size:2 * hidden_size * visible_size + hidden_size]
128 | display_network.display_color_network(W.dot(zca_white).transpose(), 'patches_zca_features.png')
--------------------------------------------------------------------------------
/load_MNIST.py:
--------------------------------------------------------------------------------
 1 | import numpy as np
 2 | 
 3 | 
 4 | def load_MNIST_images(filename):
 5 |     """
 6 |     returns a 28x28x[number of MNIST images] matrix containing
 7 |     the raw MNIST images
 8 |     :param filename: input data file
 9 |     """
10 |     with open(filename, 'rb') as f:
11 |         magic = np.fromfile(f, dtype=np.dtype('>i4'), count=1)
12 | 
13 |         num_images = np.fromfile(f, dtype=np.dtype('>i4'), count=1)
14 |         num_rows = np.fromfile(f, dtype=np.dtype('>i4'), count=1)
15 |         num_cols = np.fromfile(f, dtype=np.dtype('>i4'), count=1)
16 | 
17 |         images = np.fromfile(f, dtype=np.ubyte)
18 |         images = images.reshape((num_images, num_rows * num_cols)).transpose()
19 |         images = images.astype(np.float64) / 255
20 | 
21 | 
22 | 
23 |     return images
24 | 
25 | 
26 | def load_MNIST_labels(filename):
27 |     """
28 |     returns a [number of MNIST images]x1 matrix containing
29 |     the labels for the MNIST images
30 | 
31 |     :param filename: input file with labels
32 |     """
33 |     with open(filename, 'rb') as f:
34 |         magic = np.fromfile(f, dtype=np.dtype('>i4'), count=1)
35 | 
36 |         num_labels = np.fromfile(f, dtype=np.dtype('>i4'), count=1)
37 | 
38 |         labels = np.fromfile(f, dtype=np.ubyte)
39 | 
40 | 
41 | 
42 |     return labels
--------------------------------------------------------------------------------
/load_images.py:
--------------------------------------------------------------------------------
 1 | import cPickle
 2 | import numpy as np
 3 | 
 4 | 
 5 | def unpickle(file_name):
 6 |     fo = open(file_name, 'rb')
 7 |     image_dict = cPickle.load(fo)
 8 |     fo.close()
 9 |     return image_dict
10 | 
11 | 
12 | # Each column contains grayscale value for the image
13 | # Squash data to [0.1, 0.9]
14 | def normalize_data(images):
15 |     # Subtract mean of each image from its individual values
16 |     mean = images.mean(axis=0)
17 |     images = images - mean
18 | 
19 |     # Truncate to +/- 3 standard deviations and scale to -1 and +1
20 |     pstd = 3 * images.std()
21 |     images = np.maximum(np.minimum(images, pstd), -pstd) / pstd
22 | 
23 |     # Rescale from [-1,+1] to [0.1,0.9]
24 |     images = (1 + images) * 0.4 + 0.1
25 | 
26 |     return images
27 | 
28 | 
29 | # Convert RGB values to monochrome
30 | def monochrome(r, g, b):
31 |     return (0.2125 * r) + (0.7154 * g) + (0.0721 * b)
32 | 
33 | 
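# (The weights above are, to within rounding, the ITU-R BT.709 luma
# coefficients, and they sum to 1 so that intensities are preserved, e.g.
# monochrome(1.0, 1.0, 1.0) == 0.2125 + 0.7154 + 0.0721 == 1.0.)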
34 | # Returns 10000 gray scale images for training from CIFAR-10 data
35 | def load_images():
36 |     image_size = 32
37 |     num_images = 10000
38 |     image_file = 'data/cifar10/data_batch_1'
39 | 
40 |     # Load Images & select first num_images images
41 |     image_dict = unpickle(image_file)
42 |     image_data = image_dict['data'][0:num_images]
43 | 
44 |     # Convert to grayscale & normalize
45 |     red_data = image_data[:, 0:image_size * image_size]
46 |     green_data = image_data[:, image_size * image_size:2 * image_size * image_size]
47 |     blue_data = image_data[:, 2 * image_size * image_size:3 * image_size * image_size]
48 | 
49 |     grayscale_data = monochrome(red_data, green_data, blue_data)
50 |     grayscale_data = normalize_data(grayscale_data.transpose())
51 | 
52 |     return grayscale_data
--------------------------------------------------------------------------------
/output/patches_raw.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jatinshah/ufldl_tutorial/8e4a6724342dfd4cd3a8211f60b6ef9a4137e5ce/output/patches_raw.png
--------------------------------------------------------------------------------
/output/patches_zca.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jatinshah/ufldl_tutorial/8e4a6724342dfd4cd3a8211f60b6ef9a4137e5ce/output/patches_zca.png
--------------------------------------------------------------------------------
/output/patches_zca_features.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jatinshah/ufldl_tutorial/8e4a6724342dfd4cd3a8211f60b6ef9a4137e5ce/output/patches_zca_features.png
--------------------------------------------------------------------------------
/output/pca.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jatinshah/ufldl_tutorial/8e4a6724342dfd4cd3a8211f60b6ef9a4137e5ce/output/pca.png
--------------------------------------------------------------------------------
/output/pca_tilde.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jatinshah/ufldl_tutorial/8e4a6724342dfd4cd3a8211f60b6ef9a4137e5ce/output/pca_tilde.png
--------------------------------------------------------------------------------
/output/pca_zcawhite.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jatinshah/ufldl_tutorial/8e4a6724342dfd4cd3a8211f60b6ef9a4137e5ce/output/pca_zcawhite.png
--------------------------------------------------------------------------------
/output/raw_pca.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jatinshah/ufldl_tutorial/8e4a6724342dfd4cd3a8211f60b6ef9a4137e5ce/output/raw_pca.png
--------------------------------------------------------------------------------
/output/weights_sampledata.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jatinshah/ufldl_tutorial/8e4a6724342dfd4cd3a8211f60b6ef9a4137e5ce/output/weights_sampledata.png
--------------------------------------------------------------------------------
/output/weights_selftaughtlearning.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jatinshah/ufldl_tutorial/8e4a6724342dfd4cd3a8211f60b6ef9a4137e5ce/output/weights_selftaughtlearning.png -------------------------------------------------------------------------------- /output/weights_sparseAE.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jatinshah/ufldl_tutorial/8e4a6724342dfd4cd3a8211f60b6ef9a4137e5ce/output/weights_sparseAE.png -------------------------------------------------------------------------------- /pca_gen.py: -------------------------------------------------------------------------------- 1 | import sample_images 2 | import random 3 | import display_network 4 | import numpy as np 5 | 6 | 7 | ##================================================================ 8 | ## Step 0a: Load data 9 | # Here we provide the code to load natural image data into x. 10 | # x will be a 144 * 10000 matrix, where the kth column x(:, k) corresponds to 11 | # the raw image data from the kth 12x12 image patch sampled. 12 | # You do not need to change the code below. 13 | 14 | patches = sample_images.sample_images_raw() 15 | num_samples = patches.shape[1] 16 | random_sel = random.sample(range(num_samples), 400) 17 | display_network.display_network(patches[:, random_sel], 'raw_pca.png') 18 | 19 | ##================================================================ 20 | ## Step 0b: Zero-mean the data (by row) 21 | # You can make use of the mean and repmat/bsxfun functions. 22 | 23 | # patches = patches - patches.mean(axis=0) 24 | patch_mean = patches.mean(axis=1) 25 | patches = patches - np.tile(patch_mean, (patches.shape[1], 1)).transpose() 26 | 27 | ##================================================================ 28 | ## Step 1a: Implement PCA to obtain xRot 29 | # Implement PCA to obtain xRot, the matrix in which the data is expressed 30 | # with respect to the eigenbasis of sigma, which is the matrix U. 31 | 32 | sigma = patches.dot(patches.transpose()) / patches.shape[1] 33 | (u, s, v) = np.linalg.svd(sigma) 34 | 35 | patches_rot = u.transpose().dot(patches) 36 | 37 | ##================================================================ 38 | ## Step 2: Find k, the number of components to retain 39 | # Write code to determine k, the number of components to retain in order 40 | # to retain at least 99% of the variance. 41 | 42 | k = 0 43 | for k in range(s.shape[0]): 44 | if s[0:k].sum() / s.sum() >= 0.99: 45 | break 46 | print 'Optimal k to retain 99% variance is:', k 47 | 48 | ##================================================================ 49 | ## Step 3: Implement PCA with dimension reduction 50 | # Now that you have found k, you can reduce the dimension of the data by 51 | # discarding the remaining dimensions. In this way, you can represent the 52 | # data in k dimensions instead of the original 144, which will save you 53 | # computational time when running learning algorithms on the reduced 54 | # representation. 55 | # 56 | # Following the dimension reduction, invert the PCA transformation to produce 57 | # the matrix xHat, the dimension-reduced data with respect to the original basis. 58 | # Visualise the data and compare it to the raw data. You will observe that 59 | # there is little loss due to throwing away the principal components that 60 | # correspond to dimensions with low variation. 
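# In matrix form (a sketch of what the code below computes, with u the
# eigenbasis and k the number of retained components):
#     patches_tilde = u[:, 0:k]^T * patches      (k x m, reduced representation)
#     patches_hat   = u[:, 0:k] * patches_tilde  (reconstruction in the original basis)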
61 | 
62 | patches_tilde = u[:, 0:k].transpose().dot(patches)
63 | patches_hat = u[:, 0:k].dot(patches_tilde)
64 | 
65 | display_network.display_network(patches_hat[:, random_sel], 'pca_tilde.png')
66 | display_network.display_network(patches[:, random_sel], 'pca.png')
67 | 
68 | ##================================================================
69 | ## Step 4a: Implement PCA with whitening and regularisation
70 | # Implement PCA with whitening and regularisation to produce the matrix
71 | # xPCAWhite.
72 | 
73 | epsilon = 0.1
74 | patches_pcawhite = np.diag(1 / np.sqrt(s + epsilon)).dot(patches_rot)
75 | 
76 | 
77 | ##================================================================
78 | ## Step 5: Implement ZCA whitening
79 | # Now implement ZCA whitening to produce the matrix xZCAWhite.
80 | # Visualise the data and compare it to the raw data. You should observe
81 | # that whitening results in, among other things, enhanced edges.
82 | 
83 | patches_zcawhite = u.dot(patches_pcawhite)
84 | display_network.display_network(patches_zcawhite[:, random_sel], 'pca_zcawhite.png')
85 | 
--------------------------------------------------------------------------------
/sample_images.py:
--------------------------------------------------------------------------------
 1 | import random
 2 | 
 3 | import numpy as np
 4 | import scipy.io
 5 | 
 6 | 
 7 | # Returns 10000 image patches for training
 8 | # Each column contains grayscale value for the image
 9 | # Squash data to [0.1, 0.9]
10 | def normalize_data(images):
11 |     # Subtract mean of each image from its individual values
12 |     mean = images.mean(axis=0)
13 |     images = images - mean
14 | 
15 |     # Truncate to +/- 3 standard deviations and scale to -1 and +1
16 |     pstd = 3 * images.std()
17 |     images = np.maximum(np.minimum(images, pstd), -pstd) / pstd
18 | 
19 |     # Rescale from [-1,+1] to [0.1,0.9]
20 |     images = (1 + images) * 0.4 + 0.1
21 | 
22 |     return images
23 | 
24 | 
25 | # Returns 10000 patches for training
26 | # IMAGES is a 3D array containing 10 images
27 | # For instance, IMAGES(:,:,6) is a 512x512 array containing the 6th image,
28 | # (The contrast on these images looks a bit off because they have
29 | # been preprocessed using "whitening." See the lecture notes for
30 | # more details.) As a second example, IMAGES(21:30,21:30,1) is an image
31 | # patch corresponding to the pixels in the block (21,21) to (30,30) of
32 | # Image 1
33 | def sample_images():
34 |     patch_size = 8
35 |     num_patches = 10000
36 |     num_images = 10
37 |     image_size = 512
38 | 
39 |     image_data = scipy.io.loadmat('data/IMAGES.mat')['IMAGES']
40 | 
41 |     # Initialize patches with zeros.
42 |     patches = np.zeros(shape=(patch_size * patch_size, num_patches))
43 | 
44 |     for i in range(num_patches):
45 |         image_id = random.randint(0, num_images - 1)
46 |         image_x = random.randint(0, image_size - patch_size)
47 |         image_y = random.randint(0, image_size - patch_size)
48 | 
49 |         img = image_data[:, :, image_id]
50 |         patch = img[image_x:image_x + patch_size, image_y:image_y + patch_size].reshape(patch_size * patch_size)
51 |         patches[:, i] = patch
52 | 
53 |     return normalize_data(patches)
54 | 
55 | 
56 | # sampleIMAGESRAW
57 | # Returns 10000 "raw" unwhitened patches
58 | def sample_images_raw():
59 |     image_data = scipy.io.loadmat('data/IMAGES_RAW.mat')['IMAGESr']
60 | 
61 |     patch_size = 12
62 |     num_patches = 10000
63 |     num_images = image_data.shape[2]
64 |     image_size = image_data.shape[0]
65 | 
66 |     patches = np.zeros(shape=(patch_size * patch_size, num_patches))
67 | 
68 |     for i in range(num_patches):
69 |         image_id = random.randint(0, num_images - 1)
70 |         image_x = random.randint(0, image_size - patch_size)
71 |         image_y = random.randint(0, image_size - patch_size)
72 | 
73 |         img = image_data[:, :, image_id]
74 |         patch = img[image_x:image_x + patch_size, image_y:image_y + patch_size].reshape(patch_size * patch_size)
75 |         patches[:, i] = patch
76 | 
77 |     return patches
--------------------------------------------------------------------------------
/softmax.py:
--------------------------------------------------------------------------------
 1 | import numpy as np
 2 | import scipy.sparse
 3 | import scipy.optimize
 4 | 
 5 | 
 6 | def softmax_cost(theta, num_classes, input_size, lambda_, data, labels):
 7 |     """
 8 |     Compute the softmax cost function and its gradient
 9 |     :param theta: parameter vector
10 |     :param num_classes: the number of classes
11 |     :param input_size: the size N of the input vector
12 |     :param lambda_: weight decay parameter
13 |     :param data: the N x M input matrix, where each column corresponds to
14 |                  a single training example
15 |     :param labels: an M x 1 matrix containing the labels for the input data
16 |     """
17 |     m = data.shape[1]
18 |     theta = theta.reshape(num_classes, input_size)
19 |     theta_data = theta.dot(data)
20 |     theta_data = theta_data - np.max(theta_data)
21 |     prob_data = np.exp(theta_data) / np.sum(np.exp(theta_data), axis=0)
22 |     indicator = scipy.sparse.csr_matrix((np.ones(m), (labels, np.array(range(m)))))
23 |     indicator = np.array(indicator.todense())
24 |     cost = (-1.0 / m) * np.sum(indicator * np.log(prob_data)) + (lambda_ / 2) * np.sum(theta * theta)
25 | 
26 |     grad = (-1.0 / m) * (indicator - prob_data).dot(data.transpose()) + lambda_ * theta
27 | 
28 |     return cost, grad.flatten()
29 | 
30 | 
31 | def softmax_predict(model, data):
32 |     # model - model trained using softmax_train
33 |     # data  - the N x M input matrix, where each column data(:, i) corresponds to
34 |     #         a single test example
35 |     #
36 |     # Your code should produce the prediction matrix
37 |     # pred, where pred(i) is argmax_c P(y(c) | x(i)).
38 | 
39 |     opt_theta, input_size, num_classes = model
40 |     opt_theta = opt_theta.reshape(num_classes, input_size)
41 | 
42 |     prod = opt_theta.dot(data)
43 |     pred = np.exp(prod) / np.sum(np.exp(prod), axis=0)
44 |     pred = pred.argmax(axis=0)
45 | 
46 |     return pred
47 | 
48 | 
49 | def softmax_train(input_size, num_classes, lambda_, data, labels, options={'maxiter': 400, 'disp': True}):
50 |     # softmax_train: Train a softmax model with the given parameters on the given
51 |     # data. Returns opt_theta, a vector containing the trained parameters
52 |     # for the model.
53 | # 54 | # input_size: the size of an input vector x^(i) 55 | # num_classes: the number of classes 56 | # lambda_: weight decay parameter 57 | # input_data: an N by M matrix containing the input data, such that 58 | # inputData(:, c) is the cth input 59 | # labels: M by 1 matrix containing the class labels for the 60 | # corresponding inputs. labels(c) is the class label for 61 | # the cth input 62 | # options (optional): options 63 | # options.maxIter: number of iterations to train for 64 | 65 | # Initialize theta randomly 66 | theta = 0.005 * np.random.randn(num_classes * input_size) 67 | 68 | J = lambda x: softmax_cost(x, num_classes, input_size, lambda_, data, labels) 69 | 70 | result = scipy.optimize.minimize(J, theta, method='L-BFGS-B', jac=True, options=options) 71 | 72 | print result 73 | # Return optimum theta, input size & num classes 74 | opt_theta = result.x 75 | 76 | return opt_theta, input_size, num_classes 77 | 78 | -------------------------------------------------------------------------------- /softmax_exercise.py: -------------------------------------------------------------------------------- 1 | import load_MNIST 2 | import numpy as np 3 | import softmax 4 | import gradient 5 | 6 | ##====================================================================== 7 | ## STEP 0: Initialise constants and parameters 8 | # 9 | # Here we define and initialise some constants which allow your code 10 | # to be used more generally on any arbitrary input. 11 | # We also initialise some parameters used for tuning the model. 12 | 13 | # Size of input vector (MNIST images are 28x28) 14 | input_size = 28 * 28 15 | # Number of classes (MNIST images fall into 10 classes) 16 | num_classes = 10 17 | # Weight decay parameter 18 | lambda_ = 1e-4 19 | # Debug 20 | debug = False 21 | 22 | ##====================================================================== 23 | ## STEP 1: Load data 24 | # 25 | # In this section, we load the input and output data. 26 | # For softmax regression on MNIST pixels, 27 | # the input data is the images, and 28 | # the output data is the labels. 29 | # 30 | 31 | # Change the filenames if you've saved the files under different names 32 | # On some platforms, the files might be saved as 33 | # train-images.idx3-ubyte / train-labels.idx1-ubyte 34 | 35 | images = load_MNIST.load_MNIST_images('data/mnist/train-images-idx3-ubyte') 36 | labels = load_MNIST.load_MNIST_labels('data/mnist/train-labels-idx1-ubyte') 37 | 38 | if debug: 39 | input_size = 8 * 8 40 | input_data = np.random.randn(input_size, 100) 41 | labels = np.random.randint(num_classes, size=100) 42 | else: 43 | input_size = 28 * 28 44 | input_data = images 45 | 46 | # Randomly initialise theta 47 | theta = 0.005 * np.random.randn(num_classes * input_size) 48 | 49 | 50 | ##====================================================================== 51 | ## STEP 2: Implement softmaxCost 52 | # 53 | # Implement softmaxCost in softmaxCost.m. 54 | 55 | (cost, grad) = softmax.softmax_cost(theta, num_classes, input_size, lambda_, input_data, labels) 56 | 57 | ##====================================================================== 58 | ## STEP 3: Gradient checking 59 | # 60 | # As with any learning algorithm, you should always check that your 61 | # gradients are correct before learning the parameters. 
62 | # 63 | if debug: 64 | J = lambda x: softmax.softmax_cost(x, num_classes, input_size, lambda_, input_data, labels) 65 | 66 | num_grad = gradient.compute_gradient(J, theta) 67 | 68 | # Use this to visually compare the gradients side by side 69 | print num_grad, grad 70 | 71 | # Compare numerically computed gradients with the ones obtained from backpropagation 72 | diff = np.linalg.norm(num_grad - grad) / np.linalg.norm(num_grad + grad) 73 | print diff 74 | print "Norm of the difference between numerical and analytical num_grad (should be < 1e-7)\n\n" 75 | 76 | ##====================================================================== 77 | ## STEP 4: Learning parameters 78 | # 79 | # Once you have verified that your gradients are correct, 80 | # you can start training your softmax regression code using softmaxTrain 81 | # (which uses minFunc). 82 | 83 | options_ = {'maxiter': 100, 'disp': True} 84 | opt_theta, input_size, num_classes = softmax.softmax_train(input_size, num_classes, 85 | lambda_, input_data, labels, options_) 86 | 87 | ##====================================================================== 88 | ## STEP 5: Testing 89 | # 90 | # You should now test your model against the test images. 91 | # To do this, you will first need to write softmaxPredict 92 | # (in softmaxPredict.m), which should return predictions 93 | # given a softmax model and the input data. 94 | 95 | test_images = load_MNIST.load_MNIST_images('data/mnist/t10k-images.idx3-ubyte') 96 | test_labels = load_MNIST.load_MNIST_labels('data/mnist/t10k-labels.idx1-ubyte') 97 | predictions = softmax.softmax_predict((opt_theta, input_size, num_classes), test_images) 98 | print "Accuracy: {0:.2f}%".format(100 * np.sum(predictions == test_labels, dtype=np.float64) / test_labels.shape[0]) -------------------------------------------------------------------------------- /sparse_autoencoder.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | 3 | 4 | def sigmoid(x): 5 | return 1 / (1 + np.exp(-x)) 6 | 7 | 8 | def sigmoid_prime(x): 9 | return sigmoid(x) * (1 - sigmoid(x)) 10 | 11 | 12 | def KL_divergence(x, y): 13 | return x * np.log(x / y) + (1 - x) * np.log((1 - x) / (1 - y)) 14 | 15 | 16 | def initialize(hidden_size, visible_size): 17 | # we'll choose weights uniformly from the interval [-r, r] 18 | r = np.sqrt(6) / np.sqrt(hidden_size + visible_size + 1) 19 | W1 = np.random.random((hidden_size, visible_size)) * 2 * r - r 20 | W2 = np.random.random((visible_size, hidden_size)) * 2 * r - r 21 | 22 | b1 = np.zeros(hidden_size, dtype=np.float64) 23 | b2 = np.zeros(visible_size, dtype=np.float64) 24 | 25 | theta = np.concatenate((W1.reshape(hidden_size * visible_size), 26 | W2.reshape(hidden_size * visible_size), 27 | b1.reshape(hidden_size), 28 | b2.reshape(visible_size))) 29 | 30 | return theta 31 | 32 | 33 | # visible_size: the number of input units (probably 64) 34 | # hidden_size: the number of hidden units (probably 25) 35 | # lambda_: weight decay parameter 36 | # sparsity_param: The desired average activation for the hidden units (denoted in the lecture 37 | # notes by the greek alphabet rho, which looks like a lower-case "p"). 38 | # beta: weight of sparsity penalty term 39 | # data: Our 64x10000 matrix containing the training data. So, data(:,i) is the i-th training example. 40 | # 41 | # The input theta is a vector (because minFunc expects the parameters to be a vector). 
42 | # We first convert theta to the (W1, W2, b1, b2) matrix/vector format, so that this 43 | # follows the notation convention of the lecture notes. 44 | # Returns: (cost,gradient) tuple 45 | def sparse_autoencoder_cost(theta, visible_size, hidden_size, 46 | lambda_, sparsity_param, beta, data): 47 | # The input theta is a vector (because minFunc expects the parameters to be a vector). 48 | # We first convert theta to the (W1, W2, b1, b2) matrix/vector format, so that this 49 | # follows the notation convention of the lecture notes. 50 | 51 | W1 = theta[0:hidden_size * visible_size].reshape(hidden_size, visible_size) 52 | W2 = theta[hidden_size * visible_size:2 * hidden_size * visible_size].reshape(visible_size, hidden_size) 53 | b1 = theta[2 * hidden_size * visible_size:2 * hidden_size * visible_size + hidden_size] 54 | b2 = theta[2 * hidden_size * visible_size + hidden_size:] 55 | 56 | # Number of training examples 57 | m = data.shape[1] 58 | 59 | # Forward propagation 60 | z2 = W1.dot(data) + np.tile(b1, (m, 1)).transpose() 61 | a2 = sigmoid(z2) 62 | z3 = W2.dot(a2) + np.tile(b2, (m, 1)).transpose() 63 | h = sigmoid(z3) 64 | 65 | # Sparsity 66 | rho_hat = np.sum(a2, axis=1) / m 67 | rho = np.tile(sparsity_param, hidden_size) 68 | 69 | # Cost function 70 | cost = np.sum((h - data) ** 2) / (2 * m) + \ 71 | (lambda_ / 2) * (np.sum(W1 ** 2) + np.sum(W2 ** 2)) + \ 72 | beta * np.sum(KL_divergence(rho, rho_hat)) 73 | 74 | # Backprop 75 | sparsity_delta = np.tile(- rho / rho_hat + (1 - rho) / (1 - rho_hat), (m, 1)).transpose() 76 | 77 | delta3 = -(data - h) * sigmoid_prime(z3) 78 | delta2 = (W2.transpose().dot(delta3) + beta * sparsity_delta) * sigmoid_prime(z2) 79 | W1grad = delta2.dot(data.transpose()) / m + lambda_ * W1 80 | W2grad = delta3.dot(a2.transpose()) / m + lambda_ * W2 81 | b1grad = np.sum(delta2, axis=1) / m 82 | b2grad = np.sum(delta3, axis=1) / m 83 | 84 | # After computing the cost and gradient, we will convert the gradients back 85 | # to a vector format (suitable for minFunc). Specifically, we will unroll 86 | # your gradient matrices into a vector. 87 | grad = np.concatenate((W1grad.reshape(hidden_size * visible_size), 88 | W2grad.reshape(hidden_size * visible_size), 89 | b1grad.reshape(hidden_size), 90 | b2grad.reshape(visible_size))) 91 | 92 | return cost, grad 93 | 94 | 95 | def sparse_autoencoder(theta, hidden_size, visible_size, data): 96 | """ 97 | :param theta: trained weights from the autoencoder 98 | :param hidden_size: the number of hidden units (probably 25) 99 | :param visible_size: the number of input units (probably 64) 100 | :param data: Our matrix containing the training data as columns. So, data(:,i) is the i-th training example. 101 | """ 102 | 103 | # We first convert theta to the (W1, W2, b1, b2) matrix/vector format, so that this 104 | # follows the notation convention of the lecture notes. 
105 |     W1 = theta[0:hidden_size * visible_size].reshape(hidden_size, visible_size)
106 |     b1 = theta[2 * hidden_size * visible_size:2 * hidden_size * visible_size + hidden_size]
107 |
108 |     # Number of training examples
109 |     m = data.shape[1]
110 |
111 |     # Forward propagation
112 |     z2 = W1.dot(data) + np.tile(b1, (m, 1)).transpose()
113 |     a2 = sigmoid(z2)
114 |
115 |     return a2
116 |
117 |
118 | # visible_size: the number of input units (probably 64)
119 | # hidden_size: the number of hidden units (probably 25)
120 | # lambda_: weight decay parameter
121 | # sparsity_param: The desired average activation for the hidden units (denoted in the lecture
122 | #                 notes by the Greek letter rho, which looks like a lower-case "p").
123 | # beta: weight of sparsity penalty term
124 | # data: Our 64x10000 matrix containing the training data. So, data[:, i] is the i-th training example.
125 | #
126 | # The input theta is a vector (because scipy.optimize.minimize expects the parameters to be a vector).
127 | # We first convert theta to the (W1, W2, b1, b2) matrix/vector format, so that this
128 | # follows the notation convention of the lecture notes.
129 | # Returns: (cost, gradient) tuple
130 | def sparse_autoencoder_linear_cost(theta, visible_size, hidden_size,
131 |                                    lambda_, sparsity_param, beta, data):
132 |     # The input theta is a vector (because scipy.optimize.minimize expects the parameters to be a vector).
133 |     # We first convert theta to the (W1, W2, b1, b2) matrix/vector format, so that this
134 |     # follows the notation convention of the lecture notes.
135 |
136 |     W1 = theta[0:hidden_size * visible_size].reshape(hidden_size, visible_size)
137 |     W2 = theta[hidden_size * visible_size:2 * hidden_size * visible_size].reshape(visible_size, hidden_size)
138 |     b1 = theta[2 * hidden_size * visible_size:2 * hidden_size * visible_size + hidden_size]
139 |     b2 = theta[2 * hidden_size * visible_size + hidden_size:]
140 |
141 |     # Number of training examples
142 |     m = data.shape[1]
143 |
144 |     # Forward propagation (note the output layer is linear: h = z3)
145 |     z2 = W1.dot(data) + np.tile(b1, (m, 1)).transpose()
146 |     a2 = sigmoid(z2)
147 |     z3 = W2.dot(a2) + np.tile(b2, (m, 1)).transpose()
148 |     h = z3
149 |
150 |     # Sparsity
151 |     rho_hat = np.sum(a2, axis=1) / m
152 |     rho = np.tile(sparsity_param, hidden_size)
153 |
154 |
155 |     # Cost function
156 |     cost = np.sum((h - data) ** 2) / (2 * m) + \
157 |            (lambda_ / 2) * (np.sum(W1 ** 2) + np.sum(W2 ** 2)) + \
158 |            beta * np.sum(KL_divergence(rho, rho_hat))
159 |
160 |
161 |
162 |     # Backprop (delta3 has no sigmoid_prime factor because the output layer is linear)
163 |     sparsity_delta = np.tile(- rho / rho_hat + (1 - rho) / (1 - rho_hat), (m, 1)).transpose()
164 |
165 |     delta3 = -(data - h)
166 |     delta2 = (W2.transpose().dot(delta3) + beta * sparsity_delta) * sigmoid_prime(z2)
167 |     W1grad = delta2.dot(data.transpose()) / m + lambda_ * W1
168 |     W2grad = delta3.dot(a2.transpose()) / m + lambda_ * W2
169 |     b1grad = np.sum(delta2, axis=1) / m
170 |     b2grad = np.sum(delta3, axis=1) / m
171 |
172 |     # After computing the cost and gradient, we will convert the gradients back
173 |     # to a vector format (suitable for scipy.optimize.minimize). Specifically, we will
174 |     # unroll the gradient matrices into a vector.
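    # Note that the concatenation order below must mirror the slicing at the
    # top of this function: L-BFGS pairs gradient entries with parameters
    # purely by their position in the flat vector.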
175 |     grad = np.concatenate((W1grad.reshape(hidden_size * visible_size),
176 |                            W2grad.reshape(hidden_size * visible_size),
177 |                            b1grad.reshape(hidden_size),
178 |                            b2grad.reshape(visible_size)))
179 |
180 |     return cost, grad
181 |
182 |
--------------------------------------------------------------------------------
/stacked_ae_exercise.py:
--------------------------------------------------------------------------------
1 | import load_MNIST
2 | import sparse_autoencoder
3 | import scipy.optimize
4 | import softmax
5 | import stacked_autoencoder
6 | import numpy as np
7 |
8 | ##======================================================================
9 | ## STEP 0: Here we provide the relevant parameter values that will
10 | # allow your sparse autoencoder to get good filters; you do not need to
11 | # change the parameters below.
12 |
13 | input_size = 28 * 28
14 | num_classes = 10
15 | hidden_size_L1 = 200    # Layer 1 Hidden Size
16 | hidden_size_L2 = 200    # Layer 2 Hidden Size
17 | sparsity_param = 0.1    # desired average activation of the hidden units.
18 | lambda_ = 3e-3          # weight decay parameter
19 | beta = 3                # weight of sparsity penalty term
20 |
21 | ##======================================================================
22 | ## STEP 1: Load data from the MNIST database
23 | #
24 | # This loads our training data from the MNIST database files.
25 |
26 | train_images = load_MNIST.load_MNIST_images('data/mnist/train-images-idx3-ubyte')
27 | train_labels = load_MNIST.load_MNIST_labels('data/mnist/train-labels-idx1-ubyte')
28 |
29 |
30 | ##======================================================================
31 | ## STEP 2: Train the first sparse autoencoder
32 | # This trains the first sparse autoencoder on the MNIST training
33 | # images.
34 | # If you've correctly implemented sparse_autoencoder_cost, you don't need
35 | # to change anything here.
36 |
37 | # Randomly initialize the parameters
38 | sae1_theta = sparse_autoencoder.initialize(hidden_size_L1, input_size)
39 |
40 | J = lambda x: sparse_autoencoder.sparse_autoencoder_cost(x, input_size, hidden_size_L1,
41 |                                                          lambda_, sparsity_param,
42 |                                                          beta, train_images)
43 | options_ = {'maxiter': 400, 'disp': True}
44 |
45 | result = scipy.optimize.minimize(J, sae1_theta, method='L-BFGS-B', jac=True, options=options_)
46 | sae1_opt_theta = result.x
47 |
48 | print result
49 |
50 | ##======================================================================
51 | ## STEP 3: Train the second sparse autoencoder
52 | # This trains the second sparse autoencoder on the first autoencoder's
53 | # features.
54 | # If you've correctly implemented sparse_autoencoder_cost, you don't need
55 | # to change anything here.
56 |
57 | sae1_features = sparse_autoencoder.sparse_autoencoder(sae1_opt_theta, hidden_size_L1,
58 |                                                       input_size, train_images)
59 |
60 | # Randomly initialize the parameters
61 | sae2_theta = sparse_autoencoder.initialize(hidden_size_L2, hidden_size_L1)
62 |
63 | J = lambda x: sparse_autoencoder.sparse_autoencoder_cost(x, hidden_size_L1, hidden_size_L2,
64 |                                                          lambda_, sparsity_param,
65 |                                                          beta, sae1_features)
66 |
67 | options_ = {'maxiter': 400, 'disp': True}
68 |
69 | result = scipy.optimize.minimize(J, sae2_theta, method='L-BFGS-B', jac=True, options=options_)
70 | sae2_opt_theta = result.x
71 |
72 | print result
73 |
74 |
75 | ##======================================================================
76 | ## STEP 4: Train the softmax classifier
77 | # This trains the softmax classifier on the second autoencoder's features.
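# (sae2_features has shape (hidden_size_L2, m), here 200 x 60000 for the full
# MNIST training set, and plays the role of the softmax input data.)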
78 | # If you've correctly implemented softmax.softmax_cost, you don't need
79 | # to change anything here.
80 |
81 | sae2_features = sparse_autoencoder.sparse_autoencoder(sae2_opt_theta, hidden_size_L2,
82 |                                                       hidden_size_L1, sae1_features)
83 |
84 | options_ = {'maxiter': 400, 'disp': True}
85 |
86 | softmax_theta, softmax_input_size, softmax_num_classes = softmax.softmax_train(hidden_size_L2, num_classes,
87 |                                                                                lambda_, sae2_features,
88 |                                                                                train_labels, options_)
89 |
90 | ##======================================================================
91 | ## STEP 5: Finetune softmax model
92 |
93 | # Implement stacked_autoencoder_cost to give the combined cost of the whole model,
94 | # then run this step.
95 |
96 |
97 | # Initialize the stack using the parameters learned
98 | stack = [dict() for i in range(2)]
99 | stack[0]['w'] = sae1_opt_theta[0:hidden_size_L1 * input_size].reshape(hidden_size_L1, input_size)
100 | stack[0]['b'] = sae1_opt_theta[2 * hidden_size_L1 * input_size:2 * hidden_size_L1 * input_size + hidden_size_L1]
101 | stack[1]['w'] = sae2_opt_theta[0:hidden_size_L1 * hidden_size_L2].reshape(hidden_size_L2, hidden_size_L1)
102 | stack[1]['b'] = sae2_opt_theta[2 * hidden_size_L1 * hidden_size_L2:2 * hidden_size_L1 * hidden_size_L2 + hidden_size_L2]
103 |
104 | # Initialize the parameters for the deep model
105 | (stack_params, net_config) = stacked_autoencoder.stack2params(stack)
106 |
107 | stacked_autoencoder_theta = np.concatenate((softmax_theta.flatten(), stack_params))
108 |
109 | J = lambda x: stacked_autoencoder.stacked_autoencoder_cost(x, input_size, hidden_size_L2,
110 |                                                            num_classes, net_config, lambda_,
111 |                                                            train_images, train_labels)
112 |
113 | options_ = {'maxiter': 400, 'disp': True}
114 | result = scipy.optimize.minimize(J, stacked_autoencoder_theta, method='L-BFGS-B', jac=True, options=options_)
115 | stacked_autoencoder_opt_theta = result.x
116 |
117 | print result
118 |
119 | ##======================================================================
120 | ## STEP 6: Test
121 |
122 | test_images = load_MNIST.load_MNIST_images('data/mnist/t10k-images.idx3-ubyte')
123 | test_labels = load_MNIST.load_MNIST_labels('data/mnist/t10k-labels.idx1-ubyte')
124 |
125 |
126 | # Two autoencoders without fine-tuning
127 | pred = stacked_autoencoder.stacked_autoencoder_predict(stacked_autoencoder_theta, input_size, hidden_size_L2,
128 |                                                        num_classes, net_config, test_images)
129 |
130 | print "Before fine-tuning accuracy: {0:.2f}%".format(100 * np.sum(pred == test_labels, dtype=np.float64) /
131 |                                                      test_labels.shape[0])
132 |
133 | # Two autoencoders with fine-tuning
134 | pred = stacked_autoencoder.stacked_autoencoder_predict(stacked_autoencoder_opt_theta, input_size, hidden_size_L2,
135 |                                                        num_classes, net_config, test_images)
136 |
137 | print "After fine-tuning accuracy: {0:.2f}%".format(100 * np.sum(pred == test_labels, dtype=np.float64) /
138 |                                                     test_labels.shape[0])
139 |
--------------------------------------------------------------------------------
/stacked_autoencoder.py:
--------------------------------------------------------------------------------
1 | import numpy as np
2 | import scipy.sparse
3 | import softmax
4 |
5 |
6 | def sigmoid(x):
7 |     return 1 / (1 + np.exp(-x))
8 |
9 |
10 | def sigmoid_prime(x):
11 |     return sigmoid(x) * (1 - sigmoid(x))
12 |
13 |
14 | def stack2params(stack):
15 |     """
16 |     Converts a "stack" structure into a flattened parameter vector and also
17 |     stores the network configuration. This is useful when working with
18 |     optimizers such as scipy.optimize.minimize.
19 |
20 |     params, net_config = stack2params(stack)
21 |
22 |     stack - the stack structure, where stack[0]['w'] = weights of first layer
23 |             stack[0]['b'] = biases of first layer
24 |             stack[1]['w'] = weights of second layer
25 |             stack[1]['b'] = biases of second layer
26 |             ... etc.
27 |
28 |     :param stack: the stack structure
29 |     :return: params: flattened parameter vector
30 |     :return: net_config: aux. variable with network structure
31 |     """
32 |
33 |     params = []
34 |     for s in stack:
35 |         params.append(s['w'].flatten())
36 |         params.append(s['b'].flatten())
37 |     params = np.concatenate(params)
38 |
39 |     net_config = {}
40 |     if len(stack) == 0:
41 |         net_config['input_size'] = 0
42 |         net_config['layer_sizes'] = []
43 |     else:
44 |         net_config['input_size'] = stack[0]['w'].shape[1]
45 |         net_config['layer_sizes'] = []
46 |         for s in stack:
47 |             net_config['layer_sizes'].append(s['w'].shape[0])
48 |
49 |     return params, net_config
50 |
51 |
52 | def params2stack(params, net_config):
53 |     """
54 |     Converts a flattened parameter vector into a nice "stack" structure
55 |     for us to work with. This is useful when you're building multilayer
56 |     networks.
57 |     stack = params2stack(params, net_config)
58 |
59 |     :param params: flattened parameter vector
60 |     :param net_config: aux. variable containing network config.
61 |     :return: stack structure (see above)
62 |
63 |     """
64 |     # Map the params (a vector) into a stack of weights
65 |     depth = len(net_config['layer_sizes'])
66 |     stack = [dict() for i in range(depth)]
67 |
68 |     prev_layer_size = net_config['input_size']
69 |     current_pos = 0
70 |
71 |     for i in range(depth):
72 |         # Extract weights
73 |         wlen = prev_layer_size * net_config['layer_sizes'][i]
74 |         stack[i]['w'] = params[current_pos:current_pos + wlen].reshape(net_config['layer_sizes'][i], prev_layer_size)
75 |         current_pos = current_pos + wlen
76 |
77 |         # Extract bias
78 |         blen = net_config['layer_sizes'][i]
79 |         stack[i]['b'] = params[current_pos:current_pos + blen]
80 |         current_pos = current_pos + blen
81 |
82 |         # Set previous layer size
83 |         prev_layer_size = net_config['layer_sizes'][i]
84 |
85 |     return stack
86 |
87 |
88 | def stacked_autoencoder_cost(theta, input_size, hidden_size, num_classes,
89 |                              net_config, lambda_, data, labels):
90 |     """
91 |     Takes a trained softmax_theta and a training data set with labels,
92 |     and returns cost and gradient using the stacked autoencoder model.
93 |     Used only for finetuning.
94 |
95 |     :param theta: trained weights from the autoencoder
96 |     :param input_size: the number of input units
97 |     :param hidden_size: the number of hidden units (at the layer before softmax)
98 |     :param num_classes: number of categories
99 |     :param net_config: network configuration of the stack
100 |     :param lambda_: weight regularization penalty
101 |     :param data: matrix containing data as columns. data[:, i-1] is the i-th example
102 |     :param labels: vector containing labels, labels[i-1] is the label for the i-th example
103 |     """
104 |
105 |     ## Unroll the theta parameter
106 |
107 |     # We first extract the softmax parameters
108 |     softmax_theta = theta[0:hidden_size * num_classes].reshape(num_classes, hidden_size)
109 |
110 |     # Extract out the "stack"
111 |     stack = params2stack(theta[hidden_size * num_classes:], net_config)
112 |
113 |     m = data.shape[1]
114 |
115 |     # Forward propagation
116 |     a = [data]
117 |     z = [np.array(0)]  # Dummy value
118 |
119 |     for s in stack:
120 |         z.append(s['w'].dot(a[-1]) + np.tile(s['b'], (m, 1)).transpose())
121 |         a.append(sigmoid(z[-1]))
122 |
123 |     # Softmax
124 |     prod = softmax_theta.dot(a[-1])
125 |     prod = prod - np.max(prod)
126 |     prob = np.exp(prod) / np.sum(np.exp(prod), axis=0)
127 |     indicator = scipy.sparse.csr_matrix((np.ones(m), (labels, np.array(range(m)))))
128 |     indicator = np.array(indicator.todense())
129 |
130 |     cost = (-1 / float(m)) * np.sum(indicator * np.log(prob)) + (lambda_ / 2) * np.sum(softmax_theta * softmax_theta)
131 |     softmax_grad = (-1 / float(m)) * (indicator - prob).dot(a[-1].transpose()) + lambda_ * softmax_theta
132 |
133 |     # Backprop
134 |     # Compute partial of cost (J) w.r.t. the outputs of the last layer (before softmax)
135 |     softmax_grad_a = softmax_theta.transpose().dot(indicator - prob)
136 |
137 |     # Compute deltas
138 |     delta = [-softmax_grad_a * sigmoid_prime(z[-1])]
139 |     for i in reversed(range(len(stack))):
140 |         d = stack[i]['w'].transpose().dot(delta[0]) * sigmoid_prime(z[i])
141 |         delta.insert(0, d)
142 |
143 |     # Compute gradients
144 |     stack_grad = [dict() for i in range(len(stack))]
145 |     for i in range(len(stack_grad)):
146 |         stack_grad[i]['w'] = delta[i + 1].dot(a[i].transpose()) / m
147 |         stack_grad[i]['b'] = np.sum(delta[i + 1], axis=1) / m
148 |
149 |     grad_params, net_config = stack2params(stack_grad)
150 |     grad = np.concatenate((softmax_grad.flatten(), grad_params))
151 |
152 |     return cost, grad
153 |
154 |
155 | def stacked_autoencoder_predict(theta, input_size, hidden_size, num_classes, net_config, data):
156 |     """
157 |     Takes a trained theta and a test data set,
158 |     and returns the predicted labels for each example.
159 |     :param theta: trained weights from the autoencoder
160 |     :param input_size: the number of input units
161 |     :param hidden_size: the number of hidden units at the layer before softmax
162 |     :param num_classes: the number of categories
163 |     :param net_config: network configuration of the stack
164 |     :param data: the matrix containing the training data as columns. data[:, i-1] is the i-th training example
165 |     :return:
166 |
167 |     Your code should produce the prediction vector
168 |     pred, where pred[i] is argmax_c P(y = c | x[i]).
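    A usage sketch, with the sizes used in stacked_ae_exercise.py (a
    784-200-200 stack plus a 10-way softmax):
        pred = stacked_autoencoder_predict(stacked_autoencoder_opt_theta,
                                           28 * 28, 200, 10, net_config,
                                           test_images)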
169 | """ 170 | 171 | ## Unroll theta parameter 172 | # We first extract the part which compute the softmax gradient 173 | softmax_theta = theta[0:hidden_size * num_classes].reshape(num_classes, hidden_size) 174 | 175 | # Extract out the "stack" 176 | stack = params2stack(theta[hidden_size * num_classes:], net_config) 177 | 178 | m = data.shape[1] 179 | 180 | # Compute predictions 181 | a = [data] 182 | z = [np.array(0)] # Dummy value 183 | 184 | # Sparse Autoencoder Computation 185 | for s in stack: 186 | z.append(s['w'].dot(a[-1]) + np.tile(s['b'], (m, 1)).transpose()) 187 | a.append(sigmoid(z[-1])) 188 | 189 | # Softmax 190 | pred = softmax.softmax_predict((softmax_theta, hidden_size, num_classes), a[-1]) 191 | 192 | return pred -------------------------------------------------------------------------------- /stl_exercise.py: -------------------------------------------------------------------------------- 1 | import load_MNIST 2 | import numpy as np 3 | import sparse_autoencoder 4 | import scipy.optimize 5 | import display_network 6 | import softmax 7 | 8 | ## ====================================================================== 9 | # STEP 0: Here we provide the relevant parameters values that will 10 | # allow your sparse autoencoder to get good filters; you do not need to 11 | # change the parameters below. 12 | 13 | input_size = 28 * 28 14 | num_labels = 5 15 | hidden_size = 196 16 | 17 | sparsity_param = 0.1 # desired average activation of the hidden units. 18 | lambda_ = 3e-3 # weight decay parameter 19 | beta = 3 # weight of sparsity penalty term 20 | 21 | ## ====================================================================== 22 | # STEP 1: Load data from the MNIST database 23 | # 24 | # This loads our training and test data from the MNIST database files. 25 | # We have sorted the data for you in this so that you will not have to 26 | # change it. 27 | 28 | images = load_MNIST.load_MNIST_images('data/mnist/train-images-idx3-ubyte') 29 | labels = load_MNIST.load_MNIST_labels('data/mnist/train-labels-idx1-ubyte') 30 | 31 | unlabeled_index = np.argwhere(labels >= 5).flatten() 32 | labeled_index = np.argwhere(labels < 5).flatten() 33 | 34 | num_train = round(labeled_index.shape[0] / 2) 35 | train_index = labeled_index[0:num_train] 36 | test_index = labeled_index[num_train:] 37 | 38 | unlabeled_data = images[:, unlabeled_index] 39 | 40 | train_data = images[:, train_index] 41 | train_labels = labels[train_index] 42 | 43 | test_data = images[:, test_index] 44 | test_labels = labels[test_index] 45 | 46 | print '# examples in unlabeled set: {0:d}\n'.format(unlabeled_data.shape[1]) 47 | print '# examples in supervised training set: {0:d}\n'.format(train_data.shape[1]) 48 | print '# examples in supervised testing set: {0:d}\n'.format(test_data.shape[1]) 49 | 50 | ## ====================================================================== 51 | # STEP 2: Train the sparse autoencoder 52 | # This trains the sparse autoencoder on the unlabeled training 53 | # images. 
54 |
55 | # Randomly initialize the parameters
56 | theta = sparse_autoencoder.initialize(hidden_size, input_size)
57 |
58 | J = lambda x: sparse_autoencoder.sparse_autoencoder_cost(x, input_size, hidden_size,
59 |                                                          lambda_, sparsity_param,
60 |                                                          beta, unlabeled_data)
61 |
62 | options_ = {'maxiter': 400, 'disp': True}
63 | result = scipy.optimize.minimize(J, theta, method='L-BFGS-B', jac=True, options=options_)
64 | opt_theta = result.x
65 |
66 | print result
67 |
68 | # Visualize the weights
69 | W1 = opt_theta[0:hidden_size * input_size].reshape(hidden_size, input_size).transpose()
70 | display_network.display_network(W1)
71 |
72 | ##======================================================================
73 | ## STEP 3: Extract Features from the Supervised Dataset
74 | #
75 | # You need to complete sparse_autoencoder.sparse_autoencoder (the feedforward
76 | # pass) so that the following commands will extract features from the data.
77 |
78 | train_features = sparse_autoencoder.sparse_autoencoder(opt_theta, hidden_size,
79 |                                                        input_size, train_data)
80 |
81 | test_features = sparse_autoencoder.sparse_autoencoder(opt_theta, hidden_size,
82 |                                                       input_size, test_data)
83 |
84 | ##======================================================================
85 | ## STEP 4: Train the softmax classifier
86 |
87 | lambda_ = 1e-4
88 | options_ = {'maxiter': 400, 'disp': True}
89 |
90 | opt_theta, input_size, num_classes = softmax.softmax_train(hidden_size, num_labels,
91 |                                                            lambda_, train_features,
92 |                                                            train_labels, options_)
93 |
94 | ##======================================================================
95 | ## STEP 5: Testing
96 |
97 | predictions = softmax.softmax_predict((opt_theta, input_size, num_classes), test_features)
98 | print "Accuracy: {0:.2f}%".format(100 * np.sum(predictions == test_labels, dtype=np.float64) / test_labels.shape[0])
99 |
--------------------------------------------------------------------------------
/train.py:
--------------------------------------------------------------------------------
1 | import numpy as np
2 | import scipy.optimize
3 | import sample_images
4 | import sparse_autoencoder
5 | import gradient
6 | import display_network
7 | import load_MNIST
8 |
9 |
10 | ##======================================================================
11 | ## STEP 0: Here we provide the relevant parameter values that will
12 | # allow your sparse autoencoder to get good filters; you do not need to
13 | # change the parameters below.
14 |
15 | # number of input units
16 | visible_size = 28 * 28
17 | # number of hidden units
18 | hidden_size = 196
19 |
20 | # desired average activation of the hidden units.
21 | # (This was denoted by the Greek letter rho, which looks like a lower-case "p",
22 | # in the lecture notes).
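# With sigmoid units, rho_hat in sparse_autoencoder.py is each hidden unit's
# mean activation over the training set; the KL penalty pushes it toward this
# target value.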
23 | sparsity_param = 0.1
24 | # weight decay parameter
25 | lambda_ = 3e-3
26 | # weight of sparsity penalty term
27 | beta = 3
28 | # debug
29 | debug = False
30 |
31 |
32 | ##======================================================================
33 | ## STEP 1: Implement sample_images
34 | #
35 | # After implementing sample_images, the display_network command should
36 | # display a random sample of 200 patches from the dataset
37 |
38 | # Loading Sample Images
39 | # patches = sample_images.sample_images()
40 |
41 | # Loading 10K images from MNIST database
42 | images = load_MNIST.load_MNIST_images('data/mnist/train-images-idx3-ubyte')
43 | patches = images[:, 0:10000]
44 |
45 | # Obtain random parameters theta
46 | theta = sparse_autoencoder.initialize(hidden_size, visible_size)
47 |
48 | ##======================================================================
49 | ## STEP 2: Implement sparse_autoencoder_cost
50 | #
51 | # You can implement all of the components (squared error cost, weight decay term,
52 | # sparsity penalty) in the cost function at once, but it may be easier to do
53 | # it step-by-step and run gradient checking (see STEP 3) after each step. We
54 | # suggest implementing the sparse_autoencoder_cost function using the following steps:
55 | #
56 | # (a) Implement forward propagation in your neural network, and implement the
57 | #     squared error term of the cost function. Implement backpropagation to
58 | #     compute the derivatives. Then (using lambda = beta = 0), run Gradient Checking
59 | #     to verify that the calculations corresponding to the squared error cost
60 | #     term are correct.
61 | #
62 | # (b) Add in the weight decay term (in both the cost function and the derivative
63 | #     calculations), then re-run Gradient Checking to verify correctness.
64 | #
65 | # (c) Add in the sparsity penalty term, then re-run Gradient Checking to
66 | #     verify correctness.
67 | #
68 | # Feel free to change the training settings when debugging your
69 | # code. (For example, reducing the training set size or
70 | # number of hidden units may make your code run faster; and setting beta
71 | # and/or lambda to zero may be helpful for debugging.) However, in your
72 | # final submission of the visualized weights, please use the parameters we
73 | # gave in Step 0 above.
74 |
75 | (cost, grad) = sparse_autoencoder.sparse_autoencoder_cost(theta, visible_size,
76 |                                                           hidden_size, lambda_,
77 |                                                           sparsity_param, beta, patches)
78 | print cost, grad
79 | ##======================================================================
80 | ## STEP 3: Gradient Checking
81 | #
82 | # Hint: If you are debugging your code, performing gradient checking on smaller models
83 | # and smaller training sets (e.g., using only 10 training examples and 1-2 hidden
84 | # units) may speed things up.
85 |
86 | # First, let's make sure your numerical gradient computation is correct for a
87 | # simple function. After you have implemented gradient.compute_gradient,
88 | # run the following:
89 |
90 |
91 | if debug:
92 |     gradient.check_gradient()
93 |
94 |     # Now we can use it to check your cost function and derivative calculations
95 |     # for the sparse autoencoder.
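    # (A caution: numerically checking all ~308,000 parameters of this
    # 784-196-784 model requires on the order of theta.size cost evaluations,
    # so the full-size check is very slow; the reduced model suggested in the
    # hint above is the practical choice.)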
96 |     # J is the cost function
97 |
98 |     J = lambda x: sparse_autoencoder.sparse_autoencoder_cost(x, visible_size, hidden_size,
99 |                                                              lambda_, sparsity_param,
100 |                                                              beta, patches)
101 |     num_grad = gradient.compute_gradient(J, theta)
102 |
103 |     # Use this to visually compare the gradients side by side
104 |     print num_grad, grad
105 |
106 |     # Compare numerically computed gradients with the ones obtained from backpropagation
107 |     diff = np.linalg.norm(num_grad - grad) / np.linalg.norm(num_grad + grad)
108 |     print diff
109 |     print "Norm of the difference between numerical and analytical gradients (should be < 1e-9)\n\n"
110 |
111 | ##======================================================================
112 | ## STEP 4: After verifying that your implementation of
113 | # sparse_autoencoder_cost is correct, you can start training your sparse
114 | # autoencoder with L-BFGS (via scipy.optimize.minimize).
115 |
116 | # Randomly initialize the parameters
117 | theta = sparse_autoencoder.initialize(hidden_size, visible_size)
118 |
119 | J = lambda x: sparse_autoencoder.sparse_autoencoder_cost(x, visible_size, hidden_size,
120 |                                                          lambda_, sparsity_param,
121 |                                                          beta, patches)
122 | options_ = {'maxiter': 400, 'disp': True}
123 | result = scipy.optimize.minimize(J, theta, method='L-BFGS-B', jac=True, options=options_)
124 | opt_theta = result.x
125 |
126 | print result
127 |
128 | ##======================================================================
129 | ## STEP 5: Visualization
130 |
131 | W1 = opt_theta[0:hidden_size * visible_size].reshape(hidden_size, visible_size).transpose()
132 | display_network.display_network(W1)
133 |
134 |
--------------------------------------------------------------------------------