├── README.md ├── assignment2 ├── .gitignore ├── .ipynb_checkpoints │ ├── BatchNormalization-checkpoint.ipynb │ ├── ConvolutionalNetworks-checkpoint.ipynb │ ├── Dropout-checkpoint.ipynb │ └── FullyConnectedNets-checkpoint.ipynb ├── BatchNormalization.ipynb ├── ConvolutionalNetworks.ipynb ├── Dropout.ipynb ├── FullyConnectedNets.ipynb ├── README.md ├── collectSubmission.sh ├── cs231n │ ├── .gitignore │ ├── __init__.py │ ├── classifiers │ │ ├── __init__.py │ │ ├── cnn.py │ │ └── fc_net.py │ ├── data_utils.py │ ├── datasets │ │ ├── .gitignore │ │ └── get_datasets.sh │ ├── fast_layers.py │ ├── gradient_check.py │ ├── im2col.py │ ├── im2col_cython.pyx │ ├── layer_utils.py │ ├── layers.py │ ├── optim.py │ ├── setup.py │ ├── solver.py │ └── vis_utils.py ├── frameworkpython ├── kitten.jpg ├── puppy.jpg ├── requirements.txt └── start_ipython_osx.sh └── assignment3 ├── .gitignore ├── .ipynb_checkpoints ├── ImageGeneration-checkpoint.ipynb ├── ImageGradients-checkpoint.ipynb ├── LSTM_Captioning-checkpoint.ipynb └── RNN_Captioning-checkpoint.ipynb ├── ImageGeneration.ipynb ├── ImageGradients.ipynb ├── LSTM_Captioning.ipynb ├── RNN_Captioning.ipynb ├── collectSubmission.sh ├── cs231n ├── .gitignore ├── __init__.py ├── captioning_solver.py ├── classifiers │ ├── __init__.py │ ├── pretrained_cnn.py │ └── rnn.py ├── coco_utils.py ├── data_utils.py ├── datasets │ ├── get_coco_captioning.sh │ ├── get_pretrained_model.sh │ └── get_tiny_imagenet_a.sh ├── fast_layers.py ├── gradient_check.py ├── im2col.py ├── im2col_cython.pyx ├── image_utils.py ├── layer_utils.py ├── layers.py ├── optim.py ├── rnn_layers.py └── setup.py ├── frameworkpython ├── kitten.jpg ├── requirements.txt ├── sky.jpg └── start_ipython_osx.sh /README.md: -------------------------------------------------------------------------------- 1 | # Stanford-CS231n-assignments 2 | My solutions to the CS231n 2016 assignments, completed by frankheshibi@gmail.com 3 | 4 | http://cs231n.stanford.edu/syllabus.html 5 | 6 | I deleted all datasets and my Python virtual-environment files. To recreate them, simply follow the instructions in each assignment's README. 7 | 8 | I skipped assignment 1, as it is comparatively easy. 9 | 10 | I have heard that the lecture videos linked in the syllabus were recently taken down over copyright problems. What a pity! Backup copies can still be found on YouTube and various online drives. 11 | -------------------------------------------------------------------------------- /assignment2/.gitignore: -------------------------------------------------------------------------------- 1 | *.swp 2 | *.pyc 3 | .env/* 4 | -------------------------------------------------------------------------------- /assignment2/README.md: -------------------------------------------------------------------------------- 1 | In this assignment you will practice writing backpropagation code, and training 2 | Neural Networks and Convolutional Neural Networks.
The goals of this assignment 3 | are as follows: 4 | 5 | - understand **Neural Networks** and how they are arranged in layered 6 | architectures 7 | - understand and be able to implement (vectorized) **backpropagation** 8 | - implement various **update rules** used to optimize Neural Networks 9 | - implement **batch normalization** for training deep networks 10 | - implement **dropout** to regularize networks 11 | - effectively **cross-validate** and find the best hyperparameters for Neural 12 | Network architecture 13 | - understand the architecture of **Convolutional Neural Networks** and 14 | gain experience with training these models on data 15 | 16 | ## Setup 17 | You can work on the assignment in one of two ways: locally on your own machine, 18 | or on a virtual machine through Terminal.com. 19 | 20 | ### Working in the cloud on Terminal 21 | 22 | Terminal has created a separate subdomain to serve our class, 23 | [www.stanfordterminalcloud.com](https://www.stanfordterminalcloud.com). Register 24 | your account there. The Assignment 2 snapshot can then be found HERE. If you are 25 | registered in the class you can contact the TA (see Piazza for more information) 26 | to request Terminal credits for use on the assignment. Once you boot up the 27 | snapshot everything will be installed for you, and you will be ready to start on 28 | your assignment right away. We have written a small tutorial on Terminal 29 | [here](http://cs231n.github.io/terminal-tutorial/). 30 | 31 | ### Working locally 32 | Get the code as a zip file 33 | [here](http://vision.stanford.edu/teaching/cs231n/winter1516_assignment2.zip). 34 | As for the dependencies: 35 | 36 | **[Option 1] Use Anaconda:** 37 | The preferred approach for installing all the assignment dependencies is to use 38 | [Anaconda](https://www.continuum.io/downloads), which is a Python distribution 39 | that includes many of the most popular Python packages for science, math, 40 | engineering and data analysis. Once you install it you can skip all mentions of 41 | requirements and you are ready to go directly to working on the assignment. 42 | 43 | **[Option 2] Manual install, virtual environment:** 44 | If you do not want to use Anaconda and want to go with a more manual and risky 45 | installation route you will likely want to create a 46 | [virtual environment](http://docs.python-guide.org/en/latest/dev/virtualenvs/) 47 | for the project. If you choose not to use a virtual environment, it is up to you 48 | to make sure that all dependencies for the code are installed globally on your 49 | machine. To set up a virtual environment, run the following: 50 | 51 | ```bash 52 | cd assignment2 53 | sudo pip install virtualenv # This may already be installed 54 | virtualenv .env # Create a virtual environment 55 | source .env/bin/activate # Activate the virtual environment 56 | pip install -r requirements.txt # Install dependencies 57 | # Work on the assignment for a while ... 58 | deactivate # Exit the virtual environment 59 | ``` 60 | 61 | **Download data:** 62 | Once you have the starter code, you will need to download the CIFAR-10 dataset. 63 | Run the following from the `assignment2` directory: 64 | 65 | ```bash 66 | cd cs231n/datasets 67 | ./get_datasets.sh 68 | ``` 69 | 70 | **Compile the Cython extension:** Convolutional Neural Networks require a very 71 | efficient implementation. We have implemented much of the functionality using 72 | [Cython](http://cython.org/); you will need to compile the Cython extension 73 | before you can run the code.
From the `cs231n` directory, run the following 74 | command: 75 | 76 | ```bash 77 | python setup.py build_ext --inplace 78 | ``` 79 | 80 | **Start IPython:** 81 | After you have the CIFAR-10 data, you should start the IPython notebook server 82 | from the `assignment2` directory. If you are unfamiliar with IPython, you should 83 | read our [IPython tutorial](http://cs231n.github.io/ipython-tutorial/). 84 | 85 | **NOTE:** If you are working in a virtual environment on OSX, you may encounter 86 | errors with matplotlib due to the 87 | [issues described here](http://matplotlib.org/faq/virtualenv_faq.html). 88 | You can work around this issue by starting the IPython server using the 89 | `start_ipython_osx.sh` script from the `assignment2` directory; the script 90 | assumes that your virtual environment is named `.env`. 91 | 92 | 93 | ### Submitting your work: 94 | Whether you work on the assignment locally or using Terminal, once you are done 95 | working run the `collectSubmission.sh` script; this will produce a file called 96 | `assignment2.zip`. Upload this file to your dropbox on 97 | [the coursework](https://coursework.stanford.edu/portal/site/W15-CS-231N-01/) 98 | page for the course. 99 | 100 | 101 | ### Q1: Fully-connected Neural Network (30 points) 102 | The IPython notebook `FullyConnectedNets.ipynb` will introduce you to our 103 | modular layer design, and then use those layers to implement fully-connected 104 | networks of arbitrary depth. To optimize these models you will implement several 105 | popular update rules. 106 | 107 | ### Q2: Batch Normalization (30 points) 108 | In the IPython notebook `BatchNormalization.ipynb` you will implement batch 109 | normalization, and use it to train deep fully-connected networks. 110 | 111 | ### Q3: Dropout (10 points) 112 | The IPython notebook `Dropout.ipynb` will help you implement Dropout and explore 113 | its effects on model generalization. 114 | 115 | ### Q4: ConvNet on CIFAR-10 (30 points) 116 | In the IPython Notebook `ConvolutionalNetworks.ipynb` you will implement several 117 | new layers that are commonly used in convolutional networks. You will train a 118 | (shallow) convolutional network on CIFAR-10, and it will then be up to you to 119 | train the best network that you can. 120 | 121 | ### Q5: Do something extra! (up to +10 points) 122 | In the process of training your network, you should feel free to implement 123 | anything that you want to get better performance. You can modify the solver, 124 | implement additional layers, use different types of regularization, use an 125 | ensemble of models, or anything else that comes to mind. If you implement these 126 | or other ideas not covered in the assignment then you will be awarded some bonus 127 | points. 128 | 129 | -------------------------------------------------------------------------------- /assignment2/collectSubmission.sh: -------------------------------------------------------------------------------- 1 | rm -f assignment2.zip 2 | zip -r assignment2.zip . 
-x "*.git*" "*cs231n/datasets*" "*.ipynb_checkpoints*" "*README.md" "*collectSubmission.sh" "*requirements.txt" "venv/*" "*.pyc" "*cs231n/build/*" 3 | -------------------------------------------------------------------------------- /assignment2/cs231n/.gitignore: -------------------------------------------------------------------------------- 1 | build/* 2 | im2col_cython.c 3 | im2col_cython.so 4 | -------------------------------------------------------------------------------- /assignment2/cs231n/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ShibiHe/Stanford-CS231n-assignments/4590c50075a2e4aa291777b4db45d04b877dcdf6/assignment2/cs231n/__init__.py -------------------------------------------------------------------------------- /assignment2/cs231n/classifiers/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ShibiHe/Stanford-CS231n-assignments/4590c50075a2e4aa291777b4db45d04b877dcdf6/assignment2/cs231n/classifiers/__init__.py -------------------------------------------------------------------------------- /assignment2/cs231n/classifiers/cnn.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | 3 | from cs231n.layers import * 4 | from cs231n.fast_layers import * 5 | from cs231n.layer_utils import * 6 | 7 | 8 | class ThreeLayerConvNet(object): 9 | """ 10 | A three-layer convolutional network with the following architecture: 11 | 12 | conv - relu - 2x2 max pool - affine - relu - affine - softmax 13 | 14 | The network operates on minibatches of data that have shape (N, C, H, W) 15 | consisting of N images, each with height H and width W and with C input 16 | channels. 17 | """ 18 | 19 | def __init__(self, input_dim=(3, 32, 32), num_filters=32, filter_size=7, 20 | hidden_dim=100, num_classes=10, weight_scale=1e-3, reg=0.0, 21 | dtype=np.float32): 22 | """ 23 | Initialize a new network. 24 | 25 | Inputs: 26 | - input_dim: Tuple (C, H, W) giving size of input data 27 | - num_filters: Number of filters to use in the convolutional layer 28 | - filter_size: Size of filters to use in the convolutional layer 29 | - hidden_dim: Number of units to use in the fully-connected hidden layer 30 | - num_classes: Number of scores to produce from the final affine layer. 31 | - weight_scale: Scalar giving standard deviation for random initialization 32 | of weights. 33 | - reg: Scalar giving L2 regularization strength 34 | - dtype: numpy datatype to use for computation. 35 | """ 36 | self.params = {} 37 | self.reg = reg 38 | self.dtype = dtype 39 | 40 | ############################################################################ 41 | # TODO: Initialize weights and biases for the three-layer convolutional # 42 | # network. Weights should be initialized from a Gaussian with standard # 43 | # deviation equal to weight_scale; biases should be initialized to zero. # 44 | # All weights and biases should be stored in the dictionary self.params. # 45 | # Store weights and biases for the convolutional layer using the keys 'W1' # 46 | # and 'b1'; use keys 'W2' and 'b2' for the weights and biases of the # 47 | # hidden affine layer, and keys 'W3' and 'b3' for the weights and biases # 48 | # of the output affine layer. 
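# (Shape note for the initialization below: W1 is (num_filters, C, filter_size,
# filter_size); the 2x2 max pool halves H and W, so the conv output flattens to
# num_filters*H*W/4 features, making W2 (num_filters*H*W/4, hidden_dim) and
# W3 (hidden_dim, num_classes).)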
# 49 | ############################################################################ 50 | C, H, W = input_dim 51 | self.params['W1'] = np.random.randn(num_filters, C, filter_size, filter_size) * weight_scale 52 | self.params['b1'] = np.zeros(num_filters) 53 | self.params['W2'] = np.random.randn(num_filters*H*W/4, hidden_dim)*weight_scale 54 | self.params['b2'] = np.zeros(hidden_dim) 55 | self.params['W3'] = np.random.randn(hidden_dim, num_classes)*weight_scale 56 | self.params['b3'] = np.zeros(num_classes) 57 | ############################################################################ 58 | # END OF YOUR CODE # 59 | ############################################################################ 60 | 61 | for k, v in self.params.iteritems(): 62 | self.params[k] = v.astype(dtype) 63 | 64 | 65 | def loss(self, X, y=None): 66 | """ 67 | Evaluate loss and gradient for the three-layer convolutional network. 68 | 69 | Input / output: Same API as TwoLayerNet in fc_net.py. 70 | """ 71 | W1, b1 = self.params['W1'], self.params['b1'] 72 | W2, b2 = self.params['W2'], self.params['b2'] 73 | W3, b3 = self.params['W3'], self.params['b3'] 74 | 75 | # pass conv_param to the forward pass for the convolutional layer 76 | filter_size = W1.shape[2] 77 | conv_param = {'stride': 1, 'pad': (filter_size - 1) / 2} 78 | 79 | # pass pool_param to the forward pass for the max-pooling layer 80 | pool_param = {'pool_height': 2, 'pool_width': 2, 'stride': 2} 81 | 82 | scores = None 83 | ############################################################################ 84 | # TODO: Implement the forward pass for the three-layer convolutional net, # 85 | # computing the class scores for X and storing them in the scores # 86 | # variable. # 87 | ############################################################################ 88 | out, cache1 = conv_relu_pool_forward(X, W1, b1, conv_param, pool_param) 89 | out, cache2 = affine_relu_forward(out, W2, b2) 90 | scores, cache3 = affine_forward(out, W3, b3) 91 | ############################################################################ 92 | # END OF YOUR CODE # 93 | ############################################################################ 94 | 95 | if y is None: 96 | return scores 97 | 98 | loss, grads = 0, {} 99 | ############################################################################ 100 | # TODO: Implement the backward pass for the three-layer convolutional net, # 101 | # storing the loss and gradients in the loss and grads variables. Compute # 102 | # data loss using softmax, and make sure that grads[k] holds the gradients # 103 | # for self.params[k]. Don't forget to add L2 regularization! 
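# (With the 0.5 convention used in this repo, d/dW [0.5 * reg * sum(W*W)] = reg * W,
# so the backward pass below simply adds reg * W to each weight gradient.)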
# 104 | ############################################################################ 105 | loss, dx = softmax_loss(scores, y) 106 | loss += 0.5 * self.reg*(np.sum(self.params['W1']* self.params['W1']) + \ 107 | np.sum(self.params['W2']* self.params['W2'])+np.sum(self.params['W3']* self.params['W3'])) 108 | 109 | dx, grads['W3'], grads['b3'] = affine_backward(dx, cache3) 110 | dx, grads['W2'], grads['b2'] = affine_relu_backward(dx, cache2) 111 | _, grads['W1'], grads['b1'] = conv_relu_pool_backward(dx, cache1) 112 | grads['W1'] += self.reg*self.params['W1'] 113 | grads['W2'] += self.reg*self.params['W2'] 114 | grads['W3'] += self.reg*self.params['W3'] 115 | ############################################################################ 116 | # END OF YOUR CODE # 117 | ############################################################################ 118 | 119 | return loss, grads 120 | 121 | 122 | pass 123 | -------------------------------------------------------------------------------- /assignment2/cs231n/classifiers/fc_net.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | 3 | from cs231n.layers import * 4 | from cs231n.layer_utils import * 5 | 6 | 7 | class TwoLayerNet(object): 8 | """ 9 | A two-layer fully-connected neural network with ReLU nonlinearity and 10 | softmax loss that uses a modular layer design. We assume an input dimension 11 | of D, a hidden dimension of H, and perform classification over C classes. 12 | 13 | The architecture should be affine - relu - affine - softmax. 14 | 15 | Note that this class does not implement gradient descent; instead, it 16 | will interact with a separate Solver object that is responsible for running 17 | optimization. 18 | 19 | The learnable parameters of the model are stored in the dictionary 20 | self.params that maps parameter names to numpy arrays. 21 | """ 22 | 23 | def __init__(self, input_dim=3*32*32, hidden_dim=100, num_classes=10, 24 | weight_scale=1e-3, reg=0.0): 25 | """ 26 | Initialize a new network. 27 | 28 | Inputs: 29 | - input_dim: An integer giving the size of the input 30 | - hidden_dim: An integer giving the size of the hidden layer 31 | - num_classes: An integer giving the number of classes to classify 33 | - weight_scale: Scalar giving the standard deviation for random 34 | initialization of the weights. 35 | - reg: Scalar giving L2 regularization strength. 36 | """ 37 | self.params = {} 38 | self.reg = reg 39 | 40 | ############################################################################ 41 | # TODO: Initialize the weights and biases of the two-layer net. Weights # 42 | # should be initialized from a Gaussian with standard deviation equal to # 43 | # weight_scale, and biases should be initialized to zero. All weights and # 44 | # biases should be stored in the dictionary self.params, with first layer # 45 | # weights and biases using the keys 'W1' and 'b1' and second layer weights # 46 | # and biases using the keys 'W2' and 'b2'.
# 47 | ############################################################################ 48 | self.params['W1'] = np.random.randn(input_dim, hidden_dim) * weight_scale 49 | self.params['b1'] = np.zeros(hidden_dim) 50 | self.params['W2'] = np.random.randn(hidden_dim, num_classes) * weight_scale 51 | self.params['b2'] = np.zeros(num_classes) 52 | ############################################################################ 53 | # END OF YOUR CODE # 54 | ############################################################################ 55 | 56 | 57 | def loss(self, X, y=None): 58 | """ 59 | Compute loss and gradient for a minibatch of data. 60 | 61 | Inputs: 62 | - X: Array of input data of shape (N, d_1, ..., d_k) 63 | - y: Array of labels, of shape (N,). y[i] gives the label for X[i]. 64 | 65 | Returns: 66 | If y is None, then run a test-time forward pass of the model and return: 67 | - scores: Array of shape (N, C) giving classification scores, where 68 | scores[i, c] is the classification score for X[i] and class c. 69 | 70 | If y is not None, then run a training-time forward and backward pass and 71 | return a tuple of: 72 | - loss: Scalar value giving the loss 73 | - grads: Dictionary with the same keys as self.params, mapping parameter 74 | names to gradients of the loss with respect to those parameters. 75 | """ 76 | scores = None 77 | ############################################################################ 78 | # TODO: Implement the forward pass for the two-layer net, computing the # 79 | # class scores for X and storing them in the scores variable. # 80 | ############################################################################ 81 | out1, cache1 = affine_relu_forward(X, self.params['W1'], self.params['b1']) 82 | scores, cache2 = affine_forward(out1, self.params['W2'], self.params['b2']) 83 | ############################################################################ 84 | # END OF YOUR CODE # 85 | ############################################################################ 86 | 87 | # If y is None then we are in test mode so just return scores 88 | if y is None: 89 | return scores 90 | 91 | loss, grads = 0, {} 92 | ############################################################################ 93 | # TODO: Implement the backward pass for the two-layer net. Store the loss # 94 | # in the loss variable and gradients in the grads dictionary. Compute data # 95 | # loss using softmax, and make sure that grads[k] holds the gradients for # 96 | # self.params[k]. Don't forget to add L2 regularization! # 97 | # # 98 | # NOTE: To ensure that your implementation matches ours and you pass the # 99 | # automated tests, make sure that your L2 regularization includes a factor # 100 | # of 0.5 to simplify the expression for the gradient. # 101 | ############################################################################ 102 | loss, dx = softmax_loss(scores, y) 103 | loss += self.reg*0.5*(np.sum(np.square(self.params['W1']))+ 104 | np.sum(np.square(self.params['W2']))) 105 | 106 | dx, grads['W2'], grads['b2'] = affine_backward(dx, cache2) 107 | dx, grads['W1'], grads['b1'] = affine_relu_backward(dx, cache1) 108 | # MISTAKE (note to self):
# in an earlier attempt I mistakenly used the gradients, not the parameters, for the regularization term: 109 | # grads['W2'] += self.reg*grads['W2'] 110 | # grads['W1'] += self.reg*grads['W1'] 111 | grads['W1'] += self.reg*self.params['W1'] 112 | grads['W2'] += self.reg*self.params['W2'] 113 | ############################################################################ 114 | # END OF YOUR CODE # 115 | ############################################################################ 116 | 117 | return loss, grads 118 | 119 | 120 | class FullyConnectedNet(object): 121 | """ 122 | A fully-connected neural network with an arbitrary number of hidden layers, 123 | ReLU nonlinearities, and a softmax loss function. This will also implement 124 | dropout and batch normalization as options. For a network with L layers, 125 | the architecture will be 126 | 127 | {affine - [batch norm] - relu - [dropout]} x (L - 1) - affine - softmax 128 | 129 | where batch normalization and dropout are optional, and the {...} block is 130 | repeated L - 1 times. 131 | 132 | Similar to the TwoLayerNet above, learnable parameters are stored in the 133 | self.params dictionary and will be learned using the Solver class. 134 | """ 135 | 136 | def __init__(self, hidden_dims, input_dim=3*32*32, num_classes=10, 137 | dropout=0, use_batchnorm=False, reg=0.0, 138 | weight_scale=1e-2, dtype=np.float32, seed=None): 139 | """ 140 | Initialize a new FullyConnectedNet. 141 | 142 | Inputs: 143 | - hidden_dims: A list of integers giving the size of each hidden layer. 144 | - input_dim: An integer giving the size of the input. 145 | - num_classes: An integer giving the number of classes to classify. 146 | - dropout: Scalar between 0 and 1 giving dropout strength. If dropout=0 then 147 | the network should not use dropout at all. 148 | - use_batchnorm: Whether or not the network should use batch normalization. 149 | - reg: Scalar giving L2 regularization strength. 150 | - weight_scale: Scalar giving the standard deviation for random 151 | initialization of the weights. 152 | - dtype: A numpy datatype object; all computations will be performed using 153 | this datatype. float32 is faster but less accurate, so you should use 154 | float64 for numeric gradient checking. 155 | - seed: If not None, then pass this random seed to the dropout layers. This 156 | will make the dropout layers deterministic so we can gradient check the 157 | model. 158 | """ 159 | self.use_batchnorm = use_batchnorm 160 | self.use_dropout = dropout > 0 161 | self.reg = reg 162 | self.num_layers = 1 + len(hidden_dims) 163 | self.dtype = dtype 164 | self.params = {} 165 | 166 | ############################################################################ 167 | # TODO: Initialize the parameters of the network, storing all values in # 168 | # the self.params dictionary. Store weights and biases for the first layer # 169 | # in W1 and b1; for the second layer use W2 and b2, etc. Weights should be # 170 | # initialized from a normal distribution with standard deviation equal to # 171 | # weight_scale and biases should be initialized to zero. # 172 | # # 173 | # When using batch normalization, store scale and shift parameters for the # 174 | # first layer in gamma1 and beta1; for the second layer use gamma2 and # 175 | # beta2, etc. Scale parameters should be initialized to one and shift # 176 | # parameters should be initialized to zero.
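# (For example, hidden_dims=[100, 50] with use_batchnorm=True yields params W1,
# gamma1, beta1, W2, gamma2, beta2, W3, b3; the hidden biases b1 and b2 are
# deleted below because batchnorm's beta shift makes them redundant.)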
# 177 | ############################################################################ 178 | hidden_layers = len(hidden_dims) 179 | self.params['W1'] = np.random.randn(input_dim, hidden_dims[0]) * weight_scale 180 | self.params['b1'] = np.zeros(hidden_dims[0]) 181 | # In fact 'b' is useless if we use batch norm 182 | if self.use_batchnorm: 183 | del self.params['b1'] 184 | self.params['gamma1'] = np.ones(hidden_dims[0]) 185 | self.params['beta1'] = np.zeros(hidden_dims[0]) 186 | self.params['W'+str(self.num_layers)] = np.random.randn(hidden_dims[-1], num_classes) * weight_scale 187 | self.params['b'+str(self.num_layers)] = np.zeros(num_classes) 188 | for i in range(1, hidden_layers): 189 | self.params['W'+str(i+1)] = np.random.randn(hidden_dims[i-1], hidden_dims[i]) * weight_scale 190 | self.params['b'+str(i+1)] = np.zeros(hidden_dims[i]) 191 | if self.use_batchnorm: 192 | del self.params['b'+str(i+1)] 193 | self.params['gamma'+str(i+1)] = np.ones(hidden_dims[i]) 194 | self.params['beta'+str(i+1)] = np.zeros(hidden_dims[i]) 195 | 196 | ############################################################################ 197 | # END OF YOUR CODE # 198 | ############################################################################ 199 | 200 | # When using dropout we need to pass a dropout_param dictionary to each 201 | # dropout layer so that the layer knows the dropout probability and the mode 202 | # (train / test). You can pass the same dropout_param to each dropout layer. 203 | self.dropout_param = {} 204 | if self.use_dropout: 205 | self.dropout_param = {'mode': 'train', 'p': dropout} 206 | if seed is not None: 207 | self.dropout_param['seed'] = seed 208 | 209 | # With batch normalization we need to keep track of running means and 210 | # variances, so we need to pass a special bn_param object to each batch 211 | # normalization layer. You should pass self.bn_params[0] to the forward pass 212 | # of the first batch normalization layer, self.bn_params[1] to the forward 213 | # pass of the second batch normalization layer, etc. 214 | self.bn_params = [] 215 | if self.use_batchnorm: 216 | self.bn_params = [{'mode': 'train'} for i in xrange(self.num_layers - 1)] 217 | 218 | # Cast all parameters to the correct datatype 219 | for k, v in self.params.iteritems(): 220 | self.params[k] = v.astype(dtype) 221 | 222 | 223 | def loss(self, X, y=None): 224 | """ 225 | Compute loss and gradient for the fully-connected net. 226 | 227 | Input / output: Same as TwoLayerNet above. 228 | """ 229 | X = X.astype(self.dtype) 230 | mode = 'test' if y is None else 'train' 231 | 232 | # Set train/test mode for batchnorm params and dropout param since they 233 | # behave differently during training and testing. 234 | if self.dropout_param is not None: 235 | self.dropout_param['mode'] = mode 236 | if self.use_batchnorm: 237 | for bn_param in self.bn_params: 238 | # I think the author made a mistake 239 | # bn_param[mode] = mode 240 | bn_param['mode'] = mode 241 | 242 | 243 | scores = None 244 | ############################################################################ 245 | # TODO: Implement the forward pass for the fully-connected net, computing # 246 | # the class scores for X and storing them in the scores variable. # 247 | # # 248 | # When using dropout, you'll need to pass self.dropout_param to each # 249 | # dropout forward pass. 
# 250 | # # 251 | # When using batch normalization, you'll need to pass self.bn_params[0] to # 252 | # the forward pass for the first batch normalization layer, pass # 253 | # self.bn_params[1] to the forward pass for the second batch normalization # 254 | # layer, etc. # 255 | ############################################################################ 256 | self.cache = {} 257 | hidden_layers = self.num_layers-1 258 | if not self.use_batchnorm: 259 | out, self.cache['cache1'] = affine_relu_forward(X, self.params['W1'], self.params['b1']) 260 | else: 261 | out, self.cache['cache1'] = affine_bn_relu_forward(X, self.params['W1'], self.params['gamma1'], self.params['beta1'], self.bn_params[0]) 262 | 263 | if self.use_dropout: 264 | out, self.cache['drop1'] = dropout_forward(out, self.dropout_param) 265 | 266 | for i in range(2, hidden_layers+1): 267 | if not self.use_batchnorm: 268 | out, self.cache['cache'+str(i)] = affine_relu_forward(out, self.params['W'+str(i)], self.params['b'+str(i)]) 269 | else: 270 | out, self.cache['cache'+str(i)] = affine_bn_relu_forward(out, self.params['W'+str(i)], self.params['gamma'+str(i)], self.params['beta'+str(i)], self.bn_params[i-1]) 271 | 272 | if self.use_dropout: 273 | out, self.cache['drop'+str(i)] = dropout_forward(out, self.dropout_param) 274 | 275 | scores, self.cache['cache'+str(hidden_layers+1)] = affine_forward(out, self.params['W'+str(hidden_layers+1)], self.params['b'+str(hidden_layers+1)]) 276 | ############################################################################ 277 | # END OF YOUR CODE # 278 | ############################################################################ 279 | 280 | # If test mode return early 281 | if mode == 'test': 282 | return scores 283 | 284 | loss, grads = 0.0, {} 285 | ############################################################################ 286 | # TODO: Implement the backward pass for the fully-connected net. Store the # 287 | # loss in the loss variable and gradients in the grads dictionary. Compute # 288 | # data loss using softmax, and make sure that grads[k] holds the gradients # 289 | # for self.params[k]. Don't forget to add L2 regularization! # 290 | # # 291 | # When using batch normalization, you don't need to regularize the scale # 292 | # and shift parameters. # 293 | # # 294 | # NOTE: To ensure that your implementation matches ours and you pass the # 295 | # automated tests, make sure that your L2 regularization includes a factor # 296 | # of 0.5 to simplify the expression for the gradient. 
# 297 | ############################################################################ 298 | # loss = softmax_loss(scores, y) 299 | for i in range(1, hidden_layers+2): 300 | loss += np.sum(np.square(self.params['W'+str(i)])) 301 | loss = loss*0.5*self.reg 302 | loss_0, dx = softmax_loss(scores, y) 303 | loss += loss_0 304 | 305 | # The last layer is a plain affine layer whether or not batchnorm is used: 306 | dx, grads['W'+str(hidden_layers+1)], grads['b'+str(hidden_layers+1)] = affine_backward(dx, self.cache['cache'+str(hidden_layers+1)]) 309 | grads['W'+str(hidden_layers+1)] += self.reg*self.params['W'+str(hidden_layers+1)] 310 | 311 | for i in range(hidden_layers, 0, -1): 312 | if self.use_dropout: 313 | dx = dropout_backward(dx, self.cache['drop'+str(i)]) 314 | 315 | if not self.use_batchnorm: 316 | dx, grads['W'+str(i)], grads['b'+str(i)] = affine_relu_backward(dx, self.cache['cache'+str(i)]) 317 | else: 318 | dx, grads['W'+str(i)], grads['gamma'+str(i)], grads['beta'+str(i)] = affine_bn_relu_backward(dx, self.cache['cache'+str(i)]) 319 | grads['W'+str(i)] += self.reg*self.params['W'+str(i)] 320 | ############################################################################ 321 | # END OF YOUR CODE # 322 | ############################################################################ 323 | 324 | return loss, grads 325 | -------------------------------------------------------------------------------- /assignment2/cs231n/data_utils.py: -------------------------------------------------------------------------------- 1 | import cPickle as pickle 2 | import numpy as np 3 | import os 4 | from scipy.misc import imread 5 | 6 | def load_CIFAR_batch(filename): 7 | """ load single batch of cifar """ 8 | with open(filename, 'rb') as f: 9 | datadict = pickle.load(f) 10 | X = datadict['data'] 11 | Y = datadict['labels'] 12 | X = X.reshape(10000, 3, 32, 32).transpose(0,2,3,1).astype("float") 13 | Y = np.array(Y) 14 | return X, Y 15 | 16 | def load_CIFAR10(ROOT): 17 | """ load all of cifar """ 18 | xs = [] 19 | ys = [] 20 | for b in range(1,6): 21 | f = os.path.join(ROOT, 'data_batch_%d' % (b, )) 22 | X, Y = load_CIFAR_batch(f) 23 | xs.append(X) 24 | ys.append(Y) 25 | Xtr = np.concatenate(xs) 26 | Ytr = np.concatenate(ys) 27 | del X, Y 28 | Xte, Yte = load_CIFAR_batch(os.path.join(ROOT, 'test_batch')) 29 | return Xtr, Ytr, Xte, Yte 30 | 31 | 32 | def get_CIFAR10_data(num_training=49000, num_validation=1000, num_test=1000): 33 | """ 34 | Load the CIFAR-10 dataset from disk and perform preprocessing to prepare 35 | it for classifiers. These are the same steps as we used for the SVM, but 36 | condensed to a single function.
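    Example (shapes follow the default split sizes):
      data = get_CIFAR10_data()
      print data['X_train'].shape  # (49000, 3, 32, 32)
      print data['X_val'].shape    # (1000, 3, 32, 32)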
37 | """ 38 | # Load the raw CIFAR-10 data 39 | cifar10_dir = 'cs231n/datasets/cifar-10-batches-py' 40 | X_train, y_train, X_test, y_test = load_CIFAR10(cifar10_dir) 41 | 42 | # Subsample the data 43 | mask = range(num_training, num_training + num_validation) 44 | X_val = X_train[mask] 45 | y_val = y_train[mask] 46 | mask = range(num_training) 47 | X_train = X_train[mask] 48 | y_train = y_train[mask] 49 | mask = range(num_test) 50 | X_test = X_test[mask] 51 | y_test = y_test[mask] 52 | 53 | # Normalize the data: subtract the mean image 54 | mean_image = np.mean(X_train, axis=0) 55 | X_train -= mean_image 56 | X_val -= mean_image 57 | X_test -= mean_image 58 | 59 | # Transpose so that channels come first 60 | X_train = X_train.transpose(0, 3, 1, 2).copy() 61 | X_val = X_val.transpose(0, 3, 1, 2).copy() 62 | X_test = X_test.transpose(0, 3, 1, 2).copy() 63 | 64 | # Package data into a dictionary 65 | return { 66 | 'X_train': X_train, 'y_train': y_train, 67 | 'X_val': X_val, 'y_val': y_val, 68 | 'X_test': X_test, 'y_test': y_test, 69 | } 70 | 71 | 72 | def load_tiny_imagenet(path, dtype=np.float32): 73 | """ 74 | Load TinyImageNet. Each of TinyImageNet-100-A, TinyImageNet-100-B, and 75 | TinyImageNet-200 have the same directory structure, so this can be used 76 | to load any of them. 77 | 78 | Inputs: 79 | - path: String giving path to the directory to load. 80 | - dtype: numpy datatype used to load the data. 81 | 82 | Returns: A tuple of 83 | - class_names: A list where class_names[i] is a list of strings giving the 84 | WordNet names for class i in the loaded dataset. 85 | - X_train: (N_tr, 3, 64, 64) array of training images 86 | - y_train: (N_tr,) array of training labels 87 | - X_val: (N_val, 3, 64, 64) array of validation images 88 | - y_val: (N_val,) array of validation labels 89 | - X_test: (N_test, 3, 64, 64) array of testing images. 90 | - y_test: (N_test,) array of test labels; if test labels are not available 91 | (such as in student code) then y_test will be None. 92 | """ 93 | # First load wnids 94 | with open(os.path.join(path, 'wnids.txt'), 'r') as f: 95 | wnids = [x.strip() for x in f] 96 | 97 | # Map wnids to integer labels 98 | wnid_to_label = {wnid: i for i, wnid in enumerate(wnids)} 99 | 100 | # Use words.txt to get names for each class 101 | with open(os.path.join(path, 'words.txt'), 'r') as f: 102 | wnid_to_words = dict(line.split('\t') for line in f) 103 | for wnid, words in wnid_to_words.iteritems(): 104 | wnid_to_words[wnid] = [w.strip() for w in words.split(',')] 105 | class_names = [wnid_to_words[wnid] for wnid in wnids] 106 | 107 | # Next load training data. 
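# (Each wnid's training filenames are listed in train/<wnid>/<wnid>_boxes.txt,
# which is why the loop below parses the boxes file instead of listing the
# images directory.)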
108 | X_train = [] 109 | y_train = [] 110 | for i, wnid in enumerate(wnids): 111 | if (i + 1) % 20 == 0: 112 | print 'loading training data for synset %d / %d' % (i + 1, len(wnids)) 113 | # To figure out the filenames we need to open the boxes file 114 | boxes_file = os.path.join(path, 'train', wnid, '%s_boxes.txt' % wnid) 115 | with open(boxes_file, 'r') as f: 116 | filenames = [x.split('\t')[0] for x in f] 117 | num_images = len(filenames) 118 | 119 | X_train_block = np.zeros((num_images, 3, 64, 64), dtype=dtype) 120 | y_train_block = wnid_to_label[wnid] * np.ones(num_images, dtype=np.int64) 121 | for j, img_file in enumerate(filenames): 122 | img_file = os.path.join(path, 'train', wnid, 'images', img_file) 123 | img = imread(img_file) 124 | if img.ndim == 2: 125 | ## grayscale file 126 | img.shape = (64, 64, 1) 127 | X_train_block[j] = img.transpose(2, 0, 1) 128 | X_train.append(X_train_block) 129 | y_train.append(y_train_block) 130 | 131 | # We need to concatenate all training data 132 | X_train = np.concatenate(X_train, axis=0) 133 | y_train = np.concatenate(y_train, axis=0) 134 | 135 | # Next load validation data 136 | with open(os.path.join(path, 'val', 'val_annotations.txt'), 'r') as f: 137 | img_files = [] 138 | val_wnids = [] 139 | for line in f: 140 | img_file, wnid = line.split('\t')[:2] 141 | img_files.append(img_file) 142 | val_wnids.append(wnid) 143 | num_val = len(img_files) 144 | y_val = np.array([wnid_to_label[wnid] for wnid in val_wnids]) 145 | X_val = np.zeros((num_val, 3, 64, 64), dtype=dtype) 146 | for i, img_file in enumerate(img_files): 147 | img_file = os.path.join(path, 'val', 'images', img_file) 148 | img = imread(img_file) 149 | if img.ndim == 2: 150 | img.shape = (64, 64, 1) 151 | X_val[i] = img.transpose(2, 0, 1) 152 | 153 | # Next load test images 154 | # Students won't have test labels, so we need to iterate over files in the 155 | # images directory. 156 | img_files = os.listdir(os.path.join(path, 'test', 'images')) 157 | X_test = np.zeros((len(img_files), 3, 64, 64), dtype=dtype) 158 | for i, img_file in enumerate(img_files): 159 | img_file = os.path.join(path, 'test', 'images', img_file) 160 | img = imread(img_file) 161 | if img.ndim == 2: 162 | img.shape = (64, 64, 1) 163 | X_test[i] = img.transpose(2, 0, 1) 164 | 165 | y_test = None 166 | y_test_file = os.path.join(path, 'test', 'test_annotations.txt') 167 | if os.path.isfile(y_test_file): 168 | with open(y_test_file, 'r') as f: 169 | img_file_to_wnid = {} 170 | for line in f: 171 | line = line.split('\t') 172 | img_file_to_wnid[line[0]] = line[1] 173 | y_test = [wnid_to_label[img_file_to_wnid[img_file]] for img_file in img_files] 174 | y_test = np.array(y_test) 175 | 176 | return class_names, X_train, y_train, X_val, y_val, X_test, y_test 177 | 178 | 179 | def load_models(models_dir): 180 | """ 181 | Load saved models from disk. This will attempt to unpickle all files in a 182 | directory; any files that give errors on unpickling (such as README.txt) will 183 | be skipped. 184 | 185 | Inputs: 186 | - models_dir: String giving the path to a directory containing model files. 187 | Each model file is a pickled dictionary with a 'model' field. 188 | 189 | Returns: 190 | A dictionary mapping model file names to models. 
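    Example (a sketch; assumes a directory of pickled model dicts, such as the
    tiny-100-A-pretrained files referenced in cs231n/datasets/.gitignore):
      models = load_models('cs231n/datasets/tiny-100-A-pretrained')
      for name, model in models.iteritems():
        print 'loaded', name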
191 | """ 192 | models = {} 193 | for model_file in os.listdir(models_dir): 194 | with open(os.path.join(models_dir, model_file), 'rb') as f: 195 | try: 196 | models[model_file] = pickle.load(f)['model'] 197 | except pickle.UnpicklingError: 198 | continue 199 | return models 200 | -------------------------------------------------------------------------------- /assignment2/cs231n/datasets/.gitignore: -------------------------------------------------------------------------------- 1 | cifar-10-batches-py/* 2 | tiny-imagenet-100-A* 3 | tiny-imagenet-100-B* 4 | tiny-100-A-pretrained/* 5 | -------------------------------------------------------------------------------- /assignment2/cs231n/datasets/get_datasets.sh: -------------------------------------------------------------------------------- 1 | # Get CIFAR10 2 | wget http://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz 3 | tar -xzvf cifar-10-python.tar.gz 4 | rm cifar-10-python.tar.gz 5 | -------------------------------------------------------------------------------- /assignment2/cs231n/fast_layers.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | try: 3 | from cs231n.im2col_cython import col2im_cython, im2col_cython 4 | from cs231n.im2col_cython import col2im_6d_cython 5 | except ImportError: 6 | print 'run the following from the cs231n directory and try again:' 7 | print 'python setup.py build_ext --inplace' 8 | print 'You may also need to restart your iPython kernel' 9 | 10 | from cs231n.im2col import * 11 | 12 | 13 | def conv_forward_im2col(x, w, b, conv_param): 14 | """ 15 | A fast implementation of the forward pass for a convolutional layer 16 | based on im2col and col2im. 17 | """ 18 | N, C, H, W = x.shape 19 | num_filters, _, filter_height, filter_width = w.shape 20 | stride, pad = conv_param['stride'], conv_param['pad'] 21 | 22 | # Check dimensions 23 | assert (W + 2 * pad - filter_width) % stride == 0, 'width does not work' 24 | assert (H + 2 * pad - filter_height) % stride == 0, 'height does not work' 25 | 26 | # Create output 27 | out_height = (H + 2 * pad - filter_height) / stride + 1 28 | out_width = (W + 2 * pad - filter_width) / stride + 1 29 | out = np.zeros((N, num_filters, out_height, out_width), dtype=x.dtype) 30 | 31 | # x_cols = im2col_indices(x, w.shape[2], w.shape[3], pad, stride) 32 | x_cols = im2col_cython(x, w.shape[2], w.shape[3], pad, stride) 33 | res = w.reshape((w.shape[0], -1)).dot(x_cols) + b.reshape(-1, 1) 34 | 35 | out = res.reshape(w.shape[0], out.shape[2], out.shape[3], x.shape[0]) 36 | out = out.transpose(3, 0, 1, 2) 37 | 38 | cache = (x, w, b, conv_param, x_cols) 39 | return out, cache 40 | 41 | 42 | def conv_forward_strides(x, w, b, conv_param): 43 | N, C, H, W = x.shape 44 | F, _, HH, WW = w.shape 45 | stride, pad = conv_param['stride'], conv_param['pad'] 46 | 47 | # Check dimensions 48 | assert (W + 2 * pad - WW) % stride == 0, 'width does not work' 49 | assert (H + 2 * pad - HH) % stride == 0, 'height does not work' 50 | 51 | # Pad the input 52 | p = pad 53 | x_padded = np.pad(x, ((0, 0), (0, 0), (p, p), (p, p)), mode='constant') 54 | 55 | # Figure out output dimensions 56 | H += 2 * pad 57 | W += 2 * pad 58 | out_h = (H - HH) / stride + 1 59 | out_w = (W - WW) / stride + 1 60 | 61 | # Perform an im2col operation by picking clever strides 62 | shape = (C, HH, WW, N, out_h, out_w) 63 | strides = (H * W, W, 1, C * H * W, stride * W, stride) 64 | strides = x.itemsize * np.array(strides) 65 | x_stride = 
np.lib.stride_tricks.as_strided(x_padded, 66 | shape=shape, strides=strides) 67 | x_cols = np.ascontiguousarray(x_stride) 68 | x_cols.shape = (C * HH * WW, N * out_h * out_w) 69 | 70 | # Now all our convolutions are a big matrix multiply 71 | res = w.reshape(F, -1).dot(x_cols) + b.reshape(-1, 1) 72 | 73 | # Reshape the output 74 | res.shape = (F, N, out_h, out_w) 75 | out = res.transpose(1, 0, 2, 3) 76 | 77 | # Be nice and return a contiguous array 78 | # The old version of conv_forward_fast doesn't do this, so for a fair 79 | # comparison we won't either 80 | out = np.ascontiguousarray(out) 81 | 82 | cache = (x, w, b, conv_param, x_cols) 83 | return out, cache 84 | 85 | 86 | def conv_backward_strides(dout, cache): 87 | x, w, b, conv_param, x_cols = cache 88 | stride, pad = conv_param['stride'], conv_param['pad'] 89 | 90 | N, C, H, W = x.shape 91 | F, _, HH, WW = w.shape 92 | _, _, out_h, out_w = dout.shape 93 | 94 | db = np.sum(dout, axis=(0, 2, 3)) 95 | 96 | dout_reshaped = dout.transpose(1, 0, 2, 3).reshape(F, -1) 97 | dw = dout_reshaped.dot(x_cols.T).reshape(w.shape) 98 | 99 | dx_cols = w.reshape(F, -1).T.dot(dout_reshaped) 100 | dx_cols.shape = (C, HH, WW, N, out_h, out_w) 101 | dx = col2im_6d_cython(dx_cols, N, C, H, W, HH, WW, pad, stride) 102 | 103 | return dx, dw, db 104 | 105 | 106 | def conv_backward_im2col(dout, cache): 107 | """ 108 | A fast implementation of the backward pass for a convolutional layer 109 | based on im2col and col2im. 110 | """ 111 | x, w, b, conv_param, x_cols = cache 112 | stride, pad = conv_param['stride'], conv_param['pad'] 113 | 114 | db = np.sum(dout, axis=(0, 2, 3)) 115 | 116 | num_filters, _, filter_height, filter_width = w.shape 117 | dout_reshaped = dout.transpose(1, 2, 3, 0).reshape(num_filters, -1) 118 | dw = dout_reshaped.dot(x_cols.T).reshape(w.shape) 119 | 120 | dx_cols = w.reshape(num_filters, -1).T.dot(dout_reshaped) 121 | # dx = col2im_indices(dx_cols, x.shape, filter_height, filter_width, pad, stride) 122 | dx = col2im_cython(dx_cols, x.shape[0], x.shape[1], x.shape[2], x.shape[3], 123 | filter_height, filter_width, pad, stride) 124 | 125 | return dx, dw, db 126 | 127 | 128 | conv_forward_fast = conv_forward_strides 129 | conv_backward_fast = conv_backward_strides 130 | 131 | 132 | def max_pool_forward_fast(x, pool_param): 133 | """ 134 | A fast implementation of the forward pass for a max pooling layer. 135 | 136 | This chooses between the reshape method and the im2col method. If the pooling 137 | regions are square and tile the input image, then we can use the reshape 138 | method which is very fast. Otherwise we fall back on the im2col method, which 139 | is not much faster than the naive method. 140 | """ 141 | N, C, H, W = x.shape 142 | pool_height, pool_width = pool_param['pool_height'], pool_param['pool_width'] 143 | stride = pool_param['stride'] 144 | 145 | same_size = pool_height == pool_width == stride 146 | tiles = H % pool_height == 0 and W % pool_width == 0 147 | if same_size and tiles: 148 | out, reshape_cache = max_pool_forward_reshape(x, pool_param) 149 | cache = ('reshape', reshape_cache) 150 | else: 151 | out, im2col_cache = max_pool_forward_im2col(x, pool_param) 152 | cache = ('im2col', im2col_cache) 153 | return out, cache 154 | 155 | 156 | def max_pool_backward_fast(dout, cache): 157 | """ 158 | A fast implementation of the backward pass for a max pooling layer. 159 | 160 | This switches between the reshape method and the im2col method depending on 161 | which method was used to generate the cache.
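    Example (round trip through the fast pooling layers):
      pool_param = {'pool_height': 2, 'pool_width': 2, 'stride': 2}
      out, cache = max_pool_forward_fast(x, pool_param)  # 'reshape' path when 2x2 tiles divide H and W
      dx = max_pool_backward_fast(dout, cache)           # dout must have out's shape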
162 | """ 163 | method, real_cache = cache 164 | if method == 'reshape': 165 | return max_pool_backward_reshape(dout, real_cache) 166 | elif method == 'im2col': 167 | return max_pool_backward_im2col(dout, real_cache) 168 | else: 169 | raise ValueError('Unrecognized method "%s"' % method) 170 | 171 | 172 | def max_pool_forward_reshape(x, pool_param): 173 | """ 174 | A fast implementation of the forward pass for the max pooling layer that uses 175 | some clever reshaping. 176 | 177 | This can only be used for square pooling regions that tile the input. 178 | """ 179 | N, C, H, W = x.shape 180 | pool_height, pool_width = pool_param['pool_height'], pool_param['pool_width'] 181 | stride = pool_param['stride'] 182 | assert pool_height == pool_width == stride, 'Invalid pool params' 183 | assert H % pool_height == 0 184 | assert W % pool_width == 0 185 | x_reshaped = x.reshape(N, C, H / pool_height, pool_height, 186 | W / pool_width, pool_width) 187 | out = x_reshaped.max(axis=3).max(axis=4) 188 | 189 | cache = (x, x_reshaped, out) 190 | return out, cache 191 | 192 | 193 | def max_pool_backward_reshape(dout, cache): 194 | """ 195 | A fast implementation of the backward pass for the max pooling layer that 196 | uses some clever broadcasting and reshaping. 197 | 198 | This can only be used if the forward pass was computed using 199 | max_pool_forward_reshape. 200 | 201 | NOTE: If there are multiple argmaxes, this method will assign gradient to 202 | ALL argmax elements of the input rather than picking one. In this case the 203 | gradient will actually be incorrect. However this is unlikely to occur in 204 | practice, so it shouldn't matter much. One possible solution is to split the 205 | upstream gradient equally among all argmax elements; this should result in a 206 | valid subgradient. You can make this happen by uncommenting the line below; 207 | however this results in a significant performance penalty (about 40% slower) 208 | and is unlikely to matter in practice so we don't do it. 209 | """ 210 | x, x_reshaped, out = cache 211 | 212 | dx_reshaped = np.zeros_like(x_reshaped) 213 | out_newaxis = out[:, :, :, np.newaxis, :, np.newaxis] 214 | mask = (x_reshaped == out_newaxis) 215 | dout_newaxis = dout[:, :, :, np.newaxis, :, np.newaxis] 216 | dout_broadcast, _ = np.broadcast_arrays(dout_newaxis, dx_reshaped) 217 | dx_reshaped[mask] = dout_broadcast[mask] 218 | dx_reshaped /= np.sum(mask, axis=(3, 5), keepdims=True) 219 | dx = dx_reshaped.reshape(x.shape) 220 | 221 | return dx 222 | 223 | 224 | def max_pool_forward_im2col(x, pool_param): 225 | """ 226 | An implementation of the forward pass for max pooling based on im2col. 227 | 228 | This isn't much faster than the naive version, so it should be avoided if 229 | possible.
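    (For example, H = W = 32 with a 2x2 pool at stride 2 gives a 16x16 output,
    since (32 - 2) / 2 + 1 = 16.)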
230 | """ 231 | N, C, H, W = x.shape 232 | pool_height, pool_width = pool_param['pool_height'], pool_param['pool_width'] 233 | stride = pool_param['stride'] 234 | 235 | assert (H - pool_height) % stride == 0, 'Invalid height' 236 | assert (W - pool_width) % stride == 0, 'Invalid width' 237 | 238 | out_height = (H - pool_height) / stride + 1 239 | out_width = (W - pool_width) / stride + 1 240 | 241 | x_split = x.reshape(N * C, 1, H, W) 242 | x_cols = im2col(x_split, pool_height, pool_width, padding=0, stride=stride) 243 | x_cols_argmax = np.argmax(x_cols, axis=0) 244 | x_cols_max = x_cols[x_cols_argmax, np.arange(x_cols.shape[1])] 245 | out = x_cols_max.reshape(out_height, out_width, N, C).transpose(2, 3, 0, 1) 246 | 247 | cache = (x, x_cols, x_cols_argmax, pool_param) 248 | return out, cache 249 | 250 | 251 | def max_pool_backward_im2col(dout, cache): 252 | """ 253 | An implementation of the backward pass for max pooling based on im2col. 254 | 255 | This isn't much faster than the naive version, so it should be avoided if 256 | possible. 257 | """ 258 | x, x_cols, x_cols_argmax, pool_param = cache 259 | N, C, H, W = x.shape 260 | pool_height, pool_width = pool_param['pool_height'], pool_param['pool_width'] 261 | stride = pool_param['stride'] 262 | 263 | dout_reshaped = dout.transpose(2, 3, 0, 1).flatten() 264 | dx_cols = np.zeros_like(x_cols) 265 | dx_cols[x_cols_argmax, np.arange(dx_cols.shape[1])] = dout_reshaped 266 | dx = col2im_indices(dx_cols, (N * C, 1, H, W), pool_height, pool_width, 267 | padding=0, stride=stride) 268 | dx = dx.reshape(x.shape) 269 | 270 | return dx 271 | -------------------------------------------------------------------------------- /assignment2/cs231n/gradient_check.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | from random import randrange 3 | 4 | def eval_numerical_gradient(f, x, verbose=True, h=0.00001): 5 | """ 6 | a naive implementation of numerical gradient of f at x 7 | - f should be a function that takes a single argument 8 | - x is the point (numpy array) to evaluate the gradient at 9 | """ 10 | 11 | fx = f(x) # evaluate function value at original point 12 | grad = np.zeros_like(x) 13 | # iterate over all indexes in x 14 | it = np.nditer(x, flags=['multi_index'], op_flags=['readwrite']) 15 | while not it.finished: 16 | 17 | # evaluate function at x+h 18 | ix = it.multi_index 19 | oldval = x[ix] 20 | x[ix] = oldval + h # increment by h 21 | fxph = f(x) # evaluate f(x + h) 22 | x[ix] = oldval - h 23 | fxmh = f(x) # evaluate f(x - h) 24 | x[ix] = oldval # restore 25 | 26 | # compute the partial derivative with centered formula 27 | grad[ix] = (fxph - fxmh) / (2 * h) # the slope 28 | if verbose: 29 | print ix, grad[ix] 30 | it.iternext() # step to next dimension 31 | 32 | return grad 33 | 34 | 35 | def eval_numerical_gradient_array(f, x, df, h=1e-5): 36 | """ 37 | Evaluate a numeric gradient for a function that accepts a numpy 38 | array and returns a numpy array.
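    Example (a sketch of the usual layer gradient check; affine_forward is
    defined in cs231n/layers.py):
      dout = np.random.randn(*out.shape)
      dx_num = eval_numerical_gradient_array(
          lambda v: affine_forward(v, w, b)[0], x, dout)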
39 | """ 40 | grad = np.zeros_like(x) 41 | it = np.nditer(x, flags=['multi_index'], op_flags=['readwrite']) 42 | while not it.finished: 43 | ix = it.multi_index 44 | 45 | oldval = x[ix] 46 | x[ix] = oldval + h 47 | pos = f(x).copy() 48 | x[ix] = oldval - h 49 | neg = f(x).copy() 50 | x[ix] = oldval 51 | 52 | grad[ix] = np.sum((pos - neg) * df) / (2 * h) 53 | it.iternext() 54 | return grad 55 | 56 | 57 | def eval_numerical_gradient_blobs(f, inputs, output, h=1e-5): 58 | """ 59 | Compute numeric gradients for a function that operates on input 60 | and output blobs. 61 | 62 | We assume that f accepts several input blobs as arguments, followed by a blob 63 | into which outputs will be written. For example, f might be called like this: 64 | 65 | f(x, w, out) 66 | 67 | where x and w are input Blobs, and the result of f will be written to out. 68 | 69 | Inputs: 70 | - f: function 71 | - inputs: tuple of input blobs 72 | - output: output blob 73 | - h: step size 74 | """ 75 | numeric_diffs = [] 76 | for input_blob in inputs: 77 | diff = np.zeros_like(input_blob.diffs) 78 | it = np.nditer(input_blob.vals, flags=['multi_index'], 79 | op_flags=['readwrite']) 80 | while not it.finished: 81 | idx = it.multi_index 82 | orig = input_blob.vals[idx] 83 | 84 | input_blob.vals[idx] = orig + h 85 | f(*(inputs + (output,))) 86 | pos = np.copy(output.vals) 87 | input_blob.vals[idx] = orig - h 88 | f(*(inputs + (output,))) 89 | neg = np.copy(output.vals) 90 | input_blob.vals[idx] = orig 91 | 92 | diff[idx] = np.sum((pos - neg) * output.diffs) / (2.0 * h) 93 | 94 | it.iternext() 95 | numeric_diffs.append(diff) 96 | return numeric_diffs 97 | 98 | 99 | def eval_numerical_gradient_net(net, inputs, output, h=1e-5): 100 | return eval_numerical_gradient_blobs(lambda *args: net.forward(), 101 | inputs, output, h=h) 102 | 103 | 104 | def grad_check_sparse(f, x, analytic_grad, num_checks=10, h=1e-5): 105 | """ 106 | Sample a few random elements and only return the numerical gradient 107 | in those dimensions.
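    Example (a sketch; assumes a model whose loss(X, y) returns (loss, grads)
    and reads its parameters from self.params, so perturbing x in place works):
      loss, grads = model.loss(X, y)
      f = lambda _: model.loss(X, y)[0]
      grad_check_sparse(f, model.params['W1'], grads['W1'])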
108 | """ 109 | 110 | for i in xrange(num_checks): 111 | ix = tuple([randrange(m) for m in x.shape]) 112 | 113 | oldval = x[ix] 114 | x[ix] = oldval + h # increment by h 115 | fxph = f(x) # evaluate f(x + h) 116 | x[ix] = oldval - h # decrement by h 117 | fxmh = f(x) # evaluate f(x - h) 118 | x[ix] = oldval # reset 119 | 120 | grad_numerical = (fxph - fxmh) / (2 * h) 121 | grad_analytic = analytic_grad[ix] 122 | rel_error = abs(grad_numerical - grad_analytic) / (abs(grad_numerical) + abs(grad_analytic)) 123 | print 'numerical: %f analytic: %f, relative error: %e' % (grad_numerical, grad_analytic, rel_error) 124 | 125 | -------------------------------------------------------------------------------- /assignment2/cs231n/im2col.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | 3 | 4 | def get_im2col_indices(x_shape, field_height, field_width, padding=1, stride=1): 5 | # First figure out what the size of the output should be 6 | N, C, H, W = x_shape 7 | assert (H + 2 * padding - field_height) % stride == 0 8 | assert (W + 2 * padding - field_width) % stride == 0 9 | out_height = (H + 2 * padding - field_height) / stride + 1 10 | out_width = (W + 2 * padding - field_width) / stride + 1 11 | 12 | i0 = np.repeat(np.arange(field_height), field_width) 13 | i0 = np.tile(i0, C) 14 | i1 = stride * np.repeat(np.arange(out_height), out_width) 15 | j0 = np.tile(np.arange(field_width), field_height * C) 16 | j1 = stride * np.tile(np.arange(out_width), out_height) 17 | i = i0.reshape(-1, 1) + i1.reshape(1, -1) 18 | j = j0.reshape(-1, 1) + j1.reshape(1, -1) 19 | 20 | k = np.repeat(np.arange(C), field_height * field_width).reshape(-1, 1) 21 | 22 | return (k, i, j) 23 | 24 | 25 | def im2col_indices(x, field_height, field_width, padding=1, stride=1): 26 | """ An implementation of im2col based on some fancy indexing """ 27 | # Zero-pad the input 28 | p = padding 29 | x_padded = np.pad(x, ((0, 0), (0, 0), (p, p), (p, p)), mode='constant') 30 | 31 | k, i, j = get_im2col_indices(x.shape, field_height, field_width, padding, 32 | stride) 33 | 34 | cols = x_padded[:, k, i, j] 35 | C = x.shape[1] 36 | cols = cols.transpose(1, 2, 0).reshape(field_height * field_width * C, -1) 37 | return cols 38 | 39 | 40 | def col2im_indices(cols, x_shape, field_height=3, field_width=3, padding=1, 41 | stride=1): 42 | """ An implementation of col2im based on fancy indexing and np.add.at """ 43 | N, C, H, W = x_shape 44 | H_padded, W_padded = H + 2 * padding, W + 2 * padding 45 | x_padded = np.zeros((N, C, H_padded, W_padded), dtype=cols.dtype) 46 | k, i, j = get_im2col_indices(x_shape, field_height, field_width, padding, 47 | stride) 48 | cols_reshaped = cols.reshape(C * field_height * field_width, -1, N) 49 | cols_reshaped = cols_reshaped.transpose(2, 0, 1) 50 | np.add.at(x_padded, (slice(None), k, i, j), cols_reshaped) 51 | if padding == 0: 52 | return x_padded 53 | return x_padded[:, :, padding:-padding, padding:-padding] 54 | 55 | pass 56 | -------------------------------------------------------------------------------- /assignment2/cs231n/im2col_cython.pyx: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | cimport numpy as np 3 | cimport cython 4 | 5 | # DTYPE = np.float64 6 | # ctypedef np.float64_t DTYPE_t 7 | 8 | ctypedef fused DTYPE_t: 9 | np.float32_t 10 | np.float64_t 11 | 12 | def im2col_cython(np.ndarray[DTYPE_t, ndim=4] x, int field_height, 13 | int field_width, int padding, int stride): 14 | cdef int
N = x.shape[0] 15 | cdef int C = x.shape[1] 16 | cdef int H = x.shape[2] 17 | cdef int W = x.shape[3] 18 | 19 | cdef int HH = (H + 2 * padding - field_height) / stride + 1 20 | cdef int WW = (W + 2 * padding - field_width) / stride + 1 21 | 22 | cdef int p = padding 23 | cdef np.ndarray[DTYPE_t, ndim=4] x_padded = np.pad(x, 24 | ((0, 0), (0, 0), (p, p), (p, p)), mode='constant') 25 | 26 | cdef np.ndarray[DTYPE_t, ndim=2] cols = np.zeros( 27 | (C * field_height * field_width, N * HH * WW), 28 | dtype=x.dtype) 29 | 30 | # Moving the inner loop to a C function with no bounds checking works, but does 31 | # not seem to help performance in any measurable way. 32 | 33 | im2col_cython_inner(cols, x_padded, N, C, H, W, HH, WW, 34 | field_height, field_width, padding, stride) 35 | return cols 36 | 37 | 38 | @cython.boundscheck(False) 39 | cdef int im2col_cython_inner(np.ndarray[DTYPE_t, ndim=2] cols, 40 | np.ndarray[DTYPE_t, ndim=4] x_padded, 41 | int N, int C, int H, int W, int HH, int WW, 42 | int field_height, int field_width, int padding, int stride) except? -1: 43 | cdef int c, ii, jj, row, yy, xx, i, col 44 | 45 | for c in range(C): 46 | for yy in range(HH): 47 | for xx in range(WW): 48 | for ii in range(field_height): 49 | for jj in range(field_width): 50 | row = c * field_width * field_height + ii * field_height + jj 51 | for i in range(N): 52 | col = yy * WW * N + xx * N + i 53 | cols[row, col] = x_padded[i, c, stride * yy + ii, stride * xx + jj] 54 | 55 | 56 | 57 | def col2im_cython(np.ndarray[DTYPE_t, ndim=2] cols, int N, int C, int H, int W, 58 | int field_height, int field_width, int padding, int stride): 59 | cdef np.ndarray x = np.empty((N, C, H, W), dtype=cols.dtype) 60 | cdef int HH = (H + 2 * padding - field_height) / stride + 1 61 | cdef int WW = (W + 2 * padding - field_width) / stride + 1 62 | cdef np.ndarray[DTYPE_t, ndim=4] x_padded = np.zeros((N, C, H + 2 * padding, W + 2 * padding), 63 | dtype=cols.dtype) 64 | 65 | # Moving the inner loop to a C-function with no bounds checking improves 66 | # performance quite a bit for col2im. 67 | col2im_cython_inner(cols, x_padded, N, C, H, W, HH, WW, 68 | field_height, field_width, padding, stride) 69 | if padding > 0: 70 | return x_padded[:, :, padding:-padding, padding:-padding] 71 | return x_padded 72 | 73 | 74 | @cython.boundscheck(False) 75 | cdef int col2im_cython_inner(np.ndarray[DTYPE_t, ndim=2] cols, 76 | np.ndarray[DTYPE_t, ndim=4] x_padded, 77 | int N, int C, int H, int W, int HH, int WW, 78 | int field_height, int field_width, int padding, int stride) except? 
-1: 79 | cdef int c, ii, jj, row, yy, xx, i, col 80 | 81 | for c in range(C): 82 | for ii in range(field_height): 83 | for jj in range(field_width): 84 | row = c * field_width * field_height + ii * field_height + jj 85 | for yy in range(HH): 86 | for xx in range(WW): 87 | for i in range(N): 88 | col = yy * WW * N + xx * N + i 89 | x_padded[i, c, stride * yy + ii, stride * xx + jj] += cols[row, col] 90 | 91 | 92 | @cython.boundscheck(False) 93 | @cython.wraparound(False) 94 | cdef col2im_6d_cython_inner(np.ndarray[DTYPE_t, ndim=6] cols, 95 | np.ndarray[DTYPE_t, ndim=4] x_padded, 96 | int N, int C, int H, int W, int HH, int WW, 97 | int out_h, int out_w, int pad, int stride): 98 | 99 | cdef int c, hh, ww, n, h, w 100 | for n in range(N): 101 | for c in range(C): 102 | for hh in range(HH): 103 | for ww in range(WW): 104 | for h in range(out_h): 105 | for w in range(out_w): 106 | x_padded[n, c, stride * h + hh, stride * w + ww] += cols[c, hh, ww, n, h, w] 107 | 108 | 109 | def col2im_6d_cython(np.ndarray[DTYPE_t, ndim=6] cols, int N, int C, int H, int W, 110 | int HH, int WW, int pad, int stride): 111 | cdef np.ndarray x = np.empty((N, C, H, W), dtype=cols.dtype) 112 | cdef int out_h = (H + 2 * pad - HH) / stride + 1 113 | cdef int out_w = (W + 2 * pad - WW) / stride + 1 114 | cdef np.ndarray[DTYPE_t, ndim=4] x_padded = np.zeros((N, C, H + 2 * pad, W + 2 * pad), 115 | dtype=cols.dtype) 116 | 117 | col2im_6d_cython_inner(cols, x_padded, N, C, H, W, HH, WW, out_h, out_w, pad, stride) 118 | 119 | if pad > 0: 120 | return x_padded[:, :, pad:-pad, pad:-pad] 121 | return x_padded 122 | -------------------------------------------------------------------------------- /assignment2/cs231n/layer_utils.py: -------------------------------------------------------------------------------- 1 | from cs231n.layers import * 2 | from cs231n.fast_layers import * 3 | 4 | 5 | def affine_relu_forward(x, w, b): 6 | """ 7 | Convenience layer that performs an affine transform followed by a ReLU 8 | 9 | Inputs: 10 | - x: Input to the affine layer 11 | - w, b: Weights for the affine layer 12 | 13 | Returns a tuple of: 14 | - out: Output from the ReLU 15 | - cache: Object to give to the backward pass 16 | """ 17 | a, fc_cache = affine_forward(x, w, b) 18 | out, relu_cache = relu_forward(a) 19 | cache = (fc_cache, relu_cache) 20 | return out, cache 21 | 22 | 23 | def affine_relu_backward(dout, cache): 24 | """ 25 | Backward pass for the affine-relu convenience layer 26 | """ 27 | fc_cache, relu_cache = cache 28 | da = relu_backward(dout, relu_cache) 29 | dx, dw, db = affine_backward(da, fc_cache) 30 | return dx, dw, db 31 | 32 | 33 | def affine_bn_relu_forward(x, w, gamma, beta, bn_param): 34 | """ 35 | Convenience layer that performs an affine transform followed by batch 36 | normalization and a ReLU 37 | 38 | Inputs: 39 | - x: Input to the affine layer 40 | - w, gamma, beta, bn_param: Affine weights plus batchnorm scale, shift, and parameters (no bias is used; the batchnorm beta plays that role) 41 | 42 | Returns a tuple of: 43 | - out: Output from the ReLU 44 | - cache: Object to give to the backward pass 45 | """ 46 | out, fc_cache = affine_forward(x, w, 0) 47 | out, bn_cache = batchnorm_forward(out, gamma, beta, bn_param) 48 | out, relu_cache = relu_forward(out) 49 | cache = (fc_cache, bn_cache, relu_cache) 50 | return out, cache 51 | 52 | def affine_bn_relu_backward(dout, cache): 53 | """ 54 | Backward pass for the affine-bn-relu convenience layer 55 | """ 56 | fc_cache, bn_cache, relu_cache = cache 57 | dout = relu_backward(dout, relu_cache) 58 | dout, dgamma, dbeta = batchnorm_backward(dout, bn_cache) 59 | dx, dw, db = 
affine_backward(dout, fc_cache) 60 | return dx, dw, dgamma, dbeta 61 | 62 | def conv_relu_forward(x, w, b, conv_param): 63 | """ 64 | A convenience layer that performs a convolution followed by a ReLU. 65 | 66 | Inputs: 67 | - x: Input to the convolutional layer 68 | - w, b, conv_param: Weights and parameters for the convolutional layer 69 | 70 | Returns a tuple of: 71 | - out: Output from the ReLU 72 | - cache: Object to give to the backward pass 73 | """ 74 | a, conv_cache = conv_forward_fast(x, w, b, conv_param) 75 | out, relu_cache = relu_forward(a) 76 | cache = (conv_cache, relu_cache) 77 | return out, cache 78 | 79 | 80 | def conv_relu_backward(dout, cache): 81 | """ 82 | Backward pass for the conv-relu convenience layer. 83 | """ 84 | conv_cache, relu_cache = cache 85 | da = relu_backward(dout, relu_cache) 86 | dx, dw, db = conv_backward_fast(da, conv_cache) 87 | return dx, dw, db 88 | 89 | 90 | def conv_relu_pool_forward(x, w, b, conv_param, pool_param): 91 | """ 92 | Convenience layer that performs a convolution, a ReLU, and a pool. 93 | 94 | Inputs: 95 | - x: Input to the convolutional layer 96 | - w, b, conv_param: Weights and parameters for the convolutional layer 97 | - pool_param: Parameters for the pooling layer 98 | 99 | Returns a tuple of: 100 | - out: Output from the pooling layer 101 | - cache: Object to give to the backward pass 102 | """ 103 | a, conv_cache = conv_forward_fast(x, w, b, conv_param) 104 | s, relu_cache = relu_forward(a) 105 | out, pool_cache = max_pool_forward_fast(s, pool_param) 106 | cache = (conv_cache, relu_cache, pool_cache) 107 | return out, cache 108 | 109 | 110 | def conv_relu_pool_backward(dout, cache): 111 | """ 112 | Backward pass for the conv-relu-pool convenience layer 113 | """ 114 | conv_cache, relu_cache, pool_cache = cache 115 | ds = max_pool_backward_fast(dout, pool_cache) 116 | da = relu_backward(ds, relu_cache) 117 | dx, dw, db = conv_backward_fast(da, conv_cache) 118 | return dx, dw, db 119 | 120 | -------------------------------------------------------------------------------- /assignment2/cs231n/optim.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | 3 | """ 4 | This file implements various first-order update rules that are commonly used for 5 | training neural networks. Each update rule accepts current weights and the 6 | gradient of the loss with respect to those weights and produces the next set of 7 | weights. Each update rule has the same interface: 8 | 9 | def update(w, dw, config=None): 10 | 11 | Inputs: 12 | - w: A numpy array giving the current weights. 13 | - dw: A numpy array of the same shape as w giving the gradient of the 14 | loss with respect to w. 15 | - config: A dictionary containing hyperparameter values such as learning rate, 16 | momentum, etc. If the update rule requires caching values over many 17 | iterations, then config will also hold these cached values. 18 | 19 | Returns: 20 | - next_w: The next point after the update. 21 | - config: The config dictionary to be passed to the next iteration of the 22 | update rule. 23 | 24 | NOTE: For most update rules, the default learning rate will probably not perform 25 | well; however the default values of the other hyperparameters should work well 26 | for a variety of different problems. 27 | 28 | For efficiency, update rules may perform in-place updates, mutating w and 29 | setting next_w equal to w. 
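A typical training loop drives any of these rules through the same interface
(a minimal sketch; compute_gradient is a hypothetical stand-in for a real
loss/gradient computation such as model.loss used by solver.py):

  config = None
  for _ in xrange(num_iterations):
    dw = compute_gradient(w)        # gradient of the loss at the current w
    w, config = sgd(w, dw, config)  # config carries cached state forward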
30 | """ 31 | 32 | 33 | def sgd(w, dw, config=None): 34 | """ 35 | Performs vanilla stochastic gradient descent. 36 | 37 | config format: 38 | - learning_rate: Scalar learning rate. 39 | """ 40 | if config is None: config = {} 41 | config.setdefault('learning_rate', 1e-2) 42 | 43 | w -= config['learning_rate'] * dw 44 | return w, config 45 | 46 | 47 | def sgd_momentum(w, dw, config=None): 48 | """ 49 | Performs stochastic gradient descent with momentum. 50 | 51 | config format: 52 | - learning_rate: Scalar learning rate. 53 | - momentum: Scalar between 0 and 1 giving the momentum value. 54 | Setting momentum = 0 reduces to sgd. 55 | - velocity: A numpy array of the same shape as w and dw used to store a moving 56 | average of the gradients. 57 | """ 58 | if config is None: config = {} 59 | config.setdefault('learning_rate', 1e-2) 60 | config.setdefault('momentum', 0.9) 61 | v = config.get('velocity', np.zeros_like(w)) 62 | 63 | next_w = None 64 | ############################################################################# 65 | # TODO: Implement the momentum update formula. Store the updated value in # 66 | # the next_w variable. You should also use and update the velocity v. # 67 | ############################################################################# 68 | # Note: the gradient term must be subtracted (descent); adding it instead would climb the loss. 69 | v = v*config['momentum'] - config['learning_rate']*dw 70 | next_w = w+v 71 | ############################################################################# 72 | # END OF YOUR CODE # 73 | ############################################################################# 74 | config['velocity'] = v 75 | 76 | return next_w, config 77 | 78 | 79 | 80 | def rmsprop(x, dx, config=None): 81 | """ 82 | Uses the RMSProp update rule, which uses a moving average of squared gradient 83 | values to set adaptive per-parameter learning rates. 84 | 85 | config format: 86 | - learning_rate: Scalar learning rate. 87 | - decay_rate: Scalar between 0 and 1 giving the decay rate for the squared 88 | gradient cache. 89 | - epsilon: Small scalar used for smoothing to avoid dividing by zero. 90 | - cache: Moving average of second moments of gradients. 91 | """ 92 | if config is None: config = {} 93 | config.setdefault('learning_rate', 1e-2) 94 | config.setdefault('decay_rate', 0.99) 95 | config.setdefault('epsilon', 1e-8) 96 | config.setdefault('cache', np.zeros_like(x)) 97 | 98 | next_x = None 99 | ############################################################################# 100 | # TODO: Implement the RMSprop update formula, storing the next value of x # 101 | # in the next_x variable. Don't forget to update cache value stored in # 102 | # config['cache']. # 103 | ############################################################################# 104 | config['cache'] = config['decay_rate']*config['cache'] + \ 105 | (1-config['decay_rate'])*np.square(dx) 106 | next_x = x-config['learning_rate']*dx/(np.sqrt(config['cache'])+config['epsilon']) 107 | ############################################################################# 108 | # END OF YOUR CODE # 109 | ############################################################################# 110 | 111 | return next_x, config 112 | 113 | 114 | def adam(x, dx, config=None): 115 | """ 116 | Uses the Adam update rule, which incorporates moving averages of both the 117 | gradient and its square and a bias correction term. 118 | 119 | config format: 120 | - learning_rate: Scalar learning rate. 
121 | - beta1: Decay rate for moving average of first moment of gradient. 122 | - beta2: Decay rate for moving average of second moment of gradient. 123 | - epsilon: Small scalar used for smoothing to avoid dividing by zero. 124 | - m: Moving average of gradient. 125 | - v: Moving average of squared gradient. 126 | - t: Iteration number. 127 | """ 128 | if config is None: config = {} 129 | config.setdefault('learning_rate', 1e-3) 130 | config.setdefault('beta1', 0.9) 131 | config.setdefault('beta2', 0.999) 132 | config.setdefault('epsilon', 1e-8) 133 | config.setdefault('m', np.zeros_like(x)) 134 | config.setdefault('v', np.zeros_like(x)) 135 | config.setdefault('t', 0) 136 | 137 | next_x = None 138 | ############################################################################# 139 | # TODO: Implement the Adam update formula, storing the next value of x in # 140 | # the next_x variable. Don't forget to update the m, v, and t variables # 141 | # stored in config. # 142 | ############################################################################# 143 | config['t'] += 1 144 | config['m'] = config['beta1']*config['m']+(1-config['beta1'])*dx 145 | config['v'] = config['beta2']*config['v']+(1-config['beta2'])*np.square(dx) 146 | # Note: bias-correct into fresh local variables; dividing config['m'] and 147 | # config['v'] in place would corrupt the running averages on later 148 | # iterations (see the bias-correction step in the Adam paper). 149 | m_hat = config['m'] / (1-np.power(config['beta1'], config['t'])) 150 | v_hat = config['v'] / (1-np.power(config['beta2'], config['t'])) 151 | next_x = x - config['learning_rate']*m_hat/(np.sqrt(v_hat)+config['epsilon']) 152 | ############################################################################# 153 | # END OF YOUR CODE # 154 | ############################################################################# 155 | 156 | return next_x, config 157 | 158 | 159 | 160 | 161 | 162 | -------------------------------------------------------------------------------- /assignment2/cs231n/setup.py: -------------------------------------------------------------------------------- 1 | from distutils.core import setup 2 | from distutils.extension import Extension 3 | from Cython.Build import cythonize 4 | import numpy 5 | 6 | extensions = [ 7 | Extension('im2col_cython', ['im2col_cython.pyx'], 8 | include_dirs = [numpy.get_include()] 9 | ), 10 | ] 11 | 12 | setup( 13 | ext_modules = cythonize(extensions), 14 | ) 15 | -------------------------------------------------------------------------------- /assignment2/cs231n/solver.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | 3 | from cs231n import optim 4 | 5 | 6 | class Solver(object): 7 | """ 8 | A Solver encapsulates all the logic necessary for training classification 9 | models. The Solver performs stochastic gradient descent using different 10 | update rules defined in optim.py. 11 | 12 | The solver accepts both training and validation data and labels so it can 13 | periodically check classification accuracy on both training and validation 14 | data to watch out for overfitting. 15 | 16 | To train a model, you will first construct a Solver instance, passing the 17 | model, dataset, and various options (learning rate, batch size, etc) to the 18 | constructor. You will then call the train() method to run the optimization 19 | procedure and train the model. 
20 | 21 | After the train() method returns, model.params will contain the parameters 22 | that performed best on the validation set over the course of training. 23 | In addition, the instance variable solver.loss_history will contain a list 24 | of all losses encountered during training and the instance variables 25 | solver.train_acc_history and solver.val_acc_history will be lists containing 26 | the accuracies of the model on the training and validation set at each epoch. 27 | 28 | Example usage might look something like this: 29 | 30 | data = { 31 | 'X_train': # training data 32 | 'y_train': # training labels 33 | 'X_val': # validation data 34 | 'y_val': # validation labels 35 | } 36 | model = MyAwesomeModel(hidden_size=100, reg=10) 37 | solver = Solver(model, data, 38 | update_rule='sgd', 39 | optim_config={ 40 | 'learning_rate': 1e-3, 41 | }, 42 | lr_decay=0.95, 43 | num_epochs=10, batch_size=100, 44 | print_every=100) 45 | solver.train() 46 | 47 | 48 | A Solver works on a model object that must conform to the following API: 49 | 50 | - model.params must be a dictionary mapping string parameter names to numpy 51 | arrays containing parameter values. 52 | 53 | - model.loss(X, y) must be a function that computes training-time loss and 54 | gradients, and test-time classification scores, with the following inputs 55 | and outputs: 56 | 57 | Inputs: 58 | - X: Array giving a minibatch of input data of shape (N, d_1, ..., d_k) 59 | - y: Array of labels, of shape (N,) giving labels for X where y[i] is the 60 | label for X[i]. 61 | 62 | Returns: 63 | If y is None, run a test-time forward pass and return: 64 | - scores: Array of shape (N, C) giving classification scores for X where 65 | scores[i, c] gives the score of class c for X[i]. 66 | 67 | If y is not None, run a training time forward and backward pass and return 68 | a tuple of: 69 | - loss: Scalar giving the loss 70 | - grads: Dictionary with the same keys as self.params mapping parameter 71 | names to gradients of the loss with respect to those parameters. 72 | """ 73 | 74 | def __init__(self, model, data, **kwargs): 75 | """ 76 | Construct a new Solver instance. 77 | 78 | Required arguments: 79 | - model: A model object conforming to the API described above 80 | - data: A dictionary of training and validation data with the following: 81 | 'X_train': Array of shape (N_train, d_1, ..., d_k) giving training images 82 | 'X_val': Array of shape (N_val, d_1, ..., d_k) giving validation images 83 | 'y_train': Array of shape (N_train,) giving labels for training images 84 | 'y_val': Array of shape (N_val,) giving labels for validation images 85 | 86 | Optional arguments: 87 | - update_rule: A string giving the name of an update rule in optim.py. 88 | Default is 'sgd'. 89 | - optim_config: A dictionary containing hyperparameters that will be 90 | passed to the chosen update rule. Each update rule requires different 91 | hyperparameters (see optim.py) but all update rules require a 92 | 'learning_rate' parameter so that should always be present. 93 | - lr_decay: A scalar for learning rate decay; after each epoch the learning 94 | rate is multiplied by this value. 95 | - batch_size: Size of minibatches used to compute loss and gradient during 96 | training. 97 | - num_epochs: The number of epochs to run for during training. 98 | - print_every: Integer; training losses will be printed every print_every 99 | iterations. 100 | - verbose: Boolean; if set to false then no output will be printed during 101 | training. 
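Raises:
- ValueError: If an unrecognized keyword argument is passed in, or if
  update_rule does not name a function defined in optim.py (both checks
  happen in the constructor body below).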
102 | """ 103 | self.model = model 104 | self.X_train = data['X_train'] 105 | self.y_train = data['y_train'] 106 | self.X_val = data['X_val'] 107 | self.y_val = data['y_val'] 108 | 109 | # Unpack keyword arguments 110 | self.update_rule = kwargs.pop('update_rule', 'sgd') 111 | self.optim_config = kwargs.pop('optim_config', {}) 112 | self.lr_decay = kwargs.pop('lr_decay', 1.0) 113 | self.batch_size = kwargs.pop('batch_size', 100) 114 | self.num_epochs = kwargs.pop('num_epochs', 10) 115 | 116 | self.print_every = kwargs.pop('print_every', 10) 117 | self.verbose = kwargs.pop('verbose', True) 118 | 119 | # Throw an error if there are extra keyword arguments 120 | if len(kwargs) > 0: 121 | extra = ', '.join('"%s"' % k for k in kwargs.keys()) 122 | raise ValueError('Unrecognized arguments %s' % extra) 123 | 124 | # Make sure the update rule exists, then replace the string 125 | # name with the actual function 126 | if not hasattr(optim, self.update_rule): 127 | raise ValueError('Invalid update_rule "%s"' % self.update_rule) 128 | self.update_rule = getattr(optim, self.update_rule) 129 | 130 | self._reset() 131 | 132 | 133 | def _reset(self): 134 | """ 135 | Set up some book-keeping variables for optimization. Don't call this 136 | manually. 137 | """ 138 | # Set up some variables for book-keeping 139 | self.epoch = 0 140 | self.best_val_acc = 0 141 | self.best_params = {} 142 | self.loss_history = [] 143 | self.train_acc_history = [] 144 | self.val_acc_history = [] 145 | 146 | # Make a deep copy of the optim_config for each parameter 147 | self.optim_configs = {} 148 | for p in self.model.params: 149 | d = {k: v for k, v in self.optim_config.iteritems()} 150 | self.optim_configs[p] = d 151 | 152 | 153 | def _step(self): 154 | """ 155 | Make a single gradient update. This is called by train() and should not 156 | be called manually. 157 | """ 158 | # Make a minibatch of training data 159 | num_train = self.X_train.shape[0] 160 | batch_mask = np.random.choice(num_train, self.batch_size) 161 | X_batch = self.X_train[batch_mask] 162 | y_batch = self.y_train[batch_mask] 163 | 164 | # Compute loss and gradient 165 | loss, grads = self.model.loss(X_batch, y_batch) 166 | self.loss_history.append(loss) 167 | 168 | # Perform a parameter update 169 | for p, w in self.model.params.iteritems(): 170 | dw = grads[p] 171 | config = self.optim_configs[p] 172 | next_w, next_config = self.update_rule(w, dw, config) 173 | self.model.params[p] = next_w 174 | self.optim_configs[p] = next_config 175 | 176 | 177 | def check_accuracy(self, X, y, num_samples=None, batch_size=100): 178 | """ 179 | Check accuracy of the model on the provided data. 180 | 181 | Inputs: 182 | - X: Array of data, of shape (N, d_1, ..., d_k) 183 | - y: Array of labels, of shape (N,) 184 | - num_samples: If not None, subsample the data and only test the model 185 | on num_samples datapoints. 186 | - batch_size: Split X and y into batches of this size to avoid using too 187 | much memory. 188 | 189 | Returns: 190 | - acc: Scalar giving the fraction of instances that were correctly 191 | classified by the model. 
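For example, train() below calls this method as
check_accuracy(self.X_train, self.y_train, num_samples=1000) so that the
per-epoch training-accuracy estimate stays cheap even on large training sets.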
192 | """ 193 | 194 | # Maybe subsample the data 195 | N = X.shape[0] 196 | if num_samples is not None and N > num_samples: 197 | mask = np.random.choice(N, num_samples) 198 | N = num_samples 199 | X = X[mask] 200 | y = y[mask] 201 | 202 | # Compute predictions in batches 203 | num_batches = N / batch_size 204 | if N % batch_size != 0: 205 | num_batches += 1 206 | y_pred = [] 207 | for i in xrange(num_batches): 208 | start = i * batch_size 209 | end = (i + 1) * batch_size 210 | scores = self.model.loss(X[start:end]) 211 | y_pred.append(np.argmax(scores, axis=1)) 212 | y_pred = np.hstack(y_pred) 213 | acc = np.mean(y_pred == y) 214 | 215 | return acc 216 | 217 | 218 | def train(self): 219 | """ 220 | Run optimization to train the model. 221 | """ 222 | num_train = self.X_train.shape[0] 223 | iterations_per_epoch = max(num_train / self.batch_size, 1) 224 | num_iterations = self.num_epochs * iterations_per_epoch 225 | print 'num_train=', num_train 226 | print 'iterations_per_epoch=', iterations_per_epoch 227 | print 'num_iterations=', num_iterations 228 | 229 | for t in xrange(num_iterations): 230 | self._step() 231 | 232 | # Maybe print training loss 233 | if self.verbose and t % self.print_every == 0: 234 | print '(Iteration %d / %d) loss: %f' % ( 235 | t + 1, num_iterations, self.loss_history[-1]) 236 | 237 | # At the end of every epoch, increment the epoch counter and decay the 238 | # learning rate. 239 | epoch_end = (t + 1) % iterations_per_epoch == 0 240 | if epoch_end: 241 | self.epoch += 1 242 | for k in self.optim_configs: 243 | self.optim_configs[k]['learning_rate'] *= self.lr_decay 244 | 245 | # Check train and val accuracy on the first iteration, the last 246 | # iteration, and at the end of each epoch. 247 | first_it = (t == 0) 248 | last_it = (t == num_iterations - 1) 249 | if first_it or last_it or epoch_end: 250 | train_acc = self.check_accuracy(self.X_train, self.y_train, 251 | num_samples=1000) 252 | val_acc = self.check_accuracy(self.X_val, self.y_val) 253 | self.train_acc_history.append(train_acc) 254 | self.val_acc_history.append(val_acc) 255 | 256 | if self.verbose: 257 | print '(Epoch %d / %d) train acc: %f; val_acc: %f' % ( 258 | self.epoch, self.num_epochs, train_acc, val_acc) 259 | 260 | # Keep track of the best model 261 | if val_acc > self.best_val_acc: 262 | self.best_val_acc = val_acc 263 | self.best_params = {} 264 | for k, v in self.model.params.iteritems(): 265 | self.best_params[k] = v.copy() 266 | 267 | # At the end of training swap the best params into the model 268 | self.model.params = self.best_params 269 | 270 | -------------------------------------------------------------------------------- /assignment2/cs231n/vis_utils.py: -------------------------------------------------------------------------------- 1 | from math import sqrt, ceil 2 | import numpy as np 3 | 4 | def visualize_grid(Xs, ubound=255.0, padding=1): 5 | """ 6 | Reshape a 4D tensor of image data to a grid for easy visualization. 
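Images are packed into a roughly square grid and each image is rescaled to
[0, ubound] independently. A sketch of typical use (assuming matplotlib is
imported as plt and Xs holds image data as described below):

    plt.imshow(visualize_grid(Xs, ubound=255.0, padding=3).astype('uint8'))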
7 | 8 | Inputs: 9 | - Xs: Data of shape (N, H, W, C) 10 | - ubound: Output grid will have values scaled to the range [0, ubound] 11 | - padding: The number of blank pixels between elements of the grid 12 | """ 13 | (N, H, W, C) = Xs.shape 14 | grid_size = int(ceil(sqrt(N))) 15 | grid_height = H * grid_size + padding * (grid_size - 1) 16 | grid_width = W * grid_size + padding * (grid_size - 1) 17 | grid = np.zeros((grid_height, grid_width, C)) 18 | next_idx = 0 19 | y0, y1 = 0, H 20 | for y in xrange(grid_size): 21 | x0, x1 = 0, W 22 | for x in xrange(grid_size): 23 | if next_idx < N: 24 | img = Xs[next_idx] 25 | low, high = np.min(img), np.max(img) 26 | grid[y0:y1, x0:x1] = ubound * (img - low) / (high - low) 27 | # grid[y0:y1, x0:x1] = Xs[next_idx] 28 | next_idx += 1 29 | x0 += W + padding 30 | x1 += W + padding 31 | y0 += H + padding 32 | y1 += H + padding 33 | # grid_max = np.max(grid) 34 | # grid_min = np.min(grid) 35 | # grid = ubound * (grid - grid_min) / (grid_max - grid_min) 36 | return grid 37 | 38 | def vis_grid(Xs): 39 | """ visualize a grid of images """ 40 | (N, H, W, C) = Xs.shape 41 | A = int(ceil(sqrt(N))) 42 | G = np.ones((A*H+A, A*W+A, C), Xs.dtype) 43 | G *= np.min(Xs) 44 | n = 0 45 | for y in range(A): 46 | for x in range(A): 47 | if n < N: 48 | G[y*H+y:(y+1)*H+y, x*W+x:(x+1)*W+x, :] = Xs[n,:,:,:] 49 | n += 1 50 | # normalize to [0,1] 51 | maxg = G.max() 52 | ming = G.min() 53 | G = (G - ming)/(maxg-ming) 54 | return G 55 | 56 | def vis_nn(rows): 57 | """ visualize array of arrays of images """ 58 | N = len(rows) 59 | D = len(rows[0]) 60 | H,W,C = rows[0][0].shape 61 | Xs = rows[0][0] 62 | G = np.ones((N*H+N, D*W+D, C), Xs.dtype) 63 | for y in range(N): 64 | for x in range(D): 65 | G[y*H+y:(y+1)*H+y, x*W+x:(x+1)*W+x, :] = rows[y][x] 66 | # normalize to [0,1] 67 | maxg = G.max() 68 | ming = G.min() 69 | G = (G - ming)/(maxg-ming) 70 | return G 71 | 72 | 73 | 74 | -------------------------------------------------------------------------------- /assignment2/frameworkpython: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | # what real Python executable to use 4 | PYVER=2.7 5 | PATHTOPYTHON=/usr/local/bin/ 6 | PYTHON=${PATHTOPYTHON}python${PYVER} 7 | 8 | # find the root of the virtualenv, it should be the parent of the dir this script is in 9 | ENV=`$PYTHON -c "import os; print os.path.abspath(os.path.join(os.path.dirname(\"$0\"), '..'))"` 10 | 11 | # now run Python with the virtualenv set as Python's HOME 12 | export PYTHONHOME=$ENV 13 | exec $PYTHON "$@" 14 | -------------------------------------------------------------------------------- /assignment2/kitten.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ShibiHe/Stanford-CS231n-assignments/4590c50075a2e4aa291777b4db45d04b877dcdf6/assignment2/kitten.jpg -------------------------------------------------------------------------------- /assignment2/puppy.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ShibiHe/Stanford-CS231n-assignments/4590c50075a2e4aa291777b4db45d04b877dcdf6/assignment2/puppy.jpg -------------------------------------------------------------------------------- /assignment2/requirements.txt: -------------------------------------------------------------------------------- 1 | Cython==0.23.4 2 | Jinja2==2.8 3 | MarkupSafe==0.23 4 | Pillow==3.0.0 5 | Pygments==2.0.2 6 | appnope==0.1.0 7 | argparse==1.2.1 8 | 
backports-abc==0.4 9 | backports.ssl-match-hostname==3.5.0.1 10 | certifi==2015.11.20.1 11 | cycler==0.9.0 12 | decorator==4.0.6 13 | functools32==3.2.3-2 14 | gnureadline==6.3.3 15 | ipykernel==4.2.2 16 | ipython==4.0.1 17 | ipython-genutils==0.1.0 18 | ipywidgets==4.1.1 19 | jsonschema==2.5.1 20 | jupyter==1.0.0 21 | jupyter-client==4.1.1 22 | jupyter-console==4.0.3 23 | jupyter-core==4.0.6 24 | matplotlib==1.5.0 25 | mistune==0.7.1 26 | nbconvert==4.1.0 27 | nbformat==4.0.1 28 | notebook==4.0.6 29 | numpy==1.10.4 30 | path.py==8.1.2 31 | pexpect==4.0.1 32 | pickleshare==0.5 33 | ptyprocess==0.5 34 | pyparsing==2.0.7 35 | python-dateutil==2.4.2 36 | pytz==2015.7 37 | pyzmq==15.1.0 38 | qtconsole==4.1.1 39 | scipy==0.16.1 40 | simplegeneric==0.8.1 41 | singledispatch==3.4.0.3 42 | six==1.10.0 43 | terminado==0.5 44 | tornado==4.3 45 | traitlets==4.0.0 46 | wsgiref==0.1.2 47 | -------------------------------------------------------------------------------- /assignment2/start_ipython_osx.sh: -------------------------------------------------------------------------------- 1 | # Assume the virtualenv is called venv 2 | 3 | cp frameworkpython venv/bin 4 | venv/bin/frameworkpython -m IPython notebook 5 | -------------------------------------------------------------------------------- /assignment3/.gitignore: -------------------------------------------------------------------------------- 1 | *.swp 2 | *.pyc 3 | .env/* 4 | -------------------------------------------------------------------------------- /assignment3/.ipynb_checkpoints/ImageGeneration-checkpoint.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Image Generation\n", 8 | "In this notebook we will continue our exploration of image gradients using the deep model that was pretrained on TinyImageNet. We will explore various ways of using these image gradients to generate images. We will implement class visualizations, feature inversion, and DeepDream." 9 | ] 10 | }, 11 | { 12 | "cell_type": "code", 13 | "execution_count": 1, 14 | "metadata": { 15 | "collapsed": false 16 | }, 17 | "outputs": [], 18 | "source": [ 19 | "# As usual, a bit of setup\n", 20 | "\n", 21 | "import time, os, json\n", 22 | "import numpy as np\n", 23 | "from scipy.misc import imread, imresize\n", 24 | "import matplotlib.pyplot as plt\n", 25 | "\n", 26 | "from cs231n.classifiers.pretrained_cnn import PretrainedCNN\n", 27 | "from cs231n.data_utils import load_tiny_imagenet\n", 28 | "from cs231n.image_utils import blur_image, deprocess_image, preprocess_image\n", 29 | "\n", 30 | "%matplotlib inline\n", 31 | "plt.rcParams['figure.figsize'] = (10.0, 8.0) # set default size of plots\n", 32 | "plt.rcParams['image.interpolation'] = 'nearest'\n", 33 | "plt.rcParams['image.cmap'] = 'gray'\n", 34 | "\n", 35 | "# for auto-reloading external modules\n", 36 | "# see http://stackoverflow.com/questions/1907993/autoreload-of-modules-in-ipython\n", 37 | "%load_ext autoreload\n", 38 | "%autoreload 2" 39 | ] 40 | }, 41 | { 42 | "cell_type": "markdown", 43 | "metadata": {}, 44 | "source": [ 45 | "# TinyImageNet and pretrained model\n", 46 | "As in the previous notebook, load the TinyImageNet dataset and the pretrained model."
47 | ] 48 | }, 49 | { 50 | "cell_type": "code", 51 | "execution_count": 2, 52 | "metadata": { 53 | "collapsed": false 54 | }, 55 | "outputs": [ 56 | { 57 | "name": "stdout", 58 | "output_type": "stream", 59 | "text": [ 60 | "loading training data for synset 20 / 100\n", 61 | "loading training data for synset 40 / 100\n", 62 | "loading training data for synset 60 / 100\n", 63 | "loading training data for synset 80 / 100\n", 64 | "loading training data for synset 100 / 100\n" 65 | ] 66 | } 67 | ], 68 | "source": [ 69 | "data = load_tiny_imagenet('cs231n/datasets/tiny-imagenet-100-A', subtract_mean=True)\n", 70 | "model = PretrainedCNN(h5_file='cs231n/datasets/pretrained_model.h5')" 71 | ] 72 | }, 73 | { 74 | "cell_type": "markdown", 75 | "metadata": {}, 76 | "source": [ 77 | "# Class visualization\n", 78 | "By starting with a random noise image and performing gradient ascent on a target class, we can generate an image that the network will recognize as the target class. This idea was first presented in [1]; [2] extended this idea by suggesting several regularization techniques that can improve the quality of the generated image.\n", 79 | "\n", 80 | "Concretely, let $I$ be an image and let $y$ be a target class. Let $s_y(I)$ be the score that a convolutional network assigns to the image $I$ for class $y$; note that these are raw unnormalized scores, not class probabilities. We wish to generate an image $I^*$ that achieves a high score for the class $y$ by solving the problem\n", 81 | "\n", 82 | "$$\n", 83 | "I^* = \\arg\\max_I s_y(I) - R(I)\n", 84 | "$$\n", 85 | "\n", 86 | "where $R$ is a (possibly implicit) regularizer. We can solve this optimization problem using gradient ascent, computing gradients with respect to the generated image. We will use (explicit) L2 regularization of the form\n", 87 | "\n", 88 | "$$\n", 89 | "R(I) = \\lambda \\|I\\|_2^2\n", 90 | "$$\n", 91 | "\n", 92 | "and implicit regularization, as suggested by [2], by periodically blurring the generated image.\n", 93 | "\n", 94 | "In the cell below, complete the implementation of the `create_class_visualization` function.\n", 95 | "\n", 96 | "[1] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. \
\"Deep Inside Convolutional Networks: Visualising\n", 97 | "Image Classification Models and Saliency Maps\", ICLR Workshop 2014.\n", 98 | "\n", 99 | "[2] Yosinski et al, \"Understanding Neural Networks Through Deep Visualization\", ICML 2015 Deep Learning Workshop" 100 | ] 101 | }, 102 | { 103 | "cell_type": "code", 104 | "execution_count": null, 105 | "metadata": { 106 | "collapsed": true 107 | }, 108 | "outputs": [], 109 | "source": [ 110 | "def create_class_visualization(target_y, model, **kwargs):\n", 111 | " \"\"\"\n", 112 | " Perform optimization over the image to generate class visualizations.\n", 113 | " \n", 114 | " Inputs:\n", 115 | " - target_y: Integer in the range [0, 100) giving the target class\n", 116 | " - model: A PretrainedCNN that will be used for generation\n", 117 | " \n", 118 | " Keyword arguments:\n", 119 | " - learning_rate: Floating point number giving the learning rate\n", 120 | " - blur_every: An integer; how often to blur the image as a regularizer\n", 121 | " - l2_reg: Floating point number giving L2 regularization strength on the image;\n", 122 | " this is lambda in the equation above.\n", 123 | " - max_jitter: How much random jitter to add to the image as regularization\n", 124 | " - num_iterations: How many iterations to run for\n", 125 | " - show_every: How often to show the image\n", 126 | " \"\"\"\n", 127 | " \n", 128 | " learning_rate = kwargs.pop('learning_rate', 10000)\n", 129 | " blur_every = kwargs.pop('blur_every', 1)\n", 130 | " l2_reg = kwargs.pop('l2_reg', 1e-6)\n", 131 | " max_jitter = kwargs.pop('max_jitter', 4)\n", 132 | " num_iterations = kwargs.pop('num_iterations', 100)\n", 133 | " show_every = kwargs.pop('show_every', 25)\n", 134 | " \n", 135 | " X = np.random.randn(1, 3, 64, 64)\n", 136 | " for t in xrange(num_iterations):\n", 137 | " # As a regularizer, add random jitter to the image\n", 138 | " ox, oy = np.random.randint(-max_jitter, max_jitter+1, 2)\n", 139 | " X = np.roll(np.roll(X, ox, -1), oy, -2)\n", 140 | "\n", 141 | " dX = None\n", 142 | " ############################################################################\n", 143 | " # TODO: Compute the image gradient dX of the image with respect to the #\n", 144 | " # target_y class score. This should be similar to the fooling images. Also #\n", 145 | " # add L2 regularization to dX and update the image X using the image #\n", 146 | " # gradient and the learning rate. 
#\n", 147 | "    ############################################################################\n", 148 | "    y = np.array([target_y])\n", 149 | "    v = 0\n", 150 | "    mu = 0.95\n", 151 | "    lr0 = 1000\n", 152 | "    k = 0.02\n", 153 | "    \n", 154 | "    for i in range(1000):\n", 155 | "      loss, y_out, dX = model.calc_loss(X, y)\n", 156 | "      \n", 157 | "      lr = lr0*np.exp(-k*i)\n", 158 | "      v = mu * v - lr * dX\n", 159 | "      X += v\n", 160 | "      print i, 'lr=', lr, y_out, y, 'loss=', loss\n", 161 | "      if y_out == y:\n", 162 | "        break\n", 163 | "    ############################################################################\n", 164 | "    #                             END OF YOUR CODE                             #\n", 165 | "    ############################################################################\n", 166 | "    \n", 167 | "    # Undo the jitter\n", 168 | "    X = np.roll(np.roll(X, -ox, -1), -oy, -2)\n", 169 | "    \n", 170 | "    # As a regularizer, clip the image\n", 171 | "    X = np.clip(X, -data['mean_image'], 255.0 - data['mean_image'])\n", 172 | "    \n", 173 | "    # As a regularizer, periodically blur the image\n", 174 | "    if t % blur_every == 0:\n", 175 | "      X = blur_image(X)\n", 176 | "    \n", 177 | "    # Periodically show the image\n", 178 | "    if t % show_every == 0:\n", 179 | "      plt.imshow(deprocess_image(X, data['mean_image']))\n", 180 | "      plt.gcf().set_size_inches(3, 3)\n", 181 | "      plt.axis('off')\n", 182 | "      plt.show()\n", 183 | "  return X" 184 | ] 185 | }, 186 | { 187 | "cell_type": "markdown", 188 | "metadata": {}, 189 | "source": [ 190 | "You can use the code above to generate some cool images! An example is shown below. Try to generate a cool-looking image. If you want you can try to implement the other regularization schemes from Yosinski et al, but it isn't required." 191 | ] 192 | }, 193 | { 194 | "cell_type": "code", 195 | "execution_count": null, 196 | "metadata": { 197 | "collapsed": false 198 | }, 199 | "outputs": [], 200 | "source": [ 201 | "target_y = 43 # Tarantula\n", 202 | "print data['class_names'][target_y]\n", 203 | "X = create_class_visualization(target_y, model, show_every=25)" 204 | ] 205 | }, 206 | { 207 | "cell_type": "markdown", 208 | "metadata": {}, 209 | "source": [ 210 | "# Feature Inversion\n", 211 | "In an attempt to understand the types of features that convolutional networks learn to recognize, a recent paper [1] attempts to reconstruct an image from its feature representation. We can easily implement this idea using image gradients from the pretrained network.\n", 212 | "\n", 213 | "Concretely, given an image $I$, let $\\phi_\\ell(I)$ be the activations at layer $\\ell$ of the convolutional network $\\phi$. We wish to find an image $I^*$ with a similar feature representation as $I$ at layer $\\ell$ of the network $\\phi$ by solving the optimization problem\n", 214 | "\n", 215 | "$$\n", 216 | "I^* = \\arg\\min_{I'} \\|\\phi_\\ell(I) - \\phi_\\ell(I')\\|_2^2 + R(I')\n", 217 | "$$\n", 218 | "\n", 219 | "where $\\|\\cdot\\|_2^2$ is the squared Euclidean norm. As above, $R$ is a (possibly implicit) regularizer. We can solve this optimization problem using gradient descent, computing gradients with respect to the generated image. 
We will use (explicit) L2 regularization of the form\n", 220 | "\n", 221 | "$$\n", 222 | "R(I') = \\lambda \\|I'\\|_2^2\n", 223 | "$$\n", 224 | "\n", 225 | "together with implicit regularization by periodically blurring the image, as recommended by [2].\n", 226 | "\n", 227 | "Implement this method in the function below.\n", 228 | "\n", 229 | "[1] Aravindh Mahendran, Andrea Vedaldi, \"Understanding Deep Image Representations by Inverting them\", CVPR 2015\n", 230 | "\n", 231 | "[2] Yosinski et al, \"Understanding Neural Networks Through Deep Visualization\", ICML 2015 Deep Learning Workshop" 232 | ] 233 | }, 234 | { 235 | "cell_type": "code", 236 | "execution_count": null, 237 | "metadata": { 238 | "collapsed": false 239 | }, 240 | "outputs": [], 241 | "source": [ 242 | "def invert_features(target_feats, layer, model, **kwargs):\n", 243 | "  \"\"\"\n", 244 | "  Perform feature inversion in the style of Mahendran and Vedaldi 2015, using\n", 245 | "  L2 regularization and periodic blurring.\n", 246 | "  \n", 247 | "  Inputs:\n", 248 | "  - target_feats: Image features of the target image, of shape (1, C, H, W);\n", 249 | "    we will try to generate an image that matches these features\n", 250 | "  - layer: The index of the layer from which the features were extracted\n", 251 | "  - model: A PretrainedCNN that was used to extract features\n", 252 | "  \n", 253 | "  Keyword arguments:\n", 254 | "  - learning_rate: The learning rate to use for gradient descent\n", 255 | "  - num_iterations: The number of iterations to use for gradient descent\n", 256 | "  - l2_reg: The strength of L2 regularization to use; this is lambda in the\n", 257 | "    equation above.\n", 258 | "  - blur_every: How often to blur the image as implicit regularization; set\n", 259 | "    to 0 to disable blurring.\n", 260 | "  - show_every: How often to show the generated image; set to 0 to disable\n", 261 | "    showing intermediate results.\n", 262 | "  \n", 263 | "  Returns:\n", 264 | "  - X: Generated image of shape (1, 3, 64, 64) that matches the target features.\n", 265 | "  \"\"\"\n", 266 | "  learning_rate = kwargs.pop('learning_rate', 10000)\n", 267 | "  num_iterations = kwargs.pop('num_iterations', 500)\n", 268 | "  l2_reg = kwargs.pop('l2_reg', 1e-7)\n", 269 | "  blur_every = kwargs.pop('blur_every', 1)\n", 270 | "  show_every = kwargs.pop('show_every', 50)\n", 271 | "  \n", 272 | "  X = np.random.randn(1, 3, 64, 64)\n", 273 | "  for t in xrange(num_iterations):\n", 274 | "    ############################################################################\n", 275 | "    # TODO: Compute the image gradient dX of the reconstruction loss with      #\n", 276 | "    # respect to the image. You should include L2 regularization penalizing    #\n", 277 | "    # large pixel values in the generated image using the l2_reg parameter;    #\n", 278 | "    # then update the generated image using the learning_rate from above. 
#\n", 279 | " ############################################################################\n", 280 | " pass\n", 281 | " ############################################################################\n", 282 | " # END OF YOUR CODE #\n", 283 | " ############################################################################\n", 284 | " \n", 285 | " # As a regularizer, clip the image\n", 286 | " X = np.clip(X, -data['mean_image'], 255.0 - data['mean_image'])\n", 287 | " \n", 288 | " # As a regularizer, periodically blur the image\n", 289 | " if (blur_every > 0) and t % blur_every == 0:\n", 290 | " X = blur_image(X)\n", 291 | "\n", 292 | " if (show_every > 0) and (t % show_every == 0 or t + 1 == num_iterations):\n", 293 | " plt.imshow(deprocess_image(X, data['mean_image']))\n", 294 | " plt.gcf().set_size_inches(3, 3)\n", 295 | " plt.axis('off')\n", 296 | " plt.title('t = %d' % t)\n", 297 | " plt.show()" 298 | ] 299 | }, 300 | { 301 | "cell_type": "markdown", 302 | "metadata": {}, 303 | "source": [ 304 | "### Shallow feature reconstruction\n", 305 | "After implementing the feature inversion above, run the following cell to try and reconstruct features from the fourth convolutional layer of the pretrained model. You should be able to reconstruct the features using the provided optimization parameters." 306 | ] 307 | }, 308 | { 309 | "cell_type": "code", 310 | "execution_count": null, 311 | "metadata": { 312 | "collapsed": false, 313 | "scrolled": false 314 | }, 315 | "outputs": [], 316 | "source": [ 317 | "filename = 'kitten.jpg'\n", 318 | "layer = 3 # layers start from 0 so these are features after 4 convolutions\n", 319 | "img = imresize(imread(filename), (64, 64))\n", 320 | "\n", 321 | "plt.imshow(img)\n", 322 | "plt.gcf().set_size_inches(3, 3)\n", 323 | "plt.title('Original image')\n", 324 | "plt.axis('off')\n", 325 | "plt.show()\n", 326 | "\n", 327 | "# Preprocess the image before passing it to the network:\n", 328 | "# subtract the mean, add a dimension, etc\n", 329 | "img_pre = preprocess_image(img, data['mean_image'])\n", 330 | "\n", 331 | "# Extract features from the image\n", 332 | "feats, _ = model.forward(img_pre, end=layer)\n", 333 | "\n", 334 | "# Invert the features\n", 335 | "kwargs = {\n", 336 | " 'num_iterations': 400,\n", 337 | " 'learning_rate': 5000,\n", 338 | " 'l2_reg': 1e-8,\n", 339 | " 'show_every': 100,\n", 340 | " 'blur_every': 10,\n", 341 | "}\n", 342 | "X = invert_features(feats, layer, model, **kwargs)" 343 | ] 344 | }, 345 | { 346 | "cell_type": "markdown", 347 | "metadata": {}, 348 | "source": [ 349 | "### Deep feature reconstruction\n", 350 | "Reconstructing images using features from deeper layers of the network tends to give interesting results. In the cell below, try to reconstruct the best image you can by inverting the features after 7 layers of convolutions. You will need to play with the hyperparameters to try and get a good result.\n", 351 | "\n", 352 | "HINT: If you read the paper by Mahendran and Vedaldi, you'll see that reconstructions from deep features tend not to look much like the original image, so you shouldn't expect the results to look like the reconstruction above. You should be able to get an image that shows some discernable structure within 1000 iterations." 
353 | ] 354 | }, 355 | { 356 | "cell_type": "code", 357 | "execution_count": null, 358 | "metadata": { 359 | "collapsed": false 360 | }, 361 | "outputs": [], 362 | "source": [ 363 | "filename = 'kitten.jpg'\n", 364 | "layer = 6 # layers start from 0 so these are features after 7 convolutions\n", 365 | "img = imresize(imread(filename), (64, 64))\n", 366 | "\n", 367 | "plt.imshow(img)\n", 368 | "plt.gcf().set_size_inches(3, 3)\n", 369 | "plt.title('Original image')\n", 370 | "plt.axis('off')\n", 371 | "plt.show()\n", 372 | "\n", 373 | "# Preprocess the image before passing it to the network:\n", 374 | "# subtract the mean, add a dimension, etc\n", 375 | "img_pre = preprocess_image(img, data['mean_image'])\n", 376 | "\n", 377 | "# Extract features from the image\n", 378 | "feats, _ = model.forward(img_pre, end=layer)\n", 379 | "\n", 380 | "# Invert the features\n", 381 | "# You will need to play with these parameters.\n", 382 | "kwargs = {\n", 383 | " 'num_iterations': 1000,\n", 384 | " 'learning_rate': 0,\n", 385 | " 'l2_reg': 0,\n", 386 | " 'show_every': 100,\n", 387 | " 'blur_every': 0,\n", 388 | "}\n", 389 | "X = invert_features(feats, layer, model, **kwargs)" 390 | ] 391 | }, 392 | { 393 | "cell_type": "markdown", 394 | "metadata": {}, 395 | "source": [ 396 | "# DeepDream\n", 397 | "In the summer of 2015, Google released a [blog post](http://googleresearch.blogspot.com/2015/06/inceptionism-going-deeper-into-neural.html) describing a new method of generating images from neural networks, and they later [released code](https://github.com/google/deepdream) to generate these images.\n", 398 | "\n", 399 | "The idea is very simple. We pick some layer from the network, pass the starting image through the network to extract features at the chosen layer, set the gradient at that layer equal to the activations themselves, and then backpropagate to the image. This has the effect of modifying the image to amplify the activations at the chosen layer of the network.\n", 400 | "\n", 401 | "For DeepDream we usually extract features from one of the convolutional layers, allowing us to generate images of any resolution.\n", 402 | "\n", 403 | "We can implement this idea using our pretrained network. The results probably won't look as good as Google's since their network is much bigger, but we should still be able to generate some interesting images." 
404 | ] 405 | }, 406 | { 407 | "cell_type": "code", 408 | "execution_count": null, 409 | "metadata": { 410 | "collapsed": false 411 | }, 412 | "outputs": [], 413 | "source": [ 414 | "def deepdream(X, layer, model, **kwargs):\n", 415 | "  \"\"\"\n", 416 | "  Generate a DeepDream image.\n", 417 | "  \n", 418 | "  Inputs:\n", 419 | "  - X: Starting image, of shape (1, 3, H, W)\n", 420 | "  - layer: Index of layer at which to dream\n", 421 | "  - model: A PretrainedCNN object\n", 422 | "  \n", 423 | "  Keyword arguments:\n", 424 | "  - learning_rate: How much to update the image at each iteration\n", 425 | "  - max_jitter: Maximum number of pixels for jitter regularization\n", 426 | "  - num_iterations: How many iterations to run for\n", 427 | "  - show_every: How often to show the generated image\n", 428 | "  \"\"\"\n", 429 | "  \n", 430 | "  X = X.copy()\n", 431 | "  \n", 432 | "  learning_rate = kwargs.pop('learning_rate', 5.0)\n", 433 | "  max_jitter = kwargs.pop('max_jitter', 16)\n", 434 | "  num_iterations = kwargs.pop('num_iterations', 100)\n", 435 | "  show_every = kwargs.pop('show_every', 25)\n", 436 | "  \n", 437 | "  for t in xrange(num_iterations):\n", 438 | "    # As a regularizer, add random jitter to the image\n", 439 | "    ox, oy = np.random.randint(-max_jitter, max_jitter+1, 2)\n", 440 | "    X = np.roll(np.roll(X, ox, -1), oy, -2)\n", 441 | "\n", 442 | "    dX = None\n", 443 | "    ############################################################################\n", 444 | "    # TODO: Compute the image gradient dX using the DeepDream method. You'll   #\n", 445 | "    # need to use the forward and backward methods of the model object to      #\n", 446 | "    # extract activations and set gradients for the chosen layer. After        #\n", 447 | "    # computing the image gradient dX, you should use the learning rate to     #\n", 448 | "    # update the image X.                                                      #\n", 449 | "    ############################################################################\n", 450 | "    pass\n", 451 | "    ############################################################################\n", 452 | "    #                             END OF YOUR CODE                             #\n", 453 | "    ############################################################################\n", 454 | "    \n", 455 | "    # Undo the jitter\n", 456 | "    X = np.roll(np.roll(X, -ox, -1), -oy, -2)\n", 457 | "    \n", 458 | "    # As a regularizer, clip the image\n", 459 | "    mean_pixel = data['mean_image'].mean(axis=(1, 2), keepdims=True)\n", 460 | "    X = np.clip(X, -mean_pixel, 255.0 - mean_pixel)\n", 461 | "    \n", 462 | "    # Periodically show the image\n", 463 | "    if t == 0 or (t + 1) % show_every == 0:\n", 464 | "      img = deprocess_image(X, data['mean_image'], mean='pixel')\n", 465 | "      plt.imshow(img)\n", 466 | "      plt.title('t = %d' % (t + 1))\n", 467 | "      plt.gcf().set_size_inches(8, 8)\n", 468 | "      plt.axis('off')\n", 469 | "      plt.show()\n", 470 | "  return X" 471 | ] 472 | }, 473 | { 474 | "cell_type": "markdown", 475 | "metadata": {}, 476 | "source": [ 477 | "# Generate some images!\n", 478 | "Try to generate a cool-looking DeepDream image using the pretrained network. You can try using different layers, or starting from different images. You can reduce the image size if it runs too slowly on your machine, or increase the image size if you are feeling ambitious. 
479 | ] 480 | }, 481 | { 482 | "cell_type": "code", 483 | "execution_count": null, 484 | "metadata": { 485 | "collapsed": false, 486 | "scrolled": false 487 | }, 488 | "outputs": [], 489 | "source": [ 490 | "def read_image(filename, max_size):\n", 491 | "  \"\"\"\n", 492 | "  Read an image from disk and resize it so its larger side is max_size\n", 493 | "  \"\"\"\n", 494 | "  img = imread(filename)\n", 495 | "  H, W, _ = img.shape\n", 496 | "  if H >= W:\n", 497 | "    img = imresize(img, (max_size, int(W * float(max_size) / H)))\n", 498 | "  elif H < W:\n", 499 | "    img = imresize(img, (int(H * float(max_size) / W), max_size))\n", 500 | "  return img\n", 501 | "\n", 502 | "filename = 'kitten.jpg'\n", 503 | "max_size = 256\n", 504 | "img = read_image(filename, max_size)\n", 505 | "plt.imshow(img)\n", 506 | "plt.axis('off')\n", 507 | "\n", 508 | "# Preprocess the image by converting to float, transposing,\n", 509 | "# and performing mean subtraction.\n", 510 | "img_pre = preprocess_image(img, data['mean_image'], mean='pixel')\n", 511 | "\n", 512 | "out = deepdream(img_pre, 7, model, learning_rate=2000)" 513 | ] 514 | } 515 | ], 516 | "metadata": { 517 | "kernelspec": { 518 | "display_name": "Python 2", 519 | "language": "python", 520 | "name": "python2" 521 | }, 522 | "language_info": { 523 | "codemirror_mode": { 524 | "name": "ipython", 525 | "version": 2 526 | }, 527 | "file_extension": ".py", 528 | "mimetype": "text/x-python", 529 | "name": "python", 530 | "nbconvert_exporter": "python", 531 | "pygments_lexer": "ipython2", 532 | "version": "2.7.10" 533 | } 534 | }, 535 | "nbformat": 4, 536 | "nbformat_minor": 0 537 | } 538 | -------------------------------------------------------------------------------- /assignment3/collectSubmission.sh: -------------------------------------------------------------------------------- 1 | rm -f assignment3.zip 2 | zip -r assignment3.zip . -x "*.git" "*cs231n/datasets*" "*.ipynb_checkpoints*" "*README.md" "*collectSubmission.sh" "*requirements.txt" ".env/*" "*.pyc" "*cs231n/build/*" 3 | -------------------------------------------------------------------------------- /assignment3/cs231n/.gitignore: -------------------------------------------------------------------------------- 1 | build/* 2 | im2col_cython.c 3 | im2col_cython.so 4 | -------------------------------------------------------------------------------- /assignment3/cs231n/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ShibiHe/Stanford-CS231n-assignments/4590c50075a2e4aa291777b4db45d04b877dcdf6/assignment3/cs231n/__init__.py -------------------------------------------------------------------------------- /assignment3/cs231n/captioning_solver.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | 3 | from cs231n import optim 4 | from cs231n.coco_utils import sample_coco_minibatch 5 | 6 | 7 | class CaptioningSolver(object): 8 | """ 9 | A CaptioningSolver encapsulates all the logic necessary for training 10 | image captioning models. The CaptioningSolver performs stochastic gradient 11 | descent using different update rules defined in optim.py. 12 | 13 | The solver accepts both training and validation data and labels so it can 14 | periodically check classification accuracy on both training and validation 15 | data to watch out for overfitting. 
16 | 17 | To train a model, you will first construct a CaptioningSolver instance, 18 | passing the model, dataset, and various options (learning rate, batch size, 19 | etc) to the constructor. You will then call the train() method to run the 20 | optimization procedure and train the model. 21 | 22 | After the train() method returns, model.params will contain the parameters 23 | that performed best on the validation set over the course of training. 24 | In addition, the instance variable solver.loss_history will contain a list 25 | of all losses encountered during training and the instance variables 26 | solver.train_acc_history and solver.val_acc_history will be lists containing 27 | the accuracies of the model on the training and validation set at each epoch. 28 | 29 | Example usage might look something like this: 30 | 31 | data = load_coco_data() 32 | model = MyAwesomeModel(hidden_dim=100) 33 | solver = CaptioningSolver(model, data, 34 | update_rule='sgd', 35 | optim_config={ 36 | 'learning_rate': 1e-3, 37 | }, 38 | lr_decay=0.95, 39 | num_epochs=10, batch_size=100, 40 | print_every=100) 41 | solver.train() 42 | 43 | 44 | A CaptioningSolver works on a model object that must conform to the following 45 | API: 46 | 47 | - model.params must be a dictionary mapping string parameter names to numpy 48 | arrays containing parameter values. 49 | 50 | - model.loss(features, captions) must be a function that computes 51 | training-time loss and gradients, with the following inputs and outputs: 52 | 53 | Inputs: 54 | - features: Array giving a minibatch of features for images, of shape (N, D) 55 | - captions: Array of captions for those images, of shape (N, T) where 56 | each element is in the range (0, V]. 57 | 58 | Returns: 59 | - loss: Scalar giving the loss 60 | - grads: Dictionary with the same keys as self.params mapping parameter 61 | names to gradients of the loss with respect to those parameters. 62 | """ 63 | 64 | def __init__(self, model, data, **kwargs): 65 | """ 66 | Construct a new CaptioningSolver instance. 67 | 68 | Required arguments: 69 | - model: A model object conforming to the API described above 70 | - data: A dictionary of training and validation data from load_coco_data 71 | 72 | Optional arguments: 73 | - update_rule: A string giving the name of an update rule in optim.py. 74 | Default is 'sgd'. 75 | - optim_config: A dictionary containing hyperparameters that will be 76 | passed to the chosen update rule. Each update rule requires different 77 | hyperparameters (see optim.py) but all update rules require a 78 | 'learning_rate' parameter so that should always be present. 79 | - lr_decay: A scalar for learning rate decay; after each epoch the learning 80 | rate is multiplied by this value. 81 | - batch_size: Size of minibatches used to compute loss and gradient during 82 | training. 83 | - num_epochs: The number of epochs to run for during training. 84 | - print_every: Integer; training losses will be printed every print_every 85 | iterations. 86 | - verbose: Boolean; if set to false then no output will be printed during 87 | training. 
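Note: unlike the assignment 2 Solver, train() below does not currently
track the best parameters; the swap of best_params into the model at the
end of training is commented out, so after train() the model simply holds
its final parameters.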
88 | """ 89 | self.model = model 90 | self.data = data 91 | 92 | # Unpack keyword arguments 93 | self.update_rule = kwargs.pop('update_rule', 'sgd') 94 | self.optim_config = kwargs.pop('optim_config', {}) 95 | self.lr_decay = kwargs.pop('lr_decay', 1.0) 96 | self.batch_size = kwargs.pop('batch_size', 100) 97 | self.num_epochs = kwargs.pop('num_epochs', 10) 98 | 99 | self.print_every = kwargs.pop('print_every', 10) 100 | self.verbose = kwargs.pop('verbose', True) 101 | 102 | # Throw an error if there are extra keyword arguments 103 | if len(kwargs) > 0: 104 | extra = ', '.join('"%s"' % k for k in kwargs.keys()) 105 | raise ValueError('Unrecognized arguments %s' % extra) 106 | 107 | # Make sure the update rule exists, then replace the string 108 | # name with the actual function 109 | if not hasattr(optim, self.update_rule): 110 | raise ValueError('Invalid update_rule "%s"' % self.update_rule) 111 | self.update_rule = getattr(optim, self.update_rule) 112 | 113 | self._reset() 114 | 115 | 116 | def _reset(self): 117 | """ 118 | Set up some book-keeping variables for optimization. Don't call this 119 | manually. 120 | """ 121 | # Set up some variables for book-keeping 122 | self.epoch = 0 123 | self.best_val_acc = 0 124 | self.best_params = {} 125 | self.loss_history = [] 126 | self.train_acc_history = [] 127 | self.val_acc_history = [] 128 | 129 | # Make a deep copy of the optim_config for each parameter 130 | self.optim_configs = {} 131 | for p in self.model.params: 132 | d = {k: v for k, v in self.optim_config.iteritems()} 133 | self.optim_configs[p] = d 134 | 135 | 136 | def _step(self): 137 | """ 138 | Make a single gradient update. This is called by train() and should not 139 | be called manually. 140 | """ 141 | # Make a minibatch of training data 142 | minibatch = sample_coco_minibatch(self.data, 143 | batch_size=self.batch_size, 144 | split='train') 145 | captions, features, urls = minibatch 146 | 147 | # Compute loss and gradient 148 | loss, grads = self.model.loss(features, captions) 149 | self.loss_history.append(loss) 150 | 151 | # Perform a parameter update 152 | for p, w in self.model.params.iteritems(): 153 | dw = grads[p] 154 | config = self.optim_configs[p] 155 | next_w, next_config = self.update_rule(w, dw, config) 156 | self.model.params[p] = next_w 157 | self.optim_configs[p] = next_config 158 | 159 | 160 | # TODO: This does nothing right now; maybe implement BLEU? 161 | def check_accuracy(self, X, y, num_samples=None, batch_size=100): 162 | """ 163 | Check accuracy of the model on the provided data. 164 | 165 | Inputs: 166 | - X: Array of data, of shape (N, d_1, ..., d_k) 167 | - y: Array of labels, of shape (N,) 168 | - num_samples: If not None, subsample the data and only test the model 169 | on num_samples datapoints. 170 | - batch_size: Split X and y into batches of this size to avoid using too 171 | much memory. 172 | 173 | Returns: 174 | - acc: Scalar giving the fraction of instances that were correctly 175 | classified by the model. 
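Note: as implemented, this method is currently a stub; the early return of
0.0 below means the batched-prediction code after it never runs. Per the
TODO above, it may eventually be replaced with a BLEU-based check.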
176 | """ 177 | return 0.0 178 | 179 | # Maybe subsample the data 180 | N = X.shape[0] 181 | if num_samples is not None and N > num_samples: 182 | mask = np.random.choice(N, num_samples) 183 | N = num_samples 184 | X = X[mask] 185 | y = y[mask] 186 | 187 | # Compute predictions in batches 188 | num_batches = N / batch_size 189 | if N % batch_size != 0: 190 | num_batches += 1 191 | y_pred = [] 192 | for i in xrange(num_batches): 193 | start = i * batch_size 194 | end = (i + 1) * batch_size 195 | scores = self.model.loss(X[start:end]) 196 | y_pred.append(np.argmax(scores, axis=1)) 197 | y_pred = np.hstack(y_pred) 198 | acc = np.mean(y_pred == y) 199 | 200 | return acc 201 | 202 | 203 | def train(self): 204 | """ 205 | Run optimization to train the model. 206 | """ 207 | num_train = self.data['train_captions'].shape[0] 208 | iterations_per_epoch = max(num_train / self.batch_size, 1) 209 | num_iterations = self.num_epochs * iterations_per_epoch 210 | 211 | for t in xrange(num_iterations): 212 | self._step() 213 | 214 | # Maybe print training loss 215 | if self.verbose and t % self.print_every == 0: 216 | print '(Iteration %d / %d) loss: %f' % ( 217 | t + 1, num_iterations, self.loss_history[-1]) 218 | 219 | # At the end of every epoch, increment the epoch counter and decay the 220 | # learning rate. 221 | epoch_end = (t + 1) % iterations_per_epoch == 0 222 | if epoch_end: 223 | self.epoch += 1 224 | for k in self.optim_configs: 225 | self.optim_configs[k]['learning_rate'] *= self.lr_decay 226 | 227 | # Check train and val accuracy on the first iteration, the last 228 | # iteration, and at the end of each epoch. 229 | # TODO: Implement some logic to check Bleu on validation set periodically 230 | 231 | # At the end of training swap the best params into the model 232 | # self.model.params = self.best_params 233 | 234 | -------------------------------------------------------------------------------- /assignment3/cs231n/classifiers/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ShibiHe/Stanford-CS231n-assignments/4590c50075a2e4aa291777b4db45d04b877dcdf6/assignment3/cs231n/classifiers/__init__.py -------------------------------------------------------------------------------- /assignment3/cs231n/classifiers/pretrained_cnn.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | import h5py 3 | 4 | from cs231n.layers import * 5 | from cs231n.fast_layers import * 6 | from cs231n.layer_utils import * 7 | 8 | 9 | def print_tuple(x): 10 | for item in x: 11 | if type(item)==tuple: 12 | print_tuple(item) 13 | elif type(item)==np.ndarray: 14 | print ' C:', item.shape, np.sum(np.absolute(item)) 15 | else: 16 | print ' C:', item 17 | 18 | class PretrainedCNN(object): 19 | def __init__(self, dtype=np.float32, num_classes=100, input_size=64, h5_file=None): 20 | self.debug = False 21 | self.dtype = dtype 22 | self.conv_params = [] 23 | self.input_size = input_size 24 | self.num_classes = num_classes 25 | 26 | # TODO: In the future it would be nice if the architecture could be loaded from 27 | # the HDF5 file rather than being hardcoded. For now this will have to do. 
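# Architecture sketch (assuming the default input_size of 64): nine
# [conv - spatial batchnorm - relu] blocks whose stride-2 convolutions
# halve the spatial size (64 -> 32 -> 16 -> 8 -> 4 -> 2), so the last
# 1024-filter conv leaves a 2x2x1024 volume (fan-in 4096) for the hidden
# affine layer, followed by a final affine layer to num_classes.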
28 | self.conv_params.append({'stride': 2, 'pad': 2}) 29 | self.conv_params.append({'stride': 1, 'pad': 1}) 30 | self.conv_params.append({'stride': 2, 'pad': 1}) 31 | self.conv_params.append({'stride': 1, 'pad': 1}) 32 | self.conv_params.append({'stride': 2, 'pad': 1}) 33 | self.conv_params.append({'stride': 1, 'pad': 1}) 34 | self.conv_params.append({'stride': 2, 'pad': 1}) 35 | self.conv_params.append({'stride': 1, 'pad': 1}) 36 | self.conv_params.append({'stride': 2, 'pad': 1}) 37 | 38 | self.filter_sizes = [5, 3, 3, 3, 3, 3, 3, 3, 3] 39 | self.num_filters = [64, 64, 128, 128, 256, 256, 512, 512, 1024] 40 | hidden_dim = 512 41 | 42 | self.bn_params = [] 43 | 44 | cur_size = input_size 45 | prev_dim = 3 46 | self.params = {} 47 | for i, (f, next_dim) in enumerate(zip(self.filter_sizes, self.num_filters)): 48 | fan_in = f * f * prev_dim 49 | self.params['W%d' % (i + 1)] = np.sqrt(2.0 / fan_in) * np.random.randn(next_dim, prev_dim, f, f) 50 | self.params['b%d' % (i + 1)] = np.zeros(next_dim) 51 | self.params['gamma%d' % (i + 1)] = np.ones(next_dim) 52 | self.params['beta%d' % (i + 1)] = np.zeros(next_dim) 53 | self.bn_params.append({'mode': 'train'}) 54 | prev_dim = next_dim 55 | if self.conv_params[i]['stride'] == 2: cur_size /= 2 56 | 57 | # Add a fully-connected layers 58 | fan_in = cur_size * cur_size * self.num_filters[-1] 59 | self.params['W%d' % (i + 2)] = np.sqrt(2.0 / fan_in) * np.random.randn(fan_in, hidden_dim) 60 | self.params['b%d' % (i + 2)] = np.zeros(hidden_dim) 61 | self.params['gamma%d' % (i + 2)] = np.ones(hidden_dim) 62 | self.params['beta%d' % (i + 2)] = np.zeros(hidden_dim) 63 | self.bn_params.append({'mode': 'train'}) 64 | self.params['W%d' % (i + 3)] = np.sqrt(2.0 / hidden_dim) * np.random.randn(hidden_dim, num_classes) 65 | self.params['b%d' % (i + 3)] = np.zeros(num_classes) 66 | 67 | for k, v in self.params.iteritems(): 68 | self.params[k] = v.astype(dtype) 69 | 70 | if h5_file is not None: 71 | self.load_weights(h5_file) 72 | 73 | 74 | def load_weights(self, h5_file, verbose=False): 75 | """ 76 | Load pretrained weights from an HDF5 file. 77 | 78 | Inputs: 79 | - h5_file: Path to the HDF5 file where pretrained weights are stored. 
80 | - verbose: Whether to print debugging info 81 | """ 82 | 83 | # Before loading weights we need to make a dummy forward pass to initialize 84 | # the running averages in the bn_pararams 85 | x = np.random.randn(1, 3, self.input_size, self.input_size) 86 | y = np.random.randint(self.num_classes, size=1) 87 | loss, grads = self.loss(x, y) 88 | 89 | with h5py.File(h5_file, 'r') as f: 90 | for k, v in f.iteritems(): 91 | v = np.asarray(v) 92 | if k in self.params: 93 | if verbose: print k, v.shape, self.params[k].shape 94 | if v.shape == self.params[k].shape: 95 | self.params[k] = v.copy() 96 | elif v.T.shape == self.params[k].shape: 97 | self.params[k] = v.T.copy() 98 | else: 99 | raise ValueError('shapes for %s do not match' % k) 100 | if k.startswith('running_mean'): 101 | i = int(k[12:]) - 1 102 | assert self.bn_params[i]['running_mean'].shape == v.shape 103 | self.bn_params[i]['running_mean'] = v.copy() 104 | if verbose: print k, v.shape 105 | if k.startswith('running_var'): 106 | i = int(k[11:]) - 1 107 | assert v.shape == self.bn_params[i]['running_var'].shape 108 | self.bn_params[i]['running_var'] = v.copy() 109 | if verbose: print k, v.shape 110 | 111 | for k, v in self.params.iteritems(): 112 | self.params[k] = v.astype(self.dtype) 113 | 114 | 115 | def forward(self, X, start=None, end=None, mode='test'): 116 | """ 117 | Run part of the model forward, starting and ending at an arbitrary layer, 118 | in either training mode or testing mode. 119 | 120 | You can pass arbitrary input to the starting layer, and you will receive 121 | output from the ending layer and a cache object that can be used to run 122 | the model backward over the same set of layers. 123 | 124 | For the purposes of this function, a "layer" is one of the following blocks: 125 | 126 | [conv - spatial batchnorm - relu] (There are 9 of these) 127 | [affine - batchnorm - relu] (There is one of these) 128 | [affine] (There is one of these) 129 | 130 | Inputs: 131 | - X: The input to the starting layer. If start=0, then this should be an 132 | array of shape (N, C, 64, 64). 133 | - start: The index of the layer to start from. start=0 starts from the first 134 | convolutional layer. Default is 0. 135 | - end: The index of the layer to end at. start=11 ends at the last 136 | fully-connected layer, returning class scores. Default is 11. 137 | - mode: The mode to use, either 'test' or 'train'. We need this because 138 | batch normalization behaves differently at training time and test time. 139 | 140 | Returns: 141 | - out: Output from the end layer. 142 | - cache: A cache object that can be passed to the backward method to run the 143 | network backward over the same range of layers. 
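Example (a sketch; X is assumed to be an array of shape (N, 3, 64, 64)):

  # Activations of only the first three conv blocks, in test mode:
  a3, cache = model.forward(X, start=0, end=2, mode='test')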
144 | """ 145 | X = X.astype(self.dtype) 146 | if start is None: start = 0 147 | if end is None: end = len(self.conv_params) + 1 148 | layer_caches = [] 149 | 150 | prev_a = X 151 | for i in xrange(start, end + 1): 152 | i1 = i + 1 153 | if 0 <= i < len(self.conv_params): 154 | # This is a conv layer 155 | w, b = self.params['W%d' % i1], self.params['b%d' % i1] 156 | gamma, beta = self.params['gamma%d' % i1], self.params['beta%d' % i1] 157 | conv_param = self.conv_params[i] 158 | bn_param = self.bn_params[i] 159 | bn_param['mode'] = mode 160 | 161 | next_a, cache = conv_bn_relu_forward(prev_a, w, b, gamma, beta, conv_param, bn_param) 162 | elif i == len(self.conv_params): 163 | # This is the fully-connected hidden layer 164 | w, b = self.params['W%d' % i1], self.params['b%d' % i1] 165 | gamma, beta = self.params['gamma%d' % i1], self.params['beta%d' % i1] 166 | bn_param = self.bn_params[i] 167 | bn_param['mode'] = mode 168 | next_a, cache = affine_bn_relu_forward(prev_a, w, b, gamma, beta, bn_param) 169 | elif i == len(self.conv_params) + 1: 170 | # This is the last fully-connected layer that produces scores 171 | w, b = self.params['W%d' % i1], self.params['b%d' % i1] 172 | next_a, cache = affine_forward(prev_a, w, b) 173 | else: 174 | raise ValueError('Invalid layer index %d' % i) 175 | 176 | layer_caches.append(cache) 177 | prev_a = next_a 178 | 179 | out = prev_a 180 | cache = (start, end, layer_caches) 181 | return out, cache 182 | 183 | def backward(self, dout, cache): 184 | """ 185 | Run the model backward over a sequence of layers that were previously run 186 | forward using the self.forward method. 187 | 188 | Inputs: 189 | - dout: Gradient with respect to the ending layer; this should have the same 190 | shape as the out variable returned from the corresponding call to forward. 191 | - cache: A cache object returned from self.forward. 192 | 193 | Returns: 194 | - dX: Gradient with respect to the start layer. This will have the same 195 | shape as the input X passed to self.forward. 196 | - grads: Gradient of all parameters in the layers. For example if you run 197 | forward through two convolutional layers, then on the corresponding call 198 | to backward grads will contain the gradients with respect to the weights, 199 | biases, and spatial batchnorm parameters of those two convolutional 200 | layers. The grads dictionary will therefore contain a subset of the keys 201 | of self.params, and grads[k] and self.params[k] will have the same shape. 
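Example (a sketch pairing backward with forward; da7 is assumed to be an
upstream gradient with the same shape as the layer-7 output a7):

  a7, cache = model.forward(X, end=7)
  dX, grads = model.backward(da7, cache)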
202 | """ 203 | start, end, layer_caches = cache 204 | dnext_a = dout 205 | grads = {} 206 | cache = None 207 | for i in reversed(range(start, end + 1)): 208 | if self.debug: 209 | print 'layer:', i+1, "top gradient's shape and abs sum:", dnext_a.shape, np.sum(np.absolute(dnext_a)) 210 | 211 | i1 = i + 1 212 | if i == len(self.conv_params) + 1: 213 | cache = layer_caches.pop() 214 | # This is the last fully-connected layer 215 | dprev_a, dw, db = affine_backward(dnext_a, cache) 216 | grads['W%d' % i1] = dw 217 | grads['b%d' % i1] = db 218 | elif i == len(self.conv_params): 219 | # This is the fully-connected hidden layer 220 | cache = layer_caches.pop() 221 | temp = affine_bn_relu_backward(dnext_a, cache) 222 | dprev_a, dw, db, dgamma, dbeta = temp 223 | grads['W%d' % i1] = dw 224 | grads['b%d' % i1] = db 225 | grads['gamma%d' % i1] = dgamma 226 | grads['beta%d' % i1] = dbeta 227 | elif 0 <= i < len(self.conv_params): 228 | # This is a conv layer 229 | cache = layer_caches.pop() 230 | temp = conv_bn_relu_backward(dnext_a, cache) 231 | dprev_a, dw, db, dgamma, dbeta = temp 232 | grads['W%d' % i1] = dw 233 | grads['b%d' % i1] = db 234 | grads['gamma%d' % i1] = dgamma 235 | grads['beta%d' % i1] = dbeta 236 | else: 237 | raise ValueError('Invalid layer index %d' % i) 238 | dnext_a = dprev_a 239 | 240 | if self.debug: 241 | names = [] 242 | for name, item in grads.iteritems(): 243 | print name, '->G', item.shape, np.sum(np.absolute(item)) 244 | names.append(name) 245 | for name in names: 246 | del grads[name] 247 | 248 | print_tuple(cache) 249 | 250 | dnext_a = dprev_a 251 | print 'layer:', i+1, "bottom gradient's shape and abs sum:", dnext_a.shape, np.sum(np.absolute(dnext_a)) 252 | raw_input() 253 | 254 | dX = dnext_a 255 | return dX, grads 256 | 257 | 258 | def loss(self, X, y=None): 259 | """ 260 | Classification loss used to train the network. 261 | 262 | Inputs: 263 | - X: Array of data, of shape (N, 3, 64, 64) 264 | - y: Array of labels, of shape (N,) 265 | 266 | If y is None, then run a test-time forward pass and return: 267 | - scores: Array of shape (N, 100) giving class scores. 268 | 269 | If y is not None, then run a training-time forward and backward pass and 270 | return a tuple of: 271 | - loss: Scalar giving loss 272 | - grads: Dictionary of gradients, with the same keys as self.params. 273 | """ 274 | # Note that we implement this by just caling self.forward and self.backward 275 | mode = 'test' if y is None else 'train' 276 | scores, cache = self.forward(X, mode=mode) 277 | if mode == 'test': 278 | return scores 279 | loss, dscores = softmax_loss(scores, y) 280 | 281 | dX, grads = self.backward(dscores, cache) 282 | return loss, grads 283 | 284 | def calc_loss(self, X, y): 285 | scores, cache = self.forward(X, mode='test') 286 | loss, dscores = softmax_loss(scores, y) 287 | dX, _ = self.backward(dscores, cache) 288 | return loss, np.argmax(scores), dX 289 | -------------------------------------------------------------------------------- /assignment3/cs231n/classifiers/rnn.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | 3 | from cs231n.layers import * 4 | from cs231n.rnn_layers import * 5 | 6 | 7 | class CaptioningRNN(object): 8 | """ 9 | A CaptioningRNN produces captions from image features using a recurrent 10 | neural network. 
11 |
12 | The RNN receives input vectors of size D, has a vocab size of V, works on
13 | sequences of length T, has an RNN hidden dimension of H, uses word vectors
14 | of dimension W, and operates on minibatches of size N.
15 |
16 | Note that we don't use any regularization for the CaptioningRNN.
17 | """
18 |
19 | def __init__(self, word_to_idx, input_dim=512, wordvec_dim=128,
20 | hidden_dim=128, cell_type='rnn', dtype=np.float32):
21 | """
22 | Construct a new CaptioningRNN instance.
23 |
24 | Inputs:
25 | - word_to_idx: A dictionary giving the vocabulary. It contains V entries,
26 | and maps each string to a unique integer in the range [0, V).
27 | - input_dim: Dimension D of input image feature vectors.
28 | - wordvec_dim: Dimension W of word vectors.
29 | - hidden_dim: Dimension H for the hidden state of the RNN.
30 | - cell_type: What type of RNN to use; either 'rnn' or 'lstm'.
31 | - dtype: numpy datatype to use; use float32 for training and float64 for
32 | numeric gradient checking.
33 | """
34 | if cell_type not in {'rnn', 'lstm'}:
35 | raise ValueError('Invalid cell_type "%s"' % cell_type)
36 |
37 | self.cell_type = cell_type
38 | self.dtype = dtype
39 | self.word_to_idx = word_to_idx
40 | self.idx_to_word = {i: w for w, i in word_to_idx.iteritems()}
41 | self.params = {}
42 |
43 | vocab_size = len(word_to_idx)
44 |
45 | self._null = word_to_idx['<NULL>']
46 | self._start = word_to_idx.get('<START>', None)
47 | self._end = word_to_idx.get('<END>', None)
48 |
49 | # Initialize word vectors
50 | self.params['W_embed'] = np.random.randn(vocab_size, wordvec_dim)
51 | self.params['W_embed'] /= 100
52 |
53 | # Initialize CNN -> hidden state projection parameters
54 | self.params['W_proj'] = np.random.randn(input_dim, hidden_dim)
55 | self.params['W_proj'] /= np.sqrt(input_dim)
56 | self.params['b_proj'] = np.zeros(hidden_dim)
57 |
58 | # Initialize parameters for the RNN
59 | dim_mul = {'lstm': 4, 'rnn': 1}[cell_type]
60 | self.params['Wx'] = np.random.randn(wordvec_dim, dim_mul * hidden_dim)
61 | self.params['Wx'] /= np.sqrt(wordvec_dim)
62 | self.params['Wh'] = np.random.randn(hidden_dim, dim_mul * hidden_dim)
63 | self.params['Wh'] /= np.sqrt(hidden_dim)
64 | self.params['b'] = np.zeros(dim_mul * hidden_dim)
65 |
66 | # Initialize output to vocab weights
67 | self.params['W_vocab'] = np.random.randn(hidden_dim, vocab_size)
68 | self.params['W_vocab'] /= np.sqrt(hidden_dim)
69 | self.params['b_vocab'] = np.zeros(vocab_size)
70 |
71 | # Cast parameters to correct dtype
72 | for k, v in self.params.iteritems():
73 | self.params[k] = v.astype(self.dtype)
74 |
75 |
76 | def loss(self, features, captions):
77 | """
78 | Compute training-time loss for the RNN. We input image features and
79 | ground-truth captions for those images, and use an RNN (or LSTM) to compute
80 | loss and gradients on all parameters.
81 |
82 | Inputs:
83 | - features: Input image features, of shape (N, D)
84 | - captions: Ground-truth captions; an integer array of shape (N, T) where
85 | each element is in the range 0 <= y[i, t] < V
86 |
87 | Returns a tuple of:
88 | - loss: Scalar loss
89 | - grads: Dictionary of gradients parallel to self.params
90 | """
91 | # Cut captions into two pieces: captions_in has everything but the last word
92 | # and will be input to the RNN; captions_out has everything but the first
93 | # word and this is what we will expect the RNN to generate. These are offset
94 | # by one relative to each other because the RNN should produce word (t+1)
95 | # after receiving word t. The first element of captions_in will be the <START>
96 | # token, and the first element of captions_out will be the first word.
97 | captions_in = captions[:, :-1]
98 | captions_out = captions[:, 1:]
99 |
100 | # You'll need this
101 | mask = (captions_out != self._null)
102 | # Note: I suspect the original assignment's handling of the <START> and
103 | # <END> tokens here is incomplete; see the note and fix further below.
104 |
105 | # Weight and bias for the affine transform from image features to initial
106 | # hidden state
107 | W_proj, b_proj = self.params['W_proj'], self.params['b_proj']
108 |
109 | # Word embedding matrix
110 | W_embed = self.params['W_embed']
111 |
112 | # Input-to-hidden, hidden-to-hidden, and biases for the RNN
113 | Wx, Wh, b = self.params['Wx'], self.params['Wh'], self.params['b']
114 |
115 | # Weight and bias for the hidden-to-vocab transformation.
116 | W_vocab, b_vocab = self.params['W_vocab'], self.params['b_vocab']
117 |
118 | loss, grads = 0.0, {}
119 | ############################################################################
120 | # TODO: Implement the forward and backward passes for the CaptioningRNN. #
121 | # In the forward pass you will need to do the following: #
122 | # (1) Use an affine transformation to compute the initial hidden state #
123 | # from the image features. This should produce an array of shape (N, H)#
124 | # (2) Use a word embedding layer to transform the words in captions_in #
125 | # from indices to vectors, giving an array of shape (N, T, W). #
126 | # (3) Use either a vanilla RNN or LSTM (depending on self.cell_type) to #
127 | # process the sequence of input word vectors and produce hidden state #
128 | # vectors for all timesteps, producing an array of shape (N, T, H). #
129 | # (4) Use a (temporal) affine transformation to compute scores over the #
130 | # vocabulary at every timestep using the hidden states, giving an #
131 | # array of shape (N, T, V). #
132 | # (5) Use (temporal) softmax to compute loss using captions_out, ignoring #
133 | # the points where the output word is <NULL> using the mask above. #
134 | # #
135 | # In the backward pass you will need to compute the gradient of the loss #
136 | # with respect to all model parameters. Use the loss and grads variables #
137 | # defined above to store loss and gradients; grads[k] should give the #
138 | # gradients for self.params[k]. #
139 | ############################################################################
140 | N, T = captions.shape
141 | h0, cache0 = affine_forward(features, W_proj, b_proj)
142 |
143 | """Most papers add <START> as the first input word, but this lesson
apparently did not. Therefore I added the code below.
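Concretely, the code below prepends a <START> column to captions_in,
appends an <END> column to captions_out, and extends the mask with a
column of True, so the model is explicitly trained to both begin and
end its captions.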
144 | """
145 | captions_in = np.concatenate((np.tile(self._start, (N, 1)), captions_in), 1)
146 | captions_out = np.concatenate((captions_out, np.tile(self._end, (N, 1))), 1)
147 | mask = np.concatenate((mask, np.tile(True, (N, 1))), 1)
148 |
149 | out, cache1 = word_embedding_forward(captions_in, W_embed)
150 | if self.cell_type == 'rnn':
151 | h, cache2 = rnn_forward(out, h0, Wx, Wh, b)
152 | elif self.cell_type == 'lstm':
153 | h, cache2 = lstm_forward(out, h0, Wx, Wh, b)
154 | out2, cache3 = temporal_affine_forward(h, W_vocab, b_vocab)
155 | loss, dout = temporal_softmax_loss(out2, captions_out, mask)
156 |
157 | dout, grads['W_vocab'], grads['b_vocab'] = temporal_affine_backward(dout, cache3)
158 | if self.cell_type == 'rnn':
159 | dout, dh0, grads['Wx'], grads['Wh'], grads['b'] = rnn_backward(dout, cache2)
160 | elif self.cell_type == 'lstm':
161 | dout, dh0, grads['Wx'], grads['Wh'], grads['b'] = lstm_backward(dout, cache2)
162 | grads['W_embed'] = word_embedding_backward(dout, cache1)
163 | _, grads['W_proj'], grads['b_proj'] = affine_backward(dh0, cache0)
164 | ############################################################################
165 | # END OF YOUR CODE #
166 | ############################################################################
167 |
168 | return loss, grads
169 |
170 |
171 | def sample(self, features, max_length=30):
172 | """
173 | Run a test-time forward pass for the model, sampling captions for input
174 | feature vectors.
175 |
176 | At each timestep, we embed the current word, pass it and the previous hidden
177 | state to the RNN to get the next hidden state, use the hidden state to get
178 | scores for all vocab words, and choose the word with the highest score as
179 | the next word. The initial hidden state is computed by applying an affine
180 | transform to the input image features, and the initial word is the <START>
181 | token.
182 |
183 | For LSTMs you will also have to keep track of the cell state; in that case
184 | the initial cell state should be zero.
185 |
186 | Inputs:
187 | - features: Array of input image features of shape (N, D).
188 | - max_length: Maximum length T of generated captions.
189 |
190 | Returns:
191 | - captions: Array of shape (N, max_length) giving sampled captions,
192 | where each element is an integer in the range [0, V). The first element
193 | of captions should be the first sampled word, not the <START> token.
194 | """
195 | N = features.shape[0]
196 | captions = self._null * np.ones((N, max_length), dtype=np.int32)
197 |
198 | # Unpack parameters
199 | W_proj, b_proj = self.params['W_proj'], self.params['b_proj']
200 | W_embed = self.params['W_embed']
201 | Wx, Wh, b = self.params['Wx'], self.params['Wh'], self.params['b']
202 | W_vocab, b_vocab = self.params['W_vocab'], self.params['b_vocab']
203 |
204 | ###########################################################################
205 | # TODO: Implement test-time sampling for the model. You will need to #
206 | # initialize the hidden state of the RNN by applying the learned affine #
207 | # transform to the input image features. The first word that you feed to #
208 | # the RNN should be the <START> token; its value is stored in the #
209 | # variable self._start. At each timestep you will need to: #
210 | # (1) Embed the previous word using the learned word embeddings #
211 | # (2) Make an RNN step using the previous hidden state and the embedded #
212 | # current word to get the next hidden state. #
213 | # (3) Apply the learned affine transformation to the next hidden state to #
214 | # get scores for all words in the vocabulary #
215 | # (4) Select the word with the highest score as the next word, writing it #
216 | # to the appropriate slot in the captions variable #
217 | # #
218 | # For simplicity, you do not need to stop generating after an <END> token #
219 | # is sampled, but you can if you want to. #
220 | # #
221 | # HINT: You will not be able to use the rnn_forward or lstm_forward #
222 | # functions; you'll need to call rnn_step_forward or lstm_step_forward in #
223 | # a loop. #
224 | ###########################################################################
225 | prev_h, _ = affine_forward(features, W_proj, b_proj)
226 | prev_c = np.zeros(prev_h.shape)
227 | word_index_input = [self._start] * N
228 |
229 | for i in range(max_length):
230 | word_vector_input = W_embed[word_index_input] # (N, wordvec_dim)
231 | if self.cell_type == 'rnn':
232 | next_h, _ = rnn_step_forward(word_vector_input, prev_h, Wx, Wh, b)
233 | elif self.cell_type == 'lstm':
234 | next_h, next_c, _ = lstm_step_forward(word_vector_input, prev_h, prev_c, Wx, Wh, b)
235 | prev_c = next_c
236 | prev_h = next_h # (N, H)
237 | out, _ = affine_forward(prev_h, W_vocab, b_vocab) # (N,V)
238 | word_index_input = captions[:,i] = np.argmax(out, axis=1) # (N,)
239 | ############################################################################
240 | # END OF YOUR CODE #
241 | ############################################################################
242 | return captions
243 |
--------------------------------------------------------------------------------
/assignment3/cs231n/coco_utils.py:
--------------------------------------------------------------------------------
1 | import os, json
2 | import numpy as np
3 | import h5py
4 |
5 |
6 | def load_coco_data(base_dir='cs231n/datasets/coco_captioning',
7 | max_train=None,
8 | pca_features=True):
9 | data = {}
10 | caption_file = os.path.join(base_dir, 'coco2014_captions.h5')
11 | with h5py.File(caption_file, 'r') as f:
12 | for k, v in f.iteritems():
13 | data[k] = np.asarray(v)
14 |
15 | if pca_features:
16 | train_feat_file = os.path.join(base_dir, 'train2014_vgg16_fc7_pca.h5')
17 | else:
18 | train_feat_file = os.path.join(base_dir, 'train2014_vgg16_fc7.h5')
19 | with h5py.File(train_feat_file, 'r') as f:
20 | data['train_features'] = np.asarray(f['features'])
21 |
22 | if pca_features:
23 | val_feat_file = os.path.join(base_dir, 'val2014_vgg16_fc7_pca.h5')
24 | else:
25 | val_feat_file = os.path.join(base_dir, 'val2014_vgg16_fc7.h5')
26 | with h5py.File(val_feat_file, 'r') as f:
27 | data['val_features'] = np.asarray(f['features'])
28 |
29 | dict_file = os.path.join(base_dir, 'coco2014_vocab.json')
30 | with open(dict_file, 'r') as f:
31 | dict_data = json.load(f)
32 | for k, v in dict_data.iteritems():
33 | data[k] = v
34 |
35 | train_url_file = os.path.join(base_dir, 'train2014_urls.txt')
36 | with open(train_url_file, 'r') as f:
37 | train_urls = np.asarray([line.strip() for line in f])
38 | data['train_urls'] = train_urls
39 |
40 | val_url_file = os.path.join(base_dir, 'val2014_urls.txt')
41 | with open(val_url_file, 'r') as f:
42 | val_urls = np.asarray([line.strip() for line in f])
43 | data['val_urls'] = val_urls
44 |
45 | # Maybe subsample the training data
46 | if max_train is not None:
47 | num_train = data['train_captions'].shape[0]
48 | mask = np.random.randint(num_train, size=max_train)
49 | data['train_captions'] = data['train_captions'][mask]
50 | data['train_image_idxs'] = data['train_image_idxs'][mask]
51 |
52 | return data
53 |
54 |
55 | def decode_captions(captions, idx_to_word):
56 | singleton = False
57 | if captions.ndim == 1:
58 | singleton = True
59 | captions = captions[None]
60 | decoded = []
61 | N, T = captions.shape
62 | for i in xrange(N):
63 | words = []
64 | for t in xrange(T):
65 | word = idx_to_word[captions[i, t]]
66 | if word != '<NULL>':
67 | words.append(word)
68 | if word == '<END>':
69 | break
70 | decoded.append(' '.join(words))
71 | if singleton:
72 | decoded = decoded[0]
73 | return decoded
74 |
75 |
76 | def sample_coco_minibatch(data, batch_size=100, split='train'):
77 | split_size = data['%s_captions' % split].shape[0]
78 | mask = np.random.choice(split_size, batch_size)
79 | captions = data['%s_captions' % split][mask]
80 | image_idxs = data['%s_image_idxs' % split][mask]
81 | image_features = data['%s_features' % split][image_idxs]
82 | urls = data['%s_urls' % split][image_idxs]
83 | return captions, image_features, urls
84 |
85 |
--------------------------------------------------------------------------------
/assignment3/cs231n/data_utils.py:
--------------------------------------------------------------------------------
1 | import cPickle as pickle
2 | import numpy as np
3 | import os
4 | from scipy.misc import imread
5 |
6 | def load_CIFAR_batch(filename):
7 | """ load single batch of cifar """
8 | with open(filename, 'rb') as f:
9 | datadict = pickle.load(f)
10 | X = datadict['data']
11 | Y = datadict['labels']
12 | X = X.reshape(10000, 3, 32, 32).transpose(0,2,3,1).astype("float")
13 | Y = np.array(Y)
14 | return X, Y
15 |
16 | def load_CIFAR10(ROOT):
17 | """ load all of cifar """
18 | xs = []
19 | ys = []
20 | for b in range(1,6):
21 | f = os.path.join(ROOT, 'data_batch_%d' % (b, ))
22 | X, Y = load_CIFAR_batch(f)
23 | xs.append(X)
24 | ys.append(Y)
25 | Xtr = np.concatenate(xs)
26 | Ytr = np.concatenate(ys)
27 | del X, Y
28 | Xte, Yte = load_CIFAR_batch(os.path.join(ROOT, 'test_batch'))
29 | return Xtr, Ytr, Xte, Yte
30 |
31 |
32 | def get_CIFAR10_data(num_training=49000, num_validation=1000, num_test=1000,
33 | subtract_mean=True):
34 | """
35 | Load the CIFAR-10 dataset from disk and perform preprocessing to prepare
36 | it for classifiers. These are the same steps as we used for the SVM, but
37 | condensed to a single function.
38 | """
39 | # Load the raw CIFAR-10 data
40 | cifar10_dir = 'cs231n/datasets/cifar-10-batches-py'
41 | X_train, y_train, X_test, y_test = load_CIFAR10(cifar10_dir)
42 |
43 | # Subsample the data
44 | mask = range(num_training, num_training + num_validation)
45 | X_val = X_train[mask]
46 | y_val = y_train[mask]
47 | mask = range(num_training)
48 | X_train = X_train[mask]
49 | y_train = y_train[mask]
50 | mask = range(num_test)
51 | X_test = X_test[mask]
52 | y_test = y_test[mask]
53 |
54 | # Normalize the data: subtract the mean image
55 | if subtract_mean:
56 | mean_image = np.mean(X_train, axis=0)
57 | X_train -= mean_image
58 | X_val -= mean_image
59 | X_test -= mean_image
60 |
61 | # Transpose so that channels come first
62 | X_train = X_train.transpose(0, 3, 1, 2).copy()
63 | X_val = X_val.transpose(0, 3, 1, 2).copy()
64 | X_test = X_test.transpose(0, 3, 1, 2).copy()
65 |
66 | # Package data into a dictionary
67 | return {
68 | 'X_train': X_train, 'y_train': y_train,
69 | 'X_val': X_val, 'y_val': y_val,
70 | 'X_test': X_test, 'y_test': y_test,
71 | }
72 |
73 |
74 | def load_tiny_imagenet(path, dtype=np.float32, subtract_mean=True):
75 | """
76 | Load TinyImageNet.
Each of TinyImageNet-100-A, TinyImageNet-100-B, and 77 | TinyImageNet-200 have the same directory structure, so this can be used 78 | to load any of them. 79 | 80 | Inputs: 81 | - path: String giving path to the directory to load. 82 | - dtype: numpy datatype used to load the data. 83 | - subtract_mean: Whether to subtract the mean training image. 84 | 85 | Returns: A dictionary with the following entries: 86 | - class_names: A list where class_names[i] is a list of strings giving the 87 | WordNet names for class i in the loaded dataset. 88 | - X_train: (N_tr, 3, 64, 64) array of training images 89 | - y_train: (N_tr,) array of training labels 90 | - X_val: (N_val, 3, 64, 64) array of validation images 91 | - y_val: (N_val,) array of validation labels 92 | - X_test: (N_test, 3, 64, 64) array of testing images. 93 | - y_test: (N_test,) array of test labels; if test labels are not available 94 | (such as in student code) then y_test will be None. 95 | - mean_image: (3, 64, 64) array giving mean training image 96 | """ 97 | # First load wnids 98 | with open(os.path.join(path, 'wnids.txt'), 'r') as f: 99 | wnids = [x.strip() for x in f] 100 | 101 | # Map wnids to integer labels 102 | wnid_to_label = {wnid: i for i, wnid in enumerate(wnids)} 103 | 104 | # Use words.txt to get names for each class 105 | with open(os.path.join(path, 'words.txt'), 'r') as f: 106 | wnid_to_words = dict(line.split('\t') for line in f) 107 | for wnid, words in wnid_to_words.iteritems(): 108 | wnid_to_words[wnid] = [w.strip() for w in words.split(',')] 109 | class_names = [wnid_to_words[wnid] for wnid in wnids] 110 | 111 | # Next load training data. 112 | X_train = [] 113 | y_train = [] 114 | for i, wnid in enumerate(wnids): 115 | if (i + 1) % 20 == 0: 116 | print 'loading training data for synset %d / %d' % (i + 1, len(wnids)) 117 | # To figure out the filenames we need to open the boxes file 118 | boxes_file = os.path.join(path, 'train', wnid, '%s_boxes.txt' % wnid) 119 | with open(boxes_file, 'r') as f: 120 | filenames = [x.split('\t')[0] for x in f] 121 | num_images = len(filenames) 122 | 123 | X_train_block = np.zeros((num_images, 3, 64, 64), dtype=dtype) 124 | y_train_block = wnid_to_label[wnid] * np.ones(num_images, dtype=np.int64) 125 | for j, img_file in enumerate(filenames): 126 | img_file = os.path.join(path, 'train', wnid, 'images', img_file) 127 | img = imread(img_file) 128 | if img.ndim == 2: 129 | ## grayscale file 130 | img.shape = (64, 64, 1) 131 | X_train_block[j] = img.transpose(2, 0, 1) 132 | X_train.append(X_train_block) 133 | y_train.append(y_train_block) 134 | 135 | # We need to concatenate all training data 136 | X_train = np.concatenate(X_train, axis=0) 137 | y_train = np.concatenate(y_train, axis=0) 138 | 139 | # Next load validation data 140 | with open(os.path.join(path, 'val', 'val_annotations.txt'), 'r') as f: 141 | img_files = [] 142 | val_wnids = [] 143 | for line in f: 144 | img_file, wnid = line.split('\t')[:2] 145 | img_files.append(img_file) 146 | val_wnids.append(wnid) 147 | num_val = len(img_files) 148 | y_val = np.array([wnid_to_label[wnid] for wnid in val_wnids]) 149 | X_val = np.zeros((num_val, 3, 64, 64), dtype=dtype) 150 | for i, img_file in enumerate(img_files): 151 | img_file = os.path.join(path, 'val', 'images', img_file) 152 | img = imread(img_file) 153 | if img.ndim == 2: 154 | img.shape = (64, 64, 1) 155 | X_val[i] = img.transpose(2, 0, 1) 156 | 157 | # Next load test images 158 | # Students won't have test labels, so we need to iterate over files in the 159 | # images 
directory. 160 | img_files = os.listdir(os.path.join(path, 'test', 'images')) 161 | X_test = np.zeros((len(img_files), 3, 64, 64), dtype=dtype) 162 | for i, img_file in enumerate(img_files): 163 | img_file = os.path.join(path, 'test', 'images', img_file) 164 | img = imread(img_file) 165 | if img.ndim == 2: 166 | img.shape = (64, 64, 1) 167 | X_test[i] = img.transpose(2, 0, 1) 168 | 169 | y_test = None 170 | y_test_file = os.path.join(path, 'test', 'test_annotations.txt') 171 | if os.path.isfile(y_test_file): 172 | with open(y_test_file, 'r') as f: 173 | img_file_to_wnid = {} 174 | for line in f: 175 | line = line.split('\t') 176 | img_file_to_wnid[line[0]] = line[1] 177 | y_test = [wnid_to_label[img_file_to_wnid[img_file]] for img_file in img_files] 178 | y_test = np.array(y_test) 179 | 180 | mean_image = X_train.mean(axis=0) 181 | if subtract_mean: 182 | X_train -= mean_image[None] 183 | X_val -= mean_image[None] 184 | X_test -= mean_image[None] 185 | 186 | return { 187 | 'class_names': class_names, 188 | 'X_train': X_train, 189 | 'y_train': y_train, 190 | 'X_val': X_val, 191 | 'y_val': y_val, 192 | 'X_test': X_test, 193 | 'y_test': y_test, 194 | 'class_names': class_names, 195 | 'mean_image': mean_image, 196 | } 197 | 198 | 199 | def load_models(models_dir): 200 | """ 201 | Load saved models from disk. This will attempt to unpickle all files in a 202 | directory; any files that give errors on unpickling (such as README.txt) will 203 | be skipped. 204 | 205 | Inputs: 206 | - models_dir: String giving the path to a directory containing model files. 207 | Each model file is a pickled dictionary with a 'model' field. 208 | 209 | Returns: 210 | A dictionary mapping model file names to models. 211 | """ 212 | models = {} 213 | for model_file in os.listdir(models_dir): 214 | with open(os.path.join(models_dir, model_file), 'rb') as f: 215 | try: 216 | models[model_file] = pickle.load(f)['model'] 217 | except pickle.UnpicklingError: 218 | continue 219 | return models 220 | -------------------------------------------------------------------------------- /assignment3/cs231n/datasets/get_coco_captioning.sh: -------------------------------------------------------------------------------- 1 | wget "http://cs231n.stanford.edu/coco_captioning.zip" 2 | unzip coco_captioning.zip 3 | rm coco_captioning.zip 4 | -------------------------------------------------------------------------------- /assignment3/cs231n/datasets/get_pretrained_model.sh: -------------------------------------------------------------------------------- 1 | wget http://cs231n.stanford.edu/pretrained_model.h5 2 | -------------------------------------------------------------------------------- /assignment3/cs231n/datasets/get_tiny_imagenet_a.sh: -------------------------------------------------------------------------------- 1 | wget http://cs231n.stanford.edu/tiny-imagenet-100-A.zip 2 | unzip tiny-imagenet-100-A.zip 3 | rm tiny-imagenet-100-A.zip 4 | -------------------------------------------------------------------------------- /assignment3/cs231n/fast_layers.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | try: 3 | from cs231n.im2col_cython import col2im_cython, im2col_cython 4 | from cs231n.im2col_cython import col2im_6d_cython 5 | except ImportError: 6 | print 'run the following from the cs231n directory and try again:' 7 | print 'python setup.py build_ext --inplace' 8 | print 'You may also need to restart your iPython kernel' 9 | 10 | from cs231n.im2col import * 11 | 12 
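# The im2col trick used throughout this file: every receptive field of the
# (padded) input is unrolled into one column of a large matrix, turning the
# whole convolution into a single matrix multiply with the reshaped filter
# bank. A shape sketch (assuming pad and stride divide evenly):
#
#   out_h = (H + 2 * pad - filter_height) / stride + 1
#   out_w = (W + 2 * pad - filter_width) / stride + 1
#   x_cols: (C * filter_height * filter_width, N * out_h * out_w)
#   out = w.reshape(F, -1).dot(x_cols) + b.reshape(-1, 1)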
| 13 | def conv_forward_im2col(x, w, b, conv_param): 14 | """ 15 | A fast implementation of the forward pass for a convolutional layer 16 | based on im2col and col2im. 17 | """ 18 | N, C, H, W = x.shape 19 | num_filters, _, filter_height, filter_width = w.shape 20 | stride, pad = conv_param['stride'], conv_param['pad'] 21 | 22 | # Check dimensions 23 | assert (W + 2 * pad - filter_width) % stride == 0, 'width does not work' 24 | assert (H + 2 * pad - filter_height) % stride == 0, 'height does not work' 25 | 26 | # Create output 27 | out_height = (H + 2 * pad - filter_height) / stride + 1 28 | out_width = (W + 2 * pad - filter_width) / stride + 1 29 | out = np.zeros((N, num_filters, out_height, out_width), dtype=x.dtype) 30 | 31 | # x_cols = im2col_indices(x, w.shape[2], w.shape[3], pad, stride) 32 | x_cols = im2col_cython(x, w.shape[2], w.shape[3], pad, stride) 33 | res = w.reshape((w.shape[0], -1)).dot(x_cols) + b.reshape(-1, 1) 34 | 35 | out = res.reshape(w.shape[0], out.shape[2], out.shape[3], x.shape[0]) 36 | out = out.transpose(3, 0, 1, 2) 37 | 38 | cache = (x, w, b, conv_param, x_cols) 39 | return out, cache 40 | 41 | 42 | def conv_forward_strides(x, w, b, conv_param): 43 | N, C, H, W = x.shape 44 | F, _, HH, WW = w.shape 45 | stride, pad = conv_param['stride'], conv_param['pad'] 46 | 47 | # Check dimensions 48 | #assert (W + 2 * pad - WW) % stride == 0, 'width does not work' 49 | #assert (H + 2 * pad - HH) % stride == 0, 'height does not work' 50 | 51 | # Pad the input 52 | p = pad 53 | x_padded = np.pad(x, ((0, 0), (0, 0), (p, p), (p, p)), mode='constant') 54 | 55 | # Figure out output dimensions 56 | H += 2 * pad 57 | W += 2 * pad 58 | out_h = (H - HH) / stride + 1 59 | out_w = (W - WW) / stride + 1 60 | 61 | # Perform an im2col operation by picking clever strides 62 | shape = (C, HH, WW, N, out_h, out_w) 63 | strides = (H * W, W, 1, C * H * W, stride * W, stride) 64 | strides = x.itemsize * np.array(strides) 65 | x_stride = np.lib.stride_tricks.as_strided(x_padded, 66 | shape=shape, strides=strides) 67 | x_cols = np.ascontiguousarray(x_stride) 68 | x_cols.shape = (C * HH * WW, N * out_h * out_w) 69 | 70 | # Now all our convolutions are a big matrix multiply 71 | res = w.reshape(F, -1).dot(x_cols) + b.reshape(-1, 1) 72 | 73 | # Reshape the output 74 | res.shape = (F, N, out_h, out_w) 75 | out = res.transpose(1, 0, 2, 3) 76 | 77 | # Be nice and return a contiguous array 78 | # The old version of conv_forward_fast doesn't do this, so for a fair 79 | # comparison we won't either 80 | out = np.ascontiguousarray(out) 81 | 82 | cache = (x, w, b, conv_param, x_cols) 83 | return out, cache 84 | 85 | 86 | def conv_backward_strides(dout, cache): 87 | x, w, b, conv_param, x_cols = cache 88 | stride, pad = conv_param['stride'], conv_param['pad'] 89 | 90 | N, C, H, W = x.shape 91 | F, _, HH, WW = w.shape 92 | _, _, out_h, out_w = dout.shape 93 | 94 | db = np.sum(dout, axis=(0, 2, 3)) 95 | 96 | dout_reshaped = dout.transpose(1, 0, 2, 3).reshape(F, -1) 97 | dw = dout_reshaped.dot(x_cols.T).reshape(w.shape) 98 | 99 | dx_cols = w.reshape(F, -1).T.dot(dout_reshaped) 100 | dx_cols.shape = (C, HH, WW, N, out_h, out_w) 101 | dx = col2im_6d_cython(dx_cols, N, C, H, W, HH, WW, pad, stride) 102 | 103 | return dx, dw, db 104 | 105 | 106 | def conv_backward_im2col(dout, cache): 107 | """ 108 | A fast implementation of the backward pass for a convolutional layer 109 | based on im2col and col2im. 
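In column space the forward pass was res = w_row.dot(x_cols) + b, so the
gradients follow from ordinary matrix calculus: dw_row = dout_row.dot(x_cols.T),
dx_cols = w_row.T.dot(dout_row), and db sums dout over every output position;
col2im then scatters dx_cols back into image layout, accumulating overlaps.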
110 | """ 111 | x, w, b, conv_param, x_cols = cache 112 | stride, pad = conv_param['stride'], conv_param['pad'] 113 | 114 | db = np.sum(dout, axis=(0, 2, 3)) 115 | 116 | num_filters, _, filter_height, filter_width = w.shape 117 | dout_reshaped = dout.transpose(1, 2, 3, 0).reshape(num_filters, -1) 118 | dw = dout_reshaped.dot(x_cols.T).reshape(w.shape) 119 | 120 | dx_cols = w.reshape(num_filters, -1).T.dot(dout_reshaped) 121 | # dx = col2im_indices(dx_cols, x.shape, filter_height, filter_width, pad, stride) 122 | dx = col2im_cython(dx_cols, x.shape[0], x.shape[1], x.shape[2], x.shape[3], 123 | filter_height, filter_width, pad, stride) 124 | 125 | return dx, dw, db 126 | 127 | 128 | conv_forward_fast = conv_forward_strides 129 | conv_backward_fast = conv_backward_strides 130 | 131 | 132 | def max_pool_forward_fast(x, pool_param): 133 | """ 134 | A fast implementation of the forward pass for a max pooling layer. 135 | 136 | This chooses between the reshape method and the im2col method. If the pooling 137 | regions are square and tile the input image, then we can use the reshape 138 | method which is very fast. Otherwise we fall back on the im2col method, which 139 | is not much faster than the naive method. 140 | """ 141 | N, C, H, W = x.shape 142 | pool_height, pool_width = pool_param['pool_height'], pool_param['pool_width'] 143 | stride = pool_param['stride'] 144 | 145 | same_size = pool_height == pool_width == stride 146 | tiles = H % pool_height == 0 and W % pool_width == 0 147 | if same_size and tiles: 148 | out, reshape_cache = max_pool_forward_reshape(x, pool_param) 149 | cache = ('reshape', reshape_cache) 150 | else: 151 | out, im2col_cache = max_pool_forward_im2col(x, pool_param) 152 | cache = ('im2col', im2col_cache) 153 | return out, cache 154 | 155 | 156 | def max_pool_backward_fast(dout, cache): 157 | """ 158 | A fast implementation of the backward pass for a max pooling layer. 159 | 160 | This switches between the reshape method an the im2col method depending on 161 | which method was used to generate the cache. 162 | """ 163 | method, real_cache = cache 164 | if method == 'reshape': 165 | return max_pool_backward_reshape(dout, real_cache) 166 | elif method == 'im2col': 167 | return max_pool_backward_im2col(dout, real_cache) 168 | else: 169 | raise ValueError('Unrecognized method "%s"' % method) 170 | 171 | 172 | def max_pool_forward_reshape(x, pool_param): 173 | """ 174 | A fast implementation of the forward pass for the max pooling layer that uses 175 | some clever reshaping. 176 | 177 | This can only be used for square pooling regions that tile the input. 178 | """ 179 | N, C, H, W = x.shape 180 | pool_height, pool_width = pool_param['pool_height'], pool_param['pool_width'] 181 | stride = pool_param['stride'] 182 | assert pool_height == pool_width == stride, 'Invalid pool params' 183 | assert H % pool_height == 0 184 | assert W % pool_height == 0 185 | x_reshaped = x.reshape(N, C, H / pool_height, pool_height, 186 | W / pool_width, pool_width) 187 | out = x_reshaped.max(axis=3).max(axis=4) 188 | 189 | cache = (x, x_reshaped, out) 190 | return out, cache 191 | 192 | 193 | def max_pool_backward_reshape(dout, cache): 194 | """ 195 | A fast implementation of the backward pass for the max pooling layer that 196 | uses some clever broadcasting and reshaping. 197 | 198 | This can only be used if the forward pass was computed using 199 | max_pool_forward_reshape. 
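For example, 2x2 pooling with stride 2 on a (N, C, 4, 4) input views it as
(N, C, 2, 2, 2, 2), takes the max over the two pooling axes to produce a
(N, C, 2, 2) output, and the backward pass broadcasts dout back through the
same view onto the argmax positions.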
200 | 201 | NOTE: If there are multiple argmaxes, this method will assign gradient to 202 | ALL argmax elements of the input rather than picking one. In this case the 203 | gradient will actually be incorrect. However this is unlikely to occur in 204 | practice, so it shouldn't matter much. One possible solution is to split the 205 | upstream gradient equally among all argmax elements; this should result in a 206 | valid subgradient. You can make this happen by uncommenting the line below; 207 | however this results in a significant performance penalty (about 40% slower) 208 | and is unlikely to matter in practice so we don't do it. 209 | """ 210 | x, x_reshaped, out = cache 211 | 212 | dx_reshaped = np.zeros_like(x_reshaped) 213 | out_newaxis = out[:, :, :, np.newaxis, :, np.newaxis] 214 | mask = (x_reshaped == out_newaxis) 215 | dout_newaxis = dout[:, :, :, np.newaxis, :, np.newaxis] 216 | dout_broadcast, _ = np.broadcast_arrays(dout_newaxis, dx_reshaped) 217 | dx_reshaped[mask] = dout_broadcast[mask] 218 | dx_reshaped /= np.sum(mask, axis=(3, 5), keepdims=True) 219 | dx = dx_reshaped.reshape(x.shape) 220 | 221 | return dx 222 | 223 | 224 | def max_pool_forward_im2col(x, pool_param): 225 | """ 226 | An implementation of the forward pass for max pooling based on im2col. 227 | 228 | This isn't much faster than the naive version, so it should be avoided if 229 | possible. 230 | """ 231 | N, C, H, W = x.shape 232 | pool_height, pool_width = pool_param['pool_height'], pool_param['pool_width'] 233 | stride = pool_param['stride'] 234 | 235 | assert (H - pool_height) % stride == 0, 'Invalid height' 236 | assert (W - pool_width) % stride == 0, 'Invalid width' 237 | 238 | out_height = (H - pool_height) / stride + 1 239 | out_width = (W - pool_width) / stride + 1 240 | 241 | x_split = x.reshape(N * C, 1, H, W) 242 | x_cols = im2col(x_split, pool_height, pool_width, padding=0, stride=stride) 243 | x_cols_argmax = np.argmax(x_cols, axis=0) 244 | x_cols_max = x_cols[x_cols_argmax, np.arange(x_cols.shape[1])] 245 | out = x_cols_max.reshape(out_height, out_width, N, C).transpose(2, 3, 0, 1) 246 | 247 | cache = (x, x_cols, x_cols_argmax, pool_param) 248 | return out, cache 249 | 250 | 251 | def max_pool_backward_im2col(dout, cache): 252 | """ 253 | An implementation of the backward pass for max pooling based on im2col. 254 | 255 | This isn't much faster than the naive version, so it should be avoided if 256 | possible. 
257 | """ 258 | x, x_cols, x_cols_argmax, pool_param = cache 259 | N, C, H, W = x.shape 260 | pool_height, pool_width = pool_param['pool_height'], pool_param['pool_width'] 261 | stride = pool_param['stride'] 262 | 263 | dout_reshaped = dout.transpose(2, 3, 0, 1).flatten() 264 | dx_cols = np.zeros_like(x_cols) 265 | dx_cols[x_cols_argmax, np.arange(dx_cols.shape[1])] = dout_reshaped 266 | dx = col2im_indices(dx_cols, (N * C, 1, H, W), pool_height, pool_width, 267 | padding=0, stride=stride) 268 | dx = dx.reshape(x.shape) 269 | 270 | return dx 271 | -------------------------------------------------------------------------------- /assignment3/cs231n/gradient_check.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | from random import randrange 3 | 4 | def eval_numerical_gradient(f, x, verbose=True, h=0.00001): 5 | """ 6 | a naive implementation of numerical gradient of f at x 7 | - f should be a function that takes a single argument 8 | - x is the point (numpy array) to evaluate the gradient at 9 | """ 10 | 11 | fx = f(x) # evaluate function value at original point 12 | grad = np.zeros_like(x) 13 | # iterate over all indexes in x 14 | it = np.nditer(x, flags=['multi_index'], op_flags=['readwrite']) 15 | while not it.finished: 16 | 17 | # evaluate function at x+h 18 | ix = it.multi_index 19 | oldval = x[ix] 20 | x[ix] = oldval + h # increment by h 21 | fxph = f(x) # evalute f(x + h) 22 | x[ix] = oldval - h 23 | fxmh = f(x) # evaluate f(x - h) 24 | x[ix] = oldval # restore 25 | 26 | # compute the partial derivative with centered formula 27 | grad[ix] = (fxph - fxmh) / (2 * h) # the slope 28 | if verbose: 29 | print ix, grad[ix] 30 | it.iternext() # step to next dimension 31 | 32 | return grad 33 | 34 | 35 | def eval_numerical_gradient_array(f, x, df, h=1e-5): 36 | """ 37 | Evaluate a numeric gradient for a function that accepts a numpy 38 | array and returns a numpy array. 39 | """ 40 | grad = np.zeros_like(x) 41 | it = np.nditer(x, flags=['multi_index'], op_flags=['readwrite']) 42 | while not it.finished: 43 | ix = it.multi_index 44 | 45 | oldval = x[ix] 46 | x[ix] = oldval + h 47 | pos = f(x).copy() 48 | x[ix] = oldval - h 49 | neg = f(x).copy() 50 | x[ix] = oldval 51 | 52 | grad[ix] = np.sum((pos - neg) * df) / (2 * h) 53 | it.iternext() 54 | return grad 55 | 56 | 57 | def eval_numerical_gradient_blobs(f, inputs, output, h=1e-5): 58 | """ 59 | Compute numeric gradients for a function that operates on input 60 | and output blobs. 61 | 62 | We assume that f accepts several input blobs as arguments, followed by a blob 63 | into which outputs will be written. For example, f might be called like this: 64 | 65 | f(x, w, out) 66 | 67 | where x and w are input Blobs, and the result of f will be written to out. 
68 | 69 | Inputs: 70 | - f: function 71 | - inputs: tuple of input blobs 72 | - output: output blob 73 | - h: step size 74 | """ 75 | numeric_diffs = [] 76 | for input_blob in inputs: 77 | diff = np.zeros_like(input_blob.diffs) 78 | it = np.nditer(input_blob.vals, flags=['multi_index'], 79 | op_flags=['readwrite']) 80 | while not it.finished: 81 | idx = it.multi_index 82 | orig = input_blob.vals[idx] 83 | 84 | input_blob.vals[idx] = orig + h 85 | f(*(inputs + (output,))) 86 | pos = np.copy(output.vals) 87 | input_blob.vals[idx] = orig - h 88 | f(*(inputs + (output,))) 89 | neg = np.copy(output.vals) 90 | input_blob.vals[idx] = orig 91 | 92 | diff[idx] = np.sum((pos - neg) * output.diffs) / (2.0 * h) 93 | 94 | it.iternext() 95 | numeric_diffs.append(diff) 96 | return numeric_diffs 97 | 98 | 99 | def eval_numerical_gradient_net(net, inputs, output, h=1e-5): 100 | return eval_numerical_gradient_blobs(lambda *args: net.forward(), 101 | inputs, output, h=h) 102 | 103 | 104 | def grad_check_sparse(f, x, analytic_grad, num_checks=10, h=1e-5): 105 | """ 106 | sample a few random elements and only return numerical 107 | in this dimensions. 108 | """ 109 | 110 | for i in xrange(num_checks): 111 | ix = tuple([randrange(m) for m in x.shape]) 112 | 113 | oldval = x[ix] 114 | x[ix] = oldval + h # increment by h 115 | fxph = f(x) # evaluate f(x + h) 116 | x[ix] = oldval - h # increment by h 117 | fxmh = f(x) # evaluate f(x - h) 118 | x[ix] = oldval # reset 119 | 120 | grad_numerical = (fxph - fxmh) / (2 * h) 121 | grad_analytic = analytic_grad[ix] 122 | rel_error = abs(grad_numerical - grad_analytic) / (abs(grad_numerical) + abs(grad_analytic)) 123 | print 'numerical: %f analytic: %f, relative error: %e' % (grad_numerical, grad_analytic, rel_error) 124 | 125 | -------------------------------------------------------------------------------- /assignment3/cs231n/im2col.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | 3 | 4 | def get_im2col_indices(x_shape, field_height, field_width, padding=1, stride=1): 5 | # First figure out what the size of the output should be 6 | N, C, H, W = x_shape 7 | assert (H + 2 * padding - field_height) % stride == 0 8 | assert (W + 2 * padding - field_height) % stride == 0 9 | out_height = (H + 2 * padding - field_height) / stride + 1 10 | out_width = (W + 2 * padding - field_width) / stride + 1 11 | 12 | i0 = np.repeat(np.arange(field_height), field_width) 13 | i0 = np.tile(i0, C) 14 | i1 = stride * np.repeat(np.arange(out_height), out_width) 15 | j0 = np.tile(np.arange(field_width), field_height * C) 16 | j1 = stride * np.tile(np.arange(out_width), out_height) 17 | i = i0.reshape(-1, 1) + i1.reshape(1, -1) 18 | j = j0.reshape(-1, 1) + j1.reshape(1, -1) 19 | 20 | k = np.repeat(np.arange(C), field_height * field_width).reshape(-1, 1) 21 | 22 | return (k, i, j) 23 | 24 | 25 | def im2col_indices(x, field_height, field_width, padding=1, stride=1): 26 | """ An implementation of im2col based on some fancy indexing """ 27 | # Zero-pad the input 28 | p = padding 29 | x_padded = np.pad(x, ((0, 0), (0, 0), (p, p), (p, p)), mode='constant') 30 | 31 | k, i, j = get_im2col_indices(x.shape, field_height, field_width, padding, 32 | stride) 33 | 34 | cols = x_padded[:, k, i, j] 35 | C = x.shape[1] 36 | cols = cols.transpose(1, 2, 0).reshape(field_height * field_width * C, -1) 37 | return cols 38 | 39 | 40 | def col2im_indices(cols, x_shape, field_height=3, field_width=3, padding=1, 41 | stride=1): 42 | """ An implementation of 
col2im based on fancy indexing and np.add.at """ 43 | N, C, H, W = x_shape 44 | H_padded, W_padded = H + 2 * padding, W + 2 * padding 45 | x_padded = np.zeros((N, C, H_padded, W_padded), dtype=cols.dtype) 46 | k, i, j = get_im2col_indices(x_shape, field_height, field_width, padding, 47 | stride) 48 | cols_reshaped = cols.reshape(C * field_height * field_width, -1, N) 49 | cols_reshaped = cols_reshaped.transpose(2, 0, 1) 50 | np.add.at(x_padded, (slice(None), k, i, j), cols_reshaped) 51 | if padding == 0: 52 | return x_padded 53 | return x_padded[:, :, padding:-padding, padding:-padding] 54 | 55 | pass 56 | -------------------------------------------------------------------------------- /assignment3/cs231n/im2col_cython.pyx: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | cimport numpy as np 3 | cimport cython 4 | 5 | # DTYPE = np.float64 6 | # ctypedef np.float64_t DTYPE_t 7 | 8 | ctypedef fused DTYPE_t: 9 | np.float32_t 10 | np.float64_t 11 | 12 | def im2col_cython(np.ndarray[DTYPE_t, ndim=4] x, int field_height, 13 | int field_width, int padding, int stride): 14 | cdef int N = x.shape[0] 15 | cdef int C = x.shape[1] 16 | cdef int H = x.shape[2] 17 | cdef int W = x.shape[3] 18 | 19 | cdef int HH = (H + 2 * padding - field_height) / stride + 1 20 | cdef int WW = (W + 2 * padding - field_width) / stride + 1 21 | 22 | cdef int p = padding 23 | cdef np.ndarray[DTYPE_t, ndim=4] x_padded = np.pad(x, 24 | ((0, 0), (0, 0), (p, p), (p, p)), mode='constant') 25 | 26 | cdef np.ndarray[DTYPE_t, ndim=2] cols = np.zeros( 27 | (C * field_height * field_width, N * HH * WW), 28 | dtype=x.dtype) 29 | 30 | # Moving the inner loop to a C function with no bounds checking works, but does 31 | # not seem to help performance in any measurable way. 32 | 33 | im2col_cython_inner(cols, x_padded, N, C, H, W, HH, WW, 34 | field_height, field_width, padding, stride) 35 | return cols 36 | 37 | 38 | @cython.boundscheck(False) 39 | cdef int im2col_cython_inner(np.ndarray[DTYPE_t, ndim=2] cols, 40 | np.ndarray[DTYPE_t, ndim=4] x_padded, 41 | int N, int C, int H, int W, int HH, int WW, 42 | int field_height, int field_width, int padding, int stride) except? -1: 43 | cdef int c, ii, jj, row, yy, xx, i, col 44 | 45 | for c in range(C): 46 | for yy in range(HH): 47 | for xx in range(WW): 48 | for ii in range(field_height): 49 | for jj in range(field_width): 50 | row = c * field_width * field_height + ii * field_height + jj 51 | for i in range(N): 52 | col = yy * WW * N + xx * N + i 53 | cols[row, col] = x_padded[i, c, stride * yy + ii, stride * xx + jj] 54 | 55 | 56 | 57 | def col2im_cython(np.ndarray[DTYPE_t, ndim=2] cols, int N, int C, int H, int W, 58 | int field_height, int field_width, int padding, int stride): 59 | cdef np.ndarray x = np.empty((N, C, H, W), dtype=cols.dtype) 60 | cdef int HH = (H + 2 * padding - field_height) / stride + 1 61 | cdef int WW = (W + 2 * padding - field_width) / stride + 1 62 | cdef np.ndarray[DTYPE_t, ndim=4] x_padded = np.zeros((N, C, H + 2 * padding, W + 2 * padding), 63 | dtype=cols.dtype) 64 | 65 | # Moving the inner loop to a C-function with no bounds checking improves 66 | # performance quite a bit for col2im. 
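# Unlike im2col, col2im is a scatter with accumulation: overlapping
# receptive fields map many columns onto the same padded-input pixel, so
# every write in the inner loop is a +=. The pure-numpy equivalent,
# np.add.at, tends to be much slower for this access pattern.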
67 | col2im_cython_inner(cols, x_padded, N, C, H, W, HH, WW, 68 | field_height, field_width, padding, stride) 69 | if padding > 0: 70 | return x_padded[:, :, padding:-padding, padding:-padding] 71 | return x_padded 72 | 73 | 74 | @cython.boundscheck(False) 75 | cdef int col2im_cython_inner(np.ndarray[DTYPE_t, ndim=2] cols, 76 | np.ndarray[DTYPE_t, ndim=4] x_padded, 77 | int N, int C, int H, int W, int HH, int WW, 78 | int field_height, int field_width, int padding, int stride) except? -1: 79 | cdef int c, ii, jj, row, yy, xx, i, col 80 | 81 | for c in range(C): 82 | for ii in range(field_height): 83 | for jj in range(field_width): 84 | row = c * field_width * field_height + ii * field_height + jj 85 | for yy in range(HH): 86 | for xx in range(WW): 87 | for i in range(N): 88 | col = yy * WW * N + xx * N + i 89 | x_padded[i, c, stride * yy + ii, stride * xx + jj] += cols[row, col] 90 | 91 | 92 | @cython.boundscheck(False) 93 | @cython.wraparound(False) 94 | cdef col2im_6d_cython_inner(np.ndarray[DTYPE_t, ndim=6] cols, 95 | np.ndarray[DTYPE_t, ndim=4] x_padded, 96 | int N, int C, int H, int W, int HH, int WW, 97 | int out_h, int out_w, int pad, int stride): 98 | 99 | cdef int c, hh, ww, n, h, w 100 | for n in range(N): 101 | for c in range(C): 102 | for hh in range(HH): 103 | for ww in range(WW): 104 | for h in range(out_h): 105 | for w in range(out_w): 106 | x_padded[n, c, stride * h + hh, stride * w + ww] += cols[c, hh, ww, n, h, w] 107 | 108 | 109 | def col2im_6d_cython(np.ndarray[DTYPE_t, ndim=6] cols, int N, int C, int H, int W, 110 | int HH, int WW, int pad, int stride): 111 | cdef np.ndarray x = np.empty((N, C, H, W), dtype=cols.dtype) 112 | cdef int out_h = (H + 2 * pad - HH) / stride + 1 113 | cdef int out_w = (W + 2 * pad - WW) / stride + 1 114 | cdef np.ndarray[DTYPE_t, ndim=4] x_padded = np.zeros((N, C, H + 2 * pad, W + 2 * pad), 115 | dtype=cols.dtype) 116 | 117 | col2im_6d_cython_inner(cols, x_padded, N, C, H, W, HH, WW, out_h, out_w, pad, stride) 118 | 119 | if pad > 0: 120 | return x_padded[:, :, pad:-pad, pad:-pad] 121 | return x_padded 122 | -------------------------------------------------------------------------------- /assignment3/cs231n/image_utils.py: -------------------------------------------------------------------------------- 1 | import urllib2, os, tempfile 2 | 3 | import numpy as np 4 | from scipy.misc import imread 5 | 6 | from cs231n.fast_layers import conv_forward_fast 7 | 8 | 9 | """ 10 | Utility functions used for viewing and processing images. 11 | """ 12 | 13 | 14 | def blur_image(X): 15 | """ 16 | A very gentle image blurring operation, to be used as a regularizer for image 17 | generation. 
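   | 
   |   The kernel is a fixed 3x3 smoothing filter applied to each color channel 
   |   independently (no cross-channel mixing) with stride 1 and pad 1, so the 
   |   spatial size of X is preserved. A minimal usage sketch (the shapes here 
   |   are illustrative only): 
   | 
   |     X = np.random.randn(2, 3, 32, 32).astype(np.float32) 
   |     X_blur = blur_image(X) 
   |     assert X_blur.shape == X.shape 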
18 | 
19 |   Inputs: 
20 |   - X: Image data of shape (N, 3, H, W) 
21 | 
22 |   Returns: 
23 |   - X_blur: Blurred version of X, of shape (N, 3, H, W) 
24 |   """ 
25 |   w_blur = np.zeros((3, 3, 3, 3)) 
26 |   b_blur = np.zeros(3) 
27 |   blur_param = {'stride': 1, 'pad': 1} 
28 |   for i in xrange(3): 
29 |     w_blur[i, i] = np.asarray([[1, 2, 1], [2, 188, 2], [1, 2, 1]], dtype=np.float32) 
30 |   w_blur /= 200.0 
31 |   return conv_forward_fast(X, w_blur, b_blur, blur_param)[0] 
32 | 
33 | 
34 | def preprocess_image(img, mean_img, mean='image'): 
35 |   """ 
36 |   Convert to float, transpose, and subtract the mean pixel. 
37 | 
38 |   Input: 
39 |   - img: (H, W, 3) 
40 | 
41 |   Returns: 
42 |   - (1, 3, H, W) 
43 |   """ 
44 |   if mean == 'image': 
45 |     mean = mean_img 
46 |   elif mean == 'pixel': 
47 |     mean = mean_img.mean(axis=(1, 2), keepdims=True) 
48 |   elif mean == 'none': 
49 |     mean = 0 
50 |   else: 
51 |     raise ValueError('mean must be image or pixel or none') 
52 |   return img.astype(np.float32).transpose(2, 0, 1)[None] - mean 
53 | 
54 | 
55 | def deprocess_image(img, mean_img, mean='image', renorm=False): 
56 |   """ 
57 |   Add the mean pixel, transpose, and convert to uint8. 
58 | 
59 |   Input: 
60 |   - (1, 3, H, W) or (3, H, W) 
61 | 
62 |   Returns: 
63 |   - (H, W, 3) 
64 |   """ 
65 |   if mean == 'image': 
66 |     mean = mean_img 
67 |   elif mean == 'pixel': 
68 |     mean = mean_img.mean(axis=(1, 2), keepdims=True) 
69 |   elif mean == 'none': 
70 |     mean = 0 
71 |   else: 
72 |     raise ValueError('mean must be image or pixel or none') 
73 |   if img.ndim == 3: 
74 |     img = img[None] 
75 |   img = (img + mean)[0].transpose(1, 2, 0) 
76 |   if renorm: 
77 |     low, high = img.min(), img.max() 
78 |     img = 255.0 * (img - low) / (high - low) 
79 |   return img.astype(np.uint8) 
80 | 
81 | 
82 | def image_from_url(url): 
83 |   """ 
84 |   Read an image from a URL. Returns a numpy array with the pixel data. 
85 |   We write the image to a temporary file then read it back. Kinda gross. 
86 |   """ 
87 |   try: 
88 |     f = urllib2.urlopen(url) 
89 |     _, fname = tempfile.mkstemp() 
90 |     with open(fname, 'wb') as ff: 
91 |       ff.write(f.read()) 
92 |     img = imread(fname) 
93 |     os.remove(fname) 
94 |     return img 
95 |   except urllib2.HTTPError as e: 
96 |     # HTTPError is a subclass of URLError, so it must be caught first 
97 |     print 'HTTP Error: ', e.code, url 
98 |   except urllib2.URLError as e: 
99 |     print 'URL Error: ', e.reason, url 
--------------------------------------------------------------------------------
/assignment3/cs231n/layer_utils.py:
--------------------------------------------------------------------------------
1 | from cs231n.layers import * 
2 | from cs231n.fast_layers import * 
3 | 
4 | 
5 | def affine_relu_forward(x, w, b): 
6 |   """ 
7 |   Convenience layer that performs an affine transform followed by a ReLU 
8 | 
9 |   Inputs: 
10 |   - x: Input to the affine layer 
11 |   - w, b: Weights for the affine layer 
12 | 
13 |   Returns a tuple of: 
14 |   - out: Output from the ReLU 
15 |   - cache: Object to give to the backward pass 
16 |   """ 
17 |   a, fc_cache = affine_forward(x, w, b) 
18 |   out, relu_cache = relu_forward(a) 
19 |   cache = (fc_cache, relu_cache) 
20 |   return out, cache 
21 | 
22 | 
23 | def affine_relu_backward(dout, cache): 
24 |   """ 
25 |   Backward pass for the affine-relu convenience layer 
26 |   """ 
27 |   fc_cache, relu_cache = cache 
28 |   da = relu_backward(dout, relu_cache) 
29 |   dx, dw, db = affine_backward(da, fc_cache) 
30 |   return dx, dw, db 
31 | 
32 | 
33 | def affine_bn_relu_forward(x, w, b, gamma, beta, bn_param): 
34 |   """ 
35 |   Convenience layer that performs an affine transform, batch normalization, 
36 |   and ReLU. 
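   | 
   |   A minimal shape check (the sizes here are illustrative, not taken from 
   |   the assignment): 
   | 
   |     x = np.random.randn(4, 10) 
   |     w, b = np.random.randn(10, 5), np.zeros(5) 
   |     gamma, beta = np.ones(5), np.zeros(5) 
   |     out, cache = affine_bn_relu_forward(x, w, b, gamma, beta, {'mode': 'train'}) 
   |     assert out.shape == (4, 5) 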
37 | 
38 |   Inputs: 
39 |   - x: Array of shape (N, D1); input to the affine layer 
40 |   - w, b: Arrays of shape (D1, D2) and (D2,) giving the weight and bias for 
41 |     the affine transform. 
42 |   - gamma, beta: Arrays of shape (D2,) and (D2,) giving scale and shift 
43 |     parameters for batch normalization. 
44 |   - bn_param: Dictionary of parameters for batch normalization. 
45 | 
46 |   Returns: 
47 |   - out: Output from ReLU, of shape (N, D2) 
48 |   - cache: Object to give to the backward pass. 
49 |   """ 
50 |   a, fc_cache = affine_forward(x, w, b) 
51 |   a_bn, bn_cache = batchnorm_forward(a, gamma, beta, bn_param) 
52 |   out, relu_cache = relu_forward(a_bn) 
53 |   cache = (fc_cache, bn_cache, relu_cache) 
54 |   return out, cache 
55 | 
56 | 
57 | def affine_bn_relu_backward(dout, cache): 
58 |   """ 
59 |   Backward pass for the affine-batchnorm-relu convenience layer. 
60 |   """ 
61 |   fc_cache, bn_cache, relu_cache = cache 
62 |   da_bn = relu_backward(dout, relu_cache) 
63 |   da, dgamma, dbeta = batchnorm_backward(da_bn, bn_cache) 
64 |   dx, dw, db = affine_backward(da, fc_cache) 
65 |   return dx, dw, db, dgamma, dbeta 
66 | 
67 | 
68 | def conv_relu_forward(x, w, b, conv_param): 
69 |   """ 
70 |   A convenience layer that performs a convolution followed by a ReLU. 
71 | 
72 |   Inputs: 
73 |   - x: Input to the convolutional layer 
74 |   - w, b, conv_param: Weights and parameters for the convolutional layer 
75 | 
76 |   Returns a tuple of: 
77 |   - out: Output from the ReLU 
78 |   - cache: Object to give to the backward pass 
79 |   """ 
80 |   a, conv_cache = conv_forward_fast(x, w, b, conv_param) 
81 |   out, relu_cache = relu_forward(a) 
82 |   cache = (conv_cache, relu_cache) 
83 |   return out, cache 
84 | 
85 | 
86 | def conv_relu_backward(dout, cache): 
87 |   """ 
88 |   Backward pass for the conv-relu convenience layer. 
89 |   """ 
90 |   conv_cache, relu_cache = cache 
91 |   da = relu_backward(dout, relu_cache) 
92 |   dx, dw, db = conv_backward_fast(da, conv_cache) 
93 |   return dx, dw, db 
94 | 
95 | 
96 | def conv_bn_relu_forward(x, w, b, gamma, beta, conv_param, bn_param):  # conv, spatial batchnorm, then ReLU 
97 |   a, conv_cache = conv_forward_fast(x, w, b, conv_param) 
98 |   an, bn_cache = spatial_batchnorm_forward(a, gamma, beta, bn_param) 
99 |   out, relu_cache = relu_forward(an) 
100 |   cache = (conv_cache, bn_cache, relu_cache) 
101 |   return out, cache 
102 | 
103 | 
104 | def conv_bn_relu_backward(dout, cache):  # backward pass for the conv-batchnorm-relu layer 
105 |   conv_cache, bn_cache, relu_cache = cache 
106 |   dan = relu_backward(dout, relu_cache) 
107 |   da, dgamma, dbeta = spatial_batchnorm_backward(dan, bn_cache) 
108 |   dx, dw, db = conv_backward_fast(da, conv_cache) 
109 |   return dx, dw, db, dgamma, dbeta 
110 | 
111 | 
112 | def conv_relu_pool_forward(x, w, b, conv_param, pool_param): 
113 |   """ 
114 |   Convenience layer that performs a convolution, a ReLU, and a pool. 
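   | 
   |   For example (illustrative shapes; a 3x3 convolution with stride 1 and 
   |   pad 1 preserves the spatial size, and a 2x2 max pool with stride 2 then 
   |   halves it): 
   | 
   |     x = np.random.randn(2, 3, 16, 16) 
   |     w, b = np.random.randn(8, 3, 3, 3), np.zeros(8) 
   |     conv_param = {'stride': 1, 'pad': 1} 
   |     pool_param = {'pool_height': 2, 'pool_width': 2, 'stride': 2} 
   |     out, cache = conv_relu_pool_forward(x, w, b, conv_param, pool_param) 
   |     assert out.shape == (2, 8, 8, 8) 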
115 | 116 | Inputs: 117 | - x: Input to the convolutional layer 118 | - w, b, conv_param: Weights and parameters for the convolutional layer 119 | - pool_param: Parameters for the pooling layer 120 | 121 | Returns a tuple of: 122 | - out: Output from the pooling layer 123 | - cache: Object to give to the backward pass 124 | """ 125 | a, conv_cache = conv_forward_fast(x, w, b, conv_param) 126 | s, relu_cache = relu_forward(a) 127 | out, pool_cache = max_pool_forward_fast(s, pool_param) 128 | cache = (conv_cache, relu_cache, pool_cache) 129 | return out, cache 130 | 131 | 132 | def conv_relu_pool_backward(dout, cache): 133 | """ 134 | Backward pass for the conv-relu-pool convenience layer 135 | """ 136 | conv_cache, relu_cache, pool_cache = cache 137 | ds = max_pool_backward_fast(dout, pool_cache) 138 | da = relu_backward(ds, relu_cache) 139 | dx, dw, db = conv_backward_fast(da, conv_cache) 140 | return dx, dw, db 141 | 142 | -------------------------------------------------------------------------------- /assignment3/cs231n/layers.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | 3 | 4 | def affine_forward(x, w, b): 5 | """ 6 | Computes the forward pass for an affine (fully-connected) layer. 7 | 8 | The input x has shape (N, d_1, ..., d_k) where x[i] is the ith input. 9 | We multiply this against a weight matrix of shape (D, M) where 10 | D = \prod_i d_i 11 | 12 | Inputs: 13 | x - Input data, of shape (N, d_1, ..., d_k) 14 | w - Weights, of shape (D, M) 15 | b - Biases, of shape (M,) 16 | 17 | Returns a tuple of: 18 | - out: output, of shape (N, M) 19 | - cache: (x, w, b) 20 | """ 21 | out = x.reshape(x.shape[0], -1).dot(w) + b 22 | cache = (x, w, b) 23 | return out, cache 24 | 25 | 26 | def affine_backward(dout, cache): 27 | """ 28 | Computes the backward pass for an affine layer. 29 | 30 | Inputs: 31 | - dout: Upstream derivative, of shape (N, M) 32 | - cache: Tuple of: 33 | - x: Input data, of shape (N, d_1, ... d_k) 34 | - w: Weights, of shape (D, M) 35 | 36 | Returns a tuple of: 37 | - dx: Gradient with respect to x, of shape (N, d1, ..., d_k) 38 | - dw: Gradient with respect to w, of shape (D, M) 39 | - db: Gradient with respect to b, of shape (M,) 40 | """ 41 | x, w, b = cache 42 | dx = dout.dot(w.T).reshape(x.shape) 43 | dw = x.reshape(x.shape[0], -1).T.dot(dout) 44 | db = np.sum(dout, axis=0) 45 | return dx, dw, db 46 | 47 | 48 | def relu_forward(x): 49 | """ 50 | Computes the forward pass for a layer of rectified linear units (ReLUs). 51 | 52 | Input: 53 | - x: Inputs, of any shape 54 | 55 | Returns a tuple of: 56 | - out: Output, of the same shape as x 57 | - cache: x 58 | """ 59 | out = np.maximum(0, x) 60 | cache = x 61 | return out, cache 62 | 63 | 64 | def relu_backward(dout, cache): 65 | """ 66 | Computes the backward pass for a layer of rectified linear units (ReLUs). 67 | 68 | Input: 69 | - dout: Upstream derivatives, of any shape 70 | - cache: Input x, of same shape as dout 71 | 72 | Returns: 73 | - dx: Gradient with respect to x 74 | """ 75 | x = cache 76 | dx = np.where(x > 0, dout, 0) 77 | return dx 78 | 79 | 80 | def batchnorm_forward(x, gamma, beta, bn_param): 81 | """ 82 | Forward pass for batch normalization. 83 | 84 | During training the sample mean and (uncorrected) sample variance are 85 | computed from minibatch statistics and used to normalize the incoming data. 
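   | 
   |   Concretely, for each feature dimension d the train-time transform is 
   | 
   |     out[:, d] = gamma[d] * (x[:, d] - mu[d]) / sqrt(var[d] + eps) + beta[d] 
   | 
   |   where mu and var are the per-feature mean and (uncorrected) variance of 
   |   the current minibatch. 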
86 |   During training we also keep an exponentially decaying running mean of the mean 
87 |   and variance of each feature, and these averages are used to normalize data 
88 |   at test-time. 
89 | 
90 |   At each timestep we update the running averages for mean and variance using 
91 |   an exponential decay based on the momentum parameter: 
92 | 
93 |   running_mean = momentum * running_mean + (1 - momentum) * sample_mean 
94 |   running_var = momentum * running_var + (1 - momentum) * sample_var 
95 | 
96 |   Note that the batch normalization paper suggests a different test-time 
97 |   behavior: they compute sample mean and variance for each feature using a 
98 |   large number of training images rather than using a running average. For 
99 |   this implementation we have chosen to use running averages instead since 
100 |   they do not require an additional estimation step; the torch7 implementation 
101 |   of batch normalization also uses running averages. 
102 | 
103 |   Input: 
104 |   - x: Data of shape (N, D) 
105 |   - gamma: Scale parameter of shape (D,) 
106 |   - beta: Shift parameter of shape (D,) 
107 |   - bn_param: Dictionary with the following keys: 
108 |     - mode: 'train' or 'test'; required 
109 |     - eps: Constant for numeric stability 
110 |     - momentum: Constant for running mean / variance. 
111 |     - running_mean: Array of shape (D,) giving running mean of features 
112 |     - running_var: Array of shape (D,) giving running variance of features 
113 | 
114 |   Returns a tuple of: 
115 |   - out: of shape (N, D) 
116 |   - cache: A tuple of values needed in the backward pass 
117 |   """ 
118 |   mode = bn_param['mode'] 
119 |   eps = bn_param.get('eps', 1e-5) 
120 |   momentum = bn_param.get('momentum', 0.9) 
121 | 
122 |   N, D = x.shape 
123 |   running_mean = bn_param.get('running_mean', np.zeros(D, dtype=x.dtype)) 
124 |   running_var = bn_param.get('running_var', np.zeros(D, dtype=x.dtype)) 
125 | 
126 |   out, cache = None, None 
127 |   if mode == 'train': 
128 |     # Compute output 
129 |     mu = x.mean(axis=0) 
130 |     xc = x - mu 
131 |     var = np.mean(xc ** 2, axis=0) 
132 |     std = np.sqrt(var + eps) 
133 |     xn = xc / std 
134 |     out = gamma * xn + beta 
135 | 
136 |     cache = (mode, x, gamma, xc, std, xn, out) 
137 | 
138 |     # Update running average of mean 
139 |     running_mean *= momentum 
140 |     running_mean += (1 - momentum) * mu 
141 | 
142 |     # Update running average of variance 
143 |     running_var *= momentum 
144 |     running_var += (1 - momentum) * var 
145 |   elif mode == 'test': 
146 |     # Using running mean and variance to normalize 
147 |     std = np.sqrt(running_var + eps) 
148 |     xn = (x - running_mean) / std 
149 |     out = gamma * xn + beta 
150 |     cache = (mode, x, xn, gamma, beta, std) 
151 |   else: 
152 |     raise ValueError('Invalid forward batchnorm mode "%s"' % mode) 
153 | 
154 |   # Store the updated running means back into bn_param 
155 |   bn_param['running_mean'] = running_mean 
156 |   bn_param['running_var'] = running_var 
157 | 
158 |   return out, cache 
159 | 
160 | 
161 | def batchnorm_backward(dout, cache): 
162 |   """ 
163 |   Backward pass for batch normalization. 
164 | 
165 |   For this implementation, you should write out a computation graph for 
166 |   batch normalization on paper and propagate gradients backward through 
167 |   intermediate nodes. 
168 | 
169 |   Inputs: 
170 |   - dout: Upstream derivatives, of shape (N, D) 
171 |   - cache: Variable of intermediates from batchnorm_forward. 
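   | 
   |   (The implementation below works backward through one cached intermediate 
   |   at a time: first xn, then std, var, xc, and finally mu.) 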
172 | 
173 |   Returns a tuple of: 
174 |   - dx: Gradient with respect to inputs x, of shape (N, D) 
175 |   - dgamma: Gradient with respect to scale parameter gamma, of shape (D,) 
176 |   - dbeta: Gradient with respect to shift parameter beta, of shape (D,) 
177 |   """ 
178 |   mode = cache[0] 
179 |   if mode == 'train': 
180 |     mode, x, gamma, xc, std, xn, out = cache 
181 | 
182 |     N = x.shape[0] 
183 |     dbeta = dout.sum(axis=0) 
184 |     dgamma = np.sum(xn * dout, axis=0) 
185 |     dxn = gamma * dout 
186 |     dxc = dxn / std 
187 |     dstd = -np.sum((dxn * xc) / (std * std), axis=0) 
188 |     dvar = 0.5 * dstd / std 
189 |     dxc += (2.0 / N) * xc * dvar 
190 |     dmu = np.sum(dxc, axis=0) 
191 |     dx = dxc - dmu / N 
192 |   elif mode == 'test': 
193 |     mode, x, xn, gamma, beta, std = cache 
194 |     dbeta = dout.sum(axis=0) 
195 |     dgamma = np.sum(xn * dout, axis=0) 
196 |     dxn = gamma * dout 
197 |     dx = dxn / std 
198 |   else: 
199 |     raise ValueError(mode) 
200 | 
201 |   return dx, dgamma, dbeta 
202 | 
203 | 
204 | def spatial_batchnorm_forward(x, gamma, beta, bn_param): 
205 |   """ 
206 |   Computes the forward pass for spatial batch normalization. 
207 | 
208 |   Inputs: 
209 |   - x: Input data of shape (N, C, H, W) 
210 |   - gamma: Scale parameter, of shape (C,) 
211 |   - beta: Shift parameter, of shape (C,) 
212 |   - bn_param: Dictionary with the following keys: 
213 |     - mode: 'train' or 'test'; required 
214 |     - eps: Constant for numeric stability 
215 |     - momentum: Constant for running mean / variance. momentum=0 means that 
216 |       old information is discarded completely at every time step, while 
217 |       momentum=1 means that new information is never incorporated. The 
218 |       default of momentum=0.9 should work well in most situations. 
219 |     - running_mean: Array of shape (C,) giving running mean of features 
220 |     - running_var: Array of shape (C,) giving running variance of features 
221 | 
222 |   Returns a tuple of: 
223 |   - out: Output data, of shape (N, C, H, W) 
224 |   - cache: Values needed for the backward pass 
225 |   """ 
226 |   N, C, H, W = x.shape 
227 |   x_flat = x.transpose(0, 2, 3, 1).reshape(-1, C) 
228 |   out_flat, cache = batchnorm_forward(x_flat, gamma, beta, bn_param) 
229 |   out = out_flat.reshape(N, H, W, C).transpose(0, 3, 1, 2) 
230 |   return out, cache 
231 | 
232 | 
233 | def spatial_batchnorm_backward(dout, cache): 
234 |   """ 
235 |   Computes the backward pass for spatial batch normalization. 
236 | 
237 |   Inputs: 
238 |   - dout: Upstream derivatives, of shape (N, C, H, W) 
239 |   - cache: Values from the forward pass 
240 | 
241 |   Returns a tuple of: 
242 |   - dx: Gradient with respect to inputs, of shape (N, C, H, W) 
243 |   - dgamma: Gradient with respect to scale parameter, of shape (C,) 
244 |   - dbeta: Gradient with respect to shift parameter, of shape (C,) 
245 |   """ 
246 |   N, C, H, W = dout.shape 
247 |   dout_flat = dout.transpose(0, 2, 3, 1).reshape(-1, C) 
248 |   dx_flat, dgamma, dbeta = batchnorm_backward(dout_flat, cache) 
249 |   dx = dx_flat.reshape(N, H, W, C).transpose(0, 3, 1, 2) 
250 |   return dx, dgamma, dbeta 
251 | 
252 | 
253 | def svm_loss(x, y): 
254 |   """ 
255 |   Computes the loss and gradient for multiclass SVM classification. 
256 | 
257 |   Inputs: 
258 |   - x: Input data, of shape (N, C) where x[i, j] is the score for the jth class 
259 |     for the ith input. 
260 | - y: Vector of labels, of shape (N,) where y[i] is the label for x[i] and 261 | 0 <= y[i] < C 262 | 263 | Returns a tuple of: 264 | - loss: Scalar giving the loss 265 | - dx: Gradient of the loss with respect to x 266 | """ 267 | N = x.shape[0] 268 | correct_class_scores = x[np.arange(N), y] 269 | margins = np.maximum(0, x - correct_class_scores[:, np.newaxis] + 1.0) 270 | margins[np.arange(N), y] = 0 271 | loss = np.sum(margins) / N 272 | num_pos = np.sum(margins > 0, axis=1) 273 | dx = np.zeros_like(x) 274 | dx[margins > 0] = 1 275 | dx[np.arange(N), y] -= num_pos 276 | dx /= N 277 | return loss, dx 278 | 279 | 280 | def softmax_loss(x, y): 281 | """ 282 | Computes the loss and gradient for softmax classification. 283 | 284 | Inputs: 285 | - x: Input data, of shape (N, C) where x[i, j] is the score for the jth class 286 | for the ith input. 287 | - y: Vector of labels, of shape (N,) where y[i] is the label for x[i] and 288 | 0 <= y[i] < C 289 | 290 | Returns a tuple of: 291 | - loss: Scalar giving the loss 292 | - dx: Gradient of the loss with respect to x 293 | """ 294 | probs = np.exp(x - np.max(x, axis=1, keepdims=True)) 295 | probs /= np.sum(probs, axis=1, keepdims=True) 296 | N = x.shape[0] 297 | loss = -np.sum(np.log(probs[np.arange(N), y])) / N 298 | dx = probs.copy() 299 | dx[np.arange(N), y] -= 1 300 | dx /= N 301 | return loss, dx 302 | 303 | -------------------------------------------------------------------------------- /assignment3/cs231n/optim.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | 3 | """ 4 | This file implements various first-order update rules that are commonly used for 5 | training neural networks. Each update rule accepts current weights and the 6 | gradient of the loss with respect to those weights and produces the next set of 7 | weights. Each update rule has the same interface: 8 | 9 | def update(w, dw, config=None): 10 | 11 | Inputs: 12 | - w: A numpy array giving the current weights. 13 | - dw: A numpy array of the same shape as w giving the gradient of the 14 | loss with respect to w. 15 | - config: A dictionary containing hyperparameter values such as learning rate, 16 | momentum, etc. If the update rule requires caching values over many 17 | iterations, then config will also hold these cached values. 18 | 19 | Returns: 20 | - next_w: The next point after the update. 21 | - config: The config dictionary to be passed to the next iteration of the 22 | update rule. 23 | 24 | NOTE: For most update rules, the default learning rate will probably not perform 25 | well; however the default values of the other hyperparameters should work well 26 | for a variety of different problems. 27 | 28 | For efficiency, update rules may perform in-place updates, mutating w and 29 | setting next_w equal to w. 30 | """ 31 | 32 | 33 | def sgd(w, dw, config=None): 34 | """ 35 | Performs vanilla stochastic gradient descent. 36 | 37 | config format: 38 | - learning_rate: Scalar learning rate. 39 | """ 40 | if config is None: config = {} 41 | config.setdefault('learning_rate', 1e-2) 42 | 43 | w -= config['learning_rate'] * dw 44 | return w, config 45 | 46 | 47 | def adam(x, dx, config=None): 48 | """ 49 | Uses the Adam update rule, which incorporates moving averages of both the 50 | gradient and its square and a bias correction term. 51 | 52 | config format: 53 | - learning_rate: Scalar learning rate. 54 | - beta1: Decay rate for moving average of first moment of gradient. 
55 | - beta2: Decay rate for moving average of second moment of gradient. 56 | - epsilon: Small scalar used for smoothing to avoid dividing by zero. 57 | - m: Moving average of gradient. 58 | - v: Moving average of squared gradient. 59 | - t: Iteration number. 60 | """ 61 | if config is None: config = {} 62 | config.setdefault('learning_rate', 1e-3) 63 | config.setdefault('beta1', 0.9) 64 | config.setdefault('beta2', 0.999) 65 | config.setdefault('epsilon', 1e-8) 66 | config.setdefault('m', np.zeros_like(x)) 67 | config.setdefault('v', np.zeros_like(x)) 68 | config.setdefault('t', 0) 69 | 70 | next_x = None 71 | beta1, beta2, eps = config['beta1'], config['beta2'], config['epsilon'] 72 | t, m, v = config['t'], config['m'], config['v'] 73 | m = beta1 * m + (1 - beta1) * dx 74 | v = beta2 * v + (1 - beta2) * (dx * dx) 75 | t += 1 76 | alpha = config['learning_rate'] * np.sqrt(1 - beta2 ** t) / (1 - beta1 ** t) 77 | x -= alpha * (m / (np.sqrt(v) + eps)) 78 | config['t'] = t 79 | config['m'] = m 80 | config['v'] = v 81 | next_x = x 82 | 83 | return next_x, config 84 | 85 | 86 | -------------------------------------------------------------------------------- /assignment3/cs231n/rnn_layers.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | 3 | 4 | """ 5 | This file defines layer types that are commonly used for recurrent neural 6 | networks. 7 | """ 8 | 9 | 10 | def rnn_step_forward(x, prev_h, Wx, Wh, b): 11 | """ 12 | Run the forward pass for a single timestep of a vanilla RNN that uses a tanh 13 | activation function. 14 | 15 | The input data has dimension D, the hidden state has dimension H, and we use 16 | a minibatch size of N. 17 | 18 | Inputs: 19 | - x: Input data for this timestep, of shape (N, D). 20 | - prev_h: Hidden state from previous timestep, of shape (N, H) 21 | - Wx: Weight matrix for input-to-hidden connections, of shape (D, H) 22 | - Wh: Weight matrix for hidden-to-hidden connections, of shape (H, H) 23 | - b: Biases of shape (H,) 24 | 25 | Returns a tuple of: 26 | - next_h: Next hidden state, of shape (N, H) 27 | - cache: Tuple of values needed for the backward pass. 28 | """ 29 | next_h, cache = None, None 30 | ############################################################################## 31 | # TODO: Implement a single forward step for the vanilla RNN. Store the next # 32 | # hidden state and any values you need for the backward pass in the next_h # 33 | # and cache variables respectively. # 34 | ############################################################################## 35 | intermediate = np.dot(x, Wx) + np.dot(prev_h, Wh) + b # (N,H) 36 | next_h = np.tanh(intermediate) 37 | cache = (x, prev_h, Wx, Wh, intermediate, next_h) 38 | ############################################################################## 39 | # END OF YOUR CODE # 40 | ############################################################################## 41 | return next_h, cache 42 | 43 | 44 | def rnn_step_backward(dnext_h, cache): 45 | """ 46 | Backward pass for a single timestep of a vanilla RNN. 
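   | 
   |   Since next_h = tanh(a) with a = x.dot(Wx) + prev_h.dot(Wh) + b, the local 
   |   derivative of the nonlinearity can be written in terms of the cached 
   |   output: da = dnext_h * (1 - next_h ** 2). 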
47 | 
48 |   Inputs: 
49 |   - dnext_h: Gradient of loss with respect to next hidden state 
50 |   - cache: Cache object from the forward pass 
51 | 
52 |   Returns a tuple of: 
53 |   - dx: Gradients of input data, of shape (N, D) 
54 |   - dprev_h: Gradients of previous hidden state, of shape (N, H) 
55 |   - dWx: Gradients of input-to-hidden weights, of shape (D, H) 
56 |   - dWh: Gradients of hidden-to-hidden weights, of shape (H, H) 
57 |   - db: Gradients of bias vector, of shape (H,) 
58 |   """ 
59 |   dx, dprev_h, dWx, dWh, db = None, None, None, None, None 
60 |   ############################################################################## 
61 |   # TODO: Implement the backward pass for a single step of a vanilla RNN.      # 
62 |   #                                                                            # 
63 |   # HINT: For the tanh function, you can compute the local derivative in terms # 
64 |   # of the output value from tanh.                                             # 
65 |   ############################################################################## 
66 |   x, prev_h, Wx, Wh, intermediate, next_h = cache 
67 |   dintermediate = dnext_h * (1 - next_h ** 2) 
68 |   db = np.sum(dintermediate, axis=0) 
69 |   dx = np.dot(dintermediate, Wx.T) 
70 |   dWx = np.dot(x.T, dintermediate) 
71 |   dprev_h = np.dot(dintermediate, Wh.T) 
72 |   dWh = np.dot(prev_h.T, dintermediate) 
73 |   ############################################################################## 
74 |   #                             END OF YOUR CODE                               # 
75 |   ############################################################################## 
76 |   return dx, dprev_h, dWx, dWh, db 
77 | 
78 | 
79 | def rnn_forward(x, h0, Wx, Wh, b): 
80 |   """ 
81 |   Run a vanilla RNN forward on an entire sequence of data. We assume an input 
82 |   sequence composed of T vectors, each of dimension D. The RNN uses a hidden 
83 |   size of H, and we work over a minibatch containing N sequences. After running 
84 |   the RNN forward, we return the hidden states for all timesteps. 
85 | 
86 |   Inputs: 
87 |   - x: Input data for the entire timeseries, of shape (N, T, D). 
88 |   - h0: Initial hidden state, of shape (N, H) 
89 |   - Wx: Weight matrix for input-to-hidden connections, of shape (D, H) 
90 |   - Wh: Weight matrix for hidden-to-hidden connections, of shape (H, H) 
91 |   - b: Biases of shape (H,) 
92 | 
93 |   Returns a tuple of: 
94 |   - h: Hidden states for the entire timeseries, of shape (N, T, H). 
95 |   - cache: Values needed in the backward pass 
96 |   """ 
97 |   h, cache = None, None 
98 |   ############################################################################## 
99 |   # TODO: Implement forward pass for a vanilla RNN running on a sequence of    # 
100 |   # input data. You should use the rnn_step_forward function that you defined  # 
101 |   # above.                                                                     # 
102 |   ############################################################################## 
103 |   N, T, D = x.shape 
104 |   H = h0.shape[1] 
105 |   h = np.zeros((N, T, H)) 
106 |   cache = [None] * T 
107 |   for t in range(T): 
108 |     if t == 0: 
109 |       h[:,t,:], cache[t] = rnn_step_forward(x[:,t,:], h0, Wx, Wh, b) 
110 |     else: 
111 |       h[:,t,:], cache[t] = rnn_step_forward(x[:,t,:], h[:,t-1,:], Wx, Wh, b) 
112 |   ############################################################################## 
113 |   #                             END OF YOUR CODE                               # 
114 |   ############################################################################## 
115 |   return h, cache 
116 | 
117 | 
118 | def rnn_backward(dh, cache): 
119 |   """ 
120 |   Compute the backward pass for a vanilla RNN over an entire sequence of data. 
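   | 
   |   Note that at each timestep the hidden state receives gradient from two 
   |   sources, the loss at that timestep (dh[:, t, :]) and the following hidden 
   |   state, so the implementation below sums the two before calling 
   |   rnn_step_backward and accumulates the weight gradients over all timesteps. 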
121 | 
122 |   Inputs: 
123 |   - dh: Upstream gradients of all hidden states, of shape (N, T, H) 
124 | 
125 |   Returns a tuple of: 
126 |   - dx: Gradient of inputs, of shape (N, T, D) 
127 |   - dh0: Gradient of initial hidden state, of shape (N, H) 
128 |   - dWx: Gradient of input-to-hidden weights, of shape (D, H) 
129 |   - dWh: Gradient of hidden-to-hidden weights, of shape (H, H) 
130 |   - db: Gradient of biases, of shape (H,) 
131 |   """ 
132 |   dx, dh0, dWx, dWh, db = None, None, None, None, None 
133 |   ############################################################################## 
134 |   # TODO: Implement the backward pass for a vanilla RNN running an entire      # 
135 |   # sequence of data. You should use the rnn_step_backward function that you   # 
136 |   # defined above.                                                             # 
137 |   ############################################################################## 
138 |   N, T, H = dh.shape 
139 |   D = cache[0][0].shape[1] 
140 |   dx = np.zeros((N, T, D)) 
141 |   dWx = np.zeros((D, H)) 
142 |   dWh = np.zeros((H, H)) 
143 |   db = np.zeros(H) 
144 |   dprev_h_t = np.zeros((N, H)) 
145 | 
146 |   for t in range(T-1, -1, -1): 
147 |     dx_t, dprev_h_t, dWx_t, dWh_t, db_t = rnn_step_backward(dprev_h_t + dh[:,t,:], cache[t]) 
148 |     dx[:,t,:] = dx_t 
149 |     dWx += dWx_t 
150 |     dWh += dWh_t 
151 |     db += db_t 
152 |   dh0 = dprev_h_t 
153 |   ############################################################################## 
154 |   #                             END OF YOUR CODE                               # 
155 |   ############################################################################## 
156 |   return dx, dh0, dWx, dWh, db 
157 | 
158 | 
159 | def word_embedding_forward(x, W): 
160 |   """ 
161 |   Forward pass for word embeddings. We operate on minibatches of size N where 
162 |   each sequence has length T. We assume a vocabulary of V words, assigning each 
163 |   to a vector of dimension D. 
164 | 
165 |   Inputs: 
166 |   - x: Integer array of shape (N, T) giving indices of words. Each element idx 
167 |     of x must be in the range 0 <= idx < V. 
168 |   - W: Weight matrix of shape (V, D) giving word vectors for all words. 
169 | 
170 |   Returns a tuple of: 
171 |   - out: Array of shape (N, T, D) giving word vectors for all input words. 
172 |   - cache: Values needed for the backward pass 
173 |   """ 
174 |   out, cache = None, None 
175 |   ############################################################################## 
176 |   # TODO: Implement the forward pass for word embeddings.                      # 
177 |   #                                                                            # 
178 |   # HINT: This should be very simple.                                          # 
179 |   ############################################################################## 
180 |   N, T = x.shape 
181 |   V, D = W.shape 
182 |   out = np.zeros((N, T, D)) 
183 |   it = np.nditer(x, flags=['multi_index']) 
184 |   while not it.finished: 
185 |     out[it.multi_index] = W[it.value]  # equivalent to the vectorized out = W[x] 
186 |     it.iternext() 
187 |   cache = (x, V, D) 
188 |   ############################################################################## 
189 |   #                             END OF YOUR CODE                               # 
190 |   ############################################################################## 
191 |   return out, cache 
192 | 
193 | 
194 | def word_embedding_backward(dout, cache): 
195 |   """ 
196 |   Backward pass for word embeddings. We cannot back-propagate into the words 
197 |   since they are integers, so we only return gradient for the word embedding 
198 |   matrix. 
199 | 
200 |   HINT: Look up the function np.add.at 
201 | 
202 |   Inputs: 
203 |   - dout: Upstream gradients of shape (N, T, D) 
204 |   - cache: Values from the forward pass 
205 | 
206 |   Returns: 
207 |   - dW: Gradient of word embedding matrix, of shape (V, D). 
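   | 
   |   For example, repeated indices must have their gradients accumulated, which 
   |   is why np.add.at is used rather than plain fancy-indexed assignment (a 
   |   small hypothetical case with V = 2, D = 4): 
   | 
   |     x = np.array([[0, 0, 1]])    # word 0 appears twice 
   |     dout = np.ones((1, 3, 4)) 
   |     # after the backward pass, dW[0] is the sum of two upstream gradients 
   |     # while dW[1] receives just one 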
208 | """ 209 | dW = None 210 | ############################################################################## 211 | # TODO: Implement the backward pass for word embeddings. # 212 | # # 213 | # HINT: Look up the function np.add.at # 214 | ############################################################################## 215 | x,V,D = cache 216 | dW = np.zeros((V,D)) 217 | 218 | np.add.at(dW, x, dout) 219 | ############################################################################## 220 | # END OF YOUR CODE # 221 | ############################################################################## 222 | return dW 223 | 224 | 225 | def sigmoid(x): 226 | """ 227 | A numerically stable version of the logistic sigmoid function. 228 | """ 229 | pos_mask = (x >= 0) 230 | neg_mask = (x < 0) 231 | z = np.zeros_like(x) 232 | z[pos_mask] = np.exp(-x[pos_mask]) 233 | z[neg_mask] = np.exp(x[neg_mask]) 234 | top = np.ones_like(x) 235 | top[neg_mask] = z[neg_mask] 236 | return top / (1 + z) 237 | 238 | 239 | def lstm_step_forward(x, prev_h, prev_c, Wx, Wh, b): 240 | """ 241 | Forward pass for a single timestep of an LSTM. 242 | 243 | The input data has dimension D, the hidden state has dimension H, and we use 244 | a minibatch size of N. 245 | 246 | Inputs: 247 | - x: Input data, of shape (N, D) 248 | - prev_h: Previous hidden state, of shape (N, H) 249 | - prev_c: previous cell state, of shape (N, H) 250 | - Wx: Input-to-hidden weights, of shape (D, 4H) 251 | - Wh: Hidden-to-hidden weights, of shape (H, 4H) 252 | - b: Biases, of shape (4H,) 253 | 254 | Returns a tuple of: 255 | - next_h: Next hidden state, of shape (N, H) 256 | - next_c: Next cell state, of shape (N, H) 257 | - cache: Tuple of values needed for backward pass. 258 | """ 259 | next_h, next_c, cache = None, None, None 260 | ############################################################################# 261 | # TODO: Implement the forward pass for a single timestep of an LSTM. # 262 | # You may want to use the numerically stable sigmoid implementation above. # 263 | ############################################################################# 264 | N, H = prev_h.shape 265 | intermediate = np.dot(x, Wx) + np.dot(prev_h, Wh) + b # (N, 4H) 266 | i = sigmoid(intermediate[:, :H]) 267 | f = sigmoid(intermediate[:, H:2*H]) 268 | o = sigmoid(intermediate[:, 2*H:3*H]) 269 | g = np.tanh(intermediate[:, 3*H:]) 270 | 271 | next_c = f*prev_c + i*g 272 | next_h = o*np.tanh(next_c) 273 | cache = (x, prev_h, prev_c, Wx, Wh, i, f, o, g, next_h, next_c) 274 | ############################################################################## 275 | # END OF YOUR CODE # 276 | ############################################################################## 277 | 278 | return next_h, next_c, cache 279 | 280 | 281 | def lstm_step_backward(dnext_h, dnext_c, cache): 282 | """ 283 | Backward pass for a single timestep of an LSTM. 
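   | 
   |   Note that the cell state receives gradient along two paths: directly 
   |   through dnext_c, and indirectly through dnext_h via 
   |   next_h = o * tanh(next_c). The implementation below therefore adds 
   |   o * (1 - tanh(next_c) ** 2) * dnext_h into dnext_c before backpropagating 
   |   through the gates. 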
284 | 
285 |   Inputs: 
286 |   - dnext_h: Gradients of next hidden state, of shape (N, H) 
287 |   - dnext_c: Gradients of next cell state, of shape (N, H) 
288 |   - cache: Values from the forward pass 
289 | 
290 |   Returns a tuple of: 
291 |   - dx: Gradient of input data, of shape (N, D) 
292 |   - dprev_h: Gradient of previous hidden state, of shape (N, H) 
293 |   - dprev_c: Gradient of previous cell state, of shape (N, H) 
294 |   - dWx: Gradient of input-to-hidden weights, of shape (D, 4H) 
295 |   - dWh: Gradient of hidden-to-hidden weights, of shape (H, 4H) 
296 |   - db: Gradient of biases, of shape (4H,) 
297 |   """ 
298 |   dx, dprev_h, dprev_c, dWx, dWh, db = None, None, None, None, None, None 
299 |   ############################################################################# 
300 |   # TODO: Implement the backward pass for a single timestep of an LSTM.       # 
301 |   #                                                                           # 
302 |   # HINT: For sigmoid and tanh you can compute local derivatives in terms of  # 
303 |   # the output value from the nonlinearity.                                   # 
304 |   ############################################################################# 
305 |   x, prev_h, prev_c, Wx, Wh, i, f, o, g, next_h, next_c = cache 
306 |   N, H = prev_h.shape 
307 |   # Forward pass was: next_c = f*prev_c + i*g and next_h = o*np.tanh(next_c) 
308 |   dnext_c = dnext_c + o*(1-np.tanh(next_c)**2)*dnext_h  # Important! 
309 |   dprev_c = dnext_c*f 
310 | 
311 |   di = dnext_c*g 
312 |   df = dnext_c*prev_c 
313 |   do = dnext_h*np.tanh(next_c) 
314 |   dg = dnext_c*i 
315 | 
316 |   d_intermediate = np.zeros((N, 4*H)) 
317 |   d_intermediate[:, :H] = di*i*(1-i) 
318 |   d_intermediate[:, H:2*H] = df*f*(1-f) 
319 |   d_intermediate[:, 2*H:3*H] = do*o*(1-o) 
320 |   d_intermediate[:, 3*H:] = dg*(1-g*g) 
321 | 
322 |   db = np.sum(d_intermediate, axis=0)  # (N, 4H) -> (4H,) 
323 |   # intermediate = np.dot(x, Wx) + np.dot(prev_h, Wh) + b 
324 |   dx = np.dot(d_intermediate, Wx.T)  # (N, 4H) x (4H, D) 
325 |   dWx = np.dot(x.T, d_intermediate)  # (D, N) x (N, 4H) 
326 |   dprev_h = np.dot(d_intermediate, Wh.T)  # (N, 4H) x (4H, H) 
327 |   dWh = np.dot(prev_h.T, d_intermediate)  # (H, N) x (N, 4H) 
328 |   ############################################################################## 
329 |   #                             END OF YOUR CODE                               # 
330 |   ############################################################################## 
331 | 
332 |   return dx, dprev_h, dprev_c, dWx, dWh, db 
333 | 
334 | 
335 | def lstm_forward(x, h0, Wx, Wh, b): 
336 |   """ 
337 |   Forward pass for an LSTM over an entire sequence of data. We assume an input 
338 |   sequence composed of T vectors, each of dimension D. The LSTM uses a hidden 
339 |   size of H, and we work over a minibatch containing N sequences. After running 
340 |   the LSTM forward, we return the hidden states for all timesteps. 
341 | 
342 |   Note that unlike the hidden state, the initial cell state is not passed as 
343 |   input; it is simply initialized to zero. Also note that the cell state is 
344 |   not returned; it is internal to the LSTM and is not accessed from outside. 
345 | 
346 |   Inputs: 
347 |   - x: Input data of shape (N, T, D) 
348 |   - h0: Initial hidden state of shape (N, H) 
349 |   - Wx: Weights for input-to-hidden connections, of shape (D, 4H) 
350 |   - Wh: Weights for hidden-to-hidden connections, of shape (H, 4H) 
351 |   - b: Biases of shape (4H,) 
352 | 
353 |   Returns a tuple of: 
354 |   - h: Hidden states for all timesteps of all sequences, of shape (N, T, H) 
355 |   - cache: Values needed for the backward pass. 
356 |   """ 
357 |   h, cache = None, None 
358 |   ############################################################################# 
359 |   # TODO: Implement the forward pass for an LSTM over an entire timeseries.   # 
360 |   # You should use the lstm_step_forward function that you just defined.      # 
361 |   ############################################################################# 
362 |   N, T, D = x.shape 
363 |   H = h0.shape[1] 
364 |   h = np.zeros((N, T, H)) 
365 |   c = np.zeros((N, T, H)) 
366 |   cache = [None] * T 
367 |   c0 = np.zeros((N, H)) 
368 | 
369 |   for t in range(T): 
370 |     if t == 0: 
371 |       h[:,t,:], c[:,t,:], cache[t] = lstm_step_forward(x[:,t,:], h0, c0, Wx, Wh, b) 
372 |     else: 
373 |       h[:,t,:], c[:,t,:], cache[t] = lstm_step_forward(x[:,t,:], h[:,t-1,:], c[:,t-1,:], Wx, Wh, b) 
374 | 
375 |   ############################################################################## 
376 |   #                             END OF YOUR CODE                               # 
377 |   ############################################################################## 
378 | 
379 |   return h, cache 
380 | 
381 | 
382 | def lstm_backward(dh, cache): 
383 |   """ 
384 |   Backward pass for an LSTM over an entire sequence of data. 
385 | 
386 |   Inputs: 
387 |   - dh: Upstream gradients of hidden states, of shape (N, T, H) 
388 |   - cache: Values from the forward pass 
389 | 
390 |   Returns a tuple of: 
391 |   - dx: Gradient of input data of shape (N, T, D) 
392 |   - dh0: Gradient of initial hidden state of shape (N, H) 
393 |   - dWx: Gradient of input-to-hidden weight matrix of shape (D, 4H) 
394 |   - dWh: Gradient of hidden-to-hidden weight matrix of shape (H, 4H) 
395 |   - db: Gradient of biases, of shape (4H,) 
396 |   """ 
397 |   dx, dh0, dWx, dWh, db = None, None, None, None, None 
398 |   ############################################################################# 
399 |   # TODO: Implement the backward pass for an LSTM over an entire timeseries.  # 
400 |   # You should use the lstm_step_backward function that you just defined.     # 
401 |   ############################################################################# 
402 |   N, T, H = dh.shape 
403 |   D = cache[0][0].shape[1] 
404 |   dx = np.zeros((N, T, D)) 
405 |   dWx = np.zeros((D, 4*H)) 
406 |   dWh = np.zeros((H, 4*H)) 
407 |   db = np.zeros(4*H) 
408 |   dprev_h_t = np.zeros((N, H)) 
409 |   dprev_c_t = np.zeros((N, H)) 
410 | 
411 |   for t in range(T-1, -1, -1): 
412 |     dx_t, dprev_h_t, dprev_c_t, dWx_t, dWh_t, db_t = lstm_step_backward(dprev_h_t + dh[:,t,:], dprev_c_t, cache[t]) 
413 |     dx[:,t,:] = dx_t 
414 |     dWx += dWx_t 
415 |     dWh += dWh_t 
416 |     db += db_t 
417 |   dh0 = dprev_h_t 
418 |   ############################################################################## 
419 |   #                             END OF YOUR CODE                               # 
420 |   ############################################################################## 
421 | 
422 |   return dx, dh0, dWx, dWh, db 
423 | 
424 | 
425 | def temporal_affine_forward(x, w, b): 
426 |   """ 
427 |   Forward pass for a temporal affine layer. The input is a set of D-dimensional 
428 |   vectors arranged into a minibatch of N timeseries, each of length T. We use 
429 |   an affine function to transform each of those vectors into a new vector of 
430 |   dimension M. 
431 | 
432 |   Inputs: 
433 |   - x: Input data of shape (N, T, D) 
434 |   - w: Weights of shape (D, M) 
435 |   - b: Biases of shape (M,) 
436 | 
437 |   Returns a tuple of: 
438 |   - out: Output data of shape (N, T, M) 
439 |   - cache: Values needed for the backward pass 
440 |   """ 
441 |   N, T, D = x.shape 
442 |   M = b.shape[0] 
443 |   out = x.reshape(N * T, D).dot(w).reshape(N, T, M) + b 
444 |   cache = x, w, b, out 
445 |   return out, cache 
446 | 
447 | 
448 | def temporal_affine_backward(dout, cache): 
449 |   """ 
450 |   Backward pass for temporal affine layer. 
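   | 
   |   Since the forward pass flattened x to (N * T, D) before the matrix 
   |   multiply, the backward pass reshapes dout to (N * T, M), applies the usual 
   |   affine gradients, and reshapes dx back to (N, T, D). 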
451 | 452 | Input: 453 | - dout: Upstream gradients of shape (N, T, M) 454 | - cache: Values from forward pass 455 | 456 | Returns a tuple of: 457 | - dx: Gradient of input, of shape (N, T, D) 458 | - dw: Gradient of weights, of shape (D, M) 459 | - db: Gradient of biases, of shape (M,) 460 | """ 461 | x, w, b, out = cache 462 | N, T, D = x.shape 463 | M = b.shape[0] 464 | 465 | dx = dout.reshape(N * T, M).dot(w.T).reshape(N, T, D) 466 | dw = dout.reshape(N * T, M).T.dot(x.reshape(N * T, D)).T 467 | db = dout.sum(axis=(0, 1)) 468 | 469 | return dx, dw, db 470 | 471 | 472 | def temporal_softmax_loss(x, y, mask, verbose=False): 473 | """ 474 | A temporal version of softmax loss for use in RNNs. We assume that we are 475 | making predictions over a vocabulary of size V for each timestep of a 476 | timeseries of length T, over a minibatch of size N. The input x gives scores 477 | for all vocabulary elements at all timesteps, and y gives the indices of the 478 | ground-truth element at each timestep. We use a cross-entropy loss at each 479 | timestep, summing the loss over all timesteps and averaging across the 480 | minibatch. 481 | 482 | As an additional complication, we may want to ignore the model output at some 483 | timesteps, since sequences of different length may have been combined into a 484 | minibatch and padded with NULL tokens. The optional mask argument tells us 485 | which elements should contribute to the loss. 486 | 487 | Inputs: 488 | - x: Input scores, of shape (N, T, V) 489 | - y: Ground-truth indices, of shape (N, T) where each element is in the range 490 | 0 <= y[i, t] < V 491 | - mask: Boolean array of shape (N, T) where mask[i, t] tells whether or not 492 | the scores at x[i, t] should contribute to the loss. 493 | 494 | Returns a tuple of: 495 | - loss: Scalar giving loss 496 | - dx: Gradient of loss with respect to scores x. 
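   | 
   |   A minimal usage sketch (random scores; the mask switches individual 
   |   timesteps in or out of the loss): 
   | 
   |     N, T, V = 2, 3, 5 
   |     x = np.random.randn(N, T, V) 
   |     y = np.random.randint(V, size=(N, T)) 
   |     mask = np.ones((N, T), dtype=bool) 
   |     loss, dx = temporal_softmax_loss(x, y, mask) 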
497 | """ 498 | 499 | N, T, V = x.shape 500 | 501 | x_flat = x.reshape(N * T, V) 502 | y_flat = y.reshape(N * T) 503 | mask_flat = mask.reshape(N * T) 504 | 505 | probs = np.exp(x_flat - np.max(x_flat, axis=1, keepdims=True)) 506 | probs /= np.sum(probs, axis=1, keepdims=True) 507 | loss = -np.sum(mask_flat * np.log(probs[np.arange(N * T), y_flat])) / N 508 | dx_flat = probs.copy() 509 | dx_flat[np.arange(N * T), y_flat] -= 1 510 | dx_flat /= N 511 | dx_flat *= mask_flat[:, np.newaxis] 512 | 513 | 514 | if verbose: print 'dx_flat: ', dx_flat.shape 515 | 516 | dx = dx_flat.reshape(N, T, V) 517 | 518 | return loss, dx 519 | 520 | -------------------------------------------------------------------------------- /assignment3/cs231n/setup.py: -------------------------------------------------------------------------------- 1 | from distutils.core import setup 2 | from distutils.extension import Extension 3 | from Cython.Build import cythonize 4 | import numpy 5 | 6 | extensions = [ 7 | Extension('im2col_cython', ['im2col_cython.pyx'], 8 | include_dirs = [numpy.get_include()] 9 | ), 10 | ] 11 | 12 | setup( 13 | ext_modules = cythonize(extensions), 14 | ) 15 | -------------------------------------------------------------------------------- /assignment3/frameworkpython: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | # what real Python executable to use 4 | PYVER=2.7 5 | PATHTOPYTHON=/usr/local/bin/ 6 | PYTHON=${PATHTOPYTHON}python${PYVER} 7 | 8 | # find the root of the virtualenv, it should be the parent of the dir this script is in 9 | ENV=`$PYTHON -c "import os; print os.path.abspath(os.path.join(os.path.dirname(\"$0\"), '..'))"` 10 | 11 | # now run Python with the virtualenv set as Python's HOME 12 | export PYTHONHOME=$ENV 13 | exec $PYTHON "$@" 14 | -------------------------------------------------------------------------------- /assignment3/kitten.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ShibiHe/Stanford-CS231n-assignments/4590c50075a2e4aa291777b4db45d04b877dcdf6/assignment3/kitten.jpg -------------------------------------------------------------------------------- /assignment3/requirements.txt: -------------------------------------------------------------------------------- 1 | Cython==0.23.4 2 | Jinja2==2.8 3 | MarkupSafe==0.23 4 | Pillow==3.0.0 5 | Pygments==2.0.2 6 | appnope==0.1.0 7 | argparse==1.2.1 8 | backports-abc==0.4 9 | backports.ssl-match-hostname==3.5.0.1 10 | certifi==2015.11.20.1 11 | cycler==0.9.0 12 | decorator==4.0.6 13 | functools32==3.2.3-2 14 | gnureadline==6.3.3 15 | ipykernel==4.2.2 16 | ipython==4.0.1 17 | ipython-genutils==0.1.0 18 | ipywidgets==4.1.1 19 | jsonschema==2.5.1 20 | jupyter==1.0.0 21 | jupyter-client==4.1.1 22 | jupyter-console==4.0.3 23 | jupyter-core==4.0.6 24 | matplotlib==1.5.0 25 | mistune==0.7.1 26 | nbconvert==4.1.0 27 | nbformat==4.0.1 28 | notebook==4.0.6 29 | numpy==1.10.4 30 | path.py==8.1.2 31 | pexpect==4.0.1 32 | pickleshare==0.5 33 | ptyprocess==0.5 34 | pyparsing==2.0.7 35 | python-dateutil==2.4.2 36 | pytz==2015.7 37 | pyzmq==15.1.0 38 | qtconsole==4.1.1 39 | scipy==0.16.1 40 | simplegeneric==0.8.1 41 | singledispatch==3.4.0.3 42 | six==1.10.0 43 | terminado==0.5 44 | tornado==4.3 45 | traitlets==4.0.0 46 | wsgiref==0.1.2 47 | -------------------------------------------------------------------------------- /assignment3/sky.jpg: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/ShibiHe/Stanford-CS231n-assignments/4590c50075a2e4aa291777b4db45d04b877dcdf6/assignment3/sky.jpg -------------------------------------------------------------------------------- /assignment3/start_ipython_osx.sh: -------------------------------------------------------------------------------- 1 | # Assume the virtualenv is called .env 2 | 3 | cp frameworkpython .env/bin 4 | .env/bin/frameworkpython -m IPython notebook 5 | --------------------------------------------------------------------------------