├── README.md ├── assignment2 ├── .gitignore ├── .ipynb_checkpoints │ ├── BatchNormalization-checkpoint.ipynb │ ├── ConvolutionalNetworks-checkpoint.ipynb │ ├── Dropout-checkpoint.ipynb │ └── FullyConnectedNets-checkpoint.ipynb ├── BatchNormalization.ipynb ├── ConvolutionalNetworks.ipynb ├── Dropout.ipynb ├── FullyConnectedNets.ipynb ├── README.md ├── collectSubmission.sh ├── cs231n │ ├── .gitignore │ ├── __init__.py │ ├── classifiers │ │ ├── __init__.py │ │ ├── cnn.py │ │ └── fc_net.py │ ├── data_utils.py │ ├── datasets │ │ ├── .gitignore │ │ └── get_datasets.sh │ ├── fast_layers.py │ ├── gradient_check.py │ ├── im2col.py │ ├── im2col_cython.pyx │ ├── layer_utils.py │ ├── layers.py │ ├── optim.py │ ├── setup.py │ ├── solver.py │ └── vis_utils.py ├── frameworkpython ├── kitten.jpg ├── puppy.jpg ├── requirements.txt └── start_ipython_osx.sh └── assignment3 ├── .gitignore ├── .ipynb_checkpoints ├── ImageGeneration-checkpoint.ipynb ├── ImageGradients-checkpoint.ipynb ├── LSTM_Captioning-checkpoint.ipynb └── RNN_Captioning-checkpoint.ipynb ├── ImageGeneration.ipynb ├── ImageGradients.ipynb ├── LSTM_Captioning.ipynb ├── RNN_Captioning.ipynb ├── collectSubmission.sh ├── cs231n ├── .gitignore ├── __init__.py ├── captioning_solver.py ├── classifiers │ ├── __init__.py │ ├── pretrained_cnn.py │ └── rnn.py ├── coco_utils.py ├── data_utils.py ├── datasets │ ├── get_coco_captioning.sh │ ├── get_pretrained_model.sh │ └── get_tiny_imagenet_a.sh ├── fast_layers.py ├── gradient_check.py ├── im2col.py ├── im2col_cython.pyx ├── image_utils.py ├── layer_utils.py ├── layers.py ├── optim.py ├── rnn_layers.py └── setup.py ├── frameworkpython ├── kitten.jpg ├── requirements.txt ├── sky.jpg └── start_ipython_osx.sh /README.md: -------------------------------------------------------------------------------- 1 | # Stanford-CS231n-assignments 2 | My solutions to the CS231n 2016 assignments, completed by frankheshibi@gmail.com 3 | 4 | http://cs231n.stanford.edu/syllabus.html 5 | 6 | I deleted all datasets and my Python virtual-environment files. To recreate them, simply follow the instructions in each assignment's README. 7 | 8 | I skipped assignment 1, as it is comparatively easy. 9 | 10 | I have heard that the lecture videos linked in the syllabus were recently taken down over copyright problems. What a pity! Backup copies can still be found on YouTube and various online drives. 11 | -------------------------------------------------------------------------------- /assignment2/.gitignore: -------------------------------------------------------------------------------- 1 | *.swp 2 | *.pyc 3 | .env/* 4 | -------------------------------------------------------------------------------- /assignment2/README.md: -------------------------------------------------------------------------------- 1 | In this assignment you will practice writing backpropagation code, and training 2 | Neural Networks and Convolutional Neural Networks.
The goals of this assignment 3 | are as follows: 4 | 5 | - understand **Neural Networks** and how they are arranged in layered 6 | architectures 7 | - understand and be able to implement (vectorized) **backpropagation** 8 | - implement various **update rules** used to optimize Neural Networks 9 | - implement **batch normalization** for training deep networks 10 | - implement **dropout** to regularize networks 11 | - effectively **cross-validate** and find the best hyperparameters for Neural 12 | Network architecture 13 | - understand the architecture of **Convolutional Neural Networks** and 14 | gain experience with training these models on data 15 | 16 | ## Setup 17 | You can work on the assignment in one of two ways: locally on your own machine, 18 | or on a virtual machine through Terminal.com. 19 | 20 | ### Working in the cloud on Terminal 21 | 22 | Terminal has created a separate subdomain to serve our class, 23 | [www.stanfordterminalcloud.com](https://www.stanfordterminalcloud.com). Register 24 | your account there. The Assignment 2 snapshot can then be found HERE. If you are 25 | registered in the class you can contact the TA (see Piazza for more information) 26 | to request Terminal credits for use on the assignment. Once you boot up the 27 | snapshot everything will be installed for you, and you will be ready to start on 28 | your assignment right away. We have written a small tutorial on Terminal 29 | [here](http://cs231n.github.io/terminal-tutorial/). 30 | 31 | ### Working locally 32 | Get the code as a zip file 33 | [here](http://vision.stanford.edu/teaching/cs231n/winter1516_assignment2.zip). 34 | As for the dependencies: 35 | 36 | **[Option 1] Use Anaconda:** 37 | The preferred approach for installing all the assignment dependencies is to use 38 | [Anaconda](https://www.continuum.io/downloads), which is a Python distribution 39 | that includes many of the most popular Python packages for science, math, 40 | engineering and data analysis. Once you install it you can skip all mentions of 41 | requirements and you are ready to go directly to working on the assignment. 42 | 43 | **[Option 2] Manual install, virtual environment:** 44 | If you do not want to use Anaconda and want to go with a more manual and risky 45 | installation route you will likely want to create a 46 | [virtual environment](http://docs.python-guide.org/en/latest/dev/virtualenvs/) 47 | for the project. If you choose not to use a virtual environment, it is up to you 48 | to make sure that all dependencies for the code are installed globally on your 49 | machine. To set up a virtual environment, run the following: 50 | 51 | ```bash 52 | cd assignment2 53 | sudo pip install virtualenv # This may already be installed 54 | virtualenv .env # Create a virtual environment 55 | source .env/bin/activate # Activate the virtual environment 56 | pip install -r requirements.txt # Install dependencies 57 | # Work on the assignment for a while ... 58 | deactivate # Exit the virtual environment 59 | ``` 60 | 61 | **Download data:** 62 | Once you have the starter code, you will need to download the CIFAR-10 dataset. 63 | Run the following from the `assignment2` directory: 64 | 65 | ```bash 66 | cd cs231n/datasets 67 | ./get_datasets.sh 68 | ``` 69 | 70 | **Compile the Cython extension:** Convolutional Neural Networks require a very 71 | efficient implementation. We have implemented much of the functionality using 72 | [Cython](http://cython.org/); you will need to compile the Cython extension 73 | before you can run the code.
From the `cs231n` directory, run the following 74 | command: 75 | 76 | ```bash 77 | python setup.py build_ext --inplace 78 | ``` 79 | 80 | **Start IPython:** 81 | After you have the CIFAR-10 data, you should start the IPython notebook server 82 | from the `assignment2` directory. If you are unfamiliar with IPython, you should 83 | read our [IPython tutorial](http://cs231n.github.io/ipython-tutorial/). 84 | 85 | **NOTE:** If you are working in a virtual environment on OSX, you may encounter 86 | errors with matplotlib due to the 87 | [issues described here](http://matplotlib.org/faq/virtualenv_faq.html). 88 | You can work around this issue by starting the IPython server using the 89 | `start_ipython_osx.sh` script from the `assignment2` directory; the script 90 | assumes that your virtual environment is named `.env`. 91 | 92 | 93 | ### Submitting your work: 94 | Whether you work on the assignment locally or using Terminal, once you are done 95 | working run the `collectSubmission.sh` script; this will produce a file called 96 | `assignment2.zip`. Upload this file to your dropbox on 97 | [the coursework](https://coursework.stanford.edu/portal/site/W15-CS-231N-01/) 98 | page for the course. 99 | 100 | 101 | ### Q1: Fully-connected Neural Network (30 points) 102 | The IPython notebook `FullyConnectedNets.ipynb` will introduce you to our 103 | modular layer design, and then use those layers to implement fully-connected 104 | networks of arbitrary depth. To optimize these models you will implement several 105 | popular update rules. 106 | 107 | ### Q2: Batch Normalization (30 points) 108 | In the IPython notebook `BatchNormalization.ipynb` you will implement batch 109 | normalization, and use it to train deep fully-connected networks. 110 | 111 | ### Q3: Dropout (10 points) 112 | The IPython notebook `Dropout.ipynb` will help you implement Dropout and explore 113 | its effects on model generalization. 114 | 115 | ### Q4: ConvNet on CIFAR-10 (30 points) 116 | In the IPython Notebook `ConvolutionalNetworks.ipynb` you will implement several 117 | new layers that are commonly used in convolutional networks. You will train a 118 | (shallow) convolutional network on CIFAR-10, and it will then be up to you to 119 | train the best network that you can. 120 | 121 | ### Q5: Do something extra! (up to +10 points) 122 | In the process of training your network, you should feel free to implement 123 | anything that you want to get better performance. You can modify the solver, 124 | implement additional layers, use different types of regularization, use an 125 | ensemble of models, or anything else that comes to mind. If you implement these 126 | or other ideas not covered in the assignment then you will be awarded some bonus 127 | points. 128 | 129 | -------------------------------------------------------------------------------- /assignment2/collectSubmission.sh: -------------------------------------------------------------------------------- 1 | rm -f assignment2.zip 2 | zip -r assignment2.zip . 
-x "*.git*" "*cs231n/datasets*" "*.ipynb_checkpoints*" "*README.md" "*collectSubmission.sh" "*requirements.txt" "venv/*" "*.pyc" "*cs231n/build/*" 3 | -------------------------------------------------------------------------------- /assignment2/cs231n/.gitignore: -------------------------------------------------------------------------------- 1 | build/* 2 | im2col_cython.c 3 | im2col_cython.so 4 | -------------------------------------------------------------------------------- /assignment2/cs231n/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ShibiHe/Stanford-CS231n-assignments/4590c50075a2e4aa291777b4db45d04b877dcdf6/assignment2/cs231n/__init__.py -------------------------------------------------------------------------------- /assignment2/cs231n/classifiers/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ShibiHe/Stanford-CS231n-assignments/4590c50075a2e4aa291777b4db45d04b877dcdf6/assignment2/cs231n/classifiers/__init__.py -------------------------------------------------------------------------------- /assignment2/cs231n/classifiers/cnn.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | 3 | from cs231n.layers import * 4 | from cs231n.fast_layers import * 5 | from cs231n.layer_utils import * 6 | 7 | 8 | class ThreeLayerConvNet(object): 9 | """ 10 | A three-layer convolutional network with the following architecture: 11 | 12 | conv - relu - 2x2 max pool - affine - relu - affine - softmax 13 | 14 | The network operates on minibatches of data that have shape (N, C, H, W) 15 | consisting of N images, each with height H and width W and with C input 16 | channels. 17 | """ 18 | 19 | def __init__(self, input_dim=(3, 32, 32), num_filters=32, filter_size=7, 20 | hidden_dim=100, num_classes=10, weight_scale=1e-3, reg=0.0, 21 | dtype=np.float32): 22 | """ 23 | Initialize a new network. 24 | 25 | Inputs: 26 | - input_dim: Tuple (C, H, W) giving size of input data 27 | - num_filters: Number of filters to use in the convolutional layer 28 | - filter_size: Size of filters to use in the convolutional layer 29 | - hidden_dim: Number of units to use in the fully-connected hidden layer 30 | - num_classes: Number of scores to produce from the final affine layer. 31 | - weight_scale: Scalar giving standard deviation for random initialization 32 | of weights. 33 | - reg: Scalar giving L2 regularization strength 34 | - dtype: numpy datatype to use for computation. 35 | """ 36 | self.params = {} 37 | self.reg = reg 38 | self.dtype = dtype 39 | 40 | ############################################################################ 41 | # TODO: Initialize weights and biases for the three-layer convolutional # 42 | # network. Weights should be initialized from a Gaussian with standard # 43 | # deviation equal to weight_scale; biases should be initialized to zero. # 44 | # All weights and biases should be stored in the dictionary self.params. # 45 | # Store weights and biases for the convolutional layer using the keys 'W1' # 46 | # and 'b1'; use keys 'W2' and 'b2' for the weights and biases of the # 47 | # hidden affine layer, and keys 'W3' and 'b3' for the weights and biases # 48 | # of the output affine layer. 
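# (Shape note for the initialization below: W1 is (num_filters, C, filter_size,
# filter_size); the 2x2 max pool halves H and W, so the conv output flattens to
# num_filters*H*W/4 features, making W2 (num_filters*H*W/4, hidden_dim) and
# W3 (hidden_dim, num_classes).)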
# 49 | ############################################################################ 50 | C, H, W = input_dim 51 | self.params['W1'] = np.random.randn(num_filters, C, filter_size, filter_size) * weight_scale 52 | self.params['b1'] = np.zeros(num_filters) 53 | self.params['W2'] = np.random.randn(num_filters*H*W/4, hidden_dim)*weight_scale 54 | self.params['b2'] = np.zeros(hidden_dim) 55 | self.params['W3'] = np.random.randn(hidden_dim, num_classes)*weight_scale 56 | self.params['b3'] = np.zeros(num_classes) 57 | ############################################################################ 58 | # END OF YOUR CODE # 59 | ############################################################################ 60 | 61 | for k, v in self.params.iteritems(): 62 | self.params[k] = v.astype(dtype) 63 | 64 | 65 | def loss(self, X, y=None): 66 | """ 67 | Evaluate loss and gradient for the three-layer convolutional network. 68 | 69 | Input / output: Same API as TwoLayerNet in fc_net.py. 70 | """ 71 | W1, b1 = self.params['W1'], self.params['b1'] 72 | W2, b2 = self.params['W2'], self.params['b2'] 73 | W3, b3 = self.params['W3'], self.params['b3'] 74 | 75 | # pass conv_param to the forward pass for the convolutional layer 76 | filter_size = W1.shape[2] 77 | conv_param = {'stride': 1, 'pad': (filter_size - 1) / 2} 78 | 79 | # pass pool_param to the forward pass for the max-pooling layer 80 | pool_param = {'pool_height': 2, 'pool_width': 2, 'stride': 2} 81 | 82 | scores = None 83 | ############################################################################ 84 | # TODO: Implement the forward pass for the three-layer convolutional net, # 85 | # computing the class scores for X and storing them in the scores # 86 | # variable. # 87 | ############################################################################ 88 | out, cache1 = conv_relu_pool_forward(X, W1, b1, conv_param, pool_param) 89 | out, cache2 = affine_relu_forward(out, W2, b2) 90 | scores, cache3 = affine_forward(out, W3, b3) 91 | ############################################################################ 92 | # END OF YOUR CODE # 93 | ############################################################################ 94 | 95 | if y is None: 96 | return scores 97 | 98 | loss, grads = 0, {} 99 | ############################################################################ 100 | # TODO: Implement the backward pass for the three-layer convolutional net, # 101 | # storing the loss and gradients in the loss and grads variables. Compute # 102 | # data loss using softmax, and make sure that grads[k] holds the gradients # 103 | # for self.params[k]. Don't forget to add L2 regularization! 
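# (With the 0.5 convention used in this repo, d/dW [0.5 * reg * sum(W*W)] = reg * W,
# so the backward pass below simply adds reg * W to each weight gradient.)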
# 104 | ############################################################################ 105 | loss, dx = softmax_loss(scores, y) 106 | loss += 0.5 * self.reg*(np.sum(self.params['W1']* self.params['W1']) + \ 107 | np.sum(self.params['W2']* self.params['W2'])+np.sum(self.params['W3']* self.params['W3'])) 108 | 109 | dx, grads['W3'], grads['b3'] = affine_backward(dx, cache3) 110 | dx, grads['W2'], grads['b2'] = affine_relu_backward(dx, cache2) 111 | _, grads['W1'], grads['b1'] = conv_relu_pool_backward(dx, cache1) 112 | grads['W1'] += self.reg*self.params['W1'] 113 | grads['W2'] += self.reg*self.params['W2'] 114 | grads['W3'] += self.reg*self.params['W3'] 115 | ############################################################################ 116 | # END OF YOUR CODE # 117 | ############################################################################ 118 | 119 | return loss, grads 120 | 121 | 122 | pass 123 | -------------------------------------------------------------------------------- /assignment2/cs231n/classifiers/fc_net.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | 3 | from cs231n.layers import * 4 | from cs231n.layer_utils import * 5 | 6 | 7 | class TwoLayerNet(object): 8 | """ 9 | A two-layer fully-connected neural network with ReLU nonlinearity and 10 | softmax loss that uses a modular layer design. We assume an input dimension 11 | of D, a hidden dimension of H, and perform classification over C classes. 12 | 13 | The architecture should be affine - relu - affine - softmax. 14 | 15 | Note that this class does not implement gradient descent; instead, it 16 | will interact with a separate Solver object that is responsible for running 17 | optimization. 18 | 19 | The learnable parameters of the model are stored in the dictionary 20 | self.params that maps parameter names to numpy arrays. 21 | """ 22 | 23 | def __init__(self, input_dim=3*32*32, hidden_dim=100, num_classes=10, 24 | weight_scale=1e-3, reg=0.0): 25 | """ 26 | Initialize a new network. 27 | 28 | Inputs: 29 | - input_dim: An integer giving the size of the input 30 | - hidden_dim: An integer giving the size of the hidden layer 31 | - num_classes: An integer giving the number of classes to classify 33 | - weight_scale: Scalar giving the standard deviation for random 34 | initialization of the weights. 35 | - reg: Scalar giving L2 regularization strength. 36 | """ 37 | self.params = {} 38 | self.reg = reg 39 | 40 | ############################################################################ 41 | # TODO: Initialize the weights and biases of the two-layer net. Weights # 42 | # should be initialized from a Gaussian with standard deviation equal to # 43 | # weight_scale, and biases should be initialized to zero. All weights and # 44 | # biases should be stored in the dictionary self.params, with first layer # 45 | # weights and biases using the keys 'W1' and 'b1' and second layer weights # 46 | # and biases using the keys 'W2' and 'b2'.
# 47 | ############################################################################ 48 | self.params['W1'] = np.random.randn(input_dim, hidden_dim) * weight_scale 49 | self.params['b1'] = np.zeros(hidden_dim) 50 | self.params['W2'] = np.random.randn(hidden_dim, num_classes) * weight_scale 51 | self.params['b2'] = np.zeros(num_classes) 52 | ############################################################################ 53 | # END OF YOUR CODE # 54 | ############################################################################ 55 | 56 | 57 | def loss(self, X, y=None): 58 | """ 59 | Compute loss and gradient for a minibatch of data. 60 | 61 | Inputs: 62 | - X: Array of input data of shape (N, d_1, ..., d_k) 63 | - y: Array of labels, of shape (N,). y[i] gives the label for X[i]. 64 | 65 | Returns: 66 | If y is None, then run a test-time forward pass of the model and return: 67 | - scores: Array of shape (N, C) giving classification scores, where 68 | scores[i, c] is the classification score for X[i] and class c. 69 | 70 | If y is not None, then run a training-time forward and backward pass and 71 | return a tuple of: 72 | - loss: Scalar value giving the loss 73 | - grads: Dictionary with the same keys as self.params, mapping parameter 74 | names to gradients of the loss with respect to those parameters. 75 | """ 76 | scores = None 77 | ############################################################################ 78 | # TODO: Implement the forward pass for the two-layer net, computing the # 79 | # class scores for X and storing them in the scores variable. # 80 | ############################################################################ 81 | out1, cache1 = affine_relu_forward(X, self.params['W1'], self.params['b1']) 82 | scores, cache2 = affine_forward(out1, self.params['W2'], self.params['b2']) 83 | ############################################################################ 84 | # END OF YOUR CODE # 85 | ############################################################################ 86 | 87 | # If y is None then we are in test mode so just return scores 88 | if y is None: 89 | return scores 90 | 91 | loss, grads = 0, {} 92 | ############################################################################ 93 | # TODO: Implement the backward pass for the two-layer net. Store the loss # 94 | # in the loss variable and gradients in the grads dictionary. Compute data # 95 | # loss using softmax, and make sure that grads[k] holds the gradients for # 96 | # self.params[k]. Don't forget to add L2 regularization! # 97 | # # 98 | # NOTE: To ensure that your implementation matches ours and you pass the # 99 | # automated tests, make sure that your L2 regularization includes a factor # 100 | # of 0.5 to simplify the expression for the gradient. # 101 | ############################################################################ 102 | loss, dx = softmax_loss(scores, y) 103 | loss += self.reg*0.5*(np.sum(np.square(self.params['W1']))+ 104 | np.sum(np.square(self.params['W2']))) 105 | 106 | dx, grads['W2'], grads['b2'] = affine_backward(dx, cache2) 107 | dx, grads['W1'], grads['b1'] = affine_relu_backward(dx, cache1) 108 | # MISTAKE (note to self):
# in an earlier attempt I mistakenly used the gradients, not the parameters, for the regularization term: 109 | # grads['W2'] += self.reg*grads['W2'] 110 | # grads['W1'] += self.reg*grads['W1'] 111 | grads['W1'] += self.reg*self.params['W1'] 112 | grads['W2'] += self.reg*self.params['W2'] 113 | ############################################################################ 114 | # END OF YOUR CODE # 115 | ############################################################################ 116 | 117 | return loss, grads 118 | 119 | 120 | class FullyConnectedNet(object): 121 | """ 122 | A fully-connected neural network with an arbitrary number of hidden layers, 123 | ReLU nonlinearities, and a softmax loss function. This will also implement 124 | dropout and batch normalization as options. For a network with L layers, 125 | the architecture will be 126 | 127 | {affine - [batch norm] - relu - [dropout]} x (L - 1) - affine - softmax 128 | 129 | where batch normalization and dropout are optional, and the {...} block is 130 | repeated L - 1 times. 131 | 132 | Similar to the TwoLayerNet above, learnable parameters are stored in the 133 | self.params dictionary and will be learned using the Solver class. 134 | """ 135 | 136 | def __init__(self, hidden_dims, input_dim=3*32*32, num_classes=10, 137 | dropout=0, use_batchnorm=False, reg=0.0, 138 | weight_scale=1e-2, dtype=np.float32, seed=None): 139 | """ 140 | Initialize a new FullyConnectedNet. 141 | 142 | Inputs: 143 | - hidden_dims: A list of integers giving the size of each hidden layer. 144 | - input_dim: An integer giving the size of the input. 145 | - num_classes: An integer giving the number of classes to classify. 146 | - dropout: Scalar between 0 and 1 giving dropout strength. If dropout=0 then 147 | the network should not use dropout at all. 148 | - use_batchnorm: Whether or not the network should use batch normalization. 149 | - reg: Scalar giving L2 regularization strength. 150 | - weight_scale: Scalar giving the standard deviation for random 151 | initialization of the weights. 152 | - dtype: A numpy datatype object; all computations will be performed using 153 | this datatype. float32 is faster but less accurate, so you should use 154 | float64 for numeric gradient checking. 155 | - seed: If not None, then pass this random seed to the dropout layers. This 156 | will make the dropout layers deterministic so we can gradient check the 157 | model. 158 | """ 159 | self.use_batchnorm = use_batchnorm 160 | self.use_dropout = dropout > 0 161 | self.reg = reg 162 | self.num_layers = 1 + len(hidden_dims) 163 | self.dtype = dtype 164 | self.params = {} 165 | 166 | ############################################################################ 167 | # TODO: Initialize the parameters of the network, storing all values in # 168 | # the self.params dictionary. Store weights and biases for the first layer # 169 | # in W1 and b1; for the second layer use W2 and b2, etc. Weights should be # 170 | # initialized from a normal distribution with standard deviation equal to # 171 | # weight_scale and biases should be initialized to zero. # 172 | # # 173 | # When using batch normalization, store scale and shift parameters for the # 174 | # first layer in gamma1 and beta1; for the second layer use gamma2 and # 175 | # beta2, etc. Scale parameters should be initialized to one and shift # 176 | # parameters should be initialized to zero.
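# (For example, hidden_dims=[100, 50] with use_batchnorm=True yields params W1,
# gamma1, beta1, W2, gamma2, beta2, W3, b3; the hidden biases b1 and b2 are
# deleted below because batchnorm's beta shift makes them redundant.)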
# 177 | ############################################################################ 178 | hidden_layers = len(hidden_dims) 179 | self.params['W1'] = np.random.randn(input_dim, hidden_dims[0]) * weight_scale 180 | self.params['b1'] = np.zeros(hidden_dims[0]) 181 | # In fact 'b' is useless if we use batch norm 182 | if self.use_batchnorm: 183 | del self.params['b1'] 184 | self.params['gamma1'] = np.ones(hidden_dims[0]) 185 | self.params['beta1'] = np.zeros(hidden_dims[0]) 186 | self.params['W'+str(self.num_layers)] = np.random.randn(hidden_dims[-1], num_classes) * weight_scale 187 | self.params['b'+str(self.num_layers)] = np.zeros(num_classes) 188 | for i in range(1, hidden_layers): 189 | self.params['W'+str(i+1)] = np.random.randn(hidden_dims[i-1], hidden_dims[i]) * weight_scale 190 | self.params['b'+str(i+1)] = np.zeros(hidden_dims[i]) 191 | if self.use_batchnorm: 192 | del self.params['b'+str(i+1)] 193 | self.params['gamma'+str(i+1)] = np.ones(hidden_dims[i]) 194 | self.params['beta'+str(i+1)] = np.zeros(hidden_dims[i]) 195 | 196 | ############################################################################ 197 | # END OF YOUR CODE # 198 | ############################################################################ 199 | 200 | # When using dropout we need to pass a dropout_param dictionary to each 201 | # dropout layer so that the layer knows the dropout probability and the mode 202 | # (train / test). You can pass the same dropout_param to each dropout layer. 203 | self.dropout_param = {} 204 | if self.use_dropout: 205 | self.dropout_param = {'mode': 'train', 'p': dropout} 206 | if seed is not None: 207 | self.dropout_param['seed'] = seed 208 | 209 | # With batch normalization we need to keep track of running means and 210 | # variances, so we need to pass a special bn_param object to each batch 211 | # normalization layer. You should pass self.bn_params[0] to the forward pass 212 | # of the first batch normalization layer, self.bn_params[1] to the forward 213 | # pass of the second batch normalization layer, etc. 214 | self.bn_params = [] 215 | if self.use_batchnorm: 216 | self.bn_params = [{'mode': 'train'} for i in xrange(self.num_layers - 1)] 217 | 218 | # Cast all parameters to the correct datatype 219 | for k, v in self.params.iteritems(): 220 | self.params[k] = v.astype(dtype) 221 | 222 | 223 | def loss(self, X, y=None): 224 | """ 225 | Compute loss and gradient for the fully-connected net. 226 | 227 | Input / output: Same as TwoLayerNet above. 228 | """ 229 | X = X.astype(self.dtype) 230 | mode = 'test' if y is None else 'train' 231 | 232 | # Set train/test mode for batchnorm params and dropout param since they 233 | # behave differently during training and testing. 234 | if self.dropout_param is not None: 235 | self.dropout_param['mode'] = mode 236 | if self.use_batchnorm: 237 | for bn_param in self.bn_params: 238 | # I think the author made a mistake 239 | # bn_param[mode] = mode 240 | bn_param['mode'] = mode 241 | 242 | 243 | scores = None 244 | ############################################################################ 245 | # TODO: Implement the forward pass for the fully-connected net, computing # 246 | # the class scores for X and storing them in the scores variable. # 247 | # # 248 | # When using dropout, you'll need to pass self.dropout_param to each # 249 | # dropout forward pass. 
# 250 | # # 251 | # When using batch normalization, you'll need to pass self.bn_params[0] to # 252 | # the forward pass for the first batch normalization layer, pass # 253 | # self.bn_params[1] to the forward pass for the second batch normalization # 254 | # layer, etc. # 255 | ############################################################################ 256 | self.cache = {} 257 | hidden_layers = self.num_layers-1 258 | if not self.use_batchnorm: 259 | out, self.cache['cache1'] = affine_relu_forward(X, self.params['W1'], self.params['b1']) 260 | else: 261 | out, self.cache['cache1'] = affine_bn_relu_forward(X, self.params['W1'], self.params['gamma1'], self.params['beta1'], self.bn_params[0]) 262 | 263 | if self.use_dropout: 264 | out, self.cache['drop1'] = dropout_forward(out, self.dropout_param) 265 | 266 | for i in range(2, hidden_layers+1): 267 | if not self.use_batchnorm: 268 | out, self.cache['cache'+str(i)] = affine_relu_forward(out, self.params['W'+str(i)], self.params['b'+str(i)]) 269 | else: 270 | out, self.cache['cache'+str(i)] = affine_bn_relu_forward(out, self.params['W'+str(i)], self.params['gamma'+str(i)], self.params['beta'+str(i)], self.bn_params[i-1]) 271 | 272 | if self.use_dropout: 273 | out, self.cache['drop'+str(i)] = dropout_forward(out, self.dropout_param) 274 | 275 | scores, self.cache['cache'+str(hidden_layers+1)] = affine_forward(out, self.params['W'+str(hidden_layers+1)], self.params['b'+str(hidden_layers+1)]) 276 | ############################################################################ 277 | # END OF YOUR CODE # 278 | ############################################################################ 279 | 280 | # If test mode return early 281 | if mode == 'test': 282 | return scores 283 | 284 | loss, grads = 0.0, {} 285 | ############################################################################ 286 | # TODO: Implement the backward pass for the fully-connected net. Store the # 287 | # loss in the loss variable and gradients in the grads dictionary. Compute # 288 | # data loss using softmax, and make sure that grads[k] holds the gradients # 289 | # for self.params[k]. Don't forget to add L2 regularization! # 290 | # # 291 | # When using batch normalization, you don't need to regularize the scale # 292 | # and shift parameters. # 293 | # # 294 | # NOTE: To ensure that your implementation matches ours and you pass the # 295 | # automated tests, make sure that your L2 regularization includes a factor # 296 | # of 0.5 to simplify the expression for the gradient. 
# 297 | ############################################################################ 298 | # loss = softmax_loss(scores, y) 299 | for i in range(1, hidden_layers+2): 300 | loss += np.sum(np.square(self.params['W'+str(i)])) 301 | loss = loss*0.5*self.reg 302 | loss_0, dx = softmax_loss(scores, y) 303 | loss += loss_0 304 | 305 | # The last layer is a plain affine layer whether or not batchnorm is used: 306 | dx, grads['W'+str(hidden_layers+1)], grads['b'+str(hidden_layers+1)] = affine_backward(dx, self.cache['cache'+str(hidden_layers+1)]) 309 | grads['W'+str(hidden_layers+1)] += self.reg*self.params['W'+str(hidden_layers+1)] 310 | 311 | for i in range(hidden_layers, 0, -1): 312 | if self.use_dropout: 313 | dx = dropout_backward(dx, self.cache['drop'+str(i)]) 314 | 315 | if not self.use_batchnorm: 316 | dx, grads['W'+str(i)], grads['b'+str(i)] = affine_relu_backward(dx, self.cache['cache'+str(i)]) 317 | else: 318 | dx, grads['W'+str(i)], grads['gamma'+str(i)], grads['beta'+str(i)] = affine_bn_relu_backward(dx, self.cache['cache'+str(i)]) 319 | grads['W'+str(i)] += self.reg*self.params['W'+str(i)] 320 | ############################################################################ 321 | # END OF YOUR CODE # 322 | ############################################################################ 323 | 324 | return loss, grads 325 | -------------------------------------------------------------------------------- /assignment2/cs231n/data_utils.py: -------------------------------------------------------------------------------- 1 | import cPickle as pickle 2 | import numpy as np 3 | import os 4 | from scipy.misc import imread 5 | 6 | def load_CIFAR_batch(filename): 7 | """ load single batch of cifar """ 8 | with open(filename, 'rb') as f: 9 | datadict = pickle.load(f) 10 | X = datadict['data'] 11 | Y = datadict['labels'] 12 | X = X.reshape(10000, 3, 32, 32).transpose(0,2,3,1).astype("float") 13 | Y = np.array(Y) 14 | return X, Y 15 | 16 | def load_CIFAR10(ROOT): 17 | """ load all of cifar """ 18 | xs = [] 19 | ys = [] 20 | for b in range(1,6): 21 | f = os.path.join(ROOT, 'data_batch_%d' % (b, )) 22 | X, Y = load_CIFAR_batch(f) 23 | xs.append(X) 24 | ys.append(Y) 25 | Xtr = np.concatenate(xs) 26 | Ytr = np.concatenate(ys) 27 | del X, Y 28 | Xte, Yte = load_CIFAR_batch(os.path.join(ROOT, 'test_batch')) 29 | return Xtr, Ytr, Xte, Yte 30 | 31 | 32 | def get_CIFAR10_data(num_training=49000, num_validation=1000, num_test=1000): 33 | """ 34 | Load the CIFAR-10 dataset from disk and perform preprocessing to prepare 35 | it for classifiers. These are the same steps as we used for the SVM, but 36 | condensed to a single function.
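    Example (shapes follow the default split sizes):
      data = get_CIFAR10_data()
      print data['X_train'].shape  # (49000, 3, 32, 32)
      print data['X_val'].shape    # (1000, 3, 32, 32)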
37 | """ 38 | # Load the raw CIFAR-10 data 39 | cifar10_dir = 'cs231n/datasets/cifar-10-batches-py' 40 | X_train, y_train, X_test, y_test = load_CIFAR10(cifar10_dir) 41 | 42 | # Subsample the data 43 | mask = range(num_training, num_training + num_validation) 44 | X_val = X_train[mask] 45 | y_val = y_train[mask] 46 | mask = range(num_training) 47 | X_train = X_train[mask] 48 | y_train = y_train[mask] 49 | mask = range(num_test) 50 | X_test = X_test[mask] 51 | y_test = y_test[mask] 52 | 53 | # Normalize the data: subtract the mean image 54 | mean_image = np.mean(X_train, axis=0) 55 | X_train -= mean_image 56 | X_val -= mean_image 57 | X_test -= mean_image 58 | 59 | # Transpose so that channels come first 60 | X_train = X_train.transpose(0, 3, 1, 2).copy() 61 | X_val = X_val.transpose(0, 3, 1, 2).copy() 62 | X_test = X_test.transpose(0, 3, 1, 2).copy() 63 | 64 | # Package data into a dictionary 65 | return { 66 | 'X_train': X_train, 'y_train': y_train, 67 | 'X_val': X_val, 'y_val': y_val, 68 | 'X_test': X_test, 'y_test': y_test, 69 | } 70 | 71 | 72 | def load_tiny_imagenet(path, dtype=np.float32): 73 | """ 74 | Load TinyImageNet. Each of TinyImageNet-100-A, TinyImageNet-100-B, and 75 | TinyImageNet-200 have the same directory structure, so this can be used 76 | to load any of them. 77 | 78 | Inputs: 79 | - path: String giving path to the directory to load. 80 | - dtype: numpy datatype used to load the data. 81 | 82 | Returns: A tuple of 83 | - class_names: A list where class_names[i] is a list of strings giving the 84 | WordNet names for class i in the loaded dataset. 85 | - X_train: (N_tr, 3, 64, 64) array of training images 86 | - y_train: (N_tr,) array of training labels 87 | - X_val: (N_val, 3, 64, 64) array of validation images 88 | - y_val: (N_val,) array of validation labels 89 | - X_test: (N_test, 3, 64, 64) array of testing images. 90 | - y_test: (N_test,) array of test labels; if test labels are not available 91 | (such as in student code) then y_test will be None. 92 | """ 93 | # First load wnids 94 | with open(os.path.join(path, 'wnids.txt'), 'r') as f: 95 | wnids = [x.strip() for x in f] 96 | 97 | # Map wnids to integer labels 98 | wnid_to_label = {wnid: i for i, wnid in enumerate(wnids)} 99 | 100 | # Use words.txt to get names for each class 101 | with open(os.path.join(path, 'words.txt'), 'r') as f: 102 | wnid_to_words = dict(line.split('\t') for line in f) 103 | for wnid, words in wnid_to_words.iteritems(): 104 | wnid_to_words[wnid] = [w.strip() for w in words.split(',')] 105 | class_names = [wnid_to_words[wnid] for wnid in wnids] 106 | 107 | # Next load training data. 
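# (Each wnid's training filenames are listed in train/<wnid>/<wnid>_boxes.txt,
# which is why the loop below parses the boxes file instead of listing the
# images directory.)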
108 | X_train = [] 109 | y_train = [] 110 | for i, wnid in enumerate(wnids): 111 | if (i + 1) % 20 == 0: 112 | print 'loading training data for synset %d / %d' % (i + 1, len(wnids)) 113 | # To figure out the filenames we need to open the boxes file 114 | boxes_file = os.path.join(path, 'train', wnid, '%s_boxes.txt' % wnid) 115 | with open(boxes_file, 'r') as f: 116 | filenames = [x.split('\t')[0] for x in f] 117 | num_images = len(filenames) 118 | 119 | X_train_block = np.zeros((num_images, 3, 64, 64), dtype=dtype) 120 | y_train_block = wnid_to_label[wnid] * np.ones(num_images, dtype=np.int64) 121 | for j, img_file in enumerate(filenames): 122 | img_file = os.path.join(path, 'train', wnid, 'images', img_file) 123 | img = imread(img_file) 124 | if img.ndim == 2: 125 | ## grayscale file 126 | img.shape = (64, 64, 1) 127 | X_train_block[j] = img.transpose(2, 0, 1) 128 | X_train.append(X_train_block) 129 | y_train.append(y_train_block) 130 | 131 | # We need to concatenate all training data 132 | X_train = np.concatenate(X_train, axis=0) 133 | y_train = np.concatenate(y_train, axis=0) 134 | 135 | # Next load validation data 136 | with open(os.path.join(path, 'val', 'val_annotations.txt'), 'r') as f: 137 | img_files = [] 138 | val_wnids = [] 139 | for line in f: 140 | img_file, wnid = line.split('\t')[:2] 141 | img_files.append(img_file) 142 | val_wnids.append(wnid) 143 | num_val = len(img_files) 144 | y_val = np.array([wnid_to_label[wnid] for wnid in val_wnids]) 145 | X_val = np.zeros((num_val, 3, 64, 64), dtype=dtype) 146 | for i, img_file in enumerate(img_files): 147 | img_file = os.path.join(path, 'val', 'images', img_file) 148 | img = imread(img_file) 149 | if img.ndim == 2: 150 | img.shape = (64, 64, 1) 151 | X_val[i] = img.transpose(2, 0, 1) 152 | 153 | # Next load test images 154 | # Students won't have test labels, so we need to iterate over files in the 155 | # images directory. 156 | img_files = os.listdir(os.path.join(path, 'test', 'images')) 157 | X_test = np.zeros((len(img_files), 3, 64, 64), dtype=dtype) 158 | for i, img_file in enumerate(img_files): 159 | img_file = os.path.join(path, 'test', 'images', img_file) 160 | img = imread(img_file) 161 | if img.ndim == 2: 162 | img.shape = (64, 64, 1) 163 | X_test[i] = img.transpose(2, 0, 1) 164 | 165 | y_test = None 166 | y_test_file = os.path.join(path, 'test', 'test_annotations.txt') 167 | if os.path.isfile(y_test_file): 168 | with open(y_test_file, 'r') as f: 169 | img_file_to_wnid = {} 170 | for line in f: 171 | line = line.split('\t') 172 | img_file_to_wnid[line[0]] = line[1] 173 | y_test = [wnid_to_label[img_file_to_wnid[img_file]] for img_file in img_files] 174 | y_test = np.array(y_test) 175 | 176 | return class_names, X_train, y_train, X_val, y_val, X_test, y_test 177 | 178 | 179 | def load_models(models_dir): 180 | """ 181 | Load saved models from disk. This will attempt to unpickle all files in a 182 | directory; any files that give errors on unpickling (such as README.txt) will 183 | be skipped. 184 | 185 | Inputs: 186 | - models_dir: String giving the path to a directory containing model files. 187 | Each model file is a pickled dictionary with a 'model' field. 188 | 189 | Returns: 190 | A dictionary mapping model file names to models. 
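    Example (a sketch; assumes a directory of pickled model dicts, such as the
    tiny-100-A-pretrained files referenced in cs231n/datasets/.gitignore):
      models = load_models('cs231n/datasets/tiny-100-A-pretrained')
      for name, model in models.iteritems():
        print 'loaded', name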
191 | """ 192 | models = {} 193 | for model_file in os.listdir(models_dir): 194 | with open(os.path.join(models_dir, model_file), 'rb') as f: 195 | try: 196 | models[model_file] = pickle.load(f)['model'] 197 | except pickle.UnpicklingError: 198 | continue 199 | return models 200 | -------------------------------------------------------------------------------- /assignment2/cs231n/datasets/.gitignore: -------------------------------------------------------------------------------- 1 | cifar-10-batches-py/* 2 | tiny-imagenet-100-A* 3 | tiny-imagenet-100-B* 4 | tiny-100-A-pretrained/* 5 | -------------------------------------------------------------------------------- /assignment2/cs231n/datasets/get_datasets.sh: -------------------------------------------------------------------------------- 1 | # Get CIFAR10 2 | wget http://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz 3 | tar -xzvf cifar-10-python.tar.gz 4 | rm cifar-10-python.tar.gz 5 | -------------------------------------------------------------------------------- /assignment2/cs231n/fast_layers.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | try: 3 | from cs231n.im2col_cython import col2im_cython, im2col_cython 4 | from cs231n.im2col_cython import col2im_6d_cython 5 | except ImportError: 6 | print 'run the following from the cs231n directory and try again:' 7 | print 'python setup.py build_ext --inplace' 8 | print 'You may also need to restart your iPython kernel' 9 | 10 | from cs231n.im2col import * 11 | 12 | 13 | def conv_forward_im2col(x, w, b, conv_param): 14 | """ 15 | A fast implementation of the forward pass for a convolutional layer 16 | based on im2col and col2im. 17 | """ 18 | N, C, H, W = x.shape 19 | num_filters, _, filter_height, filter_width = w.shape 20 | stride, pad = conv_param['stride'], conv_param['pad'] 21 | 22 | # Check dimensions 23 | assert (W + 2 * pad - filter_width) % stride == 0, 'width does not work' 24 | assert (H + 2 * pad - filter_height) % stride == 0, 'height does not work' 25 | 26 | # Create output 27 | out_height = (H + 2 * pad - filter_height) / stride + 1 28 | out_width = (W + 2 * pad - filter_width) / stride + 1 29 | out = np.zeros((N, num_filters, out_height, out_width), dtype=x.dtype) 30 | 31 | # x_cols = im2col_indices(x, w.shape[2], w.shape[3], pad, stride) 32 | x_cols = im2col_cython(x, w.shape[2], w.shape[3], pad, stride) 33 | res = w.reshape((w.shape[0], -1)).dot(x_cols) + b.reshape(-1, 1) 34 | 35 | out = res.reshape(w.shape[0], out.shape[2], out.shape[3], x.shape[0]) 36 | out = out.transpose(3, 0, 1, 2) 37 | 38 | cache = (x, w, b, conv_param, x_cols) 39 | return out, cache 40 | 41 | 42 | def conv_forward_strides(x, w, b, conv_param): 43 | N, C, H, W = x.shape 44 | F, _, HH, WW = w.shape 45 | stride, pad = conv_param['stride'], conv_param['pad'] 46 | 47 | # Check dimensions 48 | assert (W + 2 * pad - WW) % stride == 0, 'width does not work' 49 | assert (H + 2 * pad - HH) % stride == 0, 'height does not work' 50 | 51 | # Pad the input 52 | p = pad 53 | x_padded = np.pad(x, ((0, 0), (0, 0), (p, p), (p, p)), mode='constant') 54 | 55 | # Figure out output dimensions 56 | H += 2 * pad 57 | W += 2 * pad 58 | out_h = (H - HH) / stride + 1 59 | out_w = (W - WW) / stride + 1 60 | 61 | # Perform an im2col operation by picking clever strides 62 | shape = (C, HH, WW, N, out_h, out_w) 63 | strides = (H * W, W, 1, C * H * W, stride * W, stride) 64 | strides = x.itemsize * np.array(strides) 65 | x_stride = 
np.lib.stride_tricks.as_strided(x_padded, 66 | shape=shape, strides=strides) 67 | x_cols = np.ascontiguousarray(x_stride) 68 | x_cols.shape = (C * HH * WW, N * out_h * out_w) 69 | 70 | # Now all our convolutions are a big matrix multiply 71 | res = w.reshape(F, -1).dot(x_cols) + b.reshape(-1, 1) 72 | 73 | # Reshape the output 74 | res.shape = (F, N, out_h, out_w) 75 | out = res.transpose(1, 0, 2, 3) 76 | 77 | # Be nice and return a contiguous array 78 | # The old version of conv_forward_fast doesn't do this, so for a fair 79 | # comparison we won't either 80 | out = np.ascontiguousarray(out) 81 | 82 | cache = (x, w, b, conv_param, x_cols) 83 | return out, cache 84 | 85 | 86 | def conv_backward_strides(dout, cache): 87 | x, w, b, conv_param, x_cols = cache 88 | stride, pad = conv_param['stride'], conv_param['pad'] 89 | 90 | N, C, H, W = x.shape 91 | F, _, HH, WW = w.shape 92 | _, _, out_h, out_w = dout.shape 93 | 94 | db = np.sum(dout, axis=(0, 2, 3)) 95 | 96 | dout_reshaped = dout.transpose(1, 0, 2, 3).reshape(F, -1) 97 | dw = dout_reshaped.dot(x_cols.T).reshape(w.shape) 98 | 99 | dx_cols = w.reshape(F, -1).T.dot(dout_reshaped) 100 | dx_cols.shape = (C, HH, WW, N, out_h, out_w) 101 | dx = col2im_6d_cython(dx_cols, N, C, H, W, HH, WW, pad, stride) 102 | 103 | return dx, dw, db 104 | 105 | 106 | def conv_backward_im2col(dout, cache): 107 | """ 108 | A fast implementation of the backward pass for a convolutional layer 109 | based on im2col and col2im. 110 | """ 111 | x, w, b, conv_param, x_cols = cache 112 | stride, pad = conv_param['stride'], conv_param['pad'] 113 | 114 | db = np.sum(dout, axis=(0, 2, 3)) 115 | 116 | num_filters, _, filter_height, filter_width = w.shape 117 | dout_reshaped = dout.transpose(1, 2, 3, 0).reshape(num_filters, -1) 118 | dw = dout_reshaped.dot(x_cols.T).reshape(w.shape) 119 | 120 | dx_cols = w.reshape(num_filters, -1).T.dot(dout_reshaped) 121 | # dx = col2im_indices(dx_cols, x.shape, filter_height, filter_width, pad, stride) 122 | dx = col2im_cython(dx_cols, x.shape[0], x.shape[1], x.shape[2], x.shape[3], 123 | filter_height, filter_width, pad, stride) 124 | 125 | return dx, dw, db 126 | 127 | 128 | conv_forward_fast = conv_forward_strides 129 | conv_backward_fast = conv_backward_strides 130 | 131 | 132 | def max_pool_forward_fast(x, pool_param): 133 | """ 134 | A fast implementation of the forward pass for a max pooling layer. 135 | 136 | This chooses between the reshape method and the im2col method. If the pooling 137 | regions are square and tile the input image, then we can use the reshape 138 | method which is very fast. Otherwise we fall back on the im2col method, which 139 | is not much faster than the naive method. 140 | """ 141 | N, C, H, W = x.shape 142 | pool_height, pool_width = pool_param['pool_height'], pool_param['pool_width'] 143 | stride = pool_param['stride'] 144 | 145 | same_size = pool_height == pool_width == stride 146 | tiles = H % pool_height == 0 and W % pool_width == 0 147 | if same_size and tiles: 148 | out, reshape_cache = max_pool_forward_reshape(x, pool_param) 149 | cache = ('reshape', reshape_cache) 150 | else: 151 | out, im2col_cache = max_pool_forward_im2col(x, pool_param) 152 | cache = ('im2col', im2col_cache) 153 | return out, cache 154 | 155 | 156 | def max_pool_backward_fast(dout, cache): 157 | """ 158 | A fast implementation of the backward pass for a max pooling layer. 159 | 160 | This switches between the reshape method and the im2col method depending on 161 | which method was used to generate the cache.
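    Example (round trip through the fast pooling layers):
      pool_param = {'pool_height': 2, 'pool_width': 2, 'stride': 2}
      out, cache = max_pool_forward_fast(x, pool_param)  # 'reshape' path when 2x2 tiles divide H and W
      dx = max_pool_backward_fast(dout, cache)           # dout must have out's shape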
162 | """ 163 | method, real_cache = cache 164 | if method == 'reshape': 165 | return max_pool_backward_reshape(dout, real_cache) 166 | elif method == 'im2col': 167 | return max_pool_backward_im2col(dout, real_cache) 168 | else: 169 | raise ValueError('Unrecognized method "%s"' % method) 170 | 171 | 172 | def max_pool_forward_reshape(x, pool_param): 173 | """ 174 | A fast implementation of the forward pass for the max pooling layer that uses 175 | some clever reshaping. 176 | 177 | This can only be used for square pooling regions that tile the input. 178 | """ 179 | N, C, H, W = x.shape 180 | pool_height, pool_width = pool_param['pool_height'], pool_param['pool_width'] 181 | stride = pool_param['stride'] 182 | assert pool_height == pool_width == stride, 'Invalid pool params' 183 | assert H % pool_height == 0 184 | assert W % pool_width == 0 185 | x_reshaped = x.reshape(N, C, H / pool_height, pool_height, 186 | W / pool_width, pool_width) 187 | out = x_reshaped.max(axis=3).max(axis=4) 188 | 189 | cache = (x, x_reshaped, out) 190 | return out, cache 191 | 192 | 193 | def max_pool_backward_reshape(dout, cache): 194 | """ 195 | A fast implementation of the backward pass for the max pooling layer that 196 | uses some clever broadcasting and reshaping. 197 | 198 | This can only be used if the forward pass was computed using 199 | max_pool_forward_reshape. 200 | 201 | NOTE: If there are multiple argmaxes, this method will assign gradient to 202 | ALL argmax elements of the input rather than picking one. In this case the 203 | gradient will actually be incorrect. However this is unlikely to occur in 204 | practice, so it shouldn't matter much. One possible solution is to split the 205 | upstream gradient equally among all argmax elements; this should result in a 206 | valid subgradient. You can make this happen by uncommenting the line below; 207 | however this results in a significant performance penalty (about 40% slower) 208 | and is unlikely to matter in practice so we don't do it. 209 | """ 210 | x, x_reshaped, out = cache 211 | 212 | dx_reshaped = np.zeros_like(x_reshaped) 213 | out_newaxis = out[:, :, :, np.newaxis, :, np.newaxis] 214 | mask = (x_reshaped == out_newaxis) 215 | dout_newaxis = dout[:, :, :, np.newaxis, :, np.newaxis] 216 | dout_broadcast, _ = np.broadcast_arrays(dout_newaxis, dx_reshaped) 217 | dx_reshaped[mask] = dout_broadcast[mask] 218 | dx_reshaped /= np.sum(mask, axis=(3, 5), keepdims=True) 219 | dx = dx_reshaped.reshape(x.shape) 220 | 221 | return dx 222 | 223 | 224 | def max_pool_forward_im2col(x, pool_param): 225 | """ 226 | An implementation of the forward pass for max pooling based on im2col. 227 | 228 | This isn't much faster than the naive version, so it should be avoided if 229 | possible.
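    (For example, H = W = 32 with a 2x2 pool at stride 2 gives a 16x16 output,
    since (32 - 2) / 2 + 1 = 16.)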
230 | """ 231 | N, C, H, W = x.shape 232 | pool_height, pool_width = pool_param['pool_height'], pool_param['pool_width'] 233 | stride = pool_param['stride'] 234 | 235 | assert (H - pool_height) % stride == 0, 'Invalid height' 236 | assert (W - pool_width) % stride == 0, 'Invalid width' 237 | 238 | out_height = (H - pool_height) / stride + 1 239 | out_width = (W - pool_width) / stride + 1 240 | 241 | x_split = x.reshape(N * C, 1, H, W) 242 | x_cols = im2col(x_split, pool_height, pool_width, padding=0, stride=stride) 243 | x_cols_argmax = np.argmax(x_cols, axis=0) 244 | x_cols_max = x_cols[x_cols_argmax, np.arange(x_cols.shape[1])] 245 | out = x_cols_max.reshape(out_height, out_width, N, C).transpose(2, 3, 0, 1) 246 | 247 | cache = (x, x_cols, x_cols_argmax, pool_param) 248 | return out, cache 249 | 250 | 251 | def max_pool_backward_im2col(dout, cache): 252 | """ 253 | An implementation of the backward pass for max pooling based on im2col. 254 | 255 | This isn't much faster than the naive version, so it should be avoided if 256 | possible. 257 | """ 258 | x, x_cols, x_cols_argmax, pool_param = cache 259 | N, C, H, W = x.shape 260 | pool_height, pool_width = pool_param['pool_height'], pool_param['pool_width'] 261 | stride = pool_param['stride'] 262 | 263 | dout_reshaped = dout.transpose(2, 3, 0, 1).flatten() 264 | dx_cols = np.zeros_like(x_cols) 265 | dx_cols[x_cols_argmax, np.arange(dx_cols.shape[1])] = dout_reshaped 266 | dx = col2im_indices(dx_cols, (N * C, 1, H, W), pool_height, pool_width, 267 | padding=0, stride=stride) 268 | dx = dx.reshape(x.shape) 269 | 270 | return dx 271 | -------------------------------------------------------------------------------- /assignment2/cs231n/gradient_check.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | from random import randrange 3 | 4 | def eval_numerical_gradient(f, x, verbose=True, h=0.00001): 5 | """ 6 | a naive implementation of numerical gradient of f at x 7 | - f should be a function that takes a single argument 8 | - x is the point (numpy array) to evaluate the gradient at 9 | """ 10 | 11 | fx = f(x) # evaluate function value at original point 12 | grad = np.zeros_like(x) 13 | # iterate over all indexes in x 14 | it = np.nditer(x, flags=['multi_index'], op_flags=['readwrite']) 15 | while not it.finished: 16 | 17 | # evaluate function at x+h 18 | ix = it.multi_index 19 | oldval = x[ix] 20 | x[ix] = oldval + h # increment by h 21 | fxph = f(x) # evaluate f(x + h) 22 | x[ix] = oldval - h 23 | fxmh = f(x) # evaluate f(x - h) 24 | x[ix] = oldval # restore 25 | 26 | # compute the partial derivative with centered formula 27 | grad[ix] = (fxph - fxmh) / (2 * h) # the slope 28 | if verbose: 29 | print ix, grad[ix] 30 | it.iternext() # step to next dimension 31 | 32 | return grad 33 | 34 | 35 | def eval_numerical_gradient_array(f, x, df, h=1e-5): 36 | """ 37 | Evaluate a numeric gradient for a function that accepts a numpy 38 | array and returns a numpy array.
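    Example (a sketch of the usual layer gradient check; affine_forward is
    defined in cs231n/layers.py):
      dout = np.random.randn(*out.shape)
      dx_num = eval_numerical_gradient_array(
          lambda v: affine_forward(v, w, b)[0], x, dout)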
39 | """ 40 | grad = np.zeros_like(x) 41 | it = np.nditer(x, flags=['multi_index'], op_flags=['readwrite']) 42 | while not it.finished: 43 | ix = it.multi_index 44 | 45 | oldval = x[ix] 46 | x[ix] = oldval + h 47 | pos = f(x).copy() 48 | x[ix] = oldval - h 49 | neg = f(x).copy() 50 | x[ix] = oldval 51 | 52 | grad[ix] = np.sum((pos - neg) * df) / (2 * h) 53 | it.iternext() 54 | return grad 55 | 56 | 57 | def eval_numerical_gradient_blobs(f, inputs, output, h=1e-5): 58 | """ 59 | Compute numeric gradients for a function that operates on input 60 | and output blobs. 61 | 62 | We assume that f accepts several input blobs as arguments, followed by a blob 63 | into which outputs will be written. For example, f might be called like this: 64 | 65 | f(x, w, out) 66 | 67 | where x and w are input Blobs, and the result of f will be written to out. 68 | 69 | Inputs: 70 | - f: function 71 | - inputs: tuple of input blobs 72 | - output: output blob 73 | - h: step size 74 | """ 75 | numeric_diffs = [] 76 | for input_blob in inputs: 77 | diff = np.zeros_like(input_blob.diffs) 78 | it = np.nditer(input_blob.vals, flags=['multi_index'], 79 | op_flags=['readwrite']) 80 | while not it.finished: 81 | idx = it.multi_index 82 | orig = input_blob.vals[idx] 83 | 84 | input_blob.vals[idx] = orig + h 85 | f(*(inputs + (output,))) 86 | pos = np.copy(output.vals) 87 | input_blob.vals[idx] = orig - h 88 | f(*(inputs + (output,))) 89 | neg = np.copy(output.vals) 90 | input_blob.vals[idx] = orig 91 | 92 | diff[idx] = np.sum((pos - neg) * output.diffs) / (2.0 * h) 93 | 94 | it.iternext() 95 | numeric_diffs.append(diff) 96 | return numeric_diffs 97 | 98 | 99 | def eval_numerical_gradient_net(net, inputs, output, h=1e-5): 100 | return eval_numerical_gradient_blobs(lambda *args: net.forward(), 101 | inputs, output, h=h) 102 | 103 | 104 | def grad_check_sparse(f, x, analytic_grad, num_checks=10, h=1e-5): 105 | """ 106 | Sample a few random elements and only return the numerical gradient 107 | in those dimensions.
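    Example (a sketch; assumes a model whose loss(X, y) returns (loss, grads)
    and reads its parameters from self.params, so perturbing x in place works):
      loss, grads = model.loss(X, y)
      f = lambda _: model.loss(X, y)[0]
      grad_check_sparse(f, model.params['W1'], grads['W1'])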
108 | """ 109 | 110 | for i in xrange(num_checks): 111 | ix = tuple([randrange(m) for m in x.shape]) 112 | 113 | oldval = x[ix] 114 | x[ix] = oldval + h # increment by h 115 | fxph = f(x) # evaluate f(x + h) 116 | x[ix] = oldval - h # decrement by h 117 | fxmh = f(x) # evaluate f(x - h) 118 | x[ix] = oldval # reset 119 | 120 | grad_numerical = (fxph - fxmh) / (2 * h) 121 | grad_analytic = analytic_grad[ix] 122 | rel_error = abs(grad_numerical - grad_analytic) / (abs(grad_numerical) + abs(grad_analytic)) 123 | print 'numerical: %f analytic: %f, relative error: %e' % (grad_numerical, grad_analytic, rel_error) 124 | 125 | -------------------------------------------------------------------------------- /assignment2/cs231n/im2col.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | 3 | 4 | def get_im2col_indices(x_shape, field_height, field_width, padding=1, stride=1): 5 | # First figure out what the size of the output should be 6 | N, C, H, W = x_shape 7 | assert (H + 2 * padding - field_height) % stride == 0 8 | assert (W + 2 * padding - field_width) % stride == 0 9 | out_height = (H + 2 * padding - field_height) / stride + 1 10 | out_width = (W + 2 * padding - field_width) / stride + 1 11 | 12 | i0 = np.repeat(np.arange(field_height), field_width) 13 | i0 = np.tile(i0, C) 14 | i1 = stride * np.repeat(np.arange(out_height), out_width) 15 | j0 = np.tile(np.arange(field_width), field_height * C) 16 | j1 = stride * np.tile(np.arange(out_width), out_height) 17 | i = i0.reshape(-1, 1) + i1.reshape(1, -1) 18 | j = j0.reshape(-1, 1) + j1.reshape(1, -1) 19 | 20 | k = np.repeat(np.arange(C), field_height * field_width).reshape(-1, 1) 21 | 22 | return (k, i, j) 23 | 24 | 25 | def im2col_indices(x, field_height, field_width, padding=1, stride=1): 26 | """ An implementation of im2col based on some fancy indexing """ 27 | # Zero-pad the input 28 | p = padding 29 | x_padded = np.pad(x, ((0, 0), (0, 0), (p, p), (p, p)), mode='constant') 30 | 31 | k, i, j = get_im2col_indices(x.shape, field_height, field_width, padding, 32 | stride) 33 | 34 | cols = x_padded[:, k, i, j] 35 | C = x.shape[1] 36 | cols = cols.transpose(1, 2, 0).reshape(field_height * field_width * C, -1) 37 | return cols 38 | 39 | 40 | def col2im_indices(cols, x_shape, field_height=3, field_width=3, padding=1, 41 | stride=1): 42 | """ An implementation of col2im based on fancy indexing and np.add.at """ 43 | N, C, H, W = x_shape 44 | H_padded, W_padded = H + 2 * padding, W + 2 * padding 45 | x_padded = np.zeros((N, C, H_padded, W_padded), dtype=cols.dtype) 46 | k, i, j = get_im2col_indices(x_shape, field_height, field_width, padding, 47 | stride) 48 | cols_reshaped = cols.reshape(C * field_height * field_width, -1, N) 49 | cols_reshaped = cols_reshaped.transpose(2, 0, 1) 50 | np.add.at(x_padded, (slice(None), k, i, j), cols_reshaped) 51 | if padding == 0: 52 | return x_padded 53 | return x_padded[:, :, padding:-padding, padding:-padding] 54 | 55 | pass 56 | -------------------------------------------------------------------------------- /assignment2/cs231n/im2col_cython.pyx: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | cimport numpy as np 3 | cimport cython 4 | 5 | # DTYPE = np.float64 6 | # ctypedef np.float64_t DTYPE_t 7 | 8 | ctypedef fused DTYPE_t: 9 | np.float32_t 10 | np.float64_t 11 | 12 | def im2col_cython(np.ndarray[DTYPE_t, ndim=4] x, int field_height, 13 | int field_width, int padding, int stride): 14 | cdef int
N = x.shape[0] 15 | cdef int C = x.shape[1] 16 | cdef int H = x.shape[2] 17 | cdef int W = x.shape[3] 18 | 19 | cdef int HH = (H + 2 * padding - field_height) / stride + 1 20 | cdef int WW = (W + 2 * padding - field_width) / stride + 1 21 | 22 | cdef int p = padding 23 | cdef np.ndarray[DTYPE_t, ndim=4] x_padded = np.pad(x, 24 | ((0, 0), (0, 0), (p, p), (p, p)), mode='constant') 25 | 26 | cdef np.ndarray[DTYPE_t, ndim=2] cols = np.zeros( 27 | (C * field_height * field_width, N * HH * WW), 28 | dtype=x.dtype) 29 | 30 | # Moving the inner loop to a C function with no bounds checking works, but does 31 | # not seem to help performance in any measurable way. 32 | 33 | im2col_cython_inner(cols, x_padded, N, C, H, W, HH, WW, 34 | field_height, field_width, padding, stride) 35 | return cols 36 | 37 | 38 | @cython.boundscheck(False) 39 | cdef int im2col_cython_inner(np.ndarray[DTYPE_t, ndim=2] cols, 40 | np.ndarray[DTYPE_t, ndim=4] x_padded, 41 | int N, int C, int H, int W, int HH, int WW, 42 | int field_height, int field_width, int padding, int stride) except? -1: 43 | cdef int c, ii, jj, row, yy, xx, i, col 44 | 45 | for c in range(C): 46 | for yy in range(HH): 47 | for xx in range(WW): 48 | for ii in range(field_height): 49 | for jj in range(field_width): 50 | row = c * field_width * field_height + ii * field_height + jj 51 | for i in range(N): 52 | col = yy * WW * N + xx * N + i 53 | cols[row, col] = x_padded[i, c, stride * yy + ii, stride * xx + jj] 54 | 55 | 56 | 57 | def col2im_cython(np.ndarray[DTYPE_t, ndim=2] cols, int N, int C, int H, int W, 58 | int field_height, int field_width, int padding, int stride): 59 | cdef np.ndarray x = np.empty((N, C, H, W), dtype=cols.dtype) 60 | cdef int HH = (H + 2 * padding - field_height) / stride + 1 61 | cdef int WW = (W + 2 * padding - field_width) / stride + 1 62 | cdef np.ndarray[DTYPE_t, ndim=4] x_padded = np.zeros((N, C, H + 2 * padding, W + 2 * padding), 63 | dtype=cols.dtype) 64 | 65 | # Moving the inner loop to a C-function with no bounds checking improves 66 | # performance quite a bit for col2im. 67 | col2im_cython_inner(cols, x_padded, N, C, H, W, HH, WW, 68 | field_height, field_width, padding, stride) 69 | if padding > 0: 70 | return x_padded[:, :, padding:-padding, padding:-padding] 71 | return x_padded 72 | 73 | 74 | @cython.boundscheck(False) 75 | cdef int col2im_cython_inner(np.ndarray[DTYPE_t, ndim=2] cols, 76 | np.ndarray[DTYPE_t, ndim=4] x_padded, 77 | int N, int C, int H, int W, int HH, int WW, 78 | int field_height, int field_width, int padding, int stride) except? 
-1: 79 | cdef int c, ii, jj, row, yy, xx, i, col 80 | 81 | for c in range(C): 82 | for ii in range(field_height): 83 | for jj in range(field_width): 84 | row = c * field_width * field_height + ii * field_height + jj 85 | for yy in range(HH): 86 | for xx in range(WW): 87 | for i in range(N): 88 | col = yy * WW * N + xx * N + i 89 | x_padded[i, c, stride * yy + ii, stride * xx + jj] += cols[row, col] 90 | 91 | 92 | @cython.boundscheck(False) 93 | @cython.wraparound(False) 94 | cdef col2im_6d_cython_inner(np.ndarray[DTYPE_t, ndim=6] cols, 95 | np.ndarray[DTYPE_t, ndim=4] x_padded, 96 | int N, int C, int H, int W, int HH, int WW, 97 | int out_h, int out_w, int pad, int stride): 98 | 99 | cdef int c, hh, ww, n, h, w 100 | for n in range(N): 101 | for c in range(C): 102 | for hh in range(HH): 103 | for ww in range(WW): 104 | for h in range(out_h): 105 | for w in range(out_w): 106 | x_padded[n, c, stride * h + hh, stride * w + ww] += cols[c, hh, ww, n, h, w] 107 | 108 | 109 | def col2im_6d_cython(np.ndarray[DTYPE_t, ndim=6] cols, int N, int C, int H, int W, 110 | int HH, int WW, int pad, int stride): 111 | cdef np.ndarray x = np.empty((N, C, H, W), dtype=cols.dtype) 112 | cdef int out_h = (H + 2 * pad - HH) / stride + 1 113 | cdef int out_w = (W + 2 * pad - WW) / stride + 1 114 | cdef np.ndarray[DTYPE_t, ndim=4] x_padded = np.zeros((N, C, H + 2 * pad, W + 2 * pad), 115 | dtype=cols.dtype) 116 | 117 | col2im_6d_cython_inner(cols, x_padded, N, C, H, W, HH, WW, out_h, out_w, pad, stride) 118 | 119 | if pad > 0: 120 | return x_padded[:, :, pad:-pad, pad:-pad] 121 | return x_padded 122 | -------------------------------------------------------------------------------- /assignment2/cs231n/layer_utils.py: -------------------------------------------------------------------------------- 1 | from cs231n.layers import * 2 | from cs231n.fast_layers import * 3 | 4 | 5 | def affine_relu_forward(x, w, b): 6 | """ 7 | Convenience layer that performs an affine transform followed by a ReLU 8 | 9 | Inputs: 10 | - x: Input to the affine layer 11 | - w, b: Weights for the affine layer 12 | 13 | Returns a tuple of: 14 | - out: Output from the ReLU 15 | - cache: Object to give to the backward pass 16 | """ 17 | a, fc_cache = affine_forward(x, w, b) 18 | out, relu_cache = relu_forward(a) 19 | cache = (fc_cache, relu_cache) 20 | return out, cache 21 | 22 | 23 | def affine_relu_backward(dout, cache): 24 | """ 25 | Backward pass for the affine-relu convenience layer 26 | """ 27 | fc_cache, relu_cache = cache 28 | da = relu_backward(dout, relu_cache) 29 | dx, dw, db = affine_backward(da, fc_cache) 30 | return dx, dw, db 31 | 32 | 33 | def affine_bn_relu_forward(x, w, gamma, beta, bn_param): 34 | """ 35 | Convenience layer that performs an affine transform followed by batch 36 | normalization and a ReLU 37 | 38 | Inputs: 39 | - x: Input to the affine layer 40 | - w, gamma, beta, bn_param: Affine weights plus batchnorm scale, shift, and parameters (no bias is used; the batchnorm beta plays that role) 41 | 42 | Returns a tuple of: 43 | - out: Output from the ReLU 44 | - cache: Object to give to the backward pass 45 | """ 46 | out, fc_cache = affine_forward(x, w, 0) 47 | out, bn_cache = batchnorm_forward(out, gamma, beta, bn_param) 48 | out, relu_cache = relu_forward(out) 49 | cache = (fc_cache, bn_cache, relu_cache) 50 | return out, cache 51 | 52 | def affine_bn_relu_backward(dout, cache): 53 | """ 54 | Backward pass for the affine-bn-relu convenience layer 55 | """ 56 | fc_cache, bn_cache, relu_cache = cache 57 | dout = relu_backward(dout, relu_cache) 58 | dout, dgamma, dbeta = batchnorm_backward(dout, bn_cache) 59 | dx, dw, db = 
affine_backward(dout, fc_cache) 60 | return dx, dw, dgamma, dbeta 61 | 62 | def conv_relu_forward(x, w, b, conv_param): 63 | """ 64 | A convenience layer that performs a convolution followed by a ReLU. 65 | 66 | Inputs: 67 | - x: Input to the convolutional layer 68 | - w, b, conv_param: Weights and parameters for the convolutional layer 69 | 70 | Returns a tuple of: 71 | - out: Output from the ReLU 72 | - cache: Object to give to the backward pass 73 | """ 74 | a, conv_cache = conv_forward_fast(x, w, b, conv_param) 75 | out, relu_cache = relu_forward(a) 76 | cache = (conv_cache, relu_cache) 77 | return out, cache 78 | 79 | 80 | def conv_relu_backward(dout, cache): 81 | """ 82 | Backward pass for the conv-relu convenience layer. 83 | """ 84 | conv_cache, relu_cache = cache 85 | da = relu_backward(dout, relu_cache) 86 | dx, dw, db = conv_backward_fast(da, conv_cache) 87 | return dx, dw, db 88 | 89 | 90 | def conv_relu_pool_forward(x, w, b, conv_param, pool_param): 91 | """ 92 | Convenience layer that performs a convolution, a ReLU, and a pool. 93 | 94 | Inputs: 95 | - x: Input to the convolutional layer 96 | - w, b, conv_param: Weights and parameters for the convolutional layer 97 | - pool_param: Parameters for the pooling layer 98 | 99 | Returns a tuple of: 100 | - out: Output from the pooling layer 101 | - cache: Object to give to the backward pass 102 | """ 103 | a, conv_cache = conv_forward_fast(x, w, b, conv_param) 104 | s, relu_cache = relu_forward(a) 105 | out, pool_cache = max_pool_forward_fast(s, pool_param) 106 | cache = (conv_cache, relu_cache, pool_cache) 107 | return out, cache 108 | 109 | 110 | def conv_relu_pool_backward(dout, cache): 111 | """ 112 | Backward pass for the conv-relu-pool convenience layer 113 | """ 114 | conv_cache, relu_cache, pool_cache = cache 115 | ds = max_pool_backward_fast(dout, pool_cache) 116 | da = relu_backward(ds, relu_cache) 117 | dx, dw, db = conv_backward_fast(da, conv_cache) 118 | return dx, dw, db 119 | 120 | -------------------------------------------------------------------------------- /assignment2/cs231n/optim.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | 3 | """ 4 | This file implements various first-order update rules that are commonly used for 5 | training neural networks. Each update rule accepts current weights and the 6 | gradient of the loss with respect to those weights and produces the next set of 7 | weights. Each update rule has the same interface: 8 | 9 | def update(w, dw, config=None): 10 | 11 | Inputs: 12 | - w: A numpy array giving the current weights. 13 | - dw: A numpy array of the same shape as w giving the gradient of the 14 | loss with respect to w. 15 | - config: A dictionary containing hyperparameter values such as learning rate, 16 | momentum, etc. If the update rule requires caching values over many 17 | iterations, then config will also hold these cached values. 18 | 19 | Returns: 20 | - next_w: The next point after the update. 21 | - config: The config dictionary to be passed to the next iteration of the 22 | update rule. 23 | 24 | NOTE: For most update rules, the default learning rate will probably not perform 25 | well; however the default values of the other hyperparameters should work well 26 | for a variety of different problems. 27 | 28 | For efficiency, update rules may perform in-place updates, mutating w and 29 | setting next_w equal to w. 
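A typical training loop drives any of these rules through the same interface
(a minimal sketch; compute_gradient is a hypothetical stand-in for a real
loss/gradient computation such as model.loss used by solver.py):

  config = None
  for _ in xrange(num_iterations):
    dw = compute_gradient(w)        # gradient of the loss at the current w
    w, config = sgd(w, dw, config)  # config carries cached state forward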
30 | """ 31 | 32 | 33 | def sgd(w, dw, config=None): 34 | """ 35 | Performs vanilla stochastic gradient descent. 36 | 37 | config format: 38 | - learning_rate: Scalar learning rate. 39 | """ 40 | if config is None: config = {} 41 | config.setdefault('learning_rate', 1e-2) 42 | 43 | w -= config['learning_rate'] * dw 44 | return w, config 45 | 46 | 47 | def sgd_momentum(w, dw, config=None): 48 | """ 49 | Performs stochastic gradient descent with momentum. 50 | 51 | config format: 52 | - learning_rate: Scalar learning rate. 53 | - momentum: Scalar between 0 and 1 giving the momentum value. 54 | Setting momentum = 0 reduces to sgd. 55 | - velocity: A numpy array of the same shape as w and dw used to store a moving 56 | average of the gradients. 57 | """ 58 | if config is None: config = {} 59 | config.setdefault('learning_rate', 1e-2) 60 | config.setdefault('momentum', 0.9) 61 | v = config.get('velocity', np.zeros_like(w)) 62 | 63 | next_w = None 64 | ############################################################################# 65 | # TODO: Implement the momentum update formula. Store the updated value in # 66 | # the next_w variable. You should also use and update the velocity v. # 67 | ############################################################################# 68 | # Note: the gradient term must be subtracted (descent); adding it instead would climb the loss. 69 | v = v*config['momentum'] - config['learning_rate']*dw 70 | next_w = w+v 71 | ############################################################################# 72 | # END OF YOUR CODE # 73 | ############################################################################# 74 | config['velocity'] = v 75 | 76 | return next_w, config 77 | 78 | 79 | 80 | def rmsprop(x, dx, config=None): 81 | """ 82 | Uses the RMSProp update rule, which uses a moving average of squared gradient 83 | values to set adaptive per-parameter learning rates. 84 | 85 | config format: 86 | - learning_rate: Scalar learning rate. 87 | - decay_rate: Scalar between 0 and 1 giving the decay rate for the squared 88 | gradient cache. 89 | - epsilon: Small scalar used for smoothing to avoid dividing by zero. 90 | - cache: Moving average of second moments of gradients. 91 | """ 92 | if config is None: config = {} 93 | config.setdefault('learning_rate', 1e-2) 94 | config.setdefault('decay_rate', 0.99) 95 | config.setdefault('epsilon', 1e-8) 96 | config.setdefault('cache', np.zeros_like(x)) 97 | 98 | next_x = None 99 | ############################################################################# 100 | # TODO: Implement the RMSprop update formula, storing the next value of x # 101 | # in the next_x variable. Don't forget to update cache value stored in # 102 | # config['cache']. # 103 | ############################################################################# 104 | config['cache'] = config['decay_rate']*config['cache'] + \ 105 | (1-config['decay_rate'])*np.square(dx) 106 | next_x = x-config['learning_rate']*dx/(np.sqrt(config['cache'])+config['epsilon']) 107 | ############################################################################# 108 | # END OF YOUR CODE # 109 | ############################################################################# 110 | 111 | return next_x, config 112 | 113 | 114 | def adam(x, dx, config=None): 115 | """ 116 | Uses the Adam update rule, which incorporates moving averages of both the 117 | gradient and its square and a bias correction term. 118 | 119 | config format: 120 | - learning_rate: Scalar learning rate. 
121 | - beta1: Decay rate for moving average of first moment of gradient. 122 | - beta2: Decay rate for moving average of second moment of gradient. 123 | - epsilon: Small scalar used for smoothing to avoid dividing by zero. 124 | - m: Moving average of gradient. 125 | - v: Moving average of squared gradient. 126 | - t: Iteration number. 127 | """ 128 | if config is None: config = {} 129 | config.setdefault('learning_rate', 1e-3) 130 | config.setdefault('beta1', 0.9) 131 | config.setdefault('beta2', 0.999) 132 | config.setdefault('epsilon', 1e-8) 133 | config.setdefault('m', np.zeros_like(x)) 134 | config.setdefault('v', np.zeros_like(x)) 135 | config.setdefault('t', 0) 136 | 137 | next_x = None 138 | ############################################################################# 139 | # TODO: Implement the Adam update formula, storing the next value of x in # 140 | # the next_x variable. Don't forget to update the m, v, and t variables # 141 | # stored in config. # 142 | ############################################################################# 143 | config['t'] += 1 144 | config['m'] = config['beta1']*config['m']+(1-config['beta1'])*dx 145 | config['v'] = config['beta2']*config['v']+(1-config['beta2'])*np.square(dx) 146 | # Note: bias-correct into fresh local variables; dividing config['m'] and 147 | # config['v'] in place would corrupt the running averages on later 148 | # iterations (see the bias-correction step in the Adam paper). 149 | m_hat = config['m'] / (1-np.power(config['beta1'], config['t'])) 150 | v_hat = config['v'] / (1-np.power(config['beta2'], config['t'])) 151 | next_x = x - config['learning_rate']*m_hat/(np.sqrt(v_hat)+config['epsilon']) 152 | ############################################################################# 153 | # END OF YOUR CODE # 154 | ############################################################################# 155 | 156 | return next_x, config 157 | 158 | 159 | 160 | 161 | 162 | -------------------------------------------------------------------------------- /assignment2/cs231n/setup.py: -------------------------------------------------------------------------------- 1 | from distutils.core import setup 2 | from distutils.extension import Extension 3 | from Cython.Build import cythonize 4 | import numpy 5 | 6 | extensions = [ 7 | Extension('im2col_cython', ['im2col_cython.pyx'], 8 | include_dirs = [numpy.get_include()] 9 | ), 10 | ] 11 | 12 | setup( 13 | ext_modules = cythonize(extensions), 14 | ) 15 | -------------------------------------------------------------------------------- /assignment2/cs231n/solver.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | 3 | from cs231n import optim 4 | 5 | 6 | class Solver(object): 7 | """ 8 | A Solver encapsulates all the logic necessary for training classification 9 | models. The Solver performs stochastic gradient descent using different 10 | update rules defined in optim.py. 11 | 12 | The solver accepts both training and validation data and labels so it can 13 | periodically check classification accuracy on both training and validation 14 | data to watch out for overfitting. 15 | 16 | To train a model, you will first construct a Solver instance, passing the 17 | model, dataset, and various options (learning rate, batch size, etc) to the 18 | constructor. You will then call the train() method to run the optimization 19 | procedure and train the model. 
20 | 21 | After the train() method returns, model.params will contain the parameters 22 | that performed best on the validation set over the course of training. 23 | In addition, the instance variable solver.loss_history will contain a list 24 | of all losses encountered during training and the instance variables 25 | solver.train_acc_history and solver.val_acc_history will be lists containing 26 | the accuracies of the model on the training and validation set at each epoch. 27 | 28 | Example usage might look something like this: 29 | 30 | data = { 31 | 'X_train': # training data 32 | 'y_train': # training labels 33 | 'X_val': # validation data 34 | 'y_val': # validation labels 35 | } 36 | model = MyAwesomeModel(hidden_size=100, reg=10) 37 | solver = Solver(model, data, 38 | update_rule='sgd', 39 | optim_config={ 40 | 'learning_rate': 1e-3, 41 | }, 42 | lr_decay=0.95, 43 | num_epochs=10, batch_size=100, 44 | print_every=100) 45 | solver.train() 46 | 47 | 48 | A Solver works on a model object that must conform to the following API: 49 | 50 | - model.params must be a dictionary mapping string parameter names to numpy 51 | arrays containing parameter values. 52 | 53 | - model.loss(X, y) must be a function that computes training-time loss and 54 | gradients, and test-time classification scores, with the following inputs 55 | and outputs: 56 | 57 | Inputs: 58 | - X: Array giving a minibatch of input data of shape (N, d_1, ..., d_k) 59 | - y: Array of labels, of shape (N,) giving labels for X where y[i] is the 60 | label for X[i]. 61 | 62 | Returns: 63 | If y is None, run a test-time forward pass and return: 64 | - scores: Array of shape (N, C) giving classification scores for X where 65 | scores[i, c] gives the score of class c for X[i]. 66 | 67 | If y is not None, run a training time forward and backward pass and return 68 | a tuple of: 69 | - loss: Scalar giving the loss 70 | - grads: Dictionary with the same keys as self.params mapping parameter 71 | names to gradients of the loss with respect to those parameters. 72 | """ 73 | 74 | def __init__(self, model, data, **kwargs): 75 | """ 76 | Construct a new Solver instance. 77 | 78 | Required arguments: 79 | - model: A model object conforming to the API described above 80 | - data: A dictionary of training and validation data with the following: 81 | 'X_train': Array of shape (N_train, d_1, ..., d_k) giving training images 82 | 'X_val': Array of shape (N_val, d_1, ..., d_k) giving validation images 83 | 'y_train': Array of shape (N_train,) giving labels for training images 84 | 'y_val': Array of shape (N_val,) giving labels for validation images 85 | 86 | Optional arguments: 87 | - update_rule: A string giving the name of an update rule in optim.py. 88 | Default is 'sgd'. 89 | - optim_config: A dictionary containing hyperparameters that will be 90 | passed to the chosen update rule. Each update rule requires different 91 | hyperparameters (see optim.py) but all update rules require a 92 | 'learning_rate' parameter so that should always be present. 93 | - lr_decay: A scalar for learning rate decay; after each epoch the learning 94 | rate is multiplied by this value. 95 | - batch_size: Size of minibatches used to compute loss and gradient during 96 | training. 97 | - num_epochs: The number of epochs to run for during training. 98 | - print_every: Integer; training losses will be printed every print_every 99 | iterations. 100 | - verbose: Boolean; if set to false then no output will be printed during 101 | training. 
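Raises:
- ValueError: If an unrecognized keyword argument is passed in, or if
  update_rule does not name a function defined in optim.py (both checks
  happen in the constructor body below).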
102 | """ 103 | self.model = model 104 | self.X_train = data['X_train'] 105 | self.y_train = data['y_train'] 106 | self.X_val = data['X_val'] 107 | self.y_val = data['y_val'] 108 | 109 | # Unpack keyword arguments 110 | self.update_rule = kwargs.pop('update_rule', 'sgd') 111 | self.optim_config = kwargs.pop('optim_config', {}) 112 | self.lr_decay = kwargs.pop('lr_decay', 1.0) 113 | self.batch_size = kwargs.pop('batch_size', 100) 114 | self.num_epochs = kwargs.pop('num_epochs', 10) 115 | 116 | self.print_every = kwargs.pop('print_every', 10) 117 | self.verbose = kwargs.pop('verbose', True) 118 | 119 | # Throw an error if there are extra keyword arguments 120 | if len(kwargs) > 0: 121 | extra = ', '.join('"%s"' % k for k in kwargs.keys()) 122 | raise ValueError('Unrecognized arguments %s' % extra) 123 | 124 | # Make sure the update rule exists, then replace the string 125 | # name with the actual function 126 | if not hasattr(optim, self.update_rule): 127 | raise ValueError('Invalid update_rule "%s"' % self.update_rule) 128 | self.update_rule = getattr(optim, self.update_rule) 129 | 130 | self._reset() 131 | 132 | 133 | def _reset(self): 134 | """ 135 | Set up some book-keeping variables for optimization. Don't call this 136 | manually. 137 | """ 138 | # Set up some variables for book-keeping 139 | self.epoch = 0 140 | self.best_val_acc = 0 141 | self.best_params = {} 142 | self.loss_history = [] 143 | self.train_acc_history = [] 144 | self.val_acc_history = [] 145 | 146 | # Make a deep copy of the optim_config for each parameter 147 | self.optim_configs = {} 148 | for p in self.model.params: 149 | d = {k: v for k, v in self.optim_config.iteritems()} 150 | self.optim_configs[p] = d 151 | 152 | 153 | def _step(self): 154 | """ 155 | Make a single gradient update. This is called by train() and should not 156 | be called manually. 157 | """ 158 | # Make a minibatch of training data 159 | num_train = self.X_train.shape[0] 160 | batch_mask = np.random.choice(num_train, self.batch_size) 161 | X_batch = self.X_train[batch_mask] 162 | y_batch = self.y_train[batch_mask] 163 | 164 | # Compute loss and gradient 165 | loss, grads = self.model.loss(X_batch, y_batch) 166 | self.loss_history.append(loss) 167 | 168 | # Perform a parameter update 169 | for p, w in self.model.params.iteritems(): 170 | dw = grads[p] 171 | config = self.optim_configs[p] 172 | next_w, next_config = self.update_rule(w, dw, config) 173 | self.model.params[p] = next_w 174 | self.optim_configs[p] = next_config 175 | 176 | 177 | def check_accuracy(self, X, y, num_samples=None, batch_size=100): 178 | """ 179 | Check accuracy of the model on the provided data. 180 | 181 | Inputs: 182 | - X: Array of data, of shape (N, d_1, ..., d_k) 183 | - y: Array of labels, of shape (N,) 184 | - num_samples: If not None, subsample the data and only test the model 185 | on num_samples datapoints. 186 | - batch_size: Split X and y into batches of this size to avoid using too 187 | much memory. 188 | 189 | Returns: 190 | - acc: Scalar giving the fraction of instances that were correctly 191 | classified by the model. 
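For example, train() below calls this method as
check_accuracy(self.X_train, self.y_train, num_samples=1000) so that the
per-epoch training-accuracy estimate stays cheap even on large training sets.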
192 | """ 193 | 194 | # Maybe subsample the data 195 | N = X.shape[0] 196 | if num_samples is not None and N > num_samples: 197 | mask = np.random.choice(N, num_samples) 198 | N = num_samples 199 | X = X[mask] 200 | y = y[mask] 201 | 202 | # Compute predictions in batches 203 | num_batches = N / batch_size 204 | if N % batch_size != 0: 205 | num_batches += 1 206 | y_pred = [] 207 | for i in xrange(num_batches): 208 | start = i * batch_size 209 | end = (i + 1) * batch_size 210 | scores = self.model.loss(X[start:end]) 211 | y_pred.append(np.argmax(scores, axis=1)) 212 | y_pred = np.hstack(y_pred) 213 | acc = np.mean(y_pred == y) 214 | 215 | return acc 216 | 217 | 218 | def train(self): 219 | """ 220 | Run optimization to train the model. 221 | """ 222 | num_train = self.X_train.shape[0] 223 | iterations_per_epoch = max(num_train / self.batch_size, 1) 224 | num_iterations = self.num_epochs * iterations_per_epoch 225 | print 'num_train=', num_train 226 | print 'iterations_per_epoch=', iterations_per_epoch 227 | print 'num_iterations=', num_iterations 228 | 229 | for t in xrange(num_iterations): 230 | self._step() 231 | 232 | # Maybe print training loss 233 | if self.verbose and t % self.print_every == 0: 234 | print '(Iteration %d / %d) loss: %f' % ( 235 | t + 1, num_iterations, self.loss_history[-1]) 236 | 237 | # At the end of every epoch, increment the epoch counter and decay the 238 | # learning rate. 239 | epoch_end = (t + 1) % iterations_per_epoch == 0 240 | if epoch_end: 241 | self.epoch += 1 242 | for k in self.optim_configs: 243 | self.optim_configs[k]['learning_rate'] *= self.lr_decay 244 | 245 | # Check train and val accuracy on the first iteration, the last 246 | # iteration, and at the end of each epoch. 247 | first_it = (t == 0) 248 | last_it = (t == num_iterations - 1) 249 | if first_it or last_it or epoch_end: 250 | train_acc = self.check_accuracy(self.X_train, self.y_train, 251 | num_samples=1000) 252 | val_acc = self.check_accuracy(self.X_val, self.y_val) 253 | self.train_acc_history.append(train_acc) 254 | self.val_acc_history.append(val_acc) 255 | 256 | if self.verbose: 257 | print '(Epoch %d / %d) train acc: %f; val_acc: %f' % ( 258 | self.epoch, self.num_epochs, train_acc, val_acc) 259 | 260 | # Keep track of the best model 261 | if val_acc > self.best_val_acc: 262 | self.best_val_acc = val_acc 263 | self.best_params = {} 264 | for k, v in self.model.params.iteritems(): 265 | self.best_params[k] = v.copy() 266 | 267 | # At the end of training swap the best params into the model 268 | self.model.params = self.best_params 269 | 270 | -------------------------------------------------------------------------------- /assignment2/cs231n/vis_utils.py: -------------------------------------------------------------------------------- 1 | from math import sqrt, ceil 2 | import numpy as np 3 | 4 | def visualize_grid(Xs, ubound=255.0, padding=1): 5 | """ 6 | Reshape a 4D tensor of image data to a grid for easy visualization. 
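Images are packed into a roughly square grid and each image is rescaled to
[0, ubound] independently. A sketch of typical use (assuming matplotlib is
imported as plt and Xs holds image data as described below):

    plt.imshow(visualize_grid(Xs, ubound=255.0, padding=3).astype('uint8'))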
7 | 8 | Inputs: 9 | - Xs: Data of shape (N, H, W, C) 10 | - ubound: Output grid will have values scaled to the range [0, ubound] 11 | - padding: The number of blank pixels between elements of the grid 12 | """ 13 | (N, H, W, C) = Xs.shape 14 | grid_size = int(ceil(sqrt(N))) 15 | grid_height = H * grid_size + padding * (grid_size - 1) 16 | grid_width = W * grid_size + padding * (grid_size - 1) 17 | grid = np.zeros((grid_height, grid_width, C)) 18 | next_idx = 0 19 | y0, y1 = 0, H 20 | for y in xrange(grid_size): 21 | x0, x1 = 0, W 22 | for x in xrange(grid_size): 23 | if next_idx < N: 24 | img = Xs[next_idx] 25 | low, high = np.min(img), np.max(img) 26 | grid[y0:y1, x0:x1] = ubound * (img - low) / (high - low) 27 | # grid[y0:y1, x0:x1] = Xs[next_idx] 28 | next_idx += 1 29 | x0 += W + padding 30 | x1 += W + padding 31 | y0 += H + padding 32 | y1 += H + padding 33 | # grid_max = np.max(grid) 34 | # grid_min = np.min(grid) 35 | # grid = ubound * (grid - grid_min) / (grid_max - grid_min) 36 | return grid 37 | 38 | def vis_grid(Xs): 39 | """ visualize a grid of images """ 40 | (N, H, W, C) = Xs.shape 41 | A = int(ceil(sqrt(N))) 42 | G = np.ones((A*H+A, A*W+A, C), Xs.dtype) 43 | G *= np.min(Xs) 44 | n = 0 45 | for y in range(A): 46 | for x in range(A): 47 | if n < N: 48 | G[y*H+y:(y+1)*H+y, x*W+x:(x+1)*W+x, :] = Xs[n,:,:,:] 49 | n += 1 50 | # normalize to [0,1] 51 | maxg = G.max() 52 | ming = G.min() 53 | G = (G - ming)/(maxg-ming) 54 | return G 55 | 56 | def vis_nn(rows): 57 | """ visualize array of arrays of images """ 58 | N = len(rows) 59 | D = len(rows[0]) 60 | H,W,C = rows[0][0].shape 61 | Xs = rows[0][0] 62 | G = np.ones((N*H+N, D*W+D, C), Xs.dtype) 63 | for y in range(N): 64 | for x in range(D): 65 | G[y*H+y:(y+1)*H+y, x*W+x:(x+1)*W+x, :] = rows[y][x] 66 | # normalize to [0,1] 67 | maxg = G.max() 68 | ming = G.min() 69 | G = (G - ming)/(maxg-ming) 70 | return G 71 | 72 | 73 | 74 | -------------------------------------------------------------------------------- /assignment2/frameworkpython: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | # what real Python executable to use 4 | PYVER=2.7 5 | PATHTOPYTHON=/usr/local/bin/ 6 | PYTHON=${PATHTOPYTHON}python${PYVER} 7 | 8 | # find the root of the virtualenv, it should be the parent of the dir this script is in 9 | ENV=`$PYTHON -c "import os; print os.path.abspath(os.path.join(os.path.dirname(\"$0\"), '..'))"` 10 | 11 | # now run Python with the virtualenv set as Python's HOME 12 | export PYTHONHOME=$ENV 13 | exec $PYTHON "$@" 14 | -------------------------------------------------------------------------------- /assignment2/kitten.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ShibiHe/Stanford-CS231n-assignments/4590c50075a2e4aa291777b4db45d04b877dcdf6/assignment2/kitten.jpg -------------------------------------------------------------------------------- /assignment2/puppy.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ShibiHe/Stanford-CS231n-assignments/4590c50075a2e4aa291777b4db45d04b877dcdf6/assignment2/puppy.jpg -------------------------------------------------------------------------------- /assignment2/requirements.txt: -------------------------------------------------------------------------------- 1 | Cython==0.23.4 2 | Jinja2==2.8 3 | MarkupSafe==0.23 4 | Pillow==3.0.0 5 | Pygments==2.0.2 6 | appnope==0.1.0 7 | argparse==1.2.1 8 | 
backports-abc==0.4 9 | backports.ssl-match-hostname==3.5.0.1 10 | certifi==2015.11.20.1 11 | cycler==0.9.0 12 | decorator==4.0.6 13 | functools32==3.2.3-2 14 | gnureadline==6.3.3 15 | ipykernel==4.2.2 16 | ipython==4.0.1 17 | ipython-genutils==0.1.0 18 | ipywidgets==4.1.1 19 | jsonschema==2.5.1 20 | jupyter==1.0.0 21 | jupyter-client==4.1.1 22 | jupyter-console==4.0.3 23 | jupyter-core==4.0.6 24 | matplotlib==1.5.0 25 | mistune==0.7.1 26 | nbconvert==4.1.0 27 | nbformat==4.0.1 28 | notebook==4.0.6 29 | numpy==1.10.4 30 | path.py==8.1.2 31 | pexpect==4.0.1 32 | pickleshare==0.5 33 | ptyprocess==0.5 34 | pyparsing==2.0.7 35 | python-dateutil==2.4.2 36 | pytz==2015.7 37 | pyzmq==15.1.0 38 | qtconsole==4.1.1 39 | scipy==0.16.1 40 | simplegeneric==0.8.1 41 | singledispatch==3.4.0.3 42 | six==1.10.0 43 | terminado==0.5 44 | tornado==4.3 45 | traitlets==4.0.0 46 | wsgiref==0.1.2 47 | -------------------------------------------------------------------------------- /assignment2/start_ipython_osx.sh: -------------------------------------------------------------------------------- 1 | # Assume the virtualenv is called venv 2 | 3 | cp frameworkpython venv/bin 4 | venv/bin/frameworkpython -m IPython notebook 5 | -------------------------------------------------------------------------------- /assignment3/.gitignore: -------------------------------------------------------------------------------- 1 | *.swp 2 | *.pyc 3 | .env/* 4 | -------------------------------------------------------------------------------- /assignment3/.ipynb_checkpoints/ImageGeneration-checkpoint.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Image Generation\n", 8 | "In this notebook we will continue our exploration of image gradients using the deep model that was pretrained on TinyImageNet. We will explore various ways of using these image gradients to generate images. We will implement class visualizations, feature inversion, and DeepDream." 9 | ] 10 | }, 11 | { 12 | "cell_type": "code", 13 | "execution_count": 1, 14 | "metadata": { 15 | "collapsed": false 16 | }, 17 | "outputs": [], 18 | "source": [ 19 | "# As usual, a bit of setup\n", 20 | "\n", 21 | "import time, os, json\n", 22 | "import numpy as np\n", 23 | "from scipy.misc import imread, imresize\n", 24 | "import matplotlib.pyplot as plt\n", 25 | "\n", 26 | "from cs231n.classifiers.pretrained_cnn import PretrainedCNN\n", 27 | "from cs231n.data_utils import load_tiny_imagenet\n", 28 | "from cs231n.image_utils import blur_image, deprocess_image, preprocess_image\n", 29 | "\n", 30 | "%matplotlib inline\n", 31 | "plt.rcParams['figure.figsize'] = (10.0, 8.0) # set default size of plots\n", 32 | "plt.rcParams['image.interpolation'] = 'nearest'\n", 33 | "plt.rcParams['image.cmap'] = 'gray'\n", 34 | "\n", 35 | "# for auto-reloading external modules\n", 36 | "# see http://stackoverflow.com/questions/1907993/autoreload-of-modules-in-ipython\n", 37 | "%load_ext autoreload\n", 38 | "%autoreload 2" 39 | ] 40 | }, 41 | { 42 | "cell_type": "markdown", 43 | "metadata": {}, 44 | "source": [ 45 | "# TinyImageNet and pretrained model\n", 46 | "As in the previous notebook, load the TinyImageNet dataset and the pretrained model."
47 | ] 48 | }, 49 | { 50 | "cell_type": "code", 51 | "execution_count": 2, 52 | "metadata": { 53 | "collapsed": false 54 | }, 55 | "outputs": [ 56 | { 57 | "name": "stdout", 58 | "output_type": "stream", 59 | "text": [ 60 | "loading training data for synset 20 / 100\n", 61 | "loading training data for synset 40 / 100\n", 62 | "loading training data for synset 60 / 100\n", 63 | "loading training data for synset 80 / 100\n", 64 | "loading training data for synset 100 / 100\n" 65 | ] 66 | } 67 | ], 68 | "source": [ 69 | "data = load_tiny_imagenet('cs231n/datasets/tiny-imagenet-100-A', subtract_mean=True)\n", 70 | "model = PretrainedCNN(h5_file='cs231n/datasets/pretrained_model.h5')" 71 | ] 72 | }, 73 | { 74 | "cell_type": "markdown", 75 | "metadata": {}, 76 | "source": [ 77 | "# Class visualization\n", 78 | "By starting with a random noise image and performing gradient ascent on a target class, we can generate an image that the network will recognize as the target class. This idea was first presented in [1]; [2] extended this idea by suggesting several regularization techniques that can improve the quality of the generated image.\n", 79 | "\n", 80 | "Concretely, let $I$ be an image and let $y$ be a target class. Let $s_y(I)$ be the score that a convolutional network assigns to the image $I$ for class $y$; note that these are raw unnormalized scores, not class probabilities. We wish to generate an image $I^*$ that achieves a high score for the class $y$ by solving the problem\n", 81 | "\n", 82 | "$$\n", 83 | "I^* = \\arg\\max_I s_y(I) - R(I)\n", 84 | "$$\n", 85 | "\n", 86 | "where $R$ is a (possibly implicit) regularizer. We can solve this optimization problem using gradient ascent, computing gradients with respect to the generated image. We will use (explicit) L2 regularization of the form\n", 87 | "\n", 88 | "$$\n", 89 | "R(I) = \\lambda \\|I\\|_2^2\n", 90 | "$$\n", 91 | "\n", 92 | "and implicit regularization, as suggested by [2], by periodically blurring the generated image.\n", 93 | "\n", 94 | "In the cell below, complete the implementation of the `create_class_visualization` function.\n", 95 | "\n", 96 | "[1] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. \
\"Deep Inside Convolutional Networks: Visualising\n", 97 | "Image Classification Models and Saliency Maps\", ICLR Workshop 2014.\n", 98 | "\n", 99 | "[2] Yosinski et al, \"Understanding Neural Networks Through Deep Visualization\", ICML 2015 Deep Learning Workshop" 100 | ] 101 | }, 102 | { 103 | "cell_type": "code", 104 | "execution_count": null, 105 | "metadata": { 106 | "collapsed": true 107 | }, 108 | "outputs": [], 109 | "source": [ 110 | "def create_class_visualization(target_y, model, **kwargs):\n", 111 | " \"\"\"\n", 112 | " Perform optimization over the image to generate class visualizations.\n", 113 | " \n", 114 | " Inputs:\n", 115 | " - target_y: Integer in the range [0, 100) giving the target class\n", 116 | " - model: A PretrainedCNN that will be used for generation\n", 117 | " \n", 118 | " Keyword arguments:\n", 119 | " - learning_rate: Floating point number giving the learning rate\n", 120 | " - blur_every: An integer; how often to blur the image as a regularizer\n", 121 | " - l2_reg: Floating point number giving L2 regularization strength on the image;\n", 122 | " this is lambda in the equation above.\n", 123 | " - max_jitter: How much random jitter to add to the image as regularization\n", 124 | " - num_iterations: How many iterations to run for\n", 125 | " - show_every: How often to show the image\n", 126 | " \"\"\"\n", 127 | " \n", 128 | " learning_rate = kwargs.pop('learning_rate', 10000)\n", 129 | " blur_every = kwargs.pop('blur_every', 1)\n", 130 | " l2_reg = kwargs.pop('l2_reg', 1e-6)\n", 131 | " max_jitter = kwargs.pop('max_jitter', 4)\n", 132 | " num_iterations = kwargs.pop('num_iterations', 100)\n", 133 | " show_every = kwargs.pop('show_every', 25)\n", 134 | " \n", 135 | " X = np.random.randn(1, 3, 64, 64)\n", 136 | " for t in xrange(num_iterations):\n", 137 | " # As a regularizer, add random jitter to the image\n", 138 | " ox, oy = np.random.randint(-max_jitter, max_jitter+1, 2)\n", 139 | " X = np.roll(np.roll(X, ox, -1), oy, -2)\n", 140 | "\n", 141 | " dX = None\n", 142 | " ############################################################################\n", 143 | " # TODO: Compute the image gradient dX of the image with respect to the #\n", 144 | " # target_y class score. This should be similar to the fooling images. Also #\n", 145 | " # add L2 regularization to dX and update the image X using the image #\n", 146 | " # gradient and the learning rate. 
#\n", 147 | "    ############################################################################\n", 148 | "    y = np.array([target_y])\n", 149 | "    v = 0\n", 150 | "    mu = 0.95\n", 151 | "    lr0 = 1000\n", 152 | "    k = 0.02\n", 153 | "    \n", 154 | "    for i in range(1000):\n", 155 | "      loss, y_out, dX = model.calc_loss(X, y)\n", 156 | "      \n", 157 | "      lr = lr0*np.exp(-k*i)\n", 158 | "      v = mu * v - lr * dX\n", 159 | "      X += v\n", 160 | "      print i, 'lr=', lr, y_out, y, 'loss=', loss\n", 161 | "      if y_out == y:\n", 162 | "        break\n", 163 | "    ############################################################################\n", 164 | "    #                             END OF YOUR CODE                             #\n", 165 | "    ############################################################################\n", 166 | "    \n", 167 | "    # Undo the jitter\n", 168 | "    X = np.roll(np.roll(X, -ox, -1), -oy, -2)\n", 169 | "    \n", 170 | "    # As a regularizer, clip the image\n", 171 | "    X = np.clip(X, -data['mean_image'], 255.0 - data['mean_image'])\n", 172 | "    \n", 173 | "    # As a regularizer, periodically blur the image\n", 174 | "    if t % blur_every == 0:\n", 175 | "      X = blur_image(X)\n", 176 | "    \n", 177 | "    # Periodically show the image\n", 178 | "    if t % show_every == 0:\n", 179 | "      plt.imshow(deprocess_image(X, data['mean_image']))\n", 180 | "      plt.gcf().set_size_inches(3, 3)\n", 181 | "      plt.axis('off')\n", 182 | "      plt.show()\n", 183 | "  return X" 184 | ] 185 | }, 186 | { 187 | "cell_type": "markdown", 188 | "metadata": {}, 189 | "source": [ 190 | "You can use the code above to generate some cool images! An example is shown below. Try to generate a cool-looking image. If you want you can try to implement the other regularization schemes from Yosinski et al, but it isn't required." 191 | ] 192 | }, 193 | { 194 | "cell_type": "code", 195 | "execution_count": null, 196 | "metadata": { 197 | "collapsed": false 198 | }, 199 | "outputs": [], 200 | "source": [ 201 | "target_y = 43 # Tarantula\n", 202 | "print data['class_names'][target_y]\n", 203 | "X = create_class_visualization(target_y, model, show_every=25)" 204 | ] 205 | }, 206 | { 207 | "cell_type": "markdown", 208 | "metadata": {}, 209 | "source": [ 210 | "# Feature Inversion\n", 211 | "In an attempt to understand the types of features that convolutional networks learn to recognize, a recent paper [1] attempts to reconstruct an image from its feature representation. We can easily implement this idea using image gradients from the pretrained network.\n", 212 | "\n", 213 | "Concretely, given an image $I$, let $\\phi_\\ell(I)$ be the activations at layer $\\ell$ of the convolutional network $\\phi$. We wish to find an image $I^*$ with a similar feature representation as $I$ at layer $\\ell$ of the network $\\phi$ by solving the optimization problem\n", 214 | "\n", 215 | "$$\n", 216 | "I^* = \\arg\\min_{I'} \\|\\phi_\\ell(I) - \\phi_\\ell(I')\\|_2^2 + R(I')\n", 217 | "$$\n", 218 | "\n", 219 | "where $\\|\\cdot\\|_2^2$ is the squared Euclidean norm. As above, $R$ is a (possibly implicit) regularizer. We can solve this optimization problem using gradient descent, computing gradients with respect to the generated image. 
We will use (explicit) L2 regularization of the form\n", 220 | "\n", 221 | "$$\n", 222 | "R(I') = \\lambda \\|I'\\|_2^2\n", 223 | "$$\n", 224 | "\n", 225 | "together with implicit regularization by periodically blurring the image, as recommended by [2].\n", 226 | "\n", 227 | "Implement this method in the function below.\n", 228 | "\n", 229 | "[1] Aravindh Mahendran, Andrea Vedaldi, \"Understanding Deep Image Representations by Inverting them\", CVPR 2015\n", 230 | "\n", 231 | "[2] Yosinski et al, \"Understanding Neural Networks Through Deep Visualization\", ICML 2015 Deep Learning Workshop" 232 | ] 233 | }, 234 | { 235 | "cell_type": "code", 236 | "execution_count": null, 237 | "metadata": { 238 | "collapsed": false 239 | }, 240 | "outputs": [], 241 | "source": [ 242 | "def invert_features(target_feats, layer, model, **kwargs):\n", 243 | "  \"\"\"\n", 244 | "  Perform feature inversion in the style of Mahendran and Vedaldi 2015, using\n", 245 | "  L2 regularization and periodic blurring.\n", 246 | "  \n", 247 | "  Inputs:\n", 248 | "  - target_feats: Image features of the target image, of shape (1, C, H, W);\n", 249 | "    we will try to generate an image that matches these features\n", 250 | "  - layer: The index of the layer from which the features were extracted\n", 251 | "  - model: A PretrainedCNN that was used to extract features\n", 252 | "  \n", 253 | "  Keyword arguments:\n", 254 | "  - learning_rate: The learning rate to use for gradient descent\n", 255 | "  - num_iterations: The number of iterations to use for gradient descent\n", 256 | "  - l2_reg: The strength of L2 regularization to use; this is lambda in the\n", 257 | "    equation above.\n", 258 | "  - blur_every: How often to blur the image as implicit regularization; set\n", 259 | "    to 0 to disable blurring.\n", 260 | "  - show_every: How often to show the generated image; set to 0 to disable\n", 261 | "    showing intermediate results.\n", 262 | "  \n", 263 | "  Returns:\n", 264 | "  - X: Generated image of shape (1, 3, 64, 64) that matches the target features.\n", 265 | "  \"\"\"\n", 266 | "  learning_rate = kwargs.pop('learning_rate', 10000)\n", 267 | "  num_iterations = kwargs.pop('num_iterations', 500)\n", 268 | "  l2_reg = kwargs.pop('l2_reg', 1e-7)\n", 269 | "  blur_every = kwargs.pop('blur_every', 1)\n", 270 | "  show_every = kwargs.pop('show_every', 50)\n", 271 | "  \n", 272 | "  X = np.random.randn(1, 3, 64, 64)\n", 273 | "  for t in xrange(num_iterations):\n", 274 | "    ############################################################################\n", 275 | "    # TODO: Compute the image gradient dX of the reconstruction loss with      #\n", 276 | "    # respect to the image. You should include L2 regularization penalizing    #\n", 277 | "    # large pixel values in the generated image using the l2_reg parameter;    #\n", 278 | "    # then update the generated image using the learning_rate from above. 
#\n", 279 | " ############################################################################\n", 280 | " pass\n", 281 | " ############################################################################\n", 282 | " # END OF YOUR CODE #\n", 283 | " ############################################################################\n", 284 | " \n", 285 | " # As a regularizer, clip the image\n", 286 | " X = np.clip(X, -data['mean_image'], 255.0 - data['mean_image'])\n", 287 | " \n", 288 | " # As a regularizer, periodically blur the image\n", 289 | " if (blur_every > 0) and t % blur_every == 0:\n", 290 | " X = blur_image(X)\n", 291 | "\n", 292 | " if (show_every > 0) and (t % show_every == 0 or t + 1 == num_iterations):\n", 293 | " plt.imshow(deprocess_image(X, data['mean_image']))\n", 294 | " plt.gcf().set_size_inches(3, 3)\n", 295 | " plt.axis('off')\n", 296 | " plt.title('t = %d' % t)\n", 297 | " plt.show()" 298 | ] 299 | }, 300 | { 301 | "cell_type": "markdown", 302 | "metadata": {}, 303 | "source": [ 304 | "### Shallow feature reconstruction\n", 305 | "After implementing the feature inversion above, run the following cell to try and reconstruct features from the fourth convolutional layer of the pretrained model. You should be able to reconstruct the features using the provided optimization parameters." 306 | ] 307 | }, 308 | { 309 | "cell_type": "code", 310 | "execution_count": null, 311 | "metadata": { 312 | "collapsed": false, 313 | "scrolled": false 314 | }, 315 | "outputs": [], 316 | "source": [ 317 | "filename = 'kitten.jpg'\n", 318 | "layer = 3 # layers start from 0 so these are features after 4 convolutions\n", 319 | "img = imresize(imread(filename), (64, 64))\n", 320 | "\n", 321 | "plt.imshow(img)\n", 322 | "plt.gcf().set_size_inches(3, 3)\n", 323 | "plt.title('Original image')\n", 324 | "plt.axis('off')\n", 325 | "plt.show()\n", 326 | "\n", 327 | "# Preprocess the image before passing it to the network:\n", 328 | "# subtract the mean, add a dimension, etc\n", 329 | "img_pre = preprocess_image(img, data['mean_image'])\n", 330 | "\n", 331 | "# Extract features from the image\n", 332 | "feats, _ = model.forward(img_pre, end=layer)\n", 333 | "\n", 334 | "# Invert the features\n", 335 | "kwargs = {\n", 336 | " 'num_iterations': 400,\n", 337 | " 'learning_rate': 5000,\n", 338 | " 'l2_reg': 1e-8,\n", 339 | " 'show_every': 100,\n", 340 | " 'blur_every': 10,\n", 341 | "}\n", 342 | "X = invert_features(feats, layer, model, **kwargs)" 343 | ] 344 | }, 345 | { 346 | "cell_type": "markdown", 347 | "metadata": {}, 348 | "source": [ 349 | "### Deep feature reconstruction\n", 350 | "Reconstructing images using features from deeper layers of the network tends to give interesting results. In the cell below, try to reconstruct the best image you can by inverting the features after 7 layers of convolutions. You will need to play with the hyperparameters to try and get a good result.\n", 351 | "\n", 352 | "HINT: If you read the paper by Mahendran and Vedaldi, you'll see that reconstructions from deep features tend not to look much like the original image, so you shouldn't expect the results to look like the reconstruction above. You should be able to get an image that shows some discernable structure within 1000 iterations." 
353 | ] 354 | }, 355 | { 356 | "cell_type": "code", 357 | "execution_count": null, 358 | "metadata": { 359 | "collapsed": false 360 | }, 361 | "outputs": [], 362 | "source": [ 363 | "filename = 'kitten.jpg'\n", 364 | "layer = 6 # layers start from 0 so these are features after 7 convolutions\n", 365 | "img = imresize(imread(filename), (64, 64))\n", 366 | "\n", 367 | "plt.imshow(img)\n", 368 | "plt.gcf().set_size_inches(3, 3)\n", 369 | "plt.title('Original image')\n", 370 | "plt.axis('off')\n", 371 | "plt.show()\n", 372 | "\n", 373 | "# Preprocess the image before passing it to the network:\n", 374 | "# subtract the mean, add a dimension, etc\n", 375 | "img_pre = preprocess_image(img, data['mean_image'])\n", 376 | "\n", 377 | "# Extract features from the image\n", 378 | "feats, _ = model.forward(img_pre, end=layer)\n", 379 | "\n", 380 | "# Invert the features\n", 381 | "# You will need to play with these parameters.\n", 382 | "kwargs = {\n", 383 | " 'num_iterations': 1000,\n", 384 | " 'learning_rate': 0,\n", 385 | " 'l2_reg': 0,\n", 386 | " 'show_every': 100,\n", 387 | " 'blur_every': 0,\n", 388 | "}\n", 389 | "X = invert_features(feats, layer, model, **kwargs)" 390 | ] 391 | }, 392 | { 393 | "cell_type": "markdown", 394 | "metadata": {}, 395 | "source": [ 396 | "# DeepDream\n", 397 | "In the summer of 2015, Google released a [blog post](http://googleresearch.blogspot.com/2015/06/inceptionism-going-deeper-into-neural.html) describing a new method of generating images from neural networks, and they later [released code](https://github.com/google/deepdream) to generate these images.\n", 398 | "\n", 399 | "The idea is very simple. We pick some layer from the network, pass the starting image through the network to extract features at the chosen layer, set the gradient at that layer equal to the activations themselves, and then backpropagate to the image. This has the effect of modifying the image to amplify the activations at the chosen layer of the network.\n", 400 | "\n", 401 | "For DeepDream we usually extract features from one of the convolutional layers, allowing us to generate images of any resolution.\n", 402 | "\n", 403 | "We can implement this idea using our pretrained network. The results probably won't look as good as Google's since their network is much bigger, but we should still be able to generate some interesting images." 
404 | ] 405 | }, 406 | { 407 | "cell_type": "code", 408 | "execution_count": null, 409 | "metadata": { 410 | "collapsed": false 411 | }, 412 | "outputs": [], 413 | "source": [ 414 | "def deepdream(X, layer, model, **kwargs):\n", 415 | "  \"\"\"\n", 416 | "  Generate a DeepDream image.\n", 417 | "  \n", 418 | "  Inputs:\n", 419 | "  - X: Starting image, of shape (1, 3, H, W)\n", 420 | "  - layer: Index of layer at which to dream\n", 421 | "  - model: A PretrainedCNN object\n", 422 | "  \n", 423 | "  Keyword arguments:\n", 424 | "  - learning_rate: How much to update the image at each iteration\n", 425 | "  - max_jitter: Maximum number of pixels for jitter regularization\n", 426 | "  - num_iterations: How many iterations to run for\n", 427 | "  - show_every: How often to show the generated image\n", 428 | "  \"\"\"\n", 429 | "  \n", 430 | "  X = X.copy()\n", 431 | "  \n", 432 | "  learning_rate = kwargs.pop('learning_rate', 5.0)\n", 433 | "  max_jitter = kwargs.pop('max_jitter', 16)\n", 434 | "  num_iterations = kwargs.pop('num_iterations', 100)\n", 435 | "  show_every = kwargs.pop('show_every', 25)\n", 436 | "  \n", 437 | "  for t in xrange(num_iterations):\n", 438 | "    # As a regularizer, add random jitter to the image\n", 439 | "    ox, oy = np.random.randint(-max_jitter, max_jitter+1, 2)\n", 440 | "    X = np.roll(np.roll(X, ox, -1), oy, -2)\n", 441 | "\n", 442 | "    dX = None\n", 443 | "    ############################################################################\n", 444 | "    # TODO: Compute the image gradient dX using the DeepDream method. You'll   #\n", 445 | "    # need to use the forward and backward methods of the model object to      #\n", 446 | "    # extract activations and set gradients for the chosen layer. After        #\n", 447 | "    # computing the image gradient dX, you should use the learning rate to     #\n", 448 | "    # update the image X.                                                      #\n", 449 | "    ############################################################################\n", 450 | "    pass\n", 451 | "    ############################################################################\n", 452 | "    #                             END OF YOUR CODE                             #\n", 453 | "    ############################################################################\n", 454 | "    \n", 455 | "    # Undo the jitter\n", 456 | "    X = np.roll(np.roll(X, -ox, -1), -oy, -2)\n", 457 | "    \n", 458 | "    # As a regularizer, clip the image\n", 459 | "    mean_pixel = data['mean_image'].mean(axis=(1, 2), keepdims=True)\n", 460 | "    X = np.clip(X, -mean_pixel, 255.0 - mean_pixel)\n", 461 | "    \n", 462 | "    # Periodically show the image\n", 463 | "    if t == 0 or (t + 1) % show_every == 0:\n", 464 | "      img = deprocess_image(X, data['mean_image'], mean='pixel')\n", 465 | "      plt.imshow(img)\n", 466 | "      plt.title('t = %d' % (t + 1))\n", 467 | "      plt.gcf().set_size_inches(8, 8)\n", 468 | "      plt.axis('off')\n", 469 | "      plt.show()\n", 470 | "  return X" 471 | ] 472 | }, 473 | { 474 | "cell_type": "markdown", 475 | "metadata": {}, 476 | "source": [ 477 | "# Generate some images!\n", 478 | "Try to generate a cool-looking DeepDream image using the pretrained network. You can try using different layers, or starting from different images. You can reduce the image size if it runs too slowly on your machine, or increase the image size if you are feeling ambitious. 
479 | ] 480 | }, 481 | { 482 | "cell_type": "code", 483 | "execution_count": null, 484 | "metadata": { 485 | "collapsed": false, 486 | "scrolled": false 487 | }, 488 | "outputs": [], 489 | "source": [ 490 | "def read_image(filename, max_size):\n", 491 | "  \"\"\"\n", 492 | "  Read an image from disk and resize it so its larger side is max_size\n", 493 | "  \"\"\"\n", 494 | "  img = imread(filename)\n", 495 | "  H, W, _ = img.shape\n", 496 | "  if H >= W:\n", 497 | "    img = imresize(img, (max_size, int(W * float(max_size) / H)))\n", 498 | "  elif H < W:\n", 499 | "    img = imresize(img, (int(H * float(max_size) / W), max_size))\n", 500 | "  return img\n", 501 | "\n", 502 | "filename = 'kitten.jpg'\n", 503 | "max_size = 256\n", 504 | "img = read_image(filename, max_size)\n", 505 | "plt.imshow(img)\n", 506 | "plt.axis('off')\n", 507 | "\n", 508 | "# Preprocess the image by converting to float, transposing,\n", 509 | "# and performing mean subtraction.\n", 510 | "img_pre = preprocess_image(img, data['mean_image'], mean='pixel')\n", 511 | "\n", 512 | "out = deepdream(img_pre, 7, model, learning_rate=2000)" 513 | ] 514 | } 515 | ], 516 | "metadata": { 517 | "kernelspec": { 518 | "display_name": "Python 2", 519 | "language": "python", 520 | "name": "python2" 521 | }, 522 | "language_info": { 523 | "codemirror_mode": { 524 | "name": "ipython", 525 | "version": 2 526 | }, 527 | "file_extension": ".py", 528 | "mimetype": "text/x-python", 529 | "name": "python", 530 | "nbconvert_exporter": "python", 531 | "pygments_lexer": "ipython2", 532 | "version": "2.7.10" 533 | } 534 | }, 535 | "nbformat": 4, 536 | "nbformat_minor": 0 537 | } 538 | -------------------------------------------------------------------------------- /assignment3/collectSubmission.sh: -------------------------------------------------------------------------------- 1 | rm -f assignment3.zip 2 | zip -r assignment3.zip . -x "*.git" "*cs231n/datasets*" "*.ipynb_checkpoints*" "*README.md" "*collectSubmission.sh" "*requirements.txt" ".env/*" "*.pyc" "*cs231n/build/*" 3 | -------------------------------------------------------------------------------- /assignment3/cs231n/.gitignore: -------------------------------------------------------------------------------- 1 | build/* 2 | im2col_cython.c 3 | im2col_cython.so 4 | -------------------------------------------------------------------------------- /assignment3/cs231n/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ShibiHe/Stanford-CS231n-assignments/4590c50075a2e4aa291777b4db45d04b877dcdf6/assignment3/cs231n/__init__.py -------------------------------------------------------------------------------- /assignment3/cs231n/captioning_solver.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | 3 | from cs231n import optim 4 | from cs231n.coco_utils import sample_coco_minibatch 5 | 6 | 7 | class CaptioningSolver(object): 8 | """ 9 | A CaptioningSolver encapsulates all the logic necessary for training 10 | image captioning models. The CaptioningSolver performs stochastic gradient 11 | descent using different update rules defined in optim.py. 12 | 13 | The solver accepts both training and validation data and labels so it can 14 | periodically check classification accuracy on both training and validation 15 | data to watch out for overfitting. 
16 | 17 | To train a model, you will first construct a CaptioningSolver instance, 18 | passing the model, dataset, and various options (learning rate, batch size, 19 | etc) to the constructor. You will then call the train() method to run the 20 | optimization procedure and train the model. 21 | 22 | After the train() method returns, model.params will contain the parameters 23 | that performed best on the validation set over the course of training. 24 | In addition, the instance variable solver.loss_history will contain a list 25 | of all losses encountered during training and the instance variables 26 | solver.train_acc_history and solver.val_acc_history will be lists containing 27 | the accuracies of the model on the training and validation set at each epoch. 28 | 29 | Example usage might look something like this: 30 | 31 | data = load_coco_data() 32 | model = MyAwesomeModel(hidden_dim=100) 33 | solver = CaptioningSolver(model, data, 34 | update_rule='sgd', 35 | optim_config={ 36 | 'learning_rate': 1e-3, 37 | }, 38 | lr_decay=0.95, 39 | num_epochs=10, batch_size=100, 40 | print_every=100) 41 | solver.train() 42 | 43 | 44 | A CaptioningSolver works on a model object that must conform to the following 45 | API: 46 | 47 | - model.params must be a dictionary mapping string parameter names to numpy 48 | arrays containing parameter values. 49 | 50 | - model.loss(features, captions) must be a function that computes 51 | training-time loss and gradients, with the following inputs and outputs: 52 | 53 | Inputs: 54 | - features: Array giving a minibatch of features for images, of shape (N, D) 55 | - captions: Array of captions for those images, of shape (N, T) where 56 | each element is in the range (0, V]. 57 | 58 | Returns: 59 | - loss: Scalar giving the loss 60 | - grads: Dictionary with the same keys as self.params mapping parameter 61 | names to gradients of the loss with respect to those parameters. 62 | """ 63 | 64 | def __init__(self, model, data, **kwargs): 65 | """ 66 | Construct a new CaptioningSolver instance. 67 | 68 | Required arguments: 69 | - model: A model object conforming to the API described above 70 | - data: A dictionary of training and validation data from load_coco_data 71 | 72 | Optional arguments: 73 | - update_rule: A string giving the name of an update rule in optim.py. 74 | Default is 'sgd'. 75 | - optim_config: A dictionary containing hyperparameters that will be 76 | passed to the chosen update rule. Each update rule requires different 77 | hyperparameters (see optim.py) but all update rules require a 78 | 'learning_rate' parameter so that should always be present. 79 | - lr_decay: A scalar for learning rate decay; after each epoch the learning 80 | rate is multiplied by this value. 81 | - batch_size: Size of minibatches used to compute loss and gradient during 82 | training. 83 | - num_epochs: The number of epochs to run for during training. 84 | - print_every: Integer; training losses will be printed every print_every 85 | iterations. 86 | - verbose: Boolean; if set to false then no output will be printed during 87 | training. 
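Note: unlike the assignment 2 Solver, train() below does not currently
track the best parameters; the swap of best_params into the model at the
end of training is commented out, so after train() the model simply holds
its final parameters.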
88 | """ 89 | self.model = model 90 | self.data = data 91 | 92 | # Unpack keyword arguments 93 | self.update_rule = kwargs.pop('update_rule', 'sgd') 94 | self.optim_config = kwargs.pop('optim_config', {}) 95 | self.lr_decay = kwargs.pop('lr_decay', 1.0) 96 | self.batch_size = kwargs.pop('batch_size', 100) 97 | self.num_epochs = kwargs.pop('num_epochs', 10) 98 | 99 | self.print_every = kwargs.pop('print_every', 10) 100 | self.verbose = kwargs.pop('verbose', True) 101 | 102 | # Throw an error if there are extra keyword arguments 103 | if len(kwargs) > 0: 104 | extra = ', '.join('"%s"' % k for k in kwargs.keys()) 105 | raise ValueError('Unrecognized arguments %s' % extra) 106 | 107 | # Make sure the update rule exists, then replace the string 108 | # name with the actual function 109 | if not hasattr(optim, self.update_rule): 110 | raise ValueError('Invalid update_rule "%s"' % self.update_rule) 111 | self.update_rule = getattr(optim, self.update_rule) 112 | 113 | self._reset() 114 | 115 | 116 | def _reset(self): 117 | """ 118 | Set up some book-keeping variables for optimization. Don't call this 119 | manually. 120 | """ 121 | # Set up some variables for book-keeping 122 | self.epoch = 0 123 | self.best_val_acc = 0 124 | self.best_params = {} 125 | self.loss_history = [] 126 | self.train_acc_history = [] 127 | self.val_acc_history = [] 128 | 129 | # Make a deep copy of the optim_config for each parameter 130 | self.optim_configs = {} 131 | for p in self.model.params: 132 | d = {k: v for k, v in self.optim_config.iteritems()} 133 | self.optim_configs[p] = d 134 | 135 | 136 | def _step(self): 137 | """ 138 | Make a single gradient update. This is called by train() and should not 139 | be called manually. 140 | """ 141 | # Make a minibatch of training data 142 | minibatch = sample_coco_minibatch(self.data, 143 | batch_size=self.batch_size, 144 | split='train') 145 | captions, features, urls = minibatch 146 | 147 | # Compute loss and gradient 148 | loss, grads = self.model.loss(features, captions) 149 | self.loss_history.append(loss) 150 | 151 | # Perform a parameter update 152 | for p, w in self.model.params.iteritems(): 153 | dw = grads[p] 154 | config = self.optim_configs[p] 155 | next_w, next_config = self.update_rule(w, dw, config) 156 | self.model.params[p] = next_w 157 | self.optim_configs[p] = next_config 158 | 159 | 160 | # TODO: This does nothing right now; maybe implement BLEU? 161 | def check_accuracy(self, X, y, num_samples=None, batch_size=100): 162 | """ 163 | Check accuracy of the model on the provided data. 164 | 165 | Inputs: 166 | - X: Array of data, of shape (N, d_1, ..., d_k) 167 | - y: Array of labels, of shape (N,) 168 | - num_samples: If not None, subsample the data and only test the model 169 | on num_samples datapoints. 170 | - batch_size: Split X and y into batches of this size to avoid using too 171 | much memory. 172 | 173 | Returns: 174 | - acc: Scalar giving the fraction of instances that were correctly 175 | classified by the model. 
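Note: as implemented, this method is currently a stub; the early return of
0.0 below means the batched-prediction code after it never runs. Per the
TODO above, it may eventually be replaced with a BLEU-based check.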
176 | """ 177 | return 0.0 178 | 179 | # Maybe subsample the data 180 | N = X.shape[0] 181 | if num_samples is not None and N > num_samples: 182 | mask = np.random.choice(N, num_samples) 183 | N = num_samples 184 | X = X[mask] 185 | y = y[mask] 186 | 187 | # Compute predictions in batches 188 | num_batches = N / batch_size 189 | if N % batch_size != 0: 190 | num_batches += 1 191 | y_pred = [] 192 | for i in xrange(num_batches): 193 | start = i * batch_size 194 | end = (i + 1) * batch_size 195 | scores = self.model.loss(X[start:end]) 196 | y_pred.append(np.argmax(scores, axis=1)) 197 | y_pred = np.hstack(y_pred) 198 | acc = np.mean(y_pred == y) 199 | 200 | return acc 201 | 202 | 203 | def train(self): 204 | """ 205 | Run optimization to train the model. 206 | """ 207 | num_train = self.data['train_captions'].shape[0] 208 | iterations_per_epoch = max(num_train / self.batch_size, 1) 209 | num_iterations = self.num_epochs * iterations_per_epoch 210 | 211 | for t in xrange(num_iterations): 212 | self._step() 213 | 214 | # Maybe print training loss 215 | if self.verbose and t % self.print_every == 0: 216 | print '(Iteration %d / %d) loss: %f' % ( 217 | t + 1, num_iterations, self.loss_history[-1]) 218 | 219 | # At the end of every epoch, increment the epoch counter and decay the 220 | # learning rate. 221 | epoch_end = (t + 1) % iterations_per_epoch == 0 222 | if epoch_end: 223 | self.epoch += 1 224 | for k in self.optim_configs: 225 | self.optim_configs[k]['learning_rate'] *= self.lr_decay 226 | 227 | # Check train and val accuracy on the first iteration, the last 228 | # iteration, and at the end of each epoch. 229 | # TODO: Implement some logic to check Bleu on validation set periodically 230 | 231 | # At the end of training swap the best params into the model 232 | # self.model.params = self.best_params 233 | 234 | -------------------------------------------------------------------------------- /assignment3/cs231n/classifiers/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ShibiHe/Stanford-CS231n-assignments/4590c50075a2e4aa291777b4db45d04b877dcdf6/assignment3/cs231n/classifiers/__init__.py -------------------------------------------------------------------------------- /assignment3/cs231n/classifiers/pretrained_cnn.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | import h5py 3 | 4 | from cs231n.layers import * 5 | from cs231n.fast_layers import * 6 | from cs231n.layer_utils import * 7 | 8 | 9 | def print_tuple(x): 10 | for item in x: 11 | if type(item)==tuple: 12 | print_tuple(item) 13 | elif type(item)==np.ndarray: 14 | print ' C:', item.shape, np.sum(np.absolute(item)) 15 | else: 16 | print ' C:', item 17 | 18 | class PretrainedCNN(object): 19 | def __init__(self, dtype=np.float32, num_classes=100, input_size=64, h5_file=None): 20 | self.debug = False 21 | self.dtype = dtype 22 | self.conv_params = [] 23 | self.input_size = input_size 24 | self.num_classes = num_classes 25 | 26 | # TODO: In the future it would be nice if the architecture could be loaded from 27 | # the HDF5 file rather than being hardcoded. For now this will have to do. 
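# Architecture sketch (assuming the default input_size of 64): nine
# [conv - spatial batchnorm - relu] blocks whose stride-2 convolutions
# halve the spatial size (64 -> 32 -> 16 -> 8 -> 4 -> 2), so the last
# 1024-filter conv leaves a 2x2x1024 volume (fan-in 4096) for the hidden
# affine layer, followed by a final affine layer to num_classes.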
28 | self.conv_params.append({'stride': 2, 'pad': 2}) 29 | self.conv_params.append({'stride': 1, 'pad': 1}) 30 | self.conv_params.append({'stride': 2, 'pad': 1}) 31 | self.conv_params.append({'stride': 1, 'pad': 1}) 32 | self.conv_params.append({'stride': 2, 'pad': 1}) 33 | self.conv_params.append({'stride': 1, 'pad': 1}) 34 | self.conv_params.append({'stride': 2, 'pad': 1}) 35 | self.conv_params.append({'stride': 1, 'pad': 1}) 36 | self.conv_params.append({'stride': 2, 'pad': 1}) 37 | 38 | self.filter_sizes = [5, 3, 3, 3, 3, 3, 3, 3, 3] 39 | self.num_filters = [64, 64, 128, 128, 256, 256, 512, 512, 1024] 40 | hidden_dim = 512 41 | 42 | self.bn_params = [] 43 | 44 | cur_size = input_size 45 | prev_dim = 3 46 | self.params = {} 47 | for i, (f, next_dim) in enumerate(zip(self.filter_sizes, self.num_filters)): 48 | fan_in = f * f * prev_dim 49 | self.params['W%d' % (i + 1)] = np.sqrt(2.0 / fan_in) * np.random.randn(next_dim, prev_dim, f, f) 50 | self.params['b%d' % (i + 1)] = np.zeros(next_dim) 51 | self.params['gamma%d' % (i + 1)] = np.ones(next_dim) 52 | self.params['beta%d' % (i + 1)] = np.zeros(next_dim) 53 | self.bn_params.append({'mode': 'train'}) 54 | prev_dim = next_dim 55 | if self.conv_params[i]['stride'] == 2: cur_size /= 2 56 | 57 | # Add a fully-connected layers 58 | fan_in = cur_size * cur_size * self.num_filters[-1] 59 | self.params['W%d' % (i + 2)] = np.sqrt(2.0 / fan_in) * np.random.randn(fan_in, hidden_dim) 60 | self.params['b%d' % (i + 2)] = np.zeros(hidden_dim) 61 | self.params['gamma%d' % (i + 2)] = np.ones(hidden_dim) 62 | self.params['beta%d' % (i + 2)] = np.zeros(hidden_dim) 63 | self.bn_params.append({'mode': 'train'}) 64 | self.params['W%d' % (i + 3)] = np.sqrt(2.0 / hidden_dim) * np.random.randn(hidden_dim, num_classes) 65 | self.params['b%d' % (i + 3)] = np.zeros(num_classes) 66 | 67 | for k, v in self.params.iteritems(): 68 | self.params[k] = v.astype(dtype) 69 | 70 | if h5_file is not None: 71 | self.load_weights(h5_file) 72 | 73 | 74 | def load_weights(self, h5_file, verbose=False): 75 | """ 76 | Load pretrained weights from an HDF5 file. 77 | 78 | Inputs: 79 | - h5_file: Path to the HDF5 file where pretrained weights are stored. 
80 | - verbose: Whether to print debugging info 81 | """ 82 | 83 | # Before loading weights we need to make a dummy forward pass to initialize 84 | # the running averages in the bn_pararams 85 | x = np.random.randn(1, 3, self.input_size, self.input_size) 86 | y = np.random.randint(self.num_classes, size=1) 87 | loss, grads = self.loss(x, y) 88 | 89 | with h5py.File(h5_file, 'r') as f: 90 | for k, v in f.iteritems(): 91 | v = np.asarray(v) 92 | if k in self.params: 93 | if verbose: print k, v.shape, self.params[k].shape 94 | if v.shape == self.params[k].shape: 95 | self.params[k] = v.copy() 96 | elif v.T.shape == self.params[k].shape: 97 | self.params[k] = v.T.copy() 98 | else: 99 | raise ValueError('shapes for %s do not match' % k) 100 | if k.startswith('running_mean'): 101 | i = int(k[12:]) - 1 102 | assert self.bn_params[i]['running_mean'].shape == v.shape 103 | self.bn_params[i]['running_mean'] = v.copy() 104 | if verbose: print k, v.shape 105 | if k.startswith('running_var'): 106 | i = int(k[11:]) - 1 107 | assert v.shape == self.bn_params[i]['running_var'].shape 108 | self.bn_params[i]['running_var'] = v.copy() 109 | if verbose: print k, v.shape 110 | 111 | for k, v in self.params.iteritems(): 112 | self.params[k] = v.astype(self.dtype) 113 | 114 | 115 | def forward(self, X, start=None, end=None, mode='test'): 116 | """ 117 | Run part of the model forward, starting and ending at an arbitrary layer, 118 | in either training mode or testing mode. 119 | 120 | You can pass arbitrary input to the starting layer, and you will receive 121 | output from the ending layer and a cache object that can be used to run 122 | the model backward over the same set of layers. 123 | 124 | For the purposes of this function, a "layer" is one of the following blocks: 125 | 126 | [conv - spatial batchnorm - relu] (There are 9 of these) 127 | [affine - batchnorm - relu] (There is one of these) 128 | [affine] (There is one of these) 129 | 130 | Inputs: 131 | - X: The input to the starting layer. If start=0, then this should be an 132 | array of shape (N, C, 64, 64). 133 | - start: The index of the layer to start from. start=0 starts from the first 134 | convolutional layer. Default is 0. 135 | - end: The index of the layer to end at. start=11 ends at the last 136 | fully-connected layer, returning class scores. Default is 11. 137 | - mode: The mode to use, either 'test' or 'train'. We need this because 138 | batch normalization behaves differently at training time and test time. 139 | 140 | Returns: 141 | - out: Output from the end layer. 142 | - cache: A cache object that can be passed to the backward method to run the 143 | network backward over the same range of layers. 
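Example (a sketch; X is assumed to be an array of shape (N, 3, 64, 64)):

  # Activations of only the first three conv blocks, in test mode:
  a3, cache = model.forward(X, start=0, end=2, mode='test')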
144 | """ 145 | X = X.astype(self.dtype) 146 | if start is None: start = 0 147 | if end is None: end = len(self.conv_params) + 1 148 | layer_caches = [] 149 | 150 | prev_a = X 151 | for i in xrange(start, end + 1): 152 | i1 = i + 1 153 | if 0 <= i < len(self.conv_params): 154 | # This is a conv layer 155 | w, b = self.params['W%d' % i1], self.params['b%d' % i1] 156 | gamma, beta = self.params['gamma%d' % i1], self.params['beta%d' % i1] 157 | conv_param = self.conv_params[i] 158 | bn_param = self.bn_params[i] 159 | bn_param['mode'] = mode 160 | 161 | next_a, cache = conv_bn_relu_forward(prev_a, w, b, gamma, beta, conv_param, bn_param) 162 | elif i == len(self.conv_params): 163 | # This is the fully-connected hidden layer 164 | w, b = self.params['W%d' % i1], self.params['b%d' % i1] 165 | gamma, beta = self.params['gamma%d' % i1], self.params['beta%d' % i1] 166 | bn_param = self.bn_params[i] 167 | bn_param['mode'] = mode 168 | next_a, cache = affine_bn_relu_forward(prev_a, w, b, gamma, beta, bn_param) 169 | elif i == len(self.conv_params) + 1: 170 | # This is the last fully-connected layer that produces scores 171 | w, b = self.params['W%d' % i1], self.params['b%d' % i1] 172 | next_a, cache = affine_forward(prev_a, w, b) 173 | else: 174 | raise ValueError('Invalid layer index %d' % i) 175 | 176 | layer_caches.append(cache) 177 | prev_a = next_a 178 | 179 | out = prev_a 180 | cache = (start, end, layer_caches) 181 | return out, cache 182 | 183 | def backward(self, dout, cache): 184 | """ 185 | Run the model backward over a sequence of layers that were previously run 186 | forward using the self.forward method. 187 | 188 | Inputs: 189 | - dout: Gradient with respect to the ending layer; this should have the same 190 | shape as the out variable returned from the corresponding call to forward. 191 | - cache: A cache object returned from self.forward. 192 | 193 | Returns: 194 | - dX: Gradient with respect to the start layer. This will have the same 195 | shape as the input X passed to self.forward. 196 | - grads: Gradient of all parameters in the layers. For example if you run 197 | forward through two convolutional layers, then on the corresponding call 198 | to backward grads will contain the gradients with respect to the weights, 199 | biases, and spatial batchnorm parameters of those two convolutional 200 | layers. The grads dictionary will therefore contain a subset of the keys 201 | of self.params, and grads[k] and self.params[k] will have the same shape. 
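Example (a sketch pairing backward with forward; da7 is assumed to be an
upstream gradient with the same shape as the layer-7 output a7):

  a7, cache = model.forward(X, end=7)
  dX, grads = model.backward(da7, cache)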
202 | """ 203 | start, end, layer_caches = cache 204 | dnext_a = dout 205 | grads = {} 206 | cache = None 207 | for i in reversed(range(start, end + 1)): 208 | if self.debug: 209 | print 'layer:', i+1, "top gradient's shape and abs sum:", dnext_a.shape, np.sum(np.absolute(dnext_a)) 210 | 211 | i1 = i + 1 212 | if i == len(self.conv_params) + 1: 213 | cache = layer_caches.pop() 214 | # This is the last fully-connected layer 215 | dprev_a, dw, db = affine_backward(dnext_a, cache) 216 | grads['W%d' % i1] = dw 217 | grads['b%d' % i1] = db 218 | elif i == len(self.conv_params): 219 | # This is the fully-connected hidden layer 220 | cache = layer_caches.pop() 221 | temp = affine_bn_relu_backward(dnext_a, cache) 222 | dprev_a, dw, db, dgamma, dbeta = temp 223 | grads['W%d' % i1] = dw 224 | grads['b%d' % i1] = db 225 | grads['gamma%d' % i1] = dgamma 226 | grads['beta%d' % i1] = dbeta 227 | elif 0 <= i < len(self.conv_params): 228 | # This is a conv layer 229 | cache = layer_caches.pop() 230 | temp = conv_bn_relu_backward(dnext_a, cache) 231 | dprev_a, dw, db, dgamma, dbeta = temp 232 | grads['W%d' % i1] = dw 233 | grads['b%d' % i1] = db 234 | grads['gamma%d' % i1] = dgamma 235 | grads['beta%d' % i1] = dbeta 236 | else: 237 | raise ValueError('Invalid layer index %d' % i) 238 | dnext_a = dprev_a 239 | 240 | if self.debug: 241 | names = [] 242 | for name, item in grads.iteritems(): 243 | print name, '->G', item.shape, np.sum(np.absolute(item)) 244 | names.append(name) 245 | for name in names: 246 | del grads[name] 247 | 248 | print_tuple(cache) 249 | 250 | dnext_a = dprev_a 251 | print 'layer:', i+1, "bottom gradient's shape and abs sum:", dnext_a.shape, np.sum(np.absolute(dnext_a)) 252 | raw_input() 253 | 254 | dX = dnext_a 255 | return dX, grads 256 | 257 | 258 | def loss(self, X, y=None): 259 | """ 260 | Classification loss used to train the network. 261 | 262 | Inputs: 263 | - X: Array of data, of shape (N, 3, 64, 64) 264 | - y: Array of labels, of shape (N,) 265 | 266 | If y is None, then run a test-time forward pass and return: 267 | - scores: Array of shape (N, 100) giving class scores. 268 | 269 | If y is not None, then run a training-time forward and backward pass and 270 | return a tuple of: 271 | - loss: Scalar giving loss 272 | - grads: Dictionary of gradients, with the same keys as self.params. 273 | """ 274 | # Note that we implement this by just caling self.forward and self.backward 275 | mode = 'test' if y is None else 'train' 276 | scores, cache = self.forward(X, mode=mode) 277 | if mode == 'test': 278 | return scores 279 | loss, dscores = softmax_loss(scores, y) 280 | 281 | dX, grads = self.backward(dscores, cache) 282 | return loss, grads 283 | 284 | def calc_loss(self, X, y): 285 | scores, cache = self.forward(X, mode='test') 286 | loss, dscores = softmax_loss(scores, y) 287 | dX, _ = self.backward(dscores, cache) 288 | return loss, np.argmax(scores), dX 289 | -------------------------------------------------------------------------------- /assignment3/cs231n/classifiers/rnn.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | 3 | from cs231n.layers import * 4 | from cs231n.rnn_layers import * 5 | 6 | 7 | class CaptioningRNN(object): 8 | """ 9 | A CaptioningRNN produces captions from image features using a recurrent 10 | neural network. 
11 |
12 | The RNN receives input vectors of size D, has a vocab size of V, works on
13 | sequences of length T, has an RNN hidden dimension of H, uses word vectors
14 | of dimension W, and operates on minibatches of size N.
15 |
16 | Note that we don't use any regularization for the CaptioningRNN.
17 | """
18 |
19 | def __init__(self, word_to_idx, input_dim=512, wordvec_dim=128,
20 | hidden_dim=128, cell_type='rnn', dtype=np.float32):
21 | """
22 | Construct a new CaptioningRNN instance.
23 |
24 | Inputs:
25 | - word_to_idx: A dictionary giving the vocabulary. It contains V entries,
26 | and maps each string to a unique integer in the range [0, V).
27 | - input_dim: Dimension D of input image feature vectors.
28 | - wordvec_dim: Dimension W of word vectors.
29 | - hidden_dim: Dimension H for the hidden state of the RNN.
30 | - cell_type: What type of RNN to use; either 'rnn' or 'lstm'.
31 | - dtype: numpy datatype to use; use float32 for training and float64 for
32 | numeric gradient checking.
33 | """
34 | if cell_type not in {'rnn', 'lstm'}:
35 | raise ValueError('Invalid cell_type "%s"' % cell_type)
36 |
37 | self.cell_type = cell_type
38 | self.dtype = dtype
39 | self.word_to_idx = word_to_idx
40 | self.idx_to_word = {i: w for w, i in word_to_idx.iteritems()}
41 | self.params = {}
42 |
43 | vocab_size = len(word_to_idx)
44 |
45 | self._null = word_to_idx['<NULL>']
46 | self._start = word_to_idx.get('<START>', None)
47 | self._end = word_to_idx.get('<END>', None)
48 |
49 | # Initialize word vectors
50 | self.params['W_embed'] = np.random.randn(vocab_size, wordvec_dim)
51 | self.params['W_embed'] /= 100
52 |
53 | # Initialize CNN -> hidden state projection parameters
54 | self.params['W_proj'] = np.random.randn(input_dim, hidden_dim)
55 | self.params['W_proj'] /= np.sqrt(input_dim)
56 | self.params['b_proj'] = np.zeros(hidden_dim)
57 |
58 | # Initialize parameters for the RNN
59 | dim_mul = {'lstm': 4, 'rnn': 1}[cell_type]
60 | self.params['Wx'] = np.random.randn(wordvec_dim, dim_mul * hidden_dim)
61 | self.params['Wx'] /= np.sqrt(wordvec_dim)
62 | self.params['Wh'] = np.random.randn(hidden_dim, dim_mul * hidden_dim)
63 | self.params['Wh'] /= np.sqrt(hidden_dim)
64 | self.params['b'] = np.zeros(dim_mul * hidden_dim)
65 |
66 | # Initialize output to vocab weights
67 | self.params['W_vocab'] = np.random.randn(hidden_dim, vocab_size)
68 | self.params['W_vocab'] /= np.sqrt(hidden_dim)
69 | self.params['b_vocab'] = np.zeros(vocab_size)
70 |
71 | # Cast parameters to correct dtype
72 | for k, v in self.params.iteritems():
73 | self.params[k] = v.astype(self.dtype)
74 |
75 |
76 | def loss(self, features, captions):
77 | """
78 | Compute training-time loss for the RNN. We input image features and
79 | ground-truth captions for those images, and use an RNN (or LSTM) to compute
80 | loss and gradients on all parameters.
81 |
82 | Inputs:
83 | - features: Input image features, of shape (N, D)
84 | - captions: Ground-truth captions; an integer array of shape (N, T) where
85 | each element is in the range 0 <= y[i, t] < V
86 |
87 | Returns a tuple of:
88 | - loss: Scalar loss
89 | - grads: Dictionary of gradients parallel to self.params
90 | """
91 | # Cut captions into two pieces: captions_in has everything but the last word
92 | # and will be input to the RNN; captions_out has everything but the first
93 | # word and this is what we will expect the RNN to generate. These are offset
94 | # by one relative to each other because the RNN should produce word (t+1)
95 | # after receiving word t. The first element of captions_in will be the <START>
96 | # token, and the first element of captions_out will be the first word.
97 | captions_in = captions[:, :-1]
98 | captions_out = captions[:, 1:]
99 |
100 | # You'll need this
101 | mask = (captions_out != self._null)
102 | # Note: I suspect the original assignment's handling of the <START> and
103 | # <END> tokens here is incomplete; see the note and fix further below.
104 |
105 | # Weight and bias for the affine transform from image features to initial
106 | # hidden state
107 | W_proj, b_proj = self.params['W_proj'], self.params['b_proj']
108 |
109 | # Word embedding matrix
110 | W_embed = self.params['W_embed']
111 |
112 | # Input-to-hidden, hidden-to-hidden, and biases for the RNN
113 | Wx, Wh, b = self.params['Wx'], self.params['Wh'], self.params['b']
114 |
115 | # Weight and bias for the hidden-to-vocab transformation.
116 | W_vocab, b_vocab = self.params['W_vocab'], self.params['b_vocab']
117 |
118 | loss, grads = 0.0, {}
119 | ############################################################################
120 | # TODO: Implement the forward and backward passes for the CaptioningRNN. #
121 | # In the forward pass you will need to do the following: #
122 | # (1) Use an affine transformation to compute the initial hidden state #
123 | # from the image features. This should produce an array of shape (N, H)#
124 | # (2) Use a word embedding layer to transform the words in captions_in #
125 | # from indices to vectors, giving an array of shape (N, T, W). #
126 | # (3) Use either a vanilla RNN or LSTM (depending on self.cell_type) to #
127 | # process the sequence of input word vectors and produce hidden state #
128 | # vectors for all timesteps, producing an array of shape (N, T, H). #
129 | # (4) Use a (temporal) affine transformation to compute scores over the #
130 | # vocabulary at every timestep using the hidden states, giving an #
131 | # array of shape (N, T, V). #
132 | # (5) Use (temporal) softmax to compute loss using captions_out, ignoring #
133 | # the points where the output word is <NULL> using the mask above. #
134 | # #
135 | # In the backward pass you will need to compute the gradient of the loss #
136 | # with respect to all model parameters. Use the loss and grads variables #
137 | # defined above to store loss and gradients; grads[k] should give the #
138 | # gradients for self.params[k]. #
139 | ############################################################################
140 | N, T = captions.shape
141 | h0, cache0 = affine_forward(features, W_proj, b_proj)
142 |
143 | """Most papers add <START> as the first input word, but this lesson
apparently did not. Therefore I added the code below.
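Concretely, the code below prepends a <START> column to captions_in,
appends an <END> column to captions_out, and extends the mask with a
column of True, so the model is explicitly trained to both begin and
end its captions.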
144 | """
145 | captions_in = np.concatenate((np.tile(self._start, (N, 1)), captions_in), 1)
146 | captions_out = np.concatenate((captions_out, np.tile(self._end, (N, 1))), 1)
147 | mask = np.concatenate((mask, np.tile(True, (N, 1))), 1)
148 |
149 | out, cache1 = word_embedding_forward(captions_in, W_embed)
150 | if self.cell_type == 'rnn':
151 | h, cache2 = rnn_forward(out, h0, Wx, Wh, b)
152 | elif self.cell_type == 'lstm':
153 | h, cache2 = lstm_forward(out, h0, Wx, Wh, b)
154 | out2, cache3 = temporal_affine_forward(h, W_vocab, b_vocab)
155 | loss, dout = temporal_softmax_loss(out2, captions_out, mask)
156 |
157 | dout, grads['W_vocab'], grads['b_vocab'] = temporal_affine_backward(dout, cache3)
158 | if self.cell_type == 'rnn':
159 | dout, dh0, grads['Wx'], grads['Wh'], grads['b'] = rnn_backward(dout, cache2)
160 | elif self.cell_type == 'lstm':
161 | dout, dh0, grads['Wx'], grads['Wh'], grads['b'] = lstm_backward(dout, cache2)
162 | grads['W_embed'] = word_embedding_backward(dout, cache1)
163 | _, grads['W_proj'], grads['b_proj'] = affine_backward(dh0, cache0)
164 | ############################################################################
165 | # END OF YOUR CODE #
166 | ############################################################################
167 |
168 | return loss, grads
169 |
170 |
171 | def sample(self, features, max_length=30):
172 | """
173 | Run a test-time forward pass for the model, sampling captions for input
174 | feature vectors.
175 |
176 | At each timestep, we embed the current word, pass it and the previous hidden
177 | state to the RNN to get the next hidden state, use the hidden state to get
178 | scores for all vocab words, and choose the word with the highest score as
179 | the next word. The initial hidden state is computed by applying an affine
180 | transform to the input image features, and the initial word is the <START>
181 | token.
182 |
183 | For LSTMs you will also have to keep track of the cell state; in that case
184 | the initial cell state should be zero.
185 |
186 | Inputs:
187 | - features: Array of input image features of shape (N, D).
188 | - max_length: Maximum length T of generated captions.
189 |
190 | Returns:
191 | - captions: Array of shape (N, max_length) giving sampled captions,
192 | where each element is an integer in the range [0, V). The first element
193 | of captions should be the first sampled word, not the <START> token.
194 | """
195 | N = features.shape[0]
196 | captions = self._null * np.ones((N, max_length), dtype=np.int32)
197 |
198 | # Unpack parameters
199 | W_proj, b_proj = self.params['W_proj'], self.params['b_proj']
200 | W_embed = self.params['W_embed']
201 | Wx, Wh, b = self.params['Wx'], self.params['Wh'], self.params['b']
202 | W_vocab, b_vocab = self.params['W_vocab'], self.params['b_vocab']
203 |
204 | ###########################################################################
205 | # TODO: Implement test-time sampling for the model. You will need to #
206 | # initialize the hidden state of the RNN by applying the learned affine #
207 | # transform to the input image features. The first word that you feed to #
208 | # the RNN should be the <START> token; its value is stored in the #
209 | # variable self._start. At each timestep you will need to: #
210 | # (1) Embed the previous word using the learned word embeddings #
211 | # (2) Make an RNN step using the previous hidden state and the embedded #
212 | # current word to get the next hidden state. #
213 | # (3) Apply the learned affine transformation to the next hidden state to #
214 | # get scores for all words in the vocabulary #
215 | # (4) Select the word with the highest score as the next word, writing it #
216 | # to the appropriate slot in the captions variable #
217 | # #
218 | # For simplicity, you do not need to stop generating after an <END> token #
219 | # is sampled, but you can if you want to. #
220 | # #
221 | # HINT: You will not be able to use the rnn_forward or lstm_forward #
222 | # functions; you'll need to call rnn_step_forward or lstm_step_forward in #
223 | # a loop. #
224 | ###########################################################################
225 | prev_h, _ = affine_forward(features, W_proj, b_proj)
226 | prev_c = np.zeros(prev_h.shape)
227 | word_index_input = [self._start] * N
228 |
229 | for i in range(max_length):
230 | word_vector_input = W_embed[word_index_input] # (N, wordvec_dim)
231 | if self.cell_type == 'rnn':
232 | next_h, _ = rnn_step_forward(word_vector_input, prev_h, Wx, Wh, b)
233 | elif self.cell_type == 'lstm':
234 | next_h, next_c, _ = lstm_step_forward(word_vector_input, prev_h, prev_c, Wx, Wh, b)
235 | prev_c = next_c
236 | prev_h = next_h # (N, H)
237 | out, _ = affine_forward(prev_h, W_vocab, b_vocab) # (N,V)
238 | word_index_input = captions[:,i] = np.argmax(out, axis=1) # (N,)
239 | ############################################################################
240 | # END OF YOUR CODE #
241 | ############################################################################
242 | return captions
243 |
--------------------------------------------------------------------------------
/assignment3/cs231n/coco_utils.py:
--------------------------------------------------------------------------------
1 | import os, json
2 | import numpy as np
3 | import h5py
4 |
5 |
6 | def load_coco_data(base_dir='cs231n/datasets/coco_captioning',
7 | max_train=None,
8 | pca_features=True):
9 | data = {}
10 | caption_file = os.path.join(base_dir, 'coco2014_captions.h5')
11 | with h5py.File(caption_file, 'r') as f:
12 | for k, v in f.iteritems():
13 | data[k] = np.asarray(v)
14 |
15 | if pca_features:
16 | train_feat_file = os.path.join(base_dir, 'train2014_vgg16_fc7_pca.h5')
17 | else:
18 | train_feat_file = os.path.join(base_dir, 'train2014_vgg16_fc7.h5')
19 | with h5py.File(train_feat_file, 'r') as f:
20 | data['train_features'] = np.asarray(f['features'])
21 |
22 | if pca_features:
23 | val_feat_file = os.path.join(base_dir, 'val2014_vgg16_fc7_pca.h5')
24 | else:
25 | val_feat_file = os.path.join(base_dir, 'val2014_vgg16_fc7.h5')
26 | with h5py.File(val_feat_file, 'r') as f:
27 | data['val_features'] = np.asarray(f['features'])
28 |
29 | dict_file = os.path.join(base_dir, 'coco2014_vocab.json')
30 | with open(dict_file, 'r') as f:
31 | dict_data = json.load(f)
32 | for k, v in dict_data.iteritems():
33 | data[k] = v
34 |
35 | train_url_file = os.path.join(base_dir, 'train2014_urls.txt')
36 | with open(train_url_file, 'r') as f:
37 | train_urls = np.asarray([line.strip() for line in f])
38 | data['train_urls'] = train_urls
39 |
40 | val_url_file = os.path.join(base_dir, 'val2014_urls.txt')
41 | with open(val_url_file, 'r') as f:
42 | val_urls = np.asarray([line.strip() for line in f])
43 | data['val_urls'] = val_urls
44 |
45 | # Maybe subsample the training data
46 | if max_train is not None:
47 | num_train = data['train_captions'].shape[0]
48 | mask = np.random.randint(num_train, size=max_train)
49 | data['train_captions'] = data['train_captions'][mask]
50 | data['train_image_idxs'] = data['train_image_idxs'][mask]
51 |
52 | return data
53 |
54 |
55 | def decode_captions(captions, idx_to_word):
56 | singleton = False
57 | if captions.ndim == 1:
58 | singleton = True
59 | captions = captions[None]
60 | decoded = []
61 | N, T = captions.shape
62 | for i in xrange(N):
63 | words = []
64 | for t in xrange(T):
65 | word = idx_to_word[captions[i, t]]
66 | if word != '<NULL>':
67 | words.append(word)
68 | if word == '<END>':
69 | break
70 | decoded.append(' '.join(words))
71 | if singleton:
72 | decoded = decoded[0]
73 | return decoded
74 |
75 |
76 | def sample_coco_minibatch(data, batch_size=100, split='train'):
77 | split_size = data['%s_captions' % split].shape[0]
78 | mask = np.random.choice(split_size, batch_size)
79 | captions = data['%s_captions' % split][mask]
80 | image_idxs = data['%s_image_idxs' % split][mask]
81 | image_features = data['%s_features' % split][image_idxs]
82 | urls = data['%s_urls' % split][image_idxs]
83 | return captions, image_features, urls
84 |
85 |
--------------------------------------------------------------------------------
/assignment3/cs231n/data_utils.py:
--------------------------------------------------------------------------------
1 | import cPickle as pickle
2 | import numpy as np
3 | import os
4 | from scipy.misc import imread
5 |
6 | def load_CIFAR_batch(filename):
7 | """ load single batch of cifar """
8 | with open(filename, 'rb') as f:
9 | datadict = pickle.load(f)
10 | X = datadict['data']
11 | Y = datadict['labels']
12 | X = X.reshape(10000, 3, 32, 32).transpose(0,2,3,1).astype("float")
13 | Y = np.array(Y)
14 | return X, Y
15 |
16 | def load_CIFAR10(ROOT):
17 | """ load all of cifar """
18 | xs = []
19 | ys = []
20 | for b in range(1,6):
21 | f = os.path.join(ROOT, 'data_batch_%d' % (b, ))
22 | X, Y = load_CIFAR_batch(f)
23 | xs.append(X)
24 | ys.append(Y)
25 | Xtr = np.concatenate(xs)
26 | Ytr = np.concatenate(ys)
27 | del X, Y
28 | Xte, Yte = load_CIFAR_batch(os.path.join(ROOT, 'test_batch'))
29 | return Xtr, Ytr, Xte, Yte
30 |
31 |
32 | def get_CIFAR10_data(num_training=49000, num_validation=1000, num_test=1000,
33 | subtract_mean=True):
34 | """
35 | Load the CIFAR-10 dataset from disk and perform preprocessing to prepare
36 | it for classifiers. These are the same steps as we used for the SVM, but
37 | condensed to a single function.
38 | """
39 | # Load the raw CIFAR-10 data
40 | cifar10_dir = 'cs231n/datasets/cifar-10-batches-py'
41 | X_train, y_train, X_test, y_test = load_CIFAR10(cifar10_dir)
42 |
43 | # Subsample the data
44 | mask = range(num_training, num_training + num_validation)
45 | X_val = X_train[mask]
46 | y_val = y_train[mask]
47 | mask = range(num_training)
48 | X_train = X_train[mask]
49 | y_train = y_train[mask]
50 | mask = range(num_test)
51 | X_test = X_test[mask]
52 | y_test = y_test[mask]
53 |
54 | # Normalize the data: subtract the mean image
55 | if subtract_mean:
56 | mean_image = np.mean(X_train, axis=0)
57 | X_train -= mean_image
58 | X_val -= mean_image
59 | X_test -= mean_image
60 |
61 | # Transpose so that channels come first
62 | X_train = X_train.transpose(0, 3, 1, 2).copy()
63 | X_val = X_val.transpose(0, 3, 1, 2).copy()
64 | X_test = X_test.transpose(0, 3, 1, 2).copy()
65 |
66 | # Package data into a dictionary
67 | return {
68 | 'X_train': X_train, 'y_train': y_train,
69 | 'X_val': X_val, 'y_val': y_val,
70 | 'X_test': X_test, 'y_test': y_test,
71 | }
72 |
73 |
74 | def load_tiny_imagenet(path, dtype=np.float32, subtract_mean=True):
75 | """
76 | Load TinyImageNet.
Each of TinyImageNet-100-A, TinyImageNet-100-B, and 77 | TinyImageNet-200 have the same directory structure, so this can be used 78 | to load any of them. 79 | 80 | Inputs: 81 | - path: String giving path to the directory to load. 82 | - dtype: numpy datatype used to load the data. 83 | - subtract_mean: Whether to subtract the mean training image. 84 | 85 | Returns: A dictionary with the following entries: 86 | - class_names: A list where class_names[i] is a list of strings giving the 87 | WordNet names for class i in the loaded dataset. 88 | - X_train: (N_tr, 3, 64, 64) array of training images 89 | - y_train: (N_tr,) array of training labels 90 | - X_val: (N_val, 3, 64, 64) array of validation images 91 | - y_val: (N_val,) array of validation labels 92 | - X_test: (N_test, 3, 64, 64) array of testing images. 93 | - y_test: (N_test,) array of test labels; if test labels are not available 94 | (such as in student code) then y_test will be None. 95 | - mean_image: (3, 64, 64) array giving mean training image 96 | """ 97 | # First load wnids 98 | with open(os.path.join(path, 'wnids.txt'), 'r') as f: 99 | wnids = [x.strip() for x in f] 100 | 101 | # Map wnids to integer labels 102 | wnid_to_label = {wnid: i for i, wnid in enumerate(wnids)} 103 | 104 | # Use words.txt to get names for each class 105 | with open(os.path.join(path, 'words.txt'), 'r') as f: 106 | wnid_to_words = dict(line.split('\t') for line in f) 107 | for wnid, words in wnid_to_words.iteritems(): 108 | wnid_to_words[wnid] = [w.strip() for w in words.split(',')] 109 | class_names = [wnid_to_words[wnid] for wnid in wnids] 110 | 111 | # Next load training data. 112 | X_train = [] 113 | y_train = [] 114 | for i, wnid in enumerate(wnids): 115 | if (i + 1) % 20 == 0: 116 | print 'loading training data for synset %d / %d' % (i + 1, len(wnids)) 117 | # To figure out the filenames we need to open the boxes file 118 | boxes_file = os.path.join(path, 'train', wnid, '%s_boxes.txt' % wnid) 119 | with open(boxes_file, 'r') as f: 120 | filenames = [x.split('\t')[0] for x in f] 121 | num_images = len(filenames) 122 | 123 | X_train_block = np.zeros((num_images, 3, 64, 64), dtype=dtype) 124 | y_train_block = wnid_to_label[wnid] * np.ones(num_images, dtype=np.int64) 125 | for j, img_file in enumerate(filenames): 126 | img_file = os.path.join(path, 'train', wnid, 'images', img_file) 127 | img = imread(img_file) 128 | if img.ndim == 2: 129 | ## grayscale file 130 | img.shape = (64, 64, 1) 131 | X_train_block[j] = img.transpose(2, 0, 1) 132 | X_train.append(X_train_block) 133 | y_train.append(y_train_block) 134 | 135 | # We need to concatenate all training data 136 | X_train = np.concatenate(X_train, axis=0) 137 | y_train = np.concatenate(y_train, axis=0) 138 | 139 | # Next load validation data 140 | with open(os.path.join(path, 'val', 'val_annotations.txt'), 'r') as f: 141 | img_files = [] 142 | val_wnids = [] 143 | for line in f: 144 | img_file, wnid = line.split('\t')[:2] 145 | img_files.append(img_file) 146 | val_wnids.append(wnid) 147 | num_val = len(img_files) 148 | y_val = np.array([wnid_to_label[wnid] for wnid in val_wnids]) 149 | X_val = np.zeros((num_val, 3, 64, 64), dtype=dtype) 150 | for i, img_file in enumerate(img_files): 151 | img_file = os.path.join(path, 'val', 'images', img_file) 152 | img = imread(img_file) 153 | if img.ndim == 2: 154 | img.shape = (64, 64, 1) 155 | X_val[i] = img.transpose(2, 0, 1) 156 | 157 | # Next load test images 158 | # Students won't have test labels, so we need to iterate over files in the 159 | # images 
directory. 160 | img_files = os.listdir(os.path.join(path, 'test', 'images')) 161 | X_test = np.zeros((len(img_files), 3, 64, 64), dtype=dtype) 162 | for i, img_file in enumerate(img_files): 163 | img_file = os.path.join(path, 'test', 'images', img_file) 164 | img = imread(img_file) 165 | if img.ndim == 2: 166 | img.shape = (64, 64, 1) 167 | X_test[i] = img.transpose(2, 0, 1) 168 | 169 | y_test = None 170 | y_test_file = os.path.join(path, 'test', 'test_annotations.txt') 171 | if os.path.isfile(y_test_file): 172 | with open(y_test_file, 'r') as f: 173 | img_file_to_wnid = {} 174 | for line in f: 175 | line = line.split('\t') 176 | img_file_to_wnid[line[0]] = line[1] 177 | y_test = [wnid_to_label[img_file_to_wnid[img_file]] for img_file in img_files] 178 | y_test = np.array(y_test) 179 | 180 | mean_image = X_train.mean(axis=0) 181 | if subtract_mean: 182 | X_train -= mean_image[None] 183 | X_val -= mean_image[None] 184 | X_test -= mean_image[None] 185 | 186 | return { 187 | 'class_names': class_names, 188 | 'X_train': X_train, 189 | 'y_train': y_train, 190 | 'X_val': X_val, 191 | 'y_val': y_val, 192 | 'X_test': X_test, 193 | 'y_test': y_test, 194 | 'class_names': class_names, 195 | 'mean_image': mean_image, 196 | } 197 | 198 | 199 | def load_models(models_dir): 200 | """ 201 | Load saved models from disk. This will attempt to unpickle all files in a 202 | directory; any files that give errors on unpickling (such as README.txt) will 203 | be skipped. 204 | 205 | Inputs: 206 | - models_dir: String giving the path to a directory containing model files. 207 | Each model file is a pickled dictionary with a 'model' field. 208 | 209 | Returns: 210 | A dictionary mapping model file names to models. 211 | """ 212 | models = {} 213 | for model_file in os.listdir(models_dir): 214 | with open(os.path.join(models_dir, model_file), 'rb') as f: 215 | try: 216 | models[model_file] = pickle.load(f)['model'] 217 | except pickle.UnpicklingError: 218 | continue 219 | return models 220 | -------------------------------------------------------------------------------- /assignment3/cs231n/datasets/get_coco_captioning.sh: -------------------------------------------------------------------------------- 1 | wget "http://cs231n.stanford.edu/coco_captioning.zip" 2 | unzip coco_captioning.zip 3 | rm coco_captioning.zip 4 | -------------------------------------------------------------------------------- /assignment3/cs231n/datasets/get_pretrained_model.sh: -------------------------------------------------------------------------------- 1 | wget http://cs231n.stanford.edu/pretrained_model.h5 2 | -------------------------------------------------------------------------------- /assignment3/cs231n/datasets/get_tiny_imagenet_a.sh: -------------------------------------------------------------------------------- 1 | wget http://cs231n.stanford.edu/tiny-imagenet-100-A.zip 2 | unzip tiny-imagenet-100-A.zip 3 | rm tiny-imagenet-100-A.zip 4 | -------------------------------------------------------------------------------- /assignment3/cs231n/fast_layers.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | try: 3 | from cs231n.im2col_cython import col2im_cython, im2col_cython 4 | from cs231n.im2col_cython import col2im_6d_cython 5 | except ImportError: 6 | print 'run the following from the cs231n directory and try again:' 7 | print 'python setup.py build_ext --inplace' 8 | print 'You may also need to restart your iPython kernel' 9 | 10 | from cs231n.im2col import * 11 | 12 
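# The im2col trick used throughout this file: every receptive field of the
# (padded) input is unrolled into one column of a large matrix, turning the
# whole convolution into a single matrix multiply with the reshaped filter
# bank. A shape sketch (assuming pad and stride divide evenly):
#
#   out_h = (H + 2 * pad - filter_height) / stride + 1
#   out_w = (W + 2 * pad - filter_width) / stride + 1
#   x_cols: (C * filter_height * filter_width, N * out_h * out_w)
#   out = w.reshape(F, -1).dot(x_cols) + b.reshape(-1, 1)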
| 13 | def conv_forward_im2col(x, w, b, conv_param): 14 | """ 15 | A fast implementation of the forward pass for a convolutional layer 16 | based on im2col and col2im. 17 | """ 18 | N, C, H, W = x.shape 19 | num_filters, _, filter_height, filter_width = w.shape 20 | stride, pad = conv_param['stride'], conv_param['pad'] 21 | 22 | # Check dimensions 23 | assert (W + 2 * pad - filter_width) % stride == 0, 'width does not work' 24 | assert (H + 2 * pad - filter_height) % stride == 0, 'height does not work' 25 | 26 | # Create output 27 | out_height = (H + 2 * pad - filter_height) / stride + 1 28 | out_width = (W + 2 * pad - filter_width) / stride + 1 29 | out = np.zeros((N, num_filters, out_height, out_width), dtype=x.dtype) 30 | 31 | # x_cols = im2col_indices(x, w.shape[2], w.shape[3], pad, stride) 32 | x_cols = im2col_cython(x, w.shape[2], w.shape[3], pad, stride) 33 | res = w.reshape((w.shape[0], -1)).dot(x_cols) + b.reshape(-1, 1) 34 | 35 | out = res.reshape(w.shape[0], out.shape[2], out.shape[3], x.shape[0]) 36 | out = out.transpose(3, 0, 1, 2) 37 | 38 | cache = (x, w, b, conv_param, x_cols) 39 | return out, cache 40 | 41 | 42 | def conv_forward_strides(x, w, b, conv_param): 43 | N, C, H, W = x.shape 44 | F, _, HH, WW = w.shape 45 | stride, pad = conv_param['stride'], conv_param['pad'] 46 | 47 | # Check dimensions 48 | #assert (W + 2 * pad - WW) % stride == 0, 'width does not work' 49 | #assert (H + 2 * pad - HH) % stride == 0, 'height does not work' 50 | 51 | # Pad the input 52 | p = pad 53 | x_padded = np.pad(x, ((0, 0), (0, 0), (p, p), (p, p)), mode='constant') 54 | 55 | # Figure out output dimensions 56 | H += 2 * pad 57 | W += 2 * pad 58 | out_h = (H - HH) / stride + 1 59 | out_w = (W - WW) / stride + 1 60 | 61 | # Perform an im2col operation by picking clever strides 62 | shape = (C, HH, WW, N, out_h, out_w) 63 | strides = (H * W, W, 1, C * H * W, stride * W, stride) 64 | strides = x.itemsize * np.array(strides) 65 | x_stride = np.lib.stride_tricks.as_strided(x_padded, 66 | shape=shape, strides=strides) 67 | x_cols = np.ascontiguousarray(x_stride) 68 | x_cols.shape = (C * HH * WW, N * out_h * out_w) 69 | 70 | # Now all our convolutions are a big matrix multiply 71 | res = w.reshape(F, -1).dot(x_cols) + b.reshape(-1, 1) 72 | 73 | # Reshape the output 74 | res.shape = (F, N, out_h, out_w) 75 | out = res.transpose(1, 0, 2, 3) 76 | 77 | # Be nice and return a contiguous array 78 | # The old version of conv_forward_fast doesn't do this, so for a fair 79 | # comparison we won't either 80 | out = np.ascontiguousarray(out) 81 | 82 | cache = (x, w, b, conv_param, x_cols) 83 | return out, cache 84 | 85 | 86 | def conv_backward_strides(dout, cache): 87 | x, w, b, conv_param, x_cols = cache 88 | stride, pad = conv_param['stride'], conv_param['pad'] 89 | 90 | N, C, H, W = x.shape 91 | F, _, HH, WW = w.shape 92 | _, _, out_h, out_w = dout.shape 93 | 94 | db = np.sum(dout, axis=(0, 2, 3)) 95 | 96 | dout_reshaped = dout.transpose(1, 0, 2, 3).reshape(F, -1) 97 | dw = dout_reshaped.dot(x_cols.T).reshape(w.shape) 98 | 99 | dx_cols = w.reshape(F, -1).T.dot(dout_reshaped) 100 | dx_cols.shape = (C, HH, WW, N, out_h, out_w) 101 | dx = col2im_6d_cython(dx_cols, N, C, H, W, HH, WW, pad, stride) 102 | 103 | return dx, dw, db 104 | 105 | 106 | def conv_backward_im2col(dout, cache): 107 | """ 108 | A fast implementation of the backward pass for a convolutional layer 109 | based on im2col and col2im. 
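In column space the forward pass was res = w_row.dot(x_cols) + b, so the
gradients follow from ordinary matrix calculus: dw_row = dout_row.dot(x_cols.T),
dx_cols = w_row.T.dot(dout_row), and db sums dout over every output position;
col2im then scatters dx_cols back into image layout, accumulating overlaps.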
110 | """ 111 | x, w, b, conv_param, x_cols = cache 112 | stride, pad = conv_param['stride'], conv_param['pad'] 113 | 114 | db = np.sum(dout, axis=(0, 2, 3)) 115 | 116 | num_filters, _, filter_height, filter_width = w.shape 117 | dout_reshaped = dout.transpose(1, 2, 3, 0).reshape(num_filters, -1) 118 | dw = dout_reshaped.dot(x_cols.T).reshape(w.shape) 119 | 120 | dx_cols = w.reshape(num_filters, -1).T.dot(dout_reshaped) 121 | # dx = col2im_indices(dx_cols, x.shape, filter_height, filter_width, pad, stride) 122 | dx = col2im_cython(dx_cols, x.shape[0], x.shape[1], x.shape[2], x.shape[3], 123 | filter_height, filter_width, pad, stride) 124 | 125 | return dx, dw, db 126 | 127 | 128 | conv_forward_fast = conv_forward_strides 129 | conv_backward_fast = conv_backward_strides 130 | 131 | 132 | def max_pool_forward_fast(x, pool_param): 133 | """ 134 | A fast implementation of the forward pass for a max pooling layer. 135 | 136 | This chooses between the reshape method and the im2col method. If the pooling 137 | regions are square and tile the input image, then we can use the reshape 138 | method which is very fast. Otherwise we fall back on the im2col method, which 139 | is not much faster than the naive method. 140 | """ 141 | N, C, H, W = x.shape 142 | pool_height, pool_width = pool_param['pool_height'], pool_param['pool_width'] 143 | stride = pool_param['stride'] 144 | 145 | same_size = pool_height == pool_width == stride 146 | tiles = H % pool_height == 0 and W % pool_width == 0 147 | if same_size and tiles: 148 | out, reshape_cache = max_pool_forward_reshape(x, pool_param) 149 | cache = ('reshape', reshape_cache) 150 | else: 151 | out, im2col_cache = max_pool_forward_im2col(x, pool_param) 152 | cache = ('im2col', im2col_cache) 153 | return out, cache 154 | 155 | 156 | def max_pool_backward_fast(dout, cache): 157 | """ 158 | A fast implementation of the backward pass for a max pooling layer. 159 | 160 | This switches between the reshape method an the im2col method depending on 161 | which method was used to generate the cache. 162 | """ 163 | method, real_cache = cache 164 | if method == 'reshape': 165 | return max_pool_backward_reshape(dout, real_cache) 166 | elif method == 'im2col': 167 | return max_pool_backward_im2col(dout, real_cache) 168 | else: 169 | raise ValueError('Unrecognized method "%s"' % method) 170 | 171 | 172 | def max_pool_forward_reshape(x, pool_param): 173 | """ 174 | A fast implementation of the forward pass for the max pooling layer that uses 175 | some clever reshaping. 176 | 177 | This can only be used for square pooling regions that tile the input. 178 | """ 179 | N, C, H, W = x.shape 180 | pool_height, pool_width = pool_param['pool_height'], pool_param['pool_width'] 181 | stride = pool_param['stride'] 182 | assert pool_height == pool_width == stride, 'Invalid pool params' 183 | assert H % pool_height == 0 184 | assert W % pool_height == 0 185 | x_reshaped = x.reshape(N, C, H / pool_height, pool_height, 186 | W / pool_width, pool_width) 187 | out = x_reshaped.max(axis=3).max(axis=4) 188 | 189 | cache = (x, x_reshaped, out) 190 | return out, cache 191 | 192 | 193 | def max_pool_backward_reshape(dout, cache): 194 | """ 195 | A fast implementation of the backward pass for the max pooling layer that 196 | uses some clever broadcasting and reshaping. 197 | 198 | This can only be used if the forward pass was computed using 199 | max_pool_forward_reshape. 
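For example, 2x2 pooling with stride 2 on a (N, C, 4, 4) input views it as
(N, C, 2, 2, 2, 2), takes the max over the two pooling axes to produce a
(N, C, 2, 2) output, and the backward pass broadcasts dout back through the
same view onto the argmax positions.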
200 | 201 | NOTE: If there are multiple argmaxes, this method will assign gradient to 202 | ALL argmax elements of the input rather than picking one. In this case the 203 | gradient will actually be incorrect. However this is unlikely to occur in 204 | practice, so it shouldn't matter much. One possible solution is to split the 205 | upstream gradient equally among all argmax elements; this should result in a 206 | valid subgradient. You can make this happen by uncommenting the line below; 207 | however this results in a significant performance penalty (about 40% slower) 208 | and is unlikely to matter in practice so we don't do it. 209 | """ 210 | x, x_reshaped, out = cache 211 | 212 | dx_reshaped = np.zeros_like(x_reshaped) 213 | out_newaxis = out[:, :, :, np.newaxis, :, np.newaxis] 214 | mask = (x_reshaped == out_newaxis) 215 | dout_newaxis = dout[:, :, :, np.newaxis, :, np.newaxis] 216 | dout_broadcast, _ = np.broadcast_arrays(dout_newaxis, dx_reshaped) 217 | dx_reshaped[mask] = dout_broadcast[mask] 218 | dx_reshaped /= np.sum(mask, axis=(3, 5), keepdims=True) 219 | dx = dx_reshaped.reshape(x.shape) 220 | 221 | return dx 222 | 223 | 224 | def max_pool_forward_im2col(x, pool_param): 225 | """ 226 | An implementation of the forward pass for max pooling based on im2col. 227 | 228 | This isn't much faster than the naive version, so it should be avoided if 229 | possible. 230 | """ 231 | N, C, H, W = x.shape 232 | pool_height, pool_width = pool_param['pool_height'], pool_param['pool_width'] 233 | stride = pool_param['stride'] 234 | 235 | assert (H - pool_height) % stride == 0, 'Invalid height' 236 | assert (W - pool_width) % stride == 0, 'Invalid width' 237 | 238 | out_height = (H - pool_height) / stride + 1 239 | out_width = (W - pool_width) / stride + 1 240 | 241 | x_split = x.reshape(N * C, 1, H, W) 242 | x_cols = im2col(x_split, pool_height, pool_width, padding=0, stride=stride) 243 | x_cols_argmax = np.argmax(x_cols, axis=0) 244 | x_cols_max = x_cols[x_cols_argmax, np.arange(x_cols.shape[1])] 245 | out = x_cols_max.reshape(out_height, out_width, N, C).transpose(2, 3, 0, 1) 246 | 247 | cache = (x, x_cols, x_cols_argmax, pool_param) 248 | return out, cache 249 | 250 | 251 | def max_pool_backward_im2col(dout, cache): 252 | """ 253 | An implementation of the backward pass for max pooling based on im2col. 254 | 255 | This isn't much faster than the naive version, so it should be avoided if 256 | possible. 
257 | """ 258 | x, x_cols, x_cols_argmax, pool_param = cache 259 | N, C, H, W = x.shape 260 | pool_height, pool_width = pool_param['pool_height'], pool_param['pool_width'] 261 | stride = pool_param['stride'] 262 | 263 | dout_reshaped = dout.transpose(2, 3, 0, 1).flatten() 264 | dx_cols = np.zeros_like(x_cols) 265 | dx_cols[x_cols_argmax, np.arange(dx_cols.shape[1])] = dout_reshaped 266 | dx = col2im_indices(dx_cols, (N * C, 1, H, W), pool_height, pool_width, 267 | padding=0, stride=stride) 268 | dx = dx.reshape(x.shape) 269 | 270 | return dx 271 | -------------------------------------------------------------------------------- /assignment3/cs231n/gradient_check.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | from random import randrange 3 | 4 | def eval_numerical_gradient(f, x, verbose=True, h=0.00001): 5 | """ 6 | a naive implementation of numerical gradient of f at x 7 | - f should be a function that takes a single argument 8 | - x is the point (numpy array) to evaluate the gradient at 9 | """ 10 | 11 | fx = f(x) # evaluate function value at original point 12 | grad = np.zeros_like(x) 13 | # iterate over all indexes in x 14 | it = np.nditer(x, flags=['multi_index'], op_flags=['readwrite']) 15 | while not it.finished: 16 | 17 | # evaluate function at x+h 18 | ix = it.multi_index 19 | oldval = x[ix] 20 | x[ix] = oldval + h # increment by h 21 | fxph = f(x) # evalute f(x + h) 22 | x[ix] = oldval - h 23 | fxmh = f(x) # evaluate f(x - h) 24 | x[ix] = oldval # restore 25 | 26 | # compute the partial derivative with centered formula 27 | grad[ix] = (fxph - fxmh) / (2 * h) # the slope 28 | if verbose: 29 | print ix, grad[ix] 30 | it.iternext() # step to next dimension 31 | 32 | return grad 33 | 34 | 35 | def eval_numerical_gradient_array(f, x, df, h=1e-5): 36 | """ 37 | Evaluate a numeric gradient for a function that accepts a numpy 38 | array and returns a numpy array. 39 | """ 40 | grad = np.zeros_like(x) 41 | it = np.nditer(x, flags=['multi_index'], op_flags=['readwrite']) 42 | while not it.finished: 43 | ix = it.multi_index 44 | 45 | oldval = x[ix] 46 | x[ix] = oldval + h 47 | pos = f(x).copy() 48 | x[ix] = oldval - h 49 | neg = f(x).copy() 50 | x[ix] = oldval 51 | 52 | grad[ix] = np.sum((pos - neg) * df) / (2 * h) 53 | it.iternext() 54 | return grad 55 | 56 | 57 | def eval_numerical_gradient_blobs(f, inputs, output, h=1e-5): 58 | """ 59 | Compute numeric gradients for a function that operates on input 60 | and output blobs. 61 | 62 | We assume that f accepts several input blobs as arguments, followed by a blob 63 | into which outputs will be written. For example, f might be called like this: 64 | 65 | f(x, w, out) 66 | 67 | where x and w are input Blobs, and the result of f will be written to out. 
68 | 69 | Inputs: 70 | - f: function 71 | - inputs: tuple of input blobs 72 | - output: output blob 73 | - h: step size 74 | """ 75 | numeric_diffs = [] 76 | for input_blob in inputs: 77 | diff = np.zeros_like(input_blob.diffs) 78 | it = np.nditer(input_blob.vals, flags=['multi_index'], 79 | op_flags=['readwrite']) 80 | while not it.finished: 81 | idx = it.multi_index 82 | orig = input_blob.vals[idx] 83 | 84 | input_blob.vals[idx] = orig + h 85 | f(*(inputs + (output,))) 86 | pos = np.copy(output.vals) 87 | input_blob.vals[idx] = orig - h 88 | f(*(inputs + (output,))) 89 | neg = np.copy(output.vals) 90 | input_blob.vals[idx] = orig 91 | 92 | diff[idx] = np.sum((pos - neg) * output.diffs) / (2.0 * h) 93 | 94 | it.iternext() 95 | numeric_diffs.append(diff) 96 | return numeric_diffs 97 | 98 | 99 | def eval_numerical_gradient_net(net, inputs, output, h=1e-5): 100 | return eval_numerical_gradient_blobs(lambda *args: net.forward(), 101 | inputs, output, h=h) 102 | 103 | 104 | def grad_check_sparse(f, x, analytic_grad, num_checks=10, h=1e-5): 105 | """ 106 | sample a few random elements and only return numerical 107 | in this dimensions. 108 | """ 109 | 110 | for i in xrange(num_checks): 111 | ix = tuple([randrange(m) for m in x.shape]) 112 | 113 | oldval = x[ix] 114 | x[ix] = oldval + h # increment by h 115 | fxph = f(x) # evaluate f(x + h) 116 | x[ix] = oldval - h # increment by h 117 | fxmh = f(x) # evaluate f(x - h) 118 | x[ix] = oldval # reset 119 | 120 | grad_numerical = (fxph - fxmh) / (2 * h) 121 | grad_analytic = analytic_grad[ix] 122 | rel_error = abs(grad_numerical - grad_analytic) / (abs(grad_numerical) + abs(grad_analytic)) 123 | print 'numerical: %f analytic: %f, relative error: %e' % (grad_numerical, grad_analytic, rel_error) 124 | 125 | -------------------------------------------------------------------------------- /assignment3/cs231n/im2col.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | 3 | 4 | def get_im2col_indices(x_shape, field_height, field_width, padding=1, stride=1): 5 | # First figure out what the size of the output should be 6 | N, C, H, W = x_shape 7 | assert (H + 2 * padding - field_height) % stride == 0 8 | assert (W + 2 * padding - field_height) % stride == 0 9 | out_height = (H + 2 * padding - field_height) / stride + 1 10 | out_width = (W + 2 * padding - field_width) / stride + 1 11 | 12 | i0 = np.repeat(np.arange(field_height), field_width) 13 | i0 = np.tile(i0, C) 14 | i1 = stride * np.repeat(np.arange(out_height), out_width) 15 | j0 = np.tile(np.arange(field_width), field_height * C) 16 | j1 = stride * np.tile(np.arange(out_width), out_height) 17 | i = i0.reshape(-1, 1) + i1.reshape(1, -1) 18 | j = j0.reshape(-1, 1) + j1.reshape(1, -1) 19 | 20 | k = np.repeat(np.arange(C), field_height * field_width).reshape(-1, 1) 21 | 22 | return (k, i, j) 23 | 24 | 25 | def im2col_indices(x, field_height, field_width, padding=1, stride=1): 26 | """ An implementation of im2col based on some fancy indexing """ 27 | # Zero-pad the input 28 | p = padding 29 | x_padded = np.pad(x, ((0, 0), (0, 0), (p, p), (p, p)), mode='constant') 30 | 31 | k, i, j = get_im2col_indices(x.shape, field_height, field_width, padding, 32 | stride) 33 | 34 | cols = x_padded[:, k, i, j] 35 | C = x.shape[1] 36 | cols = cols.transpose(1, 2, 0).reshape(field_height * field_width * C, -1) 37 | return cols 38 | 39 | 40 | def col2im_indices(cols, x_shape, field_height=3, field_width=3, padding=1, 41 | stride=1): 42 | """ An implementation of 
col2im based on fancy indexing and np.add.at """ 43 | N, C, H, W = x_shape 44 | H_padded, W_padded = H + 2 * padding, W + 2 * padding 45 | x_padded = np.zeros((N, C, H_padded, W_padded), dtype=cols.dtype) 46 | k, i, j = get_im2col_indices(x_shape, field_height, field_width, padding, 47 | stride) 48 | cols_reshaped = cols.reshape(C * field_height * field_width, -1, N) 49 | cols_reshaped = cols_reshaped.transpose(2, 0, 1) 50 | np.add.at(x_padded, (slice(None), k, i, j), cols_reshaped) 51 | if padding == 0: 52 | return x_padded 53 | return x_padded[:, :, padding:-padding, padding:-padding] 54 | 55 | pass 56 | -------------------------------------------------------------------------------- /assignment3/cs231n/im2col_cython.pyx: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | cimport numpy as np 3 | cimport cython 4 | 5 | # DTYPE = np.float64 6 | # ctypedef np.float64_t DTYPE_t 7 | 8 | ctypedef fused DTYPE_t: 9 | np.float32_t 10 | np.float64_t 11 | 12 | def im2col_cython(np.ndarray[DTYPE_t, ndim=4] x, int field_height, 13 | int field_width, int padding, int stride): 14 | cdef int N = x.shape[0] 15 | cdef int C = x.shape[1] 16 | cdef int H = x.shape[2] 17 | cdef int W = x.shape[3] 18 | 19 | cdef int HH = (H + 2 * padding - field_height) / stride + 1 20 | cdef int WW = (W + 2 * padding - field_width) / stride + 1 21 | 22 | cdef int p = padding 23 | cdef np.ndarray[DTYPE_t, ndim=4] x_padded = np.pad(x, 24 | ((0, 0), (0, 0), (p, p), (p, p)), mode='constant') 25 | 26 | cdef np.ndarray[DTYPE_t, ndim=2] cols = np.zeros( 27 | (C * field_height * field_width, N * HH * WW), 28 | dtype=x.dtype) 29 | 30 | # Moving the inner loop to a C function with no bounds checking works, but does 31 | # not seem to help performance in any measurable way. 32 | 33 | im2col_cython_inner(cols, x_padded, N, C, H, W, HH, WW, 34 | field_height, field_width, padding, stride) 35 | return cols 36 | 37 | 38 | @cython.boundscheck(False) 39 | cdef int im2col_cython_inner(np.ndarray[DTYPE_t, ndim=2] cols, 40 | np.ndarray[DTYPE_t, ndim=4] x_padded, 41 | int N, int C, int H, int W, int HH, int WW, 42 | int field_height, int field_width, int padding, int stride) except? -1: 43 | cdef int c, ii, jj, row, yy, xx, i, col 44 | 45 | for c in range(C): 46 | for yy in range(HH): 47 | for xx in range(WW): 48 | for ii in range(field_height): 49 | for jj in range(field_width): 50 | row = c * field_width * field_height + ii * field_height + jj 51 | for i in range(N): 52 | col = yy * WW * N + xx * N + i 53 | cols[row, col] = x_padded[i, c, stride * yy + ii, stride * xx + jj] 54 | 55 | 56 | 57 | def col2im_cython(np.ndarray[DTYPE_t, ndim=2] cols, int N, int C, int H, int W, 58 | int field_height, int field_width, int padding, int stride): 59 | cdef np.ndarray x = np.empty((N, C, H, W), dtype=cols.dtype) 60 | cdef int HH = (H + 2 * padding - field_height) / stride + 1 61 | cdef int WW = (W + 2 * padding - field_width) / stride + 1 62 | cdef np.ndarray[DTYPE_t, ndim=4] x_padded = np.zeros((N, C, H + 2 * padding, W + 2 * padding), 63 | dtype=cols.dtype) 64 | 65 | # Moving the inner loop to a C-function with no bounds checking improves 66 | # performance quite a bit for col2im. 
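# Unlike im2col, col2im is a scatter with accumulation: overlapping
# receptive fields map many columns onto the same padded-input pixel, so
# every write in the inner loop is a +=. The pure-numpy equivalent,
# np.add.at, tends to be much slower for this access pattern.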
67 | col2im_cython_inner(cols, x_padded, N, C, H, W, HH, WW, 68 | field_height, field_width, padding, stride) 69 | if padding > 0: 70 | return x_padded[:, :, padding:-padding, padding:-padding] 71 | return x_padded 72 | 73 | 74 | @cython.boundscheck(False) 75 | cdef int col2im_cython_inner(np.ndarray[DTYPE_t, ndim=2] cols, 76 | np.ndarray[DTYPE_t, ndim=4] x_padded, 77 | int N, int C, int H, int W, int HH, int WW, 78 | int field_height, int field_width, int padding, int stride) except? -1: 79 | cdef int c, ii, jj, row, yy, xx, i, col 80 | 81 | for c in range(C): 82 | for ii in range(field_height): 83 | for jj in range(field_width): 84 | row = c * field_width * field_height + ii * field_height + jj 85 | for yy in range(HH): 86 | for xx in range(WW): 87 | for i in range(N): 88 | col = yy * WW * N + xx * N + i 89 | x_padded[i, c, stride * yy + ii, stride * xx + jj] += cols[row, col] 90 | 91 | 92 | @cython.boundscheck(False) 93 | @cython.wraparound(False) 94 | cdef col2im_6d_cython_inner(np.ndarray[DTYPE_t, ndim=6] cols, 95 | np.ndarray[DTYPE_t, ndim=4] x_padded, 96 | int N, int C, int H, int W, int HH, int WW, 97 | int out_h, int out_w, int pad, int stride): 98 | 99 | cdef int c, hh, ww, n, h, w 100 | for n in range(N): 101 | for c in range(C): 102 | for hh in range(HH): 103 | for ww in range(WW): 104 | for h in range(out_h): 105 | for w in range(out_w): 106 | x_padded[n, c, stride * h + hh, stride * w + ww] += cols[c, hh, ww, n, h, w] 107 | 108 | 109 | def col2im_6d_cython(np.ndarray[DTYPE_t, ndim=6] cols, int N, int C, int H, int W, 110 | int HH, int WW, int pad, int stride): 111 | cdef np.ndarray x = np.empty((N, C, H, W), dtype=cols.dtype) 112 | cdef int out_h = (H + 2 * pad - HH) / stride + 1 113 | cdef int out_w = (W + 2 * pad - WW) / stride + 1 114 | cdef np.ndarray[DTYPE_t, ndim=4] x_padded = np.zeros((N, C, H + 2 * pad, W + 2 * pad), 115 | dtype=cols.dtype) 116 | 117 | col2im_6d_cython_inner(cols, x_padded, N, C, H, W, HH, WW, out_h, out_w, pad, stride) 118 | 119 | if pad > 0: 120 | return x_padded[:, :, pad:-pad, pad:-pad] 121 | return x_padded 122 | -------------------------------------------------------------------------------- /assignment3/cs231n/image_utils.py: -------------------------------------------------------------------------------- 1 | import urllib2, os, tempfile 2 | 3 | import numpy as np 4 | from scipy.misc import imread 5 | 6 | from cs231n.fast_layers import conv_forward_fast 7 | 8 | 9 | """ 10 | Utility functions used for viewing and processing images. 11 | """ 12 | 13 | 14 | def blur_image(X): 15 | """ 16 | A very gentle image blurring operation, to be used as a regularizer for image 17 | generation. 
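   | 
   |   The kernel is a fixed 3x3 smoothing filter applied to each color channel 
   |   independently (no cross-channel mixing) with stride 1 and pad 1, so the 
   |   spatial size of X is preserved. A minimal usage sketch (the shapes here 
   |   are illustrative only): 
   | 
   |     X = np.random.randn(2, 3, 32, 32).astype(np.float32) 
   |     X_blur = blur_image(X) 
   |     assert X_blur.shape == X.shape 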
18 | 
19 |   Inputs: 
20 |   - X: Image data of shape (N, 3, H, W) 
21 | 
22 |   Returns: 
23 |   - X_blur: Blurred version of X, of shape (N, 3, H, W) 
24 |   """ 
25 |   w_blur = np.zeros((3, 3, 3, 3)) 
26 |   b_blur = np.zeros(3) 
27 |   blur_param = {'stride': 1, 'pad': 1} 
28 |   for i in xrange(3): 
29 |     w_blur[i, i] = np.asarray([[1, 2, 1], [2, 188, 2], [1, 2, 1]], dtype=np.float32) 
30 |   w_blur /= 200.0 
31 |   return conv_forward_fast(X, w_blur, b_blur, blur_param)[0] 
32 | 
33 | 
34 | def preprocess_image(img, mean_img, mean='image'): 
35 |   """ 
36 |   Convert to float, transpose, and subtract the mean pixel. 
37 | 
38 |   Input: 
39 |   - img: (H, W, 3) 
40 | 
41 |   Returns: 
42 |   - (1, 3, H, W) 
43 |   """ 
44 |   if mean == 'image': 
45 |     mean = mean_img 
46 |   elif mean == 'pixel': 
47 |     mean = mean_img.mean(axis=(1, 2), keepdims=True) 
48 |   elif mean == 'none': 
49 |     mean = 0 
50 |   else: 
51 |     raise ValueError('mean must be image or pixel or none') 
52 |   return img.astype(np.float32).transpose(2, 0, 1)[None] - mean 
53 | 
54 | 
55 | def deprocess_image(img, mean_img, mean='image', renorm=False): 
56 |   """ 
57 |   Add the mean pixel, transpose, and convert to uint8. 
58 | 
59 |   Input: 
60 |   - (1, 3, H, W) or (3, H, W) 
61 | 
62 |   Returns: 
63 |   - (H, W, 3) 
64 |   """ 
65 |   if mean == 'image': 
66 |     mean = mean_img 
67 |   elif mean == 'pixel': 
68 |     mean = mean_img.mean(axis=(1, 2), keepdims=True) 
69 |   elif mean == 'none': 
70 |     mean = 0 
71 |   else: 
72 |     raise ValueError('mean must be image or pixel or none') 
73 |   if img.ndim == 3: 
74 |     img = img[None] 
75 |   img = (img + mean)[0].transpose(1, 2, 0) 
76 |   if renorm: 
77 |     low, high = img.min(), img.max() 
78 |     img = 255.0 * (img - low) / (high - low) 
79 |   return img.astype(np.uint8) 
80 | 
81 | 
82 | def image_from_url(url): 
83 |   """ 
84 |   Read an image from a URL. Returns a numpy array with the pixel data. 
85 |   We write the image to a temporary file then read it back. Kinda gross. 
86 |   """ 
87 |   try: 
88 |     f = urllib2.urlopen(url) 
89 |     _, fname = tempfile.mkstemp() 
90 |     with open(fname, 'wb') as ff: 
91 |       ff.write(f.read()) 
92 |     img = imread(fname) 
93 |     os.remove(fname) 
94 |     return img 
95 |   except urllib2.HTTPError as e: 
96 |     # HTTPError is a subclass of URLError, so it must be caught first 
97 |     print 'HTTP Error: ', e.code, url 
98 |   except urllib2.URLError as e: 
99 |     print 'URL Error: ', e.reason, url 
--------------------------------------------------------------------------------
/assignment3/cs231n/layer_utils.py:
--------------------------------------------------------------------------------
1 | from cs231n.layers import * 
2 | from cs231n.fast_layers import * 
3 | 
4 | 
5 | def affine_relu_forward(x, w, b): 
6 |   """ 
7 |   Convenience layer that performs an affine transform followed by a ReLU 
8 | 
9 |   Inputs: 
10 |   - x: Input to the affine layer 
11 |   - w, b: Weights for the affine layer 
12 | 
13 |   Returns a tuple of: 
14 |   - out: Output from the ReLU 
15 |   - cache: Object to give to the backward pass 
16 |   """ 
17 |   a, fc_cache = affine_forward(x, w, b) 
18 |   out, relu_cache = relu_forward(a) 
19 |   cache = (fc_cache, relu_cache) 
20 |   return out, cache 
21 | 
22 | 
23 | def affine_relu_backward(dout, cache): 
24 |   """ 
25 |   Backward pass for the affine-relu convenience layer 
26 |   """ 
27 |   fc_cache, relu_cache = cache 
28 |   da = relu_backward(dout, relu_cache) 
29 |   dx, dw, db = affine_backward(da, fc_cache) 
30 |   return dx, dw, db 
31 | 
32 | 
33 | def affine_bn_relu_forward(x, w, b, gamma, beta, bn_param): 
34 |   """ 
35 |   Convenience layer that performs an affine transform, batch normalization, 
36 |   and ReLU. 
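   | 
   |   A minimal shape check (the sizes here are illustrative, not taken from 
   |   the assignment): 
   | 
   |     x = np.random.randn(4, 10) 
   |     w, b = np.random.randn(10, 5), np.zeros(5) 
   |     gamma, beta = np.ones(5), np.zeros(5) 
   |     out, cache = affine_bn_relu_forward(x, w, b, gamma, beta, {'mode': 'train'}) 
   |     assert out.shape == (4, 5) 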
37 | 
38 |   Inputs: 
39 |   - x: Array of shape (N, D1); input to the affine layer 
40 |   - w, b: Arrays of shape (D1, D2) and (D2,) giving the weight and bias for 
41 |     the affine transform. 
42 |   - gamma, beta: Arrays of shape (D2,) and (D2,) giving scale and shift 
43 |     parameters for batch normalization. 
44 |   - bn_param: Dictionary of parameters for batch normalization. 
45 | 
46 |   Returns: 
47 |   - out: Output from ReLU, of shape (N, D2) 
48 |   - cache: Object to give to the backward pass. 
49 |   """ 
50 |   a, fc_cache = affine_forward(x, w, b) 
51 |   a_bn, bn_cache = batchnorm_forward(a, gamma, beta, bn_param) 
52 |   out, relu_cache = relu_forward(a_bn) 
53 |   cache = (fc_cache, bn_cache, relu_cache) 
54 |   return out, cache 
55 | 
56 | 
57 | def affine_bn_relu_backward(dout, cache): 
58 |   """ 
59 |   Backward pass for the affine-batchnorm-relu convenience layer. 
60 |   """ 
61 |   fc_cache, bn_cache, relu_cache = cache 
62 |   da_bn = relu_backward(dout, relu_cache) 
63 |   da, dgamma, dbeta = batchnorm_backward(da_bn, bn_cache) 
64 |   dx, dw, db = affine_backward(da, fc_cache) 
65 |   return dx, dw, db, dgamma, dbeta 
66 | 
67 | 
68 | def conv_relu_forward(x, w, b, conv_param): 
69 |   """ 
70 |   A convenience layer that performs a convolution followed by a ReLU. 
71 | 
72 |   Inputs: 
73 |   - x: Input to the convolutional layer 
74 |   - w, b, conv_param: Weights and parameters for the convolutional layer 
75 | 
76 |   Returns a tuple of: 
77 |   - out: Output from the ReLU 
78 |   - cache: Object to give to the backward pass 
79 |   """ 
80 |   a, conv_cache = conv_forward_fast(x, w, b, conv_param) 
81 |   out, relu_cache = relu_forward(a) 
82 |   cache = (conv_cache, relu_cache) 
83 |   return out, cache 
84 | 
85 | 
86 | def conv_relu_backward(dout, cache): 
87 |   """ 
88 |   Backward pass for the conv-relu convenience layer. 
89 |   """ 
90 |   conv_cache, relu_cache = cache 
91 |   da = relu_backward(dout, relu_cache) 
92 |   dx, dw, db = conv_backward_fast(da, conv_cache) 
93 |   return dx, dw, db 
94 | 
95 | 
96 | def conv_bn_relu_forward(x, w, b, gamma, beta, conv_param, bn_param):  # conv, spatial batchnorm, then ReLU 
97 |   a, conv_cache = conv_forward_fast(x, w, b, conv_param) 
98 |   an, bn_cache = spatial_batchnorm_forward(a, gamma, beta, bn_param) 
99 |   out, relu_cache = relu_forward(an) 
100 |   cache = (conv_cache, bn_cache, relu_cache) 
101 |   return out, cache 
102 | 
103 | 
104 | def conv_bn_relu_backward(dout, cache):  # backward pass for the conv-batchnorm-relu layer 
105 |   conv_cache, bn_cache, relu_cache = cache 
106 |   dan = relu_backward(dout, relu_cache) 
107 |   da, dgamma, dbeta = spatial_batchnorm_backward(dan, bn_cache) 
108 |   dx, dw, db = conv_backward_fast(da, conv_cache) 
109 |   return dx, dw, db, dgamma, dbeta 
110 | 
111 | 
112 | def conv_relu_pool_forward(x, w, b, conv_param, pool_param): 
113 |   """ 
114 |   Convenience layer that performs a convolution, a ReLU, and a pool. 
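   | 
   |   For example (illustrative shapes; a 3x3 convolution with stride 1 and 
   |   pad 1 preserves the spatial size, and a 2x2 max pool with stride 2 then 
   |   halves it): 
   | 
   |     x = np.random.randn(2, 3, 16, 16) 
   |     w, b = np.random.randn(8, 3, 3, 3), np.zeros(8) 
   |     conv_param = {'stride': 1, 'pad': 1} 
   |     pool_param = {'pool_height': 2, 'pool_width': 2, 'stride': 2} 
   |     out, cache = conv_relu_pool_forward(x, w, b, conv_param, pool_param) 
   |     assert out.shape == (2, 8, 8, 8) 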
115 | 116 | Inputs: 117 | - x: Input to the convolutional layer 118 | - w, b, conv_param: Weights and parameters for the convolutional layer 119 | - pool_param: Parameters for the pooling layer 120 | 121 | Returns a tuple of: 122 | - out: Output from the pooling layer 123 | - cache: Object to give to the backward pass 124 | """ 125 | a, conv_cache = conv_forward_fast(x, w, b, conv_param) 126 | s, relu_cache = relu_forward(a) 127 | out, pool_cache = max_pool_forward_fast(s, pool_param) 128 | cache = (conv_cache, relu_cache, pool_cache) 129 | return out, cache 130 | 131 | 132 | def conv_relu_pool_backward(dout, cache): 133 | """ 134 | Backward pass for the conv-relu-pool convenience layer 135 | """ 136 | conv_cache, relu_cache, pool_cache = cache 137 | ds = max_pool_backward_fast(dout, pool_cache) 138 | da = relu_backward(ds, relu_cache) 139 | dx, dw, db = conv_backward_fast(da, conv_cache) 140 | return dx, dw, db 141 | 142 | -------------------------------------------------------------------------------- /assignment3/cs231n/layers.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | 3 | 4 | def affine_forward(x, w, b): 5 | """ 6 | Computes the forward pass for an affine (fully-connected) layer. 7 | 8 | The input x has shape (N, d_1, ..., d_k) where x[i] is the ith input. 9 | We multiply this against a weight matrix of shape (D, M) where 10 | D = \prod_i d_i 11 | 12 | Inputs: 13 | x - Input data, of shape (N, d_1, ..., d_k) 14 | w - Weights, of shape (D, M) 15 | b - Biases, of shape (M,) 16 | 17 | Returns a tuple of: 18 | - out: output, of shape (N, M) 19 | - cache: (x, w, b) 20 | """ 21 | out = x.reshape(x.shape[0], -1).dot(w) + b 22 | cache = (x, w, b) 23 | return out, cache 24 | 25 | 26 | def affine_backward(dout, cache): 27 | """ 28 | Computes the backward pass for an affine layer. 29 | 30 | Inputs: 31 | - dout: Upstream derivative, of shape (N, M) 32 | - cache: Tuple of: 33 | - x: Input data, of shape (N, d_1, ... d_k) 34 | - w: Weights, of shape (D, M) 35 | 36 | Returns a tuple of: 37 | - dx: Gradient with respect to x, of shape (N, d1, ..., d_k) 38 | - dw: Gradient with respect to w, of shape (D, M) 39 | - db: Gradient with respect to b, of shape (M,) 40 | """ 41 | x, w, b = cache 42 | dx = dout.dot(w.T).reshape(x.shape) 43 | dw = x.reshape(x.shape[0], -1).T.dot(dout) 44 | db = np.sum(dout, axis=0) 45 | return dx, dw, db 46 | 47 | 48 | def relu_forward(x): 49 | """ 50 | Computes the forward pass for a layer of rectified linear units (ReLUs). 51 | 52 | Input: 53 | - x: Inputs, of any shape 54 | 55 | Returns a tuple of: 56 | - out: Output, of the same shape as x 57 | - cache: x 58 | """ 59 | out = np.maximum(0, x) 60 | cache = x 61 | return out, cache 62 | 63 | 64 | def relu_backward(dout, cache): 65 | """ 66 | Computes the backward pass for a layer of rectified linear units (ReLUs). 67 | 68 | Input: 69 | - dout: Upstream derivatives, of any shape 70 | - cache: Input x, of same shape as dout 71 | 72 | Returns: 73 | - dx: Gradient with respect to x 74 | """ 75 | x = cache 76 | dx = np.where(x > 0, dout, 0) 77 | return dx 78 | 79 | 80 | def batchnorm_forward(x, gamma, beta, bn_param): 81 | """ 82 | Forward pass for batch normalization. 83 | 84 | During training the sample mean and (uncorrected) sample variance are 85 | computed from minibatch statistics and used to normalize the incoming data. 
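   | 
   |   Concretely, for each feature dimension d the train-time transform is 
   | 
   |     out[:, d] = gamma[d] * (x[:, d] - mu[d]) / sqrt(var[d] + eps) + beta[d] 
   | 
   |   where mu and var are the per-feature mean and (uncorrected) variance of 
   |   the current minibatch. 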
86 |   During training we also keep an exponentially decaying running mean of the mean 
87 |   and variance of each feature, and these averages are used to normalize data 
88 |   at test-time. 
89 | 
90 |   At each timestep we update the running averages for mean and variance using 
91 |   an exponential decay based on the momentum parameter: 
92 | 
93 |   running_mean = momentum * running_mean + (1 - momentum) * sample_mean 
94 |   running_var = momentum * running_var + (1 - momentum) * sample_var 
95 | 
96 |   Note that the batch normalization paper suggests a different test-time 
97 |   behavior: they compute sample mean and variance for each feature using a 
98 |   large number of training images rather than using a running average. For 
99 |   this implementation we have chosen to use running averages instead since 
100 |   they do not require an additional estimation step; the torch7 implementation 
101 |   of batch normalization also uses running averages. 
102 | 
103 |   Input: 
104 |   - x: Data of shape (N, D) 
105 |   - gamma: Scale parameter of shape (D,) 
106 |   - beta: Shift parameter of shape (D,) 
107 |   - bn_param: Dictionary with the following keys: 
108 |     - mode: 'train' or 'test'; required 
109 |     - eps: Constant for numeric stability 
110 |     - momentum: Constant for running mean / variance. 
111 |     - running_mean: Array of shape (D,) giving running mean of features 
112 |     - running_var: Array of shape (D,) giving running variance of features 
113 | 
114 |   Returns a tuple of: 
115 |   - out: of shape (N, D) 
116 |   - cache: A tuple of values needed in the backward pass 
117 |   """ 
118 |   mode = bn_param['mode'] 
119 |   eps = bn_param.get('eps', 1e-5) 
120 |   momentum = bn_param.get('momentum', 0.9) 
121 | 
122 |   N, D = x.shape 
123 |   running_mean = bn_param.get('running_mean', np.zeros(D, dtype=x.dtype)) 
124 |   running_var = bn_param.get('running_var', np.zeros(D, dtype=x.dtype)) 
125 | 
126 |   out, cache = None, None 
127 |   if mode == 'train': 
128 |     # Compute output 
129 |     mu = x.mean(axis=0) 
130 |     xc = x - mu 
131 |     var = np.mean(xc ** 2, axis=0) 
132 |     std = np.sqrt(var + eps) 
133 |     xn = xc / std 
134 |     out = gamma * xn + beta 
135 | 
136 |     cache = (mode, x, gamma, xc, std, xn, out) 
137 | 
138 |     # Update running average of mean 
139 |     running_mean *= momentum 
140 |     running_mean += (1 - momentum) * mu 
141 | 
142 |     # Update running average of variance 
143 |     running_var *= momentum 
144 |     running_var += (1 - momentum) * var 
145 |   elif mode == 'test': 
146 |     # Using running mean and variance to normalize 
147 |     std = np.sqrt(running_var + eps) 
148 |     xn = (x - running_mean) / std 
149 |     out = gamma * xn + beta 
150 |     cache = (mode, x, xn, gamma, beta, std) 
151 |   else: 
152 |     raise ValueError('Invalid forward batchnorm mode "%s"' % mode) 
153 | 
154 |   # Store the updated running means back into bn_param 
155 |   bn_param['running_mean'] = running_mean 
156 |   bn_param['running_var'] = running_var 
157 | 
158 |   return out, cache 
159 | 
160 | 
161 | def batchnorm_backward(dout, cache): 
162 |   """ 
163 |   Backward pass for batch normalization. 
164 | 
165 |   For this implementation, you should write out a computation graph for 
166 |   batch normalization on paper and propagate gradients backward through 
167 |   intermediate nodes. 
168 | 
169 |   Inputs: 
170 |   - dout: Upstream derivatives, of shape (N, D) 
171 |   - cache: Variable of intermediates from batchnorm_forward. 
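   | 
   |   (The implementation below works backward through one cached intermediate 
   |   at a time: first xn, then std, var, xc, and finally mu.) 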
172 | 
173 |   Returns a tuple of: 
174 |   - dx: Gradient with respect to inputs x, of shape (N, D) 
175 |   - dgamma: Gradient with respect to scale parameter gamma, of shape (D,) 
176 |   - dbeta: Gradient with respect to shift parameter beta, of shape (D,) 
177 |   """ 
178 |   mode = cache[0] 
179 |   if mode == 'train': 
180 |     mode, x, gamma, xc, std, xn, out = cache 
181 | 
182 |     N = x.shape[0] 
183 |     dbeta = dout.sum(axis=0) 
184 |     dgamma = np.sum(xn * dout, axis=0) 
185 |     dxn = gamma * dout 
186 |     dxc = dxn / std 
187 |     dstd = -np.sum((dxn * xc) / (std * std), axis=0) 
188 |     dvar = 0.5 * dstd / std 
189 |     dxc += (2.0 / N) * xc * dvar 
190 |     dmu = np.sum(dxc, axis=0) 
191 |     dx = dxc - dmu / N 
192 |   elif mode == 'test': 
193 |     mode, x, xn, gamma, beta, std = cache 
194 |     dbeta = dout.sum(axis=0) 
195 |     dgamma = np.sum(xn * dout, axis=0) 
196 |     dxn = gamma * dout 
197 |     dx = dxn / std 
198 |   else: 
199 |     raise ValueError(mode) 
200 | 
201 |   return dx, dgamma, dbeta 
202 | 
203 | 
204 | def spatial_batchnorm_forward(x, gamma, beta, bn_param): 
205 |   """ 
206 |   Computes the forward pass for spatial batch normalization. 
207 | 
208 |   Inputs: 
209 |   - x: Input data of shape (N, C, H, W) 
210 |   - gamma: Scale parameter, of shape (C,) 
211 |   - beta: Shift parameter, of shape (C,) 
212 |   - bn_param: Dictionary with the following keys: 
213 |     - mode: 'train' or 'test'; required 
214 |     - eps: Constant for numeric stability 
215 |     - momentum: Constant for running mean / variance. momentum=0 means that 
216 |       old information is discarded completely at every time step, while 
217 |       momentum=1 means that new information is never incorporated. The 
218 |       default of momentum=0.9 should work well in most situations. 
219 |     - running_mean: Array of shape (C,) giving running mean of features 
220 |     - running_var: Array of shape (C,) giving running variance of features 
221 | 
222 |   Returns a tuple of: 
223 |   - out: Output data, of shape (N, C, H, W) 
224 |   - cache: Values needed for the backward pass 
225 |   """ 
226 |   N, C, H, W = x.shape 
227 |   x_flat = x.transpose(0, 2, 3, 1).reshape(-1, C) 
228 |   out_flat, cache = batchnorm_forward(x_flat, gamma, beta, bn_param) 
229 |   out = out_flat.reshape(N, H, W, C).transpose(0, 3, 1, 2) 
230 |   return out, cache 
231 | 
232 | 
233 | def spatial_batchnorm_backward(dout, cache): 
234 |   """ 
235 |   Computes the backward pass for spatial batch normalization. 
236 | 
237 |   Inputs: 
238 |   - dout: Upstream derivatives, of shape (N, C, H, W) 
239 |   - cache: Values from the forward pass 
240 | 
241 |   Returns a tuple of: 
242 |   - dx: Gradient with respect to inputs, of shape (N, C, H, W) 
243 |   - dgamma: Gradient with respect to scale parameter, of shape (C,) 
244 |   - dbeta: Gradient with respect to shift parameter, of shape (C,) 
245 |   """ 
246 |   N, C, H, W = dout.shape 
247 |   dout_flat = dout.transpose(0, 2, 3, 1).reshape(-1, C) 
248 |   dx_flat, dgamma, dbeta = batchnorm_backward(dout_flat, cache) 
249 |   dx = dx_flat.reshape(N, H, W, C).transpose(0, 3, 1, 2) 
250 |   return dx, dgamma, dbeta 
251 | 
252 | 
253 | def svm_loss(x, y): 
254 |   """ 
255 |   Computes the loss and gradient for multiclass SVM classification. 
256 | 
257 |   Inputs: 
258 |   - x: Input data, of shape (N, C) where x[i, j] is the score for the jth class 
259 |     for the ith input. 
260 | - y: Vector of labels, of shape (N,) where y[i] is the label for x[i] and 261 | 0 <= y[i] < C 262 | 263 | Returns a tuple of: 264 | - loss: Scalar giving the loss 265 | - dx: Gradient of the loss with respect to x 266 | """ 267 | N = x.shape[0] 268 | correct_class_scores = x[np.arange(N), y] 269 | margins = np.maximum(0, x - correct_class_scores[:, np.newaxis] + 1.0) 270 | margins[np.arange(N), y] = 0 271 | loss = np.sum(margins) / N 272 | num_pos = np.sum(margins > 0, axis=1) 273 | dx = np.zeros_like(x) 274 | dx[margins > 0] = 1 275 | dx[np.arange(N), y] -= num_pos 276 | dx /= N 277 | return loss, dx 278 | 279 | 280 | def softmax_loss(x, y): 281 | """ 282 | Computes the loss and gradient for softmax classification. 283 | 284 | Inputs: 285 | - x: Input data, of shape (N, C) where x[i, j] is the score for the jth class 286 | for the ith input. 287 | - y: Vector of labels, of shape (N,) where y[i] is the label for x[i] and 288 | 0 <= y[i] < C 289 | 290 | Returns a tuple of: 291 | - loss: Scalar giving the loss 292 | - dx: Gradient of the loss with respect to x 293 | """ 294 | probs = np.exp(x - np.max(x, axis=1, keepdims=True)) 295 | probs /= np.sum(probs, axis=1, keepdims=True) 296 | N = x.shape[0] 297 | loss = -np.sum(np.log(probs[np.arange(N), y])) / N 298 | dx = probs.copy() 299 | dx[np.arange(N), y] -= 1 300 | dx /= N 301 | return loss, dx 302 | 303 | -------------------------------------------------------------------------------- /assignment3/cs231n/optim.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | 3 | """ 4 | This file implements various first-order update rules that are commonly used for 5 | training neural networks. Each update rule accepts current weights and the 6 | gradient of the loss with respect to those weights and produces the next set of 7 | weights. Each update rule has the same interface: 8 | 9 | def update(w, dw, config=None): 10 | 11 | Inputs: 12 | - w: A numpy array giving the current weights. 13 | - dw: A numpy array of the same shape as w giving the gradient of the 14 | loss with respect to w. 15 | - config: A dictionary containing hyperparameter values such as learning rate, 16 | momentum, etc. If the update rule requires caching values over many 17 | iterations, then config will also hold these cached values. 18 | 19 | Returns: 20 | - next_w: The next point after the update. 21 | - config: The config dictionary to be passed to the next iteration of the 22 | update rule. 23 | 24 | NOTE: For most update rules, the default learning rate will probably not perform 25 | well; however the default values of the other hyperparameters should work well 26 | for a variety of different problems. 27 | 28 | For efficiency, update rules may perform in-place updates, mutating w and 29 | setting next_w equal to w. 30 | """ 31 | 32 | 33 | def sgd(w, dw, config=None): 34 | """ 35 | Performs vanilla stochastic gradient descent. 36 | 37 | config format: 38 | - learning_rate: Scalar learning rate. 39 | """ 40 | if config is None: config = {} 41 | config.setdefault('learning_rate', 1e-2) 42 | 43 | w -= config['learning_rate'] * dw 44 | return w, config 45 | 46 | 47 | def adam(x, dx, config=None): 48 | """ 49 | Uses the Adam update rule, which incorporates moving averages of both the 50 | gradient and its square and a bias correction term. 51 | 52 | config format: 53 | - learning_rate: Scalar learning rate. 54 | - beta1: Decay rate for moving average of first moment of gradient. 
55 | - beta2: Decay rate for moving average of second moment of gradient. 56 | - epsilon: Small scalar used for smoothing to avoid dividing by zero. 57 | - m: Moving average of gradient. 58 | - v: Moving average of squared gradient. 59 | - t: Iteration number. 60 | """ 61 | if config is None: config = {} 62 | config.setdefault('learning_rate', 1e-3) 63 | config.setdefault('beta1', 0.9) 64 | config.setdefault('beta2', 0.999) 65 | config.setdefault('epsilon', 1e-8) 66 | config.setdefault('m', np.zeros_like(x)) 67 | config.setdefault('v', np.zeros_like(x)) 68 | config.setdefault('t', 0) 69 | 70 | next_x = None 71 | beta1, beta2, eps = config['beta1'], config['beta2'], config['epsilon'] 72 | t, m, v = config['t'], config['m'], config['v'] 73 | m = beta1 * m + (1 - beta1) * dx 74 | v = beta2 * v + (1 - beta2) * (dx * dx) 75 | t += 1 76 | alpha = config['learning_rate'] * np.sqrt(1 - beta2 ** t) / (1 - beta1 ** t) 77 | x -= alpha * (m / (np.sqrt(v) + eps)) 78 | config['t'] = t 79 | config['m'] = m 80 | config['v'] = v 81 | next_x = x 82 | 83 | return next_x, config 84 | 85 | 86 | -------------------------------------------------------------------------------- /assignment3/cs231n/rnn_layers.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | 3 | 4 | """ 5 | This file defines layer types that are commonly used for recurrent neural 6 | networks. 7 | """ 8 | 9 | 10 | def rnn_step_forward(x, prev_h, Wx, Wh, b): 11 | """ 12 | Run the forward pass for a single timestep of a vanilla RNN that uses a tanh 13 | activation function. 14 | 15 | The input data has dimension D, the hidden state has dimension H, and we use 16 | a minibatch size of N. 17 | 18 | Inputs: 19 | - x: Input data for this timestep, of shape (N, D). 20 | - prev_h: Hidden state from previous timestep, of shape (N, H) 21 | - Wx: Weight matrix for input-to-hidden connections, of shape (D, H) 22 | - Wh: Weight matrix for hidden-to-hidden connections, of shape (H, H) 23 | - b: Biases of shape (H,) 24 | 25 | Returns a tuple of: 26 | - next_h: Next hidden state, of shape (N, H) 27 | - cache: Tuple of values needed for the backward pass. 28 | """ 29 | next_h, cache = None, None 30 | ############################################################################## 31 | # TODO: Implement a single forward step for the vanilla RNN. Store the next # 32 | # hidden state and any values you need for the backward pass in the next_h # 33 | # and cache variables respectively. # 34 | ############################################################################## 35 | intermediate = np.dot(x, Wx) + np.dot(prev_h, Wh) + b # (N,H) 36 | next_h = np.tanh(intermediate) 37 | cache = (x, prev_h, Wx, Wh, intermediate, next_h) 38 | ############################################################################## 39 | # END OF YOUR CODE # 40 | ############################################################################## 41 | return next_h, cache 42 | 43 | 44 | def rnn_step_backward(dnext_h, cache): 45 | """ 46 | Backward pass for a single timestep of a vanilla RNN. 
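   | 
   |   Since next_h = tanh(a) with a = x.dot(Wx) + prev_h.dot(Wh) + b, the local 
   |   derivative of the nonlinearity can be written in terms of the cached 
   |   output: da = dnext_h * (1 - next_h ** 2). 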
47 | 
48 |   Inputs: 
49 |   - dnext_h: Gradient of loss with respect to next hidden state 
50 |   - cache: Cache object from the forward pass 
51 | 
52 |   Returns a tuple of: 
53 |   - dx: Gradients of input data, of shape (N, D) 
54 |   - dprev_h: Gradients of previous hidden state, of shape (N, H) 
55 |   - dWx: Gradients of input-to-hidden weights, of shape (D, H) 
56 |   - dWh: Gradients of hidden-to-hidden weights, of shape (H, H) 
57 |   - db: Gradients of bias vector, of shape (H,) 
58 |   """ 
59 |   dx, dprev_h, dWx, dWh, db = None, None, None, None, None 
60 |   ############################################################################## 
61 |   # TODO: Implement the backward pass for a single step of a vanilla RNN.      # 
62 |   #                                                                            # 
63 |   # HINT: For the tanh function, you can compute the local derivative in terms # 
64 |   # of the output value from tanh.                                             # 
65 |   ############################################################################## 
66 |   x, prev_h, Wx, Wh, intermediate, next_h = cache 
67 |   dintermediate = dnext_h * (1 - next_h ** 2) 
68 |   db = np.sum(dintermediate, axis=0) 
69 |   dx = np.dot(dintermediate, Wx.T) 
70 |   dWx = np.dot(x.T, dintermediate) 
71 |   dprev_h = np.dot(dintermediate, Wh.T) 
72 |   dWh = np.dot(prev_h.T, dintermediate) 
73 |   ############################################################################## 
74 |   #                             END OF YOUR CODE                               # 
75 |   ############################################################################## 
76 |   return dx, dprev_h, dWx, dWh, db 
77 | 
78 | 
79 | def rnn_forward(x, h0, Wx, Wh, b): 
80 |   """ 
81 |   Run a vanilla RNN forward on an entire sequence of data. We assume an input 
82 |   sequence composed of T vectors, each of dimension D. The RNN uses a hidden 
83 |   size of H, and we work over a minibatch containing N sequences. After running 
84 |   the RNN forward, we return the hidden states for all timesteps. 
85 | 
86 |   Inputs: 
87 |   - x: Input data for the entire timeseries, of shape (N, T, D). 
88 |   - h0: Initial hidden state, of shape (N, H) 
89 |   - Wx: Weight matrix for input-to-hidden connections, of shape (D, H) 
90 |   - Wh: Weight matrix for hidden-to-hidden connections, of shape (H, H) 
91 |   - b: Biases of shape (H,) 
92 | 
93 |   Returns a tuple of: 
94 |   - h: Hidden states for the entire timeseries, of shape (N, T, H). 
95 |   - cache: Values needed in the backward pass 
96 |   """ 
97 |   h, cache = None, None 
98 |   ############################################################################## 
99 |   # TODO: Implement forward pass for a vanilla RNN running on a sequence of    # 
100 |   # input data. You should use the rnn_step_forward function that you defined  # 
101 |   # above.                                                                     # 
102 |   ############################################################################## 
103 |   N, T, D = x.shape 
104 |   H = h0.shape[1] 
105 |   h = np.zeros((N, T, H)) 
106 |   cache = [None] * T 
107 |   for t in range(T): 
108 |     if t == 0: 
109 |       h[:,t,:], cache[t] = rnn_step_forward(x[:,t,:], h0, Wx, Wh, b) 
110 |     else: 
111 |       h[:,t,:], cache[t] = rnn_step_forward(x[:,t,:], h[:,t-1,:], Wx, Wh, b) 
112 |   ############################################################################## 
113 |   #                             END OF YOUR CODE                               # 
114 |   ############################################################################## 
115 |   return h, cache 
116 | 
117 | 
118 | def rnn_backward(dh, cache): 
119 |   """ 
120 |   Compute the backward pass for a vanilla RNN over an entire sequence of data. 
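   | 
   |   Note that at each timestep the hidden state receives gradient from two 
   |   sources, the loss at that timestep (dh[:, t, :]) and the following hidden 
   |   state, so the implementation below sums the two before calling 
   |   rnn_step_backward and accumulates the weight gradients over all timesteps. 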
121 | 
122 |   Inputs: 
123 |   - dh: Upstream gradients of all hidden states, of shape (N, T, H) 
124 | 
125 |   Returns a tuple of: 
126 |   - dx: Gradient of inputs, of shape (N, T, D) 
127 |   - dh0: Gradient of initial hidden state, of shape (N, H) 
128 |   - dWx: Gradient of input-to-hidden weights, of shape (D, H) 
129 |   - dWh: Gradient of hidden-to-hidden weights, of shape (H, H) 
130 |   - db: Gradient of biases, of shape (H,) 
131 |   """ 
132 |   dx, dh0, dWx, dWh, db = None, None, None, None, None 
133 |   ############################################################################## 
134 |   # TODO: Implement the backward pass for a vanilla RNN running an entire      # 
135 |   # sequence of data. You should use the rnn_step_backward function that you   # 
136 |   # defined above.                                                             # 
137 |   ############################################################################## 
138 |   N, T, H = dh.shape 
139 |   D = cache[0][0].shape[1] 
140 |   dx = np.zeros((N, T, D)) 
141 |   dWx = np.zeros((D, H)) 
142 |   dWh = np.zeros((H, H)) 
143 |   db = np.zeros(H) 
144 |   dprev_h_t = np.zeros((N, H)) 
145 | 
146 |   for t in range(T-1, -1, -1): 
147 |     dx_t, dprev_h_t, dWx_t, dWh_t, db_t = rnn_step_backward(dprev_h_t + dh[:,t,:], cache[t]) 
148 |     dx[:,t,:] = dx_t 
149 |     dWx += dWx_t 
150 |     dWh += dWh_t 
151 |     db += db_t 
152 |   dh0 = dprev_h_t 
153 |   ############################################################################## 
154 |   #                             END OF YOUR CODE                               # 
155 |   ############################################################################## 
156 |   return dx, dh0, dWx, dWh, db 
157 | 
158 | 
159 | def word_embedding_forward(x, W): 
160 |   """ 
161 |   Forward pass for word embeddings. We operate on minibatches of size N where 
162 |   each sequence has length T. We assume a vocabulary of V words, assigning each 
163 |   to a vector of dimension D. 
164 | 
165 |   Inputs: 
166 |   - x: Integer array of shape (N, T) giving indices of words. Each element idx 
167 |     of x must be in the range 0 <= idx < V. 
168 |   - W: Weight matrix of shape (V, D) giving word vectors for all words. 
169 | 
170 |   Returns a tuple of: 
171 |   - out: Array of shape (N, T, D) giving word vectors for all input words. 
172 |   - cache: Values needed for the backward pass 
173 |   """ 
174 |   out, cache = None, None 
175 |   ############################################################################## 
176 |   # TODO: Implement the forward pass for word embeddings.                      # 
177 |   #                                                                            # 
178 |   # HINT: This should be very simple.                                          # 
179 |   ############################################################################## 
180 |   N, T = x.shape 
181 |   V, D = W.shape 
182 |   out = np.zeros((N, T, D)) 
183 |   it = np.nditer(x, flags=['multi_index']) 
184 |   while not it.finished: 
185 |     out[it.multi_index] = W[it.value]  # equivalent to the vectorized out = W[x] 
186 |     it.iternext() 
187 |   cache = (x, V, D) 
188 |   ############################################################################## 
189 |   #                             END OF YOUR CODE                               # 
190 |   ############################################################################## 
191 |   return out, cache 
192 | 
193 | 
194 | def word_embedding_backward(dout, cache): 
195 |   """ 
196 |   Backward pass for word embeddings. We cannot back-propagate into the words 
197 |   since they are integers, so we only return gradient for the word embedding 
198 |   matrix. 
199 | 
200 |   HINT: Look up the function np.add.at 
201 | 
202 |   Inputs: 
203 |   - dout: Upstream gradients of shape (N, T, D) 
204 |   - cache: Values from the forward pass 
205 | 
206 |   Returns: 
207 |   - dW: Gradient of word embedding matrix, of shape (V, D). 
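   | 
   |   For example, repeated indices must have their gradients accumulated, which 
   |   is why np.add.at is used rather than plain fancy-indexed assignment (a 
   |   small hypothetical case with V = 2, D = 4): 
   | 
   |     x = np.array([[0, 0, 1]])    # word 0 appears twice 
   |     dout = np.ones((1, 3, 4)) 
   |     # after the backward pass, dW[0] is the sum of two upstream gradients 
   |     # while dW[1] receives just one 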
208 | """ 209 | dW = None 210 | ############################################################################## 211 | # TODO: Implement the backward pass for word embeddings. # 212 | # # 213 | # HINT: Look up the function np.add.at # 214 | ############################################################################## 215 | x,V,D = cache 216 | dW = np.zeros((V,D)) 217 | 218 | np.add.at(dW, x, dout) 219 | ############################################################################## 220 | # END OF YOUR CODE # 221 | ############################################################################## 222 | return dW 223 | 224 | 225 | def sigmoid(x): 226 | """ 227 | A numerically stable version of the logistic sigmoid function. 228 | """ 229 | pos_mask = (x >= 0) 230 | neg_mask = (x < 0) 231 | z = np.zeros_like(x) 232 | z[pos_mask] = np.exp(-x[pos_mask]) 233 | z[neg_mask] = np.exp(x[neg_mask]) 234 | top = np.ones_like(x) 235 | top[neg_mask] = z[neg_mask] 236 | return top / (1 + z) 237 | 238 | 239 | def lstm_step_forward(x, prev_h, prev_c, Wx, Wh, b): 240 | """ 241 | Forward pass for a single timestep of an LSTM. 242 | 243 | The input data has dimension D, the hidden state has dimension H, and we use 244 | a minibatch size of N. 245 | 246 | Inputs: 247 | - x: Input data, of shape (N, D) 248 | - prev_h: Previous hidden state, of shape (N, H) 249 | - prev_c: previous cell state, of shape (N, H) 250 | - Wx: Input-to-hidden weights, of shape (D, 4H) 251 | - Wh: Hidden-to-hidden weights, of shape (H, 4H) 252 | - b: Biases, of shape (4H,) 253 | 254 | Returns a tuple of: 255 | - next_h: Next hidden state, of shape (N, H) 256 | - next_c: Next cell state, of shape (N, H) 257 | - cache: Tuple of values needed for backward pass. 258 | """ 259 | next_h, next_c, cache = None, None, None 260 | ############################################################################# 261 | # TODO: Implement the forward pass for a single timestep of an LSTM. # 262 | # You may want to use the numerically stable sigmoid implementation above. # 263 | ############################################################################# 264 | N, H = prev_h.shape 265 | intermediate = np.dot(x, Wx) + np.dot(prev_h, Wh) + b # (N, 4H) 266 | i = sigmoid(intermediate[:, :H]) 267 | f = sigmoid(intermediate[:, H:2*H]) 268 | o = sigmoid(intermediate[:, 2*H:3*H]) 269 | g = np.tanh(intermediate[:, 3*H:]) 270 | 271 | next_c = f*prev_c + i*g 272 | next_h = o*np.tanh(next_c) 273 | cache = (x, prev_h, prev_c, Wx, Wh, i, f, o, g, next_h, next_c) 274 | ############################################################################## 275 | # END OF YOUR CODE # 276 | ############################################################################## 277 | 278 | return next_h, next_c, cache 279 | 280 | 281 | def lstm_step_backward(dnext_h, dnext_c, cache): 282 | """ 283 | Backward pass for a single timestep of an LSTM. 
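   | 
   |   Note that the cell state receives gradient along two paths: directly 
   |   through dnext_c, and indirectly through dnext_h via 
   |   next_h = o * tanh(next_c). The implementation below therefore adds 
   |   o * (1 - tanh(next_c) ** 2) * dnext_h into dnext_c before backpropagating 
   |   through the gates. 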
284 | 
285 |   Inputs: 
286 |   - dnext_h: Gradients of next hidden state, of shape (N, H) 
287 |   - dnext_c: Gradients of next cell state, of shape (N, H) 
288 |   - cache: Values from the forward pass 
289 | 
290 |   Returns a tuple of: 
291 |   - dx: Gradient of input data, of shape (N, D) 
292 |   - dprev_h: Gradient of previous hidden state, of shape (N, H) 
293 |   - dprev_c: Gradient of previous cell state, of shape (N, H) 
294 |   - dWx: Gradient of input-to-hidden weights, of shape (D, 4H) 
295 |   - dWh: Gradient of hidden-to-hidden weights, of shape (H, 4H) 
296 |   - db: Gradient of biases, of shape (4H,) 
297 |   """ 
298 |   dx, dprev_h, dprev_c, dWx, dWh, db = None, None, None, None, None, None 
299 |   ############################################################################# 
300 |   # TODO: Implement the backward pass for a single timestep of an LSTM.       # 
301 |   #                                                                           # 
302 |   # HINT: For sigmoid and tanh you can compute local derivatives in terms of  # 
303 |   # the output value from the nonlinearity.                                   # 
304 |   ############################################################################# 
305 |   x, prev_h, prev_c, Wx, Wh, i, f, o, g, next_h, next_c = cache 
306 |   N, H = prev_h.shape 
307 |   # Forward pass was: next_c = f*prev_c + i*g and next_h = o*np.tanh(next_c) 
308 |   dnext_c = dnext_c + o*(1-np.tanh(next_c)**2)*dnext_h  # Important! 
309 |   dprev_c = dnext_c*f 
310 | 
311 |   di = dnext_c*g 
312 |   df = dnext_c*prev_c 
313 |   do = dnext_h*np.tanh(next_c) 
314 |   dg = dnext_c*i 
315 | 
316 |   d_intermediate = np.zeros((N, 4*H)) 
317 |   d_intermediate[:, :H] = di*i*(1-i) 
318 |   d_intermediate[:, H:2*H] = df*f*(1-f) 
319 |   d_intermediate[:, 2*H:3*H] = do*o*(1-o) 
320 |   d_intermediate[:, 3*H:] = dg*(1-g*g) 
321 | 
322 |   db = np.sum(d_intermediate, axis=0)  # (N, 4H) -> (4H,) 
323 |   # intermediate = np.dot(x, Wx) + np.dot(prev_h, Wh) + b 
324 |   dx = np.dot(d_intermediate, Wx.T)  # (N, 4H) x (4H, D) 
325 |   dWx = np.dot(x.T, d_intermediate)  # (D, N) x (N, 4H) 
326 |   dprev_h = np.dot(d_intermediate, Wh.T)  # (N, 4H) x (4H, H) 
327 |   dWh = np.dot(prev_h.T, d_intermediate)  # (H, N) x (N, 4H) 
328 |   ############################################################################## 
329 |   #                             END OF YOUR CODE                               # 
330 |   ############################################################################## 
331 | 
332 |   return dx, dprev_h, dprev_c, dWx, dWh, db 
333 | 
334 | 
335 | def lstm_forward(x, h0, Wx, Wh, b): 
336 |   """ 
337 |   Forward pass for an LSTM over an entire sequence of data. We assume an input 
338 |   sequence composed of T vectors, each of dimension D. The LSTM uses a hidden 
339 |   size of H, and we work over a minibatch containing N sequences. After running 
340 |   the LSTM forward, we return the hidden states for all timesteps. 
341 | 
342 |   Note that unlike the hidden state, the initial cell state is not passed as 
343 |   input; it is simply initialized to zero. Also note that the cell state is 
344 |   not returned; it is internal to the LSTM and is not accessed from outside. 
345 | 
346 |   Inputs: 
347 |   - x: Input data of shape (N, T, D) 
348 |   - h0: Initial hidden state of shape (N, H) 
349 |   - Wx: Weights for input-to-hidden connections, of shape (D, 4H) 
350 |   - Wh: Weights for hidden-to-hidden connections, of shape (H, 4H) 
351 |   - b: Biases of shape (4H,) 
352 | 
353 |   Returns a tuple of: 
354 |   - h: Hidden states for all timesteps of all sequences, of shape (N, T, H) 
355 |   - cache: Values needed for the backward pass. 
356 |   """ 
357 |   h, cache = None, None 
358 |   ############################################################################# 
359 |   # TODO: Implement the forward pass for an LSTM over an entire timeseries.   # 
360 |   # You should use the lstm_step_forward function that you just defined.      # 
361 |   ############################################################################# 
362 |   N, T, D = x.shape 
363 |   H = h0.shape[1] 
364 |   h = np.zeros((N, T, H)) 
365 |   c = np.zeros((N, T, H)) 
366 |   cache = [None] * T 
367 |   c0 = np.zeros((N, H)) 
368 | 
369 |   for t in range(T): 
370 |     if t == 0: 
371 |       h[:,t,:], c[:,t,:], cache[t] = lstm_step_forward(x[:,t,:], h0, c0, Wx, Wh, b) 
372 |     else: 
373 |       h[:,t,:], c[:,t,:], cache[t] = lstm_step_forward(x[:,t,:], h[:,t-1,:], c[:,t-1,:], Wx, Wh, b) 
374 | 
375 |   ############################################################################## 
376 |   #                             END OF YOUR CODE                               # 
377 |   ############################################################################## 
378 | 
379 |   return h, cache 
380 | 
381 | 
382 | def lstm_backward(dh, cache): 
383 |   """ 
384 |   Backward pass for an LSTM over an entire sequence of data. 
385 | 
386 |   Inputs: 
387 |   - dh: Upstream gradients of hidden states, of shape (N, T, H) 
388 |   - cache: Values from the forward pass 
389 | 
390 |   Returns a tuple of: 
391 |   - dx: Gradient of input data of shape (N, T, D) 
392 |   - dh0: Gradient of initial hidden state of shape (N, H) 
393 |   - dWx: Gradient of input-to-hidden weight matrix of shape (D, 4H) 
394 |   - dWh: Gradient of hidden-to-hidden weight matrix of shape (H, 4H) 
395 |   - db: Gradient of biases, of shape (4H,) 
396 |   """ 
397 |   dx, dh0, dWx, dWh, db = None, None, None, None, None 
398 |   ############################################################################# 
399 |   # TODO: Implement the backward pass for an LSTM over an entire timeseries.  # 
400 |   # You should use the lstm_step_backward function that you just defined.     # 
401 |   ############################################################################# 
402 |   N, T, H = dh.shape 
403 |   D = cache[0][0].shape[1] 
404 |   dx = np.zeros((N, T, D)) 
405 |   dWx = np.zeros((D, 4*H)) 
406 |   dWh = np.zeros((H, 4*H)) 
407 |   db = np.zeros(4*H) 
408 |   dprev_h_t = np.zeros((N, H)) 
409 |   dprev_c_t = np.zeros((N, H)) 
410 | 
411 |   for t in range(T-1, -1, -1): 
412 |     dx_t, dprev_h_t, dprev_c_t, dWx_t, dWh_t, db_t = lstm_step_backward(dprev_h_t + dh[:,t,:], dprev_c_t, cache[t]) 
413 |     dx[:,t,:] = dx_t 
414 |     dWx += dWx_t 
415 |     dWh += dWh_t 
416 |     db += db_t 
417 |   dh0 = dprev_h_t 
418 |   ############################################################################## 
419 |   #                             END OF YOUR CODE                               # 
420 |   ############################################################################## 
421 | 
422 |   return dx, dh0, dWx, dWh, db 
423 | 
424 | 
425 | def temporal_affine_forward(x, w, b): 
426 |   """ 
427 |   Forward pass for a temporal affine layer. The input is a set of D-dimensional 
428 |   vectors arranged into a minibatch of N timeseries, each of length T. We use 
429 |   an affine function to transform each of those vectors into a new vector of 
430 |   dimension M. 
431 | 
432 |   Inputs: 
433 |   - x: Input data of shape (N, T, D) 
434 |   - w: Weights of shape (D, M) 
435 |   - b: Biases of shape (M,) 
436 | 
437 |   Returns a tuple of: 
438 |   - out: Output data of shape (N, T, M) 
439 |   - cache: Values needed for the backward pass 
440 |   """ 
441 |   N, T, D = x.shape 
442 |   M = b.shape[0] 
443 |   out = x.reshape(N * T, D).dot(w).reshape(N, T, M) + b 
444 |   cache = x, w, b, out 
445 |   return out, cache 
446 | 
447 | 
448 | def temporal_affine_backward(dout, cache): 
449 |   """ 
450 |   Backward pass for temporal affine layer. 
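   | 
   |   Since the forward pass flattened x to (N * T, D) before the matrix 
   |   multiply, the backward pass reshapes dout to (N * T, M), applies the usual 
   |   affine gradients, and reshapes dx back to (N, T, D). 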
451 | 452 | Input: 453 | - dout: Upstream gradients of shape (N, T, M) 454 | - cache: Values from forward pass 455 | 456 | Returns a tuple of: 457 | - dx: Gradient of input, of shape (N, T, D) 458 | - dw: Gradient of weights, of shape (D, M) 459 | - db: Gradient of biases, of shape (M,) 460 | """ 461 | x, w, b, out = cache 462 | N, T, D = x.shape 463 | M = b.shape[0] 464 | 465 | dx = dout.reshape(N * T, M).dot(w.T).reshape(N, T, D) 466 | dw = dout.reshape(N * T, M).T.dot(x.reshape(N * T, D)).T 467 | db = dout.sum(axis=(0, 1)) 468 | 469 | return dx, dw, db 470 | 471 | 472 | def temporal_softmax_loss(x, y, mask, verbose=False): 473 | """ 474 | A temporal version of softmax loss for use in RNNs. We assume that we are 475 | making predictions over a vocabulary of size V for each timestep of a 476 | timeseries of length T, over a minibatch of size N. The input x gives scores 477 | for all vocabulary elements at all timesteps, and y gives the indices of the 478 | ground-truth element at each timestep. We use a cross-entropy loss at each 479 | timestep, summing the loss over all timesteps and averaging across the 480 | minibatch. 481 | 482 | As an additional complication, we may want to ignore the model output at some 483 | timesteps, since sequences of different length may have been combined into a 484 | minibatch and padded with NULL tokens. The optional mask argument tells us 485 | which elements should contribute to the loss. 486 | 487 | Inputs: 488 | - x: Input scores, of shape (N, T, V) 489 | - y: Ground-truth indices, of shape (N, T) where each element is in the range 490 | 0 <= y[i, t] < V 491 | - mask: Boolean array of shape (N, T) where mask[i, t] tells whether or not 492 | the scores at x[i, t] should contribute to the loss. 493 | 494 | Returns a tuple of: 495 | - loss: Scalar giving loss 496 | - dx: Gradient of loss with respect to scores x. 
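   | 
   |   A minimal usage sketch (random scores; the mask switches individual 
   |   timesteps in or out of the loss): 
   | 
   |     N, T, V = 2, 3, 5 
   |     x = np.random.randn(N, T, V) 
   |     y = np.random.randint(V, size=(N, T)) 
   |     mask = np.ones((N, T), dtype=bool) 
   |     loss, dx = temporal_softmax_loss(x, y, mask) 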
497 | """ 498 | 499 | N, T, V = x.shape 500 | 501 | x_flat = x.reshape(N * T, V) 502 | y_flat = y.reshape(N * T) 503 | mask_flat = mask.reshape(N * T) 504 | 505 | probs = np.exp(x_flat - np.max(x_flat, axis=1, keepdims=True)) 506 | probs /= np.sum(probs, axis=1, keepdims=True) 507 | loss = -np.sum(mask_flat * np.log(probs[np.arange(N * T), y_flat])) / N 508 | dx_flat = probs.copy() 509 | dx_flat[np.arange(N * T), y_flat] -= 1 510 | dx_flat /= N 511 | dx_flat *= mask_flat[:, np.newaxis] 512 | 513 | 514 | if verbose: print 'dx_flat: ', dx_flat.shape 515 | 516 | dx = dx_flat.reshape(N, T, V) 517 | 518 | return loss, dx 519 | 520 | -------------------------------------------------------------------------------- /assignment3/cs231n/setup.py: -------------------------------------------------------------------------------- 1 | from distutils.core import setup 2 | from distutils.extension import Extension 3 | from Cython.Build import cythonize 4 | import numpy 5 | 6 | extensions = [ 7 | Extension('im2col_cython', ['im2col_cython.pyx'], 8 | include_dirs = [numpy.get_include()] 9 | ), 10 | ] 11 | 12 | setup( 13 | ext_modules = cythonize(extensions), 14 | ) 15 | -------------------------------------------------------------------------------- /assignment3/frameworkpython: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | # what real Python executable to use 4 | PYVER=2.7 5 | PATHTOPYTHON=/usr/local/bin/ 6 | PYTHON=${PATHTOPYTHON}python${PYVER} 7 | 8 | # find the root of the virtualenv, it should be the parent of the dir this script is in 9 | ENV=`$PYTHON -c "import os; print os.path.abspath(os.path.join(os.path.dirname(\"$0\"), '..'))"` 10 | 11 | # now run Python with the virtualenv set as Python's HOME 12 | export PYTHONHOME=$ENV 13 | exec $PYTHON "$@" 14 | -------------------------------------------------------------------------------- /assignment3/kitten.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ShibiHe/Stanford-CS231n-assignments/4590c50075a2e4aa291777b4db45d04b877dcdf6/assignment3/kitten.jpg -------------------------------------------------------------------------------- /assignment3/requirements.txt: -------------------------------------------------------------------------------- 1 | Cython==0.23.4 2 | Jinja2==2.8 3 | MarkupSafe==0.23 4 | Pillow==3.0.0 5 | Pygments==2.0.2 6 | appnope==0.1.0 7 | argparse==1.2.1 8 | backports-abc==0.4 9 | backports.ssl-match-hostname==3.5.0.1 10 | certifi==2015.11.20.1 11 | cycler==0.9.0 12 | decorator==4.0.6 13 | functools32==3.2.3-2 14 | gnureadline==6.3.3 15 | ipykernel==4.2.2 16 | ipython==4.0.1 17 | ipython-genutils==0.1.0 18 | ipywidgets==4.1.1 19 | jsonschema==2.5.1 20 | jupyter==1.0.0 21 | jupyter-client==4.1.1 22 | jupyter-console==4.0.3 23 | jupyter-core==4.0.6 24 | matplotlib==1.5.0 25 | mistune==0.7.1 26 | nbconvert==4.1.0 27 | nbformat==4.0.1 28 | notebook==4.0.6 29 | numpy==1.10.4 30 | path.py==8.1.2 31 | pexpect==4.0.1 32 | pickleshare==0.5 33 | ptyprocess==0.5 34 | pyparsing==2.0.7 35 | python-dateutil==2.4.2 36 | pytz==2015.7 37 | pyzmq==15.1.0 38 | qtconsole==4.1.1 39 | scipy==0.16.1 40 | simplegeneric==0.8.1 41 | singledispatch==3.4.0.3 42 | six==1.10.0 43 | terminado==0.5 44 | tornado==4.3 45 | traitlets==4.0.0 46 | wsgiref==0.1.2 47 | -------------------------------------------------------------------------------- /assignment3/sky.jpg: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/ShibiHe/Stanford-CS231n-assignments/4590c50075a2e4aa291777b4db45d04b877dcdf6/assignment3/sky.jpg -------------------------------------------------------------------------------- /assignment3/start_ipython_osx.sh: -------------------------------------------------------------------------------- 1 | # Assume the virtualenv is called .env 2 | 3 | cp frameworkpython .env/bin 4 | .env/bin/frameworkpython -m IPython notebook 5 | --------------------------------------------------------------------------------