# New in PyTorch v0.4.0

- `Tensor` and `Variable` have merged
- `volatile` has been deprecated

More information here: https://github.com/pytorch/pytorch/releases

# pytorch-cheatsheet

A cheatsheet to cover the most commonly used aspects of PyTorch!

PyTorch documentation:
http://pytorch.org/docs/master/

PyTorch forums:
https://discuss.pytorch.org/

## Tensor Types and Operations

##### Commonly used types of tensors
- `torch.IntTensor`
- `torch.FloatTensor`
- `torch.DoubleTensor`
- `torch.LongTensor`
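
Since v0.4.0 each of these legacy types corresponds to a `dtype` (`torch.float32` is the default floating-point type). A small sketch that creates one CPU tensor of each type, checks the legacy name with `.type()`, and casts between types:

```python
import torch

# Legacy tensor type name -> dtype that produces it (CPU tensors)
types = {
    "torch.IntTensor": torch.int32,
    "torch.FloatTensor": torch.float32,
    "torch.DoubleTensor": torch.float64,
    "torch.LongTensor": torch.int64,
}

for type_name, dtype in types.items():
    t = torch.zeros(2, dtype=dtype)
    print(type_name, t.type(), t.dtype)

# Casting between types returns a new tensor of the requested type
x = torch.zeros(2, dtype=torch.int32)
y = x.float()   # torch.FloatTensor
z = x.long()    # torch.LongTensor
```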

##### Common PyTorch functions
```
import torch

x = torch.rand(3, 4) + 0.1  # positive values so that log(x) is finite
y = torch.rand(3, 4) + 0.1
torch.sigmoid(x)
torch.log(x)
torch.sum(x, dim=1)   # sum over dimension 1 -> shape (3,)
torch.div(x, y)       # element-wise x / y
```
and many others: `neg(), reciprocal(), pow(), sin(), tanh(), sqrt(), sign()`

##### Convert numpy array to PyTorch tensor
`b = torch.from_numpy(a)`

The above keeps the original dtype of the data (e.g. float64 becomes `torch.DoubleTensor`). To override this (e.g. to cast it to a `torch.FloatTensor`), you can do the following:

`b = torch.FloatTensor(a)`
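
Besides keeping the dtype, `torch.from_numpy()` shares memory with the source array, so changes to one show up in the other. A short sketch with a float64 array (the array contents are arbitrary); `.float()` changes the dtype here, so it returns a separate float32 copy:

```python
import numpy as np
import torch

a = np.arange(6, dtype=np.float64)

b = torch.from_numpy(a)          # no copy: b and a share the same memory
print(b.dtype)                   # torch.float64

a[0] = 100.0                     # modifying the array is visible in the tensor
print(b[0].item())               # 100.0

d = torch.from_numpy(a).float()  # float64 -> float32 creates a copy
a[1] = -1.0
print(d[1].item())               # 1.0 (the copy is unaffected)
```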

##### Moving data from CPU to GPU
`data = data.cuda()`

##### PyTorch Variable
A `torch.autograd.Variable` wraps a tensor, which is accessible through its `data` attribute:
`data_variable.data`

##### Moving data from GPU to CPU
`data = data.cpu()`
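
Since v0.4.0 there is also the device-agnostic `.to(device)` method. A small sketch that uses the GPU when one is available and otherwise stays on the CPU (`device` is just a local variable name):

```python
import torch

# Pick the GPU if there is one, otherwise use the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

data = torch.randn(4, 3)
data = data.to(device)      # same effect as data.cuda() when device is "cuda"
print(data.device)

data = data.cpu()           # back to host memory
print(data.device)          # cpu
```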

##### Convert tensor to numpy array
`data_arr = data_tensor.numpy()` (CPU tensors only)

##### Moving a Variable to CPU and converting to numpy array
`data = data.data.cpu().numpy()`
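
In recent versions `.numpy()` raises an error for tensors that require gradients, so the usual pattern uses `.detach()` in place of `.data`. A minimal sketch:

```python
import torch

weights = torch.ones(3, requires_grad=True)
out = weights * 2                      # part of the autograd graph

# out.numpy() would raise a RuntimeError here because out requires grad
arr = out.detach().cpu().numpy()       # drop the graph, move to host, convert
print(arr)                             # [2. 2. 2.]
```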

##### Viewing the size of a tensor
`data.size()`

##### Add a dimension to a torch tensor
`<tensor>.unsqueeze(dim)`
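
For example, `unsqueeze` inserts a dimension of size 1 at the given position, which is handy for adding a batch dimension; `squeeze` removes it again:

```python
import torch

x = torch.zeros(3, 4)
print(x.size())                   # torch.Size([3, 4])

batched = x.unsqueeze(0)          # new leading dimension
print(batched.size())             # torch.Size([1, 3, 4])

column = x.unsqueeze(-1)          # new trailing dimension
print(column.size())              # torch.Size([3, 4, 1])

print(batched.squeeze(0).size())  # torch.Size([3, 4])
```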

##### Reshaping tensors
The equivalent of numpy's `reshape()` in torch is `view()`, which returns a tensor with the new shape that shares its data with the original tensor.

Examples:

```
import torch

a = torch.arange(1, 17)   # 1-D tensor with 16 elements: 1, 2, ..., 16
a = a.view(4, 4)          # reshapes from 16 elements to 4 x 4
```

`<tensor>.view(-1)` flattens a tensor into one dimension.
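
In `view()`, one dimension can be given as `-1` and is inferred from the number of elements. Newer versions also provide `reshape()`, which behaves like NumPy's and returns a view when possible and a copy otherwise. A short sketch:

```python
import torch

a = torch.arange(1, 17).view(4, 4)

flat = a.view(-1)        # 16 elements -> shape (16,)
rows = a.view(2, -1)     # -1 is inferred as 8 -> shape (2, 8)
print(flat.shape, rows.shape)  # torch.Size([16]) torch.Size([2, 8])

# view() shares storage with the original tensor
flat[0] = 100
print(a[0, 0].item())    # 100

r = a.reshape(8, 2)
print(r.shape)           # torch.Size([8, 2])
```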

##### Operations between PyTorch Variables and NumPy arrays
NumPy arrays first have to be converted to a `torch.Tensor` and then wrapped in a `torch.autograd.Variable`.
An example is shown below.

```
import numpy as np
import torch
import torch.nn.functional as F
from torch.autograd import Variable
matrix_numpy = np.random.rand(3, 3)
matrix_pytorch_variable_cuda = Variable(torch.rand(3, 3)).cuda()  # stand-in for e.g. a model output
matrix_tensor = torch.Tensor(matrix_numpy)  # float32 copy of the NumPy data
# Use cuda() if everything will be calculated on GPU
matrix_pytorch_variable_cuda_from_numpy = Variable(matrix_tensor, requires_grad=False).cuda()
loss = F.mse_loss(matrix_pytorch_variable_cuda, matrix_pytorch_variable_cuda_from_numpy)
```
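
Since `Tensor` and `Variable` merged in v0.4.0, the wrapping step is no longer needed. A device-agnostic sketch of the same computation; `prediction` is a hypothetical stand-in for a model output:

```python
import numpy as np
import torch
import torch.nn.functional as F

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

matrix_numpy = np.random.rand(3, 3)
# Convert to float32 to match the default dtype of model outputs, then move to the device
target = torch.from_numpy(matrix_numpy).float().to(device)

# Hypothetical stand-in for a model output that requires gradients
prediction = torch.rand(3, 3, device=device, requires_grad=True)

loss = F.mse_loss(prediction, target)
loss.backward()                 # gradients flow into `prediction` only
print(loss.item(), prediction.grad.shape)
```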

##### Batch matrix operations
```
import torch
batch1 = torch.randn(10, 3, 4)   # (b, n, m)
batch2 = torch.randn(10, 4, 5)   # (b, m, p)
# Batch matrix multiply: (b, n, m) x (b, m, p) -> (b, n, p)
output = torch.bmm(batch1, batch2)
```
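
As a sanity check, `torch.bmm` gives the same result as multiplying each pair of matrices in the batch separately, and as the `@` operator (`torch.matmul`), which also batches over leading dimensions. A small sketch comparing the three:

```python
import torch

batch1 = torch.randn(10, 3, 4)
batch2 = torch.randn(10, 4, 5)

out_bmm = torch.bmm(batch1, batch2)                                        # (10, 3, 5)
out_loop = torch.stack([torch.mm(m1, m2) for m1, m2 in zip(batch1, batch2)])
out_matmul = batch1 @ batch2                                               # torch.matmul

print(out_bmm.shape)  # torch.Size([10, 3, 5])
print(torch.allclose(out_bmm, out_loop, atol=1e-5), torch.allclose(out_bmm, out_matmul, atol=1e-5))
```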

##### Transpose a tensor
Swap dimensions `dim0` and `dim1`:
`<tensor>.transpose(dim0, dim1)`
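
`transpose()` returns a view that shares storage with the original and is usually not contiguous, so calling `view()` on the result fails; call `.contiguous()` first (or use `reshape()`). A short sketch:

```python
import torch

x = torch.arange(6).view(2, 3)   # [[0, 1, 2], [3, 4, 5]]
xt = x.transpose(0, 1)           # shape (3, 2); same as x.t() for 2-D tensors
print(xt.shape)                  # torch.Size([3, 2])
print(xt.is_contiguous())        # False

flat = xt.contiguous().view(-1)  # copy into contiguous memory, then flatten
print(flat.tolist())             # [0, 3, 1, 4, 2, 5]
```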
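
##### Outer product
Batched outer product between two batches of vectors, `vec1` of shape `(B, N)` and `vec2` of shape `(B, M)`, flattened to `(B, N * M)`:
```
import torch
vec1 = torch.randn(8, 3)   # (B, N)
vec2 = torch.randn(8, 5)   # (B, M)
output = vec1.unsqueeze(2) * vec2.unsqueeze(1)   # broadcast (B, N, 1) * (B, 1, M) -> (B, N, M)
output = output.view(output.size(0), -1)         # flatten each outer product -> (B, N * M)
```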
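
The same result can be written with `torch.einsum`, and for a single pair of vectors recent versions offer `torch.outer`. A sketch checking that the three forms agree (sizes are arbitrary):

```python
import torch

vec1 = torch.randn(8, 3)
vec2 = torch.randn(8, 5)

broadcast = (vec1.unsqueeze(2) * vec2.unsqueeze(1)).view(8, -1)   # (8, 15)
einsum = torch.einsum("bi,bj->bij", vec1, vec2).reshape(8, -1)     # (8, 15)

# For a single pair of vectors, torch.outer gives the (N, M) matrix directly
single = torch.outer(vec1[0], vec2[0])                            # (3, 5)

print(torch.allclose(broadcast, einsum), torch.allclose(single.reshape(-1), broadcast[0]))
```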

## Running on multiple GPUs

##### Multiple GPUs/CPUs for training

Instantiate the model first and then wrap it in `DataParallel`:

`model = Net()`

`model = torch.nn.DataParallel(model)`

To choose which GPU devices are used (and therefore how many), pass `device_ids`:
`model = torch.nn.DataParallel(model, device_ids=[0,1,2,3]).cuda()`
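
A self-contained sketch of the pattern. `Net` is a hypothetical toy model, and the wrapper is only applied when more than one GPU is visible, so the same script also runs on a single GPU or the CPU. `DataParallel` splits each input batch along dimension 0 across the GPUs and gathers the outputs on the first device:

```python
import torch
import torch.nn as nn


class Net(nn.Module):
    """Hypothetical toy model."""

    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(8, 2)

    def forward(self, x):
        return self.fc(x)


device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = Net()                                   # instantiate first
if torch.cuda.device_count() > 1:
    # Replicate the model on every visible GPU; each replica gets a slice of the batch
    model = nn.DataParallel(model, device_ids=list(range(torch.cuda.device_count())))
model = model.to(device)                        # parameters live on the first GPU (or CPU)

inputs = torch.randn(16, 8, device=device)
out = model(inputs)                             # outputs are gathered on the first device
print(out.shape)                                # torch.Size([16, 2])
```

When saving a wrapped model, `model.module.state_dict()` gives keys without the `module.` prefix that `DataParallel` adds.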

Since v0.2.0, PyTorch supports distributed data parallelism, i.e. training over multiple nodes (CPUs and GPUs).

##### Setting the GPUs

Using `torch.cuda.set_device(gpu_idx)` is discouraged in favor of the `torch.cuda.device` context manager. In most cases it's better to use the `CUDA_VISIBLE_DEVICES` environment variable.
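
A sketch of both options. `CUDA_VISIBLE_DEVICES` must be set before CUDA is initialized (safest is before importing `torch`, or on the command line, e.g. `CUDA_VISIBLE_DEVICES=0,1 python train.py`), and the visible GPUs are then renumbered starting from 0. The GPU indices below are only examples:

```python
import os

# Expose only physical GPUs 0 and 1 to this process (example indices)
os.environ["CUDA_VISIBLE_DEVICES"] = "0,1"

import torch  # imported after setting the variable so CUDA sees it

if torch.cuda.is_available():
    # Temporarily make visible GPU 0 the current device
    with torch.cuda.device(0):
        x = torch.zeros(3, device="cuda")  # "cuda" means the current device
    print(x.device)                         # cuda:0
else:
    print("No GPU visible; running on CPU")
```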

## Datasets and Data Loaders

##### Creating a dataset and enumerating over it
Inherit from `torch.utils.data.Dataset` and override `__getitem__()` and `__len__()`. `__getitem__()` returns a single sample; the `DataLoader` collates samples into batches.

Example:

```
import torch

class FooDataset(torch.utils.data.Dataset):
  def __init__(self, root_dir):
    ...  # e.g. index the samples stored under root_dir
  def __getitem__(self, idx):
    ...  # load the sample at position idx
    return {'data': sample_data, 'label': sample_label}
  def __len__(self):
    ...
    return <length of dataset>
```

To loop over a dataset's batches:
```
from torch.autograd import Variable  # only needed before v0.4.0

foo_dataset = FooDataset(root_dir)
data_loader = torch.utils.data.DataLoader(foo_dataset, batch_size=<batch_size>, shuffle=True)

for batch_idx, batch in enumerate(data_loader):
  # the dicts returned by __getitem__ are collated into a dict of batched tensors
  data = batch['data']
  label = batch['label']

  if args.cuda:  # e.g. a flag parsed with argparse
    data, label = data.cuda(), label.cuda()

  data = Variable(data)
  label = Variable(label)
```
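
A runnable sketch with a hypothetical in-memory dataset (random features and integer labels stand in for data loaded from `root_dir`), showing how the `DataLoader` batches the dictionaries returned by `__getitem__`:

```python
import torch
from torch.utils.data import DataLoader, Dataset


class RandomDataset(Dataset):
    """Hypothetical dataset: num_samples random feature vectors with integer labels."""

    def __init__(self, num_samples, num_features):
        self.data = torch.randn(num_samples, num_features)
        self.labels = torch.randint(0, 2, (num_samples,))

    def __getitem__(self, idx):
        # Return a single sample; the DataLoader stacks samples into a batch
        return {'data': self.data[idx], 'label': self.labels[idx]}

    def __len__(self):
        return len(self.data)


dataset = RandomDataset(num_samples=10, num_features=4)
data_loader = DataLoader(dataset, batch_size=4, shuffle=True)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
batch_sizes = []
for batch_idx, batch in enumerate(data_loader):
    data = batch['data'].to(device)     # shape (batch_size, 4)
    label = batch['label'].to(device)   # shape (batch_size,)
    batch_sizes.append(data.size(0))

print(batch_sizes)  # [4, 4, 2]: the last batch holds the remaining samples
```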

## Setting Torch Random Seed
Use `torch.manual_seed(seed)` in addition to `np.random.seed(seed)` to make training repeatable. I believe in the future torch will use the numpy seed so they won't be separate anymore.
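
A small helper that seeds every generator a typical training script touches. The name `set_seed` is hypothetical; `torch.cuda.manual_seed_all` seeds all GPUs (and is silently ignored without CUDA), and on GPU some cuDNN kernels stay nondeterministic unless `torch.backends.cudnn.deterministic = True` is set, at some speed cost:

```python
import random

import numpy as np
import torch


def set_seed(seed):
    """Hypothetical helper: seed Python, NumPy and PyTorch (CPU and all GPUs)."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)         # safe to call even without a GPU
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False   # benchmark mode may pick different kernels per run


set_seed(42)
first = (torch.rand(3), np.random.rand(3))
set_seed(42)
second = (torch.rand(3), np.random.rand(3))
print(torch.equal(first[0], second[0]), np.array_equal(first[1], second[1]))  # True True
```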

## Convolutional Layers

##### Conv2d layer
`torch.nn.Conv2d(in_channels, out_channels, (kernel_h, kernel_w), stride=(s_h, s_w), padding=(p_h, p_w), bias=False, dilation=<d>)`

##### Transpose Conv2d layer ('Deconvolution layer')
`torch.nn.ConvTranspose2d(in_channels, out_channels, (kernel_h, kernel_w), stride=(s_h, s_w), padding=(p_h, p_w), output_padding=(op_h, op_w), bias=False, dilation=<d>)`
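
Tuple arguments are given as (height, width). The output size along each spatial dimension follows the shape formulas in the PyTorch docs; a sketch that computes them for a 32 x 32 input (channel counts and sizes are arbitrary examples) and checks them against the layers:

```python
import torch
import torch.nn as nn


def conv2d_out(size, k, s=1, p=0, d=1):
    # floor((size + 2p - d(k - 1) - 1) / s) + 1
    return (size + 2 * p - d * (k - 1) - 1) // s + 1


def conv_transpose2d_out(size, k, s=1, p=0, op=0, d=1):
    # (size - 1)s - 2p + d(k - 1) + op + 1
    return (size - 1) * s - 2 * p + d * (k - 1) + op + 1


x = torch.randn(1, 3, 32, 32)  # (batch, channels, height, width)

conv = nn.Conv2d(3, 16, (3, 3), stride=(2, 2), padding=(1, 1), bias=False)
deconv = nn.ConvTranspose2d(16, 3, (3, 3), stride=(2, 2), padding=(1, 1),
                            output_padding=(1, 1), bias=False)

y = conv(x)
z = deconv(y)
print(y.shape, conv2d_out(32, k=3, s=2, p=1))                  # torch.Size([1, 16, 16, 16]) 16
print(z.shape, conv_transpose2d_out(16, k=3, s=2, p=1, op=1))  # torch.Size([1, 3, 32, 32]) 32
```

Here `output_padding=(1, 1)` resolves the ambiguity of stride 2, so the transposed convolution restores the original 32 x 32 size.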

## Activations

`nn.ReLU(inplace=True)`

From the PyTorch forums:

> `inplace=True` means that it will modify the input directly, without allocating any additional output. It can sometimes slightly decrease the memory usage, but may not always be a valid operation (because the original input is destroyed). However, if you don’t see an error, it means that your use case is valid.
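
A small illustration of "modify the input directly": after an in-place ReLU the original tensor itself holds the result, and no new memory is allocated:

```python
import torch
import torch.nn as nn

x = torch.tensor([-1.0, 0.5, 2.0])
relu = nn.ReLU(inplace=True)

y = relu(x)
print(x.tolist())                    # [0.0, 0.5, 2.0]: x was overwritten
print(y.data_ptr() == x.data_ptr())  # True: the output is the input's memory
```

One case where it is not valid: applying it to a leaf tensor that requires gradients raises an error.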

## Model Inference

##### Volatile at inference time
Don't forget to set the input to the graph/net to `volatile=True`. `model.eval()` alone is not enough: it only switches batchnorm and dropout to inference/test mode, instead of the training mode that is the default when the model is instantiated. If the input is not volatile, the graph for the backward pass is still built even after `model.eval()` was called, and the intermediate results kept for computing gradients take up extra memory. Marking the input as volatile is sufficient, because volatility propagates: everything computed from a volatile Variable is volatile too, even though the model parameters are not.

Important: `volatile` has been deprecated as of v0.4.0. It has been replaced by `requires_grad` (an attribute of `Tensor`), `torch.no_grad()` and `torch.set_grad_enabled(grad_mode)`. See information here: https://github.com/pytorch/pytorch/releases

Example (before v0.4.0):

`data = Variable(data, volatile=True)`

`output = model(data)`
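
From v0.4.0 on, the equivalent is to run the forward pass inside `torch.no_grad()`, still combined with `model.eval()`. A sketch with a hypothetical small model that contains dropout, so `eval()` actually changes its behaviour:

```python
import torch
import torch.nn as nn

# Hypothetical model with dropout
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Dropout(0.5), nn.Linear(8, 2))
data = torch.randn(5, 4)

model.eval()               # dropout/batchnorm in inference mode
with torch.no_grad():      # no graph is recorded, so nothing is kept for backward
    output = model(data)

print(output.requires_grad)  # False

# The same switch can be made conditional, e.g. in a shared train/eval function
is_training = False
with torch.set_grad_enabled(is_training):
    output2 = model(data)
```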

##### Deploying/Serving PyTorch to Production Using TensorRT

[copied from here https://docs.nvidia.com/deeplearning/sdk/tensorrt-install-guide/index.html]

NVIDIA TensorRT is a C++ library that facilitates high performance inference on NVIDIA graphics processing units (GPUs). TensorRT takes a network definition and optimizes it by merging tensors and layers, transforming weights, choosing efficient intermediate data formats, and selecting from a large kernel catalog based on layer parameters and measured performance.

TensorRT consists of import methods to help you express your trained deep learning model for TensorRT to optimize and run. It is an optimization tool that applies graph optimization and layer fusion and finds the fastest implementation of that model leveraging a diverse collection of highly optimized kernels, and a runtime that you can use to execute this network in an inference context.

TensorRT includes an infrastructure that allows you to leverage high speed reduced precision capabilities of Pascal GPUs as an optional optimization.

For installation and setup, see the link above. I would recommend following the tar installation. I got an error about `cuda.h` not being found, so I had to make sure my CUDA version was set up properly and upgraded to cuda-9.0. The tar installation should lead you through installing pycuda, TensorRT and UFF.

A summary of the steps I followed to get this working:
* Install pycuda first: `pip install 'pycuda>=2017.1.1'`. (I had problems with the pycuda installation because it couldn't find `cuda.h`, so I installed cuda-9.0, updated `PATH` and `LD_LIBRARY_PATH` in `~/.bashrc` and sourced it.)
* Download the TensorRT tar and follow the instructions to install it (e.g. `pip install tensorRT/python/<path-to-wheel>.whl` and `pip install tensorRT/uff/<path-to-wheel>.whl`).
* To verify the installation, make sure the following imports work (of course you have to install tensorflow first, see below):
```
import tensorflow
import uff
import tensorrt as trr
```
This worked with TensorRT v4.0.0.3, cuda-9.0, tensorflow 1.4.1 and pytorch 0.3.0.post4. PyTorch was needed for the example below of converting a PyTorch model to run on a TensorRT engine.

Once TensorRT is installed, you also have to install tensorflow.
For tensorflow-gpu with CUDA 8, use `pip install tensorflow-gpu==1.4.1` (tensorflow 1.5 uses CUDA 9.0);
otherwise just use `pip install tensorflow-gpu` for the latest version.

Example of doing this for PyTorch and TensorRT 3.0 (this also worked for my TensorRT version 4.0):
https://docs.nvidia.com/deeplearning/sdk/tensorrt-api/topics/topics/workflows/manually_construct_tensorrt_engine.html

## Portability to Other Frameworks

PyTorch now (as of v0.3.0) supports exporting models to other frameworks via ONNX, starting with Caffe2 and Microsoft CNTK. So now models can be deployed/served!

<!--Pytorch v0.3.0: Model Exporter to ONNX (ship PyTorch to Caffe2 (part of Pytorch now), CoreML, CNTK, MXNet, Tensorflow)-->

## Saving/Loading PyTorch Models

<!--
It is not recommended to save the entire model (architecture and weights) the way that you did because that method will not work if you try to load the model in a different project. For example, if you try to send your model file ./torch_model_v1 to me and I try to load it with torch.load("./torch_model_v1") I will get an error because it’s likely my project won’t have the exact same directory structure as your project.
Instead you should save only the model weights (state dict), define the architecture in code, then load the weights into the new models state dict.
-->

<!--Currently there is no supported way within Pytorch to serve/deploy models efficiently. -->

For the sake of resuming training, PyTorch allows saving and loading models in two ways (see http://pytorch.org/docs/master/notes/serialization.html): saving only the parameters (the `state_dict`) or saving the entire model object.
Beware that saving/loading an entire model breaks if the directory structure or class definitions change. The saved file only stores a reference to the model class, so when the model is deployed (e.g. run purely from Python on another machine), the class definition must be importable from the Python path. For example, if you save a model, then move the model code into a subfolder and try to load the model, it will fail and complain that it cannot find the class definition. The workaround is to add the class definition to your Python path. This is noted on the PyTorch documentation page http://pytorch.org/docs/master/notes/serialization.html.

The more robust approach is to save only the model's weights and define the model architecture in code. You can then load the weights into the new model's state dict:

### Save 
```
import torch

torch.save(model.state_dict(), "./torch_model_v1.pt")
```

### Load
```
import torch

model = Model()  # the model must be defined with the same code you used to create the trained model
state_dict = torch.load("./torch_model_v1.pt")  # add map_location="cpu" to load GPU-saved weights on a CPU-only machine
model.load_state_dict(state_dict)
```

[taken from https://discuss.pytorch.org/t/using-a-pytorch-model-for-inference/14770/2]
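
For resuming training you usually also need the optimizer state and the epoch. A self-contained sketch of such a checkpoint round trip; the `Model` class, the file name and the dictionary keys are hypothetical choices, not a fixed format:

```python
import os
import tempfile

import torch
import torch.nn as nn


class Model(nn.Module):  # hypothetical architecture, defined in code on both sides
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(4, 2)

    def forward(self, x):
        return self.fc(x)


model = Model()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
path = os.path.join(tempfile.mkdtemp(), "checkpoint.pt")

# Save everything needed to resume training
torch.save({
    "epoch": 5,
    "model_state_dict": model.state_dict(),
    "optimizer_state_dict": optimizer.state_dict(),
}, path)

# Later (possibly on another machine): rebuild the objects, then load their states
restored = Model()
restored_optimizer = torch.optim.SGD(restored.parameters(), lr=0.1, momentum=0.9)
checkpoint = torch.load(path, map_location="cpu")  # map GPU-saved tensors to the CPU
restored.load_state_dict(checkpoint["model_state_dict"])
restored_optimizer.load_state_dict(checkpoint["optimizer_state_dict"])
start_epoch = checkpoint["epoch"] + 1
```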

## Loss Functions

A list of all the ready-made losses is here: http://pytorch.org/docs/master/nn.html#loss-functions

In PyTorch you can write any loss you want as long as you stick to PyTorch `Variables` (without any `.data` unpacking or NumPy conversions) and `torch` functions. The loss will not backpropagate (when calling `loss.backward()`) if you use NumPy data structures, because autograd only tracks operations on tensors.
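
A minimal custom loss written only with `torch` operations, here a hypothetical per-element weighted mean squared error, so autograd can differentiate it. The gradient is compared with the hand-derived `2 * weight * (prediction - target) / n`:

```python
import torch


def weighted_mse_loss(prediction, target, weight):
    """Hypothetical custom loss: mean of weight * (prediction - target)^2."""
    return (weight * (prediction - target) ** 2).mean()


prediction = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
target = torch.tensor([1.5, 2.0, 2.0])
weight = torch.tensor([1.0, 2.0, 0.5])

loss = weighted_mse_loss(prediction, target, weight)
loss.backward()

# d(loss)/d(prediction) = 2 * weight * (prediction - target) / n
expected = 2 * weight * (prediction.detach() - target) / prediction.numel()
print(loss.item(), prediction.grad)
```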

## Training an RNN with features from a CNN
Use `torch.stack(feature_seq, dim=1)` to stack all the features from the CNNs into a sequence, then feed this into the RNN. You can make the batch size the first dimension of the input tensor, but then you have to set `batch_first=True` when instantiating the RNN (it defaults to `False`, in which case the input is expected as `(seq_len, batch, input_size)`).

Example:
```
# inside the model's __init__
self.rnn1 = torch.nn.GRU(input_size=input_size, hidden_size=hidden_size, num_layers=2, batch_first=True)
```
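
A runnable sketch of the whole shape flow. The tiny CNN and all sizes are hypothetical; note that even with `batch_first=True` the returned hidden state keeps the shape `(num_layers, batch, hidden_size)`:

```python
import torch
import torch.nn as nn

batch_size, seq_len, hidden_size = 4, 6, 32

# Hypothetical per-frame CNN: (batch, 3, 16, 16) -> (batch, 8) feature vector
cnn = nn.Sequential(
    nn.Conv2d(3, 8, 3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
)
rnn1 = nn.GRU(input_size=8, hidden_size=hidden_size, num_layers=2, batch_first=True)

frames = torch.randn(seq_len, batch_size, 3, 16, 16)   # one image batch per time step
feature_seq = [cnn(frame) for frame in frames]           # seq_len tensors of shape (batch, 8)
features = torch.stack(feature_seq, dim=1)               # (batch, seq_len, 8)

output, h_n = rnn1(features)
print(features.shape)  # torch.Size([4, 6, 8])
print(output.shape)    # torch.Size([4, 6, 32])
print(h_n.shape)       # torch.Size([2, 4, 32])
```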