└── README.md

/README.md:
--------------------------------------------------------------------------------
# New in PyTorch v0.4.0

- `Tensor` and `Variable` have merged
- Deprecation of `volatile`

More information here: https://github.com/pytorch/pytorch/releases

# pytorch-cheatsheet

A cheatsheet to cover the most commonly used aspects of PyTorch!

PyTorch Documentation:
http://pytorch.org/docs/master/

PyTorch Forums:
https://discuss.pytorch.org/

## Tensor Types and Operations

##### Commonly used types of tensors
`torch.IntTensor`

`torch.FloatTensor`

`torch.DoubleTensor`

`torch.LongTensor`

##### Common pytorch functions
```
torch.sigmoid(x)
torch.log(x)
torch.sum(x, dim=dim)  # dim: the dimension to reduce over
torch.div(x, y)
```
and many others: `neg(), reciprocal(), pow(), sin(), tanh(), sqrt(), sign()`

##### Convert numpy array to pytorch tensor
`b = torch.from_numpy(a)`

The above keeps the original dtype of the data (e.g. float64 becomes `torch.DoubleTensor`).
To override this you can do the following (to cast it to a `torch.FloatTensor`):

`b = torch.FloatTensor(a)`

##### Moving data from CPU to GPU
`data = data.cuda()`

##### Pytorch Variable
A `torch.autograd.Variable` wraps a tensor, which is accessible through its `.data` attribute:
`data_variable.data`

##### Moving data from GPU to CPU
`data = data.cpu()`

##### Convert tensor to numpy array
`data_arr = data_tensor.numpy()`

The tensor has to be on the CPU for this to work.

##### Moving a Variable to CPU and converting to numpy array
`data = data.data.cpu().numpy()`

##### Viewing the size of a tensor
`data.size()`

##### Add a dimension to a torch tensor
`.unsqueeze(axis)`

##### Reshaping tensors
The equivalent of numpy's `reshape()` in torch is `view()`

Examples:

```
a = torch.arange(1, 17)  # 16 elements: 1..16 (torch.range is deprecated)
a = a.view(4, 4)  # reshapes the 16-element vector to 4 x 4
```

`.view(-1)` vectorizes a tensor.

##### Operations between Pytorch Variables and Numpy variables
The NumPy variables first have to be converted to `torch.Tensor` and then to `torch.autograd.Variable`.
An example is shown below.
```
import torch
import torch.nn.functional as F
matrix_tensor = torch.Tensor(matrix_numpy)
# Use cuda() if everything will be calculated on GPU
matrix_pytorch_variable_cuda_from_numpy = torch.autograd.Variable(matrix_tensor, requires_grad=False).cuda()
# matrix_pytorch_variable_cuda is an existing Variable that already lives on the GPU
loss = F.mse_loss(matrix_pytorch_variable_cuda, matrix_pytorch_variable_cuda_from_numpy)
```

##### Batch matrix operations
```
# Batch matrix multiply: (b x n x m) times (b x m x p) gives (b x n x p)
torch.bmm(batch1, batch2, out=None)
```
##### Transpose a tensor
Transpose axis1 and axis2
`.transpose(axis1, axis2)`

##### Outer product
Batched outer product between two batches of vectors `vec1` (batch x n) and `vec2` (batch x m)
```
output = vec1.unsqueeze(2)*vec2.unsqueeze(1)  # batch x n x m
output = output.view(output.size(0),-1)  # flatten to batch x (n*m)
```

## Running on multiple GPUs

##### Multiple GPUs/CPUs for training

Instantiate the model first and then wrap it in `DataParallel`. TODO: Add a way to specify the number of GPUs.

`model = Net()`

`model = torch.nn.DataParallel(model)`

For specifying the GPU devices:
`model = torch.nn.DataParallel(model, device_ids=[0,1,2,3]).cuda()`

As of v0.2.0, PyTorch supports distributed data parallelism, i.e. training over multiple nodes (CPUs and GPUs).

##### Setting the GPUs

Usage of `torch.cuda.set_device(gpu_idx)` is discouraged in favor of the `torch.cuda.device` context manager. In most cases it’s better to use the `CUDA_VISIBLE_DEVICES` environment variable.

## Datasets and Data Loaders

##### Creating a dataset and enumerating over it
Inherit from `torch.utils.data.Dataset` and override `__getitem__()` and `__len__()`

Example:

```
class FooDataset(torch.utils.data.Dataset):
    def __init__(self, root_dir):
        ...
    def __getitem__(self, idx):
        ...
        return {'data': batch_data, 'label': batch_label}
    def __len__(self):
        ...
        return ...  # the number of samples in the dataset
```

To loop over a dataset's batches:
```
from torch.autograd import Variable

foo_dataset = FooDataset(root_dir)
data_loader = torch.utils.data.DataLoader(foo_dataset, batch_size=batch_size, shuffle=True)

for batch_idx, batch in enumerate(data_loader):
    data = batch['data']
    label = batch['label']

    # args.cuda is a flag defined elsewhere (e.g. parsed from the command line)
    if args.cuda:
        data, label = data.cuda(), label.cuda()

    # Wrapping in Variable is only needed before v0.4.0
    data = Variable(data)
    label = Variable(label)
```

## Setting Torch Random Seed
Use `torch.manual_seed(seed)` in addition to `np.random.seed(seed)` to make training reproducible. The two libraries keep separate random number generators, so both seeds have to be set. I believe in the future torch will use the numpy seed so they won't be separate anymore.

## Convolutional Layers

Tuple arguments are ordered (height, width).

##### Conv2d layer
`torch.nn.Conv2d(in_channels, out_channels, (kernel_h, kernel_w), stride=(x,y), padding=(x,y), bias=False, dilation=dilation)`


##### Transpose Conv2d layer ('Deconvolution layer')
`torch.nn.ConvTranspose2d(in_channels, out_channels, (kernel_h, kernel_w), stride=(x,y), padding=(x,y), output_padding=(x,y), bias=False, dilation=dilation)`

## Activations

`nn.ReLU(inplace=True)`

From the PyTorch forums:

`inplace=True` means that it will modify the input directly, without allocating any additional output. It can sometimes slightly decrease the memory usage, but may not always be a valid operation (because the original input is destroyed). However, if you don’t see an error, it means that your use case is valid.

## Model Inference

##### Volatile at inference time
Don't forget to set the input to the graph/net to `volatile=True`. Even if you do `model.eval()`, if the input data is not set to volatile then memory will be used up to compute the gradients. `model.eval()` only sets batchnorm and dropout to inference/test mode, instead of training mode, which is the default when the model is instantiated.
If the input Variable being fed into the network graph is not volatile, autograd still tracks the graph for gradient computation even if `model.eval()` was called. This will take up extra memory.

Important: `volatile` has been deprecated as of v0.4.0. It has been replaced with `requires_grad` (attribute of `Tensor`), `torch.no_grad()` and `torch.set_grad_enabled(grad_mode)`. From v0.4.0 on, run inference inside a `with torch.no_grad():` block instead. See information here: https://github.com/pytorch/pytorch/releases

Example (before v0.4.0):

`data = Variable(data, volatile=True)`

`output = model(data)`

##### Deploying/Serving Pytorch to Production Using TensorRT

[copied from here https://docs.nvidia.com/deeplearning/sdk/tensorrt-install-guide/index.html]

NVIDIA TensorRT is a C++ library that facilitates high performance inference on NVIDIA graphics processing units (GPUs). TensorRT takes a network definition and optimizes it by merging tensors and layers, transforming weights, choosing efficient intermediate data formats, and selecting from a large kernel catalog based on layer parameters and measured performance.

TensorRT consists of import methods to help you express your trained deep learning model for TensorRT to optimize and run. It is an optimization tool that applies graph optimization and layer fusion and finds the fastest implementation of that model leveraging a diverse collection of highly optimized kernels, and a runtime that you can use to execute this network in an inference context.

TensorRT includes an infrastructure that allows you to leverage high speed reduced precision capabilities of Pascal GPUs as an optional optimization.

For installation and setup, see the link above. I would recommend following the tar installation. I hit an error with `cuda.h` not being found, so I had to make sure my cuda version was properly set up and upgraded to cuda-9.0.
The tar installation should lead you through installing pycuda, TensorRT and UFF.

A summary of the steps I did to get this to work:
* Install pycuda first: `pip install 'pycuda>=2017.1.1'` (I had problems with the pycuda installation: it couldn't find `cuda.h`, so I installed cuda-9.0, updated `PATH` and `LD_LIBRARY_PATH` in `~/.bashrc` and sourced it).
* Download the TensorRT tar and follow the instructions to install (e.g. `pip install tensorRT/python/.whl` and `pip install tensorRT/uff/.whl`).
* To verify, I made sure I could do the following (of course you have to install tensorflow - see below):
```
import tensorflow
import uff
import tensorrt as trt
```
This worked with TensorRT v4.0.0.3, cuda-9.0, tensorflow version: 1.4.1, pytorch version: 0.3.0.post4. PyTorch was needed for the example below of converting a PyTorch model to run on a TensorRT engine.

Once installed you also have to install tensorflow.
For tensorflow-gpu with cuda8 use (tensorflow version 1.5 uses cuda 9.0): `pip install tensorflow-gpu==1.4.1`
else just use `pip install tensorflow-gpu` for the latest version.

Example of doing this for PyTorch and TensorRT 3.0 (this also worked for my TensorRT version of 4.0):
https://docs.nvidia.com/deeplearning/sdk/tensorrt-api/topics/topics/workflows/manually_construct_tensorrt_engine.html

## Portability to Other Frameworks

PyTorch now (as of v0.3.0) supports exporting models to other frameworks through the ONNX format, starting with Caffe2 and Microsoft CNTK. So now models can be deployed/served!

## Saving/Loading Pytorch Models

For the sake of resuming training, PyTorch allows saving and loading models via two means - see http://pytorch.org/docs/master/notes/serialization.html.
Beware that saving and loading whole PyTorch models breaks down if the directory structure or class definitions change. When it's time to deploy the model (and by this I mean running it purely from Python on another machine, for example), the model class has to be added to the Python path in order for the class to be instantiated. It's actually very weird. If you save a model, change the directory structure (e.g. put the model in a subfolder) and try to load the model, it will not load. It will complain that it cannot find the class definition. The workaround is to add the class definition to your Python path. This is written as a note on the PyTorch documentation page http://pytorch.org/docs/master/notes/serialization.html.

The alternative is to save only the model's weights and define the model architecture in the code. You can then load the weights into the new model's state dict.

### Save
```
torch.save(model.state_dict(), "./torch_model_v1.pt")
```

### Load
```
model = Model() # the model should be defined with the same code you used to create the trained model
state_dict = torch.load("./torch_model_v1.pt")
model.load_state_dict(state_dict)
```

[taken from https://discuss.pytorch.org/t/using-a-pytorch-model-for-inference/14770/2]

## Loss Functions

A list of all the ready-made losses is here: http://pytorch.org/docs/master/nn.html#loss-functions

In PyTorch you can write any loss you want as long as you stick to using PyTorch `Variables` (without any `.data` unpacking or numpy conversions) and `torch` functions. The loss will not backprop (when using `loss.backward()`) if you use numpy data structures.

## Training an RNN with features from a CNN
Use `torch.stack(feature_seq, dim=1)` to stack all the features from the CNNs into a sequence. Then feed this into the RNN.
Remember that you can put the batch size in the first dimension of the input tensor, but you have to set the `batch_first=True` argument when instantiating the RNN (by default it is set to False).

Example:
```
self.rnn1 = torch.nn.GRU(input_size=input_size, hidden_size=hidden_size, num_layers=2, batch_first=True)
```

--------------------------------------------------------------------------------
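To make the "Training an RNN with features from a CNN" recipe above concrete, here is a minimal sketch of stacking per-frame CNN features with `torch.stack(feature_seq, dim=1)` and feeding them to a GRU created with `batch_first=True`. The `CnnGru` class, the tiny encoder and all sizes are hypothetical, chosen only for illustration, and the code assumes a recent PyTorch version (v0.4.1 or later, for `Tensor.flatten`).

```python
import torch
import torch.nn as nn


class CnnGru(nn.Module):
    """Toy model (hypothetical): a per-frame CNN encoder followed by a GRU."""

    def __init__(self, feature_size=8, hidden_size=16):
        super().__init__()
        # Tiny CNN encoder: (batch, 3, H, W) -> (batch, feature_size, 1, 1)
        self.cnn = nn.Sequential(
            nn.Conv2d(3, feature_size, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),  # global average pooling
        )
        self.rnn = nn.GRU(input_size=feature_size, hidden_size=hidden_size,
                          num_layers=2, batch_first=True)

    def forward(self, frames):
        # frames: (batch, seq_len, 3, H, W)
        # Run the CNN on every time step; each result is (batch, feature_size)
        feature_seq = [self.cnn(frames[:, t]).flatten(1)
                       for t in range(frames.size(1))]
        # Stack along dim=1 so the batch stays first: (batch, seq_len, feature_size)
        features = torch.stack(feature_seq, dim=1)
        output, hidden = self.rnn(features)
        return output, hidden


model = CnnGru()
frames = torch.randn(4, 5, 3, 32, 32)  # 4 sequences of 5 RGB frames
output, hidden = model(frames)
print(output.shape)  # torch.Size([4, 5, 16])
print(hidden.shape)  # torch.Size([2, 4, 16])
```

Note that `batch_first=True` only affects the input and output tensors; the hidden state keeps the layout (num_layers, batch, hidden_size).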