├── .gitignore
├── README.md
└── appendix
    ├── PytorchBugs.md
    ├── module
    │   ├── .DS_Store
    │   ├── data
    │   │   └── data_aug.py
    │   ├── models
    │   │   ├── classification.md
    │   │   ├── fpn.py
    │   │   ├── mnasnet.py
    │   │   ├── mobilenet_v1.py
    │   │   ├── resnet.py
    │   │   ├── senet.py
    │   │   ├── shufflenet_v2.py
    │   │   └── unet.py
    │   └── optimizer
    │       └── Learning_rate.ipynb
    ├── production
    │   ├── distributed
    │   │   ├── pytorch-distributed-example-master
    │   │   │   ├── .gitignore
    │   │   │   ├── LICENSE
    │   │   │   ├── README.md
    │   │   │   ├── mnist
    │   │   │   │   ├── Dockerfile
    │   │   │   │   ├── README.md
    │   │   │   │   ├── docker-compose-gpu.yml
    │   │   │   │   ├── docker-compose.yml
    │   │   │   │   └── main.py
    │   │   │   ├── setup.cfg
    │   │   │   └── toy
    │   │   │       ├── README.md
    │   │   │       └── main.py
    │   │   └── pytorch-distributed-master
    │   │       ├── .gitignore
    │   │       ├── LICENSE
    │   │       ├── README.md
    │   │       ├── apex_distributed.py
    │   │       ├── dataparallel.py
    │   │       ├── distributed.py
    │   │       ├── distributed_slurm_main.py
    │   │       ├── horovod_distributed.py
    │   │       ├── multiprocessing_distributed.py
    │   │       ├── requirements.txt
    │   │       ├── start.sh
    │   │       └── statistics.sh
    │   └── inference
    │       ├── TensorRT
    │       │   ├── .ipynb_checkpoints
    │       │   │   └── pytorch_onnx_trt-checkpoint.ipynb
    │       │   ├── README.md
    │       │   ├── cat.jpeg
    │       │   ├── pytorch_onnx_trt.ipynb
    │       │   ├── trt_helper.py
    │       │   └── trt_int8_calibration_helper.py
    │       └── flask-api
    │           ├── .gitignore
    │           ├── LICENSE
    │           ├── README.md
    │           ├── app.py
    │           ├── cat.jpeg
    │           ├── imagenet_class_index.json
    │           └── requirements.txt
    └── template
        ├── .gitignore
        ├── LICENSE
        ├── README.md
        ├── assets
        │   └── images
        │       ├── 01.JPG
        │       └── 02.JPG
        ├── data
        │   ├── test
        │   │   ├── .gitkeep
        │   │   ├── high
        │   │   │   ├── 1.png
        │   │   │   └── 22.png
        │   │   └── low
        │   │       ├── 1.png
        │   │       └── 22.png
        │   ├── train
        │   │   ├── high
        │   │   │   ├── 2.png
        │   │   │   └── 5.png
        │   │   └── low
        │   │       ├── 2.png
        │   │       └── 5.png
        │   └── valid
        │       ├── high
        │       │   ├── 1.png
        │       │   └── 22.png
        │       └── low
        │           ├── 1.png
        │           └── 22.png
        ├── datasets.py
        ├── logs
        │   └── .gitignore
        ├── loss.py
        ├── metrics.py
        ├── models.py
        ├── requirements.txt
        ├── test.py
        ├── train.py
        └── utils.py

/.gitignore:
--------------------------------------------------------------------------------
# Compiled python
*.pyc
*.pyd

# Compiled MATLAB
*.mex*

# IPython notebook checkpoints
.ipynb_checkpoints

# Editor temporaries
*.swn
*.swo
*.swp
*~

# Sublime Text settings
*.sublime-workspace
*.sublime-project

# Eclipse Project settings
*.*project
.settings

# QtCreator files
*.user

# PyCharm files
.idea

# Visual Studio Code files
.vscode
.vs

# OSX dir files
.DS_Store
--------------------------------------------------------------------------------
/appendix/PytorchBugs.md:
--------------------------------------------------------------------------------
### 1. CUDA & cuDNN

**1. cuDNN error: CUDNN_STATUS_EXECUTION_FAILED**

**A:** This also happens in the Windows port of PyTorch; the only way I found to get around it when using (in my case) large CNNs is:

~~~python
torch.backends.cudnn.enabled = False
~~~

**2. out of memory at /opt/conda/conda-bld/pytorch_1524590031827/work/aten/src/THC/generic/THCStorage.cu:58**

**A:** GPU memory is exhausted; there is no real workaround -> **reduce the batch size, add GPUs, or train with mixed precision**.
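When memory is tight, mixed precision is often the cheapest win. A minimal sketch using `torch.cuda.amp` (available since PyTorch 1.6; the toy model and data below are placeholders, purely for illustration):

~~~python
import torch
import torch.nn as nn

# Hypothetical toy setup just to make the sketch runnable
model = nn.Linear(128, 10).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
criterion = nn.CrossEntropyLoss()
scaler = torch.cuda.amp.GradScaler()

inputs = torch.randn(32, 128).cuda()
targets = torch.randint(0, 10, (32,)).cuda()

optimizer.zero_grad()
with torch.cuda.amp.autocast():        # forward pass runs in float16 where safe
    loss = criterion(model(inputs), targets)
scaler.scale(loss).backward()          # scale the loss to avoid fp16 underflow
scaler.step(optimizer)
scaler.update()
~~~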
**3. Making only specific GPUs visible to CUDA**

**> Set the environment variable CUDA_VISIBLE_DEVICES to declare which CUDA devices are visible.**

**Method 1**: set the variable in `/etc/profile` or `~/.bashrc` (`/etc/profile` affects all users, `~/.bashrc` only the current user's bash shell).

Append the following line at the end of `/etc/profile`:

~~~
export CUDA_VISIBLE_DEVICES=0,1  # only GPU devices 0 and 1 are visible
~~~

Save and quit with `:wq`, then run:

~~~
source /etc/profile
~~~

to make the configuration take effect.

**Method 2**: if the above has no effect, set the variable when launching the CUDA program, e.g.

~~~shell
CUDA_VISIBLE_DEVICES=0,1 ./cuda_executable
# Environment Variable Syntax       Results
# CUDA_VISIBLE_DEVICES=1            Only device 1 will be seen
# CUDA_VISIBLE_DEVICES=0,1          Devices 0 and 1 will be visible
# CUDA_VISIBLE_DEVICES="0,1"        Same as above, quotation marks are optional
# CUDA_VISIBLE_DEVICES=0,2,3        Devices 0, 2, 3 will be visible; device 1 is masked
~~~

**4. nn.Module.cuda() and Tensor.cuda() behave differently**

For both models and data, `cuda()` migrates memory from CPU to GPU, but the effect differs.

For an `nn.Module`:

~~~python
model = model.cuda()  # equivalent to model.cuda()
~~~

The two forms above do the same thing: they migrate the model itself.

For a `Tensor`:

Unlike `nn.Module`, calling `tensor.cuda()` **only returns a copy of the tensor in GPU memory and does not modify the tensor in place**, so you must rebind the result: `tensor = tensor.cuda()`.

~~~python
model = create_a_model()
tensor = torch.zeros([2,3,10,10])
model.cuda()
tensor.cuda()
model(tensor)          # raises an error
tensor = tensor.cuda()
model(tensor)          # runs fine
~~~

**5. an illegal memory access was encountered at /opt/conda/conda-bld/pytorch_1525909934016/work/aten/src/THC/generated/../THCReduceAll.cuh:339**

An invalid memory access during GPU training can be a bug in the program or an incompatibility with the current driver:

Because CUDA runs asynchronously, the reported error location may be inaccurate. Set the environment variable `CUDA_LAUNCH_BLOCKING=1` and rerun in the current terminal, e.g. `CUDA_LAUNCH_BLOCKING=1 python3 train.py` (where train.py is the script you are running); the failing line of code will then be reported correctly.

Check the code carefully for invalid memory accesses; out-of-range indexing is the most common cause. If the code is fine, the installed PyTorch version may be incompatible with your GPU model, or the cuDNN library may be incompatible; extract the failing snippet and test it in isolation to confirm.

**6. AttributeError: module 'torch._C' has no attribute '_cuda_getDevice'**

**A:** According to the [docs](http://pytorch.org/docs/master/torch.html#torch.load), I believe you should be loading your models with

~~~python
torch.load(file, map_location=device)
~~~



### 2. Mismatch

**1. Input type (CUDAFloatTensor) and weight type (CPUFloatTensor) should be the same**

**A:** **The input data and the model weights live on different devices**: the model parameters are still on the CPU while the input has been moved to the GPU. Call `model.cuda()` to move the model to the GPU as well.

**2. Input type (CUDADoubleTensor) and weight type (CUDAFloatTensor) should be the same**

**A:** **The input dtype and the weight dtype disagree: one is Double, the other Float.** Cast the input tensor with `x.float()` so that input and weights have the same type, or convert the model weights to Double instead.
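Both mismatches reduce to aligning device and dtype before the forward pass; a minimal sketch (toy module, illustrative only):

~~~python
import torch
import torch.nn as nn

model = nn.Linear(3, 2).cuda()               # weights: CUDA Float
x = torch.zeros(4, 3, dtype=torch.float64)   # input: CPU Double

# model(x) would raise both errors above; align dtype and device first:
y = model(x.float().cuda())
~~~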
**3. RuntimeError: invalid argument 0: Sizes of tensors must match except in dimension 0. Got 1 and 3 in dimension 1**

(1) An error while reading data. Usually a dimension mismatch: e.g. a dataset that mixes 3-channel and 4-channel images — in short, the samples the dataset passes into the DataLoader differ in size. Check carefully that every image has been brought to the same shape. Note: **everything the dataset returns must have a consistent shape, whether image, label, or box.**

(2) A size problem. **Check that the kernel sizes match the input sizes and that the padding is correct.**

**4. invalid argument 0: Sizes of tensors must match except in dimension 1. Got 14 and 13 in dimension 0 at /home/prototype/Downloads/pytorch/aten/src/THC/generic/THCTensorMath.cu:83**

(1) **Your input images do not all have the same dimensions**: e.g. 99 out of 100 training samples are 256×256 but one is 384×384, which trips PyTorch's shape check. -> Clean up the dataset so that every image has the same size and channel count.

(2) The subtler case is the batch size: PyTorch checks training shapes per batch. With 999 samples and a batch size of 4 (999 is not divisible by 4), the first 249 batches have shape (4,3,256,256) but the last batch has (3,3,256,256); PyTorch sees (4,3,256,256) != (3,3,256,256), a dimension mismatch, and errors out — call it a small bug. -> Pick a batch size that divides the dataset size, set the batch size to 1, or pass `drop_last=True` to the DataLoader.

**5. expected CPU tensor (got CUDA tensor)**

A CPU tensor was expected but a CUDA tensor was received. A classic error: some parameters in the computation graph are CUDA tensors while others are CPU tensors.



### 3. Version

**1. IndexError: invalid index of a 0-dim tensor. Use tensor.item() to convert a 0-dim tensor to a Python number**

~~~python
total_loss += loss.data[0]
~~~

**A:** Very common: newer PyTorch versions changed this interface! It's likely that you're using a more recent version of PyTorch than specified. Replacing this line with

~~~python
total_loss += loss.item()
~~~

should probably solve the problem.

**2.** /home/zhaozhichao/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:525: **FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.**

**A:** Open the file at the reported location and change `np.dtype([("quint8", np.uint8, 1)])` to `np.dtype([("quint8", np.uint8, (1,))])`.

**3.** UserWarning: **indexing with dtype torch.uint8 is now deprecated, please use a dtype torch.bool**

**A:** Project: https://github.com/eriklindernoren/PyTorch-YOLOv3/blob/master/models.py#L191

Add the following at line 191 of model.py:

~~~python
# convert obj_mask to bool
obj_mask = obj_mask.bool()      # convert uint8 to bool
noobj_mask = noobj_mask.bool()  # convert uint8 to bool
~~~

**4. warnings.warn("nn.functional.sigmoid is deprecated. Use torch.sigmoid instead.")**

A: Stop using the activation functions under `torch.nn.functional`, i.e. the `F.sigmoid(x)` form. The correct usage is:

~~~python
sigmoid = torch.nn.Sigmoid()
out = sigmoid(x)
~~~

or simply `torch.sigmoid(x)`, as the warning suggests.

**5.** KeyError: 'unexpected key "module.bn1.num_batches_tracked" in state_dict'

It turns out that in PyTorch 0.4.1 and later, BatchNorm layers gained a num_batches_tracked parameter that counts how many batches have passed through forward during training. Source (pytorch 0.4.1):

~~~python
if self.training and self.track_running_stats:
    self.num_batches_tracked += 1
    if self.momentum is None:  # use cumulative moving average
        exponential_average_factor = 1.0 / self.num_batches_tracked.item()
    else:  # use exponential moving average
        exponential_average_factor = self.momentum
~~~

Roughly speaking, this parameter affects how the normalization statistics are computed during training.

So the error comes from training and testing with PyTorch versions on opposite sides of 0.4.1. The concrete fix: if the model parameters (an OrderedDict, easy to edit) are missing num_batches_tracked entries, add them; if there are extra ones, delete them. The lazy fix is to set the strict argument of load_state_dict to False, as follows:

~~~python
load_state_dict(torch.load(weight_path), strict=False)
~~~
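If you prefer to clean the checkpoint explicitly rather than rely on `strict=False`, filtering the keys is straightforward (a sketch; `model` and the checkpoint path are placeholders):

~~~python
import torch

state = torch.load('checkpoint.pth', map_location='cpu')
# drop the BN bookkeeping entries that 0.4.1-era versions disagree on
state = {k: v for k, v in state.items()
         if not k.endswith('num_batches_tracked')}
model.load_state_dict(state, strict=False)
~~~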
### 4. Data & DataLoader

**1. The jupyter notebook kernel dies as soon as training starts**

A: The first thing to check is `num_workers` in the DataLoader; try a smaller value or set `num_workers = 0`. // Converting the notebook to a plain script will surface the actual error message.

**2. ERROR: Unexpected bus error encountered in worker. This might be caused by insufficient shared memory (shm)**

This happens when running training code inside Docker on a server with the batch size set too large: shared memory runs out (Docker limits shm). The fix is to set the DataLoader's num_workers to 0 (or launch the container with a larger `--shm-size`).

**3. Assertion `cur_target >= 0 && cur_target < n_classes' failed.**

**A: RuntimeError: cuda runtime error (59) : device-side assert triggered at /home/loop/pytorch-master/torch/lib/THC/generic/THCTensorMath.cu:15**

This comes up all the time in classification training; generally it appears when the number of classes the network outputs differs from the number of classes in your labels.

Also, PyTorch has a requirement: **labels used with CrossEntropyLoss must start from 0**.

If I write:

~~~
self.classes = [0, 1, 2, 3]
~~~

four classes labelled 0, 1, 2, 3, everything is fine. But if I write:

~~~
self.classes = [1, 2, 3, 4]
~~~

the error is raised.

**->** PyTorch's classifiers expect class labels in `[0, max-1]`, while the ground-truth labels here are in `[1, max]`, so shift them: `label = (label-1).to(opt.device)`.



### 5. Model loading

**1. RuntimeError: Error(s) in loading state_dict for Missing key(s) in state_dict: "fc.weight", "fc.bias".**

For this kind of **missing key** situation:

If you have a partial state_dict, which is missing some keys, you can do the following:

~~~python
state = model.state_dict()
state.update(partial)
model.load_state_dict(state)
~~~

**2. RuntimeError: Error(s) in loading state_dict for Missing key(s) in Unexpected key(s) in state_dict: "classifier.0.weight",**

**A:**

~~~
# original saved file with DataParallel
state_dict = torch.load('myfile.pth.tar')
# create new OrderedDict that does not contain `module.`
from collections import OrderedDict
new_state_dict = OrderedDict()
for k, v in state_dict.items():
    name = k[7:]  # remove `module.`
    new_state_dict[name] = v
# load params
model.load_state_dict(new_state_dict)
~~~



### 6. Loss functions

**1. The loss becomes nan during training**

Three possible causes of nan gradients:

1). **Exploding gradients**: the gradient values overflow into nan. **Usually lowering the learning rate, adding BN layers, or clipping gradients helps.**

2). **The loss function or network design**: e.g. **a division by 0, or a boundary case where the function is not differentiable, such as log(0) or sqrt(0)**.

3). Dirty data. **Check the input for nan beforehand.**

A note on detecting nan: values like nan or inf cannot be tested with == or is! To be safe, always use `math.isnan()` or `numpy.isnan()`.

For example:

~~~
import numpy as np

# check whether the input contains nan
if np.any(np.isnan(input.cpu().numpy())):
    print('Input data has NaN!')

# check whether the loss is nan
if np.isnan(loss.item()):
    print('Loss value is NaN!')
~~~

**2. Loss-function arguments in PyTorch**

Take CrossEntropyLoss as an example:

~~~
CrossEntropyLoss(self, weight=None, size_average=None, ignore_index=-100, reduce=None, reduction='elementwise_mean')
~~~

- **reduce**:

  If **reduce = False**, size_average is ignored and the **loss is returned as a vector**, one entry per batch element.

  If **reduce = True**, a scalar is returned:

    with size_average = True, **loss.mean()**;

    with size_average = False, **loss.sum()**.

- **weight**: a 1D weight vector that **weights the loss of each class**.

- **ignore_index**: a target value to ignore, so that it does not contribute to the input gradient. With size_average = True, the mean is computed only over the non-ignored targets.
- **reduction**: one of 'none' | 'elementwise_mean' | 'sum', exactly as the names suggest.
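A small self-contained example of these knobs (the class weights are made up for illustration):

~~~python
import torch
import torch.nn as nn

logits = torch.randn(4, 3)            # (batch, num_classes)
target = torch.tensor([0, 2, 1, 2])

# up-weight class 1, down-weight class 2; return the mean over the batch
criterion = nn.CrossEntropyLoss(weight=torch.tensor([1.0, 2.0, 0.5]),
                                reduction='mean')
loss = criterion(logits, target)

per_sample = nn.CrossEntropyLoss(reduction='none')(logits, target)  # vector loss
~~~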
### 7. Multi-machine, multi-GPU and parallelism

**1. Code that contains torch.nn.DataParallel cannot run on the CPU**

Fix: wrap the model once with the following class instead of DataParallel:

~~~python
class WrappedModel(torch.nn.Module):
    def __init__(self, module):
        super(WrappedModel, self).__init__()
        self.module = module  # the model I actually define
    def forward(self, *x, **kwargs):
        return self.module(*x, **kwargs)
~~~

**Why:**

Code that uses DataParallel wraps the original model in an extra `module` layer; all we do here is reproduce that single layer of `module` wrapping.

**2. With nn.DataParallel the data is not on the same GPU**

Background: multi-GPU training in PyTorch mostly uses data parallelism:

~~~python
model = nn.DataParallel(model)
~~~

Problem: while a colleague was training an optical-flow-based detector, "data not in same cuda" appeared; during code review, printing the tensors at each node showed that the data in CUDA memory really was spread across different GPUs.

-> The final fix was to call `.cuda()` uniformly on the data as it comes out of the loader, bringing everything onto the same CUDA device and resolving the problem.

**3. A pitfall when loading a model trained with nn.DataParallel:**

If you trained with nn.DataParallel on multiple GPUs, remember to go through `.module` when reading the model back, e.g.:

~~~python
def get_model(self):
    if self.nGPU == 1:
        return self.model
    else:
        return self.model.module
~~~

**4. How multi-GPU processing works**

When using multiple GPUs, keep PyTorch's processing logic in mind:

1) The model is initialized on each GPU.

2) In the forward pass, the batch is split across the GPUs for computation.

3) The outputs are gathered on the master GPU, where the loss is computed and backpropagated, updating the master GPU's weights.

4) The model on the master GPU is copied to the other GPUs.



### 8. Optimization issues (lr & optim)

**1. A hidden bug caused by the optimizer's weight_decay term**

We all know that weight_decay means weight decay: an L2 penalty added to the original loss so that the model prefers smaller weights — a regularizer. But I often forget this term exists, which led to an unexpected problem. The trap went like this: while training a ResNet50, the top part of the network (layer4) was temporarily unused, so no gradients flowed back to it; I therefore happily passed all of ResNet50's parameters to the optimizer, expecting layer4 to keep its original weights. In reality, even though layer4 received no gradients, weight_decay kept acting on it, shrinking its weights toward 0. When layer4 was needed later, its outputs were abnormal (close to 0), which is when the problem surfaced. This situation may be rare, but stay careful: **never hand weights that should not currently be updated to the optimizer; it avoids unnecessary trouble.**



### 9. PyTorch reproducibility

https://blog.csdn.net/hyk_1996/article/details/84307108



### 10. Basic usage issues

**1. RuntimeError: some of the strides of a given numpy array are negative. This is currently not supported, but will be added in future releases.**

~~~python
import numpy as np
x = np.random.random(size=(32, 32, 7))
torch.from_numpy(np.flip(x, axis=0))
~~~

Same error with np.rot90()

**A:** ndarray.copy() will allocate new memory for the numpy array, which makes it normal — the strides are no longer negative.

~~~python
torch.from_numpy(np.flip(x, axis=0).copy())
~~~

**2. view() only works on contiguous tensors**

Use `is_contiguous()` to check whether a tensor is contiguous in memory, and call `.contiguous()` to make it contiguous if not.

**3. input is not contiguous at /pytorch/torch/lib/THC/generic/THCTensor.c:227**

~~~
batch_size, c, h, w = input.size()
rh, rw = (2, 2)
oh, ow = h * rh, w * rw
oc = c // (rh * rw)
out = input.view(batch_size, rh, rw, oc, h, w)
out = out.permute(0, 3, 4, 1, 5, 2)
out = out.view(batch_size, oc, oh, ow)
invalid argument 2: input is not contiguous at /pytorch/torch/lib/THC/generic/THCTensor.c:227
~~~

**A:** The final `view` is what fails, and the cause is the `permute` before it (input is a Variable here). `permute` does not copy: it merely rearranges the strides over the same storage as `input`, so after it `out` is no longer contiguous and cannot be `view`ed.

The fix: append `tensor.contiguous()` to the permute line, i.e. change it to `out = out.permute(0, 3, 4, 1, 5, 2).contiguous()`.
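A compact illustration of the contiguity rules above:

~~~python
import torch

x = torch.randn(2, 3, 4)
y = x.permute(2, 0, 1)          # same storage, new strides
print(y.is_contiguous())        # False
z = y.contiguous().view(4, 6)   # copy into contiguous memory, then view works
~~~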
**4. RuntimeError: some of the strides of a given numpy array are negative. This is currently not supported, but will be added in future releases.**

**A:** This happens because the numpy code somewhere uses negative-step indexing: `image[…, ::-1]`.

The fix is simple: if the numpy variable `image` triggers the error, return `image.copy()` instead. The copy operation creates a new numpy array from the original one that does not use negative indexing.

**5. Using torch.Tensor.detach()**

The official description of `detach()`: Returns a new Tensor, detached from the current graph. The result will never require gradient.

**Suppose we have models A and B, A's output is fed to B as input, but we only want to train model B.** Then we can do:

~~~python
input_B = output_A.detach()
~~~

This cuts the gradient flow between the two computation graphs, giving us exactly what we need.

**6. ValueError: Expected more than 1 value per channel when training**

When a batch contains only one sample, calling batch_norm raises: **raise ValueError('Expected more than 1 value per channel when training, got input size {}'.format(size))**. There is no particularly elegant fix; before training, compute **num_of_samples % batch_size** to check whether exactly one sample will be left over. **! With a batch size of 1, batchnorm cannot run.**



### 11. Basic usage

**Q: How to make PyTorch ignore warnings**

~~~shell
python3 -W ignore::UserWarning xxxx.py
~~~





### 12. ONNX

```
ONNX's Upsample/Resize operator did not match Pytorch's Interpolation until opset 11. Attributes to determine how to transform the input were added in onnx:Resize in opset 11 to support Pytorch's behavior (like coordinate_transformation_mode and nearest_mode).
```

The warning message says it all — **ONNX's Upsample/Resize operator did not match Pytorch's Interpolation until opset 11.** — so pin the opset version in the ONNX export code, as follows:

~~~python
import torch
torch.onnx.export(model, ..., opset_version=11)
~~~

--------------------------------------------------------------------------------
/appendix/module/.DS_Store:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/polarisZhao/pytorch-cookbook/2de073487a0257936cdbb34920df833ad0603a99/appendix/module/.DS_Store
--------------------------------------------------------------------------------
/appendix/module/data/data_aug.py:
--------------------------------------------------------------------------------
#!/usr/bin/env python3
# -*- coding:utf-8 -*-

# data aug
# author: zhaozhichao
# reference:
#    https://github.com/apache/incubator-mxnet/tree/6f9a67901362a794e3c022dd75daf8a516760fea/python/mxnet/image
#    http://scipy-lectures.org/advanced/image_processing/
#    https://blog.csdn.net/lwplwf/article/details/85776309
#    https://blog.csdn.net/u011995719/article/details/85107009
#    https://github.com/pytorch/vision/tree/master/torchvision/transforms
#    https://github.com/mirzaevinom/data_science_bowl_2018/blob/master/codes/augment_preprocess.py
#    https://github.com/jacobkie/2018DSB/blob/07df7d385f23a2272d8258351d680b037705ce3c/script_final/preprocess.py
#    https://github.com/selimsef/dsb2018_topcoders/blob/master/albu/src/augmentations/functional.py

import cv2
import math
import random  # used by _get_interp_method (interp == 10)
import numpy as np
from scipy.ndimage.filters import gaussian_filter

# FancyPCA
def FancyPCA(img):
    """AlexNet-style PCA color augmentation: jitter along the RGB eigenvectors."""
    h, w, c = img.shape
    img = np.reshape(img, (h * w, c)).astype('float32')
    mean = np.mean(img, axis=0)
    std = np.std(img, axis=0)
    img = (img - mean) / std

    cov = np.cov(img, rowvar=False)
    lambdas, p = np.linalg.eig(cov)
    alphas = np.random.normal(0, 0.1, c)
    pca_img = img + np.dot(p, alphas*lambdas)

    pca_color_img = pca_img * std + mean
    pca_color_img = np.maximum(np.minimum(pca_color_img, 255), 0)
    return pca_color_img.reshape(h, w, c).astype(np.uint8)

# Flip and Rotation
def random_horizontal_flip(img, p):
    """
    img : Image to be horizontal flipped
    p: probability that image should be horizontal flipped.
    """
    if np.random.random() < p:
        img = np.fliplr(img)
    return img

def random_vertical_flip(img, p):
    """
    img : Image to be vertical flipped
    p: probability that image should be vertical flipped.
    """
    if np.random.random() < p:
        img = np.flipud(img)
    return img

def random_rotate90(img, p):
    """
    img : Image to be random rotated
    p: probability that image should be random rotated.
    """
    if np.random.random() < p:
        # upper bound of randint is exclusive: sample 90, 180 or 270
        # (with randint(1, 3) the 270-degree branch was unreachable)
        angle = np.random.randint(1, 4) * 90

        if angle == 90:
            img = img.transpose(1,0,2)
            img = np.fliplr(img)

        elif angle == 180:
            img = np.rot90(img, 2)

        elif angle == 270:
            img = img.transpose(1,0,2)
            img = np.flipud(img)
    return img

def rotate(img, angle):
    """
    img : Image to be rotated
    angle(degree measure): angle to be rotated
    """
    height, width = img.shape[0:2]
    mat = cv2.getRotationMatrix2D((width/2, height/2), angle, 1.0)
    img = cv2.warpAffine(img, mat, (width, height),
                         flags=cv2.INTER_LINEAR,
                         borderMode=cv2.BORDER_REFLECT_101)
    return img

def shift_scale_rotate(img, angle, scale, dx, dy):
    """
    img : Image to be affine transformation
    angle(degree measure): angle to be rotated(15, 30)
    scale: scale(1.1/1.2/1.3)
    dx, dy: offset, Compared to the original image

    """
    height, width = img.shape[:2]

    angle = np.random.uniform(-angle, angle)
    scale = np.random.uniform(1.0/scale, scale)

    cc = math.cos(angle/180*math.pi) * scale
    ss = math.sin(angle/180*math.pi) * scale
    rotate_matrix = np.array([[cc, -ss], [ss, cc]])

    box0 = np.array([[0, 0], [width, 0], [width, height], [0, height], ])
    box1 = box0 - np.array([width/2, height/2])
    box1 = np.dot(box1, rotate_matrix.T) + \
        np.array([width/2+dx*width, height/2+dy*height])

    box0 = box0.astype(np.float32)
    box1 = box1.astype(np.float32)
    mat = cv2.getPerspectiveTransform(box0, box1)
    img = cv2.warpPerspective(img, mat, (width, height),
                              flags=cv2.INTER_LINEAR,
                              borderMode=cv2.BORDER_REFLECT_101)
    return img


# color: brightness, contrast, saturation : Done
def random_brightness(img, brightness):
    """
    brightness : float, The brightness jitter ratio range, [0, 1]
    """
    alpha = 1 + np.random.uniform(-brightness, brightness)
    img = alpha * img
    img = np.clip(img, 0, 255).astype(np.uint8)
    return img

def random_contrast(img, contrast):
    """
    contrast : The contrast jitter ratio range, [0, 1]
    """
    coef = np.array([[[0.114, 0.587, 0.299]]])  # BGR to gray (YCbCr)
    alpha = 1.0 + np.random.uniform(-contrast, contrast)
    gray = img * coef
    gray = (3.0 * (1.0 - alpha) / gray.size) * np.sum(gray)
    img = alpha*img + gray
    img = np.clip(img, 0, 255).astype(np.uint8)
    return img

def random_saturation(img, saturation):
    """
    saturation : The saturation jitter ratio range, [0, 1]
    """
    coef = np.array([[[0.299, 0.587,
                      0.114]]])
    # centre alpha at 1 (as in random_contrast) so the image is, on average, unchanged
    alpha = 1.0 + np.random.uniform(-saturation, saturation)
    gray = img * coef
    gray = np.sum(gray, axis=2, keepdims=True)
    img = alpha*img + (1.0 - alpha)*gray
    img = np.clip(img, 0, 255).astype(np.uint8)
    return img

def random_color(img, brightness, contrast, saturation):
    """
    brightness : The brightness jitter ratio range, [0, 1]
    contrast : The contrast jitter ratio range, [0, 1]
    saturation : The saturation jitter ratio range, [0, 1]
    """
    if brightness > 0:
        img = random_brightness(img, brightness)
    if contrast > 0:
        img = random_contrast(img, contrast)
    if saturation > 0:
        img = random_saturation(img, saturation)
    return img

def random_hue(image, hue):
    """
    The hue jitter ratio range, [0, 1]
    """
    h = int(np.random.uniform(-hue, hue)*180)

    hsv = cv2.cvtColor(image, cv2.COLOR_BGR2HSV)
    hsv[:, :, 0] = (hsv[:, :, 0].astype(int) + h) % 180
    image = cv2.cvtColor(hsv, cv2.COLOR_HSV2BGR)
    return image


# add noise
def random_noise(img, limit=[0, 0.1], p=1):
    if np.random.random() < p:
        H,W = img.shape[:2]
        noise = np.random.uniform(limit[0], limit[1], size=(H,W))*255

        img = img + noise[:,:,np.newaxis]*np.array([1,1,1])
        img = np.clip(img, 0, 255).astype(np.uint8)

    return img


# crop and resize
def random_crop(img, size):
    """
    size: (tuple) (new_w, new_h)
    value: 0.9*W > new_w > 0.8*W
           0.9*H > new_h > 0.8*H
    """
    H, W = img.shape[:2]
    new_w, new_h = size
    assert(H > new_h)
    assert(W > new_w)

    x0 = np.random.choice(W-new_w) if W!=new_w else 0
    y0 = np.random.choice(H-new_h) if H!=new_h else 0

    if (new_w, new_h) != (W, H):
        img = img[y0:y0+new_h, x0:x0+new_w, :]

    return img

def center_crop(img, size):
    """
    size: (tuple) (new_w, new_h)
    """
    H, W = img.shape[:2]
    new_w, new_h = size

    x0 = (W - new_w) // 2
    y0 = (H - new_h) // 2

    if (new_w, new_h) != (W, H):
        img = img[y0:y0+new_h, x0:x0+new_w]

    return img

def _get_interp_method(interp, sizes=()):
    """
    interpolation method for all resizing operations
    Possible values:
    0: Nearest Neighbors Interpolation.
    1: Bilinear interpolation.
    2: Area-based (resampling using pixel area relation). It may be a
       preferred method for image decimation, as it gives moire-free
       results. But when the image is zoomed, it is similar to the Nearest
       Neighbors method. (used by default).
    3: Bicubic interpolation over 4x4 pixel neighborhood.
    4: Lanczos interpolation over 8x8 pixel neighborhood.
    9: Cubic for enlarge, area for shrink, bilinear for others
    10: Random select from interpolation method mentioned above.
    sizes : tuple of int
    """
    if interp == 9:
        if sizes:
            assert len(sizes) == 4
            oh, ow, nh, nw = sizes
            if nh > oh and nw > ow:
                return 2
            elif nh < oh and nw < ow:
                return 3
            else:
                return 1
        else:
            return 2
    if interp == 10:
        return random.randint(0, 4)
    if interp not in (0, 1, 2, 3, 4):
        raise ValueError('Unknown interp method %d' % interp)
    return interp

def resize(img, size, interp=2):
    h, w = img.shape[:2]

    if h > w:
        new_h, new_w = size * h // w, size
    else:
        new_h, new_w = size, size * w // h
    # pass the flag via `interpolation=`; the third positional argument
    # of cv2.resize is `dst`, not the interpolation method
    return cv2.resize(img, (new_w, new_h),
                      interpolation=_get_interp_method(interp, (h, w, new_h, new_w)))

def elastic_transform_fast(img, alpha=2, sigma=100, alpha_affine=100, random_state=None):
    """Elastic deformation of images as described in [Simard2003]_ (with modifications).
    .. [Simard2003] Simard, Steinkraus and Platt, "Best Practices for
         Convolutional Neural Networks applied to Visual Document Analysis", in
         Proc. of the International Conference on Document Analysis and
         Recognition, 2003.
    Based on https://gist.github.com/erniejunior/601cdf56d2b424757de5
    """
    if random_state is None:
        random_state = np.random.RandomState(1234)

    shape = img.shape
    shape_size = shape[:2]

    # Random affine
    center_square = np.float32(shape_size) // 2
    square_size = min(shape_size) // 3
    alpha = float(alpha)
    sigma = float(sigma)
    alpha_affine = float(alpha_affine)

    pts1 = np.float32([center_square + square_size, [center_square[0] + square_size, center_square[1] - square_size],
                       center_square - square_size])
    pts2 = pts1 + random_state.uniform(-alpha_affine,
                                       alpha_affine, size=pts1.shape).astype(np.float32)
    M = cv2.getAffineTransform(pts1, pts2)

    img = cv2.warpAffine(
        img, M, shape_size[::-1], borderMode=cv2.BORDER_REFLECT_101)

    dx = np.float32(gaussian_filter(
        (random_state.rand(*shape_size) * 2 - 1), sigma) * alpha)
    dy = np.float32(gaussian_filter(
        (random_state.rand(*shape_size) * 2 - 1), sigma) * alpha)

    x, y = np.meshgrid(np.arange(shape[1]), np.arange(shape[0]))

    mapx = np.float32(x + dx)
    mapy = np.float32(y + dy)

    return cv2.remap(img, mapx, mapy, interpolation=cv2.INTER_LINEAR, borderMode=cv2.BORDER_REFLECT_101)

def channel_shuffle(img):
    ch_arr = [0, 1, 2]
    np.random.shuffle(ch_arr)
    img = img[..., ch_arr]
    return img

def to_gray(img):
    gray = cv2.cvtColor(img, cv2.COLOR_RGB2GRAY)
    if np.mean(gray) > 127:
        gray = 255 - gray
    return cv2.cvtColor(gray, cv2.COLOR_GRAY2RGB)


if __name__ == '__main__':
    img_path = '/home/zhaozhichao/Desktop/3-workspace/FACE_MODE_3DDFA/example/b24.jpg'
    img = cv2.imread(img_path)
    cv2.imshow("origin img", img)
    if len(img.shape) == 2:
        w, h = img.shape[:2]
        img = img.reshape(w, h, 1)
    im2 = FancyPCA(img)
    cv2.imshow("im2", im2)
    cv2.waitKey()
--------------------------------------------------------------------------------
/appendix/module/models/classification.md:
--------------------------------------------------------------------------------
The code in this document targets PyTorch 1.0 and uses the following packages:

~~~
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
from
torch.autograd import Variable
~~~

#### 1. Basic define

~~~python
# conv3x3
def conv3x3(in_planes, out_planes, stride=1, padding=1, groups=1):
    """3x3 convolution"""
    return nn.Conv2d(in_planes, out_planes, kernel_size=3, stride=stride,
                     padding=padding, groups=groups, bias=False)  # honour the padding argument
# conv1x1
def conv1x1(in_channels, out_channels, stride=1):
    """1x1 convolution"""
    return nn.Conv2d(in_channels, out_channels, kernel_size=1, stride=stride, padding=0,
                     bias=False)
# BasicConv2d
class BasicConv2d(nn.Module):
    def __init__(self, in_channels, out_channels, **kwargs):
        super(BasicConv2d, self).__init__()
        self.conv = nn.Conv2d(in_channels, out_channels, bias=True, **kwargs)
        self.bn = nn.BatchNorm2d(out_channels, eps=1e-5)

    def forward(self, x):
        x = self.conv(x)
        x = self.bn(x)
        return F.relu(x, inplace=True)
~~~

#### 2. ResNet

~~~python
class BasicBlock(nn.Module):
    expansion = 1

    def __init__(self, inplanes, planes, stride=1, downsample=None, groups=1,
                 base_width=64, norm_layer=None):
        super(BasicBlock, self).__init__()
        if norm_layer is None:
            norm_layer = nn.BatchNorm2d
        if groups != 1 or base_width != 64:
            raise ValueError('BasicBlock only supports groups=1 and base_width=64')
        # Both self.conv1 and self.downsample layers downsample the input when stride != 1
        self.conv1 = conv3x3(inplanes, planes, stride)
        self.bn1 = norm_layer(planes)
        self.relu = nn.ReLU(inplace=True)
        self.conv2 = conv3x3(planes, planes)
        self.bn2 = norm_layer(planes)
        self.downsample = downsample
        self.stride = stride

    def forward(self, x):
        identity = x

        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)

        out = self.conv2(out)
        out = self.bn2(out)

        if self.downsample is not None:
            identity = self.downsample(x)

        out += identity
        out = self.relu(out)

        return out
~~~

~~~python
class Bottleneck(nn.Module):
    expansion = 4

    def __init__(self, inplanes, planes, stride=1, downsample=None, groups=1,
                 base_width=64, norm_layer=None):
        super(Bottleneck, self).__init__()
        if norm_layer is None:
            norm_layer = nn.BatchNorm2d
        width = int(planes * (base_width / 64.)) * groups
        # Both self.conv2 and self.downsample layers downsample the input when stride != 1
        self.conv1 = conv1x1(inplanes, width)
        self.bn1 = norm_layer(width)
        # pass groups by keyword: the fourth positional argument of conv3x3 is padding
        self.conv2 = conv3x3(width, width, stride, groups=groups)
        self.bn2 = norm_layer(width)
        self.conv3 = conv1x1(width, planes * self.expansion)
        self.bn3 = norm_layer(planes * self.expansion)
        self.relu = nn.ReLU(inplace=True)
        self.downsample = downsample
        self.stride = stride

    def forward(self, x):
        identity = x

        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)

        out = self.conv2(out)
        out = self.bn2(out)
        out = self.relu(out)

        out = self.conv3(out)
        out = self.bn3(out)

        if self.downsample is not None:
            identity = self.downsample(x)

        out += identity
        out = self.relu(out)

        return out
~~~
#### 3. Inception

~~~python
# Inception
class Inception(nn.Module):

    def __init__(self, in_channels, ch1x1, ch3x3red, ch3x3, ch5x5red, ch5x5, pool_proj):
        super(Inception, self).__init__()
        self.branch1 = BasicConv2d(in_channels, ch1x1, kernel_size=1)
        self.branch2 = nn.Sequential(
            BasicConv2d(in_channels, ch3x3red, kernel_size=1),
            BasicConv2d(ch3x3red, ch3x3, kernel_size=3, padding=1)
        )
        self.branch3 = nn.Sequential(
            BasicConv2d(in_channels, ch5x5red, kernel_size=1),
            BasicConv2d(ch5x5red, ch5x5, kernel_size=3, padding=1)
        )
        self.branch4 = nn.Sequential(
            nn.MaxPool2d(kernel_size=3, stride=1, padding=1, ceil_mode=True),
            BasicConv2d(in_channels, pool_proj, kernel_size=1)
        )

    def forward(self, x):
        branch1 = self.branch1(x)
        branch2 = self.branch2(x)
        branch3 = self.branch3(x)
        branch4 = self.branch4(x)
        outputs = [branch1, branch2, branch3, branch4]
        return torch.cat(outputs, 1)
~~~

#### 4. SENet

~~~python
class SEBasicBlock(nn.Module):  # renamed so that the super() call below resolves (was `BasicBlock`)
    expansion = 1

    def __init__(self, inplanes, planes, stride=1, downsample=None, reduction=16):
        super(SEBasicBlock, self).__init__()
        self.conv1 = conv3x3(inplanes, planes, stride)
        self.bn1 = nn.BatchNorm2d(planes)
        self.relu = nn.ReLU(inplace=True)
        self.conv2 = conv3x3(planes, planes, 1)
        self.bn2 = nn.BatchNorm2d(planes)
        self.se = SELayer(planes, reduction)
        self.downsample = downsample
        self.stride = stride

    def forward(self, x):
        residual = x
        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)

        out = self.conv2(out)
        out = self.bn2(out)
        out = self.se(out)

        if self.downsample is not None:
            residual = self.downsample(x)

        out += residual
        out = self.relu(out)

        return out
~~~

~~~python
class SEBottleneck(nn.Module):  # renamed for symmetry with SEBasicBlock
    expansion = 4

    def __init__(self, inplanes, planes, stride=1, downsample=None, reduction=16):
        super(SEBottleneck, self).__init__()
        self.conv1 = nn.Conv2d(inplanes, planes, kernel_size=1, bias=False)
        self.bn1 = nn.BatchNorm2d(planes)
        self.conv2 = nn.Conv2d(planes, planes, kernel_size=3, stride=stride,
                               padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(planes)
        self.conv3 = nn.Conv2d(planes, planes * 4, kernel_size=1, bias=False)
        self.bn3 = nn.BatchNorm2d(planes * 4)
        self.relu = nn.ReLU(inplace=True)
        self.se = SELayer(planes * 4, reduction)
        self.downsample = downsample
        self.stride = stride

    def forward(self, x):
        residual = x

        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)

        out = self.conv2(out)
        out = self.bn2(out)
        out = self.relu(out)

        out = self.conv3(out)
        out = self.bn3(out)
        out = self.se(out)

        if self.downsample is not None:
            residual = self.downsample(x)

        out += residual
        out = self.relu(out)

        return out
~~~
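The blocks above use an `SELayer` that is not defined in this file; a common squeeze-and-excitation implementation in the spirit of Hu et al.'s SENet, matching the `(planes, reduction)` call signature used above (treat it as a sketch):

~~~python
class SELayer(nn.Module):
    def __init__(self, channel, reduction=16):
        super(SELayer, self).__init__()
        self.avg_pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Sequential(
            nn.Linear(channel, channel // reduction, bias=False),
            nn.ReLU(inplace=True),
            nn.Linear(channel // reduction, channel, bias=False),
            nn.Sigmoid()
        )

    def forward(self, x):
        b, c, _, _ = x.size()
        y = self.avg_pool(x).view(b, c)   # squeeze: one value per channel
        y = self.fc(y).view(b, c, 1, 1)   # excitation: per-channel gates in (0, 1)
        return x * y.expand_as(x)         # reweight the feature maps
~~~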
#### 5. ShuffleNet

~~~python
# ShufflenetV1
class ShufflenetUnit(nn.Module):
    expansion = 4
    def __init__(self, inplanes, planes, stride=1, downsample=None, flag=False):
        super(ShufflenetUnit, self).__init__()
        self.downsample = downsample
        group_num = 3
        self.flag = flag
        if self.flag:
            self.conv1 = nn.Conv2d(inplanes, planes, kernel_size=1, groups=1, bias=False)
        else:
            self.conv1 = nn.Conv2d(inplanes, planes, kernel_size=1, groups=group_num, bias=False)
        self.bn1 = nn.BatchNorm2d(planes)

        self.conv2 = nn.Conv2d(planes, planes, kernel_size=3, stride=stride,
                               padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(planes)

        self.conv3 = nn.Conv2d(planes, planes * 4, kernel_size=1, groups=group_num, bias=False)
        self.bn3 = nn.BatchNorm2d(planes * 4)
        self.relu = nn.ReLU(inplace=True)

    def _shuffle(self, features, g):
        channels = features.size()[1]
        index = torch.from_numpy(np.asarray([i for i in range(channels)]))
        index = index.view(-1, g).t().contiguous()
        index = index.view(-1).to(features.device)  # follow the input's device instead of hard-coding .cuda()
        features = features[:, index]
        return features

    def forward(self, x):
        residual = x

        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)

        if not self.flag:
            out = self._shuffle(out, 3)

        out = self.conv2(out)
        out = self.bn2(out)

        out = self.conv3(out)
        out = self.bn3(out)

        if self.downsample is not None:
            residual = self.downsample(x)
            out = torch.cat((out, residual), 1)
        else:
            out += residual
        out = self.relu(out)

        return out
~~~

~~~python
# ShuffleUnit V2
class ShuffleUnit(nn.Module):
    def __init__(self, inplanes, planes, stride=1, downsample=None):
        super(ShuffleUnit, self).__init__()  # was `ShufflenetBlockV2`, an undefined name
        self.downsample = downsample

        if not self.downsample:  #---if not downsample, then channel split, so the channel become half
            inplanes = inplanes // 2
            planes = planes // 2

        self.conv1x1_1 = conv1x1(in_channels=inplanes, out_channels=planes)
        self.conv1x1_1_bn = nn.BatchNorm2d(planes)

        # conv3x3's parameters are named in_planes/out_planes, so pass them positionally
        self.dwconv3x3 = conv3x3(planes, planes, stride=stride, groups=planes)
        self.dwconv3x3_bn = nn.BatchNorm2d(planes)

        self.conv1x1_2 = conv1x1(in_channels=planes, out_channels=planes)
        self.conv1x1_2_bn = nn.BatchNorm2d(planes)

        self.relu = nn.ReLU(inplace=True)

    def _channel_split(self, features, ratio=0.5):
        size = features.size()[1]
        split_idx = int(size * ratio)
        return features[:, :split_idx], features[:, split_idx:]

    def _channel_shuffle(self, features, g=2):
        channels = features.size()[1]
        index = torch.from_numpy(np.asarray([i for i in range(channels)]))
        index = index.view(-1, g).t().contiguous()
        index = index.view(-1).to(features.device)  # same device fix as above
        features = features[:, index]
        return features

    def forward(self, x):
        if self.downsample:
            # x1 = x.clone() #----deep copy x, so where x2 is modified, x1 not be affected
            x1 = x
            x2 = x
        else:
            x1, x2 = self._channel_split(x)

        #----right branch-----
        x2 = self.conv1x1_1(x2)
        x2 = self.conv1x1_1_bn(x2)
        x2 = self.relu(x2)

        x2 = self.dwconv3x3(x2)
        x2 = self.dwconv3x3_bn(x2)

        x2 = self.conv1x1_2(x2)
        x2 = self.conv1x1_2_bn(x2)
        x2 = self.relu(x2)

        #---left branch-------
        if self.downsample:
            x1 = self.downsample(x1)

        x = torch.cat([x1, x2], 1)
        x = self._channel_shuffle(x)
        return x
~~~
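The index gymnastics inside `_channel_shuffle` are easier to see on a toy tensor; with 8 channels and g=2 groups the interleaving looks like this (illustrative snippet):

~~~python
import torch

idx = torch.arange(8).view(-1, 2).t().contiguous().view(-1)
print(idx)  # tensor([0, 2, 4, 6, 1, 3, 5, 7]) -- channels of the two groups interleaved
~~~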
#### 6. MobileNet V2

~~~python
# Mobilenet V2
class InvertedResidual(nn.Module):
    def __init__(self, inp, oup, stride, expand_ratio):
        super(InvertedResidual, self).__init__()
        self.stride = stride
        self.use_res_connect = self.stride == 1 and inp == oup
        self.conv = nn.Sequential(
            # pw
            nn.Conv2d(inp, inp * expand_ratio, 1, 1, 0, bias=False),
            nn.BatchNorm2d(inp * expand_ratio),
            nn.ReLU6(inplace=True),
            # dw
            nn.Conv2d(inp * expand_ratio, inp * expand_ratio, 3, stride, 1, groups=inp * expand_ratio, bias=False),
            nn.BatchNorm2d(inp * expand_ratio),
            nn.ReLU6(inplace=True),
            # pw-linear
            nn.Conv2d(inp * expand_ratio, oup, 1, 1, 0, bias=False),
            nn.BatchNorm2d(oup),
        )

    def forward(self, x):
        if self.use_res_connect:
            return x + self.conv(x)
        else:
            return self.conv(x)
~~~
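A quick shape check of the block above (assumes the imports at the top of this document and the `InvertedResidual` class just defined):

~~~python
block = InvertedResidual(inp=32, oup=32, stride=1, expand_ratio=6)
x = torch.randn(1, 32, 56, 56)
print(block(x).shape)  # torch.Size([1, 32, 56, 56]) -- residual connection active
~~~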
--------------------------------------------------------------------------------
/appendix/module/models/fpn.py:
--------------------------------------------------------------------------------
'''RetinaFPN in PyTorch.'''
import torch
import torch.nn as nn
import torch.nn.functional as F

from torch.autograd import Variable


class Bottleneck(nn.Module):
    expansion = 4

    def __init__(self, in_planes, planes, stride=1):
        super(Bottleneck, self).__init__()
        self.conv1 = nn.Conv2d(in_planes, planes, kernel_size=1, bias=False)
        self.bn1 = nn.BatchNorm2d(planes)
        self.conv2 = nn.Conv2d(planes, planes, kernel_size=3, stride=stride, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(planes)
        self.conv3 = nn.Conv2d(planes, self.expansion*planes, kernel_size=1, bias=False)
        self.bn3 = nn.BatchNorm2d(self.expansion*planes)

        self.downsample = nn.Sequential()
        if stride != 1 or in_planes != self.expansion*planes:
            self.downsample = nn.Sequential(
                nn.Conv2d(in_planes, self.expansion*planes, kernel_size=1, stride=stride, bias=False),
                nn.BatchNorm2d(self.expansion*planes)
            )

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = F.relu(self.bn2(self.conv2(out)))
        out = self.bn3(self.conv3(out))
        out += self.downsample(x)
        out = F.relu(out)
        return out


class FPN(nn.Module):
    def __init__(self, block, num_blocks):
        super(FPN, self).__init__()
        self.in_planes = 64

        self.conv1 = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3, bias=False)
        self.bn1 = nn.BatchNorm2d(64)

        # Bottom-up layers
        self.layer1 = self._make_layer(block,  64, num_blocks[0], stride=1)
        self.layer2 = self._make_layer(block, 128, num_blocks[1], stride=2)
        self.layer3 = self._make_layer(block, 256, num_blocks[2], stride=2)
        self.layer4 = self._make_layer(block, 512, num_blocks[3], stride=2)
        self.conv6 = nn.Conv2d(2048, 256, kernel_size=3, stride=2, padding=1)
        self.conv7 = nn.Conv2d( 256, 256, kernel_size=3, stride=2, padding=1)

        # Lateral layers
        self.latlayer1 = nn.Conv2d(2048, 256, kernel_size=1, stride=1, padding=0)
        self.latlayer2 = nn.Conv2d(1024, 256, kernel_size=1, stride=1, padding=0)
        self.latlayer3 = nn.Conv2d( 512, 256, kernel_size=1, stride=1, padding=0)

        # Top-down layers
        self.toplayer1 = nn.Conv2d(256, 256, kernel_size=3, stride=1, padding=1)
        self.toplayer2 = nn.Conv2d(256, 256, kernel_size=3, stride=1, padding=1)

    def _make_layer(self, block, planes, num_blocks, stride):
        strides = [stride] + [1]*(num_blocks-1)
        layers = []
        for stride in strides:
            layers.append(block(self.in_planes, planes, stride))
            self.in_planes = planes * block.expansion
        return nn.Sequential(*layers)

    def _upsample_add(self, x, y):
        '''Upsample and add two feature maps.
        Args:
          x: (Variable) top feature map to be upsampled.
          y: (Variable) lateral feature map.
        Returns:
          (Variable) added feature map.
        Note in PyTorch, when input size is odd, the upsampled feature map
        with `F.upsample(..., scale_factor=2, mode='nearest')`
        maybe not equal to the lateral feature map size.
        e.g.
        original input size: [N,_,15,15] ->
        conv2d feature map size: [N,_,8,8] ->
        upsampled feature map size: [N,_,16,16]
        So we choose bilinear upsample which supports arbitrary output sizes.
        '''
        _,_,H,W = y.size()
        return F.upsample(x, size=(H,W), mode='bilinear') + y

    def forward(self, x):
        # Bottom-up
        c1 = F.relu(self.bn1(self.conv1(x)))
        c1 = F.max_pool2d(c1, kernel_size=3, stride=2, padding=1)
        c2 = self.layer1(c1)
        c3 = self.layer2(c2)
        c4 = self.layer3(c3)
        c5 = self.layer4(c4)
        p6 = self.conv6(c5)
        p7 = self.conv7(F.relu(p6))
        # Top-down
        p5 = self.latlayer1(c5)
        p4 = self._upsample_add(p5, self.latlayer2(c4))
        p4 = self.toplayer1(p4)
        p3 = self._upsample_add(p4, self.latlayer3(c3))
        p3 = self.toplayer2(p3)
        return p3, p4, p5, p6, p7


def FPN50():
    return FPN(Bottleneck, [3,4,6,3])

def FPN101():
    return FPN(Bottleneck, [3,4,23,3])  # ResNet-101 layout; the original [2,4,23,3] was a typo


def test():
    net = FPN50()
    fms = net(Variable(torch.randn(1,3,600,300)))
    for fm in fms:
        print(fm.size())

# test()
--------------------------------------------------------------------------------
/appendix/module/models/mnasnet.py:
--------------------------------------------------------------------------------
1 | import math
2 | 
3 | import torch
4 | import torch.nn as nn
5 | 
6 | __all__ = ['MNASNet', 'mnasnet0_5', 'mnasnet0_75', 'mnasnet1_0', 'mnasnet1_3']
7 | 
8 | _MODEL_URLS = {
9 |     "mnasnet0_5":
10 |     "https://download.pytorch.org/models/mnasnet0.5_top1_67.592-7c6cb539b9.pth",
11 |     "mnasnet0_75": None,
12 |     "mnasnet1_0":
13 |     "https://download.pytorch.org/models/mnasnet1.0_top1_73.512-f206786ef8.pth",
14 |     "mnasnet1_3": None
15 | }
16 | 
17 | # Paper suggests 0.9997 momentum, for TensorFlow. Equivalent PyTorch momentum is
18 | # 1.0 - tensorflow.
19 | _BN_MOMENTUM = 1 - 0.9997
20 | 
21 | 
22 | class _InvertedResidual(nn.Module):
23 | 
24 |     def __init__(self, in_ch, out_ch, kernel_size, stride, expansion_factor,
25 |                  bn_momentum=0.1):
26 |         super(_InvertedResidual, self).__init__()
27 |         assert stride in [1, 2]
28 |         assert kernel_size in [3, 5]
29 |         mid_ch = in_ch * expansion_factor
30 |         self.apply_residual = (in_ch == out_ch and stride == 1)
31 |         self.layers = nn.Sequential(
32 |             # Pointwise
33 |             nn.Conv2d(in_ch, mid_ch, 1, bias=False),
34 |             nn.BatchNorm2d(mid_ch, momentum=bn_momentum),
35 |             nn.ReLU(inplace=True),
36 |             # Depthwise
37 |             nn.Conv2d(mid_ch, mid_ch, kernel_size, padding=kernel_size // 2,
38 |                       stride=stride, groups=mid_ch, bias=False),
39 |             nn.BatchNorm2d(mid_ch, momentum=bn_momentum),
40 |             nn.ReLU(inplace=True),
41 |             # Linear pointwise. Note that there's no activation.
42 |             nn.Conv2d(mid_ch, out_ch, 1, bias=False),
43 |             nn.BatchNorm2d(out_ch, momentum=bn_momentum))
44 | 
45 |     def forward(self, input):
46 |         if self.apply_residual:
47 |             return self.layers(input) + input
48 |         else:
49 |             return self.layers(input)
50 | 
51 | 
52 | def _stack(in_ch, out_ch, kernel_size, stride, exp_factor, repeats,
53 |            bn_momentum):
54 |     """ Creates a stack of inverted residuals. """
55 |     assert repeats >= 1
56 |     # First one has no skip, because feature map size changes.
57 |     first = _InvertedResidual(in_ch, out_ch, kernel_size, stride, exp_factor,
58 |                               bn_momentum=bn_momentum)
59 |     remaining = []
60 |     for _ in range(1, repeats):
61 |         remaining.append(
62 |             _InvertedResidual(out_ch, out_ch, kernel_size, 1, exp_factor,
63 |                               bn_momentum=bn_momentum))
64 |     return nn.Sequential(first, *remaining)
65 | 
66 | 
67 | def _round_to_multiple_of(val, divisor, round_up_bias=0.9):
68 |     """ Asymmetric rounding to make `val` divisible by `divisor`. With default
69 |     bias, will round up, unless the number is no more than 10% greater than the
70 |     smaller divisible value, i.e. (83, 8) -> 80, but (84, 8) -> 88. """
71 |     assert 0.0 < round_up_bias < 1.0
72 |     new_val = max(divisor, int(val + divisor / 2) // divisor * divisor)
73 |     return new_val if new_val >= round_up_bias * val else new_val + divisor
74 | 
75 | 
76 | def _scale_depths(depths, alpha):
77 |     """ Scales tensor depths as in reference MobileNet code, prefers rounding up
78 |     rather than down. """
79 |     return [_round_to_multiple_of(depth * alpha, 8) for depth in depths]
80 | 
81 | 
82 | class MNASNet(torch.nn.Module):
83 |     """ MNASNet, as described in https://arxiv.org/pdf/1807.11626.pdf.
84 |     >>> model = MNASNet(alpha=1.0, num_classes=1000)
85 |     >>> x = torch.rand(1, 3, 224, 224)
86 |     >>> y = model(x)
87 |     >>> y.dim()
88 |     2
89 |     >>> y.nelement()
90 |     1000
91 |     """
92 | 
93 |     def __init__(self, alpha, num_classes=1000, dropout=0.2):
94 |         super(MNASNet, self).__init__()
95 |         depths = _scale_depths([24, 40, 80, 96, 192, 320], alpha)
96 |         layers = [
97 |             # First layer: regular conv.
98 |             nn.Conv2d(3, 32, 3, padding=1, stride=2, bias=False),
99 |             nn.BatchNorm2d(32, momentum=_BN_MOMENTUM),
100 |             nn.ReLU(inplace=True),
101 |             # Depthwise separable, no skip.
102 |             nn.Conv2d(32, 32, 3, padding=1, stride=1, groups=32, bias=False),
103 |             nn.BatchNorm2d(32, momentum=_BN_MOMENTUM),
104 |             nn.ReLU(inplace=True),
105 |             nn.Conv2d(32, 16, 1, padding=0, stride=1, bias=False),
106 |             nn.BatchNorm2d(16, momentum=_BN_MOMENTUM),
107 |             # MNASNet blocks: stacks of inverted residuals.
108 | _stack(16, depths[0], 3, 2, 3, 3, _BN_MOMENTUM), 109 | _stack(depths[0], depths[1], 5, 2, 3, 3, _BN_MOMENTUM), 110 | _stack(depths[1], depths[2], 5, 2, 6, 3, _BN_MOMENTUM), 111 | _stack(depths[2], depths[3], 3, 1, 6, 2, _BN_MOMENTUM), 112 | _stack(depths[3], depths[4], 5, 2, 6, 4, _BN_MOMENTUM), 113 | _stack(depths[4], depths[5], 3, 1, 6, 1, _BN_MOMENTUM), 114 | # Final mapping to classifier input. 115 | nn.Conv2d(depths[5], 1280, 1, padding=0, stride=1, bias=False), 116 | nn.BatchNorm2d(1280, momentum=_BN_MOMENTUM), 117 | nn.ReLU(inplace=True), 118 | ] 119 | self.layers = nn.Sequential(*layers) 120 | self.classifier = nn.Sequential(nn.Dropout(p=dropout, inplace=True), 121 | nn.Linear(1280, num_classes)) 122 | self._initialize_weights() 123 | 124 | def forward(self, x): 125 | x = self.layers(x) 126 | # Equivalent to global avgpool and removing H and W dimensions. 127 | x = x.mean([2, 3]) 128 | return self.classifier(x) 129 | 130 | def _initialize_weights(self): 131 | for m in self.modules(): 132 | if isinstance(m, nn.Conv2d): 133 | nn.init.kaiming_normal_(m.weight, mode="fan_out", 134 | nonlinearity="relu") 135 | if m.bias is not None: 136 | nn.init.zeros_(m.bias) 137 | elif isinstance(m, nn.BatchNorm2d): 138 | nn.init.ones_(m.weight) 139 | nn.init.zeros_(m.bias) 140 | elif isinstance(m, nn.Linear): 141 | nn.init.normal_(m.weight, 0.01) 142 | nn.init.zeros_(m.bias) 143 | 144 | 145 | def mnasnet0_5(pretrained=False, progress=True, **kwargs): 146 | """MNASNet with depth multiplier of 0.5 from 147 | `"MnasNet: Platform-Aware Neural Architecture Search for Mobile" 148 | `_. 149 | Args: 150 | pretrained (bool): If True, returns a model pre-trained on ImageNet 151 | progress (bool): If True, displays a progress bar of the download to stderr 152 | """ 153 | model = MNASNet(0.5, **kwargs) 154 | return model 155 | 156 | 157 | def mnasnet0_75(pretrained=False, progress=True, **kwargs): 158 | """MNASNet with depth multiplier of 0.75 from 159 | `"MnasNet: Platform-Aware Neural Architecture Search for Mobile" 160 | `_. 161 | Args: 162 | pretrained (bool): If True, returns a model pre-trained on ImageNet 163 | progress (bool): If True, displays a progress bar of the download to stderr 164 | """ 165 | model = MNASNet(0.75, **kwargs) 166 | return model 167 | 168 | 169 | def mnasnet1_0(pretrained=False, progress=True, **kwargs): 170 | """MNASNet with depth multiplier of 1.0 from 171 | `"MnasNet: Platform-Aware Neural Architecture Search for Mobile" 172 | `_. 173 | Args: 174 | pretrained (bool): If True, returns a model pre-trained on ImageNet 175 | progress (bool): If True, displays a progress bar of the download to stderr 176 | """ 177 | model = MNASNet(1.0, **kwargs) 178 | return model 179 | 180 | 181 | def mnasnet1_3(pretrained=False, progress=True, **kwargs): 182 | """MNASNet with depth multiplier of 1.3 from 183 | `"MnasNet: Platform-Aware Neural Architecture Search for Mobile" 184 | `_. 185 | Args: 186 | pretrained (bool): If True, returns a model pre-trained on ImageNet 187 | progress (bool): If True, displays a progress bar of the download to stderr 188 | """ 189 | model = MNASNet(1.3, **kwargs) 190 | return model -------------------------------------------------------------------------------- /appendix/module/models/mobilenet_v1.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | # coding: utf-8 3 | 4 | from __future__ import division 5 | """ 6 | Creates a MobileNet Model as defined in: 7 | Andrew G. 
Howard Menglong Zhu Bo Chen, et.al. (2017). 8 | MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications. 9 | Copyright (c) Yang Lu, 2017 10 | 11 | Modified By cleardusk 12 | """ 13 | import math 14 | import torch.nn as nn 15 | 16 | __all__ = ['mobilenet_2', 'mobilenet_1', 'mobilenet_075', 'mobilenet_05', 'mobilenet_025'] 17 | 18 | class DepthWiseBlock(nn.Module): 19 | def __init__(self, inplanes, planes, stride=1, prelu=False): 20 | super(DepthWiseBlock, self).__init__() 21 | inplanes, planes = int(inplanes), int(planes) 22 | self.conv_dw = nn.Conv2d(inplanes, inplanes, kernel_size=3, padding=1, stride=stride, groups=inplanes, 23 | bias=False) 24 | self.bn_dw = nn.BatchNorm2d(inplanes) 25 | self.conv_sep = nn.Conv2d(inplanes, planes, kernel_size=1, stride=1, padding=0, bias=False) 26 | self.bn_sep = nn.BatchNorm2d(planes) 27 | if prelu: 28 | self.relu = nn.PReLU() 29 | else: 30 | self.relu = nn.ReLU(inplace=True) 31 | 32 | def forward(self, x): 33 | out = self.conv_dw(x) 34 | out = self.bn_dw(out) 35 | out = self.relu(out) 36 | 37 | out = self.conv_sep(out) 38 | out = self.bn_sep(out) 39 | out = self.relu(out) 40 | 41 | return out 42 | 43 | 44 | class MobileNet(nn.Module): 45 | def __init__(self, widen_factor=1.0, num_classes=1000, prelu=False, input_channel=3): 46 | """ Constructor 47 | Args: 48 | widen_factor: config of widen_factor 49 | num_classes: number of classes 50 | """ 51 | super(MobileNet, self).__init__() 52 | 53 | block = DepthWiseBlock 54 | self.conv1 = nn.Conv2d(input_channel, int(32 * widen_factor), kernel_size=3, stride=2, padding=1, 55 | bias=False) 56 | 57 | self.bn1 = nn.BatchNorm2d(int(32 * widen_factor)) 58 | if prelu: 59 | self.relu = nn.PReLU() 60 | else: 61 | self.relu = nn.ReLU(inplace=True) 62 | 63 | self.dw2_1 = block(32 * widen_factor, 64 * widen_factor, prelu=prelu) 64 | self.dw2_2 = block(64 * widen_factor, 128 * widen_factor, stride=2, prelu=prelu) 65 | 66 | self.dw3_1 = block(128 * widen_factor, 128 * widen_factor, prelu=prelu) 67 | self.dw3_2 = block(128 * widen_factor, 256 * widen_factor, stride=2, prelu=prelu) 68 | 69 | self.dw4_1 = block(256 * widen_factor, 256 * widen_factor, prelu=prelu) 70 | self.dw4_2 = block(256 * widen_factor, 512 * widen_factor, stride=2, prelu=prelu) 71 | 72 | self.dw5_1 = block(512 * widen_factor, 512 * widen_factor, prelu=prelu) 73 | self.dw5_2 = block(512 * widen_factor, 512 * widen_factor, prelu=prelu) 74 | self.dw5_3 = block(512 * widen_factor, 512 * widen_factor, prelu=prelu) 75 | self.dw5_4 = block(512 * widen_factor, 512 * widen_factor, prelu=prelu) 76 | self.dw5_5 = block(512 * widen_factor, 512 * widen_factor, prelu=prelu) 77 | self.dw5_6 = block(512 * widen_factor, 1024 * widen_factor, stride=2, prelu=prelu) 78 | 79 | self.dw6 = block(1024 * widen_factor, 1024 * widen_factor, prelu=prelu) 80 | 81 | self.avgpool = nn.AdaptiveAvgPool2d(1) 82 | self.fc = nn.Linear(int(1024 * widen_factor), num_classes) 83 | 84 | for m in self.modules(): 85 | if isinstance(m, nn.Conv2d): 86 | n = m.kernel_size[0] * m.kernel_size[1] * m.out_channels 87 | m.weight.data.normal_(0, math.sqrt(2. 
/ n)) 88 | elif isinstance(m, nn.BatchNorm2d): 89 | m.weight.data.fill_(1) 90 | m.bias.data.zero_() 91 | 92 | def forward(self, x): 93 | x = self.conv1(x) 94 | x = self.bn1(x) 95 | x = self.relu(x) 96 | 97 | x = self.dw2_1(x) 98 | x = self.dw2_2(x) 99 | x = self.dw3_1(x) 100 | x = self.dw3_2(x) 101 | x = self.dw4_1(x) 102 | x = self.dw4_2(x) 103 | x = self.dw5_1(x) 104 | x = self.dw5_2(x) 105 | x = self.dw5_3(x) 106 | x = self.dw5_4(x) 107 | x = self.dw5_5(x) 108 | x = self.dw5_6(x) 109 | x = self.dw6(x) 110 | 111 | x = self.avgpool(x) 112 | x = x.view(x.size(0), -1) 113 | x = self.fc(x) 114 | 115 | return x 116 | 117 | 118 | def mobilenet(widen_factor=1.0, num_classes=1000): 119 | """ 120 | Construct MobileNet. 121 | widen_factor=1.0 for mobilenet_1 122 | widen_factor=0.75 for mobilenet_075 123 | widen_factor=0.5 for mobilenet_05 124 | widen_factor=0.25 for mobilenet_025 125 | """ 126 | model = MobileNet(widen_factor=widen_factor, num_classes=num_classes) 127 | return model 128 | 129 | 130 | def mobilenet_2(num_classes=62, input_channel=3): 131 | model = MobileNet(widen_factor=2.0, num_classes=num_classes, input_channel=input_channel) 132 | return model 133 | 134 | 135 | def mobilenet_1(num_classes=62, input_channel=3): 136 | model = MobileNet(widen_factor=1.0, num_classes=num_classes, input_channel=input_channel) 137 | return model 138 | 139 | 140 | def mobilenet_075(num_classes=62, input_channel=3): 141 | model = MobileNet(widen_factor=0.75, num_classes=num_classes, input_channel=input_channel) 142 | return model 143 | 144 | 145 | def mobilenet_05(num_classes=62, input_channel=3): 146 | model = MobileNet(widen_factor=0.5, num_classes=num_classes, input_channel=input_channel) 147 | return model 148 | 149 | 150 | def mobilenet_025(num_classes=62, input_channel=3): 151 | model = MobileNet(widen_factor=0.25, num_classes=num_classes, input_channel=input_channel) 152 | return model 153 | -------------------------------------------------------------------------------- /appendix/module/models/resnet.py: -------------------------------------------------------------------------------- 1 | import torch.nn as nn 2 | import torch.utils.model_zoo as model_zoo 3 | 4 | 5 | __all__ = ['ResNet', 'resnet18', 'resnet34', 'resnet50', 'resnet101', 'resnet152'] 6 | 7 | 8 | model_urls = { 9 | 'resnet18': 'https://download.pytorch.org/models/resnet18-5c106cde.pth', 10 | 'resnet34': 'https://download.pytorch.org/models/resnet34-333f7ec4.pth', 11 | 'resnet50': 'https://download.pytorch.org/models/resnet50-19c8e357.pth', 12 | 'resnet101': 'https://download.pytorch.org/models/resnet101-5d3b4d8f.pth', 13 | 'resnet152': 'https://download.pytorch.org/models/resnet152-b121ed2d.pth', 14 | } 15 | 16 | 17 | def conv3x3(in_planes, out_planes, stride=1): 18 | """3x3 convolution with padding""" 19 | return nn.Conv2d(in_planes, out_planes, kernel_size=3, stride=stride, 20 | padding=1, bias=False) 21 | 22 | 23 | def conv1x1(in_planes, out_planes, stride=1): 24 | """1x1 convolution""" 25 | return nn.Conv2d(in_planes, out_planes, kernel_size=1, stride=stride, bias=False) 26 | 27 | 28 | class BasicBlock(nn.Module): 29 | expansion = 1 30 | 31 | def __init__(self, inplanes, planes, stride=1, downsample=None): 32 | super(BasicBlock, self).__init__() 33 | self.conv1 = conv3x3(inplanes, planes, stride) 34 | self.bn1 = nn.BatchNorm2d(planes) 35 | self.relu = nn.ReLU(inplace=True) 36 | self.conv2 = conv3x3(planes, planes) 37 | self.bn2 = nn.BatchNorm2d(planes) 38 | self.downsample = downsample 39 | self.stride = stride 40 | 41 | def 
forward(self, x): 42 | identity = x 43 | 44 | out = self.conv1(x) 45 | out = self.bn1(out) 46 | out = self.relu(out) 47 | 48 | out = self.conv2(out) 49 | out = self.bn2(out) 50 | 51 | if self.downsample is not None: 52 | identity = self.downsample(x) 53 | 54 | out += identity 55 | out = self.relu(out) 56 | 57 | return out 58 | 59 | 60 | class Bottleneck(nn.Module): 61 | expansion = 4 62 | 63 | def __init__(self, inplanes, planes, stride=1, downsample=None): 64 | super(Bottleneck, self).__init__() 65 | self.conv1 = conv1x1(inplanes, planes) 66 | self.bn1 = nn.BatchNorm2d(planes) 67 | self.conv2 = conv3x3(planes, planes, stride) 68 | self.bn2 = nn.BatchNorm2d(planes) 69 | self.conv3 = conv1x1(planes, planes * self.expansion) 70 | self.bn3 = nn.BatchNorm2d(planes * self.expansion) 71 | self.relu = nn.ReLU(inplace=True) 72 | self.downsample = downsample 73 | self.stride = stride 74 | 75 | def forward(self, x): 76 | identity = x 77 | 78 | out = self.conv1(x) 79 | out = self.bn1(out) 80 | out = self.relu(out) 81 | 82 | out = self.conv2(out) 83 | out = self.bn2(out) 84 | out = self.relu(out) 85 | 86 | out = self.conv3(out) 87 | out = self.bn3(out) 88 | 89 | if self.downsample is not None: 90 | identity = self.downsample(x) 91 | 92 | out += identity 93 | out = self.relu(out) 94 | 95 | return out 96 | 97 | 98 | class ResNet(nn.Module): 99 | 100 | def __init__(self, block, layers, num_classes=1000, zero_init_residual=False): 101 | super(ResNet, self).__init__() 102 | self.inplanes = 64 103 | self.conv1 = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3, 104 | bias=False) 105 | self.bn1 = nn.BatchNorm2d(64) 106 | self.relu = nn.ReLU(inplace=True) 107 | self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1) 108 | self.layer1 = self._make_layer(block, 64, layers[0]) 109 | self.layer2 = self._make_layer(block, 128, layers[1], stride=2) 110 | self.layer3 = self._make_layer(block, 256, layers[2], stride=2) 111 | self.layer4 = self._make_layer(block, 512, layers[3], stride=2) 112 | self.avgpool = nn.AdaptiveAvgPool2d((1, 1)) 113 | self.fc = nn.Linear(512 * block.expansion, num_classes) 114 | 115 | for m in self.modules(): 116 | if isinstance(m, nn.Conv2d): 117 | nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu') 118 | elif isinstance(m, nn.BatchNorm2d): 119 | nn.init.constant_(m.weight, 1) 120 | nn.init.constant_(m.bias, 0) 121 | 122 | # Zero-initialize the last BN in each residual branch, 123 | # so that the residual branch starts with zeros, and each residual block behaves like an identity. 
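# (Schematically, each block ends with out = relu(identity + bn_last(conv_last(...))); zeroing the last BN's gamma makes that branch output 0 at init, so the block starts as relu(identity). Using this file's own constructor, the switch is enabled via e.g. model = ResNet(Bottleneck, [3, 4, 6, 3], zero_init_residual=True).)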
124 | # This improves the model by 0.2~0.3% according to https://arxiv.org/abs/1706.02677 125 | if zero_init_residual: 126 | for m in self.modules(): 127 | if isinstance(m, Bottleneck): 128 | nn.init.constant_(m.bn3.weight, 0) 129 | elif isinstance(m, BasicBlock): 130 | nn.init.constant_(m.bn2.weight, 0) 131 | 132 | def _make_layer(self, block, planes, blocks, stride=1): 133 | downsample = None 134 | if stride != 1 or self.inplanes != planes * block.expansion: 135 | downsample = nn.Sequential( 136 | conv1x1(self.inplanes, planes * block.expansion, stride), 137 | nn.BatchNorm2d(planes * block.expansion), 138 | ) 139 | 140 | layers = [] 141 | layers.append(block(self.inplanes, planes, stride, downsample)) 142 | self.inplanes = planes * block.expansion 143 | for _ in range(1, blocks): 144 | layers.append(block(self.inplanes, planes)) 145 | 146 | return nn.Sequential(*layers) 147 | 148 | def forward(self, x): 149 | x = self.conv1(x) 150 | x = self.bn1(x) 151 | x = self.relu(x) 152 | x = self.maxpool(x) 153 | 154 | x = self.layer1(x) 155 | x = self.layer2(x) 156 | x = self.layer3(x) 157 | x = self.layer4(x) 158 | 159 | x = self.avgpool(x) 160 | x = x.view(x.size(0), -1) 161 | x = self.fc(x) 162 | 163 | return x 164 | 165 | 166 | def resnet18(pretrained=False, **kwargs): 167 | """Constructs a ResNet-18 model. 168 | Args: 169 | pretrained (bool): If True, returns a model pre-trained on ImageNet 170 | """ 171 | model = ResNet(BasicBlock, [2, 2, 2, 2], **kwargs) 172 | if pretrained: 173 | model.load_state_dict(model_zoo.load_url(model_urls['resnet18'])) 174 | return model 175 | 176 | 177 | def resnet34(pretrained=False, **kwargs): 178 | """Constructs a ResNet-34 model. 179 | Args: 180 | pretrained (bool): If True, returns a model pre-trained on ImageNet 181 | """ 182 | model = ResNet(BasicBlock, [3, 4, 6, 3], **kwargs) # was hard-coded to 62 classes, which ignored **kwargs and made the 1000-way pretrained weights fail to load 183 | if pretrained: 184 | model.load_state_dict(model_zoo.load_url(model_urls['resnet34'])) 185 | return model 186 | 187 | 188 | def resnet50(pretrained=False, **kwargs): 189 | """Constructs a ResNet-50 model. 190 | Args: 191 | pretrained (bool): If True, returns a model pre-trained on ImageNet 192 | """ 193 | model = ResNet(Bottleneck, [3, 4, 6, 3], **kwargs) # same fix as resnet34 194 | if pretrained: 195 | model.load_state_dict(model_zoo.load_url(model_urls['resnet50'])) 196 | return model 197 | 198 | 199 | def resnet101(pretrained=False, **kwargs): 200 | """Constructs a ResNet-101 model. 201 | Args: 202 | pretrained (bool): If True, returns a model pre-trained on ImageNet 203 | """ 204 | model = ResNet(Bottleneck, [3, 4, 23, 3], **kwargs) # same fix as resnet34 205 | if pretrained: 206 | model.load_state_dict(model_zoo.load_url(model_urls['resnet101'])) 207 | return model 208 | 209 | 210 | def resnet152(pretrained=False, **kwargs): 211 | """Constructs a ResNet-152 model.
212 | Args: 213 | pretrained (bool): If True, returns a model pre-trained on ImageNet 214 | """ 215 | model = ResNet(Bottleneck, [3, 8, 36, 3], **kwargs) 216 | if pretrained: 217 | model.load_state_dict(model_zoo.load_url(model_urls['resnet152'])) 218 | return model 219 | -------------------------------------------------------------------------------- /appendix/module/models/senet.py: -------------------------------------------------------------------------------- 1 | import torch.nn as nn 2 | import math 3 | import torch.utils.model_zoo as model_zoo 4 | 5 | 6 | __all__ = ['SENet', 'senet18', 'senet34', 'senet50', 'senet101', 'senet152'] 7 | 8 | def conv3x3(in_planes, out_planes, stride=1): 9 | """3x3 convolution with padding""" 10 | return nn.Conv2d(in_planes, out_planes, kernel_size=3, stride=stride, 11 | padding=1, bias=False) 12 | 13 | class BasicBlock(nn.Module): 14 | expansion = 1 15 | 16 | def __init__(self, inplanes, planes, stride=1, downsample=None): 17 | super(BasicBlock, self).__init__() 18 | self.conv1 = conv3x3(inplanes, planes, stride) 19 | self.bn1 = nn.BatchNorm2d(planes) 20 | self.relu = nn.ReLU(inplace=True) 21 | self.conv2 = conv3x3(planes, planes) 22 | self.bn2 = nn.BatchNorm2d(planes) 23 | self.downsample = downsample 24 | self.stride = stride 25 | 26 | if planes == 64: 27 | self.globalAvgPool = nn.AvgPool2d(56, stride=1) 28 | elif planes == 128: 29 | self.globalAvgPool = nn.AvgPool2d(28, stride=1) 30 | elif planes == 256: 31 | self.globalAvgPool = nn.AvgPool2d(14, stride=1) 32 | elif planes == 512: 33 | self.globalAvgPool = nn.AvgPool2d(7, stride=1) 34 | self.fc1 = nn.Linear(in_features=planes, out_features=round(planes / 16)) 35 | self.fc2 = nn.Linear(in_features=round(planes / 16), out_features=planes) 36 | self.sigmoid = nn.Sigmoid() 37 | 38 | def forward(self, x): 39 | residual = x 40 | 41 | out = self.conv1(x) 42 | out = self.bn1(out) 43 | out = self.relu(out) 44 | 45 | out = self.conv2(out) 46 | out = self.bn2(out) 47 | 48 | if self.downsample is not None: 49 | residual = self.downsample(x) 50 | 51 | original_out = out 52 | out = self.globalAvgPool(out) 53 | out = out.view(out.size(0), -1) 54 | out = self.fc1(out) 55 | out = self.relu(out) 56 | out = self.fc2(out) 57 | out = self.sigmoid(out) 58 | out = out.view(out.size(0), out.size(1), 1, 1) 59 | out = out * original_out 60 | 61 | out += residual 62 | out = self.relu(out) 63 | 64 | return out 65 | 66 | 67 | class Bottleneck(nn.Module): 68 | expansion = 4 69 | 70 | def __init__(self, inplanes, planes, stride=1, downsample=None): 71 | super(Bottleneck, self).__init__() 72 | self.conv1 = nn.Conv2d(inplanes, planes, kernel_size=1, bias=False) 73 | self.bn1 = nn.BatchNorm2d(planes) 74 | self.conv2 = nn.Conv2d(planes, planes, kernel_size=3, stride=stride, 75 | padding=1, bias=False) 76 | self.bn2 = nn.BatchNorm2d(planes) 77 | self.conv3 = nn.Conv2d(planes, planes * 4, kernel_size=1, bias=False) 78 | self.bn3 = nn.BatchNorm2d(planes * 4) 79 | self.relu = nn.ReLU(inplace=True) 80 | if planes == 64: 81 | self.globalAvgPool = nn.AvgPool2d(56, stride=1) 82 | elif planes == 128: 83 | self.globalAvgPool = nn.AvgPool2d(28, stride=1) 84 | elif planes == 256: 85 | self.globalAvgPool = nn.AvgPool2d(14, stride=1) 86 | elif planes == 512: 87 | self.globalAvgPool = nn.AvgPool2d(7, stride=1) 88 | self.fc1 = nn.Linear(in_features=planes * 4, out_features=round(planes / 4)) 89 | self.fc2 = nn.Linear(in_features=round(planes / 4), out_features=planes * 4) 90 | self.sigmoid = nn.Sigmoid() 91 | self.downsample = downsample 92 | 
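# Squeeze-and-Excitation gate: in forward() below, globalAvgPool squeezes each channel to a scalar, fc1 -> ReLU -> fc2 -> sigmoid maps those scalars to per-channel weights in (0, 1), and the residual branch is multiplied by them channel-wise before the identity is added.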
self.stride = stride 93 | 94 | def forward(self, x): 95 | residual = x 96 | 97 | out = self.conv1(x) 98 | out = self.bn1(out) 99 | out = self.relu(out) 100 | 101 | out = self.conv2(out) 102 | out = self.bn2(out) 103 | out = self.relu(out) 104 | 105 | out = self.conv3(out) 106 | out = self.bn3(out) 107 | 108 | if self.downsample is not None: 109 | residual = self.downsample(x) 110 | 111 | original_out = out 112 | out = self.globalAvgPool(out) 113 | out = out.view(out.size(0), -1) 114 | out = self.fc1(out) 115 | out = self.relu(out) 116 | out = self.fc2(out) 117 | out = self.sigmoid(out) 118 | out = out.view(out.size(0),out.size(1),1,1) 119 | out = out * original_out 120 | 121 | out += residual 122 | out = self.relu(out) 123 | 124 | return out 125 | 126 | 127 | class SENet(nn.Module): 128 | 129 | def __init__(self, block, layers, num_classes=1000): 130 | self.inplanes = 64 131 | super(SENet, self).__init__() 132 | self.conv1 = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3, 133 | bias=False) 134 | self.bn1 = nn.BatchNorm2d(64) 135 | self.relu = nn.ReLU(inplace=True) 136 | self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1) 137 | self.layer1 = self._make_layer(block, 64, layers[0]) 138 | self.layer2 = self._make_layer(block, 128, layers[1], stride=2) 139 | self.layer3 = self._make_layer(block, 256, layers[2], stride=2) 140 | self.layer4 = self._make_layer(block, 512, layers[3], stride=2) 141 | self.avgpool = nn.AvgPool2d(7, stride=1) 142 | self.fc = nn.Linear(512 * block.expansion, num_classes) 143 | 144 | for m in self.modules(): 145 | if isinstance(m, nn.Conv2d): 146 | n = m.kernel_size[0] * m.kernel_size[1] * m.out_channels 147 | m.weight.data.normal_(0, math.sqrt(2. / n)) 148 | elif isinstance(m, nn.BatchNorm2d): 149 | m.weight.data.fill_(1) 150 | m.bias.data.zero_() 151 | 152 | def _make_layer(self, block, planes, blocks, stride=1): 153 | downsample = None 154 | if stride != 1 or self.inplanes != planes * block.expansion: 155 | downsample = nn.Sequential( 156 | nn.Conv2d(self.inplanes, planes * block.expansion, 157 | kernel_size=1, stride=stride, bias=False), 158 | nn.BatchNorm2d(planes * block.expansion), 159 | ) 160 | 161 | layers = [] 162 | layers.append(block(self.inplanes, planes, stride, downsample)) 163 | self.inplanes = planes * block.expansion 164 | for i in range(1, blocks): 165 | layers.append(block(self.inplanes, planes)) 166 | 167 | return nn.Sequential(*layers) 168 | 169 | def forward(self, x): 170 | x = self.conv1(x) 171 | x = self.bn1(x) 172 | x = self.relu(x) 173 | x = self.maxpool(x) 174 | 175 | x = self.layer1(x) 176 | x = self.layer2(x) 177 | x = self.layer3(x) 178 | x = self.layer4(x) 179 | 180 | x = self.avgpool(x) 181 | x = x.view(x.size(0), -1) 182 | x = self.fc(x) 183 | 184 | return x 185 | 186 | def senet18(pretrained=False, **kwargs): 187 | """Constructs a ResNet-18 model. 188 | Args: 189 | pretrained (bool): If True, returns a model pre-trained on ImageNet 190 | """ 191 | model = SENet(BasicBlock, [2, 2, 2, 2], **kwargs) 192 | return model 193 | 194 | 195 | def senet34(pretrained=False, **kwargs): 196 | """Constructs a ResNet-34 model. 197 | Args: 198 | pretrained (bool): If True, returns a model pre-trained on ImageNet 199 | """ 200 | model = SENet(BasicBlock, [3, 4, 6, 3], **kwargs) 201 | return model 202 | 203 | 204 | def senet50(pretrained=False, **kwargs): 205 | """Constructs a ResNet-50 model. 
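(Despite the wording carried over from the ResNet constructors, this builds an SENet-50: the ResNet-50 layout assembled from the SE Bottleneck above; the same reading applies to the other senetXX helpers in this file.)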
206 | Args: 207 | pretrained (bool): If True, returns a model pre-trained on ImageNet 208 | """ 209 | model = SENet(Bottleneck, [3, 4, 6, 3], **kwargs) 210 | return model 211 | 212 | 213 | def senet101(pretrained=False, **kwargs): 214 | """Constructs a ResNet-101 model. 215 | Args: 216 | pretrained (bool): If True, returns a model pre-trained on ImageNet 217 | """ 218 | model = SENet(Bottleneck, [3, 4, 23, 3], **kwargs) 219 | return model 220 | 221 | 222 | def senet152(pretrained=False, **kwargs): 223 | """Constructs a ResNet-152 model. 224 | Args: 225 | pretrained (bool): If True, returns a model pre-trained on ImageNet 226 | """ 227 | model = SENet(Bottleneck, [3, 8, 36, 3], **kwargs) 228 | return model -------------------------------------------------------------------------------- /appendix/module/models/shufflenet_v2.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import torch.nn as nn 3 | # import torch.utils.model_zoo as model_zoo 4 | 5 | 6 | __all__ = [ 7 | 'ShuffleNetV2', 'shufflenet_v2_x0_5', 'shufflenet_v2_x1_0', 'shufflenet_v2_x1_5', 'shufflenet_v2_x2_0' 8 | ] 9 | 10 | # model_urls = { 11 | # 'shufflenetv2_x0.5': 'https://download.pytorch.org/models/shufflenetv2_x0.5-f707e7126e.pth', 12 | # 'shufflenetv2_x1.0': 'https://download.pytorch.org/models/shufflenetv2_x1-5666bf0f80.pth', 13 | # 'shufflenetv2_x1.5': None, 14 | # 'shufflenetv2_x2.0': None, 15 | # } 16 | 17 | 18 | def channel_shuffle(x, groups): 19 | batchsize, num_channels, height, width = x.data.size() 20 | channels_per_group = num_channels // groups 21 | 22 | # reshape 23 | x = x.view(batchsize, groups, 24 | channels_per_group, height, width) 25 | 26 | x = torch.transpose(x, 1, 2).contiguous() 27 | 28 | # flatten 29 | x = x.view(batchsize, -1, height, width) 30 | 31 | return x 32 | 33 | 34 | class InvertedResidual(nn.Module): 35 | def __init__(self, inp, oup, stride): 36 | super(InvertedResidual, self).__init__() 37 | 38 | if not (1 <= stride <= 3): 39 | raise ValueError('illegal stride value') 40 | self.stride = stride 41 | 42 | branch_features = oup // 2 43 | assert (self.stride != 1) or (inp == branch_features << 1) 44 | 45 | if self.stride > 1: 46 | self.branch1 = nn.Sequential( 47 | self.depthwise_conv(inp, inp, kernel_size=3, stride=self.stride, padding=1), 48 | nn.BatchNorm2d(inp), 49 | nn.Conv2d(inp, branch_features, kernel_size=1, stride=1, padding=0, bias=False), 50 | nn.BatchNorm2d(branch_features), 51 | nn.ReLU(inplace=True), 52 | ) 53 | 54 | self.branch2 = nn.Sequential( 55 | nn.Conv2d(inp if (self.stride > 1) else branch_features, 56 | branch_features, kernel_size=1, stride=1, padding=0, bias=False), 57 | nn.BatchNorm2d(branch_features), 58 | nn.ReLU(inplace=True), 59 | self.depthwise_conv(branch_features, branch_features, kernel_size=3, stride=self.stride, padding=1), 60 | nn.BatchNorm2d(branch_features), 61 | nn.Conv2d(branch_features, branch_features, kernel_size=1, stride=1, padding=0, bias=False), 62 | nn.BatchNorm2d(branch_features), 63 | nn.ReLU(inplace=True), 64 | ) 65 | 66 | @staticmethod 67 | def depthwise_conv(i, o, kernel_size, stride=1, padding=0, bias=False): 68 | return nn.Conv2d(i, o, kernel_size, stride, padding, bias=bias, groups=i) 69 | 70 | def forward(self, x): 71 | if self.stride == 1: 72 | x1, x2 = x.chunk(2, dim=1) 73 | out = torch.cat((x1, self.branch2(x2)), dim=1) 74 | else: 75 | out = torch.cat((self.branch1(x), self.branch2(x)), dim=1) 76 | 77 | out = channel_shuffle(out, 2) 78 | 79 | return out 80 | 81 | 82 | class 
ShuffleNetV2(nn.Module): 83 | def __init__(self, stages_repeats, stages_out_channels, num_classes=1000): 84 | super(ShuffleNetV2, self).__init__() 85 | 86 | if len(stages_repeats) != 3: 87 | raise ValueError('expected stages_repeats as list of 3 positive ints') 88 | if len(stages_out_channels) != 5: 89 | raise ValueError('expected stages_out_channels as list of 5 positive ints') 90 | self._stage_out_channels = stages_out_channels 91 | 92 | input_channels = 3 93 | output_channels = self._stage_out_channels[0] 94 | self.conv1 = nn.Sequential( 95 | nn.Conv2d(input_channels, output_channels, 3, 2, 1, bias=False), 96 | nn.BatchNorm2d(output_channels), 97 | nn.ReLU(inplace=True), 98 | ) 99 | input_channels = output_channels 100 | 101 | self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1) 102 | 103 | stage_names = ['stage{}'.format(i) for i in [2, 3, 4]] 104 | for name, repeats, output_channels in zip( 105 | stage_names, stages_repeats, self._stage_out_channels[1:]): 106 | seq = [InvertedResidual(input_channels, output_channels, 2)] 107 | for i in range(repeats - 1): 108 | seq.append(InvertedResidual(output_channels, output_channels, 1)) 109 | setattr(self, name, nn.Sequential(*seq)) 110 | input_channels = output_channels 111 | 112 | output_channels = self._stage_out_channels[-1] 113 | self.conv5 = nn.Sequential( 114 | nn.Conv2d(input_channels, output_channels, 1, 1, 0, bias=False), 115 | nn.BatchNorm2d(output_channels), 116 | nn.ReLU(inplace=True), 117 | ) 118 | 119 | self.fc = nn.Linear(output_channels, num_classes) 120 | 121 | def forward(self, x): 122 | x = self.conv1(x) 123 | x = self.maxpool(x) 124 | x = self.stage2(x) 125 | x = self.stage3(x) 126 | x = self.stage4(x) 127 | x = self.conv5(x) 128 | x = x.mean([2, 3]) # globalpool 129 | x = self.fc(x) 130 | return x 131 | 132 | 133 | def shufflenet_v2_x0_5(pretrained=False, progress=True, **kwargs): 134 | """ 135 | Constructs a ShuffleNetV2 with 0.5x output channels, as described in 136 | `"ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture Design" 137 | `_. 138 | Args: 139 | pretrained (bool): If True, returns a model pre-trained on ImageNet 140 | progress (bool): If True, displays a progress bar of the download to stderr 141 | """ 142 | return ShuffleNetV2([4, 8, 4], [24, 48, 96, 192, 1024], **kwargs) 143 | 144 | 145 | def shufflenet_v2_x1_0(pretrained=False, progress=True, **kwargs): 146 | """ 147 | Constructs a ShuffleNetV2 with 1.0x output channels, as described in 148 | `"ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture Design" 149 | `_. 150 | Args: 151 | pretrained (bool): If True, returns a model pre-trained on ImageNet 152 | progress (bool): If True, displays a progress bar of the download to stderr 153 | """ 154 | return ShuffleNetV2([4, 8, 4], [24, 116, 232, 464, 1024], **kwargs) 155 | 156 | 157 | def shufflenet_v2_x1_5(pretrained=False, progress=True, **kwargs): 158 | """ 159 | Constructs a ShuffleNetV2 with 1.5x output channels, as described in 160 | `"ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture Design" 161 | `_. 
162 | Args: 163 | pretrained (bool): If True, returns a model pre-trained on ImageNet 164 | progress (bool): If True, displays a progress bar of the download to stderr 165 | """ 166 | return ShuffleNetV2([4, 8, 4], [24, 176, 352, 704, 1024], **kwargs) 167 | 168 | 169 | def shufflenet_v2_x2_0(pretrained=False, progress=True, **kwargs): 170 | """ 171 | Constructs a ShuffleNetV2 with 2.0x output channels, as described in 172 | `"ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture Design" 173 | `_. 174 | Args: 175 | pretrained (bool): If True, returns a model pre-trained on ImageNet 176 | progress (bool): If True, displays a progress bar of the download to stderr 177 | """ 178 | return ShuffleNetV2([4, 8, 4], [24, 244, 488, 976, 2048], **kwargs) -------------------------------------------------------------------------------- /appendix/module/models/unet.py: -------------------------------------------------------------------------------- 1 | import _init_paths 2 | import torch 3 | import torch.nn as nn 4 | from layers import unetConv2, unetUp 5 | from utils import init_weights, count_param 6 | 7 | class UNet(nn.Module): 8 | 9 | def __init__(self, in_channels=1, n_classes=2, feature_scale=2, is_deconv=True, is_batchnorm=True): 10 | super(UNet, self).__init__() 11 | self.in_channels = in_channels 12 | self.feature_scale = feature_scale 13 | self.is_deconv = is_deconv 14 | self.is_batchnorm = is_batchnorm 15 | 16 | 17 | filters = [64, 128, 256, 512, 1024] 18 | filters = [int(x / self.feature_scale) for x in filters] 19 | 20 | # downsampling 21 | self.maxpool = nn.MaxPool2d(kernel_size=2) 22 | self.conv1 = unetConv2(self.in_channels, filters[0], self.is_batchnorm) 23 | self.conv2 = unetConv2(filters[0], filters[1], self.is_batchnorm) 24 | self.conv3 = unetConv2(filters[1], filters[2], self.is_batchnorm) 25 | self.conv4 = unetConv2(filters[2], filters[3], self.is_batchnorm) 26 | self.center = unetConv2(filters[3], filters[4], self.is_batchnorm) 27 | # upsampling 28 | self.up_concat4 = unetUp(filters[4], filters[3], self.is_deconv) 29 | self.up_concat3 = unetUp(filters[3], filters[2], self.is_deconv) 30 | self.up_concat2 = unetUp(filters[2], filters[1], self.is_deconv) 31 | self.up_concat1 = unetUp(filters[1], filters[0], self.is_deconv) 32 | # final conv (without any concat) 33 | self.final = nn.Conv2d(filters[0], n_classes, 1) 34 | 35 | # initialise weights 36 | for m in self.modules(): 37 | if isinstance(m, nn.Conv2d): 38 | init_weights(m, init_type='kaiming') 39 | elif isinstance(m, nn.BatchNorm2d): 40 | init_weights(m, init_type='kaiming') 41 | 42 | def forward(self, inputs): 43 | conv1 = self.conv1(inputs) # 16*512*512 44 | maxpool1 = self.maxpool(conv1) # 16*256*256 45 | 46 | conv2 = self.conv2(maxpool1) # 32*256*256 47 | maxpool2 = self.maxpool(conv2) # 32*128*128 48 | 49 | conv3 = self.conv3(maxpool2) # 64*128*128 50 | maxpool3 = self.maxpool(conv3) # 64*64*64 51 | 52 | conv4 = self.conv4(maxpool3) # 128*64*64 53 | maxpool4 = self.maxpool(conv4) # 128*32*32 54 | 55 | center = self.center(maxpool4) # 256*32*32 56 | up4 = self.up_concat4(center,conv4) # 128*64*64 57 | up3 = self.up_concat3(up4,conv3) # 64*128*128 58 | up2 = self.up_concat2(up3,conv2) # 32*256*256 59 | up1 = self.up_concat1(up2,conv1) # 16*512*512 60 | 61 | final = self.final(up1) 62 | 63 | return final 64 | 65 | if __name__ == '__main__': 66 | print('#### Test Case ###') 67 | from torch.autograd import Variable 68 | x = Variable(torch.rand(2,1,64,64)).cuda() 69 | model = UNet().cuda() 70 | param = 
count_param(model) 71 | y = model(x) 72 | print('Output shape:',y.shape) 73 | print('UNet totoal parameters: %.2fM (%d)'%(param/1e6,param)) -------------------------------------------------------------------------------- /appendix/production/distributed/pytorch-distributed-example-master/.gitignore: -------------------------------------------------------------------------------- 1 | # Byte-compiled / optimized / DLL files 2 | __pycache__/ 3 | *.py[cod] 4 | *$py.class 5 | 6 | # C extensions 7 | *.so 8 | 9 | # Distribution / packaging 10 | .Python 11 | env/ 12 | build/ 13 | develop-eggs/ 14 | dist/ 15 | downloads/ 16 | eggs/ 17 | .eggs/ 18 | lib/ 19 | lib64/ 20 | parts/ 21 | sdist/ 22 | var/ 23 | wheels/ 24 | *.egg-info/ 25 | .installed.cfg 26 | *.egg 27 | 28 | # PyInstaller 29 | # Usually these files are written by a python script from a template 30 | # before PyInstaller builds the exe, so as to inject date/other infos into it. 31 | *.manifest 32 | *.spec 33 | 34 | # Installer logs 35 | pip-log.txt 36 | pip-delete-this-directory.txt 37 | 38 | # Unit test / coverage reports 39 | htmlcov/ 40 | .tox/ 41 | .coverage 42 | .coverage.* 43 | .cache 44 | nosetests.xml 45 | coverage.xml 46 | *.cover 47 | .hypothesis/ 48 | 49 | # Translations 50 | *.mo 51 | *.pot 52 | 53 | # Django stuff: 54 | *.log 55 | local_settings.py 56 | 57 | # Flask stuff: 58 | instance/ 59 | .webassets-cache 60 | 61 | # Scrapy stuff: 62 | .scrapy 63 | 64 | # Sphinx documentation 65 | docs/_build/ 66 | 67 | # PyBuilder 68 | target/ 69 | 70 | # Jupyter Notebook 71 | .ipynb_checkpoints 72 | 73 | # pyenv 74 | .python-version 75 | 76 | # celery beat schedule file 77 | celerybeat-schedule 78 | 79 | # SageMath parsed files 80 | *.sage.py 81 | 82 | # dotenv 83 | .env 84 | 85 | # virtualenv 86 | .venv 87 | venv/ 88 | ENV/ 89 | 90 | # Spyder project settings 91 | .spyderproject 92 | .spyproject 93 | 94 | # Rope project settings 95 | .ropeproject 96 | 97 | # mkdocs documentation 98 | /site 99 | 100 | # mypy 101 | .mypy_cache/ 102 | 103 | data/ 104 | -------------------------------------------------------------------------------- /appendix/production/distributed/pytorch-distributed-example-master/LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2018 なるみ 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 
22 | -------------------------------------------------------------------------------- /appendix/production/distributed/pytorch-distributed-example-master/README.md: -------------------------------------------------------------------------------- 1 | # Pytorch Distributed Example 2 | 3 | If you are using previous version of PyTorch: 4 | 5 | - [v1.1.0](https://github.com/narumiruna/pytorch-distributed-example/tree/v1.1.0) 6 | - [v1.0.1](https://github.com/narumiruna/pytorch-distributed-example/tree/v1.0.1) 7 | - [v0.4.1](https://github.com/narumiruna/pytorch-distributed-example/tree/v0.4.1) 8 | 9 | ## Requirements 10 | 11 | - pytorch 12 | - torchvision 13 | 14 | ## References 15 | 16 | - [Distributed communication package - torch.distributed](http://pytorch.org/docs/master/distributed.html) 17 | - [Writing Distributed Applications with PyTorch](http://pytorch.org/tutorials/intermediate/dist_tuto.html) 18 | -------------------------------------------------------------------------------- /appendix/production/distributed/pytorch-distributed-example-master/mnist/Dockerfile: -------------------------------------------------------------------------------- 1 | FROM pytorch/pytorch:1.2-cuda10.0-cudnn7-runtime 2 | 3 | RUN pip install torchvision \ 4 | && rm -rf ~/.cache/pip 5 | 6 | ENV GLOO_SOCKET_IFNAME=eth0 7 | ENV NCCL_SOCKET_IFNAME=eth0 8 | 9 | WORKDIR /work 10 | RUN python -c "from torchvision import datasets;datasets.MNIST('data', download=True)" 11 | COPY main.py . 12 | -------------------------------------------------------------------------------- /appendix/production/distributed/pytorch-distributed-example-master/mnist/README.md: -------------------------------------------------------------------------------- 1 | # MNIST Example 2 | 3 | ```shell 4 | export GLOO_SOCKET_IFNAME=eth0 5 | ``` 6 | 7 | Rank 0 8 | ``` 9 | $ python3 main.py --init-method tcp://127.0.0.1:23456 --rank 0 --world-size 2 10 | ``` 11 | 12 | Rank 1 13 | ``` 14 | $ python3 main.py --init-method tcp://127.0.0.1:23456 --rank 1 --world-size 2 15 | ``` 16 | 17 | ## Use specific root directory for running example on single machine. 18 | 19 | Rank 0 20 | ``` 21 | $ python3 main.py --init-method tcp://127.0.0.1:23456 --rank 0 --world-size 2 --root data0 22 | ``` 23 | 24 | Rank 1 25 | ``` 26 | $ python3 main.py --init-method tcp://127.0.0.1:23456 --rank 1 --world-size 2 --root data1 27 | ``` 28 | 29 | ## Run in docker 30 | 31 | Install [docker](https://docs.docker.com/install/), [docker-compose](https://docs.docker.com/compose/install/) and [NVIDIA docker](https://github.com/NVIDIA/nvidia-docker) (if you want to run with GPU) 32 | 33 | ``` 34 | $ docker build --file Dockerfile --tag pytorch-distributed-example . 
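# the compose files below run two containers (rank0/rank1) from the image tagged above on a private bridge network; once up, they can be inspected with: docker-compose ps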
35 | $ docker-compose up 36 | For GPU 37 | $ docker-compose --file docker-compose-gpu.yml up 38 | ``` 39 | -------------------------------------------------------------------------------- /appendix/production/distributed/pytorch-distributed-example-master/mnist/docker-compose-gpu.yml: -------------------------------------------------------------------------------- 1 | version: "2.3" 2 | services: 3 | rank0: 4 | image: pytorch-distributed-example 5 | runtime: nvidia 6 | networks: 7 | bridge: 8 | ipv4_address: 10.1.0.10 9 | command: python -u main.py --backend nccl --init-method tcp://10.1.0.10:23456 --world-size 2 --rank 0 10 | rank1: 11 | image: pytorch-distributed-example 12 | runtime: nvidia 13 | networks: 14 | bridge: 15 | ipv4_address: 10.1.0.11 16 | command: python -u main.py --backend nccl --init-method tcp://10.1.0.10:23456 --world-size 2 --rank 1 17 | networks: 18 | bridge: 19 | driver: bridge 20 | ipam: 21 | config: 22 | - subnet: 10.1.0.0/16 23 | -------------------------------------------------------------------------------- /appendix/production/distributed/pytorch-distributed-example-master/mnist/docker-compose.yml: -------------------------------------------------------------------------------- 1 | version: "2.3" 2 | services: 3 | rank0: 4 | image: pytorch-distributed-example 5 | networks: 6 | bridge: 7 | ipv4_address: 10.1.0.10 8 | command: python -u main.py --init-method tcp://10.1.0.10:23456 --world-size 2 --rank 0 9 | rank1: 10 | image: pytorch-distributed-example 11 | networks: 12 | bridge: 13 | ipv4_address: 10.1.0.11 14 | command: python -u main.py --init-method tcp://10.1.0.10:23456 --world-size 2 --rank 1 15 | networks: 16 | bridge: 17 | driver: bridge 18 | ipam: 19 | config: 20 | - subnet: 10.1.0.0/16 21 | -------------------------------------------------------------------------------- /appendix/production/distributed/pytorch-distributed-example-master/mnist/main.py: -------------------------------------------------------------------------------- 1 | from __future__ import division, print_function 2 | 3 | import argparse 4 | import os 5 | 6 | import torch 7 | import torch.nn.functional as F 8 | from torch import distributed, nn 9 | from torch.utils import data 10 | from torchvision import datasets, transforms 11 | 12 | os.environ['NCCL_SOCKET_IFNAME'] = 'enp2s0' 13 | 14 | def distributed_is_initialized(): 15 | if distributed.is_available(): 16 | if distributed.is_initialized(): 17 | return True 18 | return False 19 | 20 | 21 | class Average(object): 22 | 23 | def __init__(self): 24 | self.sum = 0 25 | self.count = 0 26 | 27 | def __str__(self): 28 | return '{:.6f}'.format(self.average) 29 | 30 | @property 31 | def average(self): 32 | return self.sum / self.count 33 | 34 | def update(self, value, number): 35 | self.sum += value * number 36 | self.count += number 37 | 38 | 39 | class Accuracy(object): 40 | 41 | def __init__(self): 42 | self.correct = 0 43 | self.count = 0 44 | 45 | def __str__(self): 46 | return '{:.2f}%'.format(self.accuracy * 100) 47 | 48 | @property 49 | def accuracy(self): 50 | return self.correct / self.count 51 | 52 | @torch.no_grad() 53 | def update(self, output, target): 54 | pred = output.argmax(dim=1) 55 | correct = pred.eq(target).sum().item() 56 | 57 | self.correct += correct 58 | self.count += output.size(0) 59 | 60 | 61 | class Trainer(object): 62 | 63 | def __init__(self, model, optimizer, train_loader, test_loader, device): 64 | self.model = model 65 | self.optimizer = optimizer 66 | self.train_loader = train_loader 67 | 
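# NOTE: each process builds its own Trainer; when train_loader carries a DistributedSampler (see MNISTDataLoader below), every rank iterates a disjoint 1/world_size shard of the training set per epoch.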
self.test_loader = test_loader 68 | self.device = device 69 | 70 | def fit(self, epochs): 71 | for epoch in range(1, epochs + 1): 72 | train_loss, train_acc = self.train() 73 | test_loss, test_acc = self.evaluate() 74 | 75 | print( 76 | 'Epoch: {}/{},'.format(epoch, epochs), 77 | 'train loss: {}, train acc: {},'.format(train_loss, train_acc), 78 | 'test loss: {}, test acc: {}.'.format(test_loss, test_acc), 79 | ) 80 | 81 | def train(self): 82 | self.model.train() 83 | 84 | train_loss = Average() 85 | train_acc = Accuracy() 86 | 87 | for data, target in self.train_loader: 88 | data = data.to(self.device) 89 | target = target.to(self.device) 90 | 91 | output = self.model(data) 92 | loss = F.cross_entropy(output, target) 93 | 94 | self.optimizer.zero_grad() 95 | loss.backward() 96 | self.optimizer.step() 97 | 98 | train_loss.update(loss.item(), data.size(0)) 99 | train_acc.update(output, target) 100 | 101 | return train_loss, train_acc 102 | 103 | @torch.no_grad() 104 | def evaluate(self): 105 | self.model.eval() 106 | 107 | test_loss = Average() 108 | test_acc = Accuracy() 109 | 110 | for data, target in self.test_loader: 111 | data = data.to(self.device) 112 | target = target.to(self.device) 113 | 114 | output = self.model(data) 115 | loss = F.cross_entropy(output, target) 116 | 117 | test_loss.update(loss.item(), data.size(0)) 118 | test_acc.update(output, target) 119 | 120 | return test_loss, test_acc 121 | 122 | 123 | class Net(nn.Module): 124 | 125 | def __init__(self): 126 | super(Net, self).__init__() 127 | self.fc = nn.Linear(784, 10) 128 | 129 | def forward(self, x): 130 | return self.fc(x.view(x.size(0), -1)) 131 | 132 | 133 | class MNISTDataLoader(data.DataLoader): 134 | 135 | def __init__(self, root, batch_size, train=True): 136 | transform = transforms.Compose([ 137 | transforms.ToTensor(), 138 | transforms.Normalize((0.1307,), (0.3081,)), 139 | ]) 140 | 141 | dataset = datasets.MNIST(root, train=train, transform=transform, download=True) 142 | sampler = None 143 | if train and distributed_is_initialized(): 144 | sampler = data.DistributedSampler(dataset) 145 | 146 | super(MNISTDataLoader, self).__init__( 147 | dataset, 148 | batch_size=batch_size, 149 | shuffle=(sampler is None), 150 | sampler=sampler, 151 | ) 152 | 153 | 154 | def run(args): 155 | device = torch.device('cuda' if torch.cuda.is_available() and not args.no_cuda else 'cpu') 156 | 157 | model = Net() 158 | if distributed_is_initialized(): 159 | model.to(device) 160 | model = nn.parallel.DistributedDataParallel(model) 161 | else: 162 | model = nn.DataParallel(model) 163 | model.to(device) 164 | optimizer = torch.optim.Adam(model.parameters(), lr=args.learning_rate) 165 | 166 | train_loader = MNISTDataLoader(args.root, args.batch_size, train=True) 167 | test_loader = MNISTDataLoader(args.root, args.batch_size, train=False) 168 | trainer = Trainer(model, optimizer, train_loader, test_loader, device) 169 | trainer.fit(args.epochs) 170 | 171 | 172 | def main(): 173 | parser = argparse.ArgumentParser() 174 | parser.add_argument('--backend', type=str, default='nccl', help='Name of the backend to use.') 175 | parser.add_argument('-i', 176 | '--init-method', 177 | type=str, 178 | default='env://', 179 | help='URL specifying how to initialize the package.') 180 | parser.add_argument('-s', '--world-size', type=int, default=1, help='Number of processes participating in the job.') 181 | parser.add_argument('-r', '--rank', type=int, default=0, help='Rank of the current process.') 182 | parser.add_argument('--epochs', type=int, 
default=20) 183 | parser.add_argument('--no-cuda', action='store_true') 184 | parser.add_argument('-lr', '--learning-rate', type=float, default=1e-3) 185 | parser.add_argument('--root', type=str, default='data') 186 | parser.add_argument('--batch-size', type=int, default=128) 187 | parser.add_argument('--local_rank', type = int, default=0) 188 | args = parser.parse_args() 189 | print(args) 190 | 191 | if args.world_size > 1: 192 | distributed.init_process_group( 193 | backend=args.backend, 194 | init_method=args.init_method, 195 | world_size=args.world_size, 196 | rank=args.rank, 197 | ) 198 | 199 | run(args) 200 | 201 | 202 | if __name__ == '__main__': 203 | main() 204 | -------------------------------------------------------------------------------- /appendix/production/distributed/pytorch-distributed-example-master/setup.cfg: -------------------------------------------------------------------------------- 1 | [yapf] 2 | based_on_style = google 3 | column_limit = 120 4 | -------------------------------------------------------------------------------- /appendix/production/distributed/pytorch-distributed-example-master/toy/README.md: -------------------------------------------------------------------------------- 1 | # Toy Example 2 | 3 | Rank 0 4 | ``` 5 | $ python3 main.py --init-method tcp://127.0.0.1:23456 --rank 0 --world-size 2 6 | ``` 7 | 8 | Rank 2 9 | ``` 10 | $ python3 main.py --init-method tcp://127.0.0.1:23456 --rank 1 --world-size 2 11 | ``` 12 | -------------------------------------------------------------------------------- /appendix/production/distributed/pytorch-distributed-example-master/toy/main.py: -------------------------------------------------------------------------------- 1 | import argparse 2 | import os 3 | from random import randint 4 | from time import sleep 5 | 6 | import torch 7 | import torch.distributed as dist 8 | 9 | 10 | def run(world_size, rank, steps): 11 | for step in range(1, steps + 1): 12 | # get random int 13 | value = randint(0, 10) 14 | 15 | # group all ranks 16 | ranks = list(range(world_size)) 17 | group = dist.new_group(ranks=ranks) 18 | 19 | # compute reduced sum 20 | tensor = torch.tensor(value, dtype=torch.int).cuda() 21 | dist.all_reduce(tensor, op=dist.ReduceOp.SUM, group=group) 22 | 23 | print('rank: {}, step: {}, value: {}, reduced sum: {}.'.format(rank, step, value, tensor.item())) 24 | 25 | sleep(1) 26 | 27 | 28 | def main(): 29 | parser = argparse.ArgumentParser() 30 | parser.add_argument('--backend', type=str, default='nccl', help='Name of the backend to use.') 31 | parser.add_argument( 32 | '-i', 33 | '--init-method', 34 | type=str, 35 | default='tcp://127.0.0.1:23456', 36 | help='URL specifying how to initialize the package.') 37 | parser.add_argument('-s', '--world-size', type=int, help='Number of processes participating in the job.') 38 | parser.add_argument('-r', '--rank', type=int, help='Rank of the current process.') 39 | parser.add_argument('--steps', type=int, default=20) 40 | args = parser.parse_args() 41 | print(args) 42 | 43 | dist.init_process_group( 44 | backend=args.backend, 45 | init_method=args.init_method, 46 | world_size=args.world_size, 47 | rank=args.rank, 48 | ) 49 | 50 | run(args.world_size, args.rank, args.steps) 51 | 52 | 53 | if __name__ == '__main__': 54 | main() 55 | -------------------------------------------------------------------------------- /appendix/production/distributed/pytorch-distributed-master/.gitignore: -------------------------------------------------------------------------------- 1 | # 
Compiled python 2 | *.pyc 3 | *.pyd 4 | 5 | # Compiled MATLAB 6 | *.mex* 7 | 8 | # IPython notebook checkpoints 9 | .ipynb_checkpoints 10 | 11 | # Editor temporaries 12 | *.swn 13 | *.swo 14 | *.swp 15 | *~ 16 | 17 | # Sublime Text settings 18 | *.sublime-workspace 19 | *.sublime-project 20 | 21 | # Eclipse Project settings 22 | *.*project 23 | .settings 24 | 25 | # QtCreator files 26 | *.user 27 | 28 | # PyCharm files 29 | .idea 30 | 31 | # Visual Studio Code files 32 | .vscode 33 | .vs 34 | 35 | # OSX dir files 36 | .DS_Store 37 | -------------------------------------------------------------------------------- /appendix/production/distributed/pytorch-distributed-master/LICENSE: -------------------------------------------------------------------------------- 1 | The MIT License (MIT) 2 | 3 | Copyright (c) 2019-present, Zhi Zhang 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in 13 | all copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN 21 | THE SOFTWARE. 
-------------------------------------------------------------------------------- /appendix/production/distributed/pytorch-distributed-master/apex_distributed.py: -------------------------------------------------------------------------------- 1 | # https://github.com/NVIDIA/apex/blob/master/examples/imagenet/main_amp.py 2 | 3 | import csv 4 | 5 | import argparse 6 | import os 7 | import random 8 | import shutil 9 | import time 10 | import warnings 11 | 12 | import torch 13 | import torch.nn as nn 14 | import torch.nn.parallel 15 | import torch.backends.cudnn as cudnn 16 | import torch.distributed as dist 17 | import torch.optim 18 | import torch.multiprocessing as mp 19 | import torch.utils.data 20 | import torch.utils.data.distributed 21 | import torchvision.transforms as transforms 22 | import torchvision.datasets as datasets 23 | import torchvision.models as models 24 | 25 | from apex import amp 26 | from apex.parallel import DistributedDataParallel 27 | 28 | model_names = sorted(name for name in models.__dict__ 29 | if name.islower() and not name.startswith("__") and callable(models.__dict__[name])) 30 | 31 | parser = argparse.ArgumentParser(description='PyTorch ImageNet Training') 32 | parser.add_argument('--data', metavar='DIR', default='/home/zhangzhi/Data/ImageNet2012', help='path to dataset') 33 | parser.add_argument('-a', 34 | '--arch', 35 | metavar='ARCH', 36 | default='resnet18', 37 | choices=model_names, 38 | help='model architecture: ' + ' | '.join(model_names) + ' (default: resnet18)') 39 | parser.add_argument('-j', 40 | '--workers', 41 | default=4, 42 | type=int, 43 | metavar='N', 44 | help='number of data loading workers (default: 4)') 45 | parser.add_argument('--epochs', default=90, type=int, metavar='N', help='number of total epochs to run') 46 | parser.add_argument('--start-epoch', default=0, type=int, metavar='N', help='manual epoch number (useful on restarts)') 47 | parser.add_argument('-b', 48 | '--batch-size', 49 | default=6400, 50 | type=int, 51 | metavar='N', 52 | help='mini-batch size (default: 6400), this is the total ' 53 | 'batch size of all GPUs on the current node when ' 54 | 'using Data Parallel or Distributed Data Parallel') 55 | parser.add_argument('--lr', 56 | '--learning-rate', 57 | default=0.1, 58 | type=float, 59 | metavar='LR', 60 | help='initial learning rate', 61 | dest='lr') 62 | parser.add_argument('--momentum', default=0.9, type=float, metavar='M', help='momentum') 63 | parser.add_argument('--local_rank', default=-1, type=int, 64 | help='node rank for distributed training') 65 | parser.add_argument('--wd', 66 | '--weight-decay', 67 | default=1e-4, 68 | type=float, 69 | metavar='W', 70 | help='weight decay (default: 1e-4)', 71 | dest='weight_decay') 72 | parser.add_argument('-p', '--print-freq', default=10, type=int, metavar='N', help='print frequency (default: 10)') 73 | parser.add_argument('-e', '--evaluate', dest='evaluate', action='store_true', help='evaluate model on validation set') 74 | parser.add_argument('--pretrained', dest='pretrained', action='store_true', help='use pre-trained model') 75 | parser.add_argument('--seed', default=None, type=int, help='seed for initializing training. 
') 76 | 77 | best_acc1 = 0 78 | 79 | 80 | class data_prefetcher(): 81 | def __init__(self, loader): 82 | self.loader = iter(loader) 83 | self.stream = torch.cuda.Stream() 84 | self.mean = torch.tensor([0.485 * 255, 0.456 * 255, 0.406 * 255]).cuda().view(1, 3, 1, 1) 85 | self.std = torch.tensor([0.229 * 255, 0.224 * 255, 0.225 * 255]).cuda().view(1, 3, 1, 1) 86 | # With Amp, it isn't necessary to manually convert data to half. 87 | # if args.fp16: 88 | # self.mean = self.mean.half() 89 | # self.std = self.std.half() 90 | self.preload() 91 | 92 | def preload(self): 93 | try: 94 | self.next_input, self.next_target = next(self.loader) 95 | except StopIteration: 96 | self.next_input = None 97 | self.next_target = None 98 | return 99 | # if record_stream() doesn't work, another option is to make sure device inputs are created 100 | # on the main stream. 101 | # self.next_input_gpu = torch.empty_like(self.next_input, device='cuda') 102 | # self.next_target_gpu = torch.empty_like(self.next_target, device='cuda') 103 | # Need to make sure the memory allocated for next_* is not still in use by the main stream 104 | # at the time we start copying to next_*: 105 | # self.stream.wait_stream(torch.cuda.current_stream()) 106 | with torch.cuda.stream(self.stream): 107 | self.next_input = self.next_input.cuda(non_blocking=True) 108 | self.next_target = self.next_target.cuda(non_blocking=True) 109 | # more code for the alternative if record_stream() doesn't work: 110 | # copy_ will record the use of the pinned source tensor in this side stream. 111 | # self.next_input_gpu.copy_(self.next_input, non_blocking=True) 112 | # self.next_target_gpu.copy_(self.next_target, non_blocking=True) 113 | # self.next_input = self.next_input_gpu 114 | # self.next_target = self.next_target_gpu 115 | 116 | # With Amp, it isn't necessary to manually convert data to half. 117 | # if args.fp16: 118 | # self.next_input = self.next_input.half() 119 | # else: 120 | self.next_input = self.next_input.float() 121 | self.next_input = self.next_input.sub_(self.mean).div_(self.std) 122 | 123 | def next(self): 124 | torch.cuda.current_stream().wait_stream(self.stream) 125 | input = self.next_input 126 | target = self.next_target 127 | if input is not None: 128 | input.record_stream(torch.cuda.current_stream()) 129 | if target is not None: 130 | target.record_stream(torch.cuda.current_stream()) 131 | self.preload() 132 | return input, target 133 | 134 | 135 | def main(): 136 | args = parser.parse_args() 137 | 138 | if args.seed is not None: 139 | random.seed(args.seed) 140 | torch.manual_seed(args.seed) 141 | cudnn.deterministic = True 142 | warnings.warn('You have chosen to seed training. ' 143 | 'This will turn on the CUDNN deterministic setting, ' 144 | 'which can slow down your training considerably! 
' 145 | 'You may see unexpected behavior when restarting ' 146 | 'from checkpoints.') 147 | 148 | main_worker(args.local_rank, 4, args) 149 | 150 | 151 | def main_worker(gpu, ngpus_per_node, args): 152 | global best_acc1 153 | 154 | dist.init_process_group(backend='nccl') 155 | # create model 156 | if args.pretrained: 157 | print("=> using pre-trained model '{}'".format(args.arch)) 158 | model = models.__dict__[args.arch](pretrained=True) 159 | else: 160 | print("=> creating model '{}'".format(args.arch)) 161 | model = models.__dict__[args.arch]() 162 | 163 | torch.cuda.set_device(gpu) 164 | model.cuda() 165 | # When using a single GPU per process and per 166 | # DistributedDataParallel, we need to divide the batch size 167 | # ourselves based on the total number of GPUs we have 168 | args.batch_size = int(args.batch_size / ngpus_per_node) 169 | 170 | # define loss function (criterion) and optimizer 171 | criterion = nn.CrossEntropyLoss().cuda() 172 | 173 | optimizer = torch.optim.SGD(model.parameters(), args.lr, momentum=args.momentum, weight_decay=args.weight_decay) 174 | 175 | model, optimizer = amp.initialize(model, 176 | optimizer) 177 | model = DistributedDataParallel(model) 178 | 179 | cudnn.benchmark = True 180 | 181 | # Data loading code 182 | traindir = os.path.join(args.data, 'train') 183 | valdir = os.path.join(args.data, 'val') 184 | normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]) 185 | 186 | train_dataset = datasets.ImageFolder( 187 | traindir, 188 | transforms.Compose([ 189 | transforms.RandomResizedCrop(224), 190 | transforms.RandomHorizontalFlip(), 191 | transforms.ToTensor(), 192 | normalize, 193 | ])) 194 | 195 | train_sampler = torch.utils.data.distributed.DistributedSampler(train_dataset) 196 | 197 | train_loader = torch.utils.data.DataLoader(train_dataset, 198 | batch_size=args.batch_size, 199 | shuffle=(train_sampler is None), 200 | num_workers=2, 201 | pin_memory=True, 202 | sampler=train_sampler) 203 | 204 | val_loader = torch.utils.data.DataLoader(datasets.ImageFolder( 205 | valdir, 206 | transforms.Compose([ 207 | transforms.Resize(256), 208 | transforms.CenterCrop(224), 209 | transforms.ToTensor(), 210 | normalize, 211 | ])), 212 | batch_size=args.batch_size, 213 | shuffle=False, 214 | num_workers=2, 215 | pin_memory=True) 216 | 217 | if args.evaluate: 218 | validate(val_loader, model, criterion, gpu, args) 219 | return 220 | 221 | log_csv = "apex_distributed.csv" 222 | 223 | for epoch in range(args.start_epoch, args.epochs): 224 | epoch_start = time.time() 225 | 226 | train_sampler.set_epoch(epoch) 227 | adjust_learning_rate(optimizer, epoch, args) 228 | 229 | # train for one epoch 230 | train(train_loader, model, criterion, optimizer, epoch, gpu, args) 231 | 232 | # evaluate on validation set 233 | acc1 = validate(val_loader, model, criterion, gpu, args) 234 | 235 | # remember best acc@1 and save checkpoint 236 | is_best = acc1 > best_acc1 237 | best_acc1 = max(acc1, best_acc1) 238 | 239 | epoch_end = time.time() 240 | 241 | with open(log_csv, 'a+') as f: 242 | csv_write = csv.writer(f) 243 | data_row = [time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(epoch_start)), epoch_end - epoch_start] 244 | csv_write.writerow(data_row) 245 | 246 | save_checkpoint( 247 | { 248 | 'epoch': epoch + 1, 249 | 'arch': args.arch, 250 | 'state_dict': model.module.state_dict(), 251 | 'best_acc1': best_acc1, 252 | }, is_best) 253 | 254 | 255 | def train(train_loader, model, criterion, optimizer, epoch, gpu, args): 256 | batch_time = 
AverageMeter('Time', ':6.3f') 257 | data_time = AverageMeter('Data', ':6.3f') 258 | losses = AverageMeter('Loss', ':.4e') 259 | top1 = AverageMeter('Acc@1', ':6.2f') 260 | top5 = AverageMeter('Acc@5', ':6.2f') 261 | progress = ProgressMeter(len(train_loader), [batch_time, data_time, losses, top1, top5], 262 | prefix="Epoch: [{}]".format(epoch)) 263 | 264 | # switch to train mode 265 | model.train() 266 | 267 | end = time.time() 268 | prefetcher = data_prefetcher(train_loader) 269 | images, target = prefetcher.next() 270 | i = 0 271 | while images is not None: 272 | # measure data loading time 273 | data_time.update(time.time() - end) 274 | 275 | # compute output 276 | output = model(images) 277 | loss = criterion(output, target) 278 | 279 | # measure accuracy and record loss 280 | acc1, acc5 = accuracy(output, target, topk=(1, 5)) 281 | losses.update(loss.item(), images.size(0)) 282 | top1.update(acc1[0], images.size(0)) 283 | top5.update(acc5[0], images.size(0)) 284 | 285 | # compute gradient and do SGD step 286 | optimizer.zero_grad() 287 | with amp.scale_loss(loss, optimizer) as scaled_loss: 288 | scaled_loss.backward() 289 | optimizer.step() 290 | 291 | # measure elapsed time 292 | batch_time.update(time.time() - end) 293 | end = time.time() 294 | 295 | if i % args.print_freq == 0: 296 | progress.display(i) 297 | 298 | i += 1 299 | 300 | images, target = prefetcher.next() 301 | 302 | 303 | def validate(val_loader, model, criterion, gpu, args): 304 | batch_time = AverageMeter('Time', ':6.3f') 305 | losses = AverageMeter('Loss', ':.4e') 306 | top1 = AverageMeter('Acc@1', ':6.2f') 307 | top5 = AverageMeter('Acc@5', ':6.2f') 308 | progress = ProgressMeter(len(val_loader), [batch_time, losses, top1, top5], prefix='Test: ') 309 | 310 | # switch to evaluate mode 311 | model.eval() 312 | 313 | with torch.no_grad(): 314 | end = time.time() 315 | prefetcher = data_prefetcher(val_loader) 316 | images, target = prefetcher.next() 317 | i = 0 318 | while images is not None: 319 | 320 | # compute output 321 | output = model(images) 322 | loss = criterion(output, target) 323 | 324 | # measure accuracy and record loss 325 | acc1, acc5 = accuracy(output, target, topk=(1, 5)) 326 | losses.update(loss.item(), images.size(0)) 327 | top1.update(acc1[0], images.size(0)) 328 | top5.update(acc5[0], images.size(0)) 329 | 330 | # measure elapsed time 331 | batch_time.update(time.time() - end) 332 | end = time.time() 333 | 334 | if i % args.print_freq == 0: 335 | progress.display(i) 336 | 337 | i += 1 338 | 339 | images, target = prefetcher.next() 340 | 341 | # TODO: this should also be done with the ProgressMeter 342 | print(' * Acc@1 {top1.avg:.3f} Acc@5 {top5.avg:.3f}'.format(top1=top1, top5=top5)) 343 | 344 | return top1.avg 345 | 346 | 347 | def save_checkpoint(state, is_best, filename='checkpoint.pth.tar'): 348 | torch.save(state, filename) 349 | if is_best: 350 | shutil.copyfile(filename, 'model_best.pth.tar') 351 | 352 | 353 | class AverageMeter(object): 354 | """Computes and stores the average and current value""" 355 | def __init__(self, name, fmt=':f'): 356 | self.name = name 357 | self.fmt = fmt 358 | self.reset() 359 | 360 | def reset(self): 361 | self.val = 0 362 | self.avg = 0 363 | self.sum = 0 364 | self.count = 0 365 | 366 | def update(self, val, n=1): 367 | self.val = val 368 | self.sum += val * n 369 | self.count += n 370 | self.avg = self.sum / self.count 371 | 372 | def __str__(self): 373 | fmtstr = '{name} {val' + self.fmt + '} ({avg' + self.fmt + '})' 374 | return 
fmtstr.format(**self.__dict__) 375 | 376 | 377 | class ProgressMeter(object): 378 | def __init__(self, num_batches, meters, prefix=""): 379 | self.batch_fmtstr = self._get_batch_fmtstr(num_batches) 380 | self.meters = meters 381 | self.prefix = prefix 382 | 383 | def display(self, batch): 384 | entries = [self.prefix + self.batch_fmtstr.format(batch)] 385 | entries += [str(meter) for meter in self.meters] 386 | print('\t'.join(entries)) 387 | 388 | def _get_batch_fmtstr(self, num_batches): 389 | num_digits = len(str(num_batches // 1)) 390 | fmt = '{:' + str(num_digits) + 'd}' 391 | return '[' + fmt + '/' + fmt.format(num_batches) + ']' 392 | 393 | 394 | def adjust_learning_rate(optimizer, epoch, args): 395 | """Sets the learning rate to the initial LR decayed by 10 every 30 epochs""" 396 | lr = args.lr * (0.1**(epoch // 30)) 397 | for param_group in optimizer.param_groups: 398 | param_group['lr'] = lr 399 | 400 | 401 | def accuracy(output, target, topk=(1, )): 402 | """Computes the accuracy over the k top predictions for the specified values of k""" 403 | with torch.no_grad(): 404 | maxk = max(topk) 405 | batch_size = target.size(0) 406 | 407 | _, pred = output.topk(maxk, 1, True, True) 408 | pred = pred.t() 409 | correct = pred.eq(target.view(1, -1).expand_as(pred)) 410 | 411 | res = [] 412 | for k in topk: 413 | correct_k = correct[:k].view(-1).float().sum(0, keepdim=True) 414 | res.append(correct_k.mul_(100.0 / batch_size)) 415 | return res 416 | 417 | 418 | if __name__ == '__main__': 419 | main() -------------------------------------------------------------------------------- /appendix/production/distributed/pytorch-distributed-master/dataparallel.py: -------------------------------------------------------------------------------- 1 | import csv 2 | 3 | import argparse 4 | import os 5 | import random 6 | import shutil 7 | import time 8 | import warnings 9 | 10 | import torch 11 | import torch.nn as nn 12 | import torch.nn.parallel 13 | import torch.backends.cudnn as cudnn 14 | import torch.distributed as dist 15 | import torch.optim 16 | import torch.multiprocessing as mp 17 | import torch.utils.data 18 | import torch.utils.data.distributed 19 | import torchvision.transforms as transforms 20 | import torchvision.datasets as datasets 21 | import torchvision.models as models 22 | 23 | model_names = sorted(name for name in models.__dict__ 24 | if name.islower() and not name.startswith("__") and callable(models.__dict__[name])) 25 | 26 | parser = argparse.ArgumentParser(description='PyTorch ImageNet Training') 27 | parser.add_argument('--data', metavar='DIR', default='/home/zhangzhi/Data/ImageNet2012', help='path to dataset') 28 | parser.add_argument('-a', 29 | '--arch', 30 | metavar='ARCH', 31 | default='resnet18', 32 | choices=model_names, 33 | help='model architecture: ' + ' | '.join(model_names) + ' (default: resnet18)') 34 | parser.add_argument('-j', 35 | '--workers', 36 | default=4, 37 | type=int, 38 | metavar='N', 39 | help='number of data loading workers (default: 4)') 40 | parser.add_argument('--epochs', default=90, type=int, metavar='N', help='number of total epochs to run') 41 | parser.add_argument('--start-epoch', default=0, type=int, metavar='N', help='manual epoch number (useful on restarts)') 42 | parser.add_argument('-b', 43 | '--batch-size', 44 | default=3200, 45 | type=int, 46 | metavar='N', 47 | help='mini-batch size (default: 3200), this is the total ' 48 | 'batch size of all GPUs on the current node when ' 49 | 'using Data Parallel or Distributed Data Parallel') 50 | 
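# Heuristic note: the default total batch size below is 3200, while the default
# lr of 0.1 was originally tuned for a batch of 256. The linear-scaling rule of
# thumb (Goyal et al., 2017) suggests lr ~= 0.1 * batch_size / 256 for large
# batches; treat this as a starting point rather than a guarantee.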
parser.add_argument('--lr',
51 | '--learning-rate',
52 | default=0.1,
53 | type=float,
54 | metavar='LR',
55 | help='initial learning rate',
56 | dest='lr')
57 | parser.add_argument('--momentum', default=0.9, type=float, metavar='M', help='momentum')
58 | parser.add_argument('--wd',
59 | '--weight-decay',
60 | default=1e-4,
61 | type=float,
62 | metavar='W',
63 | help='weight decay (default: 1e-4)',
64 | dest='weight_decay')
65 | parser.add_argument('-p', '--print-freq', default=10, type=int, metavar='N', help='print frequency (default: 10)')
66 | parser.add_argument('-e', '--evaluate', dest='evaluate', action='store_true', help='evaluate model on validation set')
67 | parser.add_argument('--pretrained', dest='pretrained', action='store_true', help='use pre-trained model')
68 | parser.add_argument('--seed', default=None, type=int, help='seed for initializing training. ')
69 |
70 | best_acc1 = 0
71 |
72 |
73 | def main():
74 | args = parser.parse_args()
75 |
76 | if args.seed is not None:
77 | random.seed(args.seed)
78 | torch.manual_seed(args.seed)
79 | cudnn.deterministic = True
80 | warnings.warn('You have chosen to seed training. '
81 | 'This will turn on the CUDNN deterministic setting, '
82 | 'which can slow down your training considerably! '
83 | 'You may see unexpected behavior when restarting '
84 | 'from checkpoints.')
85 |
86 | gpus = [0, 1, 2, 3]
87 | main_worker(gpus=gpus, args=args)
88 |
89 |
90 | def main_worker(gpus, args):
91 | global best_acc1
92 |
93 | # create model
94 | if args.pretrained:
95 | print("=> using pre-trained model '{}'".format(args.arch))
96 | model = models.__dict__[args.arch](pretrained=True)
97 | else:
98 | print("=> creating model '{}'".format(args.arch))
99 | model = models.__dict__[args.arch]()
100 |
101 | torch.cuda.set_device('cuda:{}'.format(gpus[0]))
102 | model.cuda()
103 | # DataParallel keeps a single process that scatters each mini-batch
104 | # across the listed GPUs, so args.batch_size stays the total batch
105 | # size for the node and no manual division is needed here
106 | model = nn.DataParallel(model, device_ids=gpus, output_device=gpus[0])
107 |
108 | # define loss function (criterion) and optimizer
109 | criterion = nn.CrossEntropyLoss()
110 |
111 | optimizer = torch.optim.SGD(model.parameters(), args.lr, momentum=args.momentum, weight_decay=args.weight_decay)
112 |
113 | cudnn.benchmark = True
114 |
115 | # Data loading code
116 | traindir = os.path.join(args.data, 'train')
117 | valdir = os.path.join(args.data, 'val')
118 | normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
119 |
120 | train_dataset = datasets.ImageFolder(
121 | traindir,
122 | transforms.Compose([
123 | transforms.RandomResizedCrop(224),
124 | transforms.RandomHorizontalFlip(),
125 | transforms.ToTensor(),
126 | normalize,
127 | ]))
128 |
129 | train_loader = torch.utils.data.DataLoader(train_dataset,
130 | batch_size=args.batch_size,
131 | shuffle=True,
132 | num_workers=2,
133 | pin_memory=True)
134 |
135 | val_loader = torch.utils.data.DataLoader(datasets.ImageFolder(
136 | valdir,
137 | transforms.Compose([
138 | transforms.Resize(256),
139 | transforms.CenterCrop(224),
140 | transforms.ToTensor(),
141 | normalize,
142 | ])),
143 | batch_size=args.batch_size,
144 | shuffle=False,
145 | num_workers=2,
146 | pin_memory=True)
147 |
148 | if args.evaluate:
149 | validate(val_loader, model, criterion, args)
150 | return
151 |
152 | log_csv = "dataparallel.csv"
153 |
154 | for epoch in range(args.start_epoch, args.epochs):
155 | epoch_start = time.time()
156 | 
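# Unlike the DistributedDataParallel variants in this folder, no DistributedSampler
# is used here, so there is no train_sampler.set_epoch(epoch) call: the DataLoader
# above was built with shuffle=True and reshuffles the dataset itself every epoch.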
157 | adjust_learning_rate(optimizer, epoch, args) 158 | 159 | # train for one epoch 160 | train(train_loader, model, criterion, optimizer, epoch, args) 161 | 162 | # evaluate on validation set 163 | acc1 = validate(val_loader, model, criterion, args) 164 | 165 | # remember best acc@1 and save checkpoint 166 | is_best = acc1 > best_acc1 167 | best_acc1 = max(acc1, best_acc1) 168 | 169 | epoch_end = time.time() 170 | 171 | with open(log_csv, 'a+') as f: 172 | csv_write = csv.writer(f) 173 | data_row = [time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(epoch_start)), epoch_end - epoch_start] 174 | csv_write.writerow(data_row) 175 | 176 | save_checkpoint( 177 | { 178 | 'epoch': epoch + 1, 179 | 'arch': args.arch, 180 | 'state_dict': model.module.state_dict(), 181 | 'best_acc1': best_acc1, 182 | }, is_best) 183 | 184 | 185 | def train(train_loader, model, criterion, optimizer, epoch, args): 186 | batch_time = AverageMeter('Time', ':6.3f') 187 | data_time = AverageMeter('Data', ':6.3f') 188 | losses = AverageMeter('Loss', ':.4e') 189 | top1 = AverageMeter('Acc@1', ':6.2f') 190 | top5 = AverageMeter('Acc@5', ':6.2f') 191 | progress = ProgressMeter(len(train_loader), [batch_time, data_time, losses, top1, top5], 192 | prefix="Epoch: [{}]".format(epoch)) 193 | 194 | # switch to train mode 195 | model.train() 196 | 197 | end = time.time() 198 | for i, (images, target) in enumerate(train_loader): 199 | # measure data loading time 200 | data_time.update(time.time() - end) 201 | 202 | images = images.cuda(non_blocking=True) 203 | target = target.cuda(non_blocking=True) 204 | 205 | # compute output 206 | output = model(images) 207 | loss = criterion(output, target) 208 | 209 | # measure accuracy and record loss 210 | acc1, acc5 = accuracy(output, target, topk=(1, 5)) 211 | losses.update(loss.item(), images.size(0)) 212 | top1.update(acc1[0], images.size(0)) 213 | top5.update(acc5[0], images.size(0)) 214 | 215 | # compute gradient and do SGD step 216 | optimizer.zero_grad() 217 | loss.backward() 218 | optimizer.step() 219 | 220 | # measure elapsed time 221 | batch_time.update(time.time() - end) 222 | end = time.time() 223 | 224 | if i % args.print_freq == 0: 225 | progress.display(i) 226 | 227 | 228 | def validate(val_loader, model, criterion, args): 229 | batch_time = AverageMeter('Time', ':6.3f') 230 | losses = AverageMeter('Loss', ':.4e') 231 | top1 = AverageMeter('Acc@1', ':6.2f') 232 | top5 = AverageMeter('Acc@5', ':6.2f') 233 | progress = ProgressMeter(len(val_loader), [batch_time, losses, top1, top5], prefix='Test: ') 234 | 235 | # switch to evaluate mode 236 | model.eval() 237 | 238 | with torch.no_grad(): 239 | end = time.time() 240 | for i, (images, target) in enumerate(val_loader): 241 | images = images.cuda(non_blocking=True) 242 | target = target.cuda(non_blocking=True) 243 | 244 | # compute output 245 | output = model(images) 246 | loss = criterion(output, target) 247 | 248 | # measure accuracy and record loss 249 | acc1, acc5 = accuracy(output, target, topk=(1, 5)) 250 | losses.update(loss.item(), images.size(0)) 251 | top1.update(acc1[0], images.size(0)) 252 | top5.update(acc5[0], images.size(0)) 253 | 254 | # measure elapsed time 255 | batch_time.update(time.time() - end) 256 | end = time.time() 257 | 258 | if i % args.print_freq == 0: 259 | progress.display(i) 260 | 261 | # TODO: this should also be done with the ProgressMeter 262 | print(' * Acc@1 {top1.avg:.3f} Acc@5 {top5.avg:.3f}'.format(top1=top1, top5=top5)) 263 | 264 | return top1.avg 265 | 266 | 267 | def save_checkpoint(state, 
is_best, filename='checkpoint.pth.tar'): 268 | torch.save(state, filename) 269 | if is_best: 270 | shutil.copyfile(filename, 'model_best.pth.tar') 271 | 272 | 273 | class AverageMeter(object): 274 | """Computes and stores the average and current value""" 275 | def __init__(self, name, fmt=':f'): 276 | self.name = name 277 | self.fmt = fmt 278 | self.reset() 279 | 280 | def reset(self): 281 | self.val = 0 282 | self.avg = 0 283 | self.sum = 0 284 | self.count = 0 285 | 286 | def update(self, val, n=1): 287 | self.val = val 288 | self.sum += val * n 289 | self.count += n 290 | self.avg = self.sum / self.count 291 | 292 | def __str__(self): 293 | fmtstr = '{name} {val' + self.fmt + '} ({avg' + self.fmt + '})' 294 | return fmtstr.format(**self.__dict__) 295 | 296 | 297 | class ProgressMeter(object): 298 | def __init__(self, num_batches, meters, prefix=""): 299 | self.batch_fmtstr = self._get_batch_fmtstr(num_batches) 300 | self.meters = meters 301 | self.prefix = prefix 302 | 303 | def display(self, batch): 304 | entries = [self.prefix + self.batch_fmtstr.format(batch)] 305 | entries += [str(meter) for meter in self.meters] 306 | print('\t'.join(entries)) 307 | 308 | def _get_batch_fmtstr(self, num_batches): 309 | num_digits = len(str(num_batches // 1)) 310 | fmt = '{:' + str(num_digits) + 'd}' 311 | return '[' + fmt + '/' + fmt.format(num_batches) + ']' 312 | 313 | 314 | def adjust_learning_rate(optimizer, epoch, args): 315 | """Sets the learning rate to the initial LR decayed by 10 every 30 epochs""" 316 | lr = args.lr * (0.1**(epoch // 30)) 317 | for param_group in optimizer.param_groups: 318 | param_group['lr'] = lr 319 | 320 | 321 | def accuracy(output, target, topk=(1, )): 322 | """Computes the accuracy over the k top predictions for the specified values of k""" 323 | with torch.no_grad(): 324 | maxk = max(topk) 325 | batch_size = target.size(0) 326 | 327 | _, pred = output.topk(maxk, 1, True, True) 328 | pred = pred.t() 329 | correct = pred.eq(target.view(1, -1).expand_as(pred)) 330 | 331 | res = [] 332 | for k in topk: 333 | correct_k = correct[:k].view(-1).float().sum(0, keepdim=True) 334 | res.append(correct_k.mul_(100.0 / batch_size)) 335 | return res 336 | 337 | 338 | if __name__ == '__main__': 339 | main() -------------------------------------------------------------------------------- /appendix/production/distributed/pytorch-distributed-master/distributed.py: -------------------------------------------------------------------------------- 1 | import csv 2 | 3 | import argparse 4 | import os 5 | import random 6 | import shutil 7 | import time 8 | import warnings 9 | 10 | import torch 11 | import torch.nn as nn 12 | import torch.nn.parallel 13 | import torch.backends.cudnn as cudnn 14 | import torch.distributed as dist 15 | import torch.optim 16 | import torch.multiprocessing as mp 17 | import torch.utils.data 18 | import torch.utils.data.distributed 19 | import torchvision.transforms as transforms 20 | import torchvision.datasets as datasets 21 | import torchvision.models as models 22 | 23 | model_names = sorted(name for name in models.__dict__ 24 | if name.islower() and not name.startswith("__") and callable(models.__dict__[name])) 25 | 26 | parser = argparse.ArgumentParser(description='PyTorch ImageNet Training') 27 | parser.add_argument('--data', metavar='DIR', default='/home/zhangzhi/Data/ImageNet2012', help='path to dataset') 28 | parser.add_argument('-a', 29 | '--arch', 30 | metavar='ARCH', 31 | default='resnet18', 32 | choices=model_names, 33 | help='model architecture: ' + 
' | '.join(model_names) + ' (default: resnet18)') 34 | parser.add_argument('-j', 35 | '--workers', 36 | default=4, 37 | type=int, 38 | metavar='N', 39 | help='number of data loading workers (default: 4)') 40 | parser.add_argument('--epochs', default=90, type=int, metavar='N', help='number of total epochs to run') 41 | parser.add_argument('--start-epoch', default=0, type=int, metavar='N', help='manual epoch number (useful on restarts)') 42 | parser.add_argument('-b', 43 | '--batch-size', 44 | default=3200, 45 | type=int, 46 | metavar='N', 47 | help='mini-batch size (default: 3200), this is the total ' 48 | 'batch size of all GPUs on the current node when ' 49 | 'using Data Parallel or Distributed Data Parallel') 50 | parser.add_argument('--lr', 51 | '--learning-rate', 52 | default=0.1, 53 | type=float, 54 | metavar='LR', 55 | help='initial learning rate', 56 | dest='lr') 57 | parser.add_argument('--momentum', default=0.9, type=float, metavar='M', help='momentum') 58 | parser.add_argument('--local_rank', default=-1, type=int, 59 | help='node rank for distributed training') 60 | parser.add_argument('--wd', 61 | '--weight-decay', 62 | default=1e-4, 63 | type=float, 64 | metavar='W', 65 | help='weight decay (default: 1e-4)', 66 | dest='weight_decay') 67 | parser.add_argument('-p', '--print-freq', default=10, type=int, metavar='N', help='print frequency (default: 10)') 68 | parser.add_argument('-e', '--evaluate', dest='evaluate', action='store_true', help='evaluate model on validation set') 69 | parser.add_argument('--pretrained', dest='pretrained', action='store_true', help='use pre-trained model') 70 | parser.add_argument('--seed', default=None, type=int, help='seed for initializing training. ') 71 | 72 | best_acc1 = 0 73 | 74 | 75 | def main(): 76 | args = parser.parse_args() 77 | 78 | if args.seed is not None: 79 | random.seed(args.seed) 80 | torch.manual_seed(args.seed) 81 | cudnn.deterministic = True 82 | warnings.warn('You have chosen to seed training. ' 83 | 'This will turn on the CUDNN deterministic setting, ' 84 | 'which can slow down your training considerably! 
' 85 | 'You may see unexpected behavior when restarting ' 86 | 'from checkpoints.') 87 | 88 | main_worker(args.local_rank, 4, args) 89 | 90 | 91 | def main_worker(gpu, ngpus_per_node, args): 92 | global best_acc1 93 | 94 | dist.init_process_group(backend='nccl') 95 | # create model 96 | if args.pretrained: 97 | print("=> using pre-trained model '{}'".format(args.arch)) 98 | model = models.__dict__[args.arch](pretrained=True) 99 | else: 100 | print("=> creating model '{}'".format(args.arch)) 101 | model = models.__dict__[args.arch]() 102 | 103 | torch.cuda.set_device(gpu) 104 | model.cuda(gpu) 105 | # When using a single GPU per process and per 106 | # DistributedDataParallel, we need to divide the batch size 107 | # ourselves based on the total number of GPUs we have 108 | args.batch_size = int(args.batch_size / ngpus_per_node) 109 | model = torch.nn.parallel.DistributedDataParallel(model, device_ids=[gpu]) 110 | 111 | # define loss function (criterion) and optimizer 112 | criterion = nn.CrossEntropyLoss().cuda(gpu) 113 | 114 | optimizer = torch.optim.SGD(model.parameters(), args.lr, momentum=args.momentum, weight_decay=args.weight_decay) 115 | 116 | cudnn.benchmark = True 117 | 118 | # Data loading code 119 | traindir = os.path.join(args.data, 'train') 120 | valdir = os.path.join(args.data, 'val') 121 | normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]) 122 | 123 | train_dataset = datasets.ImageFolder( 124 | traindir, 125 | transforms.Compose([ 126 | transforms.RandomResizedCrop(224), 127 | transforms.RandomHorizontalFlip(), 128 | transforms.ToTensor(), 129 | normalize, 130 | ])) 131 | 132 | train_sampler = torch.utils.data.distributed.DistributedSampler(train_dataset) 133 | 134 | train_loader = torch.utils.data.DataLoader(train_dataset, 135 | batch_size=args.batch_size, 136 | shuffle=(train_sampler is None), 137 | num_workers=2, 138 | pin_memory=True, 139 | sampler=train_sampler) 140 | 141 | val_loader = torch.utils.data.DataLoader(datasets.ImageFolder( 142 | valdir, 143 | transforms.Compose([ 144 | transforms.Resize(256), 145 | transforms.CenterCrop(224), 146 | transforms.ToTensor(), 147 | normalize, 148 | ])), 149 | batch_size=args.batch_size, 150 | shuffle=False, 151 | num_workers=2, 152 | pin_memory=True) 153 | 154 | if args.evaluate: 155 | validate(val_loader, model, criterion, gpu, args) 156 | return 157 | 158 | log_csv = "distributed.csv" 159 | 160 | for epoch in range(args.start_epoch, args.epochs): 161 | epoch_start = time.time() 162 | 163 | train_sampler.set_epoch(epoch) 164 | adjust_learning_rate(optimizer, epoch, args) 165 | 166 | # train for one epoch 167 | train(train_loader, model, criterion, optimizer, epoch, gpu, args) 168 | 169 | # evaluate on validation set 170 | acc1 = validate(val_loader, model, criterion, gpu, args) 171 | 172 | # remember best acc@1 and save checkpoint 173 | is_best = acc1 > best_acc1 174 | best_acc1 = max(acc1, best_acc1) 175 | 176 | epoch_end = time.time() 177 | 178 | with open(log_csv, 'a+') as f: 179 | csv_write = csv.writer(f) 180 | data_row = [time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(epoch_start)), epoch_end - epoch_start] 181 | csv_write.writerow(data_row) 182 | 183 | save_checkpoint( 184 | { 185 | 'epoch': epoch + 1, 186 | 'arch': args.arch, 187 | 'state_dict': model.module.state_dict(), 188 | 'best_acc1': best_acc1, 189 | }, is_best) 190 | 191 | 192 | def train(train_loader, model, criterion, optimizer, epoch, gpu, args): 193 | batch_time = AverageMeter('Time', ':6.3f') 194 | data_time = 
AverageMeter('Data', ':6.3f') 195 | losses = AverageMeter('Loss', ':.4e') 196 | top1 = AverageMeter('Acc@1', ':6.2f') 197 | top5 = AverageMeter('Acc@5', ':6.2f') 198 | progress = ProgressMeter(len(train_loader), [batch_time, data_time, losses, top1, top5], 199 | prefix="Epoch: [{}]".format(epoch)) 200 | 201 | # switch to train mode 202 | model.train() 203 | 204 | end = time.time() 205 | for i, (images, target) in enumerate(train_loader): 206 | # measure data loading time 207 | data_time.update(time.time() - end) 208 | 209 | images = images.cuda(gpu, non_blocking=True) 210 | target = target.cuda(gpu, non_blocking=True) 211 | 212 | # compute output 213 | output = model(images) 214 | loss = criterion(output, target) 215 | 216 | # measure accuracy and record loss 217 | acc1, acc5 = accuracy(output, target, topk=(1, 5)) 218 | losses.update(loss.item(), images.size(0)) 219 | top1.update(acc1[0], images.size(0)) 220 | top5.update(acc5[0], images.size(0)) 221 | 222 | # compute gradient and do SGD step 223 | optimizer.zero_grad() 224 | loss.backward() 225 | optimizer.step() 226 | 227 | # measure elapsed time 228 | batch_time.update(time.time() - end) 229 | end = time.time() 230 | 231 | if i % args.print_freq == 0: 232 | progress.display(i) 233 | 234 | 235 | def validate(val_loader, model, criterion, gpu, args): 236 | batch_time = AverageMeter('Time', ':6.3f') 237 | losses = AverageMeter('Loss', ':.4e') 238 | top1 = AverageMeter('Acc@1', ':6.2f') 239 | top5 = AverageMeter('Acc@5', ':6.2f') 240 | progress = ProgressMeter(len(val_loader), [batch_time, losses, top1, top5], prefix='Test: ') 241 | 242 | # switch to evaluate mode 243 | model.eval() 244 | 245 | with torch.no_grad(): 246 | end = time.time() 247 | for i, (images, target) in enumerate(val_loader): 248 | images = images.cuda(gpu, non_blocking=True) 249 | target = target.cuda(gpu, non_blocking=True) 250 | 251 | # compute output 252 | output = model(images) 253 | loss = criterion(output, target) 254 | 255 | # measure accuracy and record loss 256 | acc1, acc5 = accuracy(output, target, topk=(1, 5)) 257 | losses.update(loss.item(), images.size(0)) 258 | top1.update(acc1[0], images.size(0)) 259 | top5.update(acc5[0], images.size(0)) 260 | 261 | # measure elapsed time 262 | batch_time.update(time.time() - end) 263 | end = time.time() 264 | 265 | if i % args.print_freq == 0: 266 | progress.display(i) 267 | 268 | # TODO: this should also be done with the ProgressMeter 269 | print(' * Acc@1 {top1.avg:.3f} Acc@5 {top5.avg:.3f}'.format(top1=top1, top5=top5)) 270 | 271 | return top1.avg 272 | 273 | 274 | def save_checkpoint(state, is_best, filename='checkpoint.pth.tar'): 275 | torch.save(state, filename) 276 | if is_best: 277 | shutil.copyfile(filename, 'model_best.pth.tar') 278 | 279 | 280 | class AverageMeter(object): 281 | """Computes and stores the average and current value""" 282 | def __init__(self, name, fmt=':f'): 283 | self.name = name 284 | self.fmt = fmt 285 | self.reset() 286 | 287 | def reset(self): 288 | self.val = 0 289 | self.avg = 0 290 | self.sum = 0 291 | self.count = 0 292 | 293 | def update(self, val, n=1): 294 | self.val = val 295 | self.sum += val * n 296 | self.count += n 297 | self.avg = self.sum / self.count 298 | 299 | def __str__(self): 300 | fmtstr = '{name} {val' + self.fmt + '} ({avg' + self.fmt + '})' 301 | return fmtstr.format(**self.__dict__) 302 | 303 | 304 | class ProgressMeter(object): 305 | def __init__(self, num_batches, meters, prefix=""): 306 | self.batch_fmtstr = self._get_batch_fmtstr(num_batches) 307 | 
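# self.batch_fmtstr is a fixed-width '[current/total]' template; e.g. with
# num_batches=391 it renders progress markers such as '[  5/391]'.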
self.meters = meters 308 | self.prefix = prefix 309 | 310 | def display(self, batch): 311 | entries = [self.prefix + self.batch_fmtstr.format(batch)] 312 | entries += [str(meter) for meter in self.meters] 313 | print('\t'.join(entries)) 314 | 315 | def _get_batch_fmtstr(self, num_batches): 316 | num_digits = len(str(num_batches // 1)) 317 | fmt = '{:' + str(num_digits) + 'd}' 318 | return '[' + fmt + '/' + fmt.format(num_batches) + ']' 319 | 320 | 321 | def adjust_learning_rate(optimizer, epoch, args): 322 | """Sets the learning rate to the initial LR decayed by 10 every 30 epochs""" 323 | lr = args.lr * (0.1**(epoch // 30)) 324 | for param_group in optimizer.param_groups: 325 | param_group['lr'] = lr 326 | 327 | 328 | def accuracy(output, target, topk=(1, )): 329 | """Computes the accuracy over the k top predictions for the specified values of k""" 330 | with torch.no_grad(): 331 | maxk = max(topk) 332 | batch_size = target.size(0) 333 | 334 | _, pred = output.topk(maxk, 1, True, True) 335 | pred = pred.t() 336 | correct = pred.eq(target.view(1, -1).expand_as(pred)) 337 | 338 | res = [] 339 | for k in topk: 340 | correct_k = correct[:k].view(-1).float().sum(0, keepdim=True) 341 | res.append(correct_k.mul_(100.0 / batch_size)) 342 | return res 343 | 344 | 345 | if __name__ == '__main__': 346 | main() -------------------------------------------------------------------------------- /appendix/production/distributed/pytorch-distributed-master/distributed_slurm_main.py: -------------------------------------------------------------------------------- 1 | import os 2 | import csv 3 | import time 4 | import socket 5 | import random 6 | import shutil 7 | import argparse 8 | import warnings 9 | 10 | import torch 11 | import torch.optim 12 | import torch.nn as nn 13 | import torch.nn.parallel 14 | import torch.backends.cudnn as cudnn 15 | import torch.distributed as dist 16 | import torch.multiprocessing as mp 17 | import torch.utils.data 18 | import torch.utils.data.distributed 19 | 20 | import torchvision.transforms as transforms 21 | import torchvision.datasets as datasets 22 | import torchvision.models as models 23 | 24 | model_names = sorted(name for name in models.__dict__ 25 | if name.islower() and not name.startswith("__") and callable(models.__dict__[name])) 26 | 27 | parser = argparse.ArgumentParser(description='PyTorch ImageNet Training') 28 | parser.add_argument('--data', metavar='DIR', default='/home/zhangzhi/Data/exports/ImageNet2012', help='path to dataset') 29 | parser.add_argument('-a', 30 | '--arch', 31 | metavar='ARCH', 32 | default='resnet18', 33 | choices=model_names, 34 | help='model architecture: ' + ' | '.join(model_names) + ' (default: resnet18)') 35 | parser.add_argument('-j', 36 | '--workers', 37 | default=4, 38 | type=int, 39 | metavar='N', 40 | help='number of data loading workers (default: 4)') 41 | parser.add_argument('--epochs', default=90, type=int, metavar='N', help='number of total epochs to run') 42 | parser.add_argument('--start-epoch', default=0, type=int, metavar='N', help='manual epoch number (useful on restarts)') 43 | parser.add_argument('-b', 44 | '--batch-size', 45 | default=400, 46 | type=int, 47 | metavar='N', 48 | help='mini-batch size (default: 3200), this is the total ' 49 | 'batch size of all GPUs on the current node when ' 50 | 'using Data Parallel or Distributed Data Parallel') 51 | parser.add_argument('--lr', 52 | '--learning-rate', 53 | default=0.1, 54 | type=float, 55 | metavar='LR', 56 | help='initial learning rate', 57 | dest='lr') 58 | 
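# Note: this SLURM variant takes no --local_rank argument. main() below derives
# the node rank and world size from the SLURM_PROCID and SLURM_NPROCS environment
# variables set by srun, and the processes rendezvous through the shared
# file:// URL built from --dist-file and SLURM_JOBID.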
parser.add_argument('--momentum', default=0.9, type=float, metavar='M', help='momentum') 59 | parser.add_argument('--wd', 60 | '--weight-decay', 61 | default=1e-4, 62 | type=float, 63 | metavar='W', 64 | help='weight decay (default: 1e-4)', 65 | dest='weight_decay') 66 | parser.add_argument('-p', '--print-freq', default=10, type=int, metavar='N', help='print frequency (default: 10)') 67 | parser.add_argument('-e', '--evaluate', dest='evaluate', action='store_true', help='evaluate model on validation set') 68 | parser.add_argument('--pretrained', dest='pretrained', action='store_true', help='use pre-trained model') 69 | parser.add_argument('--seed', default=None, type=int, help='seed for initializing training. ') 70 | parser.add_argument('--dist-file', default=None, type=str, help='file used to initial distributed training') 71 | 72 | best_acc1 = 0 73 | 74 | 75 | def main(): 76 | args = parser.parse_args() 77 | 78 | if args.seed is not None: 79 | random.seed(args.seed) 80 | torch.manual_seed(args.seed) 81 | cudnn.deterministic = True 82 | # torch.backends.cudnn.enabled = False 83 | warnings.warn('You have chosen to seed training. ' 84 | 'This will turn on the CUDNN deterministic setting, ' 85 | 'which can slow down your training considerably! ' 86 | 'You may see unexpected behavior when restarting ' 87 | 'from checkpoints.') 88 | 89 | args.local_rank = int(os.environ["SLURM_PROCID"]) 90 | args.world_size = int(os.environ["SLURM_NPROCS"]) 91 | ngpus_per_node = torch.cuda.device_count() 92 | 93 | job_id = os.environ["SLURM_JOBID"] 94 | args.dist_url = "file://{}.{}".format(os.path.realpath(args.dist_file), job_id) 95 | mp.spawn(main_worker, nprocs=ngpus_per_node, args=(ngpus_per_node, args)) 96 | 97 | 98 | def main_worker(gpu, ngpus_per_node, args): 99 | global best_acc1 100 | rank = args.local_rank * ngpus_per_node + gpu 101 | dist.init_process_group(backend='nccl', init_method=args.dist_url, world_size=args.world_size, rank=rank) 102 | # create model 103 | if args.pretrained: 104 | print("=> using pre-trained model '{}'".format(args.arch)) 105 | model = models.__dict__[args.arch](pretrained=True) 106 | else: 107 | print("=> creating model '{}'".format(args.arch)) 108 | model = models.__dict__[args.arch]() 109 | 110 | torch.cuda.set_device(gpu) 111 | model.cuda(gpu) 112 | # When using a single GPU per process and per 113 | # DistributedDataParallel, we need to divide the batch size 114 | # ourselves based on the total number of GPUs we have 115 | args.batch_size = int(args.batch_size / ngpus_per_node) 116 | model = torch.nn.parallel.DistributedDataParallel(model, device_ids=[gpu]) 117 | 118 | # define loss function (criterion) and optimizer 119 | criterion = nn.CrossEntropyLoss().cuda(gpu) 120 | 121 | optimizer = torch.optim.SGD(model.parameters(), args.lr, momentum=args.momentum, weight_decay=args.weight_decay) 122 | 123 | cudnn.benchmark = True 124 | 125 | # Data loading code 126 | traindir = os.path.join(args.data, 'train') 127 | valdir = os.path.join(args.data, 'val') 128 | normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]) 129 | 130 | train_dataset = datasets.ImageFolder( 131 | traindir, 132 | transforms.Compose([ 133 | transforms.RandomResizedCrop(224), 134 | transforms.RandomHorizontalFlip(), 135 | transforms.ToTensor(), 136 | normalize, 137 | ])) 138 | 139 | train_sampler = torch.utils.data.distributed.DistributedSampler(train_dataset) 140 | 141 | train_loader = torch.utils.data.DataLoader(train_dataset, 142 | batch_size=args.batch_size, 143 | 
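# A DataLoader may not combine shuffle=True with an explicit sampler, so the
# expression below enables shuffling only when no DistributedSampler was created
# (train_sampler is always set in this script, hence shuffle ends up False).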
shuffle=(train_sampler is None), 144 | num_workers=2, 145 | pin_memory=True, 146 | sampler=train_sampler) 147 | 148 | val_loader = torch.utils.data.DataLoader(datasets.ImageFolder( 149 | valdir, 150 | transforms.Compose([ 151 | transforms.Resize(256), 152 | transforms.CenterCrop(224), 153 | transforms.ToTensor(), 154 | normalize, 155 | ])), 156 | batch_size=args.batch_size, 157 | shuffle=False, 158 | num_workers=2, 159 | pin_memory=True) 160 | 161 | if args.evaluate: 162 | validate(val_loader, model, criterion, gpu, args) 163 | return 164 | 165 | log_csv = "distributed.csv" 166 | 167 | for epoch in range(args.start_epoch, args.epochs): 168 | epoch_start = time.time() 169 | 170 | train_sampler.set_epoch(epoch) 171 | adjust_learning_rate(optimizer, epoch, args) 172 | 173 | # train for one epoch 174 | train(train_loader, model, criterion, optimizer, epoch, gpu, args) 175 | 176 | # evaluate on validation set 177 | acc1 = validate(val_loader, model, criterion, gpu, args) 178 | 179 | # remember best acc@1 and save checkpoint 180 | is_best = acc1 > best_acc1 181 | best_acc1 = max(acc1, best_acc1) 182 | 183 | epoch_end = time.time() 184 | 185 | with open(log_csv, 'a+') as f: 186 | csv_write = csv.writer(f) 187 | data_row = [time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(epoch_start)), epoch_end - epoch_start] 188 | csv_write.writerow(data_row) 189 | 190 | save_checkpoint( 191 | { 192 | 'epoch': epoch + 1, 193 | 'arch': args.arch, 194 | 'state_dict': model.module.state_dict(), 195 | 'best_acc1': best_acc1, 196 | }, is_best) 197 | 198 | 199 | def train(train_loader, model, criterion, optimizer, epoch, gpu, args): 200 | batch_time = AverageMeter('Time', ':6.3f') 201 | data_time = AverageMeter('Data', ':6.3f') 202 | losses = AverageMeter('Loss', ':.4e') 203 | top1 = AverageMeter('Acc@1', ':6.2f') 204 | top5 = AverageMeter('Acc@5', ':6.2f') 205 | progress = ProgressMeter(len(train_loader), [batch_time, data_time, losses, top1, top5], 206 | prefix="Epoch: [{}]".format(epoch)) 207 | 208 | # switch to train mode 209 | model.train() 210 | 211 | end = time.time() 212 | for i, (images, target) in enumerate(train_loader): 213 | # measure data loading time 214 | data_time.update(time.time() - end) 215 | 216 | images = images.cuda(gpu, non_blocking=True) 217 | target = target.cuda(gpu, non_blocking=True) 218 | 219 | # compute output 220 | output = model(images) 221 | loss = criterion(output, target) 222 | 223 | # measure accuracy and record loss 224 | acc1, acc5 = accuracy(output, target, topk=(1, 5)) 225 | losses.update(loss.item(), images.size(0)) 226 | top1.update(acc1[0], images.size(0)) 227 | top5.update(acc5[0], images.size(0)) 228 | 229 | # compute gradient and do SGD step 230 | optimizer.zero_grad() 231 | loss.backward() 232 | optimizer.step() 233 | 234 | # measure elapsed time 235 | batch_time.update(time.time() - end) 236 | end = time.time() 237 | 238 | if i % args.print_freq == 0: 239 | progress.display(i) 240 | 241 | 242 | def validate(val_loader, model, criterion, gpu, args): 243 | batch_time = AverageMeter('Time', ':6.3f') 244 | losses = AverageMeter('Loss', ':.4e') 245 | top1 = AverageMeter('Acc@1', ':6.2f') 246 | top5 = AverageMeter('Acc@5', ':6.2f') 247 | progress = ProgressMeter(len(val_loader), [batch_time, losses, top1, top5], prefix='Test: ') 248 | 249 | # switch to evaluate mode 250 | model.eval() 251 | 252 | with torch.no_grad(): 253 | end = time.time() 254 | for i, (images, target) in enumerate(val_loader): 255 | images = images.cuda(gpu, non_blocking=True) 256 | target = 
target.cuda(gpu, non_blocking=True) 257 | 258 | # compute output 259 | output = model(images) 260 | loss = criterion(output, target) 261 | 262 | # measure accuracy and record loss 263 | acc1, acc5 = accuracy(output, target, topk=(1, 5)) 264 | losses.update(loss.item(), images.size(0)) 265 | top1.update(acc1[0], images.size(0)) 266 | top5.update(acc5[0], images.size(0)) 267 | 268 | # measure elapsed time 269 | batch_time.update(time.time() - end) 270 | end = time.time() 271 | 272 | if i % args.print_freq == 0: 273 | progress.display(i) 274 | 275 | # TODO: this should also be done with the ProgressMeter 276 | print(' * Acc@1 {top1.avg:.3f} Acc@5 {top5.avg:.3f}'.format(top1=top1, top5=top5)) 277 | 278 | return top1.avg 279 | 280 | 281 | def save_checkpoint(state, is_best, filename='checkpoint.pth.tar'): 282 | torch.save(state, filename) 283 | if is_best: 284 | shutil.copyfile(filename, 'model_best.pth.tar') 285 | 286 | 287 | class AverageMeter(object): 288 | """Computes and stores the average and current value""" 289 | def __init__(self, name, fmt=':f'): 290 | self.name = name 291 | self.fmt = fmt 292 | self.reset() 293 | 294 | def reset(self): 295 | self.val = 0 296 | self.avg = 0 297 | self.sum = 0 298 | self.count = 0 299 | 300 | def update(self, val, n=1): 301 | self.val = val 302 | self.sum += val * n 303 | self.count += n 304 | self.avg = self.sum / self.count 305 | 306 | def __str__(self): 307 | fmtstr = '{name} {val' + self.fmt + '} ({avg' + self.fmt + '})' 308 | return fmtstr.format(**self.__dict__) 309 | 310 | 311 | class ProgressMeter(object): 312 | def __init__(self, num_batches, meters, prefix=""): 313 | self.batch_fmtstr = self._get_batch_fmtstr(num_batches) 314 | self.meters = meters 315 | self.prefix = prefix 316 | 317 | def display(self, batch): 318 | entries = [self.prefix + self.batch_fmtstr.format(batch)] 319 | entries += [str(meter) for meter in self.meters] 320 | print('\t'.join(entries)) 321 | 322 | def _get_batch_fmtstr(self, num_batches): 323 | num_digits = len(str(num_batches // 1)) 324 | fmt = '{:' + str(num_digits) + 'd}' 325 | return '[' + fmt + '/' + fmt.format(num_batches) + ']' 326 | 327 | 328 | def adjust_learning_rate(optimizer, epoch, args): 329 | """Sets the learning rate to the initial LR decayed by 10 every 30 epochs""" 330 | lr = args.lr * (0.1**(epoch // 30)) 331 | for param_group in optimizer.param_groups: 332 | param_group['lr'] = lr 333 | 334 | 335 | def accuracy(output, target, topk=(1, )): 336 | """Computes the accuracy over the k top predictions for the specified values of k""" 337 | with torch.no_grad(): 338 | maxk = max(topk) 339 | batch_size = target.size(0) 340 | 341 | _, pred = output.topk(maxk, 1, True, True) 342 | pred = pred.t() 343 | correct = pred.eq(target.view(1, -1).expand_as(pred)) 344 | 345 | res = [] 346 | for k in topk: 347 | correct_k = correct[:k].view(-1).float().sum(0, keepdim=True) 348 | res.append(correct_k.mul_(100.0 / batch_size)) 349 | return res 350 | 351 | 352 | if __name__ == '__main__': 353 | main() -------------------------------------------------------------------------------- /appendix/production/distributed/pytorch-distributed-master/horovod_distributed.py: -------------------------------------------------------------------------------- 1 | import csv 2 | 3 | import argparse 4 | import os 5 | import random 6 | import shutil 7 | import time 8 | import warnings 9 | 10 | import torch 11 | import torch.nn as nn 12 | import torch.nn.parallel 13 | import torch.backends.cudnn as cudnn 14 | import torch.distributed as 
dist 15 | import torch.optim 16 | import torch.multiprocessing as mp 17 | import torch.utils.data 18 | import torch.utils.data.distributed 19 | import torchvision.transforms as transforms 20 | import torchvision.datasets as datasets 21 | import torchvision.models as models 22 | import horovod.torch as hvd 23 | 24 | model_names = sorted(name for name in models.__dict__ 25 | if name.islower() and not name.startswith("__") and callable(models.__dict__[name])) 26 | 27 | parser = argparse.ArgumentParser(description='PyTorch ImageNet Training') 28 | parser.add_argument('--data', metavar='DIR', default='/home/zhangzhi/Data/ImageNet2012', help='path to dataset') 29 | parser.add_argument('-a', 30 | '--arch', 31 | metavar='ARCH', 32 | default='resnet18', 33 | choices=model_names, 34 | help='model architecture: ' + ' | '.join(model_names) + ' (default: resnet18)') 35 | parser.add_argument('-j', 36 | '--workers', 37 | default=4, 38 | type=int, 39 | metavar='N', 40 | help='number of data loading workers (default: 4)') 41 | parser.add_argument('--epochs', default=90, type=int, metavar='N', help='number of total epochs to run') 42 | parser.add_argument('--start-epoch', default=0, type=int, metavar='N', help='manual epoch number (useful on restarts)') 43 | parser.add_argument('-b', 44 | '--batch-size', 45 | default=3200, 46 | type=int, 47 | metavar='N', 48 | help='mini-batch size (default: 3200), this is the total ' 49 | 'batch size of all GPUs on the current node when ' 50 | 'using Data Parallel or Distributed Data Parallel') 51 | parser.add_argument('--lr', 52 | '--learning-rate', 53 | default=0.1, 54 | type=float, 55 | metavar='LR', 56 | help='initial learning rate', 57 | dest='lr') 58 | parser.add_argument('--momentum', default=0.9, type=float, metavar='M', help='momentum') 59 | parser.add_argument('--wd', 60 | '--weight-decay', 61 | default=1e-4, 62 | type=float, 63 | metavar='W', 64 | help='weight decay (default: 1e-4)', 65 | dest='weight_decay') 66 | parser.add_argument('-p', '--print-freq', default=10, type=int, metavar='N', help='print frequency (default: 10)') 67 | parser.add_argument('-e', '--evaluate', dest='evaluate', action='store_true', help='evaluate model on validation set') 68 | parser.add_argument('--pretrained', dest='pretrained', action='store_true', help='use pre-trained model') 69 | parser.add_argument('--seed', default=None, type=int, help='seed for initializing training. ') 70 | 71 | best_acc1 = 0 72 | 73 | 74 | def main(): 75 | args = parser.parse_args() 76 | 77 | if args.seed is not None: 78 | random.seed(args.seed) 79 | torch.manual_seed(args.seed) 80 | cudnn.deterministic = True 81 | warnings.warn('You have chosen to seed training. ' 82 | 'This will turn on the CUDNN deterministic setting, ' 83 | 'which can slow down your training considerably! 
' 84 | 'You may see unexpected behavior when restarting ' 85 | 'from checkpoints.') 86 | 87 | hvd.init() 88 | local_rank = hvd.local_rank() 89 | torch.cuda.set_device(local_rank) 90 | 91 | main_worker(local_rank, 4, args) 92 | 93 | 94 | def main_worker(gpu, ngpus_per_node, args): 95 | global best_acc1 96 | 97 | # create model 98 | if args.pretrained: 99 | print("=> using pre-trained model '{}'".format(args.arch)) 100 | model = models.__dict__[args.arch](pretrained=True) 101 | else: 102 | print("=> creating model '{}'".format(args.arch)) 103 | model = models.__dict__[args.arch]() 104 | 105 | model.cuda() 106 | # When using a single GPU per process and per 107 | # DistributedDataParallel, we need to divide the batch size 108 | # ourselves based on the total number of GPUs we have 109 | args.batch_size = int(args.batch_size / ngpus_per_node) 110 | 111 | hvd.broadcast_parameters(model.state_dict(), root_rank=0) 112 | 113 | # define loss function (criterion) and optimizer 114 | criterion = nn.CrossEntropyLoss().cuda() 115 | 116 | optimizer = torch.optim.SGD(model.parameters(), args.lr, momentum=args.momentum, weight_decay=args.weight_decay) 117 | hvd.broadcast_optimizer_state(optimizer, root_rank=0) 118 | compression = hvd.Compression.fp16 119 | 120 | optimizer = hvd.DistributedOptimizer(optimizer, named_parameters=model.named_parameters(), compression=compression) 121 | 122 | cudnn.benchmark = True 123 | 124 | # Data loading code 125 | traindir = os.path.join(args.data, 'train') 126 | valdir = os.path.join(args.data, 'val') 127 | normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]) 128 | 129 | train_dataset = datasets.ImageFolder( 130 | traindir, 131 | transforms.Compose([ 132 | transforms.RandomResizedCrop(224), 133 | transforms.RandomHorizontalFlip(), 134 | transforms.ToTensor(), 135 | normalize, 136 | ])) 137 | 138 | train_sampler = torch.utils.data.distributed.DistributedSampler(train_dataset, 139 | num_replicas=hvd.size(), 140 | rank=hvd.rank()) 141 | 142 | train_loader = torch.utils.data.DataLoader(train_dataset, 143 | batch_size=args.batch_size, 144 | shuffle=(train_sampler is None), 145 | num_workers=2, 146 | pin_memory=True, 147 | sampler=train_sampler) 148 | 149 | val_loader = torch.utils.data.DataLoader(datasets.ImageFolder( 150 | valdir, 151 | transforms.Compose([ 152 | transforms.Resize(256), 153 | transforms.CenterCrop(224), 154 | transforms.ToTensor(), 155 | normalize, 156 | ])), 157 | batch_size=args.batch_size, 158 | shuffle=False, 159 | num_workers=2, 160 | pin_memory=True) 161 | 162 | if args.evaluate: 163 | validate(val_loader, model, criterion, args) 164 | return 165 | 166 | log_csv = "horovod_distributed.csv" 167 | 168 | for epoch in range(args.start_epoch, args.epochs): 169 | epoch_start = time.time() 170 | 171 | train_sampler.set_epoch(epoch) 172 | adjust_learning_rate(optimizer, epoch, args) 173 | 174 | # train for one epoch 175 | train(train_loader, model, criterion, optimizer, epoch, args) 176 | 177 | # evaluate on validation set 178 | acc1 = validate(val_loader, model, criterion, args) 179 | 180 | # remember best acc@1 and save checkpoint 181 | is_best = acc1 > best_acc1 182 | best_acc1 = max(acc1, best_acc1) 183 | 184 | epoch_end = time.time() 185 | 186 | with open(log_csv, 'a+') as f: 187 | csv_write = csv.writer(f) 188 | data_row = [time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(epoch_start)), epoch_end - epoch_start] 189 | csv_write.writerow(data_row) 190 | 191 | save_checkpoint( 192 | { 193 | 'epoch': epoch + 1, 194 | 
'arch': args.arch, 195 | 'state_dict': model.state_dict(), 196 | 'best_acc1': best_acc1, 197 | }, is_best) 198 | 199 | 200 | def train(train_loader, model, criterion, optimizer, epoch, args): 201 | batch_time = AverageMeter('Time', ':6.3f') 202 | data_time = AverageMeter('Data', ':6.3f') 203 | losses = AverageMeter('Loss', ':.4e') 204 | top1 = AverageMeter('Acc@1', ':6.2f') 205 | top5 = AverageMeter('Acc@5', ':6.2f') 206 | progress = ProgressMeter(len(train_loader), [batch_time, data_time, losses, top1, top5], 207 | prefix="Epoch: [{}]".format(epoch)) 208 | 209 | # switch to train mode 210 | model.train() 211 | 212 | end = time.time() 213 | for i, (images, target) in enumerate(train_loader): 214 | # measure data loading time 215 | data_time.update(time.time() - end) 216 | 217 | images = images.cuda(non_blocking=True) 218 | target = target.cuda(non_blocking=True) 219 | # compute output 220 | output = model(images) 221 | loss = criterion(output, target) 222 | 223 | # measure accuracy and record loss 224 | acc1, acc5 = accuracy(output, target, topk=(1, 5)) 225 | losses.update(loss.item(), images.size(0)) 226 | top1.update(acc1[0], images.size(0)) 227 | top5.update(acc5[0], images.size(0)) 228 | 229 | # compute gradient and do SGD step 230 | optimizer.zero_grad() 231 | loss.backward() 232 | optimizer.step() 233 | 234 | # measure elapsed time 235 | batch_time.update(time.time() - end) 236 | end = time.time() 237 | 238 | if i % args.print_freq == 0: 239 | progress.display(i) 240 | 241 | 242 | def validate(val_loader, model, criterion, args): 243 | batch_time = AverageMeter('Time', ':6.3f') 244 | losses = AverageMeter('Loss', ':.4e') 245 | top1 = AverageMeter('Acc@1', ':6.2f') 246 | top5 = AverageMeter('Acc@5', ':6.2f') 247 | progress = ProgressMeter(len(val_loader), [batch_time, losses, top1, top5], prefix='Test: ') 248 | 249 | # switch to evaluate mode 250 | model.eval() 251 | 252 | with torch.no_grad(): 253 | end = time.time() 254 | for i, (images, target) in enumerate(val_loader): 255 | images = images.cuda(non_blocking=True) 256 | target = target.cuda(non_blocking=True) 257 | # compute output 258 | output = model(images) 259 | loss = criterion(output, target) 260 | 261 | # measure accuracy and record loss 262 | acc1, acc5 = accuracy(output, target, topk=(1, 5)) 263 | losses.update(loss.item(), images.size(0)) 264 | top1.update(acc1[0], images.size(0)) 265 | top5.update(acc5[0], images.size(0)) 266 | 267 | # measure elapsed time 268 | batch_time.update(time.time() - end) 269 | end = time.time() 270 | 271 | if i % args.print_freq == 0: 272 | progress.display(i) 273 | 274 | # TODO: this should also be done with the ProgressMeter 275 | print(' * Acc@1 {top1.avg:.3f} Acc@5 {top5.avg:.3f}'.format(top1=top1, top5=top5)) 276 | 277 | return top1.avg 278 | 279 | 280 | def save_checkpoint(state, is_best, filename='checkpoint.pth.tar'): 281 | torch.save(state, filename) 282 | if is_best: 283 | shutil.copyfile(filename, 'model_best.pth.tar') 284 | 285 | 286 | class AverageMeter(object): 287 | """Computes and stores the average and current value""" 288 | def __init__(self, name, fmt=':f'): 289 | self.name = name 290 | self.fmt = fmt 291 | self.reset() 292 | 293 | def reset(self): 294 | self.val = 0 295 | self.avg = 0 296 | self.sum = 0 297 | self.count = 0 298 | 299 | def update(self, val, n=1): 300 | self.val = val 301 | self.sum += val * n 302 | self.count += n 303 | self.avg = self.sum / self.count 304 | 305 | def __str__(self): 306 | fmtstr = '{name} {val' + self.fmt + '} ({avg' + self.fmt + '})' 307 
| return fmtstr.format(**self.__dict__) 308 | 309 | 310 | class ProgressMeter(object): 311 | def __init__(self, num_batches, meters, prefix=""): 312 | self.batch_fmtstr = self._get_batch_fmtstr(num_batches) 313 | self.meters = meters 314 | self.prefix = prefix 315 | 316 | def display(self, batch): 317 | entries = [self.prefix + self.batch_fmtstr.format(batch)] 318 | entries += [str(meter) for meter in self.meters] 319 | print('\t'.join(entries)) 320 | 321 | def _get_batch_fmtstr(self, num_batches): 322 | num_digits = len(str(num_batches // 1)) 323 | fmt = '{:' + str(num_digits) + 'd}' 324 | return '[' + fmt + '/' + fmt.format(num_batches) + ']' 325 | 326 | 327 | def adjust_learning_rate(optimizer, epoch, args): 328 | """Sets the learning rate to the initial LR decayed by 10 every 30 epochs""" 329 | lr = args.lr * (0.1**(epoch // 30)) 330 | for param_group in optimizer.param_groups: 331 | param_group['lr'] = lr 332 | 333 | 334 | def accuracy(output, target, topk=(1, )): 335 | """Computes the accuracy over the k top predictions for the specified values of k""" 336 | with torch.no_grad(): 337 | maxk = max(topk) 338 | batch_size = target.size(0) 339 | 340 | _, pred = output.topk(maxk, 1, True, True) 341 | pred = pred.t() 342 | correct = pred.eq(target.view(1, -1).expand_as(pred)) 343 | 344 | res = [] 345 | for k in topk: 346 | correct_k = correct[:k].view(-1).float().sum(0, keepdim=True) 347 | res.append(correct_k.mul_(100.0 / batch_size)) 348 | return res 349 | 350 | 351 | if __name__ == '__main__': 352 | main() -------------------------------------------------------------------------------- /appendix/production/distributed/pytorch-distributed-master/multiprocessing_distributed.py: -------------------------------------------------------------------------------- 1 | # https://github.com/pytorch/examples/blob/master/imagenet/main.py 2 | 3 | import csv 4 | 5 | import argparse 6 | import os 7 | import random 8 | import shutil 9 | import time 10 | import warnings 11 | 12 | import torch 13 | import torch.nn as nn 14 | import torch.nn.parallel 15 | import torch.backends.cudnn as cudnn 16 | import torch.distributed as dist 17 | import torch.optim 18 | import torch.multiprocessing as mp 19 | import torch.utils.data 20 | import torch.utils.data.distributed 21 | import torchvision.transforms as transforms 22 | import torchvision.datasets as datasets 23 | import torchvision.models as models 24 | 25 | model_names = sorted(name for name in models.__dict__ 26 | if name.islower() and not name.startswith("__") and callable(models.__dict__[name])) 27 | 28 | parser = argparse.ArgumentParser(description='PyTorch ImageNet Training') 29 | parser.add_argument('--data', metavar='DIR', default='/home/zhangzhi/Data/ImageNet2012', help='path to dataset') 30 | parser.add_argument('-a', 31 | '--arch', 32 | metavar='ARCH', 33 | default='resnet18', 34 | choices=model_names, 35 | help='model architecture: ' + ' | '.join(model_names) + ' (default: resnet18)') 36 | parser.add_argument('-j', 37 | '--workers', 38 | default=4, 39 | type=int, 40 | metavar='N', 41 | help='number of data loading workers (default: 4)') 42 | parser.add_argument('--epochs', default=90, type=int, metavar='N', help='number of total epochs to run') 43 | parser.add_argument('--start-epoch', default=0, type=int, metavar='N', help='manual epoch number (useful on restarts)') 44 | parser.add_argument('-b', 45 | '--batch-size', 46 | default=256, 47 | type=int, 48 | metavar='N', 49 | help='mini-batch size (default: 256), this is the total ' 50 | 'batch size of 
all GPUs on the current node when ' 51 | 'using Data Parallel or Distributed Data Parallel') 52 | parser.add_argument('--lr', 53 | '--learning-rate', 54 | default=0.1, 55 | type=float, 56 | metavar='LR', 57 | help='initial learning rate', 58 | dest='lr') 59 | parser.add_argument('--momentum', default=0.9, type=float, metavar='M', help='momentum') 60 | parser.add_argument('--wd', 61 | '--weight-decay', 62 | default=1e-4, 63 | type=float, 64 | metavar='W', 65 | help='weight decay (default: 1e-4)', 66 | dest='weight_decay') 67 | parser.add_argument('-p', '--print-freq', default=10, type=int, metavar='N', help='print frequency (default: 10)') 68 | parser.add_argument('-e', '--evaluate', dest='evaluate', action='store_true', help='evaluate model on validation set') 69 | parser.add_argument('--pretrained', dest='pretrained', action='store_true', help='use pre-trained model') 70 | parser.add_argument('--seed', default=None, type=int, help='seed for initializing training. ') 71 | 72 | best_acc1 = 0 73 | 74 | 75 | def main(): 76 | args = parser.parse_args() 77 | 78 | if args.seed is not None: 79 | random.seed(args.seed) 80 | torch.manual_seed(args.seed) 81 | cudnn.deterministic = True 82 | warnings.warn('You have chosen to seed training. ' 83 | 'This will turn on the CUDNN deterministic setting, ' 84 | 'which can slow down your training considerably! ' 85 | 'You may see unexpected behavior when restarting ' 86 | 'from checkpoints.') 87 | 88 | mp.spawn(main_worker, nprocs=4, args=(4, args)) 89 | 90 | 91 | def main_worker(gpu, ngpus_per_node, args): 92 | global best_acc1 93 | 94 | dist.init_process_group(backend='nccl', init_method='tcp://127.0.0.1:23456', world_size=4, rank=gpu) 95 | # create model 96 | if args.pretrained: 97 | print("=> using pre-trained model '{}'".format(args.arch)) 98 | model = models.__dict__[args.arch](pretrained=True) 99 | else: 100 | print("=> creating model '{}'".format(args.arch)) 101 | model = models.__dict__[args.arch]() 102 | 103 | torch.cuda.set_device(gpu) 104 | model.cuda(gpu) 105 | # When using a single GPU per process and per 106 | # DistributedDataParallel, we need to divide the batch size 107 | # ourselves based on the total number of GPUs we have 108 | args.batch_size = int(args.batch_size / ngpus_per_node) 109 | model = torch.nn.parallel.DistributedDataParallel(model, device_ids=[gpu]) 110 | 111 | # define loss function (criterion) and optimizer 112 | criterion = nn.CrossEntropyLoss().cuda(gpu) 113 | 114 | optimizer = torch.optim.SGD(model.parameters(), args.lr, momentum=args.momentum, weight_decay=args.weight_decay) 115 | 116 | cudnn.benchmark = True 117 | 118 | # Data loading code 119 | traindir = os.path.join(args.data, 'train') 120 | valdir = os.path.join(args.data, 'val') 121 | normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]) 122 | 123 | train_dataset = datasets.ImageFolder( 124 | traindir, 125 | transforms.Compose([ 126 | transforms.RandomResizedCrop(224), 127 | transforms.RandomHorizontalFlip(), 128 | transforms.ToTensor(), 129 | normalize, 130 | ])) 131 | 132 | train_sampler = torch.utils.data.distributed.DistributedSampler(train_dataset) 133 | 134 | train_loader = torch.utils.data.DataLoader(train_dataset, 135 | batch_size=args.batch_size, 136 | shuffle=(train_sampler is None), 137 | num_workers=2, 138 | pin_memory=True, 139 | sampler=train_sampler) 140 | 141 | val_loader = torch.utils.data.DataLoader(datasets.ImageFolder( 142 | valdir, 143 | transforms.Compose([ 144 | transforms.Resize(256), 145 | 
transforms.CenterCrop(224), 146 | transforms.ToTensor(), 147 | normalize, 148 | ])), 149 | batch_size=args.batch_size, 150 | shuffle=False, 151 | num_workers=2, 152 | pin_memory=True) 153 | 154 | if args.evaluate: 155 | validate(val_loader, model, criterion, gpu, args) 156 | return 157 | 158 | log_csv = "multiprocessing_distributed.csv" 159 | 160 | for epoch in range(args.start_epoch, args.epochs): 161 | epoch_start = time.time() 162 | 163 | train_sampler.set_epoch(epoch) 164 | adjust_learning_rate(optimizer, epoch, args) 165 | 166 | # train for one epoch 167 | train(train_loader, model, criterion, optimizer, epoch, gpu, args) 168 | 169 | # evaluate on validation set 170 | acc1 = validate(val_loader, model, criterion, gpu, args) 171 | 172 | # remember best acc@1 and save checkpoint 173 | is_best = acc1 > best_acc1 174 | best_acc1 = max(acc1, best_acc1) 175 | 176 | epoch_end = time.time() 177 | 178 | with open(log_csv, 'a+') as f: 179 | csv_write = csv.writer(f) 180 | data_row = [time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(epoch_start)), epoch_end - epoch_start] 181 | csv_write.writerow(data_row) 182 | 183 | save_checkpoint( 184 | { 185 | 'epoch': epoch + 1, 186 | 'arch': args.arch, 187 | 'state_dict': model.module.state_dict(), 188 | 'best_acc1': best_acc1, 189 | }, is_best) 190 | 191 | 192 | def train(train_loader, model, criterion, optimizer, epoch, gpu, args): 193 | batch_time = AverageMeter('Time', ':6.3f') 194 | data_time = AverageMeter('Data', ':6.3f') 195 | losses = AverageMeter('Loss', ':.4e') 196 | top1 = AverageMeter('Acc@1', ':6.2f') 197 | top5 = AverageMeter('Acc@5', ':6.2f') 198 | progress = ProgressMeter(len(train_loader), [batch_time, data_time, losses, top1, top5], 199 | prefix="Epoch: [{}]".format(epoch)) 200 | 201 | # switch to train mode 202 | model.train() 203 | 204 | end = time.time() 205 | for i, (images, target) in enumerate(train_loader): 206 | # measure data loading time 207 | data_time.update(time.time() - end) 208 | 209 | images = images.cuda(gpu, non_blocking=True) 210 | target = target.cuda(gpu, non_blocking=True) 211 | 212 | # compute output 213 | output = model(images) 214 | loss = criterion(output, target) 215 | 216 | # measure accuracy and record loss 217 | acc1, acc5 = accuracy(output, target, topk=(1, 5)) 218 | losses.update(loss.item(), images.size(0)) 219 | top1.update(acc1[0], images.size(0)) 220 | top5.update(acc5[0], images.size(0)) 221 | 222 | # compute gradient and do SGD step 223 | optimizer.zero_grad() 224 | loss.backward() 225 | optimizer.step() 226 | 227 | # measure elapsed time 228 | batch_time.update(time.time() - end) 229 | end = time.time() 230 | 231 | if i % args.print_freq == 0: 232 | progress.display(i) 233 | 234 | 235 | def validate(val_loader, model, criterion, gpu, args): 236 | batch_time = AverageMeter('Time', ':6.3f') 237 | losses = AverageMeter('Loss', ':.4e') 238 | top1 = AverageMeter('Acc@1', ':6.2f') 239 | top5 = AverageMeter('Acc@5', ':6.2f') 240 | progress = ProgressMeter(len(val_loader), [batch_time, losses, top1, top5], prefix='Test: ') 241 | 242 | # switch to evaluate mode 243 | model.eval() 244 | 245 | with torch.no_grad(): 246 | end = time.time() 247 | for i, (images, target) in enumerate(val_loader): 248 | images = images.cuda(gpu, non_blocking=True) 249 | target = target.cuda(gpu, non_blocking=True) 250 | 251 | # compute output 252 | output = model(images) 253 | loss = criterion(output, target) 254 | 255 | # measure accuracy and record loss 256 | acc1, acc5 = accuracy(output, target, topk=(1, 5)) 257 | 
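# accuracy() returns one single-element tensor per requested k, holding the
# percentage of correct predictions in this batch; acc1[0] / acc5[0] extract the
# scalars, which the meters average weighted by the batch size below.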
losses.update(loss.item(), images.size(0)) 258 | top1.update(acc1[0], images.size(0)) 259 | top5.update(acc5[0], images.size(0)) 260 | 261 | # measure elapsed time 262 | batch_time.update(time.time() - end) 263 | end = time.time() 264 | 265 | if i % args.print_freq == 0: 266 | progress.display(i) 267 | 268 | # TODO: this should also be done with the ProgressMeter 269 | print(' * Acc@1 {top1.avg:.3f} Acc@5 {top5.avg:.3f}'.format(top1=top1, top5=top5)) 270 | 271 | return top1.avg 272 | 273 | 274 | def save_checkpoint(state, is_best, filename='checkpoint.pth.tar'): 275 | torch.save(state, filename) 276 | if is_best: 277 | shutil.copyfile(filename, 'model_best.pth.tar') 278 | 279 | 280 | class AverageMeter(object): 281 | """Computes and stores the average and current value""" 282 | def __init__(self, name, fmt=':f'): 283 | self.name = name 284 | self.fmt = fmt 285 | self.reset() 286 | 287 | def reset(self): 288 | self.val = 0 289 | self.avg = 0 290 | self.sum = 0 291 | self.count = 0 292 | 293 | def update(self, val, n=1): 294 | self.val = val 295 | self.sum += val * n 296 | self.count += n 297 | self.avg = self.sum / self.count 298 | 299 | def __str__(self): 300 | fmtstr = '{name} {val' + self.fmt + '} ({avg' + self.fmt + '})' 301 | return fmtstr.format(**self.__dict__) 302 | 303 | 304 | class ProgressMeter(object): 305 | def __init__(self, num_batches, meters, prefix=""): 306 | self.batch_fmtstr = self._get_batch_fmtstr(num_batches) 307 | self.meters = meters 308 | self.prefix = prefix 309 | 310 | def display(self, batch): 311 | entries = [self.prefix + self.batch_fmtstr.format(batch)] 312 | entries += [str(meter) for meter in self.meters] 313 | print('\t'.join(entries)) 314 | 315 | def _get_batch_fmtstr(self, num_batches): 316 | num_digits = len(str(num_batches // 1)) 317 | fmt = '{:' + str(num_digits) + 'd}' 318 | return '[' + fmt + '/' + fmt.format(num_batches) + ']' 319 | 320 | 321 | def adjust_learning_rate(optimizer, epoch, args): 322 | """Sets the learning rate to the initial LR decayed by 10 every 30 epochs""" 323 | lr = args.lr * (0.1**(epoch // 30)) 324 | for param_group in optimizer.param_groups: 325 | param_group['lr'] = lr 326 | 327 | 328 | def accuracy(output, target, topk=(1, )): 329 | """Computes the accuracy over the k top predictions for the specified values of k""" 330 | with torch.no_grad(): 331 | maxk = max(topk) 332 | batch_size = target.size(0) 333 | 334 | _, pred = output.topk(maxk, 1, True, True) 335 | pred = pred.t() 336 | correct = pred.eq(target.view(1, -1).expand_as(pred)) 337 | 338 | res = [] 339 | for k in topk: 340 | correct_k = correct[:k].view(-1).float().sum(0, keepdim=True) 341 | res.append(correct_k.mul_(100.0 / batch_size)) 342 | return res 343 | 344 | 345 | if __name__ == '__main__': 346 | main() -------------------------------------------------------------------------------- /appendix/production/distributed/pytorch-distributed-master/requirements.txt: -------------------------------------------------------------------------------- 1 | torch==1.3.0 2 | torchvision==0.4.0 3 | apex==0.9.10 4 | horovod==0.18.2 5 | -------------------------------------------------------------------------------- /appendix/production/distributed/pytorch-distributed-master/start.sh: -------------------------------------------------------------------------------- 1 | python multiprocessing_distributed.py 2 | CUDA_VISIBLE_DEVICES=0,1,2,3 python -m torch.distributed.launch --nproc_per_node=4 distributed.py 3 | CUDA_VISIBLE_DEVICES=0,1,2,3 python -m torch.distributed.launch 
--nproc_per_node=4 apex_distributed.py
4 | HOROVOD_WITH_PYTORCH=1 CUDA_VISIBLE_DEVICES=0,1,2,3 horovodrun -np 4 -H localhost:4 --verbose python horovod_distributed.py
--------------------------------------------------------------------------------
/appendix/production/distributed/pytorch-distributed-master/statistics.sh:
--------------------------------------------------------------------------------
1 | nvidia-smi -i 0,1,2,3 --format=csv,noheader,nounits --query-gpu=timestamp,index,memory.total,memory.used,memory.free,utilization.gpu,utilization.memory -lms 500 -f multiprocessing_distributed_log.csv
2 | nvidia-smi -i 0,1,2,3 --format=csv,noheader,nounits --query-gpu=timestamp,index,memory.total,memory.used,memory.free,utilization.gpu,utilization.memory -lms 500 -f distributed_log.csv
3 | nvidia-smi -i 0,1,2,3 --format=csv,noheader,nounits --query-gpu=timestamp,index,memory.total,memory.used,memory.free,utilization.gpu,utilization.memory -lms 500 -f apex_distributed_log.csv
4 | nvidia-smi -i 0,1,2,3 --format=csv,noheader,nounits --query-gpu=timestamp,index,memory.total,memory.used,memory.free,utilization.gpu,utilization.memory -lms 500 -f horovod_distributed_log.csv
--------------------------------------------------------------------------------
/appendix/production/inference/TensorRT/README.md:
--------------------------------------------------------------------------------
1 | # PyTorch_ONNX_TensorRT
2 | - [ ] Get a classification demo running end to end, plus C++ inference code !!
3 | 
4 | - [ ] Understand the basic logic of TensorRT
5 | 
6 | - [ ] TensorRT alignment checks (int8 quantization)
7 | 
8 | - [ ] How to add a new (custom) layer
9 | 
10 | - [ ] Get a detection pipeline running
11 | 
12 | - [ ] Clean up this README (installation, the two example programs, helper scripts, common errors)
13 | 
14 | 
15 | 
16 | **!!! A serialized engine must be deserialized on a GPU with the same compute capability as the one that built it; look up your card's compute capability at https://developer.nvidia.com/cuda-gpus.**
17 | 
18 | ## Installing TensorRT
19 | 
20 | >Python environment:
21 | >
22 | >*.pth[pytorch] > *.onnx[onnx] > *.trt[tensorRT]
23 | >
24 | >*.pth[pytorch] > *.txtp[TensorRT] Python environment
25 | 
26 | 
27 | 
28 | ## TensorRT execution flow
29 | 
30 | - Create Builder: holds the TensorRT components, the pipeline, buffer addresses, and the input/output dimensions
31 | 
32 | - Create Network: holds the trained network; its input is a model definition (ONNX, TF), and its output is an executable inference engine.
33 | 
34 | - Create Parser: parses the network
35 | 
36 | - Bind inputs, outputs, and any custom components
37 | 
38 | - Serialize or deserialize the engine
39 | 
40 | - Transfer input data (host -> device)
41 | 
42 | - Run the computation
43 | 
44 | - Transfer the results back (device -> host)
45 | 
46 | 
47 | 
48 | ## Bugs
49 | 
50 | **1. AttributeError: 'NoneType' object has no attribute 'create_execution_context'**
51 | 
52 | The builder returned None instead of an engine: for ONNX models the network has to be created with the explicit-batch flag, as below.
53 | 
54 | ~~~python
55 | EXPLICIT_BATCH = 1 << (int)(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
56 | 
57 | def build_engine(model_path):
58 |     with trt.Builder(TRT_LOGGER) as builder, \
59 |         builder.create_network(EXPLICIT_BATCH) as network, \
60 |         trt.OnnxParser(network, TRT_LOGGER) as parser:
61 | ~~~
62 | 
63 | **2. pycuda._driver.LogicError: explicit_context_dependent failed: invalid device context - no currently active context?**
64 | 
65 | Cause: pycuda.driver was never initialized, so there is no active CUDA context. Import pycuda.autoinit after importing pycuda.driver:
66 | 
67 | ~~~
68 | import pycuda.driver as cuda
69 | import pycuda.autoinit
70 | ~~~
71 | 
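Taken together, fixes 1 and 2 give the following minimal build skeleton — a sketch only: it mirrors the TensorRT 6/7-era Python API used by trt_helper.py below, and `model.onnx` is a placeholder path:

~~~python
import tensorrt as trt
import pycuda.driver as cuda
import pycuda.autoinit  # must be imported after pycuda.driver (fix 2)

TRT_LOGGER = trt.Logger()
# ONNX models need a network created with the explicit-batch flag (fix 1)
EXPLICIT_BATCH = 1 << (int)(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)

def build_engine(onnx_path):
    with trt.Builder(TRT_LOGGER) as builder, \
        builder.create_network(EXPLICIT_BATCH) as network, \
        trt.OnnxParser(network, TRT_LOGGER) as parser:
        builder.max_workspace_size = 1 << 30
        with open(onnx_path, 'rb') as f:
            parser.parse(f.read())
        return builder.build_cuda_engine(network)  # returns None on failure

engine = build_engine('model.onnx')
assert engine is not None, 'engine build failed -- check parser/builder errors'
context = engine.create_execution_context()
~~~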
72 | **3. output tensor has no attribute _trt**
73 | 
74 | Some ops in the model have no TensorRT implementation yet; you have to implement them yourself.
75 | 
76 | 
77 | 
78 | ## Credits
79 | 
80 | - https://github.com/zerollzeng/tiny-tensorrt
81 | - https://github.com/NVIDIA-AI-IOT/torch2trt
82 | - https://github.com/Rapternmn/PyTorch-Onnx-Tensorrt
83 | - https://github.com/onnx/onnx-tensorrt
84 | 
85 | 
86 | 
87 | ## BAK
88 | 
89 | **Error: UnicodeDecodeError: 'utf-8' codec can't decode byte 0xaa in position 8: invalid start byte**
90 | 
91 | Cause: a serialized engine file has to be opened in 'rb' mode when it is loaded back, otherwise it cannot be read. Deserializing an engine takes three steps:
92 | 
93 | Open the file in binary mode: with open(cfg.work_dir + 'serialized.engine', 'rb') as f
94 | Create a runtime: trt.Runtime(logger) as runtime
95 | Deserialize the engine through the runtime: engine = runtime.deserialize_cuda_engine(f.read())
96 | 
97 | **Error: onnx.onnx_cpp2py_export.checker.ValidationError: Op registered for Upsample is deprecated in domain_version of 11**
98 | 
99 | Additional error output:
100 | Context: Bad node spec: input: "085_convolutional_lrelu" output: "086_upsample" name: "086_upsample" op_type: "Upsample" attribute
101 | { name: "mode" s: "nearest" type: STRING } attribute { name: "scales" floats: 1 floats: 1 floats: 2 floats: 2 type: FLOATS }
102 | 
103 | Cause: onnx moves quickly; the official Upsample op was removed after onnx 1.5.1, so YOLOv3-style exports fail the checker. See https://devtalk.nvidia.com/default/topic/1052153/jetson-nano/tensorrt-backend-for-onnx-on-jetson-nano/1
104 | Fix: downgrade onnx to 1.4.1:
105 | pip uninstall onnx
106 | pip install onnx==1.4.1
107 | 
108 | 
--------------------------------------------------------------------------------
/appendix/production/inference/TensorRT/cat.jpeg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/polarisZhao/pytorch-cookbook/2de073487a0257936cdbb34920df833ad0603a99/appendix/production/inference/TensorRT/cat.jpeg
--------------------------------------------------------------------------------
/appendix/production/inference/TensorRT/trt_helper.py:
--------------------------------------------------------------------------------
1 | import os
2 | import pycuda.autoinit
3 | import numpy as np
4 | import pycuda.driver as cuda
5 | import tensorrt as trt
6 | import torch
7 | from .trt_int8_calibration_helper import PythonEntropyCalibrator
8 | TRT_LOGGER = trt.Logger() # This logger is required to build an engine
9 | EXPLICIT_BATCH = 1 << (int)(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
10 | 
11 | class HostDeviceMem(object):
12 |     def __init__(self, host_mem, device_mem):
13 |         """Within this context, host_mem means the CPU memory and device_mem means the GPU memory
14 |         """
15 |         self.host = host_mem
16 |         self.device = device_mem
17 |     def __str__(self):
18 |         return "Host:\n" + str(self.host) + "\nDevice:\n" + str(self.device)
19 | 
20 |     def __repr__(self):
21 |         return self.__str__()
22 | 
23 | def allocate_buffers(engine):
24 |     inputs = []
25 |     outputs = []
26 |     bindings = []
27 |     stream = cuda.Stream()
28 |     for binding in engine:
29 |         size = trt.volume(engine.get_binding_shape(binding)) * engine.max_batch_size
30 |         dtype = trt.nptype(engine.get_binding_dtype(binding))
31 |         # Allocate host and device buffers
32 |         host_mem = cuda.pagelocked_empty(size, dtype)
33 |         device_mem = cuda.mem_alloc(host_mem.nbytes)
34 |         # Append the device buffer to device bindings.
35 |         bindings.append(int(device_mem))
36 |         # Append to the appropriate list.
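        # (input bindings are copied host->device before inference; output bindings are copied back device->host afterwards -- see do_inference below)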
37 | if engine.binding_is_input(binding): 38 | inputs.append(HostDeviceMem(host_mem, device_mem)) 39 | else: 40 | outputs.append(HostDeviceMem(host_mem, device_mem)) 41 | return inputs, outputs, bindings, stream 42 | 43 | def get_engine(max_batch_size=1, onnx_file_path="", engine_file_path="",\ 44 | fp16_mode=False, int8_mode=False, calibration_stream=None, save_engine=False, 45 | ): 46 | """Attempts to load a serialized engine if available, otherwise builds a new TensorRT engine and saves it.""" 47 | def build_engine(max_batch_size, save_engine): 48 | """Takes an ONNX file and creates a TensorRT engine to run inference with""" 49 | with trt.Builder(TRT_LOGGER) as builder, \ 50 | builder.create_network(EXPLICIT_BATCH) as network,\ 51 | trt.OnnxParser(network, TRT_LOGGER) as parser: 52 | 53 | builder.max_workspace_size = 1 << 30 # Your workspace size 54 | builder.max_batch_size = max_batch_size 55 | #pdb.set_trace() 56 | builder.fp16_mode = fp16_mode # Default: False 57 | builder.int8_mode = int8_mode # Default: False 58 | if int8_mode: 59 | assert calibration_stream, 'Error: a calibration_stream should be provided for int8 mode' 60 | builder.int8_calibrator = PythonEntropyCalibrator(["input"], calibration_stream, 'calibration_cache.bin') 61 | print('Int8 mode enabled') 62 | # Parse model file 63 | if not os.path.exists(onnx_file_path): 64 | quit('ONNX file {} not found'.format(onnx_file_path)) 65 | 66 | print('Loading ONNX file from path {}...'.format(onnx_file_path)) 67 | with open(onnx_file_path, 'rb') as model: 68 | print('Beginning ONNX file parsing') 69 | parser.parse(model.read()) 70 | assert network.num_layers > 0, 'Failed to parse ONNX model. \ 71 | Please check if the ONNX model is compatible ' 72 | 73 | print('Completed parsing of ONNX file') 74 | 75 | print('Building an engine from file {}; this may take a while...'.format(onnx_file_path)) 76 | engine = builder.build_cuda_engine(network) 77 | # If errors happend when executing builder.build_cuda_engine(network), 78 | # a None-Type object would be returned 79 | if engine is None: 80 | print('Failed to create the engine') 81 | return None 82 | 83 | print("Completed creating the engine") 84 | if save_engine: 85 | with open(engine_file_path, "wb") as f: 86 | f.write(engine.serialize()) 87 | return engine 88 | 89 | if os.path.exists(engine_file_path): 90 | # If a serialized engine exists, load it instead of building a new one. 91 | print("Reading engine from file {}".format(engine_file_path)) 92 | with open(engine_file_path, "rb") as f, trt.Runtime(TRT_LOGGER) as runtime: 93 | return runtime.deserialize_cuda_engine(f.read()) 94 | else: 95 | return build_engine(max_batch_size, save_engine) 96 | 97 | def do_inference(context, bindings, inputs, outputs, stream, batch_size=1): 98 | # Transfer data from CPU to the GPU. 99 | [cuda.memcpy_htod_async(inp.device, inp.host, stream) for inp in inputs] 100 | # Run inference. 101 | context.execute_async(batch_size=batch_size, bindings=bindings, stream_handle=stream.handle) 102 | # Transfer predictions back from the GPU. 103 | [cuda.memcpy_dtoh_async(out.host, out.device, stream) for out in outputs] 104 | # Synchronize the stream 105 | stream.synchronize() 106 | # Return only the host outputs. 
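    # (the HostDeviceMem buffers created in allocate_buffers stay alive, so they can be reused on the next call)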
107 | return [out.host for out in outputs] 108 | 109 | def postprocess_the_outputs(h_outputs, shape_of_output): 110 | h_outputs = h_outputs.reshape(*shape_of_output) 111 | return h_outputs 112 | -------------------------------------------------------------------------------- /appendix/production/inference/TensorRT/trt_int8_calibration_helper.py: -------------------------------------------------------------------------------- 1 | import os 2 | import tensorrt as trt 3 | import pycuda.driver as cuda 4 | import pycuda.autoinit 5 | import numpy as np 6 | import ctypes 7 | 8 | ctypes.pythonapi.PyCapsule_GetPointer.restype = ctypes.c_char_p 9 | ctypes.pythonapi.PyCapsule_GetPointer.argtypes = [ctypes.py_object, ctypes.c_char_p] 10 | 11 | class PythonEntropyCalibrator(trt.IInt8EntropyCalibrator): 12 | def __init__(self, input_layers, stream, cache_file='calibration_cache.bin'): 13 | trt.IInt8EntropyCalibrator.__init__(self) 14 | self.input_layers = input_layers 15 | self.stream = stream 16 | self.d_input = cuda.mem_alloc(self.stream.calibration_data.nbytes) 17 | self.cache_file = cache_file 18 | stream.reset() 19 | 20 | def get_batch_size(self): 21 | return self.stream.batch_size 22 | 23 | def get_batch(self, bindings, names): 24 | batch = self.stream.next_batch() 25 | if not batch.size: 26 | return None 27 | 28 | cuda.memcpy_htod(self.d_input, batch) 29 | for i in self.input_layers[0]: 30 | assert names[0] != i 31 | 32 | bindings[0] = int(self.d_input) 33 | return bindings 34 | 35 | def read_calibration_cache(self, length): 36 | # If there is a cache, use it instead of calibrating again. Otherwise, implicitly return None. 37 | if os.path.exists(self.cache_file): 38 | with open(self.cache_file, "rb") as f: 39 | return f.read() 40 | 41 | def write_calibration_cache(self, ptr, size): 42 | #cache = ctypes.c_char_p(int(ptr)) 43 | value = ctypes.pythonapi.PyCapsule_GetPointer(ptr, None) 44 | 45 | ''' 46 | # TODO: If the calibration is read from cache 'calibration_cache.bin', it will raise bugs 47 | # Will solve this in the future. 
48 | with open(self.cache_file, 'wb') as f: 49 | #f.write(cache.value) 50 | f.write(value) 51 | ''' 52 | return None 53 | 54 | 55 | class ImageBatchStreamDemo(): 56 | def __init__(self,dataset, transform, batch_size, img_size, max_batches=10): 57 | ''' 58 | For calibiration, you need to implement your 'next_batch' and 'reset' functions 59 | ''' 60 | self.transform = transform 61 | self.batch_size = batch_size 62 | self.max_batches = max_batches 63 | self.dataset = dataset 64 | 65 | # self.calibration_data = np.zeros((batch_size, 3, 800, 250), dtype=np.float32) 66 | self.calibration_data = np.zeros((batch_size,)+ img_size, dtype=np.float32) # This is a data holder for the calibration 67 | self.batch_count = 0 68 | 69 | 70 | def reset(self): 71 | self.batch_count = 0 72 | 73 | def next_batch(self): 74 | """ 75 | Return a batch of data every time called 76 | """ 77 | #self.max_batches = 2 78 | if self.batch_count < self.max_batches: 79 | i = self.batch_count 80 | for i in range(self.batch_size): 81 | # You should implement your own data pipeline for writing the calibration_data 82 | 83 | x = self.dataset[i] 84 | if self.transform: 85 | x = self.transform(x) 86 | 87 | self.calibration_data[i] = x.data 88 | self.batch_count += 1 89 | return np.ascontiguousarray(self.calibration_data, dtype=np.float32) 90 | else: 91 | return np.array([]) 92 | -------------------------------------------------------------------------------- /appendix/production/inference/flask-api/.gitignore: -------------------------------------------------------------------------------- 1 | # Byte-compiled / optimized / DLL files 2 | __pycache__/ 3 | *.py[cod] 4 | *$py.class 5 | 6 | # C extensions 7 | *.so 8 | 9 | # Distribution / packaging 10 | .Python 11 | build/ 12 | develop-eggs/ 13 | dist/ 14 | downloads/ 15 | eggs/ 16 | .eggs/ 17 | lib/ 18 | lib64/ 19 | parts/ 20 | sdist/ 21 | var/ 22 | wheels/ 23 | share/python-wheels/ 24 | *.egg-info/ 25 | .installed.cfg 26 | *.egg 27 | MANIFEST 28 | 29 | # PyInstaller 30 | # Usually these files are written by a python script from a template 31 | # before PyInstaller builds the exe, so as to inject date/other infos into it. 
32 | *.manifest 33 | *.spec 34 | 35 | # Installer logs 36 | pip-log.txt 37 | pip-delete-this-directory.txt 38 | 39 | # Unit test / coverage reports 40 | htmlcov/ 41 | .tox/ 42 | .nox/ 43 | .coverage 44 | .coverage.* 45 | .cache 46 | nosetests.xml 47 | coverage.xml 48 | *.cover 49 | .hypothesis/ 50 | .pytest_cache/ 51 | 52 | # Translations 53 | *.mo 54 | *.pot 55 | 56 | # Django stuff: 57 | *.log 58 | local_settings.py 59 | db.sqlite3 60 | 61 | # Flask stuff: 62 | instance/ 63 | .webassets-cache 64 | 65 | # Scrapy stuff: 66 | .scrapy 67 | 68 | # Sphinx documentation 69 | docs/_build/ 70 | 71 | # PyBuilder 72 | target/ 73 | 74 | # Jupyter Notebook 75 | .ipynb_checkpoints 76 | 77 | # IPython 78 | profile_default/ 79 | ipython_config.py 80 | 81 | # pyenv 82 | .python-version 83 | 84 | # celery beat schedule file 85 | celerybeat-schedule 86 | 87 | # SageMath parsed files 88 | *.sage.py 89 | 90 | # Environments 91 | .env 92 | .venv 93 | env/ 94 | venv/ 95 | ENV/ 96 | env.bak/ 97 | venv.bak/ 98 | 99 | # Spyder project settings 100 | .spyderproject 101 | .spyproject 102 | 103 | # Rope project settings 104 | .ropeproject 105 | 106 | # mkdocs documentation 107 | /site 108 | 109 | # mypy 110 | .mypy_cache/ 111 | .dmypy.json 112 | dmypy.json 113 | 114 | # Pyre type checker 115 | .pyre/ -------------------------------------------------------------------------------- /appendix/production/inference/flask-api/LICENSE: -------------------------------------------------------------------------------- 1 | The MIT License (MIT) 2 | 3 | Copyright (c) 2019 Avinash Sajjanshetty 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy of 6 | this software and associated documentation files (the "Software"), to deal in 7 | the Software without restriction, including without limitation the rights to 8 | use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of 9 | the Software, and to permit persons to whom the Software is furnished to do so, 10 | subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS 17 | FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR 18 | COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER 19 | IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN 20 | CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 21 | -------------------------------------------------------------------------------- /appendix/production/inference/flask-api/README.md: -------------------------------------------------------------------------------- 1 | # PyTorch Flask API 2 | 3 | This repo contains a sample code to show how to create a Flask API server by deploying our PyTorch model. This is a sample code which goes with [tutorial](https://pytorch.org/tutorials/intermediate/flask_rest_api_tutorial.html). 4 | 5 | If you'd like to learn how to deploy to Heroku, then check [this repo](https://github.com/avinassh/pytorch-flask-api-heroku). 
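
A successful request returns the predicted ImageNet class as JSON, e.g. (exact values depend on the input image):

    {"class_id": "n02123045", "class_name": "tabby"}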
6 | 7 | 8 | ## How to 9 | 10 | Install the dependencies: 11 | 12 | pip install -r requirements.txt 13 | 14 | 15 | Run the Flask server: 16 | 17 | FLASK_ENV=development FLASK_APP=app.py flask run 18 | 19 | 20 | From another tab, send the image file in a request: 21 | 22 | curl -X POST -F file=@cat_pic.jpeg http://localhost:5000/predict 23 | 24 | 25 | ## License 26 | 27 | The mighty MIT license. Please check `LICENSE` for more details. 28 | -------------------------------------------------------------------------------- /appendix/production/inference/flask-api/app.py: -------------------------------------------------------------------------------- 1 | import io 2 | import json 3 | 4 | from torchvision import models 5 | import torchvision.transforms as transforms 6 | from PIL import Image 7 | from flask import Flask, jsonify, request 8 | 9 | 10 | app = Flask(__name__) 11 | imagenet_class_index = json.load(open('imagenet_class_index.json')) 12 | model = models.densenet121(pretrained=True) 13 | model.eval() 14 | 15 | 16 | def transform_image(image_bytes): 17 | my_transforms = transforms.Compose([transforms.Resize(255), 18 | transforms.CenterCrop(224), 19 | transforms.ToTensor(), 20 | transforms.Normalize( 21 | [0.485, 0.456, 0.406], 22 | [0.229, 0.224, 0.225])]) 23 | image = Image.open(io.BytesIO(image_bytes)) 24 | return my_transforms(image).unsqueeze(0) 25 | 26 | 27 | def get_prediction(image_bytes): 28 | tensor = transform_image(image_bytes=image_bytes) 29 | outputs = model.forward(tensor) 30 | _, y_hat = outputs.max(1) 31 | predicted_idx = str(y_hat.item()) 32 | return imagenet_class_index[predicted_idx] 33 | 34 | 35 | @app.route('/predict', methods=['POST']) 36 | def predict(): 37 | if request.method == 'POST': 38 | file = request.files['file'] 39 | img_bytes = file.read() 40 | class_id, class_name = get_prediction(image_bytes=img_bytes) 41 | return jsonify({'class_id': class_id, 'class_name': class_name}) 42 | 43 | 44 | if __name__ == '__main__': 45 | app.run() 46 | -------------------------------------------------------------------------------- /appendix/production/inference/flask-api/cat.jpeg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/polarisZhao/pytorch-cookbook/2de073487a0257936cdbb34920df833ad0603a99/appendix/production/inference/flask-api/cat.jpeg -------------------------------------------------------------------------------- /appendix/production/inference/flask-api/requirements.txt: -------------------------------------------------------------------------------- 1 | torchvision==0.6.0+cu101 2 | Flask==1.1.2 3 | Pillow==7.1.2 4 | -------------------------------------------------------------------------------- /appendix/template/.gitignore: -------------------------------------------------------------------------------- 1 | 2 | .DS_Store 3 | build 4 | .git 5 | *.egg-info 6 | dist 7 | output 8 | data/coco 9 | backup 10 | weights/*.weights 11 | __pycache__ 12 | checkpoints 13 | -------------------------------------------------------------------------------- /appendix/template/README.md: -------------------------------------------------------------------------------- 1 | # PyTorch-Template 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | ### TBD: 12 | 13 | - [ ] Test file 14 | - [ ] README 15 | 16 | ## Credit 17 | 18 | https://github.com/eriklindernoren/PyTorch-YOLOv3 -------------------------------------------------------------------------------- /appendix/template/assets/images/01.JPG: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/polarisZhao/pytorch-cookbook/2de073487a0257936cdbb34920df833ad0603a99/appendix/template/assets/images/01.JPG -------------------------------------------------------------------------------- /appendix/template/assets/images/02.JPG: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/polarisZhao/pytorch-cookbook/2de073487a0257936cdbb34920df833ad0603a99/appendix/template/assets/images/02.JPG -------------------------------------------------------------------------------- /appendix/template/data/test/.gitkeep: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/polarisZhao/pytorch-cookbook/2de073487a0257936cdbb34920df833ad0603a99/appendix/template/data/test/.gitkeep -------------------------------------------------------------------------------- /appendix/template/data/test/high/1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/polarisZhao/pytorch-cookbook/2de073487a0257936cdbb34920df833ad0603a99/appendix/template/data/test/high/1.png -------------------------------------------------------------------------------- /appendix/template/data/test/high/22.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/polarisZhao/pytorch-cookbook/2de073487a0257936cdbb34920df833ad0603a99/appendix/template/data/test/high/22.png -------------------------------------------------------------------------------- /appendix/template/data/test/low/1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/polarisZhao/pytorch-cookbook/2de073487a0257936cdbb34920df833ad0603a99/appendix/template/data/test/low/1.png -------------------------------------------------------------------------------- /appendix/template/data/test/low/22.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/polarisZhao/pytorch-cookbook/2de073487a0257936cdbb34920df833ad0603a99/appendix/template/data/test/low/22.png -------------------------------------------------------------------------------- /appendix/template/data/train/high/2.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/polarisZhao/pytorch-cookbook/2de073487a0257936cdbb34920df833ad0603a99/appendix/template/data/train/high/2.png -------------------------------------------------------------------------------- /appendix/template/data/train/high/5.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/polarisZhao/pytorch-cookbook/2de073487a0257936cdbb34920df833ad0603a99/appendix/template/data/train/high/5.png -------------------------------------------------------------------------------- /appendix/template/data/train/low/2.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/polarisZhao/pytorch-cookbook/2de073487a0257936cdbb34920df833ad0603a99/appendix/template/data/train/low/2.png -------------------------------------------------------------------------------- /appendix/template/data/train/low/5.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/polarisZhao/pytorch-cookbook/2de073487a0257936cdbb34920df833ad0603a99/appendix/template/data/train/low/5.png -------------------------------------------------------------------------------- /appendix/template/data/valid/high/1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/polarisZhao/pytorch-cookbook/2de073487a0257936cdbb34920df833ad0603a99/appendix/template/data/valid/high/1.png -------------------------------------------------------------------------------- /appendix/template/data/valid/high/22.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/polarisZhao/pytorch-cookbook/2de073487a0257936cdbb34920df833ad0603a99/appendix/template/data/valid/high/22.png -------------------------------------------------------------------------------- /appendix/template/data/valid/low/1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/polarisZhao/pytorch-cookbook/2de073487a0257936cdbb34920df833ad0603a99/appendix/template/data/valid/low/1.png -------------------------------------------------------------------------------- /appendix/template/data/valid/low/22.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/polarisZhao/pytorch-cookbook/2de073487a0257936cdbb34920df833ad0603a99/appendix/template/data/valid/low/22.png -------------------------------------------------------------------------------- /appendix/template/datasets.py: -------------------------------------------------------------------------------- 1 | import os 2 | import random 3 | from glob import glob 4 | 5 | import numpy as np 6 | from PIL import Image 7 | 8 | import torch 9 | from torchvision import transforms 10 | from torch.utils.data import Dataset, DataLoader 11 | from torchvision.transforms import functional as F 12 | 13 | def transform_train(data, target): 14 | # random crop 15 | i, j, h, w = transforms.RandomCrop.get_params( 16 | data, output_size=(96, 96)) 17 | data = F.crop(data, i, j, h, w) 18 | target = F.crop(target, i, j, h, w) 19 | # hflip 20 | if random.random() > 0.5: 21 | data = F.hflip(data) 22 | target = F.hflip(target) 23 | return F.to_tensor(data), F.to_tensor(target) 24 | 25 | def transform_test(data, target): 26 | return F.to_tensor(data), F.to_tensor(target) 27 | 28 | 29 | class KKDataset(Dataset): 30 | """ 31 | """ 32 | def __init__(self, root_dir='./data/train', is_trainval = True, transform=None): 33 | """ 34 | """ 35 | self.root_dir = root_dir 36 | self.is_trainval = is_trainval 37 | self.transform = transform 38 | self.train_data = sorted(glob(os.path.join(self.root_dir, "low/*.png"))) 39 | self.train_target = sorted(glob(os.path.join(self.root_dir, "high/*.png"))) 40 | 41 | def __len__(self): 42 | return len(self.train_data) 43 | 44 | def __getitem__(self, idx): 45 | if torch.is_tensor(idx): 46 | idx = idx.tolist() 47 | 48 | data = Image.open(self.train_data[idx]) 49 | target = Image.open(self.train_target[idx]) 50 | 51 | if self.transform: 52 | data, target = self.transform(data, target) 53 | 54 | sample = {'data': data, 'target': target} 55 | return sample -------------------------------------------------------------------------------- /appendix/template/logs/.gitignore: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/polarisZhao/pytorch-cookbook/2de073487a0257936cdbb34920df833ad0603a99/appendix/template/logs/.gitignore
--------------------------------------------------------------------------------
/appendix/template/loss.py:
--------------------------------------------------------------------------------
1 | import torch
2 | import torch.nn as nn
3 | 
4 | class RRLoss(nn.Module):
5 |     def __init__(self):
6 |         """ RRLoss"""
7 |         super(RRLoss, self).__init__()
8 | 
9 |     def forward(self, predict, data):
10 |         return torch.mean((predict - data) ** 2)
--------------------------------------------------------------------------------
/appendix/template/metrics.py:
--------------------------------------------------------------------------------
1 | import torch
2 | 
3 | class PSNR():
4 |     """Peak Signal to Noise Ratio
5 |     img1 and img2 have range [0, 255]"""
6 | 
7 |     def __init__(self):
8 |         self.name = "PSNR"
9 | 
10 |     @staticmethod
11 |     def __call__(img1, img2):
12 |         mse = torch.mean((img1 - img2) ** 2)
13 |         return 20 * torch.log10(255.0 / torch.sqrt(mse))
--------------------------------------------------------------------------------
/appendix/template/models.py:
--------------------------------------------------------------------------------
1 | import torch
2 | import torch.nn as nn
3 | import torch.nn.functional as F
4 | 
5 | class XXXNet(nn.Module):
6 |     """
7 |     """
8 |     def __init__(self):
9 |         super(XXXNet, self).__init__()
10 |         self.conv1 = nn.Conv2d(3, 3, kernel_size=3, padding=1)
11 |         # ...
12 | 
13 |     def forward(self, x):
14 |         out = self.conv1(x)
15 |         # ...
16 |         return out
--------------------------------------------------------------------------------
/appendix/template/requirements.txt:
--------------------------------------------------------------------------------
1 | torch==1.3.0+cu100
2 | numpy==1.18.4
3 | torchvision==0.6.0+cu101
4 | Pillow==7.1.2
--------------------------------------------------------------------------------
/appendix/template/test.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/polarisZhao/pytorch-cookbook/2de073487a0257936cdbb34920df833ad0603a99/appendix/template/test.py
--------------------------------------------------------------------------------
/appendix/template/train.py:
--------------------------------------------------------------------------------
1 | from __future__ import print_function
2 | 
3 | import os
4 | import argparse
5 | import shutil
6 | import numpy as np
7 | 
8 | import torch
9 | import torch.nn as nn
10 | import torch.nn.functional as F
11 | import torch.optim as optim
12 | import torch.backends.cudnn as cudnn
13 | from torchvision import datasets, transforms
14 | from torch.utils.data import DataLoader
15 | 
16 | from datasets import KKDataset, transform_train, transform_test #
17 | from models import XXXNet #
18 | from loss import RRLoss #
19 | from metrics import PSNR #
20 | from utils import *
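# NOTE: the bare trailing '#' marks in this template appear to flag the lines to
# swap out per project (dataset, model, loss, metric) -- see train()/test() below.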
21 | 
22 | # Training settings
23 | parser = argparse.ArgumentParser(description='PyTorch Template training')
24 | # env
25 | parser.add_argument('--use_gpu', action='store_true', default=True, help='enables/disables CUDA training')
26 | parser.add_argument('--seed', type=int, default=2020, metavar='S', help='random seed (default: 2020)')
27 | parser.add_argument('--log_path', type=str, default='./logs', metavar='PATH', help='path to save logs (default: current directory/logs)')
28 | # data
29 | parser.add_argument('--train_dataset', type=str, default='./data/train', help='training dataset path')
30 | parser.add_argument('--valid_dataset', type=str, default='./data/valid', help='validation dataset path')
31 | parser.add_argument('--batch_size', type=int, default=16, metavar='N', help='input batch size for training (default: 16)')
32 | parser.add_argument('--val_batch_size', type=int, default=16, metavar='N', help='input batch size for validation (default: 16)')
33 | # models
34 | parser.add_argument('--resume', type=str, default='', metavar='PATH', help='path to a checkpoint to resume from (default: none)')
35 | parser.add_argument('--checkpoint_dir', type=str, default='./checkpoints', metavar='PATH', help='path to save checkpoints (default: current directory/checkpoints)')
36 | # optimizer & lr
37 | parser.add_argument('--init_lr', type=float, default=0.01, metavar='LR', help='initial learning rate (default: 0.01)')
38 | parser.add_argument('--momentum', type=float, default=0.9, metavar='M', help='SGD momentum (default: 0.9)')
39 | parser.add_argument('--weight_decay', '--wd', default=1e-4, type=float, metavar='W', help='weight decay (default: 1e-4)')
40 | # epoch & save-interval
41 | parser.add_argument('--epochs', type=int, default=160, metavar='N', help='number of epochs to train (default: 160)')
42 | parser.add_argument('--start_epoch', type=int, default=0, metavar='N', help='manual epoch number (useful on restarts)')
43 | parser.add_argument('--save_interval', type=int, default=1, metavar='N', help='how many epochs to wait between checkpoint saves')
44 | args = parser.parse_args()
45 | 
46 | 
47 | # global setting
48 | torch.manual_seed(args.seed)
49 | 
50 | device = torch.device("cuda" if (args.use_gpu and torch.cuda.is_available()) else "cpu")
51 | if device.type == "cuda":
52 |     torch.cuda.manual_seed(args.seed)
53 |     cudnn.benchmark = True
54 |     cudnn.enabled = True
55 | 
56 | if not os.path.exists(args.checkpoint_dir):
57 |     os.makedirs(args.checkpoint_dir)
58 | if not os.path.exists(args.log_path):
59 |     os.makedirs(args.log_path)
60 | 
61 | 
62 | # dataset
63 | train_dataset = KKDataset(args.train_dataset, is_trainval = True, transform = transform_train) #
64 | train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=args.batch_size,
65 |                                            shuffle=True, num_workers=0, drop_last=False) #
66 | valid_dataset = KKDataset(args.valid_dataset, is_trainval = True, transform = transform_test) #
67 | valid_loader = torch.utils.data.DataLoader(valid_dataset, batch_size=args.val_batch_size,
68 |                                            shuffle=False, num_workers=0) #
69 | # model & loss
70 | model = XXXNet().to(device) #
71 | lossfunc = RRLoss() #
72 | criterion = PSNR() #
73 | # lr & optimizer
74 | optimizer = optim.SGD(model.parameters(), lr=args.init_lr, momentum=args.momentum, weight_decay=args.weight_decay)
75 | scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[50, 70], gamma=0.1)
76 | 
77 | best_prec = 0.0
78 | 
79 | # load resume
80 | if args.resume:
81 |     if os.path.isfile(args.resume):
82 |         print("=> loading checkpoint '{}'".format(args.resume))
83 |         checkpoint = torch.load(args.resume)
84 |         args.start_epoch = checkpoint['epoch']
85 |         best_prec = checkpoint['best_prec']
86 |         model.load_state_dict(checkpoint['state_dict'])
87 |         optimizer.load_state_dict(checkpoint['optimizer'])
88 |         print("=> loaded checkpoint '{}' (epoch {}) Prec: {:f}"
89 |               .format(args.resume, checkpoint['epoch'], best_prec))
90 |     else:
91 |         print("=> no checkpoint found at '{}'".format(args.resume))
92 | 
93 | def train(epoch):
94 |     model.train()
95 | 
96 |     avg_loss = 0.0
97 |     train_acc = 0.0
98 |     for batch_idx, batchdata in enumerate(train_loader):
99 |         data, target = batchdata["data"], batchdata["target"] #
100 |         data, target = data.to(device), target.to(device) #
101 |         optimizer.zero_grad()
102 | 
103 |         predict = model(data) #
104 |         loss = lossfunc(predict, target) #
105 |         avg_loss += loss.item() #
106 | 
107 |         loss.backward()
108 |         optimizer.step()
109 | 
110 |         print('Train Epoch: {} [{}/{} ({:.1f}%)]\tLoss: {:.6f}'.format(
111 |             epoch, batch_idx * len(data), len(train_loader.dataset),
112 |             100. * batch_idx / len(train_loader), loss.item()))
113 | 
114 |     if (epoch + 1) % args.save_interval == 0:
115 |         state = { 'epoch': epoch + 1,
116 |                   'state_dict': model.state_dict(),
117 |                   'best_prec': best_prec,
118 |                   'optimizer': optimizer.state_dict()}
119 |         model_path = os.path.join(args.checkpoint_dir, 'model_' + str(epoch) + '.pth')
120 |         torch.save(state, model_path)
121 | 
122 | 
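# Usage sketch (assumed invocation): python train.py --train_dataset ./data/train --valid_dataset ./data/valid
# To resume from the rolling checkpoint written by utils.save_checkpoint:
#   python train.py --resume ./checkpoints/checkpoint_latest.pth.tar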
123 | def test():
124 |     model.eval()
125 | 
126 |     test_loss = 0.0
127 |     total_psnr = 0.0
128 |     with torch.no_grad():
129 |         for batch_idx, batchdata in enumerate(valid_loader):
130 |             data, target = batchdata["data"], batchdata["target"] #
131 |             data, target = data.to(device), target.to(device) #
132 |             predict = model(data) #
133 |             test_loss += lossfunc(predict, target).item() #
134 |             total_psnr += criterion(predict * 255, target * 255).item() # PSNR expects the [0, 255] range
135 | 
136 |     test_loss /= len(valid_loader)
137 |     avg_psnr = total_psnr / len(valid_loader)
138 |     print('\nTest set: Average loss: {:.4f}, PSNR: {:.1f}\n'.format(test_loss, avg_psnr))
139 |     return avg_psnr
140 | 
141 | 
142 | for epoch in range(args.start_epoch, args.epochs):
143 |     train(epoch)
144 |     scheduler.step()
145 |     print(optimizer.state_dict()['param_groups'][0]['lr'])
146 | 
147 |     current_prec = test()
148 |     is_best = current_prec > best_prec # flip the comparison if a smaller metric is better
149 |     best_prec = max(current_prec, best_prec) # use min(...) when smaller is better
150 | 
151 |     save_checkpoint({
152 |         'epoch': epoch + 1,
153 |         'state_dict': model.state_dict(),
154 |         'best_prec': best_prec,
155 |         'optimizer': optimizer.state_dict(),
156 |     }, is_best, args.checkpoint_dir)
--------------------------------------------------------------------------------
/appendix/template/utils.py:
--------------------------------------------------------------------------------
1 | import os
2 | import shutil
3 | import torch
4 | 
5 | def save_checkpoint(state, is_best, checkpoint_dir):
6 |     torch.save(state, os.path.join(checkpoint_dir, 'checkpoint_latest.pth.tar'))
7 |     if is_best:
8 |         shutil.copyfile(os.path.join(checkpoint_dir, 'checkpoint_latest.pth.tar'),
9 |                         os.path.join(checkpoint_dir, 'checkpoint_best.pth.tar'))
--------------------------------------------------------------------------------