├── .gitignore
├── README.md
└── appendix
    ├── PytorchBugs.md
    ├── module
    │   ├── .DS_Store
    │   ├── data
    │   │   └── data_aug.py
    │   ├── models
    │   │   ├── classification.md
    │   │   ├── fpn.py
    │   │   ├── mnasnet.py
    │   │   ├── mobilenet_v1.py
    │   │   ├── resnet.py
    │   │   ├── senet.py
    │   │   ├── shufflenet_v2.py
    │   │   └── unet.py
    │   └── optimizer
    │       └── Learning_rate.ipynb
    ├── production
    │   ├── distributed
    │   │   ├── pytorch-distributed-example-master
    │   │   │   ├── .gitignore
    │   │   │   ├── LICENSE
    │   │   │   ├── README.md
    │   │   │   ├── mnist
    │   │   │   │   ├── Dockerfile
    │   │   │   │   ├── README.md
    │   │   │   │   ├── docker-compose-gpu.yml
    │   │   │   │   ├── docker-compose.yml
    │   │   │   │   └── main.py
    │   │   │   ├── setup.cfg
    │   │   │   └── toy
    │   │   │       ├── README.md
    │   │   │       └── main.py
    │   │   └── pytorch-distributed-master
    │   │       ├── .gitignore
    │   │       ├── LICENSE
    │   │       ├── README.md
    │   │       ├── apex_distributed.py
    │   │       ├── dataparallel.py
    │   │       ├── distributed.py
    │   │       ├── distributed_slurm_main.py
    │   │       ├── horovod_distributed.py
    │   │       ├── multiprocessing_distributed.py
    │   │       ├── requirements.txt
    │   │       ├── start.sh
    │   │       └── statistics.sh
    │   └── inference
    │       ├── TensorRT
    │       │   ├── .ipynb_checkpoints
    │       │   │   └── pytorch_onnx_trt-checkpoint.ipynb
    │       │   ├── README.md
    │       │   ├── cat.jpeg
    │       │   ├── pytorch_onnx_trt.ipynb
    │       │   ├── trt_helper.py
    │       │   └── trt_int8_calibration_helper.py
    │       └── flask-api
    │           ├── .gitignore
    │           ├── LICENSE
    │           ├── README.md
    │           ├── app.py
    │           ├── cat.jpeg
    │           ├── imagenet_class_index.json
    │           └── requirements.txt
    └── template
        ├── .gitignore
        ├── LICENSE
        ├── README.md
        ├── assets
        │   └── images
        │       ├── 01.JPG
        │       └── 02.JPG
        ├── data
        │   ├── test
        │   │   ├── .gitkeep
        │   │   ├── high
        │   │   │   ├── 1.png
        │   │   │   └── 22.png
        │   │   └── low
        │   │       ├── 1.png
        │   │       └── 22.png
        │   ├── train
        │   │   ├── high
        │   │   │   ├── 2.png
        │   │   │   └── 5.png
        │   │   └── low
        │   │       ├── 2.png
        │   │       └── 5.png
        │   └── valid
        │       ├── high
        │       │   ├── 1.png
        │       │   └── 22.png
        │       └── low
        │           ├── 1.png
        │           └── 22.png
        ├── datasets.py
        ├── logs
        │   └── .gitignore
        ├── loss.py
        ├── metrics.py
        ├── models.py
        ├── requirements.txt
        ├── test.py
        ├── train.py
        └── utils.py

/.gitignore:
--------------------------------------------------------------------------------
# Compiled python
*.pyc
*.pyd

# Compiled MATLAB
*.mex*

# IPython notebook checkpoints
.ipynb_checkpoints

# Editor temporaries
*.swn
*.swo
*.swp
*~

# Sublime Text settings
*.sublime-workspace
*.sublime-project

# Eclipse Project settings
*.*project
.settings

# QtCreator files
*.user

# PyCharm files
.idea

# Visual Studio Code files
.vscode
.vs

# OSX dir files
.DS_Store
--------------------------------------------------------------------------------
/appendix/PytorchBugs.md:
--------------------------------------------------------------------------------
### 1. CUDA & cuDNN

**1. cuDNN error: CUDNN_STATUS_EXECUTION_FAILED**

**A:** This also happens in the Windows port of PyTorch; the only way I found to get around it when using (in my case) large CNNs is:

~~~python
torch.backends.cudnn.enabled = False
~~~

**2. out of memory at /opt/conda/conda-bld/pytorch_1524590031827/work/aten/src/THC/generic/THCStorage.cu:58**

**A:** GPU memory is exhausted; there is no real workaround -> **reduce the batch size, add GPUs, or train with mixed precision**.
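When memory is tight, mixed precision is often the cheapest win. A minimal sketch using `torch.cuda.amp` (available since PyTorch 1.6; the toy model and data below are placeholders, purely for illustration):

~~~python
import torch
import torch.nn as nn

# Hypothetical toy setup just to make the sketch runnable
model = nn.Linear(128, 10).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
criterion = nn.CrossEntropyLoss()
scaler = torch.cuda.amp.GradScaler()

inputs = torch.randn(32, 128).cuda()
targets = torch.randint(0, 10, (32,)).cuda()

optimizer.zero_grad()
with torch.cuda.amp.autocast():        # forward pass runs in float16 where safe
    loss = criterion(model(inputs), targets)
scaler.scale(loss).backward()          # scale the loss to avoid fp16 underflow
scaler.step(optimizer)
scaler.update()
~~~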
**3. Making only specific GPUs visible to CUDA**

**> Set the environment variable CUDA_VISIBLE_DEVICES to declare which CUDA devices are visible.**

**Method 1**: set the variable in `/etc/profile` or `~/.bashrc` (`/etc/profile` affects all users, `~/.bashrc` only the current user's bash shell).

Append the following line at the end of `/etc/profile`:

~~~
export CUDA_VISIBLE_DEVICES=0,1  # only GPU devices 0 and 1 are visible
~~~

Save and quit with `:wq`, then run:

~~~
source /etc/profile
~~~

to make the configuration take effect.

**Method 2**: if the above has no effect, set the variable when launching the CUDA program, e.g.

~~~shell
CUDA_VISIBLE_DEVICES=0,1 ./cuda_executable
# Environment Variable Syntax       Results
# CUDA_VISIBLE_DEVICES=1            Only device 1 will be seen
# CUDA_VISIBLE_DEVICES=0,1          Devices 0 and 1 will be visible
# CUDA_VISIBLE_DEVICES="0,1"        Same as above, quotation marks are optional
# CUDA_VISIBLE_DEVICES=0,2,3        Devices 0, 2, 3 will be visible; device 1 is masked
~~~

**4. nn.Module.cuda() and Tensor.cuda() behave differently**

For both models and data, `cuda()` migrates memory from CPU to GPU, but the effect differs.

For an `nn.Module`:

~~~python
model = model.cuda()  # equivalent to model.cuda()
~~~

The two forms above do the same thing: they migrate the model itself.

For a `Tensor`:

Unlike `nn.Module`, calling `tensor.cuda()` **only returns a copy of the tensor in GPU memory and does not modify the tensor in place**, so you must rebind the result: `tensor = tensor.cuda()`.

~~~python
model = create_a_model()
tensor = torch.zeros([2,3,10,10])
model.cuda()
tensor.cuda()
model(tensor)          # raises an error
tensor = tensor.cuda()
model(tensor)          # runs fine
~~~

**5. an illegal memory access was encountered at /opt/conda/conda-bld/pytorch_1525909934016/work/aten/src/THC/generated/../THCReduceAll.cuh:339**

An invalid memory access during GPU training can be a bug in the program or an incompatibility with the current driver:

Because CUDA runs asynchronously, the reported error location may be inaccurate. Set the environment variable `CUDA_LAUNCH_BLOCKING=1` and rerun in the current terminal, e.g. `CUDA_LAUNCH_BLOCKING=1 python3 train.py` (where train.py is the script you are running); the failing line of code will then be reported correctly.

Check the code carefully for invalid memory accesses; out-of-range indexing is the most common cause. If the code is fine, the installed PyTorch version may be incompatible with your GPU model, or the cuDNN library may be incompatible; extract the failing snippet and test it in isolation to confirm.

**6. AttributeError: module 'torch._C' has no attribute '_cuda_getDevice'**

**A:** According to the [docs](http://pytorch.org/docs/master/torch.html#torch.load), I believe you should be loading your models with

~~~python
torch.load(file, map_location=device)
~~~



### 2. Mismatch

**1. Input type (CUDAFloatTensor) and weight type (CPUFloatTensor) should be the same**

**A:** **The input data and the model weights live on different devices**: the model parameters are still on the CPU while the input has been moved to the GPU. Call `model.cuda()` to move the model to the GPU as well.

**2. Input type (CUDADoubleTensor) and weight type (CUDAFloatTensor) should be the same**

**A:** **The input dtype and the weight dtype disagree: one is Double, the other Float.** Cast the input tensor with `x.float()` so that input and weights have the same type, or convert the model weights to Double instead.
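Both mismatches reduce to aligning device and dtype before the forward pass; a minimal sketch (toy module, illustrative only):

~~~python
import torch
import torch.nn as nn

model = nn.Linear(3, 2).cuda()               # weights: CUDA Float
x = torch.zeros(4, 3, dtype=torch.float64)   # input: CPU Double

# model(x) would raise both errors above; align dtype and device first:
y = model(x.float().cuda())
~~~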
**3. RuntimeError: invalid argument 0: Sizes of tensors must match except in dimension 0. Got 1 and 3 in dimension 1**

(1) An error while reading data. Usually a dimension mismatch: e.g. a dataset that mixes 3-channel and 4-channel images — in short, the samples the dataset passes into the DataLoader differ in size. Check carefully that every image has been brought to the same shape. Note: **everything the dataset returns must have a consistent shape, whether image, label, or box.**

(2) A size problem. **Check that the kernel sizes match the input sizes and that the padding is correct.**

**4. invalid argument 0: Sizes of tensors must match except in dimension 1. Got 14 and 13 in dimension 0 at /home/prototype/Downloads/pytorch/aten/src/THC/generic/THCTensorMath.cu:83**

(1) **Your input images do not all have the same dimensions**: e.g. 99 out of 100 training samples are 256×256 but one is 384×384, which trips PyTorch's shape check. -> Clean up the dataset so that every image has the same size and channel count.

(2) The subtler case is the batch size: PyTorch checks training shapes per batch. With 999 samples and a batch size of 4 (999 is not divisible by 4), the first 249 batches have shape (4,3,256,256) but the last batch has (3,3,256,256); PyTorch sees (4,3,256,256) != (3,3,256,256), a dimension mismatch, and errors out — call it a small bug. -> Pick a batch size that divides the dataset size, set the batch size to 1, or pass `drop_last=True` to the DataLoader.

**5. expected CPU tensor (got CUDA tensor)**

A CPU tensor was expected but a CUDA tensor was received. A classic error: some parameters in the computation graph are CUDA tensors while others are CPU tensors.



### 3. Version

**1. IndexError: invalid index of a 0-dim tensor. Use tensor.item() to convert a 0-dim tensor to a Python number**

~~~python
total_loss += loss.data[0]
~~~

**A:** Very common: newer PyTorch versions changed this interface! It's likely that you're using a more recent version of PyTorch than specified. Replacing this line with

~~~python
total_loss += loss.item()
~~~

should probably solve the problem.

**2.** /home/zhaozhichao/.local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:525: **FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.**

**A:** Open the file at the reported location and change `np.dtype([("quint8", np.uint8, 1)])` to `np.dtype([("quint8", np.uint8, (1,))])`.

**3.** UserWarning: **indexing with dtype torch.uint8 is now deprecated, please use a dtype torch.bool**

**A:** Project: https://github.com/eriklindernoren/PyTorch-YOLOv3/blob/master/models.py#L191

Add the following at line 191 of model.py:

~~~python
# convert obj_mask to bool
obj_mask = obj_mask.bool()      # convert uint8 to bool
noobj_mask = noobj_mask.bool()  # convert uint8 to bool
~~~

**4. warnings.warn("nn.functional.sigmoid is deprecated. Use torch.sigmoid instead.")**

A: Stop using the activation functions under `torch.nn.functional`, i.e. the `F.sigmoid(x)` form. The correct usage is:

~~~python
sigmoid = torch.nn.Sigmoid()
out = sigmoid(x)
~~~

or simply `torch.sigmoid(x)`, as the warning suggests.

**5.** KeyError: 'unexpected key "module.bn1.num_batches_tracked" in state_dict'

It turns out that in PyTorch 0.4.1 and later, BatchNorm layers gained a num_batches_tracked parameter that counts how many batches have passed through forward during training. Source (pytorch 0.4.1):

~~~python
if self.training and self.track_running_stats:
    self.num_batches_tracked += 1
    if self.momentum is None:  # use cumulative moving average
        exponential_average_factor = 1.0 / self.num_batches_tracked.item()
    else:  # use exponential moving average
        exponential_average_factor = self.momentum
~~~

Roughly speaking, this parameter affects how the normalization statistics are computed during training.

So the error comes from training and testing with PyTorch versions on opposite sides of 0.4.1. The concrete fix: if the model parameters (an OrderedDict, easy to edit) are missing num_batches_tracked entries, add them; if there are extra ones, delete them. The lazy fix is to set the strict argument of load_state_dict to False, as follows:

~~~python
load_state_dict(torch.load(weight_path), strict=False)
~~~
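If you prefer to clean the checkpoint explicitly rather than rely on `strict=False`, filtering the keys is straightforward (a sketch; `model` and the checkpoint path are placeholders):

~~~python
import torch

state = torch.load('checkpoint.pth', map_location='cpu')
# drop the BN bookkeeping entries that 0.4.1-era versions disagree on
state = {k: v for k, v in state.items()
         if not k.endswith('num_batches_tracked')}
model.load_state_dict(state, strict=False)
~~~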
### 4. Data & DataLoader

**1. The jupyter notebook kernel dies as soon as training starts**

A: The first thing to check is `num_workers` in the DataLoader; try a smaller value or set `num_workers = 0`. // Converting the notebook to a plain script will surface the actual error message.

**2. ERROR: Unexpected bus error encountered in worker. This might be caused by insufficient shared memory (shm)**

This happens when running training code inside Docker on a server with the batch size set too large: shared memory runs out (Docker limits shm). The fix is to set the DataLoader's num_workers to 0 (or launch the container with a larger `--shm-size`).

**3. Assertion `cur_target >= 0 && cur_target < n_classes' failed.**

**A: RuntimeError: cuda runtime error (59) : device-side assert triggered at /home/loop/pytorch-master/torch/lib/THC/generic/THCTensorMath.cu:15**

This comes up all the time in classification training; generally it appears when the number of classes the network outputs differs from the number of classes in your labels.

Also, PyTorch has a requirement: **labels used with CrossEntropyLoss must start from 0**.

If I write:

~~~
self.classes = [0, 1, 2, 3]
~~~

four classes labelled 0, 1, 2, 3, everything is fine. But if I write:

~~~
self.classes = [1, 2, 3, 4]
~~~

the error is raised.

**->** PyTorch's classifiers expect class labels in `[0, max-1]`, while the ground-truth labels here are in `[1, max]`, so shift them: `label = (label-1).to(opt.device)`.



### 5. Model loading

**1. RuntimeError: Error(s) in loading state_dict for Missing key(s) in state_dict: "fc.weight", "fc.bias".**

For this kind of **missing key** situation:

If you have a partial state_dict, which is missing some keys, you can do the following:

~~~python
state = model.state_dict()
state.update(partial)
model.load_state_dict(state)
~~~

**2. RuntimeError: Error(s) in loading state_dict for Missing key(s) in Unexpected key(s) in state_dict: "classifier.0.weight",**

**A:**

~~~
# original saved file with DataParallel
state_dict = torch.load('myfile.pth.tar')
# create new OrderedDict that does not contain `module.`
from collections import OrderedDict
new_state_dict = OrderedDict()
for k, v in state_dict.items():
    name = k[7:]  # remove `module.`
    new_state_dict[name] = v
# load params
model.load_state_dict(new_state_dict)
~~~



### 6. Loss functions

**1. The loss becomes nan during training**

Three possible causes of nan gradients:

1). **Exploding gradients**: the gradient values overflow into nan. **Usually lowering the learning rate, adding BN layers, or clipping gradients helps.**

2). **The loss function or network design**: e.g. **a division by 0, or a boundary case where the function is not differentiable, such as log(0) or sqrt(0)**.

3). Dirty data. **Check the input for nan beforehand.**

A note on detecting nan: values like nan or inf cannot be tested with == or is! To be safe, always use `math.isnan()` or `numpy.isnan()`.

For example:

~~~
import numpy as np

# check whether the input contains nan
if np.any(np.isnan(input.cpu().numpy())):
    print('Input data has NaN!')

# check whether the loss is nan
if np.isnan(loss.item()):
    print('Loss value is NaN!')
~~~

**2. Loss-function arguments in PyTorch**

Take CrossEntropyLoss as an example:

~~~
CrossEntropyLoss(self, weight=None, size_average=None, ignore_index=-100, reduce=None, reduction='elementwise_mean')
~~~

- **reduce**:

  If **reduce = False**, size_average is ignored and the **loss is returned as a vector**, one entry per batch element.

  If **reduce = True**, a scalar is returned:

    with size_average = True, **loss.mean()**;

    with size_average = False, **loss.sum()**.

- **weight**: a 1D weight vector that **weights the loss of each class**.

- **ignore_index**: a target value to ignore, so that it does not contribute to the input gradient. With size_average = True, the mean is computed only over the non-ignored targets.
- **reduction**: one of 'none' | 'elementwise_mean' | 'sum', exactly as the names suggest.
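A small self-contained example of these knobs (the class weights are made up for illustration):

~~~python
import torch
import torch.nn as nn

logits = torch.randn(4, 3)            # (batch, num_classes)
target = torch.tensor([0, 2, 1, 2])

# up-weight class 1, down-weight class 2; return the mean over the batch
criterion = nn.CrossEntropyLoss(weight=torch.tensor([1.0, 2.0, 0.5]),
                                reduction='mean')
loss = criterion(logits, target)

per_sample = nn.CrossEntropyLoss(reduction='none')(logits, target)  # vector loss
~~~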
### 7. Multi-machine, multi-GPU and parallelism

**1. Code that contains torch.nn.DataParallel cannot run on the CPU**

Fix: wrap the model once with the following class instead of DataParallel:

~~~python
class WrappedModel(torch.nn.Module):
    def __init__(self, module):
        super(WrappedModel, self).__init__()
        self.module = module  # the model I actually define
    def forward(self, *x, **kwargs):
        return self.module(*x, **kwargs)
~~~

**Why:**

Code that uses DataParallel wraps the original model in an extra `module` layer; all we do here is reproduce that single layer of `module` wrapping.

**2. With nn.DataParallel the data is not on the same GPU**

Background: multi-GPU training in PyTorch mostly uses data parallelism:

~~~python
model = nn.DataParallel(model)
~~~

Problem: while a colleague was training an optical-flow-based detector, "data not in same cuda" appeared; during code review, printing the tensors at each node showed that the data in CUDA memory really was spread across different GPUs.

-> The final fix was to call `.cuda()` uniformly on the data as it comes out of the loader, bringing everything onto the same CUDA device and resolving the problem.

**3. A pitfall when loading a model trained with nn.DataParallel:**

If you trained with nn.DataParallel on multiple GPUs, remember to go through `.module` when reading the model back, e.g.:

~~~python
def get_model(self):
    if self.nGPU == 1:
        return self.model
    else:
        return self.model.module
~~~

**4. How multi-GPU processing works**

When using multiple GPUs, keep PyTorch's processing logic in mind:

1) The model is initialized on each GPU.

2) In the forward pass, the batch is split across the GPUs for computation.

3) The outputs are gathered on the master GPU, where the loss is computed and backpropagated, updating the master GPU's weights.

4) The model on the master GPU is copied to the other GPUs.



### 8. Optimization issues (lr & optim)

**1. A hidden bug caused by the optimizer's weight_decay term**

We all know that weight_decay means weight decay: an L2 penalty added to the original loss so that the model prefers smaller weights — a regularizer. But I often forget this term exists, which led to an unexpected problem. The trap went like this: while training a ResNet50, the top part of the network (layer4) was temporarily unused, so no gradients flowed back to it; I therefore happily passed all of ResNet50's parameters to the optimizer, expecting layer4 to keep its original weights. In reality, even though layer4 received no gradients, weight_decay kept acting on it, shrinking its weights toward 0. When layer4 was needed later, its outputs were abnormal (close to 0), which is when the problem surfaced. This situation may be rare, but stay careful: **never hand weights that should not currently be updated to the optimizer; it avoids unnecessary trouble.**



### 9. PyTorch reproducibility

https://blog.csdn.net/hyk_1996/article/details/84307108



### 10. Basic usage issues

**1. RuntimeError: some of the strides of a given numpy array are negative. This is currently not supported, but will be added in future releases.**

~~~python
import numpy as np
x = np.random.random(size=(32, 32, 7))
torch.from_numpy(np.flip(x, axis=0))
~~~

Same error with np.rot90()

**A:** ndarray.copy() will allocate new memory for the numpy array, which makes it normal — the strides are no longer negative.

~~~python
torch.from_numpy(np.flip(x, axis=0).copy())
~~~

**2. view() only works on contiguous tensors**

Use `is_contiguous()` to check whether a tensor is contiguous in memory, and call `.contiguous()` to make it contiguous if not.

**3. input is not contiguous at /pytorch/torch/lib/THC/generic/THCTensor.c:227**

~~~
batch_size, c, h, w = input.size()
rh, rw = (2, 2)
oh, ow = h * rh, w * rw
oc = c // (rh * rw)
out = input.view(batch_size, rh, rw, oc, h, w)
out = out.permute(0, 3, 4, 1, 5, 2)
out = out.view(batch_size, oc, oh, ow)
invalid argument 2: input is not contiguous at /pytorch/torch/lib/THC/generic/THCTensor.c:227
~~~

**A:** The final `view` is what fails, and the cause is the `permute` before it (input is a Variable here). `permute` does not copy: it merely rearranges the strides over the same storage as `input`, so after it `out` is no longer contiguous and cannot be `view`ed.

The fix: append `tensor.contiguous()` to the permute line, i.e. change it to `out = out.permute(0, 3, 4, 1, 5, 2).contiguous()`.
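A compact illustration of the contiguity rules above:

~~~python
import torch

x = torch.randn(2, 3, 4)
y = x.permute(2, 0, 1)          # same storage, new strides
print(y.is_contiguous())        # False
z = y.contiguous().view(4, 6)   # copy into contiguous memory, then view works
~~~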
**4. RuntimeError: some of the strides of a given numpy array are negative. This is currently not supported, but will be added in future releases.**

**A:** This happens because the numpy code somewhere uses negative-step indexing: `image[…, ::-1]`.

The fix is simple: if the numpy variable `image` triggers the error, return `image.copy()` instead. The copy operation creates a new numpy array from the original one that does not use negative indexing.

**5. Using torch.Tensor.detach()**

The official description of `detach()`: Returns a new Tensor, detached from the current graph. The result will never require gradient.

**Suppose we have models A and B, A's output is fed to B as input, but we only want to train model B.** Then we can do:

~~~python
input_B = output_A.detach()
~~~

This cuts the gradient flow between the two computation graphs, giving us exactly what we need.

**6. ValueError: Expected more than 1 value per channel when training**

When a batch contains only one sample, calling batch_norm raises: **raise ValueError('Expected more than 1 value per channel when training, got input size {}'.format(size))**. There is no particularly elegant fix; before training, compute **num_of_samples % batch_size** to check whether exactly one sample will be left over. **! With a batch size of 1, batchnorm cannot run.**



### 11. Basic usage

**Q: How to make PyTorch ignore warnings**

~~~shell
python3 -W ignore::UserWarning xxxx.py
~~~





### 12. ONNX

```
ONNX's Upsample/Resize operator did not match Pytorch's Interpolation until opset 11. Attributes to determine how to transform the input were added in onnx:Resize in opset 11 to support Pytorch's behavior (like coordinate_transformation_mode and nearest_mode).
```

The warning message says it all — **ONNX's Upsample/Resize operator did not match Pytorch's Interpolation until opset 11.** — so pin the opset version in the ONNX export code, as follows:

~~~python
import torch
torch.onnx.export(model, ..., opset_version=11)
~~~

--------------------------------------------------------------------------------
/appendix/module/.DS_Store:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/polarisZhao/pytorch-cookbook/2de073487a0257936cdbb34920df833ad0603a99/appendix/module/.DS_Store
--------------------------------------------------------------------------------
/appendix/module/data/data_aug.py:
--------------------------------------------------------------------------------
#!/usr/bin/env python3
# -*- coding:utf-8 -*-

# data aug
# author: zhaozhichao
# reference:
#    https://github.com/apache/incubator-mxnet/tree/6f9a67901362a794e3c022dd75daf8a516760fea/python/mxnet/image
#    http://scipy-lectures.org/advanced/image_processing/
#    https://blog.csdn.net/lwplwf/article/details/85776309
#    https://blog.csdn.net/u011995719/article/details/85107009
#    https://github.com/pytorch/vision/tree/master/torchvision/transforms
#    https://github.com/mirzaevinom/data_science_bowl_2018/blob/master/codes/augment_preprocess.py
#    https://github.com/jacobkie/2018DSB/blob/07df7d385f23a2272d8258351d680b037705ce3c/script_final/preprocess.py
#    https://github.com/selimsef/dsb2018_topcoders/blob/master/albu/src/augmentations/functional.py

import cv2
import math
import random  # used by _get_interp_method (interp == 10)
import numpy as np
from scipy.ndimage.filters import gaussian_filter

# FancyPCA
def FancyPCA(img):
    """AlexNet-style PCA color augmentation: jitter along the RGB eigenvectors."""
    h, w, c = img.shape
    img = np.reshape(img, (h * w, c)).astype('float32')
    mean = np.mean(img, axis=0)
    std = np.std(img, axis=0)
    img = (img - mean) / std

    cov = np.cov(img, rowvar=False)
    lambdas, p = np.linalg.eig(cov)
    alphas = np.random.normal(0, 0.1, c)
    pca_img = img + np.dot(p, alphas*lambdas)

    pca_color_img = pca_img * std + mean
    pca_color_img = np.maximum(np.minimum(pca_color_img, 255), 0)
    return pca_color_img.reshape(h, w, c).astype(np.uint8)

# Flip and Rotation
def random_horizontal_flip(img, p):
    """
    img : Image to be horizontal flipped
    p: probability that image should be horizontal flipped.
    """
    if np.random.random() < p:
        img = np.fliplr(img)
    return img

def random_vertical_flip(img, p):
    """
    img : Image to be vertical flipped
    p: probability that image should be vertical flipped.
    """
    if np.random.random() < p:
        img = np.flipud(img)
    return img

def random_rotate90(img, p):
    """
    img : Image to be random rotated
    p: probability that image should be random rotated.
    """
    if np.random.random() < p:
        # upper bound of randint is exclusive: sample 90, 180 or 270
        # (with randint(1, 3) the 270-degree branch was unreachable)
        angle = np.random.randint(1, 4) * 90

        if angle == 90:
            img = img.transpose(1,0,2)
            img = np.fliplr(img)

        elif angle == 180:
            img = np.rot90(img, 2)

        elif angle == 270:
            img = img.transpose(1,0,2)
            img = np.flipud(img)
    return img

def rotate(img, angle):
    """
    img : Image to be rotated
    angle(degree measure): angle to be rotated
    """
    height, width = img.shape[0:2]
    mat = cv2.getRotationMatrix2D((width/2, height/2), angle, 1.0)
    img = cv2.warpAffine(img, mat, (width, height),
                         flags=cv2.INTER_LINEAR,
                         borderMode=cv2.BORDER_REFLECT_101)
    return img

def shift_scale_rotate(img, angle, scale, dx, dy):
    """
    img : Image to be affine transformation
    angle(degree measure): angle to be rotated(15, 30)
    scale: scale(1.1/1.2/1.3)
    dx, dy: offset, Compared to the original image

    """
    height, width = img.shape[:2]

    angle = np.random.uniform(-angle, angle)
    scale = np.random.uniform(1.0/scale, scale)

    cc = math.cos(angle/180*math.pi) * scale
    ss = math.sin(angle/180*math.pi) * scale
    rotate_matrix = np.array([[cc, -ss], [ss, cc]])

    box0 = np.array([[0, 0], [width, 0], [width, height], [0, height], ])
    box1 = box0 - np.array([width/2, height/2])
    box1 = np.dot(box1, rotate_matrix.T) + \
        np.array([width/2+dx*width, height/2+dy*height])

    box0 = box0.astype(np.float32)
    box1 = box1.astype(np.float32)
    mat = cv2.getPerspectiveTransform(box0, box1)
    img = cv2.warpPerspective(img, mat, (width, height),
                              flags=cv2.INTER_LINEAR,
                              borderMode=cv2.BORDER_REFLECT_101)
    return img


# color: brightness, contrast, saturation : Done
def random_brightness(img, brightness):
    """
    brightness : float, The brightness jitter ratio range, [0, 1]
    """
    alpha = 1 + np.random.uniform(-brightness, brightness)
    img = alpha * img
    img = np.clip(img, 0, 255).astype(np.uint8)
    return img

def random_contrast(img, contrast):
    """
    contrast : The contrast jitter ratio range, [0, 1]
    """
    coef = np.array([[[0.114, 0.587, 0.299]]])  # BGR to gray (YCbCr)
    alpha = 1.0 + np.random.uniform(-contrast, contrast)
    gray = img * coef
    gray = (3.0 * (1.0 - alpha) / gray.size) * np.sum(gray)
    img = alpha*img + gray
    img = np.clip(img, 0, 255).astype(np.uint8)
    return img

def random_saturation(img, saturation):
    """
    saturation : The saturation jitter ratio range, [0, 1]
    """
    coef = np.array([[[0.299, 0.587,
                      0.114]]])
    # centre alpha at 1 (as in random_contrast) so the image is, on average, unchanged
    alpha = 1.0 + np.random.uniform(-saturation, saturation)
    gray = img * coef
    gray = np.sum(gray, axis=2, keepdims=True)
    img = alpha*img + (1.0 - alpha)*gray
    img = np.clip(img, 0, 255).astype(np.uint8)
    return img

def random_color(img, brightness, contrast, saturation):
    """
    brightness : The brightness jitter ratio range, [0, 1]
    contrast : The contrast jitter ratio range, [0, 1]
    saturation : The saturation jitter ratio range, [0, 1]
    """
    if brightness > 0:
        img = random_brightness(img, brightness)
    if contrast > 0:
        img = random_contrast(img, contrast)
    if saturation > 0:
        img = random_saturation(img, saturation)
    return img

def random_hue(image, hue):
    """
    The hue jitter ratio range, [0, 1]
    """
    h = int(np.random.uniform(-hue, hue)*180)

    hsv = cv2.cvtColor(image, cv2.COLOR_BGR2HSV)
    hsv[:, :, 0] = (hsv[:, :, 0].astype(int) + h) % 180
    image = cv2.cvtColor(hsv, cv2.COLOR_HSV2BGR)
    return image


# add noise
def random_noise(img, limit=[0, 0.1], p=1):
    if np.random.random() < p:
        H,W = img.shape[:2]
        noise = np.random.uniform(limit[0], limit[1], size=(H,W))*255

        img = img + noise[:,:,np.newaxis]*np.array([1,1,1])
        img = np.clip(img, 0, 255).astype(np.uint8)

    return img


# crop and resize
def random_crop(img, size):
    """
    size: (tuple) (new_w, new_h)
    value: 0.9*W > new_w > 0.8*W
           0.9*H > new_h > 0.8*H
    """
    H, W = img.shape[:2]
    new_w, new_h = size
    assert(H > new_h)
    assert(W > new_w)

    x0 = np.random.choice(W-new_w) if W!=new_w else 0
    y0 = np.random.choice(H-new_h) if H!=new_h else 0

    if (new_w, new_h) != (W, H):
        img = img[y0:y0+new_h, x0:x0+new_w, :]

    return img

def center_crop(img, size):
    """
    size: (tuple) (new_w, new_h)
    """
    H, W = img.shape[:2]
    new_w, new_h = size

    x0 = (W - new_w) // 2
    y0 = (H - new_h) // 2

    if (new_w, new_h) != (W, H):
        img = img[y0:y0+new_h, x0:x0+new_w]

    return img

def _get_interp_method(interp, sizes=()):
    """
    interpolation method for all resizing operations
    Possible values:
    0: Nearest Neighbors Interpolation.
    1: Bilinear interpolation.
    2: Area-based (resampling using pixel area relation). It may be a
       preferred method for image decimation, as it gives moire-free
       results. But when the image is zoomed, it is similar to the Nearest
       Neighbors method. (used by default).
    3: Bicubic interpolation over 4x4 pixel neighborhood.
    4: Lanczos interpolation over 8x8 pixel neighborhood.
    9: Cubic for enlarge, area for shrink, bilinear for others
    10: Random select from interpolation method mentioned above.
    sizes : tuple of int
    """
    if interp == 9:
        if sizes:
            assert len(sizes) == 4
            oh, ow, nh, nw = sizes
            if nh > oh and nw > ow:
                return 2
            elif nh < oh and nw < ow:
                return 3
            else:
                return 1
        else:
            return 2
    if interp == 10:
        return random.randint(0, 4)
    if interp not in (0, 1, 2, 3, 4):
        raise ValueError('Unknown interp method %d' % interp)
    return interp

def resize(img, size, interp=2):
    h, w = img.shape[:2]

    if h > w:
        new_h, new_w = size * h // w, size
    else:
        new_h, new_w = size, size * w // h
    # pass the flag via `interpolation=`; the third positional argument
    # of cv2.resize is `dst`, not the interpolation method
    return cv2.resize(img, (new_w, new_h),
                      interpolation=_get_interp_method(interp, (h, w, new_h, new_w)))

def elastic_transform_fast(img, alpha=2, sigma=100, alpha_affine=100, random_state=None):
    """Elastic deformation of images as described in [Simard2003]_ (with modifications).
    .. [Simard2003] Simard, Steinkraus and Platt, "Best Practices for
         Convolutional Neural Networks applied to Visual Document Analysis", in
         Proc. of the International Conference on Document Analysis and
         Recognition, 2003.
    Based on https://gist.github.com/erniejunior/601cdf56d2b424757de5
    """
    if random_state is None:
        random_state = np.random.RandomState(1234)

    shape = img.shape
    shape_size = shape[:2]

    # Random affine
    center_square = np.float32(shape_size) // 2
    square_size = min(shape_size) // 3
    alpha = float(alpha)
    sigma = float(sigma)
    alpha_affine = float(alpha_affine)

    pts1 = np.float32([center_square + square_size, [center_square[0] + square_size, center_square[1] - square_size],
                       center_square - square_size])
    pts2 = pts1 + random_state.uniform(-alpha_affine,
                                       alpha_affine, size=pts1.shape).astype(np.float32)
    M = cv2.getAffineTransform(pts1, pts2)

    img = cv2.warpAffine(
        img, M, shape_size[::-1], borderMode=cv2.BORDER_REFLECT_101)

    dx = np.float32(gaussian_filter(
        (random_state.rand(*shape_size) * 2 - 1), sigma) * alpha)
    dy = np.float32(gaussian_filter(
        (random_state.rand(*shape_size) * 2 - 1), sigma) * alpha)

    x, y = np.meshgrid(np.arange(shape[1]), np.arange(shape[0]))

    mapx = np.float32(x + dx)
    mapy = np.float32(y + dy)

    return cv2.remap(img, mapx, mapy, interpolation=cv2.INTER_LINEAR, borderMode=cv2.BORDER_REFLECT_101)

def channel_shuffle(img):
    ch_arr = [0, 1, 2]
    np.random.shuffle(ch_arr)
    img = img[..., ch_arr]
    return img

def to_gray(img):
    gray = cv2.cvtColor(img, cv2.COLOR_RGB2GRAY)
    if np.mean(gray) > 127:
        gray = 255 - gray
    return cv2.cvtColor(gray, cv2.COLOR_GRAY2RGB)


if __name__ == '__main__':
    img_path = '/home/zhaozhichao/Desktop/3-workspace/FACE_MODE_3DDFA/example/b24.jpg'
    img = cv2.imread(img_path)
    cv2.imshow("origin img", img)
    if len(img.shape) == 2:
        w, h = img.shape[:2]
        img = img.reshape(w, h, 1)
    im2 = FancyPCA(img)
    cv2.imshow("im2", im2)
    cv2.waitKey()
--------------------------------------------------------------------------------
/appendix/module/models/classification.md:
--------------------------------------------------------------------------------
The code in this document targets PyTorch 1.0 and uses the following packages:

~~~
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
from
torch.autograd import Variable
~~~

#### 1. Basic define

~~~python
# conv3x3
def conv3x3(in_planes, out_planes, stride=1, padding=1, groups=1):
    """3x3 convolution"""
    return nn.Conv2d(in_planes, out_planes, kernel_size=3, stride=stride,
                     padding=padding, groups=groups, bias=False)  # honour the padding argument
# conv1x1
def conv1x1(in_channels, out_channels, stride=1):
    """1x1 convolution"""
    return nn.Conv2d(in_channels, out_channels, kernel_size=1, stride=stride, padding=0,
                     bias=False)
# BasicConv2d
class BasicConv2d(nn.Module):
    def __init__(self, in_channels, out_channels, **kwargs):
        super(BasicConv2d, self).__init__()
        self.conv = nn.Conv2d(in_channels, out_channels, bias=True, **kwargs)
        self.bn = nn.BatchNorm2d(out_channels, eps=1e-5)

    def forward(self, x):
        x = self.conv(x)
        x = self.bn(x)
        return F.relu(x, inplace=True)
~~~

#### 2. ResNet

~~~python
class BasicBlock(nn.Module):
    expansion = 1

    def __init__(self, inplanes, planes, stride=1, downsample=None, groups=1,
                 base_width=64, norm_layer=None):
        super(BasicBlock, self).__init__()
        if norm_layer is None:
            norm_layer = nn.BatchNorm2d
        if groups != 1 or base_width != 64:
            raise ValueError('BasicBlock only supports groups=1 and base_width=64')
        # Both self.conv1 and self.downsample layers downsample the input when stride != 1
        self.conv1 = conv3x3(inplanes, planes, stride)
        self.bn1 = norm_layer(planes)
        self.relu = nn.ReLU(inplace=True)
        self.conv2 = conv3x3(planes, planes)
        self.bn2 = norm_layer(planes)
        self.downsample = downsample
        self.stride = stride

    def forward(self, x):
        identity = x

        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)

        out = self.conv2(out)
        out = self.bn2(out)

        if self.downsample is not None:
            identity = self.downsample(x)

        out += identity
        out = self.relu(out)

        return out
~~~

~~~python
class Bottleneck(nn.Module):
    expansion = 4

    def __init__(self, inplanes, planes, stride=1, downsample=None, groups=1,
                 base_width=64, norm_layer=None):
        super(Bottleneck, self).__init__()
        if norm_layer is None:
            norm_layer = nn.BatchNorm2d
        width = int(planes * (base_width / 64.)) * groups
        # Both self.conv2 and self.downsample layers downsample the input when stride != 1
        self.conv1 = conv1x1(inplanes, width)
        self.bn1 = norm_layer(width)
        # pass groups by keyword: the fourth positional argument of conv3x3 is padding
        self.conv2 = conv3x3(width, width, stride, groups=groups)
        self.bn2 = norm_layer(width)
        self.conv3 = conv1x1(width, planes * self.expansion)
        self.bn3 = norm_layer(planes * self.expansion)
        self.relu = nn.ReLU(inplace=True)
        self.downsample = downsample
        self.stride = stride

    def forward(self, x):
        identity = x

        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)

        out = self.conv2(out)
        out = self.bn2(out)
        out = self.relu(out)

        out = self.conv3(out)
        out = self.bn3(out)

        if self.downsample is not None:
            identity = self.downsample(x)

        out += identity
        out = self.relu(out)

        return out
~~~
#### 3. Inception

~~~python
# Inception
class Inception(nn.Module):

    def __init__(self, in_channels, ch1x1, ch3x3red, ch3x3, ch5x5red, ch5x5, pool_proj):
        super(Inception, self).__init__()
        self.branch1 = BasicConv2d(in_channels, ch1x1, kernel_size=1)
        self.branch2 = nn.Sequential(
            BasicConv2d(in_channels, ch3x3red, kernel_size=1),
            BasicConv2d(ch3x3red, ch3x3, kernel_size=3, padding=1)
        )
        self.branch3 = nn.Sequential(
            BasicConv2d(in_channels, ch5x5red, kernel_size=1),
            BasicConv2d(ch5x5red, ch5x5, kernel_size=3, padding=1)
        )
        self.branch4 = nn.Sequential(
            nn.MaxPool2d(kernel_size=3, stride=1, padding=1, ceil_mode=True),
            BasicConv2d(in_channels, pool_proj, kernel_size=1)
        )

    def forward(self, x):
        branch1 = self.branch1(x)
        branch2 = self.branch2(x)
        branch3 = self.branch3(x)
        branch4 = self.branch4(x)
        outputs = [branch1, branch2, branch3, branch4]
        return torch.cat(outputs, 1)
~~~

#### 4. SENet

~~~python
class SEBasicBlock(nn.Module):  # renamed so that the super() call below resolves (was `BasicBlock`)
    expansion = 1

    def __init__(self, inplanes, planes, stride=1, downsample=None, reduction=16):
        super(SEBasicBlock, self).__init__()
        self.conv1 = conv3x3(inplanes, planes, stride)
        self.bn1 = nn.BatchNorm2d(planes)
        self.relu = nn.ReLU(inplace=True)
        self.conv2 = conv3x3(planes, planes, 1)
        self.bn2 = nn.BatchNorm2d(planes)
        self.se = SELayer(planes, reduction)
        self.downsample = downsample
        self.stride = stride

    def forward(self, x):
        residual = x
        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)

        out = self.conv2(out)
        out = self.bn2(out)
        out = self.se(out)

        if self.downsample is not None:
            residual = self.downsample(x)

        out += residual
        out = self.relu(out)

        return out
~~~

~~~python
class SEBottleneck(nn.Module):  # renamed for symmetry with SEBasicBlock
    expansion = 4

    def __init__(self, inplanes, planes, stride=1, downsample=None, reduction=16):
        super(SEBottleneck, self).__init__()
        self.conv1 = nn.Conv2d(inplanes, planes, kernel_size=1, bias=False)
        self.bn1 = nn.BatchNorm2d(planes)
        self.conv2 = nn.Conv2d(planes, planes, kernel_size=3, stride=stride,
                               padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(planes)
        self.conv3 = nn.Conv2d(planes, planes * 4, kernel_size=1, bias=False)
        self.bn3 = nn.BatchNorm2d(planes * 4)
        self.relu = nn.ReLU(inplace=True)
        self.se = SELayer(planes * 4, reduction)
        self.downsample = downsample
        self.stride = stride

    def forward(self, x):
        residual = x

        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)

        out = self.conv2(out)
        out = self.bn2(out)
        out = self.relu(out)

        out = self.conv3(out)
        out = self.bn3(out)
        out = self.se(out)

        if self.downsample is not None:
            residual = self.downsample(x)

        out += residual
        out = self.relu(out)

        return out
~~~
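The blocks above use an `SELayer` that is not defined in this file; a common squeeze-and-excitation implementation in the spirit of Hu et al.'s SENet, matching the `(planes, reduction)` call signature used above (treat it as a sketch):

~~~python
class SELayer(nn.Module):
    def __init__(self, channel, reduction=16):
        super(SELayer, self).__init__()
        self.avg_pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Sequential(
            nn.Linear(channel, channel // reduction, bias=False),
            nn.ReLU(inplace=True),
            nn.Linear(channel // reduction, channel, bias=False),
            nn.Sigmoid()
        )

    def forward(self, x):
        b, c, _, _ = x.size()
        y = self.avg_pool(x).view(b, c)   # squeeze: one value per channel
        y = self.fc(y).view(b, c, 1, 1)   # excitation: per-channel gates in (0, 1)
        return x * y.expand_as(x)         # reweight the feature maps
~~~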
#### 5. ShuffleNet

~~~python
# ShufflenetV1
class ShufflenetUnit(nn.Module):
    expansion = 4
    def __init__(self, inplanes, planes, stride=1, downsample=None, flag=False):
        super(ShufflenetUnit, self).__init__()
        self.downsample = downsample
        group_num = 3
        self.flag = flag
        if self.flag:
            self.conv1 = nn.Conv2d(inplanes, planes, kernel_size=1, groups=1, bias=False)
        else:
            self.conv1 = nn.Conv2d(inplanes, planes, kernel_size=1, groups=group_num, bias=False)
        self.bn1 = nn.BatchNorm2d(planes)

        self.conv2 = nn.Conv2d(planes, planes, kernel_size=3, stride=stride,
                               padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(planes)

        self.conv3 = nn.Conv2d(planes, planes * 4, kernel_size=1, groups=group_num, bias=False)
        self.bn3 = nn.BatchNorm2d(planes * 4)
        self.relu = nn.ReLU(inplace=True)

    def _shuffle(self, features, g):
        channels = features.size()[1]
        index = torch.from_numpy(np.asarray([i for i in range(channels)]))
        index = index.view(-1, g).t().contiguous()
        index = index.view(-1).to(features.device)  # follow the input's device instead of hard-coding .cuda()
        features = features[:, index]
        return features

    def forward(self, x):
        residual = x

        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)

        if not self.flag:
            out = self._shuffle(out, 3)

        out = self.conv2(out)
        out = self.bn2(out)

        out = self.conv3(out)
        out = self.bn3(out)

        if self.downsample is not None:
            residual = self.downsample(x)
            out = torch.cat((out, residual), 1)
        else:
            out += residual
        out = self.relu(out)

        return out
~~~

~~~python
# ShuffleUnit V2
class ShuffleUnit(nn.Module):
    def __init__(self, inplanes, planes, stride=1, downsample=None):
        super(ShuffleUnit, self).__init__()  # was `ShufflenetBlockV2`, an undefined name
        self.downsample = downsample

        if not self.downsample:  #---if not downsample, then channel split, so the channel become half
            inplanes = inplanes // 2
            planes = planes // 2

        self.conv1x1_1 = conv1x1(in_channels=inplanes, out_channels=planes)
        self.conv1x1_1_bn = nn.BatchNorm2d(planes)

        # conv3x3's parameters are named in_planes/out_planes, so pass them positionally
        self.dwconv3x3 = conv3x3(planes, planes, stride=stride, groups=planes)
        self.dwconv3x3_bn = nn.BatchNorm2d(planes)

        self.conv1x1_2 = conv1x1(in_channels=planes, out_channels=planes)
        self.conv1x1_2_bn = nn.BatchNorm2d(planes)

        self.relu = nn.ReLU(inplace=True)

    def _channel_split(self, features, ratio=0.5):
        size = features.size()[1]
        split_idx = int(size * ratio)
        return features[:, :split_idx], features[:, split_idx:]

    def _channel_shuffle(self, features, g=2):
        channels = features.size()[1]
        index = torch.from_numpy(np.asarray([i for i in range(channels)]))
        index = index.view(-1, g).t().contiguous()
        index = index.view(-1).to(features.device)  # same device fix as above
        features = features[:, index]
        return features

    def forward(self, x):
        if self.downsample:
            # x1 = x.clone() #----deep copy x, so where x2 is modified, x1 not be affected
            x1 = x
            x2 = x
        else:
            x1, x2 = self._channel_split(x)

        #----right branch-----
        x2 = self.conv1x1_1(x2)
        x2 = self.conv1x1_1_bn(x2)
        x2 = self.relu(x2)

        x2 = self.dwconv3x3(x2)
        x2 = self.dwconv3x3_bn(x2)

        x2 = self.conv1x1_2(x2)
        x2 = self.conv1x1_2_bn(x2)
        x2 = self.relu(x2)

        #---left branch-------
        if self.downsample:
            x1 = self.downsample(x1)

        x = torch.cat([x1, x2], 1)
        x = self._channel_shuffle(x)
        return x
~~~
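The index gymnastics inside `_channel_shuffle` are easier to see on a toy tensor; with 8 channels and g=2 groups the interleaving looks like this (illustrative snippet):

~~~python
import torch

idx = torch.arange(8).view(-1, 2).t().contiguous().view(-1)
print(idx)  # tensor([0, 2, 4, 6, 1, 3, 5, 7]) -- channels of the two groups interleaved
~~~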
#### 6. MobileNet V2

~~~python
# Mobilenet V2
class InvertedResidual(nn.Module):
    def __init__(self, inp, oup, stride, expand_ratio):
        super(InvertedResidual, self).__init__()
        self.stride = stride
        self.use_res_connect = self.stride == 1 and inp == oup
        self.conv = nn.Sequential(
            # pw
            nn.Conv2d(inp, inp * expand_ratio, 1, 1, 0, bias=False),
            nn.BatchNorm2d(inp * expand_ratio),
            nn.ReLU6(inplace=True),
            # dw
            nn.Conv2d(inp * expand_ratio, inp * expand_ratio, 3, stride, 1, groups=inp * expand_ratio, bias=False),
            nn.BatchNorm2d(inp * expand_ratio),
            nn.ReLU6(inplace=True),
            # pw-linear
            nn.Conv2d(inp * expand_ratio, oup, 1, 1, 0, bias=False),
            nn.BatchNorm2d(oup),
        )

    def forward(self, x):
        if self.use_res_connect:
            return x + self.conv(x)
        else:
            return self.conv(x)
~~~
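A quick shape check of the block above (assumes the imports at the top of this document and the `InvertedResidual` class just defined):

~~~python
block = InvertedResidual(inp=32, oup=32, stride=1, expand_ratio=6)
x = torch.randn(1, 32, 56, 56)
print(block(x).shape)  # torch.Size([1, 32, 56, 56]) -- residual connection active
~~~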
--------------------------------------------------------------------------------
/appendix/module/models/fpn.py:
--------------------------------------------------------------------------------
'''RetinaFPN in PyTorch.'''
import torch
import torch.nn as nn
import torch.nn.functional as F

from torch.autograd import Variable


class Bottleneck(nn.Module):
    expansion = 4

    def __init__(self, in_planes, planes, stride=1):
        super(Bottleneck, self).__init__()
        self.conv1 = nn.Conv2d(in_planes, planes, kernel_size=1, bias=False)
        self.bn1 = nn.BatchNorm2d(planes)
        self.conv2 = nn.Conv2d(planes, planes, kernel_size=3, stride=stride, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(planes)
        self.conv3 = nn.Conv2d(planes, self.expansion*planes, kernel_size=1, bias=False)
        self.bn3 = nn.BatchNorm2d(self.expansion*planes)

        self.downsample = nn.Sequential()
        if stride != 1 or in_planes != self.expansion*planes:
            self.downsample = nn.Sequential(
                nn.Conv2d(in_planes, self.expansion*planes, kernel_size=1, stride=stride, bias=False),
                nn.BatchNorm2d(self.expansion*planes)
            )

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = F.relu(self.bn2(self.conv2(out)))
        out = self.bn3(self.conv3(out))
        out += self.downsample(x)
        out = F.relu(out)
        return out


class FPN(nn.Module):
    def __init__(self, block, num_blocks):
        super(FPN, self).__init__()
        self.in_planes = 64

        self.conv1 = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3, bias=False)
        self.bn1 = nn.BatchNorm2d(64)

        # Bottom-up layers
        self.layer1 = self._make_layer(block,  64, num_blocks[0], stride=1)
        self.layer2 = self._make_layer(block, 128, num_blocks[1], stride=2)
        self.layer3 = self._make_layer(block, 256, num_blocks[2], stride=2)
        self.layer4 = self._make_layer(block, 512, num_blocks[3], stride=2)
        self.conv6 = nn.Conv2d(2048, 256, kernel_size=3, stride=2, padding=1)
        self.conv7 = nn.Conv2d( 256, 256, kernel_size=3, stride=2, padding=1)

        # Lateral layers
        self.latlayer1 = nn.Conv2d(2048, 256, kernel_size=1, stride=1, padding=0)
        self.latlayer2 = nn.Conv2d(1024, 256, kernel_size=1, stride=1, padding=0)
        self.latlayer3 = nn.Conv2d( 512, 256, kernel_size=1, stride=1, padding=0)

        # Top-down layers
        self.toplayer1 = nn.Conv2d(256, 256, kernel_size=3, stride=1, padding=1)
        self.toplayer2 = nn.Conv2d(256, 256, kernel_size=3, stride=1, padding=1)

    def _make_layer(self, block, planes, num_blocks, stride):
        strides = [stride] + [1]*(num_blocks-1)
        layers = []
        for stride in strides:
            layers.append(block(self.in_planes, planes, stride))
            self.in_planes = planes * block.expansion
        return nn.Sequential(*layers)

    def _upsample_add(self, x, y):
        '''Upsample and add two feature maps.
        Args:
          x: (Variable) top feature map to be upsampled.
          y: (Variable) lateral feature map.
        Returns:
          (Variable) added feature map.
        Note in PyTorch, when input size is odd, the upsampled feature map
        with `F.upsample(..., scale_factor=2, mode='nearest')`
        maybe not equal to the lateral feature map size.
        e.g.
        original input size: [N,_,15,15] ->
        conv2d feature map size: [N,_,8,8] ->
        upsampled feature map size: [N,_,16,16]
        So we choose bilinear upsample which supports arbitrary output sizes.
        '''
        _,_,H,W = y.size()
        return F.upsample(x, size=(H,W), mode='bilinear') + y

    def forward(self, x):
        # Bottom-up
        c1 = F.relu(self.bn1(self.conv1(x)))
        c1 = F.max_pool2d(c1, kernel_size=3, stride=2, padding=1)
        c2 = self.layer1(c1)
        c3 = self.layer2(c2)
        c4 = self.layer3(c3)
        c5 = self.layer4(c4)
        p6 = self.conv6(c5)
        p7 = self.conv7(F.relu(p6))
        # Top-down
        p5 = self.latlayer1(c5)
        p4 = self._upsample_add(p5, self.latlayer2(c4))
        p4 = self.toplayer1(p4)
        p3 = self._upsample_add(p4, self.latlayer3(c3))
        p3 = self.toplayer2(p3)
        return p3, p4, p5, p6, p7


def FPN50():
    return FPN(Bottleneck, [3,4,6,3])

def FPN101():
    return FPN(Bottleneck, [3,4,23,3])  # ResNet-101 layout; the original [2,4,23,3] was a typo


def test():
    net = FPN50()
    fms = net(Variable(torch.randn(1,3,600,300)))
    for fm in fms:
        print(fm.size())

# test()
--------------------------------------------------------------------------------
/appendix/module/models/mnasnet.py:
--------------------------------------------------------------------------------
1 | import math
2 | 
3 | import torch
4 | import torch.nn as nn
5 | 
6 | __all__ = ['MNASNet', 'mnasnet0_5', 'mnasnet0_75', 'mnasnet1_0', 'mnasnet1_3']
7 | 
8 | _MODEL_URLS = {
9 |     "mnasnet0_5":
10 |     "https://download.pytorch.org/models/mnasnet0.5_top1_67.592-7c6cb539b9.pth",
11 |     "mnasnet0_75": None,
12 |     "mnasnet1_0":
13 |     "https://download.pytorch.org/models/mnasnet1.0_top1_73.512-f206786ef8.pth",
14 |     "mnasnet1_3": None
15 | }
16 | 
17 | # Paper suggests 0.9997 momentum, for TensorFlow. Equivalent PyTorch momentum is
18 | # 1.0 - tensorflow.
19 | _BN_MOMENTUM = 1 - 0.9997
20 | 
21 | 
22 | class _InvertedResidual(nn.Module):
23 | 
24 |     def __init__(self, in_ch, out_ch, kernel_size, stride, expansion_factor,
25 |                  bn_momentum=0.1):
26 |         super(_InvertedResidual, self).__init__()
27 |         assert stride in [1, 2]
28 |         assert kernel_size in [3, 5]
29 |         mid_ch = in_ch * expansion_factor
30 |         self.apply_residual = (in_ch == out_ch and stride == 1)
31 |         self.layers = nn.Sequential(
32 |             # Pointwise
33 |             nn.Conv2d(in_ch, mid_ch, 1, bias=False),
34 |             nn.BatchNorm2d(mid_ch, momentum=bn_momentum),
35 |             nn.ReLU(inplace=True),
36 |             # Depthwise
37 |             nn.Conv2d(mid_ch, mid_ch, kernel_size, padding=kernel_size // 2,
38 |                       stride=stride, groups=mid_ch, bias=False),
39 |             nn.BatchNorm2d(mid_ch, momentum=bn_momentum),
40 |             nn.ReLU(inplace=True),
41 |             # Linear pointwise. Note that there's no activation.
42 |             nn.Conv2d(mid_ch, out_ch, 1, bias=False),
43 |             nn.BatchNorm2d(out_ch, momentum=bn_momentum))
44 | 
45 |     def forward(self, input):
46 |         if self.apply_residual:
47 |             return self.layers(input) + input
48 |         else:
49 |             return self.layers(input)
50 | 
51 | 
52 | def _stack(in_ch, out_ch, kernel_size, stride, exp_factor, repeats,
53 |            bn_momentum):
54 |     """ Creates a stack of inverted residuals. """
55 |     assert repeats >= 1
56 |     # First one has no skip, because feature map size changes.
57 |     first = _InvertedResidual(in_ch, out_ch, kernel_size, stride, exp_factor,
58 |                               bn_momentum=bn_momentum)
59 |     remaining = []
60 |     for _ in range(1, repeats):
61 |         remaining.append(
62 |             _InvertedResidual(out_ch, out_ch, kernel_size, 1, exp_factor,
63 |                               bn_momentum=bn_momentum))
64 |     return nn.Sequential(first, *remaining)
65 | 
66 | 
67 | def _round_to_multiple_of(val, divisor, round_up_bias=0.9):
68 |     """ Asymmetric rounding to make `val` divisible by `divisor`. With default
69 |     bias, will round up, unless the number is no more than 10% greater than the
70 |     smaller divisible value, i.e. (83, 8) -> 80, but (84, 8) -> 88. """
71 |     assert 0.0 < round_up_bias < 1.0
72 |     new_val = max(divisor, int(val + divisor / 2) // divisor * divisor)
73 |     return new_val if new_val >= round_up_bias * val else new_val + divisor
74 | 
75 | 
76 | def _scale_depths(depths, alpha):
77 |     """ Scales tensor depths as in reference MobileNet code, prefers rounding up
78 |     rather than down. """
79 |     return [_round_to_multiple_of(depth * alpha, 8) for depth in depths]
80 | 
81 | 
82 | class MNASNet(torch.nn.Module):
83 |     """ MNASNet, as described in https://arxiv.org/pdf/1807.11626.pdf.
84 |     >>> model = MNASNet(alpha=1.0, num_classes=1000)
85 |     >>> x = torch.rand(1, 3, 224, 224)
86 |     >>> y = model(x)
87 |     >>> y.dim()
88 |     2
89 |     >>> y.nelement()
90 |     1000
91 |     """
92 | 
93 |     def __init__(self, alpha, num_classes=1000, dropout=0.2):
94 |         super(MNASNet, self).__init__()
95 |         depths = _scale_depths([24, 40, 80, 96, 192, 320], alpha)
96 |         layers = [
97 |             # First layer: regular conv.
98 |             nn.Conv2d(3, 32, 3, padding=1, stride=2, bias=False),
99 |             nn.BatchNorm2d(32, momentum=_BN_MOMENTUM),
100 |             nn.ReLU(inplace=True),
101 |             # Depthwise separable, no skip.
102 |             nn.Conv2d(32, 32, 3, padding=1, stride=1, groups=32, bias=False),
103 |             nn.BatchNorm2d(32, momentum=_BN_MOMENTUM),
104 |             nn.ReLU(inplace=True),
105 |             nn.Conv2d(32, 16, 1, padding=0, stride=1, bias=False),
106 |             nn.BatchNorm2d(16, momentum=_BN_MOMENTUM),
107 |             # MNASNet blocks: stacks of inverted residuals.
108 | _stack(16, depths[0], 3, 2, 3, 3, _BN_MOMENTUM), 109 | _stack(depths[0], depths[1], 5, 2, 3, 3, _BN_MOMENTUM), 110 | _stack(depths[1], depths[2], 5, 2, 6, 3, _BN_MOMENTUM), 111 | _stack(depths[2], depths[3], 3, 1, 6, 2, _BN_MOMENTUM), 112 | _stack(depths[3], depths[4], 5, 2, 6, 4, _BN_MOMENTUM), 113 | _stack(depths[4], depths[5], 3, 1, 6, 1, _BN_MOMENTUM), 114 | # Final mapping to classifier input. 115 | nn.Conv2d(depths[5], 1280, 1, padding=0, stride=1, bias=False), 116 | nn.BatchNorm2d(1280, momentum=_BN_MOMENTUM), 117 | nn.ReLU(inplace=True), 118 | ] 119 | self.layers = nn.Sequential(*layers) 120 | self.classifier = nn.Sequential(nn.Dropout(p=dropout, inplace=True), 121 | nn.Linear(1280, num_classes)) 122 | self._initialize_weights() 123 | 124 | def forward(self, x): 125 | x = self.layers(x) 126 | # Equivalent to global avgpool and removing H and W dimensions. 127 | x = x.mean([2, 3]) 128 | return self.classifier(x) 129 | 130 | def _initialize_weights(self): 131 | for m in self.modules(): 132 | if isinstance(m, nn.Conv2d): 133 | nn.init.kaiming_normal_(m.weight, mode="fan_out", 134 | nonlinearity="relu") 135 | if m.bias is not None: 136 | nn.init.zeros_(m.bias) 137 | elif isinstance(m, nn.BatchNorm2d): 138 | nn.init.ones_(m.weight) 139 | nn.init.zeros_(m.bias) 140 | elif isinstance(m, nn.Linear): 141 | nn.init.normal_(m.weight, 0.01) 142 | nn.init.zeros_(m.bias) 143 | 144 | 145 | def mnasnet0_5(pretrained=False, progress=True, **kwargs): 146 | """MNASNet with depth multiplier of 0.5 from 147 | `"MnasNet: Platform-Aware Neural Architecture Search for Mobile" 148 | `_. 149 | Args: 150 | pretrained (bool): If True, returns a model pre-trained on ImageNet 151 | progress (bool): If True, displays a progress bar of the download to stderr 152 | """ 153 | model = MNASNet(0.5, **kwargs) 154 | return model 155 | 156 | 157 | def mnasnet0_75(pretrained=False, progress=True, **kwargs): 158 | """MNASNet with depth multiplier of 0.75 from 159 | `"MnasNet: Platform-Aware Neural Architecture Search for Mobile" 160 | `_. 161 | Args: 162 | pretrained (bool): If True, returns a model pre-trained on ImageNet 163 | progress (bool): If True, displays a progress bar of the download to stderr 164 | """ 165 | model = MNASNet(0.75, **kwargs) 166 | return model 167 | 168 | 169 | def mnasnet1_0(pretrained=False, progress=True, **kwargs): 170 | """MNASNet with depth multiplier of 1.0 from 171 | `"MnasNet: Platform-Aware Neural Architecture Search for Mobile" 172 | `_. 173 | Args: 174 | pretrained (bool): If True, returns a model pre-trained on ImageNet 175 | progress (bool): If True, displays a progress bar of the download to stderr 176 | """ 177 | model = MNASNet(1.0, **kwargs) 178 | return model 179 | 180 | 181 | def mnasnet1_3(pretrained=False, progress=True, **kwargs): 182 | """MNASNet with depth multiplier of 1.3 from 183 | `"MnasNet: Platform-Aware Neural Architecture Search for Mobile" 184 | `_. 185 | Args: 186 | pretrained (bool): If True, returns a model pre-trained on ImageNet 187 | progress (bool): If True, displays a progress bar of the download to stderr 188 | """ 189 | model = MNASNet(1.3, **kwargs) 190 | return model -------------------------------------------------------------------------------- /appendix/module/models/mobilenet_v1.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | # coding: utf-8 3 | 4 | from __future__ import division 5 | """ 6 | Creates a MobileNet Model as defined in: 7 | Andrew G. 
Howard Menglong Zhu Bo Chen, et.al. (2017). 8 | MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications. 9 | Copyright (c) Yang Lu, 2017 10 | 11 | Modified By cleardusk 12 | """ 13 | import math 14 | import torch.nn as nn 15 | 16 | __all__ = ['mobilenet_2', 'mobilenet_1', 'mobilenet_075', 'mobilenet_05', 'mobilenet_025'] 17 | 18 | class DepthWiseBlock(nn.Module): 19 | def __init__(self, inplanes, planes, stride=1, prelu=False): 20 | super(DepthWiseBlock, self).__init__() 21 | inplanes, planes = int(inplanes), int(planes) 22 | self.conv_dw = nn.Conv2d(inplanes, inplanes, kernel_size=3, padding=1, stride=stride, groups=inplanes, 23 | bias=False) 24 | self.bn_dw = nn.BatchNorm2d(inplanes) 25 | self.conv_sep = nn.Conv2d(inplanes, planes, kernel_size=1, stride=1, padding=0, bias=False) 26 | self.bn_sep = nn.BatchNorm2d(planes) 27 | if prelu: 28 | self.relu = nn.PReLU() 29 | else: 30 | self.relu = nn.ReLU(inplace=True) 31 | 32 | def forward(self, x): 33 | out = self.conv_dw(x) 34 | out = self.bn_dw(out) 35 | out = self.relu(out) 36 | 37 | out = self.conv_sep(out) 38 | out = self.bn_sep(out) 39 | out = self.relu(out) 40 | 41 | return out 42 | 43 | 44 | class MobileNet(nn.Module): 45 | def __init__(self, widen_factor=1.0, num_classes=1000, prelu=False, input_channel=3): 46 | """ Constructor 47 | Args: 48 | widen_factor: config of widen_factor 49 | num_classes: number of classes 50 | """ 51 | super(MobileNet, self).__init__() 52 | 53 | block = DepthWiseBlock 54 | self.conv1 = nn.Conv2d(input_channel, int(32 * widen_factor), kernel_size=3, stride=2, padding=1, 55 | bias=False) 56 | 57 | self.bn1 = nn.BatchNorm2d(int(32 * widen_factor)) 58 | if prelu: 59 | self.relu = nn.PReLU() 60 | else: 61 | self.relu = nn.ReLU(inplace=True) 62 | 63 | self.dw2_1 = block(32 * widen_factor, 64 * widen_factor, prelu=prelu) 64 | self.dw2_2 = block(64 * widen_factor, 128 * widen_factor, stride=2, prelu=prelu) 65 | 66 | self.dw3_1 = block(128 * widen_factor, 128 * widen_factor, prelu=prelu) 67 | self.dw3_2 = block(128 * widen_factor, 256 * widen_factor, stride=2, prelu=prelu) 68 | 69 | self.dw4_1 = block(256 * widen_factor, 256 * widen_factor, prelu=prelu) 70 | self.dw4_2 = block(256 * widen_factor, 512 * widen_factor, stride=2, prelu=prelu) 71 | 72 | self.dw5_1 = block(512 * widen_factor, 512 * widen_factor, prelu=prelu) 73 | self.dw5_2 = block(512 * widen_factor, 512 * widen_factor, prelu=prelu) 74 | self.dw5_3 = block(512 * widen_factor, 512 * widen_factor, prelu=prelu) 75 | self.dw5_4 = block(512 * widen_factor, 512 * widen_factor, prelu=prelu) 76 | self.dw5_5 = block(512 * widen_factor, 512 * widen_factor, prelu=prelu) 77 | self.dw5_6 = block(512 * widen_factor, 1024 * widen_factor, stride=2, prelu=prelu) 78 | 79 | self.dw6 = block(1024 * widen_factor, 1024 * widen_factor, prelu=prelu) 80 | 81 | self.avgpool = nn.AdaptiveAvgPool2d(1) 82 | self.fc = nn.Linear(int(1024 * widen_factor), num_classes) 83 | 84 | for m in self.modules(): 85 | if isinstance(m, nn.Conv2d): 86 | n = m.kernel_size[0] * m.kernel_size[1] * m.out_channels 87 | m.weight.data.normal_(0, math.sqrt(2. 
/ n)) 88 | elif isinstance(m, nn.BatchNorm2d): 89 | m.weight.data.fill_(1) 90 | m.bias.data.zero_() 91 | 92 | def forward(self, x): 93 | x = self.conv1(x) 94 | x = self.bn1(x) 95 | x = self.relu(x) 96 | 97 | x = self.dw2_1(x) 98 | x = self.dw2_2(x) 99 | x = self.dw3_1(x) 100 | x = self.dw3_2(x) 101 | x = self.dw4_1(x) 102 | x = self.dw4_2(x) 103 | x = self.dw5_1(x) 104 | x = self.dw5_2(x) 105 | x = self.dw5_3(x) 106 | x = self.dw5_4(x) 107 | x = self.dw5_5(x) 108 | x = self.dw5_6(x) 109 | x = self.dw6(x) 110 | 111 | x = self.avgpool(x) 112 | x = x.view(x.size(0), -1) 113 | x = self.fc(x) 114 | 115 | return x 116 | 117 | 118 | def mobilenet(widen_factor=1.0, num_classes=1000): 119 | """ 120 | Construct MobileNet. 121 | widen_factor=1.0 for mobilenet_1 122 | widen_factor=0.75 for mobilenet_075 123 | widen_factor=0.5 for mobilenet_05 124 | widen_factor=0.25 for mobilenet_025 125 | """ 126 | model = MobileNet(widen_factor=widen_factor, num_classes=num_classes) 127 | return model 128 | 129 | 130 | def mobilenet_2(num_classes=62, input_channel=3): 131 | model = MobileNet(widen_factor=2.0, num_classes=num_classes, input_channel=input_channel) 132 | return model 133 | 134 | 135 | def mobilenet_1(num_classes=62, input_channel=3): 136 | model = MobileNet(widen_factor=1.0, num_classes=num_classes, input_channel=input_channel) 137 | return model 138 | 139 | 140 | def mobilenet_075(num_classes=62, input_channel=3): 141 | model = MobileNet(widen_factor=0.75, num_classes=num_classes, input_channel=input_channel) 142 | return model 143 | 144 | 145 | def mobilenet_05(num_classes=62, input_channel=3): 146 | model = MobileNet(widen_factor=0.5, num_classes=num_classes, input_channel=input_channel) 147 | return model 148 | 149 | 150 | def mobilenet_025(num_classes=62, input_channel=3): 151 | model = MobileNet(widen_factor=0.25, num_classes=num_classes, input_channel=input_channel) 152 | return model 153 | -------------------------------------------------------------------------------- /appendix/module/models/resnet.py: -------------------------------------------------------------------------------- 1 | import torch.nn as nn 2 | import torch.utils.model_zoo as model_zoo 3 | 4 | 5 | __all__ = ['ResNet', 'resnet18', 'resnet34', 'resnet50', 'resnet101', 'resnet152'] 6 | 7 | 8 | model_urls = { 9 | 'resnet18': 'https://download.pytorch.org/models/resnet18-5c106cde.pth', 10 | 'resnet34': 'https://download.pytorch.org/models/resnet34-333f7ec4.pth', 11 | 'resnet50': 'https://download.pytorch.org/models/resnet50-19c8e357.pth', 12 | 'resnet101': 'https://download.pytorch.org/models/resnet101-5d3b4d8f.pth', 13 | 'resnet152': 'https://download.pytorch.org/models/resnet152-b121ed2d.pth', 14 | } 15 | 16 | 17 | def conv3x3(in_planes, out_planes, stride=1): 18 | """3x3 convolution with padding""" 19 | return nn.Conv2d(in_planes, out_planes, kernel_size=3, stride=stride, 20 | padding=1, bias=False) 21 | 22 | 23 | def conv1x1(in_planes, out_planes, stride=1): 24 | """1x1 convolution""" 25 | return nn.Conv2d(in_planes, out_planes, kernel_size=1, stride=stride, bias=False) 26 | 27 | 28 | class BasicBlock(nn.Module): 29 | expansion = 1 30 | 31 | def __init__(self, inplanes, planes, stride=1, downsample=None): 32 | super(BasicBlock, self).__init__() 33 | self.conv1 = conv3x3(inplanes, planes, stride) 34 | self.bn1 = nn.BatchNorm2d(planes) 35 | self.relu = nn.ReLU(inplace=True) 36 | self.conv2 = conv3x3(planes, planes) 37 | self.bn2 = nn.BatchNorm2d(planes) 38 | self.downsample = downsample 39 | self.stride = stride 40 | 41 | def 
forward(self, x): 42 | identity = x 43 | 44 | out = self.conv1(x) 45 | out = self.bn1(out) 46 | out = self.relu(out) 47 | 48 | out = self.conv2(out) 49 | out = self.bn2(out) 50 | 51 | if self.downsample is not None: 52 | identity = self.downsample(x) 53 | 54 | out += identity 55 | out = self.relu(out) 56 | 57 | return out 58 | 59 | 60 | class Bottleneck(nn.Module): 61 | expansion = 4 62 | 63 | def __init__(self, inplanes, planes, stride=1, downsample=None): 64 | super(Bottleneck, self).__init__() 65 | self.conv1 = conv1x1(inplanes, planes) 66 | self.bn1 = nn.BatchNorm2d(planes) 67 | self.conv2 = conv3x3(planes, planes, stride) 68 | self.bn2 = nn.BatchNorm2d(planes) 69 | self.conv3 = conv1x1(planes, planes * self.expansion) 70 | self.bn3 = nn.BatchNorm2d(planes * self.expansion) 71 | self.relu = nn.ReLU(inplace=True) 72 | self.downsample = downsample 73 | self.stride = stride 74 | 75 | def forward(self, x): 76 | identity = x 77 | 78 | out = self.conv1(x) 79 | out = self.bn1(out) 80 | out = self.relu(out) 81 | 82 | out = self.conv2(out) 83 | out = self.bn2(out) 84 | out = self.relu(out) 85 | 86 | out = self.conv3(out) 87 | out = self.bn3(out) 88 | 89 | if self.downsample is not None: 90 | identity = self.downsample(x) 91 | 92 | out += identity 93 | out = self.relu(out) 94 | 95 | return out 96 | 97 | 98 | class ResNet(nn.Module): 99 | 100 | def __init__(self, block, layers, num_classes=1000, zero_init_residual=False): 101 | super(ResNet, self).__init__() 102 | self.inplanes = 64 103 | self.conv1 = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3, 104 | bias=False) 105 | self.bn1 = nn.BatchNorm2d(64) 106 | self.relu = nn.ReLU(inplace=True) 107 | self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1) 108 | self.layer1 = self._make_layer(block, 64, layers[0]) 109 | self.layer2 = self._make_layer(block, 128, layers[1], stride=2) 110 | self.layer3 = self._make_layer(block, 256, layers[2], stride=2) 111 | self.layer4 = self._make_layer(block, 512, layers[3], stride=2) 112 | self.avgpool = nn.AdaptiveAvgPool2d((1, 1)) 113 | self.fc = nn.Linear(512 * block.expansion, num_classes) 114 | 115 | for m in self.modules(): 116 | if isinstance(m, nn.Conv2d): 117 | nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu') 118 | elif isinstance(m, nn.BatchNorm2d): 119 | nn.init.constant_(m.weight, 1) 120 | nn.init.constant_(m.bias, 0) 121 | 122 | # Zero-initialize the last BN in each residual branch, 123 | # so that the residual branch starts with zeros, and each residual block behaves like an identity. 
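# (Schematically, each block ends with out = relu(identity + bn_last(conv_last(...))); zeroing the last BN's gamma makes that branch output 0 at init, so the block starts as relu(identity). Using this file's own constructor, the switch is enabled via e.g. model = ResNet(Bottleneck, [3, 4, 6, 3], zero_init_residual=True).)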
124 | # This improves the model by 0.2~0.3% according to https://arxiv.org/abs/1706.02677 125 | if zero_init_residual: 126 | for m in self.modules(): 127 | if isinstance(m, Bottleneck): 128 | nn.init.constant_(m.bn3.weight, 0) 129 | elif isinstance(m, BasicBlock): 130 | nn.init.constant_(m.bn2.weight, 0) 131 | 132 | def _make_layer(self, block, planes, blocks, stride=1): 133 | downsample = None 134 | if stride != 1 or self.inplanes != planes * block.expansion: 135 | downsample = nn.Sequential( 136 | conv1x1(self.inplanes, planes * block.expansion, stride), 137 | nn.BatchNorm2d(planes * block.expansion), 138 | ) 139 | 140 | layers = [] 141 | layers.append(block(self.inplanes, planes, stride, downsample)) 142 | self.inplanes = planes * block.expansion 143 | for _ in range(1, blocks): 144 | layers.append(block(self.inplanes, planes)) 145 | 146 | return nn.Sequential(*layers) 147 | 148 | def forward(self, x): 149 | x = self.conv1(x) 150 | x = self.bn1(x) 151 | x = self.relu(x) 152 | x = self.maxpool(x) 153 | 154 | x = self.layer1(x) 155 | x = self.layer2(x) 156 | x = self.layer3(x) 157 | x = self.layer4(x) 158 | 159 | x = self.avgpool(x) 160 | x = x.view(x.size(0), -1) 161 | x = self.fc(x) 162 | 163 | return x 164 | 165 | 166 | def resnet18(pretrained=False, **kwargs): 167 | """Constructs a ResNet-18 model. 168 | Args: 169 | pretrained (bool): If True, returns a model pre-trained on ImageNet 170 | """ 171 | model = ResNet(BasicBlock, [2, 2, 2, 2], **kwargs) 172 | if pretrained: 173 | model.load_state_dict(model_zoo.load_url(model_urls['resnet18'])) 174 | return model 175 | 176 | 177 | def resnet34(pretrained=False, **kwargs): 178 | """Constructs a ResNet-34 model. 179 | Args: 180 | pretrained (bool): If True, returns a model pre-trained on ImageNet 181 | """ 182 | model = ResNet(BasicBlock, [3, 4, 6, 3], **kwargs) # was hard-coded to 62 classes, which ignored **kwargs and made the 1000-way pretrained weights fail to load 183 | if pretrained: 184 | model.load_state_dict(model_zoo.load_url(model_urls['resnet34'])) 185 | return model 186 | 187 | 188 | def resnet50(pretrained=False, **kwargs): 189 | """Constructs a ResNet-50 model. 190 | Args: 191 | pretrained (bool): If True, returns a model pre-trained on ImageNet 192 | """ 193 | model = ResNet(Bottleneck, [3, 4, 6, 3], **kwargs) # same fix as resnet34 194 | if pretrained: 195 | model.load_state_dict(model_zoo.load_url(model_urls['resnet50'])) 196 | return model 197 | 198 | 199 | def resnet101(pretrained=False, **kwargs): 200 | """Constructs a ResNet-101 model. 201 | Args: 202 | pretrained (bool): If True, returns a model pre-trained on ImageNet 203 | """ 204 | model = ResNet(Bottleneck, [3, 4, 23, 3], **kwargs) # same fix as resnet34 205 | if pretrained: 206 | model.load_state_dict(model_zoo.load_url(model_urls['resnet101'])) 207 | return model 208 | 209 | 210 | def resnet152(pretrained=False, **kwargs): 211 | """Constructs a ResNet-152 model.
212 | Args: 213 | pretrained (bool): If True, returns a model pre-trained on ImageNet 214 | """ 215 | model = ResNet(Bottleneck, [3, 8, 36, 3], **kwargs) 216 | if pretrained: 217 | model.load_state_dict(model_zoo.load_url(model_urls['resnet152'])) 218 | return model 219 | -------------------------------------------------------------------------------- /appendix/module/models/senet.py: -------------------------------------------------------------------------------- 1 | import torch.nn as nn 2 | import math 3 | import torch.utils.model_zoo as model_zoo 4 | 5 | 6 | __all__ = ['SENet', 'senet18', 'senet34', 'senet50', 'senet101', 'senet152'] 7 | 8 | def conv3x3(in_planes, out_planes, stride=1): 9 | """3x3 convolution with padding""" 10 | return nn.Conv2d(in_planes, out_planes, kernel_size=3, stride=stride, 11 | padding=1, bias=False) 12 | 13 | class BasicBlock(nn.Module): 14 | expansion = 1 15 | 16 | def __init__(self, inplanes, planes, stride=1, downsample=None): 17 | super(BasicBlock, self).__init__() 18 | self.conv1 = conv3x3(inplanes, planes, stride) 19 | self.bn1 = nn.BatchNorm2d(planes) 20 | self.relu = nn.ReLU(inplace=True) 21 | self.conv2 = conv3x3(planes, planes) 22 | self.bn2 = nn.BatchNorm2d(planes) 23 | self.downsample = downsample 24 | self.stride = stride 25 | 26 | if planes == 64: 27 | self.globalAvgPool = nn.AvgPool2d(56, stride=1) 28 | elif planes == 128: 29 | self.globalAvgPool = nn.AvgPool2d(28, stride=1) 30 | elif planes == 256: 31 | self.globalAvgPool = nn.AvgPool2d(14, stride=1) 32 | elif planes == 512: 33 | self.globalAvgPool = nn.AvgPool2d(7, stride=1) 34 | self.fc1 = nn.Linear(in_features=planes, out_features=round(planes / 16)) 35 | self.fc2 = nn.Linear(in_features=round(planes / 16), out_features=planes) 36 | self.sigmoid = nn.Sigmoid() 37 | 38 | def forward(self, x): 39 | residual = x 40 | 41 | out = self.conv1(x) 42 | out = self.bn1(out) 43 | out = self.relu(out) 44 | 45 | out = self.conv2(out) 46 | out = self.bn2(out) 47 | 48 | if self.downsample is not None: 49 | residual = self.downsample(x) 50 | 51 | original_out = out 52 | out = self.globalAvgPool(out) 53 | out = out.view(out.size(0), -1) 54 | out = self.fc1(out) 55 | out = self.relu(out) 56 | out = self.fc2(out) 57 | out = self.sigmoid(out) 58 | out = out.view(out.size(0), out.size(1), 1, 1) 59 | out = out * original_out 60 | 61 | out += residual 62 | out = self.relu(out) 63 | 64 | return out 65 | 66 | 67 | class Bottleneck(nn.Module): 68 | expansion = 4 69 | 70 | def __init__(self, inplanes, planes, stride=1, downsample=None): 71 | super(Bottleneck, self).__init__() 72 | self.conv1 = nn.Conv2d(inplanes, planes, kernel_size=1, bias=False) 73 | self.bn1 = nn.BatchNorm2d(planes) 74 | self.conv2 = nn.Conv2d(planes, planes, kernel_size=3, stride=stride, 75 | padding=1, bias=False) 76 | self.bn2 = nn.BatchNorm2d(planes) 77 | self.conv3 = nn.Conv2d(planes, planes * 4, kernel_size=1, bias=False) 78 | self.bn3 = nn.BatchNorm2d(planes * 4) 79 | self.relu = nn.ReLU(inplace=True) 80 | if planes == 64: 81 | self.globalAvgPool = nn.AvgPool2d(56, stride=1) 82 | elif planes == 128: 83 | self.globalAvgPool = nn.AvgPool2d(28, stride=1) 84 | elif planes == 256: 85 | self.globalAvgPool = nn.AvgPool2d(14, stride=1) 86 | elif planes == 512: 87 | self.globalAvgPool = nn.AvgPool2d(7, stride=1) 88 | self.fc1 = nn.Linear(in_features=planes * 4, out_features=round(planes / 4)) 89 | self.fc2 = nn.Linear(in_features=round(planes / 4), out_features=planes * 4) 90 | self.sigmoid = nn.Sigmoid() 91 | self.downsample = downsample 92 | 
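# Squeeze-and-Excitation gate: in forward() below, globalAvgPool squeezes each channel to a scalar, fc1 -> ReLU -> fc2 -> sigmoid maps those scalars to per-channel weights in (0, 1), and the residual branch is multiplied by them channel-wise before the identity is added.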
self.stride = stride 93 | 94 | def forward(self, x): 95 | residual = x 96 | 97 | out = self.conv1(x) 98 | out = self.bn1(out) 99 | out = self.relu(out) 100 | 101 | out = self.conv2(out) 102 | out = self.bn2(out) 103 | out = self.relu(out) 104 | 105 | out = self.conv3(out) 106 | out = self.bn3(out) 107 | 108 | if self.downsample is not None: 109 | residual = self.downsample(x) 110 | 111 | original_out = out 112 | out = self.globalAvgPool(out) 113 | out = out.view(out.size(0), -1) 114 | out = self.fc1(out) 115 | out = self.relu(out) 116 | out = self.fc2(out) 117 | out = self.sigmoid(out) 118 | out = out.view(out.size(0),out.size(1),1,1) 119 | out = out * original_out 120 | 121 | out += residual 122 | out = self.relu(out) 123 | 124 | return out 125 | 126 | 127 | class SENet(nn.Module): 128 | 129 | def __init__(self, block, layers, num_classes=1000): 130 | self.inplanes = 64 131 | super(SENet, self).__init__() 132 | self.conv1 = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3, 133 | bias=False) 134 | self.bn1 = nn.BatchNorm2d(64) 135 | self.relu = nn.ReLU(inplace=True) 136 | self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1) 137 | self.layer1 = self._make_layer(block, 64, layers[0]) 138 | self.layer2 = self._make_layer(block, 128, layers[1], stride=2) 139 | self.layer3 = self._make_layer(block, 256, layers[2], stride=2) 140 | self.layer4 = self._make_layer(block, 512, layers[3], stride=2) 141 | self.avgpool = nn.AvgPool2d(7, stride=1) 142 | self.fc = nn.Linear(512 * block.expansion, num_classes) 143 | 144 | for m in self.modules(): 145 | if isinstance(m, nn.Conv2d): 146 | n = m.kernel_size[0] * m.kernel_size[1] * m.out_channels 147 | m.weight.data.normal_(0, math.sqrt(2. / n)) 148 | elif isinstance(m, nn.BatchNorm2d): 149 | m.weight.data.fill_(1) 150 | m.bias.data.zero_() 151 | 152 | def _make_layer(self, block, planes, blocks, stride=1): 153 | downsample = None 154 | if stride != 1 or self.inplanes != planes * block.expansion: 155 | downsample = nn.Sequential( 156 | nn.Conv2d(self.inplanes, planes * block.expansion, 157 | kernel_size=1, stride=stride, bias=False), 158 | nn.BatchNorm2d(planes * block.expansion), 159 | ) 160 | 161 | layers = [] 162 | layers.append(block(self.inplanes, planes, stride, downsample)) 163 | self.inplanes = planes * block.expansion 164 | for i in range(1, blocks): 165 | layers.append(block(self.inplanes, planes)) 166 | 167 | return nn.Sequential(*layers) 168 | 169 | def forward(self, x): 170 | x = self.conv1(x) 171 | x = self.bn1(x) 172 | x = self.relu(x) 173 | x = self.maxpool(x) 174 | 175 | x = self.layer1(x) 176 | x = self.layer2(x) 177 | x = self.layer3(x) 178 | x = self.layer4(x) 179 | 180 | x = self.avgpool(x) 181 | x = x.view(x.size(0), -1) 182 | x = self.fc(x) 183 | 184 | return x 185 | 186 | def senet18(pretrained=False, **kwargs): 187 | """Constructs a ResNet-18 model. 188 | Args: 189 | pretrained (bool): If True, returns a model pre-trained on ImageNet 190 | """ 191 | model = SENet(BasicBlock, [2, 2, 2, 2], **kwargs) 192 | return model 193 | 194 | 195 | def senet34(pretrained=False, **kwargs): 196 | """Constructs a ResNet-34 model. 197 | Args: 198 | pretrained (bool): If True, returns a model pre-trained on ImageNet 199 | """ 200 | model = SENet(BasicBlock, [3, 4, 6, 3], **kwargs) 201 | return model 202 | 203 | 204 | def senet50(pretrained=False, **kwargs): 205 | """Constructs a ResNet-50 model. 
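(Despite the wording carried over from the ResNet constructors, this builds an SENet-50: the ResNet-50 layout assembled from the SE Bottleneck above; the same reading applies to the other senetXX helpers in this file.)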
206 | Args: 207 | pretrained (bool): If True, returns a model pre-trained on ImageNet 208 | """ 209 | model = SENet(Bottleneck, [3, 4, 6, 3], **kwargs) 210 | return model 211 | 212 | 213 | def senet101(pretrained=False, **kwargs): 214 | """Constructs a ResNet-101 model. 215 | Args: 216 | pretrained (bool): If True, returns a model pre-trained on ImageNet 217 | """ 218 | model = SENet(Bottleneck, [3, 4, 23, 3], **kwargs) 219 | return model 220 | 221 | 222 | def senet152(pretrained=False, **kwargs): 223 | """Constructs a ResNet-152 model. 224 | Args: 225 | pretrained (bool): If True, returns a model pre-trained on ImageNet 226 | """ 227 | model = SENet(Bottleneck, [3, 8, 36, 3], **kwargs) 228 | return model -------------------------------------------------------------------------------- /appendix/module/models/shufflenet_v2.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import torch.nn as nn 3 | # import torch.utils.model_zoo as model_zoo 4 | 5 | 6 | __all__ = [ 7 | 'ShuffleNetV2', 'shufflenet_v2_x0_5', 'shufflenet_v2_x1_0', 'shufflenet_v2_x1_5', 'shufflenet_v2_x2_0' 8 | ] 9 | 10 | # model_urls = { 11 | # 'shufflenetv2_x0.5': 'https://download.pytorch.org/models/shufflenetv2_x0.5-f707e7126e.pth', 12 | # 'shufflenetv2_x1.0': 'https://download.pytorch.org/models/shufflenetv2_x1-5666bf0f80.pth', 13 | # 'shufflenetv2_x1.5': None, 14 | # 'shufflenetv2_x2.0': None, 15 | # } 16 | 17 | 18 | def channel_shuffle(x, groups): 19 | batchsize, num_channels, height, width = x.data.size() 20 | channels_per_group = num_channels // groups 21 | 22 | # reshape 23 | x = x.view(batchsize, groups, 24 | channels_per_group, height, width) 25 | 26 | x = torch.transpose(x, 1, 2).contiguous() 27 | 28 | # flatten 29 | x = x.view(batchsize, -1, height, width) 30 | 31 | return x 32 | 33 | 34 | class InvertedResidual(nn.Module): 35 | def __init__(self, inp, oup, stride): 36 | super(InvertedResidual, self).__init__() 37 | 38 | if not (1 <= stride <= 3): 39 | raise ValueError('illegal stride value') 40 | self.stride = stride 41 | 42 | branch_features = oup // 2 43 | assert (self.stride != 1) or (inp == branch_features << 1) 44 | 45 | if self.stride > 1: 46 | self.branch1 = nn.Sequential( 47 | self.depthwise_conv(inp, inp, kernel_size=3, stride=self.stride, padding=1), 48 | nn.BatchNorm2d(inp), 49 | nn.Conv2d(inp, branch_features, kernel_size=1, stride=1, padding=0, bias=False), 50 | nn.BatchNorm2d(branch_features), 51 | nn.ReLU(inplace=True), 52 | ) 53 | 54 | self.branch2 = nn.Sequential( 55 | nn.Conv2d(inp if (self.stride > 1) else branch_features, 56 | branch_features, kernel_size=1, stride=1, padding=0, bias=False), 57 | nn.BatchNorm2d(branch_features), 58 | nn.ReLU(inplace=True), 59 | self.depthwise_conv(branch_features, branch_features, kernel_size=3, stride=self.stride, padding=1), 60 | nn.BatchNorm2d(branch_features), 61 | nn.Conv2d(branch_features, branch_features, kernel_size=1, stride=1, padding=0, bias=False), 62 | nn.BatchNorm2d(branch_features), 63 | nn.ReLU(inplace=True), 64 | ) 65 | 66 | @staticmethod 67 | def depthwise_conv(i, o, kernel_size, stride=1, padding=0, bias=False): 68 | return nn.Conv2d(i, o, kernel_size, stride, padding, bias=bias, groups=i) 69 | 70 | def forward(self, x): 71 | if self.stride == 1: 72 | x1, x2 = x.chunk(2, dim=1) 73 | out = torch.cat((x1, self.branch2(x2)), dim=1) 74 | else: 75 | out = torch.cat((self.branch1(x), self.branch2(x)), dim=1) 76 | 77 | out = channel_shuffle(out, 2) 78 | 79 | return out 80 | 81 | 82 | class 
ShuffleNetV2(nn.Module): 83 | def __init__(self, stages_repeats, stages_out_channels, num_classes=1000): 84 | super(ShuffleNetV2, self).__init__() 85 | 86 | if len(stages_repeats) != 3: 87 | raise ValueError('expected stages_repeats as list of 3 positive ints') 88 | if len(stages_out_channels) != 5: 89 | raise ValueError('expected stages_out_channels as list of 5 positive ints') 90 | self._stage_out_channels = stages_out_channels 91 | 92 | input_channels = 3 93 | output_channels = self._stage_out_channels[0] 94 | self.conv1 = nn.Sequential( 95 | nn.Conv2d(input_channels, output_channels, 3, 2, 1, bias=False), 96 | nn.BatchNorm2d(output_channels), 97 | nn.ReLU(inplace=True), 98 | ) 99 | input_channels = output_channels 100 | 101 | self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1) 102 | 103 | stage_names = ['stage{}'.format(i) for i in [2, 3, 4]] 104 | for name, repeats, output_channels in zip( 105 | stage_names, stages_repeats, self._stage_out_channels[1:]): 106 | seq = [InvertedResidual(input_channels, output_channels, 2)] 107 | for i in range(repeats - 1): 108 | seq.append(InvertedResidual(output_channels, output_channels, 1)) 109 | setattr(self, name, nn.Sequential(*seq)) 110 | input_channels = output_channels 111 | 112 | output_channels = self._stage_out_channels[-1] 113 | self.conv5 = nn.Sequential( 114 | nn.Conv2d(input_channels, output_channels, 1, 1, 0, bias=False), 115 | nn.BatchNorm2d(output_channels), 116 | nn.ReLU(inplace=True), 117 | ) 118 | 119 | self.fc = nn.Linear(output_channels, num_classes) 120 | 121 | def forward(self, x): 122 | x = self.conv1(x) 123 | x = self.maxpool(x) 124 | x = self.stage2(x) 125 | x = self.stage3(x) 126 | x = self.stage4(x) 127 | x = self.conv5(x) 128 | x = x.mean([2, 3]) # globalpool 129 | x = self.fc(x) 130 | return x 131 | 132 | 133 | def shufflenet_v2_x0_5(pretrained=False, progress=True, **kwargs): 134 | """ 135 | Constructs a ShuffleNetV2 with 0.5x output channels, as described in 136 | `"ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture Design" 137 | `_. 138 | Args: 139 | pretrained (bool): If True, returns a model pre-trained on ImageNet 140 | progress (bool): If True, displays a progress bar of the download to stderr 141 | """ 142 | return ShuffleNetV2([4, 8, 4], [24, 48, 96, 192, 1024], **kwargs) 143 | 144 | 145 | def shufflenet_v2_x1_0(pretrained=False, progress=True, **kwargs): 146 | """ 147 | Constructs a ShuffleNetV2 with 1.0x output channels, as described in 148 | `"ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture Design" 149 | `_. 150 | Args: 151 | pretrained (bool): If True, returns a model pre-trained on ImageNet 152 | progress (bool): If True, displays a progress bar of the download to stderr 153 | """ 154 | return ShuffleNetV2([4, 8, 4], [24, 116, 232, 464, 1024], **kwargs) 155 | 156 | 157 | def shufflenet_v2_x1_5(pretrained=False, progress=True, **kwargs): 158 | """ 159 | Constructs a ShuffleNetV2 with 1.5x output channels, as described in 160 | `"ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture Design" 161 | `_. 
162 | Args: 163 | pretrained (bool): If True, returns a model pre-trained on ImageNet 164 | progress (bool): If True, displays a progress bar of the download to stderr 165 | """ 166 | return ShuffleNetV2([4, 8, 4], [24, 176, 352, 704, 1024], **kwargs) 167 | 168 | 169 | def shufflenet_v2_x2_0(pretrained=False, progress=True, **kwargs): 170 | """ 171 | Constructs a ShuffleNetV2 with 2.0x output channels, as described in 172 | `"ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture Design" 173 | `_. 174 | Args: 175 | pretrained (bool): If True, returns a model pre-trained on ImageNet 176 | progress (bool): If True, displays a progress bar of the download to stderr 177 | """ 178 | return ShuffleNetV2([4, 8, 4], [24, 244, 488, 976, 2048], **kwargs) -------------------------------------------------------------------------------- /appendix/module/models/unet.py: -------------------------------------------------------------------------------- 1 | import _init_paths 2 | import torch 3 | import torch.nn as nn 4 | from layers import unetConv2, unetUp 5 | from utils import init_weights, count_param 6 | 7 | class UNet(nn.Module): 8 | 9 | def __init__(self, in_channels=1, n_classes=2, feature_scale=2, is_deconv=True, is_batchnorm=True): 10 | super(UNet, self).__init__() 11 | self.in_channels = in_channels 12 | self.feature_scale = feature_scale 13 | self.is_deconv = is_deconv 14 | self.is_batchnorm = is_batchnorm 15 | 16 | 17 | filters = [64, 128, 256, 512, 1024] 18 | filters = [int(x / self.feature_scale) for x in filters] 19 | 20 | # downsampling 21 | self.maxpool = nn.MaxPool2d(kernel_size=2) 22 | self.conv1 = unetConv2(self.in_channels, filters[0], self.is_batchnorm) 23 | self.conv2 = unetConv2(filters[0], filters[1], self.is_batchnorm) 24 | self.conv3 = unetConv2(filters[1], filters[2], self.is_batchnorm) 25 | self.conv4 = unetConv2(filters[2], filters[3], self.is_batchnorm) 26 | self.center = unetConv2(filters[3], filters[4], self.is_batchnorm) 27 | # upsampling 28 | self.up_concat4 = unetUp(filters[4], filters[3], self.is_deconv) 29 | self.up_concat3 = unetUp(filters[3], filters[2], self.is_deconv) 30 | self.up_concat2 = unetUp(filters[2], filters[1], self.is_deconv) 31 | self.up_concat1 = unetUp(filters[1], filters[0], self.is_deconv) 32 | # final conv (without any concat) 33 | self.final = nn.Conv2d(filters[0], n_classes, 1) 34 | 35 | # initialise weights 36 | for m in self.modules(): 37 | if isinstance(m, nn.Conv2d): 38 | init_weights(m, init_type='kaiming') 39 | elif isinstance(m, nn.BatchNorm2d): 40 | init_weights(m, init_type='kaiming') 41 | 42 | def forward(self, inputs): 43 | conv1 = self.conv1(inputs) # 16*512*512 44 | maxpool1 = self.maxpool(conv1) # 16*256*256 45 | 46 | conv2 = self.conv2(maxpool1) # 32*256*256 47 | maxpool2 = self.maxpool(conv2) # 32*128*128 48 | 49 | conv3 = self.conv3(maxpool2) # 64*128*128 50 | maxpool3 = self.maxpool(conv3) # 64*64*64 51 | 52 | conv4 = self.conv4(maxpool3) # 128*64*64 53 | maxpool4 = self.maxpool(conv4) # 128*32*32 54 | 55 | center = self.center(maxpool4) # 256*32*32 56 | up4 = self.up_concat4(center,conv4) # 128*64*64 57 | up3 = self.up_concat3(up4,conv3) # 64*128*128 58 | up2 = self.up_concat2(up3,conv2) # 32*256*256 59 | up1 = self.up_concat1(up2,conv1) # 16*512*512 60 | 61 | final = self.final(up1) 62 | 63 | return final 64 | 65 | if __name__ == '__main__': 66 | print('#### Test Case ###') 67 | from torch.autograd import Variable 68 | x = Variable(torch.rand(2,1,64,64)).cuda() 69 | model = UNet().cuda() 70 | param = 
count_param(model) 71 | y = model(x) 72 | print('Output shape:',y.shape) 73 | print('UNet totoal parameters: %.2fM (%d)'%(param/1e6,param)) -------------------------------------------------------------------------------- /appendix/production/distributed/pytorch-distributed-example-master/.gitignore: -------------------------------------------------------------------------------- 1 | # Byte-compiled / optimized / DLL files 2 | __pycache__/ 3 | *.py[cod] 4 | *$py.class 5 | 6 | # C extensions 7 | *.so 8 | 9 | # Distribution / packaging 10 | .Python 11 | env/ 12 | build/ 13 | develop-eggs/ 14 | dist/ 15 | downloads/ 16 | eggs/ 17 | .eggs/ 18 | lib/ 19 | lib64/ 20 | parts/ 21 | sdist/ 22 | var/ 23 | wheels/ 24 | *.egg-info/ 25 | .installed.cfg 26 | *.egg 27 | 28 | # PyInstaller 29 | # Usually these files are written by a python script from a template 30 | # before PyInstaller builds the exe, so as to inject date/other infos into it. 31 | *.manifest 32 | *.spec 33 | 34 | # Installer logs 35 | pip-log.txt 36 | pip-delete-this-directory.txt 37 | 38 | # Unit test / coverage reports 39 | htmlcov/ 40 | .tox/ 41 | .coverage 42 | .coverage.* 43 | .cache 44 | nosetests.xml 45 | coverage.xml 46 | *.cover 47 | .hypothesis/ 48 | 49 | # Translations 50 | *.mo 51 | *.pot 52 | 53 | # Django stuff: 54 | *.log 55 | local_settings.py 56 | 57 | # Flask stuff: 58 | instance/ 59 | .webassets-cache 60 | 61 | # Scrapy stuff: 62 | .scrapy 63 | 64 | # Sphinx documentation 65 | docs/_build/ 66 | 67 | # PyBuilder 68 | target/ 69 | 70 | # Jupyter Notebook 71 | .ipynb_checkpoints 72 | 73 | # pyenv 74 | .python-version 75 | 76 | # celery beat schedule file 77 | celerybeat-schedule 78 | 79 | # SageMath parsed files 80 | *.sage.py 81 | 82 | # dotenv 83 | .env 84 | 85 | # virtualenv 86 | .venv 87 | venv/ 88 | ENV/ 89 | 90 | # Spyder project settings 91 | .spyderproject 92 | .spyproject 93 | 94 | # Rope project settings 95 | .ropeproject 96 | 97 | # mkdocs documentation 98 | /site 99 | 100 | # mypy 101 | .mypy_cache/ 102 | 103 | data/ 104 | -------------------------------------------------------------------------------- /appendix/production/distributed/pytorch-distributed-example-master/LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2018 なるみ 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 
22 | -------------------------------------------------------------------------------- /appendix/production/distributed/pytorch-distributed-example-master/README.md: -------------------------------------------------------------------------------- 1 | # Pytorch Distributed Example 2 | 3 | If you are using previous version of PyTorch: 4 | 5 | - [v1.1.0](https://github.com/narumiruna/pytorch-distributed-example/tree/v1.1.0) 6 | - [v1.0.1](https://github.com/narumiruna/pytorch-distributed-example/tree/v1.0.1) 7 | - [v0.4.1](https://github.com/narumiruna/pytorch-distributed-example/tree/v0.4.1) 8 | 9 | ## Requirements 10 | 11 | - pytorch 12 | - torchvision 13 | 14 | ## References 15 | 16 | - [Distributed communication package - torch.distributed](http://pytorch.org/docs/master/distributed.html) 17 | - [Writing Distributed Applications with PyTorch](http://pytorch.org/tutorials/intermediate/dist_tuto.html) 18 | -------------------------------------------------------------------------------- /appendix/production/distributed/pytorch-distributed-example-master/mnist/Dockerfile: -------------------------------------------------------------------------------- 1 | FROM pytorch/pytorch:1.2-cuda10.0-cudnn7-runtime 2 | 3 | RUN pip install torchvision \ 4 | && rm -rf ~/.cache/pip 5 | 6 | ENV GLOO_SOCKET_IFNAME=eth0 7 | ENV NCCL_SOCKET_IFNAME=eth0 8 | 9 | WORKDIR /work 10 | RUN python -c "from torchvision import datasets;datasets.MNIST('data', download=True)" 11 | COPY main.py . 12 | -------------------------------------------------------------------------------- /appendix/production/distributed/pytorch-distributed-example-master/mnist/README.md: -------------------------------------------------------------------------------- 1 | # MNIST Example 2 | 3 | ```shell 4 | export GLOO_SOCKET_IFNAME=eth0 5 | ``` 6 | 7 | Rank 0 8 | ``` 9 | $ python3 main.py --init-method tcp://127.0.0.1:23456 --rank 0 --world-size 2 10 | ``` 11 | 12 | Rank 1 13 | ``` 14 | $ python3 main.py --init-method tcp://127.0.0.1:23456 --rank 1 --world-size 2 15 | ``` 16 | 17 | ## Use specific root directory for running example on single machine. 18 | 19 | Rank 0 20 | ``` 21 | $ python3 main.py --init-method tcp://127.0.0.1:23456 --rank 0 --world-size 2 --root data0 22 | ``` 23 | 24 | Rank 1 25 | ``` 26 | $ python3 main.py --init-method tcp://127.0.0.1:23456 --rank 1 --world-size 2 --root data1 27 | ``` 28 | 29 | ## Run in docker 30 | 31 | Install [docker](https://docs.docker.com/install/), [docker-compose](https://docs.docker.com/compose/install/) and [NVIDIA docker](https://github.com/NVIDIA/nvidia-docker) (if you want to run with GPU) 32 | 33 | ``` 34 | $ docker build --file Dockerfile --tag pytorch-distributed-example . 
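# the compose files below run two containers (rank0/rank1) from the image tagged above on a private bridge network; once up, they can be inspected with: docker-compose ps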
35 | $ docker-compose up 36 | For GPU 37 | $ docker-compose --file docker-compose-gpu.yml up 38 | ``` 39 | -------------------------------------------------------------------------------- /appendix/production/distributed/pytorch-distributed-example-master/mnist/docker-compose-gpu.yml: -------------------------------------------------------------------------------- 1 | version: "2.3" 2 | services: 3 | rank0: 4 | image: pytorch-distributed-example 5 | runtime: nvidia 6 | networks: 7 | bridge: 8 | ipv4_address: 10.1.0.10 9 | command: python -u main.py --backend nccl --init-method tcp://10.1.0.10:23456 --world-size 2 --rank 0 10 | rank1: 11 | image: pytorch-distributed-example 12 | runtime: nvidia 13 | networks: 14 | bridge: 15 | ipv4_address: 10.1.0.11 16 | command: python -u main.py --backend nccl --init-method tcp://10.1.0.10:23456 --world-size 2 --rank 1 17 | networks: 18 | bridge: 19 | driver: bridge 20 | ipam: 21 | config: 22 | - subnet: 10.1.0.0/16 23 | -------------------------------------------------------------------------------- /appendix/production/distributed/pytorch-distributed-example-master/mnist/docker-compose.yml: -------------------------------------------------------------------------------- 1 | version: "2.3" 2 | services: 3 | rank0: 4 | image: pytorch-distributed-example 5 | networks: 6 | bridge: 7 | ipv4_address: 10.1.0.10 8 | command: python -u main.py --init-method tcp://10.1.0.10:23456 --world-size 2 --rank 0 9 | rank1: 10 | image: pytorch-distributed-example 11 | networks: 12 | bridge: 13 | ipv4_address: 10.1.0.11 14 | command: python -u main.py --init-method tcp://10.1.0.10:23456 --world-size 2 --rank 1 15 | networks: 16 | bridge: 17 | driver: bridge 18 | ipam: 19 | config: 20 | - subnet: 10.1.0.0/16 21 | -------------------------------------------------------------------------------- /appendix/production/distributed/pytorch-distributed-example-master/mnist/main.py: -------------------------------------------------------------------------------- 1 | from __future__ import division, print_function 2 | 3 | import argparse 4 | import os 5 | 6 | import torch 7 | import torch.nn.functional as F 8 | from torch import distributed, nn 9 | from torch.utils import data 10 | from torchvision import datasets, transforms 11 | 12 | os.environ['NCCL_SOCKET_IFNAME'] = 'enp2s0' 13 | 14 | def distributed_is_initialized(): 15 | if distributed.is_available(): 16 | if distributed.is_initialized(): 17 | return True 18 | return False 19 | 20 | 21 | class Average(object): 22 | 23 | def __init__(self): 24 | self.sum = 0 25 | self.count = 0 26 | 27 | def __str__(self): 28 | return '{:.6f}'.format(self.average) 29 | 30 | @property 31 | def average(self): 32 | return self.sum / self.count 33 | 34 | def update(self, value, number): 35 | self.sum += value * number 36 | self.count += number 37 | 38 | 39 | class Accuracy(object): 40 | 41 | def __init__(self): 42 | self.correct = 0 43 | self.count = 0 44 | 45 | def __str__(self): 46 | return '{:.2f}%'.format(self.accuracy * 100) 47 | 48 | @property 49 | def accuracy(self): 50 | return self.correct / self.count 51 | 52 | @torch.no_grad() 53 | def update(self, output, target): 54 | pred = output.argmax(dim=1) 55 | correct = pred.eq(target).sum().item() 56 | 57 | self.correct += correct 58 | self.count += output.size(0) 59 | 60 | 61 | class Trainer(object): 62 | 63 | def __init__(self, model, optimizer, train_loader, test_loader, device): 64 | self.model = model 65 | self.optimizer = optimizer 66 | self.train_loader = train_loader 67 | 
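# NOTE: each process builds its own Trainer; when train_loader carries a DistributedSampler (see MNISTDataLoader below), every rank iterates a disjoint 1/world_size shard of the training set per epoch.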
self.test_loader = test_loader 68 | self.device = device 69 | 70 | def fit(self, epochs): 71 | for epoch in range(1, epochs + 1): 72 | train_loss, train_acc = self.train() 73 | test_loss, test_acc = self.evaluate() 74 | 75 | print( 76 | 'Epoch: {}/{},'.format(epoch, epochs), 77 | 'train loss: {}, train acc: {},'.format(train_loss, train_acc), 78 | 'test loss: {}, test acc: {}.'.format(test_loss, test_acc), 79 | ) 80 | 81 | def train(self): 82 | self.model.train() 83 | 84 | train_loss = Average() 85 | train_acc = Accuracy() 86 | 87 | for data, target in self.train_loader: 88 | data = data.to(self.device) 89 | target = target.to(self.device) 90 | 91 | output = self.model(data) 92 | loss = F.cross_entropy(output, target) 93 | 94 | self.optimizer.zero_grad() 95 | loss.backward() 96 | self.optimizer.step() 97 | 98 | train_loss.update(loss.item(), data.size(0)) 99 | train_acc.update(output, target) 100 | 101 | return train_loss, train_acc 102 | 103 | @torch.no_grad() 104 | def evaluate(self): 105 | self.model.eval() 106 | 107 | test_loss = Average() 108 | test_acc = Accuracy() 109 | 110 | for data, target in self.test_loader: 111 | data = data.to(self.device) 112 | target = target.to(self.device) 113 | 114 | output = self.model(data) 115 | loss = F.cross_entropy(output, target) 116 | 117 | test_loss.update(loss.item(), data.size(0)) 118 | test_acc.update(output, target) 119 | 120 | return test_loss, test_acc 121 | 122 | 123 | class Net(nn.Module): 124 | 125 | def __init__(self): 126 | super(Net, self).__init__() 127 | self.fc = nn.Linear(784, 10) 128 | 129 | def forward(self, x): 130 | return self.fc(x.view(x.size(0), -1)) 131 | 132 | 133 | class MNISTDataLoader(data.DataLoader): 134 | 135 | def __init__(self, root, batch_size, train=True): 136 | transform = transforms.Compose([ 137 | transforms.ToTensor(), 138 | transforms.Normalize((0.1307,), (0.3081,)), 139 | ]) 140 | 141 | dataset = datasets.MNIST(root, train=train, transform=transform, download=True) 142 | sampler = None 143 | if train and distributed_is_initialized(): 144 | sampler = data.DistributedSampler(dataset) 145 | 146 | super(MNISTDataLoader, self).__init__( 147 | dataset, 148 | batch_size=batch_size, 149 | shuffle=(sampler is None), 150 | sampler=sampler, 151 | ) 152 | 153 | 154 | def run(args): 155 | device = torch.device('cuda' if torch.cuda.is_available() and not args.no_cuda else 'cpu') 156 | 157 | model = Net() 158 | if distributed_is_initialized(): 159 | model.to(device) 160 | model = nn.parallel.DistributedDataParallel(model) 161 | else: 162 | model = nn.DataParallel(model) 163 | model.to(device) 164 | optimizer = torch.optim.Adam(model.parameters(), lr=args.learning_rate) 165 | 166 | train_loader = MNISTDataLoader(args.root, args.batch_size, train=True) 167 | test_loader = MNISTDataLoader(args.root, args.batch_size, train=False) 168 | trainer = Trainer(model, optimizer, train_loader, test_loader, device) 169 | trainer.fit(args.epochs) 170 | 171 | 172 | def main(): 173 | parser = argparse.ArgumentParser() 174 | parser.add_argument('--backend', type=str, default='nccl', help='Name of the backend to use.') 175 | parser.add_argument('-i', 176 | '--init-method', 177 | type=str, 178 | default='env://', 179 | help='URL specifying how to initialize the package.') 180 | parser.add_argument('-s', '--world-size', type=int, default=1, help='Number of processes participating in the job.') 181 | parser.add_argument('-r', '--rank', type=int, default=0, help='Rank of the current process.') 182 | parser.add_argument('--epochs', type=int, 
default=20) 183 | parser.add_argument('--no-cuda', action='store_true') 184 | parser.add_argument('-lr', '--learning-rate', type=float, default=1e-3) 185 | parser.add_argument('--root', type=str, default='data') 186 | parser.add_argument('--batch-size', type=int, default=128) 187 | parser.add_argument('--local_rank', type = int, default=0) 188 | args = parser.parse_args() 189 | print(args) 190 | 191 | if args.world_size > 1: 192 | distributed.init_process_group( 193 | backend=args.backend, 194 | init_method=args.init_method, 195 | world_size=args.world_size, 196 | rank=args.rank, 197 | ) 198 | 199 | run(args) 200 | 201 | 202 | if __name__ == '__main__': 203 | main() 204 | -------------------------------------------------------------------------------- /appendix/production/distributed/pytorch-distributed-example-master/setup.cfg: -------------------------------------------------------------------------------- 1 | [yapf] 2 | based_on_style = google 3 | column_limit = 120 4 | -------------------------------------------------------------------------------- /appendix/production/distributed/pytorch-distributed-example-master/toy/README.md: -------------------------------------------------------------------------------- 1 | # Toy Example 2 | 3 | Rank 0 4 | ``` 5 | $ python3 main.py --init-method tcp://127.0.0.1:23456 --rank 0 --world-size 2 6 | ``` 7 | 8 | Rank 2 9 | ``` 10 | $ python3 main.py --init-method tcp://127.0.0.1:23456 --rank 1 --world-size 2 11 | ``` 12 | -------------------------------------------------------------------------------- /appendix/production/distributed/pytorch-distributed-example-master/toy/main.py: -------------------------------------------------------------------------------- 1 | import argparse 2 | import os 3 | from random import randint 4 | from time import sleep 5 | 6 | import torch 7 | import torch.distributed as dist 8 | 9 | 10 | def run(world_size, rank, steps): 11 | for step in range(1, steps + 1): 12 | # get random int 13 | value = randint(0, 10) 14 | 15 | # group all ranks 16 | ranks = list(range(world_size)) 17 | group = dist.new_group(ranks=ranks) 18 | 19 | # compute reduced sum 20 | tensor = torch.tensor(value, dtype=torch.int).cuda() 21 | dist.all_reduce(tensor, op=dist.ReduceOp.SUM, group=group) 22 | 23 | print('rank: {}, step: {}, value: {}, reduced sum: {}.'.format(rank, step, value, tensor.item())) 24 | 25 | sleep(1) 26 | 27 | 28 | def main(): 29 | parser = argparse.ArgumentParser() 30 | parser.add_argument('--backend', type=str, default='nccl', help='Name of the backend to use.') 31 | parser.add_argument( 32 | '-i', 33 | '--init-method', 34 | type=str, 35 | default='tcp://127.0.0.1:23456', 36 | help='URL specifying how to initialize the package.') 37 | parser.add_argument('-s', '--world-size', type=int, help='Number of processes participating in the job.') 38 | parser.add_argument('-r', '--rank', type=int, help='Rank of the current process.') 39 | parser.add_argument('--steps', type=int, default=20) 40 | args = parser.parse_args() 41 | print(args) 42 | 43 | dist.init_process_group( 44 | backend=args.backend, 45 | init_method=args.init_method, 46 | world_size=args.world_size, 47 | rank=args.rank, 48 | ) 49 | 50 | run(args.world_size, args.rank, args.steps) 51 | 52 | 53 | if __name__ == '__main__': 54 | main() 55 | -------------------------------------------------------------------------------- /appendix/production/distributed/pytorch-distributed-master/.gitignore: -------------------------------------------------------------------------------- 1 | # 
Compiled python 2 | *.pyc 3 | *.pyd 4 | 5 | # Compiled MATLAB 6 | *.mex* 7 | 8 | # IPython notebook checkpoints 9 | .ipynb_checkpoints 10 | 11 | # Editor temporaries 12 | *.swn 13 | *.swo 14 | *.swp 15 | *~ 16 | 17 | # Sublime Text settings 18 | *.sublime-workspace 19 | *.sublime-project 20 | 21 | # Eclipse Project settings 22 | *.*project 23 | .settings 24 | 25 | # QtCreator files 26 | *.user 27 | 28 | # PyCharm files 29 | .idea 30 | 31 | # Visual Studio Code files 32 | .vscode 33 | .vs 34 | 35 | # OSX dir files 36 | .DS_Store 37 | -------------------------------------------------------------------------------- /appendix/production/distributed/pytorch-distributed-master/LICENSE: -------------------------------------------------------------------------------- 1 | The MIT License (MIT) 2 | 3 | Copyright (c) 2019-present, Zhi Zhang 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in 13 | all copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN 21 | THE SOFTWARE. 
-------------------------------------------------------------------------------- /appendix/production/distributed/pytorch-distributed-master/apex_distributed.py: -------------------------------------------------------------------------------- 1 | # https://github.com/NVIDIA/apex/blob/master/examples/imagenet/main_amp.py 2 | 3 | import csv 4 | 5 | import argparse 6 | import os 7 | import random 8 | import shutil 9 | import time 10 | import warnings 11 | 12 | import torch 13 | import torch.nn as nn 14 | import torch.nn.parallel 15 | import torch.backends.cudnn as cudnn 16 | import torch.distributed as dist 17 | import torch.optim 18 | import torch.multiprocessing as mp 19 | import torch.utils.data 20 | import torch.utils.data.distributed 21 | import torchvision.transforms as transforms 22 | import torchvision.datasets as datasets 23 | import torchvision.models as models 24 | 25 | from apex import amp 26 | from apex.parallel import DistributedDataParallel 27 | 28 | model_names = sorted(name for name in models.__dict__ 29 | if name.islower() and not name.startswith("__") and callable(models.__dict__[name])) 30 | 31 | parser = argparse.ArgumentParser(description='PyTorch ImageNet Training') 32 | parser.add_argument('--data', metavar='DIR', default='/home/zhangzhi/Data/ImageNet2012', help='path to dataset') 33 | parser.add_argument('-a', 34 | '--arch', 35 | metavar='ARCH', 36 | default='resnet18', 37 | choices=model_names, 38 | help='model architecture: ' + ' | '.join(model_names) + ' (default: resnet18)') 39 | parser.add_argument('-j', 40 | '--workers', 41 | default=4, 42 | type=int, 43 | metavar='N', 44 | help='number of data loading workers (default: 4)') 45 | parser.add_argument('--epochs', default=90, type=int, metavar='N', help='number of total epochs to run') 46 | parser.add_argument('--start-epoch', default=0, type=int, metavar='N', help='manual epoch number (useful on restarts)') 47 | parser.add_argument('-b', 48 | '--batch-size', 49 | default=6400, 50 | type=int, 51 | metavar='N', 52 | help='mini-batch size (default: 6400), this is the total ' 53 | 'batch size of all GPUs on the current node when ' 54 | 'using Data Parallel or Distributed Data Parallel') 55 | parser.add_argument('--lr', 56 | '--learning-rate', 57 | default=0.1, 58 | type=float, 59 | metavar='LR', 60 | help='initial learning rate', 61 | dest='lr') 62 | parser.add_argument('--momentum', default=0.9, type=float, metavar='M', help='momentum') 63 | parser.add_argument('--local_rank', default=-1, type=int, 64 | help='node rank for distributed training') 65 | parser.add_argument('--wd', 66 | '--weight-decay', 67 | default=1e-4, 68 | type=float, 69 | metavar='W', 70 | help='weight decay (default: 1e-4)', 71 | dest='weight_decay') 72 | parser.add_argument('-p', '--print-freq', default=10, type=int, metavar='N', help='print frequency (default: 10)') 73 | parser.add_argument('-e', '--evaluate', dest='evaluate', action='store_true', help='evaluate model on validation set') 74 | parser.add_argument('--pretrained', dest='pretrained', action='store_true', help='use pre-trained model') 75 | parser.add_argument('--seed', default=None, type=int, help='seed for initializing training. 
') 76 | 77 | best_acc1 = 0 78 | 79 | 80 | class data_prefetcher(): 81 | def __init__(self, loader): 82 | self.loader = iter(loader) 83 | self.stream = torch.cuda.Stream() 84 | self.mean = torch.tensor([0.485 * 255, 0.456 * 255, 0.406 * 255]).cuda().view(1, 3, 1, 1) 85 | self.std = torch.tensor([0.229 * 255, 0.224 * 255, 0.225 * 255]).cuda().view(1, 3, 1, 1) 86 | # With Amp, it isn't necessary to manually convert data to half. 87 | # if args.fp16: 88 | # self.mean = self.mean.half() 89 | # self.std = self.std.half() 90 | self.preload() 91 | 92 | def preload(self): 93 | try: 94 | self.next_input, self.next_target = next(self.loader) 95 | except StopIteration: 96 | self.next_input = None 97 | self.next_target = None 98 | return 99 | # if record_stream() doesn't work, another option is to make sure device inputs are created 100 | # on the main stream. 101 | # self.next_input_gpu = torch.empty_like(self.next_input, device='cuda') 102 | # self.next_target_gpu = torch.empty_like(self.next_target, device='cuda') 103 | # Need to make sure the memory allocated for next_* is not still in use by the main stream 104 | # at the time we start copying to next_*: 105 | # self.stream.wait_stream(torch.cuda.current_stream()) 106 | with torch.cuda.stream(self.stream): 107 | self.next_input = self.next_input.cuda(non_blocking=True) 108 | self.next_target = self.next_target.cuda(non_blocking=True) 109 | # more code for the alternative if record_stream() doesn't work: 110 | # copy_ will record the use of the pinned source tensor in this side stream. 111 | # self.next_input_gpu.copy_(self.next_input, non_blocking=True) 112 | # self.next_target_gpu.copy_(self.next_target, non_blocking=True) 113 | # self.next_input = self.next_input_gpu 114 | # self.next_target = self.next_target_gpu 115 | 116 | # With Amp, it isn't necessary to manually convert data to half. 117 | # if args.fp16: 118 | # self.next_input = self.next_input.half() 119 | # else: 120 | self.next_input = self.next_input.float() 121 | self.next_input = self.next_input.sub_(self.mean).div_(self.std) 122 | 123 | def next(self): 124 | torch.cuda.current_stream().wait_stream(self.stream) 125 | input = self.next_input 126 | target = self.next_target 127 | if input is not None: 128 | input.record_stream(torch.cuda.current_stream()) 129 | if target is not None: 130 | target.record_stream(torch.cuda.current_stream()) 131 | self.preload() 132 | return input, target 133 | 134 | 135 | def main(): 136 | args = parser.parse_args() 137 | 138 | if args.seed is not None: 139 | random.seed(args.seed) 140 | torch.manual_seed(args.seed) 141 | cudnn.deterministic = True 142 | warnings.warn('You have chosen to seed training. ' 143 | 'This will turn on the CUDNN deterministic setting, ' 144 | 'which can slow down your training considerably! 
' 145 | 'You may see unexpected behavior when restarting ' 146 | 'from checkpoints.') 147 | 148 | main_worker(args.local_rank, 4, args) 149 | 150 | 151 | def main_worker(gpu, ngpus_per_node, args): 152 | global best_acc1 153 | 154 | dist.init_process_group(backend='nccl') 155 | # create model 156 | if args.pretrained: 157 | print("=> using pre-trained model '{}'".format(args.arch)) 158 | model = models.__dict__[args.arch](pretrained=True) 159 | else: 160 | print("=> creating model '{}'".format(args.arch)) 161 | model = models.__dict__[args.arch]() 162 | 163 | torch.cuda.set_device(gpu) 164 | model.cuda() 165 | # When using a single GPU per process and per 166 | # DistributedDataParallel, we need to divide the batch size 167 | # ourselves based on the total number of GPUs we have 168 | args.batch_size = int(args.batch_size / ngpus_per_node) 169 | 170 | # define loss function (criterion) and optimizer 171 | criterion = nn.CrossEntropyLoss().cuda() 172 | 173 | optimizer = torch.optim.SGD(model.parameters(), args.lr, momentum=args.momentum, weight_decay=args.weight_decay) 174 | 175 | model, optimizer = amp.initialize(model, 176 | optimizer) 177 | model = DistributedDataParallel(model) 178 | 179 | cudnn.benchmark = True 180 | 181 | # Data loading code 182 | traindir = os.path.join(args.data, 'train') 183 | valdir = os.path.join(args.data, 'val') 184 | normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]) 185 | 186 | train_dataset = datasets.ImageFolder( 187 | traindir, 188 | transforms.Compose([ 189 | transforms.RandomResizedCrop(224), 190 | transforms.RandomHorizontalFlip(), 191 | transforms.ToTensor(), 192 | normalize, 193 | ])) 194 | 195 | train_sampler = torch.utils.data.distributed.DistributedSampler(train_dataset) 196 | 197 | train_loader = torch.utils.data.DataLoader(train_dataset, 198 | batch_size=args.batch_size, 199 | shuffle=(train_sampler is None), 200 | num_workers=2, 201 | pin_memory=True, 202 | sampler=train_sampler) 203 | 204 | val_loader = torch.utils.data.DataLoader(datasets.ImageFolder( 205 | valdir, 206 | transforms.Compose([ 207 | transforms.Resize(256), 208 | transforms.CenterCrop(224), 209 | transforms.ToTensor(), 210 | normalize, 211 | ])), 212 | batch_size=args.batch_size, 213 | shuffle=False, 214 | num_workers=2, 215 | pin_memory=True) 216 | 217 | if args.evaluate: 218 | validate(val_loader, model, criterion, gpu, args) 219 | return 220 | 221 | log_csv = "apex_distributed.csv" 222 | 223 | for epoch in range(args.start_epoch, args.epochs): 224 | epoch_start = time.time() 225 | 226 | train_sampler.set_epoch(epoch) 227 | adjust_learning_rate(optimizer, epoch, args) 228 | 229 | # train for one epoch 230 | train(train_loader, model, criterion, optimizer, epoch, gpu, args) 231 | 232 | # evaluate on validation set 233 | acc1 = validate(val_loader, model, criterion, gpu, args) 234 | 235 | # remember best acc@1 and save checkpoint 236 | is_best = acc1 > best_acc1 237 | best_acc1 = max(acc1, best_acc1) 238 | 239 | epoch_end = time.time() 240 | 241 | with open(log_csv, 'a+') as f: 242 | csv_write = csv.writer(f) 243 | data_row = [time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(epoch_start)), epoch_end - epoch_start] 244 | csv_write.writerow(data_row) 245 | 246 | save_checkpoint( 247 | { 248 | 'epoch': epoch + 1, 249 | 'arch': args.arch, 250 | 'state_dict': model.module.state_dict(), 251 | 'best_acc1': best_acc1, 252 | }, is_best) 253 | 254 | 255 | def train(train_loader, model, criterion, optimizer, epoch, gpu, args): 256 | batch_time = 
AverageMeter('Time', ':6.3f') 257 | data_time = AverageMeter('Data', ':6.3f') 258 | losses = AverageMeter('Loss', ':.4e') 259 | top1 = AverageMeter('Acc@1', ':6.2f') 260 | top5 = AverageMeter('Acc@5', ':6.2f') 261 | progress = ProgressMeter(len(train_loader), [batch_time, data_time, losses, top1, top5], 262 | prefix="Epoch: [{}]".format(epoch)) 263 | 264 | # switch to train mode 265 | model.train() 266 | 267 | end = time.time() 268 | prefetcher = data_prefetcher(train_loader) 269 | images, target = prefetcher.next() 270 | i = 0 271 | while images is not None: 272 | # measure data loading time 273 | data_time.update(time.time() - end) 274 | 275 | # compute output 276 | output = model(images) 277 | loss = criterion(output, target) 278 | 279 | # measure accuracy and record loss 280 | acc1, acc5 = accuracy(output, target, topk=(1, 5)) 281 | losses.update(loss.item(), images.size(0)) 282 | top1.update(acc1[0], images.size(0)) 283 | top5.update(acc5[0], images.size(0)) 284 | 285 | # compute gradient and do SGD step 286 | optimizer.zero_grad() 287 | with amp.scale_loss(loss, optimizer) as scaled_loss: 288 | scaled_loss.backward() 289 | optimizer.step() 290 | 291 | # measure elapsed time 292 | batch_time.update(time.time() - end) 293 | end = time.time() 294 | 295 | if i % args.print_freq == 0: 296 | progress.display(i) 297 | 298 | i += 1 299 | 300 | images, target = prefetcher.next() 301 | 302 | 303 | def validate(val_loader, model, criterion, gpu, args): 304 | batch_time = AverageMeter('Time', ':6.3f') 305 | losses = AverageMeter('Loss', ':.4e') 306 | top1 = AverageMeter('Acc@1', ':6.2f') 307 | top5 = AverageMeter('Acc@5', ':6.2f') 308 | progress = ProgressMeter(len(val_loader), [batch_time, losses, top1, top5], prefix='Test: ') 309 | 310 | # switch to evaluate mode 311 | model.eval() 312 | 313 | with torch.no_grad(): 314 | end = time.time() 315 | prefetcher = data_prefetcher(val_loader) 316 | images, target = prefetcher.next() 317 | i = 0 318 | while images is not None: 319 | 320 | # compute output 321 | output = model(images) 322 | loss = criterion(output, target) 323 | 324 | # measure accuracy and record loss 325 | acc1, acc5 = accuracy(output, target, topk=(1, 5)) 326 | losses.update(loss.item(), images.size(0)) 327 | top1.update(acc1[0], images.size(0)) 328 | top5.update(acc5[0], images.size(0)) 329 | 330 | # measure elapsed time 331 | batch_time.update(time.time() - end) 332 | end = time.time() 333 | 334 | if i % args.print_freq == 0: 335 | progress.display(i) 336 | 337 | i += 1 338 | 339 | images, target = prefetcher.next() 340 | 341 | # TODO: this should also be done with the ProgressMeter 342 | print(' * Acc@1 {top1.avg:.3f} Acc@5 {top5.avg:.3f}'.format(top1=top1, top5=top5)) 343 | 344 | return top1.avg 345 | 346 | 347 | def save_checkpoint(state, is_best, filename='checkpoint.pth.tar'): 348 | torch.save(state, filename) 349 | if is_best: 350 | shutil.copyfile(filename, 'model_best.pth.tar') 351 | 352 | 353 | class AverageMeter(object): 354 | """Computes and stores the average and current value""" 355 | def __init__(self, name, fmt=':f'): 356 | self.name = name 357 | self.fmt = fmt 358 | self.reset() 359 | 360 | def reset(self): 361 | self.val = 0 362 | self.avg = 0 363 | self.sum = 0 364 | self.count = 0 365 | 366 | def update(self, val, n=1): 367 | self.val = val 368 | self.sum += val * n 369 | self.count += n 370 | self.avg = self.sum / self.count 371 | 372 | def __str__(self): 373 | fmtstr = '{name} {val' + self.fmt + '} ({avg' + self.fmt + '})' 374 | return 
fmtstr.format(**self.__dict__) 375 | 376 | 377 | class ProgressMeter(object): 378 | def __init__(self, num_batches, meters, prefix=""): 379 | self.batch_fmtstr = self._get_batch_fmtstr(num_batches) 380 | self.meters = meters 381 | self.prefix = prefix 382 | 383 | def display(self, batch): 384 | entries = [self.prefix + self.batch_fmtstr.format(batch)] 385 | entries += [str(meter) for meter in self.meters] 386 | print('\t'.join(entries)) 387 | 388 | def _get_batch_fmtstr(self, num_batches): 389 | num_digits = len(str(num_batches // 1)) 390 | fmt = '{:' + str(num_digits) + 'd}' 391 | return '[' + fmt + '/' + fmt.format(num_batches) + ']' 392 | 393 | 394 | def adjust_learning_rate(optimizer, epoch, args): 395 | """Sets the learning rate to the initial LR decayed by 10 every 30 epochs""" 396 | lr = args.lr * (0.1**(epoch // 30)) 397 | for param_group in optimizer.param_groups: 398 | param_group['lr'] = lr 399 | 400 | 401 | def accuracy(output, target, topk=(1, )): 402 | """Computes the accuracy over the k top predictions for the specified values of k""" 403 | with torch.no_grad(): 404 | maxk = max(topk) 405 | batch_size = target.size(0) 406 | 407 | _, pred = output.topk(maxk, 1, True, True) 408 | pred = pred.t() 409 | correct = pred.eq(target.view(1, -1).expand_as(pred)) 410 | 411 | res = [] 412 | for k in topk: 413 | correct_k = correct[:k].view(-1).float().sum(0, keepdim=True) 414 | res.append(correct_k.mul_(100.0 / batch_size)) 415 | return res 416 | 417 | 418 | if __name__ == '__main__': 419 | main() -------------------------------------------------------------------------------- /appendix/production/distributed/pytorch-distributed-master/dataparallel.py: -------------------------------------------------------------------------------- 1 | import csv 2 | 3 | import argparse 4 | import os 5 | import random 6 | import shutil 7 | import time 8 | import warnings 9 | 10 | import torch 11 | import torch.nn as nn 12 | import torch.nn.parallel 13 | import torch.backends.cudnn as cudnn 14 | import torch.distributed as dist 15 | import torch.optim 16 | import torch.multiprocessing as mp 17 | import torch.utils.data 18 | import torch.utils.data.distributed 19 | import torchvision.transforms as transforms 20 | import torchvision.datasets as datasets 21 | import torchvision.models as models 22 | 23 | model_names = sorted(name for name in models.__dict__ 24 | if name.islower() and not name.startswith("__") and callable(models.__dict__[name])) 25 | 26 | parser = argparse.ArgumentParser(description='PyTorch ImageNet Training') 27 | parser.add_argument('--data', metavar='DIR', default='/home/zhangzhi/Data/ImageNet2012', help='path to dataset') 28 | parser.add_argument('-a', 29 | '--arch', 30 | metavar='ARCH', 31 | default='resnet18', 32 | choices=model_names, 33 | help='model architecture: ' + ' | '.join(model_names) + ' (default: resnet18)') 34 | parser.add_argument('-j', 35 | '--workers', 36 | default=4, 37 | type=int, 38 | metavar='N', 39 | help='number of data loading workers (default: 4)') 40 | parser.add_argument('--epochs', default=90, type=int, metavar='N', help='number of total epochs to run') 41 | parser.add_argument('--start-epoch', default=0, type=int, metavar='N', help='manual epoch number (useful on restarts)') 42 | parser.add_argument('-b', 43 | '--batch-size', 44 | default=3200, 45 | type=int, 46 | metavar='N', 47 | help='mini-batch size (default: 3200), this is the total ' 48 | 'batch size of all GPUs on the current node when ' 49 | 'using Data Parallel or Distributed Data Parallel') 50 | 
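# Heuristic note: the default total batch size below is 3200, while the default
# lr of 0.1 was originally tuned for a batch of 256. The linear-scaling rule of
# thumb (Goyal et al., 2017) suggests lr ~= 0.1 * batch_size / 256 for large
# batches; treat this as a starting point rather than a guarantee.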
parser.add_argument('--lr',
51 | '--learning-rate',
52 | default=0.1,
53 | type=float,
54 | metavar='LR',
55 | help='initial learning rate',
56 | dest='lr')
57 | parser.add_argument('--momentum', default=0.9, type=float, metavar='M', help='momentum')
58 | parser.add_argument('--wd',
59 | '--weight-decay',
60 | default=1e-4,
61 | type=float,
62 | metavar='W',
63 | help='weight decay (default: 1e-4)',
64 | dest='weight_decay')
65 | parser.add_argument('-p', '--print-freq', default=10, type=int, metavar='N', help='print frequency (default: 10)')
66 | parser.add_argument('-e', '--evaluate', dest='evaluate', action='store_true', help='evaluate model on validation set')
67 | parser.add_argument('--pretrained', dest='pretrained', action='store_true', help='use pre-trained model')
68 | parser.add_argument('--seed', default=None, type=int, help='seed for initializing training. ')
69 |
70 | best_acc1 = 0
71 |
72 |
73 | def main():
74 | args = parser.parse_args()
75 |
76 | if args.seed is not None:
77 | random.seed(args.seed)
78 | torch.manual_seed(args.seed)
79 | cudnn.deterministic = True
80 | warnings.warn('You have chosen to seed training. '
81 | 'This will turn on the CUDNN deterministic setting, '
82 | 'which can slow down your training considerably! '
83 | 'You may see unexpected behavior when restarting '
84 | 'from checkpoints.')
85 |
86 | gpus = [0, 1, 2, 3]
87 | main_worker(gpus=gpus, args=args)
88 |
89 |
90 | def main_worker(gpus, args):
91 | global best_acc1
92 |
93 | # create model
94 | if args.pretrained:
95 | print("=> using pre-trained model '{}'".format(args.arch))
96 | model = models.__dict__[args.arch](pretrained=True)
97 | else:
98 | print("=> creating model '{}'".format(args.arch))
99 | model = models.__dict__[args.arch]()
100 |
101 | torch.cuda.set_device('cuda:{}'.format(gpus[0]))
102 | model.cuda()
103 | # DataParallel keeps a single process that scatters each mini-batch
104 | # across the listed GPUs, so args.batch_size stays the total batch
105 | # size for the node and no manual division is needed here
106 | model = nn.DataParallel(model, device_ids=gpus, output_device=gpus[0])
107 |
108 | # define loss function (criterion) and optimizer
109 | criterion = nn.CrossEntropyLoss()
110 |
111 | optimizer = torch.optim.SGD(model.parameters(), args.lr, momentum=args.momentum, weight_decay=args.weight_decay)
112 |
113 | cudnn.benchmark = True
114 |
115 | # Data loading code
116 | traindir = os.path.join(args.data, 'train')
117 | valdir = os.path.join(args.data, 'val')
118 | normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
119 |
120 | train_dataset = datasets.ImageFolder(
121 | traindir,
122 | transforms.Compose([
123 | transforms.RandomResizedCrop(224),
124 | transforms.RandomHorizontalFlip(),
125 | transforms.ToTensor(),
126 | normalize,
127 | ]))
128 |
129 | train_loader = torch.utils.data.DataLoader(train_dataset,
130 | batch_size=args.batch_size,
131 | shuffle=True,
132 | num_workers=2,
133 | pin_memory=True)
134 |
135 | val_loader = torch.utils.data.DataLoader(datasets.ImageFolder(
136 | valdir,
137 | transforms.Compose([
138 | transforms.Resize(256),
139 | transforms.CenterCrop(224),
140 | transforms.ToTensor(),
141 | normalize,
142 | ])),
143 | batch_size=args.batch_size,
144 | shuffle=False,
145 | num_workers=2,
146 | pin_memory=True)
147 |
148 | if args.evaluate:
149 | validate(val_loader, model, criterion, args)
150 | return
151 |
152 | log_csv = "dataparallel.csv"
153 |
154 | for epoch in range(args.start_epoch, args.epochs):
155 | epoch_start = time.time()
156 | 
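# Unlike the DistributedDataParallel variants in this folder, no DistributedSampler
# is used here, so there is no train_sampler.set_epoch(epoch) call: the DataLoader
# above was built with shuffle=True and reshuffles the dataset itself every epoch.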
157 | adjust_learning_rate(optimizer, epoch, args) 158 | 159 | # train for one epoch 160 | train(train_loader, model, criterion, optimizer, epoch, args) 161 | 162 | # evaluate on validation set 163 | acc1 = validate(val_loader, model, criterion, args) 164 | 165 | # remember best acc@1 and save checkpoint 166 | is_best = acc1 > best_acc1 167 | best_acc1 = max(acc1, best_acc1) 168 | 169 | epoch_end = time.time() 170 | 171 | with open(log_csv, 'a+') as f: 172 | csv_write = csv.writer(f) 173 | data_row = [time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(epoch_start)), epoch_end - epoch_start] 174 | csv_write.writerow(data_row) 175 | 176 | save_checkpoint( 177 | { 178 | 'epoch': epoch + 1, 179 | 'arch': args.arch, 180 | 'state_dict': model.module.state_dict(), 181 | 'best_acc1': best_acc1, 182 | }, is_best) 183 | 184 | 185 | def train(train_loader, model, criterion, optimizer, epoch, args): 186 | batch_time = AverageMeter('Time', ':6.3f') 187 | data_time = AverageMeter('Data', ':6.3f') 188 | losses = AverageMeter('Loss', ':.4e') 189 | top1 = AverageMeter('Acc@1', ':6.2f') 190 | top5 = AverageMeter('Acc@5', ':6.2f') 191 | progress = ProgressMeter(len(train_loader), [batch_time, data_time, losses, top1, top5], 192 | prefix="Epoch: [{}]".format(epoch)) 193 | 194 | # switch to train mode 195 | model.train() 196 | 197 | end = time.time() 198 | for i, (images, target) in enumerate(train_loader): 199 | # measure data loading time 200 | data_time.update(time.time() - end) 201 | 202 | images = images.cuda(non_blocking=True) 203 | target = target.cuda(non_blocking=True) 204 | 205 | # compute output 206 | output = model(images) 207 | loss = criterion(output, target) 208 | 209 | # measure accuracy and record loss 210 | acc1, acc5 = accuracy(output, target, topk=(1, 5)) 211 | losses.update(loss.item(), images.size(0)) 212 | top1.update(acc1[0], images.size(0)) 213 | top5.update(acc5[0], images.size(0)) 214 | 215 | # compute gradient and do SGD step 216 | optimizer.zero_grad() 217 | loss.backward() 218 | optimizer.step() 219 | 220 | # measure elapsed time 221 | batch_time.update(time.time() - end) 222 | end = time.time() 223 | 224 | if i % args.print_freq == 0: 225 | progress.display(i) 226 | 227 | 228 | def validate(val_loader, model, criterion, args): 229 | batch_time = AverageMeter('Time', ':6.3f') 230 | losses = AverageMeter('Loss', ':.4e') 231 | top1 = AverageMeter('Acc@1', ':6.2f') 232 | top5 = AverageMeter('Acc@5', ':6.2f') 233 | progress = ProgressMeter(len(val_loader), [batch_time, losses, top1, top5], prefix='Test: ') 234 | 235 | # switch to evaluate mode 236 | model.eval() 237 | 238 | with torch.no_grad(): 239 | end = time.time() 240 | for i, (images, target) in enumerate(val_loader): 241 | images = images.cuda(non_blocking=True) 242 | target = target.cuda(non_blocking=True) 243 | 244 | # compute output 245 | output = model(images) 246 | loss = criterion(output, target) 247 | 248 | # measure accuracy and record loss 249 | acc1, acc5 = accuracy(output, target, topk=(1, 5)) 250 | losses.update(loss.item(), images.size(0)) 251 | top1.update(acc1[0], images.size(0)) 252 | top5.update(acc5[0], images.size(0)) 253 | 254 | # measure elapsed time 255 | batch_time.update(time.time() - end) 256 | end = time.time() 257 | 258 | if i % args.print_freq == 0: 259 | progress.display(i) 260 | 261 | # TODO: this should also be done with the ProgressMeter 262 | print(' * Acc@1 {top1.avg:.3f} Acc@5 {top5.avg:.3f}'.format(top1=top1, top5=top5)) 263 | 264 | return top1.avg 265 | 266 | 267 | def save_checkpoint(state, 
is_best, filename='checkpoint.pth.tar'): 268 | torch.save(state, filename) 269 | if is_best: 270 | shutil.copyfile(filename, 'model_best.pth.tar') 271 | 272 | 273 | class AverageMeter(object): 274 | """Computes and stores the average and current value""" 275 | def __init__(self, name, fmt=':f'): 276 | self.name = name 277 | self.fmt = fmt 278 | self.reset() 279 | 280 | def reset(self): 281 | self.val = 0 282 | self.avg = 0 283 | self.sum = 0 284 | self.count = 0 285 | 286 | def update(self, val, n=1): 287 | self.val = val 288 | self.sum += val * n 289 | self.count += n 290 | self.avg = self.sum / self.count 291 | 292 | def __str__(self): 293 | fmtstr = '{name} {val' + self.fmt + '} ({avg' + self.fmt + '})' 294 | return fmtstr.format(**self.__dict__) 295 | 296 | 297 | class ProgressMeter(object): 298 | def __init__(self, num_batches, meters, prefix=""): 299 | self.batch_fmtstr = self._get_batch_fmtstr(num_batches) 300 | self.meters = meters 301 | self.prefix = prefix 302 | 303 | def display(self, batch): 304 | entries = [self.prefix + self.batch_fmtstr.format(batch)] 305 | entries += [str(meter) for meter in self.meters] 306 | print('\t'.join(entries)) 307 | 308 | def _get_batch_fmtstr(self, num_batches): 309 | num_digits = len(str(num_batches // 1)) 310 | fmt = '{:' + str(num_digits) + 'd}' 311 | return '[' + fmt + '/' + fmt.format(num_batches) + ']' 312 | 313 | 314 | def adjust_learning_rate(optimizer, epoch, args): 315 | """Sets the learning rate to the initial LR decayed by 10 every 30 epochs""" 316 | lr = args.lr * (0.1**(epoch // 30)) 317 | for param_group in optimizer.param_groups: 318 | param_group['lr'] = lr 319 | 320 | 321 | def accuracy(output, target, topk=(1, )): 322 | """Computes the accuracy over the k top predictions for the specified values of k""" 323 | with torch.no_grad(): 324 | maxk = max(topk) 325 | batch_size = target.size(0) 326 | 327 | _, pred = output.topk(maxk, 1, True, True) 328 | pred = pred.t() 329 | correct = pred.eq(target.view(1, -1).expand_as(pred)) 330 | 331 | res = [] 332 | for k in topk: 333 | correct_k = correct[:k].view(-1).float().sum(0, keepdim=True) 334 | res.append(correct_k.mul_(100.0 / batch_size)) 335 | return res 336 | 337 | 338 | if __name__ == '__main__': 339 | main() -------------------------------------------------------------------------------- /appendix/production/distributed/pytorch-distributed-master/distributed.py: -------------------------------------------------------------------------------- 1 | import csv 2 | 3 | import argparse 4 | import os 5 | import random 6 | import shutil 7 | import time 8 | import warnings 9 | 10 | import torch 11 | import torch.nn as nn 12 | import torch.nn.parallel 13 | import torch.backends.cudnn as cudnn 14 | import torch.distributed as dist 15 | import torch.optim 16 | import torch.multiprocessing as mp 17 | import torch.utils.data 18 | import torch.utils.data.distributed 19 | import torchvision.transforms as transforms 20 | import torchvision.datasets as datasets 21 | import torchvision.models as models 22 | 23 | model_names = sorted(name for name in models.__dict__ 24 | if name.islower() and not name.startswith("__") and callable(models.__dict__[name])) 25 | 26 | parser = argparse.ArgumentParser(description='PyTorch ImageNet Training') 27 | parser.add_argument('--data', metavar='DIR', default='/home/zhangzhi/Data/ImageNet2012', help='path to dataset') 28 | parser.add_argument('-a', 29 | '--arch', 30 | metavar='ARCH', 31 | default='resnet18', 32 | choices=model_names, 33 | help='model architecture: ' + 
' | '.join(model_names) + ' (default: resnet18)') 34 | parser.add_argument('-j', 35 | '--workers', 36 | default=4, 37 | type=int, 38 | metavar='N', 39 | help='number of data loading workers (default: 4)') 40 | parser.add_argument('--epochs', default=90, type=int, metavar='N', help='number of total epochs to run') 41 | parser.add_argument('--start-epoch', default=0, type=int, metavar='N', help='manual epoch number (useful on restarts)') 42 | parser.add_argument('-b', 43 | '--batch-size', 44 | default=3200, 45 | type=int, 46 | metavar='N', 47 | help='mini-batch size (default: 3200), this is the total ' 48 | 'batch size of all GPUs on the current node when ' 49 | 'using Data Parallel or Distributed Data Parallel') 50 | parser.add_argument('--lr', 51 | '--learning-rate', 52 | default=0.1, 53 | type=float, 54 | metavar='LR', 55 | help='initial learning rate', 56 | dest='lr') 57 | parser.add_argument('--momentum', default=0.9, type=float, metavar='M', help='momentum') 58 | parser.add_argument('--local_rank', default=-1, type=int, 59 | help='node rank for distributed training') 60 | parser.add_argument('--wd', 61 | '--weight-decay', 62 | default=1e-4, 63 | type=float, 64 | metavar='W', 65 | help='weight decay (default: 1e-4)', 66 | dest='weight_decay') 67 | parser.add_argument('-p', '--print-freq', default=10, type=int, metavar='N', help='print frequency (default: 10)') 68 | parser.add_argument('-e', '--evaluate', dest='evaluate', action='store_true', help='evaluate model on validation set') 69 | parser.add_argument('--pretrained', dest='pretrained', action='store_true', help='use pre-trained model') 70 | parser.add_argument('--seed', default=None, type=int, help='seed for initializing training. ') 71 | 72 | best_acc1 = 0 73 | 74 | 75 | def main(): 76 | args = parser.parse_args() 77 | 78 | if args.seed is not None: 79 | random.seed(args.seed) 80 | torch.manual_seed(args.seed) 81 | cudnn.deterministic = True 82 | warnings.warn('You have chosen to seed training. ' 83 | 'This will turn on the CUDNN deterministic setting, ' 84 | 'which can slow down your training considerably! 
' 85 | 'You may see unexpected behavior when restarting ' 86 | 'from checkpoints.') 87 | 88 | main_worker(args.local_rank, 4, args) 89 | 90 | 91 | def main_worker(gpu, ngpus_per_node, args): 92 | global best_acc1 93 | 94 | dist.init_process_group(backend='nccl') 95 | # create model 96 | if args.pretrained: 97 | print("=> using pre-trained model '{}'".format(args.arch)) 98 | model = models.__dict__[args.arch](pretrained=True) 99 | else: 100 | print("=> creating model '{}'".format(args.arch)) 101 | model = models.__dict__[args.arch]() 102 | 103 | torch.cuda.set_device(gpu) 104 | model.cuda(gpu) 105 | # When using a single GPU per process and per 106 | # DistributedDataParallel, we need to divide the batch size 107 | # ourselves based on the total number of GPUs we have 108 | args.batch_size = int(args.batch_size / ngpus_per_node) 109 | model = torch.nn.parallel.DistributedDataParallel(model, device_ids=[gpu]) 110 | 111 | # define loss function (criterion) and optimizer 112 | criterion = nn.CrossEntropyLoss().cuda(gpu) 113 | 114 | optimizer = torch.optim.SGD(model.parameters(), args.lr, momentum=args.momentum, weight_decay=args.weight_decay) 115 | 116 | cudnn.benchmark = True 117 | 118 | # Data loading code 119 | traindir = os.path.join(args.data, 'train') 120 | valdir = os.path.join(args.data, 'val') 121 | normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]) 122 | 123 | train_dataset = datasets.ImageFolder( 124 | traindir, 125 | transforms.Compose([ 126 | transforms.RandomResizedCrop(224), 127 | transforms.RandomHorizontalFlip(), 128 | transforms.ToTensor(), 129 | normalize, 130 | ])) 131 | 132 | train_sampler = torch.utils.data.distributed.DistributedSampler(train_dataset) 133 | 134 | train_loader = torch.utils.data.DataLoader(train_dataset, 135 | batch_size=args.batch_size, 136 | shuffle=(train_sampler is None), 137 | num_workers=2, 138 | pin_memory=True, 139 | sampler=train_sampler) 140 | 141 | val_loader = torch.utils.data.DataLoader(datasets.ImageFolder( 142 | valdir, 143 | transforms.Compose([ 144 | transforms.Resize(256), 145 | transforms.CenterCrop(224), 146 | transforms.ToTensor(), 147 | normalize, 148 | ])), 149 | batch_size=args.batch_size, 150 | shuffle=False, 151 | num_workers=2, 152 | pin_memory=True) 153 | 154 | if args.evaluate: 155 | validate(val_loader, model, criterion, gpu, args) 156 | return 157 | 158 | log_csv = "distributed.csv" 159 | 160 | for epoch in range(args.start_epoch, args.epochs): 161 | epoch_start = time.time() 162 | 163 | train_sampler.set_epoch(epoch) 164 | adjust_learning_rate(optimizer, epoch, args) 165 | 166 | # train for one epoch 167 | train(train_loader, model, criterion, optimizer, epoch, gpu, args) 168 | 169 | # evaluate on validation set 170 | acc1 = validate(val_loader, model, criterion, gpu, args) 171 | 172 | # remember best acc@1 and save checkpoint 173 | is_best = acc1 > best_acc1 174 | best_acc1 = max(acc1, best_acc1) 175 | 176 | epoch_end = time.time() 177 | 178 | with open(log_csv, 'a+') as f: 179 | csv_write = csv.writer(f) 180 | data_row = [time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(epoch_start)), epoch_end - epoch_start] 181 | csv_write.writerow(data_row) 182 | 183 | save_checkpoint( 184 | { 185 | 'epoch': epoch + 1, 186 | 'arch': args.arch, 187 | 'state_dict': model.module.state_dict(), 188 | 'best_acc1': best_acc1, 189 | }, is_best) 190 | 191 | 192 | def train(train_loader, model, criterion, optimizer, epoch, gpu, args): 193 | batch_time = AverageMeter('Time', ':6.3f') 194 | data_time = 
AverageMeter('Data', ':6.3f') 195 | losses = AverageMeter('Loss', ':.4e') 196 | top1 = AverageMeter('Acc@1', ':6.2f') 197 | top5 = AverageMeter('Acc@5', ':6.2f') 198 | progress = ProgressMeter(len(train_loader), [batch_time, data_time, losses, top1, top5], 199 | prefix="Epoch: [{}]".format(epoch)) 200 | 201 | # switch to train mode 202 | model.train() 203 | 204 | end = time.time() 205 | for i, (images, target) in enumerate(train_loader): 206 | # measure data loading time 207 | data_time.update(time.time() - end) 208 | 209 | images = images.cuda(gpu, non_blocking=True) 210 | target = target.cuda(gpu, non_blocking=True) 211 | 212 | # compute output 213 | output = model(images) 214 | loss = criterion(output, target) 215 | 216 | # measure accuracy and record loss 217 | acc1, acc5 = accuracy(output, target, topk=(1, 5)) 218 | losses.update(loss.item(), images.size(0)) 219 | top1.update(acc1[0], images.size(0)) 220 | top5.update(acc5[0], images.size(0)) 221 | 222 | # compute gradient and do SGD step 223 | optimizer.zero_grad() 224 | loss.backward() 225 | optimizer.step() 226 | 227 | # measure elapsed time 228 | batch_time.update(time.time() - end) 229 | end = time.time() 230 | 231 | if i % args.print_freq == 0: 232 | progress.display(i) 233 | 234 | 235 | def validate(val_loader, model, criterion, gpu, args): 236 | batch_time = AverageMeter('Time', ':6.3f') 237 | losses = AverageMeter('Loss', ':.4e') 238 | top1 = AverageMeter('Acc@1', ':6.2f') 239 | top5 = AverageMeter('Acc@5', ':6.2f') 240 | progress = ProgressMeter(len(val_loader), [batch_time, losses, top1, top5], prefix='Test: ') 241 | 242 | # switch to evaluate mode 243 | model.eval() 244 | 245 | with torch.no_grad(): 246 | end = time.time() 247 | for i, (images, target) in enumerate(val_loader): 248 | images = images.cuda(gpu, non_blocking=True) 249 | target = target.cuda(gpu, non_blocking=True) 250 | 251 | # compute output 252 | output = model(images) 253 | loss = criterion(output, target) 254 | 255 | # measure accuracy and record loss 256 | acc1, acc5 = accuracy(output, target, topk=(1, 5)) 257 | losses.update(loss.item(), images.size(0)) 258 | top1.update(acc1[0], images.size(0)) 259 | top5.update(acc5[0], images.size(0)) 260 | 261 | # measure elapsed time 262 | batch_time.update(time.time() - end) 263 | end = time.time() 264 | 265 | if i % args.print_freq == 0: 266 | progress.display(i) 267 | 268 | # TODO: this should also be done with the ProgressMeter 269 | print(' * Acc@1 {top1.avg:.3f} Acc@5 {top5.avg:.3f}'.format(top1=top1, top5=top5)) 270 | 271 | return top1.avg 272 | 273 | 274 | def save_checkpoint(state, is_best, filename='checkpoint.pth.tar'): 275 | torch.save(state, filename) 276 | if is_best: 277 | shutil.copyfile(filename, 'model_best.pth.tar') 278 | 279 | 280 | class AverageMeter(object): 281 | """Computes and stores the average and current value""" 282 | def __init__(self, name, fmt=':f'): 283 | self.name = name 284 | self.fmt = fmt 285 | self.reset() 286 | 287 | def reset(self): 288 | self.val = 0 289 | self.avg = 0 290 | self.sum = 0 291 | self.count = 0 292 | 293 | def update(self, val, n=1): 294 | self.val = val 295 | self.sum += val * n 296 | self.count += n 297 | self.avg = self.sum / self.count 298 | 299 | def __str__(self): 300 | fmtstr = '{name} {val' + self.fmt + '} ({avg' + self.fmt + '})' 301 | return fmtstr.format(**self.__dict__) 302 | 303 | 304 | class ProgressMeter(object): 305 | def __init__(self, num_batches, meters, prefix=""): 306 | self.batch_fmtstr = self._get_batch_fmtstr(num_batches) 307 | 
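# self.batch_fmtstr is a fixed-width '[current/total]' template; e.g. with
# num_batches=391 it renders progress markers such as '[  5/391]'.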
self.meters = meters 308 | self.prefix = prefix 309 | 310 | def display(self, batch): 311 | entries = [self.prefix + self.batch_fmtstr.format(batch)] 312 | entries += [str(meter) for meter in self.meters] 313 | print('\t'.join(entries)) 314 | 315 | def _get_batch_fmtstr(self, num_batches): 316 | num_digits = len(str(num_batches // 1)) 317 | fmt = '{:' + str(num_digits) + 'd}' 318 | return '[' + fmt + '/' + fmt.format(num_batches) + ']' 319 | 320 | 321 | def adjust_learning_rate(optimizer, epoch, args): 322 | """Sets the learning rate to the initial LR decayed by 10 every 30 epochs""" 323 | lr = args.lr * (0.1**(epoch // 30)) 324 | for param_group in optimizer.param_groups: 325 | param_group['lr'] = lr 326 | 327 | 328 | def accuracy(output, target, topk=(1, )): 329 | """Computes the accuracy over the k top predictions for the specified values of k""" 330 | with torch.no_grad(): 331 | maxk = max(topk) 332 | batch_size = target.size(0) 333 | 334 | _, pred = output.topk(maxk, 1, True, True) 335 | pred = pred.t() 336 | correct = pred.eq(target.view(1, -1).expand_as(pred)) 337 | 338 | res = [] 339 | for k in topk: 340 | correct_k = correct[:k].view(-1).float().sum(0, keepdim=True) 341 | res.append(correct_k.mul_(100.0 / batch_size)) 342 | return res 343 | 344 | 345 | if __name__ == '__main__': 346 | main() -------------------------------------------------------------------------------- /appendix/production/distributed/pytorch-distributed-master/distributed_slurm_main.py: -------------------------------------------------------------------------------- 1 | import os 2 | import csv 3 | import time 4 | import socket 5 | import random 6 | import shutil 7 | import argparse 8 | import warnings 9 | 10 | import torch 11 | import torch.optim 12 | import torch.nn as nn 13 | import torch.nn.parallel 14 | import torch.backends.cudnn as cudnn 15 | import torch.distributed as dist 16 | import torch.multiprocessing as mp 17 | import torch.utils.data 18 | import torch.utils.data.distributed 19 | 20 | import torchvision.transforms as transforms 21 | import torchvision.datasets as datasets 22 | import torchvision.models as models 23 | 24 | model_names = sorted(name for name in models.__dict__ 25 | if name.islower() and not name.startswith("__") and callable(models.__dict__[name])) 26 | 27 | parser = argparse.ArgumentParser(description='PyTorch ImageNet Training') 28 | parser.add_argument('--data', metavar='DIR', default='/home/zhangzhi/Data/exports/ImageNet2012', help='path to dataset') 29 | parser.add_argument('-a', 30 | '--arch', 31 | metavar='ARCH', 32 | default='resnet18', 33 | choices=model_names, 34 | help='model architecture: ' + ' | '.join(model_names) + ' (default: resnet18)') 35 | parser.add_argument('-j', 36 | '--workers', 37 | default=4, 38 | type=int, 39 | metavar='N', 40 | help='number of data loading workers (default: 4)') 41 | parser.add_argument('--epochs', default=90, type=int, metavar='N', help='number of total epochs to run') 42 | parser.add_argument('--start-epoch', default=0, type=int, metavar='N', help='manual epoch number (useful on restarts)') 43 | parser.add_argument('-b', 44 | '--batch-size', 45 | default=400, 46 | type=int, 47 | metavar='N', 48 | help='mini-batch size (default: 3200), this is the total ' 49 | 'batch size of all GPUs on the current node when ' 50 | 'using Data Parallel or Distributed Data Parallel') 51 | parser.add_argument('--lr', 52 | '--learning-rate', 53 | default=0.1, 54 | type=float, 55 | metavar='LR', 56 | help='initial learning rate', 57 | dest='lr') 58 | 
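# Note: this SLURM variant takes no --local_rank argument. main() below derives
# the node rank and world size from the SLURM_PROCID and SLURM_NPROCS environment
# variables set by srun, and the processes rendezvous through the shared
# file:// URL built from --dist-file and SLURM_JOBID.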
parser.add_argument('--momentum', default=0.9, type=float, metavar='M', help='momentum') 59 | parser.add_argument('--wd', 60 | '--weight-decay', 61 | default=1e-4, 62 | type=float, 63 | metavar='W', 64 | help='weight decay (default: 1e-4)', 65 | dest='weight_decay') 66 | parser.add_argument('-p', '--print-freq', default=10, type=int, metavar='N', help='print frequency (default: 10)') 67 | parser.add_argument('-e', '--evaluate', dest='evaluate', action='store_true', help='evaluate model on validation set') 68 | parser.add_argument('--pretrained', dest='pretrained', action='store_true', help='use pre-trained model') 69 | parser.add_argument('--seed', default=None, type=int, help='seed for initializing training. ') 70 | parser.add_argument('--dist-file', default=None, type=str, help='file used to initial distributed training') 71 | 72 | best_acc1 = 0 73 | 74 | 75 | def main(): 76 | args = parser.parse_args() 77 | 78 | if args.seed is not None: 79 | random.seed(args.seed) 80 | torch.manual_seed(args.seed) 81 | cudnn.deterministic = True 82 | # torch.backends.cudnn.enabled = False 83 | warnings.warn('You have chosen to seed training. ' 84 | 'This will turn on the CUDNN deterministic setting, ' 85 | 'which can slow down your training considerably! ' 86 | 'You may see unexpected behavior when restarting ' 87 | 'from checkpoints.') 88 | 89 | args.local_rank = int(os.environ["SLURM_PROCID"]) 90 | args.world_size = int(os.environ["SLURM_NPROCS"]) 91 | ngpus_per_node = torch.cuda.device_count() 92 | 93 | job_id = os.environ["SLURM_JOBID"] 94 | args.dist_url = "file://{}.{}".format(os.path.realpath(args.dist_file), job_id) 95 | mp.spawn(main_worker, nprocs=ngpus_per_node, args=(ngpus_per_node, args)) 96 | 97 | 98 | def main_worker(gpu, ngpus_per_node, args): 99 | global best_acc1 100 | rank = args.local_rank * ngpus_per_node + gpu 101 | dist.init_process_group(backend='nccl', init_method=args.dist_url, world_size=args.world_size, rank=rank) 102 | # create model 103 | if args.pretrained: 104 | print("=> using pre-trained model '{}'".format(args.arch)) 105 | model = models.__dict__[args.arch](pretrained=True) 106 | else: 107 | print("=> creating model '{}'".format(args.arch)) 108 | model = models.__dict__[args.arch]() 109 | 110 | torch.cuda.set_device(gpu) 111 | model.cuda(gpu) 112 | # When using a single GPU per process and per 113 | # DistributedDataParallel, we need to divide the batch size 114 | # ourselves based on the total number of GPUs we have 115 | args.batch_size = int(args.batch_size / ngpus_per_node) 116 | model = torch.nn.parallel.DistributedDataParallel(model, device_ids=[gpu]) 117 | 118 | # define loss function (criterion) and optimizer 119 | criterion = nn.CrossEntropyLoss().cuda(gpu) 120 | 121 | optimizer = torch.optim.SGD(model.parameters(), args.lr, momentum=args.momentum, weight_decay=args.weight_decay) 122 | 123 | cudnn.benchmark = True 124 | 125 | # Data loading code 126 | traindir = os.path.join(args.data, 'train') 127 | valdir = os.path.join(args.data, 'val') 128 | normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]) 129 | 130 | train_dataset = datasets.ImageFolder( 131 | traindir, 132 | transforms.Compose([ 133 | transforms.RandomResizedCrop(224), 134 | transforms.RandomHorizontalFlip(), 135 | transforms.ToTensor(), 136 | normalize, 137 | ])) 138 | 139 | train_sampler = torch.utils.data.distributed.DistributedSampler(train_dataset) 140 | 141 | train_loader = torch.utils.data.DataLoader(train_dataset, 142 | batch_size=args.batch_size, 143 | 
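# A DataLoader may not combine shuffle=True with an explicit sampler, so the
# expression below enables shuffling only when no DistributedSampler was created
# (train_sampler is always set in this script, hence shuffle ends up False).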
shuffle=(train_sampler is None), 144 | num_workers=2, 145 | pin_memory=True, 146 | sampler=train_sampler) 147 | 148 | val_loader = torch.utils.data.DataLoader(datasets.ImageFolder( 149 | valdir, 150 | transforms.Compose([ 151 | transforms.Resize(256), 152 | transforms.CenterCrop(224), 153 | transforms.ToTensor(), 154 | normalize, 155 | ])), 156 | batch_size=args.batch_size, 157 | shuffle=False, 158 | num_workers=2, 159 | pin_memory=True) 160 | 161 | if args.evaluate: 162 | validate(val_loader, model, criterion, gpu, args) 163 | return 164 | 165 | log_csv = "distributed.csv" 166 | 167 | for epoch in range(args.start_epoch, args.epochs): 168 | epoch_start = time.time() 169 | 170 | train_sampler.set_epoch(epoch) 171 | adjust_learning_rate(optimizer, epoch, args) 172 | 173 | # train for one epoch 174 | train(train_loader, model, criterion, optimizer, epoch, gpu, args) 175 | 176 | # evaluate on validation set 177 | acc1 = validate(val_loader, model, criterion, gpu, args) 178 | 179 | # remember best acc@1 and save checkpoint 180 | is_best = acc1 > best_acc1 181 | best_acc1 = max(acc1, best_acc1) 182 | 183 | epoch_end = time.time() 184 | 185 | with open(log_csv, 'a+') as f: 186 | csv_write = csv.writer(f) 187 | data_row = [time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(epoch_start)), epoch_end - epoch_start] 188 | csv_write.writerow(data_row) 189 | 190 | save_checkpoint( 191 | { 192 | 'epoch': epoch + 1, 193 | 'arch': args.arch, 194 | 'state_dict': model.module.state_dict(), 195 | 'best_acc1': best_acc1, 196 | }, is_best) 197 | 198 | 199 | def train(train_loader, model, criterion, optimizer, epoch, gpu, args): 200 | batch_time = AverageMeter('Time', ':6.3f') 201 | data_time = AverageMeter('Data', ':6.3f') 202 | losses = AverageMeter('Loss', ':.4e') 203 | top1 = AverageMeter('Acc@1', ':6.2f') 204 | top5 = AverageMeter('Acc@5', ':6.2f') 205 | progress = ProgressMeter(len(train_loader), [batch_time, data_time, losses, top1, top5], 206 | prefix="Epoch: [{}]".format(epoch)) 207 | 208 | # switch to train mode 209 | model.train() 210 | 211 | end = time.time() 212 | for i, (images, target) in enumerate(train_loader): 213 | # measure data loading time 214 | data_time.update(time.time() - end) 215 | 216 | images = images.cuda(gpu, non_blocking=True) 217 | target = target.cuda(gpu, non_blocking=True) 218 | 219 | # compute output 220 | output = model(images) 221 | loss = criterion(output, target) 222 | 223 | # measure accuracy and record loss 224 | acc1, acc5 = accuracy(output, target, topk=(1, 5)) 225 | losses.update(loss.item(), images.size(0)) 226 | top1.update(acc1[0], images.size(0)) 227 | top5.update(acc5[0], images.size(0)) 228 | 229 | # compute gradient and do SGD step 230 | optimizer.zero_grad() 231 | loss.backward() 232 | optimizer.step() 233 | 234 | # measure elapsed time 235 | batch_time.update(time.time() - end) 236 | end = time.time() 237 | 238 | if i % args.print_freq == 0: 239 | progress.display(i) 240 | 241 | 242 | def validate(val_loader, model, criterion, gpu, args): 243 | batch_time = AverageMeter('Time', ':6.3f') 244 | losses = AverageMeter('Loss', ':.4e') 245 | top1 = AverageMeter('Acc@1', ':6.2f') 246 | top5 = AverageMeter('Acc@5', ':6.2f') 247 | progress = ProgressMeter(len(val_loader), [batch_time, losses, top1, top5], prefix='Test: ') 248 | 249 | # switch to evaluate mode 250 | model.eval() 251 | 252 | with torch.no_grad(): 253 | end = time.time() 254 | for i, (images, target) in enumerate(val_loader): 255 | images = images.cuda(gpu, non_blocking=True) 256 | target = 
target.cuda(gpu, non_blocking=True) 257 | 258 | # compute output 259 | output = model(images) 260 | loss = criterion(output, target) 261 | 262 | # measure accuracy and record loss 263 | acc1, acc5 = accuracy(output, target, topk=(1, 5)) 264 | losses.update(loss.item(), images.size(0)) 265 | top1.update(acc1[0], images.size(0)) 266 | top5.update(acc5[0], images.size(0)) 267 | 268 | # measure elapsed time 269 | batch_time.update(time.time() - end) 270 | end = time.time() 271 | 272 | if i % args.print_freq == 0: 273 | progress.display(i) 274 | 275 | # TODO: this should also be done with the ProgressMeter 276 | print(' * Acc@1 {top1.avg:.3f} Acc@5 {top5.avg:.3f}'.format(top1=top1, top5=top5)) 277 | 278 | return top1.avg 279 | 280 | 281 | def save_checkpoint(state, is_best, filename='checkpoint.pth.tar'): 282 | torch.save(state, filename) 283 | if is_best: 284 | shutil.copyfile(filename, 'model_best.pth.tar') 285 | 286 | 287 | class AverageMeter(object): 288 | """Computes and stores the average and current value""" 289 | def __init__(self, name, fmt=':f'): 290 | self.name = name 291 | self.fmt = fmt 292 | self.reset() 293 | 294 | def reset(self): 295 | self.val = 0 296 | self.avg = 0 297 | self.sum = 0 298 | self.count = 0 299 | 300 | def update(self, val, n=1): 301 | self.val = val 302 | self.sum += val * n 303 | self.count += n 304 | self.avg = self.sum / self.count 305 | 306 | def __str__(self): 307 | fmtstr = '{name} {val' + self.fmt + '} ({avg' + self.fmt + '})' 308 | return fmtstr.format(**self.__dict__) 309 | 310 | 311 | class ProgressMeter(object): 312 | def __init__(self, num_batches, meters, prefix=""): 313 | self.batch_fmtstr = self._get_batch_fmtstr(num_batches) 314 | self.meters = meters 315 | self.prefix = prefix 316 | 317 | def display(self, batch): 318 | entries = [self.prefix + self.batch_fmtstr.format(batch)] 319 | entries += [str(meter) for meter in self.meters] 320 | print('\t'.join(entries)) 321 | 322 | def _get_batch_fmtstr(self, num_batches): 323 | num_digits = len(str(num_batches // 1)) 324 | fmt = '{:' + str(num_digits) + 'd}' 325 | return '[' + fmt + '/' + fmt.format(num_batches) + ']' 326 | 327 | 328 | def adjust_learning_rate(optimizer, epoch, args): 329 | """Sets the learning rate to the initial LR decayed by 10 every 30 epochs""" 330 | lr = args.lr * (0.1**(epoch // 30)) 331 | for param_group in optimizer.param_groups: 332 | param_group['lr'] = lr 333 | 334 | 335 | def accuracy(output, target, topk=(1, )): 336 | """Computes the accuracy over the k top predictions for the specified values of k""" 337 | with torch.no_grad(): 338 | maxk = max(topk) 339 | batch_size = target.size(0) 340 | 341 | _, pred = output.topk(maxk, 1, True, True) 342 | pred = pred.t() 343 | correct = pred.eq(target.view(1, -1).expand_as(pred)) 344 | 345 | res = [] 346 | for k in topk: 347 | correct_k = correct[:k].view(-1).float().sum(0, keepdim=True) 348 | res.append(correct_k.mul_(100.0 / batch_size)) 349 | return res 350 | 351 | 352 | if __name__ == '__main__': 353 | main() -------------------------------------------------------------------------------- /appendix/production/distributed/pytorch-distributed-master/horovod_distributed.py: -------------------------------------------------------------------------------- 1 | import csv 2 | 3 | import argparse 4 | import os 5 | import random 6 | import shutil 7 | import time 8 | import warnings 9 | 10 | import torch 11 | import torch.nn as nn 12 | import torch.nn.parallel 13 | import torch.backends.cudnn as cudnn 14 | import torch.distributed as 
dist 15 | import torch.optim 16 | import torch.multiprocessing as mp 17 | import torch.utils.data 18 | import torch.utils.data.distributed 19 | import torchvision.transforms as transforms 20 | import torchvision.datasets as datasets 21 | import torchvision.models as models 22 | import horovod.torch as hvd 23 | 24 | model_names = sorted(name for name in models.__dict__ 25 | if name.islower() and not name.startswith("__") and callable(models.__dict__[name])) 26 | 27 | parser = argparse.ArgumentParser(description='PyTorch ImageNet Training') 28 | parser.add_argument('--data', metavar='DIR', default='/home/zhangzhi/Data/ImageNet2012', help='path to dataset') 29 | parser.add_argument('-a', 30 | '--arch', 31 | metavar='ARCH', 32 | default='resnet18', 33 | choices=model_names, 34 | help='model architecture: ' + ' | '.join(model_names) + ' (default: resnet18)') 35 | parser.add_argument('-j', 36 | '--workers', 37 | default=4, 38 | type=int, 39 | metavar='N', 40 | help='number of data loading workers (default: 4)') 41 | parser.add_argument('--epochs', default=90, type=int, metavar='N', help='number of total epochs to run') 42 | parser.add_argument('--start-epoch', default=0, type=int, metavar='N', help='manual epoch number (useful on restarts)') 43 | parser.add_argument('-b', 44 | '--batch-size', 45 | default=3200, 46 | type=int, 47 | metavar='N', 48 | help='mini-batch size (default: 3200), this is the total ' 49 | 'batch size of all GPUs on the current node when ' 50 | 'using Data Parallel or Distributed Data Parallel') 51 | parser.add_argument('--lr', 52 | '--learning-rate', 53 | default=0.1, 54 | type=float, 55 | metavar='LR', 56 | help='initial learning rate', 57 | dest='lr') 58 | parser.add_argument('--momentum', default=0.9, type=float, metavar='M', help='momentum') 59 | parser.add_argument('--wd', 60 | '--weight-decay', 61 | default=1e-4, 62 | type=float, 63 | metavar='W', 64 | help='weight decay (default: 1e-4)', 65 | dest='weight_decay') 66 | parser.add_argument('-p', '--print-freq', default=10, type=int, metavar='N', help='print frequency (default: 10)') 67 | parser.add_argument('-e', '--evaluate', dest='evaluate', action='store_true', help='evaluate model on validation set') 68 | parser.add_argument('--pretrained', dest='pretrained', action='store_true', help='use pre-trained model') 69 | parser.add_argument('--seed', default=None, type=int, help='seed for initializing training. ') 70 | 71 | best_acc1 = 0 72 | 73 | 74 | def main(): 75 | args = parser.parse_args() 76 | 77 | if args.seed is not None: 78 | random.seed(args.seed) 79 | torch.manual_seed(args.seed) 80 | cudnn.deterministic = True 81 | warnings.warn('You have chosen to seed training. ' 82 | 'This will turn on the CUDNN deterministic setting, ' 83 | 'which can slow down your training considerably! 
' 84 | 'You may see unexpected behavior when restarting ' 85 | 'from checkpoints.') 86 | 87 | hvd.init() 88 | local_rank = hvd.local_rank() 89 | torch.cuda.set_device(local_rank) 90 | 91 | main_worker(local_rank, 4, args) 92 | 93 | 94 | def main_worker(gpu, ngpus_per_node, args): 95 | global best_acc1 96 | 97 | # create model 98 | if args.pretrained: 99 | print("=> using pre-trained model '{}'".format(args.arch)) 100 | model = models.__dict__[args.arch](pretrained=True) 101 | else: 102 | print("=> creating model '{}'".format(args.arch)) 103 | model = models.__dict__[args.arch]() 104 | 105 | model.cuda() 106 | # When using a single GPU per process and per 107 | # DistributedDataParallel, we need to divide the batch size 108 | # ourselves based on the total number of GPUs we have 109 | args.batch_size = int(args.batch_size / ngpus_per_node) 110 | 111 | hvd.broadcast_parameters(model.state_dict(), root_rank=0) 112 | 113 | # define loss function (criterion) and optimizer 114 | criterion = nn.CrossEntropyLoss().cuda() 115 | 116 | optimizer = torch.optim.SGD(model.parameters(), args.lr, momentum=args.momentum, weight_decay=args.weight_decay) 117 | hvd.broadcast_optimizer_state(optimizer, root_rank=0) 118 | compression = hvd.Compression.fp16 119 | 120 | optimizer = hvd.DistributedOptimizer(optimizer, named_parameters=model.named_parameters(), compression=compression) 121 | 122 | cudnn.benchmark = True 123 | 124 | # Data loading code 125 | traindir = os.path.join(args.data, 'train') 126 | valdir = os.path.join(args.data, 'val') 127 | normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]) 128 | 129 | train_dataset = datasets.ImageFolder( 130 | traindir, 131 | transforms.Compose([ 132 | transforms.RandomResizedCrop(224), 133 | transforms.RandomHorizontalFlip(), 134 | transforms.ToTensor(), 135 | normalize, 136 | ])) 137 | 138 | train_sampler = torch.utils.data.distributed.DistributedSampler(train_dataset, 139 | num_replicas=hvd.size(), 140 | rank=hvd.rank()) 141 | 142 | train_loader = torch.utils.data.DataLoader(train_dataset, 143 | batch_size=args.batch_size, 144 | shuffle=(train_sampler is None), 145 | num_workers=2, 146 | pin_memory=True, 147 | sampler=train_sampler) 148 | 149 | val_loader = torch.utils.data.DataLoader(datasets.ImageFolder( 150 | valdir, 151 | transforms.Compose([ 152 | transforms.Resize(256), 153 | transforms.CenterCrop(224), 154 | transforms.ToTensor(), 155 | normalize, 156 | ])), 157 | batch_size=args.batch_size, 158 | shuffle=False, 159 | num_workers=2, 160 | pin_memory=True) 161 | 162 | if args.evaluate: 163 | validate(val_loader, model, criterion, args) 164 | return 165 | 166 | log_csv = "horovod_distributed.csv" 167 | 168 | for epoch in range(args.start_epoch, args.epochs): 169 | epoch_start = time.time() 170 | 171 | train_sampler.set_epoch(epoch) 172 | adjust_learning_rate(optimizer, epoch, args) 173 | 174 | # train for one epoch 175 | train(train_loader, model, criterion, optimizer, epoch, args) 176 | 177 | # evaluate on validation set 178 | acc1 = validate(val_loader, model, criterion, args) 179 | 180 | # remember best acc@1 and save checkpoint 181 | is_best = acc1 > best_acc1 182 | best_acc1 = max(acc1, best_acc1) 183 | 184 | epoch_end = time.time() 185 | 186 | with open(log_csv, 'a+') as f: 187 | csv_write = csv.writer(f) 188 | data_row = [time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(epoch_start)), epoch_end - epoch_start] 189 | csv_write.writerow(data_row) 190 | 191 | save_checkpoint( 192 | { 193 | 'epoch': epoch + 1, 194 | 
'arch': args.arch, 195 | 'state_dict': model.state_dict(), 196 | 'best_acc1': best_acc1, 197 | }, is_best) 198 | 199 | 200 | def train(train_loader, model, criterion, optimizer, epoch, args): 201 | batch_time = AverageMeter('Time', ':6.3f') 202 | data_time = AverageMeter('Data', ':6.3f') 203 | losses = AverageMeter('Loss', ':.4e') 204 | top1 = AverageMeter('Acc@1', ':6.2f') 205 | top5 = AverageMeter('Acc@5', ':6.2f') 206 | progress = ProgressMeter(len(train_loader), [batch_time, data_time, losses, top1, top5], 207 | prefix="Epoch: [{}]".format(epoch)) 208 | 209 | # switch to train mode 210 | model.train() 211 | 212 | end = time.time() 213 | for i, (images, target) in enumerate(train_loader): 214 | # measure data loading time 215 | data_time.update(time.time() - end) 216 | 217 | images = images.cuda(non_blocking=True) 218 | target = target.cuda(non_blocking=True) 219 | # compute output 220 | output = model(images) 221 | loss = criterion(output, target) 222 | 223 | # measure accuracy and record loss 224 | acc1, acc5 = accuracy(output, target, topk=(1, 5)) 225 | losses.update(loss.item(), images.size(0)) 226 | top1.update(acc1[0], images.size(0)) 227 | top5.update(acc5[0], images.size(0)) 228 | 229 | # compute gradient and do SGD step 230 | optimizer.zero_grad() 231 | loss.backward() 232 | optimizer.step() 233 | 234 | # measure elapsed time 235 | batch_time.update(time.time() - end) 236 | end = time.time() 237 | 238 | if i % args.print_freq == 0: 239 | progress.display(i) 240 | 241 | 242 | def validate(val_loader, model, criterion, args): 243 | batch_time = AverageMeter('Time', ':6.3f') 244 | losses = AverageMeter('Loss', ':.4e') 245 | top1 = AverageMeter('Acc@1', ':6.2f') 246 | top5 = AverageMeter('Acc@5', ':6.2f') 247 | progress = ProgressMeter(len(val_loader), [batch_time, losses, top1, top5], prefix='Test: ') 248 | 249 | # switch to evaluate mode 250 | model.eval() 251 | 252 | with torch.no_grad(): 253 | end = time.time() 254 | for i, (images, target) in enumerate(val_loader): 255 | images = images.cuda(non_blocking=True) 256 | target = target.cuda(non_blocking=True) 257 | # compute output 258 | output = model(images) 259 | loss = criterion(output, target) 260 | 261 | # measure accuracy and record loss 262 | acc1, acc5 = accuracy(output, target, topk=(1, 5)) 263 | losses.update(loss.item(), images.size(0)) 264 | top1.update(acc1[0], images.size(0)) 265 | top5.update(acc5[0], images.size(0)) 266 | 267 | # measure elapsed time 268 | batch_time.update(time.time() - end) 269 | end = time.time() 270 | 271 | if i % args.print_freq == 0: 272 | progress.display(i) 273 | 274 | # TODO: this should also be done with the ProgressMeter 275 | print(' * Acc@1 {top1.avg:.3f} Acc@5 {top5.avg:.3f}'.format(top1=top1, top5=top5)) 276 | 277 | return top1.avg 278 | 279 | 280 | def save_checkpoint(state, is_best, filename='checkpoint.pth.tar'): 281 | torch.save(state, filename) 282 | if is_best: 283 | shutil.copyfile(filename, 'model_best.pth.tar') 284 | 285 | 286 | class AverageMeter(object): 287 | """Computes and stores the average and current value""" 288 | def __init__(self, name, fmt=':f'): 289 | self.name = name 290 | self.fmt = fmt 291 | self.reset() 292 | 293 | def reset(self): 294 | self.val = 0 295 | self.avg = 0 296 | self.sum = 0 297 | self.count = 0 298 | 299 | def update(self, val, n=1): 300 | self.val = val 301 | self.sum += val * n 302 | self.count += n 303 | self.avg = self.sum / self.count 304 | 305 | def __str__(self): 306 | fmtstr = '{name} {val' + self.fmt + '} ({avg' + self.fmt + '})' 307 
| return fmtstr.format(**self.__dict__) 308 | 309 | 310 | class ProgressMeter(object): 311 | def __init__(self, num_batches, meters, prefix=""): 312 | self.batch_fmtstr = self._get_batch_fmtstr(num_batches) 313 | self.meters = meters 314 | self.prefix = prefix 315 | 316 | def display(self, batch): 317 | entries = [self.prefix + self.batch_fmtstr.format(batch)] 318 | entries += [str(meter) for meter in self.meters] 319 | print('\t'.join(entries)) 320 | 321 | def _get_batch_fmtstr(self, num_batches): 322 | num_digits = len(str(num_batches // 1)) 323 | fmt = '{:' + str(num_digits) + 'd}' 324 | return '[' + fmt + '/' + fmt.format(num_batches) + ']' 325 | 326 | 327 | def adjust_learning_rate(optimizer, epoch, args): 328 | """Sets the learning rate to the initial LR decayed by 10 every 30 epochs""" 329 | lr = args.lr * (0.1**(epoch // 30)) 330 | for param_group in optimizer.param_groups: 331 | param_group['lr'] = lr 332 | 333 | 334 | def accuracy(output, target, topk=(1, )): 335 | """Computes the accuracy over the k top predictions for the specified values of k""" 336 | with torch.no_grad(): 337 | maxk = max(topk) 338 | batch_size = target.size(0) 339 | 340 | _, pred = output.topk(maxk, 1, True, True) 341 | pred = pred.t() 342 | correct = pred.eq(target.view(1, -1).expand_as(pred)) 343 | 344 | res = [] 345 | for k in topk: 346 | correct_k = correct[:k].view(-1).float().sum(0, keepdim=True) 347 | res.append(correct_k.mul_(100.0 / batch_size)) 348 | return res 349 | 350 | 351 | if __name__ == '__main__': 352 | main() -------------------------------------------------------------------------------- /appendix/production/distributed/pytorch-distributed-master/multiprocessing_distributed.py: -------------------------------------------------------------------------------- 1 | # https://github.com/pytorch/examples/blob/master/imagenet/main.py 2 | 3 | import csv 4 | 5 | import argparse 6 | import os 7 | import random 8 | import shutil 9 | import time 10 | import warnings 11 | 12 | import torch 13 | import torch.nn as nn 14 | import torch.nn.parallel 15 | import torch.backends.cudnn as cudnn 16 | import torch.distributed as dist 17 | import torch.optim 18 | import torch.multiprocessing as mp 19 | import torch.utils.data 20 | import torch.utils.data.distributed 21 | import torchvision.transforms as transforms 22 | import torchvision.datasets as datasets 23 | import torchvision.models as models 24 | 25 | model_names = sorted(name for name in models.__dict__ 26 | if name.islower() and not name.startswith("__") and callable(models.__dict__[name])) 27 | 28 | parser = argparse.ArgumentParser(description='PyTorch ImageNet Training') 29 | parser.add_argument('--data', metavar='DIR', default='/home/zhangzhi/Data/ImageNet2012', help='path to dataset') 30 | parser.add_argument('-a', 31 | '--arch', 32 | metavar='ARCH', 33 | default='resnet18', 34 | choices=model_names, 35 | help='model architecture: ' + ' | '.join(model_names) + ' (default: resnet18)') 36 | parser.add_argument('-j', 37 | '--workers', 38 | default=4, 39 | type=int, 40 | metavar='N', 41 | help='number of data loading workers (default: 4)') 42 | parser.add_argument('--epochs', default=90, type=int, metavar='N', help='number of total epochs to run') 43 | parser.add_argument('--start-epoch', default=0, type=int, metavar='N', help='manual epoch number (useful on restarts)') 44 | parser.add_argument('-b', 45 | '--batch-size', 46 | default=256, 47 | type=int, 48 | metavar='N', 49 | help='mini-batch size (default: 256), this is the total ' 50 | 'batch size of 
all GPUs on the current node when ' 51 | 'using Data Parallel or Distributed Data Parallel') 52 | parser.add_argument('--lr', 53 | '--learning-rate', 54 | default=0.1, 55 | type=float, 56 | metavar='LR', 57 | help='initial learning rate', 58 | dest='lr') 59 | parser.add_argument('--momentum', default=0.9, type=float, metavar='M', help='momentum') 60 | parser.add_argument('--wd', 61 | '--weight-decay', 62 | default=1e-4, 63 | type=float, 64 | metavar='W', 65 | help='weight decay (default: 1e-4)', 66 | dest='weight_decay') 67 | parser.add_argument('-p', '--print-freq', default=10, type=int, metavar='N', help='print frequency (default: 10)') 68 | parser.add_argument('-e', '--evaluate', dest='evaluate', action='store_true', help='evaluate model on validation set') 69 | parser.add_argument('--pretrained', dest='pretrained', action='store_true', help='use pre-trained model') 70 | parser.add_argument('--seed', default=None, type=int, help='seed for initializing training. ') 71 | 72 | best_acc1 = 0 73 | 74 | 75 | def main(): 76 | args = parser.parse_args() 77 | 78 | if args.seed is not None: 79 | random.seed(args.seed) 80 | torch.manual_seed(args.seed) 81 | cudnn.deterministic = True 82 | warnings.warn('You have chosen to seed training. ' 83 | 'This will turn on the CUDNN deterministic setting, ' 84 | 'which can slow down your training considerably! ' 85 | 'You may see unexpected behavior when restarting ' 86 | 'from checkpoints.') 87 | 88 | mp.spawn(main_worker, nprocs=4, args=(4, args)) 89 | 90 | 91 | def main_worker(gpu, ngpus_per_node, args): 92 | global best_acc1 93 | 94 | dist.init_process_group(backend='nccl', init_method='tcp://127.0.0.1:23456', world_size=4, rank=gpu) 95 | # create model 96 | if args.pretrained: 97 | print("=> using pre-trained model '{}'".format(args.arch)) 98 | model = models.__dict__[args.arch](pretrained=True) 99 | else: 100 | print("=> creating model '{}'".format(args.arch)) 101 | model = models.__dict__[args.arch]() 102 | 103 | torch.cuda.set_device(gpu) 104 | model.cuda(gpu) 105 | # When using a single GPU per process and per 106 | # DistributedDataParallel, we need to divide the batch size 107 | # ourselves based on the total number of GPUs we have 108 | args.batch_size = int(args.batch_size / ngpus_per_node) 109 | model = torch.nn.parallel.DistributedDataParallel(model, device_ids=[gpu]) 110 | 111 | # define loss function (criterion) and optimizer 112 | criterion = nn.CrossEntropyLoss().cuda(gpu) 113 | 114 | optimizer = torch.optim.SGD(model.parameters(), args.lr, momentum=args.momentum, weight_decay=args.weight_decay) 115 | 116 | cudnn.benchmark = True 117 | 118 | # Data loading code 119 | traindir = os.path.join(args.data, 'train') 120 | valdir = os.path.join(args.data, 'val') 121 | normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]) 122 | 123 | train_dataset = datasets.ImageFolder( 124 | traindir, 125 | transforms.Compose([ 126 | transforms.RandomResizedCrop(224), 127 | transforms.RandomHorizontalFlip(), 128 | transforms.ToTensor(), 129 | normalize, 130 | ])) 131 | 132 | train_sampler = torch.utils.data.distributed.DistributedSampler(train_dataset) 133 | 134 | train_loader = torch.utils.data.DataLoader(train_dataset, 135 | batch_size=args.batch_size, 136 | shuffle=(train_sampler is None), 137 | num_workers=2, 138 | pin_memory=True, 139 | sampler=train_sampler) 140 | 141 | val_loader = torch.utils.data.DataLoader(datasets.ImageFolder( 142 | valdir, 143 | transforms.Compose([ 144 | transforms.Resize(256), 145 | 
transforms.CenterCrop(224), 146 | transforms.ToTensor(), 147 | normalize, 148 | ])), 149 | batch_size=args.batch_size, 150 | shuffle=False, 151 | num_workers=2, 152 | pin_memory=True) 153 | 154 | if args.evaluate: 155 | validate(val_loader, model, criterion, gpu, args) 156 | return 157 | 158 | log_csv = "multiprocessing_distributed.csv" 159 | 160 | for epoch in range(args.start_epoch, args.epochs): 161 | epoch_start = time.time() 162 | 163 | train_sampler.set_epoch(epoch) 164 | adjust_learning_rate(optimizer, epoch, args) 165 | 166 | # train for one epoch 167 | train(train_loader, model, criterion, optimizer, epoch, gpu, args) 168 | 169 | # evaluate on validation set 170 | acc1 = validate(val_loader, model, criterion, gpu, args) 171 | 172 | # remember best acc@1 and save checkpoint 173 | is_best = acc1 > best_acc1 174 | best_acc1 = max(acc1, best_acc1) 175 | 176 | epoch_end = time.time() 177 | 178 | with open(log_csv, 'a+') as f: 179 | csv_write = csv.writer(f) 180 | data_row = [time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(epoch_start)), epoch_end - epoch_start] 181 | csv_write.writerow(data_row) 182 | 183 | save_checkpoint( 184 | { 185 | 'epoch': epoch + 1, 186 | 'arch': args.arch, 187 | 'state_dict': model.module.state_dict(), 188 | 'best_acc1': best_acc1, 189 | }, is_best) 190 | 191 | 192 | def train(train_loader, model, criterion, optimizer, epoch, gpu, args): 193 | batch_time = AverageMeter('Time', ':6.3f') 194 | data_time = AverageMeter('Data', ':6.3f') 195 | losses = AverageMeter('Loss', ':.4e') 196 | top1 = AverageMeter('Acc@1', ':6.2f') 197 | top5 = AverageMeter('Acc@5', ':6.2f') 198 | progress = ProgressMeter(len(train_loader), [batch_time, data_time, losses, top1, top5], 199 | prefix="Epoch: [{}]".format(epoch)) 200 | 201 | # switch to train mode 202 | model.train() 203 | 204 | end = time.time() 205 | for i, (images, target) in enumerate(train_loader): 206 | # measure data loading time 207 | data_time.update(time.time() - end) 208 | 209 | images = images.cuda(gpu, non_blocking=True) 210 | target = target.cuda(gpu, non_blocking=True) 211 | 212 | # compute output 213 | output = model(images) 214 | loss = criterion(output, target) 215 | 216 | # measure accuracy and record loss 217 | acc1, acc5 = accuracy(output, target, topk=(1, 5)) 218 | losses.update(loss.item(), images.size(0)) 219 | top1.update(acc1[0], images.size(0)) 220 | top5.update(acc5[0], images.size(0)) 221 | 222 | # compute gradient and do SGD step 223 | optimizer.zero_grad() 224 | loss.backward() 225 | optimizer.step() 226 | 227 | # measure elapsed time 228 | batch_time.update(time.time() - end) 229 | end = time.time() 230 | 231 | if i % args.print_freq == 0: 232 | progress.display(i) 233 | 234 | 235 | def validate(val_loader, model, criterion, gpu, args): 236 | batch_time = AverageMeter('Time', ':6.3f') 237 | losses = AverageMeter('Loss', ':.4e') 238 | top1 = AverageMeter('Acc@1', ':6.2f') 239 | top5 = AverageMeter('Acc@5', ':6.2f') 240 | progress = ProgressMeter(len(val_loader), [batch_time, losses, top1, top5], prefix='Test: ') 241 | 242 | # switch to evaluate mode 243 | model.eval() 244 | 245 | with torch.no_grad(): 246 | end = time.time() 247 | for i, (images, target) in enumerate(val_loader): 248 | images = images.cuda(gpu, non_blocking=True) 249 | target = target.cuda(gpu, non_blocking=True) 250 | 251 | # compute output 252 | output = model(images) 253 | loss = criterion(output, target) 254 | 255 | # measure accuracy and record loss 256 | acc1, acc5 = accuracy(output, target, topk=(1, 5)) 257 | 
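# accuracy() returns one single-element tensor per requested k, holding the
# percentage of correct predictions in this batch; acc1[0] / acc5[0] extract the
# scalars, which the meters average weighted by the batch size below.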
losses.update(loss.item(), images.size(0)) 258 | top1.update(acc1[0], images.size(0)) 259 | top5.update(acc5[0], images.size(0)) 260 | 261 | # measure elapsed time 262 | batch_time.update(time.time() - end) 263 | end = time.time() 264 | 265 | if i % args.print_freq == 0: 266 | progress.display(i) 267 | 268 | # TODO: this should also be done with the ProgressMeter 269 | print(' * Acc@1 {top1.avg:.3f} Acc@5 {top5.avg:.3f}'.format(top1=top1, top5=top5)) 270 | 271 | return top1.avg 272 | 273 | 274 | def save_checkpoint(state, is_best, filename='checkpoint.pth.tar'): 275 | torch.save(state, filename) 276 | if is_best: 277 | shutil.copyfile(filename, 'model_best.pth.tar') 278 | 279 | 280 | class AverageMeter(object): 281 | """Computes and stores the average and current value""" 282 | def __init__(self, name, fmt=':f'): 283 | self.name = name 284 | self.fmt = fmt 285 | self.reset() 286 | 287 | def reset(self): 288 | self.val = 0 289 | self.avg = 0 290 | self.sum = 0 291 | self.count = 0 292 | 293 | def update(self, val, n=1): 294 | self.val = val 295 | self.sum += val * n 296 | self.count += n 297 | self.avg = self.sum / self.count 298 | 299 | def __str__(self): 300 | fmtstr = '{name} {val' + self.fmt + '} ({avg' + self.fmt + '})' 301 | return fmtstr.format(**self.__dict__) 302 | 303 | 304 | class ProgressMeter(object): 305 | def __init__(self, num_batches, meters, prefix=""): 306 | self.batch_fmtstr = self._get_batch_fmtstr(num_batches) 307 | self.meters = meters 308 | self.prefix = prefix 309 | 310 | def display(self, batch): 311 | entries = [self.prefix + self.batch_fmtstr.format(batch)] 312 | entries += [str(meter) for meter in self.meters] 313 | print('\t'.join(entries)) 314 | 315 | def _get_batch_fmtstr(self, num_batches): 316 | num_digits = len(str(num_batches // 1)) 317 | fmt = '{:' + str(num_digits) + 'd}' 318 | return '[' + fmt + '/' + fmt.format(num_batches) + ']' 319 | 320 | 321 | def adjust_learning_rate(optimizer, epoch, args): 322 | """Sets the learning rate to the initial LR decayed by 10 every 30 epochs""" 323 | lr = args.lr * (0.1**(epoch // 30)) 324 | for param_group in optimizer.param_groups: 325 | param_group['lr'] = lr 326 | 327 | 328 | def accuracy(output, target, topk=(1, )): 329 | """Computes the accuracy over the k top predictions for the specified values of k""" 330 | with torch.no_grad(): 331 | maxk = max(topk) 332 | batch_size = target.size(0) 333 | 334 | _, pred = output.topk(maxk, 1, True, True) 335 | pred = pred.t() 336 | correct = pred.eq(target.view(1, -1).expand_as(pred)) 337 | 338 | res = [] 339 | for k in topk: 340 | correct_k = correct[:k].view(-1).float().sum(0, keepdim=True) 341 | res.append(correct_k.mul_(100.0 / batch_size)) 342 | return res 343 | 344 | 345 | if __name__ == '__main__': 346 | main() -------------------------------------------------------------------------------- /appendix/production/distributed/pytorch-distributed-master/requirements.txt: -------------------------------------------------------------------------------- 1 | torch==1.3.0 2 | torchvision==0.4.0 3 | apex==0.9.10 4 | horovod==0.18.2 5 | -------------------------------------------------------------------------------- /appendix/production/distributed/pytorch-distributed-master/start.sh: -------------------------------------------------------------------------------- 1 | python multiprocessing_distributed.py 2 | CUDA_VISIBLE_DEVICES=0,1,2,3 python -m torch.distributed.launch --nproc_per_node=4 distributed.py 3 | CUDA_VISIBLE_DEVICES=0,1,2,3 python -m torch.distributed.launch 
--nproc_per_node=4 apex_distributed.py
4 | HOROVOD_WITH_PYTORCH=1 CUDA_VISIBLE_DEVICES=0,1,2,3 horovodrun -np 4 -H localhost:4 --verbose python horovod_distributed.py
--------------------------------------------------------------------------------
/appendix/production/distributed/pytorch-distributed-master/statistics.sh:
--------------------------------------------------------------------------------
1 | nvidia-smi -i 0,1,2,3 --format=csv,noheader,nounits --query-gpu=timestamp,index,memory.total,memory.used,memory.free,utilization.gpu,utilization.memory -lms 500 -f multiprocessing_distributed_log.csv
2 | nvidia-smi -i 0,1,2,3 --format=csv,noheader,nounits --query-gpu=timestamp,index,memory.total,memory.used,memory.free,utilization.gpu,utilization.memory -lms 500 -f distributed_log.csv
3 | nvidia-smi -i 0,1,2,3 --format=csv,noheader,nounits --query-gpu=timestamp,index,memory.total,memory.used,memory.free,utilization.gpu,utilization.memory -lms 500 -f apex_distributed_log.csv
4 | nvidia-smi -i 0,1,2,3 --format=csv,noheader,nounits --query-gpu=timestamp,index,memory.total,memory.used,memory.free,utilization.gpu,utilization.memory -lms 500 -f horovod_distributed_log.csv
--------------------------------------------------------------------------------
/appendix/production/inference/TensorRT/README.md:
--------------------------------------------------------------------------------
1 | # PyTorch_ONNX_TensorRT
2 | - [ ] Get a classification demo running end to end, plus C++ inference code !!
3 | 
4 | - [ ] Understand the basic logic of TensorRT
5 | 
6 | - [ ] TensorRT alignment checks (int8 quantization)
7 | 
8 | - [ ] How to add a new (custom) layer
9 | 
10 | - [ ] Get a detection pipeline running
11 | 
12 | - [ ] Clean up this README (installation, the two example programs, helper scripts, common errors)
13 | 
14 | 
15 | 
16 | **!!! A serialized engine must be deserialized on a GPU with the same compute capability as the one that built it; look up your card's compute capability at https://developer.nvidia.com/cuda-gpus.**
17 | 
18 | ## Installing TensorRT
19 | 
20 | >Python environment:
21 | >
22 | >*.pth[pytorch] > *.onnx[onnx] > *.trt[tensorRT]
23 | >
24 | >*.pth[pytorch] > *.txtp[TensorRT] Python environment
25 | 
26 | 
27 | 
28 | ## TensorRT execution flow
29 | 
30 | - Create Builder: holds the TensorRT components, the pipeline, buffer addresses, and the input/output dimensions
31 | 
32 | - Create Network: holds the trained network; its input is a model definition (ONNX, TF), and its output is an executable inference engine.
33 | 
34 | - Create Parser: parses the network
35 | 
36 | - Bind inputs, outputs, and any custom components
37 | 
38 | - Serialize or deserialize the engine
39 | 
40 | - Transfer input data (host -> device)
41 | 
42 | - Run the computation
43 | 
44 | - Transfer the results back (device -> host)
45 | 
46 | 
47 | 
48 | ## Bugs
49 | 
50 | **1. AttributeError: 'NoneType' object has no attribute 'create_execution_context'**
51 | 
52 | The builder returned None instead of an engine: for ONNX models the network has to be created with the explicit-batch flag, as below.
53 | 
54 | ~~~python
55 | EXPLICIT_BATCH = 1 << (int)(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
56 | 
57 | def build_engine(model_path):
58 |     with trt.Builder(TRT_LOGGER) as builder, \
59 |         builder.create_network(EXPLICIT_BATCH) as network, \
60 |         trt.OnnxParser(network, TRT_LOGGER) as parser:
61 | ~~~
62 | 
63 | **2. pycuda._driver.LogicError: explicit_context_dependent failed: invalid device context - no currently active context?**
64 | 
65 | Cause: pycuda.driver was never initialized, so there is no active CUDA context. Import pycuda.autoinit after importing pycuda.driver:
66 | 
67 | ~~~
68 | import pycuda.driver as cuda
69 | import pycuda.autoinit
70 | ~~~
71 | 
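Taken together, fixes 1 and 2 give the following minimal build skeleton — a sketch only: it mirrors the TensorRT 6/7-era Python API used by trt_helper.py below, and `model.onnx` is a placeholder path:

~~~python
import tensorrt as trt
import pycuda.driver as cuda
import pycuda.autoinit  # must be imported after pycuda.driver (fix 2)

TRT_LOGGER = trt.Logger()
# ONNX models need a network created with the explicit-batch flag (fix 1)
EXPLICIT_BATCH = 1 << (int)(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)

def build_engine(onnx_path):
    with trt.Builder(TRT_LOGGER) as builder, \
        builder.create_network(EXPLICIT_BATCH) as network, \
        trt.OnnxParser(network, TRT_LOGGER) as parser:
        builder.max_workspace_size = 1 << 30
        with open(onnx_path, 'rb') as f:
            parser.parse(f.read())
        return builder.build_cuda_engine(network)  # returns None on failure

engine = build_engine('model.onnx')
assert engine is not None, 'engine build failed -- check parser/builder errors'
context = engine.create_execution_context()
~~~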
72 | **3. output tensor has no attribute _trt**
73 | 
74 | Some ops in the model have no TensorRT implementation yet; you have to implement them yourself.
75 | 
76 | 
77 | 
78 | ## Credits
79 | 
80 | - https://github.com/zerollzeng/tiny-tensorrt
81 | - https://github.com/NVIDIA-AI-IOT/torch2trt
82 | - https://github.com/Rapternmn/PyTorch-Onnx-Tensorrt
83 | - https://github.com/onnx/onnx-tensorrt
84 | 
85 | 
86 | 
87 | ## BAK
88 | 
89 | **Error: UnicodeDecodeError: 'utf-8' codec can't decode byte 0xaa in position 8: invalid start byte**
90 | 
91 | Cause: a serialized engine file has to be opened in 'rb' mode when it is loaded back, otherwise it cannot be read. Deserializing an engine takes three steps:
92 | 
93 | Open the file in binary mode: with open(cfg.work_dir + 'serialized.engine', 'rb') as f
94 | Create a runtime: trt.Runtime(logger) as runtime
95 | Deserialize the engine through the runtime: engine = runtime.deserialize_cuda_engine(f.read())
96 | 
97 | **Error: onnx.onnx_cpp2py_export.checker.ValidationError: Op registered for Upsample is deprecated in domain_version of 11**
98 | 
99 | Additional error output:
100 | Context: Bad node spec: input: "085_convolutional_lrelu" output: "086_upsample" name: "086_upsample" op_type: "Upsample" attribute
101 | { name: "mode" s: "nearest" type: STRING } attribute { name: "scales" floats: 1 floats: 1 floats: 2 floats: 2 type: FLOATS }
102 | 
103 | Cause: onnx moves quickly; the official Upsample op was removed after onnx 1.5.1, so YOLOv3-style exports fail the checker. See https://devtalk.nvidia.com/default/topic/1052153/jetson-nano/tensorrt-backend-for-onnx-on-jetson-nano/1
104 | Fix: downgrade onnx to 1.4.1:
105 | pip uninstall onnx
106 | pip install onnx==1.4.1
107 | 
108 | 
--------------------------------------------------------------------------------
/appendix/production/inference/TensorRT/cat.jpeg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/polarisZhao/pytorch-cookbook/2de073487a0257936cdbb34920df833ad0603a99/appendix/production/inference/TensorRT/cat.jpeg
--------------------------------------------------------------------------------
/appendix/production/inference/TensorRT/trt_helper.py:
--------------------------------------------------------------------------------
1 | import os
2 | import pycuda.autoinit
3 | import numpy as np
4 | import pycuda.driver as cuda
5 | import tensorrt as trt
6 | import torch
7 | from .trt_int8_calibration_helper import PythonEntropyCalibrator
8 | TRT_LOGGER = trt.Logger() # This logger is required to build an engine
9 | EXPLICIT_BATCH = 1 << (int)(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
10 | 
11 | class HostDeviceMem(object):
12 |     def __init__(self, host_mem, device_mem):
13 |         """Within this context, host_mem means the CPU memory and device_mem means the GPU memory
14 |         """
15 |         self.host = host_mem
16 |         self.device = device_mem
17 |     def __str__(self):
18 |         return "Host:\n" + str(self.host) + "\nDevice:\n" + str(self.device)
19 | 
20 |     def __repr__(self):
21 |         return self.__str__()
22 | 
23 | def allocate_buffers(engine):
24 |     inputs = []
25 |     outputs = []
26 |     bindings = []
27 |     stream = cuda.Stream()
28 |     for binding in engine:
29 |         size = trt.volume(engine.get_binding_shape(binding)) * engine.max_batch_size
30 |         dtype = trt.nptype(engine.get_binding_dtype(binding))
31 |         # Allocate host and device buffers
32 |         host_mem = cuda.pagelocked_empty(size, dtype)
33 |         device_mem = cuda.mem_alloc(host_mem.nbytes)
34 |         # Append the device buffer to device bindings.
35 |         bindings.append(int(device_mem))
36 |         # Append to the appropriate list.
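        # (input bindings are copied host->device before inference; output bindings are copied back device->host afterwards -- see do_inference below)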
37 | if engine.binding_is_input(binding): 38 | inputs.append(HostDeviceMem(host_mem, device_mem)) 39 | else: 40 | outputs.append(HostDeviceMem(host_mem, device_mem)) 41 | return inputs, outputs, bindings, stream 42 | 43 | def get_engine(max_batch_size=1, onnx_file_path="", engine_file_path="",\ 44 | fp16_mode=False, int8_mode=False, calibration_stream=None, save_engine=False, 45 | ): 46 | """Attempts to load a serialized engine if available, otherwise builds a new TensorRT engine and saves it.""" 47 | def build_engine(max_batch_size, save_engine): 48 | """Takes an ONNX file and creates a TensorRT engine to run inference with""" 49 | with trt.Builder(TRT_LOGGER) as builder, \ 50 | builder.create_network(EXPLICIT_BATCH) as network,\ 51 | trt.OnnxParser(network, TRT_LOGGER) as parser: 52 | 53 | builder.max_workspace_size = 1 << 30 # Your workspace size 54 | builder.max_batch_size = max_batch_size 55 | #pdb.set_trace() 56 | builder.fp16_mode = fp16_mode # Default: False 57 | builder.int8_mode = int8_mode # Default: False 58 | if int8_mode: 59 | assert calibration_stream, 'Error: a calibration_stream should be provided for int8 mode' 60 | builder.int8_calibrator = PythonEntropyCalibrator(["input"], calibration_stream, 'calibration_cache.bin') 61 | print('Int8 mode enabled') 62 | # Parse model file 63 | if not os.path.exists(onnx_file_path): 64 | quit('ONNX file {} not found'.format(onnx_file_path)) 65 | 66 | print('Loading ONNX file from path {}...'.format(onnx_file_path)) 67 | with open(onnx_file_path, 'rb') as model: 68 | print('Beginning ONNX file parsing') 69 | parser.parse(model.read()) 70 | assert network.num_layers > 0, 'Failed to parse ONNX model. \ 71 | Please check if the ONNX model is compatible ' 72 | 73 | print('Completed parsing of ONNX file') 74 | 75 | print('Building an engine from file {}; this may take a while...'.format(onnx_file_path)) 76 | engine = builder.build_cuda_engine(network) 77 | # If errors happend when executing builder.build_cuda_engine(network), 78 | # a None-Type object would be returned 79 | if engine is None: 80 | print('Failed to create the engine') 81 | return None 82 | 83 | print("Completed creating the engine") 84 | if save_engine: 85 | with open(engine_file_path, "wb") as f: 86 | f.write(engine.serialize()) 87 | return engine 88 | 89 | if os.path.exists(engine_file_path): 90 | # If a serialized engine exists, load it instead of building a new one. 91 | print("Reading engine from file {}".format(engine_file_path)) 92 | with open(engine_file_path, "rb") as f, trt.Runtime(TRT_LOGGER) as runtime: 93 | return runtime.deserialize_cuda_engine(f.read()) 94 | else: 95 | return build_engine(max_batch_size, save_engine) 96 | 97 | def do_inference(context, bindings, inputs, outputs, stream, batch_size=1): 98 | # Transfer data from CPU to the GPU. 99 | [cuda.memcpy_htod_async(inp.device, inp.host, stream) for inp in inputs] 100 | # Run inference. 101 | context.execute_async(batch_size=batch_size, bindings=bindings, stream_handle=stream.handle) 102 | # Transfer predictions back from the GPU. 103 | [cuda.memcpy_dtoh_async(out.host, out.device, stream) for out in outputs] 104 | # Synchronize the stream 105 | stream.synchronize() 106 | # Return only the host outputs. 
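    # (the HostDeviceMem buffers created in allocate_buffers stay alive, so they can be reused on the next call)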
107 | return [out.host for out in outputs] 108 | 109 | def postprocess_the_outputs(h_outputs, shape_of_output): 110 | h_outputs = h_outputs.reshape(*shape_of_output) 111 | return h_outputs 112 | -------------------------------------------------------------------------------- /appendix/production/inference/TensorRT/trt_int8_calibration_helper.py: -------------------------------------------------------------------------------- 1 | import os 2 | import tensorrt as trt 3 | import pycuda.driver as cuda 4 | import pycuda.autoinit 5 | import numpy as np 6 | import ctypes 7 | 8 | ctypes.pythonapi.PyCapsule_GetPointer.restype = ctypes.c_char_p 9 | ctypes.pythonapi.PyCapsule_GetPointer.argtypes = [ctypes.py_object, ctypes.c_char_p] 10 | 11 | class PythonEntropyCalibrator(trt.IInt8EntropyCalibrator): 12 | def __init__(self, input_layers, stream, cache_file='calibration_cache.bin'): 13 | trt.IInt8EntropyCalibrator.__init__(self) 14 | self.input_layers = input_layers 15 | self.stream = stream 16 | self.d_input = cuda.mem_alloc(self.stream.calibration_data.nbytes) 17 | self.cache_file = cache_file 18 | stream.reset() 19 | 20 | def get_batch_size(self): 21 | return self.stream.batch_size 22 | 23 | def get_batch(self, bindings, names): 24 | batch = self.stream.next_batch() 25 | if not batch.size: 26 | return None 27 | 28 | cuda.memcpy_htod(self.d_input, batch) 29 | for i in self.input_layers[0]: 30 | assert names[0] != i 31 | 32 | bindings[0] = int(self.d_input) 33 | return bindings 34 | 35 | def read_calibration_cache(self, length): 36 | # If there is a cache, use it instead of calibrating again. Otherwise, implicitly return None. 37 | if os.path.exists(self.cache_file): 38 | with open(self.cache_file, "rb") as f: 39 | return f.read() 40 | 41 | def write_calibration_cache(self, ptr, size): 42 | #cache = ctypes.c_char_p(int(ptr)) 43 | value = ctypes.pythonapi.PyCapsule_GetPointer(ptr, None) 44 | 45 | ''' 46 | # TODO: If the calibration is read from cache 'calibration_cache.bin', it will raise bugs 47 | # Will solve this in the future. 
48 | with open(self.cache_file, 'wb') as f: 49 | #f.write(cache.value) 50 | f.write(value) 51 | ''' 52 | return None 53 | 54 | 55 | class ImageBatchStreamDemo(): 56 | def __init__(self,dataset, transform, batch_size, img_size, max_batches=10): 57 | ''' 58 | For calibiration, you need to implement your 'next_batch' and 'reset' functions 59 | ''' 60 | self.transform = transform 61 | self.batch_size = batch_size 62 | self.max_batches = max_batches 63 | self.dataset = dataset 64 | 65 | # self.calibration_data = np.zeros((batch_size, 3, 800, 250), dtype=np.float32) 66 | self.calibration_data = np.zeros((batch_size,)+ img_size, dtype=np.float32) # This is a data holder for the calibration 67 | self.batch_count = 0 68 | 69 | 70 | def reset(self): 71 | self.batch_count = 0 72 | 73 | def next_batch(self): 74 | """ 75 | Return a batch of data every time called 76 | """ 77 | #self.max_batches = 2 78 | if self.batch_count < self.max_batches: 79 | i = self.batch_count 80 | for i in range(self.batch_size): 81 | # You should implement your own data pipeline for writing the calibration_data 82 | 83 | x = self.dataset[i] 84 | if self.transform: 85 | x = self.transform(x) 86 | 87 | self.calibration_data[i] = x.data 88 | self.batch_count += 1 89 | return np.ascontiguousarray(self.calibration_data, dtype=np.float32) 90 | else: 91 | return np.array([]) 92 | -------------------------------------------------------------------------------- /appendix/production/inference/flask-api/.gitignore: -------------------------------------------------------------------------------- 1 | # Byte-compiled / optimized / DLL files 2 | __pycache__/ 3 | *.py[cod] 4 | *$py.class 5 | 6 | # C extensions 7 | *.so 8 | 9 | # Distribution / packaging 10 | .Python 11 | build/ 12 | develop-eggs/ 13 | dist/ 14 | downloads/ 15 | eggs/ 16 | .eggs/ 17 | lib/ 18 | lib64/ 19 | parts/ 20 | sdist/ 21 | var/ 22 | wheels/ 23 | share/python-wheels/ 24 | *.egg-info/ 25 | .installed.cfg 26 | *.egg 27 | MANIFEST 28 | 29 | # PyInstaller 30 | # Usually these files are written by a python script from a template 31 | # before PyInstaller builds the exe, so as to inject date/other infos into it. 
32 | *.manifest 33 | *.spec 34 | 35 | # Installer logs 36 | pip-log.txt 37 | pip-delete-this-directory.txt 38 | 39 | # Unit test / coverage reports 40 | htmlcov/ 41 | .tox/ 42 | .nox/ 43 | .coverage 44 | .coverage.* 45 | .cache 46 | nosetests.xml 47 | coverage.xml 48 | *.cover 49 | .hypothesis/ 50 | .pytest_cache/ 51 | 52 | # Translations 53 | *.mo 54 | *.pot 55 | 56 | # Django stuff: 57 | *.log 58 | local_settings.py 59 | db.sqlite3 60 | 61 | # Flask stuff: 62 | instance/ 63 | .webassets-cache 64 | 65 | # Scrapy stuff: 66 | .scrapy 67 | 68 | # Sphinx documentation 69 | docs/_build/ 70 | 71 | # PyBuilder 72 | target/ 73 | 74 | # Jupyter Notebook 75 | .ipynb_checkpoints 76 | 77 | # IPython 78 | profile_default/ 79 | ipython_config.py 80 | 81 | # pyenv 82 | .python-version 83 | 84 | # celery beat schedule file 85 | celerybeat-schedule 86 | 87 | # SageMath parsed files 88 | *.sage.py 89 | 90 | # Environments 91 | .env 92 | .venv 93 | env/ 94 | venv/ 95 | ENV/ 96 | env.bak/ 97 | venv.bak/ 98 | 99 | # Spyder project settings 100 | .spyderproject 101 | .spyproject 102 | 103 | # Rope project settings 104 | .ropeproject 105 | 106 | # mkdocs documentation 107 | /site 108 | 109 | # mypy 110 | .mypy_cache/ 111 | .dmypy.json 112 | dmypy.json 113 | 114 | # Pyre type checker 115 | .pyre/ -------------------------------------------------------------------------------- /appendix/production/inference/flask-api/LICENSE: -------------------------------------------------------------------------------- 1 | The MIT License (MIT) 2 | 3 | Copyright (c) 2019 Avinash Sajjanshetty 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy of 6 | this software and associated documentation files (the "Software"), to deal in 7 | the Software without restriction, including without limitation the rights to 8 | use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of 9 | the Software, and to permit persons to whom the Software is furnished to do so, 10 | subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS 17 | FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR 18 | COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER 19 | IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN 20 | CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 21 | -------------------------------------------------------------------------------- /appendix/production/inference/flask-api/README.md: -------------------------------------------------------------------------------- 1 | # PyTorch Flask API 2 | 3 | This repo contains a sample code to show how to create a Flask API server by deploying our PyTorch model. This is a sample code which goes with [tutorial](https://pytorch.org/tutorials/intermediate/flask_rest_api_tutorial.html). 4 | 5 | If you'd like to learn how to deploy to Heroku, then check [this repo](https://github.com/avinassh/pytorch-flask-api-heroku). 
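
A successful request returns the predicted ImageNet class as JSON, e.g. (exact values depend on the input image):

    {"class_id": "n02123045", "class_name": "tabby"}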
6 | 7 | 8 | ## How to 9 | 10 | Install the dependencies: 11 | 12 | pip install -r requirements.txt 13 | 14 | 15 | Run the Flask server: 16 | 17 | FLASK_ENV=development FLASK_APP=app.py flask run 18 | 19 | 20 | From another tab, send the image file in a request: 21 | 22 | curl -X POST -F file=@cat_pic.jpeg http://localhost:5000/predict 23 | 24 | 25 | ## License 26 | 27 | The mighty MIT license. Please check `LICENSE` for more details. 28 | -------------------------------------------------------------------------------- /appendix/production/inference/flask-api/app.py: -------------------------------------------------------------------------------- 1 | import io 2 | import json 3 | 4 | from torchvision import models 5 | import torchvision.transforms as transforms 6 | from PIL import Image 7 | from flask import Flask, jsonify, request 8 | 9 | 10 | app = Flask(__name__) 11 | imagenet_class_index = json.load(open('imagenet_class_index.json')) 12 | model = models.densenet121(pretrained=True) 13 | model.eval() 14 | 15 | 16 | def transform_image(image_bytes): 17 | my_transforms = transforms.Compose([transforms.Resize(255), 18 | transforms.CenterCrop(224), 19 | transforms.ToTensor(), 20 | transforms.Normalize( 21 | [0.485, 0.456, 0.406], 22 | [0.229, 0.224, 0.225])]) 23 | image = Image.open(io.BytesIO(image_bytes)) 24 | return my_transforms(image).unsqueeze(0) 25 | 26 | 27 | def get_prediction(image_bytes): 28 | tensor = transform_image(image_bytes=image_bytes) 29 | outputs = model.forward(tensor) 30 | _, y_hat = outputs.max(1) 31 | predicted_idx = str(y_hat.item()) 32 | return imagenet_class_index[predicted_idx] 33 | 34 | 35 | @app.route('/predict', methods=['POST']) 36 | def predict(): 37 | if request.method == 'POST': 38 | file = request.files['file'] 39 | img_bytes = file.read() 40 | class_id, class_name = get_prediction(image_bytes=img_bytes) 41 | return jsonify({'class_id': class_id, 'class_name': class_name}) 42 | 43 | 44 | if __name__ == '__main__': 45 | app.run() 46 | -------------------------------------------------------------------------------- /appendix/production/inference/flask-api/cat.jpeg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/polarisZhao/pytorch-cookbook/2de073487a0257936cdbb34920df833ad0603a99/appendix/production/inference/flask-api/cat.jpeg -------------------------------------------------------------------------------- /appendix/production/inference/flask-api/requirements.txt: -------------------------------------------------------------------------------- 1 | torchvision==0.6.0+cu101 2 | Flask==1.1.2 3 | Pillow==7.1.2 4 | -------------------------------------------------------------------------------- /appendix/template/.gitignore: -------------------------------------------------------------------------------- 1 | 2 | .DS_Store 3 | build 4 | .git 5 | *.egg-info 6 | dist 7 | output 8 | data/coco 9 | backup 10 | weights/*.weights 11 | __pycache__ 12 | checkpoints 13 | -------------------------------------------------------------------------------- /appendix/template/README.md: -------------------------------------------------------------------------------- 1 | # PyTorch-Template 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | ### TBD: 12 | 13 | - [ ] Test file 14 | - [ ] README 15 | 16 | ## Credit 17 | 18 | https://github.com/eriklindernoren/PyTorch-YOLOv3 -------------------------------------------------------------------------------- /appendix/template/assets/images/01.JPG: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/polarisZhao/pytorch-cookbook/2de073487a0257936cdbb34920df833ad0603a99/appendix/template/assets/images/01.JPG -------------------------------------------------------------------------------- /appendix/template/assets/images/02.JPG: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/polarisZhao/pytorch-cookbook/2de073487a0257936cdbb34920df833ad0603a99/appendix/template/assets/images/02.JPG -------------------------------------------------------------------------------- /appendix/template/data/test/.gitkeep: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/polarisZhao/pytorch-cookbook/2de073487a0257936cdbb34920df833ad0603a99/appendix/template/data/test/.gitkeep -------------------------------------------------------------------------------- /appendix/template/data/test/high/1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/polarisZhao/pytorch-cookbook/2de073487a0257936cdbb34920df833ad0603a99/appendix/template/data/test/high/1.png -------------------------------------------------------------------------------- /appendix/template/data/test/high/22.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/polarisZhao/pytorch-cookbook/2de073487a0257936cdbb34920df833ad0603a99/appendix/template/data/test/high/22.png -------------------------------------------------------------------------------- /appendix/template/data/test/low/1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/polarisZhao/pytorch-cookbook/2de073487a0257936cdbb34920df833ad0603a99/appendix/template/data/test/low/1.png -------------------------------------------------------------------------------- /appendix/template/data/test/low/22.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/polarisZhao/pytorch-cookbook/2de073487a0257936cdbb34920df833ad0603a99/appendix/template/data/test/low/22.png -------------------------------------------------------------------------------- /appendix/template/data/train/high/2.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/polarisZhao/pytorch-cookbook/2de073487a0257936cdbb34920df833ad0603a99/appendix/template/data/train/high/2.png -------------------------------------------------------------------------------- /appendix/template/data/train/high/5.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/polarisZhao/pytorch-cookbook/2de073487a0257936cdbb34920df833ad0603a99/appendix/template/data/train/high/5.png -------------------------------------------------------------------------------- /appendix/template/data/train/low/2.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/polarisZhao/pytorch-cookbook/2de073487a0257936cdbb34920df833ad0603a99/appendix/template/data/train/low/2.png -------------------------------------------------------------------------------- /appendix/template/data/train/low/5.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/polarisZhao/pytorch-cookbook/2de073487a0257936cdbb34920df833ad0603a99/appendix/template/data/train/low/5.png -------------------------------------------------------------------------------- /appendix/template/data/valid/high/1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/polarisZhao/pytorch-cookbook/2de073487a0257936cdbb34920df833ad0603a99/appendix/template/data/valid/high/1.png -------------------------------------------------------------------------------- /appendix/template/data/valid/high/22.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/polarisZhao/pytorch-cookbook/2de073487a0257936cdbb34920df833ad0603a99/appendix/template/data/valid/high/22.png -------------------------------------------------------------------------------- /appendix/template/data/valid/low/1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/polarisZhao/pytorch-cookbook/2de073487a0257936cdbb34920df833ad0603a99/appendix/template/data/valid/low/1.png -------------------------------------------------------------------------------- /appendix/template/data/valid/low/22.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/polarisZhao/pytorch-cookbook/2de073487a0257936cdbb34920df833ad0603a99/appendix/template/data/valid/low/22.png -------------------------------------------------------------------------------- /appendix/template/datasets.py: -------------------------------------------------------------------------------- 1 | import os 2 | import random 3 | from glob import glob 4 | 5 | import numpy as np 6 | from PIL import Image 7 | 8 | import torch 9 | from torchvision import transforms 10 | from torch.utils.data import Dataset, DataLoader 11 | from torchvision.transforms import functional as F 12 | 13 | def transform_train(data, target): 14 | # random crop 15 | i, j, h, w = transforms.RandomCrop.get_params( 16 | data, output_size=(96, 96)) 17 | data = F.crop(data, i, j, h, w) 18 | target = F.crop(target, i, j, h, w) 19 | # hflip 20 | if random.random() > 0.5: 21 | data = F.hflip(data) 22 | target = F.hflip(target) 23 | return F.to_tensor(data), F.to_tensor(target) 24 | 25 | def transform_test(data, target): 26 | return F.to_tensor(data), F.to_tensor(target) 27 | 28 | 29 | class KKDataset(Dataset): 30 | """ 31 | """ 32 | def __init__(self, root_dir='./data/train', is_trainval = True, transform=None): 33 | """ 34 | """ 35 | self.root_dir = root_dir 36 | self.is_trainval = is_trainval 37 | self.transform = transform 38 | self.train_data = sorted(glob(os.path.join(self.root_dir, "low/*.png"))) 39 | self.train_target = sorted(glob(os.path.join(self.root_dir, "high/*.png"))) 40 | 41 | def __len__(self): 42 | return len(self.train_data) 43 | 44 | def __getitem__(self, idx): 45 | if torch.is_tensor(idx): 46 | idx = idx.tolist() 47 | 48 | data = Image.open(self.train_data[idx]) 49 | target = Image.open(self.train_target[idx]) 50 | 51 | if self.transform: 52 | data, target = self.transform(data, target) 53 | 54 | sample = {'data': data, 'target': target} 55 | return sample -------------------------------------------------------------------------------- /appendix/template/logs/.gitignore: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/polarisZhao/pytorch-cookbook/2de073487a0257936cdbb34920df833ad0603a99/appendix/template/logs/.gitignore
--------------------------------------------------------------------------------
/appendix/template/loss.py:
--------------------------------------------------------------------------------
1 | import torch
2 | import torch.nn as nn
3 | 
4 | class RRLoss(nn.Module):
5 |     def __init__(self):
6 |         """ RRLoss"""
7 |         super(RRLoss, self).__init__()
8 | 
9 |     def forward(self, predict, data):
10 |         return torch.mean((predict - data) ** 2)
--------------------------------------------------------------------------------
/appendix/template/metrics.py:
--------------------------------------------------------------------------------
1 | import torch
2 | 
3 | class PSNR():
4 |     """Peak Signal to Noise Ratio
5 |     img1 and img2 have range [0, 255]"""
6 | 
7 |     def __init__(self):
8 |         self.name = "PSNR"
9 | 
10 |     @staticmethod
11 |     def __call__(img1, img2):
12 |         mse = torch.mean((img1 - img2) ** 2)
13 |         return 20 * torch.log10(255.0 / torch.sqrt(mse))
--------------------------------------------------------------------------------
/appendix/template/models.py:
--------------------------------------------------------------------------------
1 | import torch
2 | import torch.nn as nn
3 | import torch.nn.functional as F
4 | 
5 | class XXXNet(nn.Module):
6 |     """
7 |     """
8 |     def __init__(self):
9 |         super(XXXNet, self).__init__()
10 |         self.conv1 = nn.Conv2d(3, 3, kernel_size=3, padding=1)
11 |         # ...
12 | 
13 |     def forward(self, x):
14 |         out = self.conv1(x)
15 |         # ...
16 |         return out
--------------------------------------------------------------------------------
/appendix/template/requirements.txt:
--------------------------------------------------------------------------------
1 | torch==1.3.0+cu100
2 | numpy==1.18.4
3 | torchvision==0.6.0+cu101
4 | Pillow==7.1.2
--------------------------------------------------------------------------------
/appendix/template/test.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/polarisZhao/pytorch-cookbook/2de073487a0257936cdbb34920df833ad0603a99/appendix/template/test.py
--------------------------------------------------------------------------------
/appendix/template/train.py:
--------------------------------------------------------------------------------
1 | from __future__ import print_function
2 | 
3 | import os
4 | import argparse
5 | import shutil
6 | import numpy as np
7 | 
8 | import torch
9 | import torch.nn as nn
10 | import torch.nn.functional as F
11 | import torch.optim as optim
12 | import torch.backends.cudnn as cudnn
13 | from torchvision import datasets, transforms
14 | from torch.utils.data import DataLoader
15 | 
16 | from datasets import KKDataset, transform_train, transform_test #
17 | from models import XXXNet #
18 | from loss import RRLoss #
19 | from metrics import PSNR #
20 | from utils import *
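# NOTE: the bare trailing '#' marks in this template appear to flag the lines to
# swap out per project (dataset, model, loss, metric) -- see train()/test() below.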
21 | 
22 | # Training settings
23 | parser = argparse.ArgumentParser(description='PyTorch Template training')
24 | # env
25 | parser.add_argument('--use_gpu', action='store_true', default=True, help='enables/disables CUDA training')
26 | parser.add_argument('--seed', type=int, default=2020, metavar='S', help='random seed (default: 2020)')
27 | parser.add_argument('--log_path', type=str, default='./logs', metavar='PATH', help='path to save logs (default: current directory/logs)')
28 | # data
29 | parser.add_argument('--train_dataset', type=str, default='./data/train', help='training dataset path')
30 | parser.add_argument('--valid_dataset', type=str, default='./data/valid', help='validation dataset path')
31 | parser.add_argument('--batch_size', type=int, default=16, metavar='N', help='input batch size for training (default: 16)')
32 | parser.add_argument('--val_batch_size', type=int, default=16, metavar='N', help='input batch size for validation (default: 16)')
33 | # models
34 | parser.add_argument('--resume', type=str, default='', metavar='PATH', help='path to a checkpoint to resume from (default: none)')
35 | parser.add_argument('--checkpoint_dir', type=str, default='./checkpoints', metavar='PATH', help='path to save checkpoints (default: current directory/checkpoints)')
36 | # optimizer & lr
37 | parser.add_argument('--init_lr', type=float, default=0.01, metavar='LR', help='initial learning rate (default: 0.01)')
38 | parser.add_argument('--momentum', type=float, default=0.9, metavar='M', help='SGD momentum (default: 0.9)')
39 | parser.add_argument('--weight_decay', '--wd', default=1e-4, type=float, metavar='W', help='weight decay (default: 1e-4)')
40 | # epoch & save-interval
41 | parser.add_argument('--epochs', type=int, default=160, metavar='N', help='number of epochs to train (default: 160)')
42 | parser.add_argument('--start_epoch', type=int, default=0, metavar='N', help='manual epoch number (useful on restarts)')
43 | parser.add_argument('--save_interval', type=int, default=1, metavar='N', help='how many epochs to wait between checkpoint saves')
44 | args = parser.parse_args()
45 | 
46 | 
47 | # global setting
48 | torch.manual_seed(args.seed)
49 | 
50 | device = torch.device("cuda" if (args.use_gpu and torch.cuda.is_available()) else "cpu")
51 | if device.type == "cuda":
52 |     torch.cuda.manual_seed(args.seed)
53 |     cudnn.benchmark = True
54 |     cudnn.enabled = True
55 | 
56 | if not os.path.exists(args.checkpoint_dir):
57 |     os.makedirs(args.checkpoint_dir)
58 | if not os.path.exists(args.log_path):
59 |     os.makedirs(args.log_path)
60 | 
61 | 
62 | # dataset
63 | train_dataset = KKDataset(args.train_dataset, is_trainval = True, transform = transform_train) #
64 | train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=args.batch_size,
65 |                                            shuffle=True, num_workers=0, drop_last=False) #
66 | valid_dataset = KKDataset(args.valid_dataset, is_trainval = True, transform = transform_test) #
67 | valid_loader = torch.utils.data.DataLoader(valid_dataset, batch_size=args.val_batch_size,
68 |                                            shuffle=False, num_workers=0) #
69 | # model & loss
70 | model = XXXNet().to(device) #
71 | lossfunc = RRLoss() #
72 | criterion = PSNR() #
73 | # lr & optimizer
74 | optimizer = optim.SGD(model.parameters(), lr=args.init_lr, momentum=args.momentum, weight_decay=args.weight_decay)
75 | scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[50, 70], gamma=0.1)
76 | 
77 | best_prec = 0.0
78 | 
79 | # load resume
80 | if args.resume:
81 |     if os.path.isfile(args.resume):
82 |         print("=> loading checkpoint '{}'".format(args.resume))
83 |         checkpoint = torch.load(args.resume)
84 |         args.start_epoch = checkpoint['epoch']
85 |         best_prec = checkpoint['best_prec']
86 |         model.load_state_dict(checkpoint['state_dict'])
87 |         optimizer.load_state_dict(checkpoint['optimizer'])
88 |         print("=> loaded checkpoint '{}' (epoch {}) Prec: {:f}"
89 |               .format(args.resume, checkpoint['epoch'], best_prec))
90 |     else:
91 |         print("=> no checkpoint found at '{}'".format(args.resume))
92 | 
93 | def train(epoch):
94 |     model.train()
95 | 
96 |     avg_loss = 0.0
97 |     train_acc = 0.0
98 |     for batch_idx, batchdata in enumerate(train_loader):
99 |         data, target = batchdata["data"], batchdata["target"] #
100 |         data, target = data.to(device), target.to(device) #
101 |         optimizer.zero_grad()
102 | 
103 |         predict = model(data) #
104 |         loss = lossfunc(predict, target) #
105 |         avg_loss += loss.item() #
106 | 
107 |         loss.backward()
108 |         optimizer.step()
109 | 
110 |         print('Train Epoch: {} [{}/{} ({:.1f}%)]\tLoss: {:.6f}'.format(
111 |             epoch, batch_idx * len(data), len(train_loader.dataset),
112 |             100. * batch_idx / len(train_loader), loss.item()))
113 | 
114 |     if (epoch + 1) % args.save_interval == 0:
115 |         state = { 'epoch': epoch + 1,
116 |                   'state_dict': model.state_dict(),
117 |                   'best_prec': best_prec,
118 |                   'optimizer': optimizer.state_dict()}
119 |         model_path = os.path.join(args.checkpoint_dir, 'model_' + str(epoch) + '.pth')
120 |         torch.save(state, model_path)
121 | 
122 | 
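# Usage sketch (assumed invocation): python train.py --train_dataset ./data/train --valid_dataset ./data/valid
# To resume from the rolling checkpoint written by utils.save_checkpoint:
#   python train.py --resume ./checkpoints/checkpoint_latest.pth.tar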
123 | def test():
124 |     model.eval()
125 | 
126 |     test_loss = 0.0
127 |     total_psnr = 0.0
128 |     with torch.no_grad():
129 |         for batch_idx, batchdata in enumerate(valid_loader):
130 |             data, target = batchdata["data"], batchdata["target"] #
131 |             data, target = data.to(device), target.to(device) #
132 |             predict = model(data) #
133 |             test_loss += lossfunc(predict, target).item() #
134 |             total_psnr += criterion(predict * 255, target * 255).item() # PSNR expects the [0, 255] range
135 | 
136 |     test_loss /= len(valid_loader)
137 |     avg_psnr = total_psnr / len(valid_loader)
138 |     print('\nTest set: Average loss: {:.4f}, PSNR: {:.1f}\n'.format(test_loss, avg_psnr))
139 |     return avg_psnr
140 | 
141 | 
142 | for epoch in range(args.start_epoch, args.epochs):
143 |     train(epoch)
144 |     scheduler.step()
145 |     print(optimizer.state_dict()['param_groups'][0]['lr'])
146 | 
147 |     current_prec = test()
148 |     is_best = current_prec > best_prec # flip the comparison if a smaller metric is better
149 |     best_prec = max(current_prec, best_prec) # use min(...) when smaller is better
150 | 
151 |     save_checkpoint({
152 |         'epoch': epoch + 1,
153 |         'state_dict': model.state_dict(),
154 |         'best_prec': best_prec,
155 |         'optimizer': optimizer.state_dict(),
156 |     }, is_best, args.checkpoint_dir)
--------------------------------------------------------------------------------
/appendix/template/utils.py:
--------------------------------------------------------------------------------
1 | import os
2 | import shutil
3 | import torch
4 | 
5 | def save_checkpoint(state, is_best, checkpoint_dir):
6 |     torch.save(state, os.path.join(checkpoint_dir, 'checkpoint_latest.pth.tar'))
7 |     if is_best:
8 |         shutil.copyfile(os.path.join(checkpoint_dir, 'checkpoint_latest.pth.tar'),
9 |                         os.path.join(checkpoint_dir, 'checkpoint_best.pth.tar'))
--------------------------------------------------------------------------------