├── LICENSE.txt
├── README.md
├── images
    ├── basketball.gif
    ├── basketball_soft.gif
    ├── bowling.gif
    ├── bowling_soft.gif
    ├── buildings.jpg
    ├── buildings_soft.jpg
    ├── cars.gif
    ├── cars_soft.gif
    ├── otters.jpg
    ├── otters_soft.jpg
    ├── parkour.gif
    ├── parkour_soft.gif
    ├── pass.gif
    ├── pass_soft.gif
    ├── pizza_toss.gif
    ├── pizza_toss_soft.gif
    ├── puffin.jpg
    ├── puffin_soft.jpg
    ├── tennis_ball.jpg
    ├── tennis_ball_soft.jpg
    ├── tower.jpg
    ├── tower_soft.jpg
    ├── tram.jpg
    └── tram_soft.jpg
├── main
    ├── ._train.py
    ├── models
    │   ├── __init__.py
    │   ├── config.py
    │   ├── densent.py
    │   ├── inception.py
    │   └── resnet.py
    └── train.py
└── pytorch
    ├── ._setup.py
    ├── CUDA
        ├── limits.cuh
        ├── softpool_cuda.cpp
        └── softpool_cuda_kernel.cu
    ├── Makefile
    ├── SoftPool
        ├── __init__.py
        ├── __pycache__
        │   └── idea.cpython-38.pyc
        └── idea.py
    ├── setup.py
    └── test-files
        ├── ._images
        ├── ._out_1
        ├── ._test.py
        └── test.py


/LICENSE.txt:
--------------------------------------------------------------------------------
 1 | MIT License
 2 | 
 3 | Copyright (c) 2020 Alexandros Stergiou
 4 | 
 5 | Permission is hereby granted, free of charge, to any person obtaining a copy
 6 | of this software and associated documentation files (the "Software"), to deal
 7 | in the Software without restriction, including without limitation the rights
 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
 9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 | 
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 | 
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 | 


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
  1 | # Refining activation downsampling with SoftPool
  2 | ![supported versions](https://img.shields.io/badge/python-3.x-brightgreen/?style=flat&logo=python&color=green)
  3 | ![Library](https://img.shields.io/badge/library-PyTorch-blue?logo=Pytorch)
  4 | ![GitHub license](https://img.shields.io/cocoapods/l/AFNetworking)
  5 | 
  6 | 
  7 | --------------------------------------------------------------------------------
  8 | #### Update 10/2021:
  9 | We have extended this work in our paper: ***AdaPool: Exponential Adaptive Pooling for Information-Retaining Downsampling***. Info, code and resources are available at [`alexandrosstergiou/adaPool`](https://github.com/alexandrosstergiou/adaPool)
 10 | 
 11 | ## Abstract
 12 | Convolutional Neural Networks (CNNs) use pooling to decrease the size of activation maps. This process is crucial to increase the receptive fields and to reduce computational requirements of subsequent convolutions. An important feature of the pooling operation is the minimization of information loss, with respect to the initial activation maps, without a significant impact on the computation and memory overhead. To meet these requirements, we propose SoftPool: a fast and efficient method for exponentially weighted activation downsampling. Through experiments across a range of architectures and pooling methods, we demonstrate that SoftPool can retain more information in the reduced activation maps. This refined downsampling leads to improvements in a CNN's classification accuracy. Experiments with pooling layer substitutions on ImageNet1K show an increase in accuracy over both original architectures and other pooling methods. We also test SoftPool on video datasets for action recognition. Again, through the direct replacement of pooling layers, we observe consistent performance improvements while computational loads and memory requirements remain limited. <p align="center">
 13 | 
 14 | <i></i>
 15 | <br>
 16 | <i><p align="center"> To appear in <a href="http://iccv2021.thecvf.com/home"> IEEE International Conference on Computer Vision (ICCV) 2021</a></p></i>
 17 | <p align="center">
 18 | <a href="https://arxiv.org/abs/2101.00440" target="_blank">[arXiv preprint]</a>
 19 | &nbsp;&nbsp;&nbsp;
 20 | <a href="https://openaccess.thecvf.com/content/ICCV2021/html/Stergiou_Refining_Activation_Downsampling_With_SoftPool_ICCV_2021_paper.html" target="_blank">[CVF open access]</a>
 21 | &nbsp;&nbsp;&nbsp;
 22 | <a href="https://www.youtube.com/watch?v=iqsMoVQSyDw" target="_blank">[video presentation]</a>
 23 | </p>
 24 | 
 25 | Image based pooling. Images are sub-sampled in both height and width by half.
 26 | 
 27 | |Original|<img src="images/buildings.jpg" width="130" />|<img src="images/otters.jpg" width="130" />|<img src="images/tennis_ball.jpg" width="130" />|<img src="images/puffin.jpg" width="130" />|<img src="images/tram.jpg" width="130" />|<img src="images/tower.jpg" width="130" />|
 28 | |:-----:|:-----:|:-----:|:-----:|:-----:|:-----:|:-----:|
 29 | |Soft Pool|<img src="images/buildings_soft.jpg" width="130" />|<img src="images/otters_soft.jpg" width="130" />|<img src="images/tennis_ball_soft.jpg" width="130" />|<img src="images/puffin_soft.jpg" width="130" />|<img src="images/tram_soft.jpg" width="130" />|<img src="images/tower_soft.jpg" width="130" />|
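
Halving height and width corresponds to pooling over 2x2 regions. As a rough illustration of the "exponentially weighted" downsampling named in the abstract, the sketch below pools a single flattened window, assuming the weights are a softmax over the activations in that window. The name `soft_pool_window` is hypothetical and is not part of this repository.

```python
import math


def soft_pool_window(values):
    """Softmax-weighted average of the activations in one pooling window."""
    m = max(values)                               # shift by the max for numerical stability
    weights = [math.exp(v - m) for v in values]   # exponential weight per activation
    total = sum(weights)
    return sum(w * v for w, v in zip(weights, values)) / total


window = [0.0, 0.0, 0.0, 4.0]      # one 2x2 region, flattened
avg = sum(window) / len(window)    # what average pooling would return
soft = soft_pool_window(window)    # lies between the average and the maximum
print(avg, soft, max(window))
```

Under this assumption the result stays close to the largest activation while every value in the window still contributes, which is the behaviour the abstract contrasts with other pooling methods.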
 30 | 
 31 | Video based pooling. Videos are sub-sampled in time, height and width by half.
 32 | 
 33 | 
 34 | |Original|<img src="images/cars.gif" width="130" />|<img src="images/basketball.gif" width="130" />|<img src="images/parkour.gif" width="130" />|<img src="images/bowling.gif" width="130" />|<img src="images/pizza_toss.gif" width="130" />|<img src="images/pass.gif" width="130" />|
 35 | |:-----:|:-----:|:-----:|:-----:|:-----:|:-----:|:-----:|
 36 | |Soft Pool|<img src="images/cars_soft.gif" width="130" />|<img src="images/basketball_soft.gif" width="130" />|<img src="images/parkour_soft.gif" width="130" />|<img src="images/bowling_soft.gif" width="130" />|<img src="images/pizza_toss_soft.gif" width="130" />|<img src="images/pass_soft.gif" width="130" />|
 37 | 
 38 | ## Dependencies
 39 | All parts of the code assume that `torch` is of version 1.4 or higher. There might be instability issues on previous versions.
 40 | 
 41 | > ***! Disclaimer:*** The structure of this repository is heavily influenced by Ziteng Gao's LIP repo [https://github.com/sebgao/LIP](https://github.com/sebgao/LIP)
 42 | 
 43 | ## Installation
 44 | 
 45 | You can build the repo through the following commands:
 46 | ```
 47 | $ git clone https://github.com/alexandrosstergiou/SoftPool.git
 48 | $ cd SoftPool/pytorch
 49 | $ make install
 50 | --- (optional) ---
 51 | $ make test
 52 | ```
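
A quick way to confirm that the build produced an importable extension is to ask Python whether it can locate the module. This is only a sketch: the helper name `extension_available` is hypothetical, and it checks that the module can be found, not that the CUDA kernels run correctly.

```python
import importlib.util


def extension_available(name="softpool_cuda"):
    """Return True if a top-level module with this name can be located."""
    return importlib.util.find_spec(name) is not None


if extension_available():
    print("softpool_cuda found")
else:
    print("softpool_cuda not found; re-run `make install` in the pytorch directory")
```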
 53 | 
 54 | 
 55 | ## Usage
 56 | 
 57 | You can load any of the 1D, 2D or 3D variants after the installation with:
 58 | 
 59 | ```python
 60 | import softpool_cuda
 61 | from SoftPool import soft_pool1d, SoftPool1d
 62 | from SoftPool import soft_pool2d, SoftPool2d
 63 | from SoftPool import soft_pool3d, SoftPool3d
 64 | ```
 65 | 
 66 | + `soft_poolxd`: The functional interface for SoftPool (`x` is 1, 2 or 3).
 67 | + `SoftPoolxd`: The class-based version, which creates an object that can be referenced later in the code.
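
The difference between the two interfaces can be sketched without the compiled extension. The pure-PyTorch stand-ins below are assumptions for illustration only: `soft_pool2d_ref` and `SoftPool2dRef` are hypothetical names, they are not the CUDA-backed functions shipped here, and they assume a softmax weighting over each pooling window.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def soft_pool2d_ref(x, kernel_size=2, stride=None):
    """Functional form: softmax-weighted average over each pooling window."""
    stride = kernel_size if stride is None else stride
    w = torch.exp(x)  # exponential weights; this sketch has no overflow guard
    # The 1/(k*k) factors of the two average pools cancel in the ratio.
    return F.avg_pool2d(w * x, kernel_size, stride) / F.avg_pool2d(w, kernel_size, stride)


class SoftPool2dRef(nn.Module):
    """Class-based form: stores the settings once, then acts like any layer."""

    def __init__(self, kernel_size=2, stride=None):
        super().__init__()
        self.kernel_size = kernel_size
        self.stride = stride

    def forward(self, x):
        return soft_pool2d_ref(x, self.kernel_size, self.stride)


x = torch.randn(1, 3, 8, 8)
y_fn = soft_pool2d_ref(x, kernel_size=2, stride=2)  # call directly on a tensor
pool = SoftPool2dRef(kernel_size=2, stride=2)       # build once, reuse later
y_mod = pool(x)
print(y_fn.shape)  # torch.Size([1, 3, 4, 4])
```

The functional form suits one-off calls inside a `forward` method, while the module form can be registered in an `nn.Sequential` alongside other layers.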
 68 | 
 69 | ## ImageNet models
 70 | 
 71 | ImageNet weights can be downloaded from the following links:
 72 | 
 73 | |Network|link|
 74 | |:-----:|:--:|
 75 | | ResNet-18 | [link](https://drive.google.com/file/d/11me4z74Fp4FkGGv_WbMZRQxTr4YJxUHS/view?usp=sharing) |
 76 | | ResNet-34 | [link](https://drive.google.com/file/d/1-5O-r3hCJ7JSrrfVowrUZpaHcp7TcKKT/view?usp=sharing) |
 77 | | ResNet-50 | [link](https://drive.google.com/file/d/1HpBESqJ-QLO_O0pozgh1T3xp4n5MOQLU/view?usp=sharing) |
 78 | | ResNet-101 | [link](https://drive.google.com/file/d/1fng3DFm48W6h-qbFUk-IPZf9s8HsGbdw/view?usp=sharing) |
 79 | | ResNet-152 | [link](https://drive.google.com/file/d/1ejuMgP4DK9pFcVnu1TZo6TELPlrhHJC_/view?usp=sharing) |
 80 | | DenseNet-121 | [link](https://drive.google.com/file/d/1EXIbVI19JyEjgY75caZK2B2-gaxKTVpK/view?usp=sharing) |
 81 | | DenseNet-161 | [link](https://drive.google.com/file/d/18Qs9XUXNPSgBe46_0OGZIcpvdoFZfjU5/view?usp=sharing) |
 82 | | DenseNet-169 | [link](https://drive.google.com/file/d/1shFZV_AIZ6SQFQs-C0YThfpOfZH88hm7/view?usp=sharing) |
 83 | | ResNeXt-50_32x4d | [link](https://drive.google.com/file/d/1-3sd8paTlqa1X8KGUy6B5Eehv791tbVH/view?usp=sharing) |
 84 | | ResNeXt-101_32x4d | [link](https://drive.google.com/file/d/1URDkwAPxDgcQzkYFlV_m-1T5RjZvzabo/view?usp=sharing) |
 85 | | wide-ResNet50 | [link](https://drive.google.com/file/d/1X3A6P0enEJYLeNmY0pUTXA26FEQB1qMe/view?usp=sharing) |
 86 | 
 87 | ## Citation
 88 | 
 89 | ```
 90 | @inproceedings{stergiou2021refining,
 91 |   title={Refining activation downsampling with SoftPool},
 92 |   author={Stergiou, Alexandros and Poppe, Ronald and Kalliatakis, Grigorios},
 93 |   booktitle={International Conference on Computer Vision (ICCV)},
 94 |   year={2021},
 95 |   pages={10357-10366},
 96 |   organization={IEEE}
 97 | }
 98 | ```
 99 | 
100 | ## Licence
101 | 
102 | MIT
103 | 
104 | ## Additional resources
105 | Ren Tianhe's [`pytorch-pooling` repo](https://github.com/rentainhe/pytorch-pooling) is a great project that gives an overview of different pooling strategies.
106 | 


--------------------------------------------------------------------------------
/images/basketball.gif:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/alexandrosstergiou/SoftPool/e11dee7e96ecad895cf871c8cbf220f7908462ed/images/basketball.gif


--------------------------------------------------------------------------------
/images/basketball_soft.gif:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/alexandrosstergiou/SoftPool/e11dee7e96ecad895cf871c8cbf220f7908462ed/images/basketball_soft.gif


--------------------------------------------------------------------------------
/images/bowling.gif:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/alexandrosstergiou/SoftPool/e11dee7e96ecad895cf871c8cbf220f7908462ed/images/bowling.gif


--------------------------------------------------------------------------------
/images/bowling_soft.gif:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/alexandrosstergiou/SoftPool/e11dee7e96ecad895cf871c8cbf220f7908462ed/images/bowling_soft.gif


--------------------------------------------------------------------------------
/images/buildings.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/alexandrosstergiou/SoftPool/e11dee7e96ecad895cf871c8cbf220f7908462ed/images/buildings.jpg


--------------------------------------------------------------------------------
/images/buildings_soft.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/alexandrosstergiou/SoftPool/e11dee7e96ecad895cf871c8cbf220f7908462ed/images/buildings_soft.jpg


--------------------------------------------------------------------------------
/images/cars.gif:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/alexandrosstergiou/SoftPool/e11dee7e96ecad895cf871c8cbf220f7908462ed/images/cars.gif


--------------------------------------------------------------------------------
/images/cars_soft.gif:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/alexandrosstergiou/SoftPool/e11dee7e96ecad895cf871c8cbf220f7908462ed/images/cars_soft.gif


--------------------------------------------------------------------------------
/images/otters.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/alexandrosstergiou/SoftPool/e11dee7e96ecad895cf871c8cbf220f7908462ed/images/otters.jpg


--------------------------------------------------------------------------------
/images/otters_soft.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/alexandrosstergiou/SoftPool/e11dee7e96ecad895cf871c8cbf220f7908462ed/images/otters_soft.jpg


--------------------------------------------------------------------------------
/images/parkour.gif:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/alexandrosstergiou/SoftPool/e11dee7e96ecad895cf871c8cbf220f7908462ed/images/parkour.gif


--------------------------------------------------------------------------------
/images/parkour_soft.gif:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/alexandrosstergiou/SoftPool/e11dee7e96ecad895cf871c8cbf220f7908462ed/images/parkour_soft.gif


--------------------------------------------------------------------------------
/images/pass.gif:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/alexandrosstergiou/SoftPool/e11dee7e96ecad895cf871c8cbf220f7908462ed/images/pass.gif


--------------------------------------------------------------------------------
/images/pass_soft.gif:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/alexandrosstergiou/SoftPool/e11dee7e96ecad895cf871c8cbf220f7908462ed/images/pass_soft.gif


--------------------------------------------------------------------------------
/images/pizza_toss.gif:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/alexandrosstergiou/SoftPool/e11dee7e96ecad895cf871c8cbf220f7908462ed/images/pizza_toss.gif


--------------------------------------------------------------------------------
/images/pizza_toss_soft.gif:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/alexandrosstergiou/SoftPool/e11dee7e96ecad895cf871c8cbf220f7908462ed/images/pizza_toss_soft.gif


--------------------------------------------------------------------------------
/images/puffin.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/alexandrosstergiou/SoftPool/e11dee7e96ecad895cf871c8cbf220f7908462ed/images/puffin.jpg


--------------------------------------------------------------------------------
/images/puffin_soft.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/alexandrosstergiou/SoftPool/e11dee7e96ecad895cf871c8cbf220f7908462ed/images/puffin_soft.jpg


--------------------------------------------------------------------------------
/images/tennis_ball.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/alexandrosstergiou/SoftPool/e11dee7e96ecad895cf871c8cbf220f7908462ed/images/tennis_ball.jpg


--------------------------------------------------------------------------------
/images/tennis_ball_soft.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/alexandrosstergiou/SoftPool/e11dee7e96ecad895cf871c8cbf220f7908462ed/images/tennis_ball_soft.jpg


--------------------------------------------------------------------------------
/images/tower.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/alexandrosstergiou/SoftPool/e11dee7e96ecad895cf871c8cbf220f7908462ed/images/tower.jpg


--------------------------------------------------------------------------------
/images/tower_soft.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/alexandrosstergiou/SoftPool/e11dee7e96ecad895cf871c8cbf220f7908462ed/images/tower_soft.jpg


--------------------------------------------------------------------------------
/images/tram.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/alexandrosstergiou/SoftPool/e11dee7e96ecad895cf871c8cbf220f7908462ed/images/tram.jpg


--------------------------------------------------------------------------------
/images/tram_soft.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/alexandrosstergiou/SoftPool/e11dee7e96ecad895cf871c8cbf220f7908462ed/images/tram_soft.jpg


--------------------------------------------------------------------------------
/main/._train.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/alexandrosstergiou/SoftPool/e11dee7e96ecad895cf871c8cbf220f7908462ed/main/._train.py


--------------------------------------------------------------------------------
/main/models/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/alexandrosstergiou/SoftPool/e11dee7e96ecad895cf871c8cbf220f7908462ed/main/models/__init__.py


--------------------------------------------------------------------------------
/main/models/config.py:
--------------------------------------------------------------------------------
 1 | from .resnet import resnet18, resnet34, resnet50, resnet101, resnet152, resnext50_32x4d, resnext101_32x8d, resnext101_32x4d, resnext101_64x4d, wide_resnet50_2, wide_resnet101_2
 2 | from .densent import densenet121, densenet161, densenet169, densenet201
 3 | from .inception import inception_v3
 4 | 
 5 | models = ['resnet18', 'resnet34', 'resnet50', 'resnet101', 'resnet152', 'resnext50_32x4d', 'resnext101_32x8d', 'resnext101_32x4d', 'resnext101_64x4d', 'wide_resnet50_2', 'wide_resnet101_2', 'densenet121', 'densenet161', 'densenet169', 'densenet201', 'inception_v3']
 6 | 
 7 | def get_model(name,use_softpool, **kwargs):
 8 |     net = None
 9 |     if 'res' in name.lower():
10 |         if '18' in name.lower():
11 |             net = resnet18(use_softpool=use_softpool, **kwargs)
12 |         elif '34' in name.lower():
13 |             net = resnet34(use_softpool=use_softpool, **kwargs)
14 |         elif '50' in name.lower():
15 |             if 'xt' in name.lower():
16 |                 net = resnext50_32x4d(use_softpool=use_softpool, **kwargs)
17 |             elif 'wide' in name.lower():
18 |                 net = wide_resnet50_2(use_softpool=use_softpool, **kwargs)
19 |             else:
20 |                 net = resnet50(use_softpool=use_softpool, **kwargs)
21 |         elif '101' in name.lower():
22 |             if 'xt' in name.lower():
23 |                 if '32x4d' in name.lower():
24 |                     net = resnext101_32x4d(use_softpool=use_softpool, **kwargs)
25 |                 elif '64x4d' in name.lower():
26 |                     net = resnext101_64x4d(use_softpool=use_softpool, **kwargs)
27 |                 elif '32x8d' in name.lower():
28 |                     net = resnext101_32x8d(use_softpool=use_softpool, **kwargs)
29 |             elif 'wide' in name.lower():
30 |                 net = wide_resnet101_2(use_softpool=use_softpool, **kwargs)
31 |             else:
32 |                 net = resnet101(use_softpool=use_softpool, **kwargs)
33 |         elif '152' in name.lower():
34 |             net = resnet152(use_softpool=use_softpool, **kwargs)
35 | 
36 |     elif 'densenet' in name.lower():
37 |         if '121' in name.lower():
38 |             net = densenet121(use_softpool=use_softpool, **kwargs)
39 |         elif '161' in name.lower():
40 |             net = densenet161(use_softpool=use_softpool, **kwargs)
41 |         elif '169' in name.lower():
42 |             net = densenet169(use_softpool=use_softpool, **kwargs)
43 |         elif '201' in name.lower():
44 |             net = densenet201(use_softpool=use_softpool, **kwargs)
45 | 
46 |     elif 'inception' in name.lower():
47 |         net = inception_v3(use_softpool=use_softpool, **kwargs)
48 | 
49 |     if net is None:
50 |         print('Selected architecture not implemented !')
51 |         raise NotImplementedError
52 | 
53 |     return net
54 | 


--------------------------------------------------------------------------------
/main/models/densent.py:
--------------------------------------------------------------------------------
  1 | import re
  2 | import torch
  3 | import torch.nn as nn
  4 | import torch.nn.functional as F
  5 | import torch.utils.checkpoint as cp
  6 | from collections import OrderedDict
  7 | from torch import Tensor
  8 | from torch.jit.annotations import List
  9 | 
 10 | import softpool_cuda
 11 | from SoftPool import soft_pool2d, SoftPool2d
 12 | 
 13 | 
 14 | __all__ = ['DenseNet', 'densenet121', 'densenet169', 'densenet201', 'densenet161']
 15 | 
 16 | 
 17 | class _DenseLayer(nn.Module):
 18 |     def __init__(self, num_input_features, growth_rate, bn_size, drop_rate, memory_efficient=False):
 19 |         super(_DenseLayer, self).__init__()
 20 |         self.add_module('norm1', nn.BatchNorm2d(num_input_features)),
 21 |         self.add_module('relu1', nn.ReLU(inplace=True)),
 22 |         self.add_module('conv1', nn.Conv2d(num_input_features, bn_size *
 23 |                                            growth_rate, kernel_size=1, stride=1,
 24 |                                            bias=False)),
 25 |         self.add_module('norm2', nn.BatchNorm2d(bn_size * growth_rate)),
 26 |         self.add_module('relu2', nn.ReLU(inplace=True)),
 27 |         self.add_module('conv2', nn.Conv2d(bn_size * growth_rate, growth_rate,
 28 |                                            kernel_size=3, stride=1, padding=1,
 29 |                                            bias=False)),
 30 |         self.drop_rate = float(drop_rate)
 31 |         self.memory_efficient = memory_efficient
 32 | 
 33 |     def bn_function(self, inputs):
 34 |         # type: (List[Tensor]) -> Tensor
 35 |         concated_features = torch.cat(inputs, 1)
 36 |         bottleneck_output = self.conv1(self.relu1(self.norm1(concated_features)))  # noqa: T484
 37 |         return bottleneck_output
 38 | 
 39 |     # todo: rewrite when torchscript supports any
 40 |     def any_requires_grad(self, input):
 41 |         # type: (List[Tensor]) -> bool
 42 |         for tensor in input:
 43 |             if tensor.requires_grad:
 44 |                 return True
 45 |         return False
 46 | 
 47 |     @torch.jit.unused  # noqa: T484
 48 |     def call_checkpoint_bottleneck(self, input):
 49 |         # type: (List[Tensor]) -> Tensor
 50 |         def closure(*inputs):
 51 |             return self.bn_function(inputs)
 52 | 
 53 |         return cp.checkpoint(closure, *input)
 54 | 
 55 |     @torch.jit._overload_method  # noqa: F811
 56 |     def forward(self, input):
 57 |         # type: (List[Tensor]) -> (Tensor)
 58 |         pass
 59 | 
 60 |     @torch.jit._overload_method  # noqa: F811
 61 |     def forward(self, input):
 62 |         # type: (Tensor) -> (Tensor)
 63 |         pass
 64 | 
 65 |     # torchscript does not yet support *args, so we overload method
 66 |     # allowing it to take either a List[Tensor] or single Tensor
 67 |     def forward(self, input):  # noqa: F811
 68 |         if isinstance(input, Tensor):
 69 |             prev_features = [input]
 70 |         else:
 71 |             prev_features = input
 72 | 
 73 |         if self.memory_efficient and self.any_requires_grad(prev_features):
 74 |             if torch.jit.is_scripting():
 75 |                 raise Exception("Memory Efficient not supported in JIT")
 76 | 
 77 |             bottleneck_output = self.call_checkpoint_bottleneck(prev_features)
 78 |         else:
 79 |             bottleneck_output = self.bn_function(prev_features)
 80 | 
 81 |         new_features = self.conv2(self.relu2(self.norm2(bottleneck_output)))
 82 |         if self.drop_rate > 0:
 83 |             new_features = F.dropout(new_features, p=self.drop_rate,
 84 |                                      training=self.training)
 85 |         return new_features
 86 | 
 87 | 
 88 | class _DenseBlock(nn.ModuleDict):
 89 |     _version = 2
 90 | 
 91 |     def __init__(self, num_layers, num_input_features, bn_size, growth_rate, drop_rate, memory_efficient=False):
 92 |         super(_DenseBlock, self).__init__()
 93 |         for i in range(num_layers):
 94 |             layer = _DenseLayer(
 95 |                 num_input_features + i * growth_rate,
 96 |                 growth_rate=growth_rate,
 97 |                 bn_size=bn_size,
 98 |                 drop_rate=drop_rate,
 99 |                 memory_efficient=memory_efficient,
100 |             )
101 |             self.add_module('denselayer%d' % (i + 1), layer)
102 | 
103 |     def forward(self, init_features):
104 |         features = [init_features]
105 |         for name, layer in self.items():
106 |             new_features = layer(features)
107 |             features.append(new_features)
108 |         return torch.cat(features, 1)
109 | 
110 | 
111 | class _Transition(nn.Sequential):
112 |     def __init__(self, num_input_features, num_output_features, use_softpool):
113 |         super(_Transition, self).__init__()
114 |         self.add_module('norm', nn.BatchNorm2d(num_input_features))
115 |         self.add_module('relu', nn.ReLU(inplace=True))
116 |         self.add_module('conv', nn.Conv2d(num_input_features, num_output_features,
117 |                                           kernel_size=1, stride=1, bias=False))
118 |         #if not use_softpool:
119 |         self.add_module('pool', nn.AvgPool2d(kernel_size=2, stride=2))
120 |         #else:
121 |         #    self.add_module('pool', SoftPool2d(kernel_size=(2,2), stride=(2,2)))
122 | 
123 | 
124 | 
125 | class DenseNet(nn.Module):
126 |     r"""Densenet-BC model class, based on
127 |     `"Densely Connected Convolutional Networks" <https://arxiv.org/pdf/1608.06993.pdf>`_
128 | 
129 |     Args:
130 |         growth_rate (int) - how many filters to add each layer (`k` in paper)
131 |         block_config (list of 4 ints) - how many layers in each pooling block
132 |         num_init_features (int) - the number of filters to learn in the first convolution layer
133 |         use_softpool (bool) - changes pooling operations to softpooling
134 |         bn_size (int) - multiplicative factor for number of bottle neck layers
135 |           (i.e. bn_size * k features in the bottleneck layer)
136 |         drop_rate (float) - dropout rate after each dense layer
137 |         num_classes (int) - number of classification classes
138 |         memory_efficient (bool) - If True, uses checkpointing. Much more memory efficient,
139 |           but slower. Default: *False*. See `"paper" <https://arxiv.org/pdf/1707.06990.pdf>`_
140 |     """
141 | 
142 |     def __init__(self, growth_rate=32, block_config=(6, 12, 24, 16),
143 |                  num_init_features=64, use_softpool=False,
144 |                  bn_size=4, drop_rate=0,
145 |                  num_classes=1000, memory_efficient=False):
146 | 
147 |         super(DenseNet, self).__init__()
148 | 
149 |         # First convolution
150 |         if not use_softpool:
151 |             self.features = nn.Sequential(OrderedDict([
152 |                 ('conv0', nn.Conv2d(3, num_init_features, kernel_size=7, stride=2,
153 |                                     padding=3, bias=False)),
154 |                 ('norm0', nn.BatchNorm2d(num_init_features)),
155 |                 ('relu0', nn.ReLU(inplace=True)),
156 |                 ('pool0', nn.MaxPool2d(kernel_size=3, stride=2, padding=1)),
157 |             ]))
158 |         else:
159 |             self.features = nn.Sequential(OrderedDict([
160 |                 ('conv0', nn.Conv2d(3, num_init_features, kernel_size=7, stride=2,
161 |                                     padding=3, bias=False)),
162 |                 ('norm0', nn.BatchNorm2d(num_init_features)),
163 |                 ('relu0', nn.ReLU(inplace=True)),
164 |                 ('pool0', SoftPool2d(kernel_size=(2,2), stride=(2,2))),
165 |             ]))
166 | 
167 |         # Each denseblock
168 |         num_features = num_init_features
169 |         for i, num_layers in enumerate(block_config):
170 |             block = _DenseBlock(
171 |                 num_layers=num_layers,
172 |                 num_input_features=num_features,
173 |                 bn_size=bn_size,
174 |                 growth_rate=growth_rate,
175 |                 drop_rate=drop_rate,
176 |                 memory_efficient=memory_efficient
177 |             )
178 |             self.features.add_module('denseblock%d' % (i + 1), block)
179 |             num_features = num_features + num_layers * growth_rate
180 |             if i != len(block_config) - 1:
181 |                 trans = _Transition(num_input_features=num_features,
182 |                                     num_output_features=num_features // 2,
183 |                                     use_softpool=use_softpool)
184 |                 self.features.add_module('transition%d' % (i + 1), trans)
185 |                 num_features = num_features // 2
186 | 
187 |         # Final batch norm
188 |         self.features.add_module('norm5', nn.BatchNorm2d(num_features))
189 | 
190 |         # Linear layer
191 |         self.classifier = nn.Linear(num_features, num_classes)
192 | 
193 |         # Official init from torch repo.
194 |         for m in self.modules():
195 |             if isinstance(m, nn.Conv2d):
196 |                 nn.init.kaiming_normal_(m.weight)
197 |             elif isinstance(m, nn.BatchNorm2d):
198 |                 nn.init.constant_(m.weight, 1)
199 |                 nn.init.constant_(m.bias, 0)
200 |             elif isinstance(m, nn.Linear):
201 |                 nn.init.constant_(m.bias, 0)
202 | 
203 |     def forward(self, x):
204 |         features = self.features(x)
205 |         out = F.relu(features, inplace=True)
206 |         out = F.adaptive_avg_pool2d(out, (1, 1))
207 |         out = torch.flatten(out, 1)
208 |         out = self.classifier(out)
209 |         return out
210 | 
211 | 
212 | def _load_state_dict(model, model_url, progress):
213 |     # '.'s are no longer allowed in module names, but previous _DenseLayer
214 |     # has keys 'norm.1', 'relu.1', 'conv.1', 'norm.2', 'relu.2', 'conv.2'.
215 |     # They are also in the checkpoints in model_urls. This pattern is used
216 |     # to find such keys.
217 |     pattern = re.compile(
218 |         r'^(.*denselayer\d+\.(?:norm|relu|conv))\.((?:[12])\.(?:weight|bias|running_mean|running_var))$')
219 | 
220 |     state_dict = torch.hub.load_state_dict_from_url(model_url, progress=progress)
221 |     for key in list(state_dict.keys()):
222 |         res = pattern.match(key)
223 |         if res:
224 |             new_key = res.group(1) + res.group(2)
225 |             state_dict[new_key] = state_dict[key]
226 |             del state_dict[key]
227 |     model.load_state_dict(state_dict)
228 | 
229 | 
230 | def _densenet(arch, growth_rate, block_config, num_init_features, pretrained, progress,
231 |               use_softpool, **kwargs):
232 |     model = DenseNet(growth_rate, block_config, num_init_features, use_softpool, **kwargs)
233 |     if pretrained:
234 |         _load_state_dict(model, model_urls[arch], progress)
235 |     return model
236 | 
237 | 
238 | def densenet121(pretrained=False, progress=True, use_softpool=False, **kwargs):
239 |     r"""Densenet-121 model from
240 |     `"Densely Connected Convolutional Networks" <https://arxiv.org/pdf/1608.06993.pdf>`_
241 | 
242 |     Args:
243 |         pretrained (bool): If True, returns a model pre-trained on ImageNet
244 |         progress (bool): If True, displays a progress bar of the download to stderr
245 |         memory_efficient (bool): If True, uses checkpointing. Much more memory efficient,
246 |           but slower. Default: *False*. See `"paper" <https://arxiv.org/pdf/1707.06990.pdf>`_
247 |         use_softpool (bool): If True, changes pooling operations to softpooling
248 |     """
249 |     return _densenet('densenet121', 32, (6, 12, 24, 16), 64, pretrained, progress, use_softpool,
250 |                      **kwargs)
251 | 
252 | 
253 | def densenet161(pretrained=False, progress=True, use_softpool=False, **kwargs):
254 |     r"""Densenet-161 model from
255 |     `"Densely Connected Convolutional Networks" <https://arxiv.org/pdf/1608.06993.pdf>`_
256 | 
257 |     Args:
258 |         pretrained (bool): If True, returns a model pre-trained on ImageNet
259 |         progress (bool): If True, displays a progress bar of the download to stderr
260 |         memory_efficient (bool): If True, uses checkpointing. Much more memory efficient,
261 |           but slower. Default: *False*. See `"paper" <https://arxiv.org/pdf/1707.06990.pdf>`_
262 |         use_softpool (bool): If True, changes pooling operations to softpooling
263 |     """
264 |     return _densenet('densenet161', 48, (6, 12, 36, 24), 96, pretrained, progress, use_softpool,
265 |                      **kwargs)
266 | 
267 | 
268 | def densenet169(pretrained=False, progress=True, use_softpool=False, **kwargs):
269 |     r"""Densenet-169 model from
270 |     `"Densely Connected Convolutional Networks" <https://arxiv.org/pdf/1608.06993.pdf>`_
271 | 
272 |     Args:
273 |         pretrained (bool): If True, returns a model pre-trained on ImageNet
274 |         progress (bool): If True, displays a progress bar of the download to stderr
275 |         memory_efficient (bool): If True, uses checkpointing. Much more memory efficient,
276 |           but slower. Default: *False*. See `"paper" <https://arxiv.org/pdf/1707.06990.pdf>`_
277 |         use_softpool (bool): If True, changes pooling operations to softpooling
278 |     """
279 |     return _densenet('densenet169', 32, (6, 12, 32, 32), 64, pretrained, progress, use_softpool,
280 |                      **kwargs)
281 | 
282 | 
283 | def densenet201(pretrained=False, progress=True, use_softpool=False, **kwargs):
284 |     r"""Densenet-201 model from
285 |     `"Densely Connected Convolutional Networks" <https://arxiv.org/pdf/1608.06993.pdf>`_
286 | 
287 |     Args:
288 |         pretrained (bool): If True, returns a model pre-trained on ImageNet
289 |         progress (bool): If True, displays a progress bar of the download to stderr
290 |         memory_efficient (bool): If True, uses checkpointing. Much more memory efficient,
291 |           but slower. Default: *False*. See `"paper" <https://arxiv.org/pdf/1707.06990.pdf>`_
292 |         use_softpool (bool): If True, changes pooling operations to softpooling
293 |     """
294 |     return _densenet('densenet201', 32, (6, 12, 48, 32), 64, pretrained, progress, use_softpool,
295 |                      **kwargs)
296 | 
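
The key renaming done by `_load_state_dict` in densenet.py can be tried in isolation. The following is a minimal sketch, not part of the repository: `remap_legacy_keys` and the sample keys are hypothetical, and plain strings stand in for tensors. It reuses the same regular expression to turn legacy names such as `norm.1` into `norm1`.

```python
import re

# Same pattern as in _load_state_dict: group 1 is the prefix up to
# 'norm'/'relu'/'conv', group 2 is the digit plus the parameter name.
PATTERN = re.compile(
    r'^(.*denselayer\d+\.(?:norm|relu|conv))\.((?:[12])\.(?:weight|bias|running_mean|running_var))$')


def remap_legacy_keys(state_dict):
    """Return a copy of state_dict with legacy 'norm.1'-style keys renamed to 'norm1'."""
    remapped = {}
    for key, value in state_dict.items():
        match = PATTERN.match(key)
        # Joining the two groups drops the '.' that separated them.
        new_key = match.group(1) + match.group(2) if match else key
        remapped[new_key] = value
    return remapped


# Hypothetical checkpoint keys for illustration only.
legacy = {
    'features.denseblock1.denselayer1.norm.1.weight': 'w0',
    'features.denseblock1.denselayer1.conv.2.weight': 'w1',
    'features.conv0.weight': 'w2',
}
remapped = remap_legacy_keys(legacy)
for name in remapped:
    print(name)
```

Unlike the in-place loop in `_load_state_dict`, this sketch builds a new dictionary, which avoids mutating the mapping while iterating over it.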


--------------------------------------------------------------------------------
/main/models/inception.py:
--------------------------------------------------------------------------------
  1 | from collections import namedtuple
  2 | import warnings
  3 | import torch
  4 | import torch.nn as nn
  5 | import torch.nn.functional as F
  6 | from torch.jit.annotations import Optional
  7 | from torch import Tensor
  8 | 
  9 | import softpool_cuda
 10 | from SoftPool import soft_pool2d, SoftPool2d
 11 | 
 12 | __all__ = ['Inception3', 'inception_v3', 'InceptionOutputs', '_InceptionOutputs']
 13 | 
 14 | 
 15 | 
 16 | InceptionOutputs = namedtuple('InceptionOutputs', ['logits', 'aux_logits'])
 17 | InceptionOutputs.__annotations__ = {'logits': torch.Tensor, 'aux_logits': Optional[torch.Tensor]}
 18 | 
 19 | # Script annotations failed with _GoogleNetOutputs = namedtuple ...
 20 | # _InceptionOutputs set here for backwards compat
 21 | _InceptionOutputs = InceptionOutputs
 22 | 
 23 | 
 24 | def inception_v3(pretrained=False, progress=True, use_softpool=True, **kwargs):
 25 |     r"""Inception v3 model architecture from
 26 |     `"Rethinking the Inception Architecture for Computer Vision" <http://arxiv.org/abs/1512.00567>`_.
 27 | 
 28 |     .. note::
 29 |         **Important**: In contrast to the other models the inception_v3 expects tensors with a size of
 30 |         N x 3 x 299 x 299, so ensure your images are sized accordingly.
 31 | 
 32 |     Args:
 33 |         pretrained (bool): If True, returns a model pre-trained on ImageNet
 34 |         progress (bool): If True, displays a progress bar of the download to stderr
 35 |         use_softpool (bool): If True, changes pooling operations to softpooling. Default: *True*
 36 |         aux_logits (bool): If True, add an auxiliary branch that can improve training. Default: *False*
 37 |         transform_input (bool): If True, preprocesses the input according to the method with which it
 38 |             was trained on ImageNet. Default: *False*
 39 |     """
 40 |     if pretrained:
 41 |         if 'transform_input' not in kwargs:
 42 |             kwargs['transform_input'] = True
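 43 |         # The checkpoint stores AuxLogits weights, so the auxiliary branch is always
 44 |         # built for loading and dropped afterwards unless the caller asked for it.
 45 |         # Inception3 defaults to aux_logits=False, so that is the fallback here.
 46 |         original_aux_logits = kwargs.get('aux_logits', False)
 47 |         kwargs['aux_logits'] = True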
 48 |         kwargs['init_weights'] = False  # we are loading weights from a pretrained model
 49 |         model = Inception3(use_softpool, **kwargs)
 50 |         state_dict = torch.hub.load_state_dict_from_url(
 51 |             'https://download.pytorch.org/models/inception_v3_google-1a9a5a14.pth', progress=progress)
 52 |         model.load_state_dict(state_dict)
 53 |         if not original_aux_logits:
 54 |             model.aux_logits = False
 55 |             del model.AuxLogits
 56 |         return model
 57 | 
 58 |     return Inception3(use_softpool, **kwargs)
 59 | 
 60 | class Inception3(nn.Module):
 61 | 
 62 |     def __init__(self, use_softpool=False, num_classes=1000, aux_logits=False, transform_input=False,
 63 |                  inception_blocks=None, init_weights=None):
 64 |         super(Inception3, self).__init__()
 65 |         if inception_blocks is None:
 66 |             inception_blocks = [
 67 |                 BasicConv2d, InceptionA, InceptionB, InceptionC,
 68 |                 InceptionD, InceptionE, InceptionAux
 69 |             ]
 70 |         if init_weights is None:
 71 |             warnings.warn('The default weight initialization of inception_v3 will be changed in future releases of '
 72 |                           'torchvision. If you wish to keep the old behavior (which leads to long initialization times'
 73 |                           ' due to scipy/scipy#11299), please set init_weights=True.', FutureWarning)
 74 |             init_weights = True
 75 |         assert len(inception_blocks) == 7
 76 |         conv_block = inception_blocks[0]
 77 |         inception_a = inception_blocks[1]
 78 |         inception_b = inception_blocks[2]
 79 |         inception_c = inception_blocks[3]
 80 |         inception_d = inception_blocks[4]
 81 |         inception_e = inception_blocks[5]
 82 |         inception_aux = inception_blocks[6]
 83 | 
 84 |         self.aux_logits = aux_logits
 85 |         self.transform_input = transform_input
 86 |         self.Conv2d_1a_3x3 = conv_block(3, 32, kernel_size=3, stride=2)
 87 |         self.Conv2d_2a_3x3 = conv_block(32, 32, kernel_size=3)
 88 |         self.Conv2d_2b_3x3 = conv_block(32, 64, kernel_size=3, padding=1)
 89 | 
 90 |         if not use_softpool:
 91 |             self.pool1 = nn.MaxPool2d(kernel_size=3, stride=2)
 92 |         else:
 93 |             self.pool1 = SoftPool2d(kernel_size=3, stride=2)
 94 | 
 95 |         self.Conv2d_3b_1x1 = conv_block(64, 80, kernel_size=1)
 96 |         self.Conv2d_4a_3x3 = conv_block(80, 192, kernel_size=3)
 97 | 
 98 |         if not use_softpool:
 99 |             self.pool2 = nn.MaxPool2d(kernel_size=3, stride=2)
100 |         else:
101 |             self.pool2 = SoftPool2d(kernel_size=3, stride=2)
102 | 
103 |         self.Mixed_5b = inception_a(192, pool_features=32, use_softpool=use_softpool, pad=False)
104 |         self.Mixed_5c = inception_a(256, pool_features=64, use_softpool=use_softpool, pad=False)
105 |         self.Mixed_5d = inception_a(288, pool_features=64, use_softpool=use_softpool, pad=False)
106 |         self.Mixed_6a = inception_b(288, use_softpool=use_softpool)
107 |         self.Mixed_6b = inception_c(768, channels_7x7=128, use_softpool=use_softpool, pad=False)
108 |         self.Mixed_6c = inception_c(768, channels_7x7=160, use_softpool=use_softpool, pad=False)
109 |         self.Mixed_6d = inception_c(768, channels_7x7=160, use_softpool=use_softpool, pad=False)
110 |         self.Mixed_6e = inception_c(768, channels_7x7=192, use_softpool=use_softpool, pad=False)
111 |         if aux_logits:
112 |             self.AuxLogits = inception_aux(768, num_classes, use_softpool=use_softpool)
113 |         self.Mixed_7a = inception_d(768, use_softpool=use_softpool)
114 |         self.Mixed_7b = inception_e(1280, use_softpool=use_softpool, pad=False)
115 |         self.Mixed_7c = inception_e(2048, use_softpool=use_softpool, pad=False)
116 |         self.avgpool = nn.AdaptiveAvgPool2d((1, 1))
117 |         self.dropout = nn.Dropout()
118 |         self.fc = nn.Linear(2048, num_classes)
119 |         if init_weights:
120 |             for m in self.modules():
121 |                 if isinstance(m, nn.Conv2d) or isinstance(m, nn.Linear):
122 |                     import scipy.stats as stats
123 |                     stddev = m.stddev if hasattr(m, 'stddev') else 0.1
124 |                     X = stats.truncnorm(-2, 2, scale=stddev)
125 |                     values = torch.as_tensor(X.rvs(m.weight.numel()), dtype=m.weight.dtype)
126 |                     values = values.view(m.weight.size())
127 |                     with torch.no_grad():
128 |                         m.weight.copy_(values)
129 |                 elif isinstance(m, nn.BatchNorm2d):
130 |                     nn.init.constant_(m.weight, 1)
131 |                     nn.init.constant_(m.bias, 0)
132 | 
133 |     def _transform_input(self, x):
134 |         if self.transform_input:
135 |             x_ch0 = torch.unsqueeze(x[:, 0], 1) * (0.229 / 0.5) + (0.485 - 0.5) / 0.5
136 |             x_ch1 = torch.unsqueeze(x[:, 1], 1) * (0.224 / 0.5) + (0.456 - 0.5) / 0.5
137 |             x_ch2 = torch.unsqueeze(x[:, 2], 1) * (0.225 / 0.5) + (0.406 - 0.5) / 0.5
138 |             x = torch.cat((x_ch0, x_ch1, x_ch2), 1)
139 |         return x
140 | 
141 |     def _forward(self, x):
142 |         # N x 3 x 299 x 299
143 |         x = self.Conv2d_1a_3x3(x)
144 |         # N x 32 x 149 x 149
145 |         x = self.Conv2d_2a_3x3(x)
146 |         # N x 32 x 147 x 147
147 |         x = self.Conv2d_2b_3x3(x)
148 |         # N x 64 x 147 x 147
149 |         x = self.pool1(x)
150 |         # N x 64 x 73 x 73
151 |         x = self.Conv2d_3b_1x1(x)
152 |         # N x 80 x 73 x 73
153 |         x = self.Conv2d_4a_3x3(x)
154 |         # N x 192 x 71 x 71
155 |         x = self.pool2(x)
156 |         # N x 192 x 35 x 35
157 |         x = self.Mixed_5b(x)
158 |         # N x 256 x 35 x 35
159 |         x = self.Mixed_5c(x)
160 |         # N x 288 x 35 x 35
161 |         x = self.Mixed_5d(x)
162 |         # N x 288 x 35 x 35
163 |         x = self.Mixed_6a(x)
164 |         # N x 768 x 17 x 17
165 |         x = self.Mixed_6b(x)
166 |         # N x 768 x 17 x 17
167 |         x = self.Mixed_6c(x)
168 |         # N x 768 x 17 x 17
169 |         x = self.Mixed_6d(x)
170 |         # N x 768 x 17 x 17
171 |         x = self.Mixed_6e(x)
172 |         # N x 768 x 17 x 17
173 |         aux_defined = self.training and self.aux_logits
174 |         if aux_defined:
175 |             aux = self.AuxLogits(x)
176 |         else:
177 |             aux = None
178 |         # N x 768 x 17 x 17
179 |         x = self.Mixed_7a(x)
180 |         # N x 1280 x 8 x 8
181 |         x = self.Mixed_7b(x)
182 |         # N x 2048 x 8 x 8
183 |         x = self.Mixed_7c(x)
184 |         # N x 2048 x 8 x 8
185 |         # Adaptive average pooling
186 |         x = self.avgpool(x)
187 |         # N x 2048 x 1 x 1
188 |         x = self.dropout(x)
189 |         # N x 2048 x 1 x 1
190 |         x = torch.flatten(x, 1)
191 |         # N x 2048
192 |         x = self.fc(x)
193 |         # N x 1000 (num_classes)
194 |         return x, aux
195 | 
196 |     @torch.jit.unused
197 |     def eager_outputs(self, x, aux):
198 |         # type: (Tensor, Optional[Tensor]) -> InceptionOutputs
199 |         if self.training and self.aux_logits:
200 |             return InceptionOutputs(x, aux)
201 |         else:
202 |             return x
203 | 
204 |     def forward(self, x):
205 |         x = self._transform_input(x)
206 |         x, aux = self._forward(x)
207 |         aux_defined = self.training and self.aux_logits
208 |         if torch.jit.is_scripting():
209 |             if not aux_defined:
210 |                 warnings.warn("Scripted Inception3 always returns Inception3 Tuple")
211 |             return InceptionOutputs(x, aux)
212 |         else:
213 |             return self.eager_outputs(x, aux)
214 | 
215 | 
216 | class InceptionA(nn.Module):
217 | 
218 |     def __init__(self, in_channels, pool_features, conv_block=None, use_softpool=False, pad=True):
219 |         super(InceptionA, self).__init__()
220 |         if conv_block is None:
221 |             conv_block = BasicConv2d
222 |         self.pad = pad
223 | 
224 |         self.branch1x1 = conv_block(in_channels, 64, kernel_size=1)
225 | 
226 |         self.branch5x5_1 = conv_block(in_channels, 48, kernel_size=1)
227 |         self.branch5x5_2 = conv_block(48, 64, kernel_size=5, padding=2)
228 | 
229 |         self.branch3x3dbl_1 = conv_block(in_channels, 64, kernel_size=1)
230 |         self.branch3x3dbl_2 = conv_block(64, 96, kernel_size=3, padding=1)
231 |         self.branch3x3dbl_3 = conv_block(96, 96, kernel_size=3, padding=1)
232 | 
233 |         self.branch_pool = conv_block(in_channels, pool_features, kernel_size=1)
234 | 
235 |         self.use_softpool = use_softpool
236 | 
237 |     def _forward(self, x):
238 |         branch1x1 = self.branch1x1(x)
239 | 
240 |         branch5x5 = self.branch5x5_1(x)
241 |         branch5x5 = self.branch5x5_2(branch5x5)
242 | 
243 |         branch3x3dbl = self.branch3x3dbl_1(x)
244 |         branch3x3dbl = self.branch3x3dbl_2(branch3x3dbl)
245 |         branch3x3dbl = self.branch3x3dbl_3(branch3x3dbl)
246 | 
247 |         if not self.use_softpool:
248 |             branch_pool = F.avg_pool2d(x, kernel_size=3, stride=1, padding=1)
249 |         else:
250 |             if self.pad:
251 |                 branch_pool = F.pad(x, (1, 1, 1, 1), 'constant', 0)
252 |             else:
253 |                 branch_pool = F.pad(x, (0, 0, 0, 0), 'constant', 0)
254 |             branch_pool = soft_pool2d(branch_pool, kernel_size=3, stride=1)
255 | 
256 |         branch_pool = self.branch_pool(branch_pool)
257 | 
258 |         outputs = [branch1x1, branch5x5, branch3x3dbl, branch_pool]
259 |         return outputs
260 | 
261 |     def forward(self, x):
262 |         outputs = self._forward(x)
263 |         return torch.cat(outputs, 1)
264 | 
265 | 
266 | class InceptionB(nn.Module):
267 | 
268 |     def __init__(self, in_channels, conv_block=None, use_softpool=False):
269 |         super(InceptionB, self).__init__()
270 |         if conv_block is None:
271 |             conv_block = BasicConv2d
272 |         self.branch3x3 = conv_block(in_channels, 384, kernel_size=3, stride=2)
273 | 
274 |         self.branch3x3dbl_1 = conv_block(in_channels, 64, kernel_size=1)
275 |         self.branch3x3dbl_2 = conv_block(64, 96, kernel_size=3, padding=1)
276 |         self.branch3x3dbl_3 = conv_block(96, 96, kernel_size=3, stride=2)
277 | 
278 |         self.use_softpool = use_softpool
279 | 
280 |     def _forward(self, x):
281 |         branch3x3 = self.branch3x3(x)
282 | 
283 |         branch3x3dbl = self.branch3x3dbl_1(x)
284 |         branch3x3dbl = self.branch3x3dbl_2(branch3x3dbl)
285 |         branch3x3dbl = self.branch3x3dbl_3(branch3x3dbl)
286 | 
287 |         if not self.use_softpool:
288 |             branch_pool = F.max_pool2d(x, kernel_size=3, stride=2)
289 |         else:
290 |             branch_pool = soft_pool2d(x, kernel_size=3, stride=2)
291 | 
292 |         outputs = [branch3x3, branch3x3dbl, branch_pool]
293 |         return outputs
294 | 
295 |     def forward(self, x):
296 |         outputs = self._forward(x)
297 |         return torch.cat(outputs, 1)
298 | 
299 | 
300 | class InceptionC(nn.Module):
301 | 
302 |     def __init__(self, in_channels, channels_7x7, conv_block=None, use_softpool=False, pad=True):
303 |         super(InceptionC, self).__init__()
304 |         if conv_block is None:
305 |             conv_block = BasicConv2d
306 |         self.pad = pad
307 | 
308 |         self.branch1x1 = conv_block(in_channels, 192, kernel_size=1)
309 | 
310 |         c7 = channels_7x7
311 |         self.branch7x7_1 = conv_block(in_channels, c7, kernel_size=1)
312 |         self.branch7x7_2 = conv_block(c7, c7, kernel_size=(1, 7), padding=(0, 3))
313 |         self.branch7x7_3 = conv_block(c7, 192, kernel_size=(7, 1), padding=(3, 0))
314 | 
315 |         self.branch7x7dbl_1 = conv_block(in_channels, c7, kernel_size=1)
316 |         self.branch7x7dbl_2 = conv_block(c7, c7, kernel_size=(7, 1), padding=(3, 0))
317 |         self.branch7x7dbl_3 = conv_block(c7, c7, kernel_size=(1, 7), padding=(0, 3))
318 |         self.branch7x7dbl_4 = conv_block(c7, c7, kernel_size=(7, 1), padding=(3, 0))
319 |         self.branch7x7dbl_5 = conv_block(c7, 192, kernel_size=(1, 7), padding=(0, 3))
320 | 
321 |         self.branch_pool = conv_block(in_channels, 192, kernel_size=1)
322 | 
323 |         self.use_softpool = use_softpool
324 | 
325 |     def _forward(self, x):
326 |         branch1x1 = self.branch1x1(x)
327 | 
328 |         branch7x7 = self.branch7x7_1(x)
329 |         branch7x7 = self.branch7x7_2(branch7x7)
330 |         branch7x7 = self.branch7x7_3(branch7x7)
331 | 
332 |         branch7x7dbl = self.branch7x7dbl_1(x)
333 |         branch7x7dbl = self.branch7x7dbl_2(branch7x7dbl)
334 |         branch7x7dbl = self.branch7x7dbl_3(branch7x7dbl)
335 |         branch7x7dbl = self.branch7x7dbl_4(branch7x7dbl)
336 |         branch7x7dbl = self.branch7x7dbl_5(branch7x7dbl)
337 | 
338 |         if not self.use_softpool:
339 |             branch_pool = F.avg_pool2d(x, kernel_size=3, stride=1, padding=1)
340 |         else:
341 |             if self.pad:
342 |                 branch_pool = F.pad(x, (1, 1, 1, 1), 'constant', 0)
343 |             else:
344 |                 branch_pool = F.pad(x, (0, 0, 0, 0), 'constant', 0)
345 |             branch_pool = soft_pool2d(branch_pool, kernel_size=3, stride=1)
346 | 
347 |         branch_pool = self.branch_pool(branch_pool)
348 | 
349 |         outputs = [branch1x1, branch7x7, branch7x7dbl, branch_pool]
350 |         return outputs
351 | 
352 |     def forward(self, x):
353 |         outputs = self._forward(x)
354 |         return torch.cat(outputs, 1)
355 | 
356 | 
357 | class InceptionD(nn.Module):
358 | 
359 |     def __init__(self, in_channels, conv_block=None, use_softpool=False):
360 |         super(InceptionD, self).__init__()
361 |         if conv_block is None:
362 |             conv_block = BasicConv2d
363 |         self.branch3x3_1 = conv_block(in_channels, 192, kernel_size=1)
364 |         self.branch3x3_2 = conv_block(192, 320, kernel_size=3, stride=2)
365 | 
366 |         self.branch7x7x3_1 = conv_block(in_channels, 192, kernel_size=1)
367 |         self.branch7x7x3_2 = conv_block(192, 192, kernel_size=(1, 7), padding=(0, 3))
368 |         self.branch7x7x3_3 = conv_block(192, 192, kernel_size=(7, 1), padding=(3, 0))
369 |         self.branch7x7x3_4 = conv_block(192, 192, kernel_size=3, stride=2)
370 | 
371 |         self.use_softpool = use_softpool
372 | 
373 |     def _forward(self, x):
374 |         branch3x3 = self.branch3x3_1(x)
375 |         branch3x3 = self.branch3x3_2(branch3x3)
376 | 
377 |         branch7x7x3 = self.branch7x7x3_1(x)
378 |         branch7x7x3 = self.branch7x7x3_2(branch7x7x3)
379 |         branch7x7x3 = self.branch7x7x3_3(branch7x7x3)
380 |         branch7x7x3 = self.branch7x7x3_4(branch7x7x3)
381 | 
382 |         if not self.use_softpool:
383 |             branch_pool = F.max_pool2d(x, kernel_size=3, stride=2)
384 |         else:
385 |             branch_pool = soft_pool2d(x, kernel_size=3, stride=2)
386 |         outputs = [branch3x3, branch7x7x3, branch_pool]
387 |         return outputs
388 | 
389 |     def forward(self, x):
390 |         outputs = self._forward(x)
391 |         return torch.cat(outputs, 1)
392 | 
393 | 
394 | class InceptionE(nn.Module):
395 | 
396 |     def __init__(self, in_channels, conv_block=None, use_softpool=False, pad=True):
397 |         super(InceptionE, self).__init__()
398 |         if conv_block is None:
399 |             conv_block = BasicConv2d
400 |         self.pad = pad
401 | 
402 |         self.branch1x1 = conv_block(in_channels, 320, kernel_size=1)
403 | 
404 |         self.branch3x3_1 = conv_block(in_channels, 384, kernel_size=1)
405 |         self.branch3x3_2a = conv_block(384, 384, kernel_size=(1, 3), padding=(0, 1))
406 |         self.branch3x3_2b = conv_block(384, 384, kernel_size=(3, 1), padding=(1, 0))
407 | 
408 |         self.branch3x3dbl_1 = conv_block(in_channels, 448, kernel_size=1)
409 |         self.branch3x3dbl_2 = conv_block(448, 384, kernel_size=3, padding=1)
410 |         self.branch3x3dbl_3a = conv_block(384, 384, kernel_size=(1, 3), padding=(0, 1))
411 |         self.branch3x3dbl_3b = conv_block(384, 384, kernel_size=(3, 1), padding=(1, 0))
412 | 
413 |         self.branch_pool = conv_block(in_channels, 192, kernel_size=1)
414 | 
415 |         self.use_softpool = use_softpool
416 | 
417 |     def _forward(self, x):
418 |         branch1x1 = self.branch1x1(x)
419 | 
420 |         branch3x3 = self.branch3x3_1(x)
421 |         branch3x3 = [
422 |             self.branch3x3_2a(branch3x3),
423 |             self.branch3x3_2b(branch3x3),
424 |         ]
425 |         branch3x3 = torch.cat(branch3x3, 1)
426 | 
427 |         branch3x3dbl = self.branch3x3dbl_1(x)
428 |         branch3x3dbl = self.branch3x3dbl_2(branch3x3dbl)
429 |         branch3x3dbl = [
430 |             self.branch3x3dbl_3a(branch3x3dbl),
431 |             self.branch3x3dbl_3b(branch3x3dbl),
432 |         ]
433 |         branch3x3dbl = torch.cat(branch3x3dbl, 1)
434 | 
435 |         if not self.use_softpool:
436 |             branch_pool = F.avg_pool2d(x, kernel_size=3, stride=1, padding=1)
437 |         else:
438 |             if self.pad:
439 |                 branch_pool = F.pad(x, (1, 1, 1, 1), 'constant', 0)
440 |             else:
441 |                 branch_pool = F.pad(x, (0, 0, 0, 0), 'constant', 0)
442 |             branch_pool = soft_pool2d(branch_pool, kernel_size=3, stride=1)
443 | 
444 |         branch_pool = self.branch_pool(branch_pool)
445 | 
446 |         outputs = [branch1x1, branch3x3, branch3x3dbl, branch_pool]
447 |         return outputs
448 | 
449 |     def forward(self, x):
450 |         outputs = self._forward(x)
451 |         return torch.cat(outputs, 1)
452 | 
453 | 
454 | class InceptionAux(nn.Module):
455 | 
456 |     def __init__(self, in_channels, num_classes, conv_block=None, use_softpool=False):
457 |         super(InceptionAux, self).__init__()
458 |         if conv_block is None:
459 |             conv_block = BasicConv2d
460 |         self.conv0 = conv_block(in_channels, 128, kernel_size=1)
461 |         self.conv1 = conv_block(128, 768, kernel_size=5)
462 |         self.conv1.stddev = 0.01
463 |         self.fc = nn.Linear(768, num_classes)
464 |         self.fc.stddev = 0.001
465 | 
466 |         self.use_softpool = use_softpool
467 | 
468 |     def forward(self, x):
469 |         # N x 768 x 17 x 17
470 |         if not self.use_softpool:
471 |             x = F.avg_pool2d(x, kernel_size=5, stride=3)
472 |         else:
473 |             x = soft_pool2d(x, kernel_size=5, stride=3)
474 |         # N x 768 x 5 x 5
475 |         x = self.conv0(x)
476 |         # N x 128 x 5 x 5
477 |         x = self.conv1(x)
478 |         # N x 768 x 1 x 1
479 |         # Adaptive average pooling
480 |         x = F.adaptive_avg_pool2d(x, (1, 1))
481 |         # N x 768 x 1 x 1
482 |         x = torch.flatten(x, 1)
483 |         # N x 768
484 |         x = self.fc(x)
485 |         # N x 1000
486 |         return x
487 | 
488 | 
489 | class BasicConv2d(nn.Module):
490 | 
491 |     def __init__(self, in_channels, out_channels, **kwargs):
492 |         super(BasicConv2d, self).__init__()
493 |         self.conv = nn.Conv2d(in_channels, out_channels, bias=False, **kwargs)
494 |         self.bn = nn.BatchNorm2d(out_channels, eps=0.001)
495 | 
496 |     def forward(self, x):
497 |         x = self.conv(x)
498 |         x = self.bn(x)
499 |         return F.relu(x, inplace=True)
500 | 
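
The shape comments in `Inception3._forward` and `InceptionAux.forward` can be checked with a little arithmetic. The sketch below is not part of the repository: `pool_out_size` is a hypothetical helper, and it assumes the pooling windows follow the usual floor-mode sizing without dilation, as `nn.MaxPool2d` does by default. Whether `soft_pool2d` sizes its output the same way is an assumption here, not something confirmed by the code above.

```python
def pool_out_size(size, kernel_size, stride, padding=0):
    """Spatial output size of a floor-mode pooling window without dilation."""
    return (size + 2 * padding - kernel_size) // stride + 1


# Stem pools: kernel 3, stride 2, no padding.
after_pool1 = pool_out_size(147, kernel_size=3, stride=2)
after_pool2 = pool_out_size(71, kernel_size=3, stride=2)

# Auxiliary head: kernel 5, stride 3 on a 17 x 17 map.
after_aux_pool = pool_out_size(17, kernel_size=5, stride=3)

# Branch pools inside the Inception blocks: kernel 3, stride 1.
# One pixel of padding per side keeps the size; no padding shrinks it by 2.
padded_branch = pool_out_size(35, kernel_size=3, stride=1, padding=1)
unpadded_branch = pool_out_size(35, kernel_size=3, stride=1)

print(after_pool1, after_pool2, after_aux_pool, padded_branch, unpadded_branch)
```

Under this sizing rule, a branch pool has to be padded to keep the same height and width as the convolution branches it is concatenated with, which is what the `F.pad(x, (1, 1, 1, 1), ...)` path in the blocks provides.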


--------------------------------------------------------------------------------
/main/models/resnet.py:
--------------------------------------------------------------------------------
  1 | import torch
  2 | import torch.nn as nn
  3 | 
  4 | import softpool_cuda
  5 | from SoftPool import soft_pool2d, SoftPool2d
  6 | 
  7 | 
  8 | __all__ = ['ResNet', 'resnet18', 'resnet34', 'resnet50', 'resnet101',
  9 |            'resnet152', 'resnext50_32x4d', 'resnext101_32x8d',
 10 |            'wide_resnet50_2', 'wide_resnet101_2']
 11 | 
 12 | 
 13 | def conv3x3(in_planes, out_planes, stride=1, groups=1, dilation=1):
 14 |     """3x3 convolution with padding"""
 15 |     return nn.Conv2d(in_planes, out_planes, kernel_size=3, stride=stride,
 16 |                      padding=dilation, groups=groups, bias=False, dilation=dilation)
 17 | 
 18 | 
 19 | def conv1x1(in_planes, out_planes, stride=1):
 20 |     """1x1 convolution"""
 21 |     return nn.Conv2d(in_planes, out_planes, kernel_size=1, stride=stride, bias=False)
 22 | 
 23 | 
 24 | class BasicBlock(nn.Module):
 25 |     expansion = 1
 26 | 
 27 |     def __init__(self, inplanes, planes, stride=1, downsample=None, groups=1,
 28 |                  base_width=64, dilation=1, norm_layer=None):
 29 |         super(BasicBlock, self).__init__()
 30 |         if norm_layer is None:
 31 |             norm_layer = nn.BatchNorm2d
 32 |         if groups != 1 or base_width != 64:
 33 |             raise ValueError('BasicBlock only supports groups=1 and base_width=64')
 34 |         if dilation > 1:
 35 |             raise NotImplementedError("Dilation > 1 not supported in BasicBlock")
 36 |         # Both self.conv1 and self.downsample layers downsample the input when stride != 1
 37 |         self.conv1 = conv3x3(inplanes, planes, stride)
 38 |         self.bn1 = norm_layer(planes)
 39 |         self.relu = nn.ReLU(inplace=True)
 40 |         self.conv2 = conv3x3(planes, planes)
 41 |         self.bn2 = norm_layer(planes)
 42 |         self.downsample = downsample
 43 |         self.stride = stride
 44 | 
 45 |     def forward(self, x):
 46 |         identity = x
 47 | 
 48 |         out = self.conv1(x)
 49 |         out = self.bn1(out)
 50 |         out = self.relu(out)
 51 | 
 52 |         out = self.conv2(out)
 53 |         out = self.bn2(out)
 54 | 
 55 |         if self.downsample is not None:
 56 |             identity = self.downsample(x)
 57 | 
 58 |         out += identity
 59 |         out = self.relu(out)
 60 | 
 61 |         return out
 62 | 
 63 | 
 64 | class Bottleneck(nn.Module):
 65 |     # Bottleneck in torchvision places the stride for downsampling at 3x3 convolution(self.conv2)
 66 |     # while original implementation places the stride at the first 1x1 convolution(self.conv1)
 67 |     # according to "Deep residual learning for image recognition" https://arxiv.org/abs/1512.03385.
 68 |     # This variant is also known as ResNet V1.5 and improves accuracy according to
 69 |     # https://ngc.nvidia.com/catalog/model-scripts/nvidia:resnet_50_v1_5_for_pytorch.
 70 | 
 71 |     expansion = 4
 72 | 
 73 |     def __init__(self, inplanes, planes, stride=1, downsample=None, groups=1,
 74 |                  base_width=64, dilation=1, norm_layer=None):
 75 |         super(Bottleneck, self).__init__()
 76 |         if norm_layer is None:
 77 |             norm_layer = nn.BatchNorm2d
 78 |         width = int(planes * (base_width / 64.)) * groups
 79 |         # Both self.conv2 and self.downsample layers downsample the input when stride != 1
 80 |         self.conv1 = conv1x1(inplanes, width)
 81 |         self.bn1 = norm_layer(width)
 82 |         self.conv2 = conv3x3(width, width, stride, groups, dilation)
 83 |         self.bn2 = norm_layer(width)
 84 |         self.conv3 = conv1x1(width, planes * self.expansion)
 85 |         self.bn3 = norm_layer(planes * self.expansion)
 86 |         self.relu = nn.ReLU(inplace=True)
 87 |         self.downsample = downsample
 88 |         self.stride = stride
 89 | 
 90 |     def forward(self, x):
 91 |         identity = x
 92 | 
 93 |         out = self.conv1(x)
 94 |         out = self.bn1(out)
 95 |         out = self.relu(out)
 96 | 
 97 |         out = self.conv2(out)
 98 |         out = self.bn2(out)
 99 |         out = self.relu(out)
100 | 
101 |         out = self.conv3(out)
102 |         out = self.bn3(out)
103 | 
104 |         if self.downsample is not None:
105 |             identity = self.downsample(x)
106 | 
107 |         out += identity
108 |         out = self.relu(out)
109 | 
110 |         return out
111 | 
112 | 
113 | class ResNet(nn.Module):
114 | 
115 |     def __init__(self, block, layers, use_softpool=True, num_classes=1000, zero_init_residual=False, groups=1, width_per_group=64, replace_stride_with_dilation=None, norm_layer=None):
116 |         super(ResNet, self).__init__()
117 |         if norm_layer is None:
118 |             norm_layer = nn.BatchNorm2d
119 |         self._norm_layer = norm_layer
120 | 
121 |         self.inplanes = 64
122 |         self.dilation = 1
123 |         if replace_stride_with_dilation is None:
124 |             # each element in the tuple indicates if we should replace
125 |             # the 2x2 stride with a dilated convolution instead
126 |             replace_stride_with_dilation = [False, False, False]
127 |         if len(replace_stride_with_dilation) != 3:
128 |             raise ValueError("replace_stride_with_dilation should be None "
129 |                              "or a 3-element tuple, got {}".format(replace_stride_with_dilation))
130 |         self.groups = groups
131 |         self.base_width = width_per_group
132 |         self.conv1 = nn.Conv2d(3, self.inplanes, kernel_size=7, stride=2, padding=3,
133 |                                bias=False)
134 |         self.bn1 = norm_layer(self.inplanes)
135 |         self.relu = nn.ReLU(inplace=True)
136 | 
137 |         if not use_softpool:
138 |             self.pool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
139 |         else:
140 |             self.pool = SoftPool2d(kernel_size=(2, 2), stride=(2, 2))
141 | 
142 |         self.layer1 = self._make_layer(block, 64, layers[0])
143 |         self.layer2 = self._make_layer(block, 128, layers[1], stride=2,
144 |                                        dilate=replace_stride_with_dilation[0])
145 |         self.layer3 = self._make_layer(block, 256, layers[2], stride=2,
146 |                                        dilate=replace_stride_with_dilation[1])
147 |         self.layer4 = self._make_layer(block, 512, layers[3], stride=2,
148 |                                        dilate=replace_stride_with_dilation[2])
149 |         self.avgpool = nn.AdaptiveAvgPool2d((1, 1))
150 |         self.fc = nn.Linear(512 * block.expansion, num_classes)
151 | 
152 |         for m in self.modules():
153 |             if isinstance(m, nn.Conv2d):
154 |                 nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
155 |             elif isinstance(m, (nn.BatchNorm2d, nn.GroupNorm)):
156 |                 nn.init.constant_(m.weight, 1)
157 |                 nn.init.constant_(m.bias, 0)
158 | 
159 |         # Zero-initialize the last BN in each residual branch,
160 |         # so that the residual branch starts with zeros, and each residual block behaves like an identity.
161 |         # This improves the model by 0.2~0.3% according to https://arxiv.org/abs/1706.02677
162 |         if zero_init_residual:
163 |             for m in self.modules():
164 |                 if isinstance(m, Bottleneck):
165 |                     nn.init.constant_(m.bn3.weight, 0)
166 |                 elif isinstance(m, BasicBlock):
167 |                     nn.init.constant_(m.bn2.weight, 0)
168 | 
169 |     def _make_layer(self, block, planes, blocks, stride=1, dilate=False):
170 |         norm_layer = self._norm_layer
171 |         downsample = None
172 |         previous_dilation = self.dilation
173 |         if dilate:
174 |             self.dilation *= stride
175 |             stride = 1
176 |         if stride != 1 or self.inplanes != planes * block.expansion:
177 |             downsample = nn.Sequential(
178 |                 conv1x1(self.inplanes, planes * block.expansion, stride),
179 |                 norm_layer(planes * block.expansion),
180 |             )
181 | 
182 |         layers = []
183 |         layers.append(block(self.inplanes, planes, stride, downsample, self.groups,
184 |                             self.base_width, previous_dilation, norm_layer))
185 |         self.inplanes = planes * block.expansion
186 |         for _ in range(1, blocks):
187 |             layers.append(block(self.inplanes, planes, groups=self.groups,
188 |                                 base_width=self.base_width, dilation=self.dilation,
189 |                                 norm_layer=norm_layer))
190 | 
191 |         return nn.Sequential(*layers)
192 | 
193 |     def _forward_impl(self, x):
194 |         # See note [TorchScript super()]
195 |         x = self.conv1(x)
196 |         x = self.bn1(x)
197 |         x = self.relu(x)
198 |         x = self.pool(x)
199 | 
200 |         x = self.layer1(x)
201 |         x = self.layer2(x)
202 |         x = self.layer3(x)
203 |         x = self.layer4(x)
204 | 
205 |         x = self.avgpool(x)
206 |         x = torch.flatten(x, 1)
207 |         x = self.fc(x)
208 | 
209 |         return x
210 | 
211 |     def forward(self, x):
212 |         return self._forward_impl(x)
213 | 
214 | 
215 | def _resnet(arch, block, layers, pretrained, progress, use_softpool, **kwargs):
216 |     model = ResNet(block, layers, use_softpool, **kwargs)
217 |     if pretrained:
218 |         state_dict = load_state_dict_from_url(model_urls[arch],
219 |                                               progress=progress)
220 |         model.load_state_dict(state_dict)
221 |     return model
222 | 
223 | 
224 | def resnet18(pretrained=False, progress=True, use_softpool=True, **kwargs):
225 |     r"""ResNet-18 model from
226 |     `"Deep Residual Learning for Image Recognition" <https://arxiv.org/pdf/1512.03385.pdf>`_
227 | 
228 |     Args:
229 |         pretrained (bool): If True, returns a model pre-trained on ImageNet
230 |         progress (bool): If True, displays a progress bar of the download to stderr
231 |         use_softpool (bool): If True, replaces the stem max-pooling layer with SoftPool2d
232 |     """
233 |     return _resnet('resnet18', BasicBlock, [2, 2, 2, 2], pretrained, progress, use_softpool,
234 |                    **kwargs)
235 | 
236 | 
237 | def resnet34(pretrained=False, progress=True, use_softpool=True, **kwargs):
238 |     r"""ResNet-34 model from
239 |     `"Deep Residual Learning for Image Recognition" <https://arxiv.org/pdf/1512.03385.pdf>`_
240 | 
241 |     Args:
242 |         pretrained (bool): If True, returns a model pre-trained on ImageNet
243 |         progress (bool): If True, displays a progress bar of the download to stderr
244 |         use_softpool (bool): If True, replaces the stem max-pooling layer with SoftPool2d
245 |     """
246 |     return _resnet('resnet34', BasicBlock, [3, 4, 6, 3], pretrained, progress, use_softpool,
247 |                    **kwargs)
248 | 
249 | 
250 | def resnet50(pretrained=False, progress=True, use_softpool=True, **kwargs):
251 |     r"""ResNet-50 model from
252 |     `"Deep Residual Learning for Image Recognition" <https://arxiv.org/pdf/1512.03385.pdf>`_
253 | 
254 |     Args:
255 |         pretrained (bool): If True, returns a model pre-trained on ImageNet
256 |         progress (bool): If True, displays a progress bar of the download to stderr
257 |         use_softpool (bool): If True, replaces the stem max-pooling layer with SoftPool2d
258 |     """
259 |     return _resnet('resnet50', Bottleneck, [3, 4, 6, 3], pretrained, progress, use_softpool,
260 |                    **kwargs)
261 | 
262 | 
263 | def resnet101(pretrained=False, progress=True, use_softpool=True, **kwargs):
264 |     r"""ResNet-101 model from
265 |     `"Deep Residual Learning for Image Recognition" <https://arxiv.org/pdf/1512.03385.pdf>`_
266 | 
267 |     Args:
268 |         pretrained (bool): If True, returns a model pre-trained on ImageNet
269 |         progress (bool): If True, displays a progress bar of the download to stderr
270 |         use_softpool (bool): If True, replaces the stem max-pooling layer with SoftPool2d
271 |     """
272 |     return _resnet('resnet101', Bottleneck, [3, 4, 23, 3], pretrained, progress, use_softpool,
273 |                    **kwargs)
274 | 
275 | 
276 | def resnet152(pretrained=False, progress=True, use_softpool=True, **kwargs):
277 |     r"""ResNet-152 model from
278 |     `"Deep Residual Learning for Image Recognition" <https://arxiv.org/pdf/1512.03385.pdf>`_
279 | 
280 |     Args:
281 |         pretrained (bool): If True, returns a model pre-trained on ImageNet
282 |         progress (bool): If True, displays a progress bar of the download to stderr
283 |         use_softpool (bool): If True, replaces the stem max-pooling layer with SoftPool2d
284 |     """
285 |     return _resnet('resnet152', Bottleneck, [3, 8, 36, 3], pretrained, progress, use_softpool,
286 |                    **kwargs)
287 | 
288 | def resnet200(pretrained=False, progress=True, use_softpool=True, **kwargs):
289 |     r"""ResNet-200 model from
290 |     `"Deep Residual Learning for Image Recognition" <https://arxiv.org/pdf/1512.03385.pdf>`_
291 | 
292 |     Args:
293 |         pretrained (bool): If True, returns a model pre-trained on ImageNet
294 |         progress (bool): If True, displays a progress bar of the download to stderr
295 |         use_softpool (bool): If True, replaces the stem max-pooling layer with SoftPool2d
296 |     """
297 |     return _resnet('resnet200', Bottleneck, [3, 24, 36, 3], pretrained, progress, use_softpool,
298 |                    **kwargs)
299 | 
300 | 
301 | def resnext50_32x4d(pretrained=False, progress=True, use_softpool=True, **kwargs):
302 |     r"""ResNeXt-50 32x4d model from
303 |     `"Aggregated Residual Transformation for Deep Neural Networks" <https://arxiv.org/pdf/1611.05431.pdf>`_
304 | 
305 |     Args:
306 |         pretrained (bool): If True, returns a model pre-trained on ImageNet
307 |         progress (bool): If True, displays a progress bar of the download to stderr
308 |         use_softpool (bool): If True, replaces the stem max-pooling layer with SoftPool2d
309 |     """
310 |     kwargs['groups'] = 32
311 |     kwargs['width_per_group'] = 4
312 |     return _resnet('resnext50_32x4d', Bottleneck, [3, 4, 6, 3],
313 |                    pretrained, progress, use_softpool, **kwargs)
314 | 
315 | def resnext101_32x4d(pretrained=False, progress=True, use_softpool=True, **kwargs):
316 |     r"""ResNeXt-101 32x4d model from
317 |     `"Aggregated Residual Transformation for Deep Neural Networks" <https://arxiv.org/pdf/1611.05431.pdf>`_
318 | 
319 |     Args:
320 |         pretrained (bool): If True, returns a model pre-trained on ImageNet
321 |         progress (bool): If True, displays a progress bar of the download to stderr
322 |         use_softpool (bool): If True, replaces the stem max-pooling layer with SoftPool2d
323 |     """
324 |     kwargs['groups'] = 32
325 |     kwargs['width_per_group'] = 4
326 |     return _resnet('resnext101_32x4d', Bottleneck, [3, 4, 23, 3],
327 |                    pretrained, progress, use_softpool, **kwargs)
328 | 
329 | def resnext101_64x4d(pretrained=False, progress=True, use_softpool=True, **kwargs):
330 |     r"""ResNeXt-101 64x4d model from
331 |     `"Aggregated Residual Transformation for Deep Neural Networks" <https://arxiv.org/pdf/1611.05431.pdf>`_
332 | 
333 |     Args:
334 |         pretrained (bool): If True, returns a model pre-trained on ImageNet
335 |         progress (bool): If True, displays a progress bar of the download to stderr
336 |         use_softpool (bool): If True, replaces the stem max-pooling layer with SoftPool2d
337 |     """
338 |     kwargs['groups'] = 64
339 |     kwargs['width_per_group'] = 4
340 |     return _resnet('resnext101_64x4d', Bottleneck, [3, 4, 23, 3],
341 |                    pretrained, progress, use_softpool, **kwargs)
342 | 
343 | def resnext101_32x8d(pretrained=False, progress=True, use_softpool=True, **kwargs):
344 |     r"""ResNeXt-101 32x8d model from
345 |     `"Aggregated Residual Transformation for Deep Neural Networks" <https://arxiv.org/pdf/1611.05431.pdf>`_
346 | 
347 |     Args:
348 |         pretrained (bool): If True, returns a model pre-trained on ImageNet
349 |         progress (bool): If True, displays a progress bar of the download to stderr
350 |         use_softpool (bool): If True, replaces the stem max-pooling layer with SoftPool2d
351 |     """
352 |     kwargs['groups'] = 32
353 |     kwargs['width_per_group'] = 8
354 |     return _resnet('resnext101_32x8d', Bottleneck, [3, 4, 23, 3],
355 |                    pretrained, progress, use_softpool, **kwargs)
356 | 
357 | 
358 | def wide_resnet50_2(pretrained=False, progress=True, use_softpool=True, **kwargs):
359 |     r"""Wide ResNet-50-2 model from
360 |     `"Wide Residual Networks" <https://arxiv.org/pdf/1605.07146.pdf>`_
361 | 
362 |     The model is the same as ResNet except for the bottleneck number of channels
363 |     which is twice larger in every block. The number of channels in outer 1x1
364 |     convolutions is the same, e.g. last block in ResNet-50 has 2048-512-2048
365 |     channels, and in Wide ResNet-50-2 has 2048-1024-2048.
366 | 
367 |     Args:
368 |         pretrained (bool): If True, returns a model pre-trained on ImageNet
369 |         progress (bool): If True, displays a progress bar of the download to stderr
370 |         use_softpool (bool): If True, replaces the stem max-pooling layer with SoftPool2d
371 |     """
372 |     kwargs['width_per_group'] = 64 * 2
373 |     return _resnet('wide_resnet50_2', Bottleneck, [3, 4, 6, 3],
374 |                    pretrained, progress, use_softpool, **kwargs)
375 | 
376 | 
377 | def wide_resnet101_2(pretrained=False, progress=True, use_softpool=True, **kwargs):
378 |     r"""Wide ResNet-101-2 model from
379 |     `"Wide Residual Networks" <https://arxiv.org/pdf/1605.07146.pdf>`_
380 | 
381 |     The model is the same as ResNet except for the bottleneck number of channels
382 |     which is twice larger in every block. The number of channels in outer 1x1
383 |     convolutions is the same, e.g. last block in ResNet-50 has 2048-512-2048
384 |     channels, and in Wide ResNet-50-2 has 2048-1024-2048.
385 | 
386 |     Args:
387 |         pretrained (bool): If True, returns a model pre-trained on ImageNet
388 |         progress (bool): If True, displays a progress bar of the download to stderr
389 |         use_softpool (bool): If True, replaces the stem max-pooling layer with SoftPool2d
390 |     """
391 |     kwargs['width_per_group'] = 64 * 2
392 |     return _resnet('wide_resnet101_2', Bottleneck, [3, 4, 23, 3],
393 |                    pretrained, progress, use_softpool, **kwargs)
394 | 
395 | if __name__ == "__main__":
396 |     #from ptflops import get_model_complexity_info
397 |     #tmp = (3,224,224)
398 |     net = resnet18(use_softpool=False)
399 |     net.load_state_dict(torch.load('weights/resnet-18_best.pth'))
400 |     #macs, params = get_model_complexity_info(net, tmp, as_strings=True,print_per_layer_stat=False, verbose=False)
401 |     #print('{:<30}  {:<8}'.format('Computational complexity: ', macs))
402 |     #print('{:<30}  {:<8}'.format('Number of parameters: ', params))
403 |     #print('network 1 test passed \n')
404 | 
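
The `use_softpool` switch above swaps the stem's `nn.MaxPool2d` for a 2x2, stride-2 `SoftPool2d`. As a rough illustration of what such a layer computes, the sketch below assumes SoftPool is the softmax-weighted average of each pooling window (each activation weighted by its exponential, normalised over the window). It is a pure-Python reference on nested lists, not the CUDA implementation shipped in `pytorch/`, and the name `soft_pool2d` is hypothetical.

```python
import math


def soft_pool2d(grid, kernel=2, stride=2):
    """Softmax-weighted pooling over kernel x kernel windows of a 2-D list.

    Illustrative reference only (assumed definition of SoftPool); it does not
    call the repository's SoftPool2d module.
    """
    rows, cols = len(grid), len(grid[0])
    out = []
    for r in range(0, rows - kernel + 1, stride):
        out_row = []
        for c in range(0, cols - kernel + 1, stride):
            window = [grid[r + i][c + j]
                      for i in range(kernel) for j in range(kernel)]
            # Subtract the window maximum so exp() cannot overflow; the shift
            # cancels in the ratio below.
            peak = max(window)
            weights = [math.exp(v - peak) for v in window]
            weighted_sum = sum(w * v for w, v in zip(weights, window))
            out_row.append(weighted_sum / sum(weights))
        out.append(out_row)
    return out


grid = [[1.0, 2.0, 0.0, 0.0],
        [3.0, 4.0, 0.0, 0.0],
        [5.0, 5.0, -1.0, 1.0],
        [5.0, 5.0, 1.0, -1.0]]
pooled = soft_pool2d(grid)  # 4x4 input -> 2x2 output
```

Under this definition each output lies between the window's mean and its maximum, so larger activations dominate without the remaining ones being discarded as in max pooling.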


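Boolean command-line options such as the training script's `--use_softpool` need an explicit string-to-bool converter, because argparse's `type=bool` calls `bool()` on the raw string and any non-empty string (including `"False"`) is truthy. The sketch below contrasts the two behaviours; `str2bool` is a hypothetical helper, not part of the repository.

```python
import argparse


def str2bool(value):
    """Hypothetical helper: map common true/false spellings to a bool."""
    text = value.strip().lower()
    if text in ('true', 't', 'yes', 'y', '1'):
        return True
    if text in ('false', 'f', 'no', 'n', '0'):
        return False
    raise argparse.ArgumentTypeError(
        'expected a boolean, got {!r}'.format(value))


# Parser that relies on type=bool: bool('False') is True.
naive = argparse.ArgumentParser()
naive.add_argument('--use_softpool', type=bool, default=True)

# Parser that converts the string explicitly.
fixed = argparse.ArgumentParser()
fixed.add_argument('--use_softpool', type=str2bool, default=True)

naive_args = naive.parse_args(['--use_softpool', 'False'])
fixed_args = fixed.parse_args(['--use_softpool', 'False'])
```

With the explicit converter, `--use_softpool False` selects the max-pooling baseline, while omitting the flag keeps the default of `True`.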
--------------------------------------------------------------------------------
/main/train.py:
--------------------------------------------------------------------------------
  1 | import argparse
  2 | import os
  3 | import random
  4 | import shutil
  5 | import time
  6 | import warnings
  7 | import pathlib
  8 | import csv
  9 | import torch
 10 | import torch.nn as nn
 11 | import torch.nn.parallel
 12 | import torch.backends.cudnn as cudnn
 13 | import torch.distributed as dist
 14 | import torch.optim
 15 | import torch.multiprocessing as mp
 16 | import torch.utils.data
 17 | import torch.utils.data.distributed
 18 | import torchvision.transforms as transforms
 19 | import torchvision.datasets as datasets
 20 | 
 21 | from models import config
 22 | 
 23 | model_names = config.models
 24 | 
 25 | parser = argparse.ArgumentParser(description='PyTorch image training')
 26 | parser.add_argument('-d','--dataset', metavar='DATASET',
 27 |                     help='dataset selection', choices=['imagenet','cifar10','cifar100'],
 28 |                     default='imagenet')
 29 | parser.add_argument('-dir','--data_dir', metavar='DIR',
 30 |                     help='path to dataset',default='/home/agstergiou/Desktop')
 31 | parser.add_argument('-a', '--arch', metavar='ARCH', default='resnet18',
 32 |                     choices=model_names,
 33 |                     help='model architecture: ' +
 34 |                         ' | '.join(model_names) +
 35 |                         ' (default: resnet18)')
 36 | parser.add_argument('-j', '--workers', default=8, type=int, metavar='N',
 37 |                     help='number of data loading workers (default: 8)')
 38 | parser.add_argument('--epochs', default=90, type=int, metavar='N',
 39 |                     help='number of total epochs to run')
 40 | parser.add_argument('--start-epoch', default=0, type=int, metavar='N',
 41 |                     help='manual epoch number (useful on restarts)')
 42 | parser.add_argument('--use_softpool', metavar='POOL', type=lambda s: s.lower() in ('true', '1', 'yes'),
 43 |                     help='use softpool: true or false (default: true)', default=True)
 44 | parser.add_argument('-b', '--batch-size', default=128, type=int,
 45 |                     metavar='N',
 46 |                     help='mini-batch size (default: 128), this is the total '
 47 |                          'batch size of all GPUs on the current node when '
 48 |                          'using Data Parallel or Distributed Data Parallel')
 49 | parser.add_argument('--lr', '--learning-rate', default=0.1, type=float,
 50 |                     metavar='LR', help='initial learning rate', dest='lr')
 51 | parser.add_argument('--momentum', default=0.9, type=float, metavar='M',
 52 |                     help='momentum')
 53 | parser.add_argument('--wd', '--weight-decay', default=1e-4, type=float,
 54 |                     metavar='W', help='weight decay (default: 1e-4)',
 55 |                     dest='weight_decay')
 56 | parser.add_argument('-p', '--print-freq', default=1, type=int,
 57 |                     metavar='N', help='print frequency (default: 1)')
 58 | parser.add_argument('--resume', default='', type=str, metavar='PATH',
 59 |                     help='path to latest checkpoint (default: none)')
 60 | parser.add_argument('-e', '--evaluate', dest='evaluate', action='store_true',
 61 |                     help='evaluate model on validation set')
 62 | parser.add_argument('--pretrained', dest='pretrained', type=str, metavar='PATH',
 63 |                     help='path to a pre-trained checkpoint (default: none)', default=None)
 64 | parser.add_argument('--world-size', default=-1, type=int,
 65 |                     help='number of nodes for distributed training')
 66 | parser.add_argument('--rank', default=-1, type=int,
 67 |                     help='node rank for distributed training')
 68 | parser.add_argument('--dist-url', default='tcp://132.211.32.23:22', type=str,
 69 |                     help='url used to set up distributed training')
 70 | parser.add_argument('--dist-backend', default='nccl', type=str,
 71 |                     help='distributed backend')
 72 | parser.add_argument('--seed', default=None, type=int,
 73 |                     help='seed for initializing training. ')
 74 | parser.add_argument('--gpu', default=None, type=int,
 75 |                     help='GPU id to use.')
 76 | parser.add_argument('--multiprocessing-distributed', action='store_true',
 77 |                     help='Use multi-processing distributed training to launch '
 78 |                          'N processes per node, which has N GPUs. This is the '
 79 |                          'fastest way to use PyTorch for either single node or '
 80 |                          'multi node data parallel training')
 81 | 
 82 | best_acc1 = 0
 83 | 
 84 | 
 85 | def main():
 86 |     args = parser.parse_args()
 87 | 
 88 |     if args.seed is not None:
 89 |         random.seed(args.seed)
 90 |         torch.manual_seed(args.seed)
 91 |         cudnn.deterministic = True
 92 |         warnings.warn('You have chosen to seed training. '
 93 |                       'This will turn on the CUDNN deterministic setting, '
 94 |                       'which can slow down your training considerably! '
 95 |                       'You may see unexpected behavior when restarting '
 96 |                       'from checkpoints.')
 97 | 
 98 |     if args.gpu is not None:
 99 |         warnings.warn('You have chosen a specific GPU. This will completely '
100 |                       'disable data parallelism.')
101 | 
102 |     if args.dist_url == "env://" and args.world_size == -1:
103 |         args.world_size = int(os.environ["WORLD_SIZE"])
104 | 
105 |     args.distributed = args.world_size > 1 or args.multiprocessing_distributed
106 | 
107 |     ngpus_per_node = torch.cuda.device_count()
108 |     if args.multiprocessing_distributed:
109 |         # Since we have ngpus_per_node processes per node, the total world_size
110 |         # needs to be adjusted accordingly
111 |         args.world_size = ngpus_per_node * args.world_size
112 |         # Use torch.multiprocessing.spawn to launch distributed processes: the
113 |         # main_worker process function
114 |         mp.spawn(main_worker, nprocs=ngpus_per_node, args=(ngpus_per_node, args))
115 |     else:
116 |         # Simply call main_worker function
117 |         main_worker(args.gpu, ngpus_per_node, args)
118 | 
119 | 
120 | def main_worker(gpu, ngpus_per_node, args):
121 |     global best_acc1
122 |     args.gpu = gpu
123 | 
124 |     if args.gpu is not None:
125 |         print("Use GPU: {} for training".format(args.gpu))
126 | 
127 |     if args.distributed:
128 |         if args.dist_url == "env://" and args.rank == -1:
129 |             args.rank = int(os.environ["RANK"])
130 |         if args.multiprocessing_distributed:
131 |             # For multiprocessing distributed training, rank needs to be the
132 |             # global rank among all the processes
133 |             args.rank = args.rank * ngpus_per_node + gpu
134 |         dist.init_process_group(backend=args.dist_backend, init_method=args.dist_url,
135 |                                 world_size=args.world_size, rank=args.rank)
136 |     # create model
137 |     model = config.get_model(args.arch,args.use_softpool)
138 | 
139 |     if not torch.cuda.is_available():
140 |         print('using CPU, this will be slow')
141 |     elif args.distributed:
142 |         # For multiprocessing distributed, DistributedDataParallel constructor
143 |         # should always set the single device scope, otherwise,
144 |         # DistributedDataParallel will use all available devices.
145 |         if args.gpu is not None:
146 |             torch.cuda.set_device(args.gpu)
147 |             model.cuda(args.gpu)
148 |             # When using a single GPU per process and per
149 |             # DistributedDataParallel, we need to divide the batch size
150 |             # ourselves based on the total number of GPUs we have
151 |             args.batch_size = int(args.batch_size / ngpus_per_node)
152 |             args.workers = int((args.workers + ngpus_per_node - 1) / ngpus_per_node)
153 |             model = torch.nn.parallel.DistributedDataParallel(model, device_ids=[args.gpu])
154 |         else:
155 |             model.cuda()
156 |             # DistributedDataParallel will divide and allocate batch_size to all
157 |             # available GPUs if device_ids are not set
158 |             model = torch.nn.parallel.DistributedDataParallel(model)
159 |     elif args.gpu is not None:
160 |         torch.cuda.set_device(args.gpu)
161 |         model = model.cuda(args.gpu)
162 |     else:
163 |         # DataParallel will divide and allocate batch_size to all available GPUs
164 |         if args.arch.startswith('alexnet') or args.arch.startswith('vgg'):
165 |             model.features = torch.nn.DataParallel(model.features)
166 |             model.cuda()
167 |         else:
168 |             model = torch.nn.DataParallel(model).cuda()
169 | 
170 |     # Load weights
171 |     if args.pretrained is not None:
172 |         print("=> using pre-trained model '{}' from dir '{}' ".format(args.arch,args.pretrained))
173 |         model.load_state_dict(torch.load(args.pretrained)['state_dict'])
174 |     else:
175 |         print("=> initialised model '{}'".format(args.arch))
176 | 
177 |     # define loss function (criterion) and optimizer
178 |     criterion = nn.CrossEntropyLoss().cuda(args.gpu)
179 | 
180 |     optimizer = torch.optim.SGD(model.parameters(), args.lr,
181 |                                 momentum=args.momentum,
182 |                                 weight_decay=args.weight_decay)
183 | 
184 |     # optionally resume from a checkpoint
185 |     if args.resume:
186 |         if os.path.isfile(args.resume):
187 |             print("=> loading checkpoint '{}'".format(args.resume))
188 |             if args.gpu is None:
189 |                 checkpoint = torch.load(args.resume)
190 |             else:
191 |                 # Map model to be loaded to specified single gpu.
192 |                 loc = 'cuda:{}'.format(args.gpu)
193 |                 checkpoint = torch.load(args.resume, map_location=loc)
194 |             args.start_epoch = checkpoint['epoch']
195 |             best_acc1 = checkpoint['best_acc1']
196 |             if args.gpu is not None:
197 |                 # best_acc1 may be from a checkpoint from a different GPU
198 |                 best_acc1 = best_acc1.to(args.gpu)
199 |             model.load_state_dict(checkpoint['state_dict'])
200 |             optimizer.load_state_dict(checkpoint['optimizer'])
201 |             print("=> loaded checkpoint '{}' (epoch {})"
202 |                   .format(args.resume, checkpoint['epoch']))
203 |         else:
204 |             print("=> no checkpoint found at '{}'".format(args.resume))
205 | 
206 |     cudnn.benchmark = True
207 | 
208 |     # Data loading code
209 |     geometrical = transforms.RandomApply([
210 |                          transforms.RandomRotation(degrees=10),
211 |                          transforms.RandomHorizontalFlip(p=0.3)],
212 |                     p=.6
213 |                 )
214 |     colour = transforms.RandomApply(
215 |                 [transforms.ColorJitter(brightness=.1,contrast=.05,saturation=.01,hue=.05)],
216 |                 p=.4)
217 | 
218 |     normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406],
219 |                                      std=[0.229, 0.224, 0.225])
220 | 
221 |     if args.dataset == 'imagenet':
222 |         train_dataset = datasets.ImageNet(
223 |             root=os.path.join(args.data_dir,'ILSVRC2012'),
224 |             split='train',
225 |             transform=transforms.Compose([
226 |                       geometrical,
227 |                       transforms.RandomResizedCrop(299),
228 |                       transforms.ToTensor(),
229 |                       normalize,
230 |                       ])
231 |         )
232 |         val_dataset = datasets.ImageNet(
233 |             root=os.path.join(args.data_dir,'ILSVRC2012'),
234 |             split='val',
235 |             transform=transforms.Compose([
236 |                         transforms.Resize(324),
237 |                         transforms.CenterCrop(299),
238 |                         transforms.ToTensor(),
239 |                         normalize,
240 |                         ])
241 |             )
242 | 
243 |     elif args.dataset == 'cifar10':
244 |         train_dataset = datasets.CIFAR10(
245 |             root=os.path.join(args.data_dir,'cifar-10-batches-py'),
246 |             train=True,
247 |             transform=transforms.Compose([
248 |                       geometrical,
249 |                       colour,
250 |                       transforms.ToTensor(),
251 |                       normalize,
252 |                       ])
253 |         )
254 |         val_dataset = datasets.CIFAR10(
255 |             root=os.path.join(args.data_dir,'cifar-10-batches-py'),
256 |             train=False,
257 |             transform=transforms.Compose([
258 |                         transforms.ToTensor(),
259 |                         normalize,
260 |                         ])
261 |             )
262 |     else:
263 |         train_dataset = datasets.CIFAR100(
264 |             root=os.path.join(args.data_dir,'cifar-100-python'),
265 |             train=True,
266 |             transform=transforms.Compose([
267 |                       geometrical,
268 |                       colour,
269 |                       transforms.ToTensor(),
270 |                       normalize,
271 |                       ])
272 |         )
273 |         val_dataset = datasets.CIFAR100(
274 |             root=os.path.join(args.data_dir,'cifar-100-python'),
275 |             train=False,
276 |             transform=transforms.Compose([
277 |                         transforms.ToTensor(),
278 |                         normalize,
279 |                         ])
280 |             )
281 | 
282 | 
283 |     if args.distributed:
284 |         train_sampler = torch.utils.data.distributed.DistributedSampler(train_dataset)
285 |     else:
286 |         train_sampler = None
287 | 
288 |     train_loader = torch.utils.data.DataLoader(
289 |         train_dataset, batch_size=args.batch_size, shuffle=(train_sampler is None),
290 |         num_workers=args.workers, pin_memory=True, sampler=train_sampler)
291 | 
292 |     val_loader = torch.utils.data.DataLoader(
293 |         val_dataset, batch_size=args.batch_size, shuffle=False,
294 |         num_workers=args.workers, pin_memory=True)
295 | 
296 |     if args.evaluate:
297 |         validate(val_loader, model, criterion, args)
298 |         return
299 | 
300 |     # Create csv file or open if it already exists
301 |     pathlib.Path('results/'+args.arch).mkdir(parents=True, exist_ok=True)
302 |     with open ('results/'+args.arch+'/accuracies.csv','a', newline='') as my_file:
303 |         csv_writer = csv.writer(my_file, delimiter=';', quotechar='"', quoting=csv.QUOTE_ALL, lineterminator='\n')
304 | 
305 | 
306 |         for epoch in range(args.start_epoch, args.epochs):
307 |             if args.distributed:
308 |                 train_sampler.set_epoch(epoch)
309 |             adjust_learning_rate(optimizer, epoch, args)
310 | 
311 |             # train for one epoch
312 |             train(train_loader, model, criterion, optimizer, epoch, args)
313 | 
314 |             # evaluate on validation set
315 |             loss,acc1,acc5 = validate(val_loader, model, criterion, args)
316 | 
317 |             csv_writer.writerow([str(epoch),str(loss),str(acc1.item()),str(acc5.item())])
318 | 
319 |             # remember best acc@1 and save checkpoint
320 |             is_best = acc1 > best_acc1
321 |             best_acc1 = max(acc1, best_acc1)
322 | 
323 |             pathlib.Path('saved_weights/'+args.arch).mkdir(parents=True, exist_ok=True)
324 | 
325 |             save_checkpoint({
326 |                 'epoch': epoch + 1,
327 |                 'arch': args.arch,
328 |                 'state_dict': model.state_dict(),
329 |                 'best_acc1': best_acc1,
330 |                 'optimizer' : optimizer.state_dict(),
331 |             }, is_best,
332 |             filename='saved_weights/'+args.arch+'/checkpoint_'+str(epoch+1)+'.pth')
333 | 
334 | 
335 | def train(train_loader, model, criterion, optimizer, epoch, args):
336 |     batch_time = AverageMeter('Time', ':6.3f')
337 |     data_time = AverageMeter('Data', ':6.3f')
338 |     losses = AverageMeter('Loss', ':.4e')
339 |     top1 = AverageMeter('Acc@1', ':6.2f')
340 |     top5 = AverageMeter('Acc@5', ':6.2f')
341 |     progress = ProgressMeter(
342 |         len(train_loader),
343 |         [batch_time, data_time, losses, top1, top5],
344 |         prefix="Epoch: [{}]".format(epoch))
345 | 
346 |     # switch to train mode
347 |     model.train()
348 | 
349 |     end = time.time()
350 |     for i, (images, target) in enumerate(train_loader):
351 |         # measure data loading time
352 |         data_time.update(time.time() - end)
353 | 
354 |         if args.gpu is not None:
355 |             images = images.cuda(args.gpu, non_blocking=True)
356 |         if torch.cuda.is_available():
357 |             target = target.cuda(args.gpu, non_blocking=True)
358 | 
359 |         # compute output
360 |         output = model(images)
361 |         loss = criterion(output, target)
362 | 
363 |         # measure accuracy and record loss
364 |         acc1, acc5 = accuracy(output, target, topk=(1, 5))
365 |         losses.update(loss.item(), images.size(0))
366 |         top1.update(acc1[0], images.size(0))
367 |         top5.update(acc5[0], images.size(0))
368 | 
369 |         # compute gradient and do SGD step
370 |         optimizer.zero_grad()
371 |         loss.backward()
372 |         optimizer.step()
373 | 
374 |         # measure elapsed time
375 |         batch_time.update(time.time() - end)
376 |         end = time.time()
377 | 
378 |         if i % args.print_freq == 0:
379 |             progress.display(i)
380 | 
381 | 
382 | def validate(val_loader, model, criterion, args):
383 |     batch_time = AverageMeter('Time', ':6.3f')
384 |     losses = AverageMeter('Loss', ':.4e')
385 |     top1 = AverageMeter('Acc@1', ':6.2f')
386 |     top5 = AverageMeter('Acc@5', ':6.2f')
387 |     progress = ProgressMeter(
388 |         len(val_loader),
389 |         [batch_time, losses, top1, top5],
390 |         prefix='Test: ')
391 | 
392 |     # switch to evaluate mode
393 |     model.eval()
394 | 
395 |     with torch.no_grad():
396 |         end = time.time()
397 |         for i, (images, target) in enumerate(val_loader):
398 |             if args.gpu is not None:
399 |                 images = images.cuda(args.gpu, non_blocking=True)
400 |             if torch.cuda.is_available():
401 |                 target = target.cuda(args.gpu, non_blocking=True)
402 | 
403 |             # compute output
404 |             output = model(images)
405 |             loss = criterion(output, target)
406 | 
407 |             # measure accuracy and record loss
408 |             acc1, acc5 = accuracy(output, target, topk=(1, 5))
409 |             losses.update(loss.item(), images.size(0))
410 |             top1.update(acc1[0], images.size(0))
411 |             top5.update(acc5[0], images.size(0))
412 | 
413 |             # measure elapsed time
414 |             batch_time.update(time.time() - end)
415 |             end = time.time()
416 | 
417 |             if i % args.print_freq == 0:
418 |                 progress.display(i)
419 | 
420 |         # TODO: this should also be done with the ProgressMeter
421 |         print(' * Acc@1 {top1.avg:.3f} Acc@5 {top5.avg:.3f}'
422 |               .format(top1=top1, top5=top5))
423 | 
424 |     return (losses.avg, top1.avg, top5.avg)
425 | 
426 | 
427 | def save_checkpoint(state, is_best, filename='saved_weights/checkpoint.pth'):
428 |     torch.save(state, filename)
429 |     if is_best:
430 |         shutil.copyfile(filename, filename.split('checkpoint')[0]+'model_best.pth')
431 | 
432 | 
433 | class AverageMeter(object):
434 |     """Computes and stores the average and current value"""
435 |     def __init__(self, name, fmt=':f'):
436 |         self.name = name
437 |         self.fmt = fmt
438 |         self.reset()
439 | 
440 |     def reset(self):
441 |         self.val = 0
442 |         self.avg = 0
443 |         self.sum = 0
444 |         self.count = 0
445 | 
446 |     def update(self, val, n=1):
447 |         self.val = val
448 |         self.sum += val * n
449 |         self.count += n
450 |         self.avg = self.sum / self.count
451 | 
452 |     def __str__(self):
453 |         fmtstr = '{name} {val' + self.fmt + '} ({avg' + self.fmt + '})'
454 |         return fmtstr.format(**self.__dict__)
455 | 
456 | 
457 | class ProgressMeter(object):
458 |     def __init__(self, num_batches, meters, prefix=""):
459 |         self.batch_fmtstr = self._get_batch_fmtstr(num_batches)
460 |         self.meters = meters
461 |         self.prefix = prefix
462 | 
463 |     def display(self, batch):
464 |         entries = [self.prefix + self.batch_fmtstr.format(batch)]
465 |         entries += [str(meter) for meter in self.meters]
466 |         print('\t'.join(entries))
467 | 
468 |     def _get_batch_fmtstr(self, num_batches):
469 |         num_digits = len(str(num_batches))
470 |         fmt = '{:' + str(num_digits) + 'd}'
471 |         return '[' + fmt + '/' + fmt.format(num_batches) + ']'
472 | 
473 | 
474 | def adjust_learning_rate(optimizer, epoch, args):
475 |     """Sets the learning rate to the initial LR decayed by a factor of 10 every 30 epochs"""
476 |     lr = args.lr * (0.1 ** (epoch // 30))
477 |     for param_group in optimizer.param_groups:
478 |         param_group['lr'] = lr
479 | 
480 | 
481 | def accuracy(output, target, topk=(1,)):
482 |     """Computes the accuracy over the k top predictions for the specified values of k"""
483 |     with torch.no_grad():
484 |         maxk = max(topk)
485 |         batch_size = target.size(0)
486 | 
487 |         _, pred = output.topk(maxk, 1, True, True)
488 |         pred = pred.t()
489 |         correct = pred.eq(target.view(1, -1).expand_as(pred))
490 | 
491 |         res = []
492 |         for k in topk:
493 |             correct_k = correct[:k].reshape(-1).float().sum(0, keepdim=True)
494 |             res.append(correct_k.mul_(100.0 / batch_size))
495 |         return res
496 | 
497 | 
498 | if __name__ == '__main__':
499 |     main()
500 | 
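
The top-k logic in `accuracy` above can be hard to follow through the transpose and `expand_as` calls. The following is a minimal pure-Python sketch of the same computation, assuming plain lists instead of tensors. The name `topk_accuracy` is hypothetical and not part of this repository, and tie-breaking between equal scores may differ from `torch.topk`.

```python
def topk_accuracy(scores, targets, topk=(1,)):
    """Return the percentage of samples whose target is among the k best scores.

    scores  : list of per-sample score lists (one score per class)
    targets : list of integer class labels, one per sample
    """
    maxk = max(topk)
    batch_size = len(targets)

    # For each sample, the class indices sorted by descending score (top maxk only).
    ranked = [
        sorted(range(len(row)), key=lambda c: row[c], reverse=True)[:maxk]
        for row in scores
    ]

    results = []
    for k in topk:
        # A sample counts as correct if its label is within the first k predictions.
        hits = sum(1 for pred, t in zip(ranked, targets) if t in pred[:k])
        results.append(100.0 * hits / batch_size)
    return results


if __name__ == "__main__":
    scores = [
        [0.1, 0.7, 0.2],
        [0.5, 0.3, 0.2],
        [0.2, 0.3, 0.5],
        [0.9, 0.05, 0.05],
    ]
    targets = [1, 1, 0, 0]
    print(topk_accuracy(scores, targets, topk=(1, 2)))
```

Unlike the function above, this sketch returns plain floats rather than one-element tensors, so no `[0]` indexing is needed on the result.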


--------------------------------------------------------------------------------
/pytorch/._setup.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/alexandrosstergiou/SoftPool/e11dee7e96ecad895cf871c8cbf220f7908462ed/pytorch/._setup.py


--------------------------------------------------------------------------------
/pytorch/CUDA/limits.cuh:
--------------------------------------------------------------------------------
  1 | #pragma once
  2 | #include <limits>  // std::numeric_limits, used by the float/double specializations below
  3 | #include <cuda.h>
  4 | #include <limits.h>
  5 | #include <math.h>
  6 | #include <float.h>
  7 | 
  8 | // NumericLimits.cuh is a holder for numeric limits definitions of commonly used
  9 | // types. This header is very specific to ROCm HIP and may be removed in the future.
 10 | // This header is derived from the legacy THCNumerics.cuh.
 11 | 
 12 | // The lower_bound and upper_bound constants are same as lowest and max for
 13 | // integral types, but are -inf and +inf for floating point types. They are
 14 | // useful in implementing min, max, etc.
 15 | 
 16 | namespace at {
 17 | 
 18 | template <typename T>
 19 | struct n_limits {
 20 | };
 21 | 
 22 | // WARNING: the following at::n_limits definitions are there only to support
 23 | //          HIP compilation for the moment. Use std::numeric_limits if you are not
 24 | //          compiling for ROCm.
 25 | //          from @colesbury: "The functions on numeric_limits aren't marked with
 26 | //          __device__ which is why they don't work with ROCm. CUDA allows them
 27 | //          because they're constexpr."
 28 | 
 29 | //namespace {
 30 |   // ROCm doesn't like INFINITY too.
 31 |   //constexpr double inf = INFINITY;
 32 | //}
 33 | 
 34 | template <>
 35 | struct n_limits<bool> {
 36 |   static inline __host__ __device__ bool lowest() { return false; }
 37 |   static inline __host__ __device__ bool min() { return false; }
 38 |   static inline __host__ __device__ bool max() { return true; }
 39 |   static inline __host__ __device__ bool lower_bound() { return false; }
 40 |   static inline __host__ __device__ bool upper_bound() { return true; }
 41 | };
 42 | 
 43 | template <>
 44 | struct n_limits<uint8_t> {
 45 |   static inline __host__ __device__ uint8_t lowest() { return 0; }
 46 |   static inline __host__ __device__ uint8_t min() { return 0; }
 47 |   static inline __host__ __device__ uint8_t max() { return UINT8_MAX; }
 48 |   static inline __host__ __device__ uint8_t lower_bound() { return 0; }
 49 |   static inline __host__ __device__ uint8_t upper_bound() { return UINT8_MAX; }
 50 | };
 51 | 
 52 | template <>
 53 | struct n_limits<int8_t> {
 54 |   static inline __host__ __device__ int8_t lowest() { return INT8_MIN; }
 55 |   static inline __host__ __device__ int8_t min() { return INT8_MIN; }
 56 |   static inline __host__ __device__ int8_t max() { return INT8_MAX; }
 57 |   static inline __host__ __device__ int8_t lower_bound() { return INT8_MIN; }
 58 |   static inline __host__ __device__ int8_t upper_bound() { return INT8_MAX; }
 59 | };
 60 | 
 61 | template <>
 62 | struct n_limits<int16_t> {
 63 |   static inline __host__ __device__ int16_t lowest() { return INT16_MIN; }
 64 |   static inline __host__ __device__ int16_t min() { return INT16_MIN; }
 65 |   static inline __host__ __device__ int16_t max() { return INT16_MAX; }
 66 |   static inline __host__ __device__ int16_t lower_bound() { return INT16_MIN; }
 67 |   static inline __host__ __device__ int16_t upper_bound() { return INT16_MAX; }
 68 | };
 69 | 
 70 | template <>
 71 | struct n_limits<int32_t> {
 72 |   static inline __host__ __device__ int32_t lowest() { return INT32_MIN; }
 73 |   static inline __host__ __device__ int32_t min() { return INT32_MIN; }
 74 |   static inline __host__ __device__ int32_t max() { return INT32_MAX; }
 75 |   static inline __host__ __device__ int32_t lower_bound() { return INT32_MIN; }
 76 |   static inline __host__ __device__ int32_t upper_bound() { return INT32_MAX; }
 77 | };
 78 | 
 79 | template <>
 80 | struct n_limits<int64_t> {
 81 | #ifdef _MSC_VER
 82 |   static inline __host__ __device__ int64_t lowest() { return _I64_MIN; }
 83 |   static inline __host__ __device__ int64_t min() { return _I64_MIN; }
 84 |   static inline __host__ __device__ int64_t max() { return _I64_MAX; }
 85 |   static inline __host__ __device__ int64_t lower_bound() { return _I64_MIN; }
 86 |   static inline __host__ __device__ int64_t upper_bound() { return _I64_MAX; }
 87 | #else
 88 |   static inline __host__ __device__ int64_t lowest() { return INT64_MIN; }
 89 |   static inline __host__ __device__ int64_t min() { return INT64_MIN; }
 90 |   static inline __host__ __device__ int64_t max() { return INT64_MAX; }
 91 |   static inline __host__ __device__ int64_t lower_bound() { return INT64_MIN; }
 92 |   static inline __host__ __device__ int64_t upper_bound() { return INT64_MAX; }
 93 | #endif
 94 | };
 95 | 
 96 | template <>
 97 | struct n_limits<at::Half> {
 98 |   static inline __host__ __device__ at::Half lowest() { return at::Half(0xFBFF, at::Half::from_bits()); }
 99 |   static inline __host__ __device__ at::Half min() { return at::Half(0x0400, at::Half::from_bits()); }
100 |   static inline __host__ __device__ at::Half max() { return at::Half(0x7BFF, at::Half::from_bits()); }
101 |   static inline __host__ __device__ at::Half lower_bound() { return at::Half(0xFC00, at::Half::from_bits()); }
102 |   static inline __host__ __device__ at::Half upper_bound() { return at::Half(0x7C00, at::Half::from_bits()); }
103 | };
104 | 
105 | template <>
106 | struct n_limits<at::BFloat16> {
107 |   static inline __host__ __device__ at::BFloat16 lowest() { return at::BFloat16(0xFF7F, at::BFloat16::from_bits()); }
108 |   static inline __host__ __device__ at::BFloat16 min() { return at::BFloat16(0x0080, at::BFloat16::from_bits()); }
109 |   static inline __host__ __device__ at::BFloat16 max() { return at::BFloat16(0x7F7F, at::BFloat16::from_bits()); }
110 |   static inline __host__ __device__ at::BFloat16 lower_bound() { return at::BFloat16(0xFF80, at::BFloat16::from_bits()); }
111 |   static inline __host__ __device__ at::BFloat16 upper_bound() { return at::BFloat16(0x7F80, at::BFloat16::from_bits()); }
112 | };
113 | 
114 | template <>
115 | struct n_limits<float> {
116 |   static inline __host__ __device__ float lowest() { return -FLT_MAX; }
117 |   static inline __host__ __device__ float min() { return FLT_MIN; }
118 |   static inline __host__ __device__ float max() { return FLT_MAX; }
119 |   static inline __host__ __device__ float lower_bound() { return -std::numeric_limits<float>::infinity(); }
120 |   static inline __host__ __device__ float upper_bound() { return std::numeric_limits<float>::infinity(); }
121 | };
122 | 
123 | template <>
124 | struct n_limits<double> {
125 |   static inline __host__ __device__ double lowest() { return -DBL_MAX; }
126 |   static inline __host__ __device__ double min() { return DBL_MIN; }
127 |   static inline __host__ __device__ double max() { return DBL_MAX; }
128 |   static inline __host__ __device__ double lower_bound() { return -std::numeric_limits<double>::infinity(); }
129 |   static inline __host__ __device__ double upper_bound() { return std::numeric_limits<double>::infinity(); }
130 | };
131 | 
132 | } // namespace at
133 | 
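
The `at::Half` and `at::BFloat16` specializations above are written as raw bit patterns, which are hard to check by eye. As a rough sanity check, the sketch below decodes those patterns with Python's `struct` module. The helper names `half_from_bits` and `bfloat16_from_bits` are hypothetical and exist only for this illustration.

```python
import math
import struct


def half_from_bits(bits):
    """Decode a 16-bit IEEE 754 binary16 pattern into a Python float."""
    return struct.unpack('<e', struct.pack('<H', bits))[0]


def bfloat16_from_bits(bits):
    """Decode a bfloat16 pattern, which is the upper 16 bits of a float32."""
    return struct.unpack('<f', struct.pack('<I', bits << 16))[0]


if __name__ == "__main__":
    # Bit patterns copied from the n_limits<at::Half> specialization.
    half_patterns = {
        "lowest": 0xFBFF,
        "min": 0x0400,
        "max": 0x7BFF,
        "lower_bound": 0xFC00,
        "upper_bound": 0x7C00,
    }
    for name, bits in half_patterns.items():
        value = half_from_bits(bits)
        kind = "infinite" if math.isinf(value) else "finite"
        print(f"Half     {name:12s} 0x{bits:04X} -> {value!r} ({kind})")

    # Bit patterns copied from the n_limits<at::BFloat16> specialization.
    bf16_patterns = {
        "lowest": 0xFF7F,
        "min": 0x0080,
        "max": 0x7F7F,
        "lower_bound": 0xFF80,
        "upper_bound": 0x7F80,
    }
    for name, bits in bf16_patterns.items():
        value = bfloat16_from_bits(bits)
        kind = "infinite" if math.isinf(value) else "finite"
        print(f"BFloat16 {name:12s} 0x{bits:04X} -> {value!r} ({kind})")
```

In both types `min()` decodes to the smallest positive normal value, matching the `FLT_MIN` convention used by the `float` specialization, while `lowest()` is the most negative finite value.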


--------------------------------------------------------------------------------
/pytorch/CUDA/softpool_cuda.cpp:
--------------------------------------------------------------------------------
  1 | #include <torch/extension.h>
  2 | #include <vector>
  3 | 
  4 | // CUDA forward declarations
  5 | 
  6 | int SoftPool1dForwardLauncher(const at::Tensor input, const int batches,
  7 |                               const int channels, const int dim,
  8 |                               const int kernel_d, const int stride_d,
  9 |                               at::Tensor output);
 10 | 
 11 | int SoftPool1dBackwardLauncher(const at::Tensor output_grad, const at::Tensor input,
 12 |                                const int batches, const int channels,
 13 |                                const int dim, const int kernel_d,
 14 |                                const int stride_d, at::Tensor input_grad);
 15 | 
 16 | int SoftPool2dForwardLauncher(const at::Tensor input, const int batches,
 17 |                               const int channels, const int height,
 18 |                               const int width, const int kernel_h,
 19 |                               const int kernel_w, const int stride_h,
 20 |                               const int stride_w, at::Tensor output);
 21 | 
 22 | int SoftPool2dBackwardLauncher(const at::Tensor output_grad, const at::Tensor input,
 23 |                                const int batches, const int channels,
 24 |                                const int height, const int width,
 25 |                                const int kernel_h, const int kernel_w,
 26 |                                const int stride_h, const int stride_w,
 27 |                                at::Tensor input_grad);
 28 | 
 29 | int SoftPool3dForwardLauncher(const at::Tensor input, const int batches,
 30 |                               const int channels, const int depth,
 31 |                               const int height, const int width,
 32 |                               const int kernel_d, const int kernel_h,
 33 |                               const int kernel_w, const int stride_d,
 34 |                               const int stride_h, const int stride_w,
 35 |                               at::Tensor output);
 36 | 
 37 | int SoftPool3dBackwardLauncher(const at::Tensor output_grad, const at::Tensor input,
 38 |                                const int batches, const int channels,
 39 |                                const int depth, const int height,
 40 |                                const int width, const int kernel_d,
 41 |                                const int kernel_h, const int kernel_w,
 42 |                                const int stride_d, const int stride_h,
 43 |                                const int stride_w, at::Tensor input_grad);
 44 | 
 45 | // C++ interface
 46 | 
 47 | #define CHECK_CUDA(x) TORCH_CHECK(x.is_cuda(), #x " must be a CUDA tensor")
 48 | #define CHECK_CONTIGUOUS(x) TORCH_CHECK(x.is_contiguous(), #x " must be a contiguous tensor")
 49 | #define CHECK_INPUT(x) CHECK_CUDA(x); CHECK_CONTIGUOUS(x)
 50 | 
 51 | int softpool1d_forward_cuda(at::Tensor input, const std::tuple<int> kernel,
 52 |                             const std::tuple<int> stride, at::Tensor output) {
 53 |     CHECK_INPUT(input);
 54 |     CHECK_INPUT(output);
 55 | 
 56 |     int batches = input.size(0);
 57 |     int channels = input.size(1);
 58 |     int dim = input.size(2);
 59 | 
 60 |     int kernel_d = std::get<0>(kernel);
 61 |     int stride_d = std::get<0>(stride);
 62 | 
 63 |     SoftPool1dForwardLauncher(input, batches,
 64 |                               channels, dim,
 65 |                               kernel_d, stride_d,
 66 |                               output);
 67 |     return 1;
 68 | }
 69 | 
 70 | int softpool1d_backward_cuda(const at::Tensor output_grad, const at::Tensor input,
 71 |                              const std::tuple<int> kernel, const std::tuple<int> stride,
 72 |                              at::Tensor input_grad) {
 73 |     CHECK_INPUT(output_grad);
 74 |     CHECK_INPUT(input);
 75 |     CHECK_INPUT(input_grad);
 76 | 
 77 | 
 78 |     int batches = input_grad.size(0);
 79 |     int channels = input_grad.size(1);
 80 |     int dim = input_grad.size(2);
 81 | 
 82 |     int kernel_d = std::get<0>(kernel);
 83 |     int stride_d = std::get<0>(stride);
 84 | 
 85 |     SoftPool1dBackwardLauncher(output_grad, input,
 86 |                                batches, channels,
 87 |                                dim, kernel_d,
 88 |                                stride_d, input_grad);
 89 |     return 1;
 90 | }
 91 | 
 92 | 
 93 | int softpool2d_forward_cuda(at::Tensor input, const std::tuple<int, int> kernel,
 94 |                             const std::tuple<int, int> stride, at::Tensor output) {
 95 |     CHECK_INPUT(input);
 96 |     CHECK_INPUT(output);
 97 | 
 98 |     int batches = input.size(0);
 99 |     int channels = input.size(1);
100 |     int height = input.size(2);
101 |     int width = input.size(3);
102 | 
103 |     int kernel_h = std::get<0>(kernel);
104 |     int kernel_w = std::get<1>(kernel);
105 |     int stride_h = std::get<0>(stride);
106 |     int stride_w = std::get<1>(stride);
107 | 
108 |     SoftPool2dForwardLauncher(input, batches,
109 |                               channels, height,
110 |                               width, kernel_h,
111 |                               kernel_w, stride_h,
112 |                               stride_w, output);
113 |     return 1;
114 | }
115 | 
116 | int softpool2d_backward_cuda(const at::Tensor output_grad, const at::Tensor input,
117 |                              const std::tuple<int, int> kernel, const std::tuple<int, int> stride,
118 |                              at::Tensor input_grad) {
119 |     CHECK_INPUT(output_grad);
120 |     CHECK_INPUT(input);
121 |     CHECK_INPUT(input_grad);
122 | 
123 | 
124 |     int batches = input_grad.size(0);
125 |     int channels = input_grad.size(1);
126 |     int height = input_grad.size(2);
127 |     int width = input_grad.size(3);
128 | 
129 |     int kernel_h = std::get<0>(kernel);
130 |     int kernel_w = std::get<1>(kernel);
131 |     int stride_h = std::get<0>(stride);
132 |     int stride_w = std::get<1>(stride);
133 | 
134 |     SoftPool2dBackwardLauncher(output_grad, input,
135 |                                batches, channels,
136 |                                height, width,
137 |                                kernel_h, kernel_w,
138 |                                stride_h, stride_w,
139 |                                input_grad);
140 |     return 1;
141 | }
142 | 
143 | 
144 | int softpool3d_forward_cuda(at::Tensor input, const std::tuple<int, int, int> kernel,
145 |                             const std::tuple<int, int, int> stride, at::Tensor output) {
146 |     CHECK_INPUT(input);
147 |     CHECK_INPUT(output);
148 | 
149 |     int batches = input.size(0);
150 |     int channels = input.size(1);
151 |     int depth = input.size(2);
152 |     int height = input.size(3);
153 |     int width = input.size(4);
154 | 
155 |     int kernel_d = std::get<0>(kernel);
156 |     int kernel_h = std::get<1>(kernel);
157 |     int kernel_w = std::get<2>(kernel);
158 |     int stride_d = std::get<0>(stride);
159 |     int stride_h = std::get<1>(stride);
160 |     int stride_w = std::get<2>(stride);
161 | 
162 |     SoftPool3dForwardLauncher(input, batches,
163 |                               channels, depth,
164 |                               height, width,
165 |                               kernel_d, kernel_h,
166 |                               kernel_w, stride_d,
167 |                               stride_h, stride_w,
168 |                               output);
169 |     return 1;
170 | }
171 | 
172 | int softpool3d_backward_cuda(const at::Tensor output_grad, const at::Tensor input,
173 |                              const std::tuple<int, int, int> kernel, const std::tuple<int, int, int> stride,
174 |                              at::Tensor input_grad) {
175 |     CHECK_INPUT(output_grad);
176 |     CHECK_INPUT(input);
177 |     CHECK_INPUT(input_grad);
178 | 
179 | 
180 |     int batches = input_grad.size(0);
181 |     int channels = input_grad.size(1);
182 |     int depth = input_grad.size(2);
183 |     int height = input_grad.size(3);
184 |     int width = input_grad.size(4);
185 | 
186 |     int kernel_d = std::get<0>(kernel);
187 |     int kernel_h = std::get<1>(kernel);
188 |     int kernel_w = std::get<2>(kernel);
189 |     int stride_d = std::get<0>(stride);
190 |     int stride_h = std::get<1>(stride);
191 |     int stride_w = std::get<2>(stride);
192 | 
193 |     SoftPool3dBackwardLauncher(output_grad, input,
194 |                                batches, channels,
195 |                                depth, height,
196 |                                width, kernel_d,
197 |                                kernel_h, kernel_w,
198 |                                stride_d, stride_h,
199 |                                stride_w, input_grad);
200 |     return 1;
201 | }
202 | 
203 | PYBIND11_MODULE(TORCH_EXTENSION_NAME, m) {
204 |   m.def("forward_1d", &softpool1d_forward_cuda, "SoftPool1d forward (CUDA)");
205 |   m.def("backward_1d", &softpool1d_backward_cuda, "SoftPool1d backward (CUDA)");
206 |   m.def("forward_2d", &softpool2d_forward_cuda, "SoftPool2d forward (CUDA)");
207 |   m.def("backward_2d", &softpool2d_backward_cuda, "SoftPool2d backward (CUDA)");
208 |   m.def("forward_3d", &softpool3d_forward_cuda, "SoftPool3d forward (CUDA)");
209 |   m.def("backward_3d", &softpool3d_backward_cuda, "SoftPool3d backward (CUDA)");
210 | }
211 | 


--------------------------------------------------------------------------------
/pytorch/CUDA/softpool_cuda_kernel.cu:
--------------------------------------------------------------------------------
  1 | #include <float.h>
  2 | #include <ATen/ATen.h>
  3 | #include <THC/THCAtomics.cuh>
  4 | 
  5 | #include "limits.cuh"
  6 | 
  7 | using namespace at;  // fix for pytorch<=0.4.1
  8 | 
  9 | #define CUDA_1D_KERNEL_LOOP(i, n)                            \
 10 |   for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n; \
 11 |        i += blockDim.x * gridDim.x)
 12 | 
 13 | #define THREADS_PER_BLOCK 1024
 14 | 
 15 | inline int GET_BLOCKS(const int N) {
 16 |   int optimal_block_num = (N + THREADS_PER_BLOCK - 1) / THREADS_PER_BLOCK;
 17 |   int max_block_num = 65000;
 18 |   return min(optimal_block_num, max_block_num);
 19 | }
 20 | 
 21 | //type-safe sign
 22 | template <typename scalar_t>
 23 | __device__ scalar_t sgn(scalar_t val) {
 24 |     return (scalar_t(0) < val) - (val < scalar_t(0));
 25 | }
 26 | 
 27 | // Overflow and Underflow clamp
 28 | template <typename scalar_t>
 29 | __device__  scalar_t clamp(const scalar_t n, const scalar_t lower, const scalar_t upper) {
 30 |   const scalar_t tmp = abs(n);
 31 |   const scalar_t result = max(lower, min(tmp, upper));
 32 |   return result * sgn(n);
 33 | }
 34 | 
 35 | 
 36 | template <typename scalar_t>
 37 | __global__ void SoftPool1dForward(const int nthreads,
 38 |                                   const scalar_t *bottom_input, const int batches,
 39 |                                   const int channels, const int dim,
 40 |                                   const int kernel_d, const int stride_d,
 41 |                                   scalar_t *output_data){
 42 |   int pooled_dim = dim/stride_d;
 43 |   // Run in parallel for each cell within each kernel region
 44 |   CUDA_1D_KERNEL_LOOP(index, nthreads) {
 45 |     int pd = index % pooled_dim;// index of each kernel operation in relation to the position in the input
 46 |     int c = (index / pooled_dim) % channels;
 47 |     int n = index / pooled_dim / channels;
 48 | 
 49 |     const int offset = (n * channels + c) * dim; // initial offset
 50 |     const scalar_t *offset_bottom_input = bottom_input + offset;
 51 | 
 52 |     const int base_d = pd*stride_d; // start cell index for each kernel
 53 |     if (base_d > dim - kernel_d)continue; // skip windows that start past the final kernel position; continue (not break) so the grid-stride loop still visits this thread's later indices
 54 | 
 55 |     // --- Initialisations happen here ----
 56 |     scalar_t mask_sum_max = 0.;
 57 | 
 58 |     output_data[index] = 0.;
 59 |     const scalar_t upper = n_limits<scalar_t>::max();
 60 |     const scalar_t lower = n_limits<scalar_t>::min();
 61 |     const scalar_t zero = 0.;
 62 | 
 63 |     // Iterate over inputs cells within each kernel region in the input
 64 |     for(int id=0; id<kernel_d; id++){
 65 |       const int d_offset = base_d + id;
 66 | 
 67 |       if(d_offset >= dim || d_offset < 0)continue;// check if the offset index is valid (not larger than or equal to the size of the dimension) OR smaller than 0 (for fool proofing)
 68 |       const int offset = d_offset;
 69 | 
 70 |       // Use this for verbose when debugging
 71 |       //printf("(pd: %d), base_d: %d, id: %d, d_offset: %d \n", pd, base_d, id, d_offset);
 72 | 
 73 |       mask_sum_max += exp(offset_bottom_input[offset]);
 74 | 
 75 |     }
 76 |     // Overflow check
 77 |     mask_sum_max = clamp(mask_sum_max, lower, upper);
 78 | 
 79 |     for(int id=0; id<kernel_d; id++){
 80 |       const int d_offset = base_d + id;
 81 | 
 82 |       if(d_offset >= dim || d_offset < 0)continue;
 83 |       const int offset = d_offset;
 84 | 
 85 |       scalar_t mask_ = exp(offset_bottom_input[offset])/ mask_sum_max;// SoftMax
 86 | 
 87 |       output_data[index] += offset_bottom_input[offset] * mask_;
 88 |       output_data[index] = clamp(output_data[index], zero, upper);
 89 |     }
 90 |   }
 91 | }
 92 | 
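
For reference, here is a pure-Python sketch of what `SoftPool1dForward` above computes for a single (batch, channel) row: each output cell is the sum of the window's inputs weighted by their softmax. It is a simplified illustration, assuming a plain list as input; it omits the clamping steps and the dtype dispatch, and `softpool1d_reference` is a hypothetical name that does not exist in this repository.

```python
import math


def softpool1d_reference(x, kernel, stride):
    """Softmax-weighted pooling over one 1D row, mirroring the kernel's indexing."""
    dim = len(x)
    pooled_dim = dim // stride          # same as `dim/stride_d` in the kernel
    out = [0.0] * pooled_dim

    for pd in range(pooled_dim):
        base = pd * stride              # first input cell of this window
        if base > dim - kernel:         # window would run past the input: skip it
            continue

        window = x[base:base + kernel]
        weights = [math.exp(v) for v in window]   # unnormalised softmax weights
        total = sum(weights)                      # `mask_sum_max` in the kernel

        # Weighted sum of the inputs, with weights exp(x_i) / sum_j exp(x_j).
        out[pd] = sum(v * w for v, w in zip(window, weights)) / total
    return out


if __name__ == "__main__":
    row = [0.0, 1.0, 2.0, 2.0]
    print(softpool1d_reference(row, kernel=2, stride=2))
```

For a window holding distinct values the result falls strictly between the window's mean and its maximum, which is the behaviour that separates this operator from average and max pooling.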
 93 | 
 94 | template <typename scalar_t>
 95 | __global__ void SoftPool2dForward(const int nthreads,
 96 |                                   const scalar_t *bottom_input, const int batches,
 97 |                                   const int channels, const int height,
 98 |                                   const int width, const int kernel_h,
 99 |                                   const int kernel_w, const int stride_h,
100 |                                   const int stride_w, scalar_t *output_data){
101 |   int pooled_height = height/stride_h;
102 |   int pooled_width = width/stride_w;
103 |   // Run in parallel for each cell within each kernel region
104 |   CUDA_1D_KERNEL_LOOP(index, nthreads) {
105 |     int pw = index % pooled_width; // index over width of each kernel operation in relation to the position in the input
106 |     int ph = (index / pooled_width) % pooled_height; // index  over height of each kernel operation in relation to the position in the input
107 |     int c = (index / pooled_width / pooled_height) % channels;
108 |     int n = index / pooled_width / pooled_height / channels;
109 | 
110 |     const int offset = (n * channels + c) * height * width; // initial offset
111 |     const scalar_t *offset_bottom_input = bottom_input + offset;
112 | 
113 |     const int base_y = ph*stride_h;// start cell index over height/y for each kernel
114 |     if (base_y > height - kernel_h)continue; // skip windows that start past the final kernel location over height/y; continue (not break) so the grid-stride loop still visits this thread's later indices
115 | 
116 |     const int base_x = pw*stride_w; // start cell index over width/x for each kernel
117 |     if (base_x > width - kernel_w)continue; // skip windows that start past the final kernel location over width/x
118 | 
119 |     // --- Initialisations happen here ----
120 |     scalar_t mask_sum_max = 0.;
121 | 
122 |     output_data[index] = 0.;
123 |     const scalar_t upper = n_limits<scalar_t>::max();
124 |     const scalar_t lower = n_limits<scalar_t>::min();
125 |     const scalar_t zero = 0.;
126 | 
127 |     // Iterate over inputs cells within each kernel region in the input
128 |     for(int iy=0; iy<kernel_h; iy++){
129 |       const int y_offset = base_y + iy;
130 | 
131 |       if(y_offset >= height || y_offset < 0)continue; // check if the offset index over y is valid (not larger than or equal to the size of the dimension) OR smaller than 0 (for fool proofing)
132 | 
133 |       for(int ix=0; ix<kernel_w; ix++){
134 |         const int x_offset = base_x + ix;
135 | 
136 |         if(x_offset >= width || x_offset < 0)continue; // check if the offset index over x is valid (not larger than or equal to the size of the dimension) OR smaller than 0 (for fool proofing)
137 | 
138 |         const int offset = y_offset*width + x_offset;
139 | 
140 |         // Use this for verbose when debugging
141 |         // printf("(ph: %d, pw: %d), base_y: %d, base_x: %d, iy: %d, ix: %d offset: %d \n", ph, pw, base_y, base_x, iy, ix, offset)
142 | 
143 |         mask_sum_max += exp(offset_bottom_input[offset]);
144 | 
145 |       }
146 |     }
147 |     // Overflow check
148 |     mask_sum_max = clamp(mask_sum_max, lower, upper);
149 | 
150 | 
151 |     for(int iy=0; iy<kernel_h; iy++){
152 |       const int y_offset = base_y + iy; // offset adjustment (y-based)
153 | 
154 |       if(y_offset >= height || y_offset < 0)continue;
155 | 
156 |       for(int ix=0; ix<kernel_w; ix++){
157 |         const int x_offset = base_x + ix; // offset adjustment (x-based)
158 | 
159 |         if(x_offset >= width || x_offset < 0)continue;
160 |         const int offset = y_offset*width + x_offset; // x+y adjusted offset
161 | 
162 |         scalar_t mask_ = exp(offset_bottom_input[offset])/  mask_sum_max; // SoftMax
163 | 
164 |         output_data[index] += offset_bottom_input[offset] * mask_;
165 |         output_data[index] = clamp(output_data[index], zero, upper);
166 |       }
167 |     }
168 |   }
169 | }
170 | 
171 | 
172 | template <typename scalar_t>
173 | __global__ void SoftPool3dForward(const int nthreads,
174 |                                   const scalar_t *bottom_input, const int batches,
175 |                                   const int channels, const int depth,
176 |                                   const int height, const int width,
177 |                                   const int kernel_d, const int kernel_h,
178 |                                   const int kernel_w, const int stride_d,
179 |                                   const int stride_h, const int stride_w,
180 |                                   scalar_t *output_data){
181 |     int pooled_depth = depth/stride_d;
182 |     int pooled_height = height/stride_h;
183 |     int pooled_width = width/stride_w;
184 |     CUDA_1D_KERNEL_LOOP(index, nthreads) {
185 |       int pw = index % pooled_width;
186 |       int ph = (index / pooled_width) % pooled_height;
187 |       int pd = (index / pooled_width / pooled_height) % pooled_depth;
188 |       int c = (index / pooled_width / pooled_height / pooled_depth) % channels;
189 |       int n = index / pooled_width / pooled_height / pooled_depth / channels;
190 | 
191 |       const int offset = (n * channels + c) * depth * height * width;
192 |       const scalar_t *offset_bottom_input = bottom_input + offset;
193 | 
194 |       scalar_t mask_sum = 0.;
195 |       output_data[index] = 0.;
196 |       const scalar_t upper = n_limits<scalar_t>::max();
197 |       const scalar_t lower = n_limits<scalar_t>::min();
198 |       const scalar_t zero = 0.;
199 | 
200 |       for(int id=0; id<kernel_d; id++){
201 |         const int d_offset = pd*stride_d + id - kernel_d/2;
202 |         if(d_offset >= depth || d_offset < 0)continue;
203 |         for(int iy=0; iy<kernel_h; iy++){
204 |           const int y_offset = ph*stride_h + iy - kernel_h/2;
205 |           if(y_offset >= height || y_offset < 0)continue;
206 |           for(int ix=0; ix<kernel_w; ix++){
207 |             const int x_offset = pw*stride_w + ix - kernel_w/2;
208 |             if(x_offset >= width || x_offset < 0)continue;
209 |             const int offset = d_offset*height + y_offset*width + x_offset;
210 | 
211 |             // (Over/Under)flow check (A.) 0 <= e^{inp[offset]} <= FLT_MAX
212 |             scalar_t mask = exp(offset_bottom_input[offset]);
213 |             mask = clamp(mask, zero, upper);
214 |             mask_sum += mask;
215 |           }
216 |         }
217 |       }
218 |       // Overflow check (B.) FLT_MIN <= sum{e^{inp[offset]}} <= FLT_MAX
219 |       mask_sum = clamp(mask_sum, lower, upper);
220 | 
221 |       for(int id=0; id<kernel_d; id++){
222 |         const int d_offset = pd*stride_d + id - kernel_d/2;
223 |         if(d_offset >= depth || d_offset < 0)continue;
224 |         for(int iy=0; iy<kernel_h; iy++){
225 |           const int y_offset = ph*stride_h + iy - kernel_h/2;
226 |           if(y_offset >= height || y_offset < 0)continue;
227 |           for(int ix=0; ix<kernel_w; ix++){
228 |             const int x_offset = pw*stride_w + ix - kernel_w/2;
229 |             if(x_offset >= width || x_offset < 0)continue;
230 |             const int offset = (d_offset*height + y_offset)*width + x_offset; // row-major index into [depth, height, width]
231 | 
232 |             // (Over/Under)flow check (C.) 0 <= e^{inp[offset]} <= FLT_MAX
233 |             scalar_t mask = exp(offset_bottom_input[offset]);
234 |             mask = clamp(mask, zero, upper);
235 | 
236 |             // Underflow check (D.) 0 <= e^{inp[offset]}/sum{e^{inp[offset]}} <= 1
237 |             mask /=  mask_sum;
238 |             mask = clamp(mask, zero, upper);
239 | 
240 |             // Underflow check (E.) 0 <= (e^{inp[offset]}/sum{e^{inp[offset]}}) * inp[offset] <= FLT_MAX
241 |             scalar_t weighted_inp = offset_bottom_input[offset] * mask;
242 |             weighted_inp = clamp(weighted_inp, zero, upper);
243 | 
244 |             // Overflow check (F.) 0 <= sum[(e^{inp[offset]}/sum{e^{inp[offset]}}) * inp[offset]] <= FLT_MAX
245 |             output_data[index] += weighted_inp;
246 |             output_data[index] = clamp(output_data[index], zero, upper);
247 |           }
248 |         }
249 |       }
250 |     }
251 | }
252 | 
253 | 
254 | int SoftPool1dForwardLauncher(const at::Tensor input, const int batches,
255 |                              const int channels, const int dim,
256 |                              const int kernel_d, const int stride_d,
257 |                              at::Tensor output){
258 |     const int output_size = batches * dim/stride_d * channels;
259 |     AT_DISPATCH_FLOATING_TYPES_AND_HALF(
260 |         input.scalar_type(), "SoftPool1dLauncherForward", ([&] {
261 |         const scalar_t *bottom_input = input.data_ptr<scalar_t>();
262 |         scalar_t *output_data = output.data_ptr<scalar_t>();
263 | 
264 |         SoftPool1dForward<scalar_t>
265 |         <<<GET_BLOCKS(output_size), THREADS_PER_BLOCK>>>(
266 |           output_size, bottom_input,
267 |           batches, channels,
268 |           dim, kernel_d,
269 |           stride_d, output_data);
270 |         })
271 |       );
272 | 
273 |     cudaError_t err = cudaGetLastError();
274 |     if (cudaSuccess != err) {
275 |       fprintf(stderr, "cudaCheckError() failed : %s\n", cudaGetErrorString(err));
276 |       exit(-1);
277 |     }
278 |   return 1;
279 | }
280 | 
281 | int SoftPool2dForwardLauncher(const at::Tensor input, const int batches,
282 |                              const int channels, const int height,
283 |                              const int width, const int kernel_h,
284 |                              const int kernel_w, const int stride_h,
285 |                              const int stride_w, at::Tensor output){
286 |     const int output_size = batches * height/stride_h * width/stride_w * channels;
287 |     AT_DISPATCH_FLOATING_TYPES_AND_HALF(
288 |         input.scalar_type(), "SoftPool2dLauncherForward", ([&] {
289 |         const scalar_t *bottom_input = input.data_ptr<scalar_t>();
290 |         scalar_t *output_data = output.data_ptr<scalar_t>();
291 | 
292 |         SoftPool2dForward<scalar_t>
293 |         <<<GET_BLOCKS(output_size), THREADS_PER_BLOCK>>>(
294 |           output_size, bottom_input,
295 |           batches, channels,
296 |           height, width,
297 |           kernel_h, kernel_w,
298 |           stride_h, stride_w,
299 |           output_data);
300 |         })
301 |       );
302 | 
303 |     cudaError_t err = cudaGetLastError();
304 |     if (cudaSuccess != err) {
305 |       fprintf(stderr, "cudaCheckError() failed : %s\n", cudaGetErrorString(err));
306 |       exit(-1);
307 |     }
308 |   return 1;
309 | }
310 | 
311 | int SoftPool3dForwardLauncher(const at::Tensor input, const int batches,
312 |                              const int channels, const int depth,
313 |                              const int height, const int width,
314 |                              const int kernel_d, const int kernel_h,
315 |                              const int kernel_w, const int stride_d,
316 |                              const int stride_h, const int stride_w,
317 |                             at::Tensor output){
318 |     const int output_size = batches * depth/stride_d * height/stride_h * width/stride_w * channels;
319 |     AT_DISPATCH_FLOATING_TYPES_AND_HALF(
320 |         input.scalar_type(), "SoftPool3dLauncherForward", ([&] {
321 |         const scalar_t *bottom_input = input.data_ptr<scalar_t>();
322 |         scalar_t *output_data = output.data_ptr<scalar_t>();
323 | 
324 |         SoftPool3dForward<scalar_t>
325 |         <<<GET_BLOCKS(output_size), THREADS_PER_BLOCK>>>(
326 |           output_size, bottom_input,
327 |           batches, channels,
328 |           depth, height,
329 |           width, kernel_d,
330 |           kernel_h, kernel_w,
331 |           stride_d, stride_h,
332 |           stride_w, output_data);
333 |         })
334 |       );
335 | 
336 |     cudaError_t err = cudaGetLastError();
337 |     if (cudaSuccess != err) {
338 |       fprintf(stderr, "cudaCheckError() failed : %s\n", cudaGetErrorString(err));
339 |       exit(-1);
340 |     }
341 |   return 1;
342 | }
343 | 
344 | 
345 | template <typename scalar_t>
346 | __global__ void SoftPool1dBackward(const int nthreads,
347 |                               const scalar_t *diff_output, const scalar_t *data_input,
348 |                               const int batches, const int channels,
349 |                               const int dim, const int kernel_d,
350 |                               const int stride_d, scalar_t *diff_input){
351 |     int pooled_dim = dim/stride_d;
352 |     // Run in parallel for each cell within each kernel region
353 |     CUDA_1D_KERNEL_LOOP(index, nthreads) {
354 |       int pd = index % pooled_dim; // index of each kernel operation in relation to the position in the input
355 |       int c = (index / pooled_dim) % channels;
356 |       int n = index / pooled_dim / channels;
357 | 
358 |       const int offset0 = (n * channels + c) * dim; // initial offset
359 |       const scalar_t *offset_data_input = data_input + offset0; // offset based on the input data
360 | 
361 |       const scalar_t diff_output_index = diff_output[index]; // offset based on the output gradients
362 |       scalar_t *offset_diff_input = diff_input + offset0; // offset based on the input gradients
363 | 
364 |       const int base_d = pd*stride_d; // start cell index for each kernel
365 | 
366 |       // --- Initialisations happen here ----
367 |       scalar_t mask_sum_max = 0.;
368 |       const scalar_t upper = n_limits<scalar_t>::max();
369 |       const scalar_t lower = n_limits<scalar_t>::min();
370 | 
371 |       // Iterate over inputs cells within each kernel region in the input
372 |       for(int id=0; id<kernel_d; id++){
373 |         const int d_offset = base_d + id;
374 | 
375 |         if(d_offset >= dim || d_offset < 0)continue; // skip cells outside the input (the d_offset < 0 test is only a safeguard)
376 |         const int offset = d_offset;
377 | 
378 |         // Use this for verbose when debugging
379 |         //printf("(pd: %d), base_d: %d, id: %d, d_offset: %d \n", pd, base_d, id, d_offset);
380 | 
381 |         mask_sum_max += exp(offset_data_input[offset]);
382 | 
383 |       }
384 |       // Overflow check
385 |       mask_sum_max = clamp(mask_sum_max, lower, upper);
386 | 
387 |       for(int id=0; id<kernel_d; id++){
388 |         const int d_offset = base_d + id;
389 | 
390 |         if(d_offset >= dim || d_offset < 0)continue;
391 |           const int offset = d_offset;
392 | 
393 |           scalar_t mask_ = exp(offset_data_input[offset])/mask_sum_max; // SoftMax
394 | 
395 |           scalar_t weighted_grad = diff_output_index * mask_; // use mask over the output gradients
396 | 
397 |           // Underflow check
398 |           weighted_grad = clamp(weighted_grad, lower, upper);
399 | 
400 |           atomicAdd(offset_diff_input+offset, weighted_grad);
401 |       }
402 |     }
403 | }
404 | 
405 | template <typename scalar_t>
406 | __global__ void SoftPool2dBackward(const int nthreads,
407 |                               const scalar_t *diff_output, const scalar_t *data_input,
408 |                               const int batches, const int channels,
409 |                               const int height, const int width,
410 |                               const int kernel_h, const int kernel_w,
411 |                               const int stride_h, const int stride_w,
412 |                               scalar_t *diff_input){
413 |     int pooled_height = height/stride_h;
414 |     int pooled_width = width/stride_w;
415 |     // Run in parallel for each cell within each kernel region
416 |     CUDA_1D_KERNEL_LOOP(index, nthreads) {
417 |       int pw = index % pooled_width; // index over width of each kernel operation in relation to the position in the input
418 |       int ph = (index / pooled_width) % pooled_height; // index  over height of each kernel operation in relation to the position in the input
419 |       int c = (index / pooled_width / pooled_height) % channels;
420 |       int n = index / pooled_width / pooled_height / channels;
421 | 
422 |       const int offset0 = (n * channels + c) * height * width; // initial offset
423 |       const scalar_t *offset_data_input = data_input + offset0; // offset based on the input data
424 | 
425 |       const scalar_t diff_output_index = diff_output[index]; // offset based on the output gradients
426 |       scalar_t *offset_diff_input = diff_input + offset0; // offset based on the input gradients
427 | 
428 |       const int base_y = ph * stride_h; // start cell index over height/y for each kernel
429 |       if (base_y > height - kernel_h)continue; // skip windows that would extend past the last row; `continue` (not `break`) keeps the grid-stride loop alive
430 | 
431 |       const int base_x = pw * stride_w; // start cell index over width/x for each kernel
432 |       if (base_x > width - kernel_w)continue; // skip windows that would extend past the last column
433 | 
434 |       // --- Initialisations happen here ----
435 |       scalar_t mask_sum_max = 0.;
436 | 
437 |       const scalar_t upper = n_limits<scalar_t>::max();
438 |       const scalar_t lower = n_limits<scalar_t>::min();
439 | 
440 |       // Iterate over inputs cells within each kernel region in the input
441 |       for(int iy=0; iy<kernel_h; iy++){
442 |         const int y_offset = base_y + iy;
443 | 
444 |         if(y_offset >= height || y_offset < 0)continue; // skip rows outside the input (the y_offset < 0 test is only a safeguard)
445 | 
446 |         for(int ix=0; ix<kernel_w; ix++){
447 |           const int x_offset = base_x + ix;
448 | 
449 |           if(x_offset >= width || x_offset < 0)continue; // skip columns outside the input (the x_offset < 0 test is only a safeguard)
450 | 
451 |           const int offset = y_offset*width + x_offset;
452 | 
453 |           // Use this for verbose when debugging
454 |           // printf("(ph: %d, pw: %d), base_y: %d, base_x: %d, iy: %d, ix: %d offset: %d \n", ph, pw, base_y, base_x, iy, ix, offset)
455 | 
456 |           mask_sum_max += exp(offset_data_input[offset]);
457 | 
458 |         }
459 |       }
460 |       // Overflow check
461 |       mask_sum_max = clamp(mask_sum_max, lower, upper);
462 | 
463 |       for(int iy=0; iy<kernel_h; iy++){
464 |         const int y_offset = base_y + iy; // offset adjustment (y-based)
465 | 
466 |         if(y_offset >= height || y_offset < 0)continue;
467 |         for(int ix=0; ix<kernel_w; ix++){
468 |           const int x_offset = base_x + ix;
469 | 
470 |           if(x_offset >= width || x_offset < 0)continue;
471 |             const int offset = y_offset*width + x_offset; // offset adjustment (x-based)
472 | 
473 |             scalar_t mask_ = exp(offset_data_input[offset])/mask_sum_max; // SoftMax (sum)
474 | 
475 |             scalar_t weighted_grad = diff_output_index * mask_; // use mask over the output gradients
476 | 
477 |             // Underflow check
478 |             weighted_grad = clamp(weighted_grad, lower, upper);
479 | 
480 |             atomicAdd(offset_diff_input+offset, weighted_grad);
481 |         }
482 |       }
483 |     }
484 | }
485 | 
486 | template <typename scalar_t>
487 | __global__ void SoftPool3dBackward(const int nthreads,
488 |                               const scalar_t *diff_output, const scalar_t *data_input,
489 |                               const int batches, const int channels,
490 |                               const int depth, const int height,
491 |                               const int width, const int kernel_d,
492 |                               const int kernel_h, const int kernel_w ,
493 |                               const int stride_d, const int stride_h,
494 |                               const int stride_w, scalar_t *diff_input){
495 |     int pooled_depth = depth/stride_d;
496 |     int pooled_height = height/stride_h;
497 |     int pooled_width = width/stride_w;
498 |     CUDA_1D_KERNEL_LOOP(index, nthreads) {
499 |       int pw = index % pooled_width; // index over width of each kernel operation in relation to the position in the input
500 |       int ph = (index / pooled_width) % pooled_height; // index over height of each kernel operation in relation to the position in the input
501 |       int pd = (index / pooled_width / pooled_height) % pooled_depth; // index over depth of each kernel operation in relation to the position in the input
502 |       int c = (index / pooled_width / pooled_height / pooled_depth) % channels;
503 |       int n = index / pooled_width / pooled_height / pooled_depth / channels;
504 | 
505 |       const int offset0 = (n * channels + c) * depth * height * width; // initial offset
506 |       const scalar_t *offset_data_input = data_input + offset0; // offset based on the input data
507 | 
508 |       const scalar_t diff_output_index = diff_output[index]; // offset based on the output gradients
509 |       scalar_t *offset_diff_input = diff_input + offset0; // offset based on the input gradients
510 | 
511 |       const int base_d = pd*stride_d; // start cell index over depth/d for each kernel
512 |       if (base_d > depth - kernel_d)continue; // skip windows that would extend past the last depth slice; `continue` (not `break`) keeps the grid-stride loop alive
513 | 
514 |       const int base_y = ph*stride_h; // start cell index over height/y for each kernel
515 |       if (base_y > height - kernel_h)continue; // skip windows that would extend past the last row
516 | 
517 |       const int base_x = pw*stride_w; // start cell index over width/x for each kernel
518 |       if (base_x > width - kernel_w)continue; // skip windows that would extend past the last column
519 | 
520 |       // --- Initialisations happen here ----
521 |       scalar_t mask_sum_max = 0.;
522 | 
523 |       const scalar_t upper = n_limits<scalar_t>::max();
524 |       const scalar_t lower = n_limits<scalar_t>::min();
525 | 
526 |       // Iterate over inputs cells within each kernel region in the input
527 |       for(int id=0; id<kernel_d; id++){
528 |         const int d_offset = base_d + id;
529 | 
530 |         if(d_offset >= depth || d_offset < 0)continue; // skip depth slices outside the input (the d_offset < 0 test is only a safeguard)
531 | 
532 |         for(int iy=0; iy<kernel_h; iy++){
533 |           const int y_offset = base_y + iy;
534 | 
535 |           if(y_offset >= height || y_offset < 0)continue; // skip rows outside the input (the y_offset < 0 test is only a safeguard)
536 | 
537 |           for(int ix=0; ix<kernel_w; ix++){
538 |             const int x_offset = base_x + ix;
539 | 
540 |             if(x_offset >= width || x_offset < 0)continue; // skip columns outside the input (the x_offset < 0 test is only a safeguard)
541 | 
542 |             const int offset = (d_offset*height + y_offset)*width + x_offset; // row-major index into [depth, height, width]
543 | 
544 |             // Use this for verbose when debugging
545 |             // printf("(pd: %d, ph: %d, pw: %d), base_d: %d, base_y: %d, base_x: %d, id: %d, iy: %d, ix: %d, offset: %d \n", pd, ph, pw, base_d, base_y, base_x, id, iy, ix, offset);
546 | 
547 |             mask_sum_max += exp(offset_data_input[offset]);
548 | 
549 |           }
550 |         }
551 |       }
552 |       // Overflow check
553 |       mask_sum_max = clamp(mask_sum_max, lower, upper);
554 | 
555 |       for(int id=0; id<kernel_d; id++){
556 |         const int d_offset = base_d + id; // offset adjustment (d-based)
557 | 
558 |         if(d_offset >= depth || d_offset < 0)continue;
559 |         for(int iy=0; iy<kernel_h; iy++){
560 |           const int y_offset = base_y + iy; // offset adjustment (y-based)
561 | 
562 |           if(y_offset >= height || y_offset < 0)continue;
563 |           for(int ix=0; ix<kernel_w; ix++){
564 |             const int x_offset = base_x + ix; // offset adjustment (x-based)
565 | 
566 |             if(x_offset >= width || x_offset < 0)continue;
567 |               const int offset = (d_offset*height + y_offset)*width + x_offset; // row-major index into [depth, height, width]
568 | 
569 |               scalar_t mask_ = exp(offset_data_input[offset])/mask_sum_max; // SoftMax
570 | 
571 |               scalar_t weighted_grad = diff_output_index * mask_; // use mask over the output gradients
572 | 
573 |               // Underflow check
574 |               weighted_grad = clamp(weighted_grad, lower, upper);
575 | 
576 |               atomicAdd(offset_diff_input+offset, weighted_grad);
577 |           }
578 |         }
579 |       }
580 |     }
581 | }
582 | 
583 | int SoftPool1dBackwardLauncher(const at::Tensor output_grad, const at::Tensor input,
584 |                                const int batches, const int channels,
585 |                                const int dim, const int kernel_d,
586 |                                const int stride_d, at::Tensor input_grad){
587 | 
588 |     const int output_size = batches * dim/stride_d * channels;
589 | 
590 |     AT_DISPATCH_FLOATING_TYPES_AND_HALF(
591 |         input.scalar_type(), "SoftPool1dLauncherBackward", ([&] {
592 |         scalar_t *diff_input = input_grad.data_ptr<scalar_t>();
593 |         const scalar_t *diff_output = output_grad.data_ptr<scalar_t>();
594 |         const scalar_t *data_input = input.data_ptr<scalar_t>();
595 | 
596 |         SoftPool1dBackward<scalar_t>
597 |         <<<GET_BLOCKS(output_size), THREADS_PER_BLOCK>>>(
598 |           output_size, diff_output,
599 |           data_input, batches,
600 |           channels, dim,
601 |           kernel_d, stride_d,
602 |           diff_input);
603 |         }
604 |         )
605 |         );
606 | 
607 |     cudaError_t err = cudaGetLastError();
608 |     if (cudaSuccess != err) {
609 |       fprintf(stderr, "cudaCheckError() failed : %s\n", cudaGetErrorString(err));
610 |       exit(-1);
611 |     }
612 |   return 1;
613 | }
614 | 
615 | int SoftPool2dBackwardLauncher(const at::Tensor output_grad, const at::Tensor input,
616 |                                const int batches, const int channels,
617 |                                const int height, const int width,
618 |                                const int kernel_h, const int kernel_w,
619 |                                const int stride_h, const int stride_w,
620 |                                at::Tensor input_grad){
621 | 
622 |     const int output_size = batches * height/stride_h * width/stride_w * channels;
623 | 
624 |     AT_DISPATCH_FLOATING_TYPES_AND_HALF(
625 |         input.scalar_type(), "SoftPool2dLauncherBackward", ([&] {
626 |         scalar_t *diff_input = input_grad.data_ptr<scalar_t>();
627 |         const scalar_t *diff_output = output_grad.data_ptr<scalar_t>();
628 |         const scalar_t *data_input = input.data_ptr<scalar_t>();
629 | 
630 |         SoftPool2dBackward<scalar_t>
631 |         <<<GET_BLOCKS(output_size), THREADS_PER_BLOCK>>>(
632 |           output_size, diff_output,
633 |           data_input, batches,
634 |           channels, height,
635 |           width, kernel_h,
636 |           kernel_w, stride_h,
637 |           stride_w, diff_input);
638 |         }
639 |         )
640 |         );
641 | 
642 |     cudaError_t err = cudaGetLastError();
643 |     if (cudaSuccess != err) {
644 |       fprintf(stderr, "cudaCheckError() failed : %s\n", cudaGetErrorString(err));
645 |       exit(-1);
646 |     }
647 |   return 1;
648 | }
649 | 
650 | int SoftPool3dBackwardLauncher(const at::Tensor output_grad, const at::Tensor input,
651 |                                const int batches, const int channels,
652 |                                const int depth, const int height,
653 |                                const int width, const int kernel_d,
654 |                                const int kernel_h, const int kernel_w,
655 |                                const int stride_d, const int stride_h,
656 |                                const int stride_w, at::Tensor input_grad){
657 | 
658 |     const int output_size = batches * depth/stride_d * height/stride_h * width/stride_w * channels;
659 | 
660 |     AT_DISPATCH_FLOATING_TYPES_AND_HALF(
661 |         input.scalar_type(), "SoftPool3dLauncherBackward", ([&] {
662 |         scalar_t *diff_input = input_grad.data_ptr<scalar_t>();
663 |         const scalar_t *diff_output = output_grad.data_ptr<scalar_t>();
664 |         const scalar_t *data_input = input.data_ptr<scalar_t>();
665 | 
666 |         SoftPool3dBackward<scalar_t>
667 |         <<<GET_BLOCKS(output_size), THREADS_PER_BLOCK>>>(
668 |           output_size, diff_output,
669 |           data_input, batches,
670 |           channels, depth, height,
671 |           width, kernel_d,
672 |           kernel_h, kernel_w,
673 |           stride_d, stride_h,
674 |           stride_w, diff_input);
675 |         }
676 |         )
677 |         );
678 | 
679 |     cudaError_t err = cudaGetLastError();
680 |     if (cudaSuccess != err) {
681 |       fprintf(stderr, "cudaCheckError() failed : %s\n", cudaGetErrorString(err));
682 |       exit(-1);
683 |     }
684 |   return 1;
685 | }
686 | 
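
The 3D kernels above read a flat, row-major buffer and weight each cell in a window by its share of the window's exponential sum. As a rough cross-check of that arithmetic, here is a minimal pure-Python sketch for a single window. The names `flat_index_3d` and `soft_pool_window_3d` are hypothetical and not part of the repository. The sketch assumes windows that start at `base` (as in the backward kernels) and it omits the kernels' clamping to the floating-point limits.

```python
import math


def flat_index_3d(d, y, x, height, width):
    """Row-major offset of cell (d, y, x) in a [depth, height, width] volume."""
    return (d * height + y) * width + x


def soft_pool_window_3d(volume, depth, height, width, base, kernel):
    """Soft-pool one window of a flat, row-major volume.

    base   -- (d0, y0, x0), first cell of the window
    kernel -- (kd, kh, kw), window extent along each axis
    Cells that fall outside the volume are skipped.
    """
    cells = []
    for i_d in range(kernel[0]):
        d = base[0] + i_d
        if d < 0 or d >= depth:
            continue
        for i_y in range(kernel[1]):
            y = base[1] + i_y
            if y < 0 or y >= height:
                continue
            for i_x in range(kernel[2]):
                x = base[2] + i_x
                if x < 0 or x >= width:
                    continue
                cells.append(volume[flat_index_3d(d, y, x, height, width)])

    # Softmax weights over the window, then the weighted sum of the inputs.
    weights = [math.exp(v) for v in cells]
    total = sum(weights)
    return sum(w / total * v for w, v in zip(weights, cells))


if __name__ == "__main__":
    volume = [float(i) for i in range(2 * 2 * 3)]  # depth=2, height=2, width=3
    print(flat_index_3d(1, 1, 2, height=2, width=3))
    print(soft_pool_window_3d(volume, 2, 2, 3, base=(0, 0, 0), kernel=(2, 2, 2)))
```

For a window whose values are not all equal, the result lies strictly between the window's mean and its maximum, which is the behaviour soft pooling is meant to have. A production version would also need a guard against `exp` overflow for large inputs; the kernels do this by clamping, and subtracting the window maximum before exponentiating is a common alternative.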


--------------------------------------------------------------------------------
/pytorch/Makefile:
--------------------------------------------------------------------------------
 1 | install: clean
 2 | 	python setup.py install
 3 | 
 4 | clean:
 5 | 	rm -rf *.egg-info
 6 | 	rm -rf build dist
 7 | 
 8 | test:
 9 | 	cd test-files && python test.py
10 | 


--------------------------------------------------------------------------------
/pytorch/SoftPool/__init__.py:
--------------------------------------------------------------------------------
1 | from .idea import soft_pool1d, soft_pool2d, soft_pool3d, SoftPool1d, SoftPool2d, SoftPool3d
2 | 
3 | __all__ = ['soft_pool1d', 'soft_pool2d', 'soft_pool3d', 'SoftPool1d', 'SoftPool2d', 'SoftPool3d']
4 | 


--------------------------------------------------------------------------------
/pytorch/SoftPool/__pycache__/idea.cpython-38.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/alexandrosstergiou/SoftPool/e11dee7e96ecad895cf871c8cbf220f7908462ed/pytorch/SoftPool/__pycache__/idea.cpython-38.pyc


--------------------------------------------------------------------------------
/pytorch/SoftPool/idea.py:
--------------------------------------------------------------------------------
  1 | from torch import nn
  2 | from torch.autograd import Function
  3 | import torch.nn.functional as F
  4 | import torch
  5 | from torch.nn.modules.utils import _triple, _pair, _single
  6 | 
  7 | import softpool_cuda
  8 | 
  9 | 
 10 | class CUDA_SOFTPOOL1d(Function):
 11 |     @staticmethod
 12 |     @torch.cuda.amp.custom_fwd(cast_inputs=torch.float32)
 13 |     def forward(ctx, input, kernel=2, stride=None):
 14 |         # Add a batch dimension if the input is unbatched ([C, D])
 15 |         no_batch = False
 16 |         if len(input.size()) == 2:
 17 |             no_batch = True
 18 |             input.unsqueeze_(0)
 19 |         B, C, D = input.size()
 20 |         kernel = _single(kernel)
 21 |         if stride is None:
 22 |             stride = kernel
 23 |         else:
 24 |             stride = _single(stride)
 25 |         oD = (D-kernel[0]) // stride[0] + 1
 26 |         output = input.new_zeros((B, C, oD))
 27 |         softpool_cuda.forward_1d(input.contiguous(), kernel, stride, output)
 28 |         ctx.save_for_backward(input)
 29 |         ctx.kernel = kernel
 30 |         ctx.stride = stride
 31 |         if no_batch:
 32 |             return output.squeeze_(0)
 33 |         return output
 34 | 
 35 |     @staticmethod
 36 |     @torch.cuda.amp.custom_bwd
 37 |     def backward(ctx, grad_output):
 38 |         # Create contiguous tensor (if tensor is not contiguous)
 39 |         grad_input = torch.zeros_like(ctx.saved_tensors[0])
 40 |         saved = [grad_output.contiguous()] + list(ctx.saved_tensors) + [ctx.kernel, ctx.stride] + [grad_input]
 41 |         softpool_cuda.backward_1d(*saved)
 42 |         # Gradient underflow
 43 |         saved[-1][torch.isnan(saved[-1])] = 0
 44 |         return saved[-1], None, None
 45 | 
 46 | 
 47 | class CUDA_SOFTPOOL2d(Function):
 48 |     @staticmethod
 49 |     @torch.cuda.amp.custom_fwd(cast_inputs=torch.float32)
 50 |     def forward(ctx, input, kernel=2, stride=None):
 51 |         # Add a batch dimension if the input is unbatched ([C, H, W])
 52 |         no_batch = False
 53 |         if len(input.size()) == 3:
 54 |             no_batch = True
 55 |             input.unsqueeze_(0)
 56 |         B, C, H, W = input.size()
 57 |         kernel = _pair(kernel)
 58 |         if stride is None:
 59 |             stride = kernel
 60 |         else:
 61 |             stride = _pair(stride)
 62 |         oH = (H - kernel[0]) // stride[0] + 1
 63 |         oW = (W - kernel[1]) // stride[1] + 1
 64 |         output = input.new_zeros((B, C, oH, oW))
 65 |         softpool_cuda.forward_2d(input.contiguous(), kernel, stride, output)
 66 |         ctx.save_for_backward(input)
 67 |         ctx.kernel = kernel
 68 |         ctx.stride = stride
 69 |         if no_batch:
 70 |             return output.squeeze_(0)
 71 |         return output
 72 | 
 73 |     @staticmethod
 74 |     @torch.cuda.amp.custom_bwd
 75 |     def backward(ctx, grad_output):
 76 |         # Create contiguous tensor (if tensor is not contiguous)
 77 |         grad_input = torch.zeros_like(ctx.saved_tensors[0])
 78 |         saved = [grad_output.contiguous()] + list(ctx.saved_tensors) + [ctx.kernel,ctx.stride] + [grad_input]
 79 |         softpool_cuda.backward_2d(*saved)
 80 |         # Gradient underflow
 81 |         saved[-1][torch.isnan(saved[-1])] = 0
 82 |         return saved[-1], None, None
 83 | 
 84 | 
 85 | class CUDA_SOFTPOOL3d(Function):
 86 |     @staticmethod
 87 |     @torch.cuda.amp.custom_fwd(cast_inputs=torch.float32)
 88 |     def forward(ctx, input, kernel=2, stride=None):
 89 |         # Add a batch dimension if the input is unbatched ([C, D, H, W])
 90 |         no_batch = False
 91 |         if len(input.size()) == 4:
 92 |             no_batch = True
 93 |             input.unsqueeze_(0)
 94 |         B, C, D, H, W = input.size()
 95 |         kernel = _triple(kernel)
 96 |         if stride is None:
 97 |             stride = kernel
 98 |         else:
 99 |             stride = _triple(stride)
100 |         oD = (D - kernel[0]) // stride[0] + 1
101 |         oH = (H - kernel[1]) // stride[1] + 1
102 |         oW = (W - kernel[2]) // stride[2] + 1
103 |         output = input.new_zeros((B, C, oD, oH, oW))
104 |         softpool_cuda.forward_3d(input.contiguous(), kernel, stride, output)
105 |         ctx.save_for_backward(input)
106 |         ctx.kernel = kernel
107 |         ctx.stride = stride
108 |         if no_batch:
109 |             return output.squeeze_(0)
110 |         return output
111 | 
112 |     @staticmethod
113 |     @torch.cuda.amp.custom_bwd
114 |     def backward(ctx, grad_output):
115 |         # Create contiguous tensor (if tensor is not contiguous)
116 |         grad_input = torch.zeros_like(ctx.saved_tensors[0])
117 |         saved = [grad_output.contiguous()] + list(ctx.saved_tensors) + [ctx.kernel,ctx.stride] + [grad_input]
118 |         softpool_cuda.backward_3d(*saved)
119 |         # Gradient underflow
120 |         saved[-1][torch.isnan(saved[-1])] = 0
121 |         return saved[-1], None, None
122 | 
123 | 
124 | 
125 | '''
126 | ---  S T A R T  O F  F U N C T I O N  S O F T _ P O O L 1 D  ---
127 |     [About]
128 |         Function for downsampling based on the exponential proportion rate of pixels (soft pooling).
129 |         If the tensor is in CUDA the custom operation is used. Alternatively, the function uses
130 |         standard (mostly) in-place PyTorch operations for speed and reduced memory consumption.
131 |         It is also possible to use non-inplace operations in order to improve stability.
132 |     [Args]
133 |         - x: PyTorch Tensor, on either CPU or CUDA. If on CUDA, the `softpool_cuda` extension is used.
134 |         - kernel_size: Integer or Tuple, for the kernel size to be used for downsampling. If an `Integer`
135 |                        is used, a `Tuple` is created for the rest of the dimensions. Defaults to 2.
136 |         - stride: Integer or Tuple, for the steps taken between kernels (i.e. strides). If `None` the
137 |                   strides become equal to the `kernel_size` tuple. Defaults to `None`.
138 |         - force_inplace: Bool, determines if in-place operations are to be used regardless of the CUDA
139 |                          custom op. Mostly useful for time monitoring. Defaults to `False`.
140 |     [Returns]
141 |         - PyTorch Tensor, subsampled based on the specified `kernel_size` and `stride`
142 | '''
143 | def soft_pool1d(x, kernel_size=2, stride=None, force_inplace=False):
144 |     if x.is_cuda and not force_inplace:
145 |         x = CUDA_SOFTPOOL1d.apply(x, kernel_size, stride)
146 |         # Replace NaNs if found
147 |         if torch.isnan(x).any():
148 |             return torch.nan_to_num(x)
149 |         return x
150 |     kernel_size = _single(kernel_size)
151 |     if stride is None:
152 |         stride = kernel_size
153 |     else:
154 |         stride = _single(stride)
155 |     # Get input sizes
156 |     _, c, d = x.size()
157 |     # Create exponential mask (should be similar to max-like pooling)
158 |     e_x = torch.sum(torch.exp(x),dim=1,keepdim=True)
159 |     e_x = torch.clamp(e_x , float(0), float('inf'))
160 |     # Apply mask to input and pool and calculate the exponential sum
161 |     # Tensor: [b x c x d] -> [b x c x d']
162 |     x = F.avg_pool1d(x.mul(e_x), kernel_size, stride=stride).mul_(sum(kernel_size)).div_(F.avg_pool1d(e_x, kernel_size, stride=stride).mul_(sum(kernel_size)))
163 |     return torch.clamp(x , float(0), float('inf'))
164 | '''
165 | ---  E N D  O F  F U N C T I O N  S O F T _ P O O L 1 D  ---
166 | '''
167 | 
168 | 
169 | 
170 | '''
171 | ---  S T A R T  O F  F U N C T I O N  S O F T _ P O O L 2 D  ---
172 |     [About]
173 |         Function for dowsampling based on the exponenial proportion rate of pixels (soft pooling).
174 |         If the tensor is in CUDA the custom operation is used. Alternatively, the function uses
175 |         standard (mostly) in-place PyTorch operations for speed and reduced memory consumption.
176 |         It is also possible to use non-inplace operations in order to improve stability.
177 |     [Args]
178 |         - x: PyTorch Tensor, could be on either CPU or CUDA. If on CUDA, the homonymous extension is used.
179 |         - kernel_size: Integer or Tuple, for the kernel size to be used for downsampling. If an `Integer`
180 |                        is used, a `Tuple` is created for the rest of the dimensions. Defaults to 2.
181 |         - stride: Integer or Tuple, for the steps taken between kernels (i.e. strides). If `None` the
182 |                   strides become equal to the `kernel_size` tuple. Defaults to `None`.
183 |         - force_inplace: Bool, determines if in-place operations are to be used regardless of the CUDA
184 |                          custom op. Mostly useful for time monitoring. Defaults to `False`.
185 |     [Returns]
186 |         - PyTorch Tensor, subsampled based on the specified `kernel_size` and `stride`
187 | '''
188 | def soft_pool2d(x, kernel_size=2, stride=None, force_inplace=False):
189 |     if x.is_cuda and not force_inplace:
190 |         x = CUDA_SOFTPOOL2d.apply(x, kernel_size, stride)
191 |         # Replace NaNs if found
192 |         if torch.isnan(x).any():
193 |             return torch.nan_to_num(x)
194 |         return x
195 |     kernel_size = _pair(kernel_size)
196 |     if stride is None:
197 |         stride = kernel_size
198 |     else:
199 |         stride = _pair(stride)
200 |     # Get input sizes
201 |     _, c, h, w = x.size()
202 |     # Create exponential mask (should be similar to max-like pooling)
203 |     e_x = torch.sum(torch.exp(x),dim=1,keepdim=True)
204 |     e_x = torch.clamp(e_x , float(0), float('inf'))
205 |     # Apply mask to input and pool and calculate the exponential sum
206 |     # Tensor: [b x c x h x w] -> [b x c x h' x w']
207 |     x = F.avg_pool2d(x.mul(e_x), kernel_size, stride=stride).mul_(sum(kernel_size)).div_(F.avg_pool2d(e_x, kernel_size, stride=stride).mul_(sum(kernel_size)))
208 |     return torch.clamp(x , float(0), float('inf'))
209 | '''
210 | ---  E N D  O F  F U N C T I O N  S O F T _ P O O L 2 D  ---
211 | '''
212 | 
213 | 
214 | 
215 | '''
216 | ---  S T A R T  O F  F U N C T I O N  S O F T _ P O O L 3 D  ---
217 |     [About]
218 |         Function for downsampling based on the exponential proportion rate of pixels (soft pooling).
219 |         If the tensor is in CUDA the custom operation is used. Alternatively, the function uses
220 |         standard (mostly) in-place PyTorch operations for speed and reduced memory consumption.
221 |         It is also possible to use non-inplace operations in order to improve stability.
222 |     [Args]
223 |         - x: PyTorch Tensor, could be on either CPU or CUDA. If on CUDA, the homonymous extension is used.
224 |         - kernel_size: Integer or Tuple, for the kernel size to be used for downsampling. If an `Integer`
225 |                        is used, a `Tuple` is created for the rest of the dimensions. Defaults to 2.
226 |         - stride: Integer or Tuple, for the steps taken between kernels (i.e. strides). If `None` the
227 |                   strides become equal to the `kernel_size` tuple. Defaults to `None`.
228 |         - force_inplace: Bool, determines if in-place operations are to be used regardless of the CUDA
229 |                          custom op. Mostly useful for time monitoring. Defaults to `False`.
230 |     [Returns]
231 |         - PyTorch Tensor, subsampled based on the specified `kernel_size` and `stride`
232 | '''
233 | def soft_pool3d(x, kernel_size=2, stride=None, force_inplace=False):
234 |     if x.is_cuda and not force_inplace:
235 |         x = CUDA_SOFTPOOL3d.apply(x, kernel_size, stride)
236 |         # Replace NaNs if found
237 |         if torch.isnan(x).any():
238 |             return torch.nan_to_num(x)
239 |         return x
240 |     kernel_size = _triple(kernel_size)
241 |     if stride is None:
242 |         stride = kernel_size
243 |     else:
244 |         stride = _triple(stride)
245 |     # Get input sizes
246 |     _, c, d, h, w = x.size()
247 |     # Create exponential mask (should be similar to max-like pooling)
248 |     e_x = torch.sum(torch.exp(x),dim=1,keepdim=True)
249 |     e_x = torch.clamp(e_x , float(0), float('inf'))
250 |     # Apply mask to input and pool and calculate the exponential sum
251 |     # Tensor: [b x c x d x h x w] -> [b x c x d' x h' x w']
252 |     x = F.avg_pool3d(x.mul(e_x), kernel_size, stride=stride).mul_(sum(kernel_size)).div_(F.avg_pool3d(e_x, kernel_size, stride=stride).mul_(sum(kernel_size)))
253 |     return torch.clamp(x , float(0), float('inf'))
254 | '''
255 | ---  E N D  O F  F U N C T I O N  S O F T _ P O O L 3 D  ---
256 | '''
257 | 
258 | class SoftPool1d(torch.nn.Module):
259 |     def __init__(self, kernel_size=2, stride=None, force_inplace=False):
260 |         super(SoftPool1d, self).__init__()
261 |         self.kernel_size = kernel_size
262 |         self.stride = stride
263 |         self.force_inplace = force_inplace
264 | 
265 |     def forward(self, x):
266 |         return soft_pool1d(x, kernel_size=self.kernel_size, stride=self.stride, force_inplace=self.force_inplace)
267 | 
268 | 
269 | 
270 | class SoftPool2d(torch.nn.Module):
271 |     def __init__(self, kernel_size=2, stride=None, force_inplace=False):
272 |         super(SoftPool2d, self).__init__()
273 |         self.kernel_size = kernel_size
274 |         self.stride = stride
275 |         self.force_inplace = force_inplace
276 | 
277 |     def forward(self, x):
278 |         return soft_pool2d(x, kernel_size=self.kernel_size, stride=self.stride, force_inplace=self.force_inplace)
279 | 
280 | 
281 | 
282 | class SoftPool3d(torch.nn.Module):
283 |     def __init__(self, kernel_size=2, stride=None, force_inplace=False):
284 |         super(SoftPool3d, self).__init__()
285 |         self.kernel_size = kernel_size
286 |         self.stride = stride
287 |         self.force_inplace = force_inplace
288 | 
289 |     def forward(self, x):
290 |         return soft_pool3d(x, kernel_size=self.kernel_size, stride=self.stride, force_inplace=self.force_inplace)
291 | 
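
The non-CUDA branch of `soft_pool1d` above can be hard to read as a one-liner, so here is a rough pure-Python sketch of what it appears to compute for a single sample. It is an illustration only, not part of the package: `soft_pool1d_reference` is a hypothetical name, the input is a nested list shaped `[channels][length]` rather than a tensor, and padding is assumed to be absent. The `.mul_(sum(kernel_size))` factor is left out because it is applied to both numerator and denominator and cancels.

```python
import math


def soft_pool1d_reference(x, kernel_size=2, stride=None):
    """Pure-Python mirror of the non-CUDA branch of soft_pool1d (hypothetical helper).

    x is a nested list shaped [channels][length] (one sample, no batch axis).
    """
    if stride is None:
        stride = kernel_size
    channels, length = len(x), len(x[0])
    # Exponential mask: exp summed over the channel axis, one weight per position.
    e_x = [sum(math.exp(x[c][i]) for c in range(channels)) for i in range(length)]
    # Output length of an unpadded pooling window.
    out_len = (length - kernel_size) // stride + 1
    out = []
    for c in range(channels):
        row = []
        for o in range(out_len):
            window = range(o * stride, o * stride + kernel_size)
            num = sum(x[c][i] * e_x[i] for i in window)  # weighted sum of inputs
            den = sum(e_x[i] for i in window)            # sum of the weights
            # Final clamp to [0, inf), as in the original return statement.
            row.append(max(num / den, 0.0))
        out.append(row)
    return out


sample = [[0.0, 1.0, 2.0, 3.0],
          [0.5, 0.5, 0.5, 0.5]]
pooled = soft_pool1d_reference(sample)
print(pooled)
```

Each output lies between the window average and the window maximum, since larger mask values pull the result toward the positions where the channel-summed exponential is largest. Note that the final clamp maps any negative pooled value to zero.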


--------------------------------------------------------------------------------
/pytorch/setup.py:
--------------------------------------------------------------------------------
 1 | from setuptools import setup, find_packages
 2 | from torch.utils.cpp_extension import BuildExtension, CUDAExtension
 3 | 
 4 | setup(
 5 |     name='SoftPool',
 6 |     version='1.1',
 7 |     description='CUDA-accelerated package for performing 1D/2D/3D SoftPool',
 8 |     author='Alexandros Stergiou',
 9 |     author_email='alexstergiou5@gmail.com',
10 |     license='MIT',
11 |     packages=find_packages(),
12 |     ext_modules=[
13 |         CUDAExtension('softpool_cuda', [
14 |             'CUDA/softpool_cuda.cpp',
15 |             'CUDA/softpool_cuda_kernel.cu',
16 |         ]),
17 |     ],
18 |     cmdclass={
19 |         'build_ext': BuildExtension.with_options(use_ninja=False)
20 |     })
21 | 
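
Since `setup.py` builds a separate `softpool_cuda` extension module, calling code may want to check whether it is importable before relying on the CUDA path. A minimal sketch using only the standard library follows; `extension_available` is a hypothetical helper and not something the package provides.

```python
import importlib.util


def extension_available(module_name="softpool_cuda"):
    """Return True if a top-level module of this name can be located (hypothetical helper)."""
    return importlib.util.find_spec(module_name) is not None


if extension_available():
    print("softpool_cuda found: the CUDA path can be used")
else:
    print("softpool_cuda not found: build the extension from pytorch/setup.py first")
```

This only checks that the module can be found; it does not verify that the extension was compiled against the installed PyTorch and CUDA versions.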


--------------------------------------------------------------------------------
/pytorch/test-files/._images:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/alexandrosstergiou/SoftPool/e11dee7e96ecad895cf871c8cbf220f7908462ed/pytorch/test-files/._images


--------------------------------------------------------------------------------
/pytorch/test-files/._out_1:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/alexandrosstergiou/SoftPool/e11dee7e96ecad895cf871c8cbf220f7908462ed/pytorch/test-files/._out_1


--------------------------------------------------------------------------------
/pytorch/test-files/._test.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/alexandrosstergiou/SoftPool/e11dee7e96ecad895cf871c8cbf220f7908462ed/pytorch/test-files/._test.py


--------------------------------------------------------------------------------
/pytorch/test-files/test.py:
--------------------------------------------------------------------------------
  1 | import torch
  2 | import softpool_cuda
  3 | from SoftPool import soft_pool1d, soft_pool2d, soft_pool3d, SoftPool1d, SoftPool2d, SoftPool3d
  4 | 
  5 | import timeit
  6 | 
  7 | 
  8 | def check_close_enough(a, check):
  9 |     a = a.cpu()
 10 |     check = check.cpu()
 11 |     residual = (a-check).data.abs().mean().cpu().item()
 12 |     assert torch.isnan(check).sum() == 0, 'found NaN(s) in `check`'
 13 |     assert residual < .2, 'residual is not small: {}'.format(residual)
 14 | 
 15 | x_1d = torch.rand((20, 32, 128)).float()
 16 | x_2d = torch.rand((20, 32, 128, 128)).float()
 17 | x_3d = torch.rand((20, 32, 16, 128, 128)).float()
 18 | 
 19 | 
 20 | print('\033[95m' + '--- Initial checks for forward ---' + '\033[0m')
 21 | 
 22 | 
 23 | ################## 1D FORWARD ##################
 24 | print('\033[93m' + '> Checking 1D CPU ...' + '\033[0m')
 25 | try:
 26 |     pl_1d_cpu = soft_pool1d(x_1d)
 27 |     print('\033[92m' + '> PASSED' + '\033[0m')
 28 | except Exception as e:
 29 |     print('\033[91m' + '> FAILED' + '\033[0m')
 30 |     print(e)
 31 | 
 32 | print('\033[93m' + '> Checking 1D GPU ...' + '\033[0m')
 33 | try:
 34 |     pl_1d_gpu = soft_pool1d(x_1d.cuda())
 35 |     print('\033[92m' + '> PASSED' + '\033[0m')
 36 | except Exception as e:
 37 |     print('\033[91m' + '> FAILED' + '\033[0m')
 38 |     print(e)
 39 | 
 40 | print('\033[93m' + '> Checking 1D CPU-GPU output similarities ...' + '\033[0m')
 41 | try:
 42 |     check_close_enough(pl_1d_cpu.data, pl_1d_gpu.data)
 43 |     print('\033[92m' + '> PASSED' + '\033[0m'+'\n')
 44 | except Exception as e:
 45 |     print('\033[91m' + '> FAILED' + '\033[0m')
 46 |     print(e,'\n')
 47 | 
 48 | ################## 2D FORWARD ##################
 49 | print('\033[93m' + '> Checking 2D CPU ...' + '\033[0m')
 50 | try:
 51 |     pl_2d_cpu = soft_pool2d(x_2d)
 52 |     print('\033[92m' + '> PASSED' + '\033[0m')
 53 | except Exception as e:
 54 |     print('\033[91m' + '> FAILED' + '\033[0m')
 55 |     print(e)
 56 | 
 57 | print('\033[93m' + '> Checking 2D GPU ...' + '\033[0m')
 58 | try:
 59 |     pl_2d_gpu = soft_pool2d(x_2d.cuda())
 60 |     print('\033[92m' + '> PASSED' + '\033[0m')
 61 | except Exception as e:
 62 |     print('\033[91m' + '> FAILED' + '\033[0m')
 63 |     print(e)
 64 | 
 65 | print('\033[93m' + '> Checking 2D CPU-GPU output similarities ...' + '\033[0m')
 66 | try:
 67 |     check_close_enough(pl_2d_cpu.data, pl_2d_gpu.data)
 68 |     print('\033[92m' + '> PASSED' + '\033[0m'+'\n')
 69 | except Exception as e:
 70 |     print('\033[91m' + '> FAILED' + '\033[0m')
 71 |     print(e,'\n')
 72 | 
 73 | ################## 3D FORWARD ##################
 74 | print('\033[93m' + '> Checking 3D CPU ...' + '\033[0m')
 75 | try:
 76 |     pl_3d_cpu = soft_pool3d(x_3d)
 77 |     print('\033[92m' + '> PASSED' + '\033[0m')
 78 | except Exception as e:
 79 |     print('\033[91m' + '> FAILED' + '\033[0m')
 80 |     print(e)
 81 | 
 82 | print('\033[93m' + '> Checking 3D GPU ...' + '\033[0m')
 83 | try:
 84 |     pl_3d_gpu = soft_pool3d(x_3d.cuda())
 85 |     print('\033[92m' + '> PASSED' + '\033[0m')
 86 | except Exception as e:
 87 |     print('\033[91m' + '> FAILED' + '\033[0m')
 88 |     print(e)
 89 | 
 90 | print('\033[93m' + '> Checking 3D CPU-GPU output similarities ...' + '\033[0m')
 91 | try:
 92 |     check_close_enough(pl_3d_cpu.data, pl_3d_gpu.data)
 93 |     print('\033[92m' + '> PASSED' + '\033[0m'+'\n')
 94 | except Exception as e:
 95 |     print('\033[91m' + '> FAILED' + '\033[0m')
 96 |     print(e,'\n')
 97 | 
 98 | 
 99 | print('\033[95m' + '--- Initial checks for backward ---' + '\033[0m')
100 | 
101 | a_1d = torch.rand((20, 32, 128)).float()
102 | b_1d = a_1d.clone().cuda()
103 | a_2d = torch.rand((20, 32, 128, 128)).float()
104 | b_2d = a_2d.clone().cuda()
105 | a_3d = torch.rand((20, 32, 16, 128, 128)).float()
106 | b_3d = a_3d.clone().cuda()
107 | 
108 | a_1d.requires_grad = True
109 | a_2d.requires_grad = True
110 | a_3d.requires_grad = True
111 | b_1d.requires_grad = True
112 | b_2d.requires_grad = True
113 | b_3d.requires_grad = True
114 | 
115 | 
116 | print('\033[93m' + '> Checking 1D CPU ...' + '\033[0m')
117 | try:
118 |     soft_pool1d(a_1d).pow(2).mean().backward()
119 |     print('\033[92m' + '> PASSED' + '\033[0m')
120 | except Exception as e:
121 |     print('\033[91m' + '> FAILED' + '\033[0m')
122 |     print(e)
123 | 
124 | print('\033[93m' + '> Checking 1D GPU ...' + '\033[0m')
125 | try:
126 |     soft_pool1d(b_1d).pow(2).mean().backward()
127 |     print('\033[92m' + '> PASSED' + '\033[0m')
128 | except Exception as e:
129 |     print('\033[91m' + '> FAILED' + '\033[0m')
130 |     print(e)
131 | 
132 | print('\033[93m' + '> Checking 1D grad similarities ...' + '\033[0m')
133 | try:
134 |     check_close_enough(a_1d.grad.data, b_1d.grad.data)
135 |     print('\033[92m' + '> PASSED' + '\033[0m'+'\n')
136 | except Exception as e:
137 |     print('\033[91m' + '> FAILED' + '\033[0m')
138 |     print(e,'\n')
139 | 
140 | print('\033[93m' + '> Checking 2D CPU ...' + '\033[0m')
141 | try:
142 |     soft_pool2d(a_2d).pow(2).mean().backward()
143 |     print('\033[92m' + '> PASSED' + '\033[0m')
144 | except Exception as e:
145 |     print('\033[91m' + '> FAILED' + '\033[0m')
146 |     print(e)
147 | 
148 | print('\033[93m' + '> Checking 2D GPU ...' + '\033[0m')
149 | try:
150 |     soft_pool2d(b_2d).pow(2).mean().backward()
151 |     print('\033[92m' + '> PASSED' + '\033[0m')
152 | except Exception as e:
153 |     print('\033[91m' + '> FAILED' + '\033[0m')
154 |     print(e)
155 | 
156 | print('\033[93m' + '> Checking 2D grad similarities ...' + '\033[0m')
157 | try:
158 |     check_close_enough(a_2d.grad.data, b_2d.grad.data)
159 |     print('\033[92m' + '> PASSED' + '\033[0m'+'\n')
160 | except Exception as e:
161 |     print('\033[91m' + '> FAILED' + '\033[0m')
162 |     print(e,'\n')
163 | 
164 | print('\033[93m' + '> Checking 3D CPU ...' + '\033[0m')
165 | try:
166 |     soft_pool3d(a_3d).pow(2).mean().backward()
167 |     print('\033[92m' + '> PASSED' + '\033[0m')
168 | except Exception as e:
169 |     print('\033[91m' + '> FAILED' + '\033[0m')
170 |     print(e)
171 | 
172 | print('\033[93m' + '> Checking 3D GPU ...' + '\033[0m')
173 | try:
174 |     soft_pool3d(b_3d).pow(2).mean().backward()
175 |     print('\033[92m' + '> PASSED' + '\033[0m')
176 | except Exception as e:
177 |     print('\033[91m' + '> FAILED' + '\033[0m')
178 |     print(e)
179 | 
180 | print('\033[93m' + '> Checking 3D grad similarities ...' + '\033[0m')
181 | try:
182 |     check_close_enough(a_3d.grad.data, b_3d.grad.data)
183 |     print('\033[92m' + '> PASSED' + '\033[0m'+'\n')
184 | except Exception as e:
185 |     print('\033[91m' + '> FAILED' + '\033[0m')
186 |     print(e,'\n')
187 | 
188 | 
189 | print('\n'+'\033[92m' + 'TESTS COMPLETED' + '\033[0m'+'\n')
190 | 
191 | print('\033[95m' + '--- Profiling checks ---' + '\033[0m')
192 | 
193 | a_1d = torch.rand((10, 32, 80)).float()
194 | b_1d = a_1d.clone().cuda()
195 | c_1d = a_1d.clone().cuda()
196 | a_2d = torch.rand((10, 32, 80, 80)).float()
197 | b_2d = a_2d.clone().cuda()
198 | c_2d = a_2d.clone().cuda()
199 | a_3d = torch.rand((10, 32, 8, 80, 80)).float()
200 | b_3d = a_3d.clone().cuda()
201 | c_3d = a_3d.clone().cuda()
202 | 
203 | 
204 | a_1d.requires_grad = True
205 | a_2d.requires_grad = True
206 | a_3d.requires_grad = True
207 | b_1d.requires_grad = True
208 | b_2d.requires_grad = True
209 | b_3d.requires_grad = True
210 | c_1d.requires_grad = True
211 | c_2d.requires_grad = True
212 | c_3d.requires_grad = True
213 | 
214 | 
215 | with torch.autograd.profiler.profile(use_cuda=False) as prof:
216 |     for i in range(100):
217 |         soft_pool1d(a_1d)
218 | print('\033[93m' +'SoftPool1d (CPU) [forward]'+ '\033[0m')
219 | print(prof.key_averages().table(sort_by="self_cpu_time_total"))
220 | time_f_1d_cpu = ''.join(str(prof).split('\n')[-2:])
221 | _tt = soft_pool1d(a_1d)
222 | with torch.autograd.profiler.profile(use_cuda=False) as prof:
223 |     for i in range(100):
224 |         soft_pool1d(a_1d).backward(_tt)
225 | print('\033[93m' +'SoftPool1d (CPU) [forward + backward]'+ '\033[0m')
226 | print(prof.key_averages().table(sort_by="self_cpu_time_total"))
227 | time_b_1d_cpu = ''.join(str(prof).split('\n')[-2:])
228 | 
229 | with torch.autograd.profiler.profile(use_cuda=True) as prof:
230 |     for i in range(100):
231 |         soft_pool1d(b_1d,force_inplace=True)
232 | print('\033[93m' +'SoftPool1d (CUDA-inplace) [forward]'+ '\033[0m')
233 | print(prof.key_averages())
234 | time_f_1d_cuda_forced = ''.join(str(prof).split('\n')[-3:])
235 | _tt = soft_pool1d(b_1d,force_inplace=True)
236 | with torch.autograd.profiler.profile(use_cuda=True) as prof:
237 |     for i in range(100):
238 |         soft_pool1d(b_1d,force_inplace=True).backward(_tt)
239 | print('\033[93m' +'SoftPool1d (CUDA-inplace) [forward + backward]'+ '\033[0m')
240 | print(prof.key_averages())
241 | time_b_1d_cuda_forced = ''.join(str(prof).split('\n')[-3:])
242 | 
243 | with torch.autograd.profiler.profile(use_cuda=True) as prof:
244 |     for i in range(100):
245 |         soft_pool1d(c_1d)
246 | print('\033[93m' +'SoftPool1d (CUDA) [forward]'+ '\033[0m')
247 | print(prof.key_averages())
248 | time_f_1d_cuda = ''.join(str(prof).split('\n')[-3:])
249 | _tt = soft_pool1d(c_1d)
250 | with torch.autograd.profiler.profile(use_cuda=True) as prof:
251 |     for i in range(100):
252 |         soft_pool1d(c_1d).backward(_tt)
253 | print('\033[93m' +'SoftPool1d (CUDA) [forward + backward]'+ '\033[0m')
254 | print(prof.key_averages())
255 | time_b_1d_cuda = ''.join(str(prof).split('\n')[-3:])
256 | 
257 | 
258 | 
259 | with torch.autograd.profiler.profile(use_cuda=False) as prof:
260 |     for i in range(100):
261 |         soft_pool2d(a_2d)
262 | print('\033[93m' +'SoftPool2d (CPU) [forward]'+ '\033[0m')
263 | print(prof.key_averages().table(sort_by="self_cpu_time_total"))
264 | time_f_2d_cpu = ''.join(str(prof).split('\n')[-2:])
265 | _tt = soft_pool2d(a_2d)
266 | with torch.autograd.profiler.profile(use_cuda=False) as prof:
267 |     for i in range(100):
268 |         soft_pool2d(a_2d).backward(_tt)
269 | print('\033[93m' +'SoftPool2d (CPU) [forward + backward]'+ '\033[0m')
270 | print(prof.key_averages().table(sort_by="self_cpu_time_total"))
271 | time_b_2d_cpu = ''.join(str(prof).split('\n')[-2:])
272 | 
273 | with torch.autograd.profiler.profile(use_cuda=True) as prof:
274 |     for i in range(100):
275 |         soft_pool2d(b_2d,force_inplace=True)
276 | print('\033[93m' +'SoftPool2d (CUDA-inplace) [forward]'+ '\033[0m')
277 | print(prof.key_averages())
278 | time_f_2d_cuda_forced = ''.join(str(prof).split('\n')[-3:])
279 | _tt = soft_pool2d(b_2d,force_inplace=True)
280 | with torch.autograd.profiler.profile(use_cuda=True) as prof:
281 |     for i in range(100):
282 |         soft_pool2d(b_2d,force_inplace=True).backward(_tt)
283 | print('\033[93m' +'SoftPool2d (CUDA-inplace) [forward + backward]'+ '\033[0m')
284 | print(prof.key_averages())
285 | time_b_2d_cuda_forced = ''.join(str(prof).split('\n')[-3:])
286 | 
287 | with torch.autograd.profiler.profile(use_cuda=True) as prof:
288 |     for i in range(100):
289 |         soft_pool2d(c_2d)
290 | print('\033[93m' +'SoftPool2d (CUDA) [forward]'+ '\033[0m')
291 | time_f_2d_cuda = ''.join(str(prof).split('\n')[-3:])
292 | print(prof.key_averages())
293 | _tt = soft_pool2d(c_2d)
294 | with torch.autograd.profiler.profile(use_cuda=True) as prof:
295 |     for i in range(100):
296 |         soft_pool2d(c_2d).backward(_tt)
297 | print('\033[93m' +'SoftPool2d (CUDA) [forward + backward]'+ '\033[0m')
298 | print(prof.key_averages())
299 | time_b_2d_cuda = ''.join(str(prof).split('\n')[-3:])
300 | 
301 | 
302 | 
303 | with torch.autograd.profiler.profile(use_cuda=False) as prof:
304 |     for i in range(100):
305 |         soft_pool3d(a_3d)
306 | print('\033[93m' +'SoftPool3d (CPU) [forward]'+ '\033[0m')
307 | print(prof.key_averages().table(sort_by="self_cpu_time_total"))
308 | time_f_3d_cpu = ''.join(str(prof).split('\n')[-2:])
309 | _tt = soft_pool3d(a_3d)
310 | with torch.autograd.profiler.profile(use_cuda=False) as prof:
311 |     for i in range(100):
312 |         soft_pool3d(a_3d).backward(_tt)
313 | print('\033[93m' +'SoftPool3d (CPU) [forward + backward]'+ '\033[0m')
314 | print(prof.key_averages().table(sort_by="self_cpu_time_total"))
315 | time_b_3d_cpu = ''.join(str(prof).split('\n')[-2:])
316 | 
317 | with torch.autograd.profiler.profile(use_cuda=True) as prof:
318 |     for i in range(100):
319 |         soft_pool3d(b_3d,force_inplace=True)
320 | print('\033[93m' +'SoftPool3d (CUDA-inplace) [forward]'+ '\033[0m')
321 | print(prof.key_averages())
322 | time_f_3d_cuda_forced = ''.join(str(prof).split('\n')[-3:])
323 | _tt = soft_pool3d(b_3d,force_inplace=True)
324 | with torch.autograd.profiler.profile(use_cuda=True) as prof:
325 |     for i in range(100):
326 |         soft_pool3d(b_3d,force_inplace=True).backward(_tt)
327 | print('\033[93m' +'SoftPool3d (CUDA-inplace) [forward + backward]'+ '\033[0m')
328 | print(prof.key_averages())
329 | time_b_3d_cuda_forced = ''.join(str(prof).split('\n')[-3:])
330 | 
331 | with torch.autograd.profiler.profile(use_cuda=True) as prof:
332 |     for i in range(100):
333 |         soft_pool3d(c_3d)
334 | print('\033[93m' +'SoftPool3d (CUDA) [forward]'+ '\033[0m')
335 | print(prof.key_averages())
336 | time_f_3d_cuda = ''.join(str(prof).split('\n')[-3:])
337 | _tt = soft_pool3d(c_3d)
338 | with torch.autograd.profiler.profile(use_cuda=True) as prof:
339 |     for i in range(100):
340 |         soft_pool3d(c_3d).backward(_tt)
341 | print('\033[93m' +'SoftPool3d (CUDA) [forward + backward]'+ '\033[0m')
342 | print(prof.key_averages())
343 | time_b_3d_cuda = ''.join(str(prof).split('\n')[-3:])
344 | 
345 | 
346 | print('\n'+'\033[93m' +'-------------------------------'+ '\033[0m')
347 | print('\033[93m' +'SoftPool1d [forward + backward]'+ '\033[0m')
348 | print('\n'+'\033[93m' +'----------- C P U ------------'+ '\033[0m')
349 | print(time_b_1d_cpu)
350 | print('\n'+'\033[93m' +'-- C U D A - I N P L A C E ---'+ '\033[0m')
351 | print(time_b_1d_cuda_forced)
352 | print('\n'+'\033[93m' +'---------- C U D A -----------'+ '\033[0m')
353 | print(time_b_1d_cuda)
354 | print('\n'+'\033[93m' +'-------------------------------'+ '\033[0m')
355 | 
356 | print('\n'+'\033[93m' +'-------------------------------'+ '\033[0m')
357 | print('\033[93m' +'SoftPool2d [forward + backward]'+ '\033[0m')
358 | print('\n'+'\033[93m' +'----------- C P U ------------'+ '\033[0m')
359 | print(time_b_2d_cpu)
360 | print('\n'+'\033[93m' +'-- C U D A - I N P L A C E ---'+ '\033[0m')
361 | print(time_b_2d_cuda_forced)
362 | print('\n'+'\033[93m' +'---------- C U D A -----------'+ '\033[0m')
363 | print(time_b_2d_cuda)
364 | print('\n'+'\033[93m' +'-------------------------------'+ '\033[0m')
365 | 
366 | print('\n'+'\033[93m' +'-------------------------------'+ '\033[0m')
367 | print('\033[93m' +'SoftPool3d [forward + backward]'+ '\033[0m')
368 | print('\n'+'\033[93m' +'----------- C P U ------------'+ '\033[0m')
369 | print(time_b_3d_cpu)
370 | print('\n'+'\033[93m' +'-- C U D A - I N P L A C E ---'+ '\033[0m')
371 | print(time_b_3d_cuda_forced)
372 | print('\n'+'\033[93m' +'---------- C U D A -----------'+ '\033[0m')
373 | print(time_b_3d_cuda)
374 | print('\n'+'\033[93m' +'-------------------------------'+ '\033[0m')
375 | 
376 | print('\n'+'\033[95m' + '--- Tests finished ---' + '\033[0m')
377 | 
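
The forward and backward checks in `test.py` repeat the same try/except/print pattern for every case. One possible way to factor it out is sketched below. `run_check` and the colour constants are hypothetical names introduced for illustration and are not part of the repository; the toy callables stand in for the real `soft_pool*` calls so the snippet runs without PyTorch.

```python
GREEN, RED, YELLOW, RESET = '\033[92m', '\033[91m', '\033[93m', '\033[0m'


def run_check(label, fn):
    """Run fn(), print a coloured PASSED/FAILED line, and return (ok, result)."""
    print(YELLOW + '> Checking ' + label + ' ...' + RESET)
    try:
        result = fn()
    except Exception as e:
        # Same reporting as the original script: flag the failure and show the error.
        print(RED + '> FAILED' + RESET)
        print(e)
        return False, None
    print(GREEN + '> PASSED' + RESET)
    return True, result


ok, value = run_check('toy addition', lambda: 1 + 1)
failed, _ = run_check('toy failure', lambda: 1 / 0)
```

With such a helper, a call like `run_check('1D CPU', lambda: soft_pool1d(x_1d))` would replace one of the repeated blocks, and the returned flag makes it possible to skip the CPU-GPU similarity check when either side failed instead of hitting an undefined variable.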


--------------------------------------------------------------------------------