├── LICENSE.txt
├── README.md
├── images
    ├── basketball.gif
    ├── basketball_soft.gif
    ├── bowling.gif
    ├── bowling_soft.gif
    ├── buildings.jpg
    ├── buildings_soft.jpg
    ├── cars.gif
    ├── cars_soft.gif
    ├── otters.jpg
    ├── otters_soft.jpg
    ├── parkour.gif
    ├── parkour_soft.gif
    ├── pass.gif
    ├── pass_soft.gif
    ├── pizza_toss.gif
    ├── pizza_toss_soft.gif
    ├── puffin.jpg
    ├── puffin_soft.jpg
    ├── tennis_ball.jpg
    ├── tennis_ball_soft.jpg
    ├── tower.jpg
    ├── tower_soft.jpg
    ├── tram.jpg
    └── tram_soft.jpg
├── main
    ├── ._train.py
    ├── models
    │   ├── __init__.py
    │   ├── config.py
    │   ├── densent.py
    │   ├── inception.py
    │   └── resnet.py
    └── train.py
└── pytorch
    ├── ._setup.py
    ├── CUDA
        ├── limits.cuh
        ├── softpool_cuda.cpp
        └── softpool_cuda_kernel.cu
    ├── Makefile
    ├── SoftPool
        ├── __init__.py
        ├── __pycache__
        │   └── idea.cpython-38.pyc
        └── idea.py
    ├── setup.py
    └── test-files
        ├── ._images
        ├── ._out_1
        ├── ._test.py
        └── test.py


/LICENSE.txt:
--------------------------------------------------------------------------------
MIT License

Copyright (c) 2020 Alexandros Stergiou

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# Refining activation downsampling with SoftPool
![supported versions](https://img.shields.io/badge/python-3.x-brightgreen/?style=flat&logo=python&color=green)
![Library](https://img.shields.io/badge/library-PyTorch-blue?logo=Pytorch)
![GitHub license](https://img.shields.io/cocoapods/l/AFNetworking)


--------------------------------------------------------------------------------
#### Update 10/2021:
We have extended this work in our paper: ***AdaPool: Exponential Adaptive Pooling for Information-Retaining Downsampling***. Info, code and resources are available at [`alexandrosstergiou/adaPool`](https://github.com/alexandrosstergiou/adaPool).

## Abstract
Convolutional Neural Networks (CNNs) use pooling to decrease the size of activation maps. This process is crucial to increase the receptive fields and to reduce computational requirements of subsequent convolutions. An important feature of the pooling operation is the minimization of information loss, with respect to the initial activation maps, without a significant impact on the computation and memory overhead. To meet these requirements, we propose SoftPool: a fast and efficient method for exponentially weighted activation downsampling. Through experiments across a range of architectures and pooling methods, we demonstrate that SoftPool can retain more information in the reduced activation maps. This refined downsampling leads to improvements in a CNN's classification accuracy.
Experiments with pooling layer substitutions on ImageNet1K show an increase in accuracy over both original architectures and other pooling methods. We also test SoftPool on video datasets for action recognition. Again, through the direct replacement of pooling layers, we observe consistent performance improvements while computational loads and memory requirements remain limited.

To appear in IEEE International Conference on Computer Vision (ICCV) 2021

[arXiv preprint]     [CVF open access]     [video presentation]


Image based pooling. Images are sub-sampled in both height and width by half.

|Original|||||||
|:-----:|:-----:|:-----:|:-----:|:-----:|:-----:|:-----:|
|Soft Pool|||||||

Video based pooling. Videos are sub-sampled in time, height and width by half.


|Original|||||||
|:-----:|:-----:|:-----:|:-----:|:-----:|:-----:|:-----:|
|Soft Pool|||||||

## Dependencies
All parts of the code assume that `torch` is of version 1.4 or higher. Earlier versions may be unstable.

> ***! Disclaimer:*** The structure of this repository is heavily influenced by Ziteng Gao's LIP repo [https://github.com/sebgao/LIP](https://github.com/sebgao/LIP)

## Installation

You can build the repo through the following commands:
```
$ git clone https://github.com/alexandrosstergiou/SoftPool.git
$ cd SoftPool/pytorch
$ make install
--- (optional) ---
$ make test
```


## Usage

You can load any of the 1D, 2D or 3D variants after the installation with:

```python
import softpool_cuda
from SoftPool import soft_pool1d, SoftPool1d
from SoftPool import soft_pool2d, SoftPool2d
from SoftPool import soft_pool3d, SoftPool3d
```

+ `soft_poolxd`: the functional interface for SoftPool.
+ `SoftPoolxd`: the class-based version, which creates an object that can be referenced later in the code.
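
The exponential weighting behind both interfaces can be illustrated without building the CUDA extension. The sketch below is a plain-Python reference for the 1D case, assuming each output is the softmax-weighted sum of the activations in its window, as described in the paper. The name `soft_pool1d_reference` is hypothetical and is not part of the `SoftPool` package, whose kernels may differ in details such as padding and dtype handling.

```python
import math


def soft_pool1d_reference(values, kernel_size=2, stride=None):
    """Softmax-weighted pooling over a 1D list of floats (illustrative only)."""
    if stride is None:
        stride = kernel_size  # non-overlapping windows by default
    pooled = []
    for start in range(0, len(values) - kernel_size + 1, stride):
        window = values[start:start + kernel_size]
        peak = max(window)
        # Subtracting the window maximum keeps exp() from overflowing and
        # leaves the normalised weights unchanged.
        weights = [math.exp(v - peak) for v in window]
        total = sum(weights)
        pooled.append(sum(w * v for w, v in zip(weights, window)) / total)
    return pooled


if __name__ == "__main__":
    # Each pooled value lies between the mean and the maximum of its window.
    print(soft_pool1d_reference([0.0, 2.0, 1.0, 1.0]))
```

With `kernel_size=2` and the default stride, the sequence length is halved, which mirrors the `SoftPool2d(kernel_size=(2,2), stride=(2,2))` layers used by the models in `main/models`.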

## ImageNet models

ImageNet weights can be downloaded from the following links:

|Network|link|
|:-----:|:--:|
| ResNet-18 | [link](https://drive.google.com/file/d/11me4z74Fp4FkGGv_WbMZRQxTr4YJxUHS/view?usp=sharing) |
| ResNet-34 | [link](https://drive.google.com/file/d/1-5O-r3hCJ7JSrrfVowrUZpaHcp7TcKKT/view?usp=sharing) |
| ResNet-50 | [link](https://drive.google.com/file/d/1HpBESqJ-QLO_O0pozgh1T3xp4n5MOQLU/view?usp=sharing) |
| ResNet-101 | [link](https://drive.google.com/file/d/1fng3DFm48W6h-qbFUk-IPZf9s8HsGbdw/view?usp=sharing) |
| ResNet-152 | [link](https://drive.google.com/file/d/1ejuMgP4DK9pFcVnu1TZo6TELPlrhHJC_/view?usp=sharing) |
| DenseNet-121 | [link](https://drive.google.com/file/d/1EXIbVI19JyEjgY75caZK2B2-gaxKTVpK/view?usp=sharing) |
| DenseNet-161 | [link](https://drive.google.com/file/d/18Qs9XUXNPSgBe46_0OGZIcpvdoFZfjU5/view?usp=sharing) |
| DenseNet-169 | [link](https://drive.google.com/file/d/1shFZV_AIZ6SQFQs-C0YThfpOfZH88hm7/view?usp=sharing) |
| ResNeXt-50_32x4d | [link](https://drive.google.com/file/d/1-3sd8paTlqa1X8KGUy6B5Eehv791tbVH/view?usp=sharing) |
| ResNeXt-101_32x4d | [link](https://drive.google.com/file/d/1URDkwAPxDgcQzkYFlV_m-1T5RjZvzabo/view?usp=sharing) |
| wide-ResNet50 | [link](https://drive.google.com/file/d/1X3A6P0enEJYLeNmY0pUTXA26FEQB1qMe/view?usp=sharing) |

## Citation

```
@inproceedings{stergiou2021refining,
  title={Refining activation downsampling with SoftPool},
  author={Stergiou, Alexandros and Poppe, Ronald and Kalliatakis, Grigorios},
  booktitle={International Conference on Computer Vision (ICCV)},
  year={2021},
  pages={10357-10366},
  organization={IEEE}
}
```

## Licence

MIT

## Additional resources
A great project is Ren Tianhe's [`pytorch-pooling` repo](https://github.com/rentainhe/pytorch-pooling) for overviewing different pooling
strategies. 106 | -------------------------------------------------------------------------------- /images/basketball.gif: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/alexandrosstergiou/SoftPool/e11dee7e96ecad895cf871c8cbf220f7908462ed/images/basketball.gif -------------------------------------------------------------------------------- /images/basketball_soft.gif: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/alexandrosstergiou/SoftPool/e11dee7e96ecad895cf871c8cbf220f7908462ed/images/basketball_soft.gif -------------------------------------------------------------------------------- /images/bowling.gif: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/alexandrosstergiou/SoftPool/e11dee7e96ecad895cf871c8cbf220f7908462ed/images/bowling.gif -------------------------------------------------------------------------------- /images/bowling_soft.gif: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/alexandrosstergiou/SoftPool/e11dee7e96ecad895cf871c8cbf220f7908462ed/images/bowling_soft.gif -------------------------------------------------------------------------------- /images/buildings.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/alexandrosstergiou/SoftPool/e11dee7e96ecad895cf871c8cbf220f7908462ed/images/buildings.jpg -------------------------------------------------------------------------------- /images/buildings_soft.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/alexandrosstergiou/SoftPool/e11dee7e96ecad895cf871c8cbf220f7908462ed/images/buildings_soft.jpg -------------------------------------------------------------------------------- /images/cars.gif: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/alexandrosstergiou/SoftPool/e11dee7e96ecad895cf871c8cbf220f7908462ed/images/cars.gif -------------------------------------------------------------------------------- /images/cars_soft.gif: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/alexandrosstergiou/SoftPool/e11dee7e96ecad895cf871c8cbf220f7908462ed/images/cars_soft.gif -------------------------------------------------------------------------------- /images/otters.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/alexandrosstergiou/SoftPool/e11dee7e96ecad895cf871c8cbf220f7908462ed/images/otters.jpg -------------------------------------------------------------------------------- /images/otters_soft.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/alexandrosstergiou/SoftPool/e11dee7e96ecad895cf871c8cbf220f7908462ed/images/otters_soft.jpg -------------------------------------------------------------------------------- /images/parkour.gif: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/alexandrosstergiou/SoftPool/e11dee7e96ecad895cf871c8cbf220f7908462ed/images/parkour.gif -------------------------------------------------------------------------------- /images/parkour_soft.gif: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/alexandrosstergiou/SoftPool/e11dee7e96ecad895cf871c8cbf220f7908462ed/images/parkour_soft.gif -------------------------------------------------------------------------------- /images/pass.gif: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/alexandrosstergiou/SoftPool/e11dee7e96ecad895cf871c8cbf220f7908462ed/images/pass.gif -------------------------------------------------------------------------------- /images/pass_soft.gif: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/alexandrosstergiou/SoftPool/e11dee7e96ecad895cf871c8cbf220f7908462ed/images/pass_soft.gif -------------------------------------------------------------------------------- /images/pizza_toss.gif: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/alexandrosstergiou/SoftPool/e11dee7e96ecad895cf871c8cbf220f7908462ed/images/pizza_toss.gif -------------------------------------------------------------------------------- /images/pizza_toss_soft.gif: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/alexandrosstergiou/SoftPool/e11dee7e96ecad895cf871c8cbf220f7908462ed/images/pizza_toss_soft.gif -------------------------------------------------------------------------------- /images/puffin.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/alexandrosstergiou/SoftPool/e11dee7e96ecad895cf871c8cbf220f7908462ed/images/puffin.jpg -------------------------------------------------------------------------------- /images/puffin_soft.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/alexandrosstergiou/SoftPool/e11dee7e96ecad895cf871c8cbf220f7908462ed/images/puffin_soft.jpg -------------------------------------------------------------------------------- /images/tennis_ball.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/alexandrosstergiou/SoftPool/e11dee7e96ecad895cf871c8cbf220f7908462ed/images/tennis_ball.jpg 
-------------------------------------------------------------------------------- /images/tennis_ball_soft.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/alexandrosstergiou/SoftPool/e11dee7e96ecad895cf871c8cbf220f7908462ed/images/tennis_ball_soft.jpg -------------------------------------------------------------------------------- /images/tower.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/alexandrosstergiou/SoftPool/e11dee7e96ecad895cf871c8cbf220f7908462ed/images/tower.jpg -------------------------------------------------------------------------------- /images/tower_soft.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/alexandrosstergiou/SoftPool/e11dee7e96ecad895cf871c8cbf220f7908462ed/images/tower_soft.jpg -------------------------------------------------------------------------------- /images/tram.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/alexandrosstergiou/SoftPool/e11dee7e96ecad895cf871c8cbf220f7908462ed/images/tram.jpg -------------------------------------------------------------------------------- /images/tram_soft.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/alexandrosstergiou/SoftPool/e11dee7e96ecad895cf871c8cbf220f7908462ed/images/tram_soft.jpg -------------------------------------------------------------------------------- /main/._train.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/alexandrosstergiou/SoftPool/e11dee7e96ecad895cf871c8cbf220f7908462ed/main/._train.py -------------------------------------------------------------------------------- /main/models/__init__.py: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/alexandrosstergiou/SoftPool/e11dee7e96ecad895cf871c8cbf220f7908462ed/main/models/__init__.py -------------------------------------------------------------------------------- /main/models/config.py: -------------------------------------------------------------------------------- 1 | from .resnet import resnet18, resnet34, resnet50, resnet101, resnet152, resnext50_32x4d, resnext101_32x8d, resnext101_32x4d, resnext101_64x4d, wide_resnet50_2, wide_resnet101_2 2 | from .densent import densenet121, densenet161, densenet169, densenet201 3 | from .inception import inception_v3 4 | 5 | models = ['resnet18', 'resnet34', 'resnet50', 'resnet101', 'resnet152', 'resnext50_32x4d', 'resnext101_32x8d', 'resnext101_32x4d', 'resnext101_64x4d', 'wide_resnet50_2', 'wide_resnet101_2', 'densenet121', 'densenet161', 'densenet169', 'densenet201', 'inception_v3'] 6 | 7 | def get_model(name,use_softpool, **kwargs): 8 | net = None 9 | if 'res' in name.lower(): 10 | if '18' in name.lower(): 11 | net = resnet18(use_softpool=use_softpool, **kwargs) 12 | elif '34' in name.lower(): 13 | net = resnet34(use_softpool=use_softpool, **kwargs) 14 | elif '50' in name.lower(): 15 | if 'xt' in name.lower(): 16 | net = resnext50_32x4d(use_softpool=use_softpool, **kwargs) 17 | elif 'wide' in name.lower(): 18 | net = wide_resnet50_2(use_softpool=use_softpool, **kwargs) 19 | else: 20 | net = resnet50(use_softpool=use_softpool, **kwargs) 21 | elif '101' in name.lower(): 22 | if 'xt' in name.lower(): 23 | if '32x4d' in name.lower(): 24 | net = resnext101_32x4d(use_softpool=use_softpool, **kwargs) 25 | elif '64x4d' in name.lower(): 26 | net = resnext101_64x4d(use_softpool=use_softpool, **kwargs) 27 | elif '32x8d' in name.lower(): 28 | net = resnext101_32x8d(use_softpool=use_softpool, **kwargs) 29 | elif 'wide' in name.lower(): 30 | net = wide_resnet101_2(use_softpool=use_softpool, **kwargs) 31 | 
else: 32 | net = resnet101(use_softpool=use_softpool, **kwargs) 33 | elif '152' in name.lower(): 34 | net = resnet152(use_softpool=use_softpool, **kwargs) 35 | 36 | elif 'densenet' in name.lower(): 37 | if '121' in name.lower(): 38 | net = densenet121(use_softpool=use_softpool, **kwargs) 39 | elif '161' in name.lower(): 40 | net = densenet161(use_softpool=use_softpool, **kwargs) 41 | elif '169' in name.lower(): 42 | net = densenet169(use_softpool=use_softpool, **kwargs) 43 | elif '201' in name.lower(): 44 | net = densenet201(use_softpool=use_softpool, **kwargs) 45 | 46 | elif 'inception' in name.lower(): 47 | net = inception_v3(use_softpool=use_softpool, **kwargs) 48 | 49 | if net is None: 50 | print('Selected architecture not implemented !') 51 | raise NotImplementedError 52 | 53 | return net 54 | -------------------------------------------------------------------------------- /main/models/densent.py: -------------------------------------------------------------------------------- 1 | import re 2 | import torch 3 | import torch.nn as nn 4 | import torch.nn.functional as F 5 | import torch.utils.checkpoint as cp 6 | from collections import OrderedDict 7 | from torch import Tensor 8 | from torch.jit.annotations import List 9 | 10 | import softpool_cuda 11 | from SoftPool import soft_pool2d, SoftPool2d 12 | 13 | 14 | __all__ = ['DenseNet', 'densenet121', 'densenet169', 'densenet201', 'densenet161'] 15 | 16 | 17 | class _DenseLayer(nn.Module): 18 | def __init__(self, num_input_features, growth_rate, bn_size, drop_rate, memory_efficient=False): 19 | super(_DenseLayer, self).__init__() 20 | self.add_module('norm1', nn.BatchNorm2d(num_input_features)), 21 | self.add_module('relu1', nn.ReLU(inplace=True)), 22 | self.add_module('conv1', nn.Conv2d(num_input_features, bn_size * 23 | growth_rate, kernel_size=1, stride=1, 24 | bias=False)), 25 | self.add_module('norm2', nn.BatchNorm2d(bn_size * growth_rate)), 26 | self.add_module('relu2', nn.ReLU(inplace=True)), 27 | 
self.add_module('conv2', nn.Conv2d(bn_size * growth_rate, growth_rate, 28 | kernel_size=3, stride=1, padding=1, 29 | bias=False)), 30 | self.drop_rate = float(drop_rate) 31 | self.memory_efficient = memory_efficient 32 | 33 | def bn_function(self, inputs): 34 | # type: (List[Tensor]) -> Tensor 35 | concated_features = torch.cat(inputs, 1) 36 | bottleneck_output = self.conv1(self.relu1(self.norm1(concated_features))) # noqa: T484 37 | return bottleneck_output 38 | 39 | # todo: rewrite when torchscript supports any 40 | def any_requires_grad(self, input): 41 | # type: (List[Tensor]) -> bool 42 | for tensor in input: 43 | if tensor.requires_grad: 44 | return True 45 | return False 46 | 47 | @torch.jit.unused # noqa: T484 48 | def call_checkpoint_bottleneck(self, input): 49 | # type: (List[Tensor]) -> Tensor 50 | def closure(*inputs): 51 | return self.bn_function(inputs) 52 | 53 | return cp.checkpoint(closure, *input) 54 | 55 | @torch.jit._overload_method # noqa: F811 56 | def forward(self, input): 57 | # type: (List[Tensor]) -> (Tensor) 58 | pass 59 | 60 | @torch.jit._overload_method # noqa: F811 61 | def forward(self, input): 62 | # type: (Tensor) -> (Tensor) 63 | pass 64 | 65 | # torchscript does not yet support *args, so we overload method 66 | # allowing it to take either a List[Tensor] or single Tensor 67 | def forward(self, input): # noqa: F811 68 | if isinstance(input, Tensor): 69 | prev_features = [input] 70 | else: 71 | prev_features = input 72 | 73 | if self.memory_efficient and self.any_requires_grad(prev_features): 74 | if torch.jit.is_scripting(): 75 | raise Exception("Memory Efficient not supported in JIT") 76 | 77 | bottleneck_output = self.call_checkpoint_bottleneck(prev_features) 78 | else: 79 | bottleneck_output = self.bn_function(prev_features) 80 | 81 | new_features = self.conv2(self.relu2(self.norm2(bottleneck_output))) 82 | if self.drop_rate > 0: 83 | new_features = F.dropout(new_features, p=self.drop_rate, 84 | training=self.training) 85 | 
return new_features 86 | 87 | 88 | class _DenseBlock(nn.ModuleDict): 89 | _version = 2 90 | 91 | def __init__(self, num_layers, num_input_features, bn_size, growth_rate, drop_rate, memory_efficient=False): 92 | super(_DenseBlock, self).__init__() 93 | for i in range(num_layers): 94 | layer = _DenseLayer( 95 | num_input_features + i * growth_rate, 96 | growth_rate=growth_rate, 97 | bn_size=bn_size, 98 | drop_rate=drop_rate, 99 | memory_efficient=memory_efficient, 100 | ) 101 | self.add_module('denselayer%d' % (i + 1), layer) 102 | 103 | def forward(self, init_features): 104 | features = [init_features] 105 | for name, layer in self.items(): 106 | new_features = layer(features) 107 | features.append(new_features) 108 | return torch.cat(features, 1) 109 | 110 | 111 | class _Transition(nn.Sequential): 112 | def __init__(self, num_input_features, num_output_features, use_softpool): 113 | super(_Transition, self).__init__() 114 | self.add_module('norm', nn.BatchNorm2d(num_input_features)) 115 | self.add_module('relu', nn.ReLU(inplace=True)) 116 | self.add_module('conv', nn.Conv2d(num_input_features, num_output_features, 117 | kernel_size=1, stride=1, bias=False)) 118 | #if not use_softpool: 119 | self.add_module('pool', nn.AvgPool2d(kernel_size=2, stride=2)) 120 | #else: 121 | # self.add_module('pool', SoftPool2d(kernel_size=(2,2), stride=(2,2))) 122 | 123 | 124 | 125 | class DenseNet(nn.Module): 126 | r"""Densenet-BC model class, based on 127 | `"Densely Connected Convolutional Networks" `_ 128 | 129 | Args: 130 | growth_rate (int) - how many filters to add each layer (`k` in paper) 131 | block_config (list of 4 ints) - how many layers in each pooling block 132 | num_init_features (int) - the number of filters to learn in the first convolution layer 133 | use_softpool (bool) - changes pooling operations to softpooling 134 | bn_size (int) - multiplicative factor for number of bottle neck layers 135 | (i.e. 
bn_size * k features in the bottleneck layer) 136 | drop_rate (float) - dropout rate after each dense layer 137 | num_classes (int) - number of classification classes 138 | memory_efficient (bool) - If True, uses checkpointing. Much more memory efficient, 139 | but slower. Default: *False*. See `"paper" `_ 140 | """ 141 | 142 | def __init__(self, growth_rate=32, block_config=(6, 12, 24, 16), 143 | num_init_features=64, use_softpool=False, 144 | bn_size=4, drop_rate=0, 145 | num_classes=1000, memory_efficient=False): 146 | 147 | super(DenseNet, self).__init__() 148 | 149 | # First convolution 150 | if not use_softpool: 151 | self.features = nn.Sequential(OrderedDict([ 152 | ('conv0', nn.Conv2d(3, num_init_features, kernel_size=7, stride=2, 153 | padding=3, bias=False)), 154 | ('norm0', nn.BatchNorm2d(num_init_features)), 155 | ('relu0', nn.ReLU(inplace=True)), 156 | ('pool0', nn.MaxPool2d(kernel_size=3, stride=2, padding=1)), 157 | ])) 158 | else: 159 | self.features = nn.Sequential(OrderedDict([ 160 | ('conv0', nn.Conv2d(3, num_init_features, kernel_size=7, stride=2, 161 | padding=3, bias=False)), 162 | ('norm0', nn.BatchNorm2d(num_init_features)), 163 | ('relu0', nn.ReLU(inplace=True)), 164 | ('pool0', SoftPool2d(kernel_size=(2,2), stride=(2,2))), 165 | ])) 166 | 167 | # Each denseblock 168 | num_features = num_init_features 169 | for i, num_layers in enumerate(block_config): 170 | block = _DenseBlock( 171 | num_layers=num_layers, 172 | num_input_features=num_features, 173 | bn_size=bn_size, 174 | growth_rate=growth_rate, 175 | drop_rate=drop_rate, 176 | memory_efficient=memory_efficient 177 | ) 178 | self.features.add_module('denseblock%d' % (i + 1), block) 179 | num_features = num_features + num_layers * growth_rate 180 | if i != len(block_config) - 1: 181 | trans = _Transition(num_input_features=num_features, 182 | num_output_features=num_features // 2, 183 | use_softpool=use_softpool) 184 | self.features.add_module('transition%d' % (i + 1), trans) 185 | 
num_features = num_features // 2 186 | 187 | # Final batch norm 188 | self.features.add_module('norm5', nn.BatchNorm2d(num_features)) 189 | 190 | # Linear layer 191 | self.classifier = nn.Linear(num_features, num_classes) 192 | 193 | # Official init from torch repo. 194 | for m in self.modules(): 195 | if isinstance(m, nn.Conv2d): 196 | nn.init.kaiming_normal_(m.weight) 197 | elif isinstance(m, nn.BatchNorm2d): 198 | nn.init.constant_(m.weight, 1) 199 | nn.init.constant_(m.bias, 0) 200 | elif isinstance(m, nn.Linear): 201 | nn.init.constant_(m.bias, 0) 202 | 203 | def forward(self, x): 204 | features = self.features(x) 205 | out = F.relu(features, inplace=True) 206 | out = F.adaptive_avg_pool2d(out, (1, 1)) 207 | out = torch.flatten(out, 1) 208 | out = self.classifier(out) 209 | return out 210 | 211 | 212 | def _load_state_dict(model, model_url, progress): 213 | # '.'s are no longer allowed in module names, but previous _DenseLayer 214 | # has keys 'norm.1', 'relu.1', 'conv.1', 'norm.2', 'relu.2', 'conv.2'. 215 | # They are also in the checkpoints in model_urls. This pattern is used 216 | # to find such keys. 
    pattern = re.compile(
        r'^(.*denselayer\d+\.(?:norm|relu|conv))\.((?:[12])\.(?:weight|bias|running_mean|running_var))$')

    # `load_state_dict_from_url` is not imported at the top of this module,
    # so it is imported locally from `torch.hub`.
    from torch.hub import load_state_dict_from_url
    state_dict = load_state_dict_from_url(model_url, progress=progress)
    for key in list(state_dict.keys()):
        res = pattern.match(key)
        if res:
            # Drop the '.' between the layer name and its index (e.g. 'norm.1' -> 'norm1').
            new_key = res.group(1) + res.group(2)
            state_dict[new_key] = state_dict[key]
            del state_dict[key]
    model.load_state_dict(state_dict)


def _densenet(arch, growth_rate, block_config, num_init_features, pretrained, progress,
              use_softpool, **kwargs):
    model = DenseNet(growth_rate, block_config, num_init_features, use_softpool, **kwargs)
    if pretrained:
        # NOTE: `model_urls` is not defined in this file, so this branch raises
        # NameError until a mapping from `arch` to a checkpoint URL is provided.
        _load_state_dict(model, model_urls[arch], progress)
    return model


def densenet121(pretrained=False, progress=True, use_softpool=False, **kwargs):
    r"""Densenet-121 model from
    `"Densely Connected Convolutional Networks" `_

    Args:
        pretrained (bool): If True, returns a model pre-trained on ImageNet
        progress (bool): If True, displays a progress bar of the download to stderr
        memory_efficient (bool) - If True, uses checkpointing. Much more memory efficient,
          but slower. Default: *False*. See `"paper" `_
        use_softpool (bool): If True, changes pooling operations to softpooling
    """
    return _densenet('densenet121', 32, (6, 12, 24, 16), 64, pretrained, progress, use_softpool,
                     **kwargs)


def densenet161(pretrained=False, progress=True, use_softpool=False, **kwargs):
    r"""Densenet-161 model from
    `"Densely Connected Convolutional Networks" `_

    Args:
        pretrained (bool): If True, returns a model pre-trained on ImageNet
        progress (bool): If True, displays a progress bar of the download to stderr
        memory_efficient (bool) - If True, uses checkpointing. Much more memory efficient,
          but slower. Default: *False*.
See `"paper" `_ 262 | use_softpool (bool): If True, changes pooling operations to softpooling 263 | """ 264 | return _densenet('densenet161', 48, (6, 12, 36, 24), 96, pretrained, progress, use_softpool, 265 | **kwargs) 266 | 267 | 268 | def densenet169(pretrained=False, progress=True, use_softpool=False, **kwargs): 269 | r"""Densenet-169 model from 270 | `"Densely Connected Convolutional Networks" `_ 271 | 272 | Args: 273 | pretrained (bool): If True, returns a model pre-trained on ImageNet 274 | progress (bool): If True, displays a progress bar of the download to stderr 275 | memory_efficient (bool) - If True, uses checkpointing. Much more memory efficient, 276 | but slower. Default: *False*. See `"paper" `_ 277 | use_softpool (bool): If True, changes pooling operations to softpooling 278 | """ 279 | return _densenet('densenet169', 32, (6, 12, 32, 32), 64, pretrained, progress, use_softpool, 280 | **kwargs) 281 | 282 | 283 | def densenet201(pretrained=False, progress=True, use_softpool=False, **kwargs): 284 | r"""Densenet-201 model from 285 | `"Densely Connected Convolutional Networks" `_ 286 | 287 | Args: 288 | pretrained (bool): If True, returns a model pre-trained on ImageNet 289 | progress (bool): If True, displays a progress bar of the download to stderr 290 | memory_efficient (bool) - If True, uses checkpointing. Much more memory efficient, 291 | but slower. Default: *False*. 
See `"paper" `_ 292 | use_softpool (bool): If True, changes pooling operations to softpooling 293 | """ 294 | return _densenet('densenet201', 32, (6, 12, 48, 32), 64, pretrained, progress, use_softpool, 295 | **kwargs) 296 | -------------------------------------------------------------------------------- /main/models/inception.py: -------------------------------------------------------------------------------- 1 | from collections import namedtuple 2 | import warnings 3 | import torch 4 | import torch.nn as nn 5 | import torch.nn.functional as F 6 | from torch.jit.annotations import Optional 7 | from torch import Tensor 8 | 9 | import softpool_cuda 10 | from SoftPool import soft_pool2d, SoftPool2d 11 | 12 | __all__ = ['Inception3', 'inception_v3', 'InceptionOutputs', '_InceptionOutputs'] 13 | 14 | 15 | 16 | InceptionOutputs = namedtuple('InceptionOutputs', ['logits', 'aux_logits']) 17 | InceptionOutputs.__annotations__ = {'logits': torch.Tensor, 'aux_logits': Optional[torch.Tensor]} 18 | 19 | # Script annotations failed with _GoogleNetOutputs = namedtuple ... 20 | # _InceptionOutputs set here for backwards compat 21 | _InceptionOutputs = InceptionOutputs 22 | 23 | 24 | def inception_v3(pretrained=False, progress=True, use_softpool=True, **kwargs): 25 | r"""Inception v3 model architecture from 26 | `"Rethinking the Inception Architecture for Computer Vision" `_. 27 | 28 | .. note:: 29 | **Important**: In contrast to the other models the inception_v3 expects tensors with a size of 30 | N x 3 x 299 x 299, so ensure your images are sized accordingly. 31 | 32 | Args: 33 | pretrained (bool): If True, returns a model pre-trained on ImageNet 34 | progress (bool): If True, displays a progress bar of the download to stderr 35 | aux_logits (bool): If True, add an auxiliary branch that can improve training. 36 | Default: *True* 37 | transform_input (bool): If True, preprocesses the input according to the method with which it 38 | was trained on ImageNet. 
Default: *False*
39 |     """
40 |     if pretrained:
41 |         if 'transform_input' not in kwargs:
42 |             kwargs['transform_input'] = True
43 |         if 'aux_logits' in kwargs:
44 |             original_aux_logits = kwargs['aux_logits']
45 |             kwargs['aux_logits'] = True
46 |         else:
47 |             original_aux_logits = True
48 |         kwargs['init_weights'] = False  # we are loading weights from a pretrained model
49 |         model = Inception3(use_softpool, **kwargs)  # forward use_softpool here as well, as in the non-pretrained path
50 |         state_dict = load_state_dict_from_url(model_urls['inception_v3_google'],
51 |                                               progress=progress)
52 |         model.load_state_dict(state_dict)
53 |         if not original_aux_logits:
54 |             model.aux_logits = False
55 |             del model.AuxLogits
56 |         return model
57 | 
58 |     return Inception3(use_softpool, **kwargs)
59 | 
60 | class Inception3(nn.Module):
61 | 
62 |     def __init__(self, use_softpool=False, num_classes=1000, aux_logits=False, transform_input=False,
63 |                  inception_blocks=None, init_weights=None):
64 |         super(Inception3, self).__init__()
65 |         if inception_blocks is None:
66 |             inception_blocks = [
67 |                 BasicConv2d, InceptionA, InceptionB, InceptionC,
68 |                 InceptionD, InceptionE, InceptionAux
69 |             ]
70 |         if init_weights is None:
71 |             warnings.warn('The default weight initialization of inception_v3 will be changed in future releases of '
72 |                           'torchvision. 
If you wish to keep the old behavior (which leads to long initialization times' 73 | ' due to scipy/scipy#11299), please set init_weights=True.', FutureWarning) 74 | init_weights = True 75 | assert len(inception_blocks) == 7 76 | conv_block = inception_blocks[0] 77 | inception_a = inception_blocks[1] 78 | inception_b = inception_blocks[2] 79 | inception_c = inception_blocks[3] 80 | inception_d = inception_blocks[4] 81 | inception_e = inception_blocks[5] 82 | inception_aux = inception_blocks[6] 83 | 84 | self.aux_logits = aux_logits 85 | self.transform_input = transform_input 86 | self.Conv2d_1a_3x3 = conv_block(3, 32, kernel_size=3, stride=2) 87 | self.Conv2d_2a_3x3 = conv_block(32, 32, kernel_size=3) 88 | self.Conv2d_2b_3x3 = conv_block(32, 64, kernel_size=3, padding=1) 89 | 90 | if not use_softpool: 91 | self.pool1 = nn.MaxPool2d(kernel_size=3, stride=2) 92 | else: 93 | self.pool1 = SoftPool2d(kernel_size=3, stride=2) 94 | 95 | self.Conv2d_3b_1x1 = conv_block(64, 80, kernel_size=1) 96 | self.Conv2d_4a_3x3 = conv_block(80, 192, kernel_size=3) 97 | 98 | if not use_softpool: 99 | self.pool2 = nn.MaxPool2d(kernel_size=3, stride=2) 100 | else: 101 | self.pool2 = SoftPool2d(kernel_size=3, stride=2) 102 | 103 | self.Mixed_5b = inception_a(192, pool_features=32, use_softpool=use_softpool,pad=False) 104 | self.Mixed_5c = inception_a(256, pool_features=64, use_softpool=use_softpool,pad=False) 105 | self.Mixed_5d = inception_a(288, pool_features=64, use_softpool=use_softpool,pad=False) 106 | self.Mixed_6a = inception_b(288, use_softpool=use_softpool) 107 | self.Mixed_6b = inception_c(768, channels_7x7=128, use_softpool=use_softpool,pad=False) 108 | self.Mixed_6c = inception_c(768, channels_7x7=160, use_softpool=use_softpool,pad=False) 109 | self.Mixed_6d = inception_c(768, channels_7x7=160, use_softpool=use_softpool,pad=False) 110 | self.Mixed_6e = inception_c(768, channels_7x7=192, use_softpool=use_softpool,pad=False) 111 | if aux_logits: 112 | self.AuxLogits = 
inception_aux(768, num_classes, use_softpool=use_softpool) 113 | self.Mixed_7a = inception_d(768, use_softpool=use_softpool) 114 | self.Mixed_7b = inception_e(1280, use_softpool=use_softpool,pad=False) 115 | self.Mixed_7c = inception_e(2048, use_softpool=use_softpool,pad=False) 116 | self.avgpool = nn.AdaptiveAvgPool2d((1, 1)) 117 | self.dropout = nn.Dropout() 118 | self.fc = nn.Linear(2048, num_classes) 119 | if init_weights: 120 | for m in self.modules(): 121 | if isinstance(m, nn.Conv2d) or isinstance(m, nn.Linear): 122 | import scipy.stats as stats 123 | stddev = m.stddev if hasattr(m, 'stddev') else 0.1 124 | X = stats.truncnorm(-2, 2, scale=stddev) 125 | values = torch.as_tensor(X.rvs(m.weight.numel()), dtype=m.weight.dtype) 126 | values = values.view(m.weight.size()) 127 | with torch.no_grad(): 128 | m.weight.copy_(values) 129 | elif isinstance(m, nn.BatchNorm2d): 130 | nn.init.constant_(m.weight, 1) 131 | nn.init.constant_(m.bias, 0) 132 | 133 | def _transform_input(self, x): 134 | if self.transform_input: 135 | x_ch0 = torch.unsqueeze(x[:, 0], 1) * (0.229 / 0.5) + (0.485 - 0.5) / 0.5 136 | x_ch1 = torch.unsqueeze(x[:, 1], 1) * (0.224 / 0.5) + (0.456 - 0.5) / 0.5 137 | x_ch2 = torch.unsqueeze(x[:, 2], 1) * (0.225 / 0.5) + (0.406 - 0.5) / 0.5 138 | x = torch.cat((x_ch0, x_ch1, x_ch2), 1) 139 | return x 140 | 141 | def _forward(self, x): 142 | # N x 3 x 299 x 299 143 | x = self.Conv2d_1a_3x3(x) 144 | # N x 32 x 149 x 149 145 | x = self.Conv2d_2a_3x3(x) 146 | # N x 32 x 147 x 147 147 | x = self.Conv2d_2b_3x3(x) 148 | # N x 64 x 147 x 147 149 | x = self.pool1(x) 150 | # N x 64 x 73 x 73 151 | x = self.Conv2d_3b_1x1(x) 152 | # N x 80 x 73 x 73 153 | x = self.Conv2d_4a_3x3(x) 154 | # N x 192 x 71 x 71 155 | x = self.pool2(x) 156 | # N x 192 x 35 x 35 157 | x = self.Mixed_5b(x) 158 | # N x 256 x 35 x 35 159 | x = self.Mixed_5c(x) 160 | # N x 288 x 35 x 35 161 | x = self.Mixed_5d(x) 162 | # N x 288 x 35 x 35 163 | x = self.Mixed_6a(x) 164 | # N x 768 x 17 x 17 165 
| x = self.Mixed_6b(x) 166 | # N x 768 x 17 x 17 167 | x = self.Mixed_6c(x) 168 | # N x 768 x 17 x 17 169 | x = self.Mixed_6d(x) 170 | # N x 768 x 17 x 17 171 | x = self.Mixed_6e(x) 172 | # N x 768 x 17 x 17 173 | aux_defined = self.training and self.aux_logits 174 | if aux_defined: 175 | aux = self.AuxLogits(x) 176 | else: 177 | aux = None 178 | # N x 768 x 17 x 17 179 | x = self.Mixed_7a(x) 180 | # N x 1280 x 8 x 8 181 | x = self.Mixed_7b(x) 182 | # N x 2048 x 8 x 8 183 | x = self.Mixed_7c(x) 184 | # N x 2048 x 8 x 8 185 | # Adaptive average pooling 186 | x = self.avgpool(x) 187 | # N x 2048 x 1 x 1 188 | x = self.dropout(x) 189 | # N x 2048 x 1 x 1 190 | x = torch.flatten(x, 1) 191 | # N x 2048 192 | x = self.fc(x) 193 | # N x 1000 (num_classes) 194 | return x, aux 195 | 196 | @torch.jit.unused 197 | def eager_outputs(self, x, aux): 198 | # type: (Tensor, Optional[Tensor]) -> InceptionOutputs 199 | if self.training and self.aux_logits: 200 | return InceptionOutputs(x, aux) 201 | else: 202 | return x 203 | 204 | def forward(self, x): 205 | x = self._transform_input(x) 206 | x, aux = self._forward(x) 207 | aux_defined = self.training and self.aux_logits 208 | if torch.jit.is_scripting(): 209 | if not aux_defined: 210 | warnings.warn("Scripted Inception3 always returns Inception3 Tuple") 211 | return InceptionOutputs(x, aux) 212 | else: 213 | return self.eager_outputs(x, aux) 214 | 215 | 216 | class InceptionA(nn.Module): 217 | 218 | def __init__(self, in_channels, pool_features, conv_block=None, use_softpool=False, pad=True): 219 | super(InceptionA, self).__init__() 220 | if conv_block is None: 221 | conv_block = BasicConv2d 222 | self.pad = pad 223 | 224 | self.branch1x1 = conv_block(in_channels, 64, kernel_size=1) 225 | 226 | self.branch5x5_1 = conv_block(in_channels, 48, kernel_size=1) 227 | self.branch5x5_2 = conv_block(48, 64, kernel_size=5, padding=2) 228 | 229 | self.branch3x3dbl_1 = conv_block(in_channels, 64, kernel_size=1) 230 | self.branch3x3dbl_2 = 
conv_block(64, 96, kernel_size=3, padding=1) 231 | self.branch3x3dbl_3 = conv_block(96, 96, kernel_size=3, padding=1) 232 | 233 | self.branch_pool = conv_block(in_channels, pool_features, kernel_size=1) 234 | 235 | self.use_softpool = use_softpool 236 | 237 | def _forward(self, x): 238 | branch1x1 = self.branch1x1(x) 239 | 240 | branch5x5 = self.branch5x5_1(x) 241 | branch5x5 = self.branch5x5_2(branch5x5) 242 | 243 | branch3x3dbl = self.branch3x3dbl_1(x) 244 | branch3x3dbl = self.branch3x3dbl_2(branch3x3dbl) 245 | branch3x3dbl = self.branch3x3dbl_3(branch3x3dbl) 246 | 247 | if not self.use_softpool: 248 | branch_pool = F.avg_pool2d(x, kernel_size=3, stride=1, padding=1) 249 | else: 250 | if self.pad: 251 | branch_pool = F.pad(x,(1,1,1,1),'constant', 0) 252 | else: 253 | branch_pool = F.pad(x,(0,0,0,0),'constant', 0) 254 | branch_pool = soft_pool2d(branch_pool, kernel_size=3, stride=1) 255 | 256 | branch_pool = self.branch_pool(branch_pool) 257 | 258 | outputs = [branch1x1, branch5x5, branch3x3dbl, branch_pool] 259 | return outputs 260 | 261 | def forward(self, x): 262 | outputs = self._forward(x) 263 | return torch.cat(outputs, 1) 264 | 265 | 266 | class InceptionB(nn.Module): 267 | 268 | def __init__(self, in_channels, conv_block=None, use_softpool=False): 269 | super(InceptionB, self).__init__() 270 | if conv_block is None: 271 | conv_block = BasicConv2d 272 | self.branch3x3 = conv_block(in_channels, 384, kernel_size=3, stride=2) 273 | 274 | self.branch3x3dbl_1 = conv_block(in_channels, 64, kernel_size=1) 275 | self.branch3x3dbl_2 = conv_block(64, 96, kernel_size=3, padding=1) 276 | self.branch3x3dbl_3 = conv_block(96, 96, kernel_size=3, stride=2) 277 | 278 | self.use_softpool = use_softpool 279 | 280 | def _forward(self, x): 281 | branch3x3 = self.branch3x3(x) 282 | 283 | branch3x3dbl = self.branch3x3dbl_1(x) 284 | branch3x3dbl = self.branch3x3dbl_2(branch3x3dbl) 285 | branch3x3dbl = self.branch3x3dbl_3(branch3x3dbl) 286 | 287 | if not self.use_softpool: 288 | 
branch_pool = F.max_pool2d(x, kernel_size=3, stride=2) 289 | else: 290 | branch_pool = soft_pool2d(x, kernel_size=3, stride=2) 291 | 292 | outputs = [branch3x3, branch3x3dbl, branch_pool] 293 | return outputs 294 | 295 | def forward(self, x): 296 | outputs = self._forward(x) 297 | return torch.cat(outputs, 1) 298 | 299 | 300 | class InceptionC(nn.Module): 301 | 302 | def __init__(self, in_channels, channels_7x7, conv_block=None, use_softpool=False, pad=True): 303 | super(InceptionC, self).__init__() 304 | if conv_block is None: 305 | conv_block = BasicConv2d 306 | self.pad = pad 307 | 308 | self.branch1x1 = conv_block(in_channels, 192, kernel_size=1) 309 | 310 | c7 = channels_7x7 311 | self.branch7x7_1 = conv_block(in_channels, c7, kernel_size=1) 312 | self.branch7x7_2 = conv_block(c7, c7, kernel_size=(1, 7), padding=(0, 3)) 313 | self.branch7x7_3 = conv_block(c7, 192, kernel_size=(7, 1), padding=(3, 0)) 314 | 315 | self.branch7x7dbl_1 = conv_block(in_channels, c7, kernel_size=1) 316 | self.branch7x7dbl_2 = conv_block(c7, c7, kernel_size=(7, 1), padding=(3, 0)) 317 | self.branch7x7dbl_3 = conv_block(c7, c7, kernel_size=(1, 7), padding=(0, 3)) 318 | self.branch7x7dbl_4 = conv_block(c7, c7, kernel_size=(7, 1), padding=(3, 0)) 319 | self.branch7x7dbl_5 = conv_block(c7, 192, kernel_size=(1, 7), padding=(0, 3)) 320 | 321 | self.branch_pool = conv_block(in_channels, 192, kernel_size=1) 322 | 323 | self.use_softpool = use_softpool 324 | 325 | def _forward(self, x): 326 | branch1x1 = self.branch1x1(x) 327 | 328 | branch7x7 = self.branch7x7_1(x) 329 | branch7x7 = self.branch7x7_2(branch7x7) 330 | branch7x7 = self.branch7x7_3(branch7x7) 331 | 332 | branch7x7dbl = self.branch7x7dbl_1(x) 333 | branch7x7dbl = self.branch7x7dbl_2(branch7x7dbl) 334 | branch7x7dbl = self.branch7x7dbl_3(branch7x7dbl) 335 | branch7x7dbl = self.branch7x7dbl_4(branch7x7dbl) 336 | branch7x7dbl = self.branch7x7dbl_5(branch7x7dbl) 337 | 338 | if not self.use_softpool: 339 | branch_pool = F.avg_pool2d(x, 
kernel_size=3, stride=1, padding=1) 340 | else: 341 | if self.pad: 342 | branch_pool = F.pad(x,(1,1,1,1),'constant', 0) 343 | else: 344 | branch_pool = F.pad(x,(0,0,0,0),'constant', 0) 345 | branch_pool = soft_pool2d(branch_pool, kernel_size=3, stride=1) 346 | 347 | branch_pool = self.branch_pool(branch_pool) 348 | 349 | outputs = [branch1x1, branch7x7, branch7x7dbl, branch_pool] 350 | return outputs 351 | 352 | def forward(self, x): 353 | outputs = self._forward(x) 354 | return torch.cat(outputs, 1) 355 | 356 | 357 | class InceptionD(nn.Module): 358 | 359 | def __init__(self, in_channels, conv_block=None, use_softpool=False): 360 | super(InceptionD, self).__init__() 361 | if conv_block is None: 362 | conv_block = BasicConv2d 363 | self.branch3x3_1 = conv_block(in_channels, 192, kernel_size=1) 364 | self.branch3x3_2 = conv_block(192, 320, kernel_size=3, stride=2) 365 | 366 | self.branch7x7x3_1 = conv_block(in_channels, 192, kernel_size=1) 367 | self.branch7x7x3_2 = conv_block(192, 192, kernel_size=(1, 7), padding=(0, 3)) 368 | self.branch7x7x3_3 = conv_block(192, 192, kernel_size=(7, 1), padding=(3, 0)) 369 | self.branch7x7x3_4 = conv_block(192, 192, kernel_size=3, stride=2) 370 | 371 | self.use_softpool = use_softpool 372 | 373 | def _forward(self, x): 374 | branch3x3 = self.branch3x3_1(x) 375 | branch3x3 = self.branch3x3_2(branch3x3) 376 | 377 | branch7x7x3 = self.branch7x7x3_1(x) 378 | branch7x7x3 = self.branch7x7x3_2(branch7x7x3) 379 | branch7x7x3 = self.branch7x7x3_3(branch7x7x3) 380 | branch7x7x3 = self.branch7x7x3_4(branch7x7x3) 381 | 382 | if not self.use_softpool: 383 | branch_pool = F.max_pool2d(x, kernel_size=3, stride=2) 384 | else: 385 | branch_pool = soft_pool2d(x, kernel_size=3, stride=2) 386 | outputs = [branch3x3, branch7x7x3, branch_pool] 387 | return outputs 388 | 389 | def forward(self, x): 390 | outputs = self._forward(x) 391 | return torch.cat(outputs, 1) 392 | 393 | 394 | class InceptionE(nn.Module): 395 | 396 | def __init__(self, 
in_channels, conv_block=None, use_softpool=False, pad=True): 397 | super(InceptionE, self).__init__() 398 | if conv_block is None: 399 | conv_block = BasicConv2d 400 | self.pad = pad 401 | 402 | self.branch1x1 = conv_block(in_channels, 320, kernel_size=1) 403 | 404 | self.branch3x3_1 = conv_block(in_channels, 384, kernel_size=1) 405 | self.branch3x3_2a = conv_block(384, 384, kernel_size=(1, 3), padding=(0, 1)) 406 | self.branch3x3_2b = conv_block(384, 384, kernel_size=(3, 1), padding=(1, 0)) 407 | 408 | self.branch3x3dbl_1 = conv_block(in_channels, 448, kernel_size=1) 409 | self.branch3x3dbl_2 = conv_block(448, 384, kernel_size=3, padding=1) 410 | self.branch3x3dbl_3a = conv_block(384, 384, kernel_size=(1, 3), padding=(0, 1)) 411 | self.branch3x3dbl_3b = conv_block(384, 384, kernel_size=(3, 1), padding=(1, 0)) 412 | 413 | self.branch_pool = conv_block(in_channels, 192, kernel_size=1) 414 | 415 | self.use_softpool = use_softpool 416 | 417 | def _forward(self, x): 418 | branch1x1 = self.branch1x1(x) 419 | 420 | branch3x3 = self.branch3x3_1(x) 421 | branch3x3 = [ 422 | self.branch3x3_2a(branch3x3), 423 | self.branch3x3_2b(branch3x3), 424 | ] 425 | branch3x3 = torch.cat(branch3x3, 1) 426 | 427 | branch3x3dbl = self.branch3x3dbl_1(x) 428 | branch3x3dbl = self.branch3x3dbl_2(branch3x3dbl) 429 | branch3x3dbl = [ 430 | self.branch3x3dbl_3a(branch3x3dbl), 431 | self.branch3x3dbl_3b(branch3x3dbl), 432 | ] 433 | branch3x3dbl = torch.cat(branch3x3dbl, 1) 434 | 435 | if not self.use_softpool: 436 | branch_pool = F.avg_pool2d(x, kernel_size=3, stride=1, padding=1) 437 | else: 438 | if self.pad: 439 | branch_pool = F.pad(x,(1,1,1,1),'constant', 0) 440 | else: 441 | branch_pool = F.pad(x,(0,0,0,0),'constant', 0) 442 | branch_pool = soft_pool2d(branch_pool, kernel_size=3, stride=1) 443 | 444 | branch_pool = self.branch_pool(branch_pool) 445 | 446 | outputs = [branch1x1, branch3x3, branch3x3dbl, branch_pool] 447 | return outputs 448 | 449 | def forward(self, x): 450 | outputs = 
self._forward(x)
451 |         return torch.cat(outputs, 1)
452 | 
453 | 
454 | class InceptionAux(nn.Module):
455 | 
456 |     def __init__(self, in_channels, num_classes, conv_block=None, use_softpool=False):
457 |         super(InceptionAux, self).__init__()
458 |         if conv_block is None:
459 |             conv_block = BasicConv2d
460 |         self.conv0 = conv_block(in_channels, 128, kernel_size=1)
461 |         self.conv1 = conv_block(128, 768, kernel_size=5)
462 |         self.conv1.stddev = 0.01
463 |         self.fc = nn.Linear(768, num_classes)
464 |         self.fc.stddev = 0.001
465 | 
466 |         self.use_softpool = use_softpool
467 | 
468 |     def forward(self, x):
469 |         # N x 768 x 17 x 17
470 |         if not self.use_softpool:
471 |             x = F.avg_pool2d(x, kernel_size=5, stride=3)
472 |         else:
473 |             x = soft_pool2d(x, kernel_size=5, stride=3)
474 |         # N x 768 x 5 x 5
475 |         x = self.conv0(x)
476 |         # N x 128 x 5 x 5
477 |         x = self.conv1(x)
478 |         # N x 768 x 1 x 1
479 |         # Adaptive average pooling
480 |         x = F.adaptive_avg_pool2d(x, (1, 1))
481 |         # N x 768 x 1 x 1
482 |         x = torch.flatten(x, 1)
483 |         # N x 768
484 |         x = self.fc(x)
485 |         # N x 1000 (num_classes)
486 |         return x
487 | 
488 | 
489 | class BasicConv2d(nn.Module):
490 | 
491 |     def __init__(self, in_channels, out_channels, **kwargs):
492 |         super(BasicConv2d, self).__init__()
493 |         self.conv = nn.Conv2d(in_channels, out_channels, bias=False, **kwargs)
494 |         self.bn = nn.BatchNorm2d(out_channels, eps=0.001)
495 | 
496 |     def forward(self, x):
497 |         x = self.conv(x)
498 |         x = self.bn(x)
499 |         return F.relu(x, inplace=True)
500 | 
--------------------------------------------------------------------------------
/main/models/resnet.py:
--------------------------------------------------------------------------------
1 | import torch
2 | import torch.nn as nn
3 | 
4 | import softpool_cuda
5 | from SoftPool import soft_pool2d, SoftPool2d
6 | 
7 | 
8 | __all__ = ['ResNet', 'resnet18', 'resnet34', 'resnet50', 'resnet101',
9 |            'resnet152', 'resnet200', 'resnext50_32x4d', 'resnext101_32x4d',
10 |            'resnext101_64x4d', 'resnext101_32x8d', 'wide_resnet50_2', 'wide_resnet101_2']
11 | 
12 
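For orientation, the `soft_pool2d` / `SoftPool2d` operations imported by these model files perform the exponentially weighted downsampling described in the abstract. The sketch below is a pure-Python illustration of that idea on a single 2D map, assuming the usual softmax-weighted formulation (each activation in a window is weighted by its exponential, normalised over the window). The names `soft_pool_window` and `soft_pool2d_reference` are hypothetical and not part of the `SoftPool` package, and the sketch does not reproduce the CUDA kernel or its handling of batches, channels, or borders.

```python
import math


def soft_pool_window(values):
    """Softmax-weighted sum of a flat list of activations (illustrative helper)."""
    peak = max(values)  # subtract the maximum for numerical stability
    weights = [math.exp(v - peak) for v in values]
    total = sum(weights)
    return sum(w * v for w, v in zip(weights, values)) / total


def soft_pool2d_reference(grid, kernel_size=2, stride=2):
    """Apply soft_pool_window over square windows of a 2D list, without padding."""
    rows, cols = len(grid), len(grid[0])
    pooled = []
    for r in range(0, rows - kernel_size + 1, stride):
        pooled_row = []
        for c in range(0, cols - kernel_size + 1, stride):
            window = [grid[r + i][c + j]
                      for i in range(kernel_size)
                      for j in range(kernel_size)]
            pooled_row.append(soft_pool_window(window))
        pooled.append(pooled_row)
    return pooled


if __name__ == "__main__":
    fmap = [[1.0, 2.0, 0.0, 0.0],
            [3.0, 4.0, 0.0, 8.0]]
    # Each output lies between the mean and the maximum of its window.
    print(soft_pool2d_reference(fmap))
```

Under this formulation a constant window is returned unchanged, and larger activations dominate the result without discarding the smaller ones, which is the behaviour that distinguishes it from max and average pooling.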
| 13 | def conv3x3(in_planes, out_planes, stride=1, groups=1, dilation=1): 14 | """3x3 convolution with padding""" 15 | return nn.Conv2d(in_planes, out_planes, kernel_size=3, stride=stride, 16 | padding=dilation, groups=groups, bias=False, dilation=dilation) 17 | 18 | 19 | def conv1x1(in_planes, out_planes, stride=1): 20 | """1x1 convolution""" 21 | return nn.Conv2d(in_planes, out_planes, kernel_size=1, stride=stride, bias=False) 22 | 23 | 24 | class BasicBlock(nn.Module): 25 | expansion = 1 26 | 27 | def __init__(self, inplanes, planes, stride=1, downsample=None, groups=1, 28 | base_width=64, dilation=1, norm_layer=None): 29 | super(BasicBlock, self).__init__() 30 | if norm_layer is None: 31 | norm_layer = nn.BatchNorm2d 32 | if groups != 1 or base_width != 64: 33 | raise ValueError('BasicBlock only supports groups=1 and base_width=64') 34 | if dilation > 1: 35 | raise NotImplementedError("Dilation > 1 not supported in BasicBlock") 36 | # Both self.conv1 and self.downsample layers downsample the input when stride != 1 37 | self.conv1 = conv3x3(inplanes, planes, stride) 38 | self.bn1 = norm_layer(planes) 39 | self.relu = nn.ReLU(inplace=True) 40 | self.conv2 = conv3x3(planes, planes) 41 | self.bn2 = norm_layer(planes) 42 | self.downsample = downsample 43 | self.stride = stride 44 | 45 | def forward(self, x): 46 | identity = x 47 | 48 | out = self.conv1(x) 49 | out = self.bn1(out) 50 | out = self.relu(out) 51 | 52 | out = self.conv2(out) 53 | out = self.bn2(out) 54 | 55 | if self.downsample is not None: 56 | identity = self.downsample(x) 57 | 58 | out += identity 59 | out = self.relu(out) 60 | 61 | return out 62 | 63 | 64 | class Bottleneck(nn.Module): 65 | # Bottleneck in torchvision places the stride for downsampling at 3x3 convolution(self.conv2) 66 | # while original implementation places the stride at the first 1x1 convolution(self.conv1) 67 | # according to "Deep residual learning for image recognition"https://arxiv.org/abs/1512.03385. 
68 | # This variant is also known as ResNet V1.5 and improves accuracy according to 69 | # https://ngc.nvidia.com/catalog/model-scripts/nvidia:resnet_50_v1_5_for_pytorch. 70 | 71 | expansion = 4 72 | 73 | def __init__(self, inplanes, planes, stride=1, downsample=None, groups=1, 74 | base_width=64, dilation=1, norm_layer=None): 75 | super(Bottleneck, self).__init__() 76 | if norm_layer is None: 77 | norm_layer = nn.BatchNorm2d 78 | width = int(planes * (base_width / 64.)) * groups 79 | # Both self.conv2 and self.downsample layers downsample the input when stride != 1 80 | self.conv1 = conv1x1(inplanes, width) 81 | self.bn1 = norm_layer(width) 82 | self.conv2 = conv3x3(width, width, stride, groups, dilation) 83 | self.bn2 = norm_layer(width) 84 | self.conv3 = conv1x1(width, planes * self.expansion) 85 | self.bn3 = norm_layer(planes * self.expansion) 86 | self.relu = nn.ReLU(inplace=True) 87 | self.downsample = downsample 88 | self.stride = stride 89 | 90 | def forward(self, x): 91 | identity = x 92 | 93 | out = self.conv1(x) 94 | out = self.bn1(out) 95 | out = self.relu(out) 96 | 97 | out = self.conv2(out) 98 | out = self.bn2(out) 99 | out = self.relu(out) 100 | 101 | out = self.conv3(out) 102 | out = self.bn3(out) 103 | 104 | if self.downsample is not None: 105 | identity = self.downsample(x) 106 | 107 | out += identity 108 | out = self.relu(out) 109 | 110 | return out 111 | 112 | 113 | class ResNet(nn.Module): 114 | 115 | def __init__(self, block, layers, use_softpool=True, num_classes=1000, zero_init_residual=False, groups=1, width_per_group=64, replace_stride_with_dilation=None, norm_layer=None): 116 | super(ResNet, self).__init__() 117 | if norm_layer is None: 118 | norm_layer = nn.BatchNorm2d 119 | self._norm_layer = norm_layer 120 | 121 | self.inplanes = 64 122 | self.dilation = 1 123 | if replace_stride_with_dilation is None: 124 | # each element in the tuple indicates if we should replace 125 | # the 2x2 stride with a dilated convolution instead 126 | 
replace_stride_with_dilation = [False, False, False] 127 | if len(replace_stride_with_dilation) != 3: 128 | raise ValueError("replace_stride_with_dilation should be None " 129 | "or a 3-element tuple, got {}".format(replace_stride_with_dilation)) 130 | self.groups = groups 131 | self.base_width = width_per_group 132 | self.conv1 = nn.Conv2d(3, self.inplanes, kernel_size=7, stride=2, padding=3, 133 | bias=False) 134 | self.bn1 = norm_layer(self.inplanes) 135 | self.relu = nn.ReLU(inplace=True) 136 | 137 | if not use_softpool: 138 | self.pool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1) 139 | else: 140 | self.pool = SoftPool2d(kernel_size=(2,2), stride=(2,2)) 141 | 142 | self.layer1 = self._make_layer(block, 64, layers[0]) 143 | self.layer2 = self._make_layer(block, 128, layers[1], stride=2, 144 | dilate=replace_stride_with_dilation[0]) 145 | self.layer3 = self._make_layer(block, 256, layers[2], stride=2, 146 | dilate=replace_stride_with_dilation[1]) 147 | self.layer4 = self._make_layer(block, 512, layers[3], stride=2, 148 | dilate=replace_stride_with_dilation[2]) 149 | self.avgpool = nn.AdaptiveAvgPool2d((1, 1)) 150 | self.fc = nn.Linear(512 * block.expansion, num_classes) 151 | 152 | for m in self.modules(): 153 | if isinstance(m, nn.Conv2d): 154 | nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu') 155 | elif isinstance(m, (nn.BatchNorm2d, nn.GroupNorm)): 156 | nn.init.constant_(m.weight, 1) 157 | nn.init.constant_(m.bias, 0) 158 | 159 | # Zero-initialize the last BN in each residual branch, 160 | # so that the residual branch starts with zeros, and each residual block behaves like an identity. 
161 | # This improves the model by 0.2~0.3% according to https://arxiv.org/abs/1706.02677 162 | if zero_init_residual: 163 | for m in self.modules(): 164 | if isinstance(m, Bottleneck): 165 | nn.init.constant_(m.bn3.weight, 0) 166 | elif isinstance(m, BasicBlock): 167 | nn.init.constant_(m.bn2.weight, 0) 168 | 169 | def _make_layer(self, block, planes, blocks, stride=1, dilate=False): 170 | norm_layer = self._norm_layer 171 | downsample = None 172 | previous_dilation = self.dilation 173 | if dilate: 174 | self.dilation *= stride 175 | stride = 1 176 | if stride != 1 or self.inplanes != planes * block.expansion: 177 | downsample = nn.Sequential( 178 | conv1x1(self.inplanes, planes * block.expansion, stride), 179 | norm_layer(planes * block.expansion), 180 | ) 181 | 182 | layers = [] 183 | layers.append(block(self.inplanes, planes, stride, downsample, self.groups, 184 | self.base_width, previous_dilation, norm_layer)) 185 | self.inplanes = planes * block.expansion 186 | for _ in range(1, blocks): 187 | layers.append(block(self.inplanes, planes, groups=self.groups, 188 | base_width=self.base_width, dilation=self.dilation, 189 | norm_layer=norm_layer)) 190 | 191 | return nn.Sequential(*layers) 192 | 193 | def _forward_impl(self, x): 194 | # See note [TorchScript super()] 195 | x = self.conv1(x) 196 | x = self.bn1(x) 197 | x = self.relu(x) 198 | x = self.pool(x) 199 | 200 | x = self.layer1(x) 201 | x = self.layer2(x) 202 | x = self.layer3(x) 203 | x = self.layer4(x) 204 | 205 | x = self.avgpool(x) 206 | x = torch.flatten(x, 1) 207 | x = self.fc(x) 208 | 209 | return x 210 | 211 | def forward(self, x): 212 | return self._forward_impl(x) 213 | 214 | 215 | def _resnet(arch, block, layers, pretrained, progress, use_softpool, **kwargs): 216 | model = ResNet(block, layers, use_softpool, **kwargs) 217 | if pretrained: 218 | state_dict = load_state_dict_from_url(model_urls[arch], 219 | progress=progress) 220 | model.load_state_dict(state_dict) 221 | return model 222 | 223 | 224 
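The ResNet stem uses `nn.MaxPool2d(kernel_size=3, stride=2, padding=1)` by default and `SoftPool2d(kernel_size=(2,2), stride=(2,2))` when `use_softpool` is set. A short check of the spatial sizes is sketched below. It assumes that both layers follow the standard floor-mode pooling formula `(n + 2p - k) // s + 1`; this is documented for `nn.MaxPool2d` but is only an assumption for `SoftPool2d`, whose implementation is not shown here. `pool_out` is a hypothetical helper used for illustration only.

```python
def pool_out(n, kernel_size, stride, padding=0):
    """Spatial output size of a pooling layer (floor mode, no dilation)."""
    return (n + 2 * padding - kernel_size) // stride + 1


if __name__ == "__main__":
    # ResNet stem: a 224x224 input becomes 112x112 after the 7x7, stride-2 convolution.
    print("max-pool stem :", pool_out(112, 3, 2, padding=1))
    print("soft-pool stem:", pool_out(112, 2, 2))

    # Shape comments from the Inception forward pass.
    print("pool1  147 ->", pool_out(147, 3, 2))
    print("pool2   71 ->", pool_out(71, 3, 2))
    print("aux     17 ->", pool_out(17, 5, 3))
```

Under that assumption both stem configurations halve a 112x112 map, so the rest of the network sees the same spatial size whichever pooling is selected.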
| def resnet18(pretrained=False, progress=True, use_softpool=True, **kwargs): 225 | r"""ResNet-18 model from 226 | `"Deep Residual Learning for Image Recognition" `_ 227 | 228 | Args: 229 | pretrained (bool): If True, returns a model pre-trained on ImageNet 230 | progress (bool): If True, displays a progress bar of the download to stderr 231 | use_softpool (bool): If True, changes pooling operations to softpooling 232 | """ 233 | return _resnet('resnet18', BasicBlock, [2, 2, 2, 2], pretrained, progress, use_softpool, 234 | **kwargs) 235 | 236 | 237 | def resnet34(pretrained=False, progress=True, use_softpool=True, **kwargs): 238 | r"""ResNet-34 model from 239 | `"Deep Residual Learning for Image Recognition" `_ 240 | 241 | Args: 242 | pretrained (bool): If True, returns a model pre-trained on ImageNet 243 | progress (bool): If True, displays a progress bar of the download to stderr 244 | use_softpool (bool): If True, changes pooling operations to softpooling 245 | """ 246 | return _resnet('resnet34', BasicBlock, [3, 4, 6, 3], pretrained, progress, use_softpool, 247 | **kwargs) 248 | 249 | 250 | def resnet50(pretrained=False, progress=True, use_softpool=True, **kwargs): 251 | r"""ResNet-50 model from 252 | `"Deep Residual Learning for Image Recognition" `_ 253 | 254 | Args: 255 | pretrained (bool): If True, returns a model pre-trained on ImageNet 256 | progress (bool): If True, displays a progress bar of the download to stderr 257 | use_softpool (bool): If True, changes pooling operations to softpooling 258 | """ 259 | return _resnet('resnet50', Bottleneck, [3, 4, 6, 3], pretrained, progress, use_softpool, 260 | **kwargs) 261 | 262 | 263 | def resnet101(pretrained=False, progress=True, use_softpool=True, **kwargs): 264 | r"""ResNet-101 model from 265 | `"Deep Residual Learning for Image Recognition" `_ 266 | 267 | Args: 268 | pretrained (bool): If True, returns a model pre-trained on ImageNet 269 | progress (bool): If True, displays a progress bar of the download to 
stderr 270 | use_softpool (bool): If True, changes pooling operations to softpooling 271 | """ 272 | return _resnet('resnet101', Bottleneck, [3, 4, 23, 3], pretrained, progress, use_softpool, 273 | **kwargs) 274 | 275 | 276 | def resnet152(pretrained=False, progress=True, use_softpool=True, **kwargs): 277 | r"""ResNet-152 model from 278 | `"Deep Residual Learning for Image Recognition" `_ 279 | 280 | Args: 281 | pretrained (bool): If True, returns a model pre-trained on ImageNet 282 | progress (bool): If True, displays a progress bar of the download to stderr 283 | use_softpool (bool): If True, changes pooling operations to softpooling 284 | """ 285 | return _resnet('resnet152', Bottleneck, [3, 8, 36, 3], pretrained, progress, use_softpool, 286 | **kwargs) 287 | 288 | def resnet200(pretrained=False, progress=True, use_softpool=True, **kwargs): 289 | r"""ResNet-200 model from 290 | `"Deep Residual Learning for Image Recognition" `_ 291 | 292 | Args: 293 | pretrained (bool): If True, returns a model pre-trained on ImageNet 294 | progress (bool): If True, displays a progress bar of the download to stderr 295 | use_softpool (bool): If True, changes pooling operations to softpooling 296 | """ 297 | return _resnet('resnet200', Bottleneck, [3, 24, 36, 3], pretrained, progress, use_softpool, 298 | **kwargs) 299 | 300 | 301 | def resnext50_32x4d(pretrained=False, progress=True, use_softpool=True, **kwargs): 302 | r"""ResNeXt-50 32x4d model from 303 | `"Aggregated Residual Transformation for Deep Neural Networks" `_ 304 | 305 | Args: 306 | pretrained (bool): If True, returns a model pre-trained on ImageNet 307 | progress (bool): If True, displays a progress bar of the download to stderr 308 | use_softpool (bool): If True, changes pooling operations to softpooling 309 | """ 310 | kwargs['groups'] = 32 311 | kwargs['width_per_group'] = 4 312 | return _resnet('resnext50_32x4d', Bottleneck, [3, 4, 6, 3], 313 | pretrained, progress, use_softpool, **kwargs) 314 | 315 | def 
resnext101_32x4d(pretrained=False, progress=True, use_softpool=True, **kwargs): 316 | r"""ResNeXt-101 32x4d model from 317 | `"Aggregated Residual Transformation for Deep Neural Networks" `_ 318 | 319 | Args: 320 | pretrained (bool): If True, returns a model pre-trained on ImageNet 321 | progress (bool): If True, displays a progress bar of the download to stderr 322 | use_softpool (bool): If True, changes pooling operations to softpooling 323 | """ 324 | kwargs['groups'] = 32 325 | kwargs['width_per_group'] = 4 326 | return _resnet('resnext101_32x4d', Bottleneck, [3, 4, 23, 3], 327 | pretrained, progress, use_softpool, **kwargs) 328 | 329 | def resnext101_64x4d(pretrained=False, progress=True, use_softpool=True, **kwargs): 330 | r"""ResNeXt-101 64x4d model from 331 | `"Aggregated Residual Transformation for Deep Neural Networks" `_ 332 | 333 | Args: 334 | pretrained (bool): If True, returns a model pre-trained on ImageNet 335 | progress (bool): If True, displays a progress bar of the download to stderr 336 | use_softpool (bool): If True, changes pooling operations to softpooling 337 | """ 338 | kwargs['groups'] = 64 339 | kwargs['width_per_group'] = 4 340 | return _resnet('resnext101_64x4d', Bottleneck, [3, 4, 23, 3], 341 | pretrained, progress, use_softpool, **kwargs) 342 | 343 | def resnext101_32x8d(pretrained=False, progress=True, use_softpool=True, **kwargs): 344 | r"""ResNeXt-101 32x8d model from 345 | `"Aggregated Residual Transformation for Deep Neural Networks" `_ 346 | 347 | Args: 348 | pretrained (bool): If True, returns a model pre-trained on ImageNet 349 | progress (bool): If True, displays a progress bar of the download to stderr 350 | use_softpool (bool): If True, changes pooling operations to softpooling 351 | """ 352 | kwargs['groups'] = 32 353 | kwargs['width_per_group'] = 8 354 | return _resnet('resnext101_32x8d', Bottleneck, [3, 4, 23, 3], 355 | pretrained, progress, use_softpool, **kwargs) 356 | 357 | 358 | def wide_resnet50_2(pretrained=False, 
progress=True, use_softpool=True, **kwargs): 359 | r"""Wide ResNet-50-2 model from 360 | `"Wide Residual Networks" `_ 361 | 362 | The model is the same as ResNet except for the bottleneck number of channels 363 | which is twice larger in every block. The number of channels in outer 1x1 364 | convolutions is the same, e.g. last block in ResNet-50 has 2048-512-2048 365 | channels, and in Wide ResNet-50-2 has 2048-1024-2048. 366 | 367 | Args: 368 | pretrained (bool): If True, returns a model pre-trained on ImageNet 369 | progress (bool): If True, displays a progress bar of the download to stderr 370 | use_softpool (bool): If True, changes pooling operations to softpooling 371 | """ 372 | kwargs['width_per_group'] = 64 * 2 373 | return _resnet('wide_resnet50_2', Bottleneck, [3, 4, 6, 3], 374 | pretrained, progress, use_softpool, **kwargs) 375 | 376 | 377 | def wide_resnet101_2(pretrained=False, progress=True, use_softpool=True, **kwargs): 378 | r"""Wide ResNet-101-2 model from 379 | `"Wide Residual Networks" `_ 380 | 381 | The model is the same as ResNet except for the bottleneck number of channels 382 | which is twice larger in every block. The number of channels in outer 1x1 383 | convolutions is the same, e.g. last block in ResNet-50 has 2048-512-2048 384 | channels, and in Wide ResNet-50-2 has 2048-1024-2048. 
385 | 386 | Args: 387 | pretrained (bool): If True, returns a model pre-trained on ImageNet 388 | progress (bool): If True, displays a progress bar of the download to stderr 389 | use_softpool (bool): If True, changes pooling operations to softpooling 390 | """ 391 | kwargs['width_per_group'] = 64 * 2 392 | return _resnet('wide_resnet101_2', Bottleneck, [3, 4, 23, 3], 393 | pretrained, progress, use_softpool, **kwargs) 394 | 395 | if __name__ == "__main__": 396 | #from ptflops import get_model_complexity_info 397 | #tmp = (3,224,224) 398 | net = resnet18(use_softpool=False) 399 | net.load_state_dict(torch.load('weights/resnet-18_best.pth')) 400 | #macs, params = get_model_complexity_info(net, tmp, as_strings=True,print_per_layer_stat=False, verbose=False) 401 | #print('{:<30} {:<8}'.format('Computational complexity: ', macs)) 402 | #print('{:<30} {:<8}'.format('Number of parameters: ', params)) 403 | #print('network 1 test passed \n') 404 | -------------------------------------------------------------------------------- /main/train.py: -------------------------------------------------------------------------------- 1 | import argparse 2 | import os 3 | import random 4 | import shutil 5 | import time 6 | import warnings 7 | import pathlib 8 | import csv 9 | import torch 10 | import torch.nn as nn 11 | import torch.nn.parallel 12 | import torch.backends.cudnn as cudnn 13 | import torch.distributed as dist 14 | import torch.optim 15 | import torch.multiprocessing as mp 16 | import torch.utils.data 17 | import torch.utils.data.distributed 18 | import torchvision.transforms as transforms 19 | import torchvision.datasets as datasets 20 | 21 | from models import config 22 | 23 | model_names = config.models 24 | 25 | parser = argparse.ArgumentParser(description='PyTorch image training') 26 | parser.add_argument('-d','--dataset', metavar='DATASET', 27 | help='dataset selection', choices=['imagenet','cifar10','cifar100'], 28 | default='imagenet') 29 | 
parser.add_argument('-dir','--data_dir', metavar='DIR',
30 |                     help='path to dataset',default='/home/agstergiou/Desktop')
31 | parser.add_argument('-a', '--arch', metavar='ARCH', default='resnet18',
32 |                     choices=model_names,
33 |                     help='model architecture: ' +
34 |                         ' | '.join(model_names) +
35 |                         ' (default: resnet18)')
36 | parser.add_argument('-j', '--workers', default=8, type=int, metavar='N',
37 |                     help='number of data loading workers (default: 8)')
38 | parser.add_argument('--epochs', default=90, type=int, metavar='N',
39 |                     help='number of total epochs to run')
40 | parser.add_argument('--start-epoch', default=0, type=int, metavar='N',
41 |                     help='manual epoch number (useful on restarts)')
42 | parser.add_argument('--use_softpool', metavar='POOL', type=lambda s: s.lower() in ('1', 'true', 'yes'),  # type=bool would parse 'False' as True
43 |                     help='use softpool: true/false (default: true)', default=True)
44 | parser.add_argument('-b', '--batch-size', default=128, type=int,
45 |                     metavar='N',
46 |                     help='mini-batch size (default: 128), this is the total '
47 |                          'batch size of all GPUs on the current node when '
48 |                          'using Data Parallel or Distributed Data Parallel')
49 | parser.add_argument('--lr', '--learning-rate', default=0.1, type=float,
50 |                     metavar='LR', help='initial learning rate', dest='lr')
51 | parser.add_argument('--momentum', default=0.9, type=float, metavar='M',
52 |                     help='momentum')
53 | parser.add_argument('--wd', '--weight-decay', default=1e-4, type=float,
54 |                     metavar='W', help='weight decay (default: 1e-4)',
55 |                     dest='weight_decay')
56 | parser.add_argument('-p', '--print-freq', default=1, type=int,
57 |                     metavar='N', help='print frequency (default: 1)')
58 | parser.add_argument('--resume', default='', type=str, metavar='PATH',
59 |                     help='path to latest checkpoint (default: none)')
60 | parser.add_argument('-e', '--evaluate', dest='evaluate', action='store_true',
61 |                     help='evaluate model on validation set')
62 | parser.add_argument('--pretrained', dest='pretrained', type=str, metavar='PATH',  # a path, since it is passed to torch.load
63 |                     help='path to pre-trained weights (default: none)', default=None)
64 | 
parser.add_argument('--world-size', default=-1, type=int, 65 | help='number of nodes for distributed training') 66 | parser.add_argument('--rank', default=-1, type=int, 67 | help='node rank for distributed training') 68 | parser.add_argument('--dist-url', default='tcp://132.211.32.23:22', type=str, 69 | help='url used to set up distributed training') 70 | parser.add_argument('--dist-backend', default='nccl', type=str, 71 | help='distributed backend') 72 | parser.add_argument('--seed', default=None, type=int, 73 | help='seed for initializing training. ') 74 | parser.add_argument('--gpu', default=None, type=int, 75 | help='GPU id to use.') 76 | parser.add_argument('--multiprocessing-distributed', action='store_true', 77 | help='Use multi-processing distributed training to launch ' 78 | 'N processes per node, which has N GPUs. This is the ' 79 | 'fastest way to use PyTorch for either single node or ' 80 | 'multi node data parallel training') 81 | 82 | best_acc1 = 0 83 | 84 | 85 | def main(): 86 | args = parser.parse_args() 87 | 88 | if args.seed is not None: 89 | random.seed(args.seed) 90 | torch.manual_seed(args.seed) 91 | cudnn.deterministic = True 92 | warnings.warn('You have chosen to seed training. ' 93 | 'This will turn on the CUDNN deterministic setting, ' 94 | 'which can slow down your training considerably! ' 95 | 'You may see unexpected behavior when restarting ' 96 | 'from checkpoints.') 97 | 98 | if args.gpu is not None: 99 | warnings.warn('You have chosen a specific GPU. 
This will completely ' 100 | 'disable data parallelism.') 101 | 102 | if args.dist_url == "env://" and args.world_size == -1: 103 | args.world_size = int(os.environ["WORLD_SIZE"]) 104 | 105 | args.distributed = args.world_size > 1 or args.multiprocessing_distributed 106 | 107 | ngpus_per_node = torch.cuda.device_count() 108 | if args.multiprocessing_distributed: 109 | # Since we have ngpus_per_node processes per node, the total world_size 110 | # needs to be adjusted accordingly 111 | args.world_size = ngpus_per_node * args.world_size 112 | # Use torch.multiprocessing.spawn to launch distributed processes: the 113 | # main_worker process function 114 | mp.spawn(main_worker, nprocs=ngpus_per_node, args=(ngpus_per_node, args)) 115 | else: 116 | # Simply call main_worker function 117 | main_worker(args.gpu, ngpus_per_node, args) 118 | 119 | 120 | def main_worker(gpu, ngpus_per_node, args): 121 | global best_acc1 122 | args.gpu = gpu 123 | 124 | if args.gpu is not None: 125 | print("Use GPU: {} for training".format(args.gpu)) 126 | 127 | if args.distributed: 128 | if args.dist_url == "env://" and args.rank == -1: 129 | args.rank = int(os.environ["RANK"]) 130 | if args.multiprocessing_distributed: 131 | # For multiprocessing distributed training, rank needs to be the 132 | # global rank among all the processes 133 | args.rank = args.rank * ngpus_per_node + gpu 134 | dist.init_process_group(backend=args.dist_backend, init_method=args.dist_url, 135 | world_size=args.world_size, rank=args.rank) 136 | # create model 137 | model = config.get_model(args.arch,args.use_softpool) 138 | 139 | if not torch.cuda.is_available(): 140 | print('using CPU, this will be slow') 141 | elif args.distributed: 142 | # For multiprocessing distributed, DistributedDataParallel constructor 143 | # should always set the single device scope, otherwise, 144 | # DistributedDataParallel will use all available devices. 
145 | if args.gpu is not None: 146 | torch.cuda.set_device(args.gpu) 147 | model.cuda(args.gpu) 148 | # When using a single GPU per process and per 149 | # DistributedDataParallel, we need to divide the batch size 150 | # ourselves based on the total number of GPUs we have 151 | args.batch_size = int(args.batch_size / ngpus_per_node) 152 | args.workers = int((args.workers + ngpus_per_node - 1) / ngpus_per_node) 153 | model = torch.nn.parallel.DistributedDataParallel(model, device_ids=[args.gpu]) 154 | else: 155 | model.cuda() 156 | # DistributedDataParallel will divide and allocate batch_size to all 157 | # available GPUs if device_ids are not set 158 | model = torch.nn.parallel.DistributedDataParallel(model) 159 | elif args.gpu is not None: 160 | torch.cuda.set_device(args.gpu) 161 | model = model.cuda(args.gpu) 162 | else: 163 | # DataParallel will divide and allocate batch_size to all available GPUs 164 | if args.arch.startswith('alexnet') or args.arch.startswith('vgg'): 165 | model.features = torch.nn.DataParallel(model.features) 166 | model.cuda() 167 | else: 168 | model = torch.nn.DataParallel(model).cuda() 169 | 170 | # Load weights 171 | if args.pretrained is not None: 172 | print("=> using pre-trained model '{}' from dir '{}' ".format(args.arch,args.pretrained)) 173 | model.load_state_dict(torch.load(args.pretrained)['state_dict']) 174 | else: 175 | print("=> initialised model '{}'".format(args.arch)) 176 | 177 | # define loss function (criterion) and optimizer 178 | criterion = nn.CrossEntropyLoss().cuda(args.gpu) 179 | 180 | optimizer = torch.optim.SGD(model.parameters(), args.lr, 181 | momentum=args.momentum, 182 | weight_decay=args.weight_decay) 183 | 184 | # optionally resume from a checkpoint 185 | if args.resume: 186 | if os.path.isfile(args.resume): 187 | print("=> loading checkpoint '{}'".format(args.resume)) 188 | if args.gpu is None: 189 | checkpoint = torch.load(args.resume) 190 | else: 191 | # Map model to be loaded to specified single gpu. 
192 | loc = 'cuda:{}'.format(args.gpu) 193 | checkpoint = torch.load(args.resume, map_location=loc) 194 | args.start_epoch = checkpoint['epoch'] 195 | best_acc1 = checkpoint['best_acc1'] 196 | if args.gpu is not None: 197 | # best_acc1 may be from a checkpoint from a different GPU 198 | best_acc1 = best_acc1.to(args.gpu) 199 | model.load_state_dict(checkpoint['state_dict']) 200 | optimizer.load_state_dict(checkpoint['optimizer']) 201 | print("=> loaded checkpoint '{}' (epoch {})" 202 | .format(args.resume, checkpoint['epoch'])) 203 | else: 204 | print("=> no checkpoint found at '{}'".format(args.resume)) 205 | 206 | cudnn.benchmark = True 207 | 208 | # Data loading code 209 | geometrical = transforms.RandomApply([ 210 | transforms.RandomRotation(degrees=10), 211 | transforms.RandomHorizontalFlip(p=0.3)], 212 | p=.6 213 | ) 214 | colour = transforms.RandomApply( 215 | [transforms.ColorJitter(brightness=.1,contrast=.05,saturation=.01,hue=.05)], 216 | p=.4) 217 | 218 | normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406], 219 | std=[0.229, 0.224, 0.225]) 220 | 221 | if args.dataset == 'imagenet': 222 | train_dataset = datasets.ImageNet( 223 | root=os.path.join(args.data_dir,'ILSVRC2012'), 224 | split='train', 225 | transform=transforms.Compose([ 226 | geometrical, 227 | transforms.RandomResizedCrop(299), 228 | transforms.ToTensor(), 229 | normalize, 230 | ]) 231 | ) 232 | val_dataset = datasets.ImageNet( 233 | root=os.path.join(args.data_dir,'ILSVRC2012'), 234 | split='val', 235 | transform=transforms.Compose([ 236 | transforms.Resize(324), 237 | transforms.CenterCrop(299), 238 | transforms.ToTensor(), 239 | normalize, 240 | ]) 241 | ) 242 | 243 | elif args.dataset == 'cifar10': 244 | train_dataset = datasets.CIFAR10( 245 | root=os.path.join(args.data_dir,'cifar-10-batches-py'), 246 | train=True, 247 | transform=transforms.Compose([ 248 | geometrical, 249 | colour, 250 | transforms.ToTensor(), 251 | normalize, 252 | ]) 253 | ) 254 | val_dataset = 
datasets.CIFAR10( 255 | root=os.path.join(args.data_dir,'cifar-10-batches-py'), 256 | train=False, 257 | transform=transforms.Compose([ 258 | transforms.ToTensor(), 259 | normalize, 260 | ]) 261 | ) 262 | else: 263 | train_dataset = datasets.CIFAR100( 264 | root=os.path.join(args.data_dir,'cifar-100-python'), 265 | train=True, 266 | transform=transforms.Compose([ 267 | geometrical, 268 | colour, 269 | transforms.ToTensor(), 270 | normalize, 271 | ]) 272 | ) 273 | val_dataset = datasets.CIFAR100( 274 | root=os.path.join(args.data_dir,'cifar-100-python'), 275 | train=False, 276 | transform=transforms.Compose([ 277 | transforms.ToTensor(), 278 | normalize, 279 | ]) 280 | ) 281 | 282 | 283 | if args.distributed: 284 | train_sampler = torch.utils.data.distributed.DistributedSampler(train_dataset) 285 | else: 286 | train_sampler = None 287 | 288 | train_loader = torch.utils.data.DataLoader( 289 | train_dataset, batch_size=args.batch_size, shuffle=(train_sampler is None), 290 | num_workers=args.workers, pin_memory=True, sampler=train_sampler) 291 | 292 | val_loader = torch.utils.data.DataLoader( 293 | val_dataset, batch_size=args.batch_size, shuffle=False, 294 | num_workers=args.workers, pin_memory=True) 295 | 296 | if args.evaluate: 297 | validate(val_loader, model, criterion, args) 298 | return 299 | 300 | # Create csv file or open if it already exists 301 | pathlib.Path('results/'+args.arch).mkdir(parents=True, exist_ok=True) 302 | with open ('results/'+args.arch+'/accuracies.csv','a', newline='') as my_file: 303 | csv_writter = csv.writer(my_file, delimiter=';', quotechar='"', quoting=csv.QUOTE_ALL, lineterminator='\n') 304 | 305 | 306 | for epoch in range(args.start_epoch, args.epochs): 307 | if args.distributed: 308 | train_sampler.set_epoch(epoch) 309 | adjust_learning_rate(optimizer, epoch, args) 310 | 311 | # train for one epoch 312 | train(train_loader, model, criterion, optimizer, epoch, args) 313 | 314 | # evaluate on validation set 315 | loss,acc1,acc5 = 
validate(val_loader, model, criterion, args) 316 | 317 | csv_writter.writerow([str(epoch),str(loss),str(acc1.item()),str(acc5.item())]) 318 | 319 | # remember best acc@1 and save checkpoint 320 | is_best = acc1 > best_acc1 321 | best_acc1 = max(acc1, best_acc1) 322 | 323 | pathlib.Path('saved_weights/'+args.arch).mkdir(parents=True, exist_ok=True) 324 | 325 | save_checkpoint({ 326 | 'epoch': epoch + 1, 327 | 'arch': args.arch, 328 | 'state_dict': model.state_dict(), 329 | 'best_acc1': best_acc1, 330 | 'optimizer' : optimizer.state_dict(), 331 | }, is_best, 332 | filename='saved_weights/'+args.arch+'/checkpoint_'+str(epoch+1)+'.pth') 333 | 334 | 335 | def train(train_loader, model, criterion, optimizer, epoch, args): 336 | batch_time = AverageMeter('Time', ':6.3f') 337 | data_time = AverageMeter('Data', ':6.3f') 338 | losses = AverageMeter('Loss', ':.4e') 339 | top1 = AverageMeter('Acc@1', ':6.2f') 340 | top5 = AverageMeter('Acc@5', ':6.2f') 341 | progress = ProgressMeter( 342 | len(train_loader), 343 | [batch_time, data_time, losses, top1, top5], 344 | prefix="Epoch: [{}]".format(epoch)) 345 | 346 | # switch to train mode 347 | model.train() 348 | 349 | end = time.time() 350 | for i, (images, target) in enumerate(train_loader): 351 | # measure data loading time 352 | data_time.update(time.time() - end) 353 | 354 | if args.gpu is not None: 355 | images = images.cuda(args.gpu, non_blocking=True) 356 | if torch.cuda.is_available(): 357 | target = target.cuda(args.gpu, non_blocking=True) 358 | 359 | # compute output 360 | output = model(images) 361 | loss = criterion(output, target) 362 | 363 | # measure accuracy and record loss 364 | acc1, acc5 = accuracy(output, target, topk=(1, 5)) 365 | losses.update(loss.item(), images.size(0)) 366 | top1.update(acc1[0], images.size(0)) 367 | top5.update(acc5[0], images.size(0)) 368 | 369 | # compute gradient and do SGD step 370 | optimizer.zero_grad() 371 | loss.backward() 372 | optimizer.step() 373 | 374 | # measure elapsed time 
375 | batch_time.update(time.time() - end) 376 | end = time.time() 377 | 378 | if i % args.print_freq == 0: 379 | progress.display(i) 380 | 381 | 382 | def validate(val_loader, model, criterion, args): 383 | batch_time = AverageMeter('Time', ':6.3f') 384 | losses = AverageMeter('Loss', ':.4e') 385 | top1 = AverageMeter('Acc@1', ':6.2f') 386 | top5 = AverageMeter('Acc@5', ':6.2f') 387 | progress = ProgressMeter( 388 | len(val_loader), 389 | [batch_time, losses, top1, top5], 390 | prefix='Test: ') 391 | 392 | # switch to evaluate mode 393 | model.eval() 394 | 395 | with torch.no_grad(): 396 | end = time.time() 397 | for i, (images, target) in enumerate(val_loader): 398 | if args.gpu is not None: 399 | images = images.cuda(args.gpu, non_blocking=True) 400 | if torch.cuda.is_available(): 401 | target = target.cuda(args.gpu, non_blocking=True) 402 | 403 | # compute output 404 | output = model(images) 405 | loss = criterion(output, target) 406 | 407 | # measure accuracy and record loss 408 | acc1, acc5 = accuracy(output, target, topk=(1, 5)) 409 | losses.update(loss.item(), images.size(0)) 410 | top1.update(acc1[0], images.size(0)) 411 | top5.update(acc5[0], images.size(0)) 412 | 413 | # measure elapsed time 414 | batch_time.update(time.time() - end) 415 | end = time.time() 416 | 417 | if i % args.print_freq == 0: 418 | progress.display(i) 419 | 420 | # TODO: this should also be done with the ProgressMeter 421 | print(' * Acc@1 {top1.avg:.3f} Acc@5 {top5.avg:.3f}' 422 | .format(top1=top1, top5=top5)) 423 | 424 | return (losses.avg,top1.avg,top5.avg) 425 | 426 | 427 | def save_checkpoint(state, is_best, filename='saved_weights/checkpoint.pth'): 428 | torch.save(state, filename) 429 | if is_best: 430 | shutil.copyfile(filename, filename.split('checkpoint')[0]+'model_best.pth') 431 | 432 | 433 | class AverageMeter(object): 434 | """Computes and stores the average and current value""" 435 | def __init__(self, name, fmt=':f'): 436 | self.name = name 437 | self.fmt = fmt 438 | 
self.reset() 439 | 440 | def reset(self): 441 | self.val = 0 442 | self.avg = 0 443 | self.sum = 0 444 | self.count = 0 445 | 446 | def update(self, val, n=1): 447 | self.val = val 448 | self.sum += val * n 449 | self.count += n 450 | self.avg = self.sum / self.count 451 | 452 | def __str__(self): 453 | fmtstr = '{name} {val' + self.fmt + '} ({avg' + self.fmt + '})' 454 | return fmtstr.format(**self.__dict__) 455 | 456 | 457 | class ProgressMeter(object): 458 | def __init__(self, num_batches, meters, prefix=""): 459 | self.batch_fmtstr = self._get_batch_fmtstr(num_batches) 460 | self.meters = meters 461 | self.prefix = prefix 462 | 463 | def display(self, batch): 464 | entries = [self.prefix + self.batch_fmtstr.format(batch)] 465 | entries += [str(meter) for meter in self.meters] 466 | print('\t'.join(entries)) 467 | 468 | def _get_batch_fmtstr(self, num_batches): 469 | num_digits = len(str(num_batches // 1)) 470 | fmt = '{:' + str(num_digits) + 'd}' 471 | return '[' + fmt + '/' + fmt.format(num_batches) + ']' 472 | 473 | 474 | def adjust_learning_rate(optimizer, epoch, args): 475 | """Sets the learning rate to the initial LR decayed by 10 every 30 epochs""" 476 | lr = args.lr * (0.1 ** (epoch // 30)) 477 | for param_group in optimizer.param_groups: 478 | param_group['lr'] = lr 479 | 480 | 481 | def accuracy(output, target, topk=(1,)): 482 | """Computes the accuracy over the k top predictions for the specified values of k""" 483 | with torch.no_grad(): 484 | maxk = max(topk) 485 | batch_size = target.size(0) 486 | 487 | _, pred = output.topk(maxk, 1, True, True) 488 | pred = pred.t() 489 | correct = pred.eq(target.view(1, -1).expand_as(pred)) 490 | 491 | res = [] 492 | for k in topk: 493 | correct_k = correct[:k].view(-1).float().sum(0, keepdim=True) 494 | res.append(correct_k.mul_(100.0 / batch_size)) 495 | return res 496 | 497 | 498 | if __name__ == '__main__': 499 | main() 500 | -------------------------------------------------------------------------------- 
/pytorch/._setup.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/alexandrosstergiou/SoftPool/e11dee7e96ecad895cf871c8cbf220f7908462ed/pytorch/._setup.py
--------------------------------------------------------------------------------
/pytorch/CUDA/limits.cuh:
--------------------------------------------------------------------------------
1 | #pragma once
2 |
3 | #include
4 | #include
5 | #include
6 | #include
7 |
8 | // NumericLimits.cuh is a holder for numeric limits definitions of commonly used
9 | // types. This header is very specific to ROCm HIP and may be removed in the future.
10 | // This header is derived from the legacy THCNumerics.cuh.
11 |
12 | // The lower_bound and upper_bound constants are same as lowest and max for
13 | // integral types, but are -inf and +inf for floating point types. They are
14 | // useful in implementing min, max, etc.
15 |
16 | namespace at {
17 |
18 | template <typename T>
19 | struct n_limits {
20 | };
21 |
22 | // WARNING: the following at::numeric_limits definitions are there only to support
23 | // HIP compilation for the moment. Use std::numeric_limits if you are not
24 | // compiling for ROCm.
25 | // from @colesbury: "The functions on numeric_limits aren't marked with
26 | // __device__ which is why they don't work with ROCm. CUDA allows them
27 | // because they're constexpr."
28 |
29 | //namespace {
30 | // ROCm doesn't like INFINITY too.
31 | //constexpr double inf = INFINITY;
32 | //}
33 |
34 | template <>
35 | struct n_limits<bool> {
36 |   static inline __host__ __device__ bool lowest() { return false; }
37 |   static inline __host__ __device__ bool min() { return false; }
38 |   static inline __host__ __device__ bool max() { return true; }
39 |   static inline __host__ __device__ bool lower_bound() { return false; }
40 |   static inline __host__ __device__ bool upper_bound() { return true; }
41 | };
42 |
43 | template <>
44 | struct n_limits<uint8_t> {
45 |   static inline __host__ __device__ uint8_t lowest() { return 0; }
46 |   static inline __host__ __device__ uint8_t min() { return 0; }
47 |   static inline __host__ __device__ uint8_t max() { return UINT8_MAX; }
48 |   static inline __host__ __device__ uint8_t lower_bound() { return 0; }
49 |   static inline __host__ __device__ uint8_t upper_bound() { return UINT8_MAX; }
50 | };
51 |
52 | template <>
53 | struct n_limits<int8_t> {
54 |   static inline __host__ __device__ int8_t lowest() { return INT8_MIN; }
55 |   static inline __host__ __device__ int8_t min() { return INT8_MIN; }
56 |   static inline __host__ __device__ int8_t max() { return INT8_MAX; }
57 |   static inline __host__ __device__ int8_t lower_bound() { return INT8_MIN; }
58 |   static inline __host__ __device__ int8_t upper_bound() { return INT8_MAX; }
59 | };
60 |
61 | template <>
62 | struct n_limits<int16_t> {
63 |   static inline __host__ __device__ int16_t lowest() { return INT16_MIN; }
64 |   static inline __host__ __device__ int16_t min() { return INT16_MIN; }
65 |   static inline __host__ __device__ int16_t max() { return INT16_MAX; }
66 |   static inline __host__ __device__ int16_t lower_bound() { return INT16_MIN; }
67 |   static inline __host__ __device__ int16_t upper_bound() { return INT16_MAX; }
68 | };
69 |
70 | template <>
71 | struct n_limits<int32_t> {
72 |   static inline __host__ __device__ int32_t lowest() { return INT32_MIN; }
73 |   static inline __host__ __device__ int32_t min() { return INT32_MIN; }
74 |   static inline __host__
__device__ int32_t max() { return INT32_MAX; }
75 |   static inline __host__ __device__ int32_t lower_bound() { return INT32_MIN; }
76 |   static inline __host__ __device__ int32_t upper_bound() { return INT32_MAX; }
77 | };
78 |
79 | template <>
80 | struct n_limits<int64_t> {
81 | #ifdef _MSC_VER
82 |   static inline __host__ __device__ int64_t lowest() { return _I64_MIN; }
83 |   static inline __host__ __device__ int64_t min() { return _I64_MIN; }
84 |   static inline __host__ __device__ int64_t max() { return _I64_MAX; }
85 |   static inline __host__ __device__ int64_t lower_bound() { return _I64_MIN; }
86 |   static inline __host__ __device__ int64_t upper_bound() { return _I64_MAX; }
87 | #else
88 |   static inline __host__ __device__ int64_t lowest() { return INT64_MIN; }
89 |   static inline __host__ __device__ int64_t min() { return INT64_MIN; }
90 |   static inline __host__ __device__ int64_t max() { return INT64_MAX; }
91 |   static inline __host__ __device__ int64_t lower_bound() { return INT64_MIN; }
92 |   static inline __host__ __device__ int64_t upper_bound() { return INT64_MAX; }
93 | #endif
94 | };
95 |
96 | template <>
97 | struct n_limits<at::Half> {
98 |   static inline __host__ __device__ at::Half lowest() { return at::Half(0xFBFF, at::Half::from_bits()); }
99 |   static inline __host__ __device__ at::Half min() { return at::Half(0x0400, at::Half::from_bits()); }
100 |   static inline __host__ __device__ at::Half max() { return at::Half(0x7BFF, at::Half::from_bits()); }
101 |   static inline __host__ __device__ at::Half lower_bound() { return at::Half(0xFC00, at::Half::from_bits()); }
102 |   static inline __host__ __device__ at::Half upper_bound() { return at::Half(0x7C00, at::Half::from_bits()); }
103 | };
104 |
105 | template <>
106 | struct n_limits<at::BFloat16> {
107 |   static inline __host__ __device__ at::BFloat16 lowest() { return at::BFloat16(0xFF7F, at::BFloat16::from_bits()); }
108 |   static inline __host__ __device__ at::BFloat16 min() { return at::BFloat16(0x0080, at::BFloat16::from_bits()); }
109
|   static inline __host__ __device__ at::BFloat16 max() { return at::BFloat16(0x7F7F, at::BFloat16::from_bits()); }
110 |   static inline __host__ __device__ at::BFloat16 lower_bound() { return at::BFloat16(0xFF80, at::BFloat16::from_bits()); }
111 |   static inline __host__ __device__ at::BFloat16 upper_bound() { return at::BFloat16(0x7F80, at::BFloat16::from_bits()); }
112 | };
113 |
114 | template <>
115 | struct n_limits<float> {
116 |   static inline __host__ __device__ float lowest() { return -FLT_MAX; }
117 |   static inline __host__ __device__ float min() { return FLT_MIN; }
118 |   static inline __host__ __device__ float max() { return FLT_MAX; }
119 |   static inline __host__ __device__ float lower_bound() { return -std::numeric_limits<float>::infinity(); }
120 |   static inline __host__ __device__ float upper_bound() { return std::numeric_limits<float>::infinity(); }
121 | };
122 |
123 | template <>
124 | struct n_limits<double> {
125 |   static inline __host__ __device__ double lowest() { return -DBL_MAX; }
126 |   static inline __host__ __device__ double min() { return DBL_MIN; }
127 |   static inline __host__ __device__ double max() { return DBL_MAX; }
128 |   static inline __host__ __device__ double lower_bound() { return -std::numeric_limits<double>::infinity(); }
129 |   static inline __host__ __device__ double upper_bound() { return std::numeric_limits<double>::infinity(); }
130 | };
131 |
132 | } // namespace at
133 |
--------------------------------------------------------------------------------
/pytorch/CUDA/softpool_cuda.cpp:
--------------------------------------------------------------------------------
1 | #include
2 | #include
3 |
4 | // CUDA forward declarations
5 |
6 | int SoftPool1dForwardLauncher(const at::Tensor input, const int batches,
7 |                               const int channels, const int dim,
8 |                               const int kernel_d, const int stride_d,
9 |                               at::Tensor output);
10 |
11 | int SoftPool1dBackwardLauncher(const at::Tensor output_grad, const at::Tensor input,
12 |                                const int batches, const int channels,
13 |                                const int dim,
const int kernel_d,
14 |                                const int stride_d, at::Tensor input_grad);
15 |
16 | int SoftPool2dForwardLauncher(const at::Tensor input, const int batches,
17 |                               const int channels, const int height,
18 |                               const int width, const int kernel_h,
19 |                               const int kernel_w, const int stride_h,
20 |                               const int stride_w, at::Tensor output);
21 |
22 | int SoftPool2dBackwardLauncher(const at::Tensor output_grad, const at::Tensor input,
23 |                                const int batches, const int channels,
24 |                                const int height, const int width,
25 |                                const int kernel_h, const int kernel_w,
26 |                                const int stride_h, const int stride_w,
27 |                                at::Tensor input_grad);
28 |
29 | int SoftPool3dForwardLauncher(const at::Tensor input, const int batches,
30 |                               const int channels, const int depth,
31 |                               const int height, const int width,
32 |                               const int kernel_d, const int kernel_h,
33 |                               const int kernel_w, const int stride_d,
34 |                               const int stride_h, const int stride_w,
35 |                               at::Tensor output);
36 |
37 | int SoftPool3dBackwardLauncher(const at::Tensor output_grad, const at::Tensor input,
38 |                                const int batches, const int channels,
39 |                                const int depth, const int height,
40 |                                const int width, const int kernel_d,
41 |                                const int kernel_h, const int kernel_w,
42 |                                const int stride_d, const int stride_h,
43 |                                const int stride_w, at::Tensor input_grad);
44 |
45 | // C++ interface
46 |
47 | #define CHECK_CUDA(x) AT_ASSERT(x.is_cuda(), #x " must be a CUDA tensor")
48 | #define CHECK_CONTIGUOUS(x) AT_ASSERT(x.is_contiguous(), #x " must be a contiguous tensor");
49 | #define CHECK_INPUT(x) CHECK_CUDA(x); CHECK_CONTIGUOUS(x);
50 |
51 | int softpool1d_forward_cuda(at::Tensor input, const std::tuple<int> kernel,
52 |                             const std::tuple<int> stride, at::Tensor output) {
53 |     CHECK_INPUT(input);
54 |     CHECK_INPUT(output);
55 |
56 |     int batches = input.size(0);
57 |     int channels = input.size(1);
58 |     int dim = input.size(2);
59 |
60 |     int kernel_d = std::get<0>(kernel);
61 |     int stride_d = std::get<0>(stride);
62 |
63 |     SoftPool1dForwardLauncher(input,
batches,
64 |                               channels, dim,
65 |                               kernel_d, stride_d,
66 |                               output);
67 |     return 1;
68 | }
69 |
70 | int softpool1d_backward_cuda(const at::Tensor output_grad, const at::Tensor input,
71 |                              const std::tuple<int> kernel, const std::tuple<int> stride,
72 |                              at::Tensor input_grad) {
73 |     CHECK_INPUT(output_grad);
74 |     CHECK_INPUT(input);
75 |     CHECK_INPUT(input_grad);
76 |
77 |
78 |     int batches = input_grad.size(0);
79 |     int channels = input_grad.size(1);
80 |     int dim = input_grad.size(2);
81 |
82 |     int kernel_d = std::get<0>(kernel);
83 |     int stride_d = std::get<0>(stride);
84 |
85 |     SoftPool1dBackwardLauncher(output_grad, input,
86 |                                batches, channels,
87 |                                dim, kernel_d,
88 |                                stride_d, input_grad);
89 |     return 1;
90 | }
91 |
92 |
93 | int softpool2d_forward_cuda(at::Tensor input, const std::tuple<int, int> kernel,
94 |                             const std::tuple<int, int> stride, at::Tensor output) {
95 |     CHECK_INPUT(input);
96 |     CHECK_INPUT(output);
97 |
98 |     int batches = input.size(0);
99 |     int channels = input.size(1);
100 |     int height = input.size(2);
101 |     int width = input.size(3);
102 |
103 |     int kernel_h = std::get<0>(kernel);
104 |     int kernel_w = std::get<1>(kernel);
105 |     int stride_h = std::get<0>(stride);
106 |     int stride_w = std::get<1>(stride);
107 |
108 |     SoftPool2dForwardLauncher(input, batches,
109 |                               channels, height,
110 |                               width, kernel_h,
111 |                               kernel_w, stride_h,
112 |                               stride_w, output);
113 |     return 1;
114 | }
115 |
116 | int softpool2d_backward_cuda(const at::Tensor output_grad, const at::Tensor input,
117 |                              const std::tuple<int, int> kernel, const std::tuple<int, int> stride,
118 |                              at::Tensor input_grad) {
119 |     CHECK_INPUT(output_grad);
120 |     CHECK_INPUT(input);
121 |     CHECK_INPUT(input_grad);
122 |
123 |
124 |     int batches = input_grad.size(0);
125 |     int channels = input_grad.size(1);
126 |     int height = input_grad.size(2);
127 |     int width = input_grad.size(3);
128 |
129 |     int kernel_h = std::get<0>(kernel);
130 |     int kernel_w = std::get<1>(kernel);
131 |     int stride_h = std::get<0>(stride);
132 |     int stride_w =
std::get<1>(stride);
133 |
134 |     SoftPool2dBackwardLauncher(output_grad, input,
135 |                                batches, channels,
136 |                                height, width,
137 |                                kernel_h, kernel_w,
138 |                                stride_h, stride_w,
139 |                                input_grad);
140 |     return 1;
141 | }
142 |
143 |
144 | int softpool3d_forward_cuda(at::Tensor input, const std::tuple<int, int, int> kernel,
145 |                             const std::tuple<int, int, int> stride, at::Tensor output) {
146 |     CHECK_INPUT(input);
147 |     CHECK_INPUT(output);
148 |
149 |     int batches = input.size(0);
150 |     int channels = input.size(1);
151 |     int depth = input.size(2);
152 |     int height = input.size(3);
153 |     int width = input.size(4);
154 |
155 |     int kernel_d = std::get<0>(kernel);
156 |     int kernel_h = std::get<1>(kernel);
157 |     int kernel_w = std::get<2>(kernel);
158 |     int stride_d = std::get<0>(stride);
159 |     int stride_h = std::get<1>(stride);
160 |     int stride_w = std::get<2>(stride);
161 |
162 |     SoftPool3dForwardLauncher(input, batches,
163 |                               channels, depth,
164 |                               height, width,
165 |                               kernel_d, kernel_h,
166 |                               kernel_w, stride_d,
167 |                               stride_h, stride_w,
168 |                               output);
169 |     return 1;
170 | }
171 |
172 | int softpool3d_backward_cuda(const at::Tensor output_grad, const at::Tensor input,
173 |                              const std::tuple<int, int, int> kernel, const std::tuple<int, int, int> stride,
174 |                              at::Tensor input_grad) {
175 |     CHECK_INPUT(output_grad);
176 |     CHECK_INPUT(input);
177 |     CHECK_INPUT(input_grad);
178 |
179 |
180 |     int batches = input_grad.size(0);
181 |     int channels = input_grad.size(1);
182 |     int depth = input_grad.size(2);
183 |     int height = input_grad.size(3);
184 |     int width = input_grad.size(4);
185 |
186 |     int kernel_d = std::get<0>(kernel);
187 |     int kernel_h = std::get<1>(kernel);
188 |     int kernel_w = std::get<2>(kernel);
189 |     int stride_d = std::get<0>(stride);
190 |     int stride_h = std::get<1>(stride);
191 |     int stride_w = std::get<2>(stride);
192 |
193 |     SoftPool3dBackwardLauncher(output_grad, input,
194 |                                batches, channels,
195 |                                depth, height,
196 |                                width, kernel_d,
197 |                                kernel_h, kernel_w,
198 |                                stride_d, stride_h,
199 |                                stride_w,
input_grad); 200 | return 1; 201 | } 202 | 203 | PYBIND11_MODULE(TORCH_EXTENSION_NAME, m) { 204 | m.def("forward_1d", &softpool1d_forward_cuda, "SoftPool1d forward (CUDA)"); 205 | m.def("backward_1d", &softpool1d_backward_cuda, "SoftPool1d backward (CUDA)"); 206 | m.def("forward_2d", &softpool2d_forward_cuda, "SoftPool2d forward (CUDA)"); 207 | m.def("backward_2d", &softpool2d_backward_cuda, "SoftPool2d backward (CUDA)"); 208 | m.def("forward_3d", &softpool3d_forward_cuda, "SoftPool3d forward (CUDA)"); 209 | m.def("backward_3d", &softpool3d_backward_cuda, "SoftPool3d backward (CUDA)"); 210 | } 211 | -------------------------------------------------------------------------------- /pytorch/CUDA/softpool_cuda_kernel.cu: -------------------------------------------------------------------------------- 1 | #include 2 | #include 3 | #include 4 | 5 | #include "limits.cuh" 6 | 7 | using namespace at; // fix for pytorch<=0.4.1 8 | 9 | #define CUDA_1D_KERNEL_LOOP(i, n) \ 10 | for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n; \ 11 | i += blockDim.x * gridDim.x) 12 | 13 | #define THREADS_PER_BLOCK 1024 14 | 15 | inline int GET_BLOCKS(const int N) { 16 | int optimal_block_num = (N + THREADS_PER_BLOCK - 1) / THREADS_PER_BLOCK; 17 | int max_block_num = 65000; 18 | return min(optimal_block_num, max_block_num); 19 | } 20 | 21 | //type-safe sign 22 | template 23 | __device__ scalar_t sgn(scalar_t val) { 24 | return (scalar_t(0) < val) - (val < scalar_t(0)); 25 | } 26 | 27 | // Overflow and Underflow clamp 28 | template 29 | __device__ scalar_t clamp(const scalar_t n, const scalar_t lower, const scalar_t upper) { 30 | const scalar_t tmp = abs(n); 31 | const scalar_t result = max(lower, min(tmp, upper)); 32 | return result * sgn(n); 33 | } 34 | 35 | 36 | template 37 | __global__ void SoftPool1dForward(const int nthreads, 38 | const scalar_t *bottom_input, const int batches, 39 | const int channels, const int dim, 40 | const int kernel_d, const int stride_d, 41 | scalar_t 
*output_data){ 42 | int pooled_dim = dim/stride_d; 43 | // Run in parallel for each cell within each kernel region 44 | CUDA_1D_KERNEL_LOOP(index, nthreads) { 45 | int pd = index % pooled_dim;// index of each kernel operation in relation to the position in the input 46 | int c = (index / pooled_dim) % channels; 47 | int n = index / pooled_dim / channels; 48 | 49 | const int offset = (n * channels + c) * dim; // initial offset 50 | const scalar_t *offset_bottom_input = bottom_input + offset; 51 | 52 | const int base_d = pd*stride_d; // start cell index for each kernel 53 | if (base_d > dim - kernel_d)break; // limit iterations based on the position of the final kernel application over the input 54 | 55 | // --- Initialisations happen here ---- 56 | scalar_t mask_sum_max = 0.; 57 | 58 | output_data[index] = 0.; 59 | const scalar_t upper = n_limits<scalar_t>::max(); 60 | const scalar_t lower = n_limits<scalar_t>::min(); 61 | const scalar_t zero = 0.; 62 | 63 | // Iterate over input cells within each kernel region in the input 64 | for(int id=0; id<kernel_d; id++){ const int d_offset = base_d + id; if(d_offset >= dim || d_offset < 0)continue;// check if the offset index is valid (not larger than or equal to the size of the dimension) OR smaller than 0 (for fool proofing) 68 | const int offset = d_offset; 69 | 70 | // Use this for verbose when debugging 71 | //printf("(pd: %d), base_d: %d, id: %d, d_offset: %d \n", pd, base_d, id, d_offset); 72 | 73 | mask_sum_max += exp(offset_bottom_input[offset]); 74 | 75 | } 76 | // Overflow check 77 | mask_sum_max = clamp(mask_sum_max, lower, upper); 78 | 79 | for(int id=0; id<kernel_d; id++){ const int d_offset = base_d + id; if(d_offset >= dim || d_offset < 0)continue; 83 | const int offset = d_offset; 84 | 85 | scalar_t mask_ = exp(offset_bottom_input[offset])/ mask_sum_max;// SoftMax 86 | 87 | output_data[index] += offset_bottom_input[offset] * mask_; 88 | output_data[index] = clamp(output_data[index], zero, upper); 89 | } 90 | } 91 | } 92 | 93 | 94 | template <typename scalar_t> 95 | __global__ void SoftPool2dForward(const int nthreads, 96 | const scalar_t *bottom_input, const int batches, 97 
| const int channels, const int height, 98 | const int width, const int kernel_h, 99 | const int kernel_w, const int stride_h, 100 | const int stride_w, scalar_t *output_data){ 101 | int pooled_height = height/stride_h; 102 | int pooled_width = width/stride_w; 103 | // Run in parallel for each cell within each kernel region 104 | CUDA_1D_KERNEL_LOOP(index, nthreads) { 105 | int pw = index % pooled_width; // index over width of each kernel operation in relation to the position in the input 106 | int ph = (index / pooled_width) % pooled_height; // index over height of each kernel operation in relation to the position in the input 107 | int c = (index / pooled_width / pooled_height) % channels; 108 | int n = index / pooled_width / pooled_height / channels; 109 | 110 | const int offset = (n * channels + c) * height * width; // initial offset 111 | const scalar_t *offset_bottom_input = bottom_input + offset; 112 | 113 | const int base_y = ph*stride_h;// start cell index over height/y for each kernel 114 | if (base_y > height - kernel_h)break; // limit height/y iterations for the index of the final kernel location in the input 115 | 116 | const int base_x = pw*stride_w; // start cell index over width/x for each kernel 117 | if (base_x > width - kernel_w)break; // limit width/x iterations for the index of the final kernel location in the input 118 | 119 | // --- Initialisations happen here ---- 120 | scalar_t mask_sum_max = 0.; 121 | 122 | output_data[index] = 0.; 123 | const scalar_t upper = n_limits<scalar_t>::max(); 124 | const scalar_t lower = n_limits<scalar_t>::min(); 125 | const scalar_t zero = 0.; 126 | 127 | // Iterate over input cells within each kernel region in the input 128 | for(int iy=0; iy<kernel_h; iy++){ const int y_offset = base_y + iy; if(y_offset >= height || y_offset < 0)continue; // check if the offset index over y is valid (not larger than or equal to the size of the dimension) OR smaller than 0 (for fool proofing) 132 | 133 | for(int ix=0; ix<kernel_w; ix++){ const int x_offset = base_x + ix; if(x_offset >= width || x_offset < 0)continue; // check if the offset index over x is valid (not 
larger than or equal to the size of the dimension) OR smaller than 0 (for fool proofing) 137 | 138 | const int offset = y_offset*width + x_offset; 139 | 140 | // Use this for verbose when debugging 141 | // printf("(ph: %d, pw: %d), base_y: %d, base_x: %d, iy: %d, ix: %d offset: %d \n", ph, pw, base_y, base_x, iy, ix, offset) 142 | 143 | mask_sum_max += exp(offset_bottom_input[offset]); 144 | 145 | } 146 | } 147 | // Overflow check 148 | mask_sum_max = clamp(mask_sum_max, lower, upper); 149 | 150 | 151 | for(int iy=0; iy<kernel_h; iy++){ const int y_offset = base_y + iy; if(y_offset >= height || y_offset < 0)continue; 155 | 156 | for(int ix=0; ix<kernel_w; ix++){ const int x_offset = base_x + ix; if(x_offset >= width || x_offset < 0)continue; 160 | const int offset = y_offset*width + x_offset; // x+y adjusted offset 161 | 162 | scalar_t mask_ = exp(offset_bottom_input[offset])/ mask_sum_max; // SoftMax 163 | 164 | output_data[index] += offset_bottom_input[offset] * mask_; 165 | output_data[index] = clamp(output_data[index], zero, upper); 166 | } 167 | } 168 | } 169 | } 170 | 171 | 172 | template <typename scalar_t> 173 | __global__ void SoftPool3dForward(const int nthreads, 174 | const scalar_t *bottom_input, const int batches, 175 | const int channels, const int depth, 176 | const int height, const int width, 177 | const int kernel_d, const int kernel_h, 178 | const int kernel_w, const int stride_d, 179 | const int stride_h, const int stride_w, 180 | scalar_t *output_data){ 181 | int pooled_depth = depth/stride_d; 182 | int pooled_height = height/stride_h; 183 | int pooled_width = width/stride_w; 184 | CUDA_1D_KERNEL_LOOP(index, nthreads) { 185 | int pw = index % pooled_width; 186 | int ph = (index / pooled_width) % pooled_height; 187 | int pd = (index / pooled_width / pooled_height) % pooled_depth; 188 | int c = (index / pooled_width / pooled_height / pooled_depth) % channels; 189 | int n = index / pooled_width / pooled_height / pooled_depth / channels; 190 | 191 | const int offset = (n * channels + c) * depth * height * width; 192 | const scalar_t *offset_bottom_input = bottom_input + offset; 193 | 194 | 
scalar_t mask_sum = 0.; 195 | output_data[index] = 0.; 196 | const scalar_t upper = n_limits<scalar_t>::max(); 197 | const scalar_t lower = n_limits<scalar_t>::min(); 198 | const scalar_t zero = 0.; 199 | 200 | for(int id=0; id<kernel_d; id++){ 201 | const int d_offset = pd*stride_d + id; 202 | if(d_offset >= depth || d_offset < 0)continue; 203 | for(int iy=0; iy<kernel_h; iy++){ 204 | const int y_offset = ph*stride_h + iy; 205 | if(y_offset >= height || y_offset < 0)continue; 206 | for(int ix=0; ix<kernel_w; ix++){ 207 | const int x_offset = pw*stride_w + ix; 208 | if(x_offset >= width || x_offset < 0)continue; 209 | const int offset = d_offset*height*width + y_offset*width + x_offset; // flat index into the [depth, height, width] volume 210 | 211 | // (Over/Under)flow check (A.) 0 <= e^{inp[offset]} <= FLT_MAX 212 | scalar_t mask = exp(offset_bottom_input[offset]); 213 | mask = clamp(mask, zero, upper); 214 | mask_sum += mask; 215 | } 216 | } 217 | } 218 | // Overflow check (B.) FLT_MIN <= sum{e^{inp[offset]}} <= FLT_MAX 219 | mask_sum = clamp(mask_sum, lower, upper); 220 | 221 | for(int id=0; id<kernel_d; id++){ 222 | const int d_offset = pd*stride_d + id; 223 | if(d_offset >= depth || d_offset < 0)continue; 224 | for(int iy=0; iy<kernel_h; iy++){ 225 | const int y_offset = ph*stride_h + iy; 226 | if(y_offset >= height || y_offset < 0)continue; 227 | for(int ix=0; ix<kernel_w; ix++){ 228 | const int x_offset = pw*stride_w + ix; 229 | if(x_offset >= width || x_offset < 0)continue; 230 | const int offset = d_offset*height*width + y_offset*width + x_offset; // flat index into the [depth, height, width] volume 231 | 232 | // (Over/Under)flow check (C.) 0 <= e^{inp[offset]} <= FLT_MAX 233 | scalar_t mask = exp(offset_bottom_input[offset]); 234 | mask = clamp(mask, zero, upper); 235 | 236 | // Underflow check (D.) 0 <= e^{inp[offset]}/sum{e^{inp[offset]}} <= 1 237 | mask /= mask_sum; 238 | mask = clamp(mask, zero, upper); 239 | 240 | // Underflow check (E.) 0 <= (e^{inp[offset]}/sum{e^{inp[offset]}}) * inp[offset] <= FLT_MAX 241 | scalar_t weighted_inp = offset_bottom_input[offset] * mask; 242 | weighted_inp = clamp(weighted_inp, zero, upper); 243 | 244 | // Overflow check (F.) 
0 <= sum[(e^{inp[offset]}/sum{e^{inp[offset]}}) * inp[offset]] <= FLT_MAX 245 | output_data[index] += weighted_inp; 246 | output_data[index] = clamp(output_data[index], zero, upper); 247 | } 248 | } 249 | } 250 | } 251 | } 252 | 253 | 254 | int SoftPool1dForwardLauncher(const at::Tensor input, const int batches, 255 | const int channels, const int dim, 256 | const int kernel_d, const int stride_d, 257 | at::Tensor output){ 258 | const int output_size = batches * dim/stride_d * channels; 259 | AT_DISPATCH_FLOATING_TYPES_AND_HALF( 260 | input.scalar_type(), "SoftPool1dLauncherForward", ([&] { 261 | const scalar_t *bottom_input = input.data_ptr(); 262 | scalar_t *output_data = output.data_ptr(); 263 | 264 | SoftPool1dForward 265 | <<>>( 266 | output_size, bottom_input, 267 | batches, channels, 268 | dim, kernel_d, 269 | stride_d, output_data); 270 | }) 271 | ); 272 | 273 | cudaError_t err = cudaGetLastError(); 274 | if (cudaSuccess != err) { 275 | fprintf(stderr, "cudaCheckError() failed : %s\n", cudaGetErrorString(err)); 276 | exit(-1); 277 | } 278 | return 1; 279 | } 280 | 281 | int SoftPool2dForwardLauncher(const at::Tensor input, const int batches, 282 | const int channels, const int height, 283 | const int width, const int kernel_h, 284 | const int kernel_w, const int stride_h, 285 | const int stride_w, at::Tensor output){ 286 | const int output_size = batches * height/stride_h * width/stride_w * channels; 287 | AT_DISPATCH_FLOATING_TYPES_AND_HALF( 288 | input.scalar_type(), "SoftPool2dLauncherForward", ([&] { 289 | const scalar_t *bottom_input = input.data_ptr(); 290 | scalar_t *output_data = output.data_ptr(); 291 | 292 | SoftPool2dForward 293 | <<>>( 294 | output_size, bottom_input, 295 | batches, channels, 296 | height, width, 297 | kernel_h, kernel_w, 298 | stride_h, stride_w, 299 | output_data); 300 | }) 301 | ); 302 | 303 | cudaError_t err = cudaGetLastError(); 304 | if (cudaSuccess != err) { 305 | fprintf(stderr, "cudaCheckError() failed : %s\n", 
cudaGetErrorString(err)); 306 | exit(-1); 307 | } 308 | return 1; 309 | } 310 | 311 | int SoftPool3dForwardLauncher(const at::Tensor input, const int batches, 312 | const int channels, const int depth, 313 | const int height, const int width, 314 | const int kernel_d, const int kernel_h, 315 | const int kernel_w, const int stride_d, 316 | const int stride_h, const int stride_w, 317 | at::Tensor output){ 318 | const int output_size = batches * depth/stride_d * height/stride_h * width/stride_w * channels; 319 | AT_DISPATCH_FLOATING_TYPES_AND_HALF( 320 | input.scalar_type(), "SoftPool3dLauncherForward", ([&] { 321 | const scalar_t *bottom_input = input.data_ptr(); 322 | scalar_t *output_data = output.data_ptr(); 323 | 324 | SoftPool3dForward 325 | <<>>( 326 | output_size, bottom_input, 327 | batches, channels, 328 | depth, height, 329 | width, kernel_d, 330 | kernel_h, kernel_w, 331 | stride_d, stride_h, 332 | stride_w, output_data); 333 | }) 334 | ); 335 | 336 | cudaError_t err = cudaGetLastError(); 337 | if (cudaSuccess != err) { 338 | fprintf(stderr, "cudaCheckError() failed : %s\n", cudaGetErrorString(err)); 339 | exit(-1); 340 | } 341 | return 1; 342 | } 343 | 344 | 345 | template 346 | __global__ void SoftPool1dBackward(const int nthreads, 347 | const scalar_t *diff_output, const scalar_t *data_input, 348 | const int batches, const int channels, 349 | const int dim, const int kernel_d, 350 | const int stride_d, scalar_t *diff_input){ 351 | int pooled_dim = dim/stride_d; 352 | // Run in parallel for each cell within each kernel region 353 | CUDA_1D_KERNEL_LOOP(index, nthreads) { 354 | int pd = index % pooled_dim; // index of each kernel operation in relation to the position in the input 355 | int c = (index / pooled_dim) % channels; 356 | int n = index / pooled_dim / channels; 357 | 358 | const int offset0 = (n * channels + c) * dim; // initial offset 359 | const scalar_t *offset_data_input = data_input + offset0; // offset based on the input data 360 | 361 | 
const scalar_t diff_output_index = diff_output[index]; // offset based on the output gradients 362 | scalar_t *offset_diff_input = diff_input + offset0; // offset based on the input gradients 363 | 364 | const int base_d = pd*stride_d; // start cell index for each kernel 365 | 366 | // --- Initialisations happen here ---- 367 | scalar_t mask_sum_max = 0.; 368 | const scalar_t upper = n_limits<scalar_t>::max(); 369 | const scalar_t lower = n_limits<scalar_t>::min(); 370 | 371 | // Iterate over input cells within each kernel region in the input 372 | for(int id=0; id<kernel_d; id++){ const int d_offset = base_d + id; if(d_offset >= dim || d_offset < 0)continue; // check if the offset index is valid (not larger than or equal to the size of the dimension) OR smaller than 0 (for fool proofing) 376 | const int offset = d_offset; 377 | 378 | // Use this for verbose when debugging 379 | //printf("(pd: %d), base_d: %d, id: %d, d_offset: %d \n", pd, base_d, id, d_offset); 380 | 381 | mask_sum_max += exp(offset_data_input[offset]); 382 | 383 | } 384 | // Overflow check 385 | mask_sum_max = clamp(mask_sum_max, lower, upper); 386 | 387 | for(int id=0; id<kernel_d; id++){ const int d_offset = base_d + id; if(d_offset >= dim || d_offset < 0)continue; 391 | const int offset = d_offset; 392 | 393 | scalar_t mask_ = exp(offset_data_input[offset])/mask_sum_max; // SoftMax 394 | 395 | scalar_t weighted_grad = diff_output_index * mask_; // use mask over the output gradients 396 | 397 | // Underflow check 398 | weighted_grad = clamp(weighted_grad, lower, upper); 399 | 400 | atomicAdd(offset_diff_input+offset, weighted_grad); 401 | } 402 | } 403 | } 404 | 405 | template <typename scalar_t> 406 | __global__ void SoftPool2dBackward(const int nthreads, 407 | const scalar_t *diff_output, const scalar_t *data_input, 408 | const int batches, const int channels, 409 | const int height, const int width, 410 | const int kernel_h, const int kernel_w, 411 | const int stride_h, const int stride_w, 412 | scalar_t *diff_input){ 413 | int pooled_height = height/stride_h; 414 | int pooled_width = width/stride_w; 415 | // Run in parallel for each cell within each kernel 
region 416 | CUDA_1D_KERNEL_LOOP(index, nthreads) { 417 | int pw = index % pooled_width; // index over width of each kernel operation in relation to the position in the input 418 | int ph = (index / pooled_width) % pooled_height; // index over height of each kernel operation in relation to the position in the input 419 | int c = (index / pooled_width / pooled_height) % channels; 420 | int n = index / pooled_width / pooled_height / channels; 421 | 422 | const int offset0 = (n * channels + c) * height * width; // initial offset 423 | const scalar_t *offset_data_input = data_input + offset0; // offset based on the input data 424 | 425 | const scalar_t diff_output_index = diff_output[index]; // offset based on the output gradients 426 | scalar_t *offset_diff_input = diff_input + offset0; // offset based on the input gradients 427 | 428 | const int base_y = ph * stride_h; // start cell index over height/y for each kernel 429 | if (base_y > height - kernel_h)break; // limit height/y iterations for the index of the final kernel location in the input 430 | 431 | const int base_x = pw * stride_w; // start cell index over width/x for each kernel 432 | if (base_x > width - kernel_w)break; // limit width/x iterations for the index of the final kernel location in the input 433 | 434 | // --- Initialisations happen here ---- 435 | scalar_t mask_sum_max = 0.; 436 | 437 | const scalar_t upper = n_limits<scalar_t>::max(); 438 | const scalar_t lower = n_limits<scalar_t>::min(); 439 | 440 | // Iterate over input cells within each kernel region in the input 441 | for(int iy=0; iy<kernel_h; iy++){ const int y_offset = base_y + iy; if(y_offset >= height || y_offset < 0)continue; // check if the offset index over y is valid (not larger than or equal to the size of the dimension) OR smaller than 0 (for fool proofing) 445 | 446 | for(int ix=0; ix<kernel_w; ix++){ const int x_offset = base_x + ix; if(x_offset >= width || x_offset < 0)continue; // check if the offset index over x is valid (not larger than or equal to the size of the dimension) OR smaller than 0 (for fool proofing) 450 | 451 | const int offset = y_offset*width + x_offset; 
452 | 453 | // Use this for verbose when debugging 454 | // printf("(ph: %d, pw: %d), base_y: %d, base_x: %d, iy: %d, ix: %d offset: %d \n", ph, pw, base_y, base_x, iy, ix, offset) 455 | 456 | mask_sum_max += exp(offset_data_input[offset]); 457 | 458 | } 459 | } 460 | // Overflow check 461 | mask_sum_max = clamp(mask_sum_max, lower, upper); 462 | 463 | for(int iy=0; iy<kernel_h; iy++){ const int y_offset = base_y + iy; if(y_offset >= height || y_offset < 0)continue; 467 | for(int ix=0; ix<kernel_w; ix++){ const int x_offset = base_x + ix; if(x_offset >= width || x_offset < 0)continue; 471 | const int offset = y_offset*width + x_offset; // offset adjustment (x-based) 472 | 473 | scalar_t mask_ = exp(offset_data_input[offset])/mask_sum_max; // SoftMax (sum) 474 | 475 | scalar_t weighted_grad = diff_output_index * mask_; // use mask over the output gradients 476 | 477 | // Underflow check 478 | weighted_grad = clamp(weighted_grad, lower, upper); 479 | 480 | atomicAdd(offset_diff_input+offset, weighted_grad); 481 | } 482 | } 483 | } 484 | } 485 | 486 | template <typename scalar_t> 487 | __global__ void SoftPool3dBackward(const int nthreads, 488 | const scalar_t *diff_output, const scalar_t *data_input, 489 | const int batches, const int channels, 490 | const int depth, const int height, 491 | const int width, const int kernel_d, 492 | const int kernel_h, const int kernel_w , 493 | const int stride_d, const int stride_h, 494 | const int stride_w, scalar_t *diff_input){ 495 | int pooled_depth = depth/stride_d; 496 | int pooled_height = height/stride_h; 497 | int pooled_width = width/stride_w; 498 | CUDA_1D_KERNEL_LOOP(index, nthreads) { 499 | int pw = index % pooled_width; // index over width of each kernel operation in relation to the position in the input 500 | int ph = (index / pooled_width) % pooled_height; // index over height of each kernel operation in relation to the position in the input 501 | int pd = (index / pooled_width / pooled_height) % pooled_depth; // index over depth of each kernel operation in relation to the position in the input 502 | int c = (index / pooled_width / pooled_height / pooled_depth) % 
channels; 503 | int n = index / pooled_width / pooled_height / pooled_depth / channels; 504 | 505 | const int offset0 = (n * channels + c) * depth * height * width; // initial offset 506 | const scalar_t *offset_data_input = data_input + offset0; // offset based on the input data 507 | 508 | const scalar_t diff_output_index = diff_output[index]; // offset based on the output gradients 509 | scalar_t *offset_diff_input = diff_input + offset0; // offset based on the input gradients 510 | 511 | const int base_d = pd*stride_d; // start cell index over depth/d for each kernel 512 | if (base_d > depth - kernel_d)break; // limit depth/d iterations for the index of the final kernel location in the input 513 | 514 | const int base_y = ph*stride_h; // start cell index over height/y for each kernel 515 | if (base_y > height - kernel_h)break; // limit height/y iterations for the index of the final kernel location in the input 516 | 517 | const int base_x = pw*stride_w; // start cell index over width/x for each kernel 518 | if (base_x > width - kernel_w)break; // limit width/x iterations for the index of the final kernel location in the input 519 | 520 | // --- Initialisations happen here ---- 521 | scalar_t mask_sum_max = 0.; 522 | 523 | const scalar_t upper = n_limits<scalar_t>::max(); 524 | const scalar_t lower = n_limits<scalar_t>::min(); 525 | 526 | // Iterate over input cells within each kernel region in the input 527 | for(int id=0; id<kernel_d; id++){ const int d_offset = base_d + id; if(d_offset >= depth || d_offset < 0)continue; // check if the offset index over d is valid (not larger than or equal to the size of the dimension) OR smaller than 0 (for fool proofing) 531 | 532 | for(int iy=0; iy<kernel_h; iy++){ const int y_offset = base_y + iy; if(y_offset >= height || y_offset < 0)continue; // check if the offset index over y is valid (not larger than or equal to the size of the dimension) OR smaller than 0 (for fool proofing) 536 | 537 | for(int ix=0; ix<kernel_w; ix++){ const int x_offset = base_x + ix; if(x_offset >= width || x_offset < 0)continue; // check if the offset index over x is valid (not larger than or equal to the size of the dimension) OR smaller than 0 (for fool 
proofing) 541 | 542 | const int offset = d_offset*height*width + y_offset*width + x_offset; // flat index into the [depth, height, width] volume 543 | 544 | // Use this for verbose when debugging 545 | // printf("(pd: %d, ph: %d, pw: %d), base_d: %d, base_y: %d, base_x: %d, id: %d, iy: %d, ix: %d, offset: %d \n", pd, ph, pw, base_d, base_y, base_x, id, iy, ix, offset); 546 | 547 | mask_sum_max += exp(offset_data_input[offset]); 548 | 549 | } 550 | } 551 | } 552 | // Overflow check 553 | mask_sum_max = clamp(mask_sum_max, lower, upper); 554 | 555 | for(int id=0; id<kernel_d; id++){ const int d_offset = base_d + id; if(d_offset >= depth || d_offset < 0)continue; 559 | for(int iy=0; iy<kernel_h; iy++){ const int y_offset = base_y + iy; if(y_offset >= height || y_offset < 0)continue; 563 | for(int ix=0; ix<kernel_w; ix++){ const int x_offset = base_x + ix; if(x_offset >= width || x_offset < 0)continue; 567 | const int offset = d_offset*height*width + y_offset*width + x_offset; // flat index into the [depth, height, width] volume 568 | 569 | scalar_t mask_ = exp(offset_data_input[offset])/mask_sum_max; // SoftMax 570 | 571 | scalar_t weighted_grad = diff_output_index * mask_; // use mask over the output gradients 572 | 573 | // Underflow check 574 | weighted_grad = clamp(weighted_grad, lower, upper); 575 | 576 | atomicAdd(offset_diff_input+offset, weighted_grad); 577 | } 578 | } 579 | } 580 | } 581 | } 582 | 583 | int SoftPool1dBackwardLauncher(const at::Tensor output_grad, const at::Tensor input, 584 | const int batches, const int channels, 585 | const int dim, const int kernel_d, 586 | const int stride_d, at::Tensor input_grad){ 587 | 588 | const int output_size = batches * dim/stride_d * channels; 589 | 590 | AT_DISPATCH_FLOATING_TYPES_AND_HALF( 591 | input.scalar_type(), "SoftPool1dLauncherBackward", ([&] { 592 | scalar_t *diff_input = input_grad.data_ptr(); 593 | const scalar_t *diff_output = output_grad.data_ptr(); 594 | const scalar_t *data_input = input.data_ptr(); 595 | 596 | SoftPool1dBackward 597 | <<>>( 598 | output_size, diff_output, 599 | data_input, batches, 600 | channels, dim, 601 | kernel_d, stride_d, 602 | diff_input); 603 | } 604 | ) 605 | ); 606 | 607 | cudaError_t err = cudaGetLastError(); 608 | if (cudaSuccess != err) { 609 | fprintf(stderr, 
"cudaCheckError() failed : %s\n", cudaGetErrorString(err)); 610 | exit(-1); 611 | } 612 | return 1; 613 | } 614 | 615 | int SoftPool2dBackwardLauncher(const at::Tensor output_grad, const at::Tensor input, 616 | const int batches, const int channels, 617 | const int height, const int width, 618 | const int kernel_h, const int kernel_w, 619 | const int stride_h, const int stride_w, 620 | at::Tensor input_grad){ 621 | 622 | const int output_size = batches * height/stride_h * width/stride_w * channels; 623 | 624 | AT_DISPATCH_FLOATING_TYPES_AND_HALF( 625 | input.scalar_type(), "SoftPool2dLauncherBackward", ([&] { 626 | scalar_t *diff_input = input_grad.data_ptr(); 627 | const scalar_t *diff_output = output_grad.data_ptr(); 628 | const scalar_t *data_input = input.data_ptr(); 629 | 630 | SoftPool2dBackward 631 | <<>>( 632 | output_size, diff_output, 633 | data_input, batches, 634 | channels, height, 635 | width, kernel_h, 636 | kernel_w, stride_h, 637 | stride_w, diff_input); 638 | } 639 | ) 640 | ); 641 | 642 | cudaError_t err = cudaGetLastError(); 643 | if (cudaSuccess != err) { 644 | fprintf(stderr, "cudaCheckError() failed : %s\n", cudaGetErrorString(err)); 645 | exit(-1); 646 | } 647 | return 1; 648 | } 649 | 650 | int SoftPool3dBackwardLauncher(const at::Tensor output_grad, const at::Tensor input, 651 | const int batches, const int channels, 652 | const int depth, const int height, 653 | const int width, const int kernel_d, 654 | const int kernel_h, const int kernel_w, 655 | const int stride_d, const int stride_h, 656 | const int stride_w, at::Tensor input_grad){ 657 | 658 | const int output_size = batches * depth/stride_d * height/stride_h * width/stride_w * channels; 659 | 660 | AT_DISPATCH_FLOATING_TYPES_AND_HALF( 661 | input.scalar_type(), "SoftPool3dLauncherBackward", ([&] { 662 | scalar_t *diff_input = input_grad.data_ptr(); 663 | const scalar_t *diff_output = output_grad.data_ptr(); 664 | const scalar_t *data_input = input.data_ptr(); 665 | 666 | 
SoftPool3dBackward 667 | <<>>( 668 | output_size, diff_output, 669 | data_input, batches, 670 | channels, depth, height, 671 | width, kernel_d, 672 | kernel_h, kernel_w, 673 | stride_d, stride_h, 674 | stride_w, diff_input); 675 | } 676 | ) 677 | ); 678 | 679 | cudaError_t err = cudaGetLastError(); 680 | if (cudaSuccess != err) { 681 | fprintf(stderr, "cudaCheckError() failed : %s\n", cudaGetErrorString(err)); 682 | exit(-1); 683 | } 684 | return 1; 685 | } 686 | -------------------------------------------------------------------------------- /pytorch/Makefile: -------------------------------------------------------------------------------- 1 | install: clean 2 | python setup.py install 3 | 4 | clean: 5 | rm -rf *.egg-info 6 | rm -rf build dist 7 | 8 | test: 9 | cd test-files && python test.py 10 | -------------------------------------------------------------------------------- /pytorch/SoftPool/__init__.py: -------------------------------------------------------------------------------- 1 | from .idea import soft_pool1d, soft_pool2d, soft_pool3d, SoftPool1d, SoftPool2d, SoftPool3d 2 | 3 | __all__ = ['soft_pool1d', 'soft_pool2d', 'soft_pool3d', 'SoftPool1d', 'SoftPool2d', 'SoftPool3d'] 4 | -------------------------------------------------------------------------------- /pytorch/SoftPool/__pycache__/idea.cpython-38.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/alexandrosstergiou/SoftPool/e11dee7e96ecad895cf871c8cbf220f7908462ed/pytorch/SoftPool/__pycache__/idea.cpython-38.pyc -------------------------------------------------------------------------------- /pytorch/SoftPool/idea.py: -------------------------------------------------------------------------------- 1 | from torch import nn 2 | from torch.autograd import Function 3 | import torch.nn.functional as F 4 | import torch 5 | from torch.nn.modules.utils import _triple, _pair, _single 6 | 7 | import softpool_cuda 8 | 9 | 10 | class 
CUDA_SOFTPOOL1d(Function): 11 | @staticmethod 12 | @torch.cuda.amp.custom_fwd(cast_inputs=torch.float32) 13 | def forward(ctx, input, kernel=2, stride=None): 14 | # Create contiguous tensor (if tensor is not contiguous) 15 | no_batch = False 16 | if len(input.size()) == 2: 17 | no_batch = True 18 | input.unsqueeze_(0) 19 | B, C, D = input.size() 20 | kernel = _single(kernel) 21 | if stride is None: 22 | stride = kernel 23 | else: 24 | stride = _single(stride) 25 | oD = (D-kernel[0]) // stride[0] + 1 26 | output = input.new_zeros((B, C, oD)) 27 | softpool_cuda.forward_1d(input.contiguous(), kernel, stride, output) 28 | ctx.save_for_backward(input) 29 | ctx.kernel = kernel 30 | ctx.stride = stride 31 | if no_batch: 32 | return output.squeeze_(0) 33 | return output 34 | 35 | @staticmethod 36 | @torch.cuda.amp.custom_bwd 37 | def backward(ctx, grad_output): 38 | # Create contiguous tensor (if tensor is not contiguous) 39 | grad_input = torch.zeros_like(ctx.saved_tensors[0]) 40 | saved = [grad_output.contiguous()] + list(ctx.saved_tensors) + [ctx.kernel, ctx.stride] + [grad_input] 41 | softpool_cuda.backward_1d(*saved) 42 | # Gradient underflow 43 | saved[-1][torch.isnan(saved[-1])] = 0 44 | return saved[-1], None, None 45 | 46 | 47 | class CUDA_SOFTPOOL2d(Function): 48 | @staticmethod 49 | @torch.cuda.amp.custom_fwd(cast_inputs=torch.float32) 50 | def forward(ctx, input, kernel=2, stride=None): 51 | # Create contiguous tensor (if tensor is not contiguous) 52 | no_batch = False 53 | if len(input.size()) == 3: 54 | no_batch = True 55 | input.unsqueeze_(0) 56 | B, C, H, W = input.size() 57 | kernel = _pair(kernel) 58 | if stride is None: 59 | stride = kernel 60 | else: 61 | stride = _pair(stride) 62 | oH = (H - kernel[0]) // stride[0] + 1 63 | oW = (W - kernel[1]) // stride[1] + 1 64 | output = input.new_zeros((B, C, oH, oW)) 65 | softpool_cuda.forward_2d(input.contiguous(), kernel, stride, output) 66 | ctx.save_for_backward(input) 67 | ctx.kernel = kernel 68 | ctx.stride 
= stride 69 | if no_batch: 70 | return output.squeeze_(0) 71 | return output 72 | 73 | @staticmethod 74 | @torch.cuda.amp.custom_bwd 75 | def backward(ctx, grad_output): 76 | # Create contiguous tensor (if tensor is not contiguous) 77 | grad_input = torch.zeros_like(ctx.saved_tensors[0]) 78 | saved = [grad_output.contiguous()] + list(ctx.saved_tensors) + [ctx.kernel,ctx.stride] + [grad_input] 79 | softpool_cuda.backward_2d(*saved) 80 | # Gradient underflow 81 | saved[-1][torch.isnan(saved[-1])] = 0 82 | return saved[-1], None, None 83 | 84 | 85 | class CUDA_SOFTPOOL3d(Function): 86 | @staticmethod 87 | @torch.cuda.amp.custom_fwd(cast_inputs=torch.float32) 88 | def forward(ctx, input, kernel=2, stride=None): 89 | # Create contiguous tensor (if tensor is not contiguous) 90 | no_batch = False 91 | if len(input.size()) == 4: # unbatched input: (C, D, H, W) 92 | no_batch = True 93 | input.unsqueeze_(0) 94 | B, C, D, H, W = input.size() 95 | kernel = _triple(kernel) 96 | if stride is None: 97 | stride = kernel 98 | else: 99 | stride = _triple(stride) 100 | oD = (D - kernel[0]) // stride[0] + 1 101 | oH = (H - kernel[1]) // stride[1] + 1 102 | oW = (W - kernel[2]) // stride[2] + 1 103 | output = input.new_zeros((B, C, oD, oH, oW)) 104 | softpool_cuda.forward_3d(input.contiguous(), kernel, stride, output) 105 | ctx.save_for_backward(input) 106 | ctx.kernel = kernel 107 | ctx.stride = stride 108 | if no_batch: 109 | return output.squeeze_(0) 110 | return output 111 | 112 | @staticmethod 113 | @torch.cuda.amp.custom_bwd 114 | def backward(ctx, grad_output): 115 | # Create contiguous tensor (if tensor is not contiguous) 116 | grad_input = torch.zeros_like(ctx.saved_tensors[0]) 117 | saved = [grad_output.contiguous()] + list(ctx.saved_tensors) + [ctx.kernel,ctx.stride] + [grad_input] 118 | softpool_cuda.backward_3d(*saved) 119 | # Gradient underflow 120 | saved[-1][torch.isnan(saved[-1])] = 0 121 | return saved[-1], None, None 122 | 123 | 124 | 125 | ''' 126 | --- S T A R T O F F U N C T I O N S O F T _ P O 
O L 1 D --- 127 | [About] 128 | Function for downsampling based on the exponential proportion rate of pixels (soft pooling). 129 | If the tensor is on a CUDA device, the custom operation is used. Alternatively, the function uses 130 | standard (mostly) in-place PyTorch operations for speed and reduced memory consumption. 131 | It is also possible to use non-inplace operations in order to improve stability. 132 | [Args] 133 | - x: PyTorch Tensor, on either the CPU or a CUDA device. If on CUDA, the `softpool_cuda` extension is used. 134 | - kernel_size: Integer or Tuple, for the kernel size to be used for downsampling. If an `Integer` 135 | is used, a `Tuple` is created for the rest of the dimensions. Defaults to 2. 136 | - stride: Integer or Tuple, for the steps taken between kernels (i.e. strides). If `None` the 137 | strides become equal to the `kernel_size` tuple. Defaults to `None`. 138 | - force_inplace: Bool, determines if in-place operations are to be used regardless of the CUDA 139 | custom op. Mostly useful for time monitoring. Defaults to `False`. 
140 | [Returns] 141 | - PyTorch Tensor, subsampled based on the specified `kernel_size` and `stride` 142 | ''' 143 | def soft_pool1d(x, kernel_size=2, stride=None, force_inplace=False): 144 | if x.is_cuda and not force_inplace: 145 | x = CUDA_SOFTPOOL1d.apply(x, kernel_size, stride) 146 | # Replace `NaN`s if found 147 | if torch.isnan(x).any(): 148 | return torch.nan_to_num(x) 149 | return x 150 | kernel_size = _single(kernel_size) 151 | if stride is None: 152 | stride = kernel_size 153 | else: 154 | stride = _single(stride) 155 | # Get input sizes 156 | _, c, d = x.size() 157 | # Create exponential mask (should be similar to max-like pooling) 158 | e_x = torch.sum(torch.exp(x),dim=1,keepdim=True) 159 | e_x = torch.clamp(e_x , float(0), float('inf')) 160 | # Apply mask to input and pool and calculate the exponential sum 161 | # Tensor: [b x c x d] -> [b x c x d'] 162 | x = F.avg_pool1d(x.mul(e_x), kernel_size, stride=stride).mul_(sum(kernel_size)).div_(F.avg_pool1d(e_x, kernel_size, stride=stride).mul_(sum(kernel_size))) 163 | return torch.clamp(x , float(0), float('inf')) 164 | ''' 165 | --- E N D O F F U N C T I O N S O F T _ P O O L 1 D --- 166 | ''' 167 | 168 | 169 | 170 | ''' 171 | --- S T A R T O F F U N C T I O N S O F T _ P O O L 2 D --- 172 | [About] 173 | Function for downsampling based on the exponential proportion rate of pixels (soft pooling). 174 | If the tensor is on a CUDA device, the custom operation is used. Otherwise, the function uses 175 | standard (mostly) in-place PyTorch operations for speed and reduced memory consumption. 176 | It is also possible to use non-inplace operations in order to improve stability. 177 | [Args] 178 | - x: PyTorch Tensor, on either CPU or CUDA. If on CUDA, the custom CUDA extension is used. 179 | - kernel_size: Integer or Tuple, for the kernel size to be used for downsampling. If an `Integer` 180 | is used, a `Tuple` is created for the rest of the dimensions. Defaults to 2. 
181 | - stride: Integer or Tuple, for the steps taken between kernels (i.e. strides). If `None` the 182 | strides become equal to the `kernel_size` tuple. Defaults to `None`. 183 | - force_inplace: Bool, determines if in-place operations are to be used regardless of the CUDA 184 | custom op. Mostly useful for time monitoring. Defaults to `False`. 185 | [Returns] 186 | - PyTorch Tensor, subsampled based on the specified `kernel_size` and `stride` 187 | ''' 188 | def soft_pool2d(x, kernel_size=2, stride=None, force_inplace=False): 189 | if x.is_cuda and not force_inplace: 190 | x = CUDA_SOFTPOOL2d.apply(x, kernel_size, stride) 191 | # Replace `NaN`s if found 192 | if torch.isnan(x).any(): 193 | return torch.nan_to_num(x) 194 | return x 195 | kernel_size = _pair(kernel_size) 196 | if stride is None: 197 | stride = kernel_size 198 | else: 199 | stride = _pair(stride) 200 | # Get input sizes 201 | _, c, h, w = x.size() 202 | # Create exponential mask (should be similar to max-like pooling) 203 | e_x = torch.sum(torch.exp(x),dim=1,keepdim=True) 204 | e_x = torch.clamp(e_x , float(0), float('inf')) 205 | # Apply mask to input and pool and calculate the exponential sum 206 | # Tensor: [b x c x h x w] -> [b x c x h' x w'] 207 | x = F.avg_pool2d(x.mul(e_x), kernel_size, stride=stride).mul_(sum(kernel_size)).div_(F.avg_pool2d(e_x, kernel_size, stride=stride).mul_(sum(kernel_size))) 208 | return torch.clamp(x , float(0), float('inf')) 209 | ''' 210 | --- E N D O F F U N C T I O N S O F T _ P O O L 2 D --- 211 | ''' 212 | 213 | 214 | 215 | ''' 216 | --- S T A R T O F F U N C T I O N S O F T _ P O O L 3 D --- 217 | [About] 218 | Function for downsampling based on the exponential proportion rate of pixels (soft pooling). 219 | If the tensor is on a CUDA device, the custom operation is used. Otherwise, the function uses 220 | standard (mostly) in-place PyTorch operations for speed and reduced memory consumption. 221 | It is also possible to use non-inplace operations in order to improve stability. 
222 | [Args] 223 | - x: PyTorch Tensor, on either CPU or CUDA. If on CUDA, the custom CUDA extension is used. 224 | - kernel_size: Integer or Tuple, for the kernel size to be used for downsampling. If an `Integer` 225 | is used, a `Tuple` is created for the rest of the dimensions. Defaults to 2. 226 | - stride: Integer or Tuple, for the steps taken between kernels (i.e. strides). If `None` the 227 | strides become equal to the `kernel_size` tuple. Defaults to `None`. 228 | - force_inplace: Bool, determines if in-place operations are to be used regardless of the CUDA 229 | custom op. Mostly useful for time monitoring. Defaults to `False`. 230 | [Returns] 231 | - PyTorch Tensor, subsampled based on the specified `kernel_size` and `stride` 232 | ''' 233 | def soft_pool3d(x, kernel_size=2, stride=None, force_inplace=False): 234 | if x.is_cuda and not force_inplace: 235 | x = CUDA_SOFTPOOL3d.apply(x, kernel_size, stride) 236 | # Replace `NaN`s if found 237 | if torch.isnan(x).any(): 238 | return torch.nan_to_num(x) 239 | return x 240 | kernel_size = _triple(kernel_size) 241 | if stride is None: 242 | stride = kernel_size 243 | else: 244 | stride = _triple(stride) 245 | # Get input sizes 246 | _, c, d, h, w = x.size() 247 | # Create exponential mask (should be similar to max-like pooling) 248 | e_x = torch.sum(torch.exp(x),dim=1,keepdim=True) 249 | e_x = torch.clamp(e_x , float(0), float('inf')) 250 | # Apply mask to input and pool and calculate the exponential sum 251 | # Tensor: [b x c x d x h x w] -> [b x c x d' x h' x w'] 252 | x = F.avg_pool3d(x.mul(e_x), kernel_size, stride=stride).mul_(sum(kernel_size)).div_(F.avg_pool3d(e_x, kernel_size, stride=stride).mul_(sum(kernel_size))) 253 | return torch.clamp(x , float(0), float('inf')) 254 | ''' 255 | --- E N D O F F U N C T I O N S O F T _ P O O L 3 D --- 256 | ''' 257 | 258 | class SoftPool1d(torch.nn.Module): 259 | def __init__(self, kernel_size=2, stride=None, force_inplace=False): 260 | super(SoftPool1d, 
self).__init__() 261 | self.kernel_size = kernel_size 262 | self.stride = stride 263 | self.force_inplace = force_inplace 264 | 265 | def forward(self, x): 266 | return soft_pool1d(x, kernel_size=self.kernel_size, stride=self.stride, force_inplace=self.force_inplace) 267 | 268 | 269 | 270 | class SoftPool2d(torch.nn.Module): 271 | def __init__(self, kernel_size=2, stride=None, force_inplace=False): 272 | super(SoftPool2d, self).__init__() 273 | self.kernel_size = kernel_size 274 | self.stride = stride 275 | self.force_inplace = force_inplace 276 | 277 | def forward(self, x): 278 | return soft_pool2d(x, kernel_size=self.kernel_size, stride=self.stride, force_inplace=self.force_inplace) 279 | 280 | 281 | 282 | class SoftPool3d(torch.nn.Module): 283 | def __init__(self, kernel_size=2, stride=None, force_inplace=False): 284 | super(SoftPool3d, self).__init__() 285 | self.kernel_size = kernel_size 286 | self.stride = stride 287 | self.force_inplace = force_inplace 288 | 289 | def forward(self, x): 290 | return soft_pool3d(x, kernel_size=self.kernel_size, stride=self.stride, force_inplace=self.force_inplace) 291 | -------------------------------------------------------------------------------- /pytorch/setup.py: -------------------------------------------------------------------------------- 1 | from setuptools import setup, find_packages 2 | from torch.utils.cpp_extension import BuildExtension, CUDAExtension 3 | 4 | setup( 5 | name='SoftPool', 6 | version='1.1', 7 | description='CUDA-accelerated package for performing 1D/2D/3D SoftPool', 8 | author='Alexandros Stergiou', 9 | author_email='alexstergiou5@gmail.com', 10 | license='MIT', 11 | packages=find_packages(), 12 | ext_modules=[ 13 | CUDAExtension('softpool_cuda', [ 14 | 'CUDA/softpool_cuda.cpp', 15 | 'CUDA/softpool_cuda_kernel.cu', 16 | ]), 17 | ], 18 | cmdclass={ 19 | 'build_ext': BuildExtension.with_options(use_ninja=False) 20 | }) 21 | 
-------------------------------------------------------------------------------- /pytorch/test-files/._images: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/alexandrosstergiou/SoftPool/e11dee7e96ecad895cf871c8cbf220f7908462ed/pytorch/test-files/._images -------------------------------------------------------------------------------- /pytorch/test-files/._out_1: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/alexandrosstergiou/SoftPool/e11dee7e96ecad895cf871c8cbf220f7908462ed/pytorch/test-files/._out_1 -------------------------------------------------------------------------------- /pytorch/test-files/._test.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/alexandrosstergiou/SoftPool/e11dee7e96ecad895cf871c8cbf220f7908462ed/pytorch/test-files/._test.py -------------------------------------------------------------------------------- /pytorch/test-files/test.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import softpool_cuda 3 | from SoftPool import soft_pool1d, soft_pool2d, soft_pool3d, SoftPool1d, SoftPool2d, SoftPool3d 4 | 5 | import timeit 6 | 7 | 8 | def check_close_enough(a, check): 9 | a = a.cpu() 10 | check = check.cpu() 11 | residual = (a-check).data.abs().mean().cpu().item() 12 | assert torch.isnan(check).sum() == 0, 'meet NaN(s) in `check`' 13 | assert residual < .2, 'residual is not small: {}'.format(residual) 14 | 15 | x_1d = torch.rand((20, 32, 128)).float() 16 | x_2d = torch.rand((20, 32, 128, 128)).float() 17 | x_3d = torch.rand((20, 32, 16, 128, 128)).float() 18 | 19 | 20 | print('\033[95m' + '--- Initial checks for forward ---' + '\033[0m') 21 | 22 | 23 | ################## 1D FORWARD ################## 24 | print('\033[93m' + '> Checking 1D CPU ...' 
+ '\033[0m') 25 | try: 26 | pl_1d_cpu = soft_pool1d(x_1d) 27 | print('\033[92m' + '> PASSED' + '\033[0m') 28 | except Exception as e: 29 | print('\033[91m' + '> FAILED' + '\033[0m') 30 | print(e) 31 | 32 | print('\033[93m' + '> Checking 1D GPU ...' + '\033[0m') 33 | try: 34 | pl_1d_gpu = soft_pool1d(x_1d.cuda()) 35 | print('\033[92m' + '> PASSED' + '\033[0m') 36 | except Exception as e: 37 | print('\033[91m' + '> FAILED' + '\033[0m') 38 | print(e) 39 | 40 | print('\033[93m' + '> Checking 1D CPU-GPU output similarities ...' + '\033[0m') 41 | try: 42 | check_close_enough(pl_1d_cpu.data, pl_1d_gpu.data) 43 | print('\033[92m' + '> PASSED' + '\033[0m'+'\n') 44 | except Exception as e: 45 | print('\033[91m' + '> FAILED' + '\033[0m') 46 | print(e,'\n') 47 | 48 | ################## 2D FORWARD ################## 49 | print('\033[93m' + '> Checking 2D CPU ...' + '\033[0m') 50 | try: 51 | pl_2d_cpu = soft_pool2d(x_2d) 52 | print('\033[92m' + '> PASSED' + '\033[0m') 53 | except Exception as e: 54 | print('\033[91m' + '> FAILED' + '\033[0m') 55 | print(e) 56 | 57 | print('\033[93m' + '> Checking 2D GPU ...' + '\033[0m') 58 | try: 59 | pl_2d_gpu = soft_pool2d(x_2d.cuda()) 60 | print('\033[92m' + '> PASSED' + '\033[0m') 61 | except Exception as e: 62 | print('\033[91m' + '> FAILED' + '\033[0m') 63 | print(e) 64 | 65 | print('\033[93m' + '> Checking 2D CPU-GPU output similarities ...' + '\033[0m') 66 | try: 67 | check_close_enough(pl_2d_cpu.data, pl_2d_gpu.data) 68 | print('\033[92m' + '> PASSED' + '\033[0m'+'\n') 69 | except Exception as e: 70 | print('\033[91m' + '> FAILED' + '\033[0m') 71 | print(e,'\n') 72 | 73 | ################## 3D FORWARD ################## 74 | print('\033[93m' + '> Checking 3D CPU ...' + '\033[0m') 75 | try: 76 | pl_3d_cpu = soft_pool3d(x_3d) 77 | print('\033[92m' + '> PASSED' + '\033[0m') 78 | except Exception as e: 79 | print('\033[91m' + '> FAILED' + '\033[0m') 80 | print(e) 81 | 82 | print('\033[93m' + '> Checking 3D GPU ...' 
+ '\033[0m') 83 | try: 84 | pl_3d_gpu = soft_pool3d(x_3d.cuda()) 85 | print('\033[92m' + '> PASSED' + '\033[0m') 86 | except Exception as e: 87 | print('\033[91m' + '> FAILED' + '\033[0m') 88 | print(e) 89 | 90 | print('\033[93m' + '> Checking 3D CPU-GPU output similarities ...' + '\033[0m') 91 | try: 92 | check_close_enough(pl_3d_cpu.data, pl_3d_gpu.data) 93 | print('\033[92m' + '> PASSED' + '\033[0m'+'\n') 94 | except Exception as e: 95 | print('\033[91m' + '> FAILED' + '\033[0m') 96 | print(e,'\n') 97 | 98 | 99 | print('\033[95m' + '--- Initial checks for backward ---' + '\033[0m') 100 | 101 | a_1d = torch.rand((20, 32, 128)).float() 102 | b_1d = a_1d.clone().cuda() 103 | a_2d = torch.rand((20, 32, 128, 128)).float() 104 | b_2d = a_2d.clone().cuda() 105 | a_3d = torch.rand((20, 32, 16, 128, 128)).float() 106 | b_3d = a_3d.clone().cuda() 107 | 108 | a_1d.requires_grad = True 109 | a_2d.requires_grad = True 110 | a_3d.requires_grad = True 111 | b_1d.requires_grad = True 112 | b_2d.requires_grad = True 113 | b_3d.requires_grad = True 114 | 115 | 116 | print('\033[93m' + '> Checking 1D CPU ...' + '\033[0m') 117 | try: 118 | soft_pool1d(a_1d).pow(2).mean().backward() 119 | print('\033[92m' + '> PASSED' + '\033[0m') 120 | except Exception as e: 121 | print('\033[91m' + '> FAILED' + '\033[0m') 122 | print(e) 123 | 124 | print('\033[93m' + '> Checking 1D GPU ...' + '\033[0m') 125 | try: 126 | soft_pool1d(b_1d).pow(2).mean().backward() 127 | print('\033[92m' + '> PASSED' + '\033[0m') 128 | except Exception as e: 129 | print('\033[91m' + '> FAILED' + '\033[0m') 130 | print(e) 131 | 132 | print('\033[93m' + '> Checking 1D grad similarities ...' + '\033[0m') 133 | try: 134 | check_close_enough(a_1d.grad.data, b_1d.grad.data) 135 | print('\033[92m' + '> PASSED' + '\033[0m'+'\n') 136 | except Exception as e: 137 | print('\033[91m' + '> FAILED' + '\033[0m') 138 | print(e,'\n') 139 | 140 | print('\033[93m' + '> Checking 2D CPU ...' 
+ '\033[0m') 141 | try: 142 | soft_pool2d(a_2d).pow(2).mean().backward() 143 | print('\033[92m' + '> PASSED' + '\033[0m') 144 | except Exception as e: 145 | print('\033[91m' + '> FAILED' + '\033[0m') 146 | print(e) 147 | 148 | print('\033[93m' + '> Checking 2D GPU ...' + '\033[0m') 149 | try: 150 | soft_pool2d(b_2d).pow(2).mean().backward() 151 | print('\033[92m' + '> PASSED' + '\033[0m') 152 | except Exception as e: 153 | print('\033[91m' + '> FAILED' + '\033[0m') 154 | print(e) 155 | 156 | print('\033[93m' + '> Checking 2D grad similarities ...' + '\033[0m') 157 | try: 158 | check_close_enough(a_2d.grad.data, b_2d.grad.data) 159 | print('\033[92m' + '> PASSED' + '\033[0m'+'\n') 160 | except Exception as e: 161 | print('\033[91m' + '> FAILED' + '\033[0m') 162 | print(e,'\n') 163 | 164 | print('\033[93m' + '> Checking 3D CPU ...' + '\033[0m') 165 | try: 166 | soft_pool3d(a_3d).pow(2).mean().backward() 167 | print('\033[92m' + '> PASSED' + '\033[0m') 168 | except Exception as e: 169 | print('\033[91m' + '> FAILED' + '\033[0m') 170 | print(e) 171 | 172 | print('\033[93m' + '> Checking 3D GPU ...' + '\033[0m') 173 | try: 174 | soft_pool3d(b_3d).pow(2).mean().backward() 175 | print('\033[92m' + '> PASSED' + '\033[0m') 176 | except Exception as e: 177 | print('\033[91m' + '> FAILED' + '\033[0m') 178 | print(e) 179 | 180 | print('\033[93m' + '> Checking 3D grad similarities ...' 
+ '\033[0m') 181 | try: 182 | check_close_enough(a_3d.grad.data, b_3d.grad.data) 183 | print('\033[92m' + '> PASSED' + '\033[0m'+'\n') 184 | except Exception as e: 185 | print('\033[91m' + '> FAILED' + '\033[0m') 186 | print(e,'\n') 187 | 188 | 189 | print('\n'+'\033[92m' + 'TESTS COMPLETED' + '\033[0m'+'\n') 190 | 191 | print('\033[95m' + '--- Profiling checks ---' + '\033[0m') 192 | 193 | a_1d = torch.rand((10, 32, 80)).float() 194 | b_1d = a_1d.clone().cuda() 195 | c_1d = a_1d.clone().cuda() 196 | a_2d = torch.rand((10, 32, 80, 80)).float() 197 | b_2d = a_2d.clone().cuda() 198 | c_2d = a_2d.clone().cuda() 199 | a_3d = torch.rand((10, 32, 8, 80, 80)).float() 200 | b_3d = a_3d.clone().cuda() 201 | c_3d = a_3d.clone().cuda() 202 | 203 | 204 | a_1d.requires_grad = True 205 | a_2d.requires_grad = True 206 | a_3d.requires_grad = True 207 | b_1d.requires_grad = True 208 | b_2d.requires_grad = True 209 | b_3d.requires_grad = True 210 | c_1d.requires_grad = True 211 | c_2d.requires_grad = True 212 | c_3d.requires_grad = True 213 | 214 | 215 | with torch.autograd.profiler.profile(use_cuda=False) as prof: 216 | for i in range(100): 217 | soft_pool1d(a_1d) 218 | print('\033[93m' +'SoftPool1d (CPU) [forward]'+ '\033[0m') 219 | print(prof.key_averages().table(sort_by="self_cpu_time_total")) 220 | time_f_1d_cpu = ''.join(str(prof).split('\n')[-2:]) 221 | _tt = soft_pool1d(a_1d) 222 | with torch.autograd.profiler.profile(use_cuda=False) as prof: 223 | for i in range(100): 224 | soft_pool1d(a_1d).backward(_tt) 225 | print('\033[93m' +'SoftPool1d (CPU) [forward + backward]'+ '\033[0m') 226 | print(prof.key_averages().table(sort_by="self_cpu_time_total")) 227 | time_b_1d_cpu = ''.join(str(prof).split('\n')[-2:]) 228 | 229 | with torch.autograd.profiler.profile(use_cuda=True) as prof: 230 | for i in range(100): 231 | soft_pool1d(b_1d,force_inplace=True) 232 | print('\033[93m' +'SoftPool1d (CUDA-inplace) [forward]'+ '\033[0m') 233 | print(prof.key_averages()) 234 | 
time_f_1d_cuda_forced = ''.join(str(prof).split('\n')[-3:]) 235 | _tt = soft_pool1d(b_1d,force_inplace=True) 236 | with torch.autograd.profiler.profile(use_cuda=True) as prof: 237 | for i in range(100): 238 | soft_pool1d(b_1d,force_inplace=True).backward(_tt) 239 | print('\033[93m' +'SoftPool1d (CUDA-inplace) [forward + backward]'+ '\033[0m') 240 | print(prof.key_averages()) 241 | time_b_1d_cuda_forced = ''.join(str(prof).split('\n')[-3:]) 242 | 243 | with torch.autograd.profiler.profile(use_cuda=True) as prof: 244 | for i in range(100): 245 | soft_pool1d(c_1d) 246 | print('\033[93m' +'SoftPool1d (CUDA) [forward]'+ '\033[0m') 247 | print(prof.key_averages()) 248 | time_f_1d_cuda = ''.join(str(prof).split('\n')[-3:]) 249 | _tt = soft_pool1d(c_1d) 250 | with torch.autograd.profiler.profile(use_cuda=True) as prof: 251 | for i in range(100): 252 | soft_pool1d(c_1d).backward(_tt) 253 | print('\033[93m' +'SoftPool1d (CUDA) [forward + backward]'+ '\033[0m') 254 | print(prof.key_averages()) 255 | time_b_1d_cuda = ''.join(str(prof).split('\n')[-3:]) 256 | 257 | 258 | 259 | with torch.autograd.profiler.profile(use_cuda=False) as prof: 260 | for i in range(100): 261 | soft_pool2d(a_2d) 262 | print('\033[93m' +'SoftPool2d (CPU) [forward]'+ '\033[0m') 263 | print(prof.key_averages().table(sort_by="self_cpu_time_total")) 264 | time_f_2d_cpu = ''.join(str(prof).split('\n')[-2:]) 265 | _tt = soft_pool2d(a_2d) 266 | with torch.autograd.profiler.profile(use_cuda=False) as prof: 267 | for i in range(100): 268 | soft_pool2d(a_2d).backward(_tt) 269 | print('\033[93m' +'SoftPool2d (CPU) [forward + backward]'+ '\033[0m') 270 | print(prof.key_averages().table(sort_by="self_cpu_time_total")) 271 | time_b_2d_cpu = ''.join(str(prof).split('\n')[-2:]) 272 | 273 | with torch.autograd.profiler.profile(use_cuda=True) as prof: 274 | for i in range(100): 275 | soft_pool2d(b_2d,force_inplace=True) 276 | print('\033[93m' +'SoftPool2d (CUDA-inplace) [forward]'+ '\033[0m') 277 | 
print(prof.key_averages()) 278 | time_f_2d_cuda_forced = ''.join(str(prof).split('\n')[-3:]) 279 | _tt = soft_pool2d(b_2d,force_inplace=True) 280 | with torch.autograd.profiler.profile(use_cuda=True) as prof: 281 | for i in range(100): 282 | soft_pool2d(b_2d,force_inplace=True).backward(_tt) 283 | print('\033[93m' +'SoftPool2d (CUDA-inplace) [forward + backward]'+ '\033[0m') 284 | print(prof.key_averages()) 285 | time_b_2d_cuda_forced = ''.join(str(prof).split('\n')[-3:]) 286 | 287 | with torch.autograd.profiler.profile(use_cuda=True) as prof: 288 | for i in range(100): 289 | soft_pool2d(c_2d) 290 | print('\033[93m' +'SoftPool2d (CUDA) [forward]'+ '\033[0m') 291 | time_f_2d_cuda = ''.join(str(prof).split('\n')[-3:]) 292 | print(prof.key_averages()) 293 | _tt = soft_pool2d(c_2d) 294 | with torch.autograd.profiler.profile(use_cuda=True) as prof: 295 | for i in range(100): 296 | soft_pool2d(c_2d).backward(_tt) 297 | print('\033[93m' +'SoftPool2d (CUDA) [forward + backward]'+ '\033[0m') 298 | print(prof.key_averages()) 299 | time_b_2d_cuda = ''.join(str(prof).split('\n')[-3:]) 300 | 301 | 302 | 303 | with torch.autograd.profiler.profile(use_cuda=False) as prof: 304 | for i in range(100): 305 | soft_pool3d(a_3d) 306 | print('\033[93m' +'SoftPool3d (CPU) [forward]'+ '\033[0m') 307 | print(prof.key_averages().table(sort_by="self_cpu_time_total")) 308 | time_f_3d_cpu = ''.join(str(prof).split('\n')[-2:]) 309 | _tt = soft_pool3d(a_3d) 310 | with torch.autograd.profiler.profile(use_cuda=False) as prof: 311 | for i in range(100): 312 | soft_pool3d(a_3d).backward(_tt) 313 | print('\033[93m' +'SoftPool3d (CPU) [forward + backward]'+ '\033[0m') 314 | print(prof.key_averages().table(sort_by="self_cpu_time_total")) 315 | time_b_3d_cpu = ''.join(str(prof).split('\n')[-2:]) 316 | 317 | with torch.autograd.profiler.profile(use_cuda=True) as prof: 318 | for i in range(100): 319 | soft_pool3d(b_3d,force_inplace=True) 320 | print('\033[93m' +'SoftPool3d (CUDA-inplace) [forward]'+ 
'\033[0m') 321 | print(prof.key_averages()) 322 | time_f_3d_cuda_forced = ''.join(str(prof).split('\n')[-3:]) 323 | _tt = soft_pool3d(b_3d,force_inplace=True) 324 | with torch.autograd.profiler.profile(use_cuda=True) as prof: 325 | for i in range(100): 326 | soft_pool3d(b_3d,force_inplace=True).backward(_tt) 327 | print('\033[93m' +'SoftPool3d (CUDA-inplace) [forward + backward]'+ '\033[0m') 328 | print(prof.key_averages()) 329 | time_b_3d_cuda_forced = ''.join(str(prof).split('\n')[-3:]) 330 | 331 | with torch.autograd.profiler.profile(use_cuda=True) as prof: 332 | for i in range(100): 333 | soft_pool3d(c_3d) 334 | print('\033[93m' +'SoftPool3d (CUDA) [forward]'+ '\033[0m') 335 | print(prof.key_averages()) 336 | time_f_3d_cuda = ''.join(str(prof).split('\n')[-3:]) 337 | _tt = soft_pool3d(c_3d) 338 | with torch.autograd.profiler.profile(use_cuda=True) as prof: 339 | for i in range(100): 340 | soft_pool3d(c_3d).backward(_tt) 341 | print('\033[93m' +'SoftPool3d (CUDA) [forward + backward]'+ '\033[0m') 342 | print(prof.key_averages()) 343 | time_b_3d_cuda = ''.join(str(prof).split('\n')[-3:]) 344 | 345 | 346 | print('\n'+'\033[93m' +'-------------------------------'+ '\033[0m') 347 | print('\033[93m' +'SoftPool1d [forward + backward]'+ '\033[0m') 348 | print('\n'+'\033[93m' +'----------- C P U ------------'+ '\033[0m') 349 | print(time_b_1d_cpu) 350 | print('\n'+'\033[93m' +'-- C U D A - I N P L A C E ---'+ '\033[0m') 351 | print(time_b_1d_cuda_forced) 352 | print('\n'+'\033[93m' +'---------- C U D A -----------'+ '\033[0m') 353 | print(time_b_1d_cuda) 354 | print('\n'+'\033[93m' +'-------------------------------'+ '\033[0m') 355 | 356 | print('\n'+'\033[93m' +'-------------------------------'+ '\033[0m') 357 | print('\033[93m' +'SoftPool2d [forward + backward]'+ '\033[0m') 358 | print('\n'+'\033[93m' +'----------- C P U ------------'+ '\033[0m') 359 | print(time_b_2d_cpu) 360 | print('\n'+'\033[93m' +'-- C U D A - I N P L A C E ---'+ '\033[0m') 361 | 
print(time_b_2d_cuda_forced) 362 | print('\n'+'\033[93m' +'---------- C U D A -----------'+ '\033[0m') 363 | print(time_b_2d_cuda) 364 | print('\n'+'\033[93m' +'-------------------------------'+ '\033[0m') 365 | 366 | print('\n'+'\033[93m' +'-------------------------------'+ '\033[0m') 367 | print('\033[93m' +'SoftPool3d [forward + backward]'+ '\033[0m') 368 | print('\n'+'\033[93m' +'----------- C P U ------------'+ '\033[0m') 369 | print(time_b_3d_cpu) 370 | print('\n'+'\033[93m' +'-- C U D A - I N P L A C E ---'+ '\033[0m') 371 | print(time_b_3d_cuda_forced) 372 | print('\n'+'\033[93m' +'---------- C U D A -----------'+ '\033[0m') 373 | print(time_b_3d_cuda) 374 | print('\n'+'\033[93m' +'-------------------------------'+ '\033[0m') 375 | 376 | print('\n'+'\033[95m' + '--- Tests finished ---' + '\033[0m') 377 | --------------------------------------------------------------------------------
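
The checks in `test.py` need the compiled `softpool_cuda` extension and a GPU. As a complement, the pooling arithmetic can be sanity-checked by hand with a small standard-library sketch. It assumes the per-window formulation described in the abstract (each output is the sum of the window's activations weighted by their softmax), and it reuses the output-length expression `(D - kernel) // stride + 1` from the `forward` methods. The name `soft_pool1d_reference` is hypothetical and is not part of the package. This is also not the CPU fallback path of `soft_pool1d`, which builds its exponential mask by summing over the channel dimension.

```python
import math


def soft_pool1d_reference(values, kernel_size=2, stride=None):
    """Didactic 1D soft pooling over a plain list of floats.

    Hypothetical helper for illustration only; not part of the SoftPool package.
    """
    if stride is None:
        # Same default as the package: stride equals the kernel size.
        stride = kernel_size
    # Output length, as in the extension's forward pass.
    out_len = (len(values) - kernel_size) // stride + 1
    pooled = []
    for i in range(out_len):
        start = i * stride
        window = values[start:start + kernel_size]
        # Shift by the window maximum before exponentiating to avoid overflow;
        # the shift cancels in the ratio below.
        shift = max(window)
        weights = [math.exp(v - shift) for v in window]
        total = sum(weights)
        # Exponentially weighted average of the window.
        pooled.append(sum(w * v for w, v in zip(weights, window)) / total)
    return pooled


if __name__ == "__main__":
    print(soft_pool1d_reference([0.0, 0.0, 1.0, 3.0]))
```

For a window with equal values the result equals that value. For unequal values it falls between the window's average and its maximum, which is the behaviour the "max-like pooling" comment in the source alludes to.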