├── LICENSE
├── README.md
├── comparison-spp-tpp.png
├── pyramidpooling.py
└── pytorch-tpp-visual.gif

--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
MIT License

Copyright (c) 2019 revidee

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# Pyramid Pooling implemented in PyTorch
This module implements Spatial Pyramid Pooling (SPP) and Temporal Pyramid Pooling (TPP), as described in the papers referenced below.

![SPP-TPP Comparison](https://github.com/revidee/pytorch-pyramid-pooling/blob/master/comparison-spp-tpp.png "SPP-TPP Comparison")

Temporal Pyramid Pooling:
------
[Sudholt, Fink: Evaluating Word String Embeddings and Loss Functions for CNN-based Word Spotting](http://patrec.cs.tu-dortmund.de/pubs/papers/Sudholt2017-EWS.pdf "Sudholt, Fink: Evaluating Word String Embeddings and Loss Functions for CNN-based Word Spotting")

### Principle
Given a 2D input tensor, Temporal Pyramid Pooling divides the input into **x** _stripes_ that **span the full height** of the image and have a **width of roughly (input_width / x)**. Each stripe is then pooled with max- or avg-pooling to produce the output.

### Animated Principle
![TPP Visualization](https://github.com/revidee/pytorch-pyramid-pooling/blob/master/pytorch-tpp-visual.gif "TPP Visualization")

Spatial Pyramid Pooling:
------
[He et al.: Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition](https://arxiv.org/abs/1406.4729 "He et al.: Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition")

### Principle
Given a 2D input tensor, Spatial Pyramid Pooling divides the input into **x²** _rectangles_ with a **height of roughly (input_height / x)** and a **width of roughly (input_width / x)**. Each rectangle is then pooled with max- or avg-pooling to produce the output.
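Usage:
------
Both `SpatialPyramidPooling` and `TemporalPyramidPooling` are plain `nn.Module`s without learnable parameters. A minimal sketch (the input shapes and level choices below are illustrative, not prescribed by the papers):

```python
import torch

from pyramidpooling import SpatialPyramidPooling, TemporalPyramidPooling

spp = SpatialPyramidPooling(levels=[1, 2, 4], mode="max")
tpp = TemporalPyramidPooling(levels=[1, 2, 4], mode="max")

# Two batches of feature maps with different spatial sizes: [batch, channels, height, width]
a = torch.randn(8, 64, 13, 9)
b = torch.randn(8, 64, 21, 17)

# SPP flattens each level's pooled grid, so the output has
# channels * sum(level * level) = 64 * (1 + 4 + 16) = 1344 features.
assert spp(a).shape == spp(b).shape == (8, spp.get_output_size(64))

# TPP pools full-height stripes, so the output has
# channels * sum(level) = 64 * (1 + 2 + 4) = 448 features.
assert tpp(a).shape == tpp(b).shape == (8, tpp.get_output_size(64))
```

Since the output length depends only on the levels and the channel count, feature maps of varying size can feed a fixed-size fully-connected layer.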
--------------------------------------------------------------------------------
/comparison-spp-tpp.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/revidee/pytorch-pyramid-pooling/d814eacc81bbc5d1826104b2046b9344b2a9c45c/comparison-spp-tpp.png
--------------------------------------------------------------------------------
/pyramidpooling.py:
--------------------------------------------------------------------------------
import math
import torch
import torch.nn as nn
import torch.nn.functional as F


class PyramidPooling(nn.Module):
    def __init__(self, levels, mode="max"):
        """
        General pyramid pooling class which uses spatial pyramid pooling by default and holds the static methods for both spatial and temporal pooling.
        :param levels: defines the different divisions to be made in the width and (spatial) height dimension
        :param mode: defines the underlying pooling mode to be used, can either be "max" or "avg"

        :returns: a tensor with shape [batch x n], where n: sum(filter_amount*level*level) for each level in levels (spatial) or
                  n: sum(filter_amount*level) for each level in levels (temporal),
                  which is the concatenation of the multi-level pooling results
        """
        super(PyramidPooling, self).__init__()
        self.levels = levels
        self.mode = mode

    def forward(self, x):
        return self.spatial_pyramid_pool(x, self.levels, self.mode)

    def get_output_size(self, filters):
        out = 0
        for level in self.levels:
            out += filters * level * level
        return out

    @staticmethod
    def spatial_pyramid_pool(previous_conv, levels, mode):
        """
        Static Spatial Pyramid Pooling method, which divides the input tensor vertically and horizontally
        (last 2 dimensions) according to each level in the given levels and pools its values according to the given mode.
        :param previous_conv: input tensor of the previous convolutional layer
        :param levels: defines the different divisions to be made in the width and height dimension
        :param mode: defines the underlying pooling mode to be used, can either be "max" or "avg"

        :returns: a tensor with shape [batch x n],
                  where n: sum(filter_amount*level*level) for each level in levels,
                  which is the concatenation of the multi-level pooling results
        """
        num_sample = previous_conv.size(0)
        previous_conv_size = [int(previous_conv.size(2)), int(previous_conv.size(3))]
        for i in range(len(levels)):
            # Kernel size per level: ceil so the kernels cover the whole (padded) input.
            h_kernel = int(math.ceil(previous_conv_size[0] / levels[i]))
            w_kernel = int(math.ceil(previous_conv_size[1] / levels[i]))
            # Pad symmetrically so that kernel * level matches the padded input size exactly.
            w_pad1 = int(math.floor((w_kernel * levels[i] - previous_conv_size[1]) / 2))
            w_pad2 = int(math.ceil((w_kernel * levels[i] - previous_conv_size[1]) / 2))
            h_pad1 = int(math.floor((h_kernel * levels[i] - previous_conv_size[0]) / 2))
            h_pad2 = int(math.ceil((h_kernel * levels[i] - previous_conv_size[0]) / 2))
            assert w_pad1 + w_pad2 == (w_kernel * levels[i] - previous_conv_size[1]) and \
                   h_pad1 + h_pad2 == (h_kernel * levels[i] - previous_conv_size[0])

            padded_input = F.pad(input=previous_conv, pad=[w_pad1, w_pad2, h_pad1, h_pad2],
                                 mode='constant', value=0)
            if mode == "max":
                pool = nn.MaxPool2d((h_kernel, w_kernel), stride=(h_kernel, w_kernel), padding=(0, 0))
            elif mode == "avg":
                pool = nn.AvgPool2d((h_kernel, w_kernel), stride=(h_kernel, w_kernel), padding=(0, 0))
            else:
                raise RuntimeError('Unknown pooling mode: "%s", please use "max" or "avg".' % mode)
            x = pool(padded_input)
            if i == 0:
                spp = x.view(num_sample, -1)
            else:
                spp = torch.cat((spp, x.view(num_sample, -1)), 1)

        return spp
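
    # Worked example (illustrative numbers, not from the papers): for an input of
    # shape [1, 64, 13, 9] and levels=[1, 2, 4], the padded per-level pools yield
    # 1x1, 2x2 and 4x4 grids, so spatial_pyramid_pool returns a tensor of shape
    # [1, 64 * (1 + 4 + 16)] = [1, 1344].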

    @staticmethod
    def temporal_pyramid_pool(previous_conv, out_pool_size, mode):
        """
        Static Temporal Pyramid Pooling method, which divides the input tensor horizontally (last dimension)
        according to each level in the given levels and pools its values according to the given mode.
        In other words: it divides the input tensor into "level" stripes of the original height and a width of
        roughly (previous_conv.size(3) / level) and pools the values inside each stripe.
        :param previous_conv: input tensor of the previous convolutional layer
        :param out_pool_size: defines the different divisions to be made in the width dimension
        :param mode: defines the underlying pooling mode to be used, can either be "max" or "avg"

        :returns: a tensor with shape [batch x n],
                  where n: sum(filter_amount*level) for each level in levels,
                  which is the concatenation of the multi-level pooling results
        """
        num_sample = previous_conv.size(0)
        previous_conv_size = [int(previous_conv.size(2)), int(previous_conv.size(3))]
        for i in range(len(out_pool_size)):
            # The kernel spans the full height; only the width is divided into out_pool_size[i] stripes.
            h_kernel = previous_conv_size[0]
            w_kernel = int(math.ceil(previous_conv_size[1] / out_pool_size[i]))
            # Pad the width symmetrically so that kernel * level matches the padded width exactly.
            w_pad1 = int(math.floor((w_kernel * out_pool_size[i] - previous_conv_size[1]) / 2))
            w_pad2 = int(math.ceil((w_kernel * out_pool_size[i] - previous_conv_size[1]) / 2))
            assert w_pad1 + w_pad2 == (w_kernel * out_pool_size[i] - previous_conv_size[1])

            padded_input = F.pad(input=previous_conv, pad=[w_pad1, w_pad2],
                                 mode='constant', value=0)
            if mode == "max":
                pool = nn.MaxPool2d((h_kernel, w_kernel), stride=(h_kernel, w_kernel), padding=(0, 0))
            elif mode == "avg":
                pool = nn.AvgPool2d((h_kernel, w_kernel), stride=(h_kernel, w_kernel), padding=(0, 0))
            else:
                raise RuntimeError('Unknown pooling mode: "%s", please use "max" or "avg".' % mode)
            x = pool(padded_input)
            if i == 0:
                tpp = x.view(num_sample, -1)
            else:
                tpp = torch.cat((tpp, x.view(num_sample, -1)), 1)

        return tpp
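
    # Continuing the worked example above: for the same [1, 64, 13, 9] input and
    # out_pool_size=[1, 2, 4], the levels yield 1, 2 and 4 full-height stripes, so
    # temporal_pyramid_pool returns a tensor of shape [1, 64 * (1 + 2 + 4)] = [1, 448].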


class SpatialPyramidPooling(PyramidPooling):
    def __init__(self, levels, mode="max"):
        """
        Spatial Pyramid Pooling module, which divides the input tensor vertically and horizontally
        (last 2 dimensions) according to each level in the given levels and pools its values according to the given mode.
        Can be used like any other PyTorch module and has no learnable parameters, since the pooling is static.
        In other words: it divides the input tensor into level*level rectangles with a width of roughly
        (previous_conv.size(3) / level) and a height of roughly (previous_conv.size(2) / level) and pools each
        rectangle's values (the input is padded to fit).
        :param levels: defines the different divisions to be made in the width and height dimension
        :param mode: defines the underlying pooling mode to be used, can either be "max" or "avg"

        :returns: (forward) a tensor with shape [batch x n],
                  where n: sum(filter_amount*level*level) for each level in levels,
                  which is the concatenation of the multi-level pooling results
        """
        super(SpatialPyramidPooling, self).__init__(levels, mode=mode)

    def forward(self, x):
        return self.spatial_pyramid_pool(x, self.levels, self.mode)

    def get_output_size(self, filters):
        """
        Calculates the output size given a filter amount: sum(filter_amount*level*level) for each level in levels.
        Can be used as x.view(-1, spp.get_output_size(filter_amount)) for the following fully-connected layers.
        :param filters: the amount of filters of the output fed into the spatial pyramid pooling
        :return: sum(filter_amount*level*level)
        """
        out = 0
        for level in self.levels:
            out += filters * level * level
        return out


class TemporalPyramidPooling(PyramidPooling):
    def __init__(self, levels, mode="max"):
        """
        Temporal Pyramid Pooling module, which divides the input tensor horizontally (last dimension)
        according to each level in the given levels and pools its values according to the given mode.
        Can be used like any other PyTorch module and has no learnable parameters, since the pooling is static.
        In other words: it divides the input tensor into "level" stripes of the original height and a width of
        roughly (previous_conv.size(3) / level) and pools the values inside each stripe.
        :param levels: defines the different divisions to be made in the width dimension
        :param mode: defines the underlying pooling mode to be used, can either be "max" or "avg"

        :returns: (forward) a tensor with shape [batch x n],
                  where n: sum(filter_amount*level) for each level in levels,
                  which is the concatenation of the multi-level pooling results
        """
        super(TemporalPyramidPooling, self).__init__(levels, mode=mode)

    def forward(self, x):
        return self.temporal_pyramid_pool(x, self.levels, self.mode)

    def get_output_size(self, filters):
        """
        Calculates the output size given a filter amount: sum(filter_amount*level) for each level in levels.
        Can be used as x.view(-1, tpp.get_output_size(filter_amount)) for the following fully-connected layers.
        :param filters: the amount of filters of the output fed into the temporal pyramid pooling
        :return: sum(filter_amount*level)
        """
        out = 0
        for level in self.levels:
            out += filters * level
        return out

--------------------------------------------------------------------------------
/pytorch-tpp-visual.gif:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/revidee/pytorch-pyramid-pooling/d814eacc81bbc5d1826104b2046b9344b2a9c45c/pytorch-tpp-visual.gif
--------------------------------------------------------------------------------