├── LICENSE
├── README.md
├── comparison-spp-tpp.png
├── pyramidpooling.py
└── pytorch-tpp-visual.gif

--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
MIT License

Copyright (c) 2019 revidee

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# Pyramid Pooling implemented in PyTorch
This module implements Spatial Pyramid Pooling (SPP) and Temporal Pyramid Pooling (TPP), as described in the papers referenced below.

![SPP-TPP Comparison](https://github.com/revidee/pytorch-pyramid-pooling/blob/master/comparison-spp-tpp.png "SPP-TPP Comparison")

Temporal Pyramid Pooling:
------
[Sudholt, Fink: Evaluating Word String Embeddings and Loss Functions for CNN-based Word Spotting](http://patrec.cs.tu-dortmund.de/pubs/papers/Sudholt2017-EWS.pdf "Sudholt, Fink: Evaluating Word String Embeddings and Loss Functions for CNN-based Word Spotting")

### Principle
Given a 2D input tensor, Temporal Pyramid Pooling divides the input into **x** _stripes_ that **span the full height** of the image and have a **width of roughly (input_width / x)**. Each stripe is then pooled with max- or avg-pooling to produce the output.

### Animated Principle
![TPP Visualization](https://github.com/revidee/pytorch-pyramid-pooling/blob/master/pytorch-tpp-visual.gif "TPP Visualization")

Spatial Pyramid Pooling:
------
[He et al.: Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition](https://arxiv.org/abs/1406.4729 "He et al.: Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition")

### Principle
Given a 2D input tensor, Spatial Pyramid Pooling divides the input into **x²** _rectangles_ with a **height of roughly (input_height / x)** and a **width of roughly (input_width / x)**. Each rectangle is then pooled with max- or avg-pooling to produce the output.
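Usage:
------
Both `SpatialPyramidPooling` and `TemporalPyramidPooling` are plain `nn.Module`s without learnable parameters. A minimal sketch (the input shapes and level choices below are illustrative, not prescribed by the papers):

```python
import torch

from pyramidpooling import SpatialPyramidPooling, TemporalPyramidPooling

spp = SpatialPyramidPooling(levels=[1, 2, 4], mode="max")
tpp = TemporalPyramidPooling(levels=[1, 2, 4], mode="max")

# Two batches of feature maps with different spatial sizes: [batch, channels, height, width]
a = torch.randn(8, 64, 13, 9)
b = torch.randn(8, 64, 21, 17)

# SPP flattens each level's pooled grid, so the output has
# channels * sum(level * level) = 64 * (1 + 4 + 16) = 1344 features.
assert spp(a).shape == spp(b).shape == (8, spp.get_output_size(64))

# TPP pools full-height stripes, so the output has
# channels * sum(level) = 64 * (1 + 2 + 4) = 448 features.
assert tpp(a).shape == tpp(b).shape == (8, tpp.get_output_size(64))
```

Since the output length depends only on the levels and the channel count, feature maps of varying size can feed a fixed-size fully-connected layer.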
--------------------------------------------------------------------------------
/comparison-spp-tpp.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/revidee/pytorch-pyramid-pooling/d814eacc81bbc5d1826104b2046b9344b2a9c45c/comparison-spp-tpp.png
--------------------------------------------------------------------------------
/pyramidpooling.py:
--------------------------------------------------------------------------------
import math
import torch
import torch.nn as nn
import torch.nn.functional as F


class PyramidPooling(nn.Module):
    def __init__(self, levels, mode="max"):
        """
        General pyramid pooling class which uses spatial pyramid pooling by default and holds the static methods for both spatial and temporal pooling.
        :param levels: defines the different divisions to be made in the width and (spatial) height dimension
        :param mode: defines the underlying pooling mode to be used, can either be "max" or "avg"

        :returns: a tensor with shape [batch x n], where n: sum(filter_amount*level*level) for each level in levels (spatial) or
                  n: sum(filter_amount*level) for each level in levels (temporal),
                  which is the concatenation of the multi-level pooling results
        """
        super(PyramidPooling, self).__init__()
        self.levels = levels
        self.mode = mode

    def forward(self, x):
        return self.spatial_pyramid_pool(x, self.levels, self.mode)

    def get_output_size(self, filters):
        out = 0
        for level in self.levels:
            out += filters * level * level
        return out

    @staticmethod
    def spatial_pyramid_pool(previous_conv, levels, mode):
        """
        Static Spatial Pyramid Pooling method, which divides the input tensor vertically and horizontally
        (last 2 dimensions) according to each level in the given levels and pools its values according to the given mode.
        :param previous_conv: input tensor of the previous convolutional layer
        :param levels: defines the different divisions to be made in the width and height dimension
        :param mode: defines the underlying pooling mode to be used, can either be "max" or "avg"

        :returns: a tensor with shape [batch x n],
                  where n: sum(filter_amount*level*level) for each level in levels,
                  which is the concatenation of the multi-level pooling results
        """
        num_sample = previous_conv.size(0)
        previous_conv_size = [int(previous_conv.size(2)), int(previous_conv.size(3))]
        for i in range(len(levels)):
            # Kernel size per level: ceil so the kernels cover the whole (padded) input.
            h_kernel = int(math.ceil(previous_conv_size[0] / levels[i]))
            w_kernel = int(math.ceil(previous_conv_size[1] / levels[i]))
            # Pad symmetrically so that kernel * level matches the padded input size exactly.
            w_pad1 = int(math.floor((w_kernel * levels[i] - previous_conv_size[1]) / 2))
            w_pad2 = int(math.ceil((w_kernel * levels[i] - previous_conv_size[1]) / 2))
            h_pad1 = int(math.floor((h_kernel * levels[i] - previous_conv_size[0]) / 2))
            h_pad2 = int(math.ceil((h_kernel * levels[i] - previous_conv_size[0]) / 2))
            assert w_pad1 + w_pad2 == (w_kernel * levels[i] - previous_conv_size[1]) and \
                   h_pad1 + h_pad2 == (h_kernel * levels[i] - previous_conv_size[0])

            padded_input = F.pad(input=previous_conv, pad=[w_pad1, w_pad2, h_pad1, h_pad2],
                                 mode='constant', value=0)
            if mode == "max":
                pool = nn.MaxPool2d((h_kernel, w_kernel), stride=(h_kernel, w_kernel), padding=(0, 0))
            elif mode == "avg":
                pool = nn.AvgPool2d((h_kernel, w_kernel), stride=(h_kernel, w_kernel), padding=(0, 0))
            else:
                raise RuntimeError('Unknown pooling mode: "%s", please use "max" or "avg".' % mode)
            x = pool(padded_input)
            if i == 0:
                spp = x.view(num_sample, -1)
            else:
                spp = torch.cat((spp, x.view(num_sample, -1)), 1)

        return spp
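
    # Worked example (illustrative numbers, not from the papers): for an input of
    # shape [1, 64, 13, 9] and levels=[1, 2, 4], the padded per-level pools yield
    # 1x1, 2x2 and 4x4 grids, so spatial_pyramid_pool returns a tensor of shape
    # [1, 64 * (1 + 4 + 16)] = [1, 1344].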

    @staticmethod
    def temporal_pyramid_pool(previous_conv, out_pool_size, mode):
        """
        Static Temporal Pyramid Pooling method, which divides the input tensor horizontally (last dimension)
        according to each level in the given levels and pools its values according to the given mode.
        In other words: it divides the input tensor into "level" stripes of the original height and a width of
        roughly (previous_conv.size(3) / level) and pools the values inside each stripe.
        :param previous_conv: input tensor of the previous convolutional layer
        :param out_pool_size: defines the different divisions to be made in the width dimension
        :param mode: defines the underlying pooling mode to be used, can either be "max" or "avg"

        :returns: a tensor with shape [batch x n],
                  where n: sum(filter_amount*level) for each level in levels,
                  which is the concatenation of the multi-level pooling results
        """
        num_sample = previous_conv.size(0)
        previous_conv_size = [int(previous_conv.size(2)), int(previous_conv.size(3))]
        for i in range(len(out_pool_size)):
            # The kernel spans the full height; only the width is divided into out_pool_size[i] stripes.
            h_kernel = previous_conv_size[0]
            w_kernel = int(math.ceil(previous_conv_size[1] / out_pool_size[i]))
            # Pad the width symmetrically so that kernel * level matches the padded width exactly.
            w_pad1 = int(math.floor((w_kernel * out_pool_size[i] - previous_conv_size[1]) / 2))
            w_pad2 = int(math.ceil((w_kernel * out_pool_size[i] - previous_conv_size[1]) / 2))
            assert w_pad1 + w_pad2 == (w_kernel * out_pool_size[i] - previous_conv_size[1])

            padded_input = F.pad(input=previous_conv, pad=[w_pad1, w_pad2],
                                 mode='constant', value=0)
            if mode == "max":
                pool = nn.MaxPool2d((h_kernel, w_kernel), stride=(h_kernel, w_kernel), padding=(0, 0))
            elif mode == "avg":
                pool = nn.AvgPool2d((h_kernel, w_kernel), stride=(h_kernel, w_kernel), padding=(0, 0))
            else:
                raise RuntimeError('Unknown pooling mode: "%s", please use "max" or "avg".' % mode)
            x = pool(padded_input)
            if i == 0:
                tpp = x.view(num_sample, -1)
            else:
                tpp = torch.cat((tpp, x.view(num_sample, -1)), 1)

        return tpp
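
    # Continuing the worked example above: for the same [1, 64, 13, 9] input and
    # out_pool_size=[1, 2, 4], the levels yield 1, 2 and 4 full-height stripes, so
    # temporal_pyramid_pool returns a tensor of shape [1, 64 * (1 + 2 + 4)] = [1, 448].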


class SpatialPyramidPooling(PyramidPooling):
    def __init__(self, levels, mode="max"):
        """
        Spatial Pyramid Pooling module, which divides the input tensor vertically and horizontally
        (last 2 dimensions) according to each level in the given levels and pools its values according to the given mode.
        Can be used like any other PyTorch module and has no learnable parameters, since the pooling is static.
        In other words: it divides the input tensor into level*level rectangles with a width of roughly
        (previous_conv.size(3) / level) and a height of roughly (previous_conv.size(2) / level) and pools each
        rectangle's values (the input is padded to fit).
        :param levels: defines the different divisions to be made in the width and height dimension
        :param mode: defines the underlying pooling mode to be used, can either be "max" or "avg"

        :returns: (forward) a tensor with shape [batch x n],
                  where n: sum(filter_amount*level*level) for each level in levels,
                  which is the concatenation of the multi-level pooling results
        """
        super(SpatialPyramidPooling, self).__init__(levels, mode=mode)

    def forward(self, x):
        return self.spatial_pyramid_pool(x, self.levels, self.mode)

    def get_output_size(self, filters):
        """
        Calculates the output size given a filter amount: sum(filter_amount*level*level) for each level in levels.
        Can be used as x.view(-1, spp.get_output_size(filter_amount)) for the following fully-connected layers.
        :param filters: the amount of filters of the output fed into the spatial pyramid pooling
        :return: sum(filter_amount*level*level)
        """
        out = 0
        for level in self.levels:
            out += filters * level * level
        return out


class TemporalPyramidPooling(PyramidPooling):
    def __init__(self, levels, mode="max"):
        """
        Temporal Pyramid Pooling module, which divides the input tensor horizontally (last dimension)
        according to each level in the given levels and pools its values according to the given mode.
        Can be used like any other PyTorch module and has no learnable parameters, since the pooling is static.
        In other words: it divides the input tensor into "level" stripes of the original height and a width of
        roughly (previous_conv.size(3) / level) and pools the values inside each stripe.
        :param levels: defines the different divisions to be made in the width dimension
        :param mode: defines the underlying pooling mode to be used, can either be "max" or "avg"

        :returns: (forward) a tensor with shape [batch x n],
                  where n: sum(filter_amount*level) for each level in levels,
                  which is the concatenation of the multi-level pooling results
        """
        super(TemporalPyramidPooling, self).__init__(levels, mode=mode)

    def forward(self, x):
        return self.temporal_pyramid_pool(x, self.levels, self.mode)

    def get_output_size(self, filters):
        """
        Calculates the output size given a filter amount: sum(filter_amount*level) for each level in levels.
        Can be used as x.view(-1, tpp.get_output_size(filter_amount)) for the following fully-connected layers.
        :param filters: the amount of filters of the output fed into the temporal pyramid pooling
        :return: sum(filter_amount*level)
        """
        out = 0
        for level in self.levels:
            out += filters * level
        return out

--------------------------------------------------------------------------------
/pytorch-tpp-visual.gif:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/revidee/pytorch-pyramid-pooling/d814eacc81bbc5d1826104b2046b9344b2a9c45c/pytorch-tpp-visual.gif
--------------------------------------------------------------------------------