├── third_party
│   └── flowiz
│       ├── __init__.py
│       ├── README.md
│       ├── LICENSE
│       └── flowiz.py
├── mpi_sintel_images
│   ├── frame_0001.png
│   ├── frame_0002.png
│   └── README.md
├── LICENSE
├── README.md
└── sift_flow_torch.py


/third_party/flowiz/__init__.py:
--------------------------------------------------------------------------------
1 | 


--------------------------------------------------------------------------------
/mpi_sintel_images/frame_0001.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/hmorimitsu/sift-flow-gpu/HEAD/mpi_sintel_images/frame_0001.png


--------------------------------------------------------------------------------
/mpi_sintel_images/frame_0002.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/hmorimitsu/sift-flow-gpu/HEAD/mpi_sintel_images/frame_0002.png


--------------------------------------------------------------------------------
/third_party/flowiz/README.md:
--------------------------------------------------------------------------------
1 | Flowiz code obtained from [https://github.com/georgegach/flowiz](https://github.com/georgegach/flowiz).


--------------------------------------------------------------------------------
/mpi_sintel_images/README.md:
--------------------------------------------------------------------------------
1 | Images collected from the MPI Sintel dataset available at [http://sintel.is.tue.mpg.de/downloads](http://sintel.is.tue.mpg.de/downloads).
2 | 
3 | These images are the first two frames of the alley_1 sequence (clean pass).


--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
 1 | MIT License
 2 | 
 3 | Copyright (c) 2019 Henrique Morimitsu
 4 | 
 5 | Permission is hereby granted, free of charge, to any person obtaining a copy
 6 | of this software and associated documentation files (the "Software"), to deal
 7 | in the Software without restriction, including without limitation the rights
 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
 9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 | 
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 | 
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 | 


--------------------------------------------------------------------------------
/third_party/flowiz/LICENSE:
--------------------------------------------------------------------------------
 1 | MIT License
 2 | 
 3 | Copyright (c) 2019 George Gach
 4 | 
 5 | Permission is hereby granted, free of charge, to any person obtaining a copy
 6 | of this software and associated documentation files (the "Software"), to deal
 7 | in the Software without restriction, including without limitation the rights
 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
 9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 | 
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 | 
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
 1 | # sift-flow-gpu
 2 | 
 3 | Implementation of the SIFT Flow descriptor [[1]](#references) on GPU using PyTorch.
 4 | 
 5 | This implementation is a port of the original implementation available at
 6 | [https://people.csail.mit.edu/celiu/SIFTflow/](https://people.csail.mit.edu/celiu/SIFTflow/).
 7 | 
 8 | This code can process a batch of images at once for better performance.
 9 | In GPU mode, the most expensive operation is allocating space for the
10 | descriptors on the GPU. However, this step is only performed when the
11 | shape of the input batch changes. Subsequent calls using batches with the
12 | same shape as before reuse the memory and are, therefore, much faster.
13 | 
14 | 
15 | Code for DAISY descriptors on GPU is also available at [https://github.com/hmorimitsu/daisy-gpu](https://github.com/hmorimitsu/daisy-gpu).
16 | 
17 | ## Requirements
18 | 
19 | - [Python 3](https://www.python.org/) (Tested on 3.7)
20 | - [Numpy](https://www.numpy.org/)
21 | - [PyTorch](https://pytorch.org/) >= 1.0.0 (Tested on 1.3.0)
22 | 
23 | ## Usage
24 | 
25 | A simple example is shown below. A more complete example is available in the [Jupyter demo notebook](demo_notebook_torch.ipynb).
26 | 
27 | ```python
28 | from sift_flow_torch import SiftFlowTorch
29 | 
30 | sift_flow = SiftFlowTorch()
31 | imgs = [
32 |     read_some_image,
33 |     read_another_image
34 | ]
35 | descs = sift_flow.extract_descriptor(imgs) # This first call can be
36 |                                            # slower, due to memory allocation
37 | imgs2 = [
38 |     read_yet_another_image,
39 |     read_even_one_more_image
40 | ]
41 | descs2 = sift_flow.extract_descriptor(imgs2) # Subsequent calls are faster,
42 |                                              # if images retain same shape
43 | 
44 | # descs[0] is the descriptor of imgs[0] and so on.
45 | ```
46 | 
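The placeholders in the example above stand for images already loaded as arrays. The following is a minimal sketch of what such a batch could look like, assuming NumPy arrays with layout `[H, W, C]` and values in `[0, 255]`, as described in the `extract_descriptor` docstring. The random images and the variable names are illustrative only, not part of the repository.

```python
import numpy as np

# Two synthetic RGB "images" standing in for real ones (hypothetical data).
# Each has layout [H, W, C] and pixel values in [0, 255].
height, width, channels = 436, 1024, 3
rng = np.random.RandomState(0)
imgs = [
    rng.randint(0, 256, size=(height, width, channels)).astype(np.uint8)
    for _ in range(2)
]

# extract_descriptor stacks the list and moves the channels first: (N, C, H, W).
batch = np.stack(imgs, axis=0).transpose(0, 3, 1, 2)
print(batch.shape)  # (2, 3, 436, 1024)

# With the repository on the import path, the call itself would be:
# from sift_flow_torch import SiftFlowTorch
# descs = SiftFlowTorch().extract_descriptor(imgs)
```

All images in one list must share the same shape, since they are stacked into a single array before processing.
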
47 | ## Benchmark
48 | 
49 | - Test configuration:
50 |   - Intel i7 8750H
51 |   - NVIDIA GeForce GTX1070
52 |   - Images 1024 x 436
53 |   - Descriptor size 128
54 | 
55 | Batch size|FP16|Memory usage (GB)<sup>1</sup>|GPU time (ms), no transfer<sup>2</sup>|GPU time (ms), with transfer<sup>3</sup>|CPU time (ms)
56 | -|------------------|---|------|------|------
57 | 1|                  |0.9|  19.0| 128.0| 660.6
58 | 2|                  |1.3|  35.3| 257.1|1275.1 
59 | 4|                  |2.1|  70.7| 516.2|2559.3 
60 | 8|                  |3.7| 142.5| 969.4|5773.9 
61 | 1|:heavy_check_mark:|0.7|  14.7|      |
62 | 2|:heavy_check_mark:|0.9|  27.2|      |
63 | 4|:heavy_check_mark:|1.3|  54.8|      |
64 | 8|:heavy_check_mark:|2.1| 110.9|      |
65 | 
66 | <sup>1</sup> Maximum value reported by `nvidia-smi` during the respective tests.
67 | 
68 | <sup>2</sup> NOT including time to transfer the result from GPU to CPU.
69 | 
70 | <sup>3</sup> Including time to transfer the result from GPU to CPU.
71 | 
72 | These times are the median of 5 runs, measured after a warm-up run that allocates the descriptor space in memory
73 | (see the [introduction](#sift-flow-gpu)).
74 | 
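To put the memory column in context, the size of the descriptor buffer alone can be estimated from the output shape documented in `extract_descriptor`. The sketch below is only an estimate under that assumption: the helper names `descriptor_shape` and `descriptor_bytes` are hypothetical, and intermediate tensors (gradients, orientation bands, sampling offsets) are not counted, so `nvidia-smi` will report more.

```python
def descriptor_shape(n, hi, wi, cell_size=2, step_size=1,
                     is_boundary_included=True, num_bins=8):
    """Output shape (N, Co, Ho, Wo) as given in the extract_descriptor docstring."""
    d = 0 if is_boundary_included else 4 * cell_size
    return (n, 16 * num_bins, (hi - d) // step_size, (wi - d) // step_size)


def descriptor_bytes(shape, fp16=False):
    """Bytes for the descriptor buffer alone (2 bytes per value in FP16, 4 in FP32)."""
    n, co, ho, wo = shape
    return n * co * ho * wo * (2 if fp16 else 4)


shape = descriptor_shape(1, 436, 1024)
print(shape)                                # (1, 128, 436, 1024)
print(descriptor_bytes(shape) / 1e9)        # about 0.229 GB in FP32
print(descriptor_bytes(shape, True) / 1e9)  # about 0.114 GB in FP16
```
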
75 | ## References
76 | 
77 | [1] C. Liu, J. Yuen, and A. Torralba. "SIFT Flow: Dense correspondence across scenes and its applications." IEEE Transactions on Pattern Analysis and Machine Intelligence 33.5 (2010): 978-994.
78 | 


--------------------------------------------------------------------------------
/third_party/flowiz/flowiz.py:
--------------------------------------------------------------------------------
  1 | # Converts Flow .flo files to Images
  2 | 
  3 | # Author : George Gach (@georgegach)
  4 | # Date   : July 2019
  5 | 
  6 | # Adapted from the Middlebury Vision project's Flow-Code
  7 | # URL    : http://vision.middlebury.edu/flow/
  8 | 
  9 | import numpy as np
 10 | import os
 11 | import errno
 12 | from tqdm import tqdm
 13 | from PIL import Image
 14 | import io
 15 | 
 16 | TAG_FLOAT = 202021.25
 17 | flags = {
 18 |     'debug': False
 19 | }
 20 | 
 21 | 
 22 | def read_flow(path):
 23 |     if not isinstance(path, io.BufferedReader):
 24 |         if not isinstance(path, str):
 25 |             raise AssertionError(
 26 |                 "Input [{p}] is not a string".format(p=path))
 27 |         if not os.path.isfile(path):
 28 |             raise AssertionError(
 29 |                 "Path [{p}] does not exist".format(p=path))
 30 |         if not path.split('.')[-1] == 'flo':
 31 |             raise AssertionError(
 32 |                 "File extension [flo] required, [{f}] given".format(f=path.split('.')[-1]))
 33 | 
 34 |         flo = open(path, 'rb')
 35 |     else:
 36 |         flo = path
 37 | 
 38 |     tag = np.frombuffer(flo.read(4), np.float32, count=1)[0]
 39 |     if not TAG_FLOAT == tag:
 40 |         raise AssertionError("Wrong Tag [{t}]".format(t=tag))
 41 | 
 42 |     width = np.frombuffer(flo.read(4), np.int32, count=1)[0]
 43 |     if not (width > 0 and width < 100000):
 44 |         raise AssertionError("Illegal width [{w}]".format(w=width))
 45 | 
 46 |     height = np.frombuffer(flo.read(4), np.int32, count=1)[0]
 47 |     if not (height > 0 and height < 100000):
 48 |         raise AssertionError("Illegal height [{h}]".format(h=height))
 49 | 
 50 |     nbands = 2
 51 |     tmp = np.frombuffer(flo.read(nbands * width * height * 4),
 52 |                         np.float32, count=nbands * width * height)
 53 |     flow = np.resize(tmp, (int(height), int(width), int(nbands)))
 54 |     flo.close()
 55 | 
 56 |     return flow
 57 | 
 58 | 
 59 | def _color_wheel():
 60 |     # Original inspiration: http://members.shaw.ca/quadibloc/other/colint.htm
 61 | 
 62 |     RY = 15
 63 |     YG = 6
 64 |     GC = 4
 65 |     CB = 11
 66 |     BM = 13
 67 |     MR = 6
 68 | 
 69 |     ncols = RY + YG + GC + CB + BM + MR
 70 | 
 71 |     colorwheel = np.zeros([ncols, 3])  # RGB
 72 | 
 73 |     col = 0
 74 | 
 75 |     # RY
 76 |     colorwheel[0:RY, 0] = 255
 77 |     colorwheel[0:RY, 1] = np.floor(255*np.arange(0, RY, 1)/RY)
 78 |     col += RY
 79 | 
 80 |     # YG
 81 |     colorwheel[col: YG + col, 0] = 255 - \
 82 |         np.floor(255*np.arange(0, YG, 1)/YG)
 83 |     colorwheel[col: YG + col, 1] = 255
 84 |     col += YG
 85 | 
 86 |     # GC
 87 |     colorwheel[col: GC + col, 1] = 255
 88 |     colorwheel[col: GC + col, 2] = np.floor(255*np.arange(0, GC, 1)/GC)
 89 |     col += GC
 90 | 
 91 |     # CB
 92 |     colorwheel[col: CB + col, 1] = 255 - \
 93 |         np.floor(255*np.arange(0, CB, 1)/CB)
 94 |     colorwheel[col: CB + col, 2] = 255
 95 |     col += CB
 96 | 
 97 |     # BM
 98 |     colorwheel[col: BM + col, 2] = 255
 99 |     colorwheel[col: BM + col, 0] = np.floor(255*np.arange(0, BM, 1)/BM)
100 |     col += BM
101 | 
102 |     # MR
103 |     colorwheel[col: MR + col, 2] = 255 - \
104 |         np.floor(255*np.arange(0, MR, 1)/MR)
105 |     colorwheel[col: MR + col, 0] = 255
106 | 
107 |     return colorwheel
108 | 
109 | 
110 | def _compute_color(u, v):
111 |     colorwheel = _color_wheel()
112 |     idxNans = np.where(np.logical_or(
113 |         np.isnan(u),
114 |         np.isnan(v)
115 |     ))
116 |     u[idxNans] = 0
117 |     v[idxNans] = 0
118 | 
119 |     ncols = colorwheel.shape[0]
120 |     radius = np.sqrt(np.multiply(u, u) + np.multiply(v, v))
121 |     a = np.arctan2(-v, -u) / np.pi
122 |     fk = (a+1) / 2 * (ncols - 1)
123 |     k0 = fk.astype(np.uint8)
124 |     k1 = k0 + 1
125 |     k1[k1 == ncols] = 0
126 |     f = fk - k0
127 | 
128 |     img = np.empty([k1.shape[0], k1.shape[1], 3])
129 |     ncolors = colorwheel.shape[1]
130 | 
131 |     for i in range(ncolors):
132 |         tmp = colorwheel[:, i]
133 |         col0 = tmp[k0] / 255
134 |         col1 = tmp[k1] / 255
135 |         col = (1-f) * col0 + f * col1
136 |         idx = radius <= 1
137 |         col[idx] = 1 - radius[idx] * (1 - col[idx])
138 |         col[~idx] *= 0.75
139 |         img[:, :, i] = np.floor(255 * col).astype(np.uint8)  # RGB
140 |         # img[:, :, 2 - i] = np.floor(255 * col).astype(np.uint8) # BGR
141 | 
142 |     return img.astype(np.uint8)
143 | 
144 | 
145 | def _normalize_flow(flow):
146 |     UNKNOWN_FLOW_THRESH = 1e9
147 |     # UNKNOWN_FLOW = 1e10
148 | 
149 |     height, width, nBands = flow.shape
150 |     if not nBands == 2:
151 |         raise AssertionError("Image must have two bands. [{h},{w},{nb}] shape given instead".format(
152 |             h=height, w=width, nb=nBands))
153 | 
154 |     u = flow[:, :, 0]
155 |     v = flow[:, :, 1]
156 | 
157 |     # Fix unknown flow
158 |     idxUnknown = np.where(np.logical_or(
159 |         abs(u) > UNKNOWN_FLOW_THRESH,
160 |         abs(v) > UNKNOWN_FLOW_THRESH
161 |     ))
162 |     u[idxUnknown] = 0
163 |     v[idxUnknown] = 0
164 | 
165 |     maxu = max([-999, np.max(u)])
166 |     maxv = max([-999, np.max(v)])
167 |     minu = min([999, np.min(u)])
168 |     minv = min([999, np.min(v)])
169 | 
170 |     rad = np.sqrt(np.multiply(u, u) + np.multiply(v, v))
171 |     maxrad = max([-1, np.max(rad)])
172 | 
173 |     if flags['debug']:
174 |         print("Max Flow : {maxrad:.4f}. Flow Range [u, v] -> [{minu:.3f}:{maxu:.3f}, {minv:.3f}:{maxv:.3f}] ".format(
175 |             minu=minu, minv=minv, maxu=maxu, maxv=maxv, maxrad=maxrad
176 |         ))
177 | 
178 |     eps = np.finfo(np.float32).eps
179 |     u = u/(maxrad + eps)
180 |     v = v/(maxrad + eps)
181 | 
182 |     return u, v
183 | 
184 | 
185 | def _flow2color(flow):
186 | 
187 |     u, v = _normalize_flow(flow)
188 |     img = _compute_color(u, v)
189 | 
190 |     # TO-DO
191 |     # Indicate unknown flows on the image
192 |     # Originally done as
193 |     #
194 |     # IDX = repmat(idxUnknown, [1 1 3]);
195 |     # img(IDX) = 0;
196 | 
197 |     return img
198 | 
199 | 
200 | def _flow2uv(flow):
201 |     u, v = _normalize_flow(flow)
202 |     uv = (np.dstack([u, v])*127.999+128).astype('uint8')
203 |     return uv
204 | 
205 | 
206 | def _save_png(arr, path):
207 |     # TO-DO: No dependency
208 |     Image.fromarray(arr).save(path)
209 | 
210 | 
211 | def convert_from_file(path, mode='RGB'):
212 |     return convert_from_flow(read_flow(path), mode)
213 | 
214 | 
215 | def convert_from_flow(flow, mode='RGB'):
216 |     if mode == 'RGB':
217 |         return _flow2color(flow)
218 |     if mode == 'UV':
219 |         return _flow2uv(flow)
220 | 
221 |     return _flow2color(flow)
222 | 
223 | 
224 | def convert_files(files, outdir=None):
225 |     if outdir is not None and not os.path.exists(outdir):
226 |         try:
227 |             os.makedirs(outdir)
228 |             print("> Created directory: " + outdir)
229 |         except OSError as exc:
230 |             if exc.errno != errno.EEXIST:
231 |                 raise
232 | 
233 |     t = tqdm(files)
234 |     for f in t:
235 |         image = convert_from_file(f)
236 | 
237 |         if outdir is None:
238 |             path = f + '.png'
239 |             t.set_description(path)
240 |             _save_png(image, path)
241 |         else:
242 |             path = os.path.join(outdir, os.path.basename(f) + '.png')
243 |             t.set_description(path)
244 |             _save_png(image, path)
245 | 
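
The binary layout that `read_flow` expects can be illustrated with a small round trip. This is a sketch using only the standard library, assuming little-endian byte order (the usual convention for `.flo` files, and what `np.frombuffer` yields on little-endian machines). The helpers `write_flo` and `read_flo_header` are hypothetical and not part of flowiz.

```python
import io
import struct

TAG_FLOAT = 202021.25  # same magic number that read_flow checks


def write_flo(stream, u_rows, v_rows):
    """Write horizontal (u) and vertical (v) flow, each given as a list of rows."""
    height, width = len(u_rows), len(u_rows[0])
    stream.write(struct.pack('<f', TAG_FLOAT))       # 4-byte float tag
    stream.write(struct.pack('<ii', width, height))  # two 4-byte integers
    for u_row, v_row in zip(u_rows, v_rows):
        for u, v in zip(u_row, v_row):
            stream.write(struct.pack('<ff', u, v))   # u and v are interleaved


def read_flo_header(stream):
    """Return (tag, width, height) from the first 12 bytes."""
    return struct.unpack('<fii', stream.read(12))


buf = io.BytesIO()
write_flo(buf,
          [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]],
          [[0.5, 0.5, 0.5], [0.5, 0.5, 0.5]])
buf.seek(0)
tag, width, height = read_flo_header(buf)
print(tag == TAG_FLOAT, width, height)  # True 3 2
```

The header takes 12 bytes and each pixel takes 8 bytes (two 32-bit floats), which matches the `nbands * width * height * 4` bytes that `read_flow` reads after the header.
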


--------------------------------------------------------------------------------
/sift_flow_torch.py:
--------------------------------------------------------------------------------
  1 | import numpy as np
  2 | import torch
  3 | import torch.nn.functional as F
  4 | 
  5 | class SiftFlowTorch(object):
  6 |     """ Computes dense SIFT Flow [1] descriptors from a batch of images. It
  7 |     uses PyTorch to perform operations on GPU (if available) to speed up
  8 |     the process significantly. This implementation is a port of the original
  9 |     implementation available at
 10 |     https://people.csail.mit.edu/celiu/SIFTflow/.
 11 | 
 12 |     This code is able to process a batch of images simultaneously for better
 13 |     performance. The most expensive operation when running in GPU mode is the
 14 |     allocation of the space for the descriptors on the GPU. However, this step
 15 |     is only performed when the shape of the input batch changes. Subsequent
 16 |     calls using batches with the same shape as before will reuse the memory and
 17 |     will, therefore, be much faster.
 18 | 
 19 |     Usage:
 20 |         from sift_flow_torch import SiftFlowTorch
 21 | 
 22 |         sift_flow = SiftFlowTorch()
 23 |         imgs = [
 24 |             read_some_image,
 25 |             read_another_image
 26 |         ]
 27 |         descs = sift_flow.extract_descriptor(imgs) # This first call can be
 28 |                                                    # slower, due to memory
 29 |                                                    # allocation
 30 |         imgs2 = [
 31 |             read_yet_another_image,
 32 |             read_even_one_more_image
 33 |         ]
 34 |         descs2 = sift_flow.extract_descriptor(imgs2) # Subsequent calls are 
 35 |                                                      # faster, if images retain
 36 |                                                      # the same shape
 37 | 
 38 |         # descs[0] is the descriptor of imgs[0] and so on.
 39 | 
 40 |     Args:
 41 |         cell_size : int, optional
 42 |             Size of the side of the cell used to compute the descriptor.
 43 |         step_size : int, optional
 44 |             Distance between the descriptor sampled points.
 45 |         is_boundary_included : boolean, optional
 46 |             If False, the descriptor is not computed for pixels in the image
 47 |             boundaries, to avoid boundary effects.
 48 |         num_bins : int, optional
 49 |             Number of bins of the descriptor.
 50 |         cuda : boolean, optional
 51 |             If True, operations are done on GPU (if available).
 52 |         fp16 : boolean, optional
 53 |             Whether to use half-precision floating points for computing the
 54 |             descriptors. Half-precision mode uses less memory and it may be
 55 |             slightly faster but less accurate.
 56 |         return_numpy : boolean, optional
 57 |             If True, transfers the descriptor from pytorch to numpy before
 58 |             returning. This will increase the running time due to the memory
 59 |             transfer.
 60 | 
 61 |     References:
 62 |     [1] C. Liu, J. Yuen, and A. Torralba. "SIFT Flow: Dense correspondence
 63 |         across scenes and its applications." IEEE Transactions on Pattern
 64 |         Analysis and Machine Intelligence 33.5 (2010): 978-994.
 65 |     """
 66 |     def __init__(self,
 67 |                  cell_size=2,
 68 |                  step_size=1,
 69 |                  is_boundary_included=True,
 70 |                  num_bins=8,
 71 |                  cuda=True,
 72 |                  fp16=False,
 73 |                  return_numpy=False):
 74 |         self.cell_size = cell_size
 75 |         self.step_size = step_size
 76 |         self.is_boundary_included = is_boundary_included
 77 |         self.num_bins = num_bins
 78 |         self.return_numpy = return_numpy
 79 |         self.cuda = cuda and torch.cuda.is_available()
 80 |         self.fp16 = fp16 and self.cuda
 81 |         if cuda and not torch.cuda.is_available():
 82 |             print('WARNING! CUDA mode requested, but',
 83 |                   'torch.cuda.is_available() is False.',
 84 |                   'Operations will run on CPU.')
 85 |         if fp16 and not self.cuda:
 86 |             print('WARNING! FP16 can only be used in CUDA mode, but',
 87 |                   'CUDA is not enabled. FP32 will be used instead.')
 88 | 
 89 |         # alpha controls the decay of the gradient energy that falls into a bin
 90 |         # run SIFT_weightFunc.m to see why alpha = 9 is the best value
 91 |         self.alpha = 9
 92 | 
 93 |         self.theta = 2 * np.pi / self.num_bins
 94 | 
 95 |         self.gradient = None
 96 |         self.imax_mag = None
 97 |         self.sin_bins = None
 98 |         self.cos_bins = None
 99 |         self.offsets = None
100 |         self.descs = None
101 |         self.grad_filter = None
102 |         self.max_batch_size = 1
103 | 
104 |         self.filter = self._compute_filter()
105 | 
106 |         if self.fp16:
107 |             self.epsilon = 1e-3
108 |         else:
109 |             self.epsilon = 1e-10
110 | 
111 |     def extract_descriptor(self,
112 |                            images):
113 |         """ Main function of this class, which extracts the descriptors from
114 |         a batch of images.
115 | 
116 |         Args:
117 |             images : list of 3D arrays of int or float.
118 |                 List of images to form the batch. All images should have the
119 |                 same shape [Hi, Wi, Ci], with any number of channels Ci. The
120 |                 pixel values are assumed to be in the interval [0, 255].
121 | 
122 |         Returns:
123 |             descs : 4D array of floats
124 |                 Grid of SIFT Flow descriptors for the given images, as an
125 |                 array of shape (N, Co, Ho, Wo) where
126 |                 ``N = len(images)``
127 |                 ``Co = 16 * num_bins``
128 |                 ``Ho = floor((Hi - D) / step_size)``
129 |                 ``Wo = floor((Wi - D) / step_size)``
130 |                 ``D = 0 if is_boundary_included else 4*cell_size``
131 |         """
132 |         images = np.stack(images, axis=0).transpose(0, 3, 1, 2)
133 |         images = torch.from_numpy(images)
134 |         if self.fp16:
135 |             images = images.half()
136 |         else:
137 |             images = images.float()
138 |         if self.cuda:
139 |             images = images.cuda()
140 |         images /= 255.0
141 | 
142 |         self.batch_size = images.shape[0]
143 |         self.max_batch_size = max(self.max_batch_size, self.batch_size)
144 | 
145 |         if self.grad_filter is None:
146 |             kernel = np.array(
147 |                 [-1, 0, 1, -2, 0, 2, -1, 0, 1], np.float32).reshape(3, 3)
148 |             gf = np.zeros((images.shape[1], images.shape[1], 3, 3))
149 |             for i in range(gf.shape[0]):
150 |                 gf[i, i] = kernel
151 |             self.grad_filter = torch.from_numpy(gf)
152 |             if self.fp16:
153 |                 self.grad_filter = self.grad_filter.half()
154 |             else:
155 |                 self.grad_filter = self.grad_filter.float()
156 |             if self.cuda:
157 |                 self.grad_filter = self.grad_filter.cuda()
158 |         images_pad = F.pad(
159 |             images, (1, 1, 1, 1), mode='replicate')
160 |         dx = F.conv2d(images_pad, self.grad_filter)
161 |         dy = F.conv2d(images_pad, self.grad_filter.permute(0, 1, 3, 2))
162 | 
163 |         # Get the maximum gradient over the channels and estimate the
164 |         # normalized gradient
165 |         magsrc = torch.sqrt(torch.pow(dx, 2) + torch.pow(dy, 2))
166 |         mag, max_mag_idx = torch.max(magsrc, dim=1, keepdim=True)
167 |         if (self.imax_mag is None or
168 |                 self.imax_mag.shape[0] < self.max_batch_size or
169 |                 self.imax_mag.shape[2:] != images.shape[2:]):
170 |             self.imax_mag = torch.zeros(
171 |                 (self.max_batch_size,) + images.shape[1:])
172 |             if self.fp16:
173 |                 self.imax_mag = self.imax_mag.half()
174 |             else:
175 |                 self.imax_mag = self.imax_mag.float()
176 |             if self.cuda:
177 |                 self.imax_mag = self.imax_mag.cuda()
178 |         imax_mag = self.imax_mag[:self.batch_size]
179 |         imax_mag[:] = 0
180 |         imax_mag = imax_mag.scatter_(1, max_mag_idx, 1)
181 | 
182 |         if (self.gradient is None or
183 |                 self.gradient.shape[0] < self.max_batch_size or
184 |                 self.gradient.shape[2:] != images.shape[2:]):
185 |             self.gradient = torch.zeros(
186 |                 self.max_batch_size, 2, images.shape[2], images.shape[3])
187 |             if self.fp16:
188 |                 self.gradient = self.gradient.half()
189 |             else:
190 |                 self.gradient = self.gradient.float()
191 |             if self.cuda:
192 |                 self.gradient = self.gradient.cuda()
193 |         gradient = self.gradient[:self.batch_size]
194 |         gradient[:, 0] = (
195 |             torch.sum(dx * imax_mag, dim=1) / (mag[:, 0] + self.epsilon))
196 |         gradient[:, 1] = (
197 |             torch.sum(dy * imax_mag, dim=1) / (mag[:, 0] + self.epsilon))
198 | 
199 |         # Get the pixel-wise energy for each orientation band
200 |         bin_shape = (self.max_batch_size, self.num_bins) + images.shape[2:]
201 |         if self.sin_bins is None or self.sin_bins.shape != bin_shape:
202 |             idx = torch.arange(self.num_bins).reshape(1, -1, 1, 1)
203 |             if self.fp16:
204 |                 idx = idx.half()
205 |             else:
206 |                 idx = idx.float()
207 |             if self.cuda:
208 |                 idx = idx.cuda()
209 |             idx = idx.repeat(
210 |                 self.max_batch_size, 1, images.shape[2], images.shape[3])
211 |             self.sin_bins = torch.sin(idx * self.theta)
212 |             self.cos_bins = torch.cos(idx * self.theta)
213 |         imband = (self.cos_bins[:self.batch_size] * gradient[:, :1] +
214 |                   self.sin_bins[:self.batch_size] * gradient[:, 1:2])
215 |         imband = torch.max(imband, torch.zeros_like(imband))
216 |         if self.alpha > 1:
217 |             imband = torch.pow(imband, self.alpha)
218 |         imband *= mag
219 | 
220 |         # Filter out the SIFT feature
221 |         imband_cell = self._filter_features(imband)
222 | 
223 |         # Allocate buffer for the sift image
224 |         siftdim = self.num_bins * 16
225 |         sift_height = images.shape[2] // self.step_size
226 |         sift_width = images.shape[3] // self.step_size
227 |         x_shift = 0
228 |         y_shift = 0
229 |         if not self.is_boundary_included:
230 |             sift_height = (images.shape[2]-4*self.cell_size) // self.step_size
231 |             sift_width = (images.shape[3]-4*self.cell_size) // self.step_size
232 |             x_shift = 2 * self.cell_size
233 |             y_shift = 2 * self.cell_size
234 | 
235 |         self._compute_offsets(
236 |             images.shape, sift_height, sift_width, x_shift, y_shift)
237 |         
238 |         desc_shape = (
239 |             self.max_batch_size, siftdim, sift_height, sift_width)
240 |         if self.descs is None or self.descs.shape != desc_shape:
241 |             self.descs = torch.empty(desc_shape)
242 |             if self.fp16:
243 |                 self.descs = self.descs.half()
244 |             else:
245 |                 self.descs = self.descs.float()
246 |             if self.cuda:
247 |                 self.descs = self.descs.cuda()
248 |         descs = self.descs[:self.batch_size]
249 |         for i in range(4):
250 |             for j in range(4):
251 |                 idx = 4 * i + j
252 |                 feats = F.grid_sample(
253 |                     imband_cell, self.offsets[
254 |                         :self.batch_size, :, :, 2*idx:2*idx+2],
255 |                     mode='nearest')
256 | 
257 |                 start = idx*self.num_bins
258 |                 end = (idx+1)*self.num_bins
259 |                 descs[:, start:end] = feats
260 | 
261 |         mag = descs.norm(dim=1, keepdim=True)
262 |         descs /= mag + 0.01
263 | 
264 |         if self.return_numpy:
265 |             descs = descs.detach().cpu().numpy()
266 | 
267 |         return descs
268 | 
269 |     def _compute_filter(self):
270 |         filt = np.zeros(
271 |             (self.num_bins, self.num_bins, 1, self.cell_size*2+1))
272 |         for i in range(self.num_bins):
273 |             filt[i, i, 0, 0] = 0.25
274 |             filt[i, i, 0, self.cell_size+1] = 0.25
275 |             for j in range(1, self.cell_size+1):
276 |                 filt[i, i, 0, j+1] = 1.0
277 |         filt = torch.from_numpy(filt)
278 |         if self.fp16:
279 |             filt = filt.half()
280 |         else:
281 |             filt = filt.float()
282 |         if self.cuda:
283 |             filt = filt.cuda()
284 |         return filt
285 | 
286 |     def _compute_offsets(self,
287 |                          images_shape,
288 |                          sift_height,
289 |                          sift_width,
290 |                          x_shift,
291 |                          y_shift):
292 |         if (self.offsets is None or
293 |                 self.offsets.shape[0] < self.max_batch_size or
294 |                 sift_height != self.offsets.shape[1] or
295 |                 sift_width != self.offsets.shape[2]):
296 |             xv, yv = torch.meshgrid(
297 |                 [torch.arange(0, sift_height),
298 |                  torch.arange(0, sift_width)])
299 |             grid = torch.stack([yv, xv], dim=2).unsqueeze(0)
300 |             if self.fp16:
301 |                 grid = grid.half()
302 |             else:
303 |                 grid = grid.float()
304 |             if self.cuda:
305 |                 grid = grid.cuda()
306 |             grid *= self.step_size
307 |             self.offsets = torch.zeros(
308 |                 self.max_batch_size, sift_height, sift_width, 32)
309 |             if self.fp16:
310 |                 self.offsets = self.offsets.half()
311 |             else:
312 |                 self.offsets = self.offsets.float()
313 |             if self.cuda:
314 |                 self.offsets = self.offsets.cuda()
315 |             for i in range(-1, 3):
316 |                 for j in range(-1, 3):
317 |                     off_y = y_shift + grid[:, :, :, 1] + i*self.cell_size
318 |                     off_y = torch.max(off_y, torch.zeros_like(off_y))
319 |                     off_y = torch.min(
320 |                         off_y, torch.zeros_like(off_y) + images_shape[2] - 1)
321 |                     off_x = x_shift + grid[:, :, :, 0] + j*self.cell_size
322 |                     off_x = torch.max(off_x, torch.zeros_like(off_x))
323 |                     off_x = torch.min(
324 |                         off_x, torch.zeros_like(off_x) + images_shape[3] - 1)
325 |                     idx = 4 * (i + 1) + j + 1
326 |                     self.offsets[:, :, :, 2*idx] = (
327 |                         off_x.repeat(self.max_batch_size, 1, 1))
328 |                     self.offsets[:, :, :, 2*idx+1] = (
329 |                         off_y.repeat(self.max_batch_size, 1, 1))
330 |             self.offsets[:, :, :, ::2] /= images_shape[3] - 1
331 |             self.offsets[:, :, :, 1::2] /= images_shape[2] - 1
332 |             self.offsets *= 2
333 |             self.offsets -= 1
334 | 
335 |     def _filter_features(self,
336 |                          imband):
337 |         radius = self.filter.shape[3] // 2
338 |         imband_smooth = imband
339 |         imband_smooth = F.pad(
340 |             imband_smooth, (radius, radius, 0, 0), mode='replicate')
341 |         imband_smooth = F.conv2d(imband_smooth, self.filter)
342 |         # print('imband_smooth', imband_smooth.min(), imband_smooth.max(), torch.abs(imband_smooth).sum())
343 |         imband_smooth = F.pad(
344 |             imband_smooth, (0, 0, radius, radius), mode='replicate')
345 |         imband_smooth = F.conv2d(
346 |             imband_smooth, self.filter.permute(0, 1, 3, 2))
347 |         # print('imband_smooth2', imband_smooth.min(), imband_smooth.max(), torch.abs(imband_smooth).sum())
348 |         return imband_smooth
349 | 
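
The orientation-band step in `extract_descriptor` (the `imband` computation) can be followed for a single gradient vector in plain Python. This is a simplified sketch, not the batched implementation: the function name `orientation_energy` is hypothetical, it handles one pixel of one channel, and it uses the same defaults as the class (`num_bins=8`, `alpha=9`).

```python
import math


def orientation_energy(dx, dy, num_bins=8, alpha=9, epsilon=1e-10):
    """Per-bin energy of one gradient vector, mirroring the imband computation."""
    mag = math.sqrt(dx * dx + dy * dy)
    gx, gy = dx / (mag + epsilon), dy / (mag + epsilon)  # normalized gradient
    theta = 2 * math.pi / num_bins
    energies = []
    for k in range(num_bins):
        # Cosine of the angle between the gradient and the direction of bin k,
        # clamped at zero so that opposite directions contribute nothing.
        response = max(math.cos(k * theta) * gx + math.sin(k * theta) * gy, 0.0)
        energies.append(mag * response ** alpha)
    return energies


# A gradient pointing along +x falls almost entirely into bin 0.
energies = orientation_energy(2.0, 0.0)
for k, e in enumerate(energies):
    print(k, round(e, 4))
```

Raising the clamped cosine to the power `alpha` makes the response fall off quickly for neighbouring bins: the bins 45 degrees away receive only about 4% of the energy of the aligned bin in this example.
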


--------------------------------------------------------------------------------