├── third_party
│   └── flowiz
│       ├── __init__.py
│       ├── README.md
│       ├── LICENSE
│       └── flowiz.py
├── mpi_sintel_images
│   ├── frame_0001.png
│   ├── frame_0002.png
│   └── README.md
├── LICENSE
├── README.md
└── sift_flow_torch.py
/third_party/flowiz/__init__.py:
--------------------------------------------------------------------------------
1 |
--------------------------------------------------------------------------------
/mpi_sintel_images/frame_0001.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/hmorimitsu/sift-flow-gpu/HEAD/mpi_sintel_images/frame_0001.png
--------------------------------------------------------------------------------
/mpi_sintel_images/frame_0002.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/hmorimitsu/sift-flow-gpu/HEAD/mpi_sintel_images/frame_0002.png
--------------------------------------------------------------------------------
/third_party/flowiz/README.md:
--------------------------------------------------------------------------------
1 | Flowiz code obtained from [https://github.com/georgegach/flowiz](https://github.com/georgegach/flowiz).
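
The vendored `flowiz.py` reads `.flo` files by checking a float32 tag (`202021.25`), then an int32 width and height, then `2 * width * height` float32 values. The following is a minimal standard-library sketch of that layout, not the flowiz implementation: the helper names `write_flo` and `read_flo` are hypothetical, and little-endian byte order is assumed.

```python
import io
import struct

TAG_FLOAT = 202021.25  # same magic value that flowiz.read_flow checks


def write_flo(stream, width, height, uv_values):
    """Hypothetical helper: write interleaved (u, v) values in .flo layout."""
    if len(uv_values) != 2 * width * height:
        raise ValueError("expected 2 * width * height flow values")
    stream.write(struct.pack('<f', TAG_FLOAT))        # 4-byte float tag
    stream.write(struct.pack('<ii', width, height))   # two 4-byte ints
    stream.write(struct.pack('<%df' % len(uv_values), *uv_values))


def read_flo(stream):
    """Hypothetical helper: read back what write_flo produced."""
    tag, = struct.unpack('<f', stream.read(4))
    if tag != TAG_FLOAT:
        raise ValueError("wrong tag: {}".format(tag))
    width, height = struct.unpack('<ii', stream.read(8))
    count = 2 * width * height
    values = struct.unpack('<%df' % count, stream.read(4 * count))
    return width, height, list(values)


# Round-trip a tiny 2 x 1 flow field through an in-memory buffer.
buf = io.BytesIO()
write_flo(buf, 2, 1, [0.5, -1.0, 2.0, 0.25])
buf.seek(0)
roundtrip = read_flo(buf)
```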
--------------------------------------------------------------------------------
/mpi_sintel_images/README.md:
--------------------------------------------------------------------------------
1 | Images collected from the MPI Sintel dataset available at [http://sintel.is.tue.mpg.de/downloads](http://sintel.is.tue.mpg.de/downloads).
2 |
3 | These images are the first two frames of the alley 1 - clean pass sequence.
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | MIT License
2 |
3 | Copyright (c) 2019 Henrique Morimitsu
4 |
5 | Permission is hereby granted, free of charge, to any person obtaining a copy
6 | of this software and associated documentation files (the "Software"), to deal
7 | in the Software without restriction, including without limitation the rights
8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 |
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 |
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 |
--------------------------------------------------------------------------------
/third_party/flowiz/LICENSE:
--------------------------------------------------------------------------------
1 | MIT License
2 |
3 | Copyright (c) 2019 George Gach
4 |
5 | Permission is hereby granted, free of charge, to any person obtaining a copy
6 | of this software and associated documentation files (the "Software"), to deal
7 | in the Software without restriction, including without limitation the rights
8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 |
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 |
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # sift-flow-gpu
2 |
3 | Implementation of the SIFT Flow descriptor [[1]](#references) on GPU using PyTorch.
4 |
5 | This implementation is a port of the original implementation available at
6 | [https://people.csail.mit.edu/celiu/SIFTflow/](https://people.csail.mit.edu/celiu/SIFTflow/).
7 |
8 | This code is able to process a batch of images simultaneously for better
9 | performance. The most expensive operation when running in GPU mode is the
10 | allocation of the space for the descriptors on the GPU. However, this step
11 | is only performed when the shape of the input batch changes. Subsequent
12 | calls using batches with the same shape as before will reuse the memory and
13 | will, therefore, be much faster.
14 |
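
The reuse behaviour described above can be illustrated with a minimal sketch. It is not the code from `sift_flow_torch.py`: `ReusableBuffer` is a hypothetical helper, and a plain Python list stands in for the GPU tensor. The idea is only that the expensive allocation runs when the requested shape differs from the cached one.

```python
class ReusableBuffer:
    """Hypothetical helper: reallocates only when the requested shape changes."""

    def __init__(self):
        self.shape = None
        self.data = None
        self.allocations = 0  # counts how often the expensive step ran

    def get(self, shape):
        shape = tuple(shape)
        if self.data is None or self.shape != shape:
            size = 1
            for dim in shape:
                size *= dim
            self.data = [0.0] * size  # stand-in for allocating a GPU tensor
            self.shape = shape
            self.allocations += 1
        return self.data


buffer = ReusableBuffer()
buffer.get((2, 128, 4, 4))  # first call allocates
buffer.get((2, 128, 4, 4))  # same shape: the cached buffer is reused
buffer.get((4, 128, 4, 4))  # new shape: allocates again
print(buffer.allocations)   # prints 2
```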
15 | Code for DAISY descriptors on GPU is also available at [https://github.com/hmorimitsu/daisy-gpu](https://github.com/hmorimitsu/daisy-gpu).
16 |
17 | ## Requirements
18 |
19 | - [Python 3](https://www.python.org/) (Tested on 3.7)
20 | - [Numpy](https://www.numpy.org/)
21 | - [PyTorch](https://pytorch.org/) >= 1.0.0 (Tested on 1.3.0)
22 |
23 | ## Usage
24 |
25 | A simple example is shown below. A more complete, practical example is available in the [Jupyter demo notebook](demo_notebook_torch.ipynb).
26 |
27 | ```python
28 | from sift_flow_torch import SiftFlowTorch
29 |
30 | sift_flow = SiftFlowTorch()
31 | imgs = [
32 |     read_some_image,
33 |     read_another_image
34 | ]
35 | descs = sift_flow.extract_descriptor(imgs) # This first call can be
36 |                                            # slower, due to memory allocation
37 | imgs2 = [
38 |     read_yet_another_image,
39 |     read_even_one_more_image
40 | ]
41 | descs2 = sift_flow.extract_descriptor(imgs2) # Subsequent calls are faster,
42 |                                              # if images retain same shape
43 |
44 | # descs[0] is the descriptor of imgs[0] and so on.
45 | ```
46 |
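
To know what `extract_descriptor` will return for a given batch, the shape formula in its docstring (in `sift_flow_torch.py`) can be written out as a small function. This is a sketch of that documented formula only; `expected_descriptor_shape` is a hypothetical helper, not part of the library.

```python
def expected_descriptor_shape(num_images, height, width,
                              cell_size=2, step_size=1,
                              is_boundary_included=True, num_bins=8):
    """Hypothetical helper: (N, Co, Ho, Wo) as documented for extract_descriptor."""
    # D = 0 if the boundary is included, otherwise 4 * cell_size
    d = 0 if is_boundary_included else 4 * cell_size
    channels = 16 * num_bins             # Co = 16 * num_bins
    out_h = (height - d) // step_size    # Ho = floor((Hi - D) / step_size)
    out_w = (width - d) // step_size     # Wo = floor((Wi - D) / step_size)
    return (num_images, channels, out_h, out_w)


# Two 1024 x 436 images (width x height) with the default parameters.
full = expected_descriptor_shape(2, 436, 1024)
# Same batch, but skipping the image boundary.
cropped = expected_descriptor_shape(2, 436, 1024, is_boundary_included=False)
```

With the defaults, the channel dimension is 128, which matches the descriptor size quoted in the benchmark.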
47 | ## Benchmark
48 |
49 | - Machine configuration:
50 |   - Intel i7 8750H
51 |   - NVIDIA GeForce GTX1070
52 | - Images 1024 x 436
53 | - Descriptor size 128
54 |
55 | Batch Size|FP16|Memory usage (GB)<sup>1</sup>|Time GPU (ms)<sup>2</sup>|Time GPU (ms)<sup>3</sup>|Time CPU (ms)
56 | -|------------------|---|------|------|------
57 | 1| |0.9| 19.0| 128.0| 660.6
58 | 2| |1.3| 35.3| 257.1|1275.1
59 | 4| |2.1| 70.7| 516.2|2559.3
60 | 8| |3.7| 142.5| 969.4|5773.9
61 | 1|:heavy_check_mark:|0.7| 14.7| |
62 | 2|:heavy_check_mark:|0.9| 27.2| |
63 | 4|:heavy_check_mark:|1.3| 54.8| |
64 | 8|:heavy_check_mark:|2.1| 110.9| |
65 |
66 | <sup>1</sup> Maximum value reported by `nvidia-smi` during the respective tests.
67 |
68 | <sup>2</sup> NOT including time to transfer the result from GPU to CPU.
69 |
70 | <sup>3</sup> Including time to transfer the result from GPU to CPU.
71 |
72 | These times are the median of 5 runs measured after a warm-up run to allocate the descriptor space in memory
73 | (read the [introduction](#sift-flow-gpu)).
74 |
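
As a rough cross-check of the memory column, the size of the descriptor buffer alone can be estimated from the benchmark settings. This is a back-of-the-envelope sketch under the assumption of 4 bytes per FP32 value and 2 bytes per FP16 value; `descriptor_buffer_bytes` is a hypothetical helper, and the other working buffers are not counted.

```python
def descriptor_buffer_bytes(batch_size, height, width,
                            descriptor_size=128, fp16=False):
    """Hypothetical helper: bytes needed for the (N, C, H, W) descriptor tensor."""
    bytes_per_value = 2 if fp16 else 4
    return batch_size * descriptor_size * height * width * bytes_per_value


# Benchmark setting: 1024 x 436 images, descriptor size 128, batch size 1.
fp32_bytes = descriptor_buffer_bytes(1, 436, 1024)
fp16_bytes = descriptor_buffer_bytes(1, 436, 1024, fp16=True)
print(fp32_bytes, fp16_bytes)  # prints 228589568 114294784
```

That is about 0.21 GiB per image in FP32, so the descriptor buffer accounts for only part of the figures reported by `nvidia-smi`.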
75 | ## References
76 |
77 | [1] C. Liu; Jenny Yuen; Antonio Torralba. "SIFT Flow: Dense correspondence across scenes and its applications." IEEE Transactions on Pattern Analysis and Machine Intelligence 33.5 (2010): 978-994.
78 |
--------------------------------------------------------------------------------
/third_party/flowiz/flowiz.py:
--------------------------------------------------------------------------------
1 | # Converts Flow .flo files to Images
2 |
3 | # Author : George Gach (@georgegach)
4 | # Date : July 2019
5 |
6 | # Adapted from the Middlebury Vision project's Flow-Code
7 | # URL : http://vision.middlebury.edu/flow/
8 |
9 | import numpy as np
10 | import os
11 | import errno
12 | from tqdm import tqdm
13 | from PIL import Image
14 | import io
15 |
16 | TAG_FLOAT = 202021.25
17 | flags = {
18 |     'debug': False
19 | }
20 |
21 |
22 | def read_flow(path):
23 |     if not isinstance(path, io.BufferedReader):
24 |         if not isinstance(path, str):
25 |             raise AssertionError(
26 |                 "Input [{p}] is not a string".format(p=path))
27 |         if not os.path.isfile(path):
28 |             raise AssertionError(
29 |                 "Path [{p}] does not exist".format(p=path))
30 |         if not path.split('.')[-1] == 'flo':
31 |             raise AssertionError(
32 |                 "File extension [flo] required, [{f}] given".format(f=path.split('.')[-1]))
33 |
34 |         flo = open(path, 'rb')
35 |     else:
36 |         flo = path
37 |
38 |     tag = np.frombuffer(flo.read(4), np.float32, count=1)[0]
39 |     if not TAG_FLOAT == tag:
40 |         raise AssertionError("Wrong Tag [{t}]".format(t=tag))
41 |
42 |     width = np.frombuffer(flo.read(4), np.int32, count=1)[0]
43 |     if not (width > 0 and width < 100000):
44 |         raise AssertionError("Illegal width [{w}]".format(w=width))
45 |
46 |     height = np.frombuffer(flo.read(4), np.int32, count=1)[0]
47 |     if not (height > 0 and height < 100000):
48 |         raise AssertionError("Illegal height [{h}]".format(h=height))
49 |
50 |     nbands = 2
51 |     tmp = np.frombuffer(flo.read(nbands * width * height * 4),
52 |                         np.float32, count=nbands * width * height)
53 |     flow = np.resize(tmp, (int(height), int(width), int(nbands)))
54 |     flo.close()
55 |
56 |     return flow
57 |
58 |
59 | def _color_wheel():
60 |     # Original inspiration: http://members.shaw.ca/quadibloc/other/colint.htm
61 |
62 |     RY = 15
63 |     YG = 6
64 |     GC = 4
65 |     CB = 11
66 |     BM = 13
67 |     MR = 6
68 |
69 |     ncols = RY + YG + GC + CB + BM + MR
70 |
71 |     colorwheel = np.zeros([ncols, 3])  # RGB
72 |
73 |     col = 0
74 |
75 |     # RY
76 |     colorwheel[0:RY, 0] = 255
77 |     colorwheel[0:RY, 1] = np.floor(255*np.arange(0, RY, 1)/RY)
78 |     col += RY
79 |
80 |     # YG
81 |     colorwheel[col: YG + col, 0] = 255 - \
82 |         np.floor(255*np.arange(0, YG, 1)/YG)
83 |     colorwheel[col: YG + col, 1] = 255
84 |     col += YG
85 |
86 |     # GC
87 |     colorwheel[col: GC + col, 1] = 255
88 |     colorwheel[col: GC + col, 2] = np.floor(255*np.arange(0, GC, 1)/GC)
89 |     col += GC
90 |
91 |     # CB
92 |     colorwheel[col: CB + col, 1] = 255 - \
93 |         np.floor(255*np.arange(0, CB, 1)/CB)
94 |     colorwheel[col: CB + col, 2] = 255
95 |     col += CB
96 |
97 |     # BM
98 |     colorwheel[col: BM + col, 2] = 255
99 |     colorwheel[col: BM + col, 0] = np.floor(255*np.arange(0, BM, 1)/BM)
100 |     col += BM
101 |
102 |     # MR
103 |     colorwheel[col: MR + col, 2] = 255 - \
104 |         np.floor(255*np.arange(0, MR, 1)/MR)
105 |     colorwheel[col: MR + col, 0] = 255
106 |
107 |     return colorwheel
108 |
109 |
110 | def _compute_color(u, v):
111 |     colorwheel = _color_wheel()
112 |     idxNans = np.where(np.logical_or(
113 |         np.isnan(u),
114 |         np.isnan(v)
115 |     ))
116 |     u[idxNans] = 0
117 |     v[idxNans] = 0
118 |
119 |     ncols = colorwheel.shape[0]
120 |     radius = np.sqrt(np.multiply(u, u) + np.multiply(v, v))
121 |     a = np.arctan2(-v, -u) / np.pi
122 |     fk = (a+1) / 2 * (ncols - 1)
123 |     k0 = fk.astype(np.uint8)
124 |     k1 = k0 + 1
125 |     k1[k1 == ncols] = 0
126 |     f = fk - k0
127 |
128 |     img = np.empty([k1.shape[0], k1.shape[1], 3])
129 |     ncolors = colorwheel.shape[1]
130 |
131 |     for i in range(ncolors):
132 |         tmp = colorwheel[:, i]
133 |         col0 = tmp[k0] / 255
134 |         col1 = tmp[k1] / 255
135 |         col = (1-f) * col0 + f * col1
136 |         idx = radius <= 1
137 |         col[idx] = 1 - radius[idx] * (1 - col[idx])
138 |         col[~idx] *= 0.75
139 |         img[:, :, i] = np.floor(255 * col).astype(np.uint8)  # RGB
140 |         # img[:, :, 2 - i] = np.floor(255 * col).astype(np.uint8) # BGR
141 |
142 |     return img.astype(np.uint8)
143 |
144 |
145 | def _normalize_flow(flow):
146 |     UNKNOWN_FLOW_THRESH = 1e9
147 |     # UNKNOWN_FLOW = 1e10
148 |
149 |     height, width, nBands = flow.shape
150 |     if not nBands == 2:
151 |         raise AssertionError("Image must have two bands. [{h},{w},{nb}] shape given instead".format(
152 |             h=height, w=width, nb=nBands))
153 |
154 |     u = flow[:, :, 0]
155 |     v = flow[:, :, 1]
156 |
157 |     # Fix unknown flow
158 |     idxUnknown = np.where(np.logical_or(
159 |         abs(u) > UNKNOWN_FLOW_THRESH,
160 |         abs(v) > UNKNOWN_FLOW_THRESH
161 |     ))
162 |     u[idxUnknown] = 0
163 |     v[idxUnknown] = 0
164 |
165 |     maxu = max([-999, np.max(u)])
166 |     maxv = max([-999, np.max(v)])
167 |     minu = min([999, np.min(u)])
168 |     minv = min([999, np.min(v)])
169 |
170 |     rad = np.sqrt(np.multiply(u, u) + np.multiply(v, v))
171 |     maxrad = max([-1, np.max(rad)])
172 |
173 |     if flags['debug']:
174 |         print("Max Flow : {maxrad:.4f}. Flow Range [u, v] -> [{minu:.3f}:{maxu:.3f}, {minv:.3f}:{maxv:.3f}] ".format(
175 |             minu=minu, minv=minv, maxu=maxu, maxv=maxv, maxrad=maxrad
176 |         ))
177 |
178 |     eps = np.finfo(np.float32).eps
179 |     u = u/(maxrad + eps)
180 |     v = v/(maxrad + eps)
181 |
182 |     return u, v
183 |
184 |
185 | def _flow2color(flow):
186 |
187 |     u, v = _normalize_flow(flow)
188 |     img = _compute_color(u, v)
189 |
190 |     # TO-DO
191 |     # Indicate unknown flows on the image
192 |     # Originally done as
193 |     #
194 |     # IDX = repmat(idxUnknown, [1 1 3]);
195 |     # img(IDX) = 0;
196 |
197 |     return img
198 |
199 |
200 | def _flow2uv(flow):
201 |     u, v = _normalize_flow(flow)
202 |     uv = (np.dstack([u, v])*127.999+128).astype('uint8')
203 |     return uv
204 |
205 |
206 | def _save_png(arr, path):
207 |     # TO-DO: No dependency
208 |     Image.fromarray(arr).save(path)
209 |
210 |
211 | def convert_from_file(path, mode='RGB'):
212 |     return convert_from_flow(read_flow(path), mode)
213 |
214 |
215 | def convert_from_flow(flow, mode='RGB'):
216 |     if mode == 'RGB':
217 |         return _flow2color(flow)
218 |     if mode == 'UV':
219 |         return _flow2uv(flow)
220 |
221 |     return _flow2color(flow)
222 |
223 |
224 | def convert_files(files, outdir=None):
225 |     if outdir is not None and not os.path.exists(outdir):
226 |         try:
227 |             os.makedirs(outdir)
228 |             print("> Created directory: " + outdir)
229 |         except OSError as exc:
230 |             if exc.errno != errno.EEXIST:
231 |                 raise
232 |
233 |     t = tqdm(files)
234 |     for f in t:
235 |         image = convert_from_file(f)
236 |
237 |         if outdir is None:
238 |             path = f + '.png'
239 |             t.set_description(path)
240 |             _save_png(image, path)
241 |         else:
242 |             path = os.path.join(outdir, os.path.basename(f) + '.png')
243 |             t.set_description(path)
244 |             _save_png(image, path)
245 |
--------------------------------------------------------------------------------
/sift_flow_torch.py:
--------------------------------------------------------------------------------
1 | import numpy as np
2 | import torch
3 | import torch.nn.functional as F
4 |
5 | class SiftFlowTorch(object):
6 |     """ Computes dense SIFT Flow [1] descriptors from a batch of images. It
7 |     uses PyTorch to perform operations on GPU (if available) to significantly
8 |     speed up the process. This implementation is a port of the original
9 |     implementation available at
10 |     https://people.csail.mit.edu/celiu/SIFTflow/.
11 |
12 |     This code is able to process a batch of images simultaneously for better
13 |     performance. The most expensive operation when running in GPU mode is the
14 |     allocation of the space for the descriptors on the GPU. However, this step
15 |     is only performed when the shape of the input batch changes. Subsequent
16 |     calls using batches with the same shape as before will reuse the memory and
17 |     will, therefore, be much faster.
18 |
19 |     Usage:
20 |         from sift_flow_torch import SiftFlowTorch
21 |
22 |         sift_flow = SiftFlowTorch()
23 |         imgs = [
24 |             read_some_image,
25 |             read_another_image
26 |         ]
27 |         descs = sift_flow.extract_descriptor(imgs) # This first call can be
28 |                                                    # slower, due to memory
29 |                                                    # allocation
30 |         imgs2 = [
31 |             read_yet_another_image,
32 |             read_even_one_more_image
33 |         ]
34 |         descs2 = sift_flow.extract_descriptor(imgs2) # Subsequent calls are
35 |                                                      # faster, if images retain
36 |                                                      # the same shape
37 |
38 |         # descs[0] is the descriptor of imgs[0] and so on.
39 |
40 |     Args:
41 |         cell_size : int, optional
42 |             Size of the side of the cell used to compute the descriptor.
43 |         step_size : int, optional
44 |             Distance between the descriptor sampled points.
45 |         is_boundary_included : boolean, optional
46 |             If False, the descriptor is not computed for pixels in the image
47 |             boundaries, to avoid boundary effects.
48 |         num_bins : int, optional
49 |             Number of bins of the descriptor.
50 |         cuda : boolean, optional
51 |             If True, operations are done on GPU (if available).
52 |         fp16 : boolean, optional
53 |             Whether to use half-precision floating points for computing the
54 |             descriptors. Half-precision mode uses less memory and it may be
55 |             slightly faster but less accurate.
56 |         return_numpy : boolean, optional
57 |             If True, transfers the descriptor from pytorch to numpy before
58 |             returning. This will increase the running time due to the memory
59 |             transfer.
60 |
61 |     References:
62 |         [1] C. Liu; Jenny Yuen; Antonio Torralba. "SIFT Flow: Dense correspondence
63 |         across scenes and its applications." IEEE Transactions on Pattern
64 |         Analysis and Machine Intelligence 33.5 (2010): 978-994.
65 |     """
66 |     def __init__(self,
67 |                  cell_size=2,
68 |                  step_size=1,
69 |                  is_boundary_included=True,
70 |                  num_bins=8,
71 |                  cuda=True,
72 |                  fp16=False,
73 |                  return_numpy=False):
74 |         self.cell_size = cell_size
75 |         self.step_size = step_size
76 |         self.is_boundary_included = is_boundary_included
77 |         self.num_bins = num_bins
78 |         self.return_numpy = return_numpy
79 |         self.cuda = cuda and torch.cuda.is_available()
80 |         self.fp16 = fp16 and self.cuda
81 |         if cuda and not torch.cuda.is_available():
82 |             print('WARNING! CUDA mode requested, but',
83 |                   'torch.cuda.is_available() is False.',
84 |                   'Operations will run on CPU.')
85 |         if fp16 and not self.cuda:
86 |             print('WARNING! FP16 can only be used in CUDA mode, but',
87 |                   'CUDA is not enabled. FP32 will be used instead.')
88 |
89 |         # this parameter controls the decay of the gradient energy that falls into a bin
90 |         # run SIFT_weightFunc.m to see why alpha = 9 is the best value
91 |         self.alpha = 9
92 |
93 |         self.theta = 2 * np.pi / self.num_bins
94 |
95 |         self.gradient = None
96 |         self.imax_mag = None
97 |         self.sin_bins = None
98 |         self.cos_bins = None
99 |         self.offsets = None
100 |         self.descs = None
101 |         self.grad_filter = None
102 |         self.max_batch_size = 1
103 |
104 |         self.filter = self._compute_filter()
105 |
106 |         if self.fp16:
107 |             self.epsilon = 1e-3
108 |         else:
109 |             self.epsilon = 1e-10
110 |
111 |     def extract_descriptor(self,
112 |                            images):
113 |         """ Main function of this class, which extracts the descriptors from
114 |         a batch of images.
115 |
116 |         Args:
117 |             images : list of 3D array of int or float.
118 |                 List of images to form the batch. All images should have the
119 |                 same shape [Hi, Wi, Ci], with any number of channels Ci. The
120 |                 pixel values are assumed to be in the interval [0, 255].
121 |
122 |         Returns:
123 |             descs : 4D array of floats
124 |                 Grid of SIFT Flow descriptors for the given image as an array
125 |                 of dimensionality (N, Co, Ho, Wo) where
126 |                 ``N = len(images)``
127 |                 ``Co = 16 * num_bins``
128 |                 ``Ho = floor((Hi - D) / step_size)``
129 |                 ``Wo = floor((Wi - D) / step_size)``
130 |                 ``D = 0 if is_boundary_included else 4*cell_size``
131 |         """
132 |         images = np.stack(images, axis=0).transpose(0, 3, 1, 2)
133 |         images = torch.from_numpy(images)
134 |         if self.fp16:
135 |             images = images.half()
136 |         else:
137 |             images = images.float()
138 |         if self.cuda:
139 |             images = images.cuda()
140 |         images /= 255.0
141 |
142 |         self.batch_size = images.shape[0]
143 |         self.max_batch_size = max(self.max_batch_size, self.batch_size)
144 |
145 |         if self.grad_filter is None:
146 |             kernel = np.array(
147 |                 [-1, 0, 1, -2, 0, 2, -1, 0, 1], np.float32).reshape(3, 3)
148 |             gf = np.zeros((images.shape[1], images.shape[1], 3, 3))
149 |             for i in range(gf.shape[0]):
150 |                 gf[i, i] = kernel
151 |             self.grad_filter = torch.from_numpy(gf)
152 |             if self.fp16:
153 |                 self.grad_filter = self.grad_filter.half()
154 |             else:
155 |                 self.grad_filter = self.grad_filter.float()
156 |             if self.cuda:
157 |                 self.grad_filter = self.grad_filter.cuda()
158 |         images_pad = F.pad(
159 |             images, (1, 1, 1, 1), mode='replicate')
160 |         dx = F.conv2d(images_pad, self.grad_filter)
161 |         dy = F.conv2d(images_pad, self.grad_filter.permute(0, 1, 3, 2))
162 |
163 |         # Get the maximum gradient over the channels and estimate the
164 |         # normalized gradient
165 |         magsrc = torch.sqrt(torch.pow(dx, 2) + torch.pow(dy, 2))
166 |         mag, max_mag_idx = torch.max(magsrc, dim=1, keepdim=True)
167 |         if (self.imax_mag is None or
168 |                 self.imax_mag.shape[0] < self.max_batch_size or
169 |                 self.imax_mag.shape[2:] != images.shape[2:]):
170 |             self.imax_mag = torch.zeros(
171 |                 (self.max_batch_size,) + images.shape[1:])
172 |             if self.fp16:
173 |                 self.imax_mag = self.imax_mag.half()
174 |             else:
175 |                 self.imax_mag = self.imax_mag.float()
176 |             if self.cuda:
177 |                 self.imax_mag = self.imax_mag.cuda()
178 |         imax_mag = self.imax_mag[:self.batch_size]
179 |         imax_mag[:] = 0
180 |         imax_mag = imax_mag.scatter_(1, max_mag_idx, 1)
181 |
182 |         if (self.gradient is None or
183 |                 self.gradient.shape[0] < self.max_batch_size or
184 |                 self.gradient.shape[2:] != images.shape[2:]):
185 |             self.gradient = torch.zeros(
186 |                 self.max_batch_size, 2, images.shape[2], images.shape[3])
187 |             if self.fp16:
188 |                 self.gradient = self.gradient.half()
189 |             else:
190 |                 self.gradient = self.gradient.float()
191 |             if self.cuda:
192 |                 self.gradient = self.gradient.cuda()
193 |         gradient = self.gradient[:self.batch_size]
194 |         gradient[:, 0] = (
195 |             torch.sum(dx * imax_mag, dim=1) / (mag[:, 0] + self.epsilon))
196 |         gradient[:, 1] = (
197 |             torch.sum(dy * imax_mag, dim=1) / (mag[:, 0] + self.epsilon))
198 |
199 |         # Get the pixel-wise energy for each orientation band
200 |         bin_shape = (self.max_batch_size, self.num_bins, images.shape[2], images.shape[3])
201 |         if self.sin_bins is None or self.sin_bins.shape != bin_shape:
202 |             idx = torch.arange(self.num_bins).reshape(1, -1, 1, 1)
203 |             if self.fp16:
204 |                 idx = idx.half()
205 |             else:
206 |                 idx = idx.float()
207 |             if self.cuda:
208 |                 idx = idx.cuda()
209 |             idx = idx.repeat(
210 |                 self.max_batch_size, 1, images.shape[2], images.shape[3])
211 |             self.sin_bins = torch.sin(idx * self.theta)
212 |             self.cos_bins = torch.cos(idx * self.theta)
213 |         imband = (self.cos_bins[:self.batch_size] * gradient[:, :1] +
214 |                   self.sin_bins[:self.batch_size] * gradient[:, 1:2])
215 |         imband = torch.max(imband, torch.zeros_like(imband))
216 |         if self.alpha > 1:
217 |             imband = torch.pow(imband, self.alpha)
218 |         imband *= mag
219 |
220 |         # Filter out the SIFT feature
221 |         imband_cell = self._filter_features(imband)
222 |
223 |         # Allocate buffer for the sift image
224 |         siftdim = self.num_bins * 16
225 |         sift_height = images.shape[2] // self.step_size
226 |         sift_width = images.shape[3] // self.step_size
227 |         x_shift = 0
228 |         y_shift = 0
229 |         if not self.is_boundary_included:
230 |             sift_height = (images.shape[2]-4*self.cell_size) // self.step_size
231 |             sift_width = (images.shape[3]-4*self.cell_size) // self.step_size
232 |             x_shift = 2 * self.cell_size
233 |             y_shift = 2 * self.cell_size
234 |
235 |         self._compute_offsets(
236 |             images.shape, sift_height, sift_width, x_shift, y_shift)
237 |
238 |         desc_shape = (
239 |             self.max_batch_size, siftdim, sift_height, sift_width)
240 |         if self.descs is None or self.descs.shape != desc_shape:
241 |             self.descs = torch.empty(desc_shape)
242 |             if self.fp16:
243 |                 self.descs = self.descs.half()
244 |             else:
245 |                 self.descs = self.descs.float()
246 |             if self.cuda:
247 |                 self.descs = self.descs.cuda()
248 |         descs = self.descs[:self.batch_size]
249 |         for i in range(4):
250 |             for j in range(4):
251 |                 idx = 4 * i + j
252 |                 feats = F.grid_sample(
253 |                     imband_cell, self.offsets[
254 |                         :self.batch_size, :, :, 2*idx:2*idx+2],
255 |                     mode='nearest')
256 |
257 |                 start = idx*self.num_bins
258 |                 end = (idx+1)*self.num_bins
259 |                 descs[:, start:end] = feats
260 |
261 |         mag = descs.norm(dim=1, keepdim=True)
262 |         descs /= mag + 0.01
263 |
264 |         if self.return_numpy:
265 |             descs = descs.detach().cpu().numpy()
266 |
267 |         return descs
268 |
269 |     def _compute_filter(self):
270 |         filt = np.zeros(
271 |             (self.num_bins, self.num_bins, 1, self.cell_size*2+1))
272 |         for i in range(self.num_bins):
273 |             filt[i, i, 0, 0] = 0.25
274 |             filt[i, i, 0, self.cell_size+1] = 0.25
275 |             for j in range(1, self.cell_size+1):
276 |                 filt[i, i, 0, j+1] = 1.0
277 |         filt = torch.from_numpy(filt)
278 |         if self.fp16:
279 |             filt = filt.half()
280 |         else:
281 |             filt = filt.float()
282 |         if self.cuda:
283 |             filt = filt.cuda()
284 |         return filt
285 |
286 |     def _compute_offsets(self,
287 |                          images_shape,
288 |                          sift_height,
289 |                          sift_width,
290 |                          x_shift,
291 |                          y_shift):
292 |         if (self.offsets is None or
293 |                 self.offsets.shape[0] < self.max_batch_size or
294 |                 sift_height != self.offsets.shape[1] or
295 |                 sift_width != self.offsets.shape[2]):
296 |             xv, yv = torch.meshgrid(
297 |                 [torch.arange(0, sift_height),
298 |                  torch.arange(0, sift_width)])
299 |             grid = torch.stack([yv, xv], dim=2).unsqueeze(0)
300 |             if self.fp16:
301 |                 grid = grid.half()
302 |             else:
303 |                 grid = grid.float()
304 |             if self.cuda:
305 |                 grid = grid.cuda()
306 |             grid *= self.step_size
307 |             self.offsets = torch.zeros(
308 |                 self.max_batch_size, sift_height, sift_width, 32)
309 |             if self.fp16:
310 |                 self.offsets = self.offsets.half()
311 |             else:
312 |                 self.offsets = self.offsets.float()
313 |             if self.cuda:
314 |                 self.offsets = self.offsets.cuda()
315 |             for i in range(-1, 3):
316 |                 for j in range(-1, 3):
317 |                     off_y = y_shift + grid[:, :, :, 1] + i*self.cell_size
318 |                     off_y = torch.max(off_y, torch.zeros_like(off_y))
319 |                     off_y = torch.min(
320 |                         off_y, torch.zeros_like(off_y) + images_shape[2] - 1)
321 |                     off_x = x_shift + grid[:, :, :, 0] + j*self.cell_size
322 |                     off_x = torch.max(off_x, torch.zeros_like(off_x))
323 |                     off_x = torch.min(
324 |                         off_x, torch.zeros_like(off_x) + images_shape[3] - 1)
325 |                     idx = 4 * (i + 1) + j + 1
326 |                     self.offsets[:, :, :, 2*idx] = (
327 |                         off_x.repeat(self.max_batch_size, 1, 1))
328 |                     self.offsets[:, :, :, 2*idx+1] = (
329 |                         off_y.repeat(self.max_batch_size, 1, 1))
330 |             self.offsets[:, :, :, ::2] /= images_shape[3] - 1
331 |             self.offsets[:, :, :, 1::2] /= images_shape[2] - 1
332 |             self.offsets *= 2
333 |             self.offsets -= 1
334 |
335 |     def _filter_features(self,
336 |                          imband):
337 |         radius = self.filter.shape[3] // 2
338 |         imband_smooth = imband
339 |         imband_smooth = F.pad(
340 |             imband_smooth, (radius, radius, 0, 0), mode='replicate')
341 |         imband_smooth = F.conv2d(imband_smooth, self.filter)
342 |         # print('imband_smooth', imband_smooth.min(), imband_smooth.max(), torch.abs(imband_smooth).sum())
343 |         imband_smooth = F.pad(
344 |             imband_smooth, (0, 0, radius, radius), mode='replicate')
345 |         imband_smooth = F.conv2d(
346 |             imband_smooth, self.filter.permute(0, 1, 3, 2))
347 |         # print('imband_smooth2', imband_smooth.min(), imband_smooth.max(), torch.abs(imband_smooth).sum())
348 |         return imband_smooth
349 |
--------------------------------------------------------------------------------