├── third_party
│   └── flowiz
│       ├── __init__.py
│       ├── README.md
│       ├── LICENSE
│       └── flowiz.py
├── mpi_sintel_images
│   ├── frame_0001.png
│   ├── frame_0002.png
│   └── README.md
├── LICENSE
├── README.md
└── sift_flow_torch.py

/third_party/flowiz/__init__.py:
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
/mpi_sintel_images/frame_0001.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/hmorimitsu/sift-flow-gpu/HEAD/mpi_sintel_images/frame_0001.png
--------------------------------------------------------------------------------
/mpi_sintel_images/frame_0002.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/hmorimitsu/sift-flow-gpu/HEAD/mpi_sintel_images/frame_0002.png
--------------------------------------------------------------------------------
/third_party/flowiz/README.md:
--------------------------------------------------------------------------------
Flowiz code obtained from [https://github.com/georgegach/flowiz](https://github.com/georgegach/flowiz).
--------------------------------------------------------------------------------
/mpi_sintel_images/README.md:
--------------------------------------------------------------------------------
Images collected from the MPI Sintel dataset, available at [http://sintel.is.tue.mpg.de/downloads](http://sintel.is.tue.mpg.de/downloads).

These images are the first two frames of the clean pass of the alley 1 sequence.
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
MIT License

Copyright (c) 2019 Henrique Morimitsu

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

--------------------------------------------------------------------------------
/third_party/flowiz/LICENSE:
--------------------------------------------------------------------------------
MIT License

Copyright (c) 2019 George Gach

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# sift-flow-gpu

Implementation of the SIFT Flow descriptor [[1]](#references) on GPU using PyTorch.

This implementation is a port of the original implementation available at
[https://people.csail.mit.edu/celiu/SIFTflow/](https://people.csail.mit.edu/celiu/SIFTflow/).

This code can process a batch of images simultaneously for better
performance. When running on the GPU, the most expensive operation is
allocating memory for the descriptors. However, this allocation is only
performed when the input shape changes or when the batch is larger than any
previous one. Subsequent calls with batches of the same shape reuse the
allocated memory and are, therefore, much faster.

Code for DAISY descriptors on GPU is also available at [https://github.com/hmorimitsu/daisy-gpu](https://github.com/hmorimitsu/daisy-gpu).

## Requirements

- [Python 3](https://www.python.org/) (Tested on 3.7)
- [NumPy](https://www.numpy.org/)
- [PyTorch](https://pytorch.org/) >= 1.0.0 (Tested on 1.3.0)

## Usage

A simple example is shown below. A more complete, practical example is available in the [Jupyter demo notebook](demo_notebook_torch.ipynb).

```python
from sift_flow_torch import SiftFlowTorch

sift_flow = SiftFlowTorch()
imgs = [
    read_some_image,    # H x W x C arrays with values in [0, 255]
    read_another_image
]
descs = sift_flow.extract_descriptor(imgs)  # This first call can be
                                            # slower, due to memory allocation
# With return_numpy=False (the default), descs is a view of an internal buffer
# that the next call with the same shape overwrites; use descs.clone() to keep it.
imgs2 = [
    read_yet_another_image,
    read_even_one_more_image
]
descs2 = sift_flow.extract_descriptor(imgs2)  # Subsequent calls are faster,
                                              # if images retain the same shape

# descs[0] is the descriptor of imgs[0], and so on.
```

## Benchmark

- Machine configuration:
  - Intel i7 8750H
  - NVIDIA GeForce GTX1070
- Images 1024 x 436
- Descriptor size 128

Batch Size|FP16|Memory usage (GB)<sup>1</sup>|Time GPU (ms)<sup>2</sup>|Time GPU (ms)<sup>3</sup>|Time CPU (ms)
-|------------------|---|------|------|------
1| |0.9| 19.0| 128.0| 660.6
2| |1.3| 35.3| 257.1|1275.1
4| |2.1| 70.7| 516.2|2559.3
8| |3.7| 142.5| 969.4|5773.9
1|:heavy_check_mark:|0.7| 14.7| |
2|:heavy_check_mark:|0.9| 27.2| |
4|:heavy_check_mark:|1.3| 54.8| |
8|:heavy_check_mark:|2.1| 110.9| |

<sup>1</sup> Maximum value reported by `nvidia-smi` during the respective tests.

<sup>2</sup> NOT including the time to transfer the result from GPU to CPU.

<sup>3</sup> Including the time to transfer the result from GPU to CPU.

These times are the median of 5 runs, measured after a warm-up run that allocates the descriptor memory
(see the [introduction](#sift-flow-gpu)).

## References

[1] C. Liu, J. Yuen, and A. Torralba. "SIFT Flow: Dense correspondence across scenes and its applications." IEEE Transactions on Pattern Analysis and Machine Intelligence 33.5 (2010): 978-994.
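The descriptor size of 128 in the benchmark equals `16 * num_bins` for the default `num_bins=8`: a 4 x 4 grid of cells, each contributing an 8-bin orientation histogram. The sketch below is a hypothetical helper, not part of `sift_flow_torch.py`, that only evaluates the output-shape formula documented in `SiftFlowTorch.extract_descriptor`; it may be handy for sizing downstream buffers before running the GPU code.

```python
def sift_flow_output_shape(batch_size, height, width, cell_size=2,
                           step_size=1, is_boundary_included=True,
                           num_bins=8):
    """Return (N, Co, Ho, Wo) as documented for extract_descriptor.

    Hypothetical helper for illustration only; it evaluates the documented
    formula and does not call SiftFlowTorch.
    """
    # Border pixels skipped when boundaries are excluded (2 * cell_size per side)
    d = 0 if is_boundary_included else 4 * cell_size
    # One num_bins-orientation histogram for each cell of the 4 x 4 grid
    channels = 16 * num_bins
    return (batch_size, channels,
            (height - d) // step_size, (width - d) // step_size)


# Benchmark setting: batch of 8 images of 1024 x 436 (W x H), default options
print(sift_flow_output_shape(8, 436, 1024))  # (8, 128, 436, 1024)
```

With `is_boundary_included=False` and the default `cell_size=2`, the same images would yield a 428 x 1016 grid instead.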

--------------------------------------------------------------------------------
/third_party/flowiz/flowiz.py:
--------------------------------------------------------------------------------
# Converts Flow .flo files to Images

# Author : George Gach (@georgegach)
# Date   : July 2019

# Adapted from the Middlebury Vision project's Flow-Code
# URL    : http://vision.middlebury.edu/flow/

import numpy as np
import os
import errno
from tqdm import tqdm
from PIL import Image
import io

TAG_FLOAT = 202021.25
flags = {
    'debug': False
}


def read_flow(path):
    if not isinstance(path, io.BufferedReader):
        if not isinstance(path, str):
            raise AssertionError(
                "Input [{p}] is not a string".format(p=path))
        if not os.path.isfile(path):
            raise AssertionError(
                "Path [{p}] does not exist".format(p=path))
        if not path.split('.')[-1] == 'flo':
            raise AssertionError(
                "File extension [flo] required, [{f}] given".format(f=path.split('.')[-1]))

        flo = open(path, 'rb')
    else:
        flo = path

    tag = np.frombuffer(flo.read(4), np.float32, count=1)[0]
    if not TAG_FLOAT == tag:
        raise AssertionError("Wrong Tag [{t}]".format(t=tag))

    width = np.frombuffer(flo.read(4), np.int32, count=1)[0]
    if not (width > 0 and width < 100000):
        raise AssertionError("Illegal width [{w}]".format(w=width))

    height = np.frombuffer(flo.read(4), np.int32, count=1)[0]
    if not (height > 0 and height < 100000):
        raise AssertionError("Illegal height [{h}]".format(h=height))

    nbands = 2
    tmp = np.frombuffer(flo.read(nbands * width * height * 4),
                        np.float32, count=nbands * width * height)
    flow = np.resize(tmp, (int(height), int(width), int(nbands)))
    flo.close()

    return flow


def _color_wheel():
    # Original inspiration: http://members.shaw.ca/quadibloc/other/colint.htm

    RY = 15
    YG = 6
    GC = 4
    CB = 11
    BM = 13
    MR = 6

    ncols = RY + YG + GC + CB + BM + MR

    colorwheel = np.zeros([ncols, 3])  # RGB

    col = 0

    # RY
    colorwheel[0:RY, 0] = 255
    colorwheel[0:RY, 1] = np.floor(255*np.arange(0, RY, 1)/RY)
    col += RY

    # YG
    colorwheel[col: YG + col, 0] = 255 - \
        np.floor(255*np.arange(0, YG, 1)/YG)
    colorwheel[col: YG + col, 1] = 255
    col += YG

    # GC
    colorwheel[col: GC + col, 1] = 255
    colorwheel[col: GC + col, 2] = np.floor(255*np.arange(0, GC, 1)/GC)
    col += GC

    # CB
    colorwheel[col: CB + col, 1] = 255 - \
        np.floor(255*np.arange(0, CB, 1)/CB)
    colorwheel[col: CB + col, 2] = 255
    col += CB

    # BM
    colorwheel[col: BM + col, 2] = 255
    colorwheel[col: BM + col, 0] = np.floor(255*np.arange(0, BM, 1)/BM)
    col += BM

    # MR
    colorwheel[col: MR + col, 2] = 255 - \
        np.floor(255*np.arange(0, MR, 1)/MR)
    colorwheel[col: MR + col, 0] = 255

    return colorwheel


def _compute_color(u, v):
    colorwheel = _color_wheel()
    idxNans = np.where(np.logical_or(
        np.isnan(u),
        np.isnan(v)
    ))
    u[idxNans] = 0
    v[idxNans] = 0

    ncols = colorwheel.shape[0]
    radius = np.sqrt(np.multiply(u, u) + np.multiply(v, v))
    a = np.arctan2(-v, -u) / np.pi
    fk = (a+1) / 2 * (ncols - 1)
    k0 = fk.astype(np.uint8)
    k1 = k0 + 1
    k1[k1 == ncols] = 0
    f = fk - k0

    img = np.empty([k1.shape[0], k1.shape[1], 3])
    ncolors = colorwheel.shape[1]

    for i in range(ncolors):
        tmp = colorwheel[:, i]
        col0 = tmp[k0] / 255
        col1 = tmp[k1] / 255
        col = (1-f) * col0 + f * col1
        idx = radius <= 1
        col[idx] = 1 - radius[idx] * (1 - col[idx])
        col[~idx] *= 0.75
        img[:, :, i] = np.floor(255 * col).astype(np.uint8)  # RGB
        # img[:, :, 2 - i] = np.floor(255 * col).astype(np.uint8)  # BGR

    return img.astype(np.uint8)


def _normalize_flow(flow):
    UNKNOWN_FLOW_THRESH = 1e9
    # UNKNOWN_FLOW = 1e10

    height, width, nBands = flow.shape
    if not nBands == 2:
        raise AssertionError("Image must have two bands. [{h},{w},{nb}] shape given instead".format(
            h=height, w=width, nb=nBands))

    u = flow[:, :, 0]
    v = flow[:, :, 1]

    # Fix unknown flow
    idxUnknown = np.where(np.logical_or(
        abs(u) > UNKNOWN_FLOW_THRESH,
        abs(v) > UNKNOWN_FLOW_THRESH
    ))
    u[idxUnknown] = 0
    v[idxUnknown] = 0

    maxu = max([-999, np.max(u)])
    maxv = max([-999, np.max(v)])
    minu = min([999, np.min(u)])
    minv = min([999, np.min(v)])

    rad = np.sqrt(np.multiply(u, u) + np.multiply(v, v))
    maxrad = max([-1, np.max(rad)])

    if flags['debug']:
        print("Max Flow : {maxrad:.4f}. Flow Range [u, v] -> [{minu:.3f}:{maxu:.3f}, {minv:.3f}:{maxv:.3f}] ".format(
            minu=minu, minv=minv, maxu=maxu, maxv=maxv, maxrad=maxrad
        ))

    eps = np.finfo(np.float32).eps
    u = u/(maxrad + eps)
    v = v/(maxrad + eps)

    return u, v


def _flow2color(flow):

    u, v = _normalize_flow(flow)
    img = _compute_color(u, v)

    # TO-DO
    # Indicate unknown flows on the image
    # Originally done as
    #
    # IDX = repmat(idxUnknown, [1 1 3]);
    # img(IDX) = 0;

    return img


def _flow2uv(flow):
    u, v = _normalize_flow(flow)
    uv = (np.dstack([u, v])*127.999+128).astype('uint8')
    return uv


def _save_png(arr, path):
    # TO-DO: No dependency
    Image.fromarray(arr).save(path)


def convert_from_file(path, mode='RGB'):
    return convert_from_flow(read_flow(path), mode)


def convert_from_flow(flow, mode='RGB'):
    if mode == 'RGB':
        return _flow2color(flow)
    if mode == 'UV':
        return _flow2uv(flow)

    return _flow2color(flow)


def convert_files(files, outdir=None):
    if outdir is not None and not os.path.exists(outdir):
        try:
            os.makedirs(outdir)
            print("> Created directory: " + outdir)
        except OSError as exc:
            if exc.errno != errno.EEXIST:
                raise

    t = tqdm(files)
    for f in t:
        image = convert_from_file(f)

        if outdir is None:
            path = f + '.png'
            t.set_description(path)
            _save_png(image, path)
        else:
            path = os.path.join(outdir, os.path.basename(f) + '.png')
            t.set_description(path)
            _save_png(image, path)
--------------------------------------------------------------------------------
/sift_flow_torch.py:
--------------------------------------------------------------------------------
import numpy as np
import torch
import torch.nn.functional as F


class SiftFlowTorch(object):
    """ Computes dense SIFT Flow [1] descriptors from a batch of images. It
    uses PyTorch to perform operations on GPU (if available) to significantly
    speed up the process. This implementation is a port of the original
    implementation available at
    https://people.csail.mit.edu/celiu/SIFTflow/.

    This code can process a batch of images simultaneously for better
    performance. When running on the GPU, the most expensive operation is
    allocating memory for the descriptors. However, this allocation is only
    performed when the input shape changes or when the batch is larger than
    any previous one. Subsequent calls with batches of the same shape reuse
    the allocated memory and are, therefore, much faster.

    Usage:
        from sift_flow_torch import SiftFlowTorch

        sift_flow = SiftFlowTorch()
        imgs = [
            read_some_image,
            read_another_image
        ]
        descs = sift_flow.extract_descriptor(imgs)  # This first call can be
                                                    # slower, due to memory
                                                    # allocation
        imgs2 = [
            read_yet_another_image,
            read_even_one_more_image
        ]
        descs2 = sift_flow.extract_descriptor(imgs2)  # Subsequent calls are
                                                      # faster, if images retain
                                                      # the same shape

        # descs[0] is the descriptor of imgs[0], and so on.

    Args:
        cell_size : int, optional
            Size of the side of the cell used to compute the descriptor.
        step_size : int, optional
            Distance between the descriptor sampled points.
        is_boundary_included : boolean, optional
            If False, the descriptor is not computed for pixels in the image
            boundaries, to avoid boundary effects.
        num_bins : int, optional
            Number of orientation bins of the descriptor.
        cuda : boolean, optional
            If True, operations are done on GPU (if available).
        fp16 : boolean, optional
            Whether to use half-precision floating points for computing the
            descriptors. Half-precision mode uses less memory and may be
            slightly faster, but it is less accurate.
        return_numpy : boolean, optional
            If True, transfers the descriptor from PyTorch to NumPy before
            returning. This increases the running time due to the memory
            transfer.

    References:
        [1] C. Liu, J. Yuen, and A. Torralba. "SIFT Flow: Dense correspondence
            across scenes and its applications." IEEE Transactions on Pattern
            Analysis and Machine Intelligence 33.5 (2010): 978-994.
    """
    def __init__(self,
                 cell_size=2,
                 step_size=1,
                 is_boundary_included=True,
                 num_bins=8,
                 cuda=True,
                 fp16=False,
                 return_numpy=False):
        self.cell_size = cell_size
        self.step_size = step_size
        self.is_boundary_included = is_boundary_included
        self.num_bins = num_bins
        self.return_numpy = return_numpy
        self.cuda = cuda and torch.cuda.is_available()
        self.fp16 = fp16 and self.cuda
        if cuda and not torch.cuda.is_available():
            print('WARNING! CUDA mode requested, but',
                  'torch.cuda.is_available() is False.',
                  'Operations will run on CPU.')
        if fp16 and not self.cuda:
            print('WARNING! FP16 can only be used in CUDA mode, but',
                  'CUDA is not enabled. FP32 will be used instead.')

        # This parameter controls the decay of the gradient energy that falls
        # into a bin. Run SIFT_weightFunc.m to see why alpha = 9 is the best
        # value.
        self.alpha = 9

        self.theta = 2 * np.pi / self.num_bins

        self.gradient = None
        self.imax_mag = None
        self.sin_bins = None
        self.cos_bins = None
        self.offsets = None
        self.descs = None
        self.grad_filter = None
        self.max_batch_size = 1

        self.filter = self._compute_filter()

        if self.fp16:
            self.epsilon = 1e-3
        else:
            self.epsilon = 1e-10

    def extract_descriptor(self,
                           images):
        """ Main function of this class, which extracts the descriptors from
        a batch of images.

        Args:
            images : list of 3D arrays of int or float
                List of images to form the batch. All images should have the
                same shape [Hi, Wi, Ci], with any number of channels Ci. The
                pixel values are assumed to be in the interval [0, 255].

        Returns:
            descs : 4D array of floats
                Grid of SIFT Flow descriptors for the given images as an array
                of dimensionality (N, Co, Ho, Wo), where
                ``N = len(images)``
                ``Co = 16 * num_bins``
                ``Ho = floor((Hi - D) / step_size)``
                ``Wo = floor((Wi - D) / step_size)``
                ``D = 0 if is_boundary_included else 4*cell_size``
                It is a torch.Tensor, or a NumPy array if return_numpy is
                True. It may share memory with an internal buffer that is
                reused by the next call with the same output shape, so copy
                it (e.g. ``descs.clone()``) if it must be kept.
        """
        images = np.stack(images, axis=0).transpose(0, 3, 1, 2)
        images = torch.from_numpy(images)
        if self.fp16:
            images = images.half()
        else:
            images = images.float()
        if self.cuda:
            images = images.cuda()
        images /= 255.0

        self.batch_size = images.shape[0]
        self.max_batch_size = max(self.max_batch_size, self.batch_size)

        # (Re)build the per-channel Sobel filter if the channel count changed
        if (self.grad_filter is None or
                self.grad_filter.shape[0] != images.shape[1]):
            kernel = np.array(
                [-1, 0, 1, -2, 0, 2, -1, 0, 1], np.float32).reshape(3, 3)
            gf = np.zeros((images.shape[1], images.shape[1], 3, 3))
            for i in range(gf.shape[0]):
                gf[i, i] = kernel
            self.grad_filter = torch.from_numpy(gf)
            if self.fp16:
                self.grad_filter = self.grad_filter.half()
            else:
                self.grad_filter = self.grad_filter.float()
            if self.cuda:
                self.grad_filter = self.grad_filter.cuda()
        images_pad = F.pad(
            images, (1, 1, 1, 1), mode='replicate')
        dx = F.conv2d(images_pad, self.grad_filter)
        dy = F.conv2d(images_pad, self.grad_filter.permute(0, 1, 3, 2))

        # Get the maximum gradient over the channels and estimate the
        # normalized gradient
        magsrc = torch.sqrt(torch.pow(dx, 2) + torch.pow(dy, 2))
        mag, max_mag_idx = torch.max(magsrc, dim=1, keepdim=True)
        if (self.imax_mag is None or
                self.imax_mag.shape[0] < self.max_batch_size or
                self.imax_mag.shape[1:] != images.shape[1:]):
            self.imax_mag = torch.zeros(
                (self.max_batch_size,) + images.shape[1:])
            if self.fp16:
                self.imax_mag = self.imax_mag.half()
            else:
                self.imax_mag = self.imax_mag.float()
            if self.cuda:
                self.imax_mag = self.imax_mag.cuda()
        imax_mag = self.imax_mag[:self.batch_size]
        imax_mag[:] = 0
        imax_mag = imax_mag.scatter_(1, max_mag_idx, 1)

        if (self.gradient is None or
                self.gradient.shape[0] < self.max_batch_size or
                self.gradient.shape[2:] != images.shape[2:]):
            self.gradient = torch.zeros(
                self.max_batch_size, 2, images.shape[2], images.shape[3])
            if self.fp16:
                self.gradient = self.gradient.half()
            else:
                self.gradient = self.gradient.float()
            if self.cuda:
                self.gradient = self.gradient.cuda()
        gradient = self.gradient[:self.batch_size]
        gradient[:, 0] = (
            torch.sum(dx * imax_mag, dim=1) / (mag[:, 0] + self.epsilon))
        gradient[:, 1] = (
            torch.sum(dy * imax_mag, dim=1) / (mag[:, 0] + self.epsilon))

        # Get the pixel-wise energy for each orientation band. The sin/cos
        # tables have one channel per bin, so the cache check must use
        # num_bins (otherwise they are rebuilt on every call).
        bin_shape = (self.max_batch_size, self.num_bins,
                     images.shape[2], images.shape[3])
        if self.sin_bins is None or self.sin_bins.shape != bin_shape:
            idx = torch.arange(self.num_bins).reshape(1, -1, 1, 1)
            if self.fp16:
                idx = idx.half()
            else:
                idx = idx.float()
            if self.cuda:
                idx = idx.cuda()
            idx = idx.repeat(
                self.max_batch_size, 1, images.shape[2], images.shape[3])
            self.sin_bins = torch.sin(idx * self.theta)
            self.cos_bins = torch.cos(idx * self.theta)
        imband = (self.cos_bins[:self.batch_size] * gradient[:, :1] +
                  self.sin_bins[:self.batch_size] * gradient[:, 1:2])
        imband = torch.max(imband, torch.zeros_like(imband))
        if self.alpha > 1:
            imband = torch.pow(imband, self.alpha)
        imband *= mag

        # Filter out the SIFT feature
        imband_cell = self._filter_features(imband)

        # Allocate buffer for the sift image
        siftdim = self.num_bins * 16
        sift_height = images.shape[2] // self.step_size
        sift_width = images.shape[3] // self.step_size
        x_shift = 0
        y_shift = 0
        if not self.is_boundary_included:
            sift_height = (images.shape[2]-4*self.cell_size) // self.step_size
            sift_width = (images.shape[3]-4*self.cell_size) // self.step_size
            x_shift = 2 * self.cell_size
            y_shift = 2 * self.cell_size

        self._compute_offsets(
            images.shape, sift_height, sift_width, x_shift, y_shift)

        desc_shape = (
            self.max_batch_size, siftdim, sift_height, sift_width)
        if self.descs is None or self.descs.shape != desc_shape:
            self.descs = torch.empty(desc_shape)
            if self.fp16:
                self.descs = self.descs.half()
            else:
                self.descs = self.descs.float()
            if self.cuda:
                self.descs = self.descs.cuda()
        descs = self.descs[:self.batch_size]
        for i in range(4):
            for j in range(4):
                idx = 4 * i + j
                feats = F.grid_sample(
                    imband_cell, self.offsets[
                        :self.batch_size, :, :, 2*idx:2*idx+2],
                    mode='nearest')

                start = idx*self.num_bins
                end = (idx+1)*self.num_bins
                descs[:, start:end] = feats

        mag = descs.norm(dim=1, keepdim=True)
        descs /= mag + 0.01

        if self.return_numpy:
            descs = descs.detach().cpu().numpy()

        return descs

    def _compute_filter(self):
        filt = np.zeros(
            (self.num_bins, self.num_bins, 1, self.cell_size*2+1))
        for i in range(self.num_bins):
            filt[i, i, 0, 0] = 0.25
            filt[i, i, 0, self.cell_size+1] = 0.25
            for j in range(1, self.cell_size+1):
                filt[i, i, 0, j+1] = 1.0
        filt = torch.from_numpy(filt)
        if self.fp16:
            filt = filt.half()
        else:
            filt = filt.float()
        if self.cuda:
            filt = filt.cuda()
        return filt

    def _compute_offsets(self,
                         images_shape,
                         sift_height,
                         sift_width,
                         x_shift,
                         y_shift):
        if (self.offsets is None or
                self.offsets.shape[0] < self.max_batch_size or
                sift_height != self.offsets.shape[1] or
                sift_width != self.offsets.shape[2]):
            xv, yv = torch.meshgrid(
                [torch.arange(0, sift_height),
                 torch.arange(0, sift_width)])
            grid = torch.stack([yv, xv], dim=2).unsqueeze(0)
            if self.fp16:
                grid = grid.half()
            else:
                grid = grid.float()
            if self.cuda:
                grid = grid.cuda()
            grid *= self.step_size
            self.offsets = torch.zeros(
                self.max_batch_size, sift_height, sift_width, 32)
            if self.fp16:
                self.offsets = self.offsets.half()
            else:
                self.offsets = self.offsets.float()
            if self.cuda:
                self.offsets = self.offsets.cuda()
            for i in range(-1, 3):
                for j in range(-1, 3):
                    off_y = y_shift + grid[:, :, :, 1] + i*self.cell_size
                    off_y = torch.max(off_y, torch.zeros_like(off_y))
                    off_y = torch.min(
                        off_y, torch.zeros_like(off_y) + images_shape[2] - 1)
                    off_x = x_shift + grid[:, :, :, 0] + j*self.cell_size
                    off_x = torch.max(off_x, torch.zeros_like(off_x))
                    off_x = torch.min(
                        off_x, torch.zeros_like(off_x) + images_shape[3] - 1)
                    idx = 4 * (i + 1) + j + 1
                    # off_x and off_y have shape (1, H, W) and broadcast over
                    # the whole (max_batch_size, H, W) buffer, even when the
                    # current batch is smaller than max_batch_size
                    self.offsets[:, :, :, 2*idx] = off_x
                    self.offsets[:, :, :, 2*idx+1] = off_y
            self.offsets[:, :, :, ::2] /= images_shape[3] - 1
            self.offsets[:, :, :, 1::2] /= images_shape[2] - 1
            self.offsets *= 2
            self.offsets -= 1

    def _filter_features(self,
                         imband):
        radius = self.filter.shape[3] // 2
        imband_smooth = imband
        imband_smooth = F.pad(
            imband_smooth, (radius, radius, 0, 0), mode='replicate')
        imband_smooth = F.conv2d(imband_smooth, self.filter)
        # print('imband_smooth', imband_smooth.min(), imband_smooth.max(), torch.abs(imband_smooth).sum())
        imband_smooth = F.pad(
            imband_smooth, (0, 0, radius, radius), mode='replicate')
        imband_smooth = F.conv2d(
            imband_smooth, self.filter.permute(0, 1, 3, 2))
        # print('imband_smooth2', imband_smooth.min(), imband_smooth.max(), torch.abs(imband_smooth).sum())
        return imband_smooth
--------------------------------------------------------------------------------
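In `sift_flow_torch.py`, `extract_descriptor` turns each pixel's gradient into per-orientation energies: it projects the unit gradient onto `num_bins` directions spaced `theta = 2 * pi / num_bins` apart, clamps negative responses to zero, raises them to the power `alpha = 9`, and scales the result by the gradient magnitude. The NumPy sketch below reproduces that arithmetic for a single-channel pixel, leaving out the spatial pooling and grid sampling that follow; `orientation_energy` is a hypothetical name used only for illustration and is not part of the repository.

```python
import numpy as np


def orientation_energy(dx, dy, num_bins=8, alpha=9, epsilon=1e-10):
    """Energy of one pixel's gradient in each orientation bin.

    Hypothetical helper mirroring the per-pixel arithmetic of
    SiftFlowTorch.extract_descriptor (single channel, no pooling).
    """
    mag = np.hypot(dx, dy)
    # Unit gradient direction (epsilon avoids division by zero in flat areas)
    gx, gy = dx / (mag + epsilon), dy / (mag + epsilon)
    theta = 2 * np.pi / num_bins
    angles = np.arange(num_bins) * theta
    # Projection onto each bin direction, i.e. cos(bin angle - gradient angle)
    band = np.cos(angles) * gx + np.sin(angles) * gy
    # Opposite-facing bins get nothing; alpha sharpens the angular falloff
    band = np.maximum(band, 0.0)
    if alpha > 1:
        band = band ** alpha
    return band * mag


energy = orientation_energy(3.0, 0.0)  # gradient along +x with magnitude 3
print(np.round(energy, 4))
```

Because of the `alpha` exponent, a gradient along +x puts almost all of its energy in bin 0, while the two neighboring bins at 45 degrees each receive only `cos(pi/4) ** 9`, roughly 4.4%, of the magnitude.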