├── README.md ├── benchmark └── deep-visual-geo-localization-benchmark │ ├── README.md │ ├── anyloc.txt │ ├── anyloc_vlad_generate.py │ ├── cam.py │ ├── cct.txt │ ├── commons.py │ ├── cosplace.txt │ ├── dataloaders │ ├── GSVCitiesDataloader.py │ ├── GSVCitiesDataset.py │ ├── MapillaryDataset.py │ └── PittsburgDataset.py │ ├── datasets_ws.py │ ├── dino_extractor.py │ ├── eval.py │ ├── ind_name.py │ ├── main.py │ ├── mixvpr.txt │ ├── mixvpr_result.txt │ ├── model │ ├── __init__.py │ ├── aggregation.py │ ├── cct │ │ ├── __init__.py │ │ ├── cct.py │ │ ├── embedder.py │ │ ├── helpers.py │ │ ├── stochastic_depth.py │ │ ├── tokenizer.py │ │ └── transformers.py │ ├── commands.txt │ ├── functional.py │ ├── network.py │ ├── normalization.py │ └── sync_batchnorm │ │ ├── __init__.py │ │ ├── batchnorm.py │ │ ├── batchnorm_reimpl.py │ │ ├── comm.py │ │ ├── replicate.py │ │ └── unittest.py │ ├── models │ ├── __init__.py │ ├── aggregators │ │ ├── __init__.py │ │ ├── convap.py │ │ ├── cosplace.py │ │ ├── gem.py │ │ └── mixvpr.py │ ├── backbones │ │ ├── __init__.py │ │ ├── efficientnet.py │ │ ├── resnet.py │ │ └── swin.py │ └── helper.py │ ├── parser.py │ ├── pytorch_grad_cam │ ├── __init__.py │ ├── ablation_cam.py │ ├── ablation_cam_multilayer.py │ ├── ablation_layer.py │ ├── activations_and_gradients.py │ ├── base_cam.py │ ├── eigen_cam.py │ ├── eigen_grad_cam.py │ ├── feature_factorization │ │ ├── __init__.py │ │ └── deep_feature_factorization.py │ ├── fullgrad_cam.py │ ├── grad_cam.py │ ├── grad_cam_elementwise.py │ ├── grad_cam_plusplus.py │ ├── guided_backprop.py │ ├── hirescam.py │ ├── layer_cam.py │ ├── metrics │ │ ├── __init__.py │ │ ├── cam_mult_image.py │ │ ├── perturbation_confidence.py │ │ └── road.py │ ├── random_cam.py │ ├── score_cam.py │ ├── sobel_cam.py │ ├── utils │ │ ├── __init__.py │ │ ├── find_layers.py │ │ ├── image.py │ │ ├── model_targets.py │ │ ├── reshape_transforms.py │ │ └── svd_on_activations.py │ └── xgrad_cam.py │ ├── requirements.txt │ ├── resnet.txt │ ├── resnet_result.txt │ ├── results │ ├── anyloc_eval.txt │ ├── anyloc_name.txt │ ├── cct_eval.txt │ ├── cct_name.txt │ ├── cosplace_eval.txt │ ├── cosplace_name.txt │ ├── mixvpr_eval.txt │ ├── mixvpr_name.txt │ ├── resnet_eval.txt │ ├── resnet_name.txt │ └── result.txt │ ├── sbatch.txt │ ├── scratch.py │ ├── summary.py │ ├── test.SBATCH │ ├── test.py │ ├── test1.py │ ├── test_database.txt │ ├── test_queries.txt │ ├── train.py │ ├── train_1.py │ ├── util.py │ ├── utilities.py │ ├── utilities1.py │ ├── utils │ ├── __init__.py │ ├── losses.py │ └── validation.py │ └── visual │ ├── name_utm.py │ ├── test_paths.txt │ ├── test_utm.txt │ ├── train_paths.txt │ ├── train_utm.txt │ ├── util.py │ ├── val_paths.txt │ └── val_utm.txt ├── method ├── README.md └── traj_label_gui.py └── teaser ├── data_vis.jpg ├── dataset_vis.jpg ├── label_pipeline_ex.png └── pipeline.jpg /README.md: -------------------------------------------------------------------------------- 1 | # NYC-Indoor-VPR 2 | 3 | Diwei Sheng, Anbang Yang, John-Ross Rizzo, Chen Feng 4 | 5 | [Paper on arXiv](https://arxiv.org/pdf/2404.00504) 6 | 7 |

8 | Figure: Dataset
9 | 
10 | Figure: Semi-auto annotation method
11 | 

12 | 13 | 14 | 15 | ## News 16 | - [2023/06]: We release **NYC-Indoor** for academic usage. 17 | - [2023/06]: NYC-Indoor is submitted to **NeurIPS 2023 Track on Datasets and Benchmarks**. 18 | - [2024/03]: NYC-Indoor is accepted by **ICRA 2024**. 19 | 20 | ## Abstract 21 | Visual Place Recognition (VPR) seeks to enhance the ability of camera systems to identify previously visited places based on captured images. This paper introduces the NYC-Indoor dataset, a rich collection of over 36,000 images compiled from 13 distinct scenes within a span of a year. NYC-Indoor is a unique, year-long indoor VPR benchmark dataset comprising images from different crowded scenes in New York City, taken under varying lighting conditions with seasonal and appearance changes. To establish ground truth for this dataset, we propose a semi-automatic annotation approach that computes the positional information of each image. Our method specifically takes pairs of videos as input and yields matched pairs of images, along with their estimated relative locations. The accuracy of this matching process is further refined by human annotators, who utilize our custom annotation interface to correlate selected keyframes. We apply our annotation methodology to the NYC-Indoor dataset. Finally, we present a benchmark evaluation of several state-of-the-art VPR algorithms using our dataset. 22 | 23 | ## NYC-Indoor Dataset 24 | The NYC-Indoor dataset is a rich collection of over 36,000 images compiled from 13 distinct scenes within a span of a year. The dataset can be downloaded from [HuggingFace](https://huggingface.co/datasets/ai4ce/NYC-Indoor-VPR-Data/tree/main). We release NYC-Indoor under [CC BY-NC-SA 4.0](https://creativecommons.org/licenses/by-nc-sa/4.0/). 25 | 26 | ## Benchmark 27 | We benchmarked four state-of-the-art deep learning VPR methods on the NYC-Indoor dataset: CosPlace, MixVPR, ResNet+NetVLAD, and CCT+NetVLAD. For more details, please refer to the [benchmark](./benchmark) folder. 28 | 29 | ## Semi-auto Annotation 30 | Our semi-automatic annotation method can efficiently and accurately match trajectories and generate images with topometric locations as ground truth, applicable to any indoor VPR dataset. For more details, please refer to the [method](./method) folder. 31 | -------------------------------------------------------------------------------- /benchmark/deep-visual-geo-localization-benchmark/README.md: -------------------------------------------------------------------------------- 1 | # Disclaimer 2 | Code in this folder is originally from the Deep Visual Geo-localization Benchmark [official repository](https://github.com/gmberton/deep-visual-geo-localization-benchmark). We have made essential modifications to the code to work with the NYC-Indoor dataset.
If you are using the code in this folder, please cosider citing the following paper: 3 | ``` 4 | @inProceedings{Berton_CVPR_2022_benchmark, 5 | author = {Berton, Gabriele and Mereu, Riccardo and Trivigno, Gabriele and Masone, Carlo and 6 | Csurka, Gabriela and Sattler, Torsten and Caputo, Barbara}, 7 | title = {Deep Visual Geo-localization Benchmark}, 8 | booktitle = {CVPR}, 9 | month = {June}, 10 | year = {2022}, 11 | } 12 | ``` 13 | 14 | # Create Environment 15 | ```bash 16 | conda create --name bench python=3.7 17 | ``` 18 | -------------------------------------------------------------------------------- /benchmark/deep-visual-geo-localization-benchmark/anyloc_vlad_generate.py: -------------------------------------------------------------------------------- 1 | # Download cache data from OneDrive 2 | import os 3 | from onedrivedownloader import download 4 | from utilities1 import od_down_links 5 | 6 | # Link 7 | ln = od_down_links["cache"] 8 | # Download and unzip 9 | if os.path.isdir("./cache"): 10 | print("Cache folder already exists!") 11 | else: 12 | print("Downloading the cache folder") 13 | download(ln, filename="cache.zip", unzip=True, unzip_path="./") 14 | print("Cache folder downloaded") 15 | 16 | import glob 17 | _ex = lambda x: os.path.realpath(os.path.expanduser(x)) 18 | cache_dir: str = _ex("./cache") 19 | # imgs_dir = "/mnt/data/dean/datasets/b4/images/test/database" 20 | # assert os.path.isdir(cache_dir), "Cache directory not found" 21 | # assert os.path.isdir(imgs_dir), "Invalid unzipping" 22 | # num_imgs = len(glob.glob(f"{imgs_dir}/*.jpg")) 23 | # print(f"Found {num_imgs} images in {imgs_dir}") 24 | 25 | # Import everything 26 | import numpy as np 27 | import cv2 as cv 28 | import torch 29 | from torch import nn 30 | from torch.nn import functional as F 31 | from torchvision import transforms as tvf 32 | from torchvision.transforms import functional as T 33 | from PIL import Image 34 | import matplotlib.pyplot as plt 35 | import distinctipy as dipy 36 | from tqdm.auto import tqdm 37 | from typing import Literal, List 38 | import os 39 | import natsort 40 | import shutil 41 | from copy import deepcopy 42 | # DINOv2 imports 43 | from utilities1 import DinoV2ExtractFeatures 44 | from utilities1 import VLAD 45 | 46 | # Program parameters 47 | save_dir = "/home/unav/Desktop/benchmark/AnyLoc/saved_desc" 48 | device = torch.device("cuda") 49 | # Dino_v2 properties (parameters) 50 | desc_layer: int = 31 51 | desc_facet: Literal["query", "key", "value", "token"] = "value" 52 | num_c: int = 32 53 | # Domain for use case (deployment environment) 54 | domain: Literal["aerial", "indoor", "urban"] = "urban" 55 | # Maximum image dimension 56 | max_img_size: int = 640 57 | 58 | # DINO extractor 59 | if "extractor" in globals(): 60 | print(f"Extractor already defined, skipping") 61 | else: 62 | # extractor=ViTExtractor("dino_vits8", stride=4, 63 | # device=device) 64 | extractor = DinoV2ExtractFeatures("dinov2_vitg14", desc_layer, 65 | desc_facet, device=device) 66 | # Base image transformations 67 | base_tf = tvf.Compose([ 68 | tvf.ToTensor(), 69 | tvf.Normalize(mean=[0.485, 0.456, 0.406], 70 | std=[0.229, 0.224, 0.225]) 71 | ]) 72 | 73 | # Ensure that data is present 74 | ext_specifier = f"dinov2_vitg14/l{desc_layer}_{desc_facet}_c{num_c}" 75 | c_centers_file = os.path.join(cache_dir, "vocabulary", ext_specifier, 76 | domain, "c_centers.pt") 77 | assert os.path.isfile(c_centers_file), "Cluster centers not cached!" 
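# The cached file holds the VLAD vocabulary (cluster centers) for this
# layer/facet/domain combination. With num_c = 32 clusters and 1536-dim
# DINOv2 ViT-g14 "value" descriptors, the resulting VLAD vector has
# 32 * 1536 = 49152 dimensions, which matches the width hard-coded in all_desc() below.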
78 | c_centers = torch.load(c_centers_file) 79 | assert c_centers.shape[0] == num_c, "Wrong number of clusters!" 80 | 81 | # VLAD object 82 | vlad = VLAD(num_c, desc_dim=None, 83 | cache_dir=os.path.dirname(c_centers_file)) 84 | # Fit (load) the cluster centers (this'll also load the desc_dim) 85 | vlad.fit(None) 86 | 87 | 88 | # img_fnames = glob.glob(f"{imgs_dir}/*.jpg") 89 | # img_fnames = natsort.natsorted(img_fnames) 90 | 91 | def single_desc(img_fname): 92 | # for img_fname in tqdm(img_fnames[:20]): 93 | # # DINO features 94 | with torch.no_grad(): 95 | pil_img = Image.open(img_fname).convert('RGB') 96 | img_pt = base_tf(pil_img).to(device) 97 | if max(img_pt.shape[-2:]) > max_img_size: 98 | c, h, w = img_pt.shape 99 | # Maintain aspect ratio 100 | if h == max(img_pt.shape[-2:]): 101 | w = int(w * max_img_size / h) 102 | h = max_img_size 103 | else: 104 | h = int(h * max_img_size / w) 105 | w = max_img_size 106 | # print(f"To {(h, w) =}") 107 | img_pt = T.resize(img_pt, (h, w), 108 | interpolation=T.InterpolationMode.BICUBIC) 109 | # print(f"Resized {img_fname} to {img_pt.shape = }") 110 | # Make image patchable (14, 14 patches) 111 | c, h, w = img_pt.shape 112 | h_new, w_new = (h // 14) * 14, (w // 14) * 14 113 | img_pt = tvf.CenterCrop((h_new, w_new))(img_pt)[None, ...] 114 | # Extract descriptor 115 | # print(img_pt.shape) 116 | ret = extractor(img_pt) # [1, num_patches, desc_dim] 117 | # VLAD global descriptor 118 | gd = vlad.generate(ret.cpu().squeeze()) # VLAD: shape [agg_dim] 119 | # print(gd.shape) 120 | return gd 121 | # gd_np = gd.numpy()[np.newaxis, ...] # shape: [1, agg_dim] 122 | # print(gd_np.shape) 123 | # np.save(f"{save_dir}/{os.path.basename(img_fname)}.npy", gd_np) 124 | 125 | # single_desc("/mnt/data/nyc_indoor/indoor/images/test/database/@00000.00@00035.70@168@.jpg") 126 | def all_desc(): 127 | f=open("test_database.txt") 128 | f1=open("test_queries.txt") 129 | l=f.readlines() 130 | l1=f1.readlines() 131 | all_features = np.empty((len(l)+len(l1), 49152), dtype="float32") 132 | for i in tqdm(range(len(l))): 133 | all_features[i]=single_desc(l[i].strip()) 134 | 135 | for i in tqdm(range(len(l1))): 136 | all_features[len(l)+i]=single_desc(l1[i].strip()) 137 | return all_features -------------------------------------------------------------------------------- /benchmark/deep-visual-geo-localization-benchmark/cam.py: -------------------------------------------------------------------------------- 1 | import argparse 2 | import cv2 3 | import numpy as np 4 | import torch 5 | from torchvision import models 6 | from pytorch_grad_cam import GradCAM, \ 7 | HiResCAM, \ 8 | ScoreCAM, \ 9 | GradCAMPlusPlus, \ 10 | AblationCAM, \ 11 | XGradCAM, \ 12 | EigenCAM, \ 13 | EigenGradCAM, \ 14 | LayerCAM, \ 15 | FullGrad, \ 16 | GradCAMElementWise 17 | 18 | 19 | from pytorch_grad_cam import GuidedBackpropReLUModel 20 | from pytorch_grad_cam.utils.image import show_cam_on_image, \ 21 | deprocess_image, \ 22 | preprocess_image 23 | from pytorch_grad_cam.utils.model_targets import ClassifierOutputTarget 24 | 25 | 26 | def get_args(): 27 | parser = argparse.ArgumentParser() 28 | parser.add_argument('--use-cuda', action='store_true', default=False, 29 | help='Use NVIDIA GPU acceleration') 30 | parser.add_argument( 31 | '--image-path', 32 | type=str, 33 | default='./examples/both.png', 34 | help='Input image path') 35 | parser.add_argument('--aug_smooth', action='store_true', 36 | help='Apply test time augmentation to smooth the CAM') 37 | parser.add_argument( 38 | '--eigen_smooth', 39 | 
action='store_true', 40 | help='Reduce noise by taking the first principle componenet' 41 | 'of cam_weights*activations') 42 | parser.add_argument('--method', type=str, default='gradcam', 43 | choices=['gradcam', 'hirescam', 'gradcam++', 44 | 'scorecam', 'xgradcam', 45 | 'ablationcam', 'eigencam', 46 | 'eigengradcam', 'layercam', 'fullgrad'], 47 | help='Can be gradcam/gradcam++/scorecam/xgradcam' 48 | '/ablationcam/eigencam/eigengradcam/layercam') 49 | 50 | args = parser.parse_args() 51 | args.use_cuda = args.use_cuda and torch.cuda.is_available() 52 | if args.use_cuda: 53 | print('Using GPU for acceleration') 54 | else: 55 | print('Using CPU for computation') 56 | 57 | return args 58 | 59 | 60 | def cam(model, img_path, layer, layer_name): 61 | """ python cam.py -image-path 62 | Example usage of loading an image, and computing: 63 | 1. CAM 64 | 2. Guided Back Propagation 65 | 3. Combining both 66 | """ 67 | 68 | # args = get_args() 69 | methods = \ 70 | {"gradcam": GradCAM, 71 | "hirescam": HiResCAM, 72 | "scorecam": ScoreCAM, 73 | "gradcam++": GradCAMPlusPlus, 74 | "ablationcam": AblationCAM, 75 | "xgradcam": XGradCAM, 76 | "eigencam": EigenCAM, 77 | "eigengradcam": EigenGradCAM, 78 | "layercam": LayerCAM, 79 | "fullgrad": FullGrad, 80 | "gradcamelementwise": GradCAMElementWise} 81 | 82 | # model = models.resnet50(pretrained=True) 83 | 84 | # Choose the target layer you want to compute the visualization for. 85 | # Usually this will be the last convolutional layer in the model. 86 | # Some common choices can be: 87 | # Resnet18 and 50: model.layer4 88 | # VGG, densenet161: model.features[-1] 89 | # mnasnet1_0: model.layers[-1] 90 | # You can print the model to help chose the layer 91 | # You can pass a list with several target layers, 92 | # in that case the CAMs will be computed per layer and then aggregated. 93 | # You can also try selecting all layers of a certain type, with e.g: 94 | # from pytorch_grad_cam.utils.find_layers import find_layer_types_recursive 95 | # find_layer_types_recursive(model, [torch.nn.ReLU]) 96 | target_layers = [layer] 97 | 98 | img=cv2.imread(img_path, 1) 99 | img=cv2.resize(img, (384,384)) 100 | rgb_img = img[:, :, ::-1] 101 | rgb_img = np.float32(rgb_img) / 255 102 | input_tensor = preprocess_image(rgb_img, 103 | mean=[0.485, 0.456, 0.406], 104 | std=[0.229, 0.224, 0.225]) 105 | 106 | # We have to specify the target we want to generate 107 | # the Class Activation Maps for. 108 | # If targets is None, the highest scoring category (for every member in the batch) will be used. 109 | # You can target specific categories by 110 | # targets = [e.g ClassifierOutputTarget(281)] 111 | targets = None 112 | 113 | # Using the with statement ensures the context is freed, and you can 114 | # recreate different CAM objects in a loop. 115 | cam_algorithm = methods["gradcam"] 116 | with cam_algorithm(model=model, 117 | target_layers=target_layers, 118 | use_cuda=False) as cam: 119 | 120 | # AblationCAM and ScoreCAM have batched implementations. 121 | # You can override the internal batch size for faster computation. 
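        # Only the batched methods (AblationCAM, ScoreCAM) actually consume this
        # batch size; plain GradCAM runs a single forward/backward pass per image,
        # so the setting is harmless when the "gradcam" algorithm selected above is used.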
122 | cam.batch_size = 32 123 | # try: 124 | grayscale_cam = cam(input_tensor=input_tensor, 125 | targets=targets, 126 | aug_smooth=False, 127 | eigen_smooth=False) 128 | # except: 129 | # print("grayscale_cam none") 130 | # return 131 | 132 | # Here grayscale_cam has only one image in the batch 133 | grayscale_cam = grayscale_cam[0, :] 134 | 135 | cam_image = show_cam_on_image(rgb_img, grayscale_cam, use_rgb=True) 136 | 137 | # cam_image is RGB encoded whereas "cv2.imwrite" requires BGR encoding. 138 | cam_image = cv2.cvtColor(cam_image, cv2.COLOR_RGB2BGR) 139 | 140 | gb_model = GuidedBackpropReLUModel(model=model, use_cuda=False) 141 | gb = gb_model(input_tensor, target_category=None) 142 | cam_mask = cv2.merge([grayscale_cam, grayscale_cam, grayscale_cam]) 143 | cam_gb = deprocess_image(cam_mask * gb) 144 | gb = deprocess_image(gb) 145 | 146 | last_slash_index = img_path.rfind("/") 147 | img_name=img_path[last_slash_index + 1:-4] 148 | cv2.imwrite("visual/cct/"+img_name+'_'+layer_name+'.jpg', cam_image) 149 | # cv2.imwrite("visual/"+model_name+'_'+layer_name+'.jpg', gb) 150 | # cv2.imwrite(f'gradcam_cam_gb.jpg', cam_gb) 151 | -------------------------------------------------------------------------------- /benchmark/deep-visual-geo-localization-benchmark/commons.py: -------------------------------------------------------------------------------- 1 | 2 | """ 3 | This file contains some functions and classes which can be useful in very diverse projects. 4 | """ 5 | 6 | import os 7 | import sys 8 | import torch 9 | import random 10 | import logging 11 | import traceback 12 | import numpy as np 13 | from os.path import join 14 | 15 | 16 | def make_deterministic(seed=0): 17 | """Make results deterministic. If seed == -1, do not make deterministic. 18 | Running the script in a deterministic way might slow it down. 19 | """ 20 | if seed == -1: 21 | return 22 | random.seed(seed) 23 | np.random.seed(seed) 24 | torch.manual_seed(seed) 25 | torch.cuda.manual_seed(seed) 26 | torch.backends.cudnn.deterministic = True 27 | torch.backends.cudnn.benchmark = False 28 | 29 | 30 | def setup_logging(save_dir, console="debug", 31 | info_filename="info.log", debug_filename="debug.log"): 32 | """Set up logging files and console output. 33 | Creates one file for INFO logs and one for DEBUG logs. 34 | Args: 35 | save_dir (str): creates the folder where to save the files. 36 | debug (str): 37 | if == "debug" prints on console debug messages and higher 38 | if == "info" prints on console info messages and higher 39 | if == None does not use console (useful when a logger has already been set) 40 | info_filename (str): the name of the info file. if None, don't create info file 41 | debug_filename (str): the name of the debug file. 
if None, don't create debug file 42 | """ 43 | if os.path.exists(save_dir): 44 | raise FileExistsError(f"{save_dir} already exists!") 45 | os.makedirs(save_dir, exist_ok=True) 46 | # logging.Logger.manager.loggerDict.keys() to check which loggers are in use 47 | base_formatter = logging.Formatter('%(asctime)s %(message)s', "%Y-%m-%d %H:%M:%S") 48 | logger = logging.getLogger('') 49 | logger.setLevel(logging.DEBUG) 50 | 51 | if info_filename is not None: 52 | info_file_handler = logging.FileHandler(join(save_dir, info_filename)) 53 | info_file_handler.setLevel(logging.INFO) 54 | info_file_handler.setFormatter(base_formatter) 55 | logger.addHandler(info_file_handler) 56 | 57 | if debug_filename is not None: 58 | debug_file_handler = logging.FileHandler(join(save_dir, debug_filename)) 59 | debug_file_handler.setLevel(logging.DEBUG) 60 | debug_file_handler.setFormatter(base_formatter) 61 | logger.addHandler(debug_file_handler) 62 | 63 | if console is not None: 64 | console_handler = logging.StreamHandler() 65 | if console == "debug": 66 | console_handler.setLevel(logging.DEBUG) 67 | if console == "info": 68 | console_handler.setLevel(logging.INFO) 69 | console_handler.setFormatter(base_formatter) 70 | logger.addHandler(console_handler) 71 | 72 | def exception_handler(type_, value, tb): 73 | logger.info("\n" + "".join(traceback.format_exception(type, value, tb))) 74 | sys.excepthook = exception_handler 75 | -------------------------------------------------------------------------------- /benchmark/deep-visual-geo-localization-benchmark/dataloaders/GSVCitiesDataloader.py: -------------------------------------------------------------------------------- 1 | import pytorch_lightning as pl 2 | from torch.utils.data.dataloader import DataLoader 3 | from torchvision import transforms as T 4 | 5 | from dataloaders.GSVCitiesDataset import GSVCitiesDataset 6 | from . import PittsburgDataset 7 | from . 
import MapillaryDataset 8 | 9 | from prettytable import PrettyTable 10 | 11 | IMAGENET_MEAN_STD = {'mean': [0.485, 0.456, 0.406], 12 | 'std': [0.229, 0.224, 0.225]} 13 | 14 | VIT_MEAN_STD = {'mean': [0.5, 0.5, 0.5], 15 | 'std': [0.5, 0.5, 0.5]} 16 | 17 | TRAIN_CITIES = [ 18 | 'Bangkok', 19 | 'BuenosAires', 20 | 'LosAngeles', 21 | 'MexicoCity', 22 | 'OSL', 23 | 'Rome', 24 | 'Barcelona', 25 | 'Chicago', 26 | 'Madrid', 27 | 'Miami', 28 | 'Phoenix', 29 | 'TRT', 30 | 'Boston', 31 | 'Lisbon', 32 | 'Medellin', 33 | 'Minneapolis', 34 | 'PRG', 35 | 'WashingtonDC', 36 | 'Brussels', 37 | 'London', 38 | 'Melbourne', 39 | 'Osaka', 40 | 'PRS', 41 | ] 42 | 43 | 44 | class GSVCitiesDataModule(pl.LightningDataModule): 45 | def __init__(self, 46 | batch_size=32, 47 | img_per_place=4, 48 | min_img_per_place=4, 49 | shuffle_all=False, 50 | image_size=(480, 640), 51 | num_workers=4, 52 | show_data_stats=True, 53 | cities=TRAIN_CITIES, 54 | mean_std=IMAGENET_MEAN_STD, 55 | batch_sampler=None, 56 | random_sample_from_each_place=True, 57 | val_set_names=['pitts30k_val', 'msls_val'] 58 | ): 59 | super().__init__() 60 | self.batch_size = batch_size 61 | self.img_per_place = img_per_place 62 | self.min_img_per_place = min_img_per_place 63 | self.shuffle_all = shuffle_all 64 | self.image_size = image_size 65 | self.num_workers = num_workers 66 | self.batch_sampler = batch_sampler 67 | self.show_data_stats = show_data_stats 68 | self.cities = cities 69 | self.mean_dataset = mean_std['mean'] 70 | self.std_dataset = mean_std['std'] 71 | self.random_sample_from_each_place = random_sample_from_each_place 72 | self.val_set_names = val_set_names 73 | self.save_hyperparameters() # save hyperparameter with Pytorch Lightening 74 | 75 | self.train_transform = T.Compose([ 76 | T.Resize(image_size, interpolation=T.InterpolationMode.BILINEAR), 77 | T.RandAugment(num_ops=3, interpolation=T.InterpolationMode.BILINEAR), 78 | T.ToTensor(), 79 | T.Normalize(mean=self.mean_dataset, std=self.std_dataset), 80 | ]) 81 | 82 | self.valid_transform = T.Compose([ 83 | T.Resize(image_size, interpolation=T.InterpolationMode.BILINEAR), 84 | T.ToTensor(), 85 | T.Normalize(mean=self.mean_dataset, std=self.std_dataset)]) 86 | 87 | self.train_loader_config = { 88 | 'batch_size': self.batch_size, 89 | 'num_workers': self.num_workers, 90 | 'drop_last': False, 91 | 'pin_memory': True, 92 | 'shuffle': self.shuffle_all} 93 | 94 | self.valid_loader_config = { 95 | 'batch_size': self.batch_size, 96 | 'num_workers': self.num_workers//2, 97 | 'drop_last': False, 98 | 'pin_memory': True, 99 | 'shuffle': False} 100 | 101 | def setup(self, stage): 102 | if stage == 'fit': 103 | # load train dataloader with reload routine 104 | self.reload() 105 | 106 | # load validation sets (pitts_val, msls_val, ...etc) 107 | self.val_datasets = [] 108 | for valid_set_name in self.val_set_names: 109 | if valid_set_name.lower() == 'pitts30k_test': 110 | self.val_datasets.append(PittsburgDataset.get_whole_test_set( 111 | input_transform=self.valid_transform)) 112 | elif valid_set_name.lower() == 'pitts30k_val': 113 | self.val_datasets.append(PittsburgDataset.get_whole_val_set( 114 | input_transform=self.valid_transform)) 115 | elif valid_set_name.lower() == 'msls_val': 116 | self.val_datasets.append(MapillaryDataset.MSLS( 117 | input_transform=self.valid_transform)) 118 | else: 119 | print( 120 | f'Validation set {valid_set_name} does not exist or has not been implemented yet') 121 | raise NotImplementedError 122 | if self.show_data_stats: 123 | self.print_stats() 124 | 125 | def 
reload(self): 126 | self.train_dataset = GSVCitiesDataset( 127 | cities=self.cities, 128 | img_per_place=self.img_per_place, 129 | min_img_per_place=self.min_img_per_place, 130 | random_sample_from_each_place=self.random_sample_from_each_place, 131 | transform=self.train_transform) 132 | 133 | def train_dataloader(self): 134 | self.reload() 135 | return DataLoader(dataset=self.train_dataset, **self.train_loader_config) 136 | 137 | def val_dataloader(self): 138 | val_dataloaders = [] 139 | for val_dataset in self.val_datasets: 140 | val_dataloaders.append(DataLoader( 141 | dataset=val_dataset, **self.valid_loader_config)) 142 | return val_dataloaders 143 | 144 | def print_stats(self): 145 | print() # print a new line 146 | table = PrettyTable() 147 | table.field_names = ['Data', 'Value'] 148 | table.align['Data'] = "l" 149 | table.align['Value'] = "l" 150 | table.header = False 151 | table.add_row(["# of cities", f"{len(TRAIN_CITIES)}"]) 152 | table.add_row(["# of places", f'{self.train_dataset.__len__()}']) 153 | table.add_row(["# of images", f'{self.train_dataset.total_nb_images}']) 154 | print(table.get_string(title="Training Dataset")) 155 | print() 156 | 157 | table = PrettyTable() 158 | table.field_names = ['Data', 'Value'] 159 | table.align['Data'] = "l" 160 | table.align['Value'] = "l" 161 | table.header = False 162 | for i, val_set_name in enumerate(self.val_set_names): 163 | table.add_row([f"Validation set {i+1}", f"{val_set_name}"]) 164 | # table.add_row(["# of places", f'{self.train_dataset.__len__()}']) 165 | print(table.get_string(title="Validation Datasets")) 166 | print() 167 | 168 | table = PrettyTable() 169 | table.field_names = ['Data', 'Value'] 170 | table.align['Data'] = "l" 171 | table.align['Value'] = "l" 172 | table.header = False 173 | table.add_row( 174 | ["Batch size (PxK)", f"{self.batch_size}x{self.img_per_place}"]) 175 | table.add_row( 176 | ["# of iterations", f"{self.train_dataset.__len__()//self.batch_size}"]) 177 | table.add_row(["Image size", f"{self.image_size}"]) 178 | print(table.get_string(title="Training config")) 179 | -------------------------------------------------------------------------------- /benchmark/deep-visual-geo-localization-benchmark/dataloaders/GSVCitiesDataset.py: -------------------------------------------------------------------------------- 1 | # https://github.com/amaralibey/gsv-cities 2 | 3 | import pandas as pd 4 | from pathlib import Path 5 | from PIL import Image 6 | import torch 7 | from torch.utils.data import Dataset 8 | import torchvision.transforms as T 9 | 10 | default_transform = T.Compose([ 11 | T.ToTensor(), 12 | T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]), 13 | ]) 14 | 15 | # NOTE: Hard coded path to dataset folder 16 | BASE_PATH = '../datasets/gsv_cities/' 17 | 18 | if not Path(BASE_PATH).exists(): 19 | raise FileNotFoundError( 20 | 'BASE_PATH is hardcoded, please adjust to point to gsv_cities') 21 | 22 | class GSVCitiesDataset(Dataset): 23 | def __init__(self, 24 | cities=['London', 'Boston'], 25 | img_per_place=4, 26 | min_img_per_place=4, 27 | random_sample_from_each_place=True, 28 | transform=default_transform, 29 | base_path=BASE_PATH 30 | ): 31 | super(GSVCitiesDataset, self).__init__() 32 | self.base_path = base_path 33 | self.cities = cities 34 | 35 | assert img_per_place <= min_img_per_place, \ 36 | f"img_per_place should be less than {min_img_per_place}" 37 | self.img_per_place = img_per_place 38 | self.min_img_per_place = min_img_per_place 39 | self.random_sample_from_each_place = 
random_sample_from_each_place 40 | self.transform = transform 41 | 42 | # generate the dataframe contraining images metadata 43 | self.dataframe = self.__getdataframes() 44 | 45 | # get all unique place ids 46 | self.places_ids = pd.unique(self.dataframe.index) 47 | self.total_nb_images = len(self.dataframe) 48 | 49 | def __getdataframes(self): 50 | ''' 51 | Return one dataframe containing 52 | all info about the images from all cities 53 | 54 | This requieres DataFrame files to be in a folder 55 | named Dataframes, containing a DataFrame 56 | for each city in self.cities 57 | ''' 58 | # read the first city dataframe 59 | df = pd.read_csv(self.base_path+'Dataframes/'+f'{self.cities[0]}.csv') 60 | df = df.sample(frac=1) # shuffle the city dataframe 61 | 62 | 63 | # append other cities one by one 64 | for i in range(1, len(self.cities)): 65 | tmp_df = pd.read_csv( 66 | self.base_path+'Dataframes/'+f'{self.cities[i]}.csv') 67 | 68 | # Now we add a prefix to place_id, so that we 69 | # don't confuse, say, place number 13 of NewYork 70 | # with place number 13 of London ==> (0000013 and 0500013) 71 | # We suppose that there is no city with more than 72 | # 99999 images and there won't be more than 99 cities 73 | # TODO: rename the dataset and hardcode these prefixes 74 | prefix = i 75 | tmp_df['place_id'] = tmp_df['place_id'] + (prefix * 10**5) 76 | tmp_df = tmp_df.sample(frac=1) # shuffle the city dataframe 77 | 78 | df = pd.concat([df, tmp_df], ignore_index=True) 79 | 80 | # keep only places depicted by at least min_img_per_place images 81 | res = df[df.groupby('place_id')['place_id'].transform( 82 | 'size') >= self.min_img_per_place] 83 | return res.set_index('place_id') 84 | 85 | def __getitem__(self, index): 86 | place_id = self.places_ids[index] 87 | 88 | # get the place in form of a dataframe (each row corresponds to one image) 89 | place = self.dataframe.loc[place_id] 90 | 91 | # sample K images (rows) from this place 92 | # we can either sort and take the most recent k images 93 | # or randomly sample them 94 | if self.random_sample_from_each_place: 95 | place = place.sample(n=self.img_per_place) 96 | else: # always get the same most recent images 97 | place = place.sort_values( 98 | by=['year', 'month', 'lat'], ascending=False) 99 | place = place[: self.img_per_place] 100 | 101 | imgs = [] 102 | for i, row in place.iterrows(): 103 | img_name = self.get_img_name(row) 104 | img_path = self.base_path + 'Images/' + \ 105 | row['city_id'] + '/' + img_name 106 | img = self.image_loader(img_path) 107 | 108 | if self.transform is not None: 109 | img = self.transform(img) 110 | 111 | imgs.append(img) 112 | 113 | # NOTE: contrary to image classification where __getitem__ returns only one image 114 | # in GSVCities, we return a place, which is a Tesor of K images (K=self.img_per_place) 115 | # this will return a Tensor of shape [K, channels, height, width]. 
This needs to be taken into account 116 | # in the Dataloader (which will yield batches of shape [BS, K, channels, height, width]) 117 | return torch.stack(imgs), torch.tensor(place_id).repeat(self.img_per_place) 118 | 119 | def __len__(self): 120 | '''Denotes the total number of places (not images)''' 121 | return len(self.places_ids) 122 | 123 | @staticmethod 124 | def image_loader(path): 125 | return Image.open(path).convert('RGB') 126 | 127 | @staticmethod 128 | def get_img_name(row): 129 | # given a row from the dataframe 130 | # return the corresponding image name 131 | 132 | city = row['city_id'] 133 | 134 | # now remove the two digit we added to the id 135 | # they are superficially added to make ids different 136 | # for different cities 137 | pl_id = row.name % 10**5 #row.name is the index of the row, not to be confused with image name 138 | pl_id = str(pl_id).zfill(7) 139 | 140 | panoid = row['panoid'] 141 | year = str(row['year']).zfill(4) 142 | month = str(row['month']).zfill(2) 143 | northdeg = str(row['northdeg']).zfill(3) 144 | lat, lon = str(row['lat']), str(row['lon']) 145 | name = city+'_'+pl_id+'_'+year+'_'+month+'_' + \ 146 | northdeg+'_'+lat+'_'+lon+'_'+panoid+'.jpg' 147 | return name 148 | -------------------------------------------------------------------------------- /benchmark/deep-visual-geo-localization-benchmark/dataloaders/MapillaryDataset.py: -------------------------------------------------------------------------------- 1 | from pathlib import Path 2 | import numpy as np 3 | from PIL import Image 4 | from torch.utils.data import Dataset 5 | 6 | # NOTE: you need to download the mapillary_sls dataset from https://github.com/FrederikWarburg/mapillary_sls 7 | # make sure the path where the mapillary_sls validation dataset resides on your computer is correct. 8 | # the folder named train_val should reside in DATASET_ROOT path (that's the only folder you need from mapillary_sls) 9 | # I hardcoded the groundtruth for image to image evaluation, otherwise it would take ages to run the groundtruth script at each epoch. 10 | DATASET_ROOT = '../datasets/msls_val/' 11 | 12 | path_obj = Path(DATASET_ROOT) 13 | if not path_obj.exists(): 14 | raise Exception('Please make sure the path to mapillary_sls dataset is correct') 15 | 16 | if not path_obj.joinpath('train_val'): 17 | raise Exception(f'Please make sure the directory train_val from mapillary_sls dataset is situated in the directory {DATASET_ROOT}') 18 | 19 | class MSLS(Dataset): 20 | def __init__(self, input_transform = None): 21 | 22 | self.input_transform = input_transform 23 | 24 | # hard coded reference image names, this avoids the hassle of listing them at each epoch. 25 | self.dbImages = np.load('../datasets/msls_val/msls_val_dbImages.npy') 26 | 27 | # hard coded query image names. 
28 | self.qImages = np.load('../datasets/msls_val/msls_val_qImages.npy') 29 | 30 | # hard coded index of query images 31 | self.qIdx = np.load('../datasets/msls_val/msls_val_qIdx.npy') 32 | 33 | # hard coded groundtruth (correspondence between each query and its matches) 34 | self.pIdx = np.load('../datasets/msls_val/msls_val_pIdx.npy', allow_pickle=True) 35 | 36 | # concatenate reference images then query images so that we can use only one dataloader 37 | self.images = np.concatenate((self.dbImages, self.qImages[self.qIdx])) 38 | 39 | # we need to keeo the number of references so that we can split references-queries 40 | # when calculating recall@K 41 | self.num_references = len(self.dbImages) 42 | 43 | def __getitem__(self, index): 44 | img = Image.open(DATASET_ROOT+self.images[index]) 45 | 46 | if self.input_transform: 47 | img = self.input_transform(img) 48 | 49 | return img, index 50 | 51 | def __len__(self): 52 | return len(self.images) -------------------------------------------------------------------------------- /benchmark/deep-visual-geo-localization-benchmark/dataloaders/PittsburgDataset.py: -------------------------------------------------------------------------------- 1 | from os.path import join, exists 2 | from collections import namedtuple 3 | from scipy.io import loadmat 4 | 5 | import torchvision.transforms as T 6 | import torch.utils.data as data 7 | 8 | 9 | from PIL import Image 10 | from sklearn.neighbors import NearestNeighbors 11 | 12 | root_dir = '/scratch/ds5725/VPR-datasets-downloader/datasets/indoor_new/' 13 | 14 | if not exists(root_dir): 15 | raise FileNotFoundError( 16 | 'root_dir is hardcoded, please adjust to point to Pittsburgh dataset') 17 | 18 | struct_dir = "/scratch/ds5725/ssl_vpr/sub/" 19 | queries_dir = join(root_dir, 'queries_real') 20 | 21 | 22 | def input_transform(image_size=None): 23 | return T.Compose([ 24 | T.Resize(image_size),# interpolation=T.InterpolationMode.BICUBIC), 25 | T.ToTensor(), 26 | T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]) 27 | ]) 28 | 29 | 30 | 31 | def get_whole_val_set(input_transform): 32 | structFile = join(struct_dir, 'new_val.mat') 33 | return WholeDatasetFromStruct(structFile, input_transform=input_transform) 34 | 35 | 36 | def get_250k_val_set(input_transform): 37 | structFile = join(struct_dir, 'new_val.mat') 38 | return WholeDatasetFromStruct(structFile, input_transform=input_transform) 39 | 40 | 41 | def get_whole_test_set(input_transform): 42 | structFile = join(struct_dir, 'new_test.mat') 43 | return WholeDatasetFromStruct(structFile, input_transform=input_transform) 44 | 45 | 46 | def get_250k_test_set(input_transform): 47 | structFile = join(struct_dir, 'new_test.mat') 48 | return WholeDatasetFromStruct(structFile, input_transform=input_transform) 49 | 50 | def get_whole_training_set(onlyDB=False): 51 | structFile = join(struct_dir, 'new_train.mat') 52 | return WholeDatasetFromStruct(structFile, 53 | input_transform=input_transform(), 54 | onlyDB=onlyDB) 55 | 56 | dbStruct = namedtuple('dbStruct', ['whichSet', 'dataset', 57 | 'dbImage', 'utmDb', 'qImage', 'utmQ', 'numDb', 'numQ', 58 | 'posDistThr', 'posDistSqThr', 'nonTrivPosDistSqThr']) 59 | 60 | 61 | def parse_dbStruct(path): 62 | mat = loadmat(path) 63 | matStruct = mat['dbStruct'].item() 64 | 65 | dataset = 'pitts250k' 66 | 67 | whichSet = matStruct[0].item() 68 | 69 | dbImage = [f[0].item() for f in matStruct[1]] 70 | utmDb = matStruct[2].T 71 | 72 | qImage = [f[0].item() for f in matStruct[3]] 73 | utmQ = matStruct[4].T 74 | 75 | 
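    # utmDb / utmQ appear to be stored as 2 x N arrays in the .mat struct; the
    # transposes above give the (N, 2) layout that NearestNeighbors expects in getPositives().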
numDb = matStruct[5].item() 76 | numQ = matStruct[6].item() 77 | 78 | posDistThr = matStruct[7].item() 79 | posDistSqThr = matStruct[8].item() 80 | nonTrivPosDistSqThr = matStruct[9].item() 81 | 82 | return dbStruct(whichSet, dataset, dbImage, utmDb, qImage, 83 | utmQ, numDb, numQ, posDistThr, 84 | posDistSqThr, nonTrivPosDistSqThr) 85 | 86 | 87 | class WholeDatasetFromStruct(data.Dataset): 88 | def __init__(self, structFile, input_transform=None, onlyDB=False): 89 | super().__init__() 90 | 91 | self.input_transform = input_transform 92 | 93 | self.dbStruct = parse_dbStruct(structFile) 94 | self.images = [dbIm for dbIm in self.dbStruct.dbImage] 95 | if not onlyDB: 96 | self.images += [qIm for qIm in self.dbStruct.qImage] 97 | 98 | self.whichSet = self.dbStruct.whichSet 99 | self.dataset = self.dbStruct.dataset 100 | 101 | self.positives = None 102 | self.distances = None 103 | 104 | def __getitem__(self, index): 105 | img = Image.open(self.images[index]) 106 | 107 | if self.input_transform: 108 | img = self.input_transform(img) 109 | 110 | return img, index 111 | 112 | def __len__(self): 113 | return len(self.images) 114 | 115 | def getPositives(self): 116 | # positives for evaluation are those within trivial threshold range 117 | # fit NN to find them, search by radius 118 | if self.positives is None: 119 | knn = NearestNeighbors(n_jobs=-1) 120 | knn.fit(self.dbStruct.utmDb) 121 | 122 | self.distances, self.positives = knn.radius_neighbors(self.dbStruct.utmQ, 123 | radius=self.dbStruct.posDistThr) 124 | 125 | return self.positives 126 | -------------------------------------------------------------------------------- /benchmark/deep-visual-geo-localization-benchmark/eval.py: -------------------------------------------------------------------------------- 1 | 2 | """ 3 | With this script you can evaluate checkpoints or test models from two popular 4 | landmark retrieval github repos. 5 | The first is https://github.com/naver/deep-image-retrieval from Naver labs, 6 | provides ResNet-50 and ResNet-101 trained with AP on Google Landmarks 18 clean. 7 | $ python eval.py --off_the_shelf=naver --l2=none --backbone=resnet101conv5 --aggregation=gem --fc_output_dim=2048 8 | 9 | The second is https://github.com/filipradenovic/cnnimageretrieval-pytorch from 10 | Radenovic, provides ResNet-50 and ResNet-101 trained with a triplet loss 11 | on Google Landmarks 18 and sfm120k. 
12 | $ python eval.py --off_the_shelf=radenovic_gldv1 --l2=after_pool --backbone=resnet101conv5 --aggregation=gem --fc_output_dim=2048 13 | $ python eval.py --off_the_shelf=radenovic_sfm --l2=after_pool --backbone=resnet101conv5 --aggregation=gem --fc_output_dim=2048 14 | 15 | Note that although the architectures are almost the same, Naver's 16 | implementation does not use a l2 normalization before/after the GeM aggregation, 17 | while Radenovic's uses it after (and we use it before, which shows better 18 | results in VG) 19 | """ 20 | 21 | import os 22 | import sys 23 | import torch 24 | import parser 25 | import logging 26 | import sklearn 27 | from os.path import join 28 | from datetime import datetime 29 | from torch.utils.model_zoo import load_url 30 | from google_drive_downloader import GoogleDriveDownloader as gdd 31 | 32 | import test 33 | import util 34 | import commons 35 | import datasets_ws 36 | from model import network 37 | 38 | OFF_THE_SHELF_RADENOVIC = { 39 | 'resnet50conv5_sfm' : 'http://cmp.felk.cvut.cz/cnnimageretrieval/data/networks/retrieval-SfM-120k/rSfM120k-tl-resnet50-gem-w-97bf910.pth', 40 | 'resnet101conv5_sfm' : 'http://cmp.felk.cvut.cz/cnnimageretrieval/data/networks/retrieval-SfM-120k/rSfM120k-tl-resnet101-gem-w-a155e54.pth', 41 | 'resnet50conv5_gldv1' : 'http://cmp.felk.cvut.cz/cnnimageretrieval/data/networks/gl18/gl18-tl-resnet50-gem-w-83fdc30.pth', 42 | 'resnet101conv5_gldv1' : 'http://cmp.felk.cvut.cz/cnnimageretrieval/data/networks/gl18/gl18-tl-resnet101-gem-w-a4d43db.pth', 43 | } 44 | 45 | OFF_THE_SHELF_NAVER = { 46 | "resnet50conv5" : "1oPtE_go9tnsiDLkWjN4NMpKjh-_md1G5", 47 | 'resnet101conv5' : "1UWJGDuHtzaQdFhSMojoYVQjmCXhIwVvy" 48 | } 49 | 50 | ######################################### SETUP ######################################### 51 | args = parser.parse_arguments() 52 | start_time = datetime.now() 53 | args.save_dir = join("test", args.save_dir, start_time.strftime('%Y-%m-%d_%H-%M-%S')) 54 | commons.setup_logging(args.save_dir) 55 | commons.make_deterministic(args.seed) 56 | logging.info(f"Arguments: {args}") 57 | logging.info(f"The outputs are being saved in {args.save_dir}") 58 | 59 | ######################################### MODEL ######################################### 60 | model = network.GeoLocalizationNet(args) 61 | model = model.to(args.device) 62 | 63 | if args.aggregation in ["netvlad", "crn"]: 64 | args.features_dim *= args.netvlad_clusters 65 | 66 | if args.off_the_shelf.startswith("radenovic") or args.off_the_shelf.startswith("naver"): 67 | if args.off_the_shelf.startswith("radenovic"): 68 | pretrain_dataset_name = args.off_the_shelf.split("_")[1] # sfm or gldv1 datasets 69 | url = OFF_THE_SHELF_RADENOVIC[f"{args.backbone}_{pretrain_dataset_name}"] 70 | state_dict = load_url(url, model_dir=join("data", "off_the_shelf_nets")) 71 | else: 72 | # This is a hacky workaround to maintain compatibility 73 | sys.modules['sklearn.decomposition.pca'] = sklearn.decomposition._pca 74 | zip_file_path = join("data", "off_the_shelf_nets", args.backbone + "_naver.zip") 75 | if not os.path.exists(zip_file_path): 76 | gdd.download_file_from_google_drive(file_id=OFF_THE_SHELF_NAVER[args.backbone], 77 | dest_path=zip_file_path, unzip=True) 78 | if args.backbone == "resnet50conv5": 79 | state_dict_filename = "Resnet50-AP-GeM.pt" 80 | elif args.backbone == "resnet101conv5": 81 | state_dict_filename = "Resnet-101-AP-GeM.pt" 82 | state_dict = torch.load(join("data", "off_the_shelf_nets", state_dict_filename)) 83 | state_dict = state_dict["state_dict"] 84 | 
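    # The downloaded checkpoints use different parameter names than this model,
    # so the dict built below re-keys them purely by position: it assumes the
    # checkpoint stores its tensors in exactly the same order as model.state_dict().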
model_keys = model.state_dict().keys() 85 | renamed_state_dict = {k: v for k, v in zip(model_keys, state_dict.values())} 86 | model.load_state_dict(renamed_state_dict) 87 | elif args.resume is not None: 88 | logging.info(f"Resuming model from {args.resume}") 89 | model = util.resume_model(args, model) 90 | # Enable DataParallel after loading checkpoint, otherwise doing it before 91 | # would append "module." in front of the keys of the state dict triggering errors 92 | model = torch.nn.DataParallel(model) 93 | 94 | if args.pca_dim is None: 95 | pca = None 96 | else: 97 | full_features_dim = args.features_dim 98 | args.features_dim = args.pca_dim 99 | pca = util.compute_pca(args, model, args.pca_dataset_folder, full_features_dim) 100 | 101 | ######################################### DATASETS ######################################### 102 | test_ds = datasets_ws.BaseDataset(args, args.datasets_folder, args.dataset_name, "test") 103 | logging.info(f"Test set: {test_ds}") 104 | 105 | ######################################### TEST on TEST SET ######################################### 106 | recalls, recalls_str = test.test(args, test_ds, model, args.test_method, pca) 107 | logging.info(f"Recalls on {test_ds}: {recalls_str}") 108 | 109 | logging.info(f"Finished in {str(datetime.now() - start_time)[:-7]}") 110 | -------------------------------------------------------------------------------- /benchmark/deep-visual-geo-localization-benchmark/ind_name.py: -------------------------------------------------------------------------------- 1 | # fd=open("test_database.txt") 2 | # fq=open("test_queries.txt") 3 | # fdl=fd.readlines() 4 | # fql=fq.readlines() 5 | # f=open("anyloc.txt") 6 | # f1=open("results/anyloc_name.txt","w") 7 | # lines=f.readlines() 8 | # for i in range(1,len(lines)): 9 | # s=lines[i].strip().split() 10 | # f1.write(fql[int(s[0])].strip()+" "+fdl[int(s[1])].strip()+"\n") 11 | import math 12 | 13 | def calculate_distance(point1, point2): 14 | x1, y1 = point1 15 | x2, y2 = point2 16 | distance = math.sqrt((x2 - x1)**2 + (y2 - y1)**2) 17 | return distance 18 | 19 | # f=open("results/resnet_name.txt") 20 | # f1=open("results/resnet_eval.txt","w") 21 | # for line in f: 22 | # s=line.strip().split() 23 | # x0=float(s[0].split('@')[1]) 24 | # y0=float(s[0].split('@')[2]) 25 | # x1=float(s[1].split('@')[1]) 26 | # y1=float(s[1].split('@')[2]) 27 | # f1.write(s[0]+" "+s[1]+" "+str(calculate_distance((x0,y0),(x1,y1)))+"\n") 28 | 29 | ad={} 30 | f=open("results/anyloc_eval.txt") 31 | for line in f: 32 | s=line.strip().split() 33 | ad[s[0]]=(s[1],float(s[2])) 34 | 35 | cd={} 36 | f1=open("results/cct_eval.txt") 37 | for line in f1: 38 | s=line.strip().split() 39 | cd[s[0]]=(s[1],float(s[2])) 40 | 41 | cod={} 42 | f2=open("results/cosplace_eval.txt") 43 | for line in f2: 44 | s=line.strip().split() 45 | cod[s[0]]=(s[1],float(s[2])) 46 | 47 | md={} 48 | f3=open("results/mixvpr_eval.txt") 49 | for line in f3: 50 | s=line.strip().split() 51 | md[s[0]]=(s[1],float(s[2])) 52 | 53 | rd={} 54 | f4=open("results/resnet_eval.txt") 55 | for line in f4: 56 | s=line.strip().split() 57 | rd[s[0]]=(s[1],float(s[2])) 58 | 59 | ad_rate={} 60 | for k,v in cod.items(): 61 | m=k.split('@')[1][1] 62 | if m not in ad_rate: 63 | ad_rate[m]=[0,0] 64 | if v[1]<=25: 65 | ad_rate[m][0]+=1 66 | else: 67 | ad_rate[m][1]+=1 68 | 69 | a=[] 70 | for i in range(0,9): 71 | a.append(round(ad_rate[str(i)][0]/(ad_rate[str(i)][0]+ad_rate[str(i)][1]),2)) 72 | # print(a) 73 | # common_keys = set(ad.keys()) & set(cd.keys()) & 
set(rd.keys()) & set(md.keys()) 74 | # num_common_keys = len(common_keys) 75 | # print(num_common_keys) 76 | 77 | # for k,v in md.items(): 78 | # if float(k.split('@')[1])<8000 or float(k.split('@')[1])>9000: 79 | # continue 80 | # if v[1]<5 and (k in ad) and (k in cd) and (k in rd): 81 | # print(k, ad[k],cd[k],rd[k],md[k],sep="\n") 82 | # break 83 | 84 | # k="/mnt/data/nyc_indoor/indoor/images/test/queries/@08115.45@00140.84@620@.jpg" 85 | # print(k, ad[k],cd[k],rd[k],md[k],sep="\n") 86 | # for k,v in md.items(): 87 | # if float(k.split('@')[1])<500: 88 | # continue 89 | # if v[1]<5 and (k in ad) and (k in cd) and (k in rd) and (ad[k][1]>20): 90 | # print(k, ad[k],cd[k],rd[k],md[k],sep="\n") 91 | # break 92 | 93 | # for k,v in md.items(): 94 | # if float(k.split('@')[1])<1500 or (float(k.split('@')[1])>8000 and float(k.split('@')[1])<9000): 95 | # continue 96 | # if v[1]<5 and (k in ad) and (k in cd) and (k in rd) and (ad[k][1]>20 or cd[k][1]>20 or rd[k][1]>20): 97 | # print(k, ad[k],cd[k],rd[k],md[k],sep="\n") 98 | # break 99 | 100 | s=["/mnt/data/nyc_indoor/indoor/images/test/queries/@08115.45@00140.84@620@.jpg"] 101 | 102 | kn="" 103 | for q in s: 104 | maxd=9999 105 | coord=(float(q.split('@')[1]),float(q.split('@')[2])) 106 | for k,v in cod.items(): 107 | c1=(float(k.split('@')[1]),float(k.split('@')[2])) 108 | if calculate_distance(coord,c1) 0) 23 | return new_mask 24 | 25 | def forward(self, x, mask=None): 26 | embed = self.embeddings(x) 27 | embed = embed if mask is None else embed * self.forward_mask(mask).unsqueeze(-1).float() 28 | return embed, mask 29 | 30 | @staticmethod 31 | def init_weight(m): 32 | if isinstance(m, nn.Linear): 33 | nn.init.trunc_normal_(m.weight, std=.02) 34 | if isinstance(m, nn.Linear) and m.bias is not None: 35 | nn.init.constant_(m.bias, 0) 36 | else: 37 | nn.init.normal_(m.weight) 38 | -------------------------------------------------------------------------------- /benchmark/deep-visual-geo-localization-benchmark/model/cct/helpers.py: -------------------------------------------------------------------------------- 1 | import math 2 | import torch 3 | import torch.nn.functional as F 4 | 5 | 6 | def resize_pos_embed(posemb, posemb_new, num_tokens=1): 7 | # Copied from `timm` by Ross Wightman: 8 | # github.com/rwightman/pytorch-image-models 9 | # Rescale the grid of position embeddings when loading from state_dict. 
Adapted from 10 | # https://github.com/google-research/vision_transformer/blob/00883dd691c63a6830751563748663526e811cee/vit_jax/checkpoint.py#L224 11 | ntok_new = posemb_new.shape[1] 12 | if num_tokens: 13 | posemb_tok, posemb_grid = posemb[:, :num_tokens], posemb[0, num_tokens:] 14 | ntok_new -= num_tokens 15 | else: 16 | posemb_tok, posemb_grid = posemb[:, :0], posemb[0] 17 | gs_old = int(math.sqrt(len(posemb_grid))) 18 | gs_new = int(math.sqrt(ntok_new)) 19 | posemb_grid = posemb_grid.reshape(1, gs_old, gs_old, -1).permute(0, 3, 1, 2) 20 | posemb_grid = F.interpolate(posemb_grid, size=(gs_new, gs_new), mode='bilinear') 21 | posemb_grid = posemb_grid.permute(0, 2, 3, 1).reshape(1, gs_new * gs_new, -1) 22 | posemb = torch.cat([posemb_tok, posemb_grid], dim=1) 23 | return posemb 24 | 25 | 26 | def pe_check(model, state_dict, pe_key='classifier.positional_emb'): 27 | if pe_key is not None and pe_key in state_dict.keys() and pe_key in model.state_dict().keys(): 28 | if model.state_dict()[pe_key].shape != state_dict[pe_key].shape: 29 | state_dict[pe_key] = resize_pos_embed(state_dict[pe_key], 30 | model.state_dict()[pe_key], 31 | num_tokens=model.classifier.num_tokens) 32 | return state_dict 33 | -------------------------------------------------------------------------------- /benchmark/deep-visual-geo-localization-benchmark/model/cct/stochastic_depth.py: -------------------------------------------------------------------------------- 1 | # Thanks to rwightman's timm package 2 | # github.com:rwightman/pytorch-image-models 3 | 4 | import torch 5 | import torch.nn as nn 6 | 7 | 8 | def drop_path(x, drop_prob: float = 0., training: bool = False): 9 | """ 10 | Obtained from: github.com:rwightman/pytorch-image-models 11 | Drop paths (Stochastic Depth) per sample (when applied in main path of residual blocks). 12 | This is the same as the DropConnect impl I created for EfficientNet, etc networks, however, 13 | the original name is misleading as 'Drop Connect' is a different form of dropout in a separate paper... 14 | See discussion: https://github.com/tensorflow/tpu/issues/494#issuecomment-532968956 ... I've opted for 15 | changing the layer and argument names to 'drop path' rather than mix DropConnect as a layer name and use 16 | 'survival rate' as the argument. 17 | """ 18 | if drop_prob == 0. or not training: 19 | return x 20 | keep_prob = 1 - drop_prob 21 | shape = (x.shape[0],) + (1,) * (x.ndim - 1) # work with diff dim tensors, not just 2D ConvNets 22 | random_tensor = keep_prob + torch.rand(shape, dtype=x.dtype, device=x.device) 23 | random_tensor.floor_() # binarize 24 | output = x.div(keep_prob) * random_tensor 25 | return output 26 | 27 | 28 | class DropPath(nn.Module): 29 | """ 30 | Obtained from: github.com:rwightman/pytorch-image-models 31 | Drop paths (Stochastic Depth) per sample (when applied in main path of residual blocks). 
32 | """ 33 | 34 | def __init__(self, drop_prob=None): 35 | super(DropPath, self).__init__() 36 | self.drop_prob = drop_prob 37 | 38 | def forward(self, x): 39 | return drop_path(x, self.drop_prob, self.training) 40 | -------------------------------------------------------------------------------- /benchmark/deep-visual-geo-localization-benchmark/model/cct/tokenizer.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import torch.nn as nn 3 | import torch.nn.functional as F 4 | 5 | 6 | class Tokenizer(nn.Module): 7 | def __init__(self, 8 | kernel_size, stride, padding, 9 | pooling_kernel_size=3, pooling_stride=2, pooling_padding=1, 10 | n_conv_layers=1, 11 | n_input_channels=3, 12 | n_output_channels=64, 13 | in_planes=64, 14 | activation=None, 15 | max_pool=True, 16 | conv_bias=False): 17 | super(Tokenizer, self).__init__() 18 | 19 | n_filter_list = [n_input_channels] + \ 20 | [in_planes for _ in range(n_conv_layers - 1)] + \ 21 | [n_output_channels] 22 | 23 | self.conv_layers = nn.Sequential( 24 | *[nn.Sequential( 25 | nn.Conv2d(n_filter_list[i], n_filter_list[i + 1], 26 | kernel_size=(kernel_size, kernel_size), 27 | stride=(stride, stride), 28 | padding=(padding, padding), bias=conv_bias), 29 | nn.Identity() if activation is None else activation(), 30 | nn.MaxPool2d(kernel_size=pooling_kernel_size, 31 | stride=pooling_stride, 32 | padding=pooling_padding) if max_pool else nn.Identity() 33 | ) 34 | for i in range(n_conv_layers) 35 | ]) 36 | 37 | self.flattener = nn.Flatten(2, 3) 38 | self.apply(self.init_weight) 39 | 40 | def sequence_length(self, n_channels=3, height=224, width=224): 41 | return self.forward(torch.zeros((1, n_channels, height, width))).shape[1] 42 | 43 | def forward(self, x): 44 | return self.flattener(self.conv_layers(x)).transpose(-2, -1) 45 | 46 | @staticmethod 47 | def init_weight(m): 48 | if isinstance(m, nn.Conv2d): 49 | nn.init.kaiming_normal_(m.weight) 50 | 51 | 52 | class TextTokenizer(nn.Module): 53 | def __init__(self, 54 | kernel_size, stride, padding, 55 | pooling_kernel_size=3, pooling_stride=2, pooling_padding=1, 56 | embedding_dim=300, 57 | n_output_channels=128, 58 | activation=None, 59 | max_pool=True, 60 | *args, **kwargs): 61 | super(TextTokenizer, self).__init__() 62 | 63 | self.max_pool = max_pool 64 | self.conv_layers = nn.Sequential( 65 | nn.Conv2d(1, n_output_channels, 66 | kernel_size=(kernel_size, embedding_dim), 67 | stride=(stride, 1), 68 | padding=(padding, 0), bias=False), 69 | nn.Identity() if activation is None else activation(), 70 | nn.MaxPool2d( 71 | kernel_size=(pooling_kernel_size, 1), 72 | stride=(pooling_stride, 1), 73 | padding=(pooling_padding, 0) 74 | ) if max_pool else nn.Identity() 75 | ) 76 | 77 | self.apply(self.init_weight) 78 | 79 | def seq_len(self, seq_len=32, embed_dim=300): 80 | return self.forward(torch.zeros((1, seq_len, embed_dim)))[0].shape[1] 81 | 82 | def forward_mask(self, mask): 83 | new_mask = mask.unsqueeze(1).float() 84 | cnn_weight = torch.ones( 85 | (1, 1, self.conv_layers[0].kernel_size[0]), 86 | device=mask.device, 87 | dtype=torch.float) 88 | new_mask = F.conv1d( 89 | new_mask, cnn_weight, None, 90 | self.conv_layers[0].stride[0], self.conv_layers[0].padding[0], 1, 1) 91 | if self.max_pool: 92 | new_mask = F.max_pool1d( 93 | new_mask, self.conv_layers[2].kernel_size[0], 94 | self.conv_layers[2].stride[0], self.conv_layers[2].padding[0], 1, False, False) 95 | new_mask = new_mask.squeeze(1) 96 | new_mask = (new_mask > 0) 97 | return new_mask 98 | 99 | 
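    # forward() treats the (seq_len, embed_dim) embedding matrix as a 1-channel
    # image, convolves (and optionally max-pools) over it, then re-applies the
    # downsampled padding mask so padded positions stay zeroed.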
def forward(self, x, mask=None): 100 | x = x.unsqueeze(1) 101 | x = self.conv_layers(x) 102 | x = x.transpose(1, 3).squeeze(1) 103 | x = x if mask is None else x * self.forward_mask(mask).unsqueeze(-1).float() 104 | return x, mask 105 | 106 | @staticmethod 107 | def init_weight(m): 108 | if isinstance(m, nn.Conv2d): 109 | nn.init.kaiming_normal_(m.weight) 110 | -------------------------------------------------------------------------------- /benchmark/deep-visual-geo-localization-benchmark/model/commands.txt: -------------------------------------------------------------------------------- 1 | python train.py --dataset_name=nyu-vpr --datasets_folder=/scratch/ds5725/VPR-datasets-downloader/datasets --backbone=cct384 --trunc_te=8 --freeze_te 1 -------------------------------------------------------------------------------- /benchmark/deep-visual-geo-localization-benchmark/model/functional.py: -------------------------------------------------------------------------------- 1 | 2 | import math 3 | import torch 4 | import torch.nn.functional as F 5 | 6 | def sare_ind(query, positive, negative): 7 | '''all 3 inputs are supposed to be shape 1xn_features''' 8 | dist_pos = ((query - positive)**2).sum(1) 9 | dist_neg = ((query - negative)**2).sum(1) 10 | 11 | dist = - torch.cat((dist_pos, dist_neg)) 12 | dist = F.log_softmax(dist, 0) 13 | 14 | #loss = (- dist[:, 0]).mean() on a batch 15 | loss = -dist[0] 16 | return loss 17 | 18 | def sare_joint(query, positive, negatives): 19 | '''query and positive have to be 1xn_features; whereas negatives has to be 20 | shape n_negative x n_features. n_negative is usually 10''' 21 | # NOTE: the implementation is the same if batch_size=1 as all operations 22 | # are vectorial. If there were the additional n_batch dimension a different 23 | # handling of that situation would have to be implemented here. 24 | # This function is declared anyway for the sake of clarity as the 2 should 25 | # be called in different situations because, even though there would be 26 | # no Exceptions, there would actually be a conceptual error. 
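    # Both variants reduce to the negative log-softmax of the negated squared
    # distances, i.e. loss = -log( exp(-d(q,p)) / (exp(-d(q,p)) + sum_i exp(-d(q,n_i))) ),
    # which is the SARE (stochastic attraction-repulsion embedding) objective.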
27 | return sare_ind(query, positive, negatives) 28 | 29 | def mac(x): 30 | return F.adaptive_max_pool2d(x, (1,1)) 31 | 32 | def spoc(x): 33 | return F.adaptive_avg_pool2d(x, (1,1)) 34 | 35 | def gem(x, p=3, eps=1e-6, work_with_tokens=False): 36 | if work_with_tokens: 37 | x = x.permute(0, 2, 1) 38 | # unseqeeze to maintain compatibility with Flatten 39 | return F.avg_pool1d(x.clamp(min=eps).pow(p), (x.size(-1))).pow(1./p).unsqueeze(3) 40 | else: 41 | return F.avg_pool2d(x.clamp(min=eps).pow(p), (x.size(-2), x.size(-1))).pow(1./p) 42 | 43 | def rmac(x, L=3, eps=1e-6): 44 | ovr = 0.4 # desired overlap of neighboring regions 45 | steps = torch.Tensor([2, 3, 4, 5, 6, 7]) # possible regions for the long dimension 46 | W = x.size(3) 47 | H = x.size(2) 48 | w = min(W, H) 49 | # w2 = math.floor(w/2.0 - 1) 50 | b = (max(H, W)-w)/(steps-1) 51 | (tmp, idx) = torch.min(torch.abs(((w**2 - w*b)/w**2)-ovr), 0) # steps(idx) regions for long dimension 52 | # region overplus per dimension 53 | Wd = 0; 54 | Hd = 0; 55 | if H < W: 56 | Wd = idx.item() + 1 57 | elif H > W: 58 | Hd = idx.item() + 1 59 | v = F.max_pool2d(x, (x.size(-2), x.size(-1))) 60 | v = v / (torch.norm(v, p=2, dim=1, keepdim=True) + eps).expand_as(v) 61 | for l in range(1, L+1): 62 | wl = math.floor(2*w/(l+1)) 63 | wl2 = math.floor(wl/2 - 1) 64 | if l+Wd == 1: 65 | b = 0 66 | else: 67 | b = (W-wl)/(l+Wd-1) 68 | cenW = torch.floor(wl2 + torch.Tensor(range(l-1+Wd+1))*b) - wl2 # center coordinates 69 | if l+Hd == 1: 70 | b = 0 71 | else: 72 | b = (H-wl)/(l+Hd-1) 73 | cenH = torch.floor(wl2 + torch.Tensor(range(l-1+Hd+1))*b) - wl2 # center coordinates 74 | for i_ in cenH.tolist(): 75 | for j_ in cenW.tolist(): 76 | if wl == 0: 77 | continue 78 | R = x[:,:,(int(i_)+torch.Tensor(range(wl)).long()).tolist(),:] 79 | R = R[:,:,:,(int(j_)+torch.Tensor(range(wl)).long()).tolist()] 80 | vt = F.max_pool2d(R, (R.size(-2), R.size(-1))) 81 | vt = vt / (torch.norm(vt, p=2, dim=1, keepdim=True) + eps).expand_as(vt) 82 | v += vt 83 | return v 84 | 85 | -------------------------------------------------------------------------------- /benchmark/deep-visual-geo-localization-benchmark/model/network.py: -------------------------------------------------------------------------------- 1 | 2 | import os 3 | import torch 4 | import logging 5 | import torchvision 6 | from torch import nn 7 | from os.path import join 8 | from transformers import ViTModel 9 | from google_drive_downloader import GoogleDriveDownloader as gdd 10 | 11 | from model.cct import cct_14_7x2_384 12 | from model.aggregation import Flatten 13 | from model.normalization import L2Norm 14 | import model.aggregation as aggregation 15 | 16 | # Pretrained models on Google Landmarks v2 and Places 365 17 | PRETRAINED_MODELS = { 18 | 'resnet18_places' : '1DnEQXhmPxtBUrRc81nAvT8z17bk-GBj5', 19 | 'resnet50_places' : '1zsY4mN4jJ-AsmV3h4hjbT72CBfJsgSGC', 20 | 'resnet101_places' : '1E1ibXQcg7qkmmmyYgmwMTh7Xf1cDNQXa', 21 | 'vgg16_places' : '1UWl1uz6rZ6Nqmp1K5z3GHAIZJmDh4bDu', 22 | 'resnet18_gldv2' : '1wkUeUXFXuPHuEvGTXVpuP5BMB-JJ1xke', 23 | 'resnet50_gldv2' : '1UDUv6mszlXNC1lv6McLdeBNMq9-kaA70', 24 | 'resnet101_gldv2' : '1apiRxMJpDlV0XmKlC5Na_Drg2jtGL-uE', 25 | 'vgg16_gldv2' : '10Ov9JdO7gbyz6mB5x0v_VSAUMj91Ta4o' 26 | } 27 | 28 | 29 | class GeoLocalizationNet(nn.Module): 30 | """The used networks are composed of a backbone and an aggregation layer. 
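    An illustrative construction sketch (the args namespace is normally produced
    by parser.py, not reproduced here; only the fields this module actually reads
    for a CCT-384 + NetVLAD configuration are shown, and the values are examples):

        from argparse import Namespace
        args = Namespace(backbone="cct384", aggregation="netvlad", netvlad_clusters=64,
                         trunc_te=8, freeze_te=1, fc_output_dim=None)
        model = GeoLocalizationNet(args)  # get_backbone() sets args.features_dim = 384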
31 | """ 32 | def __init__(self, args): 33 | super().__init__() 34 | self.backbone = get_backbone(args) 35 | self.arch_name = args.backbone 36 | self.aggregation = get_aggregation(args) 37 | 38 | if args.aggregation in ["gem", "spoc", "mac", "rmac"]: 39 | if args.l2 == "before_pool": 40 | self.aggregation = nn.Sequential(L2Norm(), self.aggregation, Flatten()) 41 | elif args.l2 == "after_pool": 42 | self.aggregation = nn.Sequential(self.aggregation, L2Norm(), Flatten()) 43 | elif args.l2 == "none": 44 | self.aggregation = nn.Sequential(self.aggregation, Flatten()) 45 | 46 | if args.fc_output_dim != None: 47 | # Concatenate fully connected layer to the aggregation layer 48 | self.aggregation = nn.Sequential(self.aggregation, 49 | nn.Linear(args.features_dim, args.fc_output_dim), 50 | L2Norm()) 51 | args.features_dim = args.fc_output_dim 52 | 53 | def forward(self, x): 54 | x = self.backbone(x) 55 | x = self.aggregation(x) 56 | return x 57 | 58 | 59 | def get_aggregation(args): 60 | if args.aggregation == "gem": 61 | return aggregation.GeM(work_with_tokens=args.work_with_tokens) 62 | elif args.aggregation == "spoc": 63 | return aggregation.SPoC() 64 | elif args.aggregation == "mac": 65 | return aggregation.MAC() 66 | elif args.aggregation == "rmac": 67 | return aggregation.RMAC() 68 | elif args.aggregation == "netvlad": 69 | return aggregation.NetVLAD(clusters_num=args.netvlad_clusters, dim=args.features_dim, 70 | work_with_tokens=args.work_with_tokens) 71 | elif args.aggregation == 'crn': 72 | return aggregation.CRN(clusters_num=args.netvlad_clusters, dim=args.features_dim) 73 | elif args.aggregation == "rrm": 74 | return aggregation.RRM(args.features_dim) 75 | elif args.aggregation in ['cls', 'seqpool']: 76 | return nn.Identity() 77 | 78 | 79 | def get_pretrained_model(args): 80 | if args.pretrain == 'places': num_classes = 365 81 | elif args.pretrain == 'gldv2': num_classes = 512 82 | 83 | if args.backbone.startswith("resnet18"): 84 | model = torchvision.models.resnet18(num_classes=num_classes) 85 | elif args.backbone.startswith("resnet50"): 86 | model = torchvision.models.resnet50(num_classes=num_classes) 87 | elif args.backbone.startswith("resnet101"): 88 | model = torchvision.models.resnet101(num_classes=num_classes) 89 | elif args.backbone.startswith("vgg16"): 90 | model = torchvision.models.vgg16(num_classes=num_classes) 91 | 92 | if args.backbone.startswith('resnet'): 93 | model_name = args.backbone.split('conv')[0] + "_" + args.pretrain 94 | else: 95 | model_name = args.backbone + "_" + args.pretrain 96 | file_path = join("data", "pretrained_nets", model_name +".pth") 97 | 98 | if not os.path.exists(file_path): 99 | gdd.download_file_from_google_drive(file_id=PRETRAINED_MODELS[model_name], 100 | dest_path=file_path) 101 | state_dict = torch.load(file_path, map_location=torch.device('cpu')) 102 | model.load_state_dict(state_dict) 103 | return model 104 | 105 | 106 | def get_backbone(args): 107 | # The aggregation layer works differently based on the type of architecture 108 | args.work_with_tokens = args.backbone.startswith('cct') or args.backbone.startswith('vit') 109 | if args.backbone.startswith("resnet"): 110 | if args.pretrain in ['places', 'gldv2']: 111 | backbone = get_pretrained_model(args) 112 | elif args.backbone.startswith("resnet18"): 113 | backbone = torchvision.models.resnet18(pretrained=True) 114 | elif args.backbone.startswith("resnet50"): 115 | backbone = torchvision.models.resnet50(pretrained=True) 116 | elif args.backbone.startswith("resnet101"): 117 | backbone = 
torchvision.models.resnet101(pretrained=True) 118 | for name, child in backbone.named_children(): 119 | # Freeze layers before conv_3 120 | if name == "layer3": 121 | break 122 | for params in child.parameters(): 123 | params.requires_grad = False 124 | if args.backbone.endswith("conv4"): 125 | logging.debug(f"Train only conv4_x of the resnet{args.backbone.split('conv')[0]} (remove conv5_x), freeze the previous ones") 126 | layers = list(backbone.children())[:-3] 127 | elif args.backbone.endswith("conv5"): 128 | logging.debug(f"Train only conv4_x and conv5_x of the resnet{args.backbone.split('conv')[0]}, freeze the previous ones") 129 | layers = list(backbone.children())[:-2] 130 | elif args.backbone == "vgg16": 131 | if args.pretrain in ['places', 'gldv2']: 132 | backbone = get_pretrained_model(args) 133 | else: 134 | backbone = torchvision.models.vgg16(pretrained=True) 135 | layers = list(backbone.features.children())[:-2] 136 | for l in layers[:-5]: 137 | for p in l.parameters(): p.requires_grad = False 138 | logging.debug("Train last layers of the vgg16, freeze the previous ones") 139 | elif args.backbone == "alexnet": 140 | backbone = torchvision.models.alexnet(pretrained=True) 141 | layers = list(backbone.features.children())[:-2] 142 | for l in layers[:5]: 143 | for p in l.parameters(): p.requires_grad = False 144 | logging.debug("Train last layers of the alexnet, freeze the previous ones") 145 | elif args.backbone.startswith("cct"): 146 | if args.backbone.startswith("cct384"): 147 | backbone = cct_14_7x2_384(pretrained=True, progress=True, aggregation=args.aggregation) 148 | if args.trunc_te: 149 | logging.debug(f"Truncate CCT at transformers encoder {args.trunc_te}") 150 | backbone.classifier.blocks = torch.nn.ModuleList(backbone.classifier.blocks[:args.trunc_te].children()) 151 | if args.freeze_te: 152 | logging.debug(f"Freeze all the layers up to tranformer encoder {args.freeze_te}") 153 | for p in backbone.parameters(): 154 | p.requires_grad = False 155 | for name, child in backbone.classifier.blocks.named_children(): 156 | if int(name) > args.freeze_te: 157 | for params in child.parameters(): 158 | params.requires_grad = True 159 | args.features_dim = 384 160 | return backbone 161 | elif args.backbone.startswith("vit"): 162 | assert args.resize[0] in [224, 384], f'Image size for ViT must be either 224 or 384, but it\'s {args.resize[0]}' 163 | if args.resize[0] == 224: 164 | backbone = ViTModel.from_pretrained('google/vit-base-patch16-224-in21k') 165 | elif args.resize[0] == 384: 166 | backbone = ViTModel.from_pretrained('google/vit-base-patch16-384') 167 | 168 | if args.trunc_te: 169 | logging.debug(f"Truncate ViT at transformers encoder {args.trunc_te}") 170 | backbone.encoder.layer = backbone.encoder.layer[:args.trunc_te] 171 | if args.freeze_te: 172 | logging.debug(f"Freeze all the layers up to tranformer encoder {args.freeze_te+1}") 173 | for p in backbone.parameters(): 174 | p.requires_grad = False 175 | for name, child in backbone.encoder.layer.named_children(): 176 | if int(name) > args.freeze_te: 177 | for params in child.parameters(): 178 | params.requires_grad = True 179 | backbone = VitWrapper(backbone, args.aggregation) 180 | 181 | args.features_dim = 768 182 | return backbone 183 | 184 | backbone = torch.nn.Sequential(*layers) 185 | args.features_dim = get_output_channels_dim(backbone) # Dinamically obtain number of channels in output 186 | return backbone 187 | 188 | 189 | class VitWrapper(nn.Module): 190 | def __init__(self, vit_model, aggregation): 191 | 
super().__init__() 192 | self.vit_model = vit_model 193 | self.aggregation = aggregation 194 | def forward(self, x): 195 | if self.aggregation in ["netvlad", "gem"]: 196 | return self.vit_model(x).last_hidden_state[:, 1:, :] 197 | else: 198 | return self.vit_model(x).last_hidden_state[:, 0, :] 199 | 200 | 201 | def get_output_channels_dim(model): 202 | """Return the number of channels in the output of a model.""" 203 | return model(torch.ones([1, 3, 224, 224])).shape[1] 204 | 205 | -------------------------------------------------------------------------------- /benchmark/deep-visual-geo-localization-benchmark/model/normalization.py: -------------------------------------------------------------------------------- 1 | 2 | import torch.nn as nn 3 | import torch.nn.functional as F 4 | 5 | class L2Norm(nn.Module): 6 | def __init__(self, dim=1): 7 | super().__init__() 8 | self.dim = dim 9 | def forward(self, x): 10 | return F.normalize(x, p=2, dim=self.dim) 11 | 12 | -------------------------------------------------------------------------------- /benchmark/deep-visual-geo-localization-benchmark/model/sync_batchnorm/__init__.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | # File : __init__.py 3 | # Author : Jiayuan Mao 4 | # Email : maojiayuan@gmail.com 5 | # Date : 27/01/2018 6 | # 7 | # This file is part of Synchronized-BatchNorm-PyTorch. 8 | # https://github.com/vacancy/Synchronized-BatchNorm-PyTorch 9 | # Distributed under MIT License. 10 | 11 | from .batchnorm import set_sbn_eps_mode 12 | from .batchnorm import SynchronizedBatchNorm1d, SynchronizedBatchNorm2d, SynchronizedBatchNorm3d 13 | from .batchnorm import patch_sync_batchnorm, convert_model 14 | from .replicate import DataParallelWithCallback, patch_replication_callback 15 | -------------------------------------------------------------------------------- /benchmark/deep-visual-geo-localization-benchmark/model/sync_batchnorm/batchnorm_reimpl.py: -------------------------------------------------------------------------------- 1 | #! /usr/bin/env python3 2 | # -*- coding: utf-8 -*- 3 | # File : batchnorm_reimpl.py 4 | # Author : acgtyrant 5 | # Date : 11/01/2018 6 | # 7 | # This file is part of Synchronized-BatchNorm-PyTorch. 8 | # https://github.com/vacancy/Synchronized-BatchNorm-PyTorch 9 | # Distributed under MIT License. 10 | 11 | import torch 12 | import torch.nn as nn 13 | import torch.nn.init as init 14 | 15 | __all__ = ['BatchNorm2dReimpl'] 16 | 17 | 18 | class BatchNorm2dReimpl(nn.Module): 19 | """ 20 | A re-implementation of batch normalization, used for testing the numerical 21 | stability. 
22 | 23 | Author: acgtyrant 24 | See also: 25 | https://github.com/vacancy/Synchronized-BatchNorm-PyTorch/issues/14 26 | """ 27 | def __init__(self, num_features, eps=1e-5, momentum=0.1): 28 | super().__init__() 29 | 30 | self.num_features = num_features 31 | self.eps = eps 32 | self.momentum = momentum 33 | self.weight = nn.Parameter(torch.empty(num_features)) 34 | self.bias = nn.Parameter(torch.empty(num_features)) 35 | self.register_buffer('running_mean', torch.zeros(num_features)) 36 | self.register_buffer('running_var', torch.ones(num_features)) 37 | self.reset_parameters() 38 | 39 | def reset_running_stats(self): 40 | self.running_mean.zero_() 41 | self.running_var.fill_(1) 42 | 43 | def reset_parameters(self): 44 | self.reset_running_stats() 45 | init.uniform_(self.weight) 46 | init.zeros_(self.bias) 47 | 48 | def forward(self, input_): 49 | batchsize, channels, height, width = input_.size() 50 | numel = batchsize * height * width 51 | input_ = input_.permute(1, 0, 2, 3).contiguous().view(channels, numel) 52 | sum_ = input_.sum(1) 53 | sum_of_square = input_.pow(2).sum(1) 54 | mean = sum_ / numel 55 | sumvar = sum_of_square - sum_ * mean 56 | 57 | self.running_mean = ( 58 | (1 - self.momentum) * self.running_mean 59 | + self.momentum * mean.detach() 60 | ) 61 | unbias_var = sumvar / (numel - 1) 62 | self.running_var = ( 63 | (1 - self.momentum) * self.running_var 64 | + self.momentum * unbias_var.detach() 65 | ) 66 | 67 | bias_var = sumvar / numel 68 | inv_std = 1 / (bias_var + self.eps).pow(0.5) 69 | output = ( 70 | (input_ - mean.unsqueeze(1)) * inv_std.unsqueeze(1) * 71 | self.weight.unsqueeze(1) + self.bias.unsqueeze(1)) 72 | 73 | return output.view(channels, batchsize, height, width).permute(1, 0, 2, 3).contiguous() 74 | 75 | -------------------------------------------------------------------------------- /benchmark/deep-visual-geo-localization-benchmark/model/sync_batchnorm/comm.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | # File : comm.py 3 | # Author : Jiayuan Mao 4 | # Email : maojiayuan@gmail.com 5 | # Date : 27/01/2018 6 | # 7 | # This file is part of Synchronized-BatchNorm-PyTorch. 8 | # https://github.com/vacancy/Synchronized-BatchNorm-PyTorch 9 | # Distributed under MIT License. 10 | 11 | import queue 12 | import collections 13 | import threading 14 | 15 | __all__ = ['FutureResult', 'SlavePipe', 'SyncMaster'] 16 | 17 | 18 | class FutureResult(object): 19 | """A thread-safe future implementation. Used only as one-to-one pipe.""" 20 | 21 | def __init__(self): 22 | self._result = None 23 | self._lock = threading.Lock() 24 | self._cond = threading.Condition(self._lock) 25 | 26 | def put(self, result): 27 | with self._lock: 28 | assert self._result is None, 'Previous result has\'t been fetched.' 
29 | self._result = result 30 | self._cond.notify() 31 | 32 | def get(self): 33 | with self._lock: 34 | if self._result is None: 35 | self._cond.wait() 36 | 37 | res = self._result 38 | self._result = None 39 | return res 40 | 41 | 42 | _MasterRegistry = collections.namedtuple('MasterRegistry', ['result']) 43 | _SlavePipeBase = collections.namedtuple('_SlavePipeBase', ['identifier', 'queue', 'result']) 44 | 45 | 46 | class SlavePipe(_SlavePipeBase): 47 | """Pipe for master-slave communication.""" 48 | 49 | def run_slave(self, msg): 50 | self.queue.put((self.identifier, msg)) 51 | ret = self.result.get() 52 | self.queue.put(True) 53 | return ret 54 | 55 | 56 | class SyncMaster(object): 57 | """An abstract `SyncMaster` object. 58 | 59 | - During the replication, as the data parallel will trigger an callback of each module, all slave devices should 60 | call `register(id)` and obtain an `SlavePipe` to communicate with the master. 61 | - During the forward pass, master device invokes `run_master`, all messages from slave devices will be collected, 62 | and passed to a registered callback. 63 | - After receiving the messages, the master device should gather the information and determine to message passed 64 | back to each slave devices. 65 | """ 66 | 67 | def __init__(self, master_callback): 68 | """ 69 | 70 | Args: 71 | master_callback: a callback to be invoked after having collected messages from slave devices. 72 | """ 73 | self._master_callback = master_callback 74 | self._queue = queue.Queue() 75 | self._registry = collections.OrderedDict() 76 | self._activated = False 77 | 78 | def __getstate__(self): 79 | return {'master_callback': self._master_callback} 80 | 81 | def __setstate__(self, state): 82 | self.__init__(state['master_callback']) 83 | 84 | def register_slave(self, identifier): 85 | """ 86 | Register an slave device. 87 | 88 | Args: 89 | identifier: an identifier, usually is the device id. 90 | 91 | Returns: a `SlavePipe` object which can be used to communicate with the master device. 92 | 93 | """ 94 | if self._activated: 95 | assert self._queue.empty(), 'Queue is not clean before next initialization.' 96 | self._activated = False 97 | self._registry.clear() 98 | future = FutureResult() 99 | self._registry[identifier] = _MasterRegistry(future) 100 | return SlavePipe(identifier, self._queue, future) 101 | 102 | def run_master(self, master_msg): 103 | """ 104 | Main entry for the master device in each forward pass. 105 | The messages were first collected from each devices (including the master device), and then 106 | an callback will be invoked to compute the message to be sent back to each devices 107 | (including the master device). 108 | 109 | Args: 110 | master_msg: the message that the master want to send to itself. This will be placed as the first 111 | message when calling `master_callback`. For detailed usage, see `_SynchronizedBatchNorm` for an example. 112 | 113 | Returns: the message to be sent back to the master device. 114 | 115 | """ 116 | self._activated = True 117 | 118 | intermediates = [(0, master_msg)] 119 | for i in range(self.nr_slaves): 120 | intermediates.append(self._queue.get()) 121 | 122 | results = self._master_callback(intermediates) 123 | assert results[0][0] == 0, 'The first result should belongs to the master.' 
124 | 125 | for i, res in results: 126 | if i == 0: 127 | continue 128 | self._registry[i].result.put(res) 129 | 130 | for i in range(self.nr_slaves): 131 | assert self._queue.get() is True 132 | 133 | return results[0][1] 134 | 135 | @property 136 | def nr_slaves(self): 137 | return len(self._registry) 138 | -------------------------------------------------------------------------------- /benchmark/deep-visual-geo-localization-benchmark/model/sync_batchnorm/replicate.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | # File : replicate.py 3 | # Author : Jiayuan Mao 4 | # Email : maojiayuan@gmail.com 5 | # Date : 27/01/2018 6 | # 7 | # This file is part of Synchronized-BatchNorm-PyTorch. 8 | # https://github.com/vacancy/Synchronized-BatchNorm-PyTorch 9 | # Distributed under MIT License. 10 | 11 | import functools 12 | 13 | from torch.nn.parallel.data_parallel import DataParallel 14 | 15 | __all__ = [ 16 | 'CallbackContext', 17 | 'execute_replication_callbacks', 18 | 'DataParallelWithCallback', 19 | 'patch_replication_callback' 20 | ] 21 | 22 | 23 | class CallbackContext(object): 24 | pass 25 | 26 | 27 | def execute_replication_callbacks(modules): 28 | """ 29 | Execute an replication callback `__data_parallel_replicate__` on each module created by original replication. 30 | 31 | The callback will be invoked with arguments `__data_parallel_replicate__(ctx, copy_id)` 32 | 33 | Note that, as all modules are isomorphism, we assign each sub-module with a context 34 | (shared among multiple copies of this module on different devices). 35 | Through this context, different copies can share some information. 36 | 37 | We guarantee that the callback on the master copy (the first copy) will be called ahead of calling the callback 38 | of any slave copies. 39 | """ 40 | master_copy = modules[0] 41 | nr_modules = len(list(master_copy.modules())) 42 | ctxs = [CallbackContext() for _ in range(nr_modules)] 43 | 44 | for i, module in enumerate(modules): 45 | for j, m in enumerate(module.modules()): 46 | if hasattr(m, '__data_parallel_replicate__'): 47 | m.__data_parallel_replicate__(ctxs[j], i) 48 | 49 | 50 | class DataParallelWithCallback(DataParallel): 51 | """ 52 | Data Parallel with a replication callback. 53 | 54 | An replication callback `__data_parallel_replicate__` of each module will be invoked after being created by 55 | original `replicate` function. 56 | The callback will be invoked with arguments `__data_parallel_replicate__(ctx, copy_id)` 57 | 58 | Examples: 59 | > sync_bn = SynchronizedBatchNorm1d(10, eps=1e-5, affine=False) 60 | > sync_bn = DataParallelWithCallback(sync_bn, device_ids=[0, 1]) 61 | # sync_bn.__data_parallel_replicate__ will be invoked. 62 | """ 63 | 64 | def replicate(self, module, device_ids): 65 | modules = super(DataParallelWithCallback, self).replicate(module, device_ids) 66 | execute_replication_callbacks(modules) 67 | return modules 68 | 69 | 70 | def patch_replication_callback(data_parallel): 71 | """ 72 | Monkey-patch an existing `DataParallel` object. Add the replication callback. 73 | Useful when you have customized `DataParallel` implementation. 
74 | 75 | Examples: 76 | > sync_bn = SynchronizedBatchNorm1d(10, eps=1e-5, affine=False) 77 | > sync_bn = DataParallel(sync_bn, device_ids=[0, 1]) 78 | > patch_replication_callback(sync_bn) 79 | # this is equivalent to 80 | > sync_bn = SynchronizedBatchNorm1d(10, eps=1e-5, affine=False) 81 | > sync_bn = DataParallelWithCallback(sync_bn, device_ids=[0, 1]) 82 | """ 83 | 84 | assert isinstance(data_parallel, DataParallel) 85 | 86 | old_replicate = data_parallel.replicate 87 | 88 | @functools.wraps(old_replicate) 89 | def new_replicate(module, device_ids): 90 | modules = old_replicate(module, device_ids) 91 | execute_replication_callbacks(modules) 92 | return modules 93 | 94 | data_parallel.replicate = new_replicate 95 | -------------------------------------------------------------------------------- /benchmark/deep-visual-geo-localization-benchmark/model/sync_batchnorm/unittest.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | # File : unittest.py 3 | # Author : Jiayuan Mao 4 | # Email : maojiayuan@gmail.com 5 | # Date : 27/01/2018 6 | # 7 | # This file is part of Synchronized-BatchNorm-PyTorch. 8 | # https://github.com/vacancy/Synchronized-BatchNorm-PyTorch 9 | # Distributed under MIT License. 10 | 11 | import unittest 12 | import torch 13 | 14 | 15 | class TorchTestCase(unittest.TestCase): 16 | def assertTensorClose(self, x, y): 17 | adiff = float((x - y).abs().max()) 18 | if (y == 0).all(): 19 | rdiff = 'NaN' 20 | else: 21 | rdiff = float((adiff / y).abs().max()) 22 | 23 | message = ( 24 | 'Tensor close check failed\n' 25 | 'adiff={}\n' 26 | 'rdiff={}\n' 27 | ).format(adiff, rdiff) 28 | self.assertTrue(torch.allclose(x, y, atol=1e-5, rtol=1e-3), message) 29 | 30 | -------------------------------------------------------------------------------- /benchmark/deep-visual-geo-localization-benchmark/models/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ai4ce/NYC-Indoor-VPR/36510997e724eb07caf9577128dc666b335ed7e5/benchmark/deep-visual-geo-localization-benchmark/models/__init__.py -------------------------------------------------------------------------------- /benchmark/deep-visual-geo-localization-benchmark/models/aggregators/__init__.py: -------------------------------------------------------------------------------- 1 | from .cosplace import CosPlace 2 | from .convap import ConvAP 3 | from .gem import GeMPool 4 | from .mixvpr import MixVPR -------------------------------------------------------------------------------- /benchmark/deep-visual-geo-localization-benchmark/models/aggregators/convap.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import torch.nn.functional as F 3 | import torch.nn as nn 4 | 5 | 6 | class ConvAP(nn.Module): 7 | """Implementation of ConvAP as of https://arxiv.org/pdf/2210.10239.pdf 8 | 9 | Args: 10 | in_channels (int): number of channels in the input of ConvAP 11 | out_channels (int, optional): number of channels that ConvAP outputs. Defaults to 512. 12 | s1 (int, optional): spatial height of the adaptive average pooling. Defaults to 2. 13 | s2 (int, optional): spatial width of the adaptive average pooling. Defaults to 2. 
14 | """ 15 | def __init__(self, in_channels, out_channels=512, s1=2, s2=2): 16 | super(ConvAP, self).__init__() 17 | self.channel_pool = nn.Conv2d(in_channels=in_channels, out_channels=out_channels, kernel_size=1, bias=True) 18 | self.AAP = nn.AdaptiveAvgPool2d((s1, s2)) 19 | 20 | def forward(self, x): 21 | x = self.channel_pool(x) 22 | x = self.AAP(x) 23 | x = F.normalize(x.flatten(1), p=2, dim=1) 24 | return x 25 | 26 | 27 | if __name__ == '__main__': 28 | x = torch.randn(4, 2048, 10, 10) 29 | m = ConvAP(2048, 512) 30 | r = m(x) 31 | print(r.shape) -------------------------------------------------------------------------------- /benchmark/deep-visual-geo-localization-benchmark/models/aggregators/cosplace.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import torch.nn.functional as F 3 | import torch.nn as nn 4 | 5 | class GeM(nn.Module): 6 | """Implementation of GeM as in https://github.com/filipradenovic/cnnimageretrieval-pytorch 7 | """ 8 | def __init__(self, p=3, eps=1e-6): 9 | super().__init__() 10 | self.p = nn.Parameter(torch.ones(1)*p) 11 | self.eps = eps 12 | 13 | def forward(self, x): 14 | return F.avg_pool2d(x.clamp(min=self.eps).pow(self.p), (x.size(-2), x.size(-1))).pow(1./self.p) 15 | 16 | class CosPlace(nn.Module): 17 | """ 18 | CosPlace aggregation layer as implemented in https://github.com/gmberton/CosPlace/blob/main/model/network.py 19 | 20 | Args: 21 | in_dim: number of channels of the input 22 | out_dim: dimension of the output descriptor 23 | """ 24 | def __init__(self, in_dim, out_dim): 25 | super().__init__() 26 | self.gem = GeM() 27 | self.fc = nn.Linear(in_dim, out_dim) 28 | 29 | def forward(self, x): 30 | x = F.normalize(x, p=2, dim=1) 31 | x = self.gem(x) 32 | x = x.flatten(1) 33 | x = self.fc(x) 34 | x = F.normalize(x, p=2, dim=1) 35 | return x 36 | 37 | if __name__ == '__main__': 38 | x = torch.randn(4, 2048, 10, 10) 39 | m = CosPlace(2048, 512) 40 | r = m(x) 41 | print(r.shape) -------------------------------------------------------------------------------- /benchmark/deep-visual-geo-localization-benchmark/models/aggregators/gem.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import torch.nn.functional as F 3 | import torch.nn as nn 4 | 5 | class GeMPool(nn.Module): 6 | """Implementation of GeM as in https://github.com/filipradenovic/cnnimageretrieval-pytorch 7 | we add flatten and norm so that we can use it as one aggregation layer. 
8 | """ 9 | def __init__(self, p=3, eps=1e-6): 10 | super().__init__() 11 | self.p = nn.Parameter(torch.ones(1)*p) 12 | self.eps = eps 13 | 14 | def forward(self, x): 15 | x = F.avg_pool2d(x.clamp(min=self.eps).pow(self.p), (x.size(-2), x.size(-1))).pow(1./self.p) 16 | x = x.flatten(1) 17 | return F.normalize(x, p=2, dim=1) -------------------------------------------------------------------------------- /benchmark/deep-visual-geo-localization-benchmark/models/aggregators/mixvpr.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import torch.nn.functional as F 3 | import torch.nn as nn 4 | 5 | import numpy as np 6 | 7 | 8 | class FeatureMixerLayer(nn.Module): 9 | def __init__(self, in_dim, mlp_ratio=1): 10 | super().__init__() 11 | self.mix = nn.Sequential( 12 | nn.LayerNorm(in_dim), 13 | nn.Linear(in_dim, int(in_dim * mlp_ratio)), 14 | nn.ReLU(), 15 | nn.Linear(int(in_dim * mlp_ratio), in_dim), 16 | ) 17 | 18 | for m in self.modules(): 19 | if isinstance(m, (nn.Linear)): 20 | nn.init.trunc_normal_(m.weight, std=0.02) 21 | if m.bias is not None: 22 | nn.init.zeros_(m.bias) 23 | 24 | def forward(self, x): 25 | return x + self.mix(x) 26 | 27 | 28 | class MixVPR(nn.Module): 29 | def __init__(self, 30 | in_channels=1024, 31 | in_h=20, 32 | in_w=20, 33 | out_channels=512, 34 | mix_depth=1, 35 | mlp_ratio=1, 36 | out_rows=4, 37 | ) -> None: 38 | super().__init__() 39 | 40 | self.in_h = in_h # height of input feature maps 41 | self.in_w = in_w # width of input feature maps 42 | self.in_channels = in_channels # depth of input feature maps 43 | 44 | self.out_channels = out_channels # depth wise projection dimension 45 | self.out_rows = out_rows # row wise projection dimesion 46 | 47 | self.mix_depth = mix_depth # L the number of stacked FeatureMixers 48 | self.mlp_ratio = mlp_ratio # ratio of the mid projection layer in the mixer block 49 | 50 | hw = in_h*in_w 51 | self.mix = nn.Sequential(*[ 52 | FeatureMixerLayer(in_dim=hw, mlp_ratio=mlp_ratio) 53 | for _ in range(self.mix_depth) 54 | ]) 55 | self.channel_proj = nn.Linear(in_channels, out_channels) 56 | self.row_proj = nn.Linear(hw, out_rows) 57 | 58 | def forward(self, x): 59 | x = x.flatten(2) 60 | x = self.mix(x) 61 | x = x.permute(0, 2, 1) 62 | x = self.channel_proj(x) 63 | x = x.permute(0, 2, 1) 64 | x = self.row_proj(x) 65 | x = F.normalize(x.flatten(1), p=2, dim=-1) 66 | return x 67 | 68 | 69 | # ------------------------------------------------------------------------------- 70 | 71 | def print_nb_params(m): 72 | model_parameters = filter(lambda p: p.requires_grad, m.parameters()) 73 | params = sum([np.prod(p.size()) for p in model_parameters]) 74 | print(f'Trainable parameters: {params/1e6:.3}M') 75 | 76 | 77 | def main(): 78 | x = torch.randn(1, 1024, 20, 20) 79 | agg = MixVPR( 80 | in_channels=1024, 81 | in_h=20, 82 | in_w=20, 83 | out_channels=1024, 84 | mix_depth=4, 85 | mlp_ratio=1, 86 | out_rows=4) 87 | 88 | print_nb_params(agg) 89 | output = agg(x) 90 | print(output.shape) 91 | 92 | 93 | if __name__ == '__main__': 94 | main() 95 | -------------------------------------------------------------------------------- /benchmark/deep-visual-geo-localization-benchmark/models/backbones/__init__.py: -------------------------------------------------------------------------------- 1 | from .efficientnet import EfficientNet 2 | from .resnet import ResNet 3 | from .swin import Swin -------------------------------------------------------------------------------- 
/benchmark/deep-visual-geo-localization-benchmark/models/backbones/efficientnet.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import torch.nn as nn 3 | import timm 4 | import numpy as np 5 | 6 | class EfficientNet(nn.Module): 7 | def __init__(self, 8 | model_name='efficientnet_b0', 9 | pretrained=True, 10 | layers_to_freeze=4, 11 | ): 12 | """Class representing the EfficientNet backbone used in the pipeline 13 | EfficientNet contains 7 efficient blocks (0 to 6), 14 | we don't take into account the global pooling and the last fc 15 | 16 | Args: 17 | model_name (str, optional): The architecture of the efficietnet backbone to instanciate. Defaults to 'efficientnet_b0'. 18 | pretrained (bool, optional): Whether pretrained or not. Defaults to True. 19 | layers_to_freeze (int, optional): The number of blocks to freeze (starting from 0) . Defaults to 4. 20 | """ 21 | super().__init__() 22 | self.model_name = model_name 23 | self.layers_to_freeze = layers_to_freeze 24 | self.model = timm.create_model(model_name=model_name, pretrained=pretrained) 25 | 26 | # freeze only if the model is pretrained 27 | if pretrained: 28 | if layers_to_freeze >= 0: 29 | self.model.conv_stem.requires_grad_(False) 30 | self.model.blocks[0].requires_grad_(False) 31 | self.model.blocks[1].requires_grad_(False) 32 | if layers_to_freeze >= 1: 33 | self.model.blocks[2].requires_grad_(False) 34 | if layers_to_freeze >= 2: 35 | self.model.blocks[3].requires_grad_(False) 36 | if layers_to_freeze >= 3: 37 | self.model.blocks[4].requires_grad_(False) 38 | if layers_to_freeze >= 4: 39 | self.model.blocks[5].requires_grad_(False) 40 | 41 | self.model.global_pool = None 42 | self.model.fc = None 43 | 44 | out_channels = 1280 # for b0 and b1 45 | if 'b2' in model_name: 46 | out_channels = 1408 47 | elif 'b3' in model_name: 48 | out_channels = 1536 49 | elif 'b4' in model_name: 50 | out_channels = 1792 51 | self.out_channels = out_channels 52 | 53 | def forward(self, x): 54 | x = self.model.forward_features(x) 55 | return x 56 | 57 | 58 | def print_nb_params(m): 59 | model_parameters = filter(lambda p: p.requires_grad, m.parameters()) 60 | params = sum([np.prod(p.size()) for p in model_parameters]) 61 | print(f'Trainable parameters: {params/1e6:.3}M') 62 | 63 | 64 | if __name__ == '__main__': 65 | x = torch.randn(4, 3, 320, 320) 66 | m = EfficientNet(model_name='efficientnet_b0', 67 | pretrained=True, 68 | layers_to_freeze=0, 69 | ) 70 | r = m(x) 71 | print_nb_params(m) 72 | print(f'Input shape is {x.shape}') 73 | print(f'Output shape is {r.shape}') 74 | -------------------------------------------------------------------------------- /benchmark/deep-visual-geo-localization-benchmark/models/backbones/resnet.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import torch.nn as nn 3 | import torchvision 4 | import numpy as np 5 | 6 | class ResNet(nn.Module): 7 | def __init__(self, 8 | model_name='resnet50', 9 | pretrained=True, 10 | layers_to_freeze=2, 11 | layers_to_crop=[], 12 | ): 13 | """Class representing the resnet backbone used in the pipeline 14 | we consider resnet network as a list of 5 blocks (from 0 to 4), 15 | layer 0 is the first conv+bn and the other layers (1 to 4) are the rest of the residual blocks 16 | we don't take into account the global pooling and the last fc 17 | 18 | Args: 19 | model_name (str, optional): The architecture of the resnet backbone to instanciate. Defaults to 'resnet50'. 
20 | pretrained (bool, optional): Whether pretrained or not. Defaults to True. 21 | layers_to_freeze (int, optional): The number of residual blocks to freeze (starting from 0) . Defaults to 2. 22 | layers_to_crop (list, optional): Which residual layers to crop, for example [3,4] will crop the third and fourth res blocks. Defaults to []. 23 | 24 | Raises: 25 | NotImplementedError: if the model_name corresponds to an unknown architecture. 26 | """ 27 | super().__init__() 28 | self.model_name = model_name.lower() 29 | self.layers_to_freeze = layers_to_freeze 30 | 31 | if pretrained: 32 | # the new naming of pretrained weights, you can change to V2 if desired. 33 | weights = 'IMAGENET1K_V1' 34 | else: 35 | weights = None 36 | 37 | if 'swsl' in model_name or 'ssl' in model_name: 38 | # These are the semi supervised and weakly semi supervised weights from Facebook 39 | self.model = torch.hub.load( 40 | 'facebookresearch/semi-supervised-ImageNet1K-models', model_name) 41 | else: 42 | if 'resnext50' in model_name: 43 | self.model = torchvision.models.resnext50_32x4d(weights=weights) 44 | elif 'resnet50' in model_name: 45 | self.model = torchvision.models.resnet50(weights=weights) 46 | elif '101' in model_name: 47 | self.model = torchvision.models.resnet101(weights=weights) 48 | elif '152' in model_name: 49 | self.model = torchvision.models.resnet152(weights=weights) 50 | elif '34' in model_name: 51 | self.model = torchvision.models.resnet34(weights=weights) 52 | elif '18' in model_name: 53 | # self.model = torchvision.models.resnet18(pretrained=False) 54 | self.model = torchvision.models.resnet18(weights=weights) 55 | elif 'wide_resnet50_2' in model_name: 56 | self.model = torchvision.models.wide_resnet50_2(weights=weights) 57 | else: 58 | raise NotImplementedError( 59 | 'Backbone architecture not recognized!') 60 | 61 | # freeze only if the model is pretrained 62 | if pretrained: 63 | if layers_to_freeze >= 0: 64 | self.model.conv1.requires_grad_(False) 65 | self.model.bn1.requires_grad_(False) 66 | if layers_to_freeze >= 1: 67 | self.model.layer1.requires_grad_(False) 68 | if layers_to_freeze >= 2: 69 | self.model.layer2.requires_grad_(False) 70 | if layers_to_freeze >= 3: 71 | self.model.layer3.requires_grad_(False) 72 | 73 | # remove the avgpool and most importantly the fc layer 74 | self.model.avgpool = None 75 | self.model.fc = None 76 | 77 | if 4 in layers_to_crop: 78 | self.model.layer4 = None 79 | if 3 in layers_to_crop: 80 | self.model.layer3 = None 81 | 82 | out_channels = 2048 83 | if '34' in model_name or '18' in model_name: 84 | out_channels = 512 85 | 86 | self.out_channels = out_channels // 2 if self.model.layer4 is None else out_channels 87 | self.out_channels = self.out_channels // 2 if self.model.layer3 is None else self.out_channels 88 | 89 | def forward(self, x): 90 | x = self.model.conv1(x) 91 | x = self.model.bn1(x) 92 | x = self.model.relu(x) 93 | x = self.model.maxpool(x) 94 | x = self.model.layer1(x) 95 | x = self.model.layer2(x) 96 | if self.model.layer3 is not None: 97 | x = self.model.layer3(x) 98 | if self.model.layer4 is not None: 99 | x = self.model.layer4(x) 100 | return x 101 | 102 | 103 | # def print_nb_params(m): 104 | # model_parameters = filter(lambda p: p.requires_grad, m.parameters()) 105 | # params = sum([np.prod(p.size()) for p in model_parameters]) 106 | # print(f'Trainable parameters: {params/1e6:.3}M') 107 | 108 | 109 | # def main(): 110 | # x = torch.randn(1, 3, 320, 320) 111 | # m = ResNet(model_name='resnet50', 112 | # pretrained=True, 113 | # 
layers_to_freeze=2, 114 | # layers_to_crop=[],) 115 | # r = m(x) 116 | # helper.print_nb_params(m) 117 | # print(f'Input shape is {x.shape}') 118 | # print(f'Output shape is {r.shape}') 119 | 120 | 121 | # if __name__ == '__main__': 122 | # main() 123 | -------------------------------------------------------------------------------- /benchmark/deep-visual-geo-localization-benchmark/models/backbones/swin.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import torch.nn as nn 3 | import timm 4 | import numpy as np 5 | 6 | 7 | class Swin(nn.Module): 8 | def __init__(self, 9 | model_name='swinv2_base_window12to16_192to256_22kft1k', 10 | pretrained=True, 11 | layers_to_freeze=2 12 | ): 13 | """Class representing the Swin (V1 and V2) backbone used in the pipeline 14 | Swin contains 4 layers (0 to 3), where layer 2 is the heaviest in terms of # params 15 | 16 | Args: 17 | model_name (str, optional): The architecture of the Swin backbone to instanciate. Defaults to 'swinv2_base_window12to16_192to256_22kft1k'. 18 | pretrained (bool, optional): Whether pretrained or not. Defaults to True. 19 | layers_to_freeze (int, optional): The number of blocks to freeze in layers[2] (starting from 0) . Defaults to 2. 20 | """ 21 | super().__init__() 22 | self.model_name = model_name 23 | self.layers_to_freeze = layers_to_freeze 24 | self.model = timm.create_model(model_name, pretrained=pretrained, num_classes=0) 25 | self.model.head = None 26 | 27 | if pretrained: 28 | self.model.patch_embed.requires_grad_(False) 29 | self.model.layers[0].requires_grad_(False) 30 | self.model.layers[1].requires_grad_(False) 31 | # layers[2] contains most of the blocks, better freeze some of them 32 | for i in range(layers_to_freeze*5): # we make 5 steps (swin contains lots of layers) 33 | self.model.layers[2].blocks[i].requires_grad_(False) 34 | 35 | 36 | if 'base' in model_name: 37 | out_channels = 1024 38 | elif 'large' in model_name: 39 | out_channels = 1536 40 | else: 41 | out_channels = 768 42 | self.out_channels = out_channels 43 | 44 | if '384' in model_name: 45 | self.depth = 144 46 | else: 47 | self.depth = 49 48 | 49 | def forward(self, x): 50 | x = self.model.forward_features(x) 51 | # the following is a hack to make the output of the transformer 52 | # as a 3D feature maps 53 | bs, f, c = x.shape 54 | x = x.view(bs, int(np.sqrt(f)), int(np.sqrt(f)), c) 55 | return x.permute(0,3,1,2) 56 | 57 | 58 | def print_nb_params(m): 59 | model_parameters = filter(lambda p: p.requires_grad, m.parameters()) 60 | params = sum([np.prod(p.size()) for p in model_parameters]) 61 | print(f'Trainable parameters: {params/1e6:.3}M') 62 | 63 | if __name__ == '__main__': 64 | x = torch.randn(4,3,256,256) 65 | m = Swin(model_name='swinv2_base_window12to16_192to256_22kft1k', 66 | pretrained=True, 67 | layers_to_freeze=2,) 68 | r = m(x) 69 | print_nb_params(m) 70 | print(f'Input shape is {x.shape}') 71 | print(f'Output shape is {r.shape}') -------------------------------------------------------------------------------- /benchmark/deep-visual-geo-localization-benchmark/models/helper.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | from models import aggregators 3 | from models import backbones 4 | 5 | 6 | def get_backbone(backbone_arch='resnet50', 7 | pretrained=True, 8 | layers_to_freeze=2, 9 | layers_to_crop=[],): 10 | """Helper function that returns the backbone given its name 11 | 12 | Args: 13 | backbone_arch (str, 
optional): . Defaults to 'resnet50'. 14 | pretrained (bool, optional): . Defaults to True. 15 | layers_to_freeze (int, optional): . Defaults to 2. 16 | layers_to_crop (list, optional): This is mostly used with ResNet where 17 | we sometimes need to crop the last 18 | residual block (ex. [4]). Defaults to []. 19 | 20 | Returns: 21 | nn.Module: the backbone as a nn.Model object 22 | """ 23 | if 'resnet' in backbone_arch.lower(): 24 | return backbones.ResNet(backbone_arch, pretrained, layers_to_freeze, layers_to_crop) 25 | 26 | elif 'efficient' in backbone_arch.lower(): 27 | if '_b' in backbone_arch.lower(): 28 | return backbones.EfficientNet(backbone_arch, pretrained, layers_to_freeze+2) 29 | else: 30 | return backbones.EfficientNet(model_name='efficientnet_b0', 31 | pretrained=pretrained, 32 | layers_to_freeze=layers_to_freeze) 33 | 34 | elif 'swin' in backbone_arch.lower(): 35 | return backbones.Swin(model_name='swinv2_base_window12to16_192to256_22kft1k', 36 | pretrained=pretrained, 37 | layers_to_freeze=layers_to_freeze) 38 | 39 | def get_aggregator(agg_arch='ConvAP', agg_config={}): 40 | """Helper function that returns the aggregation layer given its name. 41 | If you happen to make your own aggregator, you might need to add a call 42 | to this helper function. 43 | 44 | Args: 45 | agg_arch (str, optional): the name of the aggregator. Defaults to 'ConvAP'. 46 | agg_config (dict, optional): this must contain all the arguments needed to instantiate the aggregator class. Defaults to {}. 47 | 48 | Returns: 49 | nn.Module: the aggregation layer 50 | """ 51 | 52 | if 'cosplace' in agg_arch.lower(): 53 | assert 'in_dim' in agg_config 54 | assert 'out_dim' in agg_config 55 | return aggregators.CosPlace(**agg_config) 56 | 57 | elif 'gem' in agg_arch.lower(): 58 | if agg_config == {}: 59 | agg_config['p'] = 3 60 | else: 61 | assert 'p' in agg_config 62 | return aggregators.GeMPool(**agg_config) 63 | 64 | elif 'convap' in agg_arch.lower(): 65 | assert 'in_channels' in agg_config 66 | return aggregators.ConvAP(**agg_config) 67 | 68 | elif 'mixvpr' in agg_arch.lower(): 69 | assert 'in_channels' in agg_config 70 | assert 'out_channels' in agg_config 71 | assert 'in_h' in agg_config 72 | assert 'in_w' in agg_config 73 | assert 'mix_depth' in agg_config 74 | return aggregators.MixVPR(**agg_config) -------------------------------------------------------------------------------- /benchmark/deep-visual-geo-localization-benchmark/pytorch_grad_cam/__init__.py: -------------------------------------------------------------------------------- 1 | from pytorch_grad_cam.grad_cam import GradCAM 2 | from pytorch_grad_cam.hirescam import HiResCAM 3 | from pytorch_grad_cam.grad_cam_elementwise import GradCAMElementWise 4 | from pytorch_grad_cam.ablation_layer import AblationLayer, AblationLayerVit, AblationLayerFasterRCNN 5 | from pytorch_grad_cam.ablation_cam import AblationCAM 6 | from pytorch_grad_cam.xgrad_cam import XGradCAM 7 | from pytorch_grad_cam.grad_cam_plusplus import GradCAMPlusPlus 8 | from pytorch_grad_cam.score_cam import ScoreCAM 9 | from pytorch_grad_cam.layer_cam import LayerCAM 10 | from pytorch_grad_cam.eigen_cam import EigenCAM 11 | from pytorch_grad_cam.eigen_grad_cam import EigenGradCAM 12 | from pytorch_grad_cam.random_cam import RandomCAM 13 | from pytorch_grad_cam.fullgrad_cam import FullGrad 14 | from pytorch_grad_cam.guided_backprop import GuidedBackpropReLUModel 15 | from pytorch_grad_cam.activations_and_gradients import ActivationsAndGradients 16 | from 
pytorch_grad_cam.feature_factorization.deep_feature_factorization import DeepFeatureFactorization, run_dff_on_image 17 | import pytorch_grad_cam.utils.model_targets 18 | import pytorch_grad_cam.utils.reshape_transforms 19 | import pytorch_grad_cam.metrics.cam_mult_image 20 | import pytorch_grad_cam.metrics.road 21 | -------------------------------------------------------------------------------- /benchmark/deep-visual-geo-localization-benchmark/pytorch_grad_cam/ablation_cam.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | import torch 3 | import tqdm 4 | from typing import Callable, List 5 | from pytorch_grad_cam.base_cam import BaseCAM 6 | from pytorch_grad_cam.utils.find_layers import replace_layer_recursive 7 | from pytorch_grad_cam.ablation_layer import AblationLayer 8 | 9 | 10 | """ Implementation of AblationCAM 11 | https://openaccess.thecvf.com/content_WACV_2020/papers/Desai_Ablation-CAM_Visual_Explanations_for_Deep_Convolutional_Network_via_Gradient-free_Localization_WACV_2020_paper.pdf 12 | 13 | Ablate individual activations, and then measure the drop in the target score. 14 | 15 | In the current implementation, the target layer activations is cached, so it won't be re-computed. 16 | However layers before it, if any, will not be cached. 17 | This means that if the target layer is a large block, for example model.featuers (in vgg), there will 18 | be a large save in run time. 19 | 20 | Since we have to go over many channels and ablate them, and every channel ablation requires a forward pass, 21 | it would be nice if we could avoid doing that for channels that won't contribute anwyay, making it much faster. 22 | The parameter ratio_channels_to_ablate controls how many channels should be ablated, using an experimental method 23 | (to be improved). The default 1.0 value means that all channels will be ablated. 
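A rough usage sketch (this mirrors the upstream pytorch-grad-cam interface; the
model, target layer and input tensor are placeholders, not values from this
repository):

    from pytorch_grad_cam import AblationCAM
    cam = AblationCAM(model=model, target_layers=[model.layer4[-1]], batch_size=32)
    grayscale_cams = cam(input_tensor=images)  # one (H, W) heatmap per input image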
24 | """ 25 | 26 | 27 | class AblationCAM(BaseCAM): 28 | def __init__(self, 29 | model: torch.nn.Module, 30 | target_layers: List[torch.nn.Module], 31 | use_cuda: bool = False, 32 | reshape_transform: Callable = None, 33 | ablation_layer: torch.nn.Module = AblationLayer(), 34 | batch_size: int = 32, 35 | ratio_channels_to_ablate: float = 1.0) -> None: 36 | 37 | super(AblationCAM, self).__init__(model, 38 | target_layers, 39 | use_cuda, 40 | reshape_transform, 41 | uses_gradients=False) 42 | self.batch_size = batch_size 43 | self.ablation_layer = ablation_layer 44 | self.ratio_channels_to_ablate = ratio_channels_to_ablate 45 | 46 | def save_activation(self, module, input, output) -> None: 47 | """ Helper function to save the raw activations from the target layer """ 48 | self.activations = output 49 | 50 | def assemble_ablation_scores(self, 51 | new_scores: list, 52 | original_score: float, 53 | ablated_channels: np.ndarray, 54 | number_of_channels: int) -> np.ndarray: 55 | """ Take the value from the channels that were ablated, 56 | and just set the original score for the channels that were skipped """ 57 | 58 | index = 0 59 | result = [] 60 | sorted_indices = np.argsort(ablated_channels) 61 | ablated_channels = ablated_channels[sorted_indices] 62 | new_scores = np.float32(new_scores)[sorted_indices] 63 | 64 | for i in range(number_of_channels): 65 | if index < len(ablated_channels) and ablated_channels[index] == i: 66 | weight = new_scores[index] 67 | index = index + 1 68 | else: 69 | weight = original_score 70 | result.append(weight) 71 | 72 | return result 73 | 74 | def get_cam_weights(self, 75 | input_tensor: torch.Tensor, 76 | target_layer: torch.nn.Module, 77 | targets: List[Callable], 78 | activations: torch.Tensor, 79 | grads: torch.Tensor) -> np.ndarray: 80 | 81 | # Do a forward pass, compute the target scores, and cache the 82 | # activations 83 | handle = target_layer.register_forward_hook(self.save_activation) 84 | with torch.no_grad(): 85 | outputs = self.model(input_tensor) 86 | handle.remove() 87 | original_scores = np.float32( 88 | [target(output).cpu().item() for target, output in zip(targets, outputs)]) 89 | 90 | # Replace the layer with the ablation layer. 91 | # When we finish, we will replace it back, so the original model is 92 | # unchanged. 93 | ablation_layer = self.ablation_layer 94 | replace_layer_recursive(self.model, target_layer, ablation_layer) 95 | 96 | number_of_channels = activations.shape[1] 97 | weights = [] 98 | # This is a "gradient free" method, so we don't need gradients here. 99 | with torch.no_grad(): 100 | # Loop over each of the batch images and ablate activations for it. 101 | for batch_index, (target, tensor) in enumerate( 102 | zip(targets, input_tensor)): 103 | new_scores = [] 104 | batch_tensor = tensor.repeat(self.batch_size, 1, 1, 1) 105 | 106 | # Check which channels should be ablated. Normally this will be all channels, 107 | # But we can also try to speed this up by using a low 108 | # ratio_channels_to_ablate. 109 | channels_to_ablate = ablation_layer.activations_to_be_ablated( 110 | activations[batch_index, :], self.ratio_channels_to_ablate) 111 | number_channels_to_ablate = len(channels_to_ablate) 112 | 113 | for i in tqdm.tqdm( 114 | range( 115 | 0, 116 | number_channels_to_ablate, 117 | self.batch_size)): 118 | if i + self.batch_size > number_channels_to_ablate: 119 | batch_tensor = batch_tensor[:( 120 | number_channels_to_ablate - i)] 121 | 122 | # Change the state of the ablation layer so it ablates the next channels. 
123 | # TBD: Move this into the ablation layer forward pass. 124 | ablation_layer.set_next_batch( 125 | input_batch_index=batch_index, 126 | activations=self.activations, 127 | num_channels_to_ablate=batch_tensor.size(0)) 128 | score = [target(o).cpu().item() 129 | for o in self.model(batch_tensor)] 130 | new_scores.extend(score) 131 | ablation_layer.indices = ablation_layer.indices[batch_tensor.size( 132 | 0):] 133 | 134 | new_scores = self.assemble_ablation_scores( 135 | new_scores, 136 | original_scores[batch_index], 137 | channels_to_ablate, 138 | number_of_channels) 139 | weights.extend(new_scores) 140 | 141 | weights = np.float32(weights) 142 | weights = weights.reshape(activations.shape[:2]) 143 | original_scores = original_scores[:, None] 144 | weights = (original_scores - weights) / original_scores 145 | 146 | # Replace the model back to the original state 147 | replace_layer_recursive(self.model, ablation_layer, target_layer) 148 | return weights 149 | -------------------------------------------------------------------------------- /benchmark/deep-visual-geo-localization-benchmark/pytorch_grad_cam/ablation_cam_multilayer.py: -------------------------------------------------------------------------------- 1 | import cv2 2 | import numpy as np 3 | import torch 4 | import tqdm 5 | from pytorch_grad_cam.base_cam import BaseCAM 6 | 7 | 8 | class AblationLayer(torch.nn.Module): 9 | def __init__(self, layer, reshape_transform, indices): 10 | super(AblationLayer, self).__init__() 11 | 12 | self.layer = layer 13 | self.reshape_transform = reshape_transform 14 | # The channels to zero out: 15 | self.indices = indices 16 | 17 | def forward(self, x): 18 | self.__call__(x) 19 | 20 | def __call__(self, x): 21 | output = self.layer(x) 22 | 23 | # Hack to work with ViT, 24 | # Since the activation channels are last and not first like in CNNs 25 | # Probably should remove it? 26 | if self.reshape_transform is not None: 27 | output = output.transpose(1, 2) 28 | 29 | for i in range(output.size(0)): 30 | 31 | # Commonly the minimum activation will be 0, 32 | # And then it makes sense to zero it out. 33 | # However depending on the architecture, 34 | # If the values can be negative, we use very negative values 35 | # to perform the ablation, deviating from the paper. 36 | if torch.min(output) == 0: 37 | output[i, self.indices[i], :] = 0 38 | else: 39 | ABLATION_VALUE = 1e5 40 | output[i, self.indices[i], :] = torch.min( 41 | output) - ABLATION_VALUE 42 | 43 | if self.reshape_transform is not None: 44 | output = output.transpose(2, 1) 45 | 46 | return output 47 | 48 | 49 | def replace_layer_recursive(model, old_layer, new_layer): 50 | for name, layer in model._modules.items(): 51 | if layer == old_layer: 52 | model._modules[name] = new_layer 53 | return True 54 | elif replace_layer_recursive(layer, old_layer, new_layer): 55 | return True 56 | return False 57 | 58 | 59 | class AblationCAM(BaseCAM): 60 | def __init__(self, model, target_layers, use_cuda=False, 61 | reshape_transform=None): 62 | super(AblationCAM, self).__init__(model, target_layers, use_cuda, 63 | reshape_transform) 64 | 65 | if len(target_layers) > 1: 66 | print( 67 | "Warning. You are usign Ablation CAM with more than 1 layers. 
" 68 | "This is supported only if all layers have the same output shape") 69 | 70 | def set_ablation_layers(self): 71 | self.ablation_layers = [] 72 | for target_layer in self.target_layers: 73 | ablation_layer = AblationLayer(target_layer, 74 | self.reshape_transform, indices=[]) 75 | self.ablation_layers.append(ablation_layer) 76 | replace_layer_recursive(self.model, target_layer, ablation_layer) 77 | 78 | def unset_ablation_layers(self): 79 | # replace the model back to the original state 80 | for ablation_layer, target_layer in zip( 81 | self.ablation_layers, self.target_layers): 82 | replace_layer_recursive(self.model, ablation_layer, target_layer) 83 | 84 | def set_ablation_layer_batch_indices(self, indices): 85 | for ablation_layer in self.ablation_layers: 86 | ablation_layer.indices = indices 87 | 88 | def trim_ablation_layer_batch_indices(self, keep): 89 | for ablation_layer in self.ablation_layers: 90 | ablation_layer.indices = ablation_layer.indices[:keep] 91 | 92 | def get_cam_weights(self, 93 | input_tensor, 94 | target_category, 95 | activations, 96 | grads): 97 | with torch.no_grad(): 98 | outputs = self.model(input_tensor).cpu().numpy() 99 | original_scores = [] 100 | for i in range(input_tensor.size(0)): 101 | original_scores.append(outputs[i, target_category[i]]) 102 | original_scores = np.float32(original_scores) 103 | 104 | self.set_ablation_layers() 105 | 106 | if hasattr(self, "batch_size"): 107 | BATCH_SIZE = self.batch_size 108 | else: 109 | BATCH_SIZE = 32 110 | 111 | number_of_channels = activations.shape[1] 112 | weights = [] 113 | 114 | with torch.no_grad(): 115 | # Iterate over the input batch 116 | for tensor, category in zip(input_tensor, target_category): 117 | batch_tensor = tensor.repeat(BATCH_SIZE, 1, 1, 1) 118 | for i in tqdm.tqdm(range(0, number_of_channels, BATCH_SIZE)): 119 | self.set_ablation_layer_batch_indices( 120 | list(range(i, i + BATCH_SIZE))) 121 | 122 | if i + BATCH_SIZE > number_of_channels: 123 | keep = number_of_channels - i 124 | batch_tensor = batch_tensor[:keep] 125 | self.trim_ablation_layer_batch_indices(self, keep) 126 | score = self.model(batch_tensor)[:, category].cpu().numpy() 127 | weights.extend(score) 128 | 129 | weights = np.float32(weights) 130 | weights = weights.reshape(activations.shape[:2]) 131 | original_scores = original_scores[:, None] 132 | weights = (original_scores - weights) / original_scores 133 | 134 | # replace the model back to the original state 135 | self.unset_ablation_layers() 136 | return weights 137 | -------------------------------------------------------------------------------- /benchmark/deep-visual-geo-localization-benchmark/pytorch_grad_cam/ablation_layer.py: -------------------------------------------------------------------------------- 1 | import torch 2 | from collections import OrderedDict 3 | import numpy as np 4 | from pytorch_grad_cam.utils.svd_on_activations import get_2d_projection 5 | 6 | 7 | class AblationLayer(torch.nn.Module): 8 | def __init__(self): 9 | super(AblationLayer, self).__init__() 10 | 11 | def objectiveness_mask_from_svd(self, activations, threshold=0.01): 12 | """ Experimental method to get a binary mask to compare if the activation is worth ablating. 13 | The idea is to apply the EigenCAM method by doing PCA on the activations. 14 | Then we create a binary mask by comparing to a low threshold. 15 | Areas that are masked out, are probably not interesting anyway. 
16 | """ 17 | 18 | projection = get_2d_projection(activations[None, :])[0, :] 19 | projection = np.abs(projection) 20 | projection = projection - projection.min() 21 | projection = projection / projection.max() 22 | projection = projection > threshold 23 | return projection 24 | 25 | def activations_to_be_ablated( 26 | self, 27 | activations, 28 | ratio_channels_to_ablate=1.0): 29 | """ Experimental method to get a binary mask to compare if the activation is worth ablating. 30 | Create a binary CAM mask with objectiveness_mask_from_svd. 31 | Score each Activation channel, by seeing how much of its values are inside the mask. 32 | Then keep the top channels. 33 | 34 | """ 35 | if ratio_channels_to_ablate == 1.0: 36 | self.indices = np.int32(range(activations.shape[0])) 37 | return self.indices 38 | 39 | projection = self.objectiveness_mask_from_svd(activations) 40 | 41 | scores = [] 42 | for channel in activations: 43 | normalized = np.abs(channel) 44 | normalized = normalized - normalized.min() 45 | normalized = normalized / np.max(normalized) 46 | score = (projection * normalized).sum() / normalized.sum() 47 | scores.append(score) 48 | scores = np.float32(scores) 49 | 50 | indices = list(np.argsort(scores)) 51 | high_score_indices = indices[::- 52 | 1][: int(len(indices) * 53 | ratio_channels_to_ablate)] 54 | low_score_indices = indices[: int( 55 | len(indices) * ratio_channels_to_ablate)] 56 | self.indices = np.int32(high_score_indices + low_score_indices) 57 | return self.indices 58 | 59 | def set_next_batch( 60 | self, 61 | input_batch_index, 62 | activations, 63 | num_channels_to_ablate): 64 | """ This creates the next batch of activations from the layer. 65 | Just take corresponding batch member from activations, and repeat it num_channels_to_ablate times. 66 | """ 67 | self.activations = activations[input_batch_index, :, :, :].clone( 68 | ).unsqueeze(0).repeat(num_channels_to_ablate, 1, 1, 1) 69 | 70 | def __call__(self, x): 71 | output = self.activations 72 | for i in range(output.size(0)): 73 | # Commonly the minimum activation will be 0, 74 | # And then it makes sense to zero it out. 75 | # However depending on the architecture, 76 | # If the values can be negative, we use very negative values 77 | # to perform the ablation, deviating from the paper. 78 | if torch.min(output) == 0: 79 | output[i, self.indices[i], :] = 0 80 | else: 81 | ABLATION_VALUE = 1e7 82 | output[i, self.indices[i], :] = torch.min( 83 | output) - ABLATION_VALUE 84 | 85 | return output 86 | 87 | 88 | class AblationLayerVit(AblationLayer): 89 | def __init__(self): 90 | super(AblationLayerVit, self).__init__() 91 | 92 | def __call__(self, x): 93 | output = self.activations 94 | output = output.transpose(1, len(output.shape) - 1) 95 | for i in range(output.size(0)): 96 | 97 | # Commonly the minimum activation will be 0, 98 | # And then it makes sense to zero it out. 99 | # However depending on the architecture, 100 | # If the values can be negative, we use very negative values 101 | # to perform the ablation, deviating from the paper. 102 | if torch.min(output) == 0: 103 | output[i, self.indices[i], :] = 0 104 | else: 105 | ABLATION_VALUE = 1e7 106 | output[i, self.indices[i], :] = torch.min( 107 | output) - ABLATION_VALUE 108 | 109 | output = output.transpose(len(output.shape) - 1, 1) 110 | 111 | return output 112 | 113 | def set_next_batch( 114 | self, 115 | input_batch_index, 116 | activations, 117 | num_channels_to_ablate): 118 | """ This creates the next batch of activations from the layer. 
119 | Just take corresponding batch member from activations, and repeat it num_channels_to_ablate times. 120 | """ 121 | repeat_params = [num_channels_to_ablate] + \ 122 | len(activations.shape[:-1]) * [1] 123 | self.activations = activations[input_batch_index, :, :].clone( 124 | ).unsqueeze(0).repeat(*repeat_params) 125 | 126 | 127 | class AblationLayerFasterRCNN(AblationLayer): 128 | def __init__(self): 129 | super(AblationLayerFasterRCNN, self).__init__() 130 | 131 | def set_next_batch( 132 | self, 133 | input_batch_index, 134 | activations, 135 | num_channels_to_ablate): 136 | """ Extract the next batch member from activations, 137 | and repeat it num_channels_to_ablate times. 138 | """ 139 | self.activations = OrderedDict() 140 | for key, value in activations.items(): 141 | fpn_activation = value[input_batch_index, 142 | :, :, :].clone().unsqueeze(0) 143 | self.activations[key] = fpn_activation.repeat( 144 | num_channels_to_ablate, 1, 1, 1) 145 | 146 | def __call__(self, x): 147 | result = self.activations 148 | layers = {0: '0', 1: '1', 2: '2', 3: '3', 4: 'pool'} 149 | num_channels_to_ablate = result['pool'].size(0) 150 | for i in range(num_channels_to_ablate): 151 | pyramid_layer = int(self.indices[i] / 256) 152 | index_in_pyramid_layer = int(self.indices[i] % 256) 153 | result[layers[pyramid_layer]][i, 154 | index_in_pyramid_layer, :, :] = -1000 155 | return result 156 | -------------------------------------------------------------------------------- /benchmark/deep-visual-geo-localization-benchmark/pytorch_grad_cam/activations_and_gradients.py: -------------------------------------------------------------------------------- 1 | class ActivationsAndGradients: 2 | """ Class for extracting activations and 3 | registering gradients from targetted intermediate layers """ 4 | 5 | def __init__(self, model, target_layers, reshape_transform): 6 | self.model = model 7 | self.gradients = [] 8 | self.activations = [] 9 | self.reshape_transform = reshape_transform 10 | self.handles = [] 11 | for target_layer in target_layers: 12 | self.handles.append( 13 | target_layer.register_forward_hook(self.save_activation)) 14 | # Because of https://github.com/pytorch/pytorch/issues/61519, 15 | # we don't use backward hook to record gradients. 16 | self.handles.append( 17 | target_layer.register_forward_hook(self.save_gradient)) 18 | 19 | def save_activation(self, module, input, output): 20 | activation = output 21 | 22 | if self.reshape_transform is not None: 23 | activation = self.reshape_transform(activation) 24 | self.activations.append(activation.cpu().detach()) 25 | 26 | def save_gradient(self, module, input, output): 27 | if not hasattr(output, "requires_grad") or not output.requires_grad: 28 | # You can only register hooks on tensor requires grad. 
29 | return 30 | 31 | # Gradients are computed in reverse order 32 | def _store_grad(grad): 33 | if self.reshape_transform is not None: 34 | grad = self.reshape_transform(grad) 35 | self.gradients = [grad.cpu().detach()] + self.gradients 36 | 37 | output.register_hook(_store_grad) 38 | 39 | def __call__(self, x): 40 | self.gradients = [] 41 | self.activations = [] 42 | return self.model(x) 43 | 44 | def release(self): 45 | for handle in self.handles: 46 | handle.remove() 47 | -------------------------------------------------------------------------------- /benchmark/deep-visual-geo-localization-benchmark/pytorch_grad_cam/base_cam.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | import torch 3 | import ttach as tta 4 | from typing import Callable, List, Tuple 5 | from pytorch_grad_cam.activations_and_gradients import ActivationsAndGradients 6 | from pytorch_grad_cam.utils.svd_on_activations import get_2d_projection 7 | from pytorch_grad_cam.utils.image import scale_cam_image 8 | from pytorch_grad_cam.utils.model_targets import ClassifierOutputTarget 9 | 10 | 11 | class BaseCAM: 12 | def __init__(self, 13 | model: torch.nn.Module, 14 | target_layers: List[torch.nn.Module], 15 | use_cuda: bool = False, 16 | reshape_transform: Callable = None, 17 | compute_input_gradient: bool = False, 18 | uses_gradients: bool = True) -> None: 19 | self.model = model.eval() 20 | self.target_layers = target_layers 21 | self.cuda = use_cuda 22 | if self.cuda: 23 | self.model = model.cuda() 24 | self.reshape_transform = reshape_transform 25 | self.compute_input_gradient = compute_input_gradient 26 | self.uses_gradients = uses_gradients 27 | self.activations_and_grads = ActivationsAndGradients( 28 | self.model, target_layers, reshape_transform) 29 | 30 | """ Get a vector of weights for every channel in the target layer. 31 | Methods that return weights channels, 32 | will typically need to only implement this function. 
""" 33 | 34 | def get_cam_weights(self, 35 | input_tensor: torch.Tensor, 36 | target_layers: List[torch.nn.Module], 37 | targets: List[torch.nn.Module], 38 | activations: torch.Tensor, 39 | grads: torch.Tensor) -> np.ndarray: 40 | raise Exception("Not Implemented") 41 | 42 | def get_cam_image(self, 43 | input_tensor: torch.Tensor, 44 | target_layer: torch.nn.Module, 45 | targets: List[torch.nn.Module], 46 | activations: torch.Tensor, 47 | grads: torch.Tensor, 48 | eigen_smooth: bool = False) -> np.ndarray: 49 | 50 | weights = self.get_cam_weights(input_tensor, 51 | target_layer, 52 | targets, 53 | activations, 54 | grads) 55 | weighted_activations = weights[:, :, None, None] * activations 56 | if eigen_smooth: 57 | cam = get_2d_projection(weighted_activations) 58 | else: 59 | cam = weighted_activations.sum(axis=1) 60 | return cam 61 | 62 | def forward(self, 63 | input_tensor: torch.Tensor, 64 | targets: List[torch.nn.Module], 65 | eigen_smooth: bool = False) -> np.ndarray: 66 | 67 | if self.cuda: 68 | input_tensor = input_tensor.cuda() 69 | 70 | if self.compute_input_gradient: 71 | input_tensor = torch.autograd.Variable(input_tensor, 72 | requires_grad=True) 73 | 74 | outputs = self.activations_and_grads(input_tensor) 75 | if targets is None: 76 | target_categories = np.argmax(outputs.cpu().data.numpy(), axis=-1) 77 | targets = [ClassifierOutputTarget( 78 | category) for category in target_categories] 79 | 80 | if self.uses_gradients: 81 | self.model.zero_grad() 82 | loss = sum([target(output) 83 | for target, output in zip(targets, outputs)]) 84 | loss.backward(retain_graph=True) 85 | 86 | # In most of the saliency attribution papers, the saliency is 87 | # computed with a single target layer. 88 | # Commonly it is the last convolutional layer. 89 | # Here we support passing a list with multiple target layers. 90 | # It will compute the saliency image for every image, 91 | # and then aggregate them (with a default mean aggregation). 92 | # This gives you more flexibility in case you just want to 93 | # use all conv layers for example, all Batchnorm layers, 94 | # or something else. 
95 | cam_per_layer = self.compute_cam_per_layer(input_tensor, 96 | targets, 97 | eigen_smooth) 98 | return self.aggregate_multi_layers(cam_per_layer) 99 | 100 | def get_target_width_height(self, 101 | input_tensor: torch.Tensor) -> Tuple[int, int]: 102 | width, height = input_tensor.size(-1), input_tensor.size(-2) 103 | return width, height 104 | 105 | def compute_cam_per_layer( 106 | self, 107 | input_tensor: torch.Tensor, 108 | targets: List[torch.nn.Module], 109 | eigen_smooth: bool) -> np.ndarray: 110 | activations_list = [a.cpu().data.numpy() 111 | for a in self.activations_and_grads.activations] 112 | grads_list = [g.cpu().data.numpy() 113 | for g in self.activations_and_grads.gradients] 114 | target_size = self.get_target_width_height(input_tensor) 115 | 116 | cam_per_target_layer = [] 117 | # Loop over the saliency image from every layer 118 | for i in range(len(self.target_layers)): 119 | target_layer = self.target_layers[i] 120 | layer_activations = None 121 | layer_grads = None 122 | if i < len(activations_list): 123 | layer_activations = activations_list[i] 124 | if i < len(grads_list): 125 | layer_grads = grads_list[i] 126 | 127 | cam = self.get_cam_image(input_tensor, 128 | target_layer, 129 | targets, 130 | layer_activations, 131 | layer_grads, 132 | eigen_smooth) 133 | cam = np.maximum(cam, 0) 134 | scaled = scale_cam_image(cam, target_size) 135 | cam_per_target_layer.append(scaled[:, None, :]) 136 | 137 | return cam_per_target_layer 138 | 139 | def aggregate_multi_layers( 140 | self, 141 | cam_per_target_layer: np.ndarray) -> np.ndarray: 142 | cam_per_target_layer = np.concatenate(cam_per_target_layer, axis=1) 143 | cam_per_target_layer = np.maximum(cam_per_target_layer, 0) 144 | result = np.mean(cam_per_target_layer, axis=1) 145 | return scale_cam_image(result) 146 | 147 | def forward_augmentation_smoothing(self, 148 | input_tensor: torch.Tensor, 149 | targets: List[torch.nn.Module], 150 | eigen_smooth: bool = False) -> np.ndarray: 151 | transforms = tta.Compose( 152 | [ 153 | tta.HorizontalFlip(), 154 | tta.Multiply(factors=[0.9, 1, 1.1]), 155 | ] 156 | ) 157 | cams = [] 158 | for transform in transforms: 159 | augmented_tensor = transform.augment_image(input_tensor) 160 | cam = self.forward(augmented_tensor, 161 | targets, 162 | eigen_smooth) 163 | 164 | # The ttach library expects a tensor of size BxCxHxW 165 | cam = cam[:, None, :, :] 166 | cam = torch.from_numpy(cam) 167 | cam = transform.deaugment_mask(cam) 168 | 169 | # Back to numpy float32, HxW 170 | cam = cam.numpy() 171 | cam = cam[:, 0, :, :] 172 | cams.append(cam) 173 | 174 | cam = np.mean(np.float32(cams), axis=0) 175 | return cam 176 | 177 | def __call__(self, 178 | input_tensor: torch.Tensor, 179 | targets: List[torch.nn.Module] = None, 180 | aug_smooth: bool = False, 181 | eigen_smooth: bool = False) -> np.ndarray: 182 | 183 | # Smooth the CAM result with test time augmentation 184 | if aug_smooth is True: 185 | return self.forward_augmentation_smoothing( 186 | input_tensor, targets, eigen_smooth) 187 | 188 | return self.forward(input_tensor, 189 | targets, eigen_smooth) 190 | 191 | def __del__(self): 192 | self.activations_and_grads.release() 193 | 194 | def __enter__(self): 195 | return self 196 | 197 | def __exit__(self, exc_type, exc_value, exc_tb): 198 | self.activations_and_grads.release() 199 | if isinstance(exc_value, IndexError): 200 | # Handle IndexError here... 201 | print( 202 | f"An exception occurred in CAM with block: {exc_type}. 
Message: {exc_value}") 203 | return True 204 | -------------------------------------------------------------------------------- /benchmark/deep-visual-geo-localization-benchmark/pytorch_grad_cam/eigen_cam.py: -------------------------------------------------------------------------------- 1 | from pytorch_grad_cam.base_cam import BaseCAM 2 | from pytorch_grad_cam.utils.svd_on_activations import get_2d_projection 3 | 4 | # https://arxiv.org/abs/2008.00299 5 | 6 | 7 | class EigenCAM(BaseCAM): 8 | def __init__(self, model, target_layers, use_cuda=False, 9 | reshape_transform=None): 10 | super(EigenCAM, self).__init__(model, 11 | target_layers, 12 | use_cuda, 13 | reshape_transform, 14 | uses_gradients=False) 15 | 16 | def get_cam_image(self, 17 | input_tensor, 18 | target_layer, 19 | target_category, 20 | activations, 21 | grads, 22 | eigen_smooth): 23 | return get_2d_projection(activations) 24 | -------------------------------------------------------------------------------- /benchmark/deep-visual-geo-localization-benchmark/pytorch_grad_cam/eigen_grad_cam.py: -------------------------------------------------------------------------------- 1 | from pytorch_grad_cam.base_cam import BaseCAM 2 | from pytorch_grad_cam.utils.svd_on_activations import get_2d_projection 3 | 4 | # Like Eigen CAM: https://arxiv.org/abs/2008.00299 5 | # But multiply the activations x gradients 6 | 7 | 8 | class EigenGradCAM(BaseCAM): 9 | def __init__(self, model, target_layers, use_cuda=False, 10 | reshape_transform=None): 11 | super(EigenGradCAM, self).__init__(model, target_layers, use_cuda, 12 | reshape_transform) 13 | 14 | def get_cam_image(self, 15 | input_tensor, 16 | target_layer, 17 | target_category, 18 | activations, 19 | grads, 20 | eigen_smooth): 21 | return get_2d_projection(grads * activations) 22 | -------------------------------------------------------------------------------- /benchmark/deep-visual-geo-localization-benchmark/pytorch_grad_cam/feature_factorization/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ai4ce/NYC-Indoor-VPR/36510997e724eb07caf9577128dc666b335ed7e5/benchmark/deep-visual-geo-localization-benchmark/pytorch_grad_cam/feature_factorization/__init__.py -------------------------------------------------------------------------------- /benchmark/deep-visual-geo-localization-benchmark/pytorch_grad_cam/feature_factorization/deep_feature_factorization.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | from PIL import Image 3 | import torch 4 | from typing import Callable, List, Tuple, Optional 5 | from sklearn.decomposition import NMF 6 | from pytorch_grad_cam.activations_and_gradients import ActivationsAndGradients 7 | from pytorch_grad_cam.utils.image import scale_cam_image, create_labels_legend, show_factorization_on_image 8 | 9 | 10 | def dff(activations: np.ndarray, n_components: int = 5): 11 | """ Compute Deep Feature Factorization on a 2d Activations tensor. 
12 | 13 | :param activations: A numpy array of shape batch x channels x height x width 14 | :param n_components: The number of components for the non negative matrix factorization 15 | :returns: A tuple of the concepts (a numpy array with shape channels x components), 16 | and the explanation heatmaps (a numpy arary with shape batch x height x width) 17 | """ 18 | 19 | batch_size, channels, h, w = activations.shape 20 | reshaped_activations = activations.transpose((1, 0, 2, 3)) 21 | reshaped_activations[np.isnan(reshaped_activations)] = 0 22 | reshaped_activations = reshaped_activations.reshape( 23 | reshaped_activations.shape[0], -1) 24 | offset = reshaped_activations.min(axis=-1) 25 | reshaped_activations = reshaped_activations - offset[:, None] 26 | 27 | model = NMF(n_components=n_components, init='random', random_state=0) 28 | W = model.fit_transform(reshaped_activations) 29 | H = model.components_ 30 | concepts = W + offset[:, None] 31 | explanations = H.reshape(n_components, batch_size, h, w) 32 | explanations = explanations.transpose((1, 0, 2, 3)) 33 | return concepts, explanations 34 | 35 | 36 | class DeepFeatureFactorization: 37 | """ Deep Feature Factorization: https://arxiv.org/abs/1806.10206 38 | This gets a model andcomputes the 2D activations for a target layer, 39 | and computes Non Negative Matrix Factorization on the activations. 40 | 41 | Optionally it runs a computation on the concept embeddings, 42 | like running a classifier on them. 43 | 44 | The explanation heatmaps are scalled to the range [0, 1] 45 | and to the input tensor width and height. 46 | """ 47 | 48 | def __init__(self, 49 | model: torch.nn.Module, 50 | target_layer: torch.nn.Module, 51 | reshape_transform: Callable = None, 52 | computation_on_concepts=None 53 | ): 54 | self.model = model 55 | self.computation_on_concepts = computation_on_concepts 56 | self.activations_and_grads = ActivationsAndGradients( 57 | self.model, [target_layer], reshape_transform) 58 | 59 | def __call__(self, 60 | input_tensor: torch.Tensor, 61 | n_components: int = 16): 62 | batch_size, channels, h, w = input_tensor.size() 63 | _ = self.activations_and_grads(input_tensor) 64 | 65 | with torch.no_grad(): 66 | activations = self.activations_and_grads.activations[0].cpu( 67 | ).numpy() 68 | 69 | concepts, explanations = dff(activations, n_components=n_components) 70 | 71 | processed_explanations = [] 72 | 73 | for batch in explanations: 74 | processed_explanations.append(scale_cam_image(batch, (w, h))) 75 | 76 | if self.computation_on_concepts: 77 | with torch.no_grad(): 78 | concept_tensors = torch.from_numpy( 79 | np.float32(concepts).transpose((1, 0))) 80 | concept_outputs = self.computation_on_concepts( 81 | concept_tensors).cpu().numpy() 82 | return concepts, processed_explanations, concept_outputs 83 | else: 84 | return concepts, processed_explanations 85 | 86 | def __del__(self): 87 | self.activations_and_grads.release() 88 | 89 | def __exit__(self, exc_type, exc_value, exc_tb): 90 | self.activations_and_grads.release() 91 | if isinstance(exc_value, IndexError): 92 | # Handle IndexError here... 93 | print( 94 | f"An exception occurred in ActivationSummary with block: {exc_type}. 
Message: {exc_value}") 95 | return True 96 | 97 | 98 | def run_dff_on_image(model: torch.nn.Module, 99 | target_layer: torch.nn.Module, 100 | classifier: torch.nn.Module, 101 | img_pil: Image, 102 | img_tensor: torch.Tensor, 103 | reshape_transform=Optional[Callable], 104 | n_components: int = 5, 105 | top_k: int = 2) -> np.ndarray: 106 | """ Helper function to create a Deep Feature Factorization visualization for a single image. 107 | TBD: Run this on a batch with several images. 108 | """ 109 | rgb_img_float = np.array(img_pil) / 255 110 | dff = DeepFeatureFactorization(model=model, 111 | reshape_transform=reshape_transform, 112 | target_layer=target_layer, 113 | computation_on_concepts=classifier) 114 | 115 | concepts, batch_explanations, concept_outputs = dff( 116 | img_tensor[None, :], n_components) 117 | 118 | concept_outputs = torch.softmax( 119 | torch.from_numpy(concept_outputs), 120 | axis=-1).numpy() 121 | concept_label_strings = create_labels_legend(concept_outputs, 122 | labels=model.config.id2label, 123 | top_k=top_k) 124 | visualization = show_factorization_on_image( 125 | rgb_img_float, 126 | batch_explanations[0], 127 | image_weight=0.3, 128 | concept_labels=concept_label_strings) 129 | 130 | result = np.hstack((np.array(img_pil), visualization)) 131 | return result 132 | -------------------------------------------------------------------------------- /benchmark/deep-visual-geo-localization-benchmark/pytorch_grad_cam/fullgrad_cam.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | import torch 3 | from pytorch_grad_cam.base_cam import BaseCAM 4 | from pytorch_grad_cam.utils.find_layers import find_layer_predicate_recursive 5 | from pytorch_grad_cam.utils.svd_on_activations import get_2d_projection 6 | from pytorch_grad_cam.utils.image import scale_accross_batch_and_channels, scale_cam_image 7 | 8 | # https://arxiv.org/abs/1905.00780 9 | 10 | 11 | class FullGrad(BaseCAM): 12 | def __init__(self, model, target_layers, use_cuda=False, 13 | reshape_transform=None): 14 | if len(target_layers) > 0: 15 | print( 16 | "Warning: target_layers is ignored in FullGrad. 
All bias layers will be used instead") 17 | 18 | def layer_with_2D_bias(layer): 19 | bias_target_layers = [torch.nn.Conv2d, torch.nn.BatchNorm2d] 20 | if type(layer) in bias_target_layers and layer.bias is not None: 21 | return True 22 | return False 23 | target_layers = find_layer_predicate_recursive( 24 | model, layer_with_2D_bias) 25 | super( 26 | FullGrad, 27 | self).__init__( 28 | model, 29 | target_layers, 30 | use_cuda, 31 | reshape_transform, 32 | compute_input_gradient=True) 33 | self.bias_data = [self.get_bias_data( 34 | layer).cpu().numpy() for layer in target_layers] 35 | 36 | def get_bias_data(self, layer): 37 | # Borrowed from official paper impl: 38 | # https://github.com/idiap/fullgrad-saliency/blob/master/saliency/tensor_extractor.py#L47 39 | if isinstance(layer, torch.nn.BatchNorm2d): 40 | bias = - (layer.running_mean * layer.weight 41 | / torch.sqrt(layer.running_var + layer.eps)) + layer.bias 42 | return bias.data 43 | else: 44 | return layer.bias.data 45 | 46 | def compute_cam_per_layer( 47 | self, 48 | input_tensor, 49 | target_category, 50 | eigen_smooth): 51 | input_grad = input_tensor.grad.data.cpu().numpy() 52 | grads_list = [g.cpu().data.numpy() for g in 53 | self.activations_and_grads.gradients] 54 | cam_per_target_layer = [] 55 | target_size = self.get_target_width_height(input_tensor) 56 | 57 | gradient_multiplied_input = input_grad * input_tensor.data.cpu().numpy() 58 | gradient_multiplied_input = np.abs(gradient_multiplied_input) 59 | gradient_multiplied_input = scale_accross_batch_and_channels( 60 | gradient_multiplied_input, 61 | target_size) 62 | cam_per_target_layer.append(gradient_multiplied_input) 63 | 64 | # Loop over the saliency image from every layer 65 | assert(len(self.bias_data) == len(grads_list)) 66 | for bias, grads in zip(self.bias_data, grads_list): 67 | bias = bias[None, :, None, None] 68 | # In the paper they take the absolute value, 69 | # but possibily taking only the positive gradients will work 70 | # better. 
71 | bias_grad = np.abs(bias * grads) 72 | result = scale_accross_batch_and_channels( 73 | bias_grad, target_size) 74 | result = np.sum(result, axis=1) 75 | cam_per_target_layer.append(result[:, None, :]) 76 | cam_per_target_layer = np.concatenate(cam_per_target_layer, axis=1) 77 | if eigen_smooth: 78 | # Resize to a smaller image, since this method typically has a very large number of channels, 79 | # and then consumes a lot of memory 80 | cam_per_target_layer = scale_accross_batch_and_channels( 81 | cam_per_target_layer, (target_size[0] // 8, target_size[1] // 8)) 82 | cam_per_target_layer = get_2d_projection(cam_per_target_layer) 83 | cam_per_target_layer = cam_per_target_layer[:, None, :, :] 84 | cam_per_target_layer = scale_accross_batch_and_channels( 85 | cam_per_target_layer, 86 | target_size) 87 | else: 88 | cam_per_target_layer = np.sum( 89 | cam_per_target_layer, axis=1)[:, None, :] 90 | 91 | return cam_per_target_layer 92 | 93 | def aggregate_multi_layers(self, cam_per_target_layer): 94 | result = np.sum(cam_per_target_layer, axis=1) 95 | return scale_cam_image(result) 96 | -------------------------------------------------------------------------------- /benchmark/deep-visual-geo-localization-benchmark/pytorch_grad_cam/grad_cam.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | from pytorch_grad_cam.base_cam import BaseCAM 3 | 4 | 5 | class GradCAM(BaseCAM): 6 | def __init__(self, model, target_layers, use_cuda=False, 7 | reshape_transform=None): 8 | super( 9 | GradCAM, 10 | self).__init__( 11 | model, 12 | target_layers, 13 | use_cuda, 14 | reshape_transform) 15 | 16 | def get_cam_weights(self, 17 | input_tensor, 18 | target_layer, 19 | target_category, 20 | activations, 21 | grads): 22 | return np.mean(grads, axis=(2, 3)) 23 | -------------------------------------------------------------------------------- /benchmark/deep-visual-geo-localization-benchmark/pytorch_grad_cam/grad_cam_elementwise.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | from pytorch_grad_cam.base_cam import BaseCAM 3 | from pytorch_grad_cam.utils.svd_on_activations import get_2d_projection 4 | 5 | 6 | class GradCAMElementWise(BaseCAM): 7 | def __init__(self, model, target_layers, use_cuda=False, 8 | reshape_transform=None): 9 | super( 10 | GradCAMElementWise, 11 | self).__init__( 12 | model, 13 | target_layers, 14 | use_cuda, 15 | reshape_transform) 16 | 17 | def get_cam_image(self, 18 | input_tensor, 19 | target_layer, 20 | target_category, 21 | activations, 22 | grads, 23 | eigen_smooth): 24 | elementwise_activations = np.maximum(grads * activations, 0) 25 | 26 | if eigen_smooth: 27 | cam = get_2d_projection(elementwise_activations) 28 | else: 29 | cam = elementwise_activations.sum(axis=1) 30 | return cam 31 | -------------------------------------------------------------------------------- /benchmark/deep-visual-geo-localization-benchmark/pytorch_grad_cam/grad_cam_plusplus.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | from pytorch_grad_cam.base_cam import BaseCAM 3 | 4 | # https://arxiv.org/abs/1710.11063 5 | 6 | 7 | class GradCAMPlusPlus(BaseCAM): 8 | def __init__(self, model, target_layers, use_cuda=False, 9 | reshape_transform=None): 10 | super(GradCAMPlusPlus, self).__init__(model, target_layers, use_cuda, 11 | reshape_transform) 12 | 13 | def get_cam_weights(self, 14 | input_tensor, 15 | target_layers, 16 | 
target_category, 17 | activations, 18 | grads): 19 | grads_power_2 = grads**2 20 | grads_power_3 = grads_power_2 * grads 21 | # Equation 19 in https://arxiv.org/abs/1710.11063 22 | sum_activations = np.sum(activations, axis=(2, 3)) 23 | eps = 0.000001 24 | aij = grads_power_2 / (2 * grads_power_2 + 25 | sum_activations[:, :, None, None] * grads_power_3 + eps) 26 | # Now bring back the ReLU from eq.7 in the paper, 27 | # And zero out aijs where the activations are 0 28 | aij = np.where(grads != 0, aij, 0) 29 | 30 | weights = np.maximum(grads, 0) * aij 31 | weights = np.sum(weights, axis=(2, 3)) 32 | return weights 33 | -------------------------------------------------------------------------------- /benchmark/deep-visual-geo-localization-benchmark/pytorch_grad_cam/guided_backprop.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | import torch 3 | from torch.autograd import Function 4 | from pytorch_grad_cam.utils.find_layers import replace_all_layer_type_recursive 5 | 6 | 7 | class GuidedBackpropReLU(Function): 8 | @staticmethod 9 | def forward(self, input_img): 10 | positive_mask = (input_img > 0).type_as(input_img) 11 | output = torch.addcmul( 12 | torch.zeros( 13 | input_img.size()).type_as(input_img), 14 | input_img, 15 | positive_mask) 16 | self.save_for_backward(input_img, output) 17 | return output 18 | 19 | @staticmethod 20 | def backward(self, grad_output): 21 | input_img, output = self.saved_tensors 22 | grad_input = None 23 | 24 | positive_mask_1 = (input_img > 0).type_as(grad_output) 25 | positive_mask_2 = (grad_output > 0).type_as(grad_output) 26 | grad_input = torch.addcmul( 27 | torch.zeros( 28 | input_img.size()).type_as(input_img), 29 | torch.addcmul( 30 | torch.zeros( 31 | input_img.size()).type_as(input_img), 32 | grad_output, 33 | positive_mask_1), 34 | positive_mask_2) 35 | return grad_input 36 | 37 | 38 | class GuidedBackpropReLUasModule(torch.nn.Module): 39 | def __init__(self): 40 | super(GuidedBackpropReLUasModule, self).__init__() 41 | 42 | def forward(self, input_img): 43 | return GuidedBackpropReLU.apply(input_img) 44 | 45 | 46 | class GuidedBackpropReLUModel: 47 | def __init__(self, model, use_cuda): 48 | self.model = model 49 | self.model.eval() 50 | self.cuda = use_cuda 51 | if self.cuda: 52 | self.model = self.model.cuda() 53 | 54 | def forward(self, input_img): 55 | return self.model(input_img) 56 | 57 | def recursive_replace_relu_with_guidedrelu(self, module_top): 58 | 59 | for idx, module in module_top._modules.items(): 60 | self.recursive_replace_relu_with_guidedrelu(module) 61 | if module.__class__.__name__ == 'ReLU': 62 | module_top._modules[idx] = GuidedBackpropReLU.apply 63 | print("b") 64 | 65 | def recursive_replace_guidedrelu_with_relu(self, module_top): 66 | try: 67 | for idx, module in module_top._modules.items(): 68 | self.recursive_replace_guidedrelu_with_relu(module) 69 | if module == GuidedBackpropReLU.apply: 70 | module_top._modules[idx] = torch.nn.ReLU() 71 | except BaseException: 72 | pass 73 | 74 | def __call__(self, input_img, target_category=None): 75 | replace_all_layer_type_recursive(self.model, 76 | torch.nn.ReLU, 77 | GuidedBackpropReLUasModule()) 78 | 79 | if self.cuda: 80 | input_img = input_img.cuda() 81 | 82 | input_img = input_img.requires_grad_(True) 83 | 84 | output = self.forward(input_img) 85 | 86 | if target_category is None: 87 | target_category = np.argmax(output.cpu().data.numpy()) 88 | 89 | loss = output[0, target_category] 90 | 
loss.backward(retain_graph=True) 91 | 92 | output = input_img.grad.cpu().data.numpy() 93 | output = output[0, :, :, :] 94 | output = output.transpose((1, 2, 0)) 95 | 96 | replace_all_layer_type_recursive(self.model, 97 | GuidedBackpropReLUasModule, 98 | torch.nn.ReLU()) 99 | 100 | return output 101 | -------------------------------------------------------------------------------- /benchmark/deep-visual-geo-localization-benchmark/pytorch_grad_cam/hirescam.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | from pytorch_grad_cam.base_cam import BaseCAM 3 | from pytorch_grad_cam.utils.svd_on_activations import get_2d_projection 4 | 5 | 6 | class HiResCAM(BaseCAM): 7 | def __init__(self, model, target_layers, use_cuda=False, 8 | reshape_transform=None): 9 | super( 10 | HiResCAM, 11 | self).__init__( 12 | model, 13 | target_layers, 14 | use_cuda, 15 | reshape_transform) 16 | 17 | def get_cam_image(self, 18 | input_tensor, 19 | target_layer, 20 | target_category, 21 | activations, 22 | grads, 23 | eigen_smooth): 24 | elementwise_activations = grads * activations 25 | 26 | if eigen_smooth: 27 | print( 28 | "Warning: HiResCAM's faithfulness guarantees do not hold if smoothing is applied") 29 | cam = get_2d_projection(elementwise_activations) 30 | else: 31 | cam = elementwise_activations.sum(axis=1) 32 | return cam 33 | -------------------------------------------------------------------------------- /benchmark/deep-visual-geo-localization-benchmark/pytorch_grad_cam/layer_cam.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | from pytorch_grad_cam.base_cam import BaseCAM 3 | from pytorch_grad_cam.utils.svd_on_activations import get_2d_projection 4 | 5 | # https://ieeexplore.ieee.org/document/9462463 6 | 7 | 8 | class LayerCAM(BaseCAM): 9 | def __init__( 10 | self, 11 | model, 12 | target_layers, 13 | use_cuda=False, 14 | reshape_transform=None): 15 | super( 16 | LayerCAM, 17 | self).__init__( 18 | model, 19 | target_layers, 20 | use_cuda, 21 | reshape_transform) 22 | 23 | def get_cam_image(self, 24 | input_tensor, 25 | target_layer, 26 | target_category, 27 | activations, 28 | grads, 29 | eigen_smooth): 30 | spatial_weighted_activations = np.maximum(grads, 0) * activations 31 | 32 | if eigen_smooth: 33 | cam = get_2d_projection(spatial_weighted_activations) 34 | else: 35 | cam = spatial_weighted_activations.sum(axis=1) 36 | return cam 37 | -------------------------------------------------------------------------------- /benchmark/deep-visual-geo-localization-benchmark/pytorch_grad_cam/metrics/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ai4ce/NYC-Indoor-VPR/36510997e724eb07caf9577128dc666b335ed7e5/benchmark/deep-visual-geo-localization-benchmark/pytorch_grad_cam/metrics/__init__.py -------------------------------------------------------------------------------- /benchmark/deep-visual-geo-localization-benchmark/pytorch_grad_cam/metrics/cam_mult_image.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import numpy as np 3 | from typing import List, Callable 4 | from pytorch_grad_cam.metrics.perturbation_confidence import PerturbationConfidenceMetric 5 | 6 | 7 | def multiply_tensor_with_cam(input_tensor: torch.Tensor, 8 | cam: torch.Tensor): 9 | """ Multiply an input tensor (after normalization) 10 | with a pixel attribution map 11 | """ 12 | 
return input_tensor * cam 13 | 14 | 15 | class CamMultImageConfidenceChange(PerturbationConfidenceMetric): 16 | def __init__(self): 17 | super(CamMultImageConfidenceChange, 18 | self).__init__(multiply_tensor_with_cam) 19 | 20 | 21 | class DropInConfidence(CamMultImageConfidenceChange): 22 | def __init__(self): 23 | super(DropInConfidence, self).__init__() 24 | 25 | def __call__(self, *args, **kwargs): 26 | scores = super(DropInConfidence, self).__call__(*args, **kwargs) 27 | scores = -scores 28 | return np.maximum(scores, 0) 29 | 30 | 31 | class IncreaseInConfidence(CamMultImageConfidenceChange): 32 | def __init__(self): 33 | super(IncreaseInConfidence, self).__init__() 34 | 35 | def __call__(self, *args, **kwargs): 36 | scores = super(IncreaseInConfidence, self).__call__(*args, **kwargs) 37 | return np.float32(scores > 0) 38 | -------------------------------------------------------------------------------- /benchmark/deep-visual-geo-localization-benchmark/pytorch_grad_cam/metrics/perturbation_confidence.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import numpy as np 3 | from typing import List, Callable 4 | 5 | import numpy as np 6 | import cv2 7 | 8 | 9 | class PerturbationConfidenceMetric: 10 | def __init__(self, perturbation): 11 | self.perturbation = perturbation 12 | 13 | def __call__(self, input_tensor: torch.Tensor, 14 | cams: np.ndarray, 15 | targets: List[Callable], 16 | model: torch.nn.Module, 17 | return_visualization=False, 18 | return_diff=True): 19 | 20 | if return_diff: 21 | with torch.no_grad(): 22 | outputs = model(input_tensor) 23 | scores = [target(output).cpu().numpy() 24 | for target, output in zip(targets, outputs)] 25 | scores = np.float32(scores) 26 | 27 | batch_size = input_tensor.size(0) 28 | perturbated_tensors = [] 29 | for i in range(batch_size): 30 | cam = cams[i] 31 | tensor = self.perturbation(input_tensor[i, ...].cpu(), 32 | torch.from_numpy(cam)) 33 | tensor = tensor.to(input_tensor.device) 34 | perturbated_tensors.append(tensor.unsqueeze(0)) 35 | perturbated_tensors = torch.cat(perturbated_tensors) 36 | 37 | with torch.no_grad(): 38 | outputs_after_imputation = model(perturbated_tensors) 39 | scores_after_imputation = [ 40 | target(output).cpu().numpy() for target, output in zip( 41 | targets, outputs_after_imputation)] 42 | scores_after_imputation = np.float32(scores_after_imputation) 43 | 44 | if return_diff: 45 | result = scores_after_imputation - scores 46 | else: 47 | result = scores_after_imputation 48 | 49 | if return_visualization: 50 | return result, perturbated_tensors 51 | else: 52 | return result 53 | 54 | 55 | class RemoveMostRelevantFirst: 56 | def __init__(self, percentile, imputer): 57 | self.percentile = percentile 58 | self.imputer = imputer 59 | 60 | def __call__(self, input_tensor, mask): 61 | imputer = self.imputer 62 | if self.percentile != 'auto': 63 | threshold = np.percentile(mask.cpu().numpy(), self.percentile) 64 | binary_mask = np.float32(mask < threshold) 65 | else: 66 | _, binary_mask = cv2.threshold( 67 | np.uint8(mask * 255), 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU) 68 | 69 | binary_mask = torch.from_numpy(binary_mask) 70 | binary_mask = binary_mask.to(mask.device) 71 | return imputer(input_tensor, binary_mask) 72 | 73 | 74 | class RemoveLeastRelevantFirst(RemoveMostRelevantFirst): 75 | def __init__(self, percentile, imputer): 76 | super(RemoveLeastRelevantFirst, self).__init__(percentile, imputer) 77 | 78 | def __call__(self, input_tensor, mask): 79 | 
return super(RemoveLeastRelevantFirst, self).__call__( 80 | input_tensor, 1 - mask) 81 | 82 | 83 | class AveragerAcrossThresholds: 84 | def __init__( 85 | self, 86 | imputer, 87 | percentiles=[ 88 | 10, 89 | 20, 90 | 30, 91 | 40, 92 | 50, 93 | 60, 94 | 70, 95 | 80, 96 | 90]): 97 | self.imputer = imputer 98 | self.percentiles = percentiles 99 | 100 | def __call__(self, 101 | input_tensor: torch.Tensor, 102 | cams: np.ndarray, 103 | targets: List[Callable], 104 | model: torch.nn.Module): 105 | scores = [] 106 | for percentile in self.percentiles: 107 | imputer = self.imputer(percentile) 108 | scores.append(imputer(input_tensor, cams, targets, model)) 109 | return np.mean(np.float32(scores), axis=0) 110 | -------------------------------------------------------------------------------- /benchmark/deep-visual-geo-localization-benchmark/pytorch_grad_cam/metrics/road.py: -------------------------------------------------------------------------------- 1 | # A Consistent and Efficient Evaluation Strategy for Attribution Methods 2 | # https://arxiv.org/abs/2202.00449 3 | # Taken from https://raw.githubusercontent.com/tleemann/road_evaluation/main/imputations.py 4 | # MIT License 5 | 6 | # Copyright (c) 2022 Tobias Leemann 7 | 8 | # Permission is hereby granted, free of charge, to any person obtaining a copy 9 | # of this software and associated documentation files (the "Software"), to deal 10 | # in the Software without restriction, including without limitation the rights 11 | # to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 12 | # copies of the Software, and to permit persons to whom the Software is 13 | # furnished to do so, subject to the following conditions: 14 | 15 | # The above copyright notice and this permission notice shall be included in all 16 | # copies or substantial portions of the Software. 17 | 18 | # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 19 | # IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 20 | # FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 21 | # AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 22 | # LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 23 | # OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 24 | # SOFTWARE. 25 | 26 | 27 | # Implementations of our imputation models. 28 | import torch 29 | import numpy as np 30 | from scipy.sparse import lil_matrix, csc_matrix 31 | from scipy.sparse.linalg import spsolve 32 | from typing import List, Callable 33 | from pytorch_grad_cam.metrics.perturbation_confidence import PerturbationConfidenceMetric, \ 34 | AveragerAcrossThresholds, \ 35 | RemoveMostRelevantFirst, \ 36 | RemoveLeastRelevantFirst 37 | 38 | # The weights of the surrounding pixels 39 | neighbors_weights = [((1, 1), 1 / 12), 40 | ((0, 1), 1 / 6), 41 | ((-1, 1), 1 / 12), 42 | ((1, -1), 1 / 12), 43 | ((0, -1), 1 / 6), 44 | ((-1, -1), 1 / 12), 45 | ((1, 0), 1 / 6), 46 | ((-1, 0), 1 / 6)] 47 | 48 | 49 | class NoisyLinearImputer: 50 | def __init__(self, 51 | noise: float = 0.01, 52 | weighting: List[float] = neighbors_weights): 53 | """ 54 | Noisy linear imputation. 55 | noise: magnitude of noise to add (absolute, set to 0 for no noise) 56 | weighting: Weights of the neighboring pixels in the computation. 
57 | List of tuples of (offset, weight) 58 | """ 59 | self.noise = noise 60 | self.weighting = neighbors_weights 61 | 62 | @staticmethod 63 | def add_offset_to_indices(indices, offset, mask_shape): 64 | """ Add the corresponding offset to the indices. 65 | Return new indices plus a valid bit-vector. """ 66 | cord1 = indices % mask_shape[1] 67 | cord0 = indices // mask_shape[1] 68 | cord0 += offset[0] 69 | cord1 += offset[1] 70 | valid = ((cord0 < 0) | (cord1 < 0) | 71 | (cord0 >= mask_shape[0]) | 72 | (cord1 >= mask_shape[1])) 73 | return ~valid, indices + offset[0] * mask_shape[1] + offset[1] 74 | 75 | @staticmethod 76 | def setup_sparse_system(mask, img, neighbors_weights): 77 | """ Vectorized version to set up the equation system. 78 | mask: (H, W)-tensor of missing pixels. 79 | Image: (H, W, C)-tensor of all values. 80 | Return (N,N)-System matrix, (N,C)-Right hand side for each of the C channels. 81 | """ 82 | maskflt = mask.flatten() 83 | imgflat = img.reshape((img.shape[0], -1)) 84 | # Indices that are imputed in the flattened mask: 85 | indices = np.argwhere(maskflt == 0).flatten() 86 | coords_to_vidx = np.zeros(len(maskflt), dtype=int) 87 | coords_to_vidx[indices] = np.arange(len(indices)) 88 | numEquations = len(indices) 89 | # System matrix: 90 | A = lil_matrix((numEquations, numEquations)) 91 | b = np.zeros((numEquations, img.shape[0])) 92 | # Sum of weights assigned: 93 | sum_neighbors = np.ones(numEquations) 94 | for n in neighbors_weights: 95 | offset, weight = n[0], n[1] 96 | # Take out outliers 97 | valid, new_coords = NoisyLinearImputer.add_offset_to_indices( 98 | indices, offset, mask.shape) 99 | valid_coords = new_coords[valid] 100 | valid_ids = np.argwhere(valid == 1).flatten() 101 | # Add values to the right hand-side 102 | has_values_coords = valid_coords[maskflt[valid_coords] > 0.5] 103 | has_values_ids = valid_ids[maskflt[valid_coords] > 0.5] 104 | b[has_values_ids, :] -= weight * imgflat[:, has_values_coords].T 105 | # Add weights to the system (left hand side) 106 | # Find coordinates in the system. 107 | has_no_values = valid_coords[maskflt[valid_coords] < 0.5] 108 | variable_ids = coords_to_vidx[has_no_values] 109 | has_no_values_ids = valid_ids[maskflt[valid_coords] < 0.5] 110 | A[has_no_values_ids, variable_ids] = weight 111 | # Reduce weight for invalid 112 | sum_neighbors[np.argwhere(valid == 0).flatten()] = \ 113 | sum_neighbors[np.argwhere(valid == 0).flatten()] - weight 114 | 115 | A[np.arange(numEquations), np.arange(numEquations)] = -sum_neighbors 116 | return A, b 117 | 118 | def __call__(self, img: torch.Tensor, mask: torch.Tensor): 119 | """ Our linear inputation scheme. """ 120 | """ 121 | This is the function to do the linear infilling 122 | img: original image (C,H,W)-tensor; 123 | mask: mask; (H,W)-tensor 124 | 125 | """ 126 | imgflt = img.reshape(img.shape[0], -1) 127 | maskflt = mask.reshape(-1) 128 | # Indices that need to be imputed. 129 | indices_linear = np.argwhere(maskflt == 0).flatten() 130 | # Set up sparse equation system, solve system. 131 | A, b = NoisyLinearImputer.setup_sparse_system( 132 | mask.numpy(), img.numpy(), neighbors_weights) 133 | res = torch.tensor(spsolve(csc_matrix(A), b), dtype=torch.float) 134 | 135 | # Fill the values with the solution of the system. 
136 | img_infill = imgflt.clone() 137 | img_infill[:, indices_linear] = res.t() + self.noise * \ 138 | torch.randn_like(res.t()) 139 | 140 | return img_infill.reshape_as(img) 141 | 142 | 143 | class ROADMostRelevantFirst(PerturbationConfidenceMetric): 144 | def __init__(self, percentile=80): 145 | super(ROADMostRelevantFirst, self).__init__( 146 | RemoveMostRelevantFirst(percentile, NoisyLinearImputer())) 147 | 148 | 149 | class ROADLeastRelevantFirst(PerturbationConfidenceMetric): 150 | def __init__(self, percentile=20): 151 | super(ROADLeastRelevantFirst, self).__init__( 152 | RemoveLeastRelevantFirst(percentile, NoisyLinearImputer())) 153 | 154 | 155 | class ROADMostRelevantFirstAverage(AveragerAcrossThresholds): 156 | def __init__(self, percentiles=[10, 20, 30, 40, 50, 60, 70, 80, 90]): 157 | super(ROADMostRelevantFirstAverage, self).__init__( 158 | ROADMostRelevantFirst, percentiles) 159 | 160 | 161 | class ROADLeastRelevantFirstAverage(AveragerAcrossThresholds): 162 | def __init__(self, percentiles=[10, 20, 30, 40, 50, 60, 70, 80, 90]): 163 | super(ROADLeastRelevantFirstAverage, self).__init__( 164 | ROADLeastRelevantFirst, percentiles) 165 | 166 | 167 | class ROADCombined: 168 | def __init__(self, percentiles=[10, 20, 30, 40, 50, 60, 70, 80, 90]): 169 | self.percentiles = percentiles 170 | self.morf_averager = ROADMostRelevantFirstAverage(percentiles) 171 | self.lerf_averager = ROADLeastRelevantFirstAverage(percentiles) 172 | 173 | def __call__(self, 174 | input_tensor: torch.Tensor, 175 | cams: np.ndarray, 176 | targets: List[Callable], 177 | model: torch.nn.Module): 178 | 179 | scores_lerf = self.lerf_averager(input_tensor, cams, targets, model) 180 | scores_morf = self.morf_averager(input_tensor, cams, targets, model) 181 | return (scores_lerf - scores_morf) / 2 182 | -------------------------------------------------------------------------------- /benchmark/deep-visual-geo-localization-benchmark/pytorch_grad_cam/random_cam.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | from pytorch_grad_cam.base_cam import BaseCAM 3 | 4 | 5 | class RandomCAM(BaseCAM): 6 | def __init__(self, model, target_layers, use_cuda=False, 7 | reshape_transform=None): 8 | super( 9 | RandomCAM, 10 | self).__init__( 11 | model, 12 | target_layers, 13 | use_cuda, 14 | reshape_transform) 15 | 16 | def get_cam_weights(self, 17 | input_tensor, 18 | target_layer, 19 | target_category, 20 | activations, 21 | grads): 22 | return np.random.uniform(-1, 1, size=(grads.shape[0], grads.shape[1])) 23 | -------------------------------------------------------------------------------- /benchmark/deep-visual-geo-localization-benchmark/pytorch_grad_cam/score_cam.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import tqdm 3 | from pytorch_grad_cam.base_cam import BaseCAM 4 | 5 | 6 | class ScoreCAM(BaseCAM): 7 | def __init__( 8 | self, 9 | model, 10 | target_layers, 11 | use_cuda=False, 12 | reshape_transform=None): 13 | super(ScoreCAM, self).__init__(model, 14 | target_layers, 15 | use_cuda, 16 | reshape_transform=reshape_transform, 17 | uses_gradients=False) 18 | 19 | def get_cam_weights(self, 20 | input_tensor, 21 | target_layer, 22 | targets, 23 | activations, 24 | grads): 25 | with torch.no_grad(): 26 | upsample = torch.nn.UpsamplingBilinear2d( 27 | size=input_tensor.shape[-2:]) 28 | activation_tensor = torch.from_numpy(activations) 29 | if self.cuda: 30 | activation_tensor = activation_tensor.cuda() 31 
| 32 | upsampled = upsample(activation_tensor) 33 | 34 | maxs = upsampled.view(upsampled.size(0), 35 | upsampled.size(1), -1).max(dim=-1)[0] 36 | mins = upsampled.view(upsampled.size(0), 37 | upsampled.size(1), -1).min(dim=-1)[0] 38 | 39 | maxs, mins = maxs[:, :, None, None], mins[:, :, None, None] 40 | upsampled = (upsampled - mins) / (maxs - mins) 41 | 42 | input_tensors = input_tensor[:, None, 43 | :, :] * upsampled[:, :, None, :, :] 44 | 45 | if hasattr(self, "batch_size"): 46 | BATCH_SIZE = self.batch_size 47 | else: 48 | BATCH_SIZE = 16 49 | 50 | scores = [] 51 | for target, tensor in zip(targets, input_tensors): 52 | for i in tqdm.tqdm(range(0, tensor.size(0), BATCH_SIZE)): 53 | batch = tensor[i: i + BATCH_SIZE, :] 54 | outputs = [target(o).cpu().item() 55 | for o in self.model(batch)] 56 | scores.extend(outputs) 57 | scores = torch.Tensor(scores) 58 | scores = scores.view(activations.shape[0], activations.shape[1]) 59 | weights = torch.nn.Softmax(dim=-1)(scores).numpy() 60 | return weights 61 | -------------------------------------------------------------------------------- /benchmark/deep-visual-geo-localization-benchmark/pytorch_grad_cam/sobel_cam.py: -------------------------------------------------------------------------------- 1 | import cv2 2 | 3 | 4 | def sobel_cam(img): 5 | gray = cv2.cvtColor(img, cv2.COLOR_RGB2GRAY) 6 | grad_x = cv2.Sobel(gray, cv2.CV_64F, 1, 0, ksize=3) 7 | grad_y = cv2.Sobel(gray, cv2.CV_64F, 0, 1, ksize=3) 8 | abs_grad_x = cv2.convertScaleAbs(grad_x) 9 | abs_grad_y = cv2.convertScaleAbs(grad_y) 10 | grad = cv2.addWeighted(abs_grad_x, 0.5, abs_grad_y, 0.5, 0) 11 | return grad 12 | -------------------------------------------------------------------------------- /benchmark/deep-visual-geo-localization-benchmark/pytorch_grad_cam/utils/__init__.py: -------------------------------------------------------------------------------- 1 | from pytorch_grad_cam.utils.image import deprocess_image 2 | from pytorch_grad_cam.utils.svd_on_activations import get_2d_projection 3 | from pytorch_grad_cam.utils import model_targets 4 | from pytorch_grad_cam.utils import reshape_transforms 5 | -------------------------------------------------------------------------------- /benchmark/deep-visual-geo-localization-benchmark/pytorch_grad_cam/utils/find_layers.py: -------------------------------------------------------------------------------- 1 | def replace_layer_recursive(model, old_layer, new_layer): 2 | for name, layer in model._modules.items(): 3 | if layer == old_layer: 4 | model._modules[name] = new_layer 5 | return True 6 | elif replace_layer_recursive(layer, old_layer, new_layer): 7 | return True 8 | return False 9 | 10 | 11 | def replace_all_layer_type_recursive(model, old_layer_type, new_layer): 12 | for name, layer in model._modules.items(): 13 | if isinstance(layer, old_layer_type): 14 | model._modules[name] = new_layer 15 | replace_all_layer_type_recursive(layer, old_layer_type, new_layer) 16 | 17 | 18 | def find_layer_types_recursive(model, layer_types): 19 | def predicate(layer): 20 | return type(layer) in layer_types 21 | return find_layer_predicate_recursive(model, predicate) 22 | 23 | 24 | def find_layer_predicate_recursive(model, predicate): 25 | result = [] 26 | for name, layer in model._modules.items(): 27 | if predicate(layer): 28 | result.append(layer) 29 | result.extend(find_layer_predicate_recursive(layer, predicate)) 30 | return result 31 | -------------------------------------------------------------------------------- 
/benchmark/deep-visual-geo-localization-benchmark/pytorch_grad_cam/utils/image.py: -------------------------------------------------------------------------------- 1 | import matplotlib 2 | from matplotlib import pyplot as plt 3 | from matplotlib.lines import Line2D 4 | import cv2 5 | import numpy as np 6 | import torch 7 | from torchvision.transforms import Compose, Normalize, ToTensor 8 | from typing import List, Dict 9 | import math 10 | 11 | 12 | def preprocess_image( 13 | img: np.ndarray, mean=[ 14 | 0.5, 0.5, 0.5], std=[ 15 | 0.5, 0.5, 0.5]) -> torch.Tensor: 16 | preprocessing = Compose([ 17 | ToTensor(), 18 | Normalize(mean=mean, std=std) 19 | ]) 20 | return preprocessing(img.copy()).unsqueeze(0) 21 | 22 | 23 | def deprocess_image(img): 24 | """ see https://github.com/jacobgil/keras-grad-cam/blob/master/grad-cam.py#L65 """ 25 | img = img - np.mean(img) 26 | img = img / (np.std(img) + 1e-5) 27 | img = img * 0.1 28 | img = img + 0.5 29 | img = np.clip(img, 0, 1) 30 | return np.uint8(img * 255) 31 | 32 | 33 | def show_cam_on_image(img: np.ndarray, 34 | mask: np.ndarray, 35 | use_rgb: bool = False, 36 | colormap: int = cv2.COLORMAP_JET, 37 | image_weight: float = 0.5) -> np.ndarray: 38 | """ This function overlays the cam mask on the image as an heatmap. 39 | By default the heatmap is in BGR format. 40 | 41 | :param img: The base image in RGB or BGR format. 42 | :param mask: The cam mask. 43 | :param use_rgb: Whether to use an RGB or BGR heatmap, this should be set to True if 'img' is in RGB format. 44 | :param colormap: The OpenCV colormap to be used. 45 | :param image_weight: The final result is image_weight * img + (1-image_weight) * mask. 46 | :returns: The default image with the cam overlay. 47 | """ 48 | heatmap = cv2.applyColorMap(np.uint8(255 * mask), colormap) 49 | if use_rgb: 50 | heatmap = cv2.cvtColor(heatmap, cv2.COLOR_BGR2RGB) 51 | heatmap = np.float32(heatmap) / 255 52 | 53 | if np.max(img) > 1: 54 | raise Exception( 55 | "The input image should np.float32 in the range [0, 1]") 56 | 57 | if image_weight < 0 or image_weight > 1: 58 | raise Exception( 59 | f"image_weight should be in the range [0, 1].\ 60 | Got: {image_weight}") 61 | 62 | cam = (1 - image_weight) * heatmap + image_weight * img 63 | cam = cam / np.max(cam) 64 | return np.uint8(255 * cam) 65 | 66 | 67 | def create_labels_legend(concept_scores: np.ndarray, 68 | labels: Dict[int, str], 69 | top_k=2): 70 | concept_categories = np.argsort(concept_scores, axis=1)[:, ::-1][:, :top_k] 71 | concept_labels_topk = [] 72 | for concept_index in range(concept_categories.shape[0]): 73 | categories = concept_categories[concept_index, :] 74 | concept_labels = [] 75 | for category in categories: 76 | score = concept_scores[concept_index, category] 77 | label = f"{','.join(labels[category].split(',')[:3])}:{score:.2f}" 78 | concept_labels.append(label) 79 | concept_labels_topk.append("\n".join(concept_labels)) 80 | return concept_labels_topk 81 | 82 | 83 | def show_factorization_on_image(img: np.ndarray, 84 | explanations: np.ndarray, 85 | colors: List[np.ndarray] = None, 86 | image_weight: float = 0.5, 87 | concept_labels: List = None) -> np.ndarray: 88 | """ Color code the different component heatmaps on top of the image. 89 | Every component color code will be magnified according to the heatmap itensity 90 | (by modifying the V channel in the HSV color space), 91 | and optionally create a lagend that shows the labels. 
92 | 93 | Since different factorization component heatmaps can overlap in principle, 94 | we need a strategy to decide how to deal with the overlaps. 95 | This keeps the component that has a higher value in it's heatmap. 96 | 97 | :param img: The base image RGB format. 98 | :param explanations: A tensor of shape num_componetns x height x width, with the component visualizations. 99 | :param colors: List of R, G, B colors to be used for the components. 100 | If None, will use the gist_rainbow cmap as a default. 101 | :param image_weight: The final result is image_weight * img + (1-image_weight) * visualization. 102 | :concept_labels: A list of strings for every component. If this is paseed, a legend that shows 103 | the labels and their colors will be added to the image. 104 | :returns: The visualized image. 105 | """ 106 | n_components = explanations.shape[0] 107 | if colors is None: 108 | # taken from https://github.com/edocollins/DFF/blob/master/utils.py 109 | _cmap = plt.cm.get_cmap('gist_rainbow') 110 | colors = [ 111 | np.array( 112 | _cmap(i)) for i in np.arange( 113 | 0, 114 | 1, 115 | 1.0 / 116 | n_components)] 117 | concept_per_pixel = explanations.argmax(axis=0) 118 | masks = [] 119 | for i in range(n_components): 120 | mask = np.zeros(shape=(img.shape[0], img.shape[1], 3)) 121 | mask[:, :, :] = colors[i][:3] 122 | explanation = explanations[i] 123 | explanation[concept_per_pixel != i] = 0 124 | mask = np.uint8(mask * 255) 125 | mask = cv2.cvtColor(mask, cv2.COLOR_RGB2HSV) 126 | mask[:, :, 2] = np.uint8(255 * explanation) 127 | mask = cv2.cvtColor(mask, cv2.COLOR_HSV2RGB) 128 | mask = np.float32(mask) / 255 129 | masks.append(mask) 130 | 131 | mask = np.sum(np.float32(masks), axis=0) 132 | result = img * image_weight + mask * (1 - image_weight) 133 | result = np.uint8(result * 255) 134 | 135 | if concept_labels is not None: 136 | px = 1 / plt.rcParams['figure.dpi'] # pixel in inches 137 | fig = plt.figure(figsize=(result.shape[1] * px, result.shape[0] * px)) 138 | plt.rcParams['legend.fontsize'] = int( 139 | 14 * result.shape[0] / 256 / max(1, n_components / 6)) 140 | lw = 5 * result.shape[0] / 256 141 | lines = [Line2D([0], [0], color=colors[i], lw=lw) 142 | for i in range(n_components)] 143 | plt.legend(lines, 144 | concept_labels, 145 | mode="expand", 146 | fancybox=True, 147 | shadow=True) 148 | 149 | plt.tight_layout(pad=0, w_pad=0, h_pad=0) 150 | plt.axis('off') 151 | fig.canvas.draw() 152 | data = np.frombuffer(fig.canvas.tostring_rgb(), dtype=np.uint8) 153 | plt.close(fig=fig) 154 | data = data.reshape(fig.canvas.get_width_height()[::-1] + (3,)) 155 | data = cv2.resize(data, (result.shape[1], result.shape[0])) 156 | result = np.hstack((result, data)) 157 | return result 158 | 159 | 160 | def scale_cam_image(cam, target_size=None): 161 | result = [] 162 | for img in cam: 163 | img = img - np.min(img) 164 | img = img / (1e-7 + np.max(img)) 165 | if target_size is not None: 166 | img = cv2.resize(img, target_size) 167 | result.append(img) 168 | result = np.float32(result) 169 | 170 | return result 171 | 172 | 173 | def scale_accross_batch_and_channels(tensor, target_size): 174 | batch_size, channel_size = tensor.shape[:2] 175 | reshaped_tensor = tensor.reshape( 176 | batch_size * channel_size, *tensor.shape[2:]) 177 | result = scale_cam_image(reshaped_tensor, target_size) 178 | result = result.reshape( 179 | batch_size, 180 | channel_size, 181 | target_size[1], 182 | target_size[0]) 183 | return result 184 | 
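
The image utilities above (preprocess_image, show_cam_on_image, scale_cam_image) are normally combined with one of the CAM classes dumped earlier in this folder. The following is a minimal, hypothetical usage sketch and is not part of the repository: the torchvision ResNet-18 backbone, the layer4[-1] target layer, the example.jpg path, and the ImageNet class index 281 are all assumptions chosen purely for illustration.

import cv2
import numpy as np
import torchvision

from pytorch_grad_cam.grad_cam import GradCAM
from pytorch_grad_cam.utils.model_targets import ClassifierOutputTarget
from pytorch_grad_cam.utils.image import preprocess_image, show_cam_on_image

# Hypothetical inputs: any RGB image and any ImageNet classifier will do.
rgb = cv2.cvtColor(cv2.imread("example.jpg"), cv2.COLOR_BGR2RGB)
rgb = cv2.resize(rgb, (224, 224))
rgb_float = np.float32(rgb) / 255.0  # show_cam_on_image expects floats in [0, 1]

model = torchvision.models.resnet18(pretrained=True).eval()
input_tensor = preprocess_image(rgb_float,
                                mean=[0.485, 0.456, 0.406],
                                std=[0.229, 0.224, 0.225])

# One target layer (the last residual block) and one classification target.
cam = GradCAM(model=model, target_layers=[model.layer4[-1]])
grayscale_cam = cam(input_tensor=input_tensor,
                    targets=[ClassifierOutputTarget(281)])  # assumed class index

# Overlay the (H, W) saliency map of the first batch element on the image.
heatmap = show_cam_on_image(rgb_float, grayscale_cam[0, :], use_rgb=True)
cv2.imwrite("cam_overlay.jpg", cv2.cvtColor(heatmap, cv2.COLOR_RGB2BGR))

The same pattern applies to the other CAM variants in this folder (AblationCAM, ScoreCAM, EigenCAM, etc.), which share the BaseCAM constructor and call signature; only the weighting strategy differs.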
-------------------------------------------------------------------------------- /benchmark/deep-visual-geo-localization-benchmark/pytorch_grad_cam/utils/model_targets.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | import torch 3 | import torchvision 4 | 5 | 6 | class ClassifierOutputTarget: 7 | def __init__(self, category): 8 | self.category = category 9 | 10 | def __call__(self, model_output): 11 | if len(model_output.shape) == 1: 12 | return model_output[self.category] 13 | return model_output[:, self.category] 14 | 15 | 16 | class ClassifierOutputSoftmaxTarget: 17 | def __init__(self, category): 18 | self.category = category 19 | 20 | def __call__(self, model_output): 21 | if len(model_output.shape) == 1: 22 | return torch.softmax(model_output, dim=-1)[self.category] 23 | return torch.softmax(model_output, dim=-1)[:, self.category] 24 | 25 | 26 | class BinaryClassifierOutputTarget: 27 | def __init__(self, category): 28 | self.category = category 29 | 30 | def __call__(self, model_output): 31 | if self.category == 1: 32 | sign = 1 33 | else: 34 | sign = -1 35 | return model_output * sign 36 | 37 | 38 | class SoftmaxOutputTarget: 39 | def __init__(self): 40 | pass 41 | 42 | def __call__(self, model_output): 43 | return torch.softmax(model_output, dim=-1) 44 | 45 | 46 | class RawScoresOutputTarget: 47 | def __init__(self): 48 | pass 49 | 50 | def __call__(self, model_output): 51 | return model_output 52 | 53 | 54 | class SemanticSegmentationTarget: 55 | """ Gets a binary spatial mask and a category, 56 | and returns the sum of the category scores 57 | of the pixels in the mask. """ 58 | 59 | def __init__(self, category, mask): 60 | self.category = category 61 | self.mask = torch.from_numpy(mask) 62 | if torch.cuda.is_available(): 63 | self.mask = self.mask.cuda() 64 | 65 | def __call__(self, model_output): 66 | return (model_output[self.category, :, :] * self.mask).sum() 67 | 68 | 69 | class FasterRCNNBoxScoreTarget: 70 | """ For every original detected bounding box specified in "bounding boxes", 71 | assign a score for how well the current bounding boxes match it, 72 | 1. In IOU 73 | 2. In the classification score. 74 | If there is not a large enough overlap, or the category changed, 75 | assign a score of 0. 76 | 77 | The total score is the sum of all the box scores. 
78 | """ 79 | 80 | def __init__(self, labels, bounding_boxes, iou_threshold=0.5): 81 | self.labels = labels 82 | self.bounding_boxes = bounding_boxes 83 | self.iou_threshold = iou_threshold 84 | 85 | def __call__(self, model_outputs): 86 | output = torch.Tensor([0]) 87 | if torch.cuda.is_available(): 88 | output = output.cuda() 89 | 90 | if len(model_outputs["boxes"]) == 0: 91 | return output 92 | 93 | for box, label in zip(self.bounding_boxes, self.labels): 94 | box = torch.Tensor(box[None, :]) 95 | if torch.cuda.is_available(): 96 | box = box.cuda() 97 | 98 | ious = torchvision.ops.box_iou(box, model_outputs["boxes"]) 99 | index = ious.argmax() 100 | if ious[0, index] > self.iou_threshold and model_outputs["labels"][index] == label: 101 | score = ious[0, index] + model_outputs["scores"][index] 102 | output = output + score 103 | return output 104 | -------------------------------------------------------------------------------- /benchmark/deep-visual-geo-localization-benchmark/pytorch_grad_cam/utils/reshape_transforms.py: -------------------------------------------------------------------------------- 1 | import torch 2 | 3 | 4 | def fasterrcnn_reshape_transform(x): 5 | target_size = x['pool'].size()[-2:] 6 | activations = [] 7 | for key, value in x.items(): 8 | activations.append( 9 | torch.nn.functional.interpolate( 10 | torch.abs(value), 11 | target_size, 12 | mode='bilinear')) 13 | activations = torch.cat(activations, axis=1) 14 | return activations 15 | 16 | 17 | def swinT_reshape_transform(tensor, height=7, width=7): 18 | result = tensor.reshape(tensor.size(0), 19 | height, width, tensor.size(2)) 20 | 21 | # Bring the channels to the first dimension, 22 | # like in CNNs. 23 | result = result.transpose(2, 3).transpose(1, 2) 24 | return result 25 | 26 | 27 | def vit_reshape_transform(tensor, height=14, width=14): 28 | result = tensor[:, 1:, :].reshape(tensor.size(0), 29 | height, width, tensor.size(2)) 30 | 31 | # Bring the channels to the first dimension, 32 | # like in CNNs. 
33 | result = result.transpose(2, 3).transpose(1, 2) 34 | return result 35 | -------------------------------------------------------------------------------- /benchmark/deep-visual-geo-localization-benchmark/pytorch_grad_cam/utils/svd_on_activations.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | 3 | 4 | def get_2d_projection(activation_batch): 5 | # TBD: use pytorch batch svd implementation 6 | activation_batch[np.isnan(activation_batch)] = 0 7 | projections = [] 8 | for activations in activation_batch: 9 | reshaped_activations = (activations).reshape( 10 | activations.shape[0], -1).transpose() 11 | # Centering before the SVD seems to be important here, 12 | # Otherwise the image returned is negative 13 | reshaped_activations = reshaped_activations - \ 14 | reshaped_activations.mean(axis=0) 15 | U, S, VT = np.linalg.svd(reshaped_activations, full_matrices=True) 16 | projection = reshaped_activations @ VT[0, :] 17 | projection = projection.reshape(activations.shape[1:]) 18 | projections.append(projection) 19 | return np.float32(projections) 20 | -------------------------------------------------------------------------------- /benchmark/deep-visual-geo-localization-benchmark/pytorch_grad_cam/xgrad_cam.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | from pytorch_grad_cam.base_cam import BaseCAM 3 | 4 | 5 | class XGradCAM(BaseCAM): 6 | def __init__( 7 | self, 8 | model, 9 | target_layers, 10 | use_cuda=False, 11 | reshape_transform=None): 12 | super( 13 | XGradCAM, 14 | self).__init__( 15 | model, 16 | target_layers, 17 | use_cuda, 18 | reshape_transform) 19 | 20 | def get_cam_weights(self, 21 | input_tensor, 22 | target_layer, 23 | target_category, 24 | activations, 25 | grads): 26 | sum_activations = np.sum(activations, axis=(2, 3)) 27 | eps = 1e-7 28 | weights = grads * activations / \ 29 | (sum_activations[:, :, None, None] + eps) 30 | weights = weights.sum(axis=(2, 3)) 31 | return weights 32 | -------------------------------------------------------------------------------- /benchmark/deep-visual-geo-localization-benchmark/requirements.txt: -------------------------------------------------------------------------------- 1 | numpy==1.19.4 2 | psutil==5.6.7 3 | faiss_cpu 4 | tqdm==4.48.2 5 | Pillow==8.2.0 6 | scikit_learn==0.24.1 7 | torchscan==0.1.1 8 | googledrivedownloader==0.4 9 | requests==2.26.0 10 | timm==0.4.12 11 | transformers==4.10.2 12 | einops 13 | -------------------------------------------------------------------------------- /benchmark/deep-visual-geo-localization-benchmark/results/result.txt: -------------------------------------------------------------------------------- 1 | ResNet-NetVLAD: R@1: 39.4, R@5: 83.9, R@10: 93.1, R@20: 97.6 2 | CCT-NetVLAD: R@1: 38.7, R@5: 81.6, R@10: 93.4, R@20: 97.7 3 | MixVPR:R@1: 41.3, R@5: 83.1, R@10: 93.8, R@20: 97.9 4 | CosPlace: R@1: 29.1, R@5: 73.9, R@10: 88.5, R@20: 96.4 5 | AnyLoc: R@1: 37.4, R@5: 81.0, R@10: 92.5, R@20: 97.8 -------------------------------------------------------------------------------- /benchmark/deep-visual-geo-localization-benchmark/sbatch.txt: -------------------------------------------------------------------------------- 1 | python train.py --dataset_name=compressed_vid --datasets_folder=/scratch/ds5725/VPR-datasets-downloader/datasets --resume=/scratch/ds5725/deep-visual-geo-localization-benchmark/logs/default/2023-04-22_19-42-06/best_model.pth 2 | python train.py 
--dataset_name=indoor --datasets_folder=/mnt/data/nyc_indoor --backbone=resnet50conv4 3 | 4 | python train.py --dataset_name=nyu-vpr --datasets_folder=/scratch/ds5725/VPR-datasets-downloader/datasets --backbone= -------------------------------------------------------------------------------- /benchmark/deep-visual-geo-localization-benchmark/scratch.py: -------------------------------------------------------------------------------- 1 | import torch 2 | 3 | best_model_state_dict = torch.load(join(args.save_dir, "/scratch/ds5725/deep-visual-geo-localization-benchmark/logs/default/2023-04-22_19-42-06/best_model.pth"))["model_state_dict"] 4 | 5 | model.load_state_dict(best_model_state_dict) 6 | -------------------------------------------------------------------------------- /benchmark/deep-visual-geo-localization-benchmark/summary.py: -------------------------------------------------------------------------------- 1 | import os 2 | import util 3 | import cv2 4 | from tqdm import tqdm 5 | 6 | folder_path = '/scratch/ds5725/VPR-datasets-downloader/datasets/nyu-vpr/images/test/queries' 7 | 8 | file_list = os.listdir(folder_path) 9 | 10 | full_path_list = [os.path.join(folder_path, filename) for filename in file_list] 11 | -------------------------------------------------------------------------------- /benchmark/deep-visual-geo-localization-benchmark/test.SBATCH: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | #SBATCH --nodes=1 4 | #SBATCH --ntasks-per-node=4 5 | #SBATCH --cpus-per-task=1 6 | #SBATCH --mem-per-cpu=64GB 7 | #SBATCH --time=24:00:00 8 | #SBATCH --gres=gpu 9 | #SBATCH --job-name=res 10 | 11 | module purge 12 | 13 | singularity exec --nv \ 14 | --overlay /scratch/ds5725/environments/mixvpr.ext3:rw \ 15 | /scratch/work/public/singularity/cuda11.1-cudnn8-devel-ubuntu18.04.sif \ 16 | /bin/bash -c "source /ext3/env.sh; python train_1.py --dataset_name=indoor --datasets_folder=/mnt/data/nyc_indoor --backb" 17 | -------------------------------------------------------------------------------- /benchmark/deep-visual-geo-localization-benchmark/util.py: -------------------------------------------------------------------------------- 1 | 2 | import re 3 | import torch 4 | import shutil 5 | import logging 6 | import torchscan 7 | import numpy as np 8 | from collections import OrderedDict 9 | from os.path import join 10 | from sklearn.decomposition import PCA 11 | 12 | import datasets_ws 13 | 14 | 15 | def get_flops(model, input_shape=(480, 640)): 16 | """Return the FLOPs as a string, such as '22.33 GFLOPs'""" 17 | assert len(input_shape) == 2, f"input_shape should have len==2, but it's {input_shape}" 18 | module_info = torchscan.crawl_module(model, (3, input_shape[0], input_shape[1])) 19 | output = torchscan.utils.format_info(module_info) 20 | return re.findall("Floating Point Operations on forward: (.*)\n", output)[0] 21 | 22 | 23 | def save_checkpoint(args, state, is_best, filename): 24 | model_path = join(args.save_dir, filename) 25 | torch.save(state, model_path) 26 | if is_best: 27 | shutil.copyfile(model_path, join(args.save_dir, "best_model.pth")) 28 | 29 | 30 | def resume_model(args, model): 31 | checkpoint = torch.load(args.resume, map_location=args.device) 32 | if 'model_state_dict' in checkpoint: 33 | state_dict = checkpoint['model_state_dict'] 34 | else: 35 | # The pre-trained models that we provide in the README do not have 'state_dict' in the keys as 36 | # the checkpoint is directly the state dict 37 | state_dict = checkpoint 
38 | # if the model contains the prefix "module" which is appended by 39 | # DataParallel, remove it to avoid errors when loading dict 40 | if list(state_dict.keys())[0].startswith('module'): 41 | state_dict = OrderedDict({k.replace('module.', ''): v for (k, v) in state_dict.items()}) 42 | model.load_state_dict(state_dict) 43 | return model 44 | 45 | 46 | def resume_train(args, model, optimizer=None, strict=False): 47 | """Load model, optimizer, and other training parameters""" 48 | logging.debug(f"Loading checkpoint: {args.resume}") 49 | checkpoint = torch.load(args.resume) 50 | start_epoch_num = checkpoint["epoch_num"] 51 | model.load_state_dict(checkpoint["model_state_dict"], strict=strict) 52 | if optimizer: 53 | optimizer.load_state_dict(checkpoint["optimizer_state_dict"]) 54 | best_r5 = checkpoint["best_r5"] 55 | not_improved_num = checkpoint["not_improved_num"] 56 | logging.debug(f"Loaded checkpoint: start_epoch_num = {start_epoch_num}, " 57 | f"current_best_R@5 = {best_r5:.1f}") 58 | if args.resume.endswith("last_model.pth"): # Copy best model to current save_dir 59 | shutil.copy(args.resume.replace("last_model.pth", "best_model.pth"), args.save_dir) 60 | return model, optimizer, best_r5, start_epoch_num, not_improved_num 61 | 62 | 63 | def compute_pca(args, model, pca_dataset_folder, full_features_dim): 64 | model = model.eval() 65 | pca_ds = datasets_ws.PCADataset(args, args.datasets_folder, pca_dataset_folder) 66 | dl = torch.utils.data.DataLoader(pca_ds, args.infer_batch_size, shuffle=True) 67 | pca_features = np.empty([min(len(pca_ds), 2**14), full_features_dim]) 68 | with torch.no_grad(): 69 | for i, images in enumerate(dl): 70 | if i*args.infer_batch_size >= len(pca_features): 71 | break 72 | features = model(images).cpu().numpy() 73 | pca_features[i*args.infer_batch_size : (i*args.infer_batch_size)+len(features)] = features 74 | pca = PCA(args.pca_dim) 75 | pca.fit(pca_features) 76 | return pca 77 | -------------------------------------------------------------------------------- /benchmark/deep-visual-geo-localization-benchmark/utils/__init__.py: -------------------------------------------------------------------------------- 1 | from .losses import get_miner, get_loss 2 | from .validation import get_validation_recalls 3 | -------------------------------------------------------------------------------- /benchmark/deep-visual-geo-localization-benchmark/utils/losses.py: -------------------------------------------------------------------------------- 1 | from pytorch_metric_learning import losses, miners 2 | from pytorch_metric_learning.distances import CosineSimilarity, DotProductSimilarity 3 | 4 | def get_loss(loss_name): 5 | if loss_name == 'SupConLoss': return losses.SupConLoss(temperature=0.07) 6 | if loss_name == 'CircleLoss': return losses.CircleLoss(m=0.4, gamma=80) #these are params for image retrieval 7 | if loss_name == 'MultiSimilarityLoss': return losses.MultiSimilarityLoss(alpha=1.0, beta=50, base=0.0, distance=DotProductSimilarity()) 8 | if loss_name == 'ContrastiveLoss': return losses.ContrastiveLoss(pos_margin=0, neg_margin=1) 9 | if loss_name == 'Lifted': return losses.GeneralizedLiftedStructureLoss(neg_margin=0, pos_margin=1, distance=DotProductSimilarity()) 10 | if loss_name == 'FastAPLoss': return losses.FastAPLoss(num_bins=30) 11 | if loss_name == 'NTXentLoss': return losses.NTXentLoss(temperature=0.07) #The MoCo paper uses 0.07, while SimCLR uses 0.5. 
12 | if loss_name == 'TripletMarginLoss': return losses.TripletMarginLoss(margin=0.1, swap=False, smooth_loss=False, triplets_per_anchor='all') #or an int, for example 100 13 | if loss_name == 'CentroidTripletLoss': return losses.CentroidTripletLoss(margin=0.05, 14 | swap=False, 15 | smooth_loss=False, 16 | triplets_per_anchor="all",) 17 | raise NotImplementedError(f'Sorry, <{loss_name}> loss function is not implemented!') 18 | 19 | def get_miner(miner_name, margin=0.1): 20 | if miner_name == 'TripletMarginMiner' : return miners.TripletMarginMiner(margin=margin, type_of_triplets="semihard") # all, hard, semihard, easy 21 | if miner_name == 'MultiSimilarityMiner' : return miners.MultiSimilarityMiner(epsilon=margin, distance=CosineSimilarity()) 22 | if miner_name == 'PairMarginMiner' : return miners.PairMarginMiner(pos_margin=0.7, neg_margin=0.3, distance=DotProductSimilarity()) 23 | return None 24 | -------------------------------------------------------------------------------- /benchmark/deep-visual-geo-localization-benchmark/utils/validation.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | import faiss 3 | import faiss.contrib.torch_utils 4 | from prettytable import PrettyTable 5 | 6 | 7 | def get_validation_recalls(r_list, q_list, k_values, gt, print_results=True, faiss_gpu=False, dataset_name='dataset without name ?'): 8 | 9 | embed_size = r_list.shape[1] 10 | if faiss_gpu: 11 | res = faiss.StandardGpuResources() 12 | flat_config = faiss.GpuIndexFlatConfig() 13 | flat_config.useFloat16 = True 14 | flat_config.device = 0 15 | faiss_index = faiss.GpuIndexFlatL2(res, embed_size, flat_config) 16 | # build index 17 | else: 18 | faiss_index = faiss.IndexFlatL2(embed_size) 19 | 20 | # add references 21 | faiss_index.add(r_list) 22 | 23 | # search for queries in the index 24 | _, predictions = faiss_index.search(q_list, max(k_values)) 25 | 26 | 27 | 28 | # start calculating recall_at_k 29 | correct_at_k = np.zeros(len(k_values)) 30 | for q_idx, pred in enumerate(predictions): 31 | for i, n in enumerate(k_values): 32 | # if in top N then also in top NN, where NN > N 33 | if np.any(np.in1d(pred[:n], gt[q_idx])): 34 | correct_at_k[i:] += 1 35 | break 36 | 37 | correct_at_k = correct_at_k / len(predictions) 38 | d = {k:v for (k,v) in zip(k_values, correct_at_k)} 39 | 40 | if print_results: 41 | print() # print a new line 42 | table = PrettyTable() 43 | table.field_names = ['K']+[str(k) for k in k_values] 44 | table.add_row(['Recall@K']+ [f'{100*v:.2f}' for v in correct_at_k]) 45 | print(table.get_string(title=f"Performances on {dataset_name}")) 46 | 47 | return d 48 | -------------------------------------------------------------------------------- /benchmark/deep-visual-geo-localization-benchmark/visual/name_utm.py: -------------------------------------------------------------------------------- 1 | import os 2 | import util 3 | import cv2 4 | from tqdm import tqdm 5 | 6 | def get_file_paths(folder_path): 7 | file_paths = [] 8 | for root, dirs, files in os.walk(folder_path): 9 | for file in files: 10 | file_path = os.path.join(root, file) 11 | file_paths.append(file_path) 12 | return file_paths 13 | 14 | folder_path = '/scratch/ds5725/VPR-datasets-downloader/datasets/indoor_new/images/val' 15 | 16 | file_list=get_file_paths(folder_path) 17 | 18 | full_path_list = [os.path.join(folder_path, filename) for filename in file_list] 19 | # file_list = os.listdir(folder_path) 20 | 21 | # full_path_list = [os.path.join(folder_path, filename) 
for filename in file_list] 22 | 23 | # folder_path = '/scratch/ds5725/deep-visual-geo-localization-benchmark/visual/resnet' 24 | 25 | # file_list = os.listdir(folder_path) 26 | 27 | # full_path_list1 = [os.path.join(folder_path, filename) for filename in file_list] 28 | 29 | # d={} 30 | # f=open("/scratch/ds5725/ssl_vpr/sub/sub_test_utm.txt") 31 | # for line in f: 32 | # s=line.strip().split() 33 | # d[s[0]]=(util.format_coord(float(s[1])),util.format_coord(float(s[2]))) 34 | 35 | 36 | # for i in tqdm(range(len(full_path_list))): 37 | # substr=os.path.basename(full_path_list[i]) 38 | # substr=substr[:-6]+".jpg" 39 | # utm_east=d[substr][0] 40 | # utm_north=d[substr][1] 41 | # new_name="" 42 | # print(utm_east,utm_north) 43 | # for ip1 in full_path_list1: 44 | # print(ip1) 45 | # break 46 | # if utm_east in ip1 and utm_north in ip1: 47 | # new_name=os.path.basename(ip1) 48 | # break 49 | # if new_name!="": 50 | # img=cv2.imread(full_path_list[i]) 51 | # cv2.imwrite("/scratch/ds5725/deep-visual-geo-localization-benchmark/visual/simclr_utm/"+new_name, img) 52 | 53 | f1=open("val_paths.txt", "w") 54 | for fp in full_path_list: 55 | f1.write(fp+'\n') 56 | 57 | f2=open("val_utm.txt","w") 58 | for line in full_path_list: 59 | substr=os.path.basename(line) 60 | first_number = float(substr.split('@')[1]) 61 | second_number = float(substr.split('@')[2]) 62 | f2.write(substr+" "+str(first_number)+" "+str(second_number)+'\n') 63 | 64 | -------------------------------------------------------------------------------- /benchmark/deep-visual-geo-localization-benchmark/visual/util.py: -------------------------------------------------------------------------------- 1 | import os 2 | import re 3 | import utm 4 | import cv2 5 | import math 6 | import time 7 | import shutil 8 | import requests 9 | from tqdm import tqdm 10 | 11 | RETRY_SECONDS = 2 12 | 13 | 14 | def get_distance(coords_A, coords_B): 15 | return math.sqrt((float(coords_B[0])-float(coords_A[0]))**2 + (float(coords_B[1])-float(coords_A[1]))**2) 16 | 17 | 18 | def download_heavy_file(url, output_path): 19 | os.makedirs("tmp", exist_ok=True) 20 | 21 | tmp_filename = os.path.join("tmp", f"tmp_{int(time.time()*1000)}") 22 | if os.path.exists(output_path): 23 | print(f"File {output_path} already exists, I won't download it again") 24 | return 25 | for attempt_num in range(10): # In case of errors, try 10 times 26 | try: 27 | req = requests.get(url, stream=True) 28 | total_size = int(req.headers.get('content-length', 0)) # Total size in bytes 29 | block_size = 1024 # 1 KB 30 | tqdm_bar = tqdm(total=total_size, desc=os.path.basename(output_path), 31 | unit='iB', unit_scale=True, ncols=100) 32 | with open(tmp_filename, 'wb') as f: 33 | for data in req.iter_content(block_size): 34 | tqdm_bar.update(len(data)) 35 | f.write(data) 36 | tqdm_bar.close() 37 | if total_size != 0 and tqdm_bar.n != total_size: 38 | print(tqdm_bar.n) 39 | print(total_size) 40 | raise RuntimeError("ERROR, something went wrong during download") 41 | break 42 | except (Exception, RuntimeError) as e: 43 | if os.path.exists(tmp_filename): os.remove(tmp_filename) 44 | print(e) 45 | print(f"I'll try again to download {output_path} in {RETRY_SECONDS**attempt_num} seconds") 46 | time.sleep(RETRY_SECONDS**attempt_num) 47 | else: 48 | raise RuntimeError(f"I tried 10 times and I couldn't download {output_path} from {url}") 49 | os.makedirs(os.path.dirname(os.path.abspath(output_path)), exist_ok=True) 50 | shutil.move(tmp_filename, output_path) 51 | 52 | 53 | def is_valid_timestamp(timestamp): 
54 | """Return True if it's a valid timestamp, in format YYYYMMDD_hhmmss, 55 | with all fields from left to right optional. 56 | >>> is_valid_timestamp('') 57 | True 58 | >>> is_valid_timestamp('201901') 59 | True 60 | >>> is_valid_timestamp('20190101_123000') 61 | True 62 | """ 63 | return bool(re.match("^(\d{4}(\d{2}(\d{2}(_(\d{2})(\d{2})?(\d{2})?)?)?)?)?$", timestamp)) 64 | 65 | 66 | def format_coord(num, left=7, right=2): 67 | """Return the formatted number as a string with (left) int digits 68 | (including sign '-' for negatives) and (right) float digits. 69 | >>> format_coord(1.1, 3, 3) 70 | '001.100' 71 | >>> format_coord(-0.123, 3, 3) 72 | '-00.123' 73 | """ 74 | sign = "-" if float(num) < 0 else "" 75 | num = str(abs(float(num))) + "." 76 | integer, decimal = num.split(".")[:2] 77 | left -= len(sign) 78 | return f"{sign}{int(integer):0{left}d}.{decimal[:right]:<0{right}}" 79 | 80 | import doctest 81 | doctest.testmod() # Automatically execute unit-test of format_coord() 82 | 83 | 84 | def format_location_info(latitude, longitude): 85 | easting, northing, zone_number, zone_letter = utm.from_latlon(float(latitude), float(longitude)) 86 | easting = format_coord(easting, 7, 2) 87 | northing = format_coord(northing, 7, 2) 88 | latitude = format_coord(latitude, 3, 5) 89 | longitude = format_coord(longitude, 4, 5) 90 | return easting, northing, zone_number, zone_letter, latitude, longitude 91 | 92 | 93 | def get_dst_image_name(latitude, longitude, pano_id=None, tile_num=None, heading=None, 94 | pitch=None, roll=None, height=None, timestamp=None, note=None, extension=".jpg"): 95 | easting, northing, zone_number, zone_letter, latitude, longitude = format_location_info(latitude, longitude) 96 | tile_num = f"{int(float(tile_num)):02d}" if tile_num is not None else "" 97 | heading = f"{int(float(heading)):03d}" if heading is not None else "" 98 | pitch = f"{int(float(pitch)):03d}" if pitch is not None else "" 99 | timestamp = f"{timestamp}" if timestamp is not None else "" 100 | note = f"{note}" if note is not None else "" 101 | assert is_valid_timestamp(timestamp), f"{timestamp} is not in YYYYMMDD_hhmmss format" 102 | if roll is None: roll = "" 103 | else: raise NotImplementedError() 104 | if height is None: height = "" 105 | else: raise NotImplementedError() 106 | 107 | return f"@{easting}@{northing}@{zone_number:02d}@{zone_letter}@{latitude}@{longitude}" + \ 108 | f"@{pano_id}@{tile_num}@{heading}@{pitch}@{roll}@{height}@{timestamp}@{note}@{extension}" 109 | 110 | 111 | class VideoReader: 112 | def __init__(self, video_name, size=None): 113 | if not os.path.exists(video_name): 114 | raise FileNotFoundError(f"{video_name} does not exist") 115 | self.video_name = video_name 116 | self.size = size 117 | self.vc = cv2.VideoCapture(f"{video_name}") 118 | self.frames_per_second = self.vc.get(cv2.CAP_PROP_FPS) 119 | self.frame_duration_millis = 1000 / self.frames_per_second 120 | self.frames_num = int(self.vc.get(cv2.CAP_PROP_FRAME_COUNT)) 121 | self.video_length_in_millis = int(self.frames_num * 1000 / self.frames_per_second) 122 | 123 | def get_time_at_frame(self, frame_num): 124 | return int(self.frame_duration_millis * frame_num) 125 | 126 | def get_frame_num_at_time(self, time): 127 | # time can be str ('21:59') or int in milliseconds 128 | millis = time if type(time) == int else self.str_to_millis(time) 129 | return min(int(millis / self.frame_duration_millis), self.frames_num) 130 | 131 | def get_frame_at_frame_num(self, frame_num): 132 | self.vc.set(cv2.CAP_PROP_POS_FRAMES, frame_num) 133 
| frame = self.vc.read()[1] 134 | if frame is None: return None # In case of corrupt videos 135 | if self.size is not None: 136 | frame = cv2.resize(frame, self.size[::-1], cv2.INTER_CUBIC) 137 | frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB) 138 | return frame 139 | 140 | @staticmethod 141 | def str_to_millis(time_str): 142 | return (int(time_str.split(":")[0]) * 60 + int(time_str.split(":")[1])) * 1000 143 | 144 | @staticmethod 145 | def millis_to_str(millis): 146 | if millis < 60*60*1000: 147 | return f"{math.floor((millis//1000//60)%60):02d}:{millis//1000%60:02d}" 148 | else: 149 | return f"{math.floor((millis//1000//60//60)%60):02d}:{math.floor((millis//1000//60)%60):02d}:{millis//1000%60:02d}" 150 | 151 | def __repr__(self): 152 | H, W = int(self.vc.get(cv2.CAP_PROP_FRAME_HEIGHT)), int(self.vc.get(cv2.CAP_PROP_FRAME_WIDTH)) 153 | return (f"Video '{self.video_name}' has {self.frames_num} frames, " + 154 | f"with resolution {H}x{W}, " + 155 | f"and lasts {self.video_length_in_millis // 1000} seconds " 156 | f"({self.millis_to_str(self.video_length_in_millis)}), therefore " 157 | f"there's a frame every {int(self.frame_duration_millis)} millis") 158 | 159 | def __del__(self): 160 | self.vc.release() 161 | 162 | -------------------------------------------------------------------------------- /method/README.md: -------------------------------------------------------------------------------- 1 | # Usage 2 | TODO: add instructions for running the code. -------------------------------------------------------------------------------- /teaser/data_vis.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ai4ce/NYC-Indoor-VPR/36510997e724eb07caf9577128dc666b335ed7e5/teaser/data_vis.jpg -------------------------------------------------------------------------------- /teaser/dataset_vis.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ai4ce/NYC-Indoor-VPR/36510997e724eb07caf9577128dc666b335ed7e5/teaser/dataset_vis.jpg -------------------------------------------------------------------------------- /teaser/label_pipeline_ex.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ai4ce/NYC-Indoor-VPR/36510997e724eb07caf9577128dc666b335ed7e5/teaser/label_pipeline_ex.png -------------------------------------------------------------------------------- /teaser/pipeline.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ai4ce/NYC-Indoor-VPR/36510997e724eb07caf9577128dc666b335ed7e5/teaser/pipeline.jpg --------------------------------------------------------------------------------