├── README.md ├── benchmark └── deep-visual-geo-localization-benchmark │ ├── README.md │ ├── anyloc.txt │ ├── anyloc_vlad_generate.py │ ├── cam.py │ ├── cct.txt │ ├── commons.py │ ├── cosplace.txt │ ├── dataloaders │ ├── GSVCitiesDataloader.py │ ├── GSVCitiesDataset.py │ ├── MapillaryDataset.py │ └── PittsburgDataset.py │ ├── datasets_ws.py │ ├── dino_extractor.py │ ├── eval.py │ ├── ind_name.py │ ├── main.py │ ├── mixvpr.txt │ ├── mixvpr_result.txt │ ├── model │ ├── __init__.py │ ├── aggregation.py │ ├── cct │ │ ├── __init__.py │ │ ├── cct.py │ │ ├── embedder.py │ │ ├── helpers.py │ │ ├── stochastic_depth.py │ │ ├── tokenizer.py │ │ └── transformers.py │ ├── commands.txt │ ├── functional.py │ ├── network.py │ ├── normalization.py │ └── sync_batchnorm │ │ ├── __init__.py │ │ ├── batchnorm.py │ │ ├── batchnorm_reimpl.py │ │ ├── comm.py │ │ ├── replicate.py │ │ └── unittest.py │ ├── models │ ├── __init__.py │ ├── aggregators │ │ ├── __init__.py │ │ ├── convap.py │ │ ├── cosplace.py │ │ ├── gem.py │ │ └── mixvpr.py │ ├── backbones │ │ ├── __init__.py │ │ ├── efficientnet.py │ │ ├── resnet.py │ │ └── swin.py │ └── helper.py │ ├── parser.py │ ├── pytorch_grad_cam │ ├── __init__.py │ ├── ablation_cam.py │ ├── ablation_cam_multilayer.py │ ├── ablation_layer.py │ ├── activations_and_gradients.py │ ├── base_cam.py │ ├── eigen_cam.py │ ├── eigen_grad_cam.py │ ├── feature_factorization │ │ ├── __init__.py │ │ └── deep_feature_factorization.py │ ├── fullgrad_cam.py │ ├── grad_cam.py │ ├── grad_cam_elementwise.py │ ├── grad_cam_plusplus.py │ ├── guided_backprop.py │ ├── hirescam.py │ ├── layer_cam.py │ ├── metrics │ │ ├── __init__.py │ │ ├── cam_mult_image.py │ │ ├── perturbation_confidence.py │ │ └── road.py │ ├── random_cam.py │ ├── score_cam.py │ ├── sobel_cam.py │ ├── utils │ │ ├── __init__.py │ │ ├── find_layers.py │ │ ├── image.py │ │ ├── model_targets.py │ │ ├── reshape_transforms.py │ │ └── svd_on_activations.py │ └── xgrad_cam.py │ ├── requirements.txt │ ├── resnet.txt │ ├── resnet_result.txt │ ├── results │ ├── anyloc_eval.txt │ ├── anyloc_name.txt │ ├── cct_eval.txt │ ├── cct_name.txt │ ├── cosplace_eval.txt │ ├── cosplace_name.txt │ ├── mixvpr_eval.txt │ ├── mixvpr_name.txt │ ├── resnet_eval.txt │ ├── resnet_name.txt │ └── result.txt │ ├── sbatch.txt │ ├── scratch.py │ ├── summary.py │ ├── test.SBATCH │ ├── test.py │ ├── test1.py │ ├── test_database.txt │ ├── test_queries.txt │ ├── train.py │ ├── train_1.py │ ├── util.py │ ├── utilities.py │ ├── utilities1.py │ ├── utils │ ├── __init__.py │ ├── losses.py │ └── validation.py │ └── visual │ ├── name_utm.py │ ├── test_paths.txt │ ├── test_utm.txt │ ├── train_paths.txt │ ├── train_utm.txt │ ├── util.py │ ├── val_paths.txt │ └── val_utm.txt ├── method ├── README.md └── traj_label_gui.py └── teaser ├── data_vis.jpg ├── dataset_vis.jpg ├── label_pipeline_ex.png └── pipeline.jpg /README.md: -------------------------------------------------------------------------------- 1 | # NYC-Indoor-VPR 2 | 3 | Diwei Sheng, Anbang Yang, John-Ross Rizzo, Chen Feng 4 | 5 | [Paper on arXiv](https://arxiv.org/pdf/2404.00504) 6 | 7 |

8 | Figure: Dataset
9 | 
10 | Figure: Semi-auto annotation method
11 | 

12 | 13 | 14 | 15 | ## News 16 | - [2023/06]: We release **NYC-Indoor** for academic usage. 17 | - [2023/06]: NYC-Indoor is submitted to **NeurIPS 2023 Track on Datasets and Benchmarks**. 18 | - [2024/03]: NYC-Indoor is accepted by **ICRA 2024**. 19 | 20 | ## Abstract 21 | Visual Place Recognition (VPR) seeks to enhance the ability of camera systems to identify previously visited places based on captured images. This paper introduces the NYC-Indoor dataset, a rich collection of over 36,000 images compiled from 13 distinct scenes within a span of a year. NYC-Indoor is a unique, year-long indoor VPR benchmark dataset comprising images from different crowded scenes in New York City, taken under varying lighting conditions with seasonal and appearance changes. To establish ground truth for this dataset, we propose a semi-automatic annotation approach that computes the positional information of each image. Our method specifically takes pairs of videos as input and yields matched pairs of images, along with their estimated relative locations. The accuracy of this matching process is further refined by human annotators, who utilize our custom annotation interface to correlate selected keyframes. We apply our annotation methodology to the NYC-Indoor dataset. Finally, we present a benchmark evaluation of several state-of-the-art VPR algorithms using our dataset. 22 | 23 | ## NYC-Indoor Dataset 24 | The NYC-Indoor dataset is a rich collection of over 36,000 images compiled from 13 distinct scenes within a span of a year. The dataset can be downloaded from [HuggingFace](https://huggingface.co/datasets/ai4ce/NYC-Indoor-VPR-Data/tree/main). We release NYC-Indoor under [CC BY-NC-SA 4.0](https://creativecommons.org/licenses/by-nc-sa/4.0/). 25 | 26 | ## Benchmark 27 | We benchmarked four state-of-the-art deep learning VPR methods on the NYC-Indoor dataset: CosPlace, MixVPR, ResNet+NetVLAD, and CCT+NetVLAD. For more details, please refer to the [benchmark](./benchmark) folder. 28 | 29 | ## Semi-auto Annotation 30 | Our semi-automatic annotation method can efficiently and accurately match trajectories and generate images with topometric locations as ground truth, applicable to any indoor VPR dataset. For more details, please refer to the [method](./method) folder. 31 | -------------------------------------------------------------------------------- /benchmark/deep-visual-geo-localization-benchmark/README.md: -------------------------------------------------------------------------------- 1 | # Disclaimer 2 | Code in this folder is originally from the Deep Visual Geo-localization Benchmark [official repository](https://github.com/gmberton/deep-visual-geo-localization-benchmark). We have made essential modifications to the code to work with the NYC-Indoor dataset.
If you are using the code in this folder, please cosider citing the following paper: 3 | ``` 4 | @inProceedings{Berton_CVPR_2022_benchmark, 5 | author = {Berton, Gabriele and Mereu, Riccardo and Trivigno, Gabriele and Masone, Carlo and 6 | Csurka, Gabriela and Sattler, Torsten and Caputo, Barbara}, 7 | title = {Deep Visual Geo-localization Benchmark}, 8 | booktitle = {CVPR}, 9 | month = {June}, 10 | year = {2022}, 11 | } 12 | ``` 13 | 14 | # Create Environment 15 | ```bash 16 | conda create --name bench python=3.7 17 | ``` 18 | -------------------------------------------------------------------------------- /benchmark/deep-visual-geo-localization-benchmark/anyloc_vlad_generate.py: -------------------------------------------------------------------------------- 1 | # Download cache data from OneDrive 2 | import os 3 | from onedrivedownloader import download 4 | from utilities1 import od_down_links 5 | 6 | # Link 7 | ln = od_down_links["cache"] 8 | # Download and unzip 9 | if os.path.isdir("./cache"): 10 | print("Cache folder already exists!") 11 | else: 12 | print("Downloading the cache folder") 13 | download(ln, filename="cache.zip", unzip=True, unzip_path="./") 14 | print("Cache folder downloaded") 15 | 16 | import glob 17 | _ex = lambda x: os.path.realpath(os.path.expanduser(x)) 18 | cache_dir: str = _ex("./cache") 19 | # imgs_dir = "/mnt/data/dean/datasets/b4/images/test/database" 20 | # assert os.path.isdir(cache_dir), "Cache directory not found" 21 | # assert os.path.isdir(imgs_dir), "Invalid unzipping" 22 | # num_imgs = len(glob.glob(f"{imgs_dir}/*.jpg")) 23 | # print(f"Found {num_imgs} images in {imgs_dir}") 24 | 25 | # Import everything 26 | import numpy as np 27 | import cv2 as cv 28 | import torch 29 | from torch import nn 30 | from torch.nn import functional as F 31 | from torchvision import transforms as tvf 32 | from torchvision.transforms import functional as T 33 | from PIL import Image 34 | import matplotlib.pyplot as plt 35 | import distinctipy as dipy 36 | from tqdm.auto import tqdm 37 | from typing import Literal, List 38 | import os 39 | import natsort 40 | import shutil 41 | from copy import deepcopy 42 | # DINOv2 imports 43 | from utilities1 import DinoV2ExtractFeatures 44 | from utilities1 import VLAD 45 | 46 | # Program parameters 47 | save_dir = "/home/unav/Desktop/benchmark/AnyLoc/saved_desc" 48 | device = torch.device("cuda") 49 | # Dino_v2 properties (parameters) 50 | desc_layer: int = 31 51 | desc_facet: Literal["query", "key", "value", "token"] = "value" 52 | num_c: int = 32 53 | # Domain for use case (deployment environment) 54 | domain: Literal["aerial", "indoor", "urban"] = "urban" 55 | # Maximum image dimension 56 | max_img_size: int = 640 57 | 58 | # DINO extractor 59 | if "extractor" in globals(): 60 | print(f"Extractor already defined, skipping") 61 | else: 62 | # extractor=ViTExtractor("dino_vits8", stride=4, 63 | # device=device) 64 | extractor = DinoV2ExtractFeatures("dinov2_vitg14", desc_layer, 65 | desc_facet, device=device) 66 | # Base image transformations 67 | base_tf = tvf.Compose([ 68 | tvf.ToTensor(), 69 | tvf.Normalize(mean=[0.485, 0.456, 0.406], 70 | std=[0.229, 0.224, 0.225]) 71 | ]) 72 | 73 | # Ensure that data is present 74 | ext_specifier = f"dinov2_vitg14/l{desc_layer}_{desc_facet}_c{num_c}" 75 | c_centers_file = os.path.join(cache_dir, "vocabulary", ext_specifier, 76 | domain, "c_centers.pt") 77 | assert os.path.isfile(c_centers_file), "Cluster centers not cached!" 
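# The cached file holds the VLAD vocabulary (cluster centers) for this
# layer/facet/domain combination. With num_c = 32 clusters and 1536-dim
# DINOv2 ViT-g14 "value" descriptors, the resulting VLAD vector has
# 32 * 1536 = 49152 dimensions, which matches the width hard-coded in all_desc() below.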
78 | c_centers = torch.load(c_centers_file) 79 | assert c_centers.shape[0] == num_c, "Wrong number of clusters!" 80 | 81 | # VLAD object 82 | vlad = VLAD(num_c, desc_dim=None, 83 | cache_dir=os.path.dirname(c_centers_file)) 84 | # Fit (load) the cluster centers (this'll also load the desc_dim) 85 | vlad.fit(None) 86 | 87 | 88 | # img_fnames = glob.glob(f"{imgs_dir}/*.jpg") 89 | # img_fnames = natsort.natsorted(img_fnames) 90 | 91 | def single_desc(img_fname): 92 | # for img_fname in tqdm(img_fnames[:20]): 93 | # # DINO features 94 | with torch.no_grad(): 95 | pil_img = Image.open(img_fname).convert('RGB') 96 | img_pt = base_tf(pil_img).to(device) 97 | if max(img_pt.shape[-2:]) > max_img_size: 98 | c, h, w = img_pt.shape 99 | # Maintain aspect ratio 100 | if h == max(img_pt.shape[-2:]): 101 | w = int(w * max_img_size / h) 102 | h = max_img_size 103 | else: 104 | h = int(h * max_img_size / w) 105 | w = max_img_size 106 | # print(f"To {(h, w) =}") 107 | img_pt = T.resize(img_pt, (h, w), 108 | interpolation=T.InterpolationMode.BICUBIC) 109 | # print(f"Resized {img_fname} to {img_pt.shape = }") 110 | # Make image patchable (14, 14 patches) 111 | c, h, w = img_pt.shape 112 | h_new, w_new = (h // 14) * 14, (w // 14) * 14 113 | img_pt = tvf.CenterCrop((h_new, w_new))(img_pt)[None, ...] 114 | # Extract descriptor 115 | # print(img_pt.shape) 116 | ret = extractor(img_pt) # [1, num_patches, desc_dim] 117 | # VLAD global descriptor 118 | gd = vlad.generate(ret.cpu().squeeze()) # VLAD: shape [agg_dim] 119 | # print(gd.shape) 120 | return gd 121 | # gd_np = gd.numpy()[np.newaxis, ...] # shape: [1, agg_dim] 122 | # print(gd_np.shape) 123 | # np.save(f"{save_dir}/{os.path.basename(img_fname)}.npy", gd_np) 124 | 125 | # single_desc("/mnt/data/nyc_indoor/indoor/images/test/database/@00000.00@00035.70@168@.jpg") 126 | def all_desc(): 127 | f=open("test_database.txt") 128 | f1=open("test_queries.txt") 129 | l=f.readlines() 130 | l1=f1.readlines() 131 | all_features = np.empty((len(l)+len(l1), 49152), dtype="float32") 132 | for i in tqdm(range(len(l))): 133 | all_features[i]=single_desc(l[i].strip()) 134 | 135 | for i in tqdm(range(len(l1))): 136 | all_features[len(l)+i]=single_desc(l1[i].strip()) 137 | return all_features -------------------------------------------------------------------------------- /benchmark/deep-visual-geo-localization-benchmark/cam.py: -------------------------------------------------------------------------------- 1 | import argparse 2 | import cv2 3 | import numpy as np 4 | import torch 5 | from torchvision import models 6 | from pytorch_grad_cam import GradCAM, \ 7 | HiResCAM, \ 8 | ScoreCAM, \ 9 | GradCAMPlusPlus, \ 10 | AblationCAM, \ 11 | XGradCAM, \ 12 | EigenCAM, \ 13 | EigenGradCAM, \ 14 | LayerCAM, \ 15 | FullGrad, \ 16 | GradCAMElementWise 17 | 18 | 19 | from pytorch_grad_cam import GuidedBackpropReLUModel 20 | from pytorch_grad_cam.utils.image import show_cam_on_image, \ 21 | deprocess_image, \ 22 | preprocess_image 23 | from pytorch_grad_cam.utils.model_targets import ClassifierOutputTarget 24 | 25 | 26 | def get_args(): 27 | parser = argparse.ArgumentParser() 28 | parser.add_argument('--use-cuda', action='store_true', default=False, 29 | help='Use NVIDIA GPU acceleration') 30 | parser.add_argument( 31 | '--image-path', 32 | type=str, 33 | default='./examples/both.png', 34 | help='Input image path') 35 | parser.add_argument('--aug_smooth', action='store_true', 36 | help='Apply test time augmentation to smooth the CAM') 37 | parser.add_argument( 38 | '--eigen_smooth', 39 | 
action='store_true', 40 | help='Reduce noise by taking the first principle componenet' 41 | 'of cam_weights*activations') 42 | parser.add_argument('--method', type=str, default='gradcam', 43 | choices=['gradcam', 'hirescam', 'gradcam++', 44 | 'scorecam', 'xgradcam', 45 | 'ablationcam', 'eigencam', 46 | 'eigengradcam', 'layercam', 'fullgrad'], 47 | help='Can be gradcam/gradcam++/scorecam/xgradcam' 48 | '/ablationcam/eigencam/eigengradcam/layercam') 49 | 50 | args = parser.parse_args() 51 | args.use_cuda = args.use_cuda and torch.cuda.is_available() 52 | if args.use_cuda: 53 | print('Using GPU for acceleration') 54 | else: 55 | print('Using CPU for computation') 56 | 57 | return args 58 | 59 | 60 | def cam(model, img_path, layer, layer_name): 61 | """ python cam.py -image-path 62 | Example usage of loading an image, and computing: 63 | 1. CAM 64 | 2. Guided Back Propagation 65 | 3. Combining both 66 | """ 67 | 68 | # args = get_args() 69 | methods = \ 70 | {"gradcam": GradCAM, 71 | "hirescam": HiResCAM, 72 | "scorecam": ScoreCAM, 73 | "gradcam++": GradCAMPlusPlus, 74 | "ablationcam": AblationCAM, 75 | "xgradcam": XGradCAM, 76 | "eigencam": EigenCAM, 77 | "eigengradcam": EigenGradCAM, 78 | "layercam": LayerCAM, 79 | "fullgrad": FullGrad, 80 | "gradcamelementwise": GradCAMElementWise} 81 | 82 | # model = models.resnet50(pretrained=True) 83 | 84 | # Choose the target layer you want to compute the visualization for. 85 | # Usually this will be the last convolutional layer in the model. 86 | # Some common choices can be: 87 | # Resnet18 and 50: model.layer4 88 | # VGG, densenet161: model.features[-1] 89 | # mnasnet1_0: model.layers[-1] 90 | # You can print the model to help chose the layer 91 | # You can pass a list with several target layers, 92 | # in that case the CAMs will be computed per layer and then aggregated. 93 | # You can also try selecting all layers of a certain type, with e.g: 94 | # from pytorch_grad_cam.utils.find_layers import find_layer_types_recursive 95 | # find_layer_types_recursive(model, [torch.nn.ReLU]) 96 | target_layers = [layer] 97 | 98 | img=cv2.imread(img_path, 1) 99 | img=cv2.resize(img, (384,384)) 100 | rgb_img = img[:, :, ::-1] 101 | rgb_img = np.float32(rgb_img) / 255 102 | input_tensor = preprocess_image(rgb_img, 103 | mean=[0.485, 0.456, 0.406], 104 | std=[0.229, 0.224, 0.225]) 105 | 106 | # We have to specify the target we want to generate 107 | # the Class Activation Maps for. 108 | # If targets is None, the highest scoring category (for every member in the batch) will be used. 109 | # You can target specific categories by 110 | # targets = [e.g ClassifierOutputTarget(281)] 111 | targets = None 112 | 113 | # Using the with statement ensures the context is freed, and you can 114 | # recreate different CAM objects in a loop. 115 | cam_algorithm = methods["gradcam"] 116 | with cam_algorithm(model=model, 117 | target_layers=target_layers, 118 | use_cuda=False) as cam: 119 | 120 | # AblationCAM and ScoreCAM have batched implementations. 121 | # You can override the internal batch size for faster computation. 
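        # Only the batched methods (AblationCAM, ScoreCAM) actually consume this
        # batch size; plain GradCAM runs a single forward/backward pass per image,
        # so the setting is harmless when the "gradcam" algorithm selected above is used.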
122 | cam.batch_size = 32 123 | # try: 124 | grayscale_cam = cam(input_tensor=input_tensor, 125 | targets=targets, 126 | aug_smooth=False, 127 | eigen_smooth=False) 128 | # except: 129 | # print("grayscale_cam none") 130 | # return 131 | 132 | # Here grayscale_cam has only one image in the batch 133 | grayscale_cam = grayscale_cam[0, :] 134 | 135 | cam_image = show_cam_on_image(rgb_img, grayscale_cam, use_rgb=True) 136 | 137 | # cam_image is RGB encoded whereas "cv2.imwrite" requires BGR encoding. 138 | cam_image = cv2.cvtColor(cam_image, cv2.COLOR_RGB2BGR) 139 | 140 | gb_model = GuidedBackpropReLUModel(model=model, use_cuda=False) 141 | gb = gb_model(input_tensor, target_category=None) 142 | cam_mask = cv2.merge([grayscale_cam, grayscale_cam, grayscale_cam]) 143 | cam_gb = deprocess_image(cam_mask * gb) 144 | gb = deprocess_image(gb) 145 | 146 | last_slash_index = img_path.rfind("/") 147 | img_name=img_path[last_slash_index + 1:-4] 148 | cv2.imwrite("visual/cct/"+img_name+'_'+layer_name+'.jpg', cam_image) 149 | # cv2.imwrite("visual/"+model_name+'_'+layer_name+'.jpg', gb) 150 | # cv2.imwrite(f'gradcam_cam_gb.jpg', cam_gb) 151 | -------------------------------------------------------------------------------- /benchmark/deep-visual-geo-localization-benchmark/commons.py: -------------------------------------------------------------------------------- 1 | 2 | """ 3 | This file contains some functions and classes which can be useful in very diverse projects. 4 | """ 5 | 6 | import os 7 | import sys 8 | import torch 9 | import random 10 | import logging 11 | import traceback 12 | import numpy as np 13 | from os.path import join 14 | 15 | 16 | def make_deterministic(seed=0): 17 | """Make results deterministic. If seed == -1, do not make deterministic. 18 | Running the script in a deterministic way might slow it down. 19 | """ 20 | if seed == -1: 21 | return 22 | random.seed(seed) 23 | np.random.seed(seed) 24 | torch.manual_seed(seed) 25 | torch.cuda.manual_seed(seed) 26 | torch.backends.cudnn.deterministic = True 27 | torch.backends.cudnn.benchmark = False 28 | 29 | 30 | def setup_logging(save_dir, console="debug", 31 | info_filename="info.log", debug_filename="debug.log"): 32 | """Set up logging files and console output. 33 | Creates one file for INFO logs and one for DEBUG logs. 34 | Args: 35 | save_dir (str): creates the folder where to save the files. 36 | debug (str): 37 | if == "debug" prints on console debug messages and higher 38 | if == "info" prints on console info messages and higher 39 | if == None does not use console (useful when a logger has already been set) 40 | info_filename (str): the name of the info file. if None, don't create info file 41 | debug_filename (str): the name of the debug file. 
if None, don't create debug file 42 | """ 43 | if os.path.exists(save_dir): 44 | raise FileExistsError(f"{save_dir} already exists!") 45 | os.makedirs(save_dir, exist_ok=True) 46 | # logging.Logger.manager.loggerDict.keys() to check which loggers are in use 47 | base_formatter = logging.Formatter('%(asctime)s %(message)s', "%Y-%m-%d %H:%M:%S") 48 | logger = logging.getLogger('') 49 | logger.setLevel(logging.DEBUG) 50 | 51 | if info_filename is not None: 52 | info_file_handler = logging.FileHandler(join(save_dir, info_filename)) 53 | info_file_handler.setLevel(logging.INFO) 54 | info_file_handler.setFormatter(base_formatter) 55 | logger.addHandler(info_file_handler) 56 | 57 | if debug_filename is not None: 58 | debug_file_handler = logging.FileHandler(join(save_dir, debug_filename)) 59 | debug_file_handler.setLevel(logging.DEBUG) 60 | debug_file_handler.setFormatter(base_formatter) 61 | logger.addHandler(debug_file_handler) 62 | 63 | if console is not None: 64 | console_handler = logging.StreamHandler() 65 | if console == "debug": 66 | console_handler.setLevel(logging.DEBUG) 67 | if console == "info": 68 | console_handler.setLevel(logging.INFO) 69 | console_handler.setFormatter(base_formatter) 70 | logger.addHandler(console_handler) 71 | 72 | def exception_handler(type_, value, tb): 73 | logger.info("\n" + "".join(traceback.format_exception(type, value, tb))) 74 | sys.excepthook = exception_handler 75 | -------------------------------------------------------------------------------- /benchmark/deep-visual-geo-localization-benchmark/dataloaders/GSVCitiesDataloader.py: -------------------------------------------------------------------------------- 1 | import pytorch_lightning as pl 2 | from torch.utils.data.dataloader import DataLoader 3 | from torchvision import transforms as T 4 | 5 | from dataloaders.GSVCitiesDataset import GSVCitiesDataset 6 | from . import PittsburgDataset 7 | from . 
import MapillaryDataset 8 | 9 | from prettytable import PrettyTable 10 | 11 | IMAGENET_MEAN_STD = {'mean': [0.485, 0.456, 0.406], 12 | 'std': [0.229, 0.224, 0.225]} 13 | 14 | VIT_MEAN_STD = {'mean': [0.5, 0.5, 0.5], 15 | 'std': [0.5, 0.5, 0.5]} 16 | 17 | TRAIN_CITIES = [ 18 | 'Bangkok', 19 | 'BuenosAires', 20 | 'LosAngeles', 21 | 'MexicoCity', 22 | 'OSL', 23 | 'Rome', 24 | 'Barcelona', 25 | 'Chicago', 26 | 'Madrid', 27 | 'Miami', 28 | 'Phoenix', 29 | 'TRT', 30 | 'Boston', 31 | 'Lisbon', 32 | 'Medellin', 33 | 'Minneapolis', 34 | 'PRG', 35 | 'WashingtonDC', 36 | 'Brussels', 37 | 'London', 38 | 'Melbourne', 39 | 'Osaka', 40 | 'PRS', 41 | ] 42 | 43 | 44 | class GSVCitiesDataModule(pl.LightningDataModule): 45 | def __init__(self, 46 | batch_size=32, 47 | img_per_place=4, 48 | min_img_per_place=4, 49 | shuffle_all=False, 50 | image_size=(480, 640), 51 | num_workers=4, 52 | show_data_stats=True, 53 | cities=TRAIN_CITIES, 54 | mean_std=IMAGENET_MEAN_STD, 55 | batch_sampler=None, 56 | random_sample_from_each_place=True, 57 | val_set_names=['pitts30k_val', 'msls_val'] 58 | ): 59 | super().__init__() 60 | self.batch_size = batch_size 61 | self.img_per_place = img_per_place 62 | self.min_img_per_place = min_img_per_place 63 | self.shuffle_all = shuffle_all 64 | self.image_size = image_size 65 | self.num_workers = num_workers 66 | self.batch_sampler = batch_sampler 67 | self.show_data_stats = show_data_stats 68 | self.cities = cities 69 | self.mean_dataset = mean_std['mean'] 70 | self.std_dataset = mean_std['std'] 71 | self.random_sample_from_each_place = random_sample_from_each_place 72 | self.val_set_names = val_set_names 73 | self.save_hyperparameters() # save hyperparameter with Pytorch Lightening 74 | 75 | self.train_transform = T.Compose([ 76 | T.Resize(image_size, interpolation=T.InterpolationMode.BILINEAR), 77 | T.RandAugment(num_ops=3, interpolation=T.InterpolationMode.BILINEAR), 78 | T.ToTensor(), 79 | T.Normalize(mean=self.mean_dataset, std=self.std_dataset), 80 | ]) 81 | 82 | self.valid_transform = T.Compose([ 83 | T.Resize(image_size, interpolation=T.InterpolationMode.BILINEAR), 84 | T.ToTensor(), 85 | T.Normalize(mean=self.mean_dataset, std=self.std_dataset)]) 86 | 87 | self.train_loader_config = { 88 | 'batch_size': self.batch_size, 89 | 'num_workers': self.num_workers, 90 | 'drop_last': False, 91 | 'pin_memory': True, 92 | 'shuffle': self.shuffle_all} 93 | 94 | self.valid_loader_config = { 95 | 'batch_size': self.batch_size, 96 | 'num_workers': self.num_workers//2, 97 | 'drop_last': False, 98 | 'pin_memory': True, 99 | 'shuffle': False} 100 | 101 | def setup(self, stage): 102 | if stage == 'fit': 103 | # load train dataloader with reload routine 104 | self.reload() 105 | 106 | # load validation sets (pitts_val, msls_val, ...etc) 107 | self.val_datasets = [] 108 | for valid_set_name in self.val_set_names: 109 | if valid_set_name.lower() == 'pitts30k_test': 110 | self.val_datasets.append(PittsburgDataset.get_whole_test_set( 111 | input_transform=self.valid_transform)) 112 | elif valid_set_name.lower() == 'pitts30k_val': 113 | self.val_datasets.append(PittsburgDataset.get_whole_val_set( 114 | input_transform=self.valid_transform)) 115 | elif valid_set_name.lower() == 'msls_val': 116 | self.val_datasets.append(MapillaryDataset.MSLS( 117 | input_transform=self.valid_transform)) 118 | else: 119 | print( 120 | f'Validation set {valid_set_name} does not exist or has not been implemented yet') 121 | raise NotImplementedError 122 | if self.show_data_stats: 123 | self.print_stats() 124 | 125 | def 
reload(self): 126 | self.train_dataset = GSVCitiesDataset( 127 | cities=self.cities, 128 | img_per_place=self.img_per_place, 129 | min_img_per_place=self.min_img_per_place, 130 | random_sample_from_each_place=self.random_sample_from_each_place, 131 | transform=self.train_transform) 132 | 133 | def train_dataloader(self): 134 | self.reload() 135 | return DataLoader(dataset=self.train_dataset, **self.train_loader_config) 136 | 137 | def val_dataloader(self): 138 | val_dataloaders = [] 139 | for val_dataset in self.val_datasets: 140 | val_dataloaders.append(DataLoader( 141 | dataset=val_dataset, **self.valid_loader_config)) 142 | return val_dataloaders 143 | 144 | def print_stats(self): 145 | print() # print a new line 146 | table = PrettyTable() 147 | table.field_names = ['Data', 'Value'] 148 | table.align['Data'] = "l" 149 | table.align['Value'] = "l" 150 | table.header = False 151 | table.add_row(["# of cities", f"{len(TRAIN_CITIES)}"]) 152 | table.add_row(["# of places", f'{self.train_dataset.__len__()}']) 153 | table.add_row(["# of images", f'{self.train_dataset.total_nb_images}']) 154 | print(table.get_string(title="Training Dataset")) 155 | print() 156 | 157 | table = PrettyTable() 158 | table.field_names = ['Data', 'Value'] 159 | table.align['Data'] = "l" 160 | table.align['Value'] = "l" 161 | table.header = False 162 | for i, val_set_name in enumerate(self.val_set_names): 163 | table.add_row([f"Validation set {i+1}", f"{val_set_name}"]) 164 | # table.add_row(["# of places", f'{self.train_dataset.__len__()}']) 165 | print(table.get_string(title="Validation Datasets")) 166 | print() 167 | 168 | table = PrettyTable() 169 | table.field_names = ['Data', 'Value'] 170 | table.align['Data'] = "l" 171 | table.align['Value'] = "l" 172 | table.header = False 173 | table.add_row( 174 | ["Batch size (PxK)", f"{self.batch_size}x{self.img_per_place}"]) 175 | table.add_row( 176 | ["# of iterations", f"{self.train_dataset.__len__()//self.batch_size}"]) 177 | table.add_row(["Image size", f"{self.image_size}"]) 178 | print(table.get_string(title="Training config")) 179 | -------------------------------------------------------------------------------- /benchmark/deep-visual-geo-localization-benchmark/dataloaders/GSVCitiesDataset.py: -------------------------------------------------------------------------------- 1 | # https://github.com/amaralibey/gsv-cities 2 | 3 | import pandas as pd 4 | from pathlib import Path 5 | from PIL import Image 6 | import torch 7 | from torch.utils.data import Dataset 8 | import torchvision.transforms as T 9 | 10 | default_transform = T.Compose([ 11 | T.ToTensor(), 12 | T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]), 13 | ]) 14 | 15 | # NOTE: Hard coded path to dataset folder 16 | BASE_PATH = '../datasets/gsv_cities/' 17 | 18 | if not Path(BASE_PATH).exists(): 19 | raise FileNotFoundError( 20 | 'BASE_PATH is hardcoded, please adjust to point to gsv_cities') 21 | 22 | class GSVCitiesDataset(Dataset): 23 | def __init__(self, 24 | cities=['London', 'Boston'], 25 | img_per_place=4, 26 | min_img_per_place=4, 27 | random_sample_from_each_place=True, 28 | transform=default_transform, 29 | base_path=BASE_PATH 30 | ): 31 | super(GSVCitiesDataset, self).__init__() 32 | self.base_path = base_path 33 | self.cities = cities 34 | 35 | assert img_per_place <= min_img_per_place, \ 36 | f"img_per_place should be less than {min_img_per_place}" 37 | self.img_per_place = img_per_place 38 | self.min_img_per_place = min_img_per_place 39 | self.random_sample_from_each_place = 
random_sample_from_each_place 40 | self.transform = transform 41 | 42 | # generate the dataframe contraining images metadata 43 | self.dataframe = self.__getdataframes() 44 | 45 | # get all unique place ids 46 | self.places_ids = pd.unique(self.dataframe.index) 47 | self.total_nb_images = len(self.dataframe) 48 | 49 | def __getdataframes(self): 50 | ''' 51 | Return one dataframe containing 52 | all info about the images from all cities 53 | 54 | This requieres DataFrame files to be in a folder 55 | named Dataframes, containing a DataFrame 56 | for each city in self.cities 57 | ''' 58 | # read the first city dataframe 59 | df = pd.read_csv(self.base_path+'Dataframes/'+f'{self.cities[0]}.csv') 60 | df = df.sample(frac=1) # shuffle the city dataframe 61 | 62 | 63 | # append other cities one by one 64 | for i in range(1, len(self.cities)): 65 | tmp_df = pd.read_csv( 66 | self.base_path+'Dataframes/'+f'{self.cities[i]}.csv') 67 | 68 | # Now we add a prefix to place_id, so that we 69 | # don't confuse, say, place number 13 of NewYork 70 | # with place number 13 of London ==> (0000013 and 0500013) 71 | # We suppose that there is no city with more than 72 | # 99999 images and there won't be more than 99 cities 73 | # TODO: rename the dataset and hardcode these prefixes 74 | prefix = i 75 | tmp_df['place_id'] = tmp_df['place_id'] + (prefix * 10**5) 76 | tmp_df = tmp_df.sample(frac=1) # shuffle the city dataframe 77 | 78 | df = pd.concat([df, tmp_df], ignore_index=True) 79 | 80 | # keep only places depicted by at least min_img_per_place images 81 | res = df[df.groupby('place_id')['place_id'].transform( 82 | 'size') >= self.min_img_per_place] 83 | return res.set_index('place_id') 84 | 85 | def __getitem__(self, index): 86 | place_id = self.places_ids[index] 87 | 88 | # get the place in form of a dataframe (each row corresponds to one image) 89 | place = self.dataframe.loc[place_id] 90 | 91 | # sample K images (rows) from this place 92 | # we can either sort and take the most recent k images 93 | # or randomly sample them 94 | if self.random_sample_from_each_place: 95 | place = place.sample(n=self.img_per_place) 96 | else: # always get the same most recent images 97 | place = place.sort_values( 98 | by=['year', 'month', 'lat'], ascending=False) 99 | place = place[: self.img_per_place] 100 | 101 | imgs = [] 102 | for i, row in place.iterrows(): 103 | img_name = self.get_img_name(row) 104 | img_path = self.base_path + 'Images/' + \ 105 | row['city_id'] + '/' + img_name 106 | img = self.image_loader(img_path) 107 | 108 | if self.transform is not None: 109 | img = self.transform(img) 110 | 111 | imgs.append(img) 112 | 113 | # NOTE: contrary to image classification where __getitem__ returns only one image 114 | # in GSVCities, we return a place, which is a Tesor of K images (K=self.img_per_place) 115 | # this will return a Tensor of shape [K, channels, height, width]. 
This needs to be taken into account 116 | # in the Dataloader (which will yield batches of shape [BS, K, channels, height, width]) 117 | return torch.stack(imgs), torch.tensor(place_id).repeat(self.img_per_place) 118 | 119 | def __len__(self): 120 | '''Denotes the total number of places (not images)''' 121 | return len(self.places_ids) 122 | 123 | @staticmethod 124 | def image_loader(path): 125 | return Image.open(path).convert('RGB') 126 | 127 | @staticmethod 128 | def get_img_name(row): 129 | # given a row from the dataframe 130 | # return the corresponding image name 131 | 132 | city = row['city_id'] 133 | 134 | # now remove the two digit we added to the id 135 | # they are superficially added to make ids different 136 | # for different cities 137 | pl_id = row.name % 10**5 #row.name is the index of the row, not to be confused with image name 138 | pl_id = str(pl_id).zfill(7) 139 | 140 | panoid = row['panoid'] 141 | year = str(row['year']).zfill(4) 142 | month = str(row['month']).zfill(2) 143 | northdeg = str(row['northdeg']).zfill(3) 144 | lat, lon = str(row['lat']), str(row['lon']) 145 | name = city+'_'+pl_id+'_'+year+'_'+month+'_' + \ 146 | northdeg+'_'+lat+'_'+lon+'_'+panoid+'.jpg' 147 | return name 148 | -------------------------------------------------------------------------------- /benchmark/deep-visual-geo-localization-benchmark/dataloaders/MapillaryDataset.py: -------------------------------------------------------------------------------- 1 | from pathlib import Path 2 | import numpy as np 3 | from PIL import Image 4 | from torch.utils.data import Dataset 5 | 6 | # NOTE: you need to download the mapillary_sls dataset from https://github.com/FrederikWarburg/mapillary_sls 7 | # make sure the path where the mapillary_sls validation dataset resides on your computer is correct. 8 | # the folder named train_val should reside in DATASET_ROOT path (that's the only folder you need from mapillary_sls) 9 | # I hardcoded the groundtruth for image to image evaluation, otherwise it would take ages to run the groundtruth script at each epoch. 10 | DATASET_ROOT = '../datasets/msls_val/' 11 | 12 | path_obj = Path(DATASET_ROOT) 13 | if not path_obj.exists(): 14 | raise Exception('Please make sure the path to mapillary_sls dataset is correct') 15 | 16 | if not path_obj.joinpath('train_val'): 17 | raise Exception(f'Please make sure the directory train_val from mapillary_sls dataset is situated in the directory {DATASET_ROOT}') 18 | 19 | class MSLS(Dataset): 20 | def __init__(self, input_transform = None): 21 | 22 | self.input_transform = input_transform 23 | 24 | # hard coded reference image names, this avoids the hassle of listing them at each epoch. 25 | self.dbImages = np.load('../datasets/msls_val/msls_val_dbImages.npy') 26 | 27 | # hard coded query image names. 
28 | self.qImages = np.load('../datasets/msls_val/msls_val_qImages.npy') 29 | 30 | # hard coded index of query images 31 | self.qIdx = np.load('../datasets/msls_val/msls_val_qIdx.npy') 32 | 33 | # hard coded groundtruth (correspondence between each query and its matches) 34 | self.pIdx = np.load('../datasets/msls_val/msls_val_pIdx.npy', allow_pickle=True) 35 | 36 | # concatenate reference images then query images so that we can use only one dataloader 37 | self.images = np.concatenate((self.dbImages, self.qImages[self.qIdx])) 38 | 39 | # we need to keeo the number of references so that we can split references-queries 40 | # when calculating recall@K 41 | self.num_references = len(self.dbImages) 42 | 43 | def __getitem__(self, index): 44 | img = Image.open(DATASET_ROOT+self.images[index]) 45 | 46 | if self.input_transform: 47 | img = self.input_transform(img) 48 | 49 | return img, index 50 | 51 | def __len__(self): 52 | return len(self.images) -------------------------------------------------------------------------------- /benchmark/deep-visual-geo-localization-benchmark/dataloaders/PittsburgDataset.py: -------------------------------------------------------------------------------- 1 | from os.path import join, exists 2 | from collections import namedtuple 3 | from scipy.io import loadmat 4 | 5 | import torchvision.transforms as T 6 | import torch.utils.data as data 7 | 8 | 9 | from PIL import Image 10 | from sklearn.neighbors import NearestNeighbors 11 | 12 | root_dir = '/scratch/ds5725/VPR-datasets-downloader/datasets/indoor_new/' 13 | 14 | if not exists(root_dir): 15 | raise FileNotFoundError( 16 | 'root_dir is hardcoded, please adjust to point to Pittsburgh dataset') 17 | 18 | struct_dir = "/scratch/ds5725/ssl_vpr/sub/" 19 | queries_dir = join(root_dir, 'queries_real') 20 | 21 | 22 | def input_transform(image_size=None): 23 | return T.Compose([ 24 | T.Resize(image_size),# interpolation=T.InterpolationMode.BICUBIC), 25 | T.ToTensor(), 26 | T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]) 27 | ]) 28 | 29 | 30 | 31 | def get_whole_val_set(input_transform): 32 | structFile = join(struct_dir, 'new_val.mat') 33 | return WholeDatasetFromStruct(structFile, input_transform=input_transform) 34 | 35 | 36 | def get_250k_val_set(input_transform): 37 | structFile = join(struct_dir, 'new_val.mat') 38 | return WholeDatasetFromStruct(structFile, input_transform=input_transform) 39 | 40 | 41 | def get_whole_test_set(input_transform): 42 | structFile = join(struct_dir, 'new_test.mat') 43 | return WholeDatasetFromStruct(structFile, input_transform=input_transform) 44 | 45 | 46 | def get_250k_test_set(input_transform): 47 | structFile = join(struct_dir, 'new_test.mat') 48 | return WholeDatasetFromStruct(structFile, input_transform=input_transform) 49 | 50 | def get_whole_training_set(onlyDB=False): 51 | structFile = join(struct_dir, 'new_train.mat') 52 | return WholeDatasetFromStruct(structFile, 53 | input_transform=input_transform(), 54 | onlyDB=onlyDB) 55 | 56 | dbStruct = namedtuple('dbStruct', ['whichSet', 'dataset', 57 | 'dbImage', 'utmDb', 'qImage', 'utmQ', 'numDb', 'numQ', 58 | 'posDistThr', 'posDistSqThr', 'nonTrivPosDistSqThr']) 59 | 60 | 61 | def parse_dbStruct(path): 62 | mat = loadmat(path) 63 | matStruct = mat['dbStruct'].item() 64 | 65 | dataset = 'pitts250k' 66 | 67 | whichSet = matStruct[0].item() 68 | 69 | dbImage = [f[0].item() for f in matStruct[1]] 70 | utmDb = matStruct[2].T 71 | 72 | qImage = [f[0].item() for f in matStruct[3]] 73 | utmQ = matStruct[4].T 74 | 75 | 
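    # utmDb / utmQ appear to be stored as 2 x N arrays in the .mat struct; the
    # transposes above give the (N, 2) layout that NearestNeighbors expects in getPositives().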
numDb = matStruct[5].item() 76 | numQ = matStruct[6].item() 77 | 78 | posDistThr = matStruct[7].item() 79 | posDistSqThr = matStruct[8].item() 80 | nonTrivPosDistSqThr = matStruct[9].item() 81 | 82 | return dbStruct(whichSet, dataset, dbImage, utmDb, qImage, 83 | utmQ, numDb, numQ, posDistThr, 84 | posDistSqThr, nonTrivPosDistSqThr) 85 | 86 | 87 | class WholeDatasetFromStruct(data.Dataset): 88 | def __init__(self, structFile, input_transform=None, onlyDB=False): 89 | super().__init__() 90 | 91 | self.input_transform = input_transform 92 | 93 | self.dbStruct = parse_dbStruct(structFile) 94 | self.images = [dbIm for dbIm in self.dbStruct.dbImage] 95 | if not onlyDB: 96 | self.images += [qIm for qIm in self.dbStruct.qImage] 97 | 98 | self.whichSet = self.dbStruct.whichSet 99 | self.dataset = self.dbStruct.dataset 100 | 101 | self.positives = None 102 | self.distances = None 103 | 104 | def __getitem__(self, index): 105 | img = Image.open(self.images[index]) 106 | 107 | if self.input_transform: 108 | img = self.input_transform(img) 109 | 110 | return img, index 111 | 112 | def __len__(self): 113 | return len(self.images) 114 | 115 | def getPositives(self): 116 | # positives for evaluation are those within trivial threshold range 117 | # fit NN to find them, search by radius 118 | if self.positives is None: 119 | knn = NearestNeighbors(n_jobs=-1) 120 | knn.fit(self.dbStruct.utmDb) 121 | 122 | self.distances, self.positives = knn.radius_neighbors(self.dbStruct.utmQ, 123 | radius=self.dbStruct.posDistThr) 124 | 125 | return self.positives 126 | -------------------------------------------------------------------------------- /benchmark/deep-visual-geo-localization-benchmark/eval.py: -------------------------------------------------------------------------------- 1 | 2 | """ 3 | With this script you can evaluate checkpoints or test models from two popular 4 | landmark retrieval github repos. 5 | The first is https://github.com/naver/deep-image-retrieval from Naver labs, 6 | provides ResNet-50 and ResNet-101 trained with AP on Google Landmarks 18 clean. 7 | $ python eval.py --off_the_shelf=naver --l2=none --backbone=resnet101conv5 --aggregation=gem --fc_output_dim=2048 8 | 9 | The second is https://github.com/filipradenovic/cnnimageretrieval-pytorch from 10 | Radenovic, provides ResNet-50 and ResNet-101 trained with a triplet loss 11 | on Google Landmarks 18 and sfm120k. 
12 | $ python eval.py --off_the_shelf=radenovic_gldv1 --l2=after_pool --backbone=resnet101conv5 --aggregation=gem --fc_output_dim=2048 13 | $ python eval.py --off_the_shelf=radenovic_sfm --l2=after_pool --backbone=resnet101conv5 --aggregation=gem --fc_output_dim=2048 14 | 15 | Note that although the architectures are almost the same, Naver's 16 | implementation does not use a l2 normalization before/after the GeM aggregation, 17 | while Radenovic's uses it after (and we use it before, which shows better 18 | results in VG) 19 | """ 20 | 21 | import os 22 | import sys 23 | import torch 24 | import parser 25 | import logging 26 | import sklearn 27 | from os.path import join 28 | from datetime import datetime 29 | from torch.utils.model_zoo import load_url 30 | from google_drive_downloader import GoogleDriveDownloader as gdd 31 | 32 | import test 33 | import util 34 | import commons 35 | import datasets_ws 36 | from model import network 37 | 38 | OFF_THE_SHELF_RADENOVIC = { 39 | 'resnet50conv5_sfm' : 'http://cmp.felk.cvut.cz/cnnimageretrieval/data/networks/retrieval-SfM-120k/rSfM120k-tl-resnet50-gem-w-97bf910.pth', 40 | 'resnet101conv5_sfm' : 'http://cmp.felk.cvut.cz/cnnimageretrieval/data/networks/retrieval-SfM-120k/rSfM120k-tl-resnet101-gem-w-a155e54.pth', 41 | 'resnet50conv5_gldv1' : 'http://cmp.felk.cvut.cz/cnnimageretrieval/data/networks/gl18/gl18-tl-resnet50-gem-w-83fdc30.pth', 42 | 'resnet101conv5_gldv1' : 'http://cmp.felk.cvut.cz/cnnimageretrieval/data/networks/gl18/gl18-tl-resnet101-gem-w-a4d43db.pth', 43 | } 44 | 45 | OFF_THE_SHELF_NAVER = { 46 | "resnet50conv5" : "1oPtE_go9tnsiDLkWjN4NMpKjh-_md1G5", 47 | 'resnet101conv5' : "1UWJGDuHtzaQdFhSMojoYVQjmCXhIwVvy" 48 | } 49 | 50 | ######################################### SETUP ######################################### 51 | args = parser.parse_arguments() 52 | start_time = datetime.now() 53 | args.save_dir = join("test", args.save_dir, start_time.strftime('%Y-%m-%d_%H-%M-%S')) 54 | commons.setup_logging(args.save_dir) 55 | commons.make_deterministic(args.seed) 56 | logging.info(f"Arguments: {args}") 57 | logging.info(f"The outputs are being saved in {args.save_dir}") 58 | 59 | ######################################### MODEL ######################################### 60 | model = network.GeoLocalizationNet(args) 61 | model = model.to(args.device) 62 | 63 | if args.aggregation in ["netvlad", "crn"]: 64 | args.features_dim *= args.netvlad_clusters 65 | 66 | if args.off_the_shelf.startswith("radenovic") or args.off_the_shelf.startswith("naver"): 67 | if args.off_the_shelf.startswith("radenovic"): 68 | pretrain_dataset_name = args.off_the_shelf.split("_")[1] # sfm or gldv1 datasets 69 | url = OFF_THE_SHELF_RADENOVIC[f"{args.backbone}_{pretrain_dataset_name}"] 70 | state_dict = load_url(url, model_dir=join("data", "off_the_shelf_nets")) 71 | else: 72 | # This is a hacky workaround to maintain compatibility 73 | sys.modules['sklearn.decomposition.pca'] = sklearn.decomposition._pca 74 | zip_file_path = join("data", "off_the_shelf_nets", args.backbone + "_naver.zip") 75 | if not os.path.exists(zip_file_path): 76 | gdd.download_file_from_google_drive(file_id=OFF_THE_SHELF_NAVER[args.backbone], 77 | dest_path=zip_file_path, unzip=True) 78 | if args.backbone == "resnet50conv5": 79 | state_dict_filename = "Resnet50-AP-GeM.pt" 80 | elif args.backbone == "resnet101conv5": 81 | state_dict_filename = "Resnet-101-AP-GeM.pt" 82 | state_dict = torch.load(join("data", "off_the_shelf_nets", state_dict_filename)) 83 | state_dict = state_dict["state_dict"] 84 | 
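    # The downloaded checkpoints use different parameter names than this model,
    # so the dict built below re-keys them purely by position: it assumes the
    # checkpoint stores its tensors in exactly the same order as model.state_dict().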
model_keys = model.state_dict().keys() 85 | renamed_state_dict = {k: v for k, v in zip(model_keys, state_dict.values())} 86 | model.load_state_dict(renamed_state_dict) 87 | elif args.resume is not None: 88 | logging.info(f"Resuming model from {args.resume}") 89 | model = util.resume_model(args, model) 90 | # Enable DataParallel after loading checkpoint, otherwise doing it before 91 | # would append "module." in front of the keys of the state dict triggering errors 92 | model = torch.nn.DataParallel(model) 93 | 94 | if args.pca_dim is None: 95 | pca = None 96 | else: 97 | full_features_dim = args.features_dim 98 | args.features_dim = args.pca_dim 99 | pca = util.compute_pca(args, model, args.pca_dataset_folder, full_features_dim) 100 | 101 | ######################################### DATASETS ######################################### 102 | test_ds = datasets_ws.BaseDataset(args, args.datasets_folder, args.dataset_name, "test") 103 | logging.info(f"Test set: {test_ds}") 104 | 105 | ######################################### TEST on TEST SET ######################################### 106 | recalls, recalls_str = test.test(args, test_ds, model, args.test_method, pca) 107 | logging.info(f"Recalls on {test_ds}: {recalls_str}") 108 | 109 | logging.info(f"Finished in {str(datetime.now() - start_time)[:-7]}") 110 | -------------------------------------------------------------------------------- /benchmark/deep-visual-geo-localization-benchmark/ind_name.py: -------------------------------------------------------------------------------- 1 | # fd=open("test_database.txt") 2 | # fq=open("test_queries.txt") 3 | # fdl=fd.readlines() 4 | # fql=fq.readlines() 5 | # f=open("anyloc.txt") 6 | # f1=open("results/anyloc_name.txt","w") 7 | # lines=f.readlines() 8 | # for i in range(1,len(lines)): 9 | # s=lines[i].strip().split() 10 | # f1.write(fql[int(s[0])].strip()+" "+fdl[int(s[1])].strip()+"\n") 11 | import math 12 | 13 | def calculate_distance(point1, point2): 14 | x1, y1 = point1 15 | x2, y2 = point2 16 | distance = math.sqrt((x2 - x1)**2 + (y2 - y1)**2) 17 | return distance 18 | 19 | # f=open("results/resnet_name.txt") 20 | # f1=open("results/resnet_eval.txt","w") 21 | # for line in f: 22 | # s=line.strip().split() 23 | # x0=float(s[0].split('@')[1]) 24 | # y0=float(s[0].split('@')[2]) 25 | # x1=float(s[1].split('@')[1]) 26 | # y1=float(s[1].split('@')[2]) 27 | # f1.write(s[0]+" "+s[1]+" "+str(calculate_distance((x0,y0),(x1,y1)))+"\n") 28 | 29 | ad={} 30 | f=open("results/anyloc_eval.txt") 31 | for line in f: 32 | s=line.strip().split() 33 | ad[s[0]]=(s[1],float(s[2])) 34 | 35 | cd={} 36 | f1=open("results/cct_eval.txt") 37 | for line in f1: 38 | s=line.strip().split() 39 | cd[s[0]]=(s[1],float(s[2])) 40 | 41 | cod={} 42 | f2=open("results/cosplace_eval.txt") 43 | for line in f2: 44 | s=line.strip().split() 45 | cod[s[0]]=(s[1],float(s[2])) 46 | 47 | md={} 48 | f3=open("results/mixvpr_eval.txt") 49 | for line in f3: 50 | s=line.strip().split() 51 | md[s[0]]=(s[1],float(s[2])) 52 | 53 | rd={} 54 | f4=open("results/resnet_eval.txt") 55 | for line in f4: 56 | s=line.strip().split() 57 | rd[s[0]]=(s[1],float(s[2])) 58 | 59 | ad_rate={} 60 | for k,v in cod.items(): 61 | m=k.split('@')[1][1] 62 | if m not in ad_rate: 63 | ad_rate[m]=[0,0] 64 | if v[1]<=25: 65 | ad_rate[m][0]+=1 66 | else: 67 | ad_rate[m][1]+=1 68 | 69 | a=[] 70 | for i in range(0,9): 71 | a.append(round(ad_rate[str(i)][0]/(ad_rate[str(i)][0]+ad_rate[str(i)][1]),2)) 72 | # print(a) 73 | # common_keys = set(ad.keys()) & set(cd.keys()) & 
set(rd.keys()) & set(md.keys()) 74 | # num_common_keys = len(common_keys) 75 | # print(num_common_keys) 76 | 77 | # for k,v in md.items(): 78 | # if float(k.split('@')[1])<8000 or float(k.split('@')[1])>9000: 79 | # continue 80 | # if v[1]<5 and (k in ad) and (k in cd) and (k in rd): 81 | # print(k, ad[k],cd[k],rd[k],md[k],sep="\n") 82 | # break 83 | 84 | # k="/mnt/data/nyc_indoor/indoor/images/test/queries/@08115.45@00140.84@620@.jpg" 85 | # print(k, ad[k],cd[k],rd[k],md[k],sep="\n") 86 | # for k,v in md.items(): 87 | # if float(k.split('@')[1])<500: 88 | # continue 89 | # if v[1]<5 and (k in ad) and (k in cd) and (k in rd) and (ad[k][1]>20): 90 | # print(k, ad[k],cd[k],rd[k],md[k],sep="\n") 91 | # break 92 | 93 | # for k,v in md.items(): 94 | # if float(k.split('@')[1])<1500 or (float(k.split('@')[1])>8000 and float(k.split('@')[1])<9000): 95 | # continue 96 | # if v[1]<5 and (k in ad) and (k in cd) and (k in rd) and (ad[k][1]>20 or cd[k][1]>20 or rd[k][1]>20): 97 | # print(k, ad[k],cd[k],rd[k],md[k],sep="\n") 98 | # break 99 | 100 | s=["/mnt/data/nyc_indoor/indoor/images/test/queries/@08115.45@00140.84@620@.jpg"] 101 | 102 | kn="" 103 | for q in s: 104 | maxd=9999 105 | coord=(float(q.split('@')[1]),float(q.split('@')[2])) 106 | for k,v in cod.items(): 107 | c1=(float(k.split('@')[1]),float(k.split('@')[2])) 108 | if calculate_distance(coord,c1) 0) 23 | return new_mask 24 | 25 | def forward(self, x, mask=None): 26 | embed = self.embeddings(x) 27 | embed = embed if mask is None else embed * self.forward_mask(mask).unsqueeze(-1).float() 28 | return embed, mask 29 | 30 | @staticmethod 31 | def init_weight(m): 32 | if isinstance(m, nn.Linear): 33 | nn.init.trunc_normal_(m.weight, std=.02) 34 | if isinstance(m, nn.Linear) and m.bias is not None: 35 | nn.init.constant_(m.bias, 0) 36 | else: 37 | nn.init.normal_(m.weight) 38 | -------------------------------------------------------------------------------- /benchmark/deep-visual-geo-localization-benchmark/model/cct/helpers.py: -------------------------------------------------------------------------------- 1 | import math 2 | import torch 3 | import torch.nn.functional as F 4 | 5 | 6 | def resize_pos_embed(posemb, posemb_new, num_tokens=1): 7 | # Copied from `timm` by Ross Wightman: 8 | # github.com/rwightman/pytorch-image-models 9 | # Rescale the grid of position embeddings when loading from state_dict. 
Adapted from 10 | # https://github.com/google-research/vision_transformer/blob/00883dd691c63a6830751563748663526e811cee/vit_jax/checkpoint.py#L224 11 | ntok_new = posemb_new.shape[1] 12 | if num_tokens: 13 | posemb_tok, posemb_grid = posemb[:, :num_tokens], posemb[0, num_tokens:] 14 | ntok_new -= num_tokens 15 | else: 16 | posemb_tok, posemb_grid = posemb[:, :0], posemb[0] 17 | gs_old = int(math.sqrt(len(posemb_grid))) 18 | gs_new = int(math.sqrt(ntok_new)) 19 | posemb_grid = posemb_grid.reshape(1, gs_old, gs_old, -1).permute(0, 3, 1, 2) 20 | posemb_grid = F.interpolate(posemb_grid, size=(gs_new, gs_new), mode='bilinear') 21 | posemb_grid = posemb_grid.permute(0, 2, 3, 1).reshape(1, gs_new * gs_new, -1) 22 | posemb = torch.cat([posemb_tok, posemb_grid], dim=1) 23 | return posemb 24 | 25 | 26 | def pe_check(model, state_dict, pe_key='classifier.positional_emb'): 27 | if pe_key is not None and pe_key in state_dict.keys() and pe_key in model.state_dict().keys(): 28 | if model.state_dict()[pe_key].shape != state_dict[pe_key].shape: 29 | state_dict[pe_key] = resize_pos_embed(state_dict[pe_key], 30 | model.state_dict()[pe_key], 31 | num_tokens=model.classifier.num_tokens) 32 | return state_dict 33 | -------------------------------------------------------------------------------- /benchmark/deep-visual-geo-localization-benchmark/model/cct/stochastic_depth.py: -------------------------------------------------------------------------------- 1 | # Thanks to rwightman's timm package 2 | # github.com:rwightman/pytorch-image-models 3 | 4 | import torch 5 | import torch.nn as nn 6 | 7 | 8 | def drop_path(x, drop_prob: float = 0., training: bool = False): 9 | """ 10 | Obtained from: github.com:rwightman/pytorch-image-models 11 | Drop paths (Stochastic Depth) per sample (when applied in main path of residual blocks). 12 | This is the same as the DropConnect impl I created for EfficientNet, etc networks, however, 13 | the original name is misleading as 'Drop Connect' is a different form of dropout in a separate paper... 14 | See discussion: https://github.com/tensorflow/tpu/issues/494#issuecomment-532968956 ... I've opted for 15 | changing the layer and argument names to 'drop path' rather than mix DropConnect as a layer name and use 16 | 'survival rate' as the argument. 17 | """ 18 | if drop_prob == 0. or not training: 19 | return x 20 | keep_prob = 1 - drop_prob 21 | shape = (x.shape[0],) + (1,) * (x.ndim - 1) # work with diff dim tensors, not just 2D ConvNets 22 | random_tensor = keep_prob + torch.rand(shape, dtype=x.dtype, device=x.device) 23 | random_tensor.floor_() # binarize 24 | output = x.div(keep_prob) * random_tensor 25 | return output 26 | 27 | 28 | class DropPath(nn.Module): 29 | """ 30 | Obtained from: github.com:rwightman/pytorch-image-models 31 | Drop paths (Stochastic Depth) per sample (when applied in main path of residual blocks). 
32 | """ 33 | 34 | def __init__(self, drop_prob=None): 35 | super(DropPath, self).__init__() 36 | self.drop_prob = drop_prob 37 | 38 | def forward(self, x): 39 | return drop_path(x, self.drop_prob, self.training) 40 | -------------------------------------------------------------------------------- /benchmark/deep-visual-geo-localization-benchmark/model/cct/tokenizer.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import torch.nn as nn 3 | import torch.nn.functional as F 4 | 5 | 6 | class Tokenizer(nn.Module): 7 | def __init__(self, 8 | kernel_size, stride, padding, 9 | pooling_kernel_size=3, pooling_stride=2, pooling_padding=1, 10 | n_conv_layers=1, 11 | n_input_channels=3, 12 | n_output_channels=64, 13 | in_planes=64, 14 | activation=None, 15 | max_pool=True, 16 | conv_bias=False): 17 | super(Tokenizer, self).__init__() 18 | 19 | n_filter_list = [n_input_channels] + \ 20 | [in_planes for _ in range(n_conv_layers - 1)] + \ 21 | [n_output_channels] 22 | 23 | self.conv_layers = nn.Sequential( 24 | *[nn.Sequential( 25 | nn.Conv2d(n_filter_list[i], n_filter_list[i + 1], 26 | kernel_size=(kernel_size, kernel_size), 27 | stride=(stride, stride), 28 | padding=(padding, padding), bias=conv_bias), 29 | nn.Identity() if activation is None else activation(), 30 | nn.MaxPool2d(kernel_size=pooling_kernel_size, 31 | stride=pooling_stride, 32 | padding=pooling_padding) if max_pool else nn.Identity() 33 | ) 34 | for i in range(n_conv_layers) 35 | ]) 36 | 37 | self.flattener = nn.Flatten(2, 3) 38 | self.apply(self.init_weight) 39 | 40 | def sequence_length(self, n_channels=3, height=224, width=224): 41 | return self.forward(torch.zeros((1, n_channels, height, width))).shape[1] 42 | 43 | def forward(self, x): 44 | return self.flattener(self.conv_layers(x)).transpose(-2, -1) 45 | 46 | @staticmethod 47 | def init_weight(m): 48 | if isinstance(m, nn.Conv2d): 49 | nn.init.kaiming_normal_(m.weight) 50 | 51 | 52 | class TextTokenizer(nn.Module): 53 | def __init__(self, 54 | kernel_size, stride, padding, 55 | pooling_kernel_size=3, pooling_stride=2, pooling_padding=1, 56 | embedding_dim=300, 57 | n_output_channels=128, 58 | activation=None, 59 | max_pool=True, 60 | *args, **kwargs): 61 | super(TextTokenizer, self).__init__() 62 | 63 | self.max_pool = max_pool 64 | self.conv_layers = nn.Sequential( 65 | nn.Conv2d(1, n_output_channels, 66 | kernel_size=(kernel_size, embedding_dim), 67 | stride=(stride, 1), 68 | padding=(padding, 0), bias=False), 69 | nn.Identity() if activation is None else activation(), 70 | nn.MaxPool2d( 71 | kernel_size=(pooling_kernel_size, 1), 72 | stride=(pooling_stride, 1), 73 | padding=(pooling_padding, 0) 74 | ) if max_pool else nn.Identity() 75 | ) 76 | 77 | self.apply(self.init_weight) 78 | 79 | def seq_len(self, seq_len=32, embed_dim=300): 80 | return self.forward(torch.zeros((1, seq_len, embed_dim)))[0].shape[1] 81 | 82 | def forward_mask(self, mask): 83 | new_mask = mask.unsqueeze(1).float() 84 | cnn_weight = torch.ones( 85 | (1, 1, self.conv_layers[0].kernel_size[0]), 86 | device=mask.device, 87 | dtype=torch.float) 88 | new_mask = F.conv1d( 89 | new_mask, cnn_weight, None, 90 | self.conv_layers[0].stride[0], self.conv_layers[0].padding[0], 1, 1) 91 | if self.max_pool: 92 | new_mask = F.max_pool1d( 93 | new_mask, self.conv_layers[2].kernel_size[0], 94 | self.conv_layers[2].stride[0], self.conv_layers[2].padding[0], 1, False, False) 95 | new_mask = new_mask.squeeze(1) 96 | new_mask = (new_mask > 0) 97 | return new_mask 98 | 99 | 
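    # forward() treats the (seq_len, embed_dim) embedding matrix as a 1-channel
    # image, convolves (and optionally max-pools) over it, then re-applies the
    # downsampled padding mask so padded positions stay zeroed.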
def forward(self, x, mask=None): 100 | x = x.unsqueeze(1) 101 | x = self.conv_layers(x) 102 | x = x.transpose(1, 3).squeeze(1) 103 | x = x if mask is None else x * self.forward_mask(mask).unsqueeze(-1).float() 104 | return x, mask 105 | 106 | @staticmethod 107 | def init_weight(m): 108 | if isinstance(m, nn.Conv2d): 109 | nn.init.kaiming_normal_(m.weight) 110 | -------------------------------------------------------------------------------- /benchmark/deep-visual-geo-localization-benchmark/model/commands.txt: -------------------------------------------------------------------------------- 1 | python train.py --dataset_name=nyu-vpr --datasets_folder=/scratch/ds5725/VPR-datasets-downloader/datasets --backbone=cct384 --trunc_te=8 --freeze_te 1 -------------------------------------------------------------------------------- /benchmark/deep-visual-geo-localization-benchmark/model/functional.py: -------------------------------------------------------------------------------- 1 | 2 | import math 3 | import torch 4 | import torch.nn.functional as F 5 | 6 | def sare_ind(query, positive, negative): 7 | '''all 3 inputs are supposed to be shape 1xn_features''' 8 | dist_pos = ((query - positive)**2).sum(1) 9 | dist_neg = ((query - negative)**2).sum(1) 10 | 11 | dist = - torch.cat((dist_pos, dist_neg)) 12 | dist = F.log_softmax(dist, 0) 13 | 14 | #loss = (- dist[:, 0]).mean() on a batch 15 | loss = -dist[0] 16 | return loss 17 | 18 | def sare_joint(query, positive, negatives): 19 | '''query and positive have to be 1xn_features; whereas negatives has to be 20 | shape n_negative x n_features. n_negative is usually 10''' 21 | # NOTE: the implementation is the same if batch_size=1 as all operations 22 | # are vectorial. If there were the additional n_batch dimension a different 23 | # handling of that situation would have to be implemented here. 24 | # This function is declared anyway for the sake of clarity as the 2 should 25 | # be called in different situations because, even though there would be 26 | # no Exceptions, there would actually be a conceptual error. 
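    # Both variants reduce to the negative log-softmax of the negated squared
    # distances, i.e. loss = -log( exp(-d(q,p)) / (exp(-d(q,p)) + sum_i exp(-d(q,n_i))) ),
    # which is the SARE (stochastic attraction-repulsion embedding) objective.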
27 | return sare_ind(query, positive, negatives) 28 | 29 | def mac(x): 30 | return F.adaptive_max_pool2d(x, (1,1)) 31 | 32 | def spoc(x): 33 | return F.adaptive_avg_pool2d(x, (1,1)) 34 | 35 | def gem(x, p=3, eps=1e-6, work_with_tokens=False): 36 | if work_with_tokens: 37 | x = x.permute(0, 2, 1) 38 | # unseqeeze to maintain compatibility with Flatten 39 | return F.avg_pool1d(x.clamp(min=eps).pow(p), (x.size(-1))).pow(1./p).unsqueeze(3) 40 | else: 41 | return F.avg_pool2d(x.clamp(min=eps).pow(p), (x.size(-2), x.size(-1))).pow(1./p) 42 | 43 | def rmac(x, L=3, eps=1e-6): 44 | ovr = 0.4 # desired overlap of neighboring regions 45 | steps = torch.Tensor([2, 3, 4, 5, 6, 7]) # possible regions for the long dimension 46 | W = x.size(3) 47 | H = x.size(2) 48 | w = min(W, H) 49 | # w2 = math.floor(w/2.0 - 1) 50 | b = (max(H, W)-w)/(steps-1) 51 | (tmp, idx) = torch.min(torch.abs(((w**2 - w*b)/w**2)-ovr), 0) # steps(idx) regions for long dimension 52 | # region overplus per dimension 53 | Wd = 0; 54 | Hd = 0; 55 | if H < W: 56 | Wd = idx.item() + 1 57 | elif H > W: 58 | Hd = idx.item() + 1 59 | v = F.max_pool2d(x, (x.size(-2), x.size(-1))) 60 | v = v / (torch.norm(v, p=2, dim=1, keepdim=True) + eps).expand_as(v) 61 | for l in range(1, L+1): 62 | wl = math.floor(2*w/(l+1)) 63 | wl2 = math.floor(wl/2 - 1) 64 | if l+Wd == 1: 65 | b = 0 66 | else: 67 | b = (W-wl)/(l+Wd-1) 68 | cenW = torch.floor(wl2 + torch.Tensor(range(l-1+Wd+1))*b) - wl2 # center coordinates 69 | if l+Hd == 1: 70 | b = 0 71 | else: 72 | b = (H-wl)/(l+Hd-1) 73 | cenH = torch.floor(wl2 + torch.Tensor(range(l-1+Hd+1))*b) - wl2 # center coordinates 74 | for i_ in cenH.tolist(): 75 | for j_ in cenW.tolist(): 76 | if wl == 0: 77 | continue 78 | R = x[:,:,(int(i_)+torch.Tensor(range(wl)).long()).tolist(),:] 79 | R = R[:,:,:,(int(j_)+torch.Tensor(range(wl)).long()).tolist()] 80 | vt = F.max_pool2d(R, (R.size(-2), R.size(-1))) 81 | vt = vt / (torch.norm(vt, p=2, dim=1, keepdim=True) + eps).expand_as(vt) 82 | v += vt 83 | return v 84 | 85 | -------------------------------------------------------------------------------- /benchmark/deep-visual-geo-localization-benchmark/model/network.py: -------------------------------------------------------------------------------- 1 | 2 | import os 3 | import torch 4 | import logging 5 | import torchvision 6 | from torch import nn 7 | from os.path import join 8 | from transformers import ViTModel 9 | from google_drive_downloader import GoogleDriveDownloader as gdd 10 | 11 | from model.cct import cct_14_7x2_384 12 | from model.aggregation import Flatten 13 | from model.normalization import L2Norm 14 | import model.aggregation as aggregation 15 | 16 | # Pretrained models on Google Landmarks v2 and Places 365 17 | PRETRAINED_MODELS = { 18 | 'resnet18_places' : '1DnEQXhmPxtBUrRc81nAvT8z17bk-GBj5', 19 | 'resnet50_places' : '1zsY4mN4jJ-AsmV3h4hjbT72CBfJsgSGC', 20 | 'resnet101_places' : '1E1ibXQcg7qkmmmyYgmwMTh7Xf1cDNQXa', 21 | 'vgg16_places' : '1UWl1uz6rZ6Nqmp1K5z3GHAIZJmDh4bDu', 22 | 'resnet18_gldv2' : '1wkUeUXFXuPHuEvGTXVpuP5BMB-JJ1xke', 23 | 'resnet50_gldv2' : '1UDUv6mszlXNC1lv6McLdeBNMq9-kaA70', 24 | 'resnet101_gldv2' : '1apiRxMJpDlV0XmKlC5Na_Drg2jtGL-uE', 25 | 'vgg16_gldv2' : '10Ov9JdO7gbyz6mB5x0v_VSAUMj91Ta4o' 26 | } 27 | 28 | 29 | class GeoLocalizationNet(nn.Module): 30 | """The used networks are composed of a backbone and an aggregation layer. 
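    An illustrative construction sketch (the args namespace is normally produced
    by parser.py, not reproduced here; only the fields this module actually reads
    for a CCT-384 + NetVLAD configuration are shown, and the values are examples):

        from argparse import Namespace
        args = Namespace(backbone="cct384", aggregation="netvlad", netvlad_clusters=64,
                         trunc_te=8, freeze_te=1, fc_output_dim=None)
        model = GeoLocalizationNet(args)  # get_backbone() sets args.features_dim = 384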
31 | """ 32 | def __init__(self, args): 33 | super().__init__() 34 | self.backbone = get_backbone(args) 35 | self.arch_name = args.backbone 36 | self.aggregation = get_aggregation(args) 37 | 38 | if args.aggregation in ["gem", "spoc", "mac", "rmac"]: 39 | if args.l2 == "before_pool": 40 | self.aggregation = nn.Sequential(L2Norm(), self.aggregation, Flatten()) 41 | elif args.l2 == "after_pool": 42 | self.aggregation = nn.Sequential(self.aggregation, L2Norm(), Flatten()) 43 | elif args.l2 == "none": 44 | self.aggregation = nn.Sequential(self.aggregation, Flatten()) 45 | 46 | if args.fc_output_dim != None: 47 | # Concatenate fully connected layer to the aggregation layer 48 | self.aggregation = nn.Sequential(self.aggregation, 49 | nn.Linear(args.features_dim, args.fc_output_dim), 50 | L2Norm()) 51 | args.features_dim = args.fc_output_dim 52 | 53 | def forward(self, x): 54 | x = self.backbone(x) 55 | x = self.aggregation(x) 56 | return x 57 | 58 | 59 | def get_aggregation(args): 60 | if args.aggregation == "gem": 61 | return aggregation.GeM(work_with_tokens=args.work_with_tokens) 62 | elif args.aggregation == "spoc": 63 | return aggregation.SPoC() 64 | elif args.aggregation == "mac": 65 | return aggregation.MAC() 66 | elif args.aggregation == "rmac": 67 | return aggregation.RMAC() 68 | elif args.aggregation == "netvlad": 69 | return aggregation.NetVLAD(clusters_num=args.netvlad_clusters, dim=args.features_dim, 70 | work_with_tokens=args.work_with_tokens) 71 | elif args.aggregation == 'crn': 72 | return aggregation.CRN(clusters_num=args.netvlad_clusters, dim=args.features_dim) 73 | elif args.aggregation == "rrm": 74 | return aggregation.RRM(args.features_dim) 75 | elif args.aggregation in ['cls', 'seqpool']: 76 | return nn.Identity() 77 | 78 | 79 | def get_pretrained_model(args): 80 | if args.pretrain == 'places': num_classes = 365 81 | elif args.pretrain == 'gldv2': num_classes = 512 82 | 83 | if args.backbone.startswith("resnet18"): 84 | model = torchvision.models.resnet18(num_classes=num_classes) 85 | elif args.backbone.startswith("resnet50"): 86 | model = torchvision.models.resnet50(num_classes=num_classes) 87 | elif args.backbone.startswith("resnet101"): 88 | model = torchvision.models.resnet101(num_classes=num_classes) 89 | elif args.backbone.startswith("vgg16"): 90 | model = torchvision.models.vgg16(num_classes=num_classes) 91 | 92 | if args.backbone.startswith('resnet'): 93 | model_name = args.backbone.split('conv')[0] + "_" + args.pretrain 94 | else: 95 | model_name = args.backbone + "_" + args.pretrain 96 | file_path = join("data", "pretrained_nets", model_name +".pth") 97 | 98 | if not os.path.exists(file_path): 99 | gdd.download_file_from_google_drive(file_id=PRETRAINED_MODELS[model_name], 100 | dest_path=file_path) 101 | state_dict = torch.load(file_path, map_location=torch.device('cpu')) 102 | model.load_state_dict(state_dict) 103 | return model 104 | 105 | 106 | def get_backbone(args): 107 | # The aggregation layer works differently based on the type of architecture 108 | args.work_with_tokens = args.backbone.startswith('cct') or args.backbone.startswith('vit') 109 | if args.backbone.startswith("resnet"): 110 | if args.pretrain in ['places', 'gldv2']: 111 | backbone = get_pretrained_model(args) 112 | elif args.backbone.startswith("resnet18"): 113 | backbone = torchvision.models.resnet18(pretrained=True) 114 | elif args.backbone.startswith("resnet50"): 115 | backbone = torchvision.models.resnet50(pretrained=True) 116 | elif args.backbone.startswith("resnet101"): 117 | backbone = 
torchvision.models.resnet101(pretrained=True) 118 | for name, child in backbone.named_children(): 119 | # Freeze layers before conv_3 120 | if name == "layer3": 121 | break 122 | for params in child.parameters(): 123 | params.requires_grad = False 124 | if args.backbone.endswith("conv4"): 125 | logging.debug(f"Train only conv4_x of the resnet{args.backbone.split('conv')[0]} (remove conv5_x), freeze the previous ones") 126 | layers = list(backbone.children())[:-3] 127 | elif args.backbone.endswith("conv5"): 128 | logging.debug(f"Train only conv4_x and conv5_x of the resnet{args.backbone.split('conv')[0]}, freeze the previous ones") 129 | layers = list(backbone.children())[:-2] 130 | elif args.backbone == "vgg16": 131 | if args.pretrain in ['places', 'gldv2']: 132 | backbone = get_pretrained_model(args) 133 | else: 134 | backbone = torchvision.models.vgg16(pretrained=True) 135 | layers = list(backbone.features.children())[:-2] 136 | for l in layers[:-5]: 137 | for p in l.parameters(): p.requires_grad = False 138 | logging.debug("Train last layers of the vgg16, freeze the previous ones") 139 | elif args.backbone == "alexnet": 140 | backbone = torchvision.models.alexnet(pretrained=True) 141 | layers = list(backbone.features.children())[:-2] 142 | for l in layers[:5]: 143 | for p in l.parameters(): p.requires_grad = False 144 | logging.debug("Train last layers of the alexnet, freeze the previous ones") 145 | elif args.backbone.startswith("cct"): 146 | if args.backbone.startswith("cct384"): 147 | backbone = cct_14_7x2_384(pretrained=True, progress=True, aggregation=args.aggregation) 148 | if args.trunc_te: 149 | logging.debug(f"Truncate CCT at transformers encoder {args.trunc_te}") 150 | backbone.classifier.blocks = torch.nn.ModuleList(backbone.classifier.blocks[:args.trunc_te].children()) 151 | if args.freeze_te: 152 | logging.debug(f"Freeze all the layers up to tranformer encoder {args.freeze_te}") 153 | for p in backbone.parameters(): 154 | p.requires_grad = False 155 | for name, child in backbone.classifier.blocks.named_children(): 156 | if int(name) > args.freeze_te: 157 | for params in child.parameters(): 158 | params.requires_grad = True 159 | args.features_dim = 384 160 | return backbone 161 | elif args.backbone.startswith("vit"): 162 | assert args.resize[0] in [224, 384], f'Image size for ViT must be either 224 or 384, but it\'s {args.resize[0]}' 163 | if args.resize[0] == 224: 164 | backbone = ViTModel.from_pretrained('google/vit-base-patch16-224-in21k') 165 | elif args.resize[0] == 384: 166 | backbone = ViTModel.from_pretrained('google/vit-base-patch16-384') 167 | 168 | if args.trunc_te: 169 | logging.debug(f"Truncate ViT at transformers encoder {args.trunc_te}") 170 | backbone.encoder.layer = backbone.encoder.layer[:args.trunc_te] 171 | if args.freeze_te: 172 | logging.debug(f"Freeze all the layers up to tranformer encoder {args.freeze_te+1}") 173 | for p in backbone.parameters(): 174 | p.requires_grad = False 175 | for name, child in backbone.encoder.layer.named_children(): 176 | if int(name) > args.freeze_te: 177 | for params in child.parameters(): 178 | params.requires_grad = True 179 | backbone = VitWrapper(backbone, args.aggregation) 180 | 181 | args.features_dim = 768 182 | return backbone 183 | 184 | backbone = torch.nn.Sequential(*layers) 185 | args.features_dim = get_output_channels_dim(backbone) # Dinamically obtain number of channels in output 186 | return backbone 187 | 188 | 189 | class VitWrapper(nn.Module): 190 | def __init__(self, vit_model, aggregation): 191 | 
super().__init__() 192 | self.vit_model = vit_model 193 | self.aggregation = aggregation 194 | def forward(self, x): 195 | if self.aggregation in ["netvlad", "gem"]: 196 | return self.vit_model(x).last_hidden_state[:, 1:, :] 197 | else: 198 | return self.vit_model(x).last_hidden_state[:, 0, :] 199 | 200 | 201 | def get_output_channels_dim(model): 202 | """Return the number of channels in the output of a model.""" 203 | return model(torch.ones([1, 3, 224, 224])).shape[1] 204 | 205 | -------------------------------------------------------------------------------- /benchmark/deep-visual-geo-localization-benchmark/model/normalization.py: -------------------------------------------------------------------------------- 1 | 2 | import torch.nn as nn 3 | import torch.nn.functional as F 4 | 5 | class L2Norm(nn.Module): 6 | def __init__(self, dim=1): 7 | super().__init__() 8 | self.dim = dim 9 | def forward(self, x): 10 | return F.normalize(x, p=2, dim=self.dim) 11 | 12 | -------------------------------------------------------------------------------- /benchmark/deep-visual-geo-localization-benchmark/model/sync_batchnorm/__init__.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | # File : __init__.py 3 | # Author : Jiayuan Mao 4 | # Email : maojiayuan@gmail.com 5 | # Date : 27/01/2018 6 | # 7 | # This file is part of Synchronized-BatchNorm-PyTorch. 8 | # https://github.com/vacancy/Synchronized-BatchNorm-PyTorch 9 | # Distributed under MIT License. 10 | 11 | from .batchnorm import set_sbn_eps_mode 12 | from .batchnorm import SynchronizedBatchNorm1d, SynchronizedBatchNorm2d, SynchronizedBatchNorm3d 13 | from .batchnorm import patch_sync_batchnorm, convert_model 14 | from .replicate import DataParallelWithCallback, patch_replication_callback 15 | -------------------------------------------------------------------------------- /benchmark/deep-visual-geo-localization-benchmark/model/sync_batchnorm/batchnorm_reimpl.py: -------------------------------------------------------------------------------- 1 | #! /usr/bin/env python3 2 | # -*- coding: utf-8 -*- 3 | # File : batchnorm_reimpl.py 4 | # Author : acgtyrant 5 | # Date : 11/01/2018 6 | # 7 | # This file is part of Synchronized-BatchNorm-PyTorch. 8 | # https://github.com/vacancy/Synchronized-BatchNorm-PyTorch 9 | # Distributed under MIT License. 10 | 11 | import torch 12 | import torch.nn as nn 13 | import torch.nn.init as init 14 | 15 | __all__ = ['BatchNorm2dReimpl'] 16 | 17 | 18 | class BatchNorm2dReimpl(nn.Module): 19 | """ 20 | A re-implementation of batch normalization, used for testing the numerical 21 | stability. 
22 | 23 | Author: acgtyrant 24 | See also: 25 | https://github.com/vacancy/Synchronized-BatchNorm-PyTorch/issues/14 26 | """ 27 | def __init__(self, num_features, eps=1e-5, momentum=0.1): 28 | super().__init__() 29 | 30 | self.num_features = num_features 31 | self.eps = eps 32 | self.momentum = momentum 33 | self.weight = nn.Parameter(torch.empty(num_features)) 34 | self.bias = nn.Parameter(torch.empty(num_features)) 35 | self.register_buffer('running_mean', torch.zeros(num_features)) 36 | self.register_buffer('running_var', torch.ones(num_features)) 37 | self.reset_parameters() 38 | 39 | def reset_running_stats(self): 40 | self.running_mean.zero_() 41 | self.running_var.fill_(1) 42 | 43 | def reset_parameters(self): 44 | self.reset_running_stats() 45 | init.uniform_(self.weight) 46 | init.zeros_(self.bias) 47 | 48 | def forward(self, input_): 49 | batchsize, channels, height, width = input_.size() 50 | numel = batchsize * height * width 51 | input_ = input_.permute(1, 0, 2, 3).contiguous().view(channels, numel) 52 | sum_ = input_.sum(1) 53 | sum_of_square = input_.pow(2).sum(1) 54 | mean = sum_ / numel 55 | sumvar = sum_of_square - sum_ * mean 56 | 57 | self.running_mean = ( 58 | (1 - self.momentum) * self.running_mean 59 | + self.momentum * mean.detach() 60 | ) 61 | unbias_var = sumvar / (numel - 1) 62 | self.running_var = ( 63 | (1 - self.momentum) * self.running_var 64 | + self.momentum * unbias_var.detach() 65 | ) 66 | 67 | bias_var = sumvar / numel 68 | inv_std = 1 / (bias_var + self.eps).pow(0.5) 69 | output = ( 70 | (input_ - mean.unsqueeze(1)) * inv_std.unsqueeze(1) * 71 | self.weight.unsqueeze(1) + self.bias.unsqueeze(1)) 72 | 73 | return output.view(channels, batchsize, height, width).permute(1, 0, 2, 3).contiguous() 74 | 75 | -------------------------------------------------------------------------------- /benchmark/deep-visual-geo-localization-benchmark/model/sync_batchnorm/comm.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | # File : comm.py 3 | # Author : Jiayuan Mao 4 | # Email : maojiayuan@gmail.com 5 | # Date : 27/01/2018 6 | # 7 | # This file is part of Synchronized-BatchNorm-PyTorch. 8 | # https://github.com/vacancy/Synchronized-BatchNorm-PyTorch 9 | # Distributed under MIT License. 10 | 11 | import queue 12 | import collections 13 | import threading 14 | 15 | __all__ = ['FutureResult', 'SlavePipe', 'SyncMaster'] 16 | 17 | 18 | class FutureResult(object): 19 | """A thread-safe future implementation. Used only as one-to-one pipe.""" 20 | 21 | def __init__(self): 22 | self._result = None 23 | self._lock = threading.Lock() 24 | self._cond = threading.Condition(self._lock) 25 | 26 | def put(self, result): 27 | with self._lock: 28 | assert self._result is None, 'Previous result has\'t been fetched.' 
29 | self._result = result 30 | self._cond.notify() 31 | 32 | def get(self): 33 | with self._lock: 34 | if self._result is None: 35 | self._cond.wait() 36 | 37 | res = self._result 38 | self._result = None 39 | return res 40 | 41 | 42 | _MasterRegistry = collections.namedtuple('MasterRegistry', ['result']) 43 | _SlavePipeBase = collections.namedtuple('_SlavePipeBase', ['identifier', 'queue', 'result']) 44 | 45 | 46 | class SlavePipe(_SlavePipeBase): 47 | """Pipe for master-slave communication.""" 48 | 49 | def run_slave(self, msg): 50 | self.queue.put((self.identifier, msg)) 51 | ret = self.result.get() 52 | self.queue.put(True) 53 | return ret 54 | 55 | 56 | class SyncMaster(object): 57 | """An abstract `SyncMaster` object. 58 | 59 | - During the replication, as the data parallel will trigger an callback of each module, all slave devices should 60 | call `register(id)` and obtain an `SlavePipe` to communicate with the master. 61 | - During the forward pass, master device invokes `run_master`, all messages from slave devices will be collected, 62 | and passed to a registered callback. 63 | - After receiving the messages, the master device should gather the information and determine to message passed 64 | back to each slave devices. 65 | """ 66 | 67 | def __init__(self, master_callback): 68 | """ 69 | 70 | Args: 71 | master_callback: a callback to be invoked after having collected messages from slave devices. 72 | """ 73 | self._master_callback = master_callback 74 | self._queue = queue.Queue() 75 | self._registry = collections.OrderedDict() 76 | self._activated = False 77 | 78 | def __getstate__(self): 79 | return {'master_callback': self._master_callback} 80 | 81 | def __setstate__(self, state): 82 | self.__init__(state['master_callback']) 83 | 84 | def register_slave(self, identifier): 85 | """ 86 | Register an slave device. 87 | 88 | Args: 89 | identifier: an identifier, usually is the device id. 90 | 91 | Returns: a `SlavePipe` object which can be used to communicate with the master device. 92 | 93 | """ 94 | if self._activated: 95 | assert self._queue.empty(), 'Queue is not clean before next initialization.' 96 | self._activated = False 97 | self._registry.clear() 98 | future = FutureResult() 99 | self._registry[identifier] = _MasterRegistry(future) 100 | return SlavePipe(identifier, self._queue, future) 101 | 102 | def run_master(self, master_msg): 103 | """ 104 | Main entry for the master device in each forward pass. 105 | The messages were first collected from each devices (including the master device), and then 106 | an callback will be invoked to compute the message to be sent back to each devices 107 | (including the master device). 108 | 109 | Args: 110 | master_msg: the message that the master want to send to itself. This will be placed as the first 111 | message when calling `master_callback`. For detailed usage, see `_SynchronizedBatchNorm` for an example. 112 | 113 | Returns: the message to be sent back to the master device. 114 | 115 | """ 116 | self._activated = True 117 | 118 | intermediates = [(0, master_msg)] 119 | for i in range(self.nr_slaves): 120 | intermediates.append(self._queue.get()) 121 | 122 | results = self._master_callback(intermediates) 123 | assert results[0][0] == 0, 'The first result should belongs to the master.' 
124 | 125 | for i, res in results: 126 | if i == 0: 127 | continue 128 | self._registry[i].result.put(res) 129 | 130 | for i in range(self.nr_slaves): 131 | assert self._queue.get() is True 132 | 133 | return results[0][1] 134 | 135 | @property 136 | def nr_slaves(self): 137 | return len(self._registry) 138 | -------------------------------------------------------------------------------- /benchmark/deep-visual-geo-localization-benchmark/model/sync_batchnorm/replicate.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | # File : replicate.py 3 | # Author : Jiayuan Mao 4 | # Email : maojiayuan@gmail.com 5 | # Date : 27/01/2018 6 | # 7 | # This file is part of Synchronized-BatchNorm-PyTorch. 8 | # https://github.com/vacancy/Synchronized-BatchNorm-PyTorch 9 | # Distributed under MIT License. 10 | 11 | import functools 12 | 13 | from torch.nn.parallel.data_parallel import DataParallel 14 | 15 | __all__ = [ 16 | 'CallbackContext', 17 | 'execute_replication_callbacks', 18 | 'DataParallelWithCallback', 19 | 'patch_replication_callback' 20 | ] 21 | 22 | 23 | class CallbackContext(object): 24 | pass 25 | 26 | 27 | def execute_replication_callbacks(modules): 28 | """ 29 | Execute an replication callback `__data_parallel_replicate__` on each module created by original replication. 30 | 31 | The callback will be invoked with arguments `__data_parallel_replicate__(ctx, copy_id)` 32 | 33 | Note that, as all modules are isomorphism, we assign each sub-module with a context 34 | (shared among multiple copies of this module on different devices). 35 | Through this context, different copies can share some information. 36 | 37 | We guarantee that the callback on the master copy (the first copy) will be called ahead of calling the callback 38 | of any slave copies. 39 | """ 40 | master_copy = modules[0] 41 | nr_modules = len(list(master_copy.modules())) 42 | ctxs = [CallbackContext() for _ in range(nr_modules)] 43 | 44 | for i, module in enumerate(modules): 45 | for j, m in enumerate(module.modules()): 46 | if hasattr(m, '__data_parallel_replicate__'): 47 | m.__data_parallel_replicate__(ctxs[j], i) 48 | 49 | 50 | class DataParallelWithCallback(DataParallel): 51 | """ 52 | Data Parallel with a replication callback. 53 | 54 | An replication callback `__data_parallel_replicate__` of each module will be invoked after being created by 55 | original `replicate` function. 56 | The callback will be invoked with arguments `__data_parallel_replicate__(ctx, copy_id)` 57 | 58 | Examples: 59 | > sync_bn = SynchronizedBatchNorm1d(10, eps=1e-5, affine=False) 60 | > sync_bn = DataParallelWithCallback(sync_bn, device_ids=[0, 1]) 61 | # sync_bn.__data_parallel_replicate__ will be invoked. 62 | """ 63 | 64 | def replicate(self, module, device_ids): 65 | modules = super(DataParallelWithCallback, self).replicate(module, device_ids) 66 | execute_replication_callbacks(modules) 67 | return modules 68 | 69 | 70 | def patch_replication_callback(data_parallel): 71 | """ 72 | Monkey-patch an existing `DataParallel` object. Add the replication callback. 73 | Useful when you have customized `DataParallel` implementation. 
74 | 75 | Examples: 76 | > sync_bn = SynchronizedBatchNorm1d(10, eps=1e-5, affine=False) 77 | > sync_bn = DataParallel(sync_bn, device_ids=[0, 1]) 78 | > patch_replication_callback(sync_bn) 79 | # this is equivalent to 80 | > sync_bn = SynchronizedBatchNorm1d(10, eps=1e-5, affine=False) 81 | > sync_bn = DataParallelWithCallback(sync_bn, device_ids=[0, 1]) 82 | """ 83 | 84 | assert isinstance(data_parallel, DataParallel) 85 | 86 | old_replicate = data_parallel.replicate 87 | 88 | @functools.wraps(old_replicate) 89 | def new_replicate(module, device_ids): 90 | modules = old_replicate(module, device_ids) 91 | execute_replication_callbacks(modules) 92 | return modules 93 | 94 | data_parallel.replicate = new_replicate 95 | -------------------------------------------------------------------------------- /benchmark/deep-visual-geo-localization-benchmark/model/sync_batchnorm/unittest.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | # File : unittest.py 3 | # Author : Jiayuan Mao 4 | # Email : maojiayuan@gmail.com 5 | # Date : 27/01/2018 6 | # 7 | # This file is part of Synchronized-BatchNorm-PyTorch. 8 | # https://github.com/vacancy/Synchronized-BatchNorm-PyTorch 9 | # Distributed under MIT License. 10 | 11 | import unittest 12 | import torch 13 | 14 | 15 | class TorchTestCase(unittest.TestCase): 16 | def assertTensorClose(self, x, y): 17 | adiff = float((x - y).abs().max()) 18 | if (y == 0).all(): 19 | rdiff = 'NaN' 20 | else: 21 | rdiff = float((adiff / y).abs().max()) 22 | 23 | message = ( 24 | 'Tensor close check failed\n' 25 | 'adiff={}\n' 26 | 'rdiff={}\n' 27 | ).format(adiff, rdiff) 28 | self.assertTrue(torch.allclose(x, y, atol=1e-5, rtol=1e-3), message) 29 | 30 | -------------------------------------------------------------------------------- /benchmark/deep-visual-geo-localization-benchmark/models/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ai4ce/NYC-Indoor-VPR/36510997e724eb07caf9577128dc666b335ed7e5/benchmark/deep-visual-geo-localization-benchmark/models/__init__.py -------------------------------------------------------------------------------- /benchmark/deep-visual-geo-localization-benchmark/models/aggregators/__init__.py: -------------------------------------------------------------------------------- 1 | from .cosplace import CosPlace 2 | from .convap import ConvAP 3 | from .gem import GeMPool 4 | from .mixvpr import MixVPR -------------------------------------------------------------------------------- /benchmark/deep-visual-geo-localization-benchmark/models/aggregators/convap.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import torch.nn.functional as F 3 | import torch.nn as nn 4 | 5 | 6 | class ConvAP(nn.Module): 7 | """Implementation of ConvAP as of https://arxiv.org/pdf/2210.10239.pdf 8 | 9 | Args: 10 | in_channels (int): number of channels in the input of ConvAP 11 | out_channels (int, optional): number of channels that ConvAP outputs. Defaults to 512. 12 | s1 (int, optional): spatial height of the adaptive average pooling. Defaults to 2. 13 | s2 (int, optional): spatial width of the adaptive average pooling. Defaults to 2. 
14 | """ 15 | def __init__(self, in_channels, out_channels=512, s1=2, s2=2): 16 | super(ConvAP, self).__init__() 17 | self.channel_pool = nn.Conv2d(in_channels=in_channels, out_channels=out_channels, kernel_size=1, bias=True) 18 | self.AAP = nn.AdaptiveAvgPool2d((s1, s2)) 19 | 20 | def forward(self, x): 21 | x = self.channel_pool(x) 22 | x = self.AAP(x) 23 | x = F.normalize(x.flatten(1), p=2, dim=1) 24 | return x 25 | 26 | 27 | if __name__ == '__main__': 28 | x = torch.randn(4, 2048, 10, 10) 29 | m = ConvAP(2048, 512) 30 | r = m(x) 31 | print(r.shape) -------------------------------------------------------------------------------- /benchmark/deep-visual-geo-localization-benchmark/models/aggregators/cosplace.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import torch.nn.functional as F 3 | import torch.nn as nn 4 | 5 | class GeM(nn.Module): 6 | """Implementation of GeM as in https://github.com/filipradenovic/cnnimageretrieval-pytorch 7 | """ 8 | def __init__(self, p=3, eps=1e-6): 9 | super().__init__() 10 | self.p = nn.Parameter(torch.ones(1)*p) 11 | self.eps = eps 12 | 13 | def forward(self, x): 14 | return F.avg_pool2d(x.clamp(min=self.eps).pow(self.p), (x.size(-2), x.size(-1))).pow(1./self.p) 15 | 16 | class CosPlace(nn.Module): 17 | """ 18 | CosPlace aggregation layer as implemented in https://github.com/gmberton/CosPlace/blob/main/model/network.py 19 | 20 | Args: 21 | in_dim: number of channels of the input 22 | out_dim: dimension of the output descriptor 23 | """ 24 | def __init__(self, in_dim, out_dim): 25 | super().__init__() 26 | self.gem = GeM() 27 | self.fc = nn.Linear(in_dim, out_dim) 28 | 29 | def forward(self, x): 30 | x = F.normalize(x, p=2, dim=1) 31 | x = self.gem(x) 32 | x = x.flatten(1) 33 | x = self.fc(x) 34 | x = F.normalize(x, p=2, dim=1) 35 | return x 36 | 37 | if __name__ == '__main__': 38 | x = torch.randn(4, 2048, 10, 10) 39 | m = CosPlace(2048, 512) 40 | r = m(x) 41 | print(r.shape) -------------------------------------------------------------------------------- /benchmark/deep-visual-geo-localization-benchmark/models/aggregators/gem.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import torch.nn.functional as F 3 | import torch.nn as nn 4 | 5 | class GeMPool(nn.Module): 6 | """Implementation of GeM as in https://github.com/filipradenovic/cnnimageretrieval-pytorch 7 | we add flatten and norm so that we can use it as one aggregation layer. 
8 | """ 9 | def __init__(self, p=3, eps=1e-6): 10 | super().__init__() 11 | self.p = nn.Parameter(torch.ones(1)*p) 12 | self.eps = eps 13 | 14 | def forward(self, x): 15 | x = F.avg_pool2d(x.clamp(min=self.eps).pow(self.p), (x.size(-2), x.size(-1))).pow(1./self.p) 16 | x = x.flatten(1) 17 | return F.normalize(x, p=2, dim=1) -------------------------------------------------------------------------------- /benchmark/deep-visual-geo-localization-benchmark/models/aggregators/mixvpr.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import torch.nn.functional as F 3 | import torch.nn as nn 4 | 5 | import numpy as np 6 | 7 | 8 | class FeatureMixerLayer(nn.Module): 9 | def __init__(self, in_dim, mlp_ratio=1): 10 | super().__init__() 11 | self.mix = nn.Sequential( 12 | nn.LayerNorm(in_dim), 13 | nn.Linear(in_dim, int(in_dim * mlp_ratio)), 14 | nn.ReLU(), 15 | nn.Linear(int(in_dim * mlp_ratio), in_dim), 16 | ) 17 | 18 | for m in self.modules(): 19 | if isinstance(m, (nn.Linear)): 20 | nn.init.trunc_normal_(m.weight, std=0.02) 21 | if m.bias is not None: 22 | nn.init.zeros_(m.bias) 23 | 24 | def forward(self, x): 25 | return x + self.mix(x) 26 | 27 | 28 | class MixVPR(nn.Module): 29 | def __init__(self, 30 | in_channels=1024, 31 | in_h=20, 32 | in_w=20, 33 | out_channels=512, 34 | mix_depth=1, 35 | mlp_ratio=1, 36 | out_rows=4, 37 | ) -> None: 38 | super().__init__() 39 | 40 | self.in_h = in_h # height of input feature maps 41 | self.in_w = in_w # width of input feature maps 42 | self.in_channels = in_channels # depth of input feature maps 43 | 44 | self.out_channels = out_channels # depth wise projection dimension 45 | self.out_rows = out_rows # row wise projection dimesion 46 | 47 | self.mix_depth = mix_depth # L the number of stacked FeatureMixers 48 | self.mlp_ratio = mlp_ratio # ratio of the mid projection layer in the mixer block 49 | 50 | hw = in_h*in_w 51 | self.mix = nn.Sequential(*[ 52 | FeatureMixerLayer(in_dim=hw, mlp_ratio=mlp_ratio) 53 | for _ in range(self.mix_depth) 54 | ]) 55 | self.channel_proj = nn.Linear(in_channels, out_channels) 56 | self.row_proj = nn.Linear(hw, out_rows) 57 | 58 | def forward(self, x): 59 | x = x.flatten(2) 60 | x = self.mix(x) 61 | x = x.permute(0, 2, 1) 62 | x = self.channel_proj(x) 63 | x = x.permute(0, 2, 1) 64 | x = self.row_proj(x) 65 | x = F.normalize(x.flatten(1), p=2, dim=-1) 66 | return x 67 | 68 | 69 | # ------------------------------------------------------------------------------- 70 | 71 | def print_nb_params(m): 72 | model_parameters = filter(lambda p: p.requires_grad, m.parameters()) 73 | params = sum([np.prod(p.size()) for p in model_parameters]) 74 | print(f'Trainable parameters: {params/1e6:.3}M') 75 | 76 | 77 | def main(): 78 | x = torch.randn(1, 1024, 20, 20) 79 | agg = MixVPR( 80 | in_channels=1024, 81 | in_h=20, 82 | in_w=20, 83 | out_channels=1024, 84 | mix_depth=4, 85 | mlp_ratio=1, 86 | out_rows=4) 87 | 88 | print_nb_params(agg) 89 | output = agg(x) 90 | print(output.shape) 91 | 92 | 93 | if __name__ == '__main__': 94 | main() 95 | -------------------------------------------------------------------------------- /benchmark/deep-visual-geo-localization-benchmark/models/backbones/__init__.py: -------------------------------------------------------------------------------- 1 | from .efficientnet import EfficientNet 2 | from .resnet import ResNet 3 | from .swin import Swin -------------------------------------------------------------------------------- 
/benchmark/deep-visual-geo-localization-benchmark/models/backbones/efficientnet.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import torch.nn as nn 3 | import timm 4 | import numpy as np 5 | 6 | class EfficientNet(nn.Module): 7 | def __init__(self, 8 | model_name='efficientnet_b0', 9 | pretrained=True, 10 | layers_to_freeze=4, 11 | ): 12 | """Class representing the EfficientNet backbone used in the pipeline 13 | EfficientNet contains 7 efficient blocks (0 to 6), 14 | we don't take into account the global pooling and the last fc 15 | 16 | Args: 17 | model_name (str, optional): The architecture of the efficietnet backbone to instanciate. Defaults to 'efficientnet_b0'. 18 | pretrained (bool, optional): Whether pretrained or not. Defaults to True. 19 | layers_to_freeze (int, optional): The number of blocks to freeze (starting from 0) . Defaults to 4. 20 | """ 21 | super().__init__() 22 | self.model_name = model_name 23 | self.layers_to_freeze = layers_to_freeze 24 | self.model = timm.create_model(model_name=model_name, pretrained=pretrained) 25 | 26 | # freeze only if the model is pretrained 27 | if pretrained: 28 | if layers_to_freeze >= 0: 29 | self.model.conv_stem.requires_grad_(False) 30 | self.model.blocks[0].requires_grad_(False) 31 | self.model.blocks[1].requires_grad_(False) 32 | if layers_to_freeze >= 1: 33 | self.model.blocks[2].requires_grad_(False) 34 | if layers_to_freeze >= 2: 35 | self.model.blocks[3].requires_grad_(False) 36 | if layers_to_freeze >= 3: 37 | self.model.blocks[4].requires_grad_(False) 38 | if layers_to_freeze >= 4: 39 | self.model.blocks[5].requires_grad_(False) 40 | 41 | self.model.global_pool = None 42 | self.model.fc = None 43 | 44 | out_channels = 1280 # for b0 and b1 45 | if 'b2' in model_name: 46 | out_channels = 1408 47 | elif 'b3' in model_name: 48 | out_channels = 1536 49 | elif 'b4' in model_name: 50 | out_channels = 1792 51 | self.out_channels = out_channels 52 | 53 | def forward(self, x): 54 | x = self.model.forward_features(x) 55 | return x 56 | 57 | 58 | def print_nb_params(m): 59 | model_parameters = filter(lambda p: p.requires_grad, m.parameters()) 60 | params = sum([np.prod(p.size()) for p in model_parameters]) 61 | print(f'Trainable parameters: {params/1e6:.3}M') 62 | 63 | 64 | if __name__ == '__main__': 65 | x = torch.randn(4, 3, 320, 320) 66 | m = EfficientNet(model_name='efficientnet_b0', 67 | pretrained=True, 68 | layers_to_freeze=0, 69 | ) 70 | r = m(x) 71 | print_nb_params(m) 72 | print(f'Input shape is {x.shape}') 73 | print(f'Output shape is {r.shape}') 74 | -------------------------------------------------------------------------------- /benchmark/deep-visual-geo-localization-benchmark/models/backbones/resnet.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import torch.nn as nn 3 | import torchvision 4 | import numpy as np 5 | 6 | class ResNet(nn.Module): 7 | def __init__(self, 8 | model_name='resnet50', 9 | pretrained=True, 10 | layers_to_freeze=2, 11 | layers_to_crop=[], 12 | ): 13 | """Class representing the resnet backbone used in the pipeline 14 | we consider resnet network as a list of 5 blocks (from 0 to 4), 15 | layer 0 is the first conv+bn and the other layers (1 to 4) are the rest of the residual blocks 16 | we don't take into account the global pooling and the last fc 17 | 18 | Args: 19 | model_name (str, optional): The architecture of the resnet backbone to instanciate. Defaults to 'resnet50'. 
20 | pretrained (bool, optional): Whether pretrained or not. Defaults to True. 21 | layers_to_freeze (int, optional): The number of residual blocks to freeze (starting from 0) . Defaults to 2. 22 | layers_to_crop (list, optional): Which residual layers to crop, for example [3,4] will crop the third and fourth res blocks. Defaults to []. 23 | 24 | Raises: 25 | NotImplementedError: if the model_name corresponds to an unknown architecture. 26 | """ 27 | super().__init__() 28 | self.model_name = model_name.lower() 29 | self.layers_to_freeze = layers_to_freeze 30 | 31 | if pretrained: 32 | # the new naming of pretrained weights, you can change to V2 if desired. 33 | weights = 'IMAGENET1K_V1' 34 | else: 35 | weights = None 36 | 37 | if 'swsl' in model_name or 'ssl' in model_name: 38 | # These are the semi supervised and weakly semi supervised weights from Facebook 39 | self.model = torch.hub.load( 40 | 'facebookresearch/semi-supervised-ImageNet1K-models', model_name) 41 | else: 42 | if 'resnext50' in model_name: 43 | self.model = torchvision.models.resnext50_32x4d(weights=weights) 44 | elif 'resnet50' in model_name: 45 | self.model = torchvision.models.resnet50(weights=weights) 46 | elif '101' in model_name: 47 | self.model = torchvision.models.resnet101(weights=weights) 48 | elif '152' in model_name: 49 | self.model = torchvision.models.resnet152(weights=weights) 50 | elif '34' in model_name: 51 | self.model = torchvision.models.resnet34(weights=weights) 52 | elif '18' in model_name: 53 | # self.model = torchvision.models.resnet18(pretrained=False) 54 | self.model = torchvision.models.resnet18(weights=weights) 55 | elif 'wide_resnet50_2' in model_name: 56 | self.model = torchvision.models.wide_resnet50_2(weights=weights) 57 | else: 58 | raise NotImplementedError( 59 | 'Backbone architecture not recognized!') 60 | 61 | # freeze only if the model is pretrained 62 | if pretrained: 63 | if layers_to_freeze >= 0: 64 | self.model.conv1.requires_grad_(False) 65 | self.model.bn1.requires_grad_(False) 66 | if layers_to_freeze >= 1: 67 | self.model.layer1.requires_grad_(False) 68 | if layers_to_freeze >= 2: 69 | self.model.layer2.requires_grad_(False) 70 | if layers_to_freeze >= 3: 71 | self.model.layer3.requires_grad_(False) 72 | 73 | # remove the avgpool and most importantly the fc layer 74 | self.model.avgpool = None 75 | self.model.fc = None 76 | 77 | if 4 in layers_to_crop: 78 | self.model.layer4 = None 79 | if 3 in layers_to_crop: 80 | self.model.layer3 = None 81 | 82 | out_channels = 2048 83 | if '34' in model_name or '18' in model_name: 84 | out_channels = 512 85 | 86 | self.out_channels = out_channels // 2 if self.model.layer4 is None else out_channels 87 | self.out_channels = self.out_channels // 2 if self.model.layer3 is None else self.out_channels 88 | 89 | def forward(self, x): 90 | x = self.model.conv1(x) 91 | x = self.model.bn1(x) 92 | x = self.model.relu(x) 93 | x = self.model.maxpool(x) 94 | x = self.model.layer1(x) 95 | x = self.model.layer2(x) 96 | if self.model.layer3 is not None: 97 | x = self.model.layer3(x) 98 | if self.model.layer4 is not None: 99 | x = self.model.layer4(x) 100 | return x 101 | 102 | 103 | # def print_nb_params(m): 104 | # model_parameters = filter(lambda p: p.requires_grad, m.parameters()) 105 | # params = sum([np.prod(p.size()) for p in model_parameters]) 106 | # print(f'Trainable parameters: {params/1e6:.3}M') 107 | 108 | 109 | # def main(): 110 | # x = torch.randn(1, 3, 320, 320) 111 | # m = ResNet(model_name='resnet50', 112 | # pretrained=True, 113 | # 
layers_to_freeze=2, 114 | # layers_to_crop=[],) 115 | # r = m(x) 116 | # helper.print_nb_params(m) 117 | # print(f'Input shape is {x.shape}') 118 | # print(f'Output shape is {r.shape}') 119 | 120 | 121 | # if __name__ == '__main__': 122 | # main() 123 | -------------------------------------------------------------------------------- /benchmark/deep-visual-geo-localization-benchmark/models/backbones/swin.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import torch.nn as nn 3 | import timm 4 | import numpy as np 5 | 6 | 7 | class Swin(nn.Module): 8 | def __init__(self, 9 | model_name='swinv2_base_window12to16_192to256_22kft1k', 10 | pretrained=True, 11 | layers_to_freeze=2 12 | ): 13 | """Class representing the Swin (V1 and V2) backbone used in the pipeline 14 | Swin contains 4 layers (0 to 3), where layer 2 is the heaviest in terms of # params 15 | 16 | Args: 17 | model_name (str, optional): The architecture of the Swin backbone to instanciate. Defaults to 'swinv2_base_window12to16_192to256_22kft1k'. 18 | pretrained (bool, optional): Whether pretrained or not. Defaults to True. 19 | layers_to_freeze (int, optional): The number of blocks to freeze in layers[2] (starting from 0) . Defaults to 2. 20 | """ 21 | super().__init__() 22 | self.model_name = model_name 23 | self.layers_to_freeze = layers_to_freeze 24 | self.model = timm.create_model(model_name, pretrained=pretrained, num_classes=0) 25 | self.model.head = None 26 | 27 | if pretrained: 28 | self.model.patch_embed.requires_grad_(False) 29 | self.model.layers[0].requires_grad_(False) 30 | self.model.layers[1].requires_grad_(False) 31 | # layers[2] contains most of the blocks, better freeze some of them 32 | for i in range(layers_to_freeze*5): # we make 5 steps (swin contains lots of layers) 33 | self.model.layers[2].blocks[i].requires_grad_(False) 34 | 35 | 36 | if 'base' in model_name: 37 | out_channels = 1024 38 | elif 'large' in model_name: 39 | out_channels = 1536 40 | else: 41 | out_channels = 768 42 | self.out_channels = out_channels 43 | 44 | if '384' in model_name: 45 | self.depth = 144 46 | else: 47 | self.depth = 49 48 | 49 | def forward(self, x): 50 | x = self.model.forward_features(x) 51 | # the following is a hack to make the output of the transformer 52 | # as a 3D feature maps 53 | bs, f, c = x.shape 54 | x = x.view(bs, int(np.sqrt(f)), int(np.sqrt(f)), c) 55 | return x.permute(0,3,1,2) 56 | 57 | 58 | def print_nb_params(m): 59 | model_parameters = filter(lambda p: p.requires_grad, m.parameters()) 60 | params = sum([np.prod(p.size()) for p in model_parameters]) 61 | print(f'Trainable parameters: {params/1e6:.3}M') 62 | 63 | if __name__ == '__main__': 64 | x = torch.randn(4,3,256,256) 65 | m = Swin(model_name='swinv2_base_window12to16_192to256_22kft1k', 66 | pretrained=True, 67 | layers_to_freeze=2,) 68 | r = m(x) 69 | print_nb_params(m) 70 | print(f'Input shape is {x.shape}') 71 | print(f'Output shape is {r.shape}') -------------------------------------------------------------------------------- /benchmark/deep-visual-geo-localization-benchmark/models/helper.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | from models import aggregators 3 | from models import backbones 4 | 5 | 6 | def get_backbone(backbone_arch='resnet50', 7 | pretrained=True, 8 | layers_to_freeze=2, 9 | layers_to_crop=[],): 10 | """Helper function that returns the backbone given its name 11 | 12 | Args: 13 | backbone_arch (str, 
optional): . Defaults to 'resnet50'. 14 | pretrained (bool, optional): . Defaults to True. 15 | layers_to_freeze (int, optional): . Defaults to 2. 16 | layers_to_crop (list, optional): This is mostly used with ResNet where 17 | we sometimes need to crop the last 18 | residual block (ex. [4]). Defaults to []. 19 | 20 | Returns: 21 | nn.Module: the backbone as a nn.Model object 22 | """ 23 | if 'resnet' in backbone_arch.lower(): 24 | return backbones.ResNet(backbone_arch, pretrained, layers_to_freeze, layers_to_crop) 25 | 26 | elif 'efficient' in backbone_arch.lower(): 27 | if '_b' in backbone_arch.lower(): 28 | return backbones.EfficientNet(backbone_arch, pretrained, layers_to_freeze+2) 29 | else: 30 | return backbones.EfficientNet(model_name='efficientnet_b0', 31 | pretrained=pretrained, 32 | layers_to_freeze=layers_to_freeze) 33 | 34 | elif 'swin' in backbone_arch.lower(): 35 | return backbones.Swin(model_name='swinv2_base_window12to16_192to256_22kft1k', 36 | pretrained=pretrained, 37 | layers_to_freeze=layers_to_freeze) 38 | 39 | def get_aggregator(agg_arch='ConvAP', agg_config={}): 40 | """Helper function that returns the aggregation layer given its name. 41 | If you happen to make your own aggregator, you might need to add a call 42 | to this helper function. 43 | 44 | Args: 45 | agg_arch (str, optional): the name of the aggregator. Defaults to 'ConvAP'. 46 | agg_config (dict, optional): this must contain all the arguments needed to instantiate the aggregator class. Defaults to {}. 47 | 48 | Returns: 49 | nn.Module: the aggregation layer 50 | """ 51 | 52 | if 'cosplace' in agg_arch.lower(): 53 | assert 'in_dim' in agg_config 54 | assert 'out_dim' in agg_config 55 | return aggregators.CosPlace(**agg_config) 56 | 57 | elif 'gem' in agg_arch.lower(): 58 | if agg_config == {}: 59 | agg_config['p'] = 3 60 | else: 61 | assert 'p' in agg_config 62 | return aggregators.GeMPool(**agg_config) 63 | 64 | elif 'convap' in agg_arch.lower(): 65 | assert 'in_channels' in agg_config 66 | return aggregators.ConvAP(**agg_config) 67 | 68 | elif 'mixvpr' in agg_arch.lower(): 69 | assert 'in_channels' in agg_config 70 | assert 'out_channels' in agg_config 71 | assert 'in_h' in agg_config 72 | assert 'in_w' in agg_config 73 | assert 'mix_depth' in agg_config 74 | return aggregators.MixVPR(**agg_config) -------------------------------------------------------------------------------- /benchmark/deep-visual-geo-localization-benchmark/pytorch_grad_cam/__init__.py: -------------------------------------------------------------------------------- 1 | from pytorch_grad_cam.grad_cam import GradCAM 2 | from pytorch_grad_cam.hirescam import HiResCAM 3 | from pytorch_grad_cam.grad_cam_elementwise import GradCAMElementWise 4 | from pytorch_grad_cam.ablation_layer import AblationLayer, AblationLayerVit, AblationLayerFasterRCNN 5 | from pytorch_grad_cam.ablation_cam import AblationCAM 6 | from pytorch_grad_cam.xgrad_cam import XGradCAM 7 | from pytorch_grad_cam.grad_cam_plusplus import GradCAMPlusPlus 8 | from pytorch_grad_cam.score_cam import ScoreCAM 9 | from pytorch_grad_cam.layer_cam import LayerCAM 10 | from pytorch_grad_cam.eigen_cam import EigenCAM 11 | from pytorch_grad_cam.eigen_grad_cam import EigenGradCAM 12 | from pytorch_grad_cam.random_cam import RandomCAM 13 | from pytorch_grad_cam.fullgrad_cam import FullGrad 14 | from pytorch_grad_cam.guided_backprop import GuidedBackpropReLUModel 15 | from pytorch_grad_cam.activations_and_gradients import ActivationsAndGradients 16 | from 
pytorch_grad_cam.feature_factorization.deep_feature_factorization import DeepFeatureFactorization, run_dff_on_image 17 | import pytorch_grad_cam.utils.model_targets 18 | import pytorch_grad_cam.utils.reshape_transforms 19 | import pytorch_grad_cam.metrics.cam_mult_image 20 | import pytorch_grad_cam.metrics.road 21 | -------------------------------------------------------------------------------- /benchmark/deep-visual-geo-localization-benchmark/pytorch_grad_cam/ablation_cam.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | import torch 3 | import tqdm 4 | from typing import Callable, List 5 | from pytorch_grad_cam.base_cam import BaseCAM 6 | from pytorch_grad_cam.utils.find_layers import replace_layer_recursive 7 | from pytorch_grad_cam.ablation_layer import AblationLayer 8 | 9 | 10 | """ Implementation of AblationCAM 11 | https://openaccess.thecvf.com/content_WACV_2020/papers/Desai_Ablation-CAM_Visual_Explanations_for_Deep_Convolutional_Network_via_Gradient-free_Localization_WACV_2020_paper.pdf 12 | 13 | Ablate individual activations, and then measure the drop in the target score. 14 | 15 | In the current implementation, the target layer activations is cached, so it won't be re-computed. 16 | However layers before it, if any, will not be cached. 17 | This means that if the target layer is a large block, for example model.featuers (in vgg), there will 18 | be a large save in run time. 19 | 20 | Since we have to go over many channels and ablate them, and every channel ablation requires a forward pass, 21 | it would be nice if we could avoid doing that for channels that won't contribute anwyay, making it much faster. 22 | The parameter ratio_channels_to_ablate controls how many channels should be ablated, using an experimental method 23 | (to be improved). The default 1.0 value means that all channels will be ablated. 
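A rough usage sketch (this mirrors the upstream pytorch-grad-cam interface; the
model, target layer and input tensor are placeholders, not values from this
repository):

    from pytorch_grad_cam import AblationCAM
    cam = AblationCAM(model=model, target_layers=[model.layer4[-1]], batch_size=32)
    grayscale_cams = cam(input_tensor=images)  # one (H, W) heatmap per input image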
24 | """ 25 | 26 | 27 | class AblationCAM(BaseCAM): 28 | def __init__(self, 29 | model: torch.nn.Module, 30 | target_layers: List[torch.nn.Module], 31 | use_cuda: bool = False, 32 | reshape_transform: Callable = None, 33 | ablation_layer: torch.nn.Module = AblationLayer(), 34 | batch_size: int = 32, 35 | ratio_channels_to_ablate: float = 1.0) -> None: 36 | 37 | super(AblationCAM, self).__init__(model, 38 | target_layers, 39 | use_cuda, 40 | reshape_transform, 41 | uses_gradients=False) 42 | self.batch_size = batch_size 43 | self.ablation_layer = ablation_layer 44 | self.ratio_channels_to_ablate = ratio_channels_to_ablate 45 | 46 | def save_activation(self, module, input, output) -> None: 47 | """ Helper function to save the raw activations from the target layer """ 48 | self.activations = output 49 | 50 | def assemble_ablation_scores(self, 51 | new_scores: list, 52 | original_score: float, 53 | ablated_channels: np.ndarray, 54 | number_of_channels: int) -> np.ndarray: 55 | """ Take the value from the channels that were ablated, 56 | and just set the original score for the channels that were skipped """ 57 | 58 | index = 0 59 | result = [] 60 | sorted_indices = np.argsort(ablated_channels) 61 | ablated_channels = ablated_channels[sorted_indices] 62 | new_scores = np.float32(new_scores)[sorted_indices] 63 | 64 | for i in range(number_of_channels): 65 | if index < len(ablated_channels) and ablated_channels[index] == i: 66 | weight = new_scores[index] 67 | index = index + 1 68 | else: 69 | weight = original_score 70 | result.append(weight) 71 | 72 | return result 73 | 74 | def get_cam_weights(self, 75 | input_tensor: torch.Tensor, 76 | target_layer: torch.nn.Module, 77 | targets: List[Callable], 78 | activations: torch.Tensor, 79 | grads: torch.Tensor) -> np.ndarray: 80 | 81 | # Do a forward pass, compute the target scores, and cache the 82 | # activations 83 | handle = target_layer.register_forward_hook(self.save_activation) 84 | with torch.no_grad(): 85 | outputs = self.model(input_tensor) 86 | handle.remove() 87 | original_scores = np.float32( 88 | [target(output).cpu().item() for target, output in zip(targets, outputs)]) 89 | 90 | # Replace the layer with the ablation layer. 91 | # When we finish, we will replace it back, so the original model is 92 | # unchanged. 93 | ablation_layer = self.ablation_layer 94 | replace_layer_recursive(self.model, target_layer, ablation_layer) 95 | 96 | number_of_channels = activations.shape[1] 97 | weights = [] 98 | # This is a "gradient free" method, so we don't need gradients here. 99 | with torch.no_grad(): 100 | # Loop over each of the batch images and ablate activations for it. 101 | for batch_index, (target, tensor) in enumerate( 102 | zip(targets, input_tensor)): 103 | new_scores = [] 104 | batch_tensor = tensor.repeat(self.batch_size, 1, 1, 1) 105 | 106 | # Check which channels should be ablated. Normally this will be all channels, 107 | # But we can also try to speed this up by using a low 108 | # ratio_channels_to_ablate. 109 | channels_to_ablate = ablation_layer.activations_to_be_ablated( 110 | activations[batch_index, :], self.ratio_channels_to_ablate) 111 | number_channels_to_ablate = len(channels_to_ablate) 112 | 113 | for i in tqdm.tqdm( 114 | range( 115 | 0, 116 | number_channels_to_ablate, 117 | self.batch_size)): 118 | if i + self.batch_size > number_channels_to_ablate: 119 | batch_tensor = batch_tensor[:( 120 | number_channels_to_ablate - i)] 121 | 122 | # Change the state of the ablation layer so it ablates the next channels. 
123 | # TBD: Move this into the ablation layer forward pass. 124 | ablation_layer.set_next_batch( 125 | input_batch_index=batch_index, 126 | activations=self.activations, 127 | num_channels_to_ablate=batch_tensor.size(0)) 128 | score = [target(o).cpu().item() 129 | for o in self.model(batch_tensor)] 130 | new_scores.extend(score) 131 | ablation_layer.indices = ablation_layer.indices[batch_tensor.size( 132 | 0):] 133 | 134 | new_scores = self.assemble_ablation_scores( 135 | new_scores, 136 | original_scores[batch_index], 137 | channels_to_ablate, 138 | number_of_channels) 139 | weights.extend(new_scores) 140 | 141 | weights = np.float32(weights) 142 | weights = weights.reshape(activations.shape[:2]) 143 | original_scores = original_scores[:, None] 144 | weights = (original_scores - weights) / original_scores 145 | 146 | # Replace the model back to the original state 147 | replace_layer_recursive(self.model, ablation_layer, target_layer) 148 | return weights 149 | -------------------------------------------------------------------------------- /benchmark/deep-visual-geo-localization-benchmark/pytorch_grad_cam/ablation_cam_multilayer.py: -------------------------------------------------------------------------------- 1 | import cv2 2 | import numpy as np 3 | import torch 4 | import tqdm 5 | from pytorch_grad_cam.base_cam import BaseCAM 6 | 7 | 8 | class AblationLayer(torch.nn.Module): 9 | def __init__(self, layer, reshape_transform, indices): 10 | super(AblationLayer, self).__init__() 11 | 12 | self.layer = layer 13 | self.reshape_transform = reshape_transform 14 | # The channels to zero out: 15 | self.indices = indices 16 | 17 | def forward(self, x): 18 | self.__call__(x) 19 | 20 | def __call__(self, x): 21 | output = self.layer(x) 22 | 23 | # Hack to work with ViT, 24 | # Since the activation channels are last and not first like in CNNs 25 | # Probably should remove it? 26 | if self.reshape_transform is not None: 27 | output = output.transpose(1, 2) 28 | 29 | for i in range(output.size(0)): 30 | 31 | # Commonly the minimum activation will be 0, 32 | # And then it makes sense to zero it out. 33 | # However depending on the architecture, 34 | # If the values can be negative, we use very negative values 35 | # to perform the ablation, deviating from the paper. 36 | if torch.min(output) == 0: 37 | output[i, self.indices[i], :] = 0 38 | else: 39 | ABLATION_VALUE = 1e5 40 | output[i, self.indices[i], :] = torch.min( 41 | output) - ABLATION_VALUE 42 | 43 | if self.reshape_transform is not None: 44 | output = output.transpose(2, 1) 45 | 46 | return output 47 | 48 | 49 | def replace_layer_recursive(model, old_layer, new_layer): 50 | for name, layer in model._modules.items(): 51 | if layer == old_layer: 52 | model._modules[name] = new_layer 53 | return True 54 | elif replace_layer_recursive(layer, old_layer, new_layer): 55 | return True 56 | return False 57 | 58 | 59 | class AblationCAM(BaseCAM): 60 | def __init__(self, model, target_layers, use_cuda=False, 61 | reshape_transform=None): 62 | super(AblationCAM, self).__init__(model, target_layers, use_cuda, 63 | reshape_transform) 64 | 65 | if len(target_layers) > 1: 66 | print( 67 | "Warning. You are usign Ablation CAM with more than 1 layers. 
" 68 | "This is supported only if all layers have the same output shape") 69 | 70 | def set_ablation_layers(self): 71 | self.ablation_layers = [] 72 | for target_layer in self.target_layers: 73 | ablation_layer = AblationLayer(target_layer, 74 | self.reshape_transform, indices=[]) 75 | self.ablation_layers.append(ablation_layer) 76 | replace_layer_recursive(self.model, target_layer, ablation_layer) 77 | 78 | def unset_ablation_layers(self): 79 | # replace the model back to the original state 80 | for ablation_layer, target_layer in zip( 81 | self.ablation_layers, self.target_layers): 82 | replace_layer_recursive(self.model, ablation_layer, target_layer) 83 | 84 | def set_ablation_layer_batch_indices(self, indices): 85 | for ablation_layer in self.ablation_layers: 86 | ablation_layer.indices = indices 87 | 88 | def trim_ablation_layer_batch_indices(self, keep): 89 | for ablation_layer in self.ablation_layers: 90 | ablation_layer.indices = ablation_layer.indices[:keep] 91 | 92 | def get_cam_weights(self, 93 | input_tensor, 94 | target_category, 95 | activations, 96 | grads): 97 | with torch.no_grad(): 98 | outputs = self.model(input_tensor).cpu().numpy() 99 | original_scores = [] 100 | for i in range(input_tensor.size(0)): 101 | original_scores.append(outputs[i, target_category[i]]) 102 | original_scores = np.float32(original_scores) 103 | 104 | self.set_ablation_layers() 105 | 106 | if hasattr(self, "batch_size"): 107 | BATCH_SIZE = self.batch_size 108 | else: 109 | BATCH_SIZE = 32 110 | 111 | number_of_channels = activations.shape[1] 112 | weights = [] 113 | 114 | with torch.no_grad(): 115 | # Iterate over the input batch 116 | for tensor, category in zip(input_tensor, target_category): 117 | batch_tensor = tensor.repeat(BATCH_SIZE, 1, 1, 1) 118 | for i in tqdm.tqdm(range(0, number_of_channels, BATCH_SIZE)): 119 | self.set_ablation_layer_batch_indices( 120 | list(range(i, i + BATCH_SIZE))) 121 | 122 | if i + BATCH_SIZE > number_of_channels: 123 | keep = number_of_channels - i 124 | batch_tensor = batch_tensor[:keep] 125 | self.trim_ablation_layer_batch_indices(self, keep) 126 | score = self.model(batch_tensor)[:, category].cpu().numpy() 127 | weights.extend(score) 128 | 129 | weights = np.float32(weights) 130 | weights = weights.reshape(activations.shape[:2]) 131 | original_scores = original_scores[:, None] 132 | weights = (original_scores - weights) / original_scores 133 | 134 | # replace the model back to the original state 135 | self.unset_ablation_layers() 136 | return weights 137 | -------------------------------------------------------------------------------- /benchmark/deep-visual-geo-localization-benchmark/pytorch_grad_cam/ablation_layer.py: -------------------------------------------------------------------------------- 1 | import torch 2 | from collections import OrderedDict 3 | import numpy as np 4 | from pytorch_grad_cam.utils.svd_on_activations import get_2d_projection 5 | 6 | 7 | class AblationLayer(torch.nn.Module): 8 | def __init__(self): 9 | super(AblationLayer, self).__init__() 10 | 11 | def objectiveness_mask_from_svd(self, activations, threshold=0.01): 12 | """ Experimental method to get a binary mask to compare if the activation is worth ablating. 13 | The idea is to apply the EigenCAM method by doing PCA on the activations. 14 | Then we create a binary mask by comparing to a low threshold. 15 | Areas that are masked out, are probably not interesting anyway. 
16 | """ 17 | 18 | projection = get_2d_projection(activations[None, :])[0, :] 19 | projection = np.abs(projection) 20 | projection = projection - projection.min() 21 | projection = projection / projection.max() 22 | projection = projection > threshold 23 | return projection 24 | 25 | def activations_to_be_ablated( 26 | self, 27 | activations, 28 | ratio_channels_to_ablate=1.0): 29 | """ Experimental method to get a binary mask to compare if the activation is worth ablating. 30 | Create a binary CAM mask with objectiveness_mask_from_svd. 31 | Score each Activation channel, by seeing how much of its values are inside the mask. 32 | Then keep the top channels. 33 | 34 | """ 35 | if ratio_channels_to_ablate == 1.0: 36 | self.indices = np.int32(range(activations.shape[0])) 37 | return self.indices 38 | 39 | projection = self.objectiveness_mask_from_svd(activations) 40 | 41 | scores = [] 42 | for channel in activations: 43 | normalized = np.abs(channel) 44 | normalized = normalized - normalized.min() 45 | normalized = normalized / np.max(normalized) 46 | score = (projection * normalized).sum() / normalized.sum() 47 | scores.append(score) 48 | scores = np.float32(scores) 49 | 50 | indices = list(np.argsort(scores)) 51 | high_score_indices = indices[::- 52 | 1][: int(len(indices) * 53 | ratio_channels_to_ablate)] 54 | low_score_indices = indices[: int( 55 | len(indices) * ratio_channels_to_ablate)] 56 | self.indices = np.int32(high_score_indices + low_score_indices) 57 | return self.indices 58 | 59 | def set_next_batch( 60 | self, 61 | input_batch_index, 62 | activations, 63 | num_channels_to_ablate): 64 | """ This creates the next batch of activations from the layer. 65 | Just take corresponding batch member from activations, and repeat it num_channels_to_ablate times. 66 | """ 67 | self.activations = activations[input_batch_index, :, :, :].clone( 68 | ).unsqueeze(0).repeat(num_channels_to_ablate, 1, 1, 1) 69 | 70 | def __call__(self, x): 71 | output = self.activations 72 | for i in range(output.size(0)): 73 | # Commonly the minimum activation will be 0, 74 | # And then it makes sense to zero it out. 75 | # However depending on the architecture, 76 | # If the values can be negative, we use very negative values 77 | # to perform the ablation, deviating from the paper. 78 | if torch.min(output) == 0: 79 | output[i, self.indices[i], :] = 0 80 | else: 81 | ABLATION_VALUE = 1e7 82 | output[i, self.indices[i], :] = torch.min( 83 | output) - ABLATION_VALUE 84 | 85 | return output 86 | 87 | 88 | class AblationLayerVit(AblationLayer): 89 | def __init__(self): 90 | super(AblationLayerVit, self).__init__() 91 | 92 | def __call__(self, x): 93 | output = self.activations 94 | output = output.transpose(1, len(output.shape) - 1) 95 | for i in range(output.size(0)): 96 | 97 | # Commonly the minimum activation will be 0, 98 | # And then it makes sense to zero it out. 99 | # However depending on the architecture, 100 | # If the values can be negative, we use very negative values 101 | # to perform the ablation, deviating from the paper. 102 | if torch.min(output) == 0: 103 | output[i, self.indices[i], :] = 0 104 | else: 105 | ABLATION_VALUE = 1e7 106 | output[i, self.indices[i], :] = torch.min( 107 | output) - ABLATION_VALUE 108 | 109 | output = output.transpose(len(output.shape) - 1, 1) 110 | 111 | return output 112 | 113 | def set_next_batch( 114 | self, 115 | input_batch_index, 116 | activations, 117 | num_channels_to_ablate): 118 | """ This creates the next batch of activations from the layer. 
119 | Just take corresponding batch member from activations, and repeat it num_channels_to_ablate times. 120 | """ 121 | repeat_params = [num_channels_to_ablate] + \ 122 | len(activations.shape[:-1]) * [1] 123 | self.activations = activations[input_batch_index, :, :].clone( 124 | ).unsqueeze(0).repeat(*repeat_params) 125 | 126 | 127 | class AblationLayerFasterRCNN(AblationLayer): 128 | def __init__(self): 129 | super(AblationLayerFasterRCNN, self).__init__() 130 | 131 | def set_next_batch( 132 | self, 133 | input_batch_index, 134 | activations, 135 | num_channels_to_ablate): 136 | """ Extract the next batch member from activations, 137 | and repeat it num_channels_to_ablate times. 138 | """ 139 | self.activations = OrderedDict() 140 | for key, value in activations.items(): 141 | fpn_activation = value[input_batch_index, 142 | :, :, :].clone().unsqueeze(0) 143 | self.activations[key] = fpn_activation.repeat( 144 | num_channels_to_ablate, 1, 1, 1) 145 | 146 | def __call__(self, x): 147 | result = self.activations 148 | layers = {0: '0', 1: '1', 2: '2', 3: '3', 4: 'pool'} 149 | num_channels_to_ablate = result['pool'].size(0) 150 | for i in range(num_channels_to_ablate): 151 | pyramid_layer = int(self.indices[i] / 256) 152 | index_in_pyramid_layer = int(self.indices[i] % 256) 153 | result[layers[pyramid_layer]][i, 154 | index_in_pyramid_layer, :, :] = -1000 155 | return result 156 | -------------------------------------------------------------------------------- /benchmark/deep-visual-geo-localization-benchmark/pytorch_grad_cam/activations_and_gradients.py: -------------------------------------------------------------------------------- 1 | class ActivationsAndGradients: 2 | """ Class for extracting activations and 3 | registering gradients from targetted intermediate layers """ 4 | 5 | def __init__(self, model, target_layers, reshape_transform): 6 | self.model = model 7 | self.gradients = [] 8 | self.activations = [] 9 | self.reshape_transform = reshape_transform 10 | self.handles = [] 11 | for target_layer in target_layers: 12 | self.handles.append( 13 | target_layer.register_forward_hook(self.save_activation)) 14 | # Because of https://github.com/pytorch/pytorch/issues/61519, 15 | # we don't use backward hook to record gradients. 16 | self.handles.append( 17 | target_layer.register_forward_hook(self.save_gradient)) 18 | 19 | def save_activation(self, module, input, output): 20 | activation = output 21 | 22 | if self.reshape_transform is not None: 23 | activation = self.reshape_transform(activation) 24 | self.activations.append(activation.cpu().detach()) 25 | 26 | def save_gradient(self, module, input, output): 27 | if not hasattr(output, "requires_grad") or not output.requires_grad: 28 | # You can only register hooks on tensor requires grad. 
29 | return 30 | 31 | # Gradients are computed in reverse order 32 | def _store_grad(grad): 33 | if self.reshape_transform is not None: 34 | grad = self.reshape_transform(grad) 35 | self.gradients = [grad.cpu().detach()] + self.gradients 36 | 37 | output.register_hook(_store_grad) 38 | 39 | def __call__(self, x): 40 | self.gradients = [] 41 | self.activations = [] 42 | return self.model(x) 43 | 44 | def release(self): 45 | for handle in self.handles: 46 | handle.remove() 47 | -------------------------------------------------------------------------------- /benchmark/deep-visual-geo-localization-benchmark/pytorch_grad_cam/base_cam.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | import torch 3 | import ttach as tta 4 | from typing import Callable, List, Tuple 5 | from pytorch_grad_cam.activations_and_gradients import ActivationsAndGradients 6 | from pytorch_grad_cam.utils.svd_on_activations import get_2d_projection 7 | from pytorch_grad_cam.utils.image import scale_cam_image 8 | from pytorch_grad_cam.utils.model_targets import ClassifierOutputTarget 9 | 10 | 11 | class BaseCAM: 12 | def __init__(self, 13 | model: torch.nn.Module, 14 | target_layers: List[torch.nn.Module], 15 | use_cuda: bool = False, 16 | reshape_transform: Callable = None, 17 | compute_input_gradient: bool = False, 18 | uses_gradients: bool = True) -> None: 19 | self.model = model.eval() 20 | self.target_layers = target_layers 21 | self.cuda = use_cuda 22 | if self.cuda: 23 | self.model = model.cuda() 24 | self.reshape_transform = reshape_transform 25 | self.compute_input_gradient = compute_input_gradient 26 | self.uses_gradients = uses_gradients 27 | self.activations_and_grads = ActivationsAndGradients( 28 | self.model, target_layers, reshape_transform) 29 | 30 | """ Get a vector of weights for every channel in the target layer. 31 | Methods that return weights channels, 32 | will typically need to only implement this function. 
""" 33 | 34 | def get_cam_weights(self, 35 | input_tensor: torch.Tensor, 36 | target_layers: List[torch.nn.Module], 37 | targets: List[torch.nn.Module], 38 | activations: torch.Tensor, 39 | grads: torch.Tensor) -> np.ndarray: 40 | raise Exception("Not Implemented") 41 | 42 | def get_cam_image(self, 43 | input_tensor: torch.Tensor, 44 | target_layer: torch.nn.Module, 45 | targets: List[torch.nn.Module], 46 | activations: torch.Tensor, 47 | grads: torch.Tensor, 48 | eigen_smooth: bool = False) -> np.ndarray: 49 | 50 | weights = self.get_cam_weights(input_tensor, 51 | target_layer, 52 | targets, 53 | activations, 54 | grads) 55 | weighted_activations = weights[:, :, None, None] * activations 56 | if eigen_smooth: 57 | cam = get_2d_projection(weighted_activations) 58 | else: 59 | cam = weighted_activations.sum(axis=1) 60 | return cam 61 | 62 | def forward(self, 63 | input_tensor: torch.Tensor, 64 | targets: List[torch.nn.Module], 65 | eigen_smooth: bool = False) -> np.ndarray: 66 | 67 | if self.cuda: 68 | input_tensor = input_tensor.cuda() 69 | 70 | if self.compute_input_gradient: 71 | input_tensor = torch.autograd.Variable(input_tensor, 72 | requires_grad=True) 73 | 74 | outputs = self.activations_and_grads(input_tensor) 75 | if targets is None: 76 | target_categories = np.argmax(outputs.cpu().data.numpy(), axis=-1) 77 | targets = [ClassifierOutputTarget( 78 | category) for category in target_categories] 79 | 80 | if self.uses_gradients: 81 | self.model.zero_grad() 82 | loss = sum([target(output) 83 | for target, output in zip(targets, outputs)]) 84 | loss.backward(retain_graph=True) 85 | 86 | # In most of the saliency attribution papers, the saliency is 87 | # computed with a single target layer. 88 | # Commonly it is the last convolutional layer. 89 | # Here we support passing a list with multiple target layers. 90 | # It will compute the saliency image for every image, 91 | # and then aggregate them (with a default mean aggregation). 92 | # This gives you more flexibility in case you just want to 93 | # use all conv layers for example, all Batchnorm layers, 94 | # or something else. 
95 | cam_per_layer = self.compute_cam_per_layer(input_tensor, 96 | targets, 97 | eigen_smooth) 98 | return self.aggregate_multi_layers(cam_per_layer) 99 | 100 | def get_target_width_height(self, 101 | input_tensor: torch.Tensor) -> Tuple[int, int]: 102 | width, height = input_tensor.size(-1), input_tensor.size(-2) 103 | return width, height 104 | 105 | def compute_cam_per_layer( 106 | self, 107 | input_tensor: torch.Tensor, 108 | targets: List[torch.nn.Module], 109 | eigen_smooth: bool) -> np.ndarray: 110 | activations_list = [a.cpu().data.numpy() 111 | for a in self.activations_and_grads.activations] 112 | grads_list = [g.cpu().data.numpy() 113 | for g in self.activations_and_grads.gradients] 114 | target_size = self.get_target_width_height(input_tensor) 115 | 116 | cam_per_target_layer = [] 117 | # Loop over the saliency image from every layer 118 | for i in range(len(self.target_layers)): 119 | target_layer = self.target_layers[i] 120 | layer_activations = None 121 | layer_grads = None 122 | if i < len(activations_list): 123 | layer_activations = activations_list[i] 124 | if i < len(grads_list): 125 | layer_grads = grads_list[i] 126 | 127 | cam = self.get_cam_image(input_tensor, 128 | target_layer, 129 | targets, 130 | layer_activations, 131 | layer_grads, 132 | eigen_smooth) 133 | cam = np.maximum(cam, 0) 134 | scaled = scale_cam_image(cam, target_size) 135 | cam_per_target_layer.append(scaled[:, None, :]) 136 | 137 | return cam_per_target_layer 138 | 139 | def aggregate_multi_layers( 140 | self, 141 | cam_per_target_layer: np.ndarray) -> np.ndarray: 142 | cam_per_target_layer = np.concatenate(cam_per_target_layer, axis=1) 143 | cam_per_target_layer = np.maximum(cam_per_target_layer, 0) 144 | result = np.mean(cam_per_target_layer, axis=1) 145 | return scale_cam_image(result) 146 | 147 | def forward_augmentation_smoothing(self, 148 | input_tensor: torch.Tensor, 149 | targets: List[torch.nn.Module], 150 | eigen_smooth: bool = False) -> np.ndarray: 151 | transforms = tta.Compose( 152 | [ 153 | tta.HorizontalFlip(), 154 | tta.Multiply(factors=[0.9, 1, 1.1]), 155 | ] 156 | ) 157 | cams = [] 158 | for transform in transforms: 159 | augmented_tensor = transform.augment_image(input_tensor) 160 | cam = self.forward(augmented_tensor, 161 | targets, 162 | eigen_smooth) 163 | 164 | # The ttach library expects a tensor of size BxCxHxW 165 | cam = cam[:, None, :, :] 166 | cam = torch.from_numpy(cam) 167 | cam = transform.deaugment_mask(cam) 168 | 169 | # Back to numpy float32, HxW 170 | cam = cam.numpy() 171 | cam = cam[:, 0, :, :] 172 | cams.append(cam) 173 | 174 | cam = np.mean(np.float32(cams), axis=0) 175 | return cam 176 | 177 | def __call__(self, 178 | input_tensor: torch.Tensor, 179 | targets: List[torch.nn.Module] = None, 180 | aug_smooth: bool = False, 181 | eigen_smooth: bool = False) -> np.ndarray: 182 | 183 | # Smooth the CAM result with test time augmentation 184 | if aug_smooth is True: 185 | return self.forward_augmentation_smoothing( 186 | input_tensor, targets, eigen_smooth) 187 | 188 | return self.forward(input_tensor, 189 | targets, eigen_smooth) 190 | 191 | def __del__(self): 192 | self.activations_and_grads.release() 193 | 194 | def __enter__(self): 195 | return self 196 | 197 | def __exit__(self, exc_type, exc_value, exc_tb): 198 | self.activations_and_grads.release() 199 | if isinstance(exc_value, IndexError): 200 | # Handle IndexError here... 201 | print( 202 | f"An exception occurred in CAM with block: {exc_type}. 
Message: {exc_value}") 203 | return True 204 | -------------------------------------------------------------------------------- /benchmark/deep-visual-geo-localization-benchmark/pytorch_grad_cam/eigen_cam.py: -------------------------------------------------------------------------------- 1 | from pytorch_grad_cam.base_cam import BaseCAM 2 | from pytorch_grad_cam.utils.svd_on_activations import get_2d_projection 3 | 4 | # https://arxiv.org/abs/2008.00299 5 | 6 | 7 | class EigenCAM(BaseCAM): 8 | def __init__(self, model, target_layers, use_cuda=False, 9 | reshape_transform=None): 10 | super(EigenCAM, self).__init__(model, 11 | target_layers, 12 | use_cuda, 13 | reshape_transform, 14 | uses_gradients=False) 15 | 16 | def get_cam_image(self, 17 | input_tensor, 18 | target_layer, 19 | target_category, 20 | activations, 21 | grads, 22 | eigen_smooth): 23 | return get_2d_projection(activations) 24 | -------------------------------------------------------------------------------- /benchmark/deep-visual-geo-localization-benchmark/pytorch_grad_cam/eigen_grad_cam.py: -------------------------------------------------------------------------------- 1 | from pytorch_grad_cam.base_cam import BaseCAM 2 | from pytorch_grad_cam.utils.svd_on_activations import get_2d_projection 3 | 4 | # Like Eigen CAM: https://arxiv.org/abs/2008.00299 5 | # But multiply the activations x gradients 6 | 7 | 8 | class EigenGradCAM(BaseCAM): 9 | def __init__(self, model, target_layers, use_cuda=False, 10 | reshape_transform=None): 11 | super(EigenGradCAM, self).__init__(model, target_layers, use_cuda, 12 | reshape_transform) 13 | 14 | def get_cam_image(self, 15 | input_tensor, 16 | target_layer, 17 | target_category, 18 | activations, 19 | grads, 20 | eigen_smooth): 21 | return get_2d_projection(grads * activations) 22 | -------------------------------------------------------------------------------- /benchmark/deep-visual-geo-localization-benchmark/pytorch_grad_cam/feature_factorization/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ai4ce/NYC-Indoor-VPR/36510997e724eb07caf9577128dc666b335ed7e5/benchmark/deep-visual-geo-localization-benchmark/pytorch_grad_cam/feature_factorization/__init__.py -------------------------------------------------------------------------------- /benchmark/deep-visual-geo-localization-benchmark/pytorch_grad_cam/feature_factorization/deep_feature_factorization.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | from PIL import Image 3 | import torch 4 | from typing import Callable, List, Tuple, Optional 5 | from sklearn.decomposition import NMF 6 | from pytorch_grad_cam.activations_and_gradients import ActivationsAndGradients 7 | from pytorch_grad_cam.utils.image import scale_cam_image, create_labels_legend, show_factorization_on_image 8 | 9 | 10 | def dff(activations: np.ndarray, n_components: int = 5): 11 | """ Compute Deep Feature Factorization on a 2d Activations tensor. 
12 | 13 | :param activations: A numpy array of shape batch x channels x height x width 14 | :param n_components: The number of components for the non negative matrix factorization 15 | :returns: A tuple of the concepts (a numpy array with shape channels x components), 16 | and the explanation heatmaps (a numpy arary with shape batch x height x width) 17 | """ 18 | 19 | batch_size, channels, h, w = activations.shape 20 | reshaped_activations = activations.transpose((1, 0, 2, 3)) 21 | reshaped_activations[np.isnan(reshaped_activations)] = 0 22 | reshaped_activations = reshaped_activations.reshape( 23 | reshaped_activations.shape[0], -1) 24 | offset = reshaped_activations.min(axis=-1) 25 | reshaped_activations = reshaped_activations - offset[:, None] 26 | 27 | model = NMF(n_components=n_components, init='random', random_state=0) 28 | W = model.fit_transform(reshaped_activations) 29 | H = model.components_ 30 | concepts = W + offset[:, None] 31 | explanations = H.reshape(n_components, batch_size, h, w) 32 | explanations = explanations.transpose((1, 0, 2, 3)) 33 | return concepts, explanations 34 | 35 | 36 | class DeepFeatureFactorization: 37 | """ Deep Feature Factorization: https://arxiv.org/abs/1806.10206 38 | This gets a model andcomputes the 2D activations for a target layer, 39 | and computes Non Negative Matrix Factorization on the activations. 40 | 41 | Optionally it runs a computation on the concept embeddings, 42 | like running a classifier on them. 43 | 44 | The explanation heatmaps are scalled to the range [0, 1] 45 | and to the input tensor width and height. 46 | """ 47 | 48 | def __init__(self, 49 | model: torch.nn.Module, 50 | target_layer: torch.nn.Module, 51 | reshape_transform: Callable = None, 52 | computation_on_concepts=None 53 | ): 54 | self.model = model 55 | self.computation_on_concepts = computation_on_concepts 56 | self.activations_and_grads = ActivationsAndGradients( 57 | self.model, [target_layer], reshape_transform) 58 | 59 | def __call__(self, 60 | input_tensor: torch.Tensor, 61 | n_components: int = 16): 62 | batch_size, channels, h, w = input_tensor.size() 63 | _ = self.activations_and_grads(input_tensor) 64 | 65 | with torch.no_grad(): 66 | activations = self.activations_and_grads.activations[0].cpu( 67 | ).numpy() 68 | 69 | concepts, explanations = dff(activations, n_components=n_components) 70 | 71 | processed_explanations = [] 72 | 73 | for batch in explanations: 74 | processed_explanations.append(scale_cam_image(batch, (w, h))) 75 | 76 | if self.computation_on_concepts: 77 | with torch.no_grad(): 78 | concept_tensors = torch.from_numpy( 79 | np.float32(concepts).transpose((1, 0))) 80 | concept_outputs = self.computation_on_concepts( 81 | concept_tensors).cpu().numpy() 82 | return concepts, processed_explanations, concept_outputs 83 | else: 84 | return concepts, processed_explanations 85 | 86 | def __del__(self): 87 | self.activations_and_grads.release() 88 | 89 | def __exit__(self, exc_type, exc_value, exc_tb): 90 | self.activations_and_grads.release() 91 | if isinstance(exc_value, IndexError): 92 | # Handle IndexError here... 93 | print( 94 | f"An exception occurred in ActivationSummary with block: {exc_type}. 
Message: {exc_value}") 95 | return True 96 | 97 | 98 | def run_dff_on_image(model: torch.nn.Module, 99 | target_layer: torch.nn.Module, 100 | classifier: torch.nn.Module, 101 | img_pil: Image, 102 | img_tensor: torch.Tensor, 103 | reshape_transform=Optional[Callable], 104 | n_components: int = 5, 105 | top_k: int = 2) -> np.ndarray: 106 | """ Helper function to create a Deep Feature Factorization visualization for a single image. 107 | TBD: Run this on a batch with several images. 108 | """ 109 | rgb_img_float = np.array(img_pil) / 255 110 | dff = DeepFeatureFactorization(model=model, 111 | reshape_transform=reshape_transform, 112 | target_layer=target_layer, 113 | computation_on_concepts=classifier) 114 | 115 | concepts, batch_explanations, concept_outputs = dff( 116 | img_tensor[None, :], n_components) 117 | 118 | concept_outputs = torch.softmax( 119 | torch.from_numpy(concept_outputs), 120 | axis=-1).numpy() 121 | concept_label_strings = create_labels_legend(concept_outputs, 122 | labels=model.config.id2label, 123 | top_k=top_k) 124 | visualization = show_factorization_on_image( 125 | rgb_img_float, 126 | batch_explanations[0], 127 | image_weight=0.3, 128 | concept_labels=concept_label_strings) 129 | 130 | result = np.hstack((np.array(img_pil), visualization)) 131 | return result 132 | -------------------------------------------------------------------------------- /benchmark/deep-visual-geo-localization-benchmark/pytorch_grad_cam/fullgrad_cam.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | import torch 3 | from pytorch_grad_cam.base_cam import BaseCAM 4 | from pytorch_grad_cam.utils.find_layers import find_layer_predicate_recursive 5 | from pytorch_grad_cam.utils.svd_on_activations import get_2d_projection 6 | from pytorch_grad_cam.utils.image import scale_accross_batch_and_channels, scale_cam_image 7 | 8 | # https://arxiv.org/abs/1905.00780 9 | 10 | 11 | class FullGrad(BaseCAM): 12 | def __init__(self, model, target_layers, use_cuda=False, 13 | reshape_transform=None): 14 | if len(target_layers) > 0: 15 | print( 16 | "Warning: target_layers is ignored in FullGrad. 
All bias layers will be used instead") 17 | 18 | def layer_with_2D_bias(layer): 19 | bias_target_layers = [torch.nn.Conv2d, torch.nn.BatchNorm2d] 20 | if type(layer) in bias_target_layers and layer.bias is not None: 21 | return True 22 | return False 23 | target_layers = find_layer_predicate_recursive( 24 | model, layer_with_2D_bias) 25 | super( 26 | FullGrad, 27 | self).__init__( 28 | model, 29 | target_layers, 30 | use_cuda, 31 | reshape_transform, 32 | compute_input_gradient=True) 33 | self.bias_data = [self.get_bias_data( 34 | layer).cpu().numpy() for layer in target_layers] 35 | 36 | def get_bias_data(self, layer): 37 | # Borrowed from official paper impl: 38 | # https://github.com/idiap/fullgrad-saliency/blob/master/saliency/tensor_extractor.py#L47 39 | if isinstance(layer, torch.nn.BatchNorm2d): 40 | bias = - (layer.running_mean * layer.weight 41 | / torch.sqrt(layer.running_var + layer.eps)) + layer.bias 42 | return bias.data 43 | else: 44 | return layer.bias.data 45 | 46 | def compute_cam_per_layer( 47 | self, 48 | input_tensor, 49 | target_category, 50 | eigen_smooth): 51 | input_grad = input_tensor.grad.data.cpu().numpy() 52 | grads_list = [g.cpu().data.numpy() for g in 53 | self.activations_and_grads.gradients] 54 | cam_per_target_layer = [] 55 | target_size = self.get_target_width_height(input_tensor) 56 | 57 | gradient_multiplied_input = input_grad * input_tensor.data.cpu().numpy() 58 | gradient_multiplied_input = np.abs(gradient_multiplied_input) 59 | gradient_multiplied_input = scale_accross_batch_and_channels( 60 | gradient_multiplied_input, 61 | target_size) 62 | cam_per_target_layer.append(gradient_multiplied_input) 63 | 64 | # Loop over the saliency image from every layer 65 | assert(len(self.bias_data) == len(grads_list)) 66 | for bias, grads in zip(self.bias_data, grads_list): 67 | bias = bias[None, :, None, None] 68 | # In the paper they take the absolute value, 69 | # but possibily taking only the positive gradients will work 70 | # better. 
71 | bias_grad = np.abs(bias * grads) 72 | result = scale_accross_batch_and_channels( 73 | bias_grad, target_size) 74 | result = np.sum(result, axis=1) 75 | cam_per_target_layer.append(result[:, None, :]) 76 | cam_per_target_layer = np.concatenate(cam_per_target_layer, axis=1) 77 | if eigen_smooth: 78 | # Resize to a smaller image, since this method typically has a very large number of channels, 79 | # and then consumes a lot of memory 80 | cam_per_target_layer = scale_accross_batch_and_channels( 81 | cam_per_target_layer, (target_size[0] // 8, target_size[1] // 8)) 82 | cam_per_target_layer = get_2d_projection(cam_per_target_layer) 83 | cam_per_target_layer = cam_per_target_layer[:, None, :, :] 84 | cam_per_target_layer = scale_accross_batch_and_channels( 85 | cam_per_target_layer, 86 | target_size) 87 | else: 88 | cam_per_target_layer = np.sum( 89 | cam_per_target_layer, axis=1)[:, None, :] 90 | 91 | return cam_per_target_layer 92 | 93 | def aggregate_multi_layers(self, cam_per_target_layer): 94 | result = np.sum(cam_per_target_layer, axis=1) 95 | return scale_cam_image(result) 96 | -------------------------------------------------------------------------------- /benchmark/deep-visual-geo-localization-benchmark/pytorch_grad_cam/grad_cam.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | from pytorch_grad_cam.base_cam import BaseCAM 3 | 4 | 5 | class GradCAM(BaseCAM): 6 | def __init__(self, model, target_layers, use_cuda=False, 7 | reshape_transform=None): 8 | super( 9 | GradCAM, 10 | self).__init__( 11 | model, 12 | target_layers, 13 | use_cuda, 14 | reshape_transform) 15 | 16 | def get_cam_weights(self, 17 | input_tensor, 18 | target_layer, 19 | target_category, 20 | activations, 21 | grads): 22 | return np.mean(grads, axis=(2, 3)) 23 | -------------------------------------------------------------------------------- /benchmark/deep-visual-geo-localization-benchmark/pytorch_grad_cam/grad_cam_elementwise.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | from pytorch_grad_cam.base_cam import BaseCAM 3 | from pytorch_grad_cam.utils.svd_on_activations import get_2d_projection 4 | 5 | 6 | class GradCAMElementWise(BaseCAM): 7 | def __init__(self, model, target_layers, use_cuda=False, 8 | reshape_transform=None): 9 | super( 10 | GradCAMElementWise, 11 | self).__init__( 12 | model, 13 | target_layers, 14 | use_cuda, 15 | reshape_transform) 16 | 17 | def get_cam_image(self, 18 | input_tensor, 19 | target_layer, 20 | target_category, 21 | activations, 22 | grads, 23 | eigen_smooth): 24 | elementwise_activations = np.maximum(grads * activations, 0) 25 | 26 | if eigen_smooth: 27 | cam = get_2d_projection(elementwise_activations) 28 | else: 29 | cam = elementwise_activations.sum(axis=1) 30 | return cam 31 | -------------------------------------------------------------------------------- /benchmark/deep-visual-geo-localization-benchmark/pytorch_grad_cam/grad_cam_plusplus.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | from pytorch_grad_cam.base_cam import BaseCAM 3 | 4 | # https://arxiv.org/abs/1710.11063 5 | 6 | 7 | class GradCAMPlusPlus(BaseCAM): 8 | def __init__(self, model, target_layers, use_cuda=False, 9 | reshape_transform=None): 10 | super(GradCAMPlusPlus, self).__init__(model, target_layers, use_cuda, 11 | reshape_transform) 12 | 13 | def get_cam_weights(self, 14 | input_tensor, 15 | target_layers, 16 | 
target_category, 17 | activations, 18 | grads): 19 | grads_power_2 = grads**2 20 | grads_power_3 = grads_power_2 * grads 21 | # Equation 19 in https://arxiv.org/abs/1710.11063 22 | sum_activations = np.sum(activations, axis=(2, 3)) 23 | eps = 0.000001 24 | aij = grads_power_2 / (2 * grads_power_2 + 25 | sum_activations[:, :, None, None] * grads_power_3 + eps) 26 | # Now bring back the ReLU from eq.7 in the paper, 27 | # And zero out aijs where the activations are 0 28 | aij = np.where(grads != 0, aij, 0) 29 | 30 | weights = np.maximum(grads, 0) * aij 31 | weights = np.sum(weights, axis=(2, 3)) 32 | return weights 33 | -------------------------------------------------------------------------------- /benchmark/deep-visual-geo-localization-benchmark/pytorch_grad_cam/guided_backprop.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | import torch 3 | from torch.autograd import Function 4 | from pytorch_grad_cam.utils.find_layers import replace_all_layer_type_recursive 5 | 6 | 7 | class GuidedBackpropReLU(Function): 8 | @staticmethod 9 | def forward(self, input_img): 10 | positive_mask = (input_img > 0).type_as(input_img) 11 | output = torch.addcmul( 12 | torch.zeros( 13 | input_img.size()).type_as(input_img), 14 | input_img, 15 | positive_mask) 16 | self.save_for_backward(input_img, output) 17 | return output 18 | 19 | @staticmethod 20 | def backward(self, grad_output): 21 | input_img, output = self.saved_tensors 22 | grad_input = None 23 | 24 | positive_mask_1 = (input_img > 0).type_as(grad_output) 25 | positive_mask_2 = (grad_output > 0).type_as(grad_output) 26 | grad_input = torch.addcmul( 27 | torch.zeros( 28 | input_img.size()).type_as(input_img), 29 | torch.addcmul( 30 | torch.zeros( 31 | input_img.size()).type_as(input_img), 32 | grad_output, 33 | positive_mask_1), 34 | positive_mask_2) 35 | return grad_input 36 | 37 | 38 | class GuidedBackpropReLUasModule(torch.nn.Module): 39 | def __init__(self): 40 | super(GuidedBackpropReLUasModule, self).__init__() 41 | 42 | def forward(self, input_img): 43 | return GuidedBackpropReLU.apply(input_img) 44 | 45 | 46 | class GuidedBackpropReLUModel: 47 | def __init__(self, model, use_cuda): 48 | self.model = model 49 | self.model.eval() 50 | self.cuda = use_cuda 51 | if self.cuda: 52 | self.model = self.model.cuda() 53 | 54 | def forward(self, input_img): 55 | return self.model(input_img) 56 | 57 | def recursive_replace_relu_with_guidedrelu(self, module_top): 58 | 59 | for idx, module in module_top._modules.items(): 60 | self.recursive_replace_relu_with_guidedrelu(module) 61 | if module.__class__.__name__ == 'ReLU': 62 | module_top._modules[idx] = GuidedBackpropReLU.apply 63 | print("b") 64 | 65 | def recursive_replace_guidedrelu_with_relu(self, module_top): 66 | try: 67 | for idx, module in module_top._modules.items(): 68 | self.recursive_replace_guidedrelu_with_relu(module) 69 | if module == GuidedBackpropReLU.apply: 70 | module_top._modules[idx] = torch.nn.ReLU() 71 | except BaseException: 72 | pass 73 | 74 | def __call__(self, input_img, target_category=None): 75 | replace_all_layer_type_recursive(self.model, 76 | torch.nn.ReLU, 77 | GuidedBackpropReLUasModule()) 78 | 79 | if self.cuda: 80 | input_img = input_img.cuda() 81 | 82 | input_img = input_img.requires_grad_(True) 83 | 84 | output = self.forward(input_img) 85 | 86 | if target_category is None: 87 | target_category = np.argmax(output.cpu().data.numpy()) 88 | 89 | loss = output[0, target_category] 90 | 
loss.backward(retain_graph=True) 91 | 92 | output = input_img.grad.cpu().data.numpy() 93 | output = output[0, :, :, :] 94 | output = output.transpose((1, 2, 0)) 95 | 96 | replace_all_layer_type_recursive(self.model, 97 | GuidedBackpropReLUasModule, 98 | torch.nn.ReLU()) 99 | 100 | return output 101 | -------------------------------------------------------------------------------- /benchmark/deep-visual-geo-localization-benchmark/pytorch_grad_cam/hirescam.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | from pytorch_grad_cam.base_cam import BaseCAM 3 | from pytorch_grad_cam.utils.svd_on_activations import get_2d_projection 4 | 5 | 6 | class HiResCAM(BaseCAM): 7 | def __init__(self, model, target_layers, use_cuda=False, 8 | reshape_transform=None): 9 | super( 10 | HiResCAM, 11 | self).__init__( 12 | model, 13 | target_layers, 14 | use_cuda, 15 | reshape_transform) 16 | 17 | def get_cam_image(self, 18 | input_tensor, 19 | target_layer, 20 | target_category, 21 | activations, 22 | grads, 23 | eigen_smooth): 24 | elementwise_activations = grads * activations 25 | 26 | if eigen_smooth: 27 | print( 28 | "Warning: HiResCAM's faithfulness guarantees do not hold if smoothing is applied") 29 | cam = get_2d_projection(elementwise_activations) 30 | else: 31 | cam = elementwise_activations.sum(axis=1) 32 | return cam 33 | -------------------------------------------------------------------------------- /benchmark/deep-visual-geo-localization-benchmark/pytorch_grad_cam/layer_cam.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | from pytorch_grad_cam.base_cam import BaseCAM 3 | from pytorch_grad_cam.utils.svd_on_activations import get_2d_projection 4 | 5 | # https://ieeexplore.ieee.org/document/9462463 6 | 7 | 8 | class LayerCAM(BaseCAM): 9 | def __init__( 10 | self, 11 | model, 12 | target_layers, 13 | use_cuda=False, 14 | reshape_transform=None): 15 | super( 16 | LayerCAM, 17 | self).__init__( 18 | model, 19 | target_layers, 20 | use_cuda, 21 | reshape_transform) 22 | 23 | def get_cam_image(self, 24 | input_tensor, 25 | target_layer, 26 | target_category, 27 | activations, 28 | grads, 29 | eigen_smooth): 30 | spatial_weighted_activations = np.maximum(grads, 0) * activations 31 | 32 | if eigen_smooth: 33 | cam = get_2d_projection(spatial_weighted_activations) 34 | else: 35 | cam = spatial_weighted_activations.sum(axis=1) 36 | return cam 37 | -------------------------------------------------------------------------------- /benchmark/deep-visual-geo-localization-benchmark/pytorch_grad_cam/metrics/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ai4ce/NYC-Indoor-VPR/36510997e724eb07caf9577128dc666b335ed7e5/benchmark/deep-visual-geo-localization-benchmark/pytorch_grad_cam/metrics/__init__.py -------------------------------------------------------------------------------- /benchmark/deep-visual-geo-localization-benchmark/pytorch_grad_cam/metrics/cam_mult_image.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import numpy as np 3 | from typing import List, Callable 4 | from pytorch_grad_cam.metrics.perturbation_confidence import PerturbationConfidenceMetric 5 | 6 | 7 | def multiply_tensor_with_cam(input_tensor: torch.Tensor, 8 | cam: torch.Tensor): 9 | """ Multiply an input tensor (after normalization) 10 | with a pixel attribution map 11 | """ 12 | 
return input_tensor * cam 13 | 14 | 15 | class CamMultImageConfidenceChange(PerturbationConfidenceMetric): 16 | def __init__(self): 17 | super(CamMultImageConfidenceChange, 18 | self).__init__(multiply_tensor_with_cam) 19 | 20 | 21 | class DropInConfidence(CamMultImageConfidenceChange): 22 | def __init__(self): 23 | super(DropInConfidence, self).__init__() 24 | 25 | def __call__(self, *args, **kwargs): 26 | scores = super(DropInConfidence, self).__call__(*args, **kwargs) 27 | scores = -scores 28 | return np.maximum(scores, 0) 29 | 30 | 31 | class IncreaseInConfidence(CamMultImageConfidenceChange): 32 | def __init__(self): 33 | super(IncreaseInConfidence, self).__init__() 34 | 35 | def __call__(self, *args, **kwargs): 36 | scores = super(IncreaseInConfidence, self).__call__(*args, **kwargs) 37 | return np.float32(scores > 0) 38 | -------------------------------------------------------------------------------- /benchmark/deep-visual-geo-localization-benchmark/pytorch_grad_cam/metrics/perturbation_confidence.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import numpy as np 3 | from typing import List, Callable 4 | 5 | import numpy as np 6 | import cv2 7 | 8 | 9 | class PerturbationConfidenceMetric: 10 | def __init__(self, perturbation): 11 | self.perturbation = perturbation 12 | 13 | def __call__(self, input_tensor: torch.Tensor, 14 | cams: np.ndarray, 15 | targets: List[Callable], 16 | model: torch.nn.Module, 17 | return_visualization=False, 18 | return_diff=True): 19 | 20 | if return_diff: 21 | with torch.no_grad(): 22 | outputs = model(input_tensor) 23 | scores = [target(output).cpu().numpy() 24 | for target, output in zip(targets, outputs)] 25 | scores = np.float32(scores) 26 | 27 | batch_size = input_tensor.size(0) 28 | perturbated_tensors = [] 29 | for i in range(batch_size): 30 | cam = cams[i] 31 | tensor = self.perturbation(input_tensor[i, ...].cpu(), 32 | torch.from_numpy(cam)) 33 | tensor = tensor.to(input_tensor.device) 34 | perturbated_tensors.append(tensor.unsqueeze(0)) 35 | perturbated_tensors = torch.cat(perturbated_tensors) 36 | 37 | with torch.no_grad(): 38 | outputs_after_imputation = model(perturbated_tensors) 39 | scores_after_imputation = [ 40 | target(output).cpu().numpy() for target, output in zip( 41 | targets, outputs_after_imputation)] 42 | scores_after_imputation = np.float32(scores_after_imputation) 43 | 44 | if return_diff: 45 | result = scores_after_imputation - scores 46 | else: 47 | result = scores_after_imputation 48 | 49 | if return_visualization: 50 | return result, perturbated_tensors 51 | else: 52 | return result 53 | 54 | 55 | class RemoveMostRelevantFirst: 56 | def __init__(self, percentile, imputer): 57 | self.percentile = percentile 58 | self.imputer = imputer 59 | 60 | def __call__(self, input_tensor, mask): 61 | imputer = self.imputer 62 | if self.percentile != 'auto': 63 | threshold = np.percentile(mask.cpu().numpy(), self.percentile) 64 | binary_mask = np.float32(mask < threshold) 65 | else: 66 | _, binary_mask = cv2.threshold( 67 | np.uint8(mask * 255), 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU) 68 | 69 | binary_mask = torch.from_numpy(binary_mask) 70 | binary_mask = binary_mask.to(mask.device) 71 | return imputer(input_tensor, binary_mask) 72 | 73 | 74 | class RemoveLeastRelevantFirst(RemoveMostRelevantFirst): 75 | def __init__(self, percentile, imputer): 76 | super(RemoveLeastRelevantFirst, self).__init__(percentile, imputer) 77 | 78 | def __call__(self, input_tensor, mask): 79 | 
return super(RemoveLeastRelevantFirst, self).__call__( 80 | input_tensor, 1 - mask) 81 | 82 | 83 | class AveragerAcrossThresholds: 84 | def __init__( 85 | self, 86 | imputer, 87 | percentiles=[ 88 | 10, 89 | 20, 90 | 30, 91 | 40, 92 | 50, 93 | 60, 94 | 70, 95 | 80, 96 | 90]): 97 | self.imputer = imputer 98 | self.percentiles = percentiles 99 | 100 | def __call__(self, 101 | input_tensor: torch.Tensor, 102 | cams: np.ndarray, 103 | targets: List[Callable], 104 | model: torch.nn.Module): 105 | scores = [] 106 | for percentile in self.percentiles: 107 | imputer = self.imputer(percentile) 108 | scores.append(imputer(input_tensor, cams, targets, model)) 109 | return np.mean(np.float32(scores), axis=0) 110 | -------------------------------------------------------------------------------- /benchmark/deep-visual-geo-localization-benchmark/pytorch_grad_cam/metrics/road.py: -------------------------------------------------------------------------------- 1 | # A Consistent and Efficient Evaluation Strategy for Attribution Methods 2 | # https://arxiv.org/abs/2202.00449 3 | # Taken from https://raw.githubusercontent.com/tleemann/road_evaluation/main/imputations.py 4 | # MIT License 5 | 6 | # Copyright (c) 2022 Tobias Leemann 7 | 8 | # Permission is hereby granted, free of charge, to any person obtaining a copy 9 | # of this software and associated documentation files (the "Software"), to deal 10 | # in the Software without restriction, including without limitation the rights 11 | # to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 12 | # copies of the Software, and to permit persons to whom the Software is 13 | # furnished to do so, subject to the following conditions: 14 | 15 | # The above copyright notice and this permission notice shall be included in all 16 | # copies or substantial portions of the Software. 17 | 18 | # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 19 | # IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 20 | # FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 21 | # AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 22 | # LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 23 | # OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 24 | # SOFTWARE. 25 | 26 | 27 | # Implementations of our imputation models. 28 | import torch 29 | import numpy as np 30 | from scipy.sparse import lil_matrix, csc_matrix 31 | from scipy.sparse.linalg import spsolve 32 | from typing import List, Callable 33 | from pytorch_grad_cam.metrics.perturbation_confidence import PerturbationConfidenceMetric, \ 34 | AveragerAcrossThresholds, \ 35 | RemoveMostRelevantFirst, \ 36 | RemoveLeastRelevantFirst 37 | 38 | # The weights of the surrounding pixels 39 | neighbors_weights = [((1, 1), 1 / 12), 40 | ((0, 1), 1 / 6), 41 | ((-1, 1), 1 / 12), 42 | ((1, -1), 1 / 12), 43 | ((0, -1), 1 / 6), 44 | ((-1, -1), 1 / 12), 45 | ((1, 0), 1 / 6), 46 | ((-1, 0), 1 / 6)] 47 | 48 | 49 | class NoisyLinearImputer: 50 | def __init__(self, 51 | noise: float = 0.01, 52 | weighting: List[float] = neighbors_weights): 53 | """ 54 | Noisy linear imputation. 55 | noise: magnitude of noise to add (absolute, set to 0 for no noise) 56 | weighting: Weights of the neighboring pixels in the computation. 
57 | List of tuples of (offset, weight) 58 | """ 59 | self.noise = noise 60 | self.weighting = neighbors_weights 61 | 62 | @staticmethod 63 | def add_offset_to_indices(indices, offset, mask_shape): 64 | """ Add the corresponding offset to the indices. 65 | Return new indices plus a valid bit-vector. """ 66 | cord1 = indices % mask_shape[1] 67 | cord0 = indices // mask_shape[1] 68 | cord0 += offset[0] 69 | cord1 += offset[1] 70 | valid = ((cord0 < 0) | (cord1 < 0) | 71 | (cord0 >= mask_shape[0]) | 72 | (cord1 >= mask_shape[1])) 73 | return ~valid, indices + offset[0] * mask_shape[1] + offset[1] 74 | 75 | @staticmethod 76 | def setup_sparse_system(mask, img, neighbors_weights): 77 | """ Vectorized version to set up the equation system. 78 | mask: (H, W)-tensor of missing pixels. 79 | Image: (H, W, C)-tensor of all values. 80 | Return (N,N)-System matrix, (N,C)-Right hand side for each of the C channels. 81 | """ 82 | maskflt = mask.flatten() 83 | imgflat = img.reshape((img.shape[0], -1)) 84 | # Indices that are imputed in the flattened mask: 85 | indices = np.argwhere(maskflt == 0).flatten() 86 | coords_to_vidx = np.zeros(len(maskflt), dtype=int) 87 | coords_to_vidx[indices] = np.arange(len(indices)) 88 | numEquations = len(indices) 89 | # System matrix: 90 | A = lil_matrix((numEquations, numEquations)) 91 | b = np.zeros((numEquations, img.shape[0])) 92 | # Sum of weights assigned: 93 | sum_neighbors = np.ones(numEquations) 94 | for n in neighbors_weights: 95 | offset, weight = n[0], n[1] 96 | # Take out outliers 97 | valid, new_coords = NoisyLinearImputer.add_offset_to_indices( 98 | indices, offset, mask.shape) 99 | valid_coords = new_coords[valid] 100 | valid_ids = np.argwhere(valid == 1).flatten() 101 | # Add values to the right hand-side 102 | has_values_coords = valid_coords[maskflt[valid_coords] > 0.5] 103 | has_values_ids = valid_ids[maskflt[valid_coords] > 0.5] 104 | b[has_values_ids, :] -= weight * imgflat[:, has_values_coords].T 105 | # Add weights to the system (left hand side) 106 | # Find coordinates in the system. 107 | has_no_values = valid_coords[maskflt[valid_coords] < 0.5] 108 | variable_ids = coords_to_vidx[has_no_values] 109 | has_no_values_ids = valid_ids[maskflt[valid_coords] < 0.5] 110 | A[has_no_values_ids, variable_ids] = weight 111 | # Reduce weight for invalid 112 | sum_neighbors[np.argwhere(valid == 0).flatten()] = \ 113 | sum_neighbors[np.argwhere(valid == 0).flatten()] - weight 114 | 115 | A[np.arange(numEquations), np.arange(numEquations)] = -sum_neighbors 116 | return A, b 117 | 118 | def __call__(self, img: torch.Tensor, mask: torch.Tensor): 119 | """ Our linear inputation scheme. """ 120 | """ 121 | This is the function to do the linear infilling 122 | img: original image (C,H,W)-tensor; 123 | mask: mask; (H,W)-tensor 124 | 125 | """ 126 | imgflt = img.reshape(img.shape[0], -1) 127 | maskflt = mask.reshape(-1) 128 | # Indices that need to be imputed. 129 | indices_linear = np.argwhere(maskflt == 0).flatten() 130 | # Set up sparse equation system, solve system. 131 | A, b = NoisyLinearImputer.setup_sparse_system( 132 | mask.numpy(), img.numpy(), neighbors_weights) 133 | res = torch.tensor(spsolve(csc_matrix(A), b), dtype=torch.float) 134 | 135 | # Fill the values with the solution of the system. 
136 | img_infill = imgflt.clone() 137 | img_infill[:, indices_linear] = res.t() + self.noise * \ 138 | torch.randn_like(res.t()) 139 | 140 | return img_infill.reshape_as(img) 141 | 142 | 143 | class ROADMostRelevantFirst(PerturbationConfidenceMetric): 144 | def __init__(self, percentile=80): 145 | super(ROADMostRelevantFirst, self).__init__( 146 | RemoveMostRelevantFirst(percentile, NoisyLinearImputer())) 147 | 148 | 149 | class ROADLeastRelevantFirst(PerturbationConfidenceMetric): 150 | def __init__(self, percentile=20): 151 | super(ROADLeastRelevantFirst, self).__init__( 152 | RemoveLeastRelevantFirst(percentile, NoisyLinearImputer())) 153 | 154 | 155 | class ROADMostRelevantFirstAverage(AveragerAcrossThresholds): 156 | def __init__(self, percentiles=[10, 20, 30, 40, 50, 60, 70, 80, 90]): 157 | super(ROADMostRelevantFirstAverage, self).__init__( 158 | ROADMostRelevantFirst, percentiles) 159 | 160 | 161 | class ROADLeastRelevantFirstAverage(AveragerAcrossThresholds): 162 | def __init__(self, percentiles=[10, 20, 30, 40, 50, 60, 70, 80, 90]): 163 | super(ROADLeastRelevantFirstAverage, self).__init__( 164 | ROADLeastRelevantFirst, percentiles) 165 | 166 | 167 | class ROADCombined: 168 | def __init__(self, percentiles=[10, 20, 30, 40, 50, 60, 70, 80, 90]): 169 | self.percentiles = percentiles 170 | self.morf_averager = ROADMostRelevantFirstAverage(percentiles) 171 | self.lerf_averager = ROADLeastRelevantFirstAverage(percentiles) 172 | 173 | def __call__(self, 174 | input_tensor: torch.Tensor, 175 | cams: np.ndarray, 176 | targets: List[Callable], 177 | model: torch.nn.Module): 178 | 179 | scores_lerf = self.lerf_averager(input_tensor, cams, targets, model) 180 | scores_morf = self.morf_averager(input_tensor, cams, targets, model) 181 | return (scores_lerf - scores_morf) / 2 182 | -------------------------------------------------------------------------------- /benchmark/deep-visual-geo-localization-benchmark/pytorch_grad_cam/random_cam.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | from pytorch_grad_cam.base_cam import BaseCAM 3 | 4 | 5 | class RandomCAM(BaseCAM): 6 | def __init__(self, model, target_layers, use_cuda=False, 7 | reshape_transform=None): 8 | super( 9 | RandomCAM, 10 | self).__init__( 11 | model, 12 | target_layers, 13 | use_cuda, 14 | reshape_transform) 15 | 16 | def get_cam_weights(self, 17 | input_tensor, 18 | target_layer, 19 | target_category, 20 | activations, 21 | grads): 22 | return np.random.uniform(-1, 1, size=(grads.shape[0], grads.shape[1])) 23 | -------------------------------------------------------------------------------- /benchmark/deep-visual-geo-localization-benchmark/pytorch_grad_cam/score_cam.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import tqdm 3 | from pytorch_grad_cam.base_cam import BaseCAM 4 | 5 | 6 | class ScoreCAM(BaseCAM): 7 | def __init__( 8 | self, 9 | model, 10 | target_layers, 11 | use_cuda=False, 12 | reshape_transform=None): 13 | super(ScoreCAM, self).__init__(model, 14 | target_layers, 15 | use_cuda, 16 | reshape_transform=reshape_transform, 17 | uses_gradients=False) 18 | 19 | def get_cam_weights(self, 20 | input_tensor, 21 | target_layer, 22 | targets, 23 | activations, 24 | grads): 25 | with torch.no_grad(): 26 | upsample = torch.nn.UpsamplingBilinear2d( 27 | size=input_tensor.shape[-2:]) 28 | activation_tensor = torch.from_numpy(activations) 29 | if self.cuda: 30 | activation_tensor = activation_tensor.cuda() 31 
| 32 | upsampled = upsample(activation_tensor) 33 | 34 | maxs = upsampled.view(upsampled.size(0), 35 | upsampled.size(1), -1).max(dim=-1)[0] 36 | mins = upsampled.view(upsampled.size(0), 37 | upsampled.size(1), -1).min(dim=-1)[0] 38 | 39 | maxs, mins = maxs[:, :, None, None], mins[:, :, None, None] 40 | upsampled = (upsampled - mins) / (maxs - mins) 41 | 42 | input_tensors = input_tensor[:, None, 43 | :, :] * upsampled[:, :, None, :, :] 44 | 45 | if hasattr(self, "batch_size"): 46 | BATCH_SIZE = self.batch_size 47 | else: 48 | BATCH_SIZE = 16 49 | 50 | scores = [] 51 | for target, tensor in zip(targets, input_tensors): 52 | for i in tqdm.tqdm(range(0, tensor.size(0), BATCH_SIZE)): 53 | batch = tensor[i: i + BATCH_SIZE, :] 54 | outputs = [target(o).cpu().item() 55 | for o in self.model(batch)] 56 | scores.extend(outputs) 57 | scores = torch.Tensor(scores) 58 | scores = scores.view(activations.shape[0], activations.shape[1]) 59 | weights = torch.nn.Softmax(dim=-1)(scores).numpy() 60 | return weights 61 | -------------------------------------------------------------------------------- /benchmark/deep-visual-geo-localization-benchmark/pytorch_grad_cam/sobel_cam.py: -------------------------------------------------------------------------------- 1 | import cv2 2 | 3 | 4 | def sobel_cam(img): 5 | gray = cv2.cvtColor(img, cv2.COLOR_RGB2GRAY) 6 | grad_x = cv2.Sobel(gray, cv2.CV_64F, 1, 0, ksize=3) 7 | grad_y = cv2.Sobel(gray, cv2.CV_64F, 0, 1, ksize=3) 8 | abs_grad_x = cv2.convertScaleAbs(grad_x) 9 | abs_grad_y = cv2.convertScaleAbs(grad_y) 10 | grad = cv2.addWeighted(abs_grad_x, 0.5, abs_grad_y, 0.5, 0) 11 | return grad 12 | -------------------------------------------------------------------------------- /benchmark/deep-visual-geo-localization-benchmark/pytorch_grad_cam/utils/__init__.py: -------------------------------------------------------------------------------- 1 | from pytorch_grad_cam.utils.image import deprocess_image 2 | from pytorch_grad_cam.utils.svd_on_activations import get_2d_projection 3 | from pytorch_grad_cam.utils import model_targets 4 | from pytorch_grad_cam.utils import reshape_transforms 5 | -------------------------------------------------------------------------------- /benchmark/deep-visual-geo-localization-benchmark/pytorch_grad_cam/utils/find_layers.py: -------------------------------------------------------------------------------- 1 | def replace_layer_recursive(model, old_layer, new_layer): 2 | for name, layer in model._modules.items(): 3 | if layer == old_layer: 4 | model._modules[name] = new_layer 5 | return True 6 | elif replace_layer_recursive(layer, old_layer, new_layer): 7 | return True 8 | return False 9 | 10 | 11 | def replace_all_layer_type_recursive(model, old_layer_type, new_layer): 12 | for name, layer in model._modules.items(): 13 | if isinstance(layer, old_layer_type): 14 | model._modules[name] = new_layer 15 | replace_all_layer_type_recursive(layer, old_layer_type, new_layer) 16 | 17 | 18 | def find_layer_types_recursive(model, layer_types): 19 | def predicate(layer): 20 | return type(layer) in layer_types 21 | return find_layer_predicate_recursive(model, predicate) 22 | 23 | 24 | def find_layer_predicate_recursive(model, predicate): 25 | result = [] 26 | for name, layer in model._modules.items(): 27 | if predicate(layer): 28 | result.append(layer) 29 | result.extend(find_layer_predicate_recursive(layer, predicate)) 30 | return result 31 | -------------------------------------------------------------------------------- 
/benchmark/deep-visual-geo-localization-benchmark/pytorch_grad_cam/utils/image.py: -------------------------------------------------------------------------------- 1 | import matplotlib 2 | from matplotlib import pyplot as plt 3 | from matplotlib.lines import Line2D 4 | import cv2 5 | import numpy as np 6 | import torch 7 | from torchvision.transforms import Compose, Normalize, ToTensor 8 | from typing import List, Dict 9 | import math 10 | 11 | 12 | def preprocess_image( 13 | img: np.ndarray, mean=[ 14 | 0.5, 0.5, 0.5], std=[ 15 | 0.5, 0.5, 0.5]) -> torch.Tensor: 16 | preprocessing = Compose([ 17 | ToTensor(), 18 | Normalize(mean=mean, std=std) 19 | ]) 20 | return preprocessing(img.copy()).unsqueeze(0) 21 | 22 | 23 | def deprocess_image(img): 24 | """ see https://github.com/jacobgil/keras-grad-cam/blob/master/grad-cam.py#L65 """ 25 | img = img - np.mean(img) 26 | img = img / (np.std(img) + 1e-5) 27 | img = img * 0.1 28 | img = img + 0.5 29 | img = np.clip(img, 0, 1) 30 | return np.uint8(img * 255) 31 | 32 | 33 | def show_cam_on_image(img: np.ndarray, 34 | mask: np.ndarray, 35 | use_rgb: bool = False, 36 | colormap: int = cv2.COLORMAP_JET, 37 | image_weight: float = 0.5) -> np.ndarray: 38 | """ This function overlays the cam mask on the image as an heatmap. 39 | By default the heatmap is in BGR format. 40 | 41 | :param img: The base image in RGB or BGR format. 42 | :param mask: The cam mask. 43 | :param use_rgb: Whether to use an RGB or BGR heatmap, this should be set to True if 'img' is in RGB format. 44 | :param colormap: The OpenCV colormap to be used. 45 | :param image_weight: The final result is image_weight * img + (1-image_weight) * mask. 46 | :returns: The default image with the cam overlay. 47 | """ 48 | heatmap = cv2.applyColorMap(np.uint8(255 * mask), colormap) 49 | if use_rgb: 50 | heatmap = cv2.cvtColor(heatmap, cv2.COLOR_BGR2RGB) 51 | heatmap = np.float32(heatmap) / 255 52 | 53 | if np.max(img) > 1: 54 | raise Exception( 55 | "The input image should np.float32 in the range [0, 1]") 56 | 57 | if image_weight < 0 or image_weight > 1: 58 | raise Exception( 59 | f"image_weight should be in the range [0, 1].\ 60 | Got: {image_weight}") 61 | 62 | cam = (1 - image_weight) * heatmap + image_weight * img 63 | cam = cam / np.max(cam) 64 | return np.uint8(255 * cam) 65 | 66 | 67 | def create_labels_legend(concept_scores: np.ndarray, 68 | labels: Dict[int, str], 69 | top_k=2): 70 | concept_categories = np.argsort(concept_scores, axis=1)[:, ::-1][:, :top_k] 71 | concept_labels_topk = [] 72 | for concept_index in range(concept_categories.shape[0]): 73 | categories = concept_categories[concept_index, :] 74 | concept_labels = [] 75 | for category in categories: 76 | score = concept_scores[concept_index, category] 77 | label = f"{','.join(labels[category].split(',')[:3])}:{score:.2f}" 78 | concept_labels.append(label) 79 | concept_labels_topk.append("\n".join(concept_labels)) 80 | return concept_labels_topk 81 | 82 | 83 | def show_factorization_on_image(img: np.ndarray, 84 | explanations: np.ndarray, 85 | colors: List[np.ndarray] = None, 86 | image_weight: float = 0.5, 87 | concept_labels: List = None) -> np.ndarray: 88 | """ Color code the different component heatmaps on top of the image. 89 | Every component color code will be magnified according to the heatmap itensity 90 | (by modifying the V channel in the HSV color space), 91 | and optionally create a lagend that shows the labels. 
92 | 93 | Since different factorization component heatmaps can overlap in principle, 94 | we need a strategy to decide how to deal with the overlaps. 95 | This keeps the component that has a higher value in it's heatmap. 96 | 97 | :param img: The base image RGB format. 98 | :param explanations: A tensor of shape num_componetns x height x width, with the component visualizations. 99 | :param colors: List of R, G, B colors to be used for the components. 100 | If None, will use the gist_rainbow cmap as a default. 101 | :param image_weight: The final result is image_weight * img + (1-image_weight) * visualization. 102 | :concept_labels: A list of strings for every component. If this is paseed, a legend that shows 103 | the labels and their colors will be added to the image. 104 | :returns: The visualized image. 105 | """ 106 | n_components = explanations.shape[0] 107 | if colors is None: 108 | # taken from https://github.com/edocollins/DFF/blob/master/utils.py 109 | _cmap = plt.cm.get_cmap('gist_rainbow') 110 | colors = [ 111 | np.array( 112 | _cmap(i)) for i in np.arange( 113 | 0, 114 | 1, 115 | 1.0 / 116 | n_components)] 117 | concept_per_pixel = explanations.argmax(axis=0) 118 | masks = [] 119 | for i in range(n_components): 120 | mask = np.zeros(shape=(img.shape[0], img.shape[1], 3)) 121 | mask[:, :, :] = colors[i][:3] 122 | explanation = explanations[i] 123 | explanation[concept_per_pixel != i] = 0 124 | mask = np.uint8(mask * 255) 125 | mask = cv2.cvtColor(mask, cv2.COLOR_RGB2HSV) 126 | mask[:, :, 2] = np.uint8(255 * explanation) 127 | mask = cv2.cvtColor(mask, cv2.COLOR_HSV2RGB) 128 | mask = np.float32(mask) / 255 129 | masks.append(mask) 130 | 131 | mask = np.sum(np.float32(masks), axis=0) 132 | result = img * image_weight + mask * (1 - image_weight) 133 | result = np.uint8(result * 255) 134 | 135 | if concept_labels is not None: 136 | px = 1 / plt.rcParams['figure.dpi'] # pixel in inches 137 | fig = plt.figure(figsize=(result.shape[1] * px, result.shape[0] * px)) 138 | plt.rcParams['legend.fontsize'] = int( 139 | 14 * result.shape[0] / 256 / max(1, n_components / 6)) 140 | lw = 5 * result.shape[0] / 256 141 | lines = [Line2D([0], [0], color=colors[i], lw=lw) 142 | for i in range(n_components)] 143 | plt.legend(lines, 144 | concept_labels, 145 | mode="expand", 146 | fancybox=True, 147 | shadow=True) 148 | 149 | plt.tight_layout(pad=0, w_pad=0, h_pad=0) 150 | plt.axis('off') 151 | fig.canvas.draw() 152 | data = np.frombuffer(fig.canvas.tostring_rgb(), dtype=np.uint8) 153 | plt.close(fig=fig) 154 | data = data.reshape(fig.canvas.get_width_height()[::-1] + (3,)) 155 | data = cv2.resize(data, (result.shape[1], result.shape[0])) 156 | result = np.hstack((result, data)) 157 | return result 158 | 159 | 160 | def scale_cam_image(cam, target_size=None): 161 | result = [] 162 | for img in cam: 163 | img = img - np.min(img) 164 | img = img / (1e-7 + np.max(img)) 165 | if target_size is not None: 166 | img = cv2.resize(img, target_size) 167 | result.append(img) 168 | result = np.float32(result) 169 | 170 | return result 171 | 172 | 173 | def scale_accross_batch_and_channels(tensor, target_size): 174 | batch_size, channel_size = tensor.shape[:2] 175 | reshaped_tensor = tensor.reshape( 176 | batch_size * channel_size, *tensor.shape[2:]) 177 | result = scale_cam_image(reshaped_tensor, target_size) 178 | result = result.reshape( 179 | batch_size, 180 | channel_size, 181 | target_size[1], 182 | target_size[0]) 183 | return result 184 | 
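
The image utilities above (preprocess_image, show_cam_on_image, scale_cam_image) are normally combined with one of the CAM classes dumped earlier in this folder. The following is a minimal, hypothetical usage sketch and is not part of the repository: the torchvision ResNet-18 backbone, the layer4[-1] target layer, the example.jpg path, and the ImageNet class index 281 are all assumptions chosen purely for illustration.

import cv2
import numpy as np
import torchvision

from pytorch_grad_cam.grad_cam import GradCAM
from pytorch_grad_cam.utils.model_targets import ClassifierOutputTarget
from pytorch_grad_cam.utils.image import preprocess_image, show_cam_on_image

# Hypothetical inputs: any RGB image and any ImageNet classifier will do.
rgb = cv2.cvtColor(cv2.imread("example.jpg"), cv2.COLOR_BGR2RGB)
rgb = cv2.resize(rgb, (224, 224))
rgb_float = np.float32(rgb) / 255.0  # show_cam_on_image expects floats in [0, 1]

model = torchvision.models.resnet18(pretrained=True).eval()
input_tensor = preprocess_image(rgb_float,
                                mean=[0.485, 0.456, 0.406],
                                std=[0.229, 0.224, 0.225])

# One target layer (the last residual block) and one classification target.
cam = GradCAM(model=model, target_layers=[model.layer4[-1]])
grayscale_cam = cam(input_tensor=input_tensor,
                    targets=[ClassifierOutputTarget(281)])  # assumed class index

# Overlay the (H, W) saliency map of the first batch element on the image.
heatmap = show_cam_on_image(rgb_float, grayscale_cam[0, :], use_rgb=True)
cv2.imwrite("cam_overlay.jpg", cv2.cvtColor(heatmap, cv2.COLOR_RGB2BGR))

The same pattern applies to the other CAM variants in this folder (AblationCAM, ScoreCAM, EigenCAM, etc.), which share the BaseCAM constructor and call signature; only the weighting strategy differs.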
-------------------------------------------------------------------------------- /benchmark/deep-visual-geo-localization-benchmark/pytorch_grad_cam/utils/model_targets.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | import torch 3 | import torchvision 4 | 5 | 6 | class ClassifierOutputTarget: 7 | def __init__(self, category): 8 | self.category = category 9 | 10 | def __call__(self, model_output): 11 | if len(model_output.shape) == 1: 12 | return model_output[self.category] 13 | return model_output[:, self.category] 14 | 15 | 16 | class ClassifierOutputSoftmaxTarget: 17 | def __init__(self, category): 18 | self.category = category 19 | 20 | def __call__(self, model_output): 21 | if len(model_output.shape) == 1: 22 | return torch.softmax(model_output, dim=-1)[self.category] 23 | return torch.softmax(model_output, dim=-1)[:, self.category] 24 | 25 | 26 | class BinaryClassifierOutputTarget: 27 | def __init__(self, category): 28 | self.category = category 29 | 30 | def __call__(self, model_output): 31 | if self.category == 1: 32 | sign = 1 33 | else: 34 | sign = -1 35 | return model_output * sign 36 | 37 | 38 | class SoftmaxOutputTarget: 39 | def __init__(self): 40 | pass 41 | 42 | def __call__(self, model_output): 43 | return torch.softmax(model_output, dim=-1) 44 | 45 | 46 | class RawScoresOutputTarget: 47 | def __init__(self): 48 | pass 49 | 50 | def __call__(self, model_output): 51 | return model_output 52 | 53 | 54 | class SemanticSegmentationTarget: 55 | """ Gets a binary spatial mask and a category, 56 | and returns the sum of the category scores 57 | of the pixels in the mask. """ 58 | 59 | def __init__(self, category, mask): 60 | self.category = category 61 | self.mask = torch.from_numpy(mask) 62 | if torch.cuda.is_available(): 63 | self.mask = self.mask.cuda() 64 | 65 | def __call__(self, model_output): 66 | return (model_output[self.category, :, :] * self.mask).sum() 67 | 68 | 69 | class FasterRCNNBoxScoreTarget: 70 | """ For every original detected bounding box specified in "bounding boxes", 71 | assign a score for how well the current bounding boxes match it, 72 | 1. In IOU 73 | 2. In the classification score. 74 | If there is not a large enough overlap, or the category changed, 75 | assign a score of 0. 76 | 77 | The total score is the sum of all the box scores. 
78 | """ 79 | 80 | def __init__(self, labels, bounding_boxes, iou_threshold=0.5): 81 | self.labels = labels 82 | self.bounding_boxes = bounding_boxes 83 | self.iou_threshold = iou_threshold 84 | 85 | def __call__(self, model_outputs): 86 | output = torch.Tensor([0]) 87 | if torch.cuda.is_available(): 88 | output = output.cuda() 89 | 90 | if len(model_outputs["boxes"]) == 0: 91 | return output 92 | 93 | for box, label in zip(self.bounding_boxes, self.labels): 94 | box = torch.Tensor(box[None, :]) 95 | if torch.cuda.is_available(): 96 | box = box.cuda() 97 | 98 | ious = torchvision.ops.box_iou(box, model_outputs["boxes"]) 99 | index = ious.argmax() 100 | if ious[0, index] > self.iou_threshold and model_outputs["labels"][index] == label: 101 | score = ious[0, index] + model_outputs["scores"][index] 102 | output = output + score 103 | return output 104 | -------------------------------------------------------------------------------- /benchmark/deep-visual-geo-localization-benchmark/pytorch_grad_cam/utils/reshape_transforms.py: -------------------------------------------------------------------------------- 1 | import torch 2 | 3 | 4 | def fasterrcnn_reshape_transform(x): 5 | target_size = x['pool'].size()[-2:] 6 | activations = [] 7 | for key, value in x.items(): 8 | activations.append( 9 | torch.nn.functional.interpolate( 10 | torch.abs(value), 11 | target_size, 12 | mode='bilinear')) 13 | activations = torch.cat(activations, axis=1) 14 | return activations 15 | 16 | 17 | def swinT_reshape_transform(tensor, height=7, width=7): 18 | result = tensor.reshape(tensor.size(0), 19 | height, width, tensor.size(2)) 20 | 21 | # Bring the channels to the first dimension, 22 | # like in CNNs. 23 | result = result.transpose(2, 3).transpose(1, 2) 24 | return result 25 | 26 | 27 | def vit_reshape_transform(tensor, height=14, width=14): 28 | result = tensor[:, 1:, :].reshape(tensor.size(0), 29 | height, width, tensor.size(2)) 30 | 31 | # Bring the channels to the first dimension, 32 | # like in CNNs. 
33 | result = result.transpose(2, 3).transpose(1, 2) 34 | return result 35 | -------------------------------------------------------------------------------- /benchmark/deep-visual-geo-localization-benchmark/pytorch_grad_cam/utils/svd_on_activations.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | 3 | 4 | def get_2d_projection(activation_batch): 5 | # TBD: use pytorch batch svd implementation 6 | activation_batch[np.isnan(activation_batch)] = 0 7 | projections = [] 8 | for activations in activation_batch: 9 | reshaped_activations = (activations).reshape( 10 | activations.shape[0], -1).transpose() 11 | # Centering before the SVD seems to be important here, 12 | # Otherwise the image returned is negative 13 | reshaped_activations = reshaped_activations - \ 14 | reshaped_activations.mean(axis=0) 15 | U, S, VT = np.linalg.svd(reshaped_activations, full_matrices=True) 16 | projection = reshaped_activations @ VT[0, :] 17 | projection = projection.reshape(activations.shape[1:]) 18 | projections.append(projection) 19 | return np.float32(projections) 20 | -------------------------------------------------------------------------------- /benchmark/deep-visual-geo-localization-benchmark/pytorch_grad_cam/xgrad_cam.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | from pytorch_grad_cam.base_cam import BaseCAM 3 | 4 | 5 | class XGradCAM(BaseCAM): 6 | def __init__( 7 | self, 8 | model, 9 | target_layers, 10 | use_cuda=False, 11 | reshape_transform=None): 12 | super( 13 | XGradCAM, 14 | self).__init__( 15 | model, 16 | target_layers, 17 | use_cuda, 18 | reshape_transform) 19 | 20 | def get_cam_weights(self, 21 | input_tensor, 22 | target_layer, 23 | target_category, 24 | activations, 25 | grads): 26 | sum_activations = np.sum(activations, axis=(2, 3)) 27 | eps = 1e-7 28 | weights = grads * activations / \ 29 | (sum_activations[:, :, None, None] + eps) 30 | weights = weights.sum(axis=(2, 3)) 31 | return weights 32 | -------------------------------------------------------------------------------- /benchmark/deep-visual-geo-localization-benchmark/requirements.txt: -------------------------------------------------------------------------------- 1 | numpy==1.19.4 2 | psutil==5.6.7 3 | faiss_cpu 4 | tqdm==4.48.2 5 | Pillow==8.2.0 6 | scikit_learn==0.24.1 7 | torchscan==0.1.1 8 | googledrivedownloader==0.4 9 | requests==2.26.0 10 | timm==0.4.12 11 | transformers==4.10.2 12 | einops 13 | -------------------------------------------------------------------------------- /benchmark/deep-visual-geo-localization-benchmark/results/result.txt: -------------------------------------------------------------------------------- 1 | ResNet-NetVLAD: R@1: 39.4, R@5: 83.9, R@10: 93.1, R@20: 97.6 2 | CCT-NetVLAD: R@1: 38.7, R@5: 81.6, R@10: 93.4, R@20: 97.7 3 | MixVPR:R@1: 41.3, R@5: 83.1, R@10: 93.8, R@20: 97.9 4 | CosPlace: R@1: 29.1, R@5: 73.9, R@10: 88.5, R@20: 96.4 5 | AnyLoc: R@1: 37.4, R@5: 81.0, R@10: 92.5, R@20: 97.8 -------------------------------------------------------------------------------- /benchmark/deep-visual-geo-localization-benchmark/sbatch.txt: -------------------------------------------------------------------------------- 1 | python train.py --dataset_name=compressed_vid --datasets_folder=/scratch/ds5725/VPR-datasets-downloader/datasets --resume=/scratch/ds5725/deep-visual-geo-localization-benchmark/logs/default/2023-04-22_19-42-06/best_model.pth 2 | python train.py 
--dataset_name=indoor --datasets_folder=/mnt/data/nyc_indoor --backbone=resnet50conv4 3 | 4 | python train.py --dataset_name=nyu-vpr --datasets_folder=/scratch/ds5725/VPR-datasets-downloader/datasets --backbone= -------------------------------------------------------------------------------- /benchmark/deep-visual-geo-localization-benchmark/scratch.py: -------------------------------------------------------------------------------- 1 | import torch 2 | 3 | best_model_state_dict = torch.load(join(args.save_dir, "/scratch/ds5725/deep-visual-geo-localization-benchmark/logs/default/2023-04-22_19-42-06/best_model.pth"))["model_state_dict"] 4 | 5 | model.load_state_dict(best_model_state_dict) 6 | -------------------------------------------------------------------------------- /benchmark/deep-visual-geo-localization-benchmark/summary.py: -------------------------------------------------------------------------------- 1 | import os 2 | import util 3 | import cv2 4 | from tqdm import tqdm 5 | 6 | folder_path = '/scratch/ds5725/VPR-datasets-downloader/datasets/nyu-vpr/images/test/queries' 7 | 8 | file_list = os.listdir(folder_path) 9 | 10 | full_path_list = [os.path.join(folder_path, filename) for filename in file_list] 11 | -------------------------------------------------------------------------------- /benchmark/deep-visual-geo-localization-benchmark/test.SBATCH: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | #SBATCH --nodes=1 4 | #SBATCH --ntasks-per-node=4 5 | #SBATCH --cpus-per-task=1 6 | #SBATCH --mem-per-cpu=64GB 7 | #SBATCH --time=24:00:00 8 | #SBATCH --gres=gpu 9 | #SBATCH --job-name=res 10 | 11 | module purge 12 | 13 | singularity exec --nv \ 14 | --overlay /scratch/ds5725/environments/mixvpr.ext3:rw \ 15 | /scratch/work/public/singularity/cuda11.1-cudnn8-devel-ubuntu18.04.sif \ 16 | /bin/bash -c "source /ext3/env.sh; python train_1.py --dataset_name=indoor --datasets_folder=/mnt/data/nyc_indoor --backb" 17 | -------------------------------------------------------------------------------- /benchmark/deep-visual-geo-localization-benchmark/util.py: -------------------------------------------------------------------------------- 1 | 2 | import re 3 | import torch 4 | import shutil 5 | import logging 6 | import torchscan 7 | import numpy as np 8 | from collections import OrderedDict 9 | from os.path import join 10 | from sklearn.decomposition import PCA 11 | 12 | import datasets_ws 13 | 14 | 15 | def get_flops(model, input_shape=(480, 640)): 16 | """Return the FLOPs as a string, such as '22.33 GFLOPs'""" 17 | assert len(input_shape) == 2, f"input_shape should have len==2, but it's {input_shape}" 18 | module_info = torchscan.crawl_module(model, (3, input_shape[0], input_shape[1])) 19 | output = torchscan.utils.format_info(module_info) 20 | return re.findall("Floating Point Operations on forward: (.*)\n", output)[0] 21 | 22 | 23 | def save_checkpoint(args, state, is_best, filename): 24 | model_path = join(args.save_dir, filename) 25 | torch.save(state, model_path) 26 | if is_best: 27 | shutil.copyfile(model_path, join(args.save_dir, "best_model.pth")) 28 | 29 | 30 | def resume_model(args, model): 31 | checkpoint = torch.load(args.resume, map_location=args.device) 32 | if 'model_state_dict' in checkpoint: 33 | state_dict = checkpoint['model_state_dict'] 34 | else: 35 | # The pre-trained models that we provide in the README do not have 'state_dict' in the keys as 36 | # the checkpoint is directly the state dict 37 | state_dict = checkpoint 
38 | # if the model contains the prefix "module" which is appended by 39 | # DataParallel, remove it to avoid errors when loading dict 40 | if list(state_dict.keys())[0].startswith('module'): 41 | state_dict = OrderedDict({k.replace('module.', ''): v for (k, v) in state_dict.items()}) 42 | model.load_state_dict(state_dict) 43 | return model 44 | 45 | 46 | def resume_train(args, model, optimizer=None, strict=False): 47 | """Load model, optimizer, and other training parameters""" 48 | logging.debug(f"Loading checkpoint: {args.resume}") 49 | checkpoint = torch.load(args.resume) 50 | start_epoch_num = checkpoint["epoch_num"] 51 | model.load_state_dict(checkpoint["model_state_dict"], strict=strict) 52 | if optimizer: 53 | optimizer.load_state_dict(checkpoint["optimizer_state_dict"]) 54 | best_r5 = checkpoint["best_r5"] 55 | not_improved_num = checkpoint["not_improved_num"] 56 | logging.debug(f"Loaded checkpoint: start_epoch_num = {start_epoch_num}, " 57 | f"current_best_R@5 = {best_r5:.1f}") 58 | if args.resume.endswith("last_model.pth"): # Copy best model to current save_dir 59 | shutil.copy(args.resume.replace("last_model.pth", "best_model.pth"), args.save_dir) 60 | return model, optimizer, best_r5, start_epoch_num, not_improved_num 61 | 62 | 63 | def compute_pca(args, model, pca_dataset_folder, full_features_dim): 64 | model = model.eval() 65 | pca_ds = datasets_ws.PCADataset(args, args.datasets_folder, pca_dataset_folder) 66 | dl = torch.utils.data.DataLoader(pca_ds, args.infer_batch_size, shuffle=True) 67 | pca_features = np.empty([min(len(pca_ds), 2**14), full_features_dim]) 68 | with torch.no_grad(): 69 | for i, images in enumerate(dl): 70 | if i*args.infer_batch_size >= len(pca_features): 71 | break 72 | features = model(images).cpu().numpy() 73 | pca_features[i*args.infer_batch_size : (i*args.infer_batch_size)+len(features)] = features 74 | pca = PCA(args.pca_dim) 75 | pca.fit(pca_features) 76 | return pca 77 | -------------------------------------------------------------------------------- /benchmark/deep-visual-geo-localization-benchmark/utils/__init__.py: -------------------------------------------------------------------------------- 1 | from .losses import get_miner, get_loss 2 | from .validation import get_validation_recalls 3 | -------------------------------------------------------------------------------- /benchmark/deep-visual-geo-localization-benchmark/utils/losses.py: -------------------------------------------------------------------------------- 1 | from pytorch_metric_learning import losses, miners 2 | from pytorch_metric_learning.distances import CosineSimilarity, DotProductSimilarity 3 | 4 | def get_loss(loss_name): 5 | if loss_name == 'SupConLoss': return losses.SupConLoss(temperature=0.07) 6 | if loss_name == 'CircleLoss': return losses.CircleLoss(m=0.4, gamma=80) #these are params for image retrieval 7 | if loss_name == 'MultiSimilarityLoss': return losses.MultiSimilarityLoss(alpha=1.0, beta=50, base=0.0, distance=DotProductSimilarity()) 8 | if loss_name == 'ContrastiveLoss': return losses.ContrastiveLoss(pos_margin=0, neg_margin=1) 9 | if loss_name == 'Lifted': return losses.GeneralizedLiftedStructureLoss(neg_margin=0, pos_margin=1, distance=DotProductSimilarity()) 10 | if loss_name == 'FastAPLoss': return losses.FastAPLoss(num_bins=30) 11 | if loss_name == 'NTXentLoss': return losses.NTXentLoss(temperature=0.07) #The MoCo paper uses 0.07, while SimCLR uses 0.5. 
12 | if loss_name == 'TripletMarginLoss': return losses.TripletMarginLoss(margin=0.1, swap=False, smooth_loss=False, triplets_per_anchor='all') #or an int, for example 100 13 | if loss_name == 'CentroidTripletLoss': return losses.CentroidTripletLoss(margin=0.05, 14 | swap=False, 15 | smooth_loss=False, 16 | triplets_per_anchor="all",) 17 | raise NotImplementedError(f'Sorry, <{loss_name}> loss function is not implemented!') 18 | 19 | def get_miner(miner_name, margin=0.1): 20 | if miner_name == 'TripletMarginMiner' : return miners.TripletMarginMiner(margin=margin, type_of_triplets="semihard") # all, hard, semihard, easy 21 | if miner_name == 'MultiSimilarityMiner' : return miners.MultiSimilarityMiner(epsilon=margin, distance=CosineSimilarity()) 22 | if miner_name == 'PairMarginMiner' : return miners.PairMarginMiner(pos_margin=0.7, neg_margin=0.3, distance=DotProductSimilarity()) 23 | return None 24 | -------------------------------------------------------------------------------- /benchmark/deep-visual-geo-localization-benchmark/utils/validation.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | import faiss 3 | import faiss.contrib.torch_utils 4 | from prettytable import PrettyTable 5 | 6 | 7 | def get_validation_recalls(r_list, q_list, k_values, gt, print_results=True, faiss_gpu=False, dataset_name='dataset without name ?'): 8 | 9 | embed_size = r_list.shape[1] 10 | if faiss_gpu: 11 | res = faiss.StandardGpuResources() 12 | flat_config = faiss.GpuIndexFlatConfig() 13 | flat_config.useFloat16 = True 14 | flat_config.device = 0 15 | faiss_index = faiss.GpuIndexFlatL2(res, embed_size, flat_config) 16 | # build index 17 | else: 18 | faiss_index = faiss.IndexFlatL2(embed_size) 19 | 20 | # add references 21 | faiss_index.add(r_list) 22 | 23 | # search for queries in the index 24 | _, predictions = faiss_index.search(q_list, max(k_values)) 25 | 26 | 27 | 28 | # start calculating recall_at_k 29 | correct_at_k = np.zeros(len(k_values)) 30 | for q_idx, pred in enumerate(predictions): 31 | for i, n in enumerate(k_values): 32 | # if in top N then also in top NN, where NN > N 33 | if np.any(np.in1d(pred[:n], gt[q_idx])): 34 | correct_at_k[i:] += 1 35 | break 36 | 37 | correct_at_k = correct_at_k / len(predictions) 38 | d = {k:v for (k,v) in zip(k_values, correct_at_k)} 39 | 40 | if print_results: 41 | print() # print a new line 42 | table = PrettyTable() 43 | table.field_names = ['K']+[str(k) for k in k_values] 44 | table.add_row(['Recall@K']+ [f'{100*v:.2f}' for v in correct_at_k]) 45 | print(table.get_string(title=f"Performances on {dataset_name}")) 46 | 47 | return d 48 | -------------------------------------------------------------------------------- /benchmark/deep-visual-geo-localization-benchmark/visual/name_utm.py: -------------------------------------------------------------------------------- 1 | import os 2 | import util 3 | import cv2 4 | from tqdm import tqdm 5 | 6 | def get_file_paths(folder_path): 7 | file_paths = [] 8 | for root, dirs, files in os.walk(folder_path): 9 | for file in files: 10 | file_path = os.path.join(root, file) 11 | file_paths.append(file_path) 12 | return file_paths 13 | 14 | folder_path = '/scratch/ds5725/VPR-datasets-downloader/datasets/indoor_new/images/val' 15 | 16 | file_list=get_file_paths(folder_path) 17 | 18 | full_path_list = [os.path.join(folder_path, filename) for filename in file_list] 19 | # file_list = os.listdir(folder_path) 20 | 21 | # full_path_list = [os.path.join(folder_path, filename) 
for filename in file_list] 22 | 23 | # folder_path = '/scratch/ds5725/deep-visual-geo-localization-benchmark/visual/resnet' 24 | 25 | # file_list = os.listdir(folder_path) 26 | 27 | # full_path_list1 = [os.path.join(folder_path, filename) for filename in file_list] 28 | 29 | # d={} 30 | # f=open("/scratch/ds5725/ssl_vpr/sub/sub_test_utm.txt") 31 | # for line in f: 32 | # s=line.strip().split() 33 | # d[s[0]]=(util.format_coord(float(s[1])),util.format_coord(float(s[2]))) 34 | 35 | 36 | # for i in tqdm(range(len(full_path_list))): 37 | # substr=os.path.basename(full_path_list[i]) 38 | # substr=substr[:-6]+".jpg" 39 | # utm_east=d[substr][0] 40 | # utm_north=d[substr][1] 41 | # new_name="" 42 | # print(utm_east,utm_north) 43 | # for ip1 in full_path_list1: 44 | # print(ip1) 45 | # break 46 | # if utm_east in ip1 and utm_north in ip1: 47 | # new_name=os.path.basename(ip1) 48 | # break 49 | # if new_name!="": 50 | # img=cv2.imread(full_path_list[i]) 51 | # cv2.imwrite("/scratch/ds5725/deep-visual-geo-localization-benchmark/visual/simclr_utm/"+new_name, img) 52 | 53 | f1=open("val_paths.txt", "w") 54 | for fp in full_path_list: 55 | f1.write(fp+'\n') 56 | 57 | f2=open("val_utm.txt","w") 58 | for line in full_path_list: 59 | substr=os.path.basename(line) 60 | first_number = float(substr.split('@')[1]) 61 | second_number = float(substr.split('@')[2]) 62 | f2.write(substr+" "+str(first_number)+" "+str(second_number)+'\n') 63 | 64 | -------------------------------------------------------------------------------- /benchmark/deep-visual-geo-localization-benchmark/visual/util.py: -------------------------------------------------------------------------------- 1 | import os 2 | import re 3 | import utm 4 | import cv2 5 | import math 6 | import time 7 | import shutil 8 | import requests 9 | from tqdm import tqdm 10 | 11 | RETRY_SECONDS = 2 12 | 13 | 14 | def get_distance(coords_A, coords_B): 15 | return math.sqrt((float(coords_B[0])-float(coords_A[0]))**2 + (float(coords_B[1])-float(coords_A[1]))**2) 16 | 17 | 18 | def download_heavy_file(url, output_path): 19 | os.makedirs("tmp", exist_ok=True) 20 | 21 | tmp_filename = os.path.join("tmp", f"tmp_{int(time.time()*1000)}") 22 | if os.path.exists(output_path): 23 | print(f"File {output_path} already exists, I won't download it again") 24 | return 25 | for attempt_num in range(10): # In case of errors, try 10 times 26 | try: 27 | req = requests.get(url, stream=True) 28 | total_size = int(req.headers.get('content-length', 0)) # Total size in bytes 29 | block_size = 1024 # 1 KB 30 | tqdm_bar = tqdm(total=total_size, desc=os.path.basename(output_path), 31 | unit='iB', unit_scale=True, ncols=100) 32 | with open(tmp_filename, 'wb') as f: 33 | for data in req.iter_content(block_size): 34 | tqdm_bar.update(len(data)) 35 | f.write(data) 36 | tqdm_bar.close() 37 | if total_size != 0 and tqdm_bar.n != total_size: 38 | print(tqdm_bar.n) 39 | print(total_size) 40 | raise RuntimeError("ERROR, something went wrong during download") 41 | break 42 | except (Exception, RuntimeError) as e: 43 | if os.path.exists(tmp_filename): os.remove(tmp_filename) 44 | print(e) 45 | print(f"I'll try again to download {output_path} in {RETRY_SECONDS**attempt_num} seconds") 46 | time.sleep(RETRY_SECONDS**attempt_num) 47 | else: 48 | raise RuntimeError(f"I tried 10 times and I couldn't download {output_path} from {url}") 49 | os.makedirs(os.path.dirname(os.path.abspath(output_path)), exist_ok=True) 50 | shutil.move(tmp_filename, output_path) 51 | 52 | 53 | def is_valid_timestamp(timestamp): 
54 | """Return True if it's a valid timestamp, in format YYYYMMDD_hhmmss, 55 | with all fields from left to right optional. 56 | >>> is_valid_timestamp('') 57 | True 58 | >>> is_valid_timestamp('201901') 59 | True 60 | >>> is_valid_timestamp('20190101_123000') 61 | True 62 | """ 63 | return bool(re.match("^(\d{4}(\d{2}(\d{2}(_(\d{2})(\d{2})?(\d{2})?)?)?)?)?$", timestamp)) 64 | 65 | 66 | def format_coord(num, left=7, right=2): 67 | """Return the formatted number as a string with (left) int digits 68 | (including sign '-' for negatives) and (right) float digits. 69 | >>> format_coord(1.1, 3, 3) 70 | '001.100' 71 | >>> format_coord(-0.123, 3, 3) 72 | '-00.123' 73 | """ 74 | sign = "-" if float(num) < 0 else "" 75 | num = str(abs(float(num))) + "." 76 | integer, decimal = num.split(".")[:2] 77 | left -= len(sign) 78 | return f"{sign}{int(integer):0{left}d}.{decimal[:right]:<0{right}}" 79 | 80 | import doctest 81 | doctest.testmod() # Automatically execute unit-test of format_coord() 82 | 83 | 84 | def format_location_info(latitude, longitude): 85 | easting, northing, zone_number, zone_letter = utm.from_latlon(float(latitude), float(longitude)) 86 | easting = format_coord(easting, 7, 2) 87 | northing = format_coord(northing, 7, 2) 88 | latitude = format_coord(latitude, 3, 5) 89 | longitude = format_coord(longitude, 4, 5) 90 | return easting, northing, zone_number, zone_letter, latitude, longitude 91 | 92 | 93 | def get_dst_image_name(latitude, longitude, pano_id=None, tile_num=None, heading=None, 94 | pitch=None, roll=None, height=None, timestamp=None, note=None, extension=".jpg"): 95 | easting, northing, zone_number, zone_letter, latitude, longitude = format_location_info(latitude, longitude) 96 | tile_num = f"{int(float(tile_num)):02d}" if tile_num is not None else "" 97 | heading = f"{int(float(heading)):03d}" if heading is not None else "" 98 | pitch = f"{int(float(pitch)):03d}" if pitch is not None else "" 99 | timestamp = f"{timestamp}" if timestamp is not None else "" 100 | note = f"{note}" if note is not None else "" 101 | assert is_valid_timestamp(timestamp), f"{timestamp} is not in YYYYMMDD_hhmmss format" 102 | if roll is None: roll = "" 103 | else: raise NotImplementedError() 104 | if height is None: height = "" 105 | else: raise NotImplementedError() 106 | 107 | return f"@{easting}@{northing}@{zone_number:02d}@{zone_letter}@{latitude}@{longitude}" + \ 108 | f"@{pano_id}@{tile_num}@{heading}@{pitch}@{roll}@{height}@{timestamp}@{note}@{extension}" 109 | 110 | 111 | class VideoReader: 112 | def __init__(self, video_name, size=None): 113 | if not os.path.exists(video_name): 114 | raise FileNotFoundError(f"{video_name} does not exist") 115 | self.video_name = video_name 116 | self.size = size 117 | self.vc = cv2.VideoCapture(f"{video_name}") 118 | self.frames_per_second = self.vc.get(cv2.CAP_PROP_FPS) 119 | self.frame_duration_millis = 1000 / self.frames_per_second 120 | self.frames_num = int(self.vc.get(cv2.CAP_PROP_FRAME_COUNT)) 121 | self.video_length_in_millis = int(self.frames_num * 1000 / self.frames_per_second) 122 | 123 | def get_time_at_frame(self, frame_num): 124 | return int(self.frame_duration_millis * frame_num) 125 | 126 | def get_frame_num_at_time(self, time): 127 | # time can be str ('21:59') or int in milliseconds 128 | millis = time if type(time) == int else self.str_to_millis(time) 129 | return min(int(millis / self.frame_duration_millis), self.frames_num) 130 | 131 | def get_frame_at_frame_num(self, frame_num): 132 | self.vc.set(cv2.CAP_PROP_POS_FRAMES, frame_num) 133 
| frame = self.vc.read()[1] 134 | if frame is None: return None # In case of corrupt videos 135 | if self.size is not None: 136 | frame = cv2.resize(frame, self.size[::-1], cv2.INTER_CUBIC) 137 | frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB) 138 | return frame 139 | 140 | @staticmethod 141 | def str_to_millis(time_str): 142 | return (int(time_str.split(":")[0]) * 60 + int(time_str.split(":")[1])) * 1000 143 | 144 | @staticmethod 145 | def millis_to_str(millis): 146 | if millis < 60*60*1000: 147 | return f"{math.floor((millis//1000//60)%60):02d}:{millis//1000%60:02d}" 148 | else: 149 | return f"{math.floor((millis//1000//60//60)%60):02d}:{math.floor((millis//1000//60)%60):02d}:{millis//1000%60:02d}" 150 | 151 | def __repr__(self): 152 | H, W = int(self.vc.get(cv2.CAP_PROP_FRAME_HEIGHT)), int(self.vc.get(cv2.CAP_PROP_FRAME_WIDTH)) 153 | return (f"Video '{self.video_name}' has {self.frames_num} frames, " + 154 | f"with resolution {H}x{W}, " + 155 | f"and lasts {self.video_length_in_millis // 1000} seconds " 156 | f"({self.millis_to_str(self.video_length_in_millis)}), therefore " 157 | f"there's a frame every {int(self.frame_duration_millis)} millis") 158 | 159 | def __del__(self): 160 | self.vc.release() 161 | 162 | -------------------------------------------------------------------------------- /method/README.md: -------------------------------------------------------------------------------- 1 | # Usage 2 | TODO: add instructions for running the code. -------------------------------------------------------------------------------- /teaser/data_vis.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ai4ce/NYC-Indoor-VPR/36510997e724eb07caf9577128dc666b335ed7e5/teaser/data_vis.jpg -------------------------------------------------------------------------------- /teaser/dataset_vis.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ai4ce/NYC-Indoor-VPR/36510997e724eb07caf9577128dc666b335ed7e5/teaser/dataset_vis.jpg -------------------------------------------------------------------------------- /teaser/label_pipeline_ex.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ai4ce/NYC-Indoor-VPR/36510997e724eb07caf9577128dc666b335ed7e5/teaser/label_pipeline_ex.png -------------------------------------------------------------------------------- /teaser/pipeline.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ai4ce/NYC-Indoor-VPR/36510997e724eb07caf9577128dc666b335ed7e5/teaser/pipeline.jpg --------------------------------------------------------------------------------