├── .gitignore
├── README.md
├── assets
│   └── data
│       ├── NN_7scenes.txt
│       ├── NN_university.txt
│       ├── db_all_med_hard_train.txt
│       ├── db_all_med_hard_valid.txt
│       └── est_rel_poses_flips_21_alpha1_dropout_no_grey.txt
├── configs
│   └── main.yaml
├── experiments
│   ├── configs
│   │   ├── experiment
│   │   │   └── 7scenes.yaml
│   │   ├── main.yaml
│   │   └── model
│   │       └── relposenet.yaml
│   ├── main.py
│   ├── service
│   │   └── benchmark_base.py
│   └── seven_scenes
│       ├── filter_pose.m
│       ├── matlab_service
│       │   ├── dqq_L1_mean_rotation_matrix.m
│       │   ├── dqq_rotation_quaternion_initialization.m
│       │   └── triangmidpoints.m
│       └── pipeline.py
├── main.py
├── relposenet
│   ├── __init__.py
│   ├── augmentations.py
│   ├── criterion.py
│   ├── dataset.py
│   ├── model.py
│   ├── pipeline.py
│   └── utils.py
├── requirements.txt
└── tests
    └── dataloader_tests.py
/.gitignore:
--------------------------------------------------------------------------------
1 | .idea/
2 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # RelPoseNet
2 | A PyTorch version of the ego-motion estimation pipeline proposed in [our work](https://openaccess.thecvf.com/content_ICCV_2017_workshops/papers/w17/Laskar_Camera_Relocalization_by_ICCV_2017_paper.pdf). The official implementation (in Lua) is available at https://github.com/AaltoVision/camera-relocalisation
3 |
4 | ## Evaluation on the 7-Scenes dataset
5 | scene|[Lua](https://openaccess.thecvf.com/content_ICCV_2017_workshops/papers/w17/Laskar_Camera_Relocalization_by_ICCV_2017_paper.pdf)| PyTorch (this repo)
6 | :---|:---:|:---:
7 | Chess|0.13m, 6.46deg|0.12m, 7.10deg
8 | Fire |0.26m, 12.72deg|0.26m, 12.45deg
9 | Heads|0.14m, 12.34deg|0.14m, 11.72deg
10 | Office|0.21m, 7.35deg|0.20m, 9.23deg
11 | Pumpkin|0.24m, 6.35deg|0.21m, 8.10deg
12 | Red Kitchen|0.24m, 8.03deg|0.23m, 8.82deg
13 | Stairs|0.27m, 11.82deg|0.27m, 11.66deg
14 | Average|0.21m, 9.30deg|0.20m, 9.87deg
15 |
16 | ## Installation
17 | - create and activate a conda environment with Python 3.7
18 | ```
19 | conda create -n my_fancy_env python=3.7
20 | source activate my_fancy_env
21 | ```
22 | - install all dependencies by running the following command:
23 | ```
24 | pip install -r requirements.txt
25 | ```
26 |
27 | ## Evaluation and Training
28 | Evaluation and training have been performed on the 7-Scenes dataset available [here](https://www.microsoft.com/en-us/research/project/rgb-d-dataset-7-scenes/). Important: the images must be resized so that the smaller dimension is 256 pixels and the aspect ratio stays intact. This can be done with the following command:
29 | ```find . -name "*.color.png" | xargs -I {} convert {} -resize "256^>" {}```
30 |
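If ImageMagick's `convert` is not available, the same resize can be scripted with Pillow. The snippet below is a minimal sketch and not part of the repository: the function name `resize_inplace` is made up for illustration, it assumes the dataset root contains the `*.color.png` frames, and, like the command above, it overwrites the images in place.

```
from pathlib import Path

from PIL import Image


def resize_inplace(dataset_root, target=256):
    """Resize every *.color.png so that its smaller side equals `target` pixels."""
    for fname in Path(dataset_root).rglob("*.color.png"):
        img = Image.open(fname)
        w, h = img.size
        scale = target / min(w, h)
        if scale < 1.0:  # only shrink larger images, mirroring the "256^>" geometry flag
            img.resize((round(w * scale), round(h * scale)), Image.BILINEAR).save(fname)


if __name__ == "__main__":
    resize_inplace("/ssd/data/7scenes-light")  # img_dir used in configs/main.yaml
```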
31 | ### Evaluation
32 | - download an [archive](https://drive.google.com/drive/folders/1TnVuR2bNZviYYdT3XLqCW4xjO19eLG6T?usp=sharing) with the model snapshot and unpack it to the working directory
33 | - navigate to `RelPoseNet/experiments` and modify the main config file `configs/main.yaml`. Here, you need to change `work_dir` and `datasets_home_dir`
34 | - modify `img_path` in the `configs/experiment/7scenes.yaml` config file, where `img_path` is the directory with the resized 7-Scenes images
35 | - run `main.py` from the `experiments` directory
36 | - once evaluation has finished, the script creates a text file with the relative camera poses at `${experiment.experiment_params.output.home_dir}/est_rel_poses.txt`
37 | - in order to predict absolute poses, run MATLAB and open `experiments/seven_scenes/filter_pose.m`
38 | - modify line 17 so that it points to the text file with the estimated relative poses
39 | - if everything goes well, you should obtain the localization performance reported in the table above.
40 |
41 |
42 | ### Training
43 | - modify the config file `RelPoseNet/configs/main.yaml` by changing `work_dir`, `img_dir`, and `out_dir`
44 | - to perform training, run `RelPoseNet/main.py`
45 |
46 |
47 | ## License
48 | Our code is released under the Creative Commons BY-NC-SA 3.0 license and is available for non-commercial use only.
49 |
50 | ## How to cite
51 | If you use this project in your research, please cite:
52 |
53 | ```
54 | @inproceedings{Laskar2017PoseNet,
55 |   title = {Camera relocalization by computing pairwise relative poses using convolutional neural network},
56 |   author = {Laskar, Zakaria and Melekhov, Iaroslav and Kalia, Surya and Kannala, Juho},
57 |   year = {2017},
58 |   booktitle = {Proceedings of the IEEE International Conference on Computer Vision Workshops}
59 | }
60 |
61 | @inproceedings{Melekhov2017RelPoseNet,
62 |   title = {Relative camera pose estimation using convolutional neural networks},
63 |   author = {Melekhov, Iaroslav and Ylioinas, Juha and Kannala, Juho and Rahtu, Esa},
64 |   year = {2017},
65 |   booktitle = {International Conference on Advanced Concepts for Intelligent Vision Systems}
66 | }
67 | ```
68 |
--------------------------------------------------------------------------------
/configs/main.yaml:
--------------------------------------------------------------------------------
1 | pipeline: Relative Camera Pose Estimation pipeline
2 | data_params:
3 |   work_dir: /data/projects/RelPoseNet
4 |   img_dir: /ssd/data/7scenes-light
5 |   train_pairs_fname: ${data_params.work_dir}/assets/data/db_all_med_hard_train.txt
6 |   val_pairs_fname: ${data_params.work_dir}/assets/data/db_all_med_hard_valid.txt
7 | model_params:
8 |   backbone_net: resnet34
9 |   resume_snapshot: null
10 | train_params:
11 |   bs: 32
12 |   lr: 1e-3
13 |   alpha: 1
14 |   n_workers: 8
15 |   n_train_iters: 125000  # 42k is the size of our training dataset 'db_all_med_hard_train.txt'
16 |   scheduler:
17 |     lrate_decay_steps: 15000
18 |     lrate_decay_factor: 0.5
19 | output_params:
20 |   out_dir: /data/output/relposenet
21 |   logger_dir: ${output_params.out_dir}/tboard/${model_params.backbone_net}
22 |   snapshot_dir: ${output_params.out_dir}/snapshots/${model_params.backbone_net}
23 |   validate_interval: 1300
24 |   log_scalar_interval: 200
25 | seed: 1984
26 | hydra:
27 |   run:
28 |     dir: ${output_params.out_dir}
29 |
--------------------------------------------------------------------------------
/experiments/configs/experiment/7scenes.yaml:
--------------------------------------------------------------------------------
1 | # @package _group_
2 | experiment_params:
3 |   name: 7scenes
4 |   bs: 16
5 |   n_workers: 8
6 |   paths:
7 |     img_path: ${paths.datasets_home_dir}/7scenes-light
8 |     test_pairs_fname: ${paths.work_dir}/assets/data/NN_7scenes.txt
9 |   output:
10 |     home_dir: ${paths.output_home_dir}/${model.model_params.name}/${experiment.experiment_params.name}
11 |     res_txt_fname: ${experiment.experiment_params.output.home_dir}/est_rel_poses.txt
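For reference, the `est_rel_poses.txt` file that `res_txt_fname` points to stores seven floats per image pair: the estimated relative rotation as a quaternion in the first four columns (passed to `quat2rotm` in `filter_pose.m`, i.e. scalar-first) and the translation in the last three. Below is a minimal loader sketch; the helper `load_relative_poses` is hypothetical and not part of the repository.

```
import numpy as np


def load_relative_poses(fname):
    """Load RelPoseNet predictions: one row per image pair, [q1 q2 q3 q4 tx ty tz]."""
    data = np.loadtxt(fname)                                      # shape (n_pairs, 7)
    quats = data[:, :4]
    quats = quats / np.linalg.norm(quats, axis=1, keepdims=True)  # normalize, as filter_pose.m does
    trans = data[:, 4:]
    return quats, trans


quats, trans = load_relative_poses("est_rel_poses.txt")
```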
-------------------------------------------------------------------------------- /experiments/configs/main.yaml: -------------------------------------------------------------------------------- 1 | pipeline: Relative Camera Pose Estimation pipeline 2 | defaults: 3 | - experiment: 7scenes 4 | - model: relposenet 5 | paths: 6 | work_dir: /data/projects/RelPoseNet 7 | datasets_home_dir: /ssd/data 8 | output_home_dir: ${paths.work_dir}/output 9 | snapshots_dir: ${paths.work_dir}/data/snapshots 10 | hydra: 11 | run: 12 | dir: ${paths.output_home_dir} -------------------------------------------------------------------------------- /experiments/configs/model/relposenet.yaml: -------------------------------------------------------------------------------- 1 | # @package _group_ 2 | model_params: 3 | name: relposenet 4 | backbone_net: resnet34 5 | snapshot: ${paths.snapshots_dir}/${model.model_params.name}/best_val_flipped_1_dropout_no_grey.pth -------------------------------------------------------------------------------- /experiments/main.py: -------------------------------------------------------------------------------- 1 | import hydra 2 | from experiments.seven_scenes.pipeline import SevenScenesBenchmark 3 | 4 | 5 | @hydra.main(config_path="configs", config_name="main") 6 | def main(cfg): 7 | benchmark = None 8 | if cfg.experiment.experiment_params.name == '7scenes': 9 | benchmark = SevenScenesBenchmark(cfg) 10 | 11 | benchmark.evaluate() 12 | 13 | 14 | if __name__ == "__main__": 15 | main() 16 | -------------------------------------------------------------------------------- /experiments/service/benchmark_base.py: -------------------------------------------------------------------------------- 1 | import torch 2 | 3 | 4 | class Benchmark(object): 5 | def __init__(self, cfg): 6 | self.cfg = cfg 7 | self.device = torch.device("cuda" if torch.cuda.is_available() else "cpu") 8 | 9 | def evaluate(self): 10 | raise NotImplementedError 11 | -------------------------------------------------------------------------------- /experiments/seven_scenes/filter_pose.m: -------------------------------------------------------------------------------- 1 | %% script to filter pose estimates from NN 2 | clear all 3 | addpath('matlab_service') 4 | addpath('../../assets/data') 5 | 6 | dataset_name = '7-Scenes'; % or 'University' 7 | % getting GT file 8 | if strcmp(dataset_name, '7-Scenes') 9 | file_id_gt = fopen('NN_7scenes.txt'); 10 | elseif strcmp(dataset_name, 'University') 11 | file_id_gt = fopen('NN_university.txt'); 12 | else 13 | error('Please, specify dataset_name variable properly [7-Scenes or University]'); 14 | end 15 | 16 | % Txt file with network predictions 17 | file_id_est = fopen('../../output/est_rel_poses.txt'); 18 | 19 | data_cells = textscan(file_id_gt, '%s %s %d %d %d %f %f %f %f %f %f %f %f %f %f %f %f %f %f'); 20 | translation_gt_q = [data_cells{1,4+2} data_cells{1,5+2} data_cells{1,6+2}]; 21 | orientation_gt_q = [data_cells{1,7+2} data_cells{1,8+2} data_cells{1,9+2} ... 22 | data_cells{1,10+2}]; 23 | translation_gt_db = [data_cells{1,11+2} data_cells{1,12+2} data_cells{1,13+2}]; 24 | orientation_gt_db = [data_cells{1,14+2} data_cells{1,15+2} data_cells{1,16+2} ... 25 | data_cells{1,17+2}]; 26 | 27 | number_of_pairs = size(translation_gt_q, 1); 28 | 29 | data_cells_est = textscan(file_id_est, '%f %f %f %f %f %f %f'); 30 | 31 | orientation_est = [data_cells_est{1, 1} data_cells_est{1, 2} ... 
32 | data_cells_est{1, 3} data_cells_est{1, 4}]; 33 | translation_est = [data_cells_est{1, 5} data_cells_est{1, 6} data_cells_est{1, 7}]; 34 | 35 | %estimations = fread(pred_file_id, [7 Inf], 'float')'; 36 | fclose(file_id_gt); 37 | fclose(file_id_est); 38 | 39 | %orientation_est = estimations(:, 1:4); 40 | %translation_est = estimations(:, 5:end); 41 | 42 | %% main filtering stage 43 | 44 | % intialize variables 45 | orientation_err_deg = zeros(1, number_of_pairs); 46 | translation_err_deg = zeros(1, number_of_pairs); 47 | 48 | NN_count = 0; % counter over the NN from 1 to |NN|(=5) 49 | allPairs = 0; % index to store the triangulated 3D camera locations 50 | queryNum = 0; 51 | 52 | % errors 53 | err_trans = zeros(1, number_of_pairs/5); 54 | err_quat = zeros(1, number_of_pairs/5); 55 | 56 | % triangulations from mid-point algo 57 | P = cell(1,5); 58 | matches = zeros(1,4); 59 | trans_tmp = zeros(10,3); % store all the possible camera locations from pairwise combinations of NN db images 60 | 61 | % NN 62 | R_q_NN = zeros(5,4); 63 | R_db_NN = zeros(5,4); 64 | t_db_NN = zeros(5,3); 65 | t_q_NN = zeros(5,3); 66 | 67 | % estimated direction vectors from db to query 68 | centers_rel_network = zeros(5,3); 69 | 70 | falseC = 0; 71 | 72 | for k=1:number_of_pairs 73 | % k 74 | %% ground truths 75 | %------- rotation 76 | R_q = quat2rotm(orientation_gt_q(k,:) ./ norm(orientation_gt_q(k,:))); 77 | R_db = quat2rotm(orientation_gt_db(k,:) ./ norm(orientation_gt_db(k,:))); 78 | %------- translation 79 | t_q = translation_gt_q(k,:) ; 80 | t_db = translation_gt_db(k,:) ; 81 | 82 | %% estimations 83 | %------- rotation 84 | delR_est = quat2rotm(orientation_est(k,:) ./ norm(orientation_est(k,:))); 85 | R_q_est = R_db*delR_est; 86 | %------- translation 87 | t_est_center = translation_est(k,:)./norm(translation_est(k,:)); % (C_i - C_j) 88 | t_est = (R_db'*t_est_center'); %R_j'(C_i - C_j) 89 | 90 | %% ------------------------------------------------------------------------- 91 | % store the estimations and db pose estimations for each NN related to 92 | % a query 93 | NN_count = NN_count + 1; 94 | R_q_NN(NN_count,:) = rotm2quat(R_q_est); 95 | t_q_NN(NN_count,:) = t_est; 96 | 97 | R_db_NN(NN_count,:) = rotm2quat(R_db); 98 | t_db_NN(NN_count,:) = t_db; 99 | 100 | P{NN_count} = [R_db' -R_db'*t_db']; 101 | 102 | centers_rel_network(NN_count,:) = t_est_center; 103 | 104 | % iterate over pairwise combinations {(1,2),(1,3),(2,3),(1,4),(2,4).....(4,5)} 105 | for i = 1:NN_count-1 106 | 107 | 108 | allPairs = allPairs + 1; 109 | 110 | % for triangulating a 3D camera position, we need the camera 111 | % matrices of the two db cameras: P1, P2 and the translation 112 | % directions from db to q: t1, t2 such that the z-cordinate is 1 113 | P1 = P{i}; 114 | P2 = P{NN_count}; 115 | t1 = t_q_NN(i,:)./t_q_NN(i,3); 116 | t2 = t_q_NN(NN_count,:)./t_q_NN(NN_count,3); 117 | matches(1,1:2) = t1(1:2); 118 | matches(1,3:4) = t2(1:2); 119 | X = triangmidpoints(matches, P1, P2); 120 | 121 | trans_tmp(allPairs,:) = X; 122 | end 123 | 124 | 125 | %% Filtering stage 126 | 127 | % if all the NN for a query are processed 128 | if NN_count == 5 129 | queryNum = queryNum + 1; 130 | 131 | % re-initialize the variables 132 | NN_count = 0; 133 | allPairs = 0; 134 | 135 | %% inlier process for trans 136 | 137 | % NaN can arise when the translation direction of two NN db 138 | % image-pairs used to triangulate the camera location have the same 139 | % direction. 
In the event all the pairwise combinations output same 140 | % translation directions, assign the translation vector of the NN 141 | % to the query 142 | if numel(find(isnan(trans_tmp))) == 10 143 | X_pred = t_db_NN(1,:); 144 | [err_trans(queryNum)] = norm(X_pred - t_q); 145 | else 146 | % remove the nan estimates 147 | nan_rows = any(isnan(trans_tmp),2) ; 148 | trans_tmp(nan_rows,:) = []; 149 | 150 | thresh_trans = 20; %10 degrees 151 | inlier_cnt_T = zeros(1,size(trans_tmp,1)); % store the inlier count estimates of the triangulated camera locs 152 | inlier_sum_T = zeros(1,size(trans_tmp,1)); % store the sum of residuals of distances for the inliers 153 | % estimate inliers for orientation 154 | % iterate over the triangulated 3D camera locs 155 | for h = 1:size(trans_tmp,1) 156 | 157 | % obtain the direction vectors from the database to query 158 | centers_rel_triang = bsxfun(@minus, trans_tmp(h,:), t_db_NN); 159 | 160 | % make unit length 161 | centers_rel_triang = bsxfun(@rdivide,centers_rel_triang,sqrt(sum(abs(centers_rel_triang).^2,2))); 162 | 163 | % compute angular distance between the translation 164 | % directions predicted by the network: centers_rel_network and that 165 | % obtained from triangulation followed by backprojection: 166 | % centers_rel_triang 167 | 168 | angular_dist_T = 2*acos(abs(sum(centers_rel_triang.*centers_rel_network,2)))*180/pi; 169 | 170 | inlier_thresh_T = find(angular_dist_T 1 %if exists such other estimate 182 | 183 | % OPtion 1: average the candidates 184 | X_best = mean(trans_tmp(sim_inlier_cnt_T,:)); 185 | 186 | % % 187 | % % OPtion 2: select the inlier estimate with least residual sum 188 | % [all_estimates_T, all_ID_T] = min(inlier_sum_T(sim_inlier_cnt_T)); 189 | % % all_ID = randi([1 numel(sim_inlier_cnt)],1,1); % if randomly chosen 190 | % X_best = trans_tmp(sim_inlier_cnt_T(all_ID_T),:); 191 | err_trans(queryNum) = norm(X_best - t_q); 192 | 193 | % % take the inlier with the best estimate using GT 194 | % inl_dist_GT = 2*acos(abs(sum(bsxfun(@times, R_qs(sim_inlier_cnt,:),rotm2quat(R_q)),2)))*180/pi; 195 | % err_quat(queryNum) = min(inl_dist_GT); 196 | else 197 | X_best = trans_tmp(init_ID_T,:); 198 | err_trans(queryNum) = norm(X_best - t_q); 199 | end 200 | % 201 | end 202 | 203 | %% filtering process for rotation 204 | 205 | thresh_ort = 20; %10 degrees 206 | inlier_cnt = zeros(1,5); % store the inlier count estimates of the triangulated camera locs 207 | inlier_sum = zeros(1,5); % store the sum of residuals of distances for the inliers 208 | % iterate over the rotation estimates obtained from NN 209 | for h = 1:5 210 | 211 | % compute the angular distance between the current estimate of 212 | % query rotation R_q_NN(h,:) as indexed by h and the rest of 213 | % the estimations. 
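% (Note: for unit quaternions q1 and q2, the angle between the corresponding
% rotations is 2*acos(|<q1, q2>|); the absolute value removes the q / -q sign
% ambiguity, and the factor 180/pi converts the result to degrees.)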
214 | angular_dist = 2*acos(abs(sum(bsxfun(@times, R_q_NN(h,:),R_q_NN),2)))*180/pi; 215 | 216 | inlier_thresh = find(angular_dist 1 %if exists such other estimate 228 | 229 | % OPtion 1: average the candidates 230 | for inl = 1:numel(sim_inlier_cnt) 231 | 232 | R_inl(:,:,inl) = quat2rotm(R_q_NN(sim_inlier_cnt(inl),:)); 233 | 234 | end 235 | 236 | R_avg = dqq_L1_mean_rotation_matrix(R_inl); 237 | q_best = rotm2quat(R_avg); 238 | % % OPtion 2: select the inlier estimate with least residual sum 239 | % [all_estimates, all_ID] = min(inlier_sum(sim_inlier_cnt)); 240 | % q_best = R_qs(sim_inlier_cnt(all_ID),:); 241 | err_quat(queryNum) = 2*acos(abs(sum(q_best.*rotm2quat(R_q))))*180/pi; 242 | 243 | else 244 | q_best = R_q_NN(init_ID,:); 245 | err_quat(queryNum) = 2*acos(abs(sum(q_best.*rotm2quat(R_q))))*180/pi; 246 | end 247 | 248 | end 249 | 250 | 251 | 252 | end 253 | 254 | %% results 255 | if strcmp(dataset_name, '7-Scenes') 256 | chess = median(err_quat(6001:8000)); 257 | fire = median(err_quat(1:2000)); 258 | heads = median(err_quat(8001:9000)); 259 | office = median(err_quat(2001:6000)); 260 | pumpkin = median(err_quat(15001:17000)); 261 | redkitchen = median(err_quat(10001:15000)); 262 | stairs = median(err_quat(9001:10000)); 263 | fprintf('Orientation error, deg:\n'); 264 | fprintf('chess: %.2f\n', chess); 265 | fprintf('fire: %.2f\n', fire); 266 | fprintf('heads: %.2f\n', heads); 267 | fprintf('office: %.2f\n', office); 268 | fprintf('pumpkin: %.2f\n', pumpkin); 269 | fprintf('redkitchen: %.2f\n', redkitchen); 270 | fprintf('stairs: %.2f\n', stairs); 271 | fprintf('Mean averaged orientation: %.2f deg.\n', mean([chess fire heads office pumpkin redkitchen stairs])); 272 | fprintf('--------------------------------------------------------\n'); 273 | chess = median(err_trans(6001:8000)); 274 | fire = median(err_trans(1:2000)); 275 | heads = median(err_trans(8001:9000)); 276 | office = median(err_trans(2001:6000)); 277 | pumpkin = median(err_trans(15001:17000)); 278 | redkitchen = median(err_trans(10001:15000)); 279 | stairs = median(err_trans(9001:10000)); 280 | fprintf('Translation error, m:\n'); 281 | fprintf('chess: %.2f\n', chess); 282 | fprintf('fire: %.2f\n', fire); 283 | fprintf('heads: %.2f\n', heads); 284 | fprintf('office: %.2f\n', office); 285 | fprintf('pumpkin: %.2f\n', pumpkin); 286 | fprintf('redkitchen: %.2f\n', redkitchen); 287 | fprintf('stairs: %.2f\n', stairs); 288 | fprintf('Mean averaged translation: %.2f m.\n', mean([chess fire heads office pumpkin redkitchen stairs])); 289 | else 290 | conference = median(err_quat(1:949)); 291 | kitchen1 = median(err_quat(950:1939)); 292 | meeting = median(err_quat(1940:2884)); 293 | office = median(err_quat(2885:end)); 294 | fprintf('Orientation error, deg:\n'); 295 | fprintf('office: %.2f\n', office); 296 | fprintf('meeting: %.2f\n', meeting); 297 | fprintf('kitchen1: %.2f\n', kitchen1); 298 | fprintf('conference: %.2f\n', conference); 299 | fprintf('Mean averaged orientation: %.2f deg.\n', mean([conference kitchen1 meeting office])); 300 | fprintf('--------------------------------------------------------\n'); 301 | % translation 302 | conference = median(err_trans(1:949)); 303 | kitchen1 = median(err_trans(950:1939)); 304 | meeting = median(err_trans(1940:2884)); 305 | office = median(err_trans(2885:end)); 306 | fprintf('Translation error, m:\n'); 307 | fprintf('office: %.2f\n', office); 308 | fprintf('meeting: %.2f\n', meeting); 309 | fprintf('kitchen1: %.2f\n', kitchen1); 310 | fprintf('conference: %.2f\n', conference); 311 | 
fprintf('Mean averaged translation: %.2f m.\n', mean([conference kitchen1 meeting office]));
312 | end
313 |
314 |
315 |
316 |
--------------------------------------------------------------------------------
/experiments/seven_scenes/matlab_service/dqq_L1_mean_rotation_matrix.m:
--------------------------------------------------------------------------------
1 | function [ Rmean ] = dqq_L1_mean_rotation_matrix( R )
2 | %DQQ_L1_MEAN_ROTATION_MATRIX Summary of this function goes here
3 | % This function calculates the mean rotation matrix of the given 3*3*n array R
4 | % under the L1 norm using the Weiszfeld algorithm.
5 | % Please refer to the paper:
6 | % 'L1 rotation averaging using the Weiszfeld algorithm', Hartley et al., CVPR 2011
7 | % for details.
8 |
9 | S(:,:,1) = dqq_rotation_quaternion_initialization( R );
10 | nofR=size(R);
11 |
12 | iter=1;
13 |
14 | while isreal(S(:,:,iter))
15 |     iter=iter+1;
16 |     sum_vmatrix_normed(:,:,iter)=zeros(3,3);
17 |     for j=1:nofR(3)
18 |         vmatrix(:,:,j)=logm(R(:,:,j)*(S(:,:,iter-1))^(-1));
19 |         vmatrix_normed(:,:,j)=vmatrix(:,:,j)/norm(vmatrix(:,:,j));
20 |         sum_vmatrix_normed(:,:,iter)=sum_vmatrix_normed(:,:,iter)+vmatrix_normed(:,:,j);
21 |         inv_norm_vmatrix(j)=1/norm(vmatrix(:,:,j));
22 |     end
23 |
24 |     delta(:,:,iter)=sum_vmatrix_normed(:,:,iter)/sum(inv_norm_vmatrix);
25 |
26 |     S(:,:,iter)=expm(delta(:,:,iter))*S(:,:,iter-1);
27 |
28 |     if abs(1-det(S(:,:,iter)*S(:,:,iter)'))<10^(-10)
29 |         break;
30 |     end
31 | end
32 |
33 | Rmean=S(:,:,iter-1);
34 |
35 | end
36 |
--------------------------------------------------------------------------------
/experiments/seven_scenes/matlab_service/dqq_rotation_quaternion_initialization.m:
--------------------------------------------------------------------------------
1 | function [ Rm ] = dqq_rotation_quaternion_initialization( R )
2 | %DQQ_R_Q_INITIALIZATION Summary of this function goes here
3 | % Detailed explanation goes here
4 | % This function provides an initialization for the mean of the given rotation
5 | % matrices R.
6 | % Please refer to
7 | % 'Rotation Averaging with Application to Camera-Rig Calibration'
8 | % for details.
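% (Implementation note: the rotations are converted to quaternions and each
% quaternion appears to be sign-aligned with the first one -- q and -q encode
% the same rotation -- so that the subsequent averaging is well defined.)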
9 | 10 | QR=dcm2quat(R); 11 | 12 | SR=R(:,:,1); 13 | SQ=dcm2quat(SR); 14 | 15 | nofR=size(R); 16 | 17 | for i=1:nofR(3) 18 | if norm(QR(i,:)+SQ) 0.5: 52 | img1, img2 = img2, img1 53 | t_gt = -self.t_gt[item] 54 | q_gt = torch.FloatTensor([q_gt[0], -q_gt[1], -q_gt[2], -q_gt[3]]) 55 | 56 | return {'img1': img1, 57 | 'img2': img2, 58 | 't_gt': t_gt, 59 | 'q_gt': q_gt} 60 | 61 | def __len__(self): 62 | return len(self.fnames1) 63 | 64 | 65 | class SevenScenesTestDataset(object): 66 | def __init__(self, experiment_cfg, transforms=None): 67 | self.experiment_cfg = experiment_cfg 68 | self.transforms = transforms 69 | self.scenes_dict = defaultdict(str) 70 | for i, scene in enumerate(['chess', 'fire', 'heads', 'office', 'pumpkin', 'redkitchen', 'stairs']): 71 | self.scenes_dict[i] = scene 72 | 73 | self.fnames1, self.fnames2 = self._read_pairs_txt() 74 | 75 | def _read_pairs_txt(self): 76 | fnames1, fnames2 = [], [] 77 | 78 | pairs_txt = self.experiment_cfg.paths.test_pairs_fname 79 | img_dir = self.experiment_cfg.paths.img_path 80 | with open(pairs_txt, 'r') as f: 81 | for line in f: 82 | chunks = line.rstrip().split(' ') 83 | scene_id1 = int(chunks[2]) 84 | scene_id2 = int(chunks[3]) 85 | fnames1.append(osp.join(img_dir, self.scenes_dict[scene_id2], chunks[1][1:])) 86 | fnames2.append(osp.join(img_dir, self.scenes_dict[scene_id1], chunks[0][1:])) 87 | 88 | return fnames1, fnames2 89 | 90 | def __getitem__(self, item): 91 | img1 = Image.open(self.fnames1[item]).convert('RGB') 92 | img2 = Image.open(self.fnames2[item]).convert('RGB') 93 | 94 | if self.transforms: 95 | img1 = self.transforms(img1) 96 | img2 = self.transforms(img2) 97 | 98 | return {'img1': img1, 99 | 'img2': img2, 100 | } 101 | 102 | def __len__(self): 103 | return len(self.fnames1) 104 | -------------------------------------------------------------------------------- /relposenet/model.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import torch.nn as nn 3 | import torchvision.models as models 4 | 5 | 6 | class RelPoseNet(nn.Module): 7 | def __init__(self, cfg): 8 | super().__init__() 9 | self.cfg = cfg 10 | self.backbone, self.concat_layer = self._get_backbone() 11 | self.net_q_fc = nn.Linear(self.concat_layer.in_features, 4) 12 | self.net_t_fc = nn.Linear(self.concat_layer.in_features, 3) 13 | self.dropout = nn.Dropout(0.3) 14 | 15 | def _get_backbone(self): 16 | backbone, concat_layer = None, None 17 | if self.cfg.backbone_net == 'resnet34': 18 | backbone = models.resnet34(pretrained=True) 19 | in_features = backbone.fc.in_features 20 | backbone.fc = nn.Identity() 21 | concat_layer = nn.Linear(2 * in_features, 2 * in_features) 22 | return backbone, concat_layer 23 | 24 | def _forward_one(self, x): 25 | x = self.backbone(x) 26 | x = x.view(x.size()[0], -1) 27 | return x 28 | 29 | def forward(self, x1, x2): 30 | feat1 = self._forward_one(x1) 31 | feat2 = self._forward_one(x2) 32 | 33 | feat = torch.cat((feat1, feat2), 1) 34 | q_est = self.net_q_fc(self.dropout(self.concat_layer(feat))) 35 | t_est = self.net_t_fc(self.dropout(self.concat_layer(feat))) 36 | return q_est, t_est 37 | -------------------------------------------------------------------------------- /relposenet/pipeline.py: -------------------------------------------------------------------------------- 1 | import os 2 | from os import path as osp 3 | import time 4 | from tqdm import tqdm 5 | import torch 6 | from tensorboardX import SummaryWriter 7 | from relposenet.model import RelPoseNet 8 | from 
relposenet.dataset import SevenScenesRelPoseDataset 9 | from relposenet.augmentations import get_augmentations 10 | from relposenet.criterion import RelPoseCriterion 11 | from relposenet.utils import cycle, set_seed 12 | 13 | 14 | class Pipeline(object): 15 | def __init__(self, cfg): 16 | self.cfg = cfg 17 | cfg_model = self.cfg.model_params 18 | self.device = torch.device("cuda" if torch.cuda.is_available() else "cpu") 19 | set_seed(self.cfg.seed) 20 | 21 | # initialize dataloaders 22 | self.train_loader, self.val_loader = self._init_dataloaders() 23 | self.train_loader_iterator = iter(cycle(self.train_loader)) 24 | 25 | self.model = RelPoseNet(cfg_model).to(self.device) 26 | 27 | # Optimizer 28 | self.optimizer = torch.optim.Adam(self.model.parameters(), 29 | lr=self.cfg.train_params.lr) 30 | 31 | # Scheduler 32 | cfg_scheduler = self.cfg.train_params.scheduler 33 | self.scheduler = torch.optim.lr_scheduler.StepLR(self.optimizer, 34 | step_size=cfg_scheduler.lrate_decay_steps, 35 | gamma=cfg_scheduler.lrate_decay_factor) 36 | 37 | # Criterion 38 | self.criterion = RelPoseCriterion(self.cfg.train_params.alpha).to(self.device) 39 | 40 | # create writer (logger) 41 | self.writer = SummaryWriter(self.cfg.output_params.logger_dir) 42 | 43 | self.start_step = 0 44 | self.val_total_loss = 1e6 45 | if self.cfg.model_params.resume_snapshot: 46 | self._load_model(self.cfg.model_params.resume_snapshot) 47 | 48 | def _init_dataloaders(self): 49 | cfg_data = self.cfg.data_params 50 | cfg_train = self.cfg.train_params 51 | 52 | # get image augmentations 53 | train_augs, val_augs = get_augmentations() 54 | 55 | train_dataset = SevenScenesRelPoseDataset(cfg=self.cfg, split='train', transforms=train_augs) 56 | 57 | val_dataset = SevenScenesRelPoseDataset(cfg=self.cfg, split='val', transforms=val_augs) 58 | 59 | train_loader = torch.utils.data.DataLoader(train_dataset, 60 | batch_size=cfg_train.bs, 61 | shuffle=True, 62 | pin_memory=True, 63 | num_workers=cfg_train.n_workers, 64 | drop_last=True) 65 | 66 | val_loader = torch.utils.data.DataLoader(val_dataset, 67 | batch_size=cfg_train.bs, 68 | shuffle=False, 69 | pin_memory=True, 70 | num_workers=cfg_train.n_workers, 71 | drop_last=True) 72 | return train_loader, val_loader 73 | 74 | def _predict_cam_pose(self, mini_batch): 75 | q_est, t_est = self.model.forward(mini_batch['img1'].to(self.device), 76 | mini_batch['img2'].to(self.device)) 77 | return q_est, t_est 78 | 79 | def _save_model(self, step, loss_val, best_val=False): 80 | if not osp.exists(self.cfg.output_params.snapshot_dir): 81 | os.makedirs(self.cfg.output_params.snapshot_dir) 82 | 83 | fname_out = 'best_val.pth' if best_val else 'snapshot{:06d}.pth'.format(step) 84 | save_path = osp.join(self.cfg.output_params.snapshot_dir, fname_out) 85 | model_state = self.model.state_dict() 86 | torch.save({'step': step, 87 | 'state_dict': model_state, 88 | 'optimizer': self.optimizer.state_dict(), 89 | 'scheduler': self.scheduler.state_dict(), 90 | 'val_loss': loss_val, 91 | }, 92 | save_path) 93 | 94 | def _load_model(self, snapshot): 95 | data_dict = torch.load(snapshot) 96 | self.model.load_state_dict(data_dict['state_dict']) 97 | self.optimizer.load_state_dict(data_dict['optimizer']) 98 | self.scheduler.load_state_dict(data_dict['scheduler']) 99 | self.start_step = data_dict['step'] 100 | if 'val_loss' in data_dict: 101 | self.val_total_loss = data_dict['val_loss'] 102 | 103 | def _train_batch(self): 104 | train_sample = next(self.train_loader_iterator) 105 | q_est, t_est = 
self._predict_cam_pose(train_sample) 106 | 107 | self.optimizer.zero_grad() 108 | 109 | # compute loss 110 | loss, t_loss_val, q_loss_val = self.criterion(train_sample['q_gt'].to(self.device), 111 | train_sample['t_gt'].to(self.device), 112 | q_est, 113 | t_est) 114 | loss.backward() 115 | 116 | # update the optimizer 117 | self.optimizer.step() 118 | 119 | # update the scheduler 120 | self.scheduler.step() 121 | return loss.item(), t_loss_val, q_loss_val 122 | 123 | def _validate(self): 124 | self.model.eval() 125 | loss_total, t_loss_total, q_loss_total = 0., 0., 0. 126 | 127 | with torch.no_grad(): 128 | for val_sample in tqdm(self.val_loader): 129 | q_est, t_est = self._predict_cam_pose(val_sample) 130 | # compute loss 131 | loss, t_loss_val, q_loss_val = self.criterion(val_sample['q_gt'].to(self.device), 132 | val_sample['t_gt'].to(self.device), 133 | q_est, 134 | t_est) 135 | loss_total += loss.item() 136 | t_loss_total += t_loss_val 137 | q_loss_total += q_loss_val 138 | 139 | avg_total_loss = loss_total / len(self.val_loader) 140 | avg_t_loss = t_loss_total / len(self.val_loader) 141 | avg_q_loss = q_loss_total / len(self.val_loader) 142 | 143 | self.model.train() 144 | 145 | return avg_total_loss, avg_t_loss, avg_q_loss 146 | 147 | def run(self): 148 | print('Start training', self.start_step) 149 | train_start_time = time.time() 150 | train_log_iter_time = time.time() 151 | for step in range(self.start_step + 1, self.start_step + self.cfg.train_params.n_train_iters): 152 | train_loss_batch, _, _ = self._train_batch() 153 | 154 | if step % self.cfg.output_params.log_scalar_interval == 0 and step > 0: 155 | self.writer.add_scalar('Train_total_loss_batch', train_loss_batch, step) 156 | print(f'Elapsed time [min] for {self.cfg.output_params.log_scalar_interval} iterations: ' 157 | f'{(time.time() - train_log_iter_time) / 60.}') 158 | train_log_iter_time = time.time() 159 | print(f'Step {step} out of {self.cfg.train_params.n_train_iters} is done. 
Train loss (per batch): ' 160 | f'{train_loss_batch}.') 161 | 162 | if step % self.cfg.output_params.validate_interval == 0 and step > 0: 163 | val_time = time.time() 164 | best_val = False 165 | val_total_loss, val_t_loss, val_q_loss = self._validate() 166 | self.writer.add_scalar('Val_total_loss', val_total_loss, step) 167 | self.writer.add_scalar('Val_t_loss', val_t_loss, step) 168 | self.writer.add_scalar('Val_q_loss', val_q_loss, step) 169 | if val_total_loss < self.val_total_loss: 170 | self.val_total_loss = val_total_loss 171 | best_val = True 172 | self._save_model(step, val_total_loss, best_val=best_val) 173 | print(f'Validation loss: {val_total_loss}, t_loss: {val_t_loss}, q_loss: {val_q_loss}') 174 | print(f'Elapsed time [min] for validation: {(time.time() - val_time) / 60.}') 175 | train_log_iter_time = time.time() 176 | 177 | print(f'Elapsed time for training [min] {(time.time() - train_start_time) / 60.}') 178 | print('Done') 179 | -------------------------------------------------------------------------------- /relposenet/utils.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | import random 3 | import torch 4 | 5 | 6 | def set_seed(seed): 7 | torch.backends.cudnn.benchmark = True 8 | torch.manual_seed(seed) 9 | torch.cuda.manual_seed_all(seed) 10 | torch.cuda.manual_seed(seed) 11 | np.random.seed(seed) 12 | random.seed(seed) 13 | 14 | 15 | def cycle(iterable): 16 | while True: 17 | for x in iterable: 18 | yield x 19 | -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | absl-py==0.11.0 2 | albumentations==0.5.2 3 | antlr4-python3-runtime==4.8 4 | argon2-cffi==20.1.0 5 | async-generator==1.10 6 | attrs==20.3.0 7 | backcall==0.2.0 8 | bleach==3.2.1 9 | blessings==1.7 10 | cachetools==4.1.1 11 | certifi==2020.11.8 12 | cffi==1.14.4 13 | chardet==3.0.4 14 | ConfigArgParse==1.2.3 15 | cupy-cuda102==8.0.0 16 | cycler==0.10.0 17 | decorator==4.4.2 18 | defusedxml==0.6.0 19 | entrypoints==0.3 20 | fastrlock==0.5 21 | future==0.18.2 22 | google-auth==1.23.0 23 | google-auth-oauthlib==0.4.2 24 | gpustat==0.6.0 25 | grpcio==1.33.2 26 | hydra-core==1.0.4 27 | idna==2.10 28 | imageio==2.9.0 29 | imgaug==0.4.0 30 | importlib-metadata==2.0.0 31 | importlib-resources==3.3.0 32 | ipykernel==5.4.2 33 | ipython==7.19.0 34 | ipython-genutils==0.2.0 35 | ipywidgets==7.5.1 36 | jedi==0.17.2 37 | Jinja2==2.11.2 38 | joblib==0.17.0 39 | json5==0.9.5 40 | jsonschema==3.2.0 41 | jupyter==1.0.0 42 | jupyter-client==6.1.7 43 | jupyter-console==6.2.0 44 | jupyter-core==4.7.0 45 | jupyterlab==2.2.9 46 | jupyterlab-pygments==0.1.2 47 | jupyterlab-server==1.2.0 48 | kiwisolver==1.2.0 49 | kornia==0.4.1 50 | Markdown==3.3.3 51 | MarkupSafe==1.1.1 52 | matplotlib==3.3.2 53 | mistune==0.8.4 54 | mkl-fft==1.2.0 55 | mkl-random==1.1.1 56 | mkl-service==2.3.0 57 | nbclient==0.5.1 58 | nbconvert==6.0.7 59 | nbformat==5.0.8 60 | nest-asyncio==1.4.3 61 | networkx==2.5 62 | notebook==6.1.5 63 | numpy @ file:///tmp/build/80754af9/numpy_and_numpy_base_1596233707986/work 64 | nvidia-ml-py3==7.352.0 65 | oauthlib==3.1.0 66 | olefile==0.46 67 | omegaconf==2.0.5 68 | opencv-contrib-python==3.4.2.17 69 | opencv-python==3.4.2.17 70 | opencv-python-headless==4.4.0.46 71 | packaging==20.8 72 | pandocfilters==1.4.3 73 | parso==0.7.1 74 | pexpect==4.8.0 75 | pickleshare==0.7.5 76 | Pillow @ 
file:///tmp/build/80754af9/pillow_1594307325547/work 77 | prometheus-client==0.9.0 78 | prompt-toolkit==3.0.8 79 | protobuf==3.13.0 80 | psutil==5.7.2 81 | ptyprocess==0.6.0 82 | pyasn1==0.4.8 83 | pyasn1-modules==0.2.8 84 | pycparser==2.20 85 | Pygments==2.7.3 86 | pynvrtc==9.2 87 | pyparsing==2.4.7 88 | pyrsistent==0.17.3 89 | python-dateutil==2.8.1 90 | pytorch-metric-learning==0.9.94 91 | PyWavelets==1.1.1 92 | PyYAML==5.3.1 93 | pyzmq==20.0.0 94 | qtconsole==5.0.1 95 | QtPy==1.9.0 96 | requests==2.24.0 97 | requests-oauthlib==1.3.0 98 | rsa==4.6 99 | scikit-image==0.17.2 100 | scikit-learn==0.23.2 101 | scipy==1.5.2 102 | Send2Trash==1.5.0 103 | Shapely==1.7.1 104 | six==1.15.0 105 | tensorboard==2.3.0 106 | tensorboard-plugin-wit==1.7.0 107 | tensorboardX==2.1 108 | terminado==0.9.1 109 | testpath==0.4.4 110 | threadpoolctl==2.1.0 111 | tifffile==2020.10.1 112 | torch==1.6.0 113 | torchvision==0.7.0 114 | tornado==6.1 115 | tqdm==4.50.2 116 | traitlets==5.0.5 117 | typing-extensions==3.7.4.3 118 | urllib3==1.25.11 119 | wcwidth==0.2.5 120 | webencodings==0.5.1 121 | Werkzeug==1.0.1 122 | widgetsnbextension==3.5.1 123 | zipp==3.4.0 124 | faiss==1.6.3 125 | -------------------------------------------------------------------------------- /tests/dataloader_tests.py: -------------------------------------------------------------------------------- 1 | import hydra 2 | import torch 3 | from relposenet.dataset import SevenScenesRelPoseDataset 4 | from relposenet.augmentations import train_augmentations 5 | import matplotlib.pyplot as plt 6 | 7 | 8 | @hydra.main(config_path="../configs", config_name="main") 9 | def main(cfg): 10 | augs = train_augmentations() 11 | dataset = SevenScenesRelPoseDataset(cfg, 12 | split='train', 13 | transforms=augs) 14 | 15 | dataloader = torch.utils.data.DataLoader(dataset, 16 | batch_size=1, 17 | shuffle=True, 18 | num_workers=8) 19 | 20 | for idx, mini_batch in enumerate(dataloader): 21 | 22 | print(f'Pair id: {idx}') 23 | print(f't: {mini_batch["t_gt"]}, q: {mini_batch["q_gt"]}') 24 | 25 | fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(35, 10)) 26 | axes = axes.flatten() 27 | 28 | axes[0].imshow(mini_batch['img1'].permute(0, 2, 3, 1).squeeze().numpy()) 29 | axes[1].imshow(mini_batch['img2'].permute(0, 2, 3, 1).squeeze().numpy()) 30 | 31 | axes[0].axis("off") 32 | axes[1].axis("off") 33 | 34 | plt.show(block=False) 35 | plt.pause(3) 36 | plt.close() 37 | 38 | 39 | if __name__ == '__main__': 40 | main() 41 | --------------------------------------------------------------------------------
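As a closing note, the numbers in the README table come from `filter_pose.m`: for each scene it reports the median translation error (Euclidean distance between the estimated and ground-truth camera positions, in metres) and the median orientation error (the quaternion angular distance `2*acos(|<q_est, q_gt>|)`, in degrees). Below is a minimal Python sketch of these two measures; the helper names are hypothetical and not part of the repository.

```
import numpy as np


def orientation_error_deg(q_est, q_gt):
    """Angle in degrees between two rotations given as quaternions."""
    q_est = np.asarray(q_est) / np.linalg.norm(q_est)
    q_gt = np.asarray(q_gt) / np.linalg.norm(q_gt)
    d = np.clip(np.abs(np.dot(q_est, q_gt)), 0.0, 1.0)  # |<q1, q2>| handles the q/-q ambiguity
    return np.degrees(2.0 * np.arccos(d))


def translation_error_m(t_est, t_gt):
    """Euclidean distance between estimated and ground-truth camera positions."""
    return np.linalg.norm(np.asarray(t_est) - np.asarray(t_gt))


# The per-scene numbers in the README table are medians over all test queries, e.g.
# np.median([orientation_error_deg(q, q_gt) for q, q_gt in zip(est_quats, gt_quats)])
```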