├── .gitignore
├── README.md
├── assets
│   └── data
│       ├── NN_7scenes.txt
│       ├── NN_university.txt
│       ├── db_all_med_hard_train.txt
│       ├── db_all_med_hard_valid.txt
│       └── est_rel_poses_flips_21_alpha1_dropout_no_grey.txt
├── configs
│   └── main.yaml
├── experiments
│   ├── configs
│   │   ├── experiment
│   │   │   └── 7scenes.yaml
│   │   ├── main.yaml
│   │   └── model
│   │       └── relposenet.yaml
│   ├── main.py
│   ├── service
│   │   └── benchmark_base.py
│   └── seven_scenes
│       ├── filter_pose.m
│       ├── matlab_service
│       │   ├── dqq_L1_mean_rotation_matrix.m
│       │   ├── dqq_rotation_quaternion_initialization.m
│       │   └── triangmidpoints.m
│       └── pipeline.py
├── main.py
├── relposenet
│   ├── __init__.py
│   ├── augmentations.py
│   ├── criterion.py
│   ├── dataset.py
│   ├── model.py
│   ├── pipeline.py
│   └── utils.py
├── requirements.txt
└── tests
    └── dataloader_tests.py
/.gitignore:
--------------------------------------------------------------------------------
1 | .idea/
2 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # RelPoseNet
2 | A PyTorch version of the ego-motion estimation pipeline proposed in [our work](https://openaccess.thecvf.com/content_ICCV_2017_workshops/papers/w17/Laskar_Camera_Relocalization_by_ICCV_2017_paper.pdf). The official implementation (in Lua) is available at https://github.com/AaltoVision/camera-relocalisation
3 |
4 | ## Evaluation on the 7-Scenes dataset
5 | scene|[Lua](https://openaccess.thecvf.com/content_ICCV_2017_workshops/papers/w17/Laskar_Camera_Relocalization_by_ICCV_2017_paper.pdf)| PyTorch (this repo)
6 | :---|:---:|:---:
7 | Chess|0.13m, 6.46deg|0.12m, 7.10deg
8 | Fire |0.26m, 12.72deg|0.26m, 12.45deg
9 | Heads|0.14m, 12.34deg|0.14m, 11.72deg
10 | Office|0.21m, 7.35deg|0.20m, 9.23deg
11 | Pumpkin|0.24m, 6.35deg|0.21m, 8.10deg
12 | Red Kitchen|0.24m, 8.03deg|0.23m, 8.82deg
13 | Stairs|0.27m, 11.82deg|0.27m, 11.66deg
14 | Average|0.21m, 9.30deg|0.20m, 9.87deg
15 |
16 | ## Installation
17 | - create and activate a conda environment with Python 3.7
18 | ```
19 | conda create -n my_fancy_env python=3.7
20 | source activate my_fancy_env
21 | ```
22 | - install all dependencies by running the following command:
23 | ```
24 | pip install -r requirements.txt
25 | ```
26 |
27 | ## Evaluation and Training
28 | Evaluation and training have been performed on the 7-Scenes dataset available [here](https://www.microsoft.com/en-us/research/project/rgb-d-dataset-7-scenes/). Important: the images must be resized so that the smaller dimension is 256 pixels and the aspect ratio stays intact. This can be done with the following command:
29 | ```find . -name "*.color.png" | xargs -I {} convert {} -resize "256^>" {}```
30 |
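If ImageMagick's `convert` is not available, the same resize can be scripted with Pillow. The snippet below is a minimal sketch and not part of the repository: the function name `resize_inplace` is made up for illustration, it assumes the dataset root contains the `*.color.png` frames, and, like the command above, it overwrites the images in place.

```
from pathlib import Path

from PIL import Image


def resize_inplace(dataset_root, target=256):
    """Resize every *.color.png so that its smaller side equals `target` pixels."""
    for fname in Path(dataset_root).rglob("*.color.png"):
        img = Image.open(fname)
        w, h = img.size
        scale = target / min(w, h)
        if scale < 1.0:  # only shrink larger images, mirroring the "256^>" geometry flag
            img.resize((round(w * scale), round(h * scale)), Image.BILINEAR).save(fname)


if __name__ == "__main__":
    resize_inplace("/ssd/data/7scenes-light")  # img_dir used in configs/main.yaml
```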
31 | ### Evaluation
32 | - download an [archive](https://drive.google.com/drive/folders/1TnVuR2bNZviYYdT3XLqCW4xjO19eLG6T?usp=sharing) with the model snapshot and unpack it to the working directory
33 | - navigate to `RelPoseNet/experiments` and modify the main config file `configs/main.yaml`. Here, you need to change `work_dir` and `datasets_home_dir`
34 | - modify `img_path` in the `configs/experiment/7scenes.yaml` config file, where `img_path` is the directory with the resized 7-Scenes images
35 | - run `main.py` from the `experiments` directory
36 | - once evaluation has finished, the script creates a text file with the relative camera poses at `${experiment.experiment_params.output.home_dir}/est_rel_poses.txt`
37 | - in order to predict absolute poses, run MATLAB and open `experiments/seven_scenes/filter_pose.m`
38 | - modify line 17 so that it points to the text file with the estimated relative poses
39 | - if everything goes well, you should obtain the localization performance reported in the table above.
40 |
41 |
42 | ### Training
43 | - modify the config file `RelPoseNet/configs/main.yaml` by changing `work_dir`, `img_dir`, and `out_dir`
44 | - to perform training, run `RelPoseNet/main.py`
45 |
46 |
47 | ## License
48 | Our code is released under the Creative Commons BY-NC-SA 3.0 license and is available for non-commercial use only.
49 |
50 | ## How to cite
51 | If you use this project in your research, please cite:
52 |
53 | ```
54 | @inproceedings{Laskar2017PoseNet,
55 |   title = {Camera relocalization by computing pairwise relative poses using convolutional neural network},
56 |   author = {Laskar, Zakaria and Melekhov, Iaroslav and Kalia, Surya and Kannala, Juho},
57 |   year = {2017},
58 |   booktitle = {Proceedings of the IEEE International Conference on Computer Vision Workshops}
59 | }
60 |
61 | @inproceedings{Melekhov2017RelPoseNet,
62 |   title = {Relative camera pose estimation using convolutional neural networks},
63 |   author = {Melekhov, Iaroslav and Ylioinas, Juha and Kannala, Juho and Rahtu, Esa},
64 |   year = {2017},
65 |   booktitle = {International Conference on Advanced Concepts for Intelligent Vision Systems}
66 | }
67 | ```
68 |
--------------------------------------------------------------------------------
/configs/main.yaml:
--------------------------------------------------------------------------------
1 | pipeline: Relative Camera Pose Estimation pipeline
2 | data_params:
3 |   work_dir: /data/projects/RelPoseNet
4 |   img_dir: /ssd/data/7scenes-light
5 |   train_pairs_fname: ${data_params.work_dir}/assets/data/db_all_med_hard_train.txt
6 |   val_pairs_fname: ${data_params.work_dir}/assets/data/db_all_med_hard_valid.txt
7 | model_params:
8 |   backbone_net: resnet34
9 |   resume_snapshot: null
10 | train_params:
11 |   bs: 32
12 |   lr: 1e-3
13 |   alpha: 1
14 |   n_workers: 8
15 |   n_train_iters: 125000  # 42k is the size of our training dataset 'db_all_med_hard_train.txt'
16 |   scheduler:
17 |     lrate_decay_steps: 15000
18 |     lrate_decay_factor: 0.5
19 | output_params:
20 |   out_dir: /data/output/relposenet
21 |   logger_dir: ${output_params.out_dir}/tboard/${model_params.backbone_net}
22 |   snapshot_dir: ${output_params.out_dir}/snapshots/${model_params.backbone_net}
23 |   validate_interval: 1300
24 |   log_scalar_interval: 200
25 | seed: 1984
26 | hydra:
27 |   run:
28 |     dir: ${output_params.out_dir}
29 |
--------------------------------------------------------------------------------
/experiments/configs/experiment/7scenes.yaml:
--------------------------------------------------------------------------------
1 | # @package _group_
2 | experiment_params:
3 |   name: 7scenes
4 |   bs: 16
5 |   n_workers: 8
6 |   paths:
7 |     img_path: ${paths.datasets_home_dir}/7scenes-light
8 |     test_pairs_fname: ${paths.work_dir}/assets/data/NN_7scenes.txt
9 |   output:
10 |     home_dir: ${paths.output_home_dir}/${model.model_params.name}/${experiment.experiment_params.name}
11 |     res_txt_fname: ${experiment.experiment_params.output.home_dir}/est_rel_poses.txt
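For reference, the `est_rel_poses.txt` file that `res_txt_fname` points to stores seven floats per image pair: the estimated relative rotation as a quaternion in the first four columns (passed to `quat2rotm` in `filter_pose.m`, i.e. scalar-first) and the translation in the last three. Below is a minimal loader sketch; the helper `load_relative_poses` is hypothetical and not part of the repository.

```
import numpy as np


def load_relative_poses(fname):
    """Load RelPoseNet predictions: one row per image pair, [q1 q2 q3 q4 tx ty tz]."""
    data = np.loadtxt(fname)                                      # shape (n_pairs, 7)
    quats = data[:, :4]
    quats = quats / np.linalg.norm(quats, axis=1, keepdims=True)  # normalize, as filter_pose.m does
    trans = data[:, 4:]
    return quats, trans


quats, trans = load_relative_poses("est_rel_poses.txt")
```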
-------------------------------------------------------------------------------- /experiments/configs/main.yaml: -------------------------------------------------------------------------------- 1 | pipeline: Relative Camera Pose Estimation pipeline 2 | defaults: 3 | - experiment: 7scenes 4 | - model: relposenet 5 | paths: 6 | work_dir: /data/projects/RelPoseNet 7 | datasets_home_dir: /ssd/data 8 | output_home_dir: ${paths.work_dir}/output 9 | snapshots_dir: ${paths.work_dir}/data/snapshots 10 | hydra: 11 | run: 12 | dir: ${paths.output_home_dir} -------------------------------------------------------------------------------- /experiments/configs/model/relposenet.yaml: -------------------------------------------------------------------------------- 1 | # @package _group_ 2 | model_params: 3 | name: relposenet 4 | backbone_net: resnet34 5 | snapshot: ${paths.snapshots_dir}/${model.model_params.name}/best_val_flipped_1_dropout_no_grey.pth -------------------------------------------------------------------------------- /experiments/main.py: -------------------------------------------------------------------------------- 1 | import hydra 2 | from experiments.seven_scenes.pipeline import SevenScenesBenchmark 3 | 4 | 5 | @hydra.main(config_path="configs", config_name="main") 6 | def main(cfg): 7 | benchmark = None 8 | if cfg.experiment.experiment_params.name == '7scenes': 9 | benchmark = SevenScenesBenchmark(cfg) 10 | 11 | benchmark.evaluate() 12 | 13 | 14 | if __name__ == "__main__": 15 | main() 16 | -------------------------------------------------------------------------------- /experiments/service/benchmark_base.py: -------------------------------------------------------------------------------- 1 | import torch 2 | 3 | 4 | class Benchmark(object): 5 | def __init__(self, cfg): 6 | self.cfg = cfg 7 | self.device = torch.device("cuda" if torch.cuda.is_available() else "cpu") 8 | 9 | def evaluate(self): 10 | raise NotImplementedError 11 | -------------------------------------------------------------------------------- /experiments/seven_scenes/filter_pose.m: -------------------------------------------------------------------------------- 1 | %% script to filter pose estimates from NN 2 | clear all 3 | addpath('matlab_service') 4 | addpath('../../assets/data') 5 | 6 | dataset_name = '7-Scenes'; % or 'University' 7 | % getting GT file 8 | if strcmp(dataset_name, '7-Scenes') 9 | file_id_gt = fopen('NN_7scenes.txt'); 10 | elseif strcmp(dataset_name, 'University') 11 | file_id_gt = fopen('NN_university.txt'); 12 | else 13 | error('Please, specify dataset_name variable properly [7-Scenes or University]'); 14 | end 15 | 16 | % Txt file with network predictions 17 | file_id_est = fopen('../../output/est_rel_poses.txt'); 18 | 19 | data_cells = textscan(file_id_gt, '%s %s %d %d %d %f %f %f %f %f %f %f %f %f %f %f %f %f %f'); 20 | translation_gt_q = [data_cells{1,4+2} data_cells{1,5+2} data_cells{1,6+2}]; 21 | orientation_gt_q = [data_cells{1,7+2} data_cells{1,8+2} data_cells{1,9+2} ... 22 | data_cells{1,10+2}]; 23 | translation_gt_db = [data_cells{1,11+2} data_cells{1,12+2} data_cells{1,13+2}]; 24 | orientation_gt_db = [data_cells{1,14+2} data_cells{1,15+2} data_cells{1,16+2} ... 25 | data_cells{1,17+2}]; 26 | 27 | number_of_pairs = size(translation_gt_q, 1); 28 | 29 | data_cells_est = textscan(file_id_est, '%f %f %f %f %f %f %f'); 30 | 31 | orientation_est = [data_cells_est{1, 1} data_cells_est{1, 2} ... 
32 | data_cells_est{1, 3} data_cells_est{1, 4}]; 33 | translation_est = [data_cells_est{1, 5} data_cells_est{1, 6} data_cells_est{1, 7}]; 34 | 35 | %estimations = fread(pred_file_id, [7 Inf], 'float')'; 36 | fclose(file_id_gt); 37 | fclose(file_id_est); 38 | 39 | %orientation_est = estimations(:, 1:4); 40 | %translation_est = estimations(:, 5:end); 41 | 42 | %% main filtering stage 43 | 44 | % intialize variables 45 | orientation_err_deg = zeros(1, number_of_pairs); 46 | translation_err_deg = zeros(1, number_of_pairs); 47 | 48 | NN_count = 0; % counter over the NN from 1 to |NN|(=5) 49 | allPairs = 0; % index to store the triangulated 3D camera locations 50 | queryNum = 0; 51 | 52 | % errors 53 | err_trans = zeros(1, number_of_pairs/5); 54 | err_quat = zeros(1, number_of_pairs/5); 55 | 56 | % triangulations from mid-point algo 57 | P = cell(1,5); 58 | matches = zeros(1,4); 59 | trans_tmp = zeros(10,3); % store all the possible camera locations from pairwise combinations of NN db images 60 | 61 | % NN 62 | R_q_NN = zeros(5,4); 63 | R_db_NN = zeros(5,4); 64 | t_db_NN = zeros(5,3); 65 | t_q_NN = zeros(5,3); 66 | 67 | % estimated direction vectors from db to query 68 | centers_rel_network = zeros(5,3); 69 | 70 | falseC = 0; 71 | 72 | for k=1:number_of_pairs 73 | % k 74 | %% ground truths 75 | %------- rotation 76 | R_q = quat2rotm(orientation_gt_q(k,:) ./ norm(orientation_gt_q(k,:))); 77 | R_db = quat2rotm(orientation_gt_db(k,:) ./ norm(orientation_gt_db(k,:))); 78 | %------- translation 79 | t_q = translation_gt_q(k,:) ; 80 | t_db = translation_gt_db(k,:) ; 81 | 82 | %% estimations 83 | %------- rotation 84 | delR_est = quat2rotm(orientation_est(k,:) ./ norm(orientation_est(k,:))); 85 | R_q_est = R_db*delR_est; 86 | %------- translation 87 | t_est_center = translation_est(k,:)./norm(translation_est(k,:)); % (C_i - C_j) 88 | t_est = (R_db'*t_est_center'); %R_j'(C_i - C_j) 89 | 90 | %% ------------------------------------------------------------------------- 91 | % store the estimations and db pose estimations for each NN related to 92 | % a query 93 | NN_count = NN_count + 1; 94 | R_q_NN(NN_count,:) = rotm2quat(R_q_est); 95 | t_q_NN(NN_count,:) = t_est; 96 | 97 | R_db_NN(NN_count,:) = rotm2quat(R_db); 98 | t_db_NN(NN_count,:) = t_db; 99 | 100 | P{NN_count} = [R_db' -R_db'*t_db']; 101 | 102 | centers_rel_network(NN_count,:) = t_est_center; 103 | 104 | % iterate over pairwise combinations {(1,2),(1,3),(2,3),(1,4),(2,4).....(4,5)} 105 | for i = 1:NN_count-1 106 | 107 | 108 | allPairs = allPairs + 1; 109 | 110 | % for triangulating a 3D camera position, we need the camera 111 | % matrices of the two db cameras: P1, P2 and the translation 112 | % directions from db to q: t1, t2 such that the z-cordinate is 1 113 | P1 = P{i}; 114 | P2 = P{NN_count}; 115 | t1 = t_q_NN(i,:)./t_q_NN(i,3); 116 | t2 = t_q_NN(NN_count,:)./t_q_NN(NN_count,3); 117 | matches(1,1:2) = t1(1:2); 118 | matches(1,3:4) = t2(1:2); 119 | X = triangmidpoints(matches, P1, P2); 120 | 121 | trans_tmp(allPairs,:) = X; 122 | end 123 | 124 | 125 | %% Filtering stage 126 | 127 | % if all the NN for a query are processed 128 | if NN_count == 5 129 | queryNum = queryNum + 1; 130 | 131 | % re-initialize the variables 132 | NN_count = 0; 133 | allPairs = 0; 134 | 135 | %% inlier process for trans 136 | 137 | % NaN can arise when the translation direction of two NN db 138 | % image-pairs used to triangulate the camera location have the same 139 | % direction. 
In the event all the pairwise combinations output same 140 | % translation directions, assign the translation vector of the NN 141 | % to the query 142 | if numel(find(isnan(trans_tmp))) == 10 143 | X_pred = t_db_NN(1,:); 144 | [err_trans(queryNum)] = norm(X_pred - t_q); 145 | else 146 | % remove the nan estimates 147 | nan_rows = any(isnan(trans_tmp),2) ; 148 | trans_tmp(nan_rows,:) = []; 149 | 150 | thresh_trans = 20; %10 degrees 151 | inlier_cnt_T = zeros(1,size(trans_tmp,1)); % store the inlier count estimates of the triangulated camera locs 152 | inlier_sum_T = zeros(1,size(trans_tmp,1)); % store the sum of residuals of distances for the inliers 153 | % estimate inliers for orientation 154 | % iterate over the triangulated 3D camera locs 155 | for h = 1:size(trans_tmp,1) 156 | 157 | % obtain the direction vectors from the database to query 158 | centers_rel_triang = bsxfun(@minus, trans_tmp(h,:), t_db_NN); 159 | 160 | % make unit length 161 | centers_rel_triang = bsxfun(@rdivide,centers_rel_triang,sqrt(sum(abs(centers_rel_triang).^2,2))); 162 | 163 | % compute angular distance between the translation 164 | % directions predicted by the network: centers_rel_network and that 165 | % obtained from triangulation followed by backprojection: 166 | % centers_rel_triang 167 | 168 | angular_dist_T = 2*acos(abs(sum(centers_rel_triang.*centers_rel_network,2)))*180/pi; 169 | 170 | inlier_thresh_T = find(angular_dist_T 1 %if exists such other estimate 182 | 183 | % OPtion 1: average the candidates 184 | X_best = mean(trans_tmp(sim_inlier_cnt_T,:)); 185 | 186 | % % 187 | % % OPtion 2: select the inlier estimate with least residual sum 188 | % [all_estimates_T, all_ID_T] = min(inlier_sum_T(sim_inlier_cnt_T)); 189 | % % all_ID = randi([1 numel(sim_inlier_cnt)],1,1); % if randomly chosen 190 | % X_best = trans_tmp(sim_inlier_cnt_T(all_ID_T),:); 191 | err_trans(queryNum) = norm(X_best - t_q); 192 | 193 | % % take the inlier with the best estimate using GT 194 | % inl_dist_GT = 2*acos(abs(sum(bsxfun(@times, R_qs(sim_inlier_cnt,:),rotm2quat(R_q)),2)))*180/pi; 195 | % err_quat(queryNum) = min(inl_dist_GT); 196 | else 197 | X_best = trans_tmp(init_ID_T,:); 198 | err_trans(queryNum) = norm(X_best - t_q); 199 | end 200 | % 201 | end 202 | 203 | %% filtering process for rotation 204 | 205 | thresh_ort = 20; %10 degrees 206 | inlier_cnt = zeros(1,5); % store the inlier count estimates of the triangulated camera locs 207 | inlier_sum = zeros(1,5); % store the sum of residuals of distances for the inliers 208 | % iterate over the rotation estimates obtained from NN 209 | for h = 1:5 210 | 211 | % compute the angular distance between the current estimate of 212 | % query rotation R_q_NN(h,:) as indexed by h and the rest of 213 | % the estimations. 
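% (Note: for unit quaternions q1 and q2, the angle between the corresponding
% rotations is 2*acos(|<q1, q2>|); the absolute value removes the q / -q sign
% ambiguity, and the factor 180/pi converts the result to degrees.)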
214 | angular_dist = 2*acos(abs(sum(bsxfun(@times, R_q_NN(h,:),R_q_NN),2)))*180/pi; 215 | 216 | inlier_thresh = find(angular_dist 1 %if exists such other estimate 228 | 229 | % OPtion 1: average the candidates 230 | for inl = 1:numel(sim_inlier_cnt) 231 | 232 | R_inl(:,:,inl) = quat2rotm(R_q_NN(sim_inlier_cnt(inl),:)); 233 | 234 | end 235 | 236 | R_avg = dqq_L1_mean_rotation_matrix(R_inl); 237 | q_best = rotm2quat(R_avg); 238 | % % OPtion 2: select the inlier estimate with least residual sum 239 | % [all_estimates, all_ID] = min(inlier_sum(sim_inlier_cnt)); 240 | % q_best = R_qs(sim_inlier_cnt(all_ID),:); 241 | err_quat(queryNum) = 2*acos(abs(sum(q_best.*rotm2quat(R_q))))*180/pi; 242 | 243 | else 244 | q_best = R_q_NN(init_ID,:); 245 | err_quat(queryNum) = 2*acos(abs(sum(q_best.*rotm2quat(R_q))))*180/pi; 246 | end 247 | 248 | end 249 | 250 | 251 | 252 | end 253 | 254 | %% results 255 | if strcmp(dataset_name, '7-Scenes') 256 | chess = median(err_quat(6001:8000)); 257 | fire = median(err_quat(1:2000)); 258 | heads = median(err_quat(8001:9000)); 259 | office = median(err_quat(2001:6000)); 260 | pumpkin = median(err_quat(15001:17000)); 261 | redkitchen = median(err_quat(10001:15000)); 262 | stairs = median(err_quat(9001:10000)); 263 | fprintf('Orientation error, deg:\n'); 264 | fprintf('chess: %.2f\n', chess); 265 | fprintf('fire: %.2f\n', fire); 266 | fprintf('heads: %.2f\n', heads); 267 | fprintf('office: %.2f\n', office); 268 | fprintf('pumpkin: %.2f\n', pumpkin); 269 | fprintf('redkitchen: %.2f\n', redkitchen); 270 | fprintf('stairs: %.2f\n', stairs); 271 | fprintf('Mean averaged orientation: %.2f deg.\n', mean([chess fire heads office pumpkin redkitchen stairs])); 272 | fprintf('--------------------------------------------------------\n'); 273 | chess = median(err_trans(6001:8000)); 274 | fire = median(err_trans(1:2000)); 275 | heads = median(err_trans(8001:9000)); 276 | office = median(err_trans(2001:6000)); 277 | pumpkin = median(err_trans(15001:17000)); 278 | redkitchen = median(err_trans(10001:15000)); 279 | stairs = median(err_trans(9001:10000)); 280 | fprintf('Translation error, m:\n'); 281 | fprintf('chess: %.2f\n', chess); 282 | fprintf('fire: %.2f\n', fire); 283 | fprintf('heads: %.2f\n', heads); 284 | fprintf('office: %.2f\n', office); 285 | fprintf('pumpkin: %.2f\n', pumpkin); 286 | fprintf('redkitchen: %.2f\n', redkitchen); 287 | fprintf('stairs: %.2f\n', stairs); 288 | fprintf('Mean averaged translation: %.2f m.\n', mean([chess fire heads office pumpkin redkitchen stairs])); 289 | else 290 | conference = median(err_quat(1:949)); 291 | kitchen1 = median(err_quat(950:1939)); 292 | meeting = median(err_quat(1940:2884)); 293 | office = median(err_quat(2885:end)); 294 | fprintf('Orientation error, deg:\n'); 295 | fprintf('office: %.2f\n', office); 296 | fprintf('meeting: %.2f\n', meeting); 297 | fprintf('kitchen1: %.2f\n', kitchen1); 298 | fprintf('conference: %.2f\n', conference); 299 | fprintf('Mean averaged orientation: %.2f deg.\n', mean([conference kitchen1 meeting office])); 300 | fprintf('--------------------------------------------------------\n'); 301 | % translation 302 | conference = median(err_trans(1:949)); 303 | kitchen1 = median(err_trans(950:1939)); 304 | meeting = median(err_trans(1940:2884)); 305 | office = median(err_trans(2885:end)); 306 | fprintf('Translation error, m:\n'); 307 | fprintf('office: %.2f\n', office); 308 | fprintf('meeting: %.2f\n', meeting); 309 | fprintf('kitchen1: %.2f\n', kitchen1); 310 | fprintf('conference: %.2f\n', conference); 311 | 
fprintf('Mean averaged translation: %.2f m.\n', mean([conference kitchen1 meeting office]));
312 | end
313 |
314 |
315 |
316 |
--------------------------------------------------------------------------------
/experiments/seven_scenes/matlab_service/dqq_L1_mean_rotation_matrix.m:
--------------------------------------------------------------------------------
1 | function [ Rmean ] = dqq_L1_mean_rotation_matrix( R )
2 | %DQQ_L1_MEAN_ROTATION_MATRIX Summary of this function goes here
3 | % This function calculates the mean rotation matrix of the given 3*3*n array R
4 | % under the L1 norm using the Weiszfeld algorithm.
5 | % Please refer to the paper:
6 | % 'L1 rotation averaging using the Weiszfeld algorithm', Hartley et al., CVPR 2011
7 | % for details.
8 |
9 | S(:,:,1) = dqq_rotation_quaternion_initialization( R );
10 | nofR=size(R);
11 |
12 | iter=1;
13 |
14 | while isreal(S(:,:,iter))
15 |     iter=iter+1;
16 |     sum_vmatrix_normed(:,:,iter)=zeros(3,3);
17 |     for j=1:nofR(3)
18 |         vmatrix(:,:,j)=logm(R(:,:,j)*(S(:,:,iter-1))^(-1));
19 |         vmatrix_normed(:,:,j)=vmatrix(:,:,j)/norm(vmatrix(:,:,j));
20 |         sum_vmatrix_normed(:,:,iter)=sum_vmatrix_normed(:,:,iter)+vmatrix_normed(:,:,j);
21 |         inv_norm_vmatrix(j)=1/norm(vmatrix(:,:,j));
22 |     end
23 |
24 |     delta(:,:,iter)=sum_vmatrix_normed(:,:,iter)/sum(inv_norm_vmatrix);
25 |
26 |     S(:,:,iter)=expm(delta(:,:,iter))*S(:,:,iter-1);
27 |
28 |     if abs(1-det(S(:,:,iter)*S(:,:,iter)'))<10^(-10)
29 |         break;
30 |     end
31 | end
32 |
33 | Rmean=S(:,:,iter-1);
34 |
35 | end
36 |
--------------------------------------------------------------------------------
/experiments/seven_scenes/matlab_service/dqq_rotation_quaternion_initialization.m:
--------------------------------------------------------------------------------
1 | function [ Rm ] = dqq_rotation_quaternion_initialization( R )
2 | %DQQ_R_Q_INITIALIZATION Summary of this function goes here
3 | % Detailed explanation goes here
4 | % This function provides an initialization for the mean of the given rotation
5 | % matrices R.
6 | % Please refer to
7 | % 'Rotation Averaging with Application to Camera-Rig Calibration'
8 | % for details.
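% (Implementation note: the rotations are converted to quaternions and each
% quaternion appears to be sign-aligned with the first one -- q and -q encode
% the same rotation -- so that the subsequent averaging is well defined.)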
9 | 10 | QR=dcm2quat(R); 11 | 12 | SR=R(:,:,1); 13 | SQ=dcm2quat(SR); 14 | 15 | nofR=size(R); 16 | 17 | for i=1:nofR(3) 18 | if norm(QR(i,:)+SQ) 0.5: 52 | img1, img2 = img2, img1 53 | t_gt = -self.t_gt[item] 54 | q_gt = torch.FloatTensor([q_gt[0], -q_gt[1], -q_gt[2], -q_gt[3]]) 55 | 56 | return {'img1': img1, 57 | 'img2': img2, 58 | 't_gt': t_gt, 59 | 'q_gt': q_gt} 60 | 61 | def __len__(self): 62 | return len(self.fnames1) 63 | 64 | 65 | class SevenScenesTestDataset(object): 66 | def __init__(self, experiment_cfg, transforms=None): 67 | self.experiment_cfg = experiment_cfg 68 | self.transforms = transforms 69 | self.scenes_dict = defaultdict(str) 70 | for i, scene in enumerate(['chess', 'fire', 'heads', 'office', 'pumpkin', 'redkitchen', 'stairs']): 71 | self.scenes_dict[i] = scene 72 | 73 | self.fnames1, self.fnames2 = self._read_pairs_txt() 74 | 75 | def _read_pairs_txt(self): 76 | fnames1, fnames2 = [], [] 77 | 78 | pairs_txt = self.experiment_cfg.paths.test_pairs_fname 79 | img_dir = self.experiment_cfg.paths.img_path 80 | with open(pairs_txt, 'r') as f: 81 | for line in f: 82 | chunks = line.rstrip().split(' ') 83 | scene_id1 = int(chunks[2]) 84 | scene_id2 = int(chunks[3]) 85 | fnames1.append(osp.join(img_dir, self.scenes_dict[scene_id2], chunks[1][1:])) 86 | fnames2.append(osp.join(img_dir, self.scenes_dict[scene_id1], chunks[0][1:])) 87 | 88 | return fnames1, fnames2 89 | 90 | def __getitem__(self, item): 91 | img1 = Image.open(self.fnames1[item]).convert('RGB') 92 | img2 = Image.open(self.fnames2[item]).convert('RGB') 93 | 94 | if self.transforms: 95 | img1 = self.transforms(img1) 96 | img2 = self.transforms(img2) 97 | 98 | return {'img1': img1, 99 | 'img2': img2, 100 | } 101 | 102 | def __len__(self): 103 | return len(self.fnames1) 104 | -------------------------------------------------------------------------------- /relposenet/model.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import torch.nn as nn 3 | import torchvision.models as models 4 | 5 | 6 | class RelPoseNet(nn.Module): 7 | def __init__(self, cfg): 8 | super().__init__() 9 | self.cfg = cfg 10 | self.backbone, self.concat_layer = self._get_backbone() 11 | self.net_q_fc = nn.Linear(self.concat_layer.in_features, 4) 12 | self.net_t_fc = nn.Linear(self.concat_layer.in_features, 3) 13 | self.dropout = nn.Dropout(0.3) 14 | 15 | def _get_backbone(self): 16 | backbone, concat_layer = None, None 17 | if self.cfg.backbone_net == 'resnet34': 18 | backbone = models.resnet34(pretrained=True) 19 | in_features = backbone.fc.in_features 20 | backbone.fc = nn.Identity() 21 | concat_layer = nn.Linear(2 * in_features, 2 * in_features) 22 | return backbone, concat_layer 23 | 24 | def _forward_one(self, x): 25 | x = self.backbone(x) 26 | x = x.view(x.size()[0], -1) 27 | return x 28 | 29 | def forward(self, x1, x2): 30 | feat1 = self._forward_one(x1) 31 | feat2 = self._forward_one(x2) 32 | 33 | feat = torch.cat((feat1, feat2), 1) 34 | q_est = self.net_q_fc(self.dropout(self.concat_layer(feat))) 35 | t_est = self.net_t_fc(self.dropout(self.concat_layer(feat))) 36 | return q_est, t_est 37 | -------------------------------------------------------------------------------- /relposenet/pipeline.py: -------------------------------------------------------------------------------- 1 | import os 2 | from os import path as osp 3 | import time 4 | from tqdm import tqdm 5 | import torch 6 | from tensorboardX import SummaryWriter 7 | from relposenet.model import RelPoseNet 8 | from 
relposenet.dataset import SevenScenesRelPoseDataset 9 | from relposenet.augmentations import get_augmentations 10 | from relposenet.criterion import RelPoseCriterion 11 | from relposenet.utils import cycle, set_seed 12 | 13 | 14 | class Pipeline(object): 15 | def __init__(self, cfg): 16 | self.cfg = cfg 17 | cfg_model = self.cfg.model_params 18 | self.device = torch.device("cuda" if torch.cuda.is_available() else "cpu") 19 | set_seed(self.cfg.seed) 20 | 21 | # initialize dataloaders 22 | self.train_loader, self.val_loader = self._init_dataloaders() 23 | self.train_loader_iterator = iter(cycle(self.train_loader)) 24 | 25 | self.model = RelPoseNet(cfg_model).to(self.device) 26 | 27 | # Optimizer 28 | self.optimizer = torch.optim.Adam(self.model.parameters(), 29 | lr=self.cfg.train_params.lr) 30 | 31 | # Scheduler 32 | cfg_scheduler = self.cfg.train_params.scheduler 33 | self.scheduler = torch.optim.lr_scheduler.StepLR(self.optimizer, 34 | step_size=cfg_scheduler.lrate_decay_steps, 35 | gamma=cfg_scheduler.lrate_decay_factor) 36 | 37 | # Criterion 38 | self.criterion = RelPoseCriterion(self.cfg.train_params.alpha).to(self.device) 39 | 40 | # create writer (logger) 41 | self.writer = SummaryWriter(self.cfg.output_params.logger_dir) 42 | 43 | self.start_step = 0 44 | self.val_total_loss = 1e6 45 | if self.cfg.model_params.resume_snapshot: 46 | self._load_model(self.cfg.model_params.resume_snapshot) 47 | 48 | def _init_dataloaders(self): 49 | cfg_data = self.cfg.data_params 50 | cfg_train = self.cfg.train_params 51 | 52 | # get image augmentations 53 | train_augs, val_augs = get_augmentations() 54 | 55 | train_dataset = SevenScenesRelPoseDataset(cfg=self.cfg, split='train', transforms=train_augs) 56 | 57 | val_dataset = SevenScenesRelPoseDataset(cfg=self.cfg, split='val', transforms=val_augs) 58 | 59 | train_loader = torch.utils.data.DataLoader(train_dataset, 60 | batch_size=cfg_train.bs, 61 | shuffle=True, 62 | pin_memory=True, 63 | num_workers=cfg_train.n_workers, 64 | drop_last=True) 65 | 66 | val_loader = torch.utils.data.DataLoader(val_dataset, 67 | batch_size=cfg_train.bs, 68 | shuffle=False, 69 | pin_memory=True, 70 | num_workers=cfg_train.n_workers, 71 | drop_last=True) 72 | return train_loader, val_loader 73 | 74 | def _predict_cam_pose(self, mini_batch): 75 | q_est, t_est = self.model.forward(mini_batch['img1'].to(self.device), 76 | mini_batch['img2'].to(self.device)) 77 | return q_est, t_est 78 | 79 | def _save_model(self, step, loss_val, best_val=False): 80 | if not osp.exists(self.cfg.output_params.snapshot_dir): 81 | os.makedirs(self.cfg.output_params.snapshot_dir) 82 | 83 | fname_out = 'best_val.pth' if best_val else 'snapshot{:06d}.pth'.format(step) 84 | save_path = osp.join(self.cfg.output_params.snapshot_dir, fname_out) 85 | model_state = self.model.state_dict() 86 | torch.save({'step': step, 87 | 'state_dict': model_state, 88 | 'optimizer': self.optimizer.state_dict(), 89 | 'scheduler': self.scheduler.state_dict(), 90 | 'val_loss': loss_val, 91 | }, 92 | save_path) 93 | 94 | def _load_model(self, snapshot): 95 | data_dict = torch.load(snapshot) 96 | self.model.load_state_dict(data_dict['state_dict']) 97 | self.optimizer.load_state_dict(data_dict['optimizer']) 98 | self.scheduler.load_state_dict(data_dict['scheduler']) 99 | self.start_step = data_dict['step'] 100 | if 'val_loss' in data_dict: 101 | self.val_total_loss = data_dict['val_loss'] 102 | 103 | def _train_batch(self): 104 | train_sample = next(self.train_loader_iterator) 105 | q_est, t_est = 
self._predict_cam_pose(train_sample) 106 | 107 | self.optimizer.zero_grad() 108 | 109 | # compute loss 110 | loss, t_loss_val, q_loss_val = self.criterion(train_sample['q_gt'].to(self.device), 111 | train_sample['t_gt'].to(self.device), 112 | q_est, 113 | t_est) 114 | loss.backward() 115 | 116 | # update the optimizer 117 | self.optimizer.step() 118 | 119 | # update the scheduler 120 | self.scheduler.step() 121 | return loss.item(), t_loss_val, q_loss_val 122 | 123 | def _validate(self): 124 | self.model.eval() 125 | loss_total, t_loss_total, q_loss_total = 0., 0., 0. 126 | 127 | with torch.no_grad(): 128 | for val_sample in tqdm(self.val_loader): 129 | q_est, t_est = self._predict_cam_pose(val_sample) 130 | # compute loss 131 | loss, t_loss_val, q_loss_val = self.criterion(val_sample['q_gt'].to(self.device), 132 | val_sample['t_gt'].to(self.device), 133 | q_est, 134 | t_est) 135 | loss_total += loss.item() 136 | t_loss_total += t_loss_val 137 | q_loss_total += q_loss_val 138 | 139 | avg_total_loss = loss_total / len(self.val_loader) 140 | avg_t_loss = t_loss_total / len(self.val_loader) 141 | avg_q_loss = q_loss_total / len(self.val_loader) 142 | 143 | self.model.train() 144 | 145 | return avg_total_loss, avg_t_loss, avg_q_loss 146 | 147 | def run(self): 148 | print('Start training', self.start_step) 149 | train_start_time = time.time() 150 | train_log_iter_time = time.time() 151 | for step in range(self.start_step + 1, self.start_step + self.cfg.train_params.n_train_iters): 152 | train_loss_batch, _, _ = self._train_batch() 153 | 154 | if step % self.cfg.output_params.log_scalar_interval == 0 and step > 0: 155 | self.writer.add_scalar('Train_total_loss_batch', train_loss_batch, step) 156 | print(f'Elapsed time [min] for {self.cfg.output_params.log_scalar_interval} iterations: ' 157 | f'{(time.time() - train_log_iter_time) / 60.}') 158 | train_log_iter_time = time.time() 159 | print(f'Step {step} out of {self.cfg.train_params.n_train_iters} is done. 
Train loss (per batch): ' 160 | f'{train_loss_batch}.') 161 | 162 | if step % self.cfg.output_params.validate_interval == 0 and step > 0: 163 | val_time = time.time() 164 | best_val = False 165 | val_total_loss, val_t_loss, val_q_loss = self._validate() 166 | self.writer.add_scalar('Val_total_loss', val_total_loss, step) 167 | self.writer.add_scalar('Val_t_loss', val_t_loss, step) 168 | self.writer.add_scalar('Val_q_loss', val_q_loss, step) 169 | if val_total_loss < self.val_total_loss: 170 | self.val_total_loss = val_total_loss 171 | best_val = True 172 | self._save_model(step, val_total_loss, best_val=best_val) 173 | print(f'Validation loss: {val_total_loss}, t_loss: {val_t_loss}, q_loss: {val_q_loss}') 174 | print(f'Elapsed time [min] for validation: {(time.time() - val_time) / 60.}') 175 | train_log_iter_time = time.time() 176 | 177 | print(f'Elapsed time for training [min] {(time.time() - train_start_time) / 60.}') 178 | print('Done') 179 | -------------------------------------------------------------------------------- /relposenet/utils.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | import random 3 | import torch 4 | 5 | 6 | def set_seed(seed): 7 | torch.backends.cudnn.benchmark = True 8 | torch.manual_seed(seed) 9 | torch.cuda.manual_seed_all(seed) 10 | torch.cuda.manual_seed(seed) 11 | np.random.seed(seed) 12 | random.seed(seed) 13 | 14 | 15 | def cycle(iterable): 16 | while True: 17 | for x in iterable: 18 | yield x 19 | -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | absl-py==0.11.0 2 | albumentations==0.5.2 3 | antlr4-python3-runtime==4.8 4 | argon2-cffi==20.1.0 5 | async-generator==1.10 6 | attrs==20.3.0 7 | backcall==0.2.0 8 | bleach==3.2.1 9 | blessings==1.7 10 | cachetools==4.1.1 11 | certifi==2020.11.8 12 | cffi==1.14.4 13 | chardet==3.0.4 14 | ConfigArgParse==1.2.3 15 | cupy-cuda102==8.0.0 16 | cycler==0.10.0 17 | decorator==4.4.2 18 | defusedxml==0.6.0 19 | entrypoints==0.3 20 | fastrlock==0.5 21 | future==0.18.2 22 | google-auth==1.23.0 23 | google-auth-oauthlib==0.4.2 24 | gpustat==0.6.0 25 | grpcio==1.33.2 26 | hydra-core==1.0.4 27 | idna==2.10 28 | imageio==2.9.0 29 | imgaug==0.4.0 30 | importlib-metadata==2.0.0 31 | importlib-resources==3.3.0 32 | ipykernel==5.4.2 33 | ipython==7.19.0 34 | ipython-genutils==0.2.0 35 | ipywidgets==7.5.1 36 | jedi==0.17.2 37 | Jinja2==2.11.2 38 | joblib==0.17.0 39 | json5==0.9.5 40 | jsonschema==3.2.0 41 | jupyter==1.0.0 42 | jupyter-client==6.1.7 43 | jupyter-console==6.2.0 44 | jupyter-core==4.7.0 45 | jupyterlab==2.2.9 46 | jupyterlab-pygments==0.1.2 47 | jupyterlab-server==1.2.0 48 | kiwisolver==1.2.0 49 | kornia==0.4.1 50 | Markdown==3.3.3 51 | MarkupSafe==1.1.1 52 | matplotlib==3.3.2 53 | mistune==0.8.4 54 | mkl-fft==1.2.0 55 | mkl-random==1.1.1 56 | mkl-service==2.3.0 57 | nbclient==0.5.1 58 | nbconvert==6.0.7 59 | nbformat==5.0.8 60 | nest-asyncio==1.4.3 61 | networkx==2.5 62 | notebook==6.1.5 63 | numpy @ file:///tmp/build/80754af9/numpy_and_numpy_base_1596233707986/work 64 | nvidia-ml-py3==7.352.0 65 | oauthlib==3.1.0 66 | olefile==0.46 67 | omegaconf==2.0.5 68 | opencv-contrib-python==3.4.2.17 69 | opencv-python==3.4.2.17 70 | opencv-python-headless==4.4.0.46 71 | packaging==20.8 72 | pandocfilters==1.4.3 73 | parso==0.7.1 74 | pexpect==4.8.0 75 | pickleshare==0.7.5 76 | Pillow @ 
file:///tmp/build/80754af9/pillow_1594307325547/work 77 | prometheus-client==0.9.0 78 | prompt-toolkit==3.0.8 79 | protobuf==3.13.0 80 | psutil==5.7.2 81 | ptyprocess==0.6.0 82 | pyasn1==0.4.8 83 | pyasn1-modules==0.2.8 84 | pycparser==2.20 85 | Pygments==2.7.3 86 | pynvrtc==9.2 87 | pyparsing==2.4.7 88 | pyrsistent==0.17.3 89 | python-dateutil==2.8.1 90 | pytorch-metric-learning==0.9.94 91 | PyWavelets==1.1.1 92 | PyYAML==5.3.1 93 | pyzmq==20.0.0 94 | qtconsole==5.0.1 95 | QtPy==1.9.0 96 | requests==2.24.0 97 | requests-oauthlib==1.3.0 98 | rsa==4.6 99 | scikit-image==0.17.2 100 | scikit-learn==0.23.2 101 | scipy==1.5.2 102 | Send2Trash==1.5.0 103 | Shapely==1.7.1 104 | six==1.15.0 105 | tensorboard==2.3.0 106 | tensorboard-plugin-wit==1.7.0 107 | tensorboardX==2.1 108 | terminado==0.9.1 109 | testpath==0.4.4 110 | threadpoolctl==2.1.0 111 | tifffile==2020.10.1 112 | torch==1.6.0 113 | torchvision==0.7.0 114 | tornado==6.1 115 | tqdm==4.50.2 116 | traitlets==5.0.5 117 | typing-extensions==3.7.4.3 118 | urllib3==1.25.11 119 | wcwidth==0.2.5 120 | webencodings==0.5.1 121 | Werkzeug==1.0.1 122 | widgetsnbextension==3.5.1 123 | zipp==3.4.0 124 | faiss==1.6.3 125 | -------------------------------------------------------------------------------- /tests/dataloader_tests.py: -------------------------------------------------------------------------------- 1 | import hydra 2 | import torch 3 | from relposenet.dataset import SevenScenesRelPoseDataset 4 | from relposenet.augmentations import train_augmentations 5 | import matplotlib.pyplot as plt 6 | 7 | 8 | @hydra.main(config_path="../configs", config_name="main") 9 | def main(cfg): 10 | augs = train_augmentations() 11 | dataset = SevenScenesRelPoseDataset(cfg, 12 | split='train', 13 | transforms=augs) 14 | 15 | dataloader = torch.utils.data.DataLoader(dataset, 16 | batch_size=1, 17 | shuffle=True, 18 | num_workers=8) 19 | 20 | for idx, mini_batch in enumerate(dataloader): 21 | 22 | print(f'Pair id: {idx}') 23 | print(f't: {mini_batch["t_gt"]}, q: {mini_batch["q_gt"]}') 24 | 25 | fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(35, 10)) 26 | axes = axes.flatten() 27 | 28 | axes[0].imshow(mini_batch['img1'].permute(0, 2, 3, 1).squeeze().numpy()) 29 | axes[1].imshow(mini_batch['img2'].permute(0, 2, 3, 1).squeeze().numpy()) 30 | 31 | axes[0].axis("off") 32 | axes[1].axis("off") 33 | 34 | plt.show(block=False) 35 | plt.pause(3) 36 | plt.close() 37 | 38 | 39 | if __name__ == '__main__': 40 | main() 41 | --------------------------------------------------------------------------------
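As a closing note, the numbers in the README table come from `filter_pose.m`: for each scene it reports the median translation error (Euclidean distance between the estimated and ground-truth camera positions, in metres) and the median orientation error (the quaternion angular distance `2*acos(|<q_est, q_gt>|)`, in degrees). Below is a minimal Python sketch of these two measures; the helper names are hypothetical and not part of the repository.

```
import numpy as np


def orientation_error_deg(q_est, q_gt):
    """Angle in degrees between two rotations given as quaternions."""
    q_est = np.asarray(q_est) / np.linalg.norm(q_est)
    q_gt = np.asarray(q_gt) / np.linalg.norm(q_gt)
    d = np.clip(np.abs(np.dot(q_est, q_gt)), 0.0, 1.0)  # |<q1, q2>| handles the q/-q ambiguity
    return np.degrees(2.0 * np.arccos(d))


def translation_error_m(t_est, t_gt):
    """Euclidean distance between estimated and ground-truth camera positions."""
    return np.linalg.norm(np.asarray(t_est) - np.asarray(t_gt))


# The per-scene numbers in the README table are medians over all test queries, e.g.
# np.median([orientation_error_deg(q, q_gt) for q, q_gt in zip(est_quats, gt_quats)])
```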