├── .gitignore ├── LICENSE ├── README.md ├── code ├── .gitignore ├── Logger.py ├── config.py ├── data │ ├── MovingObj3D.py │ ├── SimpleLoader.py │ ├── TUM_RGBD.py │ ├── data_examples │ │ └── TUM │ │ │ ├── color │ │ │ ├── 1305031790.645155.png │ │ │ ├── 1305031790.713097.png │ │ │ ├── 1305031790.781258.png │ │ │ ├── 1305031790.845151.png │ │ │ └── 1305031790.913129.png │ │ │ └── depth │ │ │ ├── 1305031790.640468.png │ │ │ ├── 1305031790.709421.png │ │ │ ├── 1305031790.773548.png │ │ │ ├── 1305031790.839363.png │ │ │ └── 1305031790.909436.png │ └── dataloader.py ├── evaluate.py ├── models │ ├── LeastSquareTracking.py │ ├── __init__.py │ ├── algorithms.py │ ├── criterions.py │ ├── geometry.py │ └── submodules.py ├── run_example.py ├── timers.py ├── train.py ├── train_utils.py └── trained_models │ └── TUM_RGBD_ABC_final.pth.tar ├── images └── overall_flowchart.png └── setup └── environment.yml /.gitignore: -------------------------------------------------------------------------------- 1 | *.DS_Store 2 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2019 Zhaoyang Lv, Frank Dellaert, James M. Rehg, Andreas Geiger 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 22 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Taking a Deeper Look at the Inverse Compositional Algorithm (CVPR 2019, Oral Presentation) 2 | 3 | ![alt text](images/overall_flowchart.png) 4 | 5 | ## Summary 6 | 7 | This is the official repository of our CVPR 2019 paper: 8 | 9 | **Taking a Deeper Look at the Inverse Compositional Algorithm**, 10 | *Zhaoyang Lv, Frank Dellaert, James M. Rehg, Andreas Geiger*, 11 | CVPR 2019 12 | * [Preprint (PDF)][1] 13 | * [Video talk][2] 14 | 15 | ```bibtex 16 | @inproceedings{Lv19cvpr, 17 | title = {Taking a Deeper Look at the Inverse Compositional Algorithm}, 18 | author = {Lv, Zhaoyang and Dellaert, Frank and Rehg, James and Geiger, Andreas}, 19 | booktitle = {CVPR}, 20 | year = {2019} 21 | } 22 | ``` 23 | 24 | Please cite the paper if you use the provided code in your research. 25 | 26 | ### Project Members 27 | 28 | * [Zhaoyang Lv][3], Georgia Institute of Technology (Ph.D. 
student), Max Planck Institute (Alumni) 29 | * [Frank Dellaert][4], Georgia Institute of Technology 30 | * [James M. Rehg][5], Georgia Institute of Technology 31 | * [Andreas Geiger][6], Max Planck Institute, University of Tuebingen 32 | 33 | ### Contact 34 | 35 | Please drop me an email if you have any questions regarding this project. 36 | 37 | Zhaoyang Lv (zhaoyang.lv@gatech.edu, lvzhaoyang1990@gmail.com) 38 | 39 | ### Setup 40 | 41 | The code is developed using Pytorch 1.0, CUDA 9.0, Ubuntu16.04. Pytorch > 1.0 and Cuda > 9.0 were also tested in some machines. If you need to integrate to some code bases which are using Pytorch < 1.0, note that some functions (particularly the matrix inverse operator) are not backward compatible. Contact me if you need some instructions. 42 | 43 | You can reproduce the setup by using our anaconda environment configurations 44 | 45 | ``` bash! 46 | conda env create -f setup/environment.yml 47 | ``` 48 | 49 | Everytime before you run, activate the environment 50 | 51 | ``` bash! 52 | conda activate deepICN # the environment name will be (deepICN) 53 | ``` 54 | 55 | ### Quick Inference Example 56 | 57 | Here is a quick run using pre-trained model on a short TUM trajectory. 58 | 59 | ``` 60 | python run_example.py 61 | ``` 62 | 63 | Be careful about the depth reader when you switch to a different dataset, which may use different scaling to the TUM dataset. And this pre-trained model is trained for egocentric-motion estimation using TUM dataset, not for the object-centric motion estimation. These factors may affect the result. 64 | 65 | To run the full training and evaluation, please follow the steps below. 66 | 67 | ### Prepare the datasets 68 | 69 | **TUM RGBD Dataset**: Download the dataset from [TUM RGBD][7] to '$YOUR_TUM_RGBD_DIR'. Create a symbolic link to the data directory as 70 | 71 | ``` 72 | ln -s $YOUR_TUM_RGBD_DIR code/data/data_tum 73 | ``` 74 | 75 | **MovingObjects3D Dataset** Download the dataset from [MovingObjs3D][8] to '$YOUR_MOV_OBJS_3D_DIR'. Create a symbolic link to the data directory as 76 | 77 | ``` 78 | ln -s $YOUR_MOV_OBJS_3D_DIR code/data/data_objs3D 79 | ``` 80 | 81 | ### Run training 82 | 83 | **Train example with TUM RGBD dataset:** 84 | 85 | ``` bash! 86 | python train.py --dataset TUM_RGBD 87 | 88 | # or run with the specific setting 89 | python train.py --dataset TUM_RGBD \ 90 | --encoder_name ConvRGBD2 \ 91 | --mestimator MultiScale2w \ 92 | --solver Direct-ResVol \ 93 | --keyframes 1,2,4,8 # mixing keyframes subsampled from 1,2,4,8 for training. 94 | ``` 95 | 96 | To check the full training setting, run the help config as 97 | ``` bash! 98 | python train.py --help 99 | ``` 100 | 101 | Use tensorboard to check the progress during training 102 | ``` bash! 103 | tensorboard --log logs/TUM_RGBD --port 8000 # go to localhost:8000 to check the training&validation curve 104 | ``` 105 | 106 | **Train example with MovingObjects3D:** All the same as the last one only except changing the dataset name 107 | 108 | ``` bash! 109 | python train.py --dataset MovingObjs3D \ 110 | --keyframes 1,2,4 # mixing keyframes subsampled from 1,2,4 for training 111 | 112 | # check the training progress using tensorboard 113 | tensorboard --log logs/MovingObjs3D --port 8001 114 | ``` 115 | 116 | We will soon release the instructions for the other two datasets used in the paper, data in the BundleFusion and DynamicBundleFusion. 
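**How keyframe mixing works:** the `--keyframes` string is parsed into a list of integer frame gaps (as in `evaluate.py`), and each training pair then samples one of these gaps at random (`random.choice(self.keyframes)` in the datasets' `__getitem__`). The snippet below is a minimal sketch for inspecting the resulting training pairs; it assumes the `code/data/data_tum` link described above exists and that you run it from the `code` directory.

``` python
import torch.utils.data as data
from data.dataloader import load_data

# same parsing of the --keyframes string as in the training/evaluation scripts
keyframes = [int(k) for k in '1,2,4,8'.split(',')]

np_loader = load_data('TUM_RGBD', keyframes, load_type='train')
torch_loader = data.DataLoader(np_loader, batch_size=4, shuffle=True, num_workers=2)

# each item is (color0, color1, depth0, depth1, relative pose, intrinsics, pair name)
color0, color1, depth0, depth1, transform, calib, names = next(iter(torch_loader))
print(color0.shape, depth0.shape)  # inputs are downscaled to a quarter of the TUM resolution
print(names)                       # the pair name encodes the sampled frame gap, e.g. fr1/..._000010to000014
```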
117 | 118 | ### Run evaluation with the pretrained models 119 | 120 | **Run the pretrained model:** If you have set up the dataset properly with the datasets, you can run the learned model with the checkpoint we provided in the trained model directory 121 | 122 | ``` bash! 123 | python evaluate.py --dataset TUM_RGBD \ 124 | --trajectory fr1/rgbd_dataset_freiburg1_360 \ 125 | --encoder_name ConvRGBD2 \ 126 | --mestimator MultiScale2w \ 127 | --solver Direct-ResVol \ 128 | --keyframes 1 \ 129 | --checkpoint trained_models/TUM_RGBD_ABC_final.pth.tar 130 | ``` 131 | 132 | You can substitute the trajectory, the keyframe and the checkpoint file. The training and evaluation share the same config setting. To check the full setting, run the help config as 133 | 134 | ``` bash! 135 | python evaluate.py --help 136 | ``` 137 | 138 | **Results:** The evaluation results will be generated automatically in both '.pkl' and '*.csv' in the folder 'test_results/'. 139 | 140 | **Run a baseline:** This implementation can be simplified to a vanilla Lucas-Kanade method minizing the photometric error without any learnable module. Note that it is **not** the [RGBD VO baseline][9] we report in the paper. It may not be the optimal Lucas-Kanade baseline you want to compared with since we use the same stopping criterion, Gauss-Newton solver within the same framework as our learned model. There is no extra bells and whistles, but it may provide a baseline for you to explore the algorithm in various directions. 141 | 142 | ``` bash! 143 | python evaluate.py --dataset TUM_RGBD \ 144 | --trajectory fr1/rgbd_dataset_freiburg1_360 \ 145 | --encoder_name RGB --mestimator None --solve Direct-Nodamping \ 146 | --keyframes 1 # optionally 1,2,4,8, etc. 147 | ``` 148 | 149 | ## License 150 | 151 | MIT License 152 | 153 | Copyright (c) 2019 Zhaoyang Lv, Frank Dellaert, James M. Rehg, Andreas Geiger 154 | 155 | Permission is hereby granted, free of charge, to any person obtaining a copy 156 | of this software and associated documentation files (the "Software"), to deal 157 | in the Software without restriction, including without limitation the rights 158 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 159 | copies of the Software, and to permit persons to whom the Software is 160 | furnished to do so, subject to the following conditions: 161 | 162 | The above copyright notice and this permission notice shall be included in all 163 | copies or substantial portions of the Software. 164 | 165 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 166 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 167 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 168 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 169 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 170 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 171 | SOFTWARE. 
172 | 173 | 174 | [1]: https://arxiv.org/pdf/1812.06861.pdf 175 | [2]: https://youtu.be/doTjXDFtyK0 176 | [3]: https://www.cc.gatech.edu/~zlv30/ 177 | [4]: https://www.cc.gatech.edu/~dellaert/FrankDellaert/Frank_Dellaert/Frank_Dellaert.html 178 | [5]: https://rehg.org/ 179 | [6]: http://www.cvlibs.net/ 180 | [7]: https://vision.in.tum.de/data/datasets/rgbd-dataset/download 181 | [8]: https://drive.google.com/open?id=1EIlS4J2J0sdsq8Mw_03DXHlRQmfL8XQx 182 | [9]: https://vision.cs.tum.edu/_media/spezial/bib/steinbruecker_sturm_cremers_iccv11.pdf -------------------------------------------------------------------------------- /code/.gitignore: -------------------------------------------------------------------------------- 1 | trained_models/* 2 | test/* 3 | test_results/* 4 | */**/*.pyc 5 | logs/* 6 | -------------------------------------------------------------------------------- /code/Logger.py: -------------------------------------------------------------------------------- 1 | """ Logger, wrapped on Tensorboard 1.0.0a6 2 | Tensorboard is not backward compatible since then. 3 | 4 | @author: Zhaoyang Lv 5 | @date: March 2019 6 | """ 7 | 8 | import sys, os, shutil 9 | import os.path as osp 10 | import tensorboard 11 | import torch 12 | from collections import OrderedDict 13 | 14 | class Logger(object): 15 | """ 16 | example usage: 17 | 18 | stdout = Logger('log.txt') 19 | sys.stdout = stdout 20 | 21 | ... your code here ... 22 | 23 | stdout.delink() 24 | """ 25 | def __init__(self, filename="Default.log"): 26 | self.terminal = sys.stdout 27 | self.log = open(filename, "w") 28 | 29 | def delink(self): 30 | self.log.close() 31 | #self.log = open('foo', "w") 32 | # self.write = self.writeTerminalOnly 33 | 34 | def writeTerminalOnly(self, message): 35 | self.terminal.write(message) 36 | 37 | def write(self, message): 38 | self.terminal.write(message) 39 | self.log.write(message) 40 | 41 | def flush(self): 42 | pass 43 | 44 | class TensorBoardLogger(object): 45 | def __init__(self, logging_dir, logfile_name, print_freq = 10): 46 | 47 | self.log_dir = logging_dir 48 | self.print_freq = print_freq 49 | 50 | if not os.path.isdir(logging_dir): 51 | os.makedirs(logging_dir) 52 | 53 | self.summary_writer = tensorboard.FileWriter(logdir=logging_dir) 54 | 55 | # standard logger to print to terminal 56 | logfile = osp.join(logging_dir,'log.txt') 57 | stdout = Logger(logfile) 58 | sys.stdout = stdout 59 | 60 | def write_to_tensorboard(self, display_dict, iteration): 61 | """ Write the saved states (display_dict) to tensorboard 62 | """ 63 | for k, v in display_dict.items(): 64 | self.summary_writer.add_summary(tensorboard.summary.scalar(k, v), iteration) 65 | 66 | def write_to_terminal(self, display_dict, epoch, batch_iter, epoch_len, batch_time, is_train = True): 67 | """ Write the save states (display_dict) and training information to terminal for display 68 | """ 69 | 70 | if batch_iter % self.print_freq != 0: 71 | return 72 | 73 | if is_train: 74 | prefix = 'Train' 75 | else: 76 | prefix = 'Test' 77 | 78 | state = prefix + ':\tEpoch %d, Batch %d/%d, BatchTime %.4f'%(epoch+1, batch_iter, epoch_len, batch_time) 79 | 80 | loss = '' 81 | for k, v in display_dict.items(): 82 | loss += k + ' ' + '%.8f ' % v 83 | 84 | print(state + loss) 85 | 86 | def save_checkpoint(self, network, state_info = None, 87 | filename='checkpoint.pth.tar'): 88 | """save checkpoint to disk""" 89 | state_dict = network.state_dict().copy() 90 | 91 | if torch.cuda.device_count() > 1: 92 | state_dict_rename = OrderedDict() 93 | for k, 
v in state_dict.items(): 94 | name = k[7:] # remove `module.` 95 | state_dict_rename[name] = v 96 | state_dict = state_dict_rename 97 | 98 | if state_info is None: 99 | state = {'state_dict': state_dict} 100 | else: 101 | state = state_info 102 | state['state_dict'] = state_dict 103 | 104 | checkpoint_path = osp.join(self.log_dir,filename) 105 | torch.save(state, checkpoint_path) 106 | return checkpoint_path 107 | -------------------------------------------------------------------------------- /code/config.py: -------------------------------------------------------------------------------- 1 | """ 2 | Add the configurations by modules 3 | 4 | @author: Zhaoyang Lv 5 | @date: March 2019 6 | """ 7 | 8 | def add_tracking_config(parser): 9 | parser.add_argument('--network', 10 | default='DeepIC', type=str, 11 | choices=('DeepIC', 'GaussNewton'), 12 | help='Choose a network to run. \n \ 13 | The DeepIC is the proposed Deeper Inverse Compositional method. \n\ 14 | The GuassNewton is the baseline for Inverse Compositional method which does not include \ 15 | any learnable parameters\n') 16 | parser.add_argument('--mestimator', 17 | default='MultiScale2w', type=str, 18 | choices=('None', 'MultiScale2w'), 19 | help='Choose a weighting function for the Trust Region method.\n\ 20 | The MultiScale2w is the proposed (B) convolutional M-estimator. \n') 21 | parser.add_argument('--solver', 22 | default='Direct-ResVol', type=str, 23 | choices=('Direct-Nodamping', 'Direct-ResVol'), 24 | help='Choose the solver function for the Trust Region method. \n\ 25 | Direct-Nodamping is the Gauss-Newton algorithm, which does not use damping. \n\ 26 | Direct-ResVol is the proposed (C) Trust-Region Network. \n\ 27 | (default: Direct-ResVol) ') 28 | parser.add_argument('--encoder_name', 29 | default='ConvRGBD2', 30 | choices=('ConvRGBD2', 'RGB'), 31 | help='The encoder architectures. \ 32 | ConvRGBD2 takes the two-view features as input. \n\ 33 | RGB is using the raw RGB images as input (converted to intensity afterwards).\n\ 34 | (default: ConvRGBD2)') 35 | parser.add_argument('--max_iter_per_pyr', 36 | default=3, type=int, 37 | help='The maximum number of iterations at each pyramids.\n') 38 | parser.add_argument('--no_weight_sharing', 39 | action='store_true', 40 | help='If this flag is on, we disable sharing the weights across different backbone network when extracing \ 41 | features. In default, we share the weights for all network in each pyramid level.\n') 42 | parser.add_argument('--tr_samples', default=10, type=int, 43 | help='Set the number of trust-region samples. (default: 10)\n') 44 | 45 | def add_basics_config(parser): 46 | """ the basic setting 47 | (supposed to be shared through train and inference) 48 | """ 49 | parser.add_argument('--cpu_workers', type=int, default=12, 50 | help="Number of cpu threads for data loader.\n") 51 | parser.add_argument('--dataset', type=str, 52 | choices=('TUM_RGBD', 'MovingObjects3D'), 53 | help='Choose a dataset to train/val/evaluate.\n') 54 | parser.add_argument('--time', dest='time', action='store_true', 55 | help='Count the execution time of each step.\n' ) 56 | 57 | def add_test_basics_config(parser): 58 | parser.add_argument('--batch_per_gpu', default=8, type=int, 59 | help='Specify the batch size during test. 
The default is 8.\n') 60 | parser.add_argument('--checkpoint', default='', type=str, 61 | help='Choose a checkpoint model to test.\n') 62 | parser.add_argument('--keyframes', 63 | default='1,2,4,8', type=str, 64 | help='Choose the number of keyframes to train the algorithm.\n') 65 | parser.add_argument('--verbose', action='store_true', 66 | help='Print/save all the intermediate representations') 67 | parser.add_argument('--eval_set', default='test', 68 | choices=('test', 'validation')) 69 | parser.add_argument('--trajectory', type=str, 70 | default = '', 71 | help = 'Specify a trajectory to run.\n') 72 | 73 | def add_train_basics_config(parser): 74 | """ add the basics about the training """ 75 | parser.add_argument('--checkpoint', default='', type=str, 76 | help='Choose a pretrained checkpoint model to start with. \n') 77 | parser.add_argument('--batch_per_gpu', default=64, type=int, 78 | help='Specify the batch size during training.\n') 79 | parser.add_argument('--epochs', 80 | default=30, type=int, 81 | help='The total number of total epochs to run. Default is 30.\n' ) 82 | parser.add_argument('--resume_training', 83 | dest='resume_training', action='store_true', 84 | help='Resume the training using the loaded checkpoint. If not, restart the training. \n\ 85 | You will need to use the --checkpoint config to load the pretrained checkpoint' ) 86 | parser.add_argument('--pretrained_model', default='', type=str, 87 | help='Initialize the model weights with pretrained model.\n') 88 | parser.add_argument('--no_val', 89 | default=False, 90 | action='store_true', 91 | help='Use no validatation set for training.\n') 92 | parser.add_argument('--keyframes', 93 | default='1,2,4,8', type=str, 94 | help='Choose the number of keyframes to train the algorithm') 95 | parser.add_argument('--verbose', action='store_true', 96 | help='Print/save all the intermediate representations.\n') 97 | 98 | def add_train_log_config(parser): 99 | """ checkpoint and log options """ 100 | parser.add_argument('--checkpoint_folder', default='', type=str, 101 | help='The folder name (postfix) to save the checkpoint.') 102 | parser.add_argument('--snapshot', default=1, type=int, 103 | help='Number of interations to save a snapshot') 104 | parser.add_argument('--save_checkpoint_freq', 105 | default=1, type=int, 106 | help='save the checkpoint for every N epochs') 107 | parser.add_argument('--prefix', default='', type=str, 108 | help='the prefix string added to the log files') 109 | parser.add_argument('-p', '--print_freq', 110 | default=10, type=int, 111 | help='print frequency (default: 10)') 112 | 113 | def add_train_optim_config(parser): 114 | """ add training optimization options """ 115 | parser.add_argument('--opt', 116 | type=str, default='adam', choices=('sgd','adam'), 117 | help='choice of optimizer (default: adam) \n') 118 | parser.add_argument('--lr', 119 | default=0.0005, type=float, 120 | help='initial learning rate. 
\n') 121 | parser.add_argument('--lr_decay_ratio', 122 | default=0.5, type=float, 123 | help='lr decay ratio (default:0.5)') 124 | parser.add_argument('--lr_decay_epochs', 125 | default=[5, 10, 20], type=int, nargs='+', 126 | help='lr decay epochs') 127 | parser.add_argument('--lr_min', default=1e-6, type=float, 128 | help='minimum learning rate') 129 | parser.add_argument('--lr_restart', default=10, type=int, 130 | help='restart learning after N epochs') 131 | 132 | def add_train_loss_config(parser): 133 | """ add training configuration for the loss function """ 134 | parser.add_argument('--regression_loss_type', 135 | default='SmoothL1', type=str, choices=('L1', 'SmoothL1'), 136 | help='Loss function for flow regression (default: SmoothL1 loss)') -------------------------------------------------------------------------------- /code/data/MovingObj3D.py: -------------------------------------------------------------------------------- 1 | """ 2 | Data loader for MovingObjs 3D dataset 3 | 4 | @author: Zhaoyang Lv 5 | @date: May 2019 6 | """ 7 | 8 | from __future__ import absolute_import 9 | from __future__ import division 10 | from __future__ import print_function 11 | from __future__ import unicode_literals 12 | 13 | import sys, os, random 14 | import pickle 15 | import functools 16 | 17 | import numpy as np 18 | import torch.utils.data as data 19 | import os.path as osp 20 | 21 | from scipy.misc import imread 22 | from tqdm import tqdm 23 | 24 | from cv2 import resize, INTER_NEAREST 25 | 26 | class MovingObjects3D(data.Dataset): 27 | 28 | # every sequence has 200 frames. 29 | categories = { 30 | 'train': {'aeroplane': [0,190], 31 | 'bicycle': [0,190], 32 | 'bus': [0,190], 33 | 'car': [0,190]}, 34 | 35 | 'validation': {'aeroplane': [190,200], 36 | 'bicycle': [190,200], 37 | 'bus': [190,200], 38 | 'car': [190,200]}, 39 | 40 | 'test': {'boat': [0,200], 41 | 'motorbike': [0,200]} 42 | } 43 | 44 | def __init__(self, root, load_type='train', keyframes = [1], data_transform=None): 45 | super(MovingObjects3D, self).__init__() 46 | 47 | self.base_folder = osp.join(root, 'data_objs3D') 48 | 49 | data_all = self.categories[load_type] 50 | 51 | # split it into train and test set (the first 20 are for test) 52 | self.image_seq = [] 53 | self.depth_seq = [] 54 | self.invalid_seq = [] 55 | self.object_mask_seq = [] 56 | self.cam_pose_seq = [] 57 | self.obj_pose_seq = [] 58 | self.obj_vis_idx = [] 59 | self.calib = [] 60 | self.obj_names = [] 61 | 62 | self.transforms = data_transform 63 | 64 | if load_type in ['validation', 'test']: 65 | # should not mix different keyframes in test 66 | assert(len(keyframes) == 1) 67 | self.keyframes = [1] 68 | self.sample_freq = keyframes[0] 69 | else: 70 | self.keyframes = keyframes 71 | self.sample_freq = 1 72 | 73 | self.ids = 0 74 | self.images_size = [240, 320] 75 | # get the accumulated image sequences on the fly 76 | self.seq_acc_ids = [0] 77 | for data_obj, frame_interval in data_all.items(): 78 | start_idx, end_idx = frame_interval 79 | print('Load {:} data from frame {:d} to {:d}'.format(data_obj, start_idx, end_idx)) 80 | for seq_idx in range(start_idx, end_idx, 1): 81 | seq_str = "{:06d}".format(seq_idx) 82 | 83 | info_pkl= osp.join(self.base_folder, 84 | data_obj, seq_str, 'info.pkl') 85 | 86 | color_seq, depth_seq, invalid_seq, mask_seq, camera_poses_seq, object_poses_seq,\ 87 | obj_visible_frames, calib_seq = extract_info_pickle(info_pkl) 88 | 89 | obj_visible_frames = obj_visible_frames[::self.sample_freq] 90 | 91 | 
self.image_seq.append([osp.join(self.base_folder, x) for x in color_seq]) 92 | self.depth_seq.append([osp.join(self.base_folder, x) for x in depth_seq]) 93 | # self.invalid_seq.append(invalid_seq) 94 | self.object_mask_seq.append([osp.join(self.base_folder, x) for x in mask_seq]) 95 | self.cam_pose_seq.append(camera_poses_seq) 96 | self.obj_pose_seq.append(object_poses_seq) 97 | self.calib.append(calib_seq) 98 | self.obj_vis_idx.append(obj_visible_frames) 99 | 100 | self.obj_names.append('{:}_{:03d}'.format(data_obj, seq_idx)) 101 | 102 | total_valid_frames = max(0, len(obj_visible_frames) - max(self.keyframes)) 103 | 104 | self.ids += total_valid_frames 105 | self.seq_acc_ids.append(self.ids) 106 | 107 | # downscale the input image to half 108 | self.fx_s = 0.5 109 | self.fy_s = 0.5 110 | 111 | print('There are a total of {:} valid frames'.format(self.ids)) 112 | 113 | def __len__(self): 114 | return self.ids 115 | 116 | def __getitem__(self, index): 117 | # the index we want from search sorted is shifted for one 118 | seq_idx = max(np.searchsorted(self.seq_acc_ids, index+1) - 1, 0) 119 | frame_idx = index - self.seq_acc_ids[seq_idx] 120 | 121 | this_idx= self.obj_vis_idx[seq_idx][frame_idx] 122 | next_idx= self.obj_vis_idx[seq_idx][frame_idx + random.choice(self.keyframes)] 123 | 124 | color0 = self.__load_rgb_tensor(self.image_seq[seq_idx][this_idx]) 125 | color1 = self.__load_rgb_tensor(self.image_seq[seq_idx][next_idx]) 126 | 127 | if self.transforms: 128 | color0, color1 = self.transforms([color0, color1]) 129 | 130 | depth0 = self.__load_depth_tensor(self.depth_seq[seq_idx][this_idx]) 131 | depth1 = self.__load_depth_tensor(self.depth_seq[seq_idx][next_idx]) 132 | 133 | cam_pose0 = self.cam_pose_seq[seq_idx][this_idx] 134 | cam_pose1 = self.cam_pose_seq[seq_idx][next_idx] 135 | obj_pose0 = self.obj_pose_seq[seq_idx][this_idx] 136 | obj_pose1 = self.obj_pose_seq[seq_idx][next_idx] 137 | 138 | # the relative allocentric transform of objects 139 | transform = functools.reduce(np.dot, 140 | [np.linalg.inv(cam_pose1), obj_pose1, np.linalg.inv(obj_pose0), cam_pose0]).astype(np.float32) 141 | 142 | # the validity of the object is up the object mask 143 | obj_index = 1 # object index is in default to be 1 144 | obj_mask0 = self.__load_binary_mask_tensor(self.object_mask_seq[seq_idx][this_idx], obj_index) 145 | obj_mask1 = self.__load_binary_mask_tensor(self.object_mask_seq[seq_idx][next_idx], obj_index) 146 | 147 | calib = np.asarray(self.calib[seq_idx], dtype=np.float32) 148 | calib[0] *= self.fx_s 149 | calib[1] *= self.fy_s 150 | calib[2] *= self.fx_s 151 | calib[3] *= self.fy_s 152 | 153 | obj_name = self.obj_names[seq_idx] 154 | pair_name = '{:}/{:06d}to{:06d}'.format(obj_name, this_idx, next_idx) 155 | 156 | return color0, color1, depth0, depth1, transform, calib, obj_mask0, obj_mask1, pair_name 157 | 158 | def __load_rgb_tensor(self, path): 159 | """ Load the rgb image 160 | """ 161 | image = imread(path)[:, :, :3] 162 | image = image.astype(np.float32) / 255.0 163 | image = resize(image, None, fx=self.fx_s, fy=self.fy_s) 164 | return image 165 | 166 | def __load_depth_tensor(self, path): 167 | """ Load the depth 168 | """ 169 | depth = imread(path).astype(np.float32) / 1e3 170 | depth = resize(depth, None, fx=self.fx_s, fy=self.fy_s, interpolation=INTER_NEAREST) 171 | depth = np.clip(depth, 1e-1, 1e2) # the valid region of the depth 172 | return depth[np.newaxis, :] 173 | 174 | def __load_binary_mask_tensor(self, path, seg_index): 175 | """ Load a binary segmentation mask (numbers) 
176 | If the object matches the specified index, return true; 177 | Otherwise, return false 178 | """ 179 | obj_mask = imread(path) 180 | mask = (obj_mask == seg_index) 181 | mask = resize(mask.astype(np.float), None, fx=self.fx_s, fy=self.fy_s, interpolation=INTER_NEAREST) 182 | return mask.astype(np.uint8)[np.newaxis, :] 183 | 184 | def extract_info_pickle(info_pkl): 185 | 186 | with open(info_pkl, 'rb') as p: 187 | info = pickle.load(p) 188 | 189 | color_seq = [x.split('final/')[1] for x in info['color']] 190 | depth_seq = [x.split('final/')[1] for x in info['depth']] 191 | invalid_seq = [x.split('final/')[1] for x in info['invalid'] ] 192 | mask_seq = [x.split('final/')[1] for x in info['object_mask']] 193 | 194 | # in this rendering setting, there is only one object 195 | camera_poses_seq = info['pose'] 196 | object_poses_seq = info['object_poses']['Model_1'] 197 | object_visible_frames = info['object_visible_frames']['Model_1'] 198 | 199 | calib_seq = info['calib'] 200 | 201 | return color_seq, depth_seq, invalid_seq, mask_seq, \ 202 | camera_poses_seq, object_poses_seq, object_visible_frames, calib_seq 203 | 204 | 205 | if __name__ == '__main__': 206 | 207 | loader = MovingObjects3D('', load_type='train', keyframes=[1]) 208 | 209 | import torchvision.utils as torch_utils 210 | torch_loader = data.DataLoader(loader, batch_size=16, shuffle=False, num_workers=4) 211 | 212 | for batch in torch_loader: 213 | color0, color1, depth0, depth1, transform, K, mask0, mask1, names = batch 214 | B,C,H,W=color0.shape 215 | 216 | bcolor0_img = torch_utils.make_grid(color0, nrow=4) 217 | bcolor1_img = torch_utils.make_grid(color1, nrow=4) 218 | # bdepth0_img = torch_utils.make_grid(depth0, nrow=4) 219 | # bdepth1_img = torch_utils.make_grid(depth1, nrow=4) 220 | bmask0_img = torch_utils.make_grid(mask0.view(B,1,H,W)*255, nrow=4) 221 | bmask1_img = torch_utils.make_grid(mask1.view(B,1,H,W)*255, nrow=4) 222 | 223 | import matplotlib.pyplot as plt 224 | plt.figure() 225 | plt.imshow(bcolor0_img.numpy().transpose((1,2,0))) 226 | plt.figure() 227 | plt.imshow(bcolor1_img.numpy().transpose((1,2,0))) 228 | # plt.figure() 229 | # plt.imshow(bdepth0_img.numpy().transpose((1,2,0))) 230 | # plt.figure() 231 | # plt.imshow(bdepth1_img.numpy().transpose((1,2,0))) 232 | plt.figure() 233 | plt.imshow(bmask0_img.numpy().transpose((1,2,0))) 234 | plt.figure() 235 | plt.imshow(bmask1_img.numpy().transpose((1,2,0))) 236 | plt.show() 237 | 238 | -------------------------------------------------------------------------------- /code/data/SimpleLoader.py: -------------------------------------------------------------------------------- 1 | """ 2 | This Simple loader partially refers to 3 | https://github.com/NVlabs/learningrigidity/blob/master/SimpleLoader.py 4 | 5 | @author: Zhaoyang Lv 6 | @date: May, 2019 7 | """ 8 | 9 | import sys, os, random 10 | import torch.utils.data as data 11 | import os.path as osp 12 | 13 | import numpy as np 14 | 15 | from scipy.misc import imread 16 | 17 | class SimpleLoader(data.Dataset): 18 | 19 | def __init__(self, color_dir, depth_dir, K): 20 | """ 21 | :param the directory of color images 22 | :param the directory of depth images 23 | :param the intrinsic parameter [fx, fy, cx, cy] 24 | """ 25 | 26 | print('This simple loader is designed for TUM. \n\ 27 | The depth scale may be different in your depth format. 
') 28 | 29 | color_files = sorted(os.listdir(color_dir)) 30 | depth_files = sorted(os.listdir(depth_dir)) 31 | 32 | # please ensure the two folders use the same number of synchronized files 33 | assert(len(color_files) == len(depth_files)) 34 | 35 | self.color_pairs = [] 36 | self.depth_pairs = [] 37 | self.ids = len(color_files) - 1 38 | for idx in range(self.ids): 39 | self.color_pairs.append([ 40 | osp.join(color_dir, color_files[idx]), 41 | osp.join(color_dir, color_files[idx+1]) 42 | ] ) 43 | self.depth_pairs.append([ 44 | osp.join(depth_dir, depth_files[idx]), 45 | osp.join(depth_dir, depth_files[idx+1]) 46 | ] ) 47 | 48 | self.K = K 49 | 50 | def __getitem__(self, index): 51 | 52 | image0_path, image1_path = self.color_pairs[index] 53 | depth0_path, depth1_path = self.depth_pairs[index] 54 | 55 | image0 = self.__load_rgb_tensor(image0_path) 56 | image1 = self.__load_rgb_tensor(image1_path) 57 | 58 | depth0 = self.__load_depth_tensor(depth0_path) 59 | depth1 = self.__load_depth_tensor(depth1_path) 60 | 61 | calib = np.asarray(self.K, dtype=np.float32) 62 | 63 | return image0, image1, depth0, depth1, calib 64 | 65 | def __len__(self): 66 | return self.ids 67 | 68 | def __load_rgb_tensor(self, path): 69 | image = imread(path) 70 | image = image.astype(np.float32) / 255.0 71 | image = np.transpose(image, (2,0,1)) 72 | return image.astype(np.float32) 73 | 74 | def __load_depth_tensor(self, path): 75 | assert(path.endswith('.png')) 76 | depth = imread(path).astype(np.float32) / 5e3 77 | depth = np.clip(depth, a_min=0.5, a_max=5.0) 78 | 79 | return depth[np.newaxis, :] -------------------------------------------------------------------------------- /code/data/TUM_RGBD.py: -------------------------------------------------------------------------------- 1 | """ 2 | Data loader for TUM RGBD benchmark 3 | 4 | @author: Zhaoyang Lv 5 | @date: March 2019 6 | """ 7 | 8 | from __future__ import absolute_import 9 | from __future__ import division 10 | from __future__ import print_function 11 | from __future__ import unicode_literals 12 | 13 | import sys, os, random 14 | import pickle 15 | 16 | import numpy as np 17 | import os.path as osp 18 | import torch.utils.data as data 19 | 20 | from scipy.misc import imread 21 | from tqdm import tqdm 22 | from transforms3d import quaternions 23 | from cv2 import resize, INTER_NEAREST 24 | 25 | """ 26 | The following scripts use the directory structure as: 27 | 28 | root 29 | ---- fr1 30 | -------- rgbd_dataset_freiburg1_360 31 | -------- rgbd_dataset_freiburg1_desk 32 | -------- rgbd_dataset_freiburg1_desk2 33 | -------- ... 34 | ---- fr2 35 | -------- rgbd_dataset_freiburg2_360_hemisphere 36 | -------- rgbd_dataset_freiburg2_360_kidnap 37 | -------- rgbd_dataset_freiburg2_coke 38 | -------- .... 39 | ---- fr3 40 | -------- rgbd_dataset_freiburg3_cabinet 41 | -------- rgbd_dataset_freiburg3_long_office_household 42 | -------- rgbd_dataset_freiburg3_nostructure_notexture_far 43 | -------- .... 
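
Here 'root' refers to the TUM data directory, i.e. the code/data/data_tum
symbolic link created during the setup described in the README.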
44 | """ 45 | 46 | def tum_trainval_dict(): 47 | """ the sequence dictionary of TUM dataset 48 | https://vision.in.tum.de/data/datasets/rgbd-dataset/download 49 | 50 | The calibration parameters refers to: 51 | https://vision.in.tum.de/data/datasets/rgbd-dataset/file_formats 52 | """ 53 | return { 54 | 'fr1': { 55 | 'calib': [525.0, 525.0, 319.5, 239.5], 56 | 'seq': ['rgbd_dataset_freiburg1_desk2', 57 | 'rgbd_dataset_freiburg1_floor', 58 | 'rgbd_dataset_freiburg1_room', 59 | 'rgbd_dataset_freiburg1_xyz', 60 | 'rgbd_dataset_freiburg1_rpy', 61 | 'rgbd_dataset_freiburg1_plant', 62 | 'rgbd_dataset_freiburg1_teddy'] 63 | }, 64 | 65 | 'fr2': { 66 | 'calib': [525.0, 525.0, 319.5, 239.5], 67 | 'seq': ['rgbd_dataset_freiburg2_360_hemisphere', 68 | 'rgbd_dataset_freiburg2_large_no_loop', 69 | 'rgbd_dataset_freiburg2_large_with_loop', 70 | 'rgbd_dataset_freiburg2_pioneer_slam', 71 | 'rgbd_dataset_freiburg2_pioneer_slam2', 72 | 'rgbd_dataset_freiburg2_pioneer_slam3', 73 | 'rgbd_dataset_freiburg2_xyz', 74 | 'rgbd_dataset_freiburg2_360_kidnap', 75 | 'rgbd_dataset_freiburg2_rpy', 76 | 'rgbd_dataset_freiburg2_coke', 77 | 'rgbd_dataset_freiburg2_desk_with_person', 78 | 'rgbd_dataset_freiburg2_dishes', 79 | 'rgbd_dataset_freiburg2_flowerbouquet_brownbackground', 80 | 'rgbd_dataset_freiburg2_metallic_sphere2', 81 | 'rgbd_dataset_freiburg2_flowerbouquet' 82 | ] 83 | }, 84 | 85 | 'fr3': { 86 | 'calib': [525.0, 525.0, 319.5, 239.5], 87 | 'seq': [ 88 | 'rgbd_dataset_freiburg3_walking_halfsphere', 89 | 'rgbd_dataset_freiburg3_walking_rpy', 90 | 'rgbd_dataset_freiburg3_cabinet', 91 | 'rgbd_dataset_freiburg3_nostructure_notexture_far', 92 | 'rgbd_dataset_freiburg3_nostructure_notexture_near_withloop', 93 | 'rgbd_dataset_freiburg3_nostructure_texture_far', 94 | 'rgbd_dataset_freiburg3_nostructure_texture_near_withloop', 95 | 'rgbd_dataset_freiburg3_sitting_rpy', 96 | 'rgbd_dataset_freiburg3_sitting_static', 97 | 'rgbd_dataset_freiburg3_sitting_xyz', 98 | 'rgbd_dataset_freiburg3_structure_notexture_near', 99 | 'rgbd_dataset_freiburg3_structure_texture_far', 100 | 'rgbd_dataset_freiburg3_structure_texture_near', 101 | 'rgbd_dataset_freiburg3_teddy'] 102 | } 103 | } 104 | 105 | def tum_test_dict(): 106 | """ the trajectorys held out for testing TUM dataset 107 | """ 108 | return { 109 | 'fr1': { 110 | 'calib': [525.0, 525.0, 319.5, 239.5], 111 | 'seq': ['rgbd_dataset_freiburg1_360', 112 | 'rgbd_dataset_freiburg1_desk'] 113 | }, 114 | 115 | 'fr2': { 116 | 'calib': [525.0, 525.0, 319.5, 239.5], 117 | 'seq': ['rgbd_dataset_freiburg2_desk', 118 | 'rgbd_dataset_freiburg2_pioneer_360'] 119 | }, 120 | 121 | 'fr3': { 122 | 'calib': [525.0, 525.0, 319.5, 239.5], 123 | 'seq': ['rgbd_dataset_freiburg3_walking_static', # dynamic scene 124 | 'rgbd_dataset_freiburg3_walking_xyz', # dynamic scene 125 | 'rgbd_dataset_freiburg3_long_office_household'] 126 | } 127 | } 128 | 129 | class TUM(data.Dataset): 130 | 131 | base = 'data' 132 | 133 | def __init__(self, root = '', category='train', 134 | keyframes=[1], data_transform=None, select_traj=None): 135 | """ 136 | :param the root directory of the data 137 | :param select the category (train, validation,test) 138 | :param select the number of keyframes 139 | Test data only support one keyfame at one time 140 | Train/validation data support mixing different keyframes 141 | :param select one particular trajectory at runtime 142 | Only support for testing 143 | """ 144 | super(TUM, self).__init__() 145 | 146 | self.image_seq = [] 147 | self.depth_seq = [] 148 | self.invalid_seq = [] 149 | 
self.cam_pose_seq= [] 150 | self.calib = [] 151 | self.seq_names = [] 152 | 153 | self.ids = 0 154 | self.seq_acc_ids = [0] 155 | self.keyframes = keyframes 156 | 157 | self.transforms = data_transform 158 | 159 | if category == 'test': 160 | self.__load_test(root+'/data_tum', select_traj) 161 | else: # train and validation 162 | self.__load_train_val(root+'/data_tum', category) 163 | 164 | # downscale the input image to a quarter 165 | self.fx_s = 0.25 166 | self.fy_s = 0.25 167 | 168 | print('TUM dataloader for {:} using keyframe {:}: \ 169 | {:} valid frames'.format(category, keyframes, self.ids)) 170 | 171 | def __load_train_val(self, root, category): 172 | 173 | tum_data = tum_trainval_dict() 174 | 175 | for ks, scene in tum_data.items(): 176 | for seq_name in scene['seq']: 177 | 178 | seq_path = osp.join(ks, seq_name) 179 | self.calib.append(scene['calib']) 180 | # synchronized trajectory file 181 | sync_traj_file = osp.join(root, seq_path, 'sync_trajectory.pkl') 182 | if not osp.isfile(sync_traj_file): 183 | print("The synchronized trajectory file {:} has not been generated.".format(seq_path)) 184 | print("Generate it now...") 185 | write_sync_trajectory(root, ks, seq_name) 186 | 187 | with open(sync_traj_file, 'rb') as p: 188 | trainval = pickle.load(p) 189 | total_num = len(trainval) 190 | # the ratio to split the train & validation set 191 | if category == 'train': 192 | start_idx, end_idx = 0, int(0.95*total_num) 193 | else: 194 | start_idx, end_idx = int(0.95*total_num), total_num 195 | 196 | images = [trainval[idx][1] for idx in range(start_idx, end_idx)] 197 | depths = [trainval[idx][2] for idx in range(start_idx, end_idx)] 198 | extrin = [tq2mat(trainval[idx][0]) for idx in range(start_idx, end_idx)] 199 | self.image_seq.append(images) 200 | self.depth_seq.append(depths) 201 | self.cam_pose_seq.append(extrin) 202 | self.seq_names.append(seq_path) 203 | self.ids += max(0, len(images) - max(self.keyframes)) 204 | self.seq_acc_ids.append(self.ids) 205 | 206 | def __load_test(self, root, select_traj=None): 207 | """ Note: 208 | The test trajectory is loaded slightly different from the train/validation trajectory. 209 | We only select keyframes from the entire trajectory, rather than use every individual frame. 210 | For a given trajectory of length N, using key-frame 2, the train/validation set will use 211 | [[1, 3], [2, 4], [3, 5],...[N-1, N]], 212 | while test set will use pair 213 | [[1, 3], [3, 5], [5, 7],...[N-1, N]] 214 | This difference result in a change in the trajectory length when using different keyframes. 215 | 216 | The benefit of sampling keyframes of the test set is that the output is a more reasonable trajectory; 217 | And in training/validation, we fully leverage every pair of image. 
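        For example, a trajectory with 800 synchronized frames evaluated at keyframe 4 is
        subsampled to roughly 200 keyframes, i.e. about 199 consecutive test pairs.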
218 | """ 219 | 220 | tum_data = tum_test_dict() 221 | 222 | assert(len(self.keyframes) == 1) 223 | kf = self.keyframes[0] 224 | self.keyframes = [1] 225 | 226 | for ks, scene in tum_data.items(): 227 | for seq_name in scene['seq']: 228 | seq_path = osp.join(ks, seq_name) 229 | 230 | if select_traj is not None: 231 | if seq_path != select_traj: continue 232 | 233 | self.calib.append(scene['calib']) 234 | # synchronized trajectory file 235 | sync_traj_file = osp.join(root, seq_path, 'sync_trajectory.pkl') 236 | if not osp.isfile(sync_traj_file): 237 | print("The synchronized trajectory file {:} has not been generated.".format(seq_path)) 238 | print("Generate it now...") 239 | write_sync_trajectory(root, ks, seq_name) 240 | 241 | with open(sync_traj_file, 'rb') as p: 242 | frames = pickle.load(p) 243 | total_num = len(frames) 244 | 245 | images = [frames[idx][1] for idx in range(0, total_num, kf)] 246 | depths = [frames[idx][2] for idx in range(0, total_num, kf)] 247 | extrin = [tq2mat(frames[idx][0]) for idx in range(0, total_num, kf)] 248 | self.image_seq.append(images) 249 | self.depth_seq.append(depths) 250 | self.cam_pose_seq.append(extrin) 251 | self.seq_names.append(seq_path) 252 | self.ids += max(0, len(images)-1) 253 | self.seq_acc_ids.append(self.ids) 254 | 255 | if len(self.image_seq) == 0: 256 | raise Exception("The specified trajectory is not in the test set.") 257 | 258 | def __getitem__(self, index): 259 | seq_idx = max(np.searchsorted(self.seq_acc_ids, index+1) - 1, 0) 260 | frame_idx = index - self.seq_acc_ids[seq_idx] 261 | 262 | this_idx = frame_idx 263 | next_idx = frame_idx + random.choice(self.keyframes) 264 | 265 | color0 = self.__load_rgb_tensor(self.image_seq[seq_idx][this_idx]) 266 | color1 = self.__load_rgb_tensor(self.image_seq[seq_idx][next_idx]) 267 | 268 | depth0 = self.__load_depth_tensor(self.depth_seq[seq_idx][this_idx]) 269 | depth1 = self.__load_depth_tensor(self.depth_seq[seq_idx][next_idx]) 270 | 271 | if self.transforms: 272 | color0, color1 = self.transforms([color0, color1]) 273 | 274 | # normalize the coordinate 275 | calib = np.asarray(self.calib[seq_idx], dtype=np.float32) 276 | calib[0] *= self.fx_s 277 | calib[1] *= self.fy_s 278 | calib[2] *= self.fx_s 279 | calib[3] *= self.fy_s 280 | 281 | cam_pose0 = self.cam_pose_seq[seq_idx][this_idx] 282 | cam_pose1 = self.cam_pose_seq[seq_idx][next_idx] 283 | transform = np.dot(np.linalg.inv(cam_pose1), cam_pose0).astype(np.float32) 284 | 285 | name = '{:}_{:06d}to{:06d}'.format(self.seq_names[seq_idx], 286 | this_idx, next_idx) 287 | 288 | return color0, color1, depth0, depth1, transform, calib, name 289 | 290 | def __len__(self): 291 | return self.ids 292 | 293 | def __load_rgb_tensor(self, path): 294 | """ Load the rgb image 295 | """ 296 | image = imread(path)[:, :, :3] 297 | image = image.astype(np.float32) / 255.0 298 | image = resize(image, None, fx=self.fx_s, fy=self.fy_s) 299 | return image 300 | 301 | def __load_depth_tensor(self, path): 302 | """ Load the depth: 303 | The depth images are scaled by a factor of 5000, i.e., a pixel 304 | value of 5000 in the depth image corresponds to a distance of 305 | 1 meter from the camera, 10000 to 2 meter distance, etc. 306 | A pixel value of 0 means missing value/no data. 
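        For example, a raw pixel value of 7500 corresponds to 7500 / 5000 = 1.5 meters;
        the converted depth is then clipped to the [0.5, 5.0] meter range below.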
307 | """ 308 | depth = imread(path).astype(np.float32) / 5e3 309 | depth = resize(depth, None, fx=self.fx_s, fy=self.fy_s, interpolation=INTER_NEAREST) 310 | depth = np.clip(depth, a_min=0.5, a_max=5.0) # the accurate range of kinect depth 311 | return depth[np.newaxis, :] 312 | 313 | """ 314 | Some utility files to work with the data 315 | """ 316 | 317 | def tq2mat(tq): 318 | """ transform translation-quaternion (tq) to (4x4) matrix 319 | """ 320 | tq = np.array(tq) 321 | T = np.eye(4) 322 | T[:3,:3] = quaternions.quat2mat(np.roll(tq[3:], 1)) 323 | T[:3, 3] = tq[:3] 324 | return T 325 | 326 | def write_sync_trajectory(local_dir, dataset, subject_name): 327 | """ 328 | :param the root of the directory 329 | :param the dataset category 'fr1', 'fr2' or 'fr3' 330 | """ 331 | rgb_file = osp.join(local_dir, dataset, subject_name, 'rgb.txt') 332 | depth_file= osp.join(local_dir, dataset, subject_name, 'depth.txt') 333 | pose_file = osp.join(local_dir, dataset, subject_name, 'groundtruth.txt') 334 | 335 | rgb_list = read_file_list(rgb_file) 336 | depth_list=read_file_list(depth_file) 337 | pose_list = read_file_list(pose_file) 338 | 339 | matches = associate_three(rgb_list, depth_list, pose_list, offset=0.0, max_difference=0.02) 340 | 341 | trajectory_info = [] 342 | for (a,b,c) in matches: 343 | pose = [float(x) for x in pose_list[c]] 344 | rgb_file = osp.join(local_dir, dataset, subject_name, rgb_list[a][0]) 345 | depth_file = osp.join(local_dir, dataset, subject_name, depth_list[b][0]) 346 | trajectory_info.append([pose, rgb_file, depth_file]) 347 | 348 | dataset_path = osp.join(local_dir, dataset, subject_name, 'sync_trajectory.pkl') 349 | 350 | with open(dataset_path, 'wb') as output: 351 | pickle.dump(trajectory_info, output) 352 | 353 | txt_path = osp.join(local_dir, dataset, subject_name, 'sync_trajectory.txt') 354 | pickle2txts(dataset_path, txt_path) 355 | 356 | def pickle2txts(pickle_file, txt_file): 357 | ''' 358 | write the pickle_file into a txt_file 359 | ''' 360 | with open(pickle_file, 'rb') as pkl_file: 361 | traj = pickle.load(pkl_file) 362 | 363 | with open(txt_file, 'w') as f: 364 | for frame in traj: 365 | f.write(' '.join(['%f ' % x for x in frame[0]])) 366 | f.write(frame[1] + ' ') 367 | f.write(frame[2] + '\n') 368 | 369 | """ 370 | The following utility files are provided by TUM RGBD dataset benchmark 371 | 372 | Refer: https://vision.in.tum.de/data/datasets/rgbd-dataset/tools 373 | """ 374 | 375 | def read_file_list(filename): 376 | """ 377 | Reads a trajectory from a text file. 378 | 379 | File format: 380 | The file format is "stamp d1 d2 d3 ...", where stamp denotes the time stamp (to be matched) 381 | and "d1 d2 d3.." is arbitary data (e.g., a 3D position and 3D orientation) associated to this timestamp. 382 | 383 | Input: 384 | filename -- File name 385 | 386 | Output: 387 | dict -- dictionary of (stamp,data) tuples 388 | 389 | """ 390 | file = open(filename) 391 | data = file.read() 392 | lines = data.replace(","," ").replace("\t"," ").split("\n") 393 | list = [[v.strip() for v in line.split(" ") if v.strip()!=""] for line in lines if len(line)>0 and line[0]!="#"] 394 | list = [(float(l[0]),l[1:]) for l in list if len(l)>1] 395 | return dict(list) 396 | 397 | def associate(first_list, second_list,offset,max_difference): 398 | """ 399 | Associate two dictionaries of (stamp,data). As the time stamps never match exactly, we aim 400 | to find the closest match for every input tuple. 
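    For example, with max_difference = 0.02 s (the value used by write_sync_trajectory above),
    the example color frame 1305031790.645155 and depth frame 1305031790.640468 shipped under
    code/data/data_examples/TUM differ by less than 5 ms and would therefore be associated.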
401 | 402 | Input: 403 | first_list -- first dictionary of (stamp,data) tuples 404 | second_list -- second dictionary of (stamp,data) tuples 405 | offset -- time offset between both dictionaries (e.g., to model the delay between the sensors) 406 | max_difference -- search radius for candidate generation 407 | 408 | Output: 409 | matches -- list of matched tuples ((stamp1,data1),(stamp2,data2)) 410 | 411 | """ 412 | first_keys = first_list.keys() 413 | second_keys = second_list.keys() 414 | potential_matches = [(abs(a - (b + offset)), a, b) 415 | for a in first_keys 416 | for b in second_keys 417 | if abs(a - (b + offset)) < max_difference] 418 | potential_matches.sort() 419 | matches = [] 420 | for diff, a, b in potential_matches: 421 | if a in first_keys and b in second_keys: 422 | first_keys.remove(a) 423 | second_keys.remove(b) 424 | matches.append((a, b)) 425 | 426 | matches.sort() 427 | return matches 428 | 429 | def associate_three(first_list, second_list, third_list, offset, max_difference): 430 | """ 431 | Associate two dictionaries of (stamp,data). As the time stamps never match exactly, we aim to find the closest match for every input tuple. 432 | 433 | Input: 434 | first_list -- first dictionary of (stamp,data) tuples (default to be rgb) 435 | second_list -- second dictionary of (stamp,data) tuples (default to be depth) 436 | third_list -- third dictionary of (stamp,data) tuples (default to be pose) 437 | offset -- time offset between both dictionaries (e.g., to model the delay between the sensors) 438 | max_difference -- search radius for candidate generation 439 | 440 | Output: 441 | matches -- list of matched tuples ((stamp1,data1),(stamp2,data2),(stamp3,data3)) 442 | """ 443 | 444 | first_keys = list(first_list) 445 | second_keys = list(second_list) 446 | third_keys = list(third_list) 447 | # find the potential matches in (rgb, depth) 448 | potential_matches_ab = [(abs(a - (b + offset)), a, b) 449 | for a in first_keys 450 | for b in second_keys 451 | if abs(a - (b + offset)) < max_difference] 452 | potential_matches_ab.sort() 453 | matches_ab = [] 454 | for diff, a, b in potential_matches_ab: 455 | if a in first_keys and b in second_keys: 456 | matches_ab.append((a, b)) 457 | 458 | matches_ab.sort() 459 | 460 | # find the potential matches in (rgb, depth, pose) 461 | potential_matches = [(abs(a - (c + offset)), abs(b - (c + offset)), a,b,c) 462 | for (a,b) in matches_ab 463 | for c in third_keys 464 | if abs(a - (c + offset)) < max_difference and 465 | abs(b - (c + offset)) < max_difference] 466 | 467 | potential_matches.sort() 468 | matches_abc = [] 469 | for diff_rgb, diff_depth, a, b, c in potential_matches: 470 | if a in first_keys and b in second_keys and c in third_keys: 471 | first_keys.remove(a) 472 | second_keys.remove(b) 473 | third_keys.remove(c) 474 | matches_abc.append((a,b,c)) 475 | matches_abc.sort() 476 | return matches_abc 477 | 478 | if __name__ == '__main__': 479 | 480 | loader = TUM(category='test', keyframes=[1]) 481 | 482 | import torchvision.utils as torch_utils 483 | 484 | torch_loader = data.DataLoader(loader, batch_size=16, 485 | shuffle=False, num_workers=4) 486 | 487 | for batch in torch_loader: 488 | color0, color1, depth0, depth1, transform, K, name = batch 489 | B, C, H, W = color0.shape 490 | 491 | bcolor0_img = torch_utils.make_grid(color0, nrow=4) 492 | 493 | import matplotlib.pyplot as plt 494 | plt.figure() 495 | plt.imshow(bcolor0_img.numpy().transpose(1,2,0)) 496 | plt.show() 
-------------------------------------------------------------------------------- /code/data/data_examples/TUM/color/1305031790.645155.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/lvzhaoyang/DeeperInverseCompositionalAlgorithm/9a401bf2b03ecfed169c1a655b4fe8be8a4c211d/code/data/data_examples/TUM/color/1305031790.645155.png -------------------------------------------------------------------------------- /code/data/data_examples/TUM/color/1305031790.713097.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/lvzhaoyang/DeeperInverseCompositionalAlgorithm/9a401bf2b03ecfed169c1a655b4fe8be8a4c211d/code/data/data_examples/TUM/color/1305031790.713097.png -------------------------------------------------------------------------------- /code/data/data_examples/TUM/color/1305031790.781258.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/lvzhaoyang/DeeperInverseCompositionalAlgorithm/9a401bf2b03ecfed169c1a655b4fe8be8a4c211d/code/data/data_examples/TUM/color/1305031790.781258.png -------------------------------------------------------------------------------- /code/data/data_examples/TUM/color/1305031790.845151.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/lvzhaoyang/DeeperInverseCompositionalAlgorithm/9a401bf2b03ecfed169c1a655b4fe8be8a4c211d/code/data/data_examples/TUM/color/1305031790.845151.png -------------------------------------------------------------------------------- /code/data/data_examples/TUM/color/1305031790.913129.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/lvzhaoyang/DeeperInverseCompositionalAlgorithm/9a401bf2b03ecfed169c1a655b4fe8be8a4c211d/code/data/data_examples/TUM/color/1305031790.913129.png -------------------------------------------------------------------------------- /code/data/data_examples/TUM/depth/1305031790.640468.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/lvzhaoyang/DeeperInverseCompositionalAlgorithm/9a401bf2b03ecfed169c1a655b4fe8be8a4c211d/code/data/data_examples/TUM/depth/1305031790.640468.png -------------------------------------------------------------------------------- /code/data/data_examples/TUM/depth/1305031790.709421.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/lvzhaoyang/DeeperInverseCompositionalAlgorithm/9a401bf2b03ecfed169c1a655b4fe8be8a4c211d/code/data/data_examples/TUM/depth/1305031790.709421.png -------------------------------------------------------------------------------- /code/data/data_examples/TUM/depth/1305031790.773548.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/lvzhaoyang/DeeperInverseCompositionalAlgorithm/9a401bf2b03ecfed169c1a655b4fe8be8a4c211d/code/data/data_examples/TUM/depth/1305031790.773548.png -------------------------------------------------------------------------------- /code/data/data_examples/TUM/depth/1305031790.839363.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/lvzhaoyang/DeeperInverseCompositionalAlgorithm/9a401bf2b03ecfed169c1a655b4fe8be8a4c211d/code/data/data_examples/TUM/depth/1305031790.839363.png -------------------------------------------------------------------------------- /code/data/data_examples/TUM/depth/1305031790.909436.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/lvzhaoyang/DeeperInverseCompositionalAlgorithm/9a401bf2b03ecfed169c1a655b4fe8be8a4c211d/code/data/data_examples/TUM/depth/1305031790.909436.png -------------------------------------------------------------------------------- /code/data/dataloader.py: -------------------------------------------------------------------------------- 1 | """ The dataloaders for training and evaluation 2 | 3 | @author: Zhaoyang Lv 4 | @date: March 2019 5 | """ 6 | 7 | from __future__ import absolute_import 8 | from __future__ import division 9 | from __future__ import print_function 10 | from __future__ import unicode_literals 11 | 12 | import torchvision.transforms as transforms 13 | import numpy as np 14 | 15 | def load_data(dataset_name, keyframes = None, load_type = 'train', 16 | select_trajectory = '', load_numpy = False): 17 | """ Use two frame camera pose data loader 18 | """ 19 | if select_trajectory == '': 20 | select_trajectory = None 21 | 22 | if not load_numpy: 23 | if load_type == 'train': 24 | data_transform = image_transforms(['color_augment', 'numpy2torch']) 25 | else: 26 | data_transform = image_transforms(['numpy2torch']) 27 | else: 28 | data_transform = image_transforms([]) 29 | 30 | if dataset_name == 'TUM_RGBD': 31 | from data.TUM_RGBD import TUM 32 | np_loader = TUM('data', load_type, keyframes, data_transform, select_trajectory) 33 | elif dataset_name == 'MovingObjects3D': 34 | from data.MovingObj3D import MovingObjects3D 35 | np_loader = MovingObjects3D('data', load_type, keyframes, data_transform) 36 | # elif dataset_name == 'BundleFusion': 37 | # from data.BundleFusion import BundleFusion 38 | # np_loader = BundleFusion(load_type, keyframes, data_transform) 39 | # elif dataset_name == 'Refresh': 40 | # from data.REFRESH import REFRESH 41 | # np_loader = REFRESH(load_type, keyframes) 42 | else: 43 | raise NotImplementedError() 44 | 45 | return np_loader 46 | 47 | def image_transforms(options): 48 | 49 | transform_list = [] 50 | 51 | if 'color_augment' in options: 52 | augment_parameters = [0.9, 1.1, 0.9, 1.1, 0.9, 1.1] 53 | transform_list.append(AugmentImages(augment_parameters)) 54 | 55 | if 'numpy2torch' in options: 56 | transform_list.append(ToTensor()) 57 | 58 | # if 'color_normalize' in options: # we do it on the fly 59 | # transform_list.append(ColorNormalize()) 60 | 61 | return transforms.Compose(transform_list) 62 | 63 | class ColorNormalize(object): 64 | 65 | def __init__(self): 66 | rgb_mean = (0.4914, 0.4822, 0.4465) 67 | rgb_std = (0.2023, 0.1994, 0.2010) 68 | self.transform = transforms.Normalize(mean=rgb_mean, std=rgb_std) 69 | 70 | def __call__(self, sample): 71 | return [self.transform(x) for x in sample] 72 | 73 | class ToTensor(object): 74 | def __init__(self): 75 | self.transform = transforms.ToTensor() 76 | 77 | def __call__(self, sample): 78 | return [self.transform(x) for x in sample] 79 | 80 | class AugmentImages(object): 81 | def __init__(self, augment_parameters): 82 | self.gamma_low = augment_parameters[0] # 0.9 83 | self.gamma_high = augment_parameters[1] # 1.1 84 | self.brightness_low = augment_parameters[2] # 0.9 85 | 
self.brightness_high = augment_parameters[3] # 1,1 86 | self.color_low = augment_parameters[4] # 0.9 87 | self.color_high = augment_parameters[5] # 1.1 88 | 89 | self.thresh = 0.5 90 | 91 | def __call__(self, sample): 92 | p = np.random.uniform(0, 1, 1) 93 | if p > self.thresh: 94 | random_gamma = np.random.uniform(self.gamma_low, self.gamma_high) 95 | random_brightness = np.random.uniform(self.brightness_low, self.brightness_high) 96 | random_colors = np.random.uniform(self.color_low, self.color_high, 3) 97 | for x in sample: 98 | x = x ** random_gamma # randomly shift gamma 99 | x = x * random_brightness # randomly shift brightness 100 | for i in range(3): # randomly shift color 101 | x[:, :, i] *= random_colors[i] 102 | x[:, :, i] *= random_colors[i] 103 | x = np.clip(x, a_min=0, a_max=1) # saturate 104 | return sample 105 | else: 106 | return sample 107 | -------------------------------------------------------------------------------- /code/evaluate.py: -------------------------------------------------------------------------------- 1 | """ 2 | Evaluation scripts 3 | 4 | @author: Zhaoyang Lv 5 | @date: March 2019 6 | """ 7 | 8 | import os, sys, argparse, pickle 9 | import os.path as osp 10 | import numpy as np 11 | import pandas as pd 12 | 13 | import torch 14 | import torch.utils.data as data 15 | import torchvision.utils as torch_utils 16 | import torch.nn as nn 17 | 18 | import models.LeastSquareTracking as ICtracking 19 | import models.criterions as criterions 20 | import train_utils 21 | import config 22 | 23 | from data.dataloader import load_data 24 | from timers import Timers 25 | from tqdm import tqdm 26 | 27 | def check_directory(filename): 28 | target_dir = os.path.dirname(filename) 29 | if not os.path.isdir(target_dir): 30 | os.makedirs(target_dir) 31 | 32 | def eval_trajectories(dataset): 33 | return { 34 | 'TUM_RGBD': ['fr1/rgbd_dataset_freiburg1_360', 35 | 'fr1/rgbd_dataset_freiburg1_desk', 36 | 'fr2/rgbd_dataset_freiburg2_desk', 37 | 'fr2/rgbd_dataset_freiburg2_pioneer_360'] 38 | }[dataset] 39 | 40 | def create_eval_loaders(options, eval_type, keyframes, 41 | total_batch_size = 8, 42 | trajectory = ''): 43 | """ create the evaluation loader at different keyframes set-up 44 | """ 45 | eval_loaders = {} 46 | 47 | if trajectory == '': 48 | trajectories = eval_trajectories(options.dataset) 49 | else: 50 | trajectories = [trajectory] 51 | 52 | for trajectory in trajectories: 53 | for kf in keyframes: 54 | np_loader = load_data(options.dataset, [kf], eval_type, trajectory) 55 | eval_loaders['{:}_keyframe_{:}'.format(trajectory, kf)] = data.DataLoader(np_loader, 56 | batch_size = int(total_batch_size), 57 | shuffle = False, num_workers = options.cpu_workers) 58 | 59 | return eval_loaders 60 | 61 | def evaluate_trust_region(dataloader, net, objectives, eval_name='', 62 | known_mask = False, timers = None): 63 | """ evaluate the trust-region method given the two-frame pose estimation 64 | :param the pytorch dataloader 65 | :param the network 66 | :param the evaluation objective names, e.g. 
RPE, EPE3D 67 | :param True if ground mask if known 68 | :param (optional) timing each step 69 | """ 70 | 71 | progress = tqdm(dataloader, ncols=100, 72 | desc = 'evaluate deeper inverse compositional algorithm {:}'.format(eval_name), 73 | total= len(dataloader)) 74 | 75 | net.eval() 76 | 77 | total_frames = len(dataloader.dataset) 78 | 79 | outputs = { 80 | 'R_est': np.zeros((total_frames, 3, 3)), 81 | 't_est': np.zeros((total_frames, 3)), 82 | 'names': [] 83 | } 84 | flow_loss, rpe_loss = None, None 85 | if 'EPE3D' in objectives: 86 | flow_loss = criterions.compute_RT_EPE_loss 87 | outputs['epes'] = np.zeros(total_frames) 88 | if 'RPE' in objectives: 89 | rpe_loss = criterions.compute_RPE_loss 90 | outputs['angular_error'] = np.zeros(total_frames) 91 | outputs['translation_error'] = np.zeros(total_frames) 92 | 93 | count_base = 0 94 | 95 | if timers: timers.tic('one iteration') 96 | 97 | count = 1 98 | for idx, batch in enumerate(progress): 99 | 100 | if timers: timers.tic('forward step') 101 | 102 | names = batch[-1] 103 | 104 | if known_mask: # for dataset that with mask or need mask 105 | color0, color1, depth0, depth1, Rt, K, obj_mask0, obj_mask1 = \ 106 | train_utils.check_cuda(batch[:8]) 107 | else: 108 | color0, color1, depth0, depth1, Rt, K = \ 109 | train_utils.check_cuda(batch[:6]) 110 | obj_mask0, obj_mask1 = None, None 111 | 112 | B, _, H, W = depth0.shape 113 | 114 | with torch.no_grad(): 115 | output = net.forward(color0, color1, depth0, depth1, K) 116 | R, t = output 117 | 118 | if timers: timers.toc('forward step') 119 | 120 | outputs['R_est'][count_base:count_base+B] = R.cpu().numpy() 121 | outputs['t_est'][count_base:count_base+B] = t.cpu().numpy() 122 | 123 | if timers: timers.tic('evaluate') 124 | R_gt, t_gt = Rt[:,:3,:3], Rt[:,:3,3] 125 | if rpe_loss: # evaluate the relative pose error 126 | angle_error, trans_error = rpe_loss(R, t, R_gt, t_gt) 127 | outputs['angular_error'][count_base:count_base+B] = angle_error.cpu().numpy() 128 | outputs['translation_error'][count_base:count_base+B] = trans_error.cpu().numpy() 129 | 130 | if flow_loss:# evaluate the end-point-error loss 3D 131 | invalid_mask = (depth0 == depth0.min()) + (depth0 == depth0.max()) 132 | if obj_mask0 is not None: invalid_mask = ~obj_mask0 + invalid_mask 133 | 134 | epes3d = flow_loss(R, t, R_gt, t_gt, depth0, K, invalid_mask) 135 | outputs['epes'][count_base:count_base+B] = epes3d.cpu().numpy() 136 | 137 | outputs['names'] += names 138 | 139 | count_base += B 140 | 141 | if timers: timers.toc('evaluate') 142 | if timers: timers.toc('one iteration') 143 | if timers: timers.tic('one iteration') 144 | 145 | # if timers: timers.print() 146 | 147 | return outputs 148 | 149 | def test_TrustRegion(options): 150 | 151 | if options.time: 152 | timers = Timers() 153 | else: 154 | timers = None 155 | 156 | print('Evaluate test performance with the (deep) direct method.') 157 | 158 | total_batch_size = options.batch_per_gpu * torch.cuda.device_count() 159 | 160 | keyframes = [int(x) for x in options.keyframes.split(',')] 161 | if options.dataset in ['BundleFusion', 'TUM_RGBD']: 162 | obj_has_mask = False 163 | else: 164 | obj_has_mask = True 165 | 166 | eval_loaders = create_eval_loaders(options, options.eval_set, 167 | keyframes, total_batch_size, options.trajectory) 168 | 169 | if options.checkpoint == '': 170 | print('No checkpoint loaded. 
Use the non-learning method') 171 | net = ICtracking.LeastSquareTracking( 172 | encoder_name = 'RGB', 173 | max_iter_per_pyr= options.max_iter_per_pyr, 174 | mEst_type = 'None', 175 | solver_type = 'Direct-Nodamping') 176 | if torch.cuda.is_available(): net.cuda() 177 | net.eval() 178 | else: 179 | train_utils.load_checkpoint_test(options) 180 | 181 | net = ICtracking.LeastSquareTracking( 182 | encoder_name = options.encoder_name, 183 | max_iter_per_pyr= options.max_iter_per_pyr, 184 | mEst_type = options.mestimator, 185 | solver_type = options.solver, 186 | no_weight_sharing = options.no_weight_sharing) 187 | 188 | if torch.cuda.is_available(): net.cuda() 189 | net.eval() 190 | 191 | # check whether it is a single checkpoint or a directory 192 | net.load_state_dict(torch.load(options.checkpoint)['state_dict']) 193 | 194 | eval_objectives = ['EPE3D', 'RPE'] 195 | 196 | output_prefix = '_'.join([ 197 | options.network, 198 | options.encoder_name, 199 | options.mestimator, 200 | options.solver, 201 | 'iter', str(options.max_iter_per_pyr) 202 | ]) 203 | 204 | # evaluate results per trajectory per key-frame 205 | outputs = {} 206 | for k, loader in eval_loaders.items(): 207 | 208 | traj_name, kf = k.split('_keyframe_') 209 | 210 | output_name = '{:}_{:}'.format(output_prefix, k) 211 | info = evaluate_trust_region(loader, net, 212 | eval_objectives, 213 | eval_name = 'tmp/'+output_name, 214 | known_mask=obj_has_mask) 215 | 216 | # collect results 217 | outputs[k] = pd.Series([info['epes'].mean(), 218 | info['angular_error'].mean(), 219 | info['translation_error'].mean(), 220 | info['epes'].shape[0], int(kf), traj_name], 221 | index=['3D EPE', 'axis error', 'trans error', 'total frames', 'keyframe', 'trajectory']) 222 | 223 | print(outputs[k]) 224 | 225 | checkpoint_name = options.checkpoint.replace('.pth.tar', '') 226 | if checkpoint_name == '': 227 | checkpoint_name = 'nolearning' 228 | output_dir = osp.join(options.eval_set+'_results', checkpoint_name, k) 229 | output_pkl = output_dir + '.pkl' 230 | 231 | check_directory(output_pkl) 232 | 233 | with open(output_pkl, 'wb') as output: # dump per-frame results info 234 | info = info 235 | pickle.dump(info, output) 236 | 237 | """ =============================================================== """ 238 | """ Generate the final evaluation results """ 239 | """ =============================================================== """ 240 | 241 | outputs_pd = pd.DataFrame(outputs).T 242 | outputs_pd['3D EPE'] *= 100 # convert to cm 243 | outputs_pd['axis error'] *= (180/np.pi) # convert to degree 244 | outputs_pd['trans error'] *= 100 # convert to cm 245 | 246 | print(outputs_pd) 247 | 248 | stats_dict = {} 249 | for kf in keyframes: 250 | kf_outputs = outputs_pd[outputs_pd['keyframe']==kf] 251 | 252 | stats_dict['mean values of trajectories keyframe {:}'.format(kf)] = pd.Series( 253 | [kf_outputs['3D EPE'].mean(), 254 | kf_outputs['axis error'].mean(), 255 | kf_outputs['trans error'].mean(), kf], 256 | index=['3D EPE', 'axis error', 'trans error', 'keyframe']) 257 | 258 | total_frames = kf_outputs['total frames'].sum() 259 | stats_dict['mean values of frames keyframe {:}'.format(kf)] = pd.Series( 260 | [(kf_outputs['3D EPE'] * kf_outputs['total frames']).sum() / total_frames, 261 | (kf_outputs['axis error'] * kf_outputs['total frames']).sum() / total_frames, 262 | (kf_outputs['trans error']* kf_outputs['total frames']).sum() / total_frames, kf], 263 | index=['3D EPE', 'axis error', 'trans error', 'keyframe']) 264 | 265 | stats_pd = 
pd.DataFrame(stats_dict).T 266 | print(stats_pd) 267 | 268 | final_pd = outputs_pd.append(stats_pd, sort=False) 269 | final_pd.to_csv('{:}.csv'.format(output_dir)) 270 | 271 | return outputs_pd 272 | 273 | if __name__ == '__main__': 274 | 275 | parser = argparse.ArgumentParser(description="Evaluate the network") 276 | config.add_basics_config(parser) 277 | config.add_test_basics_config(parser) 278 | config.add_tracking_config(parser) 279 | 280 | options = parser.parse_args() 281 | 282 | print('---------------------------------------') 283 | 284 | outputs = test_TrustRegion(options) 285 | 286 | -------------------------------------------------------------------------------- /code/models/LeastSquareTracking.py: -------------------------------------------------------------------------------- 1 | """ 2 | The learned Inverse Compositional Tracking. 3 | Support both ego-motion and object-motion tracking 4 | 5 | @author: Zhaoyang Lv 6 | @Date: March, 2019 7 | """ 8 | 9 | import torch 10 | import torch.nn as nn 11 | import numpy as np 12 | 13 | from models.submodules import convLayer as conv 14 | from models.submodules import color_normalize 15 | 16 | from models.algorithms import TrustRegionBase as TrustRegion 17 | from models.algorithms import ImagePyramids, DirectSolverNet, FeaturePyramid, DeepRobustEstimator 18 | 19 | class LeastSquareTracking(nn.Module): 20 | 21 | # all enum types 22 | NONE = -1 23 | RGB = 0 24 | 25 | CONV_RGBD = 1 26 | CONV_RGBD2 = 2 27 | 28 | def __init__(self, encoder_name, 29 | max_iter_per_pyr, 30 | mEst_type, 31 | solver_type, 32 | tr_samples = 10, 33 | no_weight_sharing = False, 34 | timers = None): 35 | """ 36 | :param the backbone network used for regression. 37 | :param the maximum number of iterations at each pyramid levels 38 | :param the type of weighting functions. 39 | :param the type of solver. 
40 | :param number of samples in trust-region solver 41 | :param True if we do not want to share weight at different pyramid levels 42 | :param (optional) time to benchmark time consumed at each step 43 | """ 44 | super(LeastSquareTracking, self).__init__() 45 | 46 | self.construct_image_pyramids = ImagePyramids([0,1,2,3], pool='avg') 47 | self.construct_depth_pyramids = ImagePyramids([0,1,2,3], pool='max') 48 | 49 | self.timers = timers 50 | 51 | """ =============================================================== """ 52 | """ Initialize the Deep Feature Extractor """ 53 | """ =============================================================== """ 54 | 55 | if encoder_name == 'RGB': 56 | print('The network will use raw image as measurements.') 57 | self.encoder = None 58 | self.encoder_type = self.RGB 59 | context_dim = 1 60 | elif encoder_name == 'ConvRGBD': 61 | print('Use a network with RGB-D information \ 62 | to extract the features') 63 | context_dim = 4 64 | self.encoder = FeaturePyramid(D=context_dim) 65 | self.encoder_type = self.CONV_RGBD 66 | elif encoder_name == 'ConvRGBD2': 67 | print('Use two stream network with two frame input') 68 | context_dim = 8 69 | self.encoder = FeaturePyramid(D=context_dim) 70 | self.encoder_type = self.CONV_RGBD2 71 | else: 72 | raise NotImplementedError() 73 | 74 | """ =============================================================== """ 75 | """ Initialize the Robust Estimator """ 76 | """ =============================================================== """ 77 | 78 | if no_weight_sharing: 79 | self.mEst_func0 = DeepRobustEstimator(mEst_type) 80 | self.mEst_func1 = DeepRobustEstimator(mEst_type) 81 | self.mEst_func2 = DeepRobustEstimator(mEst_type) 82 | self.mEst_func3 = DeepRobustEstimator(mEst_type) 83 | mEst_funcs = [self.mEst_func0, self.mEst_func1, self.mEst_func2, 84 | self.mEst_func3] 85 | else: 86 | self.mEst_func = DeepRobustEstimator(mEst_type) 87 | mEst_funcs = [self.mEst_func, self.mEst_func, self.mEst_func, 88 | self.mEst_func] 89 | 90 | """ =============================================================== """ 91 | """ Initialize the Trust-Region Damping """ 92 | """ =============================================================== """ 93 | 94 | if no_weight_sharing: 95 | # for residual volume, the input K is not assigned correctly 96 | self.solver_func0 = DirectSolverNet(solver_type, samples=tr_samples) 97 | self.solver_func1 = DirectSolverNet(solver_type, samples=tr_samples) 98 | self.solver_func2 = DirectSolverNet(solver_type, samples=tr_samples) 99 | self.solver_func3 = DirectSolverNet(solver_type, samples=tr_samples) 100 | solver_funcs = [self.solver_func0, self.solver_func1, 101 | self.solver_func2, self.solver_func3] 102 | else: 103 | self.solver_func = DirectSolverNet(solver_type, samples=tr_samples) 104 | solver_funcs = [self.solver_func, self.solver_func, 105 | self.solver_func, self.solver_func] 106 | 107 | """ =============================================================== """ 108 | """ Initialize the Trust-Region Method """ 109 | """ =============================================================== """ 110 | 111 | self.tr_update0 = TrustRegion(max_iter_per_pyr, 112 | mEst_func = mEst_funcs[0], 113 | solver_func = solver_funcs[0], 114 | timers = timers) 115 | self.tr_update1 = TrustRegion(max_iter_per_pyr, 116 | mEst_func = mEst_funcs[1], 117 | solver_func = solver_funcs[1], 118 | timers = timers) 119 | self.tr_update2 = TrustRegion(max_iter_per_pyr, 120 | mEst_func = mEst_funcs[2], 121 | solver_func = solver_funcs[2], 122 | timers = 
timers) 123 | self.tr_update3 = TrustRegion(max_iter_per_pyr, 124 | mEst_func = mEst_funcs[3], 125 | solver_func = solver_funcs[3], 126 | timers = timers) 127 | 128 | def forward(self, img0, img1, depth0, depth1, K, init_only=False): 129 | """ 130 | :input 131 | :param the reference image 132 | :param the target image 133 | :param the inverse depth of the reference image 134 | :param the inverse depth of the target image 135 | :param the pin-hole camera instrinsic (in vector) [fx, fy, cx, cy] 136 | :param the initial pose [Rotation, translation] 137 | -------- 138 | :return 139 | :param estimated transform 140 | """ 141 | if self.timers: self.timers.tic('extract features') 142 | 143 | # pre-processing all the data, all the invalid inputs depth are set to 0 144 | invD0 = torch.clamp(1.0 / depth0, 0, 10) 145 | invD1 = torch.clamp(1.0 / depth1, 0, 10) 146 | invD0[invD0 == invD0.min()] = 0 147 | invD1[invD1 == invD1.min()] = 0 148 | invD0[invD0 == invD0.max()] = 0 149 | invD1[invD1 == invD1.max()] = 0 150 | 151 | I0 = color_normalize(img0) 152 | I1 = color_normalize(img1) 153 | 154 | x0 = self.__encode_features(I0, invD0, I1, invD1) 155 | x1 = self.__encode_features(I1, invD1, I0, invD0) 156 | d0 = self.construct_depth_pyramids(invD0) 157 | d1 = self.construct_depth_pyramids(invD1) 158 | 159 | if self.timers: self.timers.toc('extract features') 160 | 161 | poses_to_train = [[],[]] # '[rotation, translation]' 162 | B = invD0.shape[0] 163 | R0 = torch.eye(3,dtype=torch.float).expand(B,3,3).type_as(I0) 164 | t0 = torch.zeros(B,3,1,dtype=torch.float).type_as(I0) 165 | poseI = [R0, t0] 166 | 167 | # the prior of the mask 168 | prior_W = torch.ones(d0[3].shape).type_as(d0[3]) 169 | 170 | if self.timers: self.timers.tic('trust-region update') 171 | # trust region update on level 3 172 | K3 = K >> 3 173 | output3 = self.tr_update3(poseI, x0[3], x1[3], d0[3], d1[3], K3, prior_W) 174 | pose3, mEst_W3 = output3[0], output3[1] 175 | poses_to_train[0].append(pose3[0]) 176 | poses_to_train[1].append(pose3[1]) 177 | # trust region update on level 2 178 | K2 = K >> 2 179 | output2 = self.tr_update2(pose3, x0[2], x1[2], d0[2], d1[2], K2, mEst_W3) 180 | pose2, mEst_W2 = output2[0], output2[1] 181 | poses_to_train[0].append(pose2[0]) 182 | poses_to_train[1].append(pose2[1]) 183 | # trust region update on level 1 184 | K1 = K >> 1 185 | output1 = self.tr_update1(pose2, x0[1], x1[1], d0[1], d1[1], K1, mEst_W2) 186 | pose1, mEst_W1 = output1[0], output1[1] 187 | poses_to_train[0].append(pose1[0]) 188 | poses_to_train[1].append(pose1[1]) 189 | # trust-region update on the raw scale 190 | output0 = self.tr_update0(pose1, x0[0], x1[0], d0[0], d1[0], K, mEst_W1) 191 | pose0 = output0[0] 192 | poses_to_train[0].append(pose0[0]) 193 | poses_to_train[1].append(pose0[1]) 194 | if self.timers: self.timers.toc('trust-region update') 195 | 196 | if self.training: 197 | pyr_R = torch.stack(tuple(poses_to_train[0]), dim=1) 198 | pyr_t = torch.stack(tuple(poses_to_train[1]), dim=1) 199 | return pyr_R, pyr_t 200 | else: 201 | return pose0 202 | 203 | def __encode_features(self, img0, invD0, img1, invD1): 204 | """ get the encoded features 205 | """ 206 | if self.encoder_type == self.RGB: 207 | # In the RGB case, we will only use the intensity image 208 | I = self.__color3to1(img0) 209 | x = self.construct_image_pyramids(I) 210 | elif self.encoder_type == self.CONV_RGBD: 211 | m = torch.cat((img0, invD0), dim=1) 212 | x = self.encoder.forward(m) 213 | elif self.encoder_type in [self.CONV_RGBD2]: 214 | m = torch.cat((img0, invD0, 
img1, invD1), dim=1) 215 | x = self.encoder.forward(m) 216 | else: 217 | raise NotImplementedError() 218 | 219 | x = [self.__Nto1(a) for a in x] 220 | 221 | return x 222 | 223 | def __Nto1(self, x): 224 | """ Take the average of multi-dimension feature into one dimensional, 225 | which boostrap the optimization speed 226 | """ 227 | C = x.shape[1] 228 | return x.sum(dim=1, keepdim=True) / C 229 | 230 | def __color3to1(self, img): 231 | """ Return a gray-scale image 232 | """ 233 | B, _, H, W = img.shape 234 | return (img[:,0] * 0.299 + img[:, 1] * 0.587 + img[:, 2] * 0.114).view(B,1,H,W) 235 | -------------------------------------------------------------------------------- /code/models/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/lvzhaoyang/DeeperInverseCompositionalAlgorithm/9a401bf2b03ecfed169c1a655b4fe8be8a4c211d/code/models/__init__.py -------------------------------------------------------------------------------- /code/models/algorithms.py: -------------------------------------------------------------------------------- 1 | """ 2 | The algorithm backbone, primarily the three contributions proposed in our paper 3 | 4 | @author: Zhaoyang Lv 5 | @date: March, 2019 6 | """ 7 | 8 | import torch 9 | import torch.nn as nn 10 | import torchvision.models as models 11 | import torch.nn.functional as func 12 | 13 | import models.geometry as geometry 14 | from models.submodules import convLayer as conv 15 | from models.submodules import fcLayer, initialize_weights 16 | 17 | class TrustRegionBase(nn.Module): 18 | """ 19 | This is the the base function of the trust-region based inverse compositional algorithm. 20 | """ 21 | def __init__(self, 22 | max_iter = 3, 23 | mEst_func = None, 24 | solver_func = None, 25 | timers = None): 26 | """ 27 | :param max_iter, maximum number of iterations 28 | :param mEst_func, the M-estimator function / network 29 | :param solver_func, the trust-region function / network 30 | :param timers, if yes, counting time for each step 31 | """ 32 | super(TrustRegionBase, self).__init__() 33 | 34 | self.max_iterations = max_iter 35 | self.mEstimator = mEst_func 36 | self.directSolver = solver_func 37 | self.timers = timers 38 | 39 | def forward(self, pose, x0, x1, invD0, invD1, K, wPrior=None): 40 | """ 41 | :param pose, the initial pose 42 | (extrinsic of the target frame w.r.t. 
the referenc frame) 43 | :param x0, the template features 44 | :param x1, the image features 45 | :param invD0, the template inverse depth 46 | :param invD1, the image inverse depth 47 | :param K, the intrinsic parameters, [fx, fy, cx, cy] 48 | :param wPrior (optional), provide an initial weight as input to the convolutional m-estimator 49 | """ 50 | B, C, H, W = x0.shape 51 | px, py = geometry.generate_xy_grid(B,H,W,K) 52 | 53 | if self.timers: self.timers.tic('pre-compute Jacobians') 54 | J_F_p = self.precompute_Jacobian(invD0, x0, px, py, K) 55 | if self.timers: self.timers.toc('pre-compute Jacobians') 56 | 57 | if self.timers: self.timers.tic('compute warping residuals') 58 | residuals, occ = compute_warped_residual(pose, invD0, invD1, \ 59 | x0, x1, px, py, K) 60 | if self.timers: self.timers.toc('compute warping residuals') 61 | 62 | if self.timers: self.timers.tic('robust estimator') 63 | weights = self.mEstimator(residuals, x0, x1, wPrior) 64 | wJ = weights.view(B,-1,1) * J_F_p 65 | if self.timers: self.timers.toc('robust estimator') 66 | 67 | if self.timers: self.timers.tic('pre-compute JtWJ') 68 | JtWJ = torch.bmm(torch.transpose(J_F_p, 1, 2) , wJ) 69 | if self.timers: self.timers.toc('pre-compute JtWJ') 70 | 71 | for idx in range(self.max_iterations): 72 | if self.timers: self.timers.tic('solve x=A^{-1}b') 73 | pose = self.directSolver(JtWJ, 74 | torch.transpose(J_F_p,1,2), weights, residuals, 75 | pose, invD0, invD1, x0, x1, K) 76 | if self.timers: self.timers.toc('solve x=A^{-1}b') 77 | 78 | if self.timers: self.timers.tic('compute warping residuals') 79 | residuals, occ = compute_warped_residual(pose, invD0, invD1, \ 80 | x0, x1, px, py, K) 81 | if self.timers: self.timers.toc('compute warping residuals') 82 | 83 | return pose, weights 84 | 85 | def precompute_Jacobian(self, invD, x, px, py, K): 86 | """ Pre-compute the image Jacobian on the reference frame 87 | refer to equation (13) in the paper 88 | 89 | :param invD, template depth 90 | :param x, template feature 91 | :param px, normalized image coordinate in cols (x) 92 | :param py, normalized image coordinate in rows (y) 93 | :param K, the intrinsic parameters, [fx, fy, cx, cy] 94 | 95 | ------------ 96 | :return precomputed image Jacobian on template 97 | """ 98 | Jf_x, Jf_y = feature_gradient(x) 99 | Jx_p, Jy_p = compute_jacobian_warping(invD, K, px, py) 100 | J_F_p = compute_jacobian_dIdp(Jf_x, Jf_y, Jx_p, Jy_p) 101 | return J_F_p 102 | 103 | class ImagePyramids(nn.Module): 104 | """ Construct the pyramids in the image / depth space 105 | """ 106 | def __init__(self, scales, pool='avg'): 107 | super(ImagePyramids, self).__init__() 108 | if pool == 'avg': 109 | self.multiscales = [nn.AvgPool2d(1< 0: 177 | self.net = nn.Sequential( 178 | conv(True, self.D, 16, 3, dilation=1), 179 | conv(True, 16, 32, 3, dilation=2), 180 | conv(True, 32, 64, 3, dilation=4), 181 | conv(True, 64, 1, 3, dilation=1), 182 | nn.Sigmoid() ) 183 | initialize_weights(self.net) 184 | else: 185 | self.net = None 186 | 187 | def forward(self, residual, x0, x1, ws=None): 188 | """ 189 | :param residual, the residual map 190 | :param x0, the feature map of the template 191 | :param x1, the feature map of the image 192 | :param ws, the initial weighted residual 193 | """ 194 | if self.D == 1: # use residual only 195 | context = residual.abs() 196 | w = self.net(context) 197 | elif self.D == 4: 198 | B, C, H, W = residual.shape 199 | wl = func.interpolate(ws, (H,W), mode='bilinear', align_corners=True) 200 | context = torch.cat((residual.abs(), x0, x1, 
wl), dim=1) 201 | w = self.net(context) 202 | elif self.D < 0: 203 | w = self.mEst_func(residual) 204 | 205 | return w 206 | 207 | def __weight_Huber(self, x, alpha = 0.02): 208 | """ weight function of Huber loss: 209 | refer to P. 24 w(x) at 210 | https://members.loria.fr/moberger/Enseignement/Master2/Documents/ZhangIVC-97-01.pdf 211 | 212 | Note this current implementation is not differentiable. 213 | """ 214 | abs_x = torch.abs(x) 215 | linear_mask = abs_x > alpha 216 | w = torch.ones(x.shape).type_as(x) 217 | 218 | if linear_mask.sum().item() > 0: 219 | w[linear_mask] = alpha / abs_x[linear_mask] 220 | return w 221 | 222 | def __constant_weight(self, x): 223 | """ mimic the standard least-square when weighting function is constant 224 | """ 225 | return torch.ones(x.shape).type_as(x) 226 | 227 | class DirectSolverNet(nn.Module): 228 | 229 | # the enum types for direct solver 230 | SOLVER_NO_DAMPING = 0 231 | SOLVER_RESIDUAL_VOLUME = 1 232 | 233 | def __init__(self, solver_type, samples=10): 234 | super(DirectSolverNet, self).__init__() 235 | 236 | if solver_type == 'Direct-Nodamping': 237 | self.net = None 238 | self.type = self.SOLVER_NO_DAMPING 239 | elif solver_type == 'Direct-ResVol': 240 | # flattened JtJ and JtR (number of samples, currently fixed at 10) 241 | self.samples = samples 242 | self.net = deep_damping_regressor(D=6*6+6*samples) 243 | self.type = self.SOLVER_RESIDUAL_VOLUME 244 | initialize_weights(self.net) 245 | else: 246 | raise NotImplementedError() 247 | 248 | def forward(self, JtJ, Jt, weights, R, pose0, invD0, invD1, x0, x1, K): 249 | """ 250 | :param JtJ, the approximated Hessian JtJ 251 | :param Jt, the trasposed Jacobian 252 | :param weights, the weight matrix 253 | :param R, the residual 254 | :param pose0, the initial estimated pose 255 | :param invD0, the template inverse depth map 256 | :param invD1, the image inverse depth map 257 | :param x0, the template feature map 258 | :param x1, the image feature map 259 | :param K, the intrinsic parameters 260 | 261 | ----------- 262 | :return updated pose 263 | """ 264 | 265 | B = JtJ.shape[0] 266 | 267 | wR = (weights * R).view(B, -1, 1) 268 | JtR = torch.bmm(Jt, wR) 269 | 270 | if self.type == self.SOLVER_NO_DAMPING: 271 | # Add a small diagonal damping. 
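# A minimal sketch of the same Levenberg-style damping on a random batched 6x6 system
# (all *_demo names are placeholders for this sketch only and are not used elsewhere in
# the module); the lines below apply the identical recipe to the real JtWJ:
#
#     A = torch.rand(2, 6, 6)
#     JtWJ_demo = torch.bmm(A, A.transpose(1, 2))                 # PSD, shape (2, 6, 6)
#     JtR_demo = torch.rand(2, 6, 1)
#     diag_mask_demo = torch.eye(6).view(1, 6, 6)
#     trace_demo = (diag_mask_demo * JtWJ_demo).sum((2, 1))       # per-batch trace
#     eps_demo = (trace_demo * 1e-6).view(-1, 1, 1) * diag_mask_demo
#     xi_demo = torch.bmm(torch.inverse(JtWJ_demo + eps_demo), JtR_demo)  # damped 6-DoF step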
Without it, the training becomes quite unstable 272 | # Do not see a clear difference by removing the damping in inference though 273 | diag_mask = torch.eye(6).view(1,6,6).type_as(JtJ) 274 | diagJtJ = diag_mask * JtJ 275 | traceJtJ = torch.sum(diagJtJ, (2,1)) 276 | epsilon = (traceJtJ * 1e-6).view(B,1,1) * diag_mask 277 | Hessian = JtJ + epsilon 278 | pose_update = inverse_update_pose(Hessian, JtR, pose0) 279 | elif self.type == self.SOLVER_RESIDUAL_VOLUME: 280 | Hessian = self.__regularize_residual_volume(JtJ, Jt, JtR, weights, 281 | pose0, invD0, invD1, x0, x1, K, sample_range=self.samples) 282 | pose_update = inverse_update_pose(Hessian, JtR, pose0) 283 | else: 284 | raise NotImplementedError() 285 | 286 | return pose_update 287 | 288 | def __regularize_residual_volume(self, JtJ, Jt, JtR, weights, pose, 289 | invD0, invD1, x0, x1, K, sample_range): 290 | """ regularize the approximate with residual volume 291 | 292 | :param JtJ, the approximated Hessian JtJ 293 | :param Jt, the trasposed Jacobian 294 | :param JtR, the Right-hand size residual 295 | :param weights, the weight matrix 296 | :param pose, the initial estimated pose 297 | :param invD0, the template inverse depth map 298 | :param invD1, the image inverse depth map 299 | :param K, the intrinsic parameters 300 | :param x0, the template feature map 301 | :param x1, the image feature map 302 | :param sample_range, the numerb of samples 303 | 304 | --------------- 305 | :return the damped Hessian matrix 306 | """ 307 | # the following current support only single scale 308 | JtR_volumes = [] 309 | 310 | B, C, H, W = x0.shape 311 | px, py = geometry.generate_xy_grid(B, H, W, K) 312 | 313 | diag_mask = torch.eye(6).view(1,6,6).type_as(JtJ) 314 | diagJtJ = diag_mask * JtJ 315 | traceJtJ = torch.sum(diagJtJ, (2,1)) 316 | epsilon = (traceJtJ * 1e-6).view(B,1,1) * diag_mask 317 | n = sample_range 318 | lambdas = torch.logspace(-5, 5, n).type_as(JtJ) 319 | 320 | for s in range(n): 321 | # the epsilon is to prevent the matrix to be too ill-conditioned 322 | D = lambdas[s] * diagJtJ + epsilon 323 | Hessian = JtJ + D 324 | pose_s = inverse_update_pose(Hessian, JtR, pose) 325 | 326 | res_s,_= compute_warped_residual(pose_s, invD0, invD1, x0, x1, px, py, K) 327 | JtR_s = torch.bmm(Jt, (weights * res_s).view(B,-1,1)) 328 | JtR_volumes.append(JtR_s) 329 | 330 | JtR_flat = torch.cat(tuple(JtR_volumes), dim=2).view(B,-1) 331 | JtJ_flat = JtJ.view(B,-1) 332 | damp_est = self.net(torch.cat((JtR_flat, JtJ_flat), dim=1)) 333 | R = diag_mask * damp_est.view(B,6,1) + epsilon # also lift-up 334 | 335 | return JtJ + R 336 | 337 | def deep_damping_regressor(D): 338 | """ Output a damping vector at each dimension 339 | """ 340 | net = nn.Sequential( 341 | fcLayer(in_planes=D, out_planes=128, bias=True), 342 | fcLayer(in_planes=128, out_planes=256, bias=True), 343 | fcLayer(in_planes=256, out_planes=6, bias=True) 344 | ) # the last ReLU makes sure every predicted value is positive 345 | return net 346 | 347 | def feature_gradient(img, normalize_gradient=True): 348 | """ Calculate the gradient on the feature space using Sobel operator 349 | :param the input image 350 | ----------- 351 | :return the gradient of the image in x, y direction 352 | """ 353 | B, C, H, W = img.shape 354 | # to filter the image equally in each channel 355 | wx = torch.FloatTensor([[-1, 0, 1],[-2, 0, 2],[-1, 0, 1]]).view(1,1,3,3).type_as(img) 356 | wy = torch.FloatTensor([[-1,-2,-1],[ 0, 0, 0],[ 1, 2, 1]]).view(1,1,3,3).type_as(img) 357 | 358 | img_reshaped = img.view(-1, 1, H, W) 359 
| img_pad = func.pad(img_reshaped, (1,1,1,1), mode='replicate') 360 | img_dx = func.conv2d(img_pad, wx, stride=1, padding=0) 361 | img_dy = func.conv2d(img_pad, wy, stride=1, padding=0) 362 | 363 | if normalize_gradient: 364 | mag = torch.sqrt((img_dx ** 2) + (img_dy ** 2)+ 1e-8) 365 | img_dx = img_dx / mag 366 | img_dy = img_dy / mag 367 | 368 | return img_dx.view(B,C,H,W), img_dy.view(B,C,H,W) 369 | 370 | def compute_jacobian_dIdp(Jf_x, Jf_y, Jx_p, Jy_p): 371 | """ chained gradient of image w.r.t. the pose 372 | :param the Jacobian of the feature map in x direction 373 | :param the Jacobian of the feature map in y direction 374 | :param the Jacobian of the x map to manifold p 375 | :param the Jacobian of the y map to manifold p 376 | ------------ 377 | :return the image jacobian in x, y, direction, Bx2x6 each 378 | """ 379 | B, C, H, W = Jf_x.shape 380 | 381 | # precompute J_F_p, JtWJ 382 | Jf_p = Jf_x.view(B,C,-1,1) * Jx_p.view(B,1,-1,6) + \ 383 | Jf_y.view(B,C,-1,1) * Jy_p.view(B,1,-1,6) 384 | 385 | return Jf_p.view(B,-1,6) 386 | 387 | def compute_jacobian_warping(p_invdepth, K, px, py): 388 | """ Compute the Jacobian matrix of the warped (x,y) w.r.t. the inverse depth 389 | (linearized at origin) 390 | :param p_invdepth the input inverse depth 391 | :param the intrinsic calibration 392 | :param the pixel x map 393 | :param the pixel y map 394 | ------------ 395 | :return the warping jacobian in x, y direction 396 | """ 397 | B, C, H, W = p_invdepth.size() 398 | assert(C == 1) 399 | 400 | x = px.view(B, -1, 1) 401 | y = py.view(B, -1, 1) 402 | invd = p_invdepth.view(B, -1, 1) 403 | 404 | xy = x * y 405 | O = torch.zeros((B, H*W, 1)).type_as(p_invdepth) 406 | 407 | # This is cascaded Jacobian functions of the warping function 408 | # Refer to the supplementary materials for math documentation 409 | dx_dp = torch.cat((-xy, 1+x**2, -y, invd, O, -invd*x), dim=2) 410 | dy_dp = torch.cat((-1-y**2, xy, x, O, invd, -invd*y), dim=2) 411 | 412 | fx, fy, cx, cy = torch.split(K, 1, dim=1) 413 | return dx_dp*fx.view(B,1,1), dy_dp*fy.view(B,1,1) 414 | 415 | def compute_warped_residual(pose, invD0, invD1, x0, x1, px, py, K, obj_mask=None): 416 | """ Compute the residual error of warped target image w.r.t. the reference feature map. 417 | refer to equation (12) in the paper 418 | 419 | :param the forward warping pose from the reference camera to the target frame. 420 | Note that warping from the target frame to the reference frame is the inverse of this operation. 
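    A minimal usage sketch (assuming batched 4-D tensors and the [fx, fy, cx, cy]
    intrinsics vector used throughout this module; feat0, feat1, invd0 and invd1 are
    placeholder inputs for the sketch only):

        B, C, H, W = feat0.shape
        px, py = geometry.generate_xy_grid(B, H, W, K)
        R0 = torch.eye(3).expand(B, 3, 3).type_as(feat0)    # identity initialization
        t0 = torch.zeros(B, 3, 1).type_as(feat0)
        res, occ = compute_warped_residual([R0, t0], invd0, invd1,
                                           feat0, feat1, px, py, K)
        # res is the target feature warped into the reference frame minus the reference
        # feature; occ marks occluded / out-of-view pixels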
421 | :param the reference inverse depth 422 | :param the target inverse depth 423 | :param the reference feature image 424 | :param the target feature image 425 | :param the pixel x map 426 | :param the pixel y map 427 | :param the intrinsic calibration 428 | ----------- 429 | :return the residual (of reference image), and occlusion information 430 | """ 431 | u_warped, v_warped, inv_z_warped = geometry.batch_warp_inverse_depth( 432 | px, py, invD0, pose, K) 433 | x1_1to0 = geometry.warp_features(x1, u_warped, v_warped) 434 | occ = geometry.check_occ(inv_z_warped, invD1, u_warped, v_warped) 435 | 436 | residuals = x1_1to0 - x0 # equation (12) 437 | 438 | B, C, H, W = x0.shape 439 | if obj_mask is not None: 440 | # determine whether the object is in-view 441 | occ = occ & (obj_mask.view(B,1,H,W) < 1) 442 | residuals[occ.expand(B,C,H,W)] = 1e-3 443 | 444 | return residuals, occ 445 | 446 | def inverse_update_pose(H, Rhs, pose): 447 | """ Ues left-multiplication for the pose update 448 | in the inverse compositional form 449 | refer to equation (10) in the paper 450 | 451 | :param the (approximated) Hessian matrix 452 | :param Right-hand side vector 453 | :param the initial pose (forward transform inverse of xi) 454 | --------- 455 | :return the forward updated pose (inverse of xi) 456 | """ 457 | inv_H = invH(H) 458 | xi = torch.bmm(inv_H, Rhs) 459 | # simplifed inverse compositional for SE3 460 | d_R = geometry.batch_twist2Mat(-xi[:, :3].view(-1,3)) 461 | d_t = -torch.bmm(d_R, xi[:, 3:]) 462 | 463 | R, t = pose 464 | pose = geometry.batch_Rt_compose(R, t, d_R, d_t) 465 | return pose 466 | 467 | def invH(H): 468 | """ Generate (H+damp)^{-1}, with predicted damping values 469 | :param approximate Hessian matrix JtWJ 470 | ----------- 471 | :return the inverse of Hessian 472 | """ 473 | # GPU is much slower for matrix inverse when the size is small (compare to CPU) 474 | # works (50x faster) than inversing the dense matrix in GPU 475 | if H.is_cuda: 476 | # invH = bpinv((H).cpu()).cuda() 477 | # invH = torch.inverse(H) 478 | invH = torch.inverse(H.cpu()).cuda() 479 | else: 480 | invH = torch.inverse(H) 481 | return invH 482 | -------------------------------------------------------------------------------- /code/models/criterions.py: -------------------------------------------------------------------------------- 1 | """ 2 | Some training criterions 3 | 4 | @author: Zhaoyang Lv 5 | @date: March, 2019 6 | """ 7 | 8 | from __future__ import absolute_import 9 | from __future__ import division 10 | from __future__ import print_function 11 | from __future__ import unicode_literals 12 | 13 | import torch 14 | import torch.nn as nn 15 | import torch.nn.functional as func 16 | import models.geometry as geo 17 | 18 | def EPE3D_loss(input_flow, target_flow, invalid=None): 19 | """ 20 | :param the estimated optical / scene flow 21 | :param the ground truth / target optical / scene flow 22 | :param the invalid mask, the mask has value 1 for all areas that are invalid 23 | """ 24 | epe_map = torch.norm(target_flow-input_flow,p=2,dim=1) 25 | B = epe_map.shape[0] 26 | 27 | invalid_flow = (target_flow != target_flow) # check Nan same as torch.isnan 28 | 29 | mask = (invalid_flow[:,0,:,:] | invalid_flow[:,1,:,:] | invalid_flow[:,2,:,:]) 30 | if invalid is not None: 31 | mask = mask | (invalid.view(mask.shape) > 0) 32 | 33 | epes = [] 34 | for idx in range(B): 35 | epe_sample = epe_map[idx][~mask[idx].data] 36 | if len(epe_sample) == 0: 37 | epes.append(torch.zeros(()).type_as(input_flow)) 38 | else: 39 | 
epes.append(epe_sample.mean()) 40 | 41 | return torch.stack(epes) 42 | 43 | def RPE(R, t): 44 | """ Calcualte the relative pose error 45 | (a batch version of the RPE error defined in TUM RGBD SLAM TUM dataset) 46 | :param relative rotation 47 | :param relative translation 48 | """ 49 | angle_error = geo.batch_mat2angle(R) 50 | trans_error = torch.norm(t, p=2, dim=1) 51 | return angle_error, trans_error 52 | 53 | def compute_RPE_loss(R_est, t_est, R_gt, t_gt): 54 | """ 55 | :param estimated rotation matrix Bx3x3 56 | :param estimated translation vector Bx3 57 | :param ground truth rotation matrix Bx3x3 58 | :param ground truth translation vector Bx3 59 | """ 60 | dR, dt = geo.batch_Rt_between(R_est, t_est, R_gt, t_gt) 61 | angle_error, trans_error = RPE(dR, dt) 62 | return angle_error, trans_error 63 | 64 | def compute_RT_EPE_loss(R_est, t_est, R_gt, t_gt, depth0, K, invalid=None): 65 | """ Compute the epe point error of rotation & translation 66 | :param estimated rotation matrix Bx3x3 67 | :param estimated translation vector Bx3 68 | :param ground truth rotation matrix Bx3x3 69 | :param ground truth translation vector Bx3 70 | :param reference depth image, 71 | :param camera intrinsic 72 | """ 73 | 74 | loss = 0 75 | if R_est.dim() > 3: # training time [batch, num_poses, rot_row, rot_col] 76 | rH, rW = 60, 80 # we train the algorithm using a downsized input, (since the size of the input is not super important at training time) 77 | 78 | B,C,H,W = depth0.shape 79 | rdepth = func.interpolate(depth0, size=(rH, rW), mode='bilinear') 80 | rinvalid = func.interpolate(invalid.float(), size=(rH,rW), mode='bilinear') 81 | rK = K.clone() 82 | rK[:,0] *= float(rW) / W 83 | rK[:,1] *= float(rH) / H 84 | rK[:,2] *= float(rW) / W 85 | rK[:,3] *= float(rH) / H 86 | xyz = geo.batch_inverse_project(rdepth, rK) 87 | flow_gt = geo.batch_transform_xyz(xyz, R_gt, t_gt, get_Jacobian=False) 88 | 89 | for idx in range(R_est.shape[1]): 90 | flow_est= geo.batch_transform_xyz(xyz, R_est[:,idx], t_est[:,idx], get_Jacobian=False) 91 | loss += EPE3D_loss(flow_est, flow_gt.detach(), rinvalid) #* (1< inv_z_warped - thres) 290 | 291 | inviews = inlier & (u > 0) & (u < W) & \ 292 | (v > 0) & (v < H) 293 | 294 | return 1-inviews 295 | 296 | def warp_features(F, u, v): 297 | """ 298 | Warp the feature map (F) w.r.t. the grid (u, v) 299 | """ 300 | B, C, H, W = F.shape 301 | 302 | u_norm = u / ((W-1)/2) - 1 303 | v_norm = v / ((H-1)/2) - 1 304 | uv_grid = torch.cat((u_norm.view(B,H,W,1), v_norm.view(B,H,W,1)), dim=3) 305 | F_warped = nn.functional.grid_sample(F, uv_grid, 306 | mode='bilinear', padding_mode='border') 307 | return F_warped 308 | 309 | def batch_transform_xyz(xyz_tensor, R, t, get_Jacobian=True): 310 | ''' 311 | transform the point cloud w.r.t. 
the transformation matrix 312 | :param xyz_tensor: B * 3 * H * W 313 | :param R: rotation matrix B * 3 * 3 314 | :param t: translation vector B * 3 315 | ''' 316 | B, C, H, W = xyz_tensor.size() 317 | t_tensor = t.contiguous().view(B,3,1).repeat(1,1,H*W) 318 | p_tensor = xyz_tensor.contiguous().view(B, C, H*W) 319 | # the transformation process is simply: 320 | # p' = t + R*p 321 | xyz_t_tensor = torch.baddbmm(t_tensor, R, p_tensor) 322 | 323 | if get_Jacobian: 324 | # return both the transformed tensor and its Jacobian matrix 325 | J_r = R.bmm(batch_skew_symmetric_matrix(-1*p_tensor.permute(0,2,1))) 326 | J_t = -1 * torch.eye(3).view(1,3,3).expand(B,3,3) 327 | J = torch.cat((J_r, J_t), 1) 328 | return xyz_t_tensor.view(B, C, H, W), J 329 | else: 330 | return xyz_t_tensor.view(B, C, H, W) 331 | 332 | def flow_from_rigid_transform(depth, extrinsic, intrinsic): 333 | """ 334 | Get the optical flow induced by rigid transform [R,t] and depth 335 | """ 336 | [R, t] = extrinsic 337 | [fx, fy, cx, cy] = intrinsic 338 | 339 | def batch_project(xyz_tensor, K): 340 | """ Project a point cloud into pixels (u,v) given intrinsic K 341 | [u';v';w] = [K][x;y;z] 342 | u = u' / w; v = v' / w 343 | 344 | :param the xyz points 345 | :param calibration is a torch array composed of [fx, fy, cx, cy] 346 | ------- 347 | :return u, v grid tensor in image coordinate 348 | (tested through inverse project) 349 | """ 350 | B, _, H, W = xyz_tensor.size() 351 | batch_K = K.expand(H, W, B, 4).permute(2,3,0,1) 352 | 353 | x, y, z = torch.split(xyz_tensor, 1, dim=1) 354 | fx, fy, cx, cy = torch.split(batch_K, 1, dim=1) 355 | 356 | u = fx*x / z + cx 357 | v = fy*y / z + cy 358 | return torch.cat((u,v), dim=1) 359 | 360 | def batch_inverse_project(depth, K): 361 | """ Inverse project pixels (u,v) to a point cloud given intrinsic 362 | :param depth dim B*H*W 363 | :param calibration is torch array composed of [fx, fy, cx, cy] 364 | :param color (optional) dim B*3*H*W 365 | ------- 366 | :return xyz tensor (batch of point cloud) 367 | (tested through projection) 368 | """ 369 | if depth.dim() == 3: 370 | B, H, W = depth.size() 371 | else: 372 | B, _, H, W = depth.size() 373 | 374 | x, y = generate_xy_grid(B,H,W,K) 375 | z = depth.view(B,1,H,W) 376 | return torch.cat((x*z, y*z, z), dim=1) 377 | 378 | def batch_euler2mat(ai, aj, ak, axes='sxyz'): 379 | """ A torch implementation euler2mat from transform3d: 380 | https://github.com/matthew-brett/transforms3d/blob/master/transforms3d/euler.py 381 | :param ai : First rotation angle (according to `axes`). 382 | :param aj : Second rotation angle (according to `axes`). 383 | :param ak : Third rotation angle (according to `axes`). 384 | :param axes : Axis specification; one of 24 axis sequences as string or encoded tuple - e.g. ``sxyz`` (the default). 385 | ------- 386 | :return rotation matrix, array-like shape (B, 3, 3) 387 | 388 | Tested w.r.t. 
transforms3d.euler module 389 | """ 390 | B = ai.size()[0] 391 | 392 | try: 393 | firstaxis, parity, repetition, frame = _AXES2TUPLE[axes.lower()] 394 | except (AttributeError, KeyError): 395 | _TUPLE2AXES[axes] # validation 396 | firstaxis, parity, repetition, frame = axes 397 | 398 | i = firstaxis 399 | j = _NEXT_AXIS[i+parity] 400 | k = _NEXT_AXIS[i-parity+1] 401 | order = [i, j, k] 402 | 403 | if frame: 404 | ai, ak = ak, ai 405 | if parity: 406 | ai, aj, ak = -ai, -aj, -ak 407 | 408 | si, sj, sk = sin(ai), sin(aj), sin(ak) 409 | ci, cj, ck = cos(ai), cos(aj), cos(ak) 410 | cc, cs = ci*ck, ci*sk 411 | sc, ss = si*ck, si*sk 412 | 413 | # M = torch.zeros(B, 3, 3).cuda() 414 | if repetition: 415 | c_i = [cj, sj*si, sj*ci] 416 | c_j = [sj*sk, -cj*ss+cc, -cj*cs-sc] 417 | c_k = [-sj*ck, cj*sc+cs, cj*cc-ss] 418 | else: 419 | c_i = [cj*ck, sj*sc-cs, sj*cc+ss] 420 | c_j = [cj*sk, sj*ss+cc, sj*cs-sc] 421 | c_k = [-sj, cj*si, cj*ci] 422 | 423 | def permute(X): # sort X w.r.t. the axis indices 424 | return [ x for (y, x) in sorted(zip(order, X)) ] 425 | 426 | c_i = permute(c_i) 427 | c_j = permute(c_j) 428 | c_k = permute(c_k) 429 | 430 | r =[torch.stack(c_i, 1), 431 | torch.stack(c_j, 1), 432 | torch.stack(c_k, 1)] 433 | r = permute(r) 434 | 435 | return torch.stack(r, 1) 436 | 437 | def batch_mat2euler(M, axes='sxyz'): 438 | """ A torch implementation euler2mat from transform3d: 439 | https://github.com/matthew-brett/transforms3d/blob/master/transforms3d/euler.py 440 | :param array-like shape (3, 3) or (4, 4). Rotation matrix or affine. 441 | :param Axis specification; one of 24 axis sequences as string or encoded tuple - e.g. ``sxyz`` (the default). 442 | -------- 443 | :returns 444 | :param ai : First rotation angle (according to `axes`). 445 | :param aj : Second rotation angle (according to `axes`). 446 | :param ak : Third rotation angle (according to `axes`). 447 | """ 448 | try: 449 | firstaxis, parity, repetition, frame = _AXES2TUPLE[axes.lower()] 450 | except (AttributeError, KeyError): 451 | _TUPLE2AXES[axes] # validation 452 | firstaxis, parity, repetition, frame = axes 453 | 454 | i = firstaxis 455 | j = _NEXT_AXIS[i+parity] 456 | k = _NEXT_AXIS[i-parity+1] 457 | 458 | if repetition: 459 | sy = torch.sqrt(M[:, i, j]**2 + M[:, i, k]**2) 460 | # A lazy way to cope with batch data. 
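# A quick self-consistency sketch for batch_euler2mat / batch_mat2euler (angles in
# radians, batch of 4; the tensors below are placeholders for the sketch only):
#
#     ai = torch.rand(4) * 0.5
#     aj = torch.rand(4) * 0.5
#     ak = torch.rand(4) * 0.5
#     Rm = batch_euler2mat(ai, aj, ak, axes='sxyz')      # (4, 3, 3)
#     bi, bj, bk = batch_mat2euler(Rm, axes='sxyz')      # recover the three angles
#     assert torch.allclose(ai, bi, atol=1e-4)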
Can be more efficient 461 | mask = ~(sy > 1e-8) 462 | ax = atan2( M[:, i, j], M[:, i, k]) 463 | ay = atan2( sy, M[:, i, i]) 464 | az = atan2( M[:, j, i], -M[:, k, i]) 465 | if mask.sum() > 0: 466 | ax[mask] = atan2(-M[:, j, k][mask], M[:, j, j][mask]) 467 | ay[mask] = atan2( sy[mask], M[:, i, i][mask]) 468 | az[mask] = 0.0 469 | else: 470 | cy = torch.sqrt(M[:, i, i]**2 + M[:, j, i]**2) 471 | mask = ~(cy > 1e-8) 472 | ax = atan2( M[:, k, j], M[:, k, k]) 473 | ay = atan2(-M[:, k, i], cy) 474 | az = atan2( M[:, j, i], M[:, i, i]) 475 | if mask.sum() > 0: 476 | ax[mask] = atan2(-M[:, j, k][mask], M[:, j, j][mask]) 477 | ay[mask] = atan2(-M[:, k, i][mask], cy[mask]) 478 | az[mask] = 0.0 479 | 480 | if parity: 481 | ax, ay, az = -ax, -ay, -az 482 | if frame: 483 | ax, az = az, ax 484 | return ax, ay, az 485 | -------------------------------------------------------------------------------- /code/models/submodules.py: -------------------------------------------------------------------------------- 1 | """ 2 | Submodules to build up CNN 3 | 4 | @author: Zhaoyang Lv 5 | @date: March, 2019 6 | """ 7 | 8 | from __future__ import print_function 9 | 10 | import torch.nn as nn 11 | import torch 12 | import numpy as np 13 | 14 | from torch.nn import init 15 | from torchvision import transforms 16 | 17 | def color_normalize(color): 18 | rgb_mean = torch.Tensor([0.4914, 0.4822, 0.4465]).type_as(color) 19 | rgb_std = torch.Tensor([0.2023, 0.1994, 0.2010]).type_as(color) 20 | return (color - rgb_mean.view(1,3,1,1)) / rgb_std.view(1,3,1,1) 21 | 22 | def convLayer(batchNorm, in_planes, out_planes, kernel_size=3, stride=1, dilation=1, bias=False): 23 | """ A wrapper of convolution-batchnorm-ReLU module 24 | """ 25 | if batchNorm: 26 | return nn.Sequential( 27 | nn.Conv2d(in_planes, out_planes, kernel_size=kernel_size, stride=stride, padding=(kernel_size-1)//2 + dilation-1, dilation=dilation, bias=bias), 28 | nn.BatchNorm2d(out_planes), 29 | #nn.LeakyReLU(0.1,inplace=True) # deprecated 30 | nn.ELU(inplace=True) 31 | ) 32 | else: 33 | return nn.Sequential( 34 | nn.Conv2d(in_planes, out_planes, kernel_size=kernel_size, stride=stride, padding=(kernel_size-1)//2 + dilation-1, dilation=dilation, bias=True), 35 | #nn.LeakyReLU(0.1,inplace=True) # deprecated 36 | nn.ELU(inplace=True) 37 | ) 38 | 39 | def fcLayer(in_planes, out_planes, bias=True): 40 | return nn.Sequential( 41 | nn.Linear(in_planes, out_planes, bias), 42 | nn.ReLU(inplace=True) 43 | ) 44 | 45 | def initialize_weights(modules, method='xavier'): 46 | for m in modules: 47 | if isinstance(m, nn.Conv2d): 48 | if m.bias is not None: 49 | m.bias.data.zero_() 50 | if method == 'xavier': 51 | init.xavier_uniform_(m.weight) 52 | elif method == 'kaiming': 53 | init.kaiming_uniform_(m.weight) 54 | 55 | if isinstance(m, nn.ConvTranspose2d): 56 | if m.bias is not None: 57 | m.bias.data.zero_() 58 | if method == 'xavier': 59 | init.xavier_uniform_(m.weight) 60 | elif method == 'kaiming': 61 | init.kaiming_uniform_(m.weight) 62 | 63 | class ListModule(nn.Module): 64 | """ The implementation of a list of modules from 65 | https://discuss.pytorch.org/t/list-of-nn-module-in-a-nn-module/219/2 66 | """ 67 | def __init__(self, *args): 68 | super(ListModule, self).__init__() 69 | idx = 0 70 | for module in args: 71 | self.add_module(str(idx), module) 72 | idx += 1 73 | 74 | def __getitem__(self, idx): 75 | if idx < 0 or idx >= len(self._modules): 76 | raise IndexError('index {} is out of range'.format(idx)) 77 | it = iter(self._modules.values()) 78 | for i in range(idx): 79 | 
next(it) 80 | return next(it) 81 | 82 | def __iter__(self): 83 | return iter(self._modules.values()) 84 | 85 | def __len__(self): 86 | return len(self._modules) 87 | -------------------------------------------------------------------------------- /code/run_example.py: -------------------------------------------------------------------------------- 1 | """ 2 | An extremely simple example to show how to run the algorithm 3 | 4 | @author: Zhaoyang Lv 5 | @date: May 2019 6 | """ 7 | 8 | import argparse 9 | 10 | import torch 11 | import torch.nn as nn 12 | import torch.nn.functional as func 13 | 14 | import models.LeastSquareTracking as ICtracking 15 | 16 | from tqdm import tqdm 17 | from torch.utils.data import DataLoader 18 | from train_utils import check_cuda 19 | from data.SimpleLoader import SimpleLoader 20 | 21 | def resize(img0, img1, depth0, depth1, K_in, resizeH, resizeW): 22 | H, W = img0.shape[-2:] 23 | 24 | I0 = func.interpolate(img0, (resizeH,resizeW), mode='bilinear', align_corners=True) 25 | I1 = func.interpolate(img1, (resizeH,resizeW), mode='bilinear', align_corners=True) 26 | D0 = func.interpolate(depth0, (resizeH,resizeW), mode='nearest') 27 | D1 = func.interpolate(depth1, (resizeH,resizeW), mode='nearest') 28 | 29 | sx = resizeH / H 30 | sy = resizeW / W 31 | 32 | K = K_in.clone() 33 | K[:,0] *= sx 34 | K[:,1] *= sy 35 | K[:,2] *= sx 36 | K[:,3] *= sy 37 | 38 | return I0, I1, D0, D1, K 39 | 40 | def run_inference(dataloader, net): 41 | 42 | progress = tqdm(dataloader, ncols=100, 43 | desc = 'Run the deeper inverse compositional algorithm', 44 | total= len(dataloader)) 45 | 46 | net.eval() 47 | 48 | for idx, batch, in enumerate(progress): 49 | 50 | color0, color1, depth0, depth1, K = check_cuda(batch) 51 | 52 | # downsize the input to 120*160, it is the size of data when the algorthm is trained 53 | C0, C1, D0, D1, K = resize(color0, color1, depth0, depth1, K, resizeH = 120, resizeW=160) 54 | 55 | with torch.no_grad(): 56 | R, t = net.forward(C0, C1, D0, D1, K) 57 | 58 | print('Rotation: ') 59 | print(R) 60 | print('translation: ') 61 | print(t) 62 | 63 | if __name__ == '__main__': 64 | 65 | parser = argparse.ArgumentParser(description='Run the network inference example.') 66 | 67 | parser.add_argument('--checkpoint', default='trained_models/TUM_RGBD_ABC_final.pth.tar', 68 | type=str, help='the path to the pre-trained checkpoint.') 69 | 70 | parser.add_argument('--color_dir', default='data/data_examples/TUM/color', 71 | help='the directory of color images') 72 | parser.add_argument('--depth_dir', default='data/data_examples/TUM/depth', 73 | help='the directory of depth images') 74 | 75 | parser.add_argument('--intrinsic', default='525.0,525.0,319.5,239.5', 76 | help='Simple pin-hole camera intrinsics, input in the format (fx, fy, cx, cy)') 77 | 78 | config = parser.parse_args() 79 | 80 | K = [float(x) for x in config.intrinsic.split(',')] 81 | 82 | simple_loader = SimpleLoader(config.color_dir, config.depth_dir, K) 83 | simple_loader = DataLoader(simple_loader, batch_size=1, shuffle=False) 84 | 85 | net = ICtracking.LeastSquareTracking( 86 | encoder_name = 'ConvRGBD2', 87 | max_iter_per_pyr= 3, 88 | mEst_type = 'MultiScale2w', 89 | solver_type = 'Direct-ResVol') 90 | 91 | if torch.cuda.is_available(): 92 | net.cuda() 93 | 94 | net.load_state_dict(torch.load(config.checkpoint)['state_dict']) 95 | 96 | run_inference(simple_loader, net) -------------------------------------------------------------------------------- /code/timers.py: 
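# A minimal usage sketch for the Timers utility defined in this file (keys are free-form
# strings; 'forward step' below is just an example label):
#
#     from timers import Timers
#     timers = Timers()
#     timers.tic('forward step')
#     ...                              # timed work goes here
#     timers.toc('forward step')
#     timers.print('forward step')     # reports the average time for this key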
-------------------------------------------------------------------------------- 1 | """ 2 | A timing utility 3 | 4 | @author: Zhaoyang Lv 5 | @date: March 2019 6 | """ 7 | 8 | from __future__ import absolute_import 9 | from __future__ import division 10 | from __future__ import print_function 11 | from __future__ import unicode_literals 12 | 13 | import time 14 | import numpy as np 15 | from collections import defaultdict 16 | 17 | class Timer(object): 18 | 19 | def __init__(self): 20 | self.reset() 21 | 22 | def tic(self): 23 | self.start_time = time.time() 24 | 25 | def toc(self, average=True): 26 | self.diff = time.time() - self.start_time 27 | self.total_time += self.diff 28 | self.calls += 1 29 | 30 | def total(self): 31 | """ return the total amount of time """ 32 | return self.total_time 33 | 34 | def avg(self): 35 | """ return the average amount of time """ 36 | return self.total_time / float(self.calls) 37 | 38 | def reset(self): 39 | self.total_time = 0. 40 | self.calls = 0 41 | self.start_time = 0. 42 | self.diff = 0. 43 | 44 | class Timers(object): 45 | 46 | def __init__(self): 47 | self.timers = defaultdict(Timer) 48 | 49 | def tic(self, key): 50 | self.timers[key].tic() 51 | 52 | def toc(self, key): 53 | self.timers[key].toc() 54 | 55 | def print(self, key=None): 56 | if key is None: 57 | # print all time 58 | for k, v in self.timers.items(): 59 | print("Average time for {:}: {:}".format(k, v.avg())) 60 | else: 61 | print("Average time for {:}: {:}".format(key, self.timers[key].avg())) 62 | 63 | def get_avg(self, key): 64 | return self.timers[key].avg() 65 | -------------------------------------------------------------------------------- /code/train.py: -------------------------------------------------------------------------------- 1 | """ 2 | The training script for deep trust region method 3 | 4 | @author: Zhaoyang Lv 5 | @date: March 2019 6 | """ 7 | 8 | import os, sys, argparse, time 9 | 10 | import models.LeastSquareTracking as ICtracking 11 | import models.criterions as criterions 12 | import models.geometry as geometry 13 | import train_utils 14 | import config 15 | from data.dataloader import load_data 16 | 17 | import torch 18 | import torch.nn as nn 19 | import torch.utils.data as data 20 | 21 | from timers import Timers 22 | from tqdm import tqdm 23 | 24 | import evaluate as eval_utils 25 | 26 | def create_train_eval_loaders(options, eval_type, keyframes, 27 | total_batch_size = 8, 28 | trajectory = ''): 29 | """ create the evaluation loader at different keyframes set-up 30 | """ 31 | eval_loaders = {} 32 | 33 | for kf in keyframes: 34 | np_loader = load_data(options.dataset, [kf], eval_type, trajectory) 35 | eval_loaders['{:}_keyframe_{:}'.format(trajectory, kf)] = data.DataLoader(np_loader, 36 | batch_size = int(total_batch_size), 37 | shuffle = False, num_workers = options.cpu_workers) 38 | 39 | return eval_loaders 40 | 41 | def train_one_epoch(options, dataloader, net, optim, epoch, logger, objectives, 42 | known_mask=False, timers=None): 43 | 44 | net.train() 45 | 46 | epoch_len = len(dataloader) 47 | 48 | flow_loss, rpe_loss = None, None 49 | if 'EPE3D' in objectives: 50 | flow_loss = criterions.compute_RT_EPE_loss 51 | if 'RPE' in objectives: 52 | rpe_loss = criterions.compute_RPE_loss 53 | 54 | if timers is None: timers_iter = Timers() 55 | 56 | if timers: timers.tic('one iteration') 57 | else: timers_iter.tic('one iteration') 58 | 59 | for batch_idx, batch in enumerate(dataloader): 60 | 61 | display_dict = {} 62 | 63 | optim.zero_grad() 64 | 65 | if 
timers: timers.tic('forward step') 66 | 67 | if known_mask: # for dataset that with mask or need mask 68 | color0, color1, depth0, depth1, Rt, K, obj_mask0, obj_mask1 = \ 69 | train_utils.check_cuda(batch[:8]) 70 | else: 71 | color0, color1, depth0, depth1, Rt, K = \ 72 | train_utils.check_cuda(batch[:6]) 73 | obj_mask0, obj_mask1 = None, None 74 | 75 | # Bypass lazy way to bypass invalid pixels. 76 | invalid_mask = (depth0 == depth0.min()) + (depth0 == depth0.max()) 77 | if obj_mask0 is not None: 78 | invalid_mask = 1.0 - obj_mask0 + invalid_mask 79 | 80 | Rs, ts = net.forward(color0, color1, depth0, depth1, K)[:2] 81 | 82 | if timers: timers.toc('forward step') 83 | if timers: timers.tic('calculate loss') 84 | 85 | R_gt, t_gt = Rt[:,:3,:3], Rt[:,:3,3] 86 | 87 | assert(flow_loss) # the only loss used for training 88 | 89 | epes3d = flow_loss(Rs, ts, R_gt, t_gt, depth0, K, invalid_mask).mean() * 1e2 90 | display_dict['train_epes3d'] = epes3d.item() 91 | 92 | loss = epes3d 93 | display_dict['train_loss'] = loss.item() 94 | 95 | if timers: timers.toc('calculate loss') 96 | if timers: timers.tic('backward') 97 | 98 | loss.backward() 99 | 100 | if timers: timers.toc('backward') 101 | 102 | optim.step() 103 | 104 | lr = train_utils.get_learning_rate(optim) 105 | display_dict['lr'] = lr 106 | 107 | if timers: 108 | timers.toc('one iteration') 109 | batch_time = timers.get_avg('one iteration') 110 | timers.tic('one iteration') 111 | else: 112 | timers_iter.toc('one iteration') 113 | batch_time = timers_iter.get_avg('one iteration') 114 | timers_iter.tic('one iteration') 115 | 116 | logger.write_to_tensorboard(display_dict, epoch*epoch_len + batch_idx) 117 | logger.write_to_terminal(display_dict, epoch, batch_idx, epoch_len, batch_time, is_train=True) 118 | 119 | def train(options): 120 | 121 | if options.time: 122 | timers = Timers() 123 | else: 124 | timers = None 125 | 126 | total_batch_size = options.batch_per_gpu * torch.cuda.device_count() 127 | 128 | checkpoint = train_utils.load_checkpoint_train(options) 129 | 130 | keyframes = [int(x) for x in options.keyframes.split(',')] 131 | train_loader = load_data(options.dataset, keyframes, load_type = 'train') 132 | train_loader = data.DataLoader(train_loader, 133 | batch_size = total_batch_size, 134 | shuffle = True, num_workers = options.cpu_workers) 135 | if options.dataset in ['BundleFusion', 'TUM_RGBD']: 136 | obj_has_mask = False 137 | else: 138 | obj_has_mask = True 139 | 140 | eval_loaders = create_train_eval_loaders(options, 'validation', keyframes, total_batch_size) 141 | 142 | logfile_name = '_'.join([ 143 | options.prefix, # the current test version 144 | options.network, 145 | options.encoder_name, 146 | options.mestimator, 147 | options.solver, 148 | 'lr', str(options.lr), 149 | 'batch', str(total_batch_size), 150 | 'kf', options.keyframes]) 151 | 152 | print("Initialize and train the Deep Trust Region Network") 153 | net = ICtracking.LeastSquareTracking( 154 | encoder_name = options.encoder_name, 155 | max_iter_per_pyr= options.max_iter_per_pyr, 156 | mEst_type = options.mestimator, 157 | solver_type = options.solver, 158 | tr_samples = options.tr_samples, 159 | no_weight_sharing = options.no_weight_sharing, 160 | timers = timers) 161 | 162 | if options.no_weight_sharing: 163 | logfile_name += '_no_weight_sharing' 164 | logger = train_utils.initialize_logger(options, logfile_name) 165 | 166 | if options.checkpoint: 167 | net.load_state_dict(checkpoint['state_dict']) 168 | 169 | if torch.cuda.is_available(): net.cuda() 170 | 171 | 
net.train() 172 | 173 | if torch.cuda.device_count() > 1: 174 | print("Use", torch.cuda.device_count(), "GPUs for training!") 175 | # dim = 0 [30, xxx] -> [10, ...], [10, ...], [10, ...] on 3 GPUs 176 | net = nn.DataParallel(net) 177 | 178 | train_objective = ['EPE3D'] # Note: we don't use RPE for training 179 | eval_objectives = ['EPE3D', 'RPE'] 180 | 181 | num_params = train_utils.count_parameters(net) 182 | 183 | if num_params < 1: 184 | print('There is no learnable parameters in this baseline.') 185 | print('No training. Only one iteration of evaluation') 186 | no_training = True 187 | else: 188 | print('There is a total of {:} learnabled parameters'.format(num_params)) 189 | no_training = False 190 | optim = train_utils.create_optim(options, net) 191 | scheduler = torch.optim.lr_scheduler.MultiStepLR(optim, 192 | milestones=options.lr_decay_epochs, 193 | gamma=options.lr_decay_ratio) 194 | 195 | freq = options.save_checkpoint_freq 196 | for epoch in range(options.start_epoch, options.epochs): 197 | 198 | if epoch % freq == 0: 199 | checkpoint_name = 'checkpoint_epoch{:d}.pth.tar'.format(epoch) 200 | print('save {:}'.format(checkpoint_name)) 201 | state_info = {'epoch': epoch, 'num_param': num_params} 202 | logger.save_checkpoint(net, state_info, filename=checkpoint_name) 203 | 204 | if options.no_val is False: 205 | for k, loader in eval_loaders.items(): 206 | 207 | eval_name = '{:}_{:}'.format(options.dataset, k) 208 | 209 | eval_info = eval_utils.evaluate_trust_region( 210 | loader, net, eval_objectives, 211 | known_mask = obj_has_mask, 212 | eval_name = eval_name, 213 | timers = timers) 214 | 215 | display_dict = {"{:}_epe3d".format(eval_name): eval_info['epes'].mean(), 216 | "{:}_rpe_angular".format(eval_name): eval_info['angular_error'].mean(), 217 | "{:}_rpe_translation".format(eval_name): eval_info['translation_error'].mean()} 218 | 219 | logger.write_to_tensorboard(display_dict, epoch) 220 | 221 | if no_training: break 222 | 223 | train_one_epoch(options, train_loader, net, optim, epoch, logger, 224 | train_objective, known_mask=obj_has_mask, timers=timers) 225 | 226 | scheduler.step() 227 | 228 | if __name__ == '__main__': 229 | 230 | parser = argparse.ArgumentParser(description='Training the network') 231 | 232 | config.add_basics_config(parser) 233 | config.add_train_basics_config(parser) 234 | config.add_train_optim_config(parser) 235 | config.add_train_log_config(parser) 236 | config.add_train_loss_config(parser) 237 | config.add_tracking_config(parser) 238 | 239 | options = parser.parse_args() 240 | 241 | options.start_epoch = 0 242 | 243 | print('---------------------------------------') 244 | print_options = vars(options) 245 | for key in print_options.keys(): 246 | print(key+': '+str(print_options[key])) 247 | print('---------------------------------------') 248 | 249 | torch.manual_seed(1) 250 | torch.cuda.manual_seed(1) 251 | 252 | print('Start training...') 253 | train(options) 254 | -------------------------------------------------------------------------------- /code/train_utils.py: -------------------------------------------------------------------------------- 1 | """ 2 | The training utility functions 3 | 4 | @author: Zhaoyang Lv 5 | @date: March 2019 6 | """ 7 | 8 | import os, sys 9 | from os.path import join 10 | import torch 11 | import torch.nn as nn 12 | 13 | def check_cuda(items): 14 | if torch.cuda.is_available(): 15 | return [x.cuda() for x in items] 16 | else: 17 | return items 18 | 19 | def initialize_logger(opt, logfile_name): 20 | """ Initialize 
the logger for the network 21 | """ 22 | from Logger import TensorBoardLogger 23 | log_dir = opt.dataset 24 | # if opt.resume_training: 25 | # logfile_name = '_'.join([ 26 | # logfile_name, 27 | # 'resume']) 28 | 29 | log_dir = join('logs', log_dir, logfile_name) 30 | logger = TensorBoardLogger(log_dir, logfile_name) 31 | return logger 32 | 33 | def create_optim(config, network): 34 | """ Create the optimizer 35 | """ 36 | if config.opt=='sgd': 37 | optim = torch.optim.SGD(network.parameters(), 38 | lr = config.lr, 39 | momentum = 0.9, 40 | weight_decay = 4e-4, 41 | nesterov=False) 42 | elif config.opt=='adam' or config.opt=='sgdr': 43 | optim = torch.optim.Adam(network.parameters(), 44 | lr = config.lr, 45 | weight_decay = 4e-4 46 | ) 47 | elif config.opt=='rmsprop': 48 | optim = torch.optim.RMSprop(network.parameters(), 49 | lr = config.lr, 50 | weight_decay = 1e-4) 51 | else: 52 | raise NotImplementedError 53 | 54 | return optim 55 | 56 | def count_parameters(model): 57 | return sum(p.numel() for p in model.parameters() if p.requires_grad) 58 | 59 | def load_checkpoint_test(opt): 60 | if os.path.isfile(opt.checkpoint): 61 | print('=> loading checkpoint '+ opt.checkpoint) 62 | checkpoint = torch.load(opt.checkpoint) 63 | else: 64 | raise Exception('=> no checkpoint found at '+opt.checkpoint) 65 | return checkpoint 66 | 67 | def load_checkpoint_train(opt): 68 | """ Loading the checking-point file if specified 69 | """ 70 | checkpoint = None 71 | if opt.checkpoint: 72 | if os.path.isfile(opt.checkpoint): 73 | print('=> loading checkpoint '+ opt.checkpoint) 74 | 75 | checkpoint = torch.load(opt.checkpoint) 76 | print('=> loaded checkpoint '+ opt.checkpoint+' epoch %d'%checkpoint['epoch']) 77 | if opt.resume_training: 78 | opt.start_epoch = checkpoint['epoch'] 79 | print('resume training on the checkpoint') 80 | else: 81 | print('start new training...') 82 | 83 | # This is to configure the module loaded from multi-gpu 84 | if opt.checkpoint_multigpu: 85 | from collections import OrderedDict 86 | state_dict_rename = OrderedDict() 87 | for k, v in checkpoint['state_dict'].items(): 88 | name = k[7:] # remove `module.` 89 | state_dict_rename[name] = v 90 | checkpoint['state_dict'] = state_dict_rename 91 | else: 92 | print('=> no checkpoint found at '+opt.checkpoint) 93 | return checkpoint 94 | 95 | def set_learning_rate(optim, lr): 96 | """ manual set the learning rate for all specified parameters 97 | """ 98 | for param_group in optim.param_groups: 99 | param_group['lr']=lr 100 | 101 | def get_learning_rate(optim, name=None): 102 | """ retrieve the current learning rate 103 | """ 104 | if name is None: 105 | # assume all the learning rate remains the same 106 | return optim.param_groups[0]['lr'] 107 | 108 | def adjust_learning_rate_manual(optim, epoch, lr, lr_decay_epochs, lr_decay_ratio): 109 | """ DIY the learning rate 110 | """ 111 | for e in lr_decay_epochs: 112 | if epoch