├── .gitignore ├── LICENSE ├── README.md ├── code ├── .gitignore ├── Logger.py ├── config.py ├── data │ ├── MovingObj3D.py │ ├── SimpleLoader.py │ ├── TUM_RGBD.py │ ├── data_examples │ │ └── TUM │ │ │ ├── color │ │ │ ├── 1305031790.645155.png │ │ │ ├── 1305031790.713097.png │ │ │ ├── 1305031790.781258.png │ │ │ ├── 1305031790.845151.png │ │ │ └── 1305031790.913129.png │ │ │ └── depth │ │ │ ├── 1305031790.640468.png │ │ │ ├── 1305031790.709421.png │ │ │ ├── 1305031790.773548.png │ │ │ ├── 1305031790.839363.png │ │ │ └── 1305031790.909436.png │ └── dataloader.py ├── evaluate.py ├── models │ ├── LeastSquareTracking.py │ ├── __init__.py │ ├── algorithms.py │ ├── criterions.py │ ├── geometry.py │ └── submodules.py ├── run_example.py ├── timers.py ├── train.py ├── train_utils.py └── trained_models │ └── TUM_RGBD_ABC_final.pth.tar ├── images └── overall_flowchart.png └── setup └── environment.yml /.gitignore: -------------------------------------------------------------------------------- 1 | *.DS_Store 2 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2019 Zhaoyang Lv, Frank Dellaert, James M. Rehg, Andreas Geiger 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 22 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Taking a Deeper Look at the Inverse Compositional Algorithm (CVPR 2019, Oral Presentation) 2 | 3 | ![alt text](images/overall_flowchart.png) 4 | 5 | ## Summary 6 | 7 | This is the official repository of our CVPR 2019 paper: 8 | 9 | **Taking a Deeper Look at the Inverse Compositional Algorithm**, 10 | *Zhaoyang Lv, Frank Dellaert, James M. Rehg, Andreas Geiger*, 11 | CVPR 2019 12 | * [Preprint (PDF)][1] 13 | * [Video talk][2] 14 | 15 | ```bibtex 16 | @inproceedings{Lv19cvpr, 17 | title = {Taking a Deeper Look at the Inverse Compositional Algorithm}, 18 | author = {Lv, Zhaoyang and Dellaert, Frank and Rehg, James and Geiger, Andreas}, 19 | booktitle = {CVPR}, 20 | year = {2019} 21 | } 22 | ``` 23 | 24 | Please cite the paper if you use the provided code in your research. 25 | 26 | ### Project Members 27 | 28 | * [Zhaoyang Lv][3], Georgia Institute of Technology (Ph.D. 
student), Max Planck Institute (Alumni) 29 | * [Frank Dellaert][4], Georgia Institute of Technology 30 | * [James M. Rehg][5], Georgia Institute of Technology 31 | * [Andreas Geiger][6], Max Planck Institute, University of Tuebingen 32 | 33 | ### Contact 34 | 35 | Please drop me an email if you have any questions regarding this project. 36 | 37 | Zhaoyang Lv (zhaoyang.lv@gatech.edu, lvzhaoyang1990@gmail.com) 38 | 39 | ### Setup 40 | 41 | The code is developed using Pytorch 1.0, CUDA 9.0, Ubuntu16.04. Pytorch > 1.0 and Cuda > 9.0 were also tested in some machines. If you need to integrate to some code bases which are using Pytorch < 1.0, note that some functions (particularly the matrix inverse operator) are not backward compatible. Contact me if you need some instructions. 42 | 43 | You can reproduce the setup by using our anaconda environment configurations 44 | 45 | ``` bash! 46 | conda env create -f setup/environment.yml 47 | ``` 48 | 49 | Everytime before you run, activate the environment 50 | 51 | ``` bash! 52 | conda activate deepICN # the environment name will be (deepICN) 53 | ``` 54 | 55 | ### Quick Inference Example 56 | 57 | Here is a quick run using pre-trained model on a short TUM trajectory. 58 | 59 | ``` 60 | python run_example.py 61 | ``` 62 | 63 | Be careful about the depth reader when you switch to a different dataset, which may use different scaling to the TUM dataset. And this pre-trained model is trained for egocentric-motion estimation using TUM dataset, not for the object-centric motion estimation. These factors may affect the result. 64 | 65 | To run the full training and evaluation, please follow the steps below. 66 | 67 | ### Prepare the datasets 68 | 69 | **TUM RGBD Dataset**: Download the dataset from [TUM RGBD][7] to '$YOUR_TUM_RGBD_DIR'. Create a symbolic link to the data directory as 70 | 71 | ``` 72 | ln -s $YOUR_TUM_RGBD_DIR code/data/data_tum 73 | ``` 74 | 75 | **MovingObjects3D Dataset** Download the dataset from [MovingObjs3D][8] to '$YOUR_MOV_OBJS_3D_DIR'. Create a symbolic link to the data directory as 76 | 77 | ``` 78 | ln -s $YOUR_MOV_OBJS_3D_DIR code/data/data_objs3D 79 | ``` 80 | 81 | ### Run training 82 | 83 | **Train example with TUM RGBD dataset:** 84 | 85 | ``` bash! 86 | python train.py --dataset TUM_RGBD 87 | 88 | # or run with the specific setting 89 | python train.py --dataset TUM_RGBD \ 90 | --encoder_name ConvRGBD2 \ 91 | --mestimator MultiScale2w \ 92 | --solver Direct-ResVol \ 93 | --keyframes 1,2,4,8 # mixing keyframes subsampled from 1,2,4,8 for training. 94 | ``` 95 | 96 | To check the full training setting, run the help config as 97 | ``` bash! 98 | python train.py --help 99 | ``` 100 | 101 | Use tensorboard to check the progress during training 102 | ``` bash! 103 | tensorboard --log logs/TUM_RGBD --port 8000 # go to localhost:8000 to check the training&validation curve 104 | ``` 105 | 106 | **Train example with MovingObjects3D:** All the same as the last one only except changing the dataset name 107 | 108 | ``` bash! 109 | python train.py --dataset MovingObjs3D \ 110 | --keyframes 1,2,4 # mixing keyframes subsampled from 1,2,4 for training 111 | 112 | # check the training progress using tensorboard 113 | tensorboard --log logs/MovingObjs3D --port 8001 114 | ``` 115 | 116 | We will soon release the instructions for the other two datasets used in the paper, data in the BundleFusion and DynamicBundleFusion. 
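**How keyframe mixing works:** the `--keyframes` string is parsed into a list of integer frame gaps (as in `evaluate.py`), and each training pair then samples one of these gaps at random (`random.choice(self.keyframes)` in the datasets' `__getitem__`). The snippet below is a minimal sketch for inspecting the resulting training pairs; it assumes the `code/data/data_tum` link described above exists and that you run it from the `code` directory.

``` python
import torch.utils.data as data
from data.dataloader import load_data

# same parsing of the --keyframes string as in the training/evaluation scripts
keyframes = [int(k) for k in '1,2,4,8'.split(',')]

np_loader = load_data('TUM_RGBD', keyframes, load_type='train')
torch_loader = data.DataLoader(np_loader, batch_size=4, shuffle=True, num_workers=2)

# each item is (color0, color1, depth0, depth1, relative pose, intrinsics, pair name)
color0, color1, depth0, depth1, transform, calib, names = next(iter(torch_loader))
print(color0.shape, depth0.shape)  # inputs are downscaled to a quarter of the TUM resolution
print(names)                       # the pair name encodes the sampled frame gap, e.g. fr1/..._000010to000014
```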
117 | 118 | ### Run evaluation with the pretrained models 119 | 120 | **Run the pretrained model:** If you have set up the dataset properly with the datasets, you can run the learned model with the checkpoint we provided in the trained model directory 121 | 122 | ``` bash! 123 | python evaluate.py --dataset TUM_RGBD \ 124 | --trajectory fr1/rgbd_dataset_freiburg1_360 \ 125 | --encoder_name ConvRGBD2 \ 126 | --mestimator MultiScale2w \ 127 | --solver Direct-ResVol \ 128 | --keyframes 1 \ 129 | --checkpoint trained_models/TUM_RGBD_ABC_final.pth.tar 130 | ``` 131 | 132 | You can substitute the trajectory, the keyframe and the checkpoint file. The training and evaluation share the same config setting. To check the full setting, run the help config as 133 | 134 | ``` bash! 135 | python evaluate.py --help 136 | ``` 137 | 138 | **Results:** The evaluation results will be generated automatically in both '.pkl' and '*.csv' in the folder 'test_results/'. 139 | 140 | **Run a baseline:** This implementation can be simplified to a vanilla Lucas-Kanade method minizing the photometric error without any learnable module. Note that it is **not** the [RGBD VO baseline][9] we report in the paper. It may not be the optimal Lucas-Kanade baseline you want to compared with since we use the same stopping criterion, Gauss-Newton solver within the same framework as our learned model. There is no extra bells and whistles, but it may provide a baseline for you to explore the algorithm in various directions. 141 | 142 | ``` bash! 143 | python evaluate.py --dataset TUM_RGBD \ 144 | --trajectory fr1/rgbd_dataset_freiburg1_360 \ 145 | --encoder_name RGB --mestimator None --solve Direct-Nodamping \ 146 | --keyframes 1 # optionally 1,2,4,8, etc. 147 | ``` 148 | 149 | ## License 150 | 151 | MIT License 152 | 153 | Copyright (c) 2019 Zhaoyang Lv, Frank Dellaert, James M. Rehg, Andreas Geiger 154 | 155 | Permission is hereby granted, free of charge, to any person obtaining a copy 156 | of this software and associated documentation files (the "Software"), to deal 157 | in the Software without restriction, including without limitation the rights 158 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 159 | copies of the Software, and to permit persons to whom the Software is 160 | furnished to do so, subject to the following conditions: 161 | 162 | The above copyright notice and this permission notice shall be included in all 163 | copies or substantial portions of the Software. 164 | 165 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 166 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 167 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 168 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 169 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 170 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 171 | SOFTWARE. 
172 | 173 | 174 | [1]: https://arxiv.org/pdf/1812.06861.pdf 175 | [2]: https://youtu.be/doTjXDFtyK0 176 | [3]: https://www.cc.gatech.edu/~zlv30/ 177 | [4]: https://www.cc.gatech.edu/~dellaert/FrankDellaert/Frank_Dellaert/Frank_Dellaert.html 178 | [5]: https://rehg.org/ 179 | [6]: http://www.cvlibs.net/ 180 | [7]: https://vision.in.tum.de/data/datasets/rgbd-dataset/download 181 | [8]: https://drive.google.com/open?id=1EIlS4J2J0sdsq8Mw_03DXHlRQmfL8XQx 182 | [9]: https://vision.cs.tum.edu/_media/spezial/bib/steinbruecker_sturm_cremers_iccv11.pdf -------------------------------------------------------------------------------- /code/.gitignore: -------------------------------------------------------------------------------- 1 | trained_models/* 2 | test/* 3 | test_results/* 4 | */**/*.pyc 5 | logs/* 6 | -------------------------------------------------------------------------------- /code/Logger.py: -------------------------------------------------------------------------------- 1 | """ Logger, wrapped on Tensorboard 1.0.0a6 2 | Tensorboard is not backward compatible since then. 3 | 4 | @author: Zhaoyang Lv 5 | @date: March 2019 6 | """ 7 | 8 | import sys, os, shutil 9 | import os.path as osp 10 | import tensorboard 11 | import torch 12 | from collections import OrderedDict 13 | 14 | class Logger(object): 15 | """ 16 | example usage: 17 | 18 | stdout = Logger('log.txt') 19 | sys.stdout = stdout 20 | 21 | ... your code here ... 22 | 23 | stdout.delink() 24 | """ 25 | def __init__(self, filename="Default.log"): 26 | self.terminal = sys.stdout 27 | self.log = open(filename, "w") 28 | 29 | def delink(self): 30 | self.log.close() 31 | #self.log = open('foo', "w") 32 | # self.write = self.writeTerminalOnly 33 | 34 | def writeTerminalOnly(self, message): 35 | self.terminal.write(message) 36 | 37 | def write(self, message): 38 | self.terminal.write(message) 39 | self.log.write(message) 40 | 41 | def flush(self): 42 | pass 43 | 44 | class TensorBoardLogger(object): 45 | def __init__(self, logging_dir, logfile_name, print_freq = 10): 46 | 47 | self.log_dir = logging_dir 48 | self.print_freq = print_freq 49 | 50 | if not os.path.isdir(logging_dir): 51 | os.makedirs(logging_dir) 52 | 53 | self.summary_writer = tensorboard.FileWriter(logdir=logging_dir) 54 | 55 | # standard logger to print to terminal 56 | logfile = osp.join(logging_dir,'log.txt') 57 | stdout = Logger(logfile) 58 | sys.stdout = stdout 59 | 60 | def write_to_tensorboard(self, display_dict, iteration): 61 | """ Write the saved states (display_dict) to tensorboard 62 | """ 63 | for k, v in display_dict.items(): 64 | self.summary_writer.add_summary(tensorboard.summary.scalar(k, v), iteration) 65 | 66 | def write_to_terminal(self, display_dict, epoch, batch_iter, epoch_len, batch_time, is_train = True): 67 | """ Write the save states (display_dict) and training information to terminal for display 68 | """ 69 | 70 | if batch_iter % self.print_freq != 0: 71 | return 72 | 73 | if is_train: 74 | prefix = 'Train' 75 | else: 76 | prefix = 'Test' 77 | 78 | state = prefix + ':\tEpoch %d, Batch %d/%d, BatchTime %.4f'%(epoch+1, batch_iter, epoch_len, batch_time) 79 | 80 | loss = '' 81 | for k, v in display_dict.items(): 82 | loss += k + ' ' + '%.8f ' % v 83 | 84 | print(state + loss) 85 | 86 | def save_checkpoint(self, network, state_info = None, 87 | filename='checkpoint.pth.tar'): 88 | """save checkpoint to disk""" 89 | state_dict = network.state_dict().copy() 90 | 91 | if torch.cuda.device_count() > 1: 92 | state_dict_rename = OrderedDict() 93 | for k, 
v in state_dict.items(): 94 | name = k[7:] # remove `module.` 95 | state_dict_rename[name] = v 96 | state_dict = state_dict_rename 97 | 98 | if state_info is None: 99 | state = {'state_dict': state_dict} 100 | else: 101 | state = state_info 102 | state['state_dict'] = state_dict 103 | 104 | checkpoint_path = osp.join(self.log_dir,filename) 105 | torch.save(state, checkpoint_path) 106 | return checkpoint_path 107 | -------------------------------------------------------------------------------- /code/config.py: -------------------------------------------------------------------------------- 1 | """ 2 | Add the configurations by modules 3 | 4 | @author: Zhaoyang Lv 5 | @date: March 2019 6 | """ 7 | 8 | def add_tracking_config(parser): 9 | parser.add_argument('--network', 10 | default='DeepIC', type=str, 11 | choices=('DeepIC', 'GaussNewton'), 12 | help='Choose a network to run. \n \ 13 | The DeepIC is the proposed Deeper Inverse Compositional method. \n\ 14 | The GuassNewton is the baseline for Inverse Compositional method which does not include \ 15 | any learnable parameters\n') 16 | parser.add_argument('--mestimator', 17 | default='MultiScale2w', type=str, 18 | choices=('None', 'MultiScale2w'), 19 | help='Choose a weighting function for the Trust Region method.\n\ 20 | The MultiScale2w is the proposed (B) convolutional M-estimator. \n') 21 | parser.add_argument('--solver', 22 | default='Direct-ResVol', type=str, 23 | choices=('Direct-Nodamping', 'Direct-ResVol'), 24 | help='Choose the solver function for the Trust Region method. \n\ 25 | Direct-Nodamping is the Gauss-Newton algorithm, which does not use damping. \n\ 26 | Direct-ResVol is the proposed (C) Trust-Region Network. \n\ 27 | (default: Direct-ResVol) ') 28 | parser.add_argument('--encoder_name', 29 | default='ConvRGBD2', 30 | choices=('ConvRGBD2', 'RGB'), 31 | help='The encoder architectures. \ 32 | ConvRGBD2 takes the two-view features as input. \n\ 33 | RGB is using the raw RGB images as input (converted to intensity afterwards).\n\ 34 | (default: ConvRGBD2)') 35 | parser.add_argument('--max_iter_per_pyr', 36 | default=3, type=int, 37 | help='The maximum number of iterations at each pyramids.\n') 38 | parser.add_argument('--no_weight_sharing', 39 | action='store_true', 40 | help='If this flag is on, we disable sharing the weights across different backbone network when extracing \ 41 | features. In default, we share the weights for all network in each pyramid level.\n') 42 | parser.add_argument('--tr_samples', default=10, type=int, 43 | help='Set the number of trust-region samples. (default: 10)\n') 44 | 45 | def add_basics_config(parser): 46 | """ the basic setting 47 | (supposed to be shared through train and inference) 48 | """ 49 | parser.add_argument('--cpu_workers', type=int, default=12, 50 | help="Number of cpu threads for data loader.\n") 51 | parser.add_argument('--dataset', type=str, 52 | choices=('TUM_RGBD', 'MovingObjects3D'), 53 | help='Choose a dataset to train/val/evaluate.\n') 54 | parser.add_argument('--time', dest='time', action='store_true', 55 | help='Count the execution time of each step.\n' ) 56 | 57 | def add_test_basics_config(parser): 58 | parser.add_argument('--batch_per_gpu', default=8, type=int, 59 | help='Specify the batch size during test. 
The default is 8.\n') 60 | parser.add_argument('--checkpoint', default='', type=str, 61 | help='Choose a checkpoint model to test.\n') 62 | parser.add_argument('--keyframes', 63 | default='1,2,4,8', type=str, 64 | help='Choose the number of keyframes to train the algorithm.\n') 65 | parser.add_argument('--verbose', action='store_true', 66 | help='Print/save all the intermediate representations') 67 | parser.add_argument('--eval_set', default='test', 68 | choices=('test', 'validation')) 69 | parser.add_argument('--trajectory', type=str, 70 | default = '', 71 | help = 'Specify a trajectory to run.\n') 72 | 73 | def add_train_basics_config(parser): 74 | """ add the basics about the training """ 75 | parser.add_argument('--checkpoint', default='', type=str, 76 | help='Choose a pretrained checkpoint model to start with. \n') 77 | parser.add_argument('--batch_per_gpu', default=64, type=int, 78 | help='Specify the batch size during training.\n') 79 | parser.add_argument('--epochs', 80 | default=30, type=int, 81 | help='The total number of total epochs to run. Default is 30.\n' ) 82 | parser.add_argument('--resume_training', 83 | dest='resume_training', action='store_true', 84 | help='Resume the training using the loaded checkpoint. If not, restart the training. \n\ 85 | You will need to use the --checkpoint config to load the pretrained checkpoint' ) 86 | parser.add_argument('--pretrained_model', default='', type=str, 87 | help='Initialize the model weights with pretrained model.\n') 88 | parser.add_argument('--no_val', 89 | default=False, 90 | action='store_true', 91 | help='Use no validatation set for training.\n') 92 | parser.add_argument('--keyframes', 93 | default='1,2,4,8', type=str, 94 | help='Choose the number of keyframes to train the algorithm') 95 | parser.add_argument('--verbose', action='store_true', 96 | help='Print/save all the intermediate representations.\n') 97 | 98 | def add_train_log_config(parser): 99 | """ checkpoint and log options """ 100 | parser.add_argument('--checkpoint_folder', default='', type=str, 101 | help='The folder name (postfix) to save the checkpoint.') 102 | parser.add_argument('--snapshot', default=1, type=int, 103 | help='Number of interations to save a snapshot') 104 | parser.add_argument('--save_checkpoint_freq', 105 | default=1, type=int, 106 | help='save the checkpoint for every N epochs') 107 | parser.add_argument('--prefix', default='', type=str, 108 | help='the prefix string added to the log files') 109 | parser.add_argument('-p', '--print_freq', 110 | default=10, type=int, 111 | help='print frequency (default: 10)') 112 | 113 | def add_train_optim_config(parser): 114 | """ add training optimization options """ 115 | parser.add_argument('--opt', 116 | type=str, default='adam', choices=('sgd','adam'), 117 | help='choice of optimizer (default: adam) \n') 118 | parser.add_argument('--lr', 119 | default=0.0005, type=float, 120 | help='initial learning rate. 
\n') 121 | parser.add_argument('--lr_decay_ratio', 122 | default=0.5, type=float, 123 | help='lr decay ratio (default:0.5)') 124 | parser.add_argument('--lr_decay_epochs', 125 | default=[5, 10, 20], type=int, nargs='+', 126 | help='lr decay epochs') 127 | parser.add_argument('--lr_min', default=1e-6, type=float, 128 | help='minimum learning rate') 129 | parser.add_argument('--lr_restart', default=10, type=int, 130 | help='restart learning after N epochs') 131 | 132 | def add_train_loss_config(parser): 133 | """ add training configuration for the loss function """ 134 | parser.add_argument('--regression_loss_type', 135 | default='SmoothL1', type=str, choices=('L1', 'SmoothL1'), 136 | help='Loss function for flow regression (default: SmoothL1 loss)') -------------------------------------------------------------------------------- /code/data/MovingObj3D.py: -------------------------------------------------------------------------------- 1 | """ 2 | Data loader for MovingObjs 3D dataset 3 | 4 | @author: Zhaoyang Lv 5 | @date: May 2019 6 | """ 7 | 8 | from __future__ import absolute_import 9 | from __future__ import division 10 | from __future__ import print_function 11 | from __future__ import unicode_literals 12 | 13 | import sys, os, random 14 | import pickle 15 | import functools 16 | 17 | import numpy as np 18 | import torch.utils.data as data 19 | import os.path as osp 20 | 21 | from scipy.misc import imread 22 | from tqdm import tqdm 23 | 24 | from cv2 import resize, INTER_NEAREST 25 | 26 | class MovingObjects3D(data.Dataset): 27 | 28 | # every sequence has 200 frames. 29 | categories = { 30 | 'train': {'aeroplane': [0,190], 31 | 'bicycle': [0,190], 32 | 'bus': [0,190], 33 | 'car': [0,190]}, 34 | 35 | 'validation': {'aeroplane': [190,200], 36 | 'bicycle': [190,200], 37 | 'bus': [190,200], 38 | 'car': [190,200]}, 39 | 40 | 'test': {'boat': [0,200], 41 | 'motorbike': [0,200]} 42 | } 43 | 44 | def __init__(self, root, load_type='train', keyframes = [1], data_transform=None): 45 | super(MovingObjects3D, self).__init__() 46 | 47 | self.base_folder = osp.join(root, 'data_objs3D') 48 | 49 | data_all = self.categories[load_type] 50 | 51 | # split it into train and test set (the first 20 are for test) 52 | self.image_seq = [] 53 | self.depth_seq = [] 54 | self.invalid_seq = [] 55 | self.object_mask_seq = [] 56 | self.cam_pose_seq = [] 57 | self.obj_pose_seq = [] 58 | self.obj_vis_idx = [] 59 | self.calib = [] 60 | self.obj_names = [] 61 | 62 | self.transforms = data_transform 63 | 64 | if load_type in ['validation', 'test']: 65 | # should not mix different keyframes in test 66 | assert(len(keyframes) == 1) 67 | self.keyframes = [1] 68 | self.sample_freq = keyframes[0] 69 | else: 70 | self.keyframes = keyframes 71 | self.sample_freq = 1 72 | 73 | self.ids = 0 74 | self.images_size = [240, 320] 75 | # get the accumulated image sequences on the fly 76 | self.seq_acc_ids = [0] 77 | for data_obj, frame_interval in data_all.items(): 78 | start_idx, end_idx = frame_interval 79 | print('Load {:} data from frame {:d} to {:d}'.format(data_obj, start_idx, end_idx)) 80 | for seq_idx in range(start_idx, end_idx, 1): 81 | seq_str = "{:06d}".format(seq_idx) 82 | 83 | info_pkl= osp.join(self.base_folder, 84 | data_obj, seq_str, 'info.pkl') 85 | 86 | color_seq, depth_seq, invalid_seq, mask_seq, camera_poses_seq, object_poses_seq,\ 87 | obj_visible_frames, calib_seq = extract_info_pickle(info_pkl) 88 | 89 | obj_visible_frames = obj_visible_frames[::self.sample_freq] 90 | 91 | 
self.image_seq.append([osp.join(self.base_folder, x) for x in color_seq]) 92 | self.depth_seq.append([osp.join(self.base_folder, x) for x in depth_seq]) 93 | # self.invalid_seq.append(invalid_seq) 94 | self.object_mask_seq.append([osp.join(self.base_folder, x) for x in mask_seq]) 95 | self.cam_pose_seq.append(camera_poses_seq) 96 | self.obj_pose_seq.append(object_poses_seq) 97 | self.calib.append(calib_seq) 98 | self.obj_vis_idx.append(obj_visible_frames) 99 | 100 | self.obj_names.append('{:}_{:03d}'.format(data_obj, seq_idx)) 101 | 102 | total_valid_frames = max(0, len(obj_visible_frames) - max(self.keyframes)) 103 | 104 | self.ids += total_valid_frames 105 | self.seq_acc_ids.append(self.ids) 106 | 107 | # downscale the input image to half 108 | self.fx_s = 0.5 109 | self.fy_s = 0.5 110 | 111 | print('There are a total of {:} valid frames'.format(self.ids)) 112 | 113 | def __len__(self): 114 | return self.ids 115 | 116 | def __getitem__(self, index): 117 | # the index we want from search sorted is shifted for one 118 | seq_idx = max(np.searchsorted(self.seq_acc_ids, index+1) - 1, 0) 119 | frame_idx = index - self.seq_acc_ids[seq_idx] 120 | 121 | this_idx= self.obj_vis_idx[seq_idx][frame_idx] 122 | next_idx= self.obj_vis_idx[seq_idx][frame_idx + random.choice(self.keyframes)] 123 | 124 | color0 = self.__load_rgb_tensor(self.image_seq[seq_idx][this_idx]) 125 | color1 = self.__load_rgb_tensor(self.image_seq[seq_idx][next_idx]) 126 | 127 | if self.transforms: 128 | color0, color1 = self.transforms([color0, color1]) 129 | 130 | depth0 = self.__load_depth_tensor(self.depth_seq[seq_idx][this_idx]) 131 | depth1 = self.__load_depth_tensor(self.depth_seq[seq_idx][next_idx]) 132 | 133 | cam_pose0 = self.cam_pose_seq[seq_idx][this_idx] 134 | cam_pose1 = self.cam_pose_seq[seq_idx][next_idx] 135 | obj_pose0 = self.obj_pose_seq[seq_idx][this_idx] 136 | obj_pose1 = self.obj_pose_seq[seq_idx][next_idx] 137 | 138 | # the relative allocentric transform of objects 139 | transform = functools.reduce(np.dot, 140 | [np.linalg.inv(cam_pose1), obj_pose1, np.linalg.inv(obj_pose0), cam_pose0]).astype(np.float32) 141 | 142 | # the validity of the object is up the object mask 143 | obj_index = 1 # object index is in default to be 1 144 | obj_mask0 = self.__load_binary_mask_tensor(self.object_mask_seq[seq_idx][this_idx], obj_index) 145 | obj_mask1 = self.__load_binary_mask_tensor(self.object_mask_seq[seq_idx][next_idx], obj_index) 146 | 147 | calib = np.asarray(self.calib[seq_idx], dtype=np.float32) 148 | calib[0] *= self.fx_s 149 | calib[1] *= self.fy_s 150 | calib[2] *= self.fx_s 151 | calib[3] *= self.fy_s 152 | 153 | obj_name = self.obj_names[seq_idx] 154 | pair_name = '{:}/{:06d}to{:06d}'.format(obj_name, this_idx, next_idx) 155 | 156 | return color0, color1, depth0, depth1, transform, calib, obj_mask0, obj_mask1, pair_name 157 | 158 | def __load_rgb_tensor(self, path): 159 | """ Load the rgb image 160 | """ 161 | image = imread(path)[:, :, :3] 162 | image = image.astype(np.float32) / 255.0 163 | image = resize(image, None, fx=self.fx_s, fy=self.fy_s) 164 | return image 165 | 166 | def __load_depth_tensor(self, path): 167 | """ Load the depth 168 | """ 169 | depth = imread(path).astype(np.float32) / 1e3 170 | depth = resize(depth, None, fx=self.fx_s, fy=self.fy_s, interpolation=INTER_NEAREST) 171 | depth = np.clip(depth, 1e-1, 1e2) # the valid region of the depth 172 | return depth[np.newaxis, :] 173 | 174 | def __load_binary_mask_tensor(self, path, seg_index): 175 | """ Load a binary segmentation mask (numbers) 
176 | If the object matches the specified index, return true; 177 | Otherwise, return false 178 | """ 179 | obj_mask = imread(path) 180 | mask = (obj_mask == seg_index) 181 | mask = resize(mask.astype(np.float), None, fx=self.fx_s, fy=self.fy_s, interpolation=INTER_NEAREST) 182 | return mask.astype(np.uint8)[np.newaxis, :] 183 | 184 | def extract_info_pickle(info_pkl): 185 | 186 | with open(info_pkl, 'rb') as p: 187 | info = pickle.load(p) 188 | 189 | color_seq = [x.split('final/')[1] for x in info['color']] 190 | depth_seq = [x.split('final/')[1] for x in info['depth']] 191 | invalid_seq = [x.split('final/')[1] for x in info['invalid'] ] 192 | mask_seq = [x.split('final/')[1] for x in info['object_mask']] 193 | 194 | # in this rendering setting, there is only one object 195 | camera_poses_seq = info['pose'] 196 | object_poses_seq = info['object_poses']['Model_1'] 197 | object_visible_frames = info['object_visible_frames']['Model_1'] 198 | 199 | calib_seq = info['calib'] 200 | 201 | return color_seq, depth_seq, invalid_seq, mask_seq, \ 202 | camera_poses_seq, object_poses_seq, object_visible_frames, calib_seq 203 | 204 | 205 | if __name__ == '__main__': 206 | 207 | loader = MovingObjects3D('', load_type='train', keyframes=[1]) 208 | 209 | import torchvision.utils as torch_utils 210 | torch_loader = data.DataLoader(loader, batch_size=16, shuffle=False, num_workers=4) 211 | 212 | for batch in torch_loader: 213 | color0, color1, depth0, depth1, transform, K, mask0, mask1, names = batch 214 | B,C,H,W=color0.shape 215 | 216 | bcolor0_img = torch_utils.make_grid(color0, nrow=4) 217 | bcolor1_img = torch_utils.make_grid(color1, nrow=4) 218 | # bdepth0_img = torch_utils.make_grid(depth0, nrow=4) 219 | # bdepth1_img = torch_utils.make_grid(depth1, nrow=4) 220 | bmask0_img = torch_utils.make_grid(mask0.view(B,1,H,W)*255, nrow=4) 221 | bmask1_img = torch_utils.make_grid(mask1.view(B,1,H,W)*255, nrow=4) 222 | 223 | import matplotlib.pyplot as plt 224 | plt.figure() 225 | plt.imshow(bcolor0_img.numpy().transpose((1,2,0))) 226 | plt.figure() 227 | plt.imshow(bcolor1_img.numpy().transpose((1,2,0))) 228 | # plt.figure() 229 | # plt.imshow(bdepth0_img.numpy().transpose((1,2,0))) 230 | # plt.figure() 231 | # plt.imshow(bdepth1_img.numpy().transpose((1,2,0))) 232 | plt.figure() 233 | plt.imshow(bmask0_img.numpy().transpose((1,2,0))) 234 | plt.figure() 235 | plt.imshow(bmask1_img.numpy().transpose((1,2,0))) 236 | plt.show() 237 | 238 | -------------------------------------------------------------------------------- /code/data/SimpleLoader.py: -------------------------------------------------------------------------------- 1 | """ 2 | This Simple loader partially refers to 3 | https://github.com/NVlabs/learningrigidity/blob/master/SimpleLoader.py 4 | 5 | @author: Zhaoyang Lv 6 | @date: May, 2019 7 | """ 8 | 9 | import sys, os, random 10 | import torch.utils.data as data 11 | import os.path as osp 12 | 13 | import numpy as np 14 | 15 | from scipy.misc import imread 16 | 17 | class SimpleLoader(data.Dataset): 18 | 19 | def __init__(self, color_dir, depth_dir, K): 20 | """ 21 | :param the directory of color images 22 | :param the directory of depth images 23 | :param the intrinsic parameter [fx, fy, cx, cy] 24 | """ 25 | 26 | print('This simple loader is designed for TUM. \n\ 27 | The depth scale may be different in your depth format. 
') 28 | 29 | color_files = sorted(os.listdir(color_dir)) 30 | depth_files = sorted(os.listdir(depth_dir)) 31 | 32 | # please ensure the two folders use the same number of synchronized files 33 | assert(len(color_files) == len(depth_files)) 34 | 35 | self.color_pairs = [] 36 | self.depth_pairs = [] 37 | self.ids = len(color_files) - 1 38 | for idx in range(self.ids): 39 | self.color_pairs.append([ 40 | osp.join(color_dir, color_files[idx]), 41 | osp.join(color_dir, color_files[idx+1]) 42 | ] ) 43 | self.depth_pairs.append([ 44 | osp.join(depth_dir, depth_files[idx]), 45 | osp.join(depth_dir, depth_files[idx+1]) 46 | ] ) 47 | 48 | self.K = K 49 | 50 | def __getitem__(self, index): 51 | 52 | image0_path, image1_path = self.color_pairs[index] 53 | depth0_path, depth1_path = self.depth_pairs[index] 54 | 55 | image0 = self.__load_rgb_tensor(image0_path) 56 | image1 = self.__load_rgb_tensor(image1_path) 57 | 58 | depth0 = self.__load_depth_tensor(depth0_path) 59 | depth1 = self.__load_depth_tensor(depth1_path) 60 | 61 | calib = np.asarray(self.K, dtype=np.float32) 62 | 63 | return image0, image1, depth0, depth1, calib 64 | 65 | def __len__(self): 66 | return self.ids 67 | 68 | def __load_rgb_tensor(self, path): 69 | image = imread(path) 70 | image = image.astype(np.float32) / 255.0 71 | image = np.transpose(image, (2,0,1)) 72 | return image.astype(np.float32) 73 | 74 | def __load_depth_tensor(self, path): 75 | assert(path.endswith('.png')) 76 | depth = imread(path).astype(np.float32) / 5e3 77 | depth = np.clip(depth, a_min=0.5, a_max=5.0) 78 | 79 | return depth[np.newaxis, :] -------------------------------------------------------------------------------- /code/data/TUM_RGBD.py: -------------------------------------------------------------------------------- 1 | """ 2 | Data loader for TUM RGBD benchmark 3 | 4 | @author: Zhaoyang Lv 5 | @date: March 2019 6 | """ 7 | 8 | from __future__ import absolute_import 9 | from __future__ import division 10 | from __future__ import print_function 11 | from __future__ import unicode_literals 12 | 13 | import sys, os, random 14 | import pickle 15 | 16 | import numpy as np 17 | import os.path as osp 18 | import torch.utils.data as data 19 | 20 | from scipy.misc import imread 21 | from tqdm import tqdm 22 | from transforms3d import quaternions 23 | from cv2 import resize, INTER_NEAREST 24 | 25 | """ 26 | The following scripts use the directory structure as: 27 | 28 | root 29 | ---- fr1 30 | -------- rgbd_dataset_freiburg1_360 31 | -------- rgbd_dataset_freiburg1_desk 32 | -------- rgbd_dataset_freiburg1_desk2 33 | -------- ... 34 | ---- fr2 35 | -------- rgbd_dataset_freiburg2_360_hemisphere 36 | -------- rgbd_dataset_freiburg2_360_kidnap 37 | -------- rgbd_dataset_freiburg2_coke 38 | -------- .... 39 | ---- fr3 40 | -------- rgbd_dataset_freiburg3_cabinet 41 | -------- rgbd_dataset_freiburg3_long_office_household 42 | -------- rgbd_dataset_freiburg3_nostructure_notexture_far 43 | -------- .... 
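
Here 'root' refers to the TUM data directory, i.e. the code/data/data_tum
symbolic link created during the setup described in the README.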
44 | """ 45 | 46 | def tum_trainval_dict(): 47 | """ the sequence dictionary of TUM dataset 48 | https://vision.in.tum.de/data/datasets/rgbd-dataset/download 49 | 50 | The calibration parameters refers to: 51 | https://vision.in.tum.de/data/datasets/rgbd-dataset/file_formats 52 | """ 53 | return { 54 | 'fr1': { 55 | 'calib': [525.0, 525.0, 319.5, 239.5], 56 | 'seq': ['rgbd_dataset_freiburg1_desk2', 57 | 'rgbd_dataset_freiburg1_floor', 58 | 'rgbd_dataset_freiburg1_room', 59 | 'rgbd_dataset_freiburg1_xyz', 60 | 'rgbd_dataset_freiburg1_rpy', 61 | 'rgbd_dataset_freiburg1_plant', 62 | 'rgbd_dataset_freiburg1_teddy'] 63 | }, 64 | 65 | 'fr2': { 66 | 'calib': [525.0, 525.0, 319.5, 239.5], 67 | 'seq': ['rgbd_dataset_freiburg2_360_hemisphere', 68 | 'rgbd_dataset_freiburg2_large_no_loop', 69 | 'rgbd_dataset_freiburg2_large_with_loop', 70 | 'rgbd_dataset_freiburg2_pioneer_slam', 71 | 'rgbd_dataset_freiburg2_pioneer_slam2', 72 | 'rgbd_dataset_freiburg2_pioneer_slam3', 73 | 'rgbd_dataset_freiburg2_xyz', 74 | 'rgbd_dataset_freiburg2_360_kidnap', 75 | 'rgbd_dataset_freiburg2_rpy', 76 | 'rgbd_dataset_freiburg2_coke', 77 | 'rgbd_dataset_freiburg2_desk_with_person', 78 | 'rgbd_dataset_freiburg2_dishes', 79 | 'rgbd_dataset_freiburg2_flowerbouquet_brownbackground', 80 | 'rgbd_dataset_freiburg2_metallic_sphere2', 81 | 'rgbd_dataset_freiburg2_flowerbouquet' 82 | ] 83 | }, 84 | 85 | 'fr3': { 86 | 'calib': [525.0, 525.0, 319.5, 239.5], 87 | 'seq': [ 88 | 'rgbd_dataset_freiburg3_walking_halfsphere', 89 | 'rgbd_dataset_freiburg3_walking_rpy', 90 | 'rgbd_dataset_freiburg3_cabinet', 91 | 'rgbd_dataset_freiburg3_nostructure_notexture_far', 92 | 'rgbd_dataset_freiburg3_nostructure_notexture_near_withloop', 93 | 'rgbd_dataset_freiburg3_nostructure_texture_far', 94 | 'rgbd_dataset_freiburg3_nostructure_texture_near_withloop', 95 | 'rgbd_dataset_freiburg3_sitting_rpy', 96 | 'rgbd_dataset_freiburg3_sitting_static', 97 | 'rgbd_dataset_freiburg3_sitting_xyz', 98 | 'rgbd_dataset_freiburg3_structure_notexture_near', 99 | 'rgbd_dataset_freiburg3_structure_texture_far', 100 | 'rgbd_dataset_freiburg3_structure_texture_near', 101 | 'rgbd_dataset_freiburg3_teddy'] 102 | } 103 | } 104 | 105 | def tum_test_dict(): 106 | """ the trajectorys held out for testing TUM dataset 107 | """ 108 | return { 109 | 'fr1': { 110 | 'calib': [525.0, 525.0, 319.5, 239.5], 111 | 'seq': ['rgbd_dataset_freiburg1_360', 112 | 'rgbd_dataset_freiburg1_desk'] 113 | }, 114 | 115 | 'fr2': { 116 | 'calib': [525.0, 525.0, 319.5, 239.5], 117 | 'seq': ['rgbd_dataset_freiburg2_desk', 118 | 'rgbd_dataset_freiburg2_pioneer_360'] 119 | }, 120 | 121 | 'fr3': { 122 | 'calib': [525.0, 525.0, 319.5, 239.5], 123 | 'seq': ['rgbd_dataset_freiburg3_walking_static', # dynamic scene 124 | 'rgbd_dataset_freiburg3_walking_xyz', # dynamic scene 125 | 'rgbd_dataset_freiburg3_long_office_household'] 126 | } 127 | } 128 | 129 | class TUM(data.Dataset): 130 | 131 | base = 'data' 132 | 133 | def __init__(self, root = '', category='train', 134 | keyframes=[1], data_transform=None, select_traj=None): 135 | """ 136 | :param the root directory of the data 137 | :param select the category (train, validation,test) 138 | :param select the number of keyframes 139 | Test data only support one keyfame at one time 140 | Train/validation data support mixing different keyframes 141 | :param select one particular trajectory at runtime 142 | Only support for testing 143 | """ 144 | super(TUM, self).__init__() 145 | 146 | self.image_seq = [] 147 | self.depth_seq = [] 148 | self.invalid_seq = [] 149 | 
self.cam_pose_seq= [] 150 | self.calib = [] 151 | self.seq_names = [] 152 | 153 | self.ids = 0 154 | self.seq_acc_ids = [0] 155 | self.keyframes = keyframes 156 | 157 | self.transforms = data_transform 158 | 159 | if category == 'test': 160 | self.__load_test(root+'/data_tum', select_traj) 161 | else: # train and validation 162 | self.__load_train_val(root+'/data_tum', category) 163 | 164 | # downscale the input image to a quarter 165 | self.fx_s = 0.25 166 | self.fy_s = 0.25 167 | 168 | print('TUM dataloader for {:} using keyframe {:}: \ 169 | {:} valid frames'.format(category, keyframes, self.ids)) 170 | 171 | def __load_train_val(self, root, category): 172 | 173 | tum_data = tum_trainval_dict() 174 | 175 | for ks, scene in tum_data.items(): 176 | for seq_name in scene['seq']: 177 | 178 | seq_path = osp.join(ks, seq_name) 179 | self.calib.append(scene['calib']) 180 | # synchronized trajectory file 181 | sync_traj_file = osp.join(root, seq_path, 'sync_trajectory.pkl') 182 | if not osp.isfile(sync_traj_file): 183 | print("The synchronized trajectory file {:} has not been generated.".format(seq_path)) 184 | print("Generate it now...") 185 | write_sync_trajectory(root, ks, seq_name) 186 | 187 | with open(sync_traj_file, 'rb') as p: 188 | trainval = pickle.load(p) 189 | total_num = len(trainval) 190 | # the ratio to split the train & validation set 191 | if category == 'train': 192 | start_idx, end_idx = 0, int(0.95*total_num) 193 | else: 194 | start_idx, end_idx = int(0.95*total_num), total_num 195 | 196 | images = [trainval[idx][1] for idx in range(start_idx, end_idx)] 197 | depths = [trainval[idx][2] for idx in range(start_idx, end_idx)] 198 | extrin = [tq2mat(trainval[idx][0]) for idx in range(start_idx, end_idx)] 199 | self.image_seq.append(images) 200 | self.depth_seq.append(depths) 201 | self.cam_pose_seq.append(extrin) 202 | self.seq_names.append(seq_path) 203 | self.ids += max(0, len(images) - max(self.keyframes)) 204 | self.seq_acc_ids.append(self.ids) 205 | 206 | def __load_test(self, root, select_traj=None): 207 | """ Note: 208 | The test trajectory is loaded slightly different from the train/validation trajectory. 209 | We only select keyframes from the entire trajectory, rather than use every individual frame. 210 | For a given trajectory of length N, using key-frame 2, the train/validation set will use 211 | [[1, 3], [2, 4], [3, 5],...[N-1, N]], 212 | while test set will use pair 213 | [[1, 3], [3, 5], [5, 7],...[N-1, N]] 214 | This difference result in a change in the trajectory length when using different keyframes. 215 | 216 | The benefit of sampling keyframes of the test set is that the output is a more reasonable trajectory; 217 | And in training/validation, we fully leverage every pair of image. 
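        For example, a trajectory with 800 synchronized frames evaluated at keyframe 4 is
        subsampled to roughly 200 keyframes, i.e. about 199 consecutive test pairs.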
218 | """ 219 | 220 | tum_data = tum_test_dict() 221 | 222 | assert(len(self.keyframes) == 1) 223 | kf = self.keyframes[0] 224 | self.keyframes = [1] 225 | 226 | for ks, scene in tum_data.items(): 227 | for seq_name in scene['seq']: 228 | seq_path = osp.join(ks, seq_name) 229 | 230 | if select_traj is not None: 231 | if seq_path != select_traj: continue 232 | 233 | self.calib.append(scene['calib']) 234 | # synchronized trajectory file 235 | sync_traj_file = osp.join(root, seq_path, 'sync_trajectory.pkl') 236 | if not osp.isfile(sync_traj_file): 237 | print("The synchronized trajectory file {:} has not been generated.".format(seq_path)) 238 | print("Generate it now...") 239 | write_sync_trajectory(root, ks, seq_name) 240 | 241 | with open(sync_traj_file, 'rb') as p: 242 | frames = pickle.load(p) 243 | total_num = len(frames) 244 | 245 | images = [frames[idx][1] for idx in range(0, total_num, kf)] 246 | depths = [frames[idx][2] for idx in range(0, total_num, kf)] 247 | extrin = [tq2mat(frames[idx][0]) for idx in range(0, total_num, kf)] 248 | self.image_seq.append(images) 249 | self.depth_seq.append(depths) 250 | self.cam_pose_seq.append(extrin) 251 | self.seq_names.append(seq_path) 252 | self.ids += max(0, len(images)-1) 253 | self.seq_acc_ids.append(self.ids) 254 | 255 | if len(self.image_seq) == 0: 256 | raise Exception("The specified trajectory is not in the test set.") 257 | 258 | def __getitem__(self, index): 259 | seq_idx = max(np.searchsorted(self.seq_acc_ids, index+1) - 1, 0) 260 | frame_idx = index - self.seq_acc_ids[seq_idx] 261 | 262 | this_idx = frame_idx 263 | next_idx = frame_idx + random.choice(self.keyframes) 264 | 265 | color0 = self.__load_rgb_tensor(self.image_seq[seq_idx][this_idx]) 266 | color1 = self.__load_rgb_tensor(self.image_seq[seq_idx][next_idx]) 267 | 268 | depth0 = self.__load_depth_tensor(self.depth_seq[seq_idx][this_idx]) 269 | depth1 = self.__load_depth_tensor(self.depth_seq[seq_idx][next_idx]) 270 | 271 | if self.transforms: 272 | color0, color1 = self.transforms([color0, color1]) 273 | 274 | # normalize the coordinate 275 | calib = np.asarray(self.calib[seq_idx], dtype=np.float32) 276 | calib[0] *= self.fx_s 277 | calib[1] *= self.fy_s 278 | calib[2] *= self.fx_s 279 | calib[3] *= self.fy_s 280 | 281 | cam_pose0 = self.cam_pose_seq[seq_idx][this_idx] 282 | cam_pose1 = self.cam_pose_seq[seq_idx][next_idx] 283 | transform = np.dot(np.linalg.inv(cam_pose1), cam_pose0).astype(np.float32) 284 | 285 | name = '{:}_{:06d}to{:06d}'.format(self.seq_names[seq_idx], 286 | this_idx, next_idx) 287 | 288 | return color0, color1, depth0, depth1, transform, calib, name 289 | 290 | def __len__(self): 291 | return self.ids 292 | 293 | def __load_rgb_tensor(self, path): 294 | """ Load the rgb image 295 | """ 296 | image = imread(path)[:, :, :3] 297 | image = image.astype(np.float32) / 255.0 298 | image = resize(image, None, fx=self.fx_s, fy=self.fy_s) 299 | return image 300 | 301 | def __load_depth_tensor(self, path): 302 | """ Load the depth: 303 | The depth images are scaled by a factor of 5000, i.e., a pixel 304 | value of 5000 in the depth image corresponds to a distance of 305 | 1 meter from the camera, 10000 to 2 meter distance, etc. 306 | A pixel value of 0 means missing value/no data. 
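        For example, a raw pixel value of 7500 corresponds to 7500 / 5000 = 1.5 meters;
        the converted depth is then clipped to the [0.5, 5.0] meter range below.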
307 | """ 308 | depth = imread(path).astype(np.float32) / 5e3 309 | depth = resize(depth, None, fx=self.fx_s, fy=self.fy_s, interpolation=INTER_NEAREST) 310 | depth = np.clip(depth, a_min=0.5, a_max=5.0) # the accurate range of kinect depth 311 | return depth[np.newaxis, :] 312 | 313 | """ 314 | Some utility files to work with the data 315 | """ 316 | 317 | def tq2mat(tq): 318 | """ transform translation-quaternion (tq) to (4x4) matrix 319 | """ 320 | tq = np.array(tq) 321 | T = np.eye(4) 322 | T[:3,:3] = quaternions.quat2mat(np.roll(tq[3:], 1)) 323 | T[:3, 3] = tq[:3] 324 | return T 325 | 326 | def write_sync_trajectory(local_dir, dataset, subject_name): 327 | """ 328 | :param the root of the directory 329 | :param the dataset category 'fr1', 'fr2' or 'fr3' 330 | """ 331 | rgb_file = osp.join(local_dir, dataset, subject_name, 'rgb.txt') 332 | depth_file= osp.join(local_dir, dataset, subject_name, 'depth.txt') 333 | pose_file = osp.join(local_dir, dataset, subject_name, 'groundtruth.txt') 334 | 335 | rgb_list = read_file_list(rgb_file) 336 | depth_list=read_file_list(depth_file) 337 | pose_list = read_file_list(pose_file) 338 | 339 | matches = associate_three(rgb_list, depth_list, pose_list, offset=0.0, max_difference=0.02) 340 | 341 | trajectory_info = [] 342 | for (a,b,c) in matches: 343 | pose = [float(x) for x in pose_list[c]] 344 | rgb_file = osp.join(local_dir, dataset, subject_name, rgb_list[a][0]) 345 | depth_file = osp.join(local_dir, dataset, subject_name, depth_list[b][0]) 346 | trajectory_info.append([pose, rgb_file, depth_file]) 347 | 348 | dataset_path = osp.join(local_dir, dataset, subject_name, 'sync_trajectory.pkl') 349 | 350 | with open(dataset_path, 'wb') as output: 351 | pickle.dump(trajectory_info, output) 352 | 353 | txt_path = osp.join(local_dir, dataset, subject_name, 'sync_trajectory.txt') 354 | pickle2txts(dataset_path, txt_path) 355 | 356 | def pickle2txts(pickle_file, txt_file): 357 | ''' 358 | write the pickle_file into a txt_file 359 | ''' 360 | with open(pickle_file, 'rb') as pkl_file: 361 | traj = pickle.load(pkl_file) 362 | 363 | with open(txt_file, 'w') as f: 364 | for frame in traj: 365 | f.write(' '.join(['%f ' % x for x in frame[0]])) 366 | f.write(frame[1] + ' ') 367 | f.write(frame[2] + '\n') 368 | 369 | """ 370 | The following utility files are provided by TUM RGBD dataset benchmark 371 | 372 | Refer: https://vision.in.tum.de/data/datasets/rgbd-dataset/tools 373 | """ 374 | 375 | def read_file_list(filename): 376 | """ 377 | Reads a trajectory from a text file. 378 | 379 | File format: 380 | The file format is "stamp d1 d2 d3 ...", where stamp denotes the time stamp (to be matched) 381 | and "d1 d2 d3.." is arbitary data (e.g., a 3D position and 3D orientation) associated to this timestamp. 382 | 383 | Input: 384 | filename -- File name 385 | 386 | Output: 387 | dict -- dictionary of (stamp,data) tuples 388 | 389 | """ 390 | file = open(filename) 391 | data = file.read() 392 | lines = data.replace(","," ").replace("\t"," ").split("\n") 393 | list = [[v.strip() for v in line.split(" ") if v.strip()!=""] for line in lines if len(line)>0 and line[0]!="#"] 394 | list = [(float(l[0]),l[1:]) for l in list if len(l)>1] 395 | return dict(list) 396 | 397 | def associate(first_list, second_list,offset,max_difference): 398 | """ 399 | Associate two dictionaries of (stamp,data). As the time stamps never match exactly, we aim 400 | to find the closest match for every input tuple. 
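    For example, with max_difference = 0.02 s (the value used by write_sync_trajectory above),
    the example color frame 1305031790.645155 and depth frame 1305031790.640468 shipped under
    code/data/data_examples/TUM differ by less than 5 ms and would therefore be associated.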
401 | 402 | Input: 403 | first_list -- first dictionary of (stamp,data) tuples 404 | second_list -- second dictionary of (stamp,data) tuples 405 | offset -- time offset between both dictionaries (e.g., to model the delay between the sensors) 406 | max_difference -- search radius for candidate generation 407 | 408 | Output: 409 | matches -- list of matched tuples ((stamp1,data1),(stamp2,data2)) 410 | 411 | """ 412 | first_keys = first_list.keys() 413 | second_keys = second_list.keys() 414 | potential_matches = [(abs(a - (b + offset)), a, b) 415 | for a in first_keys 416 | for b in second_keys 417 | if abs(a - (b + offset)) < max_difference] 418 | potential_matches.sort() 419 | matches = [] 420 | for diff, a, b in potential_matches: 421 | if a in first_keys and b in second_keys: 422 | first_keys.remove(a) 423 | second_keys.remove(b) 424 | matches.append((a, b)) 425 | 426 | matches.sort() 427 | return matches 428 | 429 | def associate_three(first_list, second_list, third_list, offset, max_difference): 430 | """ 431 | Associate two dictionaries of (stamp,data). As the time stamps never match exactly, we aim to find the closest match for every input tuple. 432 | 433 | Input: 434 | first_list -- first dictionary of (stamp,data) tuples (default to be rgb) 435 | second_list -- second dictionary of (stamp,data) tuples (default to be depth) 436 | third_list -- third dictionary of (stamp,data) tuples (default to be pose) 437 | offset -- time offset between both dictionaries (e.g., to model the delay between the sensors) 438 | max_difference -- search radius for candidate generation 439 | 440 | Output: 441 | matches -- list of matched tuples ((stamp1,data1),(stamp2,data2),(stamp3,data3)) 442 | """ 443 | 444 | first_keys = list(first_list) 445 | second_keys = list(second_list) 446 | third_keys = list(third_list) 447 | # find the potential matches in (rgb, depth) 448 | potential_matches_ab = [(abs(a - (b + offset)), a, b) 449 | for a in first_keys 450 | for b in second_keys 451 | if abs(a - (b + offset)) < max_difference] 452 | potential_matches_ab.sort() 453 | matches_ab = [] 454 | for diff, a, b in potential_matches_ab: 455 | if a in first_keys and b in second_keys: 456 | matches_ab.append((a, b)) 457 | 458 | matches_ab.sort() 459 | 460 | # find the potential matches in (rgb, depth, pose) 461 | potential_matches = [(abs(a - (c + offset)), abs(b - (c + offset)), a,b,c) 462 | for (a,b) in matches_ab 463 | for c in third_keys 464 | if abs(a - (c + offset)) < max_difference and 465 | abs(b - (c + offset)) < max_difference] 466 | 467 | potential_matches.sort() 468 | matches_abc = [] 469 | for diff_rgb, diff_depth, a, b, c in potential_matches: 470 | if a in first_keys and b in second_keys and c in third_keys: 471 | first_keys.remove(a) 472 | second_keys.remove(b) 473 | third_keys.remove(c) 474 | matches_abc.append((a,b,c)) 475 | matches_abc.sort() 476 | return matches_abc 477 | 478 | if __name__ == '__main__': 479 | 480 | loader = TUM(category='test', keyframes=[1]) 481 | 482 | import torchvision.utils as torch_utils 483 | 484 | torch_loader = data.DataLoader(loader, batch_size=16, 485 | shuffle=False, num_workers=4) 486 | 487 | for batch in torch_loader: 488 | color0, color1, depth0, depth1, transform, K, name = batch 489 | B, C, H, W = color0.shape 490 | 491 | bcolor0_img = torch_utils.make_grid(color0, nrow=4) 492 | 493 | import matplotlib.pyplot as plt 494 | plt.figure() 495 | plt.imshow(bcolor0_img.numpy().transpose(1,2,0)) 496 | plt.show() 
-------------------------------------------------------------------------------- /code/data/data_examples/TUM/color/1305031790.645155.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/lvzhaoyang/DeeperInverseCompositionalAlgorithm/9a401bf2b03ecfed169c1a655b4fe8be8a4c211d/code/data/data_examples/TUM/color/1305031790.645155.png -------------------------------------------------------------------------------- /code/data/data_examples/TUM/color/1305031790.713097.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/lvzhaoyang/DeeperInverseCompositionalAlgorithm/9a401bf2b03ecfed169c1a655b4fe8be8a4c211d/code/data/data_examples/TUM/color/1305031790.713097.png -------------------------------------------------------------------------------- /code/data/data_examples/TUM/color/1305031790.781258.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/lvzhaoyang/DeeperInverseCompositionalAlgorithm/9a401bf2b03ecfed169c1a655b4fe8be8a4c211d/code/data/data_examples/TUM/color/1305031790.781258.png -------------------------------------------------------------------------------- /code/data/data_examples/TUM/color/1305031790.845151.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/lvzhaoyang/DeeperInverseCompositionalAlgorithm/9a401bf2b03ecfed169c1a655b4fe8be8a4c211d/code/data/data_examples/TUM/color/1305031790.845151.png -------------------------------------------------------------------------------- /code/data/data_examples/TUM/color/1305031790.913129.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/lvzhaoyang/DeeperInverseCompositionalAlgorithm/9a401bf2b03ecfed169c1a655b4fe8be8a4c211d/code/data/data_examples/TUM/color/1305031790.913129.png -------------------------------------------------------------------------------- /code/data/data_examples/TUM/depth/1305031790.640468.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/lvzhaoyang/DeeperInverseCompositionalAlgorithm/9a401bf2b03ecfed169c1a655b4fe8be8a4c211d/code/data/data_examples/TUM/depth/1305031790.640468.png -------------------------------------------------------------------------------- /code/data/data_examples/TUM/depth/1305031790.709421.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/lvzhaoyang/DeeperInverseCompositionalAlgorithm/9a401bf2b03ecfed169c1a655b4fe8be8a4c211d/code/data/data_examples/TUM/depth/1305031790.709421.png -------------------------------------------------------------------------------- /code/data/data_examples/TUM/depth/1305031790.773548.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/lvzhaoyang/DeeperInverseCompositionalAlgorithm/9a401bf2b03ecfed169c1a655b4fe8be8a4c211d/code/data/data_examples/TUM/depth/1305031790.773548.png -------------------------------------------------------------------------------- /code/data/data_examples/TUM/depth/1305031790.839363.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/lvzhaoyang/DeeperInverseCompositionalAlgorithm/9a401bf2b03ecfed169c1a655b4fe8be8a4c211d/code/data/data_examples/TUM/depth/1305031790.839363.png -------------------------------------------------------------------------------- /code/data/data_examples/TUM/depth/1305031790.909436.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/lvzhaoyang/DeeperInverseCompositionalAlgorithm/9a401bf2b03ecfed169c1a655b4fe8be8a4c211d/code/data/data_examples/TUM/depth/1305031790.909436.png -------------------------------------------------------------------------------- /code/data/dataloader.py: -------------------------------------------------------------------------------- 1 | """ The dataloaders for training and evaluation 2 | 3 | @author: Zhaoyang Lv 4 | @date: March 2019 5 | """ 6 | 7 | from __future__ import absolute_import 8 | from __future__ import division 9 | from __future__ import print_function 10 | from __future__ import unicode_literals 11 | 12 | import torchvision.transforms as transforms 13 | import numpy as np 14 | 15 | def load_data(dataset_name, keyframes = None, load_type = 'train', 16 | select_trajectory = '', load_numpy = False): 17 | """ Use two frame camera pose data loader 18 | """ 19 | if select_trajectory == '': 20 | select_trajectory = None 21 | 22 | if not load_numpy: 23 | if load_type == 'train': 24 | data_transform = image_transforms(['color_augment', 'numpy2torch']) 25 | else: 26 | data_transform = image_transforms(['numpy2torch']) 27 | else: 28 | data_transform = image_transforms([]) 29 | 30 | if dataset_name == 'TUM_RGBD': 31 | from data.TUM_RGBD import TUM 32 | np_loader = TUM('data', load_type, keyframes, data_transform, select_trajectory) 33 | elif dataset_name == 'MovingObjects3D': 34 | from data.MovingObj3D import MovingObjects3D 35 | np_loader = MovingObjects3D('data', load_type, keyframes, data_transform) 36 | # elif dataset_name == 'BundleFusion': 37 | # from data.BundleFusion import BundleFusion 38 | # np_loader = BundleFusion(load_type, keyframes, data_transform) 39 | # elif dataset_name == 'Refresh': 40 | # from data.REFRESH import REFRESH 41 | # np_loader = REFRESH(load_type, keyframes) 42 | else: 43 | raise NotImplementedError() 44 | 45 | return np_loader 46 | 47 | def image_transforms(options): 48 | 49 | transform_list = [] 50 | 51 | if 'color_augment' in options: 52 | augment_parameters = [0.9, 1.1, 0.9, 1.1, 0.9, 1.1] 53 | transform_list.append(AugmentImages(augment_parameters)) 54 | 55 | if 'numpy2torch' in options: 56 | transform_list.append(ToTensor()) 57 | 58 | # if 'color_normalize' in options: # we do it on the fly 59 | # transform_list.append(ColorNormalize()) 60 | 61 | return transforms.Compose(transform_list) 62 | 63 | class ColorNormalize(object): 64 | 65 | def __init__(self): 66 | rgb_mean = (0.4914, 0.4822, 0.4465) 67 | rgb_std = (0.2023, 0.1994, 0.2010) 68 | self.transform = transforms.Normalize(mean=rgb_mean, std=rgb_std) 69 | 70 | def __call__(self, sample): 71 | return [self.transform(x) for x in sample] 72 | 73 | class ToTensor(object): 74 | def __init__(self): 75 | self.transform = transforms.ToTensor() 76 | 77 | def __call__(self, sample): 78 | return [self.transform(x) for x in sample] 79 | 80 | class AugmentImages(object): 81 | def __init__(self, augment_parameters): 82 | self.gamma_low = augment_parameters[0] # 0.9 83 | self.gamma_high = augment_parameters[1] # 1.1 84 | self.brightness_low = augment_parameters[2] # 0.9 85 | 
self.brightness_high = augment_parameters[3] # 1,1 86 | self.color_low = augment_parameters[4] # 0.9 87 | self.color_high = augment_parameters[5] # 1.1 88 | 89 | self.thresh = 0.5 90 | 91 | def __call__(self, sample): 92 | p = np.random.uniform(0, 1, 1) 93 | if p > self.thresh: 94 | random_gamma = np.random.uniform(self.gamma_low, self.gamma_high) 95 | random_brightness = np.random.uniform(self.brightness_low, self.brightness_high) 96 | random_colors = np.random.uniform(self.color_low, self.color_high, 3) 97 | for x in sample: 98 | x = x ** random_gamma # randomly shift gamma 99 | x = x * random_brightness # randomly shift brightness 100 | for i in range(3): # randomly shift color 101 | x[:, :, i] *= random_colors[i] 102 | x[:, :, i] *= random_colors[i] 103 | x = np.clip(x, a_min=0, a_max=1) # saturate 104 | return sample 105 | else: 106 | return sample 107 | -------------------------------------------------------------------------------- /code/evaluate.py: -------------------------------------------------------------------------------- 1 | """ 2 | Evaluation scripts 3 | 4 | @author: Zhaoyang Lv 5 | @date: March 2019 6 | """ 7 | 8 | import os, sys, argparse, pickle 9 | import os.path as osp 10 | import numpy as np 11 | import pandas as pd 12 | 13 | import torch 14 | import torch.utils.data as data 15 | import torchvision.utils as torch_utils 16 | import torch.nn as nn 17 | 18 | import models.LeastSquareTracking as ICtracking 19 | import models.criterions as criterions 20 | import train_utils 21 | import config 22 | 23 | from data.dataloader import load_data 24 | from timers import Timers 25 | from tqdm import tqdm 26 | 27 | def check_directory(filename): 28 | target_dir = os.path.dirname(filename) 29 | if not os.path.isdir(target_dir): 30 | os.makedirs(target_dir) 31 | 32 | def eval_trajectories(dataset): 33 | return { 34 | 'TUM_RGBD': ['fr1/rgbd_dataset_freiburg1_360', 35 | 'fr1/rgbd_dataset_freiburg1_desk', 36 | 'fr2/rgbd_dataset_freiburg2_desk', 37 | 'fr2/rgbd_dataset_freiburg2_pioneer_360'] 38 | }[dataset] 39 | 40 | def create_eval_loaders(options, eval_type, keyframes, 41 | total_batch_size = 8, 42 | trajectory = ''): 43 | """ create the evaluation loader at different keyframes set-up 44 | """ 45 | eval_loaders = {} 46 | 47 | if trajectory == '': 48 | trajectories = eval_trajectories(options.dataset) 49 | else: 50 | trajectories = [trajectory] 51 | 52 | for trajectory in trajectories: 53 | for kf in keyframes: 54 | np_loader = load_data(options.dataset, [kf], eval_type, trajectory) 55 | eval_loaders['{:}_keyframe_{:}'.format(trajectory, kf)] = data.DataLoader(np_loader, 56 | batch_size = int(total_batch_size), 57 | shuffle = False, num_workers = options.cpu_workers) 58 | 59 | return eval_loaders 60 | 61 | def evaluate_trust_region(dataloader, net, objectives, eval_name='', 62 | known_mask = False, timers = None): 63 | """ evaluate the trust-region method given the two-frame pose estimation 64 | :param the pytorch dataloader 65 | :param the network 66 | :param the evaluation objective names, e.g. 
RPE, EPE3D 67 | :param True if ground mask if known 68 | :param (optional) timing each step 69 | """ 70 | 71 | progress = tqdm(dataloader, ncols=100, 72 | desc = 'evaluate deeper inverse compositional algorithm {:}'.format(eval_name), 73 | total= len(dataloader)) 74 | 75 | net.eval() 76 | 77 | total_frames = len(dataloader.dataset) 78 | 79 | outputs = { 80 | 'R_est': np.zeros((total_frames, 3, 3)), 81 | 't_est': np.zeros((total_frames, 3)), 82 | 'names': [] 83 | } 84 | flow_loss, rpe_loss = None, None 85 | if 'EPE3D' in objectives: 86 | flow_loss = criterions.compute_RT_EPE_loss 87 | outputs['epes'] = np.zeros(total_frames) 88 | if 'RPE' in objectives: 89 | rpe_loss = criterions.compute_RPE_loss 90 | outputs['angular_error'] = np.zeros(total_frames) 91 | outputs['translation_error'] = np.zeros(total_frames) 92 | 93 | count_base = 0 94 | 95 | if timers: timers.tic('one iteration') 96 | 97 | count = 1 98 | for idx, batch in enumerate(progress): 99 | 100 | if timers: timers.tic('forward step') 101 | 102 | names = batch[-1] 103 | 104 | if known_mask: # for dataset that with mask or need mask 105 | color0, color1, depth0, depth1, Rt, K, obj_mask0, obj_mask1 = \ 106 | train_utils.check_cuda(batch[:8]) 107 | else: 108 | color0, color1, depth0, depth1, Rt, K = \ 109 | train_utils.check_cuda(batch[:6]) 110 | obj_mask0, obj_mask1 = None, None 111 | 112 | B, _, H, W = depth0.shape 113 | 114 | with torch.no_grad(): 115 | output = net.forward(color0, color1, depth0, depth1, K) 116 | R, t = output 117 | 118 | if timers: timers.toc('forward step') 119 | 120 | outputs['R_est'][count_base:count_base+B] = R.cpu().numpy() 121 | outputs['t_est'][count_base:count_base+B] = t.cpu().numpy() 122 | 123 | if timers: timers.tic('evaluate') 124 | R_gt, t_gt = Rt[:,:3,:3], Rt[:,:3,3] 125 | if rpe_loss: # evaluate the relative pose error 126 | angle_error, trans_error = rpe_loss(R, t, R_gt, t_gt) 127 | outputs['angular_error'][count_base:count_base+B] = angle_error.cpu().numpy() 128 | outputs['translation_error'][count_base:count_base+B] = trans_error.cpu().numpy() 129 | 130 | if flow_loss:# evaluate the end-point-error loss 3D 131 | invalid_mask = (depth0 == depth0.min()) + (depth0 == depth0.max()) 132 | if obj_mask0 is not None: invalid_mask = ~obj_mask0 + invalid_mask 133 | 134 | epes3d = flow_loss(R, t, R_gt, t_gt, depth0, K, invalid_mask) 135 | outputs['epes'][count_base:count_base+B] = epes3d.cpu().numpy() 136 | 137 | outputs['names'] += names 138 | 139 | count_base += B 140 | 141 | if timers: timers.toc('evaluate') 142 | if timers: timers.toc('one iteration') 143 | if timers: timers.tic('one iteration') 144 | 145 | # if timers: timers.print() 146 | 147 | return outputs 148 | 149 | def test_TrustRegion(options): 150 | 151 | if options.time: 152 | timers = Timers() 153 | else: 154 | timers = None 155 | 156 | print('Evaluate test performance with the (deep) direct method.') 157 | 158 | total_batch_size = options.batch_per_gpu * torch.cuda.device_count() 159 | 160 | keyframes = [int(x) for x in options.keyframes.split(',')] 161 | if options.dataset in ['BundleFusion', 'TUM_RGBD']: 162 | obj_has_mask = False 163 | else: 164 | obj_has_mask = True 165 | 166 | eval_loaders = create_eval_loaders(options, options.eval_set, 167 | keyframes, total_batch_size, options.trajectory) 168 | 169 | if options.checkpoint == '': 170 | print('No checkpoint loaded. 
Use the non-learning method') 171 | net = ICtracking.LeastSquareTracking( 172 | encoder_name = 'RGB', 173 | max_iter_per_pyr= options.max_iter_per_pyr, 174 | mEst_type = 'None', 175 | solver_type = 'Direct-Nodamping') 176 | if torch.cuda.is_available(): net.cuda() 177 | net.eval() 178 | else: 179 | train_utils.load_checkpoint_test(options) 180 | 181 | net = ICtracking.LeastSquareTracking( 182 | encoder_name = options.encoder_name, 183 | max_iter_per_pyr= options.max_iter_per_pyr, 184 | mEst_type = options.mestimator, 185 | solver_type = options.solver, 186 | no_weight_sharing = options.no_weight_sharing) 187 | 188 | if torch.cuda.is_available(): net.cuda() 189 | net.eval() 190 | 191 | # check whether it is a single checkpoint or a directory 192 | net.load_state_dict(torch.load(options.checkpoint)['state_dict']) 193 | 194 | eval_objectives = ['EPE3D', 'RPE'] 195 | 196 | output_prefix = '_'.join([ 197 | options.network, 198 | options.encoder_name, 199 | options.mestimator, 200 | options.solver, 201 | 'iter', str(options.max_iter_per_pyr) 202 | ]) 203 | 204 | # evaluate results per trajectory per key-frame 205 | outputs = {} 206 | for k, loader in eval_loaders.items(): 207 | 208 | traj_name, kf = k.split('_keyframe_') 209 | 210 | output_name = '{:}_{:}'.format(output_prefix, k) 211 | info = evaluate_trust_region(loader, net, 212 | eval_objectives, 213 | eval_name = 'tmp/'+output_name, 214 | known_mask=obj_has_mask) 215 | 216 | # collect results 217 | outputs[k] = pd.Series([info['epes'].mean(), 218 | info['angular_error'].mean(), 219 | info['translation_error'].mean(), 220 | info['epes'].shape[0], int(kf), traj_name], 221 | index=['3D EPE', 'axis error', 'trans error', 'total frames', 'keyframe', 'trajectory']) 222 | 223 | print(outputs[k]) 224 | 225 | checkpoint_name = options.checkpoint.replace('.pth.tar', '') 226 | if checkpoint_name == '': 227 | checkpoint_name = 'nolearning' 228 | output_dir = osp.join(options.eval_set+'_results', checkpoint_name, k) 229 | output_pkl = output_dir + '.pkl' 230 | 231 | check_directory(output_pkl) 232 | 233 | with open(output_pkl, 'wb') as output: # dump per-frame results info 234 | info = info 235 | pickle.dump(info, output) 236 | 237 | """ =============================================================== """ 238 | """ Generate the final evaluation results """ 239 | """ =============================================================== """ 240 | 241 | outputs_pd = pd.DataFrame(outputs).T 242 | outputs_pd['3D EPE'] *= 100 # convert to cm 243 | outputs_pd['axis error'] *= (180/np.pi) # convert to degree 244 | outputs_pd['trans error'] *= 100 # convert to cm 245 | 246 | print(outputs_pd) 247 | 248 | stats_dict = {} 249 | for kf in keyframes: 250 | kf_outputs = outputs_pd[outputs_pd['keyframe']==kf] 251 | 252 | stats_dict['mean values of trajectories keyframe {:}'.format(kf)] = pd.Series( 253 | [kf_outputs['3D EPE'].mean(), 254 | kf_outputs['axis error'].mean(), 255 | kf_outputs['trans error'].mean(), kf], 256 | index=['3D EPE', 'axis error', 'trans error', 'keyframe']) 257 | 258 | total_frames = kf_outputs['total frames'].sum() 259 | stats_dict['mean values of frames keyframe {:}'.format(kf)] = pd.Series( 260 | [(kf_outputs['3D EPE'] * kf_outputs['total frames']).sum() / total_frames, 261 | (kf_outputs['axis error'] * kf_outputs['total frames']).sum() / total_frames, 262 | (kf_outputs['trans error']* kf_outputs['total frames']).sum() / total_frames, kf], 263 | index=['3D EPE', 'axis error', 'trans error', 'keyframe']) 264 | 265 | stats_pd = 
pd.DataFrame(stats_dict).T 266 | print(stats_pd) 267 | 268 | final_pd = outputs_pd.append(stats_pd, sort=False) 269 | final_pd.to_csv('{:}.csv'.format(output_dir)) 270 | 271 | return outputs_pd 272 | 273 | if __name__ == '__main__': 274 | 275 | parser = argparse.ArgumentParser(description="Evaluate the network") 276 | config.add_basics_config(parser) 277 | config.add_test_basics_config(parser) 278 | config.add_tracking_config(parser) 279 | 280 | options = parser.parse_args() 281 | 282 | print('---------------------------------------') 283 | 284 | outputs = test_TrustRegion(options) 285 | 286 | -------------------------------------------------------------------------------- /code/models/LeastSquareTracking.py: -------------------------------------------------------------------------------- 1 | """ 2 | The learned Inverse Compositional Tracking. 3 | Support both ego-motion and object-motion tracking 4 | 5 | @author: Zhaoyang Lv 6 | @Date: March, 2019 7 | """ 8 | 9 | import torch 10 | import torch.nn as nn 11 | import numpy as np 12 | 13 | from models.submodules import convLayer as conv 14 | from models.submodules import color_normalize 15 | 16 | from models.algorithms import TrustRegionBase as TrustRegion 17 | from models.algorithms import ImagePyramids, DirectSolverNet, FeaturePyramid, DeepRobustEstimator 18 | 19 | class LeastSquareTracking(nn.Module): 20 | 21 | # all enum types 22 | NONE = -1 23 | RGB = 0 24 | 25 | CONV_RGBD = 1 26 | CONV_RGBD2 = 2 27 | 28 | def __init__(self, encoder_name, 29 | max_iter_per_pyr, 30 | mEst_type, 31 | solver_type, 32 | tr_samples = 10, 33 | no_weight_sharing = False, 34 | timers = None): 35 | """ 36 | :param the backbone network used for regression. 37 | :param the maximum number of iterations at each pyramid levels 38 | :param the type of weighting functions. 39 | :param the type of solver. 
40 | :param number of samples in trust-region solver 41 | :param True if we do not want to share weight at different pyramid levels 42 | :param (optional) time to benchmark time consumed at each step 43 | """ 44 | super(LeastSquareTracking, self).__init__() 45 | 46 | self.construct_image_pyramids = ImagePyramids([0,1,2,3], pool='avg') 47 | self.construct_depth_pyramids = ImagePyramids([0,1,2,3], pool='max') 48 | 49 | self.timers = timers 50 | 51 | """ =============================================================== """ 52 | """ Initialize the Deep Feature Extractor """ 53 | """ =============================================================== """ 54 | 55 | if encoder_name == 'RGB': 56 | print('The network will use raw image as measurements.') 57 | self.encoder = None 58 | self.encoder_type = self.RGB 59 | context_dim = 1 60 | elif encoder_name == 'ConvRGBD': 61 | print('Use a network with RGB-D information \ 62 | to extract the features') 63 | context_dim = 4 64 | self.encoder = FeaturePyramid(D=context_dim) 65 | self.encoder_type = self.CONV_RGBD 66 | elif encoder_name == 'ConvRGBD2': 67 | print('Use two stream network with two frame input') 68 | context_dim = 8 69 | self.encoder = FeaturePyramid(D=context_dim) 70 | self.encoder_type = self.CONV_RGBD2 71 | else: 72 | raise NotImplementedError() 73 | 74 | """ =============================================================== """ 75 | """ Initialize the Robust Estimator """ 76 | """ =============================================================== """ 77 | 78 | if no_weight_sharing: 79 | self.mEst_func0 = DeepRobustEstimator(mEst_type) 80 | self.mEst_func1 = DeepRobustEstimator(mEst_type) 81 | self.mEst_func2 = DeepRobustEstimator(mEst_type) 82 | self.mEst_func3 = DeepRobustEstimator(mEst_type) 83 | mEst_funcs = [self.mEst_func0, self.mEst_func1, self.mEst_func2, 84 | self.mEst_func3] 85 | else: 86 | self.mEst_func = DeepRobustEstimator(mEst_type) 87 | mEst_funcs = [self.mEst_func, self.mEst_func, self.mEst_func, 88 | self.mEst_func] 89 | 90 | """ =============================================================== """ 91 | """ Initialize the Trust-Region Damping """ 92 | """ =============================================================== """ 93 | 94 | if no_weight_sharing: 95 | # for residual volume, the input K is not assigned correctly 96 | self.solver_func0 = DirectSolverNet(solver_type, samples=tr_samples) 97 | self.solver_func1 = DirectSolverNet(solver_type, samples=tr_samples) 98 | self.solver_func2 = DirectSolverNet(solver_type, samples=tr_samples) 99 | self.solver_func3 = DirectSolverNet(solver_type, samples=tr_samples) 100 | solver_funcs = [self.solver_func0, self.solver_func1, 101 | self.solver_func2, self.solver_func3] 102 | else: 103 | self.solver_func = DirectSolverNet(solver_type, samples=tr_samples) 104 | solver_funcs = [self.solver_func, self.solver_func, 105 | self.solver_func, self.solver_func] 106 | 107 | """ =============================================================== """ 108 | """ Initialize the Trust-Region Method """ 109 | """ =============================================================== """ 110 | 111 | self.tr_update0 = TrustRegion(max_iter_per_pyr, 112 | mEst_func = mEst_funcs[0], 113 | solver_func = solver_funcs[0], 114 | timers = timers) 115 | self.tr_update1 = TrustRegion(max_iter_per_pyr, 116 | mEst_func = mEst_funcs[1], 117 | solver_func = solver_funcs[1], 118 | timers = timers) 119 | self.tr_update2 = TrustRegion(max_iter_per_pyr, 120 | mEst_func = mEst_funcs[2], 121 | solver_func = solver_funcs[2], 122 | timers = 
timers) 123 | self.tr_update3 = TrustRegion(max_iter_per_pyr, 124 | mEst_func = mEst_funcs[3], 125 | solver_func = solver_funcs[3], 126 | timers = timers) 127 | 128 | def forward(self, img0, img1, depth0, depth1, K, init_only=False): 129 | """ 130 | :input 131 | :param the reference image 132 | :param the target image 133 | :param the inverse depth of the reference image 134 | :param the inverse depth of the target image 135 | :param the pin-hole camera instrinsic (in vector) [fx, fy, cx, cy] 136 | :param the initial pose [Rotation, translation] 137 | -------- 138 | :return 139 | :param estimated transform 140 | """ 141 | if self.timers: self.timers.tic('extract features') 142 | 143 | # pre-processing all the data, all the invalid inputs depth are set to 0 144 | invD0 = torch.clamp(1.0 / depth0, 0, 10) 145 | invD1 = torch.clamp(1.0 / depth1, 0, 10) 146 | invD0[invD0 == invD0.min()] = 0 147 | invD1[invD1 == invD1.min()] = 0 148 | invD0[invD0 == invD0.max()] = 0 149 | invD1[invD1 == invD1.max()] = 0 150 | 151 | I0 = color_normalize(img0) 152 | I1 = color_normalize(img1) 153 | 154 | x0 = self.__encode_features(I0, invD0, I1, invD1) 155 | x1 = self.__encode_features(I1, invD1, I0, invD0) 156 | d0 = self.construct_depth_pyramids(invD0) 157 | d1 = self.construct_depth_pyramids(invD1) 158 | 159 | if self.timers: self.timers.toc('extract features') 160 | 161 | poses_to_train = [[],[]] # '[rotation, translation]' 162 | B = invD0.shape[0] 163 | R0 = torch.eye(3,dtype=torch.float).expand(B,3,3).type_as(I0) 164 | t0 = torch.zeros(B,3,1,dtype=torch.float).type_as(I0) 165 | poseI = [R0, t0] 166 | 167 | # the prior of the mask 168 | prior_W = torch.ones(d0[3].shape).type_as(d0[3]) 169 | 170 | if self.timers: self.timers.tic('trust-region update') 171 | # trust region update on level 3 172 | K3 = K >> 3 173 | output3 = self.tr_update3(poseI, x0[3], x1[3], d0[3], d1[3], K3, prior_W) 174 | pose3, mEst_W3 = output3[0], output3[1] 175 | poses_to_train[0].append(pose3[0]) 176 | poses_to_train[1].append(pose3[1]) 177 | # trust region update on level 2 178 | K2 = K >> 2 179 | output2 = self.tr_update2(pose3, x0[2], x1[2], d0[2], d1[2], K2, mEst_W3) 180 | pose2, mEst_W2 = output2[0], output2[1] 181 | poses_to_train[0].append(pose2[0]) 182 | poses_to_train[1].append(pose2[1]) 183 | # trust region update on level 1 184 | K1 = K >> 1 185 | output1 = self.tr_update1(pose2, x0[1], x1[1], d0[1], d1[1], K1, mEst_W2) 186 | pose1, mEst_W1 = output1[0], output1[1] 187 | poses_to_train[0].append(pose1[0]) 188 | poses_to_train[1].append(pose1[1]) 189 | # trust-region update on the raw scale 190 | output0 = self.tr_update0(pose1, x0[0], x1[0], d0[0], d1[0], K, mEst_W1) 191 | pose0 = output0[0] 192 | poses_to_train[0].append(pose0[0]) 193 | poses_to_train[1].append(pose0[1]) 194 | if self.timers: self.timers.toc('trust-region update') 195 | 196 | if self.training: 197 | pyr_R = torch.stack(tuple(poses_to_train[0]), dim=1) 198 | pyr_t = torch.stack(tuple(poses_to_train[1]), dim=1) 199 | return pyr_R, pyr_t 200 | else: 201 | return pose0 202 | 203 | def __encode_features(self, img0, invD0, img1, invD1): 204 | """ get the encoded features 205 | """ 206 | if self.encoder_type == self.RGB: 207 | # In the RGB case, we will only use the intensity image 208 | I = self.__color3to1(img0) 209 | x = self.construct_image_pyramids(I) 210 | elif self.encoder_type == self.CONV_RGBD: 211 | m = torch.cat((img0, invD0), dim=1) 212 | x = self.encoder.forward(m) 213 | elif self.encoder_type in [self.CONV_RGBD2]: 214 | m = torch.cat((img0, invD0, 
img1, invD1), dim=1) 215 | x = self.encoder.forward(m) 216 | else: 217 | raise NotImplementedError() 218 | 219 | x = [self.__Nto1(a) for a in x] 220 | 221 | return x 222 | 223 | def __Nto1(self, x): 224 | """ Take the average of multi-dimension feature into one dimensional, 225 | which boostrap the optimization speed 226 | """ 227 | C = x.shape[1] 228 | return x.sum(dim=1, keepdim=True) / C 229 | 230 | def __color3to1(self, img): 231 | """ Return a gray-scale image 232 | """ 233 | B, _, H, W = img.shape 234 | return (img[:,0] * 0.299 + img[:, 1] * 0.587 + img[:, 2] * 0.114).view(B,1,H,W) 235 | -------------------------------------------------------------------------------- /code/models/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/lvzhaoyang/DeeperInverseCompositionalAlgorithm/9a401bf2b03ecfed169c1a655b4fe8be8a4c211d/code/models/__init__.py -------------------------------------------------------------------------------- /code/models/algorithms.py: -------------------------------------------------------------------------------- 1 | """ 2 | The algorithm backbone, primarily the three contributions proposed in our paper 3 | 4 | @author: Zhaoyang Lv 5 | @date: March, 2019 6 | """ 7 | 8 | import torch 9 | import torch.nn as nn 10 | import torchvision.models as models 11 | import torch.nn.functional as func 12 | 13 | import models.geometry as geometry 14 | from models.submodules import convLayer as conv 15 | from models.submodules import fcLayer, initialize_weights 16 | 17 | class TrustRegionBase(nn.Module): 18 | """ 19 | This is the the base function of the trust-region based inverse compositional algorithm. 20 | """ 21 | def __init__(self, 22 | max_iter = 3, 23 | mEst_func = None, 24 | solver_func = None, 25 | timers = None): 26 | """ 27 | :param max_iter, maximum number of iterations 28 | :param mEst_func, the M-estimator function / network 29 | :param solver_func, the trust-region function / network 30 | :param timers, if yes, counting time for each step 31 | """ 32 | super(TrustRegionBase, self).__init__() 33 | 34 | self.max_iterations = max_iter 35 | self.mEstimator = mEst_func 36 | self.directSolver = solver_func 37 | self.timers = timers 38 | 39 | def forward(self, pose, x0, x1, invD0, invD1, K, wPrior=None): 40 | """ 41 | :param pose, the initial pose 42 | (extrinsic of the target frame w.r.t. 
the referenc frame) 43 | :param x0, the template features 44 | :param x1, the image features 45 | :param invD0, the template inverse depth 46 | :param invD1, the image inverse depth 47 | :param K, the intrinsic parameters, [fx, fy, cx, cy] 48 | :param wPrior (optional), provide an initial weight as input to the convolutional m-estimator 49 | """ 50 | B, C, H, W = x0.shape 51 | px, py = geometry.generate_xy_grid(B,H,W,K) 52 | 53 | if self.timers: self.timers.tic('pre-compute Jacobians') 54 | J_F_p = self.precompute_Jacobian(invD0, x0, px, py, K) 55 | if self.timers: self.timers.toc('pre-compute Jacobians') 56 | 57 | if self.timers: self.timers.tic('compute warping residuals') 58 | residuals, occ = compute_warped_residual(pose, invD0, invD1, \ 59 | x0, x1, px, py, K) 60 | if self.timers: self.timers.toc('compute warping residuals') 61 | 62 | if self.timers: self.timers.tic('robust estimator') 63 | weights = self.mEstimator(residuals, x0, x1, wPrior) 64 | wJ = weights.view(B,-1,1) * J_F_p 65 | if self.timers: self.timers.toc('robust estimator') 66 | 67 | if self.timers: self.timers.tic('pre-compute JtWJ') 68 | JtWJ = torch.bmm(torch.transpose(J_F_p, 1, 2) , wJ) 69 | if self.timers: self.timers.toc('pre-compute JtWJ') 70 | 71 | for idx in range(self.max_iterations): 72 | if self.timers: self.timers.tic('solve x=A^{-1}b') 73 | pose = self.directSolver(JtWJ, 74 | torch.transpose(J_F_p,1,2), weights, residuals, 75 | pose, invD0, invD1, x0, x1, K) 76 | if self.timers: self.timers.toc('solve x=A^{-1}b') 77 | 78 | if self.timers: self.timers.tic('compute warping residuals') 79 | residuals, occ = compute_warped_residual(pose, invD0, invD1, \ 80 | x0, x1, px, py, K) 81 | if self.timers: self.timers.toc('compute warping residuals') 82 | 83 | return pose, weights 84 | 85 | def precompute_Jacobian(self, invD, x, px, py, K): 86 | """ Pre-compute the image Jacobian on the reference frame 87 | refer to equation (13) in the paper 88 | 89 | :param invD, template depth 90 | :param x, template feature 91 | :param px, normalized image coordinate in cols (x) 92 | :param py, normalized image coordinate in rows (y) 93 | :param K, the intrinsic parameters, [fx, fy, cx, cy] 94 | 95 | ------------ 96 | :return precomputed image Jacobian on template 97 | """ 98 | Jf_x, Jf_y = feature_gradient(x) 99 | Jx_p, Jy_p = compute_jacobian_warping(invD, K, px, py) 100 | J_F_p = compute_jacobian_dIdp(Jf_x, Jf_y, Jx_p, Jy_p) 101 | return J_F_p 102 | 103 | class ImagePyramids(nn.Module): 104 | """ Construct the pyramids in the image / depth space 105 | """ 106 | def __init__(self, scales, pool='avg'): 107 | super(ImagePyramids, self).__init__() 108 | if pool == 'avg': 109 | self.multiscales = [nn.AvgPool2d(1< 0: 177 | self.net = nn.Sequential( 178 | conv(True, self.D, 16, 3, dilation=1), 179 | conv(True, 16, 32, 3, dilation=2), 180 | conv(True, 32, 64, 3, dilation=4), 181 | conv(True, 64, 1, 3, dilation=1), 182 | nn.Sigmoid() ) 183 | initialize_weights(self.net) 184 | else: 185 | self.net = None 186 | 187 | def forward(self, residual, x0, x1, ws=None): 188 | """ 189 | :param residual, the residual map 190 | :param x0, the feature map of the template 191 | :param x1, the feature map of the image 192 | :param ws, the initial weighted residual 193 | """ 194 | if self.D == 1: # use residual only 195 | context = residual.abs() 196 | w = self.net(context) 197 | elif self.D == 4: 198 | B, C, H, W = residual.shape 199 | wl = func.interpolate(ws, (H,W), mode='bilinear', align_corners=True) 200 | context = torch.cat((residual.abs(), x0, x1, 
wl), dim=1) 201 | w = self.net(context) 202 | elif self.D < 0: 203 | w = self.mEst_func(residual) 204 | 205 | return w 206 | 207 | def __weight_Huber(self, x, alpha = 0.02): 208 | """ weight function of Huber loss: 209 | refer to P. 24 w(x) at 210 | https://members.loria.fr/moberger/Enseignement/Master2/Documents/ZhangIVC-97-01.pdf 211 | 212 | Note this current implementation is not differentiable. 213 | """ 214 | abs_x = torch.abs(x) 215 | linear_mask = abs_x > alpha 216 | w = torch.ones(x.shape).type_as(x) 217 | 218 | if linear_mask.sum().item() > 0: 219 | w[linear_mask] = alpha / abs_x[linear_mask] 220 | return w 221 | 222 | def __constant_weight(self, x): 223 | """ mimic the standard least-square when weighting function is constant 224 | """ 225 | return torch.ones(x.shape).type_as(x) 226 | 227 | class DirectSolverNet(nn.Module): 228 | 229 | # the enum types for direct solver 230 | SOLVER_NO_DAMPING = 0 231 | SOLVER_RESIDUAL_VOLUME = 1 232 | 233 | def __init__(self, solver_type, samples=10): 234 | super(DirectSolverNet, self).__init__() 235 | 236 | if solver_type == 'Direct-Nodamping': 237 | self.net = None 238 | self.type = self.SOLVER_NO_DAMPING 239 | elif solver_type == 'Direct-ResVol': 240 | # flattened JtJ and JtR (number of samples, currently fixed at 10) 241 | self.samples = samples 242 | self.net = deep_damping_regressor(D=6*6+6*samples) 243 | self.type = self.SOLVER_RESIDUAL_VOLUME 244 | initialize_weights(self.net) 245 | else: 246 | raise NotImplementedError() 247 | 248 | def forward(self, JtJ, Jt, weights, R, pose0, invD0, invD1, x0, x1, K): 249 | """ 250 | :param JtJ, the approximated Hessian JtJ 251 | :param Jt, the trasposed Jacobian 252 | :param weights, the weight matrix 253 | :param R, the residual 254 | :param pose0, the initial estimated pose 255 | :param invD0, the template inverse depth map 256 | :param invD1, the image inverse depth map 257 | :param x0, the template feature map 258 | :param x1, the image feature map 259 | :param K, the intrinsic parameters 260 | 261 | ----------- 262 | :return updated pose 263 | """ 264 | 265 | B = JtJ.shape[0] 266 | 267 | wR = (weights * R).view(B, -1, 1) 268 | JtR = torch.bmm(Jt, wR) 269 | 270 | if self.type == self.SOLVER_NO_DAMPING: 271 | # Add a small diagonal damping. 
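# A minimal sketch of the same Levenberg-style damping on a random batched 6x6 system
# (all *_demo names are placeholders for this sketch only and are not used elsewhere in
# the module); the lines below apply the identical recipe to the real JtWJ:
#
#     A = torch.rand(2, 6, 6)
#     JtWJ_demo = torch.bmm(A, A.transpose(1, 2))                 # PSD, shape (2, 6, 6)
#     JtR_demo = torch.rand(2, 6, 1)
#     diag_mask_demo = torch.eye(6).view(1, 6, 6)
#     trace_demo = (diag_mask_demo * JtWJ_demo).sum((2, 1))       # per-batch trace
#     eps_demo = (trace_demo * 1e-6).view(-1, 1, 1) * diag_mask_demo
#     xi_demo = torch.bmm(torch.inverse(JtWJ_demo + eps_demo), JtR_demo)  # damped 6-DoF step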
Without it, the training becomes quite unstable 272 | # Do not see a clear difference by removing the damping in inference though 273 | diag_mask = torch.eye(6).view(1,6,6).type_as(JtJ) 274 | diagJtJ = diag_mask * JtJ 275 | traceJtJ = torch.sum(diagJtJ, (2,1)) 276 | epsilon = (traceJtJ * 1e-6).view(B,1,1) * diag_mask 277 | Hessian = JtJ + epsilon 278 | pose_update = inverse_update_pose(Hessian, JtR, pose0) 279 | elif self.type == self.SOLVER_RESIDUAL_VOLUME: 280 | Hessian = self.__regularize_residual_volume(JtJ, Jt, JtR, weights, 281 | pose0, invD0, invD1, x0, x1, K, sample_range=self.samples) 282 | pose_update = inverse_update_pose(Hessian, JtR, pose0) 283 | else: 284 | raise NotImplementedError() 285 | 286 | return pose_update 287 | 288 | def __regularize_residual_volume(self, JtJ, Jt, JtR, weights, pose, 289 | invD0, invD1, x0, x1, K, sample_range): 290 | """ regularize the approximate with residual volume 291 | 292 | :param JtJ, the approximated Hessian JtJ 293 | :param Jt, the trasposed Jacobian 294 | :param JtR, the Right-hand size residual 295 | :param weights, the weight matrix 296 | :param pose, the initial estimated pose 297 | :param invD0, the template inverse depth map 298 | :param invD1, the image inverse depth map 299 | :param K, the intrinsic parameters 300 | :param x0, the template feature map 301 | :param x1, the image feature map 302 | :param sample_range, the numerb of samples 303 | 304 | --------------- 305 | :return the damped Hessian matrix 306 | """ 307 | # the following current support only single scale 308 | JtR_volumes = [] 309 | 310 | B, C, H, W = x0.shape 311 | px, py = geometry.generate_xy_grid(B, H, W, K) 312 | 313 | diag_mask = torch.eye(6).view(1,6,6).type_as(JtJ) 314 | diagJtJ = diag_mask * JtJ 315 | traceJtJ = torch.sum(diagJtJ, (2,1)) 316 | epsilon = (traceJtJ * 1e-6).view(B,1,1) * diag_mask 317 | n = sample_range 318 | lambdas = torch.logspace(-5, 5, n).type_as(JtJ) 319 | 320 | for s in range(n): 321 | # the epsilon is to prevent the matrix to be too ill-conditioned 322 | D = lambdas[s] * diagJtJ + epsilon 323 | Hessian = JtJ + D 324 | pose_s = inverse_update_pose(Hessian, JtR, pose) 325 | 326 | res_s,_= compute_warped_residual(pose_s, invD0, invD1, x0, x1, px, py, K) 327 | JtR_s = torch.bmm(Jt, (weights * res_s).view(B,-1,1)) 328 | JtR_volumes.append(JtR_s) 329 | 330 | JtR_flat = torch.cat(tuple(JtR_volumes), dim=2).view(B,-1) 331 | JtJ_flat = JtJ.view(B,-1) 332 | damp_est = self.net(torch.cat((JtR_flat, JtJ_flat), dim=1)) 333 | R = diag_mask * damp_est.view(B,6,1) + epsilon # also lift-up 334 | 335 | return JtJ + R 336 | 337 | def deep_damping_regressor(D): 338 | """ Output a damping vector at each dimension 339 | """ 340 | net = nn.Sequential( 341 | fcLayer(in_planes=D, out_planes=128, bias=True), 342 | fcLayer(in_planes=128, out_planes=256, bias=True), 343 | fcLayer(in_planes=256, out_planes=6, bias=True) 344 | ) # the last ReLU makes sure every predicted value is positive 345 | return net 346 | 347 | def feature_gradient(img, normalize_gradient=True): 348 | """ Calculate the gradient on the feature space using Sobel operator 349 | :param the input image 350 | ----------- 351 | :return the gradient of the image in x, y direction 352 | """ 353 | B, C, H, W = img.shape 354 | # to filter the image equally in each channel 355 | wx = torch.FloatTensor([[-1, 0, 1],[-2, 0, 2],[-1, 0, 1]]).view(1,1,3,3).type_as(img) 356 | wy = torch.FloatTensor([[-1,-2,-1],[ 0, 0, 0],[ 1, 2, 1]]).view(1,1,3,3).type_as(img) 357 | 358 | img_reshaped = img.view(-1, 1, H, W) 359 
| img_pad = func.pad(img_reshaped, (1,1,1,1), mode='replicate') 360 | img_dx = func.conv2d(img_pad, wx, stride=1, padding=0) 361 | img_dy = func.conv2d(img_pad, wy, stride=1, padding=0) 362 | 363 | if normalize_gradient: 364 | mag = torch.sqrt((img_dx ** 2) + (img_dy ** 2)+ 1e-8) 365 | img_dx = img_dx / mag 366 | img_dy = img_dy / mag 367 | 368 | return img_dx.view(B,C,H,W), img_dy.view(B,C,H,W) 369 | 370 | def compute_jacobian_dIdp(Jf_x, Jf_y, Jx_p, Jy_p): 371 | """ chained gradient of image w.r.t. the pose 372 | :param the Jacobian of the feature map in x direction 373 | :param the Jacobian of the feature map in y direction 374 | :param the Jacobian of the x map to manifold p 375 | :param the Jacobian of the y map to manifold p 376 | ------------ 377 | :return the image jacobian in x, y, direction, Bx2x6 each 378 | """ 379 | B, C, H, W = Jf_x.shape 380 | 381 | # precompute J_F_p, JtWJ 382 | Jf_p = Jf_x.view(B,C,-1,1) * Jx_p.view(B,1,-1,6) + \ 383 | Jf_y.view(B,C,-1,1) * Jy_p.view(B,1,-1,6) 384 | 385 | return Jf_p.view(B,-1,6) 386 | 387 | def compute_jacobian_warping(p_invdepth, K, px, py): 388 | """ Compute the Jacobian matrix of the warped (x,y) w.r.t. the inverse depth 389 | (linearized at origin) 390 | :param p_invdepth the input inverse depth 391 | :param the intrinsic calibration 392 | :param the pixel x map 393 | :param the pixel y map 394 | ------------ 395 | :return the warping jacobian in x, y direction 396 | """ 397 | B, C, H, W = p_invdepth.size() 398 | assert(C == 1) 399 | 400 | x = px.view(B, -1, 1) 401 | y = py.view(B, -1, 1) 402 | invd = p_invdepth.view(B, -1, 1) 403 | 404 | xy = x * y 405 | O = torch.zeros((B, H*W, 1)).type_as(p_invdepth) 406 | 407 | # This is cascaded Jacobian functions of the warping function 408 | # Refer to the supplementary materials for math documentation 409 | dx_dp = torch.cat((-xy, 1+x**2, -y, invd, O, -invd*x), dim=2) 410 | dy_dp = torch.cat((-1-y**2, xy, x, O, invd, -invd*y), dim=2) 411 | 412 | fx, fy, cx, cy = torch.split(K, 1, dim=1) 413 | return dx_dp*fx.view(B,1,1), dy_dp*fy.view(B,1,1) 414 | 415 | def compute_warped_residual(pose, invD0, invD1, x0, x1, px, py, K, obj_mask=None): 416 | """ Compute the residual error of warped target image w.r.t. the reference feature map. 417 | refer to equation (12) in the paper 418 | 419 | :param the forward warping pose from the reference camera to the target frame. 420 | Note that warping from the target frame to the reference frame is the inverse of this operation. 
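    A minimal usage sketch (assuming batched 4-D tensors and the [fx, fy, cx, cy]
    intrinsics vector used throughout this module; feat0, feat1, invd0 and invd1 are
    placeholder inputs for the sketch only):

        B, C, H, W = feat0.shape
        px, py = geometry.generate_xy_grid(B, H, W, K)
        R0 = torch.eye(3).expand(B, 3, 3).type_as(feat0)    # identity initialization
        t0 = torch.zeros(B, 3, 1).type_as(feat0)
        res, occ = compute_warped_residual([R0, t0], invd0, invd1,
                                           feat0, feat1, px, py, K)
        # res is the target feature warped into the reference frame minus the reference
        # feature; occ marks occluded / out-of-view pixels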
421 | :param the reference inverse depth 422 | :param the target inverse depth 423 | :param the reference feature image 424 | :param the target feature image 425 | :param the pixel x map 426 | :param the pixel y map 427 | :param the intrinsic calibration 428 | ----------- 429 | :return the residual (of reference image), and occlusion information 430 | """ 431 | u_warped, v_warped, inv_z_warped = geometry.batch_warp_inverse_depth( 432 | px, py, invD0, pose, K) 433 | x1_1to0 = geometry.warp_features(x1, u_warped, v_warped) 434 | occ = geometry.check_occ(inv_z_warped, invD1, u_warped, v_warped) 435 | 436 | residuals = x1_1to0 - x0 # equation (12) 437 | 438 | B, C, H, W = x0.shape 439 | if obj_mask is not None: 440 | # determine whether the object is in-view 441 | occ = occ & (obj_mask.view(B,1,H,W) < 1) 442 | residuals[occ.expand(B,C,H,W)] = 1e-3 443 | 444 | return residuals, occ 445 | 446 | def inverse_update_pose(H, Rhs, pose): 447 | """ Ues left-multiplication for the pose update 448 | in the inverse compositional form 449 | refer to equation (10) in the paper 450 | 451 | :param the (approximated) Hessian matrix 452 | :param Right-hand side vector 453 | :param the initial pose (forward transform inverse of xi) 454 | --------- 455 | :return the forward updated pose (inverse of xi) 456 | """ 457 | inv_H = invH(H) 458 | xi = torch.bmm(inv_H, Rhs) 459 | # simplifed inverse compositional for SE3 460 | d_R = geometry.batch_twist2Mat(-xi[:, :3].view(-1,3)) 461 | d_t = -torch.bmm(d_R, xi[:, 3:]) 462 | 463 | R, t = pose 464 | pose = geometry.batch_Rt_compose(R, t, d_R, d_t) 465 | return pose 466 | 467 | def invH(H): 468 | """ Generate (H+damp)^{-1}, with predicted damping values 469 | :param approximate Hessian matrix JtWJ 470 | ----------- 471 | :return the inverse of Hessian 472 | """ 473 | # GPU is much slower for matrix inverse when the size is small (compare to CPU) 474 | # works (50x faster) than inversing the dense matrix in GPU 475 | if H.is_cuda: 476 | # invH = bpinv((H).cpu()).cuda() 477 | # invH = torch.inverse(H) 478 | invH = torch.inverse(H.cpu()).cuda() 479 | else: 480 | invH = torch.inverse(H) 481 | return invH 482 | -------------------------------------------------------------------------------- /code/models/criterions.py: -------------------------------------------------------------------------------- 1 | """ 2 | Some training criterions 3 | 4 | @author: Zhaoyang Lv 5 | @date: March, 2019 6 | """ 7 | 8 | from __future__ import absolute_import 9 | from __future__ import division 10 | from __future__ import print_function 11 | from __future__ import unicode_literals 12 | 13 | import torch 14 | import torch.nn as nn 15 | import torch.nn.functional as func 16 | import models.geometry as geo 17 | 18 | def EPE3D_loss(input_flow, target_flow, invalid=None): 19 | """ 20 | :param the estimated optical / scene flow 21 | :param the ground truth / target optical / scene flow 22 | :param the invalid mask, the mask has value 1 for all areas that are invalid 23 | """ 24 | epe_map = torch.norm(target_flow-input_flow,p=2,dim=1) 25 | B = epe_map.shape[0] 26 | 27 | invalid_flow = (target_flow != target_flow) # check Nan same as torch.isnan 28 | 29 | mask = (invalid_flow[:,0,:,:] | invalid_flow[:,1,:,:] | invalid_flow[:,2,:,:]) 30 | if invalid is not None: 31 | mask = mask | (invalid.view(mask.shape) > 0) 32 | 33 | epes = [] 34 | for idx in range(B): 35 | epe_sample = epe_map[idx][~mask[idx].data] 36 | if len(epe_sample) == 0: 37 | epes.append(torch.zeros(()).type_as(input_flow)) 38 | else: 39 | 
epes.append(epe_sample.mean()) 40 | 41 | return torch.stack(epes) 42 | 43 | def RPE(R, t): 44 | """ Calcualte the relative pose error 45 | (a batch version of the RPE error defined in TUM RGBD SLAM TUM dataset) 46 | :param relative rotation 47 | :param relative translation 48 | """ 49 | angle_error = geo.batch_mat2angle(R) 50 | trans_error = torch.norm(t, p=2, dim=1) 51 | return angle_error, trans_error 52 | 53 | def compute_RPE_loss(R_est, t_est, R_gt, t_gt): 54 | """ 55 | :param estimated rotation matrix Bx3x3 56 | :param estimated translation vector Bx3 57 | :param ground truth rotation matrix Bx3x3 58 | :param ground truth translation vector Bx3 59 | """ 60 | dR, dt = geo.batch_Rt_between(R_est, t_est, R_gt, t_gt) 61 | angle_error, trans_error = RPE(dR, dt) 62 | return angle_error, trans_error 63 | 64 | def compute_RT_EPE_loss(R_est, t_est, R_gt, t_gt, depth0, K, invalid=None): 65 | """ Compute the epe point error of rotation & translation 66 | :param estimated rotation matrix Bx3x3 67 | :param estimated translation vector Bx3 68 | :param ground truth rotation matrix Bx3x3 69 | :param ground truth translation vector Bx3 70 | :param reference depth image, 71 | :param camera intrinsic 72 | """ 73 | 74 | loss = 0 75 | if R_est.dim() > 3: # training time [batch, num_poses, rot_row, rot_col] 76 | rH, rW = 60, 80 # we train the algorithm using a downsized input, (since the size of the input is not super important at training time) 77 | 78 | B,C,H,W = depth0.shape 79 | rdepth = func.interpolate(depth0, size=(rH, rW), mode='bilinear') 80 | rinvalid = func.interpolate(invalid.float(), size=(rH,rW), mode='bilinear') 81 | rK = K.clone() 82 | rK[:,0] *= float(rW) / W 83 | rK[:,1] *= float(rH) / H 84 | rK[:,2] *= float(rW) / W 85 | rK[:,3] *= float(rH) / H 86 | xyz = geo.batch_inverse_project(rdepth, rK) 87 | flow_gt = geo.batch_transform_xyz(xyz, R_gt, t_gt, get_Jacobian=False) 88 | 89 | for idx in range(R_est.shape[1]): 90 | flow_est= geo.batch_transform_xyz(xyz, R_est[:,idx], t_est[:,idx], get_Jacobian=False) 91 | loss += EPE3D_loss(flow_est, flow_gt.detach(), rinvalid) #* (1< inv_z_warped - thres) 290 | 291 | inviews = inlier & (u > 0) & (u < W) & \ 292 | (v > 0) & (v < H) 293 | 294 | return 1-inviews 295 | 296 | def warp_features(F, u, v): 297 | """ 298 | Warp the feature map (F) w.r.t. the grid (u, v) 299 | """ 300 | B, C, H, W = F.shape 301 | 302 | u_norm = u / ((W-1)/2) - 1 303 | v_norm = v / ((H-1)/2) - 1 304 | uv_grid = torch.cat((u_norm.view(B,H,W,1), v_norm.view(B,H,W,1)), dim=3) 305 | F_warped = nn.functional.grid_sample(F, uv_grid, 306 | mode='bilinear', padding_mode='border') 307 | return F_warped 308 | 309 | def batch_transform_xyz(xyz_tensor, R, t, get_Jacobian=True): 310 | ''' 311 | transform the point cloud w.r.t. 
the transformation matrix 312 | :param xyz_tensor: B * 3 * H * W 313 | :param R: rotation matrix B * 3 * 3 314 | :param t: translation vector B * 3 315 | ''' 316 | B, C, H, W = xyz_tensor.size() 317 | t_tensor = t.contiguous().view(B,3,1).repeat(1,1,H*W) 318 | p_tensor = xyz_tensor.contiguous().view(B, C, H*W) 319 | # the transformation process is simply: 320 | # p' = t + R*p 321 | xyz_t_tensor = torch.baddbmm(t_tensor, R, p_tensor) 322 | 323 | if get_Jacobian: 324 | # return both the transformed tensor and its Jacobian matrix 325 | J_r = R.bmm(batch_skew_symmetric_matrix(-1*p_tensor.permute(0,2,1))) 326 | J_t = -1 * torch.eye(3).view(1,3,3).expand(B,3,3) 327 | J = torch.cat((J_r, J_t), 1) 328 | return xyz_t_tensor.view(B, C, H, W), J 329 | else: 330 | return xyz_t_tensor.view(B, C, H, W) 331 | 332 | def flow_from_rigid_transform(depth, extrinsic, intrinsic): 333 | """ 334 | Get the optical flow induced by rigid transform [R,t] and depth 335 | """ 336 | [R, t] = extrinsic 337 | [fx, fy, cx, cy] = intrinsic 338 | 339 | def batch_project(xyz_tensor, K): 340 | """ Project a point cloud into pixels (u,v) given intrinsic K 341 | [u';v';w] = [K][x;y;z] 342 | u = u' / w; v = v' / w 343 | 344 | :param the xyz points 345 | :param calibration is a torch array composed of [fx, fy, cx, cy] 346 | ------- 347 | :return u, v grid tensor in image coordinate 348 | (tested through inverse project) 349 | """ 350 | B, _, H, W = xyz_tensor.size() 351 | batch_K = K.expand(H, W, B, 4).permute(2,3,0,1) 352 | 353 | x, y, z = torch.split(xyz_tensor, 1, dim=1) 354 | fx, fy, cx, cy = torch.split(batch_K, 1, dim=1) 355 | 356 | u = fx*x / z + cx 357 | v = fy*y / z + cy 358 | return torch.cat((u,v), dim=1) 359 | 360 | def batch_inverse_project(depth, K): 361 | """ Inverse project pixels (u,v) to a point cloud given intrinsic 362 | :param depth dim B*H*W 363 | :param calibration is torch array composed of [fx, fy, cx, cy] 364 | :param color (optional) dim B*3*H*W 365 | ------- 366 | :return xyz tensor (batch of point cloud) 367 | (tested through projection) 368 | """ 369 | if depth.dim() == 3: 370 | B, H, W = depth.size() 371 | else: 372 | B, _, H, W = depth.size() 373 | 374 | x, y = generate_xy_grid(B,H,W,K) 375 | z = depth.view(B,1,H,W) 376 | return torch.cat((x*z, y*z, z), dim=1) 377 | 378 | def batch_euler2mat(ai, aj, ak, axes='sxyz'): 379 | """ A torch implementation euler2mat from transform3d: 380 | https://github.com/matthew-brett/transforms3d/blob/master/transforms3d/euler.py 381 | :param ai : First rotation angle (according to `axes`). 382 | :param aj : Second rotation angle (according to `axes`). 383 | :param ak : Third rotation angle (according to `axes`). 384 | :param axes : Axis specification; one of 24 axis sequences as string or encoded tuple - e.g. ``sxyz`` (the default). 385 | ------- 386 | :return rotation matrix, array-like shape (B, 3, 3) 387 | 388 | Tested w.r.t. 
transforms3d.euler module 389 | """ 390 | B = ai.size()[0] 391 | 392 | try: 393 | firstaxis, parity, repetition, frame = _AXES2TUPLE[axes.lower()] 394 | except (AttributeError, KeyError): 395 | _TUPLE2AXES[axes] # validation 396 | firstaxis, parity, repetition, frame = axes 397 | 398 | i = firstaxis 399 | j = _NEXT_AXIS[i+parity] 400 | k = _NEXT_AXIS[i-parity+1] 401 | order = [i, j, k] 402 | 403 | if frame: 404 | ai, ak = ak, ai 405 | if parity: 406 | ai, aj, ak = -ai, -aj, -ak 407 | 408 | si, sj, sk = sin(ai), sin(aj), sin(ak) 409 | ci, cj, ck = cos(ai), cos(aj), cos(ak) 410 | cc, cs = ci*ck, ci*sk 411 | sc, ss = si*ck, si*sk 412 | 413 | # M = torch.zeros(B, 3, 3).cuda() 414 | if repetition: 415 | c_i = [cj, sj*si, sj*ci] 416 | c_j = [sj*sk, -cj*ss+cc, -cj*cs-sc] 417 | c_k = [-sj*ck, cj*sc+cs, cj*cc-ss] 418 | else: 419 | c_i = [cj*ck, sj*sc-cs, sj*cc+ss] 420 | c_j = [cj*sk, sj*ss+cc, sj*cs-sc] 421 | c_k = [-sj, cj*si, cj*ci] 422 | 423 | def permute(X): # sort X w.r.t. the axis indices 424 | return [ x for (y, x) in sorted(zip(order, X)) ] 425 | 426 | c_i = permute(c_i) 427 | c_j = permute(c_j) 428 | c_k = permute(c_k) 429 | 430 | r =[torch.stack(c_i, 1), 431 | torch.stack(c_j, 1), 432 | torch.stack(c_k, 1)] 433 | r = permute(r) 434 | 435 | return torch.stack(r, 1) 436 | 437 | def batch_mat2euler(M, axes='sxyz'): 438 | """ A torch implementation euler2mat from transform3d: 439 | https://github.com/matthew-brett/transforms3d/blob/master/transforms3d/euler.py 440 | :param array-like shape (3, 3) or (4, 4). Rotation matrix or affine. 441 | :param Axis specification; one of 24 axis sequences as string or encoded tuple - e.g. ``sxyz`` (the default). 442 | -------- 443 | :returns 444 | :param ai : First rotation angle (according to `axes`). 445 | :param aj : Second rotation angle (according to `axes`). 446 | :param ak : Third rotation angle (according to `axes`). 447 | """ 448 | try: 449 | firstaxis, parity, repetition, frame = _AXES2TUPLE[axes.lower()] 450 | except (AttributeError, KeyError): 451 | _TUPLE2AXES[axes] # validation 452 | firstaxis, parity, repetition, frame = axes 453 | 454 | i = firstaxis 455 | j = _NEXT_AXIS[i+parity] 456 | k = _NEXT_AXIS[i-parity+1] 457 | 458 | if repetition: 459 | sy = torch.sqrt(M[:, i, j]**2 + M[:, i, k]**2) 460 | # A lazy way to cope with batch data. 
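# A quick self-consistency sketch for batch_euler2mat / batch_mat2euler (angles in
# radians, batch of 4; the tensors below are placeholders for the sketch only):
#
#     ai = torch.rand(4) * 0.5
#     aj = torch.rand(4) * 0.5
#     ak = torch.rand(4) * 0.5
#     Rm = batch_euler2mat(ai, aj, ak, axes='sxyz')      # (4, 3, 3)
#     bi, bj, bk = batch_mat2euler(Rm, axes='sxyz')      # recover the three angles
#     assert torch.allclose(ai, bi, atol=1e-4)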
Can be more efficient 461 | mask = ~(sy > 1e-8) 462 | ax = atan2( M[:, i, j], M[:, i, k]) 463 | ay = atan2( sy, M[:, i, i]) 464 | az = atan2( M[:, j, i], -M[:, k, i]) 465 | if mask.sum() > 0: 466 | ax[mask] = atan2(-M[:, j, k][mask], M[:, j, j][mask]) 467 | ay[mask] = atan2( sy[mask], M[:, i, i][mask]) 468 | az[mask] = 0.0 469 | else: 470 | cy = torch.sqrt(M[:, i, i]**2 + M[:, j, i]**2) 471 | mask = ~(cy > 1e-8) 472 | ax = atan2( M[:, k, j], M[:, k, k]) 473 | ay = atan2(-M[:, k, i], cy) 474 | az = atan2( M[:, j, i], M[:, i, i]) 475 | if mask.sum() > 0: 476 | ax[mask] = atan2(-M[:, j, k][mask], M[:, j, j][mask]) 477 | ay[mask] = atan2(-M[:, k, i][mask], cy[mask]) 478 | az[mask] = 0.0 479 | 480 | if parity: 481 | ax, ay, az = -ax, -ay, -az 482 | if frame: 483 | ax, az = az, ax 484 | return ax, ay, az 485 | -------------------------------------------------------------------------------- /code/models/submodules.py: -------------------------------------------------------------------------------- 1 | """ 2 | Submodules to build up CNN 3 | 4 | @author: Zhaoyang Lv 5 | @date: March, 2019 6 | """ 7 | 8 | from __future__ import print_function 9 | 10 | import torch.nn as nn 11 | import torch 12 | import numpy as np 13 | 14 | from torch.nn import init 15 | from torchvision import transforms 16 | 17 | def color_normalize(color): 18 | rgb_mean = torch.Tensor([0.4914, 0.4822, 0.4465]).type_as(color) 19 | rgb_std = torch.Tensor([0.2023, 0.1994, 0.2010]).type_as(color) 20 | return (color - rgb_mean.view(1,3,1,1)) / rgb_std.view(1,3,1,1) 21 | 22 | def convLayer(batchNorm, in_planes, out_planes, kernel_size=3, stride=1, dilation=1, bias=False): 23 | """ A wrapper of convolution-batchnorm-ReLU module 24 | """ 25 | if batchNorm: 26 | return nn.Sequential( 27 | nn.Conv2d(in_planes, out_planes, kernel_size=kernel_size, stride=stride, padding=(kernel_size-1)//2 + dilation-1, dilation=dilation, bias=bias), 28 | nn.BatchNorm2d(out_planes), 29 | #nn.LeakyReLU(0.1,inplace=True) # deprecated 30 | nn.ELU(inplace=True) 31 | ) 32 | else: 33 | return nn.Sequential( 34 | nn.Conv2d(in_planes, out_planes, kernel_size=kernel_size, stride=stride, padding=(kernel_size-1)//2 + dilation-1, dilation=dilation, bias=True), 35 | #nn.LeakyReLU(0.1,inplace=True) # deprecated 36 | nn.ELU(inplace=True) 37 | ) 38 | 39 | def fcLayer(in_planes, out_planes, bias=True): 40 | return nn.Sequential( 41 | nn.Linear(in_planes, out_planes, bias), 42 | nn.ReLU(inplace=True) 43 | ) 44 | 45 | def initialize_weights(modules, method='xavier'): 46 | for m in modules: 47 | if isinstance(m, nn.Conv2d): 48 | if m.bias is not None: 49 | m.bias.data.zero_() 50 | if method == 'xavier': 51 | init.xavier_uniform_(m.weight) 52 | elif method == 'kaiming': 53 | init.kaiming_uniform_(m.weight) 54 | 55 | if isinstance(m, nn.ConvTranspose2d): 56 | if m.bias is not None: 57 | m.bias.data.zero_() 58 | if method == 'xavier': 59 | init.xavier_uniform_(m.weight) 60 | elif method == 'kaiming': 61 | init.kaiming_uniform_(m.weight) 62 | 63 | class ListModule(nn.Module): 64 | """ The implementation of a list of modules from 65 | https://discuss.pytorch.org/t/list-of-nn-module-in-a-nn-module/219/2 66 | """ 67 | def __init__(self, *args): 68 | super(ListModule, self).__init__() 69 | idx = 0 70 | for module in args: 71 | self.add_module(str(idx), module) 72 | idx += 1 73 | 74 | def __getitem__(self, idx): 75 | if idx < 0 or idx >= len(self._modules): 76 | raise IndexError('index {} is out of range'.format(idx)) 77 | it = iter(self._modules.values()) 78 | for i in range(idx): 79 | 
next(it) 80 | return next(it) 81 | 82 | def __iter__(self): 83 | return iter(self._modules.values()) 84 | 85 | def __len__(self): 86 | return len(self._modules) 87 | -------------------------------------------------------------------------------- /code/run_example.py: -------------------------------------------------------------------------------- 1 | """ 2 | An extremely simple example to show how to run the algorithm 3 | 4 | @author: Zhaoyang Lv 5 | @date: May 2019 6 | """ 7 | 8 | import argparse 9 | 10 | import torch 11 | import torch.nn as nn 12 | import torch.nn.functional as func 13 | 14 | import models.LeastSquareTracking as ICtracking 15 | 16 | from tqdm import tqdm 17 | from torch.utils.data import DataLoader 18 | from train_utils import check_cuda 19 | from data.SimpleLoader import SimpleLoader 20 | 21 | def resize(img0, img1, depth0, depth1, K_in, resizeH, resizeW): 22 | H, W = img0.shape[-2:] 23 | 24 | I0 = func.interpolate(img0, (resizeH,resizeW), mode='bilinear', align_corners=True) 25 | I1 = func.interpolate(img1, (resizeH,resizeW), mode='bilinear', align_corners=True) 26 | D0 = func.interpolate(depth0, (resizeH,resizeW), mode='nearest') 27 | D1 = func.interpolate(depth1, (resizeH,resizeW), mode='nearest') 28 | 29 | sx = resizeH / H 30 | sy = resizeW / W 31 | 32 | K = K_in.clone() 33 | K[:,0] *= sx 34 | K[:,1] *= sy 35 | K[:,2] *= sx 36 | K[:,3] *= sy 37 | 38 | return I0, I1, D0, D1, K 39 | 40 | def run_inference(dataloader, net): 41 | 42 | progress = tqdm(dataloader, ncols=100, 43 | desc = 'Run the deeper inverse compositional algorithm', 44 | total= len(dataloader)) 45 | 46 | net.eval() 47 | 48 | for idx, batch, in enumerate(progress): 49 | 50 | color0, color1, depth0, depth1, K = check_cuda(batch) 51 | 52 | # downsize the input to 120*160, it is the size of data when the algorthm is trained 53 | C0, C1, D0, D1, K = resize(color0, color1, depth0, depth1, K, resizeH = 120, resizeW=160) 54 | 55 | with torch.no_grad(): 56 | R, t = net.forward(C0, C1, D0, D1, K) 57 | 58 | print('Rotation: ') 59 | print(R) 60 | print('translation: ') 61 | print(t) 62 | 63 | if __name__ == '__main__': 64 | 65 | parser = argparse.ArgumentParser(description='Run the network inference example.') 66 | 67 | parser.add_argument('--checkpoint', default='trained_models/TUM_RGBD_ABC_final.pth.tar', 68 | type=str, help='the path to the pre-trained checkpoint.') 69 | 70 | parser.add_argument('--color_dir', default='data/data_examples/TUM/color', 71 | help='the directory of color images') 72 | parser.add_argument('--depth_dir', default='data/data_examples/TUM/depth', 73 | help='the directory of depth images') 74 | 75 | parser.add_argument('--intrinsic', default='525.0,525.0,319.5,239.5', 76 | help='Simple pin-hole camera intrinsics, input in the format (fx, fy, cx, cy)') 77 | 78 | config = parser.parse_args() 79 | 80 | K = [float(x) for x in config.intrinsic.split(',')] 81 | 82 | simple_loader = SimpleLoader(config.color_dir, config.depth_dir, K) 83 | simple_loader = DataLoader(simple_loader, batch_size=1, shuffle=False) 84 | 85 | net = ICtracking.LeastSquareTracking( 86 | encoder_name = 'ConvRGBD2', 87 | max_iter_per_pyr= 3, 88 | mEst_type = 'MultiScale2w', 89 | solver_type = 'Direct-ResVol') 90 | 91 | if torch.cuda.is_available(): 92 | net.cuda() 93 | 94 | net.load_state_dict(torch.load(config.checkpoint)['state_dict']) 95 | 96 | run_inference(simple_loader, net) -------------------------------------------------------------------------------- /code/timers.py: 
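# A minimal usage sketch for the Timers utility defined in this file (keys are free-form
# strings; 'forward step' below is just an example label):
#
#     from timers import Timers
#     timers = Timers()
#     timers.tic('forward step')
#     ...                              # timed work goes here
#     timers.toc('forward step')
#     timers.print('forward step')     # reports the average time for this key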
-------------------------------------------------------------------------------- 1 | """ 2 | A timing utility 3 | 4 | @author: Zhaoyang Lv 5 | @date: March 2019 6 | """ 7 | 8 | from __future__ import absolute_import 9 | from __future__ import division 10 | from __future__ import print_function 11 | from __future__ import unicode_literals 12 | 13 | import time 14 | import numpy as np 15 | from collections import defaultdict 16 | 17 | class Timer(object): 18 | 19 | def __init__(self): 20 | self.reset() 21 | 22 | def tic(self): 23 | self.start_time = time.time() 24 | 25 | def toc(self, average=True): 26 | self.diff = time.time() - self.start_time 27 | self.total_time += self.diff 28 | self.calls += 1 29 | 30 | def total(self): 31 | """ return the total amount of time """ 32 | return self.total_time 33 | 34 | def avg(self): 35 | """ return the average amount of time """ 36 | return self.total_time / float(self.calls) 37 | 38 | def reset(self): 39 | self.total_time = 0. 40 | self.calls = 0 41 | self.start_time = 0. 42 | self.diff = 0. 43 | 44 | class Timers(object): 45 | 46 | def __init__(self): 47 | self.timers = defaultdict(Timer) 48 | 49 | def tic(self, key): 50 | self.timers[key].tic() 51 | 52 | def toc(self, key): 53 | self.timers[key].toc() 54 | 55 | def print(self, key=None): 56 | if key is None: 57 | # print all time 58 | for k, v in self.timers.items(): 59 | print("Average time for {:}: {:}".format(k, v.avg())) 60 | else: 61 | print("Average time for {:}: {:}".format(key, self.timers[key].avg())) 62 | 63 | def get_avg(self, key): 64 | return self.timers[key].avg() 65 | -------------------------------------------------------------------------------- /code/train.py: -------------------------------------------------------------------------------- 1 | """ 2 | The training script for deep trust region method 3 | 4 | @author: Zhaoyang Lv 5 | @date: March 2019 6 | """ 7 | 8 | import os, sys, argparse, time 9 | 10 | import models.LeastSquareTracking as ICtracking 11 | import models.criterions as criterions 12 | import models.geometry as geometry 13 | import train_utils 14 | import config 15 | from data.dataloader import load_data 16 | 17 | import torch 18 | import torch.nn as nn 19 | import torch.utils.data as data 20 | 21 | from timers import Timers 22 | from tqdm import tqdm 23 | 24 | import evaluate as eval_utils 25 | 26 | def create_train_eval_loaders(options, eval_type, keyframes, 27 | total_batch_size = 8, 28 | trajectory = ''): 29 | """ create the evaluation loader at different keyframes set-up 30 | """ 31 | eval_loaders = {} 32 | 33 | for kf in keyframes: 34 | np_loader = load_data(options.dataset, [kf], eval_type, trajectory) 35 | eval_loaders['{:}_keyframe_{:}'.format(trajectory, kf)] = data.DataLoader(np_loader, 36 | batch_size = int(total_batch_size), 37 | shuffle = False, num_workers = options.cpu_workers) 38 | 39 | return eval_loaders 40 | 41 | def train_one_epoch(options, dataloader, net, optim, epoch, logger, objectives, 42 | known_mask=False, timers=None): 43 | 44 | net.train() 45 | 46 | epoch_len = len(dataloader) 47 | 48 | flow_loss, rpe_loss = None, None 49 | if 'EPE3D' in objectives: 50 | flow_loss = criterions.compute_RT_EPE_loss 51 | if 'RPE' in objectives: 52 | rpe_loss = criterions.compute_RPE_loss 53 | 54 | if timers is None: timers_iter = Timers() 55 | 56 | if timers: timers.tic('one iteration') 57 | else: timers_iter.tic('one iteration') 58 | 59 | for batch_idx, batch in enumerate(dataloader): 60 | 61 | display_dict = {} 62 | 63 | optim.zero_grad() 64 | 65 | if 
timers: timers.tic('forward step') 66 | 67 | if known_mask: # for dataset that with mask or need mask 68 | color0, color1, depth0, depth1, Rt, K, obj_mask0, obj_mask1 = \ 69 | train_utils.check_cuda(batch[:8]) 70 | else: 71 | color0, color1, depth0, depth1, Rt, K = \ 72 | train_utils.check_cuda(batch[:6]) 73 | obj_mask0, obj_mask1 = None, None 74 | 75 | # Bypass lazy way to bypass invalid pixels. 76 | invalid_mask = (depth0 == depth0.min()) + (depth0 == depth0.max()) 77 | if obj_mask0 is not None: 78 | invalid_mask = 1.0 - obj_mask0 + invalid_mask 79 | 80 | Rs, ts = net.forward(color0, color1, depth0, depth1, K)[:2] 81 | 82 | if timers: timers.toc('forward step') 83 | if timers: timers.tic('calculate loss') 84 | 85 | R_gt, t_gt = Rt[:,:3,:3], Rt[:,:3,3] 86 | 87 | assert(flow_loss) # the only loss used for training 88 | 89 | epes3d = flow_loss(Rs, ts, R_gt, t_gt, depth0, K, invalid_mask).mean() * 1e2 90 | display_dict['train_epes3d'] = epes3d.item() 91 | 92 | loss = epes3d 93 | display_dict['train_loss'] = loss.item() 94 | 95 | if timers: timers.toc('calculate loss') 96 | if timers: timers.tic('backward') 97 | 98 | loss.backward() 99 | 100 | if timers: timers.toc('backward') 101 | 102 | optim.step() 103 | 104 | lr = train_utils.get_learning_rate(optim) 105 | display_dict['lr'] = lr 106 | 107 | if timers: 108 | timers.toc('one iteration') 109 | batch_time = timers.get_avg('one iteration') 110 | timers.tic('one iteration') 111 | else: 112 | timers_iter.toc('one iteration') 113 | batch_time = timers_iter.get_avg('one iteration') 114 | timers_iter.tic('one iteration') 115 | 116 | logger.write_to_tensorboard(display_dict, epoch*epoch_len + batch_idx) 117 | logger.write_to_terminal(display_dict, epoch, batch_idx, epoch_len, batch_time, is_train=True) 118 | 119 | def train(options): 120 | 121 | if options.time: 122 | timers = Timers() 123 | else: 124 | timers = None 125 | 126 | total_batch_size = options.batch_per_gpu * torch.cuda.device_count() 127 | 128 | checkpoint = train_utils.load_checkpoint_train(options) 129 | 130 | keyframes = [int(x) for x in options.keyframes.split(',')] 131 | train_loader = load_data(options.dataset, keyframes, load_type = 'train') 132 | train_loader = data.DataLoader(train_loader, 133 | batch_size = total_batch_size, 134 | shuffle = True, num_workers = options.cpu_workers) 135 | if options.dataset in ['BundleFusion', 'TUM_RGBD']: 136 | obj_has_mask = False 137 | else: 138 | obj_has_mask = True 139 | 140 | eval_loaders = create_train_eval_loaders(options, 'validation', keyframes, total_batch_size) 141 | 142 | logfile_name = '_'.join([ 143 | options.prefix, # the current test version 144 | options.network, 145 | options.encoder_name, 146 | options.mestimator, 147 | options.solver, 148 | 'lr', str(options.lr), 149 | 'batch', str(total_batch_size), 150 | 'kf', options.keyframes]) 151 | 152 | print("Initialize and train the Deep Trust Region Network") 153 | net = ICtracking.LeastSquareTracking( 154 | encoder_name = options.encoder_name, 155 | max_iter_per_pyr= options.max_iter_per_pyr, 156 | mEst_type = options.mestimator, 157 | solver_type = options.solver, 158 | tr_samples = options.tr_samples, 159 | no_weight_sharing = options.no_weight_sharing, 160 | timers = timers) 161 | 162 | if options.no_weight_sharing: 163 | logfile_name += '_no_weight_sharing' 164 | logger = train_utils.initialize_logger(options, logfile_name) 165 | 166 | if options.checkpoint: 167 | net.load_state_dict(checkpoint['state_dict']) 168 | 169 | if torch.cuda.is_available(): net.cuda() 170 | 171 | 
net.train() 172 | 173 | if torch.cuda.device_count() > 1: 174 | print("Use", torch.cuda.device_count(), "GPUs for training!") 175 | # dim = 0 [30, xxx] -> [10, ...], [10, ...], [10, ...] on 3 GPUs 176 | net = nn.DataParallel(net) 177 | 178 | train_objective = ['EPE3D'] # Note: we don't use RPE for training 179 | eval_objectives = ['EPE3D', 'RPE'] 180 | 181 | num_params = train_utils.count_parameters(net) 182 | 183 | if num_params < 1: 184 | print('There is no learnable parameters in this baseline.') 185 | print('No training. Only one iteration of evaluation') 186 | no_training = True 187 | else: 188 | print('There is a total of {:} learnabled parameters'.format(num_params)) 189 | no_training = False 190 | optim = train_utils.create_optim(options, net) 191 | scheduler = torch.optim.lr_scheduler.MultiStepLR(optim, 192 | milestones=options.lr_decay_epochs, 193 | gamma=options.lr_decay_ratio) 194 | 195 | freq = options.save_checkpoint_freq 196 | for epoch in range(options.start_epoch, options.epochs): 197 | 198 | if epoch % freq == 0: 199 | checkpoint_name = 'checkpoint_epoch{:d}.pth.tar'.format(epoch) 200 | print('save {:}'.format(checkpoint_name)) 201 | state_info = {'epoch': epoch, 'num_param': num_params} 202 | logger.save_checkpoint(net, state_info, filename=checkpoint_name) 203 | 204 | if options.no_val is False: 205 | for k, loader in eval_loaders.items(): 206 | 207 | eval_name = '{:}_{:}'.format(options.dataset, k) 208 | 209 | eval_info = eval_utils.evaluate_trust_region( 210 | loader, net, eval_objectives, 211 | known_mask = obj_has_mask, 212 | eval_name = eval_name, 213 | timers = timers) 214 | 215 | display_dict = {"{:}_epe3d".format(eval_name): eval_info['epes'].mean(), 216 | "{:}_rpe_angular".format(eval_name): eval_info['angular_error'].mean(), 217 | "{:}_rpe_translation".format(eval_name): eval_info['translation_error'].mean()} 218 | 219 | logger.write_to_tensorboard(display_dict, epoch) 220 | 221 | if no_training: break 222 | 223 | train_one_epoch(options, train_loader, net, optim, epoch, logger, 224 | train_objective, known_mask=obj_has_mask, timers=timers) 225 | 226 | scheduler.step() 227 | 228 | if __name__ == '__main__': 229 | 230 | parser = argparse.ArgumentParser(description='Training the network') 231 | 232 | config.add_basics_config(parser) 233 | config.add_train_basics_config(parser) 234 | config.add_train_optim_config(parser) 235 | config.add_train_log_config(parser) 236 | config.add_train_loss_config(parser) 237 | config.add_tracking_config(parser) 238 | 239 | options = parser.parse_args() 240 | 241 | options.start_epoch = 0 242 | 243 | print('---------------------------------------') 244 | print_options = vars(options) 245 | for key in print_options.keys(): 246 | print(key+': '+str(print_options[key])) 247 | print('---------------------------------------') 248 | 249 | torch.manual_seed(1) 250 | torch.cuda.manual_seed(1) 251 | 252 | print('Start training...') 253 | train(options) 254 | -------------------------------------------------------------------------------- /code/train_utils.py: -------------------------------------------------------------------------------- 1 | """ 2 | The training utility functions 3 | 4 | @author: Zhaoyang Lv 5 | @date: March 2019 6 | """ 7 | 8 | import os, sys 9 | from os.path import join 10 | import torch 11 | import torch.nn as nn 12 | 13 | def check_cuda(items): 14 | if torch.cuda.is_available(): 15 | return [x.cuda() for x in items] 16 | else: 17 | return items 18 | 19 | def initialize_logger(opt, logfile_name): 20 | """ Initialize 
the logger for the network 21 | """ 22 | from Logger import TensorBoardLogger 23 | log_dir = opt.dataset 24 | # if opt.resume_training: 25 | # logfile_name = '_'.join([ 26 | # logfile_name, 27 | # 'resume']) 28 | 29 | log_dir = join('logs', log_dir, logfile_name) 30 | logger = TensorBoardLogger(log_dir, logfile_name) 31 | return logger 32 | 33 | def create_optim(config, network): 34 | """ Create the optimizer 35 | """ 36 | if config.opt=='sgd': 37 | optim = torch.optim.SGD(network.parameters(), 38 | lr = config.lr, 39 | momentum = 0.9, 40 | weight_decay = 4e-4, 41 | nesterov=False) 42 | elif config.opt=='adam' or config.opt=='sgdr': 43 | optim = torch.optim.Adam(network.parameters(), 44 | lr = config.lr, 45 | weight_decay = 4e-4 46 | ) 47 | elif config.opt=='rmsprop': 48 | optim = torch.optim.RMSprop(network.parameters(), 49 | lr = config.lr, 50 | weight_decay = 1e-4) 51 | else: 52 | raise NotImplementedError 53 | 54 | return optim 55 | 56 | def count_parameters(model): 57 | return sum(p.numel() for p in model.parameters() if p.requires_grad) 58 | 59 | def load_checkpoint_test(opt): 60 | if os.path.isfile(opt.checkpoint): 61 | print('=> loading checkpoint '+ opt.checkpoint) 62 | checkpoint = torch.load(opt.checkpoint) 63 | else: 64 | raise Exception('=> no checkpoint found at '+opt.checkpoint) 65 | return checkpoint 66 | 67 | def load_checkpoint_train(opt): 68 | """ Loading the checking-point file if specified 69 | """ 70 | checkpoint = None 71 | if opt.checkpoint: 72 | if os.path.isfile(opt.checkpoint): 73 | print('=> loading checkpoint '+ opt.checkpoint) 74 | 75 | checkpoint = torch.load(opt.checkpoint) 76 | print('=> loaded checkpoint '+ opt.checkpoint+' epoch %d'%checkpoint['epoch']) 77 | if opt.resume_training: 78 | opt.start_epoch = checkpoint['epoch'] 79 | print('resume training on the checkpoint') 80 | else: 81 | print('start new training...') 82 | 83 | # This is to configure the module loaded from multi-gpu 84 | if opt.checkpoint_multigpu: 85 | from collections import OrderedDict 86 | state_dict_rename = OrderedDict() 87 | for k, v in checkpoint['state_dict'].items(): 88 | name = k[7:] # remove `module.` 89 | state_dict_rename[name] = v 90 | checkpoint['state_dict'] = state_dict_rename 91 | else: 92 | print('=> no checkpoint found at '+opt.checkpoint) 93 | return checkpoint 94 | 95 | def set_learning_rate(optim, lr): 96 | """ manual set the learning rate for all specified parameters 97 | """ 98 | for param_group in optim.param_groups: 99 | param_group['lr']=lr 100 | 101 | def get_learning_rate(optim, name=None): 102 | """ retrieve the current learning rate 103 | """ 104 | if name is None: 105 | # assume all the learning rate remains the same 106 | return optim.param_groups[0]['lr'] 107 | 108 | def adjust_learning_rate_manual(optim, epoch, lr, lr_decay_epochs, lr_decay_ratio): 109 | """ DIY the learning rate 110 | """ 111 | for e in lr_decay_epochs: 112 | if epoch