├── Readme.md ├── assets └── network_architecture.jpg ├── configs ├── datasets │ ├── flying_things_3d │ │ ├── test.txt │ │ ├── train.txt │ │ └── val.txt │ ├── lidar_kitti │ │ └── test.txt │ ├── semantic_kitti │ │ ├── train.txt │ │ └── val.txt │ ├── stereo_kitti │ │ └── test.txt │ └── waymo_open │ │ ├── test.txt │ │ ├── train.txt │ │ └── val.txt ├── default.yaml ├── demo │ ├── demo_flying_things_3d.yaml │ └── demo_lidar_kitti.yaml ├── eval │ ├── eval_flying_things_3d.yaml │ ├── eval_lidar_kitti.yaml │ ├── eval_semantic_kitti.yaml │ ├── eval_stereo_kitti.yaml │ └── eval_waymo_open.yaml └── train │ ├── train_fully_supervised.yaml │ └── train_weakly_supervised.yaml ├── data_preprocessing ├── IO.py ├── flyingthings3d_utils.py ├── kitti_utils.py ├── process_flyingthings3d_subset.py ├── process_kitti.py ├── process_semantKitti.py └── python_pfm.py ├── eval.py ├── lib ├── __pycache__ │ ├── __init__.cpython-36.pyc │ ├── config.cpython-36.pyc │ ├── config.cpython-37.pyc │ ├── config.cpython-38.pyc │ ├── data.cpython-36.pyc │ ├── data.cpython-37.pyc │ ├── logger.cpython-36.pyc │ ├── logger.cpython-37.pyc │ ├── loss.cpython-36.pyc │ ├── loss.cpython-37.pyc │ ├── metrics.cpython-36.pyc │ ├── metrics.cpython-37.pyc │ ├── trainer.cpython-36.pyc │ ├── trainer.cpython-37.pyc │ ├── utils.cpython-36.pyc │ ├── utils.cpython-37.pyc │ └── utils.cpython-38.pyc ├── config.py ├── data.py ├── logger.py ├── loss.py ├── metrics.py ├── model │ ├── __init__.py │ ├── minkowski │ │ ├── ME_layers.py │ │ ├── MinkowskiFlow.py │ │ └── __init__,py │ └── rigid_3d_sf.py ├── trainer.py └── utils.py ├── requirements.txt ├── scripts ├── download_data.sh ├── download_pretrained_models.sh └── download_pretrained_models_ablations.sh ├── train.py └── utils ├── __init__.py └── chamfer_distance ├── __init__.py ├── __pycache__ ├── __init__.cpython-37.pyc └── chamfer_distance.cpython-37.pyc ├── chamfer_distance.cpp ├── chamfer_distance.cu ├── chamfer_distance.py └── readme.txt /Readme.md: -------------------------------------------------------------------------------- 1 | # Weakly Supervised Learning of Rigid 3D Scene Flow 2 | This repository provides code and data to train and evaluate a weakly supervised method for rigid 3D scene flow estimation. It represents the official implementation of the paper: 3 | 4 | ### [Weakly Supervised Learning of Rigid 3D Scene Flow](https://arxiv.org/pdf/2102.08945.pdf) 5 | [Zan Gojcic](https://zgojcic.github.io/), [Or Litany](https://orlitany.github.io/), [Andreas Wieser](https://baug.ethz.ch/departement/personen/mitarbeiter/personen-detail.MTg3NzU5.TGlzdC82NzksLTU1NTc1NDEwMQ==.html), [Leonidas J. Guibas](https://geometry.stanford.edu/member/guibas/), [Tolga Birdal](http://tbirdal.me/)\ 6 | | [IGP ETH Zurich](https://igp.ethz.ch/) | [Nvidia Toronto AI Lab](https://nv-tlabs.github.io/) | [Guibas Lab Stanford University](https://geometry.stanford.edu/index.html) | 7 | 8 | For more information, please see the [project webpage](https://3dsceneflow.github.io/) 9 | 10 | ![WSR3DSF](assets/network_architecture.jpg?raw=true) 11 | 12 | 13 | ### Environment Setup 14 | 15 | > Note: the code in this repo has been tested on Ubuntu 16.04/20.04 with Python 3.7, CUDA 10.1/10.2, PyTorch 1.7.1 and MinkowskiEngine 0.5.1. It may work for other setups, but has not been tested. 16 | 17 | 18 | Before proceding, make sure CUDA is installed and set up correctly. 19 | 20 | After cloning this reposiory you can proceed by setting up and activating a virual environment with Python 3.7. 
If you are using a different version of cuda (10.1) change the pytorch installation instruction accordingly. 21 | 22 | ```bash 23 | export CXX=g++-7 24 | conda config --append channels conda-forge 25 | conda create --name rigid_3dsf python=3.7 26 | source activate rigid_3dsf 27 | conda install --file requirements.txt 28 | conda install -c open3d-admin open3d=0.9.0.0 29 | conda install -c intel scikit-learn 30 | conda install pytorch==1.7.1 torchvision cudatoolkit=10.1 -c pytorch 31 | ``` 32 | You can then proceed and install [MinkowskiEngine](https://github.com/NVIDIA/MinkowskiEngine) library for sparse tensors: 33 | 34 | ```bash 35 | pip install -U git+https://github.com/NVIDIA/MinkowskiEngine -v --no-deps 36 | ``` 37 | Our repository also includes a pytorch implementation of [Chamfer Distance](https://github.com/chrdiller/pyTorchChamferDistance) in `./utils/chamfer_distance` which will be compiled on the first run. 38 | 39 | In order to test if Pytorch and MinkwoskiEngine are installed correctly please run 40 | ```bash 41 | python -c "import torch, MinkowskiEngine" 42 | ``` 43 | which should run without an error message. 44 | 45 | ### Data 46 | 47 | We provide the preprocessed data of *flying_things_3d* (108GB), *stereo_kitti* (500MB), *lidar_kitti* (~160MB), *semantic_kitti* (78GB), and *waymo_open* (50GB) used for training and evaluating our model. 48 | 49 | To download a single dataset please run: 50 | 51 | ```bash 52 | bash ./scripts/download_data.sh name_of_the_dataset 53 | ``` 54 | 55 | To download all datasets simply run: 56 | 57 | ```bash 58 | bash ./scripts/download_data.sh 59 | ``` 60 | The data will be downloaded and extracted to `./data/name_of_the_dataset/`. 61 | 62 | ### Pretrained models 63 | 64 | We provide the checkpoints of the models trained on *flying_things_3d* or *semantic_kitti*, which we use in our main evaluations. 65 | 66 | To download these models please run: 67 | 68 | ```bash 69 | bash ./scripts/download_pretrained_models.sh 70 | ``` 71 | 72 | Additionally, we provide all the models used in the ablation studies and the model fine tuned on *waymo_open*. 73 | 74 | To download these models please run: 75 | 76 | ```bash 77 | bash ./scripts/download_pretrained_models_ablations.sh 78 | ``` 79 | 80 | All the models will be downloaded and extracted to `./logs/dataset_used_for_training/`. 81 | 82 | ### Evaluation with pretrained models 83 | 84 | Our method with pretrained weights can be evaluated using the `./eval.py` script. The configuration parameters of the evaluation can be set with the `*.yaml` configuration files located in `./configs/eval/`. We provide a configuration file for each dataset used in our paper. For all evaluations please first download the pretrained weights and the corresponding data. Note, if the data or pretrained models are saved to a non-default path the config files also has to be adapted accordingly. 85 | 86 | #### *FlyingThings3D* 87 | 88 | To evaluate our backbone + scene flow head on *FlyingThings3d* please run: 89 | 90 | ```shell 91 | python eval.py ./configs/eval/eval_flying_things_3d.yaml 92 | ``` 93 | This should recreate the results from the Table 1 of our paper (EPE3D: 0.052 m). 94 | 95 | #### *stereoKITTI* 96 | 97 | To evaluate our backbone + scene flow head on *stereoKITTI* please run: 98 | 99 | ```shell 100 | python eval.py ./configs/eval/eval_stereo_kitti.yaml 101 | ``` 102 | This should again recreate the results from the Table 1 of our paper (EPE3D: 0.042 m). 
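The reported flow metrics (EPE3D, Acc3DS, Acc3DR, Outliers) follow the definitions that are standard in the scene flow literature. Below is a minimal NumPy sketch of how such numbers are computed from predicted and ground-truth flow; the 0.05 m/0.10 m/0.30 m and 5%/10% thresholds are the commonly used values from that literature, not taken from this repository's code:

```python
import numpy as np

def flow_metrics(flow_pred, flow_gt):
    """Scene flow metrics for [N, 3] flow fields given in meters."""
    epe = np.linalg.norm(flow_pred - flow_gt, axis=1)      # per-point end-point error
    rel = epe / (np.linalg.norm(flow_gt, axis=1) + 1e-4)   # relative end-point error

    return {
        'EPE3D': epe.mean(),                                # mean end-point error [m]
        'Acc3DS': np.mean((epe < 0.05) | (rel < 0.05)),     # strict accuracy
        'Acc3DR': np.mean((epe < 0.10) | (rel < 0.10)),     # relaxed accuracy
        'Outliers': np.mean((epe > 0.30) | (rel > 0.10)),   # outlier ratio
    }
```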
103 | 104 | #### *lidarKITTI* 105 | 106 | To evaluate our full weakly supervised method on *lidarKITTI* please run: 107 | 108 | ```shell 109 | python eval.py ./configs/eval/eval_lidar_kitti.yaml 110 | ``` 111 | This should recreate the results for Ours++ on *lidarKITTI* (w/o ground) from the Table 2 of our paper (EPE3D: 0.094 m). To recreate other results on *lidarKITTI* please change the `./configs/eval/eval_lidar_kitti.yaml` file accordingly. 112 | 113 | 114 | #### *semanticKITTI* 115 | 116 | To evaluate our full weakly supervised method on *semanticKITTI* please run: 117 | 118 | ```shell 119 | python eval.py ./configs/eval/eval_semantic_kitti.yaml 120 | ``` 121 | This should recreate the results of our full model on *semanticKITTI* (w/o ground) from the Table 4 of our paper. To recreate other results on *semanticKITTI* please change the `./configs/eval/eval_semantic_kitti.yaml` file accordingly. 122 | 123 | #### *waymo open* 124 | 125 | To evaluate our fine-tuned model on *waymo open* please run: 126 | 127 | ```shell 128 | python eval.py ./configs/eval/eval_waymo_open.yaml 129 | ``` 130 | This should recreate the results for Ours++ (fine-tuned) from the Table 9 of the appendix. To recreate other results on *waymo open* please change the `./configs/eval/eval_waymo_open.yaml` file accordingly. 131 | 132 | 133 | ### Training our method from scratch 134 | 135 | Our method can be trained using the `./train.py` script. The configuration parameters of the training process can be set using the config files located in `./configs/train/`. 136 | 137 | #### Training our backbone with full supervision on *FlyingThings3D* 138 | 139 | To train our backbone network and scene flow head under full supervision (corresponds to Sec. 4.3 of our paper) please run: 140 | 141 | ```shell 142 | python train.py ./configs/train/train_fully_supervised.yaml 143 | ``` 144 | 145 | The checkpoints and tensorboard data will be saved to `./logs/logs_FlyingThings3D_ME`. If you run out of GPU memory with the default setting please adapt the `batch_size` and `acc_iter_size` in the `./configs/default.yaml` to e.g. 4 and 2, respectively. 146 | 147 | #### Training under weak supervision on *semanticKITTI* 148 | 149 | To train our full method under weak supervision on *semanticKITTI* please run 150 | 151 | ```shell 152 | python train.py ./configs/train/train_weakly_supervised.yaml 153 | ``` 154 | 155 | The checkpoints and tensorboard data will be saved to `./logs/logs_SemanticKITTI_ME`. If you run out of GPU memory with the default setting please adapt the `batch_size` and `acc_iter_size` in the `./configs/default.yaml` to e.g. 4 and 2, respectively. 156 | 157 | ### Citation 158 | 159 | If you found this code or paper useful, please consider citing: 160 | 161 | ```shell 162 | @misc{gojcic2021weakly3dsf, 163 | title = {Weakly {S}upervised {L}earning of {R}igid {3D} {S}cene {F}low}, 164 | author = {Gojcic, Zan and Litany, Or and Wieser, Andreas and Guibas, Leonidas J and Birdal, Tolga}, 165 | year = {2021}, 166 | eprint={2102.08945}, 167 | archivePrefix={arXiv}, 168 | primaryClass={cs.CV} 169 | } 170 | ``` 171 | ### Contact 172 | If you run into any problems or have questions, please create an issue or contact [Zan Gojcic](zgojcic@ethz.ch). 
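### Inspecting the preprocessed data

Each preprocessed sample is a compressed `.npz` archive. As a quick sanity check of a downloaded dataset, here is a minimal sketch of inspecting one *stereoKITTI* frame; the keys shown are the ones written by `./data_preprocessing/process_kitti.py`, the released archives are assumed to follow the same layout, and the file name is just one entry from the test split:

```python
import numpy as np

# One frame listed in ./configs/datasets/stereo_kitti/test.txt (path assumed,
# adjust it to wherever the archive was extracted)
sample = np.load('./data/stereo_kitti/000002.npz')

pc1 = sample['pc1']            # points of the first frame, [N, 3]
pc2 = sample['pc2']            # corresponding points of the second frame, [N, 3]
flow = sample['flow']          # ground-truth flow (pc2 - pc1), [N, 3]
inst_pc1 = sample['inst_pc1']  # per-point instance labels of the first frame, [N]

print(pc1.shape, flow.shape, np.unique(inst_pc1).size)
```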
173 | 174 | 175 | ### Acknowledgments 176 | In this project we use parts of the official implementations of: 177 | 178 | - [MinkowskiEngine](https://github.com/NVIDIA/MinkowskiEngine) 179 | - [MultiviewReg](https://github.com/zgojcic/3D_multiview_reg) 180 | - [RPMNet](https://github.com/yewzijian/RPMNet) 181 | - [FLOT](https://github.com/valeoai/FLOT) 182 | 183 | We thank the respective authors for open sourcing their methods. -------------------------------------------------------------------------------- /assets/network_architecture.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zgojcic/Rigid3DSceneFlow/ee064203873951ee42e058c35924878de8446eb8/assets/network_architecture.jpg -------------------------------------------------------------------------------- /configs/datasets/lidar_kitti/test.txt: -------------------------------------------------------------------------------- 1 | 000002.npz 2 | 000003.npz 3 | 000007.npz 4 | 000008.npz 5 | 000009.npz 6 | 000010.npz 7 | 000011.npz 8 | 000012.npz 9 | 000013.npz 10 | 000014.npz 11 | 000015.npz 12 | 000016.npz 13 | 000017.npz 14 | 000018.npz 15 | 000019.npz 16 | 000020.npz 17 | 000021.npz 18 | 000022.npz 19 | 000023.npz 20 | 000024.npz 21 | 000025.npz 22 | 000026.npz 23 | 000027.npz 24 | 000028.npz 25 | 000029.npz 26 | 000030.npz 27 | 000031.npz 28 | 000032.npz 29 | 000033.npz 30 | 000034.npz 31 | 000035.npz 32 | 000036.npz 33 | 000037.npz 34 | 000038.npz 35 | 000039.npz 36 | 000040.npz 37 | 000041.npz 38 | 000042.npz 39 | 000043.npz 40 | 000044.npz 41 | 000045.npz 42 | 000046.npz 43 | 000047.npz 44 | 000048.npz 45 | 000049.npz 46 | 000050.npz 47 | 000051.npz 48 | 000052.npz 49 | 000053.npz 50 | 000054.npz 51 | 000055.npz 52 | 000056.npz 53 | 000057.npz 54 | 000058.npz 55 | 000059.npz 56 | 000060.npz 57 | 000061.npz 58 | 000062.npz 59 | 000063.npz 60 | 000064.npz 61 | 000065.npz 62 | 000066.npz 63 | 000067.npz 64 | 000068.npz 65 | 000069.npz 66 | 000070.npz 67 | 000071.npz 68 | 000072.npz 69 | 000073.npz 70 | 000074.npz 71 | 000075.npz 72 | 000076.npz 73 | 000077.npz 74 | 000078.npz 75 | 000079.npz 76 | 000080.npz 77 | 000081.npz 78 | 000083.npz 79 | 000084.npz 80 | 000085.npz 81 | 000086.npz 82 | 000088.npz 83 | 000089.npz 84 | 000090.npz 85 | 000091.npz 86 | 000092.npz 87 | 000093.npz 88 | 000094.npz 89 | 000095.npz 90 | 000096.npz 91 | 000097.npz 92 | 000098.npz 93 | 000105.npz 94 | 000106.npz 95 | 000107.npz 96 | 000108.npz 97 | 000109.npz 98 | 000110.npz 99 | 000111.npz 100 | 000112.npz 101 | 000113.npz 102 | 000114.npz 103 | 000115.npz 104 | 000116.npz 105 | 000117.npz 106 | 000118.npz 107 | 000119.npz 108 | 000120.npz 109 | 000121.npz 110 | 000122.npz 111 | 000123.npz 112 | 000124.npz 113 | 000125.npz 114 | 000126.npz 115 | 000127.npz 116 | 000128.npz 117 | 000129.npz 118 | 000130.npz 119 | 000131.npz 120 | 000132.npz 121 | 000141.npz 122 | 000142.npz 123 | 000143.npz 124 | 000144.npz 125 | 000145.npz 126 | 000146.npz 127 | 000147.npz 128 | 000148.npz 129 | 000149.npz 130 | 000150.npz 131 | 000155.npz 132 | 000157.npz 133 | 000158.npz 134 | 000159.npz 135 | 000160.npz 136 | 000161.npz 137 | 000162.npz 138 | 000163.npz 139 | 000164.npz 140 | 000168.npz 141 | 000169.npz 142 | 000199.npz 143 | -------------------------------------------------------------------------------- /configs/datasets/stereo_kitti/test.txt: -------------------------------------------------------------------------------- 1 | 000002.npz 2 | 000003.npz 3 | 000007.npz 4 | 000008.npz 5 | 000009.npz 6 | 
000010.npz 7 | 000011.npz 8 | 000012.npz 9 | 000013.npz 10 | 000014.npz 11 | 000015.npz 12 | 000016.npz 13 | 000017.npz 14 | 000018.npz 15 | 000019.npz 16 | 000020.npz 17 | 000021.npz 18 | 000022.npz 19 | 000023.npz 20 | 000024.npz 21 | 000025.npz 22 | 000026.npz 23 | 000027.npz 24 | 000028.npz 25 | 000029.npz 26 | 000030.npz 27 | 000031.npz 28 | 000032.npz 29 | 000033.npz 30 | 000034.npz 31 | 000035.npz 32 | 000036.npz 33 | 000037.npz 34 | 000038.npz 35 | 000039.npz 36 | 000040.npz 37 | 000041.npz 38 | 000042.npz 39 | 000043.npz 40 | 000044.npz 41 | 000045.npz 42 | 000046.npz 43 | 000047.npz 44 | 000048.npz 45 | 000049.npz 46 | 000050.npz 47 | 000051.npz 48 | 000052.npz 49 | 000053.npz 50 | 000054.npz 51 | 000055.npz 52 | 000056.npz 53 | 000057.npz 54 | 000058.npz 55 | 000059.npz 56 | 000060.npz 57 | 000061.npz 58 | 000062.npz 59 | 000063.npz 60 | 000064.npz 61 | 000065.npz 62 | 000066.npz 63 | 000067.npz 64 | 000068.npz 65 | 000069.npz 66 | 000070.npz 67 | 000071.npz 68 | 000072.npz 69 | 000073.npz 70 | 000074.npz 71 | 000075.npz 72 | 000076.npz 73 | 000077.npz 74 | 000078.npz 75 | 000079.npz 76 | 000080.npz 77 | 000081.npz 78 | 000083.npz 79 | 000084.npz 80 | 000085.npz 81 | 000086.npz 82 | 000088.npz 83 | 000089.npz 84 | 000090.npz 85 | 000091.npz 86 | 000092.npz 87 | 000093.npz 88 | 000094.npz 89 | 000095.npz 90 | 000096.npz 91 | 000097.npz 92 | 000098.npz 93 | 000105.npz 94 | 000106.npz 95 | 000107.npz 96 | 000108.npz 97 | 000109.npz 98 | 000110.npz 99 | 000111.npz 100 | 000112.npz 101 | 000113.npz 102 | 000114.npz 103 | 000115.npz 104 | 000116.npz 105 | 000117.npz 106 | 000118.npz 107 | 000119.npz 108 | 000120.npz 109 | 000121.npz 110 | 000122.npz 111 | 000123.npz 112 | 000124.npz 113 | 000125.npz 114 | 000126.npz 115 | 000127.npz 116 | 000128.npz 117 | 000129.npz 118 | 000130.npz 119 | 000131.npz 120 | 000132.npz 121 | 000141.npz 122 | 000142.npz 123 | 000143.npz 124 | 000144.npz 125 | 000145.npz 126 | 000146.npz 127 | 000147.npz 128 | 000148.npz 129 | 000149.npz 130 | 000150.npz 131 | 000155.npz 132 | 000157.npz 133 | 000158.npz 134 | 000159.npz 135 | 000160.npz 136 | 000161.npz 137 | 000162.npz 138 | 000163.npz 139 | 000164.npz 140 | 000168.npz 141 | 000169.npz 142 | 000199.npz 143 | -------------------------------------------------------------------------------- /configs/default.yaml: -------------------------------------------------------------------------------- 1 | method: 2 | backbone: '' 3 | 4 | misc: 5 | voxel_size: 0.10 # Size of the voxel grid used for downsampling 6 | num_points: 8192 # Number of points 7 | trainer: 'FlowTrainer' # Which class of trainer to use. Can be used if multiple different trainers are defined. 
8 | use_gpu: True # If GPU should be used or not 9 | log_dir: ./logs/ # Path to the base folder where the models and logs will be saves 10 | 11 | data: 12 | input_features: absolute_coords # Input features to use (assigned to each sparse voxel) 13 | only_near_points: True # Only consider near points (less than 35m away) [Used in all scene flow algorithms] 14 | 15 | train: 16 | batch_size: 8 # Training batch size 17 | acc_iter_size: 1 # Number of iterration to accumulate the gradients before optimizer step (can be used if the gpu memory is too low) 18 | num_workers: 6 # Number of workers used for the data loader 19 | max_epoch: 5000 # Max number of training epochs 20 | stat_interval: 250 # Interval at which the stats are printed out and saved for the tensorboard (if positive it denotes iteration if negative epochs) 21 | chkpt_interval: 500 # Interval at which the model is saved (if positive it denotes iteration if negative epochs) 22 | val_interval: 2000 # Interval at which the validation is performed (if positive it denotes iteration if negative epochs) 23 | weighted_seg_loss: True 24 | 25 | val: 26 | batch_size: 8 # Validation batch size 27 | num_workers: 6 # Number of workers for the validation data set 28 | 29 | test: 30 | results_dir: ./eval/ # Path to where to save the test results 31 | batch_size: 1 # Test batch size 32 | num_workers: 1 # Num of workers to use for the test data set 33 | 34 | loss: 35 | bg_loss_w: 1.0 # Weight of the background loss 36 | fg_loss_w: 1.0 # Weight of the foreground loss 37 | flow_loss_w: 1.0 # Weight of the flow loss 38 | ego_loss_w: 1.0 # Weight of the ego motion loss 39 | inlier_loss_w: 0.005 # Weight of the inlier loss (part of ego-motion) 40 | cd_loss_w: 0.5 # Wegihts of the chamfer distance loss 41 | rigid_loss_w: 1.0 # Wegihts of the rigidity loss 42 | 43 | optimizer: 44 | alg: Adam # Which optimizer to use 45 | learning_rate: 0.001 # Initial learning rate 46 | weight_decay: 0.0 # Weight decay weight 47 | momentum: 0.8 #Momentum 48 | scheduler: ExponentialLR 49 | exp_gamma: 0.98 50 | -------------------------------------------------------------------------------- /configs/demo/demo_flying_things_3d.yaml: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zgojcic/Rigid3DSceneFlow/ee064203873951ee42e058c35924878de8446eb8/configs/demo/demo_flying_things_3d.yaml -------------------------------------------------------------------------------- /configs/demo/demo_lidar_kitti.yaml: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zgojcic/Rigid3DSceneFlow/ee064203873951ee42e058c35924878de8446eb8/configs/demo/demo_lidar_kitti.yaml -------------------------------------------------------------------------------- /configs/eval/eval_flying_things_3d.yaml: -------------------------------------------------------------------------------- 1 | method: 2 | backbone: 'ME' # Type of backbone network [ME] 3 | flow: True # Use scene-flow head 4 | ego_motion: False # Use ego-motion head 5 | semantic: False # Use background segmentation head 6 | clustering: False # Use foreground clustering head 7 | 8 | misc: 9 | run_mode: test # Mode to run the network in 10 | 11 | data: 12 | dataset: FlyingThings3D_ME # Name of the dataset [StereoKITTI_ME, FlyingThings3D_ME, SemanticKITTI_ME, LidarKITTI_ME, WaymoOpen_ME] 13 | root: ./data/flying_things_3d/ # Path to the data 14 | input_features: absolute_coords # Input features assigned to each sparse voxel 15 
| n_classes: 2 # Number of classes for the background segmentation head 16 | remove_ground: False # Remove ground by simple thresholding of the height coordinate 17 | augment_data: False # Augment the data by random rotation and translation 18 | 19 | network: 20 | normalize_features: True 21 | norm_type: IN # Type of normalization layer IN = instance, BN = batch normalization, False = no normalization 22 | in_kernel_size: 7 # Size of the initial convolutional kernel 23 | feature_dim: 64 24 | ego_motion_points: 1024 # Number of points that are randomly sampled for the ego motion estimation 25 | add_slack: True # Add slack row and column in the Sinkhorn iteration module 26 | sinkhorn_iter: 3 # Number of Sinkhorn iterations in the ego motion module 27 | use_pretrained: True # Flag for training 28 | cluster_metric: euclidean # Distance metric used to compute the cluster assignments 0 = Euclidean 29 | min_p_cluster: 30 # Min number of points in a cluster 30 | min_samples_dbscan: 5 # Min number of points in the neighborhood DBSCAN 31 | eps_dbscan: 0.75 # Eps value in DBSCAN for the Euclidean distance 32 | pretrained_path: logs/logs_FlyingThings3D_ME/flow_head_l1_loss/model_best.pt # Path to the pretrained model 33 | 34 | test: 35 | batch_size: 1 # Test batch size 36 | num_workers: 1 # Num of workers to use for the test data set 37 | postprocess_ego: False # Apply postprocessing (optimization of the ego-motion) 38 | postprocess_clusters: False # Apply postprocessing (optimization of the motion across the clusters) 39 | results_dir: ./eval_results/flying_things_3d/ 40 | 41 | loss: 42 | background_loss: False # Compute background loss 43 | flow_loss: False # Compute flow loss 44 | ego_loss: False # Compute ego-motion loss 45 | foreground_loss: False # Compute foreground loss 46 | 47 | metrics: 48 | flow: True # Compute evaluation metrics for flow estimation (EPE3D, Acc3DS, Acc3DR, Outliers) 49 | ego_motion: False # Compute evaluation metrics for ego-motion estimation (RRE, RTE) 50 | semantic: False # Compute evaluation metrics for background segmentation (Precision, Recall) 51 | -------------------------------------------------------------------------------- /configs/eval/eval_lidar_kitti.yaml: -------------------------------------------------------------------------------- 1 | method: 2 | backbone: 'ME' # Type of backbone network [ME] 3 | flow: True # Use scene-flow head 4 | ego_motion: True # Use ego-motion head 5 | semantic: True # Use background segmentation head 6 | clustering: True # Use foreground clustering head 7 | 8 | misc: 9 | run_mode: test # Mode to run the network in 10 | 11 | data: 12 | dataset: LidarKITTI_ME # Name of the dataset [StereoKITTI_ME, FlyingThings3D_ME, SemanticKITTI_ME, LidarKITTI_ME, WaymoOpen_ME] 13 | root: ./data/lidar_kitti/ # Path to the data 14 | input_features: absolute_coords # Input features assigned to each sparse voxel 15 | n_classes: 2 # Number of classes for the background segmentation head 16 | remove_ground: True # Remove ground by simple thresholding of the height coordinate 17 | augment_data: False # Augment the data by random rotation and translation 18 | 19 | network: 20 | normalize_features: True 21 | norm_type: IN # Type of normalization layer IN = instance, BN = batch normalization, False = no normalization 22 | in_kernel_size: 7 # Size of the initial convolutional kernel 23 | feature_dim: 64 24 | ego_motion_points: 1024 # Number of points that are randomly sampled for the ego motion estimation 25 | add_slack: True # Add slack row and column in the 
Sinkhorn iteration module 26 | sinkhorn_iter: 3 # Number of Sinkhorn iterations in the ego motion module 27 | use_pretrained: True # Flag for training 28 | cluster_metric: euclidean # Distance metric used to compute the cluster assignments 0 = Euclidean 29 | min_p_cluster: 30 # Min number of points in a cluster 30 | min_samples_dbscan: 5 # Min number of points in the neighborhood DBSCAN 31 | eps_dbscan: 0.75 # Eps value in DBSCAN for the Euclidean distance 32 | pretrained_path: logs/logs_SemanticKITTI_ME/full_scratch_all_loss/model_best.pt # Path to the pretrained model 33 | 34 | test: 35 | batch_size: 1 # Test batch size 36 | num_workers: 1 # Num of workers to use for the test data set 37 | postprocess_ego: True # Apply postprocessing (optimization of the ego-motion) 38 | postprocess_clusters: True # Apply postprocessing (optimization of the motion across the clusters) 39 | results_dir: ./eval_results/lidar_kitti/ 40 | 41 | loss: 42 | background_loss: False # Compute background loss 43 | flow_loss: False # Compute flow loss 44 | ego_loss: False # Compute ego-motion loss 45 | foreground_loss: False # Compute foreground loss 46 | 47 | metrics: 48 | flow: True # Compute evaluation metrics for flow estimation (EPE3D, Acc3DS, Acc3DR, Outliers) 49 | ego_motion: True # Compute evaluation metrics for ego-motion estimation (RRE, RTE) 50 | semantic: False # Compute evaluation metrics for background segmentation (Precision, Recall) -------------------------------------------------------------------------------- /configs/eval/eval_semantic_kitti.yaml: -------------------------------------------------------------------------------- 1 | method: 2 | backbone: 'ME' # Type of backbone network [ME] 3 | flow: True # Use scene-flow head 4 | ego_motion: True # Use ego-motion head 5 | semantic: True # Use background segmentation head 6 | clustering: True # Use foreground clustering head 7 | 8 | misc: 9 | run_mode: test # Mode to run the network in 10 | 11 | data: 12 | dataset: SemanticKITTI_ME # Name of the dataset [StereoKITTI_ME, FlyingThings3D_ME, SemanticKITTI_ME, LidarKITTI_ME, WaymoOpen_ME] 13 | root: ./data/semantic_kitti/ # Path to the data 14 | input_features: absolute_coords # Input features assigned to each sparse voxel 15 | n_classes: 2 # Number of classes for the background segmentation head 16 | remove_ground: True # Remove ground by simple thresholding of the height coordinate 17 | augment_data: False # Augment the data by random rotation and translation 18 | 19 | network: 20 | normalize_features: True 21 | norm_type: IN # Type of normalization layer IN = instance, BN = batch normalization, False = no normalization 22 | in_kernel_size: 7 # Size of the initial convolutional kernel 23 | feature_dim: 64 24 | ego_motion_points: 1024 # Number of points that are randomly sampled for the ego motion estimation 25 | add_slack: True # Add slack row and column in the Sinkhorn iteration module 26 | sinkhorn_iter: 3 # Number of Sinkhorn iterations in the ego motion module 27 | use_pretrained: True # Flag for training 28 | cluster_metric: euclidean # Distance metric used to compute the cluster assignments 0 = Euclidean 29 | min_p_cluster: 30 # Min number of points in a cluster 30 | min_samples_dbscan: 5 # Min number of points in the neighborhood DBSCAN 31 | eps_dbscan: 0.75 # Eps value in DBSCAN for the Euclidean distance 32 | pretrained_path: logs/logs_SemanticKITTI_ME/full_scratch_all_loss/model_best.pt # Path to the pretrained model 33 | 34 | test: 35 | batch_size: 1 # Test batch size 36 | num_workers: 1 # 
Num of workers to use for the test data set 37 | postprocess_ego: True # Apply postprocessing (optimization of the ego-motion) 38 | postprocess_clusters: True # Apply postprocessing (optimization of the motion across the clusters) 39 | results_dir: ./eval_results/semantic_kitti/ 40 | 41 | loss: 42 | background_loss: False # Compute background loss 43 | flow_loss: False # Compute flow loss 44 | ego_loss: False # Compute ego-motion loss 45 | foreground_loss: False # Compute foreground loss 46 | 47 | metrics: 48 | flow: False # Compute evaluation metrics for flow estimation (EPE3D, Acc3DS, Acc3DR, Outliers) 49 | ego_motion: True # Compute evaluation metrics for ego-motion estimation (RRE, RTE) 50 | semantic: True # Compute evaluation metrics for background segmentation (Precision, Recall) 51 | 52 | -------------------------------------------------------------------------------- /configs/eval/eval_stereo_kitti.yaml: -------------------------------------------------------------------------------- 1 | method: 2 | backbone: 'ME' # Type of backbone network [ME] 3 | flow: True # Use scene-flow head 4 | ego_motion: False # Use ego-motion head 5 | semantic: False # Use background segmentation head 6 | clustering: False # Use foreground clustering head 7 | 8 | misc: 9 | run_mode: test # Mode to run the network in 10 | 11 | data: 12 | dataset: StereoKITTI_ME # Name of the dataset [StereoKITTI_ME, FlyingThings3D_ME, SemanticKITTI_ME, LidarKITTI_ME, WaymoOpen_ME] 13 | root: ./data/stereo_kitti/ # Path to the data 14 | input_features: absolute_coords # Input features assigned to each sparse voxel 15 | n_classes: 2 # Number of classes for the background segmentation head 16 | remove_ground: True # Remove ground by simple thresholding of the height coordinate 17 | augment_data: False # Augment the data by random rotation and translation 18 | 19 | network: 20 | normalize_features: True 21 | norm_type: IN # Type of normalization layer IN = instance, BN = batch normalization, False = no normalization 22 | in_kernel_size: 7 # Size of the initial convolutional kernel 23 | feature_dim: 64 24 | ego_motion_points: 1024 # Number of points that are randomly sampled for the ego motion estimation 25 | add_slack: True # Add slack row and column in the Sinkhorn iteration module 26 | sinkhorn_iter: 3 # Number of Sinkhorn iterations in the ego motion module 27 | use_pretrained: True # Flag for training 28 | cluster_metric: euclidean # Distance metric used to compute the cluster assignments 0 = Euclidean 29 | min_p_cluster: 30 # Min number of points in a cluster 30 | min_samples_dbscan: 5 # Min number of points in the neighborhood DBSCAN 31 | eps_dbscan: 0.75 # Eps value in DBSCAN for the Euclidean distance 32 | pretrained_path: logs/logs_FlyingThings3D_ME/flow_head_l1_loss/model_best.pt # Path to the pretrained model 33 | 34 | test: 35 | batch_size: 1 # Test batch size 36 | num_workers: 1 # Num of workers to use for the test data set 37 | postprocess_ego: False # Apply postprocessing (optimization of the ego-motion) 38 | postprocess_clusters: False # Apply postprocessing (optimization of the motion across the clusters) 39 | results_dir: ./eval_results/stereo_kitti/ 40 | 41 | loss: 42 | background_loss: False # Compute background loss 43 | flow_loss: False # Compute flow loss 44 | ego_loss: False # Compute ego-motion loss 45 | foreground_loss: False # Compute foreground loss 46 | 47 | metrics: 48 | flow: True # Compute evaluation metrics for flow estimation (EPE3D, Acc3DS, Acc3DR, Outliers) 49 | ego_motion: False # Compute 
evaluation metrics for ego-motion estimation (RRE, RTE) 50 | semantic: False # Compute evaluation metrics for background segmentation (Precision, Recall) 51 | -------------------------------------------------------------------------------- /configs/eval/eval_waymo_open.yaml: -------------------------------------------------------------------------------- 1 | method: 2 | backbone: 'ME' # Type of backbone network [ME] 3 | flow: True # Use scene-flow head 4 | ego_motion: True # Use ego-motion head 5 | semantic: True # Use background segmentation head 6 | clustering: True # Use foreground clustering head 7 | 8 | misc: 9 | run_mode: test # Mode to run the network in 10 | 11 | data: 12 | dataset: WaymoOpen_ME # Name of the dataset [StereoKITTI_ME, FlyingThings3D_ME, SemanticKITTI_ME, LidarKITTI_ME, WaymoOpen_ME] 13 | root: ./data/waymo_open/ # Path to the data 14 | input_features: absolute_coords # Input features assigned to each sparse voxel 15 | n_classes: 2 # Number of classes for the background segmentation head 16 | remove_ground: False # Remove ground by simple thresholding of the height coordinate 17 | augment_data: False # Augment the data by random rotation and translation 18 | 19 | network: 20 | normalize_features: True 21 | norm_type: IN # Type of normalization layer IN = instance, BN = batch normalization, False = no normalization 22 | in_kernel_size: 7 # Size of the initial convolutional kernel 23 | feature_dim: 64 24 | ego_motion_points: 1024 # Number of points that are randomly sampled for the ego motion estimation 25 | add_slack: True # Add slack row and column in the Sinkhorn iteration module 26 | sinkhorn_iter: 3 # Number of Sinkhorn iterations in the ego motion module 27 | use_pretrained: True # Flag for training 28 | cluster_metric: euclidean # Distance metric used to compute the cluster assignments 0 = Euclidean 29 | min_p_cluster: 30 # Min number of points in a cluster 30 | min_samples_dbscan: 5 # Min number of points in the neighborhood DBSCAN 31 | eps_dbscan: 0.75 # Eps value in DBSCAN for the Euclidean distance 32 | pretrained_path: logs/logs_WaymoOpen_ME/full_pretrained_all_loss/model_best.pt # Path to the pretrained model 33 | 34 | test: 35 | batch_size: 1 # Test batch size 36 | num_workers: 1 # Num of workers to use for the test data set 37 | postprocess_ego: True # Apply postprocessing (optimization of the ego-motion) 38 | postprocess_clusters: True # Apply postprocessing (optimization of the motion across the clusters) 39 | results_dir: ./eval_results/waymo_open/ 40 | 41 | loss: 42 | background_loss: False # Compute background loss 43 | flow_loss: False # Compute flow loss 44 | ego_loss: False # Compute ego-motion loss 45 | foreground_loss: False # Compute foreground loss 46 | 47 | metrics: 48 | flow: False # Compute evaluation metrics for flow estimation (EPE3D, Acc3DS, Acc3DR, Outliers) 49 | ego_motion: True # Compute evaluation metrics for ego-motion estimation (RRE, RTE) 50 | semantic: True # Compute evaluation metrics for background segmentation (Precision, Recall) -------------------------------------------------------------------------------- /configs/train/train_fully_supervised.yaml: -------------------------------------------------------------------------------- 1 | method: 2 | backbone: 'ME' # Type of backbone network [ME] 3 | flow: True # Use scene-flow head 4 | ego_motion: False # Use ego-motion head 5 | semantic: False # Use background segmentation head 6 | clustering: False # Use foreground clustering head 7 | 8 | misc: 9 | run_mode: train # Mode to 
run the network in 10 | 11 | data: 12 | dataset: FlyingThings3D_ME # Name of the dataset [StereoKITTI_ME, FlyingThings3D_ME, SemanticKITTI_ME, LidarKITTI_ME, WaymoOpen_ME] 13 | root: ./data/flying_things_3d/ # Path to the data 14 | input_features: absolute_coords # Input features assigned to each sparse voxel 15 | remove_ground: False # Remove ground by simple thresholding of the height coordinate 16 | augment_data: False # Augment the data by random rotation and translation 17 | 18 | network: 19 | normalize_features: True # If the feature for the correspondence computation should be normalized 20 | norm_type: IN # Type of normalization layer IN = instance, BN = batch normalization, False = no normalization 21 | in_kernel_size: 7 # Size of the initial convolutional kernel 22 | feature_dim: 64 23 | use_pretrained: True # Flag for training 24 | pretrained_path: '' # Path to the pretrained model 25 | 26 | loss: 27 | background_loss: False # Compute background loss 28 | flow_loss: True # Compute flow loss 29 | ego_loss: False # Compute ego-motion loss 30 | foreground_loss: False # Compute foreground loss 31 | 32 | metrics: 33 | flow: True # Compute evaluation metrics for flow estimation (EPE3D, Acc3DS, Acc3DR, Outliers) 34 | ego_motion: False # Compute evaluation metrics for ego-motion estimation (RRE, RTE) 35 | semantic: False # Compute evaluation metrics for background segmentation (Precision, Recall) 36 | -------------------------------------------------------------------------------- /configs/train/train_weakly_supervised.yaml: -------------------------------------------------------------------------------- 1 | method: 2 | backbone: 'ME' # Type of backbone network [ME] 3 | flow: True # Use scene-flow head 4 | ego_motion: True # Use ego-motion head 5 | semantic: True # Use background segmentation head 6 | clustering: True # Use foreground clustering head 7 | 8 | misc: 9 | run_mode: train # Mode to run the network in 10 | 11 | data: 12 | dataset: SemanticKITTI_ME # Name of the dataset [StereoKITTI_ME, FlyingThings3D_ME, SemanticKITTI_ME, LidarKITTI_ME, WaymoOpen_ME] 13 | root: ./data/semantic_kitti/ # Path to the data 14 | input_features: absolute_coords # Input features assigned to each sparse voxel 15 | n_classes: 2 # Number of classes for the background segmentation head 16 | remove_ground: True # Remove ground by simple thresholding of the height coordinate 17 | augment_data: True # Augment the data by random rotation and translation 18 | 19 | network: 20 | normalize_features: True 21 | norm_type: IN # Type of normalization layer IN = instance, BN = batch normalization, False = no normalization 22 | in_kernel_size: 7 # Size of the initial convolutional kernel 23 | feature_dim: 64 24 | ego_motion_points: 1024 # Number of points that are randomly sampled for the ego motion estimation 25 | add_slack: True # Add slack row and column in the Sinkhorn iteration module 26 | sinkhorn_iter: 3 # Number of Sinkhorn iterations in the ego motion module 27 | use_pretrained: True # Flag for training 28 | cluster_metric: euclidean # Distance metric used to compute the cluster assignments 0 = Euclidean 29 | min_p_cluster: 30 # Min number of points in a cluster 30 | min_samples_dbscan: 5 # Min number of points in the neighborhood DBSCAN 31 | eps_dbscan: 0.75 # Eps value in DBSCAN for the Euclidean distance 32 | pretrained_path: '' # Path to the pretrained model 33 | 34 | loss: 35 | background_loss: True # Compute background loss 36 | flow_loss: False # Compute flow loss 37 | ego_loss: True # Compute 
ego-motion loss 38 | foreground_loss: True # Compute foreground loss 39 | 40 | metrics: 41 | flow: False # Compute evaluation metrics for flow estimation (EPE3D, Acc3DS, Acc3DR, Outliers) 42 | ego_motion: True # Compute evaluation metrics for ego-motion estimation (RRE, RTE) 43 | semantic: True # Compute evaluation metrics for background segmentation (Precision, Recall) 44 | 45 | -------------------------------------------------------------------------------- /data_preprocessing/IO.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3.4 2 | 3 | import os 4 | import re 5 | import numpy as np 6 | import uuid 7 | from scipy import misc 8 | import numpy as np 9 | from PIL import Image 10 | import sys 11 | 12 | 13 | def read(file): 14 | if file.endswith('.float3'): return readFloat(file) 15 | elif file.endswith('.flo'): return readFlow(file) 16 | elif file.endswith('.ppm'): return readImage(file) 17 | elif file.endswith('.pgm'): return readImage(file) 18 | elif file.endswith('.png'): return readImage(file) 19 | elif file.endswith('.jpg'): return readImage(file) 20 | elif file.endswith('.pfm'): return readPFM(file)[0] 21 | else: raise Exception('don\'t know how to read %s' % file) 22 | 23 | def write(file, data): 24 | if file.endswith('.float3'): return writeFloat(file, data) 25 | elif file.endswith('.flo'): return writeFlow(file, data) 26 | elif file.endswith('.ppm'): return writeImage(file, data) 27 | elif file.endswith('.pgm'): return writeImage(file, data) 28 | elif file.endswith('.png'): return writeImage(file, data) 29 | elif file.endswith('.jpg'): return writeImage(file, data) 30 | elif file.endswith('.pfm'): return writePFM(file, data) 31 | else: raise Exception('don\'t know how to write %s' % file) 32 | 33 | def readPFM(file): 34 | file = open(file, 'rb') 35 | 36 | color = None 37 | width = None 38 | height = None 39 | scale = None 40 | endian = None 41 | 42 | header = file.readline().rstrip() 43 | if header.decode("ascii") == 'PF': 44 | color = True 45 | elif header.decode("ascii") == 'Pf': 46 | color = False 47 | else: 48 | raise Exception('Not a PFM file.') 49 | 50 | dim_match = re.match(r'^(\d+)\s(\d+)\s$', file.readline().decode("ascii")) 51 | if dim_match: 52 | width, height = list(map(int, dim_match.groups())) 53 | else: 54 | raise Exception('Malformed PFM header.') 55 | 56 | scale = float(file.readline().decode("ascii").rstrip()) 57 | if scale < 0: # little-endian 58 | endian = '<' 59 | scale = -scale 60 | else: 61 | endian = '>' # big-endian 62 | 63 | data = np.fromfile(file, endian + 'f') 64 | shape = (height, width, 3) if color else (height, width) 65 | 66 | data = np.reshape(data, shape) 67 | data = np.flipud(data) 68 | return data, scale 69 | 70 | def writePFM(file, image, scale=1): 71 | file = open(file, 'wb') 72 | 73 | color = None 74 | 75 | if image.dtype.name != 'float32': 76 | raise Exception('Image dtype must be float32.') 77 | 78 | image = np.flipud(image) 79 | 80 | if len(image.shape) == 3 and image.shape[2] == 3: # color image 81 | color = True 82 | elif len(image.shape) == 2 or len(image.shape) == 3 and image.shape[2] == 1: # greyscale 83 | color = False 84 | else: 85 | raise Exception('Image must have H x W x 3, H x W x 1 or H x W dimensions.') 86 | 87 | file.write('PF\n' if color else 'Pf\n'.encode()) 88 | file.write('%d %d\n'.encode() % (image.shape[1], image.shape[0])) 89 | 90 | endian = image.dtype.byteorder 91 | 92 | if endian == '<' or endian == '=' and sys.byteorder == 'little': 93 | scale = -scale 
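    # In the PFM format the sign of the scale factor encodes byte order: a negative
    # scale means little-endian floats, a positive one big-endian (see readPFM above),
    # which is why the scale was negated above before being written to the header.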
94 | 95 | file.write('%f\n'.encode() % scale) 96 | 97 | image.tofile(file) 98 | 99 | def readFlow(name): 100 | if name.endswith('.pfm') or name.endswith('.PFM'): 101 | return readPFM(name)[0][:,:,0:2] 102 | 103 | f = open(name, 'rb') 104 | 105 | header = f.read(4) 106 | if header.decode("utf-8") != 'PIEH': 107 | raise Exception('Flow file header does not contain PIEH') 108 | 109 | width = np.fromfile(f, np.int32, 1).squeeze() 110 | height = np.fromfile(f, np.int32, 1).squeeze() 111 | 112 | flow = np.fromfile(f, np.float32, width * height * 2).reshape((height, width, 2)) 113 | 114 | return flow.astype(np.float32) 115 | 116 | def readImage(name): 117 | if name.endswith('.pfm') or name.endswith('.PFM'): 118 | data = readPFM(name)[0] 119 | if len(data.shape)==3: 120 | return data[:,:,0:3] 121 | else: 122 | return data 123 | 124 | return misc.imread(name) 125 | 126 | def writeImage(name, data): 127 | if name.endswith('.pfm') or name.endswith('.PFM'): 128 | return writePFM(name, data, 1) 129 | 130 | return misc.imsave(name, data) 131 | 132 | def writeFlow(name, flow): 133 | f = open(name, 'wb') 134 | f.write('PIEH'.encode('utf-8')) 135 | np.array([flow.shape[1], flow.shape[0]], dtype=np.int32).tofile(f) 136 | flow = flow.astype(np.float32) 137 | flow.tofile(f) 138 | 139 | def readFloat(name): 140 | f = open(name, 'rb') 141 | 142 | if(f.readline().decode("utf-8")) != 'float\n': 143 | raise Exception('float file %s did not contain keyword' % name) 144 | 145 | dim = int(f.readline()) 146 | 147 | dims = [] 148 | count = 1 149 | for i in range(0, dim): 150 | d = int(f.readline()) 151 | dims.append(d) 152 | count *= d 153 | 154 | dims = list(reversed(dims)) 155 | 156 | data = np.fromfile(f, np.float32, count).reshape(dims) 157 | if dim > 2: 158 | data = np.transpose(data, (2, 1, 0)) 159 | data = np.transpose(data, (1, 0, 2)) 160 | 161 | return data 162 | 163 | def writeFloat(name, data): 164 | f = open(name, 'wb') 165 | 166 | dim=len(data.shape) 167 | if dim>3: 168 | raise Exception('bad float file dimension: %d' % dim) 169 | 170 | f.write(('float\n').encode('ascii')) 171 | f.write(('%d\n' % dim).encode('ascii')) 172 | 173 | if dim == 1: 174 | f.write(('%d\n' % data.shape[0]).encode('ascii')) 175 | else: 176 | f.write(('%d\n' % data.shape[1]).encode('ascii')) 177 | f.write(('%d\n' % data.shape[0]).encode('ascii')) 178 | for i in range(2, dim): 179 | f.write(('%d\n' % data.shape[i]).encode('ascii')) 180 | 181 | data = data.astype(np.float32) 182 | if dim==2: 183 | data.tofile(f) 184 | 185 | else: 186 | np.transpose(data, (2, 0, 1)).tofile(f) 187 | 188 | -------------------------------------------------------------------------------- /data_preprocessing/flyingthings3d_utils.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | 3 | 4 | def next_pixel2pc(flow, disparity, save_path=None, f=-1050., cx=479.5, cy=269.5): 5 | height, width = disparity.shape 6 | 7 | BASELINE = 1.0 8 | depth = -1. * f * BASELINE / disparity 9 | 10 | x = ((np.tile(np.arange(width, dtype=np.float32)[None, :], (height, 1)) - cx + flow[..., 0]) * -1. / disparity)[:, 11 | :, None] 12 | y = ((np.tile(np.arange(height, dtype=np.float32)[:, None], (1, width)) - cy + flow[..., 1]) * 1. 
/ disparity)[:, 13 | :, None] 14 | pc = np.concatenate((x, y, depth[:, :, None]), axis=-1) 15 | 16 | if save_path is not None: 17 | np.save(save_path, pc) 18 | return pc 19 | 20 | 21 | def pixel2pc(disparity, save_path=None, f=-1050., cx=479.5, cy=269.5): 22 | height, width = disparity.shape 23 | 24 | BASELINE = 1.0 25 | depth = -1. * f * BASELINE / disparity 26 | 27 | x = ((np.tile(np.arange(width, dtype=np.float32)[None, :], (height, 1)) - cx) * -1. / disparity)[:, :, None] 28 | y = ((np.tile(np.arange(height, dtype=np.float32)[:, None], (1, width)) - cy) * 1. / disparity)[:, :, None] 29 | pc = np.concatenate((x, y, depth[:, :, None]), axis=-1) 30 | 31 | if save_path is not None: 32 | np.save(save_path, pc) 33 | return pc -------------------------------------------------------------------------------- /data_preprocessing/kitti_utils.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | import png 3 | 4 | 5 | def pixel2xyz(depth, P_rect, px=None, py=None): 6 | assert P_rect[0,1] == 0 7 | assert P_rect[1,0] == 0 8 | assert P_rect[2,0] == 0 9 | assert P_rect[2,1] == 0 10 | assert P_rect[0,0] == P_rect[1,1] 11 | focal_length_pixel = P_rect[0,0] 12 | 13 | height, width = depth.shape[:2] 14 | if px is None: 15 | px = np.tile(np.arange(width, dtype=np.float32)[None, :], (height, 1)) 16 | if py is None: 17 | py = np.tile(np.arange(height, dtype=np.float32)[:, None], (1, width)) 18 | const_x = P_rect[0,2] * depth + P_rect[0,3] 19 | const_y = P_rect[1,2] * depth + P_rect[1,3] 20 | 21 | x = ((px * (depth + P_rect[2,3]) - const_x) / focal_length_pixel) [:, :, None] 22 | y = ((py * (depth + P_rect[2,3]) - const_y) / focal_length_pixel) [:, :, None] 23 | pc = np.concatenate((x, y, depth[:, :, None]), axis=-1) 24 | 25 | pc[..., :2] *= -1. 26 | return pc 27 | 28 | 29 | def load_uint16PNG(fpath): 30 | reader = png.Reader(fpath) 31 | pngdata = reader.read() 32 | px_array = np.vstack( map(np.uint16, pngdata[2]) ) 33 | if pngdata[3]['planes'] == 3: 34 | width, height = pngdata[:2] 35 | px_array = px_array.reshape(height, width, 3) 36 | return px_array 37 | 38 | 39 | def load_disp(fpath): 40 | # A 0 value indicates an invalid pixel (ie, no 41 | # ground truth exists, or the estimation algorithm didn't produce an estimate 42 | # for that pixel). 43 | array = load_uint16PNG(fpath) 44 | valid = array > 0 45 | disp = array.astype(np.float32) / 256.0 46 | disp[np.logical_not(valid)] = -1. 47 | return disp, valid 48 | 49 | 50 | def load_op_flow(fpath): 51 | array = load_uint16PNG(fpath) 52 | valid = array[..., -1] == 1 53 | array = array.astype(np.float32) 54 | flow = (array[..., :-1] - 2**15) / 64. 55 | return flow, valid 56 | 57 | 58 | def disp_2_depth(disparity, valid_disp, FOCAL_LENGTH_PIXEL): 59 | BASELINE = 0.54 60 | depth = FOCAL_LENGTH_PIXEL * BASELINE / (disparity + 1e-5) 61 | depth[np.logical_not(valid_disp)] = -1. 
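    # BASELINE = 0.54 m is the KITTI stereo rig baseline; pixels without a valid
    # disparity are flagged with depth -1 (mirroring load_disp above) so that
    # callers can mask them out.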
62 | return depth 63 | -------------------------------------------------------------------------------- /data_preprocessing/process_flyingthings3d_subset.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | import sys, os 3 | import os.path as osp 4 | from multiprocessing import Pool 5 | import argparse 6 | 7 | import IO 8 | from flyingthings3d_utils import * 9 | 10 | parser = argparse.ArgumentParser() 11 | parser.add_argument('--raw_data_path', type=str, help="path to the raw data") 12 | parser.add_argument('--save_path', type=str, help="save path") 13 | parser.add_argument('--only_save_near_pts', dest='save_near', action='store_true', 14 | help='only save near points to save disk space') 15 | 16 | args = parser.parse_args() 17 | root_path = args.raw_data_path 18 | save_path = args.save_path 19 | 20 | splits = ['train', 'val'] 21 | 22 | 23 | def process_one_file(params): 24 | try: 25 | train_val, fname = params 26 | 27 | save_folder_path = osp.join(save_path, train_val) 28 | os.makedirs(save_folder_path, exist_ok=True) 29 | 30 | disp1 = IO.read(osp.join(root_path, train_val, 'disparity', 'left', fname + '.pfm')) 31 | disp1_occ = IO.read(osp.join(root_path, train_val, 'disparity_occlusions', 'left', fname + '.png')) 32 | disp1_change = IO.read( 33 | osp.join(root_path, train_val, 'disparity_change', 'left', 'into_future', fname + '.pfm')) 34 | flow = IO.read(osp.join(root_path, train_val, 'flow', 'left', 'into_future', fname + '.flo')) 35 | flow_occ = IO.read(osp.join(root_path, train_val, 'flow_occlusions', 'left', 'into_future', fname + '.png')) 36 | instance_mask = IO.read(osp.join(root_path, train_val, 'object_ids', 'left', fname + '.png')) 37 | 38 | pc1 = pixel2pc(disp1) 39 | pc2 = next_pixel2pc(flow, disp1 + disp1_change) 40 | 41 | if pc1[..., -1].max() > 0 or pc2[..., -1].max() > 0: 42 | print('z > 0', train_val, fname, pc1[..., -1].max(), pc1[..., -1].min(), pc2[..., -1].max(), 43 | pc2[..., -1].min()) 44 | 45 | valid_mask = np.logical_and(disp1_occ == 0, flow_occ == 0) 46 | 47 | pc1 = pc1[valid_mask] 48 | pc2 = pc2[valid_mask] 49 | flow = pc2 - pc1 50 | instance_mask = instance_mask[valid_mask] 51 | 52 | inst_cnt = 0 53 | 54 | 55 | 56 | if args.save_near: 57 | near_mask = np.logical_and(pc1[..., -1] > -35., pc2[..., -1] > -35.) 58 | pc1 = pc1[near_mask] 59 | pc2 = pc2[near_mask] 60 | flow = flow[near_mask] 61 | instance_mask = instance_mask[near_mask] 62 | 63 | # Map instance labels to small numbers 64 | for inst_label in set(instance_mask.tolist()): 65 | instance_mask[instance_mask==inst_label] = inst_cnt 66 | inst_cnt += 1 67 | 68 | if not args.save_near: 69 | np.savez_compressed(osp.join(save_folder_path, '{}.npz'.format(fname)), pc1=pc1, pc2=pc2, 70 | flow=flow, inst_pc1=instance_mask, 71 | inst_pc2=instance_mask) 72 | else: 73 | near_mask = np.logical_and(pc1[..., -1] > -35., pc2[..., -1] > -35.) 
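            # pixel2pc() yields negative z for points in front of the camera (see the
            # 'z > 0' warning above), so z > -35. keeps only points within 35 m of the
            # camera (the 'near points' cut that only_near_points in the configs
            # refers to) before saving.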
74 | np.savez_compressed(osp.join(save_folder_path, '{}.npz'.format(fname)), pc1=pc1[near_mask], 75 | pc2=pc2[near_mask], flow=flow[near_mask], 76 | inst_pc1=instance_mask[near_mask], 77 | inst_pc2=instance_mask[near_mask]) 78 | 79 | #np.savetxt('test.csv',np.concatenate((pc1,instance_mask.reshape(-1,1)),axis=1)) 80 | except Exception as ex: 81 | print('error in addressing params', params, 'see exception:') 82 | print(ex) 83 | sys.stdout.flush() 84 | return 85 | 86 | 87 | if __name__ == '__main__': 88 | param_list = [] 89 | for train_val in splits: 90 | tmp_path = osp.join(root_path, train_val, 'disparity_change', 'left', 'into_future') 91 | param_list.extend([(train_val, item.split('.')[0]) for item in os.listdir(tmp_path)]) 92 | 93 | pool = Pool(10) 94 | pool.map(process_one_file, param_list) 95 | pool.close() 96 | pool.join() 97 | 98 | print('Finish all!') 99 | -------------------------------------------------------------------------------- /data_preprocessing/process_kitti.py: -------------------------------------------------------------------------------- 1 | import os, sys 2 | import os.path as osp 3 | import numpy as np 4 | from multiprocessing import Pool 5 | import IO 6 | 7 | from kitti_utils import * 8 | 9 | 10 | data_root = sys.argv[1] 11 | calib_root = osp.join(data_root, 'training/calib_cam_to_cam') 12 | disp1_root = osp.join(data_root, 'training/disp_occ_0') 13 | disp2_root = osp.join(data_root, 'training/disp_occ_1') 14 | op_flow_root = osp.join(data_root, 'training/flow_occ') 15 | obj_map_root = osp.join(data_root, 'training/obj_map') 16 | save_path = sys.argv[2] 17 | 18 | 19 | def process_one_frame(idx): 20 | sidx = '{:06d}'.format(idx) 21 | 22 | calib_path = osp.join(calib_root, sidx + '.txt') 23 | with open(calib_path) as fd: 24 | lines = fd.readlines() 25 | assert len([line for line in lines if line.startswith('P_rect_02')]) == 1 26 | P_rect_left = \ 27 | np.array([float(item) for item in 28 | [line for line in lines if line.startswith('P_rect_02')][0].split()[1:]], 29 | dtype=np.float32).reshape(3, 4) 30 | 31 | assert P_rect_left[0, 0] == P_rect_left[1, 1] 32 | focal_length_pixel = P_rect_left[0, 0] 33 | 34 | disp1_path = osp.join(disp1_root, sidx + '_10.png') 35 | disp1, valid_disp1 = load_disp(disp1_path) 36 | depth1 = disp_2_depth(disp1, valid_disp1, focal_length_pixel) 37 | pc1 = pixel2xyz(depth1, P_rect_left) 38 | 39 | disp2_path = osp.join(disp2_root, sidx + '_10.png') 40 | disp2, valid_disp2 = load_disp(disp2_path) 41 | depth2 = disp_2_depth(disp2, valid_disp2, focal_length_pixel) 42 | 43 | valid_disp = np.logical_and(valid_disp1, valid_disp2) 44 | 45 | op_flow, valid_op_flow = load_op_flow(osp.join(op_flow_root, '{:06d}_10.png'.format(idx))) 46 | vertical = op_flow[..., 1] 47 | horizontal = op_flow[..., 0] 48 | height, width = op_flow.shape[:2] 49 | 50 | px2 = np.zeros((height, width), dtype=np.float32) 51 | py2 = np.zeros((height, width), dtype=np.float32) 52 | 53 | obj_map_1 = osp.join(obj_map_root, sidx + '_10.png') 54 | instance_mask = IO.read(obj_map_1) 55 | 56 | for i in range(height): 57 | for j in range(width): 58 | if valid_op_flow[i, j] and valid_disp[i, j]: 59 | try: 60 | dx = horizontal[i, j] 61 | dy = vertical[i, j] 62 | except: 63 | print('error, i,j:', i, j, 'hor and ver:', horizontal[i, j], vertical[i, j]) 64 | continue 65 | 66 | px2[i, j] = j + dx 67 | py2[i, j] = i + dy 68 | 69 | pc2 = pixel2xyz(depth2, P_rect_left, px=px2, py=py2) 70 | 71 | # Only consider points/pixels with valid disparity and flow information 72 | final_mask = 
np.logical_and(valid_disp, valid_op_flow) 73 | 74 | valid_pc1 = pc1[final_mask] 75 | valid_pc2 = pc2[final_mask] 76 | flow = valid_pc2 - valid_pc1 77 | instance_mask = instance_mask[final_mask] 78 | 79 | np.savetxt('test.csv',np.concatenate((valid_pc2,instance_mask.reshape(-1,1)),axis=1)) 80 | 81 | 82 | np.savez_compressed(osp.join(save_path, '{:06d}.npz'.format(idx)), pc1=valid_pc1, pc2=valid_pc2, 83 | flow=flow, inst_pc1=instance_mask, 84 | inst_pc2=instance_mask) 85 | 86 | 87 | pool = Pool(10) 88 | indices = range(200) 89 | pool.map(process_one_frame, indices) 90 | pool.close() 91 | pool.join() 92 | -------------------------------------------------------------------------------- /data_preprocessing/process_semantKitti.py: -------------------------------------------------------------------------------- 1 | import os 2 | import glob 3 | import argparse 4 | import re 5 | import copy 6 | 7 | import open3d as o3d 8 | import numpy as np 9 | from multiprocessing import Pool 10 | 11 | # Some of the functions are taken from pykitti https://github.com/utiasSTARS/pykitti/blob/master/pykitti/utils.py 12 | def load_velo_scan(file): 13 | """Load and parse a velodyne binary file.""" 14 | scan = np.fromfile(file, dtype=np.float32) 15 | return scan.reshape((-1, 4)) 16 | 17 | def load_poses(file): 18 | """Load and parse ground truth poses""" 19 | tmp_poses = np.genfromtxt(file, delimiter=' ').reshape(-1,3,4) 20 | poses = np.repeat(np.expand_dims(np.eye(4),0), tmp_poses.shape[0], axis=0) 21 | poses[:,0:3,:] = tmp_poses 22 | return poses 23 | 24 | def read_calib_file(filepath): 25 | """Read in a calibration file and parse into a dictionary.""" 26 | data = {} 27 | 28 | with open(filepath, 'r') as f: 29 | for line in f.readlines(): 30 | key, value = line.split(':', 1) 31 | # The only non-float values in these files are dates, which 32 | # we don't care about anyway 33 | try: 34 | data[key] = np.array([float(x) for x in value.split()]) 35 | except ValueError: 36 | pass 37 | 38 | return data 39 | 40 | # This part of the code is taken from the semanticKITTI API 41 | 42 | def open_label(filename): 43 | """ Open raw scan and fill in attributes 44 | """ 45 | # check filename is string 46 | if not isinstance(filename, str): 47 | raise TypeError("Filename should be string type, " 48 | "but was {type}".format(type=str(type(filename)))) 49 | 50 | # if all goes well, open label 51 | label = np.fromfile(filename, dtype=np.uint32) 52 | label = label.reshape((-1)) 53 | 54 | return label 55 | 56 | def set_label(label, points): 57 | """ Set points for label not from file but from np 58 | """ 59 | # check label makes sense 60 | if not isinstance(label, np.ndarray): 61 | raise TypeError("Label should be numpy array") 62 | 63 | # only fill in attribute if the right size 64 | if label.shape[0] == points.shape[0]: 65 | sem_label = label & 0xFFFF # semantic label in lower half 66 | inst_label = label >> 16 # instance id in upper half 67 | else: 68 | print("Points shape: ", points.shape) 69 | print("Label shape: ", label.shape) 70 | raise ValueError("Scan and Label don't contain same number of points") 71 | 72 | # sanity check 73 | assert((sem_label + (inst_label << 16) == label).all()) 74 | 75 | return sem_label, inst_label 76 | 77 | 78 | 79 | 80 | 81 | def transform_point_cloud(x1, R, t): 82 | """ 83 | Transforms the point cloud using the giver transformation paramaters 84 | 85 | Args: 86 | x1 (np array): points of the point cloud [n,3] 87 | R (np array): estimated rotation matrice [3,3] 88 | t (np array): estimated translation 
vectors [3,1] 89 | Returns: 90 | x1_t (np array): points of the transformed point clouds [n,3] 91 | """ 92 | x1_t = (np.matmul(R, x1.transpose()) + t).transpose() 93 | 94 | return x1_t 95 | 96 | def add_argument_group(name): 97 | arg = parser.add_argument_group(name) 98 | arg_lists.append(arg) 99 | return arg 100 | 101 | def sorted_alphanum(file_list_ordered): 102 | """ 103 | Sorts the list alphanumerically 104 | Args: 105 | file_list_ordered (list): list of files to be sorted 106 | Return: 107 | sorted_list (list): input list sorted alphanumerically 108 | """ 109 | def convert(text): 110 | return int(text) if text.isdigit() else text 111 | 112 | def alphanum_key(key): 113 | return [convert(c) for c in re.split('([0-9]+)', key)] 114 | 115 | sorted_list = sorted(file_list_ordered, key=alphanum_key) 116 | 117 | return sorted_list 118 | 119 | def get_file_list(path, extension=None): 120 | """ 121 | Build a list of all the files in the provided path 122 | Args: 123 | path (str): path to the directory 124 | extension (str): only return files with this extension 125 | Return: 126 | file_list (list): list of all the files (with the provided extension) sorted alphanumerically 127 | """ 128 | if extension is None: 129 | file_list = [os.path.join(path, f) for f in os.listdir(path) if os.path.isfile(os.path.join(path, f))] 130 | else: 131 | file_list = [ 132 | os.path.join(path, f) 133 | for f in os.listdir(path) 134 | if os.path.isfile(os.path.join(path, f)) and os.path.splitext(f)[1] == extension 135 | ] 136 | file_list = sorted_alphanum(file_list) 137 | 138 | return file_list 139 | 140 | 141 | def get_folder_list(path): 142 | """ 143 | Build a list of all the files in the provided path 144 | Args: 145 | path (str): path to the directory 146 | extension (str): only return files with this extension 147 | Returns: 148 | file_list (list): list of all the files (with the provided extension) sorted alphanumerically 149 | """ 150 | folder_list = [os.path.join(path, f) for f in os.listdir(path) if os.path.isdir(os.path.join(path, f))] 151 | folder_list = sorted_alphanum(folder_list) 152 | 153 | return folder_list 154 | 155 | 156 | def extract_moving_objects(save_path, frame_idx, pts, sem_label, inst_label, moving_threshold = 100): 157 | """ 158 | Extracts the point belonging to individual moving objects and saves them to a file 159 | Args: 160 | save_path (str): path where to save the files 161 | frame_idx (str): current frame number 162 | pts (np.array): point cloud of the source frame 163 | sem_label (np.array): semantic labels 164 | inst_label (np.array): temporally consistent instance labels 165 | moving_threshold (int): label above which the classes denote moving objects 166 | 167 | Returns: 168 | 169 | """ 170 | moving_idx_s = np.where(sem_label >= 100)[0] 171 | 172 | # Filter out the points and labels 173 | sem_label = sem_label[moving_idx_s] 174 | inst_label = inst_label[moving_idx_s] 175 | pts = pts[moving_idx_s,:] 176 | 177 | # Unique semantic labels 178 | unique_labels = np.unique(sem_label) 179 | 180 | pcd = o3d.geometry.PointCloud() 181 | for label in unique_labels: 182 | class_idx = np.where(sem_label == label)[0] 183 | class_instances = inst_label[class_idx] 184 | class_points = pts[class_idx,:] 185 | tmp_instances = np.unique(class_instances) 186 | 187 | for instance in tmp_instances: 188 | object_idx = np.where(class_instances == instance)[0] 189 | object_points = class_points[object_idx, :] 190 | 191 | # Save the points and sample a random color 192 | object_color = 
np.repeat(np.random.random(size=3).reshape(1,-1),repeats=object_points.shape[0], axis=0) 193 | pcd.points = o3d.utility.Vector3dVector(object_points) 194 | pcd.colors = o3d.utility.Vector3dVector(object_color) 195 | 196 | if not os.path.exists(os.path.join(save_path, 'objects', '{}_{}'.format(label, instance))): 197 | os.makedirs(os.path.join(save_path, 'objects', '{}_{}'.format(label, instance))) 198 | 199 | 200 | # Save point in the npz and ply format 201 | np.savez(os.path.join(save_path, 'objects', '{}_{}'.format(label, instance),'{}.npz'.format(frame_idx)), 202 | pts=object_points) 203 | 204 | o3d.io.write_point_cloud(os.path.join(save_path, 'objects', '{}_{}'.format(label, instance), 205 | '{}.ply'.format(frame_idx)), pcd) 206 | 207 | 208 | 209 | class semanticKITTIProcesor: 210 | def __init__(self, args): 211 | self.root_path = args.raw_data_path 212 | self.save_path = args.save_path 213 | self.save_ply = args.save_ply 214 | self.save_near = args.save_near 215 | self.n_processes = args.n_processes 216 | 217 | self.scenes = get_folder_list(self.root_path) 218 | 219 | def run_processing(self): 220 | 221 | if self.n_processes < 1: 222 | self.n_processes = 1 223 | 224 | pool = Pool(self.n_processes) 225 | pool.map(self.process_scene, self.scenes) 226 | pool.close() 227 | pool.join() 228 | 229 | def process_scene(self, scene): 230 | scene_name = scene.split(os.sep)[-1] 231 | 232 | # Create a save file if not existing 233 | if not os.path.exists(os.path.join(self.save_path, scene_name)): 234 | os.makedirs(os.path.join(self.save_path, scene_name)) 235 | 236 | # Load transformation paramters 237 | poses = load_poses(os.path.join(scene,'poses.txt')) 238 | tr_velo_cam = read_calib_file(os.path.join(scene,'calib.txt'))['Tr'].reshape(3,4) 239 | tr_velo_cam = np.concatenate((tr_velo_cam,np.array([0,0,0,1]).reshape(1,4)),axis=0) 240 | frames = get_file_list(os.path.join(scene,'velodyne'), extension='.bin') 241 | 242 | if os.path.isdir(os.path.join(scene,'labels')): 243 | labels = get_file_list(os.path.join(scene,'labels'), extension='.label') 244 | test_scene = False 245 | 246 | assert len(frames) == len(labels), "Number of point cloud fils and label files is not the same!!" 
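The packed 32-bit label format that `open_label` and `set_label` operate on keeps the semantic class in the lower 16 bits and the temporally consistent instance id in the upper 16 bits. A minimal, self-contained sketch (the label word below is made up purely for illustration):

```python
import numpy as np

# Hypothetical packed label word: instance id 2 in the upper 16 bits,
# semantic class 10 (a car in the standard SemanticKITTI mapping) in the lower 16 bits.
packed = np.array([0x0002000A], dtype=np.uint32)

sem_label = packed & 0xFFFF    # -> array([10])
inst_label = packed >> 16      # -> array([2])

# Same sanity check as in set_label: the two halves reconstruct the original word.
assert (sem_label + (inst_label << 16) == packed).all()
```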
247 | 248 | else: 249 | test_scene = True 250 | 251 | 252 | 253 | 254 | for idx in range(len(frames)-1): 255 | frame_name_s = frames[idx].split(os.sep)[-1].split('.')[0] 256 | frame_name_t = frames[idx + 1].split(os.sep)[-1].split('.')[0] 257 | 258 | pc_s = load_velo_scan(frames[idx])[:,:3] 259 | pc_t = load_velo_scan(frames[idx + 1])[:,:3] 260 | 261 | # Transform both point cloud to the camera coordinate system (check KITTI webpage) 262 | pc_s = transform_point_cloud(pc_s, tr_velo_cam[:3, :3], tr_velo_cam[:3, 3:4]) 263 | pc_t = transform_point_cloud(pc_t, tr_velo_cam[:3, :3], tr_velo_cam[:3, 3:4]) 264 | 265 | # Rotate 180 degrees around z axis (to be in accordance to KITTI flow as used by other datsets) 266 | pc_s[:,0], pc_s[:,1] = -pc_s[:,0], -pc_s[:,1] 267 | pc_t[:,0], pc_t[:,1] = -pc_t[:,0], -pc_t[:,1] 268 | 269 | 270 | 271 | 272 | if not test_scene: 273 | # Load the labels 274 | sem_label_s, inst_label_s = set_label(open_label(labels[idx]), pc_s) 275 | sem_label_t, inst_label_t = set_label(open_label(labels[idx + 1]), pc_t) 276 | 277 | # Filter out points which are behind the car (to be in accordance with the stereo datasets) 278 | front_mask_s = pc_s[:,2] > 1.5 279 | front_mask_t = pc_t[:,2] > 1.5 280 | pc_s = pc_s[front_mask_s, :] 281 | pc_t = pc_t[front_mask_t,:] 282 | 283 | sem_label_s = sem_label_s[front_mask_s] 284 | inst_label_s = inst_label_s[front_mask_s] 285 | 286 | sem_label_t = sem_label_t[front_mask_t] 287 | inst_label_t = inst_label_t[front_mask_t] 288 | 289 | if self.save_near: 290 | near_mask_s = pc_s[:,2] < 35 291 | near_mask_t = pc_t[:,2] < 35 292 | pc_s = pc_s[near_mask_s, :] 293 | pc_t = pc_t[near_mask_t,:] 294 | 295 | sem_label_s = sem_label_s[near_mask_s] 296 | inst_label_s = inst_label_s[near_mask_s] 297 | 298 | sem_label_t = sem_label_t[near_mask_t] 299 | inst_label_t = inst_label_t[near_mask_t] 300 | 301 | # Extract the stable parts (sem. labels above 99 denote moving objects) 302 | # Could also remove 11, 15, 30, 31, 32 (classes like cyclist, person, ...) 
303 | # Motion labels are 1 if moving and 0 if stable 304 | stable_idx_s = np.where(sem_label_s < 100)[0] 305 | stable_idx_t = np.where(sem_label_t < 100)[0] 306 | mot_label_s = np.ones_like(sem_label_s) 307 | mot_label_s[stable_idx_s] = 0 308 | 309 | mot_label_t = np.ones_like(sem_label_t) 310 | mot_label_t[stable_idx_t] = 0 311 | 312 | 313 | # Extract ego motion from the gt poses 314 | T_st = np.matmul(poses[idx,:,:],np.linalg.inv(poses[idx + 1,:,:])) 315 | 316 | 317 | 318 | np.savez_compressed(os.path.join(self.save_path, scene_name, '{}_{}.npz'.format(frame_name_s, frame_name_t)), 319 | pc1=pc_s, 320 | pc2=pc_t, 321 | sem_label_s=sem_label_s, 322 | sem_label_t=sem_label_t, 323 | inst_label_s=inst_label_s, 324 | inst_label_t=inst_label_t, 325 | mot_label_s=mot_label_s, 326 | mot_label_t=mot_label_t, 327 | pose_s=poses[idx,:,:], 328 | pose_t=poses[idx + 1,:,:]) 329 | else: 330 | # Filter out points which are behind the car (to be in accordance with the stereo datasets) 331 | front_mask_s = pc_s[:,2] > 1.5 332 | front_mask_t = pc_t[:,2] > 1.5 333 | pc_s = pc_s[front_mask_s, :] 334 | pc_t = pc_t[front_mask_t,:] 335 | 336 | if self.save_near: 337 | near_mask_s = pc_s[:,2] < 35 338 | near_mask_t = pc_t[:,2] < 35 339 | pc_s = pc_s[near_mask_s, :] 340 | pc_t = pc_t[near_mask_t,:] 341 | 342 | np.savez_compressed(os.path.join(self.save_path, scene_name, '{}_{}.npz'.format(frame_name_s, frame_name_t)), 343 | pc1=pc_s, 344 | pc2=pc_t, 345 | pose_s=poses[idx,:,:], 346 | pose_t=poses[idx + 1,:,:]) 347 | 348 | 349 | # Save point clouds as ply files 350 | if self.save_ply: 351 | pcd_s = o3d.geometry.PointCloud() 352 | pcd_t = o3d.geometry.PointCloud() 353 | pcd_s.points = o3d.utility.Vector3dVector(pc_s) 354 | pcd_t.points = o3d.utility.Vector3dVector(pc_t) 355 | 356 | o3d.io.write_point_cloud(os.path.join(self.save_path, scene_name, '{}.ply'.format(frame_name_s)), pcd_s) 357 | o3d.io.write_point_cloud(os.path.join(self.save_path, scene_name, '{}.ply'.format(frame_name_t)), pcd_t) 358 | 359 | 360 | 361 | 362 | # Define and process command line arguments 363 | parser = argparse.ArgumentParser() 364 | parser.add_argument("--raw_data_path", type=str, default="test", help='path to the raw files') 365 | parser.add_argument('--save_path', type=str, help="save path") 366 | parser.add_argument('--n_processes', type=int, default=10, 367 | help='number of processes used for multi-processing') 368 | parser.add_argument('--save_ply', action='store_true', 369 | help='save point clouds also in ply format') 370 | parser.add_argument('--save_near', action='store_true', 371 | help='only save near points (less than 35m)') 372 | 373 | 374 | args = parser.parse_args() 375 | 376 | 377 | processor = semanticKITTIProcesor(args) 378 | 379 | processor.run_processing() -------------------------------------------------------------------------------- /data_preprocessing/python_pfm.py: -------------------------------------------------------------------------------- 1 | import re 2 | import numpy as np 3 | import sys 4 | 5 | 6 | def readPFM(file): 7 | file = open(file, 'rb') 8 | 9 | color = None 10 | width = None 11 | height = None 12 | scale = None 13 | endian = None 14 | 15 | header = file.readline().rstrip() 16 | if header == 'PF': 17 | color = True 18 | elif header == 'Pf': 19 | color = False 20 | else: 21 | raise Exception('Not a PFM file.') 22 | 23 | dim_match = re.match(r'^(\d+)\s(\d+)\s$', file.readline()) 24 | if dim_match: 25 | width, height = map(int, dim_match.groups()) 26 | else: 27 | raise Exception('Malformed PFM 
header.') 28 | 29 | scale = float(file.readline().rstrip()) 30 | if scale < 0: # little-endian 31 | endian = '<' 32 | scale = -scale 33 | else: 34 | endian = '>' # big-endian 35 | 36 | data = np.fromfile(file, endian + 'f') 37 | shape = (height, width, 3) if color else (height, width) 38 | 39 | data = np.reshape(data, shape) 40 | data = np.flipud(data) 41 | return data, scale 42 | 43 | def writePFM(file, image, scale=1): 44 | file = open(file, 'wb') 45 | 46 | color = None 47 | 48 | if image.dtype.name != 'float32': 49 | raise Exception('Image dtype must be float32.') 50 | 51 | image = np.flipud(image) 52 | 53 | if len(image.shape) == 3 and image.shape[2] == 3: # color image 54 | color = True 55 | elif len(image.shape) == 2 or len(image.shape) == 3 and image.shape[2] == 1: # greyscale 56 | color = False 57 | else: 58 | raise Exception('Image must have H x W x 3, H x W x 1 or H x W dimensions.') 59 | 60 | file.write('PF\n' if color else 'Pf\n') 61 | file.write('%d %d\n' % (image.shape[1], image.shape[0])) 62 | 63 | endian = image.dtype.byteorder 64 | 65 | if endian == '<' or endian == '=' and sys.byteorder == 'little': 66 | scale = -scale 67 | 68 | file.write('%f\n' % scale) 69 | 70 | image.tofile(file) 71 | 72 | -------------------------------------------------------------------------------- /eval.py: -------------------------------------------------------------------------------- 1 | import os 2 | import logging 3 | import torch 4 | import time 5 | import argparse 6 | import yaml 7 | 8 | import numpy as np 9 | from collections import defaultdict 10 | from tqdm import tqdm 11 | 12 | import lib.config as config 13 | from lib.utils import n_model_parameters, dict_all_to_device, load_checkpoint 14 | from lib.data import make_data_loader 15 | from lib.logger import prepare_logger 16 | 17 | 18 | 19 | # Set the random seeds for repeatability 20 | np.random.seed(41) 21 | torch.manual_seed(41) 22 | if torch.cuda.is_available(): 23 | torch.cuda.manual_seed(41) 24 | 25 | def main(cfg, logger): 26 | """ 27 | Main function of this evaluation software. After preparing the data loaders, and the model start with the evaluation process. 
28 | Args: 29 | cfg (dict): current configuration paramaters 30 | """ 31 | 32 | # Create the output dir if it does not exist 33 | if not os.path.exists(cfg['test']['results_dir']): 34 | os.makedirs(cfg['test']['results_dir']) 35 | 36 | # Get model 37 | model = config.get_model(cfg) 38 | device = torch.device('cuda' if (torch.cuda.is_available() and cfg['misc']['use_gpu']) else 'cpu') 39 | 40 | # Get data loader 41 | eval_loader = make_data_loader(cfg, phase='test') 42 | 43 | # Log directory 44 | dataset_name = cfg["data"]["dataset"] 45 | 46 | path2log = os.path.join(cfg['test']['results_dir'], dataset_name, '{}_{}'.format(cfg['method']['backbone'], cfg['misc']['num_points'])) 47 | 48 | logger, checkpoint_dir = prepare_logger(cfg, path2log) 49 | 50 | # Output torch and cuda version 51 | 52 | logger.info('Torch version: {}'.format(torch.__version__)) 53 | logger.info('CUDA version: {}'.format(torch.version.cuda)) 54 | logger.info('Starting evaluation of the method {} on {} dataset'.format(cfg['method']['backbone'], dataset_name)) 55 | 56 | # Save config file that was used for this experiment 57 | with open(os.path.join(path2log, "config.yaml"),'w') as outfile: 58 | yaml.dump(cfg, outfile, default_flow_style=False, allow_unicode=True) 59 | 60 | 61 | logger.info("Parameter Count: {:d}".format(n_model_parameters(model))) 62 | 63 | # Load the pretrained weights 64 | if cfg['network']['use_pretrained'] and cfg['network']['pretrained_path']: 65 | model, optimizer, scheduler, epoch_it, total_it, metric_val_best = load_checkpoint(model, None, None, filename=cfg['network']['pretrained_path']) 66 | 67 | else: 68 | logger.warning('MODEL RUNS IN EVAL MODE, BUT NO PRETRAINED WEIGHTS WERE LOADED!!!!') 69 | 70 | 71 | # Initialize the trainer 72 | trainer = config.get_trainer(cfg, model,device) 73 | 74 | # if not a pretrained model epoch and iterations should be -1 75 | eval_metrics = defaultdict(list) 76 | start = time.time() 77 | 78 | for it, batch in enumerate(tqdm(eval_loader)): 79 | # Put all the tensors to the designated device 80 | dict_all_to_device(batch, device) 81 | 82 | 83 | metrics = trainer.eval_step(batch) 84 | 85 | for key in metrics: 86 | eval_metrics[key].append(metrics[key]) 87 | 88 | 89 | stop = time.time() 90 | 91 | # Compute mean values of the evaluation statistics 92 | result_string = '' 93 | 94 | for key, value in eval_metrics.items(): 95 | if key not in ['true_p', 'true_n', 'false_p', 'false_n']: 96 | result_string += '{}: {:.3f}; '.format(key, np.mean(value)) 97 | 98 | if 'true_p' in eval_metrics: 99 | result_string += '{}: {:.3f}; '.format('dataset_precision_f', (np.sum(eval_metrics['true_p']) / (np.sum(eval_metrics['true_p']) + np.sum(eval_metrics['false_p'])) )) 100 | result_string += '{}: {:.3f}; '.format('dataset_recall_f', (np.sum(eval_metrics['true_p']) / (np.sum(eval_metrics['true_p']) + np.sum(eval_metrics['false_n'])))) 101 | 102 | result_string += '{}: {:.3f}; '.format('dataset_precision_b', (np.sum(eval_metrics['true_n']) / (np.sum(eval_metrics['true_n']) + np.sum(eval_metrics['false_n'])))) 103 | result_string += '{}: {:.3f}; '.format('dataset_recall_b', (np.sum(eval_metrics['true_n']) / (np.sum(eval_metrics['true_n']) + np.sum(eval_metrics['false_p'])))) 104 | 105 | 106 | logger.info('Outputing the evaluation metric for: {} {} {} '.format('Flow, ' if cfg['metrics']['flow'] else '', 'Ego-Motion, ' if cfg['metrics']['ego_motion'] else '', 'Bckg. 
Segmentaion' if cfg['metrics']['semantic'] else '')) 107 | logger.info(result_string) 108 | logger.info('Evaluation completed in {}s [{}s per scene]'.format((stop - start), (stop - start)/len(eval_loader))) 109 | 110 | 111 | if __name__ == "__main__": 112 | logger = logging.getLogger 113 | 114 | 115 | parser = argparse.ArgumentParser() 116 | parser.add_argument('config', type=str, help= 'Path to the config file.') 117 | args = parser.parse_args() 118 | 119 | cfg = config.get_config(args.config, 'configs/default.yaml') 120 | 121 | main(cfg, logger) -------------------------------------------------------------------------------- /lib/__pycache__/__init__.cpython-36.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zgojcic/Rigid3DSceneFlow/ee064203873951ee42e058c35924878de8446eb8/lib/__pycache__/__init__.cpython-36.pyc -------------------------------------------------------------------------------- /lib/__pycache__/config.cpython-36.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zgojcic/Rigid3DSceneFlow/ee064203873951ee42e058c35924878de8446eb8/lib/__pycache__/config.cpython-36.pyc -------------------------------------------------------------------------------- /lib/__pycache__/config.cpython-37.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zgojcic/Rigid3DSceneFlow/ee064203873951ee42e058c35924878de8446eb8/lib/__pycache__/config.cpython-37.pyc -------------------------------------------------------------------------------- /lib/__pycache__/config.cpython-38.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zgojcic/Rigid3DSceneFlow/ee064203873951ee42e058c35924878de8446eb8/lib/__pycache__/config.cpython-38.pyc -------------------------------------------------------------------------------- /lib/__pycache__/data.cpython-36.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zgojcic/Rigid3DSceneFlow/ee064203873951ee42e058c35924878de8446eb8/lib/__pycache__/data.cpython-36.pyc -------------------------------------------------------------------------------- /lib/__pycache__/data.cpython-37.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zgojcic/Rigid3DSceneFlow/ee064203873951ee42e058c35924878de8446eb8/lib/__pycache__/data.cpython-37.pyc -------------------------------------------------------------------------------- /lib/__pycache__/logger.cpython-36.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zgojcic/Rigid3DSceneFlow/ee064203873951ee42e058c35924878de8446eb8/lib/__pycache__/logger.cpython-36.pyc -------------------------------------------------------------------------------- /lib/__pycache__/logger.cpython-37.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zgojcic/Rigid3DSceneFlow/ee064203873951ee42e058c35924878de8446eb8/lib/__pycache__/logger.cpython-37.pyc -------------------------------------------------------------------------------- /lib/__pycache__/loss.cpython-36.pyc: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/zgojcic/Rigid3DSceneFlow/ee064203873951ee42e058c35924878de8446eb8/lib/__pycache__/loss.cpython-36.pyc -------------------------------------------------------------------------------- /lib/__pycache__/loss.cpython-37.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zgojcic/Rigid3DSceneFlow/ee064203873951ee42e058c35924878de8446eb8/lib/__pycache__/loss.cpython-37.pyc -------------------------------------------------------------------------------- /lib/__pycache__/metrics.cpython-36.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zgojcic/Rigid3DSceneFlow/ee064203873951ee42e058c35924878de8446eb8/lib/__pycache__/metrics.cpython-36.pyc -------------------------------------------------------------------------------- /lib/__pycache__/metrics.cpython-37.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zgojcic/Rigid3DSceneFlow/ee064203873951ee42e058c35924878de8446eb8/lib/__pycache__/metrics.cpython-37.pyc -------------------------------------------------------------------------------- /lib/__pycache__/trainer.cpython-36.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zgojcic/Rigid3DSceneFlow/ee064203873951ee42e058c35924878de8446eb8/lib/__pycache__/trainer.cpython-36.pyc -------------------------------------------------------------------------------- /lib/__pycache__/trainer.cpython-37.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zgojcic/Rigid3DSceneFlow/ee064203873951ee42e058c35924878de8446eb8/lib/__pycache__/trainer.cpython-37.pyc -------------------------------------------------------------------------------- /lib/__pycache__/utils.cpython-36.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zgojcic/Rigid3DSceneFlow/ee064203873951ee42e058c35924878de8446eb8/lib/__pycache__/utils.cpython-36.pyc -------------------------------------------------------------------------------- /lib/__pycache__/utils.cpython-37.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zgojcic/Rigid3DSceneFlow/ee064203873951ee42e058c35924878de8446eb8/lib/__pycache__/utils.cpython-37.pyc -------------------------------------------------------------------------------- /lib/__pycache__/utils.cpython-38.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zgojcic/Rigid3DSceneFlow/ee064203873951ee42e058c35924878de8446eb8/lib/__pycache__/utils.cpython-38.pyc -------------------------------------------------------------------------------- /lib/config.py: -------------------------------------------------------------------------------- 1 | from lib.model.rigid_3d_sf import MinkowskiFlow 2 | from lib.trainer import MEFlowTrainer 3 | import torch 4 | import yaml 5 | import torch.optim as optim 6 | 7 | model_dict = { 8 | 'ME': MinkowskiFlow, 9 | } 10 | 11 | trainer_dict = { 12 | 'ME': MEFlowTrainer, 13 | } 14 | 15 | 16 | def get_model(cfg): 17 | ''' 18 | Gets the model instance based on the input paramters. 
19 | Args: 20 | cfg (dict): config dictionary 21 | 22 | Returns: 23 | model (nn.Module): torch model initialized with the input params 24 | ''' 25 | 26 | method = cfg['method']['backbone'] 27 | 28 | model = model_dict[method](cfg) 29 | 30 | return model 31 | 32 | 33 | def get_trainer(cfg, model, device): 34 | ''' 35 | Returns a trainer instance. 36 | Args: 37 | cfg (dict): config dictionary 38 | model (nn.Module): the model used for training 39 | device: torch device 40 | 41 | Returns: 42 | trainer (trainer instance): trainer instance used to train the network 43 | ''' 44 | 45 | method = cfg['method']['backbone'] 46 | trainer = trainer_dict[method](cfg, model, device) 47 | 48 | 49 | return trainer 50 | 51 | 52 | def get_optimizer(cfg, model): 53 | ''' 54 | Returns an optimizer instance. 55 | Args: 56 | cfg (dict): config dictionary 57 | model (nn.Module): the model used for training 58 | 59 | Returns: 60 | optimizer (optimizer instance): optimizer used to train the network 61 | ''' 62 | 63 | method = cfg['optimizer']['alg'] 64 | 65 | if method == "SGD": 66 | optimizer = getattr(optim, method)(model.parameters(), lr=cfg['optimizer']['learning_rate'], 67 | momentum=cfg['optimizer']['momentum'], 68 | weight_decay=cfg['optimizer']['weight_decay']) 69 | 70 | elif method == "Adam": 71 | optimizer = getattr(optim, method)(model.parameters(), lr=cfg['optimizer']['learning_rate'], 72 | weight_decay=cfg['optimizer']['weight_decay']) 73 | else: 74 | print("{} optimizer is not implemented, must be one of the [SGD, Adam]".format(method)) 75 | 76 | return optimizer 77 | 78 | 79 | def get_scheduler(cfg, optimizer): 80 | ''' 81 | Returns a learning rate scheduler 82 | Args: 83 | cfg (dict): config dictionary 84 | optimizer (torch.optim): optimizer used for training the network 85 | 86 | Returns: 87 | scheduler (optimizer instance): learning rate scheduler 88 | ''' 89 | 90 | method = cfg['optimizer']['scheduler'] 91 | 92 | if method == "ExponentialLR": 93 | scheduler = getattr(optim.lr_scheduler, method)(optimizer, gamma=cfg['optimizer']['exp_gamma']) 94 | else: 95 | print("{} scheduler is not implemented, must be one of the [ExponentialLR]".format(method)) 96 | 97 | return scheduler 98 | 99 | 100 | 101 | # General config 102 | def get_config(path, default_path='./configs/default.yaml'): 103 | ''' 104 | Loads config file. 105 | 106 | Args: 107 | path (str): path to config file 108 | default_path (bool): whether to use default path 109 | ''' 110 | # Load configuration from file itself 111 | with open(path, 'r') as f: 112 | cfg_special = yaml.safe_load(f) 113 | 114 | # Check if we should inherit from a config 115 | inherit_from = cfg_special.get('inherit_from') 116 | 117 | # If yes, load this config first as default 118 | # If no, use the default_path 119 | if inherit_from is not None: 120 | cfg = load_config(inherit_from, default_path) 121 | elif default_path is not None: 122 | with open(default_path, 'r') as f: 123 | cfg = yaml.safe_load(f) 124 | else: 125 | cfg = dict() 126 | 127 | # Include main configuration 128 | update_recursive(cfg, cfg_special) 129 | 130 | return cfg 131 | 132 | 133 | def update_recursive(dict1, dict2): 134 | ''' 135 | Update two config dictionaries recursively. 
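A small worked example (made-up values) of the recursive merge performed by `update_recursive`: the experiment config only overrides the keys it explicitly sets, while every other default entry is preserved.

```python
# Hypothetical configs, merged with update_recursive as defined in this file.
default_cfg = {'optimizer': {'alg': 'Adam', 'learning_rate': 1e-3},
               'misc': {'use_gpu': True}}
experiment_cfg = {'optimizer': {'learning_rate': 5e-4}}

update_recursive(default_cfg, experiment_cfg)
# default_cfg is now:
# {'optimizer': {'alg': 'Adam', 'learning_rate': 0.0005}, 'misc': {'use_gpu': True}}
```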
136 | 137 | Args: 138 | dict1 (dict): first dictionary to be updated 139 | dict2 (dict): second dictionary which entries should be used 140 | ''' 141 | for k, v in dict2.items(): 142 | if k not in dict1: 143 | dict1[k] = dict() 144 | if isinstance(v, dict): 145 | update_recursive(dict1[k], v) 146 | else: 147 | dict1[k] = v -------------------------------------------------------------------------------- /lib/data.py: -------------------------------------------------------------------------------- 1 | import os 2 | import torch 3 | import logging 4 | 5 | import numpy as np 6 | import torch.utils.data as data 7 | import MinkowskiEngine as ME 8 | 9 | 10 | def to_tensor(x): 11 | if isinstance(x, torch.Tensor): 12 | return x 13 | elif isinstance(x, np.ndarray): 14 | return torch.from_numpy(x) 15 | else: 16 | raise ValueError("Can not convert to torch tensor {}".format(x)) 17 | 18 | def collate_fn(list_data): 19 | pc_1,pc_2, coords1, coords2, feats1, feats2, fg_labels_1, \ 20 | fg_labels_2, flow, R_ego, t_ego, pc_eval_1, pc_eval_2, flow_eval, fg_labels_eval_1, fg_labels_eval_2 = list(zip(*list_data)) 21 | 22 | pc_batch1, pc_batch2 = [], [] 23 | pc_eval_batch1, pc_eval_batch2 = [], [] 24 | fg_labels_batch1, fg_labels_batch2 = [], [] 25 | fg_labels_eval_batch1, fg_labels_eval_batch2 = [], [] 26 | R_ego_batch, t_ego_batch = [],[] 27 | flow_batch, flow_eval_batch, len_batch = [], [], [] 28 | batch_id = 0 29 | 30 | for batch_id, _ in enumerate(coords1): 31 | N1 = coords1[batch_id].shape[0] 32 | N2 = coords2[batch_id].shape[0] 33 | len_batch.append([N1, N2]) 34 | 35 | pc_batch1.append(to_tensor(pc_1[batch_id]).float()) 36 | pc_batch2.append(to_tensor(pc_2[batch_id]).float()) 37 | 38 | pc_eval_batch1.append(to_tensor(pc_eval_1[batch_id]).float()) 39 | pc_eval_batch2.append(to_tensor(pc_eval_2[batch_id]).float()) 40 | 41 | fg_labels_batch1.append(to_tensor(fg_labels_1[batch_id])) 42 | fg_labels_batch2.append(to_tensor(fg_labels_2[batch_id])) 43 | 44 | fg_labels_eval_batch1.append(to_tensor(fg_labels_eval_1[batch_id])) 45 | fg_labels_eval_batch2.append(to_tensor(fg_labels_eval_2[batch_id])) 46 | 47 | R_ego_batch.append(to_tensor(R_ego[batch_id]).unsqueeze(0)) 48 | t_ego_batch.append(to_tensor(t_ego[batch_id]).unsqueeze(0)) 49 | 50 | flow_batch.append(to_tensor(flow[batch_id])) 51 | flow_eval_batch.append(to_tensor(flow_eval[batch_id])) 52 | 53 | coords_batch1, feats_batch1 = ME.utils.sparse_collate(coords=coords1, feats=feats1) 54 | coords_batch2, feats_batch2 = ME.utils.sparse_collate(coords=coords2, feats=feats2) 55 | 56 | 57 | # Concatenate all lists 58 | fg_labels_batch1 = torch.cat(fg_labels_batch1, 0).long() 59 | fg_labels_batch2 = torch.cat(fg_labels_batch2, 0).long() 60 | flow_batch = torch.cat(flow_batch, 0).float() 61 | flow_eval_batch = torch.cat(flow_eval_batch, 0).float() 62 | R_ego_batch = torch.cat(R_ego_batch, 0).float() 63 | t_ego_batch = torch.cat(t_ego_batch, 0).float() 64 | fg_labels_eval_batch1 = torch.cat(fg_labels_eval_batch1, 0).long() 65 | fg_labels_eval_batch2 = torch.cat(fg_labels_eval_batch2, 0).long() 66 | 67 | return { 68 | 'pcd_s': pc_batch1, 69 | 'pcd_t': pc_batch2, 70 | 'sinput_s_C': coords_batch1, 71 | 'sinput_s_F': feats_batch1.float(), 72 | 'sinput_t_C': coords_batch2, 73 | 'sinput_t_F': feats_batch2.float(), 74 | 'fg_labels_s': fg_labels_batch1, 75 | 'fg_labels_t': fg_labels_batch2, 76 | 'flow': flow_batch, 77 | 'R_ego': R_ego_batch, 78 | 't_ego': t_ego_batch, 79 | 'pcd_eval_s': pc_eval_batch1, 80 | 'pcd_eval_t': pc_eval_batch2, 81 | 'flow_eval': flow_eval_batch, 82 | 
'fg_labels_eval_s': fg_labels_eval_batch1, 83 | 'fg_labels_eval_t': fg_labels_eval_batch2, 84 | 'len_batch': len_batch 85 | } 86 | 87 | 88 | class MELidarDataset(data.Dataset): 89 | def __init__(self, phase, config): 90 | 91 | self.files = [] 92 | self.root = config['data']['root'] 93 | self.config = config 94 | self.input_features = config['data']['input_features'] 95 | self.num_points = config['misc']['num_points'] 96 | self.voxel_size = config['misc']['voxel_size'] 97 | self.remove_ground = True if (config['data']['remove_ground'] and config['data']['dataset'] in ['StereoKITTI_ME','LidarKITTI_ME','SemanticKITTI_ME','WaymoOpen_ME']) else False 98 | self.dataset = config['data']['dataset'] 99 | self.only_near_points = config['data']['only_near_points'] 100 | self.phase = phase 101 | 102 | self.randng = np.random.RandomState() 103 | self.device = torch.device('cuda' if (torch.cuda.is_available() and config['misc']['use_gpu']) else 'cpu') 104 | 105 | self.augment_data = config['data']['augment_data'] 106 | 107 | logging.info("Loading the subset {} from {}".format(phase,self.root)) 108 | 109 | subset_names = open(self.DATA_FILES[phase]).read().split() 110 | 111 | for name in subset_names: 112 | self.files.append(name) 113 | 114 | def __getitem__(self, idx): 115 | file = os.path.join(self.root,self.files[idx]) 116 | file_name = file.replace(os.sep,'/').split('/')[-1] 117 | 118 | # Load the data 119 | data = np.load(file) 120 | pc_1 = data['pc1'] 121 | pc_2 = data['pc2'] 122 | 123 | if 'pose_s' in data: 124 | pose_1 = data['pose_s'] 125 | else: 126 | pose_1 = np.eye(4) 127 | 128 | if 'pose_t' in data: 129 | pose_2 = data['pose_t'] 130 | else: 131 | pose_2 = np.eye(4) 132 | 133 | if 'sem_label_s' in data: 134 | labels_1 = data['sem_label_s'] 135 | else: 136 | labels_1 = np.zeros(pc_1.shape[0]) 137 | 138 | 139 | if 'sem_label_t' in data: 140 | labels_2 = data['sem_label_t'] 141 | else: 142 | labels_2 = np.zeros(pc_2.shape[0]) 143 | 144 | if 'flow' in data: 145 | flow = data['flow'] 146 | else: 147 | flow = np.zeros_like(pc_1) 148 | 149 | # Remove the ground and far away points 150 | # In stereoKITTI the direct correspondences are provided therefore we remove, 151 | # if either of the points fullfills the condition (as in hplflownet, flot, ...) 
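The `.npz` samples loaded above follow the layout written by the preprocessing scripts (e.g. `process_semantKitti.py`). A quick way to inspect one pair, with a hypothetical path, and to recover the relative ego-motion the same way `__getitem__` does further down:

```python
import numpy as np

# Hypothetical sample path; the key names match what the preprocessing script saves.
sample = np.load('data/semantic_kitti/08/000000_000001.npz')
print(sample.files)   # e.g. ['pc1', 'pc2', 'sem_label_s', 'sem_label_t', ..., 'pose_s', 'pose_t']

pc_1, pc_2 = sample['pc1'], sample['pc2']                        # [N1, 3] and [N2, 3] points
rel_trans = np.linalg.inv(sample['pose_t']) @ sample['pose_s']   # 4x4 ego-motion, source -> target
R_ego, t_ego = rel_trans[0:3, 0:3], rel_trans[0:3, 3:4]
```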
152 | 153 | if self.dataset in ["SemanticKITTI_ME", 'LidarKITTI_ME', "WaymoOpen_ME"]: 154 | if self.remove_ground: 155 | if self.phase == 'test': 156 | is_not_ground_s = (pc_1[:, 1] > -1.4) 157 | is_not_ground_t = (pc_2[:, 1] > -1.4) 158 | 159 | pc_1 = pc_1[is_not_ground_s,:] 160 | labels_1 = labels_1[is_not_ground_s] 161 | flow = flow[is_not_ground_s,:] 162 | 163 | pc_2 = pc_2[is_not_ground_t,:] 164 | labels_2 = labels_2[is_not_ground_t] 165 | 166 | # In the training phase we randomly select if the ground should be removed or not 167 | elif np.random.rand() > 1/4: 168 | is_not_ground_s = (pc_1[:, 1] > -1.4) 169 | is_not_ground_t = (pc_2[:, 1] > -1.4) 170 | 171 | pc_1 = pc_1[is_not_ground_s,:] 172 | labels_1 = labels_1[is_not_ground_s] 173 | flow = flow[is_not_ground_s,:] 174 | 175 | pc_2 = pc_2[is_not_ground_t,:] 176 | labels_2 = labels_2[is_not_ground_t] 177 | 178 | if self.only_near_points: 179 | is_near_s = (pc_1[:, 2] < 35) 180 | is_near_t = (pc_2[:, 2] < 35) 181 | 182 | pc_1 = pc_1[is_near_s,:] 183 | labels_1 = labels_1[is_near_s] 184 | flow = flow[is_near_s,:] 185 | 186 | pc_2 = pc_2[is_near_t,:] 187 | labels_2 = labels_2[is_near_t] 188 | 189 | else: 190 | if self.remove_ground: 191 | is_not_ground = np.logical_not(np.logical_and(pc_1[:, 1] < -1.4, pc_2[:, 1] < -1.4)) 192 | pc_1 = pc_1[is_not_ground,:] 193 | pc_2 = pc_2[is_not_ground,:] 194 | flow = flow[is_not_ground,:] 195 | 196 | if self.only_near_points: 197 | is_near = np.logical_and(pc_1[:, 2] < 35, pc_1[:, 2] < 35) 198 | pc_1 = pc_1[is_near,:] 199 | pc_2 = pc_2[is_near,:] 200 | flow = flow[is_near,:] 201 | 202 | # Augment the point cloud by randomly rotating and translating them (recompute the ego-motion if augmention is applied!) 203 | if self.augment_data and self.phase != 'test': 204 | T_1 = np.eye(4) 205 | T_2 = np.eye(4) 206 | 207 | T_1[0:3,3] = (np.random.rand(3) - 0.5) * 0.5 208 | T_2[0:3,3] = (np.random.rand(3) - 0.5) * 0.5 209 | 210 | T_1[1,3] = (np.random.rand(1) - 0.5) * 0.1 211 | T_2[1,3] = (np.random.rand(1) - 0.5) * 0.1 212 | 213 | pc_1 = (np.matmul(T_1[0:3, 0:3], pc_1.transpose()) + T_1[0:3,3:4]).transpose() 214 | pc_2 = (np.matmul(T_2[0:3, 0:3], pc_2.transpose()) + T_2[0:3,3:4]).transpose() 215 | 216 | pose_1 = np.matmul(pose_1, np.linalg.inv(T_1)) 217 | pose_2 = np.matmul(pose_2, np.linalg.inv(T_2)) 218 | 219 | rel_trans = np.linalg.inv(pose_2) @ pose_1 220 | 221 | R_ego = rel_trans[0:3,0:3] 222 | t_ego = rel_trans[0:3,3:4] 223 | else: 224 | # Compute relative pose that transform the point from the source point cloud to the target 225 | rel_trans = np.linalg.inv(pose_2) @ pose_1 226 | R_ego = rel_trans[0:3,0:3] 227 | t_ego = rel_trans[0:3,3:4] 228 | 229 | 230 | # Sample n points for evaluation before the voxelization 231 | # If less than desired points are available just consider the maximum 232 | if pc_1.shape[0] > self.num_points: 233 | idx_1 = np.random.choice(pc_1.shape[0], self.num_points, replace=False) 234 | else: 235 | idx_1 = np.random.choice(pc_1.shape[0], pc_1.shape[0], replace=False) 236 | 237 | if pc_2.shape[0] > self.num_points: 238 | idx_2 = np.random.choice(pc_2.shape[0], self.num_points, replace=False) 239 | else: 240 | idx_2 = np.random.choice(pc_2.shape[0], pc_2.shape[0], replace=False) 241 | 242 | pc_1_eval = pc_1[idx_1,:] 243 | flow_eval = flow[idx_1,:] 244 | labels_1_eval = labels_1[idx_1] 245 | 246 | pc_2_eval = pc_2[idx_2,:] 247 | labels_2_eval = labels_2[idx_2] 248 | 249 | # Voxelization 250 | _, sel1 = ME.utils.sparse_quantize(np.ascontiguousarray(pc_1) / self.voxel_size, 
return_index=True) 251 | _, sel2 = ME.utils.sparse_quantize(np.ascontiguousarray(pc_2) / self.voxel_size, return_index=True) 252 | 253 | 254 | # Slect the voxelized points 255 | pc_1 = pc_1[sel1,:] 256 | labels_1 = labels_1[sel1] 257 | flow = flow[sel1,:] 258 | 259 | pc_2 = pc_2[sel2,:] 260 | labels_2 = labels_2[sel2] 261 | 262 | # If more voxels then the selected number of points are remaining randomly sample them 263 | if pc_1.shape[0] > self.num_points: 264 | idx_1 = np.random.choice(pc_1.shape[0], self.num_points, replace=False) 265 | else: 266 | idx_1 = np.random.choice(pc_1.shape[0], pc_1.shape[0], replace=False) 267 | 268 | if pc_2.shape[0] > self.num_points: 269 | idx_2 = np.random.choice(pc_2.shape[0], self.num_points, replace=False) 270 | else: 271 | idx_2 = np.random.choice(pc_2.shape[0], pc_2.shape[0], replace=False) 272 | 273 | pc_1 = pc_1[idx_1,:] 274 | labels_1 = labels_1[idx_1] 275 | flow = flow[idx_1,:] 276 | 277 | pc_2 = pc_2[idx_2,:] 278 | labels_2 = labels_2[idx_2] 279 | 280 | 281 | # Get sparse indices 282 | coords1 = np.floor(pc_1 / self.voxel_size) 283 | coords2 = np.floor(pc_2 / self.voxel_size) 284 | 285 | 286 | feats_train1, feats_train2 = [], [] 287 | 288 | if self.input_features == 'occupancy': 289 | feats_train1.append(np.ones((pc_1.shape[0], 1))) 290 | feats_train2.append(np.ones((pc_2.shape[0], 1))) 291 | 292 | elif self.input_features == 'absolute_coords': 293 | feats_train1.append(pc_1) 294 | feats_train2.append(pc_2) 295 | 296 | elif self.input_features == 'relative_coords': 297 | feats_train1.append(pc_1 - (coords1 * self.voxel_size)) 298 | feats_train2.append(pc_2 - (coords2 * self.voxel_size)) 299 | 300 | else: 301 | raise ValueError('{} not recognized as a valid input feature!'.format(self.input_features)) 302 | 303 | feats1 = np.hstack(feats_train1) 304 | feats2 = np.hstack(feats_train2) 305 | 306 | # Foreground points (class label bellow 40 or above 99 -> binary label 1) 307 | fg_labels_1 = np.zeros((labels_1.shape[0])) 308 | fg_labels_1[((labels_1 < 40) | (labels_1 > 99))] = 1 309 | fg_labels_1[labels_1 == 0] = -1 310 | 311 | fg_labels_2 = np.zeros((labels_2.shape[0])) 312 | fg_labels_2[((labels_2 < 40) | (labels_2 > 99))] = 1 313 | fg_labels_2[labels_2 == 0] = -1 314 | 315 | fg_labels_1_eval = np.zeros((labels_1_eval.shape[0])) 316 | fg_labels_1_eval[((labels_1_eval < 40) | (labels_1_eval > 99))] = 1 317 | fg_labels_1_eval[labels_1_eval == 0] = -1 318 | 319 | fg_labels_2_eval = np.zeros((labels_2_eval.shape[0])) 320 | fg_labels_2_eval[((labels_2_eval < 40) | (labels_2_eval > 99))] = 1 321 | fg_labels_2_eval[labels_2_eval == 0] = -1 322 | 323 | return (pc_1, pc_2, coords1, coords2, feats1, feats2, fg_labels_1, fg_labels_2, flow, 324 | R_ego, t_ego, pc_1_eval, pc_2_eval, flow_eval, fg_labels_1_eval, fg_labels_2_eval) 325 | 326 | def __len__(self): 327 | return len(self.files) 328 | 329 | def reset_seed(self,seed=41): 330 | logging.info('Resetting the data loader seed to {}'.format(seed)) 331 | self.randng.seed(seed) 332 | 333 | 334 | class FlyingThings3D_ME(MELidarDataset): 335 | # 3D Match dataset all files 336 | DATA_FILES = { 337 | 'train': './configs/datasets/flying_things_3d/train.txt', 338 | 'val': './configs/datasets/flying_things_3d/val.txt', 339 | 'test': './configs/datasets/flying_things_3d/test.txt' 340 | } 341 | 342 | class StereoKITTI_ME(MELidarDataset): 343 | # 3D Match dataset all files 344 | DATA_FILES = { 345 | 'train': './configs/datasets/stereo_kitti/test.txt', 346 | 'val': './configs/datasets/stereo_kitti/test.txt', 347 | 'test': 
'./configs/datasets/stereo_kitti/test.txt' 348 | } 349 | 350 | class SemanticKITTI_ME(MELidarDataset): 351 | # 3D Match dataset all files 352 | DATA_FILES = { 353 | 'train': './configs/datasets/semantic_kitti/train.txt', 354 | 'val': './configs/datasets/semantic_kitti/val.txt', 355 | 'test': './configs/datasets/semantic_kitti/val.txt' 356 | } 357 | 358 | class LidarKITTI_ME(MELidarDataset): 359 | # 3D Match dataset all files 360 | DATA_FILES = { 361 | 'train': './configs/datasets/lidar_kitti/test.txt', 362 | 'val': './configs/datasets/lidar_kitti/test.txt', 363 | 'test': './configs/datasets/lidar_kitti/test.txt' 364 | } 365 | 366 | 367 | class WaymoOpen_ME(MELidarDataset): 368 | # 3D Match dataset all files 369 | DATA_FILES = { 370 | 'train': './configs/datasets/waymo_open/train.txt', 371 | 'val': './configs/datasets/waymo_open/val.txt', 372 | 'test': './configs/datasets/waymo_open/test.txt' 373 | } 374 | 375 | 376 | # Map the datasets to string names 377 | ALL_DATASETS = [FlyingThings3D_ME, StereoKITTI_ME, SemanticKITTI_ME, LidarKITTI_ME, WaymoOpen_ME] 378 | 379 | dataset_str_mapping = {d.__name__: d for d in ALL_DATASETS} 380 | 381 | 382 | def make_data_loader(config, phase, neighborhood_limits=None, shuffle_dataset=None): 383 | """ 384 | Defines the data loader based on the parameters specified in the config file 385 | Args: 386 | config (dict): dictionary of the arguments 387 | phase (str): phase for which the data loader should be initialized in [train,val,test] 388 | shuffle_dataset (bool): shuffle the dataset or not 389 | Returns: 390 | loader (torch data loader): data loader that handles loading the data to the model 391 | """ 392 | 393 | assert config['misc']['run_mode'] in ['train','val','test'] 394 | 395 | if shuffle_dataset is None: 396 | shuffle_dataset = config['misc']['run_mode'] != 'test' 397 | 398 | # Select the defined dataset 399 | Dataset = dataset_str_mapping[config['data']['dataset']] 400 | 401 | dset = Dataset(phase, config=config) 402 | 403 | drop_last = False if config['misc']['run_mode'] == 'test' else True 404 | 405 | loader = torch.utils.data.DataLoader( 406 | dset, 407 | batch_size=config[phase]['batch_size'], 408 | shuffle=shuffle_dataset, 409 | num_workers=config[phase]['num_workers'], 410 | collate_fn=collate_fn, 411 | pin_memory=False, 412 | drop_last=drop_last 413 | ) 414 | 415 | return loader -------------------------------------------------------------------------------- /lib/logger.py: -------------------------------------------------------------------------------- 1 | import os 2 | import sys 3 | import numpy as np 4 | import logging 5 | import coloredlogs 6 | 7 | 8 | _logger = logging.getLogger() 9 | 10 | 11 | def print_info(config, log_dir=None): 12 | """ Logs source code configuration 13 | Code adapted from RPMNet repository: https://github.com/yewzijian/RPMNet/ 14 | """ 15 | _logger.info('Command: {}'.format(' '.join(sys.argv))) 16 | 17 | # Arguments 18 | arg_str = [] 19 | 20 | for k_id, k_val in config.items(): 21 | if isinstance(k_val, dict): 22 | for key in k_val: 23 | arg_str.append("{}_{}: {}".format(k_id, key, k_val[key])) 24 | else: 25 | arg_str.append("{}: {}".format(k_id, k_val)) 26 | 27 | arg_str = ', '.join(arg_str) 28 | _logger.info('Arguments: {}'.format(arg_str)) 29 | 30 | 31 | def prepare_logger(config, log_path = None): 32 | """Creates logging directory, and installs colorlogs 33 | Args: 34 | opt: Program arguments, should include --dev and --logdir flag. 35 | See get_parent_parser() 36 | log_path: Logging path (optional). 
This serves to overwrite the settings in 37 | argparse namespace 38 | Returns: 39 | logger (logging.Logger) 40 | log_path (str): Logging directory 41 | Code borrowed from RPMNet repository: https://github.com/yewzijian/RPMNet/ 42 | """ 43 | 44 | os.makedirs(log_path, exist_ok=True) 45 | 46 | logger = logging.getLogger() 47 | coloredlogs.install(level='INFO', logger=logger) 48 | file_handler = logging.FileHandler('{}/console_output.txt'.format(log_path)) 49 | log_formatter = logging.Formatter('%(asctime)s [%(levelname)s] %(name)s - %(message)s') 50 | file_handler.setFormatter(log_formatter) 51 | logger.addHandler(file_handler) 52 | print_info(config, log_path) 53 | logger.info('Output and logs will be saved to {}'.format(log_path)) 54 | 55 | return logger, log_path 56 | -------------------------------------------------------------------------------- /lib/loss.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import torch.nn as nn 3 | 4 | from lib.utils import transform_point_cloud, kabsch_transformation_estimation 5 | from utils.chamfer_distance import ChamferDistance 6 | 7 | 8 | class TrainLoss(nn.Module): 9 | """ 10 | Training loss consists of a ego-motion loss, background segmentation loss, and a foreground loss. 11 | The l1 flow loss is used for the full supervised experiments only. 12 | 13 | Args: 14 | args: parameters controling the initialization of the loss functions 15 | 16 | """ 17 | 18 | def __init__(self, args): 19 | nn.Module.__init__(self) 20 | 21 | 22 | self.args = args 23 | self.device = torch.device('cuda' if (torch.cuda.is_available() and args['misc']['use_gpu']) else 'cpu') 24 | 25 | # Flow loss 26 | self.flow_criterion = nn.L1Loss(reduction='mean') 27 | 28 | # Ego motion loss 29 | self.ego_l1_criterion = nn.L1Loss(reduction='mean') 30 | self.ego_outlier_criterion = OutlierLoss() 31 | 32 | # Background segmentation loss 33 | if args['loss']['background_loss'] == 'weighted': 34 | 35 | # Based on the dataset analysis there are 14 times more background labels 36 | seg_weight = torch.tensor([1.0, 20.0]).to(self.device) 37 | self.seg_criterion = torch.nn.CrossEntropyLoss(weight=seg_weight, ignore_index=-1) 38 | 39 | else: 40 | self.seg_criterion = torch.nn.CrossEntropyLoss(ignore_index=-1) 41 | 42 | # Foreground loss 43 | self.chamfer_criterion = ChamferDistance() 44 | self.rigidity_criterion = nn.L1Loss(reduction='mean') 45 | 46 | def __call__(self, inferred_values, gt_data): 47 | 48 | # Initialize the dictionary 49 | losses = {} 50 | 51 | if self.args['method']['flow'] and self.args['loss']['flow_loss']: 52 | assert (('coarse_flow' in inferred_values) & ('flow' in gt_data)), 'Flow loss selected \ 53 | but either est or gt flow not provided' 54 | 55 | losses['refined_flow_loss'] = self.flow_criterion(inferred_values['refined_flow'], 56 | gt_data['flow']) * self.args['loss'].get('flow_loss_w', 1.0) 57 | 58 | losses['coarse_flow_loss'] = self.flow_criterion(inferred_values['coarse_flow'], 59 | gt_data['flow']) * self.args['loss'].get('flow_loss_w', 1.0) 60 | 61 | 62 | if self.args['method']['ego_motion'] and self.args['loss']['ego_loss']: 63 | assert (('R_est' in inferred_values) & ('R_s_t' in gt_data) is not None), "Ego motion loss selected \ 64 | but either est or gt ego motion not provided" 65 | 66 | assert 'permutation' in inferred_values is not None, 'Outlier loss selected \ 67 | but the permutation matrix is not provided' 68 | 69 | # Only evaluate on the background points 70 | mask = (gt_data['fg_labels_s'] == 0) 71 
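For reference, a toy illustration (made-up numbers) of how the weighted background/foreground criterion constructed in `__init__` above treats the labels, including the `-1` ignore label assigned to points without ground-truth annotations:

```python
import torch

# Three points with class scores for [background, foreground]; values are made up.
criterion = torch.nn.CrossEntropyLoss(weight=torch.tensor([1.0, 20.0]), ignore_index=-1)
logits = torch.tensor([[2.0, -1.0],   # confident background prediction
                       [0.5,  0.3],   # weak prediction
                       [0.0,  0.0]])  # this point carries no label
labels = torch.tensor([0, 1, -1])     # background, foreground, ignored
loss = criterion(logits, labels)      # only the first two points contribute
```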
| 72 | prev_idx = 0 73 | pc_t_gt, pc_t_est = [], [] 74 | 75 | # Iterate over the samples in the batch 76 | for batch_idx in range(gt_data['R_ego'].shape[0]): 77 | 78 | # Convert the voxel indices back to the coordinates 79 | p_s_temp = gt_data['sinput_s_C'][prev_idx: prev_idx + gt_data['len_batch'][batch_idx][0],:].to(self.device) * self.args['misc']['voxel_size'] 80 | mask_temp = mask[prev_idx: prev_idx + gt_data['len_batch'][batch_idx][0]] 81 | 82 | # Transform the point cloud with gt and estimated ego-motion parameters 83 | pc_t_gt_temp = transform_point_cloud(p_s_temp[mask_temp,1:4], gt_data['R_ego'][batch_idx,:,:], gt_data['t_ego'][batch_idx,:,:]) 84 | pc_t_est_temp = transform_point_cloud(p_s_temp[mask_temp,1:4], inferred_values['R_est'][batch_idx,:,:], inferred_values['t_est'][batch_idx,:,:]) 85 | 86 | pc_t_gt.append(pc_t_gt_temp.squeeze(0)) 87 | pc_t_est.append(pc_t_est_temp.squeeze(0)) 88 | 89 | prev_idx += gt_data['len_batch'][batch_idx][0] 90 | 91 | pc_t_est = torch.cat(pc_t_est, 0) 92 | pc_t_gt = torch.cat(pc_t_gt, 0) 93 | 94 | losses['ego_loss'] = self.ego_l1_criterion(pc_t_est, pc_t_gt) * self.args['loss'].get('ego_loss_w', 1.0) 95 | losses['outlier_loss'] = self.ego_outlier_criterion(inferred_values['permutation']) * self.args['loss'].get('inlier_loss_w', 1.0) 96 | 97 | # Background segmentation loss 98 | if self.args['method']['semantic'] and self.args['loss']['background_loss']: 99 | assert (('semantic_logits_s' in inferred_values) & ('fg_labels_s' in gt_data)), "Background loss selected but either est or gt labels not provided" 100 | 101 | semantic_loss = torch.tensor(0.0).to(self.device) 102 | 103 | semantic_loss += self.seg_criterion(inferred_values['semantic_logits_s'].F, gt_data['fg_labels_s']) * self.args['loss'].get('bg_loss_w', 1.0) 104 | 105 | # If the background labels for the target point cloud are available also use them for the loss computation 106 | if 'semantic_logits_t' in inferred_values: 107 | semantic_loss += self.seg_criterion(inferred_values['semantic_logits_t'].F, gt_data['fg_labels_t']) * self.args['loss'].get('bg_loss_w', 1.0) 108 | semantic_loss = semantic_loss/2 109 | 110 | losses['semantic_loss'] = semantic_loss 111 | 112 | # Foreground loss 113 | if self.args['method']['clustering'] and self.args['loss']['foreground_loss']: 114 | assert ('clusters_s' in inferred_values), "Foreground loss selected but inferred cluster labels not provided" 115 | 116 | rigidity_loss = torch.tensor(0.0).to(self.device) 117 | 118 | xyz_s = torch.cat(gt_data['pcd_s'], 0).to(self.device) 119 | xyz_t = torch.cat(gt_data['pcd_t'], 0).to(self.device) 120 | 121 | # # Two-way chamfer distance for the foreground points (only compute if both point clouds have more than 50 foreground points) 122 | # if torch.where(gt_data['fg_labels_s'] == 1)[0].shape[0] > 50 and torch.where(gt_data['fg_labels_t'] == 1)[0].shape[0] > 50: 123 | 124 | foreground_mask_s = (gt_data['fg_labels_s'] == 1) 125 | foreground_mask_t = (gt_data['fg_labels_t'] == 1) 126 | 127 | prev_idx_s = 0 128 | prev_idx_t = 0 129 | chamfer_loss = [] 130 | # Iterate over the samples in the batch 131 | for batch_idx in range(gt_data['R_ego'].shape[0]): 132 | 133 | temp_foreground_mask_s = foreground_mask_s[prev_idx_s : prev_idx_s + gt_data['len_batch'][batch_idx][0]] 134 | temp_foreground_mask_t = foreground_mask_t[prev_idx_t : prev_idx_t + gt_data['len_batch'][batch_idx][1]] 135 | 136 | if torch.sum(temp_foreground_mask_s) > 50 and torch.sum(temp_foreground_mask_t) > 50: 137 | foreground_xyz_s_temp = xyz_s[prev_idx_s: 
prev_idx_s + gt_data['len_batch'][batch_idx][0],:] 138 | foreground_xyz_t_temp = xyz_t[prev_idx_t: prev_idx_t + gt_data['len_batch'][batch_idx][1],:] 139 | foreground_flow = inferred_values['refined_rigid_flow'][prev_idx_s: prev_idx_s + gt_data['len_batch'][batch_idx][0],:] 140 | 141 | foreground_xyz_s = foreground_xyz_s_temp[temp_foreground_mask_s,:] 142 | foreground_flow = foreground_flow[temp_foreground_mask_s,:] 143 | foreground_xyz_t = foreground_xyz_t_temp[temp_foreground_mask_t,:] 144 | 145 | dist1, dist2 = self.chamfer_criterion(foreground_xyz_t.unsqueeze(0), (foreground_xyz_s + foreground_flow).unsqueeze(0)) 146 | 147 | # Clamp the distance to prevent outliers (objects that appear and disappear from the scene) 148 | dist1 = torch.clamp(torch.sqrt(dist1), max=1.0) 149 | dist2 = torch.clamp(torch.sqrt(dist2), max=1.0) 150 | 151 | chamfer_loss.append((torch.mean(dist1) + torch.mean(dist2)) / 2.0) 152 | 153 | prev_idx_s += gt_data['len_batch'][batch_idx][0] 154 | prev_idx_t += gt_data['len_batch'][batch_idx][1] 155 | 156 | # Handle the case where there are no foreground points 157 | if len(chamfer_loss) == 0: chamfer_loss.append(torch.tensor(0.0).to(self.device)) 158 | 159 | losses['chamfer_loss'] = torch.mean(torch.stack(chamfer_loss)) * self.args['loss'].get('cd_loss_w', 1.0) 160 | 161 | # Rigidity loss (flow vectors of each cluster should be congruent) 162 | n_clusters = 0 163 | # Iterate over the clusters and enforce rigidity within each cluster 164 | for batch_idx in inferred_values['clusters_s']: 165 | 166 | for cluster in inferred_values['clusters_s'][batch_idx]: 167 | cluster_xyz_s = xyz_s[cluster,:].unsqueeze(0) 168 | cluster_flow = inferred_values['refined_rigid_flow'][cluster,:].unsqueeze(0) 169 | reconstructed_xyz = cluster_xyz_s + cluster_flow 170 | 171 | # Compute the unweighted Kabsch estimation (transformation parameters which best explain the vectors) 172 | R_cluster, t_cluster, _, _ = kabsch_transformation_estimation(cluster_xyz_s, reconstructed_xyz) 173 | 174 | # Detach the gradients such that they do not flow through the tansformation parameters but only through flow 175 | rigid_xyz = (torch.matmul(R_cluster, cluster_xyz_s.transpose(1, 2)) + t_cluster ).detach().squeeze(0).transpose(0,1) 176 | 177 | rigidity_loss += self.rigidity_criterion(reconstructed_xyz.squeeze(0), rigid_xyz) 178 | 179 | n_clusters += 1 180 | 181 | n_clusters = 1.0 if n_clusters == 0 else n_clusters 182 | losses['rigidity_loss'] = (rigidity_loss / n_clusters) * self.args['loss'].get('rigid_loss_w', 1.0) 183 | 184 | # Compute the total loss as the sum of individual losses 185 | total_loss = 0.0 186 | for key in losses: 187 | total_loss += losses[key] 188 | 189 | losses['total_loss'] = total_loss 190 | return losses 191 | 192 | 193 | 194 | 195 | 196 | 197 | class OutlierLoss(): 198 | """ 199 | Outlier loss used regularize the training of the ego-motion. Aims to prevent Sinkhorn algorithm to 200 | assign to much mass to the slack row and column. 
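A minimal sketch of the quantity this loss penalises, simplified to a single dense assignment matrix with made-up values: whatever soft-assignment mass is missing from each row and column is mass that leaked into the Sinkhorn slack bins.

```python
import torch

# One 2x2 soft assignment matrix; each row and column sums to 0.95,
# i.e. 0.05 of the mass per point went to the slack row/column.
perm = torch.tensor([[[0.90, 0.05],
                      [0.05, 0.90]]])          # [batch, n_src, n_ref]

ref_outliers_strength = 1.0 - perm.sum(dim=1)  # per reference point
src_outliers_strength = 1.0 - perm.sum(dim=2)  # per source point

loss = ref_outliers_strength.mean() + src_outliers_strength.mean()   # -> 0.10
```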
201 | 202 | """ 203 | def __init__(self): 204 | 205 | self.reduction = 'mean' 206 | 207 | def __call__(self, perm_matrix): 208 | 209 | ref_outliers_strength = [] 210 | src_outliers_strength = [] 211 | 212 | for batch_idx in range(len(perm_matrix)): 213 | ref_outliers_strength.append(1.0 - torch.sum(perm_matrix[batch_idx], dim=1)) 214 | src_outliers_strength.append(1.0 - torch.sum(perm_matrix[batch_idx], dim=2)) 215 | 216 | ref_outliers_strength = torch.cat(ref_outliers_strength,1) 217 | src_outliers_strength = torch.cat(src_outliers_strength,0) 218 | 219 | if self.reduction.lower() == 'mean': 220 | return torch.mean(ref_outliers_strength) + torch.mean(src_outliers_strength) 221 | 222 | elif self.reduction.lower() == 'none': 223 | return torch.mean(ref_outliers_strength, dim=1) + \ 224 | torch.mean(src_outliers_strength, dim=1) 225 | -------------------------------------------------------------------------------- /lib/metrics.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import torch.nn as nn 3 | 4 | from lib.utils import compute_epe, rotation_error, translation_error, precision_at_one, evaluate_binary_class 5 | 6 | class EvalMetrics(nn.Module): 7 | """ 8 | Computes all the evaluation metric used to either monitor the training process or evaluate the method 9 | 10 | Args: 11 | args: parameters controling the initialization of the evaluation metrics 12 | 13 | """ 14 | 15 | def __init__(self, args): 16 | nn.Module.__init__(self) 17 | 18 | 19 | self.args = args 20 | self.device = torch.device('cuda' if (torch.cuda.is_available() and args['misc']['use_gpu']) else 'cpu') 21 | 22 | def __call__(self, inferred_values, gt_data, phase='train'): 23 | 24 | # Initialize the dictionary 25 | metrics = {} 26 | 27 | if (self.args['method']['flow'] and self.args['metrics']['flow']): 28 | assert (('refined_flow' in inferred_values) & ('flow_eval' in gt_data)), "Flow metrics selected \ 29 | but either est or gt flow not provided" 30 | 31 | 32 | gt_flow = gt_data['flow'] if phase == 'train' else gt_data['flow_eval'] 33 | # Compute the end point error of the flow vectors 34 | # If bg/fg labels are available use them to also compute f-EPE and b-EPE 35 | if 'fg_labels_eval_s' in gt_data and self.args['data']['dataset'] not in ["FlyingThings3D_ME", "StereoKITTI_ME"]: 36 | gt_label = gt_data['fg_labels_s'] if phase == 'train' else gt_data['fg_labels_eval_s'] 37 | ego_metrics = compute_epe(inferred_values['refined_rigid_flow'], gt_flow, sem_label=gt_label, eval_stats=True) 38 | else: 39 | ego_metrics = compute_epe(inferred_values['refined_rigid_flow'], gt_flow, eval_stats =True) 40 | 41 | for key, value in ego_metrics.items(): 42 | metrics[key] = value 43 | 44 | # Compute the ego-motion metric 45 | if self.args['method']['ego_motion'] and self.args['metrics']['ego_motion']: 46 | assert (('R_est' in inferred_values) & ('R_ego' in gt_data)), "Ego motion metric selected \ 47 | but either est or gt ego motion not provided" 48 | 49 | r_error = rotation_error(inferred_values['R_est'], gt_data['R_ego']) 50 | 51 | metrics['mean_r_error'] = torch.mean(r_error).item() 52 | metrics['max_r_error'] = torch.max(r_error).item() 53 | metrics['min_r_error'] = torch.min(r_error).item() 54 | 55 | t_error = translation_error(inferred_values['t_est'], gt_data['t_ego']) 56 | 57 | metrics['mean_t_error'] = torch.mean(t_error).item() 58 | metrics['max_t_error'] = torch.max(t_error).item() 59 | metrics['min_t_error'] = torch.min(t_error).item() 60 | 61 | 62 | # Compute the 
background segmentation metric 63 | if self.args['method']['semantic'] and self.args['metrics']['semantic']: 64 | assert (('semantic_logits_s_all' in inferred_values) & ('fg_labels_eval_s' in gt_data)), "Background segmentation metric selected \ 65 | but either est or gt labels not provided" 66 | 67 | gt_label = gt_data['fg_labels_s'] if phase == 'train' else gt_data['fg_labels_eval_s'] 68 | 69 | pred_label = inferred_values['semantic_logits_s_all'].max(1)[1] 70 | pre_f, pre_b, rec_f, rec_b = precision_at_one(pred_label, gt_label) 71 | 72 | metrics['precision_f'] = pre_f.item() 73 | metrics['recall_f'] = rec_f.item() 74 | metrics['precision_b'] = pre_b.item() 75 | metrics['recall_b'] = rec_b.item() 76 | 77 | 78 | true_p, true_n, false_p, false_n = evaluate_binary_class(pred_label, gt_label) 79 | 80 | metrics['true_p'] = true_p.item() 81 | metrics['true_n'] = true_n.item() 82 | metrics['false_p'] = false_p.item() 83 | metrics['false_n'] = false_n.item() 84 | 85 | return metrics -------------------------------------------------------------------------------- /lib/model/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zgojcic/Rigid3DSceneFlow/ee064203873951ee42e058c35924878de8446eb8/lib/model/__init__.py -------------------------------------------------------------------------------- /lib/model/minkowski/ME_layers.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import torch.nn as nn 3 | 4 | import MinkowskiEngine as ME 5 | import MinkowskiEngine.MinkowskiFunctional as MEF 6 | 7 | #### NORMALIZATION LAYER #### 8 | def get_norm_layer(norm_type, num_feats, bn_momentum=0.05, D=-1): 9 | 10 | if norm_type == 'BN': 11 | return ME.MinkowskiBatchNorm(num_feats, momentum=bn_momentum) 12 | 13 | elif norm_type == 'IN': 14 | return ME.MinkowskiInstanceNorm(num_feats) 15 | 16 | else: 17 | raise ValueError(f'Type {norm_type}, not defined') 18 | 19 | #### RESIDUAL BLOCK #### 20 | 21 | class ResBlockBase(nn.Module): 22 | expansion = 1 23 | NORM_TYPE = 'BN' 24 | 25 | def __init__(self, 26 | inplanes, 27 | planes, 28 | stride=1, 29 | dilation=1, 30 | downsample=None, 31 | bn_momentum=0.1, 32 | D=3): 33 | super(ResBlockBase, self).__init__() 34 | 35 | self.conv1 = ME.MinkowskiConvolution( 36 | inplanes, planes, kernel_size=3, stride=stride, dimension=D) 37 | 38 | self.norm1 = get_norm_layer(self.NORM_TYPE, planes, bn_momentum=bn_momentum, D=D) 39 | 40 | self.conv2 = ME.MinkowskiConvolution( 41 | planes, 42 | planes, 43 | kernel_size=3, 44 | stride=1, 45 | dilation=dilation, 46 | bias=False, 47 | dimension=D) 48 | 49 | self.norm2 = get_norm_layer(self.NORM_TYPE, planes, bn_momentum=bn_momentum, D=D) 50 | 51 | self.downsample = downsample 52 | 53 | def forward(self, x): 54 | residual = x 55 | 56 | out = self.conv1(x) 57 | out = self.norm1(out) 58 | out = MEF.relu(out) 59 | 60 | out = self.conv2(out) 61 | out = self.norm2(out) 62 | 63 | if self.downsample is not None: 64 | residual = self.downsample(x) 65 | 66 | out += residual 67 | out = MEF.relu(out) 68 | 69 | return out 70 | 71 | 72 | class ResBlockBN(ResBlockBase): 73 | NORM_TYPE = 'BN' 74 | 75 | 76 | class ResBlockIN(ResBlockBase): 77 | NORM_TYPE = 'IN' 78 | 79 | 80 | def get_res_block(norm_type, 81 | inplanes, 82 | planes, 83 | stride=1, 84 | dilation=1, 85 | downsample=None, 86 | bn_momentum=0.1, 87 | D=3): 88 | 89 | if norm_type == 'BN': 90 | return ResBlockBN(inplanes, planes, stride, dilation, downsample, bn_momentum, D) 
91 | 92 | elif norm_type == 'IN': 93 | return ResBlockIN(inplanes, planes, stride, dilation, downsample, bn_momentum, D) 94 | 95 | else: 96 | raise ValueError(f'Type {norm_type}, not defined') 97 | -------------------------------------------------------------------------------- /lib/model/minkowski/MinkowskiFlow.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import torch.nn as nn 3 | import numpy as np 4 | import MinkowskiEngine as ME 5 | import MinkowskiEngine.MinkowskiFunctional as MEF 6 | 7 | from lib.model.minkowski.ME_layers import get_norm_layer, get_res_block 8 | from lib.utils import kabsch_transformation_estimation 9 | 10 | _EPS = 1e-6 11 | 12 | class SparseEnoder(ME.MinkowskiNetwork): 13 | CHANNELS = [None, 64, 64, 128, 128] 14 | 15 | def __init__(self, 16 | in_channels=3, 17 | out_channels=128, 18 | bn_momentum=0.1, 19 | conv1_kernel_size=9, 20 | norm_type='IN', 21 | D=3): 22 | 23 | ME.MinkowskiNetwork.__init__(self, D) 24 | 25 | NORM_TYPE = norm_type 26 | BLOCK_NORM_TYPE = norm_type 27 | CHANNELS = self.CHANNELS 28 | 29 | 30 | self.conv1 = ME.MinkowskiConvolution( 31 | in_channels=in_channels, 32 | out_channels=CHANNELS[1], 33 | kernel_size=conv1_kernel_size, 34 | stride=1, 35 | dilation=1, 36 | bias=False, 37 | dimension=D) 38 | self.norm1 = get_norm_layer(NORM_TYPE, CHANNELS[1], bn_momentum=bn_momentum, D=D) 39 | 40 | self.block1 = get_res_block( 41 | BLOCK_NORM_TYPE, CHANNELS[1], CHANNELS[1], bn_momentum=bn_momentum, D=D) 42 | 43 | self.conv2 = ME.MinkowskiConvolution( 44 | in_channels=CHANNELS[1], 45 | out_channels=CHANNELS[2], 46 | kernel_size=3, 47 | stride=2, 48 | dilation=1, 49 | bias=False, 50 | dimension=D) 51 | 52 | self.norm2 = get_norm_layer(NORM_TYPE, CHANNELS[2], bn_momentum=bn_momentum, D=D) 53 | 54 | self.block2 = get_res_block( 55 | BLOCK_NORM_TYPE, CHANNELS[2], CHANNELS[2], bn_momentum=bn_momentum, D=D) 56 | 57 | self.conv3 = ME.MinkowskiConvolution( 58 | in_channels=CHANNELS[2], 59 | out_channels=CHANNELS[3], 60 | kernel_size=3, 61 | stride=2, 62 | dilation=1, 63 | bias=False, 64 | dimension=D) 65 | self.norm3 = get_norm_layer(NORM_TYPE, CHANNELS[3], bn_momentum=bn_momentum, D=D) 66 | 67 | self.block3 = get_res_block( 68 | BLOCK_NORM_TYPE, CHANNELS[3], CHANNELS[3], bn_momentum=bn_momentum, D=D) 69 | 70 | self.conv4 = ME.MinkowskiConvolution( 71 | in_channels=CHANNELS[3], 72 | out_channels=CHANNELS[4], 73 | kernel_size=3, 74 | stride=2, 75 | dilation=1, 76 | bias=False, 77 | dimension=D) 78 | self.norm4 = get_norm_layer(NORM_TYPE, CHANNELS[4], bn_momentum=bn_momentum, D=D) 79 | 80 | self.block4 = get_res_block( 81 | BLOCK_NORM_TYPE, CHANNELS[4], CHANNELS[4], bn_momentum=bn_momentum, D=D) 82 | 83 | 84 | 85 | def forward(self, x, tgt_feature=False): 86 | 87 | skip_features = [] 88 | out_s1 = self.conv1(x) 89 | out_s1 = self.norm1(out_s1) 90 | out = self.block1(out_s1) 91 | 92 | skip_features.append(out_s1) 93 | 94 | out_s2 = self.conv2(out) 95 | out_s2 = self.norm2(out_s2) 96 | out = self.block2(out_s2) 97 | 98 | skip_features.append(out_s2) 99 | 100 | out_s4 = self.conv3(out) 101 | out_s4 = self.norm3(out_s4) 102 | out = self.block3(out_s4) 103 | 104 | skip_features.append(out_s4) 105 | 106 | out_s8 = self.conv4(out) 107 | out_s8 = self.norm4(out_s8) 108 | out = self.block4(out_s8) 109 | 110 | return out, skip_features 111 | 112 | 113 | 114 | 115 | 116 | class SparseDecoder(ME.MinkowskiNetwork): 117 | TR_CHANNELS = [None, 64, 128, 128, 128] 118 | CHANNELS = [None, 64, 64, 128, 128] 119 | 120 | def 
__init__(self, 121 | out_channels=128, 122 | bn_momentum=0.1, 123 | norm_type='IN', 124 | D=3): 125 | 126 | ME.MinkowskiNetwork.__init__(self, D) 127 | 128 | NORM_TYPE = norm_type 129 | BLOCK_NORM_TYPE = norm_type 130 | TR_CHANNELS = self.TR_CHANNELS 131 | CHANNELS = self.CHANNELS 132 | 133 | 134 | self.conv4_tr = ME.MinkowskiConvolutionTranspose( 135 | in_channels=CHANNELS[4], 136 | out_channels=TR_CHANNELS[4], 137 | kernel_size=3, 138 | stride=2, 139 | dilation=1, 140 | bias=False, 141 | dimension=D) 142 | 143 | self.norm4_tr = get_norm_layer(NORM_TYPE, TR_CHANNELS[4], bn_momentum=bn_momentum, D=D) 144 | 145 | self.block4_tr = get_res_block( 146 | BLOCK_NORM_TYPE, TR_CHANNELS[4], TR_CHANNELS[4], bn_momentum=bn_momentum, D=D) 147 | 148 | 149 | self.conv3_tr = ME.MinkowskiConvolutionTranspose( 150 | in_channels=CHANNELS[3] + TR_CHANNELS[4], 151 | out_channels=TR_CHANNELS[3], 152 | kernel_size=3, 153 | stride=2, 154 | dilation=1, 155 | bias=False, 156 | dimension=D) 157 | self.norm3_tr = get_norm_layer(NORM_TYPE, TR_CHANNELS[3], bn_momentum=bn_momentum, D=D) 158 | 159 | self.block3_tr = get_res_block( 160 | BLOCK_NORM_TYPE, TR_CHANNELS[3], TR_CHANNELS[3], bn_momentum=bn_momentum, D=D) 161 | 162 | 163 | self.conv2_tr = ME.MinkowskiConvolutionTranspose( 164 | in_channels=CHANNELS[2] + TR_CHANNELS[3], 165 | out_channels=TR_CHANNELS[2], 166 | kernel_size=3, 167 | stride=2, 168 | dilation=1, 169 | bias=False, 170 | dimension=D) 171 | self.norm2_tr = get_norm_layer(NORM_TYPE, TR_CHANNELS[2], bn_momentum=bn_momentum, D=D) 172 | 173 | self.block2_tr = get_res_block( 174 | BLOCK_NORM_TYPE, TR_CHANNELS[2], TR_CHANNELS[2], bn_momentum=bn_momentum, D=D) 175 | 176 | 177 | 178 | self.conv1_tr = ME.MinkowskiConvolutionTranspose( 179 | in_channels=CHANNELS[1] + TR_CHANNELS[2], 180 | out_channels=TR_CHANNELS[1], 181 | kernel_size=1, 182 | stride=1, 183 | dilation=1, 184 | bias=False, 185 | dimension=D) 186 | 187 | self.final = ME.MinkowskiConvolution( 188 | in_channels=TR_CHANNELS[1], 189 | out_channels=out_channels, 190 | kernel_size=1, 191 | stride=1, 192 | dilation=1, 193 | bias=True, 194 | dimension=D) 195 | 196 | 197 | def forward(self, x, skip_features): 198 | 199 | out = self.conv4_tr(x) 200 | out = self.norm4_tr(out) 201 | 202 | out_s4_tr = self.block4_tr(out) 203 | 204 | out = ME.cat(out_s4_tr, skip_features[-1]) 205 | 206 | out = self.conv3_tr(out) 207 | out = self.norm3_tr(out) 208 | out_s2_tr = self.block3_tr(out) 209 | 210 | out = ME.cat(out_s2_tr, skip_features[-2]) 211 | 212 | out = self.conv2_tr(out) 213 | out = self.norm2_tr(out) 214 | out_s1_tr = self.block2_tr(out) 215 | 216 | out = ME.cat(out_s1_tr, skip_features[-3]) 217 | 218 | out = self.conv1_tr(out) 219 | out = MEF.relu(out) 220 | out = self.final(out) 221 | 222 | return out 223 | 224 | class SparseFlowRefiner(ME.MinkowskiNetwork): 225 | BLOCK_NORM_TYPE = 'BN' 226 | NORM_TYPE = 'BN' 227 | 228 | def __init__(self, 229 | flow_dim = 3, 230 | flow_channels = 64, 231 | out_channels=3, 232 | bn_momentum=0.1, 233 | conv1_kernel_size=5, 234 | D=3): 235 | 236 | ME.MinkowskiNetwork.__init__(self, D) 237 | 238 | NORM_TYPE = self.NORM_TYPE 239 | BLOCK_NORM_TYPE = self.BLOCK_NORM_TYPE 240 | 241 | self.conv1 = ME.MinkowskiConvolution( 242 | in_channels=flow_dim, 243 | out_channels=flow_channels, 244 | kernel_size=conv1_kernel_size, 245 | stride=1, 246 | dilation=1, 247 | bias=False, 248 | dimension=D) 249 | 250 | self.conv2 = ME.MinkowskiConvolution( 251 | in_channels=flow_channels, 252 | out_channels=flow_channels, 253 | kernel_size=3, 254 | 
stride=1, 255 | dilation=1, 256 | bias=False, 257 | dimension=D) 258 | 259 | 260 | 261 | self.conv3 = ME.MinkowskiConvolution( 262 | in_channels=flow_channels, 263 | out_channels=flow_channels, 264 | kernel_size=3, 265 | stride=1, 266 | dilation=1, 267 | bias=False, 268 | dimension=D) 269 | 270 | self.conv4 = ME.MinkowskiConvolution( 271 | in_channels=flow_channels, 272 | out_channels=flow_channels, 273 | kernel_size=3, 274 | stride=1, 275 | dilation=1, 276 | bias=False, 277 | dimension=D) 278 | 279 | self.final = ME.MinkowskiConvolution( 280 | in_channels=flow_channels, 281 | out_channels=out_channels, 282 | kernel_size=1, 283 | stride=1, 284 | dilation=1, 285 | bias=False, 286 | dimension=D) 287 | 288 | 289 | def forward(self, flow): 290 | 291 | 292 | out = MEF.relu(self.conv1(flow)) 293 | out = MEF.relu(self.conv2(out)) 294 | 295 | out = MEF.relu(self.conv3(out)) 296 | out = MEF.relu(self.conv4(out)) 297 | 298 | res_flow = self.final(out) 299 | 300 | 301 | return flow + res_flow 302 | 303 | 304 | class EgoMotionHead(nn.Module): 305 | """ 306 | Class defining EgoMotionHead 307 | """ 308 | 309 | def __init__(self, add_slack=True, sinkhorn_iter=5): 310 | nn.Module.__init__(self) 311 | 312 | self.slack = add_slack 313 | self.sinkhorn_iter = sinkhorn_iter 314 | 315 | # Affinity parameters 316 | self.beta = torch.nn.Parameter(torch.tensor(-5.0)) 317 | self.alpha = torch.nn.Parameter(torch.tensor(-5.0)) 318 | 319 | self.softplus = torch.nn.Softplus() 320 | 321 | 322 | def compute_rigid_transform(self, xyz_s, xyz_t, weights): 323 | """Compute rigid transforms between two point sets 324 | 325 | Args: 326 | a (torch.Tensor): (B, M, 3) points 327 | b (torch.Tensor): (B, N, 3) points 328 | weights (torch.Tensor): (B, M) 329 | 330 | Returns: 331 | Transform T (B, 3, 4) to get from a to b, i.e. T*a = b 332 | """ 333 | 334 | weights_normalized = weights[..., None] / (torch.sum(weights[..., None], dim=1, keepdim=True) + _EPS) 335 | centroid_s = torch.sum(xyz_s * weights_normalized, dim=1) 336 | centroid_t = torch.sum(xyz_t * weights_normalized, dim=1) 337 | s_centered = xyz_s - centroid_s[:, None, :] 338 | t_centered = xyz_t - centroid_t[:, None, :] 339 | cov = s_centered.transpose(-2, -1) @ (t_centered * weights_normalized) 340 | 341 | # Compute rotation using Kabsch algorithm. Will compute two copies with +/-V[:,:3] 342 | # and choose based on determinant to avoid flips 343 | u, s, v = torch.svd(cov, some=False, compute_uv=True) 344 | rot_mat_pos = v @ u.transpose(-1, -2) 345 | v_neg = v.clone() 346 | v_neg[:, :, 2] *= -1 347 | rot_mat_neg = v_neg @ u.transpose(-1, -2) 348 | rot_mat = torch.where(torch.det(rot_mat_pos)[:, None, None] > 0, rot_mat_pos, rot_mat_neg) 349 | assert torch.all(torch.det(rot_mat) > 0) 350 | 351 | # Compute translation (uncenter centroid) 352 | translation = -rot_mat @ centroid_s[:, :, None] + centroid_t[:, :, None] 353 | 354 | transform = torch.cat((rot_mat, translation), dim=2) 355 | 356 | return transform 357 | 358 | def sinkhorn(self, log_alpha, n_iters=5, slack=True): 359 | """ Run sinkhorn iterations to generate a near doubly stochastic matrix, where each row or column sum to <=1 360 | Args: 361 | log_alpha: log of positive matrix to apply sinkhorn normalization (B, J, K) 362 | n_iters (int): Number of normalization iterations 363 | slack (bool): Whether to include slack row and column 364 | eps: eps for early termination (Used only for handcrafted RPM). Set to negative to disable. 
365 | Returns: 366 | log(perm_matrix): Doubly stochastic matrix (B, J, K) 367 | Modified from original source taken from: 368 | Learning Latent Permutations with Gumbel-Sinkhorn Networks 369 | https://github.com/HeddaCohenIndelman/Learning-Gumbel-Sinkhorn-Permutations-w-Pytorch 370 | """ 371 | 372 | # Sinkhorn iterations 373 | 374 | zero_pad = nn.ZeroPad2d((0, 1, 0, 1)) 375 | log_alpha_padded = zero_pad(log_alpha[:, None, :, :]) 376 | 377 | log_alpha_padded = torch.squeeze(log_alpha_padded, dim=1) 378 | 379 | for i in range(n_iters): 380 | # Row normalization 381 | log_alpha_padded = torch.cat(( 382 | log_alpha_padded[:, :-1, :] - (torch.logsumexp(log_alpha_padded[:, :-1, :], dim=2, keepdim=True)), 383 | log_alpha_padded[:, -1, None, :]), # Don't normalize last row 384 | dim=1) 385 | 386 | # Column normalization 387 | log_alpha_padded = torch.cat(( 388 | log_alpha_padded[:, :, :-1] - (torch.logsumexp(log_alpha_padded[:, :, :-1], dim=1, keepdim=True)), 389 | log_alpha_padded[:, :, -1, None]), # Don't normalize last column 390 | dim=2) 391 | 392 | 393 | log_alpha = log_alpha_padded[:, :-1, :-1] 394 | 395 | return log_alpha 396 | 397 | 398 | def forward(self, score_matrix, mask, xyz_s, xyz_t): 399 | 400 | affinity = -(score_matrix - self.softplus(self.alpha))/(torch.exp(self.beta) + 0.02) 401 | 402 | # Compute weighted coordinates 403 | log_perm_matrix = self.sinkhorn(affinity, n_iters=self.sinkhorn_iter, slack=self.slack) 404 | 405 | perm_matrix = torch.exp(log_perm_matrix) * mask 406 | weighted_t = perm_matrix @ xyz_t / (torch.sum(perm_matrix, dim=2, keepdim=True) + _EPS) 407 | 408 | # Compute transform and transform points 409 | #transform = self.compute_rigid_transform(xyz_s, weighted_t, weights=torch.sum(perm_matrix, dim=2)) 410 | R_est, t_est, _, _ = kabsch_transformation_estimation(xyz_s, weighted_t, weights=torch.sum(perm_matrix, dim=2)) 411 | return R_est, t_est, perm_matrix 412 | 413 | 414 | 415 | class SparseSegHead(ME.MinkowskiNetwork): 416 | 417 | def __init__(self, 418 | in_channels=64, 419 | out_channels=128, 420 | bn_momentum=0.1, 421 | norm_type='IN', 422 | D=3): 423 | 424 | ME.MinkowskiNetwork.__init__(self, D) 425 | 426 | NORM_TYPE = norm_type 427 | 428 | self.seg_head_1 = ME.MinkowskiConvolution( 429 | in_channels=in_channels, 430 | out_channels=in_channels, 431 | kernel_size=1, 432 | stride=1, 433 | dilation=1, 434 | bias=True, 435 | dimension=D) 436 | 437 | self.norm_1 = get_norm_layer(NORM_TYPE, in_channels, bn_momentum=bn_momentum, D=D) 438 | 439 | self.seg_head_2 = ME.MinkowskiConvolution( 440 | in_channels=in_channels, 441 | out_channels=out_channels, 442 | kernel_size=1, 443 | stride=1, 444 | dilation=1, 445 | bias=True, 446 | dimension=D) 447 | 448 | 449 | def forward(self, x): 450 | 451 | out = self.seg_head_1(x) 452 | out = self.norm_1(out) 453 | out = MEF.relu(out) 454 | 455 | out = self.seg_head_2(out) 456 | 457 | 458 | return out -------------------------------------------------------------------------------- /lib/model/minkowski/__init__,py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zgojcic/Rigid3DSceneFlow/ee064203873951ee42e058c35924878de8446eb8/lib/model/minkowski/__init__,py -------------------------------------------------------------------------------- /lib/model/rigid_3d_sf.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import torch.nn as nn 3 | 4 | import numpy as np 5 | from collections import defaultdict 6 | from 
sklearn.cluster import DBSCAN 7 | import MinkowskiEngine as ME 8 | 9 | from lib.utils import pairwise_distance, transform_point_cloud, kabsch_transformation_estimation, refine_ego_motion, refine_cluster_motion 10 | from lib.utils import upsample_flow, upsample_bckg_labels, upsample_cluster_labels 11 | from lib.model.minkowski.MinkowskiFlow import SparseEnoder, SparseDecoder, SparseFlowRefiner, EgoMotionHead, SparseSegHead 12 | 13 | 14 | 15 | class MinkowskiFlow(nn.Module): 16 | def __init__(self, args): 17 | super(MinkowskiFlow, self).__init__() 18 | 19 | self.args = args 20 | self.voxel_size = args['misc']['voxel_size'] 21 | self.device = torch.device('cuda' if (torch.cuda.is_available() and args['misc']['use_gpu']) else 'cpu') 22 | self.normalize_feature = args['network']['normalize_features'] 23 | self.test_flag = True if args['misc']['run_mode'] == 'test' else False 24 | 25 | if self.test_flag: 26 | self.postprocess_ego = args['test']['postprocess_ego'] 27 | self.postprocess_clusters = args['test']['postprocess_clusters'] 28 | 29 | self.estimate_ego, self.estimate_flow, self.estimate_semantic, self.estimate_cluster = False, False, False, False 30 | 31 | self.upsampling_k = 36 if args['data']['dataset'] in ['StereoKITTI_ME', 'FlyingThings3D_ME'] else 3 32 | self.tau_offset = 0.025 if args['data']['dataset'] in ['StereoKITTI_ME', 'FlyingThings3D_ME'] else 0.03 33 | 34 | if args['data']['input_features'] == 'occupancy': 35 | self.input_feature_dim = 1 36 | else: 37 | self.input_feature_dim = 3 38 | 39 | # Initialize the backbone network 40 | self.encoder = SparseEnoder(in_channels=self.input_feature_dim, 41 | conv1_kernel_size=args['network']['in_kernel_size'], 42 | norm_type=args['network']['norm_type']) 43 | 44 | self.decoder = SparseDecoder(out_channels=args['network']['feature_dim'], 45 | norm_type=args['network']['norm_type']) 46 | 47 | # Initialize the scene flow head 48 | if args['method']['flow']: 49 | self.estimate_flow = True 50 | self.epsilon = torch.nn.Parameter(torch.tensor(-5.0)) 51 | 52 | self.flow_refiner = SparseFlowRefiner(flow_dim=3) 53 | 54 | # Initialize the background segmentation head 55 | if args['method']['semantic']: 56 | self.estimate_semantic = True 57 | 58 | self.seg_decoder = SparseSegHead(in_channels=args['network']['feature_dim'], 59 | out_channels=args['data']['n_classes'], 60 | norm_type=args['network']['norm_type']) 61 | 62 | 63 | # Initialize the ego motion head 64 | if args['method']['ego_motion']: 65 | self.estimate_ego = True 66 | self.ego_n_points = args['network']['ego_motion_points'] 67 | self.add_slack = args['network']['add_slack'] 68 | self.sinkhorn_iter = args['network']['sinkhorn_iter'] 69 | 70 | self.ego_motion_decoder = EgoMotionHead(add_slack=self.add_slack, 71 | sinkhorn_iter=self.sinkhorn_iter) 72 | 73 | # Initialize the foreground clustering head 74 | if args['method']['clustering']: 75 | self.estimate_cluster = True 76 | self.min_p_cluster = args['network']['min_p_cluster'] 77 | 78 | self.cluster_estimator = DBSCAN(min_samples=args['network']['min_samples_dbscan'], 79 | metric=args['network']['cluster_metric'], eps=args['network']['eps_dbscan']) 80 | 81 | 82 | 83 | def _infer_flow(self, flow_f_1, flow_f_2): 84 | 85 | # Normalize the features 86 | if self.normalize_feature: 87 | flow_f_1= ME.SparseTensor( 88 | flow_f_1.F / torch.norm(flow_f_1.F, p=2, dim=1, keepdim=True), 89 | coordinate_map_key=flow_f_1.coordinate_map_key, 90 | coordinate_manager=flow_f_1.coordinate_manager) 91 | 92 | flow_f_2= ME.SparseTensor( 93 | flow_f_2.F / 
torch.norm(flow_f_2.F, p=2, dim=1, keepdim=True), 94 | coordinate_map_key=flow_f_2.coordinate_map_key, 95 | coordinate_manager=flow_f_2.coordinate_manager) 96 | 97 | # Extract the coarse flow based on the feature correspondences 98 | coarse_flow = [] 99 | 100 | # Iterate over the examples in the batch 101 | for b_idx in range(len(flow_f_1.decomposed_coordinates)): 102 | feat_s = flow_f_1.F[flow_f_1.C[:,0] == b_idx] 103 | feat_t = flow_f_2.F[flow_f_2.C[:,0] == b_idx] 104 | 105 | coor_s = flow_f_1.C[flow_f_1.C[:,0] == b_idx,1:].to(self.device) * self.voxel_size 106 | coor_t = flow_f_2.C[flow_f_2.C[:,0] == b_idx,1:].to(self.device) * self.voxel_size 107 | 108 | 109 | # Squared l2 distance between points points of both point clouds 110 | coor_s, coor_t = coor_s.unsqueeze(0), coor_t.unsqueeze(0) 111 | feat_s, feat_t = feat_s.unsqueeze(0), feat_t.unsqueeze(0) 112 | 113 | # Force transport to be zero for points further than 10 m apart 114 | support = (pairwise_distance(coor_s, coor_t, normalized=False ) < 10**2).float() 115 | 116 | # Transport cost matrix 117 | C = pairwise_distance(feat_s, feat_t) 118 | 119 | K = torch.exp(-C / (torch.exp(self.epsilon) + self.tau_offset)) * support 120 | 121 | row_sum = K.sum(-1, keepdim=True) 122 | 123 | # Estimate flow 124 | corr_flow = (K @ coor_t) / (row_sum + 1e-8) - coor_s 125 | 126 | coarse_flow.append(corr_flow.squeeze(0)) 127 | 128 | 129 | coarse_flow = torch.cat(coarse_flow,dim=0) 130 | 131 | st_cf = ME.SparseTensor(features=coarse_flow, 132 | coordinate_manager=flow_f_1.coordinate_manager, 133 | coordinate_map_key=flow_f_1.coordinate_map_key) 134 | 135 | self.inferred_values['coarse_flow'] = st_cf.F 136 | 137 | 138 | # Refine the flow with the second network 139 | refined_flow = self.flow_refiner(st_cf) 140 | 141 | 142 | self.inferred_values['refined_flow'] = refined_flow.F 143 | 144 | 145 | 146 | def _infer_ego_motion(self, flow_f_1, flow_f_2, sem_label_s, sem_label_t): 147 | 148 | ego_motion_R = [] 149 | ego_motion_t = [] 150 | ego_motion_perm = [] 151 | 152 | run_b_len_s = 0 153 | run_b_len_t = 0 154 | 155 | # Normalize the features 156 | if self.normalize_feature: 157 | flow_f_1= ME.SparseTensor( 158 | flow_f_1.F / torch.norm(flow_f_1.F, p=2, dim=1, keepdim=True), 159 | coordinate_map_key=flow_f_1.coordinate_map_key, 160 | coordinate_manager=flow_f_1.coordinate_manager) 161 | 162 | flow_f_2= ME.SparseTensor( 163 | flow_f_2.F / torch.norm(flow_f_2.F, p=2, dim=1, keepdim=True), 164 | coordinate_map_key=flow_f_2.coordinate_map_key, 165 | coordinate_manager=flow_f_2.coordinate_manager) 166 | 167 | for b_idx in range(len(flow_f_1.decomposed_coordinates)): 168 | feat_s = flow_f_1.F[flow_f_1.C[:,0] == b_idx] 169 | feat_t = flow_f_2.F[flow_f_2.C[:,0] == b_idx] 170 | 171 | coor_s = flow_f_1.C[flow_f_1.C[:,0] == b_idx,1:].to(self.device) * self.voxel_size 172 | coor_t = flow_f_2.C[flow_f_2.C[:,0] == b_idx,1:].to(self.device) * self.voxel_size 173 | 174 | # Get the number of points in the current b_idx 175 | b_len_s = feat_s.shape[0] 176 | b_len_t = feat_t.shape[0] 177 | 178 | # Extract the semantic labels for the current b_idx (0 are the background points) 179 | mask_s = (sem_label_s[run_b_len_s: (run_b_len_s + b_len_s)] == 0) 180 | mask_t = (sem_label_t[run_b_len_t: (run_b_len_t + b_len_t)] == 0) 181 | 182 | # Update the running number of points 183 | run_b_len_s += b_len_s 184 | run_b_len_t += b_len_t 185 | 186 | # Squared l2 distance between points points of both point clouds 187 | coor_s, coor_t = coor_s[mask_s, :].unsqueeze(0), coor_t[mask_t, 
:].unsqueeze(0) 188 | feat_s, feat_t = feat_s[mask_s, :].unsqueeze(0), feat_t[mask_t, :].unsqueeze(0) 189 | 190 | # Sample the points randomly (to keep the computation memory tracktable) 191 | idx_ego_s = torch.randperm(coor_s.shape[1])[:self.ego_n_points] 192 | idx_ego_t = torch.randperm(coor_t.shape[1])[:self.ego_n_points] 193 | 194 | coor_s_ego = coor_s[:,idx_ego_s,:] 195 | coor_t_ego = coor_t[:,idx_ego_t,:] 196 | feat_s_ego = feat_s[:,idx_ego_s,:] 197 | feat_t_ego = feat_t[:,idx_ego_t,:] 198 | 199 | # Force transport to be zero for points further than 10 m apart 200 | support_ego = (pairwise_distance(coor_s_ego, coor_t_ego, normalized=False ) < 5 ** 2).float() 201 | 202 | # Cost matrix in the feature space 203 | feat_dist = pairwise_distance(feat_s_ego, feat_t_ego) 204 | 205 | R_est, t_est, perm_matrix = self.ego_motion_decoder(feat_dist, support_ego, coor_s_ego, coor_t_ego) 206 | 207 | ego_motion_R.append(R_est) 208 | ego_motion_t.append(t_est) 209 | ego_motion_perm.append(perm_matrix) 210 | 211 | 212 | # Save ego motion results 213 | self.inferred_values['R_est'] = torch.cat(ego_motion_R, dim=0) 214 | self.inferred_values['t_est'] = torch.cat(ego_motion_t, dim=0) 215 | self.inferred_values['permutation'] = ego_motion_perm 216 | 217 | 218 | def _infer_semantics(self, dec_f_1, dec_f_2): 219 | 220 | # Extract the logits 221 | logits_s = self.seg_decoder(dec_f_1) 222 | logits_t = self.seg_decoder(dec_f_2) 223 | 224 | self.inferred_values['semantic_logits_s'] = logits_s 225 | self.inferred_values['semantic_logits_t'] = logits_t 226 | 227 | 228 | def _infer_clusters(self, st_s, st_t, sem_label_s, sem_label_t): 229 | 230 | # Cluster the source and target point cloud (only source clusters will be used) 231 | running_idx_s = 0 232 | running_idx_t = 0 233 | 234 | clusters_s = defaultdict(list) 235 | clusters_t = defaultdict(list) 236 | 237 | clusters_s_rot = defaultdict(list) 238 | clusters_s_trans = defaultdict(list) 239 | 240 | batch_size = torch.max(st_s.coordinates[:,0]) + 1 241 | 242 | for b_idx in range(batch_size): 243 | b_fgrnd_idx_s = torch.where(sem_label_s[running_idx_s:(running_idx_s + st_s.C[st_s.C[:,0] == b_idx,1:].shape[0])] == 1)[0] 244 | b_fgrnd_idx_t = torch.where(sem_label_t[running_idx_t:(running_idx_t + st_t.C[st_t.C[:,0] == b_idx,1:].shape[0])] == 1)[0] 245 | 246 | coor_s = st_s.C[st_s.C[:,0] == b_idx,1:].to(self.device) * self.voxel_size 247 | coor_t = st_t.C[st_t.C[:,0] == b_idx,1:].to(self.device) * self.voxel_size 248 | 249 | # Only perform if foreground points are in both source and target 250 | if b_fgrnd_idx_s.shape[0] and b_fgrnd_idx_t.shape[0]: 251 | xyz_fgrnd_s = coor_s[b_fgrnd_idx_s, :].cpu().numpy() 252 | xyz_fgrnd_t = coor_t[b_fgrnd_idx_t, :].cpu().numpy() 253 | 254 | # Perform clustering 255 | labels_s = self.cluster_estimator.fit_predict(xyz_fgrnd_s) 256 | labels_t = self.cluster_estimator.fit_predict(xyz_fgrnd_t) 257 | 258 | # Map cluster labels to indices (consider only clusters that have at least n points) 259 | for class_label in np.unique(labels_s): 260 | if class_label != -1 and np.where(labels_s == class_label)[0].shape[0] >= self.min_p_cluster: 261 | clusters_s[str(b_idx)].append(b_fgrnd_idx_s[np.where(labels_s == class_label)[0]] + running_idx_s) 262 | 263 | for class_label in np.unique(labels_t): 264 | if class_label != -1 and np.where(labels_t == class_label)[0].shape[0] >= self.min_p_cluster: 265 | clusters_t[str(b_idx)].append(b_fgrnd_idx_t[np.where(labels_t == class_label)[0]] + running_idx_t) 266 | 267 | # Estimate the relative transformation 
parameteres of each cluster 268 | if self.test_flag: 269 | for c_idx in clusters_s[str(b_idx)]: 270 | cluster_xyz_s = (st_s.C[c_idx,1:] * self.voxel_size).unsqueeze(0).to(self.device) 271 | cluster_flow = self.inferred_values['refined_flow'][c_idx,:].unsqueeze(0) 272 | reconstructed_xyz = cluster_xyz_s + cluster_flow 273 | 274 | R_cluster, t_cluster, _, _ = kabsch_transformation_estimation(cluster_xyz_s, reconstructed_xyz) 275 | 276 | clusters_s_rot[str(b_idx)].append(R_cluster.squeeze(0)) 277 | clusters_s_trans[str(b_idx)].append(t_cluster.squeeze(0)) 278 | 279 | running_idx_s += coor_s.shape[0] 280 | running_idx_t += coor_t.shape[0] 281 | 282 | self.inferred_values['clusters_s'] = clusters_s 283 | self.inferred_values['clusters_t'] = clusters_t 284 | self.inferred_values['clusters_s_R'] = clusters_s_rot 285 | self.inferred_values['clusters_s_t'] = clusters_s_trans 286 | 287 | 288 | 289 | def forward(self, st_1, st_2, xyz_1, xyz_2, sem_label_s, sem_label_t): 290 | 291 | self.inferred_values = {} 292 | 293 | # Run both point clouds through the backbone network 294 | enc_feat_1, skip_features_1 = self.encoder(st_1) 295 | enc_feat_2, skip_features_2 = self.encoder(st_2) 296 | 297 | dec_feat_1 = self.decoder(enc_feat_1, skip_features_1) 298 | dec_feat_2 = self.decoder(enc_feat_2, skip_features_2) 299 | 300 | # Rune the background segmentation head 301 | if self.estimate_semantic: 302 | self._infer_semantics(dec_feat_1, dec_feat_2) 303 | est_sem_label_s = self.inferred_values['semantic_logits_s'].F.max(1)[1] 304 | est_sem_label_t = self.inferred_values['semantic_logits_t'].F.max(1)[1] 305 | 306 | # Rune the scene flow head 307 | if self.estimate_flow: 308 | self._infer_flow(dec_feat_1, dec_feat_2) 309 | 310 | # Rune the ego-motion head 311 | if self.estimate_ego: 312 | # During training use the given semantic labels to sample the points 313 | if self.test_flag: 314 | if self.estimate_semantic: 315 | self._infer_ego_motion(dec_feat_1, dec_feat_2, est_sem_label_s, est_sem_label_t) 316 | else: 317 | raise ValueError("Ego motion estimation selected in test phase but background segmentation head was not used") 318 | else: 319 | self._infer_ego_motion(dec_feat_1, dec_feat_2, sem_label_s, sem_label_t) 320 | 321 | # Rune the foreground clustering 322 | if self.estimate_cluster: 323 | # During training use the given semantic labels 324 | if self.test_flag: 325 | if self.estimate_semantic: 326 | self._infer_clusters(st_1,st_2, est_sem_label_s, est_sem_label_t) 327 | else: 328 | raise ValueError("Foreground clustering selected in test phase but background segmentation head was not used") 329 | else: 330 | 331 | self._infer_clusters(st_1,st_2, sem_label_s, sem_label_t) 332 | 333 | 334 | 335 | # From rigid transformations to pointwise scene flow 336 | if self.test_flag and self.estimate_ego: 337 | 338 | coor_s = st_1.C[st_1.C[:,0] == 0,1:].to(self.device) * self.voxel_size 339 | coor_t = st_2.C[st_2.C[:,0] == 0,1:].to(self.device) * self.voxel_size 340 | 341 | # Ego-motion test-time optimization 342 | if self.test_flag and self.postprocess_ego: 343 | bckg_mask_s = (est_sem_label_s == 0).unsqueeze(0) 344 | bckg_mask_t = (est_sem_label_t == 0).unsqueeze(0) 345 | 346 | R_e, t_e = refine_ego_motion(coor_s.unsqueeze(0), coor_t.unsqueeze(0), bckg_mask_s, bckg_mask_t, self.inferred_values['R_est'], self.inferred_values['t_est'] ) 347 | 348 | self.inferred_values['R_est'] = torch.from_numpy(R_e).to(self.device) 349 | self.inferred_values['t_est'] = torch.from_numpy(t_e).to(self.device) 350 | 351 | 352 | # Update 
the flow vectors of the background based on the ego motion 353 | xyz_1_transformed = transform_point_cloud(coor_s.to(self.device), self.inferred_values['R_est'], self.inferred_values['t_est']) 354 | bckg_idx = torch.where(est_sem_label_s == 0)[0] 355 | self.inferred_values['refined_flow'][bckg_idx,:] = xyz_1_transformed[0,bckg_idx,:].to(self.device) - coor_s[bckg_idx,:].to(self.device) 356 | 357 | if self.test_flag and self.estimate_cluster: 358 | 359 | # Foreground test time optimization 360 | if self.test_flag and self.postprocess_clusters: 361 | fgnd_mask_t = (est_sem_label_t == 1).unsqueeze(0) 362 | 363 | for idx, c_idx in enumerate(self.inferred_values['clusters_s']['0']): 364 | pc_s_cluster = coor_s[c_idx,:] 365 | pc_t_fgnd = coor_t[fgnd_mask_t[0],:] 366 | 367 | R_coarse = self.inferred_values['clusters_s_R']['0'][idx] 368 | t_coarse = self.inferred_values['clusters_s_t']['0'][idx] 369 | 370 | R_c, t_c = refine_cluster_motion(pc_s_cluster, pc_t_fgnd, R_coarse, t_coarse) 371 | 372 | R_c = torch.from_numpy(R_c).to(self.device) 373 | t_c = torch.from_numpy(t_c).to(self.device) 374 | 375 | self.inferred_values['clusters_s_R']['0'][idx] = R_c 376 | self.inferred_values['clusters_s_t']['0'][idx] = t_c 377 | 378 | 379 | # Update the flow vectors of the foreground based on the object wise rigid motion 380 | for idx, c_idx in enumerate(self.inferred_values['clusters_s']['0']): 381 | pc_s_cluster = coor_s[c_idx,:] 382 | 383 | cluster_transformed = transform_point_cloud(pc_s_cluster.to(self.device), self.inferred_values['clusters_s_R']['0'][idx], 384 | self.inferred_values['clusters_s_t']['0'][idx]) 385 | 386 | self.inferred_values['refined_flow'][c_idx,:] = cluster_transformed.squeeze(0).to(self.device) - pc_s_cluster.to(self.device) 387 | 388 | 389 | # Upsample the flow from the voxel centers to the original points 390 | 391 | if self.estimate_flow: 392 | # Finally we upsample the voxel flow to the actuall raw points 393 | refined_voxel_flow = ME.SparseTensor(features=self.inferred_values['refined_flow'], 394 | coordinate_manager=dec_feat_1.coordinate_manager, 395 | coordinate_map_key=dec_feat_1.coordinate_map_key) 396 | 397 | # Interpolate the flow from the voxels to the continuos coordinates on the coarse level and upsample the labels 398 | upsampled_voxel_flow = upsample_flow(xyz_1, refined_voxel_flow, k_value=self.upsampling_k, voxel_size=self.voxel_size) 399 | self.inferred_values['refined_rigid_flow'] = torch.cat(upsampled_voxel_flow, dim=0) 400 | 401 | if self.estimate_semantic: 402 | upsampled_seg_labels = upsample_bckg_labels(xyz_1, self.inferred_values['semantic_logits_s'], voxel_size=self.voxel_size) 403 | 404 | self.inferred_values['semantic_logits_s_all'] = upsampled_seg_labels 405 | 406 | if self.estimate_cluster: 407 | 408 | upsampled_cluster_labels = upsample_cluster_labels(xyz_1, self.inferred_values['semantic_logits_s'], self.inferred_values['clusters_s'], voxel_size=self.voxel_size) 409 | 410 | self.inferred_values['clusters_s_all'] = upsampled_cluster_labels 411 | 412 | return self.inferred_values 413 | 414 | 415 | -------------------------------------------------------------------------------- /lib/trainer.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import copy 3 | import MinkowskiEngine as ME 4 | from tqdm import tqdm 5 | 6 | from lib.loss import TrainLoss 7 | from lib.metrics import EvalMetrics 8 | from lib.utils import dict_all_to_device 9 | 10 | class FlowTrainer: 11 | ''' 12 | Default trainer class for the 
scene flow training 13 | 14 | Args: 15 | args (dict): configuration parameters 16 | model (nn.Module): model 17 | device (pytorch device) 18 | 19 | ''' 20 | 21 | def __init__(self, args, model, device): 22 | 23 | self.device = device 24 | 25 | self.compute_losses = TrainLoss(args) 26 | self.compute_metrics = EvalMetrics(args) 27 | self.model = model.to(device) 28 | 29 | 30 | def train_step(self, data): 31 | ''' 32 | Performs a single training step. 33 | 34 | Args: 35 | data (dict): input data 36 | 37 | Returns: 38 | loss_values (dict): all individual loss values 39 | metric (dict): evaluation metics 40 | total_loss (torch.tensor): loss value used for training 41 | 42 | ''' 43 | 44 | self.model.train() 45 | losses, metrics = self._compute_loss_metrics(data) 46 | 47 | # Copy only the loss values not the whole tensors 48 | loss_values = {} 49 | for key, value in losses.items(): 50 | loss_values[key] = value.item() 51 | 52 | return loss_values, metrics, losses['total_loss'] 53 | 54 | 55 | def eval_step(self, data): 56 | ''' 57 | Performs a single evaluation epoch. 58 | 59 | Args: 60 | data (dict): input data 61 | 62 | Returns: 63 | metric (dict): evaluation metics 64 | 65 | ''' 66 | 67 | # evaluate model: 68 | self.model.eval() 69 | with torch.no_grad(): 70 | _, metrics = self._compute_loss_metrics(data, phase='eval') 71 | 72 | return metrics 73 | 74 | 75 | def validate(self, val_loader): 76 | ''' 77 | Performs the whole validation 78 | 79 | Args: 80 | val_loader ( torch data loader): data loader of the validation data 81 | ''' 82 | 83 | # evaluate model: 84 | self.model.eval() 85 | running_losses = {} 86 | running_metrics = {} 87 | 88 | with torch.no_grad(): 89 | for it, batch in enumerate(tqdm(val_loader)): 90 | 91 | dict_all_to_device(batch, self.device) 92 | losses, metrics = self._compute_loss_metrics(batch) 93 | 94 | # Update the running losses 95 | if not running_losses: 96 | running_losses = copy.deepcopy(losses) 97 | else: 98 | for key, value in losses.items(): 99 | running_losses[key] += value 100 | 101 | # Update the running metrics 102 | if not running_metrics: 103 | running_metrics = copy.deepcopy(metrics) 104 | else: 105 | for key, value in metrics.items(): 106 | running_metrics[key] += value 107 | 108 | 109 | for key, value in running_losses.items(): 110 | running_losses[key] = value/len(val_loader) 111 | 112 | for key, value in running_metrics.items(): 113 | running_metrics[key] = value/len(val_loader) 114 | 115 | return running_losses, running_metrics 116 | 117 | 118 | class MEFlowTrainer(FlowTrainer): 119 | ''' 120 | Trainer class of the 3D rigid scene flow network with ME backbone 121 | 122 | Args: 123 | args (dict): configuration parameters 124 | model (nn.Module): model 125 | device (pytorch device) 126 | ''' 127 | 128 | def __init__(self, args, model, device): 129 | 130 | FlowTrainer.__init__(self, args, model, device) 131 | 132 | def _compute_loss_metrics(self, input_dict, phase='train'): 133 | 134 | ''' 135 | Computes the losses and evaluation metrics 136 | 137 | Args: 138 | input_dict (dict): data dictionary 139 | 140 | Return: 141 | losses (dict): selected loss values 142 | metric (dict): selected evaluation metric 143 | ''' 144 | 145 | # Run the feature and context encoder 146 | sinput1 = ME.SparseTensor(features=input_dict['sinput_s_F'].to(self.device), 147 | coordinates=input_dict['sinput_s_C'].to(self.device)) 148 | 149 | sinput2 = ME.SparseTensor(features=input_dict['sinput_t_F'].to(self.device), 150 | coordinates=input_dict['sinput_t_C'].to(self.device)) 151 | 
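        # Note on the inputs above: 'sinput_*_C' hold the quantized voxel coordinates in the MinkowskiEngine batched format,
        # i.e. each row is [batch_idx, x, y, z], while 'sinput_*_F' hold the per-voxel input features, either a constant
        # occupancy feature (1-dim) or the raw xyz coordinates (3-dim), depending on args['data']['input_features']
        # (see lib/model/rigid_3d_sf.py).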
152 | if phase == 'train': 153 | inferred_values = self.model(sinput1, sinput2, input_dict['pcd_s'], input_dict['pcd_t'], input_dict['fg_labels_s'], input_dict['fg_labels_t']) 154 | else: 155 | inferred_values = self.model(sinput1, sinput2, input_dict['pcd_eval_s'], input_dict['pcd_eval_t'], input_dict['fg_labels_s'], input_dict['fg_labels_t']) 156 | 157 | losses = self.compute_losses(inferred_values, input_dict) 158 | 159 | metrics = self.compute_metrics(inferred_values, input_dict, phase) 160 | 161 | return losses, metrics 162 | 163 | 164 | def _demo_step(self, input_dict): 165 | 166 | ''' 167 | Runs a short demo and visualizes the output 168 | 169 | Args: 170 | input_dict (dict): data dictionary 171 | 172 | ''' 173 | 174 | # Run the feature and context encoder 175 | sinput1 = ME.SparseTensor(features=input_dict['sinput_s_F'].to(self.device), 176 | coordinates=input_dict['sinput_s_C'].to(self.device)) 177 | 178 | sinput2 = ME.SparseTensor(features=input_dict['sinput_t_F'].to(self.device), 179 | coordinates=input_dict['sinput_t_C'].to(self.device)) 180 | 181 | inferred_values = self.model(sinput1, sinput2, input_dict['pcd_eval_s'], input_dict['pcd_eval_t'], input_dict['fg_labels_s'], input_dict['fg_labels_t']) 182 | 183 | losses = self.compute_losses(inferred_values, input_dict) 184 | metrics = self.compute_metrics(inferred_values, input_dict) 185 | 186 | return losses, metrics -------------------------------------------------------------------------------- /lib/utils.py: -------------------------------------------------------------------------------- 1 | import os 2 | import re 3 | import torch 4 | import logging 5 | import math 6 | from collections import defaultdict 7 | 8 | import open3d as o3d 9 | import numpy as np 10 | 11 | from matplotlib import pyplot as plt 12 | from matplotlib import cm 13 | from matplotlib.colors import Normalize 14 | 15 | def dict_all_to_device(tensor_dict, device): 16 | """ 17 | Puts all the tensors to a specified device 18 | 19 | Args: 20 | tensor_dict (dict): dictionary of all tensors 21 | device (str): device to be used (cuda or cpu) 22 | 23 | """ 24 | 25 | for key in tensor_dict: 26 | if isinstance(tensor_dict[key], torch.Tensor): 27 | if 'sinput' not in key: 28 | tensor_dict[key] = tensor_dict[key].to(device) 29 | 30 | 31 | def save_checkpoint(filename, epoch, it, model, optimizer=None, scheduler=None, config=None, best_val=None): 32 | """ 33 | Saves the current model, optimizer, scheduler, and side information to a checkpoint 34 | 35 | Args: 36 | filename (str): path to where the checkpoint will be saved 37 | epoch (int): current epoch 38 | it (int): current iteration 39 | model (nn.Module): torch neural network model 40 | optimizer (torch.optim): selected optimizer 41 | scheduler (torch.optim): selected scheduler 42 | config (dict): config parameters 43 | best_val (float): best validation result 44 | 45 | """ 46 | 47 | state = { 48 | 'epoch': epoch, 49 | 'state_dict': model.state_dict(), 50 | 'total_it': it, 51 | 'optimizer': optimizer.state_dict(), 52 | 'config': config, 53 | 'scheduler': scheduler.state_dict(), 54 | 'best_val': best_val, 55 | } 56 | 57 | logging.info("Saving checkpoint: {} ...".format(filename)) 58 | torch.save(state, filename) 59 | 60 | 61 | def load_checkpoint(model, optimizer, scheduler, filename): 62 | """ 63 | Loads the saved checkpoint and updates the model, optimizer and scheduler.
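The model weights are loaded safely: only the parameter keys present in both the current model and the checkpoint are copied, and the optimizer, scheduler, epoch, iteration count, and best validation value are restored only if they are stored in the checkpoint.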
64 | 65 | Args: 66 | model (nn.Module): torch neural network model 67 | optimizer (torch.optim): selected optimizer 68 | scheduler (torch.optim): selected scheduler 69 | filename (str): path to the saved checkpoint 70 | 71 | Returns: 72 | model (nn.Module): model with pretrained parameters 73 | optimizer (torch.optim): optimizer loaded from the checkpoint 74 | scheduler (torch.optim): scheduler loaded from the checkpoint 75 | start_epoch (int): current epoch 76 | total_it (int): total number of iterations that were performed 77 | metric_val_best (float): current best valuation metric 78 | 79 | """ 80 | start_epoch = 0 81 | total_it = 0 82 | metric_val_best = np.inf 83 | 84 | if os.path.isfile(filename): 85 | logging.info("Loading checkpoint {}".format(filename)) 86 | checkpoint = torch.load(filename) 87 | 88 | # Safe loading of the model, load only the keys that are in the init and the saved model 89 | model_dict = model.state_dict() 90 | for key in model_dict: 91 | if key in checkpoint['state_dict']: 92 | model_dict[key] = checkpoint['state_dict'][key] 93 | 94 | model.load_state_dict(model_dict) 95 | 96 | if optimizer is not None and 'optimizer' in checkpoint: 97 | try: 98 | optimizer.load_state_dict(checkpoint['optimizer']) 99 | except: 100 | logging.info('could not load optimizer from the pretrained model') 101 | 102 | if 'epoch' in checkpoint: 103 | start_epoch = checkpoint['epoch'] 104 | 105 | if 'total_it' in checkpoint: 106 | total_it = checkpoint['total_it'] 107 | 108 | if 'best_val' in checkpoint: 109 | metric_val_best = checkpoint['best_val'] 110 | 111 | if scheduler is not None and 'scheduler' in checkpoint: 112 | scheduler.load_state_dict(checkpoint['scheduler']) 113 | 114 | else: 115 | logging.info("No checkpoint found at {}".format(filename)) 116 | 117 | return model, optimizer, scheduler, start_epoch, total_it, metric_val_best 118 | 119 | 120 | def load_point_cloud(file, data_type='numpy'): 121 | """ 122 | Loads the point cloud coordinates from the '*.ply' file. 123 | Args: 124 | file (str): path to the '*.ply' file 125 | data_type (str): data type to be returned (default: numpy) 126 | Returns: 127 | pc (np.array or open3d.PointCloud()): point coordinates [n, 3] 128 | """ 129 | temp_pc = o3d.io.read_point_cloud(file) 130 | 131 | assert data_type in ['numpy', 'open3d'], 'Wrong data type selected when loading the ply file.' 
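    # Usage sketch (the file name is purely illustrative): load_point_cloud('frame_0000.ply') returns an [n, 3] numpy array
    # of point coordinates, while load_point_cloud('frame_0000.ply', data_type='open3d') returns the open3d point cloud object itself.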
132 | 133 | if data_type == 'numpy': 134 | return np.asarray(temp_pc.points) 135 | else: 136 | return temp_pc 137 | 138 | 139 | def sorted_alphanum(file_list_ordered): 140 | """ 141 | Sorts the list alphanumerically 142 | Args: 143 | file_list_ordered (list): list of files to be sorted 144 | Return: 145 | sorted_list (list): input list sorted alphanumerically 146 | """ 147 | def convert(text): 148 | return int(text) if text.isdigit() else text 149 | 150 | def alphanum_key(key): 151 | return [convert(c) for c in re.split('([0-9]+)', key)] 152 | 153 | sorted_list = sorted(file_list_ordered, key=alphanum_key) 154 | 155 | return sorted_list 156 | 157 | def get_file_list(path, extension=None): 158 | """ 159 | Build a list of all the files in the provided path 160 | Args: 161 | path (str): path to the directory 162 | extension (str): only return files with this extension 163 | Return: 164 | file_list (list): list of all the files (with the provided extension) sorted alphanumerically 165 | """ 166 | if extension is None: 167 | file_list = [os.path.join(path, f) for f in os.listdir(path) if os.path.isfile(os.path.join(path, f))] 168 | else: 169 | file_list = [ 170 | os.path.join(path, f) 171 | for f in os.listdir(path) 172 | if os.path.isfile(os.path.join(path, f)) and os.path.splitext(f)[1] == extension 173 | ] 174 | file_list = sorted_alphanum(file_list) 175 | 176 | return file_list 177 | 178 | 179 | def get_folder_list(path): 180 | """ 181 | Build a list of all the files in the provided path 182 | Args: 183 | path (str): path to the directory 184 | extension (str): only return files with this extension 185 | Returns: 186 | file_list (list): list of all the files (with the provided extension) sorted alphanumerically 187 | """ 188 | folder_list = [os.path.join(path, f) for f in os.listdir(path) if os.path.isdir(os.path.join(path, f))] 189 | folder_list = sorted_alphanum(folder_list) 190 | 191 | return folder_list 192 | 193 | def n_model_parameters(model): 194 | """ 195 | Counts the number of parameters in a torch model 196 | Args: 197 | model (torch.Model): input model 198 | 199 | Returns: 200 | _ (int): number of the parameters 201 | """ 202 | 203 | return sum(p.numel() for p in model.parameters() if p.requires_grad) 204 | 205 | 206 | def pairwise_distance(src, dst, normalized=True): 207 | """Calculates squared Euclidean distance between each two points. 208 | Args: 209 | src (torch tensor): source data, [b, n, c] 210 | dst (torch tensor): target data, [b, m, c] 211 | normalized (bool): distance computation can be more efficient 212 | Returns: 213 | dist (torch tensor): per-point square distance, [b, n, m] 214 | """ 215 | 216 | if len(src.shape) == 2: 217 | src = src.unsqueeze(0) 218 | dst = dst.unsqueeze(0) 219 | 220 | B, N, _ = src.shape 221 | _, M, _ = dst.shape 222 | 223 | # Minus such that smaller value still means closer 224 | dist = -torch.matmul(src, dst.permute(0, 2, 1)) 225 | 226 | # If inputs are normalized just add 1 otherwise compute the norms 227 | if not normalized: 228 | dist *= 2 229 | dist += torch.sum(src ** 2, dim=-1)[:, :, None] 230 | dist += torch.sum(dst ** 2, dim=-1)[:, None, :] 231 | 232 | else: 233 | dist += 1.0 234 | 235 | # Distances can get negative due to numerical precision 236 | dist = torch.clamp(dist, min=0.0, max=None) 237 | 238 | return dist 239 | 240 | 241 | def rotation_error(R1, R2): 242 | """ 243 | Torch batch implementation of the rotation error between the estimated and the ground truth rotatiom matrix. 
244 | Rotation error is defined as r_e = \arccos(\frac{\mathrm{Trace}(\mathbf{R}_{ij}^{T}\mathbf{R}_{ij}^{\mathrm{GT}}) - 1}{2}) 245 | Args: 246 | R1 (torch tensor): Estimated rotation matrices [b,3,3] 247 | R2 (torch tensor): Ground truth rotation matrices [b,3,3] 248 | Returns: 249 | ae (torch tensor): Rotation error in angular degrees [b,1] 250 | """ 251 | R_ = torch.matmul(R1.transpose(1,2), R2) 252 | e = torch.stack([(torch.trace(R_[_, :, :]) - 1) / 2 for _ in range(R_.shape[0])], dim=0).unsqueeze(1) 253 | 254 | # Clamp the errors to the valid range (otherwise torch.acos() is nan) 255 | e = torch.clamp(e, -1, 1, out=None) 256 | 257 | ae = torch.acos(e) 258 | pi = torch.Tensor([math.pi]) 259 | ae = 180. * ae / pi.to(ae.device).type(ae.dtype) 260 | 261 | return ae 262 | 263 | 264 | def translation_error(t1, t2): 265 | """ 266 | Torch batch implementation of the translation error between the estimated and the ground truth translation vectors. 267 | Translation error is defined as t_e = \lVert \mathbf{t}_{1} - \mathbf{t}_{2} \rVert_{2} 268 | Args: 269 | t1 (torch tensor): Estimated translation vectors [b,3,1] 270 | t2 (torch tensor): Ground truth translation vectors [b,3,1] 271 | Returns: 272 | te (torch tensor): translation error in meters [b,1] 273 | """ 274 | return torch.norm(t1-t2, dim=(1, 2)) 275 | 276 | 277 | def kabsch_transformation_estimation(x1, x2, weights=None, normalize_w = True, eps = 1e-7, best_k = 0, w_threshold = 0, compute_residuals = False): 278 | """ 279 | Torch differentiable implementation of the weighted Kabsch algorithm (https://en.wikipedia.org/wiki/Kabsch_algorithm). Based on the correspondences and weights, it calculates 280 | the optimal rotation matrix in the sense of the Frobenius norm (RMSD); from the estimated rotation matrix it then estimates the translation vector, hence solving 281 | the Procrustes problem. This implementation supports batch inputs.
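In short, with normalized weights w the weighted centroids are \mu_1 = \sum_i w_i x1_i and \mu_2 = \sum_i w_i x2_i, the weighted covariance is H = (x1 - \mu_1)^{T} \mathrm{diag}(w) (x2 - \mu_2), its SVD H = U S V^{T} yields R = V \mathrm{diag}(1, 1, \det(V U^{T})) U^{T}, and the translation follows as t = \mu_2 - R \mu_1.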
282 | Args: 283 | x1 (torch array): points of the first point cloud [b,n,3] 284 | x2 (torch array): correspondences for the PC1 established in the feature space [b,n,3] 285 | weights (torch array): weights denoting if the coorespondence is an inlier (~1) or an outlier (~0) [b,n] 286 | normalize_w (bool) : flag for normalizing the weights to sum to 1 287 | best_k (int) : number of correspondences with highest weights to be used (if 0 all are used) 288 | w_threshold (float) : only use weights higher than this w_threshold (if 0 all are used) 289 | Returns: 290 | rot_matrices (torch array): estimated rotation matrices [b,3,3] 291 | trans_vectors (torch array): estimated translation vectors [b,3,1] 292 | res (torch array): pointwise residuals (Eucledean distance) [b,n] 293 | valid_gradient (bool): Flag denoting if the SVD computation converged (gradient is valid) 294 | """ 295 | if weights is None: 296 | weights = torch.ones(x1.shape[0],x1.shape[1]).type_as(x1).to(x1.device) 297 | 298 | if normalize_w: 299 | sum_weights = torch.sum(weights,dim=1,keepdim=True) + eps 300 | weights = (weights/sum_weights) 301 | 302 | weights = weights.unsqueeze(2) 303 | 304 | if best_k > 0: 305 | indices = np.argpartition(weights.cpu().numpy(), -best_k, axis=1)[0,-best_k:,0] 306 | weights = weights[:,indices,:] 307 | x1 = x1[:,indices,:] 308 | x2 = x2[:,indices,:] 309 | 310 | if w_threshold > 0: 311 | weights[weights < w_threshold] = 0 312 | 313 | 314 | x1_mean = torch.matmul(weights.transpose(1,2), x1) / (torch.sum(weights, dim=1).unsqueeze(1) + eps) 315 | x2_mean = torch.matmul(weights.transpose(1,2), x2) / (torch.sum(weights, dim=1).unsqueeze(1) + eps) 316 | 317 | x1_centered = x1 - x1_mean 318 | x2_centered = x2 - x2_mean 319 | 320 | cov_mat = torch.matmul(x1_centered.transpose(1, 2), 321 | (x2_centered * weights)) 322 | 323 | try: 324 | u, s, v = torch.svd(cov_mat) 325 | 326 | except Exception as e: 327 | r = torch.eye(3,device=x1.device) 328 | r = r.repeat(x1_mean.shape[0],1,1) 329 | t = torch.zeros((x1_mean.shape[0],3,1), device=x1.device) 330 | 331 | res = transformation_residuals(x1, x2, r, t) 332 | 333 | return r, t, res, True 334 | 335 | tm_determinant = torch.det(torch.matmul(v.transpose(1, 2), u.transpose(1, 2))) 336 | 337 | determinant_matrix = torch.diag_embed(torch.cat((torch.ones((tm_determinant.shape[0],2),device=x1.device), tm_determinant.unsqueeze(1)), 1)) 338 | 339 | rotation_matrix = torch.matmul(v,torch.matmul(determinant_matrix,u.transpose(1,2))) 340 | 341 | # translation vector 342 | translation_matrix = x2_mean.transpose(1,2) - torch.matmul(rotation_matrix,x1_mean.transpose(1,2)) 343 | 344 | # Residuals 345 | res = None 346 | if compute_residuals: 347 | res = transformation_residuals(x1, x2, rotation_matrix, translation_matrix) 348 | 349 | return rotation_matrix, translation_matrix, res, False 350 | 351 | 352 | def transformation_residuals(x1, x2, R, t): 353 | """ 354 | Computer the pointwise residuals based on the estimated transformation paramaters 355 | 356 | Args: 357 | x1 (torch array): points of the first point cloud [b,n,3] 358 | x2 (torch array): points of the second point cloud [b,n,3] 359 | R (torch array): estimated rotation matrice [b,3,3] 360 | t (torch array): estimated translation vectors [b,3,1] 361 | Returns: 362 | res (torch array): pointwise residuals (Eucledean distance) [b,n,1] 363 | """ 364 | x2_reconstruct = torch.matmul(R, x1.transpose(1, 2)) + t 365 | 366 | res = torch.norm(x2_reconstruct.transpose(1, 2) - x2, dim=2) 367 | 368 | return res 369 | 370 | def 
transform_point_cloud(x1, R, t): 371 | """ 372 | Transforms the point cloud using the giver transformation paramaters 373 | 374 | Args: 375 | x1 (np array): points of the point cloud [b,n,3] 376 | R (np array): estimated rotation matrice [b,3,3] 377 | t (np array): estimated translation vectors [b,3,1] 378 | Returns: 379 | x1_t (np array): points of the transformed point clouds [b,n,3] 380 | """ 381 | if len(R.shape) != 3: 382 | R = R.unsqueeze(0) 383 | 384 | if len(t.shape) != 3: 385 | t = t.unsqueeze(0) 386 | 387 | if len(x1.shape) != 3: 388 | x1 = x1.unsqueeze(0) 389 | 390 | x1_t = (torch.matmul(R, x1.transpose(2,1)) + t).transpose(2,1) 391 | 392 | return x1_t 393 | 394 | 395 | def refine_ego_motion(pc_s, pc_t, bckg_mask_s, bckg_mask_t, R_est, t_est): 396 | """ 397 | Refines the coarse ego motion estimate based on all background indices 398 | 399 | Args: 400 | pc_s (torch.tensor): points of the source point cloud [b,n,3] 401 | pc_t (torch.tensor): points of the target point cloud [b,n,3] 402 | bckg_mask_s (torch.tensor): background mask for the source points [b,n] 403 | bckg_mask_t (torch.tensor): background mask for the target points [b,n] 404 | R_est (torch.tensor): coarse rotation matrices [b,3,3] 405 | t_est (torch.tensor): coarse translation vectors [b,3,1] 406 | Returns: 407 | R_ref (np array): refined transformation parameters [b,3,3] 408 | t_ref (np array): refined transformation parameters [b,3,1] 409 | """ 410 | 411 | pcd_s = o3d.geometry.PointCloud() 412 | pcd_t = o3d.geometry.PointCloud() 413 | 414 | R_est = R_est.cpu().numpy() 415 | t_est = t_est.cpu().numpy() 416 | 417 | R_ref = np.zeros_like(R_est) 418 | t_ref = np.zeros_like(t_est) 419 | 420 | init_T = np.eye(4) 421 | 422 | for b_idx in range(pc_s.shape[0]): 423 | xyz_bckg_s = pc_s[b_idx, bckg_mask_s[b_idx,:], :].cpu().numpy() 424 | xyz_bckg_t = pc_t[b_idx, bckg_mask_t[b_idx,:], :].cpu().numpy() 425 | 426 | pcd_s.points = o3d.utility.Vector3dVector(xyz_bckg_s) 427 | pcd_t.points = o3d.utility.Vector3dVector(xyz_bckg_t) 428 | 429 | init_T[0:3,0:3] = R_est[b_idx,:,:] 430 | init_T[0:3,3:4] = t_est[b_idx,:,:] 431 | 432 | trans = o3d.pipelines.registration.registration_icp(pcd_s, pcd_t, 433 | max_correspondence_distance=0.15, init=init_T, 434 | criteria=o3d.pipelines.registration.ICPConvergenceCriteria(max_iteration = 300)) 435 | 436 | R_ref[b_idx,:,:] = trans.transformation[0:3,0:3] 437 | t_ref[b_idx,:,:] = trans.transformation[0:3,3:4] 438 | 439 | return R_ref, t_ref 440 | 441 | 442 | 443 | 444 | def refine_cluster_motion(pc_s, pc_t, R_est=None, t_est=None): 445 | """ 446 | Refines the motion of a foreground rigid agent (clust) 447 | 448 | Args: 449 | pc_s (torch.tensor): points of the cluster points [n,3] 450 | pc_t (torch.tensor): foreground point of the target point cloud [m,3] 451 | R_coarse (torch.tensor): coarse rotation matrices [3,3] 452 | t_coarse (torch.tensor): coarse translation vectors [3,1] 453 | Returns: 454 | R_ref (np array): refined transformation parameters [3,3] 455 | t_ref (np array): refined transformation parameters [3,1] 456 | """ 457 | 458 | pcd_s = o3d.geometry.PointCloud() 459 | pcd_t = o3d.geometry.PointCloud() 460 | 461 | init_T = np.eye(4, dtype=np.float) 462 | 463 | if R_est is not None: 464 | init_T[0:3,0:3] = R_est.cpu().numpy() 465 | init_T[0:3,3:4] = t_est.cpu().numpy() 466 | 467 | pcd_s.points = o3d.utility.Vector3dVector(pc_s.cpu()) 468 | pcd_t.points = o3d.utility.Vector3dVector(pc_t.cpu()) 469 | 470 | trans = o3d.pipelines.registration.registration_icp(pcd_s, pcd_t, 471 | 
max_correspondence_distance=0.25, init=init_T, 472 | criteria=o3d.pipelines.registration.ICPConvergenceCriteria(max_iteration = 300)) 473 | 474 | R_ref = trans.transformation[0:3,0:3].astype(np.float32) 475 | t_ref = trans.transformation[0:3,3:4].astype(np.float32) 476 | 477 | return R_ref, t_ref 478 | 479 | 480 | def compute_epe(est_flow, gt_flow, sem_label=None, eval_stats =False, mask=None): 481 | """ 482 | Compute 3d end-point-error 483 | 484 | Args: 485 | st_flow (torch.Tensor): estimated flow vectors [n,3] 486 | gt_flow (torch.Tensor): ground truth flow vectors [n,3] 487 | eval_stats (bool): compute the evaluation stats as defined in FlowNet3D 488 | mask (torch.Tensor): boolean mask used for filtering the epe [n] 489 | 490 | Returns: 491 | epe (float): mean EPE for current batch 492 | epe_bckg (float): mean EPE for the background points 493 | epe_forg (float): mean EPE for the foreground points 494 | acc3d_strict (float): inlier ratio according to strict thresh (error smaller than 5cm or 5%) 495 | acc3d_relax (float): inlier ratio according to relaxed thresh (error smaller than 10cm or 10%) 496 | outlier (float): ratio of outliers (error larger than 30cm or 10%) 497 | """ 498 | 499 | metrics = {} 500 | error = est_flow - gt_flow 501 | 502 | # If mask if provided mask out the flow 503 | if mask is not None: 504 | error = error[mask > 0.5] 505 | gt_flow = gt_flow[mask > 0.5, :] 506 | 507 | epe_per_point = torch.sqrt(torch.sum(torch.pow(error, 2.0), -1)) 508 | epe = epe_per_point.mean() 509 | 510 | metrics['epe'] = epe.item() 511 | 512 | 513 | if sem_label is not None: 514 | # Extract epe for background and foreground separately (background = class 0) 515 | bckg_mask = (sem_label == 0) 516 | forg_mask = (sem_label == 1) 517 | 518 | bckg_epe = epe_per_point[bckg_mask] 519 | forg_epe = epe_per_point[forg_mask] 520 | 521 | metrics['bckg_epe'] = bckg_epe.mean().item() 522 | metrics['bckg_epe_median'] = bckg_epe.median().item() 523 | 524 | if torch.sum(forg_mask) > 0: 525 | metrics['forg_epe_median'] = forg_epe.median().item() 526 | metrics['forg_epe'] = forg_epe.mean().item() 527 | 528 | if eval_stats: 529 | 530 | gt_f_magnitude = torch.norm(gt_flow, dim=-1) 531 | gt_f_magnitude_np = np.linalg.norm(gt_flow.cpu(), axis=-1) 532 | relative_err = epe_per_point / (gt_f_magnitude + 1e-4) 533 | acc3d_strict = ( 534 | (torch.logical_or(epe_per_point < 0.05, relative_err < 0.05)).type(torch.float).mean() 535 | ) 536 | acc3d_relax = ( 537 | (torch.logical_or(epe_per_point < 0.1, relative_err < 0.1)).type(torch.float).mean() 538 | ) 539 | outlier = (torch.logical_or(epe_per_point > 0.3, relative_err > 0.1)).type(torch.float).mean() 540 | 541 | metrics['acc3d_s'] = acc3d_strict.item() 542 | metrics['acc3d_r'] = acc3d_relax.item() 543 | metrics['outlier'] = outlier.item() 544 | 545 | return metrics 546 | 547 | 548 | def compute_l1_loss(est_flow, gt_flow): 549 | """ 550 | Compute training loss. 
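A small, self-contained example of calling `compute_epe` on synthetic flow vectors with a binary foreground/background label (hedged: it assumes the function is exposed from `lib/utils.py`):

```python
import torch

from lib.utils import compute_epe  # assumed module path

est_flow = torch.rand(1000, 3)
gt_flow = est_flow + 0.02 * torch.randn(1000, 3)     # roughly 2 cm of noise
sem_label = (torch.rand(1000) > 0.7).long()          # 1 = foreground, 0 = background

metrics = compute_epe(est_flow, gt_flow, sem_label=sem_label, eval_stats=True)
print(metrics['epe'], metrics['acc3d_s'], metrics['outlier'])
```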
551 | 552 | Args: 553 | est_flow (torch.Tensor): estimated flow 554 | gt_flow (torch.Tensor): : ground truth flow 555 | 556 | Returns 557 | loss (torch.tensor): mean l1 loss of the current batch 558 | 559 | """ 560 | 561 | error = est_flow - gt_flow 562 | loss = torch.mean(torch.abs(error)) 563 | 564 | return loss 565 | 566 | 567 | 568 | def precision_at_one(pred, target): 569 | """ 570 | Computes the precision and recall of the binary fg/bg segmentation 571 | 572 | Args: 573 | pred (torch.Tensor): predicted foreground labels 574 | target (torch.Tensor): : gt foreground labels 575 | 576 | Returns 577 | precision_f (float): foreground precision 578 | precision_b (float): background precision 579 | recall_f (float): foreground recall 580 | recall_b (float): background recall 581 | 582 | """ 583 | 584 | precision_f = (pred[target == 1] == 1).float().sum() / ((pred == 1).float().sum() + 1e-6) 585 | precision_b = (pred[target == 0] == 0).float().sum() / ((pred == 0).float().sum() + 1e-6) 586 | 587 | recall_f = (pred[target == 1] == 1).float().sum() / ((target == 1).float().sum() + 1e-6) 588 | recall_b = (pred[target == 0] == 0).float().sum() / ((target == 0).float().sum() + 1e-6) 589 | 590 | return precision_f, precision_b, recall_f, recall_b 591 | 592 | 593 | def evaluate_binary_class(pred, target): 594 | """ 595 | Computes the number of true/false positives and negatives 596 | 597 | Args: 598 | pred (torch.Tensor): predicted foreground labels 599 | target (torch.Tensor): : gt foreground labels 600 | 601 | Returns 602 | true_p (float): number of true positives 603 | true_n (float): number of true negatives 604 | false_p (float): number of false positives 605 | false_n (float): number of false negatives 606 | 607 | """ 608 | 609 | true_p = (pred[target == 1] == 1).float().sum() 610 | true_n = (pred[target == 0] == 0).float().sum() 611 | 612 | false_p = (pred[target == 0] == 1).float().sum() 613 | false_n = (pred[target == 1] == 0).float().sum() 614 | 615 | return true_p, true_n, false_p, false_n 616 | 617 | 618 | 619 | def upsample_flow(xyz, sparse_flow_tensor, k_value=3, voxel_size = None): 620 | 621 | dense_flow = [] 622 | for b_idx in range(len(xyz)): 623 | 624 | sparse_xyz = sparse_flow_tensor.coordinates_at(b_idx).cuda() * voxel_size 625 | sparse_flow = sparse_flow_tensor.features_at(b_idx) 626 | 627 | sqr_dist = pairwise_distance(xyz[b_idx].cuda(), sparse_xyz, normalized=False).squeeze(0) 628 | sqr_dist, group_idx = torch.topk(sqr_dist, k_value, dim = -1, largest=False, sorted=False) 629 | 630 | 631 | dist = torch.sqrt(sqr_dist) 632 | norm = torch.sum(1 / (dist + 1e-7), dim = 1, keepdim = True) 633 | weight = ((1 / (dist + 1e-7)) / norm ).unsqueeze(-1) 634 | 635 | test = group_idx.reshape(-1) 636 | sparse_flow = sparse_flow[group_idx.reshape(-1), :].reshape(-1,k_value,3) 637 | 638 | dense_flow.append(torch.sum(weight * sparse_flow, dim=1)) 639 | 640 | return dense_flow 641 | 642 | 643 | def upsample_bckg_labels(xyz, sparse_seg_tensor, voxel_size = None): 644 | 645 | upsampled_seg_labels = [] 646 | for b_idx in range(len(xyz)): 647 | sparse_xyz = sparse_seg_tensor.coordinates_at(b_idx).cuda() * voxel_size 648 | seg_labels = sparse_seg_tensor.features_at(b_idx) 649 | sqr_dist = pairwise_distance(xyz[b_idx].cuda(), sparse_xyz, normalized=False).squeeze(0) 650 | sqr_dist, idx = torch.topk(sqr_dist, 1, dim = -1, largest=False, sorted=False) 651 | 652 | 653 | upsampled_seg_labels.append(seg_labels[idx.reshape(-1)]) 654 | 655 | return torch.cat(upsampled_seg_labels,0) 656 | 657 | 658 | 659 | def 
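To illustrate the binary foreground/background metrics above, a short sketch on a toy prediction (again assuming `lib.utils` as the import path):

```python
import torch

from lib.utils import precision_at_one, evaluate_binary_class  # assumed module path

target = torch.tensor([1, 1, 0, 0, 1, 0])
pred   = torch.tensor([1, 0, 0, 0, 1, 1])

prec_f, prec_b, rec_f, rec_b = precision_at_one(pred, target)
tp, tn, fp, fn = evaluate_binary_class(pred, target)
print(prec_f.item(), rec_f.item(), tp.item(), fp.item())
```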
upsample_cluster_labels(xyz, sparse_seg_tensor, cluster_labels, voxel_size = None): 660 | 661 | upsampled_seg_labels = [] 662 | 663 | cluster_labels_all = defaultdict(list) 664 | for b_idx in range(len(xyz)): 665 | sparse_xyz = sparse_seg_tensor.coordinates_at(b_idx).cuda() * voxel_size 666 | seg_labels = sparse_seg_tensor.features_at(b_idx) 667 | sqr_dist = pairwise_distance(xyz[b_idx].cuda(), sparse_xyz, normalized=False).squeeze(0) 668 | sqr_dist, idx = torch.topk(sqr_dist, 1, dim = -1, largest=False, sorted=False) 669 | 670 | for cluster in cluster_labels[str(b_idx)]: 671 | cluster_indices = torch.nonzero(idx.reshape(-1)[:,None] == cluster) 672 | 673 | cluster_labels_all[str(b_idx)].append(cluster_indices[:,0]) 674 | 675 | return cluster_labels_all 676 | 677 | -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | coloredlogs 2 | tqdm 3 | pyyaml 4 | tensorboardx 5 | matplotlib -------------------------------------------------------------------------------- /scripts/download_data.sh: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env bash 2 | 3 | DATASET=$1 4 | 5 | function download() { 6 | if [ ! -d "data" ]; then 7 | mkdir -p "data" 8 | fi 9 | cd data 10 | 11 | url="https://share.phys.ethz.ch/~gsg/weakly_supervised_3D_rigid_scene_flow/data/" 12 | 13 | extension=".tar" 14 | 15 | wget --no-check-certificate --show-progress "$url$DATASET$extension" 16 | tar -xf "$DATASET$extension" 17 | rm "$DATASET$extension" 18 | cd ../ 19 | 20 | 21 | } 22 | 23 | 24 | function download_all() { 25 | 26 | for DATASET in "stereo_kitti" "lidar_kitti" "semantic_kitti" "waymo_open" "flying_things_3d" 27 | do 28 | download 29 | done 30 | } 31 | 32 | function main() { 33 | if [ -z "$DATASET" ]; 34 | then 35 | echo "No dataset selcted. All data will be downloaded" 36 | download_all 37 | elif [ "$DATASET" == "stereo_kitti" ] || [ "$DATASET" == "lidar_kitti" ] || [ "$DATASET" == "semantic_kitti" ] || [ "$DATASET" == "waymo_open" ] || [ "$DATASET" == "flying_things_3d" ] 38 | then 39 | download 40 | else 41 | echo "Wrong dataset selected must be one of [stereo_kitti, lidar_kitti, waymo_open, semantic_kitti, flying_things_3d]." 42 | fi 43 | } 44 | 45 | main; 46 | -------------------------------------------------------------------------------- /scripts/download_pretrained_models.sh: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env bash 2 | 3 | 4 | function download_models() { 5 | if [ ! -d "logs" ]; then 6 | mkdir -p "logs" 7 | fi 8 | cd logs 9 | 10 | url="https://share.phys.ethz.ch/~gsg/weakly_supervised_3D_rigid_scene_flow/pretrained_models/" 11 | 12 | model_file="pretrained_models.tar" 13 | 14 | wget --no-check-certificate --show-progress "$url$model_file" 15 | tar -xf "$model_file" 16 | rm "$model_file" 17 | cd ../ 18 | } 19 | 20 | 21 | download_models; 22 | -------------------------------------------------------------------------------- /scripts/download_pretrained_models_ablations.sh: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env bash 2 | 3 | 4 | function download_models() { 5 | if [ ! 
-d "logs" ]; then 6 | mkdir -p "logs" 7 | fi 8 | cd logs 9 | 10 | url="https://share.phys.ethz.ch/~gsg/weakly_supervised_3D_rigid_scene_flow/pretrained_models/" 11 | 12 | model_file="pretrained_models_ablations.tar" 13 | 14 | wget --no-check-certificate --show-progress "$url$model_file" 15 | tar -xf "$model_file" 16 | rm "$model_file" 17 | cd ../ 18 | } 19 | 20 | 21 | download_models; 22 | -------------------------------------------------------------------------------- /train.py: -------------------------------------------------------------------------------- 1 | import os 2 | import gc 3 | import shutil 4 | import torch 5 | import time 6 | import argparse 7 | import yaml 8 | import copy 9 | import glob 10 | import numpy as np 11 | from datetime import datetime 12 | from tqdm import tqdm 13 | from tensorboardX import SummaryWriter 14 | 15 | import lib.config as config 16 | from lib.utils import n_model_parameters, save_checkpoint, dict_all_to_device, load_checkpoint 17 | from lib.data import make_data_loader 18 | from lib.logger import prepare_logger 19 | 20 | # Set the random seeds for repeatability 21 | np.random.seed(41) 22 | torch.manual_seed(41) 23 | if torch.cuda.is_available(): 24 | torch.cuda.manual_seed(41) 25 | 26 | def main(cfg, config_name): 27 | """ 28 | Main training function: after preparing the data loaders, model, optimizer, and trainer, 29 | start with the training process. 30 | 31 | Args: 32 | cfg (dict): current configuration parameters 33 | config_name (str): path to the config file 34 | """ 35 | 36 | # Create the output dir if it does not exist 37 | if not os.path.exists(cfg['misc']['log_dir']): 38 | os.makedirs(cfg['misc']['log_dir']) 39 | 40 | # Initialize the model 41 | model = config.get_model(cfg) 42 | model = model.cuda() 43 | 44 | # Get data loader 45 | train_loader = make_data_loader(cfg, phase='train') 46 | val_loader = make_data_loader(cfg, phase='val') 47 | 48 | # Log directory 49 | dataset_name = cfg["data"]["dataset"] 50 | 51 | now = datetime.now().strftime("%y_%m_%d-%H_%M_%S_%f") 52 | now += "__Method_" + str(cfg['method']['backbone']) 53 | now += "__Pretrained_" if cfg['network']['use_pretrained'] and cfg['network']['pretrained_path'] else '' 54 | if cfg['method']['flow']: now += "__Flow_" 55 | if cfg['method']['ego_motion']: now += "__Ego_" 56 | if cfg['method']['semantic']: now += "__Sem_" 57 | now += "__Rem_Ground_" if cfg['data']['remove_ground'] else '' 58 | now += "__VoxSize_" + str(cfg['misc']["voxel_size"]) 59 | now += "__Pts_" + str(cfg['misc']["num_points"]) 60 | path2log = os.path.join(cfg['misc']['log_dir'],"logs_" + dataset_name, now) 61 | 62 | logger, checkpoint_dir = prepare_logger(cfg, path2log) 63 | tboard_logger = SummaryWriter(path2log) 64 | 65 | # Output number of model parameters 66 | logger.info("Parameter Count: {:d}".format(n_model_parameters(model))) 67 | 68 | # Output torch and cuda version 69 | logger.info('Torch version: {}'.format(torch.__version__)) 70 | logger.info('CUDA version: {}'.format(torch.version.cuda)) 71 | 72 | # Save config file that was used for this experiment 73 | with open(os.path.join(path2log, config_name.split(os.sep)[-1]),'w') as outfile: 74 | yaml.dump(cfg, outfile, default_flow_style=False, allow_unicode=True) 75 | 76 | # Get optimizer and trainer 77 | optimizer = config.get_optimizer(cfg, model) 78 | scheduler = config.get_scheduler(cfg, optimizer) 79 | 80 | 81 | # Parameters determining the saving and validation interval (if positive denotes iteration if negative epoch) 82 | stat_interval = 
cfg['train']['stat_interval'] 83 | stat_interval = stat_interval if stat_interval > 0 else abs(stat_interval* len(train_loader)) 84 | 85 | chkpt_interval = cfg['train']['chkpt_interval'] 86 | chkpt_interval = chkpt_interval if chkpt_interval > 0 else abs(chkpt_interval* len(train_loader)) 87 | 88 | val_interval = cfg['train']['val_interval'] 89 | val_interval = val_interval if val_interval > 0 else abs(val_interval* len(train_loader)) 90 | 91 | # if not a pretrained model epoch and iterations should be -1 92 | metric_val_best = np.inf 93 | running_metrics = {} 94 | running_losses = {} 95 | epoch_it = -1 96 | total_it = -1 97 | 98 | # Load the pretrained weights 99 | if cfg['network']['use_pretrained'] and cfg['network']['pretrained_path']: 100 | model, optimizer, scheduler, epoch_it, total_it, metric_val_best = load_checkpoint(model, optimizer, scheduler, filename=cfg['network']['pretrained_path']) 101 | 102 | # Find previous tensorboard files and copy them 103 | tb_files = glob.glob(os.sep.join(cfg['network']['pretrained_path'].split(os.sep)[:-1]) + '/events.*') 104 | for tb_file in tb_files: 105 | shutil.copy(tb_file, os.path.join(path2log, tb_file.split(os.sep)[-1])) 106 | 107 | 108 | # Initialize the trainer 109 | device = torch.device('cuda' if (torch.cuda.is_available() and cfg['misc']['use_gpu']) else 'cpu') 110 | trainer = config.get_trainer(cfg, model, device) 111 | acc_iter_size = cfg['train']['acc_iter_size'] 112 | 113 | # Training loop 114 | while epoch_it < cfg['train']['max_epoch']: 115 | epoch_it += 1 116 | lr = scheduler.get_last_lr() 117 | logger.info('Training epoch: {}, LR: {} '.format(epoch_it, lr)) 118 | gc.collect() 119 | 120 | train_loader_iter = train_loader.__iter__() 121 | start = time.time() 122 | tbar = tqdm(total=len(train_loader) // acc_iter_size, ncols=100) 123 | 124 | for it in range(len(train_loader) // acc_iter_size): 125 | optimizer.zero_grad() 126 | total_it += 1 127 | batch_metrics = {} 128 | batch_losses = {} 129 | 130 | 131 | for iter_idx in range(acc_iter_size): 132 | 133 | batch = train_loader_iter.next() 134 | 135 | dict_all_to_device(batch, device) 136 | losses, metrics, total_loss = trainer.train_step(batch) 137 | 138 | total_loss.backward() 139 | 140 | # Save the running metrics and losses 141 | if not batch_metrics: 142 | batch_metrics = copy.deepcopy(metrics) 143 | else: 144 | for key, value in metrics.items(): 145 | batch_metrics[key] += value 146 | 147 | if not batch_losses: 148 | batch_losses = copy.deepcopy(losses) 149 | else: 150 | for key, value in losses.items(): 151 | batch_losses[key] += value 152 | 153 | 154 | # Compute the mean value of the metrics and losses of the batch 155 | for key, value in batch_metrics.items(): 156 | batch_metrics[key] = value / acc_iter_size 157 | 158 | for key, value in batch_losses.items(): 159 | batch_losses[key] = value / acc_iter_size 160 | 161 | optimizer.step() 162 | torch.cuda.empty_cache() 163 | 164 | tbar.set_description('Loss: {:.3g}'.format(batch_losses['total_loss'])) 165 | tbar.update(1) 166 | 167 | # Save the running metrics and losses 168 | if not running_metrics: 169 | running_metrics = copy.deepcopy(batch_metrics) 170 | else: 171 | for key, value in batch_metrics.items(): 172 | running_metrics[key] += value 173 | 174 | if not running_losses: 175 | running_losses = copy.deepcopy(batch_losses) 176 | else: 177 | for key, value in batch_losses.items(): 178 | running_losses[key] += value 179 | 180 | # Logs 181 | if total_it % stat_interval == stat_interval - 1: 182 | # Print / save logs 183 | 
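The training loop above accumulates gradients over `acc_iter_size` forward/backward passes before each optimizer step. The pattern in isolation (a generic PyTorch sketch, not repository code) is:

```python
import torch

model = torch.nn.Linear(3, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
acc_iter_size = 4

optimizer.zero_grad()
for _ in range(acc_iter_size):
    batch = torch.rand(8, 3)
    loss = model(batch).pow(2).mean()
    loss.backward()           # gradients of the group are summed in .grad
optimizer.step()              # a single parameter update for the whole group
```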
logger.info("Epoch {0:d} - It. {1:d}: loss = {2:.3f}".format( 184 | epoch_it, total_it, running_losses['total_loss'] / stat_interval 185 | ) 186 | ) 187 | 188 | for key, value in running_losses.items(): 189 | tboard_logger.add_scalar("Train/{}".format(key), value / stat_interval, total_it) 190 | # Reinitialize the values 191 | running_losses[key] = 0 192 | 193 | for key, value in running_metrics.items(): 194 | tboard_logger.add_scalar("Train/{}".format(key), value / stat_interval, total_it) 195 | # Reinitialize the values 196 | running_metrics[key] = 0 197 | 198 | start = time.time() 199 | 200 | 201 | # Run validation 202 | if total_it % val_interval == val_interval - 1: 203 | logger.info("Starting the validation") 204 | val_losses, val_metrics = trainer.validate(val_loader) 205 | 206 | for key, value in val_losses.items(): 207 | tboard_logger.add_scalar("Val/{}".format(key), value, total_it) 208 | 209 | 210 | for key, value in val_metrics.items(): 211 | tboard_logger.add_scalar("Val/{}".format(key), value, total_it) 212 | 213 | logger.info("VALIDATION -It. {0:d}: total loss: {1:.3f}.".format(total_it, val_losses['total_loss'])) 214 | 215 | 216 | if val_losses['total_loss'] < metric_val_best: 217 | metric_val_best = val_losses['total_loss'] 218 | logger.info('New best model (loss: {:.4f})'.format(metric_val_best)) 219 | 220 | save_checkpoint(os.path.join(path2log,'model_best.pt'), epoch=epoch_it, it=total_it, model=model, 221 | optimizer=optimizer,scheduler=scheduler,config=cfg, best_val=metric_val_best) 222 | else: 223 | save_checkpoint(os.path.join(path2log,'model_{}.pt'.format(total_it)), epoch=epoch_it, it=total_it, model=model, 224 | optimizer=optimizer, scheduler=scheduler, config=cfg, best_val=val_losses['total_loss']) 225 | 226 | # After the epoch if finished update the scheduler 227 | scheduler.step() 228 | 229 | # Quit after the maximum number of epochs is reached 230 | logger.info('Training completed after {} Epochs ({} it) with best val metric ({})={}'.format(epoch_it, it, model_selection_metric, metric_val_best)) 231 | 232 | 233 | if __name__ == "__main__": 234 | 235 | # Parse the command line arguments 236 | parser = argparse.ArgumentParser() 237 | parser.add_argument('config', type=str, help= 'Path to the config file.') 238 | args = parser.parse_args() 239 | 240 | # Combine the two config files 241 | cfg = config.get_config(args.config, 'configs/default.yaml') 242 | 243 | main(cfg, args.config) 244 | -------------------------------------------------------------------------------- /utils/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zgojcic/Rigid3DSceneFlow/ee064203873951ee42e058c35924878de8446eb8/utils/__init__.py -------------------------------------------------------------------------------- /utils/chamfer_distance/__init__.py: -------------------------------------------------------------------------------- 1 | from .chamfer_distance import ChamferDistance 2 | -------------------------------------------------------------------------------- /utils/chamfer_distance/__pycache__/__init__.cpython-37.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zgojcic/Rigid3DSceneFlow/ee064203873951ee42e058c35924878de8446eb8/utils/chamfer_distance/__pycache__/__init__.cpython-37.pyc -------------------------------------------------------------------------------- /utils/chamfer_distance/__pycache__/chamfer_distance.cpython-37.pyc: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/zgojcic/Rigid3DSceneFlow/ee064203873951ee42e058c35924878de8446eb8/utils/chamfer_distance/__pycache__/chamfer_distance.cpython-37.pyc -------------------------------------------------------------------------------- /utils/chamfer_distance/chamfer_distance.cpp: -------------------------------------------------------------------------------- 1 | #include 2 | 3 | // CUDA forward declarations 4 | int ChamferDistanceKernelLauncher( 5 | const int b, const int n, 6 | const float* xyz, 7 | const int m, 8 | const float* xyz2, 9 | float* result, 10 | int* result_i, 11 | float* result2, 12 | int* result2_i); 13 | 14 | int ChamferDistanceGradKernelLauncher( 15 | const int b, const int n, 16 | const float* xyz1, 17 | const int m, 18 | const float* xyz2, 19 | const float* grad_dist1, 20 | const int* idx1, 21 | const float* grad_dist2, 22 | const int* idx2, 23 | float* grad_xyz1, 24 | float* grad_xyz2); 25 | 26 | 27 | void chamfer_distance_forward_cuda( 28 | const at::Tensor xyz1, 29 | const at::Tensor xyz2, 30 | const at::Tensor dist1, 31 | const at::Tensor dist2, 32 | const at::Tensor idx1, 33 | const at::Tensor idx2) 34 | { 35 | ChamferDistanceKernelLauncher(xyz1.size(0), xyz1.size(1), xyz1.data(), 36 | xyz2.size(1), xyz2.data(), 37 | dist1.data(), idx1.data(), 38 | dist2.data(), idx2.data()); 39 | } 40 | 41 | void chamfer_distance_backward_cuda( 42 | const at::Tensor xyz1, 43 | const at::Tensor xyz2, 44 | at::Tensor gradxyz1, 45 | at::Tensor gradxyz2, 46 | at::Tensor graddist1, 47 | at::Tensor graddist2, 48 | at::Tensor idx1, 49 | at::Tensor idx2) 50 | { 51 | ChamferDistanceGradKernelLauncher(xyz1.size(0), xyz1.size(1), xyz1.data(), 52 | xyz2.size(1), xyz2.data(), 53 | graddist1.data(), idx1.data(), 54 | graddist2.data(), idx2.data(), 55 | gradxyz1.data(), gradxyz2.data()); 56 | } 57 | 58 | 59 | void nnsearch( 60 | const int b, const int n, const int m, 61 | const float* xyz1, 62 | const float* xyz2, 63 | float* dist, 64 | int* idx) 65 | { 66 | for (int i = 0; i < b; i++) { 67 | for (int j = 0; j < n; j++) { 68 | const float x1 = xyz1[(i*n+j)*3+0]; 69 | const float y1 = xyz1[(i*n+j)*3+1]; 70 | const float z1 = xyz1[(i*n+j)*3+2]; 71 | double best = 0; 72 | int besti = 0; 73 | for (int k = 0; k < m; k++) { 74 | const float x2 = xyz2[(i*m+k)*3+0] - x1; 75 | const float y2 = xyz2[(i*m+k)*3+1] - y1; 76 | const float z2 = xyz2[(i*m+k)*3+2] - z1; 77 | const double d=x2*x2+y2*y2+z2*z2; 78 | if (k==0 || d < best){ 79 | best = d; 80 | besti = k; 81 | } 82 | } 83 | dist[i*n+j] = best; 84 | idx[i*n+j] = besti; 85 | } 86 | } 87 | } 88 | 89 | 90 | void chamfer_distance_forward( 91 | const at::Tensor xyz1, 92 | const at::Tensor xyz2, 93 | const at::Tensor dist1, 94 | const at::Tensor dist2, 95 | const at::Tensor idx1, 96 | const at::Tensor idx2) 97 | { 98 | const int batchsize = xyz1.size(0); 99 | const int n = xyz1.size(1); 100 | const int m = xyz2.size(1); 101 | 102 | const float* xyz1_data = xyz1.data(); 103 | const float* xyz2_data = xyz2.data(); 104 | float* dist1_data = dist1.data(); 105 | float* dist2_data = dist2.data(); 106 | int* idx1_data = idx1.data(); 107 | int* idx2_data = idx2.data(); 108 | 109 | nnsearch(batchsize, n, m, xyz1_data, xyz2_data, dist1_data, idx1_data); 110 | nnsearch(batchsize, m, n, xyz2_data, xyz1_data, dist2_data, idx2_data); 111 | } 112 | 113 | 114 | void chamfer_distance_backward( 115 | const at::Tensor xyz1, 116 | const at::Tensor xyz2, 117 | at::Tensor gradxyz1, 118 
| at::Tensor gradxyz2, 119 | at::Tensor graddist1, 120 | at::Tensor graddist2, 121 | at::Tensor idx1, 122 | at::Tensor idx2) 123 | { 124 | const int b = xyz1.size(0); 125 | const int n = xyz1.size(1); 126 | const int m = xyz2.size(1); 127 | 128 | const float* xyz1_data = xyz1.data(); 129 | const float* xyz2_data = xyz2.data(); 130 | float* gradxyz1_data = gradxyz1.data(); 131 | float* gradxyz2_data = gradxyz2.data(); 132 | float* graddist1_data = graddist1.data(); 133 | float* graddist2_data = graddist2.data(); 134 | const int* idx1_data = idx1.data(); 135 | const int* idx2_data = idx2.data(); 136 | 137 | for (int i = 0; i < b*n*3; i++) 138 | gradxyz1_data[i] = 0; 139 | for (int i = 0; i < b*m*3; i++) 140 | gradxyz2_data[i] = 0; 141 | for (int i = 0;i < b; i++) { 142 | for (int j = 0; j < n; j++) { 143 | const float x1 = xyz1_data[(i*n+j)*3+0]; 144 | const float y1 = xyz1_data[(i*n+j)*3+1]; 145 | const float z1 = xyz1_data[(i*n+j)*3+2]; 146 | const int j2 = idx1_data[i*n+j]; 147 | 148 | const float x2 = xyz2_data[(i*m+j2)*3+0]; 149 | const float y2 = xyz2_data[(i*m+j2)*3+1]; 150 | const float z2 = xyz2_data[(i*m+j2)*3+2]; 151 | const float g = graddist1_data[i*n+j]*2; 152 | 153 | gradxyz1_data[(i*n+j)*3+0] += g*(x1-x2); 154 | gradxyz1_data[(i*n+j)*3+1] += g*(y1-y2); 155 | gradxyz1_data[(i*n+j)*3+2] += g*(z1-z2); 156 | gradxyz2_data[(i*m+j2)*3+0] -= (g*(x1-x2)); 157 | gradxyz2_data[(i*m+j2)*3+1] -= (g*(y1-y2)); 158 | gradxyz2_data[(i*m+j2)*3+2] -= (g*(z1-z2)); 159 | } 160 | for (int j = 0; j < m; j++) { 161 | const float x1 = xyz2_data[(i*m+j)*3+0]; 162 | const float y1 = xyz2_data[(i*m+j)*3+1]; 163 | const float z1 = xyz2_data[(i*m+j)*3+2]; 164 | const int j2 = idx2_data[i*m+j]; 165 | const float x2 = xyz1_data[(i*n+j2)*3+0]; 166 | const float y2 = xyz1_data[(i*n+j2)*3+1]; 167 | const float z2 = xyz1_data[(i*n+j2)*3+2]; 168 | const float g = graddist2_data[i*m+j]*2; 169 | gradxyz2_data[(i*m+j)*3+0] += g*(x1-x2); 170 | gradxyz2_data[(i*m+j)*3+1] += g*(y1-y2); 171 | gradxyz2_data[(i*m+j)*3+2] += g*(z1-z2); 172 | gradxyz1_data[(i*n+j2)*3+0] -= (g*(x1-x2)); 173 | gradxyz1_data[(i*n+j2)*3+1] -= (g*(y1-y2)); 174 | gradxyz1_data[(i*n+j2)*3+2] -= (g*(z1-z2)); 175 | } 176 | } 177 | } 178 | 179 | 180 | PYBIND11_MODULE(TORCH_EXTENSION_NAME, m) { 181 | m.def("forward", &chamfer_distance_forward, "ChamferDistance forward"); 182 | m.def("forward_cuda", &chamfer_distance_forward_cuda, "ChamferDistance forward (CUDA)"); 183 | m.def("backward", &chamfer_distance_backward, "ChamferDistance backward"); 184 | m.def("backward_cuda", &chamfer_distance_backward_cuda, "ChamferDistance backward (CUDA)"); 185 | } 186 | -------------------------------------------------------------------------------- /utils/chamfer_distance/chamfer_distance.cu: -------------------------------------------------------------------------------- 1 | #include 2 | 3 | #include 4 | #include 5 | 6 | __global__ 7 | void ChamferDistanceKernel( 8 | int b, 9 | int n, 10 | const float* xyz, 11 | int m, 12 | const float* xyz2, 13 | float* result, 14 | int* result_i) 15 | { 16 | const int batch=512; 17 | __shared__ float buf[batch*3]; 18 | for (int i=blockIdx.x;ibest){ 130 | result[(i*n+j)]=best; 131 | result_i[(i*n+j)]=best_i; 132 | } 133 | } 134 | __syncthreads(); 135 | } 136 | } 137 | } 138 | 139 | void ChamferDistanceKernelLauncher( 140 | const int b, const int n, 141 | const float* xyz, 142 | const int m, 143 | const float* xyz2, 144 | float* result, 145 | int* result_i, 146 | float* result2, 147 | int* result2_i) 148 | { 149 | 
ChamferDistanceKernel<<>>(b, n, xyz, m, xyz2, result, result_i); 150 | ChamferDistanceKernel<<>>(b, m, xyz2, n, xyz, result2, result2_i); 151 | 152 | cudaError_t err = cudaGetLastError(); 153 | if (err != cudaSuccess) 154 | printf("error in chamfer distance updateOutput: %s\n", cudaGetErrorString(err)); 155 | } 156 | 157 | 158 | __global__ 159 | void ChamferDistanceGradKernel( 160 | int b, int n, 161 | const float* xyz1, 162 | int m, 163 | const float* xyz2, 164 | const float* grad_dist1, 165 | const int* idx1, 166 | float* grad_xyz1, 167 | float* grad_xyz2) 168 | { 169 | for (int i = blockIdx.x; i>>(b, n, xyz1, m, xyz2, grad_dist1, idx1, grad_xyz1, grad_xyz2); 204 | ChamferDistanceGradKernel<<>>(b, m, xyz2, n, xyz1, grad_dist2, idx2, grad_xyz2, grad_xyz1); 205 | 206 | cudaError_t err = cudaGetLastError(); 207 | if (err != cudaSuccess) 208 | printf("error in chamfer distance get grad: %s\n", cudaGetErrorString(err)); 209 | } 210 | -------------------------------------------------------------------------------- /utils/chamfer_distance/chamfer_distance.py: -------------------------------------------------------------------------------- 1 | 2 | import torch 3 | 4 | from torch.utils.cpp_extension import load 5 | cd = load(name="cd", 6 | sources=["utils/chamfer_distance/chamfer_distance.cpp", 7 | "utils/chamfer_distance/chamfer_distance.cu"], verbose=True) 8 | 9 | class ChamferDistanceFunction(torch.autograd.Function): 10 | @staticmethod 11 | def forward(ctx, xyz1, xyz2): 12 | batchsize, n, _ = xyz1.size() 13 | _, m, _ = xyz2.size() 14 | xyz1 = xyz1.contiguous() 15 | xyz2 = xyz2.contiguous() 16 | dist1 = torch.zeros(batchsize, n) 17 | dist2 = torch.zeros(batchsize, m) 18 | 19 | idx1 = torch.zeros(batchsize, n, dtype=torch.int) 20 | idx2 = torch.zeros(batchsize, m, dtype=torch.int) 21 | 22 | if not xyz1.is_cuda: 23 | cd.forward(xyz1, xyz2, dist1, dist2, idx1, idx2) 24 | else: 25 | dist1 = dist1.cuda() 26 | dist2 = dist2.cuda() 27 | idx1 = idx1.cuda() 28 | idx2 = idx2.cuda() 29 | cd.forward_cuda(xyz1, xyz2, dist1, dist2, idx1, idx2) 30 | 31 | ctx.save_for_backward(xyz1, xyz2, idx1, idx2) 32 | 33 | return dist1, dist2 34 | 35 | @staticmethod 36 | def backward(ctx, graddist1, graddist2): 37 | xyz1, xyz2, idx1, idx2 = ctx.saved_tensors 38 | 39 | graddist1 = graddist1.contiguous() 40 | graddist2 = graddist2.contiguous() 41 | 42 | gradxyz1 = torch.zeros(xyz1.size()) 43 | gradxyz2 = torch.zeros(xyz2.size()) 44 | 45 | if not graddist1.is_cuda: 46 | cd.backward(xyz1, xyz2, gradxyz1, gradxyz2, graddist1, graddist2, idx1, idx2) 47 | else: 48 | gradxyz1 = gradxyz1.cuda() 49 | gradxyz2 = gradxyz2.cuda() 50 | cd.backward_cuda(xyz1, xyz2, gradxyz1, gradxyz2, graddist1, graddist2, idx1, idx2) 51 | 52 | return gradxyz1, gradxyz2 53 | 54 | 55 | class ChamferDistance(torch.nn.Module): 56 | def forward(self, xyz1, xyz2): 57 | return ChamferDistanceFunction.apply(xyz1, xyz2) 58 | -------------------------------------------------------------------------------- /utils/chamfer_distance/readme.txt: -------------------------------------------------------------------------------- 1 | git clone //github.com/chrdiller/pyTorchChamferDistance.git 2 | --------------------------------------------------------------------------------
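Finally, a minimal usage sketch of the bundled Chamfer distance module. The C++/CUDA extension shown above is JIT-compiled by `torch.utils.cpp_extension.load` on first import; a GPU is assumed in this snippet, but the CPU path follows the same interface:

```python
import torch
from utils.chamfer_distance import ChamferDistance

chamfer_dist = ChamferDistance()

xyz1 = torch.rand(1, 1024, 3).cuda()
xyz2 = torch.rand(1, 2048, 3).cuda()

# Squared distances from each point to its nearest neighbour in the other cloud.
dist1, dist2 = chamfer_dist(xyz1, xyz2)
loss = dist1.mean() + dist2.mean()
print(loss.item())
```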