├── .gitignore ├── .gitmodules ├── License.txt ├── README.md ├── imgs └── overview.jpg ├── nerf_loc ├── __init__.py ├── configs │ ├── 12scenes │ │ ├── apt1_kitchen.yaml │ │ ├── apt1_living.yaml │ │ ├── apt2_bed.yaml │ │ ├── apt2_kitchen.yaml │ │ ├── apt2_living.yaml │ │ ├── apt2_luke.yaml │ │ ├── office1_gates362.yaml │ │ ├── office1_gates381.yaml │ │ ├── office1_lounge.yaml │ │ ├── office1_manolis.yaml │ │ ├── office2_5a.yaml │ │ └── office2_5b.yaml │ ├── 12scenes_all.yaml │ ├── 7scenes │ │ ├── chess.yaml │ │ ├── fire.yaml │ │ ├── heads.yaml │ │ ├── office.yaml │ │ ├── pumpkin.yaml │ │ ├── redkitchen.yaml │ │ └── stairs.yaml │ ├── 7scenes_all.yaml │ ├── __init__.py │ ├── ablation_study │ │ ├── 7scenes_from_scratch │ │ │ ├── chess.txt │ │ │ ├── fire.txt │ │ │ ├── heads.txt │ │ │ ├── office.txt │ │ │ ├── pumpkin.txt │ │ │ ├── redkitchen.txt │ │ │ └── stairs.txt │ │ ├── 7scenes_ft_no_coord │ │ │ ├── chess.txt │ │ │ ├── fire.txt │ │ │ ├── heads.txt │ │ │ ├── office.txt │ │ │ ├── pumpkin.txt │ │ │ ├── redkitchen.txt │ │ │ └── stairs.txt │ │ ├── 7scenes_simple.txt │ │ └── cambridge_simple.txt │ ├── blender │ │ └── lego.txt │ ├── cambridge │ │ ├── GreatCourt.yaml │ │ ├── KingsCollege.yaml │ │ ├── OldHospital.yaml │ │ ├── ShopFacade.yaml │ │ └── StMarysChurch.yaml │ ├── cambridge_all.yaml │ ├── data │ │ ├── 12scenes.yaml │ │ ├── 7scenes.yaml │ │ ├── cambridge.yaml │ │ └── onepose.yaml │ ├── dtu.txt │ ├── generalize.txt │ ├── llff │ │ └── horns.txt │ ├── mario.txt │ ├── onepose │ │ ├── 0447.yaml │ │ ├── 0450.yaml │ │ ├── 0488.yaml │ │ ├── 0493.yaml │ │ ├── 0494.yaml │ │ └── 0594.yaml │ └── onepose_all.yaml ├── datasets │ ├── __init__.py │ ├── colmap │ │ ├── cli.py │ │ ├── database.py │ │ └── read_write_model.py │ ├── colmap_dataset.py │ ├── neuray_base_dataset.py │ └── video │ │ ├── __init__.py │ │ ├── covisibility_sampler.py │ │ ├── dataset.py │ │ ├── furthest_pose_sampler.py │ │ ├── fusion.py │ │ ├── geometry.py │ │ ├── image.py │ │ ├── multi_scene_dataset.py │ │ ├── preprocess_12scenes.py │ │ ├── preprocess_7scenes.py │ │ ├── preprocess_cambridge.py │ │ ├── preprocess_onepose.py │ │ ├── reader.py │ │ └── transform.py ├── models │ ├── COTR │ │ ├── __init__.py │ │ ├── backbone2d.py │ │ ├── fpn.py │ │ ├── misc.py │ │ ├── position_encoding.py │ │ ├── resnet.py │ │ └── transformer.py │ ├── __init__.py │ ├── appearance_embedding.py │ ├── conditional_nerf │ │ ├── __init__.py │ │ ├── depth_fusion.py │ │ ├── losses.py │ │ ├── model.py │ │ ├── model_simple.py │ │ ├── multiview_aggregator.py │ │ ├── neuray_ops.py │ │ ├── ray_unet.py │ │ ├── utils.py │ │ └── visibility_decoder.py │ ├── ibrnet │ │ └── ibrnet.py │ ├── image_retrieval │ │ ├── __init__.py │ │ ├── base_model.py │ │ ├── dir.py │ │ ├── netvlad.py │ │ ├── run.py │ │ └── vis.py │ ├── matcher.py │ ├── matching │ │ ├── __init__.py │ │ ├── coarse_matching.py │ │ ├── fine_matching.py │ │ └── sparse_to_dense.py │ ├── nerf_pose_estimator.py │ ├── ops │ │ └── knn │ │ │ ├── knn_utils.py │ │ │ └── src │ │ │ ├── knn.cu │ │ │ ├── knn.h │ │ │ ├── knn_api.cpp │ │ │ ├── knn_cpu.cpp │ │ │ └── utils │ │ │ ├── dispatch.cuh │ │ │ ├── index_utils.cuh │ │ │ ├── mink.cuh │ │ │ └── pytorch3d_cutils.h │ ├── pose_optimizer.py │ └── utils.py └── utils │ ├── __init__.py │ ├── common.py │ ├── metrics.py │ ├── transform │ ├── __init__.py │ ├── math.py │ ├── rotation_conversions.py │ ├── se3.py │ └── so3.py │ └── visualization.py ├── pl ├── model.py ├── test.py └── train.py └── requirements.txt /.gitignore: 
--------------------------------------------------------------------------------
*__pycache__*
*.ckpt
*.pth
--------------------------------------------------------------------------------
/.gitmodules:
--------------------------------------------------------------------------------
[submodule "third_party/NeuRay"]
	path = third_party/NeuRay
	url = https://github.com/liuyuan-pal/NeuRay.git
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# NeRF-Loc

This project is the PyTorch implementation of [NeRF-Loc](https://arxiv.org/abs/2304.07979), a visual localization pipeline based on a conditional NeRF.
![overview](./imgs/overview.jpg)

## Installation
Environment
+ python3.8
+ cuda11.3

1. Clone with submodules:
```
git clone --recursive https://github.com/JenningsL/nerf-loc.git
```
2. Install COLMAP following the instructions [here](https://colmap.github.io/install.html).
3. Install the Python packages:

```
conda create --name nerf-loc python=3.8 -y
conda activate nerf-loc
pip install -r requirements.txt
# install pytorch3d from source
git clone https://github.com/facebookresearch/pytorch3d.git
cd pytorch3d
git checkout v0.6.0
conda install -c bottler nvidiacub
pip install -e .
```

## How To Run?

`cd nerf-loc && export PYTHONPATH=.`
### Data Preparation

1. Download the data for [Cambridge](https://github.com/vislearn/dsacstar#cambridge-landmarks), [12scenes](https://github.com/vislearn/dsacstar#12scenes), [7scenes](https://github.com/vislearn/dsacstar#7scenes) and [OnePose](https://github.com/zju3dv/OnePose) following their instructions. Create a `data` folder and put the downloaded datasets into `data/Cambridge`, `data/12scenes`, `data/7scenes` and `data/onepose` respectively. You can also change the `base_dir` of the datasets by modifying the dataset configs in `nerf_loc/configs/data`.
2. Preprocess the datasets:

```
python3 nerf_loc/datasets/video/preprocess_cambridge.py data/Cambridge
python3 nerf_loc/datasets/video/preprocess_12scenes.py data/12scenes
python3 nerf_loc/datasets/video/preprocess_7scenes.py data/7scenes
```

3. Run image retrieval:

```
python3 nerf_loc/models/image_retrieval/run.py --config ${CONFIG}
```
Replace `${CONFIG}` with `nerf_loc/configs/cambridge_all.yaml` | `nerf_loc/configs/12scenes_all.yaml` | `nerf_loc/configs/7scenes_all.yaml` | etc.

### Training

First, train the scene-agnostic NeRF-Loc model across different scenes:

```
python3 pl/train.py --config ${CONFIG} --num_nodes ${HOST_NUM}
```

Replace `${CONFIG}` with `nerf_loc/configs/cambridge_all.yaml` | `nerf_loc/configs/12scenes_all.yaml` | `nerf_loc/configs/7scenes_all.yaml` | etc.

Then, fine-tune on a specific scene to obtain the scene-specific NeRF-Loc model:

```
python3 pl/train.py --config ${CONFIG} --num_nodes ${HOST_NUM}
```
Replace `${CONFIG}` with `nerf_loc/configs/cambridge/KingsCollege.yaml` | `nerf_loc/configs/12scenes/apt1_kitchen.yaml` | `nerf_loc/configs/7scenes/chess.yaml` | etc.
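Each `${CONFIG}` is a YAML file that is merged over the shared defaults defined in `nerf_loc/configs/__init__.py`. Below is a minimal sketch of that merge, assuming plain yacs `merge_from_file` semantics; the actual loading logic presumably lives in `pl/train.py` and is not shown here:

```python
# Sketch only: inspect how a per-scene YAML overrides the yacs defaults.
# Run from the repo root with PYTHONPATH=. as described above.
from nerf_loc.configs import get_cfg_defaults

cfg = get_cfg_defaults()                                    # defaults from nerf_loc/configs/__init__.py
cfg.merge_from_file('nerf_loc/configs/7scenes/chess.yaml')  # per-scene overrides
print(cfg.expname, cfg.scenes, cfg.use_scene_coord_memorization, cfg.ckpt)
```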

---

### Evaluation

To evaluate NeRF-Loc:

```
python3 pl/test.py --config ${CONFIG} --ckpt ${CKPT}
```

Replace `${CONFIG}` with `nerf_loc/configs/cambridge/KingsCollege.yaml` | `nerf_loc/configs/12scenes/apt1_kitchen.yaml` | `nerf_loc/configs/7scenes/chess.yaml` | etc.
Replace `${CKPT}` with the path to the checkpoint file.

### Pre-trained Models
The 2D backbone weights of COTR can be downloaded [here](https://www.cs.ubc.ca/research/kmyi_data/files/2021/cotr/default.zip); please put them at `nerf_loc/models/COTR/default/checkpoint.pth.tar`.
You can download the NeRF-Loc pre-trained models [here](). TODO:

## Acknowledgements
Our code is largely borrowed from the following works; thanks for their excellent contributions!
+ [NeuRay](https://github.com/liuyuan-pal/NeuRay)
+ [LoFTR](https://github.com/zju3dv/LoFTR)
+ [COTR](https://github.com/ubc-vision/COTR)
+ [HLoc](https://github.com/cvg/Hierarchical-Localization)
+ [DSM](https://github.com/Tangshitao/Dense-Scene-Matching)
+ [Colmap](https://github.com/colmap/colmap)

## Citation
```
@misc{liu2023nerfloc,
      title={NeRF-Loc: Visual Localization with Conditional Neural Radiance Field},
      author={Jianlin Liu and Qiang Nie and Yong Liu and Chengjie Wang},
      year={2023},
      eprint={2304.07979},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
```
--------------------------------------------------------------------------------
/imgs/overview.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/JenningsL/nerf-loc/1d539c5a4824a46d26414f3c2b41bb1b1f6dd91e/imgs/overview.jpg
--------------------------------------------------------------------------------
/nerf_loc/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/JenningsL/nerf-loc/1d539c5a4824a46d26414f3c2b41bb1b1f6dd91e/nerf_loc/__init__.py
--------------------------------------------------------------------------------
/nerf_loc/configs/12scenes/apt1_kitchen.yaml:
--------------------------------------------------------------------------------
1 | expname: apt1_kitchen 2 | basedir: experiments/12scenes 3 | dataset_type: video_12scenes 4 | scenes: [apt1/kitchen] 5 | 6 | render: 7 | N_samples: 64 8 | N_rand: 1024 9 | chunk: 2048 10 | 11 | max_epochs: 30 12 | 13 | use_depth_supervision: True 14 | 15 | matching: 16 | keypoints_3d_sampling: random 17 | keypoints_3d_sampling_max_keep: 16384 18 | coarse_matching_depth_thresh: 0.2 19 | coarse_num_3d_keypoints: 1024 20 | fine_num_3d_keypoints: 1024 21 | 22 | backbone2d_use_fpn: True 23 | # cascade_matching: True 24 | 25 | encode_appearance: True 26 | appearance_emb_dim: 128 27 | 28 | n_views_train: 5 29 | n_views_test: 10 30 | 31 | train_nerf: True 32 | train_pose: True 33 | 34 | backbone2d: cotr 35 | backbone2d_fpn_dim: 192 36 | support_image_selection: retrieval 37 | image_core_set_size: 16 38 | image_retrieval_method_train: netvlad 39 | image_retrieval_method_test: netvlad 40 | image_retrieval_interval_train: 20 41 | image_retrieval_interval_test: 10 42 | 43 | render_loss_weight: 1.0 44 | ref_depth_loss_weight: 0.1 45 | 46 | use_scene_coord_memorization: True 47 | ckpt: experiments/12scenes/all/default/checkpoints/last.ckpt
--------------------------------------------------------------------------------
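The scene-specific configs that follow (apt1_living, apt2_bed, and so on) differ from the multi-scene `nerf_loc/configs/12scenes_all.yaml` only in a few fields, chiefly `expname`, `scenes`, `use_scene_coord_memorization` and `ckpt`. A small hypothetical helper (not part of the repo; assumes PyYAML is installed) to print those differences:

```python
# Hypothetical helper: show which top-level fields a scene-specific config
# changes relative to the multi-scene config. Illustration only.
import yaml  # PyYAML

def diff_configs(scene_path, shared_path):
    with open(scene_path) as f:
        scene = yaml.safe_load(f)
    with open(shared_path) as f:
        shared = yaml.safe_load(f)
    for key in sorted(set(scene) | set(shared)):
        if scene.get(key) != shared.get(key):
            print(f'{key}: {shared.get(key)!r} -> {scene.get(key)!r}')

diff_configs('nerf_loc/configs/12scenes/apt1_kitchen.yaml',
             'nerf_loc/configs/12scenes_all.yaml')
```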
/nerf_loc/configs/12scenes/apt1_living.yaml: -------------------------------------------------------------------------------- 1 | expname: apt1_living 2 | basedir: experiments/12scenes 3 | dataset_type: video_12scenes 4 | scenes: [apt1/living] 5 | 6 | render: 7 | N_samples: 64 8 | N_rand: 1024 9 | chunk: 2048 10 | 11 | max_epochs: 30 12 | 13 | use_depth_supervision: True 14 | 15 | matching: 16 | keypoints_3d_sampling: random 17 | keypoints_3d_sampling_max_keep: 16384 18 | coarse_matching_depth_thresh: 0.2 19 | coarse_num_3d_keypoints: 1024 20 | fine_num_3d_keypoints: 1024 21 | 22 | backbone2d_use_fpn: True 23 | # cascade_matching: True 24 | 25 | encode_appearance: True 26 | appearance_emb_dim: 128 27 | 28 | n_views_train: 5 29 | n_views_test: 10 30 | 31 | train_nerf: True 32 | train_pose: True 33 | 34 | backbone2d: cotr 35 | backbone2d_fpn_dim: 192 36 | support_image_selection: retrieval 37 | image_core_set_size: 16 38 | image_retrieval_method_train: netvlad 39 | image_retrieval_method_test: netvlad 40 | image_retrieval_interval_train: 20 41 | image_retrieval_interval_test: 10 42 | 43 | render_loss_weight: 1.0 44 | ref_depth_loss_weight: 0.1 45 | 46 | use_scene_coord_memorization: True 47 | ckpt: experiments/12scenes/all/default/checkpoints/last.ckpt -------------------------------------------------------------------------------- /nerf_loc/configs/12scenes/apt2_bed.yaml: -------------------------------------------------------------------------------- 1 | expname: apt2_bed 2 | basedir: experiments/12scenes 3 | dataset_type: video_12scenes 4 | scenes: [apt2/bed] 5 | 6 | render: 7 | N_samples: 64 8 | N_rand: 1024 9 | chunk: 2048 10 | 11 | max_epochs: 30 12 | 13 | use_depth_supervision: True 14 | 15 | matching: 16 | keypoints_3d_sampling: random 17 | keypoints_3d_sampling_max_keep: 16384 18 | coarse_matching_depth_thresh: 0.2 19 | coarse_num_3d_keypoints: 1024 20 | fine_num_3d_keypoints: 1024 21 | 22 | backbone2d_use_fpn: True 23 | # cascade_matching: True 24 | 25 | encode_appearance: True 26 | appearance_emb_dim: 128 27 | 28 | n_views_train: 5 29 | n_views_test: 10 30 | 31 | train_nerf: True 32 | train_pose: True 33 | 34 | backbone2d: cotr 35 | backbone2d_fpn_dim: 192 36 | support_image_selection: retrieval 37 | image_core_set_size: 16 38 | image_retrieval_method_train: netvlad 39 | image_retrieval_method_test: netvlad 40 | image_retrieval_interval_train: 20 41 | image_retrieval_interval_test: 10 42 | 43 | render_loss_weight: 1.0 44 | ref_depth_loss_weight: 0.1 45 | 46 | use_scene_coord_memorization: True 47 | ckpt: experiments/12scenes/all/default/checkpoints/last.ckpt -------------------------------------------------------------------------------- /nerf_loc/configs/12scenes/apt2_kitchen.yaml: -------------------------------------------------------------------------------- 1 | expname: apt2_kitchen 2 | basedir: experiments/12scenes 3 | dataset_type: video_12scenes 4 | scenes: [apt2/kitchen] 5 | 6 | render: 7 | N_samples: 64 8 | N_rand: 1024 9 | chunk: 2048 10 | 11 | max_epochs: 30 12 | 13 | use_depth_supervision: True 14 | 15 | matching: 16 | keypoints_3d_sampling: random 17 | keypoints_3d_sampling_max_keep: 16384 18 | coarse_matching_depth_thresh: 0.2 19 | coarse_num_3d_keypoints: 1024 20 | fine_num_3d_keypoints: 1024 21 | 22 | backbone2d_use_fpn: True 23 | # cascade_matching: True 24 | 25 | encode_appearance: True 26 | appearance_emb_dim: 128 27 | 28 | n_views_train: 5 29 | n_views_test: 10 30 | 31 | train_nerf: True 32 | train_pose: True 33 | 34 | backbone2d: cotr 35 | backbone2d_fpn_dim: 
192 36 | support_image_selection: retrieval 37 | image_core_set_size: 16 38 | image_retrieval_method_train: netvlad 39 | image_retrieval_method_test: netvlad 40 | image_retrieval_interval_train: 20 41 | image_retrieval_interval_test: 10 42 | 43 | render_loss_weight: 1.0 44 | ref_depth_loss_weight: 0.1 45 | 46 | use_scene_coord_memorization: True 47 | ckpt: experiments/12scenes/all/default/checkpoints/last.ckpt -------------------------------------------------------------------------------- /nerf_loc/configs/12scenes/apt2_living.yaml: -------------------------------------------------------------------------------- 1 | expname: apt2_living 2 | basedir: experiments/12scenes 3 | dataset_type: video_12scenes 4 | scenes: [apt2/living] 5 | 6 | render: 7 | N_samples: 64 8 | N_rand: 1024 9 | chunk: 2048 10 | 11 | max_epochs: 30 12 | 13 | use_depth_supervision: True 14 | 15 | matching: 16 | keypoints_3d_sampling: random 17 | keypoints_3d_sampling_max_keep: 16384 18 | coarse_matching_depth_thresh: 0.2 19 | coarse_num_3d_keypoints: 1024 20 | fine_num_3d_keypoints: 1024 21 | 22 | backbone2d_use_fpn: True 23 | # cascade_matching: True 24 | 25 | encode_appearance: True 26 | appearance_emb_dim: 128 27 | 28 | n_views_train: 5 29 | n_views_test: 10 30 | 31 | train_nerf: True 32 | train_pose: True 33 | 34 | backbone2d: cotr 35 | backbone2d_fpn_dim: 192 36 | support_image_selection: retrieval 37 | image_core_set_size: 16 38 | image_retrieval_method_train: netvlad 39 | image_retrieval_method_test: netvlad 40 | image_retrieval_interval_train: 20 41 | image_retrieval_interval_test: 10 42 | 43 | render_loss_weight: 1.0 44 | ref_depth_loss_weight: 0.1 45 | 46 | use_scene_coord_memorization: True 47 | ckpt: experiments/12scenes/all/default/checkpoints/last.ckpt -------------------------------------------------------------------------------- /nerf_loc/configs/12scenes/apt2_luke.yaml: -------------------------------------------------------------------------------- 1 | expname: apt2_luke 2 | basedir: experiments/12scenes 3 | dataset_type: video_12scenes 4 | scenes: [apt2/luke] 5 | 6 | render: 7 | N_samples: 64 8 | N_rand: 1024 9 | chunk: 2048 10 | 11 | max_epochs: 30 12 | 13 | use_depth_supervision: True 14 | 15 | matching: 16 | keypoints_3d_sampling: random 17 | keypoints_3d_sampling_max_keep: 16384 18 | coarse_matching_depth_thresh: 0.2 19 | coarse_num_3d_keypoints: 1024 20 | fine_num_3d_keypoints: 1024 21 | 22 | backbone2d_use_fpn: True 23 | # cascade_matching: True 24 | 25 | encode_appearance: True 26 | appearance_emb_dim: 128 27 | 28 | n_views_train: 5 29 | n_views_test: 10 30 | 31 | train_nerf: True 32 | train_pose: True 33 | 34 | backbone2d: cotr 35 | backbone2d_fpn_dim: 192 36 | support_image_selection: retrieval 37 | image_core_set_size: 16 38 | image_retrieval_method_train: netvlad 39 | image_retrieval_method_test: netvlad 40 | image_retrieval_interval_train: 20 41 | image_retrieval_interval_test: 10 42 | 43 | render_loss_weight: 1.0 44 | ref_depth_loss_weight: 0.1 45 | 46 | use_scene_coord_memorization: True 47 | ckpt: experiments/12scenes/all/default/checkpoints/last.ckpt -------------------------------------------------------------------------------- /nerf_loc/configs/12scenes/office1_gates362.yaml: -------------------------------------------------------------------------------- 1 | expname: office1_gates362 2 | basedir: experiments/12scenes 3 | dataset_type: video_12scenes 4 | scenes: [office1/gates362] 5 | 6 | render: 7 | N_samples: 64 8 | N_rand: 1024 9 | chunk: 2048 10 | 11 | max_epochs: 30 12 | 13 | 
use_depth_supervision: True 14 | 15 | matching: 16 | keypoints_3d_sampling: random 17 | keypoints_3d_sampling_max_keep: 16384 18 | coarse_matching_depth_thresh: 0.2 19 | coarse_num_3d_keypoints: 1024 20 | fine_num_3d_keypoints: 1024 21 | 22 | backbone2d_use_fpn: True 23 | # cascade_matching: True 24 | 25 | encode_appearance: True 26 | appearance_emb_dim: 128 27 | 28 | n_views_train: 5 29 | n_views_test: 10 30 | 31 | train_nerf: True 32 | train_pose: True 33 | 34 | backbone2d: cotr 35 | backbone2d_fpn_dim: 192 36 | support_image_selection: retrieval 37 | image_core_set_size: 16 38 | image_retrieval_method_train: netvlad 39 | image_retrieval_method_test: netvlad 40 | image_retrieval_interval_train: 20 41 | image_retrieval_interval_test: 10 42 | 43 | render_loss_weight: 1.0 44 | ref_depth_loss_weight: 0.1 45 | 46 | use_scene_coord_memorization: True 47 | ckpt: experiments/12scenes/all/default/checkpoints/last.ckpt -------------------------------------------------------------------------------- /nerf_loc/configs/12scenes/office1_gates381.yaml: -------------------------------------------------------------------------------- 1 | expname: office1_gates381 2 | basedir: experiments/12scenes 3 | dataset_type: video_12scenes 4 | scenes: [office1/gates381] 5 | 6 | render: 7 | N_samples: 64 8 | N_rand: 1024 9 | chunk: 2048 10 | 11 | max_epochs: 30 12 | 13 | use_depth_supervision: True 14 | 15 | matching: 16 | keypoints_3d_sampling: random 17 | keypoints_3d_sampling_max_keep: 16384 18 | coarse_matching_depth_thresh: 0.2 19 | coarse_num_3d_keypoints: 1024 20 | fine_num_3d_keypoints: 1024 21 | 22 | backbone2d_use_fpn: True 23 | # cascade_matching: True 24 | 25 | encode_appearance: True 26 | appearance_emb_dim: 128 27 | 28 | n_views_train: 5 29 | n_views_test: 10 30 | 31 | train_nerf: True 32 | train_pose: True 33 | 34 | backbone2d: cotr 35 | backbone2d_fpn_dim: 192 36 | support_image_selection: retrieval 37 | image_core_set_size: 16 38 | image_retrieval_method_train: netvlad 39 | image_retrieval_method_test: netvlad 40 | image_retrieval_interval_train: 20 41 | image_retrieval_interval_test: 10 42 | 43 | render_loss_weight: 1.0 44 | ref_depth_loss_weight: 0.1 45 | 46 | use_scene_coord_memorization: True 47 | ckpt: experiments/12scenes/all/default/checkpoints/last.ckpt -------------------------------------------------------------------------------- /nerf_loc/configs/12scenes/office1_lounge.yaml: -------------------------------------------------------------------------------- 1 | expname: office1_lounge 2 | basedir: experiments/12scenes 3 | dataset_type: video_12scenes 4 | scenes: [office1/lounge] 5 | 6 | render: 7 | N_samples: 64 8 | N_rand: 1024 9 | chunk: 2048 10 | 11 | max_epochs: 30 12 | 13 | use_depth_supervision: True 14 | 15 | matching: 16 | keypoints_3d_sampling: random 17 | keypoints_3d_sampling_max_keep: 16384 18 | coarse_matching_depth_thresh: 0.2 19 | coarse_num_3d_keypoints: 1024 20 | fine_num_3d_keypoints: 1024 21 | 22 | backbone2d_use_fpn: True 23 | # cascade_matching: True 24 | 25 | encode_appearance: True 26 | appearance_emb_dim: 128 27 | 28 | n_views_train: 5 29 | n_views_test: 10 30 | 31 | train_nerf: True 32 | train_pose: True 33 | 34 | backbone2d: cotr 35 | backbone2d_fpn_dim: 192 36 | support_image_selection: retrieval 37 | image_core_set_size: 16 38 | image_retrieval_method_train: netvlad 39 | image_retrieval_method_test: netvlad 40 | image_retrieval_interval_train: 20 41 | image_retrieval_interval_test: 10 42 | 43 | render_loss_weight: 1.0 44 | ref_depth_loss_weight: 0.1 45 | 46 | 
use_scene_coord_memorization: True 47 | ckpt: experiments/12scenes/all/default/checkpoints/last.ckpt -------------------------------------------------------------------------------- /nerf_loc/configs/12scenes/office1_manolis.yaml: -------------------------------------------------------------------------------- 1 | expname: office1_manolis 2 | basedir: experiments/12scenes 3 | dataset_type: video_12scenes 4 | scenes: [office1/manolis] 5 | 6 | render: 7 | N_samples: 64 8 | N_rand: 1024 9 | chunk: 2048 10 | 11 | max_epochs: 30 12 | 13 | use_depth_supervision: True 14 | 15 | matching: 16 | keypoints_3d_sampling: random 17 | keypoints_3d_sampling_max_keep: 16384 18 | coarse_matching_depth_thresh: 0.2 19 | coarse_num_3d_keypoints: 1024 20 | fine_num_3d_keypoints: 1024 21 | 22 | backbone2d_use_fpn: True 23 | # cascade_matching: True 24 | 25 | encode_appearance: True 26 | appearance_emb_dim: 128 27 | 28 | n_views_train: 5 29 | n_views_test: 10 30 | 31 | train_nerf: True 32 | train_pose: True 33 | 34 | backbone2d: cotr 35 | backbone2d_fpn_dim: 192 36 | support_image_selection: retrieval 37 | image_core_set_size: 16 38 | image_retrieval_method_train: netvlad 39 | image_retrieval_method_test: netvlad 40 | image_retrieval_interval_train: 20 41 | image_retrieval_interval_test: 10 42 | 43 | render_loss_weight: 1.0 44 | ref_depth_loss_weight: 0.1 45 | 46 | use_scene_coord_memorization: True 47 | ckpt: experiments/12scenes/all/default/checkpoints/last.ckpt -------------------------------------------------------------------------------- /nerf_loc/configs/12scenes/office2_5a.yaml: -------------------------------------------------------------------------------- 1 | expname: office2_5a 2 | basedir: experiments/12scenes 3 | dataset_type: video_12scenes 4 | scenes: [office2/5a] 5 | 6 | render: 7 | N_samples: 64 8 | N_rand: 1024 9 | chunk: 2048 10 | 11 | max_epochs: 30 12 | 13 | use_depth_supervision: True 14 | 15 | matching: 16 | keypoints_3d_sampling: random 17 | keypoints_3d_sampling_max_keep: 16384 18 | coarse_matching_depth_thresh: 0.2 19 | coarse_num_3d_keypoints: 1024 20 | fine_num_3d_keypoints: 1024 21 | 22 | backbone2d_use_fpn: True 23 | # cascade_matching: True 24 | 25 | encode_appearance: True 26 | appearance_emb_dim: 128 27 | 28 | n_views_train: 5 29 | n_views_test: 10 30 | 31 | train_nerf: True 32 | train_pose: True 33 | 34 | backbone2d: cotr 35 | backbone2d_fpn_dim: 192 36 | support_image_selection: retrieval 37 | image_core_set_size: 16 38 | image_retrieval_method_train: netvlad 39 | image_retrieval_method_test: netvlad 40 | image_retrieval_interval_train: 20 41 | image_retrieval_interval_test: 10 42 | 43 | render_loss_weight: 1.0 44 | ref_depth_loss_weight: 0.1 45 | 46 | use_scene_coord_memorization: True 47 | ckpt: experiments/12scenes/all/default/checkpoints/last.ckpt -------------------------------------------------------------------------------- /nerf_loc/configs/12scenes/office2_5b.yaml: -------------------------------------------------------------------------------- 1 | expname: office2_5b 2 | basedir: experiments/12scenes 3 | dataset_type: video_12scenes 4 | scenes: [office2/5b] 5 | 6 | render: 7 | N_samples: 64 8 | N_rand: 1024 9 | chunk: 2048 10 | 11 | max_epochs: 30 12 | 13 | use_depth_supervision: True 14 | 15 | matching: 16 | keypoints_3d_sampling: random 17 | keypoints_3d_sampling_max_keep: 16384 18 | coarse_matching_depth_thresh: 0.2 19 | coarse_num_3d_keypoints: 1024 20 | fine_num_3d_keypoints: 1024 21 | 22 | backbone2d_use_fpn: True 23 | # cascade_matching: True 24 | 25 | 
encode_appearance: True 26 | appearance_emb_dim: 128 27 | 28 | n_views_train: 5 29 | n_views_test: 10 30 | 31 | train_nerf: True 32 | train_pose: True 33 | 34 | backbone2d: cotr 35 | backbone2d_fpn_dim: 192 36 | support_image_selection: retrieval 37 | image_core_set_size: 16 38 | image_retrieval_method_train: netvlad 39 | image_retrieval_method_test: netvlad 40 | image_retrieval_interval_train: 20 41 | image_retrieval_interval_test: 10 42 | 43 | render_loss_weight: 1.0 44 | ref_depth_loss_weight: 0.1 45 | 46 | use_scene_coord_memorization: True 47 | ckpt: experiments/12scenes/all/default/checkpoints/last.ckpt -------------------------------------------------------------------------------- /nerf_loc/configs/12scenes_all.yaml: -------------------------------------------------------------------------------- 1 | expname: all 2 | basedir: experiments/12scenes 3 | dataset_type: video_12scenes 4 | scenes: [apt1/kitchen, apt1/living, apt2/bed, apt2/kitchen, apt2/living, apt2/luke, office1/gates362, office1/gates381, office1/lounge, office1/manolis, office2/5a, office2/5b] 5 | 6 | render: 7 | N_samples: 64 8 | N_rand: 1024 9 | chunk: 2048 10 | 11 | max_epochs: 30 12 | 13 | use_depth_supervision: True 14 | 15 | matching: 16 | keypoints_3d_sampling: random 17 | keypoints_3d_sampling_max_keep: 16384 18 | coarse_matching_depth_thresh: 0.2 19 | coarse_num_3d_keypoints: 1024 20 | fine_num_3d_keypoints: 1024 21 | 22 | backbone2d_use_fpn: True 23 | # cascade_matching: True 24 | 25 | encode_appearance: True 26 | appearance_emb_dim: 128 27 | 28 | n_views_train: 5 29 | n_views_test: 10 30 | 31 | train_nerf: True 32 | train_pose: True 33 | 34 | backbone2d: cotr 35 | backbone2d_fpn_dim: 192 36 | support_image_selection: retrieval 37 | image_core_set_size: 16 38 | image_retrieval_method_train: netvlad 39 | image_retrieval_method_test: netvlad 40 | image_retrieval_interval_train: 20 41 | image_retrieval_interval_test: 10 42 | 43 | render_loss_weight: 1.0 44 | ref_depth_loss_weight: 0.1 45 | 46 | use_scene_coord_memorization: False -------------------------------------------------------------------------------- /nerf_loc/configs/7scenes/chess.yaml: -------------------------------------------------------------------------------- 1 | expname: chess 2 | basedir: experiments/7scenes 3 | dataset_type: video_7scenes 4 | scenes: [chess] 5 | 6 | render: 7 | N_samples: 64 8 | N_rand: 1024 9 | chunk: 2048 10 | 11 | max_epochs: 30 12 | 13 | use_depth_supervision: True 14 | 15 | matching: 16 | keypoints_3d_sampling: random 17 | keypoints_3d_sampling_max_keep: 16384 18 | coarse_matching_depth_thresh: 0.2 19 | coarse_num_3d_keypoints: 1024 20 | fine_num_3d_keypoints: 1024 21 | 22 | backbone2d_use_fpn: True 23 | cascade_matching: False 24 | 25 | encode_appearance: True 26 | appearance_emb_dim: 128 27 | 28 | n_views_train: 5 29 | n_views_test: 10 30 | 31 | train_nerf: True 32 | train_pose: True 33 | 34 | backbone2d: cotr 35 | backbone2d_fpn_dim: 192 36 | support_image_selection: retrieval 37 | image_core_set_size: 16 38 | image_retrieval_method: netvlad 39 | image_retrieval_method_train: netvlad 40 | image_retrieval_method_test: netvlad 41 | image_retrieval_interval_train: 20 42 | image_retrieval_interval_test: 10 43 | 44 | render_loss_weight: 1.0 45 | ref_depth_loss_weight: 0.1 46 | 47 | use_scene_coord_memorization: True 48 | ckpt: experiments/7scenes/all/default/checkpoints/last.ckpt -------------------------------------------------------------------------------- /nerf_loc/configs/7scenes/fire.yaml: 
-------------------------------------------------------------------------------- 1 | expname: fire 2 | basedir: experiments/7scenes 3 | dataset_type: video_7scenes 4 | scenes: [fire] 5 | 6 | render: 7 | N_samples: 64 8 | N_rand: 1024 9 | chunk: 2048 10 | 11 | max_epochs: 30 12 | 13 | use_depth_supervision: True 14 | 15 | matching: 16 | keypoints_3d_sampling: random 17 | keypoints_3d_sampling_max_keep: 16384 18 | coarse_matching_depth_thresh: 0.2 19 | coarse_num_3d_keypoints: 1024 20 | fine_num_3d_keypoints: 1024 21 | 22 | backbone2d_use_fpn: True 23 | cascade_matching: False 24 | 25 | encode_appearance: True 26 | appearance_emb_dim: 128 27 | 28 | n_views_train: 5 29 | n_views_test: 10 30 | 31 | train_nerf: True 32 | train_pose: True 33 | 34 | backbone2d: cotr 35 | backbone2d_fpn_dim: 192 36 | support_image_selection: retrieval 37 | image_core_set_size: 16 38 | image_retrieval_method: netvlad 39 | image_retrieval_method_train: netvlad 40 | image_retrieval_method_test: netvlad 41 | image_retrieval_interval_train: 20 42 | image_retrieval_interval_test: 10 43 | 44 | render_loss_weight: 1.0 45 | ref_depth_loss_weight: 0.1 46 | 47 | use_scene_coord_memorization: True 48 | ckpt: experiments/7scenes/all/default/checkpoints/last.ckpt -------------------------------------------------------------------------------- /nerf_loc/configs/7scenes/heads.yaml: -------------------------------------------------------------------------------- 1 | expname: heads 2 | basedir: experiments/7scenes 3 | dataset_type: video_7scenes 4 | scenes: [heads] 5 | 6 | render: 7 | N_samples: 64 8 | N_rand: 1024 9 | chunk: 2048 10 | 11 | max_epochs: 30 12 | 13 | use_depth_supervision: True 14 | 15 | matching: 16 | keypoints_3d_sampling: random 17 | keypoints_3d_sampling_max_keep: 16384 18 | coarse_matching_depth_thresh: 0.2 19 | coarse_num_3d_keypoints: 1024 20 | fine_num_3d_keypoints: 1024 21 | 22 | backbone2d_use_fpn: True 23 | cascade_matching: False 24 | 25 | encode_appearance: True 26 | appearance_emb_dim: 128 27 | 28 | n_views_train: 5 29 | n_views_test: 10 30 | 31 | train_nerf: True 32 | train_pose: True 33 | 34 | backbone2d: cotr 35 | backbone2d_fpn_dim: 192 36 | support_image_selection: retrieval 37 | image_core_set_size: 16 38 | image_retrieval_method: netvlad 39 | image_retrieval_method_train: netvlad 40 | image_retrieval_method_test: netvlad 41 | image_retrieval_interval_train: 20 42 | image_retrieval_interval_test: 10 43 | 44 | render_loss_weight: 1.0 45 | ref_depth_loss_weight: 0.1 46 | 47 | use_scene_coord_memorization: True 48 | ckpt: experiments/7scenes/all/default/checkpoints/last.ckpt -------------------------------------------------------------------------------- /nerf_loc/configs/7scenes/office.yaml: -------------------------------------------------------------------------------- 1 | expname: office 2 | basedir: experiments/7scenes 3 | dataset_type: video_7scenes 4 | scenes: [office] 5 | 6 | render: 7 | N_samples: 64 8 | N_rand: 1024 9 | chunk: 2048 10 | 11 | max_epochs: 30 12 | 13 | use_depth_supervision: True 14 | 15 | matching: 16 | keypoints_3d_sampling: random 17 | keypoints_3d_sampling_max_keep: 16384 18 | coarse_matching_depth_thresh: 0.2 19 | coarse_num_3d_keypoints: 1024 20 | fine_num_3d_keypoints: 1024 21 | 22 | backbone2d_use_fpn: True 23 | cascade_matching: False 24 | 25 | encode_appearance: True 26 | appearance_emb_dim: 128 27 | 28 | n_views_train: 5 29 | n_views_test: 10 30 | 31 | train_nerf: True 32 | train_pose: True 33 | 34 | backbone2d: cotr 35 | backbone2d_fpn_dim: 192 36 | 
support_image_selection: retrieval 37 | image_core_set_size: 16 38 | image_retrieval_method: netvlad 39 | image_retrieval_method_train: netvlad 40 | image_retrieval_method_test: netvlad 41 | image_retrieval_interval_train: 20 42 | image_retrieval_interval_test: 10 43 | 44 | render_loss_weight: 1.0 45 | ref_depth_loss_weight: 0.1 46 | 47 | use_scene_coord_memorization: True 48 | ckpt: experiments/7scenes/all/default/checkpoints/last.ckpt -------------------------------------------------------------------------------- /nerf_loc/configs/7scenes/pumpkin.yaml: -------------------------------------------------------------------------------- 1 | expname: pumpkin 2 | basedir: experiments/7scenes 3 | dataset_type: video_7scenes 4 | scenes: [pumpkin] 5 | 6 | render: 7 | N_samples: 64 8 | N_rand: 1024 9 | chunk: 2048 10 | 11 | max_epochs: 30 12 | 13 | use_depth_supervision: True 14 | 15 | matching: 16 | keypoints_3d_sampling: random 17 | keypoints_3d_sampling_max_keep: 16384 18 | coarse_matching_depth_thresh: 0.2 19 | coarse_num_3d_keypoints: 1024 20 | fine_num_3d_keypoints: 1024 21 | 22 | backbone2d_use_fpn: True 23 | cascade_matching: False 24 | 25 | encode_appearance: True 26 | appearance_emb_dim: 128 27 | 28 | n_views_train: 5 29 | n_views_test: 10 30 | 31 | train_nerf: True 32 | train_pose: True 33 | 34 | backbone2d: cotr 35 | backbone2d_fpn_dim: 192 36 | support_image_selection: retrieval 37 | image_core_set_size: 16 38 | image_retrieval_method: netvlad 39 | image_retrieval_method_train: netvlad 40 | image_retrieval_method_test: netvlad 41 | image_retrieval_interval_train: 20 42 | image_retrieval_interval_test: 10 43 | 44 | render_loss_weight: 1.0 45 | ref_depth_loss_weight: 0.1 46 | 47 | use_scene_coord_memorization: True 48 | ckpt: experiments/7scenes/all/default/checkpoints/last.ckpt -------------------------------------------------------------------------------- /nerf_loc/configs/7scenes/redkitchen.yaml: -------------------------------------------------------------------------------- 1 | expname: redkitchen 2 | basedir: experiments/7scenes 3 | dataset_type: video_7scenes 4 | scenes: [redkitchen] 5 | 6 | render: 7 | N_samples: 64 8 | N_rand: 1024 9 | chunk: 2048 10 | 11 | max_epochs: 30 12 | 13 | use_depth_supervision: True 14 | 15 | matching: 16 | keypoints_3d_sampling: random 17 | keypoints_3d_sampling_max_keep: 16384 18 | coarse_matching_depth_thresh: 0.2 19 | coarse_num_3d_keypoints: 1024 20 | fine_num_3d_keypoints: 1024 21 | 22 | backbone2d_use_fpn: True 23 | cascade_matching: False 24 | 25 | encode_appearance: True 26 | appearance_emb_dim: 128 27 | 28 | n_views_train: 5 29 | n_views_test: 10 30 | 31 | train_nerf: True 32 | train_pose: True 33 | 34 | backbone2d: cotr 35 | backbone2d_fpn_dim: 192 36 | support_image_selection: retrieval 37 | image_core_set_size: 16 38 | image_retrieval_method: netvlad 39 | image_retrieval_method_train: netvlad 40 | image_retrieval_method_test: netvlad 41 | image_retrieval_interval_train: 20 42 | image_retrieval_interval_test: 10 43 | 44 | render_loss_weight: 1.0 45 | ref_depth_loss_weight: 0.1 46 | 47 | use_scene_coord_memorization: True 48 | ckpt: experiments/7scenes/all/default/checkpoints/last.ckpt -------------------------------------------------------------------------------- /nerf_loc/configs/7scenes/stairs.yaml: -------------------------------------------------------------------------------- 1 | expname: stairs 2 | basedir: experiments/7scenes 3 | dataset_type: video_7scenes 4 | scenes: [stairs] 5 | 6 | render: 7 | N_samples: 64 8 | N_rand: 1024 9 
| chunk: 2048 10 | 11 | max_epochs: 30 12 | 13 | use_depth_supervision: True 14 | 15 | matching: 16 | keypoints_3d_sampling: random 17 | keypoints_3d_sampling_max_keep: 16384 18 | coarse_matching_depth_thresh: 0.2 19 | coarse_num_3d_keypoints: 1024 20 | fine_num_3d_keypoints: 1024 21 | 22 | backbone2d_use_fpn: True 23 | cascade_matching: False 24 | 25 | encode_appearance: True 26 | appearance_emb_dim: 128 27 | 28 | n_views_train: 5 29 | n_views_test: 10 30 | 31 | train_nerf: True 32 | train_pose: True 33 | 34 | backbone2d: cotr 35 | backbone2d_fpn_dim: 192 36 | support_image_selection: retrieval 37 | image_core_set_size: 16 38 | image_retrieval_method: netvlad 39 | image_retrieval_method_train: netvlad 40 | image_retrieval_method_test: netvlad 41 | image_retrieval_interval_train: 20 42 | image_retrieval_interval_test: 10 43 | 44 | render_loss_weight: 1.0 45 | ref_depth_loss_weight: 0.1 46 | 47 | use_scene_coord_memorization: True 48 | ckpt: experiments/7scenes/all/default/checkpoints/last.ckpt -------------------------------------------------------------------------------- /nerf_loc/configs/7scenes_all.yaml: -------------------------------------------------------------------------------- 1 | expname: all 2 | basedir: experiments/7scenes 3 | dataset_type: video_7scenes 4 | scenes: [chess,fire,heads,office,pumpkin,redkitchen,stairs] 5 | 6 | render: 7 | N_samples: 64 8 | N_rand: 1024 9 | chunk: 2048 10 | 11 | max_epochs: 30 12 | 13 | use_depth_supervision: True 14 | 15 | matching: 16 | keypoints_3d_sampling: random 17 | keypoints_3d_sampling_max_keep: 16384 18 | coarse_matching_depth_thresh: 0.2 19 | coarse_num_3d_keypoints: 1024 20 | fine_num_3d_keypoints: 1024 21 | 22 | backbone2d_use_fpn: True 23 | cascade_matching: False 24 | 25 | encode_appearance: True 26 | appearance_emb_dim: 128 27 | 28 | n_views_train: 5 29 | n_views_test: 10 30 | 31 | train_nerf: True 32 | train_pose: True 33 | 34 | backbone2d: cotr 35 | backbone2d_fpn_dim: 192 36 | support_image_selection: retrieval 37 | image_core_set_size: 16 38 | image_retrieval_method: netvlad 39 | image_retrieval_method_train: netvlad 40 | image_retrieval_method_test: netvlad 41 | image_retrieval_interval_train: 20 42 | image_retrieval_interval_test: 10 43 | 44 | render_loss_weight: 1.0 45 | ref_depth_loss_weight: 0.1 46 | 47 | use_scene_coord_memorization: False -------------------------------------------------------------------------------- /nerf_loc/configs/__init__.py: -------------------------------------------------------------------------------- 1 | from yacs.config import CfgNode as CN 2 | 3 | 4 | _C = CN() 5 | _C.expname = '' 6 | _C.basedir = '' 7 | _C.datadir = '' 8 | _C.version = 'default' 9 | _C.ckpt = '' 10 | _C.dataset_type = 'video_cambridge' 11 | _C.scenes = [] 12 | 13 | _C.max_epochs = 50 14 | _C.lrate = 5e-4 15 | _C.lrate_decay_steps = 50000 16 | _C.lrate_decay_factor = 0.5 17 | 18 | _C.train_nerf = True 19 | _C.train_pose = True 20 | 21 | _C.backbone2d = 'cotr' 22 | _C.backbone2d_fpn_dim = 192 23 | _C.backbone2d_use_fpn = True 24 | _C.backbone2d_coarse_layer_name = 'layer2' 25 | _C.backbone2d_fine_layer_name = 'layer1' 26 | 27 | # support image 28 | _C.support_image_selection = 'retrieval' 29 | _C.n_views_train = 5 30 | _C.n_views_test = 10 31 | _C.image_core_set_size = 16 32 | # image retrieval 33 | _C.image_retrieval_method = 'netvlad' # used in offline preprocessing 34 | _C.image_retrieval_method_train = 'netvlad' 35 | _C.image_retrieval_method_test = 'netvlad' 36 | _C.image_retrieval_interval_train = 1 37 | 
_C.image_retrieval_interval_test = 1 38 | # coreset 39 | _C.coreset_sampler = 'FPS' 40 | 41 | _C.model_3d_hidden_dim = 128 42 | _C.use_scene_coord_memorization = False 43 | 44 | _C.encode_appearance = True 45 | _C.appearance_emb_dim = 128 46 | 47 | _C.simple_3d_model = False 48 | 49 | # position embedding 50 | _C.multires = 10 # log2 of max freq for positional encoding (3D location) 51 | _C.multires_views = 4 # log2 of max freq for positional encoding (2D direction) 52 | _C.i_embed = 0 # set 0 for default positional encoding, -1 for none 53 | 54 | _C.render = CN() 55 | _C.render.N_samples = 64 56 | _C.render.N_importance = 0 57 | _C.render.N_rand = 1024 58 | _C.render.chunk = 2048 59 | _C.render.lindisp = False 60 | _C.render.white_bkgd = False 61 | _C.render.use_render_uncertainty = True 62 | _C.render.render_feature = True 63 | 64 | _C.use_depth_supervision = False 65 | _C.coarse_loss_weight = 10000. 66 | _C.fine_loss_weight = 10. 67 | _C.render_loss_weight = 1.0 68 | _C.ref_depth_loss_weight = 0.1 69 | 70 | _C.keypoints_3d_source = 'depth' # sfm - from sparse sfm points, depth - from dense backprojected points 71 | _C.matcher_hidden_dim = 192 72 | _C.matching = CN() 73 | _C.matching.keypoints_3d_sampling = 'random' 74 | _C.matching.keypoints_3d_sampling_max_keep = 100000 75 | _C.matching.coarse_matching_depth_thresh = 2. 76 | _C.matching.coarse_num_3d_keypoints = 1024 77 | _C.matching.fine_num_3d_keypoints = 1024 78 | _C.fine_matching_loss_type = 'l2_with_std' 79 | 80 | _C.ransac_thresh = 8 81 | _C.rotation_eval_thresh = 5 82 | _C.translation_eval_thresh = 0.05 83 | 84 | # test time 85 | _C.cascade_matching = False 86 | _C.optimize_pose = False 87 | _C.test_time_color_jitter = False 88 | _C.test_time_style_change = False 89 | _C.test_render_interval = 50 # interval of rendering test image 90 | _C.vis_3d_box = False # save onepose box visualization 91 | _C.vis_rendering = False # save rendered image for visualization 92 | _C.vis_trajectory = False # save camera trajectory for visualization 93 | 94 | def get_cfg_defaults(): 95 | """Get a yacs CfgNode object with default values""" 96 | return _C.clone() 97 | 98 | def override_cfg_with_args(cfg, args): 99 | for name in vars(args): 100 | if name in cfg: 101 | setattr(cfg, name, getattr(args, name)) 102 | return cfg 103 | -------------------------------------------------------------------------------- /nerf_loc/configs/ablation_study/7scenes_from_scratch/chess.txt: -------------------------------------------------------------------------------- 1 | expname = all 2 | basedir = /youtu/xlab-team4/jenningsliu/nerf_loc_logs/7scenes 3 | dataset_type = video_7scenes 4 | scenes = [chess] 5 | 6 | N_samples = 64 7 | N_rand = 1024 8 | max_epochs = 50 9 | 10 | use_depth_supervision = True 11 | 12 | keypoints_3d_sampling = random 13 | keypoints_3d_sampling_response_thresh = 0.003 14 | keypoints_3d_sampling_max_keep = 16384 15 | coarse_matching_depth_thresh = 0.2 16 | 17 | coarse_num_3d_keypoints = 1024 18 | fine_num_3d_keypoints = 1024 19 | 20 | i_weights = 50000 21 | i_testset = 50000 22 | 23 | backbone2d_use_fpn = True 24 | cascade_matching = True 25 | 26 | encode_appearance = True 27 | appearance_emb_dim = 128 28 | 29 | n_views_train = 5 30 | n_views_test = 10 31 | 32 | train_nerf = False 33 | train_pose = True 34 | 35 | chunk = 2048 36 | 37 | backbone2d = cotr 38 | backbone2d_fpn_dim = 192 39 | support_image_selection = retrieval 40 | image_core_set_size = 16 41 | image_retrieval_method = netvlad 42 | image_retrieval_method_train = netvlad 43 | 
image_retrieval_method_test = netvlad 44 | image_retrieval_interval_train = 20 45 | image_retrieval_interval_test = 10 46 | version = chess_scratch 47 | 48 | use_scene_coord_memorization = True -------------------------------------------------------------------------------- /nerf_loc/configs/ablation_study/7scenes_from_scratch/fire.txt: -------------------------------------------------------------------------------- 1 | expname = all 2 | basedir = /youtu/xlab-team4/jenningsliu/nerf_loc_logs/7scenes 3 | dataset_type = video_7scenes 4 | scenes = [fire] 5 | 6 | N_samples = 64 7 | N_rand = 1024 8 | max_epochs = 50 9 | 10 | use_depth_supervision = True 11 | 12 | keypoints_3d_sampling = random 13 | keypoints_3d_sampling_response_thresh = 0.003 14 | keypoints_3d_sampling_max_keep = 16384 15 | coarse_matching_depth_thresh = 0.2 16 | 17 | coarse_num_3d_keypoints = 1024 18 | fine_num_3d_keypoints = 1024 19 | 20 | i_weights = 50000 21 | i_testset = 50000 22 | 23 | backbone2d_use_fpn = True 24 | cascade_matching = True 25 | 26 | encode_appearance = True 27 | appearance_emb_dim = 128 28 | 29 | n_views_train = 5 30 | n_views_test = 10 31 | 32 | train_nerf = False 33 | train_pose = True 34 | 35 | chunk = 2048 36 | 37 | backbone2d = cotr 38 | backbone2d_fpn_dim = 192 39 | support_image_selection = retrieval 40 | image_core_set_size = 16 41 | image_retrieval_method = netvlad 42 | image_retrieval_method_train = netvlad 43 | image_retrieval_method_test = netvlad 44 | image_retrieval_interval_train = 20 45 | image_retrieval_interval_test = 10 46 | version = fire_scratch 47 | 48 | use_scene_coord_memorization = True -------------------------------------------------------------------------------- /nerf_loc/configs/ablation_study/7scenes_from_scratch/heads.txt: -------------------------------------------------------------------------------- 1 | expname = all 2 | basedir = /youtu/xlab-team4/jenningsliu/nerf_loc_logs/7scenes 3 | dataset_type = video_7scenes 4 | scenes = [heads] 5 | 6 | N_samples = 64 7 | N_rand = 1024 8 | max_epochs = 50 9 | 10 | use_depth_supervision = True 11 | 12 | keypoints_3d_sampling = random 13 | keypoints_3d_sampling_response_thresh = 0.003 14 | keypoints_3d_sampling_max_keep = 16384 15 | coarse_matching_depth_thresh = 0.2 16 | 17 | coarse_num_3d_keypoints = 1024 18 | fine_num_3d_keypoints = 1024 19 | 20 | i_weights = 50000 21 | i_testset = 50000 22 | 23 | backbone2d_use_fpn = True 24 | cascade_matching = True 25 | 26 | encode_appearance = True 27 | appearance_emb_dim = 128 28 | 29 | n_views_train = 5 30 | n_views_test = 10 31 | 32 | train_nerf = False 33 | train_pose = True 34 | 35 | chunk = 2048 36 | 37 | backbone2d = cotr 38 | backbone2d_fpn_dim = 192 39 | support_image_selection = retrieval 40 | image_core_set_size = 16 41 | image_retrieval_method = netvlad 42 | image_retrieval_method_train = netvlad 43 | image_retrieval_method_test = netvlad 44 | image_retrieval_interval_train = 20 45 | image_retrieval_interval_test = 10 46 | version = heads_scratch 47 | 48 | use_scene_coord_memorization = True -------------------------------------------------------------------------------- /nerf_loc/configs/ablation_study/7scenes_from_scratch/office.txt: -------------------------------------------------------------------------------- 1 | expname = all 2 | basedir = /youtu/xlab-team4/jenningsliu/nerf_loc_logs/7scenes 3 | dataset_type = video_7scenes 4 | scenes = [office] 5 | 6 | N_samples = 64 7 | N_rand = 1024 8 | max_epochs = 50 9 | 10 | use_depth_supervision = True 11 | 12 | keypoints_3d_sampling = 
random 13 | keypoints_3d_sampling_response_thresh = 0.003 14 | keypoints_3d_sampling_max_keep = 16384 15 | coarse_matching_depth_thresh = 0.2 16 | 17 | coarse_num_3d_keypoints = 1024 18 | fine_num_3d_keypoints = 1024 19 | 20 | i_weights = 50000 21 | i_testset = 50000 22 | 23 | backbone2d_use_fpn = True 24 | cascade_matching = True 25 | 26 | encode_appearance = True 27 | appearance_emb_dim = 128 28 | 29 | n_views_train = 5 30 | n_views_test = 10 31 | 32 | train_nerf = False 33 | train_pose = True 34 | 35 | chunk = 2048 36 | 37 | backbone2d = cotr 38 | backbone2d_fpn_dim = 192 39 | support_image_selection = retrieval 40 | image_core_set_size = 16 41 | image_retrieval_method = netvlad 42 | image_retrieval_method_train = netvlad 43 | image_retrieval_method_test = netvlad 44 | image_retrieval_interval_train = 20 45 | image_retrieval_interval_test = 10 46 | version = office_scratch 47 | 48 | use_scene_coord_memorization = True -------------------------------------------------------------------------------- /nerf_loc/configs/ablation_study/7scenes_from_scratch/pumpkin.txt: -------------------------------------------------------------------------------- 1 | expname = all 2 | basedir = /youtu/xlab-team4/jenningsliu/nerf_loc_logs/7scenes 3 | dataset_type = video_7scenes 4 | scenes = [pumpkin] 5 | 6 | N_samples = 64 7 | N_rand = 1024 8 | max_epochs = 50 9 | 10 | use_depth_supervision = True 11 | 12 | keypoints_3d_sampling = random 13 | keypoints_3d_sampling_response_thresh = 0.003 14 | keypoints_3d_sampling_max_keep = 16384 15 | coarse_matching_depth_thresh = 0.2 16 | 17 | coarse_num_3d_keypoints = 1024 18 | fine_num_3d_keypoints = 1024 19 | 20 | i_weights = 50000 21 | i_testset = 50000 22 | 23 | backbone2d_use_fpn = True 24 | cascade_matching = True 25 | 26 | encode_appearance = True 27 | appearance_emb_dim = 128 28 | 29 | n_views_train = 5 30 | n_views_test = 10 31 | 32 | train_nerf = False 33 | train_pose = True 34 | 35 | chunk = 2048 36 | 37 | backbone2d = cotr 38 | backbone2d_fpn_dim = 192 39 | support_image_selection = retrieval 40 | image_core_set_size = 16 41 | image_retrieval_method = netvlad 42 | image_retrieval_method_train = netvlad 43 | image_retrieval_method_test = netvlad 44 | image_retrieval_interval_train = 20 45 | image_retrieval_interval_test = 10 46 | version = pumpkin_scratch 47 | 48 | use_scene_coord_memorization = True -------------------------------------------------------------------------------- /nerf_loc/configs/ablation_study/7scenes_from_scratch/redkitchen.txt: -------------------------------------------------------------------------------- 1 | expname = all 2 | basedir = /youtu/xlab-team4/jenningsliu/nerf_loc_logs/7scenes 3 | dataset_type = video_7scenes 4 | scenes = [redkitchen] 5 | 6 | N_samples = 64 7 | N_rand = 1024 8 | max_epochs = 50 9 | 10 | use_depth_supervision = True 11 | 12 | keypoints_3d_sampling = random 13 | keypoints_3d_sampling_response_thresh = 0.003 14 | keypoints_3d_sampling_max_keep = 16384 15 | coarse_matching_depth_thresh = 0.2 16 | 17 | coarse_num_3d_keypoints = 1024 18 | fine_num_3d_keypoints = 1024 19 | 20 | i_weights = 50000 21 | i_testset = 50000 22 | 23 | backbone2d_use_fpn = True 24 | cascade_matching = True 25 | 26 | encode_appearance = True 27 | appearance_emb_dim = 128 28 | 29 | n_views_train = 5 30 | n_views_test = 10 31 | 32 | train_nerf = False 33 | train_pose = True 34 | 35 | chunk = 2048 36 | 37 | backbone2d = cotr 38 | backbone2d_fpn_dim = 192 39 | support_image_selection = retrieval 40 | image_core_set_size = 16 41 | 
image_retrieval_method = netvlad 42 | image_retrieval_method_train = netvlad 43 | image_retrieval_method_test = netvlad 44 | image_retrieval_interval_train = 20 45 | image_retrieval_interval_test = 10 46 | version = redkitchen_scratch 47 | 48 | use_scene_coord_memorization = True -------------------------------------------------------------------------------- /nerf_loc/configs/ablation_study/7scenes_from_scratch/stairs.txt: -------------------------------------------------------------------------------- 1 | expname = all 2 | basedir = /youtu/xlab-team4/jenningsliu/nerf_loc_logs/7scenes 3 | dataset_type = video_7scenes 4 | scenes = [stairs] 5 | 6 | N_samples = 64 7 | N_rand = 1024 8 | max_epochs = 50 9 | 10 | use_depth_supervision = True 11 | 12 | keypoints_3d_sampling = random 13 | keypoints_3d_sampling_response_thresh = 0.003 14 | keypoints_3d_sampling_max_keep = 16384 15 | coarse_matching_depth_thresh = 0.2 16 | 17 | coarse_num_3d_keypoints = 1024 18 | fine_num_3d_keypoints = 1024 19 | 20 | i_weights = 50000 21 | i_testset = 50000 22 | 23 | backbone2d_use_fpn = True 24 | cascade_matching = True 25 | 26 | encode_appearance = True 27 | appearance_emb_dim = 128 28 | 29 | n_views_train = 5 30 | n_views_test = 10 31 | 32 | train_nerf = False 33 | train_pose = True 34 | 35 | chunk = 2048 36 | 37 | backbone2d = cotr 38 | backbone2d_fpn_dim = 192 39 | support_image_selection = retrieval 40 | image_core_set_size = 16 41 | image_retrieval_method = netvlad 42 | image_retrieval_method_train = netvlad 43 | image_retrieval_method_test = netvlad 44 | image_retrieval_interval_train = 20 45 | image_retrieval_interval_test = 10 46 | version = stairs_scratch 47 | 48 | use_scene_coord_memorization = True -------------------------------------------------------------------------------- /nerf_loc/configs/ablation_study/7scenes_ft_no_coord/chess.txt: -------------------------------------------------------------------------------- 1 | expname = all 2 | basedir = /youtu/xlab-team4/jenningsliu/nerf_loc_logs/7scenes 3 | dataset_type = video_7scenes 4 | scenes = [chess] 5 | 6 | N_samples = 64 7 | N_rand = 1024 8 | max_epochs = 30 9 | 10 | use_depth_supervision = True 11 | 12 | keypoints_3d_sampling = random 13 | keypoints_3d_sampling_response_thresh = 0.003 14 | keypoints_3d_sampling_max_keep = 16384 15 | coarse_matching_depth_thresh = 0.2 16 | 17 | coarse_num_3d_keypoints = 1024 18 | fine_num_3d_keypoints = 1024 19 | 20 | i_weights = 50000 21 | i_testset = 50000 22 | 23 | backbone2d_use_fpn = True 24 | cascade_matching = True 25 | 26 | encode_appearance = True 27 | appearance_emb_dim = 128 28 | 29 | n_views_train = 5 30 | n_views_test = 10 31 | 32 | train_nerf = False 33 | train_pose = True 34 | 35 | chunk = 2048 36 | 37 | backbone2d = cotr 38 | backbone2d_fpn_dim = 192 39 | support_image_selection = retrieval 40 | image_core_set_size = 16 41 | image_retrieval_method = netvlad 42 | image_retrieval_method_train = netvlad 43 | image_retrieval_method_test = netvlad 44 | image_retrieval_interval_train = 20 45 | image_retrieval_interval_test = 10 46 | version = chess_ft_no_coord 47 | 48 | use_scene_coord_memorization = False 49 | ckpt = experiments/7scenes/all/multi_scenes_cascade_scratch/checkpoints/epoch29-acc0.8576.ckpt -------------------------------------------------------------------------------- /nerf_loc/configs/ablation_study/7scenes_ft_no_coord/fire.txt: -------------------------------------------------------------------------------- 1 | expname = all 2 | basedir = 
/youtu/xlab-team4/jenningsliu/nerf_loc_logs/7scenes 3 | dataset_type = video_7scenes 4 | scenes = [fire] 5 | 6 | N_samples = 64 7 | N_rand = 1024 8 | max_epochs = 30 9 | 10 | use_depth_supervision = True 11 | 12 | keypoints_3d_sampling = random 13 | keypoints_3d_sampling_response_thresh = 0.003 14 | keypoints_3d_sampling_max_keep = 16384 15 | coarse_matching_depth_thresh = 0.2 16 | 17 | coarse_num_3d_keypoints = 1024 18 | fine_num_3d_keypoints = 1024 19 | 20 | i_weights = 50000 21 | i_testset = 50000 22 | 23 | backbone2d_use_fpn = True 24 | cascade_matching = True 25 | 26 | encode_appearance = True 27 | appearance_emb_dim = 128 28 | 29 | n_views_train = 5 30 | n_views_test = 10 31 | 32 | train_nerf = False 33 | train_pose = True 34 | 35 | chunk = 2048 36 | 37 | backbone2d = cotr 38 | backbone2d_fpn_dim = 192 39 | support_image_selection = retrieval 40 | image_core_set_size = 16 41 | image_retrieval_method = netvlad 42 | image_retrieval_method_train = netvlad 43 | image_retrieval_method_test = netvlad 44 | image_retrieval_interval_train = 20 45 | image_retrieval_interval_test = 10 46 | version = fire_ft_no_coord 47 | 48 | use_scene_coord_memorization = False 49 | ckpt = experiments/7scenes/all/multi_scenes_cascade_scratch/checkpoints/epoch29-acc0.8576.ckpt -------------------------------------------------------------------------------- /nerf_loc/configs/ablation_study/7scenes_ft_no_coord/heads.txt: -------------------------------------------------------------------------------- 1 | expname = all 2 | basedir = /youtu/xlab-team4/jenningsliu/nerf_loc_logs/7scenes 3 | dataset_type = video_7scenes 4 | scenes = [heads] 5 | 6 | N_samples = 64 7 | N_rand = 1024 8 | max_epochs = 30 9 | 10 | use_depth_supervision = True 11 | 12 | keypoints_3d_sampling = random 13 | keypoints_3d_sampling_response_thresh = 0.003 14 | keypoints_3d_sampling_max_keep = 16384 15 | coarse_matching_depth_thresh = 0.2 16 | 17 | coarse_num_3d_keypoints = 1024 18 | fine_num_3d_keypoints = 1024 19 | 20 | i_weights = 50000 21 | i_testset = 50000 22 | 23 | backbone2d_use_fpn = True 24 | cascade_matching = True 25 | 26 | encode_appearance = True 27 | appearance_emb_dim = 128 28 | 29 | n_views_train = 5 30 | n_views_test = 10 31 | 32 | train_nerf = False 33 | train_pose = True 34 | 35 | chunk = 2048 36 | 37 | backbone2d = cotr 38 | backbone2d_fpn_dim = 192 39 | support_image_selection = retrieval 40 | image_core_set_size = 16 41 | image_retrieval_method = netvlad 42 | image_retrieval_method_train = netvlad 43 | image_retrieval_method_test = netvlad 44 | image_retrieval_interval_train = 20 45 | image_retrieval_interval_test = 10 46 | version = heads_ft_no_coord 47 | 48 | use_scene_coord_memorization = False 49 | ckpt = experiments/7scenes/all/multi_scenes_cascade_scratch/checkpoints/epoch29-acc0.8576.ckpt -------------------------------------------------------------------------------- /nerf_loc/configs/ablation_study/7scenes_ft_no_coord/office.txt: -------------------------------------------------------------------------------- 1 | expname = all 2 | basedir = /youtu/xlab-team4/jenningsliu/nerf_loc_logs/7scenes 3 | dataset_type = video_7scenes 4 | scenes = [office] 5 | 6 | N_samples = 64 7 | N_rand = 1024 8 | max_epochs = 30 9 | 10 | use_depth_supervision = True 11 | 12 | keypoints_3d_sampling = random 13 | keypoints_3d_sampling_response_thresh = 0.003 14 | keypoints_3d_sampling_max_keep = 16384 15 | coarse_matching_depth_thresh = 0.2 16 | 17 | coarse_num_3d_keypoints = 1024 18 | fine_num_3d_keypoints = 1024 19 | 20 | i_weights = 
50000 21 | i_testset = 50000 22 | 23 | backbone2d_use_fpn = True 24 | cascade_matching = True 25 | 26 | encode_appearance = True 27 | appearance_emb_dim = 128 28 | 29 | n_views_train = 5 30 | n_views_test = 10 31 | 32 | train_nerf = False 33 | train_pose = True 34 | 35 | chunk = 2048 36 | 37 | backbone2d = cotr 38 | backbone2d_fpn_dim = 192 39 | support_image_selection = retrieval 40 | image_core_set_size = 16 41 | image_retrieval_method = netvlad 42 | image_retrieval_method_train = netvlad 43 | image_retrieval_method_test = netvlad 44 | image_retrieval_interval_train = 20 45 | image_retrieval_interval_test = 10 46 | version = office_ft_no_coord 47 | 48 | use_scene_coord_memorization = False 49 | ckpt = experiments/7scenes/all/multi_scenes_cascade_scratch/checkpoints/epoch29-acc0.8576.ckpt -------------------------------------------------------------------------------- /nerf_loc/configs/ablation_study/7scenes_ft_no_coord/pumpkin.txt: -------------------------------------------------------------------------------- 1 | expname = all 2 | basedir = /youtu/xlab-team4/jenningsliu/nerf_loc_logs/7scenes 3 | dataset_type = video_7scenes 4 | scenes = [pumpkin] 5 | 6 | N_samples = 64 7 | N_rand = 1024 8 | max_epochs = 30 9 | 10 | use_depth_supervision = True 11 | 12 | keypoints_3d_sampling = random 13 | keypoints_3d_sampling_response_thresh = 0.003 14 | keypoints_3d_sampling_max_keep = 16384 15 | coarse_matching_depth_thresh = 0.2 16 | 17 | coarse_num_3d_keypoints = 1024 18 | fine_num_3d_keypoints = 1024 19 | 20 | i_weights = 50000 21 | i_testset = 50000 22 | 23 | backbone2d_use_fpn = True 24 | cascade_matching = True 25 | 26 | encode_appearance = True 27 | appearance_emb_dim = 128 28 | 29 | n_views_train = 5 30 | n_views_test = 10 31 | 32 | train_nerf = False 33 | train_pose = True 34 | 35 | chunk = 2048 36 | 37 | backbone2d = cotr 38 | backbone2d_fpn_dim = 192 39 | support_image_selection = retrieval 40 | image_core_set_size = 16 41 | image_retrieval_method = netvlad 42 | image_retrieval_method_train = netvlad 43 | image_retrieval_method_test = netvlad 44 | image_retrieval_interval_train = 20 45 | image_retrieval_interval_test = 10 46 | version = pumpkin_ft_no_coord 47 | 48 | use_scene_coord_memorization = False 49 | ckpt = experiments/7scenes/all/backup/multi_scenes_cascade_scratch/checkpoints/epoch29-acc0.8576.ckpt -------------------------------------------------------------------------------- /nerf_loc/configs/ablation_study/7scenes_ft_no_coord/redkitchen.txt: -------------------------------------------------------------------------------- 1 | expname = all 2 | basedir = /youtu/xlab-team4/jenningsliu/nerf_loc_logs/7scenes 3 | dataset_type = video_7scenes 4 | scenes = [redkitchen] 5 | 6 | N_samples = 64 7 | N_rand = 1024 8 | max_epochs = 30 9 | 10 | use_depth_supervision = True 11 | 12 | keypoints_3d_sampling = random 13 | keypoints_3d_sampling_response_thresh = 0.003 14 | keypoints_3d_sampling_max_keep = 16384 15 | coarse_matching_depth_thresh = 0.2 16 | 17 | coarse_num_3d_keypoints = 1024 18 | fine_num_3d_keypoints = 1024 19 | 20 | i_weights = 50000 21 | i_testset = 50000 22 | 23 | backbone2d_use_fpn = True 24 | cascade_matching = True 25 | 26 | encode_appearance = True 27 | appearance_emb_dim = 128 28 | 29 | n_views_train = 5 30 | n_views_test = 10 31 | 32 | train_nerf = False 33 | train_pose = True 34 | 35 | chunk = 2048 36 | 37 | backbone2d = cotr 38 | backbone2d_fpn_dim = 192 39 | support_image_selection = retrieval 40 | image_core_set_size = 16 41 | image_retrieval_method = netvlad 42 | 
image_retrieval_method_train = netvlad 43 | image_retrieval_method_test = netvlad 44 | image_retrieval_interval_train = 20 45 | image_retrieval_interval_test = 10 46 | version = redkitchen_ft_no_coord 47 | 48 | use_scene_coord_memorization = False 49 | ckpt = experiments/7scenes/all/multi_scenes_cascade_scratch/checkpoints/epoch29-acc0.8576.ckpt -------------------------------------------------------------------------------- /nerf_loc/configs/ablation_study/7scenes_ft_no_coord/stairs.txt: -------------------------------------------------------------------------------- 1 | expname = all 2 | basedir = /youtu/xlab-team4/jenningsliu/nerf_loc_logs/7scenes 3 | dataset_type = video_7scenes 4 | scenes = [stairs] 5 | 6 | N_samples = 64 7 | N_rand = 1024 8 | max_epochs = 30 9 | 10 | use_depth_supervision = True 11 | 12 | keypoints_3d_sampling = random 13 | keypoints_3d_sampling_response_thresh = 0.003 14 | keypoints_3d_sampling_max_keep = 16384 15 | coarse_matching_depth_thresh = 0.2 16 | 17 | coarse_num_3d_keypoints = 1024 18 | fine_num_3d_keypoints = 1024 19 | 20 | i_weights = 50000 21 | i_testset = 50000 22 | 23 | backbone2d_use_fpn = True 24 | cascade_matching = True 25 | 26 | encode_appearance = True 27 | appearance_emb_dim = 128 28 | 29 | n_views_train = 5 30 | n_views_test = 10 31 | 32 | train_nerf = False 33 | train_pose = True 34 | 35 | chunk = 2048 36 | 37 | backbone2d = cotr 38 | backbone2d_fpn_dim = 192 39 | support_image_selection = retrieval 40 | image_core_set_size = 16 41 | image_retrieval_method = netvlad 42 | image_retrieval_method_train = netvlad 43 | image_retrieval_method_test = netvlad 44 | image_retrieval_interval_train = 20 45 | image_retrieval_interval_test = 10 46 | version = stairs_ft_no_coord 47 | 48 | use_scene_coord_memorization = False 49 | ckpt = experiments/7scenes/all/multi_scenes_cascade_scratch/checkpoints/epoch29-acc0.8576.ckpt -------------------------------------------------------------------------------- /nerf_loc/configs/ablation_study/7scenes_simple.txt: -------------------------------------------------------------------------------- 1 | expname = all 2 | basedir = /youtu/xlab-team4/jenningsliu/nerf_loc_logs/7scenes 3 | dataset_type = video_7scenes 4 | scenes = [chess,fire,heads,office,pumpkin,redkitchen,stairs] 5 | 6 | N_samples = 64 7 | N_rand = 1024 8 | N_iters = 200000 9 | max_epochs = 30 10 | 11 | use_depth_supervision = True 12 | 13 | keypoints_3d_sampling = random 14 | keypoints_3d_sampling_response_thresh = 0.003 15 | keypoints_3d_sampling_max_keep = 16384 16 | coarse_matching_depth_thresh = 0.2 17 | 18 | coarse_num_3d_keypoints = 1024 19 | fine_num_3d_keypoints = 1024 20 | 21 | i_weights = 50000 22 | i_testset = 50000 23 | 24 | backbone2d_use_fpn = True 25 | cascade_matching = False 26 | 27 | encode_appearance = True 28 | appearance_emb_dim = 128 29 | 30 | n_views_train = 5 31 | n_views_test = 10 32 | 33 | train_nerf = False 34 | train_pose = True 35 | 36 | chunk = 2048 37 | 38 | backbone2d = cotr 39 | backbone2d_fpn_dim = 192 40 | support_image_selection = retrieval 41 | image_core_set_size = 16 42 | image_retrieval_method = netvlad 43 | image_retrieval_method_train = netvlad 44 | image_retrieval_method_test = netvlad 45 | image_retrieval_interval_train = 20 46 | image_retrieval_interval_test = 10 47 | 48 | use_scene_coord_memorization = False 49 | 50 | render_loss_weight = 1.0 51 | simple_3d_model = True 52 | version = ablation_study/simple -------------------------------------------------------------------------------- 
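The ablation experiment files above are flat `key = value` lists: `use_scene_coord_memorization = False` marks the `ft_no_coord` runs, `simple_3d_model = True` marks the `simple` runs, and `ckpt` names the scene-agnostic checkpoint each run starts from. The repository's real option parsing lives in `nerf_loc/configs/` and is not reproduced in this listing; the sketch below, using the hypothetical helper `load_txt_config`, only illustrates how such a flat file can be inspected.

```
# Illustrative sketch only: the actual parser in nerf_loc/configs/ is not shown
# here, and load_txt_config is a hypothetical helper for inspecting these files.
import ast

def load_txt_config(path):
    cfg = {}
    with open(path) as f:
        for raw in f:
            line = raw.split('#', 1)[0].strip()      # drop comments and blank lines
            if not line or '=' not in line:
                continue
            key, value = (s.strip() for s in line.split('=', 1))
            try:
                cfg[key] = ast.literal_eval(value)   # numbers and booleans
            except (ValueError, SyntaxError):
                cfg[key] = value                     # bare strings, paths, scene lists
    return cfg

cfg = load_txt_config('nerf_loc/configs/ablation_study/7scenes_simple.txt')
print(cfg['simple_3d_model'], cfg['use_scene_coord_memorization'])  # True False
```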
/nerf_loc/configs/ablation_study/cambridge_simple.txt: -------------------------------------------------------------------------------- 1 | expname = all 2 | basedir = /youtu/xlab-team4/jenningsliu/nerf_loc_logs/cambridge 3 | dataset_type = video_cambridge 4 | scenes = [GreatCourt,KingsCollege,OldHospital,ShopFacade,StMarysChurch] 5 | 6 | N_samples = 64 7 | N_importance = 0 8 | N_rand = 1024 9 | max_epochs = 50 10 | 11 | use_depth_supervision = False 12 | 13 | voxel_size = [0.2, 0.2, 0.2] 14 | keypoints_3d_sampling = random 15 | keypoints_3d_sampling_max_keep = 100000 16 | coarse_matching_depth_thresh = 2 17 | coarse_num_3d_keypoints = 1024 18 | fine_num_3d_keypoints = 1024 19 | 20 | backbone2d_use_fpn = True 21 | 22 | encode_appearance = True 23 | appearance_emb_dim = 128 24 | 25 | n_views_train = 5 26 | n_views_test = 10 27 | 28 | train_nerf = True 29 | train_pose = True 30 | 31 | chunk = 2048 32 | 33 | backbone2d = cotr 34 | backbone2d_fpn_dim = 192 35 | support_image_selection = retrieval 36 | image_core_set_size = 16 37 | image_retrieval_method_train = netvlad 38 | image_retrieval_method_test = netvlad 39 | image_retrieval_interval_train = 1 40 | image_retrieval_interval_test = 1 41 | use_render_uncertainty = True 42 | use_nerf_3d_feature = True 43 | 44 | render_loss_weight = 1.0 45 | ref_depth_loss_weight = 0.1 46 | 47 | use_scene_coord_memorization = False 48 | 49 | render_feature = True 50 | simple_3d_model = True 51 | version = ablation_study/simple -------------------------------------------------------------------------------- /nerf_loc/configs/blender/lego.txt: -------------------------------------------------------------------------------- 1 | expname = lego 2 | basedir = /youtu/xlab-team4/jenningsliu/nerf_loc_logs/blender 3 | datadir = ./data/nerf_synthetic/lego 4 | dataset_type = blender 5 | scene = lego 6 | 7 | no_batching = False 8 | 9 | use_viewdirs = True 10 | render_feat = True 11 | white_bkgd = True 12 | lrate_decay = 500 13 | 14 | N_samples = 64 15 | N_importance = 0 16 | N_rand = 1024 17 | 18 | precrop_iters = 500 19 | precrop_frac = 0.5 20 | 21 | half_res = True 22 | 23 | 24 | use_depth_supervision = False 25 | 26 | encode_appearance = False 27 | appearance_emb_dim = 128 28 | 29 | n_views_train = 5 30 | n_views_test = 5 31 | 32 | train_nerf = True 33 | train_pose = False 34 | 35 | chunk = 2048 36 | 37 | backbone2d = cotr 38 | backbone2d_use_fpn = True 39 | backbone2d_fpn_dim = 192 40 | 41 | refine_support_depth = True 42 | version = finetune 43 | 44 | test_render_interval = 50 45 | ckpt = experiments/generalize/dtu_gso_space_iconic_fg/checkpoints/pretrain.ckpt 46 | -------------------------------------------------------------------------------- /nerf_loc/configs/cambridge/GreatCourt.yaml: -------------------------------------------------------------------------------- 1 | expname: all 2 | basedir: experiments/cambridge 3 | dataset_type: video_cambridge 4 | scenes: [GreatCourt] 5 | 6 | render: 7 | N_samples: 64 8 | N_importance: 0 9 | N_rand: 1024 10 | chunk: 2048 11 | use_render_uncertainty: True 12 | render_feature: True 13 | 14 | max_epochs: 50 15 | 16 | use_depth_supervision: False 17 | 18 | matching: 19 | keypoints_3d_sampling: random 20 | keypoints_3d_sampling_max_keep: 100000 21 | coarse_matching_depth_thresh: 2. 
22 | coarse_num_3d_keypoints: 1024 23 | fine_num_3d_keypoints: 1024 24 | 25 | backbone2d_use_fpn: True 26 | 27 | encode_appearance: True 28 | appearance_emb_dim: 128 29 | 30 | n_views_train: 5 31 | n_views_test: 10 32 | 33 | train_nerf: True 34 | train_pose: True 35 | 36 | backbone2d: cotr 37 | backbone2d_fpn_dim: 192 38 | support_image_selection: retrieval 39 | image_core_set_size: 16 40 | image_retrieval_method_train: netvlad 41 | image_retrieval_method_test: netvlad 42 | image_retrieval_interval_train: 1 43 | image_retrieval_interval_test: 1 44 | 45 | render_loss_weight: 1.0 46 | ref_depth_loss_weight: 0.1 47 | 48 | use_scene_coord_memorization: True 49 | 50 | version: GreatCourt_ft 51 | ckpt: experiments/cambridge/all/multi_scenes/checkpoints/last.ckpt -------------------------------------------------------------------------------- /nerf_loc/configs/cambridge/KingsCollege.yaml: -------------------------------------------------------------------------------- 1 | expname: all 2 | basedir: experiments/cambridge 3 | dataset_type: video_cambridge 4 | scenes: [KingsCollege] 5 | 6 | render: 7 | N_samples: 64 8 | N_importance: 0 9 | N_rand: 1024 10 | chunk: 2048 11 | use_render_uncertainty: True 12 | render_feature: True 13 | 14 | max_epochs: 50 15 | 16 | use_depth_supervision: False 17 | 18 | matching: 19 | keypoints_3d_sampling: random 20 | keypoints_3d_sampling_max_keep: 100000 21 | coarse_matching_depth_thresh: 2. 22 | coarse_num_3d_keypoints: 1024 23 | fine_num_3d_keypoints: 1024 24 | 25 | backbone2d_use_fpn: True 26 | 27 | encode_appearance: True 28 | appearance_emb_dim: 128 29 | 30 | n_views_train: 5 31 | n_views_test: 10 32 | 33 | train_nerf: True 34 | train_pose: True 35 | 36 | backbone2d: cotr 37 | backbone2d_fpn_dim: 192 38 | support_image_selection: retrieval 39 | image_core_set_size: 16 40 | image_retrieval_method_train: netvlad 41 | image_retrieval_method_test: netvlad 42 | image_retrieval_interval_train: 1 43 | image_retrieval_interval_test: 1 44 | 45 | render_loss_weight: 1.0 46 | ref_depth_loss_weight: 0.1 47 | 48 | use_scene_coord_memorization: True 49 | 50 | version: KingsCollege_ft 51 | ckpt: experiments/cambridge/all/multi_scenes/checkpoints/last.ckpt -------------------------------------------------------------------------------- /nerf_loc/configs/cambridge/OldHospital.yaml: -------------------------------------------------------------------------------- 1 | expname: all 2 | basedir: experiments/cambridge 3 | dataset_type: video_cambridge 4 | scenes: [OldHospital] 5 | 6 | render: 7 | N_samples: 64 8 | N_importance: 0 9 | N_rand: 1024 10 | chunk: 2048 11 | use_render_uncertainty: True 12 | render_feature: True 13 | 14 | max_epochs: 50 15 | 16 | use_depth_supervision: False 17 | 18 | matching: 19 | keypoints_3d_sampling: random 20 | keypoints_3d_sampling_max_keep: 100000 21 | coarse_matching_depth_thresh: 2. 
22 | coarse_num_3d_keypoints: 1024 23 | fine_num_3d_keypoints: 1024 24 | 25 | backbone2d_use_fpn: True 26 | 27 | encode_appearance: True 28 | appearance_emb_dim: 128 29 | 30 | n_views_train: 5 31 | n_views_test: 10 32 | 33 | train_nerf: True 34 | train_pose: True 35 | 36 | backbone2d: cotr 37 | backbone2d_fpn_dim: 192 38 | support_image_selection: retrieval 39 | image_core_set_size: 16 40 | image_retrieval_method_train: netvlad 41 | image_retrieval_method_test: netvlad 42 | image_retrieval_interval_train: 1 43 | image_retrieval_interval_test: 1 44 | 45 | render_loss_weight: 1.0 46 | ref_depth_loss_weight: 0.1 47 | 48 | use_scene_coord_memorization: True 49 | 50 | version: OldHospital_ft 51 | ckpt: experiments/cambridge/all/multi_scenes/checkpoints/last.ckpt -------------------------------------------------------------------------------- /nerf_loc/configs/cambridge/ShopFacade.yaml: -------------------------------------------------------------------------------- 1 | expname: all 2 | basedir: experiments/cambridge 3 | dataset_type: video_cambridge 4 | scenes: [ShopFacade] 5 | 6 | render: 7 | N_samples: 64 8 | N_importance: 0 9 | N_rand: 1024 10 | chunk: 2048 11 | use_render_uncertainty: True 12 | render_feature: True 13 | 14 | max_epochs: 50 15 | 16 | use_depth_supervision: False 17 | 18 | matching: 19 | keypoints_3d_sampling: random 20 | keypoints_3d_sampling_max_keep: 100000 21 | coarse_matching_depth_thresh: 2. 22 | coarse_num_3d_keypoints: 1024 23 | fine_num_3d_keypoints: 1024 24 | 25 | backbone2d_use_fpn: True 26 | 27 | encode_appearance: True 28 | appearance_emb_dim: 128 29 | 30 | n_views_train: 5 31 | n_views_test: 10 32 | 33 | train_nerf: True 34 | train_pose: True 35 | 36 | backbone2d: cotr 37 | backbone2d_fpn_dim: 192 38 | support_image_selection: retrieval 39 | image_core_set_size: 16 40 | image_retrieval_method_train: netvlad 41 | image_retrieval_method_test: netvlad 42 | image_retrieval_interval_train: 1 43 | image_retrieval_interval_test: 1 44 | 45 | render_loss_weight: 1.0 46 | ref_depth_loss_weight: 0.1 47 | 48 | use_scene_coord_memorization: True 49 | 50 | version: ShopFacade_ft 51 | ckpt: experiments/cambridge/all/multi_scenes/checkpoints/last.ckpt -------------------------------------------------------------------------------- /nerf_loc/configs/cambridge/StMarysChurch.yaml: -------------------------------------------------------------------------------- 1 | expname: all 2 | basedir: experiments/cambridge 3 | dataset_type: video_cambridge 4 | scenes: [StMarysChurch] 5 | 6 | render: 7 | N_samples: 64 8 | N_importance: 0 9 | N_rand: 1024 10 | chunk: 2048 11 | use_render_uncertainty: True 12 | render_feature: True 13 | 14 | max_epochs: 50 15 | 16 | use_depth_supervision: False 17 | 18 | matching: 19 | keypoints_3d_sampling: random 20 | keypoints_3d_sampling_max_keep: 100000 21 | coarse_matching_depth_thresh: 2. 
22 | coarse_num_3d_keypoints: 1024 23 | fine_num_3d_keypoints: 1024 24 | 25 | backbone2d_use_fpn: True 26 | 27 | encode_appearance: True 28 | appearance_emb_dim: 128 29 | 30 | n_views_train: 5 31 | n_views_test: 10 32 | 33 | train_nerf: True 34 | train_pose: True 35 | 36 | backbone2d: cotr 37 | backbone2d_fpn_dim: 192 38 | support_image_selection: retrieval 39 | image_core_set_size: 16 40 | image_retrieval_method_train: netvlad 41 | image_retrieval_method_test: netvlad 42 | image_retrieval_interval_train: 1 43 | image_retrieval_interval_test: 1 44 | 45 | render_loss_weight: 1.0 46 | ref_depth_loss_weight: 0.1 47 | 48 | use_scene_coord_memorization: True 49 | 50 | version: StMarysChurch_ft 51 | ckpt: experiments/cambridge/all/multi_scenes/checkpoints/last.ckpt -------------------------------------------------------------------------------- /nerf_loc/configs/cambridge_all.yaml: -------------------------------------------------------------------------------- 1 | expname: all 2 | basedir: experiments/cambridge 3 | dataset_type: video_cambridge 4 | scenes: [GreatCourt,KingsCollege,OldHospital,ShopFacade,StMarysChurch] 5 | 6 | render: 7 | N_samples: 64 8 | N_importance: 0 9 | N_rand: 1024 10 | chunk: 2048 11 | use_render_uncertainty: True 12 | render_feature: True 13 | 14 | max_epochs: 50 15 | 16 | use_depth_supervision: False 17 | 18 | matching: 19 | keypoints_3d_sampling: random 20 | keypoints_3d_sampling_max_keep: 100000 21 | coarse_matching_depth_thresh: 2. 22 | coarse_num_3d_keypoints: 1024 23 | fine_num_3d_keypoints: 1024 24 | 25 | backbone2d_use_fpn: True 26 | 27 | encode_appearance: True 28 | appearance_emb_dim: 128 29 | 30 | n_views_train: 5 31 | n_views_test: 10 32 | 33 | train_nerf: True 34 | train_pose: True 35 | 36 | backbone2d: cotr 37 | backbone2d_fpn_dim: 192 38 | support_image_selection: retrieval 39 | image_core_set_size: 16 40 | image_retrieval_method_train: netvlad 41 | image_retrieval_method_test: netvlad 42 | image_retrieval_interval_train: 1 43 | image_retrieval_interval_test: 1 44 | 45 | render_loss_weight: 1.0 46 | ref_depth_loss_weight: 0.1 47 | 48 | use_scene_coord_memorization: False -------------------------------------------------------------------------------- /nerf_loc/configs/data/12scenes.yaml: -------------------------------------------------------------------------------- 1 | DATASET: 2 | type: 12scene 3 | base_dir: data/12scenes 4 | tempo_interval: 1 5 | 6 | # # image 7 | # TRANSFORM: 8 | # - DownSample: 9 | # scale_factor: 4 10 | 11 | # image 12 | TRANSFORM: 13 | - ResizeAndCrop: 14 | target_size: 256 # short side 15 | base_image_size: 32 16 | 17 | AUG_TRANSFORM: 18 | # - RandomZoom: 19 | # aug_scale_min: 0.66666 20 | # aug_scale_max: 1.5 21 | # - RandomRotate: 22 | # aug_rotation: 30 23 | - ColorJitter: 24 | brightness: 0.1 25 | contrast: 0.1 26 | saturation: 0.1 27 | hue: 0.1 -------------------------------------------------------------------------------- /nerf_loc/configs/data/7scenes.yaml: -------------------------------------------------------------------------------- 1 | DATASET: 2 | type: 7scene 3 | base_dir: data/7scenes 4 | tempo_interval: 1 5 | near: 0.3 6 | far: 5.0 7 | 8 | # rescale_far_limit: 3. 
# scale the whole scene to limit far plane 9 | 10 | # image 11 | TRANSFORM: 12 | - ResizeAndCrop: 13 | target_size: 256 # short side 14 | base_image_size: 16 15 | 16 | AUG_TRANSFORM: 17 | - RandomZoom: 18 | aug_scale_min: 0.666 19 | aug_scale_max: 1.5 20 | - RandomRotate: 21 | aug_rotation: 30 22 | - ColorJitter: 23 | brightness: 0.1 24 | contrast: 0.1 25 | saturation: 0.1 26 | hue: 0.1 27 | 28 | # ref depth augmentation 29 | aug_ref_depth: True 30 | aug_depth_range_prob: 0.05 31 | aug_depth_range_min: 0.95 32 | aug_depth_range_max: 1.05 33 | aug_use_depth_offset: True 34 | aug_depth_offset_prob: 0.25 35 | aug_depth_offset_region_min: 0.05 36 | aug_depth_offset_region_max: 0.1 37 | aug_depth_offset_min: 0.5 38 | aug_depth_offset_max: 1.0 39 | aug_depth_offset_local: 0.1 40 | aug_use_depth_small_offset: True 41 | aug_use_global_noise: True 42 | aug_global_noise_prob: 0.5 43 | aug_depth_small_offset_prob: 0.5 -------------------------------------------------------------------------------- /nerf_loc/configs/data/cambridge.yaml: -------------------------------------------------------------------------------- 1 | DATASET: 2 | type: cambridge 3 | base_dir: data/Cambridge 4 | tempo_interval: 1 5 | # rescale_far_limit: 5. # scale the whole scene to limit far plane 6 | scale_factor: 0.05 7 | # scale_factor: 1 8 | 9 | # image 10 | TRANSFORM: 11 | - ResizeAndCrop: 12 | target_size: 256 # short side 13 | base_image_size: 32 14 | 15 | AUG_TRANSFORM: 16 | - RandomZoom: 17 | aug_scale_min: 1.25 18 | aug_scale_max: 0.8 19 | # - RandomRotate: 20 | # aug_rotation: 15 21 | - ColorJitter: 22 | brightness: 0.1 23 | contrast: 0.1 24 | saturation: 0.1 25 | hue: 0.1 -------------------------------------------------------------------------------- /nerf_loc/configs/data/onepose.yaml: -------------------------------------------------------------------------------- 1 | DATASET: 2 | type: onepose 3 | base_dir: data/onepose 4 | tempo_interval: 1 5 | 6 | # image 7 | TRANSFORM: 8 | - ResizeAndCrop: 9 | target_size: 256 # short side 10 | base_image_size: 16 11 | 12 | AUG_TRANSFORM: 13 | - RandomZoom: 14 | aug_scale_min: 1.25 15 | aug_scale_max: 0.8 16 | # - RandomRotate: 17 | # aug_rotation: 15 18 | - ColorJitter: 19 | brightness: 0.1 20 | contrast: 0.1 21 | saturation: 0.1 22 | hue: 0.1 -------------------------------------------------------------------------------- /nerf_loc/configs/dtu.txt: -------------------------------------------------------------------------------- 1 | expname = mvs 2 | basedir = /youtu/xlab-team4/jenningsliu/nerf_loc_logs/dtu 3 | dataset_type = nerf_pretrain 4 | datadir = /youtu/youtu-public/SLAM/DTU/mvs_training/dtu 5 | 6 | no_batching = False 7 | 8 | use_viewdirs = True 9 | learn_descriptor = True 10 | 11 | N_samples = 64 12 | N_rand = 1024 13 | N_iters = 200000 14 | 15 | backbone2d_use_fpn = True 16 | 17 | encode_appearance = True 18 | appearance_emb_dim = 128 19 | 20 | n_views_train = 3 21 | n_views_test = 3 22 | 23 | train_nerf = True 24 | train_pose = False 25 | 26 | chunk = 2048 27 | 28 | backbone2d = cotr 29 | backbone2d_fpn_dim = 192 30 | support_image_selection = retrieval 31 | image_core_set_size = 16 32 | image_retrieval_method = netvlad 33 | use_depth_supervision = False 34 | use_color_volume = False 35 | version = neuray_ibrnet_pointnerf_truncat -------------------------------------------------------------------------------- /nerf_loc/configs/generalize.txt: -------------------------------------------------------------------------------- 1 | expname = generalize 2 | basedir = 
/youtu/xlab-team4/jenningsliu/nerf_loc_logs 3 | dataset_type = nerf_pretrain 4 | scenes = [dtu_train,gso,space,real_iconic] 5 | 6 | no_batching = False 7 | 8 | use_viewdirs = True 9 | 10 | N_samples = 64 11 | N_importance = 0 12 | N_rand = 1024 13 | N_iters = 200000 14 | 15 | backbone2d_use_fpn = True 16 | 17 | encode_appearance = False 18 | appearance_emb_dim = 224 19 | 20 | n_views_train = 3 21 | n_views_test = 3 22 | 23 | train_nerf = True 24 | train_pose = False 25 | 26 | chunk = 2048 27 | 28 | backbone2d = cotr 29 | backbone2d_fpn_dim = 192 30 | support_image_selection = retrieval 31 | image_core_set_size = 16 32 | image_retrieval_method = netvlad 33 | use_depth_supervision = False 34 | refine_support_depth = True 35 | version = hist_global_adapt 36 | 37 | test_render_interval = 1 -------------------------------------------------------------------------------- /nerf_loc/configs/llff/horns.txt: -------------------------------------------------------------------------------- 1 | expname = horns 2 | basedir = /youtu/xlab-team4/jenningsliu/nerf_loc_logs/llff 3 | datadir = ./data/nerf_llff_data/horns 4 | dataset_type = llff 5 | scene = horns 6 | 7 | factor = 8 8 | llffhold = 8 9 | 10 | N_rand = 1024 11 | N_samples = 64 12 | N_importance = 0 13 | 14 | use_viewdirs = True 15 | 16 | use_depth_supervision = False 17 | 18 | encode_appearance = False 19 | appearance_emb_dim = 128 20 | 21 | n_views_train = 5 22 | n_views_test = 5 23 | 24 | train_nerf = True 25 | train_pose = False 26 | 27 | chunk = 2048 28 | 29 | backbone2d = cotr 30 | backbone2d_use_fpn = True 31 | backbone2d_fpn_dim = 192 32 | 33 | refine_support_depth = True 34 | version = finetune 35 | 36 | test_render_interval = 5 37 | ckpt = experiments/generalize/dtu_gso_space_iconic_fg/checkpoints/pretrain.ckpt -------------------------------------------------------------------------------- /nerf_loc/configs/mario.txt: -------------------------------------------------------------------------------- 1 | expname = mario 2 | basedir = /youtu/xlab-team4/jenningsliu/nerf_loc_logs/capture_data 3 | datadir = /youtu/xlab-team4/jenningsliu/localization/capture_data/colmap/dense 4 | dataset_type = colmap 5 | scene = mario 6 | 7 | N_samples = 64 8 | N_rand = 512 9 | max_epochs = 200 10 | 11 | use_depth_supervision = True 12 | 13 | coarse_num_3d_keypoints = 1024 14 | fine_num_3d_keypoints = 1024 15 | 16 | i_weights = 50000 17 | i_testset = 50000 18 | 19 | backbone2d_use_fpn = True 20 | cascade_matching = False 21 | 22 | encode_appearance = True 23 | appearance_emb_dim = 128 24 | 25 | n_views_train = 16 26 | n_views_test = 16 27 | 28 | train_nerf = True 29 | train_pose = True 30 | 31 | chunk = 2048 32 | 33 | backbone2d = cotr 34 | backbone2d_fpn_dim = 192 35 | use_color_volume = False 36 | support_image_selection = coreset 37 | coreset_sampler = FPS 38 | image_core_set_size = 16 39 | image_retrieval_method = netvlad 40 | image_retrieval_method_train = netvlad 41 | image_retrieval_method_test = netvlad 42 | image_retrieval_interval_train = 10 43 | image_retrieval_interval_test = 10 44 | 45 | render_loss_weight = 1.0 46 | ref_depth_loss_weight = 1.0 47 | 48 | render_feature = True 49 | use_scene_coord_memorization = True 50 | version = colmap_sfm 51 | keypoints_3d_source = sfm 52 | 53 | ckpt = experiments/generalize/dtu_gso_space_iconic_fg/checkpoints/pretrain.ckpt -------------------------------------------------------------------------------- /nerf_loc/configs/onepose/0447.yaml: 
-------------------------------------------------------------------------------- 1 | expname: 0447-nabati-box 2 | basedir: experiments/onepose 3 | dataset_type: video_onepose 4 | scenes: [0447-nabati-box] 5 | 6 | render: 7 | N_samples: 64 8 | N_rand: 512 9 | chunk: 2048 10 | render_feature: True 11 | 12 | max_epochs: 30 13 | 14 | use_depth_supervision: False 15 | 16 | matching: 17 | keypoints_3d_sampling: random 18 | keypoints_3d_sampling_max_keep: 8192 19 | coarse_matching_depth_thresh: 0.02 20 | coarse_num_3d_keypoints: 1024 21 | fine_num_3d_keypoints: 1024 22 | 23 | backbone2d_use_fpn: True 24 | 25 | encode_appearance: False 26 | appearance_emb_dim: 128 27 | 28 | n_views_train: 16 29 | n_views_test: 16 30 | 31 | train_nerf: True 32 | train_pose: True 33 | 34 | backbone2d: cotr 35 | backbone2d_fpn_dim: 192 36 | support_image_selection: coreset 37 | coreset_sampler: FPS 38 | image_core_set_size: 16 39 | image_retrieval_method: netvlad 40 | image_retrieval_method_train: netvlad 41 | image_retrieval_method_test: netvlad 42 | image_retrieval_interval_train: 10 43 | image_retrieval_interval_test: 10 44 | 45 | render_loss_weight: 1.0 46 | ref_depth_loss_weight: 1.0 47 | 48 | use_scene_coord_memorization: True 49 | ckpt: experiments/onepose/all/default/checkpoints/last.ckpt -------------------------------------------------------------------------------- /nerf_loc/configs/onepose/0450.yaml: -------------------------------------------------------------------------------- 1 | expname: 0450-hlychocpie-box 2 | basedir: experiments/onepose 3 | dataset_type: video_onepose 4 | scenes: [0450-hlychocpie-box] 5 | 6 | render: 7 | N_samples: 64 8 | N_rand: 512 9 | chunk: 2048 10 | render_feature: True 11 | 12 | max_epochs: 30 13 | 14 | use_depth_supervision: False 15 | 16 | matching: 17 | keypoints_3d_sampling: random 18 | keypoints_3d_sampling_max_keep: 8192 19 | coarse_matching_depth_thresh: 0.02 20 | coarse_num_3d_keypoints: 1024 21 | fine_num_3d_keypoints: 1024 22 | 23 | backbone2d_use_fpn: True 24 | 25 | encode_appearance: False 26 | appearance_emb_dim: 128 27 | 28 | n_views_train: 16 29 | n_views_test: 16 30 | 31 | train_nerf: True 32 | train_pose: True 33 | 34 | backbone2d: cotr 35 | backbone2d_fpn_dim: 192 36 | support_image_selection: coreset 37 | coreset_sampler: FPS 38 | image_core_set_size: 16 39 | image_retrieval_method: netvlad 40 | image_retrieval_method_train: netvlad 41 | image_retrieval_method_test: netvlad 42 | image_retrieval_interval_train: 10 43 | image_retrieval_interval_test: 10 44 | 45 | render_loss_weight: 1.0 46 | ref_depth_loss_weight: 1.0 47 | 48 | use_scene_coord_memorization: True 49 | ckpt: experiments/onepose/all/default/checkpoints/last.ckpt -------------------------------------------------------------------------------- /nerf_loc/configs/onepose/0488.yaml: -------------------------------------------------------------------------------- 1 | expname: 0488-jijiantoothpaste-box 2 | basedir: experiments/onepose 3 | dataset_type: video_onepose 4 | scenes: [0488-jijiantoothpaste-box] 5 | 6 | render: 7 | N_samples: 64 8 | N_rand: 512 9 | chunk: 2048 10 | render_feature: True 11 | 12 | max_epochs: 30 13 | 14 | use_depth_supervision: False 15 | 16 | matching: 17 | keypoints_3d_sampling: random 18 | keypoints_3d_sampling_max_keep: 8192 19 | coarse_matching_depth_thresh: 0.02 20 | coarse_num_3d_keypoints: 1024 21 | fine_num_3d_keypoints: 1024 22 | 23 | backbone2d_use_fpn: True 24 | 25 | encode_appearance: False 26 | appearance_emb_dim: 128 27 | 28 | n_views_train: 16 29 | 
n_views_test: 16 30 | 31 | train_nerf: True 32 | train_pose: True 33 | 34 | backbone2d: cotr 35 | backbone2d_fpn_dim: 192 36 | support_image_selection: coreset 37 | coreset_sampler: FPS 38 | image_core_set_size: 16 39 | image_retrieval_method: netvlad 40 | image_retrieval_method_train: netvlad 41 | image_retrieval_method_test: netvlad 42 | image_retrieval_interval_train: 10 43 | image_retrieval_interval_test: 10 44 | 45 | render_loss_weight: 1.0 46 | ref_depth_loss_weight: 1.0 47 | 48 | use_scene_coord_memorization: True 49 | ckpt: experiments/onepose/all/default/checkpoints/last.ckpt -------------------------------------------------------------------------------- /nerf_loc/configs/onepose/0493.yaml: -------------------------------------------------------------------------------- 1 | expname: 0493-haochidianeggroll-box 2 | basedir: experiments/onepose 3 | dataset_type: video_onepose 4 | scenes: [0493-haochidianeggroll-box] 5 | 6 | render: 7 | N_samples: 64 8 | N_rand: 512 9 | chunk: 2048 10 | render_feature: True 11 | 12 | max_epochs: 30 13 | 14 | use_depth_supervision: False 15 | 16 | matching: 17 | keypoints_3d_sampling: random 18 | keypoints_3d_sampling_max_keep: 8192 19 | coarse_matching_depth_thresh: 0.02 20 | coarse_num_3d_keypoints: 1024 21 | fine_num_3d_keypoints: 1024 22 | 23 | backbone2d_use_fpn: True 24 | 25 | encode_appearance: False 26 | appearance_emb_dim: 128 27 | 28 | n_views_train: 16 29 | n_views_test: 16 30 | 31 | train_nerf: True 32 | train_pose: True 33 | 34 | backbone2d: cotr 35 | backbone2d_fpn_dim: 192 36 | support_image_selection: coreset 37 | coreset_sampler: FPS 38 | image_core_set_size: 16 39 | image_retrieval_method: netvlad 40 | image_retrieval_method_train: netvlad 41 | image_retrieval_method_test: netvlad 42 | image_retrieval_interval_train: 10 43 | image_retrieval_interval_test: 10 44 | 45 | render_loss_weight: 1.0 46 | ref_depth_loss_weight: 1.0 47 | 48 | use_scene_coord_memorization: True 49 | ckpt: experiments/onepose/all/default/checkpoints/last.ckpt -------------------------------------------------------------------------------- /nerf_loc/configs/onepose/0494.yaml: -------------------------------------------------------------------------------- 1 | expname: 0494-qvduoduocookies-box 2 | basedir: experiments/onepose 3 | dataset_type: video_onepose 4 | scenes: [0494-qvduoduocookies-box] 5 | 6 | render: 7 | N_samples: 64 8 | N_rand: 512 9 | chunk: 2048 10 | render_feature: True 11 | 12 | max_epochs: 30 13 | 14 | use_depth_supervision: False 15 | 16 | matching: 17 | keypoints_3d_sampling: random 18 | keypoints_3d_sampling_max_keep: 8192 19 | coarse_matching_depth_thresh: 0.02 20 | coarse_num_3d_keypoints: 1024 21 | fine_num_3d_keypoints: 1024 22 | 23 | backbone2d_use_fpn: True 24 | 25 | encode_appearance: False 26 | appearance_emb_dim: 128 27 | 28 | n_views_train: 16 29 | n_views_test: 16 30 | 31 | train_nerf: True 32 | train_pose: True 33 | 34 | backbone2d: cotr 35 | backbone2d_fpn_dim: 192 36 | support_image_selection: coreset 37 | coreset_sampler: FPS 38 | image_core_set_size: 16 39 | image_retrieval_method: netvlad 40 | image_retrieval_method_train: netvlad 41 | image_retrieval_method_test: netvlad 42 | image_retrieval_interval_train: 10 43 | image_retrieval_interval_test: 10 44 | 45 | render_loss_weight: 1.0 46 | ref_depth_loss_weight: 1.0 47 | 48 | use_scene_coord_memorization: True 49 | ckpt: experiments/onepose/all/default/checkpoints/last.ckpt -------------------------------------------------------------------------------- 
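The per-object OnePose files above (like the Cambridge ones earlier) share one schema: top-level experiment fields, nested `render:` and `matching:` groups, and a `ckpt:` entry naming the checkpoint the run starts from. They are plain YAML, loadable the same way `nerf_loc/datasets/__init__.py` loads the data configs (`yaml.load(..., Loader=yaml.FullLoader)`); how the training entry point merges these fields is outside this listing, so the snippet below is only a structure check.

```
# Structure check only; the training code's own config handling is not shown here.
import yaml

with open('nerf_loc/configs/onepose/0494.yaml') as f:
    cfg = yaml.load(f, Loader=yaml.FullLoader)

print(cfg['scenes'])                                       # ['0494-qvduoduocookies-box']
print(cfg['render']['N_samples'], cfg['render']['chunk'])  # 64 2048
print(cfg['matching']['coarse_num_3d_keypoints'])          # 1024
print(cfg['ckpt'])                                         # checkpoint the run starts from
```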
/nerf_loc/configs/onepose/0594.yaml: -------------------------------------------------------------------------------- 1 | expname: 0594-martinBootsLeft-others 2 | basedir: experiments/onepose 3 | dataset_type: video_onepose 4 | scenes: [0594-martinBootsLeft-others] 5 | 6 | render: 7 | N_samples: 64 8 | N_rand: 512 9 | chunk: 2048 10 | render_feature: True 11 | 12 | max_epochs: 30 13 | 14 | use_depth_supervision: False 15 | 16 | matching: 17 | keypoints_3d_sampling: random 18 | keypoints_3d_sampling_max_keep: 8192 19 | coarse_matching_depth_thresh: 0.02 20 | coarse_num_3d_keypoints: 1024 21 | fine_num_3d_keypoints: 1024 22 | 23 | backbone2d_use_fpn: True 24 | 25 | encode_appearance: False 26 | appearance_emb_dim: 128 27 | 28 | n_views_train: 16 29 | n_views_test: 16 30 | 31 | train_nerf: True 32 | train_pose: True 33 | 34 | backbone2d: cotr 35 | backbone2d_fpn_dim: 192 36 | support_image_selection: coreset 37 | coreset_sampler: FPS 38 | image_core_set_size: 16 39 | image_retrieval_method: netvlad 40 | image_retrieval_method_train: netvlad 41 | image_retrieval_method_test: netvlad 42 | image_retrieval_interval_train: 10 43 | image_retrieval_interval_test: 10 44 | 45 | render_loss_weight: 1.0 46 | ref_depth_loss_weight: 1.0 47 | 48 | use_scene_coord_memorization: True 49 | ckpt: experiments/onepose/all/default/checkpoints/last.ckpt -------------------------------------------------------------------------------- /nerf_loc/configs/onepose_all.yaml: -------------------------------------------------------------------------------- 1 | expname: all 2 | basedir: experiments/onepose 3 | dataset_type: video_onepose 4 | scenes: [0447-nabati-box, 0450-hlychocpie-box, 0488-jijiantoothpaste-box, 0493-haochidianeggroll-box, 0494-qvduoduocookies-box, 0594-martinBootsLeft-others] 5 | 6 | render: 7 | N_samples: 64 8 | N_rand: 512 9 | chunk: 2048 10 | render_feature: True 11 | 12 | max_epochs: 30 13 | 14 | use_depth_supervision: False 15 | 16 | matching: 17 | keypoints_3d_sampling: random 18 | keypoints_3d_sampling_max_keep: 8192 19 | coarse_matching_depth_thresh: 0.02 20 | coarse_num_3d_keypoints: 1024 21 | fine_num_3d_keypoints: 1024 22 | 23 | backbone2d_use_fpn: True 24 | 25 | encode_appearance: False 26 | appearance_emb_dim: 128 27 | 28 | n_views_train: 16 29 | n_views_test: 16 30 | 31 | train_nerf: True 32 | train_pose: True 33 | 34 | backbone2d: cotr 35 | backbone2d_fpn_dim: 192 36 | support_image_selection: coreset 37 | coreset_sampler: FPS 38 | image_core_set_size: 16 39 | image_retrieval_method: netvlad 40 | image_retrieval_method_train: netvlad 41 | image_retrieval_method_test: netvlad 42 | image_retrieval_interval_train: 10 43 | image_retrieval_interval_test: 10 44 | 45 | render_loss_weight: 1.0 46 | ref_depth_loss_weight: 1.0 -------------------------------------------------------------------------------- /nerf_loc/datasets/__init__.py: -------------------------------------------------------------------------------- 1 | """ 2 | Author: jenningsliu 3 | Date: 2022-02-28 19:38:14 4 | LastEditors: jenningsliu 5 | LastEditTime: 2022-08-23 17:13:51 6 | FilePath: /nerf-loc/datasets/__init__.py 7 | Description: 8 | Copyright (c) 2022 by Tencent, All Rights Reserved. 
9 | """ 10 | import yaml 11 | from nerf_loc.utils.common import AttrDict 12 | import os 13 | import copy 14 | 15 | def build_dataset(args, split, phase='pose'): 16 | if args.dataset_type == 'blender': 17 | from .neuray_base_dataset import NeurayBaseDataset 18 | dataset = NeurayBaseDataset(args, split, f'nerf_synthetic/{args.scene}/white_400') 19 | elif args.dataset_type == 'llff': 20 | from .neuray_base_dataset import NeurayBaseDataset 21 | dataset = NeurayBaseDataset(args, split, f'llff_colmap/{args.scene}/low') 22 | elif args.dataset_type == 'colmap': 23 | from .colmap_dataset import ColmapDataset 24 | dataset = ColmapDataset(args, args.datadir, split, depth_type='colmap') 25 | elif args.dataset_type.startswith('video_'): 26 | from .video.dataset import VideoDataset 27 | from .video.multi_scene_dataset import MultiSceneDataset 28 | cfg_name = args.dataset_type.split('_')[1] 29 | cfg_file = f'nerf_loc/configs/data/{cfg_name}.yaml' 30 | cfg = yaml.load(open(cfg_file), Loader=yaml.FullLoader) 31 | if phase == 'nerf': 32 | # no geometry augmentation during nerf training 33 | cfg['DATASET']['AUG_TRANSFORM'] = [] 34 | if phase == 'nerf': 35 | # single scene 36 | cfg['DATASET']['scene'] = args.scene 37 | cfg = AttrDict(cfg) 38 | dataset = VideoDataset(args, cfg.DATASET, split) 39 | else: 40 | datasets = [] 41 | # multiple scene 42 | for scene in args.scenes: 43 | scene_cfg = copy.deepcopy(cfg) 44 | scene_cfg['DATASET']['scene'] = scene 45 | scene_cfg = AttrDict(scene_cfg) 46 | ds = VideoDataset(args, scene_cfg.DATASET, split) 47 | datasets.append(ds) 48 | dataset = MultiSceneDataset(datasets) 49 | else: 50 | raise NotImplementedError 51 | return dataset 52 | -------------------------------------------------------------------------------- /nerf_loc/datasets/colmap/cli.py: -------------------------------------------------------------------------------- 1 | """ 2 | Author: jenningsliu 3 | Date: 2022-04-27 10:02:03 4 | LastEditors: jenningsliu 5 | LastEditTime: 2022-04-27 10:39:51 6 | FilePath: /nerf-loc/datasets/colmap/cli.py 7 | Description: 8 | Copyright (c) 2022 by Tencent, All Rights Reserved. 9 | """ 10 | import sys 11 | import subprocess 12 | 13 | def run_colmap_mvs(sparse_path, image_path, dense_path): 14 | cmd = [ 15 | 'colmap image_undistorter', 16 | f'--image_path {image_path}', 17 | f'--input_path {sparse_path}', 18 | f'--output_path {dense_path}', 19 | f'--output_type COLMAP', 20 | f'--max_image_size 2000' 21 | ] 22 | out = subprocess.run(' '.join(cmd), capture_output=False, shell=True, check=False) 23 | if out.returncode != 0: 24 | print('Run colmap image_undistorter failed!') 25 | sys.exit(1) 26 | 27 | cmd = [ 28 | 'colmap patch_match_stereo', 29 | f'--workspace_path {dense_path}', 30 | f'--workspace_format COLMAP', 31 | f'--PatchMatchStereo.geom_consistency true' 32 | ] 33 | out = subprocess.run(' '.join(cmd), capture_output=False, shell=True, check=False) 34 | if out.returncode != 0: 35 | print('Run colmap patch_match_stereo failed!') 36 | sys.exit(1) 37 | -------------------------------------------------------------------------------- /nerf_loc/datasets/neuray_base_dataset.py: -------------------------------------------------------------------------------- 1 | """ 2 | Author: jenningsliu 3 | Date: 2022-03-29 22:36:11 4 | LastEditors: jenningsliu 5 | LastEditTime: 2022-08-22 13:04:44 6 | FilePath: /nerf-loc/datasets/llff.py 7 | Description: 8 | Copyright (c) 2022 by Tencent, All Rights Reserved. 
9 | """ 10 | import torch 11 | from torch.utils.data import Dataset 12 | import torchvision 13 | import numpy as np 14 | import random 15 | 16 | import sys 17 | sys.path.append('third_party/NeuRay') 18 | from third_party.NeuRay.dataset.train_dataset import * 19 | 20 | class NeurayBaseDataset(Dataset): 21 | def __init__(self, args, split, database_name): 22 | super().__init__() 23 | self.args = args 24 | self.database = parse_database_name(database_name) 25 | self.ref_ids, test_que_ids = get_database_split(self.database, 'test') 26 | self.split = split 27 | if split == 'train': 28 | self.que_ids = self.ref_ids 29 | else: 30 | self.que_ids = test_que_ids 31 | self.scale_factor = 1 32 | 33 | def set_mode(self, mode): 34 | pass 35 | 36 | def prepare_pose(self, poses): 37 | hom_poses = np.tile(np.eye(4, dtype=np.float32)[None], [len(poses),1,1]).copy() 38 | hom_poses[:,:3] = poses 39 | return np.linalg.inv(hom_poses) 40 | 41 | def select_working_views(self, database, que_id, ref_ids): 42 | ref_ids = [i for i in ref_ids if i != que_id] 43 | database_name = database.database_name 44 | dist_idx = compute_nearest_camera_indices(database, [que_id], ref_ids)[0] 45 | dist_idx = dist_idx[:10] 46 | ref_ids = np.array(ref_ids)[dist_idx] 47 | return ref_ids 48 | 49 | def __len__(self): 50 | # return len(self.images) 51 | return len(self.que_ids) 52 | 53 | def __getitem__(self, idx): 54 | que_id = self.que_ids[idx] 55 | ref_ids = self.select_working_views(self.database, que_id, self.ref_ids) 56 | ref_imgs_info = build_imgs_info(self.database, ref_ids, -1, True) 57 | que_imgs_info = build_imgs_info(self.database, [que_id], has_depth=self.split=='train') 58 | ref_imgs_info = pad_imgs_info(ref_imgs_info, 16) 59 | near, far = que_imgs_info['depth_range'][0] 60 | ret = { 61 | 'scene': self.args.scene, 62 | 'filename': que_imgs_info['img_ids'][0]+'.png', 63 | 'image': que_imgs_info['imgs'][0], 64 | 'pose': self.prepare_pose(que_imgs_info['poses'])[0], 65 | 'K': que_imgs_info['Ks'][0], 66 | 'near': near, 67 | 'far': far, 68 | 'topk_images': ref_imgs_info['imgs'], 69 | 'topk_poses': self.prepare_pose(ref_imgs_info['poses']), 70 | 'topk_Ks': ref_imgs_info['Ks'], 71 | 'topk_depths': ref_imgs_info['depth'].squeeze(1), 72 | 'points3d': np.zeros([8,3],dtype=np.float32), 73 | 'scale_factor': self.scale_factor 74 | } 75 | if 'depth' in que_imgs_info: 76 | ret['depth'] = que_imgs_info['depth'].squeeze(1)[0] 77 | else: 78 | ret['depth'] = np.zeros_like(ret['image'][0]) 79 | if 'true_depth' in ref_imgs_info: 80 | ret['topk_depths_gt'] = ref_imgs_info['true_depth'].squeeze(1) 81 | return ret 82 | -------------------------------------------------------------------------------- /nerf_loc/datasets/video/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/JenningsL/nerf-loc/1d539c5a4824a46d26414f3c2b41bb1b1f6dd91e/nerf_loc/datasets/video/__init__.py -------------------------------------------------------------------------------- /nerf_loc/datasets/video/covisibility_sampler.py: -------------------------------------------------------------------------------- 1 | """ 2 | Author: jenningsliu 3 | Date: 2022-06-24 16:57:04 4 | LastEditors: jenningsliu 5 | LastEditTime: 2022-07-07 13:41:49 6 | FilePath: /nerf-loc/datasets/video/covisibility_sampler.py 7 | Description: 8 | Copyright (c) 2022 by Tencent, All Rights Reserved. 
9 | """ 10 | import torch 11 | import numpy as np 12 | import cv2 13 | import os 14 | from collections import defaultdict 15 | 16 | # from models.ops.pointnet2.pointnet2_batch import pointnet2_utils 17 | from .furthest_pose_sampler import get_next_FPS_sample 18 | 19 | class CovisibilitySampling(object): 20 | def __init__(self, dataset, max_num_pts=8192): 21 | self.dataset = dataset 22 | if len(pts3d) < 8192: 23 | # pts3d = torch.tensor(dataset.pc.vertices).float().cuda() 24 | # rand_idx = pointnet2_utils.furthest_point_sample(pts3d.view(1,-1,3), 8192)[0].long() 25 | # self.pc = pts3d[rand_idx].cpu().numpy() 26 | pts3d = dataset.pc.vertices 27 | rand_idx = np.random.choice(len(pts3d), 8192, replace=False) 28 | self.pc = pts3d[rand_idx] 29 | self.ref_poses = dataset.ref_poses 30 | self.ref_Ks = dataset.ref_intrinsics 31 | 32 | self.points_to_images = {} 33 | self.images_to_points = {} 34 | 35 | self.build_visibility() 36 | 37 | def get_visible_points(self, T, K): 38 | xyz_cam = T[:3,:3] @ self.pc.T + T[:3,3:4] 39 | uvz = K @ xyz_cam 40 | u,v,z = uvz[0], uvz[1], uvz[2] 41 | u = u / (z+1e-8) 42 | v = v / (z+1e-8) 43 | w = int(K[0,2] * 2) 44 | h = int(K[1,2] * 2) 45 | mask = (z > 0) & (u > 0) & (u < w) & (v > 0) & (v < h) 46 | return np.nonzero(mask)[0] 47 | 48 | def build_visibility(self): 49 | for name, Tcw in self.ref_poses.items(): 50 | cam_params = self.ref_Ks[name] 51 | K = np.eye(3) 52 | K[0,0] = cam_params[0] 53 | K[1,1] = cam_params[1] 54 | K[0,2] = cam_params[2] 55 | K[1,2] = cam_params[3] 56 | vis_idx = self.get_visible_points(Tcw, K) 57 | self.images_to_points[name] = vis_idx 58 | 59 | def find_visible_images(self, pts_idx): 60 | pass 61 | 62 | def find_visible_points(self, imgs_idx): 63 | pass 64 | 65 | def sample(self, max_k, target_idx=None): 66 | # print(f'build core image set for scene: {self.dataset.scene}') 67 | if target_idx is None: 68 | target_idx = set(np.arange(len(self.pc))) 69 | samples = {} 70 | candidates = {name:self.ref_poses[name] for name in self.images_to_points.keys()} 71 | for k in range(max_k): 72 | if len(target_idx) > 0: 73 | best = None 74 | best_overlap = None 75 | for ref_name in candidates: 76 | vis_idx = self.images_to_points[ref_name] 77 | intersection = target_idx.intersection(set(vis_idx)) 78 | if best is None or len(intersection) > len(best_overlap): 79 | best = ref_name 80 | best_overlap = intersection 81 | target_idx = target_idx - best_overlap 82 | samples[best] = self.ref_poses[best] 83 | candidates.pop(best) 84 | else: 85 | # if all points are obversed, pick furthest pose 86 | next_idx, next_name, next_pose = get_next_FPS_sample(candidates, samples) 87 | samples[next_name] = next_pose 88 | candidates.pop(next_name) 89 | # print(len(target_idx)) 90 | # print(f'Final core image set size of {self.dataset.scene} is {len(samples)}') 91 | return list(samples.keys()) 92 | 93 | def find_covisible_images(self, T, K, max_k): 94 | target_idx = set(self.get_visible_points(T, K)) 95 | 96 | samples = {} 97 | candidates = {name:self.ref_poses[name] for name in self.images_to_points.keys()} 98 | for k in range(max_k): 99 | if len(target_idx) == 0: 100 | break 101 | best = None 102 | best_overlap = None 103 | for ref_name in candidates: 104 | vis_idx = self.images_to_points[ref_name] 105 | intersection = target_idx.intersection(set(vis_idx)) 106 | if best is None or len(intersection) > len(best_overlap): 107 | best = ref_name 108 | best_overlap = intersection 109 | target_idx = target_idx - best_overlap 110 | samples[best] = self.ref_poses[best] 111 | 
candidates.pop(best) 112 | return list(samples.keys()) 113 | 114 | if __name__ == '__main__': 115 | from nerf_loc.configs import config_parser 116 | from nerf_loc.datasets import build_dataset 117 | from nerf_loc.utils.common import colorize_np 118 | parser = config_parser() 119 | args = parser.parse_args() 120 | 121 | multi_trainset = build_dataset(args, 'train') 122 | # multi_testset = build_dataset(args, 'test') 123 | 124 | trainset = multi_trainset.datasets[0] 125 | # testset = multi_testset.datasets[0] 126 | cov = CovisibilitySampling(trainset) 127 | 128 | # target_idx = np.random.choice(len(testset), 1)[0] 129 | # print(target_idx) 130 | # target_frame = testset[target_idx] 131 | # target_name = target_frame['filename'] 132 | # target_pose = target_frame['pose'] 133 | # target_K = target_frame['K'] 134 | # res = cov.find_covisible_images(target_pose, target_K, max_k=5) 135 | # images = [cv2.imread(os.path.join(testset.root_dir, target_name))] 136 | 137 | res = cov.sample(max_k=args.image_core_set_size) 138 | print('train: ', res) 139 | # cov1 = CovisibilitySampling(testset) 140 | # res1 = cov1.sample(max_k=args.image_core_set_size) 141 | # print('test: ', res1) 142 | images = [] 143 | depths = [] 144 | 145 | # vis 146 | for p in res: 147 | idx = trainset.ref_image_idx[p] 148 | meta = trainset.train_meta_info_list[idx] 149 | filename = os.path.join(trainset.root_dir, p) 150 | images.append(cv2.imread(filename)) 151 | depth_filename = meta['depth_file_name'] 152 | depths.append(cv2.imread(os.path.join(trainset.root_dir, depth_filename), cv2.IMREAD_ANYDEPTH) / 1000) 153 | color = np.concatenate(images, axis=1) 154 | depth = np.concatenate(depths, axis=1) 155 | depth = (colorize_np(depth, 'jet', None, range=None, append_cbar=False, cbar_in_image=False) * 255).astype(np.uint8) 156 | vis = np.concatenate([color, depth], axis=0) 157 | vis = cv2.resize(vis, fx=0.25, fy=0.25, dsize=None) 158 | cv2.imwrite('vis_coreset.png', vis) 159 | -------------------------------------------------------------------------------- /nerf_loc/datasets/video/furthest_pose_sampler.py: -------------------------------------------------------------------------------- 1 | """ 2 | Author: jenningsliu 3 | Date: 2022-06-28 18:08:20 4 | LastEditors: jenningsliu 5 | LastEditTime: 2022-07-28 19:45:24 6 | FilePath: /nerf-loc/datasets/video/furthest_pose_sampler.py 7 | Description: 8 | Copyright (c) 2022 by Tencent, All Rights Reserved. 
9 | """ 10 | import torch 11 | import numpy as np 12 | import cv2 13 | import os 14 | from collections import defaultdict 15 | import copy 16 | 17 | from nerf_loc.utils.common import batched_angular_dist_rot_matrix 18 | 19 | def get_next_FPS_sample(candidates, samples): 20 | candidates_name = list(candidates.keys()) 21 | candidates_pose = np.array([candidates[n] for n in candidates_name]) # N,4,4 22 | sampled_name = list(samples.keys()) 23 | sampled_pose = np.array([samples[n] for n in sampled_name]) # M,4,4 24 | 25 | N, M = len(candidates_pose), len(sampled_pose) 26 | 27 | candidates_R = np.tile(candidates_pose[:,None,:3,:3], [1,M,1,1]).reshape(-1,3,3) # N*M,3,3 28 | sampled_R = np.tile(sampled_pose[None,:,:3,:3], [N,1,1,1]).reshape(-1,3,3) # N*M,3,3 29 | angular_dists = batched_angular_dist_rot_matrix(candidates_R, sampled_R).reshape(N,M) 30 | 31 | max_min_idx = angular_dists.min(axis=1).argmax(axis=0) # N -> 1 32 | return max_min_idx, candidates_name[max_min_idx], candidates_pose[max_min_idx] 33 | 34 | class FurtherPoseSampling(object): 35 | def __init__(self, dataset): 36 | self.dataset = dataset 37 | 38 | self.ref_poses = dataset.ref_poses 39 | self.filter_ref_poses_without_depth() # only sample reference frame with depth 40 | 41 | def filter_ref_poses_without_depth(self): 42 | meta_info = {d['file_name']:d for d in self.dataset.train_meta_info_list} 43 | def is_depth_exist(meta): 44 | return os.path.exists(os.path.join(meta['base_dir'], meta["depth_file_name"])) 45 | self.ref_poses = {name:pose for name,pose in self.ref_poses.items() if is_depth_exist(meta_info[name])} 46 | 47 | def sample(self, max_k): 48 | return FurtherPoseSampling.sample_FPS(self.ref_poses, max_k) 49 | 50 | @staticmethod 51 | def sample_FPS(ref_poses, max_k): 52 | samples = {} 53 | # np.random.seed(666) 54 | init_idx = np.random.choice(len(ref_poses), 1, replace=False)[0] 55 | init_name = list(ref_poses.keys())[init_idx] 56 | samples[init_name] = ref_poses[init_name] 57 | candidates = copy.deepcopy(ref_poses) 58 | candidates.pop(init_name) 59 | for k in range(1, max_k): 60 | next_idx, next_name, next_pose = get_next_FPS_sample(candidates, samples) 61 | samples[next_name] = next_pose 62 | 63 | return list(samples.keys()) 64 | 65 | if __name__ == '__main__': 66 | from nerf_loc.configs import config_parser 67 | from nerf_loc.datasets import build_dataset 68 | from nerf_loc.utils.common import colorize_np 69 | import trimesh 70 | parser = config_parser() 71 | args = parser.parse_args() 72 | 73 | multi_trainset = build_dataset(args, 'train') 74 | # multi_testset = build_dataset(args, 'test') 75 | 76 | trainset = multi_trainset.datasets[0] 77 | # testset = multi_testset.datasets[0] 78 | sampler = FurtherPoseSampling(trainset) 79 | 80 | res = sampler.sample(max_k=args.image_core_set_size) 81 | print('train: ', res) 82 | # cov1 = CovisibilitySampling(testset) 83 | # res1 = cov1.sample(max_k=args.image_core_set_size) 84 | # print('test: ', res1) 85 | images = [] 86 | depths = [] 87 | points = [] 88 | point_colors = [] 89 | # vis 90 | for p in res: 91 | idx = trainset.ref_image_idx[p] 92 | data = trainset[idx] 93 | image = (data['image'].transpose(1,2,0)*255).astype(np.uint8) 94 | depth = data['depth'] 95 | K = torch.tensor(data['K']) 96 | c2w = torch.tensor(data['pose']) 97 | 98 | images.append(image) 99 | depths.append(depth) 100 | # print(idx, K) 101 | depth_tensor = torch.tensor(depth) 102 | v, u = torch.nonzero((depth_tensor > trainset.near) & (depth_tensor < trainset.far), as_tuple=True) 103 | z = depth_tensor[v,u] 
104 | uv_hom = torch.stack([u,v,torch.ones_like(u)], dim=0).float() # 3,N 105 | pts3d_cam = torch.matmul(K.inverse(), uv_hom) * z 106 | pts3d_cam_hom = torch.cat([pts3d_cam, torch.ones_like(pts3d_cam[:1])]) 107 | pts3d_world = torch.matmul(c2w[:3,:3], pts3d_cam) + c2w[:3,3:] 108 | points.append(pts3d_world.T) 109 | point_colors.append(torch.tensor(image)[v,u]) 110 | color = cv2.cvtColor(np.concatenate(images, axis=1), cv2.COLOR_BGR2RGB) 111 | depth = np.concatenate(depths, axis=1) 112 | depth = (colorize_np( 113 | depth, 'jet', None, range=[trainset.near, trainset.far], append_cbar=False, cbar_in_image=False 114 | ) * 255).astype(np.uint8) 115 | vis = np.concatenate([color, depth], axis=0) 116 | # vis = cv2.resize(vis, fx=0.25, fy=0.25, dsize=None) 117 | cv2.imwrite('vis_coreset.png', vis) 118 | 119 | points = torch.cat(points).numpy() 120 | point_colors = torch.cat(point_colors).numpy() 121 | cloud = trimesh.PointCloud( 122 | vertices=points, 123 | colors=point_colors) 124 | cloud.export(f'coreset_pc.ply') 125 | -------------------------------------------------------------------------------- /nerf_loc/datasets/video/image.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | import cv2 3 | 4 | 5 | def crop_from_center(img, new_h, new_w): 6 | """ 7 | Crop the image with respect to the center 8 | :param img: image to be cropped 9 | :param new_h: cropped dimension on height 10 | :param new_w: cropped dimension on width 11 | :return: cropped image 12 | """ 13 | 14 | h = img.shape[0] 15 | w = img.shape[1] 16 | x_c = w / 2 17 | y_c = h / 2 18 | 19 | crop_img = None 20 | if h >= new_h and w >= new_w: 21 | start_x = int(x_c - new_w / 2) 22 | start_y = int(y_c - new_h / 2) 23 | 24 | if len(img.shape) > 2: 25 | crop_img = img[ 26 | start_y : start_y + int(new_h), start_x : start_x + int(new_w), : 27 | ] 28 | elif len(img.shape) == 2: 29 | crop_img = img[ 30 | start_y : start_y + int(new_h), start_x : start_x + int(new_w) 31 | ] 32 | 33 | return crop_img 34 | 35 | 36 | def fov(fx, fy, h, w): 37 | """ 38 | Camera fov on x and y dimension 39 | :param fx: focal length on x axis 40 | :param fy: focal length on y axis 41 | :param h: frame height 42 | :param w: frame width 43 | :return: fov_x, fov_y 44 | """ 45 | return ( 46 | np.rad2deg(2 * np.arctan(w / (2 * fx))), 47 | np.rad2deg(2 * np.arctan(h / (2 * fy))), 48 | ) 49 | 50 | 51 | def crop_by_intrinsic(img, cur_k, new_k, interp_method="bilinear"): 52 | """ 53 | Crop the image with new intrinsic parameters 54 | :param img: image to be cropped 55 | :param cur_k: current intrinsic parameters, 3x3 matrix 56 | :param new_k: crop target intrinsic parameters, 3x3 matrix 57 | :return: cropped image 58 | """ 59 | cur_fov_x, cur_fov_y = fov( 60 | cur_k[0, 0], cur_k[1, 1], 2 * cur_k[1, 2], 2 * cur_k[0, 2] 61 | ) 62 | new_fov_x, new_fov_y = fov( 63 | new_k[0, 0], new_k[1, 1], 2 * new_k[1, 2], 2 * new_k[0, 2] 64 | ) 65 | crop_img = None 66 | if cur_fov_x >= new_fov_x and cur_fov_y >= new_fov_y: 67 | # Only allow to crop to a smaller fov image 68 | # 1. 
Resize image 69 | focal_ratio = new_k[0, 0] / cur_k[0, 0] 70 | if interp_method == "nearest": 71 | crop_img = cv2.resize( 72 | img, 73 | (int(focal_ratio * img.shape[1]), int(focal_ratio * img.shape[0])), 74 | interpolation=cv2.INTER_NEAREST, 75 | ) 76 | else: 77 | crop_img = cv2.resize( 78 | img, (int(focal_ratio * img.shape[1]), int(focal_ratio * img.shape[0])) 79 | ) 80 | 81 | # Crop the image with new w/h ratio with respect to the center 82 | crop_img = crop_from_center(crop_img, 2 * new_k[1, 2], 2 * new_k[0, 2]) 83 | else: 84 | raise Exception("The new camera FOV is larger then the current.") 85 | 86 | return crop_img 87 | -------------------------------------------------------------------------------- /nerf_loc/datasets/video/multi_scene_dataset.py: -------------------------------------------------------------------------------- 1 | """ 2 | Author: jenningsliu 3 | Date: 2022-05-25 13:44:53 4 | LastEditors: jenningsliu 5 | LastEditTime: 2022-07-06 16:48:29 6 | FilePath: /nerf-loc/datasets/DSM/multi_scene_dataset.py 7 | Description: 8 | Copyright (c) 2022 by Tencent, All Rights Reserved. 9 | """ 10 | import torch 11 | import torch.nn as nn 12 | from torch.utils.data import ConcatDataset 13 | import numpy as np 14 | 15 | from .dataset import VideoDataset 16 | 17 | 18 | class MultiSceneDataset(ConcatDataset): 19 | def __init__( 20 | self, datasets 21 | ): 22 | super().__init__(datasets) 23 | self.datasets = datasets 24 | # TODO: scale_factor is different across scene 25 | scale_factor = np.inf 26 | for ds in datasets: 27 | scale_factor = min(scale_factor, ds.scale_factor) 28 | self.scale_factor = scale_factor 29 | 30 | def set_mode(self, mode): 31 | for ds in self.datasets: 32 | ds.set_mode(mode) 33 | 34 | def get_pc_range(self): 35 | pc_range = np.zeros(6) 36 | for ds in self.datasets: 37 | pc_range[:3] = np.minimum(pc_range[:3], ds.pc_range[:3]) 38 | pc_range[3:6] = np.maximum(pc_range[3:6], ds.pc_range[3:6]) 39 | return pc_range -------------------------------------------------------------------------------- /nerf_loc/datasets/video/preprocess_12scenes.py: -------------------------------------------------------------------------------- 1 | """ 2 | Author: jenningsliu 3 | Date: 2022-03-10 17:46:34 4 | LastEditors: jenningsliu 5 | LastEditTime: 2022-03-29 19:57:41 6 | FilePath: /nerf-loc/datasets/video/preprocess_12scenes.py 7 | Description: 8 | Copyright (c) 2022 by Tencent, All Rights Reserved. 
9 | """ 10 | import sys 11 | import glob 12 | import pickle as pkl 13 | import os 14 | from os import path as osp 15 | import numpy as np 16 | import trimesh 17 | import cv2 18 | from tqdm import tqdm 19 | 20 | def load_pose(pose_txt): 21 | pose = [] 22 | with open(pose_txt, 'r') as f: 23 | for line in f: 24 | row = line.strip('\n').split() 25 | row = [float(c) for c in row] 26 | pose.append(row) 27 | pose = np.array(pose).astype(np.float32) 28 | assert pose.shape == (4,4) 29 | return pose 30 | 31 | def build_meta_infos(data_folder, place, scene, images, poses, color_focal, color_width, color_height): 32 | meta_infos = [] 33 | for image, pose in zip(images, poses): 34 | # some image have invalid pose files, skip those 35 | valid = True 36 | with open(ds + '/' + scene + '/data/' + pose, 'r') as f: 37 | pose_file = f.readlines() 38 | for line in pose_file: 39 | if 'INF' in line: 40 | valid = False 41 | if not valid: 42 | continue 43 | 44 | Twc = load_pose(os.path.join(data_folder, pose)) 45 | depth = cv2.imread( 46 | os.path.join(data_root, place, scene, 'data', image).replace('color.jpg', 'depth.png'), 47 | cv2.IMREAD_ANYDEPTH) 48 | depth = depth.astype(np.float32)/1000 49 | depth = depth.reshape(-1) 50 | near = np.percentile(depth, 0.1) 51 | far = np.percentile(depth, 99.9) 52 | # link_frame(i, 'test') 53 | meta_infos.append({ 54 | 'file_name': os.path.join(place, scene, 'data', image), 55 | 'frame_id': int(image.split('.')[0].split('-')[1]), 56 | 'sequence_id': '0', 57 | 'depth_file_name': os.path.join(place, scene, 'data', image).replace('color.jpg', 'depth.png'), 58 | 'extrinsic_Tcw': np.linalg.inv(Twc)[:3], 59 | 'camera_intrinsic': np.array([ 60 | color_focal, color_focal, color_width/2, color_height/2, 0., 0. 61 | ], dtype=np.float32), 62 | 'frame_dim': (color_height, color_width), 63 | 'near': near, 64 | 'far': far 65 | }) 66 | return meta_infos 67 | 68 | if __name__ == '__main__': 69 | data_root = sys.argv[1] 70 | for place in ['apt1', 'apt2', 'office1', 'office2']: 71 | # for place in ['apt1']: 72 | ds = os.path.join(data_root, place) 73 | scenes = os.listdir(ds) 74 | 75 | for scene in scenes: 76 | 77 | data_folder = ds + '/' + scene + '/data/' 78 | 79 | if not os.path.isdir(data_folder): 80 | # skip README files 81 | continue 82 | 83 | print(f"Processing files for {ds}/{scene}") 84 | 85 | # read the train / test split - the first sequence is used for testing, everything else for training 86 | with open(ds + '/' + scene + '/split.txt', 'r') as f: 87 | split = f.readlines() 88 | split = int(split[0].split()[1][8:-1]) 89 | 90 | # read the calibration parameters, we use only the color_focal 91 | with open(ds + '/' + scene + '/info.txt', 'r') as f: 92 | info_lines = f.readlines() 93 | color_focal = info_lines[7].split() 94 | color_focal = (float(color_focal[2]) + float(color_focal[7])) / 2 95 | color_width = int(info_lines[2].split()[-1]) 96 | color_height = int(info_lines[3].split()[-1]) 97 | 98 | files = os.listdir(data_folder) 99 | 100 | images = [f for f in files if f.endswith('color.jpg')] 101 | images.sort() 102 | 103 | poses = [f for f in files if f.endswith('pose.txt')] 104 | poses.sort() 105 | 106 | # frames up to split are test images 107 | test_meta_infos = build_meta_infos(data_folder, place, scene, 108 | images[:split], poses[:split], color_focal, color_width, color_height) 109 | train_meta_infos = build_meta_infos(data_folder, place, scene, 110 | images[split:], poses[split:], color_focal, color_width, color_height) 111 | 112 | print('near: ', np.array([m['near'] for m in 
train_meta_infos]).min()) 113 | print('far: ', np.array([m['far'] for m in train_meta_infos]).max()) 114 | with open(osp.join(ds, scene, f'info_train.pkl'), 'wb') as fout: 115 | pkl.dump(train_meta_infos, fout) 116 | with open(osp.join(ds, scene, f'info_test.pkl'), 'wb') as fout: 117 | pkl.dump(test_meta_infos, fout) 118 | print(f'test_num: {len(test_meta_infos)} train_num: {len(train_meta_infos)}, total: {len(images)}') 119 | 120 | model_path = glob.glob(ds + '/' + scene + '/*.ply')[0] 121 | mesh = trimesh.load(model_path) 122 | cloud = trimesh.PointCloud(vertices=mesh.vertices) 123 | cloud.export(ds + '/' + scene + '/pc.ply') 124 | 125 | 126 | -------------------------------------------------------------------------------- /nerf_loc/datasets/video/preprocess_7scenes.py: -------------------------------------------------------------------------------- 1 | """ 2 | Author: jenningsliu 3 | Date: 2022-03-10 17:46:34 4 | LastEditors: jenningsliu 5 | LastEditTime: 2022-08-07 17:55:57 6 | FilePath: /nerf-loc/datasets/video/preprocess_7scenes.py 7 | Description: 8 | Copyright (c) 2022 by Tencent, All Rights Reserved. 9 | """ 10 | import sys 11 | import glob 12 | import pickle as pkl 13 | import os 14 | from os import path as osp 15 | import numpy as np 16 | import cv2 17 | from tqdm import tqdm 18 | import re 19 | 20 | import fusion 21 | 22 | data_root = sys.argv[1] 23 | 24 | focallength = 525.0 25 | # focallength = 585.0 26 | 27 | def load_pose(pose_txt): 28 | pose = [] 29 | with open(pose_txt, 'r') as f: 30 | for line in f: 31 | row = line.strip('\n').split() 32 | row = [float(c) for c in row] 33 | pose.append(row) 34 | pose = np.array(pose).astype(np.float32) 35 | assert pose.shape == (4,4) 36 | return pose 37 | 38 | def fuse_tsdf(data_path, seqs): 39 | cam_intr = np.array([ 40 | [focallength, 0, 320.0], 41 | [0,focallength,240.0], 42 | [0,0,1] 43 | ]) 44 | vol_bnds = np.zeros((3,2)) 45 | for seq in seqs: 46 | seq = int(seq.replace('sequence', '')) 47 | seq_folder = osp.join(data_path, 'seq-%02d'%seq) 48 | for img in tqdm(glob.glob(seq_folder+'/*color.png')): 49 | mat = re.search(r'frame-(\d+)', img) 50 | i = int(mat.group(1)) 51 | # Read depth image and camera pose 52 | depth_im = cv2.imread( 53 | f"{data_path}/rendered_depth/train/depth/seq%02d_frame-%06d.pose.depth.tiff"%(seq, i),-1 54 | ).astype(float) 55 | depth_im /= 1000. 
# depth is saved in 16-bit PNG in millimeters 56 | depth_im[depth_im == 65.535] = 0 # set invalid depth to 0 (specific to 7-scenes dataset) 57 | cam_pose = \ 58 | np.loadtxt(f"{data_path}/seq-%02d/frame-%06d.pose.txt"%(seq, i)) # 4x4 rigid transformation matrix 59 | 60 | # Compute camera view frustum and extend convex hull 61 | view_frust_pts = fusion.get_view_frustum(depth_im, cam_intr, cam_pose) 62 | vol_bnds[:,0] = np.minimum(vol_bnds[:,0], np.amin(view_frust_pts, axis=1)) 63 | vol_bnds[:,1] = np.maximum(vol_bnds[:,1], np.amax(view_frust_pts, axis=1)) 64 | print(vol_bnds) 65 | 66 | # Initialize voxel volume 67 | print("Initializing voxel volume...") 68 | tsdf_vol = fusion.TSDFVolume(vol_bnds, voxel_size=0.02) 69 | 70 | # Loop through RGB-D images and fuse them together 71 | for seq in seqs: 72 | seq = int(seq.replace('sequence', '')) 73 | seq_folder = osp.join(data_path, 'seq-%02d'%seq) 74 | for img in tqdm(glob.glob(seq_folder+'/*color.png')): 75 | mat = re.search(r'frame-(\d+)', img) 76 | i = int(mat.group(1)) 77 | if i % 5 != 0: 78 | continue 79 | 80 | # Read RGB-D image and camera pose 81 | color_image = \ 82 | cv2.cvtColor(cv2.imread(f"{data_path}/seq-%02d/frame-%06d.color.png"%(seq,i)), cv2.COLOR_BGR2RGB) 83 | #depth_im = cv2.imread(f"{data_path}/frame-%06d.depth.png"%(i),-1).astype(float) 84 | depth_im = cv2.imread( 85 | f"{data_path}/rendered_depth/train/depth/seq%02d_frame-%06d.pose.depth.tiff"%(seq, i),-1 86 | ).astype(float) 87 | depth_im /= 1000. 88 | depth_im[depth_im == 65.535] = 0 89 | cam_pose = np.loadtxt(f"{data_path}/seq-%02d/frame-%06d.pose.txt"%(seq,i)) 90 | 91 | # Integrate observation into voxel volume (assume color aligned with depth) 92 | tsdf_vol.integrate(color_image, depth_im, cam_intr, cam_pose, obs_weight=1.) 93 | 94 | # Get mesh from voxel volume and save to disk (can be viewed with Meshlab) 95 | print("Saving mesh to mesh.ply...", osp.join(data_path, "mesh.ply")) 96 | verts, faces, norms, colors = tsdf_vol.get_mesh() 97 | fusion.meshwrite(osp.join(data_path, "mesh.ply"), verts, faces, norms, colors) 98 | 99 | # Get point cloud from voxel volume and save to disk (can be viewed with Meshlab) 100 | print("Saving point cloud to pc.ply...", osp.join(data_path, "pc.ply")) 101 | point_cloud = tsdf_vol.get_point_cloud() 102 | fusion.pcwrite(osp.join(data_path, "pc.ply"), point_cloud) 103 | 104 | 105 | def process_split(scene_folder, seqs, split): 106 | if split == 'train': 107 | fuse_tsdf(scene_folder, seqs) 108 | meta_infos = [] 109 | for seq in seqs: 110 | num = int(seq.replace('sequence', '')) 111 | seq_folder = osp.join(scene_folder, 'seq-%02d'%num) 112 | for img in tqdm(glob.glob(seq_folder+'/*color.png')): 113 | img_name = img.replace(data_root, '').lstrip('/') 114 | Twc = load_pose(img.replace('color.png', 'pose.txt')) 115 | image = cv2.imread(os.path.join(data_root, img_name)) 116 | 117 | mat = re.search(r'frame-(\d+)', img) 118 | i = int(mat.group(1)) 119 | render_depth_file = \ 120 | f"{scene_folder}/rendered_depth/train/depth/seq%02d_frame-%06d.pose.depth.tiff"%(num, i) 121 | if split == 'train': 122 | depth = cv2.imread(render_depth_file,-1) 123 | else: 124 | # only use to compute far near 125 | depth = cv2.imread(os.path.join(data_root, img_name.replace('color', 'depth')), cv2.IMREAD_ANYDEPTH) 126 | 127 | depth[depth==65535] = 0 128 | depth = depth.astype(np.float32)/1000 129 | depth = depth.reshape(-1) 130 | near = np.percentile(depth, 0.1) 131 | far = np.percentile(depth, 99.9) 132 | meta_infos.append({ 133 | 'file_name': img_name, 134 | 
'frame_id': i, 135 | 'sequence_id': num, 136 | 'depth_file_name': render_depth_file.replace(data_root, '').lstrip('/'), 137 | 'extrinsic_Tcw': np.linalg.inv(Twc)[:3], 138 | 'camera_intrinsic': np.array([focallength, focallength, 320., 240., 0., 0.], dtype=np.float32), 139 | 'frame_dim': image.shape[:2], 140 | 'near': near, 141 | 'far': far 142 | }) 143 | print(meta_infos[0]) 144 | with open(osp.join(scene_folder, f'info_{split}.pkl'), 'wb') as fout: 145 | pkl.dump(meta_infos, fout) 146 | return meta_infos 147 | 148 | for scene in ['chess', 'pumpkin', 'fire', 'heads', 'office', 'redkitchen', 'stairs']: 149 | scene_folder = osp.join(data_root, scene) 150 | 151 | train_split = [] 152 | with open(osp.join(data_root, scene, 'TrainSplit.txt'), 'r') as f: 153 | for line in f: 154 | train_split.append(line.strip('\n')) 155 | process_split(scene_folder, train_split, 'train') 156 | 157 | test_split = [] 158 | with open(osp.join(data_root, scene, 'TestSplit.txt'), 'r') as f: 159 | for line in f: 160 | test_split.append(line.strip('\n')) 161 | process_split(scene_folder, test_split, 'test') 162 | 163 | 164 | -------------------------------------------------------------------------------- /nerf_loc/models/COTR/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/JenningsL/nerf-loc/1d539c5a4824a46d26414f3c2b41bb1b1f6dd91e/nerf_loc/models/COTR/__init__.py -------------------------------------------------------------------------------- /nerf_loc/models/COTR/backbone2d.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved 2 | """ 3 | Backbone modules. 4 | """ 5 | from collections import OrderedDict 6 | from functools import partial 7 | 8 | import torch 9 | import torch.nn.functional as F 10 | import torchvision 11 | from torchvision import transforms 12 | from torch import nn 13 | from torchvision.models._utils import IntermediateLayerGetter 14 | # from torchvision.ops import FeaturePyramidNetwork 15 | from typing import Dict, List 16 | 17 | from .resnet import resnet50 18 | from .fpn import FeaturePyramidNetwork 19 | 20 | # from .position_encoding import build_position_encoding 21 | # from COTR.utils import debug_utils, constants 22 | 23 | DEFAULT_PRECISION = 'float32' 24 | MAX_SIZE = 256 25 | VALID_NN_OVERLAPPING_THRESH = 0.1 26 | 27 | 28 | class FrozenBatchNorm2d(torch.nn.Module): 29 | """ 30 | BatchNorm2d where the batch statistics and the affine parameters are fixed. 31 | 32 | Copy-paste from torchvision.misc.ops with added eps before rqsrt, 33 | without which any other models than torchvision.models.resnet[18,34,50,101] 34 | produce nans. 
35 | """ 36 | 37 | def __init__(self, n): 38 | super(FrozenBatchNorm2d, self).__init__() 39 | self.register_buffer("weight", torch.ones(n)) 40 | self.register_buffer("bias", torch.zeros(n)) 41 | self.register_buffer("running_mean", torch.zeros(n)) 42 | self.register_buffer("running_var", torch.ones(n)) 43 | 44 | def _load_from_state_dict(self, state_dict, prefix, local_metadata, strict, 45 | missing_keys, unexpected_keys, error_msgs): 46 | num_batches_tracked_key = prefix + 'num_batches_tracked' 47 | if num_batches_tracked_key in state_dict: 48 | del state_dict[num_batches_tracked_key] 49 | 50 | super(FrozenBatchNorm2d, self)._load_from_state_dict( 51 | state_dict, prefix, local_metadata, strict, 52 | missing_keys, unexpected_keys, error_msgs) 53 | 54 | def forward(self, x): 55 | # move reshapes to the beginning 56 | # to make it fuser-friendly 57 | w = self.weight.reshape(1, -1, 1, 1) 58 | b = self.bias.reshape(1, -1, 1, 1) 59 | rv = self.running_var.reshape(1, -1, 1, 1) 60 | rm = self.running_mean.reshape(1, -1, 1, 1) 61 | eps = 1e-5 62 | scale = w * (rv + eps).rsqrt() 63 | bias = b - rm * scale 64 | return x * scale + bias 65 | 66 | 67 | class Backbone(nn.Module): 68 | """ResNet backbone with frozen BatchNorm.""" 69 | def __init__(self, return_layers, train_backbone=True, use_fpn=False, fpn_dim=128): 70 | super().__init__() 71 | self.normalize = transforms.Normalize( 72 | mean=[0.485, 0.456, 0.406], 73 | std=[0.229, 0.224, 0.225]) 74 | self.layer_to_channels = layer_to_channels = { 75 | 'conv1': 64, 76 | 'layer1': 256, 77 | 'layer2': 512, 78 | 'layer3': 1024, 79 | 'layer4': 2048, 80 | } 81 | self.layer_to_stride = layer_to_stride = { 82 | 'conv1': 2, 83 | 'layer1': 4, 84 | 'layer2': 8, 85 | 'layer3': 16, 86 | 'layer4': 32, 87 | } 88 | self.return_layers = return_layers 89 | backbone = resnet50( 90 | replace_stride_with_dilation=[False, False, False], 91 | pretrained=False, 92 | norm_layer=FrozenBatchNorm2d 93 | # norm_layer=nn.BatchNorm2d 94 | # norm_layer=partial(nn.InstanceNorm2d, track_running_stats=True) 95 | ) 96 | for name, parameter in backbone.named_parameters(): 97 | if (not train_backbone) or ('layer2' not in name and 'layer3' not in name and 'layer4' not in name): 98 | parameter.requires_grad_(False) 99 | print(f'freeze {name}') 100 | self.body = IntermediateLayerGetter(backbone, return_layers={l:l for l in return_layers}) 101 | self.use_fpn = use_fpn 102 | if self.use_fpn: 103 | self.fpn = FeaturePyramidNetwork( 104 | in_channels_list=[layer_to_channels[l] for l in return_layers if 'layer' in l], 105 | # in_channels_list=[layer_to_channels[l] for l in return_layers], 106 | out_channels=fpn_dim, 107 | # norm_layer=nn.BatchNorm2d 108 | norm_layer=nn.InstanceNorm2d 109 | ) 110 | self.layer_to_channels.update({l:fpn_dim for l in return_layers if 'layer' in l}) 111 | # self.layer_to_channels.update({l:fpn_dim for l in return_layers}) 112 | 113 | def forward(self, x): 114 | x = self.normalize(x) 115 | y = self.body(x) 116 | if self.use_fpn: 117 | fpn_input = OrderedDict() 118 | for l in self.return_layers: 119 | if 'layer' in l: # conv1 is not passed to FPN 120 | fpn_input[l] = y[l] 121 | # fpn_input[l] = y[l] 122 | fpn_out = self.fpn(fpn_input) 123 | y.update(fpn_out) 124 | return y 125 | 126 | 127 | # class Backbone(nn.Module): 128 | # """ResNet backbone with frozen BatchNorm.""" 129 | 130 | # def __init__(self, name: str, 131 | # train_backbone: bool, 132 | # return_interm_layers: bool, 133 | # dilation: bool, 134 | # layer='layer3', 135 | # num_channels=1024): 136 | # backbone 
= getattr(torchvision.models, name)( 137 | # replace_stride_with_dilation=[False, False, dilation], 138 | # pretrained=False, 139 | # norm_layer=FrozenBatchNorm2d) 140 | # super().__init__(backbone, train_backbone, num_channels, return_interm_layers, layer) 141 | 142 | 143 | def build_backbone( 144 | model_path='models/COTR/default/checkpoint.pth.tar', 145 | return_layers=['layer1', 'layer2', 'layer3', 'layer4'], 146 | train_backbone=True, 147 | use_fpn=False, 148 | fpn_dim=128): 149 | ckpt = torch.load(model_path) 150 | state_dict = {k.replace('backbone.0.', ''):v for k,v in ckpt['model_state_dict'].items() if 'backbone' in k} 151 | # position_embedding = build_position_encoding(args) 152 | # train_backbone = False 153 | 154 | backbone = Backbone(return_layers=return_layers, train_backbone=train_backbone, use_fpn=use_fpn, fpn_dim=fpn_dim) 155 | # backbone.layer_to_stride = layer_to_stride 156 | # backbone.layer_to_channels = layer_to_channels 157 | backbone.load_state_dict(state_dict, strict=False) 158 | return backbone 159 | 160 | if __name__ == '__main__': 161 | model = build_backbone(return_layers=['conv1', 'layer1', 'layer2', 'layer3'], use_fpn=True) 162 | print(model) 163 | x = model(torch.randn(1,3,32,32)) 164 | for k,v in x.items(): 165 | print(k, v.shape) 166 | -------------------------------------------------------------------------------- /nerf_loc/models/COTR/misc.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved 2 | """ 3 | Misc functions, including distributed helpers. 4 | 5 | Mostly copy-paste from torchvision references. 6 | """ 7 | import os 8 | import subprocess 9 | import time 10 | from collections import defaultdict, deque 11 | import datetime 12 | import pickle 13 | from typing import Optional, List 14 | 15 | import torch 16 | import torch.distributed as dist 17 | from torch import Tensor 18 | 19 | # needed due to empty tensor bug in pytorch and torchvision 0.5 20 | import torchvision 21 | # if float(torchvision.__version__[:3]) < 0.7: 22 | # from torchvision.ops import _new_empty_tensor 23 | # from torchvision.ops.misc import _output_size 24 | 25 | 26 | def _max_by_axis(the_list): 27 | # type: (List[List[int]]) -> List[int] 28 | maxes = the_list[0] 29 | for sublist in the_list[1:]: 30 | for index, item in enumerate(sublist): 31 | maxes[index] = max(maxes[index], item) 32 | return maxes 33 | 34 | 35 | class NestedTensor(object): 36 | def __init__(self, tensors, mask: Optional[Tensor]): 37 | self.tensors = tensors 38 | self.mask = mask 39 | 40 | def to(self, device): 41 | # type: (Device) -> NestedTensor # noqa 42 | cast_tensor = self.tensors.to(device) 43 | mask = self.mask 44 | if mask is not None: 45 | assert mask is not None 46 | cast_mask = mask.to(device) 47 | else: 48 | cast_mask = None 49 | return NestedTensor(cast_tensor, cast_mask) 50 | 51 | def decompose(self): 52 | return self.tensors, self.mask 53 | 54 | def __repr__(self): 55 | return str(self.tensors) 56 | 57 | 58 | def nested_tensor_from_tensor_list(tensor_list: List[Tensor]): 59 | # TODO make this more general 60 | if tensor_list[0].ndim == 3: 61 | if torchvision._is_tracing(): 62 | # nested_tensor_from_tensor_list() does not export well to ONNX 63 | # call onnx_nested_tensor_from_tensor_list() instead 64 | return onnx_nested_tensor_from_tensor_list(tensor_list) 65 | 66 | # TODO make it support different-sized images 67 | max_size = _max_by_axis([list(img.shape) for img in tensor_list]) 68 | # 
min_size = tuple(min(s) for s in zip(*[img.shape for img in tensor_list])) 69 | batch_shape = [len(tensor_list)] + max_size 70 | b, c, h, w = batch_shape 71 | dtype = tensor_list[0].dtype 72 | device = tensor_list[0].device 73 | tensor = torch.zeros(batch_shape, dtype=dtype, device=device) 74 | mask = torch.ones((b, h, w), dtype=torch.bool, device=device) 75 | for img, pad_img, m in zip(tensor_list, tensor, mask): 76 | pad_img[: img.shape[0], : img.shape[1], : img.shape[2]].copy_(img) 77 | m[: img.shape[1], :img.shape[2]] = False 78 | else: 79 | raise ValueError('not supported') 80 | return NestedTensor(tensor, mask) 81 | 82 | 83 | # onnx_nested_tensor_from_tensor_list() is an implementation of 84 | # nested_tensor_from_tensor_list() that is supported by ONNX tracing. 85 | @torch.jit.unused 86 | def onnx_nested_tensor_from_tensor_list(tensor_list: List[Tensor]) -> NestedTensor: 87 | max_size = [] 88 | for i in range(tensor_list[0].dim()): 89 | max_size_i = torch.max(torch.stack([img.shape[i] for img in tensor_list]).to(torch.float32)).to(torch.int64) 90 | max_size.append(max_size_i) 91 | max_size = tuple(max_size) 92 | 93 | # work around for 94 | # pad_img[: img.shape[0], : img.shape[1], : img.shape[2]].copy_(img) 95 | # m[: img.shape[1], :img.shape[2]] = False 96 | # which is not yet supported in onnx 97 | padded_imgs = [] 98 | padded_masks = [] 99 | for img in tensor_list: 100 | padding = [(s1 - s2) for s1, s2 in zip(max_size, tuple(img.shape))] 101 | padded_img = torch.nn.functional.pad(img, (0, padding[2], 0, padding[1], 0, padding[0])) 102 | padded_imgs.append(padded_img) 103 | 104 | m = torch.zeros_like(img[0], dtype=torch.int, device=img.device) 105 | padded_mask = torch.nn.functional.pad(m, (0, padding[2], 0, padding[1]), "constant", 1) 106 | padded_masks.append(padded_mask.to(torch.bool)) 107 | 108 | tensor = torch.stack(padded_imgs) 109 | mask = torch.stack(padded_masks) 110 | 111 | return NestedTensor(tensor, mask=mask) 112 | -------------------------------------------------------------------------------- /nerf_loc/models/COTR/position_encoding.py: -------------------------------------------------------------------------------- 1 | """ 2 | Author: jenningsliu 3 | Date: 2022-03-09 18:02:30 4 | LastEditors: jenningsliu 5 | LastEditTime: 2022-03-09 18:30:32 6 | FilePath: /nerf-loc/models/COTR/position_encoding.py 7 | Description: 8 | Copyright (c) 2022 by Tencent, All Rights Reserved. 
9 | """ 10 | 11 | import math 12 | import torch 13 | from torch import nn 14 | import torch.nn.functional as F 15 | 16 | 17 | class MLP(nn.Module): 18 | """ Very simple multi-layer perceptron (also called FFN)""" 19 | 20 | def __init__(self, input_dim, hidden_dim, output_dim, num_layers): 21 | super().__init__() 22 | self.num_layers = num_layers 23 | h = [hidden_dim] * (num_layers - 1) 24 | self.layers = nn.ModuleList(nn.Linear(n, k) for n, k in zip([input_dim] + h, h + [output_dim])) 25 | 26 | def forward(self, x): 27 | for i, layer in enumerate(self.layers): 28 | x = F.relu(layer(x)) if i < self.num_layers - 1 else layer(x) 29 | return x 30 | 31 | 32 | class NerfPositionalEncoding(nn.Module): 33 | def __init__(self, depth=10, sine_type='lin_sine'): 34 | ''' 35 | out_dim = in_dim * depth * 2 36 | ''' 37 | super().__init__() 38 | if sine_type == 'lin_sine': 39 | self.bases = [i+1 for i in range(depth)] 40 | elif sine_type == 'exp_sine': 41 | self.bases = [2**i for i in range(depth)] 42 | print(f'using {sine_type} as positional encoding') 43 | 44 | @torch.no_grad() 45 | def forward(self, inputs): 46 | out = torch.cat([ 47 | torch.sin(i * math.pi * inputs) for i in self.bases 48 | ] + [torch.cos(i * math.pi * inputs) for i in self.bases], axis=-1) 49 | assert torch.isnan(out).any() == False 50 | return out 51 | 52 | 53 | class PositionEmbeddingSine(nn.Module): 54 | """ 55 | This is a more standard version of the position embedding, very similar to the one 56 | used by the Attention is all you need paper, generalized to work on images. 57 | """ 58 | def __init__(self, num_pos_feats=64, temperature=10000, normalize=False, scale=None, sine_type='lin_sine'): 59 | super().__init__() 60 | self.num_pos_feats = num_pos_feats 61 | self.temperature = temperature 62 | self.normalize = normalize 63 | self.sine = NerfPositionalEncoding(num_pos_feats//2, sine_type) 64 | 65 | @torch.no_grad() 66 | def forward(self, x): 67 | """ 68 | Args: 69 | x: B,H,W 70 | Returns: 71 | B,H,W,C 72 | """ 73 | mask = torch.ones_like(x) 74 | y_embed = mask.cumsum(1, dtype=torch.float32) 75 | x_embed = mask.cumsum(2, dtype=torch.float32) 76 | eps = 1e-6 77 | y_embed = (y_embed-0.5) / (y_embed[:, -1:, :] + eps) 78 | x_embed = (x_embed-0.5) / (x_embed[:, :, -1:] + eps) 79 | pos = torch.stack([x_embed, y_embed], dim=-1) 80 | return self.sine(pos) 81 | 82 | 83 | 84 | def build_position_encoding(args): 85 | N_steps = args.hidden_dim // 2 86 | if args.position_embedding in ('lin_sine', 'exp_sine'): 87 | # TODO find a better way of exposing other arguments 88 | position_embedding = PositionEmbeddingSine(N_steps, normalize=True, sine_type=args.position_embedding) 89 | else: 90 | raise ValueError(f"not supported {args.position_embedding}") 91 | 92 | return position_embedding 93 | -------------------------------------------------------------------------------- /nerf_loc/models/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/JenningsL/nerf-loc/1d539c5a4824a46d26414f3c2b41bb1b1f6dd91e/nerf_loc/models/__init__.py -------------------------------------------------------------------------------- /nerf_loc/models/appearance_embedding.py: -------------------------------------------------------------------------------- 1 | """ 2 | Author: jenningsliu 3 | Date: 2022-04-14 21:10:00 4 | LastEditors: jenningsliu 5 | LastEditTime: 2022-08-18 12:36:13 6 | FilePath: /nerf-loc/models/appearance_embedding.py 7 | Description: 8 | Copyright (c) 2022 by Tencent, All Rights 
Reserved. 9 | """ 10 | import torch 11 | import torch.nn as nn 12 | import torch.nn.functional as F 13 | import torchvision 14 | 15 | from .COTR.backbone2d import build_backbone as build_cotr_backbone 16 | 17 | class AppearanceEmbedding(nn.Module): 18 | def __init__(self, args): 19 | super().__init__() 20 | self.dim = args.appearance_emb_dim 21 | 22 | def forward(self, imgs, x): 23 | """ 24 | Args: 25 | imgs: b,3,h,w 26 | x: {'conv1': [b,c,h,w]} 27 | Returns: 28 | """ 29 | xs = [] 30 | b,c = x['conv1'].shape[:2] # 64 31 | for i in range(b): 32 | std,mean = torch.std_mean(x['conv1'][i].view(c,-1), dim=1) 33 | xs.append(torch.cat([mean,std])) # 128 34 | x = torch.stack(xs) 35 | assert x.shape[-1] == self.dim 36 | return x 37 | 38 | class AppearanceAdaptLayer(nn.Module): 39 | def __init__(self, args, input_dim, is_rgb=False): 40 | super().__init__() 41 | self.input_dim = input_dim 42 | self.appearance_emb_dim = args.appearance_emb_dim 43 | self.is_rgb = is_rgb 44 | self.mlp = nn.Sequential( 45 | nn.Linear(self.appearance_emb_dim, 64), 46 | nn.LeakyReLU(inplace=True), 47 | nn.Linear(64, 64), 48 | nn.LeakyReLU(inplace=True), 49 | nn.Linear(64, input_dim*2) 50 | ) 51 | 52 | def forward(self, x, embedding, target_embedding): 53 | """ 54 | Args: 55 | x: B,H,W,C 56 | embedding: B,appearance_emb_dim 57 | target_embedding: 1,appearance_emb_dim 58 | Returns: 59 | B,H,W,C 60 | """ 61 | embedding_diff = target_embedding - embedding 62 | code = self.mlp(embedding_diff) 63 | a, b = torch.split(code, [self.input_dim, self.input_dim], dim=-1) 64 | y = a[:,None,None,:] * x + b[:,None,None,:] 65 | if self.is_rgb: 66 | y = torch.clip(y, min=0., max=1.) 67 | return y 68 | 69 | if __name__ == '__main__': 70 | from collections import namedtuple 71 | Config = namedtuple('Config', ['appearance_emb_dim']) 72 | args = Config(128) 73 | net = AppearanceEmbeddingNetwork(args) 74 | x = net(torch.randn([1,3,512,512])) 75 | print(x.shape, x) 76 | -------------------------------------------------------------------------------- /nerf_loc/models/conditional_nerf/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/JenningsL/nerf-loc/1d539c5a4824a46d26414f3c2b41bb1b1f6dd91e/nerf_loc/models/conditional_nerf/__init__.py -------------------------------------------------------------------------------- /nerf_loc/models/conditional_nerf/losses.py: -------------------------------------------------------------------------------- 1 | """ 2 | Author: jenningsliu 3 | Date: 2022-03-02 13:56:51 4 | LastEditors: jenningsliu 5 | LastEditTime: 2022-08-10 20:47:41 6 | FilePath: /nerf-loc/models/pointnerf/losses.py 7 | Description: 8 | Copyright (c) 2022 by Tencent, All Rights Reserved. 9 | """ 10 | import torch 11 | import torch.nn as nn 12 | import torch.nn.functional as F 13 | 14 | # transform to inverse depth coordinate 15 | def to_inverse_normalized_depth(depth, near, far): 16 | near_inv, far_inv = -1/near, -1/far # rfn,1 17 | depth = torch.clamp(depth, min=1e-5) 18 | depth = -1 / depth 19 | depth = (depth - near_inv) / (far_inv - near_inv) 20 | depth = torch.clamp(depth, min=0, max=1.0) 21 | return depth 22 | 23 | class RenderingLoss(nn.Module): 24 | """ 25 | Equation 13 in the NeRF-W paper. 
26 | Name abbreviations: 27 | c_l: coarse color loss 28 | f_l: fine color loss (1st term in equation 13) 29 | b_l: beta loss (2nd term in equation 13) 30 | """ 31 | def __init__(self, coef=1, lambda_u=0.01, use_depth=False, depth_std=0.05): 32 | """ 33 | lambda_u: in equation 13 34 | """ 35 | super().__init__() 36 | self.coef = coef 37 | self.lambda_u = lambda_u 38 | self.use_depth = use_depth 39 | self.depth_std = depth_std 40 | 41 | def forward(self, inputs, targets): 42 | if 'mask' not in targets: 43 | mask = torch.ones_like(targets['rgb'][:,0]).bool() 44 | else: 45 | mask = targets['mask'].bool() 46 | rgb = inputs['rgb'][mask] 47 | depth = inputs['depth'][mask] 48 | rgb_target = targets['rgb'][mask] 49 | 50 | if 'beta' in inputs: 51 | beta = inputs['beta'][mask] 52 | rgb_loss = ((rgb-rgb_target)**2/(2*beta.unsqueeze(1)**2)).mean() 53 | beta_loss = 3 + torch.log(beta).mean() # +3 to make it positive 54 | loss = self.coef * (rgb_loss + beta_loss) 55 | else: 56 | rgb_loss = ((rgb-rgb_target)**2).mean() 57 | loss = self.coef * rgb_loss 58 | 59 | if self.use_depth and 'depth' in targets: 60 | target_depth = targets['depth'][mask] 61 | 62 | # weights = inputs['weights'] 63 | 64 | # valid_depth_mask = target_depth > 0 65 | # depth_mask = (target_depth > 0) & ((depth-target_depth).abs()>self.depth_std) 66 | depth_mask = target_depth > 0 67 | near, far = targets['depth_range'] 68 | target_depth = to_inverse_normalized_depth(target_depth, near, far) 69 | depth = to_inverse_normalized_depth(depth, near, far) 70 | 71 | # uncertainty = inputs['depth_uncertainty'][mask] + 1. 72 | # depth_loss = torch.log(uncertainty) + ((depth-target_depth)**2)/uncertainty 73 | # depth_loss = 0.003 * (depth_loss * depth_mask).sum() / (1e-8 + depth_mask.sum()) 74 | 75 | depth_loss = (depth-target_depth)**2 76 | depth_loss = (depth_loss * depth_mask).sum() / (1e-8 + depth_mask.sum()) 77 | # print('rgb_loss: ', rgb_loss, 'depth_loss: ', depth_loss) 78 | loss += (self.coef * depth_loss) 79 | 80 | 81 | if 'depth_coarse' in inputs: 82 | depth_coarse = inputs['depth_coarse'][mask] 83 | depth_coarse = to_inverse_normalized_depth(depth_coarse, near, far) 84 | depth_loss_c = (depth_coarse-target_depth)**2 85 | depth_loss_c = (depth_loss_c * depth_mask).sum() / (1e-8 + depth_mask.sum()) 86 | # print('rgb_loss: ', rgb_loss, 'depth_loss: ', depth_loss, 'depth_loss_c: ', depth_loss_c) 87 | loss += (self.coef * depth_loss_c) 88 | 89 | if 'feat' in inputs and 'feat' in targets: 90 | feat_loss = 0.1 * ((inputs['feat'][mask] - targets['feat'][mask])**2).mean() 91 | loss += (self.coef * feat_loss) 92 | 93 | return loss 94 | -------------------------------------------------------------------------------- /nerf_loc/models/conditional_nerf/model_simple.py: -------------------------------------------------------------------------------- 1 | """ 2 | Author: jenningsliu 3 | Date: 2022-06-30 10:51:58 4 | LastEditors: jenningsliu 5 | LastEditTime: 2022-08-11 12:24:10 6 | FilePath: /nerf-loc/models/pointnerf/pointnerf_3d_model.py 7 | Description: 8 | Copyright (c) 2022 by Tencent, All Rights Reserved. 
9 | """ 10 | import copy 11 | import math 12 | import torch 13 | import torch.nn as nn 14 | 15 | from .model import ConditionalNeRF 16 | 17 | class ConditionalNeRFSimple(ConditionalNeRF): 18 | def __init__(self, args, activation_func=nn.LeakyReLU(inplace=True)): 19 | super().__init__(args, activation_func) 20 | 21 | self.out_fc = nn.Linear(3+args.backbone2d_fpn_dim, args.model_3d_hidden_dim) 22 | self.proj_layer_3d_coarse = nn.Linear(args.model_3d_hidden_dim, args.matcher_hidden_dim) 23 | self.proj_layer_3d_fine = nn.Linear(args.model_3d_hidden_dim, args.matcher_hidden_dim) 24 | 25 | def query(self, data, xyz, support_featmaps, support_neural_points, 26 | direction=None, K=8, embed_a=None, target_proj_mat=None): 27 | """ 28 | Args: 29 | data: 30 | xyz: target points in world frame (N,3) 31 | support_featmaps: feature maps of support views (V,C,h,w) 32 | support_neural_points: 33 | xyz: (M,3) 34 | feature: (M,C) 35 | confidence: (M,1) 36 | direction: (N,4) viewing direction and distance of sample points. in world frame 37 | embed_a: (1,C_a) 38 | target_proj_mat: (3,4) 39 | Returns: 40 | feature: K-nearest support features (N,K,C1) 41 | weights: K-nearest support weights (N,K) 42 | feature_agg: aggregated support feature (N,C1) 43 | """ 44 | intrinsics = data['topk_Ks'] 45 | intrinsics_hom = torch.eye(4, device=intrinsics.device).expand(intrinsics.shape[0], 4, 4).clone() 46 | intrinsics_hom[:,:3,:3] = intrinsics 47 | rgb, feat, mask = \ 48 | self.projector.compute(xyz, intrinsics_hom, data['topk_poses'], data['topk_images'], support_featmaps) 49 | weight = mask / (torch.sum(mask, dim=1, keepdim=True) + 1e-8) 50 | multiview_feature = torch.cat([rgb, feat], dim=-1) 51 | multiview_visibility = mask.float() 52 | feature_agg = self.out_fc((multiview_feature * weight).sum(dim=1)) 53 | return { 54 | 'feature_agg': feature_agg, 55 | 'multiview_feature': multiview_feature, 56 | 'multiview_visibility': multiview_visibility 57 | } 58 | 59 | def query_coarse(self, data, points=None, embed_a=None): 60 | if self.support_neural_points is None: 61 | self.build_support_neural_points(data) 62 | 63 | if points is None: 64 | pts3d, pts3d_ndc, _ = self.sample_points_3d() 65 | else: 66 | pts3d = points 67 | w2c_ref = data['topk_poses'][0].inverse() 68 | pts3d_ndc = (torch.matmul(w2c_ref[:3,:3], points.T) + w2c_ref[:3,3:]).T 69 | 70 | query_dict = self.query( 71 | data, pts3d, 72 | support_featmaps=data['feat_coarse_src'].permute(0,3,1,2), 73 | support_neural_points=self.support_neural_points['coarse'], 74 | K=8, 75 | embed_a=embed_a 76 | ) 77 | desc_3d = self.proj_layer_3d_coarse(query_dict['feature_agg']) 78 | 79 | return desc_3d, pts3d, pts3d_ndc 80 | 81 | def query_fine(self, data, points, embed_a=None): 82 | if self.support_neural_points is None: 83 | self.build_support_neural_points(data) 84 | 85 | query_dict = self.query( 86 | data, points, 87 | support_featmaps=data['feat_fine_src'].permute(0,3,1,2), 88 | support_neural_points=self.support_neural_points['fine'], 89 | K=1, 90 | embed_a=embed_a 91 | ) 92 | 93 | desc_3d = self.proj_layer_3d_fine(query_dict['feature_agg']) 94 | 95 | return desc_3d, None, None 96 | -------------------------------------------------------------------------------- /nerf_loc/models/conditional_nerf/ray_unet.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import torch.nn as nn 3 | import torch.nn.functional as F 4 | 5 | class RayUnet(nn.Module): 6 | def __init__(self, in_channels, n_samples): 7 | super().__init__() 8 | 9 | 
out_channels = in_channels 10 | 11 | self.conv1 = nn.Sequential( 12 | nn.Conv1d(in_channels, 64, 3, 1, padding=1), 13 | nn.LayerNorm([64,n_samples]), 14 | # nn.InstanceNorm1d(64), 15 | nn.ELU(inplace=True) 16 | ) 17 | self.conv2 = nn.Sequential( 18 | nn.Conv1d(64, 128, 3, 1, padding=1), 19 | nn.LayerNorm([128,n_samples//2]), 20 | # nn.InstanceNorm1d(128), 21 | nn.ELU(inplace=True) 22 | ) 23 | self.conv3 = nn.Sequential( 24 | nn.Conv1d(128, 128, 3, 1, padding=1), 25 | nn.LayerNorm([128,n_samples//4]), 26 | # nn.InstanceNorm1d(128), 27 | nn.ELU(inplace=True) 28 | ) 29 | self.maxpool = nn.MaxPool1d(2) 30 | self.trans_conv3 = nn.Sequential( 31 | nn.ConvTranspose1d(128, 128, 3, 2, padding=1, output_padding=1), 32 | nn.LayerNorm([128,n_samples//4]), 33 | # nn.InstanceNorm1d(128), 34 | nn.ELU(inplace=True) 35 | ) 36 | self.trans_conv2 = nn.Sequential( 37 | nn.ConvTranspose1d(256, 64, 3, 2, padding=1, output_padding=1), 38 | nn.LayerNorm([64,n_samples//2]), 39 | # nn.InstanceNorm1d(64), 40 | nn.ELU(inplace=True) 41 | ) 42 | self.trans_conv1 = nn.Sequential( 43 | nn.ConvTranspose1d(128, 32, 3, 2, padding=1, output_padding=1), 44 | nn.LayerNorm([32,n_samples]), 45 | # nn.InstanceNorm1d(32), 46 | nn.ELU(inplace=True) 47 | ) 48 | self.conv_out = nn.Sequential( 49 | nn.Conv1d(in_channels+32, out_channels, 3, 1, padding=1), 50 | nn.LayerNorm([out_channels, n_samples]), 51 | # nn.InstanceNorm1d(out_channels), 52 | nn.ELU(inplace=True) 53 | ) 54 | 55 | def forward(self, x): 56 | """ 57 | x: B,C,N 58 | """ 59 | conv1_0 = self.conv1(x) 60 | conv1 = self.maxpool(conv1_0) 61 | conv2_0 = self.conv2(conv1) 62 | conv2 = self.maxpool(conv2_0) 63 | conv3_0 = self.conv3(conv2) 64 | conv3 = self.maxpool(conv3_0) 65 | x_0 = self.trans_conv3(conv3) 66 | x_1 = self.trans_conv2(torch.cat([conv2, x_0], dim=1)) 67 | x_2 = self.trans_conv1(torch.cat([conv1, x_1], dim=1)) 68 | x_out = self.conv_out(torch.cat([x, x_2], dim=1)) 69 | return x_out 70 | -------------------------------------------------------------------------------- /nerf_loc/models/conditional_nerf/utils.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import torch.nn as nn 3 | from einops import rearrange, reduce, repeat 4 | 5 | class Embedder: 6 | def __init__(self, **kwargs): 7 | self.kwargs = kwargs 8 | self.create_embedding_fn() 9 | 10 | def create_embedding_fn(self): 11 | embed_fns = [] 12 | d = self.kwargs['input_dims'] 13 | out_dim = 0 14 | if self.kwargs['include_input']: 15 | embed_fns.append(lambda x : x) 16 | out_dim += d 17 | 18 | max_freq = self.kwargs['max_freq_log2'] 19 | N_freqs = self.kwargs['num_freqs'] 20 | 21 | if self.kwargs['log_sampling']: 22 | freq_bands = 2.**torch.linspace(0., max_freq, steps=N_freqs) 23 | else: 24 | freq_bands = torch.linspace(2.**0., 2.**max_freq, steps=N_freqs) 25 | 26 | for freq in freq_bands: 27 | for p_fn in self.kwargs['periodic_fns']: 28 | embed_fns.append(lambda x, p_fn=p_fn, freq=freq : p_fn(x * freq)) 29 | out_dim += d 30 | 31 | self.embed_fns = embed_fns 32 | self.out_dim = out_dim 33 | 34 | def embed(self, inputs): 35 | return torch.cat([fn(inputs) for fn in self.embed_fns], -1) 36 | 37 | 38 | def get_embedder(multires, i=0, include_input=True): 39 | if i == -1: 40 | return nn.Identity(), 3 41 | 42 | embed_kwargs = { 43 | 'include_input' : include_input, 44 | 'input_dims' : 3, 45 | 'max_freq_log2' : multires-1, 46 | 'num_freqs' : multires, 47 | 'log_sampling' : True, 48 | 'periodic_fns' : [torch.sin, torch.cos], 49 | } 50 | 51 | embedder_obj = 
Embedder(**embed_kwargs) 52 | embed = lambda x, eo=embedder_obj : eo.embed(x) 53 | return embed, embedder_obj.out_dim 54 | 55 | 56 | def get_rays(H, W, K, c2w): 57 | i, j = torch.meshgrid(torch.linspace(0, W-1, W), torch.linspace(0, H-1, H)) # pytorch's meshgrid has indexing='ij' 58 | i = i.t().to(K.device) 59 | j = j.t().to(K.device) 60 | dirs = torch.stack([ 61 | (i-K[0][2])/K[0][0], 62 | (j-K[1][2])/K[1][1], 63 | torch.ones_like(i) 64 | ], -1) 65 | # Rotate ray directions from camera frame to the world frame 66 | rays_d = torch.sum(dirs[..., None, :] * c2w[:3,:3], -1) # dot product, equals to: [c2w.dot(dir) for dir in dirs] 67 | rays_d = rays_d / torch.norm(rays_d, dim=-1, keepdim=True) # normalize 68 | # Translate camera frame's origin to the world frame. It is the origin of all rays. 69 | rays_o = c2w[:3,-1].expand(rays_d.shape) 70 | return rays_o, rays_d 71 | 72 | 73 | def sample_pdf(bins, weights, N_importance, det=False, eps=1e-5): 74 | """ 75 | Sample @N_importance samples from @bins with distribution defined by @weights. 76 | Inputs: 77 | bins: (N_rays, N_samples_+1) where N_samples_ is "the number of coarse samples per ray - 2" 78 | weights: (N_rays, N_samples_) 79 | N_importance: the number of samples to draw from the distribution 80 | det: deterministic or not 81 | eps: a small number to prevent division by zero 82 | Outputs: 83 | samples: the sampled samples 84 | """ 85 | N_rays, N_samples_ = weights.shape 86 | weights = weights + eps # prevent division by zero (don't do inplace op!) 87 | pdf = weights / reduce(weights, 'n1 n2 -> n1 1', 'sum') # (N_rays, N_samples_) 88 | cdf = torch.cumsum(pdf, -1) # (N_rays, N_samples), cumulative distribution function 89 | cdf = torch.cat([torch.zeros_like(cdf[: ,:1]), cdf], -1) # (N_rays, N_samples_+1) 90 | # padded to 0~1 inclusive 91 | 92 | if det: 93 | u = torch.linspace(0, 1, N_importance, device=bins.device) 94 | u = u.expand(N_rays, N_importance) 95 | else: 96 | u = torch.rand(N_rays, N_importance, device=bins.device) 97 | u = u.contiguous() 98 | 99 | inds = torch.searchsorted(cdf, u, right=True) 100 | below = torch.clamp_min(inds-1, 0) 101 | above = torch.clamp_max(inds, N_samples_) 102 | 103 | inds_sampled = rearrange(torch.stack([below, above], -1), 'n1 n2 c -> n1 (n2 c)', c=2) 104 | cdf_g = rearrange(torch.gather(cdf, 1, inds_sampled), 'n1 (n2 c) -> n1 n2 c', c=2) 105 | bins_g = rearrange(torch.gather(bins, 1, inds_sampled), 'n1 (n2 c) -> n1 n2 c', c=2) 106 | 107 | denom = cdf_g[...,1]-cdf_g[...,0] 108 | denom[denom= -EPS and image.max() <= 1 + EPS 135 | image = torch.clamp(image * 255, 0.0, 255.0) # Input should be 0-255. 136 | mean = self.preprocess['mean'] 137 | std = self.preprocess['std'] 138 | image = image - image.new_tensor(mean).view(1, -1, 1, 1) 139 | image = image / image.new_tensor(std).view(1, -1, 1, 1) 140 | 141 | # Feature extraction. 142 | descriptors = self.backbone(image) 143 | b, c, _, _ = descriptors.size() 144 | descriptors = descriptors.view(b, c, -1) 145 | 146 | # NetVLAD layer. 147 | descriptors = F.normalize(descriptors, dim=1) # Pre-normalization. 148 | desc = self.netvlad(descriptors) 149 | 150 | # Whiten if needed. 151 | if hasattr(self, 'whiten'): 152 | desc = self.whiten(desc) 153 | desc = F.normalize(desc, dim=1) # Final L2 normalization. 
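        # A minimal retrieval sketch (illustrative; the variable names below are assumed, not
        # defined in this file): because the global descriptor is L2-normalized, similarity
        # between a query and the database reduces to a dot product (cosine similarity), e.g.
        #   sim = query_desc @ db_descs.t()         # (1, N) similarities to N database images
        #   topk = sim.topk(k=10, dim=1).indices    # indices of the 10 closest database views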
154 | 155 | return { 156 | 'global_descriptor': desc 157 | } 158 | -------------------------------------------------------------------------------- /nerf_loc/models/image_retrieval/vis.py: -------------------------------------------------------------------------------- 1 | """ 2 | Author: jenningsliu 3 | Date: 2022-07-01 18:13:03 4 | LastEditors: jenningsliu 5 | LastEditTime: 2022-07-13 10:31:46 6 | FilePath: /nerf-loc/models/image_retrieval/vis.py 7 | Description: 8 | Copyright (c) 2022 by Tencent, All Rights Reserved. 9 | """ 10 | import os 11 | import sys 12 | import cv2 13 | import pickle 14 | import numpy as np 15 | 16 | if __name__ == '__main__': 17 | scene_path = sys.argv[1] 18 | split = sys.argv[2] 19 | method = sys.argv[3] 20 | 21 | with open(os.path.join(scene_path, f'image_retrieval_{split}_{method}.pkl'), 'rb') as f: 22 | data = pickle.load(f) 23 | 24 | for query, srcs in list(data.items())[::20]: 25 | query_img = cv2.imread(os.path.join(os.path.dirname(scene_path), query)) 26 | src_imgs = [] 27 | for src in srcs: 28 | src_img = cv2.imread(os.path.join(os.path.dirname(scene_path), src)) 29 | src_imgs.append(src_img) 30 | cv2.imwrite('vis_retrieval.png', np.concatenate([query_img]+src_imgs[:8], axis=1)) 31 | from IPython import embed;embed() 32 | -------------------------------------------------------------------------------- /nerf_loc/models/matcher.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import torch.nn as nn 3 | import torch.nn.functional as F 4 | from .COTR.transformer import SelfCrossTransformer 5 | from .COTR.position_encoding import PositionEmbeddingSine 6 | from .matching.sparse_to_dense import S2DMatching 7 | from .matching.coarse_matching import CoarseMatching 8 | from .matching.fine_matching import FinePreprocess, FineMatching 9 | 10 | class Matcher(nn.Module): 11 | def __init__(self, args, hidden_dim, in_channels_coarse, in_channels_fine, fine_matching=True): 12 | super().__init__() 13 | self.coarse_transformer = SelfCrossTransformer( 14 | d_model=hidden_dim, 15 | dropout=0.1, 16 | nhead=8, 17 | dim_feedforward=512, 18 | num_encoder_layers=6, 19 | num_decoder_layers=6, 20 | return_intermediate_dec=False, 21 | ) 22 | self.coarse_matcher = S2DMatching(hidden_dim, thr=0.2) 23 | # self.coarse_matcher = CoarseMatching({ 24 | # 'thr': 0.2, 25 | # # 'match_type': 'sinkhorn', 26 | # 'match_type': 'dual_softmax', 27 | # 'dsmax_temperature': 0.1, 28 | # 'skh_init_bin_score': 1.0, 29 | # 'skh_iters': 3, 30 | # 'skh_prefilter': False, 31 | # # 'sparse_spvs': True 32 | # 'sparse_spvs': False 33 | # }) 34 | 35 | self.fine_matching = fine_matching 36 | if fine_matching: 37 | self.pos_emd_2d_fn = PositionEmbeddingSine(hidden_dim // 2, normalize=True, sine_type='lin_sine') 38 | 39 | self.fine_window_size = 7 40 | self.fine_preprocess = FinePreprocess({ 41 | 'fine_concat_coarse_feat': False, 42 | 'fine_window_size': self.fine_window_size, 43 | 'in_channels_coarse': in_channels_coarse, 44 | 'in_channels_fine': in_channels_fine, 45 | 'out_channels': hidden_dim 46 | }) 47 | self.fine_transformer = SelfCrossTransformer( 48 | d_model=hidden_dim, 49 | dropout=0.1, 50 | nhead=8, 51 | dim_feedforward=128, 52 | num_encoder_layers=6, 53 | num_decoder_layers=6, 54 | return_intermediate_dec=False, 55 | ) 56 | self.fine_matcher = FineMatching({ 57 | 'feat_dim': hidden_dim, 58 | 'correct_thr': 1.0, 59 | # 'loss_type' :'l2_with_std' 60 | 'loss_type' : args.fine_matching_loss_type 61 | }) 62 | 63 | def forward(self, data): 64 | # 
kps_3d, desc_3d, kps_2d, desc_2d, conf_matrix_gt 65 | # data = {'conf_matrix_gt': conf_matrix_gt} 66 | # coarse matching 67 | # desc_3d, desc_2d = self.coarse_transformer( 68 | desc_3d_trans_c, desc_2d_trans_c = self.coarse_transformer( 69 | data['desc_3d'][None,...], data['pos_emd_3d'][None,...], 70 | data['desc_2d_coarse'][None,...], data['pos_emd_2d'][None,...]) 71 | # data['desc_3d'], data['desc_2d_coarse'] = desc_3d[0], desc_2d[0] 72 | data = self.coarse_matcher(desc_3d_trans_c[0], desc_2d_trans_c[0], data) 73 | i_ids = data['i_ids'] 74 | j_ids = data['j_ids'] # in coarse level 75 | data['b_ids'] = torch.zeros_like(i_ids) 76 | # mkps3d = data['kps3d'][i_ids] 77 | # mkps2d_c = data['kps2d'][j_ids] 78 | data.update({ 79 | 'mkps3d': data['kps3d'][i_ids], 80 | 'mkps2d_c': data['kps2d'][j_ids], 81 | 'pairs': [i_ids, j_ids] 82 | }) 83 | if not self.fine_matching: 84 | return data 85 | 86 | # fine matching 87 | if self.training: 88 | # use GT coarse matching for training fine matching 89 | i_ids = data['pairs_gt'][0] 90 | j_ids = data['pairs_gt'][1] 91 | data.update({ 92 | 'b_ids': torch.zeros_like(i_ids), 93 | 'i_ids': i_ids, 94 | 'j_ids': j_ids, 95 | 'mkps2d_c': data['kps2d'][j_ids], 96 | 'mkps3d': data['kps3d'][i_ids], 97 | }) 98 | # compute normalized offset 99 | data['expec_f_gt'] = (data['kps3d_proj_gt'][i_ids] - data['mkps2d_c']) / (self.fine_window_size // 2) 100 | 101 | M = len(i_ids) 102 | if M == 0: # no coarse matching at all 103 | assert self.training == False, "M is always >0, when training" 104 | data.update({ 105 | 'expec_f': torch.empty(0, 3, device=data['mkps2d_c'].device), 106 | 'mkps2d_f': data['mkps2d_c'], 107 | }) 108 | return data 109 | 110 | feat_fine = data['feat_fine'].permute(0,3,1,2) # B,C,H,W 111 | feat_coarse = data['feat_coarse'].permute(0,3,1,2) # B,C,H,W 112 | 113 | # in older version, we use desc_3d_trans_c here 114 | # matched_desc_3d = data['desc_3d'][i_ids][:,None,:] # M,C -> M,1,C 115 | matched_desc_3d = data['desc_3d_fine'][i_ids][:,None,:] # M,C -> M,1,C 116 | matched_pos_emd_3d = data['pos_emd_3d'][i_ids][:,None,:] # M,C -> M,1,C 117 | 118 | matched_desc_2d_fine = self.fine_preprocess(feat_fine, feat_coarse, data) # M,WW,C 119 | matched_pos_emd_2d_fine = self.pos_emd_2d_fn(matched_desc_2d_fine[...,0].view(M, self.fine_window_size, self.fine_window_size)).view(M, self.fine_window_size*self.fine_window_size, -1) # M,WW,C 120 | matched_desc_3d, matched_desc_2d_fine = self.fine_transformer( 121 | matched_desc_3d, matched_pos_emd_3d, # M,K,C 122 | matched_desc_2d_fine, matched_pos_emd_2d_fine) # M,WW,C 123 | 124 | data = self.fine_matcher(matched_desc_3d[:,0,:], matched_desc_2d_fine, data) 125 | 126 | if self.training: 127 | data['fine_err'] = torch.norm(data['expec_f_gt']-data['expec_f'][:,:2], dim=-1).mean() * (self.fine_window_size // 2) * data['stride_fine'] 128 | 129 | # # DEBUG 130 | # data['mkps2d_f'] = data['mkps2d_c'] + data['expec_f_gt'] * (self.fine_window_size // 2) 131 | return data 132 | -------------------------------------------------------------------------------- /nerf_loc/models/matching/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/JenningsL/nerf-loc/1d539c5a4824a46d26414f3c2b41bb1b1f6dd91e/nerf_loc/models/matching/__init__.py -------------------------------------------------------------------------------- /nerf_loc/models/matching/sparse_to_dense.py: -------------------------------------------------------------------------------- 1 | """ 2 | Author: 
jenningsliu 3 | Date: 2022-03-08 15:47:46 4 | LastEditors: jenningsliu 5 | LastEditTime: 2022-05-14 15:19:08 6 | FilePath: /nerf-loc/models/matching/sparse_to_dense.py 7 | Description: 8 | Copyright (c) 2022 by Tencent, All Rights Reserved. 9 | """ 10 | import torch 11 | import torch.nn as nn 12 | import torch.nn.functional as F 13 | 14 | class SigmoidFocalClassificationLoss(nn.Module): 15 | """ 16 | Sigmoid focal cross entropy loss. 17 | """ 18 | 19 | def __init__(self, gamma: float = 2.0, alpha: float = 0.25): 20 | """ 21 | Args: 22 | gamma: Weighting parameter to balance loss for hard and easy examples. 23 | alpha: Weighting parameter to balance loss for positive and negative examples. 24 | """ 25 | super(SigmoidFocalClassificationLoss, self).__init__() 26 | self.alpha = alpha 27 | self.gamma = gamma 28 | 29 | @staticmethod 30 | def sigmoid_cross_entropy_with_logits(input: torch.Tensor, target: torch.Tensor): 31 | """ PyTorch Implementation for tf.nn.sigmoid_cross_entropy_with_logits: 32 | max(x, 0) - x * z + log(1 + exp(-abs(x))) in 33 | https://www.tensorflow.org/api_docs/python/tf/nn/sigmoid_cross_entropy_with_logits 34 | 35 | Args: 36 | input: (B, #anchors, #classes) float tensor. 37 | Predicted logits for each class 38 | target: (B, #anchors, #classes) float tensor. 39 | One-hot encoded classification targets 40 | 41 | Returns: 42 | loss: (B, #anchors, #classes) float tensor. 43 | Sigmoid cross entropy loss without reduction 44 | """ 45 | loss = torch.clamp(input, min=0) - input * target + \ 46 | torch.log1p(torch.exp(-torch.abs(input))) 47 | return loss 48 | 49 | def forward(self, input: torch.Tensor, target: torch.Tensor, weights: torch.Tensor): 50 | """ 51 | Args: 52 | input: (B, #anchors, #classes) float tensor. 53 | Predicted logits for each class 54 | target: (B, #anchors, #classes) float tensor. 55 | One-hot encoded classification targets 56 | weights: (B, #anchors) float tensor. 57 | Anchor-wise weights. 58 | 59 | Returns: 60 | weighted_loss: (B, #anchors, #classes) float tensor after weighting. 
61 | """ 62 | pred_sigmoid = torch.sigmoid(input) 63 | alpha_weight = target * self.alpha + (1 - target) * (1 - self.alpha) 64 | pt = target * (1.0 - pred_sigmoid) + (1.0 - target) * pred_sigmoid 65 | focal_weight = alpha_weight * torch.pow(pt, self.gamma) 66 | 67 | bce_loss = self.sigmoid_cross_entropy_with_logits(input, target) 68 | # bce_loss = F.binary_cross_entropy_with_logits(input, target, reduction='none') 69 | 70 | loss = focal_weight * bce_loss 71 | 72 | if weights.shape.__len__() == 2 or \ 73 | (weights.shape.__len__() == 1 and target.shape.__len__() == 2): 74 | weights = weights.unsqueeze(-1) 75 | 76 | assert weights.shape.__len__() == loss.shape.__len__() 77 | 78 | return loss * weights 79 | 80 | class S2DMatching(nn.Module): 81 | def __init__(self, feat_dim, thr=0.1): 82 | super().__init__() 83 | self.mlps = nn.Sequential( 84 | nn.Linear(feat_dim, 128), 85 | nn.ReLU(inplace=True), 86 | nn.Linear(128, 128), 87 | nn.ReLU(inplace=True), 88 | nn.Linear(128, 1) 89 | ) 90 | self.thr = thr 91 | self.focal_loss = SigmoidFocalClassificationLoss() 92 | 93 | def get_loss(self, conf, conf_gt): 94 | """ 95 | Args: 96 | conf: N,M 97 | conf_gt: N,M 98 | Returns: 99 | """ 100 | # CE 101 | # loss = F.binary_cross_entropy_with_logits(conf, conf_gt, reduction='mean') 102 | 103 | # # focal loss 104 | # pos_mask, neg_mask = conf_gt == 1, conf_gt == 0 105 | # conf = torch.clamp(conf, 1e-6, 1-1e-6) 106 | # alpha = 0.25 107 | # gamma = 2.0 108 | # loss_pos = - alpha * torch.pow(1 - conf[pos_mask], gamma) * (conf[pos_mask]).log() 109 | # loss_neg = - alpha * torch.pow(conf[neg_mask], gamma) * (1 - conf[neg_mask]).log() 110 | # loss = loss_pos.mean() + loss_neg.mean() 111 | 112 | loss = self.focal_loss(conf.unsqueeze(2), conf_gt.unsqueeze(2), torch.ones_like(conf_gt.unsqueeze(2))).mean() 113 | 114 | return loss 115 | 116 | def forward(self, desc0, desc1, data): 117 | """ 118 | Args: 119 | desc0: (N,C) sparse set of descriptors 120 | desc1: (M,C) dense set of descriptors, where M >> N 121 | Returns: 122 | """ 123 | assert (desc0.shape[0] > 0) and (desc1.shape[0] > 0) 124 | # x = torch.einsum('NC,MC->NMC', desc0, desc1) 125 | x = torch.einsum('nc,mc->nmc', desc0, desc1) 126 | conf_matrix = self.mlps(x).squeeze(-1) # N,M 127 | score_matrix = torch.sigmoid(conf_matrix) 128 | 129 | # i_ids = torch.arange(len(desc0), device=desc0.device) 130 | # max_v, j_ids = score_matrix.max(dim=1) 131 | # valid_mask = max_v > self.thr 132 | # i_ids = i_ids[valid_mask] 133 | # j_ids = j_ids[valid_mask] 134 | 135 | # mutual nearest 136 | mask = score_matrix > self.thr 137 | mask = mask \ 138 | * (score_matrix == score_matrix.max(dim=1, keepdim=True)[0]) \ 139 | * (score_matrix == score_matrix.max(dim=0, keepdim=True)[0]) 140 | mask_v, all_j_ids = mask.max(dim=1) # N 141 | i_ids = torch.where(mask_v)[0] 142 | j_ids = all_j_ids[i_ids] # N' 143 | data.update({ 144 | 'i_ids': i_ids, 145 | 'j_ids': j_ids, 146 | 'score_matrix': score_matrix 147 | }) 148 | if self.training: 149 | data['coarse_loss'] = self.get_loss(conf_matrix, data['conf_matrix_gt'].float()) 150 | 151 | return data 152 | -------------------------------------------------------------------------------- /nerf_loc/models/ops/knn/src/knn.h: -------------------------------------------------------------------------------- 1 | /* 2 | * Copyright (c) Facebook, Inc. and its affiliates. 3 | * All rights reserved. 4 | * 5 | * This source code is licensed under the BSD-style license found in the 6 | * LICENSE file in the root directory of this source tree. 
7 |  */ 
8 | 
9 | #pragma once 
10 | #include <torch/extension.h> 
11 | #include <tuple> 
12 | #include "utils/pytorch3d_cutils.h" 
13 | 
14 | #define WITH_CUDA 1 
15 | 
16 | // Compute indices of K nearest neighbors in pointcloud p2 to points 
17 | // in pointcloud p1. 
18 | // 
19 | // Args: 
20 | //    p1: FloatTensor of shape (N, P1, D) giving a batch of pointclouds each 
21 | //        containing P1 points of dimension D. 
22 | //    p2: FloatTensor of shape (N, P2, D) giving a batch of pointclouds each 
23 | //        containing P2 points of dimension D. 
24 | //    lengths1: LongTensor, shape (N,), giving actual length of each P1 cloud. 
25 | //    lengths2: LongTensor, shape (N,), giving actual length of each P2 cloud. 
26 | //    K: int giving the number of nearest points to return. 
27 | //    version: Integer telling which implementation to use. 
28 | // 
29 | // Returns: 
30 | //    p1_neighbor_idx: LongTensor of shape (N, P1, K), where 
31 | //        p1_neighbor_idx[n, i, k] = j means that the kth nearest 
32 | //        neighbor to p1[n, i] in the cloud p2[n] is p2[n, j]. 
33 | //        It is padded with zeros so that it can be used easily in a later 
34 | //        gather() operation. 
35 | // 
36 | //    p1_neighbor_dists: FloatTensor of shape (N, P1, K) containing the squared 
37 | //        distance from each point p1[n, p, :] to its K neighbors 
38 | //        p2[n, p1_neighbor_idx[n, p, k], :]. 
39 | 
40 | // CPU implementation. 
41 | std::tuple<at::Tensor, at::Tensor> KNearestNeighborIdxCpu( 
42 |     const at::Tensor& p1, 
43 |     const at::Tensor& p2, 
44 |     const at::Tensor& lengths1, 
45 |     const at::Tensor& lengths2, 
46 |     int K); 
47 | 
48 | // CUDA implementation 
49 | std::tuple<at::Tensor, at::Tensor> KNearestNeighborIdxCuda( 
50 |     const at::Tensor& p1, 
51 |     const at::Tensor& p2, 
52 |     const at::Tensor& lengths1, 
53 |     const at::Tensor& lengths2, 
54 |     int K, 
55 |     int version); 
56 | 
57 | // Implementation which is exposed. 
58 | std::tuple<at::Tensor, at::Tensor> KNearestNeighborIdx( 
59 |     const at::Tensor& p1, 
60 |     const at::Tensor& p2, 
61 |     const at::Tensor& lengths1, 
62 |     const at::Tensor& lengths2, 
63 |     int K, 
64 |     int version) { 
65 |   if (p1.is_cuda() || p2.is_cuda()) { 
66 | #ifdef WITH_CUDA 
67 |     CHECK_CUDA(p1); 
68 |     CHECK_CUDA(p2); 
69 |     return KNearestNeighborIdxCuda(p1, p2, lengths1, lengths2, K, version); 
70 | #else 
71 |     AT_ERROR("Not compiled with GPU support."); 
72 | #endif 
73 |   } 
74 |   return KNearestNeighborIdxCpu(p1, p2, lengths1, lengths2, K); 
75 | } 
76 | 
77 | // Compute gradients with respect to p1 and p2 
78 | // 
79 | // Args: 
80 | //    p1: FloatTensor of shape (N, P1, D) giving a batch of pointclouds each 
81 | //        containing P1 points of dimension D. 
82 | //    p2: FloatTensor of shape (N, P2, D) giving a batch of pointclouds each 
83 | //        containing P2 points of dimension D. 
84 | //    lengths1: LongTensor, shape (N,), giving actual length of each P1 cloud. 
85 | //    lengths2: LongTensor, shape (N,), giving actual length of each P2 cloud. 
86 | //    p1_neighbor_idx: LongTensor of shape (N, P1, K), where 
87 | //        p1_neighbor_idx[n, i, k] = j means that the kth nearest 
88 | //        neighbor to p1[n, i] in the cloud p2[n] is p2[n, j]. 
89 | //        It is padded with zeros so that it can be used easily in a later 
90 | //        gather() operation. This is computed from the forward pass. 
91 | //    grad_dists: FloatTensor of shape (N, P1, K) which contains the input 
92 | //        gradients. 
93 | // 
94 | // Returns: 
95 | //    grad_p1: FloatTensor of shape (N, P1, D) containing the output gradients 
96 | //        wrt p1. 
97 | //    grad_p2: FloatTensor of shape (N, P2, D) containing the output gradients 
98 | //        wrt p2. 
99 | 
100 | // CPU implementation.
101 | std::tuple<at::Tensor, at::Tensor> KNearestNeighborBackwardCpu( 
102 |     const at::Tensor& p1, 
103 |     const at::Tensor& p2, 
104 |     const at::Tensor& lengths1, 
105 |     const at::Tensor& lengths2, 
106 |     const at::Tensor& idxs, 
107 |     const at::Tensor& grad_dists); 
108 | 
109 | // CUDA implementation 
110 | std::tuple<at::Tensor, at::Tensor> KNearestNeighborBackwardCuda( 
111 |     const at::Tensor& p1, 
112 |     const at::Tensor& p2, 
113 |     const at::Tensor& lengths1, 
114 |     const at::Tensor& lengths2, 
115 |     const at::Tensor& idxs, 
116 |     const at::Tensor& grad_dists); 
117 | 
118 | // Implementation which is exposed. 
119 | std::tuple<at::Tensor, at::Tensor> KNearestNeighborBackward( 
120 |     const at::Tensor& p1, 
121 |     const at::Tensor& p2, 
122 |     const at::Tensor& lengths1, 
123 |     const at::Tensor& lengths2, 
124 |     const at::Tensor& idxs, 
125 |     const at::Tensor& grad_dists) { 
126 |   if (p1.is_cuda() || p2.is_cuda()) { 
127 | #ifdef WITH_CUDA 
128 |     CHECK_CUDA(p1); 
129 |     CHECK_CUDA(p2); 
130 |     return KNearestNeighborBackwardCuda( 
131 |         p1, p2, lengths1, lengths2, idxs, grad_dists); 
132 | #else 
133 |     AT_ERROR("Not compiled with GPU support."); 
134 | #endif 
135 |   } 
136 |   return KNearestNeighborBackwardCpu( 
137 |       p1, p2, lengths1, lengths2, idxs, grad_dists); 
138 | } 
139 | 
140 | // Utility to check whether a KNN version can be used. 
141 | // 
142 | // Args: 
143 | //    version: Integer in the range 0 <= version <= 3 indicating one of our 
144 | //        KNN implementations. 
145 | //    D: Number of dimensions for the input and query point clouds 
146 | //    K: Number of neighbors to be found 
147 | // 
148 | // Returns: 
149 | //    Whether the indicated KNN version can be used. 
150 | bool KnnCheckVersion(int version, const int64_t D, const int64_t K); 
151 | -------------------------------------------------------------------------------- /nerf_loc/models/ops/knn/src/knn_api.cpp: -------------------------------------------------------------------------------- 
1 | #include 
2 | #include 
3 | #include 
4 | #include 
5 | #include 
6 | 
7 | #include "knn.h" 
8 | 
9 | 
10 | PYBIND11_MODULE(TORCH_EXTENSION_NAME, m) { 
11 |   m.def("knn_check_version", &KnnCheckVersion); 
12 |   m.def("knn_points_idx", &KNearestNeighborIdx); 
13 |   m.def("knn_points_backward", &KNearestNeighborBackward); 
14 | } 
15 | -------------------------------------------------------------------------------- /nerf_loc/models/ops/knn/src/knn_cpu.cpp: -------------------------------------------------------------------------------- 
1 | /* 
2 |  * Copyright (c) Facebook, Inc. and its affiliates. 
3 |  * All rights reserved. 
4 |  * 
5 |  * This source code is licensed under the BSD-style license found in the 
6 |  * LICENSE file in the root directory of this source tree.
-------------------------------------------------------------------------------- /nerf_loc/models/ops/knn/src/knn_cpu.cpp: -------------------------------------------------------------------------------- 1 | /* 2 |  * Copyright (c) Facebook, Inc. and its affiliates. 3 |  * All rights reserved. 4 |  * 5 |  * This source code is licensed under the BSD-style license found in the 6 |  * LICENSE file in the root directory of this source tree. 7 |  */ 8 | 9 | #include <torch/extension.h> 10 | #include <queue> 11 | #include <tuple> 12 | 13 | std::tuple<at::Tensor, at::Tensor> KNearestNeighborIdxCpu( 14 |     const at::Tensor& p1, 15 |     const at::Tensor& p2, 16 |     const at::Tensor& lengths1, 17 |     const at::Tensor& lengths2, 18 |     int K) { 19 |   const int N = p1.size(0); 20 |   const int P1 = p1.size(1); 21 |   const int D = p1.size(2); 22 | 23 |   auto long_opts = lengths1.options().dtype(torch::kInt64); 24 |   torch::Tensor idxs = torch::full({N, P1, K}, 0, long_opts); 25 |   torch::Tensor dists = torch::full({N, P1, K}, 0, p1.options()); 26 | 27 |   auto p1_a = p1.accessor<float, 3>(); 28 |   auto p2_a = p2.accessor<float, 3>(); 29 |   auto lengths1_a = lengths1.accessor<int64_t, 1>(); 30 |   auto lengths2_a = lengths2.accessor<int64_t, 1>(); 31 |   auto idxs_a = idxs.accessor<int64_t, 3>(); 32 |   auto dists_a = dists.accessor<float, 3>(); 33 | 34 |   for (int n = 0; n < N; ++n) { 35 |     const int64_t length1 = lengths1_a[n]; 36 |     const int64_t length2 = lengths2_a[n]; 37 |     for (int64_t i1 = 0; i1 < length1; ++i1) { 38 |       // Use a priority queue to store (distance, index) tuples. 39 |       std::priority_queue<std::tuple<float, int64_t>> q; 40 |       for (int64_t i2 = 0; i2 < length2; ++i2) { 41 |         float dist = 0; 42 |         for (int d = 0; d < D; ++d) { 43 |           float diff = p1_a[n][i1][d] - p2_a[n][i2][d]; 44 |           dist += diff * diff; 45 |         } 46 |         int size = static_cast<int>(q.size()); 47 |         if (size < K || dist < std::get<0>(q.top())) { 48 |           q.emplace(dist, i2); 49 |           if (size >= K) { 50 |             q.pop(); 51 |           } 52 |         } 53 |       } 54 |       while (!q.empty()) { // drain the max-heap: the largest distance pops first, so slot q.size() receives it and results end up sorted in ascending order 55 |         auto t = q.top(); 56 |         q.pop(); 57 |         const int k = q.size(); 58 |         dists_a[n][i1][k] = std::get<0>(t); 59 |         idxs_a[n][i1][k] = std::get<1>(t); 60 |       } 61 |     } 62 |   } 63 |   return std::make_tuple(idxs, dists); 64 | } 65 | 66 | // ------------------------------------------------------------- // 67 | //                   Backward Operators                          // 68 | // ------------------------------------------------------------- // 69 | 70 | std::tuple<at::Tensor, at::Tensor> KNearestNeighborBackwardCpu( 71 |     const at::Tensor& p1, 72 |     const at::Tensor& p2, 73 |     const at::Tensor& lengths1, 74 |     const at::Tensor& lengths2, 75 |     const at::Tensor& idxs, 76 |     const at::Tensor& grad_dists) { 77 |   const int N = p1.size(0); 78 |   const int P1 = p1.size(1); 79 |   const int D = p1.size(2); 80 |   const int P2 = p2.size(1); 81 |   const int K = idxs.size(2); 82 | 83 |   torch::Tensor grad_p1 = torch::full({N, P1, D}, 0, p1.options()); 84 |   torch::Tensor grad_p2 = torch::full({N, P2, D}, 0, p2.options()); 85 | 86 |   auto p1_a = p1.accessor<float, 3>(); 87 |   auto p2_a = p2.accessor<float, 3>(); 88 |   auto lengths1_a = lengths1.accessor<int64_t, 1>(); 89 |   auto lengths2_a = lengths2.accessor<int64_t, 1>(); 90 |   auto idxs_a = idxs.accessor<int64_t, 3>(); 91 |   auto grad_dists_a = grad_dists.accessor<float, 3>(); 92 |   auto grad_p1_a = grad_p1.accessor<float, 3>(); 93 |   auto grad_p2_a = grad_p2.accessor<float, 3>(); 94 | 95 |   for (int n = 0; n < N; ++n) { 96 |     const int64_t length1 = lengths1_a[n]; 97 |     int64_t length2 = lengths2_a[n]; 98 |     length2 = (length2 < K) ?
length2 : K; 99 |     for (int64_t i1 = 0; i1 < length1; ++i1) { 100 |       for (int64_t k = 0; k < length2; ++k) { 101 |         const int64_t i2 = idxs_a[n][i1][k]; 102 |         // If the index is the pad value of -1 then ignore it 103 |         if (i2 == -1) { 104 |           continue; 105 |         } 106 |         for (int64_t d = 0; d < D; ++d) { 107 |           const float diff = 108 |               2.0f * grad_dists_a[n][i1][k] * (p1_a[n][i1][d] - p2_a[n][i2][d]); 109 |           grad_p1_a[n][i1][d] += diff; 110 |           grad_p2_a[n][i2][d] += -1.0f * diff; 111 |         } 112 |       } 113 |     } 114 |   } 115 |   return std::make_tuple(grad_p1, grad_p2); 116 | } 117 | -------------------------------------------------------------------------------- /nerf_loc/models/ops/knn/src/utils/index_utils.cuh: -------------------------------------------------------------------------------- 1 | /* 2 |  * Copyright (c) Facebook, Inc. and its affiliates. 3 |  * All rights reserved. 4 |  * 5 |  * This source code is licensed under the BSD-style license found in the 6 |  * LICENSE file in the root directory of this source tree. 7 |  */ 8 | 9 | // This converts dynamic array lookups into static array lookups, for small 10 | // arrays up to size 32. 11 | // 12 | // Suppose we have a small thread-local array: 13 | // 14 | // float vals[10]; 15 | // 16 | // Ideally we should only index this array using static indices: 17 | // 18 | // for (int i = 0; i < 10; ++i) vals[i] = i * i; 19 | // 20 | // If we do so, then the CUDA compiler may be able to place the array into 21 | // registers, which can have a big performance improvement. However if we 22 | // access the array dynamically, the compiler may force the array into 23 | // local memory, which has the same latency as global memory. 24 | // 25 | // These functions convert dynamic array access into static array access 26 | // using a brute-force lookup table. It can be used like this: 27 | // 28 | // float vals[10]; 29 | // int idx = 3; 30 | // float val = 3.14f; 31 | // RegisterIndexUtils<float, 10>::set(vals, idx, val); 32 | // float val2 = RegisterIndexUtils<float, 10>::get(vals, idx); 33 | // 34 | // The implementation is based on fbcuda/RegisterUtils.cuh: 35 | // https://github.com/facebook/fbcuda/blob/master/RegisterUtils.cuh 36 | // To avoid depending on the entire library, we just reimplement these two 37 | // functions. The fbcuda implementation is a bit more sophisticated, and uses 38 | // the preprocessor to generate switch statements that go up to N for each 39 | // value of N. We are lazy and just have a giant explicit switch statement. 40 | // 41 | // We might be able to use a template metaprogramming approach similar to 42 | // DispatchKernel1D for this. However DispatchKernel1D is intended to be used 43 | // for dispatching to the correct CUDA kernel on the host, while this is 44 | // intended to run on the device. I was concerned that a metaprogramming 45 | // approach for this might lead to extra function calls at runtime if the 46 | // compiler fails to optimize them away, which could be very slow on device. 47 | // However I didn't actually benchmark or test this.
48 | template 49 | struct RegisterIndexUtils { 50 | __device__ __forceinline__ static T get(const T arr[N], int idx) { 51 | if (idx < 0 || idx >= N) 52 | return T(); 53 | switch (idx) { 54 | case 0: 55 | return arr[0]; 56 | case 1: 57 | return arr[1]; 58 | case 2: 59 | return arr[2]; 60 | case 3: 61 | return arr[3]; 62 | case 4: 63 | return arr[4]; 64 | case 5: 65 | return arr[5]; 66 | case 6: 67 | return arr[6]; 68 | case 7: 69 | return arr[7]; 70 | case 8: 71 | return arr[8]; 72 | case 9: 73 | return arr[9]; 74 | case 10: 75 | return arr[10]; 76 | case 11: 77 | return arr[11]; 78 | case 12: 79 | return arr[12]; 80 | case 13: 81 | return arr[13]; 82 | case 14: 83 | return arr[14]; 84 | case 15: 85 | return arr[15]; 86 | case 16: 87 | return arr[16]; 88 | case 17: 89 | return arr[17]; 90 | case 18: 91 | return arr[18]; 92 | case 19: 93 | return arr[19]; 94 | case 20: 95 | return arr[20]; 96 | case 21: 97 | return arr[21]; 98 | case 22: 99 | return arr[22]; 100 | case 23: 101 | return arr[23]; 102 | case 24: 103 | return arr[24]; 104 | case 25: 105 | return arr[25]; 106 | case 26: 107 | return arr[26]; 108 | case 27: 109 | return arr[27]; 110 | case 28: 111 | return arr[28]; 112 | case 29: 113 | return arr[29]; 114 | case 30: 115 | return arr[30]; 116 | case 31: 117 | return arr[31]; 118 | }; 119 | return T(); 120 | } 121 | 122 | __device__ __forceinline__ static void set(T arr[N], int idx, T val) { 123 | if (idx < 0 || idx >= N) 124 | return; 125 | switch (idx) { 126 | case 0: 127 | arr[0] = val; 128 | break; 129 | case 1: 130 | arr[1] = val; 131 | break; 132 | case 2: 133 | arr[2] = val; 134 | break; 135 | case 3: 136 | arr[3] = val; 137 | break; 138 | case 4: 139 | arr[4] = val; 140 | break; 141 | case 5: 142 | arr[5] = val; 143 | break; 144 | case 6: 145 | arr[6] = val; 146 | break; 147 | case 7: 148 | arr[7] = val; 149 | break; 150 | case 8: 151 | arr[8] = val; 152 | break; 153 | case 9: 154 | arr[9] = val; 155 | break; 156 | case 10: 157 | arr[10] = val; 158 | break; 159 | case 11: 160 | arr[11] = val; 161 | break; 162 | case 12: 163 | arr[12] = val; 164 | break; 165 | case 13: 166 | arr[13] = val; 167 | break; 168 | case 14: 169 | arr[14] = val; 170 | break; 171 | case 15: 172 | arr[15] = val; 173 | break; 174 | case 16: 175 | arr[16] = val; 176 | break; 177 | case 17: 178 | arr[17] = val; 179 | break; 180 | case 18: 181 | arr[18] = val; 182 | break; 183 | case 19: 184 | arr[19] = val; 185 | break; 186 | case 20: 187 | arr[20] = val; 188 | break; 189 | case 21: 190 | arr[21] = val; 191 | break; 192 | case 22: 193 | arr[22] = val; 194 | break; 195 | case 23: 196 | arr[23] = val; 197 | break; 198 | case 24: 199 | arr[24] = val; 200 | break; 201 | case 25: 202 | arr[25] = val; 203 | break; 204 | case 26: 205 | arr[26] = val; 206 | break; 207 | case 27: 208 | arr[27] = val; 209 | break; 210 | case 28: 211 | arr[28] = val; 212 | break; 213 | case 29: 214 | arr[29] = val; 215 | break; 216 | case 30: 217 | arr[30] = val; 218 | break; 219 | case 31: 220 | arr[31] = val; 221 | break; 222 | } 223 | } 224 | }; 225 | -------------------------------------------------------------------------------- /nerf_loc/models/ops/knn/src/utils/mink.cuh: -------------------------------------------------------------------------------- 1 | /* 2 | * Copyright (c) Facebook, Inc. and its affiliates. 3 | * All rights reserved. 4 | * 5 | * This source code is licensed under the BSD-style license found in the 6 | * LICENSE file in the root directory of this source tree. 
7 |  */ 8 | 9 | #pragma once 10 | #define MINK_H 11 | 12 | #include "index_utils.cuh" 13 | 14 | // A data structure to keep track of the smallest K keys seen so far as well 15 | // as their associated values, intended to be used in device code. 16 | // This data structure doesn't allocate any memory; keys and values are stored 17 | // in arrays passed to the constructor. 18 | // 19 | // The implementation is generic; it can be used for any key type that supports 20 | // the < operator, and can be used with any value type. 21 | // 22 | // Example usage: 23 | // 24 | // float keys[K]; 25 | // int values[K]; 26 | // MinK<float, int> mink(keys, values, K); 27 | // for (...) { 28 | //   // Produce some key and value from somewhere 29 | //   mink.add(key, value); 30 | // } 31 | // mink.sort(); 32 | // 33 | // Now keys and values store the smallest K keys seen so far and the values 34 | // associated to these keys: 35 | // 36 | // for (int k = 0; k < K; ++k) { 37 | //   float key_k = keys[k]; 38 | //   int value_k = values[k]; 39 | // } 40 | template <typename key_t, typename value_t> 41 | class MinK { 42 |  public: 43 |   // Constructor. 44 |   // 45 |   // Arguments: 46 |   //   keys: Array in which to store keys 47 |   //   values: Array in which to store values 48 |   //   K: How many values to keep track of 49 |   __device__ MinK(key_t* keys, value_t* vals, int K) 50 |       : keys(keys), vals(vals), K(K), _size(0) {} 51 | 52 |   // Try to add a new key and associated value to the data structure. If the key 53 |   // is one of the smallest K seen so far then it will be kept; otherwise it 54 |   // will not be kept. 55 |   // 56 |   // This takes O(1) operations if the new key is not kept, or if the structure 57 |   // currently contains fewer than K elements. Otherwise this takes O(K) time. 58 |   // 59 |   // Arguments: 60 |   //   key: The key to add 61 |   //   val: The value associated to the key 62 |   __device__ __forceinline__ void add(const key_t& key, const value_t& val) { 63 |     if (_size < K) { 64 |       keys[_size] = key; 65 |       vals[_size] = val; 66 |       if (_size == 0 || key > max_key) { 67 |         max_key = key; 68 |         max_idx = _size; 69 |       } 70 |       _size++; 71 |     } else if (key < max_key) { 72 |       keys[max_idx] = key; 73 |       vals[max_idx] = val; 74 |       max_key = key; 75 |       for (int k = 0; k < K; ++k) { 76 |         key_t cur_key = keys[k]; 77 |         if (cur_key > max_key) { 78 |           max_key = cur_key; 79 |           max_idx = k; 80 |         } 81 |       } 82 |     } 83 |   } 84 | 85 |   // Get the number of items currently stored in the structure. 86 |   // This takes O(1) time. 87 |   __device__ __forceinline__ int size() { 88 |     return _size; 89 |   } 90 | 91 |   // Sort the items stored in the structure using bubble sort. 92 |   // This takes O(K^2) time. 93 |   __device__ __forceinline__ void sort() { 94 |     for (int i = 0; i < _size - 1; ++i) { 95 |       for (int j = 0; j < _size - i - 1; ++j) { 96 |         if (keys[j + 1] < keys[j]) { 97 |           key_t key = keys[j]; 98 |           value_t val = vals[j]; 99 |           keys[j] = keys[j + 1]; 100 |           vals[j] = vals[j + 1]; 101 |           keys[j + 1] = key; 102 |           vals[j + 1] = val; 103 |         } 104 |       } 105 |     } 106 |   } 107 | 108 |  private: 109 |   key_t* keys; 110 |   value_t* vals; 111 |   int K; 112 |   int _size; 113 |   key_t max_key; 114 |   int max_idx; 115 | }; 116 | 117 | // This is a version of MinK that only touches the arrays using static indexing 118 | // via RegisterIndexUtils. If the keys and values are stored in thread-local 119 | // arrays, then this may allow the compiler to place them in registers for 120 | // fast access. 121 | // 122 | // This has the same API as MinK, but doesn't support sorting.
123 | // We found that sorting via RegisterIndexUtils gave very poor performance, 124 | // and suspect it may have prevented the compiler from placing the arrays 125 | // into registers. 126 | template <typename key_t, typename value_t, int K> 127 | class RegisterMinK { 128 |  public: 129 |   __device__ RegisterMinK(key_t* keys, value_t* vals) 130 |       : keys(keys), vals(vals), _size(0) {} 131 | 132 |   __device__ __forceinline__ void add(const key_t& key, const value_t& val) { 133 |     if (_size < K) { 134 |       RegisterIndexUtils<key_t, K>::set(keys, _size, key); 135 |       RegisterIndexUtils<value_t, K>::set(vals, _size, val); 136 |       if (_size == 0 || key > max_key) { 137 |         max_key = key; 138 |         max_idx = _size; 139 |       } 140 |       _size++; 141 |     } else if (key < max_key) { 142 |       RegisterIndexUtils<key_t, K>::set(keys, max_idx, key); 143 |       RegisterIndexUtils<value_t, K>::set(vals, max_idx, val); 144 |       max_key = key; 145 |       for (int k = 0; k < K; ++k) { 146 |         key_t cur_key = RegisterIndexUtils<key_t, K>::get(keys, k); 147 |         if (cur_key > max_key) { 148 |           max_key = cur_key; 149 |           max_idx = k; 150 |         } 151 |       } 152 |     } 153 |   } 154 | 155 |   __device__ __forceinline__ int size() { 156 |     return _size; 157 |   } 158 | 159 |  private: 160 |   key_t* keys; 161 |   value_t* vals; 162 |   int _size; 163 |   key_t max_key; 164 |   int max_idx; 165 | }; 166 | -------------------------------------------------------------------------------- /nerf_loc/models/ops/knn/src/utils/pytorch3d_cutils.h: -------------------------------------------------------------------------------- 1 | /* 2 |  * Copyright (c) Facebook, Inc. and its affiliates. 3 |  * All rights reserved. 4 |  * 5 |  * This source code is licensed under the BSD-style license found in the 6 |  * LICENSE file in the root directory of this source tree. 7 |  */ 8 | 9 | #pragma once 10 | #include <torch/extension.h> 11 | 12 | #define CHECK_CUDA(x) TORCH_CHECK(x.is_cuda(), #x " must be a CUDA tensor.") 13 | #define CHECK_CONTIGUOUS(x) \ 14 |   TORCH_CHECK(x.is_contiguous(), #x " must be contiguous.") 15 | #define CHECK_CONTIGUOUS_CUDA(x) \ 16 |   CHECK_CUDA(x);                 \ 17 |   CHECK_CONTIGUOUS(x) 18 | 19 | // Max possible threads per block 20 | const int MAX_THREADS_PER_BLOCK = 1024; 21 | -------------------------------------------------------------------------------- /nerf_loc/models/utils.py: -------------------------------------------------------------------------------- 1 | """ 2 | Author: jenningsliu 3 | Date: 2022-03-20 13:05:04 4 | LastEditors: jenningsliu 5 | LastEditTime: 2022-06-21 17:35:12 6 | FilePath: /nerf-loc/models/utils.py 7 | Description: 8 | Copyright (c) 2022 by Tencent, All Rights Reserved. 9 | """ 10 | import torch 11 | 12 | def camera_project(p3d, K): 13 |     # K0 = torch.eye(3).to(K.device) 14 |     # K0[1,1] = -1 15 |     # K0[2,2] = -1 16 |     # p3d = torch.mm(K0, p3d.t()) 17 |     uvz = torch.mm(K, p3d.t()) 18 |     z = uvz[2] 19 |     u = uvz[0] / z 20 |     v = uvz[1] / z 21 |     return u,v,z 22 |
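`camera_project` above multiplies by the intrinsics `K` directly, so it expects `p3d` to already be expressed in the camera frame; the commented-out `K0` lines are a disabled axis-flip variant. A minimal usage sketch with made-up intrinsics (all numbers are illustrative):

```
import torch
from nerf_loc.models.utils import camera_project

# Illustrative pinhole intrinsics; fx, fy, cx, cy are arbitrary values.
K = torch.tensor([[500.0,   0.0, 320.0],
                  [  0.0, 500.0, 240.0],
                  [  0.0,   0.0,   1.0]])
# Two points in camera coordinates with z > 0 (in front of the camera).
p3d = torch.tensor([[0.0,  0.0, 2.0],
                    [0.1, -0.2, 4.0]])
u, v, z = camera_project(p3d, K)
# The first point lies on the optical axis, so it projects to the principal
# point: u[0] == 320, v[0] == 240, z[0] == 2.
```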
-------------------------------------------------------------------------------- /nerf_loc/utils/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/JenningsL/nerf-loc/1d539c5a4824a46d26414f3c2b41bb1b1f6dd91e/nerf_loc/utils/__init__.py -------------------------------------------------------------------------------- /nerf_loc/utils/metrics.py: -------------------------------------------------------------------------------- 1 | """ 2 | Author: jenningsliu 3 | Date: 2022-03-01 14:33:19 4 | LastEditors: jenningsliu 5 | LastEditTime: 2022-04-08 19:38:03 6 | FilePath: /nerf-loc/utils/metrics.py 7 | Description: 8 | Copyright (c) 2022 by Tencent, All Rights Reserved. 9 | """ 10 | import cv2 11 | import numpy as np 12 | import math 13 | 14 | def compute_pose_error(T_est, T_gt): 15 |     """ assuming the two transformation matrices are at the same scale 16 |     Args: 17 |         T_est: np.array (4,4) 18 |         T_gt: np.array (4,4) 19 |     Returns: 20 |         angular_err: in degrees 21 |         translation_err: in the units of the input translations (e.g. meters) 22 |     """ 23 |     r1 = T_est[:3,:3] 24 |     r2 = T_gt[:3,:3] 25 |     rot_diff = r2 @ r1.T 26 |     trace = cv2.trace(rot_diff)[0] 27 |     trace = min(3.0, max(-1.0, trace))  # clamp to the valid range of a rotation-matrix trace 28 |     angular_err = 180*math.acos((trace-1.0)/2.0)/np.pi 29 | 30 |     t1 = T_est[:3,3] 31 |     t2 = T_gt[:3,3] 32 |     translation_err = np.linalg.norm(t1-t2) 33 |     return angular_err, translation_err 34 | 35 | 36 | def compute_matching_iou(pairs, pairs_gt): 37 |     pred = zip(pairs[0].cpu().numpy().tolist(), pairs[1].cpu().numpy().tolist()) 38 |     gt = zip(pairs_gt[0].cpu().numpy().tolist(), pairs_gt[1].cpu().numpy().tolist()) 39 |     pred = set(pred) 40 |     gt = set(gt) 41 |     return len(pred.intersection(gt)) / (len(pred.union(gt))+1e-8) 42 | -------------------------------------------------------------------------------- /nerf_loc/utils/transform/__init__.py: -------------------------------------------------------------------------------- 1 | """ 2 | Author: jenningsliu 3 | Date: 2022-04-27 10:02:03 4 | LastEditors: jenningsliu 5 | LastEditTime: 2022-04-27 10:39:51 6 | FilePath: /nerf-loc/utils/transform/__init__.py 7 | Description: 8 | Copyright (c) 2022 by Tencent, All Rights Reserved. 9 | """ 10 | import torch 11 | import numpy as np 12 | from .rotation_conversions import euler_angles_to_matrix 13 | 14 | def get_pose_perturb(translation_noise, rotation_noise): 15 |     pose_perturb = torch.eye(4) 16 |     radians = rotation_noise * (2*torch.rand(3)-1) * np.pi / 180 17 |     pose_perturb[:3,:3] = euler_angles_to_matrix(radians, 'XYZ') 18 |     pose_perturb[:3,3] = (2*torch.rand(3)-1) * translation_noise 19 |     return pose_perturb 20 |
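`get_pose_perturb` above draws a random rigid perturbation: `rotation_noise` is given in degrees (converted to radians before `euler_angles_to_matrix`) and `translation_noise` in scene units, each sampled uniformly in `[-noise, +noise]`. A minimal sketch of applying it to a 4x4 pose; composing on the left is an illustrative assumption, as the composition used elsewhere in the repository is not shown in this excerpt:

```
import torch
from nerf_loc.utils.transform import get_pose_perturb

pose_gt = torch.eye(4)  # stand-in for a 4x4 camera pose
# Up to 0.05 scene units of translation noise and 5 degrees of rotation noise.
perturb = get_pose_perturb(translation_noise=0.05, rotation_noise=5.0)
pose_noisy = perturb @ pose_gt  # left-composition chosen only for illustration
```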
-------------------------------------------------------------------------------- /nerf_loc/utils/transform/math.py: -------------------------------------------------------------------------------- 1 | """ 2 | Author: jenningsliu 3 | Date: 2022-04-11 19:14:07 4 | LastEditors: jenningsliu 5 | LastEditTime: 2022-04-11 19:15:08 6 | FilePath: /nerf-loc/utils/transform/math.py 7 | Description: 8 | Copyright (c) 2022 by Tencent, All Rights Reserved. 9 | """ 10 | # Copyright (c) Meta Platforms, Inc. and affiliates. 11 | # All rights reserved. 12 | # 13 | # This source code is licensed under the BSD-style license found in the 14 | # LICENSE file in the root directory of this source tree. 15 | 16 | import math 17 | from typing import Tuple 18 | 19 | import torch 20 | 21 | 22 | DEFAULT_ACOS_BOUND: float = 1.0 - 1e-4 23 | 24 | 25 | def acos_linear_extrapolation( 26 |     x: torch.Tensor, 27 |     bounds: Tuple[float, float] = (-DEFAULT_ACOS_BOUND, DEFAULT_ACOS_BOUND), 28 | ) -> torch.Tensor: 29 |     """ 30 |     Implements `arccos(x)` which is linearly extrapolated outside `x`'s original 31 |     domain of `(-1, 1)`. This allows for stable backpropagation in case `x` 32 |     is not guaranteed to be strictly within `(-1, 1)`. 33 | 34 |     More specifically: 35 |     ``` 36 |     bounds=(lower_bound, upper_bound) 37 |     if lower_bound <= x <= upper_bound: 38 |         acos_linear_extrapolation(x) = acos(x) 39 |     elif x <= lower_bound: # 1st order Taylor approximation 40 |         acos_linear_extrapolation(x) 41 |             = acos(lower_bound) + dacos/dx(lower_bound) * (x - lower_bound) 42 |     else:  # x >= upper_bound 43 |         acos_linear_extrapolation(x) 44 |             = acos(upper_bound) + dacos/dx(upper_bound) * (x - upper_bound) 45 |     ``` 46 | 47 |     Args: 48 |         x: Input `Tensor`. 49 |         bounds: A float 2-tuple defining the region for the 50 |             linear extrapolation of `acos`. 51 |             The first/second element of `bound` 52 |             describes the lower/upper bound that defines the lower/upper 53 |             extrapolation region, i.e. the region where 54 |             `x <= bound[0]`/`bound[1] <= x`. 55 |             Note that all elements of `bound` have to be within (-1, 1). 56 |     Returns: 57 |         acos_linear_extrapolation: `Tensor` containing the extrapolated `arccos(x)`. 58 |     """ 59 | 60 |     lower_bound, upper_bound = bounds 61 | 62 |     if lower_bound > upper_bound: 63 |         raise ValueError("lower bound has to be smaller or equal to upper bound.") 64 | 65 |     if lower_bound <= -1.0 or upper_bound >= 1.0: 66 |         raise ValueError("Both lower bound and upper bound have to be within (-1, 1).") 67 | 68 |     # init an empty tensor and define the domain sets 69 |     acos_extrap = torch.empty_like(x) 70 |     x_upper = x >= upper_bound 71 |     x_lower = x <= lower_bound 72 |     x_mid = (~x_upper) & (~x_lower) 73 | 74 |     # acos calculation for lower_bound < x < upper_bound 75 |     acos_extrap[x_mid] = torch.acos(x[x_mid]) 76 |     # the linear extrapolation for x >= upper_bound 77 |     acos_extrap[x_upper] = _acos_linear_approximation(x[x_upper], upper_bound) 78 |     # the linear extrapolation for x <= lower_bound 79 |     acos_extrap[x_lower] = _acos_linear_approximation(x[x_lower], lower_bound) 80 | 81 |     return acos_extrap 82 | 83 | 84 | 85 | def _acos_linear_approximation(x: torch.Tensor, x0: float) -> torch.Tensor: 86 |     """ 87 |     Calculates the 1st order Taylor expansion of `arccos(x)` around `x0`. 88 |     """ 89 |     return (x - x0) * _dacos_dx(x0) + math.acos(x0) 90 | 91 | 92 | def _dacos_dx(x: float) -> float: 93 |     """ 94 |     Calculates the derivative of `arccos(x)` w.r.t. `x`. 95 |     """ 96 |     return (-1.0) / math.sqrt(1.0 - x * x) 97 | -------------------------------------------------------------------------------- /nerf_loc/utils/transform/rotation_conversions.py: -------------------------------------------------------------------------------- 1 | """ 2 | Author: jenningsliu 3 | Date: 2022-04-19 15:43:00 4 | LastEditors: jenningsliu 5 | LastEditTime: 2022-04-19 15:43:00 6 | FilePath: /nerf-loc/utils/transform/rotation_conversions.py 7 | Description: 8 | Copyright (c) 2022 by Tencent, All Rights Reserved. 9 | """ 10 | from typing import Optional 11 | 12 | import torch 13 | import torch.nn.functional as F 14 | 15 | def _axis_angle_rotation(axis: str, angle: torch.Tensor) -> torch.Tensor: 16 |     """ 17 |     Return the rotation matrices for one of the rotations about an axis 18 |     of which Euler angles describe, for each value of the angle given. 19 | 20 |     Args: 21 |         axis: Axis label "X" or "Y" or "Z". 22 |         angle: any shape tensor of Euler angles in radians 23 | 24 |     Returns: 25 |         Rotation matrices as tensor of shape (..., 3, 3).
26 | """ 27 | 28 | cos = torch.cos(angle) 29 | sin = torch.sin(angle) 30 | one = torch.ones_like(angle) 31 | zero = torch.zeros_like(angle) 32 | 33 | if axis == "X": 34 | R_flat = (one, zero, zero, zero, cos, -sin, zero, sin, cos) 35 | elif axis == "Y": 36 | R_flat = (cos, zero, sin, zero, one, zero, -sin, zero, cos) 37 | elif axis == "Z": 38 | R_flat = (cos, -sin, zero, sin, cos, zero, zero, zero, one) 39 | else: 40 | raise ValueError("letter must be either X, Y or Z.") 41 | 42 | return torch.stack(R_flat, -1).reshape(angle.shape + (3, 3)) 43 | 44 | 45 | def euler_angles_to_matrix(euler_angles: torch.Tensor, convention: str) -> torch.Tensor: 46 | """ 47 | Convert rotations given as Euler angles in radians to rotation matrices. 48 | 49 | Args: 50 | euler_angles: Euler angles in radians as tensor of shape (..., 3). 51 | convention: Convention string of three uppercase letters from 52 | {"X", "Y", and "Z"}. 53 | 54 | Returns: 55 | Rotation matrices as tensor of shape (..., 3, 3). 56 | """ 57 | if euler_angles.dim() == 0 or euler_angles.shape[-1] != 3: 58 | raise ValueError("Invalid input euler angles.") 59 | if len(convention) != 3: 60 | raise ValueError("Convention must have 3 letters.") 61 | if convention[1] in (convention[0], convention[2]): 62 | raise ValueError(f"Invalid convention {convention}.") 63 | for letter in convention: 64 | if letter not in ("X", "Y", "Z"): 65 | raise ValueError(f"Invalid letter {letter} in convention string.") 66 | matrices = [ 67 | _axis_angle_rotation(c, e) 68 | for c, e in zip(convention, torch.unbind(euler_angles, -1)) 69 | ] 70 | # return functools.reduce(torch.matmul, matrices) 71 | return torch.matmul(torch.matmul(matrices[0], matrices[1]), matrices[2]) 72 | -------------------------------------------------------------------------------- /nerf_loc/utils/visualization.py: -------------------------------------------------------------------------------- 1 | """ 2 | Author: jenningsliu 3 | Date: 2022-08-01 14:53:46 4 | LastEditors: jenningsliu 5 | LastEditTime: 2022-08-19 16:18:20 6 | FilePath: /nerf-loc/utils/visualization.py 7 | Description: 8 | Copyright (c) 2022 by Tencent, All Rights Reserved. 
9 | """ 10 | import os 11 | import cv2 12 | import numpy as np 13 | import imageio 14 | 15 | def project_3d_points(pts3d, w2c, K): 16 | pts3d_cam = w2c[:3,:3] @ pts3d.T + w2c[:3,3:4] 17 | uvz = K @ pts3d_cam 18 | z = uvz[2].copy() 19 | uvz /= z[None] 20 | return uvz[:2].T, z 21 | 22 | def draw_onepose_3d_box(box_corners, img, w2c, K, radius=5, color=(255, 255, 255)): 23 | """ 24 | box_corners: [8,3] 25 | """ 26 | arranged_points,z = project_3d_points(box_corners, w2c, K) 27 | EDGES = [ 28 | [1, 5], [2, 6], [3, 7], [0,4], 29 | [1, 2], [5, 6], [4,7], [0,3], 30 | [1,0], [5,4], [6,7], [2,3] 31 | ] 32 | for i in range(arranged_points.shape[0]): 33 | x, y = arranged_points[i] 34 | cv2.circle( 35 | img, 36 | (int(x), int(y)), 37 | radius, 38 | color, 39 | -10 40 | ) 41 | for edge in EDGES: 42 | start_points = arranged_points[edge[0]] 43 | start_x = int(start_points[0]) 44 | start_y = int(start_points[1]) 45 | end_points = arranged_points[edge[1]] 46 | end_x = int(end_points[0]) 47 | end_y = int(end_points[1]) 48 | cv2.line(img, (start_x, start_y), (end_x, end_y), color, 2) 49 | return img 50 | 51 | def images_to_video(image_folder, video_save_path): 52 | imgs = [] 53 | img_paths = glob.glob(image_folder+'/*') 54 | print(image_folder) 55 | print(img_paths[0]) 56 | # img_paths = sorted(img_paths, key=lambda x: int(os.path.basename(x).split('.')[0])) 57 | # img_paths = sorted(img_paths, key=lambda x: int(os.path.basename(x).split('.')[-2].split('_')[-1])) 58 | img_paths = sorted(img_paths) 59 | for img_path in img_paths: 60 | print(img_path) 61 | imgs.append(cv2.cvtColor(cv2.imread(img_path), cv2.COLOR_RGB2BGR)) 62 | imageio.mimwrite(video_save_path, imgs, fps=30, quality=8) 63 | 64 | if __name__ == '__main__': 65 | # from nerf_loc.configs import config_parser 66 | # from nerf_loc.datasets import build_dataset 67 | # parser = config_parser() 68 | # args = parser.parse_args() 69 | 70 | # multi_dataset = build_dataset(args, 'test') 71 | # dataset = multi_dataset.datasets[0] 72 | 73 | # data = dataset[0] 74 | 75 | # image = (data['image'].transpose(1,2,0)*255).astype(np.uint8) 76 | # K = data['K'] 77 | # w2c = np.linalg.inv(data['pose']) 78 | # box_corners = dataset.bboxes_3d.reshape(-1,3) 79 | # image = draw_onepose_3d_box(box_corners, image, w2c, K) 80 | # cv2.imwrite('vis_box_3d.png', image) 81 | 82 | import sys 83 | import glob 84 | images_to_video(sys.argv[1], sys.argv[2]) 85 | -------------------------------------------------------------------------------- /pl/test.py: -------------------------------------------------------------------------------- 1 | """ 2 | Author: jenningsliu 3 | Date: 2022-06-01 22:19:57 4 | LastEditors: jenningsliu 5 | LastEditTime: 2022-08-16 17:16:36 6 | FilePath: /nerf-loc/pl/test.py 7 | Description: 8 | Copyright (c) 2022 by Tencent, All Rights Reserved. 
9 | """ 10 | import argparse 11 | import os 12 | import torch 13 | import pytorch_lightning as pl 14 | from pytorch_lightning.loggers import TensorBoardLogger 15 | 16 | from nerf_loc.configs import get_cfg_defaults, override_cfg_with_args 17 | from nerf_loc.datasets import build_dataset 18 | from nerf_loc.models.nerf_pose_estimator import NerfPoseEstimator 19 | 20 | from .model import Model 21 | 22 | if __name__ == '__main__': 23 | parser = argparse.ArgumentParser() 24 | parser.add_argument('--config', type=str, help='config file path') 25 | parser.add_argument("--num_nodes", type=int, default=1, help='number of nodes') 26 | parser.add_argument("--gpus", type=int, default=-1, help='number of gpus') 27 | parser.add_argument("--ckpt", type=str, default=None, help='whole model file') 28 | parser.add_argument("--test_render_interval", type=int, default=-1, help='interval of rendering test image') 29 | parser.add_argument('--vis_3d_box', action='store_true', help='save onepose box visualization') 30 | parser.add_argument('--vis_rendering', action='store_true', help='save rendered image for visualization') 31 | parser.add_argument('--vis_trajectory', action='store_true', help='save camera trajectory for visualization') 32 | args = parser.parse_args() 33 | 34 | cfg = get_cfg_defaults() 35 | cfg.merge_from_file(args.config) 36 | cfg = override_cfg_with_args(cfg, args) 37 | 38 | basedir = cfg.basedir 39 | expname = cfg.expname 40 | exp_dir = os.path.join(basedir, expname) 41 | os.makedirs(exp_dir, exist_ok=True) 42 | 43 | # logger = TensorBoardLogger(save_dir=basedir, name=expname, version=cfg.version) 44 | 45 | trainset = build_dataset(cfg, 'train') 46 | # trainset.set_mode('test') 47 | 48 | testset = build_dataset(cfg, 'test') 49 | testset.set_mode('test') 50 | test_dataloader = torch.utils.data.DataLoader( 51 | testset, batch_size=1, num_workers=8, shuffle=False, drop_last=False) 52 | 53 | pose_estimator = NerfPoseEstimator(cfg, trainset) 54 | model = Model(pose_estimator).eval() 55 | trainer = pl.Trainer( 56 | # logger=logger, 57 | accelerator='ddp', 58 | num_nodes=1, 59 | devices=1, 60 | gpus=-1, 61 | max_epochs=1000, 62 | num_sanity_val_steps=20, 63 | benchmark=True, 64 | ) 65 | 66 | if args.ckpt is not None: 67 | model.load_ckpt(args.ckpt) 68 | trainer.test(model, dataloaders=test_dataloader) 69 | -------------------------------------------------------------------------------- /pl/train.py: -------------------------------------------------------------------------------- 1 | """ 2 | Author: jenningsliu 3 | Date: 2022-06-01 22:19:57 4 | LastEditors: jenningsliu 5 | LastEditTime: 2022-08-19 19:00:51 6 | FilePath: /nerf-loc/pl/train.py 7 | Description: 8 | Copyright (c) 2022 by Tencent, All Rights Reserved. 
9 | """ 10 | import os 11 | import glob 12 | import argparse 13 | 14 | import torch 15 | import pytorch_lightning as pl 16 | from pytorch_lightning.loggers import TensorBoardLogger 17 | from model import Model 18 | from nerf_loc.datasets import build_dataset 19 | 20 | from nerf_loc.models.nerf_pose_estimator import NerfPoseEstimator 21 | from nerf_loc.configs import get_cfg_defaults, override_cfg_with_args 22 | 23 | if __name__ == '__main__': 24 | parser = argparse.ArgumentParser() 25 | parser.add_argument('--config', type=str, help='config file path') 26 | parser.add_argument("--num_nodes", type=int, default=1, help='number of nodes') 27 | parser.add_argument("--gpus", type=int, default=-1, help='number of gpus') 28 | args = parser.parse_args() 29 | 30 | cfg = get_cfg_defaults() 31 | cfg.merge_from_file(args.config) 32 | cfg = override_cfg_with_args(cfg, args) 33 | 34 | basedir = cfg.basedir 35 | expname = cfg.expname 36 | exp_dir = os.path.join(basedir, expname) 37 | os.makedirs(exp_dir, exist_ok=True) 38 | # logging.basicConfig(level=logging.INFO, filename=os.path.join(exp_dir, 'train_pose.log')) 39 | # logging.basicConfig(level=logging.INFO, stream=sys.stdout) 40 | # logger = logging.getLogger(__name__) 41 | # logger.info(str(cfg)) 42 | logger = TensorBoardLogger(save_dir=basedir, name=expname, version=cfg.version) 43 | 44 | trainset = build_dataset(cfg, 'train') 45 | trainset.set_mode('train') 46 | train_size = len(trainset) 47 | train_loader = torch.utils.data.DataLoader( 48 | trainset, batch_size=1, num_workers=10, shuffle=True, drop_last=False, pin_memory=True) 49 | 50 | testset = build_dataset(cfg, 'test') 51 | testset.set_mode('test') 52 | test_dataloader = torch.utils.data.DataLoader( 53 | testset, batch_size=1, num_workers=10, shuffle=False, drop_last=False, pin_memory=True) 54 | 55 | if cfg.dataset_type == 'nerf_pretrain' or not cfg.train_pose: 56 | checkpoint_callback = pl.callbacks.ModelCheckpoint( 57 | dirpath=os.path.join(exp_dir, cfg.version, 'checkpoints'), 58 | filename='epoch{epoch:02d}-psnr{psnr_test:.4f}', 59 | monitor='psnr_test', 60 | mode='max', 61 | save_top_k=1, 62 | verbose=True, 63 | auto_insert_metric_name=False 64 | ) 65 | elif cfg.dataset_type.startswith('video_') and testset.datasets[0].cfg.type == 'cambridge': 66 | checkpoint_callback = pl.callbacks.ModelCheckpoint( 67 | dirpath=os.path.join(exp_dir, cfg.version, 'checkpoints'), 68 | filename='epoch{epoch:02d}-median_trans_err{median_trans_err/avg:.4f}', 69 | monitor='median_trans_err/avg', 70 | mode='min', 71 | save_top_k=5, 72 | verbose=True, 73 | auto_insert_metric_name=False 74 | ) 75 | else: 76 | checkpoint_callback = pl.callbacks.ModelCheckpoint( 77 | dirpath=os.path.join(exp_dir, cfg.version, 'checkpoints'), 78 | filename='epoch{epoch:02d}-acc{pose_acc/avg:.4f}', 79 | monitor='pose_acc/avg', 80 | mode='max', 81 | save_last=True, 82 | save_top_k=5, 83 | verbose=True, 84 | auto_insert_metric_name=False 85 | ) 86 | 87 | pose_estimator = NerfPoseEstimator(cfg, trainset) 88 | print(pose_estimator) 89 | model = Model(pose_estimator) 90 | 91 | # auto resume from the last checkpoint 92 | ckpts = glob.glob(os.path.join(exp_dir, cfg.version, 'checkpoints')+'/*.ckpt') 93 | ckpts = sorted(ckpts) 94 | if len(ckpts) > 0: 95 | resume_from_checkpoint = ckpts[-1] 96 | print('resume from ', resume_from_checkpoint) 97 | else: 98 | resume_from_checkpoint = None 99 | 100 | trainer = pl.Trainer( 101 | logger=logger, 102 | callbacks=[checkpoint_callback], 103 | accelerator='ddp', 104 | num_nodes=args.num_nodes, 105 | 
gpus=args.gpus, 106 |         max_epochs=cfg.max_epochs, 107 |         check_val_every_n_epoch=1, 108 |         num_sanity_val_steps=0, 109 |         benchmark=True, 110 |         gradient_clip_val=1.0, 111 |         resume_from_checkpoint=resume_from_checkpoint 112 |     ) 113 |     if resume_from_checkpoint is None and cfg.ckpt: 114 |         print('load pretrained weights: ', cfg.ckpt) 115 |         model.load_ckpt(cfg.ckpt) 116 | 117 |     trainer.fit(model=model, train_dataloaders=train_loader, val_dataloaders=test_dataloader) 118 | 119 |     # trainer.fit(model=model, train_dataloaders=train_loader, val_dataloaders=test_dataloader, ckpt_path=cfg.ckpt) 120 | 121 |     # state_dict = {k.replace('pose_estimator.', ''):v for k,v in torch.load(cfg.ckpt)['state_dict'].items()} 122 |     # model.pose_estimator.load_state_dict(state_dict) 123 |     # # trainer.fit(model=model, train_dataloaders=train_loader, val_dataloaders=test_dataloader) 124 |     # trainer.test(model, dataloaders=test_dataloader) 125 | -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | torch==1.8.1 2 | torchvision==0.9.1 3 | # torchaudio==0.8.1 4 | pytorch_lightning==1.6.0 5 | imageio 6 | imageio-ffmpeg 7 | matplotlib 8 | configargparse 9 | tensorboard>=2.0 10 | tqdm 11 | opencv-python 12 | # spconv-cu111 13 | pycolmap>=0.1.0 14 | scikit-image 15 | scikit-learn 16 | trimesh 17 | h5py 18 | einops 19 | easydict 20 | kornia==0.6.4 21 | numba==0.55.1 22 | omegaconf==2.2.0 23 | yacs==0.1.8 24 | lmdb 25 | inplace_abn 26 | transforms3d 27 | --------------------------------------------------------------------------------