├── .gitignore
├── README.md
└── assets
    ├── vis_failed_case.png
    ├── vis_matching.png
    ├── vis_rendered_image.png
    └── vrs-nerf-comp.gif


/.gitignore:
--------------------------------------------------------------------------------
.idea


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# VRS-NeRF: Visual Relocalization with Sparse Neural Radiance Field

This is the implementation of `VRS-NeRF: Visual Relocalization with Sparse Neural Radiance Field`, which provides a new
baseline for applying NeRFs to the visual relocalization task. We use Zip-NeRF as the implicit learning map (ILM) and a
3D reconstruction as the explicit geometric map (EGM) for efficient localization. In the localization process, we use
SFD2 for local features, IMP as the matcher, and online rendering of patches for matching. Results on 7Scenes and
CambridgeLandmarks are promising and much better than those of the previous methods LENS and NeRF-loc. However, the pose
accuracy on the Aachen dataset is not satisfactory because of the poor quality of the rendered images.
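
Pose accuracy in relocalization benchmarks is commonly reported as the median position and rotation errors, together with the percentage of poses that fall within a pair of error thresholds. The snippet below is a minimal sketch of that computation and is not code from this repository. The function names (`pose_errors`, `summarize`, `rot_z`) are hypothetical, and it assumes each pose is given as a 3x3 rotation matrix plus a camera position, with the estimate and the ground truth expressed in the same frame and convention.

```python
import math
from statistics import median


def pose_errors(R_est, t_est, R_gt, t_gt):
    """Return (position error, rotation error in degrees) for one pose.

    Assumption: R_* are 3x3 rotation matrices (nested lists) and t_* are
    camera positions, all expressed in the same frame and convention.
    """
    t_err = math.dist(t_est, t_gt)
    # trace(R_est^T @ R_gt) equals the sum of element-wise products.
    trace = sum(R_est[i][j] * R_gt[i][j] for i in range(3) for j in range(3))
    # trace = 1 + 2*cos(angle); clamp to guard against round-off.
    cos_angle = max(-1.0, min(1.0, (trace - 1.0) / 2.0))
    return t_err, math.degrees(math.acos(cos_angle))


def summarize(errors, t_thresh, r_thresh):
    """Median errors and percentage of poses within both thresholds.

    `errors` is a list of (position error, rotation error) pairs.
    """
    t_errs = [e[0] for e in errors]
    r_errs = [e[1] for e in errors]
    hits = sum(1 for t, r in errors if t <= t_thresh and r <= r_thresh)
    return median(t_errs), median(r_errs), 100.0 * hits / len(errors)


def rot_z(deg):
    """Hypothetical helper: rotation about the z-axis, used only for the demo."""
    a = math.radians(deg)
    return [[math.cos(a), -math.sin(a), 0.0],
            [math.sin(a), math.cos(a), 0.0],
            [0.0, 0.0, 1.0]]


if __name__ == "__main__":
    # One estimated pose that is 5 cm and 3 degrees away from the ground truth.
    t_err, r_err = pose_errors(rot_z(3.0), (0.03, 0.04, 0.0), rot_z(0.0), (0.0, 0.0, 0.0))
    print(f"position error: {t_err:.3f} m, rotation error: {r_err:.2f} deg")
    # Summary over three poses with thresholds of 5 cm and 5 degrees.
    print(summarize([(0.02, 1.0), (0.04, 2.0), (0.10, 6.0)], 0.05, 5.0))
```

The thresholds are passed in explicitly because they differ between benchmarks; units are whatever the positions are expressed in (meters in the demo above).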

## Results

### Rendered video

![Video](assets/vrs-nerf-comp.gif)

### Groundtruth and rendered images

![Rendered image](assets/vis_rendered_image.png)

### Matching between query and rendered images

![Matching](assets/vis_matching.png)

### 7Scenes dataset (patch size = 15; median position (cm) and rotation (deg) errors; average percentage of poses within 5 cm, 5 deg)

|          | chess  |  fire   | heads  | office | pumpkin | kitchen | stairs  | Average (%) |
|:--------:|:------:|:-------:|:------:|:------:|:-------:|:-------:|:-------:|:-----------:|
|   LENS   | 3, 1.3 | 10, 3.7 | 7, 5.8 | 7, 1.9 | 8, 2.2  | 9, 2.2  | 14, 3.6 |      -      |
| NeRF-loc | 2, 1.1 | 2, 1.1  | 1, 1.9 | 2, 1.1 | 3, 1.3  | 3, 1.5  | 3, 1.3  |    89.5     |
|   ACE    | 2, 1.1 | 2, 1.8  | 2, 1.1 | 3, 1.4 | 3, 1.3  | 3, 1.3  | 3, 1.2  |    97.1     |
|  SP+SG   | 0, 0.1 | 1, 0.2  | 0, 0.2 | 1, 0.2 | 1, 0.1  | 0, 0.1  | 2, 0.6  |    95.7     |
| SFD2+IMP | 0, 0.1 | 1, 0.2  | 0, 0.2 | 1, 0.2 | 1, 0.2  |  0, 0   | 2, 0.5  |    95.7     |
| VRS-NeRF | 0, 0.1 | 1, 0.2  | 0, 0.2 | 1, 0.2 | 1, 0.2  | 0, 0.1  | 3, 0.8  |    93.1     |

### CambridgeLandmarks (patch size = 15; median position (cm) and rotation (deg) errors; average percentage of poses within 25 cm, 2 deg)

|          | Kings College | Great Court | Old Hospital | Shop Facade | St Mary Church | Average (%) |
|:--------:|:-------------:|:-----------:|:------------:|:-----------:|:--------------:|:-----------:|
|   LENS   |    33, 0.5    |      -      |   44, 0.9    |   27, 1.6   |    53, 1.6     |      -      |
| NeRF-loc |    7, 0.2     |   25, 0.1   |   18, 0.4    |   11, 0.2   |     4, 0.2     |      -      |
|   ACE    |    18, 0.4    |   42, 0.2   |   31, 0.6    |   5, 0.3    |    19, 0.6     |    54.68    |
|  SP+SG   |    7, 0.1     |   12, 0.1   |    9, 0.2    |   2, 0.1    |     4, 0.1     |    89.4     |
| SFD2+IMP |    7, 0.1     |   11, 0.1   |   10, 0.2    |   2, 0.1    |     4, 0.1     |    89.1     |
| VRS-NeRF |    9, 0.1     |      -      |   11, 0.2    |   2, 0.1    |     5, 0.2     |    89.3     |

### Aachen dataset (percentage of poses within 0.25 m, 2 deg / 0.5 m, 5 deg / 5 m, 10 deg)

|               |        Day         |        Night        |
|:-------------:|:------------------:|:-------------------:|
|     ESAC      | 42.6 / 59.6 / 75.5 |  3.1 / 9.2 / 11.2   |
|    HSCNet     | 71.1 / 81.9 / 91.7 | 32.7 / 43.9 / 65.3  |
|    SP+SPG     | 89.6 / 95.4 / 98.8 | 86.7 / 93.9 / 100.0 |
|   SFD2+IMP    | 89.7 / 96.5 / 98.9 | 84.7 / 94.9 / 100.0 |
| VRS-NeRF (15) | 60.8 / 67.8 / 73.1 | 19.4 / 22.4 / 25.5  |
| VRS-NeRF (31) | 70.1 / 76.9 / 80.9 | 44.9 / 51.0 / 62.2  |

### Imperfect rendering

![Rendered image](assets/vis_failed_case.png)

### Code and pretrained models are coming soon

## Citation

```
@inproceedings{xue2024vrs,
  author = {Fei Xue and Ignas Budvytis and Daniel Olmeda Reino and Roberto Cipolla},
  title = {VRS-NeRF: Visual Relocalization with Sparse Neural Radiance Field},
  booktitle = {ECCVW},
  year = {2024}
}

@inproceedings{sfd22023,
  title={{SFD2: Semantic-guided Feature Detection and Description}},
  author={Xue, Fei and Budvytis, Ignas and Cipolla, Roberto},
  booktitle={CVPR},
  year={2023}
}

@inproceedings{imp2023,
  title={IMP: Iterative Matching and Pose Estimation with Adaptive Pooling},
  author={Xue, Fei and Budvytis, Ignas and Cipolla, Roberto},
  booktitle={CVPR},
  year={2023}
}

@inproceedings{barron2023zipnerf,
  title={Zip-NeRF: Anti-Aliased Grid-Based Neural Radiance Fields},
  author={Jonathan T. Barron and Ben Mildenhall and Dor Verbin and Pratul P. Srinivasan and Peter Hedman},
  booktitle={ICCV},
  year={2023}
}
```


--------------------------------------------------------------------------------
/assets/vis_failed_case.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/feixue94/vrs-nerf/ce76b5aaaeecb1e78d7cacd07d02f766d39ae300/assets/vis_failed_case.png


--------------------------------------------------------------------------------
/assets/vis_matching.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/feixue94/vrs-nerf/ce76b5aaaeecb1e78d7cacd07d02f766d39ae300/assets/vis_matching.png


--------------------------------------------------------------------------------
/assets/vis_rendered_image.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/feixue94/vrs-nerf/ce76b5aaaeecb1e78d7cacd07d02f766d39ae300/assets/vis_rendered_image.png


--------------------------------------------------------------------------------
/assets/vrs-nerf-comp.gif:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/feixue94/vrs-nerf/ce76b5aaaeecb1e78d7cacd07d02f766d39ae300/assets/vrs-nerf-comp.gif


--------------------------------------------------------------------------------