└── README.md
/README.md:
--------------------------------------------------------------------------------
1 |
2 |
3 | Awesome Visual-Language-Navigation (VLN)
4 |
5 |
6 |
7 | This repository contains a curated list of resources on Vision-and-Language Navigation (VLN).
8 | It also includes related papers from adjacent areas such as Learning-based Navigation and Occupancy Perception.
9 |
10 | If you notice any missing papers, **feel free to [*create a pull request*](https://github.com/KwanWaiPang/Awesome-Transformer-based-SLAM/blob/pdf/How-to-PR.md) or [*open an issue*](https://github.com/KwanWaiPang/Awesome-VLN/issues/new)**.
11 |
12 | Contributions in any form to make this list more comprehensive are welcome.
13 |
14 | If you find this repository useful, a simple star is the best affirmation. 😊
15 |
16 | Feel free to share this list with others!
17 |
18 | # Overview
19 | - [VLN](#VLN)
20 |   - [Simulator and Dataset](#Simulator-and-Dataset)
21 |   - [Survey Paper](#Survey-Paper)
22 | - [Learning-based Navigation](#Learning-based-Navigation)
23 |   - [Mapless navigation](#Mapless-navigation)
24 | - [Others](#Others)
25 |   - [Occupancy Perception](#Occupancy-Perception)
26 |   - [VLA](#VLA)
27 |
28 | # VLN
29 |
30 |
31 | | Year | Venue | Paper Title | Repository | Note |
32 | |:----:|:-----:| ----------- |:----------:|:----:|
33 | |2025|`arXiv`|[Embodied Navigation Foundation Model](https://arxiv.org/pdf/2509.12129)|---|[website](https://pku-epic.github.io/NavFoM-Web/)<br>NavFoM|
34 | |2025|`arXiv`|[InternVLA-N1: An Open Dual-System Vision-Language Navigation Foundation Model with Learned Latent Plans](https://internrobotics.github.io/internvla-n1.github.io/static/pdfs/InternVLA_N1.pdf)|[](https://github.com/InternRobotics/InternNav) |[website](https://internrobotics.github.io/internvla-n1.github.io/)|
35 | |2025|`arXiv`|[Odyssey: Open-world quadrupeds exploration and manipulation for long-horizon tasks](https://arxiv.org/pdf/2508.08240)|---|[website](https://kaijwang.github.io/odyssey.github.io/)|
36 | |2025|`arXiv`|[OpenVLN: Open-world aerial Vision-Language Navigation](https://arxiv.org/pdf/2511.06182)|---|---|
37 | |2025|`arXiv`|[VLN-Zero: Rapid Exploration and Cache-Enabled Neurosymbolic Vision-Language Planning for Zero-Shot Transfer in Robot Navigation](https://arxiv.org/pdf/2509.18592)|[](https://github.com/VLN-Zero/vln-zero.github.io)|[website](https://vln-zero.github.io/)|
38 | |2025|`arXiv`|[JanusVLN: Decoupling Semantics and Spatiality with Dual Implicit Memory for Vision-Language Navigation](https://arxiv.org/pdf/2509.22548)|[](https://github.com/MIV-XJTU/JanusVLN)|[website](https://miv-xjtu.github.io/JanusVLN.github.io/)|
39 | |2025|`arXiv`|[StreamVLN: Streaming Vision-and-Language Navigation via SlowFast Context Modeling](https://arxiv.org/pdf/2507.05240)|[](https://github.com/InternRobotics/StreamVLN)|[website](https://streamvln.github.io/)|
40 | |2025|`arXiv`|[GC-VLN: Instruction as Graph Constraints for Training-free Vision-and-Language Navigation](https://arxiv.org/pdf/2509.10454)|[](https://github.com/bagh2178/GC-VLN)|[website](https://bagh2178.github.io/GC-VLN/)|
41 | |2025|`arXiv`|[Boosting Zero-Shot VLN via Abstract Obstacle Map-Based Waypoint Prediction with TopoGraph-and-VisitInfo-Aware Prompting](https://arxiv.org/pdf/2509.20499)|---|---|
42 | |2025|`arXiv`|[SLAM-Free Visual Navigation with Hierarchical Vision-Language Perception and Coarse-to-Fine Semantic Topological Planning](https://arxiv.org/pdf/2509.20739)|---|---|
43 | |2025|`arXiv`|[Zero-shot Object-Centric Instruction Following: Integrating Foundation Models with Traditional Navigation](https://arxiv.org/pdf/2411.07848)|---|[website](https://sonia-raychaudhuri.github.io/nlslam/)|
44 | |2025|`RSS`|[Uni-NaVid: A Video-based Vision-Language-Action Model for Unifying Embodied Navigation Tasks](https://arxiv.org/pdf/2412.06224)|[](https://github.com/jzhzhang/Uni-NaVid)|[website](https://pku-epic.github.io/Uni-NaVid/)|
45 | |2025|`RSS`|[NaVILA: Legged Robot Vision-Language-Action Model for Navigation](https://arxiv.org/pdf/2412.04453)|[](https://github.com/AnjieCheng/NaVILA)|[website](https://navila-bot.github.io/)|
46 | |2025|`ICCV`|[Move to Understand a 3D Scene: Bridging Visual Grounding and Exploration for Efficient and Versatile Embodied Navigation](https://arxiv.org/pdf/2507.04047)|[](https://github.com/MTU3D/MTU3D)|[website](https://mtu3d.github.io/)|
47 | | 2025 | `ACL` | [MapNav: A Novel Memory Representation via Annotated Semantic Maps for VLM-based Vision-and-Language Navigation](https://arxiv.org/pdf/2502.13451) |---|---|
48 | | 2025 | `CVPR` | [Scene Map-based Prompt Tuning for Navigation Instruction Generation](https://openaccess.thecvf.com/content/CVPR2025/papers/Fan_Scene_Map-based_Prompt_Tuning_for_Navigation_Instruction_Generation_CVPR_2025_paper.pdf) |---|---|
49 | | 2025 | `ACL` | [NavRAG: Generating User Demand Instructions for Embodied Navigation through Retrieval-Augmented LLM](https://arxiv.org/pdf/2502.11142) | [](https://github.com/MrZihan/NavRAG) |---|
50 | | 2025 | `ICLR` | [Bootstrapping Language-Guided Navigation Learning with Self-Refining Data Flywheel](https://arxiv.org/abs/2412.08467) | [](https://github.com/wz0919/VLN-SRDF) |---|
51 | | 2025 | `ICCV` | [SAME: Learning Generic Language-Guided Visual Navigation with State-Adaptive Mixture of Experts](https://arxiv.org/pdf/2412.05552) | [](https://github.com/GengzeZhou/SAME) |---|
52 | | 2025 | `ICCV` | [NavMorph: A Self-Evolving World Model for Vision-and-Language Navigation in Continuous Environments](https://arxiv.org/pdf/2506.23468) | [](https://github.com/Feliciaxyao/NavMorph) |---|
53 | | 2025 | `AAAI` | [Affordances-Oriented Planning using Foundation Models for Continuous Vision-Language Navigation](https://arxiv.org/abs/2407.05890) |---|---|
54 | | 2025 | `arXiv` | [EvolveNav: Self-Improving Embodied Reasoning for LLM-Based Vision-Language Navigation](https://arxiv.org/pdf/2506.01551) | [](https://github.com/expectorlin/EvolveNav) |---|
55 | | 2025 | `CVPR` | [Do Visual Imaginations Improve Vision-and-Language Navigation Agents?](https://arxiv.org/pdf/2503.16394) |---|---|
56 | | 2024 | `AAAI` | [VLN-Video: Utilizing Driving Videos for Outdoor Vision-and-Language Navigation](https://arxiv.org/abs/2402.03561) |---|---|
57 | | 2024 | `CVPR` | [Volumetric Environment Representation for Vision-Language Navigation](https://arxiv.org/pdf/2403.14158) | [](https://github.com/DefaultRui/VLN-VER) |---|
58 | |2024|`ECCV`|[NavGPT-2: Unleashing Navigational Reasoning Capability for Large Vision-Language Models](https://arxiv.org/pdf/2407.12366)|[](https://github.com/GengzeZhou/NavGPT-2)|---|
59 | | 2024 | `CVPR` | [Lookahead Exploration with Neural Radiance Representation for Continuous Vision-Language Navigation](https://arxiv.org/pdf/2404.01943) | [](https://github.com/MrZihan/HNR-VLN) |---|
60 | | 2024 | `TPAMI` | [ETPNav: Evolving Topological Planning for Vision-Language Navigation in Continuous Environments](https://arxiv.org/abs/2304.03047v2) | [](https://github.com/MarSaKi/ETPNav) |---|
61 | | 2024 | `MM` | [Narrowing the Gap between Vision and Action in Navigation](https://www.arxiv.org/abs/2408.10388) |---|---|
62 | | 2024 | `ECCV` | [LLM as Copilot for Coarse-grained Vision-and-Language Navigation](https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/00833.pdf) |---|---|
63 | | 2024 | `ICRA` | [Discuss Before Moving: Visual Language Navigation via Multi-expert Discussions](https://ieeexplore.ieee.org/abstract/document/10611565) | [](https://github.com/LYX0501/DiscussNav) |---|
64 | | 2024 | `ACL` | [MapGPT: Map-Guided Prompting with Adaptive Path Planning for Vision-and-Language Navigation](https://arxiv.org/abs/2401.07314) | [](https://chen-judge.github.io/MapGPT/) |---|
65 | | 2024 |`arXiv`| [MC-GPT: Empowering Vision-and-Language Navigation with Memory Map and Reasoning Chains](https://arxiv.org/pdf/2405.10620) |---|---|
66 | | 2024 |`arXiv`| [InstructNav: Zero-shot System for Generic Instruction Navigation in Unexplored Environment](https://arxiv.org/pdf/2406.04882) | [](https://github.com/LYX0501/InstructNav) |---|
67 | | 2024 | `AAAI` | [NavGPT: Explicit Reasoning in Vision-and-Language Navigation with Large Language Models](https://arxiv.org/abs/2305.16986) | [](https://github.com/GengzeZhou/NavGPT) |---|
68 | | 2024 | `NAACL Findings` | [LangNav: Language as a Perceptual Representation for Navigation](https://aclanthology.org/2024.findings-naacl.60.pdf) | [](https://github.com/pbw-Berwin/LangNav) |---|
69 | | 2024 |`arXiv`| [NavCoT: Boosting LLM-Based Vision-and-Language Navigation via Learning Disentangled Reasoning](https://arxiv.org/abs/2403.07376) | [](https://github.com/expectorlin/NavCoT) |---|
70 | | 2024 | `CVPR` | [Towards Learning a Generalist Model for Embodied Navigation](https://arxiv.org/abs/2312.02010) | [](https://github.com/LaVi-Lab/NaviLLM) |NaviLLM|
71 | | 2024 | `RSS` | [NaVid: Video-based VLM Plans the Next Step for Vision-and-Language Navigation](https://arxiv.org/pdf/2402.15852) | [](https://github.com/jzhzhang/NaVid-VLN-CE) |[website](https://pku-epic.github.io/NaVid/)|
72 | | 2024 |`EMNLP`| [Navigating the Nuances: A Fine-grained Evaluation of Vision-Language Navigation](https://arxiv.org/pdf/2409.17313) | [](https://github.com/zehao-wang/navnuances) |---|
73 | | 2023 | `CVPR` | [Behavioral Analysis of Vision-and-Language Navigation Agents](https://yoark.github.io/assets/pdf/vln-behave/vln-behave.pdf) | [](https://github.com/Yoark/vln-behave) |---|
74 | | 2023 | `ICCV` | [March in Chat: Interactive Prompting for Remote Embodied Referring Expression](https://openaccess.thecvf.com//content/ICCV2023/papers/Qiao_March_in_Chat_Interactive_Prompting_for_Remote_Embodied_Referring_Expression_ICCV_2023_paper.pdf) | [](https://github.com/YanyuanQiao/MiC) |---|
75 | | 2023 |`arXiv`| [Vision and Language Navigation in the Real World via Online Visual Language Mapping](https://arxiv.org/pdf/2310.10822) |---|---|
76 | | 2023 | `NeurIPS` | [A2Nav: Action-Aware Zero-Shot Robot Navigation by Exploiting Vision-and-Language Ability of Foundation Models](https://peihaochen.github.io/files/publications/A2Nav.pdf) |---|---|
77 | | 2023 | `ICCV` | [BEVBert: Multimodal Map Pre-training for Language-guided Navigation](https://arxiv.org/pdf/2212.04385) | [](https://github.com/MarSaKi/VLN-BEVBert) |---|
78 | |2023|`CVPR`|[CoWs on Pasture: Baselines and Benchmarks for Language-Driven Zero-Shot Object Navigation](https://openaccess.thecvf.com/content/CVPR2023/papers/Gadre_CoWs_on_Pasture_Baselines_and_Benchmarks_for_Language-Driven_Zero-Shot_Object_CVPR_2023_paper.pdf)|[](https://github.com/real-stanford/cow)|CLIP on Wheels<br>[website](https://cow.cs.columbia.edu/)|
79 | |2023|`NeurIPS`|[Frequency-enhanced data augmentation for vision-and-language navigation](https://proceedings.neurips.cc/paper_files/paper/2023/file/0d9e08f247ca7fbbfd5e50b7ff9cf357-Paper-Conference.pdf)|[](https://github.com/hekj/FDA)|---|
80 | |2023|`NeurIPS`|[Find what you want: Learning demand-conditioned object attribute space for demand-driven navigation](https://proceedings.neurips.cc/paper_files/paper/2023/file/34e278fbbd7d6d7d788c98065988e1a9-Paper-Conference.pdf)|[](https://github.com/whcpumpkin/Demand-driven-navigation)|[website](https://sites.google.com/view/demand-driven-navigation)|
81 | |2023|`ACL`|[Aerial vision-and-dialog navigation](https://arxiv.org/pdf/2205.12219)|[](https://github.com/eric-ai-lab/Aerial-Vision-and-Dialog-Navigation)|[website](https://sites.google.com/view/aerial-vision-and-dialog/home)|
82 | | 2023 | `AAAI` | [Actional Atomic-Concept Learning for Demystifying Vision-Language Navigation](https://arxiv.org/pdf/2302.06072) |---|---|
83 | | 2023 | `ICCV` | [Grounded Entity-Landmark Adaptive Pre-training for Vision-and-Language Navigation](https://arxiv.org/abs/2308.12587) | [](https://github.com/CSir1996/VLN-GELA) |---|
84 | | 2023 | `CVPR` | [Adaptive Zone-aware Hierarchical Planner for Vision-Language Navigation](https://openaccess.thecvf.com/content/CVPR2023/papers/Gao_Adaptive_Zone-Aware_Hierarchical_Planner_for_Vision-Language_Navigation_CVPR_2023_paper.pdf) | [](https://github.com/chengaopro/AZHP) |---|
85 | | 2023 | `ICCV` | [Bird's-Eye-View Scene Graph for Vision-Language Navigation](https://arxiv.org/abs/2308.04758) |---|---|
86 | | 2023 |`EMNLP`| [Masked Path Modeling for Vision-and-Language Navigation](https://arxiv.org/abs/2305.14268) |---|---|
87 | | 2023 | `CVPR` | [Improving Vision-and-Language Navigation by Generating Future-View Image Semantics](https://arxiv.org/pdf/2304.04907) | [](https://github.com/jialuli-luka/VLN-SIG) |---|
88 | | 2023 | `TPAMI` | [HOP+: History-Enhanced and Order-Aware Pre-Training for Vision-and-Language Navigation](https://ieeexplore.ieee.org/document/10006384) |---|---|
89 | | 2023 | `TPAMI` | [Learning to Follow and Generate Instructions for Language-Capable Navigation](https://ieeexplore.ieee.org/document/10359152) |---|---|
90 | | 2023 | `CVPR` | [A New Path: Scaling Vision-and-Language Navigation with Synthetic Instructions and Imitation Learning](https://arxiv.org/pdf/2210.03112) | --- |[Dataset](https://github.com/google-research-datasets/RxR/tree/main/marky-mT5)|
91 | | 2023 | `CVPR` | [Lana: A Language-Capable Navigator for Instruction Following and Generation](https://arxiv.org/abs/2303.08409) | [](https://github.com/wxh1996/LANA-VLN) |---|
92 | | 2023 | `CVPR` | [KERM: Knowledge Enhanced Reasoning for Vision-and-Language Navigation](https://openaccess.thecvf.com/content/CVPR2023/papers/Li_KERM_Knowledge_Enhanced_Reasoning_for_Vision-and-Language_Navigation_CVPR_2023_paper.pdf) | [](https://github.com/xiangyangli-cn/KERM) |---|
93 | | 2023 | `MM` | [PASTS: Progress-Aware Spatio-Temporal Transformer Speaker For Vision-and-Language Navigation](https://arxiv.org/pdf/2305.11918) |---|---|
94 | | 2023 |`arXiv`| [CrossMap Transformer: A Crossmodal Masked Path Transformer Using Double Back-Translation for Vision-and-Language Navigation](https://arxiv.org/abs/2103.00852) |---|---|
95 | | 2023 | `ACL` | [VLN-Trans: Translator for the Vision and Language Navigation Agent](https://arxiv.org/pdf/2302.09230) | [](https://github.com/HLR/VLN-trans) |---|
96 | | 2022 | `ACL` | [Visual-Language Navigation Pretraining via Prompt-based Environmental Self-exploration](https://arxiv.org/pdf/2203.04006) | [](https://github.com/liangcici/Probes-VLN) |---|
97 | | 2022 | `CVPR` | [Less is More: Generating Grounded Navigation Instructions from Landmarks](https://arxiv.org/pdf/2004.14973) | [](https://github.com/google-research-datasets/RxR/tree/main/marky-mT5) |---|
98 | | 2022 | `MM` | [Target-Driven Structured Transformer Planner for Vision-Language Navigation](https://arxiv.org/pdf/2207.11201) | [](https://github.com/YushengZhao/TD-STP) |---|
99 | | 2022 | `CVPR` | [HOP: History-and-Order Aware Pre-training for Vision-and-Language Navigation](https://ieeexplore.ieee.org/document/9880046) | [](https://github.com/YanyuanQiao/HOP-VLN) |---|
100 | | 2022 | `COLING` | [LOViS: Learning Orientation and Visual Signals for Vision and Language Navigation](https://aclanthology.org/2022.coling-1.505.pdf) | [](https://github.com/HLR/LOViS) |---|
101 | | 2022 | `NAACL` | [Diagnosing Vision-and-Language Navigation: What Really Matters](https://aclanthology.org/2022.naacl-main.438.pdf) | [](https://github.com/VegB/Diagnose_VLN) |---|
102 | | 2022 |`arXiv`| [CLIP-Nav: Using CLIP for Zero-Shot Vision-and-Language Navigation](https://arxiv.org/pdf/2211.16649) |---|---|
103 | | 2022 | `CVPR` | [Bridging the Gap Between Learning in Discrete and Continuous Environments for Vision-and-Language Navigation](https://arxiv.org/abs/2203.02764) | [](https://github.com/YicongHong/Discrete-Continuous-VLN) |---|
104 | | 2021 | `CVPR` | [Scene-Intuitive Agent for Remote Embodied Visual Grounding](https://arxiv.org/pdf/2103.12944) |---|---|
105 | | 2021 | `NeurIPS` | [SOAT: A Scene- and Object-Aware Transformer for Vision-and-Language Navigation](https://arxiv.org/abs/2110.14143) |---|---|
106 | | 2021 | `ICCV` | [The Road to Know-Where: An Object-and-Room Informed Sequential BERT for Indoor Vision-Language Navigation](https://openaccess.thecvf.com/content/ICCV2021/papers/Qi_The_Road_To_Know-Where_An_Object-and-Room_Informed_Sequential_BERT_for_ICCV_2021_paper.pdf) | [](https://github.com/YuankaiQi/ORIST) |---|
107 | | 2021 | `CVPR` | [VLN BERT: A Recurrent Vision-and-Language BERT for Navigation](https://openaccess.thecvf.com/content/CVPR2021/papers/Hong_VLN_BERT_A_Recurrent_Vision-and-Language_BERT_for_Navigation_CVPR_2021_paper.pdf) | [](https://github.com/YicongHong/Recurrent-VLN-BERT) |---|
108 | | 2021 | `EACL` | [On the Evaluation of Vision-and-Language Navigation Instructions](https://arxiv.org/abs/2101.10504) |---|---|
109 | |2022|`CoRL`| [Do As I Can, Not As I Say: Grounding Language in Robotic Affordances](https://say-can.github.io/assets/palm_saycan.pdf) | [](https://say-can.github.io/) |SayCan|
110 | |2021|`NeurIPS`|[History aware multimodal transformer for vision-and-language navigation](https://proceedings.neurips.cc/paper/2021/file/2e5c2cb8d13e8fba78d95211440ba326-Paper.pdf)|[](https://github.com/cshizhe/VLN-HAMT)|[website](https://cshizhe.github.io/projects/vln_hamt.html)|
111 | |2021|`CVPR`|[Room-and-object aware knowledge reasoning for remote embodied referring expression](https://openaccess.thecvf.com/content/CVPR2021/papers/Gao_Room-and-Object_Aware_Knowledge_Reasoning_for_Remote_Embodied_Referring_Expression_CVPR_2021_paper.pdf)|[](https://github.com/alloldman/CKR)|---|
112 | |2021|`ICCV`|[Vision-language navigation with random environmental mixup](https://openaccess.thecvf.com/content/ICCV2021/papers/Liu_Vision-Language_Navigation_With_Random_Environmental_Mixup_ICCV_2021_paper.pdf)|---|---|
113 | |2021|`ICRA`|[Hierarchical cross-modal agent for robotics vision-and-language navigation](https://arxiv.org/pdf/2104.10674)|---|Robo-VLN<br>first continuous-action-space VLN|
114 | | 2020 | `CVPR` | [Towards Learning a Generic Agent for Vision-and-Language Navigation via Pre-training](https://arxiv.org/abs/2002.10638) | [](https://github.com/weituo12321/PREVALENT) |---|
115 | |2020|`ECCV`|[Active visual information gathering for vision-language navigation](https://arxiv.org/pdf/2007.08037)|[](https://github.com/HanqingWangAI/Active_VLN)|---|
116 | |2020|`CVPR`|[Vision-language navigation with self-supervised auxiliary reasoning tasks](https://openaccess.thecvf.com/content_CVPR_2020/papers/Zhu_Vision-Language_Navigation_With_Self-Supervised_Auxiliary_Reasoning_Tasks_CVPR_2020_paper.pdf)|---|---|
117 | |2020|`ECCV`|[Improving vision-and-language navigation with image-text pairs from the web](https://arxiv.org/pdf/2004.14973)|---|VLN-BERT|
118 | | 2020 | `ECCV` | [Beyond the Nav-Graph: Vision-and-Language Navigation in Continuous Environments](https://arxiv.org/abs/2004.02857) | [](https://github.com/jacobkrantz/VLN-CE) |---|
119 | |2019|`EMNLP`|[Robust navigation with language pretraining and stochastic sampling](https://arxiv.org/pdf/1909.02244)|[](https://github.com/xjli/r2r_vln)|---|
120 | |2019|`CoRL`|[Learning to Map Natural Language Instructions to Physical Quadcopter Control using Simulated Flight](https://arxiv.org/pdf/1910.09664)|[](https://github.com/lil-lab/drif)|---|
121 | |2018|`NeurIPS`|[Speaker-follower models for vision-and-language navigation](https://arxiv.org/pdf/1806.02724)|[](https://github.com/ronghanghu/speaker_follower)|[website](https://ronghanghu.com/speaker_follower/)|
122 | |2018|`RSS`|[Following high-level navigation instructions on a simulated quadcopter with imitation learning](https://arxiv.org/pdf/1806.00047)|[](https://github.com/lil-lab/gsmn)|---|
123 |
124 | ## Simulator and Dataset
125 |
126 |
127 |
128 | | Year | Venue | Paper Title | Repository | Note |
129 | |:----:|:-----:| ----------- |:----------:|:----:|
130 | |2025|`arXiv`|[InternVLA-N1: An Open Dual-System Vision-Language Navigation Foundation Model with Learned Latent Plans](https://internrobotics.github.io/internvla-n1.github.io/static/pdfs/InternVLA_N1.pdf)|[](https://github.com/InternRobotics/InternNav)|[website](https://internrobotics.github.io/internvla-n1.github.io/)<br>InternData-N1 Dataset|
131 | |2025|`arXiv`|[HA-VLN: A Benchmark for Human-Aware Navigation in Discrete–Continuous Environments with Dynamic Multi-Human Interactions, Real-World Validation, and an Open Leaderboard](https://arxiv.org/pdf/2503.14229)|[](https://github.com/F1y1113/HA-VLN)|[website](https://ha-vln-project.vercel.app/)|
132 | |2023|`ICCV`|[Learning Vision-and-Language Navigation from YouTube Videos](https://openaccess.thecvf.com/content/ICCV2023/papers/Lin_Learning_Vision-and-Language_Navigation_from_YouTube_Videos_ICCV_2023_paper.pdf)|[](https://github.com/JeremyLinky/YouTube-VLN)|YouTube-VLN|
133 | |2023|`ICCV`|[AerialVLN: Vision-and-Language Navigation for UAVs](https://openaccess.thecvf.com/content/ICCV2023/papers/Liu_AerialVLN_Vision-and-Language_Navigation_for_UAVs_ICCV_2023_paper.pdf)|[](https://github.com/AirVLN/AirVLN)|AerialVLN|
134 | |2023|`ICCV`|[Scaling data generation in vision-and-language navigation](https://openaccess.thecvf.com/content/ICCV2023/papers/Wang_Scaling_Data_Generation_in_Vision-and-Language_Navigation_ICCV_2023_paper.pdf)|[](https://github.com/wz0919/ScaleVLN)|ScaleVLN|
135 | |2022|`CVPR`|[Habitat-web: Learning embodied object-search strategies from human demonstrations at scale](https://openaccess.thecvf.com/content/CVPR2022/papers/Ramrakhya_Habitat-Web_Learning_Embodied_Object-Search_Strategies_From_Human_Demonstrations_at_Scale_CVPR_2022_paper.pdf)|[](https://github.com/Ram81/habitat-web)|[website](https://ram81.github.io/projects/habitat-web)|
136 | |2022|`CVPR`|[Bridging the gap between learning in discrete and continuous environments for vision-and-language navigation](https://openaccess.thecvf.com/content/CVPR2022/papers/Hong_Bridging_the_Gap_Between_Learning_in_Discrete_and_Continuous_Environments_CVPR_2022_paper.pdf)|[](https://github.com/YicongHong/Discrete-Continuous-VLN)|R2R-CE|
137 | |2021|`CVPR`|[SOON: Scenario Oriented Object Navigation with Graph-based Exploration](https://openaccess.thecvf.com/content/CVPR2021/papers/Zhu_SOON_Scenario_Oriented_Object_Navigation_With_Graph-Based_Exploration_CVPR_2021_paper.pdf)|[](https://github.com/ZhuFengdaaa/SOON)|SOON|
138 | |2020|`CVPR`|[ALFRED: A Benchmark for Interpreting Grounded Instructions for Everyday Tasks](https://openaccess.thecvf.com/content_CVPR_2020/papers/Shridhar_ALFRED_A_Benchmark_for_Interpreting_Grounded_Instructions_for_Everyday_Tasks_CVPR_2020_paper.pdf)|[](https://github.com/askforalfred/alfred)|ALFRED<br>[website](https://askforalfred.com/)|
139 | |2020|`CVPR`|[REVERIE: Remote Embodied Visual Referring Expression in Real Indoor Environments](https://openaccess.thecvf.com/content_CVPR_2020/papers/Qi_REVERIE_Remote_Embodied_Visual_Referring_Expression_in_Real_Indoor_Environments_CVPR_2020_paper.pdf)|[](https://github.com/YuankaiQi/REVERIE)|REVERIE|
140 | |2020|`EMNLP`|[Where are you? localization from embodied dialog](https://arxiv.org/pdf/2011.08277)|---|[website](https://meerahahn.github.io/way/)|
141 | |2020|`CoRL`|[Vision-and-Dialog Navigation](https://arxiv.org/pdf/1907.04957)|[](https://github.com/mmurray/cvdn/)|CVDN<br>[website](https://cvdn.dev/)|
142 | |2020|`EMNLP`|[Room-across-room: Multilingual vision-and-language navigation with dense spatiotemporal grounding](https://arxiv.org/pdf/2010.07954)|[](https://github.com/google-research-datasets/RxR)|RxR|
143 | |2019|`EMNLP`|[Help, Anna! Visual Navigation with Natural Multimodal Assistance via Retrospective Curiosity-Encouraging Imitation Learning](https://arxiv.org/pdf/1909.01871)|[](https://github.com/khanhptnk/hanna)|HANNA|
144 | |2019|`ACL`|[Stay on the path: Instruction fidelity in vision-and-language navigation](https://arxiv.org/pdf/1905.12255)|---|R4R<br>[website](https://github.com/google-research/google-research/tree/master/r4r)|
145 | |2019|`arXiv`|[Learning to navigate unseen environments: Back translation with environmental dropout](https://arxiv.org/pdf/1904.04195)|[](https://github.com/airsplay/R2R-EnvDrop)|R2R-EnvDrop-CE|
146 | |2018|`CVPR`|[Vision-and-language navigation: Interpreting visually-grounded navigation instructions in real environments](https://openaccess.thecvf.com/content_cvpr_2018/papers/Anderson_Vision-and-Language_Navigation_Interpreting_CVPR_2018_paper.pdf)|[](https://github.com/peteanderson80/Matterport3DSimulator)|R2R<br>[website](https://bringmeaspoon.org/)|
147 |
148 |
149 | ## Survey Paper
150 |
151 |
152 | | Year | Venue | Paper Title | Repository | Note |
153 | |:----:|:-----:| ----------- |:----------:|:----:|
154 | |2025|`arXiv`|[Sensing, Social, and Motion Intelligence in Embodied Navigation: A Comprehensive Survey](https://arxiv.org/pdf/2508.15354)|[](https://github.com/Franky-X/Awesome-Embodied-Navigation)|Survey on embodied navigation<br>[blog](https://kwanwaipang.github.io/Enbodied-Navigation/)|
155 | |2025|`IEEE/ASME Transactions on Mechatronics`|[Aligning Cyber Space with Physical World: A Comprehensive Survey on Embodied AI](https://arxiv.org/pdf/2407.06886)|[](https://github.com/HCPLab-SYSU/Embodied_AI_Paper_List)|---|
156 | |2024|`Transactions on Machine Learning Research`|[Vision-and-Language Navigation Today and Tomorrow: A Survey in the Era of Foundation Models](https://openreview.net/pdf?id=yiqeh2ZYUh)|[](https://github.com/zhangyuejoslin/VLN-Survey-with-Foundation-Models)|[blog](https://kwanwaipang.github.io/VLNsurvery2024/)|
157 | |2024|`arXiv`|[A Survey on Vision-Language-Action Models for Embodied AI](https://arxiv.org/pdf/2405.14093)|[](https://github.com/yueen-ma/Awesome-VLA)|Survey for VLA|
158 | |2024|`Neural Computing and Applications`|[Vision-language navigation: a survey and taxonomy](https://arxiv.org/pdf/2108.11544)|---|---|
159 | |2023|`Artificial Intelligence Review`|[Visual language navigation: A survey and open challenges](https://link.springer.com/article/10.1007/s10462-022-10174-9)|---|---|
160 | |2022|`ACL`|[Vision-and-language navigation: A survey of tasks, methods, and future directions](https://arxiv.org/pdf/2203.12667)|[](https://github.com/eric-ai-lab/awesome-vision-language-navigation)|---|
161 |
162 |
163 |
164 |
179 |
180 |
181 |
182 |
183 | # Learning-based Navigation
184 | Including image-goal navigation and object-goal navigation.
185 |
186 |
187 | | Year | Venue | Paper Title | Repository | Note |
188 | |:----:|:-----:| ----------- |:----------:|:----:|
189 | |2025|`arXiv`|[Unified World Models: Memory-Augmented Planning and Foresight for Visual Navigation](https://arxiv.org/pdf/2510.08713)|[](https://github.com/F1y1113/UniWM)|---|
190 | |2025|`arXiv`|[NavDP: Learning Sim-to-Real Navigation Diffusion Policy with Privileged Information Guidance](https://arxiv.org/pdf/2505.08712)|[](https://github.com/InternRobotics/NavDP)|[website](https://wzcai99.github.io/navigation-diffusion-policy.github.io/)|
191 | |2025|`arXiv`|[MSGNav: Unleashing the Power of Multi-modal 3D Scene Graph for Zero-Shot Embodied Navigation](https://arxiv.org/pdf/2511.10376v2)|[](https://github.com/ylwhxht/MSGNav)|---|
192 | |2025|`arXiv`|[Adaptive Interactive Navigation of Quadruped Robots using Large Language Models](https://arxiv.org/pdf/2503.22942)|---|[Video](https://www.youtube.com/watch?v=W5ttPnSap2g)|
193 | |2025|`arXiv`|[DPL: Depth-only Perceptive Humanoid Locomotion via Realistic Depth Synthesis and Cross-Attention Terrain Reconstruction](https://www.arxiv.org/pdf/2510.07152)|---|---|
194 | |2025|`arXiv`|[IGL-Nav: Incremental 3D Gaussian Localization for Image-goal Navigation](https://arxiv.org/pdf/2508.00823)|[](https://github.com/GWxuan/IGL-Nav)|[website](https://gwxuan.github.io/IGL-Nav/)<br>Exploration+target matching|
195 | |2025|`arXiv`|[LOVON: Legged Open-Vocabulary Object Navigator](https://arxiv.org/pdf/2507.06747)|[](https://github.com/DaojiePENG/LOVON)|[website](https://daojiepeng.github.io/LOVON/)|
196 | |2025|`ICRA`|[TANGO: Traversability-Aware Navigation with Local Metric Control for Topological Goals](https://arxiv.org/pdf/2509.08699)|[](https://github.com/podgorki/TANGO)|[website](https://podgorki.github.io/TANGO/)|
197 | |2025|`RSS`|[Learned Perceptive Forward Dynamics Model for Safe and Platform-aware Robotic Navigation](https://arxiv.org/pdf/2504.19322)|[](https://github.com/leggedrobotics/fdm)|[website](https://leggedrobotics.github.io/fdm.github.io/)|
198 | |2025|`arXiv`|[Parkour in the Wild: Learning a General and Extensible Agile Locomotion Policy Using Multi-expert Distillation and RL Fine-tuning](https://arxiv.org/pdf/2505.11164)|---|---|
199 | |2025|`CoRL`|[Omni-Perception: Omnidirectional Collision Avoidance for Legged Locomotion in Dynamic Environments](https://arxiv.org/pdf/2505.19214)|[](https://github.com/aCodeDog/OmniPerception)|---|
200 | |2024|`ICRA`|[VLFM: Vision-Language Frontier Maps for Zero-Shot Semantic Navigation](https://arxiv.org/pdf/2312.03275)|[](https://github.com/bdaiinstitute/vlfm)|[website](https://naoki.io/portfolio/vlfm)|
201 | |2024|`Science Robotics`|[Learning Robust Autonomous Navigation and Locomotion for Wheeled-Legged Robots](https://arxiv.org/pdf/2405.01792)|---|---|
202 | |2024|`RAL`|[PIE: Parkour With Implicit-Explicit Learning Framework for Legged Robots](https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=10678805)|---|---|
203 | |2024|`ICRA`|[Extreme Parkour with Legged Robots](https://arxiv.org/pdf/2309.14341)|[](https://github.com/chengxuxin/extreme-parkour)|[website](https://extreme-parkour.github.io/)|
204 | |2023|`ICML`|[ESC: Exploration with Soft Commonsense Constraints for Zero-Shot Object Navigation](https://proceedings.mlr.press/v202/zhou23r/zhou23r.pdf)|---|---|
205 | |2023|`ICRA`|[Zero-shot object goal visual navigation](https://arxiv.org/pdf/2206.07423)|[](https://github.com/pioneer-innovation/Zero-Shot-Object-Navigation)|---|
206 | |2023|`ICRA`|[ViNL: Visual Navigation and Locomotion Over Obstacles](https://arxiv.org/pdf/2210.14791)|[](https://github.com/SimarKareer/ViNL)|[website](https://www.joannetruong.com/projects/vinl.html)|
207 | |2023|`Field Robotics`|[ArtPlanner: Robust Legged Robot Navigation in the Field](https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=10876046)|[](https://github.com/leggedrobotics/art_planner)|---|
208 |
209 |
210 | ## Mapless navigation
211 |
212 |
213 | | Year | Venue | Paper Title | Repository | Note |
214 | |:----:|:-----:| ----------- |:----------:|:----:|
215 | |2025|`RSS`|[CREStE: Scalable Mapless Navigation with Internet Scale Priors and Counterfactual Guidance](https://arxiv.org/pdf/2503.03921)|[](https://github.com/ut-amrl/creste_public)|[website](https://amrl.cs.utexas.edu/creste/)|
216 |
217 |
218 | # Others
219 |
220 |
221 | | Year | Venue | Paper Title | Repository | Note |
222 | |:----:|:-----:| ----------- |:----------:|:----:|
223 | |2025|`IEEE/ASME Transactions on Mechatronics`|[Aligning Cyber Space with Physical World: A Comprehensive Survey on Embodied AI](https://arxiv.org/pdf/2407.06886)|[](https://github.com/HCPLab-SYSU/Embodied_AI_Paper_List)|---|
224 | |2025|`arXiv`|[HumanoidPano: Hybrid Spherical Panoramic-LiDAR Cross-Modal Perception for Humanoid Robots](https://arxiv.org/pdf/2503.09010)|---|---|
225 | |2021|`ICML`|[Learning transferable visual models from natural language supervision](https://proceedings.mlr.press/v139/radford21a/radford21a.pdf)|[](https://github.com/OpenAI/CLIP)|CLIP<br>[website](https://openai.com/index/clip/)|
226 |
227 | ## Occupancy Perception
228 |
229 |
230 | | Year | Venue | Paper Title | Repository | Note |
231 | |:----:|:-----:| ----------- |:----------:|:----:|
232 | |2025|`arXiv`|[Humanoid Occupancy: Enabling A Generalized Multimodal Occupancy Perception System on Humanoid Robots](https://arxiv.org/pdf/2507.20217)|[](https://github.com/Open-X-Humanoid/Humanoid-Occupancy)|[website](https://humanoid-occupancy.github.io/)<br>Multimodal Occupancy Perception|
233 | |2025|`arXiv`|[RoboOcc: Enhancing the Geometric and Semantic Scene Understanding for Robots](https://arxiv.org/pdf/2504.14604)|---|3DGS|
234 | |2025|`ICCV`|[EmbodiedOcc: Embodied 3D Occupancy Prediction for Vision-based Online Scene Understanding](https://arxiv.org/pdf/2412.04380)|[](https://github.com/YkiWu/EmbodiedOcc)|[website](https://ykiwu.github.io/EmbodiedOcc/)|
235 | |2023|`ICCV`|[Scene as occupancy](https://openaccess.thecvf.com/content/ICCV2023/papers/Tong_Scene_as_Occupancy_ICCV_2023_paper.pdf)|[](https://github.com/OpenDriveLab/OccNet)|[Challenge and dataset](https://github.com/OpenDriveLab/OpenScene)|
236 |
237 | ## VLA
238 | * Paper list for [VLA (Vision-Language-Action)](https://github.com/KwanWaiPang/Awesome-VLA)
239 |
240 |
241 |
--------------------------------------------------------------------------------