├── LICENSE
└── README.md

--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
MIT License

Copyright (c) 2025 KosmosG

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# A Paperlist for RoboMani-Learning 🚀🤖

[![PRs Welcome](https://img.shields.io/badge/PRs-welcome-brightgreen.svg)]()

---

## 📌 Basic Info

This repository collects the latest and most influential papers and resources related to **robotic manipulation**. The focus is on:

- [Generalist Manipulation Models & Methods](#-generalist-manipulation-models-and-methods)
- [Reinforcement Learning (RL) on Robotics Manipulation](#-reinforcement-learning-rl-on-robotics-manipulation)
- [World Model in Robot Manipulation](#-world-model-in-robot-manipulation)
- [Skill Learning in Robotics](#-skill-learning-in-robotics)
- [Data & Benchmarks](#-data-and-benchmarks)
- [Hardware Projects on Robotics](#-hardware-projects-on-robotics)
- [Interdisciplinary](#-interdisciplinary)

The publications are ordered by time, with the most recent at the top.
Papers with **open-source implementations or code** are marked with a ☀️
Papers with **real-world performance** reproduced by us are marked with a ✅
Works whose **code is announced as coming soon** are marked with a 🧐

---

## 📚 Paper List

### 🧠 Generalist Manipulation Models and Methods
- **GEN-0**: Embodied Foundation Models That Scale with Physical Interaction [[paper](https://generalistai.com/blog/nov-04-2025-GEN-0)]
- **LACY**: A Vision-Language Model-based Language-Action Cycle for Self-Improving Robotic Manipulation [[paper](https://arxiv.org/pdf/2511.02239)] [[project](https://vla2026.github.io/LACY/#l2c)]
- **RobustVLA**: Robustness-Aware Reinforcement Post-Training for Vision-Language-Action Models [[paper](https://arxiv.org/pdf/2511.01331)]
- Running VLAs at Real-time Speed [[paper](https://arxiv.org/pdf/2510.26742)] [[project](https://github.com/Dexmal/realtime-vla)] ☀️
- **VLA-0**: Building State-of-the-Art VLAs with Zero Modification [[paper](https://arxiv.org/pdf/2510.13054v1)] [[project](https://vla0.github.io/)] ☀️
- **EgoBridge**: Domain Adaptation for Generalizable Imitation from Egocentric Human Data [[paper](https://arxiv.org/pdf/2509.19626)] [[project](https://ego-bridge.github.io/)] 🧐
- **MLA**: A Multisensory Language-Action Model for Multimodal Understanding and Forecasting in Robotic Manipulation [[paper](https://arxiv.org/pdf/2509.26642)] [[project](https://sites.google.com/view/open-mla)]
- Seeing Space and Motion: Enhancing Latent Actions with Spatial and Dynamic Awareness for VLA [[paper](https://arxiv.org/pdf/2509.26251)]
- Self-Improving Embodied Foundation Models [[paper](https://arxiv.org/pdf/2509.15155)] [[project](https://self-improving-efms.github.io/)]
- The Better You Learn, The Smarter You Prune: Towards Efficient Vision-Language-Action Models via Differentiable Token Pruning [[paper](https://arxiv.org/pdf/2509.12594?)] [[project](https://liauto-research.github.io/LightVLA/)]
- Enhancing Generalization in Vision-Language-Action Models by Preserving Pretrained Representations [[paper](https://arxiv.org/pdf/2509.11417?)] [[project](https://gen-vla.github.io/)] 🧐
- **WALL-OSS**: Igniting VLMs toward the Embodied Space [[paper](https://arxiv.org/pdf/2509.11766)] [[project](https://github.com/X-Square-Robot/wall-x)] ☀️
- **TA-VLA**: Elucidating the Design Space of Torque-aware Vision-Language-Action Models [[paper](https://arxiv.org/pdf/2509.07962)] [[project](https://zzongzheng0918.github.io/Torque-Aware-VLA.github.io/)] 🧐
- **F1**: A Vision-Language-Action Model Bridging Understanding and Generation to Actions [[paper](https://arxiv.org/pdf/2509.06951v2)] [[project](https://aopolin-lv.github.io/F1-VLA/)] ☀️
- **RaC**: Robot Learning for Long-Horizon Tasks by Scaling Recovery and Correction [[paper](https://arxiv.org/pdf/2509.07953)] [[project](https://rac-scaling-robot.github.io/)] 🧐
- **SpecPrune-VLA**: Accelerating Vision-Language-Action Models via Action-Aware Self-Speculative Pruning [[paper](https://arxiv.org/pdf/2509.05614)]
- **FlowVLA**: Thinking in Motion with a Visual Chain of Thought [[paper](https://arxiv.org/pdf/2508.18269)]
- **CAST**: Counterfactual Labels Improve Instruction Following in Vision-Language-Action Models [[paper](https://arxiv.org/pdf/2508.13446)] [[project](https://cast-vla.github.io/)] ☀️
- **Grounding Actions in Camera Space**: Observation-Centric Vision-Language-Action Policy [[paper](https://arxiv.org/pdf/2508.13103)]
- **GeoVLA**: Empowering 3D Representations in Vision-Language-Action Models [[paper](https://arxiv.org/pdf/2508.09071)] [[project](https://linsun449.github.io/GeoVLA//)] 🧐
- **GraphCoT-VLA**: A 3D Spatial-Aware Reasoning Vision-Language-Action Model for Robotic Manipulation with Ambiguous Instructions [[paper](https://arxiv.org/pdf/2508.07650)]
- **InstructVLA**: Vision-Language-Action Instruction Tuning: From Understanding to Manipulation [[paper](https://arxiv.org/pdf/2507.17520)] [[project](https://yangs03.github.io/InstructVLA_Home/)] ☀️
- **Being-H0**: Vision-Language-Action Pretraining from Large-Scale Human Videos [[paper](https://arxiv.org/pdf/2507.15597)] [[project](https://beingbeyond.github.io/Being-H0/)] ☀️
- **EgoVLA**: Learning Vision-Language-Action Models from Egocentric Human Videos [[paper](https://arxiv.org/pdf/2507.12440)] [[project](https://rchalyang.github.io/EgoVLA/)]
- **ThinkAct**: Vision-Language-Action Reasoning via Reinforced Visual Latent Planning [[paper](https://arxiv.org/pdf/2507.16815)] [[project](https://jasper0314-huang.github.io/thinkact-vla/)] 🧐
- **GR-3** Technical Report [[paper](https://arxiv.org/pdf/2507.15493)] [[project](https://seed.bytedance.com/en/GR3)]
- **VITA**: Vision-to-Action Flow Matching Policy [[paper](https://arxiv.org/pdf/2507.13231)] [[project](https://ucd-dare.github.io/VITA/)] 🧐
- **Tactile-VLA**: Unlocking Vision-Language-Action Model's Physical Knowledge for Tactile Generalization [[paper](https://arxiv.org/pdf/2507.09160)]
- **VOTE**: Vision-Language-Action Optimization with Trajectory Ensemble Voting [[paper](https://arxiv.org/pdf/2507.05116)] [[project](https://github.com/LukeLIN-web/VOTE)] ☀️
- **Evo-0**: Vision-Language-Action Model with Implicit Spatial Understanding [[paper](https://www.arxiv.org/pdf/2507.00416)] [[project](https://mint-sjtu.github.io/Evo-VLA.io/)] 🧐
- **TriVLA**: A Triple-System-Based Unified Vision-Language-Action Model for General Robot Control [[paper](https://arxiv.org/pdf/2507.01424)] [[project](https://zhenyangliu.github.io/TriVLA/)] 🧐
- **WorldVLA**: Towards Autoregressive Action World Model [[paper](https://arxiv.org/pdf/2506.21539)] [[project](https://github.com/alibaba-damo-academy/WorldVLA)] ☀️
- Grounding Language Models with Semantic Digital Twins for Robotic Planning [[paper](https://arxiv.org/pdf/2506.16493)]
- **Prompting with the Future**: Open-World Model Predictive Control with Interactive Digital Twins [[paper](https://arxiv.org/pdf/2506.13761)] [[project](https://prompting-with-the-future.github.io/)] ☀️
- **RationalVLA**: A Rational Vision-Language-Action Model with Dual System [[paper](https://arxiv.org/pdf/2506.10826)] [[project](https://irpn-eai.github.io/RationalVLA/)] 🧐
- **Chain-of-Action**: Trajectory Autoregressive Modeling for Robotic Manipulation [[paper](https://arxiv.org/pdf/2506.09990)] [[project](https://chain-of-action.github.io/)] ☀️
- **BitVLA**: 1-bit Vision-Language-Action Models for Robotics Manipulation [[paper](https://arxiv.org/pdf/2506.07530)]
- **PDFactor**: Learning Tri-Perspective View Policy Diffusion Field for Multi-Task Robotic Manipulation [[paper](https://openaccess.thecvf.com/content/CVPR2025/papers/Tian_PDFactor_Learning_Tri-Perspective_View_Policy_Diffusion_Field_for_Multi-Task_Robotic_CVPR_2025_paper.pdf)]
- **FlowRAM**: Grounding Flow Matching Policy with Region-Aware Mamba Framework for Robotic Manipulation [[paper](https://openaccess.thecvf.com/content/CVPR2025/papers/Wang_FlowRAM_Grounding_Flow_Matching_Policy_with_Region-Aware_Mamba_Framework_for_CVPR_2025_paper.pdf)]
- Real-Time Execution of Action Chunking Flow Policies [[paper](https://www.physicalintelligence.company/download/real_time_chunking.pdf)] ☀️
- **SmolVLA**: A Vision-Language-Action Model for Affordable and Efficient Robotics [[paper](https://arxiv.org/pdf/2506.01844)] [[project](https://huggingface.co/lerobot/smolvla_base)] ☀️ ✅
- Knowledge Insulating Vision-Language-Action Models: Train Fast, Run Fast, Generalize Better [[paper](https://arxiv.org/pdf/2505.23705)] [[project](https://www.physicalintelligence.company/research/knowledge_insulation)]
- **ChatVLA-2**: Vision-Language-Action Model with Open-World Embodied Reasoning from Pretrained Knowledge [[paper](https://arxiv.org/pdf/2505.21906)] [[project](https://chatvla-2.github.io/)] ☀️
- **ForceVLA**: Enhancing VLA Models with a Force-aware MoE for Contact-rich Manipulation [[paper](https://arxiv.org/pdf/2505.22159)] [[project](https://sites.google.com/view/forcevla2025/)]
- **DexUMI**: Using Human Hand as the Universal Manipulation Interface for Dexterous Manipulation [[paper](https://arxiv.org/pdf/2505.21864)] [[project](https://dex-umi.github.io/)] ☀️
- **Hume**: Introducing System-2 Thinking in Visual-Language-Action Model [[paper](https://arxiv.org/pdf/2505.21432)]
- **FLARE**: Robot Learning with Implicit World Modeling [[paper](https://arxiv.org/pdf/2505.15659)]
- **InSpire**: Vision-Language-Action Models with Intrinsic Spatial Reasoning [[paper](https://arxiv.org/pdf/2505.13888)] [[project](https://koorye.github.io/proj/Inspire/)] ☀️
- **DreamGen**: Unlocking Generalization in Robot Learning through Neural Trajectories [[paper](https://arxiv.org/pdf/2505.12705)] [[project](https://research.nvidia.com/labs/gear/dreamgen/)] ☀️
- **GLOVER++**: Unleashing the Potential of Affordance Learning from Human Behaviors for Robotic Manipulation [[paper](https://arxiv.org/pdf/2505.11865)] [[project](https://teleema.github.io/projects/GLOVER++/)] ☀️
- **UniVLA**: Learning to Act Anywhere with Task-centric Latent Action [[paper](https://arxiv.org/pdf/2505.06111)] [[project](https://github.com/OpenDriveLab/UniVLA)] ☀️
- **NORA**: A Small Open-Sourced Generalist Vision Language Action Model for Embodied Tasks [[paper](https://arxiv.org/pdf/2504.19854)] [[project](https://declare-lab.github.io/nora)] ☀️ ✅
- **Gemini Robotics**: Bringing AI into the Physical World [[paper](https://arxiv.org/abs/2503.20020)] [[project](https://deepmind.google/models/gemini-robotics/)] ☀️
- **GR00T N1**: An Open Foundation Model for Generalist Humanoid Robots [[paper](https://arxiv.org/pdf/2503.14734)] [[project](https://developer.nvidia.com/isaac/gr00t)] ☀️ ✅
- **π0.5**: a VLA with Open-World Generalization [[paper](https://www.physicalintelligence.company/download/pi05.pdf)] [[project](https://www.physicalintelligence.company/blog/pi05)] ☀️
- **PointVLA**: Injecting the 3D World into Vision-Language-Action Model [[paper](https://arxiv.org/pdf/2503.07511)] [[project](https://pointvla.github.io/)]
- **GraspVLA**: a Grasping Foundation Model Pre-trained on Billion-scale Synthetic Action Data [[paper](https://arxiv.org/pdf/2505.03233)] [[project](https://pku-epic.github.io/GraspVLA-web/)] ☀️
- **Reactive Diffusion Policy**: Slow-Fast Visual-Tactile Policy Learning for Contact-Rich Manipulation [[paper](https://arxiv.org/pdf/2503.02881)] [[project](https://reactive-diffusion-policy.github.io/)] ☀️
- **Unified Video Action Model** [[paper](https://arxiv.org/pdf/2503.00200)] [[project](https://unified-video-action-model.github.io/)] ☀️
- **OpenVLA-OFT**: Fine-Tuning Vision-Language-Action Models: Optimizing Speed and Success [[paper](https://arxiv.org/pdf/2502.19645)] [[project](https://openvla-oft.github.io)] ☀️ ✅
- **Helix**: A Vision-Language-Action Model for Generalist Humanoid Control [[project](https://www.figure.ai/news/helix)]
- **You Only Teach Once**: Learn One-Shot Bimanual Robotic Manipulation from Video Demonstrations [[paper](https://arxiv.org/pdf/2501.14208)] [[project](https://hnuzhy.github.io/projects/YOTO/)] ☀️
- **TraceVLA**: Visual Trace Prompting Enhances Spatial-Temporal Awareness for Generalist Robotic Policies [[paper](https://arxiv.org/pdf/2412.10345)] [[project](https://tracevla.github.io/)] ☀️
- **CogACT**: A Foundational Vision-Language-Action Model for Synergizing Cognition and Action in Robotic Manipulation [[paper](https://arxiv.org/pdf/2411.19650)] [[project](https://cogact.github.io/)] ☀️ ✅
- **GRAPE**: Generalizing Robot Policy via Preference Alignment [[paper](https://arxiv.org/pdf/2411.19309)] [[project](https://grape-vla.github.io/)] ☀️
- **iDP3**: Generalizable Humanoid Manipulation with 3D Diffusion Policies [[paper](https://arxiv.org/pdf/2410.10803)] [[project](https://humanoid-manipulation.github.io/)] ☀️
- **π0**: A Vision-Language-Action Flow Model for General Robot Control [[paper](https://arxiv.org/pdf/2410.24164)] [[project](https://www.physicalintelligence.company/blog/pi0)] ☀️ ✅
- **TinyVLA**: Towards Fast, Data-Efficient Vision-Language-Action Models for Robotic Manipulation [[paper](https://arxiv.org/pdf/2409.12514)] [[project](https://arxiv.org/pdf/2409.12514)] ☀️
- **ReKep**: Spatio-Temporal Reasoning of Relational Keypoint Constraints for Robotic Manipulation [[paper](https://arxiv.org/pdf/2409.01652)] [[project](https://rekep-robot.github.io/)] ☀️ ✅
- **OpenVLA**: An Open-Source Vision-Language-Action Model [[paper](https://arxiv.org/pdf/2406.09246)] [[project](https://openvla.github.io/)] ☀️ ✅
- **3D Diffusion Policy**: Generalizable Visuomotor Policy Learning via Simple 3D Representations [[paper](https://arxiv.org/pdf/2403.03954)] [[project](https://3d-diffusion-policy.github.io/)] ☀️ ✅

### 🔁 Reinforcement Learning (RL) on Robotics Manipulation
- **GR-RL**: Going Dexterous and Precise for Long-Horizon Robotic Manipulation [[paper](https://arxiv.org/pdf/2512.01801#page=15.11)] [[project](https://seed.bytedance.com/gr_rl)]
- **π∗0.6**: a VLA That Learns From Experience [[paper](https://www.physicalintelligence.company/download/pistar06.pdf)]
- **πRL**: Online RL Fine-Tuning for Flow-Based Vision-Language-Action Models [[paper](https://arxiv.org/pdf/2510.25889)] [[project](https://github.com/RLinf/RLinf)] ☀️
- **Probe, Learn, Distill**: Self-Improving Vision-Language-Action Models with Data Generation via Residual RL [[paper](https://www.wenlixiao.com/self-improve-VLA-PLD/assets/doc/pld-fullpaper.pdf)] [[project](https://www.wenlixiao.com/self-improve-VLA-PLD)] 🧐
- **RL-100**: Performant Robotic Manipulation with Real-World Reinforcement Learning [[paper](https://arxiv.org/pdf/2510.14830)] [[project](https://lei-kun.github.io/RL-100/)] 🧐
- **SARM**: Stage-Aware Reward Modeling for Long Horizon Robot Manipulation [[paper](https://arxiv.org/pdf/2509.25358)] [[project](https://qianzhong-chen.github.io/sarm.github.io/)] 🧐
- **DeepSearch**: Overcome the Bottleneck of Reinforcement Learning with Verifiable Rewards via Monte Carlo Tree Search [[paper](https://arxiv.org/pdf/2509.25454)] [[project](https://huggingface.co/fangwu97/DeepSearch-1.5B)] ☀️
- Residual Off-Policy RL for Finetuning Behavior Cloning Policies [[paper](https://arxiv.org/pdf/2509.19301)] [[project](https://residual-offpolicy-rl.github.io//)] 🧐
- **SOE**: Sample-Efficient Robot Policy Self-Improvement via On-Manifold Exploration [[paper](https://arxiv.org/pdf/2509.19292)] [[project](https://ericjin2002.github.io/SOE/)] 🧐
- Self-Improving Embodied Foundation Models [[paper](https://arxiv.org/pdf/2509.15155)] [[project](https://self-improving-efms.github.io/)]
- **RLinf**: Reinforcement Learning Infrastructure for Agentic AI [[paper](https://rlinf.readthedocs.io/en/latest/)] [[project](https://github.com/RLinf/RLinf)] ☀️
- Compute-Optimal Scaling for Value-Based Deep RL [[paper](https://arxiv.org/pdf/2508.14881)]
- **Embodied-R1**: Reinforced Embodied Reasoning for General Robotic Manipulation [[paper](https://arxiv.org/pdf/2508.13998)] [[project](https://embodied-r1.github.io/)] ☀️
- **CO-RFT**: Efficient Fine-Tuning of Vision-Language-Action Models through Chunked Offline Reinforcement Learning [[paper](https://arxiv.org/pdf/2508.02219)]
- **ROVER**: Recursive Reasoning Over Videos with Vision-Language Models for Embodied Tasks [[paper](https://arxiv.org/pdf/2508.01943)] [[project](https://rover-vlm.github.io/)] ☀️
- Reinforcement Learning for Flow-Matching Policies [[paper](https://arxiv.org/pdf/2507.15073v1)]
- **FOUNDER**: Grounding Foundation Models in World Models for Open-Ended Embodied Decision Making [[paper](https://arxiv.org/pdf/2507.12496)] [[project](https://sites.google.com/view/founder-rl)]
- **EXPO**: Stable Reinforcement Learning with Expressive Policies [[paper](https://arxiv.org/pdf/2507.07986)]
- Asynchronous Multi-Agent Deep Reinforcement Learning under Partial Observability [[paper](https://journals.sagepub.com/doi/full/10.1177/02783649241306124)]
- Reinforcement Learning with Action Chunking [[paper](https://www.alphaxiv.org/abs/2507.07969v1)] [[project](https://github.com/ColinQiyangLi/qc)] ☀️ ✅
- **SimLauncher**: Launching Sample-Efficient Real-world Robotic Reinforcement Learning via Simulation Pre-training [[paper](https://arxiv.org/pdf/2507.04452)]
- **RLRC**: Reinforcement Learning-based Recovery for Compressed Vision-Language-Action Models [[paper](https://arxiv.org/pdf/2506.17639)] [[project](https://rlrc-vla.github.io/)] 🧐
- Steering Your Diffusion Policy with Latent Space Reinforcement Learning [[paper](https://arxiv.org/pdf/2506.15799)] [[project](https://diffusion-steering.github.io/)] ☀️
- **GMT**: General Motion Tracking for Humanoid Whole-Body Control [[paper](https://arxiv.org/pdf/2506.14770)] [[project](https://gmt-humanoid.github.io/)] ☀️
- **Eye, Robot**: Learning to Look to Act with a BC-RL Perception-Action Loop [[paper](https://arxiv.org/pdf/2506.10968)] [[project](https://www.eyerobot.net/)] 🧐
- Reinforcement Learning via Implicit Imitation Guidance [[paper](https://arxiv.org/pdf/2506.07505)]
- Robotic Policy Learning via Human-assisted Action Preference Optimization [[paper](https://arxiv.org/pdf/2506.07127)] [[project](https://gewu-lab.github.io/hapo_human_assisted_preference_optimization/)] 🧐
- Self-Adapting Improvement Loops for Robotic Learning [[paper](https://arxiv.org/pdf/2506.07505)] [[project](https://diffusion-supervision.github.io/sail/)] 🧐
- **Robot-R1**: Reinforcement Learning for Enhanced Embodied Reasoning in Robotics [[paper](https://arxiv.org/pdf/2506.00070v1)]
- Self-Challenging Language Model Agents [[paper](https://arxiv.org/pdf/2506.01716)]
- Diffusion Guidance Is a Controllable Policy Improvement Operator [[paper](https://arxiv.org/pdf/2505.23458)] [[project](https://github.com/kvfrans/cfgrl)] ☀️
- **Beyond Markovian**: Reflective Exploration via Bayes-Adaptive RL for LLM Reasoning [[paper](https://arxiv.org/pdf/2505.20561v1)] [[project](https://github.com/shenao-zhang/BARL)] ☀️
- What Can RL Bring to VLA Generalization? An Empirical Study [[paper](https://arxiv.org/pdf/2505.19789)] [[project](https://rlvla.github.io/)] ☀️
- Learning to Reason without External Rewards [[paper](https://arxiv.org/pdf/2505.19590)]
- **GenPO**: Generative Diffusion Models Meet On-Policy Reinforcement Learning [[paper](https://arxiv.org/pdf/2505.18763)]
- **VLA-RL**: Towards Masterful and General Robotic Manipulation with Scalable Reinforcement Learning [[paper](https://arxiv.org/pdf/2505.18719)] [[code](https://github.com/GuanxingLu/vlarl)] ☀️ ✅
- **SimpleVLA-RL**: Online RL with Simple Reward Enables Training VLA Models with Only One Trajectory [[paper](https://www.alphaxiv.org/abs/2509.09674)] [[code](https://github.com/PRIME-RL/SimpleVLA-RL)] ☀️ ✅
- **TeViR**: Text-to-Video Reward with Diffusion Models for Efficient Reinforcement Learning [[paper](https://arxiv.org/pdf/2505.19769)]
- **Genie Centurion**: Accelerating Scalable Real-World Robot Training with Human Rewind-and-Refine Guidance [[paper](https://arxiv.org/pdf/2505.18793)] [[project](https://genie-centurion.github.io/)] ☀️
- **RIPT-VLA**: Interactive Post-Training for Vision-Language-Action Models [[paper](https://arxiv.org/pdf/2505.17016)] [[project](https://ariostgx.github.io/ript_vla/)] ☀️
- **ManipLVM-R1**: Reinforcement Learning for Reasoning in Embodied Manipulation with Large Vision-Language Models [[paper](https://arxiv.org/pdf/2505.16517)]
- Deep reinforcement learning for robotic manipulation [[technical report](https://patentimages.storage.googleapis.com/7f/04/95/2437c0dc1b5ab6/US20250153352A1.pdf)]
- **DORA**: Object Affordance-Guided Reinforcement Learning for Dexterous Robotic Manipulation [[paper](https://arxiv.org/pdf/2505.14819)] [[project](https://sites.google.com/view/dora-manip)] 🧐
- What Matters for Batch Online Reinforcement Learning in Robotics? [[paper](https://arxiv.org/pdf/2505.08078)]
- **ReinboT**: Amplifying Robot Visual-Language Manipulation with Reinforcement Learning [[paper](https://arxiv.org/pdf/2505.07395)]
- **IN-RIL**: Interleaved Reinforcement and Imitation Learning for Policy Fine-tuning [[paper](https://arxiv.org/pdf/2505.10442)] [[project](https://github.com/ucd-dare/IN-RIL)] 🧐
- **MoRE**: Unlocking Scalability in Reinforcement Learning for Quadruped Vision-Language-Action Models [[paper](https://arxiv.org/pdf/2503.08007)]
- **ConRFT**: A Reinforced Fine-tuning Method for VLA Models via Consistency Policy [[paper](https://arxiv.org/pdf/2502.05450)] [[project](https://cccedric.github.io/conrft/)] ☀️
- **Rethinking Latent Redundancy in Behavior Cloning**: An Information Bottleneck Approach for Robot Manipulation [[paper](https://arxiv.org/pdf/2502.02853)] [[project](https://baishuanghao.github.io/BC-IB.github.io/)] ☀️
- Flow Q-Learning [[paper](https://arxiv.org/pdf/2502.02538)] [[project](https://seohong.me/projects/fql/)] ☀️
- Improving Vision-Language-Action Model with Online Reinforcement Learning [[paper](https://arxiv.org/pdf/2501.16664)]
- **RLDG**: Robotic Generalist Policy Distillation via Reinforcement Learning [[paper](https://arxiv.org/pdf/2412.09858)] [[project](https://generalist-distillation.github.io/)] ☀️
- Vision Language Models are In-Context Value Learners [[paper](https://arxiv.org/pdf/2411.04549)] [[project](https://generative-value-learning.github.io/)] ☀️
- **HIL-SERL**: Precise and Dexterous Robotic Manipulation via Human-in-the-Loop Reinforcement Learning [[paper](https://arxiv.org/pdf/2410.21845)] [[project](https://hil-serl.github.io/)] ☀️ ✅
- **SERL**: A Software Suite for Sample-Efficient Robotic Reinforcement Learning [[paper](https://arxiv.org/pdf/2401.16013)] [[project](https://serl-robot.github.io/)] ☀️ ✅

### 🌍 World Model in Robot Manipulation
- **WMPO**: World Model-based Policy Optimization for Vision-Language-Action Models [[paper](https://arxiv.org/pdf/2511.09515)] [[project](https://wm-po.github.io/)]
- **Ctrl-World**: A Controllable Generative World Model for Robot Manipulation [[paper](https://arxiv.org/abs/2510.10125)] [[project](https://ctrl-world.github.io/)]
- **World4RL**: Diffusion World Models for Policy Refinement with Reinforcement Learning for Robotic Manipulation [[paper](https://arxiv.org/pdf/2509.19080)] [[project](https://world4rl.github.io/)]
- Latent Action Pretraining Through World Modeling [[paper](https://arxiv.org/pdf/2509.18428)]
- **OmniWorld**: A Multi-Domain and Multi-Modal Dataset for 4D World Modeling [[paper](https://arxiv.org/pdf/2509.12201)] [[project](https://yangzhou24.github.io/OmniWorld/)] ☀️
- World Modeling with Probabilistic Structure Integration [[paper](https://arxiv.org/pdf/2509.09737)]
- Planning with Reasoning using Vision Language World Model [[paper](https://arxiv.org/pdf/2509.02722)]
- **Learning Primitive Embodied World Models**: Towards Scalable Robotic Learning [[paper](https://arxiv.org/pdf/2508.20840)] [[project](https://qiaosun22.github.io/PrimitiveWorld/)] 🧐
- **GWM**: Towards Scalable Gaussian World Models for Robotic Manipulation [[paper](https://arxiv.org/abs/2508.17600)] [[project](https://gaussian-world-model.github.io/)] ✅
- Latent Policy Steering with Embodiment-Agnostic Pretrained World Models [[paper](https://arxiv.org/pdf/2507.13340)]
- Test-Time Scaling with World Models for Spatial Reasoning [[paper](https://arxiv.org/pdf/2507.12508)] [[project](https://umass-embodied-agi.github.io/MindJourney/)] ☀️
- **FOUNDER**: Grounding Foundation Models in World Models for Open-Ended Embodied Decision Making [[paper](https://arxiv.org/pdf/2507.12496)] [[project](https://sites.google.com/view/founder-rl)]
- **Martian World Models**: Controllable Video Synthesis with Physically Accurate 3D Reconstructions [[paper](https://arxiv.org/pdf/2507.07978)] [[project](https://marsgenai.github.io/)] ☀️
- **EmbodieDreamer**: Advancing Real2Sim2Real Transfer for Policy Training via Embodied World Modeling [[paper](https://arxiv.org/pdf/2507.05198)] [[project](https://embodiedreamer.github.io/)] 🧐
- **DreamVLA**: A Vision-Language-Action Model Dreamed with Comprehensive World Knowledge [[paper](https://arxiv.org/pdf/2507.04447)] [[project](https://zhangwenyao1.github.io/DreamVLA/)] ☀️
- A Survey: Learning Embodied Intelligence from Physical Simulators and World Models [[paper](https://arxiv.org/pdf/2507.00917)] [[project](https://github.com/NJU3DV-LoongGroup/Embodied-World-Models-Survey)]
- **RoboScape**: Physics-informed Embodied World Model [[paper](https://arxiv.org/pdf/2506.23135)] [[project](https://github.com/tsinghua-fib-lab/RoboScape)] 🧐
- **ParticleFormer**: A 3D Point Cloud World Model for Multi-Object, Multi-Material Robotic Manipulation [[paper](https://arxiv.org/pdf/2506.23126)] [[project](https://suninghuang19.github.io/particleformer_page/)] 🧐
- **World4Omni**: A Zero-Shot Framework from Image Generation World Model to Robotic Manipulation [[paper](https://arxiv.org/pdf/2506.23919)] [[project](https://world4omni.github.io/)] 🧐
- **RoboPearls**: Editable Video Simulation for Robot Manipulation [[paper](https://arxiv.org/pdf/2506.22756)] [[project](https://tangtaogo.github.io/RoboPearls/)] 🧐
- Prompting with the Future: Open-World Model Predictive Control with Interactive Digital Twins [[paper](https://arxiv.org/pdf/2506.13761)] [[project](https://prompting-with-the-future.github.io/)] ☀️
- **FLARE**: Robot Learning with Implicit World Modeling [[paper](https://arxiv.org/pdf/2505.15659)] [[project](https://research.nvidia.com/labs/gear/flare/)] 🧐
- Occupancy World Model for Robots [[paper](https://arxiv.org/pdf/2505.05512)]
- Learning 3D Persistent Embodied World Models [[paper](https://arxiv.org/pdf/2505.05495)]
- **Dynamical Diffusion**: Learning Temporal Dynamics with Diffusion Models [[paper](https://arxiv.org/pdf/2503.00951)] [[project](https://github.com/thuml/dynamical-diffusion)] ☀️
- **iVideoGPT**: Interactive VideoGPTs are Scalable World Models [[paper](https://arxiv.org/pdf/2405.15223)] [[project](https://thuml.github.io/iVideoGPT/)] ☀️ ✅

### 🦾 Skill Learning in Robotics

### 📦 Data and Benchmarks
- Constraint-Preserving Data Generation for Visuomotor Policy Learning [[paper](https://arxiv.org/pdf/2508.03944)] [[project](https://cp-gen.github.io/)] ☀️
- **Shortcut Learning in Generalist Robot Policies**: The Role of Dataset Diversity and Fragmentation [[paper](https://arxiv.org/pdf/2508.06426)] ☀️
- **FreeTacMan**: Robot-free Visuo-Tactile Data Collection System for Contact-rich Manipulation [[paper](https://arxiv.org/pdf/2506.01941)] [[project](https://freetacmanblog.github.io/)] ☀️
- Guiding Data Collection via Factored Scaling Curves [[paper](https://arxiv.org/pdf/2505.07728)]
- **DemoGen**: Synthetic Demonstration Generation for Data-Efficient Visuomotor Policy Learning [[paper](https://arxiv.org/pdf/2502.16932)] [[project](https://demo-generation.github.io/)] ☀️
- Human-Agent Joint Learning for Efficient Robot Manipulation Skill Acquisition [[paper](https://arxiv.org/pdf/2407.00299)] [[project](https://norweig1an.github.io/HAJL.github.io/)] 🧐
- **AutoBio**: A Simulation and Benchmark for Robotic Automation in Digital Biology Laboratory [[paper](https://arxiv.org/pdf/2505.14030)] [[project](https://github.com/autobio-bench/AutoBio)] ☀️
- **3DFlowAction**: Learning Cross-Embodiment Manipulation from 3D Flow World Model [[paper](https://arxiv.org/pdf/2506.06199)] [[project](https://github.com/Hoyyyaard/3DFlowAction/)] ☀️
- A very good survey and report on simulators [[project](https://simulately.wiki)]
- **EgoDex**: Learning Dexterous Manipulation from Large-Scale Egocentric Video [[paper](https://arxiv.org/pdf/2505.11709)]
- **Open X-Embodiment**: Robotic Learning Datasets and RT-X Model [[paper](https://arxiv.org/pdf/2310.08864)] [[project](https://robotics-transformer-x.github.io/)] ✅
- **GENMANIP**: LLM-driven Simulation for Generalizable Instruction-Following Manipulation [[paper](https://arxiv.org/pdf/2506.10966)] [[project](https://genmanip.axi404.top/)] ✅
- **RoboArena**: Distributed Real-World Evaluation of Generalist Robot Policies [[paper](https://robo-arena.github.io/assets/roboarena-B1XSLVwD.pdf)] [[project](https://robo-arena.github.io/)]
- **RoboCerebra**: A Large-scale Benchmark for Long-horizon Robotic Manipulation Evaluation [[paper](https://arxiv.org/pdf/2506.06677)] [[project](https://robocerebra.github.io/)]
- **RoboVerse**: Towards a Unified Platform, Dataset and Benchmark for Scalable and Generalizable Robot Learning [[paper](https://arxiv.org/pdf/2504.18904)] [[project](https://roboverseorg.github.io/)] ✅
- **AgiBot World Colosseo**: A Large-scale Manipulation Platform for Scalable and Intelligent Embodied Systems [[paper](https://arxiv.org/pdf/2503.06669)] [[project](https://agibot-world.com/)] ☀️
- **RoboTwin**: Dual-Arm Robot Benchmark with Generative Digital Twins [[paper](https://arxiv.org/pdf/2504.13059)] [[project](https://robotwin-benchmark.github.io/)] ✅
- **ManiSkill3**: Demonstrating GPU Parallelized Robot Simulation and Rendering for Generalizable Embodied AI [[paper](https://arxiv.org/pdf/2410.00425)] [[project](https://www.maniskill.ai/)] ✅
- **SimplerEnv**: Simulated Manipulation Policy Evaluation Environments for Real Robot Setups [[paper](https://arxiv.org/pdf/2405.05941)] [[project](https://simpler-env.github.io/)] ✅
- **LIBERO**: Benchmarking Knowledge Transfer for Lifelong Robot Learning [[paper](https://arxiv.org/pdf/2306.03310)] [[project](https://libero-project.github.io/main.html)] ✅
- **DISCOVERSE**: Efficient Robot Simulation in Complex High-Fidelity Environments [[paper](https://drive.google.com/file/d/1pG8N2qBdLuqj8_wylTYgsXYGOKMhwKXB/view)] [[project](https://air-discoverse.github.io/)] ✅

### 🛠️ Hardware Projects on Robotics
- **Vision in Action**: Learning Active Perception from Human Demonstrations [[paper](https://arxiv.org/pdf/2506.15666)] [[project](https://vision-in-action.github.io/)] ☀️
- **TWIST**: Teleoperated Whole-Body Imitation System [[paper](https://arxiv.org/pdf/2505.02833)] [[project](https://yanjieze.com/TWIST/)] 🧐
- **Berkeley Humanoid Lite**: An Open-source, Accessible, and Customizable 3D-printed Humanoid Robot [[paper](https://arxiv.org/pdf/2504.17249)] [[project](https://lite.berkeley-humanoid.org/)] ☀️
- **BEHAVIOR Robot Suite**: Streamlining Real-World Whole-Body Manipulation for Everyday Household Activities [[paper](https://arxiv.org/pdf/2503.05652)] [[project](https://behavior-robot-suite.github.io/)] ☀️ ✅
- **Reactive Diffusion Policy**: Slow-Fast Visual-Tactile Policy Learning for Contact-Rich Manipulation [[paper](https://arxiv.org/pdf/2503.02881)] [[project](https://reactive-diffusion-policy.github.io/)] ☀️
- **HOVER**: Versatile Neural Whole-Body Controller for Humanoid Robots [[paper](https://arxiv.org/pdf/2410.21229)] [[project](https://hover-versatile-humanoid.github.io/)] ☀️
- **Mobile ALOHA**: Learning Bimanual Mobile Manipulation with Low-Cost Whole-Body Teleoperation [[paper](https://arxiv.org/pdf/2401.02117)] [[project](https://mobile-aloha.github.io/)] ☀️

### 🔬 Interdisciplinary
- The hippocampal sharp wave–ripple in memory retrieval for immediate use and consolidation [[paper](https://www.nature.com/articles/s41583-018-0077-1)] [[full text](https://pmc.ncbi.nlm.nih.gov/articles/PMC6794196/)]

---

## 🙋 Contributing

This repo is inspired by Yanjie Ze's [[Paperlist](https://github.com/YanjieZe/awesome-humanoid-robot-learning)].
Feel free to submit pull requests for new papers, corrected links, or updated results; a suggested entry template is provided at the end of this README.

---

## 📜 License

[MIT](LICENSE)
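---

## 📬 Entry Template

For pull requests, the snippet below is a minimal sketch of the entry format used throughout this list; the method name, title, and links are placeholders for illustration, not a real publication:

```markdown
- **YourMethod**: One-Line Title of the Paper [[paper](https://arxiv.org/pdf/XXXX.XXXXX)] [[project](https://example.github.io/yourmethod/)] ☀️
```

Append ☀️ when an open-source implementation is available, ✅ when the real-world results have been reproduced by us, and 🧐 when the code is only announced as coming soon; please place newer papers toward the top of their section.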