└── README.md /README.md: -------------------------------------------------------------------------------- 1 | # Paper-Reading-List of DeepTimer-Robot-Lab 2 | 3 | # Recent Random Papers 4 | - ICML 2024, **RoboCodeX**: Multimodal Code Generation for Robotic Behavior Synthesis, [Arxiv](https://arxiv.org/abs/2402.16117) 5 | - ICML 2024, **Voronav**: VoroNav: Voronoi-based Zero-shot Object Navigation with Large Language Model, [Arxiv](https://arxiv.org/abs/2401.02695) 6 | - ICML 2024, Learning Reward for Robot Skills Using Large Language Models via Self-Alignment, [Arxiv](https://arxiv.org/pdf/2405.07162) 7 | - RSS 2024, **3D Diffusion Policy**: Generalizable Visuomotor Policy Learning via Simple 3D Representations, [Website](https://3d-diffusion-policy.github.io/) 8 | - arXiv 2024.05, **Model-based Diffusion** for Trajectory Optimization, [Website](https://lecar-lab.github.io/mbd/) 9 | - RSS 2024, **RoboCasa**: Large-Scale Simulation of Everyday Tasks for Generalist Robots, [Website](https://robocasa.ai/) 10 | - CVPR 2024, **OmniGlue**: Generalizable Feature Matching with Foundation Model Guidance, [Website](https://hwjiang1510.github.io/OmniGlue/) 11 | - arXiv 2024.05, **Pandora**: Towards General World Model with Natural Language Actions and Video States, [Website](https://world-model.maitrix.org/) 12 | - arXiv 2024.03, **GeoWizard**: Unleashing the Diffusion Priors for 3D Geometry Estimation from a Single Image, [Website](https://fuxiao0719.github.io/projects/geowizard/) 13 | - arXiv 2024.05, **TRANSIC**: Sim-to-Real Policy Transfer by Learning from Online Correction, [Website](https://transic-robot.github.io/) 14 | - RSS 2024, **Natural Language** Can Help Bridge the Sim2Real Gap, [arXiv](https://arxiv.org/abs/2405.10020) 15 | - ICML 2024, The **Platonic Representation** Hypothesis, [arXiv](https://arxiv.org/abs/2405.07987) 16 | - arXiv 2024.05, **SPIN**: Simultaneous Perception, Interaction and Navigation, [Website](https://spin-robot.github.io/) 17 | - RSS 2024, 
**Consistency Policy**: Accelerated Visuomotor Policies via Consistency Distillation, [Website](https://consistency-policy.github.io/) 18 | - arXiv 2024.05, **Humanoid Parkour** Learning, [Website](https://humanoid4parkour.github.io/) 19 | - arXiv 2024.05, **Evaluating Real-World Robot Manipulation Policies in Simulation**, [Website](https://simpler-env.github.io/) 20 | - arXiv 2024.05, **ScrewMimic**: Bimanual Imitation from Human Videos with Screw Space Projection, [Website](https://robin-lab.cs.utexas.edu/ScrewMimic/) 21 | - arXiv 2024.04, **DiffuseLoco**: Real-Time Legged Locomotion Control with Diffusion from Offline Datasets, [arXiv](https://arxiv.org/abs/2404.19264) 22 | - arXiv 2024.05, **DrEureka**: Language Model Guided Sim-To-Real Transfer, [Website](https://eureka-research.github.io/dr-eureka/) 23 | - ICRA 2024, Learning Force Control for Legged Manipulation, [arXiv](https://arxiv.org/abs/2405.01402) 24 | - arXiv 2024.05, **IntervenGen**: Interventional Data Generation for Robust and Data-Efficient Robot Imitation Learning, [arXiv](https://arxiv.org/abs/2405.01472) 25 | - arXiv 2024.05, **Track2Act**: Predicting Point Tracks from Internet Videos enables Diverse Zero-shot Robot Manipulation, [arXiv](https://arxiv.org/abs/2405.01527) 26 | - RSS 2023, **IndustReal**: Transferring Contact-Rich Assembly Tasks from Simulation to Reality, [Website](https://sites.google.com/nvidia.com/industreal) 27 | - SIIGRAPH 2023, **OctFormer**: Octree-based Transformers for 3D Point Clouds, [Website](https://wang-ps.github.io/octformer.html) 28 | - arXiv 2024.04, **Clio**: Real-time Task-Driven Open-Set 3D Scene Graphs, [arXiv](https://arxiv.org/abs/2404.13696) 29 | - arXiv 2024.04, **HATO**: Learning Visuotactile Skills with Two Multifingered Hands, [Website](https://toruowo.github.io/hato/) 30 | - arXiv 2024.04, **SpringGrasp**: Synthesizing Compliant Dexterous Grasps under Shape Uncertainty, [Website](https://stanford-tml.github.io/SpringGrasp/) 31 | - ICRA 2024 
workshop, Object-Aware **Gaussian Splatting for Robotic Manipulation**, [OpenReview](https://openreview.net/forum?id=gdRI43hDgo) 32 | - arXiv 2024.04, **PhysDreamer**: Physics-Based Interaction with 3D Objects via Video Generation, [Website](https://physdreamer.github.io/) 33 | - arXiv 2015.09, **MPPI**: Model Predictive Path Integral Control using Covariance Variable Importance Sampling, [arXiv](https://arxiv.org/abs/1509.01149) 34 | - arXiv 2023.07, Sampling-based Model Predictive Control Leveraging Parallelizable Physics Simulations, [arXiv](https://arxiv.org/abs/2307.09105) / [Github](https://github.com/tud-airlab/mppi-isaac) 35 | - arXiv 2024.04, **BLINK**: Multimodal Large Language Models Can See but Not Perceive, [Website](https://zeyofu.github.io/blink/) 36 | - arXiv 2024.04, **Factorized Diffusion**: Perceptual Illusions by Noise Decomposition, [Website](https://dangeng.github.io/factorized_diffusion/) 37 | - CVPR 2024, Probing the 3D Awareness of Visual Foundation Models, [arXiv](https://arxiv.org/abs/2404.08636) 38 | - ICCV 2019, **Neural-Guided RANSAC**: Learning Where to Sample Model Hypotheses, [arXiv](https://arxiv.org/abs/1905.04132) 39 | - arXiv 2024.04, **QuasiSim**: Parameterized Quasi-Physical Simulators for Dexterous Manipulations Transfer, [Website](https://meowuu7.github.io/QuasiSim/) 40 | - arXiv 2024.04, **Policy-Guided Diffusion**, [arXiv](https://arxiv.org/abs/2404.06356) / [Github](https://github.com/EmptyJackson/policy-guided-diffusion) 41 | - RoboSoft 2024, Body Design and Gait Generation of **Chair-Type Asymmetrical Tripedal** Low-rigidity Robot, [Website](https://shin0805.github.io/chair-type-tripedal-robot/) 42 | - CVPR 2024 oral, **MicKey**: Matching 2D Images in 3D: Metric Relative Pose from Metric Correspondences, [Website](https://nianticlabs.github.io/mickey/) 43 | - arXiv 2024.04, **ZeST**: Zero-Shot Material Transfer from a Single Image, [Website](https://ttchengab.github.io/zest/) 44 | - arXiv 2024.03, **Keypoint Action 
Tokens** Enable In-Context Imitation Learning in Robotics, [Website](https://www.robot-learning.uk/keypoint-action-tokens) 45 | - arXiv 2024.04, Reconstructing **Hand-Held Objects** in 3D, [arXiv](https://arxiv.org/abs/2404.06507) 46 | - ICRA 2024, **Actor-Critic Model Predictive Control**, [arXiv](https://arxiv.org/abs/2306.09852) 47 | - arXiv 2024.04, Finding Visual Task Vectors, [arXiv](https://arxiv.org/abs/2404.05729) 48 | - NeurIPS 2022, Visual Prompting via **Image Inpainting**, [arXiv](https://arxiv.org/abs/2209.00647) 49 | - CVPR 2024 highlight, **SpatialTracker**: Tracking Any 2D Pixels in 3D Space, [Website](https://henry123-boy.github.io/SpaTracker/) 50 | - CVPR 2024, **NeRF2Physics**: Physical Property Understanding from Language-Embedded Feature Fields, [Website](https://ajzhai.github.io/NeRF2Physics/) 51 | - CVPR 2024, **Scaling Laws of Synthetic Images** for Model Training ... for Now, [arXiv](https://arxiv.org/abs/2312.04567) 52 | - CVPR 2024, A Vision Check-up for Language Models, [arXiv](https://arxiv.org/abs/2401.01862) 53 | - CVPR 2024, **GenH2R**: Learning Generalizable Human-to-Robot Handover via Scalable Simulation, Demonstration, and Imitation, [Website](https://genh2r.github.io/) 54 | - arXiv 2024.04, **PreAfford**: Universal Affordance-Based Pre-Grasping for Diverse Objects and Environments, [Website](https://air-discover.github.io/PreAfford/) 55 | - CVPR 2024, **Lift3D**: Zero-Shot Lifting of Any 2D Vision Model to 3D, [Website](https://mukundvarmat.github.io/Lift3D/) 56 | - arXiv 2024.03, **LocoMan**: Advancing Versatile Quadrupedal Dexterity with Lightweight Loco-Manipulators, [arXiv](https://arxiv.org/abs/2403.18197) 57 | - arXiv 2024.03, Leveraging **Symmetry** in RL-based Legged Locomotion Control, [arXiv](https://arxiv.org/abs/2403.17320) 58 | - arXiv 2024.03, **RoboDuet**: A Framework Affording Mobile-Manipulation and Cross-Embodiment, [arXiv](https://arxiv.org/abs/2403.17367) 59 | - arXiv 2024.03, Imitation Bootstrapped 
Reinforcement Learning, [arXiv](https://arxiv.org/abs/2311.02198) 60 | - arXiv 2024.03, **Visual Whole-Body Control** for Legged Loco-Manipulation, [arXiv](https://arxiv.org/abs/2403.16967) 61 | - arXiv 2024.03, **S2**: When Do We Not Need Larger Vision Models? [arXiv](https://arxiv.org/abs/2403.13043) 62 | - ICCV 2021, **DPT**: Vision Transformers for Dense Prediction, [arXiv](https://arxiv.org/abs/2103.13413) 63 | - arXiv 2024.03, **GRM**: Large Gaussian Reconstruction Model for Efficient 3D Reconstruction and Generation, [Website](https://justimyhxu.github.io/projects/grm/) 64 | - arXiv 2024.03, **MVSplat**: Efficient 3D Gaussian Splatting from Sparse Multi-View Images, [Website](https://donydchen.github.io/mvsplat/) 65 | - arXiv 2024.03, **LiFT**: A Surprisingly Simple Lightweight Feature Transform for Dense ViT Descriptors, [Website](https://www.cs.umd.edu/~sakshams/LiFT/) 66 | - SIGGRAPH 2023, **VET**: Visual Error Tomography for Point Cloud Completion and High-Quality Neural Rendering, [Github](https://github.com/lfranke/vet) 67 | - arXiv 2024.03, On **Pretraining Data Diversity** for Self-Supervised Learning, [arXiv](https://arxiv.org/abs/2403.13808) 68 | - arXiv 2024.03, **FeatUp**: A Model-Agnostic Framework for Features at Any Resolution, [arXiv](https://arxiv.org/abs/2403.10516) / [Github](https://github.com/mhamilton723/FeatUp) 69 | - arXiv 2024.03, **Vid2Robot**: End-to-end Video-conditioned Policy Learning with Cross-Attention Transformers, [arXiv](https://arxiv.org/abs/2403.12943) 70 | - arXiv 2024.03, **Yell At Your Robot**: Improving On-the-Fly from Language Corrections, [arXiv](https://arxiv.org/abs/2403.12910) 71 | - arXiv 2024.03, **DROID**: A Large-Scale In-the-Wild Robot Manipulation Dataset, [Website](https://droid-dataset.github.io/) 72 | - ICLR 2024 oral, **Ghost on the Shell**: An Expressive Representation of General 3D Shapes, [Website](https://gshell3d.github.io/) 73 | - arXiv 2024.03, **HumanoidBench**: Simulated Humanoid Benchmark for 
Whole-Body Locomotion and Manipulation, [Website](https://sferrazza.cc/humanoidbench_site/) 74 | - arXiv 2024.03, **PaperBot**: Learning to Design Real-World Tools Using Paper, [arXiv](https://arxiv.org/abs/2403.09566) 75 | - arXiv 2024.03, **GaussianGrasper**: 3D Language Gaussian Splatting for Open-vocabulary Robotic Grasping, [arXiv](https://arxiv.org/abs/2403.09637) 76 | - arXiv 2024.03, A Decade's Battle on **Dataset Bias**: Are We There Yet? [arXiv](https://arxiv.org/abs/2403.08632) 77 | - arXiv 2024.03, **ManiGaussian**: Dynamic Gaussian Splatting for Multi-task Robotic Manipulation, [arXiv](https://arxiv.org/abs/2403.08321) 78 | - arXiv 2024.03, Learning **Generalizable Feature Fields** for Mobile Manipulation, [arXiv](https://arxiv.org/abs/2403.07563) 79 | - arXiv 2024.03, **DexCap**: Scalable and Portable Mocap Data Collection System for Dexterous Manipulation, [arXiv](https://arxiv.org/abs/2403.07788) 80 | - arXiv 2024.03, **TeleMoMa**: A Modular and Versatile Teleoperation System for Mobile Manipulation, [arXiv](https://arxiv.org/abs/2403.07869) 81 | - arXiv 2024.03, **OPEN TEACH**: A Versatile Teleoperation System for Robotic Manipulation, [arXiv](https://arxiv.org/abs/2403.07870) 82 | - CVPR 2020 oral, **SuperGlue**: Learning Feature Matching with Graph Neural Networks, [Github](https://github.com/magicleap/SuperGluePretrainedNetwork) 83 | - ICRA 2024, Learning to walk in confined spaces using 3D representation, [arXiv](https://arxiv.org/abs/2403.00187) 84 | - CVPR 2024, **Hierarchical Diffusion Policy** for Kinematics-Aware Multi-Task Robotic Manipulation, [arXiv](https://arxiv.org/abs/2403.03890) / [Website](https://yusufma03.github.io/projects/hdp/) 85 | - arXiv 2024.03, Reconciling Reality through Simulation: A **Real-to-Sim-to-Real** Approach for Robust Manipulation, [Website](https://real-to-sim-to-real.github.io/RialTo/) 86 | - ICRA 2024, **Dexterous Legged Locomotion** in Confined 3D Spaces with Reinforcement Learning, 
[arXiv](https://arxiv.org/abs/2403.03848) 87 | - arXiv 2024.03, **MOKA**: Open-Vocabulary Robotic Manipulation through Mark-Based Visual Prompting, [Website](https://moka-manipulation.github.io/) 88 | - arXiv 2024.03, **VQ-BeT**: Behavior Generation with Latent Actions, [arXiv](https://arxiv.org/abs/2403.03181) / [Website](https://sjlee.cc/vq-bet/) 89 | - **Humanoid-Gym**: Reinforcement Learning for Humanoid Robot with Zero-Shot Sim2Real Transfer, [Website](https://sites.google.com/view/humanoid-gym/) 90 | - arXiv 2024.03, Twisting Lids Off with Two Hands, [Website](https://toruowo.github.io/bimanual-twist/) 91 | - ICLR 2023 spotlight, Multi-skill Mobile Manipulation for Object Rearrangement, [Github](https://github.com/Jiayuan-Gu/hab-mobile-manipulation) 92 | - CVPR 2024, **Gaussian Splatting SLAM**, [Github](https://github.com/muskie82/MonoGS) 93 | - arXiv 2024.03, **TripoSR**: Fast 3D Object Reconstruction from a Single Image, [Github](https://github.com/VAST-AI-Research/TripoSR) 94 | - arXiv 2024.03, **Point Could Mamba**: Point Cloud Learning via State Space Model, [arXiv](https://arxiv.org/abs/2403.00762) 95 | - CVPR 2024, Rethinking Few-shot 3D Point Cloud Semantic Segmentation, [arXiv](https://arxiv.org/abs/2403.00592) 96 | - ICLR 2024, Can Transformers Capture Spatial Relations between Objects? 
[arXiv](https://arxiv.org/abs/2403.00729) / [Website](https://sites.google.com/view/spatial-relation) 97 | - SIGGRAPH Asia 2023, **CamP**: Camera Preconditioning for Neural Radiance Fields, [Website](https://camp-nerf.github.io/) / [Github](https://github.com/jonbarron/camp_zipnerf) 98 | - arXiv 2024.02, **Extreme Cross-Embodiment Learning** for Manipulation and Navigation, [Website](https://extreme-cross-embodiment.github.io/) 99 | - CVPR 2024, **DUSt3R**: Geometric 3D Vision Made Easy, [Github](https://github.com/naver/dust3r) 100 | - CVPR 2018 best paper, **TASKONOMY**: Disentangling Task Transfer Learning, [Website](http://taskonomy.stanford.edu/) 101 | - arXiv 2024.02, **Mirage**: Cross-Embodiment Zero-Shot Policy Transfer with Cross-Painting, [Website](https://robot-mirage.github.io/) 102 | - CVPR 2024, **Diffusion 3D Features (Diff3F)**: Decorating Untextured Shapes with Distilled Semantic Features, [Website](https://diff3f.github.io/) 103 | - arXiv 2024.02, **Disentangled 3D Scene Gen­eration** with Layout Learning, [Website](https://dave.ml/layoutlearning/) 104 | - arXiv 2024.02, **Transparent Image Layer Diffusion** using Latent Transparency, [Website](https://arxiv.org/abs/2402.17113) 105 | - arXiv 2024.02, **Diffusion Meets DAgger**: Supercharging Eye-in-hand Imitation Learning, [Website](https://sites.google.com/view/diffusion-meets-dagger) 106 | - arXiv 2024.02, Massive Activations in Large Language Models, [Website](https://eric-mingjie.github.io/massive-activations/index.html) 107 | - arXiv 2024.02, Dynamics-Guided Diffusion Model for **Robot Manipulator Design**, [Website](https://dgdm-robot.github.io/) 108 | - arXiv 2024.02, **Genie**: Generative Interactive Environments, [arXiv](https://arxiv.org/abs/2402.15391) / [Website](https://sites.google.com/view/genie-2024/) 109 | - arXiv 2024.02, **CyberDemo**: Augmenting Simulated Human Demonstration for Real-World Dexterous Manipulation, [arXiv](https://arxiv.org/abs/2402.14795) / 
[Website](https://cyber-demo.github.io/) 110 | - CoRL 2020, **DSR**: Learning 3D Dynamic Scene Representations for Robot Manipulation, [Website](https://dsr-net.cs.columbia.edu/) 111 | - ICLR 2024 oral, Cameras as Rays: Pose Estimation via **Ray Diffusion**, [Website](https://jasonyzhang.com/RayDiffusion/) 112 | - arXiv 2024.02, **Pedipulate**: Enabling Manipulation Skills using a Quadruped Robot's Leg, [arXiv](https://arxiv.org/abs/2402.10837) 113 | - arXiv 2024.02, **LMPC**: Learning to Learn Faster from Human Feedback with Language Model Predictive Control, [Website](https://robot-teaching.github.io/) 114 | - arXiv 2023.12, **W.A.L.T**: Photorealistic Video Generation with Diffusion Models, [Website](https://walt-video-diffusion.github.io/) 115 | - arXiv 2024.02, **Universal Manipulation Interface**: In-The-Wild Robot Teaching Without In-The-Wild Robots, [Website](https://umi-gripper.github.io/) 116 | - ICCV 2023 oral, **DiT**: Scalable Diffusion Models with Transformers, [Website](https://www.wpeebles.com/DiT) 117 | - arXiv 2023.07, Diffusion Models Beat GANs on Image Classification, [arXiv](https://arxiv.org/abs/2307.08702) 118 | - ICCV 2023 oral, **DDAE**: Denoising Diffusion Autoencoders are Unified Self-supervised Learners, [arXiv](https://arxiv.org/abs/2303.09769) 119 | - arXiv 2024.12, **Mosaic-SDF** for 3D Generative Models, [arXiv](https://arxiv.org/abs/2312.09222) / [Website](https://lioryariv.github.io/msdf/) 120 | - arXiv 2024.02, **POCO**: Policy Composition From and For Heterogeneous Robot Learning, [Website](https://liruiw.github.io/policycomp/) 121 | - ICML 2024 submission, **Latent Graph Diffusion**: A Unified Framework for Generation and Prediction on Graphs, [arXiv](https://arxiv.org/abs/2402.02518) 122 | - ICLR 2024 spotlight, **AMAGO**: Scalable In-Context Reinforcement Learning for Adaptive Agents, [arXiv](https://arxiv.org/abs/2310.09971) 123 | - arXiv 2024.02, Offline Actor-Critic Reinforcement Learning Scales to Large Models, 
[arXiv](https://arxiv.org/abs/2402.05546) 124 | - arXiv 2024.02, **V-IRL**: Grounding Virtual Intelligence in Real Life, [Website](https://virl-platform.github.io/) 125 | - ICRA 2024, **SERL**: A Software Suite for Sample-Efficient Robotic Reinforcement Learning, [Website](https://serl-robot.github.io/) 126 | - arXiv 2024.01, Generative Expressive Robot Behaviors using Large Language Models, [arXiv](https://arxiv.org/abs/2401.14673) 127 | - arXiv 2024.01, **pix2gestalt**: Amodal Segmentation by Synthesizing Wholes, [Website](https://gestalt.cs.columbia.edu/) 128 | - arXiv 2024.01, **DAE**: Deconstructing Denoising Diffusion Models for Self-Supervised Learning, [arXiv](https://arxiv.org/abs/2401.14404) 129 | - ICLR 2024, **DittoGym**: Learning to Control Soft Shape-Shifting Robots, [Website](https://dittogym.github.io/) 130 | - arXiv 2024.01, **WildRGB-D**: RGBD Objects in the Wild: Scaling Real-World 3D Object Learning from RGB-D Videos, [Website](https://wildrgbd.github.io/) 131 | - arXiv 2024.01, **Spatial VLM**: Endowing Vision-Language Models with Spatial Reasoning Capabilities, [Website](https://spatial-vlm.github.io/) 132 | - arXiv 2024.01, Multimodal **Visual-Tactile Representation** Learning through Self-Supervised Contrastive Pre-Training, [arXiv](https://arxiv.org/abs/2401.12024) 133 | - arXiv 2024.01, **OK-Robot**: What Really Matters in Integrating Open-Knowledge Models for Robotics, [Website](https://ok-robot.github.io/) 134 | - L4DC 2023, **Agile Catching** with Whole-Body MPC and Blackbox Policy Learning, [arXiv](https://arxiv.org/abs/2306.08205) 135 | - arXiv 2024.01, **Depth Anything**: Unleashing the Power of Large-Scale Unlabeled Data, [Github](https://github.com/LiheYoung/Depth-Anything?tab=readme-ov-file) 136 | - arXiv 2024.01, **WorldDreamer**: Towards General World Models for Video Generation via Predicting Masked Tokens, [Website](https://world-dreamer.github.io/) 137 | - arXiv 2024.01, **VMamba**: Visual State Space Model, 
[Github](https://github.com/MzeroMiko/VMamba) 138 | - arXiv 2024.01, **DiffusionGPT**: LLM-Driven Text-to-Image Generation System, [arXiv](https://arxiv.org/abs/2401.10061) /[Website](https://diffusiongpt.github.io/) 139 | - arXiv 2023.12, **PhysHOI**: Physics-Based Imitation of Dynamic Human-Object Interaction, [Website](https://wyhuai.github.io/physhoi-page/) 140 | - ICLR 2024 oral, **UniSim**: Learning Interactive Real-World Simulators, [OpenReview](https://openreview.net/forum?id=sFyTZEqmUY) 141 | - ICLR 2024 oral, **ASID**: Active Exploration for System Identification and Reconstruction in Robotic Manipulation, [OpenReview](https://openreview.net/forum?id=jNR6s6OSBT) 142 | - ICLR 2024 oral, Mastering **Memory Tasks** with World Models, [OpenReview](https://openreview.net/forum?id=1vDArHJ68h) 143 | - ICLR 2024 oral, Predictive auxiliary objectives in deep RL mimic learning in the brain, [OpenReview](https://openreview.net/forum?id=agPpmEgf8C) 144 | - ICLR 2024 oral, **Is ImageNet worth 1 video?** Learning strong image encoders from 1 long unlabelled video, [arXiv](https://arxiv.org/abs/2310.08584) / [OpenReview](https://openreview.net/forum?id=Yen1lGns2o) 145 | - arXiv 2024.01, **URHand**: Universal Relightable Hands, [Website](https://frozenburning.github.io/projects/urhand/) 146 | - arXiv 2023.12, **Mamba**: Linear-Time Sequence Modeling with Selective State Spaces, [arXiv](https://arxiv.org/abs/2312.00752) / [Github](https://github.com/state-spaces/mamba) 147 | - ICLR 2022, **S4**: Efficiently Modeling Long Sequences with Structured State Spaces, [arXiv](https://arxiv.org/abs/2111.00396) 148 | - arXiv 2024.01, **Dr2Net**: Dynamic Reversible Dual-Residual Networks for Memory-Efficient Finetuning, [arXiv](https://arxiv.org/abs/2401.04105) 149 | - arXiv 2023.12, **3D-LFM**: Lifting Foundation Model, [Website](https://3dlfm.github.io/) 150 | - arXiv 2024.01, **DVT**: Denoising Vision Transformers, [Website](https://jiawei-yang.github.io/DenoisingViT/) 151 | - 
arXiv 2024.01, **Open-Vocabulary SAM**: Segment and Recognize Twenty-thousand Classes Interactively, [Website](https://www.mmlab-ntu.com/project/ovsam/) / [Code](https://github.com/HarborYuan/ovsam) 152 | - arXiv 2024.01, **ATM**: Any-point Trajectory Modeling for Policy Learning, [Website](https://xingyu-lin.github.io/atm/) 153 | - CVPR 2024 submission, **Learning Vision from Models** Rivals Learning Vision from Data, [arXiv](https://arxiv.org/abs/2312.17742) / [Github](https://github.com/google-research/syn-rep-learn) 154 | - CVPR 2024 submission, Visual Point Cloud Forecasting enables **Scalable Autonomous Driving**, [arXiv](https://arxiv.org/abs/2312.17655) / [Github](https://github.com/OpenDriveLab/ViDAR) 155 | - CVPR 2024 submission, **Ponymation**: Learning 3D Animal Motions from Unlabeled Online Videos, [arXiv](https://arxiv.org/abs/2312.13604) / [Website](https://keqiangsun.github.io/projects/ponymation/) 156 | - CVPR 2024 submission, **V\***: Guided Visual Search as a Core Mechanism in Multimodal LLMs, [Website](https://vstar-seal.github.io/) 157 | - NIPS 2021 outstanding paper, Deep Reinforcement Learning at the Edge of the Statistical Precipice, [arXiv](https://arxiv.org/abs/2108.13264) / [Website](https://agarwl.github.io/rliable/) 158 | - CVPR 2024 submission, Zero-Shot **Metric Depth** with a Field-of-View Conditioned Diffusion Model, [Website](https://diffusion-vision.github.io/dmd/) 159 | - ICLR 2023, **Deep Learning on 3D Neural Fields**, [arXiv](https://arxiv.org/abs/2312.13277) 160 | - CVPR 2024 submission, **Tracking Any Object Amodally**, [Website](https://tao-amodal.github.io/) 161 | - CVPR 2024 submission, **MobileSAMv2**: Faster Segment Anything to Everything, [Github](https://github.com/ChaoningZhang/MobileSAM) 162 | - CVPR 2024 submission, **AnyDoor**: Zero-shot Object-level Image Customization, [Github](https://github.com/damo-vilab/AnyDoor) 163 | - CVPR 2024 submission, **Point Transformer V3**: Simpler, Faster, Stronger, 
[arXiv](https://arxiv.org/abs/2312.10035) / [Github](https://github.com/Pointcept/PointTransformerV3) 164 | - CVPR 2024 submission, **Alchemist**: Parametric Control of Material Properties with Diffusion Models, [Website](https://prafullsharma.net/alchemist/) 165 | - CVPR 2024 submission, **Reconstructing Hands in 3D** with Transformers, [Website](https://geopavlakos.github.io/hamer/) 166 | - CVPR 2024 submission, Language-Informed Visual Concept Learning, [Website](https://ai.stanford.edu/~yzzhang/projects/concept-axes/) 167 | - CVPR 2024 submission, **RCG**: Self-conditioned Image Generation via Generating Representations, [arXiv](https://arxiv.org/abs/2312.03701) / [Github](https://github.com/LTH14/rcg) 168 | - CVPR 2024 submission, **Describing Differences in Image Sets** with Natural Language, [Website](https://understanding-visual-datasets.github.io/VisDiff-website/) 169 | - CVPR 2024 submission, **FaceStudio**: Put Your Face Everywhere in Seconds, [Website](https://icoz69.github.io/facestudio/) 170 | - CVPR 2024 submission, **ImageDream**: Image-Prompt Multi-view Diffusion for 3D Generation, [Website](https://Image-Dream.github.io) 171 | - CVPR 2024 submission, **Fine-grained Controllable Video Generation** via Object Appearance and Context, [Website](https://hhsinping.github.io/factor/) 172 | - CVPR 2024 submission, **AmbiGen**: Generating Ambigrams from Pre-trained Diffusion Model, [Website](https://raymond-yeh.com/AmbiGen/) 173 | - CVPR 2024 submission, **ReconFusion**: 3D Reconstruction with Diffusion Priors, [Website](https://reconfusion.github.io/) 174 | - CVPR 2024 submission, **Ego-Exo4D**: Understanding Skilled Human Activity from First- and Third-Person Perspectives, [arXiv](https://arxiv.org/abs/2311.18259) / [Website](https://ego-exo4d-data.org/) 175 | - CVPR 2024 submission, **MagicAnimate**: Temporally Consistent Human Image Animation using Diffusion Model, [Github](https://github.com/magic-research/magic-animate) 176 | - CVPR 2024 submission, 
**VideoSwap**: Customized Video Subject Swapping with Interactive Semantic Point Correspondence, [Website](https://videoswap.github.io/) 177 | - CVPR 2024 submission, **IMProv**: Inpainting-based Multimodal Prompting for Computer Vision Tasks, [Website](https://jerryxu.net/IMProv/) 178 | - CVPR 2024 submission, Generative **Powers of Ten**, [Website](https://powers-of-10.github.io/) 179 | - CVPR 2024 submission, **DiffiT**: Diffusion Vision Transformers for Image Generation, [arXiv](https://arxiv.org/abs/2312.02139) 180 | - CVPR 2024 submission, Learning from **One Continuous Video Stream**, [arXiv](https://arxiv.org/abs/2312.00598) 181 | - CVPR 2024 submission, **EvE**: Exploiting Generative Priors for Radiance Field Enrichment, [Website](https://eve-nvs.github.io/) 182 | - CVPR 2024 submission, **Oryon**: Open-Vocabulary Object 6D Pose Estimation, [Website](https://jcorsetti.github.io/oryon-website/) 183 | - CVPR 2024 submission, **Dense Optical Tracking**: Connecting the Dots, [Website](https://16lemoing.github.io/dot/) 184 | - CVPR 2024 submission, Sequential Modeling Enables Scalable Learning for **Large Vision Models**, [Website](https://yutongbai.com/lvm.html) 185 | - CVPR 2024 submission, **VideoBooth**: Diffusion-based Video Generation with Image Prompts, [Website](https://vchitect.github.io/VideoBooth-project/) 186 | - CVPR 2024 submission, **SODA**: Bottleneck Diffusion Models for Representation Learning, [Website](https://soda-diffusion.github.io/) 187 | - CVPR 2024 submission, Exploiting **Diffusion Prior** for Generalizable Pixel-Level Semantic Prediction, [Website](https://shinying.github.io/dmp/) 188 | - arXiv 2023.11, Initializing Models with Larger Ones, [arXiv](https://arxiv.org/abs/2311.18823) 189 | - CVPR 2024 submission, **Animate Anyone**: Consistent and Controllable Image-to-Video Synthesis for Character Animation, [Website](https://humanaigc.github.io/animate-anyone/) / [Github](https://github.com/HumanAIGC/AnimateAnyone) 190 | - CVPR 2023 
best demo award, **Diffusion Illusions**: Hiding Images in Plain Sight, [Website](https://diffusionillusions.com/) 191 | - CVPR 2024 submission, Do text-free diffusion models learn discriminative visual representations? [Website](https://mgwillia.github.io/diffssl/) 192 | - CVPR 2024 submission, **Visual Anagrams**: Synthesizing Multi-View Optical Illusions with Diffusion Models, [Website](https://dangeng.github.io/visual_anagrams/) 193 | - NIPS 2023, **Provable Guarantees for Generative Behavior Cloning**: Bridging Low-Level Stability and High-Level Behavior, [OpenReview](https://openreview.net/forum?id=PhFVF0gwid) 194 | - CoRL 2023 best paper, **Distilled Feature Fields** Enable Few-Shot Language-Guided Manipulation, [Website](https://f3rm.github.io/) 195 | - ICLR 2024 submission, **RLIF**: Interactive Imitation Learning as Reinforcement Learning, [Website](https://rlif-page.github.io/) / [arXiv](https://arxiv.org/abs/2311.12996) 196 | - CVPR 2024 submission, **PIE-NeRF**: Physics-based Interactive Elastodynamics with NeRF, [arXiv](https://arxiv.org/abs/2311.13099) 197 | - RSS 2018, **Asymmetric Actor Critic** for Image-Based Robot Learning, [arXiv](https://arxiv.org/abs/1710.06542) 198 | - ICLR 2022, **RvS**: What is Essential for Offline RL via Supervised Learning?, [arXiv](https://arxiv.org/abs/2112.10751) 199 | - NIPS 2021, **Stochastic Solutions** for Linear Inverse Problems using the Prior Implicit in a Denoiser, [arXiv](https://arxiv.org/abs/2007.13640) 200 | - ICLR 2024 submission, Consistency Models as a Rich and Efficient Policy Class for Reinforcement Learning, [OpenReview](https://openreview.net/forum?id=v8jdwkUNXb) / [arXiv](https://arxiv.org/abs/2309.16984) 201 | - ICLR 2024 submission, Improved Techniques for Training Consistency Models, [OpenReview](https://openreview.net/forum?id=WNzy9bRDvG) / [arXiv](https://arxiv.org/abs/2310.14189) 202 | - ICLR 2024 submission, **Privileged Sensing** Scaffolds Reinforcement Learning, 
[OpenReview](https://openreview.net/forum?id=EpVe8jAjdx) 203 | - ICLR 2024 submission, **SafeDiffuser**: Safe Planning with Diffusion Probabilistic Models, [arXiv](https://arxiv.org/abs/2306.00148) / [Website](https://safediffuser.github.io/safediffuser/) 204 | - NIPS 2023 workshop, Vision-Language Models Provide Promptable Representations for Reinforcement Learning, [OpenReview](https://openreview.net/forum?id=AVg8WnI5ba) 205 | - ICLR 2023 oral, **Extreme Q-Learning**: MaxEnt RL without Entropy, [Website](https://div99.github.io/XQL/) 206 | - ICLR 2024 submission, Generalization in diffusion models arises from geometry-adaptive harmonic representation, [OpenReview](https://openreview.net/forum?id=ANvmVS2Yr0) 207 | - ICLR 2024 submission, **DiffTOP**: Differentiable Trajectory Optimization as a Policy Class for Reinforcement and Imitation Learning, [OpenReview](https://openreview.net/forum?id=HL5P4H8eO2) 208 | - CoRL 2023 best system paper, **RoboCook**: Long-Horizon Elasto-Plastic Object Manipulation with Diverse Tools, [Website](https://hshi74.github.io/robocook/) 209 | - CoRL 2023, Learning to Design and Use Tools for Robotic Manipulation, [Website](https://robotic-tool-design.github.io/) 210 | - arXiv 2023.10, Learning to (Learn at Test Time), [arXiv](https://arxiv.org/abs/2310.13807) / [Github](https://github.com/test-time-training/mttt) 211 | - CoRL 2023 workshop, **FMB**: a Functional Manipulation Benchmark for Generalizable Robotic Learning, [OpenReview](https://openreview.net/pdf?id=055oRimwls) / [Website](https://sites.google.com/view/manipulationbenchmark) 212 | - 2023.10, Non-parametric regression for robot learning on manifolds, [arXiv](https://arxiv.org/abs/2310.19561) 213 | - IROS 2021, Explaining the Decisions of Deep Policy Networks for Robotic Manipulations, [arXiv](https://arxiv.org/abs/2310.19432) 214 | - ICML 2022, The **primacy bias** in deep reinforcement learning, [arXiv](https://arxiv.org/abs/2205.07802) 215 | - ICML 2023 oral, The 
**dormant neuron** phenomenon in deep reinforcement learning, [arXiv](https://arxiv.org/abs/2302.12902) 216 | - arXiv 2022.04, Simplicial Embeddings in Self-Supervised Learning and Downstream Classification, [arXiv](https://arxiv.org/abs/2204.00616) 217 | - arXiv 2023.10, **SparseDFF**: Sparse-View Feature Distillation for One-Shot Dexterous Manipulation, [arXiv](https://arxiv.org/abs/2310.16838) 218 | - arXiv 2023.10, **SAM-CLIP**: Merging Vision Foundation Models towards Semantic and Spatial Understanding, [arXiv](https://arxiv.org/abs/2310.15308) 219 | - arXiv 2023.10, **TD-MPC2**: Scalable, Robust World Models for Continuous Control, [arXiv](https://arxiv.org/abs/2310.16828) / [Github](https://github.com/nicklashansen/tdmpc2) 220 | - arXiv 2023.10, **EquivAct**: SIM(3)-Equivariant Visuomotor Policies beyond Rigid Object Manipulation, [Website](https://equivact.github.io/) 221 | - NeurIPS 2022, **CodeRL**: Mastering Code Generation through Pretrained Models and Deep Reinforcement Learning, [arXiv](https://arxiv.org/abs/2207.01780) / [Github](https://github.com/salesforce/CodeRL) 222 | - arXiv 2023.10, **Robot Fine-Tuning Made Easy**: Pre-Training Rewards and Policies for Autonomous Real-World Reinforcement Learning, [Website](https://robofume.github.io/) 223 | - CoRL 2023, **SAQ**: Action-Quantized Offline Reinforcement Learning for Robotic Skill Learning, [Website](https://saqrl.github.io/) 224 | - arXiv 2023.10, Mastering Robot Manipulation with Multimodal Prompts through Pretraining and Multi-task Fine-tuning, [arXiv](https://arxiv.org/abs/2310.09676) 225 | - arXiv 2023.03, **PLEX**: Making the Most of the Available Data for Robotic Manipulation Pretraining, [arXiv](https://arxiv.org/abs/2303.08789) 226 | - arXiv 2023.10, **LAMP**: Learn A Motion Pattern for Few-Shot-Based Video Generation, [Website](https://rq-wu.github.io/projects/LAMP/) 227 | - arXiv 2023.10, **4K4D**: Real-Time 4D View Synthesis at 4K Resolution, [Website](https://zju3dv.github.io/4k4d/) 
228 | - arXiv 2023.10, **SuSIE**: Subgoal Synthesis via Image Editing, [Website](https://rail-berkeley.github.io/susie/) 229 | - arXiv 2023.10, **Universal Visual Decomposer**: Long-Horizon Manipulation Made Easy, [Website](https://zcczhang.github.io/UVD/) 230 | - arXiv 2023.10, Learning to Act from Actionless Video through Dense Correspondences, [Website](https://flow-diffusion.github.io/) 231 | - NeurIPS 2023, **CEC**: Cross-Episodic Curriculum for Transformer Agents, [Website](https://cec-agent.github.io/) 232 | - ICLR 2024 submission, **TD-MPC2**: Scalable, Robust World Models for Continuous Control, [OpenReview](https://openreview.net/forum?id=Oxh5CstDJU) 233 | - ICLR 2024 submission, **3D Diffuser Actor**: Multi-task 3D Robot Manipulation with Iterative Error Feedback, [OpenReview](https://openreview.net/forum?id=UnsLGUCynE) 234 | - ICLR 2024 submission, **NeRFuser**: Diffusion Guided Multi-Task 3D Policy Learning, [OpenReview](https://openreview.net/forum?id=8GmPLkO0oR) 235 | - arXiv 2023.10, **Foundation Reinforcement Learning**: towards Embodied Generalist Agents with Foundation Prior Assistance, [arXiv](https://arxiv.org/abs/2310.02635) 236 | - ICCV 2023, **S3IM**: Stochastic Structural SIMilarity and Its Unreasonable Effectiveness for Neural Fields, [Website](https://madaoer.github.io/s3im_nerf/) 237 | - arXiv 2023.09, **Text2Reward**: Automated Dense Reward Function Generation for Reinforcement Learning, [Website](https://text-to-reward.github.io/) / [arXiv](https://arxiv.org/abs/2309.11489) 238 | - ICCV 2023, End2End Multi-View Feature Matching with Differentiable Pose Optimization, [Website](https://barbararoessle.github.io/e2e_multi_view_matching/) 239 | - arXiv 2023.10, Aligning Text-to-Image Diffusion Models with Reward Backpropagation, [Website](https://align-prop.github.io/) / [Github](https://github.com/mihirp1998/AlignProp/) 240 | - NeurIPS 2023, **EDP**: Efficient Diffusion Policies for Offline Reinforcement Learning, 
[arXiv](https://arxiv.org/abs/2305.20081) / [Github](https://github.com/sail-sg/edp) 241 | - arXiv 2023.09, **See to Touch**: Learning Tactile Dexterity through Visual Incentives, [arXiv](https://arxiv.org/abs/2309.12300) / [Website](https://see-to-touch.github.io/) 242 | - RSS 2023, **SAM-RL**: Sensing-Aware Model-Based Reinforcement Learning via Differentiable Physics-Based Simulation and Rendering, [arXiv](https://arxiv.org/abs/2210.15185) / [Website](https://sites.google.com/view/rss-sam-rl) 243 | - arXiv 2023.09, **MoDem-V2**: Visuo-Motor World Models for Real-World Robot Manipulation, [arXiv](https://arxiv.org/abs/2309.14236) / [Website](https://sites.google.com/view/modem-v2) 244 | - arXiv 2023.09, **DreamGaussian**: Generative Gaussian Splatting for Efficient 3D Content Creation, [Github](https://github.com/dreamgaussian/dreamgaussian) 245 | - arXiv 2023.09, **D3Fields**: Dynamic 3D Descriptor Fields for Zero-Shot Generalizable Robotic Manipulation, [Website](https://robopil.github.io/d3fields/) / [Github](https://github.com/WangYixuan12/d3fields) 246 | - arXiv 2023.09, **GELLO**: A General, Low-Cost, and Intuitive Teleoperation Framework for Robot Manipulators, [Website](https://wuphilipp.github.io/gello_site/) / [arXiv](https://arxiv.org/abs/2309.13037) 247 | - arXiv 2023.09, Human-Assisted Continual Robot Learning with Foundation Models, [Website](https://sites.google.com/mit.edu/halp-robot-learning) / [arXiv](https://arxiv.org/abs/2309.14321) 248 | - arXiv 2023.09, Robotic Offline RL from Internet Videos via Value-Function Pre-Training, [arXiv](https://arxiv.org/abs/2309.13041) / [Website](https://dibyaghosh.com/vptr/) 249 | - ICCV 2023, **PointOdyssey**: A Large-Scale Synthetic Dataset for Long-Term Point Tracking, [arXiv](https://arxiv.org/abs/2307.15055) / [Github](https://github.com/aharley/pips2) 250 | - arXiv 2023, Compositional Foundation Models for Hierarchical Planning, 
[Website](https://hierarchical-planning-foundation-model.github.io/) 251 | - RSS 2022 Best Student Paper Award Finalist, **ACID**: Action-Conditional Implicit Visual Dynamics for Deformable Object Manipulation, [Website](https://b0ku1.github.io/acid/) 252 | - CoRL 2023, **REBOOT**: Reuse Data for Bootstrapping Efficient Real-World Dexterous Manipulation, [arXiv](https://arxiv.org/abs/2309.03322) / [Website](https://sites.google.com/view/reboot-dexterous) 253 | - CoRL 2023, An Unbiased Look at Datasets for Visuo-Motor Pre-Training, [OpenReview](https://openreview.net/pdf?id=qVc7NWYTRZ6) 254 | - CoRL 2023, **Q-Transformer**: Scalable Offline Reinforcement Learning via Autoregressive Q-Functions, [OpenReview](https://openreview.net/pdf?id=0I3su3mkuL) 255 | - ICCV 2023 oral, Tracking Everything Everywhere All at Once, [Website](https://omnimotion.github.io/) 256 | - arXiv 2023.08, **RoboTAP**: Tracking Arbitrary Points for Few-Shot Visual Imitation, [arXiv](https://arxiv.org/abs/2308.15975) 257 | - arXiv 2023.06, **DreamSim**: Learning New Dimensions of Human Visual Similarity using Synthetic Data, [arXiv](https://arxiv.org/abs/2306.09344) / [Website](https://dreamsim-nights.github.io/) 258 | - ICLR 2023 spotlight, **FluidLab**: A Differentiable Environment for Benchmarking Complex Fluid Manipulation, [Website](https://fluidlab2023.github.io/) 259 | - arXiv 2023.06, **Seal**: Segment Any Point Cloud Sequences by Distilling Vision Foundation Models, [arXiv](https://arxiv.org/abs/2306.09347) / [Website](https://ldkong.com/Seal) / [Github](https://github.com/youquanl/Segment-Any-Point-Cloud) 260 | - arXiv 2023.08, **BridgeData V2**: A Dataset for Robot Learning at Scale, [arXiv](https://arxiv.org/abs/2308.12952) / [Website](https://rail-berkeley.github.io/bridgedata/) 261 | - arXiv 2023.08, **Diffusion with Forward Models**: Solving Stochastic Inverse Problems Without Direct Supervision, 
[Website](https://diffusion-with-forward-models.github.io/) 262 | - ICML 2023, **QRL**: Optimal Goal-Reaching Reinforcement Learning via Quasimetric Learning, [Website](https://www.tongzhouwang.info/quasimetric_rl/) / [Github](https://github.com/quasimetric-learning/quasimetric-rl) 263 | - arXiv 2023.08, **Dynamic 3D Gaussians**: Tracking by Persistent Dynamic View Synthesis, [Website](https://dynamic3dgaussians.github.io/) 264 | - SIGGRAPH 2023 best paper, 3D Gaussian Splatting for Real-Time Radiance Field Rendering, [Website](https://repo-sam.inria.fr/fungraph/3d-gaussian-splatting/) 265 | - CoRL 2022, In-Hand Object Rotation via Rapid Motor Adaptation, [arXiv](https://arxiv.org/abs/2210.04887) / [Website](https://haozhi.io/hora/) 266 | - ICLR 2019, **DPI-Net**: Learning Particle Dynamics for Manipulating Rigid Bodies, Deformable Objects, and Fluids, [Website](http://dpi.csail.mit.edu/) 267 | - ICLR 2019, **Plan Online, Learn Offline**: Efficient Learning and Exploration via Model-Based Control, [arXiv](https://arxiv.org/abs/1811.01848) / [Website](https://sites.google.com/view/polo-mpc) 268 | - NeurIPS 2021 spotlight, **NeuS**: Learning Neural Implicit Surfaces by Volume Rendering for Multi-view Reconstruction, [Website](https://lingjie0206.github.io/papers/NeuS/) 269 | - ICCV 2023, Unsupervised Compositional Concepts Discovery with Text-to-Image Generative Models, [Website](https://energy-based-model.github.io/unsupervised-concept-discovery/) 270 | - AAAI 2018, **FiLM**: Visual Reasoning with a General Conditioning Layer, [arXiv](https://arxiv.org/abs/1709.07871) 271 | - arXiv 2023.08, **RoboAgent**: Towards Sample Efficient Robot Manipulation with Semantic Augmentations and Action Chunking, [Website](https://robopen.github.io/) 272 | - ICRA 2000, **RRT-Connect**: An Efficient Approach to Single-Query Path Planning, [PDF](http://www.cs.cmu.edu/afs/andrew/scs/cs/15-494-sp13/nslobody/Class/readings/kuffner_icra2000.pdf) 273 | - CVPR 2017 oral, **Network 
Dissection**: Quantifying Interpretability of Deep Visual Representations, [arXiv](https://arxiv.org/abs/1704.05796) / [Website](http://netdissect.csail.mit.edu/) 274 | - NeurIPS 2020 spotlight, Fourier Features Let Networks Learn High Frequency Functions in Low Dimensional Domains, [Website](https://bmild.github.io/fourfeat/index.html) 275 | - ICRA 1992, Planning optimal grasps, [PDF](https://people.eecs.berkeley.edu/~jfc/papers/92/FCicra92.pdf) 276 | - RSS 2021, **GIGA**: Synergies Between Affordance and Geometry: 6-DoF Grasp Detection via Implicit Representations, [arXiv](https://arxiv.org/abs/2104.01542) / [Website](https://sites.google.com/view/rpl-giga2021) 277 | - ECCV 2022, **StARformer**: Transformer with State-Action-Reward Representations for Visual Reinforcement Learning, [arXiv](https://arxiv.org/abs/2110.06206) / [Github](https://github.com/elicassion/StARformer) 278 | - ICML 2023, **Parallel Q-Learning**: Scaling Off-policy Reinforcement Learning under Massively Parallel Simulation, [arXiv](https://arxiv.org/abs/2307.12983v1) / [Github](https://github.com/Improbable-AI/pql) 279 | - ECCV 2022, **SeedFormer**: Patch Seeds based Point Cloud Completion with Upsample Transformer, [arXiv](https://arxiv.org/abs/2207.10315) / [Github](https://github.com/hrzhou2/seedformer) 280 | - arXiv 2023.07, Waypoint-Based Imitation Learning for Robotic Manipulation, [Website](https://lucys0.github.io/awe/) 281 | - ICML 2022, **Prompt-DT**: Prompting Decision Transformer for Few-Shot Policy Generalization, [Website](https://mxu34.github.io/PromptDT/) 282 | - arXiv 2023, Reinforcement Learning from Passive Data via Latent Intentions, [Website](https://dibyaghosh.com/icvf/) 283 | - ICML 2023, **RPG**: Reparameterized Policy Learning for Multimodal Trajectory Optimization, [Website](https://haosulab.github.io/RPG/) 284 | - ICML 2023, **TGRL**: An Algorithm for Teacher Guided Reinforcement Learning, [Website](https://sites.google.com/view/tgrl-paper) 285 | - arXiv 2023.07, 
**XSkill**: Cross Embodiment Skill Discovery, [Website](https://xskillcorl.github.io/) / [arXiv](https://arxiv.org/abs/2307.09955) 286 | - ICML 2023, Learning Neural Constitutive Laws from Motion Observations for Generalizable PDE Dynamics, [Website](https://sites.google.com/view/nclaw) / [Github](https://github.com/PingchuanMa/NCLaw) 287 | - arXiv 2023.07, **TokenFlow**: Consistent Diffusion Features for Consistent Video Editing, [Website](https://diffusion-tokenflow.github.io/) 288 | - arXiv 2023.07, **PAPR**: Proximity Attention Point Rendering, [Website](https://zvict.github.io/papr/) / [arXiv](https://arxiv.org/abs/2307.11086) 289 | - ICCV 2023, **DreamTeacher**: Pretraining Image Backbones with Deep Generative Models, [Website](https://research.nvidia.com/labs/toronto-ai/DreamTeacher/) / [arXiv](https://arxiv.org/abs/2307.07487) 290 | - RSS 2023, Robust and Versatile Bipedal Jumping Control through Reinforcement Learning, [arXiv](https://arxiv.org/abs/2302.09450) 291 | - arXiv 2023.07, **Differentiable Blocks World**: Qualitative 3D Decomposition by Rendering Primitives, [Website](https://www.tmonnier.com/DBW/) / [arXiv](https://arxiv.org/abs/2307.05473) 292 | - ICLR 2023, **DexDeform**: Dexterous Deformable Object Manipulation with Human Demonstrations and Differentiable Physics, [Website](https://sites.google.com/view/dexdeform/) / [Github](https://github.com/sizhe-li/DexDeform) 293 | - arXiv 2023.07, **RPDiff**: Shelving, Stacking, Hanging: Relational Pose Diffusion for Multi-modal Rearrangement, [Website](https://anthonysimeonov.github.io/rpdiff-multi-modal/) / [Github](https://github.com/anthonysimeonov/rpdiff) 294 | - arXiv 2023.07, **SpawnNet**: Learning Generalizable Visuomotor Skills from Pre-trained Networks, [Website](https://xingyu-lin.github.io/spawnnet/) / [Github](https://github.com/johnrso/spawnnet) 295 | - RSS 2023, **DexPBT**: Scaling up Dexterous Manipulation for Hand-Arm Systems with Population Based Training, 
[Website](https://sites.google.com/view/dexpbt) / [arXiv](https://arxiv.org/abs/2305.12127) 296 | - arXiv 2023.07, **KITE**: Keypoint-Conditioned Policies for Semantic Manipulation, [Website](https://sites.google.com/view/kite-website/home) / [arXiv](https://arxiv.org/abs/2306.16605) 297 | - arXiv 2023.06, Detector-Free Structure from Motion, [Website](https://zju3dv.github.io/DetectorFreeSfM/) / [arXiv](https://arxiv.org/abs/2306.15669) 298 | - arXiv 2023.06, **REFLECT**: Summarizing Robot Experiences for FaiLure Explanation and CorrecTion, [arXiv](https://arxiv.org/abs/2306.15724) / [Website](https://roboreflect.github.io/) 299 | - arXiv 2023.06, **ViNT**: A Foundation Model for Visual Navigation, [Website](https://visualnav-transformer.github.io/) 300 | - AAAI 2023, Improving Long-Horizon Imitation Through Instruction Prediction, [arXiv](https://arxiv.org/abs/2306.12554) / [Github](https://github.com/jhejna/instruction-prediction) 301 | - arXiv 2023.06, **RVT**: Robotic View Transformer for 3D Object Manipulation, [Website](https://robotic-view-transformer.github.io/) 302 | - arXiv 2023.01, **Ponder**: Point Cloud Pre-training via Neural Rendering, [arXiv](https://arxiv.org/abs/2301.00157) 303 | - arXiv 2023.06, **SGR**: A Universal Semantic-Geometric Representation for Robotic Manipulation, [arXiv](https://arxiv.org/abs/2306.10474) / [Website](https://semantic-geometric-representation.github.io/) 304 | - arXiv 2023.06, Robot Learning with Sensorimotor Pre-training, [arXiv](https://arxiv.org/abs/2306.10007) / [Website](https://robotic-pretrained-transformer.github.io/) 305 | - arXiv 2023.06, For SALE: State-Action Representation Learning for Deep Reinforcement Learning, [arXiv](https://arxiv.org/abs/2306.02451) / [Github](https://github.com/sfujim/TD7) 306 | - arXiv 2023.06, **HomeRobot**: Open Vocabulary Mobile Manipulation, [Website](https://ovmm.github.io/) 307 | - arXiv 2023.06, Lifelike Agility and Play on Quadrupedal Robots using Reinforcement Learning and 
Deep Pre-trained Models, [Website](https://tencent-roboticsx.github.io/lifelike-agility-and-play/) 308 | - arXiv 2023.06, **TAPIR**: Tracking Any Point with per-frame Initialization and temporal Refinement, [Website](https://deepmind-tapir.github.io/) 309 | - CVPR 2017, **I3D**: Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset, [arXiv](https://arxiv.org/abs/1705.07750) 310 | - arXiv 2023.06, Diffusion Models for Zero-Shot Open-Vocabulary Segmentation, [Website](https://www.robots.ox.ac.uk/~vgg/research/ovdiff/) 311 | - arXiv 2023.06, **R-MAE**: Regions Meet Masked Autoencoders, [arXiv](https://arxiv.org/abs/2306.05411) / [Github](https://github.com/facebookresearch/r-mae) 312 | - arXiv 2023.05, **Optimus**: Imitating Task and Motion Planning with Visuomotor Transformers, [Website](https://mihdalal.github.io/optimus/) 313 | - arXiv 2023.05, Video Prediction Models as Rewards for Reinforcement Learning, [arXiv](https://arxiv.org/abs/2305.14343) / [Website](https://www.escontrela.me/viper/) 314 | - ICML 2023, **VIMA**: General Robot Manipulation with Multimodal Prompts, [Website](https://vimalabs.github.io/) / [Github](https://github.com/vimalabs/VIMA) 315 | - arXiv 2023.05, **SPRING**: GPT-4 Out-performs RL Algorithms by Studying Papers and Reasoning, [arXiv](https://arxiv.org/abs/2305.15486) 316 | - arXiv 2023.05, Training Diffusion Models with Reinforcement Learning, [Website](https://rl-diffusion.github.io/) 317 | - arXiv 2023.03, Foundation Models for Decision Making: Problems, Methods, and Opportunities, [arXiv](https://arxiv.org/abs/2303.04129) 318 | - ICLR 2017, Third-Person Imitation Learning, [arXiv](https://arxiv.org/abs/1703.01703) 319 | - arXiv 2023.04, **CoTPC**: Chain-of-Thought Predictive Control, [Website](https://zjia.eng.ucsd.edu/cotpc) 320 | - CVPR 2023 highlight, **ImageBind**: One embedding to bind them all, [Website](https://imagebind.metademolab.com/) / [Github](https://github.com/facebookresearch/ImageBind) 321 | - arXiv 
2023.05, **Shap-E**: Generating Conditional 3D Implicit Functions, [Github](https://github.com/openai/shap-e) 322 | - arXiv 2023.04, **Track Anything**: Segment Anything Meets Videos, [Github](https://github.com/gaomingqi/track-anything) 323 | - CVPR 2023, **GLaD**: Generalizing Dataset Distillation via Deep Generative Prior, [Website](https://georgecazenavette.github.io/glad/) 324 | - CVPR 2022 oral, **RegNeRF**: Regularizing Neural Radiance Fields for View Synthesis from Sparse Inputs, [Website](https://m-niemeyer.github.io/regnerf/) 325 | - CVPR 2023, **FreeNeRF**: Improving Few-shot Neural Rendering with Free Frequency Regularization, [Website](https://jiawei-yang.github.io/FreeNeRF/) / [Github](https://github.com/Jiawei-Yang/FreeNeRF) 326 | - ICLR 2023 oral, **Decision-Diffuser**: Is Conditional Generative Modeling all you need for Decision-Making?, [Website](https://anuragajay.github.io/decision-diffuser/) 327 | - CVPR 2022, **Depth-supervised NeRF**: Fewer Views and Faster Training for Free, [Website](http://www.cs.cmu.edu/~dsnerf/) 328 | - SIGGRAPH Asia 2022, **ENeRF**: Efficient Neural Radiance Fields for Interactive Free-viewpoint Video, [Website](https://zju3dv.github.io/enerf/) 329 | - ICML 2023, On the power of foundation models, [arXiv](https://arxiv.org/abs/2211.16327) 330 | - ICML 2023, **SNeRL**: Semantic-aware Neural Radiance Fields for Reinforcement Learning, [Website](https://sjlee.cc/snerl/) 331 | - ICLR 2023 outstanding paper, Emergence of Maps in the Memories of Blind Navigation Agents, [Openreview](https://openreview.net/forum?id=lTt4KjHSsyl) 332 | - ICLR 2023 outstanding paper honorable mentions, Disentanglement with Biological Constraints: A Theory of Functional Cell Types, [Openreview](https://openreview.net/forum?id=9Z_GfhZnGH) 333 | - CVPR 2023 award candidate, Data-driven Feature Tracking for Event Cameras, [arXiv](https://arxiv.org/abs/2211.12826) 334 | - CVPR 2023 award candidate, What Can Human Sketches Do for Object Detection?, 
[Website](http://www.pinakinathc.me/sketch-detect/) 335 | - CVPR 2023 award candidate, Visual Programming for Compositional Visual Reasoning, [Website](https://prior.allenai.org/projects/visprog) 336 | - CVPR 2023 award candidate, On Distillation of Guided Diffusion Models, [arXiv](https://arxiv.org/abs/2210.03142) 337 | - CVPR 2023 award candidate, **DreamBooth**: Fine Tuning Text-to-Image Diffusion Models for Subject-Driven Generation, [Website](https://dreambooth.github.io/) 338 | - CVPR 2023 award candidate, Planning-oriented Autonomous Driving, [Github](https://github.com/OpenDriveLab/UniAD) 339 | - CVPR 2023 award candidate, Neural Dynamic Image-Based Rendering, [Website](https://dynibar.github.io/) 340 | - CVPR 2023 award candidate, **MobileNeRF**: Exploiting the Polygon Rasterization Pipeline for Efficient Neural Field Rendering on Mobile Architectures, [Website](https://mobile-nerf.github.io/) 341 | - CVPR 2023 award candidate, **OmniObject3D**: Large Vocabulary 3D Object Dataset for Realistic Perception, Reconstruction and Generation, [Website](https://omniobject3d.github.io/) 342 | - CVPR 2023 award candidate, Ego-Body Pose Estimation via Ego-Head Pose Estimation, [Website](https://lijiaman.github.io/projects/egoego/) 343 | - CVPR 2023, Affordances from Human Videos as a Versatile Representation for Robotics, [Website](https://robo-affordances.github.io/) 344 | - CVPR 2022, Neural 3D Video Synthesis from Multi-view Video, [Website](https://neural-3d-video.github.io/) 345 | - ICCV 2021, **Nerfies**: Deformable Neural Radiance Fields, [Website](https://nerfies.github.io/) / [Github](https://github.com/google/nerfies) 346 | - CVPR 2023 highlight, **HyperReel**: High-Fidelity 6-DoF Video with Ray-Conditioned Sampling, [Website](https://hyperreel.github.io/) / [Github](https://github.com/facebookresearch/hyperreel) 347 | - arXiv 2022.05, **FlashAttention**: Fast and Memory-Efficient Exact Attention with IO-Awareness, [arXiv](https://arxiv.org/abs/2205.14135) 
/ [Github](https://github.com/HazyResearch/flash-attention) 348 | - CVPR 2023, **CLIP^2**: Contrastive Language-Image-Point Pretraining from Real-World Point Cloud Data, [arXiv](https://arxiv.org/abs/2303.12417) 349 | - CVPR 2023, **ULIP**: Learning a Unified Representation of Language, Images, and Point Clouds for 3D Understanding, [arXiv](https://arxiv.org/abs/2212.05171) / [Github](https://github.com/salesforce/ULIP) 350 | - CVPR 2023, Learning Video Representations from Large Language Models, [Website](https://facebookresearch.github.io/LaViLa/) / [Github](https://github.com/facebookresearch/LaViLa) 351 | - CVPR 2023, **PLA**: Language-Driven Open-Vocabulary 3D Scene Understanding, [Website](https://dingry.github.io/projects/PLA) 352 | - CVPR 2023, **PartSLIP**: Low-Shot Part Segmentation for 3D Point Clouds via Pretrained Image-Language Models, [arXiv](https://arxiv.org/abs/2212.01558) 353 | - CVPR 2023, Mask-Free Video Instance Segmentation, [Website](https://www.vis.xyz/pub/maskfreevis/) / [Github](https://github.com/SysCV/maskfreevis) 354 | - arXiv 2023.04, **DINOv2**: Learning Robust Visual Features without Supervision, [arXiv](https://arxiv.org/abs/2304.07193) / [Github](https://github.com/facebookresearch/dinov2) 355 | - arXiv 2023.04, **Zip-NeRF**: Anti-Aliased Grid-Based Neural Radiance Fields, [Website](https://jonbarron.info/zipnerf/) 356 | - arXiv 2023.04, **SEEM**: Segment Everything Everywhere All at Once, [arXiv](https://arxiv.org/abs/2304.06718) / [Github](https://github.com/UX-Decoder/Segment-Everything-Everywhere-All-At-Once) 357 | - arXiv 2023.04, **Internet Explorer**: Targeted Representation Learning on the Open Web, [Website](https://internet-explorer-ssl.github.io/) / [Github](https://github.com/internet-explorer-ssl/internet-explorer) 358 | - arXiv 2023.03, Consistency Models, [Github](https://github.com/openai/consistency_models) / [arXiv](https://arxiv.org/abs/2303.01469) 359 | - arXiv 2023.02, **SceneDreamer**: Unbounded 3D Scene Generation from 2D Image 
Collections, [Github](https://github.com/FrozenBurning/SceneDreamer) / [Website](https://scene-dreamer.github.io/) 360 | - arXiv 2023.04, **Generative Agents**: Interactive Simulacra of Human Behavior, [arXiv](https://arxiv.org/abs/2304.03442) 361 | - ICLR 2023 notable, **NTFields**: Neural Time Fields for Physics-Informed Robot Motion Planning, [OpenReview](https://openreview.net/forum?id=ApF0dmi1_9K) 362 | - arXiv 2023, For Pre-Trained Vision Models in Motor Control, Not All Policy Learning Methods are Created Equal, [arXiv](https://arxiv.org/abs/2304.04591) 363 | - arXiv 2023, **Instruct-NeRF2NeRF**: Editing 3D Scenes with Instructions, [Github](https://github.com/ayaanzhaque/instruct-nerf2nerf) 364 | - arXiv 2023, **Grounding DINO**: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection, [arXiv](https://arxiv.org/abs/2303.05499) / [Github](https://github.com/IDEA-Research/GroundingDINO) 365 | - arXiv 2023, **Zero-1-to-3**: Zero-shot One Image to 3D Object, [arXiv](https://arxiv.org/abs/2303.11328) 366 | - ICLR 2023, Towards Stable Test-Time Adaptation in Dynamic Wild World, [arXiv](https://arxiv.org/abs/2302.12400) 367 | - CVPR 2023 highlight, Neural Volumetric Memory for Visual Locomotion Control, [Website](https://rchalyang.github.io/NVM/) 368 | - arXiv 2023, Segment Anything, [Website](https://segment-anything.com/) 369 | - ICRA 2023, **DribbleBot**: Dynamic Legged Manipulation in the Wild, [Website](https://gmargo11.github.io/dribblebot/) 370 | - arXiv 2023, **Alpaca**: A Strong, Replicable Instruction-Following Model, [Website](https://crfm.stanford.edu/2023/03/13/alpaca.html) 371 | - arXiv 2023, **VC-1**: Where are we in the search for an Artificial Visual Cortex for Embodied Intelligence?, [Website](https://eai-vc.github.io/) 372 | - ICLR 2022, **DroQ**: Dropout Q-Functions for Doubly Efficient Reinforcement Learning, [arXiv](https://arxiv.org/abs/2110.02034) 373 | - arXiv 2023, **RoboPianist**: A Benchmark for High-Dimensional Robot Control, [Website](https://kzakka.com/robopianist/) 
374 | - ICLR 2021, **DDIM**: Denoising Diffusion Implicit Models, [arXiv](https://arxiv.org/abs/2010.02502) 375 | - arXiv 2023, Your Diffusion Model is Secretly a Zero-Shot Classifier, [Website](https://diffusion-classifier.github.io/) 376 | - CVPR 2023 highlight, **F2-NeRF**: Fast Neural Radiance Field Training with Free Camera Trajectories 377 | - arXiv 2023, Learning Fine-Grained Bimanual Manipulation with Low-Cost Hardware, [Website](https://tonyzhaozh.github.io/aloha/) 378 | - RSS 2021, **RMA**: Rapid Motor Adaptation for Legged Robots, [Website](https://ashish-kmr.github.io/rma-legged-robots/) 379 | - ICCV 2021, **Where2Act**: From Pixels to Actions for Articulated 3D Objects, [Website](https://cs.stanford.edu/~kaichun/where2act/) 380 | - CVPR 2019 oral, Semantic Image Synthesis with Spatially-Adaptive Normalization, [Github](https://github.com/NVLabs/SPADE) 381 | --------------------------------------------------------------------------------