├── .gitattributes
└── README.md
/.gitattributes:
--------------------------------------------------------------------------------
1 | # Auto detect text files and perform LF normalization
2 | * text=auto
3 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # Awesome-Video-Robotic-Papers
2 |
3 | This repository compiles seminal and cutting-edge papers on the application of video technology to robotics. It is continually updated, and contributions are welcome. If you come across a relevant paper that should be included, please open an issue.
4 |
5 | ## Table of Contents
6 |
7 | 1. [Review Papers](#review-papers)
8 | 2. [Robot Arm](#robot-arm)
9 | 3. [SPOT](#spot)
10 | 4. [Dataset](#dataset)
11 | 5. [Other Useful Sources](#other-useful-sources)
12 |
13 | ## Review Papers
14 | - **Towards Generalist Robot Learning from Internet Video: A Survey**
15 | - Robert McCarthy, Daniel C.H. Tan, Dominik Schmidt, Fernando Acero, Nathan Herr, Yilun Du, Thomas G. Thuruthel, Zhibin Li
16 | - [Paper](https://arxiv.org/pdf/2404.19664)
17 |
18 | - **Learning by Watching: A Review of Video-based Learning Approaches for Robot Manipulation**
19 | - Chrisantus Eze, Christopher Crick
20 | - [Paper](https://arxiv.org/abs/2402.07127)
21 |
22 | ## Robot Arm
23 | - **Let Me Help You! Neuro-Symbolic Short-Context Action Anticipation**
24 | - Sarthak Bhagat, Samuel Li, Joseph Campbell, Yaqi Xie, Katia Sycara, Simon Stepputtis
25 | - [Paper](https://ieeexplore.ieee.org/document/10582423)
26 | - [Website](https://sarthak268.github.io/NeSCA/)
27 | - [Code](https://github.com/sarthak268/nesca-pytorch)
28 | - IEEE Robotics and Automation Letters
29 | - The Robotics Institute, Carnegie Mellon University
30 |
31 | - **Generalization with Lossy Affordances: Leveraging Broad Offline Data for Learning Visuomotor Tasks**
32 | - Kuan Fang, Patrick Yin, Ashvin Nair, Homer Walke, Gengchen Yan, Sergey Levine
33 | - [Paper](https://arxiv.org/abs/2210.06601)
34 | - [Code](https://github.com/kuanfang/flap)
35 | - CoRL 2022
36 | - UC Berkeley
37 |
38 | - **VIP: Towards Universal Visual Reward and Representation via Value-Implicit Pre-Training**
39 | - Yecheng Jason Ma, Shagun Sodhani, Dinesh Jayaraman, Osbert Bastani, Vikash Kumar, Amy Zhang
40 | - [Paper](https://arxiv.org/abs/2210.00030)
41 | - [Website](https://sites.google.com/view/vip-rl)
42 | - [Code](https://github.com/facebookresearch/vip)
43 | - ICLR 2023, Notable-Top-25% (Spotlight)
44 | - FAIR, Meta AI || University of Pennsylvania
45 |
46 | - **SOAR: Autonomous Improvement of Instruction Following Skills via Foundation Models**
47 | - Zhiyuan Zhou, Pranav Atreya, Abraham Lee, Homer Walke, Oier Mees, Sergey Levine
48 | - [Paper](https://arxiv.org/abs/2407.20635)
49 | - [Website](https://auto-improvement.github.io/)
50 | - [Code](https://github.com/rail-berkeley/soar)
51 | - [Dataset](https://rail.eecs.berkeley.edu/datasets/soar_release/)
52 | - UC Berkeley
53 |
60 | - **Action2Sound: Ambient-Aware Generation of Action Sounds from Egocentric Videos**
61 | - Changan Chen, Puyuan Peng, Ami Baid, Zihui Xue, Wei-Ning Hsu, David Harwath, Kristen Grauman
62 | - [Paper](https://arxiv.org/abs/2406.09272)
63 | - [Website](https://vision.cs.utexas.edu/projects/action2sound/)
64 | - Code: not yet released
65 | - University of Texas at Austin || FAIR, Meta AI
66 |
67 | - **This&That: Language-Gesture Controlled Video Generation for Robot Planning**
68 | - Boyang Wang, Nikhil Sridhar, Chao Feng, Mark Van der Merwe, Adam Fishman, Nima Fazeli, Jeong Joon Park
69 | - [Paper](https://arxiv.org/abs/2407.05530)
70 | - [Website](https://cfeng16.github.io/this-and-that/)
71 | - Code: not yet released
72 | - University of Michigan || University of Washington
73 |
74 | - **Policy Composition From and For Heterogeneous Robot Learning**
75 | - Lirui Wang, Alan Zhao, Yilun Du, Ted Adelson, Russ Tedrake
76 | - [Paper](https://arxiv.org/abs/2402.02511)
77 | - [Website](https://liruiw.github.io/policycomp/)
78 | - MIT CSAIL
79 | - Robotics: Science and Systems (RSS), 2024
80 |
81 | - **Simultaneous Localization and Affordance Prediction for Tasks in Egocentric Video**
82 | - Zachary Chavis, Hyun Soo Park, and Stephen J. Guy
83 | - [Paper](https://arxiv.org/abs/2407.13856)
84 | - Department of Computer Science and Engineering, University of Minnesota
85 |
86 | - **Flow as the Cross-domain Manipulation Interface**
87 | - Mengda Xu, Zhenjia Xu, Yinghao Xu, Cheng Chi, Gordon Wetzstein, Manuela Veloso, Shuran Song
88 | - [Paper](https://arxiv.org/abs/2407.15208)
89 | - [Website](https://im-flow-act.github.io/)
90 | - Stanford University || Columbia University || JP Morgan AI Research || Carnegie Mellon University
91 |
92 | - **R+X: Retrieval and Execution from Everyday Human Videos**
93 | - Georgios Papagiannis, Norman Di Palo, Pietro Vitiello, Edward Johns
94 | - [Paper](https://arxiv.org/abs/2407.12957)
95 | - [Website](https://www.robot-learning.uk/r-plus-x)
96 | - Robot Learning Lab, Imperial College London
97 |
98 | - **Octo: An Open-Source Generalist Robot Policy**
99 | - Dibya Ghosh, Homer Walke, Karl Pertsch, Kevin Black, Oier Mees, Sudeep Dasari, Joey Hejna, Tobias Kreiman, Charles Xu, Jianlan Luo, You Liang Tan, Lawrence Yunliang Chen, Pannag Sanketi, Quan Vuong, Ted Xiao, Dorsa Sadigh, Chelsea Finn, Sergey Levine
100 | - [Paper](https://arxiv.org/pdf/2405.12213)
101 | - [Website](https://octo-models.github.io/)
102 | - [Code](https://github.com/octo-models/octo)
103 | - UC Berkeley || Stanford || Carnegie Mellon University || Google DeepMind
104 |
105 | - **HRP: Human Affordances for Robotic Pre-Training**
106 | - Mohan Kumar Srirama, Sudeep Dasari, Shikhar Bahl, Abhinav Gupta
107 | - [Paper](https://arxiv.org/abs/2407.18911)
108 | - [Website](https://hrp-robot.github.io/)
109 | - [Code](https://github.com/SudeepDasari/data4robotics/tree/hrp_release)
110 | - Carnegie Mellon University
111 | - RSS 2024
112 |
113 | - **RAM: Retrieval-Based Affordance Transfer for Generalizable Zero-Shot Robotic Manipulation**
114 | - Yuxuan Kuang*, Junjie Ye*, Haoran Geng*, Jiageng Mao, Congyue Deng, Leonidas Guibas, He Wang, Yue Wang
115 | - [Paper](https://arxiv.org/abs/2407.04689)
116 | - [Website](https://yxkryptonite.github.io/RAM/)
117 | - [Code](https://github.com/yxKryptonite/RAM_code)
118 | - University of Southern California || Peking University || Stanford University
119 |
120 | - **Vid2Robot: End-to-end Video-conditioned Policy Learning with Cross-Attention Transformers**
121 | - Vidhi Jain, Maria Attarian, Nikhil J Joshi, Ayzaan Wahid, Danny Driess, Quan Vuong, Pannag R Sanketi, Pierre Sermanet, Stefan Welker, Christine Chan, Igor Gilitschenski, Yonatan Bisk, Debidatta Dwibedi
122 | - [Paper](https://arxiv.org/abs/2403.12943)
123 | - [Website](https://vid2robot.github.io/)
124 | - Google DeepMind || Carnegie Mellon University || University of Toronto
125 |
126 | - **OpenVLA: An Open-Source Vision-Language-Action Model**
127 | - Moo Jin Kim, Karl Pertsch, Siddharth Karamcheti, Ted Xiao, Ashwin Balakrishna, Suraj Nair, Rafael Rafailov, Ethan Foster, Grace Lam, Pannag Sanketi, Quan Vuong, Thomas Kollar, Benjamin Burchfiel, Russ Tedrake, Dorsa Sadigh, Sergey Levine, Percy Liang, Chelsea Finn
128 | - [Paper](https://arxiv.org/abs/2406.09246)
129 | - [Website](https://openvla.github.io/)
130 | - [Code](https://github.com/openvla/openvla)
131 | - Stanford University || UC Berkeley || Toyota Research Institute || Google DeepMind || Physical Intelligence || MIT
132 |
133 | - **Video Language Planning**
134 | - Yilun Du, Mengjiao Yang, Pete Florence, Fei Xia, Ayzaan Wahid, Brian Ichter, Pierre Sermanet, Tianhe Yu, Pieter Abbeel, Joshua B. Tenenbaum, Leslie Kaelbling, Andy Zeng, Jonathan Tompson
135 | - [Paper](https://arxiv.org/abs/2310.10625)
136 | - [Website](https://video-language-planning.github.io/)
137 | - [Code](https://github.com/video-language-planning/vlp_code)
138 | - Google DeepMind || MIT || UC Berkeley
139 |
140 | - **Manipulate-Anything: Automating Real-World Robots using Vision-Language Models**
141 | - Jiafei Duan, Wentao Yuan, Wilbert Pumacay, Yi Ru Wang, Kiana Ehsani, Dieter Fox, Ranjay Krishna
142 | - [Paper](https://arxiv.org/pdf/2406.18915)
143 | - [Website](https://robot-ma.github.io/)
144 | - University of Washington || NVIDIA || Allen Institute for Artificial Intelligence || Universidad Católica San Pablo
145 |
146 | - **Dreamitate: Real-World Visuomotor Policy Learning via Video Generation**
147 | - Junbang Liang, Ruoshi Liu, Ege Ozguroglu, Sruthi Sudhakar, Achal Dave, Pavel Tokmakov, Shuran Song, Carl Vondrick
148 | - [Paper](https://arxiv.org/abs/2406.16862)
149 | - [Website](https://dreamitate.cs.columbia.edu/)
150 | - [Code](https://github.com/cvlab-columbia/dreamitate)
151 | - Columbia University || Toyota Research Institute || Stanford University
152 |
153 | - **Unleashing Large-Scale Video Generative Pre-training for Visual Robot Manipulation**
154 | - Hongtao Wu, Ya Jing, Chilam Cheang, Guangzeng Chen, Jiafeng Xu, Xinghang Li, Minghuan Liu, Hang Li, Tao Kong
155 | - [Paper](https://arxiv.org/abs/2312.13139)
156 | - [Website](https://gr1-manipulation.github.io/)
157 | - [Code](https://github.com/bytedance/GR-1)
158 | - ByteDance Research
159 |
160 | - **Large-Scale Actionless Video Pre-Training via Discrete Diffusion for Efficient Policy Learning**
161 | - Haoran He, Chenjia Bai, Ling Pan, Weinan Zhang, Bin Zhao, Xuelong Li
162 | - [Paper](https://arxiv.org/abs/2402.14407)
163 | - [Website](https://video-diff.github.io/)
164 | - Hong Kong University of Science and Technology || Shanghai Artificial Intelligence Laboratory || Shanghai Jiao Tong University || Northwestern Polytechnical University || Institute of Artificial Intelligence (TeleAI), China Telecom Corp Ltd.
165 |
166 | - **ManiWAV: Learning Robot Manipulation from In-the-Wild Audio-Visual Data**
167 | - Zeyi Liu, Cheng Chi, Eric Cousineau, Naveen Kuppuswamy, Benjamin Burchfiel, Shuran Song
168 | - [Paper](https://arxiv.org/abs/2406.19464)
169 | - [Website](https://mani-wav.github.io/)
170 | - [Code](https://github.com/real-stanford/maniwav)
171 | - [Dataset](https://real.stanford.edu/maniwav/data/)
172 | - Stanford University || Columbia University || Toyota Research Institute
173 |
174 | - **Vision-based Manipulation from Single Human Video with Open-World Object Graphs**
175 | - Yifeng Zhu, Arisrei Lim, Peter Stone, Yuke Zhu
176 | - [Paper](https://arxiv.org/abs/2405.20321)
177 | - [Website](https://ut-austin-rpl.github.io/ORION-release/)
178 | - The University of Texas at Austin || Sony AI
179 |
180 | - **Learning to Act from Actionless Videos through Dense Correspondences**
181 | - Po-Chen Ko, Jiayuan Mao, Yilun Du, Shao-Hua Sun, Joshua B. Tenenbaum
182 | - [Paper](https://arxiv.org/abs/2310.08576)
183 | - [Website](https://flow-diffusion.github.io/)
184 | - [Code](https://github.com/flow-diffusion/AVDC)
185 | - National Taiwan University || MIT
186 |
187 | ## SPOT
188 | - **Track2Act: Predicting Point Tracks from Internet Videos Enables Diverse Zero-shot Manipulation**
189 | - Homanga Bharadhwaj, Roozbeh Mottaghi*, Abhinav Gupta*, Shubham Tulsiani*
190 | - [Paper](https://arxiv.org/abs/2405.01527)
191 | - [Website](https://homangab.github.io/track2act/)
192 | - [Code](https://github.com/homangab/Track-2-Act/)
193 |
194 | ## Dataset
195 | - **The Ingredients for Robotic Diffusion Transformers**
196 | - Sudeep Dasari, Oier Mees, Sebastian Zhao, Mohan Kumar Srirama, Sergey Levine
197 | - [Paper](https://arxiv.org/abs/2410.10088)
198 | - [Website](https://dit-policy.github.io/)
199 | - [Code](https://github.com/sudeepdasari/dit-policy)
203 |
204 | - **Beyond Sight: Finetuning Generalist Robot Policies with Heterogeneous Sensors via Language Grounding**
205 | - Joshua Jones*, Oier Mees*, Carmelo Sferrazza*, Kyle Stachowicz, Pieter Abbeel, Sergey Levine
206 | - [Paper](https://arxiv.org/abs/2501.04693)
207 | - [Website](https://fuse-model.github.io/)
208 | - [Code](https://github.com/fuse-model/FuSe)
211 |
212 | - **DROID: A Large-Scale In-The-Wild Robot Manipulation Dataset**
213 | - Alexander Khazatsky, Karl Pertsch, Suraj Nair, Ashwin Balakrishna, Sudeep Dasari, Siddharth Karamcheti, Soroush Nasiriany, Mohan Kumar Srirama, Lawrence Yunliang Chen, Kirsty Ellis, Peter David Fagan, Joey Hejna, Masha Itkina, Marion Lepert, Yecheng Jason Ma, Patrick Tree Miller, Jimmy Wu, Suneel Belkhale, Shivin Dass, Huy Ha, Arhan Jain, Abraham Lee, Youngwoon Lee, Marius Memmel, Sungjae Park, Ilija Radosavovic, Kaiyuan Wang, Albert Zhan, Kevin Black, Cheng Chi, Kyle Beltran Hatch, Shan Lin, Jingpei Lu, Jean Mercat, Abdul Rehman, Pannag R Sanketi, Archit Sharma, Cody Simpson, Quan Vuong, Homer Rich Walke, Blake Wulfe, Ted Xiao, Jonathan Heewon Yang, Arefeh Yavary, Tony Z. Zhao, Christopher Agia, Rohan Baijal, Mateo Guaman Castro, Daphne Chen, Qiuyu Chen, Trinity Chung, Jaimyn Drake, Ethan Paul Foster, Jensen Gao, David Antonio Herrera, Minho Heo, Kyle Hsu, Jiaheng Hu, Donovon Jackson, Charlotte Le, Yunshuang Li, Kevin Lin, Roy Lin, Zehan Ma, Abhiram Maddukuri, Suvir Mirchandani, Daniel Morton, Tony Nguyen, Abigail O'Neill, Rosario Scalise, Derick Seale, Victor Son, Stephen Tian, Emi Tran, Andrew E. Wang, Yilin Wu, Annie Xie, Jingyun Yang, Patrick Yin, Yunchu Zhang, Osbert Bastani, Glen Berseth, Jeannette Bohg, Ken Goldberg, Abhinav Gupta, Abhishek Gupta, Dinesh Jayaraman, Joseph J Lim, Jitendra Malik, Roberto Martín-Martín, Subramanian Ramamoorthy, Dorsa Sadigh, Shuran Song, Jiajun Wu, Michael C. Yip, Yuke Zhu, Thomas Kollar, Sergey Levine, Chelsea Finn
214 | - [Paper](https://arxiv.org/abs/2403.12945)
215 | - [Website](https://droid-dataset.github.io/)
216 | - See the website for dataset downloads and additional resources.
218 |
219 | | Paper Title | Link | Date |
220 | |-------------|------|------|
221 | | Predictive Inverse Dynamics Models are Scalable Learners for Robotic Manipulation | [Link](https://arxiv.org/abs/2412.15109) | 2024-12 |
222 | | Improving Vision-Language-Action Models via Chain-of-Affordance | [Link](https://arxiv.org/abs/2412.20451) | 2024-12 |
223 | | STRAP: Robot Sub-Trajectory Retrieval for Augmented Policy Learning | [Link](https://arxiv.org/abs/2412.15182) | 2024-12 |
224 | | WHALE: Towards Generalizable and Scalable World Models for Embodied Decision-making | [Link](https://arxiv.org/abs/2411.05619) | 2024-11 |
225 | | Robots Pre-train Robots: Manipulation-Centric Robotic Representation from Large-Scale Robot Datasets | [Link](https://arxiv.org/abs/2410.22325) | 2024-10 |
226 | | UAD: Unsupervised Affordance Distillation for Generalization in Robotic Manipulation | [Link](https://openreview.net/forum?id=an953WOpo2) | 2024-10 |
227 | | RDT-1B: A Diffusion Foundation Model for Bimanual Manipulation | [Link](https://arxiv.org/abs/2410.07864) | 2024-10 |
228 | | π0: A Vision-Language-Action Flow Model for General Robot Control | [Link](https://arxiv.org/abs/2410.24164) | 2024-10 |
229 | | Diffusion Transformer Policy | [Link](https://arxiv.org/abs/2410.15959) | 2024-10 |
230 | | AHA: A Vision-Language-Model for Detecting and Reasoning Over Failures in Robotic Manipulation | [Link](https://arxiv.org/abs/2410.00371) | 2024-10 |
231 | | Robo-MUTUAL: Robotic Multimodal Task Specification via Unimodal Learning | [Link](https://arxiv.org/abs/2410.01529) | 2024-10 |
232 | | VIRT: Vision Instructed Transformer for Robotic Manipulation | [Link](https://arxiv.org/abs/2410.07169) | 2024-10 |
233 | | CtRNet-X: Camera-to-Robot Pose Estimation in Real-world Conditions Using a Single Camera | [Link](https://arxiv.org/abs/2409.10441) | 2024-09 |
234 | | In-Context Imitation Learning via Next-Token Prediction | [Link](https://arxiv.org/abs/2408.15980) | 2024-08 |
235 | | Kalib: Markerless Hand-Eye Calibration with Keypoint Tracking | [Link](https://arxiv.org/abs/2408.10562) | 2024-08 |
236 | | RAM: Retrieval-Based Affordance Transfer for Generalizable Zero-Shot Robotic Manipulation | [Link](https://arxiv.org/abs/2407.04689) | 2024-07 |
240 |
241 | - **Open X-Embodiment: Robotic Learning Datasets and RT-X Models**
242 | - Open X-Embodiment Collaboration, Abby O'Neill, Abdul Rehman, Abhinav Gupta, Abhiram Maddukuri, Abhishek Gupta, Abhishek Padalkar, Abraham Lee, Acorn Pooley, Agrim Gupta, Ajay Mandlekar, Ajinkya Jain, Albert Tung, Alex Bewley, Alex Herzog, Alex Irpan, Alexander Khazatsky, Anant Rai, Anchit Gupta, Andrew Wang, Andrey Kolobov, Anikait Singh, Animesh Garg, Aniruddha Kembhavi, Annie Xie, Anthony Brohan, Antonin Raffin, Archit Sharma, Arefeh Yavary, Arhan Jain, Ashwin Balakrishna, Ayzaan Wahid, Ben Burgess-Limerick, Beomjoon Kim, Bernhard Schölkopf, Blake Wulfe, Brian Ichter, Cewu Lu, Charles Xu, Charlotte Le, Chelsea Finn, Chen Wang, Chenfeng Xu, Cheng Chi, Chenguang Huang, Christine Chan, Christopher Agia, Chuer Pan, Chuyuan Fu, Coline Devin, Danfei Xu, Daniel Morton, Danny Driess, Daphne Chen, Deepak Pathak, Dhruv Shah, Dieter Büchler, Dinesh Jayaraman, Dmitry Kalashnikov, Dorsa Sadigh, Edward Johns, Ethan Foster, Fangchen Liu, Federico Ceola, Fei Xia, Feiyu Zhao, Felipe Vieira Frujeri, Freek Stulp, Gaoyue Zhou, Gaurav S. Sukhatme, Gautam Salhotra, Ge Yan, Gilbert Feng, Giulio Schiavi, Glen Berseth, Gregory Kahn, Guangwen Yang, Guanzhi Wang, Hao Su, Hao-Shu Fang, Haochen Shi, Henghui Bao, Heni Ben Amor, Henrik I Christensen, Hiroki Furuta, Homanga Bharadhwaj, Homer Walke, Hongjie Fang, Huy Ha, Igor Mordatch, Ilija Radosavovic, Isabel Leal, Jacky Liang, Jad Abou-Chakra, Jaehyung Kim, Jaimyn Drake, Jan Peters, Jan Schneider, Jasmine Hsu, Jay Vakil et al. (192 additional authors not shown)
243 | - [Paper](https://arxiv.org/abs/2310.08864)
244 | - [Website](https://robotics-transformer-x.github.io/)
245 | - [Code](https://github.com/google-deepmind/open_x_embodiment)
247 |
248 | - **BridgeData V2: A Dataset for Robot Learning at Scale**
249 | - Homer Walke, Kevin Black, Abraham Lee, Moo Jin Kim, Max Du, Chongyi Zheng, Tony Zhao, Philippe Hansen-Estruch, Quan Vuong, Andre He, Vivek Myers, Kuan Fang, Chelsea Finn, Sergey Levine
250 | - [Paper](https://arxiv.org/abs/2308.12952)
251 | - [Website](https://rail-berkeley.github.io/bridgedata/)
252 | - [Code](https://github.com/rail-berkeley/bridge_data_v2)
255 |
256 | - **Ego4DSounds: A Diverse Egocentric Dataset with High Action-Audio Correspondence**
257 | - Changan Chen, Puyuan Peng, Ami Baid, Zihui Xue, Wei-Ning Hsu, David Harwath, Kristen Grauman
258 | - [Paper](https://arxiv.org/abs/2406.09272)
259 | - [Website](https://ego4dsounds.github.io)
260 | - [Code](https://github.com/Ego4DSounds/Ego4DSounds)
262 |
263 | - **RH20T: A Comprehensive Robotic Dataset for Learning Diverse Skills in One-Shot**
264 | - Hao-Shu Fang, Hongjie Fang, Zhenyu Tang, Jirong Liu, Chenxi Wang, Junbo Wang, Haoyi Zhu, Cewu Lu
265 | - [Paper](https://arxiv.org/abs/2307.00595)
266 | - [Website](https://rh20t.github.io/)
267 | - [API](https://github.com/rh20t/rh20t_api)
268 | - [Dataset](https://rh20t.github.io/#download)
269 | - Shanghai Jiao Tong University
273 |
274 | ## Other Useful Sources
275 |
276 | - [Awesome-VideoLLM-Papers](https://github.com/yyyujintang/Awesome-VideoLLM-Papers)
277 | - [Awesome-LLMs-for-Video-Understanding](https://github.com/yunlong10/Awesome-LLMs-for-Video-Understanding)
278 | - [VLM-Eval: A General Evaluation on Video Large Language Models](https://github.com/zyayoung/Awesome-Video-LLMs)
279 | - [LLMs Meet Multimodal Generation and Editing: A Survey](https://github.com/YingqingHe/Awesome-LLMs-meet-Multimodal-Generation)
280 |
--------------------------------------------------------------------------------