# Awesome In-Context Reinforcement Learning

This is a collection of research papers on In-Context Reinforcement Learning (ICRL). The repository will be updated regularly to track the frontier.

_Curated by dunnolab._

-----
Please feel free to open a [PR](https://github.com/dunnolab/awesome-in-context-rl/pulls) with new papers and resources you believe are relevant and awesome.
```
format:
- [title](paper link)
  - author1, author2, and author3...
```

## Papers
### 2025
- [In-Context Reinforcement Learning via Communicative World Models](https://arxiv.org/abs/2508.06659)
  - Fernando Martinez-Lopez, Tao Li, Yingdong Lu, Juntao Chen
- [Reward Is Enough: LLMs Are In-Context Reinforcement Learners](https://arxiv.org/abs/2506.06303)
  - Kefan Song, Amir Moeini, Peng Wang, Lei Gong, Rohan Chandra, Yanjun Qi, Shangtong Zhang
- [Filtering Learning Histories Enhances In-Context Reinforcement Learning](https://arxiv.org/pdf/2505.15143)
  - Weiqin Chen, Xinjie Zhang, Dharmashankar Subramanian, Santiago Paternain
- [OmniRL: In-Context Reinforcement Learning by Large-Scale Meta-Training in Randomized Worlds](https://arxiv.org/abs/2502.02869)
  - Fan Wang, Pengtao Shao, Yiming Zhang, Bo Yu, Shaoshan Liu, Ning Ding, Yang Cao, Yu Kang, Haifeng Wang
- [A **Survey** of In-Context Reinforcement Learning](https://arxiv.org/abs/2502.07978)
  - Amir Moeini, Jiuqi Wang, Jacob Beck, Ethan Blaser, Shimon Whiteson, Rohan Chandra, Shangtong Zhang
- [Yes, Q-learning Helps Offline In-Context RL](https://arxiv.org/abs/2502.17666)
  - Denis Tarasov, Alexander Nikulin, Ilya Zisman, Albina Klepach, Andrei Polubarov, Nikita Lyubaykin, Alexander Derevyagin, Igor Kiselev, Vladislav Kurenkov
- [Vintix: Action Model via In-Context Reinforcement Learning](https://arxiv.org/abs/2501.19400)
  - Andrey Polubarov, Nikita Lyubaykin, Alexander Derevyagin, Ilya Zisman, Denis Tarasov, Alexander Nikulin, Vladislav Kurenkov
- [Training a Generally Curious Agent](https://arxiv.org/abs/2502.17543)
  - Fahim Tajwar, Yiding Jiang, Abitha Thankaraj, Sumaita Sadia Rahman, J Zico Kolter, Jeff Schneider, Ruslan Salakhutdinov

### 2024
- [LMAct: A Benchmark for In-Context Imitation Learning with Long Multimodal Demonstrations](https://arxiv.org/abs/2412.01441)
  - Anian Ruoss, Fabio Pardo, Harris Chan, Bonnie Li, Volodymyr Mnih, Tim Genewein
- [Meta-Reinforcement Learning Robust to Distributional Shift Via Performing Lifelong In-Context Learning](https://proceedings.mlr.press/v235/xu24o.html)
  - Tengye Xu, Zihao Li, Qinyuan Ren
- [AMAGO-2: Breaking the Multi-Task Barrier in Meta-Reinforcement Learning with Transformers](https://arxiv.org/abs/2411.11188)
  - Jake Grigsby, Justin Sasek, Samyak Parajuli, Daniel Adebi, Amy Zhang, Yuke Zhu
- [LLMs Are In-Context Reinforcement Learners](https://arxiv.org/abs/2410.05362)
  - Giovanni Monea, Antoine Bosselut, Kianté Brantley, Yoav Artzi
- [EVOLvE: Evaluating and Optimizing LLMs For Exploration](https://arxiv.org/abs/2410.06238)
  - Allen Nie, Yi Su, Bo Chang, Jonathan N. Lee, Ed H. Chi, Quoc V. Le, Minmin Chen
- [Sparse Autoencoders Reveal Temporal Difference Learning in Large Language Models](https://arxiv.org/abs/2410.01280)
  - Can Demircan, Tankred Saanum, Akshay K. Jagadish, Marcel Binz, Eric Schulz
- [ReLIC: A Recipe for 64k Steps of In-Context Reinforcement Learning for Embodied AI](https://arxiv.org/abs/2410.02751)
  - Ahmad Elawady, Gunjan Chhablani, Ram Ramrakhya, Karmesh Yadav, Dhruv Batra, Zsolt Kira, Andrew Szot
- [Retrieval-Augmented Decision Transformer: External Memory for In-context RL](https://arxiv.org/abs/2410.07071)
  - Thomas Schmied, Fabian Paischer, Vihang Patil, Markus Hofmarcher, Razvan Pascanu, Sepp Hochreiter
- [Random Policy Enables In-Context Reinforcement Learning within Trust Horizons](https://arxiv.org/pdf/2410.19982)
  - Weiqin Chen, Santiago Paternain
- [Artificial Generational Intelligence: Cultural Accumulation in Reinforcement Learning](https://arxiv.org/abs/2406.00392)
  - Jonathan Cook, Chris Lu, Edward Hughes, Joel Z. Leibo, Jakob Foerster
- [XLand-100B: A Large-Scale Multi-Task Dataset for In-Context Reinforcement Learning](https://arxiv.org/abs/2406.08973)
  - Alexander Nikulin, Ilya Zisman, Alexey Zemtsov, Viacheslav Sinii, Vladislav Kurenkov, Sergey Kolesnikov
- [Decision Mamba: Reinforcement Learning via Hybrid Selective Sequence Modeling](https://arxiv.org/abs/2406.00079)
  - Sili Huang, Jifeng Hu, Zhejian Yang, Liwei Yang, Tao Luo, Hechang Chen, Lichao Sun, Bo Yang
- [Pretraining Decision Transformers with Reward Prediction for In-Context Multi-task Structured Bandit Learning](https://arxiv.org/abs/2406.05064)
  - Subhojyoti Mukherjee, Josiah P. Hanna, Qiaomin Xie, Robert Nowak
- [Transformers Learn Temporal Difference Methods for In-Context Reinforcement Learning](https://arxiv.org/abs/2405.13861)
  - Jiuqi Wang, Ethan Blaser, Hadi Daneshmand, Shangtong Zhang
- [In-Context Decision Transformer: Reinforcement Learning via Hierarchical Chain-of-Thought](https://arxiv.org/abs/2405.20692)
  - Sili Huang, Jifeng Hu, Hechang Chen, Lichao Sun, Bo Yang
- [In-Context Reinforcement Learning Without Optimal Action Labels](https://openreview.net/forum?id=8Dey9wo2qA)
  - Juncheng Dong, Moyang Guo, Ethan X Fang, Zhuoran Yang, Vahid Tarokh
- [Can Large Language Models Explore In-Context?](https://arxiv.org/abs/2403.15371)
  - Akshay Krishnamurthy, Keegan Harris, Dylan J. Foster, Cyril Zhang, Aleksandrs Slivkins
- [Large Language Models As Evolution Strategies](https://arxiv.org/abs/2402.18381)
  - Robert Tjarko Lange, Yingtao Tian, Yujin Tang

### 2023
- [Generalization to New Sequential Decision Making Tasks with In-Context Learning](https://arxiv.org/abs/2312.03801)
  - Sharath Chandra Raparthy, Eric Hambro, Robert Kirk, Mikael Henaff, Roberta Raileanu
- [Emergence of In-Context Reinforcement Learning from Noise Distillation](https://arxiv.org/abs/2312.12275)
  - Ilya Zisman, Vladislav Kurenkov, Alexander Nikulin, Viacheslav Sinii, Sergey Kolesnikov
- [In-Context Reinforcement Learning for Variable Action Spaces](https://arxiv.org/abs/2312.13327)
  - Viacheslav Sinii, Alexander Nikulin, Vladislav Kurenkov, Ilya Zisman, Sergey Kolesnikov
- [AMAGO: Scalable In-Context Reinforcement Learning for Adaptive Agents](https://arxiv.org/abs/2310.09971)
  - Jake Grigsby, Linxi Fan, Yuke Zhu
- [Cross-Episodic Curriculum for Transformer Agents](https://arxiv.org/abs/2310.08549)
  - Lucy Xiaoyang Shi, Yunfan Jiang, Jake Grigsby, Linxi "Jim" Fan, Yuke Zhu
- [Transformers as Decision Makers: Provable In-Context Reinforcement Learning via Supervised Pretraining](https://arxiv.org/abs/2310.08566)
  - Licong Lin, Yu Bai, Song Mei
- [Large Language Models as General Pattern Machines](https://arxiv.org/abs/2307.04721)
  - Suvir Mirchandani, Fei Xia, Pete Florence, Brian Ichter, Danny Driess, Montserrat Gonzalez Arenas, Kanishka Rao, Dorsa Sadigh, Andy Zeng
- [First-Explore, then Exploit: Meta-Learning Intelligent Exploration](https://arxiv.org/abs/2307.02276)
  - Ben Norman, Jeff Clune
- [Supervised Pretraining Can Learn In-Context Reinforcement Learning](https://arxiv.org/abs/2306.14892)
  - Jonathan N. Lee, Annie Xie, Aldo Pacchiano, Yash Chandak, Chelsea Finn, Ofir Nachum, Emma Brunskill
- [Structured State Space Models for In-Context Reinforcement Learning](https://arxiv.org/abs/2303.03982)
  - Chris Lu, Yannick Schroecker, Albert Gu, Emilio Parisotto, Jakob Foerster, Satinder Singh, Feryal Behbahani
- [Human-Timescale Adaptation in an Open-Ended Task Space](https://arxiv.org/abs/2301.07608)
  - Adaptive Agent Team, Jakob Bauer, Kate Baumli, Satinder Baveja, Feryal Behbahani, Avishkar Bhoopchand, Nathalie Bradley-Schmieg, Michael Chang, Natalie Clay, Adrian Collister, Vibhavari Dasagi, Lucy Gonzalez, Karol Gregor, Edward Hughes, Sheleem Kashem, Maria Loks-Thompson, Hannah Openshaw, Jack Parker-Holder, Shreya Pathak, Nicolas Perez-Nieves, Nemanja Rakicevic, Tim Rocktäschel, Yannick Schroecker, Jakub Sygnowski, Karl Tuyls, Sarah York, Alexander Zacherl, Lei Zhang
- [Learning How to Infer Partial MDPs for In-Context Adaptation and Exploration](https://arxiv.org/abs/2302.04250)
  - Chentian Jiang, Nan Rosemary Ke, Hado van Hasselt
- [Towards General-Purpose In-Context Learning Agents](https://openreview.net/forum?id=zDTqQVGgzH)
  - Louis Kirsch, James Harrison, C. Daniel Freeman, Jascha Sohl-Dickstein, Jürgen Schmidhuber

### Before 2023
- [In-context Reinforcement Learning with Algorithm Distillation](https://arxiv.org/abs/2210.14215)
  - Michael Laskin, Luyu Wang, Junhyuk Oh, Emilio Parisotto, Stephen Spencer, Richie Steigerwald, DJ Strouse, Steven Hansen, Angelos Filos, Ethan Brooks, Maxime Gazeau, Himanshu Sahni, Satinder Singh, Volodymyr Mnih
- [Large Language Models can Implement Policy Iteration](https://arxiv.org/abs/2210.03821)
  - Ethan Brooks, Logan Walls, Richard L. Lewis, Satinder Singh

### Publication Year Unknown