├── logo.png
├── index.html
└── README.md
/logo.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/thinkwee/AgentsMeetRL/HEAD/logo.png
--------------------------------------------------------------------------------
/index.html:
--------------------------------------------------------------------------------
[Single-page dashboard; markup stripped in extraction. Recoverable text: page title "AgentsMeetRL Dashboard"; a "Loading repository data..." placeholder; four stat counters initialized to 0 ("Total Repositories in AgentsMeetRL", "Stars of All Repositories in AgentsMeetRL", "Categories", "AgentsMeetRL Stars"); and two chart headings ("Repositories by Category", "Stars Distribution by Category").]
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
25 | [](https://thinkwee.top/amr/)
26 |
27 |
28 |
29 | # When LLM Agents Meet Reinforcement Learning
30 |
31 | **AgentsMeetRL** is an awesome list that summarizes **open-source repositories** for training LLM agents with reinforcement learning:
32 | - 🤖 The criterion for counting a project as an agent project is that it features at least one of the following: multi-turn interaction or tool use (so Tool-Integrated Reasoning (TIR) projects are included in this repo).
33 | - ⚠️ This project is based on code analysis of open-source repositories performed with GitHub Copilot Agent, which may produce unfaithful cases. Although everything is manually reviewed, omissions may remain. If you find any errors, please let us know through issues or PRs - we warmly welcome them!
34 | - 🚀 We focus in particular on the reinforcement learning frameworks, RL algorithms, rewards, and environments each project depends on, as a reference for how these excellent open-source projects make their technical choices. See [Click to view technical details] under each table.
35 | - 🤗 Feel free to submit your own projects anytime - we welcome contributions!
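In practice, the multi-turn + tool-use criterion above describes a rollout loop. A minimal sketch of such a trajectory (with hypothetical `call_llm`/`run_tool` stubs standing in for a real policy model and tool backend; not any listed framework's API):

```python
def call_llm(messages):
    # Stub: a real agent would query the policy model here.
    return {"tool": "calculator", "args": "2+2"} if len(messages) == 1 else {"answer": "4"}

def run_tool(name, args):
    # Stub: a real agent would dispatch to a search engine, sandbox, etc.
    return str(eval(args)) if name == "calculator" else ""

def rollout(task, max_turns=4):
    messages = [{"role": "user", "content": task}]
    for _ in range(max_turns):                  # multi-turn interaction
        action = call_llm(messages)
        if "tool" in action:                    # tool use
            observation = run_tool(action["tool"], action["args"])
            messages.append({"role": "tool", "content": observation})
        else:
            messages.append({"role": "assistant", "content": action["answer"]})
            break
    return messages  # the trajectory that a reward function then scores
```

The RL frameworks below differ mainly in how they collect such trajectories, score them, and turn the scores into policy-gradient updates.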
36 |
37 | Some enumerations:
38 | - Enumeration for Reward Type:
39 |   - External Verifier: e.g., a compiler or math solver
40 |   - Rule-Based: e.g., a LaTeX parser with exact-match scoring
41 |   - Model-Based: e.g., a trained verifier LLM or reward LLM
42 |   - Custom
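The reward types above can be sketched as interchangeable scoring functions. These are illustrative stubs only (the function names and `task` fields are invented for this example); real projects plug in compilers, parsers, or trained reward models:

```python
def external_verifier(response, task):
    # e.g. run a compiler or math solver on the response
    try:
        return 1.0 if eval(response) == task["expected"] else 0.0
    except Exception:
        return 0.0

def rule_based(response, task):
    # e.g. exact-match scoring after parsing
    return 1.0 if response.strip() == task["gold"] else 0.0

def model_based(response, task, judge):
    # e.g. a trained verifier/reward LLM returning a scalar score
    return judge(response, task)

def custom(response, task):
    # project-specific logic, e.g. a length penalty on top of a match
    return rule_based(response, task) - 0.001 * len(response)
```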
43 |
44 | ---
45 |
46 | ## 🔧 Base Framework
47 |
48 |
49 | | Github Repo | 🌟 Stars | Date | Org | Paper Link |
50 | | :----: | :----: | :----: | :----: | :----: |
51 | | [siiRL](https://github.com/sii-research/siiRL) |  | 2025.7 | Shanghai Innovation Institute | [Paper](https://arxiv.org/abs/2507.13833) |
52 | | [slime](https://github.com/THUDM/slime) |  | 2025.6 | Tsinghua University (THUDM) | [blog](https://lmsys.org/blog/2025-07-09-slime/) |
53 | | [agent-lightning](https://github.com/microsoft/agent-lightning) |  | 2025.6 | Microsoft Research | [Paper](https://arxiv.org/abs/2508.03680) |
54 | | [AReaL](https://github.com/inclusionAI/AReaL) |  | 2025.6 | AntGroup/Tsinghua | [Paper](https://arxiv.org/pdf/2505.24298) |
55 | | [ROLL](https://github.com/alibaba/ROLL) |  | 2025.6 | Alibaba | [Paper](https://arxiv.org/pdf/2506.06122) |
56 | | [MARTI](https://github.com/TsinghuaC3I/MARTI) |  | 2025.5 | Tsinghua | -- |
57 | | [RL2](https://github.com/ChenmienTan/RL2) |  | 2025.4 | Accio | -- |
58 | | [verifiers](https://github.com/willccbb/verifiers) |  | 2025.3 | Individual | -- |
59 | | [oat](https://github.com/sail-sg/oat) |  | 2024.11 | NUS/Sea AI | [Paper](https://arxiv.org/pdf/2411.01493) |
60 | | [veRL](https://github.com/volcengine/verl) |  | 2024.10 | ByteDance | [Paper](https://arxiv.org/pdf/2409.19256) |
61 | | [OpenRLHF](https://github.com/OpenRLHF/OpenRLHF) |  | 2023.7 | OpenRLHF | [Paper](https://arxiv.org/abs/2405.11143) |
62 | | [trl](https://github.com/huggingface/trl) |  | 2019.11 | HuggingFace | -- |
63 |
64 |
65 | 📋 Click to view technical details
66 |
67 | | Github Repo | RL Algorithm | Single/Multi Agent | Outcome/Process Reward | Single/Multi Turn | Task | Reward Type | Tool usage |
68 | | :----: | :----: | :----: | :----: | :----: | :----: | :----: | :----: |
69 | | [siiRL](https://github.com/sii-research/siiRL) | PPO/GRPO/CPGD/MARFT | Multi | Both | Multi | LLM/VLM/LLM-MAS PostTraining | Model/Rule | Planned |
70 | | [slime](https://github.com/THUDM/slime) | GRPO/GSPO/REINFORCE++ | Single | Both | Both | Math/Code | External Verifier | Yes |
71 | | [agent-lightning](https://github.com/microsoft/agent-lightning) | PPO/Custom/Automatic Prompt Optimization | Multi | Outcome | Multi | Calculator/SQL | Model/External/Rule | Yes |
72 | | [AReaL](https://github.com/inclusionAI/AReaL) | PPO | Both | Outcome | Both | Math/Code | External | Yes |
73 | | [ROLL](https://github.com/alibaba/ROLL) | PPO/GRPO/Reinforce++/TOPR/RAFT++ | Multi | Both | Multi | Math/QA/Code/Alignment | All | Yes |
74 | | [MARTI](https://github.com/TsinghuaC3I/MARTI) | PPO/GRPO/REINFORCE++/TTRL | Multi | Both | Multi | Math | All | Yes |
75 | | [RL2](https://github.com/ChenmienTan/RL2) | Dr. GRPO/PPO/DPO | Single | Both | Both | QA/Dialogue | Rule/Model/External | Yes |
76 | | [verifiers](https://github.com/willccbb/verifiers) | GRPO | Multi | Outcome | Both | Reasoning/Math/Code | All | Code |
77 | | [oat](https://github.com/sail-sg/oat) | PPO/GRPO | Single | Outcome | Multi | Math/Alignment | External | No |
78 | | [veRL](https://github.com/volcengine/verl) | PPO/GRPO | Single | Outcome | Both | Math/QA/Reasoning/Search | All | Yes |
79 | | [OpenRLHF](https://github.com/OpenRLHF/OpenRLHF) | PPO/REINFORCE++/GRPO/DPO/IPO/KTO/RLOO | Multi | Both | Both | Dialogue/Chat/Completion | Rule/Model/External | Yes |
80 | | [trl](https://github.com/huggingface/trl) | PPO/GRPO/DPO | Single | Both | Single | QA | Custom | No |
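GRPO recurs throughout the tables above. As a rough illustration of the group-relative advantage it computes (a generic sketch, not taken from any listed framework):

```python
import statistics

def grpo_advantages(group_rewards, eps=1e-6):
    """Group-relative advantages: normalize each sampled response's reward
    by the mean and std of its group (responses to the same prompt)."""
    mean = statistics.fmean(group_rewards)
    std = statistics.pstdev(group_rewards)
    return [(r - mean) / (std + eps) for r in group_rewards]
```

Each group holds several responses sampled for the same prompt; a response's normalized advantage then weights its token log-probabilities in a PPO-style clipped objective, with no learned value function.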
81 |
82 |
83 |
84 | ## 💪 General/MultiTask
85 |
86 | | Github Repo | 🌟 Stars | Date | Org | Paper Link | RL Framework |
87 | | :----: | :----: | :----: | :----: | :----: | :----: |
88 | | [DEPO](https://github.com/OpenCausaLab/DEPO) |  | 2025.11 | HKUST/SJTU | [Paper](https://arxiv.org/abs/2511.15392) | LLaMA-Factory |
89 | | [SPEAR](https://github.com/TencentYoutuResearch/SPEAR) |  | 2025.10 | Tencent Youtu Lab | [Paper](https://arxiv.org/abs/2509.22601) | veRL/verl-agent |
90 | | [AgentRL](https://github.com/THUDM/AgentRL) |  | 2025.9 | Tsinghua | [Paper](https://arxiv.org/abs/2510.04206) | veRL |
91 | | [AgentGym-RL](https://github.com/WooooDyy/AgentGym-RL) |  | 2025.9 | Fudan University | [Paper](https://arxiv.org/abs/2509.08755) | veRL |
92 | | [Agent_Foundation_Models](https://github.com/OPPO-PersonalAI/Agent_Foundation_Models) |  | 2025.8 | OPPO Personal AI Lab | [Paper](https://arxiv.org/abs/2508.13167) | veRL |
93 | | [SPA-RL-Agent](https://github.com/WangHanLinHenry/SPA-RL-Agent) |  | 2025.5 | PolyU | [Paper](https://arxiv.org/pdf/2505.20732) | TRL |
94 | | [verl-agent](https://github.com/langfengQ/verl-agent) |  | 2025.5 | NTU/Skywork | [Paper](https://arxiv.org/pdf/2505.10978) | veRL |
95 |
96 |
97 | 📋 Click to view technical details
98 |
99 | | Github Repo | RL Algorithm | Single/Multi Agent | Outcome/Process Reward | Single/Multi Turn | Task | Reward Type | Tool usage |
100 | | :----: | :----: | :----: | :----: | :----: | :----: | :----: | :----: |
101 | | [DEPO](https://github.com/OpenCausaLab/DEPO) | KTO + Efficiency Loss | Single | Both | Multi | Agent (BabyAI/WebShop) | Rule | Yes |
102 | | [SPEAR](https://github.com/TencentYoutuResearch/SPEAR) | GRPO/GiGPO + SIL | Single | Both | Multi | Math/Agent | Rule/External | Yes (Search, Sandbox, Browser) |
103 | | [AgentRL](https://github.com/THUDM/AgentRL) | GRPO/REINFORCE++/RLOO/ReMax/GAE | Single | Outcome | Multi | Agent Tasks | External | Yes |
104 | | [AgentGym-RL](https://github.com/WooooDyy/AgentGym-RL) | PPO/GRPO/RLOO/REINFORCE++ | Single | Outcome | Multi | Web/Search/Game/Embodied/Science | Rule/Model/External | Yes (Web, Search, Env APIs) |
105 | | [Agent_Foundation_Models](https://github.com/OPPO-PersonalAI/Agent_Foundation_Models) | DAPO/PPO | Single | Outcome | Single | QA/Code/Math | Rule/External | Yes |
106 | | [SPA-RL-Agent](https://github.com/WangHanLinHenry/SPA-RL-Agent) | PPO | Single | Process | Multi | Navigation/Web/TextGame | Model | No |
107 | | [verl-agent](https://github.com/langfengQ/verl-agent) | PPO/GRPO/GiGPO/DAPO/RLOO/REINFORCE++ | Multi | Both | Multi | Phone Use/Math/Code/Web/TextGame | All | Yes |
108 |
109 |
110 |
111 | ## 🔍 Search/Research/Web
112 |
113 |
114 | | Github Repo | 🌟 Stars | Date | Org | Paper Link | RL Framework |
115 | | :----: | :----: | :----: | :----: | :----: | :----: |
116 | | [ReSeek](https://github.com/TencentBAC/ReSeek) |  | 2025.10 | Tencent PCG BAC/Tsinghua University | [Paper](https://arxiv.org/abs/2510.00568) | veRL |
117 | | [Tree-GRPO](https://github.com/AMAP-ML/Tree-GRPO) |  | 2025.9 | AMAP | [Paper](https://arxiv.org/abs/2509.21240) | veRL |
118 | | [ASearcher](https://github.com/inclusionAI/ASearcher) |  | 2025.8 | Ant Research RL Lab/Tsinghua University & UW | [Paper](https://arxiv.org/abs/2508.07976) | RealHF/AReaL |
119 | | [Kimi-Researcher](https://github.com/moonshotai/Kimi-Researcher) |  | 2025.6 | Moonshot AI | [blog](https://moonshotai.github.io/Kimi-Researcher/) | Custom |
120 | | [TTI](https://github.com/test-time-interaction/TTI) |  | 2025.6 | CMU | [Paper](https://arxiv.org/abs/2506.07976) | Custom |
121 | | [R-Search](https://github.com/QingFei1/R-Search) |  | 2025.6 | Individual | -- | veRL |
122 | | [R1-Searcher-plus](https://github.com/RUCAIBox/R1-Searcher-plus) |  | 2025.5 | RUC | [Paper](https://arxiv.org/pdf/2505.17005) | Custom |
123 | | [StepSearch](https://github.com/Zillwang/StepSearch) |  | 2025.5 | SenseTime | [Paper](https://arxiv.org/pdf/2505.15107) | veRL |
124 | | [AutoRefine](https://github.com/syr-cn/AutoRefine) |  | 2025.5 | USTC | [Paper](https://www.arxiv.org/pdf/2505.11277) | veRL |
125 | | [ZeroSearch](https://github.com/Alibaba-NLP/ZeroSearch) |  | 2025.5 | Alibaba | [Paper](https://arxiv.org/pdf/2505.04588) | veRL |
126 | | [WebThinker](https://github.com/RUC-NLPIR/WebThinker) |  | 2025.4 | RUC | [Paper](https://arxiv.org/pdf/2504.21776) | Custom |
127 | | [DeepResearcher](https://github.com/GAIR-NLP/DeepResearcher) |  | 2025.4 | SJTU | [Paper](https://arxiv.org/pdf/2504.03160) | veRL |
128 | | [Search-R1](https://github.com/PeterGriffinJin/Search-R1) |  | 2025.3 | UIUC/Google | [paper1](https://arxiv.org/pdf/2503.09516), [paper2](https://arxiv.org/pdf/2505.15117) | veRL |
129 | | [R1-Searcher](https://github.com/RUCAIBox/R1-Searcher) |  | 2025.3 | RUC | [Paper](https://arxiv.org/pdf/2503.05592) | OpenRLHF |
130 | | [C-3PO](https://github.com/Chen-GX/C-3PO) |  | 2025.2 | Alibaba | [Paper](https://arxiv.org/pdf/2502.06205) | OpenRLHF |
131 | | [Search-o1](https://github.com/RUC-NLPIR/Search-o1) |  | 2025.1 | Renmin University of China (RUC) | [Paper](https://arxiv.org/abs/2501.05366) | N/A (Inference Only) |
132 | | [WebAgent](https://github.com/Alibaba-NLP/WebAgent) |  | 2025.1 | Alibaba | [paper1](https://arxiv.org/pdf/2501.07572), [paper2](https://arxiv.org/pdf/2505.22648) | LLaMA-Factory |
133 |
134 |
135 | 📋 Click to view technical details
136 |
137 | | Github Repo | RL Algorithm | Single/Multi Agent | Outcome/Process Reward | Single/Multi Turn | Task | Reward Type | Tool usage |
138 | | :----: | :----: | :----: | :----: | :----: | :----: | :----: | :----: |
139 | | [ReSeek](https://github.com/TencentBAC/ReSeek) | GRPO/PPO | Single | Both | Multi | QA/Search | Rule | Search/JUDGE |
140 | | [Tree-GRPO](https://github.com/AMAP-ML/Tree-GRPO) | GRPO/Tree-GRPO | Single | Outcome | Multi | Search | Rule | Search |
141 | | [ASearcher](https://github.com/inclusionAI/ASearcher) | PPO/GRPO + Decoupled PPO | Single | Outcome | Multi | Math/Code/SearchQA | External/Rule | Yes |
142 | | [Kimi-Researcher](https://github.com/moonshotai/Kimi-Researcher) | REINFORCE | Single | Outcome | Multi | Research | Outcome | Search, Browse, Coding |
143 | | [TTI](https://github.com/test-time-interaction/TTI) | REINFORCE/BC | Single | Outcome | Multi | Web | External | Web Browsing |
144 | | [R-Search](https://github.com/QingFei1/R-Search) | PPO/GRPO | Single | Both | Multi | QA/Search | All | Yes |
145 | | [R1-Searcher-plus](https://github.com/RUCAIBox/R1-Searcher-plus) | Custom | Single | Outcome | Multi | Search | Model | Search |
146 | | [StepSearch](https://github.com/Zillwang/StepSearch) | PPO | Single | Process | Multi | QA | Model | Search |
147 | | [AutoRefine](https://github.com/syr-cn/AutoRefine) | PPO/GRPO | Multi | Both | Multi | RAG QA | Rule | Search |
148 | | [ZeroSearch](https://github.com/Alibaba-NLP/ZeroSearch) | PPO/GRPO/REINFORCE | Single | Outcome | Multi | QA/Search | Rule | Yes |
149 | | [WebThinker](https://github.com/RUC-NLPIR/WebThinker) | DPO | Single | Outcome | Multi | Reasoning/QA/Research | Model/External | Web Browsing |
150 | | [DeepResearcher](https://github.com/GAIR-NLP/DeepResearcher) | PPO/GRPO | Multi | Outcome | Multi | Research | All | Yes |
151 | | [Search-R1](https://github.com/PeterGriffinJin/Search-R1) | PPO/GRPO | Single | Outcome | Multi | Search | All | Search |
152 | | [R1-Searcher](https://github.com/RUCAIBox/R1-Searcher) | PPO/DPO | Single | Both | Multi | Search | All | Yes |
153 | | [C-3PO](https://github.com/Chen-GX/C-3PO) | PPO | Multi | Outcome | Multi | Search | Model | Yes |
154 | | [Search-o1](https://github.com/RUC-NLPIR/Search-o1) | N/A | Single | N/A | Multi | Math/Science QA/Code/Open QA | N/A | Web Search |
155 | | [WebAgent](https://github.com/Alibaba-NLP/WebAgent) | DAPO | Multi | Process | Multi | Web | Model | Yes |
156 |
157 |
158 |
159 | ## 📱 GUI
160 |
161 |
162 | | Github Repo | 🌟 Stars | Date | Org | Paper Link | RL Framework |
163 | | :----: | :----: | :----: | :----: | :----: | :----: |
164 | | [MobileAgent](https://github.com/X-PLUG/MobileAgent) |  | 2025.9 | X-PLUG (TongyiQwen) | [paper](https://arxiv.org/abs/2509.11543) | veRL |
165 | | [InfiGUI-G1](https://github.com/InfiXAI/InfiGUI-G1) |  | 2025.8 | InfiX AI | [Paper](https://arxiv.org/abs/2508.05731) | veRL |
166 | | [Grounding-R1](https://github.com/Yan98/Grounding-R1) |  | 2025.6 | Salesforce | [blog](https://huggingface.co/blog/HelloKKMe/grounding-r1) | trl |
167 | | [AgentCPM-GUI](https://github.com/OpenBMB/AgentCPM-GUI) |  | 2025.6 | OpenBMB/Tsinghua/RUC | [Paper](https://arxiv.org/pdf/2506.01391) | Huggingface |
168 | | [SE-GUI](https://github.com/YXB-NKU/SE-GUI) |  | 2025.5 | Nankai University/vivo | [Paper](https://arxiv.org/pdf/2505.12370) | trl |
169 | | [ARPO](https://github.com/dvlab-research/ARPO) |  | 2025.5 | CUHK/HKUST | [Paper](https://arxiv.org/pdf/2505.16282) | veRL |
170 | | [GUI-G1](https://github.com/Yuqi-Zhou/GUI-G1) |  | 2025.5 | RUC | [Paper](https://arxiv.org/pdf/2505.15810) | TRL |
171 | | [GUI-R1](https://github.com/ritzz-ai/GUI-R1) |  | 2025.4 | CAS/NUS | [Paper](https://arxiv.org/pdf/2504.10458) | veRL |
172 | | [UI-R1](https://github.com/lll6gg/UI-R1) |  | 2025.3 | vivo/CUHK | [Paper](https://arxiv.org/pdf/2503.21620) | TRL |
173 |
174 |
175 | 📋 Click to view technical details
176 |
177 | | Github Repo | RL Algorithm | Single/Multi Agent | Outcome/Process Reward | Single/Multi Turn | Task | Reward Type | Tool usage |
178 | | :----: | :----: | :----: | :----: | :----: | :----: | :----: | :----: |
179 | | [MobileAgent](https://github.com/X-PLUG/MobileAgent) | semi-online RL | Single | Both | Multi | MobileGUI/Automation | Rule | Yes |
180 | | [InfiGUI-G1](https://github.com/InfiXAI/InfiGUI-G1) | AEPO | Single | Outcome | Single | GUI/Grounding | Rule | No |
181 | | [Grounding-R1](https://github.com/Yan98/Grounding-R1) | GRPO | Single | Outcome | Multi | GUI Grounding | Model | Yes |
182 | | [AgentCPM-GUI](https://github.com/OpenBMB/AgentCPM-GUI) | GRPO | Single | Outcome | Multi | Mobile GUI | Model | Yes |
183 | | [SE-GUI](https://github.com/YXB-NKU/SE-GUI) | GRPO | Single | Both | Single | GUI Grounding | Rule | Yes |
184 | | [ARPO](https://github.com/dvlab-research/ARPO) | GRPO | Single | Outcome | Multi | GUI | External | Computer Use |
185 | | [GUI-G1](https://github.com/Yuqi-Zhou/GUI-G1) | GRPO | Single | Outcome | Single | GUI | Rule/External | No |
186 | | [GUI-R1](https://github.com/ritzz-ai/GUI-R1) | GRPO | Single | Outcome | Multi | GUI | Rule | No |
187 | | [UI-R1](https://github.com/lll6gg/UI-R1) | GRPO | Single | Process | Both | GUI | Rule | Computer/Phone Use |
188 |
189 |
190 |
191 | ## 🔨 Tool
192 |
193 |
194 | | Github Repo | 🌟 Stars | Date | Org | Paper Link | RL Framework |
195 | | :----: | :----: | :----: | :----: | :----: | :----: |
196 | | [MiroRL](https://github.com/MiroMindAI/MiroRL) |  | 2025.8 | MiroMindAI | [HF Repo](https://huggingface.co/miromind-ai) | veRL |
197 | | [verl-tool](https://github.com/TIGER-AI-Lab/verl-tool) |  | 2025.6 | TIGER-Lab | [X](https://x.com/DongfuJiang/status/1929198238017720379) | veRL |
198 | | [Multi-Turn-RL-Agent](https://github.com/SiliangZeng/Multi-Turn-RL-Agent) |  | 2025.5 | University of Minnesota | [Paper](https://arxiv.org/pdf/2505.11821) | Custom |
199 | | [Tool-N1](https://github.com/NVlabs/Tool-N1) |  | 2025.5 | NVIDIA | [Paper](https://arxiv.org/pdf/2505.00024) | veRL |
200 | | [Tool-Star](https://github.com/dongguanting/Tool-Star) |  | 2025.5 | RUC | [Paper](https://arxiv.org/pdf/2505.16410) | LLaMA-Factory |
201 | | [RL-Factory](https://github.com/Simple-Efficient/RL-Factory) |  | 2025.5 | Simple-Efficient | [model](https://huggingface.co/Simple-Efficient/RLFactory-Qwen3-8B-GRPO) | veRL |
202 | | [ReTool](https://github.com/ReTool-RL/ReTool) |  | 2025.4 | ByteDance | [Paper](https://arxiv.org/pdf/2504.11536) | veRL |
203 | | [AWorld](https://github.com/inclusionAI/AWorld) |  | 2025.3 | Ant Group (inclusionAI) | [Paper](https://arxiv.org/abs/2508.20404) | veRL |
204 | | [Agent-R1](https://github.com/0russwest0/Agent-R1) |  | 2025.3 | USTC | [Paper](https://arxiv.org/abs/2511.14460) | veRL |
205 | | [ReCall](https://github.com/Agent-RL/ReCall) |  | 2025.3 | BaiChuan | [Paper](https://arxiv.org/pdf/2503.19470) | veRL |
206 |
207 |
208 | 📋 Click to view technical details
209 |
210 | | Github Repo | RL Algorithm | Single/Multi Agent | Outcome/Process Reward | Single/Multi Turn | Task | Reward Type | Tool usage |
211 | | :----: | :----: | :----: | :----: | :----: | :----: | :----: | :----: |
212 | | [MiroRL](https://github.com/MiroMindAI/MiroRL) | GRPO | Single | Both | Multi | Reasoning/Planning/ToolUse | Rule-based | MCP |
213 | | [verl-tool](https://github.com/TIGER-AI-Lab/verl-tool) | PPO/GRPO | Single | Both | Both | Math/Code | Rule/External | Yes |
214 | | [Multi-Turn-RL-Agent](https://github.com/SiliangZeng/Multi-Turn-RL-Agent) | GRPO | Single | Both | Multi | Tool-use/Math | Rule/External | Yes |
215 | | [Tool-N1](https://github.com/NVlabs/Tool-N1) | PPO | Single | Outcome | Multi | Math/Dialogue | All | Yes |
216 | | [Tool-Star](https://github.com/dongguanting/Tool-Star) | PPO/DPO/ORPO/SimPO/KTO | Single | Outcome | Multi | Multi-modal/Tool Use/Dialogue | Model/External | Yes |
217 | | [RL-Factory](https://github.com/Simple-Efficient/RL-Factory) | GRPO | Multi | Both | Multi | Tool-use/NL2SQL | All | MCP |
218 | | [ReTool](https://github.com/ReTool-RL/ReTool) | PPO | Single | Outcome | Multi | Math | External | Code |
219 | | [AWorld](https://github.com/inclusionAI/AWorld) | GRPO | Both | Outcome | Multi | Search/Web/Code | External/Rule | Yes |
220 | | [Agent-R1](https://github.com/0russwest0/Agent-R1) | PPO/GRPO | Single | Both | Multi | Tool-use/QA | Model | Yes |
221 | | [ReCall](https://github.com/Agent-RL/ReCall) | PPO/GRPO/RLOO/REINFORCE++/ReMax | Single | Outcome | Multi | Tool-use/Math/QA | All | Yes |
222 |
223 |
224 |
225 | ## 🎮 TextGame
226 |
227 |
228 | | Github Repo | 🌟 Stars | Date | Org | Paper Link | RL Framework |
229 | | :----: | :----: | :----: | :----: | :----: | :----: |
230 | | [ARIA](https://github.com/rhyang2021/ARIA) |  | 2025.6 | Fudan University | [Paper](https://arxiv.org/abs/2506.00539) | Custom |
231 | | [AMPO](https://github.com/MozerWang/AMPO) |  | 2025.5 | Tongyi Lab, Alibaba | [Paper](https://arxiv.org/abs/2505.02156) | veRL |
232 | | [Trinity-RFT](https://github.com/modelscope/Trinity-RFT) |  | 2025.5 | Alibaba | [Paper](https://arxiv.org/pdf/2505.17826) | veRL |
233 | | [VAGEN](https://github.com/RAGEN-AI/VAGEN) |  | 2025.3 | RAGEN-AI | [Paper](https://www.notion.so/VAGEN-Training-VLM-Agents-with-Multi-Turn-Reinforcement-Learning-1bfde13afb6e80b792f6d80c7c2fcad0) | veRL |
234 | | [ART](https://github.com/OpenPipe/ART) |  | 2025.3 | OpenPipe | [Paper](https://github.com/OpenPipe/ART#-citation) | TRL |
235 | | [OpenManus-RL](https://github.com/OpenManus/OpenManus-RL) |  | 2025.3 | UIUC/MetaGPT | -- | Custom |
236 | | [RAGEN](https://github.com/RAGEN-AI/RAGEN) |  | 2025.1 | RAGEN-AI | [Paper](https://arxiv.org/pdf/2504.20073) | veRL |
237 |
238 |
239 | 📋 Click to view technical details
240 |
241 | | Github Repo | RL Algorithm | Single/Multi Agent | Outcome/Process Reward | Single/Multi Turn | Task | Reward Type | Tool usage |
242 | | :----: | :----: | :----: | :----: | :----: | :----: | :----: | :----: |
243 | | [ARIA](https://github.com/rhyang2021/ARIA) | REINFORCE | Both | Process | Multi | Negotiation/Bargaining | Other | No |
244 | | [AMPO](https://github.com/MozerWang/AMPO) | BC/AMPO(GRPO improvement) | Multi | Outcome | Multi | Social Interaction | Model-based | No |
245 | | [Trinity-RFT](https://github.com/modelscope/Trinity-RFT) | PPO/GRPO | Single | Outcome | Both | Math/TextGame/Web | All | Yes |
246 | | [VAGEN](https://github.com/RAGEN-AI/VAGEN) | PPO/GRPO | Single | Both | Multi | TextGame/Navigation | All | Yes |
247 | | [ART](https://github.com/OpenPipe/ART) | GRPO | Multi | Both | Multi | TextGame | All | Yes |
248 | | [OpenManus-RL](https://github.com/OpenManus/OpenManus-RL) | PPO/DPO/GRPO | Multi | Outcome | Multi | TextGame | All | Yes |
249 | | [RAGEN](https://github.com/RAGEN-AI/RAGEN) | PPO/GRPO | Single | Both | Multi | TextGame | All | Yes |
250 |
251 |
252 |
253 | ## 💻 Code
254 |
255 |
256 | | Github Repo | 🌟 Stars | Date | Org | Paper Link | RL Framework |
257 | | :----: | :----: | :----: | :----: | :----: | :----: |
258 | | [PPP-Agent](https://github.com/sunnweiwei/PPP-Agent) |  | 2025.11 | CMU/OpenHands | [Paper](https://arxiv.org/abs/2511.02208) | veRL |
259 | | [RepoDeepSearch](https://github.com/Mizersy/RepoDeepSearch) |  | 2025.8 | PKU, Bytedance, BIT | [Paper](https://arxiv.org/abs/2508.03012) | veRL |
260 | | [MedAgentGym](https://github.com/wshi83/MedAgentGym) |  | 2025.6 | Emory/Georgia Tech | [Paper](https://arxiv.org/pdf/2506.04405) | Huggingface |
261 | | [CURE](https://github.com/Gen-Verse/CURE) |  | 2025.6 | University of Chicago/Princeton/ByteDance | [Paper](https://arxiv.org/pdf/2506.03136) | Huggingface |
262 | | [MASLab](https://github.com/MASWorks/MASLab) |  | 2025.5 | MASWorks | [Paper](https://arxiv.org/pdf/2505.16988) | Custom |
263 | | [Time-R1](https://github.com/ulab-uiuc/Time-R1) |  | 2025.5 | UIUC | [Paper](https://arxiv.org/pdf/2505.13508) | veRL |
264 | | [ML-Agent](https://github.com/MASWorks/ML-Agent) |  | 2025.5 | MASWorks | [Paper](https://arxiv.org/pdf/2505.23723) | Custom |
265 | | [SkyRL](https://github.com/NovaSky-AI/SkyRL) |  | 2025.4 | NovaSky | [Paper](https://arxiv.org/abs/2511.16108) | veRL |
266 | | [digitalhuman](https://github.com/Tencent/digitalhuman) |  | 2025.4 | Tencent | [Paper](https://arxiv.org/abs/2507.03112) | veRL |
267 | | [sweet_rl](https://github.com/facebookresearch/sweet_rl) |  | 2025.3 | Meta/UCB | [Paper](https://arxiv.org/pdf/2503.15478) | OpenRLHF |
268 | | [rllm](https://github.com/agentica-project/rllm) |  | 2025.1 | Berkeley Sky Computing Lab/BAIR/Together AI | [Notion Blog](https://pretty-radio-b75.notion.site/rLLM-A-Framework-for-Post-Training-Language-Agents-21b81902c146819db63cd98a54ba5f31) | veRL |
269 | | [open-r1](https://github.com/huggingface/open-r1) |  | 2025.1 | HuggingFace | -- | TRL |
270 |
271 |
272 | 📋 Click to view technical details
273 |
274 | | Github Repo | RL Algorithm | Single/Multi Agent | Outcome/Process Reward | Single/Multi Turn | Task | Reward Type | Tool usage |
275 | | :----: | :----: | :----: | :----: | :----: | :----: | :----: | :----: |
276 | | [PPP-Agent](https://github.com/sunnweiwei/PPP-Agent) | PPP-RL | Single | Both | Multi | SWE/Research | Rule+Model | Search, Ask, Browse |
277 | | [RepoDeepSearch](https://github.com/Mizersy/RepoDeepSearch) | GRPO | Single | Both | Multi | Search/Repair | Rule/External | Yes |
278 | | [MedAgentGym](https://github.com/wshi83/MedAgentGym) | SFT/DPO/PPO/GRPO | Single | Outcome | Multi | Medical/Code | External | Yes |
279 | | [CURE](https://github.com/Gen-Verse/CURE) | PPO | Single | Outcome | Single | Code | External | No |
280 | | [MASLab](https://github.com/MASWorks/MASLab) | NO RL | Multi | Outcome | Multi | Code/Math/Reasoning | External | Yes |
281 | | [Time-R1](https://github.com/ulab-uiuc/Time-R1) | PPO/GRPO/DPO | Multi | Outcome | Multi | Temporal | All | Code |
282 | | [ML-Agent](https://github.com/MASWorks/ML-Agent) | Custom | Single | Process | Multi | Code | All | Yes |
283 | | [SkyRL](https://github.com/NovaSky-AI/SkyRL) | PPO/GRPO | Single | Outcome | Multi | Math/Code | All | Code |
284 | | [digitalhuman](https://github.com/Tencent/digitalhuman) | PPO/GRPO/ReMax/RLOO | Multi | Outcome | Multi | Empathy/Math/Code/MultimodalQA | Rule/Model/External | Yes |
285 | | [sweet_rl](https://github.com/facebookresearch/sweet_rl) | DPO | Multi | Process | Multi | Design/Code | Model | Web Browsing |
286 | | [rllm](https://github.com/agentica-project/rllm) | PPO/GRPO | Single | Outcome | Multi | Code Edit | External | Yes |
287 | | [open-r1](https://github.com/huggingface/open-r1) | GRPO | Single | Outcome | Single | Math/Code | All | Yes |
288 |
289 |
290 |
291 | ## 🤔 QA(Reasoning/Math)
292 |
293 |
294 | | Github Repo | 🌟 Stars | Date | Org | Paper Link | RL Framework |
295 | | :----: | :----: | :----: | :----: | :----: | :----: |
296 | | [SafeSearch](https://github.com/amazon-science/SafeSearch) |  | 2025.11 | Amazon Science | [Paper](https://arxiv.org/abs/2510.17017) | veRL |
297 | | [Agent0](https://github.com/aiming-lab/Agent0) |  | 2025.10 | UNC-Chapel Hill/Salesforce Research/Stanford University | [Paper](https://arxiv.org/abs/2511.16043) | veRL |
298 | | [KG-R1](https://github.com/Jinyeop3110/KG-R1) |  | 2025.9 | UIUC/Google | [Paper1](https://arxiv.org/pdf/2503.09516), [Paper2](https://arxiv.org/abs/2505.15117) | veRL |
299 | | [AgentFlow](https://github.com/lupantech/AgentFlow) |  | 2025.9 | Stanford University | [arXiv](https://arxiv.org/abs/2510.05592) | veRL |
300 | | [ARPO](https://github.com/dongguanting/ARPO) |  | 2025.7 | RUC, Kuaishou | [Paper](https://arxiv.org/abs/2507.19849) | veRL |
301 | | [terminal-bench-rl](https://github.com/Danau5tin/terminal-bench-rl) |  | 2025.7 | Individual (Danau5tin) | N/A | rLLM |
302 | | [MOTIF](https://github.com/purbeshmitra/MOTIF) |  | 2025.6 | University of Maryland | [Paper](https://arxiv.org/abs/2507.02851) | trl |
303 | | [cmriat/l0](https://github.com/cmriat/l0) |  | 2025.6 | CMRIAT | [Paper](https://arxiv.org/abs/2506.23667) | veRL |
304 | | [agent-distillation](https://github.com/Nardien/agent-distillation) |  | 2025.5 | KAIST | [Paper](https://arxiv.org/pdf/2505.17612) | Custom |
305 | | [VDeepEyes](https://github.com/Visual-Agent/DeepEyes) |  | 2025.5 | Xiaohongshu/XJTU | [Paper](https://arxiv.org/pdf/2505.14362) | veRL |
306 | | [EasyR1](https://github.com/hiyouga/EasyR1) |  | 2025.4 | Individual | [repo1](https://github.com/hiyouga/EasyR1)/[paper2](https://arxiv.org/pdf/2409.19256) | veRL |
307 | | [AutoCoA](https://github.com/ADaM-BJTU/AutoCoA) |  | 2025.3 | BJTU | [Paper](https://arxiv.org/pdf/2503.06580) | veRL |
308 | | [ToRL](https://github.com/GAIR-NLP/ToRL) |  | 2025.3 | SJTU | [Paper](https://arxiv.org/pdf/2503.23383) | veRL |
309 | | [ReMA](https://github.com/ziyuwan/ReMA-public) |  | 2025.3 | SJTU, UCL | [Paper](https://arxiv.org/pdf/2503.09501) | veRL |
310 | | [Agentic-Reasoning](https://github.com/theworldofagents/Agentic-Reasoning) |  | 2025.2 | Oxford | [Paper](https://arxiv.org/pdf/2502.04644) | Custom |
311 | | [SimpleTIR](https://github.com/ltzheng/SimpleTIR) |  | 2025.2 | NTU, Bytedance | [Notion Blog](https://simpletir.notion.site/report) | veRL |
312 | | [openrlhf_async_pipline](https://github.com/yyht/openrlhf_async_pipline) |  | 2024.5 | OpenRLHF | [Paper](https://arxiv.org/pdf/2405.11143) | OpenRLHF |
313 |
314 |
315 | 📋 Click to view technical details
316 |
317 | | Github Repo | RL Algorithm | Single/Multi Agent | Outcome/Process Reward | Single/Multi Turn | Task | Reward Type | Tool usage |
318 | | :----: | :----: | :----: | :----: | :----: | :----: | :----: | :----: |
319 | | [SafeSearch](https://github.com/amazon-science/SafeSearch) | PPO (GAE/GRPO) | Single | Both | Multi | QA/Search | Rule + Model | Search |
320 | | [Agent0](https://github.com/aiming-lab/Agent0) | ADPO | Multi | Process | Multi | Math/Visual | Model/Verifier | Yes |
321 | | [KG-R1](https://github.com/Jinyeop3110/KG-R1) | GRPO/PPO | Single | Both | Multi | KGQA | Rule/Model | KG Retrieval |
322 | | [AgentFlow](https://github.com/lupantech/AgentFlow) | Flow-GRPO | Single | Outcome | Multi | Search/Math/QA | Model/External | Yes |
323 | | [ARPO](https://github.com/dongguanting/ARPO) | GRPO | Single | Outcome | Multi | Math/Coding | Model/Rule | Yes |
324 | | [terminal-bench-rl](https://github.com/Danau5tin/terminal-bench-rl) | GRPO | Single | Outcome | Multi | Coding/Terminal | Model+External Verifier | Yes |
325 | | [MOTIF](https://github.com/purbeshmitra/MOTIF) | GRPO | Single | Outcome | Multi | QA | Rule | No |
326 | | [cmriat/l0](https://github.com/cmriat/l0) | PPO | Multi | Process | Multi | QA | All | Yes |
327 | | [agent-distillation](https://github.com/Nardien/agent-distillation) | PPO | Single | Process | Multi | QA/Math | External | Yes |
328 | | [VDeepEyes](https://github.com/Visual-Agent/DeepEyes) | PPO/GRPO | Multi | Process | Multi | VQA | All | Yes |
329 | | [EasyR1](https://github.com/hiyouga/EasyR1) | GRPO | Single | Process | Multi | Vision-Language | Model | Yes |
330 | | [AutoCoA](https://github.com/ADaM-BJTU/AutoCoA) | GRPO | Multi | Outcome | Multi | Reasoning/Math/QA | All | Yes |
331 | | [ToRL](https://github.com/GAIR-NLP/ToRL) | GRPO | Single | Outcome | Single | Math | Rule/External | Yes |
332 | | [ReMA](https://github.com/ziyuwan/ReMA-public) | PPO | Multi | Outcome | Multi | Math | Rule | No |
333 | | [Agentic-Reasoning](https://github.com/theworldofagents/Agentic-Reasoning) | Custom | Single | Process | Multi | QA/Math | External | Web Browsing |
334 | | [SimpleTIR](https://github.com/ltzheng/SimpleTIR) | PPO/GRPO (with extensions) | Single | Outcome | Multi | Math, Coding | All | Yes |
335 | | [openrlhf_async_pipline](https://github.com/yyht/openrlhf_async_pipline) | PPO/REINFORCE++/DPO/RLOO | Single | Outcome | Multi | Dialogue/Reasoning/QA | All | No |
336 |
337 |
338 |
339 | ## 🧠 Memory
340 |
341 |
342 | | Github Repo | 🌟 Stars | Date | Org | Paper Link | RL Framework |
343 | | :----: | :----: | :----: | :----: | :----: | :----: |
344 | | [MEM1](https://github.com/MIT-MI/MEM1) |  | 2025.7 | MIT | [Paper](https://arxiv.org/abs/2506.15841) | veRL (based on Search-R1) |
345 | | [Memento](https://github.com/Agent-on-the-Fly/Memento) |  | 2025.6 | UCL, Huawei | [Paper](https://arxiv.org/abs/2508.16153) | Custom |
346 | | [MemAgent](https://github.com/BytedTsinghua-SIA/MemAgent) |  | 2025.6 | Bytedance, Tsinghua-SIA | [Paper](https://arxiv.org/abs/2507.02259) | veRL |
347 |
348 |
349 | 📋 Click to view technical details
350 |
351 | | Github Repo | RL Algorithm | Single/Multi Agent | Outcome/Process Reward | Single/Multi Turn | Task | Reward Type | Tool usage |
352 | | :----: | :----: | :----: | :----: | :----: | :----: | :----: | :----: |
353 | | [MEM1](https://github.com/MIT-MI/MEM1) | PPO/GRPO | Single | Outcome | Multi | WebShop/GSM8K/QA | Rule/Model | Yes |
354 | | [Memento](https://github.com/Agent-on-the-Fly/Memento) | soft Q-Learning | Single | Outcome | Multi | Research/QA/Code/Web | External/Rule | Yes |
355 | | [MemAgent](https://github.com/BytedTsinghua-SIA/MemAgent) | PPO, GRPO, DPO | Multi | Outcome | Multi | Long-context QA | Rule/Model/External | Yes |
356 |
357 |
358 |
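The tables classify repos by Outcome vs. Process reward. A toy sketch of the distinction — one scalar for the whole trajectory versus one score per intermediate step (both functions and the trajectory are hypothetical, not from any listed repo):

```python
def outcome_reward(final_answer, gold):
    """Rule-based outcome reward: a single scalar assigned only
    at the end of the trajectory (exact match here)."""
    return 1.0 if final_answer.strip() == gold.strip() else 0.0

def process_rewards(steps, step_checker):
    """Process reward: one scalar per intermediate step; step_checker
    stands in for a verifier model or rule-based step judge."""
    return [step_checker(s) for s in steps]

# A trajectory whose final answer is right but whose middle step is flagged:
traj = ["parse question", "call wrong tool", "answer: 42"]
r_out = outcome_reward("42", "42")
r_proc = process_rewards(traj, lambda s: 0.0 if "wrong" in s else 1.0)
```

Outcome rewards are cheap and hard to hack but sparse; process rewards densify the signal at the cost of needing a trustworthy per-step judge, which is why many repos above mix both ("Both" in the table).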
359 | ## 🦾 Embodied
360 |
361 |
362 | | Github Repo | 🌟 Stars | Date | Org | Paper Link | RL Framework |
363 | | :----: | :----: | :----: | :----: | :----: | :----: |
364 | | [Embodied-R1](https://github.com/pickxiguapi/Embodied-R1) |  | 2025.6 | Tianjin University | [Paper](http://arxiv.org/abs/2508.13998) | veRL |
365 | | [STeCa](https://github.com/WangHanLinHenry/STeCa) |  | 2025.2 | The Hong Kong Polytechnic University | [Paper](https://arxiv.org/abs/2502.14276) | FastChat/TRL |
366 |
367 |
368 |
369 |
370 | 📋 Click to view technical details
371 |
372 | | Github Repo | RL Algorithm | Single/Multi Agent | Outcome/Process Reward | Single/Multi Turn | Task | Reward Type | Tool usage |
373 | | :----: | :----: | :----: | :----: | :----: | :----: | :----: | :----: |
374 | | [Embodied-R1](https://github.com/pickxiguapi/Embodied-R1) | GRPO | Single | Outcome | Single | Grounding/Waypoint | Rule | No |
375 | | [STeCa](https://github.com/WangHanLinHenry/STeCa) | DPO (RFT) | Single | Both | Multi | Embodied/Household | Rule/MC | Environment Actions |
376 |
377 |
378 |
379 |
380 | ## 🏥 Biomedical
381 |
382 |
383 | | Github Repo | 🌟 Stars | Date | Org | Paper Link | RL Framework |
384 | | :----: | :----: | :----: | :----: | :----: | :----: |
385 | | [MMedAgent-RL](https://github.com/JanerhYang/MMedAgent-RL) |  | 2025.8 | Unknown | [Paper](https://arxiv.org/abs/2506.00555) | Unknown |
386 | | [DoctorAgent-RL](https://github.com/JarvisUSTC/DoctorAgent-RL) |  | 2025.5 | UCAS/CAS/USTC | [Paper](https://arxiv.org/pdf/2505.19630) | RAGEN |
387 | | [Biomni](https://github.com/snap-stanford/Biomni) |  | 2025.3 | Stanford University (SNAP) | [Paper](https://www.biorxiv.org/content/10.1101/2025.05.30.656746v1) | Custom |
388 |
389 |
390 |
391 | 📋 Click to view technical details
392 |
393 | | Github Repo | RL Algorithm | Single/Multi Agent | Outcome/Process Reward | Single/Multi Turn | Task | Reward Type | Tool usage |
394 | | :----: | :----: | :----: | :----: | :----: | :----: | :----: | :----: |
395 | | [MMedAgent-RL](https://github.com/JanerhYang/MMedAgent-RL) | Unknown | Multi | Unknown | Unknown | Unknown | Unknown | Unknown |
396 | | [DoctorAgent-RL](https://github.com/JarvisUSTC/DoctorAgent-RL) | GRPO | Multi | Both | Multi | Consultation/Diagnosis | Model/Rule | No |
397 | | [Biomni](https://github.com/snap-stanford/Biomni) | TBD | Single | TBD | Single | scRNAseq/CRISPR/ADMET/Knowledge | TBD | Yes |
398 |
399 |
400 |
401 |
402 |
403 | ## ⛰️ Environment
404 |
405 | | Github Repo | 🌟 Stars | Date | Org | Task |
406 | | :----: | :----: | :----: | :----: | :----: |
407 | | [LoCoBench-Agent](https://github.com/SalesforceAIResearch/LoCoBench-Agent) |  | 2025.11 | Salesforce AI Research | SWE |
408 | | [Simia-Agent-Training](https://github.com/microsoft/Simia-Agent-Training) |  | 2025.10 | Microsoft | ToolUse/API |
409 | | [PaperArena](https://github.com/Melmaphother/PaperArena) |  | 2025.9 | University of Science and Technology of China | ScientificLiteratureQA |
410 | | [enterprise-deep-research](https://github.com/SalesforceAIResearch/enterprise-deep-research) |  | 2025.9 | Salesforce AI Research | DeepResearch |
411 | | [CompassVerifier](https://github.com/open-compass/CompassVerifier) |  | 2025.7 | Shanghai AI Lab | Knowledge/Math/Science/GeneralReasoning |
412 | | [Mind2Web-2](https://github.com/OSU-NLP-Group/Mind2Web-2) |  | 2025.6 | Ohio State University | Web |
413 | | [gem](https://github.com/axon-rl/gem) |  | 2025.5 | Sea AI Lab | Math/Code/Game/QA |
414 | | [MLE-Dojo](https://github.com/MLE-Dojo/MLE-Dojo) |  | 2025.5 | GIT, Stanford | MLE |
415 | | [atropos](https://github.com/NousResearch/atropos) |  | 2025.4 | Nous Research | Game/Code/Tool |
416 | | [InternBootcamp](https://github.com/InternLM/InternBootcamp) |  | 2025.4 | InternLM | Coding/QA/Game |
417 | | [loong](https://github.com/camel-ai/loong) |  | 2025.3 | CAMEL-AI.org | RLVR |
418 | | [DataSciBench](https://github.com/THUDM/DataSciBench) |  | 2025.2 | Tsinghua | Data Analysis |
419 | | [reasoning-gym](https://github.com/open-thought/reasoning-gym) |  | 2025.1 | open-thought | Math/Game |
420 | | [llmgym](https://github.com/tensorzero/llmgym) |  | 2025.1 | tensorzero | TextGame/Tool |
421 | | [debug-gym](https://github.com/microsoft/debug-gym) |  | 2024.11 | Microsoft Research | Debugging/Game/Code |
422 | | [gym-llm](https://github.com/rsanchezmo/gym-llm) |  | 2024.8 | Rodrigo Sánchez Molina | Control/Game |
423 | | [AgentGym](https://github.com/WooooDyy/AgentGym) |  | 2024.6 | Fudan | Web/Game |
424 | | [tau-bench](https://github.com/sierra-research/tau-bench) |  | 2024.6 | Sierra | Tool |
425 | | [appworld](https://github.com/StonyBrookNLP/appworld) |  | 2024.6 | Stony Brook University | Phone Use |
426 | | [android_world](https://github.com/google-research/android_world) |  | 2024.5 | Google Research | Phone Use |
427 | | [TheAgentCompany](https://github.com/TheAgentCompany/TheAgentCompany) |  | 2024.3 | CMU, Duke | Coding |
428 | | [LlamaGym](https://github.com/KhoomeiK/LlamaGym) |  | 2024.3 | Rohan Pandey | Game |
429 | | [visualwebarena](https://github.com/web-arena-x/visualwebarena) |  | 2024.1 | CMU | Web |
430 | | [LMRL-Gym](https://github.com/abdulhaim/LMRL-Gym) |  | 2023.12 | UC Berkeley | Game |
431 | | [OSWorld](https://github.com/xlang-ai/OSWorld) |  | 2023.10 | HKU, CMU, Salesforce, Waterloo | Computer Use |
432 | | [webarena](https://github.com/web-arena-x/webarena) |  | 2023.7 | CMU | Web |
433 | | [AgentBench](https://github.com/THUDM/AgentBench) |  | 2023.7 | Tsinghua University | Game/Web/QA/Tool |
434 | | [WebShop](https://github.com/princeton-nlp/WebShop) |  | 2022.7 | Princeton-NLP | Web |
435 | | [ScienceWorld](https://github.com/allenai/ScienceWorld) |  | 2022.3 | AllenAI | TextGame/ScienceQA |
436 | | [alfworld](https://github.com/alfworld/alfworld) |  | 2020.10 | Microsoft, CMU, UW | Embodied |
437 | | [factorio-learning-environment](https://github.com/JackHopkins/factorio-learning-environment) |  | 2021.6 | JackHopkins | Game |
438 | | [jericho](https://github.com/microsoft/jericho) |  | 2018.10 | Microsoft, GIT | TextGame |
439 | | [TextWorld](https://github.com/microsoft/TextWorld) |  | 2018.6 | Microsoft Research | TextGame |
440 |
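Most of the environments above expose a gym-style reset/step interface that RL frameworks roll trajectories out against. A toy, purely illustrative sketch of that loop (`EchoTextEnv` and the hard-coded policy are hypothetical, not taken from any listed environment):

```python
class EchoTextEnv:
    """Toy stand-in for the text environments above (WebShop, TextWorld, ...):
    the agent must emit the hidden target command within max_turns."""
    def __init__(self, target="open mailbox", max_turns=3):
        self.target, self.max_turns = target, max_turns

    def reset(self):
        self.turn = 0
        return "You see a small mailbox."  # initial observation

    def step(self, action):
        self.turn += 1
        done = action == self.target or self.turn >= self.max_turns
        reward = 1.0 if action == self.target else 0.0  # sparse outcome reward
        return f"Turn {self.turn}: nothing happens.", reward, done

def rollout(env, policy):
    """Collect one trajectory: (action, reward) pairs plus the return."""
    obs, total, done, traj = env.reset(), 0.0, False, []
    while not done:
        action = policy(obs)
        obs, reward, done = env.step(action)
        traj.append((action, reward))
        total += reward
    return traj, total

traj, ret = rollout(EchoTextEnv(), policy=lambda obs: "open mailbox")
```

Real environments differ in observation richness and tool interfaces, but the multi-turn observe/act/reward contract sketched here is what makes them pluggable into the RL frameworks listed in the tables.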
441 | ## Under Review/Waiting for Open Source
442 | - [JoyAgents-R1: Joint Evolution Dynamics for Versatile Multi-LLM Agents with Reinforcement Learning](https://arxiv.org/abs/2506.19846)
443 | - [Shop-R1: Rewarding LLMs to Simulate Human Behavior in Online Shopping via Reinforcement Learning](https://arxiv.org/abs/2507.17842)
444 | - [Training Long-Context, Multi-Turn Software Engineering Agents with Reinforcement Learning](https://arxiv.org/abs/2508.03501)
445 | - [Acting Less is Reasoning More! Teaching Model to Act Efficiently](https://arxiv.org/abs/2504.14870)
446 | - [Agentic Reasoning and Tool Integration for LLMs via Reinforcement Learning](https://arxiv.org/abs/2505.01441)
447 | - [ComputerRL: Scaling End-to-End Online Reinforcement Learning for Computer Use Agents](https://arxiv.org/abs/2508.14040)
448 | - [Atom-Searcher: Enhancing Agentic Deep Research via Fine-Grained Atomic Thought Reward](https://github.com/antgroup/Research-Venus)
449 | - [MUA-RL: Multi-Turn User-Interacting Agent Reinforcement Learning for Agentic Tool Use](https://github.com/zzwkk/MUA-RL)
450 | - [Understanding Tool-Integrated Reasoning](https://zhongwenxu.notion.site/Understanding-Tool-Integrated-Reasoning-2551c4e140e3805489fadcc802a1ea83)
451 | - [Memory-R1: Enhancing Large Language Model Agents to Manage and Utilize Memories via Reinforcement Learning](https://arxiv.org/abs/2508.19828)
452 | - [Encouraging Good Processes Without the Need for Good Answers: Reinforcement Learning for LLM Agent Planning](https://arxiv.org/abs/2508.19598)
453 | - [SFR-DeepResearch: Towards Effective Reinforcement Learning for Autonomously Reasoning Single Agents](https://arxiv.org/abs/2509.06283)
454 | - [WebExplorer: Explore and Evolve for Training Long-Horizon Web Agents](https://arxiv.org/abs/2509.06501)
455 | - [EnvX: Agentize Everything with Agentic AI](https://arxiv.org/abs/2509.08088)
456 | - [UI-TARS-2 Technical Report: Advancing GUI Agent with Multi-Turn Reinforcement Learning](https://arxiv.org/abs/2509.02544)
457 | - [UI-Venus Technical Report: Building High-performance UI Agents with RFT](https://arxiv.org/abs/2508.10833)
458 | - [Agent2: An Agent-Generates-Agent Framework for Reinforcement Learning Automation](https://arxiv.org/abs/2509.13368)
459 | - [Tool-R1: Sample-Efficient Reinforcement Learning for Agentic Tool Use](https://arxiv.org/abs/2509.12867v1)
460 | - [Adversarial Reinforcement Learning for Large Language Model Agent Safety](https://arxiv.org/abs/2510.05442)
461 | - [Learning to Refine: An Agentic RL Approach for Iterative SPARQL Query Construction](https://www.arxiv.org/abs/2511.11770)
462 | - [InfoFlow: Reinforcing Search Agent Via Reward Density Optimization](https://arxiv.org/abs/2510.26575)
463 |
464 | ## Star History
465 |
466 | [](https://www.star-history.com/#thinkwee/agentsMeetRL&Date)
467 |
468 |
469 | ## Citation
470 |
471 | If you find this repository useful, please consider citing it:
472 |
473 | ```bibtex
474 | @misc{agentsMeetRL,
475 |   title={When LLM Agents Meet Reinforcement Learning: A Comprehensive Survey},
476 |   author={AgentsMeetRL Contributors},
477 |   year={2025},
478 |   url={https://github.com/thinkwee/agentsMeetRL}
479 | }
480 | ```
481 |
482 | ---
483 |
484 |
485 |
Made with ❤️ by the AgentsMeetRL community
486 |
487 |
--------------------------------------------------------------------------------