# Awesome Multi-Agent Papers

A compilation of the best multi-agent papers

🐦 Twitter • 📢 Discord • Swarms Platform • 📙 Framework
[![Join our Discord](https://img.shields.io/badge/Discord-Join%20our%20server-5865F2?style=for-the-badge&logo=discord&logoColor=white)](https://discord.gg/jM3Z6M9uMq) [![Subscribe on YouTube](https://img.shields.io/badge/YouTube-Subscribe-red?style=for-the-badge&logo=youtube&logoColor=white)](https://www.youtube.com/@kyegomez3242) [![Connect on LinkedIn](https://img.shields.io/badge/LinkedIn-Connect-blue?style=for-the-badge&logo=linkedin&logoColor=white)](https://www.linkedin.com/in/kye-g-38759a207/) [![Follow on X.com](https://img.shields.io/badge/X.com-Follow-1DA1F2?style=for-the-badge&logo=x&logoColor=white)](https://x.com/kyegomezb)

[![Join the Agora discord](https://img.shields.io/discord/1110910277110743103?label=Discord&logo=discord&logoColor=white&style=plastic&color=d7b023)![Share on Twitter](https://img.shields.io/twitter/url/https/twitter.com/cloudposse.svg?style=social&label=Share%20%40kyegomez/swarms)](https://twitter.com/intent/tweet?text=Check%20out%20this%20amazing%20AI%20project:%20&url=https%3A%2F%2Fgithub.com%2Fkyegomez%2Fswarms) [![Share on Facebook](https://img.shields.io/badge/Share-%20facebook-blue)](https://www.facebook.com/sharer/sharer.php?u=https%3A%2F%2Fgithub.com%2Fkyegomez%2Fswarms) [![Share on LinkedIn](https://img.shields.io/badge/Share-%20linkedin-blue)](https://www.linkedin.com/shareArticle?mini=true&url=https%3A%2F%2Fgithub.com%2Fkyegomez%2Fswarms&title=&summary=&source=)

[![Share on Reddit](https://img.shields.io/badge/-Share%20on%20Reddit-orange)](https://www.reddit.com/submit?url=https%3A%2F%2Fgithub.com%2Fkyegomez%2Fswarms&title=Swarms%20-%20the%20future%20of%20AI) [![Share on Hacker News](https://img.shields.io/badge/-Share%20on%20Hacker%20News-orange)](https://news.ycombinator.com/submitlink?u=https%3A%2F%2Fgithub.com%2Fkyegomez%2Fswarms&t=Swarms%20-%20the%20future%20of%20AI) [![Share on Pinterest](https://img.shields.io/badge/-Share%20on%20Pinterest-red)](https://pinterest.com/pin/create/button/?url=https%3A%2F%2Fgithub.com%2Fkyegomez%2Fswarms&media=https%3A%2F%2Fexample.com%2Fimage.jpg&description=Swarms%20-%20the%20future%20of%20AI) [![Share on WhatsApp](https://img.shields.io/badge/-Share%20on%20WhatsApp-green)](https://api.whatsapp.com/send?text=Check%20out%20Swarms%20-%20the%20future%20of%20AI%20%23swarms%20%23AI%0A%0Ahttps%3A%2F%2Fgithub.com%2Fkyegomez%2Fswarms)

A compilation of the best multi-agent papers by the [Swarms](https://github.com/kyegomez/swarms) Team. Our mission is to democratize multi-agent systems to automate the world economy with agents and usher in a post-scarcity human civilization.
[Join our community now!](https://discord.com/servers/agora-999382051935506503)

## Format

- [Paper Name]([PDF PAPER LINK]) bibtex short name
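The `bibtex short name` in the entry format above can be generated from a paper's arXiv ID. The repository ships `src/arxiv-bibtex.py` and `src/arxiv_bibtex.bib` for this purpose; the snippet below is only an illustrative, stand-alone sketch of the idea (not that script), using the public arXiv Atom API and the standard library. The `surname + year + first title word` key scheme is an assumption for illustration.

```python
"""Illustrative sketch: build a BibTeX entry (and short key) for an arXiv paper.

Stand-alone example using only the public arXiv Atom API and the standard
library; it is not the repository's src/arxiv-bibtex.py.
"""
import re
import urllib.request
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"  # Atom XML namespace used by the arXiv API


def arxiv_bibtex(arxiv_id: str) -> str:
    # Query the public arXiv export API for a single paper.
    url = f"http://export.arxiv.org/api/query?id_list={arxiv_id}"
    with urllib.request.urlopen(url) as resp:
        feed = ET.fromstring(resp.read())

    entry = feed.find(f"{ATOM}entry")
    title = re.sub(r"\s+", " ", entry.findtext(f"{ATOM}title").strip())
    authors = [a.findtext(f"{ATOM}name") for a in entry.findall(f"{ATOM}author")]
    year = entry.findtext(f"{ATOM}published")[:4]

    # Assumed key scheme: first-author surname + year + first word of the title.
    surname = authors[0].split()[-1].lower()
    first_word = re.sub(r"\W", "", title.split()[0]).lower()
    key = f"{surname}{year}{first_word}"

    return (
        f"@misc{{{key},\n"
        f"  title={{{title}}},\n"
        f"  author={{{' and '.join(authors)}}},\n"
        f"  year={{{year}}},\n"
        f"  eprint={{{arxiv_id}}},\n"
        f"  archivePrefix={{arXiv}},\n"
        f"  url={{https://arxiv.org/abs/{arxiv_id}}},\n"
        f"}}"
    )


if __name__ == "__main__":
    # e.g. the Mixture-of-Agents paper listed below
    print(arxiv_bibtex("2406.04692"))
```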
----

## Multi-Agent Collaboration & System Design

- **[K-Level Reasoning with Large Language Models](https://arxiv.org/pdf/2402.01521)**
- **[More Agents is All You Need](https://arxiv.org/pdf/2402.05120.pdf)**
- **[LongAgent: Scaling Language Models to 128k Context through Multi-Agent Collaboration](https://arxiv.org/pdf/2402.11550)**
- **[AgentScope: A Flexible yet Robust Multi-Agent Platform](https://arxiv.org/pdf/2402.14034.pdf)**
- **[Learning to Decode Collaboratively with Multiple Language Models](https://arxiv.org/abs/2403.03870)**
- **[AIOS: LLM Agent Operating System](https://arxiv.org/pdf/2403.16971.pdf)**
- **[AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation](https://arxiv.org/abs/2308.08155)**
- **[Chain of Agents: Large Language Models Collaborating on Long-Context Tasks](https://arxiv.org/abs/2406.02818)**
- **[Mixture-of-Agents Enhances Large Language Model Capabilities](https://arxiv.org/abs/2406.04692)**
- **[EvoAgent: Towards Automatic Multi-Agent Generation via Evolutionary Algorithms](https://arxiv.org/pdf/2406.14228)**
- **[Internet of Agents: Weaving a Web of Heterogeneous Agents for Collaborative Intelligence](https://arxiv.org/abs/2407.07061)**
- **[Large Language Model Cascades with Mixture of Thoughts Representations for Cost-efficient Reasoning](https://arxiv.org/abs/2310.03094)**
- **[Optimizing Collaboration of LLM based Agents](https://arxiv.org/pdf/2408.13406)**
- **[LLM-Agent-UMF: LLM-based Agent Unified Modeling Framework](https://arxiv.org/pdf/2409.11393)**
- **[Optima: Optimizing Effectiveness and Efficiency for LLM-Based Multi-Agent System](https://huggingface.co/papers/2410.08115)**

## Multi-Agent Frameworks & Benchmarks

- **[AgentGym: Evolving Large Language Model-based Agents across Diverse Environments](https://huggingface.co/papers/2406.04151)**
- **[Very Large-Scale Multi-Agent Simulation in AgentScope](https://arxiv.org/abs/2407.17789)**
- **[AgentClinic: A Multimodal Agent Benchmark for AI in Clinical Environments](https://arxiv.org/pdf/2405.07960)**
- **[MultiAgentBench: Evaluating the Collaboration and Competition of LLM Agents](https://arxiv.org/abs/2503.01935)**
- **[TheAgentCompany: Benchmarking LLM Agents on Consequential Real World Tasks](https://arxiv.org/abs/2412.14161)**
- **[BoxingGym: Benchmarking Progress in Automated Experimental Design](https://arxiv.org/abs/2501.01540)**

## Application-Specific Multi-Agent Systems

### Software Engineering

- **[Automated Unit Test Improvement using Large Language Models](https://arxiv.org/pdf/2402.09171.pdf)**
- **[Experiential Co-Learning of Software-Developing Agents](https://arxiv.org/abs/2312.17025)**
- **[ChatDev: Communicative Agents for Software Development](https://arxiv.org/pdf/2307.07924.pdf)**
- **[MAGIS: LLM-Based Multi-Agent Framework for GitHub Issue Resolution](https://arxiv.org/abs/2403.17927)**
- **[CodeR: Issue Resolving with Multi-Agent and Task Graphs](https://arxiv.org/pdf/2406.01304)**
- **[From LLMs to LLM-based Agents for Software Engineering: A Survey](https://arxiv.org/abs/2408.02479)**
- **[CodexGraph: Bridging Large Language Models and Code Repositories via Code Graph Databases](https://www.arxiv.org/abs/2408.03910)**
- **[Diversity Empowers Intelligence: Integrating Expertise of Software Engineering Agents](https://arxiv.org/abs/2408.07060)**
- **[Large Language Model-Based Agents for Software Engineering: A Survey](https://arxiv.org/pdf/2409.02977)**
- **[AutoSafeCoder: A Multi-Agent Framework for Securing LLM Code Generation](https://arxiv.org/pdf/2409.10737)**
- **[RGD: Multi-LLM Based Agent Debugger via Refinement and Generation Guidance](https://arxiv.org/pdf/2410.01242)**

### Healthcare & Medical

- **[Agent Hospital: A Simulacrum of Hospital with Evolvable Medical Agents](https://arxiv.org/abs/2405.02957)**
- **[MEDCO: Medical Education Copilots Based on A Multi-Agent Framework](https://arxiv.org/pdf/2408.12496)**
- **[Multi Agent based Medical Assistant for Edge Devices](https://arxiv.org/pdf/2503.05397)**
- **[Can AI Agents Design and Implement Drug Discovery Pipelines?](https://arxiv.org/abs/2504.19912)**

### Data & ML

- **[LAMBDA: A Large Model Based Data Agent](https://arxiv.org/abs/2407.17535)**
- **[Agentic Retrieval-Augmented Generation for Time Series Analysis](https://arxiv.org/abs/2408.14484)**
- **[Multi-Agent Collaborative Data Selection for Efficient LLM Pretraining](https://huggingface.co/papers/2410.08102)**
- **[AutoML-Agent: A Multi-Agent LLM Framework for Full-Pipeline AutoML](https://arxiv.org/pdf/2410.02958)**
- **[AutoKaggle: A Multi-Agent Framework for Autonomous Data Science Competitions](https://arxiv.org/abs/2410.20424)**
- **[DataLab: A Unified Platform for LLM-Powered Business Intelligence](https://arxiv.org/abs/2412.02205)**

### Security

- **[BreachSeek: A Multi-Agent Automated Penetration Tester](https://arxiv.org/abs/2409.03789)**

### Multimodal

- **[Mora: Enabling Generalist Video Generation via A Multi-Agent Framework](https://arxiv.org/pdf/2403.13248)**
- **[Mobile-Agent-v2: Mobile Device Operation Assistant with Effective Navigation](https://huggingface.co/papers/2406.01014)**
- **[Optimus-1: Hybrid Multimodal Memory Empowered Agents Excel in Long-Horizon Tasks](https://arxiv.org/abs/2408.03615)**
- **[MP5: A Multi-modal Open-ended Embodied System in Minecraft via Active Perception](https://arxiv.org/abs/2312.07472)**
- **[PC-Agent: A Hierarchical Multi-Agent Framework for Complex Task Automation on PC](https://arxiv.org/abs/2502.14282)**

### Other Domains

- **[Human-level play in Diplomacy by combining language models with strategic reasoning](https://www.science.org/doi/10.1126/science.ade9097)**
- **[CulturePark: Boosting Cross-cultural Understanding in Large Language Models](https://arxiv.org/abs/2405.15145)**
- **[Beyond Human Translation: Multi-Agent Collaboration for Translating Ultra-Long Literary Texts](https://arxiv.org/abs/2405.11804)**
- **[FanCric: Multi-Agentic Framework for Crafting Fantasy 11 Cricket Teams](https://arxiv.org/pdf/2410.01307)**
- **[Can Large Language Models Grasp Legal Theories? Enhance Legal Reasoning with Multi-Agent Collaboration](https://arxiv.org/pdf/2410.02507)**

## Evaluation & Model Improvement

- **[Wisdom of the Silicon Crowd: LLM Ensemble Prediction Capabilities Match Human Crowd Accuracy](https://arxiv.org/abs/2402.19379)**
- **[Chatbot Arena: An Open Platform for Evaluating LLMs by Human Preference](https://arxiv.org/abs/2403.04132)**
- **[Are More LLM Calls All You Need? Towards Scaling Laws of Compound Inference Systems](https://arxiv.org/pdf/2403.02419.pdf)**
- **[Evolutionary Optimization of Model Merging Recipes](https://arxiv.org/html/2403.13187v1)**
- **[Replacing Judges with Juries: Evaluating LLM Generations with a Panel of Diverse Models](https://arxiv.org/abs/2404.18796)**
- **[Constitutional AI: Harmlessness from AI Feedback](https://arxiv.org/abs/2212.08073)**
- **[On scalable oversight with weak LLMs judging strong LLMs](https://arxiv.org/abs/2407.04622)**
- **[ChatEval: Towards Better LLM-based Evaluators through Multi-Agent Debate](https://arxiv.org/abs/2308.07201)**
- **[RouteLLM: An Open-Source Framework for Cost-Effective LLM Routing](https://lmsys.org/blog/2024-07-01-routellm/)**
- **[Agent-as-a-Judge: Evaluate Agents with Agents](https://arxiv.org/abs/2410.10934v1)**
- **[Adversarial Multi-Agent Evaluation of Large Language Models through Iterative Debates](https://arxiv.org/pdf/2410.04663)**
- **[MALT: Improving Reasoning with Multi-Agent LLM Training](https://arxiv.org/abs/2412.01928)**
- **[Why Do Multi-Agent LLM Systems Fail?](https://arxiv.org/pdf/2503.13657)**
- **[Improving LLM Reasoning with Multi-Agent Tree-of-Thought Validator Agent](https://arxiv.org/pdf/2409.11527)**

## Social Simulation & Agent Societies

- **[Generative Agents: Interactive Simulacra of Human Behavior](https://arxiv.org/abs/2304.03442)**
- **[SOTOPIA-π: Interactive Learning of Socially Intelligent Language Agents](https://arxiv.org/pdf/2403.08715.pdf)**
- **[Scaling Instructable Agents Across Many Simulated Worlds](https://huggingface.co/papers/2404.10179)**
- **[Scaling Synthetic Data Creation with 1,000,000,000 Personas](https://huggingface.co/papers/2406.20094)**
- **[Cooperate or Collapse: Emergence of Sustainable Cooperation in a Society of LLM Agents](https://arxiv.org/abs/2404.16698)**
- **[From Text to Life: On the Reciprocal Relationship between Artificial Life and Large Language Models](https://arxiv.org/abs/2407.09502)**
- **[Mindstorms in Natural Language-Based Societies of Mind](https://arxiv.org/abs/2305.17066)**
- **[The AI Scientist: The world's first AI system for automating scientific research](https://arxiv.org/abs/2408.06292)**
- **[Agents' Room: Narrative Generation through Multi-step Collaboration](https://arxiv.org/pdf/2410.02603)**
- **[GenSim: A General Social Simulation Platform with Large Language Model based Agents](https://arxiv.org/pdf/2410.04360)**
- **[Large Language Models can Achieve Social Balance](https://arxiv.org/pdf/2410.04054)**
- **[Cultural Evolution of Cooperation among LLM Agents](https://arxiv.org/pdf/2412.10270)**
- **[SDPO: Segment-Level Direct Preference Optimization for Social Agents](https://arxiv.org/abs/2501.01821)**
- **[AgentSociety: Large-Scale Simulation of LLM-Driven Generative Agents](https://arxiv.org/abs/2502.08691v1)**
- **[OASIS: Open Agent Social Interaction Simulations with One Million Agents](https://arxiv.org/abs/2411.11581)**

## Workflow, Architecture & Agent Design

- **[AllHands: Ask Me Anything on Large-scale Verbatim Feedback via Large Language Models](https://huggingface.co/papers/2403.15157)**
- **[AgentInstruct: Toward Generative Teaching with Agentic Flows](https://arxiv.org/abs/2407.03502)**
- **[SciAgents: Automating scientific discovery through multi-agent intelligent graph reasoning](https://arxiv.org/pdf/2409.05556)**
- **[Minstrel: Structural Prompt Generation with Multi-Agents Coordination for Non-AI Experts](https://huggingface.co/papers/2409.13449)**
- **[AFlow: Automating Agentic Workflow Generation](https://arxiv.org/pdf/2410.10762)**
- **[Agents Thinking Fast and Slow: A Talker-Reasoner Architecture](https://arxiv.org/pdf/2410.08328)**
- **[DynaSaur: Large Language Agents Beyond Predefined Actions](https://arxiv.org/pdf/2411.01747)**
- **[LLMs as Method Actors: A Model for Prompt Engineering and Architecture](https://arxiv.org/abs/2411.05778v1)**
- **[Proposer-Agent-Evaluator (PAE): Autonomous Skill Discovery For Foundation Model Internet Agents](https://arxiv.org/abs/2412.13194)**
- **[Automated Design of Agentic Systems](https://arxiv.org/abs/2408.08435)**
- **[The Fellowship of the LLMs: Multi-Agent Workflows for Synthetic Preference Optimization](https://arxiv.org/pdf/2408.08688)**
- **[Multi-agent Architecture Search via Agentic Supernet](https://arxiv.org/pdf/2502.04180)**
- **[Talk Structurally, Act Hierarchically: A Collaborative Framework for LLM Multi-Agent Systems](https://arxiv.org/abs/2502.11098)**
- **[When One LLM Drools, Multi-LLM Collaboration Rules](https://arxiv.org/abs/2502.04506)**
- **[Enhancing Reasoning with Collaboration and Memory](https://arxiv.org/pdf/2503.05944)**
- **[Diffusion Augmented Agents: A Framework for Efficient Exploration and Transfer Learning](https://huggingface.co/papers/2407.20798)**
- **[Model Swarms: Collaborative Search to Adapt LLM Experts via Swarm Intelligence](https://arxiv.org/abs/2410.10934v1)**
- **[MDocAgent: A Multi-Modal Multi-Agent Framework for Document Understanding](https://huggingface.co/papers/2503.13964)**

# Papers

- **Generative Agents: Interactive Simulacra of Human Behavior**
  J. Park, Joseph C. O'Brien, Carrie J. Cai, M. Morris, Percy Liang, Michael S. Bernstein
  ACM Symposium on User Interface Software and Technology 2023
  [open paper page](http://dl.acm.org/citation.cfm?id=3606763)
  <details>
  <summary>Abstract</summary>
  Believable proxies of human behavior can empower interactive applications ranging from immersive environments to rehearsal spaces for interpersonal communication to prototyping tools. In this paper, we introduce generative agents: computational software agents that simulate believable human behavior. Generative agents wake up, cook breakfast, and head to work; artists paint, while authors write; they form opinions, notice each other, and initiate conversations; they remember and reflect on days past as they plan the next day. To enable generative agents, we describe an architecture that extends a large language model to store a complete record of the agent’s experiences using natural language, synthesize those memories over time into higher-level reflections, and retrieve them dynamically to plan behavior. We instantiate generative agents to populate an interactive sandbox environment inspired by The Sims, where end users can interact with a small town of twenty-five agents using natural language. In an evaluation, these generative agents produce believable individual and emergent social behaviors. For example, starting with only a single user-specified notion that one agent wants to throw a Valentine’s Day party, the agents autonomously spread invitations to the party over the next two days, make new acquaintances, ask each other out on dates to the party, and coordinate to show up for the party together at the right time. We demonstrate through ablation that the components of our agent architecture—observation, planning, and reflection—each contribute critically to the believability of agent behavior. By fusing large language models with computational interactive agents, this work introduces architectural and interaction patterns for enabling believable simulations of human behavior.
  </details>

- **CAMEL: Communicative Agents for "Mind" Exploration of Large Language Model Society**
  G. Li, Hasan Hammoud, Hani Itani, Dmitrii Khizbullin, Bernard Ghanem
  Neural Information Processing Systems 2023
  [open paper page](https://arxiv.org/pdf/2303.17760.pdf)
  <details>
  <summary>Abstract</summary>
  The rapid advancement of chat-based language models has led to remarkable progress in complex task-solving. However, their success heavily relies on human input to guide the conversation, which can be challenging and time-consuming. This paper explores the potential of building scalable techniques to facilitate autonomous cooperation among communicative agents, and provides insight into their "cognitive" processes. To address the challenges of achieving autonomous cooperation, we propose a novel communicative agent framework named role-playing. Our approach involves using inception prompting to guide chat agents toward task completion while maintaining consistency with human intentions. We showcase how role-playing can be used to generate conversational data for studying the behaviors and capabilities of a society of agents, providing a valuable resource for investigating conversational language models. In particular, we conduct comprehensive studies on instruction-following cooperation in multi-agent settings. Our contributions include introducing a novel communicative agent framework, offering a scalable approach for studying the cooperative behaviors and capabilities of multi-agent systems, and open-sourcing our library to support research on communicative agents and beyond: https://github.com/camel-ai/camel.
  </details>

- **MetaGPT: Meta Programming for Multi-Agent Collaborative Framework**
  Sirui Hong, Xiawu Zheng, Jonathan P. Chen, Yuheng Cheng, Ceyao Zhang, Zili Wang, Steven Ka Shing Yau, Z. Lin, Liyang Zhou, Chenyu Ran, Lingfeng Xiao, Chenglin Wu
  arXiv.org 2023
  [open paper page](https://doi.org/10.48550/arXiv.2308.00352)

- **ChatEval: Towards Better LLM-based Evaluators through Multi-Agent Debate**
  Chi-Min Chan, Weize Chen, Yusheng Su, Jianxuan Yu, Wei Xue, Shan Zhang, Jie Fu
  arXiv.org 2023
  [open paper page](https://arxiv.org/pdf/2308.07201.pdf)
  <details>
  <summary>Abstract</summary>
  Text evaluation has historically posed significant challenges, often demanding substantial labor and time cost. With the emergence of large language models (LLMs), researchers have explored LLMs' potential as alternatives for human evaluation. While these single-agent-based approaches show promise, experimental results suggest that further advancements are needed to bridge the gap between their current effectiveness and human-level evaluation quality. Recognizing that best practices of human evaluation processes often involve multiple human annotators collaborating in the evaluation, we resort to a multi-agent debate framework, moving beyond single-agent prompting strategies. The multi-agent-based approach enables a group of LLMs to synergize with an array of intelligent counterparts, harnessing their distinct capabilities and expertise to enhance efficiency and effectiveness in handling intricate tasks. In this paper, we construct a multi-agent referee team called ChatEval to autonomously discuss and evaluate the quality of generated responses from different models on open-ended questions and traditional natural language generation (NLG) tasks. Our analysis shows that ChatEval transcends mere textual scoring, offering a human-mimicking evaluation process for reliable assessments. Our code is available at https://github.com/chanchimin/ChatEval.
  </details>

- **API-Bank: A Comprehensive Benchmark for Tool-Augmented LLMs**
  Minghao Li, Feifan Song, Yu Bowen, Haiyang Yu, Zhoujun Li, Fei Huang, Yongbin Li
  Conference on Empirical Methods in Natural Language Processing 2023
  [open paper page](https://api.semanticscholar.org/CorpusId:258179056)
  <details>
  <summary>Abstract</summary>
  Recent research has demonstrated that Large Language Models (LLMs) can enhance their capabilities by utilizing external tools. However, three pivotal questions remain unanswered: (1) How effective are current LLMs in utilizing tools? (2) How can we enhance LLMs' ability to utilize tools? (3) What obstacles need to be overcome to leverage tools? To address these questions, we introduce API-Bank, a groundbreaking benchmark, specifically designed for tool-augmented LLMs. For the first question, we develop a runnable evaluation system consisting of 73 API tools. We annotate 314 tool-use dialogues with 753 API calls to assess the existing LLMs' capabilities in planning, retrieving, and calling APIs. For the second question, we construct a comprehensive training set containing 1,888 tool-use dialogues from 2,138 APIs spanning 1,000 distinct domains. Using this dataset, we train Lynx, a tool-augmented LLM initialized from Alpaca. Experimental results demonstrate that GPT-3.5 exhibits improved tool utilization compared to GPT-3, while GPT-4 excels in planning. However, there is still significant potential for further improvement. Moreover, Lynx surpasses Alpaca's tool utilization performance by more than 26 pts and approaches the effectiveness of GPT-3.5. Through error analysis, we highlight the key challenges for future research in this field to answer the third question.
  </details>

- **AutoAgents: A Framework for Automatic Agent Generation**
  Guangyao Chen, Siwei Dong, Yu Shu, Ge Zhang, Jaward Sesay, Börje F. Karlsson, Jie Fu, Yemin Shi
  International Joint Conference on Artificial Intelligence 2023
  [open paper page](https://api.semanticscholar.org/CorpusId:263310605)
  <details>
  <summary>Abstract</summary>
  Large language models (LLMs) have enabled remarkable advances in automated task-solving with multi-agent systems. However, most existing LLM-based multi-agent approaches rely on predefined agents to handle simple tasks, limiting the adaptability of multi-agent collaboration to different scenarios. Therefore, we introduce AutoAgents, an innovative framework that adaptively generates and coordinates multiple specialized agents to build an AI team according to different tasks. Specifically, AutoAgents couples the relationship between tasks and roles by dynamically generating multiple required agents based on task content and planning solutions for the current task based on the generated expert agents. Multiple specialized agents collaborate with each other to efficiently accomplish tasks. Concurrently, an observer role is incorporated into the framework to reflect on the designated plans and agents' responses and improve upon them. Our experiments on various benchmarks demonstrate that AutoAgents generates more coherent and accurate solutions than the existing multi-agent methods. This underscores the significance of assigning different roles to different tasks and of team cooperation, offering new perspectives for tackling complex tasks. The repository of this project is available at https://github.com/Link-AGI/AutoAgents.
  </details>

- **Scalable Multi-Robot Collaboration with Large Language Models: Centralized or Decentralized Systems?**
  Yongchao Chen, Jacob Arkin, Yang Zhang, Nicholas Roy, Chuchu Fan
  IEEE International Conference on Robotics and Automation 2023
  [open paper page](http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10610676)
  <details>
  <summary>Abstract</summary>
  A flurry of recent work has demonstrated that pre-trained large language models (LLMs) can be effective task planners for a variety of single-robot tasks. The planning performance of LLMs is significantly improved via prompting techniques, such as in-context learning or re-prompting with state feedback, placing new importance on the token budget for the context window. An under-explored but natural next direction is to investigate LLMs as multi-robot task planners. However, long-horizon, heterogeneous multi-robot planning introduces new challenges of coordination while also pushing up against the limits of context window length. It is therefore critical to find token-efficient LLM planning frameworks that are also able to reason about the complexities of multi-robot coordination. In this work, we compare the task success rate and token efficiency of four multi-agent communication frameworks (centralized, decentralized, and two hybrid) as applied to four coordination-dependent multi-agent 2D task scenarios for increasing numbers of agents. We find that a hybrid framework achieves better task success rates across all four tasks and scales better to more agents. We further demonstrate the hybrid frameworks in 3D simulations where the vision-to-text problem and dynamical errors are considered. See our project website for prompts, videos, and code.
  </details>

- **AutoDefense: Multi-Agent LLM Defense against Jailbreak Attacks**
  Yifan Zeng, Yiran Wu, Xiao Zhang, Huazheng Wang, Qingyun Wu
  arXiv.org 2024
  [open paper page](https://api.semanticscholar.org/CorpusId:268297202)
  <details>
  <summary>Abstract</summary>
  Despite extensive pre-training in moral alignment to prevent generating harmful information, large language models (LLMs) remain vulnerable to jailbreak attacks. In this paper, we propose AutoDefense, a multi-agent defense framework that filters harmful responses from LLMs. With the response-filtering mechanism, our framework is robust against different jailbreak attack prompts, and can be used to defend different victim models. AutoDefense assigns different roles to LLM agents and employs them to complete the defense task collaboratively. The division in tasks enhances the overall instruction-following of LLMs and enables the integration of other defense components as tools. With AutoDefense, small open-source LMs can serve as agents and defend larger models against jailbreak attacks. Our experiments show that AutoDefense can effectively defense against different jailbreak attacks, while maintaining the performance at normal user request. For example, we reduce the attack success rate on GPT-3.5 from 55.74% to 7.95% using LLaMA-2-13b with a 3-agent system. Our code and data are publicly available at https://github.com/XHMY/AutoDefense.
  </details>

- **Large Language Model based Multi-Agents: A Survey of Progress and Challenges**
  Taicheng Guo, Xiuying Chen, Yaqi Wang, Ruidi Chang, Shichao Pei, N. Chawla, Olaf Wiest, Xiangliang Zhang
  International Joint Conference on Artificial Intelligence 2024
  [open paper page](https://api.semanticscholar.org/CorpusId:267412980)
  <details>
  <summary>Abstract</summary>
  Large Language Models (LLMs) have achieved remarkable success across a wide array of tasks. Due to their notable capabilities in planning and reasoning, LLMs have been utilized as autonomous agents for the automatic execution of various tasks. Recently, LLM-based agent systems have rapidly evolved from single-agent planning or decision-making to operating as multi-agent systems, enhancing their ability in complex problem-solving and world simulation. To offer an overview of this dynamic field, we present this survey to offer an in-depth discussion on the essential aspects and challenges of LLM-based multi-agent (LLM-MA) systems. Our objective is to provide readers with an in-depth understanding of these key points: the domains and settings where LLM-MA systems operate or simulate; the profiling and communication methods of these agents; and the means by which these agents develop their skills. For those interested in delving into this field, we also summarize the commonly used datasets or benchmarks. To keep researchers updated on the latest studies, we maintain an open-source GitHub repository (github.com/taichengguo/LLM_MultiAgents_Survey_Papers), dedicated to outlining the research of LLM-MA research.
  </details>

- **Small LLMs Are Weak Tool Learners: A Multi-LLM Agent**
  Weizhou Shen, Chenliang Li, Hongzhan Chen, Ming Yan, Xiaojun Quan, Hehong Chen, Ji Zhang, Fei Huang
  Conference on Empirical Methods in Natural Language Processing 2024
  [open paper page](https://www.aclanthology.org/2024.emnlp-main.929.pdf)
  <details>
  <summary>Abstract</summary>
  Large Language Model (LLM) agents significantly extend the capabilities of standalone LLMs, empowering them to interact with external tools (e.g., APIs, functions) and complete various tasks in a self-directed fashion. The challenge of tool use demands that LLMs not only understand user queries and generate answers accurately but also excel in task planning, tool invocation, and result summarization. While traditional works focus on training a single LLM with all these capabilities, performance limitations become apparent, particularly with smaller models. To overcome these challenges, we propose a novel approach that decomposes the aforementioned capabilities into a planner, caller, and summarizer. Each component is implemented by a single LLM that focuses on a specific capability and collaborates with others to accomplish the task. This modular framework facilitates individual updates and the potential use of smaller LLMs for building each capability. To effectively train this framework, we introduce a two-stage training paradigm. First, we fine-tune a backbone LLM on the entire dataset without discriminating sub-tasks, providing the model with a comprehensive understanding of the task. Second, the fine-tuned LLM is used to instantiate the planner, caller, and summarizer respectively, which are continually fine-tuned on respective sub-tasks. Evaluation across various tool-use benchmarks illustrates that our proposed multi-LLM framework surpasses the traditional single-LLM approach, highlighting its efficacy and advantages in tool learning.
  </details>

- **Multi-Agent Collaboration: Harnessing the Power of Intelligent LLM Agents**
  Yashar Talebirad, Amirhossein Nadiri
  arXiv.org 2023
  [open paper page](https://api.semanticscholar.org/CorpusId:259088724)
  <details>
  <summary>Abstract</summary>
  In this paper, we present a novel framework for enhancing the capabilities of large language models (LLMs) by leveraging the power of multi-agent systems. Our framework introduces a collaborative environment where multiple intelligent agent components, each with distinctive attributes and roles, work together to handle complex tasks more efficiently and effectively. We demonstrate the practicality and versatility of our framework through case studies in artificial general intelligence (AGI), specifically focusing on the Auto-GPT and BabyAGI models. We also examine the "Gorilla" model, which integrates external APIs into the LLM. Our framework addresses limitations and challenges such as looping issues, security risks, scalability, system evaluation, and ethical considerations. By modeling various domains such as courtroom simulations and software development scenarios, we showcase the potential applications and benefits of our proposed multi-agent system. Our framework provides an avenue for advancing the capabilities and performance of LLMs through collaboration and knowledge exchange among intelligent agents.
  </details>

- **Rethinking the Bounds of LLM Reasoning: Are Multi-Agent Discussions the Key?**
  Qineng Wang, Zihao Wang, Ying Su, Hanghang Tong, Yangqiu Song
  Annual Meeting of the Association for Computational Linguistics 2024
  [open paper page](https://arxiv.org/pdf/2402.18272.pdf)
  <details>
  <summary>Abstract</summary>
  Recent progress in LLMs discussion suggests that multi-agent discussion improves the reasoning abilities of LLMs. In this work, we reevaluate this claim through systematic experiments, where we propose a novel group discussion framework to enrich the set of discussion mechanisms. Interestingly, our results show that a single-agent LLM with strong prompts can achieve almost the same performance as the best existing discussion approach on a wide range of reasoning tasks and backbone LLMs. We observe that the multi-agent discussion performs better than a single agent only when there is no demonstration in the prompt. Further study reveals the common interaction mechanisms of LLMs during the discussion.
  </details>

- **Stance Detection with Collaborative Role-Infused LLM-Based Agents**
  Xiaochong Lan, Chen Gao, Depeng Jin, Yong Li
  International Conference on Web and Social Media 2023
  [open paper page](https://api.semanticscholar.org/CorpusId:264146255)
  <details>
  <summary>Abstract</summary>
  Stance detection automatically detects the stance in a text towards a target, vital for content analysis in web and social media research. Despite their promising capabilities, LLMs encounter challenges when directly applied to stance detection. First, stance detection demands multi-aspect knowledge, from deciphering event-related terminologies to understanding the expression styles in social media platforms. Second, stance detection requires advanced reasoning to infer authors' implicit viewpoints, as stances are often subtly embedded rather than overtly stated in the text. To address these challenges, we design a three-stage framework COLA (short for Collaborative rOle-infused LLM-based Agents) in which LLMs are designated distinct roles, creating a collaborative system where each role contributes uniquely. Initially, in the multidimensional text analysis stage, we configure the LLMs to act as a linguistic expert, a domain specialist, and a social media veteran to get a multifaceted analysis of texts, thus overcoming the first challenge. Next, in the reasoning-enhanced debating stage, for each potential stance, we designate a specific LLM-based agent to advocate for it, guiding the LLM to detect logical connections between text features and stance, tackling the second challenge. Finally, in the stance conclusion stage, a final decision maker agent consolidates prior insights to determine the stance. Our approach avoids extra annotated data and model training and is highly usable. We achieve state-of-the-art performance across multiple datasets. Ablation studies validate the effectiveness of each role design in handling stance detection. Further experiments have demonstrated the explainability and the versatility of our approach. Our approach excels in usability, accuracy, effectiveness, explainability and versatility, highlighting its value.
  </details>

- **MAgIC: Investigation of Large Language Model Powered Multi-Agent in Cognition, Adaptability, Rationality and Collaboration**
  Lin Xu, Zhiyuan Hu, Daquan Zhou, Hongyu Ren, Zhen Dong, Kurt Keutzer, See-Kiong Ng, Jiashi Feng
  2023
  [open paper page](https://api.semanticscholar.org/CorpusId:265212971)
  <details>
  <summary>Abstract</summary>
  Large Language Models (LLMs) have significantly advanced natural language processing, demonstrating exceptional reasoning, tool usage, and memory capabilities. As their applications expand into multi-agent environments, there arises a need for a comprehensive evaluation framework that captures LLMs' reasoning, planning, collaboration, and other social abilities. This work introduces a novel competition-based benchmark framework specifically designed to assess LLMs within multi-agent settings, providing quantitative metrics to evaluate their judgment, reasoning, deception, self-awareness, cooperation, coordination, and rationality. We utilize two social deduction games alongside three game-theory scenarios to create diverse environments. Our frame is fortified with the probabilistic graphic modeling (PGM) method, enhancing the LLMs' capabilities in navigating complex social and cognitive dimensions. We evaluate seven LLMs, quantitatively highlighting a significant capability gap of over threefold between the strongest, GPT o1, and the weakest, Llama-2-70B. It also confirms that our PGM enhancement boosts the abilities of all selected models by an average of 37%. Our data and code can be found here https://github.com/cathyxl/MAgIC.
  </details>

- **Large Language Model for Participatory Urban Planning**
  Zhilun Zhou, Yuming Lin, Depeng Jin, Yong Li
  arXiv.org 2024
  [open paper page](https://api.semanticscholar.org/CorpusId:268032947)
  <details>
  <summary>Abstract</summary>
  Participatory urban planning is the mainstream of modern urban planning that involves the active engagement of residents. However, the traditional participatory paradigm requires experienced planning experts and is often time-consuming and costly. Fortunately, the emerging Large Language Models (LLMs) have shown considerable ability to simulate human-like agents, which can be used to emulate the participatory process easily. In this work, we introduce an LLM-based multi-agent collaboration framework for participatory urban planning, which can generate land-use plans for urban regions considering the diverse needs of residents. Specifically, we construct LLM agents to simulate a planner and thousands of residents with diverse profiles and backgrounds. We first ask the planner to carry out an initial land-use plan. To deal with the different facilities needs of residents, we initiate a discussion among the residents in each community about the plan, where residents provide feedback based on their profiles. Furthermore, to improve the efficiency of discussion, we adopt a fishbowl discussion mechanism, where part of the residents discuss and the rest of them act as listeners in each round. Finally, we let the planner modify the plan based on residents' feedback. We deploy our method on two real-world regions in Beijing. Experiments show that our method achieves state-of-the-art performance in residents satisfaction and inclusion metrics, and also outperforms human experts in terms of service accessibility and ecology metrics.
  </details>

- **Self-Organized Agents: A LLM Multi-Agent Framework toward Ultra Large-Scale Code Generation and Optimization**
  Yoichi Ishibashi, Yoshimasa Nishimura
  arXiv.org 2024
  [open paper page](https://arxiv.org/pdf/2404.02183.pdf)
  <details>
  <summary>Abstract</summary>
  Recent advancements in automatic code generation using large language model (LLM) agent have brought us closer to the future of automated software development. However, existing single-agent approaches face limitations in generating and improving large-scale, complex codebases due to constraints in context length. To tackle this challenge, we propose Self-Organized multi-Agent framework (SoA), a novel multi-agent framework that enables the scalable and efficient generation and optimization of large-scale code. In SoA, self-organized agents operate independently to generate and modify code components while seamlessly collaborating to construct the overall codebase. A key feature of our framework is the automatic multiplication of agents based on problem complexity, allowing for dynamic scalability. This enables the overall code volume to be increased indefinitely according to the number of agents, while the amount of code managed by each agent remains constant. We evaluate SoA on the HumanEval benchmark and demonstrate that, compared to a single-agent system, each agent in SoA handles significantly less code, yet the overall generated code is substantially greater. Moreover, SoA surpasses the powerful single-agent baseline by 5% in terms of Pass@1 accuracy.
  </details>

- **LLM Harmony: Multi-Agent Communication for Problem Solving**
  Sumedh Rasal
  2024
  [open paper page](https://api.semanticscholar.org/CorpusId:266725580)
  <details>
  <summary>Abstract</summary>
  Large Language Models (LLMs) have revolutionized Natural Language Processing but exhibit limitations, particularly in autonomously addressing novel challenges such as reasoning and problem-solving. Traditional techniques like chain-of-thought prompting necessitate explicit human guidance. This paper introduces a novel multi-agent communication framework, inspired by the CAMEL model, to enhance LLMs' autonomous problem-solving capabilities. The framework employs multiple LLM agents, each with a distinct persona, engaged in role-playing communication, offering a nuanced and adaptable approach to diverse problem scenarios. Extensive experimentation demonstrates the framework's superior performance and adaptability, providing valuable insights into the collaborative potential of multiple agents in overcoming the limitations of individual models.
  </details>

- **LLM Voting: Human Choices and AI Collective Decision Making**
  Joshua C. Yang, Marcin Korecki, Damian Dailisan, C. I. Hausladen, Dirk Helbing
  2024
  [open paper page](https://api.semanticscholar.org/CorpusId:267413124)
  <details>
  <summary>Abstract</summary>
  This paper investigates the voting behaviors of Large Language Models (LLMs), specifically GPT-4 and LLaMA-2, their biases, and how they align with human voting patterns. Our methodology involved using a dataset from a human voting experiment to establish a baseline for human preferences and conducting a corresponding experiment with LLM agents. We observed that the choice of voting methods and the presentation order influenced LLM voting outcomes. We found that varying the persona can reduce some of these biases and enhance alignment with human choices. While the Chain-of-Thought approach did not improve prediction accuracy, it has potential for AI explainability in the voting process. We also identified a trade-off between preference diversity and alignment accuracy in LLMs, influenced by different temperature settings. Our findings indicate that LLMs may lead to less diverse collective outcomes and biased assumptions when used in voting scenarios, emphasizing the need for cautious integration of LLMs into democratic processes.
  </details>

- **Multi-Agent Consensus Seeking via Large Language Models**
  Huaben Chen, Wenkang Ji, Lufeng Xu, Shiyu Zhao
  arXiv.org 2023
  [open paper page](https://arxiv.org/pdf/2310.20151.pdf)
  <details>
  <summary>Abstract</summary>
  Multi-agent systems driven by large language models (LLMs) have shown promising abilities for solving complex tasks in a collaborative manner. This work considers a fundamental problem in multi-agent collaboration: consensus seeking. When multiple agents work together, we are interested in how they can reach a consensus through inter-agent negotiation. To that end, this work studies a consensus-seeking task where the state of each agent is a numerical value and they negotiate with each other to reach a consensus value. It is revealed that when not explicitly directed on which strategy should be adopted, the LLM-driven agents primarily use the average strategy for consensus seeking although they may occasionally use some other strategies. Moreover, this work analyzes the impact of the agent number, agent personality, and network topology on the negotiation process. The findings reported in this work can potentially lay the foundations for understanding the behaviors of LLM-driven multi-agent systems for solving more complex tasks. Furthermore, LLM-driven consensus seeking is applied to a multi-robot aggregation task. This application demonstrates the potential of LLM-driven agents to achieve zero-shot autonomous planning for multi-robot collaboration tasks. Project website: windylab.github.io/ConsensusLLM/.
  </details>

- **A Multi-Agent Conversational Recommender System**
  Jiabao Fang, Shen Gao, Pengjie Ren, Xiuying Chen, Suzan Verberne, Zhaochun Ren
  2024
  [open paper page](https://api.semanticscholar.org/CorpusId:267406588)
  <details>
  <summary>Abstract</summary>
  Due to strong capabilities in conducting fluent, multi-turn conversations with users, Large Language Models (LLMs) have the potential to further improve the performance of Conversational Recommender System (CRS). Unlike the aimless chit-chat that LLM excels at, CRS has a clear target. So it is imperative to control the dialogue flow in the LLM to successfully recommend appropriate items to the users. Furthermore, user feedback in CRS can assist the system in better modeling user preferences, which has been ignored by existing studies. However, simply prompting LLM to conduct conversational recommendation cannot address the above two key challenges. In this paper, we propose Multi-Agent Conversational Recommender System (MACRS) which contains two essential modules. First, we design a multi-agent act planning framework, which can control the dialogue flow based on four LLM-based agents. This cooperative multi-agent framework will generate various candidate responses based on different dialogue acts and then choose the most appropriate response as the system response, which can help MACRS plan suitable dialogue acts. Second, we propose a user feedback-aware reflection mechanism which leverages user feedback to reason errors made in previous turns to adjust the dialogue act planning, and higher-level user information from implicit semantics. We conduct extensive experiments based on user simulator to demonstrate the effectiveness of MACRS in recommendation and user preferences collection. Experimental results illustrate that MACRS demonstrates an improvement in user interaction experience compared to directly using LLMs.
  </details>

- **Shall We Team Up: Exploring Spontaneous Cooperation of Competing LLM Agents**
  Zengqing Wu, Run Peng, Shuyuan Zheng, Qianying Liu, Xu Han, Brian Inhyuk Kwon, Makoto Onizuka, Shaojie Tang, Chuan Xiao
  Conference on Empirical Methods in Natural Language Processing 2024
  [open paper page](https://api.semanticscholar.org/CorpusId:267750197)
  <details>
  <summary>Abstract</summary>
  Large Language Models (LLMs) have increasingly been utilized in social simulations, where they are often guided by carefully crafted instructions to stably exhibit human-like behaviors during simulations. Nevertheless, we doubt the necessity of shaping agents' behaviors for accurate social simulations. Instead, this paper emphasizes the importance of spontaneous phenomena, wherein agents deeply engage in contexts and make adaptive decisions without explicit directions. We explored spontaneous cooperation across three competitive scenarios and successfully simulated the gradual emergence of cooperation, findings that align closely with human behavioral data. This approach not only aids the computational social science community in bridging the gap between simulations and real-world dynamics but also offers the AI community a novel method to assess LLMs' capability of deliberate reasoning.
  </details>

- **Content Knowledge Identification with Multi-Agent Large Language Models (LLMs)**
  Kaiqi Yang, Yucheng Chu, Taylor Darwin, Ahreum Han, Hang Li, Hongzhi Wen, Yasemin Copur-Gencturk, Jiliang Tang, Hui Liu
  International Conference on Artificial Intelligence in Education 2024
  [open paper page](https://api.semanticscholar.org/CorpusId:269042958)
  <details>
  <summary>Abstract</summary>
  Teachers' mathematical content knowledge (CK) is of vital importance and need in teacher professional development (PD) programs. Computer-aided asynchronous PD systems are the most recent proposed PD techniques, which aim to help teachers improve their PD equally with fewer concerns about costs and limitations of time or location. However, current automatic CK identification methods, which serve as one of the core techniques of asynchronous PD systems, face challenges such as diversity of user responses, scarcity of high-quality annotated data, and low interpretability of the predictions. To tackle these challenges, we propose a Multi-Agent LLMs-based framework, LLMAgent-CK, to assess the user responses' coverage of identified CK learning goals without human annotations. By taking advantage of multi-agent LLMs in strong generalization ability and human-like discussions, our proposed LLMAgent-CK presents promising CK identifying performance on a real-world mathematical CK dataset MaCKT. Moreover, our case studies further demonstrate the working of the multi-agent framework.
  </details>

- **MDAgents: An Adaptive Collaboration of LLMs for Medical Decision-Making**
  Yu Han Kim, Hyewon Jeong, Cynthia Breazeal, Daniel McDuff, Hae Won Park, Xuhai Xu, Chanwoo Park, Yik Siu Chan
  2024
  [open paper page](https://api.semanticscholar.org/CorpusId:269303028)
  <details>
  <summary>Abstract</summary>
  Foundation models are becoming valuable tools in medicine. Yet despite their promise, the best way to leverage Large Language Models (LLMs) in complex medical tasks remains an open question. We introduce a novel multi-agent framework, named Medical Decision-making Agents (MDAgents) that helps address this gap by automatically assigning a collaboration structure to a team of LLMs. The assigned solo or group collaboration structure is tailored to the medical task at hand, emulating real-world medical decision-making processes adapted to tasks of varying complexities. We evaluate our framework and baseline methods using state-of-the-art LLMs across a suite of real-world medical knowledge and medical diagnosis benchmarks, including a comparison of LLMs' medical complexity classification against human physicians. MDAgents achieved the best performance in seven out of ten benchmarks on tasks requiring an understanding of medical knowledge and multi-modal reasoning, showing a significant improvement of up to 4.2% (p<0.05) compared to previous methods' best performances. Ablation studies reveal that MDAgents effectively determines medical complexity to optimize for efficiency and accuracy across diverse medical tasks. Notably, the combination of moderator review and external medical knowledge in group collaboration resulted in an average accuracy improvement of 11.8%. Our code can be found at https://github.com/mitmedialab/MDAgents.
  </details>

- **Why Solving Multi-agent Path Finding with Large Language Model has not Succeeded Yet**
  Weizhe Chen, Sven Koenig, B. Dilkina
  2024
  [open paper page](https://api.semanticscholar.org/CorpusId:266843945)
  <details>
  <summary>Abstract</summary>
  With the explosive influence caused by the success of large language models (LLM) like ChatGPT and GPT-4, there has been an extensive amount of recent work showing that foundation models can be used to solve a large variety of tasks. However, there is very limited work that shares insights on multi-agent planning. Multi-agent planning is different from other domains by combining the difficulty of multi-agent coordination and planning, and making it hard to leverage external tools to facilitate the reasoning needed. In this paper, we focus on the problem of multi-agent path finding (MAPF), which is also known as multi-robot route planning, and study the performance of solving MAPF with LLMs. We first show the motivating success on an empty room map without obstacles, then the failure to plan on the harder room map and maze map of the standard MAPF benchmark. We present our position on why directly solving MAPF with LLMs has not been successful yet, and we use various experiments to support our hypothesis. Based on our results, we discussed how researchers with different backgrounds could help with this problem from different perspectives.
  </details>

- **MAS-GPT: Training LLMs to Build LLM-based Multi-Agent Systems**
  Rui Ye, Shuo Tang, Rui Ge, Yaxin Du, Zhen-fei Yin, Siheng Chen, Jing Shao
  2025
  [open paper page](https://api.semanticscholar.org/CorpusId:276781761)
  <details>
  <summary>Abstract</summary>
  LLM-based multi-agent systems (MAS) have shown significant potential in tackling diverse tasks. However, to design effective MAS, existing approaches heavily rely on manual configurations or multiple calls of advanced LLMs, resulting in inadaptability and high inference costs. In this paper, we simplify the process of building an MAS by reframing it as a generative language task, where the input is a user query and the output is a corresponding MAS. To address this novel task, we unify the representation of MAS as executable code and propose a consistency-oriented data construction pipeline to create a high-quality dataset comprising coherent and consistent query-MAS pairs. Using this dataset, we train MAS-GPT, an open-source medium-sized LLM that is capable of generating query-adaptive MAS within a single LLM inference. The generated MAS can be seamlessly applied to process user queries and deliver high-quality responses. Extensive experiments on 9 benchmarks and 5 LLMs show that the proposed MAS-GPT consistently outperforms 10+ baseline MAS methods on diverse settings, indicating MAS-GPT's high effectiveness, efficiency and strong generalization ability. Code will be available at https://github.com/rui-ye/MAS-GPT.
  </details>

- **MAPoRL: Multi-Agent Post-Co-Training for Collaborative Large Language Models with Reinforcement Learning**
  Chanwoo Park, Seungju Han, Xingzhi Guo, A. Ozdaglar, Kaiqing Zhang, Joo-Kyung Kim
  2025
  [open paper page](https://api.semanticscholar.org/CorpusId:276580906)
  <details>
  <summary>Abstract</summary>
  Leveraging multiple large language models (LLMs) to build collaborative multi-agentic workflows has demonstrated significant potential. However, most previous studies focus on prompting the out-of-the-box LLMs, relying on their innate capability for collaboration, which may not improve LLMs' performance as shown recently. In this paper, we introduce a new post-training paradigm MAPoRL (Multi-Agent Post-co-training for collaborative LLMs with Reinforcement Learning), to explicitly elicit the collaborative behaviors and further unleash the power of multi-agentic LLM frameworks. In MAPoRL, multiple LLMs first generate their own responses independently and engage in a multi-turn discussion to collaboratively improve the final answer. In the end, a MAPoRL verifier evaluates both the answer and the discussion, by assigning a score that verifies the correctness of the answer, while adding incentives to encourage corrective and persuasive discussions. The score serves as the co-training reward, and is then maximized through multi-agent RL. Unlike existing LLM post-training paradigms, MAPoRL advocates the co-training of multiple LLMs together using RL for better generalization. Accompanied by analytical insights, our experiments demonstrate that training individual LLMs alone is insufficient to induce effective collaboration. In contrast, multi-agent co-training can boost the collaboration performance across benchmarks, with generalization to unseen domains.
  </details>

- **MACPO: Weak-to-Strong Alignment via Multi-Agent Contrastive Preference Optimization**
  Yougang Lyu, Lingyong Yan, Zihan Wang, Dawei Yin, Pengjie Ren, M. D. Rijke, Z. Ren
  arXiv.org 2024
  [open paper page](https://api.semanticscholar.org/CorpusId:273233831)
  <details>
  <summary>Abstract</summary>
  As large language models (LLMs) are rapidly advancing and achieving near-human capabilities, aligning them with human values is becoming more urgent. In scenarios where LLMs outperform humans, we face a weak-to-strong alignment problem where we need to effectively align strong student LLMs through weak supervision generated by weak teachers. Existing alignment methods mainly focus on strong-to-weak alignment and self-alignment settings, and it is impractical to adapt them to the much harder weak-to-strong alignment setting. To fill this gap, we propose a multi-agent contrastive preference optimization (MACPO) framework. MACPO facilitates weak teachers and strong students to learn from each other by iteratively reinforcing unfamiliar positive behaviors while penalizing familiar negative ones. To get this, we devise a mutual positive behavior augmentation strategy to encourage weak teachers and strong students to learn from each other's positive behavior and further provide higher quality positive behavior for the next iteration. Additionally, we propose a hard negative behavior construction strategy to induce weak teachers and strong students to generate familiar negative behavior by fine-tuning on negative behavioral data. Experimental results on the HH-RLHF and PKU-SafeRLHF datasets, evaluated using both automatic metrics and human judgments, demonstrate that MACPO simultaneously improves the alignment performance of strong students and weak teachers. Moreover, as the number of weak teachers increases, MACPO achieves better weak-to-strong alignment performance through more iteration optimization rounds.
  </details>
416 | 417 | - **MALMM: Multi-Agent Large Language Models for Zero-Shot Robotics Manipulation** 418 | Harsh Singh, Rocktim Jyoti Das, Mingfei Han, Preslav Nakov, Ivan Laptev 419 | 2024 420 | [open paper page](https://api.semanticscholar.org/CorpusId:274280893) 421 |
422 | Abstract 423 | Large Language Models (LLMs) have demonstrated remarkable planning abilities across various domains, including robotics manipulation and navigation. While recent efforts in robotics have leveraged LLMs both for high-level and low-level planning, these approaches often face significant challenges, such as hallucinations in long-horizon tasks and limited adaptability due to the generation of plans in a single pass without real-time feedback. To address these limitations, we propose a novel multi-agent LLM framework, Multi-Agent Large Language Model for Manipulation (MALMM) that distributes high-level planning and low-level control code generation across specialized LLM agents, supervised by an additional agent that dynamically manages transitions. By incorporating observations from the environment after each step, our framework effectively handles intermediate failures and enables adaptive re-planning. Unlike existing methods, our approach does not rely on pre-trained skill policies or in-context learning examples and generalizes to a variety of new tasks. We evaluate our approach on nine RLBench tasks, including long-horizon tasks, and demonstrate its ability to solve robotics manipulation in a zero-shot setting, thereby overcoming key limitations of existing LLM-based manipulation methods. 424 |
425 | 426 | - **SMoA: Improving Multi-agent Large Language Models with Sparse Mixture-of-Agents** 427 | Dawei Li, Zhen Tan, Peijia Qian, Yifan Li, Kumar Satvik Chaudhary, Lijie Hu, Jiayi Shen 428 | 2024 429 | [open paper page](https://api.semanticscholar.org/CorpusId:273821018) 430 |
431 | Abstract 432 | While multi-agent systems have been shown to significantly enhance the performance of Large Language Models (LLMs) across various tasks and applications, the dense interaction among scaling agents potentially hampers their efficiency and diversity. To address these challenges, we draw inspiration from the sparse mixture-of-experts (SMoE) paradigm and propose a sparse mixture-of-agents (SMoA) framework to improve the efficiency and diversity of multi-agent LLMs. Unlike completely connected structures, SMoA introduces novel Response Selection and Early Stopping mechanisms to sparsify information flows among individual LLM agents, striking a balance between performance and efficiency. Additionally, inspired by the expert diversity principle in SMoE frameworks for workload balance between experts, we assign distinct role descriptions to each LLM agent, fostering diverse and divergent thinking. Extensive experiments on reasoning, alignment, and fairness benchmarks demonstrate that SMoA achieves performance comparable to traditional mixture-of-agents approaches but with significantly lower computational costs. Further analysis reveals that SMoA is more stable, has a greater capacity to scale, and offers considerable potential through hyper-parameter optimization. Code and data will be available at: https://github.com/David-Li0406/SMoA. 433 |
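SMoA's two sparsification mechanisms are easy to picture in isolation. The snippet below is a rough sketch, not the released code: `judge_select` and `judge_stop` stand in for LLM judges that forward only the k most useful responses and end the debate once it has converged.

```python
from typing import Callable, List

def sparse_debate(
    agents: List[Callable[[str, List[str]], str]],             # (question, forwarded_context) -> response
    judge_select: Callable[[str, List[str], int], List[str]],  # Response Selection: keep k responses
    judge_stop: Callable[[str, List[str]], bool],               # Early Stopping: has the debate converged?
    question: str,
    k: int = 2,
    max_rounds: int = 4,
) -> str:
    context: List[str] = []
    for _ in range(max_rounds):
        responses = [agent(question, context) for agent in agents]
        context = judge_select(question, responses, k)  # only k responses flow to the next round
        if judge_stop(question, context):
            break
    return context[0] if context else ""
```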
434 | 435 | - **Large language model empowered participatory urban planning** 436 | Zhilun Zhou, Yuming Lin, Yong Li 437 | arXiv.org 2024 438 | [open paper page](https://api.semanticscholar.org/CorpusId:267412207) 439 |
440 | Abstract 441 | Participatory urban planning is the mainstream of modern urban planning and involves the active engagement of different stakeholders. However, the traditional participatory paradigm encounters challenges in time and manpower, while the generative planning tools fail to provide adjustable and inclusive solutions. This research introduces an innovative urban planning approach integrating Large Language Models (LLMs) within the participatory process. The framework, based on the crafted LLM agent, consists of role-play, collaborative generation, and feedback iteration, solving a community-level land-use task catering to 1000 distinct interests. Empirical experiments in diverse urban communities exhibit LLM's adaptability and effectiveness across varied planning scenarios. The results were evaluated on four metrics, surpassing human experts in satisfaction and inclusion, and rivaling state-of-the-art reinforcement learning methods in service and ecology. Further analysis shows the advantage of LLM agents in providing adjustable and inclusive solutions with natural language reasoning and strong scalability. While implementing the recent advancements in emulating human behavior for planning, this work envisions both planners and citizens benefiting from low-cost, efficient LLM agents, which is crucial for enhancing participation and realizing participatory urban planning. 442 |
443 | 444 | - **Agents on the Bench: Large Language Model Based Multi Agent Framework for Trustworthy Digital Justice** 445 | Cong Jiang, Xiaolei Yang 446 | 2024 447 | [open paper page](https://api.semanticscholar.org/CorpusId:275119074) 448 |
449 | Abstract 450 | The justice system has increasingly employed AI techniques to enhance efficiency, yet limitations remain in improving the quality of decision-making, particularly regarding the transparency and explainability needed to uphold public trust in legal AI. To address these challenges, we propose a large language model based multi-agent framework named AgentsBench, which aims to simultaneously improve both efficiency and quality in judicial decision-making. Our approach leverages multiple LLM-driven agents that simulate the collaborative deliberation and decision-making process of a judicial bench. We conducted experiments on the legal judgment prediction task, and the results show that our framework outperforms existing LLM-based methods in terms of performance and decision quality. By incorporating these elements, our framework reflects real-world judicial processes more closely, enhancing accuracy, fairness, and societal consideration. AgentsBench provides a more nuanced and realistic method for trustworthy AI decision-making, with strong potential for application across various case types and legal scenarios. 451 |
452 | 453 | - **ControlAgent: Automating Control System Design via Novel Integration of LLM Agents and Domain Expertise** 454 | Xing-ming Guo, Darioush Keivan, U. Syed, Lianhui Qin, Huan Zhang, G. Dullerud, Peter J. Seiler, Bin Hu 455 | arXiv.org 2024 456 | [open paper page](https://api.semanticscholar.org/CorpusId:273654638) 457 |
458 | Abstract 459 | Control system design is a crucial aspect of modern engineering with far-reaching applications across diverse sectors including aerospace, automotive systems, power grids, and robotics. Despite advances made by Large Language Models (LLMs) in various domains, their application in control system design remains limited due to the complexity and specificity of control theory. To bridge this gap, we introduce ControlAgent, a new paradigm that automates control system design via novel integration of LLM agents and control-oriented domain expertise. ControlAgent encodes expert control knowledge and emulates human iterative design processes by gradually tuning controller parameters to meet user-specified requirements for stability, performance, and robustness. ControlAgent integrates multiple collaborative LLM agents, including a central agent responsible for task distribution and task-specific agents dedicated to detailed controller design for various types of systems and requirements. ControlAgent also employs a Python computation agent that performs complex calculations and controller evaluations based on standard design information provided by task-specified LLM agents. Combined with a history and feedback module, the task-specific LLM agents iteratively refine controller parameters based on real-time feedback from prior designs. Overall, ControlAgent mimics the design processes used by (human) practicing engineers, but removes all the human efforts and can be run in a fully automated way to give end-to-end solutions for control system design with user-specified requirements. To validate ControlAgent's effectiveness, we develop ControlEval, an evaluation dataset that comprises 500 control tasks with various specific design goals. The effectiveness of ControlAgent is demonstrated via extensive comparative evaluations between LLM-based and traditional human-involved toolbox-based baselines. 460 |
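The core loop described in the ControlAgent abstract, where an LLM agent proposes controller parameters, a computation agent evaluates them numerically, and feedback flows back through a history module, reduces to a propose-evaluate-refine cycle. This is a schematic sketch with hypothetical callables, not ControlAgent's actual interfaces.

```python
from typing import Callable, Dict, List

def design_loop(
    propose: Callable[[str, List[Dict]], Dict],  # LLM design agent: (task, history) -> controller params
    evaluate: Callable[[Dict], Dict],            # computation agent: params -> metrics (margins, overshoot, ...)
    meets_spec: Callable[[Dict], bool],          # user-specified stability/performance/robustness checks
    task: str,
    max_iters: int = 10,
) -> Dict:
    history: List[Dict] = []
    params: Dict = {}
    for _ in range(max_iters):
        params = propose(task, history)                        # reads prior attempts and their feedback
        metrics = evaluate(params)                             # numerical evaluation outside the LLM
        history.append({"params": params, "metrics": metrics})
        if meets_spec(metrics):
            break
    return params
```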
461 | 462 | - **An Electoral Approach to Diversify LLM-based Multi-Agent Collective Decision-Making** 463 | Xiutian Zhao, Ke Wang, Wei Peng 464 | unknown 2024 465 | [open paper page](https://api.semanticscholar.org/CorpusId:273502378) 466 |
467 | Abstract 468 | Modern large language models (LLMs) have exhibited cooperative synergy on complex task-solving, and collective decision-making (CDM) is a pivotal component in LLM-based multi-agent collaboration frameworks. Our survey on 52 recent such systems uncovers a severe lack of diversity, with a heavy reliance on dictatorial and plurality voting for CDM. Through the lens of social choice theory, we scrutinize widely-adopted CDM methods and identify their limitations. To enrich current landscape of LLM-based CDM, we present GEDI, an electoral CDM module that incorporates various ordinal preferential voting mechanisms. Our empirical case study across three benchmarks shows that the integration of certain CDM methods can markedly improve the reasoning capabilities and robustness of some leading LLMs, all without requiring intricate system designs. Additionally, we find that some CDM mechanisms generate positive synergies even with as few as three agents. The voting-based methods also demonstrate robustness against single points of failure, as well as diversity in terms of hit-rate@k and subject-wise impacts. 469 |
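GEDI's point is that collective decision-making need not stop at plurality voting. As one concrete example of an ordinal preferential mechanism (a generic Borda count, not the paper's code), agents submit full rankings of the candidate answers and points are awarded by rank:

```python
from collections import defaultdict
from typing import Dict, List

def borda_count(ballots: List[List[str]]) -> str:
    """Each ballot ranks all candidates best-to-worst; rank r of n earns n - 1 - r points."""
    scores: Dict[str, int] = defaultdict(int)
    for ballot in ballots:
        n = len(ballot)
        for rank, candidate in enumerate(ballot):
            scores[candidate] += n - 1 - rank
    return max(scores, key=scores.get)

# Three agents rank three candidate answers. Every candidate gets exactly one first-place
# vote, so plurality voting would tie, but the Borda count elects the broadly preferred "B".
print(borda_count([["A", "B", "C"], ["B", "A", "C"], ["C", "B", "A"]]))
```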
470 | 471 | - **Planning with Multi-Constraints via Collaborative Language Agents** 472 | Cong Zhang, Derrick-Goh-Xin Deik, Dexun Li, Hao Zhang, Yong Liu 473 | 2024 474 | [open paper page](https://api.semanticscholar.org/CorpusId:270063287) 475 |
476 | Abstract 477 | The rapid advancement of neural language models has sparked a new surge of intelligent agent research. Unlike traditional agents, large language model-based agents (LLM agents) have emerged as a promising paradigm for achieving artificial general intelligence (AGI) due to their superior reasoning and generalization capabilities. Effective planning is crucial for the success of LLM agents in real-world tasks, making it a highly pursued topic in the community. Current planning methods typically translate tasks into executable action sequences. However, determining a feasible or optimal sequence for complex tasks with multiple constraints at fine granularity, which often requires compositing long chains of heterogeneous actions, remains challenging. This paper introduces Planning with Multi-Constraints (PMC), a zero-shot methodology for collaborative LLM-based multi-agent systems that simplifies complex task planning with constraints by decomposing it into a hierarchy of subordinate tasks. Each subtask is then mapped into executable actions. PMC was assessed on two constraint-intensive benchmarks, TravelPlanner and API-Bank. Notably, PMC achieved an average 42.68% success rate on TravelPlanner, significantly higher than GPT-4 (2.92%), and outperforming GPT-4 with ReAct on API-Bank by 13.64%, showing the immense potential of integrating LLM with multi-agent systems. We also show that PMC works with small LLM as the planning core, e.g., LLaMA-3.1-8B. 478 |
479 | 480 | - **Agents4PLC: Automating Closed-loop PLC Code Generation and Verification in Industrial Control Systems using LLM-based Agents** 481 | Zihan Liu, Ruinan Zeng, Dongxia Wang, Gengyun Peng, Jingyi Wang, Qiang Liu, Peiyu Liu, Wenhai Wang 482 | unknown 2024 483 | [open paper page](https://api.semanticscholar.org/CorpusId:273482697) 484 |
485 | Abstract 486 | In industrial control systems, the generation and verification of Programmable Logic Controller (PLC) code are critical for ensuring operational efficiency and safety. While Large Language Models (LLMs) have made strides in automated code generation, they often fall short in providing correctness guarantees and specialized support for PLC programming. To address these challenges, this paper introduces Agents4PLC, a novel framework that not only automates PLC code generation but also includes code-level verification through an LLM-based multi-agent system. We first establish a comprehensive benchmark for verifiable PLC code generation area, transitioning from natural language requirements to human-written-verified formal specifications and reference PLC code. We further enhance our `agents' specifically for industrial control systems by incorporating Retrieval-Augmented Generation (RAG), advanced prompt engineering techniques, and Chain-of-Thought strategies. Evaluation against the benchmark demonstrates that Agents4PLC significantly outperforms previous methods, achieving superior results across a series of increasingly rigorous metrics. This research not only addresses the critical challenges in PLC programming but also highlights the potential of our framework to generate verifiable code applicable to real-world industrial applications. 487 |
488 | 489 | - **YOLO-MARL: You Only LLM Once for Multi-agent Reinforcement Learning** 490 | Zhuang Yuan, Yi Shen, Zhili Zhang, Yuxiao Chen, Fei Miao 491 | arXiv.org 2024 492 | [open paper page](https://api.semanticscholar.org/CorpusId:273186556) 493 |
494 | Abstract 495 | Advancements in deep multi-agent reinforcement learning (MARL) have positioned it as a promising approach for decision-making in cooperative games. However, it still remains challenging for MARL agents to learn cooperative strategies for some game environments. Recently, large language models (LLMs) have demonstrated emergent reasoning capabilities, making them promising candidates for enhancing coordination among the agents. However, due to the model size of LLMs, it can be expensive to frequently infer LLMs for actions that agents can take. In this work, we propose You Only LLM Once for MARL (YOLO-MARL), a novel framework that leverages the high-level task planning capabilities of LLMs to improve the policy learning process of multi-agents in cooperative games. Notably, for each game environment, YOLO-MARL only requires one time interaction with LLMs in the proposed strategy generation, state interpretation and planning function generation modules, before the MARL policy training process. This avoids the ongoing costs and computational time associated with frequent LLMs API calls during training. Moreover, the trained decentralized normal-sized neural network-based policies operate independently of the LLM. We evaluate our method across three different environments and demonstrate that YOLO-MARL outperforms traditional MARL algorithms. 496 |
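The distinguishing step in YOLO-MARL is that the LLM is consulted exactly once, before MARL training, to emit a reusable planning function. A minimal sketch of that idea follows; `llm_generate_code` is a hypothetical single LLM call, and in practice the returned code would be sandboxed and validated before use.

```python
from typing import Callable, Dict

def build_planning_fn(llm_generate_code: Callable[[str], str], env_description: str) -> Callable[[Dict], str]:
    """One LLM call produces a plan(state) -> goal function reused for the whole MARL run."""
    source = llm_generate_code(
        "Write a Python function plan(state: dict) -> str returning a high-level goal "
        f"for cooperative agents in this environment:\n{env_description}"
    )
    namespace: Dict = {}
    exec(source, namespace)        # one-time cost; no LLM calls inside the training loop
    return namespace["plan"]

# Inside MARL training, plan(state) is then just a cheap local call used for reward shaping
# or goal conditioning, e.g.: goal = planning_fn(obs); reward += shaping_bonus(goal, action)
```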
497 | 498 | - **Cooperative Strategic Planning Enhances Reasoning Capabilities in Large Language Models** 499 | Zhuorui Ye, Danqing Wang, Lei Li, Fei Fang 500 | arXiv.org 2024 501 | [open paper page](https://api.semanticscholar.org/CorpusId:273654271) 502 |
503 | Abstract 504 | Enhancing the reasoning capabilities of large language models (LLMs) is crucial for enabling them to tackle complex, multi-step problems. Multi-agent frameworks have shown great potential in enhancing LLMs' reasoning capabilities. However, the lack of effective cooperation between LLM agents hinders their performance, especially for multi-step reasoning tasks. This paper proposes a novel cooperative multi-agent reasoning framework (CoPlanner) by separating reasoning steps and assigning distinct duties to different agents. CoPlanner consists of two LLM agents: a planning agent and a reasoning agent. The planning agent provides high-level strategic hints, while the reasoning agent follows these hints and infers answers. By training the planning agent's policy through the interactive reasoning process via Proximal Policy Optimization (PPO), the LLaMA-3-8B-based CoPlanner outperforms the previous best method by 9.94\% on LogiQA and 3.09\% on BBH. Our results demonstrate that the guidance from the planning agent and the effective cooperation between the agents contribute to the superior performance of CoPlanner in tackling multi-step reasoning problems. 505 |
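CoPlanner's division of labor, a planning agent that only issues strategic hints and a reasoning agent that executes them, reduces to a simple alternation. The sketch below is illustrative only; the paper additionally trains the planner's policy with PPO, which is not shown, and the "FINAL ANSWER" stop marker is an assumption.

```python
from typing import Callable

def coplanner_solve(
    planner: Callable[[str, str], str],        # (problem, partial_solution) -> high-level hint
    reasoner: Callable[[str, str, str], str],  # (problem, partial_solution, hint) -> next reasoning step
    problem: str,
    max_steps: int = 6,
) -> str:
    solution = ""
    for _ in range(max_steps):
        hint = planner(problem, solution)                     # strategy only, no answer
        solution += reasoner(problem, solution, hint) + "\n"  # concrete reasoning that follows the hint
        if "FINAL ANSWER" in solution:                        # assumed stop marker emitted by the reasoner
            break
    return solution
```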
506 | 507 | - **Two Heads Are Better Than One: Collaborative LLM Embodied Agents for Human-Robot Interaction** 508 | Mitchell Rosser, Marc G. Carmichael 509 | unknown 2024 510 | [open paper page](https://api.semanticscholar.org/CorpusId:274281016) 511 |
512 | Abstract 513 | With the recent development of natural language generation models - termed as large language models (LLMs) - a potential use case has opened up to improve the way that humans interact with robot assistants. These LLMs should be able to leverage their large breadth of understanding to interpret natural language commands into effective, task appropriate and safe robot task executions. However, in reality, these models suffer from hallucinations, which may cause safety issues or deviations from the task. In other domains, these issues have been improved through the use of collaborative AI systems where multiple LLM agents can work together to collectively plan, code and self-check outputs. In this research, multiple collaborative AI systems were tested against a single independent AI agent to determine whether the success in other domains would translate into improved human-robot interaction performance. The results show that there is no defined trend between the number of agents and the success of the model. However, it is clear that some collaborative AI agent architectures can exhibit a greatly improved capacity to produce error-free code and to solve abstract problems. 514 |
515 | 516 | - **Enhancing Supermarket Robot Interaction: A Multi-Level LLM Conversational Interface for Handling Diverse Customer Intents** 517 | Luka Peternel, Chandran Nandkumar 518 | arXiv.org 2024 519 | [open paper page](https://api.semanticscholar.org/CorpusId:270560396) 520 |
521 | Abstract 522 | This paper presents the design and evaluation of a novel multi-level LLM interface for supermarket robots to assist customers. The proposed interface allows customers to convey their needs through both generic and specific queries. While state-of-the-art systems like OpenAI's GPTs are highly adaptable and easy to build and deploy, they still face challenges such as increased response times and limitations in strategic control of the underlying model for tailored use-case and cost optimization. Driven by the goal of developing faster and more efficient conversational agents, this paper advocates for using multiple smaller, specialized LLMs fine-tuned to handle different user queries based on their specificity and user intent. We compare this approach to a specialized GPT model powered by GPT-4 Turbo, using the Artificial Social Agent Questionnaire (ASAQ) and qualitative participant feedback in a counterbalanced within-subjects experiment. Our findings show that our multi-LLM chatbot architecture outperformed the benchmarked GPT model across all 13 measured criteria, with statistically significant improvements in four key areas: performance, user satisfaction, user-agent partnership, and self-image enhancement. The paper also presents a method for supermarket robot navigation by mapping the final chatbot response to correct shelf numbers, enabling the robot to sequentially navigate towards the respective products, after which lower-level robot perception, control, and planning can be used for automated object retrieval. We hope this work encourages more efforts into using multiple, specialized smaller models instead of relying on a single powerful, but more expensive and slower model. 523 |
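The multi-level design in the supermarket-robot paper amounts to routing each query to a small model fine-tuned for its intent and specificity, with a general model as fallback. Below is a bare-bones router sketch under that reading; the intent labels and model handles are hypothetical.

```python
from typing import Callable, Dict

def route_query(
    classify_intent: Callable[[str], str],         # lightweight classifier or LLM: query -> intent label
    specialists: Dict[str, Callable[[str], str]],  # one fine-tuned small model per intent
    fallback: Callable[[str], str],                # general-purpose model for unrecognized intents
    query: str,
) -> str:
    intent = classify_intent(query)                # e.g. "product_location", "recipe_help", "chitchat"
    return specialists.get(intent, fallback)(query)
```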
524 | 525 | - **A Cooperative Multi-Agent Framework for Zero-Shot Named Entity Recognition** 526 | Zihan Wang, Ziqi Zhao, Yougang Lyu, Zhumin Chen, M. D. Rijke, Zhaochun Ren 527 | unknown 2025 528 | [open paper page](https://api.semanticscholar.org/CorpusId:276618078) 529 |
530 | Abstract 531 | Zero-shot named entity recognition (NER) aims to develop entity recognition systems from unannotated text corpora. This task presents substantial challenges due to minimal human intervention. Recent work has adapted large language models (LLMs) for zero-shot NER by crafting specialized prompt templates. It advances model self-learning abilities by incorporating self-annotated demonstrations. However, two important challenges persist: (i) Correlations between contexts surrounding entities are overlooked, leading to wrong type predictions or entity omissions. (ii) The indiscriminate use of task demonstrations, retrieved through shallow similarity-based strategies, severely misleads LLMs during inference. In this paper, we introduce the cooperative multi-agent system (CMAS), a novel framework for zero-shot NER that uses the collective intelligence of multiple agents to address the challenges outlined above. CMAS has four main agents: (i) a self-annotator, (ii) a type-related feature (TRF) extractor, (iii) a demonstration discriminator, and (iv) an overall predictor. To explicitly capture correlations between contexts surrounding entities, CMAS reformulates NER into two subtasks: recognizing named entities and identifying entity type-related features within the target sentence. To enable controllable utilization of demonstrations, a demonstration discriminator is established to incorporate the self-reflection mechanism, automatically evaluating helpfulness scores for the target sentence. Experimental results show that CMAS significantly improves zero-shot NER performance across six benchmarks, including both domain-specific and general-domain scenarios. Furthermore, CMAS demonstrates its effectiveness in few-shot settings and with various LLM backbones. 532 |
533 | 534 | - **Many Heads Are Better Than One: Improved Scientific Idea Generation by A LLM-Based Multi-Agent System** 535 | Haoyang Su, Renqi Chen, Shixiang Tang, Zhenfei Yin, Xinzhe Zheng, Jinzhe Li, Biqing Qi, Qi Wu, Hui Li, Wanli Ouyang, Philip Torr, Bowen Zhou, Nanqing Dong 536 | unknown 2024 537 | [open paper page](https://api.semanticscholar.org/CorpusId:273346445) 538 |
539 | Abstract 540 | The rapid advancement of scientific progress requires innovative tools that can accelerate knowledge discovery. Although recent AI methods, particularly large language models (LLMs), have shown promise in tasks such as hypothesis generation and experimental design, they fall short of replicating the collaborative nature of real-world scientific practices, where diverse experts work together in teams to tackle complex problems. To address the limitations, we propose an LLM-based multi-agent system, i.e., Virtual Scientists (VirSci), designed to mimic the teamwork inherent in scientific research. VirSci organizes a team of agents to collaboratively generate, evaluate, and refine research ideas. Through comprehensive experiments, we demonstrate that this multi-agent approach outperforms the state-of-the-art method in producing novel scientific ideas. We further investigate the collaboration mechanisms that contribute to its tendency to produce ideas with higher novelty, offering valuable insights to guide future research and illuminating pathways toward building a robust system for autonomous scientific discovery. The code is available at https://github.com/open-sciencelab/Virtual-Scientists. 541 |
542 | 543 | - **Tailoring with Targeted Precision: Edit-Based Agents for Open-Domain Procedure Customization** 544 | Bodhisattwa Prasad Majumder, Li Zhang, Faeze Brahman, Yash Kumar Lal, Peter Clark, Niket Tandon 545 | Annual Meeting of the Association for Computational Linguistics 2023 546 | [open paper page](https://api.semanticscholar.org/CorpusId:268231438) 547 |
548 | Abstract 549 | How-to procedures, such as how to plant a garden, are now used by millions of users, but sometimes need customizing to meet a user's specific needs, e.g., planting a garden without pesticides. Our goal is to measure and improve an LLM's ability to perform such customization. Our approach is to test several simple multi-LLM-agent architectures for customization, as well as an end-to-end LLM, using a new evaluation set, called CustomPlans, of over 200 WikiHow procedures each with a customization need. We find that a simple architecture with two LLM agents used sequentially performs best, one that edits a generic how-to procedure and one that verifies its executability, significantly outperforming (10.5% absolute) an end-to-end prompted LLM. This suggests that LLMs can be configured reasonably effectively for procedure customization. This also suggests that multi-agent editing architectures may be worth exploring further for other customization applications (e.g. coding, creative writing) in the future. 550 |
551 | 552 | - **Multi-Agents Based on Large Language Models for Knowledge-based Visual Question Answering** 553 | Bing Li, Zhenqi Wang, Zhongjian Hu, Peng Yang 554 | 2024 555 | [open paper page](https://api.semanticscholar.org/CorpusId:274992537) 556 |
557 | Abstract 558 | Large Language Models (LLMs) have achieved impressive results in knowledge-based Visual Question Answering (VQA). However existing methods still have challenges: the inability to use external tools autonomously, and the inability to work in teams. Humans tend to know whether they need to use external tools when they encounter a new question, e.g., they tend to be able to give a direct answer to a familiar question, whereas they tend to use tools such as search engines when they encounter an unfamiliar question. In addition, humans also tend to collaborate and discuss with others to get better answers. Inspired by this, we propose the multi-agent voting framework. We design three LLM-based agents that simulate different levels of staff in a team, and assign the available tools according to the levels. Each agent provides the corresponding answer, and finally all the answers provided by the agents are voted to get the final answer. Experiments on OK-VQA and A-OKVQA show that our approach outperforms other baselines by 2.2 and 1.0, respectively. 559 |
560 | 561 | - **Challenges Faced by Large Language Models in Solving Multi-Agent Flocking** 562 | Peihan Li, Vishnu Menon, Bhavanaraj Gudiguntla, Daniel Ting, Lifeng Zhou 563 | unknown 2024 564 | [open paper page](https://api.semanticscholar.org/CorpusId:269004483) 565 |
566 | Abstract 567 | Flocking is a behavior where multiple agents in a system attempt to stay close to each other while avoiding collision and maintaining a desired formation. This is observed in the natural world and has applications in robotics, including natural disaster search and rescue, wild animal tracking, and perimeter surveillance and patrol. Recently, large language models (LLMs) have displayed an impressive ability to solve various collaboration tasks as individual decision-makers. Solving multi-agent flocking with LLMs would demonstrate their usefulness in situations requiring spatial and decentralized decision-making. Yet, when LLM-powered agents are tasked with implementing multi-agent flocking, they fall short of the desired behavior. After extensive testing, we find that agents with LLMs as individual decision-makers typically opt to converge on the average of their initial positions or diverge from each other. After breaking the problem down, we discover that LLMs cannot understand maintaining a shape or keeping a distance in a meaningful way. Solving multi-agent flocking with LLMs would enhance their ability to understand collaborative spatial reasoning and lay a foundation for addressing more complex multi-agent tasks. This paper discusses the challenges LLMs face in multi-agent flocking and suggests areas for future improvement and research. 568 |
569 | 570 | - **AgentCourt: Simulating Court with Adversarial Evolvable Lawyer Agents** 571 | Guhong Chen, Liyang Fan, Zihan Gong, Nan Xie, Zixuan Li, Ziqiang Liu, Chengming Li, Qiang Qu, Shiwen Ni, Min Yang 572 | unknown 2024 573 | [open paper page](https://api.semanticscholar.org/CorpusId:271874640) 574 |
575 | Abstract 576 | In this paper, we present a simulation system called AgentCourt that simulates the entire courtroom process. The judge, plaintiff's lawyer, defense lawyer, and other participants are autonomous agents driven by large language models (LLMs). Our core goal is to enable lawyer agents to learn how to argue a case, as well as improving their overall legal skills, through courtroom process simulation. To achieve this goal, we propose an adversarial evolutionary approach for the lawyer-agent. Since AgentCourt can simulate the occurrence and development of court hearings based on a knowledge base and LLM, the lawyer agents can continuously learn and accumulate experience from real court cases. The simulation experiments show that after two lawyer-agents have engaged in a thousand adversarial legal cases in AgentCourt (which can take a decade for real-world lawyers), compared to their pre-evolutionary state, the evolved lawyer agents exhibit consistent improvement in their ability to handle legal tasks. To enhance the credibility of our experimental results, we enlisted a panel of professional lawyers to evaluate our simulations. The evaluation indicates that the evolved lawyer agents exhibit notable advancements in responsiveness, as well as expertise and logical rigor. This work paves the way for advancing LLM-driven agent technology in legal scenarios. Code is available at https://github.com/relic-yuexi/AgentCourt. 577 |
578 | 579 | - **MiniFed : Integrating LLM-based Agentic-Workflow for Simulating FOMC Meeting** 580 | Sungil Seok, Shuide Wen, Qiyuan Yang, Juan Feng, Wenming Yang 581 | unknown 2024 582 | [open paper page](https://api.semanticscholar.org/CorpusId:273532286) 583 |
584 | Abstract 585 | The Federal Funds rate in the United States plays a significant role in both domestic and international financial markets. However, research has predominantly focused on the effects of adjustments to the Federal Funds rate rather than on the decision-making process itself. Recent advancements in large language models (LLMs) offer a potential method for reconstructing the original FOMC meetings, which are responsible for setting the Federal Funds rate. In this paper, we propose a five-stage FOMC meeting simulation framework, MiniFed, which employs LLM agents to simulate real-world FOMC meeting members and optimize the FOMC structure. This framework effectively revitalizes the FOMC meeting process and facilitates projections of the Federal Funds rate. Experimental results demonstrate that our proposed MiniFed framework achieves both high accuracy in Federal Funds rate projections and behavioral alignment with the agents' real-world counterparts. Given that few studies have focused on employing LLM agents to simulate large-scale real-world conferences, our work can serve as a benchmark for future developments. 586 |
587 | 588 | - **Zodiac: A Cardiologist-Level LLM Framework for Multi-Agent Diagnostics** 589 | Mengya Song, Alice Zheng, Yiwen Lu, Zhaohan Xi, Yong Chen, Zhiheng Liu, Yuan Zhou, Peng Zhang 590 | arXiv.org 2024 591 | [open paper page](https://api.semanticscholar.org/CorpusId:273098806) 592 |
593 | Abstract 594 | Large language models (LLMs) have demonstrated remarkable progress in healthcare. However, a significant gap remains regarding LLMs' professionalism in domain-specific clinical practices, limiting their application in real-world diagnostics. In this work, we introduce ZODIAC, an LLM-powered framework with cardiologist-level professionalism designed to engage LLMs in cardiological diagnostics. ZODIAC assists cardiologists by extracting clinically relevant characteristics from patient data, detecting significant arrhythmias, and generating preliminary reports for the review and refinement by cardiologists. To achieve cardiologist-level professionalism, ZODIAC is built on a multi-agent collaboration framework, enabling the processing of patient data across multiple modalities. Each LLM agent is fine-tuned using real-world patient data adjudicated by cardiologists, reinforcing the model's professionalism. ZODIAC undergoes rigorous clinical validation with independent cardiologists, evaluated across eight metrics that measure clinical effectiveness and address security concerns. Results show that ZODIAC outperforms industry-leading models, including OpenAI's GPT-4o, Meta's Llama-3.1-405B, and Google's Gemini-pro, as well as medical-specialist LLMs like Microsoft's BioGPT. ZODIAC demonstrates the transformative potential of specialized LLMs in healthcare by delivering domain-specific solutions that meet the stringent demands of medical practice. Notably, ZODIAC has been successfully integrated into electrocardiography (ECG) devices, exemplifying the growing trend of embedding LLMs into Software-as-Medical-Device (SaMD). 595 |
596 | 597 | - **Human-In-the-Loop Software Development Agents** 598 | Kun Chen, Ming Wu, Patanamon Thongtanunam, Ruixiong Zhang, Chakkrit Kla Tantithamthavorn, Jirat Pasuksmit, Jing Li, Fan Jiang, Wannita Takerngsaksiri, Evan Cook 599 | 2024 600 | [open paper page](https://api.semanticscholar.org/CorpusId:274150430) 601 |
602 | Abstract 603 | Recently, Large Language Model (LLM)-based multi-agent paradigms for software engineering have been introduced to automatically resolve software development tasks (e.g., from a given issue to source code). However, existing work is evaluated based on historical benchmark datasets, does not consider human feedback at each stage of the automated software development process, and has not been deployed in practice. In this paper, we introduce a Human-in-the-loop LLM-based Agents framework (HULA) for software development that allows software engineers to refine and guide LLMs when generating coding plans and source code for a given task. We design, implement, and deploy the HULA framework into Atlassian JIRA for internal use. Through a multi-stage evaluation of the HULA framework, Atlassian software engineers perceive that HULA can minimize the overall development time and effort, especially in initiating a coding plan and writing code for straightforward tasks. On the other hand, challenges around code quality remain to be solved in some cases. We draw lessons learned and discuss opportunities for future work, which will pave the way for the advancement of LLM-based agents in software development. 604 |
605 | 606 | - **Collab-Overcooked: Benchmarking and Evaluating Large Language Models as Collaborative Agents** 607 | Haochen Sun, Shuwen Zhang, Lei Ren, Hao Xu, Hao Fu, Caixia Yuan, Xiaojie Wang 608 | unknown 2025 609 | [open paper page](https://api.semanticscholar.org/CorpusId:276647196) 610 |
611 | Abstract 612 | Large language model (LLM)-based agent systems have made great strides in real-world applications beyond traditional NLP tasks. This paper proposes a new LLM-powered Multi-Agent System (LLM-MAS) benchmark, Collab-Overcooked, built on the popular Overcooked-AI game with more applicable and challenging tasks in interactive environments. Collab-Overcooked extends existing benchmarks from two novel perspectives. First, it provides a multi-agent framework supporting diverse tasks and objectives and encourages collaboration through natural language communication. Second, it introduces a spectrum of process-oriented evaluation metrics to assess the fine-grained collaboration capabilities of different LLM agents, a dimension often overlooked in prior work. We conduct extensive experiments over 10 popular LLMs and show that, while the LLMs present a strong ability in goal interpretation, there is a significant discrepancy in active collaboration and continuous adaptation that are critical for efficiently fulfilling complicated tasks. Notably, we highlight the strengths and weaknesses in LLM-MAS and provide insights for improving and evaluating LLM-MAS on a unified and open-sourced benchmark. Environments, 30 open-ended tasks, and an integrated evaluation package are now publicly available at https://github.com/YusaeMeow/Collab-Overcooked. 613 |
614 | 615 | - **MALT: Improving Reasoning with Multi-Agent LLM Training** 616 | S. Motwani, Chandler Smith, Rocktim Jyoti Das, Markian Rybchuk, Philip Torr, Ivan Laptev, Fabio Pizzati, Ronald Clark, Christian Schröder de Witt 617 | arXiv.org 2024 618 | [open paper page](https://arxiv.org/pdf/2412.01928.pdf) 619 |
620 | Abstract 621 | Large Language Models (LLMs) often produce answers with a single chain-of-thought, which restricts their ability to explore reasoning paths or self-correct flawed outputs in complex tasks. In this paper, we introduce MALT (Multi-Agent LLM Training), a novel post-training strategy that divides the reasoning process into generation, verification, and refinement steps using a sequential pipeline of heterogeneous agents. During data generation, each agent is repeatedly sampled to form a multi-agent search tree, where final outputs are graded against ground-truth data. We then apply value iteration to propagate reward signals back to each role-conditioned model, automatically producing multi-agent post-training data without human or teacher-model supervision. Our off-policy approach allows each agent to specialize by learning from correct and incorrect trajectories, ultimately improving the end-to-end reasoning chain. On MATH, GSM8K, and CSQA, MALT surpasses the same baseline LLM with a relative improvement of 15.66%, 7.42%, and 9.40% respectively, making it an important advance towards multi-agent cooperative training. 622 |
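MALT grades the leaves of a generator-verifier-refiner search tree against ground truth and propagates the signal back to each role-conditioned model. The sketch below shows only the backup step, using a plain mean over children as a simplified stand-in for the paper's value-iteration credit assignment.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

@dataclass
class Node:
    role: str                       # "generator", "verifier", or "refiner"
    text: str
    children: List["Node"] = field(default_factory=list)
    value: Optional[float] = None   # filled in by backup()

def backup(node: Node, is_correct: Callable[[str], bool]) -> float:
    """Leaves score 1/0 against ground truth; internal nodes average their children."""
    if not node.children:
        node.value = 1.0 if is_correct(node.text) else 0.0
    else:
        node.value = sum(backup(child, is_correct) for child in node.children) / len(node.children)
    return node.value
```

Trajectories whose nodes score above a threshold would then become positive post-training data for the corresponding role's model.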
623 | 624 | - **TRIZ Agents: A Multi-Agent LLM Approach for TRIZ-Based Innovation** 625 | Kamil Szczepanik, Jarosław A. Chudziak 626 | Proceedings of the 17th International Conference on Agents and Artificial Intelligence 2025 627 | [open paper page](https://doi.org/10.5220/0013321900003890) 628 |
629 | Abstract 630 | 631 |
632 | 633 | - **Enhancing Anomaly Detection in Financial Markets with an LLM-based Multi-Agent Framework** 634 | Taejin Park 635 | unknown 2024 636 | [open paper page](https://arxiv.org/pdf/2403.19735.pdf) 637 |
638 | Abstract 639 | This paper introduces a Large Language Model (LLM)-based multi-agent framework designed to enhance anomaly detection within financial market data, tackling the longstanding challenge of manually verifying system-generated anomaly alerts. The framework harnesses a collaborative network of AI agents, each specialised in distinct functions including data conversion, expert analysis via web research, institutional knowledge utilization or cross-checking and report consolidation and management roles. By coordinating these agents towards a common objective, the framework provides a comprehensive and automated approach for validating and interpreting financial data anomalies. I analyse the S&P 500 index to demonstrate the framework's proficiency in enhancing the efficiency, accuracy and reduction of human intervention in financial market monitoring. The integration of AI's autonomous functionalities with established analytical methods not only underscores the framework's effectiveness in anomaly detection but also signals its broader applicability in supporting financial market monitoring. 640 |
641 | 642 | - **PlotGen: Multi-Agent LLM-based Scientific Data Visualization via Multimodal Feedback** 643 | Kanika Goswami, Puneet Mathur, Ryan Rossi, Franck Dernoncourt 644 | arXiv.org 2025 645 | [open paper page](https://arxiv.org/pdf/2502.00988.pdf) 646 |
647 | Abstract 648 | Scientific data visualization is pivotal for transforming raw data into comprehensible visual representations, enabling pattern recognition, forecasting, and the presentation of data-driven insights. However, novice users often face difficulties due to the complexity of selecting appropriate tools and mastering visualization techniques. Large Language Models (LLMs) have recently demonstrated potential in assisting code generation, though they struggle with accuracy and require iterative debugging. In this paper, we propose PlotGen, a novel multi-agent framework aimed at automating the creation of precise scientific visualizations. PlotGen orchestrates multiple LLM-based agents, including a Query Planning Agent that breaks down complex user requests into executable steps, a Code Generation Agent that converts pseudocode into executable Python code, and three retrieval feedback agents - a Numeric Feedback Agent, a Lexical Feedback Agent, and a Visual Feedback Agent - that leverage multimodal LLMs to iteratively refine the data accuracy, textual labels, and visual correctness of generated plots via self-reflection. Extensive experiments show that PlotGen outperforms strong baselines, achieving a 4-6 percent improvement on the MatPlotBench dataset, leading to enhanced user trust in LLM-generated visualizations and improved novice productivity due to a reduction in debugging time needed for plot errors. 649 |
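PlotGen's refinement cycle is a code-generation loop closed by three feedback agents. A compact sketch of that loop follows (hypothetical callables, not the authors' agent classes), where an empty feedback string means the critic is satisfied:

```python
from typing import Callable, List

def refine_plot(
    generate_code: Callable[[str, List[str]], str],    # (request, accumulated feedback) -> plotting code
    feedback_agents: List[Callable[[str, str], str]],  # numeric / lexical / visual critics; "" = satisfied
    request: str,
    max_iters: int = 3,
) -> str:
    feedback: List[str] = []
    code = generate_code(request, feedback)
    for _ in range(max_iters):
        notes = [agent(request, code) for agent in feedback_agents]
        feedback = [note for note in notes if note]
        if not feedback:            # all critics satisfied: stop refining
            break
        code = generate_code(request, feedback)
    return code
```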
650 | 651 | - **Reliable Decision-Making for Multi-Agent LLM Systems** 652 | Xian Yeow Lee, Shunichi Akatsuka, Lasitha Vidyaratne, Aman Kumar, Ahmed Farahat, Chetan Gupta 653 | unknown 2025 654 | [open paper page](https://multiagents.org/2025_artifacts/reliable_decision_making_for_multi_agent_llm_systems.pdf) 655 |
656 | Abstract 657 | 658 |
659 | 660 | - **Self-Adaptive Large Language Model (LLM)-Based Multiagent Systems** 661 | N. Nascimento, Paulo Alencar, Donald D. Cowan 662 | 2023 IEEE International Conference on Autonomic Computing and Self-Organizing Systems Companion (ACSOS-C) 2023 663 | [open paper page](http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10336211) 664 |
665 | Abstract 666 | The complexity of managing multiagent systems (MASs) in autonomic computing can be mitigated using a self-adaptation approach, where systems are equipped to monitor and adjust themselves based on specific concerns. Communication in these systems is key given that in scenarios involving agent interaction, it enhances cooperation and reduces coordination challenges by enabling direct, clear information exchange. However, the tasks of boosting communication expressiveness within MASs and logically processing a multitude of variables in dynamic environments are still challenging. This paper presents a novel strategy: integrating large language models (LLMs) like GPT-based technologies into MASs to boost communication and agent autonomy. Our proposal encompasses the development of a novel LLM/GPT-based agent architecture, focusing not only on advanced conversation features but also on the reasoning and decision-making capacities of these models. This is grounded in the MAPE-K model, known for supporting system adaptability in dynamic environments. We illustrate our approach through a marketplace scenario. This work represents a paradigm shift in MAS self-adaptation, utilizing LLMs' capabilities and indicating further research opportunities to assess LLMs' applicability in more complex MAS scenarios. This could pave the way for more potent problem-solving capabilities and refined communication within MASs. 667 |
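The architecture above grounds the LLM agent in the classic MAPE-K loop. The skeleton below shows that loop shape only; which stages are LLM-backed (here, `analyze` and `plan`) is an assumption for illustration.

```python
from typing import Callable, Dict

def mape_k_loop(
    monitor: Callable[[], Dict],            # sense the managed system / environment
    analyze: Callable[[Dict, Dict], str],   # LLM-backed reasoning over observations + knowledge
    plan: Callable[[str, Dict], str],       # LLM-backed choice of an adaptation action
    execute: Callable[[str], None],         # act on the managed system
    knowledge: Dict,                        # the shared "K" that every stage reads and writes
    iterations: int = 3,
) -> None:
    for _ in range(iterations):
        observation = monitor()
        diagnosis = analyze(observation, knowledge)
        action = plan(diagnosis, knowledge)
        execute(action)
        knowledge["last_action"] = action   # persist context for the next adaptation cycle
```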
668 | 669 | - **LLM-based Multi-Agent Reinforcement Learning: Current and Future Directions** 670 | Chuanneng Sun, Songjun Huang, D. Pompili 671 | unknown 2024 672 | [open paper page](https://api.semanticscholar.org/CorpusId:269921354) 673 |
674 | Abstract 675 | In recent years, Large Language Models (LLMs) have shown great abilities in various tasks, including question answering, arithmetic problem solving, and poem writing, among others. Although research on LLM-as-an-agent has shown that LLM can be applied to Reinforcement Learning (RL) and achieve decent results, the extension of LLM-based RL to Multi-Agent System (MAS) is not trivial, as many aspects, such as coordination and communication between agents, are not considered in the RL frameworks of a single agent. To inspire more research on LLM-based MARL, in this letter, we survey the existing LLM-based single-agent and multi-agent RL frameworks and provide potential research directions for future research. In particular, we focus on the cooperative tasks of multiple agents with a common goal and communication among them. We also consider human-in/on-the-loop scenarios enabled by the language component in the framework. 676 |
677 | 678 | - **A survey on LLM-based multi-agent systems: workflow, infrastructure, and challenges** 679 | Xinyi Li, Sai Wang, Siqi Zeng, Yu Wu, Yi Yang 680 | unknown 2024 681 | [open paper page](https://api.semanticscholar.org/CorpusId:273218743) 682 |
683 | Abstract 684 | The pursuit of more intelligent and credible autonomous systems, akin to human society, has been a long-standing endeavor for humans. Leveraging the exceptional reasoning and planning capabilities of large language models (LLMs), LLM-based agents have been proposed and have achieved remarkable success across a wide array of tasks. Notably, LLM-based multi-agent systems (MAS) are considered a promising pathway towards realizing general artificial intelligence that is equivalent to or surpasses human-level intelligence. In this paper, we present a comprehensive survey of these studies, offering a systematic review of LLM-based MAS. Adhering to the workflow of LLM-based multi-agent systems, we synthesize a general structure encompassing five key components: profile, perception, self-action, mutual interaction, and evolution. This unified framework encapsulates much of the previous work in the field. Furthermore, we illuminate the extensive applications of LLM-based MAS in two principal areas: problem-solving and world simulation. Finally, we discuss in detail several contemporary challenges and provide insights into potential future directions in this domain. 685 |
686 | 687 | - **LLM-Based Multi-Agent Systems for Software Engineering: Vision and the Road Ahead** 688 | Junda He, Christoph Treude, David Lo 689 | arXiv.org 2024 690 | [open paper page](https://api.semanticscholar.org/CorpusId:269005211) 691 |
692 | Abstract 693 | 694 |
695 | 696 | - **Multi-Agent Collaboration Mechanisms: A Survey of LLMs** 697 | Khanh-Tung Tran, Dung Dao, Minh-Duong Nguyen, Quoc-Viet Pham, Barry O’Sullivan, Hoang D. Nguyen 698 | unknown 2025 699 | [open paper page](https://api.semanticscholar.org/CorpusId:275471465) 700 |
701 | Abstract 702 | With recent advances in Large Language Models (LLMs), Agentic AI has become phenomenal in real-world applications, moving toward multiple LLM-based agents to perceive, learn, reason, and act collaboratively. These LLM-based Multi-Agent Systems (MASs) enable groups of intelligent agents to coordinate and solve complex tasks collectively at scale, transitioning from isolated models to collaboration-centric approaches. This work provides an extensive survey of the collaborative aspect of MASs and introduces an extensible framework to guide future research. Our framework characterizes collaboration mechanisms based on key dimensions: actors (agents involved), types (e.g., cooperation, competition, or coopetition), structures (e.g., peer-to-peer, centralized, or distributed), strategies (e.g., role-based or model-based), and coordination protocols. Through a review of existing methodologies, our findings serve as a foundation for demystifying and advancing LLM-based MASs toward more intelligent and collaborative solutions for complex, real-world use cases. In addition, various applications of MASs across diverse domains, including 5G/6G networks, Industry 5.0, question answering, and social and cultural settings, are also investigated, demonstrating their wider adoption and broader impacts. Finally, we identify key lessons learned, open challenges, and potential research directions of MASs towards artificial collective intelligence. 703 |
704 | 705 | - **LLM-Based Multi-Agent Systems for Software Engineering: Literature Review, Vision and the Road Ahead** 706 | Junda He, Christoph Treude, David Lo 707 | 2024 708 | [open paper page](https://api.semanticscholar.org/CorpusId:274965784) 709 |
710 | Abstract 711 | Integrating Large Language Models (LLMs) into autonomous agents marks a significant shift in the research landscape by offering cognitive abilities that are competitive with human planning and reasoning. This paper explores the transformative potential of integrating Large Language Models into Multi-Agent (LMA) systems for addressing complex challenges in software engineering (SE). By leveraging the collaborative and specialized abilities of multiple agents, LMA systems enable autonomous problem-solving, improve robustness, and provide scalable solutions for managing the complexity of real-world software projects. In this paper, we conduct a systematic review of recent primary studies to map the current landscape of LMA applications across various stages of the software development lifecycle (SDLC). To illustrate current capabilities and limitations, we perform two case studies to demonstrate the effectiveness of state-of-the-art LMA frameworks. Additionally, we identify critical research gaps and propose a comprehensive research agenda focused on enhancing individual agent capabilities and optimizing agent synergy. Our work outlines a forward-looking vision for developing fully autonomous, scalable, and trustworthy LMA systems, laying the foundation for the evolution of Software Engineering 2.0. 712 |
713 | 714 | - **UFO: A UI-Focused Agent for Windows OS Interaction** 715 | Chaoyun Zhang, Liqun Li, Shilin He, Xu Zhang, Bo Qiao, Si Qin, Ming-Jie Ma, Yu Kang, Qingwei Lin, S. Rajmohan, Dongmei Zhang, Qi Zhang 716 | arXiv.org 2024 717 | [open paper page](https://api.semanticscholar.org/CorpusId:267636625) 718 |
719 | Abstract 720 | We introduce UFO, an innovative UI-Focused agent to fulfill user requests tailored to applications on Windows OS, harnessing the capabilities of GPT-Vision. UFO employs a dual-agent framework to meticulously observe and analyze the graphical user interface (GUI) and control information of Windows applications. This enables the agent to seamlessly navigate and operate within individual applications and across them to fulfill user requests, even when spanning multiple applications. The framework incorporates a control interaction module, facilitating action grounding without human intervention and enabling fully automated execution. Consequently, UFO transforms arduous and time-consuming processes into simple tasks achievable solely through natural language commands. We conducted testing of UFO across 9 popular Windows applications, encompassing a variety of scenarios reflective of users' daily usage. The results, derived from both quantitative metrics and real-case studies, underscore the superior effectiveness of UFO in fulfilling user requests. To the best of our knowledge, UFO stands as the first UI agent specifically tailored for task completion within the Windows OS environment. The open-source code for UFO is available on https://github.com/microsoft/UFO. 721 |
722 | 723 | - **Multi-Agent Autonomous Driving Systems with Large Language Models: A Survey of Recent Advances** 724 | Yaozu Wu, Dongyuan Li, Yankai Chen, Renhe Jiang, Henry Peng Zou, Liancheng Fang, Zhen Wang, Philip S. Yu 725 | unknown 2025 726 | [open paper page](https://api.semanticscholar.org/CorpusId:276576185) 727 |
728 | Abstract 729 | Autonomous Driving Systems (ADSs) are revolutionizing transportation by reducing human intervention, improving operational efficiency, and enhancing safety. Large Language Models (LLMs), known for their exceptional planning and reasoning capabilities, have been integrated into ADSs to assist with driving decision-making. However, LLM-based single-agent ADSs face three major challenges: limited perception, insufficient collaboration, and high computational demands. To address these issues, recent advancements in LLM-based multi-agent ADSs have focused on improving inter-agent communication and cooperation. This paper provides a frontier survey of LLM-based multi-agent ADSs. We begin with a background introduction to related concepts, followed by a categorization of existing LLM-based approaches based on different agent interaction modes. We then discuss agent-human interactions in scenarios where LLM-based agents engage with humans. Finally, we summarize key applications, datasets, and challenges in this field to support future research (https://anonymous.4open.science/r/LLM-based_Multi-agent_ADS-3A5C/README.md). 730 |
731 | 732 | - **KoMA: Knowledge-driven Multi-agent Framework for Autonomous Driving with Large Language Models** 733 | Kemou Jiang, Xuan Cai, Zhiyong Cui, Aoyong Li, Yilong Ren, Haiyang Yu, Hao Yang, Daocheng Fu, Licheng Wen, Pinlong Cai 734 | unknown 2024 735 | [open paper page](https://api.semanticscholar.org/CorpusId:271310037) 736 |
737 | Abstract 738 | Large language models (LLMs) as autonomous agents offer a novel avenue for tackling real-world challenges through a knowledge-driven manner. These LLM-enhanced methodologies excel in generalization and interpretability. However, the complexity of driving tasks often necessitates the collaboration of multiple, heterogeneous agents, underscoring the need for such LLM-driven agents to engage in cooperative knowledge sharing and cognitive synergy. Despite the promise of LLMs, current applications predominantly center around single agent scenarios. To broaden the horizons of knowledge-driven strategies and bolster the generalization capabilities of autonomous agents, we propose the KoMA framework consisting of multi-agent interaction, multi-step planning, shared-memory, and ranking-based reflection modules to enhance multi-agents' decision-making in complex driving scenarios. Based on the framework's generated text descriptions of driving scenarios, the multi-agent interaction module enables LLM agents to analyze and infer the intentions of surrounding vehicles, akin to human cognition. The multi-step planning module enables LLM agents to analyze and obtain final action decisions layer by layer to ensure consistent goals for short-term action decisions. The shared memory module can accumulate collective experience to make superior decisions, and the ranking-based reflection module can evaluate and improve agent behavior with the aim of enhancing driving safety and efficiency. The KoMA framework not only enhances the robustness and adaptability of autonomous driving agents but also significantly elevates their generalization capabilities across diverse scenarios. Empirical results demonstrate the superiority of our approach over traditional methods, particularly in its ability to handle complex, unpredictable driving environments without extensive retraining. 739 |
740 | 741 | - **LLM-based Multi-Agent Systems: Techniques and Business Perspectives** 742 | Yingxuan Yang, Qiuying Peng, Jun Wang, Weinan Zhang 743 | unknown 2024 744 | [open paper page](https://api.semanticscholar.org/CorpusId:274165614) 745 |
746 | Abstract 747 | In the era of (multi-modal) large language models, most operational processes can be reformulated and reproduced using LLM agents. The LLM agents can perceive, control, and get feedback from the environment so as to accomplish the given tasks in an autonomous manner. Besides the environment-interaction property, the LLM agents can call various external tools to ease the task completion process. The tools can be regarded as a predefined operational process with private or real-time knowledge that does not exist in the parameters of LLMs. As a natural trend of development, the tools for calling are becoming autonomous agents, thus the full intelligent system turns out to be a LLM-based Multi-Agent System (LaMAS). Compared to the previous single-LLM-agent system, LaMAS has the advantages of i) dynamic task decomposition and organic specialization, ii) higher flexibility for system changing, iii) proprietary data preserving for each participating entity, and iv) feasibility of monetization for each entity. This paper discusses the technical and business landscapes of LaMAS. To support the ecosystem of LaMAS, we provide a preliminary version of such LaMAS protocol considering technical requirements, data privacy, and business incentives. As such, LaMAS would be a practical solution to achieve artificial collective intelligence in the near future. 748 |
749 | 750 | - **Prompt Engineering Through the Lens of Optimal Control** 751 | Yifan Luo, Yiming Tang, Chengfeng Shen, Zhennan Zhou, Bin Dong 752 | unknown 2023 753 | [open paper page](https://api.semanticscholar.org/CorpusId:264426100) 754 |
755 | Abstract 756 | Prompt Engineering (PE) has emerged as a critical technique for guiding Large Language Models (LLMs) in solving intricate tasks. Its importance is highlighted by its potential to significantly enhance the efficiency and effectiveness of human-machine interaction. As tasks grow increasingly complex, recent advanced PE methods have extended beyond the limitations of single-round interactions to embrace multi-round interactions, which allows for a deeper and more nuanced engagement with LLMs. In this paper, we propose an optimal control framework tailored for multi-round interactions with LLMs. This framework provides a unified mathematical structure that not only systematizes the existing PE methods but also sets the stage for rigorous analytical improvements. Furthermore, we extend this framework to include PE via ensemble methods and multi-agent collaboration, thereby enlarging the scope of applicability. By adopting an optimal control perspective, we offer fresh insights into existing PE methods and highlight theoretical challenges that warrant future research. Besides, our work lays a foundation for the development of more effective and interpretable PE methods. 757 |
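A minimal sketch of the optimal-control reading of multi-round prompting, assuming the state is the dialogue history, the control is the prompt chosen at each round, and a one-step-lookahead (greedy) controller. The LLM call and cost function are stubs, not the paper's formulation:

```python
# Hypothetical sketch: multi-round prompting as a control problem (illustrative only).
# State x_t = dialogue history; control u_t = prompt at round t;
# dynamics: x_{t+1} = x_t + [u_t, LLM(x_t, u_t)]; objective: minimize a loss on the final answer.

def llm(history: list, prompt: str) -> str:
    """Placeholder LLM call."""
    return f"response to {prompt!r}"

def loss(history: list, target: str) -> float:
    """Placeholder cost, e.g. whether the last answer contains the target."""
    return 0.0 if history and target in history[-1] else 1.0

def greedy_control(candidate_prompts, target, rounds=3):
    """One-step-lookahead controller over candidate prompts per round."""
    history = []
    for _ in range(rounds):
        # Evaluate each candidate control by simulating one dynamics step.
        scored = []
        for u in candidate_prompts:
            y = llm(history, u)
            scored.append((loss(history + [u, y], target), u, y))
        cost, u_best, y_best = min(scored)
        history += [u_best, y_best]
        if cost == 0.0:
            break
    return history

if __name__ == "__main__":
    print(greedy_control(["think step by step", "give the final answer"], target="42"))
```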
758 | 759 | - **Multi-Agent Coordination across Diverse Applications: A Survey** 760 | Lijun Sun, Yijun Yang, Qiqi Duan, Yuhui Shi, Chao Lyu, Yu-Cheng Chang, Chin-Teng Lin, Yang Shen 761 | unknown 2025 762 | [open paper page](https://api.semanticscholar.org/CorpusId:276482652) 763 |
764 | Abstract 765 | Multi-agent coordination studies the underlying mechanism enabling the trending spread of diverse multi-agent systems (MAS) and has received increasing attention, driven by the expansion of emerging applications and rapid AI advances. This survey outlines the current state of coordination research across applications through a unified understanding that answers four fundamental coordination questions: (1) what is coordination; (2) why coordination; (3) who to coordinate with; and (4) how to coordinate. Our purpose is to explore existing ideas and expertise in coordination and their connections across diverse applications, while identifying and highlighting emerging and promising research directions. First, general coordination problems that are essential to varied applications are identified and analyzed. Second, a number of MAS applications are surveyed, ranging from widely studied domains, e.g., search and rescue, warehouse automation and logistics, and transportation systems, to emerging fields including humanoid and anthropomorphic robots, satellite systems, and large language models (LLMs). Finally, open challenges about the scalability, heterogeneity, and learning mechanisms of MAS are analyzed and discussed. In particular, we identify the hybridization of hierarchical and decentralized coordination, human-MAS coordination, and LLM-based MAS as promising future directions. 766 |
767 | 768 | - **A Survey on LLM-based Multi-Agent System: Recent Advances and New Frontiers in Application** 769 | Shuaihang Chen, Yuanxing Liu, Wei Han, Weinan Zhang, Ting Liu 770 | unknown 2024 771 | [open paper page](https://api.semanticscholar.org/CorpusId:274981589) 772 |
773 | Abstract 774 | LLM-based Multi-Agent Systems (LLM-MAS) have become a research hotspot since the rise of large language models (LLMs). However, with the continuous influx of new related works, the existing reviews struggle to capture them comprehensively. This paper presents a comprehensive survey of these studies. We first discuss the definition of LLM-MAS, a framework encompassing much of previous work. We provide an overview of the various applications of LLM-MAS in (i) solving complex tasks, (ii) simulating specific scenarios, and (iii) evaluating generative agents. Building on previous studies, we also highlight several challenges and propose future directions for research in this field. 775 |
776 | 777 | - **AutoML-Agent: A Multi-Agent LLM Framework for Full-Pipeline AutoML** 778 | Patara Trirat, Wonyong Jeong, Sung Ju Hwang 779 | arXiv.org 2024 780 | [open paper page](https://arxiv.org/pdf/2410.02958.pdf) 781 |
782 | Abstract 783 | Automated machine learning (AutoML) accelerates AI development by automating tasks in the development pipeline, such as optimal model search and hyperparameter tuning. Existing AutoML systems often require technical expertise to set up complex tools, which is in general time-consuming and requires a large amount of human effort. Therefore, recent works have started exploiting large language models (LLMs) to lessen this burden and increase the usability of AutoML frameworks via a natural language interface, allowing non-expert users to build their data-driven solutions. These methods, however, are usually designed only for a particular process in the AI development pipeline and do not efficiently use the inherent capacity of the LLMs. This paper proposes AutoML-Agent, a novel multi-agent framework tailored for full-pipeline AutoML, i.e., from data retrieval to model deployment. AutoML-Agent takes the user's task description, facilitates collaboration between specialized LLM agents, and delivers deployment-ready models. Unlike existing work, instead of devising a single plan, we introduce a retrieval-augmented planning strategy to enhance exploration and search for more optimal plans. We also decompose each plan into sub-tasks (e.g., data preprocessing and neural network design), each of which is solved by a specialized agent built via prompting and executed in parallel, making the search process more efficient. Moreover, we propose a multi-stage verification to verify executed results and guide the code generation LLM in implementing successful solutions. Extensive experiments on seven downstream tasks using fourteen datasets show that AutoML-Agent achieves a higher success rate in automating the full AutoML process, yielding systems with good performance across diverse domains. 784 |
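A minimal sketch of the full-pipeline flow described above (retrieval-augmented planning, parallel prompted sub-task agents, multi-stage verification). Every function here is a stub and the names are invented; this is not the AutoML-Agent code:

```python
# Hypothetical sketch of an AutoML-Agent-style pipeline (illustrative only).
from concurrent.futures import ThreadPoolExecutor

def llm(prompt: str) -> str:
    """Placeholder LLM call."""
    return "stub output"

def retrieve_plans(task: str, k: int = 3) -> list:
    # Retrieval-augmented planning: pull k candidate pipeline plans from a plan store (stubbed).
    return [f"plan {i} for {task}" for i in range(k)]

def run_subtask(name: str, plan: str) -> str:
    # A specialized prompted agent (data prep, model search, ...) executes one sub-task.
    return llm(f"You are the {name} agent. Execute your part of: {plan}")

def verify(stage: str, output: str) -> bool:
    # Multi-stage verification: check each stage's result before assembling (stubbed as pass).
    return bool(output)

def automl_agent(task: str) -> str:
    for plan in retrieve_plans(task):                      # explore several plans, not just one
        subtasks = ["data_preprocessing", "model_search", "hyperparameter_tuning"]
        with ThreadPoolExecutor() as pool:                 # sub-task agents run in parallel
            results = list(pool.map(lambda s: run_subtask(s, plan), subtasks))
        if all(verify(s, r) for s, r in zip(subtasks, results)):
            return llm(f"Assemble a deployable pipeline from: {results}")
    raise RuntimeError("no verified plan found")

if __name__ == "__main__":
    print(automl_agent("predict churn from tabular data"))
```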
785 | 786 | - **A Survey on Large Language Model based Autonomous Agents** 787 | Lei Wang, Chengbang Ma, Xueyang Feng, Zeyu Zhang, Hao-ran Yang, Jingsen Zhang, Zhi-Yang Chen, Jiakai Tang, Xu Chen, Yankai Lin, Wayne Xin Zhao, Zhewei Wei, Ji-rong Wen 788 | Frontiers Comput. Sci. 2023 789 | [open paper page](https://arxiv.org/pdf/2308.11432.pdf) 790 |
791 | Abstract 792 | Autonomous agents have long been a research focus in academic and industry communities. Previous research often focuses on training agents with limited knowledge within isolated environments, which diverges significantly from human learning processes and makes it hard for the agents to achieve human-like decisions. Recently, through the acquisition of vast amounts of Web knowledge, large language models (LLMs) have shown potential in human-level intelligence, leading to a surge in research on LLM-based autonomous agents. In this paper, we present a comprehensive survey of these studies, delivering a systematic review of LLM-based autonomous agents from a holistic perspective. We first discuss the construction of LLM-based autonomous agents, proposing a unified framework that encompasses much of previous work. Then, we present an overview of the diverse applications of LLM-based autonomous agents in social science, natural science, and engineering. Finally, we delve into the evaluation strategies commonly used for LLM-based autonomous agents. Based on the previous studies, we also present several challenges and future directions in this field. 793 |
794 | 795 | - **Large Language Models: A Survey** 796 | Jianfeng Gao, Xavier Amatriain, Tomáš Mikolov, Meysam Asgari Chenaghlu, Narjes Nikzad, Shervin Minaee, Richard Socher 797 | arXiv.org 2024 798 | [open paper page](https://api.semanticscholar.org/CorpusId:267617032) 799 |
800 | Abstract 801 | Large Language Models (LLMs) have drawn a lot of attention due to their strong performance on a wide range of natural language tasks since the release of ChatGPT in November 2022. LLMs' ability for general-purpose language understanding and generation is acquired by training billions of model parameters on massive amounts of text data, as predicted by scaling laws \cite{kaplan2020scaling,hoffmann2022training}. The research area of LLMs, while very recent, is evolving rapidly in many different ways. In this paper, we review some of the most prominent LLMs, including three popular LLM families (GPT, LLaMA, PaLM), and discuss their characteristics, contributions, and limitations. We also give an overview of techniques developed to build and augment LLMs. We then survey popular datasets prepared for LLM training, fine-tuning, and evaluation, review widely used LLM evaluation metrics, and compare the performance of several popular LLMs on a set of representative benchmarks. Finally, we conclude the paper by discussing open challenges and future research directions. 802 |
803 | 804 | 805 | - **Interactive Speculative Planning: Enhance Agent Efficiency through Co-design of System and User Interface** 806 | Wenyue Hua, Mengting Wan, Shashank Vadrevu, Ryan Nadel, Yongfeng Zhang, Chi Wang 807 | unknown 2024 808 | [open paper page](https://api.semanticscholar.org/CorpusId:273023085) 809 |
810 | Abstract 811 | Agents, as user-centric tools, are increasingly deployed for human task delegation, assisting with a broad spectrum of requests by generating thoughts, engaging with user proxies, and producing action plans. However, agents based on large language models (LLMs) often face substantial planning latency due to two primary factors: the efficiency limitations of the underlying LLMs due to their large size and high demand, and the structural complexity of the agents due to the extensive generation of intermediate thoughts to produce the final output. Given that inefficiency in service provision can undermine the value of automation for users, this paper presents a human-centered efficient agent planning method -- Interactive Speculative Planning -- aiming at enhancing the efficiency of agent planning through both system design and human-AI interaction. Our approach advocates for the co-design of the agent system and user interface, underscoring the importance of an agent system that can fluidly manage user interactions and interruptions. By integrating human interruptions as a fundamental component of the system, we not only make it more user-centric but also expedite the entire process by leveraging human-in-the-loop interactions to provide accurate intermediate steps. Code and data will be released. 812 |
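A minimal sketch of speculative planning with a cheap drafting agent, an expensive target agent, and optional user interruptions treated as ground truth. All functions are stubs and the names are invented; this is not the authors' implementation:

```python
# Hypothetical sketch of interactive speculative agent planning (illustrative only).
def fast_agent(state: str) -> str:
    """Cheap/fast approximation agent: proposes the next step immediately."""
    return f"draft step for {state!r}"

def slow_agent(state: str) -> str:
    """Large target agent: produces the authoritative next step (expensive)."""
    return f"draft step for {state!r}"  # stub: happens to agree with the fast agent

def ask_user(state: str, draft: str):
    """Optional human interruption: the user may override the drafted step."""
    return None  # None means "no interruption"

def speculative_plan(goal: str, max_steps: int = 4) -> list:
    state, plan = goal, []
    for _ in range(max_steps):
        draft = fast_agent(state)            # speculate with the cheap agent
        override = ask_user(state, draft)    # a human interruption is treated as ground truth
        if override is not None:
            step = override
        else:
            target = slow_agent(state)       # verify the speculation with the target agent
            step = draft if draft == target else target  # accept if they agree, else fall back
        plan.append(step)
        state = f"{state} | {step}"
    return plan

if __name__ == "__main__":
    print(speculative_plan("book a flight to NYC"))
```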
813 | 814 | - **Embodied Multi-Modal Agent trained by an LLM from a Parallel TextWorld** 815 | Yijun Yang, Tianyi Zhou, Kanxue Li, Dapeng Tao, Lusong Li, Li Shen, Xiaodong He, Jing Jiang, Yuhui Shi 816 | Computer Vision and Pattern Recognition 2023 817 | [open paper page](http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10657391) 818 |
819 | Abstract 820 | While large language models (LLMs) excel in a simulated world of texts, they struggle to interact with the more realistic world without perceptions of other modalities such as visual or audio signals. Although vision-language models (VLMs) integrate LLM modules (1) aligned with static image features, and (2) may possess prior knowledge of world dynamics (as demonstrated in the text world), they have not been trained in an embodied visual world and thus cannot align with its dynamics. On the other hand, training an embodied agent in a noisy visual world without expert guidance is often challenging and inefficient. In this paper, we train a VLM agent living in a visual world using an LLM agent excelling in a parallel text world. Specifically, we distill LLM's reflection outcomes (improved actions by analyzing mistakes) in a text world's tasks to finetune the VLM on the same tasks of the visual world, resulting in an Embodied Multi-Modal Agent (EMMA) quickly adapting to the visual world dynamics. Such cross-modality imitation learning between the two parallel worlds is achieved by a novel DAgger-DPO algorithm, enabling EMMA to generalize to a broad scope of new tasks without any further guidance from the LLM expert. Extensive evaluations on the ALFWorld benchmark's diverse tasks highlight EMMA's superior performance to SOTA VLM-based agents, e.g., 20%-70% improvement in the success rate. 821 |
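A minimal sketch of the DAgger-style relabeling loop implied above, where the LLM expert in the parallel text world labels the states visited by the student VLM. The actual DAgger-DPO algorithm and EMMA training code are not reproduced here; all names are invented stubs:

```python
# Hypothetical sketch of cross-modality imitation data collection (illustrative only).
def llm_expert_action(text_obs: str) -> str:
    """LLM expert acting in the parallel text world (stub)."""
    return "go to countertop 1"

def vlm_policy_action(visual_obs: str) -> str:
    """Current student VLM acting in the visual world (stub)."""
    return "go to drawer 2"

def to_text(visual_obs: str) -> str:
    """Map a visual observation to its parallel text-world description (stub)."""
    return f"text view of {visual_obs}"

def collect_dagger_data(visual_observations):
    """Roll out the student VLM, but label every visited state with the LLM expert's action."""
    dataset = []
    for visual_obs in visual_observations:
        student_action = vlm_policy_action(visual_obs)           # student explores
        expert_action = llm_expert_action(to_text(visual_obs))   # expert relabels the state
        # Keep (state, preferred action, dispreferred action) triples for preference-style finetuning.
        dataset.append((visual_obs, expert_action, student_action))
    return dataset

if __name__ == "__main__":
    print(collect_dagger_data(["image of kitchen scene"]))
```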
822 | 823 | - **LLM Multi-Agent Systems: Challenges and Open Problems** 824 | Shanshan Han, Qifan Zhang, Yuhang Yao, Weizhao Jin, Zhaozhuo Xu, Chaoyang He 825 | unknown 2024 826 | [open paper page](https://api.semanticscholar.org/CorpusId:267499950) 827 |
828 | Abstract 829 | This paper explores existing works of multi-agent systems and identifies challenges that remain inadequately addressed. By leveraging the diverse capabilities and roles of individual agents within a multi-agent system, these systems can tackle complex tasks through collaboration. We discuss optimizing task allocation, fostering robust reasoning through iterative debates, managing complex and layered context information, and enhancing memory management to support the intricate interactions within multi-agent systems. We also explore the potential application of multi-agent systems in blockchain systems to shed light on their future development and application in real-world distributed systems. 830 |
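A minimal sketch of the iterative-debate pattern mentioned above: agents draft answers, read each other's drafts, and revise over several rounds. The LLM call is a stub and this is not tied to any specific paper's implementation:

```python
# Hypothetical sketch of multi-agent debate for more robust reasoning (illustrative only).
def llm(prompt: str) -> str:
    """Placeholder LLM call; swap in a real client."""
    return "answer: 42"

def debate(question: str, n_agents: int = 3, rounds: int = 2) -> str:
    # Each agent drafts an answer, then repeatedly revises it after reading the others' answers.
    answers = [llm(f"Q: {question}\nAnswer concisely.") for _ in range(n_agents)]
    for _ in range(rounds):
        answers = [
            llm(f"Q: {question}\nOther agents said: {[a for j, a in enumerate(answers) if j != i]}\n"
                f"Your previous answer: {answers[i]}\nRevise if needed.")
            for i in range(n_agents)
        ]
    # Simple aggregation: take the majority answer across agents.
    return max(set(answers), key=answers.count)

if __name__ == "__main__":
    print(debate("What is 6 x 7?"))
```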
831 | 832 | # Agents for Research 833 | - **[Towards Collaborative Autonomous Research](https://agentrxiv.github.io/)** 834 | 835 | 836 | - [A Strategic Coordination Framework of Small LLMs Matches Large LLMs in Data Synthesis](https://arxiv.org/pdf/2504.12322) 837 | 838 | - [Cross-environment Cooperation Enables Zero-shot Multi-agent Coordination](https://kjha02.github.io/publication/cross-env-coop) 839 | 840 | ---- 841 | 842 | # Citations 843 | 844 | In the arxiv_bibtex.bib file, you can find the bibtex citations for all the papers in this repository. 845 | 846 | 847 | # Community 848 | Join the multi-agent community now on discord [HERE](https://discord.gg/jM3Z6M9uMq) 849 | -------------------------------------------------------------------------------- /src/arxiv-bibtex.py: -------------------------------------------------------------------------------- 1 | import re 2 | import urllib.request 3 | import urllib.parse 4 | import xml.etree.ElementTree as ET 5 | import time 6 | 7 | def extract_arxiv_ids(readme_content): 8 | """Extract arXiv IDs from various URL formats in the README.""" 9 | # Pattern matches both PDF and abstract URLs 10 | patterns = [ 11 | r'arxiv\.org/(?:pdf|abs)/(\d{4}\.\d{4,5})', 12 | r'arxiv\.org/(?:pdf|abs)/[a-z-]+/(\d{7})', # For old arXiv IDs 13 | r'(\d{4}\.\d{4,5})' # Bare arXiv IDs 14 | ] 15 | 16 | ids = set() 17 | for pattern in patterns: 18 | matches = re.finditer(pattern, readme_content, re.IGNORECASE) 19 | ids.update(match.group(1) for match in matches) 20 | 21 | return sorted(list(ids)) 22 | 23 | def fetch_arxiv_metadata(arxiv_id): 24 | """Fetch metadata for a single arXiv ID using the arXiv API.""" 25 | base_url = 'http://export.arxiv.org/api/query?' 26 | query = f'id_list={arxiv_id}' 27 | url = base_url + query 28 | 29 | try: 30 | with urllib.request.urlopen(url) as response: 31 | data = response.read().decode('utf-8') 32 | return data 33 | except Exception as e: 34 | print(f"Error fetching metadata for {arxiv_id}: {e}") 35 | return None 36 | 37 | def parse_xml_to_bibtex(xml_data, arxiv_id): 38 | """Parse arXiv API XML response into BibTeX format.""" 39 | try: 40 | root = ET.fromstring(xml_data) 41 | # Define XML namespaces 42 | ns = { 43 | 'atom': 'http://www.w3.org/2005/Atom', 44 | 'arxiv': 'http://arxiv.org/schemas/atom' 45 | } 46 | 47 | # Extract entry data 48 | entry = root.find('atom:entry', ns) 49 | if entry is None: 50 | return None 51 | 52 | title = entry.find('atom:title', ns).text.strip().replace('\n', ' ') 53 | authors = [author.find('atom:name', ns).text for author in entry.findall('atom:author', ns)] 54 | published = entry.find('atom:published', ns).text[:4] # Get year 55 | categories = entry.findall('atom:category', ns) 56 | primary_category = entry.find('arxiv:primary_category', ns).get('term') 57 | 58 | # Create author string and key 59 | author_string = ' and '.join(authors) 60 | first_author_last_name = authors[0].split()[-1].lower() 61 | bibtex_key = f"{first_author_last_name}{published}{arxiv_id.replace('.', '')}" 62 | 63 | # Format BibTeX entry 64 | bibtex = f"""@misc{{{bibtex_key}, 65 | title = {{{title}}}, 66 | author = {{{author_string}}}, 67 | year = {{{published}}}, 68 | eprint = {{{arxiv_id}}}, 69 | archivePrefix = {{arXiv}}, 70 | primaryClass = {{{primary_category}}} 71 | }}""" 72 | return bibtex 73 | except Exception as e: 74 | print(f"Error parsing XML for {arxiv_id}: {e}") 75 | return None 76 | 77 | def main(readme_path): 78 | """Main function to process README and generate BibTeX entries.""" 79 | # Read README file 80 | 
with open(readme_path, 'r', encoding='utf-8') as f: 81 | content = f.read() 82 | 83 | # Extract arXiv IDs 84 | arxiv_ids = extract_arxiv_ids(content) 85 | print(f"Found {len(arxiv_ids)} arXiv IDs") 86 | 87 | # Fetch and process each ID 88 | bibtex_entries = [] 89 | for arxiv_id in arxiv_ids: 90 | print(f"Processing {arxiv_id}...") 91 | xml_data = fetch_arxiv_metadata(arxiv_id) 92 | if xml_data: 93 | bibtex = parse_xml_to_bibtex(xml_data, arxiv_id) 94 | if bibtex: 95 | bibtex_entries.append(bibtex) 96 | # Be nice to the arXiv API 97 | time.sleep(3) 98 | 99 | # Write results to file 100 | with open('arxiv_bibtex.bib', 'w', encoding='utf-8') as f: 101 | f.write('\n\n'.join(bibtex_entries)) 102 | 103 | print(f"\nProcessed {len(bibtex_entries)} entries successfully") 104 | print("Results written to arxiv_bibtex.bib") 105 | 106 | if __name__ == "__main__": 107 | readme_path = "../README.md" # Update this path as needed 108 | main(readme_path) 109 | -------------------------------------------------------------------------------- /src/arxiv_bibtex.bib: -------------------------------------------------------------------------------- 1 | @misc{bai2022221208073, 2 | title = {Constitutional AI: Harmlessness from AI Feedback}, 3 | author = {Yuntao Bai and Saurav Kadavath and Sandipan Kundu and Amanda Askell and Jackson Kernion and Andy Jones and Anna Chen and Anna Goldie and Azalia Mirhoseini and Cameron McKinnon and Carol Chen and Catherine Olsson and Christopher Olah and Danny Hernandez and Dawn Drain and Deep Ganguli and Dustin Li and Eli Tran-Johnson and Ethan Perez and Jamie Kerr and Jared Mueller and Jeffrey Ladish and Joshua Landau and Kamal Ndousse and Kamile Lukosuite and Liane Lovitt and Michael Sellitto and Nelson Elhage and Nicholas Schiefer and Noemi Mercado and Nova DasSarma and Robert Lasenby and Robin Larson and Sam Ringer and Scott Johnston and Shauna Kravec and Sheer El Showk and Stanislav Fort and Tamera Lanham and Timothy Telleen-Lawton and Tom Conerly and Tom Henighan and Tristan Hume and Samuel R. Bowman and Zac Hatfield-Dodds and Ben Mann and Dario Amodei and Nicholas Joseph and Sam McCandlish and Tom Brown and Jared Kaplan}, 4 | year = {2022}, 5 | eprint = {2212.08073}, 6 | archiveprefix = {arXiv}, 7 | primaryclass = {cs.CL} 8 | } 9 | 10 | @misc{park2023230403442, 11 | title = {Generative Agents: Interactive Simulacra of Human Behavior}, 12 | author = {Joon Sung Park and Joseph C. O'Brien and Carrie J. Cai and Meredith Ringel Morris and Percy Liang and Michael S. Bernstein}, 13 | year = {2023}, 14 | eprint = {2304.03442}, 15 | archiveprefix = {arXiv}, 16 | primaryclass = {cs.HC} 17 | } 18 | 19 | @misc{zhuge2023230517066, 20 | title = {Mindstorms in Natural Language-Based Societies of Mind}, 21 | author = {Mingchen Zhuge and Haozhe Liu and Francesco Faccio and Dylan R. 
Ashley and Róbert Csordás and Anand Gopalakrishnan and Abdullah Hamdi and Hasan Abed Al Kader Hammoud and Vincent Herrmann and Kazuki Irie and Louis Kirsch and Bing Li and Guohao Li and Shuming Liu and Jinjie Mai and Piotr Piękos and Aditya Ramesh and Imanol Schlag and Weimin Shi and Aleksandar Stanić and Wenyi Wang and Yuhui Wang and Mengmeng Xu and Deng-Ping Fan and Bernard Ghanem and Jürgen Schmidhuber}, 22 | year = {2023}, 23 | eprint = {2305.17066}, 24 | archiveprefix = {arXiv}, 25 | primaryclass = {cs.AI} 26 | } 27 | 28 | @misc{qian2023230707924, 29 | title = {ChatDev: Communicative Agents for Software Development}, 30 | author = {Chen Qian and Wei Liu and Hongzhang Liu and Nuo Chen and Yufan Dang and Jiahao Li and Cheng Yang and Weize Chen and Yusheng Su and Xin Cong and Juyuan Xu and Dahai Li and Zhiyuan Liu and Maosong Sun}, 31 | year = {2023}, 32 | eprint = {2307.07924}, 33 | archiveprefix = {arXiv}, 34 | primaryclass = {cs.SE} 35 | } 36 | 37 | @misc{chan2023230807201, 38 | title = {ChatEval: Towards Better LLM-based Evaluators through Multi-Agent Debate}, 39 | author = {Chi-Min Chan and Weize Chen and Yusheng Su and Jianxuan Yu and Wei Xue and Shanghang Zhang and Jie Fu and Zhiyuan Liu}, 40 | year = {2023}, 41 | eprint = {2308.07201}, 42 | archiveprefix = {arXiv}, 43 | primaryclass = {cs.CL} 44 | } 45 | 46 | @misc{wu2023230808155, 47 | title = {AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation}, 48 | author = {Qingyun Wu and Gagan Bansal and Jieyu Zhang and Yiran Wu and Beibin Li and Erkang Zhu and Li Jiang and Xiaoyun Zhang and Shaokun Zhang and Jiale Liu and Ahmed Hassan Awadallah and Ryen W White and Doug Burger and Chi Wang}, 49 | year = {2023}, 50 | eprint = {2308.08155}, 51 | archiveprefix = {arXiv}, 52 | primaryclass = {cs.AI} 53 | } 54 | 55 | @misc{yue2023231003094, 56 | title = {Large Language Model Cascades with Mixture of Thoughts Representations for Cost-efficient Reasoning}, 57 | author = {Murong Yue and Jie Zhao and Min Zhang and Liang Du and Ziyu Yao}, 58 | year = {2023}, 59 | eprint = {2310.03094}, 60 | archiveprefix = {arXiv}, 61 | primaryclass = {cs.CL} 62 | } 63 | 64 | @misc{qian2023231217025, 65 | title = {Experiential Co-Learning of Software-Developing Agents}, 66 | author = {Chen Qian and Yufan Dang and Jiahao Li and Wei Liu and Zihao Xie and Yifei Wang and Weize Chen and Cheng Yang and Xin Cong and Xiaoyin Che and Zhiyuan Liu and Maosong Sun}, 67 | year = {2023}, 68 | eprint = {2312.17025}, 69 | archiveprefix = {arXiv}, 70 | primaryclass = {cs.CL} 71 | } 72 | 73 | @misc{zhang2024240201521, 74 | title = {K-Level Reasoning: Establishing Higher Order Beliefs in Large Language Models for Strategic Reasoning}, 75 | author = {Yadong Zhang and Shaoguang Mao and Tao Ge and Xun Wang and Yan Xia and Man Lan and Furu Wei}, 76 | year = {2024}, 77 | eprint = {2402.01521}, 78 | archiveprefix = {arXiv}, 79 | primaryclass = {cs.CL} 80 | } 81 | 82 | @misc{li2024240205120, 83 | title = {More Agents Is All You Need}, 84 | author = {Junyou Li and Qin Zhang and Yangbin Yu and Qiang Fu and Deheng Ye}, 85 | year = {2024}, 86 | eprint = {2402.05120}, 87 | archiveprefix = {arXiv}, 88 | primaryclass = {cs.CL} 89 | } 90 | 91 | @misc{alshahwan2024240209171, 92 | title = {Automated Unit Test Improvement using Large Language Models at Meta}, 93 | author = {Nadia Alshahwan and Jubin Chheda and Anastasia Finegenova and Beliz Gokkaya and Mark Harman and Inna Harper and Alexandru Marginean and Shubho Sengupta and Eddy Wang}, 94 | year = {2024}, 95 | eprint = 
{2402.09171}, 96 | archiveprefix = {arXiv}, 97 | primaryclass = {cs.SE} 98 | } 99 | 100 | @misc{zhao2024240211550, 101 | title = {LongAgent: Scaling Language Models to 128k Context through Multi-Agent Collaboration}, 102 | author = {Jun Zhao and Can Zu and Hao Xu and Yi Lu and Wei He and Yiwen Ding and Tao Gui and Qi Zhang and Xuanjing Huang}, 103 | year = {2024}, 104 | eprint = {2402.11550}, 105 | archiveprefix = {arXiv}, 106 | primaryclass = {cs.CL} 107 | } 108 | 109 | @misc{gao2024240214034, 110 | title = {AgentScope: A Flexible yet Robust Multi-Agent Platform}, 111 | author = {Dawei Gao and Zitao Li and Xuchen Pan and Weirui Kuang and Zhijian Ma and Bingchen Qian and Fei Wei and Wenhao Zhang and Yuexiang Xie and Daoyuan Chen and Liuyi Yao and Hongyi Peng and Zeyu Zhang and Lin Zhu and Chen Cheng and Hongzhu Shi and Yaliang Li and Bolin Ding and Jingren Zhou}, 112 | year = {2024}, 113 | eprint = {2402.14034}, 114 | archiveprefix = {arXiv}, 115 | primaryclass = {cs.MA} 116 | } 117 | 118 | @misc{schoenegger2024240219379, 119 | title = {Wisdom of the Silicon Crowd: LLM Ensemble Prediction Capabilities Rival Human Crowd Accuracy}, 120 | author = {Philipp Schoenegger and Indre Tuminauskaite and Peter S. Park and Philip E. Tetlock}, 121 | year = {2024}, 122 | eprint = {2402.19379}, 123 | archiveprefix = {arXiv}, 124 | primaryclass = {cs.CY} 125 | } 126 | 127 | @misc{chen2024240302419, 128 | title = {Are More LLM Calls All You Need? Towards Scaling Laws of Compound Inference Systems}, 129 | author = {Lingjiao Chen and Jared Quincy Davis and Boris Hanin and Peter Bailis and Ion Stoica and Matei Zaharia and James Zou}, 130 | year = {2024}, 131 | eprint = {2403.02419}, 132 | archiveprefix = {arXiv}, 133 | primaryclass = {cs.LG} 134 | } 135 | 136 | @misc{shen2024240303870, 137 | title = {Learning to Decode Collaboratively with Multiple Language Models}, 138 | author = {Shannon Zejiang Shen and Hunter Lang and Bailin Wang and Yoon Kim and David Sontag}, 139 | year = {2024}, 140 | eprint = {2403.03870}, 141 | archiveprefix = {arXiv}, 142 | primaryclass = {cs.CL} 143 | } 144 | 145 | @misc{chiang2024240304132, 146 | title = {Chatbot Arena: An Open Platform for Evaluating LLMs by Human Preference}, 147 | author = {Wei-Lin Chiang and Lianmin Zheng and Ying Sheng and Anastasios Nikolas Angelopoulos and Tianle Li and Dacheng Li and Hao Zhang and Banghua Zhu and Michael Jordan and Joseph E. 
Gonzalez and Ion Stoica}, 148 | year = {2024}, 149 | eprint = {2403.04132}, 150 | archiveprefix = {arXiv}, 151 | primaryclass = {cs.AI} 152 | } 153 | 154 | @misc{wang2024240308715, 155 | title = {SOTOPIA-$π$: Interactive Learning of Socially Intelligent Language Agents}, 156 | author = {Ruiyi Wang and Haofei Yu and Wenxin Zhang and Zhengyang Qi and Maarten Sap and Graham Neubig and Yonatan Bisk and Hao Zhu}, 157 | year = {2024}, 158 | eprint = {2403.08715}, 159 | archiveprefix = {arXiv}, 160 | primaryclass = {cs.CL} 161 | } 162 | 163 | @misc{akiba2024240313187, 164 | title = {Evolutionary Optimization of Model Merging Recipes}, 165 | author = {Takuya Akiba and Makoto Shing and Yujin Tang and Qi Sun and David Ha}, 166 | year = {2024}, 167 | eprint = {2403.13187}, 168 | archiveprefix = {arXiv}, 169 | primaryclass = {cs.NE} 170 | } 171 | 172 | @misc{yuan2024240313248, 173 | title = {Mora: Enabling Generalist Video Generation via A Multi-Agent Framework}, 174 | author = {Zhengqing Yuan and Yixin Liu and Yihan Cao and Weixiang Sun and Haolong Jia and Ruoxi Chen and Zhaoxu Li and Bin Lin and Li Yuan and Lifang He and Chi Wang and Yanfang Ye and Lichao Sun}, 175 | year = {2024}, 176 | eprint = {2403.13248}, 177 | archiveprefix = {arXiv}, 178 | primaryclass = {cs.CV} 179 | } 180 | 181 | @misc{zhang2024240315157, 182 | title = {AllHands: Ask Me Anything on Large-scale Verbatim Feedback via Large Language Models}, 183 | author = {Chaoyun Zhang and Zicheng Ma and Yuhao Wu and Shilin He and Si Qin and Minghua Ma and Xiaoting Qin and Yu Kang and Yuyi Liang and Xiaoyu Gou and Yajie Xue and Qingwei Lin and Saravan Rajmohan and Dongmei Zhang and Qi Zhang}, 184 | year = {2024}, 185 | eprint = {2403.15157}, 186 | archiveprefix = {arXiv}, 187 | primaryclass = {cs.SE} 188 | } 189 | 190 | @misc{mei2024240316971, 191 | title = {AIOS: LLM Agent Operating System}, 192 | author = {Kai Mei and Xi Zhu and Wujiang Xu and Wenyue Hua and Mingyu Jin and Zelong Li and Shuyuan Xu and Ruosong Ye and Yingqiang Ge and Yongfeng Zhang}, 193 | year = {2024}, 194 | eprint = {2403.16971}, 195 | archiveprefix = {arXiv}, 196 | primaryclass = {cs.OS} 197 | } 198 | 199 | @misc{tao2024240317927, 200 | title = {MAGIS: LLM-Based Multi-Agent Framework for GitHub Issue Resolution}, 201 | author = {Wei Tao and Yucheng Zhou and Yanlin Wang and Wenqiang Zhang and Hongyu Zhang and Yu Cheng}, 202 | year = {2024}, 203 | eprint = {2403.17927}, 204 | archiveprefix = {arXiv}, 205 | primaryclass = {cs.SE} 206 | } 207 | 208 | @misc{team2024240410179, 209 | title = {Scaling Instructable Agents Across Many Simulated Worlds}, 210 | author = { SIMA Team and Maria Abi Raad and Arun Ahuja and Catarina Barros and Frederic Besse and Andrew Bolt and Adrian Bolton and Bethanie Brownfield and Gavin Buttimore and Max Cant and Sarah Chakera and Stephanie C. Y. Chan and Jeff Clune and Adrian Collister and Vikki Copeman and Alex Cullum and Ishita Dasgupta and Dario de Cesare and Julia Di Trapani and Yani Donchev and Emma Dunleavy and Martin Engelcke and Ryan Faulkner and Frankie Garcia and Charles Gbadamosi and Zhitao Gong and Lucy Gonzales and Kshitij Gupta and Karol Gregor and Arne Olav Hallingstad and Tim Harley and Sam Haves and Felix Hill and Ed Hirst and Drew A. Hudson and Jony Hudson and Steph Hughes-Fitt and Danilo J. 
Rezende and Mimi Jasarevic and Laura Kampis and Rosemary Ke and Thomas Keck and Junkyung Kim and Oscar Knagg and Kavya Kopparapu and Rory Lawton and Andrew Lampinen and Shane Legg and Alexander Lerchner and Marjorie Limont and Yulan Liu and Maria Loks-Thompson and Joseph Marino and Kathryn Martin Cussons and Loic Matthey and Siobhan Mcloughlin and Piermaria Mendolicchio and Hamza Merzic and Anna Mitenkova and Alexandre Moufarek and Valeria Oliveira and Yanko Oliveira and Hannah Openshaw and Renke Pan and Aneesh Pappu and Alex Platonov and Ollie Purkiss and David Reichert and John Reid and Pierre Harvey Richemond and Tyson Roberts and Giles Ruscoe and Jaume Sanchez Elias and Tasha Sandars and Daniel P. Sawyer and Tim Scholtes and Guy Simmons and Daniel Slater and Hubert Soyer and Heiko Strathmann and Peter Stys and Allison C. Tam and Denis Teplyashin and Tayfun Terzi and Davide Vercelli and Bojan Vujatovic and Marcus Wainwright and Jane X. Wang and Zhengdong Wang and Daan Wierstra and Duncan Williams and Nathaniel Wong and Sarah York and Nick Young}, 211 | year = {2024}, 212 | eprint = {2404.10179}, 213 | archiveprefix = {arXiv}, 214 | primaryclass = {cs.RO} 215 | } 216 | 217 | @misc{piatti2024240416698, 218 | title = {Cooperate or Collapse: Emergence of Sustainable Cooperation in a Society of LLM Agents}, 219 | author = {Giorgio Piatti and Zhijing Jin and Max Kleiman-Weiner and Bernhard Schölkopf and Mrinmaya Sachan and Rada Mihalcea}, 220 | year = {2024}, 221 | eprint = {2404.16698}, 222 | archiveprefix = {arXiv}, 223 | primaryclass = {cs.CL} 224 | } 225 | 226 | @misc{verga2024240418796, 227 | title = {Replacing Judges with Juries: Evaluating LLM Generations with a Panel of Diverse Models}, 228 | author = {Pat Verga and Sebastian Hofstatter and Sophia Althammer and Yixuan Su and Aleksandra Piktus and Arkady Arkhangorodsky and Minjie Xu and Naomi White and Patrick Lewis}, 229 | year = {2024}, 230 | eprint = {2404.18796}, 231 | archiveprefix = {arXiv}, 232 | primaryclass = {cs.CL} 233 | } 234 | 235 | @misc{li2024240502957, 236 | title = {Agent Hospital: A Simulacrum of Hospital with Evolvable Medical Agents}, 237 | author = {Junkai Li and Siyu Wang and Meng Zhang and Weitao Li and Yunghwei Lai and Xinhui Kang and Weizhi Ma and Yang Liu}, 238 | year = {2024}, 239 | eprint = {2405.02957}, 240 | archiveprefix = {arXiv}, 241 | primaryclass = {cs.AI} 242 | } 243 | 244 | @misc{wu2024240511804, 245 | title = {(Perhaps) Beyond Human Translation: Harnessing Multi-Agent Collaboration for Translating Ultra-Long Literary Texts}, 246 | author = {Minghao Wu and Yulin Yuan and Gholamreza Haffari and Longyue Wang}, 247 | year = {2024}, 248 | eprint = {2405.11804}, 249 | archiveprefix = {arXiv}, 250 | primaryclass = {cs.CL} 251 | } 252 | 253 | @misc{li2024240515145, 254 | title = {CulturePark: Boosting Cross-cultural Understanding in Large Language Models}, 255 | author = {Cheng Li and Damien Teney and Linyi Yang and Qingsong Wen and Xing Xie and Jindong Wang}, 256 | year = {2024}, 257 | eprint = {2405.15145}, 258 | archiveprefix = {arXiv}, 259 | primaryclass = {cs.AI} 260 | } 261 | 262 | @misc{wang2024240601014, 263 | title = {Mobile-Agent-v2: Mobile Device Operation Assistant with Effective Navigation via Multi-Agent Collaboration}, 264 | author = {Junyang Wang and Haiyang Xu and Haitao Jia and Xi Zhang and Ming Yan and Weizhou Shen and Ji Zhang and Fei Huang and Jitao Sang}, 265 | year = {2024}, 266 | eprint = {2406.01014}, 267 | archiveprefix = {arXiv}, 268 | primaryclass = {cs.CL} 269 | } 270 | 271 | 
@misc{chen2024240601304, 272 | title = {CodeR: Issue Resolving with Multi-Agent and Task Graphs}, 273 | author = {Dong Chen and Shaoxin Lin and Muhan Zeng and Daoguang Zan and Jian-Gang Wang and Anton Cheshkov and Jun Sun and Hao Yu and Guoliang Dong and Artem Aliev and Jie Wang and Xiao Cheng and Guangtai Liang and Yuchi Ma and Pan Bian and Tao Xie and Qianxiang Wang}, 274 | year = {2024}, 275 | eprint = {2406.01304}, 276 | archiveprefix = {arXiv}, 277 | primaryclass = {cs.CL} 278 | } 279 | 280 | @misc{zhang2024240602818, 281 | title = {Chain of Agents: Large Language Models Collaborating on Long-Context Tasks}, 282 | author = {Yusen Zhang and Ruoxi Sun and Yanfei Chen and Tomas Pfister and Rui Zhang and Sercan Ö. Arik}, 283 | year = {2024}, 284 | eprint = {2406.02818}, 285 | archiveprefix = {arXiv}, 286 | primaryclass = {cs.CL} 287 | } 288 | 289 | @misc{xi2024240604151, 290 | title = {AgentGym: Evolving Large Language Model-based Agents across Diverse Environments}, 291 | author = {Zhiheng Xi and Yiwen Ding and Wenxiang Chen and Boyang Hong and Honglin Guo and Junzhe Wang and Dingwen Yang and Chenyang Liao and Xin Guo and Wei He and Songyang Gao and Lu Chen and Rui Zheng and Yicheng Zou and Tao Gui and Qi Zhang and Xipeng Qiu and Xuanjing Huang and Zuxuan Wu and Yu-Gang Jiang}, 292 | year = {2024}, 293 | eprint = {2406.04151}, 294 | archiveprefix = {arXiv}, 295 | primaryclass = {cs.AI} 296 | } 297 | 298 | @misc{wang2024240604692, 299 | title = {Mixture-of-Agents Enhances Large Language Model Capabilities}, 300 | author = {Junlin Wang and Jue Wang and Ben Athiwaratkun and Ce Zhang and James Zou}, 301 | year = {2024}, 302 | eprint = {2406.04692}, 303 | archiveprefix = {arXiv}, 304 | primaryclass = {cs.CL} 305 | } 306 | 307 | @misc{yuan2024240614228, 308 | title = {EvoAgent: Towards Automatic Multi-Agent Generation via Evolutionary Algorithms}, 309 | author = {Siyu Yuan and Kaitao Song and Jiangjie Chen and Xu Tan and Dongsheng Li and Deqing Yang}, 310 | year = {2024}, 311 | eprint = {2406.14228}, 312 | archiveprefix = {arXiv}, 313 | primaryclass = {cs.AI} 314 | } 315 | 316 | @misc{ong2024240618665, 317 | title = {RouteLLM: Learning to Route LLMs with Preference Data}, 318 | author = {Isaac Ong and Amjad Almahairi and Vincent Wu and Wei-Lin Chiang and Tianhao Wu and Joseph E. Gonzalez and M Waleed Kadous and Ion Stoica}, 319 | year = {2024}, 320 | eprint = {2406.18665}, 321 | archiveprefix = {arXiv}, 322 | primaryclass = {cs.LG} 323 | } 324 | 325 | @misc{ge2024240620094, 326 | title = {Scaling Synthetic Data Creation with 1,000,000,000 Personas}, 327 | author = {Tao Ge and Xin Chan and Xiaoyang Wang and Dian Yu and Haitao Mi and Dong Yu}, 328 | year = {2024}, 329 | eprint = {2406.20094}, 330 | archiveprefix = {arXiv}, 331 | primaryclass = {cs.CL} 332 | } 333 | 334 | @misc{mitra2024240703502, 335 | title = {AgentInstruct: Toward Generative Teaching with Agentic Flows}, 336 | author = {Arindam Mitra and Luciano Del Corro and Guoqing Zheng and Shweti Mahajan and Dany Rouhana and Andres Codas and Yadong Lu and Wei-ge Chen and Olga Vrousgos and Corby Rosset and Fillipe Silva and Hamed Khanpour and Yash Lara and Ahmed Awadallah}, 337 | year = {2024}, 338 | eprint = {2407.03502}, 339 | archiveprefix = {arXiv}, 340 | primaryclass = {cs.AI} 341 | } 342 | 343 | @misc{kenton2024240704622, 344 | title = {On scalable oversight with weak LLMs judging strong LLMs}, 345 | author = {Zachary Kenton and Noah Y. 
Siegel and János Kramár and Jonah Brown-Cohen and Samuel Albanie and Jannis Bulian and Rishabh Agarwal and David Lindner and Yunhao Tang and Noah D. Goodman and Rohin Shah}, 346 | year = {2024}, 347 | eprint = {2407.04622}, 348 | archiveprefix = {arXiv}, 349 | primaryclass = {cs.LG} 350 | } 351 | 352 | @misc{chen2024240707061, 353 | title = {Internet of Agents: Weaving a Web of Heterogeneous Agents for Collaborative Intelligence}, 354 | author = {Weize Chen and Ziming You and Ran Li and Yitong Guan and Chen Qian and Chenyang Zhao and Cheng Yang and Ruobing Xie and Zhiyuan Liu and Maosong Sun}, 355 | year = {2024}, 356 | eprint = {2407.07061}, 357 | archiveprefix = {arXiv}, 358 | primaryclass = {cs.CL} 359 | } 360 | 361 | @misc{nisioti2024240709502, 362 | title = {From Text to Life: On the Reciprocal Relationship between Artificial Life and Large Language Models}, 363 | author = {Eleni Nisioti and Claire Glanois and Elias Najarro and Andrew Dai and Elliot Meyerson and Joachim Winther Pedersen and Laetitia Teodorescu and Conor F. Hayes and Shyam Sudhakaran and Sebastian Risi}, 364 | year = {2024}, 365 | eprint = {2407.09502}, 366 | archiveprefix = {arXiv}, 367 | primaryclass = {cs.NE} 368 | } 369 | 370 | @misc{sun2024240717535, 371 | title = {LAMBDA: A Large Model Based Data Agent}, 372 | author = {Maojun Sun and Ruijian Han and Binyan Jiang and Houduo Qi and Defeng Sun and Yancheng Yuan and Jian Huang}, 373 | year = {2024}, 374 | eprint = {2407.17535}, 375 | archiveprefix = {arXiv}, 376 | primaryclass = {cs.AI} 377 | } 378 | 379 | @misc{pan2024240717789, 380 | title = {Very Large-Scale Multi-Agent Simulation in AgentScope}, 381 | author = {Xuchen Pan and Dawei Gao and Yuexiang Xie and Yushuo Chen and Zhewei Wei and Yaliang Li and Bolin Ding and Ji-Rong Wen and Jingren Zhou}, 382 | year = {2024}, 383 | eprint = {2407.17789}, 384 | archiveprefix = {arXiv}, 385 | primaryclass = {cs.MA} 386 | } 387 | 388 | @misc{palo2024240720798, 389 | title = {Diffusion Augmented Agents: A Framework for Efficient Exploration and Transfer Learning}, 390 | author = {Norman Di Palo and Leonard Hasenclever and Jan Humplik and Arunkumar Byravan}, 391 | year = {2024}, 392 | eprint = {2407.20798}, 393 | archiveprefix = {arXiv}, 394 | primaryclass = {cs.LG} 395 | } 396 | 397 | @misc{jin2024240802479, 398 | title = {From LLMs to LLM-based Agents for Software Engineering: A Survey of Current, Challenges and Future}, 399 | author = {Haolin Jin and Linghan Huang and Haipeng Cai and Jun Yan and Bo Li and Huaming Chen}, 400 | year = {2024}, 401 | eprint = {2408.02479}, 402 | archiveprefix = {arXiv}, 403 | primaryclass = {cs.SE} 404 | } 405 | 406 | @misc{li2024240803615, 407 | title = {Optimus-1: Hybrid Multimodal Memory Empowered Agents Excel in Long-Horizon Tasks}, 408 | author = {Zaijing Li and Yuquan Xie and Rui Shao and Gongwei Chen and Dongmei Jiang and Liqiang Nie}, 409 | year = {2024}, 410 | eprint = {2408.03615}, 411 | archiveprefix = {arXiv}, 412 | primaryclass = {cs.AI} 413 | } 414 | 415 | @misc{liu2024240803910, 416 | title = {CodexGraph: Bridging Large Language Models and Code Repositories via Code Graph Databases}, 417 | author = {Xiangyan Liu and Bo Lan and Zhiyuan Hu and Yang Liu and Zhicheng Zhang and Fei Wang and Michael Shieh and Wenmeng Zhou}, 418 | year = {2024}, 419 | eprint = {2408.03910}, 420 | archiveprefix = {arXiv}, 421 | primaryclass = {cs.SE} 422 | } 423 | 424 | @misc{lu2024240806292, 425 | title = {The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery}, 426 | author = {Chris 
Lu and Cong Lu and Robert Tjarko Lange and Jakob Foerster and Jeff Clune and David Ha}, 427 | year = {2024}, 428 | eprint = {2408.06292}, 429 | archiveprefix = {arXiv}, 430 | primaryclass = {cs.AI} 431 | } 432 | 433 | @misc{zhang2024240807060, 434 | title = {Diversity Empowers Intelligence: Integrating Expertise of Software Engineering Agents}, 435 | author = {Kexun Zhang and Weiran Yao and Zuxin Liu and Yihao Feng and Zhiwei Liu and Rithesh Murthy and Tian Lan and Lei Li and Renze Lou and Jiacheng Xu and Bo Pang and Yingbo Zhou and Shelby Heinecke and Silvio Savarese and Huan Wang and Caiming Xiong}, 436 | year = {2024}, 437 | eprint = {2408.07060}, 438 | archiveprefix = {arXiv}, 439 | primaryclass = {cs.SE} 440 | } 441 | 442 | @misc{hu2024240808435, 443 | title = {Automated Design of Agentic Systems}, 444 | author = {Shengran Hu and Cong Lu and Jeff Clune}, 445 | year = {2024}, 446 | eprint = {2408.08435}, 447 | archiveprefix = {arXiv}, 448 | primaryclass = {cs.AI} 449 | } 450 | 451 | @misc{arif2024240808688, 452 | title = {The Fellowship of the LLMs: Multi-Agent Workflows for Synthetic Preference Optimization Dataset Generation}, 453 | author = {Samee Arif and Sualeha Farid and Abdul Hameed Azeemi and Awais Athar and Agha Ali Raza}, 454 | year = {2024}, 455 | eprint = {2408.08688}, 456 | archiveprefix = {arXiv}, 457 | primaryclass = {cs.CL} 458 | } 459 | 460 | @misc{wei2024240812496, 461 | title = {MEDCO: Medical Education Copilots Based on A Multi-Agent Framework}, 462 | author = {Hao Wei and Jianing Qiu and Haibao Yu and Wu Yuan}, 463 | year = {2024}, 464 | eprint = {2408.12496}, 465 | archiveprefix = {arXiv}, 466 | primaryclass = {cs.AI} 467 | } 468 | 469 | @misc{tian2024240813406, 470 | title = {Optimizing Collaboration of LLM based Agents for Finite Element Analysis}, 471 | author = {Chuan Tian and Yilei Zhang}, 472 | year = {2024}, 473 | eprint = {2408.13406}, 474 | archiveprefix = {arXiv}, 475 | primaryclass = {cs.AI} 476 | } 477 | 478 | @misc{ravuru2024240814484, 479 | title = {Agentic Retrieval-Augmented Generation for Time Series Analysis}, 480 | author = {Chidaksh Ravuru and Sagar Srinivas Sakhinana and Venkataramana Runkana}, 481 | year = {2024}, 482 | eprint = {2408.14484}, 483 | archiveprefix = {arXiv}, 484 | primaryclass = {cs.AI} 485 | } 486 | 487 | @misc{liu2024240902977, 488 | title = {Large Language Model-Based Agents for Software Engineering: A Survey}, 489 | author = {Junwei Liu and Kaixin Wang and Yixuan Chen and Xin Peng and Zhenpeng Chen and Lingming Zhang and Yiling Lou}, 490 | year = {2024}, 491 | eprint = {2409.02977}, 492 | archiveprefix = {arXiv}, 493 | primaryclass = {cs.SE} 494 | } 495 | 496 | @misc{alshehri2024240903789, 497 | title = {BreachSeek: A Multi-Agent Automated Penetration Tester}, 498 | author = {Ibrahim Alshehri and Adnan Alshehri and Abdulrahman Almalki and Majed Bamardouf and Alaqsa Akbar}, 499 | year = {2024}, 500 | eprint = {2409.03789}, 501 | archiveprefix = {arXiv}, 502 | primaryclass = {cs.CR} 503 | } 504 | 505 | @misc{ghafarollahi2024240905556, 506 | title = {SciAgents: Automating scientific discovery through multi-agent intelligent graph reasoning}, 507 | author = {Alireza Ghafarollahi and Markus J. 
Buehler}, 508 | year = {2024}, 509 | eprint = {2409.05556}, 510 | archiveprefix = {arXiv}, 511 | primaryclass = {cs.AI} 512 | } 513 | 514 | @misc{nunez2024240910737, 515 | title = {AutoSafeCoder: A Multi-Agent Framework for Securing LLM Code Generation through Static Analysis and Fuzz Testing}, 516 | author = {Ana Nunez and Nafis Tanveer Islam and Sumit Kumar Jha and Peyman Najafirad}, 517 | year = {2024}, 518 | eprint = {2409.10737}, 519 | archiveprefix = {arXiv}, 520 | primaryclass = {cs.SE} 521 | } 522 | 523 | @misc{hassouna2024240911393, 524 | title = {LLM-Agent-UMF: LLM-based Agent Unified Modeling Framework for Seamless Integration of Multi Active/Passive Core-Agents}, 525 | author = {Amine Ben Hassouna and Hana Chaari and Ines Belhaj}, 526 | year = {2024}, 527 | eprint = {2409.11393}, 528 | archiveprefix = {arXiv}, 529 | primaryclass = {cs.SE} 530 | } 531 | 532 | @misc{haji2024240911527, 533 | title = {Improving LLM Reasoning with Multi-Agent Tree-of-Thought Validator Agent}, 534 | author = {Fatemeh Haji and Mazal Bethany and Maryam Tabar and Jason Chiang and Anthony Rios and Peyman Najafirad}, 535 | year = {2024}, 536 | eprint = {2409.11527}, 537 | archiveprefix = {arXiv}, 538 | primaryclass = {cs.AI} 539 | } 540 | 541 | @misc{wang2024240913449, 542 | title = {Minstrel: Structural Prompt Generation with Multi-Agents Coordination for Non-AI Experts}, 543 | author = {Ming Wang and Yuanzhong Liu and Xiaoyu Liang and Yijie Huang and Daling Wang and Xiaocui Yang and Sijia Shen and Shi Feng and Xiaoming Zhang and Chaofeng Guan and Yifei Zhang}, 544 | year = {2024}, 545 | eprint = {2409.13449}, 546 | archiveprefix = {arXiv}, 547 | primaryclass = {cs.CL} 548 | } 549 | 550 | @misc{jin2024241001242, 551 | title = {RGD: Multi-LLM Based Agent Debugger via Refinement and Generation Guidance}, 552 | author = {Haolin Jin and Zechao Sun and Huaming Chen}, 553 | year = {2024}, 554 | eprint = {2410.01242}, 555 | archiveprefix = {arXiv}, 556 | primaryclass = {cs.SE} 557 | } 558 | 559 | @misc{bhatnagar2024241001307, 560 | title = {FanCric : Multi-Agentic Framework for Crafting Fantasy 11 Cricket Teams}, 561 | author = {Mohit Bhatnagar}, 562 | year = {2024}, 563 | eprint = {2410.01307}, 564 | archiveprefix = {arXiv}, 565 | primaryclass = {cs.AI} 566 | } 567 | 568 | @misc{yuan2024241002507, 569 | title = {Can Large Language Models Grasp Legal Theories? 
Enhance Legal Reasoning with Insights from Multi-Agent Collaboration}, 570 | author = {Weikang Yuan and Junjie Cao and Zhuoren Jiang and Yangyang Kang and Jun Lin and Kaisong Song and tianqianjin lin and Pengwei Yan and Changlong Sun and Xiaozhong Liu}, 571 | year = {2024}, 572 | eprint = {2410.02507}, 573 | archiveprefix = {arXiv}, 574 | primaryclass = {cs.AI} 575 | } 576 | 577 | @misc{huot2024241002603, 578 | title = {Agents' Room: Narrative Generation through Multi-step Collaboration}, 579 | author = {Fantine Huot and Reinald Kim Amplayo and Jennimaria Palomaki and Alice Shoshana Jakobovits and Elizabeth Clark and Mirella Lapata}, 580 | year = {2024}, 581 | eprint = {2410.02603}, 582 | archiveprefix = {arXiv}, 583 | primaryclass = {cs.CL} 584 | } 585 | 586 | @misc{trirat2024241002958, 587 | title = {AutoML-Agent: A Multi-Agent LLM Framework for Full-Pipeline AutoML}, 588 | author = {Patara Trirat and Wonyong Jeong and Sung Ju Hwang}, 589 | year = {2024}, 590 | eprint = {2410.02958}, 591 | archiveprefix = {arXiv}, 592 | primaryclass = {cs.LG} 593 | } 594 | 595 | @misc{cisneros-velarde2024241004054, 596 | title = {Large Language Models can Achieve Social Balance}, 597 | author = {Pedro Cisneros-Velarde}, 598 | year = {2024}, 599 | eprint = {2410.04054}, 600 | archiveprefix = {arXiv}, 601 | primaryclass = {cs.CL} 602 | } 603 | 604 | @misc{tang2024241004360, 605 | title = {GenSim: A General Social Simulation Platform with Large Language Model based Agents}, 606 | author = {Jiakai Tang and Heyang Gao and Xuchen Pan and Lei Wang and Haoran Tan and Dawei Gao and Yushuo Chen and Xu Chen and Yankai Lin and Yaliang Li and Bolin Ding and Jingren Zhou and Jun Wang and Ji-Rong Wen}, 607 | year = {2024}, 608 | eprint = {2410.04360}, 609 | archiveprefix = {arXiv}, 610 | primaryclass = {cs.MA} 611 | } 612 | 613 | @misc{bandi2024241004663, 614 | title = {Adversarial Multi-Agent Evaluation of Large Language Models through Iterative Debates}, 615 | author = {Chaithanya Bandi and Abir Harrasse}, 616 | year = {2024}, 617 | eprint = {2410.04663}, 618 | archiveprefix = {arXiv}, 619 | primaryclass = {cs.CL} 620 | } 621 | 622 | @misc{bai2024241008102, 623 | title = {Multi-Agent Collaborative Data Selection for Efficient LLM Pretraining}, 624 | author = {Tianyi Bai and Ling Yang and Zhen Hao Wong and Jiahui Peng and Xinlin Zhuang and Chi Zhang and Lijun Wu and Jiantao Qiu and Wentao Zhang and Binhang Yuan and Conghui He}, 625 | year = {2024}, 626 | eprint = {2410.08102}, 627 | archiveprefix = {arXiv}, 628 | primaryclass = {cs.CL} 629 | } 630 | 631 | @misc{chen2024241008115, 632 | title = {Optima: Optimizing Effectiveness and Efficiency for LLM-Based Multi-Agent System}, 633 | author = {Weize Chen and Jiarui Yuan and Chen Qian and Cheng Yang and Zhiyuan Liu and Maosong Sun}, 634 | year = {2024}, 635 | eprint = {2410.08115}, 636 | archiveprefix = {arXiv}, 637 | primaryclass = {cs.CL} 638 | } 639 | 640 | @misc{christakopoulou2024241008328, 641 | title = {Agents Thinking Fast and Slow: A Talker-Reasoner Architecture}, 642 | author = {Konstantina Christakopoulou and Shibl Mourad and Maja Matarić}, 643 | year = {2024}, 644 | eprint = {2410.08328}, 645 | archiveprefix = {arXiv}, 646 | primaryclass = {cs.AI} 647 | } 648 | 649 | @misc{zhang2024241010762, 650 | title = {AFlow: Automating Agentic Workflow Generation}, 651 | author = {Jiayi Zhang and Jinyu Xiang and Zhaoyang Yu and Fengwei Teng and Xionghui Chen and Jiaqi Chen and Mingchen Zhuge and Xin Cheng and Sirui Hong and Jinlin Wang and Bingnan Zheng and Bang Liu and 
Yuyu Luo and Chenglin Wu}, 652 | year = {2024}, 653 | eprint = {2410.10762}, 654 | archiveprefix = {arXiv}, 655 | primaryclass = {cs.AI} 656 | } 657 | 658 | @misc{zhuge2024241010934, 659 | title = {Agent-as-a-Judge: Evaluate Agents with Agents}, 660 | author = {Mingchen Zhuge and Changsheng Zhao and Dylan Ashley and Wenyi Wang and Dmitrii Khizbullin and Yunyang Xiong and Zechun Liu and Ernie Chang and Raghuraman Krishnamoorthi and Yuandong Tian and Yangyang Shi and Vikas Chandra and Jürgen Schmidhuber}, 661 | year = {2024}, 662 | eprint = {2410.10934}, 663 | archiveprefix = {arXiv}, 664 | primaryclass = {cs.AI} 665 | } 666 | 667 | @misc{li2024241020424, 668 | title = {AutoKaggle: A Multi-Agent Framework for Autonomous Data Science Competitions}, 669 | author = {Ziming Li and Qianbo Zang and David Ma and Jiawei Guo and Tuney Zheng and Minghao Liu and Xinyao Niu and Yue Wang and Jian Yang and Jiaheng Liu and Wanjun Zhong and Wangchunshu Zhou and Wenhao Huang and Ge Zhang}, 670 | year = {2024}, 671 | eprint = {2410.20424}, 672 | archiveprefix = {arXiv}, 673 | primaryclass = {cs.AI} 674 | } 675 | 676 | @misc{nguyen2024241101747, 677 | title = {DynaSaur: Large Language Agents Beyond Predefined Actions}, 678 | author = {Dang Nguyen and Viet Dac Lai and Seunghyun Yoon and Ryan A. Rossi and Handong Zhao and Ruiyi Zhang and Puneet Mathur and Nedim Lipka and Yu Wang and Trung Bui and Franck Dernoncourt and Tianyi Zhou}, 679 | year = {2024}, 680 | eprint = {2411.01747}, 681 | archiveprefix = {arXiv}, 682 | primaryclass = {cs.CL} 683 | } 684 | 685 | @misc{piatti2024cooperatecollapseemergencesustainable, 686 | title = {Cooperate or Collapse: Emergence of Sustainable Cooperation in a Society of LLM Agents}, 687 | author = {Giorgio Piatti and Zhijing Jin and Max Kleiman-Weiner and Bernhard Schölkopf and Mrinmaya Sachan and Rada Mihalcea}, 688 | year = {2024}, 689 | eprint = {2404.16698}, 690 | archiveprefix = {arXiv}, 691 | primaryclass = {cs.CL}, 692 | url = {https://arxiv.org/abs/2404.16698} 693 | } 694 | 695 | @misc{schmidgall2024agentclinicmultimodalagentbenchmark, 696 | title = {AgentClinic: a multimodal agent benchmark to evaluate AI in simulated clinical environments}, 697 | author = {Samuel Schmidgall and Rojin Ziaei and Carl Harris and Eduardo Reis and Jeffrey Jopling and Michael Moor}, 698 | year = {2024}, 699 | eprint = {2405.07960}, 700 | archiveprefix = {arXiv}, 701 | primaryclass = {cs.HC}, 702 | url = {https://arxiv.org/abs/2405.07960} 703 | } 704 | 705 | @misc{doyle2024llmsmethodactorsmodel, 706 | title = {LLMs as Method Actors: A Model for Prompt Engineering and Architecture}, 707 | author = {Colin Doyle}, 708 | year = {2024}, 709 | eprint = {2411.05778}, 710 | archiveprefix = {arXiv}, 711 | primaryclass = {cs.AI}, 712 | url = {https://arxiv.org/abs/2411.05778} 713 | } 714 | 715 | @misc{vallinder2024culturalevolutioncooperationllm, 716 | title = {Cultural Evolution of Cooperation among LLM Agents}, 717 | author = {Aron Vallinder and Edward Hughes}, 718 | year = {2024}, 719 | eprint = {2412.10270}, 720 | archiveprefix = {arXiv}, 721 | primaryclass = {cs.MA}, 722 | url = {https://arxiv.org/abs/2412.10270} 723 | } 724 | 725 | @misc{zhou2024proposeragentevaluatorpaeautonomousskilldiscovery, 726 | title = {Proposer-Agent-Evaluator(PAE): Autonomous Skill Discovery For Foundation Model Internet Agents}, 727 | author = {Yifei Zhou and Qianlan Yang and Kaixiang Lin and Min Bai and Xiong Zhou and Yu-Xiong Wang and Sergey Levine and Erran Li}, 728 | year = {2024}, 729 | eprint = {2412.13194}, 730 | 
archiveprefix = {arXiv}, 731 | primaryclass = {cs.LG}, 732 | url = {https://arxiv.org/abs/2412.13194} 733 | } 734 | 735 | 736 | @misc{kong2025sdposegmentleveldirectpreference, 737 | title = {SDPO: Segment-Level Direct Preference Optimization for Social Agents}, 738 | author = {Aobo Kong and Wentao Ma and Shiwan Zhao and Yongbin Li and Yuchuan Wu and Ke Wang and Xiaoqian Liu and Qicheng Li and Yong Qin and Fei Huang}, 739 | year = {2025}, 740 | eprint = {2501.01821}, 741 | archiveprefix = {arXiv}, 742 | primaryclass = {cs.AI}, 743 | url = {https://arxiv.org/abs/2501.01821} 744 | } 745 | 746 | @misc{gandhi2025boxinggymbenchmarkingprogressautomated, 747 | title = {BoxingGym: Benchmarking Progress in Automated Experimental Design and Model Discovery}, 748 | author = {Kanishk Gandhi and Michael Y. Li and Lyle Goodyear and Louise Li and Aditi Bhaskar and Mohammed Zaman and Noah D. Goodman}, 749 | year = {2025}, 750 | eprint = {2501.01540}, 751 | archiveprefix = {arXiv}, 752 | primaryclass = {cs.LG}, 753 | url = {https://arxiv.org/abs/2501.01540} 754 | } 755 | 756 | @misc{xu2024theagentcompanybenchmarkingllmagents, 757 | title = {TheAgentCompany: Benchmarking LLM Agents on Consequential Real World Tasks}, 758 | author = {Frank F. Xu and Yufan Song and Boxuan Li and Yuxuan Tang and Kritanjali Jain and Mengxue Bao and Zora Z. Wang and Xuhui Zhou and Zhitong Guo and Murong Cao and Mingyang Yang and Hao Yang Lu and Amaad Martin and Zhe Su and Leander Maben and Raj Mehta and Wayne Chi and Lawrence Jang and Yiqing Xie and Shuyan Zhou and Graham Neubig}, 759 | year = {2024}, 760 | eprint = {2412.14161}, 761 | archiveprefix = {arXiv}, 762 | primaryclass = {cs.CL}, 763 | url = {https://arxiv.org/abs/2412.14161} 764 | } 765 | 766 | @misc{feng2025llmdroolsmultillmcollaboration, 767 | title = {When One LLM Drools, Multi-LLM Collaboration Rules}, 768 | author = {Shangbin Feng and Wenxuan Ding and Alisa Liu and Zifeng Wang and Weijia Shi and Yike Wang and Zejiang Shen and Xiaochuang Han and Hunter Lang and Chen-Yu Lee and Tomas Pfister and Yejin Choi and Yulia Tsvetkov}, 769 | year = {2025}, 770 | eprint = {2502.04506}, 771 | archiveprefix = {arXiv}, 772 | primaryclass = {cs.CL}, 773 | url = {https://arxiv.org/abs/2502.04506} 774 | } 775 | @misc{liu2025pcagenthierarchicalmultiagentcollaboration, 776 | title = {PC-Agent: A Hierarchical Multi-Agent Collaboration Framework for Complex Task Automation on PC}, 777 | author = {Haowei Liu and Xi Zhang and Haiyang Xu and Yuyang Wanyan and Junyang Wang and Ming Yan and Ji Zhang and Chunfeng Yuan and Changsheng Xu and Weiming Hu and Fei Huang}, 778 | year = {2025}, 779 | eprint = {2502.14282}, 780 | archiveprefix = {arXiv}, 781 | primaryclass = {cs.CV}, 782 | url = {https://arxiv.org/abs/2502.14282} 783 | } 784 | 785 | @misc{wang2025talkstructurallyacthierarchically, 786 | title = {Talk Structurally, Act Hierarchically: A Collaborative Framework for LLM Multi-Agent Systems}, 787 | author = {Zhao Wang and Sota Moriyama and Wei-Yao Wang and Briti Gangopadhyay and Shingo Takamatsu}, 788 | year = {2025}, 789 | eprint = {2502.11098}, 790 | archiveprefix = {arXiv}, 791 | primaryclass = {cs.AI}, 792 | url = {https://arxiv.org/abs/2502.11098} 793 | } 794 | 795 | @misc{zhu2025multiagentbenchevaluatingcollaborationcompetition, 796 | title = {MultiAgentBench: Evaluating the Collaboration and Competition of LLM agents}, 797 | author = {Kunlun Zhu and Hongyi Du and Zhaochen Hong and Xiaocheng Yang and Shuyi Guo and Zhe Wang and Zhenhailong Wang and Cheng Qian and Xiangru Tang and 
Heng Ji and Jiaxuan You}, 798 | year = {2025}, 799 | eprint = {2503.01935}, 800 | archiveprefix = {arXiv}, 801 | primaryclass = {cs.MA}, 802 | url = {https://arxiv.org/abs/2503.01935} 803 | } 804 | 805 | @misc{motwani2025maltimprovingreasoningmultiagent, 806 | title = {MALT: Improving Reasoning with Multi-Agent LLM Training}, 807 | author = {Sumeet Ramesh Motwani and Chandler Smith and Rocktim Jyoti Das and Rafael Rafailov and Ivan Laptev and Philip H. S. Torr and Fabio Pizzati and Ronald Clark and Christian Schroeder de Witt}, 808 | year = {2025}, 809 | eprint = {2412.01928}, 810 | archiveprefix = {arXiv}, 811 | primaryclass = {cs.LG}, 812 | url = {https://arxiv.org/abs/2412.01928} 813 | } 814 | 815 | @misc{qin2024mp5multimodalopenendedembodied, 816 | title = {MP5: A Multi-modal Open-ended Embodied System in Minecraft via Active Perception}, 817 | author = {Yiran Qin and Enshen Zhou and Qichang Liu and Zhenfei Yin and Lu Sheng and Ruimao Zhang and Yu Qiao and Jing Shao}, 818 | year = {2024}, 819 | eprint = {2312.07472}, 820 | archiveprefix = {arXiv}, 821 | primaryclass = {cs.CV}, 822 | url = {https://arxiv.org/abs/2312.07472} 823 | } 824 | 825 | @misc{zhang2025multiagentarchitecturesearchagentic, 826 | title = {Multi-agent Architecture Search via Agentic Supernet}, 827 | author = {Guibin Zhang and Luyang Niu and Junfeng Fang and Kun Wang and Lei Bai and Xiang Wang}, 828 | year = {2025}, 829 | eprint = {2502.04180}, 830 | archiveprefix = {arXiv}, 831 | primaryclass = {cs.LG}, 832 | url = {https://arxiv.org/abs/2502.04180} 833 | } 834 | 835 | @misc{piao2025agentsocietylargescalesimulationllmdriven, 836 | title = {AgentSociety: Large-Scale Simulation of LLM-Driven Generative Agents Advances Understanding of Human Behaviors and Society}, 837 | author = {Jinghua Piao and Yuwei Yan and Jun Zhang and Nian Li and Junbo Yan and Xiaochong Lan and Zhihong Lu and Zhiheng Zheng and Jing Yi Wang and Di Zhou and Chen Gao and Fengli Xu and Fang Zhang and Ke Rong and Jun Su and Yong Li}, 838 | year = {2025}, 839 | eprint = {2502.08691}, 840 | archiveprefix = {arXiv}, 841 | primaryclass = {cs.SI}, 842 | url = {https://arxiv.org/abs/2502.08691} 843 | } 844 | 845 | @misc{yang2024oasisopenagentsocial, 846 | title = {OASIS: Open Agent Social Interaction Simulations with One Million Agents}, 847 | author = {Ziyi Yang and Zaibin Zhang and Zirui Zheng and Yuxian Jiang and Ziyue Gan and Zhiyu Wang and Zijian Ling and Jinsong Chen and Martz Ma and Bowen Dong and Prateek Gupta and Shuyue Hu and Zhenfei Yin and Guohao Li and Xu Jia and Lijun Wang and Bernard Ghanem and Huchuan Lu and Chaochao Lu and Wanli Ouyang and Yu Qiao and Philip Torr and Jing Shao}, 848 | year = {2024}, 849 | eprint = {2411.11581}, 850 | archiveprefix = {arXiv}, 851 | primaryclass = {cs.CL}, 852 | url = {https://arxiv.org/abs/2411.11581} 853 | } 854 | 855 | @misc{michelman2025enhancingreasoningcollaborationmemory, 856 | title = {Enhancing Reasoning with Collaboration and Memory}, 857 | author = {Julie Michelman and Nasrin Baratalipour and Matthew Abueg}, 858 | year = {2025}, 859 | eprint = {2503.05944}, 860 | archiveprefix = {arXiv}, 861 | primaryclass = {cs.AI}, 862 | url = {https://arxiv.org/abs/2503.05944} 863 | } 864 | 865 | @misc{gawade2025multiagentbasedmedical, 866 | title={Multi Agent based Medical Assistant for Edge Devices}, 867 | author={Sakharam Gawade and Shivam Akhouri and Chinmay Kulkarni and Jagdish Samant and Pragya Sahu and Aastik and Jai Pahal and Saswat Meher}, 868 | year={2025}, 869 | eprint={2503.05397}, 870 | 
archivePrefix={arXiv}, 871 | primaryClass={cs.MA}, 872 | url={https://arxiv.org/abs/2503.05397}, 873 | } --------------------------------------------------------------------------------