# awesome-llm-tool-learning
A list of awesome papers on LLM tool learning.

## Preliminary
ReAct: Synergizing Reasoning and Acting in Language Models [[ICLR 2023](https://arxiv.org/abs/2210.03629)][[Code](https://github.com/ysymyth/ReAct)]

RRHF: Rank Responses to Align Language Models with Human Feedback without tears [[NeurIPS 2023](https://arxiv.org/abs/2304.05302)][[Code](https://github.com/ganjinzero/rrhf)]

Extending Context Window of Large Language Models via Positional Interpolation [[Arxiv 2023](https://arxiv.org/abs/2306.15595)][[Code](https://github.com/ymcui/chinese-llama-alpaca-2)]

## Survey
Tool Learning with Foundation Models [[Arxiv](https://arxiv.org/abs/2304.08354)][[Code](https://github.com/openbmb/bmtools)]

## Papers
### 2023
HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in Hugging Face [[Arxiv](https://arxiv.org/abs/2303.17580)][[Code](https://github.com/microsoft/JARVIS)]

ART: Automatic multi-step reasoning and tool-use for large language models [[Arxiv](https://arxiv.org/abs/2303.09014)][[Code](https://github.com/guidance-ai/guidance)]

Gorilla: Large Language Model Connected with Massive APIs [[Arxiv](https://arxiv.org/abs/2305.15334)][[Code](https://github.com/ShishirPatil/gorilla)]

TaskMatrix.AI: Completing Tasks by Connecting Foundation Models with Millions of APIs [[Arxiv](https://arxiv.org/abs/2303.16434)][[Code](https://github.com/moymix/TaskMatrix)]

Large Language Models as Tool Makers [[Arxiv](https://arxiv.org/abs/2305.17126)][[Code](https://github.com/ctlllll/llm-toolmaker)]

MultiTool-CoT: GPT-3 Can Use Multiple External Tools with Chain of Thought Prompting [[ACL 2023](https://arxiv.org/pdf/2305.16896.pdf)][[Code](https://github.com/InabaTatsuro/MultiTool-CoT)]

Gentopia.AI: A Collaborative Platform for Tool-Augmented LLMs [[EMNLP 2023](https://aclanthology.org/2023.emnlp-demo.20/)][[Code](https://github.com/Gentopia-AI/Gentopia)]

CREATOR: Tool Creation for Disentangling Abstract and Concrete Reasoning of Large Language Models [[EMNLP Findings 2023](https://aclanthology.org/2023.findings-emnlp.462.pdf)][[Code](https://github.com/qiancheng0/creator)]

On the Tool Manipulation Capability of Open-source Large Language Models [[Arxiv](https://arxiv.org/abs/2305.16504)][[Code](https://github.com/sambanova/toolbench)]

Tool Documentation Enables Zero-Shot Tool-Usage with Large Language Models [[Arxiv](https://arxiv.org/pdf/2308.00675.pdf)]

Chameleon: Plug-and-Play Compositional Reasoning with Large Language Models [[NeurIPS 2023](https://arxiv.org/abs/2304.09842)][[Code](https://github.com/lupantech/chameleon-llm)]

**Toolformer: Language Models Can Teach Themselves to Use Tools** [[NeurIPS 2023](https://arxiv.org/abs/2302.04761)][[Code](https://github.com/lucidrains/toolformer-pytorch)]

**GPT4Tools: Teaching Large Language Model to Use Tools via Self-instruction** [[NeurIPS 2023](https://arxiv.org/abs/2305.18752)][[Code](https://github.com/AILab-CVC/GPT4Tools)]

Evaluating and Improving Tool-Augmented Computation-Intensive Math Reasoning [[NeurIPS 2023](https://arxiv.org/abs/2306.02408)][[Code](https://github.com/rucaibox/carp)]

ToolkenGPT: Augmenting Frozen Language Models with Massive Tools via Tool Embeddings [[NeurIPS 2023](https://arxiv.org/abs/2305.11554)][[Code](https://github.com/Ber666/ToolkenGPT)]

TPTU: Large Language Model-based AI Agents for Task Planning and Tool Usage [[NeurIPS 2023 Workshop](https://arxiv.org/abs/2308.03427)]

Making Language Models Better Tool Learners with Execution Feedback [[Arxiv](https://arxiv.org/abs/2305.13068)][[Code](https://github.com/zjunlp/trice)] ![](https://img.shields.io/badge/RL-orange)

ToolAlpaca: Generalized Tool Learning for Language Models with 3000 Simulated Cases [[Arxiv](https://arxiv.org/abs/2306.05301)][[Code](https://github.com/tangqiaoyu/ToolAlpaca)]

Confucius: Iterative Tool Learning from Introspection Feedback by Easy-to-Difficult Curriculum [[AAAI 2024](https://arxiv.org/abs/2308.14034)][[Code](https://github.com/shizhl/Confucius)]

**ToolLLM: Facilitating Large Language Models to Master 16000+ Real-world APIs** [[ICLR 2024](https://arxiv.org/pdf/2307.16789.pdf)][[Code](https://github.com/openbmb/toolbench)]

CRAFT: Customizing LLMs by Creating and Retrieving from Specialized Toolsets [[ICLR 2024](https://arxiv.org/abs/2309.17428)][[Code](https://github.com/lifan-yuan/craft)]

ToolDec: Syntax Error-Free and Generalizable Tool Use for LLMs via Finite-State Decoding [[Arxiv](https://arxiv.org/abs/2310.07075)][[Code](https://github.com/chenhongqiao/tooldec)]

Identifying the Risks of LM Agents with an LM-Emulated Sandbox [[Arxiv](https://arxiv.org/pdf/2309.15817.pdf)][[Code](https://github.com/ryoungj/ToolEmu)]

ToolChain*: Efficient Action Space Navigation in Large Language Models with A* Search [[Arxiv](https://arxiv.org/pdf/2310.13227v1.pdf)] ![](https://img.shields.io/badge/Planning-green)

Tool-Augmented Reward Modeling [[Arxiv](https://arxiv.org/abs/2310.01045)] ![](https://img.shields.io/badge/RL-orange)

ToRA: A Tool-Integrated Reasoning Agent for Mathematical Problem Solving [[Arxiv](https://arxiv.org/abs/2309.17452)][[Code](https://github.com/microsoft/ToRA)]

CRITIC: Large Language Models Can Self-Correct with Tool-Interactive Critiquing [[Arxiv](https://arxiv.org/pdf/2305.11738.pdf)][[Code](https://github.com/microsoft/ProphetNet/tree/master/CRITIC)]

RestGPT: Connecting Large Language Models with Real-World RESTful APIs [[Arxiv](https://arxiv.org/abs/2306.06624)]

**Fortify the Shortest Stave in Attention: Enhancing Context Awareness of Large Language Models for Effective Tool Use** [[Arxiv](https://arxiv.org/abs/2312.04455)]

ControlLLM: Augment Language Models with Tools by Searching on Graphs [[Arxiv](https://arxiv.org/pdf/2310.17796.pdf)][[Code](https://github.com/opengvlab/controlllm)] ![](https://img.shields.io/badge/Planning-green)

GEAR: Augmenting Language Models with Generalizable and Efficient Tool Resolution [[Arxiv](https://arxiv.org/abs/2307.08775)][[Code](https://github.com/yining610/gear)]

GitAgent: Facilitating Autonomous Agent with GitHub by Tool Extension [[Arxiv](https://arxiv.org/pdf/2312.17294.pdf)]

AppAgent: Multimodal Agents as Smartphone Users [[Arxiv](https://arxiv.org/abs/2312.13771)][[Code](https://github.com/mnotgod96/AppAgent)]

VIoTGPT: Learning to Schedule Vision Tools towards Intelligent Video Internet of Things [[Arxiv](https://arxiv.org/abs/2312.00401)]

Reverse Chain: A Generic-Rule for LLMs to Master Multi-API Planning [[Arxiv](https://arxiv.org/pdf/2310.04474.pdf)][[Code](https://github.com/ASK-03/Reverse-Chain)] ![](https://img.shields.io/badge/Planning-green)

CLOVA: A Closed-Loop Visual Assistant with Tool Usage and Update [[Arxiv](https://arxiv.org/abs/2312.10908)]

FARS: FSM-Augmentation to Make LLMs Hallucinate the Right APIs [[OpenReview](https://openreview.net/pdf/847a1c7446716c28f2c9c63fa1d7bf07d02e7757.pdf)]

Reinforced UI Instruction Grounding: Towards a Generic UI Task Automation API [[Arxiv](https://arxiv.org/pdf/2310.04716.pdf)] ![](https://img.shields.io/badge/RL-orange)

Navigating Uncertainty: Optimizing API Dependency for Hallucination Reduction in Closed-Book Question Answering [[Arxiv](https://arxiv.org/abs/2401.01780)]

**EASYTOOL: Enhancing LLM-based Agents with Concise Tool Instruction** [[Arxiv](https://arxiv.org/abs/2401.06201)][[Code](https://github.com/microsoft/JARVIS/tree/main/easytool)]

**Small LLMs Are Weak Tool Learners: A Multi-LLM Agent** [[Arxiv](https://arxiv.org/abs/2401.07324)]

### 2024
MLLM-Tool: A Multimodal Large Language Model For Tool Agent Learning [[Arxiv](https://arxiv.org/abs/2401.10727)][[Code](https://github.com/MLLM-Tool/MLLM-Tool)]

**Efficient Tool Use with Chain-of-Abstraction Reasoning** [[Arxiv](https://arxiv.org/pdf/2401.17464v1.pdf)]

AnyTool: Self-Reflective, Hierarchical Agents for Large-Scale API Calls [[Arxiv](https://arxiv.org/abs/2402.04253)][[Code](https://github.com/dyabel/anytool)]

ToolRerank: Adaptive and Hierarchy-Aware Reranking for Tool Retrieval [[Arxiv](https://arxiv.org/abs/2403.06551)][[Code](https://github.com/XiaoMi/ToolRerank)]

ToolNet: Connecting Large Language Models with Massive Tools via Tool Graph [[Arxiv](https://arxiv.org/pdf/2403.00839.pdf)]

API-BLEND: A Comprehensive Corpora for Training and Benchmarking API LLMs [[Arxiv](https://arxiv.org/abs/2402.15491)]

TOOLVERIFIER: Generalization to New Tools via Self-Verification [[Arxiv](https://arxiv.org/abs/2402.14158)][[Code](https://github.com/facebookresearch/toolverifier)]

Look Before You Leap: Towards Decision-Aware and Generalizable Tool-Usage for Large Language Models [[Arxiv](https://arxiv.org/abs/2402.16696)]

**LLMs in the Imaginarium: Tool Learning through Simulated Trial and Error** [[Arxiv](https://arxiv.org/abs/2403.04746)][[Code](https://github.com/microsoft/simulated-trial-and-error)]

Middleware for LLMs: Tools Are Instrumental for Language Agents in Complex Environments [[Arxiv](https://arxiv.org/abs/2402.14672)][[Code](https://github.com/OSU-NLP-Group/Fuxi)]

Equipping Language Models with Tool Use Capability for Tabular Data Analysis in Finance [[Arxiv](https://arxiv.org/abs/2401.15328)][[Code](https://github.com/adriantheuma/llama2-raven)]

## Benchmark
(APIBench) Gorilla: Large Language Model Connected with Massive APIs [[Arxiv](https://arxiv.org/abs/2305.15334)][[Code](https://github.com/ShishirPatil/gorilla)]

API-Bank: A Comprehensive Benchmark for Tool-Augmented LLMs [[EMNLP 2023](https://arxiv.org/abs/2304.08244)][[Code](https://github.com/AlibabaResearch/DAMO-ConvAI/tree/main/api-bank)]

(ToolBench) ToolLLM: Facilitating Large Language Models to Master 16000+ Real-world APIs [[ICLR 2024](https://arxiv.org/pdf/2307.16789.pdf)][[Code](https://github.com/openbmb/toolbench)]

ToolQA: A Dataset for LLM Question Answering with External Tools [[NeurIPS 2023](https://arxiv.org/abs/2306.13304)][[Code](https://github.com/night-chen/toolqa)]

MetaTool Benchmark: Deciding Whether to Use Tools and Which to Use [[Arxiv](https://arxiv.org/abs/2310.03128)][[Code](https://github.com/howiehwong/metatool)]

T-Eval: Evaluating the Tool Utilization Capability Step by Step [[Arxiv](https://arxiv.org/pdf/2312.14033.pdf)][[Code](https://github.com/open-compass/T-Eval)]

MINT: Evaluating LLMs in Multi-turn Interaction with Tools and Language Feedback [[Arxiv](https://arxiv.org/abs/2309.10691)][[Code](https://github.com/xingyaoww/mint-bench)]

ToolEyes: Fine-Grained Evaluation for Tool Learning Capabilities of Large Language Models in Real-world Scenarios [[Arxiv](https://arxiv.org/abs/2401.00741)][[Code](https://github.com/junjie-ye/tooleyes)]

A Comprehensive Evaluation of Tool-Assisted Generation Strategies [[EMNLP Findings 2023](https://arxiv.org/abs/2310.10062)]

ToolTalk: Evaluating Tool-Usage in a Conversational Setting [[Arxiv](https://arxiv.org/abs/2311.10775)][[Code](https://github.com/microsoft/ToolTalk)]

InfiAgent-DABench: Evaluating Agents on Data Analysis Tasks [[Arxiv](https://arxiv.org/abs/2401.05507)][[Code](https://github.com/infiagent/infiagent)]

RoTBench: A Multi-Level Benchmark for Evaluating the Robustness of Large Language Models in Tool Learning [[Arxiv](https://arxiv.org/abs/2401.08326)][[Code](https://github.com/junjie-ye/rotbench)]

Planning, Creation, Usage: Benchmarking LLMs for Comprehensive Tool Utilization in Real-World Complex Scenarios [[Arxiv](https://arxiv.org/abs/2401.17167)][[Code](https://github.com/joeying1019/ultratool)]

ToolSword: Unveiling Safety Issues of Large Language Models in Tool Learning Across Three Stages [[Arxiv](https://arxiv.org/abs/2402.10753)][[Code](https://github.com/junjie-ye/toolsword)]

StableToolBench: Towards Stable Large-Scale Benchmarking on Tool Learning of Large Language Models [[Arxiv](https://arxiv.org/pdf/2403.07714.pdf)][[Code](https://github.com/zhichengg/stabletoolbench)]

m&m's: A Benchmark to Evaluate Tool-Use for multi-step multi-modal Tasks [[Arxiv](https://arxiv.org/abs/2403.11085)][[Code](https://github.com/RAIVNLab/mnms)]