└── README.md


/README.md:
--------------------------------------------------------------------------------
  1 | # awesome-llm-tool-learning
  2 | A list of awesome papers on LLM tool learning.
  3 | 
  4 | ## Preliminary
  5 | ReAct: Synergizing Reasoning and Acting in Language Models [[ICLR 2023](https://arxiv.org/abs/2210.03629)][[Code](https://github.com/ysymyth/ReAct)]
  6 | 
  7 | RRHF: Rank Responses to Align Language Models with Human Feedback without tears [[NeurIPS 2023](https://arxiv.org/abs/2304.05302)][[Code](https://github.com/ganjinzero/rrhf)]
  8 | 
  9 | Extending Context Window of Large Language Models via Positional Interpolation [[Arxiv 2023](https://arxiv.org/abs/2306.15595)][[Code](https://github.com/ymcui/chinese-llama-alpaca-2)]
 10 | 
 11 | ## Survey
 12 | Tool Learning with Foundation Models [[Arxiv](https://arxiv.org/abs/2304.08354)][[Code](https://github.com/openbmb/bmtools)]
 13 | 
 14 | ## Papers
 15 | ### 2023
 16 | HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in Hugging Face [[Arxiv](https://arxiv.org/abs/2303.17580)][[Code](https://github.com/microsoft/JARVIS)]
 17 | 
 18 | ART: Automatic multi-step reasoning and tool-use for large language models [[Arxiv](https://arxiv.org/abs/2303.09014)][[Code](https://github.com/guidance-ai/guidance)]
 19 | 
 20 | Gorilla: Large Language Model Connected with Massive APIs [[Arxiv](https://arxiv.org/abs/2305.15334)][[Code](https://github.com/ShishirPatil/gorilla)]
 21 | 
 22 | TaskMatrix.AI: Completing Tasks by Connecting Foundation Models with Millions of APIs [[Arxiv](https://arxiv.org/abs/2303.16434)][[Code](https://github.com/moymix/TaskMatrix)]
 23 | 
 24 | Large Language Models as Tool Makers [[Arxiv](https://arxiv.org/abs/2305.17126)][[Code](https://github.com/ctlllll/llm-toolmaker)]
 25 | 
 26 | MultiTool-CoT: GPT-3 Can Use Multiple External Tools with Chain of Thought Prompting [[ACL 2023](https://arxiv.org/pdf/2305.16896.pdf)][[Code](https://github.com/InabaTatsuro/MultiTool-CoT)]
 27 | 
 28 | Gentopia.AI: A Collaborative Platform for Tool-Augmented LLMs [[EMNLP 2023](https://aclanthology.org/2023.emnlp-demo.20/)][[Code](https://github.com/Gentopia-AI/Gentopia)]
 29 | 
 30 | CREATOR: Tool Creation for Disentangling Abstract and Concrete Reasoning of Large Language Models [[EMNLP Findings 2023](https://aclanthology.org/2023.findings-emnlp.462.pdf)][[Code](https://github.com/qiancheng0/creator)]
 31 | 
 32 | On the Tool Manipulation Capability of Open-source Large Language Models [[Arxiv](https://arxiv.org/abs/2305.16504)][[Code](https://github.com/sambanova/toolbench)]
 33 | 
 34 | Tool Documentation Enables Zero-Shot Tool-Usage with Large Language Models [[Arxiv](https://arxiv.org/pdf/2308.00675.pdf)]
 35 | 
 36 | Chameleon: Plug-and-Play Compositional Reasoning with Large Language Models [[NeurIPS 2023](https://arxiv.org/abs/2304.09842)][[Code](https://github.com/lupantech/chameleon-llm)]
 37 | 
 38 | **Toolformer: Language Models Can Teach Themselves to Use Tools** [[NeurIPS 2023](https://arxiv.org/abs/2302.04761)][[Code](https://github.com/lucidrains/toolformer-pytorch)]
 39 | 
 40 | **GPT4Tools: Teaching Large Language Model to Use Tools via Self-instruction** [[NeurIPS 2023](https://arxiv.org/abs/2305.18752)][[Code](https://github.com/AILab-CVC/GPT4Tools)]
 41 | 
 42 | Evaluating and Improving Tool-Augmented Computation-Intensive Math Reasoning [[NeurIPS 2023](https://arxiv.org/abs/2306.02408)][[Code](https://github.com/rucaibox/carp)]
 43 | 
 44 | ToolkenGPT: Augmenting Frozen Language Models with Massive Tools via Tool Embeddings [[NeurIPS 2023](https://arxiv.org/abs/2305.11554)][[Code](https://github.com/Ber666/ToolkenGPT)]
 45 | 
 46 | TPTU: Large Language Model-based AI Agents for Task Planning and Tool Usage [[NeurIPS 2023 Workshop](https://arxiv.org/abs/2308.03427)]
 47 | 
 48 | Making Language Models Better Tool Learners with Execution Feedback [[Arxiv](https://arxiv.org/abs/2305.13068)][[Code](https://github.com/zjunlp/trice)] ![](https://img.shields.io/badge/RL-orange)
 49 | 
 50 | ToolAlpaca: Generalized Tool Learning for Language Models with 3000 Simulated Cases [[Arxiv](https://arxiv.org/abs/2306.05301)][[Code](https://github.com/tangqiaoyu/ToolAlpaca)]
 51 | 
 52 | Confucius: Iterative Tool Learning from Introspection Feedback by Easy-to-Difficult Curriculum [[AAAI 2024](https://arxiv.org/abs/2308.14034)][[Code](https://github.com/shizhl/Confucius)]
 53 | 
 54 | **ToolLLM: Facilitating Large Language Models to Master 16000+ Real-world APIs** [[ICLR 2024](https://arxiv.org/pdf/2307.16789.pdf)][[Code](https://github.com/openbmb/toolbench)]
 55 | 
 56 | CRAFT: Customizing LLMs by Creating and Retrieving from Specialized Toolsets [[ICLR 2024](https://arxiv.org/abs/2309.17428)][[Code](https://github.com/lifan-yuan/craft)]
 57 | 
 58 | ToolDec: Syntax Error-Free and Generalizable Tool Use for LLMs via Finite-State Decoding [[Arxiv](https://arxiv.org/abs/2310.07075)][[Code](https://github.com/chenhongqiao/tooldec)]
 59 | 
 60 | Identifying the Risks of LM Agents with an LM-Emulated Sandbox [[Arxiv](https://arxiv.org/pdf/2309.15817.pdf)][[Code](https://github.com/ryoungj/ToolEmu)]
 61 | 
 62 | ToolChain*: Efficient Action Space Navigation in Large Language Models with A* Search [[Arxiv](https://arxiv.org/pdf/2310.13227v1.pdf)] ![](https://img.shields.io/badge/Planning-green)
 63 | 
 64 | Tool-Augmented Reward Modeling [[Arxiv](https://arxiv.org/abs/2310.01045)] ![](https://img.shields.io/badge/RL-orange)
 65 | 
 66 | ToRA: A Tool-Integrated Reasoning Agent for Mathematical Problem Solving [[Arxiv](https://arxiv.org/abs/2309.17452)][[Code](https://github.com/microsoft/ToRA)]
 67 | 
 68 | CRITIC: Large Language Models Can Self-Correct with Tool-Interactive Critiquing [[Arxiv](https://arxiv.org/pdf/2305.11738.pdf)][[Code](https://github.com/microsoft/ProphetNet/tree/master/CRITIC)]
 69 | 
 70 | RestGPT: Connecting Large Language Models with Real-World RESTful APIs [[Arxiv](https://arxiv.org/abs/2306.06624)]
 71 | 
 72 | **Fortify the Shortest Stave in Attention: Enhancing Context Awareness of Large Language Models for Effective Tool Use** [[Arxiv](https://arxiv.org/abs/2312.04455)]
 73 | 
 74 | ControlLLM: Augment Language Models with Tools by Searching on Graphs [[Arxiv](https://arxiv.org/pdf/2310.17796.pdf)][[Code](https://github.com/opengvlab/controlllm)] ![](https://img.shields.io/badge/Planning-green)
 75 | 
 76 | GEAR: Augmenting Language Models with Generalizable and Efficient Tool Resolution [[Arxiv](https://arxiv.org/abs/2307.08775)][[Code](https://github.com/yining610/gear)]
 77 | 
 78 | GitAgent: Facilitating Autonomous Agent with GitHub by Tool Extension [[Arxiv](https://arxiv.org/pdf/2312.17294.pdf)]
 79 | 
 80 | AppAgent: Multimodal Agents as Smartphone Users [[Arxiv](https://arxiv.org/abs/2312.13771)][[Code](https://github.com/mnotgod96/AppAgent)]
 81 | 
 82 | VIoTGPT: Learning to Schedule Vision Tools towards Intelligent Video Internet of Things [[Arxiv](https://arxiv.org/abs/2312.00401)]
 83 | 
 84 | Reverse Chain: A Generic-Rule for LLMs to Master Multi-API Planning [[Arxiv](https://arxiv.org/pdf/2310.04474.pdf)][[Code](https://github.com/ASK-03/Reverse-Chain)] ![](https://img.shields.io/badge/Planning-green)
 85 | 
 86 | CLOVA: A Closed-Loop Visual Assistant with Tool Usage and Update [[Arxiv](https://arxiv.org/abs/2312.10908)]
 87 | 
 88 | FARS: Fsm-Augmentation to Make LLMs Hallucinate the Right APIs [[OpenReview](https://openreview.net/pdf/847a1c7446716c28f2c9c63fa1d7bf07d02e7757.pdf)]
 89 | 
 90 | Reinforced UI Instruction Grounding: Towards a Generic UI Task Automation API [[Arxiv](https://arxiv.org/pdf/2310.04716.pdf)] ![](https://img.shields.io/badge/RL-orange)
 91 | 
 92 | ### 2024
 93 | Navigating Uncertainty: Optimizing API Dependency for Hallucination Reduction in Closed-Book Question Answering [[Arxiv](https://arxiv.org/abs/2401.01780)]
 94 | 
 95 | **EASYTOOL: Enhancing LLM-based Agents with Concise Tool Instruction** [[Arxiv](https://arxiv.org/abs/2401.06201)][[Code](https://github.com/microsoft/JARVIS/tree/main/easytool)]
 96 | 
 97 | **Small LLMs Are Weak Tool Learners: A Multi-LLM Agent** [[Arxiv](https://arxiv.org/abs/2401.07324)]
 98 | 
 99 | 
100 | MLLM-Tool: A Multimodal Large Language Model For Tool Agent Learning [[Arxiv](https://arxiv.org/abs/2401.10727)][[Code](https://github.com/MLLM-Tool/MLLM-Tool)]
101 | 
102 | **Efficient Tool Use with Chain-of-Abstraction Reasoning** [[Arxiv](https://arxiv.org/pdf/2401.17464v1.pdf)]
103 | 
104 | AnyTool: Self-Reflective, Hierarchical Agents for Large-Scale API Calls [[Arxiv](https://arxiv.org/abs/2402.04253)][[Code](https://github.com/dyabel/anytool)]
105 | 
106 | ToolRerank: Adaptive and Hierarchy-Aware Reranking for Tool Retrieval [[Arxiv](https://arxiv.org/abs/2403.06551)][[Code](https://github.com/XiaoMi/ToolRerank)]
107 | 
108 | ToolNet: Connecting Large Language Models with Massive Tools via Tool Graph [[Arxiv](https://arxiv.org/pdf/2403.00839.pdf)]
109 | 
110 | API-BLEND: A Comprehensive Corpora for Training and Benchmarking API LLMs [[Arxiv](https://arxiv.org/abs/2402.15491)]
111 | 
112 | TOOLVERIFIER: Generalization to New Tools via Self-Verification [[Arxiv](https://arxiv.org/abs/2402.14158)][[Code](https://github.com/facebookresearch/toolverifier)]
113 | 
114 | Look Before You Leap: Towards Decision-Aware and Generalizable Tool-Usage for Large Language Models [[Arxiv](https://arxiv.org/abs/2402.16696)]
115 | 
116 | **LLMs in the Imaginarium: Tool Learning through Simulated Trial and Error** [[Arxiv](https://arxiv.org/abs/2403.04746)][[Code](https://github.com/microsoft/simulated-trial-and-error)]
117 | 
118 | Middleware for LLMs: Tools Are Instrumental for Language Agents in Complex Environments [[Arxiv](https://arxiv.org/abs/2402.14672)][[Code](https://github.com/OSU-NLP-Group/Fuxi)]
119 | 
120 | Equipping Language Models with Tool Use Capability for Tabular Data Analysis in Finance [[Arxiv](https://arxiv.org/abs/2401.15328)][[Code](https://github.com/adriantheuma/llama2-raven)]
121 | 
122 | ## Benchmark
123 | (APIBench) Gorilla: Large Language Model Connected with Massive APIs [[Arxiv](https://arxiv.org/abs/2305.15334)][[Code](https://github.com/ShishirPatil/gorilla)]
124 | 
125 | API-Bank: A Comprehensive Benchmark for Tool-Augmented LLMs [[EMNLP 2023](https://arxiv.org/abs/2304.08244)][[Code](https://github.com/AlibabaResearch/DAMO-ConvAI/tree/main/api-bank)]
126 | 
127 | (ToolBench) ToolLLM: Facilitating Large Language Models to Master 16000+ Real-world APIs [[Arxiv](https://arxiv.org/pdf/2307.16789.pdf)][[Code](https://github.com/openbmb/toolbench)]
128 | 
129 | ToolQA: A Dataset for LLM Question Answering with External Tools [[NeurIPS 2023](https://arxiv.org/abs/2306.13304)][[Code](https://github.com/night-chen/toolqa)]
130 | 
131 | MetaTool Benchmark: Deciding Whether to Use Tools and Which to Use [[Arxiv](https://arxiv.org/abs/2310.03128)][[Code](https://github.com/howiehwong/metatool)]
132 | 
133 | T-Eval: Evaluating the Tool Utilization Capability Step by Step [[Arxiv](https://arxiv.org/pdf/2312.14033.pdf)][[Code](https://github.com/open-compass/T-Eval)]
134 | 
135 | MINT: Evaluating LLMs in Multi-turn Interaction with Tools and Language Feedback [[Arxiv](https://arxiv.org/abs/2309.10691)][[Code](https://github.com/xingyaoww/mint-bench)]
136 | 
137 | ToolEyes: Fine-Grained Evaluation for Tool Learning Capabilities of Large Language Models in Real-world Scenarios [[Arxiv](https://arxiv.org/abs/2401.00741)][[Code](https://github.com/junjie-ye/tooleyes)]
138 | 
139 | A Comprehensive Evaluation of Tool-Assisted Generation Strategies [[EMNLP Findings 2023](https://arxiv.org/abs/2310.10062)]
140 | 
141 | ToolTalk: Evaluating Tool-Usage in a Conversational Setting [[Arxiv](https://arxiv.org/abs/2311.10775)][[Code](https://github.com/microsoft/ToolTalk)]
142 | 
143 | InfiAgent-DABench: Evaluating Agents on Data Analysis Tasks [[Arxiv](https://arxiv.org/abs/2401.05507)][[Code](https://github.com/infiagent/infiagent)]
144 | 
145 | RoTBench: A Multi-Level Benchmark for Evaluating the Robustness of Large Language Models in Tool Learning [[Arxiv](https://arxiv.org/abs/2401.08326)][[Code](https://github.com/junjie-ye/rotbench)]
146 | 
147 | Planning, Creation, Usage: Benchmarking LLMs for Comprehensive Tool Utilization in Real-World Complex Scenarios [[Arxiv](https://arxiv.org/abs/2401.17167)][[Code](https://github.com/joeying1019/ultratool)]
148 | 
149 | ToolSword: Unveiling Safety Issues of Large Language Models in Tool Learning Across Three Stages [[Arxiv](https://arxiv.org/abs/2402.10753)][[Code](https://github.com/junjie-ye/toolsword)]
150 | 
151 | StableToolBench: Towards Stable Large-Scale Benchmarking on Tool Learning of Large Language Models [[Arxiv](https://arxiv.org/pdf/2403.07714.pdf)][[Code](https://github.com/zhichengg/stabletoolbench)]
152 | 
153 | m&m's: A Benchmark to Evaluate Tool-Use for multi-step multi-modal Tasks [[Arxiv](https://arxiv.org/abs/2403.11085)][[Code](https://github.com/RAIVNLab/mnms)]
154 | 
155 | 
156 | 
157 | 
158 | 


--------------------------------------------------------------------------------