🤖 AI Summary
To address the scarcity of high-quality, diverse function-calling training data and the limited coverage and low accuracy of existing synthetic approaches, this paper proposes a self-evolving multi-agent data synthesis framework. First, it introduces an API self-evolution mining mechanism to construct a highly comprehensive toolset covering 26,507 APIs. Second, it integrates multi-agent collaborative dialogue with formal reasoning guidance to generate instruction-call pairs exhibiting high complexity and diversity. Third, it employs a dual-layer verification scheme—combining rule-based checks and large language model (LLM) evaluation—to ensure both semantic correctness and executable accuracy. Remarkably, using only an 8B-parameter model, the proposed method achieves state-of-the-art performance on the Berkeley Function-Calling Leaderboard, rivaling GPT-4's accuracy. The code, model weights, and a subset of the synthesized data are publicly released on Hugging Face.
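To make the dual-layer verification idea concrete, here is a minimal sketch of how such a scheme could be structured. All names here (`rule_check`, `verify`, the schema layout, and the `llm_judge` callback) are illustrative assumptions, not the paper's actual implementation: a cheap rule-based pass filters out structurally invalid synthesized calls, and only survivors would be forwarded to an LLM judge for semantic review.

```python
def rule_check(call: dict, api_schema: dict) -> bool:
    """Layer 1 (rule-based): check the call against the API schema.

    Rejects calls with a wrong function name, missing required
    parameters, or hallucinated parameters not in the schema.
    """
    if call.get("name") != api_schema["name"]:
        return False
    params = api_schema["parameters"]
    args = call.get("arguments", {})
    # Every required parameter must be supplied.
    for name, spec in params.items():
        if spec.get("required") and name not in args:
            return False
    # No arguments outside the declared schema.
    return all(a in params for a in args)

def verify(call: dict, api_schema: dict, llm_judge) -> bool:
    """Layer 2 runs only if layer 1 passes; llm_judge is a
    stand-in for a model-based semantic check."""
    return rule_check(call, api_schema) and llm_judge(call, api_schema)

# Example: a hypothetical weather API schema.
schema = {
    "name": "get_weather",
    "parameters": {
        "city": {"type": "string", "required": True},
        "unit": {"type": "string", "required": False},
    },
}
good = {"name": "get_weather", "arguments": {"city": "Paris"}}
bad = {"name": "get_weather", "arguments": {"zip": "75001"}}
print(rule_check(good, schema))  # True
print(rule_check(bad, schema))   # False: required "city" missing
```

The design rationale for ordering the layers this way is cost: rule-based checks are essentially free, so spending LLM evaluation only on structurally valid candidates keeps the pipeline's accuracy guarantees while bounding its expense.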
📝 Abstract
Function calling significantly extends the application boundary of large language models, where high-quality and diverse training data is critical for unlocking this capability. However, real function-calling data is quite challenging to collect and annotate, while synthetic data generated by existing pipelines tends to lack coverage and accuracy. In this paper, we present ToolACE, an automatic agentic pipeline designed to generate accurate, complex, and diverse tool-learning data. ToolACE leverages a novel self-evolution synthesis process to curate a comprehensive API pool of 26,507 diverse APIs. Dialogs are further generated through the interplay among multiple agents, guided by a formalized thinking process. To ensure data accuracy, we implement a dual-layer verification system combining rule-based and model-based checks. We demonstrate that models trained on our synthesized data, even with only 8B parameters, achieve state-of-the-art performance on the Berkeley Function-Calling Leaderboard, rivaling the latest GPT-4 models. Our model and a subset of the data are publicly available at https://huggingface.co/Team-ACE.