Self-Guided Function Calling in Large Language Models via Stepwise Experience Recall

📅 2025-08-20
📈 Citations: 0
Influential: 0
🤖 AI Summary
Large language models (LLMs) face challenges in multi-step tool invocation, including tool selection, parameter generation, and sequential planning, while existing approaches rely on hand-crafted demonstrations or static tool repositories and therefore scale poorly and carry high maintenance overhead. This paper proposes Stepwise Experience Recall (SEER), a framework that maintains a dynamically updated experience pool to enable fine-grained, step-level experience retrieval and reuse, supporting self-guided continual learning without manual prompt engineering. Evaluated with Qwen2.5-series models, SEER improves accuracy by 6.1% on easy and 4.7% on hard ToolQA questions, and by 7.44% (Qwen2.5-7B) and 23.38% (Qwen2.5-72B) on τ-bench, substantially outperforming baselines. The core contribution is the first integration of an experience replay mechanism into multi-step tool calling, enabling autonomous experience accumulation, dynamic retrieval, and cross-task generalization.

📝 Abstract
Function calling enables large language models (LLMs) to interact with external systems by leveraging tools and APIs. When faced with multi-step tool usage, LLMs still struggle with tool selection, parameter generation, and tool-chain planning. Existing methods typically rely on manually designed task-specific demonstrations or retrieval from a curated library. These approaches demand substantial expert effort, and prompt engineering becomes increasingly complex and inefficient as tool diversity and task difficulty grow. To address these challenges, we propose a self-guided method, Stepwise Experience Recall (SEER), which performs fine-grained, stepwise retrieval from a continually updated experience pool. Instead of relying on a static or manually curated library, SEER incrementally augments the experience pool with past successful trajectories, enabling continuous expansion of the pool and improved model performance over time. Evaluated on the ToolQA benchmark, SEER achieves an average improvement of 6.1% on easy and 4.7% on hard questions. We further test SEER on τ-bench, which includes two real-world domains. Powered by Qwen2.5-7B and Qwen2.5-72B models, SEER demonstrates substantial accuracy gains of 7.44% and 23.38%, respectively.
Problem

Research questions and friction points this paper is trying to address.

LLMs struggle with multi-step tool selection and planning
Existing methods require manual effort and complex engineering
Need for scalable self-guided function calling approach
Innovation

Methods, ideas, or system contributions that make the work stand out.

Self-guided stepwise retrieval from experience pool
Continuously updated experience pool with past successes
Fine-grained retrieval replacing static manual libraries
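The idea sketched in these bullets can be illustrated with a toy experience pool. This is a minimal sketch, not the paper's implementation: the class name, record layout (state vector plus tool call), and cosine-similarity ranking are all assumptions standing in for whatever encoder and retrieval scheme SEER actually uses. It shows the two mechanics the bullets describe: only successful trajectories augment the pool, and retrieval happens per step rather than per task.

```python
import math

# Hypothetical sketch of a SEER-style experience pool; names and
# structure are illustrative assumptions, not the paper's API.
class ExperiencePool:
    """Stores step-level records harvested from past successful trajectories."""

    def __init__(self):
        self.records = []  # each record: (state_vector, tool_call)

    def add_trajectory(self, steps, success):
        # Only successful trajectories are added, so the pool
        # grows with reusable experience over time.
        if success:
            self.records.extend(steps)

    def recall(self, state_vector, k=3):
        # Fine-grained, stepwise retrieval: rank stored steps by
        # cosine similarity to the current step's state.
        def cosine(a, b):
            dot = sum(x * y for x, y in zip(a, b))
            na = math.sqrt(sum(x * x for x in a))
            nb = math.sqrt(sum(y * y for y in b))
            return dot / (na * nb) if na and nb else 0.0

        ranked = sorted(self.records,
                        key=lambda r: cosine(r[0], state_vector),
                        reverse=True)
        return [tool_call for _, tool_call in ranked[:k]]


pool = ExperiencePool()
pool.add_trajectory([([1.0, 0.0], "search(flights)"),
                     ([0.0, 1.0], "book(seat)")], success=True)
pool.add_trajectory([([0.5, 0.5], "cancel(order)")], success=False)  # discarded
print(pool.recall([0.9, 0.1], k=1))  # → ['search(flights)']
```

The recalled tool calls would be injected as step-level demonstrations for the current decision, which is what lets the pool replace a static, manually curated library.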
Sijia Cui
The Key Laboratory of Cognition and Decision Intelligence for Complex Systems, Institute of Automation, Chinese Academy of Sciences
Aiyao He
The Key Laboratory of Cognition and Decision Intelligence for Complex Systems, Institute of Automation, Chinese Academy of Sciences
Shuai Xu
Nanjing University of Information Science & Technology
Hongming Zhang
The Key Laboratory of Cognition and Decision Intelligence for Complex Systems, Institute of Automation, Chinese Academy of Sciences
Yanna Wang
The Key Laboratory of Cognition and Decision Intelligence for Complex Systems, Institute of Automation, Chinese Academy of Sciences
Qingyang Zhang
PhD student, Tianjin University
Large Reasoning Models · Out-of-Distribution · Multimodal Fusion
Yajing Wang
Institute of Computing Technology, Chinese Academy of Sciences
Bo Xu
The Key Laboratory of Cognition and Decision Intelligence for Complex Systems, Institute of Automation, Chinese Academy of Sciences