CodeTool: Enhancing Programmatic Tool Invocation of LLMs via Process Supervision

📅 2025-03-26
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the low tool-use efficiency and difficulty in verifying intermediate steps when large language models (LLMs) tackle complex tasks, this paper proposes a stepwise tool-calling framework grounded in code generation. The method integrates codified tool interfaces, process-aware supervised learning, cumulative reward modeling, and reinforcement learning–based policy optimization. Its core contribution is a novel dual-process reward mechanism: an on-the-spot reward provides immediate feedback to ensure step-level execution correctness, while a latent reward quantifies each step’s contribution to global reasoning utility, jointly optimizing both efficiency and reliability. Evaluated on StableToolBench and RestBench-TMDB benchmarks, the framework achieves significant improvements in tool-call accuracy and reasoning efficiency, surpassing current state-of-the-art approaches.

📝 Abstract
Tool invocation significantly enhances the capabilities of Large Language Models (LLMs), yet challenges persist, particularly in complex task scenarios. Current methods, such as instruction-enhanced reasoning and supervised fine-tuning, often result in unnecessarily long reasoning paths and face difficulties in verifying the correctness of intermediate steps. In this paper, we propose CodeTool, a novel framework for stepwise code generation that improves LLM tool invocation by leveraging the concise and easily verifiable nature of code. CodeTool incorporates two distinct process rewards: the On-the-spot Reward, which provides immediate feedback on the accuracy of each tool invocation, and the Latent Reward, which assesses the contribution of each step toward overall task completion. By maximizing the cumulative On-the-spot and Latent Rewards at each step, LLMs are guided to follow efficient and accurate reasoning paths. Extensive experiments on StableToolBench and RestBench-TMDB demonstrate the superiority of CodeTool over existing approaches.
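The dual-reward step selection described above can be sketched as follows. This is a minimal, hypothetical illustration, not the paper's actual implementation: the function names (`on_the_spot_reward`, `latent_reward`, `select_step`) and the toy scoring heuristics are assumptions. In the real framework, the On-the-spot Reward comes from executing the generated tool-call code, and the Latent Reward from a learned process reward model.

```python
def on_the_spot_reward(snippet: str) -> float:
    """Immediate feedback on step-level correctness: here, a stand-in check
    that the generated code at least compiles. The actual framework would
    execute the tool call and verify its result."""
    try:
        compile(snippet, "<candidate>", "exec")
        return 1.0
    except SyntaxError:
        return 0.0

def latent_reward(snippet: str, task: str) -> float:
    """Estimated contribution of this step toward overall task completion.
    A real system would use a trained reward model; this toy heuristic just
    checks whether the snippet mentions the tool named in the task."""
    tool_name = task.split()[0]
    return 1.0 if tool_name in snippet else 0.0

def select_step(candidates: list[str], task: str) -> str:
    """Choose the candidate step that maximizes the cumulative process
    reward (On-the-spot + Latent), guiding the reasoning path."""
    return max(candidates, key=lambda c: on_the_spot_reward(c) + latent_reward(c, task))

# Two candidate next steps: one valid tool call, one with a syntax error.
candidates = [
    "result = search_movie('Inception')",   # executes cleanly, task-relevant
    "result = search_movie('Inception'",    # broken call -> zero On-the-spot reward
]
best = select_step(candidates, "search_movie by title")
# The valid, task-relevant candidate wins under the combined reward.
```

At each reasoning step the framework would regenerate candidates conditioned on prior steps, so maximizing the per-step cumulative reward steers the model toward short, verifiable tool-call sequences.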
Problem

Research questions and friction points this paper is trying to address.

Improving LLM tool invocation in complex tasks
Reducing lengthy reasoning paths in tool use
Enhancing verification of intermediate steps accuracy
Innovation

Methods, ideas, or system contributions that make the work stand out.

Stepwise code generation for LLM tool invocation
On-the-spot and Latent Rewards for process supervision
Concise verifiable code improves reasoning accuracy