ToolPRM: Fine-Grained Inference Scaling of Structured Outputs for Function Calling

📅 2025-10-16
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the limited reasoning scalability of large language models (LLMs) on structured output tasks, particularly function calling, this paper introduces the first process-level scaling framework for structured reasoning. The work makes three key contributions: (1) the first fine-grained supervision dataset for function-calling processes, annotated automatically via function masking; (2) the "explore more but retain less" principle for scaling structured reasoning; and (3) ToolPRM, a *process-oriented reward model* that scores the internal steps of each function call and is integrated with fine-grained beam search. Evaluated across multiple function-calling benchmarks, the approach significantly outperforms coarse-grained and outcome-level reward baselines. The results demonstrate that explicitly modeling intermediate reasoning steps improves both the effectiveness and the generalizability of structured reasoning scaling, validating the critical role of process-aware supervision in advancing LLMs' structured output capabilities.

📝 Abstract
Large language models (LLMs) are increasingly demonstrating strong capabilities as autonomous agents, with function calling serving as a core mechanism for interaction with the environment. Meanwhile, inference scaling has become a cutting-edge technique for enhancing LLM performance by allocating more computational resources during inference. However, current research on inference scaling focuses primarily on unstructured generation tasks, leaving its application to structured outputs, such as function calling, largely underexplored. To bridge this gap, we propose an inference scaling framework that combines fine-grained beam search with a process reward model, ToolPRM, which scores the internal steps of each single function call. To train ToolPRM, we construct the first fine-grained intra-call process supervision dataset, automatically annotated with function-masking techniques to provide step-level rewards for structured tool-use reasoning. Extensive experiments demonstrate that ToolPRM outperforms coarse-grained and outcome reward models in predictive accuracy, indicating a stronger capability to supervise the function-calling inference process. Inference scaling equipped with ToolPRM also significantly improves the backbone model's performance across a variety of function-calling tasks and benchmarks. More importantly, we reveal a key principle for applying inference scaling to structured outputs: "explore more but retain less," owing to the unrecoverable nature of structured function-call generation.
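The fine-grained beam search the abstract describes can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: `toy_prm_score` and `toy_propose` are hypothetical stand-ins for ToolPRM and the backbone model, and the widths are arbitrary. The key idea is the "explore more but retain less" principle: propose many candidates per internal step, but keep only a few beams, since a wrong committed step in a structured output cannot be recovered later.

```python
import heapq

def toy_prm_score(steps):
    """Stand-in for a process reward model: scores a partial
    function-call prefix (list of internal steps). Here it simply
    favors shorter prefixes; ToolPRM would score step validity."""
    return 1.0 / (1.0 + sum(len(s) for s in steps))

def toy_propose(steps, n):
    """Stand-in for the backbone model: proposes n candidate next
    internal steps (e.g. a function name, then argument values)."""
    return [f"step{len(steps)}_cand{i}" for i in range(n)]

def fine_grained_beam_search(num_steps, expand_width=8, beam_size=2):
    """'Explore more but retain less': a large expand_width per
    internal step, but a small beam_size of retained prefixes."""
    beams = [[]]  # each beam is a list of internal call steps
    for _ in range(num_steps):
        candidates = []
        for steps in beams:
            # explore more: propose many continuations per beam
            for nxt in toy_propose(steps, expand_width):
                candidates.append(steps + [nxt])
        # retain less: keep only the top-scoring prefixes
        beams = heapq.nlargest(beam_size, candidates, key=toy_prm_score)
    return beams

surviving = fine_grained_beam_search(num_steps=3)
print(len(surviving))  # beam_size surviving call prefixes
```

With outcome-level rewards, scoring happens only after the full call is generated; scoring each internal step instead lets the search prune a malformed prefix before the error becomes irreversible.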
Problem

Research questions and friction points this paper is trying to address.

Scaling inference for structured function calling outputs in LLMs
Developing fine-grained process supervision for function call steps
Addressing unrecoverability in structured output generation through exploration
Innovation

Methods, ideas, or system contributions that make the work stand out.

Fine-grained beam search with process reward model
Automatically annotated intra-call process supervision dataset
Inference scaling principle: explore more but retain less
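The "intra-call" granularity above can be made concrete with a toy decomposition of one call into the internal steps a process reward model would score individually. The call, the step split, and the reward values are all illustrative assumptions, not the paper's annotation scheme.

```python
def call_to_steps(call):
    """Split one structured function call into fine-grained internal
    steps: first the function name, then each argument assignment.
    A process reward model scores each of these steps separately."""
    steps = [("function_name", call["name"])]
    steps += [("argument", f"{k}={v}") for k, v in call["arguments"].items()]
    return steps

# Illustrative example call and made-up per-step rewards standing in
# for a trained PRM's outputs.
call = {"name": "get_weather", "arguments": {"city": "Paris", "unit": "celsius"}}
step_rewards = {"function_name": 0.9, "argument": 0.7}

for kind, content in call_to_steps(call):
    print(f"{kind:14s} {content:16s} reward~{step_rewards[kind]}")
```

An outcome reward model would assign one score to the whole call; the step-level view is what allows beam search to intervene between the function-name step and each argument step.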