Case-Based Calibration of Adaptive Reasoning and Execution for LLM Tool Use

📅 2026-05-14

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Large language models (LLMs) often struggle with tool use because they must balance reasoning depth against structural validity; when that balance fails, outputs are either redundant or structurally flawed. To address this, the work proposes CAST, a framework that brings case-driven learning to LLM tool invocation. CAST builds structured cases from historical execution trajectories, extracts task-complexity and failure patterns from them, and combines fine-grained rewards with reinforcement learning so the model calibrates its reasoning strategy adaptively and avoids structural errors proactively. Experiments on the BFCLv2 and ToolBench benchmarks show that CAST improves execution accuracy by up to 5.85 percentage points while reducing average reasoning length by 26%, substantially mitigating high-impact structural failures.
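To make "structurally flawed output" concrete, the following is a minimal sketch of a structural check on a JSON tool call. It is not taken from the paper: the schema layout, the `get_weather` tool, and the error tag names are all hypothetical, chosen only for illustration.

```python
import json
from typing import List

# Hypothetical tool schema: parameter name -> (expected Python type, required?)
WEATHER_SCHEMA = {
    "name": "get_weather",
    "params": {"city": (str, True), "days": (int, False)},
}


def structural_errors(raw_call: str, schema: dict) -> List[str]:
    """Return the structural problems found in a JSON-encoded tool call."""
    try:
        call = json.loads(raw_call)
    except json.JSONDecodeError:
        return ["malformed_json"]
    if not isinstance(call, dict):
        return ["not_an_object"]

    errors = []
    if call.get("name") != schema["name"]:
        errors.append("wrong_tool_name")

    args = call.get("arguments")
    if not isinstance(args, dict):
        return errors + ["missing_arguments"]

    # Check every declared parameter for presence and type.
    for param, (expected_type, required) in schema["params"].items():
        if param not in args:
            if required:
                errors.append(f"missing_param:{param}")
        elif not isinstance(args[param], expected_type):
            errors.append(f"wrong_type:{param}")

    # Flag arguments the schema does not declare.
    for param in args:
        if param not in schema["params"]:
            errors.append(f"unknown_param:{param}")
    return errors
```

A call such as `{"name": "get_weather", "arguments": {"days": "3"}}` would be flagged for both a missing required parameter and a wrongly typed one, which is the kind of failure the summary refers to as structural.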

📝 Abstract

Tool use extends large language models beyond parametric knowledge, but reliable execution requires balancing appropriate reasoning depth with strict structural validity. We approach this problem from a case-based perspective and present CAST, a case-driven framework that treats historical execution trajectories as structured cases. Instead of reusing raw exemplar outputs, CAST extracts two case-derived signals: complexity profiles, which estimate the optimal reasoning strategy, and failure profiles, which map likely structural breakdowns. The framework translates this knowledge into a fine-grained reward design and adaptive reasoning, enabling the model to autonomously internalize case-based strategies during reinforcement learning. Experiments on BFCLv2 and ToolBench demonstrate that CAST improves both schema-faithful execution and task-level tool-use success while reducing unnecessary deliberation. The approach achieves a gain of up to 5.85 percentage points in overall execution accuracy and reduces average reasoning length by 26%, significantly mitigating high-impact structural errors. Ultimately, these results demonstrate how historical execution cases can provide reusable adaptation knowledge for calibrated tool use.
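The abstract does not give the reward formula, so the following is only a rough sketch of how complexity and failure profiles might feed a fine-grained reward. Everything here is an assumption for illustration: the `Case` fields, the use of median successful reasoning length as a budget, and the `length_weight` and `error_weight` values are hypothetical and are not CAST's actual design.

```python
from collections import Counter
from dataclasses import dataclass
from statistics import median
from typing import Dict, List, Optional


@dataclass
class Case:
    """One historical execution trajectory (hypothetical fields)."""
    task_type: str
    reasoning_tokens: int
    success: bool
    error_tag: Optional[str] = None  # e.g. "missing_param"; None on success


def complexity_profile(cases: List[Case], task_type: str) -> Optional[float]:
    """Assumed budget proxy: median reasoning length of successful cases."""
    lengths = [c.reasoning_tokens for c in cases
               if c.task_type == task_type and c.success]
    return median(lengths) if lengths else None


def failure_profile(cases: List[Case], task_type: str) -> Dict[str, float]:
    """Relative frequency of each error tag among failed cases."""
    tags = [c.error_tag for c in cases
            if c.task_type == task_type and not c.success and c.error_tag]
    counts = Counter(tags)
    total = sum(counts.values())
    return {tag: n / total for tag, n in counts.items()} if total else {}


def reward(correct: bool, reasoning_tokens: int, errors: List[str],
           budget: Optional[float], failure_freq: Dict[str, float],
           length_weight: float = 0.2, error_weight: float = 0.25) -> float:
    """Correctness minus penalties for over-long reasoning and structural errors."""
    r = 1.0 if correct else 0.0
    if budget:
        # Penalize only tokens beyond the case-derived budget, capped at 1x.
        overshoot = max(0.0, reasoning_tokens - budget) / budget
        r -= length_weight * min(overshoot, 1.0)
    for tag in set(errors):
        # Errors that were historically frequent are penalized more heavily.
        r -= error_weight * (1.0 + failure_freq.get(tag, 0.0))
    return r
```

In this sketch a correct answer that stays within the historical budget earns the full reward, while extra deliberation or a historically common structural error lowers it; a reinforcement learning loop would then optimize against this scalar.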

Problem

Research questions and friction points this paper is trying to address.

tool use

reasoning depth

structural validity

execution reliability

case-based calibration

Innovation

Methods, ideas, or system contributions that make the work stand out.

case-based reasoning

tool use

adaptive reasoning