Enhancing Tool Learning in Large Language Models with Hierarchical Error Checklists

📅 2025-05-28
📈 Citations: 0
Influential: 0
🤖 AI Summary
Large language models (LLMs) frequently fail at tool invocation because of erroneous parameter filling. To address this, the authors propose HiTEC, a hierarchical tool error checking framework that enables fine-grained diagnosis and mitigation through a two-level (global and local) error checklist. Two deployments build on this structure: HiTEC-ICL, which augments in-context learning by embedding the global checklist in the initial prompt, and HiTEC-KTO, which uses the checklists to generate high-quality negative samples for preference-based fine-tuning with Kahneman-Tversky Optimization (KTO), enabling robust generalization without reliance on large-scale real-world interaction data. Evaluated across five public benchmarks, HiTEC achieves significant improvements in parameter-filling accuracy and tool-call success rate, consistently outperforming state-of-the-art baselines.

📝 Abstract
Large language models (LLMs) have significantly advanced natural language processing, particularly through the integration of external tools and APIs. However, their effectiveness is frequently hampered by parameter mis-filling during tool calling. In this paper, we propose the Hierarchical Tool Error Checklist (HiTEC) framework to systematically diagnose and mitigate tool-calling errors without relying on extensive real-world interactions. HiTEC introduces a two-tiered approach: a global error checklist that identifies common, cross-tool issues, and a local error checklist that targets tool-specific and contextual failures. Building on this structure, we propose two deployments: HiTEC-In Context Learning (HiTEC-ICL) and HiTEC-Kahneman-Tversky Optimization (HiTEC-KTO). HiTEC-ICL embeds the global checklist in the initial prompts and leverages a two-round conversational interaction to dynamically refine parameter handling, while HiTEC-KTO generates high-quality negative examples to drive fine-tuning via preference-based optimization. Extensive experiments across five public datasets demonstrate that our framework significantly improves parameter-filling accuracy and tool-calling success rates compared to baseline methods.
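The two-tiered design described above can be pictured as a validation pass over an LLM-proposed tool call: a global checklist catches cross-tool mistakes (empty or placeholder values), while a local checklist enforces each tool's own schema. A minimal sketch follows; the rules, tool schema, and example call are illustrative assumptions, not taken from the paper:

```python
# Sketch of a two-tiered (global + local) error checklist applied to a
# proposed tool call. All rules and the tool schema are hypothetical.

def global_checklist(call):
    """Cross-tool checks that apply to every invocation."""
    errors = []
    for name, value in call["arguments"].items():
        if value in ("", None):
            errors.append(f"global: parameter '{name}' is empty")
        elif isinstance(value, str) and value.strip().startswith("<"):
            errors.append(f"global: parameter '{name}' looks like an unfilled placeholder")
    return errors

def local_checklist(call, schema):
    """Tool-specific checks derived from the tool's parameter schema."""
    errors = []
    for name in schema.get("required", []):
        if name not in call["arguments"]:
            errors.append(f"local: required parameter '{name}' is missing")
    for name, value in call["arguments"].items():
        expected = schema["parameters"].get(name)
        if expected is not None and not isinstance(value, expected):
            errors.append(f"local: parameter '{name}' should be {expected.__name__}")
    return errors

# Hypothetical tool schema and a faulty call with two checklist violations
schema = {"parameters": {"city": str, "days": int}, "required": ["city", "days"]}
call = {"tool": "get_weather", "arguments": {"city": "", "days": "three"}}

report = global_checklist(call) + local_checklist(call, schema)
for err in report:
    print(err)
```

In a HiTEC-ICL-style deployment, a report like this would be fed back to the model in a second conversational round so it can repair the offending parameters before the tool is actually invoked.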
Problem

Research questions and friction points this paper is trying to address.

Diagnosing tool-calling errors in LLMs without real-world interactions
Improving parameter-filling accuracy in large language models
Enhancing tool-calling success rates via hierarchical error checklists
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hierarchical Tool Error Checklist (HiTEC) framework
Two-tiered global and local error checklists
HiTEC-ICL and HiTEC-KTO deployment methods
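The HiTEC-KTO idea of turning checklist-flagged failures into training signal can be sketched as pairing each faulty call with its corrected counterpart and labeling them as undesirable/desirable examples, the binary-feedback format that KTO-style preference tuning consumes. The field names and example calls below are assumptions for illustration, not the paper's exact data recipe:

```python
# Illustrative construction of KTO-style preference data from a
# checklist-flagged tool call. Field names and examples are hypothetical.
import json

def to_kto_examples(prompt, good_call, bad_call):
    """Emit one desirable and one undesirable record per pair, as used
    by binary-feedback (KTO-style) preference optimization."""
    return [
        {"prompt": prompt, "completion": json.dumps(good_call), "label": True},
        {"prompt": prompt, "completion": json.dumps(bad_call), "label": False},
    ]

prompt = "User: What's the weather in Paris for the next 3 days?"
good = {"tool": "get_weather", "arguments": {"city": "Paris", "days": 3}}
bad = {"tool": "get_weather", "arguments": {"city": "", "days": "three"}}  # checklist-flagged

dataset = to_kto_examples(prompt, good, bad)
print(len(dataset))  # two records: one positive, one negative
```

Because the negative samples come from the checklists rather than from logged failures, this construction needs no large-scale real-world interaction data, which is the point the paper emphasizes.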