Enhancing Tool Learning in Large Language Models with Hierarchical Error Checklists

📅 2025-05-28
📈 Citations: 0
Influential: 0
🤖 AI Summary
Large language models (LLMs) frequently fail at tool invocation because of erroneous parameter filling. To address this, the authors propose HiTEC, a hierarchical tool error checking framework that enables fine-grained diagnosis and mitigation through a two-level (global and local) error checklist. Two deployments build on this structure: HiTEC-ICL, which augments in-context learning by embedding the global checklist in the initial prompt, and HiTEC-KTO, which uses the checklists to generate high-quality negative samples for preference-based fine-tuning with Kahneman-Tversky Optimization (KTO), enabling robust generalization without reliance on large-scale real-world interaction data. Evaluated across five public benchmarks, HiTEC achieves significant improvements in parameter-filling accuracy and tool-call success rate, consistently outperforming state-of-the-art baselines.

📝 Abstract
Large language models (LLMs) have significantly advanced natural language processing, particularly through the integration of external tools and APIs. However, their effectiveness is frequently hampered by parameter mis-filling during tool calling. In this paper, we propose the Hierarchical Tool Error Checklist (HiTEC) framework to systematically diagnose and mitigate tool-calling errors without relying on extensive real-world interactions. HiTEC introduces a two-tiered approach: a global error checklist that identifies common, cross-tool issues, and a local error checklist that targets tool-specific and contextual failures. Building on this structure, we propose two deployments: HiTEC-In Context Learning (HiTEC-ICL) and HiTEC-Kahneman-Tversky Optimization (HiTEC-KTO). HiTEC-ICL embeds the global checklist in the initial prompts and leverages a two-round conversational interaction to dynamically refine parameter handling, while HiTEC-KTO generates high-quality negative examples to drive fine-tuning via preference-based optimization. Extensive experiments across five public datasets demonstrate that our framework significantly improves parameter-filling accuracy and tool-calling success rates compared to baseline methods.
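The two-tiered design described above can be pictured as a validation pass over an LLM-proposed tool call: a global checklist catches cross-tool mistakes (empty or placeholder values), while a local checklist enforces each tool's own schema. A minimal sketch follows; the rules, tool schema, and example call are illustrative assumptions, not taken from the paper:

```python
# Sketch of a two-tiered (global + local) error checklist applied to a
# proposed tool call. All rules and the tool schema are hypothetical.

def global_checklist(call):
    """Cross-tool checks that apply to every invocation."""
    errors = []
    for name, value in call["arguments"].items():
        if value in ("", None):
            errors.append(f"global: parameter '{name}' is empty")
        elif isinstance(value, str) and value.strip().startswith("<"):
            errors.append(f"global: parameter '{name}' looks like an unfilled placeholder")
    return errors

def local_checklist(call, schema):
    """Tool-specific checks derived from the tool's parameter schema."""
    errors = []
    for name in schema.get("required", []):
        if name not in call["arguments"]:
            errors.append(f"local: required parameter '{name}' is missing")
    for name, value in call["arguments"].items():
        expected = schema["parameters"].get(name)
        if expected is not None and not isinstance(value, expected):
            errors.append(f"local: parameter '{name}' should be {expected.__name__}")
    return errors

# Hypothetical tool schema and a faulty call with two checklist violations
schema = {"parameters": {"city": str, "days": int}, "required": ["city", "days"]}
call = {"tool": "get_weather", "arguments": {"city": "", "days": "three"}}

report = global_checklist(call) + local_checklist(call, schema)
for err in report:
    print(err)
```

In a HiTEC-ICL-style deployment, a report like this would be fed back to the model in a second conversational round so it can repair the offending parameters before the tool is actually invoked.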
Problem

Research questions and friction points this paper is trying to address.

Diagnosing tool-calling errors in LLMs without real-world interactions
Improving parameter-filling accuracy in large language models
Enhancing tool-calling success rates via hierarchical error checklists
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hierarchical Tool Error Checklist (HiTEC) framework
Two-tiered global and local error checklists
HiTEC-ICL and HiTEC-KTO deployment methods
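The HiTEC-KTO idea of turning checklist-flagged failures into training signal can be sketched as pairing each faulty call with its corrected counterpart and labeling them as undesirable/desirable examples, the binary-feedback format that KTO-style preference tuning consumes. The field names and example calls below are assumptions for illustration, not the paper's exact data recipe:

```python
# Illustrative construction of KTO-style preference data from a
# checklist-flagged tool call. Field names and examples are hypothetical.
import json

def to_kto_examples(prompt, good_call, bad_call):
    """Emit one desirable and one undesirable record per pair, as used
    by binary-feedback (KTO-style) preference optimization."""
    return [
        {"prompt": prompt, "completion": json.dumps(good_call), "label": True},
        {"prompt": prompt, "completion": json.dumps(bad_call), "label": False},
    ]

prompt = "User: What's the weather in Paris for the next 3 days?"
good = {"tool": "get_weather", "arguments": {"city": "Paris", "days": 3}}
bad = {"tool": "get_weather", "arguments": {"city": "", "days": "three"}}  # checklist-flagged

dataset = to_kto_examples(prompt, good, bad)
print(len(dataset))  # two records: one positive, one negative
```

Because the negative samples come from the checklists rather than from logged failures, this construction needs no large-scale real-world interaction data, which is the point the paper emphasizes.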