AskToAct: Enhancing LLMs Tool Use via Self-Correcting Clarification

πŸ“… 2025-03-03
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ€– AI Summary
Existing interactive clarification methods for ambiguous user queries in LLM-based tool invocation rely on manually curated data and lack multi-turn error correction mechanisms. Method: We propose the first parameter-driven automated clarification data construction paradigm: (1) generating high-quality multi-turn clarification samples via structured query-tool mapping; (2) introducing selective masking fine-tuning for dynamic error detection and self-correction; and (3) integrating parameter disentanglement reconstruction, intent recovery modeling, and tool trajectory-augmented training. Contribution/Results: Our self-correcting clarification framework requires no human annotation and supports zero-shot API generalization. Experiments show 79.2% accuracy in implicit intent recovery, a 48.3% improvement in clarification efficiency, and zero-shot adaptation performance on unseen APIs comparable to GPT-4β€”while significantly reducing inference overhead.

πŸ“ Abstract
Large language models (LLMs) have demonstrated remarkable capabilities in tool learning. In real-world scenarios, user queries are often ambiguous and incomplete, requiring effective clarification. However, existing interactive clarification approaches face two critical limitations: reliance on manually constructed datasets and lack of error correction mechanisms during multi-turn clarification. We present AskToAct, which addresses these challenges by exploiting the structural mapping between queries and their tool invocation solutions. Our key insight is that tool parameters naturally represent explicit user intents. By systematically removing key parameters from queries while retaining them as ground truth, we enable automated construction of high-quality training data. We further enhance model robustness by fine-tuning on error-correction augmented data using a selective masking mechanism, enabling dynamic error detection during clarification interactions. Comprehensive experiments demonstrate that AskToAct significantly outperforms existing approaches, achieving above 79% accuracy in recovering critical unspecified intents and enhancing clarification efficiency by an average of 48.34% while maintaining high accuracy in tool invocation. Our framework exhibits robust performance across varying complexity levels and successfully generalizes to entirely unseen APIs without additional training, achieving performance comparable to GPT-4 with substantially fewer computational resources.
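The key insight above (tool parameters as explicit user intents) suggests how training data can be built automatically. The following is a minimal illustrative sketch, not the paper's implementation: all function and field names are assumptions. It drops one known parameter value from a query to create an ambiguous variant, while keeping the value as ground truth for the clarification turn.

```python
# Hypothetical sketch of clarification data construction by parameter
# removal: the dropped value becomes the ground-truth clarification
# answer. Names and data layout are illustrative, not from the paper.

def build_clarification_sample(query, tool_call, drop_param):
    """Remove one parameter's value from the query; keep it as ground truth."""
    value = tool_call["arguments"][drop_param]
    ambiguous_query = query.replace(value, "").strip()   # naive text removal
    ambiguous_query = " ".join(ambiguous_query.split())  # collapse extra spaces
    return {
        "ambiguous_query": ambiguous_query,
        "clarification_question": f"Could you specify the {drop_param}?",
        "ground_truth_answer": value,
        "target_call": tool_call,
    }

sample = build_clarification_sample(
    query="Book a flight to Paris on May 3",
    tool_call={"name": "book_flight",
               "arguments": {"destination": "Paris", "date": "May 3"}},
    drop_param="destination",
)
# sample["ambiguous_query"] no longer mentions Paris; "Paris" is retained
# as the expected answer to the generated clarification question.
```

In this scheme no human annotation is needed: any (query, tool call) pair from an existing tool-use corpus yields clarification samples for free, one per removable parameter.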
Problem

Research questions and friction points this paper is trying to address.

User queries for tool use are often ambiguous or incomplete.
Existing clarification approaches depend on manually constructed datasets.
Multi-turn clarification lacks mechanisms for detecting and correcting errors.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Automated training data construction via parameter removal
Error-correction augmented data with selective masking
Dynamic error detection during multi-turn clarification
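The selective-masking idea in the bullets above can be sketched as a label-masking step at fine-tuning time. This is a hedged illustration, assuming the common convention of label `-100` to exclude tokens from the cross-entropy loss; the segment tags and function name are hypothetical, not the paper's API.

```python
# Illustrative selective masking for error-correction fine-tuning:
# tokens from injected erroneous turns are labeled -100 so the loss
# is computed only on the corrected continuation. Segment tagging
# ("context" / "error" / "correction") is an assumed annotation.

IGNORE_INDEX = -100  # conventional "ignored" label for cross-entropy

def selective_mask_labels(token_ids, segment_tags):
    """Keep labels only for 'correction' segments; mask everything else."""
    return [tok if tag == "correction" else IGNORE_INDEX
            for tok, tag in zip(token_ids, segment_tags)]

tokens = [11, 12, 13, 14, 15]
tags = ["context", "error", "error", "correction", "correction"]
labels = selective_mask_labels(tokens, tags)
```

The design intent is that the model sees erroneous turns in its input context (so it learns to detect them) but is never trained to reproduce them, only to produce the correction.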
πŸ‘₯ Authors
Xuan Zhang, Yongliang Shen, Zhe Zheng, Linjuan Wu, Wenqi Zhang, Yuchen Yan (Zhejiang University); Qiuying Peng, Jun Wang (OPPO Research Institute); Weiming Lu (Zhejiang University)