π€ AI Summary
Large language models (LLMs) suffer from hallucination in multi-question joint reasoning due to ambiguous knowledge boundaries.
Method: We propose the first stepwise fine-tuning framework for decoupled optimization of multi-answer generation and confidence estimation. Our approach introduces a two-stage instruction-tuning framework that explicitly models the modelβs awareness of its knowledge boundaries via multitask loss separation, confidence calibration, and boundary-aware regularization. Unlike prior methods estimating confidence per isolated question, ours jointly optimizes answer generation and confidence assessment across interdependent questions.
Contribution/Results: Evaluated on multi-question reasoning benchmarks, our method achieves a 25% average improvement in precision over state-of-the-art hallucination mitigation and confidence calibration approaches. It establishes a novel paradigm for enhancing LLM reliability through structured, boundary-conscious reasoning.
π Abstract
With the widespread application of large language models (LLMs), the issue of generating non-existing facts, known as hallucination, has garnered increasing attention. Previous research in enhancing LLM confidence estimation mainly focuses on the single problem setting. However, LLM awareness of its internal parameterized knowledge boundary under the more challenging multi-problem setting, which requires answering multiple problems accurately simultaneously, remains underexplored. To bridge this gap, we introduce a novel method, Multiple Answers and Confidence Stepwise Tuning (MAC-Tuning), that separates the learning of answer prediction and confidence estimation during fine-tuning on instruction data. Extensive experiments demonstrate that our method outperforms baselines by up to 25% in average precision.