🤖 AI Summary
Instruction fine-tuning (IFT) enhances the utility of large language models (LLMs) but often degrades factual consistency, primarily because tuned models over-rely on long-tail knowledge insufficiently covered during pretraining, thereby increasing hallucination. This work presents the first systematic analysis of the utility–fidelity trade-off inherent in IFT and proposes UNIT, an uncertainty-aware instruction fine-tuning paradigm. UNIT jointly models response generation and uncertainty estimation, incorporating explicit uncertainty-reflection tokens and confidence-driven response-suffix injection. Evaluated under a multi-dimensional fidelity assessment protocol across multiple benchmarks, UNIT reduces hallucination rates by 38.2% on average while preserving or improving task completion rates and user preference scores. It thus enhances utility and factual fidelity together, resolving the longstanding tension between helpfulness and truthfulness in LLM alignment.
📝 Abstract
Instruction Fine-tuning (IFT) can enhance the helpfulness of Large Language Models (LLMs), but it may lower their truthfulness. This trade-off arises because IFT steers LLMs to generate responses with long-tail knowledge that is not well covered during pre-training, leading to more informative but less truthful answers when generalizing to unseen tasks. In this paper, we empirically demonstrate this helpfulness-truthfulness trade-off in IFT and propose $\textbf{UNIT}$, a novel IFT paradigm to address it. UNIT teaches LLMs to recognize their uncertainty and explicitly reflect it at the end of their responses. Experimental results show that UNIT-tuned models maintain their helpfulness while distinguishing between certain and uncertain claims, thereby reducing hallucinations.
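To make the idea concrete, here is a minimal sketch of how a UNIT-style training example might be constructed: the target response is followed by an explicit uncertainty-reflection suffix listing the claims the model is unsure about. The function name, template wording, and claim-selection step are illustrative assumptions, not the paper's exact format; in practice the uncertain claims would come from a confidence signal (e.g. token probabilities or self-consistency sampling) during data construction.

```python
# Hypothetical sketch of a UNIT-style training target: a response with an
# appended uncertainty-reflection suffix. Template text and function names
# are assumptions for illustration, not the paper's exact format.

def build_unit_target(response: str, uncertain_claims: list[str]) -> str:
    """Append an uncertainty reflection to a response.

    `uncertain_claims` is assumed to be produced upstream by some
    confidence estimator (e.g. self-consistency sampling) when the
    fine-tuning data is built.
    """
    if uncertain_claims:
        reflection = (
            "\n\nI am uncertain about the following claims:\n"
            + "\n".join(f"- {claim}" for claim in uncertain_claims)
        )
    else:
        reflection = "\n\nI am confident in all claims above."
    return response + reflection


target = build_unit_target(
    "Alexander Fleming discovered penicillin in 1928.",
    ["the exact year of the discovery"],
)
print(target)
```

Fine-tuning on targets shaped like this teaches the model to emit the reflection suffix itself at inference time, so downstream users can separate certain from uncertain claims instead of treating the whole response as equally reliable.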