I-CALM: Incentivizing Confidence-Aware Abstention for LLM Hallucination Mitigation

📅 2026-04-04
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the tendency of large language models to produce confidently stated yet factually incorrect responses under uncertainty, which exacerbates hallucination risk. The authors propose a purely prompt-based intervention framework that requires no fine-tuning and integrates confidence-guided decision-making, explicit abstention rewards, and principled humility norms to encourage models to abstain when confidence is low. The approach encompasses prompt-based confidence estimation, a dynamic abstention incentive mechanism, injection of normative principles, and selective answer evaluation. Experiments with GPT-5 mini on PopQA demonstrate that the method significantly reduces erroneous responses by converting high-risk errors into abstentions, achieving a superior trade-off between hallucination suppression and answer coverage.
📝 Abstract
Large language models (LLMs) frequently produce confident but incorrect answers, partly because common binary scoring conventions reward answering over honestly expressing uncertainty. We study whether prompt-only interventions -- explicitly announcing reward schemes for answer-versus-abstain decisions plus humility-oriented normative principles -- can reduce hallucination risk without modifying the model. Our focus is epistemic abstention on factual questions with a verifiable answer, where current LLMs often fail to abstain despite being uncertain about their answers. We first assess self-reported verbal confidence as a usable uncertainty signal, showing stability under prompt paraphrasing and reasonable calibration against a token-probability baseline. We then study I-CALM, a prompt-based framework that (i) elicits verbal confidence, (ii) partially rewards abstention through explicit reward schemes, and (iii) adds lightweight normative principles emphasizing truthfulness, humility, and responsibility. Using GPT-5 mini on PopQA as the main setting, we find that confidence-eliciting, abstention-rewarding prompts, especially with norms, reduce the false-answer rate on answered cases mainly by identifying error-prone cases, shifting them to abstention, and re-calibrating their confidence. This trades coverage for reliability while leaving forced-answer performance largely unchanged. Varying the abstention reward yields a clear abstention-hallucination frontier. Overall, the results show the framework can improve selective answering on factual questions without retraining, with the magnitude of the effect varying across models and datasets. Code is available at https://github.com/binzeli/hallucinationControl.
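The abstract's core mechanism -- announce a reward scheme for answer-versus-abstain decisions, elicit verbal confidence, and let the expected score decide -- can be sketched in a few lines. The sketch below assumes the simple scoring rule implied by the abstract (+1 correct, 0 wrong, partial reward r for abstaining), under which a risk-neutral model with self-reported confidence p should answer iff p > r. Function names and prompt wording are illustrative, not taken from the paper's code.

```python
# Hedged sketch of a confidence-gated abstention rule under an announced
# reward scheme. Assumed scoring (not verified against the paper's exact
# setup): +1 for a correct answer, 0 for a wrong answer, r for abstaining.
# Answering has expected score p (the model's confidence), so a
# score-maximizing model answers iff p > r.

ABSTAIN = "I don't know."

def build_prompt(question: str, abstain_reward: float) -> str:
    """Assemble a prompt announcing the reward scheme plus a humility norm.
    Wording is illustrative, not the paper's actual prompt."""
    return (
        f"Scoring: +1 for a correct answer, 0 for an incorrect answer, "
        f"+{abstain_reward} for answering '{ABSTAIN}'.\n"
        "Be truthful and humble: if you are unsure, prefer to abstain.\n"
        "First state your confidence in [0, 1], then answer.\n"
        f"Question: {question}"
    )

def decide(confidence: float, answer: str, abstain_reward: float) -> str:
    """Emit the answer only when its expected score beats the abstention reward."""
    return answer if confidence > abstain_reward else ABSTAIN

print(decide(0.9, "Paris", abstain_reward=0.5))  # high confidence -> "Paris"
print(decide(0.3, "Paris", abstain_reward=0.5))  # low confidence -> abstains
```

Sweeping `abstain_reward` from 0 to 1 in this toy rule traces out the abstention-hallucination frontier the abstract describes: a larger r shifts more low-confidence, error-prone cases into abstention at the cost of coverage.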
Problem

Research questions and friction points this paper is trying to address.

LLM hallucination
epistemic abstention
confidence calibration
selective answering
factual question answering
Innovation

Methods, ideas, or system contributions that make the work stand out.

confidence-aware abstention
hallucination mitigation
prompt-based intervention
selective answering
reward scheme
Haotian Zong
Department of Applied Mathematics and Statistics, Johns Hopkins University, Baltimore, MD 21218, USA
Binze Li
Department of Computer Science, Johns Hopkins University, Baltimore, MD 21218, USA
Yufei Long
Department of Computer Science, Johns Hopkins University, Baltimore, MD 21218, USA
Sinyin Chang
Department of Computer Science, Johns Hopkins University, Baltimore, MD 21218, USA
Jialong Wu
Department of Computer Science, Johns Hopkins University, Baltimore, MD 21218, USA
Gillian K. Hadfield
Department of Computer Science and School of Government and Policy, Johns Hopkins University, Baltimore, MD 21218, USA
AI policy
governance and safety
human and machine normative systems