Illusions of Confidence? Diagnosing LLM Truthfulness via Neighborhood Consistency

📅 2026-01-09
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work addresses the fragility of large language models' beliefs under contextual perturbations, showing that existing point-estimate confidence metrics, such as self-consistency, fail to capture robustness in truthfulness. To overcome this limitation, the authors propose Neighbor-Consistency Belief (NCB), a novel metric, together with Structure-Aware Training (SAT), a corresponding training method. By constructing a structured evaluation framework that assesses response consistency across conceptually related neighborhoods, this approach offers the first diagnosis and enhancement of belief stability from the perspective of structural robustness. Experimental results demonstrate that responses with high NCB are more resistant to contextual interference, and that SAT reduces brittleness on long-tail knowledge by approximately 30%.

📝 Abstract
As Large Language Models (LLMs) are increasingly deployed in real-world settings, correctness alone is insufficient. Reliable deployment requires maintaining truthful beliefs under contextual perturbations. Existing evaluations largely rely on point-wise confidence measures such as Self-Consistency, which can mask brittle beliefs. We show that even facts answered with perfect self-consistency can rapidly collapse under mild contextual interference. To address this gap, we propose Neighbor-Consistency Belief (NCB), a structural measure of belief robustness that evaluates response coherence across a conceptual neighborhood. To validate the effectiveness of NCB, we introduce a new cognitive stress-testing protocol that probes output stability under contextual interference. Experiments across multiple LLMs show that high-NCB items are consistently more resistant to interference. Finally, we present Structure-Aware Training (SAT), which optimizes for a context-invariant belief structure and reduces long-tail knowledge brittleness by approximately 30%. Code will be available at https://github.com/zjunlp/belief.
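The core contrast in the abstract — point-wise self-consistency versus consistency over a conceptual neighborhood — can be sketched in a few lines. The snippet below is an illustrative simplification, not the paper's actual NCB definition: it scores each question by majority agreement across sampled answers, then averages that score over a target fact and its hypothetical conceptually related neighbors (the paper may weight or structure the neighborhood differently).

```python
from collections import Counter

def self_consistency(answers):
    """Point-wise self-consistency: fraction of sampled answers
    that agree with the majority answer for one question."""
    counts = Counter(a.strip().lower() for a in answers)
    return counts.most_common(1)[0][1] / len(answers)

def neighborhood_consistency(neighborhood_answers):
    """Illustrative NCB-style score: average self-consistency over a
    target question and its conceptually related neighbor questions.
    A brittle belief can score 1.0 on the target alone yet drag the
    neighborhood average down through inconsistent neighbors."""
    scores = [self_consistency(a) for a in neighborhood_answers.values()]
    return sum(scores) / len(scores)

# Hypothetical example: the target is perfectly self-consistent,
# but related questions disagree, lowering the neighborhood score.
samples = {
    "capital of France?": ["Paris", "Paris", "paris", "Paris"],
    "largest city in France?": ["Paris", "Paris", "Lyon", "Paris"],
    "country whose capital is Paris?": ["France", "France", "France", "Spain"],
}
print(self_consistency(samples["capital of France?"]))   # 1.0
print(round(neighborhood_consistency(samples), 3))       # 0.833
```

This makes the abstract's point concrete: the target fact alone would look perfectly robust under self-consistency, while the neighborhood view reveals the weakness.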
Problem

Research questions and friction points this paper is trying to address.

truthfulness
contextual perturbations
belief robustness
large language models
self-consistency
Innovation

Methods, ideas, or system contributions that make the work stand out.

Neighbor-Consistency Belief
belief robustness
contextual interference
Structure-Aware Training
cognitive stress-testing