Neural Uncertainty Principle: A Unified View of Adversarial Fragility and LLM Hallucination

📅 2026-03-19
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work addresses the prevailing treatment of adversarial vulnerability in vision models and hallucination in large language models as isolated phenomena lacking a unified theoretical foundation. The authors propose the Neural Uncertainty Principle (NUP), which establishes that an input and its loss gradient—viewed as conjugate observables—are fundamentally constrained by an irreducible uncertainty bound. This principle provides the first unified theoretical framework linking the two challenges. Building on NUP, they introduce ConjMask, a training-free masking strategy that improves visual robustness without adversarial training, and pair a single-backward-pass probe, run during prefilling, with LogitReg logit regularization to detect hallucination risk before any answer tokens are generated. The approach requires neither additional training nor modifications to the decoding process, enabling informed prompt selection and cross-modal reliability assessment.
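The ConjMask idea described above — masking the input components most strongly coupled to the loss gradient — can be sketched in a few lines. This is an illustrative reconstruction, not the authors' implementation: the elementwise-product coupling score, the `conjmask` name, and the `mask_frac` parameter are assumptions made for this sketch.

```python
import numpy as np

def conjmask(x, grad, mask_frac=0.1):
    """Zero out the input components most strongly coupled to the loss
    gradient, scoring coupling by the elementwise product |x_i * g_i|.
    (Illustrative sketch; the paper's exact coupling measure may differ.)"""
    coupling = np.abs(x * grad)
    k = max(1, int(mask_frac * x.size))
    top = np.argpartition(coupling, -k)[-k:]  # indices of the k largest scores
    masked = x.copy()
    masked[top] = 0.0
    return masked

# Example: with mask_frac=0.25 on a 4-component input, exactly one
# component (the most strongly coupled) is suppressed.
x = np.array([1.0, 2.0, 3.0, 4.0])
g = np.array([0.1, -2.0, 0.3, 0.05])
x_masked = conjmask(x, g, mask_frac=0.25)
```

Because the mask is computed from a single gradient evaluation, this kind of defense stays training-free, in contrast to adversarial training.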

📝 Abstract
Adversarial vulnerability in vision and hallucination in large language models are conventionally viewed as separate problems, each addressed with modality-specific patches. This study first reveals that they share a common geometric origin: the input and its loss gradient are conjugate observables subject to an irreducible uncertainty bound. Formalizing a Neural Uncertainty Principle (NUP) under a loss-induced state, we find that in near-bound regimes, further compression must be accompanied by increased sensitivity dispersion (adversarial fragility), while weak prompt-gradient coupling leaves generation under-constrained (hallucination). Crucially, this bound is modulated by an input-gradient correlation channel, captured by a purpose-built single-backward-pass probe. In vision, masking highly coupled components improves robustness without costly adversarial training; in language, the same prefill-stage probe detects hallucination risk before generating any answer tokens. NUP thus turns two seemingly separate failure taxonomies into a shared uncertainty-budget view and provides a principled lens for reliability analysis. Guided by this theory, we propose ConjMask (masking high-contribution input components) and LogitReg (logit-side regularization) to improve robustness without adversarial training, and use the probe as a decoding-free risk signal for LLMs, enabling hallucination detection and prompt selection. In this way, NUP offers a unified, practical framework for diagnosing and mitigating boundary anomalies across perception and generation tasks.
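As a rough illustration of the prefill-stage probe, the input-gradient correlation channel can be summarized by a single scalar computed from one backward pass. The cosine-similarity score, the `coupling_score` name, and the toy analytic-gradient model below are assumptions for this sketch, not the paper's definition.

```python
import numpy as np

def coupling_score(x, grad):
    """Cosine similarity between an input and its loss gradient: a cheap,
    decoding-free proxy for input-gradient coupling. A low |score| would
    indicate weak coupling (hypothetically, elevated hallucination risk)."""
    x, g = x.ravel(), grad.ravel()
    denom = np.linalg.norm(x) * np.linalg.norm(g)
    return float(x @ g / denom) if denom > 0 else 0.0

# Toy differentiable "model": quadratic loss L(x) = 0.5 * ||W x - t||^2,
# whose input gradient is available analytically as W^T (W x - t), so no
# autodiff framework is needed for this demonstration.
rng = np.random.default_rng(0)
W = rng.standard_normal((3, 5))
x = rng.standard_normal(5)
t = np.zeros(3)
grad = W.T @ (W @ x - t)  # stands in for the single backward pass
score = coupling_score(x, grad)
```

In an actual LLM setting, `grad` would come from one backward pass over the prompt during prefilling, so the probe adds no decoding-time cost.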
Problem

Research questions and friction points this paper is trying to address.

adversarial fragility
LLM hallucination
neural uncertainty
input-gradient correlation
model reliability
Innovation

Methods, ideas, or system contributions that make the work stand out.

Neural Uncertainty Principle
adversarial fragility
LLM hallucination
input-gradient correlation
uncertainty bound
Dong-Xiao Zhang
Northwest Institute of Nuclear Technology, Xi’an, Shaanxi 710024, China
Hu Lou
Northwest Institute of Nuclear Technology, Xi’an, Shaanxi 710024, China
Jun-Jie Zhang
Northwest Institute of Nuclear Technology, Xi’an, Shaanxi 710024, China
Jun Zhu
Professor of Computer Science, Tsinghua University
Machine Learning, Bayesian Methods, Deep Generative Models, Adversarial Robustness, Reinforcement Learning
Deyu Meng
Professor, Xi'an Jiaotong University
Machine Learning, Applied Mathematics, Computer Vision, Artificial Intelligence