Agentic Uncertainty Quantification

📅 2026-01-22
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work addresses the vulnerability of existing AI agents to "hallucination spirals" in long-horizon reasoning, where early errors propagate irreversibly and compromise reliability. To mitigate this, the authors propose a dual-process Agentic Uncertainty Quantification (AUQ) framework that repurposes uncertainty from a passive diagnostic metric into an active control signal. By integrating System 1 and System 2 reasoning mechanisms, AUQ enables on-demand reflection and memory-guided correction. The approach introduces training-free components—Uncertainty-Aware Memory (UAM) and Uncertainty-Aware Reflection (UAR)—which jointly leverage semantic explanations and confidence propagation. Evaluated on both closed-loop benchmarks and open-ended deep reasoning tasks, the method significantly improves task performance and trajectory-level calibration, demonstrating strong effectiveness and robustness.

📝 Abstract
Although AI agents have demonstrated impressive capabilities in long-horizon reasoning, their reliability is severely hampered by the "Spiral of Hallucination," where early epistemic errors propagate irreversibly. Existing methods face a dilemma: uncertainty quantification (UQ) methods typically act as passive sensors, only diagnosing risks without addressing them, while self-reflection mechanisms suffer from continuous or aimless corrections. To bridge this gap, we propose a unified Dual-Process Agentic UQ (AUQ) framework that transforms verbalized uncertainty into active, bi-directional control signals. Our architecture comprises two complementary mechanisms: System 1 (Uncertainty-Aware Memory, UAM), which implicitly propagates verbalized confidence and semantic explanations to prevent blind decision-making; and System 2 (Uncertainty-Aware Reflection, UAR), which utilizes these explanations as rational cues to trigger targeted inference-time resolution only when necessary. This enables the agent to balance efficient execution and deep deliberation dynamically. Extensive experiments on closed-loop benchmarks and open-ended deep research tasks demonstrate that our training-free approach achieves superior performance and trajectory-level calibration. We believe this principled AUQ framework represents a significant step towards reliable agents.
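The dual-process control loop the abstract describes can be sketched in a few lines: System 1 (UAM) records verbalized confidence and explanations into memory as the agent acts, and System 2 (UAR) is invoked only when confidence falls below a threshold. This is a minimal illustrative sketch, not the paper's implementation; all names (`UncertaintyAwareMemory`, `run_step`, `reflect`, the 0.6 cutoff) are assumptions, and the LLM calls are replaced by toy stand-ins.

```python
# Hypothetical sketch of the AUQ dual-process loop. Every identifier and the
# confidence threshold below are illustrative assumptions, not the authors' code.

THRESHOLD = 0.6  # assumed confidence cutoff for escalating to System 2


class UncertaintyAwareMemory:
    """System 1 (UAM): stores (step, confidence, explanation) so that later
    decisions can condition on propagated uncertainty instead of acting blindly."""

    def __init__(self):
        self.entries = []

    def record(self, step, confidence, explanation):
        self.entries.append(
            {"step": step, "confidence": confidence, "explanation": explanation}
        )

    def low_confidence_cues(self, threshold):
        return [e for e in self.entries if e["confidence"] < threshold]


def run_step(agent_act, reflect, memory, step):
    """One iteration: act, record verbalized uncertainty, and invoke
    System 2 (UAR) only when confidence is low -- targeted, not continuous."""
    answer, confidence, explanation = agent_act(step)
    memory.record(step, confidence, explanation)
    if confidence < THRESHOLD:
        # System 2: the semantic explanation serves as a rational cue for
        # targeted inference-time correction.
        answer = reflect(step, answer, explanation,
                         memory.low_confidence_cues(THRESHOLD))
    return answer


# Toy stand-ins for the underlying LLM calls:
def agent_act(step):
    return f"draft-{step}", (0.4 if step == 1 else 0.9), f"unsure about {step}"


def reflect(step, answer, explanation, cues):
    return f"revised-{answer}"


memory = UncertaintyAwareMemory()
results = [run_step(agent_act, reflect, memory, s) for s in (0, 1, 2)]
print(results)  # only the low-confidence step 1 is revised
```

The design point is the gating itself: reflection fires only on low-confidence steps, which is how the framework avoids the "continuous or aimless corrections" the abstract attributes to plain self-reflection.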
Problem

Research questions and friction points this paper is trying to address.

Agentic Uncertainty Quantification
Spiral of Hallucination
AI reliability
uncertainty propagation
long-horizon reasoning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Agentic Uncertainty Quantification
Dual-Process Reasoning
Uncertainty-Aware Memory
Uncertainty-Aware Reflection
Training-Free Calibration
Jiaxin Zhang
Salesforce AI Research
Prafulla Kumar Choubey
Salesforce AI Research
Natural Language Processing, Machine Learning
Kung-Hsiang Huang
Salesforce AI Research
Caiming Xiong
Salesforce Research
Machine Learning, NLP, Computer Vision, Multimedia, Data Mining
Chien-Sheng Wu
Salesforce AI Research