AI Summary
Existing LLM uncertainty quantification (UQ) methods focus on single-turn question answering and fail to characterize how uncertainty propagates through multi-step autonomous decision-making. This work addresses that gap by first disentangling the uncertainty of a multi-step agent decision into *aleatoric* (intrinsic) and *epistemic* (extrinsic) components, and then introducing UProp, the first UQ framework explicitly designed for multi-step decision processes. UProp models Trajectory-Dependent Decision Processes (TDPs) and estimates extrinsic uncertainty via Pointwise Mutual Information (PMI), overcoming fundamental limitations of single-turn UQ. Evaluated on multi-step benchmarks including AgentBench and HotpotQA, UProp consistently outperforms single-turn UQ baselines while remaining sample-efficient and yielding interpretable uncertainty propagation across intermediate steps. Crucially, UProp is model-agnostic and integrates seamlessly with state-of-the-art LLMs such as GPT-4.1 and DeepSeek-V3.
Abstract
As Large Language Models (LLMs) are integrated into safety-critical applications involving sequential decision-making in the real world, it is essential to know when to trust LLM decisions. Existing LLM Uncertainty Quantification (UQ) methods are primarily designed for single-turn question-answering formats, leaving multi-step decision-making scenarios, e.g., LLM agentic systems, underexplored. In this paper, we introduce a principled, information-theoretic framework that decomposes LLM sequential decision uncertainty into two parts: (i) intrinsic uncertainty internal to the current decision, which existing UQ methods focus on, and (ii) extrinsic uncertainty, a Mutual-Information (MI) quantity describing how much uncertainty should be inherited from preceding decisions. We then propose UProp, an efficient and effective extrinsic uncertainty estimator that converts the direct estimation of MI into the estimation of Pointwise Mutual Information (PMI) over multiple Trajectory-Dependent Decision Processes (TDPs). UProp is evaluated over extensive multi-step decision-making benchmarks, e.g., AgentBench and HotpotQA, with state-of-the-art LLMs, e.g., GPT-4.1 and DeepSeek-V3. Experimental results demonstrate that UProp significantly outperforms existing single-turn UQ baselines equipped with thoughtful aggregation strategies. Moreover, we provide a comprehensive analysis of UProp, including sampling efficiency, potential applications, and intermediate uncertainty propagation, to demonstrate its effectiveness. Code will be available at https://github.com/jinhaoduan/UProp.
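The intrinsic/extrinsic decomposition above can be sketched numerically: intrinsic uncertainty is the entropy of the current-step action distribution conditioned on the trajectory so far, while the extrinsic (MI) term can be Monte Carlo estimated as an average of PMI values over sampled trajectory-action pairs. The function names, probability values, and sampling setup below are illustrative assumptions, not the paper's actual implementation.

```python
import math

def pmi(p_action_given_traj: float, p_action: float) -> float:
    """Pointwise mutual information: log p(a | tau) - log p(a).
    Positive when the preceding trajectory makes the action more likely
    than its marginal probability."""
    return math.log(p_action_given_traj) - math.log(p_action)

def extrinsic_uncertainty(samples):
    """Monte Carlo estimate of the MI term I(a_t; tau_{<t}) as the mean
    PMI over sampled pairs. Each sample is a (p(a|tau), p(a)) tuple,
    e.g. obtained by scoring the same action under different sampled
    trajectories versus the trajectory-marginalized distribution."""
    return sum(pmi(pc, pm) for pc, pm in samples) / len(samples)

def intrinsic_uncertainty(action_dist):
    """Entropy H(a_t | tau_{<t}) of the current-step action distribution."""
    return -sum(p * math.log(p) for p in action_dist if p > 0)

# Toy numbers (assumed): the trajectory shifts action probabilities
# away from their marginals, so the extrinsic term is positive.
samples = [(0.9, 0.5), (0.8, 0.5), (0.7, 0.5)]
total = intrinsic_uncertainty([0.9, 0.1]) + extrinsic_uncertainty(samples)
```

In this toy setup the total step uncertainty is the sum of the two terms; when the trajectory carries no information about the current action, every PMI term is zero and the estimate reduces to the single-turn (intrinsic) entropy alone.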