Textual Bayes: Quantifying Uncertainty in LLM-Based Systems

📅 2025-06-11

🤖 AI Summary

Problem: Large language models (LLMs) have fundamental limitations in uncertainty quantification, behave as opaque black boxes (especially closed-source models), and are highly sensitive to their prompts. Method: This paper treats prompts as textual parameters of a statistical model. This enables Bayesian inference over prompts and joint modeling of prompt and prediction uncertainty directly in text space, with prior beliefs about the prompts expressed in free-form text. To sample from the posterior, the paper proposes MHLP (Metropolis-Hastings through LLM Proposals), a Markov chain Monte Carlo (MCMC) algorithm that combines prompt optimization techniques with standard Metropolis-Hastings sampling. MHLP needs neither gradients nor access to model internals, so it can be added as a turnkey modification to existing LLM pipelines, including those built exclusively on closed-source models. Contribution/Results: Experiments show improvements in both predictive accuracy and uncertainty quantification across a range of LLM benchmarks and uncertainty quantification tasks. The approach offers a practical path toward more reliable, better-calibrated LLM deployment in high-stakes applications.
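The basic Metropolis-Hastings loop over prompts can be sketched in Python as follows. This is a toy illustration under stated assumptions, not the paper's MHLP implementation. `PROMPT_POOL`, `TOY_P_CORRECT`, `log_prior`, and `log_likelihood` are hypothetical stand-ins for LLM calls and a free-text prior. The proposal is deliberately symmetric (uniform over the other prompts) so that the Hastings correction cancels. An LLM that rewrites prompts, as MHLP uses, generally does not yield a symmetric proposal.

```python
import math
import random
from collections import Counter

# Hypothetical pool of candidate prompts (textual parameters). In MHLP an LLM
# would write new candidates; a fixed pool keeps this toy runnable offline.
PROMPT_POOL = [
    "Answer the question.",
    "Answer the question concisely.",
    "Think step by step, then answer.",
    "Think step by step, then give a short answer.",
]

# Hypothetical stand-in for an LLM: probability that each prompt yields the correct label.
TOY_P_CORRECT = dict(zip(PROMPT_POOL, [0.60, 0.65, 0.80, 0.78]))


def log_prior(prompt: str) -> float:
    # Hypothetical prior: mildly prefer shorter prompts (stands in for a free-text prior).
    return -0.1 * len(prompt.split())


def log_likelihood(prompt: str, n_examples: int) -> float:
    # Stand-in for sum_i log p(y_i | x_i, prompt), assuming each of the n_examples
    # training labels is predicted correctly with the same per-prompt probability.
    return n_examples * math.log(TOY_P_CORRECT[prompt])


def log_posterior(prompt: str, n_examples: int) -> float:
    # Unnormalized log posterior: log prior + log likelihood.
    return log_prior(prompt) + log_likelihood(prompt, n_examples)


def propose(prompt: str, rng: random.Random) -> str:
    # Symmetric proposal: uniform over the other prompts, so q(a | b) == q(b | a).
    # An LLM-generated rewrite is generally NOT symmetric (simplifying assumption).
    return rng.choice([p for p in PROMPT_POOL if p != prompt])


def metropolis_hastings(n_steps: int, n_examples: int, seed: int = 0) -> list[str]:
    rng = random.Random(seed)
    current = PROMPT_POOL[0]
    current_lp = log_posterior(current, n_examples)
    samples = []
    for _ in range(n_steps):
        candidate = propose(current, rng)
        cand_lp = log_posterior(candidate, n_examples)
        # With a symmetric proposal the Hastings ratio reduces to the posterior ratio:
        # accept with probability min(1, exp(cand_lp - current_lp)).
        if rng.random() < math.exp(min(0.0, cand_lp - current_lp)):
            current, current_lp = candidate, cand_lp
        samples.append(current)
    return samples


if __name__ == "__main__":
    samples = metropolis_hastings(n_steps=5000, n_examples=10)
    # Discard the first 1000 steps as burn-in, then report empirical posterior frequencies.
    for prompt, count in Counter(samples[1000:]).most_common():
        print(f"{count / 4000:.2f}  {prompt}")
```

With these made-up numbers, the chain should spend most of its post-burn-in steps on the higher-likelihood step-by-step prompts. The exact frequencies depend on the seed.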

📝 Abstract

Although large language models (LLMs) are becoming increasingly capable of solving challenging real-world tasks, accurately quantifying their uncertainty remains a critical open problem, which limits their applicability in high-stakes domains. This challenge is further compounded by the closed-source, black-box nature of many state-of-the-art LLMs. Moreover, LLM-based systems can be highly sensitive to the prompts that bind them together, which often require significant manual tuning (i.e., prompt engineering). In this work, we address these challenges by viewing LLM-based systems through a Bayesian lens. We interpret prompts as textual parameters in a statistical model, allowing us to use a small training dataset to perform Bayesian inference over these prompts. This novel perspective enables principled uncertainty quantification over both the model's textual parameters and its downstream predictions, while also incorporating prior beliefs about these parameters expressed in free-form text. To perform Bayesian inference, a difficult problem even for well-studied data modalities, we introduce Metropolis-Hastings through LLM Proposals (MHLP), a novel Markov chain Monte Carlo (MCMC) algorithm that combines prompt optimization techniques with standard MCMC methods. MHLP is a turnkey modification to existing LLM pipelines, including those that rely exclusively on closed-source models. Empirically, we demonstrate that our method yields improvements in both predictive accuracy and uncertainty quantification (UQ) on a range of LLM benchmarks and UQ tasks. More broadly, our work demonstrates a viable path for incorporating methods from the rich Bayesian literature into the era of LLMs, paving the way for more reliable and calibrated LLM-based systems.
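Once posterior samples of the prompt are available, predictions can be averaged over them (Bayesian model averaging). The spread of the averaged answers then gives a simple uncertainty signal. The sketch below assumes that each sampled prompt contributes one answer from a stubbed LLM. `toy_llm_answer`, its answer table, and the use of predictive entropy as the uncertainty score are illustrative assumptions, not details taken from the paper.

```python
import math
from collections import Counter


def toy_llm_answer(prompt: str, question: str) -> str:
    # Hypothetical stand-in for one LLM call; a real system would send the prompt
    # and question to a (possibly closed-source) model. This toy ignores the
    # question and looks up a fixed answer per prompt.
    table = {
        "Answer the question.": "A",
        "Answer the question concisely.": "A",
        "Think step by step, then answer.": "B",
    }
    return table[prompt]


def posterior_predictive(prompt_samples: list[str], question: str) -> dict[str, float]:
    # Monte Carlo estimate of p(y | x, D) ~= (1/S) * sum_s p(y | x, theta_s),
    # approximating each p(y | x, theta_s) with a single sampled answer.
    counts = Counter(toy_llm_answer(p, question) for p in prompt_samples)
    total = sum(counts.values())
    return {answer: c / total for answer, c in counts.items()}


def predictive_entropy(dist: dict[str, float]) -> float:
    # Entropy in nats of the averaged predictive distribution: 0 when every
    # sampled prompt agrees, larger when the answers disagree.
    return -sum(p * math.log(p) for p in dist.values() if p > 0)


if __name__ == "__main__":
    samples = [
        "Answer the question.",
        "Answer the question concisely.",
        "Think step by step, then answer.",
        "Answer the question.",
    ]
    dist = posterior_predictive(samples, "Which option is correct?")
    print(dist, round(predictive_entropy(dist), 3))  # prints {'A': 0.75, 'B': 0.25} 0.562
```

Averaging over sampled prompts, instead of committing to a single tuned prompt, is what lets disagreement between plausible prompts appear as predictive uncertainty.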

Problem

Research questions and friction points this paper is trying to address.

Quantify uncertainty in LLM-based systems for high-stakes domains

Reduce sensitivity to prompts and reliance on manual prompt tuning in LLM systems

Enable Bayesian inference for LLM-based systems, including those built on closed-source models

Innovation

Methods, ideas, or system contributions that make the work stand out.

Bayesian inference over textual prompts

Metropolis-Hastings via LLM proposals

Uncertainty quantification for LLM predictions