🤖 AI Summary
Large language models (LLMs) frequently generate high-confidence yet factually incorrect "hallucinations," necessitating reliable uncertainty quantification (UQ) to enhance trustworthiness. This paper systematically surveys over 100 UQ methods for LLMs and introduces the first unified taxonomy covering four paradigms: sampling-based methods, confidence calibration, multi-prompt/multi-output consistency, and latent-space modeling. We propose a cross-methodological framework that rigorously distinguishes aleatoric from epistemic uncertainty and clarifies the applicability boundaries and limitations of key techniques, including Bayesian approximations and logit-distribution calibration. Our analysis reveals fundamental trade-offs among open-domain generalization, computational efficiency, and interpretability in existing approaches. Crucially, we establish uncertainty awareness as a core capability for robust LLM deployment, providing both theoretical foundations and practical guidelines for trustworthy dialogue systems, generative applications, and embodied AI.
📝 Abstract
The remarkable performance of large language models (LLMs) in content generation, coding, and common-sense reasoning has spurred widespread integration into many facets of society. However, this integration raises valid questions about their reliability and trustworthiness, given their propensity to generate hallucinations: plausible, factually incorrect responses expressed with striking confidence. Previous work has shown that hallucinations and other non-factual responses generated by LLMs can be detected by examining the uncertainty of the LLM in its response to the pertinent prompt, spurring significant research into quantifying the uncertainty of LLMs. This survey provides an extensive review of existing uncertainty quantification methods for LLMs, identifying their salient features along with their strengths and weaknesses. We present existing methods within a relevant taxonomy, unifying ostensibly disparate methods to aid understanding of the state of the art. Furthermore, we highlight applications of uncertainty quantification methods for LLMs, ranging from chatbot and textual applications to embodied artificial intelligence applications in robotics. We conclude with open research challenges in uncertainty quantification of LLMs, seeking to motivate future research.
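To make the sampling-based paradigm mentioned above concrete, here is a minimal, hedged sketch: it estimates uncertainty as the Shannon entropy of the empirical distribution over several sampled answers to the same prompt. The function name `sampling_entropy` is our own illustration, not an API from the survey, and exact string matching stands in for the semantic-equivalence clustering that real methods (e.g., semantic entropy) perform.

```python
import math
from collections import Counter

def sampling_entropy(responses):
    """Shannon entropy (nats) of the empirical answer distribution.

    A simplified sampling-based uncertainty signal: sample the LLM
    several times at nonzero temperature, then measure how spread out
    the answers are. Exact-match grouping is a crude proxy for the
    semantic clustering used by practical methods.
    """
    counts = Counter(responses)
    n = len(responses)
    return -sum((c / n) * math.log(c / n) for c in counts.values())

# A model that answers consistently signals low uncertainty...
low = sampling_entropy(["Paris", "Paris", "Paris", "Paris", "Paris"])
# ...while scattered answers signal high uncertainty (a hallucination cue).
high = sampling_entropy(["Paris", "Lyon", "Nice", "Paris", "Lille"])
```

Here `low` is 0.0 and `high` is positive, so thresholding this score is one simple way to flag responses for abstention or human review.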