Pretrained LLMs Learn Multiple Types of Uncertainty

📅 2025-05-27
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Large language models (LLMs) frequently generate factually incorrect outputs (hallucinations), and their implicit capacity for modeling uncertainty is poorly understood. Method: This work shows that pretrained LLMs intrinsically encode multiple separable uncertainty representations, each predictive of answer correctness on a specific task or benchmark, identified by treating uncertainty as a linear concept in the model's latent space and applying linear probes. The authors then unify these uncertainty types into a single one via instruction-tuning or [IDK]-token tuning. Contribution/Results: Uncertainty signals emerge during pretraining without explicit supervision and do not improve with model scale; probe-based correctness prediction correlates with the model's ability to verbally abstain from misinformation; and unifying the uncertainty types improves correctness prediction. The findings suggest LLMs acquire structured uncertainty representations that can enable more reliable generation and controllable abstention.
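The linear-probing idea mentioned above can be illustrated with a minimal, self-contained sketch: fit a linear probe on hidden states to predict answer correctness. The data here is synthetic (random vectors standing in for an LLM's activations), and all dimensions, sample counts, and noise levels are illustrative assumptions, not the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 64, 2000  # hypothetical hidden-state dimension and number of answered questions

# Assume uncertainty is a linear direction in latent space: correctness labels
# depend (noisily) on the projection of each hidden state onto that direction.
direction = rng.normal(size=d)            # the "uncertainty direction" (unknown to the probe)
H = rng.normal(size=(n, d))               # synthetic stand-ins for hidden states
y = (H @ direction + rng.normal(scale=2.0, size=n) > 0).astype(float)

# Train/test split
H_tr, H_te = H[:1500], H[1500:]
y_tr, y_te = y[:1500], y[1500:]

# Linear probe: least-squares fit of (centered) correctness labels on hidden states
w, *_ = np.linalg.lstsq(H_tr, y_tr - 0.5, rcond=None)
pred = (H_te @ w > 0).astype(float)
acc = (pred == y_te).mean()
print(f"probe accuracy: {acc:.2f}")  # well above chance when the signal is linear
```

If correctness were not linearly decodable from the latent space, the probe's held-out accuracy would stay near chance; a high accuracy is the kind of evidence the paper's probing experiments rely on.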

📝 Abstract
Large Language Models are known to capture real-world knowledge, allowing them to excel in many downstream tasks. Despite recent advances, these models are still prone to what are commonly known as hallucinations, causing them to emit unwanted and factually incorrect text. In this work, we study how well LLMs capture uncertainty, without explicitly being trained for that. We show that, if considering uncertainty as a linear concept in the model's latent space, it might indeed be captured, even after only pretraining. We further show that, though unintuitive, LLMs appear to capture several different types of uncertainty, each of which can be useful to predict the correctness for a specific task or benchmark. Furthermore, we provide in-depth results such as demonstrating a correlation between our correctness prediction and the model's ability to abstain from misinformation using words, and the lack of impact of model scaling on capturing uncertainty. Finally, we claim that unifying the uncertainty types as a single one using instruction-tuning or [IDK]-token tuning is helpful for the model in terms of correctness prediction.
Problem

Research questions and friction points this paper is trying to address.

Study how LLMs capture uncertainty without explicit training
Explore multiple types of uncertainty in LLMs for task correctness
Unify uncertainty types to improve model correctness prediction
Innovation

Methods, ideas, or system contributions that make the work stand out.

LLMs capture uncertainty signals without explicit uncertainty training
Uncertainty is linearly decodable from the latent space after pretraining alone
Unifying uncertainty types via instruction-tuning or [IDK]-token tuning improves correctness prediction
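The payoff of a unified uncertainty signal is principled abstention: answer only when confidence clears a threshold. The toy sketch below shows this selective-prediction logic with synthetic confidence scores; it is not the paper's [IDK]-token mechanism, and the threshold and score distribution are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1000

# Hypothetical calibrated confidence scores: a score of c means the answer
# is correct with probability c (synthetic data, for illustration only).
confidence = rng.uniform(size=n)
correct = rng.uniform(size=n) < confidence

threshold = 0.7                      # abstain whenever confidence < threshold
answered = confidence >= threshold
coverage = answered.mean()           # fraction of questions actually answered
selective_acc = correct[answered].mean()
overall_acc = correct.mean()
print(f"coverage: {coverage:.2f}, "
      f"accuracy when answering: {selective_acc:.2f} vs overall: {overall_acc:.2f}")
```

Raising the threshold trades coverage for accuracy; a well-calibrated, unified uncertainty signal is what makes this trade-off controllable.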