Large Language Models Must Be Taught to Know What They Don't Know

📅 2024-06-12
🏛️ Neural Information Processing Systems
📈 Citations: 27
Influential: 0
📄 PDF
🤖 AI Summary
Large language models (LLMs) are often poorly calibrated in high-stakes scenarios, and existing calibration methods that rely solely on prompting or sampling are either unreliable or computationally prohibitive. Method: a lightweight feature-space fine-tuning framework that requires only ~1,000 graded examples. Using LoRA for low-rank adaptation (updating <0.1% of parameters), it trains through intermediate transformer features rather than only the output head to learn calibrated uncertainty representations. Contribution/Results: the paper empirically refutes the common assumption that prompting alone suffices for reliable calibration, and finds that learned uncertainty estimators generalize across models, so an estimator trained on one LLM can assess the uncertainty of another. The method consistently outperforms prompting- and sampling-based baselines across multiple LLMs, and multi-model transfer evaluations plus human-AI collaboration experiments show that calibrated uncertainty signals significantly improve human decision accuracy and trust calibration, all without modifying the model's architecture.

📝 Abstract
When using large language models (LLMs) in high-stakes applications, we need to know when we can trust their predictions. Some works argue that prompting high-performance LLMs is sufficient to produce calibrated uncertainties, while others introduce sampling methods that can be prohibitively expensive. In this work, we first argue that prompting on its own is insufficient to achieve good calibration and then show that fine-tuning on a small dataset of correct and incorrect answers can create an uncertainty estimate with good generalization and small computational overhead. We show that a thousand graded examples are sufficient to outperform baseline methods and that training through the features of a model is necessary for good performance and tractable for large open-source models when using LoRA. We also investigate the mechanisms that enable reliable LLM uncertainty estimation, finding that many models can be used as general-purpose uncertainty estimators, applicable not just to their own uncertainties but also to the uncertainties of other models. Lastly, through a user study, we show that uncertainty estimates inform human use of LLMs in human-AI collaborative settings.
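To make the abstract's recipe concrete (a small set of graded answers, a LoRA-style low-rank update, and gradients flowing through intermediate features rather than into a frozen probe), here is a minimal NumPy sketch. The frozen weight `W`, the answer features `X`, the labels, and all dimensions are synthetic stand-ins for illustration, not the paper's actual models or data.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r, n = 64, 4, 1000                       # feature dim, LoRA rank, graded examples

W = rng.normal(size=(d, d)) / np.sqrt(d)    # frozen "pretrained" layer (stand-in)
X = rng.normal(size=(n, d))                 # stand-in features of model answers
y = (X @ rng.normal(size=d) > 0).astype(float)  # 1 = answer graded correct

alpha = 8.0
A = 0.01 * rng.normal(size=(r, d))          # LoRA factors: only A, B, head are trained
B = np.zeros((d, r))                        # B = 0, so training starts at the base model
head = np.zeros(d)                          # linear head -> P(answer correct)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

lr = 0.5
for _ in range(500):
    W_eff = W + (alpha / r) * B @ A         # low-rank update of the frozen weight
    H = np.tanh(X @ W_eff.T)                # adapted intermediate features
    p = sigmoid(H @ head)                   # predicted probability of correctness
    g = (p - y) / n                         # dLoss/dlogit for mean binary cross-entropy
    dH = np.outer(g, head) * (1.0 - H ** 2) # backprop through the tanh nonlinearity
    dW = dH.T @ X                           # gradient "through the features"
    head -= lr * H.T @ g
    B -= lr * (alpha / r) * dW @ A.T
    A -= lr * (alpha / r) * B.T @ dW

acc = ((p > 0.5) == (y == 1)).mean()        # training accuracy of the correctness probe
```

The point mirrored from the abstract is that the loss gradient reaches the low-rank factors `A` and `B` inside the network, not just the head, while the full weight `W` stays frozen; only `2*r*d + d` parameters are trained instead of `d*d`.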
Problem

Research questions and friction points this paper is trying to address.

Teaching LLMs to recognize their own knowledge limitations
Developing efficient uncertainty estimation methods for LLMs
Enabling reliable trust calibration in high-stakes AI applications
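The "trust calibration" friction point above is usually quantified with expected calibration error (ECE), a standard metric rather than anything specific to this paper; the sketch below implements it and contrasts a calibrated predictor with an overconfident one on synthetic data.

```python
import numpy as np

def expected_calibration_error(conf, correct, n_bins=10):
    """Bin predictions by confidence; ECE is the frequency-weighted
    mean gap between average confidence and accuracy in each bin."""
    conf = np.asarray(conf, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for i in range(n_bins):
        in_bin = (conf >= edges[i]) & (conf < edges[i + 1])
        if i == n_bins - 1:                 # last bin also includes conf == 1.0
            in_bin = (conf >= edges[i]) & (conf <= 1.0)
        if in_bin.any():
            gap = abs(conf[in_bin].mean() - correct[in_bin].mean())
            ece += in_bin.mean() * gap
    return ece

rng = np.random.default_rng(1)
conf = rng.uniform(0.5, 1.0, size=5000)
correct = (rng.uniform(size=5000) < conf).astype(float)   # well-calibrated: accuracy tracks confidence
ece_cal = expected_calibration_error(conf, correct)
ece_over = expected_calibration_error(np.full(5000, 0.99), correct)  # overconfident predictor
```

The overconfident predictor claims 99% confidence while its accuracy is far lower, so its ECE is much larger than the calibrated predictor's; this gap is what reliable uncertainty estimation is meant to close.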
Innovation

Methods, ideas, or system contributions that make the work stand out.

Fine-tuning on a small dataset of graded answers for uncertainty estimation
LoRA makes training through model features tractable for large models
Models can serve as general-purpose uncertainty estimators, even for other models