🤖 AI Summary
Bayesian large language models (LLMs) suffer from inefficient uncertainty estimation at inference time because they require multiple stochastic forward passes (samples) per prediction.
Method: We propose a test-time sampling-free paradigm. Leveraging an offline-trained Bayesian teacher model, we distill its posterior uncertainty knowledge, encoded in its predictive distributions, into a lightweight non-Bayesian student model by aligning the two models' predictive distributions via KL-divergence minimization (see the loss sketch below). The distillation uses only the training set; no additional validation set is required.
Contribution/Results: This work achieves the first purely training-set-driven distillation of Bayesian confidence distributions into a sampling-free model. Experiments demonstrate an N-fold inference speedup, where N is the number of stochastic samples required by conventional Bayesian inference, while matching or surpassing state-of-the-art Bayesian LLMs in uncertainty calibration. The method generalizes well across diverse tasks and datasets, offering a practical, efficient alternative to sampling-based Bayesian inference without compromising reliability.
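To make the distillation objective concrete, here is a minimal PyTorch-style sketch of the alignment loss. All names (`bayesian_distillation_loss`, `teacher_logit_samples`, `temperature`) are illustrative assumptions rather than the paper's actual implementation: the teacher's posterior predictive distribution is approximated by averaging the softmax outputs of N stochastic forward passes, and the student is trained to match it by minimizing the KL divergence.

```python
import torch
import torch.nn.functional as F

def bayesian_distillation_loss(student_logits, teacher_logit_samples, temperature=1.0):
    """Hypothetical sketch: align a non-Bayesian student with a Bayesian teacher.

    student_logits:        [batch, vocab] from one deterministic student forward pass
    teacher_logit_samples: [N, batch, vocab] from N stochastic teacher forward passes
    """
    # Monte Carlo estimate of the teacher's posterior predictive distribution:
    # average the per-sample softmax probabilities over the N stochastic passes.
    teacher_probs = F.softmax(teacher_logit_samples / temperature, dim=-1).mean(dim=0)

    # KL(teacher || student): pull the student's log-probabilities toward the
    # teacher's averaged predictive distribution (standard distillation form).
    student_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    return F.kl_div(student_log_probs, teacher_probs, reduction="batchmean")
```

In practice such a loss would be computed only on the training set, consistent with the claim that no additional validation data is needed.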
📝 Abstract
Recent advances in uncertainty estimation for Large Language Models (LLMs) during downstream adaptation have addressed key challenges of reliability and simplicity. However, existing Bayesian methods typically require multiple sampling iterations during inference, creating significant efficiency issues that limit practical deployment. In this paper, we investigate the possibility of eliminating the need for test-time sampling in LLM uncertainty estimation. Specifically, given an off-the-shelf Bayesian LLM, we distill its aligned confidence into a non-Bayesian student LLM by minimizing the divergence between their predictive distributions. Unlike typical calibration methods, our distillation is carried out solely on the training dataset, without the need for an additional validation dataset. This simple yet effective approach makes uncertainty estimation N times more efficient at test time, where N is the number of samples traditionally required by Bayesian LLMs. Our extensive experiments demonstrate that uncertainty estimation learned on training data generalizes successfully to unseen test data through our distillation technique, consistently producing results comparable to (or even better than) state-of-the-art Bayesian LLMs.
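As a rough illustration of where the N-fold speedup comes from, the hedged sketch below contrasts conventional Bayesian inference (N stochastic passes averaged into a predictive distribution) with the distilled student (a single deterministic pass). The `sample=True` flag and the model call signatures are hypothetical stand-ins for however the Bayesian LLM draws its posterior samples (e.g. MC dropout or sampled adapter weights); they are not an API defined by the paper.

```python
import torch

@torch.no_grad()
def bayesian_predict(bayesian_llm, inputs, n_samples=10):
    """Conventional Bayesian inference: N stochastic forward passes, then average."""
    per_sample_probs = [
        torch.softmax(bayesian_llm(inputs, sample=True), dim=-1)  # one posterior sample
        for _ in range(n_samples)
    ]
    return torch.stack(per_sample_probs).mean(dim=0)  # cost scales with n_samples

@torch.no_grad()
def distilled_predict(student_llm, inputs):
    """Distilled student: one deterministic pass already encodes the aligned confidence."""
    return torch.softmax(student_llm(inputs), dim=-1)  # cost of a single forward pass
```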