Efficient Epistemic Uncertainty Estimation for Large Language Models via Knowledge Distillation

📅 2026-02-02
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Estimating epistemic uncertainty in large language models is typically computationally expensive, limiting its applicability in safety-critical scenarios. This work proposes an efficient paradigm that leverages a small draft model to estimate token-level epistemic uncertainty through knowledge distillation combined with a bias-variance decomposition: Jensen–Shannon divergence among drafts approximates variance, while KL divergence between the draft mixture and the target captures bias. To enhance accuracy, the approach integrates Online Stochastic Distillation (OSD) and Data-Diverse Drafts (DDD) strategies. Notably, it avoids the need for full ensembles, achieving up to a 37% reduction in RMSE on GSM8K while matching the hallucination detection performance of high-overhead methods like TokUR—all with negligible inference overhead.

📝 Abstract
Quantifying uncertainty in Large Language Models (LLMs) is essential for mitigating hallucinations and enabling risk-aware deployment in safety-critical tasks. However, estimating Epistemic Uncertainty (EU) via Deep Ensembles is computationally prohibitive at the scale of modern models. We propose a framework that leverages small draft models to efficiently estimate token-level EU, bypassing the need for full-scale ensembling. Theoretically grounded in a Bias-Variance Decomposition, our approach approximates EU via the Jensen-Shannon divergence among drafts (variance proxy) and the KL divergence between the draft mixture and the target (bias proxy). To further ensure accuracy without significant overhead, we introduce Online Stochastic Distillation (OSD) to efficiently approximate target aggregation and the Data-Diverse Drafts (DDD) strategy to enhance draft diversity for better target approximation. Extensive experiments on GSM8K demonstrate that our method reduces the estimation error (RMSE) by up to 37% compared to baselines. Crucially, our approach achieves hallucination detection performance competitive with heavy perturbation-based methods like TokUR while incurring negligible inference costs, offering a practical solution for uncertainty-aware LLM deployment.
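The bias-variance decomposition described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the distribution shapes, and the simple additive combination of the two proxies are all assumptions for exposition.

```python
import math

def entropy(p):
    """Shannon entropy of a token distribution (a list of probabilities)."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def kl(p, q):
    """KL divergence KL(p || q) between two token distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def token_eu(draft_dists, target_dist):
    """Token-level epistemic uncertainty proxy (hypothetical sketch):
    variance term = Jensen-Shannon divergence among draft distributions,
    bias term     = KL from the draft mixture to the target distribution."""
    m = len(draft_dists)
    vocab = len(target_dist)
    mixture = [sum(d[i] for d in draft_dists) / m for i in range(vocab)]
    variance = entropy(mixture) - sum(entropy(d) for d in draft_dists) / m
    bias = kl(mixture, target_dist)
    return variance + bias
```

When all drafts agree with the target, both terms vanish; disagreement among drafts raises the variance term, and systematic deviation of the draft mixture from the target raises the bias term.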
Problem

Research questions and friction points this paper is trying to address.

Epistemic Uncertainty
Large Language Models
Uncertainty Estimation
Hallucination Detection
Computational Efficiency
Innovation

Methods, ideas, or system contributions that make the work stand out.

Epistemic Uncertainty
Knowledge Distillation
Online Stochastic Distillation
Data-Diverse Drafts
Large Language Models
Seonghyeon Park
Department of Aerospace Engineering, Seoul National University
J. Yeom
Graduate School of Data Science, Seoul National University
Jaewon Sok
Department of Rural Systems Engineering, Seoul National University
Jeongjae Park
Graduate School of Data Science, Seoul National University
Heejun Kim
University of North Texas
information science · data science · information credibility
Taesup Kim
Assistant Professor, Seoul National University
Representation Learning · Transfer Learning · AI · Machine Learning · Deep Learning