🤖 AI Summary
Estimating epistemic uncertainty in large language models is typically computationally expensive, limiting its applicability in safety-critical scenarios. This work proposes an efficient paradigm that leverages a small draft model to estimate token-level epistemic uncertainty through knowledge distillation combined with a bias-variance decomposition: Jensen–Shannon divergence among drafts approximates the variance, while KL divergence between the draft mixture and the target captures the bias. To enhance accuracy, the approach integrates Online Stochastic Distillation (OSD) and a Data-Diverse Drafts (DDD) strategy. Notably, it avoids the need for full ensembles, achieving up to a 37% reduction in RMSE on GSM8K while matching the hallucination detection performance of high-overhead methods like TokUR, all with negligible inference overhead.
📝 Abstract
Quantifying uncertainty in Large Language Models (LLMs) is essential for mitigating hallucinations and enabling risk-aware deployment in safety-critical tasks. However, estimating Epistemic Uncertainty (EU) via Deep Ensembles is computationally prohibitive at the scale of modern models. We propose a framework that leverages small draft models to efficiently estimate token-level EU, bypassing the need for full-scale ensembling. Theoretically grounded in a bias-variance decomposition, our approach approximates EU via the Jensen-Shannon divergence among drafts (variance proxy) and the KL divergence between the draft mixture and the target (bias proxy). To further ensure accuracy without significant overhead, we introduce Online Stochastic Distillation (OSD) to efficiently approximate target aggregation and the Data-Diverse Drafts (DDD) strategy to enhance draft diversity for better target approximation. Extensive experiments on GSM8K demonstrate that our method reduces the estimation error (RMSE) by up to 37% compared to baselines. Crucially, our approach achieves hallucination detection performance competitive with heavy perturbation-based methods like TokUR while incurring negligible inference costs, offering a practical solution for uncertainty-aware LLM deployment.
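To make the decomposition concrete, the token-level EU proxy described above can be sketched as follows. This is a minimal NumPy illustration, not the paper's implementation: the function names, the uniform mixture weights, and the toy distributions are all hypothetical. The variance term is the generalized Jensen-Shannon divergence among the drafts' next-token distributions, and the bias term is the KL divergence from the draft mixture to the target model's distribution.

```python
import numpy as np

def kl(p, q, eps=1e-12):
    """KL divergence KL(p || q) between two discrete distributions."""
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    return float(np.sum(p * np.log(p / q)))

def epistemic_uncertainty(draft_dists, target_dist):
    """Token-level EU proxy = variance term + bias term.

    variance: generalized Jensen-Shannon divergence among the drafts,
              i.e. the mean KL from each draft to their (uniform) mixture.
    bias:     KL from the draft mixture to the target distribution.
    """
    drafts = np.asarray(draft_dists, dtype=float)
    mixture = drafts.mean(axis=0)                          # uniform draft mixture
    variance = np.mean([kl(d, mixture) for d in drafts])   # JSD (variance proxy)
    bias = kl(mixture, target_dist)                        # bias proxy
    return variance + bias

# Toy next-token distributions over a 4-token vocabulary (hypothetical values).
drafts = [
    [0.70, 0.10, 0.10, 0.10],
    [0.60, 0.20, 0.10, 0.10],
    [0.50, 0.30, 0.10, 0.10],
]
target = [0.65, 0.15, 0.10, 0.10]
eu = epistemic_uncertainty(drafts, target)
```

When all drafts agree with each other and with the target, both terms vanish and the EU estimate is zero; disagreement among drafts inflates the variance term, while a systematic gap between the draft mixture and the target inflates the bias term.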