Knowledge Distillation of Uncertainty using Deep Latent Factor Model

📅 2025-10-22
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the challenge of transferring uncertainty-aware knowledge from deep ensemble models to resource-constrained edge devices—where high computational and memory overheads impede deployment—this paper proposes a novel distributional distillation paradigm. We model the teacher ensemble as a Gaussian process, parameterizing its mean and covariance functions via a deep latent factor (DLF) model, and employ an EM algorithm for stable, differentiable distribution-level knowledge transfer. This work establishes the first Gaussian distillation framework, overcoming the inherent limitation of conventional knowledge distillation methods: the degradation of model variability and uncertainty quantification capability during compression. Extensive experiments on multiple benchmark datasets demonstrate significant improvements over state-of-the-art distillation approaches. The method is successfully applied to language model fine-tuning and out-of-distribution generalization tasks, achieving ensemble-level uncertainty calibration performance while retaining the lightweight nature of a single model.

📝 Abstract
Deep ensembles deliver state-of-the-art, reliable uncertainty quantification, but their heavy computational and memory requirements hinder their practical deployment in real applications such as on-device AI. Knowledge distillation compresses an ensemble into small student models, but existing techniques struggle to preserve uncertainty, partly because reducing the size of DNNs typically results in variation reduction. To resolve this limitation, we introduce a new method of distribution distillation (i.e., compressing a teacher ensemble into a student distribution instead of a student ensemble) called Gaussian distillation, which estimates the distribution of a teacher ensemble through a special Gaussian process called the deep latent factor (DLF) model, by treating each member of the teacher ensemble as a realization of a certain stochastic process. The mean and covariance functions in the DLF model are estimated stably using the expectation-maximization (EM) algorithm. On multiple benchmark datasets, we demonstrate that the proposed Gaussian distillation outperforms existing baselines. In addition, we illustrate that Gaussian distillation works well for fine-tuning of language models and for distribution shift problems.
Problem

Research questions and friction points this paper is trying to address.

Compressing deep ensembles into smaller models while preserving uncertainty quantification
Addressing the computational and memory limitations of ensembles in practical applications
Improving uncertainty distillation via a Gaussian process with a deep latent factor model
Innovation

Methods, ideas, or system contributions that make the work stand out.

Gaussian distillation compresses a teacher ensemble into a student distribution rather than a student ensemble
A deep latent factor model estimates the ensemble distribution, fitted stably via the EM algorithm
Outperforms existing baselines on benchmark datasets, language-model fine-tuning, and distribution shift tasks
Sehyun Park
Department of Statistics, Seoul National University
Jongjin Lee
Samsung Research
Yunseop Shin
Department of Statistics, Seoul National University
Ilsang Ohn
Department of Statistics, Inha University
Bayesian nonparametrics · Deep learning · Statistical learning theory
Yongdai Kim
Seoul National University
statistics · machine learning