AI Summary
To address the challenge of jointly achieving model lightweighting and latent-space disentanglement in out-of-distribution (OOD) multi-label inference on resource-constrained devices, this paper proposes the Lightweight Disentangled Distillation Encoder (DDE). DDE is the first method to explicitly incorporate disentanglement constraints into the knowledge distillation objective and provides provable theoretical guarantees for disentanglement via Rademacher complexity analysis. Built upon a variational autoencoder architecture, DDE integrates a student-teacher distillation framework with disentanglement regularization, enabling significant model compression while rigorously preserving semantic independence among latent factors. Experiments demonstrate that DDE maintains high inference accuracy across multiple OOD multi-label benchmarks and successfully deploys on NVIDIA edge devices, validating its efficiency and practicality for real-world deployment.
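As a rough sketch only: the paper's exact objective is not given above, so the snippet below pairs a latent-matching distillation term with the standard β-VAE KL penalty as a stand-in for DDE's disentanglement constraint. All names (`Encoder`, `dde_style_loss`, `lam`) and the choice of penalty are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Encoder(nn.Module):
    """Toy VAE encoder: maps an input to per-dimension latent mean/log-variance."""
    def __init__(self, in_dim: int, latent_dim: int, hidden: int):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
        self.mu = nn.Linear(hidden, latent_dim)
        self.logvar = nn.Linear(hidden, latent_dim)

    def forward(self, x):
        h = self.body(x)
        return self.mu(h), self.logvar(h)

def kl_to_factorized_prior(mu, logvar):
    """KL(q(z|x) || N(0, I)): encourages a factorized (disentangled) posterior."""
    return 0.5 * (mu.pow(2) + logvar.exp() - 1.0 - logvar).sum(dim=1).mean()

def dde_style_loss(x, student, teacher, lam=4.0):
    """Penalty form of a constrained distillation objective: the student
    mimics the frozen teacher's latent code, while lam acts as a
    Lagrange-multiplier-like weight on the disentanglement penalty."""
    with torch.no_grad():
        t_mu, _ = teacher(x)                  # frozen teacher latents
    s_mu, s_logvar = student(x)
    distill = F.mse_loss(s_mu, t_mu)          # compression: match the teacher
    disent = kl_to_factorized_prior(s_mu, s_logvar)
    return distill + lam * disent

# Usage: a large teacher distilled into a much smaller student.
teacher = Encoder(in_dim=784, latent_dim=32, hidden=512)
student = Encoder(in_dim=784, latent_dim=32, hidden=32)
loss = dde_style_loss(torch.randn(16, 784), student, teacher)
loss.backward()
```

The design point the sketch illustrates is that the disentanglement term is optimized jointly with, rather than after, the distillation term, so compression cannot trade away latent-factor independence.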
Abstract
Recently, the disentangled latent space of a variational autoencoder (VAE) has been used to reason about multi-label out-of-distribution (OOD) test samples, i.e., samples drawn from distributions different from that of the training samples. A disentangled latent space is one with one-to-many maps between latent dimensions and generative factors, the important characteristics of an image. This paper proposes a disentangled distilled encoder (DDE) framework to reduce the size of the OOD reasoner for deployment on resource-constrained devices while preserving disentanglement. DDE formalizes student-teacher distillation for model compression as a constrained optimization problem whose disentanglement constraints preserve the structure of the latent space. Theoretical guarantees for disentanglement during distillation, based on Rademacher complexity, are established. The approach is evaluated empirically by deploying the compressed model on an NVIDIA edge device.
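To make the constrained-optimization framing concrete, one plausible reading is the following; the notation (encoders $E_s, E_t$, disentanglement measure $D$, tolerance $\epsilon$) is illustrative, since the abstract does not give the paper's actual symbols:

```latex
% theta_s: student parameters; E_s, E_t: student/teacher encoders;
% L_dist: distillation loss; D: a disentanglement measure; eps: tolerance.
\begin{aligned}
  \min_{\theta_s} \; & \mathbb{E}_{x}\,
    \mathcal{L}_{\mathrm{dist}}\!\left(E_s(x;\theta_s),\, E_t(x)\right) \\
  \text{s.t.} \; & D\!\left(E_s(\cdot\,;\theta_s)\right) \le \epsilon .
\end{aligned}
```

In penalty (Lagrangian) form this becomes $\min_{\theta_s} \mathcal{L}_{\mathrm{dist}} + \lambda\, D$, the shape mirrored by the code sketch above.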