DiLaDiff: Distilled Latent-Augmented Diffusion for Language Modeling

📅 2026-05-22

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Diffusion language models struggle to balance generation quality against inference efficiency because they model dependencies between simultaneously decoded tokens poorly. This work proposes a latent-guided variant of masked diffusion language models: it first learns a continuous, semantically meaningful latent space with an autoencoder fine-tuned from an existing masked diffusion language model, then trains a diffusion prior over the latent variables, and finally distills that prior into a few-step generator via consistency distillation. Even without distillation, the approach outperforms the masked diffusion baseline while accelerating inference; with distillation, generating the latent takes negligible time compared to discrete decoding.

📝 Abstract

Diffusion language models intrinsically fail to capture correlations between decoded tokens, which leads to a harsh trade-off between sampling quality and throughput. To solve this issue, we propose DiLaDiff, a variant of masked diffusion language models with three components: (1) a continuous latent space with semantic capabilities, learned by an auto-encoder fine-tuned from an existing masked diffusion language model; (2) a latent diffusion model learning the prior over the encoder distribution; (3) a consistency model distilling the learned prior into a few-step latent generative model. We show that, even without distillation, our latent-guided diffusion model outperforms the masked diffusion baseline while significantly accelerating inference. Consistency distillation further lowers the computational overhead of continuous diffusion, such that the latent is generated in negligible time compared to discrete decoding.
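To make component (3) more concrete, the following is a minimal sketch of generic multistep consistency sampling in a latent space. It is not the authors' implementation: the abstract does not specify the network, noise schedule, or latent dimension, so `toy_consistency_fn`, `sample_latent`, `EPS`, and the time grid are all hypothetical. In place of a learned model, the sketch assumes each latent dimension follows a Gaussian N(mu, s^2), for which the consistency function of the variance-exploding probability-flow ODE has a closed form.

```python
import math
import random

EPS = 0.002  # smallest noise level (assumed value, not taken from the paper)


def toy_consistency_fn(x, t, mu=1.0, s=0.5):
    """Closed-form consistency function for a toy Gaussian latent prior.

    Assumption: each latent dimension is N(mu, s^2) and noise is added as
    x_t = x_0 + t * z. The probability-flow ODE then rescales (x - mu) so
    that the marginal variance goes from s^2 + t^2 down to s^2 + EPS^2.
    A real system would use a distilled neural network here instead.
    """
    scale = math.sqrt((s * s + EPS * EPS) / (s * s + t * t))
    return [mu + (xi - mu) * scale for xi in x]


def sample_latent(consistency_fn, dim, times, rng):
    """Few-step sampling: one network call per entry in `times`.

    `times` must be decreasing, with times[0] the largest noise level.
    """
    t_max = times[0]
    # Start from (approximately) pure noise at the largest noise level.
    x = [t_max * rng.gauss(0.0, 1.0) for _ in range(dim)]
    z = consistency_fn(x, t_max)
    for t in times[1:]:
        # Re-noise the current estimate to level t, then map it back.
        noise_std = math.sqrt(t * t - EPS * EPS)
        x_t = [zi + noise_std * rng.gauss(0.0, 1.0) for zi in z]
        z = consistency_fn(x_t, t)
    return z


rng = random.Random(0)
latent = sample_latent(toy_consistency_fn, dim=8, times=[80.0, 10.0, 1.0], rng=rng)
```

Here the latent is produced with three function evaluations. In a latent-guided setup such as the one the abstract describes, a vector like `latent` would then condition the masked diffusion decoder, so the cost of this stage stays small relative to discrete decoding.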

Problem

Research questions and friction points this paper is trying to address.

diffusion language models

token correlations

sampling quality

throughput

inference efficiency

Innovation

Methods, ideas, or system contributions that make the work stand out.

latent diffusion

consistency distillation

masked diffusion language model