🤖 AI Summary
This work addresses the inefficiency of existing masked discrete diffusion models, which lack effective deterministic sampling methods and therefore suffer from slow inference. By establishing, for the first time, a duality between masked diffusion and continuous Gaussian processes, the authors introduce a maximum-value index preservation mechanism that interprets the masking process as a projection of a Gaussian process. This insight enables the analytic construction of deterministic coupled trajectories and underpins Masked Consistency Distillation (MCD), a framework that achieves purely deterministic consistency distillation without relying on numerical ODE solvers or stochastic sampling. Experiments show that MCD matches the generation quality of prior methods while accelerating inference by up to 16×, substantially outperforming existing stochastic distillation approaches.
📝 Abstract
Masked discrete diffusion is a dominant paradigm for high-quality language modeling in which tokens are iteratively corrupted into a mask state, yet its inference efficiency is bottlenecked by the lack of deterministic sampling tools. While diffusion duality enables deterministic distillation for uniform-state models, these approaches generally underperform masked models and rely on complex integral operators. Conversely, in the masked domain, prior methods typically assume that deterministic trajectories do not exist, forcing a reliance on stochastic distillation. To bridge this gap, we establish an explicit Masked Diffusion Duality, proving that the masked process arises as the projection of a continuous Gaussian process via a novel maximum-value index preservation mechanism. Building on this, we introduce Masked Consistency Distillation (MCD), a principled framework that leverages the duality to analytically construct the deterministic coupled trajectories required for consistency distillation, bypassing numerical ODE solvers. MCD strictly improves upon prior stochastic distillation methods, achieving a 16$\times$ inference speedup without compromising generation quality. Our findings not only provide a solid theoretical foundation connecting masked and continuous diffusion, but also unlock the full potential of consistency distillation for high-performance discrete generation. Our code is available at https://anonymous.4open.science/r/MCD-70FD.
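The core duality can be illustrated with a toy numerical sketch. This is our own simplification, not the paper's exact construction: we attach a Gaussian vector to each token's one-hot embedding (with a hypothetical signal level `alpha_t`) and "project" the continuous state to the masked domain by checking whether the maximum-value index is preserved; when noise moves the argmax away from the token's index, the token collapses to the mask state.

```python
import numpy as np

MASK = -1  # sentinel for the mask state (illustrative choice)

def project_to_masked(token_id, vocab_size, alpha_t, rng):
    """Project a continuous Gaussian state onto {token, MASK}.

    x_t = alpha_t * onehot(token_id) + Gaussian noise. The token is
    observed iff the maximum-value index of x_t is still token_id;
    otherwise the projection yields the mask state.
    """
    x_t = alpha_t * np.eye(vocab_size)[token_id] + rng.standard_normal(vocab_size)
    return token_id if int(np.argmax(x_t)) == token_id else MASK

rng = np.random.default_rng(0)
vocab_size, token = 50, 7

# High signal level (early in the noising process): token usually survives.
# Zero signal (fully noised): argmax is uniform, so the token is almost
# always masked (survival probability ~ 1/vocab_size).
hi = np.mean([project_to_masked(token, vocab_size, 8.0, rng) == token for _ in range(1000)])
lo = np.mean([project_to_masked(token, vocab_size, 0.0, rng) == token for _ in range(1000)])
print(hi, lo)
```

Sweeping `alpha_t` from large to zero makes the per-token survival probability fall from near 1 to near `1/vocab_size`, mirroring how a masking schedule gradually corrupts tokens; the paper's contribution is to make this correspondence exact and exploit the shared Gaussian noise to build deterministic coupled trajectories across time steps.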