🤖 AI Summary
This work addresses how to transfer drifting, a training principle from continuous generators that absorbs part of the sampling-time correction into an anti-symmetric fixed-point objective, to discrete diffusion language models (DDLMs). There, hard token sampling is non-differentiable, and categorical predictions provide no continuous samples to drift. The authors propose TokenDrift, the first method to adapt the drift concept from continuous generation to discrete text generation. It maps predicted token distributions to soft-token features, applies an anti-symmetric drift within a frozen semantic space, and backpropagates the resulting stop-gradient feature target to the model logits. The result is a trainable discrete drift objective compatible with both masked and uniform-state diffusion backbones. In continual-training experiments against matched continuation baselines, the method substantially improves few-step generation quality: with only four function evaluations (4 NFEs), it reduces generative perplexity (Gen. PPL) by 89% for MDLM and 86% for DUO.
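To interpret the headline numbers, note that a relative reduction r in Gen. PPL (generative perplexity) means the TokenDrift value is (1 − r) times the baseline value. Below, P_base and P_TD denote the baseline and TokenDrift Gen. PPL at 4 NFEs. These symbols are introduced here only for illustration. The reported percentages translate into the following improvement factors, which are approximate because the percentages are rounded:

```latex
r = 1 - \frac{P_{\mathrm{TD}}}{P_{\mathrm{base}}}
\quad\Longrightarrow\quad
\frac{P_{\mathrm{base}}}{P_{\mathrm{TD}}} = \frac{1}{1 - r}:
\qquad
\frac{1}{1 - 0.89} \approx 9.1 \;(\text{MDLM}),
\qquad
\frac{1}{1 - 0.86} \approx 7.1 \;(\text{DUO}).
```

In other words, the reported reductions correspond to roughly nine-fold and seven-fold lower Gen. PPL than the respective baselines.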
📝 Abstract
Discrete diffusion language models (DDLMs) generate text by iteratively denoising categorical token sequences, while recent drifting methods for continuous generators suggest that part of this sampling-time correction can instead be absorbed into training through an anti-symmetric fixed-point objective. We study how to transfer this principle to DDLMs, where the main challenge is the interface with discrete text: hard token samples are non-differentiable, and categorical predictions do not directly provide continuous samples to drift. We formulate TokenDrift, a drifting objective that lifts categorical predictions to soft-token features, applies anti-symmetric drifting in a frozen semantic space, and backpropagates the resulting stop-gradient feature target to DDLM logits. In controlled continual-training experiments with masked and uniform-state diffusion backbones, TokenDrift improves fixed-NFE generation quality over matched continuation baselines, reducing generative perplexity (Gen. PPL) at 4 NFEs by 89% on MDLM and 86% on DUO. These results suggest that drifting can provide a practical refinement objective for DDLMs.
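The three steps the abstract names can be sketched as follows: lift logits to soft-token features, drift them anti-symmetrically in a frozen space, and backpropagate a stop-gradient target to the logits. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation. It assumes the following:

- A frozen token-embedding table `emb` stands in for the "frozen semantic space".
- The drift is a softmax-normalized distance kernel with temperature `tau`, taken as attraction toward clean-data features minus attraction toward the current model features.
- Positives and negatives are paired at the token level.
- The helper names `soft_token_features`, `drift`, and `tokendrift_loss_and_grad` are hypothetical.

The gradient is written out by hand so the stop-gradient is explicit: the target is computed once and then held constant.

```python
import numpy as np

def softmax(z, axis=-1):
    """Numerically stable softmax along `axis`."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def soft_token_features(logits, emb):
    """Lift categorical predictions to soft-token features: softmax(logits) @ emb.
    Assumption: the frozen embedding table plays the role of the semantic space."""
    probs = softmax(logits)              # (N, V) predicted token distributions
    return probs, probs @ emb            # (N, V), (N, D) expected embeddings

def mean_shift(x, ys, tau):
    """Kernel-weighted mean shift from each row of x toward the sample set ys."""
    d = np.linalg.norm(x[:, None, :] - ys[None, :, :], axis=-1)  # (N, M) distances
    w = softmax(-d / tau, axis=1)                                 # normalized kernel weights
    return w @ ys - x                                             # (N, D)

def drift(x, pos, neg, tau=1.0):
    """Hypothetical anti-symmetric drift: attraction to data features (pos) minus
    attraction to model features (neg). Swapping pos and neg flips the sign,
    and the drift vanishes when the two sample sets coincide."""
    return mean_shift(x, pos, tau) - mean_shift(x, neg, tau)

def tokendrift_loss_and_grad(logits, emb, data_tokens, tau=1.0):
    """Loss 0.5 * mean ||x - sg(x + V(x))||^2 and its gradient w.r.t. the logits."""
    n = logits.shape[0]
    probs, x = soft_token_features(logits, emb)
    pos = emb[data_tokens]               # features of clean tokens (one-hot @ emb)
    neg = x.copy()                       # assumption: current model features as negatives
    target = x + drift(x, pos, neg, tau) # stop-gradient target: a constant from here on
    diff = x - target                    # equals -V(x)
    loss = 0.5 * np.sum(diff ** 2) / n
    g_x = diff / n                       # dL/dx with the target held fixed
    g_p = g_x @ emb.T                    # dL/dprobs, since x = probs @ emb
    # Softmax vector-Jacobian product: dL/dz = p * (g_p - <p, g_p>)
    g_z = probs * (g_p - np.sum(probs * g_p, axis=-1, keepdims=True))
    return loss, g_z

# Usage on toy data: 16 positions, vocabulary of 50, 8-dimensional features.
rng = np.random.default_rng(0)
emb = rng.normal(size=(50, 8))           # hypothetical frozen embedding table
logits = rng.normal(size=(16, 50))       # stand-in for DDLM logits
data = rng.integers(0, 50, size=16)      # stand-in for clean tokens at those positions
loss, grad = tokendrift_loss_and_grad(logits, emb, data)
logits = logits - 0.5 * grad             # one gradient step on the logits
```

Because the target is frozen, the loss equals half the mean squared drift norm. It therefore reaches zero exactly at the fixed point where attraction to data and attraction to model samples balance. Gradients reach the logits only through the soft-token features, which avoids differentiating through hard token samples.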