AI Summary
This work addresses the incompatibility between diffusion models and pretrained autoregressive (AR) language models, which stems from the former's reliance on bidirectional attention and necessitates costly training from scratch. The authors propose FLUID, a framework that seamlessly adapts standard GPT-style AR models to the diffusion generation paradigm through strict causal alignment. FLUID further introduces an elastic receptive field mechanism that dynamically adjusts denoising step size and scheduling based on local information density. This approach enables, for the first time, efficient transfer of rich priors from pretrained AR models into diffusion-based text generation without requiring large-scale retraining. Consequently, FLUID achieves state-of-the-art performance while significantly reducing computational costs, successfully combining the strong linguistic priors of AR models with the parallelizable generation benefits of diffusion models.
Abstract
Diffusion models promise efficient parallel text generation but rely on bidirectional attention, creating a structural mismatch with pre-trained Autoregressive (AR) models. This incompatibility precludes reusing robust AR priors, necessitating prohibitive pre-training from scratch. To bridge this gap, we propose FLUID, a framework that efficiently adapts AR backbones to the diffusion paradigm. By enforcing Strictly Causal Alignment, FLUID enables seamless initialization from standard GPT-style checkpoints, circumventing the need for massive pre-training. Furthermore, we introduce Elastic Horizons, an entropy-driven mechanism that dynamically modulates denoising strides based on local information density rather than fixed schedules. Experiments demonstrate that FLUID achieves state-of-the-art performance while reducing training costs by orders of magnitude, effectively reconciling established AR foundations with efficient parallel generation. Our code is available at https://github.com/Oli-lab-nun/FLUID/tree/main.
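The abstract does not spell out how Elastic Horizons chooses a stride, so the following is only a minimal sketch of one way an entropy-driven stride could work, not FLUID's actual algorithm. It assumes per-position predictive entropies are already available and groups consecutive positions into one denoising block until a cumulative entropy budget is used up. The names `shannon_entropy`, `elastic_stride`, `plan_blocks`, and the parameters `budget` and `max_stride` are all hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def shannon_entropy(probs: Sequence[float]) -> float:
    """Entropy (in nats) of a single next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def elastic_stride(entropies: Sequence[float], start: int,
                   budget: float, max_stride: int) -> int:
    """Number of positions to denoise together, beginning at `start`.

    Hypothetical rule: keep extending the block while the accumulated
    entropy stays within `budget`, and always advance by at least one
    position so that generation makes progress.
    """
    total = 0.0
    stride = 0
    for h in entropies[start:start + max_stride]:
        if stride > 0 and total + h > budget:
            break
        total += h
        stride += 1
    return max(stride, 1)


def plan_blocks(entropies: Sequence[float], budget: float,
                max_stride: int) -> List[Tuple[int, int]]:
    """Split a sequence into half-open [begin, end) denoising blocks."""
    blocks = []
    pos = 0
    while pos < len(entropies):
        step = elastic_stride(entropies, pos, budget, max_stride)
        blocks.append((pos, pos + step))
        pos += step
    return blocks


# Low-entropy (predictable) spans get long strides; the high-entropy
# position at index 3 is handled on its own.
example = [0.1, 0.1, 0.1, 2.0, 0.2, 0.2]
blocks = plan_blocks(example, budget=0.5, max_stride=4)
```

Under these assumptions, predictable regions are denoised in larger parallel steps while information-dense positions fall back to short, near-sequential strides, which is the qualitative behavior the abstract attributes to replacing a fixed schedule with a density-based one.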