From AR to Diffusion: Efficiently Adapting Large Language Models with Strictly Causal and Elastic Horizons

📅 2026-04-11

🤖 AI Summary

This work addresses the incompatibility between diffusion language models and pretrained autoregressive (AR) language models. Diffusion models rely on bidirectional attention, so they cannot reuse AR checkpoints and must instead be trained from scratch at high cost. The authors propose FLUID, a framework that adapts standard GPT-style AR models to the diffusion generation paradigm by enforcing strict causal alignment, which allows initialization directly from pretrained AR checkpoints. FLUID also introduces Elastic Horizons, an entropy-driven mechanism that adjusts the denoising stride according to local information density instead of following a fixed schedule. Together, these let the rich priors of pretrained AR models transfer to diffusion-based text generation without large-scale pre-training. The authors report state-of-the-art performance at training costs reduced by orders of magnitude, combining the linguistic priors of AR models with the parallel generation of diffusion models.
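To make "strict causal alignment" concrete, here is a minimal sketch. It assumes the term means the denoiser keeps the same lower-triangular attention mask a GPT-style checkpoint was trained with, so position i attends only to positions 0..i. This is an illustrative assumption, not the authors' code; `causal_mask` and `causal_attention` are hypothetical names, and the single-head, pure-Python attention is a toy stand-in for a transformer layer.

```python
import math
from typing import List

Vector = List[float]


def causal_mask(n: int) -> List[List[bool]]:
    """Lower-triangular mask: query i may attend to key j only if j <= i."""
    return [[j <= i for j in range(n)] for i in range(n)]


def causal_attention(q: List[Vector], k: List[Vector], v: List[Vector]) -> List[Vector]:
    """Single-head scaled dot-product attention under a strict causal mask.

    Hypothetical illustration: the same masking a GPT-style checkpoint uses,
    which is what would let its weights be reused without modification.
    """
    n, d = len(q), len(q[0])
    mask = causal_mask(n)
    out = []
    for i in range(n):
        # Scores only for allowed (past and current) positions, in order 0..i.
        scores = [sum(a * b for a, b in zip(q[i], k[j])) / math.sqrt(d)
                  for j in range(n) if mask[i][j]]
        m = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        weights = [w / total for w in weights]
        # Allowed positions are exactly 0..i, so enumerate() recovers j.
        out.append([sum(w * v[j][c] for j, w in enumerate(weights))
                    for c in range(len(v[0]))])
    return out
```

A quick check of the property that matters for reusing AR weights: perturbing a later position leaves every earlier output unchanged.

```python
x = [[0.1, 0.2], [0.3, -0.1], [0.0, 0.5]]
y = [row[:] for row in x]
y[2] = [9.0, 9.0]  # change only the last position
assert causal_attention(x, x, x)[:2] == causal_attention(y, y, y)[:2]
```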

📝 Abstract

Diffusion models promise efficient parallel text generation but rely on bidirectional attention, creating a structural mismatch with pre-trained autoregressive (AR) models. This incompatibility precludes reusing robust AR priors, necessitating prohibitive pre-training from scratch. To bridge this gap, we propose FLUID, a framework that efficiently adapts AR backbones to the diffusion paradigm. By enforcing Strictly Causal Alignment, FLUID enables seamless initialization from standard GPT-style checkpoints, circumventing the need for massive pre-training. Furthermore, we introduce Elastic Horizons, an entropy-driven mechanism that dynamically modulates denoising strides based on local information density rather than fixed schedules. Experiments demonstrate that FLUID achieves state-of-the-art performance while reducing training costs by orders of magnitude, effectively reconciling established AR foundations with efficient parallel generation. Our code is available at https://github.com/Oli-lab-nun/FLUID/tree/main.
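One way to picture an entropy-driven denoising stride like Elastic Horizons is sketched below. It assumes that "local information density" is measured by the Shannon entropy of each masked position's predictive distribution, and that the stride is the number of leading positions committed in one step. The rule (extend the stride while entropy stays under a threshold, always commit at least one token) and the names `elastic_stride`, `threshold`, and `max_stride` are hypothetical illustrations, not FLUID's actual algorithm.

```python
import math
from typing import List, Sequence


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy in nats; zero-probability entries contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def elastic_stride(dists: List[Sequence[float]], threshold: float, max_stride: int) -> int:
    """Number of leading masked positions to commit in one denoising step.

    Hypothetical rule: extend the stride while each successive position's
    predictive entropy stays below `threshold` (low entropy = predictable,
    low-information text), capped at `max_stride`. At least one position is
    always committed when any remain, so decoding makes progress.
    """
    stride = 0
    for probs in dists[:max_stride]:
        if entropy(probs) >= threshold:
            break  # a high-entropy position ends the confident run
        stride += 1
    return max(1, stride) if dists else 0
```

With a threshold of 0.5 nats, a confident prediction such as `[0.97, 0.01, 0.01, 0.01]` (about 0.17 nats) extends the stride, while a uniform distribution over four tokens (ln 4, about 1.39 nats) stops it. Two confident positions followed by a uniform one therefore yield a stride of 2, in contrast to a fixed schedule that would commit the same number of tokens at every step.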

Problem

Research questions and friction points this paper is trying to address.

Autoregressive models

Diffusion models

Parallel text generation

Structural mismatch

Model adaptation

Innovation

Methods, ideas, or system contributions that make the work stand out.

Strictly Causal Alignment

Elastic Horizons

Diffusion Models