AI Summary
This work addresses the longstanding performance gap between diffusion-based and autoregressive large language models for code, where diffusion models typically underperform under comparable resource constraints. Building upon the Seed-Coder architecture and dataset, the authors introduce block-wise diffusion with continual pretraining, enhanced by a tailored warm-up strategy and a block-wise clipped noise schedule to enable efficient and stable training. Under identical architectural and data conditions, this approach enables diffusion models to consistently surpass autoregressive baselines, achieving substantial improvements in structured code editing, reasoning, and low-resource language modeling. Notably, using only pretraining and supervised fine-tuning, the proposed method outperforms a range of 8B-scale autoregressive and diffusion models across multiple code benchmarks.
Abstract
Diffusion-based language models (DLLMs) offer non-sequential, block-wise generation and richer data reuse compared to autoregressive (AR) models, but existing code DLLMs still lag behind strong AR baselines under comparable budgets. We revisit this setting in a controlled study and introduce Stable-DiffCoder, a block diffusion code model that reuses the Seed-Coder architecture, data, and training pipeline. To enable efficient knowledge learning and stable training, we incorporate a block diffusion continual pretraining (CPT) stage enhanced by a tailored warm-up and a block-wise clipped noise schedule. Under the same data and architecture, Stable-DiffCoder outperforms its AR counterpart overall on a broad suite of code benchmarks. Moreover, relying only on the CPT and supervised fine-tuning stages, Stable-DiffCoder achieves stronger performance than a wide range of ~8B AR models and DLLMs, demonstrating that diffusion-based training can improve code modeling quality beyond AR training alone. Finally, diffusion-based any-order modeling improves structured code modeling for editing and reasoning, and, through data augmentation, benefits low-resource coding languages.
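The "block-wise clipped noise schedule" mentioned above can be pictured as clipping each block's sampled masking ratio into a fixed interval before corrupting that block, so no block is trained at a trivially low or uninformatively high noise level. The sketch below is a hypothetical illustration only: the paper's exact bounds, sampler, and mask token are not given here, so `t_min`, `t_max`, and `mask_id` are assumptions.

```python
import random

def clipped_block_noise_schedule(num_blocks, t_min=0.2, t_max=0.8, seed=0):
    """Sample one masking ratio per block, clipped to [t_min, t_max].

    Clipping avoids near-0 ratios (almost no learning signal) and
    near-1 ratios (almost no conditioning context) within a block.
    """
    rng = random.Random(seed)
    return [min(max(rng.random(), t_min), t_max) for _ in range(num_blocks)]

def mask_block(tokens, ratio, mask_id=-1, seed=0):
    """Replace a `ratio` fraction of a block's tokens with a mask id."""
    rng = random.Random(seed)
    n_mask = max(1, round(ratio * len(tokens)))
    masked_positions = set(rng.sample(range(len(tokens)), n_mask))
    return [mask_id if i in masked_positions else tok
            for i, tok in enumerate(tokens)]

# Example: draw one clipped noise level per block, then corrupt each block.
blocks = [[101, 7, 42, 9], [55, 3, 88, 12]]
ratios = clipped_block_noise_schedule(num_blocks=len(blocks))
noised = [mask_block(b, r, seed=i)
          for i, (b, r) in enumerate(zip(blocks, ratios))]
```

In a real block diffusion training loop the model would then be asked to reconstruct the masked positions of each block conditioned on the rest; here the functions only show how the per-block noise levels are bounded.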