AI Summary
This work addresses the longstanding performance gap between diffusion-based and autoregressive large language models for code, where diffusion models typically underperform under comparable resource constraints. Building upon the Seed-Coder architecture and dataset, the authors introduce block-wise diffusion with continual pretraining, enhanced by a tailored warm-up strategy and a block-wise clipped noise schedule to enable efficient and stable training. Under identical architectural and data conditions, this approach enables diffusion models to consistently surpass autoregressive baselines, achieving substantial improvements in structured code editing, reasoning, and low-resource language modeling. Notably, using only pretraining and supervised fine-tuning, the proposed method outperforms a range of 8B-scale autoregressive and diffusion models across multiple code benchmarks.
Abstract
Diffusion-based language models (DLLMs) offer non-sequential, block-wise generation and richer data reuse compared to autoregressive (AR) models, but existing code DLLMs still lag behind strong AR baselines under comparable budgets. We revisit this setting in a controlled study and introduce Stable-DiffCoder, a block diffusion code model that reuses the Seed-Coder architecture, data, and training pipeline. To enable efficient knowledge learning and stable training, we incorporate a block diffusion continual pretraining (CPT) stage enhanced by a tailored warm-up and a block-wise clipped noise schedule. Under the same data and architecture, Stable-DiffCoder outperforms its AR counterpart overall on a broad suite of code benchmarks. Moreover, relying only on the CPT and supervised fine-tuning stages, Stable-DiffCoder achieves stronger performance than a wide range of ~8B AR models and DLLMs, demonstrating that diffusion-based training can improve code modeling quality beyond AR training alone. Finally, diffusion-based any-order modeling improves structured code modeling for editing and reasoning, and, through data augmentation, benefits low-resource coding languages.
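The "block-wise clipped noise schedule" mentioned above can be pictured as clipping each block's sampled masking ratio into a fixed interval before corrupting that block, so no block is trained at a trivially low or uninformatively high noise level. The sketch below is a hypothetical illustration only: the paper's exact bounds, sampler, and mask token are not given here, so `t_min`, `t_max`, and `mask_id` are assumptions.

```python
import random

def clipped_block_noise_schedule(num_blocks, t_min=0.2, t_max=0.8, seed=0):
    """Sample one masking ratio per block, clipped to [t_min, t_max].

    Clipping avoids near-0 ratios (almost no learning signal) and
    near-1 ratios (almost no conditioning context) within a block.
    """
    rng = random.Random(seed)
    return [min(max(rng.random(), t_min), t_max) for _ in range(num_blocks)]

def mask_block(tokens, ratio, mask_id=-1, seed=0):
    """Replace a `ratio` fraction of a block's tokens with a mask id."""
    rng = random.Random(seed)
    n_mask = max(1, round(ratio * len(tokens)))
    masked_positions = set(rng.sample(range(len(tokens)), n_mask))
    return [mask_id if i in masked_positions else tok
            for i, tok in enumerate(tokens)]

# Example: draw one clipped noise level per block, then corrupt each block.
blocks = [[101, 7, 42, 9], [55, 3, 88, 12]]
ratios = clipped_block_noise_schedule(num_blocks=len(blocks))
noised = [mask_block(b, r, seed=i)
          for i, (b, r) in enumerate(zip(blocks, ratios))]
```

In a real block diffusion training loop the model would then be asked to reconstruct the masked positions of each block conditioned on the rest; here the functions only show how the per-block noise levels are bounded.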