Dream 7B: Diffusion Large Language Models

📅 2025-08-21
📈 Citations: 0
Influential: 0
🤖 AI Summary
Autoregressive language models generate tokens strictly left to right, which prevents parallel refinement of a whole sequence. Dream 7B is presented as the first high-performance open-source discrete diffusion large language model (7B parameters): it generates text in parallel via token-level iterative denoising, supporting arbitrary-order generation, content infilling, and inference with a controllable quality–speed trade-off. Its key contributions are: (1) a context-adaptive token-level noise rescheduling mechanism; (2) a stable training recipe that initializes from an autoregressive LLM; and (3) joint optimization across general understanding, mathematical reasoning, and code generation tasks. Dream 7B substantially outperforms existing diffusion language models across multiple benchmarks, demonstrating superior planning capability and inference flexibility. Both base and instruction-tuned variants are publicly released, advancing the development of diffusion-based language modeling.

📝 Abstract
We introduce Dream 7B, the most powerful open diffusion large language model to date. Unlike autoregressive (AR) models that generate tokens sequentially, Dream 7B employs discrete diffusion modeling to refine sequences in parallel through iterative denoising. Our model consistently outperforms existing diffusion language models on general, mathematical, and coding tasks. Dream 7B demonstrates superior planning abilities and inference flexibility, including arbitrary-order generation, infilling capabilities, and tunable quality-speed trade-offs. These results are achieved through simple yet effective training techniques, including AR-based LLM initialization and context-adaptive token-level noise rescheduling. We release both Dream-Base and Dream-Instruct to facilitate further research in diffusion-based language modeling.
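To make the "refine sequences in parallel through iterative denoising" idea concrete, here is a minimal sketch of mask-predict style discrete diffusion decoding: start from a fully masked sequence and, at each step, commit the model's most confident predictions in parallel. This is an illustrative toy, not Dream 7B's actual algorithm; `predict`, `denoise_step`, and `diffusion_decode` are hypothetical names, and the confidence-based unmasking rule is an assumption.

```python
MASK = "<mask>"

def denoise_step(tokens, predict, k):
    """Unmask the k most confident masked positions in one parallel pass.

    predict(tokens, masked) is assumed to return
    {position: (token, confidence)} for the masked slots.
    """
    masked = [i for i, t in enumerate(tokens) if t == MASK]
    preds = predict(tokens, masked)
    # Commit the k highest-confidence predictions; the rest stay masked.
    best = sorted(preds.items(), key=lambda kv: -kv[1][1])[:k]
    for i, (tok, _conf) in best:
        tokens[i] = tok
    return tokens

def diffusion_decode(length, predict, steps):
    """Start fully masked and iteratively denoise.

    Fewer steps means more tokens committed per pass: faster decoding
    at some cost in quality -- the quality-speed knob the paper describes.
    """
    tokens = [MASK] * length
    per_step = max(1, length // steps)
    while any(t == MASK for t in tokens):
        tokens = denoise_step(tokens, predict, per_step)
    return tokens
```

With `steps` equal to the sequence length this degenerates to one token per pass; with `steps=1` the whole sequence is committed in a single parallel step.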
Problem

Research questions and friction points this paper is trying to address.

Sequential token-by-token generation in AR models limits parallelism and planning
No strong open diffusion language model exists at the 7B scale
Closing the gap to AR models on general, mathematical, and coding tasks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Discrete diffusion modeling for parallel sequence refinement
AR-based LLM initialization with adaptive noise rescheduling
Arbitrary-order generation with tunable quality-speed tradeoffs
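The arbitrary-order and infilling capabilities listed above follow naturally from the masked-denoising view: fix the known tokens, mask only the gap, and run the same iterative refinement. The sketch below illustrates this under the same assumptions as before (`predict` and `infill` are hypothetical names, not Dream 7B's API).

```python
MASK = "<mask>"

def infill(template, predict, steps=2):
    """Fill MASK positions in `template`, leaving fixed tokens untouched.

    Same confidence-ordered unmasking loop as plain decoding, so infilling
    needs no special machinery -- an assumed property of this toy model.
    """
    tokens = list(template)
    n_masked = sum(t == MASK for t in tokens)
    per_step = max(1, n_masked // steps)
    while any(t == MASK for t in tokens):
        masked = [i for i, t in enumerate(tokens) if t == MASK]
        preds = predict(tokens, masked)  # {pos: (token, confidence)}
        best = sorted(preds.items(), key=lambda kv: -kv[1][1])[:per_step]
        for i, (tok, _conf) in best:
            tokens[i] = tok
    return tokens
```

Because only masked positions are ever rewritten, the surrounding context constrains every denoising step, which is what makes bidirectional conditioning and infilling natural for diffusion LMs.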