DARE: Diffusion Large Language Models Alignment and Reinforcement Executor

📅 2026-04-05
📈 Citations: 0
Influential: 0
🤖 AI Summary
Current open-source diffusion-based large language models (dLLMs) lack a unified post-training framework, leading to difficulties in reproducibility, slow iteration cycles, and challenges in fair comparison. This work proposes the first unified post-training framework tailored for dLLMs, integrating supervised fine-tuning (SFT), parameter-efficient fine-tuning (PEFT), preference optimization, and customized reinforcement learning algorithms. The framework builds on verl and OpenCompass, and supports both masked and block diffusion architectures with their characteristic parallel generation and iterative denoising. Its effectiveness is validated across multiple models, including LLaDA, Dream, SDAR, and LLaDA2.x, demonstrating strong generality, reproducibility, and training acceleration. This platform thus provides an efficient, comparable, and scalable foundation for advancing dLLM research.
📝 Abstract
Diffusion large language models (dLLMs) are emerging as a compelling alternative to dominant autoregressive models, replacing strictly sequential token generation with iterative denoising and parallel generation dynamics. However, their open-source ecosystem remains fragmented across model families and, in particular, across post-training pipelines, where reinforcement learning objectives, rollout implementations, and evaluation scripts are often released as paper-specific codebases. This fragmentation slows research iteration, raises the engineering burden of reproduction, and makes fair comparison across algorithms difficult. We present DARE (dLLMs Alignment and Reinforcement Executor), an open framework for post-training and evaluating dLLMs. Built on top of verl [Sheng et al., 2024] and OpenCompass [2023], DARE unifies supervised fine-tuning, parameter-efficient fine-tuning, preference optimization, and dLLM-specific reinforcement learning under a shared execution stack for both masked and block diffusion language models. Across representative model families including LLaDA, Dream, SDAR, and LLaDA2.x, DARE provides broad algorithmic coverage, reproducible benchmark evaluation, and practical acceleration. Extensive empirical results show that DARE serves as a reusable research substrate for developing, comparing, and deploying post-training methods for current and emerging dLLMs.
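To make the contrast with autoregressive decoding concrete, the following is a minimal toy sketch of masked-diffusion generation: the sequence starts fully masked, and each denoising step fills several positions in parallel rather than one token left-to-right. All names here (`toy_denoise`, `score_fn`, the random token choice standing in for a learned denoiser) are illustrative assumptions, not DARE's actual API.

```python
import random

MASK = "<mask>"

def toy_denoise(vocab, length, steps, score_fn, seed=0):
    """Toy masked-diffusion decoding loop (illustrative only).

    Starts from an all-masked sequence and, at each step, commits the
    highest-scoring masked positions in parallel. A real dLLM would use
    a learned denoiser to score positions and pick tokens; here a
    caller-supplied score_fn and a random token choice stand in.
    """
    rng = random.Random(seed)
    seq = [MASK] * length
    per_step = max(1, length // steps)  # positions revealed per step
    while MASK in seq:
        masked = [i for i, t in enumerate(seq) if t == MASK]
        # Score every masked position, keep the most "confident" ones.
        scored = sorted(((score_fn(seq, i, rng), i) for i in masked),
                        reverse=True)
        for _conf, i in scored[:per_step]:
            seq[i] = rng.choice(vocab)  # stand-in for argmax over vocab
    return seq

# Example: 8 tokens revealed over 4 parallel denoising steps.
tokens = toy_denoise(["a", "b", "c"], length=8, steps=4,
                     score_fn=lambda s, i, r: r.random())
```

The point of the sketch is the control flow: unlike autoregressive decoding, the inner step touches many positions at once, which is what makes parallel generation (and the rollout machinery a post-training framework must support) structurally different.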
Problem

Research questions and friction points this paper is trying to address.

diffusion large language models
post-training
reinforcement learning
framework fragmentation
reproducibility
Innovation

Methods, ideas, or system contributions that make the work stand out.

diffusion language models
post-training framework
reinforcement learning
model alignment
open-source ecosystem
Jingyi Yang
University of Science and Technology of China
Computer Vision · Deep Learning · AI Agent · Generative Models · Reinforcement Learning
Yuxian Jiang
Shanghai Artificial Intelligence Laboratory, Fudan University
Xuhao Hu
Shanghai Artificial Intelligence Laboratory, Fudan University
Shuang Cheng
Shanghai Artificial Intelligence Laboratory, Zhejiang University
Biqing Qi
Shanghai AI Laboratory
Jing Shao
Research Scientist, Shanghai AI Laboratory/Shanghai Jiao Tong University
Computer Vision · Multi-Modal Large Language Model