AdvDMD: Adversarial Reward Meets DMD For High-Quality Few-Step Generation

📅 2026-04-29

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the significant drop in generation quality when diffusion models are distilled to very few sampling steps, and the added complexity of existing approaches that simply attach a reinforcement learning (RL) process to the distillation process. The paper proposes AdvDMD, a framework that unifies distribution matching distillation (DMD) with RL by using the adversarially trained discriminator from DMD2 as an online-updated reward model. The discriminator is trained on both intermediate denoising states and final outputs, which provides supervision across the entire sampling trajectory and mitigates reward hacking. With a unified SDE backward simulation and separate training schedules for DMD and RL, the 4-step AdvDMD surpasses the original 40-step SD3.5 model on DPG-Bench and substantially improves SD3 performance on GenEval; on Qwen-Image, the 2-step variant outperforms TwinFlow.

📝 Abstract

Diffusion models offer superior generation quality at the expense of extensive sampling steps. Distillation methods, with Distribution Matching Distillation (DMD) as a popular example, can mitigate this issue, but performance degradation remains pronounced when sampling steps are limited. Reinforcement learning (RL) has been leveraged to improve few-step generation quality during distillation, with the potential to even surpass the performance of the teacher model. However, existing approaches are combinatorial in nature, merely integrating an RL process with the distillation process, which introduces unnecessary complexity. To address this gap, we propose AdvDMD, a method that seamlessly unifies DMD distillation and RL. Specifically, AdvDMD employs the adversarially trained discriminator from DMD2 as the reward model, which assigns low scores to generated images and high scores to real ones. It is trained on both intermediate and final states of the denoising process and updated online with the distilled model, enabling holistic supervision of the sampling trajectories and mitigating reward hacking. We adopt a unified SDE backward simulation and different training schedules for DMD and RL to enable more stable and efficient training. Experimental results demonstrate that the 4-step AdvDMD outperforms the original 40-step model for SD3.5 on DPG-Bench, while achieving significant performance gains for SD3 on GenEval. On Qwen-Image, our 2-step AdvDMD achieves superior performance over TwinFlow.
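The idea of a discriminator that doubles as an online-updated reward model can be illustrated with a toy sketch. This is not the paper's implementation: it assumes scalar "images", a logistic discriminator, and a REINFORCE-style update of a Gaussian generator's mean, and it omits the DMD loss, the SDE backward simulation, and the scoring of intermediate denoising states. All names (`ToyDiscriminator`, `generator_step`, `train`) and all hyperparameters are hypothetical.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class ToyDiscriminator:
    """Logistic discriminator on scalars: logit(x) = w * x + b (toy assumption)."""

    def __init__(self):
        self.w = 0.0
        self.b = 0.0

    def reward(self, x):
        # The raw logit doubles as the reward: higher means "looks more real".
        return self.w * x + self.b

    def update(self, real, fake, lr=0.1):
        # One gradient-descent step on binary cross-entropy
        # (target 1 for real samples, 0 for generated samples).
        grad_w = grad_b = 0.0
        for x, target in [(x, 1.0) for x in real] + [(x, 0.0) for x in fake]:
            err = sigmoid(self.reward(x)) - target
            grad_w += err * x
            grad_b += err
        n = len(real) + len(fake)
        self.w -= lr * grad_w / n
        self.b -= lr * grad_b / n


def generator_step(mu, sigma, samples, disc, lr=0.05):
    # REINFORCE-style update of the generator mean with a mean-reward baseline.
    rewards = [disc.reward(x) for x in samples]
    baseline = sum(rewards) / len(rewards)
    grad = sum((r - baseline) * (x - mu) / sigma ** 2
               for r, x in zip(rewards, samples)) / len(samples)
    return mu + lr * grad  # gradient ascent on expected reward


def train(steps=20, batch=64, seed=0):
    rng = random.Random(seed)
    real_mean, sigma = 2.0, 0.5   # toy "real data" distribution (assumed)
    mu = -1.0                     # generator starts far from the data
    disc = ToyDiscriminator()
    for _ in range(steps):
        real = [rng.gauss(real_mean, sigma) for _ in range(batch)]
        fake = [rng.gauss(mu, sigma) for _ in range(batch)]
        disc.update(real, fake)                      # reward model updated online
        mu = generator_step(mu, sigma, fake, disc)   # generator follows the reward
    return mu, disc
```

Calling `mu, disc = train()` alternates one discriminator step with one generator step, so the reward keeps tracking the current generator rather than staying fixed. In this toy setting that is the property the abstract credits with mitigating reward hacking: a frozen reward could be exploited, whereas a reward retrained against the generator's latest samples loses its preference for them as they approach the real data.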

Problem

Research questions and friction points this paper is trying to address.

diffusion models

few-step generation

distillation

reinforcement learning

generation quality

Innovation

Methods, ideas, or system contributions that make the work stand out.

Adversarial Reward

Distribution Matching Distillation

Few-Step Generation