ReDiF: Reinforced Distillation for Few Step Diffusion

📅 2025-12-28
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the high sampling cost and slow inference of diffusion models, this paper proposes the first framework that formulates knowledge distillation as a reinforcement learning (RL) policy optimization problem. Specifically, the student model's multi-step denoising process is modeled as a sequential decision-making task, with sparse reward signals derived from alignment with the teacher's outputs; the Proximal Policy Optimization (PPO) algorithm is used to optimize long-step denoising policies. Unlike conventional distillation paradigms, which rely on fixed step counts and layer-wise matching, this approach supports dynamic step scheduling and is model-agnostic, enabling general-purpose distillation; the reward design is also inherently extensible. Experiments demonstrate that the method achieves superior generation quality over existing distillation approaches using only 4-8 inference steps, while exhibiting strong generalization and stability across multiple benchmark datasets.
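The core optimization described above, a PPO clipped surrogate objective driven by a sparse reward measuring alignment with the teacher's output, can be sketched as follows. This is a minimal illustration, not the authors' code; the function names, the L2-based reward, and the clipping constant are our assumptions.

```python
import numpy as np

def ppo_clipped_loss(logp_new, logp_old, advantages, clip_eps=0.2):
    """PPO clipped surrogate loss (to be minimized).

    logp_new/logp_old: log-probs of the sampled denoising actions under
    the current and behavior student policies; advantages: per-episode
    advantage estimates derived from the sparse terminal reward.
    """
    ratio = np.exp(logp_new - logp_old)           # importance ratio
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1 - clip_eps, 1 + clip_eps) * advantages
    # Pessimistic (min) bound, negated so gradient descent maximizes it.
    return -np.mean(np.minimum(unclipped, clipped))

def teacher_alignment_reward(student_x0, teacher_x0):
    """Sparse terminal reward: negative L2 distance between the student's
    final denoised sample and the teacher's output for the same noise."""
    return -float(np.linalg.norm(student_x0 - teacher_x0))
```

Because the reward is only a scalar comparison of final samples, it can be swapped for any other alignment score, which is one reading of the summary's claim that the reward design is extensible.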


📝 Abstract
Distillation addresses the slow sampling problem in diffusion models by creating models with smaller size or fewer steps that approximate the behavior of high-step teachers. In this work, we propose a reinforcement learning based distillation framework for diffusion models. Instead of relying on fixed reconstruction or consistency losses, we treat the distillation process as a policy optimization problem, where the student is trained using a reward signal derived from alignment with the teacher's outputs. This RL-driven approach dynamically guides the student to explore multiple denoising paths, allowing it to take longer, optimized steps toward high-probability regions of the data distribution, rather than relying on incremental refinements. Our framework utilizes the inherent ability of diffusion models to handle larger steps and effectively manage the generative process. Experimental results show that our method achieves superior performance with significantly fewer inference steps and computational resources compared to existing distillation techniques. Additionally, the framework is model-agnostic, applicable to any type of diffusion model with suitable reward functions, providing a general optimization paradigm for efficient diffusion learning.
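The abstract's framing of the student's few-step denoising process as an RL episode can be sketched as a toy rollout. This is an illustrative assumption throughout: the "policy" below is a hand-written Gaussian contraction standing in for a learned denoiser, and the step count and noise scale are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)

def student_policy(x, sigma=0.1):
    """Toy Gaussian denoising policy: the mean is a fixed contraction
    toward 0 (a stand-in for a learned denoiser network); returns the
    sampled action (next latent) and its log-probability."""
    mean = 0.5 * x
    action = mean + sigma * rng.standard_normal(x.shape)
    logp = (-0.5 * np.sum(((action - mean) / sigma) ** 2)
            - x.size * np.log(sigma * np.sqrt(2.0 * np.pi)))
    return action, logp

def rollout(x_T, num_steps=4):
    """Run one few-step denoising episode; each denoising update is one
    RL action, and log-probs are collected for policy-gradient training."""
    x, logps = x_T, []
    for _ in range(num_steps):
        x, logp = student_policy(x)
        logps.append(logp)
    return x, logps  # terminal sample is scored against the teacher
```

Treating each large denoising update as an action is what lets the reward, computed only on the terminal sample, shape the whole trajectory rather than forcing step-by-step matching to the teacher.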
Problem

Research questions and friction points this paper is trying to address.

Accelerates diffusion model sampling via reinforcement learning distillation
Optimizes student models to take longer denoising steps efficiently
Reduces inference steps and computational cost in diffusion models
Innovation

Methods, ideas, or system contributions that make the work stand out.

Reinforcement learning guides student model distillation
Dynamic exploration of multiple denoising paths for optimization
Model-agnostic framework with reward-driven policy optimization