Enhanced DACER Algorithm with High Diffusion Efficiency

📅 2025-05-29
📈 Citations: 0
Influential: 0
🤖 AI Summary
Diffusion models in online reinforcement learning face a fundamental trade-off: many denoising steps hurt training and inference efficiency, while fewer steps degrade policy performance. To address this, we propose a time-aware, efficient diffusion policy framework guided by Q-gradient fields: (1) we introduce the gradient of the Q-function as an auxiliary optimization objective during denoising, decoupling noise removal from action refinement; and (2) we design a time-adaptive weighting mechanism to alleviate the strong coupling between diffusion step count and policy quality. Integrated within an Actor-Critic architecture, our method enables end-to-end policy gradient training. Evaluated on the MuJoCo benchmark, it achieves state-of-the-art performance with only five diffusion steps, outperforming DACER significantly, while delivering substantial gains in both training and inference efficiency, particularly on multimodal tasks.

📝 Abstract
Due to their expressive capacity, diffusion models have shown great promise in offline RL and imitation learning. Diffusion Actor-Critic with Entropy Regulator (DACER) extended this capability to online RL by using the reverse diffusion process as a policy approximator, trained end-to-end with policy gradient methods, achieving strong performance. However, this comes at the cost of requiring many diffusion steps, which significantly hampers training efficiency, while directly reducing the steps leads to noticeable performance degradation. Critically, the lack of inference efficiency becomes a significant bottleneck for applying diffusion policies in real-time online RL settings. To improve training and inference efficiency while maintaining or even enhancing performance, we propose a Q-gradient field objective as an auxiliary optimization target to guide the denoising process at each diffusion step. Nonetheless, we observe that the independence of the Q-gradient field from the diffusion time step negatively impacts the performance of the diffusion policy. To address this, we introduce a temporal weighting mechanism that enables the model to efficiently eliminate large-scale noise in the early stages and refine actions in the later stages. Experimental results on MuJoCo benchmarks and several multimodal tasks demonstrate that the DACER2 algorithm achieves state-of-the-art performance in most MuJoCo control tasks with only five diffusion steps, while also exhibiting stronger multimodality compared to DACER.
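The guided denoising described in the abstract can be sketched as a reverse-diffusion loop whose update combines a denoising direction with a time-weighted Q-gradient term. The sketch below is purely illustrative: the quadratic Q-function, the toy denoiser, and the quadratic weighting schedule are stand-ins for the paper's learned critic, diffusion network, and actual temporal weighting, which are not specified here.

```python
import numpy as np

def q_gradient(action):
    # Hypothetical stand-in for grad_a Q(s, a); in DACER2 a learned critic
    # network would supply this. Here Q is a toy quadratic peaked at a = 0.
    return -2.0 * action

def time_weight(t, T):
    # Hypothetical time-adaptive weight: near zero at early (high-noise) steps,
    # growing toward the final steps so the Q-gradient refines the action late.
    # The paper's exact schedule may differ.
    return (1.0 - t / T) ** 2

def guided_denoise(action, T=5, step_size=0.1):
    """Reverse diffusion over T steps with Q-gradient guidance (illustrative)."""
    for t in range(T, 0, -1):
        denoise = -0.5 * action                 # toy denoiser pulling toward the data mean
        guide = time_weight(t, T) * q_gradient(action)
        action = action + step_size * (denoise + guide)
    return action

a0 = np.array([2.0, -1.5])                      # initial noisy action sample
print(guided_denoise(a0))                       # action moves toward the Q-maximum
```

With five steps (matching the paper's step count), early iterations are dominated by the denoiser while later iterations are increasingly steered by the Q-gradient, which is the decoupling of noise removal from action refinement that the abstract describes.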
Problem

Research questions and friction points this paper is trying to address.

Improving training efficiency of diffusion policies in online RL
Reducing diffusion steps without performance degradation
Enhancing inference efficiency for real-time RL applications
Innovation

Methods, ideas, or system contributions that make the work stand out.

Q-gradient field objective for denoising guidance
Temporal weighting mechanism for noise reduction
Five-step diffusion for efficient online RL
Authors
Yinuo Wang (Tsinghua University; LLM, Reinforcement Learning, Autonomous Driving, Diffusion Model)
Mining Tan (Institute of Automation, Chinese Academy of Sciences; Computer Vision, Multimedia, Generative AI)
Wenjun Zou (School of Vehicle and Mobility & College of AI, Tsinghua University)
Haotian Lin (Carnegie Mellon University; Robotics, Autonomous Driving, Reinforcement Learning)
Xujie Song (School of Vehicle and Mobility & College of AI, Tsinghua University)
Wenxuan Wang (School of Vehicle and Mobility & College of AI, Tsinghua University)
Tong Liu (School of Vehicle and Mobility & College of AI, Tsinghua University)
Likun Wang (School of Vehicle and Mobility & College of AI, Tsinghua University)
Guojian Zhan (School of Vehicle and Mobility & College of AI, Tsinghua University)
Tianze Zhu (School of Vehicle and Mobility & College of AI, Tsinghua University)
Shiqi Liu (School of Vehicle and Mobility & College of AI, Tsinghua University)
Jingliang Duan (University of Science and Technology Beijing)
Shengbo Eben Li (School of Vehicle and Mobility & College of AI, Tsinghua University)