D3P: Dynamic Denoising Diffusion Policy via Reinforcement Learning

📅 2025-08-08
📈 Citations: 0
Influential: 0
🤖 AI Summary
Diffusion policies excel at modeling complex action distributions for robot visuomotor control but suffer from poor real-time performance due to fixed-step iterative denoising. This work identifies task-relevant heterogeneity in action criticality—distinguishing “critical” from “routine” actions—and proposes a state-aware lightweight adapter that dynamically allocates denoising steps conditioned on the current state, marking the first such adaptive step scheduling mechanism for diffusion-based control. Furthermore, we introduce a reinforcement learning-based joint optimization framework that co-trains the diffusion policy and the adapter end-to-end. Evaluated in simulation and on physical robots, our method maintains task success rates while achieving 2.2× inference speedup in simulation and 1.9× on real hardware. This significantly enhances the deployment efficiency and practical applicability of diffusion policies in real-world robotic systems.

📝 Abstract
Diffusion policies excel at learning complex action distributions for robotic visuomotor tasks, yet their iterative denoising process poses a major bottleneck for real-time deployment. Existing acceleration methods apply a fixed number of denoising steps per action, implicitly treating all actions as equally important. However, our experiments reveal that robotic tasks often contain a mix of crucial and routine actions, which differ in their impact on task success. Motivated by this finding, we propose Dynamic Denoising Diffusion Policy (D3P), a diffusion-based policy that adaptively allocates denoising steps across actions at test time. D3P uses a lightweight, state-aware adaptor to allocate the optimal number of denoising steps for each action. We jointly optimize the adaptor and the base diffusion policy via reinforcement learning to balance task performance and inference efficiency. On simulated tasks, D3P achieves an average 2.2× inference speed-up over baselines without degrading success. Furthermore, we demonstrate D3P's effectiveness on a physical robot, achieving a 1.9× acceleration over the baseline.
Problem

Research questions and friction points this paper is trying to address.

Optimize denoising steps for robotic actions dynamically
Balance task performance and inference efficiency in diffusion policies
Accelerate real-time deployment of diffusion-based robotic policies
Innovation

Methods, ideas, or system contributions that make the work stand out.

Dynamic denoising step allocation per action
State-aware adaptor for optimal denoising
Reinforcement learning for joint optimization
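The core idea above, a state-aware adaptor that chooses a per-action denoising budget at test time, can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the scoring function, the stand-in denoiser, and all names (`adaptor_steps`, `denoise_action`, `max_steps`) are hypothetical; in D3P both the adaptor and the diffusion policy are learned networks co-trained with reinforcement learning.

```python
import numpy as np

def adaptor_steps(state, max_steps=10):
    # Hypothetical state-aware adaptor: maps a state vector to a step
    # budget in {1, ..., max_steps}. A fixed sigmoid "criticality" score
    # stands in for the paper's learned lightweight network.
    criticality = 1.0 / (1.0 + np.exp(-np.sum(state)))  # in (0, 1)
    return max(1, int(round(criticality * max_steps)))

def denoise_action(noisy_action, steps):
    # Stand-in denoiser: each step shrinks the action toward a mode,
    # mimicking the iterative refinement of a diffusion policy.
    action = noisy_action.copy()
    for _ in range(steps):
        action -= 0.1 * action  # placeholder for a learned denoising update
    return action

# A "routine" state gets few steps; a "critical" state gets many.
routine_state = np.array([-2.0, -1.0])
critical_state = np.array([2.0, 3.0])

k_routine = adaptor_steps(routine_state)    # small budget
k_critical = adaptor_steps(critical_state)  # large budget
action = denoise_action(np.random.randn(4), k_critical)
```

The speed-up reported by the paper comes from exactly this asymmetry: routine actions exit the denoising loop early, while critical actions keep the full refinement budget.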
Shu-Ang Yu
Ph.D. student in Electrical Engineering, Tsinghua University
reinforcement learning, robotics
Feng Gao
Tsinghua University
Yi Wu
Tsinghua University and Shanghai Qi Zhi Institute
Chao Yu
Tsinghua University
Yu Wang
Tsinghua University