D3P: Dynamic Denoising Diffusion Policy via Reinforcement Learning

📅 2025-08-08
📈 Citations: 0
Influential: 0
🤖 AI Summary
Diffusion policies excel at modeling complex action distributions for robot visuomotor control but suffer from poor real-time performance due to fixed-step iterative denoising. This work identifies task-relevant heterogeneity in action criticality—distinguishing “critical” from “routine” actions—and proposes a state-aware lightweight adapter that dynamically allocates denoising steps conditioned on the current state, marking the first such adaptive step scheduling mechanism for diffusion-based control. Furthermore, we introduce a reinforcement learning-based joint optimization framework that co-trains the diffusion policy and the adapter end-to-end. Evaluated in simulation and on physical robots, our method maintains task success rates while achieving 2.2× inference speedup in simulation and 1.9× on real hardware. This significantly enhances the deployment efficiency and practical applicability of diffusion policies in real-world robotic systems.

📝 Abstract
Diffusion policies excel at learning complex action distributions for robotic visuomotor tasks, yet their iterative denoising process poses a major bottleneck for real-time deployment. Existing acceleration methods apply a fixed number of denoising steps per action, implicitly treating all actions as equally important. However, our experiments reveal that robotic tasks often contain a mix of crucial and routine actions, which differ in their impact on task success. Motivated by this finding, we propose Dynamic Denoising Diffusion Policy (D3P), a diffusion-based policy that adaptively allocates denoising steps across actions at test time. D3P uses a lightweight, state-aware adaptor to allocate the optimal number of denoising steps for each action. We jointly optimize the adaptor and the base diffusion policy via reinforcement learning to balance task performance and inference efficiency. On simulated tasks, D3P achieves an average 2.2× inference speed-up over baselines without degrading success. Furthermore, we demonstrate D3P's effectiveness on a physical robot, achieving a 1.9× acceleration over the baseline.
Problem

Research questions and friction points this paper is trying to address.

Optimize denoising steps for robotic actions dynamically
Balance task performance and inference efficiency in diffusion policies
Accelerate real-time deployment of diffusion-based robotic policies
Innovation

Methods, ideas, or system contributions that make the work stand out.

Dynamic denoising step allocation per action
State-aware adaptor for optimal denoising
Reinforcement learning for joint optimization
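The core idea above, a state-aware adaptor that chooses a per-action denoising budget at test time, can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the scoring function, the stand-in denoiser, and all names (`adaptor_steps`, `denoise_action`, `max_steps`) are hypothetical; in D3P both the adaptor and the diffusion policy are learned networks co-trained with reinforcement learning.

```python
import numpy as np

def adaptor_steps(state, max_steps=10):
    # Hypothetical state-aware adaptor: maps a state vector to a step
    # budget in {1, ..., max_steps}. A fixed sigmoid "criticality" score
    # stands in for the paper's learned lightweight network.
    criticality = 1.0 / (1.0 + np.exp(-np.sum(state)))  # in (0, 1)
    return max(1, int(round(criticality * max_steps)))

def denoise_action(noisy_action, steps):
    # Stand-in denoiser: each step shrinks the action toward a mode,
    # mimicking the iterative refinement of a diffusion policy.
    action = noisy_action.copy()
    for _ in range(steps):
        action -= 0.1 * action  # placeholder for a learned denoising update
    return action

# A "routine" state gets few steps; a "critical" state gets many.
routine_state = np.array([-2.0, -1.0])
critical_state = np.array([2.0, 3.0])

k_routine = adaptor_steps(routine_state)    # small budget
k_critical = adaptor_steps(critical_state)  # large budget
action = denoise_action(np.random.randn(4), k_critical)
```

The speed-up reported by the paper comes from exactly this asymmetry: routine actions exit the denoising loop early, while critical actions keep the full refinement budget.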
Shu-Ang Yu
Ph.D. student in Electrical Engineering, Tsinghua University
reinforcement learning, robotics
Feng Gao
Tsinghua University
Yi Wu
Tsinghua University and Shanghai Qi Zhi Institute
Chao Yu
Tsinghua University
Yu Wang
Tsinghua University