Efficient Diffusion Planning with Temporal Diffusion

📅 2025-11-25
📈 Citations: 0
Influential: 0
🤖 AI Summary
Diffusion-based planning methods suffer from high computational overhead, low decision frequency, and susceptibility to plan–reality discrepancies due to frequent full replanning. To address these issues, we propose the Temporal Diffusion Planner (TDP). TDP distributes the denoising process across the temporal dimension, enabling progressive, stepwise refinement of a blurred long-horizon plan—eliminating the need for per-step full replanning. It further introduces a state-consistency-driven automatic replanning strategy that enhances real-world alignment while preserving planning continuity. By integrating offline reinforcement learning with dynamic temporal denoising, TDP significantly reduces computational burden. On the D4RL benchmark, TDP achieves an 11–24.8× increase in decision frequency over baseline diffusion planners, while matching or exceeding their task performance.

📝 Abstract
Diffusion planning is a promising method for learning high-performance policies from offline data. To avoid the impact of discrepancies between planning and reality on performance, previous works generate new plans at each time step. However, this incurs significant computational overhead and leads to lower decision frequencies, and frequent plan switching may also affect performance. In contrast, humans might create detailed short-term plans and more general, sometimes vague, long-term plans, and adjust them over time. Inspired by this, we propose the Temporal Diffusion Planner (TDP) which improves decision efficiency by distributing the denoising steps across the time dimension. TDP begins by generating an initial plan that becomes progressively more vague over time. At each subsequent time step, rather than generating an entirely new plan, TDP updates the previous one with a small number of denoising steps. This reduces the average number of denoising steps, improving decision efficiency. Additionally, we introduce an automated replanning mechanism to prevent significant deviations between the plan and reality. Experiments on D4RL show that, compared to previous works that generate new plans every time step, TDP improves the decision-making frequency by 11-24.8 times while achieving higher or comparable performance.
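The abstract's core mechanism, a plan whose near-term entries are sharp while the far horizon stays vague, can be illustrated with a toy noise schedule. This is a hypothetical sketch, not the authors' code: the plan length, the number of denoising levels, and the one-level-of-refinement-per-step rule are all assumptions for illustration.

```python
# Hypothetical sketch of a TDP-style temporal noise schedule (assumed
# details, not the paper's implementation). Each plan entry carries a
# residual noise level that grows with lookahead; at every environment
# step the window shifts forward, a fresh fully-noisy entry is appended,
# and each entry is refined by one denoising level.

HORIZON = 8   # plan length (assumed)
LEVELS = 8    # denoising levels; pure noise = LEVELS, fully denoised = 0

def initial_schedule():
    # Entry t starts at noise level t: the first action is fully
    # denoised, the last entry is almost pure noise.
    return list(range(HORIZON))

def step(schedule):
    # Drop the executed entry, append a pure-noise entry for the new
    # horizon slot, then apply one denoising step to every entry.
    shifted = schedule[1:] + [LEVELS]
    return [max(level - 1, 0) for level in shifted]

if __name__ == "__main__":
    s = initial_schedule()
    print(s)          # [0, 1, 2, 3, 4, 5, 6, 7]
    print(step(s))    # [0, 1, 2, 3, 4, 5, 6, 7]  (steady state)
```

Under these assumptions the schedule is self-sustaining: each environment step costs only one denoising level per entry, rather than re-denoising the entire plan from pure noise.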
Problem

Research questions and friction points this paper is trying to address.

High computational overhead of generating a full new plan at every time step
Low decision frequency caused by repeated full denoising in diffusion planning
Performance loss from plan–reality discrepancies and frequent plan switching
Innovation

Methods, ideas, or system contributions that make the work stand out.

Distributes denoising steps across the time dimension, so the initial plan grows progressively vaguer over the horizon
Updates the previous plan with a few denoising steps per time step instead of full replanning
State-consistency-driven automatic replanning corrects significant plan–reality deviations
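The decision-frequency gain these ideas target can be seen with back-of-envelope arithmetic. The numbers below are assumptions chosen for illustration, not figures from the paper.

```python
# Back-of-envelope comparison (assumed numbers, not from the paper) of the
# denoising budget of per-step full replanning versus TDP-style refinement.

FULL_STEPS = 100   # denoising steps to generate a plan from pure noise (assumed)
TDP_STEPS = 4      # refinement steps TDP spends per environment step (assumed)

def baseline_cost(env_steps: int) -> int:
    # Prior diffusion planners: denoise an entirely new plan every step.
    return env_steps * FULL_STEPS

def tdp_cost(env_steps: int) -> int:
    # TDP: one full generation up front, then a few refinement steps
    # per environment step thereafter.
    return FULL_STEPS + (env_steps - 1) * TDP_STEPS

if __name__ == "__main__":
    n = 1000
    print(baseline_cost(n) / tdp_cost(n))  # ≈ 24x with these assumed numbers
```

With these (assumed) constants the average denoising cost per decision drops by roughly an order of magnitude, which is the kind of gap behind the 11–24.8× decision-frequency improvement reported on D4RL.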
Authors

Jiaming Guo · Institute of Computing Technology, Chinese Academy of Sciences · Artificial Intelligence, Reinforcement Learning
Rui Zhang · SKL of Processors, Institute of Computing Technology, CAS
Zerun Li · SKL of Processors, Institute of Computing Technology, CAS
Yunkai Gao · Institute of AI for Industries
Shaohui Peng · Institute of Software, Chinese Academy of Sciences · Embodied AI, Reinforcement Learning
Siming Lan · Institute of AI for Industries
Xing Hu · SKL of Processors, Institute of Computing Technology, CAS
Zidong Du · SKL of Processors, Institute of Computing Technology, CAS
Xishan Zhang · Institute of Computing Technology, Chinese Academy of Sciences
Ling Li · Intelligent Software Research Center, Institute of Software, CAS