TS-DP: Reinforcement Speculative Decoding For Temporal Adaptive Diffusion Policy Acceleration

πŸ“… 2025-12-13
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
Diffusion policies (DPs) suffer from high inference latency and computational overhead due to multi-step denoising, while static acceleration methods fail to adapt to task-specific temporal dynamics. This paper introduces the first reinforcement learning (RL)-driven, temporally adaptive speculative decoding framework for DPs. Our approach addresses the core challenges through three key innovations: (1) a Transformer-based, time-aware distilled drafter that generates high-fidelity draft trajectories; (2) an RL-based scheduler that dynamically optimizes the number of speculative steps and model parameters per timestep; and (3) a multi-step denoising quality alignment mechanism ensuring lossless accuracy relative to standard DP inference. Experiments demonstrate a 4.17Γ— inference speedup, a draft acceptance rate exceeding 94%, and real-time control at 25 Hzβ€”achieved without any performance degradation in task success or trajectory fidelity.
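The draft-then-verify loop described above can be sketched in a toy form. Here `base_step` stands in for the costly base denoiser and `draft_step` for the distilled drafter; the linear toy dynamics, the noise scale, and the acceptance tolerance are all illustrative assumptions, not the paper's actual models.

```python
import numpy as np

rng = np.random.default_rng(0)

def base_step(x, t):
    """Costly base denoising step (toy contraction toward zero)."""
    return 0.9 * x

def draft_step(x, t):
    """Cheap distilled drafter that approximates the base step."""
    return 0.9 * x + 1e-4 * rng.standard_normal(x.shape)

def speculative_denoise(x, total_steps, k, tol=1e-2):
    """Denoise with k-step speculation verified against the base model.

    In practice the verification calls are batched so k drafts cost
    roughly one base forward pass; they run sequentially here for clarity.
    """
    t, accepted, proposed = 0, 0, 0
    while t < total_steps:
        # 1) Drafter proposes up to k future denoising steps.
        drafts, xd = [], x
        for i in range(min(k, total_steps - t)):
            xd = draft_step(xd, t + i)
            drafts.append(xd)
        # 2) Base model verifies each draft; accept while it stays close.
        for d in drafts:
            proposed += 1
            xb = base_step(x, t)
            if np.linalg.norm(d - xb) <= tol:
                x, accepted, t = d, accepted + 1, t + 1  # draft accepted
            else:
                x, t = xb, t + 1  # rejected: fall back to base output
                break
    return x, accepted / max(proposed, 1)

x0 = rng.standard_normal(4)
x_final, acc_rate = speculative_denoise(x0, total_steps=20, k=4)
```

Because every rejected draft is replaced by the base model's own output, the final trajectory matches what the base denoiser would accept, which is what makes this style of acceleration lossless.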

πŸ“ Abstract
Diffusion Policy (DP) excels at embodied control but suffers from high inference latency and computational cost due to its many iterative denoising steps. The temporal complexity of embodied tasks demands a dynamic, adaptable computation scheme. Static, lossy acceleration methods such as quantization cannot handle these dynamics, whereas speculative decoding offers a lossless and adaptive, yet underexplored, alternative for DP. Applying it is non-trivial, however: the drafter must match the base model's denoising quality at lower cost under time-varying task difficulty, and the computation budget must be adjusted dynamically and interactively as that difficulty changes. In this paper, we propose Temporal-aware Reinforcement-based Speculative Diffusion Policy (TS-DP), the first framework that enables speculative decoding for DP with temporal adaptivity. First, to handle dynamic environments where task difficulty varies over time, we distill a Transformer-based drafter that imitates the base model and replaces its costly denoising calls. Second, an RL-based scheduler adapts to time-varying task difficulty by adjusting the speculative parameters, maintaining accuracy while improving efficiency. Extensive experiments across diverse embodied environments show that TS-DP achieves up to 4.17× faster inference with over 94% of drafts accepted, reaching an inference frequency of 25 Hz and enabling real-time diffusion-based control without performance degradation.
Problem

Research questions and friction points this paper is trying to address.

Accelerates diffusion policy inference for real-time embodied control
Dynamically adapts computation to time-varying task difficulty
Maintains denoising quality while reducing latency and computational cost
Innovation

Methods, ideas, or system contributions that make the work stand out.

Transformer-based drafter imitates base model for cost reduction
RL scheduler adapts speculative parameters to task difficulty
Speculative decoding enables real-time diffusion control without degradation
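The second bullet's idea of an RL scheduler adapting speculative parameters can be illustrated with a minimal epsilon-greedy bandit that picks the speculative step count k per control timestep. The reward (accepted drafts minus a verification-cost penalty), the arm set, and the toy environment are all illustrative assumptions, not the paper's actual RL formulation.

```python
import random

class SpecScheduler:
    """Hypothetical bandit-style stand-in for the paper's RL scheduler."""

    def __init__(self, ks=(1, 2, 4, 8), eps=0.2, seed=0):
        self.ks = ks
        self.eps = eps
        self.rng = random.Random(seed)
        self.q = {k: 0.0 for k in ks}  # running value estimate per k
        self.n = {k: 0 for k in ks}    # visit counts

    def choose(self):
        """Epsilon-greedy choice of the speculative step count."""
        if self.rng.random() < self.eps:
            return self.rng.choice(self.ks)
        return max(self.ks, key=lambda k: self.q[k])

    def update(self, k, accepted, cost):
        """Incremental-mean update of the value estimate for arm k."""
        reward = accepted - 0.1 * cost  # toy accuracy/latency trade-off
        self.n[k] += 1
        self.q[k] += (reward - self.q[k]) / self.n[k]

# Toy environment: the drafter stays accurate for about 3 steps, so
# speculating further adds verification cost without more accepted drafts.
sched = SpecScheduler()
for _ in range(500):
    k = sched.choose()
    sched.update(k, accepted=min(k, 3), cost=k)

best_k = max(sched.q, key=sched.q.get)
```

Under this toy reward, the scheduler settles on a moderate k: short speculation wastes the drafter's accuracy, while long speculation pays verification cost for drafts that get rejected.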
πŸ”Ž Similar Papers
Y
Ye Li
Tsinghua University
J
Jiahe Feng
Tsinghua University
Y
Yuan Meng
Tsinghua University
K
Kangye Ji
Tsinghua University
C
Chen Tang
Tsinghua University
X
Xinwan Wen
Tsinghua University
S
Shutao Xia
Tsinghua University
Z
Zhi Wang
Tsinghua University
Wenwu Zhu
Wenwu Zhu
Professor, Computer Science, Tsinghua Univerisity
Multimedia ComputingNetwork Representation Learning