AI Summary
Diffusion policies (DPs) suffer from high inference latency and computational overhead due to multi-step denoising, while static acceleration methods fail to adapt to task-specific temporal dynamics. This paper introduces the first reinforcement learning (RL)-driven, temporally adaptive speculative decoding framework for DPs. Our approach addresses the core challenges through three key innovations: (1) a Transformer-based, time-aware distilled drafter that generates high-fidelity draft trajectories; (2) an RL-based scheduler that dynamically optimizes the number of speculative steps and model parameters per timestep; and (3) a multi-step denoising quality alignment mechanism ensuring lossless accuracy relative to standard DP inference. Experiments demonstrate a 4.17× inference speedup, a draft acceptance rate exceeding 94%, and real-time control at 25 Hz, achieved without any degradation in task success or trajectory fidelity.
Abstract
Diffusion Policy (DP) excels in embodied control but suffers from high inference latency and computational cost due to its many iterative denoising steps. The temporal complexity of embodied tasks demands a dynamic, adaptable computation mode. Static, lossy acceleration methods such as quantization cannot handle such dynamic embodied tasks, while speculative decoding offers a lossless and adaptive yet underexplored alternative for DP. However, two challenges make this non-trivial: matching the base model's denoising quality at lower cost under time-varying task difficulty, and dynamically adjusting computation in response to that difficulty while interacting with the environment. In this paper, we propose Temporal-aware Reinforcement-based Speculative Diffusion Policy (TS-DP), the first framework that enables speculative decoding for DP with temporal adaptivity. First, to handle dynamic environments where task difficulty varies over time, we distill a Transformer-based drafter that imitates the base model and replaces its costly denoising calls. Second, an RL-based scheduler adapts to time-varying task difficulty by adjusting the speculative parameters, maintaining accuracy while improving efficiency. Extensive experiments across diverse embodied environments demonstrate that TS-DP achieves up to 4.17× faster inference with over 94% of drafts accepted, reaching an inference frequency of 25 Hz and enabling real-time diffusion-based control without performance degradation.
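The speculative scheme described above can be pictured as a draft-then-verify denoising loop: a cheap distilled drafter proposes a block of denoising steps, the base model verifies them, and any draft that drifts beyond a tolerance is rejected and replaced by the base model's own step. The sketch below is a minimal illustration under toy scalar dynamics; `base_denoise_step`, `drafter_step`, the block size `k`, and the tolerance `tol` are hypothetical stand-ins, not the paper's actual models or acceptance rule (and in practice the speedup comes from verifying all drafts in one batched base-model pass).

```python
def base_denoise_step(x, t):
    # Hypothetical stand-in for one costly denoising call of the base DP.
    return 0.9 * x + 0.01 * t

def drafter_step(x, t):
    # Cheap distilled drafter: same toy dynamics plus a small approximation error.
    return 0.9 * x + 0.01 * t + 1e-4

def speculative_denoise(x, num_steps, k=4, tol=1e-2):
    """Denoise for num_steps; the drafter proposes up to k steps at a time,
    and each draft is accepted only if it stays within tol of the base model."""
    t, accepted, proposed = num_steps, 0, 0
    while t > 0:
        chunk = min(k, t)
        # Draft phase: roll the cheap drafter forward for `chunk` steps.
        drafts, xd = [], x
        for i in range(chunk):
            xd = drafter_step(xd, t - i)
            drafts.append(xd)
        # Verify phase: re-run the base model (batched in a real system)
        # and stop at the first draft that drifts beyond the tolerance.
        xv, n_ok = x, 0
        for i, d in enumerate(drafts):
            xv = base_denoise_step(xv, t - i)
            if abs(d - xv) <= tol:
                xv, n_ok = d, n_ok + 1  # accept the draft state
            else:
                break  # reject: keep the base model's own step instead
        proposed += chunk
        accepted += n_ok
        # Advance by the accepted drafts, plus the one base step on rejection.
        t -= n_ok if n_ok == chunk else n_ok + 1
        x = xv
    return x, accepted / proposed
```

The returned acceptance ratio is the analogue of the paper's over-94% accepted-drafts metric; in TS-DP, the block size (here the fixed `k`) is what the RL-based scheduler tunes per timestep.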