AI Summary
Diffusion policies (DPs) suffer from high inference latency and computational overhead due to multi-step denoising, while static acceleration methods fail to adapt to task-specific temporal dynamics. This paper introduces the first reinforcement learning (RL)-driven, temporally adaptive speculative decoding framework for DPs. Our approach addresses the core challenges through three key innovations: (1) a Transformer-based, time-aware distilled drafter that generates high-fidelity draft trajectories; (2) an RL-based scheduler that dynamically optimizes the number of speculative steps and model parameters per timestep; and (3) a multi-step denoising quality alignment mechanism ensuring lossless accuracy relative to standard DP inference. Experiments demonstrate a 4.17× inference speedup, a draft acceptance rate exceeding 94%, and real-time control at 25 Hz, achieved without any degradation in task success or trajectory fidelity.
Abstract
Diffusion Policy (DP) excels in embodied control but suffers from high inference latency and computational cost due to its many iterative denoising steps. The temporal complexity of embodied tasks demands a dynamic, adaptable computation mode. Static, lossy acceleration methods such as quantization cannot handle such dynamic embodied tasks, while speculative decoding offers a lossless and adaptive yet underexplored alternative for DP. However, two challenges make this non-trivial: matching the base model's denoising quality at lower cost under time-varying task difficulty, and dynamically adjusting computation in response to that difficulty while interacting with the environment. In this paper, we propose Temporal-aware Reinforcement-based Speculative Diffusion Policy (TS-DP), the first framework that enables speculative decoding for DP with temporal adaptivity. First, to handle dynamic environments where task difficulty varies over time, we distill a Transformer-based drafter that imitates the base model and replaces its costly denoising calls. Second, an RL-based scheduler adapts to time-varying task difficulty by adjusting the speculative parameters, maintaining accuracy while improving efficiency. Extensive experiments across diverse embodied environments demonstrate that TS-DP achieves up to 4.17× faster inference with over 94% of drafts accepted, reaching an inference frequency of 25 Hz and enabling real-time diffusion-based control without performance degradation.
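The speculative scheme described above can be pictured as a draft-then-verify denoising loop: a cheap distilled drafter proposes a block of denoising steps, the base model verifies them, and any draft that drifts beyond a tolerance is rejected and replaced by the base model's own step. The sketch below is a minimal illustration under toy scalar dynamics; `base_denoise_step`, `drafter_step`, the block size `k`, and the tolerance `tol` are hypothetical stand-ins, not the paper's actual models or acceptance rule (and in practice the speedup comes from verifying all drafts in one batched base-model pass).

```python
def base_denoise_step(x, t):
    # Hypothetical stand-in for one costly denoising call of the base DP.
    return 0.9 * x + 0.01 * t

def drafter_step(x, t):
    # Cheap distilled drafter: same toy dynamics plus a small approximation error.
    return 0.9 * x + 0.01 * t + 1e-4

def speculative_denoise(x, num_steps, k=4, tol=1e-2):
    """Denoise for num_steps; the drafter proposes up to k steps at a time,
    and each draft is accepted only if it stays within tol of the base model."""
    t, accepted, proposed = num_steps, 0, 0
    while t > 0:
        chunk = min(k, t)
        # Draft phase: roll the cheap drafter forward for `chunk` steps.
        drafts, xd = [], x
        for i in range(chunk):
            xd = drafter_step(xd, t - i)
            drafts.append(xd)
        # Verify phase: re-run the base model (batched in a real system)
        # and stop at the first draft that drifts beyond the tolerance.
        xv, n_ok = x, 0
        for i, d in enumerate(drafts):
            xv = base_denoise_step(xv, t - i)
            if abs(d - xv) <= tol:
                xv, n_ok = d, n_ok + 1  # accept the draft state
            else:
                break  # reject: keep the base model's own step instead
        proposed += chunk
        accepted += n_ok
        # Advance by the accepted drafts, plus the one base step on rejection.
        t -= n_ok if n_ok == chunk else n_ok + 1
        x = xv
    return x, accepted / proposed
```

The returned acceptance ratio is the analogue of the paper's over-94% accepted-drafts metric; in TS-DP, the block size (here the fixed `k`) is what the RL-based scheduler tunes per timestep.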