🤖 AI Summary
Diffusion-based policies struggle to meet the real-time demands of visuomotor control due to their multi-step denoising process, and existing static acceleration methods fail to adapt to dynamic environmental changes. To address this, this work proposes Sparse ActionGen (SAG), a framework that employs an observation-conditioned diffusion pruner to dynamically identify redundant computations for pruning. SAG further introduces a cross-timestep and cross-block "one-to-many" zig-zag activation reuse strategy, enabling adaptive sparse action generation. By integrating real-time pruning, cached activation reuse, and a parameter-efficient architecture, SAG achieves up to 4× acceleration across multiple robotic benchmarks while preserving the original policy performance.
📝 Abstract
Diffusion Policy has dominated action generation due to its strong capability to model multi-modal action distributions, but its multi-step denoising process makes it impractical for real-time visuomotor control. Existing caching-based acceleration methods typically rely on $\textit{static}$ schedules that fail to adapt to the $\textit{dynamics}$ of robot-environment interactions, thereby leading to suboptimal performance. In this paper, we propose $\underline{\textbf{S}}$parse $\underline{\textbf{A}}$ction$\underline{\textbf{G}}$en ($\textbf{SAG}$) for extremely sparse action generation. To accommodate iterative interactions, SAG customizes a rollout-adaptive prune-then-reuse mechanism that first identifies prunable computations globally and then substitutes them with cached activations during action diffusion. To capture rollout dynamics, SAG parameterizes an observation-conditioned diffusion pruner for environment-aware adaptation and instantiates it with a highly parameter- and inference-efficient design for real-time prediction. Furthermore, SAG introduces a one-for-all reuse strategy that reuses activations across both timesteps and blocks in a zig-zag manner, minimizing global redundancy. Extensive experiments on multiple robotic benchmarks demonstrate that SAG achieves up to 4$\times$ generation speedup without sacrificing performance. Project Page: https://sparse-actiongen.github.io/.
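To make the prune-then-reuse idea concrete, here is a minimal toy sketch (not the paper's implementation) of the general pattern: a hypothetical observation-conditioned pruner scores every (timestep, block) computation in the denoising loop, and skipped computations are replaced by the activation cached from the most recent timestep that actually ran that block. All function and variable names (`pruner`, `block_forward`, `sparse_denoise`, `keep_ratio`) are illustrative assumptions, and random scores stand in for the learned pruner.

```python
import numpy as np

def block_forward(x, block_id):
    # Stand-in for one Transformer block of the diffusion-policy backbone.
    rng = np.random.default_rng(block_id)
    w = rng.standard_normal((x.shape[-1], x.shape[-1])) * 0.1
    return np.tanh(x @ w)

def pruner(observation, num_steps, num_blocks, keep_ratio=0.25):
    # Hypothetical observation-conditioned pruner: scores every
    # (timestep, block) computation and keeps only the top fraction.
    # A learned network would produce these scores; random values stand in.
    rng = np.random.default_rng(int(observation.sum() * 1e3) % 2**32)
    scores = rng.random((num_steps, num_blocks))
    threshold = np.quantile(scores, 1.0 - keep_ratio)
    mask = scores >= threshold
    mask[0, :] = True  # always compute the first denoising step fully
    return mask

def sparse_denoise(x, observation, num_steps=10, num_blocks=4):
    # Prune-then-reuse: a skipped (timestep, block) computation is replaced
    # by the activation cached from the last timestep that ran that block,
    # so one computed activation can serve many later timesteps.
    mask = pruner(observation, num_steps, num_blocks)
    cache = [None] * num_blocks
    for t in range(num_steps):
        for b in range(num_blocks):
            if mask[t, b] or cache[b] is None:
                x = block_forward(x, b)  # compute and refresh the cache
                cache[b] = x
            else:
                x = cache[b]             # reuse cached activation
    return x, mask

obs = np.ones(8)              # toy observation vector
x0 = np.zeros((1, 16))        # toy noisy action latent
action, mask = sparse_denoise(x0, obs)
print(action.shape, round(mask.mean(), 2))
```

With `keep_ratio=0.25` and the first step forced dense, roughly three quarters of the block evaluations are skipped, which is where the reported up-to-4× speedup would come from in such a scheme; the actual SAG pruner and reuse schedule are learned rather than random.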