🤖 AI Summary
In imitation learning, slow human demonstrations limit the execution efficiency of visuomotor policies. Method: We propose a self-supervised demonstration acceleration method based on action entropy estimation. For the first time, we leverage per-frame action entropy, computed from a generative policy (e.g., ACT or Diffusion Policy), to quantify uncertainty at each demonstration frame, enabling entropy-aware non-uniform temporal compression: low-entropy (precision-critical) frames are preserved, while high-entropy (casual, redundant) frames are downsampled more aggressively. We further introduce a closed-loop framework comprising piecewise adaptive downsampling, self-supervised relabeling, and policy retraining. Results: Experiments show up to a 3× speedup in policy execution, with maintained or improved task success rates and a significantly shorter decision-making horizon, enhancing real-time robustness. Our core contribution is the use of action entropy as an interpretable, temporally grounded compression signal, jointly optimizing accuracy and efficiency.
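The closed-loop framework above can be sketched as a small pipeline. This is a minimal illustration, not the paper's implementation: all callables (`train_policy`, `estimate_entropy`, `downsample`, `relabel`) are hypothetical placeholders for the steps the summary names.

```python
def demospeedup(demos, train_policy, estimate_entropy, downsample, relabel):
    """Illustrative sketch of the closed-loop pipeline; every callable
    here is a placeholder, not the paper's actual API."""
    policy = train_policy(demos)                    # 1. train on normal-speed demos
    entropies = [estimate_entropy(policy, d) for d in demos]   # 2. per-frame entropy
    fast = [downsample(d, h) for d, h in zip(demos, entropies)]  # 3. adaptive downsampling
    fast = [relabel(d) for d in fast]               # 4. self-supervised relabeling
    return train_policy(fast)                       # 5. retrain on accelerated demos
```

The loop is closed in the sense that the policy trained in step 1 supplies the entropy signal that decides how its own training data is compressed for step 5.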
📝 Abstract
Imitation learning has shown great promise in robotic manipulation, but policy execution is often unsatisfactorily slow because demonstrations collected by human operators are commonly tardy. In this work, we present DemoSpeedup, a self-supervised method to accelerate visuomotor policy execution via entropy-guided demonstration acceleration. DemoSpeedup starts by training an arbitrary generative policy (e.g., ACT or Diffusion Policy) on normal-speed demonstrations; this policy serves as a per-frame action entropy estimator. The key insight is that frames with lower action entropy estimates call for more consistent policy behaviors, which often indicates a demand for higher-precision operations. In contrast, frames with higher entropy estimates correspond to more casual sections, and can therefore be accelerated more safely. Thus, we segment the original demonstrations according to the estimated entropy and accelerate them by downsampling at rates that increase with the entropy values. Trained on the sped-up demonstrations, the resulting policies execute up to 3 times faster while maintaining task completion performance. Interestingly, these policies can even achieve higher success rates than those trained on normal-speed demonstrations, owing to the benefits of reduced decision-making horizons.
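The downsampling rule ("rates that increase with the entropy values") can be sketched as follows. This is a toy instance under assumed details: entropy bins are formed by quantiles and each bin maps to a fixed keep-stride, which is one simple way to make the stride grow with entropy; the paper's exact segmentation may differ.

```python
import numpy as np

def entropy_guided_downsample(entropies, strides=(1, 2, 3)):
    """Return indices of demonstration frames to keep.

    Frames are binned by entropy quantile; the keep-stride grows with
    entropy, so high-entropy (casual) segments are accelerated the most
    while low-entropy (precision-critical) segments are kept densely.
    The quantile binning and stride values are illustrative assumptions.
    """
    entropies = np.asarray(entropies, dtype=float)
    # Interior quantile edges splitting frames into len(strides) entropy bins.
    edges = np.quantile(entropies, np.linspace(0, 1, len(strides) + 1)[1:-1])
    bins = np.digitize(entropies, edges)  # 0 = lowest-entropy bin
    kept, i = [], 0
    while i < len(entropies):
        kept.append(i)
        i += strides[bins[i]]  # low entropy -> small stride (keep more frames)
    return kept
```

For example, with linearly increasing entropies over 12 frames, the early (low-entropy) frames are kept at stride 1 and the final (high-entropy) frames at stride 3, compressing the casual tail of the trajectory.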