🤖 AI Summary
This work addresses the high inference latency of generative flow and diffusion models in robotic control, which stems from iterative sampling and hinders real-time deployment. To overcome this limitation, the authors propose a self-distillation framework that requires no pretrained teacher model and enables single-step generation of high-fidelity actions for low-latency, high-accuracy visuomotor policies. The approach combines a self-consistency loss, a self-guided regularization term, and a warm-start mechanism that exploits temporal correlations between consecutive actions to shorten the generative transport path and improve action quality. Evaluated across 56 simulated manipulation tasks, the method outperforms both 100-step diffusion and flow-based policies while achieving over 100× faster inference. It also surpasses the original 10-step policy when applied to the π₀.₅ model in RoboTwin 2.0.
📝 Abstract
Generative flow and diffusion models provide the continuous, multimodal action distributions needed for high-precision robotic policies. However, their reliance on iterative sampling introduces severe inference latency, degrading control frequency and harming performance in time-sensitive manipulation. To address this problem, we propose the One-Step Flow Policy (OFP), a from-scratch self-distillation framework for high-fidelity, single-step action generation without a pre-trained teacher. OFP combines a self-consistency loss to enforce coherent transport across time intervals with a self-guided regularization that sharpens predictions toward high-density expert modes. In addition, a warm-start mechanism leverages temporal action correlations to minimize the generative transport distance. Evaluations across 56 diverse simulated manipulation tasks demonstrate that a one-step OFP achieves state-of-the-art results, outperforming 100-step diffusion and flow policies while accelerating action generation by over $100\times$. We further integrate OFP into the $\pi_{0.5}$ model on RoboTwin 2.0, where one-step OFP surpasses the original 10-step policy. These results establish OFP as a practical, scalable solution for highly accurate and low-latency robot control.
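To make the abstract's ingredients concrete, here is a minimal NumPy sketch of the three pieces it names: the standard conditional flow-matching objective, a self-consistency loss that asks the one-step predictions from two points on the same flow trajectory to agree on the endpoint, and a warm start that begins generation from the previous action instead of pure noise. This is an illustration under our own assumptions, not the paper's implementation: `policy` is a toy linear velocity predictor, and the function names and the blend weight `alpha` are hypothetical.

```python
import numpy as np

def policy(x, t, W):
    """Toy velocity predictor (linear; a stand-in for the OFP network)."""
    return x @ W + t  # hypothetical form, for illustration only

def flow_matching_loss(W, a_expert, noise, t):
    """Conditional flow matching on the linear path x_t = (1-t)*noise + t*a_expert,
    whose ground-truth velocity is v* = a_expert - noise."""
    x_t = (1 - t) * noise + t * a_expert
    v_target = a_expert - noise
    v_pred = policy(x_t, t, W)
    return np.mean((v_pred - v_target) ** 2)

def self_consistency_loss(W, x_t, t, dt):
    """One-step endpoint estimates from (x_t, t) and from the Euler-advanced
    point (x_{t+dt}, t+dt) should coincide; a real implementation would put a
    stop-gradient on one branch."""
    v_t = policy(x_t, t, W)
    x_next = x_t + dt * v_t                               # Euler step along the flow
    end_from_t = x_t + (1 - t) * v_t                      # jump straight to t = 1
    end_from_next = x_next + (1 - (t + dt)) * policy(x_next, t + dt, W)
    return np.mean((end_from_t - end_from_next) ** 2)

def warm_start(prev_action, noise, alpha=0.8):
    """Start generation near the previous action (actions are temporally
    correlated), shrinking the transport distance the one-step map must cover."""
    return alpha * prev_action + (1 - alpha) * noise
```

In this toy setup both losses are simple mean-squared errors, so a gradient step on their (weighted) sum would jointly fit the flow-matching target and tighten self-consistency; the self-guided regularization from the abstract is omitted here, as its exact form is not specified in this excerpt.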