🤖 AI Summary
To address the slow inference of flow-matching models in robot imitation learning, this paper proposes a one-step shortcut method with multi-step integration. The approach (1) decomposes the long-horizon flow-matching objective into parallel-optimizable sub-goals through a multi-step consistency loss, and (2) introduces an adaptive gradient allocation strategy that dynamically balances prediction accuracy and training stability for single-step inference. Inspired by knowledge distillation, the method preserves the expressive power of flow matching while substantially improving inference efficiency. Evaluated on two simulation benchmarks and five real-robot tasks, it achieves 3.2–5.8× faster inference than baseline flow-matching models, outperforms existing distillation and consistency-based methods in task performance, and trains more stably.
📝 Abstract
Flow-matching methods have been widely applied in robot imitation learning and have substantially advanced the field. However, they all suffer from slow inference. Distillation and consistency methods have been proposed to address this issue, but their performance still falls short of the original diffusion and flow-matching models. In this paper, we propose a one-step shortcut method with multi-step integration for robot imitation learning. First, to balance inference speed and performance, we extend the shortcut model with a multi-step consistency loss, splitting the one-step loss into multi-step losses to improve the performance of one-step inference. Second, to address the unstable optimization that arises when the multi-step loss is trained together with the original flow-matching loss, we propose an adaptive gradient allocation method that stabilizes learning. Finally, we evaluate the proposed method on two simulation benchmarks and five real-world tasks. The experimental results verify its effectiveness.
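The idea of splitting a one-step loss into multi-step losses can be illustrated with a small sketch. It is not the paper's implementation: the toy dynamics `A`, the helper names `exact_shortcut`, `multi_step_target`, and `consistency_loss`, and the generalization to `k` sub-steps are all assumptions made for illustration. A shortcut function `s(x, t, d)` is taken to predict the average velocity over the interval `[t, t + d]`, so that one jump `x + d * s(x, t, d)` should match a chain of `k` smaller jumps.

```python
import numpy as np

A = -1.5  # hypothetical toy dynamics dx/dt = A * x, chosen because it has a closed form


def exact_shortcut(x, t, d):
    """Average velocity over [t, t + d] for the toy ODE, so x + d * s lands on the exact solution."""
    return x * np.expm1(A * d) / d


def multi_step_target(shortcut_fn, x, t, d, k):
    """Chain k sub-steps of size d / k and return the mean sub-step velocity.

    With k = 2 this corresponds to the self-consistency target used by shortcut
    models; larger k is an assumed generalization, not the paper's exact loss.
    """
    sub = d / k
    velocities = []
    for i in range(k):
        v = shortcut_fn(x, t + i * sub, sub)
        velocities.append(v)
        x = x + sub * v  # jump forward using the predicted average velocity
    return np.mean(velocities, axis=0)


def consistency_loss(shortcut_fn, x, t, d, k):
    """Mean squared error between the one-step prediction and the k-step target."""
    # In real training the target would be held constant (stop-gradient).
    target = multi_step_target(shortcut_fn, x, t, d, k)
    pred = shortcut_fn(x, t, d)
    return float(np.mean((pred - target) ** 2))


x0 = np.array([1.0, -2.0, 0.5])
loss_exact = consistency_loss(exact_shortcut, x0, 0.0, 1.0, k=4)
# A model that ignores the step size d and returns the instantaneous velocity
loss_naive = consistency_loss(lambda x, t, d: A * x, x0, 0.0, 1.0, k=4)
```

In this toy setting the exact shortcut function gives a loss of essentially zero, while a model that ignores the step size does not, which is the signal such a loss uses to push a one-step prediction toward the result of finer-grained integration.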