🤖 AI Summary
In robot imitation learning, existing flow-based correction methods suffer from action-visual observation misalignment due to iterative distillation, leading to error accumulation and policy instability. To address this, we propose Selective Flow Alignment (SFA), a novel framework that achieves strong visual-motor consistency and high multimodal fidelity while preserving single-step flow inference speed. SFA dynamically corrects generated actions using expert demonstrations without modifying the underlying model architecture—introducing only a lightweight, differentiable alignment mechanism that selectively suppresses distillation-induced biases. Experiments across simulated and real-robot manipulation tasks demonstrate that SFA significantly outperforms state-of-the-art diffusion- and flow-based baselines: it improves action accuracy by 12.7%, increases task success rate by 18.3%, and reduces inference latency by 98.2%, enabling real-time, scalable deployment.
📝 Abstract
Developing efficient and accurate visuomotor policies poses a central challenge in robotic imitation learning. While recent rectified flow approaches have advanced visuomotor policy learning, they suffer from a key limitation: After iterative distillation, generated actions may deviate from the ground-truth actions corresponding to the current visual observation, leading to accumulated error as the reflow process repeats and unstable task execution. We present Selective Flow Alignment (SeFA), an efficient and accurate visuomotor policy learning framework. SeFA resolves this challenge by a selective flow alignment strategy, which leverages expert demonstrations to selectively correct generated actions and restore consistency with observations, while preserving multimodality. This design introduces a consistency correction mechanism that ensures generated actions remain observation-aligned without sacrificing the efficiency of one-step flow inference. Extensive experiments across both simulated and real-world manipulation tasks show that SeFA Policy surpasses state-of-the-art diffusion-based and flow-based policies, achieving superior accuracy and robustness while reducing inference latency by over 98%. By unifying rectified flow efficiency with observation-consistent action generation, SeFA provides a scalable and dependable solution for real-time visuomotor policy learning. Code is available on https://github.com/RongXueZoe/SeFA.