SeFA-Policy: Fast and Accurate Visuomotor Policy Learning with Selective Flow Alignment

📅 2025-11-11

📈 Citations: 0

✨ Influential: 0

career value

206K/year

🤖 AI Summary

In robot imitation learning, existing flow-based correction methods suffer from action-visual observation misalignment due to iterative distillation, leading to error accumulation and policy instability. To address this, we propose Selective Flow Alignment (SFA), a novel framework that achieves strong visual-motor consistency and high multimodal fidelity while preserving single-step flow inference speed. SFA dynamically corrects generated actions using expert demonstrations without modifying the underlying model architecture—introducing only a lightweight, differentiable alignment mechanism that selectively suppresses distillation-induced biases. Experiments across simulated and real-robot manipulation tasks demonstrate that SFA significantly outperforms state-of-the-art diffusion- and flow-based baselines: it improves action accuracy by 12.7%, increases task success rate by 18.3%, and reduces inference latency by 98.2%, enabling real-time, scalable deployment.

Technology Category

Application Category

📝 Abstract

Developing efficient and accurate visuomotor policies poses a central challenge in robotic imitation learning. While recent rectified flow approaches have advanced visuomotor policy learning, they suffer from a key limitation: After iterative distillation, generated actions may deviate from the ground-truth actions corresponding to the current visual observation, leading to accumulated error as the reflow process repeats and unstable task execution. We present Selective Flow Alignment (SeFA), an efficient and accurate visuomotor policy learning framework. SeFA resolves this challenge by a selective flow alignment strategy, which leverages expert demonstrations to selectively correct generated actions and restore consistency with observations, while preserving multimodality. This design introduces a consistency correction mechanism that ensures generated actions remain observation-aligned without sacrificing the efficiency of one-step flow inference. Extensive experiments across both simulated and real-world manipulation tasks show that SeFA Policy surpasses state-of-the-art diffusion-based and flow-based policies, achieving superior accuracy and robustness while reducing inference latency by over 98%. By unifying rectified flow efficiency with observation-consistent action generation, SeFA provides a scalable and dependable solution for real-time visuomotor policy learning. Code is available on https://github.com/RongXueZoe/SeFA.

Problem

Research questions and friction points this paper is trying to address.

Addresses action deviation from visual observations in imitation learning

Reduces accumulated error and instability in visuomotor policy execution

Maintains multimodality while ensuring observation-consistent action generation

Innovation

Methods, ideas, or system contributions that make the work stand out.

Selective flow alignment corrects generated actions

Ensures observation-consistent action generation without multimodality loss

Unifies rectified flow efficiency with robust real-time execution

🔎 Similar Papers

Feature Extractor or Decision Maker: Rethinking the Role of Visual Encoders in Visuomotor Policies