FreqPolicy: Efficient Flow-based Visuomotor Policy via Frequency Consistency

📅 2025-06-10
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing generative visuomotor policies rely on multi-step sampling, resulting in high inference latency that falls short of real-time robotic manipulation requirements. The core challenge is enforcing strong temporal continuity and structural consistency in action trajectories, properties that acceleration techniques developed for image generation do not directly transfer. This paper proposes the first frequency-consistent modeling paradigm for visuomotor policies, introducing temporal frequency-domain consistency constraints and an adaptive consistency loss to explicitly model trajectory continuity. The method builds on flow-based models, combining frequency-domain feature alignment, an adaptively weighted loss, and end-to-end Vision-Language-Action (VLA) integration. Evaluated on 53 simulated tasks, it surpasses state-of-the-art single-step action generators. When integrated into a VLA framework, it accelerates inference on the 40 Libero tasks with no performance degradation, and on physical hardware it runs at 93.5 Hz.

📝 Abstract
Generative modeling-based visuomotor policies have been widely adopted in robotic manipulation owing to their ability to model multimodal action distributions. However, the high inference cost of multi-step sampling limits their applicability in real-time robotic systems. To address this issue, existing approaches accelerate the sampling process in generative modeling-based visuomotor policies by adapting acceleration techniques originally developed for image generation. Despite this progress, a major distinction remains: image generation typically produces independent samples without temporal dependencies, whereas robotic manipulation generates time-series action trajectories that require continuity and temporal coherence. To effectively exploit temporal information in robotic manipulation, we propose FreqPolicy, the first approach to impose frequency consistency constraints on flow-based visuomotor policies. Our work enables the action model to capture temporal structure effectively while supporting efficient, high-quality one-step action generation. We introduce a frequency consistency constraint that enforces alignment of frequency-domain action features across different timesteps along the flow, thereby promoting convergence of one-step action generation toward the target distribution. In addition, we design an adaptive consistency loss to capture the structural temporal variations inherent in robotic manipulation tasks. We assess FreqPolicy on 53 tasks across 3 simulation benchmarks, demonstrating its superiority over existing one-step action generators. We further integrate FreqPolicy into a vision-language-action (VLA) model and achieve acceleration without performance degradation on the 40 tasks of Libero. We also demonstrate efficiency and effectiveness in real-world robotic scenarios with an inference frequency of 93.5 Hz. The code will be publicly available.
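The abstract's core idea, aligning frequency-domain features of action trajectories predicted at different timesteps along the flow, can be illustrated with a minimal sketch. The function below is a hypothetical stand-in, not the paper's implementation: it compares the real-FFT spectra of two predicted action chunks along the time axis, with optional per-frequency weights as a crude analogue of the adaptive consistency loss.

```python
import numpy as np

def frequency_consistency_loss(actions_a, actions_b, weights=None):
    """Illustrative frequency-domain consistency loss (hypothetical sketch).

    actions_a, actions_b: (horizon, action_dim) action chunks predicted at
        two different timesteps along the flow.
    weights: optional (horizon // 2 + 1,) per-frequency-bin weights, a
        stand-in for an adaptive weighting scheme.
    """
    # Real FFT along the time axis of each action trajectory.
    spec_a = np.fft.rfft(actions_a, axis=0)
    spec_b = np.fft.rfft(actions_b, axis=0)
    # Squared magnitude of the spectral difference per frequency bin.
    diff = np.abs(spec_a - spec_b) ** 2
    if weights is not None:
        diff = diff * weights[:, None]
    return diff.mean()

# Identical trajectories incur zero loss.
a = np.sin(np.linspace(0, 2 * np.pi, 16))[:, None]
print(frequency_consistency_loss(a, a))  # → 0.0
```

Penalizing spectral rather than pointwise differences emphasizes the trajectory's overall temporal structure (smoothness, dominant motion frequencies) instead of per-step values, which is the intuition behind the paper's frequency consistency constraint.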
Problem

Research questions and friction points this paper is trying to address.

Reduce high inference cost in visuomotor policies
Ensure temporal coherence in action trajectories
Achieve efficient one-step action generation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Frequency consistency constraints for flow-based policies
Adaptive consistency loss for temporal variations
Efficient one-step action generation at 93.5Hz
Authors

Yifei Su (Institute of Automation, Chinese Academy of Sciences; Embodied AI, Multimodal Learning)
Ning Liu (Beijing Innovation Center of Humanoid Robotics)
Dong Chen (Beijing Innovation Center of Humanoid Robotics)
Zhen Zhao (Beijing Innovation Center of Humanoid Robotics)
Kun Wu (Beijing Innovation Center of Humanoid Robotics)
Meng Li (Beijing Innovation Center of Humanoid Robotics)
Zhiyuan Xu (Beijing Innovation Center of Humanoid Robotics)
Zhengping Che (X-Humanoid; Embodied AI, Deep Learning)
Jian Tang (Beijing Innovation Center of Humanoid Robotics)