HybridFlow: A Two-Step Generative Policy for Robotic Manipulation

2026-02-14
AI Summary
This work addresses the trade-off between inference latency and motion accuracy that limits real-time, high-precision robotic manipulation. It proposes a three-stage generation framework that integrates MeanFlow, ReNoise, and ReFlow (presented as the first combination of MeanFlow and ReFlow). The framework generates high-fidelity action sequences with only two network function evaluations (2-NFE), which keeps computational overhead low. In real-world robot experiments, the method reduces inference time from 152 ms to 19 ms (an ~8× speedup, reaching ~52 Hz) and improves success rates by 15–25% over a 16-step Diffusion Policy. It achieves 70.0% success on unseen-color grasping and 66.3% on deformable object folding, demonstrating its effectiveness in complex, real-time interactive scenarios.
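The reported latency figures are internally consistent. This check assumes that the ~52 Hz rate is simply the reciprocal of the 19 ms per-inference latency, meaning one action-generation call per control cycle. The source does not state this explicitly.

```latex
$$
\frac{152\ \text{ms}}{19\ \text{ms}} = 8 \quad (\text{speedup}), \qquad
\frac{1000\ \text{ms/s}}{19\ \text{ms}} \approx 52.6\ \text{Hz}
$$
```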

Abstract
Limited by inference latency, existing robot manipulation policies lack sufficient capability for real-time interaction with the environment. Faster generation methods such as flow matching are gradually replacing diffusion methods, but researchers are pursuing even faster generation suitable for interactive robot control. MeanFlow, a one-step variant of flow matching, has shown strong potential in image generation, but its precision in action generation does not meet the stringent requirements of robotic manipulation. We therefore propose HybridFlow, a 3-stage method that uses only two network function evaluations (2-NFE): Global Jump in MeanFlow mode, ReNoise for distribution alignment, and Local Refine in ReFlow mode. HybridFlow balances inference speed and generation quality. It exploits the speed of MeanFlow's one-step generation while preserving action precision with a minimal number of generation steps. In real-world experiments, HybridFlow outperforms the 16-step Diffusion Policy by 15–25% in success rate while reducing inference time from 152 ms to 19 ms (8× speedup, ~52 Hz). It also achieves 70.0% success on unseen-color out-of-distribution (OOD) grasping and 66.3% on deformable object folding. We envision HybridFlow as a practical low-latency method for enhancing the real-world interaction capabilities of robotic manipulation policies.
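The three stages can be sketched as a two-NFE sampler. This is a minimal illustration under stated assumptions, not the authors' implementation. The following are all hypothetical:

- The interpolation convention z_t = (1 - t)·x + t·ε, with t = 1 as pure noise and t = 0 as the clean action, taken from a common flow-matching setup.
- The network interfaces `mean_velocity(z, r, t)` (MeanFlow average velocity) and `velocity(z, t)` (ReFlow instantaneous velocity).
- The re-noising level `t_mid`.

ReNoise is modeled here as placing the coarse action back on the interpolation path at `t_mid` with fresh noise. This is one plausible reading of "distribution alignment", not a confirmed detail.

```python
import numpy as np


def hybridflow_sample(mean_velocity, velocity, shape, t_mid=0.3, rng=None):
    """Hypothetical 2-NFE sampler: Global Jump -> ReNoise -> Local Refine.

    Assumes z_t = (1 - t) * x + t * eps (t=1 noise, t=0 action).
    mean_velocity(z, r, t): assumed average-velocity network (MeanFlow mode).
    velocity(z, t): assumed instantaneous-velocity network (ReFlow mode).
    t_mid: assumed re-noising level in (0, 1].
    """
    rng = np.random.default_rng() if rng is None else rng

    # Stage 1, Global Jump (NFE 1): one MeanFlow step from t=1 to r=0,
    # z_r = z_t - (t - r) * u(z_t, r, t), with (t - r) = 1 here.
    eps = rng.standard_normal(shape)
    x_coarse = eps - mean_velocity(eps, 0.0, 1.0)

    # Stage 2, ReNoise (no network call): re-inject fresh noise so the
    # coarse action lies on the interpolation path at time t_mid.
    eps_new = rng.standard_normal(shape)
    z_mid = (1.0 - t_mid) * x_coarse + t_mid * eps_new

    # Stage 3, Local Refine (NFE 2): one Euler step of the ReFlow ODE
    # from t_mid down to t=0.
    return z_mid - t_mid * velocity(z_mid, t_mid)


# Usage with toy oracles: for a point-mass target on straight paths the
# true velocity is (z - target) / t, and the average velocity over [r, t]
# equals it (independent of r), so both stages are exact here.
target = np.array([0.5, -1.0, 2.0])
calls = {"n": 0}  # counts network function evaluations


def toy_mean_velocity(z, r, t):
    calls["n"] += 1
    return (z - target) / t


def toy_velocity(z, t):
    calls["n"] += 1
    return (z - target) / t


action = hybridflow_sample(toy_mean_velocity, toy_velocity, target.shape,
                           rng=np.random.default_rng(0))
print(np.allclose(action, target), calls["n"])  # prints: True 2
```

The toy oracles make every stage exact, so the example only checks the plumbing and the NFE count. With learned networks, the abstract frames the ReFlow refinement step as what restores the precision that the one-step MeanFlow jump alone lacks.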
Problem

Research questions and friction points this paper is trying to address.

robotic manipulation
inference latency
real-time interaction
action generation
flow matching
Innovation

Methods, ideas, or system contributions that make the work stand out.

HybridFlow
flow matching
low-latency robot control
two-step generation
real-time manipulation
Zhenchen Dong
PolyU
Jinna Fu
Anker Humanoid Lab
Jiaming Wu
Assistant Professor, Chalmers University of Technology
Modeling and optimization of intelligent transport systems
Shengyuan Yu
CUHK
Fulin Chen
Anker Humanoid Lab
Yide Liu
Anker Humanoid Lab