One Step Is Enough: Dispersive MeanFlow Policy Optimization

📅 2026-01-28
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenge that multi-step sampling in generative policies struggles to meet the stringent real-time requirements of robotic control, which demands high-speed action generation. The authors propose DMPO, a novel framework that, for the first time, enables mathematically rigorous single-step policy generation without relying on knowledge distillation. By leveraging MeanFlow modeling for one-step inference, dispersive regularization to prevent representation collapse, and a lightweight network architecture with reinforcement learning fine-tuning, DMPO surpasses expert demonstrations in performance. Evaluated on RoboMimic and OpenAI Gym benchmarks, the method matches or exceeds multi-step approaches while achieving inference speeds above 120 Hz (several hundred Hz in optimal cases), and it has been successfully deployed on a Franka robotic arm, demonstrating both high efficiency and precision.

📝 Abstract
Real-time robotic control demands fast action generation. However, existing generative policies based on diffusion and flow matching require multi-step sampling, fundamentally limiting deployment in time-critical scenarios. We propose Dispersive MeanFlow Policy Optimization (DMPO), a unified framework that enables true one-step generation through three key components: MeanFlow for mathematically-derived single-step inference without knowledge distillation, dispersive regularization to prevent representation collapse, and reinforcement learning (RL) fine-tuning to surpass expert demonstrations. Experiments across RoboMimic manipulation and OpenAI Gym locomotion benchmarks demonstrate competitive or superior performance compared to multi-step baselines. With our lightweight model architecture and the three key algorithmic components working in synergy, DMPO exceeds real-time control requirements (>120Hz) with 5-20x inference speedup, reaching hundreds of Hertz on high-performance GPUs. Physical deployment on a Franka-Emika-Panda robot validates real-world applicability.
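The one-step generation the abstract describes rests on the MeanFlow identity, under which a network predicting the *average* velocity over an interval lets a single evaluation map noise to an action, with no ODE integration loop. The sketch below is illustrative only: `u_theta` is a stand-in random linear map, not the paper's trained policy network, and the 4-dimensional action space is a made-up example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for the learned average-velocity network u_theta(z, r, t).
# In DMPO this would be a trained policy network conditioned on the
# observation; here it is a fixed random linear map purely to show the math.
W = rng.standard_normal((4, 4)) * 0.1

def u_theta(z, r, t):
    """Toy average-velocity field over the interval [r, t]."""
    return z @ W + (t - r)

def one_step_sample(action_dim=4):
    """One-step MeanFlow sampling: a single network call maps noise to an
    action via z_r = z_t - (t - r) * u(z_t, r, t) with r = 0, t = 1."""
    z1 = rng.standard_normal(action_dim)       # noise sample at t = 1
    action = z1 - u_theta(z1, r=0.0, t=1.0)    # one evaluation, no sampling loop
    return action

a = one_step_sample()
print(a.shape)  # (4,)
```

Multi-step diffusion or flow-matching policies would replace that single subtraction with tens of network calls, which is where the paper's 5-20x speedup comes from.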
Problem

Research questions and friction points this paper is trying to address.

real-time robotic control
generative policies
multi-step sampling
time-critical scenarios
action generation
Innovation

Methods, ideas, or system contributions that make the work stand out.

one-step generation
MeanFlow
dispersive regularization
reinforcement learning fine-tuning
real-time robotic control
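Dispersive regularization, listed above as a key innovation, penalizes a batch of intermediate representations for clustering together, which counteracts the representation collapse the summary mentions. The function below is a generic InfoNCE-style form of such a loss (a repulsion term with no positive pairs), written as a hedged sketch: the pairwise-distance formulation and the temperature `tau=0.5` are illustrative choices, not the paper's exact objective.

```python
import numpy as np

def dispersive_loss(h, tau=0.5):
    """Generic dispersive regularizer over a batch of hidden features.

    h: (batch, dim) array of intermediate representations.
    Returns log-mean-exp of negative pairwise squared distances:
    0 when all features collapse to one point, negative when they spread out.
    """
    # Pairwise squared Euclidean distances, shape (batch, batch).
    sq = ((h[:, None, :] - h[None, :, :]) ** 2).sum(-1)
    return np.log(np.mean(np.exp(-sq / tau)))

rng = np.random.default_rng(0)
collapsed = np.zeros((8, 16))             # all features identical -> loss 0
spread = rng.standard_normal((8, 16))     # dispersed features -> loss < 0
assert dispersive_loss(spread) < dispersive_loss(collapsed)
```

Minimizing this term during training pushes the batch's features apart, so the policy's latent space keeps enough diversity to represent distinct actions.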
Guowei Zou
School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, China; Guangdong Key Laboratory of Big Data Analysis and Processing, Guangzhou, China
Haitao Wang
School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, China; Guangdong Key Laboratory of Big Data Analysis and Processing, Guangzhou, China
Hejun Wu
School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, China; Guangdong Key Laboratory of Big Data Analysis and Processing, Guangzhou, China
Yukun Qian
School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, China; Guangdong Key Laboratory of Big Data Analysis and Processing, Guangzhou, China
Yuhang Wang
School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, China; Guangdong Key Laboratory of Big Data Analysis and Processing, Guangzhou, China
Weibing Li
School of Computer Science and Engineering, Sun Yat-sen University
Neural Networks · Robotics · Automatic Control