OMP: One-step Meanflow Policy with Directional Alignment

๐Ÿ“… 2025-12-22
๐Ÿ“ˆ Citations: 0
โœจ Influential: 0
๐Ÿ“„ PDF
๐Ÿค– AI Summary
To address high inference latency, architectural complexity, and poor few-shot generalization in generative policies for robotic dexterous manipulation, this paper proposes MeanFlow++, a mean-flow policy enhancement framework. Methodologically, it (1) introduces a cosine-direction alignment loss that decouples the calibration of velocity direction from velocity magnitude, and (2) models dynamic trajectory evolution via differential derivative equations (DDEs), coupled with Jacobian-vector product (JVP)-based optimization to replace fixed-temperature denoising losses, enabling cooperative alignment between predicted mean velocities and ground-truth trajectories. The framework preserves single-step inference while significantly reducing computational overhead. Evaluated on the Adroit and Meta-World benchmarks, MeanFlow++ achieves higher average success rates than MP1 and FlowPolicy, with particularly notable gains on challenging Meta-World tasks. It thus strikes an effective balance between real-time execution and trajectory fidelity.
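The cosine-direction alignment idea in the summary can be sketched as a simple loss term that penalizes only the angle between the predicted and ground-truth mean velocities, leaving magnitude to a separate (e.g. MSE) term. This is a minimal NumPy sketch under that assumption; the function and variable names are illustrative and not taken from the paper's code.

```python
import numpy as np

def cosine_direction_loss(u_pred, u_true, eps=1e-8):
    """Direction-only alignment term: 1 - cos(angle between vectors).

    Magnitude errors are deliberately ignored, so this term can be
    combined with a separate magnitude loss, decoupling direction
    calibration from magnitude calibration.
    """
    u_pred = np.asarray(u_pred, dtype=float)
    u_true = np.asarray(u_true, dtype=float)
    cos = np.dot(u_pred, u_true) / (
        np.linalg.norm(u_pred) * np.linalg.norm(u_true) + eps
    )
    return 1.0 - cos

# Parallel vectors give ~0 loss regardless of scale;
# orthogonal vectors give loss ~1.0.
aligned = cosine_direction_loss([2.0, 0.0], [5.0, 0.0])
orthogonal = cosine_direction_loss([1.0, 0.0], [0.0, 1.0])
```

Because the term is scale-invariant, it cannot fight the magnitude loss over vector length, which is the decoupling the summary describes.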

๐Ÿ“ Abstract
Robot manipulation, a key capability of embodied AI, has turned to data-driven generative policy frameworks, but mainstream approaches suffer from high inference latency (Diffusion Models) or increased architectural complexity (Flow-based methods). While simply applying MeanFlow to robotic tasks achieves single-step inference and outperforms FlowPolicy, it lacks few-shot generalization due to fixed temperature hyperparameters in its Dispersive Loss and misalignment between predicted and true mean velocities. To solve these issues, this study proposes an improved MeanFlow-based policy: we introduce a lightweight Cosine Loss to align velocity directions and use the Differential Derivation Equation (DDE) to optimize the Jacobian-Vector Product (JVP) operator. Experiments on Adroit and Meta-World tasks show the proposed method outperforms MP1 and FlowPolicy in average success rate, especially on challenging Meta-World tasks, effectively enhancing the few-shot generalization and trajectory accuracy of robot manipulation policies while maintaining real-time performance, and offering a more robust solution for high-precision robotic manipulation.
Problem

Research questions and friction points this paper is trying to address.

Improves few-shot generalization in robot manipulation policies
Enhances trajectory accuracy while maintaining real-time performance
Addresses misaligned velocity predictions and fixed temperature hyperparameters
Innovation

Methods, ideas, or system contributions that make the work stand out.

Lightweight Cosine Loss aligns velocity directions
DDE optimizes Jacobian-Vector Product operator
Maintains real-time performance with enhanced generalization
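The DDE/JVP innovation above concerns how the total time derivative of the mean velocity is obtained when forming the MeanFlow training target u ≈ v - (t - r)·du/dt. One cheap stand-in for an exact Jacobian-vector product is a forward finite difference along the flow; whether this matches the paper's DDE formulation is an assumption, and the sketch below (plain NumPy, toy model, illustrative names) only demonstrates the general idea of replacing an exact JVP with a discrete derivative.

```python
import numpy as np

def meanflow_target(u_fn, z, v, r, t, delta=1e-3):
    """MeanFlow-style target: v - (t - r) * du/dt.

    The total derivative du/dt (normally a Jacobian-vector product
    through the network) is approximated here by a forward finite
    difference along the instantaneous velocity v, i.e. a discrete
    version of the differential equation governing the mean velocity.
    """
    u_now = u_fn(z, r, t)
    # advance the state a small step along the flow, then difference
    u_next = u_fn(z + delta * v, r, t + delta)
    du_dt = (u_next - u_now) / delta
    return v - (t - r) * du_dt

# Toy mean-velocity model u(z, r, t) = t * z (illustrative only).
u_fn = lambda z, r, t: t * z
z = np.array([1.0, -2.0])
v = np.array([0.5, 0.5])
tgt = meanflow_target(u_fn, z, v, r=0.2, t=0.8)
```

For this toy model the exact total derivative is du/dt = z + t·v, so the finite-difference target converges to v - (t - r)(z + t·v) as delta shrinks; in practice the step size trades bias against numerical noise.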
๐Ÿ”Ž Similar Papers
No similar papers found.
Han Fang
Global College, Shanghai Jiao Tong University, Shanghai, China
Yize Huang
School of Mechanical Engineering, Shanghai Jiao Tong University, Shanghai, China
Yuheng Zhao
Fudan University
Data Visualization · Visual Analytics · Human-AI Collaboration
Paul Weng
Duke Kunshan University
Artificial Intelligence · Reinforcement Learning/Markov Decision Process · Qualitative/Ordinal Models
Xiao Li
School of Mechanical Engineering, Shanghai Jiao Tong University, Shanghai, China
Yutong Ban
Global College, Shanghai Jiao Tong University, Shanghai, China