Trajectory-Consistent Flow Matching for Robust Visuomotor Policy Learning

2026-05-08
Citations: 0
Influential: 0

AI Summary
This work addresses a train-inference mismatch in flow matching policies: training optimizes a pointwise velocity objective, while inference numerically integrates the learned field, so errors compound along the trajectory and undermine manipulation reliability. To mitigate this drift, the authors propose a trajectory-consistent training framework that combines auxiliary rectified flow velocity regression, multi-step trajectory supervision, temporal smoothness regularization of the velocity field, and fourth-order Runge–Kutta (RK4) integration at inference, paired with a dual-view 3D point cloud encoder built from two independent PointNet encoders. On four real-robot tasks across a Franka arm and a Boston Dynamics Spot, the method reaches 70% and 60% overall success on two long-horizon, multi-phase tasks where both baselines score 0%, and 100% on precision tool placement. Three MetaWorld simulation tasks show consistent improvements.
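The mismatch between a pointwise training objective and integrated inference can be written down compactly. What follows is only a sketch in assumed notation: the symbols $v_\theta$, $x_0$ (noise), $x_1$ (action sample), $h$, $\delta$, and the loss names are illustrative and are not taken from the paper, whose exact formulations are not given on this page.

```latex
% Pointwise (rectified-flow style) velocity regression, t ~ U[0, 1]
x_t = (1 - t)\,x_0 + t\,x_1, \qquad
\mathcal{L}_{\mathrm{FM}} = \mathbb{E}_{t,\,x_0,\,x_1}
  \bigl\| v_\theta(x_t, t) - (x_1 - x_0) \bigr\|^2

% Inference: K solver steps of size h = 1/K (forward Euler shown)
\hat{x}_{t_{k+1}} = \hat{x}_{t_k} + h\, v_\theta(\hat{x}_{t_k}, t_k)

% One plausible segment-level loss: integrate from x_{t_a} and compare at t_b
\mathcal{L}_{\mathrm{traj}} = \mathbb{E}\,
  \bigl\| \hat{x}_{t_b} - x_{t_b} \bigr\|^2,
  \qquad \hat{x}_{t_a} = x_{t_a}

% One plausible temporal smoothness penalty on the velocity field
\mathcal{L}_{\mathrm{smooth}} = \mathbb{E}\,
  \bigl\| v_\theta(x_t, t + \delta) - v_\theta(x_t, t) \bigr\|^2
```

For a sufficiently smooth field, the global error of forward Euler scales as $O(h)$ and that of RK4 as $O(h^4)$. A higher-order solver therefore only pays off when the learned field is smooth enough for that bound to hold.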
πŸ“ Abstract
Flow matching policies learn continuous velocity fields that transport noise to actions, enabling fast deterministic inference for robot manipulation. However, standard training optimizes a pointwise velocity objective while inference requires numerical integration of that field -- a mismatch that causes compounding trajectory errors. We propose four complementary remedies: (1) auxiliary rectified flow velocity regression that provides uniform temporal supervision across the full time interval; (2) multi-step trajectory consistency training that supervises the integrated displacement of the velocity field over trajectory segments, directly closing the train-inference gap; (3) velocity field regularization that enforces temporal smoothness, preventing oscillations that destabilize integration; and (4) fourth-order Runge-Kutta (RK4) inference that reduces global discretization error by orders of magnitude over Euler methods. Critically, these components are not independently sufficient -- RK4 without a smooth velocity field fails, and smoothness without trajectory-level supervision still drifts, as our ablation study confirms. We further pair these with a dual-view 3D point cloud encoder using two independent PointNet encoders for complementary spatial perception. On four real-robot tasks across a Franka arm and a Boston Dynamics Spot, our method achieves 70% and 60% overall success on two long-horizon multi-phase tasks where both baselines score 0%, and reaches 100% on precision tool placement. Three MetaWorld simulation tasks confirm consistent improvements, validating that trajectory-level supervision is essential for reliable policy execution.
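To make the integration argument in the abstract concrete, here is a minimal, standard-library-only sketch. It is not the authors' implementation: `toy_field`, `euler_integrate`, `rk4_integrate`, and `segment_consistency_loss` are hypothetical names, the velocity field is a hand-written stand-in for a learned network, and the segment loss is just one plausible reading of "supervising the integrated displacement over trajectory segments" under a straight-line (rectified flow) interpolant.

```python
import math


def euler_integrate(v, x, t0, t1, n_steps):
    """Integrate dx/dt = v(x, t) from t0 to t1 with forward Euler."""
    h = (t1 - t0) / n_steps
    t = t0
    for _ in range(n_steps):
        x = x + h * v(x, t)
        t += h
    return x


def rk4_integrate(v, x, t0, t1, n_steps):
    """Integrate dx/dt = v(x, t) from t0 to t1 with classical RK4."""
    h = (t1 - t0) / n_steps
    t = t0
    for _ in range(n_steps):
        k1 = v(x, t)
        k2 = v(x + 0.5 * h * k1, t + 0.5 * h)
        k3 = v(x + 0.5 * h * k2, t + 0.5 * h)
        k4 = v(x + h * k3, t + h)
        x = x + (h / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
        t += h
    return x


def toy_field(x, t):
    """Assumed stand-in for a learned velocity field.

    dx/dt = x * cos(t) has the closed-form solution
    x(t) = x(0) * exp(sin(t)), so solver error can be measured exactly.
    """
    return x * math.cos(t)


def segment_consistency_loss(v, x0, x1, t_a, t_b, n_steps=4):
    """Squared error between the integrated endpoint and the interpolant.

    Assumes the straight-line path x_t = (1 - t) * x0 + t * x1.
    Starts at x_{t_a}, integrates v up to t_b, and compares with x_{t_b}.
    """
    x_a = (1.0 - t_a) * x0 + t_a * x1
    x_b = (1.0 - t_b) * x0 + t_b * x1
    x_b_hat = rk4_integrate(v, x_a, t_a, t_b, n_steps)
    return (x_b_hat - x_b) ** 2


# Compare both solvers with the same budget of 10 steps on [0, 1].
exact = math.exp(math.sin(1.0))
euler_err = abs(euler_integrate(toy_field, 1.0, 0.0, 1.0, 10) - exact)
rk4_err = abs(rk4_integrate(toy_field, 1.0, 0.0, 1.0, 10) - exact)
print(f"Euler error: {euler_err:.2e}, RK4 error: {rk4_err:.2e}")

# A constant field equal to (x1 - x0) reproduces the straight path exactly,
# so the segment loss vanishes up to floating-point rounding.
noise, action = -0.5, 1.5
ideal_loss = segment_consistency_loss(
    lambda x, t: action - noise, noise, action, 0.2, 0.7
)
print(f"Segment loss for the ideal constant field: {ideal_loss:.2e}")
```

The example uses scalars to stay dependency-free; in a real policy, `x` would be an action tensor and `v` a network conditioned on observations, with the segment loss backpropagated through the solver steps. The error gap only appears because `toy_field` is smooth, which is consistent with the abstract's point that RK4 alone does not help when the velocity field oscillates.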
Problem

Research questions and friction points this paper is trying to address.

flow matching
trajectory consistency
visuomotor policy
numerical integration
train-inference gap
Innovation

Methods, ideas, or system contributions that make the work stand out.

trajectory consistency
flow matching
visuomotor policy
velocity field regularization
Runge-Kutta integration
Riad Ahmed
University of New Hampshire, Durham, NH, USA
Sujosh Nag
University of New Hampshire, Durham, NH, USA
Moniruzzaman Akash
University of New Hampshire, Durham, NH, USA
Mostafa Hussein
Applied scientist with Amazon Robotics
Robotics, Machine learning, Artificial Intelligence, Data mining
Momotaz Begum
Associate Professor, Computer Science, University of New Hampshire
AI, Human robot interaction, Assistive robotics, SLAM, image processing