Trajectory-Consistent Flow Matching for Robust Visuomotor Policy Learning

📅 2026-05-08

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the mismatch between training and inference in existing flow matching policies: training optimizes a pointwise velocity objective, while inference numerically integrates the learned field, so errors accumulate along the trajectory and undermine the reliability of robotic manipulation. To mitigate this trajectory drift, the authors propose a trajectory-consistent training framework that combines multi-step trajectory supervision, fourth-order Runge–Kutta (RK4) integration, temporal smoothness regularization of the velocity field, auxiliary rectified flow velocity regression, and a dual-view 3D point cloud encoder built from two independent PointNet encoders. In real-robot experiments on a Franka arm and a Boston Dynamics Spot, the method reaches 70% and 60% overall success on two long-horizon, multi-phase tasks where both baselines score 0%, and 100% success on precision tool placement.
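To make the role of the integrator concrete, the following is a minimal sketch, not the authors' implementation, comparing Euler and RK4 integration over the unit time interval. The scalar field `velocity` is a hypothetical stand-in for a learned velocity network, chosen only because its exact solution is known; the step count of 10 is likewise an assumption.

```python
import math


def velocity(x, t):
    # Hypothetical toy field standing in for a learned velocity network.
    # dx/dt = -2*t*x has the exact solution x(t) = x(0) * exp(-t**2).
    return -2.0 * t * x


def euler_step(v, x, t, h):
    # First-order update: one velocity evaluation per step.
    return x + h * v(x, t)


def rk4_step(v, x, t, h):
    # Classical fourth-order Runge-Kutta: four velocity evaluations per step.
    k1 = v(x, t)
    k2 = v(x + 0.5 * h * k1, t + 0.5 * h)
    k3 = v(x + 0.5 * h * k2, t + 0.5 * h)
    k4 = v(x + h * k3, t + h)
    return x + (h / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)


def integrate(step, v, x0, num_steps):
    # Integrate from t = 0 to t = 1 with a fixed step size.
    h = 1.0 / num_steps
    x = x0
    for i in range(num_steps):
        x = step(v, x, i * h, h)
    return x


exact = math.exp(-1.0)  # exact value of x(1) when x(0) = 1
euler_err = abs(integrate(euler_step, velocity, 1.0, 10) - exact)
rk4_err = abs(integrate(rk4_step, velocity, 1.0, 10) - exact)
print(f"Euler error: {euler_err:.2e}, RK4 error: {rk4_err:.2e}")
```

On this smooth toy field RK4 is far more accurate at the same step count, at the cost of four velocity evaluations per step instead of one. The sketch says nothing about how the integrators behave on a non-smooth learned field.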

📝 Abstract

Flow matching policies learn continuous velocity fields that transport noise to actions, enabling fast deterministic inference for robot manipulation. However, standard training optimizes a pointwise velocity objective while inference requires numerical integration of that field, a mismatch that causes compounding trajectory errors. We propose four complementary remedies: (1) auxiliary rectified flow velocity regression that provides uniform temporal supervision across the full time interval; (2) multi-step trajectory consistency training that supervises the integrated displacement of the velocity field over trajectory segments, directly closing the train-inference gap; (3) velocity field regularization that enforces temporal smoothness, preventing oscillations that destabilize integration; and (4) fourth-order Runge–Kutta (RK4) inference that reduces global discretization error by orders of magnitude over Euler methods. Critically, these components are not independently sufficient: RK4 without a smooth velocity field fails, and smoothness without trajectory-level supervision still drifts, as our ablation study confirms. We further pair these with a dual-view 3D point cloud encoder using two independent PointNet encoders for complementary spatial perception. On four real-robot tasks across a Franka arm and a Boston Dynamics Spot, our method achieves 70% and 60% overall success on two long-horizon multi-phase tasks where both baselines score 0%, and reaches 100% on precision tool placement. Three MetaWorld simulation tasks confirm consistent improvements, validating that trajectory-level supervision is essential for reliable policy execution.
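The three training-side terms described in the abstract can be illustrated with a small scalar sketch. Everything here is an assumption made for illustration: the abstract does not give the exact loss definitions, so the linear (rectified flow) interpolation path, the Euler rollout inside the consistency term, the finite-difference smoothness penalty, and all function names are hypothetical rather than the paper's formulation.

```python
import math


def interpolate(x0, x1, t):
    # Assumed rectified-flow path: straight line from noise x0 to action x1.
    return (1.0 - t) * x0 + t * x1


def pointwise_loss(v, x0, x1, t):
    # Pointwise velocity regression: on a straight path the target is x1 - x0.
    return (v(interpolate(x0, x1, t), t) - (x1 - x0)) ** 2


def trajectory_consistency_loss(v, x0, x1, t_a, t_b, steps=4):
    # Roll the field forward from t_a to t_b (Euler, as an assumption) and
    # compare the integrated displacement with the displacement of the path.
    start = interpolate(x0, x1, t_a)
    h = (t_b - t_a) / steps
    x, t = start, t_a
    for _ in range(steps):
        x = x + h * v(x, t)
        t += h
    target = interpolate(x0, x1, t_b) - start
    return ((x - start) - target) ** 2


def smoothness_penalty(v, x, t, dt=1e-2):
    # Finite-difference estimate of the squared time derivative of v.
    return ((v(x, t + dt) - v(x, t)) / dt) ** 2


def ideal(x, t):
    # Constant field that exactly matches the pair (x0, x1) = (-1, 2).
    return 3.0


def wobbly(x, t):
    # Same mean velocity plus a temporal oscillation.
    return 3.0 + math.sin(20.0 * t)


x0, x1 = -1.0, 2.0
for name, field in (("ideal", ideal), ("wobbly", wobbly)):
    print(
        name,
        pointwise_loss(field, x0, x1, 0.3),
        trajectory_consistency_loss(field, x0, x1, 0.0, 0.5),
        smoothness_penalty(field, 0.0, 0.1),
    )
```

In this toy setting the constant field drives all three terms to zero, while the oscillating field is penalized by each of them. The consistency term differs from the pointwise term in that it scores the accumulated displacement over a segment rather than the velocity at a single time.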

Problem

Research questions and friction points this paper is trying to address.

flow matching

trajectory consistency

visuomotor policy

numerical integration

train-inference gap

Innovation

Methods, ideas, or system contributions that make the work stand out.

trajectory consistency

flow matching

visuomotor policy