🤖 AI Summary
To address the slow inference of diffusion-based policies and the training complexity of distillation methods, this paper proposes Riemannian Flow Matching for Visual-Motor Policies (RFMP), the first flow matching framework formulated directly on the non-Euclidean Riemannian manifold of robot state spaces—naturally encoding kinematic and dynamic constraints. We further introduce Stable RFMP, which leverages LaSalle’s invariance principle to ensure asymptotic stability of the generated dynamics with respect to the target distribution, thereby guaranteeing both geometric consistency and robustness. RFMP enables end-to-end visual–motor joint modeling with streamlined training and achieves 3–5× faster single-step inference than diffusion policies. Evaluated across eight simulated and real-robot tasks, RFMP consistently outperforms both Diffusion Policies and Consistency Policies in task success rate, sample efficiency, and generalization.
📝 Abstract
Diffusion-based visuomotor policies excel at learning complex robotic tasks by effectively combining visual data with high-dimensional, multi-modal action distributions. However, diffusion models often suffer from slow inference due to costly denoising processes or require complex sequential training arising from recent distilling approaches. This paper introduces Riemannian Flow Matching Policy (RFMP), a model that inherits the easy training and fast inference capabilities of flow matching (FM). Moreover, RFMP inherently incorporates geometric constraints commonly found in realistic robotic applications, as the robot state resides on a Riemannian manifold. To enhance the robustness of RFMP, we propose Stable RFMP (SRFMP), which leverages LaSalle's invariance principle to equip the dynamics of FM with stability to the support of a target Riemannian distribution. Rigorous evaluation on eight simulated and real-world tasks show that RFMP successfully learns and synthesizes complex sensorimotor policies on Euclidean and Riemannian spaces with efficient training and inference phases, outperforming Diffusion Policies and Consistency Policies.