🤖 AI Summary
This work addresses the high inference latency of diffusion policies, which stems from sampling from random Gaussian noise and hinders their applicability to real-time robotic control. To overcome this limitation, the authors propose a novel action generation paradigm that abandons uninformative noise initialization and instead embeds historical proprioceptive sequences into a high-dimensional latent space to serve as dynamic initial conditions within a flow-matching framework, enabling efficient single-step action prediction. The resulting method generates high-quality actions in just 0.56 ms, remains robust under visual perturbations, generalizes to unseen task configurations, and extends readily to video generation tasks.
📝 Abstract
Diffusion-based policies have recently achieved remarkable success in robotics by formulating action prediction as a conditional denoising process. However, the standard practice of sampling from random Gaussian noise often requires multiple iterative steps to produce clean actions, leading to high inference latency that constitutes a major bottleneck for real-time control. In this paper, we challenge the necessity of uninformed noise sampling and propose Action-to-Action flow matching (A2A), a novel policy paradigm that shifts from random sampling to initialization informed by the previous action. Unlike existing methods that treat proprioceptive action feedback as a static condition, A2A leverages historical proprioceptive sequences, embedding them into a high-dimensional latent space as the starting point for action generation. This design bypasses costly iterative denoising while effectively capturing the robot's physical dynamics and temporal continuity. Extensive experiments demonstrate that A2A achieves high training efficiency, fast inference, and improved generalization. Notably, A2A enables high-quality action generation in as few as a single inference step (0.56 ms latency), and exhibits superior robustness to visual perturbations and enhanced generalization to unseen configurations. Finally, we extend A2A to video generation, demonstrating its broader versatility in temporal modeling. Project site: https://lorenzo-0-0.github.io/A2A_Flow_Matching.
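To make the paradigm concrete, here is a minimal sketch of single-step flow matching with history-informed initialization. All names, dimensions, and the tiny linear stand-ins for the encoder and velocity network are hypothetical illustrations, not the paper's actual architecture: the idea is only that the flow starts from an embedding of the proprioceptive history rather than from Gaussian noise, so one Euler step can already yield an action.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions (not from the paper); latent size matches the
# action dimension so a single Euler step lands directly in action space.
HIST_LEN, ACT_DIM, LATENT = 4, 7, 7

# Toy linear "encoder": embeds the flattened proprioceptive history.
W_enc = rng.normal(0.0, 0.1, (HIST_LEN * ACT_DIM, LATENT))

def embed_history(history):
    """Map the past proprioceptive sequence to the flow's start point x0."""
    return history.reshape(-1) @ W_enc

# Toy linear "velocity network": predicts the flow velocity v(x, t).
W_vel = rng.normal(0.0, 0.1, (LATENT + 1, ACT_DIM))

def velocity(x, t):
    return np.concatenate([x, [t]]) @ W_vel

def one_step_action(history):
    """Single Euler step of size 1 from the informed initialization x0."""
    x0 = embed_history(history)
    return x0 + velocity(x0, 0.0)

def fm_loss(history, a1, t):
    """Standard flow-matching regression: along the straight path
    x_t = (1 - t) * x0 + t * a1, train v(x_t, t) to match (a1 - x0)."""
    x0 = embed_history(history)
    x_t = (1.0 - t) * x0 + t * a1
    v_pred = velocity(x_t, t)
    return float(np.mean((v_pred - (a1 - x0)) ** 2))
```

Because `x0` already encodes the robot's recent dynamics, the learned velocity field only has to bridge a short, structured gap to the next action, which is what makes a single integration step plausible; with noise initialization the same step would have to traverse the entire distance from an arbitrary Gaussian sample.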