🤖 AI Summary
This work addresses the high inference latency of generative visuomotor policies such as diffusion and flow-matching models, whose iterative denoising impedes real-time robotic control. The authors propose FLASH Policy, which replaces discrete action chunks with a continuous trajectory representation based on Legendre polynomials. Expert demonstrations are fitted under sparse temporal sampling, so a single inference covers a long action horizon, and the flow-matching process is initialized from historical polynomial coefficients rather than Gaussian noise, which enables single-step inference. By introducing Legendre polynomials into visuomotor policy learning for the first time, and combining history-anchored flow-matching initialization with analytically derived feed-forward velocity signals, the method achieves substantial gains in both efficiency and accuracy. Evaluated on seven manipulation tasks (five simulated, two real-world), it attains success rates of at least 92% on every task, with a per-episode inference time of 31.40 ms (up to 175× faster than diffusion policies and 18× faster than prior flow-matching policies), up to 4× faster training convergence than ACT, and a 5× to 7× reduction in controller tracking error compared to discrete-action baselines.
📝 Abstract
Generative models such as diffusion and flow matching have become dominant paradigms for visuomotor policy learning, yet their reliance on iterative denoising incurs high inference latency incompatible with real-time robotic control. We present Fast Legendre-polynomial Action policy via Sparse History-anchored flow (FLASH Policy), which replaces discrete action-chunk generation with a continuous Legendre polynomial trajectory representation. Specifically, by fitting expert demonstrations under sparse temporal sampling, FLASH enables a single inference to cover a significantly extended action horizon. To further accelerate generation, FLASH initializes the flow matching process from history polynomial coefficients rather than uninformative Gaussian noise, shortening the transport distance and enabling accurate single-step inference. Moreover, analytic polynomial differentiation directly provides desired velocity feed-forward signals to the torque controller without numerical approximation. Extensive experiments on five simulated and two real-world manipulation tasks demonstrate that FLASH achieves state-of-the-art success rates ($\ge 92\%$ across all tasks), a per-episode inference time of $31.40\,\mathrm{ms}$ (up to $175\times$ faster than diffusion policies and $18\times$ faster than prior flow matching policies), up to $4\times$ faster training convergence than ACT, and $5\times$ to $7\times$ reduction in controller tracking error compared to discrete-action baselines.
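To make the trajectory representation concrete, the following is a minimal sketch of fitting Legendre coefficients to sparsely sampled joint positions and then reading off position and velocity analytically. It is not the authors' implementation: the function names `fit_legendre` and `evaluate`, the polynomial degree, the time window, and the toy two-joint trajectory are all assumptions chosen for illustration, and the fit here is an ordinary least-squares fit via NumPy's `numpy.polynomial.legendre` module.

```python
import numpy as np
from numpy.polynomial import legendre as L


def fit_legendre(t, q, degree):
    """Least-squares Legendre fit of sparse samples (hypothetical helper).

    t: (N,) sample times, increasing; q: (N, D) joint positions.
    Returns coefficients of shape (degree + 1, D) and the time window.
    """
    t0, t1 = float(t[0]), float(t[-1])
    s = 2.0 * (t - t0) / (t1 - t0) - 1.0  # map time to [-1, 1]
    coeffs = L.legfit(s, q, degree)       # one coefficient column per joint
    return coeffs, (t0, t1)


def evaluate(coeffs, window, t):
    """Evaluate position and analytic velocity at arbitrary times t."""
    t0, t1 = window
    s = 2.0 * (np.asarray(t) - t0) / (t1 - t0) - 1.0
    pos = L.legval(s, coeffs)             # shape (D, len(t))
    dcoeffs = L.legder(coeffs, axis=0)    # derivative with respect to s
    # Chain rule: ds/dt = 2 / (t1 - t0) converts d/ds into d/dt.
    vel = L.legval(s, dcoeffs) * (2.0 / (t1 - t0))
    return pos.T, vel.T                   # each of shape (len(t), D)


# Toy demonstration (assumed data): 8 sparse samples of a 2-joint trajectory.
t_sparse = np.linspace(0.0, 2.0, 8)
q_sparse = np.stack([t_sparse**3 - 2.0 * t_sparse, 0.5 * t_sparse**2], axis=1)

coeffs, window = fit_legendre(t_sparse, q_sparse, degree=3)

# Query the continuous trajectory at a dense, controller-rate time grid.
t_dense = np.linspace(0.0, 2.0, 101)
pos, vel = evaluate(coeffs, window, t_dense)
```

Because the toy trajectory is itself a cubic, the degree-3 fit reproduces it to numerical precision, and `vel` matches the exact derivative without any finite differencing. In this sketch the policy output would be the small coefficient array `coeffs` rather than a dense action chunk, which is the sense in which a single inference can cover a long horizon.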