FLASH: Efficient Visuomotor Policy via Sparse Sampling

📅 2026-05-15

🤖 AI Summary
This work addresses the high inference latency of existing generative visuomotor policies, such as diffusion and flow-matching models, whose iterative denoising impedes real-time robotic control. The authors propose a continuous trajectory representation based on Legendre polynomials, fitted to expert demonstrations under sparse temporal sampling, so that a single inference covers a significantly extended action horizon. Flow matching starts from the polynomial coefficients of the history trajectory rather than from uninformative Gaussian noise, which shortens the transport distance and enables accurate single-step inference. Analytic differentiation of the polynomial also supplies velocity feedforward signals to the torque controller without numerical approximation. Across five simulated and two real-world manipulation tasks, the method reports success rates of at least 92%, a per-episode inference time of 31.40 ms (up to 175× faster than diffusion policies and 18× faster than prior flow-matching policies), up to 4× faster training convergence than ACT, and a 5× to 7× reduction in controller tracking error relative to discrete-action baselines.
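The idea of fitting a continuous Legendre representation to sparsely sampled demonstrations can be illustrated with a minimal sketch. This is not the authors' implementation: the toy trajectory, the polynomial degree, the sampling stride, and the helper names `fit_legendre` and `evaluate` are all assumptions chosen for illustration, and only standard NumPy routines are used.

```python
import numpy as np
from numpy.polynomial import legendre as L


def fit_legendre(t_sparse, actions_sparse, degree):
    """Least-squares Legendre fit (hypothetical helper).

    t_sparse: sample times already scaled to [-1, 1], shape (n,)
    actions_sparse: action samples, shape (n, action_dim)
    Returns coefficients of shape (degree + 1, action_dim).
    """
    # legfit fits each column of a 2-D y independently.
    return L.legfit(t_sparse, actions_sparse, degree)


def evaluate(coeffs, t_dense):
    """Evaluate the fitted series at arbitrary times in [-1, 1]."""
    # For 2-D coeffs, legval returns (action_dim, len(t_dense)); transpose it.
    return L.legval(t_dense, coeffs).T


# Toy "expert demonstration": a dense 2-D action trajectory (assumed, not from the paper).
t_dense = np.linspace(-1.0, 1.0, 101)
demo = np.stack([np.sin(2.0 * t_dense), t_dense**2], axis=1)

# Sparse temporal sampling: keep every 10th point (11 of 101 samples).
t_sparse, demo_sparse = t_dense[::10], demo[::10]

coeffs = fit_legendre(t_sparse, demo_sparse, degree=5)
reconstructed = evaluate(coeffs, t_dense)
max_err = np.max(np.abs(reconstructed - demo))
print(coeffs.shape, reconstructed.shape, max_err)
```

In this sketch a policy would only need to predict the 6 × 2 coefficient array instead of 101 discrete actions, which is the sense in which a compact polynomial output can cover a long horizon.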
📝 Abstract
Generative models such as diffusion and flow matching have become dominant paradigms for visuomotor policy learning, yet their reliance on iterative denoising incurs high inference latency incompatible with real-time robotic control. We present Fast Legendre-polynomial Action policy via Sparse History-anchored flow (FLASH Policy), which replaces discrete action-chunk generation with continuous Legendre polynomial trajectory representation. Specifically, by fitting expert demonstrations under sparse temporal sampling, FLASH enables a single inference to cover a significantly extended action horizon. To further accelerate generation, FLASH initiates the flow matching process from history polynomial coefficients rather than uninformative Gaussian noise, shortening the transport distance and enabling accurate single-step inference. Moreover, analytic polynomial differentiation directly provides desired velocity feed-forward signals to the torque controller without numerical approximation. Extensive experiments on five simulated and two real-world manipulation tasks demonstrate that FLASH achieves state-of-the-art success rates ($\ge 92\%$ across all tasks), a per-episode inference time of $31.40\,\mathrm{ms}$ (up to $175\times$ faster than diffusion policies and $18\times$ faster than prior flow matching policies), up to $4\times$ faster training convergence than ACT, and $5\times$ to $7\times$ reduction in controller tracking error compared to discrete-action baselines.
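The claim that analytic differentiation yields velocity feed-forward signals without numerical approximation can be sketched as follows. The coefficient values, the horizon length, and the helper name `velocity_coeffs` are hypothetical; the sketch only shows that differentiating a Legendre series in coefficient space matches a finite-difference estimate, and it makes no claim about how the paper's torque controller consumes the result.

```python
import numpy as np
from numpy.polynomial import legendre as L


def velocity_coeffs(coeffs, horizon_seconds):
    """Differentiate a Legendre series w.r.t. wall-clock time (hypothetical helper).

    The series is defined on s in [-1, 1], with s = 2 * t / horizon - 1,
    so ds/dt = 2 / horizon. legder's `scl` argument applies that chain-rule factor.
    """
    return L.legder(coeffs, m=1, scl=2.0 / horizon_seconds, axis=0)


# Assumed cubic trajectory for two joints: shape (degree + 1, action_dim).
coeffs = np.array([[0.10, -0.20],
                   [0.50,  0.30],
                   [0.20,  0.00],
                   [-0.05, 0.10]])
horizon = 0.5  # seconds covered by one polynomial segment (assumed)

vel = velocity_coeffs(coeffs, horizon)


def position(t):
    """Position at wall-clock times t, shape (action_dim, len(t))."""
    return L.legval(2.0 * t / horizon - 1.0, coeffs)


t = np.linspace(0.05, 0.45, 5)
analytic = L.legval(2.0 * t / horizon - 1.0, vel)

# Central finite difference, used here only as a cross-check.
h = 1e-5
numeric = (position(t + h) - position(t - h)) / (2.0 * h)

print(np.max(np.abs(analytic - numeric)))
```

The analytic velocity is available at any query time from the same coefficients, whereas a discrete action chunk would require differencing neighbouring samples.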
Problem

Research questions and friction points this paper is trying to address.

visuomotor policy
inference latency
real-time robotic control
generative models
action generation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Legendre polynomial
sparse sampling
flow matching
visuomotor policy
continuous trajectory representation
Jiaqi Bai
Beihang University
Natural Language Processing, Information Retrieval, Large Language Model
Jindou Jia
MARS Lab, Nanyang Technological University, Singapore
Yuxuan Hu
MARS Lab, Nanyang Technological University, Singapore
Gen Li
Postdoctoral Research Fellow, Nanyang Technological University
Embodied AI, Computer Vision, Robotics, Artificial Intelligence
Xiangyu Chen
MARS Lab, Nanyang Technological University, Singapore
Tuo An
MARS Lab, Nanyang Technological University, Singapore
Kuangji Zuo
MARS Lab, Nanyang Technological University, Singapore
Jianfei Yang
Assistant Professor, Director of MARS Lab, Nanyang Technological University
Physical AI, Embodied AI, Multimodal AI