FASTER: Rethinking Real-Time Flow VLAs

📅 2026-03-19
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the high reaction latency of existing Vision-Language-Action (VLA) models, which prevents them from responding promptly to dynamic environments in real-time deployment. The authors show that reaction time follows a uniform distribution determined jointly by the Time to First Action (TTFA) and the execution horizon. They then propose a Horizon-Aware Schedule that prioritizes near-term actions during flow sampling, compressing the denoising of the immediate reaction into a single step while preserving long-horizon trajectory quality. Combined with a streaming client-server pipeline, the resulting method (FASTER) reduces effective reaction latency for flow-based VLAs, especially on consumer-grade GPUs, as demonstrated on physical robotic systems, including a highly dynamic table tennis task.
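To make the scheduling idea concrete, the sketch below contrasts a constant schedule with a toy horizon-aware one. It is an illustration only, not the paper's actual schedule: the linear allocation of steps across action indices, the function names, and the convention that flow time 0 is noise and 1 is a fully denoised action are all assumptions made for this example.

```python
import numpy as np

def constant_schedule(horizon: int, num_steps: int) -> np.ndarray:
    """Flow time reached after each sampling step, shared by every action index."""
    t = np.arange(1, num_steps + 1) / num_steps        # shape (num_steps,)
    return np.tile(t[:, None], (1, horizon))           # shape (num_steps, horizon)

def horizon_aware_schedule(horizon: int, num_steps: int) -> np.ndarray:
    """Hypothetical schedule: action index 0 finishes after one step,
    the last index after num_steps, with a linear ramp in between."""
    idx = np.arange(horizon)
    # Assumed allocation: number of steps each action index needs to reach time 1.
    steps_needed = 1 + np.round(idx / max(horizon - 1, 1) * (num_steps - 1)).astype(int)
    k = np.arange(1, num_steps + 1)[:, None]            # current step, shape (num_steps, 1)
    return np.minimum(1.0, k / steps_needed[None, :])   # shape (num_steps, horizon)

def steps_until_first_action(schedule: np.ndarray) -> int:
    """Sampling steps required before action index 0 is fully denoised."""
    return int(np.argmax(schedule[:, 0] >= 1.0)) + 1

if __name__ == "__main__":
    const = constant_schedule(horizon=8, num_steps=10)
    aware = horizon_aware_schedule(horizon=8, num_steps=10)
    print("constant:", steps_until_first_action(const), "steps before the first action")
    print("horizon-aware:", steps_until_first_action(aware), "step before the first action")
```

Under these assumptions the first action becomes available after one step instead of ten, while later actions in the chunk still receive the full step budget.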

📝 Abstract
Real-time execution is crucial for deploying Vision-Language-Action (VLA) models in the physical world. Existing asynchronous inference methods primarily optimize trajectory smoothness, but neglect the critical latency in reacting to environmental changes. By rethinking the notion of reaction in action chunking policies, this paper presents a systematic analysis of the factors governing reaction time. We show that reaction time follows a uniform distribution determined jointly by the Time to First Action (TTFA) and the execution horizon. Moreover, we reveal that the standard practice of applying a constant schedule in flow-based VLAs can be inefficient and forces the system to complete all sampling steps before any movement can start, forming the bottleneck in reaction latency. To overcome this issue, we propose Fast Action Sampling for ImmediaTE Reaction (FASTER). By introducing a Horizon-Aware Schedule, FASTER adaptively prioritizes near-term actions during flow sampling, compressing the denoising of the immediate reaction tenfold (e.g., in $\pi_{0.5}$ and X-VLA), into a single step, while preserving the quality of the long-horizon trajectory. Coupled with a streaming client-server pipeline, FASTER substantially reduces the effective reaction latency on real robots, especially when deployed on consumer-grade GPUs. Real-world experiments, including a highly dynamic table tennis task, show that FASTER unlocks unprecedented real-time responsiveness for generalist policies, enabling rapid generation of accurate and smooth trajectories.
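The claim that reaction time is uniformly distributed can be illustrated with a small simulation. This is a simplified model assumed for this example, not the paper's derivation: a new observation is taken once per execution horizon, an event arrives at a uniformly random phase within that cycle, and the first reactive action is available TTFA seconds after the next observation. The function name, parameters, and numeric values are hypothetical.

```python
import numpy as np

def simulate_reaction_times(ttfa_s: float, horizon_steps: int, control_dt_s: float,
                            num_events: int = 100_000, seed: int = 0) -> np.ndarray:
    """Sample reaction times under the simplified model described above."""
    rng = np.random.default_rng(seed)
    period_s = horizon_steps * control_dt_s   # time between consecutive observations
    # Assumption: the event lands at a uniformly random phase of the execution cycle,
    # so the wait until the next observation is uniform on [0, period_s).
    wait_s = rng.uniform(0.0, period_s, size=num_events)
    # The first action that reflects the event arrives TTFA after that observation.
    return wait_s + ttfa_s

if __name__ == "__main__":
    # Hypothetical numbers: 100 ms TTFA, 25 executed steps at 50 Hz control.
    samples = simulate_reaction_times(ttfa_s=0.10, horizon_steps=25, control_dt_s=0.02)
    print(f"min  {samples.min():.3f} s")
    print(f"mean {samples.mean():.3f} s")
    print(f"max  {samples.max():.3f} s")
```

In this model the reaction time is uniform on [TTFA, TTFA + horizon × control period), so lowering TTFA shifts the whole distribution down, while shortening the execution horizon narrows it.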
Problem

Research questions and friction points this paper is trying to address.

reaction latency
real-time execution
Vision-Language-Action models
action chunking
flow-based VLAs
Innovation

Methods, ideas, or system contributions that make the work stand out.

FASTER
Horizon-Aware Schedule
Reaction Latency
Flow-based VLA
Real-time Execution
🔎 Similar Papers
2024-03-04 · 2024 IEEE International Conference on Pervasive Computing and Communications Workshops and other Affiliated Events (PerCom Workshops) · Citations: 5