ProbeFlow: Training-Free Adaptive Flow Matching for Vision-Language-Action Models

📅 2026-03-18
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the high inference latency in Vision-Language-Action models caused by multi-step ODE solvers in flow-matching-based action heads, which hinders real-time robotic control. The authors propose the first training-free, adaptive inference framework that dynamically schedules integration steps at the action head level. By evaluating the geometric complexity of the trajectory via the cosine similarity between initial and lookahead velocity vectors, the method automatically identifies semantic bottlenecks and reduces redundant neural network evaluations. Evaluated on MetaWorld, the approach achieves a 14.8× speedup in action decoding and a 2.8× reduction in end-to-end latency without compromising task success rates. On long-horizon LIBERO tasks, it substantially alleviates solver-induced delays, with real-world experiments confirming its stability and low-latency advantages.

📝 Abstract
Recent Vision-Language-Action (VLA) models equipped with Flow Matching (FM) action heads achieve state-of-the-art performance in complex robot manipulation. However, the multi-step iterative ODE solving required by FM introduces inference latency that precludes responsive physical control. While current acceleration efforts optimize the Vision-Language Model (VLM) backbone, the action head bottleneck remains overlooked. To address this, we propose ProbeFlow, a training-free adaptive inference framework tailored for continuous robotic control. By evaluating geometric trajectory complexity via the cosine similarity between initial and lookahead velocity vectors, ProbeFlow dynamically schedules integration steps to prune redundant network evaluations. On the MetaWorld benchmark, it accelerates action decoding by 14.8x (reducing average steps from N = 50 to 2.6) and cuts end-to-end system latency by 2.8x without compromising the manipulation success rate. On the long-horizon LIBERO benchmark, the probe automatically allocates a denser schedule to navigate semantic bottlenecks, effectively resolving the flow solver delay. Real-world physical deployments confirm that ProbeFlow successfully mitigates action decoding latency while ensuring execution stability, offering a highly practical solution for low-latency continuous generative policies.
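The probing idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the lookahead time of 0.5, the linear mapping from similarity to step count, and the plain Euler solver are all assumptions chosen for illustration. Only the core idea comes from the abstract: compare the initial velocity with a lookahead velocity by cosine similarity, then spend fewer integration steps when the two agree.

```python
import numpy as np

def cosine_similarity(a, b, eps=1e-8):
    """Cosine similarity between two flattened vectors."""
    a, b = np.ravel(a), np.ravel(b)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + eps))

def choose_num_steps(velocity_fn, x0, lookahead=0.5, min_steps=1, max_steps=50):
    """Probe the flow with two velocity evaluations and pick a step budget.

    ASSUMPTION: the lookahead time and the linear similarity-to-steps
    mapping are hypothetical; the paper's actual schedule may differ.
    """
    v0 = velocity_fn(x0, 0.0)                  # initial velocity
    x_look = x0 + lookahead * v0               # one coarse Euler lookahead
    v_look = velocity_fn(x_look, lookahead)    # lookahead velocity
    sim = cosine_similarity(v0, v_look)
    # sim near 1: nearly straight trajectory, few steps suffice.
    # Lower sim: the direction changes, so allocate a denser schedule.
    curvature = float(np.clip(1.0 - sim, 0.0, 1.0))
    num_steps = int(round(min_steps + curvature * (max_steps - min_steps)))
    return num_steps, sim

def euler_integrate(velocity_fn, x0, num_steps):
    """Fixed-step Euler integration of dx/dt = v(x, t) from t=0 to t=1."""
    x = np.array(x0, dtype=float)
    dt = 1.0 / num_steps
    for k in range(num_steps):
        x = x + dt * velocity_fn(x, k * dt)
    return x

def adaptive_decode(velocity_fn, x0, **probe_kwargs):
    """Probe first, then integrate with the chosen number of steps."""
    num_steps, sim = choose_num_steps(velocity_fn, x0, **probe_kwargs)
    return euler_integrate(velocity_fn, x0, num_steps), num_steps, sim

if __name__ == "__main__":
    # Toy velocity fields standing in for a learned action head.
    target, start = np.array([1.0, 2.0]), np.zeros(2)
    straight = lambda x, t: target - start                    # constant direction
    curved = lambda x, t: np.pi * np.array([-x[1], x[0]])     # rotating direction
    print(adaptive_decode(straight, start)[1:])
    print(adaptive_decode(curved, np.array([1.0, 0.0]))[1:])
```

In this toy setup the constant field yields a similarity close to 1 and a single Euler step, while the rotating field yields a lower similarity and a larger step budget. A real deployment would replace the toy fields with the network's velocity prediction, and could reuse the probe's first evaluation inside the solver to avoid a redundant forward pass.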
Problem

Research questions and friction points this paper is trying to address.

Vision-Language-Action
Flow Matching
inference latency
robotic control
action decoding
Innovation

Methods, ideas, or system contributions that make the work stand out.

Flow Matching
Training-Free Adaptation
Adaptive Inference
Vision-Language-Action Models
Robotic Control Latency
Zhou Fang
School of Computer Science and Engineering, Southeast University, China; Key Laboratory of New Generation Artificial Intelligence Technology and Its Interdisciplinary Applications, Ministry of Education, China
Jiaqi Wang
Harbin Institute of Technology Shenzhen & Pengcheng Laboratory, Computer Science
Spiking Neural Network, Brain Decoding, Speech, Brain Computer Interface
Yi Zhou
School of Computer Science and Engineering, Southeast University, China; Key Laboratory of New Generation Artificial Intelligence Technology and Its Interdisciplinary Applications, Ministry of Education, China
Qiongfeng Shi
Southeast University; National University of Singapore
Flexible electronics, Sensors, Energy harvesters, Intelligent systems