🤖 AI Summary
This work addresses the high inference latency of Vision-Language-Action (VLA) models caused by the multi-step ODE solvers used in flow-matching-based action heads, which hinders real-time robotic control. The authors propose ProbeFlow, described as the first training-free adaptive inference framework that dynamically schedules integration steps at the action-head level. By measuring the geometric complexity of the trajectory through the cosine similarity between the initial and lookahead velocity vectors, the method automatically allocates more steps at semantic bottlenecks and prunes redundant neural network evaluations elsewhere. On MetaWorld, the approach achieves a 14.8× speedup in action decoding and a 2.8× reduction in end-to-end latency without compromising task success rates. On long-horizon LIBERO tasks, it substantially alleviates solver-induced delays, and real-world experiments confirm its stability and low latency.
📝 Abstract
Recent Vision-Language-Action (VLA) models equipped with Flow Matching (FM) action heads achieve state-of-the-art performance in complex robot manipulation. However, the multi-step iterative ODE solving required by FM introduces inference latency that precludes responsive physical control. While current acceleration efforts optimize the Vision-Language Model (VLM) backbone, the action-head bottleneck remains overlooked. To address this, we propose ProbeFlow, a training-free adaptive inference framework tailored for continuous robotic control. By evaluating geometric trajectory complexity via the cosine similarity between initial and lookahead velocity vectors, ProbeFlow dynamically schedules integration steps to prune redundant network evaluations. On the MetaWorld benchmark, it accelerates action decoding by 14.8× (reducing the average number of integration steps from N = 50 to 2.6) and cuts end-to-end system latency by 2.8× without compromising the manipulation success rate. On the long-horizon LIBERO benchmark, the probe automatically allocates a denser schedule to navigate semantic bottlenecks, effectively resolving the flow-solver delay. Real-world physical deployments confirm that ProbeFlow mitigates action decoding latency while maintaining execution stability, offering a practical solution for low-latency continuous generative policies.
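The probing idea can be illustrated with a small sketch. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract only says that integration steps are scheduled from the cosine similarity between the initial and lookahead velocity vectors. The lookahead distance `probe_dt`, the hypothetical `sensitivity` mapping from cosine deviation to step count, reusing the initial velocity as the first Euler step, and integrating time from 0 to 1 are all illustrative choices. The velocity field here is a plain Python function standing in for the action-head network.

```python
import math
from typing import Callable, List, Tuple

Vector = List[float]
# Velocity field v(x, t); a stand-in for one action-head network evaluation.
VelocityField = Callable[[Vector, float], Vector]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two velocity vectors (1.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0
    return max(-1.0, min(1.0, dot / (norm_a * norm_b)))


def choose_num_steps(cos_sim: float, min_steps: int = 1, max_steps: int = 50,
                     sensitivity: float = 20.0, tol: float = 1e-9) -> int:
    """Hypothetical schedule: the more the velocity turns, the more Euler steps."""
    deviation = 1.0 - cos_sim  # 0 for a straight path, up to 2 for a reversal
    if deviation < tol:  # treat floating-point noise as a straight path
        deviation = 0.0
    return min(max_steps, min_steps + math.ceil(sensitivity * deviation))


def probe_and_integrate(velocity: VelocityField, x0: Vector, probe_dt: float = 0.5,
                        min_steps: int = 1, max_steps: int = 50) -> Tuple[Vector, int, float]:
    """Probe trajectory curvature, then integrate from t = 0 to t = 1 with explicit Euler."""
    v0 = velocity(x0, 0.0)  # initial velocity (first network evaluation)
    x_probe = [xi + probe_dt * vi for xi, vi in zip(x0, v0)]
    v_probe = velocity(x_probe, probe_dt)  # lookahead velocity (second evaluation)
    cos_sim = cosine_similarity(v0, v_probe)
    n = choose_num_steps(cos_sim, min_steps, max_steps)

    dt = 1.0 / n
    x, t = list(x0), 0.0
    for k in range(n):
        v = v0 if k == 0 else velocity(x, t)  # reuse v0 for the first Euler step
        x = [xi + dt * vi for xi, vi in zip(x, v)]
        t += dt
    return x, n, cos_sim


# Straight (constant) field: the probe sees no turning, so a single step is used (n == 1).
x_straight, n_straight, _ = probe_and_integrate(lambda x, t: [1.0, -2.0], [0.0, 0.0])
# Rotating field: the velocity turns between the two probes, so more steps are used (n == 4).
x_curved, n_curved, _ = probe_and_integrate(lambda x, t: [-x[1], x[0]], [1.0, 0.0])
```

With the first velocity reused, a straight trajectory costs two network evaluations instead of a fixed N = 50, while a curved one pays for extra steps only where the probe detects turning. Whether ProbeFlow reuses the probe evaluation this way, and how it actually maps cosine similarity to a step count, is not stated in the abstract.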