🤖 AI Summary
Deploying large models on edge robots often incurs high inference latency, disrupting control loops and hindering real-time, safe navigation. To address this challenge, this work proposes AsyncVLA (Asynchronous Vision-Language-Action), a framework that decouples semantic planning, performed by a remote large model, from high-frequency action execution, handled by a lightweight local Edge Adapter. Through end-to-end fine-tuning and a trajectory re-weighting strategy, AsyncVLA bridges the domain gap between high-level semantic guidance and low-level dynamic execution. Evaluated on real-world visual navigation tasks with communication delays of up to six seconds, the method achieves a 40% higher success rate than the current state-of-the-art baseline, substantially improving both real-time responsiveness and system robustness.
📝 Abstract
Robotic foundation models achieve strong generalization by leveraging internet-scale vision-language representations, but their massive computational cost creates a fundamental bottleneck: high inference latency. In dynamic environments, this latency breaks the control loop, rendering powerful models unsafe for real-time deployment. We propose AsyncVLA, an asynchronous control framework that decouples semantic reasoning from reactive execution. Inspired by hierarchical control, AsyncVLA runs a large foundation model on a remote workstation to provide high-level guidance, while a lightweight, onboard Edge Adapter continuously refines actions at high frequency. To bridge the domain gap between these asynchronous streams, we introduce an end-to-end fine-tuning protocol and a trajectory re-weighting strategy that prioritizes dynamic interactions. We evaluate our approach on real-world vision-based navigation tasks with communication delays of up to 6 seconds. AsyncVLA achieves a 40% higher success rate than state-of-the-art baselines, effectively bridging the gap between the semantic intelligence of large models and the reactivity required for edge robotics.
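To make the asynchronous decoupling concrete, here is a minimal sketch of the two loops it implies: a slow remote planning loop that publishes the latest guidance, and a fast onboard loop that keeps acting on fresh observations regardless of network latency. The interfaces `foundation_model.plan`, `edge_adapter.act`, `get_observation`, and `send_action`, and the 30 Hz rate, are illustrative assumptions, not the paper's actual API.

```python
import threading
import time

# Shared state: the most recent guidance published by the remote planner.
latest_guidance = None
guidance_lock = threading.Lock()

def remote_planner(get_observation, foundation_model):
    """Low-frequency loop: query the remote foundation model for guidance.

    Each round trip may take seconds (network + inference latency), so the
    result is published to shared state instead of blocking the control loop.
    """
    global latest_guidance
    while True:
        obs = get_observation()
        guidance = foundation_model.plan(obs)  # slow: remote inference
        with guidance_lock:
            latest_guidance = guidance

def edge_control_loop(get_observation, edge_adapter, send_action, hz=30):
    """High-frequency loop: refine actions onboard using the newest guidance.

    The adapter always acts on a fresh observation; stale guidance degrades
    behavior gracefully rather than stalling the robot.
    """
    period = 1.0 / hz
    while True:
        t0 = time.monotonic()
        obs = get_observation()
        with guidance_lock:
            guidance = latest_guidance
        action = edge_adapter.act(obs, guidance)  # fast: onboard inference
        send_action(action)
        time.sleep(max(0.0, period - (time.monotonic() - t0)))
```

Because the edge loop reads whichever guidance is newest rather than blocking on the planner, a multi-second round trip to the workstation stalls only the semantic updates, not the robot's reactive control.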
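The trajectory re-weighting strategy is described only at a high level; a plausible reading is a weighted imitation loss that up-weights trajectories with more dynamic interaction. The sketch below is an assumption along those lines: `dynamism_score`, a proxy based on mean action change over the horizon, is hypothetical and not the paper's actual criterion.

```python
import torch

def dynamism_score(target_actions):
    """Hypothetical proxy for how dynamic a trajectory is.

    target_actions: (batch, horizon, action_dim). Larger frame-to-frame
    action changes are taken to indicate more dynamic interaction.
    """
    return target_actions.diff(dim=1).abs().mean(dim=(1, 2))

def reweighted_bc_loss(pred_actions, target_actions, alpha=1.0):
    """Behavior-cloning loss with per-trajectory re-weighting."""
    # Per-trajectory mean-squared action error.
    per_traj = ((pred_actions - target_actions) ** 2).mean(dim=(1, 2))
    # Up-weight dynamic trajectories; normalize to keep the loss scale stable.
    weights = 1.0 + alpha * dynamism_score(target_actions)
    weights = weights / weights.mean()
    return (weights * per_traj).mean()
```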