AVERY: Adaptive VLM Split Computing through Embodied Self-Awareness for Efficient Disaster Response Systems

📅 2025-11-22
📈 Citations: 0
Influential: 0
🤖 AI Summary
In disaster response scenarios, UAVs require low-latency, queryable semantic intelligence; however, existing on-board CNNs lack semantic understanding, while large vision-language models (VLMs) are impractical under stringent resource constraints. To address this, we propose an adaptive vision-language model split-computing framework. Our method introduces a cognition-inspired dual-stream architecture coupled with a lightweight self-aware controller that dynamically partitions visual feature extraction and language reasoning between edge and cloud, based on real-time network conditions and task intent. It integrates the LISA-7B VLM, edge-cloud collaborative computation, and adaptive image compression. Experiments under dynamic network conditions demonstrate that our approach improves accuracy by 11.2% over static partitioning baselines and reduces energy consumption by 93.98% compared to full-edge execution, significantly reducing semantic inference latency and improving mission efficiency.

📝 Abstract
Unmanned Aerial Vehicles (UAVs) in disaster response require complex, queryable intelligence that on-board CNNs cannot provide. While Vision-Language Models (VLMs) offer this semantic reasoning, their high resource demands make on-device deployment infeasible, and naive cloud offloading fails under the low-bandwidth networks common in disaster zones. We present AVERY, a framework that enables VLM deployment through adaptive split computing. We advance the split computing paradigm beyond traditional depth-wise partitioning by introducing a functional, cognitive-inspired dual-stream split that separates the VLM into a high-frequency, low-resolution "context stream" for real-time awareness and a low-frequency, high-fidelity "insight stream" for deep analysis. A lightweight, self-aware on-board controller manages this architecture, monitoring network conditions and operator intent to dynamically select from pre-trained compression models, navigating the fundamental accuracy-throughput trade-off. Evaluated using the VLM LISA-7B across an edge-cloud scenario under fluctuating network conditions, AVERY consistently outperforms static configurations, achieving 11.2% higher accuracy than raw image compression and 93.98% lower energy consumption compared to full-edge execution, thereby enhancing mission efficiency and enabling real-time, queryable intelligence on resource-constrained platforms in dynamic environments.
Problem

Research questions and friction points this paper is trying to address.

Deploying Vision-Language Models on UAVs for disaster response intelligence
Overcoming high resource demands and low-bandwidth network limitations
Balancing accuracy and throughput trade-offs in dynamic environments
Innovation

Methods, ideas, or system contributions that make the work stand out.

Adaptive split computing for VLM deployment
Dual-stream architecture separates context and insight
Self-aware controller dynamically manages accuracy-throughput trade-off
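The controller's core decision, as described in the abstract, is to pick among pre-trained compression models given the measured link bandwidth and the operator's intent (real-time "context" awareness vs. high-fidelity "insight" analysis). A minimal sketch of that selection logic, assuming hypothetical profile names and illustrative bitrate/accuracy numbers not taken from the paper:

```python
from dataclasses import dataclass

@dataclass
class CompressionProfile:
    name: str
    bitrate_mbps: float   # bandwidth needed to sustain real-time streaming
    rel_accuracy: float   # accuracy relative to uncompressed features

# Pre-trained compression models the controller can switch between
# (profile names and values are illustrative assumptions).
PROFILES = [
    CompressionProfile("high_fidelity", 8.0, 1.00),
    CompressionProfile("balanced",      3.0, 0.95),
    CompressionProfile("aggressive",    1.0, 0.85),
]

def select_profile(bandwidth_mbps: float, intent: str) -> CompressionProfile:
    """Choose the most accurate profile the current link can sustain.

    intent == "insight" (deep analysis) reserves bandwidth headroom so the
    high-fidelity stream stays stable; intent == "context" (real-time
    awareness) uses the full measured bandwidth.
    """
    headroom = 0.8 if intent == "insight" else 1.0
    feasible = [p for p in PROFILES
                if p.bitrate_mbps <= bandwidth_mbps * headroom]
    if not feasible:
        # Degrade gracefully: keep streaming at the most aggressive setting.
        return PROFILES[-1]
    return max(feasible, key=lambda p: p.rel_accuracy)
```

For example, a 10 Mbps link with "context" intent keeps full fidelity, while a degraded 2 Mbps link falls back to aggressive compression, which mirrors the accuracy-throughput trade-off the paper's controller navigates.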