Vision-Language Models for Edge Networks: A Comprehensive Survey

📅 2025-02-11
📈 Citations: 0
Influential: 0
🤖 AI Summary
Deploying vision-language models (VLMs) on resource-constrained edge devices faces critical bottlenecks in computational capacity, memory footprint, and energy consumption. This paper presents the first systematic survey of optimization techniques for edge-deployable VLMs, categorizing approaches into four pillars: model compression (e.g., pruning, quantization, knowledge distillation), efficient fine-tuning, hardware-aware acceleration, and privacy-preserving inference. We comprehensively analyze over 100 state-of-the-art works, identifying recurring challenges—such as cross-modal efficiency trade-offs and heterogeneous hardware constraints—and distilling practical design principles. Furthermore, we propose a lightweight design paradigm tailored to edge scenarios, emphasizing software-hardware co-design, low-rank adaptation, and energy-aware optimization. Our synthesis provides both theoretical foundations and actionable guidelines for deploying VLMs in latency-sensitive, real-world edge applications—including medical diagnosis, environmental monitoring, and autonomous driving.

📝 Abstract
Vision-Language Models (VLMs) combine visual understanding with natural language processing, enabling tasks like image captioning, visual question answering, and video analysis. While VLMs show impressive capabilities across domains such as autonomous vehicles, smart surveillance, and healthcare, their deployment on resource-constrained edge devices remains challenging due to limits on processing power, memory, and energy. This survey explores recent advancements in optimizing VLMs for edge environments, focusing on model compression techniques, including pruning, quantization, and knowledge distillation, as well as specialized hardware solutions that enhance efficiency. We provide a detailed discussion of efficient training and fine-tuning methods, edge deployment challenges, and privacy considerations. Additionally, we discuss the diverse applications of lightweight VLMs across healthcare, environmental monitoring, and autonomous systems, illustrating their growing impact. By highlighting key design strategies, current challenges, and offering recommendations for future directions, this survey aims to inspire further research into the practical deployment of VLMs, ultimately making advanced AI accessible in resource-limited settings.
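Among the compression techniques the abstract lists, quantization is the most mechanical to illustrate. The sketch below is not from the paper; it is a minimal NumPy illustration of symmetric per-tensor int8 post-training quantization, the basic scheme edge runtimes build on: a float32 weight tensor is mapped to int8 with a single scale factor, cutting storage 4x at the cost of a bounded rounding error.

```python
import numpy as np

def quantize_int8(w: np.ndarray):
    """Symmetric per-tensor quantization: w ~= scale * q, q in [-127, 127]."""
    scale = float(np.abs(w).max()) / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an approximate float32 tensor from the int8 codes."""
    return q.astype(np.float32) * scale

# Toy "weight matrix" standing in for a VLM layer.
rng = np.random.default_rng(0)
w = rng.normal(size=(256, 256)).astype(np.float32)

q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)

err = float(np.abs(w - w_hat).max())  # bounded by scale / 2
ratio = w.nbytes / q.nbytes           # 4x smaller storage
```

Real deployments refine this with per-channel scales, activation calibration, or quantization-aware training, but the memory/accuracy trade-off visible here is the one the survey's compression pillar is about.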
Problem

Research questions and friction points this paper is trying to address.

Optimizing Vision-Language Models for edge devices
Addressing resource constraints in VLM deployment
Enhancing efficiency through model compression techniques
Innovation

Methods, ideas, or system contributions that make the work stand out.

Model compression techniques
Efficient training methods
Specialized hardware solutions
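The efficient fine-tuning pillar mentioned in the summary highlights low-rank adaptation (LoRA). As a hedged illustration (not the paper's own code), the NumPy sketch below shows the core idea: freeze the base weight W and train only a rank-r update (alpha/r) * B @ A, shrinking the trainable parameter count from d*d to 2*r*d.

```python
import numpy as np

d, r, alpha = 1024, 8, 16.0  # hidden size, adapter rank, scaling factor
rng = np.random.default_rng(0)

W = rng.normal(size=(d, d)).astype(np.float32)              # frozen base weight
A = rng.normal(scale=0.01, size=(r, d)).astype(np.float32)  # trainable down-projection
B = np.zeros((d, r), dtype=np.float32)                      # trainable up-projection, zero-init

def lora_forward(x: np.ndarray) -> np.ndarray:
    """y = x W^T + (alpha/r) * x A^T B^T; the low-rank path adds the learned update."""
    return x @ W.T + (alpha / r) * (x @ A.T) @ B.T

x = rng.normal(size=(2, d)).astype(np.float32)
y = lora_forward(x)

trainable = A.size + B.size  # 2 * r * d parameters actually updated
full = W.size                # d * d parameters in full fine-tuning
```

With B initialized to zero, the adapted layer starts out identical to the frozen one, and only the 2*r*d adapter parameters (about 1.6% of the full matrix here) need gradients, optimizer state, and storage on the edge device.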
Ahmed Sharshar
Computer Vision Department, Mohamed Bin Zayed University of Artificial Intelligence (MBZUAI), Abu Dhabi, UAE
Latif U. Khan
Abu Dhabi University, United Arab Emirates
Machine Learning, Digital Twins, Metaverse, Network Optimization
Waseem Ullah
Mohamed bin Zayed University of Artificial Intelligence (MBZUAI)
Computer Vision, Machine Learning, Video Anomaly Detection, Energy Informatics
Mohsen Guizani
Machine Learning Department, Mohamed Bin Zayed University of Artificial Intelligence (MBZUAI), Abu Dhabi, UAE