🤖 AI Summary
Deploying vision-language models on edge devices is often hindered by resource constraints and distribution shifts, while existing test-time adaptation (TTA) methods are impractical due to their high computational overhead. This work proposes LQA, a novel framework that integrates modality-aware quantization with a gradient-free TTA mechanism and introduces a Selective Hybrid Quantization (SHQ) strategy. LQA significantly reduces memory and computational costs while preserving model robustness. Experiments across seven open-source datasets demonstrate that LQA improves average adaptation performance by 4.5%, substantially lowers memory usage compared to full-precision models, and achieves up to a 19.9× reduction in memory consumption relative to gradient-based TTA approaches.
📝 Abstract
Deploying Vision-Language Models (VLMs) on edge devices is challenged by resource constraints and performance degradation under distribution shifts. While test-time adaptation (TTA) can counteract such shifts, existing methods are too resource-intensive for on-device deployment. To address this challenge, we propose LQA, a lightweight, quantized-adaptive framework for VLMs that combines a modality-aware quantization strategy with gradient-free test-time adaptation. We introduce Selective Hybrid Quantization (SHQ) and a quantized, gradient-free adaptation mechanism to enable robust and efficient VLM deployment on resource-constrained hardware. Experiments across both synthetic and real-world distribution shifts show that LQA improves overall adaptation performance by 4.5%, uses less memory than full-precision models, and significantly outperforms gradient-based TTA methods, achieving up to 19.9× lower memory usage across seven open-source datasets. These results demonstrate that LQA offers a practical pathway for robust, privacy-preserving, and efficient VLM deployment on edge devices.
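The abstract does not specify LQA's quantization or adaptation rules, but the general combination it describes — low-bit weight quantization paired with a gradient-free, inference-only adaptation step — can be sketched generically. The snippet below is a minimal illustration, not the paper's method: it applies symmetric per-tensor int8 quantization to a classifier's weights, then adapts predictions at test time by averaging the softmax outputs of the most confident (lowest-entropy) augmented views, a common gradient-free TTA heuristic. All names (`quantize_int8`, `gradient_free_tta`, `keep_frac`) are illustrative assumptions.

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor int8 quantization: w ~= scale * q."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def entropy(p):
    return -(p * np.log(p + 1e-12)).sum(axis=-1)

def gradient_free_tta(logits_per_view, keep_frac=0.5):
    """Gradient-free adaptation: keep the lowest-entropy (most
    confident) augmented views and average their class probabilities.
    No backward pass, so no optimizer state or gradient memory."""
    probs = softmax(logits_per_view)
    ent = entropy(probs)
    k = max(1, int(len(probs) * keep_frac))
    keep = np.argsort(ent)[:k]
    return probs[keep].mean(axis=0)

# Demo: a quantized linear classifier scored on 8 augmented views.
rng = np.random.default_rng(0)
W = rng.normal(size=(16, 4)).astype(np.float32)     # 16 features -> 4 classes
q, s = quantize_int8(W)
views = rng.normal(size=(8, 16)).astype(np.float32) # augmented test views
logits = views @ dequantize(q, s)
pred = gradient_free_tta(logits)                    # adapted class distribution
```

Because adaptation here is pure filtering and averaging of forward-pass outputs, memory stays at inference level — which is the kind of saving over gradient-based TTA the abstract quantifies.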