Small Aid, Big Leap: Efficient Test-Time Adaptation for Vision-Language Models with AdaptNet

📅 2025-06-03

📈 Citations: 0

✨ Influential: 0

career value

152K/year

🤖 AI Summary

To address high per-sample optimization costs, heavy reliance on data augmentation, and poor scalability in test-time adaptation (TTA) for vision-language models (VLMs), this paper proposes AdaptNet—a lightweight, batch-wise TTA framework. Methodologically, it (1) introduces parameter-minimal adapters under a frozen VLM backbone, enabling collaborative inference via confidence-weighted interpolation; (2) designs a gradient drift indicator (GDI)-driven adaptive reset mechanism to mitigate catastrophic forgetting during continual adaptation; and (3) employs self-supervised batch alignment training, eliminating the need for extra annotations or data augmentation. Evaluated across multiple benchmarks under both domain shift and out-of-distribution scenarios, AdaptNet achieves state-of-the-art performance while reducing inference latency by 42% and GPU memory consumption by 58%. The framework demonstrates superior efficiency, robustness, and scalability.

Technology Category

Application Category

📝 Abstract

Test-time adaptation (TTA) has emerged as a critical technique for enhancing the generalization capability of vision-language models (VLMs) during inference. However, existing approaches often incur substantial computational costs and exhibit poor scalability, primarily due to sample-wise adaptation granularity and reliance on costly auxiliary designs such as data augmentation. To address these limitations, we introduce SAIL (Small Aid, Big Leap), a novel adapter-based TTA framework that leverages a lightweight, learnable AdaptNet to enable efficient and scalable model adaptation. As SAIL's core, a frozen pre-trained VLM collaborates with AdaptNet through a confidence-based interpolation weight, generating robust predictions during inference. These predictions serve as self-supervised targets to align AdaptNet's outputs through efficient batch-wise processing, dramatically reducing computational costs without modifying the VLM or requiring memory caches. To mitigate catastrophic forgetting during continual adaptation, we propose a gradient-aware reset strategy driven by a gradient drift indicator (GDI), which dynamically detects domain transitions and strategically resets AdaptNet for stable adaptation. Extensive experiments across diverse benchmarks on two scenarios demonstrate that SAIL achieves state-of-the-art performance while maintaining low computational costs. These results highlight SAIL's effectiveness, efficiency and scalability for real-world deployment. The code will be released upon acceptance.

Problem

Research questions and friction points this paper is trying to address.

Efficient test-time adaptation for vision-language models

Reducing computational costs in model adaptation

Mitigating catastrophic forgetting during continual adaptation

Innovation

Methods, ideas, or system contributions that make the work stand out.

Lightweight AdaptNet enables efficient TTA

Confidence-based interpolation for robust predictions

Gradient-aware reset prevents catastrophic forgetting

🔎 Similar Papers

Efficient Open Set Single Image Test Time Adaptation of Vision Language Models