Test-time Loss Landscape Adaptation for Zero-Shot Generalization in Vision-Language Models

📅 2025-01-31
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work addresses test-time zero-shot generalization of vision-language models under distribution shift, proposing TLLA, an efficient adaptive paradigm that requires neither backpropagation nor parameter updates at test time. Methodologically, TLLA argues from a loss landscape perspective that test-time gradient optimization is redundant, and instead uses the relative position between the training minimum and each test sample's loss landscape to guide adaptation. The framework has two stages: Sharpness-Aware Prompt Tuning (SAPT), which locates a flat minimum during prompt tuning, and Sharpness-based Test Sample Selection (STSS), which keeps augmented test views whose loss landscapes align with that flat minimum. Evaluated on four ImageNet variant datasets, TLLA achieves state-of-the-art zero-shot accuracy, outperforming TPT by an average of 5.32% (ResNet-50) and 6.98% (ViT-B/16), while significantly reducing computational overhead.

📝 Abstract
Test-time adaptation of pre-trained vision-language models has emerged as a technique for tackling distribution shifts at test time. Although existing methods, especially those based on Test-time Prompt Tuning (TPT), have shown promising results, the high computational cost of their parameter optimization presents challenges for scalability and practical application. This paper shows, from a loss landscape perspective, that backpropagation in existing methods is unnecessary. Building on this insight, it proposes a simple yet effective framework called Test-time Loss Landscape Adaptation (TLLA). TLLA leverages the relative position between the training minimum and test loss landscapes to guide the adaptation process, avoiding any update of model parameters at test time. Specifically, it consists of two main stages: in the prompt tuning stage, a Sharpness-Aware Prompt Tuning (SAPT) method is introduced to identify a flat training minimum, setting the foundation for the subsequent test-time adaptation; in the test stage, a Sharpness-based Test Sample Selection (STSS) approach is used to ensure the alignment of flat minima within the training loss landscape and each augmented test sample's loss landscape. Extensive experiments on both domain generalization and cross-dataset benchmarks demonstrate that TLLA achieves state-of-the-art performance while significantly reducing computational overhead. Notably, TLLA surpasses TPT by an average of 5.32% and 6.98% on four ImageNet variant datasets when employing ResNet-50 and ViT-B/16 image encoders, respectively. The code will be available soon.
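The SAPT stage described above applies sharpness-aware optimization to the prompt so that training ends at a flat minimum. The sketch below illustrates the generic sharpness-aware minimization (SAM) step on a toy objective; the `loss` function, dimensions, and hyperparameters are illustrative stand-ins, not the paper's actual prompt-tuning objective, which optimizes CLIP prompt embeddings.

```python
import numpy as np

def loss(p):
    # Toy rippled quadratic standing in for the prompt-tuning loss.
    return float(np.sum(p ** 2) + 0.1 * np.sum(np.sin(10 * p)))

def grad(p, eps=1e-5):
    # Numerical gradient via central finite differences, to keep
    # the sketch dependency-free (no autograd framework needed).
    g = np.zeros_like(p)
    for i in range(p.size):
        e = np.zeros_like(p)
        e[i] = eps
        g[i] = (loss(p + e) - loss(p - e)) / (2 * eps)
    return g

def sapt_step(p, lr=0.05, rho=0.05):
    # SAM-style step: ascend to the worst-case point within an
    # rho-ball around p, then descend using the gradient taken there.
    # This biases the trajectory toward flat regions of the landscape.
    g = grad(p)
    p_adv = p + rho * g / (np.linalg.norm(g) + 1e-12)
    return p - lr * grad(p_adv)

p = np.full(4, 0.5)          # toy "prompt" vector
for _ in range(100):
    p = sapt_step(p)
print(loss(p) < loss(np.full(4, 0.5)))  # loss decreased
```

The ascent-then-descent structure is what distinguishes sharpness-aware tuning from plain gradient descent: the update direction is evaluated at the locally worst perturbation, so sharp minima (where that perturbation hurts a lot) are avoided.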
Problem

Research questions and friction points this paper is trying to address.

Pre-trained Vision-Language Models
Generalization on Unseen Data
Reducing Computational Cost
Innovation

Methods, ideas, or system contributions that make the work stand out.

TLLA
SAPT
STSS
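The STSS component keeps only those augmented test views whose loss landscapes are flat around the tuned prompt, so that the flat training minimum and the test landscape stay aligned. A minimal sketch of that selection idea, assuming a hypothetical `view @ prompt` product as a stand-in for the model's logits (the real pipeline would run the vision-language model):

```python
import numpy as np

rng = np.random.default_rng(0)

def entropy_loss(logits):
    # Prediction entropy of a single view, a common test-time objective.
    p = np.exp(logits - logits.max())
    p /= p.sum()
    return float(-(p * np.log(p + 1e-12)).sum())

def sharpness(prompt, view, radius=0.1, n_probes=8):
    # Estimated sharpness: worst-case loss increase when the prompt is
    # perturbed within a small ball (random probes instead of an exact
    # ascent step, to keep the sketch gradient-free).
    base = entropy_loss(view @ prompt)
    worst = base
    for _ in range(n_probes):
        d = rng.normal(size=prompt.shape)
        d *= radius / np.linalg.norm(d)
        worst = max(worst, entropy_loss(view @ (prompt + d)))
    return worst - base

def select_views(prompt, views, keep=0.5):
    # Keep the flattest fraction of augmented views (lowest sharpness),
    # mirroring the idea of aligning flat minima at test time.
    scores = [sharpness(prompt, v) for v in views]
    k = max(1, int(len(views) * keep))
    order = np.argsort(scores)
    return [views[i] for i in order[:k]]

prompt = rng.normal(size=16)
views = [rng.normal(size=(5, 16)) for _ in range(8)]  # 8 views, 5 classes
selected = select_views(prompt, views)
print(len(selected))  # 4 of 8 views retained
```

Because selection only requires forward evaluations of the loss under small perturbations, no backpropagation or parameter update is needed at test time, which is the source of the computational savings the paper reports.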
Aodi Li
University of Science and Technology of China, Hefei 230026, China
Liansheng Zhuang
University of Science and Technology of China
Computer Vision, Knowledge Graph, Computer Games
Xiao Long
University of Science and Technology of China | Alibaba Group
Knowledge Graph, Large Language Model, Reasoning
Minghong Yao
University of Science and Technology of China, Hefei 230026, China
Shafei Wang
Peng Cheng Laboratory, Shenzhen 518000, China