Noisy Test-Time Adaptation in Vision-Language Models

📅 2025-02-20

📈 Citations: 0

✨ Influential: 0


🤖 AI Summary

To address the degradation of test-time adaptation (TTA) performance caused by noisy samples, i.e., out-of-distribution (OOD) samples outside the in-distribution (ID) label space, under open-world distribution shifts, this paper introduces the zero-shot noisy test-time adaptation (ZS-NTTA) setting. It first identifies and characterizes the dominant negative impact of unfiltered noisy samples on TTA: their harm outweighs the benefit of clean samples during model updating. To mitigate this, it adopts a decoupled paradigm that freezes the pretrained classifier and restricts learning to a dedicated noise detector. Specifically, it proposes AdaND, an adaptive noise detector trained on pseudo-labels produced by the frozen vision-language model (VLM), with Gaussian noise injected during adaptation so that the detector does not misclassify clean samples as noisy when the incoming stream is clean. AdaND operates solely on unlabeled target-domain data and requires no source-domain data or other prior knowledge. On ImageNet, it improves harmonic-mean accuracy by 8.32% for ZS-NTTA and reduces FPR95 by 9.40% for zero-shot OOD detection compared with state-of-the-art methods, at a computational cost comparable to that of the frozen model. The code is publicly available.
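The decoupled paradigm (frozen classifier, learnable detector trained on the frozen model's pseudo-labels, Gaussian-noise injection) can be illustrated with a minimal sketch. Everything below is an assumption for illustration, not AdaND's actual implementation: features are taken as already L2-normalized VLM embeddings, the frozen classifier is plain cosine-similarity zero-shot classification, pseudo-labels come from thresholding a maximum-softmax score in the style of MCM, and the detector is a hypothetical linear logistic model updated by one gradient step per batch. The names `mcm_style_score`, `OnlineNoiseDetector`, `adapt_step`, `threshold`, `tau` and `noise_feats` (features of Gaussian-noise inputs, supplied by the caller) are hypothetical.

```python
import numpy as np


def mcm_style_score(img_feats, text_feats, tau=0.01):
    """Zero-shot ID-ness score: max softmax over image-text cosine similarities.

    Assumes rows of img_feats and text_feats are already L2-normalized, so the
    dot product is the cosine similarity. tau is an assumed temperature.
    """
    logits = img_feats @ text_feats.T / tau
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)
    return probs.max(axis=1)


class OnlineNoiseDetector:
    """Hypothetical linear detector: p(noisy | x) = sigmoid(w . x + b)."""

    def __init__(self, dim, lr=0.5):
        self.w = np.zeros(dim)
        self.b = 0.0
        self.lr = lr

    def prob_noisy(self, feats):
        return 1.0 / (1.0 + np.exp(-(feats @ self.w + self.b)))

    def update(self, feats, labels):
        # One gradient step on mean binary cross-entropy (1 = noisy, 0 = clean).
        err = self.prob_noisy(feats) - labels
        self.w -= self.lr * feats.T @ err / len(labels)
        self.b -= self.lr * err.mean()


def adapt_step(detector, img_feats, text_feats, threshold, noise_feats=None):
    """One test-time step: the zero-shot classifier stays frozen; only the detector learns."""
    preds = (img_feats @ text_feats.T).argmax(axis=1)  # frozen zero-shot classifier
    # Pseudo-labels from the frozen model: a low zero-shot score is treated as noisy.
    pseudo = (mcm_style_score(img_feats, text_feats) < threshold).astype(float)
    feats, labels = img_feats, pseudo
    if noise_feats is not None:
        # Features of Gaussian-noise inputs are always labelled noisy, so the detector
        # sees genuine noisy examples even when the incoming stream is clean.
        feats = np.vstack([feats, noise_feats])
        labels = np.concatenate([labels, np.ones(len(noise_feats))])
    detector.update(feats, labels)
    is_noisy = detector.prob_noisy(img_feats) > 0.5
    return preds, is_noisy
```

Because the classifier weights are never touched, a poor detector update can only affect which samples are flagged as noisy, not the ID predictions themselves; this mirrors the motivation for decoupling the two sub-tasks.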


📝 Abstract

Test-time adaptation (TTA) aims to address distribution shifts between source and target data by relying solely on target data during testing. In open-world scenarios, models often encounter noisy samples, i.e., samples outside the in-distribution (ID) label space. Leveraging the zero-shot capability of pre-trained vision-language models (VLMs), this paper introduces Zero-Shot Noisy TTA (ZS-NTTA), focusing on adapting the model to target data with noisy samples during test time in a zero-shot manner. We find that existing TTA methods underperform under ZS-NTTA, often lagging behind even the frozen model. We conduct comprehensive experiments to analyze this phenomenon, revealing that the negative impact of unfiltered noisy data outweighs the benefits of clean data during model updating. Moreover, adapting a single classifier for both ID classification and noise detection hampers both sub-tasks. Building on these findings, we propose a framework that decouples the classifier and detector, focusing on developing an individual detector while keeping the classifier frozen. Technically, we introduce the Adaptive Noise Detector (AdaND), which uses the frozen model's outputs as pseudo-labels to train a noise detector. To handle clean data streams, we further inject Gaussian noise during adaptation, preventing the detector from misclassifying clean samples as noisy. Beyond ZS-NTTA, AdaND can also improve the zero-shot out-of-distribution (ZS-OOD) detection ability of VLMs. Experiments show that AdaND outperforms existing methods in both ZS-NTTA and ZS-OOD detection. On ImageNet, AdaND achieves a notable improvement of $8.32\%$ in harmonic mean accuracy ($\text{Acc}_\text{H}$) for ZS-NTTA and $9.40\%$ in FPR95 for ZS-OOD detection, compared to SOTA methods. Importantly, AdaND is computationally efficient, with a cost comparable to that of the model-frozen method. The code is publicly available at: https://github.com/tmlr-group/ZS-NTTA.
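The two headline metrics can be made concrete with a short sketch. It assumes that $\text{Acc}_\text{H}$ is the harmonic mean of an accuracy measured on clean (ID) samples and an accuracy measured on noisy samples; the exact definitions of the two components are not stated here, so treat that split as an assumption. FPR95 follows its standard definition: the false-positive rate on OOD samples at the score threshold that accepts 95% of ID samples. The function names are hypothetical.

```python
import numpy as np


def harmonic_mean_accuracy(acc_id, acc_noise):
    """Harmonic mean of an ID-side accuracy and a noise-detection accuracy."""
    total = acc_id + acc_noise
    return 0.0 if total == 0 else 2 * acc_id * acc_noise / total


def fpr_at_95_tpr(id_scores, ood_scores):
    """FPR95 for scores where higher means more ID-like.

    The threshold keeps (about) 95% of ID samples; FPR95 is the fraction of
    OOD samples that still score at or above that threshold.
    """
    id_scores = np.asarray(id_scores, dtype=float)
    ood_scores = np.asarray(ood_scores, dtype=float)
    threshold = np.percentile(id_scores, 5)
    return float(np.mean(ood_scores >= threshold))
```

The harmonic mean punishes imbalance: with an ID accuracy of 0.9 and a noise-detection accuracy of 0.1, the arithmetic mean is 0.5 but `harmonic_mean_accuracy(0.9, 0.1)` is about 0.18, so a method cannot score well by excelling at only one sub-task. For FPR95, lower is better, which is why a "9.40% improvement" corresponds to a reduction.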

Problem

Research questions and friction points this paper is trying to address.

Address noisy test-time adaptation

Improve zero-shot OOD detection ability of VLMs

Decouple classifier and noise detector

Innovation

Methods, ideas, or system contributions that make the work stand out.

Zero-Shot Noisy TTA (ZS-NTTA) setting

Decouples classifier and detector

Adaptive Noise Detector (AdaND) with Gaussian noise injection

🔎 Similar Papers

Efficient Open Set Single Image Test Time Adaptation of Vision Language Models