🤖 AI Summary
Test-time adaptation (TTA) often suffers from severe performance degradation, sometimes even underperforming the original pre-trained model, under substantial distribution shifts. This is primarily due to misaligned inter-layer gradients that introduce noise and disrupt pre-trained representations when all layers are updated indiscriminately.
Method: We propose GALA, a gradient alignment-aware layer selection criterion that quantifies layer-wise trainability via gradient alignment analysis. GALA dynamically freezes poorly aligned layers and updates only those with high alignment; the same criterion also filters out samples that produce noisy gradients. The resulting framework is plug-and-play and agnostic to the underlying TTA loss function.
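The summary above does not spell out how gradient alignment is measured, so the following is only an illustrative sketch: it scores each layer by the cosine similarity between its current gradient and the previous step's gradient, and selects layers whose score exceeds a threshold. The function names (`select_layers`, `cosine`) and the use of consecutive-step gradients as the alignment reference are assumptions for illustration, not the paper's exact criterion.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two flattened gradient vectors."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0

def select_layers(prev_grads, curr_grads, threshold=0.0):
    """Return the names of layers whose current gradient is aligned with
    the previous step's gradient (cosine similarity above `threshold`).
    Layers below the threshold would be frozen for this update step.

    prev_grads, curr_grads: dict mapping layer name -> gradient array.
    """
    selected = []
    for name, g in curr_grads.items():
        score = cosine(prev_grads[name].ravel(), g.ravel())
        if score > threshold:
            selected.append(name)
    return selected
```

In a real TTA loop these per-layer gradients would come from the model's backward pass (e.g. iterating over named parameters), and the frozen layers would simply have their updates skipped for that step.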
Contribution/Results: GALA consistently improves accuracy across diverse datasets, model architectures, and domain shift types. It effectively mitigates negative transfer, enhances stability under severe distribution shifts, and demonstrates strong generalization, all without requiring architectural modifications or additional inference overhead.
📄 Abstract
Test-Time Adaptation (TTA) addresses the problem of distribution shift by adapting a pretrained model to a new domain during inference. When faced with challenging shifts, most methods collapse and perform worse than the original pretrained model. In this paper, we find that not all layers are equally receptive to adaptation, and that the layers with the most misaligned gradients often cause performance degradation. To address this, we propose GALA, a novel layer selection criterion that identifies the most beneficial updates to perform during test-time adaptation. This criterion can also filter out unreliable samples with noisy gradients. Its simplicity allows seamless integration with existing TTA loss functions, thereby preventing degradation and focusing adaptation on the most trainable layers. This approach also helps to regularize adaptation so as to preserve the pretrained features, which are crucial for handling unseen domains. Through extensive experiments, we demonstrate that the proposed layer selection framework improves the performance of existing TTA approaches across multiple datasets, domain shifts, model architectures, and TTA losses.
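The abstract also mentions filtering out unreliable samples with noisy gradients. Since the exact filtering rule is not given here, the sketch below shows one plausible reading, under stated assumptions: keep an exponential moving average (EMA) of the gradient direction across accepted samples, and reject a sample whose gradient disagrees with that reference. The class name `SampleFilter`, the EMA reference, and the threshold are all hypothetical choices, not the paper's confirmed mechanism.

```python
import numpy as np

class SampleFilter:
    """Flag test samples whose gradients look noisy (hypothetical criterion:
    low cosine similarity to an EMA of previously seen gradients)."""

    def __init__(self, momentum=0.9, threshold=0.0):
        self.momentum = momentum    # EMA decay for the reference gradient
        self.threshold = threshold  # minimum alignment to accept a sample
        self.ema = None             # running reference gradient direction

    def accept(self, grad):
        """Return True if this sample's gradient should be used for adaptation."""
        g = np.asarray(grad, dtype=float).ravel()
        if self.ema is None:
            # First sample initializes the reference and is accepted.
            self.ema = g.copy()
            return True
        denom = np.linalg.norm(self.ema) * np.linalg.norm(g)
        score = float(self.ema @ g / denom) if denom > 0 else 0.0
        # Update the reference regardless, so it tracks the gradient stream.
        self.ema = self.momentum * self.ema + (1 - self.momentum) * g
        return score > self.threshold
```

Rejected samples would simply be skipped (no parameter update), which is what keeps isolated noisy gradients from disrupting the pretrained representation.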