GeneralizeFormer: Layer-Adaptive Model Generation across Test-Time Distribution Shifts

📅 2025-02-15
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work addresses test-time adaptation (TTA), aiming to enable models to adapt in real time to diverse, unseen target-domain distributions without fine-tuning or online parameter updates, while preserving source-domain knowledge. To this end, we propose GeneralizeFormer—a lightweight meta-learning Transformer architecture that dynamically generates only BatchNorm layer parameters and classifier weights, thereby improving computational efficiency and enhancing source-feature retention. We further introduce a layer-wise gradient-aware mechanism to strengthen robustness against distribution shifts. Evaluated on six mainstream domain generalization benchmarks, GeneralizeFormer significantly outperforms existing TTA and domain generalization methods. It demonstrates strong efficacy in handling multiple heterogeneous target domains, dynamic environments, and continual distribution drift, establishing new state-of-the-art performance under realistic test-time adaptation settings.
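The core idea above, generating BatchNorm affine parameters and classifier weights per target batch instead of fine-tuning, can be sketched minimally. The paper uses a meta-learned transformer as the generator; the 2-layer MLP, its weight shapes, and all variable names below are stand-in assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

def generate_bn_params(batch_stats, W1, b1, W2, b2):
    """Hypothetical generator: maps per-channel target-batch statistics
    (mean, std) to BatchNorm affine parameters (gamma, beta).
    The paper's generator is a meta-learned transformer; a tiny MLP
    stands in here purely to show the data flow."""
    h = np.maximum(W1 @ batch_stats + b1, 0.0)      # ReLU hidden layer
    out = W2 @ h + b2
    gamma, beta = out[: len(out) // 2], out[len(out) // 2:]
    return 1.0 + gamma, beta                        # start near the identity transform

C = 4                                               # feature channels
x = rng.normal(2.0, 3.0, size=(16, C))              # one shifted target batch
stats = np.concatenate([x.mean(0), x.std(0)])       # conditioning input

# Random (untrained) generator weights; meta-learning would fit these.
W1, b1 = rng.normal(scale=0.01, size=(8, 2 * C)), np.zeros(8)
W2, b2 = rng.normal(scale=0.01, size=(2 * C, 8)), np.zeros(2 * C)

gamma, beta = generate_bn_params(stats, W1, b1, W2, b2)
x_norm = (x - x.mean(0)) / (x.std(0) + 1e-5)        # per-batch normalization
y = gamma * x_norm + beta                           # apply generated affine params
print(y.shape)                                      # (16, 4)
```

Note that the backbone features `x` are never modified in place and no gradient step touches the source model, which is why this style of adaptation preserves source-domain knowledge across batches.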

📝 Abstract
We consider the problem of test-time domain generalization, where a model is trained on several source domains and adjusted on target domains never seen during training. Unlike common methods that fine-tune the model or adjust the classifier parameters online, we propose to generate multiple layer parameters on the fly during inference with a lightweight meta-learned transformer, which we call GeneralizeFormer. The layer-wise parameters are generated per target batch without fine-tuning or online adjustment. This makes our method more effective in dynamic scenarios with multiple target distributions and avoids forgetting valuable source-distribution characteristics. Moreover, by considering layer-wise gradients, the method adapts itself to various distribution shifts. To reduce computational and time costs, we fix the convolutional parameters and generate only the parameters of the Batch Normalization layers and the linear classifier. Experiments on six widely used domain generalization datasets demonstrate the ability of the proposed method to efficiently handle various distribution shifts, generalize in dynamic scenarios, and avoid forgetting.
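The abstract's "layer-wise gradients" refer to conditioning the generator on gradient information from each layer. One minimal way to obtain such a signal, sketched below under stated assumptions, is the finite-difference gradient of an unsupervised entropy loss with respect to each BatchNorm scale parameter; the entropy loss, the finite-difference scheme, and all names here are illustrative substitutes, not the paper's exact mechanism.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def entropy_loss(logits):
    # Mean prediction entropy: a common unsupervised test-time objective.
    p = softmax(logits)
    return -(p * np.log(p + 1e-12)).sum(axis=1).mean()

def layerwise_gradient_signal(x, gamma, W, eps=1e-4):
    """Central-difference gradient of the entropy loss w.r.t. each BN
    scale parameter. The resulting vector is a per-layer descriptor a
    parameter generator could condition on (a stand-in for the paper's
    layer-wise gradient mechanism)."""
    grads = np.zeros_like(gamma)
    for i in range(len(gamma)):
        g_hi, g_lo = gamma.copy(), gamma.copy()
        g_hi[i] += eps
        g_lo[i] -= eps
        grads[i] = (entropy_loss((x * g_hi) @ W)
                    - entropy_loss((x * g_lo) @ W)) / (2 * eps)
    return grads

rng = np.random.default_rng(1)
x = rng.normal(size=(32, 6))        # normalized target-batch features
gamma = np.ones(6)                  # current BN scale parameters
W = rng.normal(size=(6, 5))         # frozen linear classifier
signal = layerwise_gradient_signal(x, gamma, W)
print(signal.shape)                 # (6,)
```

Because the gradient is only read as a descriptor and never applied as an update step, the frozen backbone and classifier weights remain untouched, consistent with the no-fine-tuning setting described above.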
Problem

Research questions and friction points this paper is trying to address.

Test-time domain generalization
Dynamic multiple target distributions
Layer-wise parameter generation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Layer-adaptive parameter generation
Lightweight meta-learned transformer
Batch Normalization parameter optimization