Exploiting Layer Normalization Fine-tuning in Visual Transformer Foundation Models for Classification

📅 2025-08-10
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work investigates the fine-tuning dynamics of LayerNorm in vision Transformers under data scarcity and domain shift. Recognizing that LayerNorm parameter changes implicitly encode source-to-target domain transfer characteristics, we propose a fine-grained adaptation mechanism: (i) a Fine-tuning Shift Ratio (FSR) to quantify LayerNorm’s domain shift magnitude; (ii) a learnable scalar λ for dynamic rescaling of LayerNorm outputs; and (iii) a cyclic fine-tuning framework enabling progressive domain adaptation. Evaluated on both in-distribution (ID) and out-of-distribution (OOD) classification tasks—including natural and medical imaging domains—our method significantly improves generalization. Notably, in OOD settings, low FSR strongly correlates with high λ, and fine-tuning on pathological images approaches ID-level performance. Our approach consistently outperforms standard fine-tuning and state-of-the-art domain adaptation methods.

📝 Abstract
LayerNorm is pivotal in Vision Transformers (ViTs), yet its fine-tuning dynamics under data scarcity and domain shifts remain underexplored. This paper shows that shifts in LayerNorm parameters after fine-tuning (LayerNorm shifts) are indicative of the transitions between source and target domains; their efficacy is contingent upon the degree to which the target training samples accurately represent the target domain, as quantified by our proposed Fine-tuning Shift Ratio ($FSR$). Building on this, we propose a simple yet effective rescaling mechanism using a scalar $\lambda$ that is negatively correlated with $FSR$ to align learned LayerNorm shifts with the ideal shifts achieved under fully representative data, combined with a cyclic framework that further enhances LayerNorm fine-tuning. Extensive experiments across natural and pathological images, in both in-distribution (ID) and out-of-distribution (OOD) settings, and across various target training sample regimes validate our framework. Notably, OOD tasks tend to yield lower $FSR$ and higher $\lambda$ than ID cases, especially with scarce data, indicating under-represented target training samples. Moreover, ViT foundation models (ViTFs) fine-tuned on pathological data behave more like ID settings, favoring conservative LayerNorm updates. Our findings illuminate the underexplored dynamics of LayerNorm in transfer learning and provide practical strategies for LayerNorm fine-tuning.
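The two core mechanisms described above can be sketched in code. The following is a minimal PyTorch illustration, not the authors' implementation: the paper does not publish its exact formulas here, so the norm-ratio definition of `layernorm_shift_ratio`, the output-side placement of `lam` in `RescaledLayerNorm`, and all names are assumptions for illustration only.

```python
import torch
import torch.nn as nn


def layernorm_shift_ratio(pretrained_ln: nn.LayerNorm,
                          finetuned_ln: nn.LayerNorm) -> float:
    """Quantify how far LayerNorm affine parameters moved during fine-tuning.

    One plausible reading of the Fine-tuning Shift Ratio (FSR): the norm of
    the parameter shift relative to the norm of the pretrained parameters.
    """
    shift_sq, base_sq = 0.0, 0.0
    for p0, p1 in zip(pretrained_ln.parameters(), finetuned_ln.parameters()):
        shift_sq += (p1 - p0).norm().item() ** 2  # squared shift magnitude
        base_sq += p0.norm().item() ** 2          # squared pretrained magnitude
    return (shift_sq ** 0.5) / (base_sq ** 0.5 + 1e-12)


class RescaledLayerNorm(nn.Module):
    """LayerNorm whose output is rescaled by a learnable scalar lambda."""

    def __init__(self, dim: int):
        super().__init__()
        self.ln = nn.LayerNorm(dim)
        self.lam = nn.Parameter(torch.ones(1))  # the paper's scalar lambda

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.lam * self.ln(x)
```

Under this reading, a low FSR (small learned shift, e.g. under-represented OOD training samples) would be compensated by learning a larger `lam`, matching the negative correlation the abstract reports.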
Problem

Research questions and friction points this paper is trying to address.

Study LayerNorm fine-tuning dynamics in ViTs under data scarcity and domain shift
Propose a rescaling mechanism to align learned LayerNorm shifts with ideal shifts
Validate the framework in ID and OOD settings across varied data regimes
Innovation

Methods, ideas, or system contributions that make the work stand out.

Rescaling mechanism with a learnable scalar λ for LayerNorm shifts
Cyclic framework that enhances LayerNorm fine-tuning
Fine-tuning Shift Ratio (FSR) to quantify domain shift magnitude
Zhaorui Tan
University of Liverpool, PhD student
Generalization · Text-to-Image · Generative models
Tan Pan
Fudan University
Computer Vision · AI4Science · Self-supervised Learning
Kaizhu Huang
Professor, Duke Kunshan University
Generalization & Robustness · Statistical Learning Theory · Trustworthy AI
Weimiao Yu
BII, A*STAR
Kai Yao
Zhejiang University
Chen Jiang
Shanghai Academy of Artificial Intelligence for Science, AI³; Fudan University
Qiufeng Wang
Xi’an Jiaotong-Liverpool University
Anh Nguyen
University of Liverpool
Xin Guo
Shanghai Academy of Artificial Intelligence for Science, AI³; Fudan University
Yuan Cheng
Shanghai Academy of Artificial Intelligence for Science, AI³; Fudan University
Xi Yang
Xi’an Jiaotong-Liverpool University