Rethinking LayerNorm in Image Restoration Transformers

📅 2025-04-09
🤖 AI Summary
This work addresses two abnormal feature behaviors in image restoration Transformers: excessively low feature entropy and severe magnitude divergence (reaching a scale of 10⁶). We trace the root cause to token-wise LayerNorm, which disrupts spatial correlations and the internal consistency of feature statistics. To resolve this, we propose Global LayerNorm, a normalization that operates jointly across the H×W×C dimensions and thereby preserves spatial structure. We also introduce an input-adaptive feature rescaling mechanism that dynamically aligns normalization parameters with input-specific statistics. Theoretical analysis and experiments indicate that the approach stabilizes feature entropy, suppresses magnitude divergence, and improves training robustness. It delivers consistent performance gains across image restoration tasks, including denoising, super-resolution, and deblurring.

📝 Abstract
This work investigates abnormal feature behaviors observed in image restoration (IR) Transformers. Specifically, we identify two critical issues: feature entropy becoming excessively small and feature magnitudes diverging up to a million-fold scale. We trace the root cause to the per-token normalization of conventional LayerNorm, which disrupts essential spatial correlations and internal feature statistics. To address this, we propose a simple normalization strategy tailored for IR Transformers. Our approach applies normalization across the entire spatio-channel dimension, which preserves spatial correlations. Additionally, we introduce an input-adaptive rescaling method that aligns feature statistics to the statistical requirements of each input. Experimental results verify that this combined strategy resolves feature divergence and improves both the stability and performance of IR Transformers across various IR tasks.
Problem

Research questions and friction points this paper is trying to address.

Addresses abnormal feature behaviors in image restoration Transformers
Resolves excessively low feature entropy and feature magnitude divergence
Improves stability and performance in IR tasks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Normalization across spatio-channel dimension
Input-adaptive rescaling method
Preserves spatial correlations effectively
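
The contrast between per-token LayerNorm and normalization over the full spatio-channel dimension can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: the function names, the (H, W, C) layout, the eps value, and the toy input are all assumptions, and both the learnable affine parameters and the input-adaptive rescaling are omitted.

```python
import numpy as np


def per_token_layernorm(x, eps=1e-6):
    """Conventional LayerNorm: each token (spatial position) is
    normalized independently over its C channels. x has shape (H, W, C)."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


def global_layernorm(x, eps=1e-6):
    """Hypothetical sketch of spatio-channel normalization: a single mean
    and variance are computed jointly over all H*W*C elements."""
    mean = x.mean()
    var = x.var()
    return (x - mean) / np.sqrt(var + eps)


# Toy feature map (assumed sizes) with one token of much larger magnitude.
rng = np.random.default_rng(0)
H, W, C = 4, 4, 8
x = rng.normal(size=(H, W, C))
x[0, 0] *= 100.0

y_tok = per_token_layernorm(x)
y_glob = global_layernorm(x)

# Per-token L2 norms after each normalization, flattened to length H*W.
tok_norms = np.linalg.norm(y_tok, axis=-1).ravel()
glob_norms = np.linalg.norm(y_glob, axis=-1).ravel()

print("per-token norms, min/max:", tok_norms.min(), tok_norms.max())
print("global norms, scaled token vs. largest other:",
      glob_norms[0], glob_norms[1:].max())
```

In this toy input, per-token normalization gives every token nearly the same norm (about sqrt(C)), so the relative magnitude between spatial positions is discarded. The global variant keeps the scaled token clearly distinguishable from its neighbors, which is one way to read the claim that spatial correlations are preserved.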