🤖 AI Summary
To address representation redundancy in Transformer-based multi-head attention (MHA) caused by uniform subspace partitioning, this paper proposes Hierarchical Multi-Head Attention (HMHA) and Query-Key Cache Updating (QKCU), which together form the HINT model for image restoration. HMHA introduces heterogeneous subspace partitioning across attention heads, enabling complementary feature learning. QKCU applies both intra- and inter-layer cache updating schemes to further reduce inter-head redundancy. The framework unifies five low-level vision tasks—low-light enhancement, dehazing, desnowing, denoising, and deraining—and achieves state-of-the-art performance across 12 benchmark datasets. Source code is publicly available.
📝 Abstract
Transformer-based approaches have gained significant attention in image restoration, where the core component, i.e., Multi-Head Attention (MHA), plays a crucial role in capturing diverse features and recovering high-quality results. In MHA, heads compute attention independently over uniformly split subspaces, which triggers a redundancy issue that hinders the model from achieving satisfactory outputs. In this paper, we propose to improve MHA by exploring diverse learners and introducing various interactions between heads, which results in a Hierarchical multI-head atteNtion driven Transformer model, termed HINT, for image restoration. HINT contains two modules, i.e., the Hierarchical Multi-Head Attention (HMHA) module and the Query-Key Cache Updating (QKCU) module, to address the redundancy problem rooted in vanilla MHA. Specifically, HMHA extracts diverse contextual features by employing heads to learn from subspaces of varying sizes that contain different information. Moreover, QKCU, comprising intra- and inter-layer schemes, further reduces redundancy by facilitating enhanced interactions between attention heads within and across layers. Extensive experiments are conducted on 12 benchmarks across 5 image restoration tasks, including low-light enhancement, dehazing, desnowing, denoising, and deraining, to demonstrate the superiority of HINT. The source code is available in the supplementary materials.
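To make the contrast with vanilla MHA concrete, the following is a minimal NumPy sketch of the heterogeneous-subspace idea: instead of splitting the channel dimension into equal parts, each head attends over a differently sized slice. This is an illustrative assumption based only on the abstract's description, not the paper's actual HMHA implementation (which also involves projections and the QKCU interactions omitted here).

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def hierarchical_mha(x, head_dims):
    """Toy attention with unequal per-head subspaces (hypothetical sketch).

    x: (seq_len, channels) features; head_dims: subspace size per head,
    summing to channels. Vanilla MHA would use equal head_dims.
    """
    assert sum(head_dims) == x.shape[-1]
    outs, start = [], 0
    for d in head_dims:
        sub = x[:, start:start + d]                  # this head's channel slice
        scores = softmax(sub @ sub.T / np.sqrt(d))   # scaled dot-product attention
        outs.append(scores @ sub)                    # attended features
        start += d
    return np.concatenate(outs, axis=-1)             # recombine the subspaces

x = np.random.default_rng(0).standard_normal((5, 12))
y = hierarchical_mha(x, head_dims=[2, 4, 6])  # heterogeneous splits
print(y.shape)  # (5, 12)
```

Because each head sees a slice of a different width, the heads are forced to model features at different granularities, which is one plausible reading of how HMHA encourages complementary rather than redundant learners.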