🤖 AI Summary
Transformers face scalability limits in high-resolution image restoration because self-attention has quadratic computational complexity. Sparse and windowed attention mechanisms reduce this cost but compromise global context modeling, while linear attention offers linear complexity and an inherent global receptive field yet suffers a severe performance drop caused by the low-rank nature of its attention map. To address this, we propose Rank-Enhanced Linear Attention (RELA), which mitigates low-rank degeneration by enriching feature representations with a lightweight depthwise convolution. Building on RELA, we design LAformer, an architecture free of softmax normalization and window shifting that integrates linear attention, channel attention, and a convolutional gated feed-forward network. Evaluated across seven image restoration tasks and 21 benchmarks, LAformer consistently surpasses state-of-the-art methods, delivering superior restoration quality with significantly lower computational overhead and enabling efficient end-to-end processing of high-resolution images.
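As a rough illustration of the mechanism summarized above, the sketch below pairs kernel-based linear attention with a lightweight depthwise-convolution branch. The module name, the ELU+1 feature map, the single-head layout, and the placement of the depthwise convolution on V are illustrative assumptions; the paper's abstract only states that RELA enriches feature representations with a lightweight depthwise convolution.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RankEnhancedLinearAttention(nn.Module):
    """Minimal sketch of rank-enhanced linear attention (assumed form).

    Not the paper's exact design: the kernel feature map, head layout,
    and where the depthwise convolution is applied are assumptions.
    """

    def __init__(self, dim: int, conv_kernel: int = 3):
        super().__init__()
        self.to_qkv = nn.Conv2d(dim, dim * 3, kernel_size=1, bias=False)
        # Lightweight depthwise convolution used to enrich the rank of the output.
        self.dwconv = nn.Conv2d(dim, dim, kernel_size=conv_kernel,
                                padding=conv_kernel // 2, groups=dim)
        self.proj = nn.Conv2d(dim, dim, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        q, k, v = self.to_qkv(x).chunk(3, dim=1)           # each (b, c, h, w)

        # Flatten spatial dims into n = h * w tokens: (b, c, n).
        q = q.flatten(2)
        k = k.flatten(2)
        v_flat = v.flatten(2)

        # Non-negative kernel feature map replaces softmax (assumed: ELU + 1).
        q = F.elu(q) + 1.0
        k = F.elu(k) + 1.0

        # Linear attention: form (K V^T) first, so cost is O(n * c^2), not O(n^2 * c).
        kv = torch.einsum('bcn,bdn->bcd', k, v_flat)        # (b, c, c)
        z = torch.einsum('bcn,bc->bn', q, k.sum(dim=-1))    # normalizer, (b, n)
        out = torch.einsum('bcd,bcn->bdn', kv, q) / (z.unsqueeze(1) + 1e-6)
        out = out.reshape(b, c, h, w)

        # Rank enhancement (assumed placement): add a depthwise-conv branch over V.
        out = out + self.dwconv(v)
        return self.proj(out)
```

Associating K with V before multiplying by Q is what gives linear attention its linear cost in the number of pixels, which is why no softmax over an N x N attention map is needed.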
📝 Abstract
Transformer-based models have made remarkable progress in image restoration (IR) tasks. However, the quadratic complexity of self-attention in Transformers hinders their applicability to high-resolution images. Existing methods mitigate this issue with sparse or window-based attention, but doing so inherently limits global context modeling. Linear attention, a variant of softmax attention, demonstrates promise in global context modeling while maintaining linear complexity, offering a potential solution to this challenge. Despite its efficiency benefits, vanilla linear attention suffers a significant performance drop in IR, largely due to the low-rank nature of its attention map. To counter this, we propose Rank-Enhanced Linear Attention (RELA), a simple yet effective method that enriches feature representations by integrating a lightweight depthwise convolution. Building upon RELA, we propose an efficient and effective image restoration Transformer, named LAformer. LAformer achieves effective global perception by integrating linear attention and channel attention, while enhancing local fitting capability through a convolutional gated feed-forward network. Notably, LAformer eliminates hardware-inefficient operations such as softmax and window shifting, enabling efficient processing of high-resolution images. Extensive experiments across 7 IR tasks and 21 benchmarks demonstrate that LAformer outperforms SOTA methods and offers significant computational advantages.
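The abstract describes LAformer's block as pairing global perception (linear attention plus channel attention) with a convolutional gated feed-forward network for local fitting. The sketch below is a hypothetical composition of the latter two components; the squeeze-and-excitation form of channel attention, the GELU gating, the expansion ratio, the normalization choice, and the residual ordering are all assumptions, and the rank-enhanced linear attention branch from the earlier sketch is omitted for brevity.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ChannelAttention(nn.Module):
    """Squeeze-and-excitation style channel attention (assumed form)."""

    def __init__(self, dim: int, reduction: int = 4):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),                 # global spatial pooling
            nn.Conv2d(dim, dim // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(dim // reduction, dim, 1),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x * self.mlp(x)                       # per-channel reweighting

class ConvGatedFeedForward(nn.Module):
    """Gated feed-forward network with a depthwise 3x3 convolution (assumed form)."""

    def __init__(self, dim: int, expansion: int = 2):
        super().__init__()
        hidden = dim * expansion
        self.proj_in = nn.Conv2d(dim, hidden * 2, kernel_size=1)
        self.dwconv = nn.Conv2d(hidden * 2, hidden * 2, kernel_size=3,
                                padding=1, groups=hidden * 2)
        self.proj_out = nn.Conv2d(hidden, dim, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        gate, value = self.dwconv(self.proj_in(x)).chunk(2, dim=1)
        # The activated branch gates the other branch elementwise.
        return self.proj_out(F.gelu(gate) * value)

class LAformerBlockSketch(nn.Module):
    """Hypothetical residual composition of channel attention and the gated FFN."""

    def __init__(self, dim: int):
        super().__init__()
        self.norm1 = nn.GroupNorm(1, dim)            # LayerNorm over channels
        self.ca = ChannelAttention(dim)
        self.norm2 = nn.GroupNorm(1, dim)
        self.ffn = ConvGatedFeedForward(dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = x + self.ca(self.norm1(x))
        x = x + self.ffn(self.norm2(x))
        return x
```

Note that every operation here reduces to convolutions and elementwise products, with no softmax or window shifting, which is consistent with the abstract's emphasis on hardware-efficient processing of high-resolution inputs.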