LinearSR: Unlocking Linear Attention for Stable and Efficient Image Super-Resolution

📅 2025-10-09
📈 Citations: 0
Influential: 0
🤖 AI Summary
In generative image super-resolution (SR), the O(N²) complexity of self-attention severely hampers inference efficiency. To address this, we present the first stable and efficient application of linear attention to photorealistic SR. Our method introduces three key innovations: (1) Early-Stopping-Guided Fine-tuning (ESGF), which mitigates training instability inherent in linear attention; (2) a signal-to-noise-ratio (SNR)-driven Mixture-of-Experts (MoE) architecture that explicitly models the perception-distortion trade-off; and (3) a lightweight, accuracy-oriented TAG-guided generation paradigm that enhances single-step output fidelity. Experiments demonstrate that our approach achieves state-of-the-art inference speed under single-step forward diffusion, maintains competitive runtime in multi-step settings, and significantly surpasses prior methods in perceptual quality. This work establishes a practical pathway for deploying linear attention in high-fidelity generative SR.

📝 Abstract
Generative models for Image Super-Resolution (SR) are increasingly powerful, yet their reliance on self-attention's quadratic complexity (O(N²)) creates a major computational bottleneck. Linear Attention offers an O(N) solution, but its promise for photorealistic SR has remained largely untapped, historically hindered by a cascade of interrelated and previously unsolved challenges. This paper introduces LinearSR, a holistic framework that, for the first time, systematically overcomes these critical hurdles. Specifically, we resolve a fundamental training instability that causes catastrophic model divergence using our novel "knee point"-based Early-Stopping Guided Fine-tuning (ESGF) strategy. Furthermore, we mitigate the classic perception-distortion trade-off with a dedicated SNR-based Mixture of Experts (MoE) architecture. Finally, we establish an effective and lightweight guidance paradigm, TAG, derived from our "precision-over-volume" principle. Our resulting LinearSR model simultaneously delivers state-of-the-art perceptual quality with exceptional efficiency. Its core diffusion forward pass (1-NFE) achieves SOTA-level speed, while its overall multi-step inference time remains highly competitive. This work provides the first robust methodology for applying Linear Attention in the photorealistic SR domain, establishing a foundational paradigm for future research in efficient generative super-resolution.
Problem

Research questions and friction points this paper is trying to address.

Overcoming quadratic complexity bottleneck in super-resolution models
Resolving training instability in linear attention implementations
Mitigating perception-distortion trade-off in efficient image enhancement
Innovation

Methods, ideas, or system contributions that make the work stand out.

Linear Attention for O(N) computational efficiency
Knee point-based ESGF strategy for training stability
SNR-based MoE architecture to mitigate perception-distortion trade-off
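The O(N) claim rests on the standard linear-attention kernel trick: replacing the softmax with a feature map φ makes the attention product associative, so the d×d matrix KᵀV can be aggregated once instead of materializing the N×N score matrix. A minimal NumPy sketch of this idea is below; the feature map φ(x) = elu(x) + 1 is a common choice from the linear-attention literature, not necessarily the one used in LinearSR.

```python
import numpy as np

def linear_attention(Q, K, V, eps=1e-6):
    """Kernelized (linear) attention in O(N * d^2) time.

    Assumes phi(x) = elu(x) + 1 as the feature map -- a common
    convention, not confirmed to be LinearSR's exact choice.
    """
    phi = lambda x: np.where(x > 0, x + 1.0, np.exp(x))  # elu(x) + 1
    Qp, Kp = phi(Q), phi(K)
    KV = Kp.T @ V                    # (d, d): aggregate keys/values once
    Z = Qp @ Kp.sum(axis=0) + eps    # (N,): per-query normalizer
    return (Qp @ KV) / Z[:, None]    # never forms the N x N matrix

rng = np.random.default_rng(0)
N, d = 256, 16                       # sequence length, head dimension
Q, K, V = rng.normal(size=(3, N, d))
out = linear_attention(Q, K, V)
print(out.shape)                     # (256, 16)
```

Because KV and the normalizer are fixed-size in N, both time and memory scale linearly with the number of tokens, which is what makes attention affordable at SR-scale spatial resolutions.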
👥 Authors
Xiaohui Li — Shanghai Jiao Tong University
Shaobin Zhuang — Shanghai Jiao Tong University (Video Generation, Computer Vision)
Shuo Cao — University of Science and Technology of China
Yang Yang — The Australian National University
Yuandong Pu — SJTU, Shanghai AI Laboratory (Computer Vision)
Qi Qin — Shanghai Artificial Intelligence Laboratory
Siqi Luo — Shanghai Jiao Tong University (AIGC, Computer Vision, Image Editing, AI4Science)
Bin Fu — Shanghai Artificial Intelligence Laboratory
Yihao Liu — Shanghai Artificial Intelligence Laboratory