MambaVSR: Content-Aware Scanning State Space Model for Video Super-Resolution

📅 2025-06-13
📈 Citations: 0
Influential: 0
🤖 AI Summary
Video super-resolution (VSR) faces the dual challenge of modeling non-local spatiotemporal dependencies across frames while maintaining computational efficiency—particularly under large motion displacements and long video sequences, where optical-flow-based methods and Transformers exhibit limitations. This paper proposes the first SSM-based VSR framework. Its core contributions are: (1) Shared Compass Construction (SCC) and Content-Aware Sequentialization (CAS), which replace Mamba's rigid 1D scanning with dynamic spatiotemporal interaction; and (2) a Global-Local State Space Block (GLSSB) that synergistically integrates windowed self-attention with SSM-driven feature propagation. On the REDS dataset, the method achieves a 0.58 dB PSNR gain over state-of-the-art Transformer-based approaches while reducing parameter count by 55%, significantly improving both reconstruction accuracy and efficiency for videos with large motions and long temporal extents.

📝 Abstract
Video super-resolution (VSR) faces critical challenges in effectively modeling non-local dependencies across misaligned frames while preserving computational efficiency. Existing VSR methods typically rely on optical flow strategies or transformer architectures, which struggle with large motion displacements and long video sequences. To address this, we propose MambaVSR, the first state-space model framework for VSR that incorporates an innovative content-aware scanning mechanism. Unlike rigid 1D sequential processing in conventional vision Mamba methods, our MambaVSR enables dynamic spatiotemporal interactions through the Shared Compass Construction (SCC) and the Content-Aware Sequentialization (CAS). Specifically, the SCC module constructs intra-frame semantic connectivity graphs via efficient sparse attention and generates adaptive spatial scanning sequences through spectral clustering. Building upon SCC, the CAS module effectively aligns and aggregates non-local similar content across multiple frames by interleaving temporal features along the learned spatial order. To bridge global dependencies with local details, the Global-Local State Space Block (GLSSB) synergistically integrates window self-attention operations with SSM-based feature propagation, enabling high-frequency detail recovery under global dependency guidance. Extensive experiments validate MambaVSR's superiority, outperforming the Transformer-based method by 0.58 dB PSNR on the REDS dataset with 55% fewer parameters.
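The abstract's SCC idea—build a semantic connectivity graph over pixels and derive an adaptive scan order from it—can be illustrated with a minimal numpy sketch. This is not the authors' implementation: the function name is hypothetical, and a simple Fiedler-vector ordering of a k-nearest-neighbour similarity graph stands in for the paper's sparse attention and spectral clustering.

```python
import numpy as np

def content_aware_scan_order(feat, k=8):
    """Hypothetical sketch: derive a content-aware 1D scan order for an
    H x W x C feature map by sorting pixels along the Fiedler vector of a
    k-nearest-neighbour similarity graph (a simple spectral ordering)."""
    H, W, C = feat.shape
    X = feat.reshape(H * W, C)
    X = X / (np.linalg.norm(X, axis=1, keepdims=True) + 1e-8)
    sim = X @ X.T                        # cosine similarity between pixels
    # Sparsify: keep only each pixel's k most similar neighbours
    # (a crude stand-in for the paper's efficient sparse attention).
    A = np.zeros_like(sim)
    idx = np.argsort(-sim, axis=1)[:, 1:k + 1]
    rows = np.arange(H * W)[:, None]
    A[rows, idx] = sim[rows, idx]
    A = np.maximum(A, A.T)               # symmetrise the graph
    # Unnormalised graph Laplacian L = D - A.
    L = np.diag(A.sum(axis=1)) - A
    # Fiedler vector: eigenvector of the second-smallest eigenvalue;
    # sorting along it groups semantically similar pixels together.
    vals, vecs = np.linalg.eigh(L)
    fiedler = vecs[:, 1]
    return np.argsort(fiedler)           # pixel indices in scan order

order = content_aware_scan_order(np.random.rand(8, 8, 16))
print(order.shape)  # (64,) — a permutation of the 64 pixel positions
```

The returned permutation replaces the raster scan that conventional vision Mamba models use to serialize a frame, so the SSM visits similar content consecutively rather than in rigid row-major order.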
Problem

Research questions and friction points this paper is trying to address.

Modeling non-local dependencies in video super-resolution efficiently
Addressing large motion displacements in long video sequences
Enhancing spatiotemporal interactions with a content-aware scanning mechanism
Innovation

Methods, ideas, or system contributions that make the work stand out.

Content-aware scanning for dynamic spatiotemporal interactions
Shared Compass Construction for adaptive spatial scanning
Global-Local State Space Block integrating window self-attention with SSM-based feature propagation
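The CAS step described above—aggregating non-local similar content across frames by interleaving temporal features along the learned spatial order—can be sketched as follows. This is an illustrative assumption about the interleaving, not the paper's actual code; the function name is hypothetical.

```python
import numpy as np

def interleave_frames(frames, order):
    """Hypothetical sketch of content-aware sequentialization: flatten each
    frame along the shared scan order, then interleave the frames position by
    position so that corresponding (often similar) content from all frames
    sits adjacently in the 1D sequence fed to the SSM."""
    T, H, W, C = frames.shape
    # Reorder every frame's pixels by the shared scan order: (T, H*W, C).
    flat = frames.reshape(T, H * W, C)[:, order]
    # Interleave across time: t0 p0, t1 p0, ..., t0 p1, t1 p1, ...
    return flat.transpose(1, 0, 2).reshape(T * H * W, C)

# Toy example: two 2x2 single-channel frames, identity scan order.
frames = np.arange(8, dtype=float).reshape(2, 2, 2, 1)
seq = interleave_frames(frames, np.array([0, 1, 2, 3]))
print(seq.flatten().tolist())  # [0.0, 4.0, 1.0, 5.0, 2.0, 6.0, 3.0, 7.0]
```

Because the same scan order (the "shared compass") is applied to every frame, temporally neighbouring entries in the sequence correspond to the same semantic region, which is what lets the SSM align and aggregate content without explicit optical flow.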
Linfeng He
Institute of Information Science, Beijing Jiaotong University; Visual Intelligence + X International Cooperation Joint Laboratory of MOE, Beijing Jiaotong University
Meiqin Liu
Zhejiang University
Control Theory and Control Engineering
Qi Tang
Computational Science and Engineering, Georgia Institute of Technology
High Performance Computing; Applied Mathematics; Plasma Physics; Scientific Machine Learning
Chao Yao
Northwestern Polytechnical University
Yao Zhao
Institute of Information Science, Beijing Jiaotong University; Visual Intelligence + X International Cooperation Joint Laboratory of MOE, Beijing Jiaotong University