Gather-Scatter Mamba: Accelerating Propagation with Efficient State Space Model

📅 2025-10-01
📈 Citations: 0
Influential: 0
🤖 AI Summary
Traditional recurrent video super-resolution (VSR) methods suffer from gradient vanishing and poor parallelism, while causal Mamba-based models are inherently limited in modeling fine-grained spatial dependencies. To address these issues, we propose an efficient hybrid spatiotemporal modeling architecture. Our approach features: (1) a Gather-Scatter Mamba mechanism that aligns neighboring frame features to the central frame within a temporal window before aggregation and scattering, mitigating occlusion artifacts and enhancing feature redistribution; and (2) integration of shifted-window self-attention to explicitly capture local spatial dependencies, compensating for Mamba’s structural constraints. The architecture retains linear time complexity while enabling precise spatiotemporal feature propagation. Extensive experiments demonstrate state-of-the-art performance on multiple VSR benchmarks, along with significantly accelerated inference—effectively balancing accuracy and efficiency.

📝 Abstract
State Space Models (SSMs), most notably RNNs, have historically played a central role in sequential modeling. Although attention mechanisms such as Transformers have since dominated due to their ability to model global context, their quadratic complexity and limited scalability make them less suited for long sequences. Video super-resolution (VSR) methods have traditionally relied on recurrent architectures to propagate features across frames. However, such approaches suffer from well-known issues including vanishing gradients, lack of parallelism, and slow inference speed. Recent advances in selective SSMs like Mamba offer a compelling alternative: by enabling input-dependent state transitions with linear-time complexity, Mamba mitigates these issues while maintaining strong long-range modeling capabilities. Despite this potential, Mamba alone struggles to capture fine-grained spatial dependencies due to its causal nature and lack of explicit context aggregation. To address this, we propose a hybrid architecture that combines shifted window self-attention for spatial context aggregation with Mamba-based selective scanning for efficient temporal propagation. Furthermore, we introduce Gather-Scatter Mamba (GSM), an alignment-aware mechanism that warps features toward a center anchor frame within the temporal window before Mamba propagation and scatters them back afterward, effectively reducing occlusion artifacts and ensuring effective redistribution of aggregated information across all frames. The official implementation is provided at: https://github.com/Ko-Lani/GSMamba.
Problem

Research questions and friction points this paper is trying to address.

Accelerating video super-resolution with efficient state space models
Overcoming vanishing gradients and slow inference in recurrent architectures
Enhancing spatial-temporal modeling with hybrid attention and Mamba mechanisms
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hybrid architecture combining self-attention and Mamba
Gather-Scatter mechanism warping features around anchor frame
Selective scanning with linear-time complexity for propagation
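The gather-propagate-scatter flow described above can be sketched in a few lines of NumPy. This is an illustrative toy under stated assumptions, not the paper's implementation: `np.roll` with whole-pixel shifts stands in for optical-flow warping, a causal running mean stands in for Mamba's selective scan along time, and the `gather_scatter` function and `shifts` argument are hypothetical names.

```python
import numpy as np

def warp(feat, shift):
    # Whole-pixel shift as a stand-in for flow-based warping
    # (the paper aligns features with optical flow; this is illustrative only).
    return np.roll(feat, shift, axis=(-2, -1))

def gather_scatter(window, shifts):
    """window: (T, C, H, W) frame features; shifts: per-frame (dy, dx)
    alignment toward the center anchor frame. Gather -> temporal mix -> scatter."""
    T = window.shape[0]
    # Gather: align every frame's features to the anchor frame.
    gathered = np.stack([warp(window[t], shifts[t]) for t in range(T)])
    # Stand-in for Mamba's selective scan along time: a causal running mean.
    mixed = np.cumsum(gathered, axis=0) / np.arange(1, T + 1).reshape(-1, 1, 1, 1)
    # Scatter: invert each frame's alignment shift to redistribute the
    # aggregated features back to their original positions.
    return np.stack([warp(mixed[t], tuple(-s for s in shifts[t])) for t in range(T)])

window = np.random.rand(5, 4, 8, 8)          # temporal window of 5 frames
shifts = [(1, 0), (0, 1), (0, 0), (0, -1), (-1, 0)]  # center frame needs no shift
out = gather_scatter(window, shifts)
print(out.shape)  # (5, 4, 8, 8)
```

Because gather and scatter use mutually inverse warps, each frame receives temporally aggregated information at its own spatial coordinates, which is the redistribution property the GSM mechanism relies on.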
Hyun-kyu Ko
Sungkyunkwan University
Computer Vision
Youbin Kim
Department of Artificial Intelligence, Sungkyunkwan University
Jihyeon Park
Department of Electrical and Computer Engineering, Sungkyunkwan University
Dongheok Park
Sungkyunkwan University
Artificial Intelligence, Computer Vision
Gyeongjin Kang
Sungkyunkwan University
Computer Vision/Graphics
Wonjun Cho
Hanwha Systems, Republic of Korea
Hyung Yi
Hanwha Systems, Republic of Korea
Eunbyung Park
Yonsei University
Computer Vision, Machine Learning, Deep Learning