🤖 AI Summary
Traditional recurrent video super-resolution (VSR) methods suffer from gradient vanishing and poor parallelism, while causal Mamba-based models are inherently limited in modeling fine-grained spatial dependencies. To address these issues, we propose an efficient hybrid spatiotemporal modeling architecture. Our approach features: (1) a Gather-Scatter Mamba mechanism that aligns neighboring frame features to the central frame within a temporal window before aggregation and scattering, mitigating occlusion artifacts and enhancing feature redistribution; and (2) integration of shifted-window self-attention to explicitly capture local spatial dependencies, compensating for Mamba’s structural constraints. The architecture retains linear time complexity while enabling precise spatiotemporal feature propagation. Extensive experiments demonstrate state-of-the-art performance on multiple VSR benchmarks, along with significantly accelerated inference—effectively balancing accuracy and efficiency.
📝 Abstract
State Space Models (SSMs), most notably RNNs, have historically played a central role in sequential modeling. Although attention mechanisms such as Transformers have since dominated due to their ability to model global context, their quadratic complexity and limited scalability make them less suited for long sequences. Video super-resolution (VSR) methods have traditionally relied on recurrent architectures to propagate features across frames. However, such approaches suffer from well-known issues including vanishing gradients, lack of parallelism, and slow inference speed. Recent advances in selective SSMs like Mamba offer a compelling alternative: by enabling input-dependent state transitions with linear-time complexity, Mamba mitigates these issues while maintaining strong long-range modeling capabilities. Despite this potential, Mamba alone struggles to capture fine-grained spatial dependencies due to its causal nature and lack of explicit context aggregation. To address this, we propose a hybrid architecture that combines shifted window self-attention for spatial context aggregation with Mamba-based selective scanning for efficient temporal propagation. Furthermore, we introduce Gather-Scatter Mamba (GSM), an alignment-aware mechanism that warps features toward a center anchor frame within the temporal window before Mamba propagation and scatters them back afterward, effectively reducing occlusion artifacts and ensuring effective redistribution of aggregated information across all frames. The official implementation is provided at: https://github.com/Ko-Lani/GSMamba.
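The gather-propagate-scatter pattern described above can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: a simple integer `np.roll` shift stands in for optical-flow warping, and the `propagate` callable stands in for the Mamba selective scan. The function and argument names (`warp`, `gather_scatter`, `flows`) are illustrative, not taken from the GSMamba codebase.

```python
import numpy as np

def warp(feat, flow):
    """Stand-in for flow-based warping: shift a (H, W, C) feature map
    by an integer displacement along the width axis."""
    return np.roll(feat, shift=flow, axis=1)

def gather_scatter(frames, flows, propagate):
    """Sketch of the Gather-Scatter pattern.

    frames:    list of T feature maps, each (H, W, C)
    flows:     list of T integer displacements of each frame
               relative to the center anchor frame
    propagate: callable on the aligned (T, H, W, C) stack,
               standing in for the Mamba temporal scan
    """
    # Gather: align every frame's features to the center anchor.
    aligned = np.stack([warp(f, fl) for f, fl in zip(frames, flows)])
    # Temporal propagation happens in the aligned coordinate frame,
    # so corresponding pixels line up across time.
    out = propagate(aligned)
    # Scatter: warp the propagated features back to each frame's
    # own coordinates by applying the inverse displacement.
    return [warp(o, -fl) for o, fl in zip(out, flows)]

# Tiny demo: with identity propagation, gather followed by scatter
# returns each frame unchanged (the warps are mutually inverse).
frames = [np.arange(12, dtype=float).reshape(1, 4, 3) + t for t in range(3)]
flows = [-1, 0, 1]
restored = gather_scatter(frames, flows, lambda s: s)
```

The design point the sketch captures is that alignment is applied symmetrically: whatever warp gathers a neighbor onto the anchor is inverted when scattering the aggregated features back, so every frame in the window receives propagated information in its own coordinates.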