Interactive State Space Model with Cross-Modal Local Scanning for Depth Super-Resolution

📅 2026-05-12
📈 Citations: 0
Influential citations: 0
🤖 AI Summary
Existing guided depth super-resolution methods struggle to achieve efficient, fine-grained semantic interaction between the RGB and depth modalities, because they either model each modality independently or rely on attention mechanisms with quadratic complexity. This work proposes a super-resolution framework built around an Interactive State Space Model: a cross-modal local scanning mechanism enables dense, semantics-aware interaction between RGB and depth features, and the Mamba architecture captures global dependencies with linear complexity. A cross-modal matching transform module further improves interaction quality by using representative features from both modalities. According to the abstract, the method is competitive with state-of-the-art approaches in extensive experiments.
📝 Abstract
Guided depth super-resolution (GDSR) reconstructs high-resolution (HR) depth maps from low-resolution (LR) inputs with HR RGB guidance. Existing methods either model each modality independently or rely on computationally expensive attention mechanisms with quadratic complexity, which hinders the construction of efficient, semantically interactive joint representations. In this paper, we observe that feature maps from different modalities exhibit semantic-level correlations during feature extraction. This motivates us to develop a more flexible approach that enables dense, semantically aware deep interactions between modalities. To this end, we propose a novel GDSR framework centered around the Interactive State Space Model. Specifically, we design a cross-modal local scanning mechanism that enables fine-grained semantic interactions between RGB and depth features. Leveraging the Mamba architecture, our framework achieves global modeling with linear complexity. Furthermore, a cross-modal matching transform module is introduced to enhance interactive modeling quality by utilizing representative features from both modalities. Extensive experiments demonstrate competitive performance against state-of-the-art methods.
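The abstract does not spell out how the scanning works, so the following is only an illustrative sketch, not the authors' implementation. It assumes one plausible reading of "cross-modal local scanning": RGB and depth feature tokens from the same local window are interleaved into a single 1D sequence, which is then processed by a state-space recurrence in one pass. The function names `cross_modal_local_scan` and `linear_ssm_scan`, the `window` parameter, and the fixed scalars `a` and `b` are all hypothetical. Mamba itself uses input-dependent (selective) parameters, which this toy recurrence omits.

```python
import numpy as np

def cross_modal_local_scan(rgb, depth, window=2):
    """Interleave RGB and depth tokens window by window (hypothetical scheme).

    rgb, depth: feature maps of identical shape (H, W, C).
    Returns a sequence of shape (2 * H * W, C) in which tokens from the same
    spatial position in both modalities are adjacent, grouped by local window.
    """
    assert rgb.shape == depth.shape
    H, W, C = rgb.shape
    assert H % window == 0 and W % window == 0
    chunks = []
    for i in range(0, H, window):
        for j in range(0, W, window):
            r = rgb[i:i + window, j:j + window].reshape(-1, C)
            d = depth[i:i + window, j:j + window].reshape(-1, C)
            # Stack to (window*window, 2, C), then flatten to r0, d0, r1, d1, ...
            chunks.append(np.stack([r, d], axis=1).reshape(-1, C))
    return np.concatenate(chunks, axis=0)

def linear_ssm_scan(x, a=0.9, b=0.1):
    """Toy diagonal recurrence h_t = a * h_{t-1} + b * x_t.

    A single pass over the sequence, so cost grows linearly with its length,
    in contrast to pairwise attention, which grows quadratically.
    """
    h = np.zeros(x.shape[1], dtype=float)
    out = np.empty(x.shape, dtype=float)
    for t in range(x.shape[0]):
        h = a * h + b * x[t]
        out[t] = h
    return out

# Usage: two random 4x4 feature maps with 8 channels each.
rng = np.random.default_rng(0)
rgb_feat = rng.standard_normal((4, 4, 8))
depth_feat = rng.standard_normal((4, 4, 8))
sequence = cross_modal_local_scan(rgb_feat, depth_feat, window=2)
mixed = linear_ssm_scan(sequence)
```

Because neighbouring sequence positions alternate between modalities, the hidden state at each depth token already carries information from the RGB token at the same location, which is the kind of dense local interaction the abstract describes.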
Problem

Research questions and friction points this paper is trying to address.

guided depth super-resolution
cross-modal interaction
semantic correlation
computational complexity
joint representation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Interactive State Space Model
Cross-Modal Local Scanning
Depth Super-Resolution
Mamba Architecture
Guided Super-Resolution