Interactive State Space Model with Cross-Modal Local Scanning for Depth Super-Resolution

📅 2026-05-12

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing guided depth super-resolution methods struggle to achieve efficient, fine-grained semantic interaction between RGB and depth modalities. They either model each modality independently or rely on attention mechanisms with quadratic computational complexity. This work proposes a super-resolution framework built on an interactive state space model. A cross-modal local scanning mechanism enables dense, semantics-aware interaction between the two modalities, and the Mamba architecture captures global dependencies with linear complexity. A cross-modal matching transform module further improves interaction quality. Across multiple benchmarks, the method performs competitively with state-of-the-art approaches while keeping linear complexity.
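The summary does not spell out the scan order, so the following is only a minimal sketch of one plausible form of cross-modal local scanning. It assumes that aligned RGB and depth feature maps are split into non-overlapping `window` x `window` windows and that, inside each window, the RGB and depth tokens of each pixel are placed next to each other in a single 1D sequence. Under that ordering a sequential state-space scan sees both modalities at the same location back to back. The names `cross_modal_local_scan`, `cross_modal_local_unscan`, and `window` are hypothetical and do not come from the authors' code.

```python
from typing import List, Sequence, Tuple

# H x W grid of C-dimensional feature vectors (hypothetical representation)
Grid = List[List[Sequence[float]]]
Index = List[Tuple[str, int, int]]


def cross_modal_local_scan(rgb: Grid, depth: Grid, window: int) -> Tuple[list, Index]:
    """Flatten two aligned feature maps into one interleaved 1D sequence.

    Assumed ordering (not confirmed by the paper): windows in raster order,
    pixels in raster order inside each window, and the RGB token of a pixel
    immediately followed by the depth token at the same location.
    """
    h, w = len(rgb), len(rgb[0])
    if (len(depth), len(depth[0])) != (h, w):
        raise ValueError("RGB and depth maps must share the same spatial size")
    if h % window or w % window:
        raise ValueError("spatial size must be divisible by the window size")
    seq, index = [], []
    for wy in range(0, h, window):          # window rows
        for wx in range(0, w, window):      # window columns
            for y in range(wy, wy + window):
                for x in range(wx, wx + window):
                    seq.append(rgb[y][x])
                    index.append(("rgb", y, x))
                    seq.append(depth[y][x])
                    index.append(("depth", y, x))
    return seq, index


def cross_modal_local_unscan(seq: list, index: Index, h: int, w: int) -> Tuple[Grid, Grid]:
    """Scatter a processed sequence back into separate RGB and depth maps."""
    rgb = [[None] * w for _ in range(h)]
    depth = [[None] * w for _ in range(h)]
    for token, (modality, y, x) in zip(seq, index):
        (rgb if modality == "rgb" else depth)[y][x] = token
    return rgb, depth


if __name__ == "__main__":
    H = W = 4
    rgb = [[(float(y * W + x),) for x in range(W)] for y in range(H)]
    depth = [[(-float(y * W + x),) for x in range(W)] for y in range(H)]
    seq, index = cross_modal_local_scan(rgb, depth, window=2)
    print(len(seq))    # 32
    print(index[:4])   # [('rgb', 0, 0), ('depth', 0, 0), ('rgb', 0, 1), ('depth', 0, 1)]
```

Keeping scanning local means that the RGB and depth tokens for one region sit next to each other in the sequence. A plain global raster scan would instead put a whole row of one modality between them.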

📝 Abstract

Guided depth super-resolution (GDSR) reconstructs high-resolution (HR) depth maps from low-resolution (LR) inputs with HR RGB guidance. Existing methods either model each modality independently or rely on computationally expensive attention mechanisms with quadratic complexity, hindering the establishment of efficient and semantically interactive joint representations. In this paper, we observe that feature maps from different modalities exhibit semantic-level correlations during feature extraction. This motivates us to develop a more flexible approach enabling dense, semantically aware deep interactions between modalities. To this end, we propose a novel GDSR framework centered around the Interactive State Space Model. Specifically, we design a cross-modal local scanning mechanism that enables fine-grained semantic interactions between RGB and depth features. Leveraging the Mamba architecture, our framework achieves global modeling with linear complexity. Furthermore, a cross-modal matching transform module is introduced to enhance interactive modeling quality by utilizing representative features from both modalities. Extensive experiments demonstrate competitive performance against state-of-the-art methods.
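The abstract says that a Mamba-style scan keeps global modeling linear in sequence length. The following simplified single-channel selective state-space scan, written in pure Python, shows why. It is a sketch with several assumptions. The state matrix `A` is diagonal with negative entries. The step size, B, and C depend on the input through the hypothetical scalar weights `w_delta`, `b_delta`, `w_B`, and `w_C`. B uses the simple Euler-style discretization (B̄ = ΔB). The real Mamba block adds learned projections across channels, gating, and a hardware-aware parallel scan. This is not the paper's implementation.

```python
import math
from typing import List, Sequence


def softplus(v: float) -> float:
    """Numerically stable log(1 + e^v), used to keep the step size positive."""
    return max(v, 0.0) + math.log1p(math.exp(-abs(v)))


def selective_scan(
    x: Sequence[float],
    A: Sequence[float],
    w_B: Sequence[float],
    w_C: Sequence[float],
    w_delta: float = 1.0,
    b_delta: float = 0.0,
) -> List[float]:
    """Causal diagonal state-space scan over one channel in O(L * N) time.

    Simplified sketch (assumed parameterization, not the paper's):
      delta_t = softplus(w_delta * x_t + b_delta)   input-dependent step size
      h_t[n]  = exp(delta_t * A[n]) * h_{t-1}[n] + delta_t * B_t[n] * x_t
      y_t     = sum_n C_t[n] * h_t[n]
    with B_t[n] = w_B[n] * x_t and C_t[n] = w_C[n] * x_t.
    """
    h = [0.0] * len(A)                  # fixed-size hidden state, independent of L
    y: List[float] = []
    for x_t in x:                       # one pass over the sequence
        delta = softplus(w_delta * x_t + b_delta)
        y_t = 0.0
        for n in range(len(A)):
            a_bar = math.exp(delta * A[n])      # decay in (0, 1) when A[n] < 0
            b_bar = delta * (w_B[n] * x_t)      # discretized, input-dependent B_t
            h[n] = a_bar * h[n] + b_bar * x_t   # state update uses only past tokens
            y_t += (w_C[n] * x_t) * h[n]        # input-dependent C_t readout
        y.append(y_t)
    return y


if __name__ == "__main__":
    out = selective_scan([0.5, -1.0, 2.0, 0.0, 1.5], A=[-1.0, -0.5], w_B=[1.0, 0.5], w_C=[0.3, 0.7])
    print(len(out), out)
```

Each token updates a fixed-size state once, so L tokens with N state entries cost O(L·N). Self-attention over the same tokens scores all L² pairs. For example, interleaving two 64×64 feature maps gives L = 8,192 tokens. That means 67,108,864 pairwise scores for attention, compared with 131,072 state updates for a scan with N = 16.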

Problem

guided depth super-resolution

cross-modal interaction

semantic correlation

computational complexity

joint representation

Innovation

Interactive State Space Model

Cross-Modal Local Scanning

Depth Super-Resolution