MirrorMamba: Towards Scalable and Robust Mirror Detection in Videos

📅 2025-11-10
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing video mirror detection methods rely on a single dynamic cue and suffer from limited robustness and scalability, owing to the narrow receptive fields of CNNs or the quadratic computational complexity of Transformers. To address these limitations, this work proposes MirrorMamba, the first Mamba-based architecture tailored for mirror detection, which combines multiple cues: depth, optical flow, and multi-directional correspondence. It introduces a multi-directional correspondence extractor and a layer-wise boundary enforcement decoder, pairing global modeling capability with linear computational complexity. Extensive experiments show state-of-the-art performance on video mirror detection benchmarks and on a challenging image-based mirror detection dataset, indicating improved robustness and generalization.

📝 Abstract
Video mirror detection has received significant research attention, yet existing methods suffer from limited performance and robustness. These approaches often over-rely on a single, unreliable dynamic feature, and are typically built on CNNs with limited receptive fields or Transformers with quadratic computational complexity. To address these limitations, we propose an effective and scalable video mirror detection method, called MirrorMamba. Our approach leverages multiple cues to adapt to diverse conditions, incorporating perceived depth, correspondence, and optical flow. We also introduce a Mamba-based Multidirection Correspondence Extractor, which benefits from the global receptive field and linear complexity of the emerging Mamba state space model to effectively capture correspondence properties. Additionally, we design a Mamba-based layer-wise boundary enforcement decoder to resolve the unclear boundaries caused by blurred depth maps. Notably, this work marks the first successful application of a Mamba-based architecture to mirror detection. Extensive experiments demonstrate that our method outperforms existing state-of-the-art approaches for video mirror detection on the benchmark datasets. Furthermore, on the most challenging and representative image-based mirror detection dataset, our approach achieves state-of-the-art performance, demonstrating its robustness and generalizability.
Problem

Research questions and friction points this paper is trying to address.

Existing video mirror detection lacks performance and robustness
Current methods rely on unreliable features with computational limitations
Unclear boundaries from blurred depth maps degrade detection accuracy
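
To make the computational limitation concrete, the following rough sketch compares how the work grows with feature-map size for pairwise self-attention versus a sequential scan. The function names, the 64x64 feature map, and the choice of four scan directions are illustrative assumptions, not values from the paper, and constant factors such as channel width are ignored.

```python
def attention_pair_count(height: int, width: int) -> int:
    """Token pairs scored by full self-attention over an H x W feature map.

    Every token attends to every token, so the count is quadratic in H * W.
    """
    tokens = height * width
    return tokens * tokens


def scan_step_count(height: int, width: int, directions: int = 4) -> int:
    """Recurrence steps for a sequential scan over the same feature map.

    Each direction visits every token once, so the count is linear in H * W.
    `directions=4` is an assumption made for illustration only.
    """
    return directions * height * width


if __name__ == "__main__":
    # Hypothetical 64 x 64 feature map (4096 tokens).
    print(attention_pair_count(64, 64))  # 16777216
    print(scan_step_count(64, 64))       # 16384
```

Doubling both spatial dimensions multiplies the attention count by 16 but the scan count by only 4, which is the scaling gap the summary refers to.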
Innovation

Methods, ideas, or system contributions that make the work stand out.

Leverages multiple cues including depth and correspondence
Introduces Mamba-based Multidirection Correspondence Extractor
Designs Mamba-based layer-wise boundary enforcement decoder
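
The idea behind scanning a feature map in several directions can be illustrated with a toy sketch. This is not the paper's extractor: it assumes a single-channel grid, a fixed scalar recurrence `h_t = decay * h_{t-1} + x_t` in place of Mamba's input-dependent (selective) state space parameters, and a plain average to merge the four directions. All names and the `decay` value are hypothetical.

```python
from typing import List

Grid = List[List[float]]


def linear_scan(seq: List[float], decay: float) -> List[float]:
    """Run the recurrence h_t = decay * h_{t-1} + x_t over one sequence."""
    out = []
    h = 0.0
    for x in seq:
        h = decay * h + x
        out.append(h)
    return out


def scan_rows(grid: Grid, decay: float, reverse: bool) -> Grid:
    """Scan each row left-to-right, or right-to-left when reverse is True."""
    result = []
    for row in grid:
        seq = row[::-1] if reverse else row
        scanned = linear_scan(seq, decay)
        # Undo the reversal so outputs line up with the original positions.
        result.append(scanned[::-1] if reverse else scanned)
    return result


def transpose(grid: Grid) -> Grid:
    return [list(col) for col in zip(*grid)]


def multi_direction_scan(grid: Grid, decay: float = 0.5) -> Grid:
    """Average four directional scans so each cell sees context from all sides."""
    left_right = scan_rows(grid, decay, reverse=False)
    right_left = scan_rows(grid, decay, reverse=True)
    # Column scans are row scans on the transposed grid.
    top_bottom = transpose(scan_rows(transpose(grid), decay, reverse=False))
    bottom_top = transpose(scan_rows(transpose(grid), decay, reverse=True))
    rows, cols = len(grid), len(grid[0])
    return [
        [
            (left_right[i][j] + right_left[i][j]
             + top_bottom[i][j] + bottom_top[i][j]) / 4.0
            for j in range(cols)
        ]
        for i in range(rows)
    ]


if __name__ == "__main__":
    # A single active cell in the top-left corner spreads along its row and column.
    print(multi_direction_scan([[1.0, 0.0], [0.0, 0.0]]))
    # [[1.0, 0.125], [0.125, 0.0]]
```

Each directional pass touches every cell once, so the total cost stays linear in the number of cells, while the merged output at any position depends on cells arbitrarily far away along its row and column.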
Rui Song
City University of Hong Kong
Jiaying Lin
Peking University
Computer Vision, Multimodal
Rynson W. H. Lau
City University of Hong Kong