🤖 AI Summary
To address geometric structure degradation and poor model interpretability in learning-based stereo matching for safety-critical applications such as autonomous driving, this paper proposes MoCha-V2. Methodologically, it introduces (1) the Motif Correlation Graph (MCG), a graph-based mechanism that explicitly models repetitive local textures ("motifs") within feature channels, enabling geometrically faithful and more interpretable disparity reconstruction; and (2) a wavelet inverse transform coupled with multi-frequency feature fusion, which enhances fine-grained texture representation and disparity estimation accuracy. On the Middlebury benchmark, MoCha-V2 achieved state-of-the-art performance at the time of publication, improving matching completeness, robustness to occlusions and textureless regions, and structural interpretability. By jointly targeting precision and transparency, MoCha-V2 offers a step toward trustworthy visual perception in safety-critical systems.
📝 Abstract
Real-world applications of stereo matching, such as autonomous driving, place stringent demands on both safety and accuracy. However, learning-based stereo matching methods inherently suffer from the loss of geometric structures in certain feature channels, creating a bottleneck for precise detail matching. Additionally, these methods lack interpretability due to the black-box nature of deep learning. In this paper, we propose MoCha-V2, a novel learning-based paradigm for stereo matching. MoCha-V2 introduces the Motif Correlation Graph (MCG) to capture recurring textures, referred to as "motifs", within feature channels. These motifs reconstruct geometric structures and are learned in a more interpretable way. Subsequently, we integrate features from multiple frequency domains through the wavelet inverse transform. The resulting motif features are used to restore geometric structures in the stereo matching process. Experimental results demonstrate the effectiveness of MoCha-V2: it achieved 1st place on the Middlebury benchmark at the time of its release. Code is available at https://github.com/ZYangChen/MoCha-Stereo.
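The abstract's multi-frequency fusion step can be illustrated with a toy sketch. This is not the paper's implementation: the functions `haar_dwt2`, `haar_idwt2`, and `fuse_frequencies` below are hypothetical names, and a single-level Haar wavelet stands in for whatever wavelet basis MoCha-V2 actually uses. The idea shown is only the general mechanism: decompose feature maps into a low-frequency approximation and high-frequency detail bands, mix bands from different sources, and reconstruct a full-resolution map with the inverse transform.

```python
import numpy as np

def haar_dwt2(x):
    """Single-level 2D Haar transform of an even-sized map.
    Returns (cA, cH, cV, cD): approximation + three detail bands."""
    a, b = x[0::2, 0::2], x[0::2, 1::2]
    c, d = x[1::2, 0::2], x[1::2, 1::2]
    cA = (a + b + c + d) / 2  # low-frequency approximation
    cH = (a + b - c - d) / 2  # horizontal detail
    cV = (a - b + c - d) / 2  # vertical detail
    cD = (a - b - c + d) / 2  # diagonal detail
    return cA, cH, cV, cD

def haar_idwt2(cA, cH, cV, cD):
    """Exact inverse of haar_dwt2: rebuilds the full-resolution map."""
    h, w = cA.shape
    out = np.empty((2 * h, 2 * w), dtype=cA.dtype)
    out[0::2, 0::2] = (cA + cH + cV + cD) / 2
    out[0::2, 1::2] = (cA + cH - cV - cD) / 2
    out[1::2, 0::2] = (cA - cH + cV - cD) / 2
    out[1::2, 1::2] = (cA - cH - cV + cD) / 2
    return out

def fuse_frequencies(feat_coarse, feat_detail):
    """Hypothetical fusion: take the low-frequency band from one
    feature map and the high-frequency bands from another, then
    merge them via the inverse wavelet transform."""
    cA, _, _, _ = haar_dwt2(feat_coarse)
    _, cH, cV, cD = haar_dwt2(feat_detail)
    return haar_idwt2(cA, cH, cV, cD)
```

A quick sanity check of the round trip: `haar_idwt2(*haar_dwt2(x))` recovers `x` exactly, and fusing a map with itself is therefore the identity; in the intended use, the two arguments would come from different network branches carrying coarse structure and fine texture, respectively.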