🤖 AI Summary
Unsupervised stereo matching suffers from ambiguous correspondences in repetitive-texture and textureless regions; existing approaches rely on stochastic sparse correspondences for depth ranking, resulting in inefficient use of 3D geometric knowledge and susceptibility to noise. This paper proposes a novel unsupervised framework with: (1) a plug-and-play disparity confidence estimation module that builds quasi-dense, reliable correspondences; (2) a relative depth prior-guided loss and a dual disparity smoothness loss that enhance geometric knowledge transfer and recover boundary detail; and (3) end-to-end training that integrates local coherence checking with multi-view consistency constraints. Evaluated on the KITTI Stereo benchmarks, the method achieves state-of-the-art unsupervised performance, significantly outperforming prior methods that leverage relative depth priors.
📝 Abstract
Unsupervised stereo matching has garnered significant attention for its independence from costly disparity annotations. Typical unsupervised methods rely on the multi-view consistency assumption to train networks, and therefore suffer considerably from stereo matching ambiguities such as repetitive patterns and texture-less regions. A feasible solution lies in transferring 3D geometric knowledge from a relative depth map to the stereo matching network. However, existing knowledge transfer methods learn depth ranking information from randomly built sparse correspondences, which utilizes 3D geometric knowledge inefficiently and introduces noise from mistaken disparity estimates. This work proposes a novel unsupervised learning framework to address these challenges, comprising a plug-and-play disparity confidence estimation algorithm and two depth prior-guided loss functions. Specifically, the local coherence consistency between neighboring disparities and their corresponding relative depths is first checked to obtain disparity confidence. Afterwards, quasi-dense correspondences are built using only confident disparity estimates to facilitate efficient depth ranking learning. Finally, a dual disparity smoothness loss is proposed to boost stereo matching performance at disparity discontinuities. Experimental results demonstrate that our method achieves state-of-the-art accuracy on the KITTI Stereo benchmarks among all unsupervised stereo matching methods.
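The core idea of the confidence check can be illustrated with a toy NumPy sketch. This is not the paper's algorithm, only a minimal illustration under the assumption that disparity and relative depth should be inversely ordered: a pixel's disparity estimate is deemed confident when its ordering against each 4-neighbor agrees with the ordering of the corresponding relative depths, and only confident pixels would enter the quasi-dense correspondence set. The function name, the 4-neighborhood, and the threshold are all illustrative choices, not from the paper.

```python
import numpy as np

def disparity_confidence(disp, rel_depth):
    """Fraction of 4-neighbors whose disparity ordering agrees with the
    relative-depth ordering (larger disparity should mean smaller depth).
    Hypothetical sketch; `disp` and `rel_depth` are HxW float arrays."""
    H, W = disp.shape
    agree = np.zeros((H, W), np.float32)
    count = np.zeros((H, W), np.float32)
    for dy, dx in [(-1, 0), (1, 0), (0, -1), (0, 1)]:
        # Overlapping slices: `cy, cx` index the pixel, `ny, nx` its (dy, dx) neighbor.
        cy, ny = slice(max(-dy, 0), H + min(-dy, 0)), slice(max(dy, 0), H + min(dy, 0))
        cx, nx = slice(max(-dx, 0), W + min(-dx, 0)), slice(max(dx, 0), W + min(dx, 0))
        dd = disp[cy, cx] - disp[ny, nx]            # disparity difference
        dz = rel_depth[cy, cx] - rel_depth[ny, nx]  # relative-depth difference
        agree[cy, cx] += (dd * dz <= 0)             # opposite signs = consistent ordering
        count[cy, cx] += 1.0
    return agree / count

# A disparity ramp paired with its inverse depth is fully self-consistent.
disp = np.tile(np.arange(1.0, 6.0), (4, 1))
depth = 1.0 / disp
conf_clean = disparity_confidence(disp, depth)

# Corrupting one depth value lowers confidence locally, flagging the pixel
# so it would be excluded from the quasi-dense correspondence set.
depth_bad = depth.copy()
depth_bad[2, 2] += 10.0
conf_noisy = disparity_confidence(disp, depth_bad)
mask = conf_noisy >= 0.99  # keep only confident pixels (threshold is illustrative)
```

In this sketch the clean ramp yields full confidence everywhere, while the corrupted pixel and its left neighbor lose agreement with one neighbor each and fall below the mask threshold.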