🤖 AI Summary
To address the high computational cost and inference latency of end-to-end stereo matching models, which hinder real-time deployment, this paper proposes a lightweight real-time stereo matching network. The method introduces two key components: (1) a lightweight backbone coupled with an attention-guided learnable cost volume, which significantly reduces memory bandwidth consumption and arithmetic complexity; and (2) a training setup that combines the attention-weighted cost volume with the LogL1 loss, which accelerates convergence while recovering the disparity accuracy lost to the smaller backbone. Experiments show that the approach requires 4× fewer FLOPs than state-of-the-art methods such as ACVNet, attains real-time inference speeds of >30 FPS (9–14× faster), and maintains competitive accuracy on standard benchmarks.
📝 Abstract
Recently, stereo matching methods based on end-to-end deep networks have gained popularity, mainly because of their performance. However, this improvement in performance comes at the cost of increased computational and memory bandwidth requirements, which necessitates specialized hardware (GPUs); even then, these methods have large inference times compared to classical methods. This limits their applicability in real-world applications. What we desire instead are high-accuracy stereo methods with reasonable inference time. To this end, we propose a fast end-to-end stereo matching method. The majority of the speedup comes from integrating a leaner backbone. To recover the performance lost because of the leaner backbone, we propose to use a cost volume based on learned attention weights, combined with the LogL1 loss, for stereo matching. Using the LogL1 loss not only improves the overall performance of the proposed network but also leads to faster convergence. We carry out a detailed empirical evaluation of different design choices and show that our method requires $4\times$ fewer operations and is also about 9 to $14\times$ faster than state-of-the-art methods such as ACVNet [1], LEAStereo [2] and CFNet [3], while giving comparable performance.

Code: https://github.com/cogsys-tuebingen/LeanStereo
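The abstract names the LogL1 loss without defining it. As a rough illustration only, the sketch below assumes one plausible form, the mean of log(|d_pred - d_gt| + offset) over valid pixels; the exact definition used in the paper may differ. The function name `log_l1_loss`, the `mask` argument, and the `offset` default are hypothetical and are not taken from the LeanStereo code.

```python
import math


def log_l1_loss(pred, target, mask=None, offset=1.0):
    """Mean of log(|pred - target| + offset) over valid pixels.

    ASSUMPTION: this is one plausible reading of a "LogL1" loss, not
    necessarily the paper's definition. `pred` and `target` are flat
    sequences of disparity values; `mask` optionally marks the pixels
    that have valid ground truth.
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    if mask is None:
        mask = [True] * len(pred)

    # The logarithm compresses large errors, so outlier pixels contribute
    # less to the total than they would under a plain L1 loss.
    terms = [
        math.log(abs(p - t) + offset)
        for p, t, m in zip(pred, target, mask)
        if m
    ]
    if not terms:
        return 0.0  # no valid pixels: define the loss as zero
    return sum(terms) / len(terms)
```

With `offset=1.0` the loss is zero when the prediction matches the ground truth exactly, and it grows only logarithmically with the absolute disparity error.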