🤖 AI Summary
This paper proposes Guided RAFT-Stereo (GRAFT-Stereo) to address the loss of stereo matching accuracy when LiDAR guidance is sparse (a few hundred points per frame). After analyzing why LiDAR guidance fails from a signal processing perspective, the authors design two depth pre-filling strategies: interpolating the sparse initial disparity map, and a separate pre-filling scheme for LiDAR depth injected into image features via early fusion. Together these integrate sparse LiDAR depth priors into the RAFT-Stereo architecture. Evaluated on the KITTI and ETH3D benchmarks, GRAFT-Stereo significantly outperforms existing LiDAR-guided methods and remains accurate even under extremely sparse conditions (e.g., <100 points/frame). The resulting fusion scheme is lightweight and scalable, suited to low-power, cost-effective stereo perception systems, and achieves its accuracy gains without increasing model complexity or inference latency.
📝 Abstract
We investigate LiDAR guidance within the RAFT-Stereo framework, aiming to improve stereo matching accuracy by injecting precise LiDAR depth into the initial disparity map. We find that the effectiveness of LiDAR guidance drastically degrades when the LiDAR points become sparse (e.g., a few hundred points per frame), and we offer a novel explanation from a signal processing perspective. This insight leads to a surprisingly simple solution that enables LiDAR-guided RAFT-Stereo to thrive: pre-filling the sparse initial disparity map with interpolation. Interestingly, we find that pre-filling is also effective when injecting LiDAR depth into image features via early fusion, but for a fundamentally different reason, necessitating a distinct pre-filling approach. By combining both solutions, the proposed Guided RAFT-Stereo (GRAFT-Stereo) significantly outperforms existing LiDAR-guided methods under sparse LiDAR conditions across various datasets. We hope this study inspires more effective LiDAR-guided stereo methods.
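To make the idea of pre-filling concrete, here is a minimal sketch of densifying a sparse initial disparity map before it is passed to an iterative refiner. The abstract does not specify which interpolation scheme is used, so nearest-neighbor filling is an assumption made here for illustration; the function name `prefill_nearest` and the convention that `0.0` marks pixels without a LiDAR measurement are also hypothetical.

```python
from collections import deque


def prefill_nearest(sparse, invalid=0.0):
    """Fill invalid pixels with the value of the nearest valid pixel.

    Illustrative stand-in for interpolation-based pre-filling; not the
    paper's implementation. "Nearest" is measured in Manhattan distance,
    computed with a multi-source breadth-first search over the pixel grid.
    `sparse` is a list of rows; `invalid` marks pixels with no LiDAR point.
    """
    h, w = len(sparse), len(sparse[0])
    dense = [row[:] for row in sparse]  # copy so the input is not modified
    filled = [[v != invalid for v in row] for row in sparse]

    # Seed the search with every pixel that already holds a measurement.
    queue = deque((y, x) for y in range(h) for x in range(w) if filled[y][x])

    while queue:
        y, x = queue.popleft()
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if 0 <= ny < h and 0 <= nx < w and not filled[ny][nx]:
                dense[ny][nx] = dense[y][x]  # propagate the nearest value
                filled[ny][nx] = True
                queue.append((ny, nx))
    return dense


# Two LiDAR-derived disparities on a 4x4 grid; every other pixel is empty.
sparse = [
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 10.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 20.0],
]
dense = prefill_nearest(sparse)
```

After the call, `dense` contains no empty pixels: each one has taken the disparity of its closest measured neighbor, so the initial map is piecewise constant rather than a handful of isolated spikes. A smoother interpolant could be substituted without changing the surrounding pipeline.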