AI Summary
This work proposes a stereo matching architecture that eliminates cost volumes, which are common in leading methods but computationally expensive. By combining image warping with an efficient field transformation module, the method estimates disparity accurately without explicitly constructing a cost volume. It ranks first on three major public benchmarks: ETH3D, KITTI, and Middlebury. It reduces zero-shot error on ETH3D by 81% and runs 1.8–6.7× faster than competitive methods, improving both computational efficiency and generalization.
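To make concrete what is being eliminated, here is a minimal, dependency-free sketch of a generic correlation cost volume for a single rectified scanline. It is not taken from the paper. The function name `correlation_cost_volume`, the list-of-vectors feature layout, and the toy feature values are all assumptions made for illustration.

```python
def correlation_cost_volume(left_feats, right_feats, max_disparity):
    """Build a per-scanline correlation volume (illustrative, hypothetical helper).

    left_feats / right_feats: one feature vector (list of floats) per pixel.
    Returns volume[x][d] = dot(left[x], right[x - d]) for d in [0, max_disparity),
    with 0.0 where x - d falls outside the image.
    """
    volume = []
    for x, f_left in enumerate(left_feats):
        scores = []
        for d in range(max_disparity):
            if x - d < 0:
                scores.append(0.0)  # candidate match lies outside the right image
            else:
                f_right = right_feats[x - d]
                scores.append(sum(a * b for a, b in zip(f_left, f_right)))
        volume.append(scores)
    return volume


# Toy scanline: right[x] equals left[x + 1], i.e. the true disparity is 1.
left = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
right = [[0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
volume = correlation_cost_volume(left, right, max_disparity=2)
```

The volume holds one score per pixel per candidate disparity, so for an H×W feature map with D candidates it has H·W·D entries. Storage and compute therefore grow linearly with the disparity search range, which is the cost a warping-based design avoids.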
Abstract
We introduce WAFT-Stereo, a simple and effective warping-based method for stereo matching. WAFT-Stereo demonstrates that cost volumes, a common design used in many leading methods, are not necessary for strong performance and can be replaced by warping with improved efficiency. WAFT-Stereo ranks first on the ETH3D, KITTI, and Middlebury public benchmarks, reducing the zero-shot error by 81% on the ETH3D benchmark while being 1.8-6.7x faster than competitive methods. Code and model weights are available at https://github.com/princeton-vl/WAFT-Stereo.
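The warping operation itself can be sketched in a few lines. The example below is a generic, dependency-free illustration of disparity-based warping on rectified scanlines with linear interpolation. It is not WAFT-Stereo's implementation, and the names `warp_scanline` and `warp_image`, the zero fill for out-of-range samples, and the toy intensities are assumptions.

```python
import math


def warp_scanline(right_row, disparity_row):
    """Warp one right-image scanline into the left view (hypothetical helper).

    For a rectified pair, left pixel x corresponds to right pixel x - d(x).
    Sub-pixel positions are linearly interpolated; positions outside the
    row are filled with 0.0.
    """
    width = len(right_row)
    warped = []
    for x, d in enumerate(disparity_row):
        src = x - d  # sampling position in the right row
        if src < 0 or src > width - 1:
            warped.append(0.0)
            continue
        x0 = int(math.floor(src))
        x1 = min(x0 + 1, width - 1)
        frac = src - x0
        warped.append((1 - frac) * right_row[x0] + frac * right_row[x1])
    return warped


def warp_image(right_image, disparity_map):
    """Apply warp_scanline to every row of a list-of-rows image."""
    return [warp_scanline(r, d) for r, d in zip(right_image, disparity_map)]


# Toy scanline: the right view is the left view shifted by 2 pixels.
left_row = [0, 10, 20, 30, 40, 50]
right_row = [20, 30, 40, 50, 60, 70]
warped = warp_scanline(right_row, [2] * 6)  # matches left_row for x >= 2
half_pixel = warp_scanline(right_row, [0, 0, 1.5, 0, 0, 0])[2]  # samples x = 0.5
```

With the correct disparity, the warped row agrees with the left row wherever the source pixel is visible. With an incorrect disparity it does not, and that mismatch is the kind of signal an iterative method can use to refine its estimate. The cost of one warp depends on the image size only, not on the disparity search range.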