Global Regulation and Excitation via Attention Tuning for Stereo Matching

📅 2025-09-19

📈 Citations: 0

✨ Influential: 0


🤖 AI Summary

Stereo matching degrades in ill-posed regions such as occlusions, textureless areas, and repetitive patterns, largely because iterative methods lack global context and geometric information during refinement. To address this, the paper proposes GREAT, a framework that adds three attention modules (Spatial, Matching, and Volume Attention) to iterative stereo matching pipelines, combining global context with guidance along epipolar lines. The modules are integrated into representative iterative architectures, including RAFT-Stereo and IGEV-Stereo, to build a more robust cost volume. Among all published methods, GREAT-IGEV ranks first on the Scene Flow test set, KITTI 2015, and ETH3D leaderboards and second on the Middlebury benchmark.

📝 Abstract

Stereo matching has made significant progress with iterative algorithms like RAFT-Stereo and IGEV-Stereo. However, these methods struggle in ill-posed regions with occlusions, textureless areas, or repetitive patterns, due to a lack of global context and geometric information for effective iterative refinement. To enable existing iterative approaches to incorporate global context, we propose the Global Regulation and Excitation via Attention Tuning (GREAT) framework, which encompasses three attention modules. Specifically, Spatial Attention (SA) captures the global context within the spatial dimension, Matching Attention (MA) extracts global context along epipolar lines, and Volume Attention (VA) works in conjunction with SA and MA to construct a more robust cost volume excited by global context and geometric details. To verify the universality and effectiveness of this framework, we integrate it into several representative iterative stereo matching methods, collectively denoted as GREAT-Stereo, and validate it through extensive experiments. The framework demonstrates superior performance in challenging ill-posed regions. Applied to IGEV-Stereo, our GREAT-IGEV ranks first among all published methods on the Scene Flow test set, KITTI 2015, and ETH3D leaderboards, and second on the Middlebury benchmark. Code is available at https://github.com/JarvisLee0423/GREAT-Stereo.
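To make "global context along epipolar lines" concrete: in a rectified stereo pair, epipolar lines coincide with image rows, so attention along an epipolar line can be sketched as self-attention restricted to each row. The snippet below is a minimal illustration under that assumption, not the authors' Matching Attention module. The function name `row_wise_attention` is hypothetical, and the sketch omits the learned query/key/value projections a real module would have.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def row_wise_attention(feat):
    """Self-attention restricted to each image row (hypothetical helper).

    feat: array of shape (H, W, C), one C-dim feature per pixel.
    Returns (out, weights) with shapes (H, W, C) and (H, W, W).
    Assumption: no learned projections; features act as queries,
    keys, and values directly.
    """
    c = feat.shape[-1]
    # scores[h, i, j]: similarity between pixels i and j on row h.
    scores = np.einsum("hic,hjc->hij", feat, feat) / np.sqrt(c)
    weights = softmax(scores, axis=-1)
    # Each output pixel is a weighted average over its own row only.
    out = np.einsum("hij,hjc->hic", weights, feat)
    return out, weights

rng = np.random.default_rng(0)
feat = rng.standard_normal((4, 6, 8))  # toy feature map: H=4, W=6, C=8
out, weights = row_wise_attention(feat)
```

Because the attention weights have shape (H, W, W) rather than (HW, HW), every pixel sees its whole row but nothing outside it, which is the sense in which context is gathered along the epipolar line only.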

Problem

Research questions and friction points this paper is trying to address.

Addresses stereo matching challenges in ill-posed regions

Enhances global context and geometric information integration

Improves performance in occlusions and textureless areas

Innovation

Methods, ideas, or system contributions that make the work stand out.

Spatial Attention (SA) captures global context within the spatial dimension

Matching Attention (MA) extracts global context along epipolar lines

Volume Attention (VA) works with SA and MA to build a more robust cost volume
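For readers unfamiliar with the term, a cost volume in iterative stereo methods is commonly a correlation volume that compares each left-image pixel with candidate pixels on the same row of the right image. The sketch below shows that generic construction plus a winner-take-all disparity readout. It is an illustrative assumption about the kind of volume the attention modules operate on, not the GREAT implementation, and the names `correlation_volume` and `winner_take_all_disparity` are hypothetical.

```python
import numpy as np

def correlation_volume(left, right):
    """Row-wise all-pairs correlation (hypothetical helper).

    left, right: feature maps of shape (H, W, C) from a rectified pair.
    Returns corr of shape (H, W, W) with corr[y, x_left, x_right].
    """
    c = left.shape[-1]
    return np.einsum("hic,hjc->hij", left, right) / np.sqrt(c)

def winner_take_all_disparity(corr):
    """Pick, per left pixel, the best match with non-negative disparity."""
    _, w, _ = corr.shape
    x_left = np.arange(w)[None, :, None]
    x_right = np.arange(w)[None, None, :]
    # Disparity d = x_left - x_right must be >= 0 for rectified pairs.
    masked = np.where(x_right <= x_left, corr, -np.inf)
    best_right = masked.argmax(axis=-1)
    return np.arange(w)[None, :] - best_right

# Toy check: the right view is the left view shifted by a known disparity.
rng = np.random.default_rng(0)
left = rng.standard_normal((3, 10, 16))
left /= np.linalg.norm(left, axis=-1, keepdims=True)  # unit-norm features
true_disp = 2
right = np.roll(left, -true_disp, axis=1)  # right[:, x] = left[:, x + d]
corr = correlation_volume(left, right)
disp = winner_take_all_disparity(corr)
# Columns from true_disp onward recover the known shift.
```

In this toy setting every pixel has a distinctive feature, so the raw volume already has a clear peak. In textureless or repetitive regions many candidates score similarly, which is the ambiguity that motivates enriching the volume with global context.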
