Generalized Geometry Encoding Volume for Real-time Stereo Matching

📅 2025-12-07

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

To address poor out-of-distribution generalization and high inference latency in real-time stereo matching, this paper proposes GGEV, a lightweight stereo network. It injects depth-aware geometric priors into cost aggregation, described as the first such integration, through a Depth-aware Dynamic Cost Aggregation (DDCA) module that helps the geometry encoding volume generalize under strict latency constraints. The structural priors come from monocular foundation models, and the network combines them with lightweight feature extraction and dynamic fusion. On KITTI 2012, KITTI 2015, and ETH3D, GGEV achieves state-of-the-art accuracy and stronger zero-shot generalization than existing real-time methods, while running at more than 30 FPS.

📝 Abstract

Real-time stereo matching methods primarily focus on enhancing in-domain performance but often overlook the critical importance of generalization in real-world applications. In contrast, recent stereo foundation models leverage monocular foundation models (MFMs) to improve generalization, but typically suffer from substantial inference latency. To address this trade-off, we propose Generalized Geometry Encoding Volume (GGEV), a novel real-time stereo matching network that achieves strong generalization. We first extract depth-aware features that encode domain-invariant structural priors as guidance for cost aggregation. Subsequently, we introduce a Depth-aware Dynamic Cost Aggregation (DDCA) module that adaptively incorporates these priors into each disparity hypothesis, effectively enhancing fragile matching relationships in unseen scenes. Both steps are lightweight and complementary, leading to the construction of a generalized geometry encoding volume with strong generalization capability. Experimental results demonstrate that our GGEV surpasses all existing real-time methods in zero-shot generalization capability, and achieves state-of-the-art performance on the KITTI 2012, KITTI 2015, and ETH3D benchmarks.
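The pipeline the abstract describes (build a cost volume, then let depth-aware priors steer the aggregation of each disparity hypothesis) can be illustrated with a small NumPy sketch. This is not the authors' DDCA module: it swaps in a simple joint-bilateral-style filter, and the function names, the single-channel `guidance` map standing in for monocular-foundation-model features, and the `radius` and `sigma` parameters are all assumptions made for illustration.

```python
import numpy as np


def correlation_volume(left, right, max_disp):
    """Correlate left/right feature maps of shape (C, H, W) into a (D, H, W) volume."""
    _, H, W = left.shape
    volume = np.zeros((max_disp, H, W), dtype=left.dtype)
    for d in range(max_disp):
        # Pixel x in the left view is compared with pixel x - d in the right view.
        volume[d, :, d:] = (left[:, :, d:] * right[:, :, :W - d]).mean(axis=0)
    return volume


def guided_aggregation(volume, guidance, radius=1, sigma=0.5):
    """Smooth every disparity slice with weights derived from a guidance map.

    guidance: (H, W) array, e.g. a monocular depth prior (hypothetical input).
    Neighbours whose guidance value is close to the centre pixel's get a
    larger weight, so aggregation does not leak across structural edges.
    """
    _, H, W = volume.shape
    r = radius
    pad_v = np.pad(volume, ((0, 0), (r, r), (r, r)), mode="edge")
    pad_g = np.pad(guidance, r, mode="edge")
    out = np.zeros_like(volume)
    norm = np.zeros((H, W), dtype=volume.dtype)
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            ys, xs = r + dy, r + dx
            g_shift = pad_g[ys:ys + H, xs:xs + W]
            # Gaussian weight on the guidance difference (assumed form).
            w = np.exp(-((guidance - g_shift) ** 2) / (2.0 * sigma ** 2))
            out += w[None] * pad_v[:, ys:ys + H, xs:xs + W]
            norm += w
    # The centre pixel always contributes weight 1, so norm is never zero.
    return out / norm[None]


def soft_argmax(volume):
    """Regress a sub-pixel disparity map as the softmax-weighted mean over D."""
    D = volume.shape[0]
    shifted = volume - volume.max(axis=0, keepdims=True)  # numerical stability
    prob = np.exp(shifted)
    prob /= prob.sum(axis=0, keepdims=True)
    return (prob * np.arange(D)[:, None, None]).sum(axis=0)
```

In this sketch the weights depend only on the guidance map, so the same weights are reused for every disparity slice. The paper's module is described as adapting the priors to each disparity hypothesis, which the sketch does not attempt.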

Problem

Research questions and friction points this paper is trying to address.

Develops a real-time stereo matching network with strong generalization

Extracts depth-aware features for domain-invariant structural guidance

Enhances matching in unseen scenes via adaptive cost aggregation

Innovation

Methods, ideas, or system contributions that make the work stand out.

Extracts depth-aware features for domain-invariant structural priors

Uses Depth-aware Dynamic Cost Aggregation to enhance matching

Constructs a lightweight Generalized Geometry Encoding Volume to improve generalization

🔎 Similar Papers

IGEV++: Iterative Multi-range Geometry Encoding Volumes for Stereo Matching