Decoupling Bidirectional Geometric Representations of 4D cost volume with 2D convolution

📅 2025-09-02
🤖 AI Summary
Existing high-performance real-time stereo matching methods rely heavily on computationally intensive 3D convolutions to regularize the 4D cost volume, which hinders deployment on mobile devices; lightweight 2D approaches, in contrast, lose accuracy in ill-posed regions. This paper proposes DBStereo, a 4D cost volume aggregation network built purely from 2D convolutions, which the authors present as a break from the established reliance on 3D convolutions. It decouples bidirectional geometric representations by modeling geometric structure separately in the spatial domain and along the disparity dimension. A lightweight bidirectional geometry aggregation block performs this decoupled aggregation, speeding up inference while preserving accuracy. On the SceneFlow and KITTI benchmarks, DBStereo is reported to surpass mainstream aggregation methods in accuracy and to outperform the iterative method IGEV-Stereo in speed. The authors position it as a simple, efficient, and strong baseline for real-time stereo matching on mobile platforms.

📝 Abstract
High-performance real-time stereo matching methods invariably rely on 3D regularization of the cost volume, which is unfriendly to mobile devices, while methods based on 2D regularization struggle in ill-posed regions. In this paper, we present DBStereo, a deployment-friendly 4D cost aggregation network based on pure 2D convolutions. Specifically, we first provide a thorough analysis of the decoupling characteristics of the 4D cost volume, and then design a lightweight bidirectional geometry aggregation block to capture spatial and disparity representations respectively. Through decoupled learning, our approach achieves real-time performance and impressive accuracy simultaneously. Extensive experiments demonstrate that the proposed DBStereo outperforms all existing aggregation-based methods in both inference time and accuracy, even surpassing the iterative-based method IGEV-Stereo. Our study breaks with the empirical design of using 3D convolutions for the 4D cost volume and provides a simple yet strong baseline for the proposed decoupled aggregation paradigm for further study. Code will be available at https://github.com/happydummy/DBStereo soon.
Problem

Research questions and friction points this paper is trying to address.

Real-time stereo matching with 2D convolutions
Decoupling 4D cost volume geometric representations
Mobile-friendly depth estimation without 3D regularization
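To give a rough sense of the arithmetic behind these friction points, the sketch below counts multiply-accumulates (MACs) for one 3x3x3 3D convolution over a cost volume and for a pair of 3x3 2D convolutions, one over the spatial plane and one over the disparity-width plane. The volume size, the channel count, and the choice of exactly two 2D convolutions are illustrative assumptions and are not taken from the paper.

```python
def conv3d_macs(c_in, c_out, d, h, w, k=3):
    """MACs for one stride-1, same-padded k*k*k 3D convolution over a [C, D, H, W] volume."""
    return c_in * c_out * k ** 3 * d * h * w


def conv2d_macs(c_in, c_out, n_planes, a, b, k=3):
    """MACs for a stride-1, same-padded k*k 2D convolution applied to n_planes planes of size a*b."""
    return c_in * c_out * k ** 2 * n_planes * a * b


# Hypothetical cost volume size (assumption, not from the paper).
C, D, H, W = 8, 48, 64, 128

full_3d = conv3d_macs(C, C, D, H, W)

# Assumed decoupling: one 2D conv over (H, W) per disparity slice,
# plus one 2D conv over (D, W) per image row.
spatial = conv2d_macs(C, C, D, H, W)
disparity = conv2d_macs(C, C, H, D, W)
decoupled = spatial + disparity

ratio = decoupled / full_3d  # 2 * k^2 / k^3 = 2/3 for k = 3
print(full_3d, decoupled, ratio)
```

Under these assumptions the pair of 2D convolutions needs two thirds of the MACs of the single 3D convolution. The count says nothing about latency or operator support on a particular device, which is the deployment concern the abstract raises.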
Innovation

Methods, ideas, or system contributions that make the work stand out.

Decouples 4D cost volume with 2D convolutions
Uses bidirectional geometry aggregation block
Achieves real-time performance with high accuracy
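The idea of aggregating a 4D cost volume with 2D convolutions only can be sketched in NumPy as follows. This is one possible reading of "decoupled" aggregation and not the authors' block: the function names, the [C, D, H, W] layout, the sequential ordering of the two passes, and the choice of the (D, W) plane for the disparity pass are all assumptions made for illustration.

```python
import numpy as np


def conv2d_same(x, kernel):
    """Stride-1, zero-padded 2D cross-correlation.

    x: [C_in, A, B]; kernel: [C_out, C_in, k, k] with odd k; returns [C_out, A, B].
    """
    c_out, c_in, k, _ = kernel.shape
    p = k // 2
    xp = np.pad(x, ((0, 0), (p, p), (p, p)))
    a, b = x.shape[1:]
    out = np.zeros((c_out, a, b), dtype=np.result_type(x, kernel))
    for i in range(k):
        for j in range(k):
            patch = xp[:, i:i + a, j:j + b]  # shifted view, [C_in, A, B]
            out += np.einsum('oc,cab->oab', kernel[:, :, i, j], patch)
    return out


def spatial_pass(volume, kernel):
    """2D conv over (H, W), applied independently to each disparity slice."""
    slices = [conv2d_same(volume[:, d], kernel) for d in range(volume.shape[1])]
    return np.stack(slices, axis=1)


def disparity_pass(volume, kernel):
    """2D conv over (D, W), applied independently to each image row (assumed plane)."""
    rows = [conv2d_same(volume[:, :, h], kernel) for h in range(volume.shape[2])]
    return np.stack(rows, axis=2)


def decoupled_aggregate(volume, k_spatial, k_disp):
    """Hypothetical decoupled aggregation: spatial pass, then disparity pass."""
    return disparity_pass(spatial_pass(volume, k_spatial), k_disp)


rng = np.random.default_rng(0)
volume = rng.standard_normal((4, 6, 5, 7))          # [C, D, H, W], toy size
k_spatial = 0.1 * rng.standard_normal((4, 4, 3, 3))
k_disp = 0.1 * rng.standard_normal((4, 4, 3, 3))
out = decoupled_aggregate(volume, k_spatial, k_disp)
print(out.shape)  # (4, 6, 5, 7)
```

In a deep learning framework the same effect is usually obtained by folding the unused axis into the batch dimension and calling a standard 2D convolution layer, so no 3D operator is needed at any point.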
Xiaobao Wei (Institute of Software, Chinese Academy of Sciences; 3D Vision)
Changyong Shu (Nanjing University of Science and Technology)
Zhaokun Yue (Nanjing University of Science and Technology)
Chang Huang (Carizon)
Weiwei Liu (Carizon)
Shuai Yang (Carizon)
Lirong Yang (Carizon)
Peng Gao (Carizon)
Wenbin Zhang (Carizon)
Gaochao Zhu (Carizon)
Chengxiang Wang (Carizon)