ICG-MVSNet: Learning Intra-view and Cross-view Relationships for Guidance in Multi-View Stereo

📅 2025-03-27
🤖 AI Summary
Current learning-based multi-view stereo (MVS) methods often neglect geometric priors embedded in feature representations and correlation volumes, leading to insufficient robustness in cost volume matching. To address this, ICG-MVSNet explicitly models intra-view spatial coordinate dependencies and cross-view voxel contextual correlations. Its key contributions are: (1) the first joint modeling of intra-view spatial coordinate dependencies and cross-view voxel consistency guidance; (2) a lightweight cross-view aggregation module for efficient voxel-level correlation modeling; and (3) end-to-end differentiable depth regression. Evaluated on the DTU and Tanks and Temples benchmarks, the method achieves performance competitive with state-of-the-art methods while improving inference speed by 23% and reducing GPU memory consumption by 31%.
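The differentiable depth regression mentioned in the summary is commonly realized in learning-based MVS as a softmax-weighted expectation over depth hypotheses (often called soft argmin). The sketch below shows that generic technique for a single pixel; it is an illustration under assumptions, not the authors' implementation, and the function names, depth values, and scores are all hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of matching scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def regress_depth(scores, depth_hypotheses):
    """Expected depth for one pixel: sum over d of p(d) * d."""
    if len(scores) != len(depth_hypotheses):
        raise ValueError("one score per depth hypothesis is required")
    probs = softmax(scores)
    return sum(p * d for p, d in zip(probs, depth_hypotheses))

# Hypothetical example: four depth hypotheses (arbitrary units) for one pixel.
depths = [425.0, 500.0, 575.0, 650.0]
scores = [0.1, 2.0, 2.0, 0.1]  # assumed convention: higher means a better match
depth = regress_depth(scores, depths)
```

Because the output is a weighted average and not a hard argmax, the estimate can fall between hypotheses, and gradients flow back to every score.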

📝 Abstract
Multi-view Stereo (MVS) aims to estimate depth and reconstruct 3D point clouds from a series of overlapping images. Recent learning-based MVS frameworks overlook the geometric information embedded in features and correlations, leading to weak cost matching. In this paper, we propose ICG-MVSNet, which explicitly integrates intra-view and cross-view relationships for depth estimation. Specifically, we develop an intra-view feature fusion module that leverages the feature coordinate correlations within a single image to enhance robust cost matching. Additionally, we introduce a lightweight cross-view aggregation module that efficiently utilizes the contextual information from volume correlations to guide regularization. Our method is evaluated on the DTU dataset and Tanks and Temples benchmark, consistently achieving competitive performance against state-of-the-art works, while requiring lower computational resources.
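To make "cost matching" concrete, here is a minimal sketch of group-wise feature correlation, a similarity measure used in many learning-based MVS pipelines, for one pixel at one depth hypothesis. It assumes the source-view features have already been warped into the reference view. The plain mean over views is only a stand-in: it is not the paper's cross-view aggregation module, and the intra-view feature fusion module is not modeled at all. All names and values are hypothetical.

```python
def group_correlation(ref_feat, src_feat, groups):
    """Group-wise correlation of two per-pixel feature vectors.

    Channels are split into `groups` equal chunks; each chunk contributes the
    mean of its element-wise products, giving one similarity per group.
    """
    channels = len(ref_feat)
    if channels != len(src_feat) or channels % groups != 0:
        raise ValueError("feature lengths must match and divide evenly into groups")
    size = channels // groups
    return [
        sum(r * s for r, s in zip(ref_feat[g * size:(g + 1) * size],
                                  src_feat[g * size:(g + 1) * size])) / size
        for g in range(groups)
    ]

def average_over_views(per_view_corr):
    """Average group-wise correlations over source views (a plain mean)."""
    n = len(per_view_corr)
    return [sum(vals) / n for vals in zip(*per_view_corr)]

# Hypothetical 4-channel features for one pixel at one depth hypothesis.
ref = [1.0, 0.0, 2.0, 1.0]
src_views = [
    [1.0, 0.0, 2.0, 1.0],  # warped source feature that agrees with the reference
    [0.0, 1.0, 0.0, 1.0],  # warped source feature that mostly disagrees
]
per_view = [group_correlation(ref, s, groups=2) for s in src_views]
cost = average_over_views(per_view)
```

Repeating this for every pixel and every depth hypothesis yields the correlation volume that a regularization network then refines before depth is estimated.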
Problem

Research questions and friction points this paper is trying to address.

Estimating depth and reconstructing 3D point clouds from a series of overlapping images
Weak cost matching in learning-based MVS frameworks that overlook the geometric information embedded in features and correlations
Integrating intra-view and cross-view relationships into depth estimation
Innovation

Methods, ideas, or system contributions that make the work stand out.

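Intra-view feature fusion module that uses feature coordinate correlations within a single image for more robust cost matching
Lightweight cross-view aggregation module that uses contextual information from volume correlations to guide regularization
Lower computational resource requirements at performance competitive with state-of-the-art methods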