SD-MVS: Segmentation-Driven Deformation Multi-View Stereo with Spherical Refinement and EM optimization

📅 2024-01-12
🏛️ AAAI Conference on Artificial Intelligence
📈 Citations: 33
Influential: 2
🤖 AI Summary
This paper targets two weaknesses of multi-view stereo (MVS) reconstruction in textureless regions: poor completeness and heavy reliance on manual hyperparameter tuning. It proposes a segmentation-guided MVS method and is the first to bring the Segment Anything Model (SAM) into MVS, using instance-level semantic constraints to guide pixelwise patch deformation in both matching-cost computation and propagation. It also designs a refinement strategy that couples spherical-coordinate gradient descent on normals with a pixelwise depth search interval, which improves the completeness of the reconstructed model. In addition, an Expectation-Maximization (EM) framework alternately optimizes the aggregated matching cost and the hyperparameters, substantially reducing manual tuning. On the ETH3D high-resolution benchmark and the Tanks and Temples dataset, the method achieves state-of-the-art results with lower time consumption.
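The depth half of the refinement can be pictured as a per-pixel search: each pixel gets its own interval around its current depth, candidate depths inside that interval are scored, and the cheapest one is kept. The source does not give the interval rule or the cost, so the Python sketch below is only an illustration under stated assumptions: the helpers `pixelwise_interval` and `refine_depth` are hypothetical, the interval half-width is assumed to grow with disagreement among neighbouring depths, and the toy quadratic cost stands in for a real photometric matching cost.

```python
from typing import Callable, Sequence


def pixelwise_interval(depth: float, neighbor_depths: Sequence[float],
                       min_ratio: float = 0.01) -> float:
    """Hypothetical rule (not from SD-MVS): the half-width of the search
    interval grows with how much neighbouring depths disagree with this pixel."""
    floor = min_ratio * depth  # never search a vanishingly small interval
    if not neighbor_depths:
        return floor
    spread = max(abs(d - depth) for d in neighbor_depths)
    return max(floor, spread)


def refine_depth(depth: float, neighbor_depths: Sequence[float],
                 cost: Callable[[float], float], num_samples: int = 9) -> float:
    """Score evenly spaced depths in [depth - r, depth + r] and keep the
    cheapest one; the current depth is kept if nothing beats it."""
    if num_samples < 2:
        raise ValueError("num_samples must be at least 2")
    r = pixelwise_interval(depth, neighbor_depths)
    best_d, best_c = depth, cost(depth)
    for i in range(num_samples):
        d = depth - r + 2.0 * r * i / (num_samples - 1)
        if d <= 0.0:  # depths must stay in front of the camera
            continue
        c = cost(d)
        if c < best_c:
            best_d, best_c = d, c
    return best_d


# Toy cost with its minimum at depth 2.3 (stand-in for a photometric cost).
def toy_cost(d: float) -> float:
    return (d - 2.3) ** 2


refined = refine_depth(2.0, [2.0, 2.4, 1.9], toy_cost)
print(refined)  # approximately 2.3
```

Tying the interval to neighbour disagreement is one plausible design: it searches widely where hypotheses conflict (for example near depth edges or in poorly initialised textureless areas) and narrowly where they agree. SD-MVS may use a different rule.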

📝 Abstract
In this paper, we introduce Segmentation-Driven Deformation Multi-View Stereo (SD-MVS), a method that can effectively tackle the challenges of 3D reconstruction in textureless areas. We are the first to adopt the Segment Anything Model (SAM) to distinguish semantic instances in scenes, and we further leverage these constraints for pixelwise patch deformation in both matching cost and propagation. Concurrently, we propose a unique refinement strategy that combines spherical coordinates with gradient descent on normals and a pixelwise search interval on depths, significantly improving the completeness of the reconstructed 3D model. Furthermore, we adopt the Expectation-Maximization (EM) algorithm to alternately optimize the aggregated matching cost and the hyperparameters, effectively mitigating the excessive dependence of parameters on empirical tuning. Evaluations on the ETH3D high-resolution multi-view stereo benchmark and the Tanks and Temples dataset demonstrate that our method achieves state-of-the-art results with less time consumption.
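A practical reason to refine normals in spherical coordinates is that a unit normal has only two degrees of freedom. Writing it as n(θ, φ) = (sin θ cos φ, sin θ sin φ, cos θ) means gradient descent on (θ, φ) can never leave the unit sphere, so no renormalisation step is needed. The Python sketch below shows that mechanism only. It assumes a stand-in cost, 1 - n·m with m the mean of neighbouring normals, in place of the paper's actual matching cost, and `normal_from_angles` and `refine_normal` are hypothetical names.

```python
import math
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def normal_from_angles(theta: float, phi: float) -> Vec3:
    """Polar angle theta and azimuth phi -> unit normal (|n| = 1 by construction)."""
    s = math.sin(theta)
    return (s * math.cos(phi), s * math.sin(phi), math.cos(theta))


def refine_normal(theta: float, phi: float, neighbor_normals: Sequence[Vec3],
                  lr: float = 0.5, steps: int = 1000) -> Tuple[float, float]:
    """Gradient descent on (theta, phi) for the hypothetical stand-in cost
    f = 1 - n(theta, phi) . m, where m is the mean neighbouring normal."""
    k = len(neighbor_normals)
    m = [sum(n[i] for n in neighbor_normals) / k for i in range(3)]
    for _ in range(steps):
        st, ct = math.sin(theta), math.cos(theta)
        sp, cp = math.sin(phi), math.cos(phi)
        dn_dtheta = (ct * cp, ct * sp, -st)  # partial derivative of n w.r.t. theta
        dn_dphi = (-st * sp, st * cp, 0.0)   # partial derivative of n w.r.t. phi
        grad_theta = -sum(a * b for a, b in zip(dn_dtheta, m))
        grad_phi = -sum(a * b for a, b in zip(dn_dphi, m))
        theta -= lr * grad_theta
        phi -= lr * grad_phi
    return theta, phi


neighbors = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
theta, phi = refine_normal(0.3, 0.2, neighbors)
n = normal_from_angles(theta, phi)
print(n)  # close to (1, 1, 1) / sqrt(3)
```

The minimiser of the stand-in cost is the normalised mean neighbour normal, which makes the result easy to check; the point of the sketch is the parametrisation, not the cost.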
Problem

Reconstructing 3D models in textureless areas using segmentation and deformation
Improving model completeness via spherical refinement and gradient optimization
Reducing empirical parameter tuning through EM algorithm optimization
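The segmentation-and-deformation idea can be illustrated by restricting a matching window to pixels that share the centre pixel's segmentation instance, so a patch straddling an object boundary does not mix two surfaces. This Python sketch is a simplification, not SD-MVS's deformation scheme: the instance label map is assumed to come from SAM masks rasterised into integer IDs, the source patch is assumed to be already warped into the reference view (no homography), and `same_instance_support` and `masked_ncc` are hypothetical helpers.

```python
import math
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int]


def same_instance_support(labels: Sequence[Sequence[int]], cy: int, cx: int,
                          radius: int) -> List[Pixel]:
    """Pixels of the (2*radius+1)^2 window around (cy, cx) whose instance
    label matches the centre pixel; pixels of other instances are dropped."""
    h, w = len(labels), len(labels[0])
    target = labels[cy][cx]
    return [(y, x)
            for y in range(max(0, cy - radius), min(h, cy + radius + 1))
            for x in range(max(0, cx - radius), min(w, cx + radius + 1))
            if labels[y][x] == target]


def masked_ncc(ref: Sequence[Sequence[float]], src: Sequence[Sequence[float]],
               support: List[Pixel]) -> float:
    """Normalised cross-correlation over the support pixels only.
    Assumption: src is already aligned with ref pixel by pixel."""
    a = [ref[y][x] for y, x in support]
    b = [src[y][x] for y, x in support]
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    va = sum((p - ma) ** 2 for p in a)
    vb = sum((q - mb) ** 2 for q in b)
    if va == 0.0 or vb == 0.0:
        return 0.0  # undefined for constant patches; treat as uninformative
    return cov / math.sqrt(va * vb)


labels = [[1, 2, 2], [1, 2, 2], [1, 2, 2]]  # column 0 is a different instance
ref = [[9, 1, 2], [9, 3, 4], [9, 5, 6]]
src = [[0, 3, 5], [0, 7, 9], [0, 11, 13]]   # src = 2 * ref + 1 on instance 2 only
support = same_instance_support(labels, 1, 1, radius=1)
full = [(y, x) for y in range(3) for x in range(3)]
print(len(support), masked_ncc(ref, src, support), masked_ncc(ref, src, full))
```

In the toy example the instance-restricted score is 1.0 because the two patches are linearly related on instance 2, while including the other instance's column drives the full-window score negative.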
Innovation

Uses SAM for semantic instance segmentation constraints
Combines spherical coordinates with gradient descent refinement
Employs the EM algorithm for automated hyperparameter optimization
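The source does not spell out its EM model, so the following is a generic, swapped-in illustration of the alternation EM performs: an E-step that assigns data given the current parameters, then an M-step that re-estimates the parameters given those assignments. It fits a textbook two-component 1-D Gaussian mixture to matching costs ("reliable" vs "unreliable") and derives a cost threshold from the fit instead of hand-tuning it. The mixture model, the initial values, the names `gaussian_pdf` and `em_two_gaussians`, and the 3-sigma threshold rule are all assumptions, not the SD-MVS formulation.

```python
import math
from typing import List, Sequence


def gaussian_pdf(x: float, mu: float, var: float) -> float:
    return math.exp(-(x - mu) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def em_two_gaussians(costs: Sequence[float], mu=(0.2, 0.8), var=(0.01, 0.01),
                     weight=(0.5, 0.5), iters: int = 50, eps: float = 1e-6):
    """Hypothetical helper: fit a 2-component 1-D Gaussian mixture by EM and
    return (means, variances, weights). Component 0 = low-cost matches."""
    mu, var, weight = list(mu), list(var), list(weight)
    for _ in range(iters):
        # E-step: posterior probability that each cost belongs to component 0.
        resp0: List[float] = []
        for c in costs:
            p0 = weight[0] * gaussian_pdf(c, mu[0], var[0])
            p1 = weight[1] * gaussian_pdf(c, mu[1], var[1])
            total = p0 + p1
            resp0.append(p0 / total if total > 0.0 else 0.5)
        # M-step: re-estimate each component from its weighted samples.
        for k, resp in ((0, resp0), (1, [1.0 - r for r in resp0])):
            nk = sum(resp)
            if nk < eps:
                continue  # component has (almost) no support; keep old values
            mu[k] = sum(r * c for r, c in zip(resp, costs)) / nk
            var[k] = max(sum(r * (c - mu[k]) ** 2 for r, c in zip(resp, costs)) / nk, eps)
            weight[k] = nk / len(costs)
    return mu, var, weight


costs = [0.10, 0.12, 0.11, 0.09, 0.10, 0.90, 0.92, 0.88]
mu, var, weight = em_two_gaussians(costs)
threshold = mu[0] + 3.0 * math.sqrt(var[0])  # hypothetical data-driven cutoff
print(mu, weight, threshold)
```

The fitted parameters replace a hand-picked cut-off with one estimated from the data, which is the kind of reduction in empirical tuning the summary describes; how SD-MVS couples EM to its aggregated cost is not reproduced here.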
Authors
Zhenlong Yuan
Institute of Computing Technology, Chinese Academy of Sciences
Jiakai Cao
Institute of Computing Technology, Chinese Academy of Sciences
Zhaoxin Li
Georgia Institute of Technology
Robot Learning, Explainable Artificial Intelligence
Hao Jiang
Institute of Computing Technology, Chinese Academy of Sciences
Zhao Wang
Institute of Computing Technology, Chinese Academy of Sciences