🤖 AI Summary
This work addresses metric-scale 3D point map reconstruction from a single image, jointly optimizing absolute scale accuracy, relative geometric fidelity, and fine-grained detail recovery. To mitigate the loss of detail caused by noise and errors in real-world data, we propose a unified data refinement framework that integrates heterogeneous real-world data and uses sharp synthetic labels to filter and complete it. We extend affine-invariant representations to encode both relative geometry and absolute scale. Within the MoGe architecture, we introduce an explicit scale calibration mechanism and train on a mixture of datasets. Experiments demonstrate that our method significantly outperforms state-of-the-art monocular depth estimation approaches across multiple benchmarks. It improves scale error, depth accuracy, and surface detail completeness, making it the first approach to deliver high-fidelity relative geometry, precise metric-scale reconstruction, and rich geometric detail at the same time.
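To make the "filter and complete" idea concrete, here is a minimal sketch under stated assumptions. It is not the paper's refinement procedure, which this summary does not spell out. It assumes a noisy real depth map with holes and a sharp depth prediction from a model trained on synthetic labels. The function name `refine_depth`, the median-ratio alignment, and the `rel_tol` threshold are all hypothetical choices made for illustration.

```python
import numpy as np

def refine_depth(real_depth, pred_depth, rel_tol=0.1):
    """Filter unreliable real depth and fill gaps from a sharp prediction.

    Illustrative only: the alignment and threshold are assumptions,
    not the method described in the paper. pred_depth must be positive.
    """
    # Pixels where the real sensor depth is present and positive.
    valid = np.isfinite(real_depth) & (real_depth > 0)

    # Robustly align the prediction to the real data with a median ratio,
    # so a few outliers in the real depth do not skew the scale.
    scale = np.median(real_depth[valid] / pred_depth[valid])
    aligned = scale * pred_depth

    # Relative disagreement between real depth and the aligned prediction.
    rel_err = np.full(real_depth.shape, np.inf)
    rel_err[valid] = np.abs(aligned[valid] - real_depth[valid]) / real_depth[valid]

    # Filter: keep real depth only where it agrees with the prediction.
    keep = valid & (rel_err < rel_tol)

    # Complete: use the aligned prediction wherever real depth was rejected
    # or missing.
    refined = np.where(keep, real_depth, aligned)
    return refined, keep
```

In this sketch the real data still sets the overall scale, while the synthetic-trained prediction supplies values only where the real depth is missing or inconsistent.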
📝 Abstract
We propose MoGe-2, an advanced open-domain geometry estimation model that recovers a metric-scale 3D point map of a scene from a single image. Our method builds upon the recent monocular geometry estimation approach, MoGe, which predicts affine-invariant point maps with unknown scales. We explore effective strategies to extend MoGe for metric geometry prediction without compromising the relative geometry accuracy provided by the affine-invariant point representation. Additionally, we discover that noise and errors in real data diminish fine-grained detail in the predicted geometry. We address this by developing a unified data refinement approach that filters and completes real data from different sources using sharp synthetic labels, significantly enhancing the granularity of the reconstructed geometry while maintaining the overall accuracy. We train our model on a large corpus of mixed datasets and conduct comprehensive evaluations, demonstrating its superior performance in achieving accurate relative geometry, precise metric scale, and fine-grained detail recovery, capabilities that no previous method has simultaneously achieved.
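As a minimal sketch of what separates an affine-invariant point map from a metric one, the snippet below fits the global scale and shift that best map a scale-ambiguous prediction onto metric points. The function name `align_scale_shift`, the use of NumPy, and the closed-form least-squares fit are illustrative assumptions. They are not MoGe-2's scale prediction mechanism, which the abstract does not detail, and the alignment used in the paper's evaluation may differ.

```python
import numpy as np

def align_scale_shift(pred, target, mask=None):
    """Least-squares scale s and shift t such that s * pred + t ~= target.

    pred, target: arrays of shape (..., 3) holding 3D points.
    mask: optional boolean array selecting the points to use.
    """
    pred = pred.reshape(-1, 3)
    target = target.reshape(-1, 3)
    if mask is not None:
        m = mask.reshape(-1)
        pred, target = pred[m], target[m]

    # Centering removes the shift, leaving a 1D least-squares fit for scale.
    pred_mean = pred.mean(axis=0)
    target_mean = target.mean(axis=0)
    pred_c = pred - pred_mean
    target_c = target - target_mean

    scale = (pred_c * target_c).sum() / (pred_c * pred_c).sum()
    shift = target_mean - scale * pred_mean
    return scale, shift

# Usage: a toy "affine-invariant" prediction and its metric counterpart.
rng = np.random.default_rng(0)
points_affine = rng.normal(size=(4, 5, 3))
points_metric = 2.5 * points_affine + np.array([0.1, -0.2, 0.3])

scale, shift = align_scale_shift(points_affine, points_metric)
recovered = scale * points_affine + shift
```

An affine-invariant model leaves `scale` and `shift` undetermined, so they must be fitted against ground truth as above. A metric model in the sense of the abstract has to supply the absolute scale itself from the single input image.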