🤖 AI Summary
Existing multi-view 3D reconstruction methods depend on precise camera calibration and accurate pose estimation, which makes preprocessing complex and deployment difficult. This paper proposes Surf3R, an end-to-end feedforward framework that reconstructs surfaces from sparse RGB images alone, without pose priors, and completes a scene in under 10 seconds. The approach introduces two key components: (1) a multi-branch, multi-view decoder with cross-view attention, in which multiple reference views jointly guide reconstruction and enforce geometric consistency across views; and (2) a D-Normal regularizer based on an explicit 3D Gaussian representation, which couples surface normals with the other geometric parameters and optimizes them jointly to improve fine-grained surface detail. Evaluated on ScanNet++ and Replica, the method achieves state-of-the-art surface reconstruction performance while remaining efficient, generalizing well across scenes, and staying practical to deploy.
📝 Abstract
Current multi-view 3D reconstruction methods rely on accurate camera calibration and pose estimation, requiring complex and time-intensive pre-processing that hinders their practical deployment. To address this challenge, we introduce Surf3R, an end-to-end feedforward approach that reconstructs 3D surfaces from sparse views without estimating camera poses and completes an entire scene in under 10 seconds. Our method employs a multi-branch, multi-view decoding architecture in which multiple reference views jointly guide the reconstruction process. Through the proposed branch-wise processing, cross-view attention, and inter-branch fusion, the model captures complementary geometric cues without requiring camera calibration. Moreover, we introduce a D-Normal regularizer based on an explicit 3D Gaussian representation for surface reconstruction. The regularizer couples surface normals with the other geometric parameters to jointly optimize the 3D geometry, significantly improving 3D consistency and surface detail accuracy. Experimental results demonstrate that Surf3R achieves state-of-the-art performance on multiple surface reconstruction metrics on the ScanNet++ and Replica datasets, exhibiting excellent generalization and efficiency.
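To make the cross-view attention idea more concrete, the following is a minimal sketch of how tokens from one view could attend to tokens from the other views. It is not the Surf3R implementation: the function names (`softmax`, `cross_view_attention`), the single attention head, the absence of learned query/key/value projections, and the residual update are all simplifying assumptions chosen for illustration, and the abstract does not specify these details.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_view_attention(query_tokens, other_views):
    """Update each token of one view by attending to tokens of all other views.

    query_tokens: list of feature vectors (lists of floats) for the current view.
    other_views:  list of views, each a list of feature vectors.
    Assumption (illustrative only): single head, no learned projections,
    and the context tokens serve as both keys and values.
    """
    # Pool the tokens of every other view into one key/value set.
    context = [tok for view in other_views for tok in view]
    dim = len(query_tokens[0])
    scale = 1.0 / math.sqrt(dim)

    updated = []
    for q in query_tokens:
        # Scaled dot-product similarity between this token and each context token.
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in context]
        weights = softmax(scores)
        # Weighted average of the context tokens.
        attended = [
            sum(w * v[d] for w, v in zip(weights, context)) for d in range(dim)
        ]
        # Residual connection keeps the view's own features.
        updated.append([qi + ai for qi, ai in zip(q, attended)])
    return updated


# Toy example: one reference view with two tokens, two other views with one token each.
view_a = [[1.0, 0.0], [0.0, 1.0]]
view_b = [[1.0, 0.0]]
view_c = [[0.0, 1.0]]
out = cross_view_attention(view_a, [view_b, view_c])
```

In this toy setup, each token in `view_a` draws more heavily on the token from another view that it most resembles, which is the sense in which attention across views can share complementary geometric cues without any camera pose being supplied.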