🤖 AI Summary
Problem: Neural implicit methods struggle to reconstruct fine-grained geometry, sharp edges, and thin structures from sparse multi-view RGB inputs—particularly with only two views (front and back).
Method: Moving beyond conventional zero-order geometric constraints (e.g., point-projection consistency), we introduce first-order differential constraints—specifically surface normals—as explicit supervision for neural implicit modeling. We estimate monocular depth using Depth Anything and derive approximate image-space surface normals, formulating a normal consistency loss that supervises the first-order differential properties of signed distance functions (SDFs) or NeRF-like implicit fields.
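The image-space normal estimation step can be sketched as follows. This is a minimal illustration, not the paper's implementation: it treats the monocular depth map as a height field and takes finite-difference gradients; the focal-length scaling parameters `fx`, `fy` are hypothetical placeholders for whatever camera model the method actually uses.

```python
import numpy as np

def normals_from_depth(depth, fx=1.0, fy=1.0):
    """Approximate image-space surface normals from a depth map (sketch)."""
    # Finite-difference gradients of depth along the image rows and columns.
    dz_dv, dz_du = np.gradient(depth)
    # Normal of the surface z = depth(u, v) is proportional to
    # (-dz/du, -dz/dv, 1); fx, fy are assumed scale factors.
    n = np.stack([-dz_du * fx, -dz_dv * fy, np.ones_like(depth)], axis=-1)
    # Normalize to unit length.
    return n / np.linalg.norm(n, axis=-1, keepdims=True)

# Toy example: a planar depth ramp yields a constant tilted normal.
depth = np.tile(np.linspace(1.0, 2.0, 8), (8, 1))
normals = normals_from_depth(depth)
```

On this ramp, every pixel gets the same normal tilted against the direction of increasing depth, which is the behavior a normal consistency loss would then enforce on the implicit surface.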
Results: Evaluated on both synthetic and real-world datasets, our method achieves high-fidelity 3D surface reconstruction from merely two RGB images. It significantly outperforms state-of-the-art approaches in PSNR, Chamfer distance, and visual quality, demonstrating that normal supervision is critical for recovering fine-scale geometric details.
📝 Abstract
Neural implicit representations have emerged as a powerful paradigm for 3D reconstruction. Despite their success, however, existing methods fail to capture fine geometric details and thin structures, especially in scenarios where only sparse RGB views of the objects of interest are available. We hypothesize that current methods for learning neural implicit representations from RGB or RGBD images produce 3D surfaces with missing parts and details because they rely only on zero-order differential properties, i.e., the 3D surface points and their projections, as supervisory signals. Such properties, however, do not capture the local 3D geometry around the points and also ignore the interactions between points. This paper demonstrates that training neural representations with first-order differential properties, i.e., surface normals, leads to highly accurate 3D surface reconstruction even in situations where as few as two RGB images (front and back) are available. Given multiview RGB images of an object of interest, we first compute approximate surface normals in the image space using the gradient of the depth maps produced by an off-the-shelf monocular depth estimator such as the Depth Anything model. An implicit surface regressor is then trained using a loss function that enforces the first-order differential properties of the regressed surface to match those estimated from Depth Anything. Our extensive experiments on a wide range of real and synthetic datasets show that the proposed method achieves an unprecedented level of reconstruction accuracy even when using as few as two RGB views. A detailed ablation study further demonstrates that normal-based supervision plays a key role in this significant improvement in performance, enabling the 3D reconstruction of intricate geometric details and thin structures that were previously challenging to capture.
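The first-order supervision described above can be sketched as a normal consistency loss between the gradient of the implicit field and the estimated target normals. This is a hedged illustration, not the paper's loss: a toy analytic sphere SDF stands in for the learned network, and gradients are taken by finite differences rather than autodiff; the `1 - cosine similarity` form is one common choice of normal loss, assumed here.

```python
import numpy as np

def sdf_sphere(p, r=1.0):
    """Analytic unit-sphere SDF, a stand-in for a learned implicit network."""
    return np.linalg.norm(p, axis=-1) - r

def sdf_normals(sdf, p, eps=1e-4):
    """Surface normals as the normalized finite-difference gradient of the SDF."""
    grads = []
    for i in range(3):
        d = np.zeros(3)
        d[i] = eps
        grads.append((sdf(p + d) - sdf(p - d)) / (2.0 * eps))
    g = np.stack(grads, axis=-1)
    return g / np.linalg.norm(g, axis=-1, keepdims=True)

def normal_consistency_loss(pred_n, target_n):
    """First-order loss: mean (1 - cosine similarity) between normal fields."""
    return np.mean(1.0 - np.sum(pred_n * target_n, axis=-1))

# Points on the unit sphere; their true normals point radially outward,
# so the loss against the sphere SDF's gradient normals is ~0.
pts = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
target = pts.copy()
loss = normal_consistency_loss(sdf_normals(sdf_sphere, pts), target)
```

In training, `target` would come from the Depth Anything-derived image-space normals, and the SDF gradient would be computed by automatic differentiation through the network.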