🤖 AI Summary
To address domain gaps and geometric ambiguities in high-fidelity 3D geometry reconstruction from a single RGB image, this paper proposes Hi3DGen, a three-stage framework that bridges images and geometry via surface normal maps. First, an image-to-normal estimator decouples low- and high-frequency image patterns through noise injection and dual-stream training, yielding generalizable, stable, and sharp normal estimation. Second, a normal-regularized latent diffusion model learns the normal-to-geometry mapping, improving the fidelity of generated 3D geometry. Third, a 3D data synthesis pipeline constructs a high-quality dataset to support training. Crucially, this work leverages normal maps as a stable, geometrically less ambiguous intermediate representation, mitigating the fine-detail loss induced by RGB ambiguity. Extensive evaluations demonstrate significant improvements in fine-grained geometry reconstruction across multiple benchmarks, particularly in recovering intricate carvings, thin-walled structures, and subtle concave-convex details, surpassing state-of-the-art methods.
📝 Abstract
With the growing demand for high-fidelity 3D models from 2D images, existing methods still struggle to accurately reproduce fine-grained geometric details, owing to domain gaps and inherent ambiguities in RGB images. To address these issues, we propose Hi3DGen, a novel framework for generating high-fidelity 3D geometry from images via normal bridging. Hi3DGen consists of three key components: (1) an image-to-normal estimator that decouples low- and high-frequency image patterns through noise injection and dual-stream training, achieving generalizable, stable, and sharp estimation; (2) a normal-to-geometry learning approach that uses normal-regularized latent diffusion learning to enhance the fidelity of 3D geometry generation; and (3) a 3D data synthesis pipeline that constructs a high-quality dataset to support training. Extensive experiments demonstrate the effectiveness and superiority of our framework in generating rich geometric details, outperforming state-of-the-art methods in fidelity. Our work points to a new direction for high-fidelity 3D geometry generation from images by leveraging normal maps as an intermediate representation.
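The "noise injection and dual-stream training" idea in component (1) can be illustrated with a minimal frequency-decoupling sketch. The abstract does not disclose the actual decomposition or noise model; the Gaussian low-pass split, the additive noise on the low-frequency stream, and all names and parameters below (`blur`, `dual_stream_split`, `sigma`, `noise_std`) are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def gaussian_kernel1d(sigma, radius):
    # Normalized 1D Gaussian kernel of length 2*radius + 1.
    x = np.arange(-radius, radius + 1)
    k = np.exp(-0.5 * (x / sigma) ** 2)
    return k / k.sum()

def blur(img, sigma=2.0):
    # Separable Gaussian blur of a single-channel image (H x W),
    # with edge padding so the output shape matches the input.
    radius = int(3 * sigma)
    k = gaussian_kernel1d(sigma, radius)
    padded = np.pad(img, radius, mode="edge")
    rows = np.apply_along_axis(lambda r: np.convolve(r, k, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode="valid"), 0, rows)

def dual_stream_split(img, sigma=2.0, noise_std=0.1, rng=None):
    # Split an image into a low-frequency stream (coarse structure) and
    # a high-frequency residual (edges/detail), then inject Gaussian
    # noise into the low-frequency stream -- a hypothetical stand-in for
    # the paper's noise-injection scheme.
    rng = rng or np.random.default_rng(0)
    low = blur(img, sigma)
    high = img - low                       # exact residual: low + high == img
    noisy_low = low + rng.normal(0.0, noise_std, low.shape)
    return noisy_low, high

# Example: decompose a random "image" and feed each stream to its branch.
img = np.random.default_rng(0).random((16, 16))
noisy_low, high = dual_stream_split(img)
```

Each stream would then be processed by its own network branch, so sharp detail is learned from the clean high-frequency residual while the noised low-frequency stream encourages robustness to appearance variation.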