High-fidelity 3D Object Generation from Single Image with RGBN-Volume Gaussian Reconstruction Model

📅 2025-04-02
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address geometric distortions and blurred details in single-view RGB-based 3D object reconstruction—caused by geometric ambiguity and the lack of structural priors in Gaussian representations—this paper proposes an RGBN-volume hybrid representation and introduces the first voxel-Gaussian joint modeling framework. Methodologically, it fuses RGB and surface normal (N) features to construct an explicit voxel-based guidance signal, enabling multi-modal feature alignment and single-view geometric reasoning, which subsequently drives Gaussian splatting optimization. This design jointly ensures geometric fidelity and expressive radiance field modeling. Extensive experiments demonstrate significant improvements over state-of-the-art methods in reconstruction quality, cross-category generalization, and inference efficiency. The generated 3D objects exhibit both rich surface detail and accurate geometric structure.

📝 Abstract
Single-view 3D generation via Gaussian splatting has recently emerged and developed quickly. These methods learn 3D Gaussians from 2D RGB images generated by pre-trained multi-view diffusion (MVD) models, and have shown a promising avenue for 3D generation from a single image. Despite this progress, they still suffer from inconsistency jointly caused by geometric ambiguity in the 2D images and the lack of structure in 3D Gaussians, leading to distorted and blurry 3D object generation. In this paper, we address these issues with GS-RGBN, a new RGBN-volume Gaussian Reconstruction Model designed to generate high-fidelity 3D objects from single-view images. Our key insight is that a structured 3D representation can simultaneously mitigate both issues. To this end, we propose a novel hybrid voxel-Gaussian representation, in which a 3D voxel representation carries explicit 3D geometric information, eliminating the geometric ambiguity of 2D images. It also structures the Gaussians during learning, so that optimization tends to find better local optima. Our 3D voxel representation is obtained by a fusion module that aligns RGB features and surface-normal features, both of which can be estimated from 2D images. Extensive experiments demonstrate the superiority of our method over prior works in terms of reconstruction quality, robust generalization, and efficiency.
Problem

Research questions and friction points this paper is trying to address.

Resolving geometric ambiguity in 2D images for 3D generation
Structuring 3D Gaussians to reduce distortion and blur
Improving single-view 3D object reconstruction fidelity
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hybrid Voxel-Gaussian representation for 3D
RGB and surface normal feature fusion
Structured 3D Gaussians mitigate distortion
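The hybrid representation described above can be sketched as a data structure: each occupied voxel fuses an RGB feature with a surface-normal feature, and the fused feature parameterizes a 3D Gaussian anchored at the voxel center. This is a minimal, illustrative sketch, not the authors' code; the fusion here is a toy weighted sum, the decoder is a hand-written feature split rather than a learned network, and all names and dimensions (`FEAT_DIM`, `fuse`, `voxel_to_gaussian`) are hypothetical.

```python
# Hypothetical sketch of a voxel-Gaussian hybrid: fused RGBN voxel
# features decoded into per-voxel Gaussian parameters. Not the paper's
# implementation; a real model would use learned fusion and decoding.
from dataclasses import dataclass
from typing import List, Tuple

FEAT_DIM = 8  # hypothetical per-modality feature size


@dataclass
class Gaussian:
    center: Tuple[float, float, float]  # voxel center plus a learned offset
    scale: Tuple[float, float, float]
    opacity: float
    color: Tuple[float, float, float]


def fuse(rgb_feat: List[float], normal_feat: List[float],
         w_rgb: float = 0.5, w_normal: float = 0.5) -> List[float]:
    """Toy stand-in for the fusion module: a weighted sum aligning the
    RGB and surface-normal features into one per-voxel feature."""
    assert len(rgb_feat) == len(normal_feat) == FEAT_DIM
    return [w_rgb * r + w_normal * n for r, n in zip(rgb_feat, normal_feat)]


def voxel_to_gaussian(center: Tuple[float, float, float],
                      fused: List[float], voxel_size: float) -> Gaussian:
    """Decode a fused voxel feature into one Gaussian's parameters by
    splitting the feature vector: 3 dims for the position offset, 3 for
    scale, 1 for opacity, 1 reused as a placeholder gray color."""
    offset = [voxel_size * (f - 0.5) for f in fused[:3]]
    return Gaussian(
        center=tuple(c + o for c, o in zip(center, offset)),
        scale=tuple(voxel_size * max(f, 1e-3) for f in fused[3:6]),
        opacity=min(max(fused[6], 0.0), 1.0),
        color=(fused[7], fused[7], fused[7]),
    )
```

The design point this illustrates is the paper's claim that anchoring Gaussians to an explicit voxel grid structures the optimization: each Gaussian can only drift within its voxel, so the geometry inherited from the RGBN volume constrains where radiance is placed.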
Yiyang Shen, State Key Lab of CAD&CG, Zhejiang University
Kun Zhou, State Key Lab of CAD&CG, Zhejiang University
He Wang, AI Centre, University College London
Yin Yang, University of Utah
Tianjia Shao, University of Leeds
computer graphics