Photo3D: Advancing Photorealistic 3D Generation through Structure-Aligned Detail Enhancement

📅 2025-12-09
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
Current 3D-native generative models achieve notable progress in geometric modeling but suffer from suboptimal appearance fidelity due to the scarcity of high-fidelity real-world texture data, which is caused by limited scanning resolution, non-rigid deformations, and scene-scale variability. To address this, the authors propose a structure-aligned multi-view synthesis framework: (1) leveraging the GPT-4o-Image model to synthesize high-quality, multi-view, semantically consistent images for constructing a detail-enhanced training set; (2) introducing perceptual feature adaptation and explicit semantic-structure matching to jointly optimize geometric consistency and texture realism. The method supports both geometry-texture coupled and decoupled generation paradigms, ensuring strong generalization. Experiments demonstrate state-of-the-art performance across multiple 3D generation benchmarks, with significant improvements in texture richness and cross-view consistency.

πŸ“ Abstract
Although recent 3D-native generators have made great progress in synthesizing reliable geometry, they still fall short in achieving realistic appearances. A key obstacle lies in the lack of diverse and high-quality real-world 3D assets with rich texture details, since capturing such data is intrinsically difficult due to the diverse scales of scenes, non-rigid motions of objects, and the limited precision of 3D scanners. We introduce Photo3D, a framework for advancing photorealistic 3D generation, which is driven by the image data generated by the GPT-4o-Image model. Considering that the generated images can distort 3D structures due to their lack of multi-view consistency, we design a structure-aligned multi-view synthesis pipeline and construct a detail-enhanced multi-view dataset paired with 3D geometry. Building on it, we present a realistic detail enhancement scheme that leverages perceptual feature adaptation and semantic structure matching to enforce appearance consistency with realistic details while preserving the structural consistency with the 3D-native geometry. Our scheme is general to different 3D-native generators, and we present dedicated training strategies to facilitate the optimization of geometry-texture coupled and decoupled 3D-native generation paradigms. Experiments demonstrate that Photo3D generalizes well across diverse 3D-native generation paradigms and achieves state-of-the-art photorealistic 3D generation performance.
Problem

Research questions and friction points this paper is trying to address.

Current 3D-native generators produce geometry reliably but lack realistic appearance details
Real-world 3D data with diverse scenes and rich textures is scarce and hard to capture
Appearance detail must be enhanced without breaking consistency with the 3D structure
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses GPT-4o-Image-generated images to build a detail-enhanced 3D training dataset
Implements structure-aligned multi-view synthesis to enforce cross-view consistency
Applies perceptual feature adaptation and semantic structure matching for realistic detail enhancement
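The paper's exact losses are not given on this page, but the core idea behind a perceptual feature loss — comparing two images in a feature space rather than pixel space, so that texture-level differences are penalized more meaningfully — can be sketched minimally. The toy example below uses a fixed random filter bank as a stand-in for a pretrained network's activations; it is purely illustrative and not Photo3D's implementation.

```python
import numpy as np

def conv_features(img, filters):
    """Crude 'perceptual' features: valid cross-correlation of the
    image with each filter (a stand-in for CNN activation maps)."""
    kh, kw = filters.shape[1:]
    h, w = img.shape
    out = np.empty((filters.shape[0], h - kh + 1, w - kw + 1))
    for i, f in enumerate(filters):
        for y in range(h - kh + 1):
            for x in range(w - kw + 1):
                out[i, y, x] = np.sum(img[y:y + kh, x:x + kw] * f)
    return out

def perceptual_loss(img_a, img_b, filters):
    """Mean squared distance between the feature maps of two images."""
    fa = conv_features(img_a, filters)
    fb = conv_features(img_b, filters)
    return float(np.mean((fa - fb) ** 2))

# Hypothetical data: a reference image and a slightly perturbed copy.
rng = np.random.default_rng(0)
filters = rng.standard_normal((4, 3, 3))
img = rng.standard_normal((8, 8))
noisy = img + 0.1 * rng.standard_normal((8, 8))

loss_same = perceptual_loss(img, img, filters)
loss_noisy = perceptual_loss(img, noisy, filters)
```

In practice such a loss would be computed with features from a pretrained vision network (e.g. VGG-style activations) and combined with structural constraints, matching the paper's stated goal of enforcing appearance consistency while preserving geometry.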
🔎 Similar Papers
No similar papers found.