🤖 AI Summary
This work addresses the poor geometric quality of existing text-to-3D face generation methods, which often stems from irregular vertex distributions. To overcome this limitation, the authors constrain the 3D facial geometry to a topological sphere, yielding a regular spherical representation that can be unfolded into a 2D chart. This formulation makes it natural to apply a conditional diffusion model for joint generation of geometry and texture. The authors describe the approach as the first seamless fusion of 2D diffusion models with 3D face geometry synthesis, using spherical parameterization to ensure uniform point distribution, support robust mesh reconstruction, and enable geometry-guided texture synthesis. Experiments demonstrate that the method significantly outperforms current state-of-the-art techniques in text-to-3D generation, face reconstruction, and text-driven editing, achieving superior geometric fidelity, textual alignment, and inference efficiency.
📝 Abstract
A fundamental challenge in text-to-3D face generation is achieving high-quality geometry. The core difficulty lies in the arbitrary and intricate distribution of vertices in 3D space, which makes it hard for existing models to establish clean connectivity and results in suboptimal geometry. To address this, our core insight is to simplify the underlying geometric structure by constraining the vertex distribution onto a simple, regular manifold: a topological sphere. Building on this, we first propose the Spherical Geometry Representation, a novel face representation that anchors geometric signals to uniform spherical coordinates. This guarantees a regular point distribution, from which the mesh connectivity can be robustly reconstructed. Critically, this canonical sphere can be seamlessly unwrapped into a 2D map, creating a natural synergy with powerful 2D generative models. We then introduce Spherical Geometry Diffusion, a conditional diffusion framework built upon this 2D map. It enables diverse and controllable generation by jointly modeling geometry and texture, where the geometry explicitly conditions the texture synthesis process. We demonstrate the method's effectiveness on a wide range of tasks: text-to-3D generation, face reconstruction, and text-based 3D editing. Extensive experiments show that our approach substantially outperforms existing methods in geometric quality, textual fidelity, and inference efficiency.
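The key mechanism in the abstract is unwrapping a canonical sphere into a 2D map so that 2D generative models can operate on it. The paper does not specify its exact parameterization, so the sketch below is only an illustrative assumption: an equirectangular (longitude/latitude) mapping between points on the unit sphere and a 2D chart, with its inverse, showing how per-point geometric signals anchored to the sphere could be laid out on a regular 2D grid.

```python
import numpy as np

def sphere_to_chart(points):
    """Map unit-sphere points (N, 3) to 2D chart coordinates (N, 2) in [0, 1]^2.

    Uses an equirectangular (longitude/latitude) parameterization — an
    illustrative choice, not necessarily the one used in the paper.
    """
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    lon = np.arctan2(y, x)               # azimuth in (-pi, pi]
    lat = np.arcsin(np.clip(z, -1, 1))   # elevation in [-pi/2, pi/2]
    u = (lon + np.pi) / (2 * np.pi)
    v = (lat + np.pi / 2) / np.pi
    return np.stack([u, v], axis=1)

def chart_to_sphere(uv):
    """Inverse map: 2D chart coordinates back to points on the unit sphere."""
    lon = uv[:, 0] * 2 * np.pi - np.pi
    lat = uv[:, 1] * np.pi - np.pi / 2
    x = np.cos(lat) * np.cos(lon)
    y = np.cos(lat) * np.sin(lon)
    z = np.sin(lat)
    return np.stack([x, y, z], axis=1)

# Round trip: random unit vectors survive chart encoding and decoding.
rng = np.random.default_rng(0)
pts = rng.normal(size=(100, 3))
pts /= np.linalg.norm(pts, axis=1, keepdims=True)
recovered = chart_to_sphere(sphere_to_chart(pts))
print(np.allclose(pts, recovered, atol=1e-8))  # True
```

Because every sphere point lands at a unique chart location, geometry (e.g., radial displacement) and texture can both be stored as aligned 2D images over this chart, which is what makes a 2D diffusion backbone applicable.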