TOPOS: High-Fidelity and Efficient Industry-Grade 3D Head Generation

📅 2026-05-14

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses industry-grade 3D head generation. Production pipelines require a fixed mesh topology to support rigging, skinning and animation, yet general-purpose 3D generative models produce meshes with inconsistent topologies and redundant vertices, which hinders semantic correspondence and asset reuse. To this end, we propose TOPOS, a framework for high-fidelity 3D head generation under a fixed topology. Our approach combines three components: TOPOS-VAE, a variational autoencoder that uses a Perceiver Resampler for cross-topology mapping; TOPOS-DiT, a rectified flow transformer that generates head meshes in the VAE's latent space; and TOPOS-Texture, an end-to-end UV texture synthesis module. Together they generate geometry and relightable textures from a single image while keeping vertex-wise consistency across all outputs. Experiments show that TOPOS achieves state-of-the-art 3D head generation, outperforming both classical face reconstruction methods and general 3D object generative models, which makes it suitable for high-quality digital human asset production.
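The cross-topology mapping rests on the Perceiver Resampler idea: a fixed number of learned query tokens cross-attends to an input point set of any size, so meshes with different vertex counts all map to the same number of output tokens. The NumPy code here is a minimal sketch under stated assumptions, not the TOPOS-VAE implementation: it assumes a single Flamingo-style layer with one attention head, no feed-forward block, random (untrained) weights, and hypothetical names (`PerceiverResamplerSketch`, `num_queries`, `d_model`). How TOPOS decodes these tokens into target-topology vertices is not covered.

```python
import numpy as np


def layer_norm(x, eps=1e-5):
    # Normalize each token over its feature dimension (no learned scale/shift).
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)


def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


class PerceiverResamplerSketch:
    """Hypothetical single-layer, single-head resampler: a fixed set of learned
    queries (one per output token) cross-attends to a variable-size point set."""

    def __init__(self, num_queries, in_dim, d_model, seed=0):
        rng = np.random.default_rng(seed)
        s = 1.0 / np.sqrt(d_model)
        self.d_model = d_model
        self.queries = rng.normal(0.0, 1.0, (num_queries, d_model))  # learned latents
        self.w_in = rng.normal(0.0, 1.0 / np.sqrt(in_dim), (in_dim, d_model))  # point embedding
        self.w_q = rng.normal(0.0, s, (d_model, d_model))
        self.w_k = rng.normal(0.0, s, (d_model, d_model))
        self.w_v = rng.normal(0.0, s, (d_model, d_model))

    def __call__(self, points):
        # points: (N, in_dim); N may differ from one source mesh to another.
        x = layer_norm(points @ self.w_in)              # (N, d)
        lat = layer_norm(self.queries)                  # (Q, d)
        # Flamingo-style: keys/values are the input tokens plus the latents.
        kv = np.concatenate([x, lat], axis=0)           # (N + Q, d)
        q = lat @ self.w_q
        k = kv @ self.w_k
        v = kv @ self.w_v
        attn = softmax(q @ k.T / np.sqrt(self.d_model))  # (Q, N + Q)
        return self.queries + attn @ v                  # residual; always (Q, d)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    resampler = PerceiverResamplerSketch(num_queries=64, in_dim=3, d_model=32)
    small = rng.normal(size=(500, 3))   # random stand-in for points from a sparse mesh
    large = rng.normal(size=(1200, 3))  # random stand-in for points from a dense mesh
    print(resampler(small).shape, resampler(large).shape)  # (64, 32) (64, 32)
```

Because attention over a set ignores point order, shuffling the input points leaves the output unchanged; the fixed query count is what turns arbitrary source topologies into a single, uniform token layout.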

📝 Abstract

High-fidelity 3D head generation plays a crucial role in the film, animation and video game industries. In industrial pipelines, studios typically enforce a fixed reference topology across all head assets, because a clean and uniform topology is a prerequisite for production-level rigging, skinning and animation. In this paper, we present TOPOS, a framework tailored for single-image-conditioned 3D head generation that jointly recovers geometry and appearance under such an industry-standard topology. Unlike general 3D generative models, which produce triangle meshes with inconsistent topology and numerous vertices (hindering semantic correspondence and asset-level reuse), TOPOS generates head meshes with a fixed, studio-style topology, enabling consistent vertex-level correspondence across all generated heads. To model heads under this unified topology, we propose a novel variational autoencoder structure, termed TOPOS-VAE. Inspired by multimodal large language models (MLLMs), TOPOS-VAE leverages the Perceiver Resampler to convert input point clouds sampled from head meshes of diverse topologies into the target reference topology. Building on TOPOS-VAE's structured latent space, we train a rectified flow transformer, TOPOS-DiT, to efficiently generate high-fidelity head meshes from a single image. We further present TOPOS-Texture, an end-to-end module that produces relightable UV texture maps from the same portrait image by fine-tuning a multimodal image generative model. The generated textures are spatially aligned with the underlying mesh geometry and faithfully preserve high-frequency appearance details. Extensive experiments demonstrate that TOPOS achieves state-of-the-art performance on 3D head generation, surpassing both classical face reconstruction methods and general 3D object generative models, highlighting its effectiveness for digital human creation.
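TOPOS-DiT is described as a rectified flow transformer trained on TOPOS-VAE latents. Under the common rectified-flow convention (an assumption here; the abstract does not state the exact time parameterization or conditioning), a training example interpolates linearly between Gaussian noise `x0` and a data latent `x1`, the network regresses the constant velocity `x1 - x0`, and sampling integrates the learned velocity with Euler steps from t = 0 to t = 1. In this minimal sketch, `velocity_fn` stands in for the image-conditioned transformer, and all names and latent shapes are hypothetical.

```python
import numpy as np


def rectified_flow_pair(x1, rng):
    """Build one training example: noisy latent x_t, time t, velocity target."""
    x0 = rng.standard_normal(x1.shape)  # Gaussian noise sample
    t = rng.uniform()                   # t ~ U(0, 1)
    x_t = (1.0 - t) * x0 + t * x1       # straight-line interpolation noise -> data
    v_target = x1 - x0                  # constant velocity along that line
    return x_t, t, v_target


def rf_loss(v_pred, v_target):
    # Mean squared error between predicted and target velocity.
    return np.mean((v_pred - v_target) ** 2)


def euler_sample(velocity_fn, x0, num_steps):
    """Integrate dx/dt = velocity_fn(x, t) from t = 0 to t = 1 with Euler steps."""
    x = x0.copy()
    h = 1.0 / num_steps
    for k in range(num_steps):
        t = k * h  # t stays strictly below 1
        x = x + h * velocity_fn(x, t)
    return x


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    mu = rng.normal(size=(64, 32))  # hypothetical target latent (64 tokens x 32 dims)

    # Exact optimal velocity when the data distribution is a single point mu.
    def oracle(x, t):
        return (mu - x) / (1.0 - t)

    x_gen = euler_sample(oracle, rng.standard_normal(mu.shape), num_steps=10)
    print(np.abs(x_gen - mu).max())  # close to zero (floating-point error only)
```

The oracle field is what a perfectly trained network would learn for a point-mass dataset, so Euler integration reaches `mu` for any step count; a real TOPOS-DiT only approximates the velocity field of the full latent distribution, so step count then trades speed against accuracy.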

Problem

Research questions and friction points this paper is trying to address.

3D head generation

fixed topology

industry-grade

semantic correspondence

digital human

Innovation

Methods, ideas, or system contributions that make the work stand out.

fixed topology

3D head generation

variational autoencoder