🤖 AI Summary
Existing facial texture reconstruction methods rely on unstable 3D geometric priors and lack semantic disentanglement, hindering region-wise editing and multi-style transfer. This work proposes the first end-to-end diffusion framework that generates high-quality, editable UV textures directly from multi-style face images without geometric guidance. The approach employs a single-stage reconstruction pipeline and gradient-guided inference optimization to mitigate UV misalignment, while a novel semantic distribution-aware training strategy enhances editability. To support this research, we introduce CANVAS, the first multi-style paired texture dataset. Extensive experiments demonstrate state-of-the-art performance across multiple benchmarks, significantly improving cross-domain robustness, style consistency, and semantic editability.
📝 Abstract
We propose OMGTex, an end-to-end diffusion-based framework for reconstructing high-quality and editable facial UV textures from multi-style facial images. Existing texture reconstruction methods face two major limitations: (1) Fragility due to reliance on 3D geometry priors, which are difficult to estimate accurately, especially under facial occlusions or in stylized domains; and (2) A lack of semantic disentanglement, inhibiting region-specific texture editing and style transfer. Our work addresses both challenges simultaneously.
Our core innovation is a geometry-free pipeline that directly maps a 2D face image to its corresponding editable UV texture. We introduce two key techniques: First, to address the challenge of UV misalignment common in diffusion generation, we introduce a gradient-guided refinement strategy at inference time, which explicitly corrects structural consistency. Second, we leverage the inherent semantic distribution capability of diffusion models and design a novel training paradigm to enhance this tendency, enabling semantic-aware editing of facial texture. Furthermore, to address the data scarcity in multi-style texture reconstruction, we construct CANVAS, the first comprehensive paired texture reconstruction dataset covering realistic and diverse stylized domains.
To the best of our knowledge, OMGTex is the first geometry-free inference framework that achieves robust, style-consistent, and editable facial texture reconstruction across diverse domains. Our method achieves state-of-the-art performance on multiple facial texture benchmarks.