🤖 AI Summary
Existing facial texture generation methods suffer from limited generalization on in-the-wild images, often producing UV textures that deviate from the input in fine details, structural fidelity, and identity consistency. To address this, the authors propose FaceRefiner, which introduces differentiable rendering into a style transfer framework for the first time. By treating the 3D-sampled texture as style and the generated texture as content, FaceRefiner enables pixel-wise, multi-level (low-, mid-, and high-level) information transfer directly in UV space. This approach significantly enhances both photorealism and identity preservation in the synthesized textures. Extensive experiments on Multi-PIE, CelebA, and FFHQ demonstrate that FaceRefiner consistently outperforms state-of-the-art methods by a substantial margin.
📝 Abstract
Recent facial texture generation methods typically use deep networks to synthesize image content and then fill in the UV map, generating a compelling full texture from a single image. Nevertheless, the synthesized UV texture usually comes from a space constructed by the training data or a 2D face generator, which limits these methods' generalization to in-the-wild input images. Consequently, their facial details, structures, and identity may not be consistent with the input. In this paper, we address this issue by proposing a style transfer-based facial texture refinement method named FaceRefiner. FaceRefiner treats the 3D-sampled texture as style and the output of a texture generation method as content, and the photo-realistic style is then transferred from the style image to the content image. Different from current style transfer methods, which transfer only high- and middle-level information to the result, our method integrates differentiable rendering to also transfer low-level (pixel-level) information in the visible face regions. The main benefit of such multi-level information transfer is that the details, structures, and semantics of the input are well preserved. Extensive experiments on the Multi-PIE, CelebA, and FFHQ datasets demonstrate that our refinement method improves texture quality and identity preservation compared with state-of-the-art methods.
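To make the multi-level transfer idea concrete, the following is a minimal, hypothetical sketch of how such a refinement objective could be structured: a low-level (pixel) loss restricted to UV texels visible under the differentiable rendering, combined with a feature-matching term standing in for the mid/high-level style losses. All names (`masked_pixel_loss`, `refinement_loss`, the weights) are illustrative assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch of a multi-level texture-refinement objective.
# Low-level transfer: L1 pixel loss on UV texels marked visible by the
# (differentiable) renderer. Mid/high-level transfer: here approximated
# by matching mean feature activations; the paper uses style-transfer
# feature losses from a deep network.

def masked_pixel_loss(content, style, visible):
    """Mean L1 distance over visible UV texels only (low-level transfer)."""
    pairs = [(c, s) for c, s, m in zip(content, style, visible) if m]
    if not pairs:
        return 0.0
    return sum(abs(c - s) for c, s in pairs) / len(pairs)

def feature_loss(content_feats, style_feats):
    """Mean L1 distance between feature vectors (mid/high-level proxy)."""
    return sum(abs(c - s) for c, s in zip(content_feats, style_feats)) / len(content_feats)

def refinement_loss(content, style, visible, c_feats, s_feats,
                    w_pix=1.0, w_feat=0.1):
    """Weighted sum of pixel-level and feature-level transfer terms."""
    return (w_pix * masked_pixel_loss(content, style, visible)
            + w_feat * feature_loss(c_feats, s_feats))
```

The key design point mirrored here is the visibility mask: pixel-level supervision is applied only where the 3D-sampled texture is reliable (the face regions visible in the input), while occluded regions are guided solely by the feature-level style terms.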