🤖 AI Summary
This paper targets three challenges in makeup transfer for drivable 3D avatars: inconsistent makeup across expressions and viewpoints, identity leakage, and insufficient fine-grained control. It proposes AvatarMakeup, a coarse-to-fine method that uses a pretrained diffusion model to transfer makeup from a single reference photo. Its Coherent Duplication step optimizes a global UV map from the generated makeup images, so that makeup guidance stays coherent under multi-view rendering and facial animation, and a Refinement Module then enhances local detail. Unlike 3D Gaussian editing methods adapted for makeup, the method is designed to keep the avatar fully drivable while preserving its identity. Experiments reported in the paper show state-of-the-art makeup transfer quality and consistency across animated sequences.
📝 Abstract
Similar to facial beautification in real life, 3D virtual avatars require personalized customization to enhance their visual appeal, yet this area remains insufficiently explored. Although current 3D Gaussian editing methods can be adapted for facial makeup, they fail to meet the fundamental requirements for realistic makeup effects: 1) ensuring a consistent appearance under driven expressions, 2) preserving identity throughout the makeup process, and 3) enabling precise control over fine details. To address these requirements, we propose a specialized 3D makeup method named AvatarMakeup, which leverages a pretrained diffusion model to transfer makeup patterns from a single reference photo of any individual. We adopt a coarse-to-fine strategy that first maintains consistent appearance and identity and then refines the details. In particular, the diffusion model is employed to generate makeup images as supervision. Because of the uncertainty in the diffusion process, the generated images are inconsistent across viewpoints and expressions. We therefore propose a Coherent Duplication method that coarsely applies makeup to the target while ensuring consistency across expressions and views. Coherent Duplication optimizes a global UV map by recording the averaged facial attributes among the generated makeup images. Querying the global UV map then synthesizes coherent makeup guidance from arbitrary views and expressions, which is used to optimize the target avatar. Given the coarse makeup avatar, we further enhance the makeup by incorporating a Refinement Module into the diffusion model to achieve high makeup quality. Experiments demonstrate that AvatarMakeup achieves state-of-the-art makeup transfer quality and consistency throughout animation.
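The idea of recording averaged attributes in a global UV map and querying it for new views can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the function names `accumulate_uv_map` and `query_uv_map`, the per-pixel UV inputs, the nearest-texel lookup, and the closed-form average (the minimizer of a squared-error loss, used here in place of the optimization the abstract describes) are all assumptions made for illustration.

```python
import numpy as np


def accumulate_uv_map(images, uv_coords, masks, resolution=64):
    """Average colors from several generated images into one UV texture.

    Hypothetical inputs (assumptions, not from the paper):
      images    : list of (H, W, 3) float arrays, the generated makeup images
      uv_coords : list of (H, W, 2) float arrays, per-pixel (u, v) in [0, 1]
      masks     : list of (H, W) bool arrays, True where the face is visible
    """
    color_sum = np.zeros((resolution, resolution, 3), dtype=np.float64)
    hit_count = np.zeros((resolution, resolution), dtype=np.float64)
    for image, uv, mask in zip(images, uv_coords, masks):
        # Map continuous UV coordinates to integer texel indices.
        texel = np.clip((uv[mask] * resolution).astype(int), 0, resolution - 1)
        cols, rows = texel[:, 0], texel[:, 1]
        # np.add.at accumulates correctly when texel indices repeat.
        np.add.at(color_sum, (rows, cols), image[mask])
        np.add.at(hit_count, (rows, cols), 1.0)
    uv_map = np.zeros_like(color_sum)
    seen = hit_count > 0
    # Texels never observed stay zero; `seen` marks the valid ones.
    uv_map[seen] = color_sum[seen] / hit_count[seen][:, None]
    return uv_map, seen


def query_uv_map(uv_map, uv):
    """Look up colors for arbitrary UV coordinates (nearest texel)."""
    resolution = uv_map.shape[0]
    texel = np.clip((uv * resolution).astype(int), 0, resolution - 1)
    return uv_map[texel[..., 1], texel[..., 0]]
```

Because every view and expression reads from the same texture, two renderings of the same surface point receive the same makeup color, which is the consistency property the abstract attributes to Coherent Duplication. A practical system would likely use differentiable rasterization and bilinear sampling rather than this nearest-texel lookup.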