🤖 AI Summary
Expression retargeting across structurally diverse face meshes remains challenging: the goal is high-fidelity expression cloning and fine-grained, controllable editing without remeshing or pre-alignment. Method: We propose a framework that localizes a global latent code on the target mesh through per-vertex skinning weight prediction, and we introduce a joint learning scheme integrating Facial Action Coding System (FACS)-guided semantic supervision, neural skinning weight regression, and indirect guidance from predefined segmentation labels. Contribution/Results: The method supports input meshes of arbitrary topology and outperforms state-of-the-art methods in expression fidelity, deformation transfer accuracy, and cross-mesh generalization. It enables real-time editing on unseen face geometries while simultaneously supporting holistic expression control and recovery of localized geometric detail.
📝 Abstract
Accurately retargeting facial expressions to a face mesh while enabling manipulation is a key challenge in facial animation retargeting. Recent deep‐learning methods address this by encoding facial expressions into a global latent code, but they often fail to capture fine‐grained details in local regions. While some methods improve local accuracy by transferring deformations locally, this often complicates overall control of the facial expression. To address this, we propose a method that combines the strengths of both global and local deformation models. Our approach enables intuitive control and detailed expression cloning across diverse face meshes, regardless of their underlying structures. The core idea is to localize the influence of the global latent code on the target mesh. Our model learns to predict skinning weights for each vertex of the target face mesh through indirect supervision from predefined segmentation labels. These predicted weights localize the global latent code, enabling precise and region‐specific deformations even for meshes with unseen shapes. We supervise the latent code using Facial Action Coding System (FACS)‐based blendshapes to ensure interpretability and allow straightforward editing of the generated animation. Through extensive experiments, we demonstrate improved performance over state‐of‐the‐art methods in terms of expression fidelity, deformation transfer accuracy, and adaptability across diverse mesh structures.
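The core mechanism the abstract describes, predicted per-vertex skinning weights that localize the effect of a single global latent code, can be sketched roughly as follows. This is a minimal illustration, not the paper's architecture: the tensor shapes, the softmax normalization of the weights, and the random stand-ins for the network-predicted weights and decoded per-region deformation fields are all assumptions for the sake of the example.

```python
import numpy as np

# Hypothetical sizes: V mesh vertices, K facial regions, D-dim latent code.
V, K, D = 6, 3, 4
rng = np.random.default_rng(0)

# Stand-in for the predicted per-vertex skinning weights, softmax-normalized
# so each vertex's K region weights sum to 1 (a common, assumed choice).
logits = rng.normal(size=(V, K))
weights = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)

# Stand-in for K region-wise deformation fields decoded from the global
# latent code z (here just random offsets; in the paper z is supervised
# with FACS-based blendshapes).
z = rng.normal(size=D)                      # global latent code (unused stand-in)
region_fields = rng.normal(size=(K, V, 3))  # per-region 3D offsets per vertex

# The skinning weights blend the region-wise offsets per vertex, so the
# single global code produces localized, region-specific deformations.
vertex_offsets = np.einsum('vk,kvc->vc', weights, region_fields)  # (V, 3)
```

Editing then amounts to modifying the (interpretable, FACS-aligned) components of `z`: because each vertex's deformation is gated by its skinning weights, a change intended for one region leaves weakly weighted vertices largely untouched.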