🤖 AI Summary
This work addresses the trade-off between facial attribute disentanglement and identity preservation when generating an animatable, identity-consistent personalized 3D avatar from a single reference portrait. To this end, we: (1) synthesize large-scale 2D facial video datasets in which each video combines consistent changes in expression and viewpoint with a variation in one specific facial attribute; (2) train an avatar model based on 3D Gaussian Splatting that learns a continuous and disentangled latent space for facial attribute manipulation; and (3) regularize that latent space by using interpolated 2D faces as supervision, which enforces smooth transitions between attribute values. Compared with previous approaches, the method produces high-quality avatars with interpolated attributes while preserving the identity of the reference person.
📝 Abstract
We present PERSE, a method for building an animatable personalized generative avatar from a reference portrait. Our avatar model enables facial attribute editing in a continuous and disentangled latent space, giving control over each facial attribute while preserving the individual's identity. To achieve this, our method begins by synthesizing large-scale 2D video datasets, where each video contains consistent changes in facial expression and viewpoint, combined with a variation in a specific facial attribute from the original input. We propose a novel pipeline to produce high-quality, photorealistic 2D videos with facial attribute editing. Leveraging this synthetic attribute dataset, we present a personalized avatar creation method based on 3D Gaussian Splatting that learns a continuous and disentangled latent space for intuitive facial attribute manipulation. To enforce smooth transitions in this latent space, we introduce a latent space regularization technique that uses interpolated 2D faces as supervision. Compared to previous approaches, we demonstrate that PERSE generates high-quality avatars with interpolated attributes while preserving the identity of the reference person.
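The latent space regularization described in the abstract can be illustrated with a minimal sketch. This is not the PERSE implementation: the names `lerp`, `toy_decoder`, and `interpolation_loss` are hypothetical, the decoder is a toy stand-in for the 3D Gaussian Splatting renderer, and the intermediate 2D target is a simple blend used only as a placeholder for whatever interpolated face the actual pipeline produces. The idea shown is only that a latent interpolated between two attribute codes is rendered and compared against a 2D target for the same interpolation ratio.

```python
import numpy as np


def lerp(a, b, t):
    """Linear interpolation between a and b at ratio t in [0, 1]."""
    return (1.0 - t) * a + t * b


def toy_decoder(z, weights):
    """Hypothetical stand-in for the avatar renderer: latent -> flat 'image'."""
    return np.tanh(weights @ z)


def interpolation_loss(z_a, z_b, target_mid, t, weights):
    """Mean squared error between the render of the interpolated latent
    and a 2D target image associated with the same ratio t."""
    rendered = toy_decoder(lerp(z_a, z_b, t), weights)
    return float(np.mean((rendered - target_mid) ** 2))


rng = np.random.default_rng(0)
latent_dim, num_pixels = 8, 64  # illustrative sizes, not from the paper
weights = rng.normal(size=(num_pixels, latent_dim))
z_a = rng.normal(size=latent_dim)  # latent code for attribute value A
z_b = rng.normal(size=latent_dim)  # latent code for attribute value B
t = 0.5

# Assumption: the supervision target would come from a 2D face
# interpolation step; a blend of the endpoint renders is a placeholder.
target_mid = lerp(toy_decoder(z_a, weights), toy_decoder(z_b, weights), t)

loss = interpolation_loss(z_a, z_b, target_mid, t, weights)
print(f"interpolation loss at t={t}: {loss:.6f}")
```

In a training loop, a term like this would be added to the reconstruction loss for ratios `t` sampled between pairs of attribute latents, so that intermediate points of the latent space also decode to plausible faces.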