🤖 AI Summary
Existing general-purpose avatar generation models struggle to faithfully reconstruct high-frequency facial details, such as wrinkles and tattoos, from few-shot inputs. To address this, we propose a low-rank personalization method centered on a learnable 3D feature-space register module, which combines LoRA adaptation with mid-layer feature injection to enhance identity-specific detail modeling at minimal parameter overhead. Crucially, this register module is the first to embed structured 3D geometric priors into a low-rank adaptation framework, enabling end-to-end optimization. Evaluated on our newly constructed high-detail talking-head video dataset, our approach significantly outperforms state-of-the-art methods: quantitatively, it achieves 21.3% lower LPIPS and 34.7% lower FID, and qualitative results confirm precise reconstruction of distinctive facial features. The method strikes a strong balance between parameter efficiency and reconstruction fidelity.
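The low-rank adaptation mentioned above can be illustrated with a minimal NumPy sketch. This is not the paper's implementation, only the standard LoRA formulation it builds on: a frozen weight matrix `W` is augmented with a trainable low-rank update `B @ A`, so only `rank * (d_in + d_out)` parameters are tuned per identity. All names and dimensions here are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, rank = 64, 64, 4

# Frozen pre-trained weight (stand-in for one layer of the generic avatar model).
W = rng.standard_normal((d_out, d_in)) * 0.02

# LoRA factors: only these rank * (d_in + d_out) parameters are trained.
A = rng.standard_normal((rank, d_in)) * 0.01  # down-projection
B = np.zeros((d_out, rank))                   # up-projection, zero-initialized
alpha = 8.0                                   # conventional LoRA scaling factor

def lora_forward(x, W, A, B, rank, alpha):
    """y = x W^T + (alpha / rank) * x A^T B^T: frozen base plus low-rank update."""
    return x @ W.T + (alpha / rank) * (x @ A.T) @ B.T

x = rng.standard_normal((2, d_in))
y = lora_forward(x, W, A, B, rank, alpha)

# With B zero-initialized, the adapted layer exactly matches the frozen layer,
# so personalization starts from the generic model's behavior.
assert np.allclose(y, x @ W.T)
```

The zero-initialized `B` is the usual LoRA convention: at the start of fine-tuning the adapter is a no-op, and training only has to learn the identity-specific residual.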
📝 Abstract
We introduce a novel method for low-rank personalization of a generic model for head avatar generation. Prior work proposes generic models that achieve high-quality face animation by leveraging large-scale datasets of multiple identities. However, such generic models usually fail to synthesize unique identity-specific details, since they learn a general domain prior. When adapting to specific subjects, we find that popular solutions like low-rank adaptation (LoRA) still struggle to capture high-frequency facial details. This motivates us to propose a dedicated architecture, the Register Module, which enhances the performance of LoRA while requiring only a small number of parameters to adapt to an unseen identity. Our module is applied to intermediate features of a pre-trained model, storing and re-purposing information in a learnable 3D feature space. To demonstrate the efficacy of our personalization method, we collect a dataset of talking videos of individuals with distinctive facial details, such as wrinkles and tattoos. Our approach faithfully captures unseen faces, outperforming existing methods quantitatively and qualitatively. We will release the code, models, and dataset to the public.
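One plausible reading of "storing and re-purposing information in a learnable 3D feature space" is a per-identity feature volume that is queried at 3D coordinates and injected into the pre-trained model's intermediate features. The sketch below shows that pattern in NumPy; the grid resolution, channel count, nearest-neighbour lookup (trilinear interpolation would be the smoother choice), and the `gate` parameter are all hypothetical, not details from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
C, R = 16, 8  # feature channels, register grid resolution (both hypothetical)

# Learnable 3D feature volume: the per-identity "register" storage.
register = rng.standard_normal((R, R, R, C)) * 0.01

def sample_register(coords, register):
    """Nearest-neighbour lookup into the 3D register grid.

    coords: (N, 3) array in [0, 1]^3, e.g. query points around the head.
    Returns an (N, C) array of stored features.
    """
    res = register.shape[0]
    idx = np.clip((coords * res).astype(int), 0, res - 1)
    return register[idx[:, 0], idx[:, 1], idx[:, 2]]

def inject(features, coords, register, gate=0.1):
    """Add sampled register features to the model's intermediate features."""
    return features + gate * sample_register(coords, register)

feats = rng.standard_normal((5, C))    # stand-in for mid-layer features
coords = rng.random((5, 3))            # stand-in for 3D query positions
out = inject(feats, coords, register)

assert out.shape == feats.shape
```

Because the register volume is small (`R**3 * C` values here) and the backbone stays frozen, this kind of injection keeps the per-identity parameter count low, which matches the abstract's emphasis on adapting with few parameters.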