🤖 AI Summary
SynShot addresses two obstacles in few-shot 3D head avatar reconstruction: poor generalization caused by the scarcity of diverse real-world video data, and the uneven modeling complexity of different head parts. The method pretrains a general-purpose 3D prior on a large corpus of synthetic avatars and adds a part-wise controllable mechanism for upsampling Gaussian primitives, explicitly accounting for the differing geometric and appearance complexity of regions such as skin and hair. SynShot combines 3D Gaussian splatting with a convolutional encoder-decoder that predicts Gaussian parameters in UV texture space, and fine-tunes the synthetic prior on the few input images to bridge the synthetic-to-real domain gap. Experiments show that, from only 3–5 real input images, SynShot surpasses state-of-the-art monocular methods trained on thousands of real images in both novel-view and novel-expression synthesis, yielding a drivable, photorealistic avatar from few shots.
📝 Abstract
We present SynShot, a novel method for the few-shot inversion of a drivable head avatar based on a synthetic prior. We tackle two major challenges. First, training a controllable 3D generative network requires a large number of diverse sequences, for which pairs of images and high-quality tracked meshes are not always available. Second, state-of-the-art monocular avatar models struggle to generalize to new views and expressions, lacking a strong prior and often overfitting to a specific viewpoint distribution. Inspired by machine learning models trained solely on synthetic data, we propose a method that learns a prior model from a large dataset of synthetic heads with diverse identities, expressions, and viewpoints. With few input images, SynShot fine-tunes the pretrained synthetic prior to bridge the domain gap, modeling a photorealistic head avatar that generalizes to novel expressions and viewpoints. We model the head avatar using 3D Gaussian splatting and a convolutional encoder-decoder that outputs Gaussian parameters in UV texture space. To account for the different modeling complexities over parts of the head (e.g., skin vs hair), we embed the prior with explicit control for upsampling the number of per-part primitives. Compared to state-of-the-art monocular methods that require thousands of real training images, SynShot significantly improves novel view and expression synthesis.
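To make the UV-texture-space representation concrete, here is a minimal sketch of the data layout the abstract describes: a decoder output map where each texel holds the parameters of one (or more) 3D Gaussians, with a per-part upsampling factor that spawns more primitives in complex regions such as hair. All shapes, channel counts, the toy part layout, and the `count_primitives` helper are illustrative assumptions, not the paper's actual configuration.

```python
import numpy as np

# Assumed resolution of the UV texture in which Gaussian parameters live.
UV_RES = 64
# Assumed per-texel channel layout: 3 position offset + 4 rotation (quaternion)
# + 3 scale + 1 opacity + 3 color = 14 channels.
CH = 14

# In SynShot a convolutional encoder-decoder would predict this map;
# random values stand in for the decoder output here.
rng = np.random.default_rng(0)
param_map = rng.standard_normal((UV_RES, UV_RES, CH))

# Per-texel part labels (0 = skin, 1 = hair). The toy layout below simply
# labels the right half of the UV map as "hair".
part_map = np.zeros((UV_RES, UV_RES), dtype=int)
part_map[:, UV_RES // 2:] = 1

# Part-wise upsampling: hair gets more primitives per texel than skin,
# reflecting its greater geometric and appearance complexity.
upsample_per_part = {0: 1, 1: 4}

def count_primitives(part_map, upsample_per_part):
    """Total number of Gaussian primitives after part-wise upsampling."""
    total = 0
    for part, k in upsample_per_part.items():
        total += k * int((part_map == part).sum())
    return total

n_primitives = count_primitives(part_map, upsample_per_part)
# 2048 skin texels * 1 + 2048 hair texels * 4 = 10240 primitives
```

The design point illustrated is that primitive budget is controlled per part rather than globally, so increasing hair fidelity does not force a denser (and costlier) representation for skin.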