InsTaG: Learning Personalized 3D Talking Head from Few-Second Video

๐Ÿ“… 2025-02-27
๐Ÿ“ˆ Citations: 0
โœจ Influential: 0
๐Ÿ“„ PDF
๐Ÿค– AI Summary
Existing radiance-field-based personalized 3D talking-head synthesis methods require extensive training data and prolonged optimization, hindering rapid adaptation to novel identities. This paper proposes the first lightweight personalization framework tailored for 3D Gaussian Splatting (3DGS), featuring identity-agnostic pretraining and motion-aligned adaptive fine-tuningโ€”thereby decoupling universal motion priors from identity-specific representations within 3DGS for the first time. We introduce a motion alignment loss and dynamic structural regularization, enabling millisecond-level identity adaptation from less than five seconds of input video. Experiments demonstrate state-of-the-art performance in visual fidelity and lip-sync accuracy, real-time inference speed (โ‰ฅ30 FPS), over 90% reduction in training time, and significant breakthroughs in both data and computational efficiency.

Technology Category

Application Category

๐Ÿ“ Abstract
Despite exhibiting impressive performance in synthesizing lifelike personalized 3D talking heads, prevailing methods based on radiance fields suffer from high demands for training data and time for each new identity. This paper introduces InsTaG, a 3D talking head synthesis framework that allows a fast learning of realistic personalized 3D talking head from few training data. Built upon a lightweight 3DGS person-specific synthesizer with universal motion priors, InsTaG achieves high-quality and fast adaptation while preserving high-level personalization and efficiency. As preparation, we first propose an Identity-Free Pre-training strategy that enables the pre-training of the person-specific model and encourages the collection of universal motion priors from long-video data corpus. To fully exploit the universal motion priors to learn an unseen new identity, we then present a Motion-Aligned Adaptation strategy to adaptively align the target head to the pre-trained field, and constrain a robust dynamic head structure under few training data. Experiments demonstrate our outstanding performance and efficiency under various data scenarios to render high-quality personalized talking heads.
Problem

Research questions and friction points this paper is trying to address.

Fast 3D talking head synthesis
Learning from few training data
Personalized identity adaptation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Lightweight 3DGS synthesizer
Identity-Free Pre-training strategy
Motion-Aligned Adaptation strategy
๐Ÿ”Ž Similar Papers
No similar papers found.
J
Jiahe Li
School of Computer Science and Engineering, State Key Laboratory of Complex & Critical Software Environment, Jiangxi Research Institute, Beihang University
J
Jiawei Zhang
School of Computer Science and Engineering, State Key Laboratory of Complex & Critical Software Environment, Jiangxi Research Institute, Beihang University
Xiao Bai
Xiao Bai
Professor of Computer Science, Beihang University
pattern recognitioncomputer vision
Jin Zheng
Jin Zheng
Lecturer in Data Science, University of Bristol
J
Jun Zhou
School of Information and Communication Technology, Griffith University
L
Lin Gu
RIKEN AIP, The University of Tokyo