LivePortrait: Efficient Portrait Animation with Stitching and Retargeting Control

📅 2024-07-03
🏛️ arXiv.org
📈 Citations: 20 (Influential: 4)
🤖 AI Summary
This work addresses high-fidelity, controllable portrait video synthesis driven by a single source image. Rather than following mainstream diffusion-based methods, it extends the implicit-keypoint-based framework, observing that compact implicit keypoints can effectively represent a kind of blendshapes and thereby decouple facial expression from head pose. The contributions are: (1) a scaled-up training recipe, with roughly 69 million high-quality frames, a mixed image-video training strategy, an upgraded network architecture, and improved motion transformation and optimization objectives; and (2) a stitching module and two retargeting modules, each implemented as a small MLP with negligible computational overhead, to enhance motion control accuracy and generalization. Inference reaches 12.8 ms per frame on an RTX 4090 GPU with PyTorch, with visual quality competitive with diffusion-based approaches. While the broader portrait-animation task admits video, audio, or text driving signals, the framework presented here is video-driven; the inference code and models are publicly released.
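To make the stitching-and-retargeting idea concrete, here is a minimal PyTorch sketch of how such a compact MLP could operate on implicit keypoints. All names, dimensions, and layer sizes are illustrative assumptions rather than the paper's exact configuration; the authoritative implementation is in the linked repository.

```python
import torch
import torch.nn as nn

class StitchingMLP(nn.Module):
    """Sketch of a stitching module: a small MLP that maps the
    concatenated source and driving implicit keypoints to per-keypoint
    offsets. Sizes and names are illustrative, not the paper's exact
    configuration."""
    def __init__(self, num_kp: int = 21, hidden: int = 128):
        super().__init__()
        in_dim = num_kp * 3 * 2   # source + driving 3D keypoints, flattened
        out_dim = num_kp * 3      # one 3D offset per keypoint
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, out_dim),
        )

    def forward(self, kp_source: torch.Tensor, kp_driving: torch.Tensor) -> torch.Tensor:
        # kp_source, kp_driving: (B, num_kp, 3)
        b = kp_source.shape[0]
        feat = torch.cat([kp_source, kp_driving], dim=1).reshape(b, -1)
        offsets = self.net(feat).reshape(b, -1, 3)
        # Stitched driving keypoints: the offsets correct misalignments
        # (e.g., shoulder/torso drift) at negligible extra cost.
        return kp_driving + offsets
```

Because the MLP acts only on a few dozen keypoint coordinates rather than on image features, its cost is negligible next to the generator, which is the point of the design.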

📝 Abstract
Portrait Animation aims to synthesize a lifelike video from a single source image, using it as an appearance reference, with motion (i.e., facial expressions and head pose) derived from a driving video, audio, text, or generation. Instead of following mainstream diffusion-based methods, we explore and extend the potential of the implicit-keypoint-based framework, which effectively balances computational efficiency and controllability. Building upon this, we develop a video-driven portrait animation framework named LivePortrait with a focus on better generalization, controllability, and efficiency for practical usage. To enhance the generation quality and generalization ability, we scale up the training data to about 69 million high-quality frames, adopt a mixed image-video training strategy, upgrade the network architecture, and design better motion transformation and optimization objectives. Additionally, we discover that compact implicit keypoints can effectively represent a kind of blendshapes and meticulously propose a stitching and two retargeting modules, which utilize a small MLP with negligible computational overhead, to enhance the controllability. Experimental results demonstrate the efficacy of our framework even compared to diffusion-based methods. The generation speed remarkably reaches 12.8 ms on an RTX 4090 GPU with PyTorch. The inference code and models are available at https://github.com/KwaiVGI/LivePortrait.
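As background for the abstract's implicit-keypoint framing: in face-vid2vid-style frameworks, which LivePortrait builds on, driving keypoints are composed from identity, pose, and expression terms. A hedged sketch with illustrative names:

```python
import torch

def transform_keypoints(kp_canonical: torch.Tensor,
                        rotation: torch.Tensor,
                        expression: torch.Tensor,
                        scale: torch.Tensor,
                        translation: torch.Tensor) -> torch.Tensor:
    """Sketch of the implicit-keypoint motion transformation used by
    face-vid2vid-style frameworks: canonical 3D keypoints are rotated
    by the head pose, deformed by an expression offset, then scaled
    and translated. Variable names are illustrative.

    kp_canonical: (B, K, 3)  identity-specific canonical keypoints
    rotation:     (B, 3, 3)  head-pose rotation matrix
    expression:   (B, K, 3)  per-keypoint expression deformation
    scale:        (B, 1, 1)  global scale
    translation:  (B, 1, 3)  global translation
    """
    return scale * (kp_canonical @ rotation.transpose(1, 2) + expression) + translation
```

Animating the source then amounts to keeping its canonical keypoints while substituting the driving frame's rotation, expression deformation, scale, and translation.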
Problem

Research questions and friction points this paper is trying to address.

Synthesize a lifelike video from a single source image, with motion derived from a driving signal.
Improve the generalization and controllability of portrait animation.
Balance computational efficiency and generation quality in video-driven animation.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Implicit-keypoint-based framework balancing efficiency and controllability
Training data scaled to about 69 million high-quality frames
Stitching and retargeting modules (small MLPs) for added controllability
👥 Authors

Jianzhu Guo, Kuaishou Technology
Dingyun Zhang, Kuaishou Technology; University of Science and Technology of China
Xiaoqiang Liu, Kuaishou Technology
Zhizhou Zhong, PhD student at HKUST (face recognition, biometrics, AIGC)
Yuan Zhang, Kuaishou Technology
Pengfei Wan, Head of Kling Video Generation Models, Kuaishou Technology (Generative Models, Computer Vision, Multimodal AI, Computer Graphics)
Di Zhang, Kuaishou Technology