DEGAS: Detailed Expressions on Full-Body Gaussian Avatars

πŸ“… 2024-08-20
πŸ›οΈ arXiv.org
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ€– AI Summary
This work tackles the challenge of modeling fine-grained facial expressions at high visual fidelity in full-body virtual avatars. To this end, it introduces 3D Gaussian Splatting (3DGS) into full-body avatar construction for the first time, proposing a conditional variational autoencoder (CVAE) framework supervised by multi-view video. Departing from conventional 3D Morphable Models (3DMMs), the method adopts an expression latent space learned purely from 2D portrait images and enables joint facial and bodily motion control via Gaussian maps in UV space; it further extends to audio-driven cross-modal reenactment. Evaluated on both a public dataset and a newly collected full-body talking dataset, the model achieves photorealistic, expression-rich reenactment that renders in real time. It significantly narrows the expressiveness gap between 2D talking-head models and 3D full-body avatars, establishing a new paradigm for interactive AI agents.

πŸ“ Abstract
Although neural rendering has made significant advances in creating lifelike, animatable full-body and head avatars, incorporating detailed expressions into full-body avatars remains largely unexplored. We present DEGAS, the first 3D Gaussian Splatting (3DGS)-based modeling method for full-body avatars with rich facial expressions. Trained on multiview videos of a given subject, our method learns a conditional variational autoencoder that takes both the body motion and facial expression as driving signals to generate Gaussian maps in the UV layout. To drive the facial expressions, instead of the commonly used 3D Morphable Models (3DMMs) in 3D head avatars, we propose to adopt the expression latent space trained solely on 2D portrait images, bridging the gap between 2D talking faces and 3D avatars. Leveraging the rendering capability of 3DGS and the rich expressiveness of the expression latent space, the learned avatars can be reenacted to reproduce photorealistic rendering images with subtle and accurate facial expressions. Experiments on an existing dataset and our newly proposed dataset of full-body talking avatars demonstrate the efficacy of our method. We also propose an audio-driven extension of our method with the help of 2D talking faces, opening new possibilities for interactive AI agents.
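To make the pipeline in the abstract concrete, here is a minimal sketch of a conditional decoder that maps the two driving signals (body motion and a 2D-portrait expression latent) to per-texel Gaussian parameters in a UV layout. All dimensions, the channel split, and the toy linear decoder are illustrative assumptions, not the paper's actual architecture.

```python
import numpy as np

# Illustrative sizes (assumptions, not the paper's actual dimensions).
POSE_DIM, EXPR_DIM, UV_RES = 63, 32, 16
# Each UV texel stores one 3D Gaussian: 3 position offsets, 4 quaternion
# components, 3 log-scales, 1 opacity logit, 3 colors = 14 channels.
CHANNELS = 14

rng = np.random.default_rng(0)
# A toy linear map standing in for the paper's learned CVAE decoder network.
W = rng.standard_normal((POSE_DIM + EXPR_DIM, UV_RES * UV_RES * CHANNELS)) * 0.01

def decode_gaussian_map(pose, expr_latent):
    """Map the joint driving signal to a UV-space Gaussian parameter map."""
    cond = np.concatenate([pose, expr_latent])           # joint conditioning
    gmap = (cond @ W).reshape(UV_RES, UV_RES, CHANNELS)  # UV layout
    # Split channels into interpretable Gaussian attributes.
    xyz, quat, log_scale, opacity, rgb = np.split(gmap, [3, 7, 10, 11], axis=-1)
    quat = quat / np.linalg.norm(quat, axis=-1, keepdims=True)  # unit rotations
    opacity = 1.0 / (1.0 + np.exp(-opacity))                    # sigmoid to (0, 1)
    return xyz, quat, np.exp(log_scale), opacity, rgb

xyz, quat, scale, opacity, rgb = decode_gaussian_map(
    rng.standard_normal(POSE_DIM), rng.standard_normal(EXPR_DIM))
print(xyz.shape, quat.shape)
```

The key design point the sketch captures is that every Gaussian attribute lives at a fixed UV texel, so the same decoder output layout can be textured back onto the body surface regardless of pose.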
Problem

Research questions and friction points this paper is trying to address.

Incorporating detailed facial expressions into full-body avatars.
Bridging 2D talking faces with 3D avatars using expression latent space.
Generating photorealistic full-body avatars with accurate facial expressions.
Innovation

Methods, ideas, or system contributions that make the work stand out.

3D Gaussian Splatting modeling
Conditional variational autoencoder training
2D portrait expression latent space
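The rendering capability that 3DGS contributes is front-to-back alpha compositing of depth-sorted, screen-projected Gaussians. Below is a single-pixel sketch of that compositing rule; the isotropic footprint and all numeric values are simplifying assumptions, not the paper's renderer.

```python
import math

def composite_pixel(gaussians, px, py):
    """Front-to-back alpha compositing of depth-sorted 2D Gaussian splats.

    Each splat is (depth, cx, cy, sigma, opacity, color); an isotropic
    footprint replaces 3DGS's full 2D screen-space covariance for brevity.
    """
    color, transmittance = 0.0, 1.0
    for depth, cx, cy, sigma, opacity, c in sorted(gaussians):  # near to far
        # Opacity falls off as a Gaussian of distance from the splat center.
        d2 = (px - cx) ** 2 + (py - cy) ** 2
        alpha = opacity * math.exp(-d2 / (2.0 * sigma ** 2))
        color += transmittance * alpha * c
        transmittance *= 1.0 - alpha
        if transmittance < 1e-4:  # early termination, as in tile-based 3DGS
            break
    return color

# Two splats covering the queried pixel; the nearer, brighter one dominates.
splats = [(2.0, 0.0, 0.0, 1.0, 0.8, 1.0),   # near, bright
          (5.0, 0.1, 0.0, 1.0, 0.8, 0.2)]   # far, dark
print(composite_pixel(splats, 0.0, 0.0))
```

Because compositing needs no mesh rasterization or volumetric ray marching, this is what makes the learned avatars renderable in real time.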
πŸ‘₯ Authors
Zhijing Shao, Prometheus Vision Technology Co., Ltd.
Duotun Wang, The Hong Kong University of Science and Technology (Guangzhou)
Qingyan Tian, Prometheus Vision Technology Co., Ltd.
Yao-Dong Yang, The Hong Kong University of Science and Technology (Guangzhou)
Hengyu Meng, The Hong Kong University of Science and Technology (Guangzhou)
Zeyu Cai, Institute of Heavy Ion Physics, Peking University
Bo Dong, Swinburne University of Technology
Yu Zhang, Prometheus Vision Technology Co., Ltd.
Kang Zhang, The Hong Kong University of Science and Technology; The Hong Kong University of Science and Technology (Guangzhou)
Zeyu Wang, The Hong Kong University of Science and Technology; The Hong Kong University of Science and Technology (Guangzhou)