Beyond Fixed Topologies: Unregistered Training and Comprehensive Evaluation Metrics for 3D Talking Heads

📅 2024-10-14
🏛️ arXiv.org
📈 Citations: 1
Influential: 0
🤖 AI Summary
Existing speech-driven 3D talking head methods are constrained by fixed mesh topologies, limiting generalization to arbitrary topologies—e.g., real-world scanned faces. To address this, we propose the first topology-agnostic speech-driven animation framework. Our method introduces: (i) a registration-free training paradigm eliminating reliance on point-to-point correspondences; (ii) a heat-diffusion-based feature prediction mechanism ensuring topology-robust geometric modeling across diverse meshes; (iii) an adaptive graph neural network that learns dynamic graph structures per input; and (iv) a multi-granularity lip-sync evaluation metric suite addressing shortcomings of conventional metrics in temporal alignment and semantic consistency. Experiments demonstrate high-fidelity animation on arbitrary-topology 3D faces—including unseen scanned data—outperforming fixed-topology baselines. We establish the first topology-independent benchmark for 3D talking heads.

📝 Abstract
Generating speech-driven 3D talking heads presents numerous challenges; among those is dealing with varying mesh topologies, where no point-wise correspondence exists across all meshes the model can animate. Existing methods assume a fixed topology; while this simplifies the problem, it limits applicability, as unseen meshes must adhere to the training topology. This work presents a framework capable of animating 3D faces in arbitrary topologies, including real scanned data. Our approach relies on a model leveraging heat diffusion to predict features robust to the mesh topology. We explore two training settings: a registered one, in which the meshes in a training sequence share a fixed topology but any mesh can be animated at test time, and a fully unregistered one, which allows effective training with varying mesh structures. Additionally, we highlight the limitations of current evaluation metrics and propose new metrics for better evaluation of lip-syncing between speech and facial movements. Our extensive evaluation shows our approach performs favorably compared to fixed-topology techniques, setting a new benchmark by offering a versatile and high-fidelity solution for 3D talking head generation where the topology constraint is dropped.
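
To make the unregistered setting more concrete, the sketch below shows one common way to compare two vertex sets without point-wise correspondence: the symmetric Chamfer distance. This is an illustrative assumption, not the loss used in the paper (the abstract does not specify one); the function name `chamfer_distance` and the toy point clouds are hypothetical.

```python
import numpy as np

def chamfer_distance(P, Q):
    """Symmetric Chamfer distance between point sets P (n, 3) and Q (m, 3).

    Each point is matched to its nearest neighbour in the other set, so the
    two sets may have different sizes and orderings (no registration needed).
    """
    # Pairwise squared Euclidean distances, shape (n, m).
    d2 = ((P[:, None, :] - Q[None, :, :]) ** 2).sum(axis=-1)
    # Mean nearest-neighbour distance in both directions.
    return d2.min(axis=1).mean() + d2.min(axis=0).mean()

rng = np.random.default_rng(0)
predicted = rng.normal(size=(50, 3))             # hypothetical predicted vertices
target_same = predicted[rng.permutation(50)]     # same shape, shuffled vertex order
target_other = rng.normal(size=(80, 3))          # different vertex count entirely

loss_same = chamfer_distance(predicted, target_same)
loss_other = chamfer_distance(predicted, target_other)
```

Because the distance depends only on nearest neighbours, shuffling the vertex order leaves it at zero, while a vertex-indexed error would not be defined for the 80-vertex target at all.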
Problem

Research questions and friction points this paper is trying to address.

Animating 3D faces with varying mesh topologies
Overcoming limitations of fixed topology training
Improving lip-sync evaluation metrics for facial movements
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses heat diffusion for topology-robust feature prediction
Supports both registered and unregistered mesh training
Introduces new metrics for better lip-sync evaluation
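
As a rough illustration of what heat diffusion on a mesh involves, the sketch below diffuses a per-vertex feature over a toy mesh using a uniform graph Laplacian and a single implicit Euler step. It is a minimal example under stated assumptions, not the paper's model: the uniform (combinatorial) Laplacian, the helper names `graph_laplacian` and `heat_diffuse`, and the diffusion times are all chosen here for illustration.

```python
import numpy as np

def graph_laplacian(num_vertices, faces):
    """Combinatorial Laplacian L = D - A built from triangle faces.

    Assumption: uniform edge weights; mesh-aware pipelines often use
    cotangent weights instead.
    """
    A = np.zeros((num_vertices, num_vertices))
    for i, j, k in faces:
        for a, b in ((i, j), (j, k), (k, i)):
            A[a, b] = A[b, a] = 1.0
    return np.diag(A.sum(axis=1)) - A

def heat_diffuse(features, L, t):
    """One implicit Euler step of du/dt = -L u: solve (I + t L) u = u0."""
    n = L.shape[0]
    return np.linalg.solve(np.eye(n) + t * L, features)

# Toy mesh: a square split into two triangles (4 vertices).
faces = [(0, 1, 2), (0, 2, 3)]
L = graph_laplacian(4, faces)

# A feature concentrated on vertex 0 spreads to its neighbours over time.
u0 = np.array([[1.0], [0.0], [0.0], [0.0]])
u_small = heat_diffuse(u0, L, t=0.1)   # short diffusion time: stays local
u_large = heat_diffuse(u0, L, t=10.0)  # long diffusion time: near-uniform
```

The diffusion time `t` controls how far information travels across the surface: small values keep features local, large values approach the global average. The operator is defined by the mesh connectivity itself rather than by a fixed vertex ordering.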
Authors

Federico Nocentini
Media Integration and Communication Center (MICC), University of Florence, Italy
Thomas Besnier
Univ. of Lille, CNRS, Centrale Lille, UMR 9189 CRIStAL, F-59000 Lille, France
C. Ferrari
Department of Architecture and Engineering, University of Parma, Italy
Sylvain Arguillere
Univ. of Lille, CNRS, UMR 8524 Laboratoire Paul Painlevé, Lille, F-59000, France
Stefano Berretti
Professor of Computer Engineering, University of Firenze, Italy
3D Computer Vision, Pattern Recognition, Biometrics, Machine Learning
Mohamed Daoudi
IMT Nord Europe, Institut Mines-Télécom, Univ. of Lille, Centre for Digital Systems, F-59000 Lille, France, and Univ. of Lille, CNRS, Centrale Lille, Institut Mines-Télécom, UMR 9189 CRIStAL, F-59000 Lille, France