Beyond Fixed Topologies: Unregistered Training and Comprehensive Evaluation Metrics for 3D Talking Heads

📅 2024-10-14

🏛️ arXiv.org

📈 Citations: 1

✨ Influential: 0

🤖 AI Summary

Existing speech-driven 3D talking head methods are constrained by fixed mesh topologies, limiting generalization to arbitrary topologies—e.g., real-world scanned faces. To address this, we propose the first topology-agnostic speech-driven animation framework. Our method introduces: (i) a registration-free training paradigm eliminating reliance on point-to-point correspondences; (ii) a heat-diffusion-based feature prediction mechanism ensuring topology-robust geometric modeling across diverse meshes; (iii) an adaptive graph neural network that learns dynamic graph structures per input; and (iv) a multi-granularity lip-sync evaluation metric suite addressing shortcomings of conventional metrics in temporal alignment and semantic consistency. Experiments demonstrate high-fidelity animation on arbitrary-topology 3D faces—including unseen scanned data—outperforming fixed-topology baselines. We establish the first topology-independent benchmark for 3D talking heads.

📝 Abstract

Generating speech-driven 3D talking heads presents numerous challenges; among them is handling varying mesh topologies, where no point-wise correspondence exists across all the meshes the model can animate. Existing methods avoid this by assuming a fixed topology; while this simplifies the problem, it limits applicability, as unseen meshes must adhere to the training topology. This work presents a framework capable of animating 3D faces in arbitrary topologies, including real scanned data. Our approach relies on a model leveraging heat diffusion to predict features robust to the mesh topology. We explore two training settings: a registered one, in which meshes in a training sequence share a fixed topology but any mesh can be animated at test time, and a fully unregistered one, which allows effective training with varying mesh structures. Additionally, we highlight the limitations of current evaluation metrics and propose new metrics for better evaluation of lip-syncing between speech and facial movements. Our extensive evaluation shows that our approach performs favorably compared to fixed-topology techniques, setting a new benchmark by offering a versatile and high-fidelity solution for 3D talking head generation in which the topology constraint is dropped.

Problem

Research questions and friction points this paper is trying to address.

Animating 3D faces with varying mesh topologies

Overcoming limitations of fixed topology training

Improving lip-sync evaluation metrics for facial movements

Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses heat diffusion for topology-robust feature prediction

Supports both registered and unregistered mesh training

Introduces new metrics for better lip-sync evaluation
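
As a rough illustration of the heat-diffusion idea listed above, the sketch below spreads a per-vertex feature over a small triangle mesh by solving one implicit Euler step of the heat equation. It is not the paper's model: the uniform-weight graph Laplacian, the fixed diffusion time t, and the helper names graph_laplacian and heat_diffuse are all assumptions made for this example. The abstract does not specify the operator or how diffusion times are chosen.

```python
import numpy as np

def graph_laplacian(num_vertices, faces):
    """Uniform-weight graph Laplacian L = D - A built from triangle faces.

    Assumption for illustration only: a geometry-aware operator
    (e.g. cotangent weights) could be used in its place.
    """
    A = np.zeros((num_vertices, num_vertices))
    for i, j, k in faces:
        for a, b in ((i, j), (j, k), (k, i)):
            A[a, b] = 1.0
            A[b, a] = 1.0
    D = np.diag(A.sum(axis=1))
    return D - A

def heat_diffuse(features, laplacian, t):
    """One implicit Euler step of du/dt = -L u: solve (I + t L) u_t = u_0."""
    n = laplacian.shape[0]
    return np.linalg.solve(np.eye(n) + t * laplacian, features)

# Toy mesh: a square split into two triangles (4 vertices).
faces = [(0, 1, 2), (0, 2, 3)]
L = graph_laplacian(4, faces)

# A single feature channel with all "heat" placed on vertex 0.
u0 = np.array([[1.0], [0.0], [0.0], [0.0]])

u_small = heat_diffuse(u0, L, t=0.1)    # short time: feature stays mostly local
u_large = heat_diffuse(u0, L, t=100.0)  # long time: feature spreads over the mesh
```

In this sketch the total amount of the feature is preserved by the diffusion step, and for large t every vertex approaches the mesh-wide average. The operator is defined from each mesh's own connectivity rather than from a fixed vertex ordering, which is the intuition behind calling such features robust to topology.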
