AdaMesh: Personalized Facial Expressions and Head Poses for Adaptive Speech-Driven 3D Facial Animation

📅 2023-10-11

🏛️ IEEE Transactions on Multimedia

📈 Citations: 1

✨ Influential: 0

🤖 AI Summary

Existing speech-driven 3D facial animation methods mostly neglect speaker-specific talking styles, so the generated expressions and head poses lack vividness and personalization. AdaMesh addresses this by learning a personalized talking style from a reference video of about 10 seconds. First, it introduces MoLoRA (mixture-of-low-rank adaptation) to fine-tune an expression adapter that efficiently captures the speaker's facial expression style. Second, it adds a pose adapter that needs no fine-tuning: it builds a discrete pose prior and retrieves a suitable style embedding through a semantic-aware pose style matrix. According to the authors' experiments, the approach outperforms state-of-the-art methods, preserves the talking style of the reference video, and generates vivid facial animation.

📝 Abstract

Speech-driven 3D facial animation aims at generating facial movements that are synchronized with the driving speech, which has been widely explored recently. Existing works mostly neglect the person-specific talking style in generation, including facial expression and head pose styles. Several works intend to capture the personalities by fine-tuning modules. However, limited training data leads to the lack of vividness. In this work, we propose AdaMesh, a novel adaptive speech-driven facial animation approach, which learns the personalized talking style from a reference video of about 10 seconds and generates vivid facial expressions and head poses. Specifically, we propose mixture-of-low-rank adaptation (MoLoRA) to fine-tune the expression adapter, which efficiently captures the facial expression style. For the personalized pose style, we propose a pose adapter by building a discrete pose prior and retrieving the appropriate style embedding with a semantic-aware pose style matrix without fine-tuning. Extensive experimental results show that our approach outperforms state-of-the-art methods, preserves the talking style in the reference video, and generates vivid facial animation. The supplementary video and code will be available at https://adamesh.github.io.
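To make the MoLoRA idea in the abstract more concrete, here is a minimal sketch of a frozen linear layer with several low-rank update branches. The abstract only names the technique, so everything below is an assumption for illustration: the class name `MoLoRALinear`, the helper functions, the choice of ranks, the `alpha / rank` scaling borrowed from standard LoRA, and the decision to combine branches by a plain sum. It is not the authors' implementation.

```python
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def random_matrix(rows, cols, rng, scale=0.1):
    """Small Gaussian matrix standing in for learned weights."""
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


class MoLoRALinear:
    """Hypothetical sketch: frozen weight plus a sum of LoRA branches."""

    def __init__(self, d_in, d_out, ranks=(1, 2, 4), alpha=1.0, seed=0):
        rng = random.Random(seed)
        # Pretrained weight, kept frozen during adaptation.
        self.weight = random_matrix(d_out, d_in, rng)
        self.alpha = alpha
        self.branches = []
        for rank in ranks:
            down = random_matrix(rank, d_in, rng)          # d_in -> rank
            up = [[0.0] * rank for _ in range(d_out)]      # rank -> d_out, zero init
            self.branches.append((down, up, rank))

    def forward(self, x):
        y = matvec(self.weight, x)
        # Each branch adds a low-rank correction (assumed: plain sum).
        for down, up, rank in self.branches:
            delta = matvec(up, matvec(down, x))
            scale = self.alpha / rank
            y = [yi + scale * di for yi, di in zip(y, delta)]
        return y

    def trainable_params(self):
        """Count parameters in the low-rank branches only."""
        return sum(
            len(down) * len(down[0]) + len(up) * len(up[0])
            for down, up, _ in self.branches
        )


layer = MoLoRALinear(d_in=64, d_out=64)
x = [1.0] * 64
frozen_out = matvec(layer.weight, x)
adapted_out = layer.forward(x)
print(adapted_out == frozen_out)
print(layer.trainable_params(), "trainable vs", 64 * 64, "frozen")
```

Because the up-projections start at zero, the adapted layer initially reproduces the frozen layer, and only the small branch matrices would need updating on a short reference clip. That is the usual argument for low-rank adaptation when data is scarce.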

Problem

Research questions and friction points this paper is trying to address.

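Existing speech-driven 3D facial animation methods mostly neglect the person-specific talking style

Fine-tuning approaches that target personalization lack vividness because training data is limited

Both facial expression and head pose styles must be learned from a reference video of only about 10 seconds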

Innovation

Methods, ideas, or system contributions that make the work stand out.

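AdaMesh adapts speech-driven 3D facial animation to a speaker's talking style using a reference video of about 10 seconds.

MoLoRA (mixture-of-low-rank adaptation) fine-tunes the expression adapter to efficiently capture the facial expression style.

The pose adapter builds a discrete pose prior and retrieves a style embedding through a semantic-aware pose style matrix, without fine-tuning.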
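The retrieval step of the pose adapter can be illustrated with a small nearest-neighbour lookup. The source does not say how the semantic-aware pose style matrix is built or queried, so this sketch assumes a bank of (semantic key, style embedding) pairs and cosine similarity as the matching rule. The names `cosine`, `retrieve_style`, and `style_bank`, and all numbers, are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_style(query_key, style_bank):
    """Return (index, embedding) of the bank entry whose key best matches the query.

    style_bank is a list of (semantic_key, style_embedding) pairs; no
    parameters are updated, which mirrors the fine-tuning-free idea.
    """
    best = max(
        range(len(style_bank)),
        key=lambda i: cosine(query_key, style_bank[i][0]),
    )
    return best, style_bank[best][1]


# Toy bank: two stored pose styles with made-up keys and embeddings.
style_bank = [
    ([1.0, 0.0, 0.0], [0.1, 0.2]),
    ([0.0, 1.0, 0.0], [0.7, 0.9]),
]
# Hypothetical semantic descriptor computed from the reference video.
query_key = [0.9, 0.1, 0.0]
index, embedding = retrieve_style(query_key, style_bank)
print(index, embedding)
```

A lookup like this adds no trainable parameters, which is one plausible reason a retrieval design suits a 10-second reference clip better than fine-tuning a pose model.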

🔎 Similar Papers

EmoVOCA: Speech-Driven Emotional 3D Talking Heads