🤖 AI Summary
This work addresses a limitation of existing audio-driven talking face generation methods, which often lack accuracy and efficiency in modeling fine-grained mouth motion. To overcome this, the authors propose a novel landmark representation that integrates a blink embedding with hash grid encoding, coupled with a Dynamic Landmark Transformer. This architecture injects audio features as residual terms into a dynamic neural radiance field (Dynamic NeRF), enabling high-fidelity facial animation with strong audio-visual synchronization. The approach enhances the naturalness and expressiveness of both lip movements and overall facial expressions, and experimental results show that it outperforms current state-of-the-art methods in generation quality and detail fidelity.
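To make the conditioning pathway concrete, here is a minimal PyTorch sketch of a hashed landmark encoding combined with a learned blink embedding. All names and hyperparameters (`LandmarkHashEncoder`, `num_levels`, `table_size`, the prime multipliers, nearest-vertex lookup instead of interpolation) are illustrative assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch: hash-grid encoding of 2D facial landmarks plus a blink
# embedding, concatenated into a single conditioning vector.
import torch
import torch.nn as nn


class LandmarkHashEncoder(nn.Module):
    """Multi-resolution hashed lookup of normalized 2D landmark coordinates."""

    def __init__(self, num_levels=4, base_res=16, table_size=2**14, feat_dim=4):
        super().__init__()
        self.resolutions = [base_res * (2 ** level) for level in range(num_levels)]
        self.tables = nn.ModuleList(
            nn.Embedding(table_size, feat_dim) for _ in range(num_levels)
        )
        self.table_size = table_size
        # Prime multipliers for a simple spatial hash of grid-vertex indices.
        self.register_buffer("primes", torch.tensor([1, 2654435761], dtype=torch.long))

    def forward(self, landmarks):
        # landmarks: (B, K, 2), coordinates normalized to [0, 1].
        feats = []
        for res, table in zip(self.resolutions, self.tables):
            # Nearest grid vertex at this resolution (no interpolation, for brevity).
            idx = (landmarks * res).long().clamp_(0, res - 1)        # (B, K, 2)
            hashed = (idx * self.primes).sum(-1) % self.table_size   # (B, K)
            feats.append(table(hashed))                              # (B, K, F)
        return torch.cat(feats, dim=-1)                              # (B, K, L*F)


class KeypointCondition(nn.Module):
    """Concatenates hashed landmark features with a learned blink embedding."""

    def __init__(self, num_blink_bins=8, blink_dim=16):
        super().__init__()
        self.landmark_enc = LandmarkHashEncoder()
        self.blink_embed = nn.Embedding(num_blink_bins, blink_dim)
        self.num_blink_bins = num_blink_bins

    def forward(self, landmarks, blink_ratio):
        # blink_ratio: (B,) eye openness in [0, 1], quantized into discrete bins.
        lm_feat = self.landmark_enc(landmarks).flatten(1)
        bins = (blink_ratio * (self.num_blink_bins - 1)).round().long()
        return torch.cat([lm_feat, self.blink_embed(bins)], dim=-1)


# Usage with dummy inputs: 68 landmarks per frame, batch of 2.
cond = KeypointCondition()(torch.rand(2, 68, 2), torch.rand(2))
print(cond.shape)  # torch.Size([2, 1104]) = 68 landmarks * 4 levels * 4 feats + 16
```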
📝 Abstract
Dynamic Neural Radiance Fields (NeRF) have demonstrated impressive success in generating high-fidelity 3D models of talking portraits. Despite progress in rendering speed and generation quality, accurately and efficiently capturing mouth movements in talking portraits remains challenging. To tackle this challenge, we propose an automatic method based on blink embedding and hash grid landmark encoding, which substantially enhances the fidelity of talking faces. Specifically, we encode facial landmarks as conditional features and integrate audio features as residual terms into our model through a Dynamic Landmark Transformer. Furthermore, we employ neural radiance fields to model the entire face, resulting in a lifelike facial representation. Experimental evaluations demonstrate the superiority of our approach over existing state-of-the-art methods.
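As one possible reading of how a Dynamic Landmark Transformer could inject audio features as residual terms into the NeRF conditioning, the sketch below uses cross-attention from landmark tokens to audio tokens, adds the result back as a residual, and feeds the pooled features to a toy NeRF head. Module names, dimensions, and the pooling step are assumptions, not the authors' released design.

```python
# Hypothetical sketch: audio features attended to by landmark tokens and
# injected as a residual term, then used to condition a NeRF-style MLP.
import torch
import torch.nn as nn


class DynamicLandmarkTransformer(nn.Module):
    def __init__(self, dim=64, num_heads=4):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, landmark_tokens, audio_tokens):
        # landmark_tokens: (B, K, D) encoded landmarks; audio_tokens: (B, T, D).
        # Audio enters only through the residual term, so the landmark
        # representation remains dominant when audio is uninformative.
        attn_out, _ = self.cross_attn(
            query=landmark_tokens, key=audio_tokens, value=audio_tokens
        )
        return self.norm(landmark_tokens + attn_out)  # residual injection


class ConditionedNeRF(nn.Module):
    """Toy NeRF head: predicts density and RGB from a sample point plus condition."""

    def __init__(self, dim=64, pos_dim=63):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(pos_dim + dim, 128), nn.ReLU(),
            nn.Linear(128, 128), nn.ReLU(),
            nn.Linear(128, 4),  # (sigma, r, g, b)
        )

    def forward(self, encoded_xyz, condition):
        # encoded_xyz: (B, N, pos_dim) positionally encoded sample points.
        # condition: (B, D) pooled landmark/audio feature, broadcast per point.
        cond = condition.unsqueeze(1).expand(-1, encoded_xyz.shape[1], -1)
        return self.mlp(torch.cat([encoded_xyz, cond], dim=-1))


# Usage sketch with random tensors standing in for real features.
B, K, T, D, N = 2, 68, 16, 64, 1024
fuse = DynamicLandmarkTransformer(dim=D)
nerf = ConditionedNeRF(dim=D)
tokens = fuse(torch.randn(B, K, D), torch.randn(B, T, D))
out = nerf(torch.randn(B, N, 63), tokens.mean(dim=1))  # pool landmarks, query NeRF
print(out.shape)  # torch.Size([2, 1024, 4])
```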