TEASER: Token Enhanced Spatial Modeling for Expressions Reconstruction

📅 2025-02-16
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address insufficient modeling fidelity for subtle, exaggerated, and asymmetric facial expressions (particularly irregular mouth shapes) in single-image, in-the-wild 3D face reconstruction, this paper proposes a high-fidelity expression reconstruction method. The approach has three key components: (1) a multi-scale facial appearance tokenization mechanism that enables geometry-aware expression representation; (2) a pose-dependent landmark loss that improves geometric consistency under expression-driven deformation; and (3) joint optimization via neural rendering and photometric self-reconstruction to improve texture-geometry coherence. Evaluated on multiple benchmarks, the method achieves state-of-the-art performance in expression reconstruction, with improved detail recovery in challenging regions such as the mouth, eyebrows, and asymmetric facial areas. It also supports downstream applications such as video-driven animation and cross-identity expression transfer.

📝 Abstract
3D facial reconstruction from a single in-the-wild image is a crucial task in human-centered computer vision. While existing methods can recover accurate facial shapes, there remains significant room for improvement in fine-grained expression capture. Current approaches struggle with irregular mouth shapes, exaggerated expressions, and asymmetrical facial movements. We present TEASER (Token EnhAnced Spatial modeling for Expressions Reconstruction), which addresses these challenges and improves the accuracy of the reconstructed 3D facial geometry. TEASER tackles two main limitations of existing methods: insufficient photometric loss for self-reconstruction and inaccurate localization of subtle expressions. We introduce a multi-scale tokenizer to extract facial appearance information. Combined with a neural renderer, these tokens provide precise geometric guidance for expression reconstruction. In addition, TEASER incorporates a pose-dependent landmark loss to further improve geometric accuracy. Our approach not only significantly enhances expression reconstruction quality but also offers interpretable tokens suitable for various downstream applications, such as photorealistic facial video driving, expression transfer, and identity swapping. Quantitative and qualitative experimental results across multiple datasets demonstrate that TEASER achieves state-of-the-art performance in precise expression reconstruction.
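The abstract does not specify how the multi-scale tokenizer is built. As a rough illustration of the general idea of describing one feature map at several spatial scales, the toy sketch below average-pools a small 2D grid with different window sizes and collects the pooled values as tokens. The function names, the pooling choice, and the scale set are all assumptions made for illustration; TEASER's tokenizer is a learned component and is not reproduced here.

```python
def avg_pool(grid, k):
    """Non-overlapping k x k average pooling over a 2D list of numbers."""
    h, w = len(grid), len(grid[0])
    pooled = []
    for i in range(0, h - k + 1, k):
        row = []
        for j in range(0, w - k + 1, k):
            window_sum = sum(grid[i + di][j + dj]
                             for di in range(k) for dj in range(k))
            row.append(window_sum / (k * k))
        pooled.append(row)
    return pooled


def multi_scale_tokens(grid, scales=(1, 2, 4)):
    """Return (scale, value) tokens, fine scales first (hypothetical layout)."""
    tokens = []
    for k in scales:
        for row in avg_pool(grid, k):
            tokens.extend((k, value) for value in row)
    return tokens


# Toy 4 x 4 "feature map" holding the values 0..15 in row-major order.
feature_map = [[4 * r + c for c in range(4)] for r in range(4)]
tokens = multi_scale_tokens(feature_map)
```

On a 4 x 4 grid the three scales contribute 16, 4, and 1 tokens respectively, so coarse tokens summarize whole regions while fine tokens keep local detail.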
Problem

Research questions and friction points this paper is trying to address.

Improves 3D facial expression capture
Addresses irregular mouth shapes and asymmetrical movements
Enhances geometric accuracy with multi-scale tokenizer
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-scale tokenizer extracts facial details
Neural renderer provides geometric guidance
Pose-dependent landmark loss enhances accuracy
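The page does not give the formula for the pose-dependent landmark loss. One plausible reading, sketched below purely as an assumption, is a landmark L1 loss whose per-point weights depend on head yaw, so that landmarks on the side turned away from the camera count less. The weighting rule, the `side` labels, the sign convention for yaw, and the `max_yaw` parameter are hypothetical and should not be taken as the paper's definition.

```python
def pose_weighted_landmark_loss(pred, target, side, yaw_deg, max_yaw=90.0):
    """Weighted mean L1 distance between predicted and target 2D landmarks.

    pred, target: lists of (x, y) pairs of equal length.
    side: per-landmark label, -1 (left), 0 (center), +1 (right).
    yaw_deg: head yaw; positive is ASSUMED to turn the +1 side away.
    """
    if not (len(pred) == len(target) == len(side)):
        raise ValueError("pred, target and side must have the same length")
    # Fraction of a full turn, clipped to [-1, 1].
    turn = max(-1.0, min(1.0, yaw_deg / max_yaw))
    total, weight_sum = 0.0, 0.0
    for (px, py), (tx, ty), s in zip(pred, target, side):
        # Hypothetical rule: halve the weight of a fully turned-away landmark.
        weight = 1.0 - 0.5 * max(0.0, s * turn)
        total += weight * (abs(px - tx) + abs(py - ty))
        weight_sum += weight
    return total / weight_sum


pred = [(0.0, 0.0), (1.0, 1.0)]
target = [(0.0, 0.0), (2.0, 1.0)]
side = [-1, 1]
frontal_loss = pose_weighted_landmark_loss(pred, target, side, yaw_deg=0.0)
profile_loss = pose_weighted_landmark_loss(pred, target, side, yaw_deg=90.0)
```

With a frontal pose every landmark has weight 1 and the result is the plain mean L1 error; at a full profile the error on the turned-away landmark contributes half as much.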
Yunfei Liu
International Digital Economy Academy, Beijing University
Lei Zhu
International Digital Economy Academy, Beijing University
Lijian Lin
Tencent ARC Lab
Computer Vision, Visual Tracking, Video Object Detection
Ye Zhu
International Digital Economy Academy, Beijing University
Ailing Zhang
International Digital Economy Academy, Beijing University
Yu Li