OFER: Occluded Face Expression Reconstruction

πŸ“… 2024-10-29
πŸ›οΈ arXiv.org
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
Single-image 3D face reconstruction is highly ill-posed under severe occlusions (e.g., masks, glasses, hands), and existing methods fail to capture its inherent multi-hypothesis nature. To address this, we propose the first occlusion-robust dual-diffusion collaborative generative framework: it jointly models FLAME shape and expression coefficients while introducing a self-supervised accuracy-prediction-based shape ranking mechanism to explicitly characterize the multimodal distribution under occlusion. Our method integrates conditional diffusion models, the 3D deformable face model FLAME, and a newly constructed occlusion-aware evaluation benchmark, CO-545. Experiments demonstrate that our approach significantly outperforms state-of-the-art methods on both standard benchmarks and CO-545β€”reducing shape error by 18.7% and increasing expression diversity by 3.2Γ—. To our knowledge, this is the first method enabling high-fidelity, geometrically consistent, expressive, and plausibly diverse 3D face reconstruction from a single input image under arbitrary occlusions.

Technology Category

Application Category

πŸ“ Abstract
Reconstructing 3D face models from a single image is an inherently ill-posed problem, which becomes even more challenging in the presence of occlusions. In addition to fewer available observations, occlusions introduce an extra source of ambiguity where multiple reconstructions can be equally valid. Despite the ubiquity of the problem, very few methods address its multi-hypothesis nature. In this paper we introduce OFER, a novel approach for single-image 3D face reconstruction that can generate plausible, diverse, and expressive 3D faces, even under strong occlusions. Specifically, we train two diffusion models to generate the shape and expression coefficients of a face parametric model, conditioned on the input image. This approach captures the multi-modal nature of the problem, generating a distribution of solutions as output. However, to maintain consistency across diverse expressions, the challenge is to select the best matching shape. To achieve this, we propose a novel ranking mechanism that sorts the outputs of the shape diffusion network based on predicted shape accuracy scores. We evaluate our method using standard benchmarks and introduce CO-545, a new protocol and dataset designed to assess the accuracy of expressive faces under occlusion. Our results show improved performance over occlusion-based methods, while also enabling the generation of diverse expressions for a given image.
Problem

Research questions and friction points this paper is trying to address.

Reconstructing 3D faces from single images with occlusions
Addressing ambiguity in reconstructions due to occlusions
Generating diverse and expressive 3D faces under occlusion
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses two diffusion models for shape and expression generation
Introduces ranking mechanism for shape accuracy selection
Develops CO-545 dataset for occlusion accuracy assessment
πŸ”Ž Similar Papers
No similar papers found.
P
Pratheba Selvaraju
University of Massachusetts, Amherst
V
V. Abrevaya
Max Planck Institute for Intelligent Systems
Timo Bolkart
Timo Bolkart
Google
Computer VisionComputer GraphicsVirtual HumansShape AnalysisFaces
Rick Akkerman
Rick Akkerman
University of Amsterdam
Tianyu Ding
Tianyu Ding
University of Pittsburgh
F
Faezeh Amjadi
Microsoft Research
I
Ilya Zharkov
Microsoft Research