From Blurry to Believable: Enhancing Low-quality Talking Heads with 3D Generative Priors

📅 2026-02-05
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
Generating high-fidelity, temporally coherent, and identity-preserving 3D talking heads from low-quality images or videos remains challenging. To address this, the authors propose SuperHead, a novel framework that, for the first time, integrates pre-trained 3D generative priors with dynamics-aware 3D inversion. SuperHead reconstructs high-resolution heads using 3D Gaussian Splatting and binds them to parametric models such as FLAME for animation control. By fusing multi-view super-resolved renderings with depth supervision, the method significantly enhances geometric detail, texture quality, and identity consistency under dynamic facial expressions. Extensive evaluations demonstrate that SuperHead achieves superior visual fidelity compared to existing baselines.
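The summary's rigging step, binding the reconstructed Gaussians to a parametric head model such as FLAME so its pose and expression parameters drive the high-resolution head, can be illustrated with linear blend skinning on the Gaussian centers. This is a hypothetical, heavily simplified sketch (random toy points and weights, z-axis rotations as stand-in joints), not the paper's actual binding scheme:

```python
import numpy as np

# Toy sketch: deform Gaussian centers with linear blend skinning (LBS),
# the standard mechanism parametric head models like FLAME use to turn
# pose/expression parameters into surface motion. All names and sizes
# here are illustrative assumptions, not the paper's implementation.

rng = np.random.default_rng(1)
n_gauss, n_joints = 100, 3

points = rng.normal(size=(n_gauss, 3))               # Gaussian centers, rest pose
weights = rng.dirichlet(np.ones(n_joints), n_gauss)  # skinning weights, rows sum to 1


def joint_transform(angle, pivot):
    """Toy rigid joint motion: rotate by `angle` about the z-axis through `pivot`."""
    c, s = np.cos(angle), np.sin(angle)
    R = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    t = pivot - R @ pivot
    return R, t


def lbs(points, weights, transforms):
    """Blend per-joint rigid motions per point: x' = sum_j w_j (R_j x + t_j)."""
    out = np.zeros_like(points)
    for j, (R, t) in enumerate(transforms):
        out += weights[:, [j]] * (points @ R.T + t)
    return out


# "Animate": three toy joints with small rotations about the origin.
transforms = [joint_transform(a, np.zeros(3)) for a in (0.0, 0.1, -0.1)]
deformed = lbs(points, weights, transforms)
```

In the full method the blended transforms would also rotate each Gaussian's covariance, and the weights would come from the FLAME mesh rather than being random.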


๐Ÿ“ Abstract
Creating high-fidelity, animatable 3D talking heads is crucial for immersive applications, yet often hindered by the prevalence of low-quality image or video sources, which yield poor 3D reconstructions. In this paper, we introduce SuperHead, a novel framework for enhancing low-resolution, animatable 3D head avatars. The core challenge lies in synthesizing high-quality geometry and textures, while ensuring both 3D and temporal consistency during animation and preserving subject identity. Despite recent progress in image, video and 3D-based super-resolution (SR), existing SR techniques are ill-equipped to handle dynamic 3D inputs. To address this, SuperHead leverages the rich priors from pre-trained 3D generative models via a novel dynamics-aware 3D inversion scheme. This process optimizes the latent representation of the generative model to produce a super-resolved 3D Gaussian Splatting (3DGS) head model, which is subsequently rigged to an underlying parametric head model (e.g., FLAME) for animation. The inversion is jointly supervised using a sparse collection of upscaled 2D face renderings and corresponding depth maps, captured from diverse facial expressions and camera viewpoints, to ensure realism under dynamic facial motions. Experiments demonstrate that SuperHead generates avatars with fine-grained facial details under dynamic motions, significantly outperforming baseline methods in visual quality.
Problem

Research questions and friction points this paper is trying to address.

talking head
3D super-resolution
low-quality input
animatable avatar
3D consistency
Innovation

Methods, ideas, or system contributions that make the work stand out.

3D generative priors
dynamics-aware inversion
3D Gaussian Splatting
animatable avatars
super-resolution