🤖 AI Summary
Traditional 3D Morphable Models (3DMMs) suffer from rigid topology constraints and linear shape assumptions, limiting their ability to represent complex full-head geometry and to support localized facial editing. To address this, we propose imHead, a novel implicit 3DMM that preserves a compact identity embedding space while introducing region-specific implicit representations for disentangled, interpretable control of local deformations. Leveraging deep implicit functions, imHead builds a large-capacity deformation model trained on a dataset of 4K distinct identities. Experiments demonstrate that imHead outperforms state-of-the-art methods in modeling diverse identities and dynamic expressions. It enables fine-grained local editing, such as unilateral eye closure or precise lip shaping, while improving geometric fidelity, cross-identity generalization, and interactive controllability.
📝 Abstract
Over the past years, 3D morphable models (3DMMs) have emerged as a state-of-the-art methodology for modeling and generating expressive 3D avatars. However, given their reliance on a strict topology, along with their linear nature, they struggle to represent complex full-head shapes. Following the advent of deep implicit functions, we propose imHead, a novel implicit 3DMM that not only models expressive 3D head avatars but also facilitates localized editing of the facial features. Previous methods directly divided the latent space into local components accompanied by an identity encoding to capture the global shape variations, leading to prohibitively large latent sizes. In contrast, we retain a single compact identity space and introduce an intermediate region-specific latent representation to enable local edits. To train imHead, we curate a large-scale dataset of 4K distinct identities, taking a step towards large-scale 3D head modeling. Through a series of experiments, we demonstrate the expressive power of the proposed model in representing diverse identities and expressions, outperforming previous approaches. Additionally, the proposed approach provides an interpretable solution for 3D face manipulation, allowing the user to make localized edits.
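The key idea above, a single compact identity code that is expanded into intermediate region-specific latents so that one region can be edited without touching the others, can be sketched as follows. This is a minimal toy illustration, not the paper's implementation: the dimensions, per-region linear maps, anchor-based soft region weighting, and the toy decoder head are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions (not from the paper).
D_ID, D_REGION, N_REGIONS = 64, 32, 4

# Per-region linear maps: the compact identity code z is expanded into
# region-specific latents only when needed, so the stored embedding stays small.
W_region = rng.normal(scale=0.1, size=(N_REGIONS, D_REGION, D_ID))

# Fixed 3D anchor points, one per facial region (e.g. eyes, nose, mouth).
anchors = rng.normal(size=(N_REGIONS, 3))


def expand(z):
    """Expand the compact identity code into N_REGIONS region-specific latents."""
    return W_region @ z  # (N_REGIONS, D_REGION)


def region_weights(x):
    """Softly assign a 3D query point to regions (softmax over negative distance)."""
    d = np.linalg.norm(anchors - x, axis=1)
    e = np.exp(-d)
    return e / e.sum()


def sdf(x, region_latents, W_out):
    """Toy implicit function: signed distance at point x from region latents."""
    w = region_weights(x)              # (N_REGIONS,)
    blended = w @ region_latents       # (D_REGION,) blend of nearby regions
    feat = np.concatenate([x, blended])
    return float(np.tanh(feat) @ W_out)  # stand-in for a deep implicit decoder


# Localized edit: perturb only one region's latent; queries near that region's
# anchor change the most, while the identity code z itself stays untouched.
z = rng.normal(size=D_ID)
W_out = rng.normal(size=3 + D_REGION)
latents = expand(z)
edited = latents.copy()
edited[0] += 1.0  # edit region 0 only
```

Because the region latents are derived from `z` rather than stored independently, the identity embedding stays compact, and edits operate at the interpretable intermediate level.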