🤖 AI Summary
Traditional 3D Morphable Models (3DMMs) suffer from limited resolution and insufficient geometric detail, while neural volumetric methods struggle with real-time rendering. Existing Gaussian splatting-based facial models still rely solely on mesh-based 3DMM priors for expression control, which hinders fine-grained expression modeling, high-fidelity geometry reconstruction, and full-head (including hair) dynamic synthesis.
Method: We propose GRMM, the first full-head, high-fidelity deformable Gaussian morphable model. Our approach introduces an identity-expression disentangled residual learning framework atop a base 3DMM, jointly optimizing geometric and appearance residuals. We curate the EXPRESS-50 dataset (60 aligned expressions across 50 identities) to support high-precision residual learning, and design a dual-decoder architecture, comprising coarse and fine decoders, to generate vertex-level deformations and per-Gaussian appearance parameters, augmented by a lightweight CNN for enhanced rendering quality.
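The residual-learning pipeline described above can be sketched as a toy forward pass. Everything here is an illustrative assumption rather than the paper's actual architecture: the dimensions are arbitrary, the coarse and fine decoders are stand-in single linear layers, and one Gaussian is bound to each mesh vertex for simplicity.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (not from the paper): V mesh vertices,
# low-dimensional identity and expression codes.
V = 1000
D_ID, D_EXP = 64, 32

# Base 3DMM: linear PCA prior (mean shape + identity/expression bases).
mean_shape = rng.standard_normal((V, 3))
id_basis = rng.standard_normal((D_ID, V, 3)) * 0.01
exp_basis = rng.standard_normal((D_EXP, V, 3)) * 0.01

def base_3dmm(beta, psi):
    """Coarse mesh from the linear 3DMM prior."""
    return (mean_shape
            + np.tensordot(beta, id_basis, axes=1)
            + np.tensordot(psi, exp_basis, axes=1))

# Coarse-decoder stand-in: maps the concatenated identity/expression
# code to small vertex-level geometric residuals.
W_coarse = rng.standard_normal((D_ID + D_EXP, V * 3)) * 0.01

def coarse_decoder(beta, psi):
    code = np.concatenate([beta, psi])
    return np.tanh(code @ W_coarse).reshape(V, 3) * 1e-3

# Fine-decoder stand-in: per-Gaussian appearance residuals (here a
# single RGB colour per Gaussian, one Gaussian per vertex).
W_fine = rng.standard_normal((D_ID + D_EXP, V * 3)) * 0.01
base_colour = rng.uniform(size=(V, 3))

def fine_decoder(beta, psi):
    code = np.concatenate([beta, psi])
    return np.tanh(code @ W_fine).reshape(V, 3) * 1e-2

beta = rng.standard_normal(D_ID)   # identity code
psi = rng.standard_normal(D_EXP)   # expression code

# Additive residual composition: base model + learned residuals.
verts = base_3dmm(beta, psi) + coarse_decoder(beta, psi)
colours = np.clip(base_colour + fine_decoder(beta, psi), 0.0, 1.0)
```

In the full method, `verts` and `colours` would drive the Gaussian splatting rasteriser, whose output the lightweight CNN then refines.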
Contribution/Results: Our method achieves state-of-the-art performance in monocular reconstruction, novel-view synthesis, and expression transfer, enabling real-time high-fidelity rendering at 75 FPS.
📝 Abstract
3D Morphable Models (3DMMs) enable controllable facial geometry and expression editing for reconstruction, animation, and AR/VR, but traditional PCA-based mesh models are limited in resolution, detail, and photorealism. Neural volumetric methods improve realism but remain too slow for interactive use. Recent Gaussian Splatting (3DGS) based facial models achieve fast, high-quality rendering but still depend solely on a mesh-based 3DMM prior for expression control, limiting their ability to capture fine-grained geometry, expressions, and full-head coverage. We introduce GRMM, the first full-head Gaussian 3D morphable model that augments a base 3DMM with residual geometry and appearance components, additive refinements that recover high-frequency details such as wrinkles, fine skin texture, and hairline variations. GRMM provides disentangled control through low-dimensional, interpretable parameters (e.g., identity shape, facial expressions) while separately modelling residuals that capture subject- and expression-specific detail beyond the base model's capacity. Coarse decoders produce vertex-level mesh deformations, fine decoders represent per-Gaussian appearance, and a lightweight CNN refines rasterised images for enhanced realism, all while maintaining 75 FPS real-time rendering. To learn consistent, high-fidelity residuals, we present EXPRESS-50, the first dataset with 60 aligned expressions across 50 identities, enabling robust disentanglement of identity and expression in Gaussian-based 3DMMs. Across monocular 3D face reconstruction, novel-view synthesis, and expression transfer, GRMM surpasses state-of-the-art methods in fidelity and expression accuracy while delivering interactive real-time performance.
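The additive residual formulation central to the abstract can be written compactly; the notation below is illustrative, not the paper's:

```latex
\mathbf{v}(\boldsymbol{\beta}, \boldsymbol{\psi})
  = \mathbf{M}(\boldsymbol{\beta}, \boldsymbol{\psi})
  + \Delta_{\mathrm{geo}}(\boldsymbol{\beta}, \boldsymbol{\psi}),
\qquad
\mathbf{a}_i
  = \mathbf{a}_i^{\mathrm{base}}
  + \Delta_{\mathrm{app},i}(\boldsymbol{\beta}, \boldsymbol{\psi})
```

where $\mathbf{M}$ is the base 3DMM mesh driven by identity code $\boldsymbol{\beta}$ and expression code $\boldsymbol{\psi}$, $\Delta_{\mathrm{geo}}$ is the coarse decoder's vertex-level geometric residual, and $\Delta_{\mathrm{app},i}$ is the fine decoder's appearance residual for the $i$-th Gaussian; the lightweight CNN then refines the rasterised image.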