🤖 AI Summary
Existing drivable 3D head avatar methods generalize poorly to extreme expressions and poses, and texel-space representations underuse the mesh topology, which weakens geometric consistency and destabilizes deformation. To address this, we propose TexAvatars, a hybrid texel-3D representation that combines the geometric interpretability of analytic rigging with the spatial continuity of UV parameterization. A CNN predicts UV-aligned local geometric attributes, while mesh-aware Jacobians guided by a 3D Morphable Model (3DMM) mesh drive the 3D deformation, giving smooth and semantically consistent transitions across triangle boundaries. Gaussian splatting provides high-fidelity rendering, and explicit triangle-face constraints help preserve mesh integrity. The approach improves deformation stability and out-of-distribution generalization. It achieves state-of-the-art performance on extreme reenactment tasks and reconstructs fine anatomical details with high fidelity, including muscle-induced wrinkles, glabellar lines, and mouth cavity geometry.
📝 Abstract
Constructing drivable and photorealistic 3D head avatars has become a central task in AR/XR, enabling immersive and expressive user experiences. With the emergence of high-fidelity and efficient representations such as 3D Gaussians, recent works have pushed toward ultra-detailed head avatars. Existing approaches typically fall into two categories: rule-based analytic rigging or neural network-based deformation fields. While effective in constrained settings, both approaches often fail to generalize to unseen expressions and poses, particularly in extreme reenactment scenarios. Other methods constrain Gaussians to the global texel space of 3DMMs to reduce rendering complexity. However, these texel-based avatars tend to underutilize the underlying mesh structure. They apply minimal analytic deformation and rely heavily on neural regressors and heuristic regularization in UV space, which weakens geometric consistency and limits extrapolation to complex, out-of-distribution deformations. To address these limitations, we introduce TexAvatars, a hybrid avatar representation that combines the explicit geometric grounding of analytic rigging with the spatial continuity of texel space. Our approach predicts local geometric attributes in UV space via CNNs, but drives 3D deformation through mesh-aware Jacobians, enabling smooth and semantically meaningful transitions across triangle boundaries. This hybrid design separates semantic modeling from geometric control, resulting in improved generalization, interpretability, and stability. Furthermore, TexAvatars captures fine-grained expression effects, including muscle-induced wrinkles, glabellar lines, and realistic mouth cavity geometry, with high fidelity. Our method achieves state-of-the-art performance under extreme pose and expression variations, demonstrating strong generalization in challenging head reenactment settings.
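The abstract does not spell out how the mesh-aware Jacobians are computed. As a rough illustration of the general idea, the sketch below builds a per-triangle deformation Jacobian with the standard deformation-gradient construction (two edges plus a unit normal as a local frame) and uses it to carry a point attached to a rest-pose triangle to the posed triangle. This is an assumption for illustration, not the TexAvatars implementation; the function names `frame`, `triangle_jacobian`, and `deform_point` are hypothetical.

```python
# Illustrative sketch only: a generic per-triangle deformation Jacobian,
# not the paper's code. All function names here are hypothetical.

def sub(a, b):
    return [a[i] - b[i] for i in range(3)]

def dot(a, b):
    return sum(a[i] * b[i] for i in range(3))

def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def scale(a, s):
    return [x * s for x in a]

def frame(tri):
    """Local frame of a triangle: two edge vectors and the unit normal."""
    v0, v1, v2 = tri
    e1, e2 = sub(v1, v0), sub(v2, v0)
    n = cross(e1, e2)
    n = scale(n, 1.0 / dot(n, n) ** 0.5)
    return e1, e2, n

def triangle_jacobian(rest_tri, posed_tri):
    """3x3 matrix J (list of rows) that maps the rest frame onto the posed frame."""
    a, b, c = frame(rest_tri)
    det = dot(a, cross(b, c))
    # Rows of the inverse of the matrix whose columns are a, b, c.
    inv_rows = [scale(cross(b, c), 1.0 / det),
                scale(cross(c, a), 1.0 / det),
                scale(cross(a, b), 1.0 / det)]
    p = frame(posed_tri)  # columns of the posed frame
    # J = P @ R^{-1}, where P[i][k] = p[k][i].
    return [[sum(p[k][i] * inv_rows[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def deform_point(rest_tri, posed_tri, point):
    """Move a point attached to the rest triangle along with the posed triangle."""
    J = triangle_jacobian(rest_tri, posed_tri)
    offset = sub(point, rest_tri[0])
    moved = [dot(row, offset) for row in J]
    return [posed_tri[0][i] + moved[i] for i in range(3)]

# Example: a triangle stretched along x and translated.
rest = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
posed = [(1.0, 1.0, 1.0), (3.0, 1.0, 1.0), (1.0, 2.0, 1.0)]
new_point = deform_point(rest, posed, (0.25, 0.25, 0.1))
```

In a texel-based avatar, each texel's Gaussian would presumably be associated with a triangle of the 3DMM mesh, so a Jacobian of this kind could move and reorient the Gaussian analytically while a network only predicts local attributes. How TexAvatars blends Jacobians across triangle boundaries is not described in the abstract and is not modeled here.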