🤖 AI Summary
Existing animatable avatar methods based on 3D Gaussian splatting drive every Gaussian with the same global expression code. This uniform scheme struggles to capture the heterogeneous dynamics of distinct facial regions, such as skin versus teeth, and leads to blurred or distorted reconstructions. To address this limitation, this work proposes a conditionally adaptive Gaussian avatar framework in which cross-attention lets each Gaussian point extract locally relevant driving signals from the global expression code based on its spatial location. This enables fine-grained regional control and overcomes the constraints of conventional global driving schemes. The method significantly enhances reconstruction fidelity, particularly in complex regions like teeth, while maintaining real-time rendering performance.
📝 Abstract
Creating high-fidelity, real-time drivable 3D head avatars is a core challenge in digital animation. While 3D Gaussian Splatting (3D-GS) offers unprecedented rendering speed and quality, current animation techniques often rely on a "one-size-fits-all" global tuning approach, where all Gaussian primitives are uniformly driven by a single expression code. This simplistic approach fails to disentangle the distinct dynamics of different facial regions, such as deformable skin versus rigid teeth, leading to significant blurring and distortion artifacts. We introduce Conditionally-Adaptive Gaussian Avatars (CAG-Avatar), a framework that resolves this key limitation. At its core is a Conditionally Adaptive Fusion Module built on cross-attention. This mechanism lets each 3D Gaussian act as a query, adaptively extracting relevant driving signals from the global expression code based on its canonical position. This "tailor-made" conditioning strategy drastically enhances the modeling of fine-grained, localized dynamics. Our experiments confirm a significant improvement in reconstruction fidelity, particularly for challenging regions such as teeth, while preserving real-time rendering performance.
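The idea of each Gaussian querying the expression code can be illustrated with a minimal NumPy sketch of single-head cross-attention. This is not the authors' implementation: the abstract does not say how the expression code supplies keys and values, so the sketch assumes the code is split into a few equal-sized tokens. The function name `per_gaussian_conditioning`, the parameter `num_tokens`, and the projection matrices `w_q`, `w_k`, `w_v` are hypothetical, and random matrices stand in for learned weights.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)


def per_gaussian_conditioning(positions, expr_code, num_tokens, w_q, w_k, w_v):
    """Single-head cross-attention from Gaussian positions to expression tokens.

    positions: (N, 3) canonical Gaussian centers, used as queries.
    expr_code: (D,) global expression code; D must be divisible by num_tokens.
    Returns (N, d) per-Gaussian conditioning and (N, num_tokens) attention weights.
    """
    # Assumption: the global code is split into equal-sized key/value tokens.
    tokens = expr_code.reshape(num_tokens, -1)      # (T, D / T)
    q = positions @ w_q                             # (N, d)
    k = tokens @ w_k                                # (T, d)
    v = tokens @ w_v                                # (T, d)
    scores = q @ k.T / np.sqrt(q.shape[-1])         # (N, T) scaled dot products
    attn = softmax(scores, axis=-1)                 # each row sums to 1
    return attn @ v, attn


# Toy example: random matrices stand in for learned projections.
rng = np.random.default_rng(0)
N, D, T, d = 5, 64, 8, 16
positions = rng.normal(size=(N, 3))
expr_code = rng.normal(size=(D,))
w_q = rng.normal(size=(3, d))
w_k = rng.normal(size=(D // T, d))
w_v = rng.normal(size=(D // T, d))

cond, attn = per_gaussian_conditioning(positions, expr_code, T, w_q, w_k, w_v)
print(cond.shape, attn.shape)
```

Because the attention weights depend on each Gaussian's position, two Gaussians receive different mixtures of the same expression tokens, which is the contrast with a uniform global code that the abstract describes. A practical version would likely add a positional encoding of the query and multiple heads; those details are not stated in the abstract.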