🤖 AI Summary
Generating high-fidelity, animatable, and complete 360° 3D head avatars from a single image remains challenging: existing methods struggle to achieve feed-forward inference, full-head modeling, and animation readiness simultaneously. This work proposes the first feed-forward framework that directly synthesizes a generalizable, complete, and animatable 3D Gaussian head avatar from a single input image, without per-instance optimization. The approach introduces semantic-aware mesh deformation to refine the FLAME template with accurate hair geometry, and pairs a multi-view feature splatting scheme with a visibility-aware fusion mechanism to construct a structurally consistent yet detail-rich shared UV representation. Experiments show that the method sets a new state of the art in 360° completeness, identity preservation, and animation usability.
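To make the mesh-refinement idea concrete, below is a minimal sketch (not the paper's implementation) of topology-preserving, normal-guided template deformation: per-vertex offsets on a FLAME-like mesh are optimized so rendered normals match multi-view normal predictions, with a uniform Laplacian smoothness prior. `render_normals` stands in for any differentiable rasterizer (e.g. nvdiffrast or PyTorch3D) and is assumed, not defined; `semantic_weight` is an illustrative stand-in for the paper's semantic awareness (e.g. larger weights in the hair region).

```python
# Hypothetical sketch: normal-guided, topology-preserving mesh refinement.
# All names and the loss weighting are illustrative assumptions.
import torch

def uniform_laplacian(verts, edges):
    """Residual pulling each vertex toward the mean of its neighbors.

    verts: (V, 3) vertex positions; edges: (E, 2) undirected edge indices.
    """
    i, j = edges[:, 0], edges[:, 1]
    neighbor_sum = torch.zeros_like(verts)
    degree = torch.zeros(verts.shape[0], 1, device=verts.device)
    for a, b in ((i, j), (j, i)):
        neighbor_sum.index_add_(0, a, verts[b])
        degree.index_add_(0, a, torch.ones(len(a), 1, device=verts.device))
    return neighbor_sum / degree.clamp_min(1) - verts

def refine_template(verts, faces, edges, normal_maps, cameras,
                    render_normals, semantic_weight, steps=200, lr=1e-3):
    """Optimize per-vertex offsets against multi-view normal supervision.

    semantic_weight: (V,) per-vertex scale so some regions (e.g. hair)
    deform more freely than others; the mesh topology is never changed,
    only vertex positions move.
    """
    offsets = torch.zeros_like(verts, requires_grad=True)
    opt = torch.optim.Adam([offsets], lr=lr)
    for _ in range(steps):
        deformed = verts + semantic_weight[:, None] * offsets
        loss = torch.zeros((), device=verts.device)
        for cam, target in zip(cameras, normal_maps):
            pred = render_normals(deformed, faces, cam)  # (H, W, 3)
            loss = loss + (pred - target).abs().mean()
        # Smoothness prior keeps the deformation well-behaved.
        loss = loss + 0.1 * uniform_laplacian(deformed, edges).norm(dim=-1).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return (verts + semantic_weight[:, None] * offsets).detach()
```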
📝 Abstract
Creating high-fidelity, animatable 3D avatars from a single image remains a formidable challenge. We identify three desirable attributes of avatar generation: 1) the method should be feed-forward, 2) it should model the full 360° head, and 3) it should be animation-ready. However, existing work addresses at most two of these three goals at once. To close this gap, we propose OMEGA-Avatar, the first feed-forward framework that simultaneously generates a generalizable, 360°-complete, and animatable 3D Gaussian head from a single image. Starting from a feed-forward, animatable framework, we address 360° full-head avatar generation with two novel components. First, to overcome poor hair modeling in full-head avatar generation, we introduce a semantic-aware mesh deformation module that integrates multi-view normals to optimize a FLAME head with hair while preserving its topology. Second, to enable effective feed-forward decoding of full-head features, we propose a multi-view feature splatting module that constructs a shared canonical UV representation from features across multiple views through differentiable bilinear splatting, hierarchical UV mapping, and visibility-aware fusion. This design preserves both global structural coherence and local high-frequency detail across all viewpoints, ensuring 360° consistency without per-instance optimization. Extensive experiments demonstrate that OMEGA-Avatar achieves state-of-the-art performance, significantly outperforming existing baselines in 360° full-head completeness while robustly preserving identity across viewpoints.
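The sketch below illustrates the second component in a minimal form, under assumptions rather than as the paper's actual implementation: per-view features are scattered into a shared UV grid with differentiable bilinear weights, and the per-view maps are then fused with visibility-aware weighting so poorly observed texels (grazing angles, occlusion) contribute less. `bilinear_splat` and `visibility_fuse` are hypothetical names; the hierarchical (multi-resolution) UV mapping is omitted for brevity.

```python
# Hypothetical sketch of differentiable bilinear splatting into a UV map,
# followed by visibility-aware fusion across views. Illustrative only.
import torch

def bilinear_splat(features, uv, uv_size):
    """Scatter per-point features into a square UV grid with bilinear weights.

    features: (N, C) feature vectors sampled from one view.
    uv:       (N, 2) continuous UV coordinates in [0, 1].
    Returns a (C, H, W) feature map and a (1, H, W) accumulated weight map.
    index_add_ is differentiable w.r.t. the source, so gradients flow back
    to the input features.
    """
    N, C = features.shape
    H = W = uv_size
    scale = torch.tensor([W - 1, H - 1], dtype=features.dtype,
                         device=features.device)
    xy = uv * scale
    x0y0 = xy.floor().long()
    frac = xy - x0y0.to(features.dtype)
    uv_map = features.new_zeros(C, H * W)
    weight = features.new_zeros(1, H * W)
    for dx in (0, 1):          # splat onto the four surrounding texels
        for dy in (0, 1):
            xi = (x0y0[:, 0] + dx).clamp(0, W - 1)
            yi = (x0y0[:, 1] + dy).clamp(0, H - 1)
            wx = (1 - frac[:, 0]) if dx == 0 else frac[:, 0]
            wy = (1 - frac[:, 1]) if dy == 0 else frac[:, 1]
            w = wx * wy
            idx = yi * W + xi
            uv_map.index_add_(1, idx, features.t() * w)
            weight.index_add_(1, idx, w.unsqueeze(0))
    return uv_map.view(C, H, W), weight.view(1, H, W)

def visibility_fuse(maps, weights, visibility):
    """Fuse per-view UV maps with visibility-weighted averaging.

    maps:       (V, C, H, W) splatted feature maps.
    weights:    (V, 1, H, W) splat weight accumulators.
    visibility: (V, 1, H, W) per-view visibility scores in [0, 1].
    """
    w = weights * visibility
    return (maps * w).sum(0) / w.sum(0).clamp_min(1e-6)  # (C, H, W)
```

A toy invocation, fusing two views of random features into a 64×64 UV map:

```python
feats = [torch.randn(1000, 32) for _ in range(2)]
uvs = [torch.rand(1000, 2) for _ in range(2)]
maps, ws = zip(*(bilinear_splat(f, u, 64) for f, u in zip(feats, uvs)))
vis = torch.rand(2, 1, 64, 64)
fused = visibility_fuse(torch.stack(maps), torch.stack(ws), vis)  # (32, 64, 64)
```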