🤖 AI Summary
This work addresses high-fidelity, full-head 3D reconstruction from a single portrait image at real-time inference speed, a task hindered by trade-offs among geometric completeness, appearance detail, and computational efficiency. The authors propose Any3DAvatar, which leverages a unified, high-quality full-head dataset named AnyHead and combines a Plücker coordinate-aware structured 3D Gaussian representation with one-step conditional diffusion denoising. The framework generates a complete head model in a single forward pass, taking under one second in its fastest setting. Auxiliary view-conditioned appearance supervision enhances geometric integrity and texture fidelity in novel views without additional inference overhead, and the method outperforms existing single-image full-head reconstruction approaches in rendering quality.
📝 Abstract
Reconstructing a complete 3D head from a single portrait remains challenging because existing methods still face a sharp quality-speed trade-off: high-fidelity pipelines often rely on multi-stage processing and per-subject optimization, while fast feed-forward models struggle with complete geometry and fine appearance details. To bridge this gap, we propose Any3DAvatar, a fast and high-quality method for single-image 3D Gaussian head avatar generation, whose fastest setting reconstructs a full head in under one second while preserving high-fidelity geometry and texture. First, we build AnyHead, a unified data suite that combines identity diversity, dense multi-view supervision, and realistic accessories, filling the main gaps of existing head data in coverage, full-head geometry, and complex appearance. Second, rather than sampling unstructured noise, we initialize from a Plücker-aware structured 3D Gaussian scaffold and perform one-step conditional denoising, reducing full-head reconstruction to a single forward pass while retaining high fidelity. Third, we introduce auxiliary view-conditioned appearance supervision on the same latent tokens alongside 3D Gaussian reconstruction, improving novel-view texture details at zero extra inference cost. Experiments show that Any3DAvatar outperforms prior single-image full-head reconstruction methods in rendering fidelity while remaining substantially faster.
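For readers unfamiliar with the term, Plücker coordinates describe a 3D ray by its unit direction d and its moment m = o × d, where o is any point on the ray. The abstract does not say how Any3DAvatar builds or consumes these coordinates, so the following is only a minimal sketch of the standard definition, not the authors' implementation. The names `plucker_ray` and `cross` are hypothetical helpers introduced for illustration.

```python
import math
from typing import Tuple

Vec3 = Tuple[float, float, float]


def cross(a: Vec3, b: Vec3) -> Vec3:
    """Cross product of two 3D vectors."""
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def plucker_ray(origin: Vec3, direction: Vec3) -> Tuple[float, ...]:
    """Return the 6D Plücker embedding (d, o x d) of a ray.

    Hypothetical helper: this is the textbook definition, not code
    taken from the paper.
    """
    norm = math.sqrt(sum(c * c for c in direction))
    if norm == 0.0:
        raise ValueError("ray direction must be non-zero")
    d = tuple(c / norm for c in direction)  # unit direction
    m = cross(origin, d)                    # moment of the ray
    return d + m


# A ray through (0, 0, 2) pointing along +x.
ray = plucker_ray((0.0, 0.0, 2.0), (1.0, 0.0, 0.0))

# The same line described from a different point on it.
same_ray = plucker_ray((5.0, 0.0, 2.0), (1.0, 0.0, 0.0))
```

Both calls yield the same six numbers because the moment does not depend on which point along the ray serves as the origin. That invariance is one reason this embedding is a common way to encode camera rays, although whether it motivates the design here is not stated in the abstract.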