🤖 AI Summary
Existing 3D avatar reconstruction methods suffer from poor generalizability due to reliance on small-scale, lab-controlled datasets. To address this, we introduce WildAvatar, a web-scale, in-the-wild human avatar creation dataset extracted from YouTube, comprising over 10,000 distinct human subjects and scenes exhibiting complex poses, occlusions, and illumination variations. Methodologically, we integrate person detection, temporal alignment, and weakly supervised SMPL parameter estimation, augmented by multi-source consistency verification to enhance annotation robustness. WildAvatar is at least 10× richer than prior datasets for 3D human avatar creation and establishes a systematic pipeline for large-scale in-the-wild human video acquisition and annotation. Experiments reveal substantial performance degradation of state-of-the-art avatar creation methods under real-world conditions, while indicating that training with data at scale can improve their generalizability. The data source links and annotations are publicly released.
📝 Abstract
Existing human datasets for avatar creation are typically limited to laboratory environments, where high-quality annotations (e.g., SMPL estimation from 3D scans or multi-view images) can be obtained under ideal conditions. However, their annotation requirements are impractical for real-world images or videos, which hinders the application of current avatar creation methods to real-world scenarios. To this end, we propose the WildAvatar dataset, a web-scale in-the-wild human avatar creation dataset extracted from YouTube, with $10,000+$ different human subjects and scenes. WildAvatar is at least $10\times$ richer than previous datasets for 3D human avatar creation. We evaluate several state-of-the-art avatar creation methods on our dataset, highlighting the unexplored challenges in real-world applications of avatar creation. We also demonstrate the potential for generalizability of avatar creation methods when provided with data at scale. We publicly release our data source links and annotations to push forward 3D human avatar creation and other related fields for real-world applications.