WildAvatar: Web-scale In-the-wild Video Dataset for 3D Avatar Creation

📅 2024-07-02

🏛️ arXiv.org

📈 Citations: 2

✨ Influential: 0

🤖 AI Summary

Existing 3D avatar reconstruction methods generalize poorly because they rely on small-scale, lab-controlled datasets. To address this, the authors introduce WildAvatar, a web-scale, in-the-wild video dataset for 3D human avatar creation. It is automatically curated from YouTube and comprises over 10,000 human subjects and scenes exhibiting complex poses, occlusions, and illumination variations. Methodologically, the pipeline integrates person detection, temporal alignment, and weakly supervised SMPL parameter estimation, augmented by multi-source consistency verification to make the annotations more robust. Compared to prior datasets for 3D human avatar creation, WildAvatar is at least an order of magnitude larger, and it comes with a systematic pipeline for acquiring and annotating in-the-wild human video at scale. Experiments show that state-of-the-art avatar creation methods degrade substantially under real-world conditions, and that training on data at scale improves their generalizability. The data source links and annotations are publicly released.
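To make the consistency-verification idea concrete, here is a minimal sketch of one possible filtering step. It is not the WildAvatar pipeline: the `FrameAnnotation` structure, the pixel threshold, and the choice to compare detector keypoints against keypoints projected from a fitted SMPL body are all assumptions made for illustration.

```python
from dataclasses import dataclass
from math import hypot

Point = tuple[float, float]


@dataclass
class FrameAnnotation:
    """Hypothetical per-frame record; field names are illustrative only."""
    frame_index: int
    detected_keypoints: list[Point]   # 2D joints from a pose detector (assumed source)
    projected_keypoints: list[Point]  # 2D joints projected from a fitted SMPL body (assumed source)


def mean_keypoint_error(frame: FrameAnnotation) -> float:
    """Mean Euclidean distance, in pixels, between corresponding joints."""
    pairs = zip(frame.detected_keypoints, frame.projected_keypoints)
    distances = [hypot(dx - px, dy - py) for (dx, dy), (px, py) in pairs]
    return sum(distances) / len(distances)


def filter_consistent_frames(
    frames: list[FrameAnnotation],
    max_error_px: float = 10.0,  # assumed threshold, not taken from the paper
) -> list[FrameAnnotation]:
    """Keep frames whose two keypoint sources agree within the threshold."""
    kept = []
    for frame in frames:
        same_length = len(frame.detected_keypoints) == len(frame.projected_keypoints)
        if not frame.detected_keypoints or not same_length:
            continue  # skip frames with missing or mismatched joints
        if mean_keypoint_error(frame) <= max_error_px:
            kept.append(frame)
    return kept
```

In this sketch, a frame is discarded when the two independent estimates of the body pose disagree, which is one simple way to catch annotation failures caused by occlusion or detector errors.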

📝 Abstract

Existing human datasets for avatar creation are typically limited to laboratory environments, wherein high-quality annotations (e.g., SMPL estimation from 3D scans or multi-view images) can be ideally provided. However, their annotating requirements are impractical for real-world images or videos, posing challenges toward real-world applications on current avatar creation methods. To this end, we propose the WildAvatar dataset, a web-scale in-the-wild human avatar creation dataset extracted from YouTube, with $10,000+$ different human subjects and scenes. WildAvatar is at least $10\times$ richer than previous datasets for 3D human avatar creation. We evaluate several state-of-the-art avatar creation methods on our dataset, highlighting the unexplored challenges in real-world applications on avatar creation. We also demonstrate the potential for generalizability of avatar creation methods, when provided with data at scale. We publicly release our data source links and annotations, to push forward 3D human avatar creation and other related fields for real-world applications.

Problem

Research questions and friction points this paper is trying to address.

Creating 3D avatars from real-world web videos

Overcoming limitations of lab datasets for avatar creation

Automating annotation and filtering for web-scale avatar datasets

Innovation

Methods, ideas, or system contributions that make the work stand out.

Automatic annotating pipeline with filtering protocols

Web-scale dataset from YouTube for 3D avatars

Public release of data source links and annotations
