MVHumanNet++: A Large-scale Dataset of Multi-view Daily Dressing Human Captures with Richer Annotations for 3D Human Digitization

📅 2025-05-03
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
To address the scarcity of large-scale, multi-view, and highly diverse human datasets for 3D human digitization, this paper introduces MVHumanNet++, the largest open-source multi-view 3D human motion dataset to date. It comprises 4,500 identities, 9,000 daily attire variations, 60,000 motion sequences, and 645 million image frames, annotated with human masks, camera parameters, 2D/3D keypoints, SMPL(-X) parameters, and multimodal textual descriptions; notably, it is the first large-scale dataset of its kind to include high-quality normal and depth maps. The key contribution lies in establishing a scalable, low-cost multi-view synchronized capture paradigm that integrates geometric reconstruction, pose estimation, and cross-modal annotation techniques. Extensive experiments demonstrate substantial improvements in 3D human reconstruction, pose estimation, and novel-view synthesis. The full dataset and annotations are publicly released, advancing scalable and fine-grained 3D human understanding.

📝 Abstract
In this era, the success of large language models and text-to-image models can be attributed to the driving force of large-scale datasets. However, in the realm of 3D vision, while significant progress has been achieved in object-centric tasks through large-scale datasets like Objaverse and MVImgNet, human-centric tasks have seen limited advancement, largely due to the absence of a comparable large-scale human dataset. To bridge this gap, we present MVHumanNet++, a dataset that comprises multi-view human action sequences of 4,500 human identities. The primary focus of our work is on collecting human data featuring a large number of diverse identities in everyday clothing using multi-view human capture systems, which facilitates easily scalable data collection. Our dataset contains 9,000 daily outfits, 60,000 motion sequences, and 645 million frames with extensive annotations, including human masks, camera parameters, 2D and 3D keypoints, SMPL/SMPL-X parameters, and corresponding textual descriptions. Additionally, the proposed MVHumanNet++ dataset is enhanced with newly processed normal maps and depth maps, significantly expanding its applicability and utility for advanced human-centric research. To explore the potential of MVHumanNet++ in various 2D and 3D visual tasks, we conducted several pilot studies demonstrating the performance improvements and effective applications enabled by its scale. As the current largest-scale 3D human dataset, we hope that the release of the MVHumanNet++ dataset with annotations will foster further innovations in the domain of 3D human-centric tasks at scale. MVHumanNet++ is publicly available at https://kevinlee09.github.io/research/MVHumanNet++/.
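The abstract lists per-view camera parameters alongside 2D and 3D keypoints. These two annotation types are linked by standard pinhole-camera projection: given intrinsics and extrinsics, 3D keypoints can be projected into each view's image plane (e.g. to visualize or sanity-check annotations). The sketch below is a generic illustration of that relationship; the matrix layout and function name are common conventions, not the dataset's released file format.

```python
import numpy as np

def project_keypoints(points_3d, K, R, t):
    """Project Nx3 world-space keypoints to Nx2 pixel coordinates
    using the pinhole model: uvw = K @ (R @ X + t), then divide by w."""
    cam = points_3d @ R.T + t           # world -> camera coordinates
    uvw = cam @ K.T                     # apply intrinsics
    return uvw[:, :2] / uvw[:, 2:3]     # perspective divide

# Toy example: identity rotation, camera 2 m in front of the subject.
K = np.array([[1000.0,    0.0, 512.0],
              [   0.0, 1000.0, 512.0],
              [   0.0,    0.0,   1.0]])
R = np.eye(3)
t = np.array([0.0, 0.0, 2.0])
pts = np.array([[0.0, 0.0, 0.0]])       # a keypoint at the world origin
print(project_keypoints(pts, K, R, t))  # -> [[512. 512.]]
```

A keypoint on the optical axis lands at the principal point (512, 512), as expected; off-axis points shift proportionally to focal length over depth.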
Problem

Research questions and friction points this paper is trying to address.

Lack of a large-scale human dataset for 3D vision tasks
Need for data covering diverse human identities and everyday clothing
Limited annotations available for advanced human-centric research
Innovation

Methods, ideas, or system contributions that make the work stand out.

Scalable, low-cost multi-view synchronized human capture system
Extensive annotations, including SMPL/SMPL-X parameters
Enhanced with newly processed normal and depth maps