🤖 AI Summary
Current 3D vision research is hindered by the scarcity of high-quality, large-scale, full-coverage 360° datasets with rich 3D annotations. To address this, we introduce uCO3D, the first large-scale, high-resolution, 360° video-based 3D object dataset, covering more than 1,000 categories and providing per-object 3D camera poses, depth maps, sparse point clouds, textual captions, and 3D Gaussian Splatting reconstructions. Our contributions are threefold: (1) a new 3D data-curation paradigm that combines high diversity (exceeding MVImgNet and CO3Dv2) with rigorous quality control; (2) the first dataset to provide multimodal 3D annotations together with generative 3D representations in a unified form; and (3) an end-to-end pipeline spanning 360° video acquisition, calibration, reconstruction, alignment, and evaluation. Experiments with several 3D foundation models show that training on uCO3D significantly improves 3D understanding and generation performance, validating it as a strong benchmark with good generalization.
📝 Abstract
We introduce Uncommon Objects in 3D (uCO3D), a new object-centric dataset for 3D deep learning and 3D generative AI. uCO3D is the largest publicly available collection of high-resolution videos of objects with 3D annotations that ensures full 360° coverage. uCO3D is significantly more diverse than MVImgNet and CO3Dv2, covering more than 1,000 object categories. It is also of higher quality, thanks to extensive quality checks of both the collected videos and the 3D annotations. Like related datasets, uCO3D contains annotations for 3D camera poses, depth maps, and sparse point clouds. In addition, each object is equipped with a caption and a 3D Gaussian Splat reconstruction. We train several large 3D models on MVImgNet, CO3Dv2, and uCO3D and obtain superior results with the latter, showing that uCO3D is better suited for learning applications.
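To make the annotation types concrete, the per-object record described above (camera poses, depth maps, a sparse point cloud, a caption, and a Gaussian Splat reconstruction) might be organized roughly as in the sketch below. All class and field names here are illustrative assumptions, not the dataset's actual API; real uCO3D data would store arrays (e.g. NumPy tensors) rather than plain lists.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# A flattened 4x4 world-to-camera matrix (16 floats), one per video frame.
Pose = Tuple[float, ...]


@dataclass
class UCO3DObject:
    """Hypothetical per-object record mirroring uCO3D's annotation types."""
    category: str                                   # one of 1,000+ categories
    caption: str                                    # textual description
    camera_poses: List[Pose]                        # 3D camera pose per frame
    depth_maps: List[List[float]]                   # per-frame depth (flattened H*W)
    point_cloud: List[Tuple[float, float, float]]   # sparse SfM points
    gaussian_splat: Dict[str, list]                 # 3DGS params (means, scales, ...)


# Toy instance: one frame with a 2x2 depth map and a single sparse point.
identity_pose: Pose = tuple(float(i % 5 == 0) for i in range(16))
obj = UCO3DObject(
    category="teapot",
    caption="a ceramic teapot on a wooden table",
    camera_poses=[identity_pose],
    depth_maps=[[1.0, 1.0, 1.0, 1.0]],
    point_cloud=[(0.0, 0.0, 0.0)],
    gaussian_splat={"means": [[0.0, 0.0, 0.0]], "opacities": [1.0]},
)
print(obj.category, len(obj.camera_poses))
```

The point of the sketch is simply that every modality is aligned per object, so a single record carries everything needed for both 3D understanding (poses, depth, points) and generation (caption, 3DGS).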