🤖 AI Summary
To address the high cost and difficulty of producing accurate manual annotations for 3D scene understanding, this paper introduces SCANnotate++, a set of automatically generated annotations built with automated CAD model retrieval and 9D pose estimation. The pipeline, akin to one previously applied to ScanNet, matches objects in ScanNet++ v1 scenes against a large-scale CAD library and refines alignments via point cloud completion (PCN) and iterative closest point (ICP) optimization, yielding instance-level 3D annotations. Experiments show that models trained on SCANnotate++ annotations achieve a 3.2% lower Chamfer distance on point cloud completion than models trained on human-annotated data, and a 5.7% higher recall in single-view CAD retrieval and alignment. These results indicate that automatically generated annotations can surpass manual ones in downstream performance while reducing annotation cost. The annotations and trained models are to be released to support further research in 3D scene understanding.
📝 Abstract
High-level 3D scene understanding is essential in many applications. However, the difficulty of generating accurate 3D annotations hinders the development of deep learning models. We turn to recent advances in automatic retrieval of synthetic CAD models, and show that data generated by such methods can be used as high-quality ground truth for training supervised deep learning models. More precisely, we employ a pipeline akin to the one previously used to automatically annotate objects in ScanNet scenes with their 9D poses and CAD models. This time, we apply it to the recent ScanNet++ v1 dataset, which previously lacked such annotations. Our findings demonstrate that it is not only possible to train deep learning models on these automatically obtained annotations, but that the resulting models outperform those trained on manually annotated data. We validate this on two distinct tasks: point cloud completion and single-view CAD model retrieval and alignment. Our results underscore the potential of automatic 3D annotations to enhance model performance while significantly reducing annotation costs. To support future research in 3D scene understanding, we will release our annotations, which we call SCANnotate++, along with our trained models.
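To make the notion of a 9D pose concrete, the sketch below shows one common way such a pose can place a CAD model in a scene: three translation, three rotation, and three per-axis scale parameters combined into a single 4x4 transform. This is an illustration only, not the authors' pipeline. The function names `pose_9d_to_matrix` and `apply_pose`, the Euler-angle rotation parameterization, and the composition order (scale, then rotate, then translate) are all assumptions made for this example; the paper's actual conventions may differ.

```python
import math


def pose_9d_to_matrix(translation, euler_xyz, scale):
    """Build a 4x4 transform M = T * R * S from a 9D pose.

    Assumed conventions (hypothetical, chosen for illustration):
    - euler_xyz holds rotations in radians about the x, y, and z axes,
      composed as R = Rz * Ry * Rx;
    - scale holds per-axis scale factors applied in the model's own frame.
    """
    a, b, c = euler_xyz
    ca, sa = math.cos(a), math.sin(a)
    cb, sb = math.cos(b), math.sin(b)
    cc, sc = math.cos(c), math.sin(c)

    # Rotation matrix R = Rz(c) * Ry(b) * Rx(a).
    rot = [
        [cc * cb, cc * sb * sa - sc * ca, cc * sb * ca + sc * sa],
        [sc * cb, sc * sb * sa + cc * ca, sc * sb * ca - cc * sa],
        [-sb, cb * sa, cb * ca],
    ]

    # Scale each column of R (R * diag(scale)), then append the translation.
    matrix = [
        [rot[i][j] * scale[j] for j in range(3)] + [translation[i]]
        for i in range(3)
    ]
    matrix.append([0.0, 0.0, 0.0, 1.0])
    return matrix


def apply_pose(matrix, points):
    """Map CAD-model points (x, y, z) into scene coordinates."""
    return [
        tuple(
            sum(matrix[i][j] * p[j] for j in range(3)) + matrix[i][3]
            for i in range(3)
        )
        for p in points
    ]


if __name__ == "__main__":
    # Stretch the model 2x along x, rotate 90 degrees about z, then translate.
    pose = pose_9d_to_matrix(
        translation=(1.0, 2.0, 3.0),
        euler_xyz=(0.0, 0.0, math.pi / 2),
        scale=(2.0, 1.0, 1.0),
    )
    print(apply_pose(pose, [(1.0, 0.0, 0.0)]))
```

The nine parameters give each retrieved CAD model enough freedom to match an object's position, orientation, and non-uniform size in the scan, which is why per-axis scale is included alongside the usual six rigid-body degrees of freedom.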