🤖 AI Summary
This work addresses the scarcity of high-quality part annotations in existing datasets, which hinders progress on fine-grained visual models. The authors introduce PartImageNet++ (PIN++), a dataset that covers all 1,000 classes of ImageNet-1K with 100 part-annotated images per class (100K images in total). Building on it, they propose a Multi-scale Part-supervised recognition Model (MPM): a part segmentation network trained on PIN++ generates pseudo part labels for the unannotated images, and a conventional recognition architecture with auxiliary bypass layers is then jointly supervised by the pseudo labels and the original part annotations, so that part supervision reaches features at multiple scales. The authors report improved classification robustness on ImageNet-1K and strong baselines on downstream tasks, including part segmentation, object segmentation, and few-shot learning.
📝 Abstract
To address the scarcity of high-quality part annotations in existing datasets, we introduce PartImageNet++ (PIN++), a dataset that provides detailed part annotations for all categories in ImageNet-1K. With 100 annotated images per category, totaling 100K images, PIN++ is the most comprehensive part annotation dataset covering a diverse range of object categories. Leveraging PIN++, we propose a Multi-scale Part-supervised recognition Model (MPM) for robust classification on ImageNet-1K. We first train a part segmentation network on PIN++ and use it to generate pseudo part labels for the remaining unannotated images. MPM then integrates a conventional recognition architecture with auxiliary bypass layers, jointly supervised by both the pseudo part labels and the original part annotations. Furthermore, we conduct extensive experiments on PIN++, including part segmentation, object segmentation, and few-shot learning, exploring various ways to leverage part annotations in downstream tasks. Experimental results demonstrate that our approach not only enhances part-based models for robust object recognition but also establishes strong baselines for multiple downstream tasks, highlighting the potential of part annotations for improving model performance. The dataset and the code are available at https://github.com/LixiaoTHU/PartImageNetPP.
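
The abstract does not spell out how the joint supervision is computed, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes that a human part annotation is preferred over a pseudo label when both exist, that each auxiliary bypass output is a per-pixel distribution over part ids at some downsampled resolution, and that the total loss is the classification cross-entropy plus a weighted sum of per-scale part cross-entropies. All names (`select_part_target`, `joint_loss`, `aux_weight`) and the nearest-neighbour downsampling are hypothetical choices made for illustration.

```python
import math
from typing import Dict, List, Optional, Sequence

Mask = List[List[int]]               # H x W grid of part ids
ProbMap = List[List[List[float]]]    # H x W grid of distributions over part ids


def select_part_target(gt_mask: Optional[Mask], pseudo_mask: Mask) -> Mask:
    """Assumption: use the human annotation if present, else the pseudo label."""
    return gt_mask if gt_mask is not None else pseudo_mask


def cross_entropy(probs: Sequence[float], target: int) -> float:
    """Negative log-likelihood of the target index (clamped for stability)."""
    return -math.log(max(probs[target], 1e-12))


def downsample_nearest(mask: Mask, factor: int) -> Mask:
    """Keep every `factor`-th pixel so the mask matches a coarser feature map."""
    return [row[::factor] for row in mask[::factor]]


def pixel_cross_entropy(prob_map: ProbMap, mask: Mask) -> float:
    """Mean per-pixel cross-entropy between predicted part distributions and a mask."""
    total, count = 0.0, 0
    for prob_row, mask_row in zip(prob_map, mask):
        for probs, target in zip(prob_row, mask_row):
            total += cross_entropy(probs, target)
            count += 1
    return total / count


def joint_loss(
    class_probs: Sequence[float],
    class_target: int,
    part_prob_maps: Dict[int, ProbMap],  # downsampling factor -> bypass output
    part_target: Mask,
    aux_weight: float = 0.5,             # hypothetical weighting, not from the paper
) -> float:
    """Classification loss plus weighted part-segmentation losses at each scale."""
    loss = cross_entropy(class_probs, class_target)
    for factor, prob_map in part_prob_maps.items():
        scaled_target = downsample_nearest(part_target, factor)
        loss += aux_weight * pixel_cross_entropy(prob_map, scaled_target)
    return loss
```

In this sketch an image from the 100K annotated subset would pass its ground-truth mask as `gt_mask`, while every other ImageNet-1K training image would pass `None` and fall back to the mask predicted by the part segmentation network.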