AI Summary
Semantic segmentation has a high annotation cost, and existing active learning methods neglect both image structural priors and the capabilities of pre-trained models. To address these problems, this paper proposes an entity-superpixel collaborative active learning framework. The method uses superpixels as the basic annotation unit and introduces a class-agnostic, entity-level annotation paradigm. It combines a class-agnostic mask proposal network with an entropy-driven superpixel selection mechanism, which improves both structural awareness and annotation efficiency. On standard benchmarks, the approach achieves a 1.71% mIoU improvement over strong baselines with only 40 user clicks, reducing annotation effort by 98% compared with pixel-level methods. It substantially outperforms prior active learning approaches, easing the annotation bottleneck while making better use of pre-trained models.
Abstract
Active learning improves annotation efficiency by selecting the most informative samples for labeling, thereby reducing reliance on extensive human input. Previous active learning methods for semantic segmentation have focused on individual pixels or small regions, neglecting the rich structure of natural images and the capabilities of advanced pre-trained models. To address these limitations, we make three key contributions. First, we introduce Entity-Superpixel Annotation (ESA), an efficient active learning strategy that combines a class-agnostic mask proposal network with superpixel grouping to capture local structural cues. In addition, our method selects a subset of entities within each image of the target domain and prioritizes high-entropy superpixels to ensure comprehensive representation. At the same time, it concentrates on a limited number of key entities to keep annotation efficient. Because this annotator-friendly design exploits the inherent structure of images, our approach significantly outperforms existing pixel-based methods and achieves superior results with minimal queries: it reduces click cost by 98% and improves performance by 1.71%. For instance, our technique requires only 40 clicks for annotation, compared with the 5000 clicks demanded by conventional methods.
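The abstract describes the entropy-driven selection step only at a high level. The following is a minimal sketch of one plausible reading, not the authors' implementation. It assumes three precomputed inputs: per-pixel softmax predictions from the current segmentation model, a superpixel label map (for example from an algorithm such as SLIC), and an entity label map produced by a class-agnostic mask proposal network. The function names `pixel_entropy` and `select_superpixels`, the mean-entropy score per superpixel, and the rule of querying one superpixel per entity and keeping the top `max_entities` entities are all hypothetical choices made for illustration.

```python
import numpy as np


def pixel_entropy(probs: np.ndarray) -> np.ndarray:
    """Per-pixel Shannon entropy of softmax probabilities with shape (C, H, W)."""
    eps = 1e-12  # avoids log(0) for confident predictions
    return -(probs * np.log(probs + eps)).sum(axis=0)


def select_superpixels(probs, superpixels, entities, max_entities):
    """Hypothetical entity-superpixel query selection (illustrative only).

    probs:        (C, H, W) softmax output of the current model
    superpixels:  (H, W) integer superpixel labels
    entities:     (H, W) integer entity labels from class-agnostic mask
                  proposals; negative values mean "no entity"
    max_entities: number of entities to query in this image

    Returns a list of (entity_id, superpixel_id) pairs, one superpixel
    per selected entity, ordered by decreasing uncertainty.
    """
    ent = pixel_entropy(probs)
    candidates = []
    for e in np.unique(entities):
        if e < 0:
            continue  # skip pixels not covered by any entity proposal
        in_entity = entities == e
        best_sp, best_score = None, -np.inf
        # Score each superpixel overlapping this entity by its mean entropy.
        for sp in np.unique(superpixels[in_entity]):
            region = in_entity & (superpixels == sp)
            score = float(ent[region].mean())
            if score > best_score:
                best_sp, best_score = int(sp), score
        candidates.append((best_score, int(e), best_sp))
    # Keep only the few most uncertain entities to limit annotation clicks.
    candidates.sort(reverse=True)
    return [(e, sp) for _, e, sp in candidates[:max_entities]]
```

In this sketch, restricting each entity to a single high-entropy superpixel is what keeps the number of clicks small: one query per selected entity rather than one per pixel. How the paper actually aggregates entropy, and how the selected superpixel label is propagated to the rest of the entity, are not specified in the abstract.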