🤖 AI Summary
To address the challenge of core-set selection without ground-truth labels under high annotation costs, this paper proposes an efficient, label-free method. First, it models unsupervised training dynamics via deep clustering (SwAV) to estimate sample difficulty. Second, it introduces a double-end pruning strategy to correct pseudo-label bias and mitigate the resulting distribution shift. To the authors' knowledge, this is the first approach to approximate supervised core-set selection performance under purely unsupervised conditions. Extensive experiments on four major vision benchmarks, including ImageNet-1K, demonstrate substantial gains over state-of-the-art label-free methods: with SwAV encoders, it achieves up to 10.2% higher classification accuracy than D². Key contributions: (i) the first unsupervised difficulty modeling grounded in proxy training dynamics; and (ii) the first double-end pruning mechanism explicitly designed to rectify pseudo-label bias.
📝 Abstract
High-quality human-annotated data is crucial for modern deep learning pipelines, yet the human annotation process is both costly and time-consuming. Given a constrained human labeling budget, selecting an informative and representative data subset for labeling can significantly reduce human annotation effort. Well-performing state-of-the-art (SOTA) coreset selection methods require ground-truth labels over the whole dataset and thus fail to reduce the human labeling burden. Meanwhile, SOTA label-free coreset selection methods deliver inferior performance because they rely on poor geometry-based difficulty scores. In this paper, we introduce ELFS (Effective Label-Free Coreset Selection), a novel label-free coreset selection method. ELFS significantly improves label-free coreset selection by addressing two challenges: 1) ELFS uses deep clustering to estimate training-dynamics-based data difficulty scores without ground-truth labels; 2) pseudo-labels introduce a distribution shift in these difficulty scores, so we propose a simple but effective double-end pruning method to mitigate the bias in the calculated scores. We evaluate ELFS on four vision benchmarks and show that, given the same vision encoder, ELFS consistently outperforms SOTA label-free baselines. For instance, when using SwAV as the encoder, ELFS outperforms D² by up to 10.2% in accuracy on ImageNet-1K. We make our code publicly available on GitHub.
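To make the double-end pruning idea concrete, here is a minimal, hypothetical sketch. It assumes each sample already has a scalar difficulty score (higher = harder, e.g. derived from proxy training dynamics under pseudo-labels); the function name, the hard-end fraction, and the selection rule are illustrative placeholders, not the paper's exact procedure:

```python
def double_end_prune(scores, budget, hard_frac=0.1):
    """Select a coreset of `budget` sample indices by pruning both ends.

    scores: per-sample difficulty, higher = harder.
    The hardest `hard_frac` of samples are dropped first, since
    pseudo-label noise tends to concentrate at the hard end; then the
    easiest samples are dropped until only `budget` indices remain.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i])  # easy -> hard
    n_hard = int(hard_frac * len(scores))
    kept = order[: len(order) - n_hard] if n_hard else order      # cut hard end
    if budget > len(kept):
        raise ValueError("budget exceeds samples left after hard-end pruning")
    return sorted(kept[-budget:])  # keep the hardest `budget` of the remainder
```

For example, with ten samples of increasing difficulty, `double_end_prune(list(range(10)), budget=3, hard_frac=0.2)` drops the two hardest samples and the five easiest, keeping the informative middle-to-hard band.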