Effective Subset Selection Through The Lens of Neural Network Pruning

📅 2024-06-03
🏛️ arXiv.org
📈 Citations: 1
Influential: 0
🤖 AI Summary
In high-label-cost scenarios (e.g., medical imaging), efficiently selecting the most informative subset from large-scale unlabeled data for annotation remains challenging. Method: We establish a theoretical equivalence between subset selection and neural network pruning in terms of optimization objectives, enabling the transfer of pruning heuristics to data selection. Specifically, we propose a subset scoring criterion based on neural feature norms—requiring no additional training or gradient computation—and demonstrate its compatibility across architectures (ResNet, ViT) and datasets (CIFAR, ImageNet subsets). Contribution/Results: Our method achieves state-of-the-art performance under few-shot labeling settings, outperforming existing approaches by 1.8–3.2 percentage points of accuracy on average. This empirically validates the cross-domain applicability of model compression techniques to data-centric machine learning, highlighting their transferable value for data engineering.
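The training-free norm criterion described above can be sketched in a few lines: score each unlabeled sample by the L2 norm of its feature embedding and rank by that score. This is a hypothetical illustration, not the authors' exact implementation; the feature matrix here is stand-in random data in place of real penultimate-layer activations.

```python
import numpy as np

def norm_scores(features):
    """Score samples by feature-vector L2 norm — a proxy for
    informativeness that needs no extra training or gradients.

    features: (n_samples, dim) array of network embeddings.
    Returns a (n_samples,) array of non-negative scores.
    """
    return np.linalg.norm(features, axis=1)

# Stand-in features; in practice these would come from a
# pretrained backbone (e.g., ResNet or ViT penultimate layer).
feats = np.random.default_rng(0).normal(size=(100, 16))
ranking = np.argsort(-norm_scores(feats))  # highest norm first
top_k = ranking[:10]  # candidate annotation budget of 10
```

Because scoring is a single forward pass plus a norm, it scales linearly in the number of unlabeled samples.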

📝 Abstract
Having large amounts of annotated data significantly impacts the effectiveness of deep neural networks. However, the annotation task can be very expensive in some domains, such as medical data. Thus, it is important to select the data to be annotated wisely, which is known as the subset selection problem. We investigate the relationship between subset selection and neural network pruning, which is more widely studied, and establish a correspondence between them. Leveraging insights from network pruning, we propose utilizing the norm criterion of neural network features to improve subset selection methods. We empirically validate our proposed strategy on various networks and datasets, demonstrating enhanced accuracy. This shows the potential of employing pruning tools for subset selection.
Problem

Research questions and friction points this paper is trying to address.

Selecting informative examples from large unlabeled datasets
Reducing annotation costs via diverse subset selection
Combining feature norms and orthogonality for sample diversity
Innovation

Methods, ideas, or system contributions that make the work stand out.

Combines feature norms, randomization, and orthogonality
Uses Gram-Schmidt process to select diverse samples
Feature norms serve as proxy for informativeness
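A minimal sketch of how these three ingredients could combine: sample a first point with probability proportional to its feature norm (randomization), then greedily pick each next point whose Gram-Schmidt residual — the feature component orthogonal to the span of already-selected samples — has the largest norm (norm as informativeness, orthogonality as diversity). The function name and details are illustrative assumptions, not the paper's exact algorithm.

```python
import numpy as np

def select_subset(features, k, seed=None):
    """Greedy norm + Gram-Schmidt subset selection (sketch).

    features: (n, d) array of embeddings; k: annotation budget.
    Returns a list of k distinct sample indices.
    """
    rng = np.random.default_rng(seed)
    residual = np.asarray(features, dtype=float).copy()
    n = residual.shape[0]

    # Randomized first pick, weighted by feature norm.
    norms = np.linalg.norm(residual, axis=1)
    selected = [int(rng.choice(n, p=norms / norms.sum()))]

    for _ in range(k - 1):
        # Gram-Schmidt step: remove the component along the
        # direction of the most recent pick from every residual.
        u = residual[selected[-1]]
        u = u / (np.linalg.norm(u) + 1e-12)
        residual = residual - (residual @ u)[:, None] * u[None, :]

        # Largest remaining residual norm = most informative
        # sample not explained by the current selection.
        norms = np.linalg.norm(residual, axis=1)
        norms[selected] = -np.inf  # never re-pick
        selected.append(int(np.argmax(norms)))
    return selected
```

Each iteration costs O(n·d), so selecting k samples is O(n·d·k), which keeps the method practical on large unlabeled pools.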