🤖 AI Summary
To address the computational and storage overhead caused by redundant samples in large-scale training datasets, this paper proposes a reinforcement learning (RL)-based data selection framework. The method models data selection as a sequential decision-making process, deriving a sparse reward signal from the consistency of the dynamically evolving data distribution during training. Its key contributions are: (i) the first formal definition and quantification of ε-sample cover, which characterizes sample coverage redundancy; and (ii) a lightweight RL agent that jointly optimizes for coverage and diversity. Extensive experiments across benchmark datasets (CIFAR-10/100, ImageNet) and architectures (ResNet, ViT) show that the approach reduces training data by 50–70% while preserving, or even improving, model accuracy and generalization. It consistently outperforms state-of-the-art data selection methods, delivering significant efficiency gains without compromising model quality.
📝 Abstract
Modern deep architectures often rely on large-scale datasets, but training on these datasets incurs high computational and storage overhead. Real-world datasets often contain substantial redundancy, prompting the need for more data-efficient training paradigms. Data selection has shown promise in mitigating this redundancy by identifying the most representative samples, thereby reducing training costs without compromising performance. Existing methods typically rely on static scoring metrics or pretrained models, overlooking the combined effect of selected samples and their evolving dynamics during training. We introduce the concept of ε-sample cover, which quantifies sample redundancy based on inter-sample relationships, capturing the intrinsic structure of the dataset. Based on this, we reformulate data selection as a reinforcement learning (RL) process and propose RL-Selector, in which a lightweight RL agent optimizes the selection policy using the ε-sample cover, derived from the evolving dataset distribution, as a reward signal. Extensive experiments across benchmark datasets and diverse architectures demonstrate that our method consistently outperforms existing state-of-the-art baselines. Models trained on our selected datasets show enhanced generalization performance with improved training efficiency.
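The abstract does not spell out the formal definition of ε-sample cover, but a natural reading is: a subset S of the dataset such that every sample lies within distance ε of some member of S in an embedding space, so samples outside S are ε-redundant. The greedy sketch below illustrates that reading; the Euclidean metric, the embedding representation, and the greedy selection rule are all assumptions for illustration, not the paper's actual algorithm (which learns the selection policy with an RL agent).

```python
import math


def greedy_epsilon_cover(features, eps):
    """Greedy sketch of an ε-sample cover (illustrative, not the paper's method).

    features: list of equal-length tuples, one embedding per sample (assumed input).
    eps: coverage radius; a sample is "covered" if some selected sample
         lies within Euclidean distance eps of it.
    Returns the indices of the selected cover set.
    """
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    uncovered = set(range(len(features)))
    cover = []
    while uncovered:
        # Greedily pick the sample that covers the most uncovered samples.
        best = max(
            uncovered,
            key=lambda i: sum(dist(features[i], features[j]) <= eps for j in uncovered),
        )
        cover.append(best)
        uncovered = {j for j in uncovered if dist(features[best], features[j]) > eps}
    return cover


# Toy example: two tight clusters plus one outlier collapse to three samples.
feats = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (10.0, 0.0)]
print(greedy_epsilon_cover(feats, eps=0.5))
```

Under this reading, the size of the cover relative to the dataset measures redundancy: the two near-duplicate points in each cluster are represented by a single selected sample, which matches the paper's motivation of pruning 50–70% of training data while keeping the dataset's intrinsic structure.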