🤖 AI Summary
This work addresses the low data efficiency and high backpropagation overhead of large-scale model training. The authors propose Evolved Sampling (ES), a dynamic sampling framework that uses a batch-level loss-differencing mechanism to select data adaptively based on real-time training dynamics. ES is further extended with set-level data pruning to form ESWP, a plug-and-play variant that requires no auxiliary models or additional annotations and supports tunable sampling frequency. Across both pre-training and post-training tasks, the method maintains model performance while reducing wall-clock training time by up to roughly 45%, significantly improving data utilization for large-scale models. To the authors' knowledge, ES/ESWP is the first dynamic sampling approach to simultaneously achieve generality, no performance degradation, and an acceleration ratio approaching 45%.
📝 Abstract
Data selection aims to accelerate learning while preserving performance. A fundamental idea toward this goal is to identify informative samples that contribute significantly to training. In this work, we propose **Evolved Sampling** (**ES**), a simple yet effective framework for *dynamic* sampling along the training process. ES performs *batch*-level data selection based on the dynamics of losses and augmented *loss differences*, which enables flexible *frequency tuning* and hence significantly reduces backpropagation time while maintaining model performance. Owing to its conciseness, ES is also readily extensible with *set*-level data selection (forming ES with pruning, **ESWP**) for further acceleration. As a plug-and-play framework, ES(WP) consistently achieves lossless training acceleration across various pre-training and post-training tasks, saving up to nearly 45% of wall-clock time. Our results motivate further investigation into the data efficiency of modern large-scale machine learning.
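To make the batch-level idea concrete, here is a minimal sketch of loss-difference-based selection. This is an illustration under assumptions, not the paper's exact algorithm: the scoring rule (absolute change in per-sample loss between two recent evaluations), the `keep_ratio` parameter, and the function name `select_batch` are all hypothetical choices made for clarity.

```python
import numpy as np

def select_batch(losses_prev, losses_curr, keep_ratio=0.5):
    """Pick the samples in a batch whose loss changed most between two
    recent evaluations (hypothetical scoring rule for illustration)."""
    # Score each sample by |current loss - previous loss|: samples whose
    # loss is still moving are treated as informative for training.
    scores = np.abs(np.asarray(losses_curr) - np.asarray(losses_prev))
    k = max(1, int(keep_ratio * len(scores)))
    # Keep only the k highest-scoring samples; in a training loop, only
    # these would be backpropagated, cutting the backward-pass cost.
    return np.argsort(scores)[-k:]

# Toy usage: 8 samples, half retained.
prev = np.array([2.0, 1.5, 0.3, 0.9, 1.1, 0.2, 0.8, 1.7])
curr = np.array([1.0, 1.4, 0.3, 0.4, 1.0, 0.2, 0.7, 0.9])
idx = select_batch(prev, curr, keep_ratio=0.5)
```

In this sketch, samples whose loss barely moves (here indices 2 and 5) are skipped, which is the intuition behind saving backward-pass time; the set-level pruning of ESWP would additionally drop low-scoring samples from the pool altogether.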