Data-Efficient Training by Evolved Sampling

📅 2025-09-27
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses low data sampling efficiency and high backpropagation overhead during training. The authors propose Evolved Sampling (ES), a dynamic sampling framework that introduces a batch-level loss-differencing mechanism to enable adaptive data selection based on real-time training dynamics, with tunable sampling frequency. ES is further extended with set-level data selection to form ES with Pruning (ESWP), a plug-and-play framework that requires no auxiliary models or additional annotations. The method preserves model performance across both pre-training and post-training tasks while reducing wall-clock training time by up to nearly 45%, significantly improving data utilization for large-scale models. To the authors' knowledge, ES/ESWP is the first dynamic sampling approach that simultaneously achieves universality, zero performance degradation, and a high acceleration ratio (approaching 45%).

📝 Abstract
Data selection is designed to accelerate learning with preserved performance. To achieve this, a fundamental thought is to identify informative data samples with significant contributions to the training. In this work, we propose Evolved Sampling (ES), a simple yet effective framework for dynamic sampling along the training process. This method conducts batch-level data selection based on the dynamics of losses and augmented loss differences, which enables flexible frequency tuning, and hence significantly reduces the backpropagation time with maintained model performance. Due to its conciseness, ES is also readily extensible to incorporate set-level data selection (to form ES with pruning, ESWP) for further accelerations. As a plug-and-play framework, ES(WP) consistently achieves lossless training accelerations across various pre-training and post-training tasks, saving up to nearly 45% wall-clock time. Our results motivate further investigations on the data efficiency aspect of modern large-scale machine learning.
Problem

Research questions and friction points this paper is trying to address.

Accelerating the training process via dynamic sampling
Reducing backpropagation time while maintaining model performance
Achieving lossless training acceleration across diverse pre-training and post-training tasks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Dynamic sampling using loss differences and frequency tuning
Batch-level selection reducing backpropagation time while preserving performance
Extensible plug-and-play framework combining sampling with pruning
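To make the batch-level idea concrete, here is a minimal sketch of loss-difference-based sample selection. The scoring rule (current loss plus absolute loss change) and all names are illustrative assumptions for exposition, not the paper's exact formulation: samples whose losses are large or still changing are kept for backpropagation with higher probability, so the backward pass runs on a smaller, more informative subset.

```python
import numpy as np

def select_batch(losses, prev_losses, keep_ratio=0.5, rng=None):
    """Pick a subset of batch indices to backpropagate on.

    Scores each sample by its current loss plus the magnitude of its
    loss change since the last pass (a stand-in for the paper's
    "augmented loss differences"), then samples indices without
    replacement proportionally to those scores.
    """
    rng = rng or np.random.default_rng(0)
    # Illustrative score: high-loss or fast-changing samples rank higher.
    scores = losses + np.abs(losses - prev_losses)
    probs = scores / scores.sum()
    k = max(1, int(keep_ratio * len(losses)))
    return rng.choice(len(losses), size=k, replace=False, p=probs)

# Hypothetical per-sample losses from the current and previous passes.
losses = np.array([0.9, 0.1, 0.5, 0.05])
prev_losses = np.array([1.0, 0.1, 0.9, 0.05])
idx = select_batch(losses, prev_losses, keep_ratio=0.5)
# The backward pass would then run only on batch[idx].
```

Tuning `keep_ratio` (and how often selection is re-run) corresponds to the "frequency tuning" the abstract mentions: a smaller kept fraction cuts more backpropagation cost at the risk of noisier gradients.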
👥 Authors
Ziheng Cheng (UC Berkeley) · Machine Learning, Optimization, Statistics
Zhong Li (Microsoft Research Asia)
Jiang Bian (Microsoft Research Asia)