🤖 AI Summary
To address the high computational cost of training on large-scale datasets, this paper proposes Progressive Data Dropout (PDD), a strategy that dynamically reduces the effective number of training samples per epoch without altering the model architecture or optimizer. PDD combines hard-sample mining with Dropout-inspired stochastic sampling, using a lightweight scheduling mechanism to progressively shrink the training subset. Integrated into standard training pipelines, PDD converges in only 12.4% of the baseline epochs, accelerating training by approximately 8×, with no loss in accuracy; on some tasks, accuracy even improves by up to 4.82%. This work is the first to jointly model data sampling and stochastic dropout mechanisms, establishing a new paradigm for efficient data utilization. The implementation is publicly available.
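The summary above does not spell out the exact schedule, but the core loop can be sketched. The snippet below is a minimal, hypothetical PyTorch sketch, not the authors' implementation: it assumes "hard" samples are those with above-median per-sample loss, and that the remaining easy samples are retained with a small dropout-like probability. All names (`progressive_data_dropout`, `keep_prob`) are illustrative; see the linked repository for the actual method.

```python
# Hypothetical sketch of a PDD-style training loop. The median-loss threshold
# and keep_prob retention rule are illustrative assumptions, not the paper's
# exact scheduling mechanism.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, Subset, TensorDataset


def progressive_data_dropout(model, dataset, epochs=30, keep_prob=0.1,
                             batch_size=64, lr=1e-3, device="cpu"):
    opt = torch.optim.Adam(model.parameters(), lr=lr)  # standard optimizer, unchanged
    criterion = nn.CrossEntropyLoss(reduction="none")  # per-sample losses for mining
    active = torch.arange(len(dataset))                # indices still being trained on

    for epoch in range(epochs):
        if len(active) == 0:                           # nothing left to train on
            break
        # shuffle=False keeps batch order aligned with `active` for mining below.
        loader = DataLoader(Subset(dataset, active.tolist()),
                            batch_size=batch_size, shuffle=False)
        losses = []
        model.train()
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            per_sample = criterion(model(x), y)        # loss of each individual sample
            per_sample.mean().backward()
            opt.step()
            opt.zero_grad()
            losses.append(per_sample.detach().cpu())
        losses = torch.cat(losses)

        # Hard-sample mining: keep above-median-loss samples for the next epoch.
        hard = losses > losses.median()
        # Dropout-inspired retention: keep each easy sample with prob keep_prob.
        lucky = torch.rand(len(losses)) < keep_prob
        active = active[hard | lucky]                  # progressively shrinking subset
        print(f"epoch {epoch}: {len(active)} samples remain")
    return model


# Toy usage on synthetic data (stands in for a real dataset).
if __name__ == "__main__":
    X = torch.randn(1024, 20)
    y = (X.sum(dim=1) > 0).long()
    model = nn.Sequential(nn.Linear(20, 32), nn.ReLU(), nn.Linear(32, 2))
    progressive_data_dropout(model, TensorDataset(X, y))
```

Because roughly half the active set is dropped each epoch (plus a small stochastically retained remainder), the effective sample count shrinks geometrically, which is how a schedule like this can cut the total effective epochs so sharply.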
📝 Abstract
The success of the machine learning field has consistently depended on training with large datasets. While effective, this trend comes at an extraordinary cost, driven by two deeply intertwined factors: the size of models and the size of datasets. While promising research efforts focus on reducing the size of models, the other half of the equation remains comparatively unexplored. Indeed, it is surprising that the standard approach to training remains to iterate repeatedly over the training dataset with uniform sampling. In this paper we explore a series of alternative training paradigms that leverage insights from hard-data mining and dropout, and that are simple enough to implement and use to become the new training standard. The proposed Progressive Data Dropout reduces the number of effective epochs to as little as 12.4% of the baseline. These savings come at no cost in accuracy; surprisingly, the proposed method improves accuracy by up to 4.82%. Our approach requires no changes to model architecture or optimizer, and can be applied across standard training pipelines, making it an excellent candidate for wide adoption. Code can be found here: https://github.com/bazyagami/LearningWithRevision
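As a quick consistency check on these figures: running only 12.4% of the baseline epochs implies a speedup of 1/0.124 ≈ 8.06×, which matches the roughly 8× acceleration quoted in the summary above.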