GRAFT: Gradient-Aware Fast MaxVol Technique for Dynamic Data Sampling

📅 2025-08-19
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the high computational cost and carbon emissions associated with training neural networks on large-scale datasets, this paper proposes a gradient-aware dynamic low-rank data sampling method. The approach first computes low-rank feature representations for each mini-batch, then applies the Fast MaxVol algorithm to efficiently select a volume-maximizing representative subset. Crucially, the subset size is dynamically adjusted based on gradient magnitude approximations, enabling adaptive coverage of the dominant feature subspace. This work represents the first integration of Fast MaxVol sampling with gradient-aware adaptation for data selection. Extensive experiments across multiple benchmark datasets demonstrate that the method maintains or even improves model accuracy while significantly reducing training time (by 32% on average), energy consumption (by 28%), and CO₂ emissions (by 30%). The proposed framework thus achieves a favorable trade-off among predictive accuracy, computational efficiency, and environmental sustainability.

📝 Abstract
Training modern neural networks on large datasets is computationally and environmentally costly. We introduce GRAFT, a scalable in-training subset selection method that (i) extracts a low-rank feature representation for each batch, (ii) applies a Fast MaxVol sampler to select a small, diverse subset that spans the batch's dominant subspace, and (iii) dynamically adjusts the subset size using a gradient-approximation criterion. By operating in low-rank subspaces and training on carefully chosen examples instead of full batches, GRAFT preserves the training trajectory while reducing wall-clock time, energy consumption, and $\mathrm{CO}_2$ emissions. Across multiple benchmarks, GRAFT matches or exceeds recent selection baselines in both accuracy and efficiency, providing a favorable trade-off among accuracy, efficiency, and emissions.
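The paper's Fast MaxVol sampler is not detailed on this page; as background, the classic MaxVol iteration it builds on selects the r rows of a tall feature matrix whose submatrix has (locally) maximal volume. A minimal NumPy sketch, with an illustrative pivoted initialisation (the function name, tolerance, and initialisation scheme are assumptions, not the paper's implementation):

```python
import numpy as np

def maxvol(A, tol=1.05, max_iters=200):
    """Pick r rows of a tall matrix A (n x r) whose r x r submatrix has
    locally maximal volume |det| -- the classic MaxVol iteration."""
    A = np.asarray(A, dtype=float)
    n, r = A.shape
    # Initialise with Gaussian-elimination row pivots for a nonsingular start.
    R = A.copy()
    idx = np.empty(r, dtype=int)
    for j in range(r):
        p = int(np.argmax(np.abs(R[:, j])))
        idx[j] = p
        R -= np.outer(R[:, j] / R[p, j], R[p, :])  # zero out column j and row p
    # Swap rows while some coefficient of A in the basis A[idx] exceeds tol.
    for _ in range(max_iters):
        B = np.linalg.solve(A[idx].T, A.T).T  # B = A @ inv(A[idx]); B[idx] = I
        i, j = np.unravel_index(np.argmax(np.abs(B)), B.shape)
        if abs(B[i, j]) <= tol:
            break
        idx[j] = i  # each swap grows |det(A[idx])| by a factor > tol
    return np.sort(idx)
```

Each swap multiplies the submatrix determinant by more than `tol`, so the loop terminates; at convergence every row of A has coefficients bounded by `tol` in the selected basis, which is the diversity/coverage property the sampler exploits.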
Problem

Research questions and friction points this paper is trying to address.

Reducing computational and environmental costs of neural network training
Selecting diverse data subsets dynamically during training
Balancing accuracy, efficiency, and emissions in model training
Innovation

Methods, ideas, or system contributions that make the work stand out.

Low-rank feature representation extraction
Fast MaxVol sampler for diverse subset selection
Dynamic subset size adjustment via gradients
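The exact gradient-approximation criterion is not given on this page. One plausible reading of "dynamic subset size adjustment via gradients" is to grow the subset when the current batch's gradient norm exceeds its running average and shrink it otherwise; the sketch below is a hypothetical rule under that assumption (all names, bounds, and the EMA update are illustrative):

```python
import numpy as np

def adapt_subset_size(grad_norm, state, k_base=128, k_min=32, k_max=256, beta=0.9):
    """Hypothetical dynamic-size rule (not the paper's): scale the subset size
    by the ratio of the current gradient norm to its exponential moving average."""
    ema = state.get("ema")
    ema = grad_norm if ema is None else beta * ema + (1.0 - beta) * grad_norm
    state["ema"] = ema                 # persist the moving average across steps
    ratio = grad_norm / (ema + 1e-12)  # > 1 when gradients are unusually large
    return int(np.clip(round(k_base * ratio), k_min, k_max))
```

A rule of this shape spends more samples when gradient estimates are noisy or large (early training, hard regions) and fewer once training stabilises, which matches the adaptive-coverage behaviour the summary describes.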
Ashish Jha
Skolkovo Institute of Science and Technology
Anh-Huy Phan
Skolkovo Institute of Science and Technology
Razan Dibo
Skolkovo Institute of Science and Technology
Valentin Leplat
Innopolis University
Matrix/Tensor factorization · Numerical linear algebra · Convex and Non-Convex Numerical Optimization