GRAFT: Gradient-Aware Fast MaxVol Technique for Dynamic Data Sampling

📅 2025-08-19
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the high computational cost and carbon emissions associated with training neural networks on large-scale datasets, this paper proposes a gradient-aware dynamic low-rank data sampling method. The approach first computes low-rank feature representations for each mini-batch, then applies the Fast MaxVol algorithm to efficiently select a volume-maximizing representative subset. Crucially, the subset size is dynamically adjusted based on gradient magnitude approximations, enabling adaptive coverage of the dominant feature subspace. This work represents the first integration of Fast MaxVol sampling with gradient-aware adaptation for data selection. Extensive experiments across multiple benchmark datasets demonstrate that the method maintains or even improves model accuracy while significantly reducing training time (by 32% on average), energy consumption (by 28%), and CO₂ emissions (by 30%). The proposed framework thus achieves a favorable trade-off among predictive accuracy, computational efficiency, and environmental sustainability.

📝 Abstract
Training modern neural networks on large datasets is computationally and environmentally costly. We introduce GRAFT, a scalable in-training subset selection method that (i) extracts a low-rank feature representation for each batch, (ii) applies a Fast MaxVol sampler to select a small, diverse subset that spans the batch's dominant subspace, and (iii) dynamically adjusts the subset size using a gradient-approximation criterion. By operating in low-rank subspaces and training on carefully chosen examples instead of full batches, GRAFT preserves the training trajectory while reducing wall-clock time, energy consumption, and $\mathrm{CO}_2$ emissions. Across multiple benchmarks, GRAFT matches or exceeds recent selection baselines in both accuracy and efficiency, providing a favorable trade-off among accuracy, efficiency, and emissions.
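The paper's Fast MaxVol sampler is not detailed on this page; as background, the classic MaxVol iteration it builds on selects the r rows of a tall feature matrix whose submatrix has (locally) maximal volume. A minimal NumPy sketch, with an illustrative pivoted initialisation (the function name, tolerance, and initialisation scheme are assumptions, not the paper's implementation):

```python
import numpy as np

def maxvol(A, tol=1.05, max_iters=200):
    """Pick r rows of a tall matrix A (n x r) whose r x r submatrix has
    locally maximal volume |det| -- the classic MaxVol iteration."""
    A = np.asarray(A, dtype=float)
    n, r = A.shape
    # Initialise with Gaussian-elimination row pivots for a nonsingular start.
    R = A.copy()
    idx = np.empty(r, dtype=int)
    for j in range(r):
        p = int(np.argmax(np.abs(R[:, j])))
        idx[j] = p
        R -= np.outer(R[:, j] / R[p, j], R[p, :])  # zero out column j and row p
    # Swap rows while some coefficient of A in the basis A[idx] exceeds tol.
    for _ in range(max_iters):
        B = np.linalg.solve(A[idx].T, A.T).T  # B = A @ inv(A[idx]); B[idx] = I
        i, j = np.unravel_index(np.argmax(np.abs(B)), B.shape)
        if abs(B[i, j]) <= tol:
            break
        idx[j] = i  # each swap grows |det(A[idx])| by a factor > tol
    return np.sort(idx)
```

Each swap multiplies the submatrix determinant by more than `tol`, so the loop terminates; at convergence every row of A has coefficients bounded by `tol` in the selected basis, which is the diversity/coverage property the sampler exploits.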
Problem

Research questions and friction points this paper is trying to address.

Reducing computational and environmental costs of neural network training
Selecting diverse data subsets dynamically during training
Balancing accuracy, efficiency, and emissions in model training
Innovation

Methods, ideas, or system contributions that make the work stand out.

Low-rank feature representation extraction
Fast MaxVol sampler for diverse subset selection
Dynamic subset size adjustment via gradients
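The exact gradient-approximation criterion is not given on this page. One plausible reading of "dynamic subset size adjustment via gradients" is to grow the subset when the current batch's gradient norm exceeds its running average and shrink it otherwise; the sketch below is a hypothetical rule under that assumption (all names, bounds, and the EMA update are illustrative):

```python
import numpy as np

def adapt_subset_size(grad_norm, state, k_base=128, k_min=32, k_max=256, beta=0.9):
    """Hypothetical dynamic-size rule (not the paper's): scale the subset size
    by the ratio of the current gradient norm to its exponential moving average."""
    ema = state.get("ema")
    ema = grad_norm if ema is None else beta * ema + (1.0 - beta) * grad_norm
    state["ema"] = ema                 # persist the moving average across steps
    ratio = grad_norm / (ema + 1e-12)  # > 1 when gradients are unusually large
    return int(np.clip(round(k_base * ratio), k_min, k_max))
```

A rule of this shape spends more samples when gradient estimates are noisy or large (early training, hard regions) and fewer once training stabilises, which matches the adaptive-coverage behaviour the summary describes.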
Ashish Jha
Skolkovo Institute of Science and Technology
Anh-Huy Phan
Skolkovo Institute of Science and Technology
Razan Dibo
Skolkovo Institute of Science and Technology
Valentin Leplat
Innopolis University
Matrix/Tensor factorization · Numerical linear algebra · Convex and Non-Convex Numerical Optimization