Efficient Training of Deep Networks using Guided Spectral Data Selection: A Step Toward Learning What You Need

📅 2025-07-06
🤖 AI Summary
To address the computational inefficiency caused by data redundancy in neural network training, this paper proposes a spectral-analysis-based dynamic data selection method. The approach leverages representations extracted from a pre-trained reference model and employs the Fiedler vector of the graph Laplacian to perform spectral clustering and batch-level sample scoring, adaptively selecting the most informative subset for training. Crucially, it integrates spectral structural analysis with a pre-scheduled filtering ratio, yielding an efficient and interpretable data pruning mechanism. Experiments on benchmarks including CIFAR-10 and Oxford-IIIT Pet demonstrate up to a 4× reduction in computational cost and a significant training speedup, while achieving higher accuracy than standard training and state-of-the-art methods such as JEST. Notably, the method generalizes especially well under resource-constrained settings.
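The batch-level scoring described above can be sketched as follows. The paper does not spell out the similarity kernel, the Laplacian variant, or how Fiedler-vector entries are turned into scores, so this sketch assumes an RBF adjacency, the unnormalized Laplacian L = D − W, and absolute Fiedler-vector entries as sample scores; `fiedler_scores` and `select_batch` are hypothetical names, not the authors' API.

```python
import numpy as np

def fiedler_scores(embeddings, sigma=1.0):
    """Score a batch via the Fiedler vector of its graph Laplacian.

    embeddings: (n, d) array of reference-model features for one batch.
    Returns one non-negative score per sample (assumed: |Fiedler entry|).
    """
    # Pairwise RBF similarities form the adjacency matrix W (zero diagonal).
    d2 = ((embeddings[:, None, :] - embeddings[None, :, :]) ** 2).sum(-1)
    W = np.exp(-d2 / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)

    # Unnormalized graph Laplacian L = D - W.
    L = np.diag(W.sum(axis=1)) - W

    # Fiedler vector: eigenvector of the second-smallest eigenvalue.
    # eigh returns eigenvalues in ascending order for symmetric L.
    _, eigvecs = np.linalg.eigh(L)
    return np.abs(eigvecs[:, 1])

def select_batch(embeddings, keep_ratio=0.75):
    """Return indices of the highest-scoring samples in the batch."""
    scores = fiedler_scores(embeddings)
    k = max(1, int(round(keep_ratio * len(scores))))
    return np.argsort(scores)[::-1][:k]
```

With `keep_ratio=0.75`, a batch of 256 samples would forward only its 192 highest-scoring members, which is where the claimed compute savings come from.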

📝 Abstract
Effective data curation is essential for optimizing neural network training. In this paper, we present the Guided Spectrally Tuned Data Selection (GSTDS) algorithm, which dynamically adjusts the subset of data points used for training with the help of an off-the-shelf pre-trained reference model. Based on a pre-scheduled filtering ratio, GSTDS effectively reduces the number of data points processed per batch. The proposed method selects the most informative data points for training while avoiding redundant or less beneficial computations. Data points are preserved in each batch based on spectral analysis: a Fiedler-vector-based scoring mechanism removes the filtered portion of the batch, lightening the resource requirements of learning. The proposed data selection approach not only streamlines the training process but also promotes improved generalization and accuracy. Extensive experiments on standard image classification benchmarks, including CIFAR-10, Oxford-IIIT Pet, and Oxford-Flowers, demonstrate that GSTDS outperforms standard training and JEST, a recent state-of-the-art data curation method, on several key factors. GSTDS achieves notable reductions in computational requirements, up to four times, without compromising performance. In contrast to other methods, GSTDS attains considerably higher accuracy under limited computational budgets. These promising results underscore the potential of spectral-based data selection as a scalable solution for resource-efficient deep learning and motivate further exploration of adaptive data curation strategies. You can find the code at https://github.com/rezasharifi82/GSTDS.
Problem

Research questions and friction points this paper is trying to address.

Dynamic data selection to optimize neural network training efficiency
Reducing computational costs without sacrificing model performance
Improving generalization and accuracy through spectral-based data curation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Dynamic data selection guided by a pre-trained reference model
Spectral analysis to preserve the most informative data points
Fiedler-vector-based scoring to reduce computational requirements
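The abstract mentions that filtering follows a pre-scheduled ratio, but the schedule's shape is not given in this summary. A minimal sketch, assuming a simple linear ramp from no filtering to a maximum drop fraction (the name `filtering_ratio` and the endpoint values are illustrative assumptions):

```python
def filtering_ratio(step, total_steps, r_start=0.0, r_end=0.75):
    """Hypothetical linear schedule for the fraction of each batch to drop.

    step:        current training step.
    total_steps: total planned training steps.
    r_start:     drop fraction at the start of training.
    r_end:       drop fraction at the end of training.
    """
    # Clamp progress to [0, 1] so out-of-range steps stay well-defined.
    t = min(max(step / max(total_steps, 1), 0.0), 1.0)
    return r_start + t * (r_end - r_start)
```

Ramping the ratio up lets early training see most of the data while later epochs, where many samples are already well-fit, process only the subset the spectral scores mark as informative.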
Mohammadreza Sharifi
Department of Computer Engineering, Ferdowsi University of Mashhad, Azadi Square, Mashhad, 9177948974, Khorasan-Razavi, Iran
Ahad Harati
Ferdowsi University of Mashhad
Probabilistic Models · Robot Perception · Reinforcement Learning