🤖 AI Summary
This work addresses the coupled interference between weight redundancy and sample redundancy in neural network training, a previously uncharacterized challenge. It first systematically uncovers the synergistic mechanism between weight pruning and coreset selection, along with an associated failure mode termed critical double-loss. To mitigate this, the authors propose SWaST, an alternating optimization framework that integrates dynamic importance scoring, state preservation, and joint tailoring of parameters and samples. Evaluated across diverse architectures (ResNet, ViT) and benchmarks (CIFAR-10/100, ImageNet subsets), the method achieves accuracy gains of up to 17.83%, reduces FLOPs by 10% to 90%, and improves training efficiency and generalization stability. The core contribution is twofold: (i) a systematic characterization of the interplay between weight and sample reduction, and (ii) a stable joint optimization framework for simultaneous parameter and data compression.
📝 Abstract
Modern deep neural networks rely heavily on massive model weights and training samples, incurring substantial computational costs. Weight pruning and coreset selection are two emerging paradigms proposed to improve computational efficiency. In this paper, we first explore the interplay between redundant weights and training samples through a transparent analysis: redundant samples, particularly noisy ones, cause model weights to become unnecessarily overtuned to fit them, complicating the identification of irrelevant weights during pruning; conversely, irrelevant weights tend to overfit noisy data, undermining the effectiveness of coreset selection. To further investigate and harness this interplay in deep learning, we develop a Simultaneous Weight and Sample Tailoring mechanism (SWaST) that alternately performs weight pruning and coreset selection to establish a synergistic effect in training. During this investigation, we observe that simultaneously removing a large number of weights and samples can trigger a phenomenon we term critical double-loss, in which important weights and the samples that support them are mistakenly eliminated at the same time, leading to model instability and degradation that subsequent training can scarcely recover. Unlike in classical machine learning models, this issue can arise in deep learning because weight pruning and coreset selection lack theoretical guarantees of correctness, which explains why these paradigms have typically been developed independently. We mitigate this by integrating a state preservation mechanism into SWaST, enabling stable joint optimization. Extensive experiments reveal a strong synergy between pruning and coreset selection across varying prune rates and coreset sizes, delivering accuracy boosts of up to 17.83% alongside 10% to 90% FLOPs reductions.
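To make the alternating scheme concrete, the following is a minimal toy sketch of the idea on a linear model: magnitude-based weight pruning and residual-based coreset selection are applied in alternation, and a saved snapshot is restored whenever a joint removal sharply worsens the loss (a stand-in for state preservation). All specifics here (the scoring rules, `prune_rate`, `coreset_frac`, the rollback tolerance) are illustrative assumptions, not the paper's actual algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic regression data: y = X @ w_true + noise
n, d = 200, 20
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.1 * rng.normal(size=n)

w = np.zeros(d)                    # model weights
mask = np.ones(d, dtype=bool)      # pruning mask (True = kept weight)
coreset = np.arange(n)             # indices of kept training samples
prune_rate, coreset_frac, lr = 0.5, 0.5, 0.01

def loss(w, idx):
    r = X[idx] @ w - y[idx]
    return float(np.mean(r ** 2))

# State preservation: snapshot of (weights, mask, full-data loss)
snapshot = (w.copy(), mask.copy(), loss(w, np.arange(n)))

for step in range(300):
    # SGD step on the current coreset, with pruned weights zeroed out
    batch = rng.choice(coreset, size=32)
    grad = 2 * X[batch].T @ (X[batch] @ (w * mask) - y[batch]) / len(batch)
    w -= lr * grad * mask

    if step % 50 == 49:
        # (1) Weight pruning: keep the largest-magnitude weights
        k = int(d * (1 - prune_rate))
        keep = np.argsort(-np.abs(w))[:k]
        mask = np.zeros(d, dtype=bool)
        mask[keep] = True
        # (2) Coreset selection: keep the lowest-residual samples
        #     (one simple dynamic scoring rule among many possible)
        resid = np.abs(X @ (w * mask) - y)
        m = int(n * coreset_frac)
        coreset = np.argsort(resid)[:m]
        # (3) Rollback if the joint removal hurt the model too much,
        #     otherwise refresh the snapshot with the new state
        full = loss(w * mask, np.arange(n))
        if full > 1.5 * snapshot[2]:
            w, mask = snapshot[0].copy(), snapshot[1].copy()
        else:
            snapshot = (w.copy(), mask.copy(), full)

print(int(mask.sum()), len(coreset))
```

The rollback check is the simplest possible stand-in for state preservation; the point of the sketch is only the alternation structure, in which each pruning decision is made on the current coreset and each coreset decision is made with the current mask applied.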