Explore and Establish Synergistic Effects Between Weight Pruning and Coreset Selection in Neural Network Training

📅 2025-11-13
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the coupled interference between weight redundancy and sample redundancy in neural network training, a previously uncharacterized challenge. We first systematically uncover the synergistic mechanism, and the associated "critical double-loss" phenomenon, between weight pruning and coreset selection. To mitigate this, we propose an alternating optimization framework that integrates dynamic importance scoring, state preservation, and joint pruning of parameters and samples. Evaluated across diverse architectures (ResNet, ViT) and benchmarks (CIFAR-10/100, ImageNet subsets), our method delivers accuracy gains of up to 17.83%, reduces floating-point operations (FLOPs) by 10–90%, and improves training efficiency and generalization stability. Our core contribution is twofold: (i) a systematic characterization of the interplay between weight and sample reduction, and (ii) a state-preserving joint optimization framework for simultaneous parameter and data compression.
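The "dynamic importance scoring" mentioned above can be illustrated with a minimal sketch. The two scoring functions below are hypothetical stand-ins (a first-order saliency proxy for weights and a moving-average loss for samples), not the paper's exact formulas:

```python
# Minimal sketch of dynamic importance scoring (hypothetical formulas,
# not the paper's exact method).

def weight_importance(weights, grads):
    """First-order saliency proxy: |w * dL/dw|.
    Weights whose removal barely changes the loss score low."""
    return [abs(w * g) for w, g in zip(weights, grads)]

def sample_importance(losses, ema_losses, alpha=0.9):
    """Smooth per-sample loss with an exponential moving average so a
    transiently hard sample is not mistaken for persistent noise."""
    return [alpha * e + (1 - alpha) * l for e, l in zip(ema_losses, losses)]

# Low weight_importance -> candidate for pruning;
# persistently high sample_importance -> candidate for exclusion from the coreset.
```

Scores of this kind are recomputed each round, which is what makes the scoring "dynamic": a weight or sample deemed unimportant early can regain importance after the other side has been tailored.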

📝 Abstract
Modern deep neural networks rely heavily on massive model weights and training samples, incurring substantial computational costs. Weight pruning and coreset selection are two emerging paradigms proposed to improve computational efficiency. In this paper, we first explore the interplay between redundant weights and training samples through a transparent analysis: redundant samples, particularly noisy ones, cause model weights to become unnecessarily overtuned to fit them, complicating the identification of irrelevant weights during pruning; conversely, irrelevant weights tend to overfit noisy data, undermining coreset selection effectiveness. To further investigate and harness this interplay in deep learning, we develop a Simultaneous Weight and Sample Tailoring mechanism (SWaST) that alternately performs weight pruning and coreset selection to establish a synergistic effect in training. During this investigation, we observe that when simultaneously removing a large number of weights and samples, a phenomenon we term critical double-loss can occur, where important weights and their supportive samples are mistakenly eliminated at the same time, leading to model instability and nearly irreversible degradation that cannot be recovered in subsequent training. Unlike classic machine learning models, this issue can arise in deep learning due to the lack of theoretical guarantees on the correctness of weight pruning and coreset selection, which explains why these paradigms are often developed independently. We mitigate this by integrating a state preservation mechanism into SWaST, enabling stable joint optimization. Extensive experiments reveal a strong synergy between pruning and coreset selection across varying prune rates and coreset sizes, delivering accuracy boosts of up to 17.83% alongside 10% to 90% FLOPs reductions.
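The alternating scheme described in the abstract can be sketched in pure Python. The magnitude and loss scores, the function names, and the 50% restore threshold below are illustrative assumptions, not SWaST's actual criteria:

```python
# Toy sketch of alternating weight pruning and coreset selection with a
# simple state-preservation guard against "critical double-loss".
# Scores and thresholds are illustrative assumptions, not the paper's.

def prune_weights(weights, keep_ratio):
    """Zero out all but the largest-magnitude fraction of weights."""
    k = max(1, int(len(weights) * keep_ratio))
    threshold = sorted((abs(w) for w in weights), reverse=True)[k - 1]
    return [w if abs(w) >= threshold else 0.0 for w in weights]

def select_coreset(losses, keep_ratio):
    """Return indices of the lowest-loss samples (drop likely-noisy ones)."""
    k = max(1, int(len(losses) * keep_ratio))
    order = sorted(range(len(losses)), key=lambda i: losses[i])
    return sorted(order[:k])

def alternating_tailoring(weights, sample_losses, prune_rate, coreset_rate, rounds=3):
    """Alternate pruning and selection; restore a checkpoint if a round
    removes too much weight mass, a crude stand-in for state preservation."""
    coreset = list(range(len(sample_losses)))
    for _ in range(rounds):
        snapshot = list(weights)                 # checkpoint before pruning
        weights = prune_weights(weights, 1.0 - prune_rate)
        if sum(map(abs, weights)) < 0.5 * sum(map(abs, snapshot)):
            weights = snapshot                   # roll back: round was too destructive
            continue                             # and skip sample removal this round
        kept = select_coreset([sample_losses[i] for i in coreset], coreset_rate)
        coreset = [coreset[i] for i in kept]
    return weights, coreset
```

For example, with weights `[3.0, -2.0, 0.1, 0.05]` and per-sample losses `[0.1, 5.0, 0.2, 0.3]`, one round at a 50% prune rate and 50% coreset rate keeps the two large-magnitude weights and the two lowest-loss samples. The key design point mirrored here is that the two removals never compound within a round: if pruning trips the guard, sample removal is skipped until the state is restored.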
Problem

Research questions and friction points this paper is trying to address.

Investigating interactions between weight pruning and coreset selection in neural networks
Addressing critical double-loss from simultaneous weight and sample removal
Developing joint optimization method to enhance computational efficiency and accuracy
Innovation

Methods, ideas, or system contributions that make the work stand out.

Simultaneous weight pruning and coreset selection mechanism
State preservation for stable joint optimization
Synergistic training achieving accuracy and efficiency gains
Weilin Wan
College of Computer Science and Artificial Intelligence, Fudan University
Fan Yi
Princeton University
video conferencing · network measurements · transport protocols · 5G
Weizhong Zhang
Fudan University
Machine Learning · Deep Learning · Optimization
Quan Zhou
College of Computer Science and Artificial Intelligence, Fudan University
Cheng Jin
College of Computer Science and Artificial Intelligence, Fudan University; Shanghai Key Laboratory of Intelligent Information Processing