🤖 AI Summary
To address poor generalization, parameter redundancy, and weak cross-task transferability in fine-tuned models, this paper proposes Neural Parameter Search (NPS-Pruning), a learnable pruning method that operates in the low-rank subspaces spanned by task vectors. Unlike conventional structured pruning, the approach jointly optimizes masks over critical parameters within these task-vector-derived subspaces, with the aim of mitigating catastrophic forgetting, keeping the pruned model compatible with interpolation, and improving compression efficiency. By combining task vector modeling, subspace projection, and differentiable mask search, the method couples lightweight pruning with cross-domain knowledge transfer. Experiments on vision, NLP, and multimodal benchmarks show that it retains near-original accuracy at high compression ratios (>50%) and outperforms state-of-the-art pruning and model merging techniques. The implementation is publicly available.
📝 Abstract
Foundation models and their checkpoints have significantly advanced deep learning, boosting performance across a wide range of applications. However, fine-tuned models often struggle outside their specific domains and exhibit considerable parameter redundancy. Recent studies suggest that combining a pruned fine-tuned model with the original pre-trained model can mitigate forgetting, reduce interference when merging model parameters across tasks, and improve compression efficiency. An effective pruning strategy for fine-tuned models is therefore crucial. Building on the task vector mechanism, we preprocess each fine-tuned model by computing the difference between its parameters and those of the original pre-trained model. Recognizing that different task vector subspaces contribute unequally to model performance, we introduce a novel method called Neural Parameter Search (NPS-Pruning) for slimming down fine-tuned models. The method improves pruning efficiency by searching over the neural parameters of task vectors within low-rank subspaces. It has three key applications: enhancing knowledge transfer through pairwise model interpolation, facilitating effective knowledge fusion via model merging, and enabling the deployment of compressed models that retain near-original performance while significantly reducing storage costs. Extensive experiments across vision, NLP, and multimodal benchmarks demonstrate the effectiveness and robustness of our approach and show substantial performance gains. The code is publicly available at: https://github.com/duguodong7/NPS-Pruning.
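The task vector preprocessing described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `Weights` format, the function names, and the toy values are hypothetical, and plain magnitude pruning is used as a stand-in for the low-rank subspace search that NPS-Pruning performs. The sketch only shows the surrounding pipeline, namely computing a task vector as the fine-tuned parameters minus the pre-trained ones, sparsifying it, and adding it back to the pre-trained model.

```python
from typing import Dict, List

# Hypothetical format: parameter name -> flattened list of values.
Weights = Dict[str, List[float]]


def task_vector(finetuned: Weights, pretrained: Weights) -> Weights:
    """Element-wise difference: fine-tuned minus pre-trained parameters."""
    return {
        name: [f - p for f, p in zip(finetuned[name], pretrained[name])]
        for name in pretrained
    }


def magnitude_prune(tau: Weights, keep_ratio: float) -> Weights:
    """Keep the largest-magnitude entries of each tensor and zero the rest.

    Stand-in criterion only (an assumption for illustration); the paper
    searches over parameters in low-rank subspaces instead.
    """
    pruned = {}
    for name, values in tau.items():
        k = max(1, round(keep_ratio * len(values)))
        # Indices of the k entries with the largest absolute value.
        order = sorted(range(len(values)), key=lambda i: abs(values[i]), reverse=True)
        keep = set(order[:k])
        pruned[name] = [v if i in keep else 0.0 for i, v in enumerate(values)]
    return pruned


def apply_task_vector(pretrained: Weights, tau: Weights, scale: float = 1.0) -> Weights:
    """Rebuild a model as pre-trained parameters plus a scaled task vector."""
    return {
        name: [p + scale * t for p, t in zip(pretrained[name], tau[name])]
        for name in pretrained
    }


# Toy values chosen to be exactly representable in binary floating point.
pretrained = {"layer.weight": [0.5, -1.0, 0.25, 2.0]}
finetuned = {"layer.weight": [0.5, -0.5, 0.375, 0.0]}

tau = task_vector(finetuned, pretrained)            # [0.0, 0.5, 0.125, -2.0]
sparse_tau = magnitude_prune(tau, keep_ratio=0.5)   # [0.0, 0.5, 0.0, -2.0]
restored = apply_task_vector(pretrained, sparse_tau)
print(restored)  # {'layer.weight': [0.5, -0.5, 0.25, 0.0]}
```

In this toy case only the pruned task vector needs to be stored alongside the shared pre-trained weights, which is where the storage savings mentioned in the abstract would come from. Setting `scale` below 1.0 interpolates between the pre-trained model and the pruned fine-tuned model.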