A Second-Order Perspective on Pruning at Initialization and Knowledge Transfer

📅 2025-09-28

📈 Citations: 0

✨ Influential: 0

career value

197K/year

🤖 AI Summary

The high computational and memory overhead of pretrained vision models hinders their practical deployment, while conventional pruning methods rely on downstream task data—making them unsuitable for task-agnostic scenarios. Method: We systematically investigate structured pruning at initialization—prior to any task-specific fine-tuning—demonstrating that pruning can be performed solely on pretrained weights without access to any downstream data, preserving zero-shot generalization across unseen tasks. Subsequent lightweight fine-tuning fully restores accuracy on both original and retained tasks. Contribution/Results: We reveal that the smooth loss landscape induced by large-scale pretraining underpins cross-task knowledge transfer, and analyze pruning stability from a second-order optimization perspective. Experiments across diverse unseen tasks show that our approach achieves efficient model compression while maintaining strong zero-shot generalization—establishing a novel, task-agnostic paradigm for lightweight vision models.

Technology Category

Application Category

📝 Abstract

The widespread availability of pre-trained vision models has enabled numerous deep learning applications through their transferable representations. However, their computational and storage costs often limit practical deployment. Pruning-at-Initialization has emerged as a promising approach to compress models before training, enabling efficient task-specific adaptation. While conventional wisdom suggests that effective pruning requires task-specific data, this creates a challenge when downstream tasks are unknown in advance. In this paper, we investigate how data influences the pruning of pre-trained vision models. Surprisingly, pruning on one task retains the model's zero-shot performance also on unseen tasks. Furthermore, fine-tuning these pruned models not only improves performance on original seen tasks but can recover held-out tasks' performance. We attribute this phenomenon to the favorable loss landscapes induced by extensive pre-training on large-scale datasets.

Problem

Research questions and friction points this paper is trying to address.

Pruning pre-trained models without task-specific data

Maintaining zero-shot performance on unseen tasks

Recovering performance through fine-tuning pruned models

Innovation

Methods, ideas, or system contributions that make the work stand out.

Pruning-at-initialization compresses models before training

Pruning on one task retains zero-shot performance

Fine-tuning pruned models recovers held-out task performance

🔎 Similar Papers

Effective Subset Selection Through The Lens of Neural Network Pruning