Pruning at Initialization – A Sketching Perspective

📅 2023-05-27
🏛️ IEEE Transactions on Pattern Analysis and Machine Intelligence
📈 Citations: 1
Influential: 0
🤖 AI Summary
This work investigates sparse pruning at neural network initialization, focusing on designing data-agnostic initial pruning masks for linear models and characterizing the approximation error after training. The core method models initialization-time pruning as a matrix sketching problem, establishing for the first time a theoretical equivalence between the two. Based on this insight, the authors propose a general-purpose pruning algorithm and derive a rigorous upper bound on its approximation error. The main contributions are threefold: (1) revealing the fundamental equivalence between initialization pruning and matrix sketching; (2) proving the existence of an optimal data-agnostic sparse mask; and (3) providing theoretical guarantees that the proposed algorithm significantly reduces post-training approximation error while preserving sparsity. Extensive experiments validate the robustness and effectiveness of the approach across diverse weight initializations and data distributions.
📝 Abstract
The lottery ticket hypothesis (LTH) has increased attention to pruning neural networks at initialization. We study this problem in the linear setting. We show that finding a sparse mask at initialization is equivalent to the sketching problem introduced for efficient matrix multiplication. This gives us tools to analyze the LTH problem and gain insights into it. Specifically, using the mask found at initialization, we bound the approximation error of the pruned linear model at the end of training. We theoretically justify previous empirical evidence that the search for sparse networks may be data independent. By using the sketching perspective, we suggest a generic improvement to existing algorithms for pruning at initialization, which we show to be beneficial in the data-independent case.
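The linear setting studied in the abstract can be made concrete with a minimal sketch: apply a data-agnostic sparse mask to an initial weight matrix and measure the pruned model's approximation error on random inputs. The per-row magnitude heuristic below is a simple illustrative stand-in, not the paper's algorithm; all names and sizes are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Dense linear model y = W @ x, with W the weight matrix at initialization.
n, d = 64, 256
W = rng.normal(size=(n, d))

# Data-agnostic mask at initialization: keep the k largest-magnitude
# entries per row of W (illustrative heuristic, not the paper's method).
k = 32
mask = np.zeros_like(W, dtype=bool)
rows = np.arange(n)[:, None]
top = np.argsort(-np.abs(W), axis=1)[:, :k]
mask[rows, top] = True
W_sparse = W * mask

# Approximation error of the pruned linear map on random inputs,
# analogous to bounding ||W x - (M ∘ W) x|| in the linear setting.
X = rng.normal(size=(d, 1000))
err = np.linalg.norm(W @ X - W_sparse @ X) / np.linalg.norm(W @ X)
print(f"relative error keeping {k}/{d} weights per row: {err:.3f}")
```

The mask here is chosen from the initial weights alone, mirroring the paper's observation that the search for sparse networks can be data independent.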
Problem

Research questions and friction points this paper is trying to address.

Analyzes neural network pruning at initialization using sketching theory
Establishes equivalence between sparse mask finding and matrix sketching
Provides theoretical justification for data-independent sparse network search
Innovation

Methods, ideas, or system contributions that make the work stand out.

Pruning neural networks at initialization
Equating mask finding to sketching problem
Improving algorithms via sketching perspective
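The "mask finding as sketching" idea can be illustrated with a classic instance of sketching for matrix multiplication: sample a subset of the inner dimension and rescale so the sketched product is an unbiased estimate of the full product. This norm-weighted column/row sampling scheme is a standard sketching construction, used here only as an illustration of the general framework the paper connects to, not as the paper's specific algorithm.

```python
import numpy as np

rng = np.random.default_rng(1)

# Full product A @ B contracts over the inner dimension d.
m, d, p = 50, 400, 60
A = rng.normal(size=(m, d))
B = rng.normal(size=(d, p))

# Sample s of the d rank-one terms with probability proportional to
# ||A[:, i]|| * ||B[i, :]|| (norm-weighted sampling).
s = 100
probs = np.linalg.norm(A, axis=0) * np.linalg.norm(B, axis=1)
probs /= probs.sum()
idx = rng.choice(d, size=s, p=probs)

# Rescale sampled terms so the sketch is unbiased: E[AS @ SB] = A @ B.
scale = 1.0 / np.sqrt(s * probs[idx])
AS = A[:, idx] * scale          # sketched A: (m, s)
SB = B[idx, :] * scale[:, None] # sketched B: (s, p)

err = np.linalg.norm(A @ B - AS @ SB) / np.linalg.norm(A @ B)
print(f"relative sketching error using {s}/{d} terms: {err:.3f}")
```

Keeping only a subset of the inner dimension is structurally the same operation as zeroing weights with a sparse mask, which is the correspondence the paper formalizes; the expected error of such sketches shrinks as the number of retained terms grows.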