STADE: Standard Deviation as a Pruning Metric

📅 2025-03-28
📈 Citations: 0
Influential: 0
🤖 AI Summary
Retraining-free pruning methods for large language models (LLMs), such as Wanda, perform unevenly: how well they work depends on the conditions under which the model was trained. Method: This paper develops a theoretical analysis of the pruning problem, identifying a common machine-learning scenario in which Wanda is the optimal pruning method and extending the analysis to cases where it no longer is. These insights yield STADE, a retraining-free pruning method whose criterion is driven by the standard deviation of the layer input, giving it better theoretical generality across scenarios. Results: Extensive experiments on Llama and OPT model families match the theoretical predictions, showing that Wanda's optimality varies with training conditions as the framework predicts. The implementation is publicly available.
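For orientation, Wanda scores a weight by the product of its magnitude and the L2 norm of the corresponding input feature over a small calibration set. The abstract describes STADE's criterion only as being "based on the standard deviation of the input," so the second formula below is an assumed illustrative form, not the paper's exact metric:

```latex
% Wanda's per-weight score: weight magnitude times input-feature norm
S^{\mathrm{Wanda}}_{ij} = |W_{ij}| \cdot \lVert X_j \rVert_2
% Assumed STADE-style score for illustration: the norm replaced by the
% feature's standard deviation, per the abstract's description
S^{\mathrm{STADE}}_{ij} = |W_{ij}| \cdot \sigma(X_j)
```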

📝 Abstract
Recently, Large Language Models (LLMs) have become very widespread and are used to solve a wide variety of tasks. To successfully handle these tasks, LLMs require longer training times and larger model sizes. This makes LLMs ideal candidates for pruning methods that reduce computational demands while maintaining performance. Previous methods require a retraining phase after pruning to maintain the original model's performance. However, state-of-the-art pruning methods, such as Wanda, prune the model without retraining, making the pruning process faster and more efficient. Building upon Wanda's work, this study provides a theoretical explanation of why the method is effective and leverages these insights to enhance the pruning process. Specifically, a theoretical analysis of the pruning problem reveals a common scenario in Machine Learning where Wanda is the optimal pruning method. Furthermore, this analysis is extended to cases where Wanda is no longer optimal, leading to the development of a new method, STADE, based on the standard deviation of the input. From a theoretical standpoint, STADE demonstrates better generality across different scenarios. Finally, extensive experiments on Llama and Open Pre-trained Transformers (OPT) models validate these theoretical findings, showing that depending on the training conditions, Wanda's optimal performance varies as predicted by the theoretical framework. These insights contribute to a more robust understanding of pruning strategies and their practical implications. Code is available at: https://github.com/Coello-dev/STADE/
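A minimal sketch of how such pruning scores can be computed and applied, assuming a PyTorch-style linear layer and random tensors standing in for calibration activations. The `stade_like_scores` form is an assumption based on the abstract, not the paper's exact metric; see the linked repository for the reference implementation.

```python
# Sketch of Wanda-style vs. a standard-deviation-based (STADE-style) pruning
# score for one linear layer. The exact STADE formula is defined in the paper;
# substituting the per-feature standard deviation for the L2 norm is an
# assumption made here for illustration.
import torch

def wanda_scores(W: torch.Tensor, X: torch.Tensor) -> torch.Tensor:
    """W: (out_features, in_features) weights; X: (tokens, in_features) calibration inputs.
    Wanda scores each weight as |W_ij| * ||X_j||_2, using the L2 norm of input feature j."""
    feat_norm = X.norm(p=2, dim=0)          # (in_features,)
    return W.abs() * feat_norm              # broadcasts over output rows

def stade_like_scores(W: torch.Tensor, X: torch.Tensor) -> torch.Tensor:
    """Hypothetical STADE-style variant: score by the standard deviation of each
    input feature instead of its norm, per the abstract's description."""
    feat_std = X.std(dim=0)                 # (in_features,)
    return W.abs() * feat_std

def prune_rows(W: torch.Tensor, scores: torch.Tensor, sparsity: float = 0.5) -> torch.Tensor:
    """Zero out the lowest-scoring fraction of weights within each output row
    (Wanda's per-output comparison group)."""
    k = int(W.shape[1] * sparsity)
    idx = scores.argsort(dim=1)[:, :k]      # indices of the k smallest scores per row
    mask = torch.ones_like(W, dtype=torch.bool)
    mask.scatter_(1, idx, False)
    return W * mask

# Toy usage with random data standing in for calibration activations.
torch.manual_seed(0)
W = torch.randn(8, 16)
X = torch.randn(128, 16)
W_pruned = prune_rows(W, stade_like_scores(W, X), sparsity=0.5)
```

The per-row comparison group mirrors Wanda's per-output pruning; the paper's theoretical contribution concerns which input statistic makes such a score optimal under a given training setup.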
Problem

Research questions and friction points this paper is trying to address.

Optimizing pruning methods for LLMs without retraining
Theoretical analysis of Wanda's effectiveness in pruning
Developing STADE for better pruning generality across scenarios
Innovation

Methods, ideas, or system contributions that make the work stand out.

STADE prunes using a metric based on the standard deviation of the layer input
Theoretical analysis explains why Wanda works and identifies when it is optimal
Findings validated empirically on Llama and OPT model families
Diego Coello de Portugal Mecke
ISMLL & VWFS DARC, University of Hildesheim, Hildesheim, Lower Saxony, Germany
Haya Alyoussef
University of Hildesheim, Hildesheim, Lower Saxony, Germany
Ilia Koloiarov
ISMLL & VWFS DARC, University of Hildesheim, Hildesheim, Lower Saxony, Germany
Maximilian Stubbemann
Information Systems and Machine Learning Lab, University of Hildesheim, Germany
Lars Schmidt-Thieme
University of Hildesheim, Germany
machine learning