🤖 AI Summary
Post-training pruning is a key approach for efficient large language model (LLM) compression, yet its performance depends critically on calibration data, especially at high sparsity, and existing work lacks a systematic analysis of this dependency. This paper shows that the choice of calibration data can matter more for pruning performance than the design of the pruning algorithm itself. Since the pre-training data of advanced LLMs is usually inaccessible, the authors propose a self-generating calibration data synthesis strategy that exploits the model's autoregressive capability to construct suitable calibration sets. Experiments on strong open-source foundation models (e.g., DCLM, LLaMA-3) show that the synthesized data substantially outperforms commonly used calibration datasets such as WikiText, improving pruning accuracy by up to 15.2% at high sparsity and further enhancing strong pruning methods such as Wanda and OWL.
📝 Abstract
As large language models (LLMs) are widely applied across various fields, model compression has become increasingly important for reducing costs and improving inference efficiency. Post-training pruning is a promising approach that avoids resource-intensive iterative training and requires only a small amount of calibration data to assess parameter importance. Previous research has primarily focused on designing advanced pruning methods, while the impact of different calibration data on pruning performance still lacks systematic exploration. We fill this gap and surprisingly observe that the choice of calibration data matters even more than designing advanced pruning strategies, especially at high sparsity. Our preliminary exploration also reveals that calibration data similar to the training data yields better performance. Since pre-training data is usually inaccessible for advanced LLMs, we further propose a self-generating calibration data synthesis strategy to construct feasible calibration data. We conduct experiments on recent strong open-source LLMs (e.g., DCLM and LLaMA-3), and the results show that the proposed method outperforms commonly used calibration data and effectively enhances strong pruning methods (e.g., Wanda, OWL).
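To make the role of calibration data concrete, here is a minimal NumPy sketch of a Wanda-style importance score, where each weight's importance is |W_ij| · ||X_j||_2 and X is the activation matrix collected by running the calibration set through the model. The function name and shapes are illustrative assumptions, not the paper's implementation; the point is that the pruning decision depends directly on which calibration inputs produced X.

```python
import numpy as np

def wanda_prune_layer(W, X, sparsity=0.5):
    """Prune a linear layer with Wanda-style scores (illustrative sketch).

    W: weight matrix, shape (out_features, in_features)
    X: calibration activations, shape (num_tokens, in_features)
    sparsity: fraction of weights to zero out in each output row
    """
    # Per-input-feature activation norm, estimated from calibration data.
    x_norm = np.linalg.norm(X, axis=0)          # shape (in_features,)
    # Importance of each weight: |W_ij| * ||X_j||_2 (broadcast over rows).
    scores = np.abs(W) * x_norm
    # Zero out the k lowest-scoring weights per output row.
    k = int(W.shape[1] * sparsity)
    mask = np.ones_like(W, dtype=bool)
    lowest = np.argsort(scores, axis=1)[:, :k]  # indices of k smallest per row
    np.put_along_axis(mask, lowest, False, axis=1)
    return W * mask
```

Because `x_norm` is computed from the calibration activations, two different calibration sets can rank the same weights very differently, which is exactly the sensitivity the paper studies.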