Frustratingly Easy Task-aware Pruning for Large Language Models

📅 2025-10-25
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Existing LLM pruning methods primarily optimize for general language-generation capability and often neglect task-specific performance. To address this, the authors propose a task-aware pruning framework that incorporates task-specific feature distributions into parameter importance estimation. Parameters are partitioned into shared and task-exclusive groups according to activation-norm differences between general calibration data and task-specific data, and importance scores computed on each data source are then fused to guide pruning. The method is compatible with mainstream pruning techniques, including magnitude-based pruning, activation-based pruning, and loss-perturbation analysis, and requires no fine-tuning or retraining. Evaluated across diverse benchmarks (e.g., BoolQ, RTE, SST-2), it achieves significant accuracy gains (+2.1% on average) at equivalent sparsity levels while preserving general generation quality, demonstrating effectiveness, cross-task generalizability, and plug-and-play applicability.
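To make the partition-and-fuse step concrete, below is a minimal PyTorch sketch of how it could work on a single weight matrix. It assumes a Wanda-style importance score (|W_ij| · ‖x_j‖₂), which is one of the activation-based baselines the paper mentions but not necessarily its exact formulation; the threshold `tau`, fusion weight `alpha`, and all function names are hypothetical choices for illustration, not details taken from the paper.

```python
import torch

def wanda_importance(weight: torch.Tensor, act_norm: torch.Tensor) -> torch.Tensor:
    # Wanda-style score: |W_ij| * ||x_j||_2, broadcast over output rows.
    return weight.abs() * act_norm.unsqueeze(0)

def fused_task_aware_importance(
    weight: torch.Tensor,        # (out_features, in_features)
    general_norm: torch.Tensor,  # (in_features,) activation norms, general data
    task_norm: torch.Tensor,     # (in_features,) activation norms, task data
    tau: float = 0.1,            # hypothetical threshold on relative norm difference
    alpha: float = 0.5,          # hypothetical fusion weight for shared parameters
) -> torch.Tensor:
    s_general = wanda_importance(weight, general_norm)
    s_task = wanda_importance(weight, task_norm)
    # Input features whose activation norms differ strongly between the two
    # calibration sets are treated as task-exclusive; the rest are shared.
    rel_diff = (task_norm - general_norm).abs() / (general_norm + 1e-8)
    exclusive = rel_diff > tau  # (in_features,) boolean mask
    fused = torch.where(
        exclusive.unsqueeze(0),                    # broadcast to (out, in)
        s_task,                                    # exclusive: task score only
        alpha * s_general + (1 - alpha) * s_task,  # shared: convex combination
    )
    return fused
```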

📝 Abstract
Pruning offers a practical way to reduce the resources required to run large language models (LLMs), retaining their capabilities while controlling training and inference costs. Research on LLM pruning typically ranks parameters by their magnitudes and calibration-data activations and removes (or masks) the less important ones, thereby reducing model size. However, these approaches primarily preserve the LLM's ability to generate fluent text while neglecting performance on specific domains and tasks. In this paper, we propose a simple yet effective pruning approach for LLMs that preserves task-specific capabilities while shrinking their parameter space. We first analyze how conventional pruning minimizes loss perturbation under general-domain calibration and extend this formulation by incorporating task-specific feature distributions into the importance computation of existing pruning algorithms. Our framework computes separate importance scores using both general and task-specific calibration data, partitions parameters into shared and exclusive groups based on activation-norm differences, and then fuses their scores to guide the pruning process. This design enables our method to integrate seamlessly with various foundation pruning techniques and preserve the LLM's specialized abilities under compression. Experiments on widely used benchmarks demonstrate that our approach is effective and consistently outperforms the baselines at identical pruning ratios and across different settings.
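As a rough illustration of the calibration and masking steps described in the abstract, the sketch below accumulates per-input-feature activation norms over a calibration set and then zeroes out the lowest-scoring weights at a target sparsity. This is a simplified reconstruction under stated assumptions (unstructured, layer-wise pruning of a single linear layer); the helper names `collect_input_norms` and `prune_by_score` are hypothetical.

```python
import torch

@torch.no_grad()
def collect_input_norms(linear: torch.nn.Linear, batches) -> torch.Tensor:
    # Accumulate per-input-feature L2 activation norms over calibration batches.
    sq_sum = torch.zeros(linear.in_features)
    for x in batches:  # each x: (batch_size, in_features)
        sq_sum += x.pow(2).sum(dim=0)
    return sq_sum.sqrt()

@torch.no_grad()
def prune_by_score(linear: torch.nn.Linear, score: torch.Tensor, sparsity: float):
    # Zero out the lowest-scoring weights to reach the requested sparsity level.
    k = int(sparsity * score.numel())
    if k == 0:
        return
    threshold = score.flatten().kthvalue(k).values
    linear.weight.mul_((score > threshold).float())
```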
Problem

Research questions and friction points this paper is trying to address.

Preserving task-specific performance during LLM pruning
Incorporating domain-specific features into importance scoring
Maintaining specialized capabilities while reducing model size
Innovation

Methods, ideas, or system contributions that make the work stand out.

Incorporates task-specific feature distributions into pruning
Computes separate importance scores for general and task data
Fuses shared and exclusive parameter scores to guide pruning (see the usage sketch after this list)
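A hypothetical end-to-end run on one linear layer, reusing the helpers sketched above; the random tensors stand in for real general and task calibration activations, and the layer size is arbitrary.

```python
import torch

torch.manual_seed(0)
layer = torch.nn.Linear(512, 512)
general_batches = [torch.randn(8, 512) for _ in range(4)]      # placeholder data
task_batches = [torch.randn(8, 512) * 2.0 for _ in range(4)]   # placeholder data

g_norm = collect_input_norms(layer, general_batches)  # general calibration
t_norm = collect_input_norms(layer, task_batches)     # task calibration (e.g., SST-2)
score = fused_task_aware_importance(layer.weight, g_norm, t_norm)
prune_by_score(layer, score, sparsity=0.5)            # 50% unstructured sparsity
print((layer.weight == 0).float().mean())             # ~0.5 after pruning
```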
Yuanhe Tian
University of Washington
Computational Linguistics · Natural Language Processing
Junjie Liu
University of Science and Technology of China
Xican Yang
University of Science and Technology of China
Haishan Ye
Xi'an Jiaotong University
Yan Song
University of Science and Technology of China