Entropy Meets Importance: A Unified Head Importance-Entropy Score for Stable and Efficient Transformer Pruning

📅 2025-10-10
📈 Citations: 0
Influential: 0
🤖 AI Summary
Transformer inference suffers from low efficiency, and existing gradient-based Head Importance Score (HIS) pruning methods neglect attention pattern diversity, leading to unstable pruning. To address this, we propose a unified pruning criterion that jointly incorporates attention entropy and HIS: entropy quantifies the diversity of attention distributions across heads, while HIS captures the task-specific, gradient-driven contribution of each head. By integrating these complementary signals, our method enables a more comprehensive and robust assessment of head importance. This work is the first to bring an information-theoretic perspective, specifically attention entropy, to attention head pruning. Extensive experiments on multiple NLP benchmarks show that our approach improves post-pruning model quality by up to 15.2% and pruning stability by 2.04x, without sacrificing accuracy. The proposed framework establishes a new paradigm for efficient and reliable Transformer compression.
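The entropy signal described above can be made concrete: for each head, average the Shannon entropy of its attention distributions over queries. A minimal numpy sketch (the function name and shapes are illustrative, not the paper's implementation):

```python
import numpy as np

def head_attention_entropy(attn, eps=1e-12):
    """Mean Shannon entropy of each head's attention distributions.

    attn: array of shape (heads, queries, keys), where each slice along the
    last axis is a probability distribution (softmax output).
    Returns one value per head; higher entropy means more diffuse
    (diverse) attention, lower entropy means sharply peaked attention.
    """
    attn = np.clip(attn, eps, 1.0)                   # avoid log(0)
    per_query = -(attn * np.log(attn)).sum(axis=-1)  # (heads, queries)
    return per_query.mean(axis=-1)                   # (heads,)

# A sharply peaked head vs. a uniform head over 4 keys.
peaked = np.array([[[1.0, 0.0, 0.0, 0.0]]])
uniform = np.array([[[0.25, 0.25, 0.25, 0.25]]])
print(head_attention_entropy(peaked))   # near 0
print(head_attention_entropy(uniform))  # near log(4) ≈ 1.386
```

A uniform head attains the maximum entropy log(K) over K keys, which is why entropy serves as a natural measure of attention-pattern diversity.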

📝 Abstract
Transformer-based models have achieved remarkable performance in NLP tasks. However, their structural characteristics, namely multiple layers and attention heads, introduce efficiency challenges in inference and deployment. To address these challenges, various pruning methods have recently been proposed. Notably, gradient-based methods using Head Importance Scores (HIS) have gained traction for their interpretability, efficiency, and ability to identify redundant heads. However, HIS alone is limited: it captures only the gradient-driven contribution of a head, overlooking the diversity of its attention patterns. To overcome this limitation, we introduce a novel pruning criterion, HIES (Head Importance-Entropy Score), which integrates head importance scores with attention entropy, providing complementary evidence on per-head contribution. Empirically, HIES-based pruning yields up to 15.2% improvement in model quality and 2.04x improvement in stability over HIS-only methods, enabling substantial model compression without sacrificing either accuracy or stability. Code will be released upon publication.
Problem

Research questions and friction points this paper is trying to address.

Improves transformer pruning by combining importance and entropy scores
Addresses limitations of gradient-only head importance evaluation methods
Enhances model compression while maintaining accuracy and stability
Innovation

Methods, ideas, or system contributions that make the work stand out.

Combines head importance scores with attention entropy
Enables substantial model compression without sacrificing accuracy
Improves model quality and stability over previous methods
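The paper's exact fusion rule for HIES is not given here; one plausible reading of "combining head importance scores with attention entropy" is to normalize both signals and blend them before ranking heads for pruning. A minimal sketch under that assumption (`hies_scores`, `heads_to_prune`, and `alpha` are hypothetical names, not the authors' API):

```python
import numpy as np

def hies_scores(his, entropy, alpha=0.5):
    """Hypothetical HIES fusion: min-max normalize each signal, then take
    a weighted sum. alpha trades off importance vs. entropy; the paper's
    actual combination rule may differ."""
    def norm(x):
        x = np.asarray(x, dtype=float)
        rng = x.max() - x.min()
        return (x - x.min()) / rng if rng > 0 else np.zeros_like(x)
    return alpha * norm(his) + (1 - alpha) * norm(entropy)

def heads_to_prune(his, entropy, ratio=0.5, alpha=0.5):
    """Indices of the lowest-scoring heads for a given pruning ratio."""
    scores = hies_scores(his, entropy, alpha)
    k = int(len(scores) * ratio)
    return np.argsort(scores)[:k]

# Toy example: 4 heads with gradient-based importances and entropies.
his = [0.9, 0.1, 0.4, 0.8]
ent = [1.2, 0.3, 1.0, 0.2]
print(heads_to_prune(his, ent))  # the two heads weakest on both signals
```

The intent of such a blend is that a head survives pruning only if it scores well on at least one of the two complementary signals, which is consistent with the stability gains the summary attributes to combining them.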
Minsik Choi
Department of Computer Science and Engineering, Korea University, Seoul, Korea
Hyegang Son
Department of Computer Science and Engineering, Korea University, Seoul, Korea
Changhoon Kim
School of Software, Soongsil University, Seoul, Korea
Young Geun Kim
Korea University
Operating Systems, Computer Architecture, Embedded Systems, Energy/Power Management, Mobile/IoT Architecture