Pruning as a Cooperative Game: Surrogate-Assisted Layer Contribution Estimation for Large Language Models

📅 2026-02-08
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the high computational cost of deploying large language models (LLMs) by proposing a novel pruning approach grounded in cooperative game theory. Unlike existing layer-wise pruning methods that rely on static heuristics and neglect inter-layer dependencies—often leading to significant performance degradation—this study introduces a dynamic framework that treats model performance as a utility function and leverages a lightweight surrogate network to efficiently approximate Shapley values for each layer. By integrating stratified Monte Carlo mask sampling, the method explicitly models inter-layer interactions to accurately identify critical layers for retention. Experimental results demonstrate that the proposed technique substantially outperforms current pruning strategies in terms of both perplexity and zero-shot accuracy, achieving aggressive model compression while effectively preserving performance.

📝 Abstract
While large language models (LLMs) demonstrate impressive performance across various tasks, their deployment in real-world scenarios is still constrained by high computational demands. Layer-wise pruning, a commonly employed strategy to mitigate inference costs, can partially address this challenge. However, existing approaches generally depend on static heuristic rules and fail to account for the interdependencies among layers, thereby limiting the effectiveness of the pruning process. To address this, this paper proposes a game-theoretic framework that formulates layer pruning as a cooperative game in which each layer acts as a player and model performance serves as the utility. As computing exact Shapley values is computationally infeasible for LLMs, we propose a lightweight surrogate network to estimate layer-wise marginal contributions. This network can predict LLM performance for arbitrary layer combinations at a low computational cost. Additionally, we employ stratified Monte Carlo mask sampling to further reduce the cost of Shapley value estimation. This approach captures inter-layer dependencies and dynamically identifies critical layers for pruning. Extensive experiments demonstrate the consistent superiority of our method in terms of perplexity and zero-shot accuracy, achieving more efficient and effective layer-wise pruning for large language models.
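The estimation pipeline the abstract describes can be sketched compactly: treat each layer as a player, score layer-keep masks with a surrogate, and average marginal contributions over coalitions sampled per coalition size (the strata). The sketch below is a minimal illustration, not the paper's implementation; `surrogate_utility` is a hypothetical stand-in for the trained surrogate network, and the linear toy utility exists only so the script runs end to end.

```python
import random

NUM_LAYERS = 6

def surrogate_utility(mask):
    """Toy stand-in for the paper's lightweight surrogate network.

    Scores a layer-keep mask (1 = layer retained). This made-up linear
    utility weights earlier layers more, purely for illustration.
    """
    weights = [1.0, 0.9, 0.5, 0.4, 0.3, 0.2]
    return sum(w * m for w, m in zip(weights, mask))

def shapley_estimates(num_layers, samples_per_stratum=50, seed=0):
    """Stratified Monte Carlo Shapley estimation over layer masks.

    For each layer i and each coalition size k (the strata), sample
    random coalitions of the other layers and average the marginal
    gain of adding layer i to the coalition.
    """
    rng = random.Random(seed)
    phi = [0.0] * num_layers
    for i in range(num_layers):
        others = [j for j in range(num_layers) if j != i]
        strata = range(len(others) + 1)  # coalition sizes 0..n-1
        total = 0.0
        for k in strata:
            for _ in range(samples_per_stratum):
                coalition = rng.sample(others, k)
                mask = [0] * num_layers
                for j in coalition:
                    mask[j] = 1
                without_i = surrogate_utility(mask)
                mask[i] = 1
                with_i = surrogate_utility(mask)
                total += with_i - without_i
        phi[i] = total / (len(strata) * samples_per_stratum)
    return phi

phi = shapley_estimates(NUM_LAYERS)
# Prune the layers with the smallest estimated contributions first.
prune_order = sorted(range(NUM_LAYERS), key=lambda i: phi[i])
```

With this additive toy utility the marginal gain of a layer is constant, so the estimates recover the weights exactly; the interesting case is a surrogate whose predictions depend on layer interactions, where the stratified averaging is what captures inter-layer dependencies.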
Problem

Research questions and friction points this paper is trying to address.

layer pruning
large language models
inter-layer dependencies
Shapley values
model compression
Innovation

Methods, ideas, or system contributions that make the work stand out.

cooperative game theory
Shapley value estimation
surrogate network
layer-wise pruning
large language models
Xuan Ding
Shenzhen Future Network of Intelligence Institute, Guangdong Provincial Key Laboratory of Future Networks of Intelligence, The Chinese University of Hong Kong (Shenzhen)
Pengyu Tong
Beijing Normal University
Ranjie Duan
Alibaba Group
AI safety: safe AI promoting common prosperity
Yunjian Zhang
University of Chinese Academy of Sciences
Rui Sun
The Chinese University of Hong Kong, Shenzhen
Machine Learning
Yao Zhu
Zhejiang University
Robust machine learning