Sparsity Induction for Accurate Post-Training Pruning of Large Language Models

📅 2026-02-25
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work targets the computational and memory bottlenecks that large language models incur from their massive parameter counts. Existing post-training pruning methods often suffer significant performance degradation on these models because the original dense weights lack inherent sparsity. To improve pruning compatibility, the authors induce sparsity at both the weight-distribution and feature-representation levels before pruning: an absorbable equivalent scaling transformation promotes weight-distribution sparsity, and a spectral norm-based loss encourages low-rank feature sparsity. The approach adds no parameters and no inference overhead, and it consistently outperforms existing post-training pruning techniques across diverse model architectures and tasks, yielding accurate and efficient model compression.
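The "absorbable equivalent scaling" idea can be illustrated with a minimal sketch. The paper does not publish code in this summary, so the shapes, the two-layer setup, and the per-channel scale vector `s` below are illustrative assumptions; the key property shown is that a positive per-channel scaling applied to one layer and inverted in the next is mathematically equivalent (ReLU commutes with positive scales), so it reshapes the weight distribution at zero inference cost.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative two-layer block with a ReLU in between: y = W2 @ relu(W1 @ x)
W1 = rng.normal(size=(8, 4))
W2 = rng.normal(size=(3, 8))
x = rng.normal(size=(4,))

y = W2 @ np.maximum(W1 @ x, 0.0)

# Per-channel positive scales (hypothetical choice): since
# relu(s * z) = s * relu(z) for s > 0, scaling rows of W1 and
# inverse-scaling the matching columns of W2 leaves the output unchanged.
s = rng.uniform(0.1, 10.0, size=(8,))
W1_scaled = s[:, None] * W1   # scale rows of W1
W2_scaled = W2 / s[None, :]   # absorb the inverse scale into W2's columns

y_scaled = W2_scaled @ np.maximum(W1_scaled @ x, 0.0)

print(np.allclose(y, y_scaled))  # → True: the transform is fully absorbable
```

In practice the scales would be chosen to concentrate weight magnitude (e.g. to make more entries near zero) so that a subsequent magnitude-based pruner removes less useful signal; how the paper selects `s` is not specified here.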

📝 Abstract
Large language models have demonstrated strong capabilities in text generation, while their increasing parameter scales present challenges in computational and memory efficiency. Post-training sparsity (PTS), which reduces model cost by removing weights from dense networks, is an effective approach. However, natively dense matrices lack high sparsity, so existing approaches that directly remove weights disrupt model states, resulting in unsatisfactory performance recovery even with post-tuning. We propose Sparsity Induction, which promotes models toward higher sparsity at both the distribution and feature levels before pruning, to push the limits of PTS. At the distribution level, we enhance distributional sparsity through mathematically equivalent scaling transformations, which are fully absorbable and incur no extra parameters or inference-time overhead. At the feature level, we introduce Spectral Norm Loss to promote feature sparsity from a low-rank perspective. Experiments across diverse model architectures and tasks demonstrate that our method further enhances sparsity-friendliness, achieving superior pruning performance over existing approaches.
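The abstract's "Spectral Norm Loss" is described only as promoting feature sparsity from a low-rank perspective; the exact formulation is not given here. One plausible sketch, under that assumption, is a loss that measures how much feature energy falls outside the top singular direction: it is zero for a rank-1 feature matrix and grows as the spectrum flattens, so minimizing it pushes features toward low rank. The function name and normalization below are illustrative, not the paper's definition.

```python
import numpy as np

def spectral_norm_loss(F):
    """Hypothetical low-rank-promoting loss: 1 - sigma_max^2 / ||F||_F^2.

    sigma_max is the spectral norm (largest singular value) of the feature
    matrix F; the Frobenius norm squared equals the sum of squared singular
    values, so the loss is the fraction of feature energy outside the top
    singular direction. It is 0 for rank-1 features and approaches 1 as the
    singular spectrum flattens.
    """
    sigma = np.linalg.svd(F, compute_uv=False)  # singular values, descending
    return 1.0 - sigma[0] ** 2 / np.sum(sigma ** 2)

rng = np.random.default_rng(0)
full_rank = rng.normal(size=(16, 16))                       # flat spectrum
rank_one = np.outer(rng.normal(size=16), rng.normal(size=16))

print(spectral_norm_loss(full_rank))  # noticeably above 0
print(spectral_norm_loss(rank_one))   # ~0.0: all energy in one direction
```

In a training loop this term would be added to the task loss on intermediate activations; a differentiable spectral norm is typically obtained via power iteration rather than a full SVD.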
Problem

Research questions and friction points this paper is trying to address.

post-training pruning
sparsity
large language models
model compression
weight removal
Innovation

Methods, ideas, or system contributions that make the work stand out.

Sparsity Induction
Post-Training Pruning
Distributional Sparsity
Spectral Norm Loss
Low-Rank Feature Sparsity