🤖 AI Summary
To address the excessive computational and memory overhead of large language model (LLM) deployment, this paper proposes a structured post-training pruning method grounded in expander graph theory. It is the first to incorporate the strong connectivity property of expander graphs into N:M sparse pruning design, enabling a theoretically principled weight-selection mechanism that preserves critical information-flow paths while ensuring global connectivity and robustness of the pruned network. Compared to conventional structured pruning, the proposed approach significantly improves accuracy retention under high sparsity. Evaluated on Llama-2 and OPT, it achieves an average 2.1× inference speedup and 48% memory reduction, outperforming state-of-the-art methods by 1.3–2.7 percentage points in accuracy. The core contribution lies in establishing a formal theoretical link between expander graph properties and the learnability of sparse structures, thereby introducing a novel paradigm for efficient LLM deployment.
📝 Abstract
As Large Language Models (LLMs) become more widely adopted and scale up in size, the computational and memory challenges involved in deploying these massive foundation models have grown increasingly severe. This underscores the urgent need to develop more efficient model variants. Faced with this challenge, the present work introduces EGGS-PTP: an Expander-Graph Guided Structured Post-training Pruning method. The proposed approach leverages graph theory to guide the design of N:M structured pruning, effectively reducing model size and computational demands. By incorporating concepts from expander graphs, EGGS-PTP ensures information flow within the pruned network, preserving essential model functionality. Extensive numerical experiments demonstrate that EGGS-PTP not only achieves significant acceleration and memory savings due to structured sparsity but also outperforms existing structured pruning techniques in terms of accuracy across various LLMs.
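To make the N:M structured sparsity pattern concrete, the sketch below shows a plain magnitude-based 2:4 pruning baseline: in every group of four consecutive weights, only the two largest-magnitude entries are kept. This is a minimal illustration of the N:M constraint itself, not EGGS-PTP's expander-graph-guided selection; the function name `nm_prune` is hypothetical and the selection criterion (magnitude) is an assumption for demonstration only.

```python
import numpy as np

def nm_prune(weights, n=2, m=4):
    """Baseline magnitude-based N:M pruning (illustrative only, not the
    paper's expander-graph-guided criterion): within each group of M
    consecutive weights along a row, keep the N entries with the largest
    absolute value and zero out the rest."""
    w = np.asarray(weights, dtype=float)
    rows, cols = w.shape
    assert cols % m == 0, "column count must be divisible by M"
    groups = w.reshape(rows, cols // m, m)
    # Indices of the (M - N) smallest-magnitude entries in each group.
    drop = np.argsort(np.abs(groups), axis=-1)[..., : m - n]
    mask = np.ones_like(groups, dtype=bool)
    np.put_along_axis(mask, drop, False, axis=-1)
    return (groups * mask).reshape(rows, cols)

W = np.array([[0.9, -0.1, 0.05, -1.2, 0.3, 0.7, -0.02, 0.4]])
print(nm_prune(W))
# [[ 0.9  0.   0.  -1.2  0.   0.7  0.   0.4]]
```

Because exactly N of every M weights survive, this pattern maps directly onto sparse tensor-core hardware (e.g., 2:4 support on NVIDIA Ampere), which is what yields the structured-sparsity speedups and memory savings the abstract describes; EGGS-PTP's contribution is choosing *which* N weights to keep so that the pruned network retains expander-like connectivity.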