MI-PRUN: Optimize Large Language Model Pruning via Mutual Information

📅 2026-01-12
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work addresses the instability and suboptimality of existing block-level pruning methods for large language models. It is the first to introduce mutual information and the data processing inequality into block pruning: mutual information measures changes in hidden states to identify redundant blocks, and the data processing inequality is used to quantify block importance. Building on this theoretical foundation, the authors propose Fast-Block-Select, an iterative optimization algorithm that efficiently searches for a globally optimal set of blocks to retain. Experimental results across multiple models and datasets show that the approach improves pruning stability and efficiency while preserving or even improving model performance.
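The summary's core idea, scoring a block as redundant when its output hidden states carry little new information relative to its inputs, can be illustrated with a toy estimator. The paper's actual MI estimator is not specified here; the histogram-based plug-in estimator and the per-dimension averaging below are illustrative assumptions only.

```python
import numpy as np

def mutual_information(x, y, bins=16):
    """Crude histogram (plug-in) MI estimate between two 1-D samples, in nats.
    Hypothetical stand-in for whatever estimator MI-PRUN actually uses."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()              # joint distribution p(x, y)
    px = pxy.sum(axis=1, keepdims=True)    # marginal p(x)
    py = pxy.sum(axis=0, keepdims=True)    # marginal p(y)
    nz = pxy > 0                           # avoid log(0) on empty cells
    return float((pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])).sum())

def block_redundancy(h_in, h_out, bins=16):
    """Score a block by averaging MI between matching hidden dimensions of
    its input and output states. Under this (assumed) convention, high MI
    means the block barely transforms its input, marking it prunable."""
    d = h_in.shape[1]
    return float(np.mean(
        [mutual_information(h_in[:, j], h_out[:, j], bins) for j in range(d)]
    ))
```

A block whose output equals its input would score maximal redundancy under this sketch, while a block producing states independent of its input would score near zero.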

📝 Abstract
Large Language Models (LLMs) have become indispensable across many domains, but at the cost of substantial computational and memory resources. Model pruning addresses this by removing redundant components from models. In particular, block pruning can achieve significant compression and inference acceleration. However, existing block pruning methods are often unstable and struggle to attain globally optimal solutions. In this paper, we propose MI-PRUN, a mutual information-based pruning method for LLMs. Specifically, we leverage mutual information to identify redundant blocks by evaluating transitions in hidden states. Additionally, we incorporate the Data Processing Inequality (DPI) to reveal the relationship between the importance of entire contiguous blocks and that of individual blocks. Moreover, we develop the Fast-Block-Select algorithm, which iteratively updates block combinations to reach a globally optimal solution while significantly improving efficiency. Extensive experiments across various models and datasets demonstrate the stability and effectiveness of our method.
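The abstract's "iteratively updates block combinations" step can be pictured as local search over the retained set. The sketch below is a generic single-swap hill climb over an arbitrary set-scoring function; the real Fast-Block-Select objective (MI- and DPI-based) and its update rule are not public here, so `set_score` and the swap move are illustrative assumptions.

```python
from itertools import product

def fast_block_select(n_blocks, keep, set_score):
    """Hedged sketch of iterative block selection: starting from the first
    `keep` blocks, repeatedly swap one kept block for one pruned block
    whenever the swap raises `set_score`, until no swap improves it.
    (Hypothetical interface; not the paper's actual algorithm.)"""
    kept = set(range(keep))
    best = set_score(frozenset(kept))
    improved = True
    while improved:
        improved = False
        for i, j in product(sorted(kept), range(n_blocks)):
            if j in kept:
                continue  # only consider swapping in currently pruned blocks
            cand = frozenset(kept - {i} | {j})
            s = set_score(cand)
            if s > best:  # accept the first improving swap, then restart
                kept, best = set(cand), s
                improved = True
                break
    return sorted(kept), best
```

For an additive importance score this hill climb reaches the global optimum, e.g. with per-block weights `[1, 5, 2, 9, 3]` and `keep=2` it converges to blocks `[1, 3]`; for set-valued objectives like the DPI-based one described in the abstract it is only a local search.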
Problem

Research questions and friction points this paper is trying to address.

Large Language Models
Model Pruning
Block Pruning
Mutual Information
Global Optimization
Innovation

Methods, ideas, or system contributions that make the work stand out.

Mutual Information
Block Pruning
Data Processing Inequality
Large Language Models
Fast-Block-Select