IMPACT: Importance-Aware Activation Space Reconstruction

📅 2025-07-04
📈 Citations: 0
Influential: 0
🤖 AI Summary
Deploying large language models (LLMs) under resource constraints remains challenging; existing low-rank compression methods often assume weight matrices are intrinsically low-rank, yet LLM weights exhibit high empirical rank. While activations are more amenable to low-rank approximation, uniform reconstruction ignores feature-dimension importance, degrading accuracy. Method: We propose an importance-aware activation-space reconstruction framework. It introduces a gradient-sensitivity-driven importance weighting scheme and derives a closed-form solution in which the optimal reconstruction bases are the eigenvectors of an importance-weighted activation covariance matrix, enabling interpretable, behavior-preserving low-rank approximation. Our method integrates spectral decomposition with importance-weighted reconstruction. Contribution/Results: The approach achieves significantly improved compression efficiency without sacrificing accuracy. Experiments across multiple LLMs and downstream tasks show up to 48.6% greater model size reduction than prior state-of-the-art methods, while maintaining comparable accuracy.

📝 Abstract
Large language models (LLMs) achieve strong performance across many domains but are difficult to deploy in resource-constrained settings due to their size. Low-rank weight matrix compression is a popular strategy for reducing model size, typically by minimizing weight reconstruction error under the assumption that weights are low-rank. However, this assumption often does not hold in LLMs. Instead, LLM activations exhibit stronger low-rank structure, prompting a shift toward minimizing activation reconstruction error. We show that this shift alone is insufficient: activation dimensions contribute unequally to model performance, and uniform reconstruction can harm performance. We propose IMPACT, a principled framework for importance-aware activation reconstruction that links model compression decisions to their impact on model behavior. IMPACT formulates an optimization problem that considers both activation structure and gradient sensitivity, and derives a closed-form solution where the optimal reconstruction bases are the eigenvectors of an importance-weighted activation covariance matrix. This enables low-rank approximations explicitly optimized to preserve accuracy. Experiments across diverse models and tasks show that IMPACT achieves up to 48.6% greater model size reduction with accuracy comparable to state-of-the-art baselines.
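The closed-form solution in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes per-dimension importance weights (e.g. derived from gradient sensitivity) are already available, and `importance_weighted_basis` is a hypothetical helper name.

```python
import numpy as np

def importance_weighted_basis(X, w, r):
    """Top-r reconstruction bases: eigenvectors of an
    importance-weighted activation covariance matrix (sketch).

    X: (n, d) activation samples
    w: (d,) nonnegative per-dimension importance weights
       (assumed to come from gradient sensitivity)
    r: target rank
    """
    n, d = X.shape
    cov = X.T @ X / n                    # plain activation covariance
    s = np.sqrt(w)
    weighted_cov = cov * np.outer(s, s)  # diag(sqrt(w)) @ cov @ diag(sqrt(w))
    vals, vecs = np.linalg.eigh(weighted_cov)
    order = np.argsort(vals)[::-1]       # eigh returns ascending eigenvalues
    return vecs[:, order[:r]]            # (d, r) orthonormal basis

# Usage: rank-8 basis for 64-dimensional activations
rng = np.random.default_rng(0)
X = rng.standard_normal((500, 64))
w = rng.uniform(0.1, 1.0, size=64)
B = importance_weighted_basis(X, w, r=8)
```

With uniform weights this reduces to ordinary PCA on the activations; the weighting biases the retained directions toward dimensions the model is more sensitive to.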
Problem

Research questions and friction points this paper is trying to address.

Reduce LLM size for resource-constrained settings
Shift from weight reconstruction to activation reconstruction
Preserve model accuracy during low-rank approximation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Importance-aware activation reconstruction framework
Optimizes low-rank bases via gradient sensitivity
Uses importance-weighted activation covariance matrix
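One way to see how such a basis yields compression (an illustration under simplifying assumptions, not the paper's exact pipeline): if a layer's outputs Y = XW are well approximated in a rank-r basis B, the weight matrix W (d_in x d_out) can be replaced by the two factors W @ B (d_in x r) and B.T (r x d_out), which reduces the parameter count whenever r is small enough.

```python
import numpy as np

# Hypothetical layer: d_in = d_out = 64, compressed to rank r = 8.
rng = np.random.default_rng(1)
d_in, d_out, r = 64, 64, 8
W = rng.standard_normal((d_in, d_out))
# Stand-in orthonormal basis; in IMPACT this would come from the
# importance-weighted covariance eigendecomposition.
B = np.linalg.qr(rng.standard_normal((d_out, r)))[0]

W1 = W @ B   # (d_in, r) factor
W2 = B.T     # (r, d_out) factor

X = rng.standard_normal((10, d_in))
Y_approx = X @ W1 @ W2   # equals (X @ W) projected onto span(B)

params_before = d_in * d_out            # 4096
params_after = W1.size + W2.size        # 1024
reduction = 1 - params_after / params_before
```

Here the factorization removes 75% of the layer's parameters; the accuracy question is whether span(B) captures the activation directions that matter, which is what the importance weighting targets.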