🤖 AI Summary
Deploying large language models (LLMs) under resource constraints remains challenging; existing low-rank compression methods often assume weight matrices are intrinsically low-rank, yet LLM weights exhibit high empirical rank. While activations are more amenable to low-rank approximation, uniform reconstruction ignores feature dimension importance, degrading accuracy.
Method: We propose an importance-aware activation-space reconstruction framework. It introduces a gradient-sensitivity-driven importance weighting scheme and derives a closed-form solution in which the optimal reconstruction bases are the eigenvectors of an importance-weighted activation covariance matrix, enabling interpretable, behavior-preserving low-rank approximation. Our method integrates spectral decomposition with importance-weighted reconstruction.
Contribution/Results: The approach achieves significantly improved compression efficiency without sacrificing accuracy. Experiments across multiple LLMs and downstream tasks show up to a 48.6% increase in compression ratio over prior state-of-the-art methods, while maintaining comparable accuracy.
📝 Abstract
Large language models (LLMs) achieve strong performance across many domains but are difficult to deploy in resource-constrained settings due to their size. Low-rank weight matrix compression is a popular strategy for reducing model size, typically by minimizing weight reconstruction error under the assumption that weights are low-rank. However, this assumption often does not hold in LLMs. Instead, LLM activations exhibit stronger low-rank structure, prompting a shift toward minimizing activation reconstruction error.
We show that this shift alone is insufficient: activation dimensions contribute unequally to model performance, and uniform reconstruction can harm performance. We propose IMPACT, a principled framework for importance-aware activation reconstruction that links model compression decisions to their impact on model behavior. IMPACT formulates an optimization problem that considers both activation structure and gradient sensitivity, and derives a closed-form solution where the optimal reconstruction bases are the eigenvectors of an importance-weighted activation covariance matrix. This enables low-rank approximations explicitly optimized to preserve accuracy. Experiments across diverse models and tasks show that IMPACT achieves up to 48.6% greater model size reduction with accuracy comparable to state-of-the-art baselines.
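The closed-form solution described above can be illustrated with a small NumPy sketch. This is not the authors' implementation; the per-dimension importance weights (here, mean squared gradients) and all array names are hypothetical choices made for illustration. The key step is that the optimal rank-r reconstruction bases are the top eigenvectors of an importance-weighted activation covariance matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((1024, 64))   # activations: samples x feature dims (synthetic)
G = rng.standard_normal((1024, 64))   # gradients w.r.t. activations (synthetic)

# Gradient-sensitivity-driven importance per feature dimension
# (mean squared gradient is one plausible choice, used here for illustration).
w = np.mean(G**2, axis=0)                       # shape (64,)
W_half = np.diag(np.sqrt(w))                    # diagonal sqrt of importance weights
W_half_inv = np.diag(1.0 / np.sqrt(w))

# Importance-weighted activation covariance matrix.
C = W_half @ (X.T @ X / X.shape[0]) @ W_half    # symmetric, 64 x 64

# Its top-r eigenvectors are the optimal reconstruction bases.
eigvals, eigvecs = np.linalg.eigh(C)            # eigenvalues in ascending order
r = 8
U = eigvecs[:, -r:]                             # top-r basis, shape (64, r)

# Rank-r reconstruction: project in the weighted space, then map back.
X_rec = X @ W_half @ U @ U.T @ W_half_inv       # rank <= r approximation of X
```

Dimensions with larger importance weights contribute more to the weighted reconstruction objective, so the chosen basis preferentially preserves them, which is what links the low-rank approximation to downstream accuracy.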