Beyond Variance: Knowledge-Aware LLM Compression via Fisher-Aligned Subspace Diagnostics

📅 2026-01-12
🏛️ arXiv.org
📈 Citations: 1
Influential: 0
📄 PDF
🤖 AI Summary
This work addresses a critical limitation in existing large language model compression techniques, such as those based on singular value decomposition (SVD): they prioritize high-variance activation dimensions while discarding low-variance subspaces that are highly gradient-sensitive and crucial for factual knowledge retention. To overcome this, the authors propose Fisher-Aligned Subspace Compression (FASC), a knowledge-aware post-training compression framework that leverages the Fisher information matrix to model the coupling between activations and gradients, thereby identifying and preserving knowledge-critical subspaces. They introduce the dependence violation score (ρ) as a general diagnostic metric for how factual knowledge is stored in Transformers and demonstrate, for the first time, the alignment of Fisher information with subspace compression, moving beyond variance-dominated paradigms. On Mistral-7B and Llama-3-8B, FASC achieves 6–8% absolute gains in MMLU and LAMA accuracy over variance-based baselines at 50% rank compression, enabling a 7B model to match the factual recall of an uncompressed 13B model.
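The summary does not spell out FASC's selection rule, so the following is only an illustrative sketch of the core idea: score candidate subspace directions by their mass under an empirical Fisher estimate (activation-gradient coupling) rather than by activation variance alone. All names and the specific scoring rule here are hypothetical, not the paper's implementation.

```python
import numpy as np

def fisher_aligned_subspace(activations, gradients, rank):
    """Pick a rank-k activation subspace by Fisher mass, not variance.

    activations, gradients: (n_samples, d) arrays of hidden activations
    and loss gradients w.r.t. those activations. Returns a (d, rank)
    orthonormal basis. Hypothetical sketch: the empirical Fisher
    G^T G / n stands in for the paper's second-order loss surrogate.
    """
    n = activations.shape[0]
    # Empirical Fisher: average outer product of per-sample gradients.
    fisher = gradients.T @ gradients / n
    # Activation second moment: what a plain SVD/PCA would diagonalize.
    cov = activations.T @ activations / n
    # Eigen-directions of the activation covariance...
    _, eigvecs = np.linalg.eigh(cov)
    # ...but scored by Fisher mass v_i^T F v_i, so a low-variance yet
    # gradient-sensitive direction can outrank a high-variance one.
    scores = np.einsum('di,dk,ki->i', eigvecs, fisher, eigvecs)
    keep = np.argsort(scores)[::-1][:rank]
    return eigvecs[:, keep]
```

A variance-based baseline would instead keep the directions with the largest covariance eigenvalues; the only change here is the scoring step.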

📝 Abstract
Post-training activation compression is essential for deploying Large Language Models (LLMs) on resource-constrained hardware. However, standard methods like Singular Value Decomposition (SVD) are gradient-blind: they preserve high-variance dimensions regardless of their impact on factual knowledge preservation. We introduce Fisher-Aligned Subspace Compression (FASC), a knowledge-aware compression framework that selects subspaces by directly modeling activation-gradient coupling, minimizing a second-order surrogate of the loss function. FASC leverages the Fisher Information Matrix to identify dimensions critical for factual knowledge, which often reside in low-variance but high-gradient-sensitivity subspaces. We propose the Dependence Violation Score (ρ) as a general-purpose diagnostic metric that quantifies activation-gradient coupling, revealing where factual knowledge is stored within transformer architectures. Extensive experiments on Mistral-7B and Llama-3-8B demonstrate that FASC preserves 6-8% more accuracy on knowledge-intensive benchmarks (MMLU, LAMA) compared to variance-based methods at 50% rank reduction, effectively enabling a 7B model to match the factual recall of a 13B uncompressed model. Our analysis reveals that ρ serves as a fundamental signal of stored knowledge, with high-ρ layers emerging only when models internalize factual associations during training.
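The abstract defines ρ only informally, as a measure of activation-gradient coupling. One plausible way to operationalize such a diagnostic is to ask how badly activation variance fails to predict gradient sensitivity per dimension; the proxy below (a rank-correlation gap) is an assumption for illustration, not the paper's actual definition of ρ.

```python
import numpy as np

def dependence_violation_score(activations, gradients):
    """Hypothetical proxy for a Dependence Violation Score (rho).

    activations, gradients: (n_samples, d) arrays for one layer.
    If gradients were independent of activations, per-dimension
    gradient energy would track activation variance; rho measures
    how strongly that expectation is violated.
    """
    var = activations.var(axis=0)          # per-dimension variance
    sens = (gradients ** 2).mean(axis=0)   # per-dimension gradient energy
    # Spearman-style rank correlation between the two orderings.
    vr = var.argsort().argsort().astype(float)
    sr = sens.argsort().argsort().astype(float)
    corr = np.corrcoef(vr, sr)[0, 1]
    # rho near 0: variance predicts sensitivity (SVD is safe).
    # rho large: low-variance dims carry gradient, i.e. the regime
    # where variance-based compression discards knowledge.
    return 1.0 - corr
```

Under this proxy, a high-ρ layer is exactly one where an SVD-style criterion and a Fisher-aligned criterion would disagree about which dimensions to keep.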
Problem

Research questions and friction points this paper is trying to address.

LLM compression
knowledge preservation
activation compression
factual knowledge
post-training compression
Innovation

Methods, ideas, or system contributions that make the work stand out.

Fisher Information Matrix
Knowledge-Aware Compression
Activation-Gradient Coupling
Dependence Violation Score
Post-Training Compression