Generalized Fisher-Weighted SVD: Scalable Kronecker-Factored Fisher Approximation for Compressing Large Language Models

📅 2025-05-23
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the performance degradation in large language model (LLM) compression caused by diagonal approximations of the Fisher information matrix—which neglect parameter correlations—this paper proposes GFWSVD, a post-training compression method integrating generalized Fisher-weighted singular value decomposition (SVD) with Kronecker decomposition. GFWSVD is the first to incorporate a Kronecker-factored Fisher approximation into the Fisher-weighted SVD framework, jointly modeling both diagonal and off-diagonal structure of the Fisher matrix to improve the accuracy of parameter importance estimation. Evaluated on the MMLU benchmark at a 20% compression rate, GFWSVD achieves absolute accuracy gains of 5%, 3%, and 6% over FWSVD, SVD-LLM, and ASVD, respectively. These results demonstrate GFWSVD's superior capability to preserve model functionality while enabling highly efficient compression.

📝 Abstract
The Fisher information is a fundamental concept for characterizing the sensitivity of parameters in neural networks. However, leveraging the full observed Fisher information is too expensive for large models, so most methods rely on simple diagonal approximations. While efficient, this approach ignores parameter correlations, often resulting in reduced performance on downstream tasks. In this work, we mitigate these limitations and propose Generalized Fisher-Weighted SVD (GFWSVD), a post-training LLM compression technique that accounts for both diagonal and off-diagonal elements of the Fisher information matrix, providing a more accurate reflection of parameter importance. To make the method tractable, we introduce a scalable adaptation of the Kronecker-factored approximation algorithm for the observed Fisher information. We demonstrate the effectiveness of our method on LLM compression, showing improvements over existing compression baselines. For example, at a 20% compression rate on the MMLU benchmark, our method outperforms FWSVD, which is based on a diagonal approximation of the Fisher information, by 5 percent, SVD-LLM by 3 percent, and ASVD by 6 percent.
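The core operation a Fisher-weighted SVD with Kronecker-factored curvature boils down to can be sketched as: whiten the weight matrix with the Cholesky factors of the two Kronecker blocks, truncate with a plain SVD, then unwhiten. The minimal numpy sketch below illustrates this pipeline under stated assumptions; it is not the paper's implementation, and the SPD factors `A` and `B` are fabricated placeholders standing in for the estimated Kronecker factors of the Fisher.

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, rank = 64, 32, 8

W = rng.standard_normal((d_out, d_in))

# Placeholder Kronecker factors, F ≈ A ⊗ B; in practice these would be
# estimated from gradients, here we just build two SPD matrices.
A = np.eye(d_in) + 0.1 * np.cov(rng.standard_normal((d_in, 100)))
B = np.eye(d_out) + 0.1 * np.cov(rng.standard_normal((d_out, 100)))

# Cholesky factors of the two Kronecker blocks.
La = np.linalg.cholesky(A)
Lb = np.linalg.cholesky(B)

# Whiten the weight, truncate via SVD, then map back.
W_tilde = Lb.T @ W @ La
U, s, Vt = np.linalg.svd(W_tilde, full_matrices=False)
U_r, s_r, Vt_r = U[:, :rank], s[:rank], Vt[:rank]
W_low = np.linalg.solve(Lb.T, U_r * s_r) @ Vt_r @ np.linalg.inv(La)

# The truncation error is optimal in the weighted norm induced by the
# factors, and equals the Frobenius norm of the dropped singular values.
err = np.linalg.norm(Lb.T @ (W - W_low) @ La)
```

With `A = B = I` this reduces to plain truncated SVD, and with diagonal `A` it recovers a row-weighted scheme in the spirit of FWSVD; the off-diagonal entries of the factors are what the generalized weighting adds.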
Problem

Research questions and friction points this paper is trying to address.

Efficiently approximates full Fisher information for large models
Addresses limitations of diagonal Fisher approximations in compression
Improves LLM compression performance over existing baseline methods
Innovation

Methods, ideas, or system contributions that make the work stand out.

Generalized Fisher-Weighted SVD for LLM compression
Kronecker-factored Fisher approximation for scalability
Captures diagonal and off-diagonal Fisher elements
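The scalability argument behind the Kronecker-factored approximation can be made concrete for one linear layer: instead of the full empirical Fisher over the flattened weights, store two small factors built from per-example activations and backpropagated gradients, in the style of KFAC. The numpy sketch below uses random data as a stand-in; the variable names are illustrative, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
d_in, d_out, n = 16, 8, 500

# Per-example inputs a and output-side gradients g for one linear layer
# (random placeholders here, gathered from calibration data in practice).
acts = rng.standard_normal((n, d_in))
grads = rng.standard_normal((n, d_out))

# KFAC-style factors: F ≈ A ⊗ G with A = E[a aᵀ], G = E[g gᵀ].
A = acts.T @ acts / n
G = grads.T @ grads / n

# The exact empirical Fisher over flattened weights is a
# (d_in*d_out)² matrix; the factored form needs only d_in² + d_out².
full_entries = (d_in * d_out) ** 2
kfac_entries = d_in ** 2 + d_out ** 2
```

For this toy layer the factored form stores 320 numbers instead of 16,384, and the gap grows quadratically with layer width, which is what makes a non-diagonal Fisher approximation tractable at LLM scale.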