🤖 AI Summary
This work addresses the limited approximation accuracy of low-rank decomposition in large language models (LLMs) by introducing, for the first time, Kronecker factorization of the Hessian matrix into LLM weight compression. Drawing on second-order optimization insights, the proposed method derives a closed-form, bidirectional whitening solution that jointly incorporates input and output information to construct an optimal low-rank approximation of layer weights. This overcomes the limitation of existing methods, which rely solely on input statistics. Experimental results show that the proposed technique improves decomposition accuracy by 20%–40% over the current state-of-the-art SVD-LLM method.
📝 Abstract
Low-rank decomposition has emerged as an important problem in Large Language Model (LLM) fine-tuning and inference. Through Singular Value Decomposition (SVD), a weight matrix can be optimally factorized into low-rank components. A common prior practice is to decompose the weight in an activation-whitened space, which yields satisfactory results. In this work, we propose Optimal Brain Decomposition LLM (OBD-LLM), which studies the decomposition problem in the model space using second-order Hessian information. Through a rigorous Kronecker factorization of the Hessian, we show that the decomposition must account for both the input and output information of a layer, and that doing so achieves much better decomposition results than input-only methods. Our loss-aware decomposition method applies bidirectional whitening to the weight matrix. As a result, OBD-LLM is a closed-form solution for the optimal decomposition of weights in a language model. Remarkably, we achieve roughly 20–40% better results than the previous state-of-the-art decomposition method, SVD-LLM.
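To make the idea of bidirectional whitening concrete, here is a minimal NumPy sketch, not the paper's actual code. It assumes the layer's Hessian factors into two symmetric positive-definite Kronecker factors: an input-side matrix `A` (e.g. the input covariance) and an output-side matrix `B`. The weight is whitened on both sides via Cholesky factors, truncated by SVD in the whitened space (where plain truncation is optimal by Eckart–Young), and then mapped back. The function name and API are illustrative assumptions.

```python
import numpy as np

def bidirectional_whitened_lowrank(W, A, B, rank):
    """Rank-`rank` approximation of W minimizing the Hessian-weighted error
    tr((W - W_k) A (W - W_k)^T B), with A (input-side) and B (output-side)
    the assumed Kronecker factors, both symmetric positive definite.
    Illustrative sketch only -- not the paper's implementation."""
    La = np.linalg.cholesky(A)           # A = La @ La.T
    Lb = np.linalg.cholesky(B)           # B = Lb @ Lb.T
    Wt = Lb.T @ W @ La                   # whiten on both sides
    U, S, Vt = np.linalg.svd(Wt, full_matrices=False)
    Wt_k = U[:, :rank] * S[:rank] @ Vt[:rank]      # optimal truncation here
    # un-whiten: W_k = Lb^{-T} @ Wt_k @ La^{-1}, via triangular solves
    X = np.linalg.solve(Lb.T, Wt_k)
    W_k = np.linalg.solve(La.T, X.T).T
    return W_k
```

When `A` and `B` are identities this reduces to ordinary truncated SVD of `W`; with non-trivial factors, the truncation error is measured in the loss-aware (whitened) norm instead of the plain Frobenius norm.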