ProcrustesGPT: Compressing LLMs with Structured Matrices and Orthogonal Transformations

📅 2025-06-03
📈 Citations: 0
✨ Influential: 0
📄 PDF
🤖 AI Summary
To address the low approximation accuracy and reliance on fine-tuning in large language model (LLM) compression via structured matrix approximations, this paper proposes an orthogonal pre-transformation mechanism: leveraging the invariance of LLM output under certain orthogonal transformations of weight matrices, it introduces learnable orthogonal matrices that improve the fidelity of low-rank, circulant, or Toeplitz structured representations. The work is the first to incorporate Procrustes analysis into LLM compression, designing an efficient solver based on singular value decomposition (SVD) and Householder reflections. The method enables fine-tuning-free compression without altering the model's original outputs. Evaluated on LLaMA-2 and Phi-3, it achieves 2–4× parameter reduction and a 1.8× inference speedup with perplexity degradation under 0.5, significantly outperforming existing fine-tuning-free approaches.
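The Procrustes analysis the summary refers to has a classical closed-form solution via SVD. The following is a minimal sketch of the orthogonal Procrustes step only, not the paper's full pipeline: it assumes a target structured approximation `S` is already given, and finds the orthogonal `Q` that best aligns `W @ Q` with it (function name and setup are illustrative, not from the paper's code).

```python
import numpy as np

def orthogonal_procrustes(W, S):
    """Solve min_Q ||W @ Q - S||_F over orthogonal Q.

    Closed form: with M = W.T @ S and SVD M = U diag(s) Vt,
    the minimizer is Q = U @ Vt (the orthogonal polar factor of M).
    """
    U, _, Vt = np.linalg.svd(W.T @ S)
    return U @ Vt

# Sanity check: if S is an exact orthogonal rotation of W,
# the solver recovers that rotation.
rng = np.random.default_rng(0)
W = rng.normal(size=(6, 4))
R, _ = np.linalg.qr(rng.normal(size=(4, 4)))  # random orthogonal matrix
S = W @ R
Q = orthogonal_procrustes(W, S)
assert np.allclose(Q.T @ Q, np.eye(4))  # Q is orthogonal
assert np.allclose(Q, R)                # exact rotation recovered
```

In the paper's setting one would presumably alternate between solving for `Q` and re-projecting `W @ Q` onto the chosen structured class (low-rank, circulant, or Toeplitz); this sketch shows only the closed-form `Q` update.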

📝 Abstract
Large language models (LLMs) demonstrate impressive results in natural language processing tasks but require a significant amount of computational and memory resources. Structured matrix representations are a promising way for reducing the number of parameters of these models. However, it seems unrealistic to expect that weight matrices of pretrained models can be accurately represented by structured matrices without any fine-tuning. To overcome this issue, we utilize the fact that LLM output is invariant under certain orthogonal transformations of weight matrices. This insight can be leveraged to identify transformations that significantly improve the compressibility of weights within structured classes. The proposed approach is applicable to various types of structured matrices that support efficient projection operations. Code is available at https://github.com/GrishKate/ProcrustesGPT
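The invariance the abstract relies on can be checked numerically for a pair of consecutive linear maps: rotating one weight matrix by an orthogonal `Q` and the next by `Q.T` leaves the composed output unchanged. This is a simplified sketch; real transformer layers include nonlinearities and normalization, and the paper applies such transformations only where the invariance actually holds.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8
x = rng.normal(size=d)
W1 = rng.normal(size=(d, d))
W2 = rng.normal(size=(d, d))

# Random orthogonal Q from a QR decomposition.
Q, _ = np.linalg.qr(rng.normal(size=(d, d)))

y = W2 @ (W1 @ x)
# Replace W1 -> Q.T @ W1 and W2 -> W2 @ Q: the Q's cancel
# because Q @ Q.T = I, so the output is unchanged.
y_rotated = (W2 @ Q) @ ((Q.T @ W1) @ x)
assert np.allclose(y, y_rotated)
```

The rotated weights `W2 @ Q` and `Q.T @ W1` are exactly the degrees of freedom the method searches over to make each matrix more compressible within a structured class.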
Problem

Research questions and friction points this paper is trying to address.

Reducing computational and memory resources in LLMs
Improving compressibility of pretrained model weights
Utilizing orthogonal transformations for structured matrices
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses structured matrices for LLM compression
Applies orthogonal transformations for weight optimization
Supports efficient projection operations