🤖 AI Summary
To address the significant performance degradation that models suffer after SVD-based compression, this paper proposes Low-Rank Prehab, a low-rank fine-tuning stage applied before compression. Its core innovation is a "pre-compression adaptation" mechanism that explicitly optimizes the spectral compactness of weight matrices prior to SVD decomposition, inducing inherent low-rank structure in the model parameters. Unlike architecture-specific approaches, Low-Rank Prehab is broadly applicable to Transformer-based models, including LLMs and ViTs. It combines task-aware pre-finetuning with lightweight post-compression fine-tuning to jointly improve compression robustness and recovery efficiency. Experiments across multiple compression ratios show that Low-Rank Prehab substantially outperforms baselines such as SVD-LLM, with markedly smaller accuracy loss on benchmarks including ImageNet and GLUE. It also improves model compressibility and accelerates fine-tuning convergence after SVD compression.
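The summary does not spell out the exact pre-finetuning objective. The sketch below is one plausible instantiation, assuming a nuclear-norm penalty on linear-layer weights added to the task loss to encourage spectral compactness; the function name `prehab_loss` and the coefficient `lam` are illustrative, not the paper's formulation.

```python
# Hypothetical "prehab" objective: task loss plus a penalty that encourages
# spectrally compact (approximately low-rank) weight matrices.
# The nuclear-norm regularizer below is an assumption for illustration only.
import torch
import torch.nn as nn

def prehab_loss(model: nn.Module, logits: torch.Tensor,
                targets: torch.Tensor, lam: float = 1e-4) -> torch.Tensor:
    task_loss = nn.functional.cross_entropy(logits, targets)
    # Nuclear norm = sum of singular values; penalizing it pushes weight mass
    # onto a few dominant singular directions, easing later SVD truncation.
    rank_penalty = sum(
        torch.linalg.matrix_norm(m.weight, ord="nuc")
        for m in model.modules() if isinstance(m, nn.Linear)
    )
    return task_loss + lam * rank_penalty
```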
📝 Abstract
Low-rank approximation methods such as singular value decomposition (SVD) and its variants (e.g., Fisher-weighted SVD, Activation SVD) have recently emerged as effective tools for neural network compression. In this setting, decomposition acts as a "surgical" intervention, followed by fine-tuning that serves as "rehab" to recover accuracy. Inspired by prehabilitation in surgery, we introduce a pre-compression fine-tuning stage, Low-Rank Prehab, that explicitly encourages low-rank structure in weight matrices while preserving task performance. By conditioning the model before SVD, Prehab steers weights toward spectrally compact regions of the parameter space, enabling smoother low-rank approximation and improved recovery. Experiments on large language models (LLMs) and other Transformer-based architectures, including Vision Transformers (ViTs), show that Prehab substantially reduces the immediate accuracy drop after compression and consistently improves post-finetuning performance. Across a wide range of compression ratios, our method outperforms state-of-the-art SVD-based techniques such as SVD-LLM, highlighting the importance of preparing models for compression rather than only improving the compression and recovery stages. Source code is available at https://github.com/niqretnuh/PREHAB-SVD.
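For context, the "surgical" step that Prehab prepares the model for is low-rank factorization of weight matrices. The sketch below shows plain truncated SVD of a linear layer, factored into two thinner layers; variants such as SVD-LLM or Activation SVD additionally weight the decomposition with activation or Fisher statistics, which is omitted here. The helper name `svd_compress_linear` is illustrative.

```python
# Minimal sketch of SVD-based compression of a single nn.Linear layer:
# W (out x in) is approximated by a rank-r factorization and realized as
# two smaller Linear layers, reducing parameters when r is small enough.
import torch
import torch.nn as nn

def svd_compress_linear(layer: nn.Linear, rank: int) -> nn.Sequential:
    W = layer.weight.data                              # (out_features, in_features)
    U, S, Vh = torch.linalg.svd(W, full_matrices=False)
    U_r, S_r, Vh_r = U[:, :rank], S[:rank], Vh[:rank, :]
    # W ~ (U_r * S_r) @ Vh_r, implemented as x -> second(first(x)).
    first = nn.Linear(layer.in_features, rank, bias=False)
    second = nn.Linear(rank, layer.out_features, bias=layer.bias is not None)
    first.weight.data = Vh_r
    second.weight.data = U_r * S_r                     # scale columns of U_r by S_r
    if layer.bias is not None:
        second.bias.data = layer.bias.data.clone()
    return nn.Sequential(first, second)
```

A compressed layer would then be swapped in (e.g., `block.fc = svd_compress_linear(block.fc, rank=64)`) and briefly fine-tuned as the "rehab" stage to recover accuracy.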