Swift-SVD: Theoretical Optimality Meets Practical Efficiency in Low-Rank LLM Compression

📅 2026-04-02
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the memory and bandwidth bottlenecks in deploying large language models, which stem from static weights and dynamic key-value cache storage. Existing SVD-based compression methods suffer from suboptimal accuracy or efficiency. To overcome these limitations, we propose Swift-SVD, a training-free, activation-aware low-rank compression framework that incrementally aggregates output activation covariances and performs a single eigendecomposition per layer to achieve theoretically optimal and highly efficient compression. Swift-SVD is the first method to simultaneously guarantee theoretical optimality, computational efficiency, and numerical stability. It further introduces an effective rank estimation technique and a dynamic rank allocation strategy that jointly account for local reconstruction error and global layer importance. Evaluated across six large models and eight datasets, Swift-SVD matches or exceeds state-of-the-art compression accuracy while accelerating end-to-end compression by 3–70× over existing approaches.
📝 Abstract
The deployment of Large Language Models is constrained by the memory and bandwidth demands of static weights and the dynamic Key-Value cache. SVD-based compression provides a hardware-friendly way to reduce these costs. However, existing methods suffer from two key limitations: some are suboptimal in reconstruction error, while others are theoretically optimal but practically inefficient. In this paper, we propose Swift-SVD, an activation-aware, closed-form compression framework that simultaneously guarantees theoretical optimality, practical efficiency, and numerical stability. Swift-SVD incrementally aggregates the covariance of output activations over a batch of inputs and performs a single eigenvalue decomposition after aggregation, enabling training-free, fast, and optimal layer-wise low-rank approximation. We employ the effective rank to analyze local layer-wise compressibility and design a dynamic rank allocation strategy that jointly accounts for local reconstruction loss and end-to-end layer importance. Extensive experiments across six LLMs and eight datasets demonstrate that Swift-SVD outperforms state-of-the-art baselines, achieving optimal compression accuracy while delivering 3–70× speedups in end-to-end compression time. Our code will be released upon acceptance.
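As a rough illustration of the pipeline the abstract describes (incrementally aggregating the output-activation covariance across calibration batches, then performing a single eigendecomposition per layer), here is a minimal NumPy sketch. The layer shapes, the batch loop, and the projection W ≈ U_k U_kᵀ W onto the dominant output subspace are illustrative assumptions, not the paper's exact algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, k = 64, 128, 16

W = rng.standard_normal((d_out, d_in))   # one layer's weight matrix

# Incrementally aggregate the output-activation covariance S = sum_b Y_b Y_b^T
# over calibration batches (no full SVD per batch is ever needed).
S = np.zeros((d_out, d_out))
for _ in range(8):                       # 8 hypothetical calibration batches
    X = rng.standard_normal((d_in, 32))  # one batch of inputs (d_in x batch)
    Y = W @ X                            # output activations
    S += Y @ Y.T                         # running aggregate

# A single eigendecomposition after aggregation (S is symmetric PSD,
# so eigh is the stable choice; eigenvalues come back in ascending order).
eigvals, eigvecs = np.linalg.eigh(S)
U_k = eigvecs[:, -k:]                    # top-k eigenvectors of the output subspace

# Rank-k factors: W is replaced by A @ B with A = U_k, B = U_k^T W,
# shrinking the layer from d_out*d_in to k*(d_out + d_in) parameters.
A, B = U_k, U_k.T @ W
```

Because the eigendecomposition runs once per layer over the aggregated covariance, the cost is independent of how many calibration batches were streamed in.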
Problem

Research questions and friction points this paper is trying to address.

SVD-based compression
low-rank approximation
Large Language Models
reconstruction error
compression efficiency
Innovation

Methods, ideas, or system contributions that make the work stand out.

Swift-SVD
low-rank compression
activation-aware
theoretical optimality
dynamic rank allocation
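The "effective rank" and "dynamic rank allocation" contributions listed above can be sketched as follows. Two assumptions are made here: the entropy-based effective rank of Roy & Vetterli stands in for the paper's rank estimator, and a simple proportional budget-sharing rule stands in for its allocation strategy, which also weighs global layer importance; neither is the paper's exact formulation.

```python
import numpy as np

def effective_rank(M, eps=1e-12):
    # Exponentiated entropy of the normalized singular-value distribution:
    # a smooth proxy for how many directions carry most of M's energy.
    s = np.linalg.svd(M, compute_uv=False)
    p = s / (s.sum() + eps)
    H = -np.sum(p * np.log(p + eps))
    return float(np.exp(H))

def allocate_ranks(mats, importance, budget):
    # Hypothetical rule: share a total rank budget across layers in
    # proportion to effective rank (local compressibility) times a
    # per-layer importance score (global, end-to-end sensitivity).
    scores = np.array([effective_rank(M) for M in mats]) * np.asarray(importance)
    raw = budget * scores / scores.sum()
    return np.maximum(1, np.round(raw)).astype(int)

rng = np.random.default_rng(0)
layers = [rng.standard_normal((32, 32)) for _ in range(4)]
ranks = allocate_ranks(layers, importance=[1.0, 2.0, 1.0, 0.5], budget=40)
```

A layer whose spectrum is nearly flat (effective rank close to its dimension) resists compression and receives more of the budget, while important layers are protected from aggressive truncation.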