Choose Your Model Size: Any Compression by a Single Gradient Descent

📅 2025-02-03
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
To address the challenge of deploying foundation models under resource constraints, this paper proposes ACIP, an algorithm that derives a global parameter importance ranking from a single SGD run, enabling on-the-fly instantiation of compressed models at arbitrary target sizes without fine-tuning. Methodologically, ACIP combines an SVD-based reparameterization of linear layers with iterative singular-value pruning under a sparsity-inducing regularizer to achieve efficient structured compression. Unlike conventional compression pipelines that require multiple training cycles or post-pruning fine-tuning, ACIP drastically reduces training overhead. Evaluated on a range of open-weight LLMs, it achieves state-of-the-art compression performance, outperforming existing factorization-based methods, and seamlessly complements common quantization techniques. ACIP thus offers a practical path to lightweight deployment of large language models.

📝 Abstract
The adoption of Foundation Models in resource-constrained environments remains challenging due to their large size and inference costs. A promising way to overcome these limitations is post-training compression, which aims to balance reduced model size against performance degradation. This work presents Any Compression via Iterative Pruning (ACIP), a novel algorithmic approach to determine a compression-performance trade-off from a single stochastic gradient descent run. To ensure parameter efficiency, we use an SVD-reparametrization of linear layers and iteratively prune their singular values with a sparsity-inducing penalty. The resulting pruning order gives rise to a global parameter ranking that allows us to materialize models of any target size. Importantly, the compressed models exhibit strong predictive downstream performance without the need for costly fine-tuning. We evaluate ACIP on a large selection of open-weight LLMs and tasks, and demonstrate state-of-the-art results compared to existing factorisation-based compression methods. We also show that ACIP seamlessly complements common quantization-based compression techniques.
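The core mechanism the abstract describes, reparameterizing each linear layer via SVD so that the singular values become the prunable parameters, can be sketched in a few lines. This is a minimal NumPy illustration of the idea, not the paper's implementation; in ACIP the pruning order is learned with a sparsity-inducing penalty during SGD, whereas here we simply zero the smallest singular values to show how rank (and thus parameter count in factored form) drops.

```python
import numpy as np

def svd_reparametrize(W):
    """Factor a weight matrix W ≈ U @ diag(s) @ Vt.
    The singular values s become the trainable 'importance' parameters."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return U, s, Vt

def prune_smallest(s, k):
    """Zero out the k smallest singular values (one iterative-pruning step).
    ACIP instead drives values to zero via a sparsity penalty during SGD."""
    idx = np.argsort(s)[:k]
    s = s.copy()
    s[idx] = 0.0
    return s

# Toy example: an 8x6 weight matrix of a linear layer.
rng = np.random.default_rng(0)
W = rng.standard_normal((8, 6))
U, s, Vt = svd_reparametrize(W)
s_pruned = prune_smallest(s, k=2)
W_compressed = U @ np.diag(s_pruned) @ Vt

# Rank drops from 6 to 4; storing the factored form with only the
# surviving singular vectors reduces the parameter count.
print(np.linalg.matrix_rank(W_compressed))  # → 4
```

Storing the pruned layer as the two thin factors `U[:, keep] * s[keep]` and `Vt[keep, :]` is what makes this a structured (rather than unstructured) compression: the saved matrices literally shrink.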
Problem

Research questions and friction points this paper is trying to address.

Compress Foundation Models efficiently
Balance model size and performance
Enable compression without fine-tuning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Iterative pruning with gradient descent
SVD-reparametrization for efficiency
Global parameter ranking for model size
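The "global parameter ranking" idea above is what lets ACIP materialize a model of any target size from a single run: singular values from all layers are ranked together, and a given compression budget simply keeps the top-ranked fraction. A hedged sketch, assuming magnitude as the ranking score for illustration (ACIP derives the ranking from the order in which values are pruned during training):

```python
import numpy as np

def global_ranking(layer_svals):
    """Rank singular values from ALL layers jointly.
    Returns (layer_index, sval_index) pairs, least important first.
    Illustrative score: raw magnitude (not the paper's learned ranking)."""
    entries = [(score, li, si)
               for li, s in enumerate(layer_svals)
               for si, score in enumerate(s)]
    entries.sort()  # ascending: smallest scores are pruned first
    return [(li, si) for _, li, si in entries]

def materialize(layer_svals, ranking, keep_ratio):
    """Instantiate a model at a target size: keep only the top
    `keep_ratio` fraction of globally ranked singular values."""
    total = sum(len(s) for s in layer_svals)
    n_drop = total - int(round(keep_ratio * total))
    masks = [np.ones(len(s), dtype=bool) for s in layer_svals]
    for li, si in ranking[:n_drop]:
        masks[li][si] = False
    return masks

# Two toy layers with 3 and 2 singular values; keep 60% overall.
svals = [np.array([3.0, 1.0, 0.2]), np.array([2.5, 0.5])]
masks = materialize(svals, global_ranking(svals), keep_ratio=0.6)
print([m.tolist() for m in masks])  # → [[True, True, False], [True, False]]
```

Because the ranking is global rather than per-layer, the budget is allocated unevenly across layers, and any target size can be instantiated after the fact without retraining, which is the zero-shot, on-the-fly property the summary highlights.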