TuneComp: Joint Fine-tuning and Compression for Large Foundation Models

📅 2025-05-27

📈 Citations: 0

✨ Influential: 0

career value

217K/year

🤖 AI Summary

The conventional “fine-tune-then-compress” paradigm for lightweighting large language models (LLMs) during post-training incurs significant performance degradation and introduces redundant intermediate models. Method: This paper proposes the first end-to-end framework that jointly optimizes fine-tuning and structured compression—integrating progressive knowledge distillation, dynamic structured pruning, and low-rank parameter constraints directly into the downstream fine-tuning process to cooperatively shrink the parameter space. Contribution/Results: By eliminating the need to store and compute full-sized intermediate models, our approach reduces memory and computational overhead. On multiple benchmark tasks, it achieves an average accuracy gain of 2.1% at equivalent parameter counts and compresses model size by up to 4.3×, substantially mitigating performance decay inherent in conventional lightweighting pipelines.

Technology Category

Application Category

📝 Abstract

To reduce model size during post-training, compression methods, including knowledge distillation, low-rank approximation, and pruning, are often applied after fine-tuning the model. However, sequential fine-tuning and compression sacrifices performance, while creating a larger than necessary model as an intermediate step. In this work, we aim to reduce this gap, by directly constructing a smaller model while guided by the downstream task. We propose to jointly fine-tune and compress the model by gradually distilling it to a pruned low-rank structure. Experiments demonstrate that joint fine-tuning and compression significantly outperforms other sequential compression methods.

Problem

Research questions and friction points this paper is trying to address.

Jointly fine-tuning and compressing large foundation models

Reducing performance gap from sequential fine-tuning and compression

Constructing smaller models guided by downstream tasks

Innovation

Methods, ideas, or system contributions that make the work stand out.

Joint fine-tuning and compression approach

Gradual distillation to pruned low-rank structure

Direct construction guided by downstream task

🔎 Similar Papers

MCNC: Manifold-Constrained Reparameterization for Neural Compression