TuneComp: Joint Fine-tuning and Compression for Large Foundation Models

📅 2025-05-27
📈 Citations: 0
Influential: 0
🤖 AI Summary
The conventional "fine-tune-then-compress" paradigm for slimming large language models (LLMs) during post-training incurs significant performance degradation and requires storing redundant intermediate models. Method: this paper proposes the first end-to-end framework that jointly optimizes fine-tuning and structured compression, integrating progressive knowledge distillation, dynamic structured pruning, and low-rank parameter constraints directly into the downstream fine-tuning process to cooperatively shrink the parameter space. Contribution/Results: by eliminating the need to store and compute full-sized intermediate models, the approach reduces memory and computational overhead. On multiple benchmark tasks, it achieves an average accuracy gain of 2.1% at equivalent parameter counts and compresses model size by up to 4.3×, substantially mitigating the performance decay inherent in sequential compression pipelines.

📝 Abstract
To reduce model size during post-training, compression methods, including knowledge distillation, low-rank approximation, and pruning, are often applied after fine-tuning the model. However, sequential fine-tuning and compression sacrifices performance while creating a larger-than-necessary model as an intermediate step. In this work, we aim to close this gap by directly constructing a smaller model guided by the downstream task. We propose to jointly fine-tune and compress the model by gradually distilling it to a pruned low-rank structure. Experiments demonstrate that joint fine-tuning and compression significantly outperforms sequential compression methods.
Problem

Research questions and friction points this paper is trying to address.

Jointly fine-tuning and compressing large foundation models
Reducing performance gap from sequential fine-tuning and compression
Constructing smaller models guided by downstream tasks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Joint fine-tuning and compression approach
Gradual distillation to pruned low-rank structure
Direct construction guided by downstream task
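The idea summarized above, distilling a model into a pruned low-rank structure while training, can be illustrated with a toy NumPy sketch. This is not the paper's algorithm; it is a minimal stand-in in which a "teacher" linear layer is distilled into low-rank factors `U @ V`, the smallest-magnitude entries of `U` are pruned mid-training, and the masked factors are then fine-tuned to recover. All sizes, learning rates, and the penalty weight `lam` are assumed values chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins (hypothetical sizes): a "teacher" linear layer and a data batch.
n, d, r = 32, 16, 4
W_teacher = rng.normal(size=(d, d))
X = rng.normal(size=(n, d))
Y_teacher = X @ W_teacher                 # teacher outputs to distill from

# Low-rank student initialized from a truncated SVD of the teacher weight.
Us, S, Vt = np.linalg.svd(W_teacher)
U = Us[:, :r] * np.sqrt(S[:r])            # shape (d, r)
V = np.sqrt(S[:r])[:, None] * Vt[:r]      # shape (r, d)

lam = 1e-3                                # weight of the Frobenius-norm penalty

def joint_loss(U, V):
    """Distillation error (match teacher outputs) plus a low-rank penalty."""
    E = X @ (U @ V) - Y_teacher
    return np.mean(E**2) + lam * (np.sum(U**2) + np.sum(V**2))

def step(U, V, lr=0.05):
    """One gradient-descent step on both factors of the compressed layer."""
    E = X @ (U @ V) - Y_teacher
    G = 2.0 * X.T @ E / E.size            # gradient w.r.t. the product U @ V
    U_new = U - lr * (G @ V.T + 2 * lam * U)
    V_new = V - lr * (U.T @ G + 2 * lam * V)
    return U_new, V_new

# Phase 1: distill the teacher into the low-rank factors.
for _ in range(300):
    U, V = step(U, V)

# Phase 2: magnitude-prune the factor, then fine-tune to recover.
mask_U = np.abs(U) > np.quantile(np.abs(U), 0.3)   # drop smallest 30% of U
U = U * mask_U
loss_after_prune = joint_loss(U, V)
for _ in range(300):
    U, V = step(U, V)
    U = U * mask_U                        # keep pruned entries at zero
loss_final = joint_loss(U, V)

# The compressed layer stores 2*d*r (partly sparse) values vs. d*d originally.
print(U.size + V.size, "factor params vs.", W_teacher.size, "dense params")
```

The sketch only captures the general shape of the pipeline: the compressed parameterization exists from the start, and distillation, pruning, and low-rank constraints act on it during training rather than after it, so no full-sized intermediate model is ever fine-tuned.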
Xiangyu Chen
Mitsubishi Electric Research Laboratories (MERL), Cambridge, MA 02139, USA
Jing Liu
Mitsubishi Electric Research Laboratories (MERL), Cambridge, MA 02139, USA
Ye Wang
Mitsubishi Electric Research Laboratories (MERL), Cambridge, MA 02139, USA
Matthew Brand
Mitsubishi Electric Research Laboratories (MERL), Cambridge, MA 02139, USA
Pu (Perry) Wang
Mitsubishi Electric Research Laboratories (MERL)
Radar Perception, Wireless Sensing, Wi-Fi/UWB/mmWave Sensing, Signal Processing, Deep Learning
Toshiaki Koike-Akino
MERL
Signal processing