ADAMIX: Adaptive Mixed-Precision Delta-Compression with Quantization Error Optimization for Large Language Models

📅 2025-06-05
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Efficient deployment of many fine-tuned large language models (LLMs) in multi-tenant settings remains challenging, as existing delta-weight quantization methods suffer substantial fidelity degradation at high compression ratios. Method: ADAMIX is an adaptive mixed-precision delta-compression framework. It first derives a closed-form expression for quantization error, then formulates bit allocation as a 0/1 integer linear programming problem under a target compression-ratio constraint, yielding a theoretically optimal precision assignment. The framework jointly exploits the sparsity of fine-tuning deltas and mixed-precision quantization. Results: On AIME2024 and GQA, 7B models improve over the best baseline Delta-CoMe by 22.3% and 6.1%, respectively, significantly outperforming state-of-the-art approaches while enabling higher compression ratios with preserved model performance and a better efficiency-fidelity trade-off.
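The summary describes bit allocation as a constrained 0/1 integer linear program. The exact error expression and cost coefficients come from the paper's derivation; the schematic below is only a generic sketch of how such a bit-allocation ILP is typically posed, and all symbols (x_{g,b}, e_{g,b}, c_{g,b}, C) are introduced here for illustration rather than taken from the paper.

```latex
% Schematic bit-allocation ILP; symbols are illustrative, not the paper's notation.
% x_{g,b} = 1 iff singular-vector group g is quantized at bit width b.
\begin{aligned}
\min_{x_{g,b}\in\{0,1\}}\quad & \sum_{g}\sum_{b\in\mathcal{B}} e_{g,b}\,x_{g,b}
  && \text{(total quantization error)}\\
\text{s.t.}\quad & \sum_{b\in\mathcal{B}} x_{g,b} = 1 \;\;\forall g
  && \text{(exactly one bit width per group)}\\
& \sum_{g}\sum_{b\in\mathcal{B}} c_{g,b}\,x_{g,b} \le C
  && \text{(compression-ratio budget)}
\end{aligned}
```

Here e_{g,b} would be the derived quantization error of group g at bit width b, c_{g,b} its storage cost, and C the budget implied by the target compression ratio.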

📝 Abstract
Large language models (LLMs) achieve impressive performance on various knowledge-intensive and complex reasoning tasks in different domains. In certain scenarios like multi-tenant serving, a large number of LLMs finetuned from the same base model are deployed to meet complex requirements for users. Recent works explore delta-compression approaches to quantize and compress the delta parameters between the customized LLM and the corresponding base model. However, existing works either exhibit unsatisfactory performance at high compression ratios or depend on empirical bit allocation schemes. In this work, we propose ADAMIX, an effective adaptive mixed-precision delta-compression framework. We provide a mathematical derivation of quantization error to motivate our mixed-precision compression strategy and formulate the optimal mixed-precision bit allocation scheme as the solution to a 0/1 integer linear programming problem. Our derived bit allocation strategy minimizes the quantization error while adhering to a predefined compression ratio requirement. Experimental results on various models and benchmarks demonstrate that our approach surpasses the best baseline by a considerable margin. On tasks like AIME2024 and GQA, where the norm of $\Delta \mathbf{W}$ is large and the base model lacks sufficient ability, ADAMIX outperforms the best baseline Delta-CoMe by 22.3% and 6.1% with 7B models, respectively.
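As context for the SVD-space view used throughout, here is a minimal NumPy sketch of delta extraction and grouping of singular vectors, assuming the Delta-CoMe-style setup in which groups of singular triplets are later quantized at different bit widths. Function names, group sizes, and shapes are illustrative and not taken from the ADAMIX code.

```python
# Illustrative sketch of SVD-based delta decomposition for mixed-precision
# compression; names and grouping are hypothetical, not the paper's code.
import numpy as np

def decompose_delta(w_finetuned: np.ndarray, w_base: np.ndarray, group_sizes):
    """Split the delta weight into singular-vector groups of decreasing energy."""
    delta = w_finetuned - w_base                      # Delta W = W_ft - W_base
    u, s, vt = np.linalg.svd(delta, full_matrices=False)
    groups, start = [], 0
    for size in group_sizes:                          # e.g. [8, 32, 88] singular triplets
        end = start + size
        groups.append((u[:, start:end], s[start:end], vt[start:end, :]))
        start = end
    return groups

def reconstruct(groups):
    """Rebuild an approximation of Delta W from (possibly quantized) groups."""
    return sum(u @ np.diag(s) @ vt for u, s, vt in groups)

# Toy usage: a 128x128 delta split into three groups.
rng = np.random.default_rng(0)
w_base = rng.standard_normal((128, 128))
w_ft = w_base + 0.05 * rng.standard_normal((128, 128))
groups = decompose_delta(w_ft, w_base, group_sizes=[8, 32, 88])
err = np.linalg.norm(w_ft - w_base - reconstruct(groups))
print(f"lossless reconstruction error: {err:.2e}")    # ~0 before any quantization
```

Splitting the delta this way is what makes mixed precision meaningful: the leading group carries most of the energy and tolerates the least error, so it is the natural candidate for higher bit widths.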
Problem

Research questions and friction points this paper is trying to address.

Minimizing quantization error in SVD space for delta compression (see the sketch after this list)
Providing theoretical justification for the mixed-precision compression approach
Solving a 0/1 integer linear program to obtain a practical quantization solution
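The first point concerns minimizing quantization error in SVD space. The toy sketch below measures that error empirically by quantizing one singular-vector group at several candidate bit widths; the paper instead derives the error in closed form, and the symmetric uniform quantizer here is only an assumption for illustration.

```python
# Illustrative only: simulate uniform symmetric quantization at several bit
# widths and measure the resulting error of one singular-vector group.
import numpy as np

def quantize_uniform(x: np.ndarray, bits: int) -> np.ndarray:
    """Symmetric uniform quantization of x to the given bit width."""
    if bits >= 16:
        return x.copy()                      # treat >=16 bits as effectively lossless
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(x).max() / qmax
    return np.round(x / scale).clip(-qmax, qmax) * scale

def group_error(u, s, vt, bits: int) -> float:
    """Frobenius error introduced by quantizing U and V of one SVD group."""
    approx = quantize_uniform(u, bits) @ np.diag(s) @ quantize_uniform(vt, bits)
    return float(np.linalg.norm(u @ np.diag(s) @ vt - approx))

rng = np.random.default_rng(0)
u, s, vt = np.linalg.svd(rng.standard_normal((64, 64)), full_matrices=False)
for bits in (2, 3, 4, 8):
    print(bits, "bits ->", round(group_error(u[:, :8], s[:8], vt[:8, :], bits), 4))
```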
Innovation

Methods, ideas, or system contributions that make the work stand out.

SVD-based quantization error minimization for compression
Adaptive mixed-precision delta-compression framework ADAMIX
Solving a 0/1 integer linear program for optimal bit allocation (see the sketch after this list)
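The final step is the 0/1 integer linear program that picks one bit width per group under a compression budget. The toy version below enumerates a three-group instance exhaustively instead of calling an ILP solver, and the error and cost numbers are made up purely for illustration.

```python
# Minimal sketch of the bit-allocation step, assuming per-group error and cost
# tables. ADAMIX solves a 0/1 ILP; this toy version brute-forces a tiny case.
import itertools

bit_choices = [2, 3, 4, 8]
# Hypothetical numbers: error[g][b] = quantization error of group g at bit width b.
error = {0: {2: 9.0, 3: 4.1, 4: 1.8, 8: 0.1},
         1: {2: 3.0, 3: 1.2, 4: 0.5, 8: 0.02},
         2: {2: 0.8, 3: 0.3, 4: 0.1, 8: 0.01}}
group_params = {0: 1.0, 1: 1.0, 2: 1.0}      # relative parameter counts per group
budget = 3.0                                  # target average bit width

best = None
for assign in itertools.product(bit_choices, repeat=len(error)):
    avg_bits = sum(b * group_params[g] for g, b in enumerate(assign)) / sum(group_params.values())
    if avg_bits > budget:
        continue                              # violates the compression-ratio constraint
    total_err = sum(error[g][b] for g, b in enumerate(assign))
    if best is None or total_err < best[0]:
        best = (total_err, assign)

print("best bit widths per group:", best[1], "with total error", best[0])
```

On this toy instance the search assigns more bits to the group with the largest error, which is the behavior the adaptive allocation is meant to capture.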
Boya Xiong
Shanghai University of Finance and Economics
Shuo Wang
Tsinghua University
Weifeng Ge
Fudan University
Humanoid Robot, Computer Vision, Artificial Intelligence, AI4Science
Guanhua Chen
Southern University of Science and Technology
Yun Chen
Shanghai University of Finance and Economics