CLoQ: Enhancing Fine-Tuning of Quantized LLMs via Calibrated LoRA Initialization

📅 2025-01-30
📈 Citations: 0
Influential: 0
🤖 AI Summary
Low-bit quantization of large language models (LLMs) severely degrades LoRA fine-tuning performance due to the loss of numerical precision. Method: The paper proposes a calibrated initialization method based on layer-wise representation alignment. It first derives a closed-form optimal LoRA solution for quantized LLMs, then uses a small calibration dataset to initialize the LoRA components layer by layer. The approach combines quantization-aware LoRA initialization with a layer-wise error-minimization mechanism, enabling effective 2- and 3-bit weight quantization. Contribution/Results: On language generation, arithmetic reasoning, and commonsense reasoning benchmarks, the method significantly outperforms existing approaches, especially in ultra-low-bit regimes, while preserving fine-tuning accuracy and computational efficiency.

📝 Abstract
Fine-tuning large language models (LLMs) using low-rank adaptation (LoRA) has become a highly efficient approach for downstream tasks, particularly in scenarios with limited computational resources. However, applying LoRA techniques to quantized LLMs poses unique challenges due to the reduced representational precision of quantized weights. In this paper, we introduce CLoQ (Calibrated LoRA initialization for Quantized LLMs), a simple initialization strategy designed to overcome these challenges. Our approach focuses on minimizing the layer-wise discrepancy between the original LLM and its quantized counterpart with LoRA components during initialization. By leveraging a small calibration dataset, CLoQ quantizes a pre-trained LLM and determines the optimal LoRA components for each layer, ensuring a strong foundation for subsequent fine-tuning. A key contribution of this work is a novel theoretical result that enables the accurate, closed-form construction of these optimal LoRA components. We validate the efficacy of CLoQ across multiple tasks such as language generation, arithmetic reasoning, and commonsense reasoning, demonstrating that it consistently outperforms existing LoRA fine-tuning methods for quantized LLMs, especially at ultra-low bit widths.
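The abstract's core idea, minimizing the layer-wise discrepancy between the original weights and the quantized weights plus LoRA components using calibration data, can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the objective is to minimize ||XW − X(Q + AB)||_F over rank-r factors A, B given calibration activations X with full column rank, which admits a closed-form solution via a truncated SVD and a pseudoinverse. All names and shapes below are illustrative.

```python
import numpy as np

def calibrated_lora_init(W, Q, X, rank):
    """Closed-form calibrated LoRA initialization (illustrative sketch).

    Minimizes || X @ W - X @ (Q + A @ B) ||_F over rank-`rank` factors,
    assuming the calibration matrix X (n_samples, d_in) has full column rank.
    W, Q: (d_in, d_out) original and quantized weights.
    Returns A (d_in, rank) and B (rank, d_out).
    """
    R = W - Q                                   # quantization residual
    Y = X @ R                                   # residual's effect on calibration outputs
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    Ur, sr, Vtr = U[:, :rank], s[:rank], Vt[:rank]
    # Best rank-r fit of X @ R, pulled back through the pseudoinverse of X.
    A = np.linalg.pinv(X) @ (Ur * sr)
    B = Vtr
    return A, B
```

With `rank` equal to the full rank of the residual, the reconstruction is exact on the calibration data (X(Q + AB) = XW); smaller ranks trade off output error against adapter size, which is the regime the paper targets.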
Problem

Research questions and friction points this paper is trying to address.

Quantization
LoRA Fine-tuning
Numerical Precision
Innovation

Methods, ideas, or system contributions that make the work stand out.

CLoQ method
LoRA optimization
quantized LLMs precision enhancement