BLoB: Bayesian Low-Rank Adaptation by Backpropagation for Large Language Models

📅 2024-06-17
🏛️ arXiv.org
📈 Citations: 1
Influential: 0
📄 PDF
🤖 AI Summary
To address the overconfidence and poor uncertainty quantification of large language models (LLMs) in few-shot domain adaptation, this paper proposes Bayesian Low-Rank Adaptation by Backpropagation (BLoB). BLoB embeds Bayesian inference end-to-end into the low-rank fine-tuning process, jointly optimizing both the means and covariances of the adapter parameters. It achieves a Bayesian parameterization via low-rank matrix decomposition and combines backpropagation-driven covariance learning with variational approximate inference, so that means and covariances are updated together through a fully differentiable objective. Compared to standard LoRA and post-training Bayesian approaches, BLoB significantly improves generalization and uncertainty calibration, both in-distribution and out-of-distribution, thereby avoiding the performance bottleneck inherent in post-hoc Bayesianization of pretrained LLMs.
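The core mechanism described above can be illustrated with a toy NumPy sketch: a frozen weight plus a low-rank update whose entries carry a Gaussian variational posterior, sampled with the reparameterization trick so that gradients would flow to both the mean and the (softplus-parameterized) standard deviation. All dimensions and names here are hypothetical, and placing the posterior on the A factor (with B deterministic and non-zero for visibility) is an illustrative assumption, not the paper's exact factorization.

```python
import numpy as np

rng = np.random.default_rng(0)

d, k, r = 8, 8, 2  # input dim, output dim, LoRA rank (toy sizes)

# Frozen pretrained weight (stands in for a linear layer of the LLM).
W0 = rng.standard_normal((k, d))

# Variational parameters of the low-rank update B @ A.
# Illustrative assumption: a Gaussian posterior q(A) = N(A_mu, softplus(A_rho)^2),
# learned jointly with the mean via backpropagation.
A_mu = rng.standard_normal((r, d)) * 0.01
A_rho = np.full((r, d), -5.0)           # softplus(rho) = std; trained alongside A_mu
B = rng.standard_normal((k, r)) * 0.1   # deterministic here; standard LoRA zero-init
                                        # skipped so sampling visibly affects the output

def softplus(x):
    return np.log1p(np.exp(x))

def sample_forward(x):
    # Reparameterization trick: A = mu + sigma * eps keeps the sample
    # differentiable w.r.t. both mu and rho, which is what lets BLoB
    # co-update mean and covariance during fine-tuning.
    eps = rng.standard_normal(A_mu.shape)
    A = A_mu + softplus(A_rho) * eps
    return x @ (W0 + B @ A).T

def predict(x, n_samples=16):
    # Uncertainty-aware prediction: average over posterior samples;
    # their spread gives a simple uncertainty estimate.
    return np.mean([sample_forward(x) for _ in range(n_samples)], axis=0)

x = rng.standard_normal((4, d))
y = predict(x)
```

In an actual training loop, a KL term between the Gaussian posterior and a prior would be added to the task loss, and `A_mu` / `A_rho` would be updated by gradient descent; the sketch only shows the stochastic forward pass.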

📝 Abstract
Large Language Models (LLMs) often suffer from overconfidence during inference, particularly when adapted to downstream domain-specific tasks with limited data. Previous work addresses this issue by employing approximate Bayesian estimation after the LLMs are trained, enabling them to quantify uncertainty. However, such post-training approaches' performance is severely limited by the parameters learned during training. In this paper, we go beyond post-training Bayesianization and propose Bayesian Low-Rank Adaptation by Backpropagation (BLoB), an algorithm that continuously and jointly adjusts both the mean and covariance of LLM parameters throughout the whole fine-tuning process. Our empirical results verify the effectiveness of BLoB in terms of generalization and uncertainty estimation, when evaluated on both in-distribution and out-of-distribution data.
Problem

Research questions and friction points this paper is trying to address.

Large Language Models
Overconfidence
Insufficient Training Data
Innovation

Methods, ideas, or system contributions that make the work stand out.

BLoB
Simultaneous Mean and Variance Adjustment
Enhanced Uncertainty Estimation
🔎 Similar Papers
2024-05-17 · Conference on Empirical Methods in Natural Language Processing · Citations: 7