BLoB: Bayesian Low-Rank Adaptation by Backpropagation for Large Language Models

📅 2024-06-17
🏛️ arXiv.org
📈 Citations: 1
Influential: 0
📄 PDF
🤖 AI Summary
To address the overconfidence and poor uncertainty quantification of large language models (LLMs) in few-shot domain adaptation, this paper proposes Bayesian Low-Rank Adaptation by Backpropagation (BLoB). BLoB embeds Bayesian inference end-to-end into the low-rank fine-tuning process, jointly optimizing both the means and covariances of the adapter parameters. It achieves a Bayesian parameterization via low-rank matrix decomposition and combines backpropagation-driven covariance learning with variational approximate inference, so that means and covariances are updated together through a fully differentiable objective. Compared to standard LoRA and post-training Bayesian approaches, BLoB significantly improves generalization and uncertainty calibration, both in-distribution and out-of-distribution, thereby avoiding the performance bottleneck inherent in post-hoc Bayesianization of pretrained LLMs.
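The core mechanism described above can be illustrated with a toy NumPy sketch: a frozen weight plus a low-rank update whose entries carry a Gaussian variational posterior, sampled with the reparameterization trick so that gradients would flow to both the mean and the (softplus-parameterized) standard deviation. All dimensions and names here are hypothetical, and placing the posterior on the A factor (with B deterministic and non-zero for visibility) is an illustrative assumption, not the paper's exact factorization.

```python
import numpy as np

rng = np.random.default_rng(0)

d, k, r = 8, 8, 2  # input dim, output dim, LoRA rank (toy sizes)

# Frozen pretrained weight (stands in for a linear layer of the LLM).
W0 = rng.standard_normal((k, d))

# Variational parameters of the low-rank update B @ A.
# Illustrative assumption: a Gaussian posterior q(A) = N(A_mu, softplus(A_rho)^2),
# learned jointly with the mean via backpropagation.
A_mu = rng.standard_normal((r, d)) * 0.01
A_rho = np.full((r, d), -5.0)           # softplus(rho) = std; trained alongside A_mu
B = rng.standard_normal((k, r)) * 0.1   # deterministic here; standard LoRA zero-init
                                        # skipped so sampling visibly affects the output

def softplus(x):
    return np.log1p(np.exp(x))

def sample_forward(x):
    # Reparameterization trick: A = mu + sigma * eps keeps the sample
    # differentiable w.r.t. both mu and rho, which is what lets BLoB
    # co-update mean and covariance during fine-tuning.
    eps = rng.standard_normal(A_mu.shape)
    A = A_mu + softplus(A_rho) * eps
    return x @ (W0 + B @ A).T

def predict(x, n_samples=16):
    # Uncertainty-aware prediction: average over posterior samples;
    # their spread gives a simple uncertainty estimate.
    return np.mean([sample_forward(x) for _ in range(n_samples)], axis=0)

x = rng.standard_normal((4, d))
y = predict(x)
```

In an actual training loop, a KL term between the Gaussian posterior and a prior would be added to the task loss, and `A_mu` / `A_rho` would be updated by gradient descent; the sketch only shows the stochastic forward pass.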

📝 Abstract
Large Language Models (LLMs) often suffer from overconfidence during inference, particularly when adapted to downstream domain-specific tasks with limited data. Previous work addresses this issue by employing approximate Bayesian estimation after the LLMs are trained, enabling them to quantify uncertainty. However, such post-training approaches' performance is severely limited by the parameters learned during training. In this paper, we go beyond post-training Bayesianization and propose Bayesian Low-Rank Adaptation by Backpropagation (BLoB), an algorithm that continuously and jointly adjusts both the mean and covariance of LLM parameters throughout the whole fine-tuning process. Our empirical results verify the effectiveness of BLoB in terms of generalization and uncertainty estimation, when evaluated on both in-distribution and out-of-distribution data.
Problem

Research questions and friction points this paper is trying to address.

Large Language Models
Overconfidence
Insufficient Training Data
Innovation

Methods, ideas, or system contributions that make the work stand out.

BLoB
Simultaneous Mean and Variance Adjustment
Enhanced Uncertainty Estimation
🔎 Similar Papers
2024-05-17 · Conference on Empirical Methods in Natural Language Processing · Citations: 7