Scalable Bayesian Low-Rank Adaptation of Large Language Models via Stochastic Variational Subspace Inference

📅 2025-06-26
📈 Citations: 0
Influential: 0
🤖 AI Summary
Reliable uncertainty quantification for large language models (LLMs) is critical in high-stakes applications, yet existing Bayesian approaches incur prohibitive parameter overhead that hinders scaling to modern LLMs. To address this, the paper proposes ScalaBL (Scalable Bayesian Low-Rank Adaptation via Stochastic Variational Subspace Inference), a framework that repurposes the LoRA factors as projection matrices and performs Bayesian inference within an r-dimensional subspace, introducing only ~1,000 additional parameters. ScalaBL scales Bayesian fine-tuning to base models with four times as many parameters as prior work while preserving computational efficiency. Extensive experiments show that ScalaBL matches or exceeds state-of-the-art uncertainty estimation methods across diverse tasks, including predictive calibration, out-of-distribution detection, and selective prediction, advancing the practicality and scalability of Bayesian calibration for LLMs.
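
As a concrete reading of the summary (the exact parameterization below is an assumption for illustration, not quoted from the paper): with LoRA factors $B \in \mathbb{R}^{d \times r}$ and $A \in \mathbb{R}^{r \times k}$ repurposed as projection matrices, Bayesian inference can be restricted to an $r$-dimensional vector $s$ with a diagonal Gaussian variational posterior,

$$W = W_0 + B\,\mathrm{diag}(s)\,A, \qquad s \sim q_\phi(s) = \mathcal{N}\big(\mu, \mathrm{diag}(\sigma^2)\big), \qquad \phi = \{\mu, \sigma\} \in \mathbb{R}^{2r},$$

so that only the $2r$ variational parameters per adapted weight are added on top of standard LoRA, which is consistent in spirit with the reported ~1,000 additional parameters overall.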

📝 Abstract
Despite their widespread use, large language models (LLMs) are known to hallucinate incorrect information and be poorly calibrated. This makes the uncertainty quantification of these models of critical importance, especially in high-stakes domains, such as autonomy and healthcare. Prior work has made Bayesian deep learning-based approaches to this problem more tractable by performing inference over the low-rank adaptation (LoRA) parameters of a fine-tuned model. While effective, these approaches struggle to scale to larger LLMs because they require additional parameters on top of LoRA. In this work we present $\textbf{Scala}$ble $\textbf{B}$ayesian $\textbf{L}$ow-Rank Adaptation via Stochastic Variational Subspace Inference (ScalaBL). We perform Bayesian inference in an $r$-dimensional subspace, for LoRA rank $r$. By repurposing the LoRA parameters as projection matrices, we are able to map samples from this subspace into the full weight space of the LLM. This allows us to learn all the parameters of our approach using stochastic variational inference. Despite the low dimensionality of our subspace, we are able to achieve competitive performance with state-of-the-art approaches while only requiring ${\sim}1000$ additional parameters. Furthermore, it allows us to scale up to the largest Bayesian LLM to date, with four times as many base parameters as prior work.
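
To make the reparameterization concrete, the following is a minimal PyTorch-style sketch of a ScalaBL-like linear layer. The class name, the diagonal Gaussian posterior over the r-dimensional subspace, and the B diag(s) A form of the projected weight update are assumptions made for illustration, not the authors' reference implementation.

import torch
import torch.nn as nn

class ScalaBLLinear(nn.Module):
    """Sketch of a LoRA layer with a Bayesian r-dimensional subspace (assumed form)."""

    def __init__(self, base_linear: nn.Linear, r: int = 8):
        super().__init__()
        d_out, d_in = base_linear.weight.shape
        self.base = base_linear                      # frozen pretrained weight W0
        self.base.weight.requires_grad_(False)
        # LoRA factors, repurposed here as projection matrices
        self.A = nn.Parameter(torch.randn(r, d_in) * 0.01)
        self.B = nn.Parameter(torch.zeros(d_out, r))
        # Variational posterior q(s) = N(mu, diag(sigma^2)) over the r-dim subspace
        self.mu = nn.Parameter(torch.zeros(r))
        self.log_sigma = nn.Parameter(torch.full((r,), -3.0))

    def sample_s(self) -> torch.Tensor:
        # Reparameterization trick: s = mu + sigma * eps, eps ~ N(0, I)
        eps = torch.randn_like(self.mu)
        return self.mu + self.log_sigma.exp() * eps

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        s = self.sample_s()
        # Project the r-dim sample into the full weight space via B diag(s) A
        delta = self.B @ torch.diag(s) @ self.A
        return nn.functional.linear(x, self.base.weight + delta, self.base.bias)

In this sketch only mu and log_sigma (2r scalars per adapted layer) are stochastic; A and B are learned deterministically alongside them, in the spirit of the abstract's ${\sim}1000$ additional parameters over standard LoRA.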
Problem

Research questions and friction points this paper is trying to address.

Quantify uncertainty in large language models to reduce hallucinations
Scale Bayesian inference for large models with low-rank adaptation
Achieve competitive performance with minimal additional parameters
Innovation

Methods, ideas, or system contributions that make the work stand out.

Bayesian inference in r-dimensional subspace
Repurpose LoRA parameters as projection matrices
Stochastic variational inference for learning (see the sketch after this list)
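
Below is a hedged sketch of how the stochastic variational inference step listed above could look, building on the ScalaBLLinear sketch earlier. It assumes a standard Monte Carlo ELBO with a single posterior sample and a standard normal prior over the subspace vector; the prior choice, KL weighting, batch format, and loop details are illustrative assumptions rather than the paper's exact procedure.

import torch

def elbo_step(model, batch, optimizer, kl_weight: float = 1.0):
    """One SVI update: Monte Carlo ELBO with a single posterior sample (sketch)."""
    optimizer.zero_grad()
    # Forward pass draws s ~ q(s) inside each Bayesian subspace layer
    logits = model(batch["input_ids"])
    nll = torch.nn.functional.cross_entropy(
        logits.view(-1, logits.size(-1)), batch["labels"].view(-1)
    )
    # Closed-form KL[q(s) || N(0, I)] summed over all Bayesian subspace layers
    kl = 0.0
    for m in model.modules():
        if hasattr(m, "mu") and hasattr(m, "log_sigma"):
            var = (2 * m.log_sigma).exp()
            kl = kl + 0.5 * (var + m.mu ** 2 - 1.0 - 2 * m.log_sigma).sum()
    loss = nll + kl_weight * kl  # negative ELBO (up to constants)
    loss.backward()
    optimizer.step()
    return loss.item()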
🔎 Similar Papers
No similar papers found.
Colin Samplawski
Neuro-Symbolic Computing and Intelligence Research Group, Computer Science Laboratory, SRI International
Adam D. Cobb
Neuro-Symbolic Computing and Intelligence Research Group, Computer Science Laboratory, SRI International
Manoj Acharya
SRI International
Artificial Intelligence, Computer Vision, NLP, Visual Question Answering
Ramneet Kaur
Advanced Computer Scientist, SRI
Trustworthy AI, Interpretability, Reliability, Conformal Prediction, GenAI
Susmit Jha
Director, Neurosymbolic Computing and Intelligence, SRI International
Artificial Intelligence, Autonomy, Formal Methods, Machine Learning