Scalable Bayesian Low-Rank Adaptation of Large Language Models via Stochastic Variational Subspace Inference

📅 2025-06-26
📈 Citations: 0
Influential: 0
🤖 AI Summary
Reliable uncertainty quantification for large language models (LLMs) is critical in high-stakes applications, yet existing Bayesian approaches incur prohibitive parameter overhead that hinders scaling to modern LLMs. To address this, the paper proposes ScalaBL (Scalable Bayesian Low-Rank Adaptation via Stochastic Variational Subspace Inference), a framework that repurposes the LoRA factors as projection matrices and performs Bayesian inference within an r-dimensional subspace, introducing only ~1,000 additional parameters. ScalaBL scales Bayesian fine-tuning to base models with four times as many parameters as prior work while preserving computational efficiency. Extensive experiments show that ScalaBL matches or exceeds state-of-the-art uncertainty estimation methods across diverse tasks, including predictive calibration, out-of-distribution detection, and selective prediction, advancing the practicality and scalability of Bayesian calibration for LLMs.
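
As a concrete reading of the summary (the exact parameterization below is an assumption for illustration, not quoted from the paper): with LoRA factors $B \in \mathbb{R}^{d \times r}$ and $A \in \mathbb{R}^{r \times k}$ repurposed as projection matrices, Bayesian inference can be restricted to an $r$-dimensional vector $s$ with a diagonal Gaussian variational posterior,

$$W = W_0 + B\,\mathrm{diag}(s)\,A, \qquad s \sim q_\phi(s) = \mathcal{N}\big(\mu, \mathrm{diag}(\sigma^2)\big), \qquad \phi = \{\mu, \sigma\} \in \mathbb{R}^{2r},$$

so that only the $2r$ variational parameters per adapted weight are added on top of standard LoRA, which is consistent in spirit with the reported ~1,000 additional parameters overall.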

📝 Abstract
Despite their widespread use, large language models (LLMs) are known to hallucinate incorrect information and be poorly calibrated. This makes the uncertainty quantification of these models of critical importance, especially in high-stakes domains, such as autonomy and healthcare. Prior work has made Bayesian deep learning-based approaches to this problem more tractable by performing inference over the low-rank adaptation (LoRA) parameters of a fine-tuned model. While effective, these approaches struggle to scale to larger LLMs because they require additional parameters on top of LoRA. In this work we present $\textbf{Scala}$ble $\textbf{B}$ayesian $\textbf{L}$ow-Rank Adaptation via Stochastic Variational Subspace Inference (ScalaBL). We perform Bayesian inference in an $r$-dimensional subspace, for LoRA rank $r$. By repurposing the LoRA parameters as projection matrices, we are able to map samples from this subspace into the full weight space of the LLM. This allows us to learn all the parameters of our approach using stochastic variational inference. Despite the low dimensionality of our subspace, we are able to achieve competitive performance with state-of-the-art approaches while only requiring ${\sim}1000$ additional parameters. Furthermore, it allows us to scale up to the largest Bayesian LLM to date, with four times as many base parameters as prior work.
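
To make the reparameterization concrete, the following is a minimal PyTorch-style sketch of a ScalaBL-like linear layer. The class name, the diagonal Gaussian posterior over the r-dimensional subspace, and the B diag(s) A form of the projected weight update are assumptions made for illustration, not the authors' reference implementation.

import torch
import torch.nn as nn

class ScalaBLLinear(nn.Module):
    """Sketch of a LoRA layer with a Bayesian r-dimensional subspace (assumed form)."""

    def __init__(self, base_linear: nn.Linear, r: int = 8):
        super().__init__()
        d_out, d_in = base_linear.weight.shape
        self.base = base_linear                      # frozen pretrained weight W0
        self.base.weight.requires_grad_(False)
        # LoRA factors, repurposed here as projection matrices
        self.A = nn.Parameter(torch.randn(r, d_in) * 0.01)
        self.B = nn.Parameter(torch.zeros(d_out, r))
        # Variational posterior q(s) = N(mu, diag(sigma^2)) over the r-dim subspace
        self.mu = nn.Parameter(torch.zeros(r))
        self.log_sigma = nn.Parameter(torch.full((r,), -3.0))

    def sample_s(self) -> torch.Tensor:
        # Reparameterization trick: s = mu + sigma * eps, eps ~ N(0, I)
        eps = torch.randn_like(self.mu)
        return self.mu + self.log_sigma.exp() * eps

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        s = self.sample_s()
        # Project the r-dim sample into the full weight space via B diag(s) A
        delta = self.B @ torch.diag(s) @ self.A
        return nn.functional.linear(x, self.base.weight + delta, self.base.bias)

In this sketch only mu and log_sigma (2r scalars per adapted layer) are stochastic; A and B are learned deterministically alongside them, in the spirit of the abstract's ${\sim}1000$ additional parameters over standard LoRA.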
Problem

Research questions and friction points this paper is trying to address.

Quantify uncertainty in large language models to reduce hallucinations
Scale Bayesian inference for large models with low-rank adaptation
Achieve competitive performance with minimal additional parameters
Innovation

Methods, ideas, or system contributions that make the work stand out.

Bayesian inference in r-dimensional subspace
Repurpose LoRA parameters as projection matrices
Stochastic variational inference for learning (see the sketch after this list)
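
Below is a hedged sketch of how the stochastic variational inference step listed above could look, building on the ScalaBLLinear sketch earlier. It assumes a standard Monte Carlo ELBO with a single posterior sample and a standard normal prior over the subspace vector; the prior choice, KL weighting, batch format, and loop details are illustrative assumptions rather than the paper's exact procedure.

import torch

def elbo_step(model, batch, optimizer, kl_weight: float = 1.0):
    """One SVI update: Monte Carlo ELBO with a single posterior sample (sketch)."""
    optimizer.zero_grad()
    # Forward pass draws s ~ q(s) inside each Bayesian subspace layer
    logits = model(batch["input_ids"])
    nll = torch.nn.functional.cross_entropy(
        logits.view(-1, logits.size(-1)), batch["labels"].view(-1)
    )
    # Closed-form KL[q(s) || N(0, I)] summed over all Bayesian subspace layers
    kl = 0.0
    for m in model.modules():
        if hasattr(m, "mu") and hasattr(m, "log_sigma"):
            var = (2 * m.log_sigma).exp()
            kl = kl + 0.5 * (var + m.mu ** 2 - 1.0 - 2 * m.log_sigma).sum()
    loss = nll + kl_weight * kl  # negative ELBO (up to constants)
    loss.backward()
    optimizer.step()
    return loss.item()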
🔎 Similar Papers
No similar papers found.
Colin Samplawski
Neuro-Symbolic Computing and Intelligence Research Group, Computer Science Laboratory, SRI International
Adam D. Cobb
Neuro-Symbolic Computing and Intelligence Research Group, Computer Science Laboratory, SRI International
Manoj Acharya
SRI International
Artificial Intelligence, Computer Vision, NLP, Visual Question Answering
Ramneet Kaur
Advanced Computer Scientist, SRI
Trustworthy AI, Interpretability, Reliability, Conformal Prediction, GenAI
Susmit Jha
Director, Neurosymbolic Computing and Intelligence, SRI International
Artificial Intelligence, Autonomy, Formal Methods, Machine Learning