🤖 AI Summary
This work addresses the poor calibration and performance degradation under distribution shift observed in existing parameter-efficient fine-tuning methods such as LoRA, which lack reliable uncertainty estimation. To overcome this limitation, we propose Stiefel-Bayes Adapters (SBA), the first approach to embed Bayesian inference directly on the Stiefel manifold for adapter weights. SBA places a Matrix Langevin prior over the adapter factors, performs a Laplace approximation in the tangent space, and employs geodesic retraction for posterior inference, preserving subspace orthogonality and conditioning by construction while avoiding the variance inflation caused by conventional projection from ambient space. Experiments demonstrate that SBA matches LoRA's task performance while reducing expected calibration error by 18–34% and improving out-of-distribution (OOD) selective prediction AUROC by 12–25%. Notably, SBA surpasses an ensemble of five LoRA models in OOD detection while using fewer parameters.
📝 Abstract
Parameter-efficient fine-tuning methods such as LoRA enable practical adaptation of large language models but provide no principled uncertainty estimates, leading to poorly calibrated predictions and unreliable behavior under domain shift. We introduce Stiefel-Bayes Adapters (SBA), a Bayesian PEFT framework that places a Matrix Langevin prior over orthonormal adapter factors on the Stiefel manifold and performs approximate posterior inference via a tangent-space Laplace approximation with geodesic retraction. Unlike Gaussian priors in flat space projected onto orthogonality constraints, a prior defined on the manifold naturally encodes the inductive bias that adapter subspaces should be well conditioned and orthogonal, while the posterior provides calibrated predictive uncertainty without recalibration. We formally prove that the tangent-space approximation strictly avoids the structural variance inflation inherent in projecting from ambient space, establishing a rigorous theoretical advantage for intrinsic manifold inference. Across GLUE and SuperGLUE benchmarks on RoBERTa-large, LLaMA-2-7B, LLaMA-2-13B, Mistral-7B, and Qwen2.5-7B, domain-shift evaluations, selective prediction protocols, and an abstractive summarization task, SBA achieves task performance comparable to LoRA and DoRA while reducing Expected Calibration Error by 18–34% over deterministic baselines, improving selective prediction AUROC by 12–25% under domain shift, and outperforming deep ensembles of five LoRA models on OOD detection at a fraction of the parameter cost. Our results demonstrate that placing uncertainty on the right geometric structure matters more than simply adding any Bayesian treatment to adapters.
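The core geometric step of the abstract, sampling an adapter factor perturbation in the tangent space of the Stiefel manifold and mapping it back with a retraction, can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the isotropic tangent noise stands in for the Laplace posterior covariance, and a QR-based retraction stands in for the geodesic retraction; the function names (`stiefel_tangent_sample`, `qr_retract`) are illustrative.

```python
import numpy as np

def stiefel_tangent_sample(Q, scale, rng):
    """Sample a tangent vector at Q on the Stiefel manifold St(n, k).

    Q has orthonormal columns. An ambient Gaussian draw Z is projected
    onto the tangent space T_Q via  Z - Q sym(Q^T Z),  sym(A) = (A + A^T)/2.
    (Isotropic noise is a stand-in for a learned Laplace covariance.)
    """
    Z = scale * rng.standard_normal(Q.shape)
    sym = 0.5 * (Q.T @ Z + Z.T @ Q)
    return Z - Q @ sym

def qr_retract(Q, V):
    """First-order retraction: map Q + V back onto the manifold via QR.

    A stand-in for the geodesic retraction; cheaper, agrees to first order.
    """
    Qn, R = np.linalg.qr(Q + V)
    return Qn * np.sign(np.diag(R))  # fix QR sign ambiguity

rng = np.random.default_rng(0)
Q0, _ = np.linalg.qr(rng.standard_normal((8, 2)))  # point on St(8, 2)
V = stiefel_tangent_sample(Q0, 0.1, rng)
Q1 = qr_retract(Q0, V)

# Tangency: Q0^T V is skew-symmetric; retraction preserves orthonormality.
assert np.allclose(Q0.T @ V + (Q0.T @ V).T, 0.0, atol=1e-10)
assert np.allclose(Q1.T @ Q1, np.eye(2), atol=1e-10)
```

Because each posterior sample is retracted back onto the manifold rather than projected from ambient space, the sampled factors remain exactly orthonormal, which is the structural property the abstract credits with avoiding variance inflation.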