🤖 AI Summary
To address the challenges of quantifying uncertainty, elucidating knowledge evolution mechanisms, and assessing prediction reliability after fine-tuning large language models (LLMs), this paper proposes a Bayesian posterior approximation method based on LoRA adapter ensembles. It is the first to integrate low-rank adapter ensembles with variational posterior approximation, enabling efficient and interpretable uncertainty modeling. Evaluated on Mistral-7B, the method quantitatively characterizes the dynamic trade-off between prior knowledge retention and domain adaptation during fine-tuning, revealing the counterintuitive phenomenon that acquired knowledge remains strongly preserved even in overfitting regimes. Comprehensive evaluation on multiple-choice benchmarks—including MMLU, ARC, and HellaSwag—demonstrates significant improvements in uncertainty calibration accuracy and predictive confidence assessment.
📝 Abstract
Fine-tuning large language models can improve task specific performance, although a general understanding of what the fine-tuned model has learned, forgotten and how to trust its predictions is still missing. We derive principled uncertainty quantification for fine-tuned LLMs with posterior approximations using computationally efficient low-rank adaptation ensembles. We analyze three common multiple-choice datasets using low-rank adaptation ensembles based on Mistral-7b, and draw quantitative and qualitative conclusions on their perceived complexity and balance between retained prior knowledge and domain specific adaptation during and after fine-tuning. We identify unexpected retention of acquired knowledge during fine-tuning in the overfitting regime.