Functional-level Uncertainty Quantification for Calibrated Fine-tuning on LLMs

📅 2024-10-09
🏛️ arXiv.org
📈 Citations: 0
Influential: 0

🤖 AI Summary
Large language models (LLMs) are often overconfident after fine-tuning because they generalize poorly from limited data, which undermines the reliability of their predictions. To address this, we propose UQ4CT, a functional-level uncertainty quantification framework that operates during the fine-tuning stage rather than after it: it captures and calibrates epistemic uncertainty over the space of functions that map input prompts to outputs. UQ4CT uses a mixture-of-experts (MoE) framework that hierarchically decomposes the functional space within a parameter-efficient fine-tuning setup. Across five benchmarks, UQ4CT reduces Expected Calibration Error (ECE) by more than 25% while maintaining high accuracy, and it retains superior ECE with high accuracy under distribution shift.

📝 Abstract
Accurate uncertainty quantification of large language models (LLMs) provides a credibility measure over their outputs. However, fine-tuned LLMs are often overconfident in uncertain predictions because they generalize poorly from limited data. Existing uncertainty quantification methods for parameter-efficient fine-tuning (PEFT) of LLMs focus on the post-fine-tuning stage and fall short of calibrating epistemic uncertainty. To address these limitations, we propose Functional-Level Uncertainty Quantification for Calibrated Fine-Tuning (UQ4CT), which captures and calibrates epistemic uncertainty over the space of functions that map input prompts to outputs. We implement UQ4CT during the fine-tuning stage via a mixture-of-experts framework that hierarchically decomposes the functional space. We demonstrate that UQ4CT reduces Expected Calibration Error (ECE) by more than $25\%$ while maintaining high accuracy across $5$ benchmarks. Even under distribution shift, UQ4CT maintains superior ECE performance with high accuracy, showcasing improved generalizability.
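Since the headline result is stated in terms of Expected Calibration Error, a minimal sketch of the standard binned ECE estimator may help make the metric concrete. This is not the paper's evaluation code: the function name `expected_calibration_error`, the choice of 10 equal-width bins, and the use of NumPy are assumptions made for illustration. The estimator groups predictions by confidence, then averages the gap between accuracy and mean confidence in each bin, weighted by the fraction of samples in that bin.

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """Binned ECE estimate (illustrative; bin count is an assumption).

    confidences: predicted probability of the chosen answer, in [0, 1].
    correct:     1 if the prediction was right, 0 otherwise.
    """
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)

    # Equal-width bin edges on [0, 1]; bins are (lo, hi] intervals,
    # with a confidence of exactly 0 falling into the first bin.
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    bin_ids = np.digitize(confidences, edges[1:-1], right=True)

    ece = 0.0
    for b in range(n_bins):
        mask = bin_ids == b
        if not mask.any():
            continue  # empty bins contribute nothing
        accuracy = correct[mask].mean()
        avg_confidence = confidences[mask].mean()
        # Weight the |accuracy - confidence| gap by the bin's sample share.
        ece += mask.mean() * abs(accuracy - avg_confidence)
    return float(ece)
```

For instance, a model that answers four questions with confidence 0.9 each but gets only three right has accuracy 0.75 in that bin, so this estimator reports a gap of about 0.15; an overconfident fine-tuned model shows up as confidence consistently above accuracy.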
Problem

Research questions and friction points this paper is trying to address.

Overconfidence in Predictions
Limited Learning with Small Data
Credibility Reduction in Large Language Models
Innovation

Methods, ideas, or system contributions that make the work stand out.

UQ4CT
Uncertainty Quantification
Expert Mixture Integration