🤖 AI Summary
Large language models (LLMs) often become overconfident after fine-tuning and generalize poorly from limited data, which undermines prediction reliability. To address this, we propose UQ4CT, a functional-level uncertainty quantification framework that operates during the fine-tuning stage rather than after it: it models the input-output mapping in function space so that epistemic uncertainty is calibrated as part of training. Methodologically, UQ4CT uses a mixture-of-experts (MoE) architecture that hierarchically decomposes the function space and is combined with parameter-efficient fine-tuning, so that epistemic uncertainty is modeled and calibrated jointly. Evaluated on five benchmarks, UQ4CT reduces expected calibration error (ECE) by more than 25% while maintaining high accuracy. It also retains better ECE and high accuracy under distribution shift.
📝 Abstract
Accurate uncertainty quantification of large language models (LLMs) provides a credibility measure over their outputs. However, fine-tuned LLMs often suffer from overconfidence in uncertain predictions because of their limited ability to generalize from limited data. Existing uncertainty quantification methods for parameter-efficient fine-tuning (PEFT) of LLMs focus on the post-fine-tuning stage and fall short of calibrating epistemic uncertainty. To address these limitations, we propose Functional-Level Uncertainty Quantification for Calibrated Fine-Tuning (UQ4CT), which captures and calibrates epistemic uncertainty over the space of functions that map input prompts to outputs. We implement UQ4CT during the fine-tuning stage via a mixture-of-experts framework that hierarchically decomposes the functional space. We demonstrate that UQ4CT reduces Expected Calibration Error (ECE) by more than $25\%$ while maintaining high accuracy across $5$ benchmarks. Even under distribution shift, UQ4CT maintains superior ECE performance with high accuracy, showcasing improved generalizability.
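For readers unfamiliar with the headline metric, the following is a minimal sketch of how ECE is commonly computed. It is not taken from the UQ4CT implementation. The abstract does not state the binning scheme, so the equal-width bins, the default of 10 bins, and the function name `expected_calibration_error` are assumptions made here for illustration. Each confidence is taken to be the probability the model assigns to its predicted answer.

```python
from typing import Sequence


def expected_calibration_error(
    confidences: Sequence[float],
    correct: Sequence[bool],
    n_bins: int = 10,  # assumption: the abstract does not specify a bin count
) -> float:
    """Equal-width-bin ECE: sample-weighted mean of |accuracy - confidence|."""
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    n = len(confidences)
    if n == 0:
        raise ValueError("at least one prediction is required")

    conf_sum = [0.0] * n_bins  # sum of confidences per bin
    hit_sum = [0] * n_bins     # number of correct predictions per bin
    count = [0] * n_bins       # number of predictions per bin

    for c, hit in zip(confidences, correct):
        if not 0.0 <= c <= 1.0:
            raise ValueError("confidences must lie in [0, 1]")
        # Bin k covers [k/n_bins, (k+1)/n_bins); a confidence of 1.0 goes to the last bin.
        k = min(int(c * n_bins), n_bins - 1)
        conf_sum[k] += c
        hit_sum[k] += 1 if hit else 0
        count[k] += 1

    ece = 0.0
    for k in range(n_bins):
        if count[k] == 0:
            continue  # empty bins contribute nothing
        avg_conf = conf_sum[k] / count[k]
        accuracy = hit_sum[k] / count[k]
        ece += (count[k] / n) * abs(accuracy - avg_conf)
    return ece


if __name__ == "__main__":
    # An overconfident model: 95% confidence, but only half the answers are right.
    print(expected_calibration_error([0.95, 0.95, 0.95, 0.95], [True, True, False, False]))
```

In the usage example all four predictions fall into one bin with average confidence 0.95 and accuracy 0.5, so the gap between the two is what the metric reports. A lower ECE means stated confidence tracks empirical accuracy more closely, which is the sense in which the abstract reports a reduction.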