Making Foundation Models Probabilistic via Singular Value Ensembles

📅 2026-01-29
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenge of miscalibrated predictions in foundation models, which often stem from overconfidence, and the prohibitive computational cost of traditional deep ensembles that hinders scalability to large models. To this end, the authors propose Singular Value Ensembling (SVE), a method that freezes the singular vectors of pretrained weight matrices—treating them as orthogonal “knowledge directions”—and fine-tunes only the corresponding singular values for each ensemble member. This approach enables implicit ensembling through modulation of singular values while introducing less than 1% additional trainable parameters. SVE achieves uncertainty quantification performance comparable to explicit ensembling across diverse NLP and vision tasks, significantly improving model calibration without compromising predictive accuracy.
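The "less than 1% additional trainable parameters" claim follows from simple counting: each ensemble member adds only min(m, n) singular values per m-by-n weight matrix, while the frozen matrix itself holds m*n entries. A small sketch with illustrative numbers (the matrix size and member count below are assumptions, not figures from the paper):

```python
# Rough parameter-overhead estimate for SVE (illustrative numbers, not from
# the paper): each member trains only min(m, n) singular values per m-by-n
# weight matrix, against m*n frozen entries in the matrix itself.
def sve_overhead(m, n, n_members):
    base = m * n                      # frozen pretrained parameters
    added = n_members * min(m, n)     # trainable per-member singular values
    return added / base

# E.g., a square 4096x4096 weight with a 10-member ensemble:
ratio = sve_overhead(4096, 4096, 10)
print(f"{ratio:.2%}")  # 10 * 4096 / 4096^2 ≈ 0.24%
```

Even a 10-member ensemble on a square weight stays well under the 1% overhead the summary cites.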

📝 Abstract
Foundation models have become a dominant paradigm in machine learning, achieving remarkable performance across diverse tasks through large-scale pretraining. However, these models often yield overconfident, uncalibrated predictions. The standard approach to quantifying epistemic uncertainty, training an ensemble of independent models, incurs prohibitive computational costs that scale linearly with ensemble size, making it impractical for large foundation models. We propose Singular Value Ensemble (SVE), a parameter-efficient implicit ensemble method built on a simple but powerful core assumption: that the singular vectors of the weight matrices constitute meaningful subspaces of the model's knowledge. Pretrained foundation models encode rich, transferable information in their weight matrices. If the singular vectors are indeed meaningful, orthogonal "knowledge directions", then an ensemble can be obtained by modulating only how strongly each direction contributes to the output. Rather than learning entirely new parameters, we freeze the singular vectors and train only per-member singular values that rescale the contribution of each direction in that shared knowledge basis. Ensemble diversity emerges naturally, as stochastic initialization and random mini-batch sampling during joint training cause different members to converge to different combinations of the same underlying knowledge. SVE achieves uncertainty quantification comparable to explicit deep ensembles while increasing the parameter count of the base model by less than 1%, making principled uncertainty estimation accessible in resource-constrained settings. We validate SVE on NLP and vision tasks with a variety of backbones and show that it improves calibration while maintaining predictive accuracy.
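The mechanics described in the abstract can be sketched as a single linear layer: take the SVD of a pretrained weight, freeze the singular vectors, and give each ensemble member its own trainable copy of the singular values. The class below is a minimal NumPy illustration under those assumptions, not the authors' implementation (class and method names are hypothetical):

```python
import numpy as np

class SVELinear:
    """Minimal sketch of a Singular Value Ensemble linear layer (hypothetical
    implementation, not the paper's code). The pretrained weight W is
    factorized as W = U @ diag(s) @ Vt; U and Vt are frozen "knowledge
    directions", and each ensemble member k owns trainable singular values s_k."""

    def __init__(self, W, n_members=4, init_noise=0.01, seed=0):
        U, s, Vt = np.linalg.svd(W, full_matrices=False)
        self.U, self.Vt = U, Vt  # frozen singular vectors
        rng = np.random.default_rng(seed)
        # Each member starts near the pretrained singular values; small random
        # perturbations (plus mini-batch noise in real training) drive diversity.
        self.s = s[None, :] * (1.0 + init_noise * rng.standard_normal((n_members, s.size)))

    def member_forward(self, x, k):
        # Apply member k's implicit weight U @ diag(s_k) @ Vt without
        # materializing it: x @ Vt.T rescaled by s_k, then mapped back by U.
        return (x @ self.Vt.T) * self.s[k] @ self.U.T

    def forward(self, x):
        # Mean ensemble prediction; the spread across member outputs can
        # serve as an epistemic-uncertainty signal.
        outs = [self.member_forward(x, k) for k in range(self.s.shape[0])]
        return np.mean(outs, axis=0)
```

With `init_noise=0` every member reproduces the pretrained layer exactly (`member_forward(x, k) == x @ W.T`), which makes the "same knowledge basis, different weightings" idea concrete: only the per-member scale vectors ever move during training.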
Problem

Research questions and friction points this paper is trying to address.

foundation models
epistemic uncertainty
model calibration
deep ensembles
computational cost
Innovation

Methods, ideas, or system contributions that make the work stand out.

Singular Value Ensemble
epistemic uncertainty
foundation models
parameter-efficient ensemble
model calibration