🤖 AI Summary
To address the prohibitively high computational cost of ensembling and its poor compatibility with fine-tuning pipelines in large-model uncertainty quantification, this paper proposes an asymmetric dual-model collaboration framework: a backbone model (e.g., ViT-B) is coupled with a lightweight auxiliary model (e.g., ResNet-34), and their predictions are fused via learned weighted averaging. The authors provide a theoretical analysis and empirical validation demonstrating that a deliberately weaker auxiliary model systematically improves uncertainty calibration (reducing Expected Calibration Error), out-of-distribution detection (increasing AUROC), and selective classification, without degrading the backbone's accuracy. Departing from conventional homogeneous ensembling, the approach incurs only 10–20% additional computational overhead and consistently outperforms state-of-the-art uncertainty estimation methods across five standard image classification benchmarks.
📝 Abstract
The go-to strategy for applying deep networks in settings where uncertainty informs decisions, ensembling multiple training runs with random initializations, is ill-suited for today's extremely large-scale models and practical fine-tuning workflows. We introduce a new cost-effective strategy for improving the uncertainty quantification and downstream decisions of a large model (e.g., a fine-tuned ViT-B): coupling it with a less accurate but much smaller "sidekick" (e.g., a fine-tuned ResNet-34) with a fraction of the computational cost. We propose aggregating the predictions of this *Asymmetric Duo* by simple learned weighted averaging. Surprisingly, despite their inherent asymmetry, the sidekick model almost never harms the performance of the larger model. In fact, across five image classification benchmarks and a variety of model architectures and training schemes (including soups), Asymmetric Duos significantly improve accuracy, uncertainty quantification, and selective classification metrics with only ~10–20% more computation.
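The fusion step described above, combining a backbone's and a sidekick's predictions via learned weighted averaging, can be sketched minimally as follows. This is an illustrative sketch, not the paper's implementation: the function names and the example logits are hypothetical, and the mixing weight `alpha` is fixed here, whereas the paper learns it (e.g., on held-out data).

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over the last axis."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def duo_predict(backbone_logits, sidekick_logits, alpha):
    """Fuse an Asymmetric Duo's predictions by weighted averaging.

    alpha in [0, 1] weights the backbone; in the paper this scalar is
    learned, but it is fixed here purely for illustration.
    """
    p_backbone = softmax(backbone_logits)
    p_sidekick = softmax(sidekick_logits)
    return alpha * p_backbone + (1.0 - alpha) * p_sidekick

# Hypothetical logits for a single 3-class input.
backbone = np.array([[2.0, 0.5, 0.1]])   # larger, more accurate model
sidekick = np.array([[1.0, 1.2, 0.2]])   # smaller, cheaper model
probs = duo_predict(backbone, sidekick, alpha=0.8)
```

Because `alpha` weights two valid probability distributions, the fused output is itself a probability distribution, so standard uncertainty and selective-classification metrics apply to it directly.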