🤖 AI Summary
This paper identifies and explains the "epistemic uncertainty collapse" phenomenon—where larger deep learning models exhibit degraded uncertainty quantification despite increased capacity—challenging the prevailing assumption that scale inherently improves uncertainty estimation.
Method: The authors propose implicit ensembling within large models as the cause of this collapse; they then extract implicit ensembles and decompose large vision models (e.g., ViT) into diverse sub-models to restore predictive diversity, thereby recovering calibrated epistemic uncertainty estimates. Their approach combines ViT interpretability analysis, theoretical modeling, and cross-architecture empirical validation (MLPs, ResNets, ViTs).
Contribution/Results: The collapse is reproduced consistently across architectures, from explicit ensembles of ensembles to state-of-the-art vision models; the proposed sub-model decomposition recovers epistemic uncertainty, with implications for out-of-distribution detection and safety-critical applications, advancing principled uncertainty-aware scaling of vision models.
📝 Abstract
Epistemic uncertainty is crucial for safety-critical applications and out-of-distribution detection tasks. Yet, we uncover a paradoxical phenomenon in deep learning models: an epistemic uncertainty collapse as model complexity increases, challenging the assumption that larger models invariably offer better uncertainty quantification. We propose that this stems from implicit ensembling within large models. To support this hypothesis, we demonstrate epistemic uncertainty collapse empirically across various architectures, from explicit ensembles of ensembles and simple MLPs to state-of-the-art vision models, including ResNets and Vision Transformers -- for the latter, we examine implicit ensemble extraction and decompose larger models into diverse sub-models, recovering epistemic uncertainty. We provide theoretical justification for these phenomena and explore their implications for uncertainty estimation.
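The collapse described above can be illustrated with a small numerical sketch (not from the paper; a hypothetical toy model). A standard way to quantify epistemic uncertainty in an ensemble is the mutual information between the prediction and the ensemble member: the entropy of the mean predictive distribution minus the mean entropy of the members. If each "member" is itself an average of several base predictors (an ensemble of ensembles, mimicking implicit ensembling inside a large model), the members converge toward the grand mean and this disagreement term shrinks:

```python
import numpy as np

rng = np.random.default_rng(0)

def epistemic_uncertainty(member_probs):
    """Mutual information I(y; member) = H(mean prediction) - E[H(member)].

    member_probs: array of shape (num_members, num_classes),
    each row a predictive distribution from one ensemble member.
    """
    mean_p = member_probs.mean(axis=0)
    total_entropy = -np.sum(mean_p * np.log(mean_p + 1e-12))
    expected_entropy = -np.mean(
        np.sum(member_probs * np.log(member_probs + 1e-12), axis=1)
    )
    return total_entropy - expected_entropy

# Simulate 32 diverse base predictors over 3 classes (softmax of random logits).
M, C = 32, 3
logits = rng.normal(size=(M, C))
base = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)

def group_members(probs, k):
    """Form ensemble-of-ensemble members: each member averages k base models."""
    return probs.reshape(-1, k, probs.shape[1]).mean(axis=1)

# As k grows, members approach the grand mean and epistemic uncertainty collapses.
for k in (1, 4, 16):
    print(f"k={k:2d}  epistemic uncertainty = {epistemic_uncertainty(group_members(base, k)):.4f}")
```

Because entropy is concave, averaging base predictors into larger internal groups can only raise the expected member entropy while leaving the overall mean prediction unchanged, so the mutual information decreases monotonically with the internal group size k. This mirrors the paper's hypothesis that implicit ensembling inside large models suppresses the diversity needed for epistemic uncertainty, and why decomposing a large model back into diverse sub-models can recover it.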