On Joint Regularization and Calibration in Deep Ensembles

📅 2025-11-06
🏛️ Trans. Mach. Learn. Res.
📈 Citations: 0
✨ Influential: 0
📄 PDF
🤖 AI Summary
This paper addresses the joint optimization of weight decay, temperature scaling, and early stopping in deep ensemble learning to simultaneously improve predictive accuracy and uncertainty calibration. To overcome the evaluation bias and suboptimal data utilization inherent in conventional independent hyperparameter tuning, we propose Partial Overlap Validation, a novel cross-validation strategy that enables valid joint assessment while maximizing training data usage. Our method integrates joint regularization, learnable temperature scaling, and adaptive early stopping within an enhanced cross-validation framework. Experiments across multiple benchmark tasks demonstrate that joint optimization consistently outperforms independent tuning: Expected Calibration Error (ECE) decreases by up to 32%, and Negative Log-Likelihood (NLL) improves significantly. Moreover, our validation strategy explicitly reveals the fundamental trade-off between individual and joint hyperparameter optimization. The implementation is publicly available.
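Of the three components tuned jointly here, temperature scaling is the simplest to illustrate: a single post-hoc parameter T that rescales logits to minimize validation NLL. A minimal NumPy sketch (the grid-search fitting and the helper names are illustrative, not taken from the paper's code, which fits T by gradient descent in the usual formulation):

```python
import numpy as np

def softmax(logits, T=1.0):
    """Temperature-scaled softmax along the last axis."""
    z = logits / T
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def nll(logits, labels, T):
    """Mean negative log-likelihood at temperature T."""
    p = softmax(logits, T)
    return -np.mean(np.log(p[np.arange(len(labels)), labels] + 1e-12))

def fit_temperature(val_logits, val_labels, grid=np.linspace(0.5, 5.0, 200)):
    """Pick the temperature that minimizes validation NLL (grid search)."""
    losses = [nll(val_logits, val_labels, T) for T in grid]
    return grid[int(np.argmin(losses))]
```

For an overconfident model (sharp logits with some mistakes), the fitted temperature comes out above 1, flattening the predictive distribution and improving NLL without changing the argmax predictions.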

πŸ“ Abstract
Deep ensembles are a powerful tool in machine learning, improving both model performance and uncertainty calibration. While ensembles are typically formed by training and tuning models individually, evidence suggests that jointly tuning the ensemble can lead to better performance. This paper investigates the impact of jointly tuning weight decay, temperature scaling, and early stopping on both predictive performance and uncertainty quantification. Additionally, we propose a partially overlapping holdout strategy as a practical compromise between enabling joint evaluation and maximizing the use of data for training. Our results demonstrate that jointly tuning the ensemble generally matches or improves performance, with significant variation in effect size across different tasks and metrics. We highlight the trade-offs between individual and joint optimization in deep ensemble training, with the overlapping holdout strategy offering an attractive practical solution. We believe our findings provide valuable insights and guidance for practitioners looking to optimize deep ensemble models. Code is available at: https://github.com/lauritsf/ensemble-optimality-gap
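One plausible reading of the partially overlapping holdout strategy described above: each ensemble member's holdout set contains a block shared by all members (so joint evaluation of the ensemble remains valid) plus a small member-specific block, while everything else is used for training. The sketch below is a hypothetical index-splitting scheme under that assumption; the fractions and exact construction are not from the paper:

```python
import numpy as np

def partial_overlap_splits(n, n_members, shared_frac=0.1, member_frac=0.1, seed=0):
    """Return (shared_indices, [(train_idx, holdout_idx), ...]).

    Each member's holdout = common shared block + member-specific block,
    so the shared block is held out by every member (usable for joint
    evaluation) while most of the data still trains each member.
    Illustrative sketch only, not the paper's exact scheme.
    """
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n)
    n_shared = int(n * shared_frac)
    shared = idx[:n_shared]
    rest = idx[n_shared:]          # candidates for training / member holdouts
    n_mem = int(n * member_frac)
    splits = []
    for m in range(n_members):
        own = rest[m * n_mem:(m + 1) * n_mem]      # member-specific holdout block
        holdout = np.concatenate([shared, own])
        train = np.setdiff1d(rest, own)            # excludes shared and own block
        splits.append((train, holdout))
    return shared, splits
```

The design point this illustrates is the compromise named in the abstract: a fully shared holdout wastes no extra data but cannot tune members individually, while fully disjoint holdouts shrink every member's training set; partial overlap sits between the two.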
Problem

Research questions and friction points this paper is trying to address.

Investigates joint tuning of regularization and calibration in deep ensembles
Proposes overlapping holdout strategy for practical ensemble evaluation
Analyzes trade-offs between individual versus joint optimization methods
Innovation

Methods, ideas, or system contributions that make the work stand out.

Jointly tuning weight decay and temperature scaling
Using partially overlapping holdout strategy
Optimizing ensemble calibration and performance simultaneously
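The calibration metric the results are reported in, Expected Calibration Error, is standard and worth making concrete: bin predictions by confidence, then average the gap between accuracy and mean confidence per bin, weighted by bin size. A self-contained NumPy implementation:

```python
import numpy as np

def expected_calibration_error(probs, labels, n_bins=15):
    """ECE: bin by confidence, sum bin-weighted |accuracy - confidence|."""
    conf = probs.max(axis=1)                    # predicted-class probability
    pred = probs.argmax(axis=1)
    correct = (pred == labels).astype(float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (conf > lo) & (conf <= hi)
        if mask.any():
            ece += mask.mean() * abs(correct[mask].mean() - conf[mask].mean())
    return ece
```

A perfectly calibrated and correct classifier scores 0; a classifier that is 90% confident but always wrong scores 0.9.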
Laurits Fredsgaard
Department of Applied Mathematics and Computer Science, Technical University of Denmark
Mikkel N. Schmidt
Technical University of Denmark
Machine learning · Source separation · Graph Neural Networks · Bayesian ML · Molecules and Materials