AI Summary
This paper addresses the joint optimization of weight decay, temperature scaling, and early stopping in deep ensemble learning to simultaneously improve predictive accuracy and uncertainty calibration. To overcome the evaluation bias and suboptimal data utilization inherent in conventional independent hyperparameter tuning, we propose Partial Overlap Validation, a novel cross-validation strategy that enables valid joint assessment while maximizing training data usage. Our method integrates joint regularization, learnable temperature scaling, and adaptive early stopping within an enhanced cross-validation framework. Experiments across multiple benchmark tasks demonstrate that joint optimization consistently outperforms independent tuning: Expected Calibration Error (ECE) decreases by up to 32%, and Negative Log-Likelihood (NLL) improves significantly. Moreover, our validation strategy explicitly reveals the fundamental trade-off between individual and joint hyperparameter optimization. The implementation is publicly available.
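Two of the ingredients named above, learnable temperature scaling and ECE, are standard techniques that can be sketched compactly. The following is a minimal NumPy illustration, not the paper's implementation: it fits a single temperature T by minimizing validation NLL (a grid search stands in for the gradient-based fitting a real system would use) and measures calibration with a binned ECE estimate. All function names here are illustrative.

```python
import numpy as np

def softmax(logits, T=1.0):
    # Temperature-scaled softmax; T > 1 softens overconfident predictions.
    z = logits / T
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def nll(logits, labels, T):
    # Negative log-likelihood of the true labels under temperature T.
    p = softmax(logits, T)
    return -np.mean(np.log(p[np.arange(len(labels)), labels] + 1e-12))

def fit_temperature(logits, labels, grid=np.linspace(0.5, 5.0, 91)):
    # Pick the temperature minimizing validation NLL (illustrative grid search).
    return min(grid, key=lambda T: nll(logits, labels, T))

def expected_calibration_error(probs, labels, n_bins=15):
    # Binned ECE: weighted gap between confidence and accuracy per bin.
    conf = probs.max(axis=1)
    pred = probs.argmax(axis=1)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (conf > lo) & (conf <= hi)
        if mask.any():
            ece += mask.mean() * abs((pred[mask] == labels[mask]).mean() - conf[mask].mean())
    return ece
```

On synthetic overconfident logits (e.g. 80% accuracy but near-certain predictions), the fitted T exceeds 1 and the rescaled probabilities yield a lower ECE, which is the effect temperature scaling is meant to produce.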
Abstract
Deep ensembles are a powerful tool in machine learning, improving both model performance and uncertainty calibration. While ensembles are typically formed by training and tuning models individually, evidence suggests that jointly tuning the ensemble can lead to better performance. This paper investigates the impact of jointly tuning weight decay, temperature scaling, and early stopping on both predictive performance and uncertainty quantification. Additionally, we propose a partially overlapping holdout strategy as a practical compromise between enabling joint evaluation and maximizing the use of data for training. Our results demonstrate that jointly tuning the ensemble generally matches or improves performance, with significant variation in effect size across different tasks and metrics. We highlight the trade-offs between individual and joint optimization in deep ensemble training, with the overlapping holdout strategy offering an attractive practical solution. We believe our findings provide valuable insights and guidance for practitioners looking to optimize deep ensemble models. Code is available at: https://github.com/lauritsf/ensemble-optimality-gap
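The partially overlapping holdout idea can be made concrete with an index-splitting sketch. The paper's exact construction is not reproduced here; the version below is one plausible reading, with hypothetical names and split fractions: every ensemble member shares a small joint-validation subset (so the ensemble can be evaluated jointly on common data) plus its own disjoint holdout, leaving each member trained on most of the dataset.

```python
import numpy as np

def partial_overlap_splits(n_samples, n_members, shared_frac=0.1,
                           member_frac=0.1, seed=0):
    """Hypothetical sketch of a partially overlapping holdout.

    Each member's validation set = a shared subset (common to all members,
    enabling joint evaluation) + a member-specific disjoint holdout.
    Each member trains on everything outside its own validation set.
    """
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)
    n_shared = int(shared_frac * n_samples)
    shared = idx[:n_shared]          # joint-validation subset, held out by everyone
    rest = idx[n_shared:]
    n_member = int(member_frac * n_samples)
    splits = []
    for m in range(n_members):
        own = rest[m * n_member:(m + 1) * n_member]   # member-specific holdout
        val = np.concatenate([shared, own])           # partially overlapping holdout
        train = np.setdiff1d(rest, own)               # train on all remaining data
        splits.append((train, val))
    return splits
```

Compared with a fully shared holdout (biased joint evaluation on data every member tuned against) or fully disjoint holdouts (no common data for joint assessment), this interpolation keeps a common evaluation subset while each member still trains on roughly `1 - shared_frac - member_frac` of the data.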