AI Summary
This paper addresses the joint optimization of weight decay, temperature scaling, and early stopping in deep ensemble learning to simultaneously improve predictive accuracy and uncertainty calibration. To overcome the evaluation bias and suboptimal data utilization inherent in conventional independent hyperparameter tuning, we propose Partial Overlap Validation, a novel cross-validation strategy that enables valid joint assessment while maximizing training data usage. Our method integrates joint regularization, learnable temperature scaling, and adaptive early stopping within an enhanced cross-validation framework. Experiments across multiple benchmark tasks demonstrate that joint optimization consistently outperforms independent tuning: Expected Calibration Error (ECE) decreases by up to 32%, and Negative Log-Likelihood (NLL) improves significantly. Moreover, our validation strategy explicitly reveals the fundamental trade-off between individual and joint hyperparameter optimization. The implementation is publicly available.
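Two of the ingredients named above, learnable temperature scaling and ECE, are standard techniques that can be sketched compactly. The following is a minimal NumPy illustration, not the paper's implementation: it fits a single temperature T by minimizing validation NLL (a grid search stands in for the gradient-based fitting a real system would use) and measures calibration with a binned ECE estimate. All function names here are illustrative.

```python
import numpy as np

def softmax(logits, T=1.0):
    # Temperature-scaled softmax; T > 1 softens overconfident predictions.
    z = logits / T
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def nll(logits, labels, T):
    # Negative log-likelihood of the true labels under temperature T.
    p = softmax(logits, T)
    return -np.mean(np.log(p[np.arange(len(labels)), labels] + 1e-12))

def fit_temperature(logits, labels, grid=np.linspace(0.5, 5.0, 91)):
    # Pick the temperature minimizing validation NLL (illustrative grid search).
    return min(grid, key=lambda T: nll(logits, labels, T))

def expected_calibration_error(probs, labels, n_bins=15):
    # Binned ECE: weighted gap between confidence and accuracy per bin.
    conf = probs.max(axis=1)
    pred = probs.argmax(axis=1)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (conf > lo) & (conf <= hi)
        if mask.any():
            ece += mask.mean() * abs((pred[mask] == labels[mask]).mean() - conf[mask].mean())
    return ece
```

On synthetic overconfident logits (e.g. 80% accuracy but near-certain predictions), the fitted T exceeds 1 and the rescaled probabilities yield a lower ECE, which is the effect temperature scaling is meant to produce.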
Abstract
Deep ensembles are a powerful tool in machine learning, improving both model performance and uncertainty calibration. While ensembles are typically formed by training and tuning models individually, evidence suggests that jointly tuning the ensemble can lead to better performance. This paper investigates the impact of jointly tuning weight decay, temperature scaling, and early stopping on both predictive performance and uncertainty quantification. Additionally, we propose a partially overlapping holdout strategy as a practical compromise between enabling joint evaluation and maximizing the use of data for training. Our results demonstrate that jointly tuning the ensemble generally matches or improves performance, with significant variation in effect size across different tasks and metrics. We highlight the trade-offs between individual and joint optimization in deep ensemble training, with the overlapping holdout strategy offering an attractive practical solution. We believe our findings provide valuable insights and guidance for practitioners looking to optimize deep ensemble models. Code is available at: https://github.com/lauritsf/ensemble-optimality-gap
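The partially overlapping holdout idea can be made concrete with an index-splitting sketch. The paper's exact construction is not reproduced here; the version below is one plausible reading, with hypothetical names and split fractions: every ensemble member shares a small joint-validation subset (so the ensemble can be evaluated jointly on common data) plus its own disjoint holdout, leaving each member trained on most of the dataset.

```python
import numpy as np

def partial_overlap_splits(n_samples, n_members, shared_frac=0.1,
                           member_frac=0.1, seed=0):
    """Hypothetical sketch of a partially overlapping holdout.

    Each member's validation set = a shared subset (common to all members,
    enabling joint evaluation) + a member-specific disjoint holdout.
    Each member trains on everything outside its own validation set.
    """
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)
    n_shared = int(shared_frac * n_samples)
    shared = idx[:n_shared]          # joint-validation subset, held out by everyone
    rest = idx[n_shared:]
    n_member = int(member_frac * n_samples)
    splits = []
    for m in range(n_members):
        own = rest[m * n_member:(m + 1) * n_member]   # member-specific holdout
        val = np.concatenate([shared, own])           # partially overlapping holdout
        train = np.setdiff1d(rest, own)               # train on all remaining data
        splits.append((train, val))
    return splits
```

Compared with a fully shared holdout (biased joint evaluation on data every member tuned against) or fully disjoint holdouts (no common data for joint assessment), this interpolation keeps a common evaluation subset while each member still trains on roughly `1 - shared_frac - member_frac` of the data.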