Improving reproducibility by controlling random seed stability in machine learning based estimation via bagging

📅 2026-04-19

🤖 AI Summary

This study addresses the instability of machine learning predictions across random seeds, which undermines the reproducibility of downstream debiased machine learning estimators. The authors formalize random seed stability via a concentration condition and prove that subbagging (bagging over subsamples) guarantees stability for any regression algorithm with bounded outcomes. They also propose adaptive cross-bagging, a cross-fitting procedure that removes seed dependence from both nuisance estimation and sample splitting. In numerical experiments, the method reaches the targeted level of stability at a small computational cost relative to standard practice, whereas alternative methods miss the target and incur large costs.
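To make the subbagging idea concrete, here is a minimal sketch. It is not the authors' implementation. `base_learner` is a hypothetical seed-dependent algorithm with outputs bounded in [0, 1], and the settings (`n_bags=200`, half-sized subsamples, 30 seeds) are illustrative assumptions. The sketch holds the data fixed, varies only the seed, and compares the seed-to-seed spread of a single fit against the spread of an average over subsample fits.

```python
import numpy as np

def base_learner(y_train, rng):
    # Hypothetical seed-dependent algorithm (an assumption for illustration):
    # averages a random handful of training outcomes and clips to [0, 1]
    # so that its output is bounded.
    picked = rng.choice(y_train, size=5, replace=False)
    return float(np.clip(picked.mean(), 0.0, 1.0))

def subbagged_learner(y_train, rng, n_bags=200, subsample_size=None):
    # Subbagging: average the base learner over subsamples drawn
    # without replacement from the training data.
    n = len(y_train)
    m = subsample_size or n // 2
    preds = []
    for _ in range(n_bags):
        idx = rng.choice(n, size=m, replace=False)
        preds.append(base_learner(y_train[idx], rng))
    return float(np.mean(preds))

def seed_spread(fit, y_train, seeds):
    # Standard deviation of the prediction across seeds, data held fixed.
    preds = [fit(y_train, np.random.default_rng(s)) for s in seeds]
    return float(np.std(preds))

data_rng = np.random.default_rng(0)
y = data_rng.uniform(0.0, 1.0, size=200)  # bounded outcomes, fixed dataset

seeds = range(30)
single_spread = seed_spread(base_learner, y, seeds)
bagged_spread = seed_spread(subbagged_learner, y, seeds)
print(f"single fit spread across seeds: {single_spread:.4f}")
print(f"subbagged spread across seeds:  {bagged_spread:.4f}")
```

In this toy setting the subbagged prediction varies far less across seeds than a single fit. The cost is `n_bags` fits instead of one.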

📝 Abstract

Predictions from machine learning algorithms can vary across random seeds, inducing instability in downstream debiased machine learning estimators. We formalize random seed stability via a concentration condition and prove that subbagging guarantees stability for any bounded-outcome regression algorithm. We introduce a new cross-fitting procedure, adaptive cross-bagging, which simultaneously eliminates seed dependence from both nuisance estimation and sample splitting in debiased machine learning. Numerical experiments confirm that the method achieves the targeted level of stability whereas alternatives do not. Our method incurs a small computational penalty relative to standard practice whereas alternative methods incur large penalties.
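The seed dependence that comes from sample splitting can be illustrated with a small simulation. The sketch below is not adaptive cross-bagging, whose details are not given here. It uses plain cross-fitting in a partially linear model and, as a generic stand-in, averages the estimate over repeated splits. The data-generating process, the polynomial nuisance fits, and all function names are assumptions made for illustration.

```python
import numpy as np

def fit_poly(x, t, degree=3):
    # Deterministic nuisance fit: least-squares polynomial regression.
    coef = np.polyfit(x, t, degree)
    return lambda z: np.polyval(coef, z)

def cross_fit_theta(x, d, y, split_seed, n_folds=2):
    # Cross-fitted residual-on-residual estimate of theta in
    # y = theta * d + g(x) + noise. Only the fold assignment uses the seed.
    n = len(y)
    folds = np.random.default_rng(split_seed).permutation(n) % n_folds
    d_res = np.empty(n)
    y_res = np.empty(n)
    for k in range(n_folds):
        test = folds == k
        train = ~test
        d_res[test] = d[test] - fit_poly(x[train], d[train])(x[test])
        y_res[test] = y[test] - fit_poly(x[train], y[train])(x[test])
    return float(d_res @ y_res / (d_res @ d_res))

def averaged_theta(x, d, y, split_seed, n_repeats=20):
    # Generic stand-in: average the estimate over several random splits.
    rng = np.random.default_rng(split_seed)
    seeds = rng.integers(0, 2**31 - 1, size=n_repeats)
    return float(np.mean([cross_fit_theta(x, d, y, int(s)) for s in seeds]))

# Simulated data with true theta = 1.0 (an assumption for illustration).
rng = np.random.default_rng(0)
n = 300
x = rng.uniform(-1.0, 1.0, size=n)
d = np.sin(2.0 * x) + 0.5 * rng.standard_normal(n)
y = 1.0 * d + x**2 + 0.5 * rng.standard_normal(n)

split_seeds = range(30)
single = [cross_fit_theta(x, d, y, s) for s in split_seeds]
averaged = [averaged_theta(x, d, y, s) for s in split_seeds]
single_split_spread = float(np.std(single))
averaged_split_spread = float(np.std(averaged))
print(f"spread across split seeds, one split:       {single_split_spread:.5f}")
print(f"spread across split seeds, 20 splits (avg): {averaged_split_spread:.5f}")
```

The data never change between runs, so any variation in the single-split estimate comes from the split seed alone. Averaging over splits shrinks that variation here, but at 20 times the cost, which is the kind of penalty the abstract attributes to alternative methods.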

Problem

Research questions and friction points this paper is trying to address.

reproducibility

random seed stability

machine learning

debiased estimation

bagging

Innovation

Methods, ideas, or system contributions that make the work stand out.

random seed stability

subbagging

adaptive cross-bagging