Validity in machine learning for extreme event attribution

📅 2025-11-24
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper identifies three critical limitations of machine learning (ML) in extreme event attribution (EEA): (1) high sensitivity of individual attribution estimates to algorithmic design choices; (2) weak correlation between conventional evaluation metrics (e.g., AUC, Brier score) and actual attribution error; and (3) substantial degradation of model robustness under climate distribution shift (e.g., temperature trends). Using California wildfire data (2003–2020), the authors propose a robust EEA framework comprising: (i) ensemble ML estimation to mitigate estimator sensitivity; (ii) mean calibration error as the primary evaluation metric; and (iii) subgroup analysis and propensity-based diagnostics to detect and account for distribution shift. Empirical results show that reliance on traditional metrics leads to suboptimal model selection, whereas the proposed framework markedly improves attribution accuracy and cross-scenario robustness, establishing a verifiable, interpretable ML methodology for high-stakes climate attribution.
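The summary's first remedy, aggregating attribution estimates across an ensemble of models rather than trusting any single algorithm, can be sketched as below. The model names and probability values are illustrative assumptions, not numbers from the paper; the fraction of attributable risk, FAR = 1 − p0/p1, is the standard EEA quantity comparing event probability under the factual (p1) and counterfactual (p0) climates.

```python
# Sketch: aggregate attribution estimates across several models to hedge
# against sensitivity to any one algorithm's design choices.
from statistics import mean

def far(p_factual, p_counterfactual):
    """Fraction of attributable risk: FAR = 1 - p0/p1."""
    return 1.0 - p_counterfactual / p_factual

# Hypothetical per-model event-probability estimates under the factual
# (observed) and counterfactual (no-warming) climates.
ensemble = {
    "logistic":          (0.12, 0.08),
    "random_forest":     (0.15, 0.07),
    "gradient_boosting": (0.11, 0.09),
}

per_model = {name: far(p1, p0) for name, (p1, p0) in ensemble.items()}
aggregate_far = mean(per_model.values())
```

Reporting `aggregate_far` (ideally with its spread across models) rather than a single model's FAR is the "ensemble ML estimation" idea: individual estimates may disagree substantially, while the aggregate is more stable.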

📝 Abstract
Extreme event attribution (EEA), an approach for assessing the extent to which disasters are caused by climate change, is crucial for informing climate policy and legal proceedings. Machine learning is increasingly used for EEA by modeling rare weather events otherwise too complex or computationally intensive to model using traditional simulation methods. However, the validity of using machine learning in this context remains unclear, particularly as high-stakes machine learning applications in general are criticized for inherent bias and lack of robustness. Here we use machine learning and simulation analyses to evaluate EEA in the context of California wildfire data from 2003-2020. We identify three major threats to validity: (1) individual event attribution estimates are highly sensitive to algorithmic design choices; (2) common performance metrics like area under the ROC curve or Brier score are not strongly correlated with attribution error, facilitating suboptimal model selection; and (3) distribution shift -- changes in temperature across climate scenarios -- substantially degrades predictive performance. To address these challenges, we propose a more valid and robust attribution analysis based on aggregate machine learning estimates, using an additional metric -- mean calibration error -- to assess model performance, and using subgroup and propensity diagnostics to assess distribution shift.
Problem

Research questions and friction points this paper is trying to address.

Evaluating machine learning validity for extreme event attribution
Identifying algorithmic sensitivity and metric limitations in attribution
Addressing distribution shift challenges in climate impact modeling
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses aggregate machine learning estimates for attribution
Employs mean calibration error for model assessment
Applies subgroup and propensity diagnostics for distribution shift
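The third bullet, propensity-style diagnostics for distribution shift, can be sketched with a simple separability check: if a score (here a single covariate such as temperature, standing in for a fitted propensity model) distinguishes factual from counterfactual samples with AUC well above 0.5, the two climate distributions have shifted and cross-scenario transfer should be treated with caution. The temperature values are synthetic.

```python
# Sketch of a shift diagnostic via Mann-Whitney (rank) AUC: how well does a
# score separate samples from the two climate scenarios?

def rank_auc(scores_pos, scores_neg):
    """P(score_pos > score_neg), counting ties as half."""
    wins = 0.0
    for a in scores_pos:
        for b in scores_neg:
            if a > b:
                wins += 1.0
            elif a == b:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))

factual_temps = [29.1, 30.4, 31.2, 32.0, 33.5]        # warmer scenario
counterfactual_temps = [27.0, 27.8, 28.3, 29.0, 29.5]  # pre-warming scenario
shift_auc = rank_auc(factual_temps, counterfactual_temps)
```

An AUC near 0.5 means the scenarios are hard to tell apart (little shift); an AUC near 1.0, as here, flags a strong shift, and subgroup analysis can then localize where predictive performance degrades.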
Cassandra C. Chou
Department of Biostatistics, Johns Hopkins University

Scott L. Zeger
John C. Malone Professor of Biostatistics and Medicine, Johns Hopkins University
Interests: biostatistics, public health, environment, precision medicine, Bayes

Benjamin Q. Huynh
Department of Environmental Health & Engineering, Johns Hopkins University