Taking the Garbage Out of Data-Driven Prediction Across Climate Timescales

📅 2025-08-09
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
AI/ML models for climate prediction suffer from degraded skill and low credibility due to poor input data quality—including outliers, nonstationarity, and inadequate handling of spatiotemporal dependencies. Method: This study systematically identifies critical data preprocessing factors and proposes the first standardized AI/ML preprocessing protocol tailored for cross-timescale climate forecasting (subseasonal to decadal). It innovatively integrates standardized anomaly construction, nonstationary time-series correction, robust extreme-value handling, and modeling of complex-distribution variables. Multi-case empirical evaluations quantify the differential impacts of preprocessing strategies on prediction error, uncertainty quantification, and interpretability. Contribution/Results: The generalizable protocol significantly enhances model robustness and physical consistency, reducing bias by 15–30%. It establishes a foundational framework for standardized, transparent, and trustworthy climate AI applications.

Technology Category

Application Category

📝 Abstract
Artificial intelligence (AI) -- and specifically machine learning (ML) -- applications for climate prediction across timescales are proliferating quickly. The emergence of these methods prompts a revisit to the impact of data preprocessing, a topic familiar to the climate community, as more traditional statistical models work with relatively small sample sizes. Indeed, the skill and confidence in the forecasts produced by data-driven models are directly influenced by the quality of the datasets and how they are treated during model development, thus yielding the colloquialism "garbage in, garbage out." As such, this article establishes protocols for the proper preprocessing of input data for AI/ML models designed for climate prediction (i.e., subseasonal to decadal and longer). The three aims are to: (1) educate researchers, developers, and end users on the effects that preprocessing has on climate predictions; (2) provide recommended practices for data preprocessing for such applications; and (3) empower end users to decipher whether the models they are using are properly designed for their objectives. Specific topics covered in this article include the creation of (standardized) anomalies, dealing with non-stationarity and the spatiotemporally correlated nature of climate data, and handling of extreme values and variables with potentially complex distributions. Case studies will illustrate how using different preprocessing techniques can produce different predictions from the same model, which can create confusion and decrease confidence in the overall process. Ultimately, implementing the recommended practices set forth in this article will enhance the robustness and transparency of AI/ML in climate prediction studies.
Problem

Research questions and friction points this paper is trying to address.

Addressing data preprocessing impact on climate prediction models
Establishing protocols for AI/ML climate data preprocessing
Improving robustness and transparency in climate prediction studies
Innovation

Methods, ideas, or system contributions that make the work stand out.

Standardized anomalies for climate data preprocessing
Handling non-stationarity in spatiotemporal climate data
Managing extreme values and complex distributions effectively
🔎 Similar Papers
No similar papers found.
J
Jason C. Furtado
University of Oklahoma, Norman, OK, USA
Maria J. Molina
Maria J. Molina
University of Maryland, College Park
M
Marybeth C. Arcodia
Colorado State University, Fort Collins, CO, USA
W
Weston Anderson
University of Maryland, College Park, MD, USA
Tom Beucler
Tom Beucler
Assistant Professor, University of Lausanne
Atmospheric PhysicsClimate InformaticsScientific Machine LearningTropical Meteorology
John A. Callahan
John A. Callahan
NOAA National Ocean Service CO-OPS
climate sciencesea-level risecoastal hazardstidal and surgestats
L
Laura M. Ciasto
NOAA/NWS/NCEP/Climate Prediction Center, College Park, MD, USA
V
Vittorio A. Gensini
Northern Illinois University, Dekalb, IL, USA
M
Michelle L'Heureux
NOAA/NWS/NCEP/Climate Prediction Center, College Park, MD, USA
K
Kathleen Pegion
University of Oklahoma, Norman, OK, USA
Jhayron S. Pérez-Carrasquilla
Jhayron S. Pérez-Carrasquilla
University of Maryland, College Park
ClimateExtreme Weather EventsMachine LearningTropical StormsAir Pollution
M
Maike Sonnewald
University of California, Davis, Davis, CA USA
Ken Takahashi
Ken Takahashi
Instituto Geofísico del Perú
Climate dynamicsEl NiñoOcean-atmosphere interactions
B
Baoqiang Xiang
NOAA/GFDL, Princeton, NJ, USA
B
Brian G. Zimmerman
Macquarie, Houston, TX, USA