Extreme Weather Bench: A framework and benchmark for evaluation of high-impact weather

📅 2026-05-01

🤖 AI Summary
This work addresses the lack of a unified, open benchmark for evaluating high-impact weather events, which makes it hard to validate and compare artificial intelligence (AI) and numerical weather prediction (NWP) models under realistic extreme conditions. To bridge this gap, the study introduces Extreme Weather Bench (EWB), a community-driven, open-source benchmark suite that combines case studies spanning multiple spatial and temporal scales, observational data, impact-based metrics, and open-source evaluation code. The suite lets AI and traditional models be compared side by side on the same high-impact weather cases, which the authors present as a key step toward more trustworthy AI models, and it is intended to keep evolving with new phenomena, test cases, and metrics.
📝 Abstract
Forecasting the wide variety of high-impact weather events experienced globally is a challenge for both Artificial Intelligence (AI) and Numerical Weather Prediction (NWP) models, and it is critical that such models be properly verified before deployment. Although AI weather models are rapidly evolving, much of their evaluation is currently done either at the global scale or by hand-picking a small number of case studies or a single region. A widely used open-source benchmark suite focusing on high-impact weather will help to drive the science forward for all scales of weather models, as it has for other AI fields. Here we introduce Extreme Weather Bench (EWB), a new community-driven benchmark suite that facilitates model validation and verification on a variety of high-impact hazards that matter to people around the globe. EWB provides a standard set of case studies (spanning multiple spatial and temporal scales and different parts of the weather spectrum), observational data, impact-based metrics, and open-source code for users to evaluate their models. Verifying that a model works against a standard set of case studies, especially events that are high-impact for the general public, is a key piece of improving the trustworthiness of AI models. EWB will help to drive the science forward for all weather models, enabling true comparisons across models and evaluating models on specific high-impact phenomena through the use of case studies. EWB is a free, open-source, community-driven system and will continue to evolve to include additional phenomena, test cases, and metrics in collaboration with the worldwide weather and forecast verification community.
Problem

Research questions and friction points this paper is trying to address.

high-impact weather
model evaluation
benchmark
forecast verification
extreme weather
Innovation

Methods, ideas, or system contributions that make the work stand out.

Extreme Weather Bench
high-impact weather
AI weather forecasting
benchmark suite
impact-based metrics
🔎 Similar Papers
Amy McGovern
University of Oklahoma
Artificial Intelligence, Machine Learning, Severe Weather
Taylor Mandelbaum
School of Meteorology and School of Computer Science, University of Oklahoma, Norman, 73072, OK, USA.
Daniel Rothenberg
Brightband, San Francisco, CA, USA.
Nicholas Loveday
Bureau of Meteorology, Australia.
Corey Potvin
National Severe Storms Laboratory, National Oceanic and Atmospheric Administration, USA.
Montgomery Flora
The Weather Company, USA.
Linus Magnusson
ECMWF
Meteorology
Eric Gilleland
Department of Statistics, Colorado State University, Fort Collins, CO, USA.
John Allen
Earth and Atmospheric Sciences, Central Michigan University, Mount Pleasant, MI, USA.