Extreme Weather Bench: A framework and benchmark for evaluation of high-impact weather

📅 2026-05-01

🤖 AI Summary

This work addresses the lack of a unified, open benchmark for evaluating forecasts of high-impact weather events. That gap makes it hard to validate and compare artificial intelligence (AI) and numerical weather prediction (NWP) models under realistic extreme conditions. To fill it, the study introduces Extreme Weather Bench (EWB), a community-driven, open-source, standardized evaluation framework. EWB combines case studies spanning multiple spatial and temporal scales, observational data, impact-based metrics, and open-source evaluation code. The framework enables consistent, side-by-side comparisons of AI and traditional models across diverse high-impact weather scenarios. It also supports efforts to improve the trustworthiness of AI models and gives the global research community an open benchmarking platform that is intended to keep evolving.

📝 Abstract

Forecasting the wide variety of high-impact weather events experienced globally is a challenge for both Artificial Intelligence (AI) and Numerical Weather Prediction (NWP) models, and it is critical that such models be properly verified before deployment. Although AI weather models are rapidly evolving, much of their evaluation currently relies either on global-scale aggregate scores or on a small number of hand-picked case studies or regions. A widely used open-source benchmark suite focused on high-impact weather would help drive the science forward for weather models at all scales, as such benchmarks have done in other AI fields. Here we introduce Extreme Weather Bench (EWB), a new community-driven benchmark suite that supports model validation and verification across a variety of high-impact hazards that matter to people around the globe. EWB provides a standard set of case studies (spanning multiple spatial and temporal scales and different parts of the weather spectrum), observational data, impact-based metrics, and open-source code for users to evaluate their models. Verifying a model against a standard set of case studies, especially events that are high-impact for the general public, is a key part of improving the trustworthiness of AI models. By evaluating models on specific high-impact phenomena through case studies, EWB enables like-for-like comparisons across all kinds of weather models. EWB is a free, open-source, community-driven system and will continue to evolve to include additional phenomena, test cases and metrics in collaboration with the worldwide weather and forecast verification community.
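The abstract does not describe EWB's actual code interfaces. The snippet below is therefore only a hypothetical, standard-library sketch of the case-study idea: every model is scored on the same set of events with the same metrics, so the results can be compared side by side. Everything in it is an assumption and is not EWB's API or data. That includes the `CaseStudy` class, the `evaluate` function, the choice of RMSE plus a peak-magnitude error (standing in for an impact-oriented metric), the model names and the toy temperature values.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class CaseStudy:
    """Hypothetical high-impact event: an observed series over the case window."""
    name: str
    hazard: str
    observed: list[float]  # e.g. daily maximum 2 m temperature (deg C) at one site


def rmse(forecast: list[float], observed: list[float]) -> float:
    """Root-mean-square error over the case window."""
    if len(forecast) != len(observed):
        raise ValueError("forecast and observation lengths differ")
    return sqrt(sum((f - o) ** 2 for f, o in zip(forecast, observed)) / len(observed))


def peak_error(forecast: list[float], observed: list[float]) -> float:
    """Signed error in the event peak; negative means the extreme was under-forecast."""
    return max(forecast) - max(observed)


def evaluate(cases: list[CaseStudy],
             forecasts: dict[str, dict[str, list[float]]]) -> dict:
    """Score every model on every case with the same metrics.

    `forecasts` maps model name -> case name -> forecast series.
    Returns model name -> case name -> {metric name: value}.
    """
    results: dict = {}
    for model, by_case in forecasts.items():
        results[model] = {}
        for case in cases:
            fc = by_case[case.name]
            results[model][case.name] = {
                "rmse": rmse(fc, case.observed),
                "peak_error": peak_error(fc, case.observed),
            }
    return results


# Toy usage: one hypothetical heatwave case and two hypothetical models.
cases = [CaseStudy("heatwave_A", "heat", [30.0, 34.0, 38.0, 36.0])]
forecasts = {
    "ai_model": {"heatwave_A": [30.0, 33.0, 35.0, 35.0]},
    "nwp_model": {"heatwave_A": [31.0, 34.0, 37.0, 36.0]},
}
scores = evaluate(cases, forecasts)
for model, per_case in scores.items():
    print(model, per_case)
```

The peak error is reported next to RMSE because a model can score well on average while still missing the magnitude of the extreme, and the extreme is what matters for high-impact events. This reflects the motivation the abstract gives for impact-based metrics.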

Problem

Research questions and friction points this paper is trying to address.

high-impact weather

model evaluation

benchmark

forecast verification

extreme weather

Innovation

Methods, ideas, or system contributions that make the work stand out.

Extreme Weather Bench

high-impact weather

AI weather forecasting