ExEBench: Benchmarking Foundation Models on Extreme Earth Events

📅 2025-05-13

🤖 AI Summary

Problem: Existing foundation models lack rigorous evaluation of their robustness and operational utility for extreme Earth events such as floods, wildfires, and tropical cyclones. Method: We introduce ExEBench, the first dedicated benchmark for this domain, covering seven hazard types and integrating multi-source remote sensing (optical/SAR), meteorological reanalysis, and in-situ observational data at global scale with high spatiotemporal resolution. It establishes the first systematic evaluation paradigm spanning hazard categories, multiple temporal and spatial scales, and cascading effects, alongside a fair, multi-task assessment framework covering detection, monitoring, and forecasting. Contribution/Results: We publicly release the benchmark dataset and code. Extensive experiments reveal consistent performance degradation of mainstream foundation models under extreme conditions, providing quantitative metrics and actionable insights for developing trustworthy Earth AI systems.

📝 Abstract

Our planet is facing increasingly frequent extreme events, which pose major risks to human lives and ecosystems. Recent advances in machine learning (ML), especially foundation models (FMs) trained on extensive datasets, excel at feature extraction and show promise for disaster management. Nevertheless, these models often inherit biases from their training data, which can degrade their performance on extreme values. To explore the reliability of FMs in the context of extreme events, we introduce ExEBench (Extreme Earth Benchmark), a collection spanning seven extreme event categories: floods, wildfires, storms, tropical cyclones, extreme precipitation, heatwaves, and cold waves. The dataset features global coverage, varying data volumes, and diverse data sources with different spatial, temporal, and spectral characteristics. To broaden the real-world impact of FMs, we include multiple challenging ML tasks that are closely aligned with operational needs in extreme event detection, monitoring, and forecasting. ExEBench aims to (1) assess FM generalizability across diverse, high-impact tasks and domains, (2) promote the development of novel ML methods that benefit disaster management, and (3) offer a platform for analyzing the interactions and cascading effects of extreme events to advance our understanding of the Earth system, especially under the climate change expected in the decades to come. The dataset and code are publicly available at https://github.com/zhaoshan2/EarthExtreme-Bench.
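
To make "performance over extreme values" concrete, the following is a minimal illustrative sketch, not code from the ExEBench repository. It compares RMSE over all samples with RMSE restricted to the upper tail of the target distribution. The function name `tail_stratified_rmse`, the 0.95 quantile cut-off, and the synthetic "saturating" model are all assumptions made for illustration only.

```python
import math
import random


def rmse(errors):
    """Root-mean-square of a list of prediction errors."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def tail_stratified_rmse(y_true, y_pred, quantile=0.95):
    """Report RMSE on all samples and on samples in the upper tail.

    Hypothetical helper: the tail is defined as targets at or above the
    empirical `quantile` of y_true (simple rank-based threshold).
    """
    ranked = sorted(y_true)
    idx = min(len(ranked) - 1, int(quantile * len(ranked)))
    threshold = ranked[idx]

    errors = [p - t for t, p in zip(y_true, y_pred)]
    extreme_errors = [e for t, e in zip(y_true, errors) if t >= threshold]

    return {
        "threshold": threshold,
        "n_extreme": len(extreme_errors),
        "rmse_all": rmse(errors),
        "rmse_extreme": rmse(extreme_errors),
    }


# Synthetic data (assumption): skewed targets, e.g. a precipitation-like variable.
rng = random.Random(0)
y_true = [rng.gammavariate(2.0, 5.0) for _ in range(10_000)]

# Synthetic "model" (assumption): accurate for common values but saturating
# at 20.0, mimicking a model that rarely saw large values during training.
y_pred = [min(t, 20.0) + rng.gauss(0.0, 1.0) for t in y_true]

scores = tail_stratified_rmse(y_true, y_pred)
print(scores)
```

In this toy setting the tail RMSE is larger than the overall RMSE because the saturating model underestimates every large target. An aggregate metric alone would hide that gap, which is the kind of behavior a benchmark focused on extremes is meant to expose.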

Problem

Research questions and friction points this paper is trying to address.

Assessing foundation models' reliability for extreme Earth events

Evaluating model biases in disaster detection and forecasting

Analyzing cascading effects of extreme events under climate change

Innovation

Methods, ideas, or system contributions that make the work stand out.

Introduces ExEBench, a benchmark covering seven categories of extreme Earth events

Uses diverse global datasets and tasks

Assesses foundation models' generalizability and biases

🔎 Similar Papers

HR-Extreme: A High-Resolution Dataset for Extreme Weather Forecasting