Does Your Wildfire Prediction Model Actually Work, or Just Score Well?

📅 2026-05-14

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing Earth foundation models are pretrained for general atmospheric and geophysical objectives rather than wildfire prediction, and evaluating them on wildfire tasks is prone to bias: events are sparse in space and time, so transfer conclusions are sensitive to matching rules and evaluation settings. This work proposes WILDFIRE-FM, which the authors describe as the first foundation model pretrained specifically for wildfire forecasting, using weather, active-fire detections, topography, vegetation, and static environmental data. To enable rigorous assessment, the authors introduce a “fixed-contract” evaluation framework with two controlled checks, a fixed-output check for matching-rule effects and a fixed-feature check for head-selection effects, and compare WILDFIRE-FM with ten Earth-FM baselines across four task types: occupancy, spread, retrieval, and regression. Experiments show that wildfire transfer conclusions depend strongly on evaluation design and task formulation.

📝 Abstract

Wildfire prediction is important for early warning and resource allocation, yet existing Earth foundation models (Earth FMs) are pretrained for general atmospheric and geophysical objectives rather than wildfire forecasting. To address this gap, we introduce WILDFIRE-FM, the first foundation model pretrained specifically for wildfire prediction using weather, active-fire observations, topography, vegetation, and static environmental data. However, introducing a domain-specific backbone alone does not solve the evaluation problem: wildfire events are sparse in space and time, making transfer conclusions highly sensitive to matching rules and evaluation settings. To address this problem, we introduce a fixed-contract evaluation framework with two controlled checks: a fixed-output check for matching-rule effects and a fixed-feature check for head-selection effects. Under matched contracts, we compare WILDFIRE-FM with ten Earth-FM baselines across occupancy, spread, retrieval, and regression tasks. Our results show that wildfire transfer conclusions depend strongly on evaluation design and task formulation. We hope this framework and WILDFIRE-FM provide a foundation for future wildfire-specific Earth-FM research and benchmarking. Our code is available at https://anonymous.4open.science/r/Wildfire-fm-evaluation-contracts-5AE9/.
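To make the matching-rule sensitivity concrete, here is a minimal sketch of the idea behind a fixed-output check: one set of predictions is held fixed and scored under two different matching rules. This is an illustration only and is not taken from the paper's code; the function name `f1_under_rule`, the grid cells, and the Chebyshev-distance tolerance rule are all assumptions chosen for the example.

```python
from typing import Set, Tuple

Cell = Tuple[int, int]  # (row, col) index of a grid cell; hypothetical layout


def f1_under_rule(pred: Set[Cell], truth: Set[Cell], tolerance: int) -> float:
    """F1 score where a cell counts as matched if a cell from the other set
    lies within `tolerance` cells (Chebyshev distance). tolerance=0 means
    exact-cell matching. Illustrative rule, not the paper's definition."""

    def near(cell: Cell, others: Set[Cell]) -> bool:
        return any(
            max(abs(cell[0] - o[0]), abs(cell[1] - o[1])) <= tolerance
            for o in others
        )

    matched_pred = sum(1 for p in pred if near(p, truth))   # for precision
    matched_true = sum(1 for t in truth if near(t, pred))   # for recall
    precision = matched_pred / len(pred) if pred else 0.0
    recall = matched_true / len(truth) if truth else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Made-up sparse example: three true fire cells, four predicted cells.
truth = {(10, 10), (10, 11), (40, 5)}
pred = {(10, 10), (11, 12), (41, 6), (70, 70)}

exact = f1_under_rule(pred, truth, tolerance=0)  # exact-cell matching
loose = f1_under_rule(pred, truth, tolerance=1)  # one-cell neighbourhood

print(f"exact-match F1: {exact:.3f}")
print(f"1-cell-tolerance F1: {loose:.3f}")
```

With these made-up cells, the same fixed predictions score about 0.29 under exact matching and about 0.86 with a one-cell tolerance. Because fire cells are so few, a handful of near misses moves the metric substantially, which is why holding the output fixed while varying only the matching rule isolates the effect of the rule itself.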

Problem

Research questions and friction points this paper is trying to address.

wildfire prediction

Earth foundation models

evaluation framework

transfer learning

sparse events

Innovation

Methods, ideas, or system contributions that make the work stand out.

wildfire prediction

foundation model

evaluation framework