🤖 AI Summary
Wildfire risk prediction remains highly challenging for two reasons. Heterogeneous drivers (meteorology, fuel conditions, topography, and human activity) are strongly coupled in space and time, and no public benchmark datasets support long-term modeling and large-scale evaluation. To address this gap, we introduce the first boreal wildfire risk benchmark dataset. It covers 240 million hectares over 25 years at daily resolution and integrates 38 multimodal driver variables. Using this dataset, we systematically evaluate CNN, linear, Transformer, and Mamba architectures, and we investigate how positional encoding affects the learning of spatiotemporal patterns. A factor importance analysis further identifies the dominant physical drivers and the mechanisms behind them. All code and data are publicly released, providing a reproducible, scalable, and extensible benchmark for data-driven wildfire forecasting research.
📝 Abstract
Wildfire risk prediction remains a critical yet challenging task because of the complex interactions among fuel conditions, meteorology, topography, and human activity. Despite growing interest in data-driven approaches, publicly available benchmark datasets that support long-term temporal modeling, large-scale spatial coverage, and multimodal drivers remain scarce. To address this gap, we present a 25-year, daily-resolution wildfire dataset covering 240 million hectares across British Columbia and surrounding regions. The dataset includes 38 covariates, encompassing active fire detections, weather variables, fuel conditions, terrain features, and anthropogenic factors. Using this benchmark, we evaluate a diverse set of time-series forecasting models, including CNN-based, linear-based, Transformer-based, and Mamba-based architectures. We also investigate the effectiveness of positional embeddings and the relative importance of different fire-driving factors. The dataset and code are available at https://github.com/SynUW/mmFire.
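The abstract treats wildfire risk as a time-series forecasting problem over daily multivariate covariates. The sketch below shows one common way to turn such a series into supervised (input, target) pairs for the model families listed. It is a minimal illustration under stated assumptions, not the mmFire data loader. It assumes a single pixel's history stored as an array of shape (days, 38) with the active-fire channel at index 0. The helper `make_windows` and the `lookback`/`horizon` values are hypothetical.

```python
import numpy as np


def make_windows(series: np.ndarray, lookback: int, horizon: int, target_channel: int = 0):
    """Slice a (T, C) daily series into sliding (input, target) pairs.

    Hypothetical helper for illustration only (not from the mmFire repository).
    series: shape (T, C), one row per day, one column per covariate.
    lookback: number of past days given to the model as input.
    horizon: number of future days to forecast.
    target_channel: assumed index of the fire-activity channel.
    """
    T, _ = series.shape
    n = T - lookback - horizon + 1  # number of complete windows that fit
    if n <= 0:
        raise ValueError("series is too short for lookback + horizon")
    # Inputs: all covariates over the lookback period -> (n, lookback, C)
    X = np.stack([series[i:i + lookback] for i in range(n)])
    # Targets: only the fire channel over the following horizon -> (n, horizon)
    y = np.stack([series[i + lookback:i + lookback + horizon, target_channel] for i in range(n)])
    return X, y


# Synthetic stand-in: one year of 38 random covariates for a single pixel.
rng = np.random.default_rng(0)
pixel = rng.random((365, 38)).astype(np.float32)
X, y = make_windows(pixel, lookback=30, horizon=7)
print(X.shape, y.shape)  # (329, 30, 38) (329, 7)
```

Each window keeps all 38 covariates as input but predicts only the assumed fire channel. This mirrors the multivariate-to-target setup that CNN, linear, Transformer, and Mamba forecasters typically consume. The actual benchmark may use different channels, window lengths, or spatial patches.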