BCWildfire: A Long-term Multi-factor Dataset and Deep Learning Benchmark for Boreal Wildfire Risk Prediction

📅 2025-11-17

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Wildfire risk prediction remains highly challenging because heterogeneous drivers (meteorology, fuel conditions, topography, and human activity) are strongly coupled in space and time, and because public benchmark datasets that support long-term modeling and large-scale evaluation are scarce. To address this gap, we introduce a boreal wildfire risk benchmark dataset covering 240 million hectares across British Columbia and surrounding regions over 25 years at daily resolution, with 38 covariates spanning active fire detections, weather, fuel conditions, terrain, and anthropogenic factors. Using this dataset, we evaluate CNN-based, linear-based, Transformer-based, and Mamba-based time-series forecasting models, investigate the effectiveness of position embedding, and analyze the relative importance of different fire-driving factors. The dataset and code are publicly released, providing a reproducible benchmark for data-driven wildfire forecasting research.

📝 Abstract

Wildfire risk prediction remains a critical yet challenging task due to the complex interactions among fuel conditions, meteorology, topography, and human activity. Despite growing interest in data-driven approaches, publicly available benchmark datasets that support long-term temporal modeling, large-scale spatial coverage, and multimodal drivers remain scarce. To address this gap, we present a 25-year, daily-resolution wildfire dataset covering 240 million hectares across British Columbia and surrounding regions. The dataset includes 38 covariates, encompassing active fire detections, weather variables, fuel conditions, terrain features, and anthropogenic factors. Using this benchmark, we evaluate a diverse set of time-series forecasting models, including CNN-based, linear-based, Transformer-based, and Mamba-based architectures. We also investigate the effectiveness of position embedding and the relative importance of different fire-driving factors. The dataset and the corresponding code can be found at https://github.com/SynUW/mmFire.

Problem

Research questions and friction points this paper is trying to address.

Predicts wildfire risk using multi-factor interactions

Addresses scarcity of long-term multimodal benchmark datasets

Evaluates time-series forecasting models on a benchmark with large-scale spatial coverage

Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-factor wildfire dataset with 25-year daily data

Evaluated CNN-based, linear-based, Transformer-based, and Mamba-based time-series models

Assessed position embedding and fire-driving factor importance
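
To make the position embedding item concrete, here is a minimal sketch of one common variant, the fixed sinusoidal encoding used in the original Transformer. It is an illustration only: the summary does not say which embedding the paper evaluates, and the 30-day window length, the 64-dimensional token size, and the random projection standing in for a learned embedding layer are all assumptions. Only the count of 38 covariates comes from the abstract.

```python
import numpy as np


def sinusoidal_position_encoding(seq_len: int, d_model: int) -> np.ndarray:
    """Return a (seq_len, d_model) array of fixed sinusoidal position encodings."""
    if d_model % 2 != 0:
        raise ValueError("d_model must be even")
    positions = np.arange(seq_len)[:, None]  # shape (seq_len, 1)
    # One frequency per sine/cosine pair, decaying geometrically with the index.
    freqs = np.exp(-np.log(10000.0) * np.arange(0, d_model, 2) / d_model)
    angles = positions * freqs  # shape (seq_len, d_model // 2)
    encoding = np.zeros((seq_len, d_model))
    encoding[:, 0::2] = np.sin(angles)  # even columns hold sines
    encoding[:, 1::2] = np.cos(angles)  # odd columns hold cosines
    return encoding


# Hypothetical input: a 30-day window of the 38 covariates for one grid cell.
rng = np.random.default_rng(0)
window = rng.normal(size=(30, 38))

# Stand-in for a learned embedding layer (assumption, not the paper's model).
projection = rng.normal(size=(38, 64)) / np.sqrt(38)
tokens = window @ projection  # shape (30, 64)

# Each day's token receives a distinct, deterministic position signal.
position_encoding = sinusoidal_position_encoding(30, 64)
tokens_with_position = tokens + position_encoding
```

Without such a signal, an attention layer treats the days in a window as an unordered set, which is one reason the effect of position embedding is worth measuring on daily time series.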

🔎 Similar Papers

Wildfire Risk Prediction: A Review